Commit 9ff9c1f6, authored Sep 22, 2016 by A. Unique TensorFlower; committed by TensorFlower Gardener on Sep 22, 2016

Parallelize inner matrix multiplications of BatchMatMul on CPU when appropriate.

* Uses simple heuristics to choose between parallelizing outer (batch), inner (matmul) or both.
* Adds benchmarks for BatchMatMul.
* Switches the matmul benchmark to use real time, so the reported GFlops are relative to wall time and reflect the effect of multi-threading.
* Fixes a bug in the cost_per_unit calculation: the old code computed B*M*N instead of M*N*K.
Change: 134025273
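
The following is a minimal Python sketch of the two ideas in the message above: the corrected per-unit cost (M*N*K multiply-adds for one matrix product in the batch) and a simple choice between outer, inner, or combined parallelism. It is not the code from this commit. The function names, the `min_inner_cost` threshold, and the decision rules are hypothetical and chosen only for illustration.

```python
def cost_per_unit(m, k, n):
    """Multiply-adds for one (m x k) @ (k x n) product in the batch: M*N*K."""
    return m * n * k


def old_cost_per_unit(batch, m, n):
    """The buggy formula described in the message: B*M*N (ignores K)."""
    return batch * m * n


def choose_parallelism(batch, m, k, n, num_threads, min_inner_cost=1 << 20):
    """Pick a parallelization strategy (hypothetical heuristic).

    Returns one of "none", "outer", "inner", "both".
    min_inner_cost is an assumed threshold below which splitting a single
    matmul across threads is not expected to pay off.
    """
    if num_threads <= 1:
        return "none"
    inner_is_large = cost_per_unit(m, k, n) >= min_inner_cost
    if batch >= num_threads:
        # Enough independent products to keep every thread busy.
        return "outer"
    if batch == 1:
        # Only the inner matmul can be split.
        return "inner" if inner_is_large else "none"
    # Fewer products than threads: also split each product if it is large.
    return "both" if inner_is_large else "outer"
```

For example, with B=4, M=2, N=3, K=1000 the old formula gives 24 while the corrected one gives 6000, so the old estimate would badly understate the work per product whenever K is large.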

parent d738806a
