Commit 25abe106 authored Jan 11, 2019 by Sanjoy Das Committed by TensorFlower Gardener Jan 11, 2019

[XLA:CPU] Implement batch dot

For now we lower a batch dot to N non-batch dot operations. In the future we
may consider a more direct lowering. This CL is still better than what we had
because:

 - We don't blow up compile time by unrolling the batch dot
 - We avoid the slice and concat in the cases where they wouldn't get optimized away
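
As a rough illustration of the lowering described above (not the actual XLA:CPU
emitter code), the sketch below expresses a batch dot as a loop over N
independent non-batch dots. The names `dot` and `batch_dot` and the
nested-list matrix representation are assumptions made for this example only.
Each result is written directly into its own output slot, so no slice or concat
of the operands is needed, and the loop body exists once rather than being
unrolled N times.

```python
def dot(a, b):
    """Non-batch dot: product of an m x k matrix and a k x n matrix."""
    k = len(b)
    n = len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)]
            for row in a]


def batch_dot(lhs, rhs):
    """Batch dot expressed as N non-batch dots, one per batch element.

    Hypothetical helper for illustration: lhs has shape [N, m, k] and
    rhs has shape [N, k, n], both as nested lists.
    """
    if len(lhs) != len(rhs):
        raise ValueError("batch dimensions must match")
    out = [None] * len(lhs)
    # A single loop over the batch dimension; each iteration is an
    # ordinary (non-batch) dot on the i-th pair of matrices.
    for i in range(len(lhs)):
        out[i] = dot(lhs[i], rhs[i])
    return out


lhs = [[[1, 2], [3, 4]], [[1, 0], [0, 1]]]
rhs = [[[5, 6], [7, 8]], [[9, 10], [11, 12]]]
result = batch_dot(lhs, rhs)
```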

After this CL, DotDecomposer can be simplified, but I'll wait a bit and do that
in a separate CL so that this CL is easier to roll back if necessary.

PiperOrigin-RevId: 228926335

parent 5a4bf97a
