[XLA:CPU] Implement batch dot

For now we lower a batch dot to N non-batch dot operations. In the future we may consider a more direct lowering.

This current CL is still better than what we had because:

- We don't blow up compile time by unrolling the batch dot
- We avoid the slice and concat in the cases where they wouldn't get optimized away

After this CL DotDecomposer can be simplified, but I'll wait a bit and do that in a separate CL to make it easier to roll this CL back if necessary.

PiperOrigin-RevId: 228926335
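
To illustrate what "lower a batch dot to N non-batch dot operations" means semantically, here is a minimal sketch in plain Python. It is not the XLA:CPU implementation: the helper names `dot` and `batch_dot` and the nested-list representation are hypothetical, chosen only for illustration. It assumes a single leading batch dimension of size N and computes each batch element with an ordinary matrix product inside a loop, writing each result into its own output slot.

```python
# Illustrative sketch only: hypothetical helpers, not XLA's actual lowering.

def dot(lhs, rhs):
    """Non-batch matrix product of an m x k and a k x n nested list."""
    k = len(rhs)
    n = len(rhs[0])
    return [
        [sum(row[p] * rhs[p][j] for p in range(k)) for j in range(n)]
        for row in lhs
    ]


def batch_dot(lhs, rhs):
    """Batch dot expressed as N non-batch dots, one per batch index."""
    if len(lhs) != len(rhs):
        raise ValueError("batch sizes differ")
    out = [None] * len(lhs)
    # One loop over the batch dimension; each iteration is a plain dot
    # whose result goes straight into its batch slot.
    for b in range(len(lhs)):
        out[b] = dot(lhs[b], rhs[b])
    return out


# Two batches of 2x2 matrices.
lhs = [[[1, 2], [3, 4]], [[1, 0], [0, 1]]]
rhs = [[[5, 6], [7, 8]], [[9, 10], [11, 12]]]
result = batch_dot(lhs, rhs)
```

The loop body is the same for every batch index, so the amount of code does not grow with N, which is the property the first bullet above is about.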