[XLA:GPU] Forward batched dot to cublas instead of expanding it
This gives a huge speedup for users of batchdot. This is a minimal implementation without autotuning and without support for strided batch gemm.

PiperOrigin-RevId: 207247740