Commit 37b48fac authored by Benjamin Kramer, committed by TensorFlower Gardener

[XLA:GPU] Forward batched dot to cublas instead of expanding it

This gives a huge speedup for users of batchdot. This is a minimal implementation without autotuning and without support for strided batch gemm.
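
For reference, a batched dot computes one independent matrix product per batch element. The plain-Python sketch below only illustrates those semantics. The helper names `matmul` and `batch_dot` are hypothetical and are not XLA's code. Nothing here calls cuBLAS; presumably the GPU backend hands the whole batch to a cuBLAS batched GEMM routine rather than lowering each product separately, but the commit message does not name the routine.

```python
def matmul(a, b):
    """Multiply an m x k matrix by a k x n matrix, both given as lists of rows."""
    k = len(b)
    n = len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)]
            for row in a]


def batch_dot(lhs, rhs):
    """Reference semantics of a batched dot: out[i] = lhs[i] x rhs[i]."""
    if len(lhs) != len(rhs):
        raise ValueError("batch sizes differ")
    # Each batch element is multiplied independently of the others.
    return [matmul(a, b) for a, b in zip(lhs, rhs)]


# Two batch elements, each a 2x2 by 2x2 product.
lhs = [[[1, 2], [3, 4]], [[0, 1], [1, 0]]]
rhs = [[[5, 6], [7, 8]], [[2, 3], [4, 5]]]
result = batch_dot(lhs, rhs)
```

Because the per-element products do not depend on each other, they can be submitted to the library as a single batched call.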

PiperOrigin-RevId: 207247740
parent f74a3af2