Commit 37b48fac, authored Aug 03, 2018 by Benjamin Kramer; committed by TensorFlower Gardener, Aug 03, 2018

[XLA:GPU] Forward batched dot to cublas instead of expanding it

This gives a huge speedup for users of batchdot. This is a minimal implementation without autotuning and without support for strided batch gemm.
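
For readers unfamiliar with the operation, the sketch below shows what a batched dot computes and how the "expanded" form (one separate matrix product per batch element) relates to a single batched call. It is plain Python for illustration only, not the XLA or cuBLAS code from this commit; `matmul` and `batched_dot` are hypothetical helper names, and the inputs are made-up values.

```python
def matmul(a, b):
    """Plain 2-D matrix product of row-major nested lists."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    cols = len(b[0])
    return [
        [sum(a_row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for a_row in a
    ]


def batched_dot(lhs, rhs):
    """Hypothetical helper: one logical call covering every batch element.

    lhs and rhs are lists of matrices with the same batch size; element i of
    the result is matmul(lhs[i], rhs[i]).
    """
    assert len(lhs) == len(rhs), "batch sizes must match"
    return [matmul(a, b) for a, b in zip(lhs, rhs)]


# Two batch elements, each a 2x2 matrix (illustrative values).
lhs = [[[1, 2], [3, 4]], [[0, 1], [1, 0]]]
rhs = [[[5, 6], [7, 8]], [[2, 3], [4, 5]]]

# Expanded form: the caller issues one independent product per batch element.
expanded = [matmul(lhs[0], rhs[0]), matmul(lhs[1], rhs[1])]

# Batched form: the whole batch is handed over in a single call.
batched = batched_dot(lhs, rhs)

assert expanded == batched
print(batched)
```

Both forms produce the same values. The difference the commit message describes is where the per-batch iteration happens: in the expanded form the caller emits one operation per batch element, while in the batched form the library receives the whole batch at once.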

PiperOrigin-RevId: 207247740

parent f74a3af2
