[XLA:GPU] Add a fast version of gemmStridedBatched for CUDA 9.1
It's unfortunate that this was only added in 9.1, but I haven't found a good way of emulating the behavior on 9.0 without falling back to non-batched gemms.

PiperOrigin-RevId: 207286575