Commit e36c16c6, authored Aug 03, 2018 by Benjamin Kramer; committed by TensorFlower Gardener Aug 03, 2018

[XLA:GPU] Use strided batched gemm instead of building pointer tables.

This is mostly a huge amount of plumbing just to call into the cuBLAS functions.
The cublas<t>gemmStridedBatched family (e.g. cublasSgemmStridedBatched) has been
available since CUDA 8.0.
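
To illustrate the difference between the two calling conventions, here is a
minimal CPU-only sketch. It is not the code from this change and does not call
cuBLAS: the names Gemm, GemmBatchedPointers and GemmStridedBatched are
hypothetical, and the matrices are assumed to be row-major and densely packed
(cuBLAS itself is column-major). It only shows that when the batch members are
laid out at a fixed stride, each operand address can be computed as
base + i * stride, so no pointer table has to be built.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: row-major C = A * B for one (m x k) * (k x n) product.
void Gemm(const float* a, const float* b, float* c, int m, int n, int k) {
  for (int i = 0; i < m; ++i) {
    for (int j = 0; j < n; ++j) {
      float acc = 0.0f;
      for (int p = 0; p < k; ++p) acc += a[i * k + p] * b[p * n + j];
      c[i * n + j] = acc;
    }
  }
}

// Pointer-table form: the caller first builds three arrays holding one
// pointer per batch member, then passes those tables in.
void GemmBatchedPointers(const std::vector<const float*>& as,
                         const std::vector<const float*>& bs,
                         const std::vector<float*>& cs,
                         int m, int n, int k) {
  assert(as.size() == bs.size() && bs.size() == cs.size());
  for (std::size_t i = 0; i < as.size(); ++i) Gemm(as[i], bs[i], cs[i], m, n, k);
}

// Strided form: one base pointer and one element stride per operand.
// The address of batch member i is derived on the fly.
void GemmStridedBatched(const float* a, std::ptrdiff_t stride_a,
                        const float* b, std::ptrdiff_t stride_b,
                        float* c, std::ptrdiff_t stride_c,
                        int m, int n, int k, int batch_count) {
  for (int i = 0; i < batch_count; ++i) {
    Gemm(a + i * stride_a, b + i * stride_b, c + i * stride_c, m, n, k);
  }
}
```

For a batch of densely packed matrices, the strides are simply m * k, k * n and
m * n elements, and both functions produce the same result. The strided form
skips allocating and filling the pointer arrays.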

For autotuning we'd need cublasGemmStridedBatchedEx, which is new in CUDA 9.2
so I didn't wire that up yet.

PiperOrigin-RevId: 207285707

parent 7935c176
