Commit e36c16c6 authored by Benjamin Kramer's avatar Benjamin Kramer Committed by TensorFlower Gardener
Browse files

[XLA:GPU] Use strided batched gemm instead of building pointer tables.

This is mostly a huge amount of plumbing just to call into the cublas functions.
blasGemmStridedBatched has been available since CUDA 8.0.

For autotuning we'd need cublasGemmStridedBatchedEx, which is new in CUDA 9.2
so I didn't wire that up yet.

PiperOrigin-RevId: 207285707
parent 7935c176
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please to comment