Commit f08f24cd authored by Ben Barsdell, committed by Jonathan Hseu

Add GPU support for float16 batched matmul (#18436)

* Add GPU support for float16 batched matmul

- Uses cublasGemmBatchedEx introduced in CUDA 9.1.
- Includes support for Tensor Op math.
- Falls back to a loop over non-batched gemm calls on older CUDA
  versions or GPU architectures.
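
The dispatch described above can be sketched on the CPU as follows. This is only an illustration, not the code from this commit: `Matrix`, `Gemm`, `CanUseBatchedEx`, `BatchedGemm`, and the `min_cc_major` parameter are hypothetical names, `float` stands in for float16, and the minimum GPU architecture is left as a caller-supplied assumption because the commit message does not state it. The value 9010 is how the `CUDA_VERSION` macro encodes CUDA 9.1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major matrix; float stands in for float16 in this CPU-only sketch.
struct Matrix {
  std::size_t rows;
  std::size_t cols;
  std::vector<float> data;
};

// Single (non-batched) GEMM: returns C = A * B.
Matrix Gemm(const Matrix& a, const Matrix& b) {
  assert(a.cols == b.rows);
  Matrix c{a.rows, b.cols, std::vector<float>(a.rows * b.cols, 0.0f)};
  for (std::size_t i = 0; i < a.rows; ++i) {
    for (std::size_t k = 0; k < a.cols; ++k) {
      for (std::size_t j = 0; j < b.cols; ++j) {
        c.data[i * c.cols + j] += a.data[i * a.cols + k] * b.data[k * b.cols + j];
      }
    }
  }
  return c;
}

enum class Path { kBatchedEx, kLoopFallback };

// Hypothetical capability check: CUDA 9.1 or newer (CUDA_VERSION >= 9010)
// and a GPU architecture at least as new as min_cc_major (an assumption).
bool CanUseBatchedEx(int cuda_version, int cc_major, int min_cc_major) {
  return cuda_version >= 9010 && cc_major >= min_cc_major;
}

struct BatchedResult {
  Path path;              // Which path the dispatch selected.
  std::vector<Matrix> c;  // One output matrix per batch entry.
};

// Single internal entry point for batched GEMM, choosing between the
// batched call and the per-matrix loop.
BatchedResult BatchedGemm(const std::vector<Matrix>& a,
                          const std::vector<Matrix>& b,
                          int cuda_version, int cc_major, int min_cc_major) {
  assert(a.size() == b.size());
  BatchedResult result;
  result.path = CanUseBatchedEx(cuda_version, cc_major, min_cc_major)
                    ? Path::kBatchedEx
                    : Path::kLoopFallback;
  // On a GPU the kBatchedEx path would be one cublasGemmBatchedEx call and
  // the fallback would issue one non-batched GEMM per entry. On the CPU both
  // paths reduce to the same loop, so only the dispatch decision differs.
  for (std::size_t i = 0; i < a.size(); ++i) {
    result.c.push_back(Gemm(a[i], b[i]));
  }
  return result;
}
```

Routing both paths through one function keeps the fallback decision in a single place, which is in the spirit of the refactor listed in this commit.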

* Refactor GPU batched gemm into one internal func
parent 9201e2c0