Commit f08f24cd authored May 10, 2018 by Ben Barsdell Committed by Jonathan Hseu May 10, 2018

Add GPU support for float16 batched matmul (#18436)

* Add GPU support for float16 batched matmul

- Uses cublasGemmBatchedEx introduced in CUDA 9.1.
- Includes support for Tensor Op math.
- Falls back to a loop over non-batched gemm calls on older CUDA
  versions or GPU architectures.
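
The dispatch described above can be sketched roughly as follows. This is a CPU-only illustration under stated assumptions, not the code from this commit: `CanUseBatchedEx`, `Gemm`, and `BatchedGemmFallback` are hypothetical names, the compute-capability threshold is assumed for the example, and a plain float loop stands in for the float16 cuBLAS calls. The only value taken from the message is the CUDA 9.1 requirement, which the `CUDA_VERSION` macro encodes as 9010.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: decides whether the single batched call may be used.
// kMinCcMajor is an assumed threshold for illustration only; the commit
// message says "older ... GPU architectures" without naming one.
bool CanUseBatchedEx(int cuda_version, int cc_major) {
  constexpr int kMinCudaVersion = 9010;  // CUDA 9.1 (major * 1000 + minor * 10)
  constexpr int kMinCcMajor = 5;         // assumption, not from the commit
  return cuda_version >= kMinCudaVersion && cc_major >= kMinCcMajor;
}

// CPU stand-in for one non-batched gemm call: C = A * B, row-major,
// where A is m x k, B is k x n, and C is m x n.
void Gemm(const float* a, const float* b, float* c, int m, int n, int k) {
  for (int i = 0; i < m; ++i) {
    for (int j = 0; j < n; ++j) {
      float acc = 0.0f;
      for (int p = 0; p < k; ++p) {
        acc += a[i * k + p] * b[p * n + j];
      }
      c[i * n + j] = acc;
    }
  }
}

// Fallback path: one non-batched gemm per batch entry, with the matrices
// of each batch stored contiguously one after another.
std::vector<float> BatchedGemmFallback(const std::vector<float>& a,
                                       const std::vector<float>& b,
                                       int batch, int m, int n, int k) {
  assert(a.size() == static_cast<std::size_t>(batch) * m * k);
  assert(b.size() == static_cast<std::size_t>(batch) * k * n);
  std::vector<float> c(static_cast<std::size_t>(batch) * m * n);
  for (int i = 0; i < batch; ++i) {
    Gemm(a.data() + static_cast<std::size_t>(i) * m * k,
         b.data() + static_cast<std::size_t>(i) * k * n,
         c.data() + static_cast<std::size_t>(i) * m * n, m, n, k);
  }
  return c;
}
```

In a real GPU build the first check would select a single `cublasGemmBatchedEx` call, and the loop would issue one device gemm per batch entry instead of the CPU stand-in used here.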

* Refactor GPU batched gemm into one internal func

parent 9201e2c0
