Commit d9d029f5, authored by Thomas Joerg, committed by TensorFlower Gardener

[XLA:GPU] Generalize the column reduction algorithm to handle tile widths greater than 1.

Tiles of width 1 result in poor memory bandwidth utilization for 16-bit inputs.

PiperOrigin-RevId: 205033124
parent f1de0ddd
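
To illustrate what a tile width greater than 1 means for a column reduction, here is a minimal CPU-side sketch in C++. It is not the kernel XLA emits. The function name `ColumnReduceTiled`, its parameters, and the row-major layout are assumptions made for illustration. Each iteration of the outer loop stands in for one GPU thread that owns `tile_width` adjacent columns and accumulates them while walking down the rows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of XLA): sums each column of a row-major
// rows x cols matrix, processing the columns in tiles of `tile_width`.
std::vector<float> ColumnReduceTiled(const std::vector<float>& input,
                                     std::size_t rows, std::size_t cols,
                                     std::size_t tile_width) {
  assert(tile_width > 0);
  assert(input.size() == rows * cols);
  std::vector<float> sums(cols, 0.0f);
  // One outer iteration models one thread that owns a tile of columns.
  for (std::size_t tile_start = 0; tile_start < cols;
       tile_start += tile_width) {
    // The last tile is partial when cols is not a multiple of tile_width.
    const std::size_t tile_end = std::min(tile_start + tile_width, cols);
    for (std::size_t r = 0; r < rows; ++r) {
      // The elements of a tile are adjacent in memory. On a GPU, two
      // adjacent 16-bit elements could be fetched with one 32-bit load
      // (an assumption about the motivation, not taken from the commit).
      for (std::size_t c = tile_start; c < tile_end; ++c) {
        sums[c] += input[r * cols + c];
      }
    }
  }
  return sums;
}
```

The tile width changes only how the columns are grouped, not the result: with a 2 x 3 input, `ColumnReduceTiled(m, 2, 3, 1)` and `ColumnReduceTiled(m, 2, 3, 2)` return the same column sums. In the second call the last tile holds a single column.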