[XLA:GPU] Generalize the column reduction algorithm to handle tile widths greater than 1.
Tiles of width 1 result in poor memory bandwidth utilization for 16-bit inputs.

PiperOrigin-RevId: 205033124
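
For illustration only, here is a minimal CPU-side sketch of a column reduction parameterized by tile width. It is not the XLA:GPU implementation: `ColumnReduceTiled` and its parameters are hypothetical names, and the loop over tiles merely stands in for the work a GPU thread would do. The assumption behind the comment in the inner loop is that, with a tile width of 2 and 16-bit elements, a thread can presumably fetch both of its columns with one wider load instead of two narrow ones.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not XLA code: sums each column of a row-major
// rows x cols matrix, processing it in tiles of tile_height x tile_width.
std::vector<float> ColumnReduceTiled(const std::vector<float>& input,
                                     std::size_t rows, std::size_t cols,
                                     std::size_t tile_height,
                                     std::size_t tile_width) {
  assert(tile_height > 0 && tile_width > 0);
  assert(input.size() == rows * cols);
  std::vector<float> output(cols, 0.0f);
  // Each (y0, x0) pair stands in for the work of one GPU thread.
  for (std::size_t y0 = 0; y0 < rows; y0 += tile_height) {
    for (std::size_t x0 = 0; x0 < cols; x0 += tile_width) {
      const std::size_t y_end = std::min(y0 + tile_height, rows);
      const std::size_t x_end = std::min(x0 + tile_width, cols);
      // One partial sum per column of the tile.
      std::vector<float> partial(x_end - x0, 0.0f);
      for (std::size_t y = y0; y < y_end; ++y) {
        // Assumption: on a GPU, adjacent 16-bit elements read here could
        // be fetched together when tile_width > 1.
        for (std::size_t x = x0; x < x_end; ++x) {
          partial[x - x0] += input[y * cols + x];
        }
      }
      // On a GPU this accumulation would typically be an atomic add.
      for (std::size_t x = x0; x < x_end; ++x) {
        output[x] += partial[x - x0];
      }
    }
  }
  return output;
}
```

Calling the sketch with `tile_width` set to 1 and then to 2 on the same input should give identical column sums; only the shape of each tile's memory accesses changes.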