[XLA:GPU] Add infrastructure for unrolling kernels to improve bandwidth utilization.
Simple kernels often do very little actual work; duplicating that work within a kernel (unrolling) can increase the bandwidth it uses.

This change introduces flags and infrastructure for unrolling kernels. It does not include any cost heuristics, and unrolling is disabled by default.

Based on code written by Bixia Zheng.

PiperOrigin-RevId: 192296781
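
To illustrate the unrolling idea described above, here is a minimal CPU-side sketch in C++. It is not the XLA implementation, which emits GPU code through its own IR emitters; the function name `AddOneUnrolled` and the `unroll_factor` parameter are hypothetical and only show how each "thread" might cover several consecutive elements instead of one. It assumes `unroll_factor >= 1`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of XLA: computes y[i] = x[i] + 1 the way an
// unrolled elementwise kernel might, with each simulated "thread" handling
// `unroll_factor` consecutive elements. Assumes unroll_factor >= 1.
std::vector<float> AddOneUnrolled(const std::vector<float>& x,
                                  std::size_t unroll_factor) {
  std::vector<float> y(x.size());
  // The number of threads shrinks by the unroll factor (rounded up).
  const std::size_t num_threads =
      (x.size() + unroll_factor - 1) / unroll_factor;
  for (std::size_t thread = 0; thread < num_threads; ++thread) {
    const std::size_t base = thread * unroll_factor;
    // In a real GPU kernel this inner loop would be expanded at compile time.
    for (std::size_t k = 0; k < unroll_factor; ++k) {
      const std::size_t i = base + k;
      if (i < x.size()) {  // Guard the tail when the size is not a multiple.
        y[i] = x[i] + 1.0f;
      }
    }
  }
  return y;
}
```

With five elements and an unroll factor of 4, the sketch uses two simulated threads rather than five, and the bounds check covers the partial last group. Whether unrolling actually helps for a given kernel is the question the cost heuristics, which are not part of this change, would have to answer.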