Commit 40f85aff authored by Benjamin Kramer, committed by TensorFlower Gardener

[XLA:GPU] Add infrastructure for unrolling kernels to improve bandwidth utilization.

We often have simple kernels that do very little actual work per thread;
duplicating that work within each thread can increase the bandwidth used.

This change introduces flags and infrastructure for unrolling kernels. It
doesn't include any cost heuristics and is disabled by default. Based on code
written by Bixia Zheng.
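The idea behind unrolling a simple elementwise kernel can be illustrated with a small host-side simulation. This is a minimal sketch under stated assumptions, not XLA's implementation: the names `launch_dims`, `run_elementwise`, `threads_per_block` and `unroll_factor` are hypothetical, and each "thread" is assumed to process `unroll_factor` consecutive elements. An `unroll_factor` of 1 reproduces the non-unrolled kernel, mirroring "disabled by default".

```python
import math
from typing import Callable, List, Tuple


def launch_dims(num_elements: int, threads_per_block: int,
                unroll_factor: int = 1) -> Tuple[int, int]:
    """Return (num_blocks, num_threads) when each thread handles
    `unroll_factor` elements. Hypothetical helper for illustration."""
    num_threads = math.ceil(num_elements / unroll_factor)
    num_blocks = math.ceil(num_threads / threads_per_block)
    return num_blocks, num_threads


def run_elementwise(op: Callable[[float], float], xs: List[float],
                    threads_per_block: int = 128,
                    unroll_factor: int = 1) -> List[float]:
    """Simulate a 1-D elementwise GPU kernel, one loop iteration per thread."""
    n = len(xs)
    out: List[float] = [0.0] * n
    num_blocks, _ = launch_dims(n, threads_per_block, unroll_factor)
    for block in range(num_blocks):
        for tid in range(threads_per_block):
            thread_index = block * threads_per_block + tid
            # Assumption: each thread owns a contiguous chunk of elements.
            base = thread_index * unroll_factor
            # In a real kernel this inner loop would be fully unrolled at
            # compile time, since unroll_factor is a constant.
            for k in range(unroll_factor):
                i = base + k
                if i < n:  # bounds check for the tail of the array
                    out[i] = op(xs[i])
    return out


xs = [float(i) for i in range(1000)]
print(launch_dims(1000, 128, unroll_factor=1))  # baseline launch
print(launch_dims(1000, 128, unroll_factor=4))  # unrolled launch
assert run_elementwise(lambda x: 2 * x, xs, unroll_factor=4) == [2 * x for x in xs]
```

With 1000 elements and 128 threads per block, the baseline launch needs 8 blocks of threads that each do one element of work, while an unroll factor of 4 needs only 2 blocks whose threads each issue four independent loads and stores. That extra work per thread is what the commit means by duplicating work to raise bandwidth use; choosing when unrolling pays off is the cost heuristic this change deliberately leaves out.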

PiperOrigin-RevId: 192296781
parent d7e4458c