[XLA:GPU] Enhance for-loop analysis.
XLA:GPU's "for-loop" analysis looks at `while` loops and tries to determine whether they run a constant number of times. If so, the loop is emitted as a "ForThunk", which can be run more efficiently on the GPU than the more general WhileThunk.

A problem with our for-loop analysis is that it runs very late in the pass pipeline, after fusion, layout-assignment, copy-insertion, etc. At this point it's challenging to pattern-match the HLO to figure out whether or not a loop is a bona fide `for` loop. We can try to brute-force the loop count, but that only works for relatively small loops.

This patch makes two changes:

- Moves `for` loop matching to a separate pass, run before layout assignment or fusion. The loop count is stored in a backend-config on the while instruction.
- Adds pattern-matching machinery for loop counts.

PiperOrigin-RevId: 235610452
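
To illustrate why pattern-matching a loop count scales better than brute-forcing it, here is a minimal sketch on a toy loop model. It is not XLA's implementation: `CountedLoop`, `trip_count_by_pattern`, `trip_count_by_brute_force`, and the `max_iterations` cap of 128 are all hypothetical names and values chosen for illustration, and the only loop shape recognized is `for (i = init; i < limit; i += step)` with a positive step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CountedLoop:
    """Hypothetical toy model of `for (i = init; i < limit; i += step)`."""
    init: int
    limit: int
    step: int


def trip_count_by_pattern(loop: CountedLoop) -> Optional[int]:
    """Derive the trip count in closed form from the loop's shape."""
    if loop.step <= 0:
        return None  # Not a shape this sketch recognizes.
    if loop.init >= loop.limit:
        return 0
    # Ceiling division: number of steps needed to reach or pass the limit.
    return (loop.limit - loop.init + loop.step - 1) // loop.step


def trip_count_by_brute_force(loop: CountedLoop,
                              max_iterations: int = 128) -> Optional[int]:
    """Simulate the loop; give up once the iteration cap is exceeded."""
    i = loop.init
    count = 0
    while i < loop.limit:
        if count == max_iterations:
            return None  # Too many iterations to evaluate by simulation.
        i += loop.step
        count += 1
    return count


small = CountedLoop(init=0, limit=10, step=3)
large = CountedLoop(init=0, limit=1_000_000, step=1)

print(trip_count_by_pattern(small), trip_count_by_brute_force(small))
print(trip_count_by_pattern(large), trip_count_by_brute_force(large))
```

In this sketch both approaches agree on the small loop, while only the closed-form match yields a count for the large one, since simulation stops at its iteration cap.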