Commit 517112e7 authored Feb 25, 2019 by Justin Lebar Committed by TensorFlower Gardener Feb 25, 2019

[XLA:GPU] Enhance for-loop analysis.

XLA:GPU's "for-loop" analysis looks at `while` loops and tries to determine
whether they run a constant number of times.  If so, the loop is emitted as a
"ForThunk", which can be run more efficiently on the GPU than the more general
WhileThunk.

A problem with our for-loop analysis is that it runs very late in the pass
pipeline, after fusion, layout-assignment, copy-insertion, etc.  At this point
it's challenging to pattern-match the HLO to figure out whether or not a loop
is a bona fide `for` loop.  We can try to brute-force the loop count, but that
only works for relatively small loops.
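
As a rough illustration of the brute-force approach (a sketch only, not XLA's
code): the function name, the callables standing in for the loop's condition
and body computations, and the `max_iterations` cap are all assumptions made
for this example.

```python
from typing import Callable, Optional


def brute_force_trip_count(
    init: int,
    cond: Callable[[int], bool],
    body: Callable[[int], int],
    max_iterations: int = 128,  # hypothetical cap, chosen for illustration
) -> Optional[int]:
    """Simulate the loop on its induction variable and count iterations.

    Returns the trip count if the loop exits within `max_iterations`
    iterations, otherwise None (the analysis gives up).
    """
    value = init
    for trip_count in range(max_iterations + 1):
        if not cond(value):
            # The condition failed before iteration `trip_count` would run.
            return trip_count
        value = body(value)
    return None
```

The cost is linear in the trip count, which is why a cap of this kind is
needed and why the approach only scales to small loops.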

This patch makes two changes:

 - Moves `for` loop matching to a separate pass, run before layout assignment
   or fusion.  The loop count is stored in a backend-config on the while
   instruction.

 - Adds pattern-matching machinery for loop counts.
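
To make the second point concrete, here is a minimal sketch of what matching a
loop count can amount to once the loop has been recognized as
`for (i = init; i < limit; i += step)`.  It is not the code in this patch: the
function name and parameters are hypothetical, and the real pass works on HLO
instructions rather than plain integers.

```python
from typing import Optional


def match_trip_count(
    init: int, step: int, limit: int, inclusive: bool = False
) -> Optional[int]:
    """Closed-form trip count for `for (i = init; i < limit; i += step)`.

    With inclusive=True the condition is `i <= limit` instead.
    Returns None when the pattern is not handled by this sketch.
    """
    if step <= 0:
        # Non-positive steps are outside the pattern matched here.
        return None
    bound = limit + 1 if inclusive else limit
    if init >= bound:
        # The condition is false on entry, so the body never runs.
        return 0
    # Ceiling division: number of i in init, init+step, ... below bound.
    return (bound - init + step - 1) // step
```

Unlike simulating the loop, this takes constant time regardless of how many
iterations the loop runs, so large loop counts are no obstacle.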

PiperOrigin-RevId: 235610452

parent db7cb475
