[XLA:GPU] Refactor the 0-2-1 tiling implementation to support more use cases.
The current implementation of 0-2-1 tiling handles only two cases: a single kCopy instruction and a kLoop fusion instruction. This change adds class TilingScheme to support more general tiling methods that consist of tiles and blocks of tiles. It also restructures the existing tiling implementation to generate the kernel from a TilingScheme object, a few function objects that provide hooks for HLO-specific code generation, and other parameters. This change prepares for our next step, extending tiling to reduction.

PiperOrigin-RevId: 221393383
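
For illustration only, a minimal host-side sketch of the idea described above: a tiling description drives a generic traversal, and a function object supplies the per-element work. This is not the code in this change. SimpleTilingScheme, ElementHook, ForEachTiledElement, and Transpose021 are hypothetical names, the real TilingScheme emits GPU kernel IR rather than running loops on the CPU, and blocks of tiles are omitted for brevity.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a tiling scheme: a [batch, height, width] shape
// whose two minor dimensions are split into square tiles.
struct SimpleTilingScheme {
  int64_t batch;
  int64_t height;
  int64_t width;
  int64_t tile_size;

  // Number of tiles along each minor dimension, rounding up for partial tiles.
  int64_t TilesInY() const { return (height + tile_size - 1) / tile_size; }
  int64_t TilesInX() const { return (width + tile_size - 1) / tile_size; }
};

// Hypothetical hook type: called once per element with (batch, y, x).
using ElementHook = std::function<void(int64_t, int64_t, int64_t)>;

// Visits every tile, then every in-bounds element of that tile.
void ForEachTiledElement(const SimpleTilingScheme& s, const ElementHook& hook) {
  for (int64_t b = 0; b < s.batch; ++b) {
    for (int64_t ty = 0; ty < s.TilesInY(); ++ty) {
      for (int64_t tx = 0; tx < s.TilesInX(); ++tx) {
        // Clamp the tile bounds so partial tiles stay inside the shape.
        const int64_t y_end = std::min(s.height, (ty + 1) * s.tile_size);
        const int64_t x_end = std::min(s.width, (tx + 1) * s.tile_size);
        for (int64_t y = ty * s.tile_size; y < y_end; ++y) {
          for (int64_t x = tx * s.tile_size; x < x_end; ++x) {
            hook(b, y, x);
          }
        }
      }
    }
  }
}

// 0-2-1 transpose: [batch, height, width] -> [batch, width, height].
// The lambda plays the role of the operation-specific hook.
std::vector<float> Transpose021(const std::vector<float>& in,
                                const SimpleTilingScheme& s) {
  std::vector<float> out(in.size());
  ForEachTiledElement(s, [&](int64_t b, int64_t y, int64_t x) {
    out[(b * s.width + x) * s.height + y] = in[(b * s.height + y) * s.width + x];
  });
  return out;
}
```

In this sketch, supporting a different operation would mean passing a different hook while the traversal stays unchanged, which is the kind of separation the refactoring aims for.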