[XLA:GPU] Convert the reduction implementation to use a tiling scheme.
Convert the implementations of scalar reduction, row reduction and column reduction to use EmitTiledKernel, a more general kernel tiling implementation driven by the information in a TilingScheme object.

For scalar reduction and row reduction, the new implementation should generate the same optimized code as the old implementation.

For column reduction, the old implementation in IrEmitterUnnested::EmitColumnReduction uses kTileWidth=2, so that one thread computes the partial results for two elements in the output of each kReduce instruction. In this regard, the new implementation is equivalent to the old implementation with kTileWidth=1, i.e. one thread computes the partial result for one output element.

PiperOrigin-RevId: 222752674
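
To illustrate what the tile width means for column reduction, here is a minimal CPU-side sketch. It is not the XLA emitter code: ColumnReduceSum, its parameters, and the simulated "thread" loop are hypothetical names chosen for illustration, and the reduction is assumed to be a plain sum over a row-major matrix. Each simulated thread owns tile_width adjacent output columns and keeps one partial result per column, so tile_width=2 mirrors the old behavior described above and tile_width=1 mirrors the new one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not an XLA API: sums each column of a row-major
// rows x cols matrix. Each simulated "thread" owns tile_width adjacent
// output columns and accumulates one partial result per owned column.
std::vector<float> ColumnReduceSum(const std::vector<float>& input,
                                   std::size_t rows, std::size_t cols,
                                   std::size_t tile_width) {
  assert(tile_width > 0);
  assert(input.size() == rows * cols);
  std::vector<float> output(cols, 0.0f);
  // Number of simulated threads needed to cover every output column.
  const std::size_t num_threads = (cols + tile_width - 1) / tile_width;
  for (std::size_t thread = 0; thread < num_threads; ++thread) {
    const std::size_t first_col = thread * tile_width;
    // One partial result per output element owned by this thread.
    std::vector<float> partial(tile_width, 0.0f);
    for (std::size_t row = 0; row < rows; ++row) {
      for (std::size_t t = 0; t < tile_width; ++t) {
        const std::size_t col = first_col + t;
        // Skip columns past the edge when cols is not a multiple of tile_width.
        if (col < cols) partial[t] += input[row * cols + col];
      }
    }
    // Write the finished partial results to the output elements.
    for (std::size_t t = 0; t < tile_width; ++t) {
      const std::size_t col = first_col + t;
      if (col < cols) output[col] = partial[t];
    }
  }
  return output;
}
```

In this sketch the tile width only changes how output columns are divided among threads, not the reduced values: calling ColumnReduceSum on the same input with tile_width 1 or 2 yields the same column sums.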