[XLA:GPU] Selectively exclude 0-2-1 transposed input parameters from the shared memory transpose implementation.

Previously, we either used the shared memory transpose implementation for all 0-2-1 transposed input parameters in a fusion, or, if the kernel contained instructions such as kReverse and kGather that can make the shared memory transpose implementation unsafe, we used it for none of them. This approach has two problems. First, the set of instructions that can make the shared memory transpose implementation unsafe is larger than these two. Second, even when a fusion contains such instructions, it may still be safe to use the shared memory transpose for some, if not all, of the 0-2-1 transposed input parameters that are not involved in computing those instructions.

This change adds an analysis that inspects the transitive users of each 0-2-1 transposed input parameter to decide whether the shared memory transpose implementation is safe for that parameter.

Add two test cases.

PiperOrigin-RevId: 228084822
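The per-parameter analysis can be illustrated with a simplified, language-agnostic sketch. It walks the transitive users of one parameter breadth-first and rejects the parameter if any reachable user has an opcode from a caller-supplied "unsafe" set. The `Instr` class, `add_edge`, `is_safe_for_shared_memory_transpose`, and `select_shared_memory_params` are hypothetical names, not XLA APIs. The default opcode set contains only the two instructions named above, and the message notes that the real set is larger. This is not the actual XLA implementation.

```python
from collections import deque
from dataclasses import dataclass, field

# Assumption: only the two opcodes named in the commit message. The real
# set of unsafe instructions is larger and is not listed here.
DEFAULT_UNSAFE_OPCODES = frozenset({"reverse", "gather"})


@dataclass(eq=False)  # eq=False keeps identity-based hashing for the visited set
class Instr:
    """Hypothetical stand-in for an HLO instruction in a fusion."""
    name: str
    opcode: str
    users: list = field(default_factory=list)


def add_edge(producer: Instr, consumer: Instr) -> None:
    """Record that `consumer` uses the result of `producer`."""
    producer.users.append(consumer)


def is_safe_for_shared_memory_transpose(param: Instr,
                                        unsafe_opcodes=DEFAULT_UNSAFE_OPCODES) -> bool:
    """Return True if no transitive user of `param` has an unsafe opcode."""
    visited = set()
    queue = deque(param.users)
    while queue:
        instr = queue.popleft()
        if instr in visited:
            continue  # Users can be shared (diamonds); visit each only once.
        visited.add(instr)
        if instr.opcode in unsafe_opcodes:
            return False
        queue.extend(instr.users)
    return True


def select_shared_memory_params(params, unsafe_opcodes=DEFAULT_UNSAFE_OPCODES):
    """Names of the 0-2-1 transposed parameters that may use shared memory."""
    return [p.name for p in params
            if is_safe_for_shared_memory_transpose(p, unsafe_opcodes)]


if __name__ == "__main__":
    # p0 feeds only an elementwise add; p1 reaches the same add through a reverse.
    p0, p1 = Instr("p0", "parameter"), Instr("p1", "parameter")
    rev, add = Instr("rev", "reverse"), Instr("add", "add")
    add_edge(p0, add)
    add_edge(p1, rev)
    add_edge(rev, add)
    print(select_shared_memory_params([p0, p1]))  # ['p0']
```

In this example the fusion contains a kReverse, yet p0 can still use the shared memory transpose because the reverse is not among its transitive users. Under the old all-or-nothing rule, neither parameter would have used it.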