Commit 38c91321 authored by Bixia Zheng, committed by TensorFlower Gardener

[XLA:GPU] Selectively exclude 0-2-1 transposed input parameters from the shared memory transpose implementation.

Previously, we either used the shared memory transpose implementation for all
the 0-2-1 transposed input parameters in a fusion, or used it for none of them
if the kernel contained instructions such as kReverse and kGather, which can
make the shared memory transpose implementation unsafe. There are two problems
with this approach. First, the set of instructions that can make the shared
memory transpose implementation unsafe is larger than the two instructions
mentioned above. Second, even when a fusion contains such instructions, it may
still be safe to use the shared memory transpose to implement some, if not all,
of the 0-2-1 transposed input parameters that are not involved in the
computation of those instructions. This change adds an analysis that inspects
the transitive users of each 0-2-1 transposed input parameter to decide whether
it is safe to use the shared memory transpose implementation for that
parameter.
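
The per-parameter analysis described above can be illustrated with a minimal
sketch. This is not the XLA code: it models a fusion as a plain dictionary
rather than HLO instructions, and the names UNSAFE_OPCODES, build_users and
is_safe_for_shared_memory_transpose are hypothetical. The unsafe set lists only
the two opcodes named in this message as an assumption; the real set is larger.

```python
from collections import deque

# Assumption for illustration only: the commit says the real set of unsafe
# instructions is larger than kReverse and kGather but does not enumerate it.
UNSAFE_OPCODES = {"reverse", "gather"}


def build_users(instructions):
    """Map each instruction name to the names of its direct users."""
    users = {name: [] for name in instructions}
    for name, (_, operands) in instructions.items():
        for operand in operands:
            users[operand].append(name)
    return users


def is_safe_for_shared_memory_transpose(param, instructions, users):
    """Return False if any transitive user of `param` has an unsafe opcode."""
    seen = {param}
    queue = deque([param])
    while queue:
        current = queue.popleft()
        for user in users[current]:
            if user in seen:
                continue
            if instructions[user][0] in UNSAFE_OPCODES:
                return False
            seen.add(user)
            queue.append(user)
    return True


# Toy fusion: name -> (opcode, operand names).
# p0 feeds a reverse; p1 only reaches elementwise instructions.
fusion = {
    "p0": ("parameter", []),
    "p1": ("parameter", []),
    "rev": ("reverse", ["p0"]),
    "neg": ("negate", ["p1"]),
    "add": ("add", ["rev", "neg"]),
}
users = build_users(fusion)
safe = {
    p: is_safe_for_shared_memory_transpose(p, fusion, users)
    for p in ("p0", "p1")
}
```

In this toy fusion, p0 is excluded because it reaches the reverse, while p1
remains eligible even though the same fusion contains an unsafe instruction.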

Add two test cases.

PiperOrigin-RevId: 228084822
parent ebd5477f