Use the host implementation of the vec permute op if the input is on the host.

Note that the op still needs to be placed on the GPU so that it stays within the same partition as the neighboring ops; as a result, no unnecessary send and recv ops are created.

PiperOrigin-RevId: 193457328