Use optimized ops to handle GPU memory swapping

This avoids the need for 2 extra pairs of _send/_recv nodes, which speeds things up a bit. It also ensures that performance doesn't depend on the recv scheduling built into TF, which isn't always optimal.

PiperOrigin-RevId: 187057831