Avoid device-to-host copies by "partially declustering" certain nodes.
"Partial declustering" means cloning a clustered node outside its cluster and transferring some of its outgoing edges to the clone.

Some TensorFlow operations expect their inputs in host memory. Because XLA only produces device tensors, such a node can incur a device-to-host copy if it is not clustered along with its producer. A TensorFlow operation, on the other hand, may produce its outputs in host memory. Cloning the producer outside the cluster and rewiring the consumers that expect host memory to read from the clone therefore avoids the memcpy.

PiperOrigin-RevId: 208710603
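
The rewrite described above can be illustrated on a toy graph. This is a minimal sketch and not the actual pass: the `Node` and `Graph` types, the `host_output` and `host_input` flags, and the "/declustered" name suffix are all hypothetical, and the real implementation works on TensorFlow graph structures rather than on name tuples.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    cluster: Optional[str] = None  # None means "not compiled by XLA"
    host_output: bool = False      # as a plain TF op, output lands in host memory
    host_input: bool = False       # expects its inputs in host memory


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # name -> Node
    edges: list = field(default_factory=list)  # (src_name, dst_name) pairs

    def add(self, node, inputs=()):
        self.nodes[node.name] = node
        self.edges.extend((src, node.name) for src in inputs)


def partially_decluster(graph):
    """Clone host-output producers out of their cluster where that avoids a copy.

    Returns the names of the clones that were created.
    """
    clones = []
    for node in list(graph.nodes.values()):  # snapshot: clones are not revisited
        if node.cluster is None or not node.host_output:
            continue
        # Edges to consumers outside this cluster that want host memory.
        # Left alone, each of these would need a device-to-host copy.
        to_move = [
            (src, dst) for (src, dst) in graph.edges
            if src == node.name
            and graph.nodes[dst].cluster != node.cluster
            and graph.nodes[dst].host_input
        ]
        if not to_move:
            continue
        clone = Node(node.name + "/declustered", None,
                     node.host_output, node.host_input)
        graph.nodes[clone.name] = clone
        # The clone reads the same inputs as the original node.
        incoming = [(s, d) for (s, d) in graph.edges if d == node.name]
        graph.edges.extend((s, clone.name) for (s, _) in incoming)
        # Transfer only the selected outgoing edges to the clone.
        graph.edges = [
            (clone.name, d) if (s, d) in to_move else (s, d)
            for (s, d) in graph.edges
        ]
        clones.append(clone.name)
    return clones


# Usage: "shape" is clustered, but "reshape" lives outside the cluster
# and wants its input in host memory.
g = Graph()
g.add(Node("x", cluster="c0"))
g.add(Node("shape", cluster="c0", host_output=True), inputs=["x"])
g.add(Node("mul", cluster="c0"), inputs=["shape"])
g.add(Node("reshape", host_input=True), inputs=["shape"])
print(partially_decluster(g))
print(g.edges)
```

In this sketch the original node stays in its cluster and keeps its in-cluster consumers, which is why the declustering is only "partial": the clone exists solely to feed the consumers that expect host memory.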