Commit 9c95751e authored by Chris Leary, committed by TensorFlower Gardener

[XLA:GPU] Add NCCL-based AllReduce replica support to XLA.

This requires a CUDA-config build to enable, as the NCCL library can
only be built in a CUDA-enabled build. In non-CUDA-config builds the NCCL
thunk returns an error.
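
The build-dependent behavior described above can be pictured with a minimal sketch. The macro `SKETCH_HAS_NCCL`, the `SketchStatus` type, and the function `RunAllReduceSketch` are hypothetical names invented for illustration; this commit message does not name the real build flag or thunk interface, and the sketch does not call into NCCL.

```cpp
#include <string>

// Hypothetical status type standing in for whatever error type the real
// thunk returns.
struct SketchStatus {
  bool ok;
  std::string message;
};

// Hypothetical entry point. SKETCH_HAS_NCCL is an assumed macro that a
// CUDA-config build would define; it is not the real flag name.
SketchStatus RunAllReduceSketch() {
#if defined(SKETCH_HAS_NCCL)
  // A CUDA-enabled build would enqueue the NCCL all-reduce here.
  return {true, ""};
#else
  // Without CUDA the NCCL library is unavailable, so report an error
  // instead of failing at link time.
  return {false, "NCCL all-reduce requires a CUDA-config build"};
#endif
}
```

Compiling the stub unconditionally keeps callers identical across build configurations; only the returned status differs.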

This uses a very conservative, quite likely overkill, concurrency
approach. In a followup CL it would be better to optimize for the common
case, where we're enqueueing a lot of operations with the same replica count
onto a stream in a non-synchronizing fashion, and only force thread
synchronization if the number of replicas changes.
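
The followup optimization suggested above might look roughly like the sketch below. It shows only the decision logic: `ReplicaCountGate` and its members are hypothetical names, not part of this change, and the slow path is reduced to a counter where a real implementation would rendezvous the participating threads and rebuild its communicators.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical helper: remembers the replica count used by the previous
// operation and reports whether a full synchronization is needed.
class ReplicaCountGate {
 public:
  // Returns true when the caller must take the slow, synchronizing path
  // because the replica count differs from the cached one.
  bool Acquire(int num_replicas) {
    assert(num_replicas > 0);
    std::lock_guard<std::mutex> lock(mu_);
    if (num_replicas == cached_replicas_) {
      return false;  // Common case: same replica count, no resync.
    }
    // Replica count changed: a real implementation would synchronize all
    // participating threads and rebuild communicators here.
    cached_replicas_ = num_replicas;
    ++rebuilds_;
    return true;
  }

  // Number of times the slow path was taken.
  int rebuilds() const {
    std::lock_guard<std::mutex> lock(mu_);
    return rebuilds_;
  }

 private:
  mutable std::mutex mu_;
  int cached_replicas_ = 0;  // 0 means nothing has been cached yet.
  int rebuilds_ = 0;
};
```

The short mutex hold here only protects the cached count; it is much cheaper than a cross-thread rendezvous, which is the cost the commit message proposes to avoid in the common case.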

In the future this should likely be unified with NcclManager in
tensorflow/core/nccl. For now it is separate, since the EventMgr-style
memory allocation strategy from TensorFlow is not used in XLA, so some
parameterization of the memory strategy used in that library is likely
necessary. At that point it should be reasonable to scoop out this
~200 line implementation in the cc file and replace it with the
NcclManager abstraction, unifying the two implementations.

PiperOrigin-RevId: 235632126
parent 5bce34fe