[XLA:GPU] Implement trivial (one-replica) cross-replica-sum on XLA:GPU.

Also fix the CPU implementation to handle the case where the cross-replica-sum op has multiple operands.

PiperOrigin-RevId: 197506311