[XLA] Initialize arrays using cudaMemset when possible.
Previously we were using our own hand-rolled initializer thunk. This worked OK for reduces, because the amount of data we initialize there is usually small. But for e.g. select-and-scatter, where the array being initialized can be large, it's quite slow. This patch lets us use cudaMemset instead when possible.

PiperOrigin-RevId: 189904720
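
Note on "when possible": cudaMemset fills a buffer with a single repeated byte, so it can only stand in for an initializer whose value has the same byte in every position (0 is the common case). The check below is a minimal host-side sketch of that condition, not the code in this patch; the helper name IsMemsetCompatible is hypothetical.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper, for illustration only (not part of XLA).
// Returns true if every byte of `value` is identical, in which case a
// byte-wise fill such as cudaMemset(ptr, *byte_out, num_bytes) would
// produce an array of `value`. Otherwise returns false and leaves
// *byte_out untouched; a different initializer is needed.
template <typename T>
bool IsMemsetCompatible(const T& value, unsigned char* byte_out) {
  unsigned char bytes[sizeof(T)];
  std::memcpy(bytes, &value, sizeof(T));  // Inspect the object representation.
  for (std::size_t i = 1; i < sizeof(T); ++i) {
    if (bytes[i] != bytes[0]) return false;
  }
  *byte_out = bytes[0];
  return true;
}
```

Under this check, 0.0f and int32 -1 qualify (all bytes 0x00 and 0xFF respectively), while 1.0f does not, since its bytes differ.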