Commit 3908ccbc authored by Peter Hawkins, committed by TensorFlower Gardener

[TF:XLA] Enable multiple streams for the XLA_GPU device, i.e., concurrent computations and transfers.

This device is used only for unit tests.

While the original intent of this change was to get more test coverage of the multiple-stream path by enabling it on XLA_GPU, it turns out that it also fixes an XLA_GPU-specific bug introduced by
https://github.com/tensorflow/tensorflow/commit/c4705c30d577138017069a2c897e8c9d66eb49bc
in which XlaDeviceContext::CopyCPUTensorToDevice() calls done() from a CUDA host callback.
CUDA host callbacks are forbidden from calling CUDA driver methods, but done() can deallocate tensors, which ends up calling cuFree(). In multi-stream mode, we call done() eagerly, not on a callback, so the problem doesn't arise.

PiperOrigin-RevId: 220342021
parent ff0ef344