Commit 3908ccbc authored Nov 06, 2018 by Peter Hawkins Committed by TensorFlower Gardener Nov 06, 2018

[TF:XLA] Enable multiple streams for the XLA_GPU device, i.e., concurrent computations and transfers.

This device is used only for unit tests.

While the original intent of this change was to get more test coverage of the multiple stream path by enabling it on XLA_GPU, it turns out that it also fixes an XLA_GPU-specific bug introduced by
https://github.com/tensorflow/tensorflow/commit/c4705c30d577138017069a2c897e8c9d66eb49bc
where a CUDA host callback calls done() in XlaDeviceContext::CopyCPUTensorToDevice().
CUDA host callbacks are forbidden from calling CUDA driver methods, but done() can deallocate tensors, which ends up calling cuMemFree(). In multi-stream mode, we call done() eagerly rather than from a callback, so the problem does not arise.
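
To make the difference between the two paths concrete, here is a minimal toy model in plain C++. It is not TensorFlow or CUDA code: FakeStream, FakeDriverFree, CopyWithCallback, CopyEager and CountForbiddenCalls are all hypothetical names invented for this sketch, and it makes no attempt to model how the real multi-stream path orders transfers or keeps tensors alive. It only shows why invoking done() inside a host callback can reach a driver call from a forbidden context, while invoking it eagerly cannot.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Toy model only; every name below is hypothetical, not a TensorFlow/CUDA API.
bool g_in_host_callback = false;   // true while a "host callback" is running
int g_forbidden_driver_calls = 0;  // driver calls made from inside a callback

// Stand-in for a driver call such as freeing device memory. In this model,
// calling it while a host callback is running counts as a violation.
void FakeDriverFree() {
  if (g_in_host_callback) ++g_forbidden_driver_calls;
}

// Stand-in for a stream: host callbacks are queued and run later.
class FakeStream {
 public:
  void AddHostCallback(std::function<void()> cb) {
    pending_.push_back(std::move(cb));
  }

  // Runs every queued callback, marking the callback context while it runs.
  void RunPendingCallbacks() {
    std::vector<std::function<void()>> callbacks;
    callbacks.swap(pending_);
    for (auto& cb : callbacks) {
      g_in_host_callback = true;
      cb();
      g_in_host_callback = false;
    }
  }

 private:
  std::vector<std::function<void()>> pending_;
};

// Single-stream style: done() is deferred to a host callback on the stream.
void CopyWithCallback(FakeStream& stream, std::function<void()> done) {
  stream.AddHostCallback(std::move(done));
}

// Multi-stream style: done() is invoked eagerly, outside any callback.
void CopyEager(FakeStream& stream, const std::function<void()>& done) {
  (void)stream;  // the stream is unused in this simplified model
  done();
}

// Returns how many driver calls happened inside a host callback.
int CountForbiddenCalls(bool multi_stream) {
  g_forbidden_driver_calls = 0;
  FakeStream stream;
  // Assumption for the sketch: done() releases a tensor, which frees memory.
  auto done = [] { FakeDriverFree(); };
  if (multi_stream) {
    CopyEager(stream, done);
  } else {
    CopyWithCallback(stream, done);
  }
  stream.RunPendingCallbacks();
  return g_forbidden_driver_calls;
}
```

In this model, CountForbiddenCalls(false) reports one violation because done() runs inside the callback, and CountForbiddenCalls(true) reports none because done() has already run by the time the stream processes its callbacks.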

PiperOrigin-RevId: 220342021

parent ff0ef344
