[TF:XLA] Enable multiple streams for the XLA_GPU device, i.e., concurrent computations and transfers.

This device is used only for unit tests. The original intent of this change was to get more test coverage of the multiple-stream path by enabling it on XLA_GPU, but it turns out that it also fixes an XLA_GPU-specific bug introduced by https://github.com/tensorflow/tensorflow/commit/c4705c30d577138017069a2c897e8c9d66eb49bc, where XlaDeviceContext::CopyCPUTensorToDevice() calls done() from a CUDA host callback. CUDA host callbacks are forbidden from calling CUDA driver methods, but done() can deallocate tensors, which ends up calling cuFree(). In multi-stream mode, we call done() eagerly, not on a callback, so the problem doesn't arise.

PiperOrigin-RevId: 220342021
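The failure mode can be modeled with a small simulation. This is a sketch under stated assumptions, not TensorFlow or CUDA code: `MockDriver`, `MockStream`, `copy_done_on_callback`, `copy_done_eagerly` and `run` are all hypothetical names. The mock driver counts how often a "free" happens while a host callback is running. Deferring done() into a host callback triggers that forbidden case. Calling done() eagerly, as the multi-stream path does, avoids it.

```python
from typing import Callable, List


class MockDriver:
    """Hypothetical stand-in for a CUDA-like driver (not a real API)."""

    def __init__(self) -> None:
        self.in_host_callback = False
        self.frees = 0
        self.illegal_calls = 0

    def free(self) -> None:
        # Models a driver call that frees device memory.
        # CUDA forbids driver calls from inside a host callback,
        # so a free issued there is counted as illegal.
        if self.in_host_callback:
            self.illegal_calls += 1
        self.frees += 1


class MockStream:
    """Hypothetical stream whose host callbacks run on synchronize()."""

    def __init__(self, driver: MockDriver) -> None:
        self.driver = driver
        self.callbacks: List[Callable[[], None]] = []

    def add_host_callback(self, cb: Callable[[], None]) -> None:
        self.callbacks.append(cb)

    def synchronize(self) -> None:
        # Run each pending callback with the "inside callback" flag set.
        for cb in self.callbacks:
            self.driver.in_host_callback = True
            try:
                cb()
            finally:
                self.driver.in_host_callback = False
        self.callbacks.clear()


def copy_done_on_callback(stream: MockStream, done: Callable[[], None]) -> None:
    # Buggy pattern: done() is deferred into a host callback.
    stream.add_host_callback(done)


def copy_done_eagerly(stream: MockStream, done: Callable[[], None]) -> None:
    # Multi-stream pattern from the commit message: done() runs eagerly
    # on the calling thread, not from a host callback.
    done()


def run(copy_fn: Callable[[MockStream, Callable[[], None]], None]) -> MockDriver:
    driver = MockDriver()
    stream = MockStream(driver)
    # done() is modeled as dropping the last tensor reference,
    # which frees device memory through the driver.
    copy_fn(stream, driver.free)
    stream.synchronize()
    return driver


print(run(copy_done_on_callback).illegal_calls)  # 1
print(run(copy_done_eagerly).illegal_calls)      # 0
```

Both patterns free the memory exactly once. Only the callback-based one does it from a context where driver calls are not allowed.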