Ensure XlaDevice discards failed streams from its DeviceContext.

Streams have a monotonic state machine; if a stream encounters any error, it will remain in an error state forever. Without this change, a previously failed stream would poison the DeviceContext, and cause subsequent operations to fail.

This is slightly tricky to fix; when a stream fails, there isn't a convenient way to signal to the XlaDevice that the DeviceContext should be updated. The fix is to introduce a method XlaDevice::EnsureDeviceContextOk, which ensures all streams in the context are valid. This is called at natural points in the execution, e.g. FillContextMap, which occurs at the start of every session.run.

Locking has also been added to XlaDevice, since multiple threads may attempt to access the device concurrently, e.g. via concurrent session.run calls.

PiperOrigin-RevId: 207596952
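
The pattern described above can be illustrated with a minimal, self-contained sketch. This is not the XlaDevice code: FakeStream, FakeDeviceContext, FakeDevice and contexts_created are hypothetical stand-ins invented for illustration, and only the method name EnsureDeviceContextOk is taken from the message. It assumes a context owning a single stream, and that replacing the whole context is an acceptable way to discard a failed stream.

```cpp
#include <memory>
#include <mutex>

// Hypothetical stand-in for a stream with a monotonic error state:
// once Fail() has been called, ok() returns false forever.
class FakeStream {
 public:
  bool ok() const { return ok_; }
  void Fail() { ok_ = false; }

 private:
  bool ok_ = true;
};

// Hypothetical stand-in for a device context that owns one stream.
struct FakeDeviceContext {
  std::shared_ptr<FakeStream> stream = std::make_shared<FakeStream>();
};

// Hypothetical stand-in for the device. All access to the cached
// context goes through a mutex, so concurrent callers are safe.
class FakeDevice {
 public:
  // Returns a context whose stream is healthy. If no context exists
  // yet, or the cached one holds a failed stream, a fresh context is
  // created and the old one is dropped.
  std::shared_ptr<FakeDeviceContext> EnsureDeviceContextOk() {
    std::lock_guard<std::mutex> lock(mu_);
    if (context_ == nullptr || !context_->stream->ok()) {
      context_ = std::make_shared<FakeDeviceContext>();
      ++contexts_created_;
    }
    return context_;
  }

  // Number of contexts built so far (illustration only).
  int contexts_created() const {
    std::lock_guard<std::mutex> lock(mu_);
    return contexts_created_;
  }

 private:
  mutable std::mutex mu_;
  std::shared_ptr<FakeDeviceContext> context_;
  int contexts_created_ = 0;
};
```

In this sketch the check is cheap when the stream is healthy, so it can run at the start of every unit of work, in the same spirit as calling it from FillContextMap. Callers that still hold the old context keep it alive through the shared pointer, while new callers only ever see the replacement.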