Commit 941165e0 authored by Todd Wang, committed by TensorFlower Gardener

Ensure XlaDevice discards failed streams from its DeviceContext.

Streams have a monotonic state machine; if a stream encounters any
error, it will remain in an error state forever. Without this change,
a previously failed stream would poison the DeviceContext, and cause
subsequent operations to fail. This is slightly tricky to fix; when a
stream fails, there isn't a convenient way to signal to the XlaDevice
that the DeviceContext should be updated.

The fix is to introduce a method XlaDevice::EnsureDeviceContextOk,
which checks that all streams in the context are valid and discards
any that have failed. It is called at natural points in the
execution, e.g. FillContextMap, which occurs at the start of every
session.run.
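
A minimal sketch of the check-and-replace idea, not the code in this
change. FakeStream, FakeDeviceContext, FakeDevice and EnsureContextOk are
hypothetical stand-ins invented for illustration; the real XlaDevice and
stream types have different interfaces.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for a device stream with a sticky error state.
class FakeStream {
 public:
  bool ok() const { return ok_; }
  // Monotonic state machine: once failed, the stream never recovers.
  void MarkFailed() { ok_ = false; }

 private:
  bool ok_ = true;
};

// Hypothetical stand-in for a device context that owns a stream.
struct FakeDeviceContext {
  explicit FakeDeviceContext(std::shared_ptr<FakeStream> s)
      : stream(std::move(s)) {}
  std::shared_ptr<FakeStream> stream;
};

// Hypothetical device that lazily rebuilds its context.
class FakeDevice {
 public:
  // Returns a context whose stream is usable. If there is no context yet,
  // or the cached stream has failed, the old context is discarded and a
  // fresh one is created.
  std::shared_ptr<FakeDeviceContext> EnsureContextOk() {
    if (context_ == nullptr || !context_->stream->ok()) {
      context_ = std::make_shared<FakeDeviceContext>(
          std::make_shared<FakeStream>());
      ++rebuild_count_;
    }
    return context_;
  }

  int rebuild_count() const { return rebuild_count_; }

 private:
  std::shared_ptr<FakeDeviceContext> context_;
  int rebuild_count_ = 0;
};
```

In this sketch the device never needs to be notified when a stream fails.
The failure is noticed the next time a caller asks for the context, which
plays the role that FillContextMap plays above.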

Locking has also been added to XlaDevice, since multiple threads may
attempt to access the device concurrently, e.g. via concurrent
session.run calls.
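
The need for locking can be sketched as follows, assuming a single mutex
guards the cached stream. SketchStream, LockedDevice, GetUsableStream and
RunConcurrentCallers are hypothetical names for illustration only; the
actual locking in XlaDevice may be structured differently.

```cpp
#include <atomic>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stream whose health flag can be read from any thread.
struct SketchStream {
  std::atomic<bool> ok{true};
};

// Hypothetical device whose cached stream is guarded by a mutex.
class LockedDevice {
 public:
  // Checks and, if needed, replaces the cached stream while holding the
  // lock, so concurrent callers cannot both replace the same failed stream.
  std::shared_ptr<SketchStream> GetUsableStream() {
    std::lock_guard<std::mutex> lock(mu_);
    if (stream_ == nullptr || !stream_->ok.load()) {
      stream_ = std::make_shared<SketchStream>();
      ++replacements_;
    }
    return stream_;
  }

  int replacements() {
    std::lock_guard<std::mutex> lock(mu_);
    return replacements_;
  }

 private:
  std::mutex mu_;
  std::shared_ptr<SketchStream> stream_;  // Guarded by mu_.
  int replacements_ = 0;                  // Guarded by mu_.
};

// Simulates concurrent session.run-style callers and returns how many of
// them obtained a usable stream.
int RunConcurrentCallers(LockedDevice& device, int num_threads) {
  std::vector<std::thread> threads;
  std::atomic<int> usable{0};
  for (int i = 0; i < num_threads; ++i) {
    threads.emplace_back([&device, &usable] {
      if (device.GetUsableStream()->ok.load()) ++usable;
    });
  }
  for (auto& t : threads) t.join();
  return usable.load();
}
```

Without the lock, two threads could observe the failed stream at the same
time and race while replacing it.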

PiperOrigin-RevId: 207596952
parent e7a6afe8