Fix data race in tensorflow/core/kernels/map_stage_op.cc, and use tf...
Fix data race in tensorflow/core/kernels/map_stage_op.cc, and use tf locking/formatting conventions. - Fix data race in which a routine mysteriously releases a lock which was held when it was called, and which the caller expects will still be held afterwards.(!) See the changes to the routines notify_inserters_if_bounded() and notify_removers(). The data race was in put() on current_bytes_ near the end, after the call to put_complete(). put_complete() calls notify_removers(), which released the lock, thus causing the race. Any reader of the code would have expected that the lock would have been held at that point, but it wasn't. The same pattern in notify_inserters_if_bounded() didn't cause a race in the current code, but is a pitfall for future maintainers, who would not mornally expect something called notify_inserters_if_bounded() to mean "and in some circumstances, unlock". Even in the absense of this bug, and the weird, uncommented spec for those routines, it is almost always a bad idea to release the lock before calling a condition-variable wakeup. First, it complicates the code, and makes it hard to reason about object deletion if the deletion is predicated by one of the signalled conditions. Second, it's an _unnecessary_ complication: a well-written condition variable implementation will requeue the woken thread(s) on the mutex queue, so the expensive part of the wakeup will be deferred until the waking thread unlocks anyway. - Avoid the C++11 unique_lock mechanism that allowed the bug to happen, and instead use tensorflow::mutex, tensorflow::mutex_lock, and tensorflow::condition_variable. - Use while-loops instead of lambdas for condition-variable waits, because while-loops are much easier to read. - Fix the routines would_exceed_memory_limit() and is_capacity_full() to include their own validity check, so it's not needed at each of their call sites. This improves the readability at the call sites. - Use annotalysis locking annotations so that other violations can be found statically by the compiler. - Reformat with clang-format. PiperOrigin-RevId: 175913085
Loading
Please sign in to comment