Fixing MultiDeviceIterator memory leak issue in eager mode.
In Graph mode, we rely on the destruction of MultiDeviceIteratorHandleOp to decrement the ref count of the resource. Since kernels are not destroyed in Eager mode, we had explicitly added a destroy_resource_op to compensate. That turns out not to be enough: the ResourceMgr.LookupOrCreate method ends up increasing the ref count of the resource by 2, and in Graph mode the destructor was effectively doing two Unrefs. So even with the destroy_resource_op, the ref count in Eager mode stayed at 1 and never went down to zero.

The fix is to handle the Eager mode case separately, similar to what we've done with the AnonymousIteratorHandleOp. Instead of creating a whole new kernel, we reuse the existing kernel and use a special shared_name argument to identify when to switch the behavior. Now in Eager mode, after running the HandleOp kernel, the ref count of the resource is 1, so the destroy_resource_op can bring it down to zero.

PiperOrigin-RevId: 231333966
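
The ref-count arithmetic described above can be sketched with a toy model. This is not TensorFlow code: ToyResource, ToyResourceMgr, and both scenario functions are hypothetical names, and the model assumes that the manager keeps one reference while lookup_or_create hands a second one to the caller. How the real kernel gets the count down to 1 is not spelled out in this message; dropping the caller's reference is just one way to illustrate it.

```python
class ToyResource:
    """Hypothetical stand-in for a ref-counted resource (not a TensorFlow class)."""

    def __init__(self):
        self.ref_count = 1   # the creator holds the initial reference
        self.freed = False

    def ref(self):
        self.ref_count += 1

    def unref(self):
        assert self.ref_count > 0, "unref on an already freed resource"
        self.ref_count -= 1
        if self.ref_count == 0:
            self.freed = True   # a real resource would be deleted here


class ToyResourceMgr:
    """Hypothetical manager; assumed to own one reference per stored resource."""

    def __init__(self):
        self._resources = {}

    def lookup_or_create(self, name):
        resource = self._resources.get(name)
        if resource is None:
            resource = ToyResource()   # initial reference is kept by the manager
            self._resources[name] = resource
        resource.ref()                 # extra reference handed to the caller
        return resource

    def destroy(self, name):
        # Models a destroy-resource op: only the manager's reference is dropped.
        self._resources.pop(name).unref()


def eager_before_fix():
    mgr = ToyResourceMgr()
    resource = mgr.lookup_or_create("iterator")   # ref count is now 2
    mgr.destroy("iterator")                       # drops to 1, never freed
    return resource


def eager_after_fix():
    mgr = ToyResourceMgr()
    resource = mgr.lookup_or_create("iterator")   # ref count is now 2
    resource.unref()                              # handle op gives up its own reference: 1
    mgr.destroy("iterator")                       # drops to 0, resource is freed
    return resource
```

In this model, eager_before_fix() returns a resource that still has one outstanding reference and is never freed, which is the leak; eager_after_fix() returns one whose count has reached zero.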