[tf.data] Add `IteratorContext::allocator()`.
This enables the various iterator implementations to use the actual allocator for the device on which they are running, rather than defaulting to `cpu_allocator()` (which is typically a plain `malloc()`). In the future, this will enable allocating iterator outputs in CUDA-pinned memory (and GPU memory).

PERFORMANCE NOTE: In sessions where `ConfigProto.gpu_options.force_gpu_compatible == True`, this change has the effect of allocating all input pipeline tensors in CUDA-pinned memory. Previously, if this flag was set, only the tensors allocated during function execution would be allocated in this space, and other tensors (e.g. the result of a `Dataset.batch()`) would be allocated using `cpu_allocator()` (i.e. `malloc()`). This change should lead to more efficient communication between a host-side input pipeline and GPUs, but it may also create more pressure on the CUDA host allocator (whose default maximum size is 64GB). The "TF_CUDA_HOST_MEM_LIMIT_IN_MB" environment variable can be used to override this value.

This change is a starting point for working on issue #13610.

PiperOrigin-RevId: 183881907