Create dataset kernels as we go, i.e. in the __init__ method of the Dataset class.

This removes the _as_variant_tensor() method from the DatasetV2 class (the version that will be used in TF 2.0) and replaces it with a _variant_tensor property that returns the variant_tensor representing the dataset. The __init__() method of DatasetV2 now also takes a variant_tensor input.

For the DatasetV1 class (the current API), we run the _as_variant_tensor() method in the __init__() method, so classes subclassing DatasetV1 should make their super() calls at the end of their own __init__().

Another implication is for Estimator code. Estimator input_fns are supposed to be self-contained and can't contain ops from other graphs (such as the default graph). Previously, creating a Dataset object added nothing to the graph, so this wasn't an issue. Now it is, and the dataset creation code needs to move into the input_fns.

A few other changes were required to make this happen:

1. The make_one_shot_iterator code captures inputs by value. Since inputs to a dataset can now be other datasets, which are not capturable, we use the whitelisting mechanism in functions to recreate these ops.
2. The distribution strategies multi-worker code relied on dataset kernels being re-created on different devices when we created the iterator. Now that the kernels are already created, we have to "clone" the dataset on different devices.
3. Auto sharding in distribution strategies is broken with this CL. For now, this CL disables it, but we can subsequently fix it using some of the cloning logic done for 2).
4. AsGraphDefInternal for functions that capture inputs that are datasets now needs to be handled differently, as DT_VARIANT tensors representing datasets are not serializable.

PiperOrigin-RevId: 226115500
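
The ordering constraint on DatasetV1 subclasses can be illustrated with a toy model. This is a minimal sketch in plain Python, not the tf.data implementation: ToyDatasetV2, ToyDatasetV1, ToyRange and BrokenRange are hypothetical names, and a tuple stands in for the variant tensor that a real dataset op would produce.

```python
# Toy model only: these classes mimic the pattern described in this change
# and are not the real tf.data classes.


class ToyDatasetV2:
    """Stands in for DatasetV2: the variant tensor is supplied up front."""

    def __init__(self, variant_tensor):
        self._variant_tensor_attr = variant_tensor

    @property
    def _variant_tensor(self):
        return self._variant_tensor_attr


class ToyDatasetV1(ToyDatasetV2):
    """Stands in for DatasetV1: builds the variant tensor inside __init__."""

    def __init__(self):
        # Calls the subclass's _as_variant_tensor() immediately, so every
        # attribute that method reads must already be set.
        super().__init__(self._as_variant_tensor())

    def _as_variant_tensor(self):
        raise NotImplementedError


class ToyRange(ToyDatasetV1):
    def __init__(self, stop):
        self._stop = stop      # set state first
        super().__init__()     # super() call goes last

    def _as_variant_tensor(self):
        return ("range_kernel", self._stop)  # placeholder for a real op


class BrokenRange(ToyDatasetV1):
    def __init__(self, stop):
        super().__init__()     # too early: self._stop does not exist yet
        self._stop = stop

    def _as_variant_tensor(self):
        return ("range_kernel", self._stop)
```

In this sketch, constructing ToyRange succeeds because its state exists before the base __init__() asks for the variant tensor, while constructing BrokenRange raises AttributeError for the same reason a DatasetV1 subclass that calls super() first would fail.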