Add a GPU kernel for tf.dynamic_partition. (#13905)
* Add a GPU kernel for tf.dynamic_partition. The algorithm has the following steps:
  1. Radix-sort the values of the partitions tensor.
  2. Count how many times each partition id appears.
  3. Allocate memory for the output tensors.
  4. Gather the data into the output tensors.
  The op is asynchronous.
* Add a note explaining the general approach for the GPU version.
* Handle the case where partitions or some output tensor is empty.
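The four steps above can be illustrated with a small sequential Python sketch. It is not the CUDA kernel from this commit. On the GPU, each step would run as a parallel primitive. The helper names `radix_sort_by_partition` and `dynamic_partition` are hypothetical. The sketch assumes 1-D `data`, so each element stands in for one row, and partition ids in `[0, num_partitions)`. It matches the documented semantics of `tf.dynamic_partition`: `outputs[i]` holds, in their original order, the elements of `data` whose partition id is `i`. A stable LSD radix sort is what preserves that order.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def radix_sort_by_partition(partitions: Sequence[int], bits_per_pass: int = 8) -> List[int]:
    """Return the original indices ordered by partition id (stable LSD radix sort)."""
    indices = list(range(len(partitions)))
    if not indices:  # empty partitions tensor: nothing to sort
        return indices
    max_key = max(partitions)
    radix = 1 << bits_per_pass
    mask = radix - 1
    shift = 0
    while True:
        # One counting pass on the current digit; appending keeps the sort stable.
        buckets: List[List[int]] = [[] for _ in range(radix)]
        for i in indices:
            buckets[(partitions[i] >> shift) & mask].append(i)
        indices = [i for bucket in buckets for i in bucket]
        shift += bits_per_pass
        if (max_key >> shift) == 0:  # no higher digits left
            break
    return indices


def dynamic_partition(data: Sequence[T], partitions: Sequence[int], num_partitions: int) -> List[List[T]]:
    """Hypothetical sequential model of the sort / count / allocate / gather steps."""
    if len(data) != len(partitions):
        raise ValueError("data and partitions must have the same length")
    if any(not 0 <= p < num_partitions for p in partitions):
        raise ValueError("partition ids must be in [0, num_partitions)")

    # Step 1: order element indices by partition id.
    order = radix_sort_by_partition(partitions)

    # Step 2: count how many times each partition id appears.
    counts = [0] * num_partitions
    for p in partitions:
        counts[p] += 1

    # Step 3: allocate one output per partition; a count of 0 yields an empty output.
    outputs: List[List[T]] = [[None] * c for c in counts]  # type: ignore[list-item]

    # Step 4: the sorted order is contiguous per id, so gather using running offsets.
    offset = 0
    for pid, count in enumerate(counts):
        for j in range(count):
            outputs[pid][j] = data[order[offset + j]]
        offset += count
    return outputs
```

For example, `dynamic_partition([10, 20, 30, 40, 50], [0, 0, 1, 1, 0], 2)` gives `[[10, 20, 50], [30, 40]]`. An empty `partitions` input, or an id that never appears, produces empty outputs rather than an error. The sketch is synchronous and does not model the asynchronous execution mentioned above.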