Add GPU kernel for `tf.bincount` (#13813)
* Split BincountOp into GPU and CPU versions

  This commit splits BincountOp into GPU and CPU versions. The GPU implementation is to follow.

  Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add GPU kernel for `tf.bincount`.

  This fix addresses the issue raised in 11554, where there is no GPU support for `tf.bincount`. It adds GPU support for `tf.bincount`.

  This fix fixes 11554.

  Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Update test cases for GPU support of `tf.bincount`

  Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Address cases where input.size() == 0 or output.size() == 0

  Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Use CUB for histogram bincount when weight = 1.

  Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Use unsorted_segment_sum when weights.size() != 0

  Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Remove unneeded GPU kernels.

  Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Update Bazel BUILD file with Buildifier

  Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Move unsorted_segment_sum to python part.

  Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Address review comments

  Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Address review feedback.

  Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add benchmark tests.

  Some run results:

  ```
  Running main() from test_main.cc
  Benchmark                   Time(ns) Iterations
  ----------------------------------------------
  BM_Bincount_cpu_32_1000       114922       5150     285.1M items/s
  BM_Bincount_cpu_32_2000       124291       5524     263.6M items/s
  BM_Bincount_cpu_32_5000       159548       4287     205.4M items/s
  BM_Bincount_cpu_64_1000       145006       4793     452.0M items/s
  BM_Bincount_cpu_64_2000       150301       4457     436.0M items/s
  BM_Bincount_cpu_64_5000       180001       3880     364.1M items/s
  BM_Bincount_cpu_128_1000      204993       3405     639.4M items/s
  BM_Bincount_cpu_128_2000      209144       3311     626.7M items/s
  BM_Bincount_cpu_128_5000      231580       3003     566.0M items/s
  BM_Bincount_gpu_32_1000        61178      10000     535.6M items/s
  BM_Bincount_gpu_32_2000        61021      10000     537.0M items/s
  BM_Bincount_gpu_32_5000        61177      10000     535.6M items/s
  BM_Bincount_gpu_64_1000        61317      10000    1068.8M items/s
  BM_Bincount_gpu_64_2000        60726      10000    1079.2M items/s
  BM_Bincount_gpu_64_5000        61721      10000    1061.8M items/s
  BM_Bincount_gpu_128_1000       69935      10000    1874.2M items/s
  BM_Bincount_gpu_128_2000       79760       9852    1643.3M items/s
  BM_Bincount_gpu_128_5000      100407       6974    1305.4M items/s
  ```

  Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Remove Bincount from `hidden_ops.txt`, and remove unneeded libraries so that Jenkins can pass.

  Bincount should not be added to `hidden_ops.txt`, as that causes the compatibility test to fail.

  The following libs should not be needed in CI/CD:

  ```
  - "@local_config_cuda//cuda:cublas",
  - "@local_config_cuda//cuda:cuda_driver",
  - "@local_config_cuda//cuda:cudnn",
  - "@local_config_cuda//cuda:cufft",
  - "@local_config_cuda//cuda:curand",
  ```

  Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
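
The commit message above describes two paths: a plain histogram when every weight is 1, and an unsorted segment sum when weights are supplied, with empty inputs or outputs handled separately. The following is a minimal pure-Python sketch of those semantics only. It is not the TensorFlow, CUB, or CUDA code from this commit. The function names (`histogram_count`, `segment_sum`, `bincount_sketch`) are hypothetical, and skipping out-of-range values is an assumption made for illustration.

```python
def histogram_count(values, size):
    """Unweighted path: each in-range value adds 1 to its bin."""
    counts = [0] * size
    for v in values:
        if 0 <= v < size:  # assumption: out-of-range values are skipped
            counts[v] += 1
    return counts


def segment_sum(weights, values, size):
    """Weighted path: each weight is added to the bin named by its value."""
    sums = [0.0] * size
    for w, v in zip(weights, values):
        if 0 <= v < size:  # assumption: out-of-range values are skipped
            sums[v] += w
    return sums


def bincount_sketch(values, size, weights=None):
    """Dispatch between the two paths (hypothetical helper)."""
    # Empty input or empty output: nothing to count, return all zeros.
    if size == 0 or len(values) == 0:
        return [0] * size
    if weights is None:
        return histogram_count(values, size)
    return segment_sum(weights, values, size)


print(bincount_sketch([1, 1, 2, 4], 5))
print(bincount_sketch([1, 1, 2, 4], 5, weights=[0.5, 1.5, 2.0, 1.0]))
```

With unit weights the two paths agree, which is why the unweighted case can be served by a dedicated histogram routine while the weighted case reuses a segment sum.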