Commit 9389c259 authored by Yong Tang, committed by gunan

Add GPU kernel for `tf.bincount` (#13813)



* Split BincountOp into GPU and CPU versions

This commit splits BincountOp into GPU and CPU versions.
The GPU implementation is to follow.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add GPU kernel for `tf.bincount`.

This fix addresses the issue raised in 11554, where
`tf.bincount` has no GPU support, by adding a GPU kernel
for `tf.bincount`.

This fix fixes 11554.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
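
For reference, the counting that both the CPU and GPU kernels are expected to produce can be sketched in plain Python. This is an illustrative sketch only, not the kernel code: `bincount_reference` is a hypothetical helper, and skipping values outside `[0, size)` is an assumption of the sketch.

```python
def bincount_reference(values, size):
    """Count how often each integer in [0, size) occurs in `values`.

    Assumption of this sketch: values outside [0, size) are skipped.
    """
    counts = [0] * size
    for v in values:
        if 0 <= v < size:
            counts[v] += 1
    return counts


# 1 occurs twice and 2 once; 5 falls outside [0, 4) and is skipped.
print(bincount_reference([1, 1, 2, 5], 4))  # [0, 2, 1, 0]
```

An empty input yields `size` zeros, and `size == 0` yields an empty result.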

* Update test cases for GPU support of `tf.bincount`

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Address cases where input.size() == 0 or output.size() == 0

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Use CUB for histogram bincount when weight = 1.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Use unsorted_segment_sum when weights.size() != 0

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
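
The two paths named in the commits above (a plain histogram when every weight is 1, and `unsorted_segment_sum` when weights are given) can be sketched in plain Python. This is illustrative only: both functions below are pure-Python stand-ins written for this note, not the TensorFlow ops or the CUB routine, and their bounds handling is an assumption.

```python
def unsorted_segment_sum(data, segment_ids, num_segments):
    """Add data[i] into out[segment_ids[i]]; ids do not need to be sorted.

    Assumption of this sketch: ids outside [0, num_segments) are skipped.
    """
    out = [0.0] * num_segments
    for x, seg in zip(data, segment_ids):
        if 0 <= seg < num_segments:
            out[seg] += x
    return out


def bincount(values, size, weights=None):
    """Dispatch between the weighted and unweighted paths."""
    if weights:
        # Weighted path: each value selects the bin its weight is added to.
        return unsorted_segment_sum(weights, values, size)
    # Unweighted path: a plain histogram, every element counts as 1.
    counts = [0] * size
    for v in values:
        if 0 <= v < size:
            counts[v] += 1
    return counts


print(bincount([0, 2, 2, 1], 3))                             # [1, 1, 2]
print(bincount([0, 2, 2, 1], 3, weights=[0.5, 1.0, 2.0, 4.0]))  # [0.5, 4.0, 3.0]
```

With all weights equal to 1 the weighted path gives the same totals as the histogram, which is why the unweighted case can use a dedicated histogram routine.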

* Remove unneeded GPU kernels.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Update Bazel BUILD file with Buildifier

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Move unsorted_segment_sum to python part.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Address review comments

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Address review feedback.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add benchmark tests.

Sample run results:
```
Running main() from test_main.cc
Benchmark                  Time(ns) Iterations
----------------------------------------------
BM_Bincount_cpu_32_1000      114922       5150   285.1M items/s
BM_Bincount_cpu_32_2000      124291       5524   263.6M items/s
BM_Bincount_cpu_32_5000      159548       4287   205.4M items/s
BM_Bincount_cpu_64_1000      145006       4793   452.0M items/s
BM_Bincount_cpu_64_2000      150301       4457   436.0M items/s
BM_Bincount_cpu_64_5000      180001       3880   364.1M items/s
BM_Bincount_cpu_128_1000     204993       3405   639.4M items/s
BM_Bincount_cpu_128_2000     209144       3311   626.7M items/s
BM_Bincount_cpu_128_5000     231580       3003   566.0M items/s

BM_Bincount_gpu_32_1000       61178      10000   535.6M items/s
BM_Bincount_gpu_32_2000       61021      10000   537.0M items/s
BM_Bincount_gpu_32_5000       61177      10000   535.6M items/s
BM_Bincount_gpu_64_1000       61317      10000   1068.8M items/s
BM_Bincount_gpu_64_2000       60726      10000   1079.2M items/s
BM_Bincount_gpu_64_5000       61721      10000   1061.8M items/s
BM_Bincount_gpu_128_1000      69935      10000   1874.2M items/s
BM_Bincount_gpu_128_2000      79760       9852   1643.3M items/s
BM_Bincount_gpu_128_5000     100407       6974   1305.4M items/s
```
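
To read the table as CPU-versus-GPU ratios, the times can be divided pairwise. The short script below does only that arithmetic; the numbers are copied from the run above, and the `speedup` helper and the dictionary layout are illustrative choices, not part of the benchmark code.

```python
# (cpu_ns, gpu_ns) per benchmark suffix, copied from the run above.
timings_ns = {
    "32_1000": (114922, 61178),
    "32_2000": (124291, 61021),
    "32_5000": (159548, 61177),
    "64_1000": (145006, 61317),
    "64_2000": (150301, 60726),
    "64_5000": (180001, 61721),
    "128_1000": (204993, 69935),
    "128_2000": (209144, 79760),
    "128_5000": (231580, 100407),
}


def speedup(cpu_ns, gpu_ns):
    """Return how many times faster the GPU run is than the CPU run."""
    return cpu_ns / gpu_ns


speedups = {name: speedup(cpu, gpu) for name, (cpu, gpu) in timings_ns.items()}
for name, ratio in speedups.items():
    print(f"BM_Bincount_{name}: {ratio:.2f}x")
```

In this run the GPU kernel comes out roughly 1.9x to 2.9x faster than the CPU kernel across the listed configurations.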

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Remove Bincount from `hidden_ops.txt`, and remove unneeded
libraries so that Jenkins could pass.

Bincount should not be added to `hidden_ops.txt`, as doing so
causes the compatibility test to fail.

The following libs should not be needed in CI/CD:
```
-        "@local_config_cuda//cuda:cublas",
-        "@local_config_cuda//cuda:cuda_driver",
-        "@local_config_cuda//cuda:cudnn",
-        "@local_config_cuda//cuda:cufft",
-        "@local_config_cuda//cuda:curand",
```

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
parent 7f84d88d