Commit 8280e0ae authored by Eugene Brevdo, committed by TensorFlower Gardener

GPU-enabled WhereOp using CUB.

* Import CUB.
* Add GPU-enabled async WhereOp.
* Add benchmarks.
* Add support for bool ResourceVariables on GPU.
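
The commit message does not spell out the algorithm. One common way to build a parallel Where from reduce and scan primitives of the kind CUB provides is: count the true entries, compute an exclusive prefix sum over the flags, then scatter each true entry's index to its slot. Because the output size is only known after the count, the op has to wait on a device result before it can allocate its output, which is a plausible reason for making it async. The following is a minimal sequential sketch under those assumptions; `where_via_scan` is a hypothetical name, not a TensorFlow or CUB API.

```python
from itertools import accumulate


def where_via_scan(matrix):
    """Return [row, col] pairs of the true entries, in row-major order."""
    n_cols = len(matrix[0]) if matrix else 0
    # Flatten to one 0/1 flag per element, row-major.
    flags = [1 if v else 0 for row in matrix for v in row]
    # Step 1 (reduce): count the true entries; this sizes the output.
    num_true = sum(flags)
    # Step 2 (exclusive scan): slot[i] = number of true entries before i.
    inclusive = list(accumulate(flags))
    slots = [total - flag for total, flag in zip(inclusive, flags)]
    # Step 3 (scatter): each write is independent, so a GPU can do them
    # in parallel; here we simply loop.
    out = [None] * num_true
    for flat, flag in enumerate(flags):
        if flag:
            out[slots[flat]] = [flat // n_cols, flat % n_cols]
    return out


print(where_via_scan([[True, False], [False, True]]))
```

For the 2 x 2 input above, the flags are `[1, 0, 0, 1]`, the slots are `[0, 1, 1, 1]`, and the result is `[[0, 0], [1, 1]]`.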

Benchmark results on a machine with a single Tesla K40 GPU:

Results for Where on a bool matrix of shape [m x n], with a fraction p of
values true, are listed below. For small to medium sizes, running WhereOp
on GPU is ~2-4x slower. For realistic large problem sizes, it is 2-5x
faster. These timings ignore the time spent copying a tensor from
GPU -> CPU and back from CPU -> GPU when a CPU WhereOp sits between GPU
computations, so the real-world benefit of the GPU version should be
larger than shown.

Benchmark: m_10_n_10_p_0.01_use_gpu_False        wall_time: 9.01e-05 s   Throughput: 0.00129 GB/s
Benchmark: m_10_n_10_p_0.01_use_gpu_True         wall_time: 0.000187 s   Throughput: 0.000621 GB/s
Benchmark: m_10_n_10_p_0.5_use_gpu_False         wall_time: 9.3e-05 s    Throughput: 0.00968 GB/s
Benchmark: m_10_n_10_p_0.5_use_gpu_True          wall_time: 0.000252 s   Throughput: 0.00357 GB/s
Benchmark: m_10_n_10_p_0.99_use_gpu_False        wall_time: 0.000152 s   Throughput: 0.0111 GB/s
Benchmark: m_10_n_10_p_0.99_use_gpu_True         wall_time: 0.000245 s   Throughput: 0.00687 GB/s
Benchmark: m_10_n_100_p_0.01_use_gpu_False       wall_time: 9.3e-05 s    Throughput: 0.0125 GB/s
Benchmark: m_10_n_100_p_0.01_use_gpu_True        wall_time: 0.000253 s   Throughput: 0.00458 GB/s
Benchmark: m_10_n_100_p_0.5_use_gpu_False        wall_time: 9.8e-05 s    Throughput: 0.0918 GB/s
Benchmark: m_10_n_100_p_0.5_use_gpu_True         wall_time: 0.00026 s    Throughput: 0.0346 GB/s
Benchmark: m_10_n_100_p_0.99_use_gpu_False       wall_time: 0.000104 s   Throughput: 0.162 GB/s
Benchmark: m_10_n_100_p_0.99_use_gpu_True        wall_time: 0.000288 s   Throughput: 0.0586 GB/s
Benchmark: m_10_n_1000_p_0.01_use_gpu_False      wall_time: 0.000105 s   Throughput: 0.111 GB/s
Benchmark: m_10_n_1000_p_0.01_use_gpu_True       wall_time: 0.000283 s   Throughput: 0.041 GB/s
Benchmark: m_10_n_1000_p_0.5_use_gpu_False       wall_time: 0.000185 s   Throughput: 0.486 GB/s
Benchmark: m_10_n_1000_p_0.5_use_gpu_True        wall_time: 0.000335 s   Throughput: 0.269 GB/s
Benchmark: m_10_n_1000_p_0.99_use_gpu_False      wall_time: 0.000203 s   Throughput: 0.83 GB/s
Benchmark: m_10_n_1000_p_0.99_use_gpu_True       wall_time: 0.000346 s   Throughput: 0.486 GB/s
Benchmark: m_10_n_10000_p_0.01_use_gpu_False     wall_time: 0.00019 s    Throughput: 0.609 GB/s
Benchmark: m_10_n_10000_p_0.01_use_gpu_True      wall_time: 0.00028 s    Throughput: 0.414 GB/s
Benchmark: m_10_n_10000_p_0.5_use_gpu_False      wall_time: 0.00117 s    Throughput: 0.771 GB/s
Benchmark: m_10_n_10000_p_0.5_use_gpu_True       wall_time: 0.000426 s   Throughput: 2.11 GB/s
Benchmark: m_10_n_10000_p_0.99_use_gpu_False     wall_time: 0.0014 s     Throughput: 1.2 GB/s
Benchmark: m_10_n_10000_p_0.99_use_gpu_True      wall_time: 0.000482 s   Throughput: 3.5 GB/s
Benchmark: m_10_n_100000_p_0.01_use_gpu_False    wall_time: 0.00129 s    Throughput: 0.899 GB/s
Benchmark: m_10_n_100000_p_0.01_use_gpu_True     wall_time: 0.000336 s   Throughput: 3.45 GB/s
Benchmark: m_10_n_100000_p_0.5_use_gpu_False     wall_time: 0.0102 s     Throughput: 0.885 GB/s
Benchmark: m_10_n_100000_p_0.5_use_gpu_True      wall_time: 0.00136 s    Throughput: 6.6 GB/s
Benchmark: m_10_n_100000_p_0.99_use_gpu_False    wall_time: 0.0116 s     Throughput: 1.45 GB/s
Benchmark: m_10_n_100000_p_0.99_use_gpu_True     wall_time: 0.00233 s    Throughput: 7.23 GB/s
Benchmark: m_10_n_1000000_p_0.01_use_gpu_False   wall_time: 0.0111 s     Throughput: 1.04 GB/s
Benchmark: m_10_n_1000000_p_0.01_use_gpu_True    wall_time: 0.00109 s    Throughput: 10.6 GB/s
Benchmark: m_10_n_1000000_p_0.5_use_gpu_False    wall_time: 0.0895 s     Throughput: 1.01 GB/s
Benchmark: m_10_n_1000000_p_0.5_use_gpu_True     wall_time: 0.0103 s     Throughput: 8.7 GB/s
Benchmark: m_10_n_1000000_p_0.99_use_gpu_False   wall_time: 0.107 s      Throughput: 1.58 GB/s
Benchmark: m_10_n_1000000_p_0.99_use_gpu_True    wall_time: 0.0201 s     Throughput: 8.39 GB/s
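
To read speedups off the log, the CPU and GPU lines for each (m, n, p) can be paired and their wall times divided. The sketch below does that for four lines copied from the output above. It also includes a guess at how the throughput column is computed: the reported figures are consistent with counting m*n bool input bytes plus 16 output bytes (two int64 indices) per true entry, but that model is inferred from the numbers, not stated in the commit. All function names here are hypothetical helpers.

```python
import re

# Four lines copied from the benchmark output above.
SAMPLE = """\
Benchmark: m_10_n_10_p_0.5_use_gpu_False         wall_time: 9.3e-05 s    Throughput: 0.00968 GB/s
Benchmark: m_10_n_10_p_0.5_use_gpu_True          wall_time: 0.000252 s   Throughput: 0.00357 GB/s
Benchmark: m_10_n_1000000_p_0.5_use_gpu_False    wall_time: 0.0895 s     Throughput: 1.01 GB/s
Benchmark: m_10_n_1000000_p_0.5_use_gpu_True     wall_time: 0.0103 s     Throughput: 8.7 GB/s
"""

LINE = re.compile(
    r"m_(\d+)_n_(\d+)_p_([\d.]+)_use_gpu_(True|False)\s+"
    r"wall_time: (\S+) s\s+Throughput: (\S+) GB/s"
)


def parse(text):
    """Map (m, n, p) -> {use_gpu: (wall_time_s, throughput_gb_s)}."""
    runs = {}
    for m, n, p, gpu, wall, tput in LINE.findall(text):
        key = (int(m), int(n), float(p))
        runs.setdefault(key, {})[gpu == "True"] = (float(wall), float(tput))
    return runs


def gpu_speedup(runs, key):
    """CPU wall time over GPU wall time; > 1 means the GPU is faster."""
    return runs[key][False][0] / runs[key][True][0]


def estimated_throughput(key, wall_time_s):
    """Assumed model: m*n input bytes + 16 output bytes per true entry."""
    m, n, p = key
    total_bytes = m * n + round(m * n * p) * 2 * 8
    return total_bytes / wall_time_s / 1e9


runs = parse(SAMPLE)
for key in sorted(runs):
    print(key, f"GPU speedup: {gpu_speedup(runs, key):.2f}x")
```

On these four lines the GPU comes out at about 0.37x for the 10 x 10 case (slower) and about 8.69x for the 10 x 1000000 case (faster). Under the assumed byte model, `estimated_throughput((10, 10, 0.5), 9.3e-05)` gives roughly 0.00968 GB/s, matching the logged value.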
PiperOrigin-RevId: 160582709
parent 4aa7c4d2