Commit 33035bb7, authored Jul 24, 2018 by Adrian Kuegel; committed by TensorFlower Gardener Jul 24, 2018

Parallelize BitonicSort on GPU.

We now emit O(log^2 n) kernel thunks, one per stage of the bitonic sorting
network. Each thunk loops over the other dimensions and then runs a comparison
loop over the dimension that should be sorted.
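
To illustrate where the O(log^2 n) count comes from, here is a minimal
sequential sketch of a bitonic sorting network in C++. It is not the XLA
emitter code from this commit: the function name BitonicSortCountingStages is
hypothetical, the input length is assumed to be a power of two, and only a
single one-dimensional array is sorted. Each iteration of the inner `j` loop
corresponds to one stage, which is roughly what one kernel thunk would cover,
because all compare-exchange steps inside a stage touch disjoint pairs and
could run in parallel.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sorts `data` in ascending order with a bitonic sorting network and returns
// the number of stages executed. Assumption of this sketch: data.size() is a
// power of two. Within one stage every element is paired with exactly one
// partner, so the compare-exchange steps are independent of each other.
std::size_t BitonicSortCountingStages(std::vector<int>& data) {
  const std::size_t n = data.size();
  std::size_t stages = 0;
  // k is the length of the bitonic sequences being merged in this phase.
  for (std::size_t k = 2; k <= n; k *= 2) {
    // j is the distance between the two elements of each compared pair.
    for (std::size_t j = k / 2; j > 0; j /= 2) {
      ++stages;  // One stage; on a GPU this could be one kernel launch.
      for (std::size_t i = 0; i < n; ++i) {
        const std::size_t partner = i ^ j;
        if (partner > i) {  // Handle each pair once, from its lower index.
          // Blocks of length k alternate between ascending and descending.
          const bool ascending = (i & k) == 0;
          if ((data[i] > data[partner]) == ascending) {
            std::swap(data[i], data[partner]);
          }
        }
      }
    }
  }
  return stages;
}
```

For an input of 8 elements this runs 1 + 2 + 3 = 6 stages; in general the
count is log2(n) * (log2(n) + 1) / 2, which is O(log^2 n).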

PiperOrigin-RevId: 205791397

parent fca1561b
