Add unsortedsegment(prod/min/max/sqrt_n/mean). (#15858)
* Add unsortedsegment(prod/min/max/sqrt_n/mean).

  This commit adds CPU/GPU implementations for the prod/min/max ops and Python implementations for mean/sqrt_n. It also adapts and unifies the corresponding tests for all unsorted reductions.

  Note: The new gradient of unsorted_segment_max fixes the crash that occurred when negative indices were used on CPU.

* Update golden API.
* Fix compilation of atomicAdd for cuda_arch < 600.

  This commit moves the std::complex specialization of atomicAdd below the double specialization of atomicAdd for cuda_arch 600.

* Enable bfloat16, change inline to EIGEN_STRONG_INLINE.
* Fix includes of cuda_device_functions; fix typo.
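For readers unfamiliar with the ops named above, here is a minimal NumPy sketch of what the five unsorted segment reductions compute for 1-D input. It is not the code from this commit: the function name `unsorted_segment_reduce` and the `kind` parameter are hypothetical, and two behaviours are assumptions made for illustration, namely that negative segment ids are skipped and that empty segments hold the identity of the reduction (the real kernels may use the dtype's finite limits rather than infinities).

```python
import numpy as np


def unsorted_segment_reduce(data, segment_ids, num_segments, kind):
    """Illustrative 1-D reference for prod/min/max/mean/sqrt_n (hypothetical helper)."""
    data = np.asarray(data, dtype=np.float64)
    segment_ids = np.asarray(segment_ids)

    # Identity element per reduction; empty segments keep this value (assumption).
    init = {"prod": 1.0, "min": np.inf, "max": -np.inf,
            "mean": 0.0, "sqrt_n": 0.0}[kind]
    out = np.full(num_segments, init)

    # Assumption: entries with a negative segment id are dropped.
    keep = segment_ids >= 0
    ids, vals = segment_ids[keep], data[keep]

    if kind == "prod":
        np.multiply.at(out, ids, vals)   # unbuffered: repeated ids accumulate
    elif kind == "min":
        np.minimum.at(out, ids, vals)
    elif kind == "max":
        np.maximum.at(out, ids, vals)
    else:
        # mean and sqrt_n are a segment sum divided by N or sqrt(N).
        counts = np.zeros(num_segments)
        np.add.at(out, ids, vals)
        np.add.at(counts, ids, 1.0)
        denom = np.maximum(counts, 1.0)  # avoid dividing by zero for empty segments
        out = out / (denom if kind == "mean" else np.sqrt(denom))
    return out


data = [1.0, 2.0, 3.0, 4.0]
segment_ids = [0, 1, 0, -1]  # the last entry is ignored in this sketch
prod_result = unsorted_segment_reduce(data, segment_ids, 3, "prod")
max_result = unsorted_segment_reduce(data, segment_ids, 3, "max")
mean_result = unsorted_segment_reduce(data, segment_ids, 3, "mean")
sqrt_n_result = unsorted_segment_reduce(data, segment_ids, 3, "sqrt_n")
```

Expressing mean and sqrt_n as a segment sum followed by a division mirrors the commit's statement that those two are implemented in Python on top of existing ops, whereas prod/min/max need their own accumulation kernels.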