[XLA:GPU] Unroll unfused elementwise op kernels.
So far we only unrolled loop fusions; unfused elementwise ops are a logical extension. We don't spend much time in unfused elementwise ops in benchmarks, so this is only worth a small speedup on V100.

PiperOrigin-RevId: 195121530
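
For readers unfamiliar with the idea, here is a minimal CPU-side sketch of what unrolling an elementwise kernel means: each thread handles several consecutive elements instead of one, so fewer threads are launched. This is an illustration only, not the code in this change. The function name `unrolled_elementwise`, the `unroll_factor` value of 4, and the contiguous index mapping are all assumptions; the commit message does not say which factor or mapping XLA uses.

```python
def unrolled_elementwise(op, xs, unroll_factor=4):
    """Simulate how an unrolled elementwise kernel maps threads to elements.

    Hypothetical helper for illustration; not part of XLA.
    Returns the output list and the number of simulated threads.
    """
    n = len(xs)
    out = [None] * n
    # Each simulated thread covers `unroll_factor` consecutive elements,
    # so the thread count is n / unroll_factor, rounded up.
    num_threads = (n + unroll_factor - 1) // unroll_factor
    for tid in range(num_threads):
        base = tid * unroll_factor
        # On a GPU this inner loop has a fixed trip count and would be
        # unrolled by the compiler into straight-line code.
        for k in range(unroll_factor):
            i = base + k
            if i < n:  # bounds check for the last, partially filled thread
                out[i] = op(xs[i])
    return out, num_threads


result, threads = unrolled_elementwise(lambda v: v * 2, list(range(10)))
```

With 10 elements and an assumed factor of 4, the sketch uses 3 simulated threads rather than 10; the last thread processes only two elements because of the bounds check.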