[XLA:GPU] Enhance unrolling heuristics for column reduction.
Previously, unrolling was enabled only when the reduce operands were of small data types. This change adds a simple analysis that counts the tensors that can and cannot be vectorized, and uses those counts to decide whether unrolling is beneficial for the kernel.

Add test cases.

PiperOrigin-RevId: 228618121
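For illustration only, the counting analysis described above might look roughly like the sketch below. It is not the code from this change: `TensorAccess`, `CountAccesses`, and `IsUnrollingBeneficial` are hypothetical names, the `vectorizable` flag is assumed to be computed elsewhere, and the decision rule (unroll when vectorizable tensors are at least as numerous as non-vectorizable ones) is an assumption, since the message does not state the exact threshold.

```cpp
#include <vector>

// Hypothetical summary of one tensor read or written by the reduction kernel.
// How `vectorizable` is determined is outside this sketch.
struct TensorAccess {
  bool vectorizable;
};

// Tallies of the two kinds of tensors.
struct AccessCounts {
  int can_vectorize = 0;
  int cannot_vectorize = 0;
};

// Counts how many tensors can and cannot be accessed with vector loads/stores.
AccessCounts CountAccesses(const std::vector<TensorAccess>& tensors) {
  AccessCounts counts;
  for (const TensorAccess& tensor : tensors) {
    if (tensor.vectorizable) {
      ++counts.can_vectorize;
    } else {
      ++counts.cannot_vectorize;
    }
  }
  return counts;
}

// Assumed rule: unrolling pays off when vectorizable tensors are at least as
// numerous as non-vectorizable ones. The real heuristic may differ.
bool IsUnrollingBeneficial(const std::vector<TensorAccess>& tensors) {
  const AccessCounts counts = CountAccesses(tensors);
  return counts.can_vectorize >= counts.cannot_vectorize;
}
```

Under this assumed rule, a kernel touching two vectorizable tensors and one non-vectorizable tensor would be unrolled, while one touching a single vectorizable tensor and two non-vectorizable tensors would not.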