[XLA] Cache computations when creating reduces in algebraic simplifier or batchnorm expander
Otherwise we create a lot of identical small computations. This shouldn't have any effect beyond cluttering the HLO, but it turns out that HloCSE doesn't look inside the computations of reduces, so reduces produced via this code path were effectively never eliminated.

While there, clean up some YAGNI: this only worked for F32 anyway, so just hardcode it.

PiperOrigin-RevId: 196689316
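The effect described above can be illustrated with a small toy model. This is only a sketch in Python, not XLA code: `Computation`, `AddComputationCache`, and `toy_cse` are hypothetical names, and the assumption is that CSE treats two reduces as equal only when they reference the very same computation object, as the message says HloCSE effectively does.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count()  # source of unique ids, standing in for object identity


@dataclass(frozen=True)
class Computation:
    """Hypothetical stand-in for a small scalar computation (e.g. F32 add)."""
    name: str
    uid: int = field(default_factory=lambda: next(_ids))


def make_add_uncached():
    # Old behaviour in this toy model: a fresh computation per reduce.
    return Computation("add_f32")


class AddComputationCache:
    """Creates the F32 add computation once and hands out the same object."""

    def __init__(self):
        self._cached = None
        self.created = 0

    def get(self):
        if self._cached is None:
            self._cached = Computation("add_f32")
            self.created += 1
        return self._cached


def toy_cse(reduces):
    """Toy CSE over (operand, computation) pairs.

    It compares computations by identity (uid) only and never looks
    inside them, which is the assumption this sketch is built on.
    """
    seen = {}
    for operand, comp in reduces:
        seen.setdefault((operand, comp.uid), (operand, comp))
    return list(seen.values())


# Three reduces over the same operand, built both ways.
uncached = [("x", make_add_uncached()) for _ in range(3)]
cache = AddComputationCache()
cached = [("x", cache.get()) for _ in range(3)]

uncached_after_cse = toy_cse(uncached)
cached_after_cse = toy_cse(cached)
```

In this model the uncached reduces all survive the toy CSE pass because each points at a distinct computation, while the cached ones collapse into a single reduce.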