[XLA] Fix erf for f16 inputs, and improve precision in f32.
Switch to the f32 implementations. Previously we used the f64 implementation with all of its constants truncated to f32.

Part of the problem is that the old code computed erf(x) = x * polynomial1(x^2) / polynomial2(x^2) and assumed there was enough floating-point range that neither polynomial would overflow. On f16, polynomial2 overflowed to inf even for quite reasonable values of x (e.g. 0.79), so the division incorrectly produced 0.

This may reduce precision for f64, but that's a trade-off we're willing to make (especially since we were using f32 coefficients *anyway*).

erf and erfc also now call each other when their inputs are out of range.

Relevant to #25052.

PiperOrigin-RevId: 231896451
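
The f16 failure mode described in this message can be reproduced with a toy rational form in NumPy. This is only a sketch: the coefficients are hypothetical, chosen so that the denominator's constant term sits close to the f16 maximum (65504), and they are not the coefficients XLA uses. The names `horner` and `rational_erf_like` are illustrative, and the toy function is not an accurate erf.

```python
import numpy as np

# Hypothetical coefficients, highest degree first. NOT the XLA/Cephes values;
# they only mimic a denominator whose constant term is near the f16 max.
P_COEFFS = (7000.0, 55000.0)   # stands in for polynomial1
Q_COEFFS = (30000.0, 49000.0)  # stands in for polynomial2


def horner(z, coeffs, dtype):
    """Evaluate a polynomial in z, rounding to `dtype` after every step."""
    acc = dtype(0)
    for c in coeffs:
        acc = dtype(acc * z + dtype(c))
    return acc


def rational_erf_like(x, dtype):
    """Return (p, q, x * p / q) with all arithmetic done in `dtype`."""
    x = dtype(x)
    # Overflow to inf is the effect being demonstrated, so silence the warning.
    with np.errstate(over="ignore"):
        z = dtype(x * x)
        p = horner(z, P_COEFFS, dtype)
        q = horner(z, Q_COEFFS, dtype)
        result = dtype(x * p / q)
    return p, q, result


p16, q16, r16 = rational_erf_like(0.79, np.float16)
p32, q32, r32 = rational_erf_like(0.79, np.float32)
print("f16:", p16, q16, r16)
print("f32:", p32, q32, r32)
```

In f16 the numerator stays finite while the denominator exceeds 65504 and becomes inf, so finite / inf collapses to 0. The same expression in f32 keeps both polynomials finite and returns a nonzero value.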