Change the Eigen reduction code to use a tree to improve numerical stability.
This changes InnerMostDimReducer to use a summation tree, which is more numerically stable than the previous approach of sequential addition into an accumulator. This solves the issue for reductions over all dimensions or over a trailing subset of dimensions. It does not improve the numerical accuracy of MeanReducer, which maintains state.

Benchmarks show a 40% (AVX) to 50% (SSE) slowdown for small row reductions (sum, float). Column and full reductions are unchanged.

Cleaned up TensorFunctors.h a bit by moving the traits to reducer_traits and updating the code that uses the reducers accordingly.

Introduced a new trait "IsExactlyAssociative" and new template specializations of InnerMostDimReducer to ensure that the new and slightly more expensive code path is invoked only when it is needed, i.e. for sum reductions of non-integer types.

PiperOrigin-RevId: 198946075