improve handling at end of buffer

A prior change reduced the number of iterations through the input buffer to keep the NEON operations from overrunning the end of the locally allocated buffer. That avoided the overrun but produced bad results. This change instead extends the locally allocated buffers far enough that the original iteration count no longer overruns. Some pre-existing bit-exact issues remain.

Bug: 136616344
Test: CTS + bit-exact cross-checks.
(cherry picked from commit aae866aed579da4e1c3299a1e9b94a1713a0decb)
Merged-In: Ifb790a7d1d09a4ce7da900b43e3fa1f7ab01ac53
Change-Id: Ia4a94c89979d6b3b0ddead135036aec40259f53a
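
For illustration only, the buffer-extension approach described in this message might look roughly like the sketch below. It is not the code from this change: `LANES`, `padded_len`, `scale_block`, and `scale_all` are hypothetical names, and a plain C loop that always touches a full block stands in for the NEON operations. The point it shows is that the local buffers are rounded up to a whole number of blocks, so the original iteration count stays intact and the final block never reads or writes past the allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define LANES 4 /* hypothetical vector width, in elements */

/* Round n up to the next multiple of LANES. */
static size_t padded_len(size_t n) {
    return (n + LANES - 1) / LANES * LANES;
}

/* Stand-in for a vector operation: always touches LANES elements. */
static void scale_block(const int *in, int *out, int gain) {
    for (int i = 0; i < LANES; i++) {
        out[i] = in[i] * gain;
    }
}

/* Scale n elements of src into dst. Returns 0 on success, -1 if
 * allocation fails. The local buffers are padded so that the last
 * full-width block stays inside the allocation. */
static int scale_all(const int *src, int *dst, size_t n, int gain) {
    size_t cap = padded_len(n);
    if (cap == 0) {
        return 0; /* nothing to process */
    }

    int *in = calloc(cap, sizeof *in);   /* padding is zero-filled */
    int *out = calloc(cap, sizeof *out);
    if (in == NULL || out == NULL) {
        free(in);
        free(out);
        return -1;
    }
    memcpy(in, src, n * sizeof *in);

    /* Same iteration count as an unpadded loop: one pass per block
     * that contains at least one real element. */
    for (size_t i = 0; i < n; i += LANES) {
        assert(i + LANES <= cap); /* block fits in the padded buffer */
        scale_block(in + i, out + i, gain);
    }

    memcpy(dst, out, n * sizeof *dst); /* copy back only the real data */
    free(in);
    free(out);
    return 0;
}
```

With 6 input elements and 4 lanes, the buffers hold 8 elements, the loop runs twice, and only the first 6 outputs are copied back. Trimming the loop instead would skip the trailing elements, which is the kind of bad result the earlier change ran into.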