math/big: improve performance of mulAddVWW on ppc64x
This changes the assembly implementation on ppc64x to improve performance by reordering some instructions. It also eliminates an unnecessary move by changing an ADDZE to use the correct target register. Improvement on power9: MulAddVWW/1 6.89ns ± 0% 7.30ns ± 0% +5.95% (p=1.000 n=1+1) MulAddVWW/2 8.04ns ± 0% 8.06ns ± 0% +0.25% (p=1.000 n=1+1) MulAddVWW/3 9.39ns ± 0% 9.39ns ± 0% ~ (all equal) MulAddVWW/4 9.76ns ± 0% 9.48ns ± 0% -2.87% (p=1.000 n=1+1) MulAddVWW/5 10.5ns ± 0% 10.3ns ± 0% -1.90% (p=1.000 n=1+1) MulAddVWW/10 15.4ns ± 0% 14.9ns ± 0% -3.25% (p=1.000 n=1+1) MulAddVWW/100 149ns ± 0% 125ns ± 0% -16.11% (p=1.000 n=1+1) MulAddVWW/1000 1.42µs ± 0% 1.28µs ± 0% -9.74% (p=1.000 n=1+1) MulAddVWW/10000 14.2µs ± 0% 12.8µs ± 0% -9.73% (p=1.000 n=1+1) MulAddVWW/100000 144µs ± 0% 129µs ± 0% -10.10% (p=1.000 n=1+1) Change-Id: I0ae7002a69783ca19d7a4e3e42042ae75dc60069 Reviewed-on: https://go-review.googlesource.com/c/go/+/248721 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by:Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> Reviewed-by:
Paul Murphy <murp@ibm.com>
Loading
Please sign in to comment