Commit 813f8eae authored by Alberto Donizetti's avatar Alberto Donizetti Committed by Robert Griesemer
Browse files

math/big: remove Direct Sqrt computation

The Float.Sqrt method switches (for performance reasons) between
direct (uses Quo) and inverse (doesn't) computation, depending on the
precision, with threshold 128.

Unfortunately the implementation of recursive division in CL 172018
made Quo slightly slower exactly in the range around and below the
threshold Sqrt is using, so this strategy is no longer profitable.

The new division algorithm allocates more, and this has increased the
amount of allocations performed by Sqrt when using the direct method;
on low precisions the computation is fast, so additional allocations
have an negative impact on performance.

Interestingly, only using the inverse method doesn't just reverse the
effects of the Quo algorithm change, but it seems to make performances
better overall for small precisions:

name                 old time/op    new time/op    delta
FloatSqrt/64-4          643ns ± 1%     635ns ± 1%   -1.24%  (p=0.000 n=10+10)
FloatSqrt/128-4        1.44µs ± 1%    1.02µs ± 1%  -29.25%  (p=0.000 n=10+10)
FloatSqrt/256-4        1.49µs ± 1%    1.49µs ± 1%     ~     (p=0.752 n=10+10)
FloatSqrt/1000-4       3.71µs ± 1%    3.74µs ± 1%   +0.87%  (p=0.001 n=10+10)
FloatSqrt/10000-4      35.3µs ± 1%    35.6µs ± 1%   +0.82%  (p=0.002 n=10+9)
FloatSqrt/100000-4      844µs ± 1%     844µs ± 0%     ~     (p=0.549 n=10+9)
FloatSqrt/1000000-4    69.5ms ± 0%    69.6ms ± 0%     ~     (p=0.222 n=9+9)

name                 old alloc/op   new alloc/op   delta
FloatSqrt/64-4           280B ± 0%      200B ± 0%  -28.57%  (p=0.000 n=10+10)
FloatSqrt/128-4          504B ± 0%      248B ± 0%  -50.79%  (p=0.000 n=10+10)
FloatSqrt/256-4          344B ± 0%      344B ± 0%     ~     (all equal)
FloatSqrt/1000-4       1.30kB ± 0%    1.30kB ± 0%     ~     (all equal)
FloatSqrt/10000-4      13.5kB ± 0%    13.5kB ± 0%     ~     (p=0.237 n=10+10)
FloatSqrt/100000-4      123kB ± 0%     123kB ± 0%     ~     (p=0.247 n=10+10)
FloatSqrt/1000000-4    1.83MB ± 1%    1.83MB ± 3%     ~     (p=0.779 n=8+10)

name                 old allocs/op  new allocs/op  delta
FloatSqrt/64-4           8.00 ± 0%      5.00 ± 0%  -37.50%  (p=0.000 n=10+10)
FloatSqrt/128-4          11.0 ± 0%       5.0 ± 0%  -54.55%  (p=0.000 n=10+10)
FloatSqrt/256-4          5.00 ± 0%      5.00 ± 0%     ~     (all equal)
FloatSqrt/1000-4         6.00 ± 0%      6.00 ± 0%     ~     (all equal)
FloatSqrt/10000-4        6.00 ± 0%      6.00 ± 0%     ~     (all equal)
FloatSqrt/100000-4       6.00 ± 0%      6.00 ± 0%     ~     (all equal)
FloatSqrt/1000000-4      10.3 ±13%      10.3 ±13%     ~     (p=1.000 n=10+10)

For example, 1.02µs for FloatSqrt/128 is actually better than what I
was getting on the same machine before the Quo changes.

The .8% slowdown on /1000 and /10000 appears to be real and it is
quite baffling (that codepath was not touched at all); it may be
caused by code alignment changes.

Change-Id: Ib03761cdc1055674bc7526d4f3a23d7a25094029
Reviewed-on: https://go-review.googlesource.com/c/go/+/228062


Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: default avatarRobert Griesemer <gri@golang.org>
parent 435b9dd1
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please to comment