Commit 7ee8467b authored Apr 04, 2020 by Josh Bleecher Snyder

cmd/compile: use MOVBQZX for OpAMD64LoweredHasCPUFeature

In the commit message of CL 212360, I wrote:

> This new intrinsic ... generates MOVB+TESTB+NE.
> (It is possible that MOVBQZX+TESTQ+NE would be better.)

I should have tested. MOVBQZX+TESTQ+NE does in fact appear to be better.

For the benchmark in #36196, on my machine:

name      old time/op  new time/op  delta
FMA-8     0.86ns ± 6%  0.70ns ± 5%  -18.79%  (p=0.000 n=98+97)
NonFMA-8  0.61ns ± 5%  0.60ns ± 4%   -0.74%  (p=0.001 n=100+97)

Interestingly, these are both considerably faster than
the measurements I took a couple of months ago (1.4ns/2ns).
It appears that CL 219131 (clearing VZEROUPPER in asyncPreempt) helped a lot.
And FMA is now once again slower than NonFMA, although this change
helps it regain some ground.
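The FMA/NonFMA comparison above can be approximated locally with a small program. What follows is a hypothetical sketch, not the actual benchmark code from #36196, which may differ. The names fmaLoop, nonFMALoop and timePerOp are illustrative only.

On amd64 builds that cannot assume FMA support, each math.FMA call checks a runtime CPU-feature flag before choosing the hardware instruction or the software fallback. That flag check is the MOVB+TESTB+NE (before) versus MOVBQZX+TESTQ+NE (after) sequence this change concerns.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// Package-level inputs, so the compiler cannot constant-fold the loops away.
var gx, gy, gz = 1.5, 2.5, 0.25

// sink receives results so the timed work stays observable.
var sink float64

// fmaLoop calls math.FMA n times. On amd64 (without an FMA-assuming build
// setting), each call includes the CPU-feature flag check discussed above.
func fmaLoop(x, y, z float64, n int) float64 {
	var r float64
	for i := 0; i < n; i++ {
		r = math.FMA(x, y, z)
	}
	return r
}

// nonFMALoop computes x*y+z with two roundings; the explicit float64
// conversion forbids the compiler from fusing the multiply and add.
func nonFMALoop(x, y, z float64, n int) float64 {
	var r float64
	for i := 0; i < n; i++ {
		r = float64(x*y) + z
	}
	return r
}

// timePerOp is a rough, hypothetical timer returning nanoseconds per
// iteration as a float64 (time.Duration would truncate sub-ns values).
// It is no substitute for `go test -bench` plus benchstat.
func timePerOp(f func(int) float64, n int) float64 {
	start := time.Now()
	sink = f(n)
	return float64(time.Since(start).Nanoseconds()) / float64(n)
}

func main() {
	const n = 10_000_000
	fma := timePerOp(func(k int) float64 { return fmaLoop(gx, gy, gz, k) }, n)
	non := timePerOp(func(k int) float64 { return nonFMALoop(gx, gy, gz, k) }, n)
	fmt.Printf("FMA:    %.2f ns/op\n", fma)
	fmt.Printf("NonFMA: %.2f ns/op\n", non)
}
```

Numbers from a loop like this are noisy. They are only indicative, and the benchstat output above (many runs, with significance testing) is the measurement that matters.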

Updates #15808
Updates #36351
Updates #36196

Change-Id: I8a326289a963b1939aaa7eaa2fab2ec536467c7d
Reviewed-on: https://go-review.googlesource.com/c/go/+/227238

Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>

parent 64f19d70
