Commit b6245cef authored by Michael Munday's avatar Michael Munday
Browse files

internal/bytealg: add SIMD byte count implementation for s390x

Add a 'single lane' SIMD implemementation of the single byte count
function for use on machines that support the vector facility. This
allows up to 16 bytes to be counted per loop iteration.

We can probably improve performance further by adding more 'lanes'
(i.e. counting more bytes in parallel) however this will increase
the complexity of the function so I'm not sure it is worth doing
yet.

name                old speed      new speed       delta
pkg:strings goos:linux goarch:s390x
CountByte/10         789MB/s ± 0%   1131MB/s ± 0%    +43.44%  (p=0.000 n=9+9)
CountByte/32         936MB/s ± 0%   3236MB/s ± 0%   +245.87%  (p=0.000 n=8+9)
CountByte/4096      1.06GB/s ± 0%  21.26GB/s ± 0%  +1907.07%  (p=0.000 n=10+10)
CountByte/4194304   1.06GB/s ± 0%  20.54GB/s ± 0%  +1838.50%  (p=0.000 n=10+10)
CountByte/67108864  1.06GB/s ± 0%  18.31GB/s ± 0%  +1629.51%  (p=0.000 n=10+10)
pkg:bytes goos:linux goarch:s390x
CountSingle/10       800MB/s ± 0%    986MB/s ± 0%    +23.21%  (p=0.000 n=9+10)
CountSingle/32       925MB/s ± 0%   2744MB/s ± 0%   +196.55%  (p=0.000 n=9+10)
CountSingle/4K      1.26GB/s ± 0%  19.44GB/s ± 0%  +1445.59%  (p=0.000 n=10+10)
CountSingle/4M      1.26GB/s ± 0%  20.28GB/s ± 0%  +1510.26%  (p=0.000 n=8+10)
CountSingle/64M     1.23GB/s ± 0%  17.78GB/s ± 0%  +1350.67%  (p=0.000 n=9+10)

Change-Id: I230d57905db92a8fdfc50b1d5be338941ae3a7a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/199979


Run-TryBot: Michael Munday <mike.munday@ibm.com>
Reviewed-by: default avatarKeith Randall <khr@golang.org>
parent d9ee4b28
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please to comment