crypto: x86/aes-xts - optimize _compute_first_set_of_tweaks for AVX-512
Optimize the AVX-512 version of _compute_first_set_of_tweaks by using vectorized shifts to compute the first vector of tweak blocks, and by using byte-aligned shifts when multiplying by x^8. AES-XTS performance on AMD Ryzen 9 9950X (Zen 5) improves by about 2% for 4096-byte messages or 6% for 512-byte messages. AES-XTS performance on Intel Sapphire Rapids improves by about 1% for 4096-byte messages or 3% for 512-byte messages. Code size decreases by 75 bytes which outweighs the increase in rodata size of 16 bytes. Signed-off-by:Eric Biggers <ebiggers@google.com> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au>
Loading
Please sign in to comment