Commit 30e8b66c authored by A. Unique TensorFlower, committed by TensorFlower Gardener

Optimize SparseMatMulOp::Compute in two ways:

 - A significant portion of the time is spent comparing bfloat16s to zero. This is costly because operator==(bfloat16, bfloat16) does two conversions to float. Avoid those conversions by adding a fast test for equality of bfloat16 to zero. This gains 0% to 7% for bfloat16. To get an idea of why this is better, see: https://godbolt.org/z/T1zqvx.
 - Avoid a modulo computation + a switch which is redundant with internal loop checks + resize calls. This gains 0% to 1.5% depending on the benchmark.
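
The idea behind the first change can be sketched as follows. This is a minimal illustration, not the code from this commit: `Bf16`, `FromFloat`, `ToFloat`, `IsZeroViaFloat`, and `IsZeroFast` are hypothetical names, and the sketch assumes bfloat16 is stored as the upper 16 bits of an IEEE-754 float32. Under that assumption, a value is +0 or -0 exactly when every bit except the sign bit is clear, so the test needs no conversion to float.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a bfloat16 type (not TensorFlow's own class):
// it holds the upper 16 bits of an IEEE-754 float32.
struct Bf16 {
  uint16_t bits;
};

// Truncating float -> bfloat16 conversion, used here only to build test values.
inline Bf16 FromFloat(float f) {
  uint32_t u;
  std::memcpy(&u, &f, sizeof(u));
  return Bf16{static_cast<uint16_t>(u >> 16)};
}

// bfloat16 -> float conversion: place the 16 bits in the high half of a float32.
inline float ToFloat(Bf16 b) {
  uint32_t u = static_cast<uint32_t>(b.bits) << 16;
  float f;
  std::memcpy(&f, &u, sizeof(f));
  return f;
}

// Slow path: mimics a generic operator== that converts both operands to float.
inline bool IsZeroViaFloat(Bf16 b) {
  return ToFloat(b) == ToFloat(Bf16{0});
}

// Fast path: mask off the sign bit and compare the remaining 15 bits to zero.
// Both +0 (0x0000) and -0 (0x8000) pass; no float conversion is needed.
inline bool IsZeroFast(Bf16 b) {
  return (b.bits & 0x7FFF) == 0;
}
```

In a sparse inner loop that skips zero entries, a call such as `IsZeroFast(x)` would replace `x == bfloat16(0)`; the two agree for ordinary values, for both signed zeros, and for NaN.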

This is never slower than the previous code and is up to 7% faster.

PiperOrigin-RevId: 231541527
parent 799acd00