Commit 30e8b66c authored Jan 29, 2019 by A. Unique TensorFlower Committed by TensorFlower Gardener Jan 29, 2019

Optimize SparseMatMulOp::Compute in two ways:

 - A significant portion of the time is spent comparing bfloat16s to zero. This is costly because operator==(bfloat16, bfloat16) converts both operands to float. Avoid those conversions by adding a fast test for equality of a bfloat16 to zero. This gains 0% to 7% for bfloat16. To get an idea of why this is better, see: https://godbolt.org/z/T1zqvx.
 - Avoid a modulo computation + a switch which is redundant with internal loop checks + resize calls. This gains 0% to 1.5% depending on the benchmark.
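
The first point can be illustrated with a minimal sketch. It assumes a bfloat16 is stored as the upper 16 bits of an IEEE-754 float; Bf16, ToFloat, IsZeroSlow and IsZeroFast are hypothetical names for illustration, not the identifiers used in this commit. The fast test masks off the sign bit and compares the remaining bits to zero, so +0 and -0 both count as zero and no conversion to float is needed.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a bfloat16 value: the upper 16 bits of a float.
struct Bf16 {
  uint16_t bits;
};

// Widen to float by placing the 16 bits in the high half of a 32-bit word.
inline float ToFloat(Bf16 x) {
  uint32_t wide = static_cast<uint32_t>(x.bits) << 16;
  float f;
  std::memcpy(&f, &wide, sizeof(f));
  return f;
}

// Comparison through float, mirroring the two conversions described above.
inline bool IsZeroSlow(Bf16 x) { return ToFloat(x) == ToFloat(Bf16{0}); }

// Integer-only test: ignore the sign bit (0x8000); every other bit must be 0.
inline bool IsZeroFast(Bf16 x) { return (x.bits & 0x7FFF) == 0; }

// Exhaustively checks that both tests agree on all 65536 bit patterns
// (assumes default floating-point settings, i.e. denormals are not flushed).
inline bool AgreeOnAllBitPatterns() {
  for (uint32_t b = 0; b <= 0xFFFF; ++b) {
    Bf16 x{static_cast<uint16_t>(b)};
    if (IsZeroSlow(x) != IsZeroFast(x)) return false;
  }
  return true;
}
```

Because the type has only 16 bits, the equivalence of the two tests can be checked exhaustively, as AgreeOnAllBitPatterns does.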

This is never slower than the previous code and is up to 7% faster.

PiperOrigin-RevId: 231541527

parent 799acd00
