Optimize SparseMatMulOp::Compute in two ways:
- A significant portion of the time is spent comparing bfloat16s to zero. This is costly because operator==(bfloat16, bfloat16) does two conversions to float. Avoid the conversions by adding a fast test for equality of bfloat16 to zero. This gains 0% to 7% for bfloat16. To get an idea of why this is better, see: https://godbolt.org/z/T1zqvx.
- Avoid a modulo computation + a switch which is redundant with internal loop checks + resize calls. This gains 0% to 1.5% depending on the benchmark.

This is never slower than the previous code and is up to 7% faster.

PiperOrigin-RevId: 231541527
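
For illustration, the idea behind the fast zero test can be sketched as below. This is a minimal standalone sketch, not the code in this change: Bf16, ToFloat, IsZeroViaFloat and IsZeroFast are hypothetical names, and the sketch assumes bfloat16 is stored as the upper 16 bits of an IEEE-754 float, so that +0 and -0 are the only values whose low 15 bits are all clear.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for bfloat16: the upper 16 bits of an IEEE-754 float.
struct Bf16 {
  uint16_t bits;
};

// Widen to float by placing the 16 stored bits in the high half of a 32-bit word.
inline float ToFloat(Bf16 v) {
  const uint32_t wide = static_cast<uint32_t>(v.bits) << 16;
  float f;
  std::memcpy(&f, &wide, sizeof(f));
  return f;
}

// Slow path: mimics an operator== that converts both operands to float.
inline bool IsZeroViaFloat(Bf16 v) {
  return ToFloat(v) == ToFloat(Bf16{0});
}

// Fast path: ignore the sign bit and test the remaining 15 bits directly.
// Both +0 (0x0000) and -0 (0x8000) pass; no float conversion is needed.
constexpr bool IsZeroFast(Bf16 v) {
  return (v.bits & 0x7FFF) == 0;
}
```

The fast path is a single mask and compare on the raw bits, while the slow path has to widen both operands before doing a floating-point comparison.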