Improve fast_tensor_util for bfloat16
In 19180, fast_tensor_util was sped up for `float16`.
Since `float16` and `bfloat16` have the same size
(16 bits), `bfloat16` can be sped up as well.
This fix speeds up `bfloat16` in a similar fashion
to `float16`.
This fix is related to 19180.
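
The sketch below illustrates why the same approach can apply to both types: each value is 16 bits wide, so it can be carried as an unsigned 16-bit integer. This is an assumption-laden illustration, not the actual fast_tensor_util code; the helper names (`float16_bits`, `bfloat16_bits`, `append_half_vals`) are hypothetical, and the bfloat16 conversion here simply truncates a float32 rather than rounding.

```python
import struct


def float16_bits(x):
    """Return the 16-bit pattern of x as an IEEE half-precision float."""
    # struct's "e" format packs a Python float into 2 bytes (float16).
    return struct.unpack("<H", struct.pack("<e", x))[0]


def bfloat16_bits(x):
    """Return the 16-bit pattern of x as bfloat16 (truncation, no rounding)."""
    # bfloat16 keeps the sign, the 8 exponent bits, and the top 7 mantissa
    # bits of a float32, i.e. the upper 16 bits of its 32-bit pattern.
    return struct.unpack("<I", struct.pack("<f", x))[0] >> 16


def append_half_vals(half_val, values, to_bits):
    """Append the 16-bit patterns of values to half_val.

    half_val is a plain list standing in (hypothetically) for a repeated
    integer field; the same loop serves both 16-bit float types.
    """
    for v in values:
        half_val.append(to_bits(v))
    return half_val


if __name__ == "__main__":
    print([hex(b) for b in append_half_vals([], [1.0, -2.0], float16_bits)])
    print([hex(b) for b in append_half_vals([], [1.0, -2.0], bfloat16_bits)])
```

Because both conversions yield values that fit in 16 bits, a single append path can handle either type; only the bit-conversion step differs.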
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>