Commit db7cb475, authored by Bixia Zheng, committed by TensorFlower Gardener

[XLA:GPU] Work around the LLVM PTX backend bug for llvm.round.

The llvm.round intrinsic and the HLO RoundNearestAfz instruction have the same
semantics: both round to the nearest integer, with ties rounded away from zero.
For that reason, we previously translated the RoundNearestAfz HLO instruction
to the llvm.round intrinsic. However, the LLVM PTX backend currently lowers
llvm.round to PTX cvt.rni, which rounds to the even integer when the source is
equidistant between two integers. This change translates the RoundNearestAfz HLO
instruction to a call of the NVIDIA libdevice routine __nv_round_ instead, to
work around this LLVM PTX backend bug, which we are going to fix later. This
workaround may be preferable to the LLVM PTX backend fix for the XLA use case,
as expanding the non-trivial llvm.round implementation early allows better
optimizations.
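
To make the mismatch concrete, here is a minimal host-side sketch in Python of the two rounding modes. It is not code from this change: the helper names `round_nearest_afz` and `round_nearest_even` are hypothetical, and the functions only model the documented tie-breaking rules of llvm.round and cvt.rni rather than calling either one.

```python
import math


def round_nearest_afz(x: float) -> float:
    """Round to the nearest integer, ties away from zero (llvm.round semantics)."""
    if math.isnan(x) or math.isinf(x):
        return x
    t = float(math.trunc(x))
    # x - t is the exact fractional part, so the tie check has no rounding error.
    if abs(x - t) >= 0.5:
        t += math.copysign(1.0, x)
    # Keep the sign of the input so that, e.g., -0.25 rounds to -0.0.
    return math.copysign(t, x)


def round_nearest_even(x: float) -> float:
    """Round to the nearest integer, ties to even (the cvt.rni tie rule)."""
    if math.isnan(x) or math.isinf(x):
        return x
    # Python's one-argument round() uses round-half-to-even.
    return math.copysign(float(round(x)), x)


# The two modes differ only on exact ties whose truncated value is even:
#    2.5 -> afz  3.0, even  2.0
#   -2.5 -> afz -3.0, even -2.0
#    3.5 -> afz  4.0, even  4.0 (no difference)
ties = [0.5, 1.5, 2.5, -2.5, 3.5]
differing = [x for x in ties if round_nearest_afz(x) != round_nearest_even(x)]
```

Only inputs that sit exactly halfway between two integers are affected, which is why the bug is easy to miss without a test that covers those values.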

Add an exhaustive test for RoundNearestAfz.
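
The test added by this change is not shown here. As an illustration of what "exhaustive" means for a rounding routine, the following Python sketch walks every bit pattern of a floating-point type and compares a candidate against a reference. It assumes a 16-bit half-precision domain purely to keep the loop small, and all names (`half_from_bits`, `count_mismatches`, `ties_to_even`) are hypothetical.

```python
import math
import struct


def round_nearest_afz(x: float) -> float:
    """Reference: round to the nearest integer, ties away from zero."""
    if math.isnan(x) or math.isinf(x):
        return x
    t = float(math.trunc(x))
    if abs(x - t) >= 0.5:  # x - t is exact, so ties are detected reliably
        t += math.copysign(1.0, x)
    return math.copysign(t, x)


def ties_to_even(x: float) -> float:
    """A deliberately wrong candidate that breaks ties toward even."""
    if math.isnan(x) or math.isinf(x):
        return x
    return math.copysign(float(round(x)), x)


def half_from_bits(bits: int) -> float:
    """Reinterpret a 16-bit pattern as an IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def count_mismatches(candidate, reference=round_nearest_afz) -> int:
    """Compare candidate and reference on every half-precision bit pattern."""
    mismatches = 0
    for bits in range(1 << 16):
        x = half_from_bits(bits)
        got, want = candidate(x), reference(x)
        if math.isnan(want):
            ok = math.isnan(got)
        else:
            # Also compare signs so that +0.0 and -0.0 are distinguished.
            ok = got == want and (
                math.copysign(1.0, got) == math.copysign(1.0, want)
            )
        if not ok:
            mismatches += 1
    return mismatches
```

Calling `count_mismatches(ties_to_even)` returns a nonzero count because the candidate disagrees with the reference on exact ties, while a correct candidate yields zero. Sweeping the whole input domain catches such tie cases without having to enumerate them by hand.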

PiperOrigin-RevId: 235610143
parent 1ee0db9c