[XLA:GPU] Work around the LLVM PTX backend bug for llvm.round.
The llvm.round intrinsic and the HLO RoundNearestAfz instruction have the same semantics: both round to the nearest integer, with ties rounded away from zero. As such, we previously translated the RoundNearestAfz HLO instruction to the llvm.round intrinsic. However, the PTX LLVM backend currently translates llvm.round to PTX cvt.rni, which rounds to the even integer when the source is equidistant between two integers.

This change translates the RoundNearestAfz HLO instruction to a call of the NVIDIA libdevice routine __nv_round_ instead, to work around this LLVM PTX backend bug, which we are going to fix later. This workaround may be preferable to the LLVM PTX backend fix for the XLA use case, as expanding the non-trivial llvm.round implementation early allows better optimizations.

Add an exhaustive test for RoundNearestAfz.

PiperOrigin-RevId: 235610143
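The two rounding behaviours described above differ only on exact ties. The following is a minimal Python sketch of that difference, not the XLA or libdevice implementation: `round_half_away_from_zero` and `round_half_to_even` are hypothetical helper names introduced here for illustration, and the sketch assumes finite inputs.

```python
import math


def round_half_away_from_zero(x: float) -> float:
    """Round a finite float to the nearest integer, ties away from zero.

    Illustrates the semantics the commit attributes to llvm.round and
    RoundNearestAfz. Hypothetical helper, assumes x is finite.
    """
    truncated = float(math.trunc(x))
    # The fractional distance x - truncated is computed exactly for floats,
    # so comparing it to 0.5 identifies ties without rounding error.
    if abs(x - truncated) >= 0.5:
        truncated += math.copysign(1.0, x)
    return truncated


def round_half_to_even(x: float) -> float:
    """Round a finite float to the nearest integer, ties to the even integer.

    Illustrates the tie behaviour the commit attributes to PTX cvt.rni.
    Python's built-in round() uses this rule when called with one argument.
    """
    return float(round(x))


if __name__ == "__main__":
    # The two functions agree everywhere except on some exact ties.
    for value in (0.5, 1.5, 2.5, -0.5, -2.5, 2.4, -2.6):
        print(value, round_half_away_from_zero(value), round_half_to_even(value))
```

Inputs such as 0.5 and 2.5 are where the two rules disagree, which is why a test that covers every tie value is useful for catching this kind of mistranslation.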