Commit 5f2c44f6, authored by Benjamin Kramer, committed by TensorFlower Gardener

[XLA:GPU] Make the input-fused reduce emitter work on 16-bit types

There's a bunch of things going on here:
- BuildInitializerThunk threw away half of 16-bit init values. Fix that.
- Make HandleFusion verify that it gets input-fusible reduces.
- Fuse BF16 again in multi-output fusion. Excluding it was a workaround for the initializer bug.
- Drop the 32-bit requirement from unfused reduce emission. It is really confusing to have different code paths for fused and unfused reduces.
- Emit 8/16-bit integer add/min/max as CAS.
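
To illustrate the last item, here is a minimal host-side sketch of emulating a 16-bit atomic add/min/max with a compare-and-swap loop on the containing 32-bit word. It is not the XLA emitter, which generates LLVM IR for the GPU. All names (AtomicRmw16, AtomicAdd16, AtomicMin16, AtomicMax16, ReadHalf) are hypothetical, std::atomic stands in for the GPU atomics, and the layout of two 16-bit values per 32-bit word is an assumption made for the example.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not XLA code): atomically applies `op` to one 16-bit
// half of a 32-bit word. half_index 0 selects the low 16 bits, 1 the high 16.
template <typename Op>
void AtomicRmw16(std::atomic<uint32_t>& word, int half_index, int16_t operand,
                 Op op) {
  assert(half_index == 0 || half_index == 1);
  const int shift = half_index * 16;
  const uint32_t mask = 0xFFFFu << shift;
  uint32_t old_word = word.load();
  uint32_t new_word;
  do {
    // Extract the current 16-bit value from the snapshot of the full word.
    const int16_t old_val =
        static_cast<int16_t>(static_cast<uint16_t>((old_word & mask) >> shift));
    const int16_t new_val = op(old_val, operand);
    // Splice the result back in, leaving the neighbouring half untouched.
    new_word = (old_word & ~mask) |
               (static_cast<uint32_t>(static_cast<uint16_t>(new_val)) << shift);
    // On failure, compare_exchange_weak reloads old_word and the loop retries.
  } while (!word.compare_exchange_weak(old_word, new_word));
}

void AtomicAdd16(std::atomic<uint32_t>& word, int half_index, int16_t v) {
  AtomicRmw16(word, half_index, v, [](int16_t a, int16_t b) {
    return static_cast<int16_t>(a + b);  // wraps on overflow in practice
  });
}

void AtomicMin16(std::atomic<uint32_t>& word, int half_index, int16_t v) {
  AtomicRmw16(word, half_index, v,
              [](int16_t a, int16_t b) { return std::min(a, b); });
}

void AtomicMax16(std::atomic<uint32_t>& word, int half_index, int16_t v) {
  AtomicRmw16(word, half_index, v,
              [](int16_t a, int16_t b) { return std::max(a, b); });
}

// Reads one 16-bit half back out of the word.
int16_t ReadHalf(const std::atomic<uint32_t>& word, int half_index) {
  return static_cast<int16_t>(
      static_cast<uint16_t>(word.load() >> (half_index * 16)));
}
```

The loop only ever writes a full 32-bit word, so an update to one half cannot clobber a concurrent update to the other half: a conflicting write makes the compare-and-swap fail and the operation is recomputed from the fresh value.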

This is somewhat covered by existing tests.

PiperOrigin-RevId: 202125572
parent c5feedab