[TF:XLA] Register the standard Stack kernels on XLA_... devices.
Refactor the core stack ops to split the op classes from their registrations on CPU/GPU/SYCL devices.

Refactor the stack push op to be templated on an allow_swapping bool rather than on a specific device type. The device type was only ever used in a type-equality test to decide whether to swap.

Previously, on XLA_... devices, stack operators worked only when the entire computation was grouped into a single cluster (e.g., via xla.compile()). This change also allows stack-using operators to work in "ondemand" or eager modes, where ops run one at a time.

However, because the compiled and interpreted representations of stacks still differ, there is not yet any support for passing stacks into or out of compiled blocks. Stack usage must remain entirely inside or entirely outside a compiled block until a future change rectifies this.

PiperOrigin-RevId: 220132340
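The allow_swapping refactor described above can be illustrated with a minimal, standalone sketch. It is not the TensorFlow code: Element, StackPushSketch, swap_memory, and the two aliases are hypothetical names chosen for illustration, and the "swap" is reduced to setting a flag. The sketch only shows the shape of the change, where the swap decision becomes a compile-time bool instead of a comparison against a device type.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stack entry; not TensorFlow's real type.
struct Element {
  std::string payload;
  bool swapped_to_host;
};

// Sketch of a push op templated on a bool instead of a device type.
// An earlier design would have tested the device type for equality with
// one specific device; here that decision is a template parameter.
template <bool allow_swapping>
class StackPushSketch {
 public:
  // swap_memory models a per-op request to swap (an assumption here).
  explicit StackPushSketch(bool swap_memory) : swap_memory_(swap_memory) {}

  void Push(std::vector<Element>& stack, std::string payload) const {
    Element e;
    e.payload = std::move(payload);
    // Swapping happens only if this instantiation permits it AND the op
    // asked for it. The real op would copy data to host memory; this
    // sketch only records the decision.
    e.swapped_to_host = allow_swapping && swap_memory_;
    stack.push_back(std::move(e));
  }

 private:
  bool swap_memory_;
};

// Each device registration picks an instantiation. Which devices use
// which value is not specified here; these aliases are illustrative.
using NoSwapPush = StackPushSketch<false>;
using SwapPush = StackPushSketch<true>;
```

With this shape, registering the op on a new device means choosing one of the two instantiations, so the op class no longer needs to know about any particular device type.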