Commit 0f13eb91 authored by Benjamin Kramer's avatar Benjamin Kramer Committed by TensorFlower Gardener
Browse files

[XLA:GPU] Elide tuple roots of the entry computation

The tuple buffer is never read, so stop emitting code to fill it. A typical
root tuple consists of a H2D memcpy and a host callback, both of which are
somewhat slow.

This helps tiny models and inference benchmarks, where the host/device syncs
can be a significant part of the runtime of the entire computation.

PiperOrigin-RevId: 216968475
parent 3e94e19e
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please to comment