Avoid excessive CPU<->GPU memory swaps by computing shape ops on the CPU.

This results in a 10% performance improvement in training step times and a 37% performance improvement in decoding for the tensor2tensor Transformer model.

PiperOrigin-RevId: 212804933