Commit 3a1df26a authored by A. Unique TensorFlower, committed by TensorFlower Gardener

Pass the device ordinal to use for execution to the XLA compiler for auto-tuning.

Previously, when compiling a graph for multiple devices concurrently, XLA
would use the default device for auto-tuning. With this patch, tf_cnn_benchmark
with the resnet50 model finishes on 8 V100s at batch size 128 and gets a
speedup of ~20% over a single V100; the next steps are to get it to run at
batch size 256 and to scale well.
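
The behavior change can be illustrated with a minimal sketch, assuming a
simplified model of the compile path. `CompileOptions`, `autotune_device`,
`compile_for_devices`, and `DEFAULT_DEVICE_ORDINAL` are hypothetical names
chosen for illustration; they are not XLA's actual API, and the real patch may
be structured differently.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical: the ordinal used when the caller does not specify one.
DEFAULT_DEVICE_ORDINAL = 0


@dataclass(frozen=True)
class CompileOptions:
    """Hypothetical compile options; not the real XLA structure."""
    device_ordinal: Optional[int] = None  # None means "not specified"


def autotune_device(options: CompileOptions) -> int:
    """Return the device ordinal on which auto-tuning would run."""
    if options.device_ordinal is None:
        # Behavior before the patch: fall back to the default device.
        return DEFAULT_DEVICE_ORDINAL
    # Behavior after the patch: tune on the device that will execute the graph.
    return options.device_ordinal


def compile_for_devices(num_devices: int, pass_ordinal: bool) -> List[int]:
    """Return the auto-tuning device chosen for each per-device compilation."""
    chosen = []
    for ordinal in range(num_devices):
        options = CompileOptions(device_ordinal=ordinal if pass_ordinal else None)
        chosen.append(autotune_device(options))
    return chosen
```

In this sketch, `compile_for_devices(8, pass_ordinal=False)` picks the default
device for every compilation, while `compile_for_devices(8, pass_ordinal=True)`
picks each compilation's own target device.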

PiperOrigin-RevId: 206720140
parent 78f58629