Commit 723f285b authored by Justin Lebar, committed by TensorFlower Gardener

[XLA] Improvements to replay_computation tool.

 * Reduce threshold at which we run fake-data generation on the device
   from 1 GB to 1 MB.  At the old threshold, I observed cases where
   we'd spend many seconds, and >50% of our runtime, in logf(), used for
   computing random numbers.

 * Don't retrieve or print the result when running with fake data.
   Presumably this is uninteresting, because garbage in, garbage out.
   Retrieving this data can take as long as running the whole
   computation, and printing it can take many times longer.

 * Add a LOG(INFO) indicating how long execution took.

 * Add a --num_runs flag.  This is particularly important on GPUs, where
   the first run does autotuning, and so isn't interesting from a
   performance perspective.
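
The timing and --num_runs behavior described above can be illustrated with a minimal sketch. This is not the tool's actual code (replay_computation is a C++ tool); it is a Python illustration under stated assumptions. The names time_runs, run_once, and steady_state are hypothetical, and run_once stands in for whatever executes the computation once.

```python
import time
from typing import Callable, List


def time_runs(run_once: Callable[[], object], num_runs: int) -> List[float]:
    """Call run_once num_runs times; return each run's wall time in seconds.

    Hypothetical helper: run_once is a stand-in for executing the computation.
    """
    if num_runs < 1:
        raise ValueError("num_runs must be at least 1")
    durations = []
    for i in range(num_runs):
        start = time.perf_counter()
        run_once()  # Result is discarded, mirroring the fake-data case.
        elapsed = time.perf_counter() - start
        durations.append(elapsed)
        print(f"run {i}: execution took {elapsed:.6f}s")
    return durations


def steady_state(durations: List[float]) -> List[float]:
    """Drop the first run (warm-up, e.g. autotuning) if there are later runs."""
    return durations[1:] if len(durations) > 1 else durations
```

With more than one run, the first duration can be set aside as warm-up and the remaining ones compared, which is the reason the commit gives for wanting multiple runs on GPUs.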

PiperOrigin-RevId: 177185636
parent c81a8ae5