[XLA] Improvements to replay_computation tool.

* Reduce threshold at which we run fake-data generation on the device from
  1GB to 1MB. At the old threshold, I observed cases where we'd spend many
  seconds, and >50% of our runtime, in logf(), used for computing random
  numbers.
* Don't retrieve or print the result when running with fake data.
  Presumably this is uninteresting, because garbage in, garbage out.
  Retrieving this data can take as long as running the whole computation,
  and printing it can take many times longer.
* Add a LOG(INFO) indicating how long execution took.
* Add a --num_runs flag. This is particularly important on GPUs, where the
  first run does autotuning, and so isn't interesting from a performance
  perspective.

PiperOrigin-RevId: 177185636