Add GPU explicit padding to tf.nn.conv2d.
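
For context, explicit padding means the caller passes the amount of zero padding at the start and end of each dimension as a list of pairs, rather than the strings "SAME" or "VALID". The sketch below is illustrative only and is not part of this change: the helper names are hypothetical, and it assumes the NHWC layout, where the batch and channel entries are zero. It builds such a list and computes the resulting output size along one spatial dimension.

```python
def explicit_padding_nhwc(pad_top, pad_bottom, pad_left, pad_right):
    """Build a per-dimension padding list for an NHWC input.

    Hypothetical helper for illustration. Batch and channel
    dimensions are not padded, so their entries are [0, 0].
    """
    return [[0, 0], [pad_top, pad_bottom], [pad_left, pad_right], [0, 0]]


def conv_output_size(input_size, filter_size, stride, pad_before, pad_after):
    """Output size along one spatial dimension (no dilation assumed)."""
    padded = input_size + pad_before + pad_after
    return (padded - filter_size) // stride + 1


# Example: a 7x7 filter with stride 2 on a 224x224 image, padded by 3
# on every side.
padding = explicit_padding_nhwc(3, 3, 3, 3)
out_height = conv_output_size(224, 7, 2, 3, 3)
```

A list of this shape is what would be passed as the `padding` argument of tf.nn.conv2d in place of a manual tf.pad followed by a "VALID" convolution.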

Benchmark results:

All benchmarks were run on a Z840 with a Titan V, using internal TensorFlow.

1. Resnet50 Eager results

The internal Resnet50 Eager benchmarks were run to check that the extra Python overhead this change adds causes no regression for Resnet50 in Eager mode. The benchmarks run are here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/eager/python/examples/resnet50/resnet50_test.py. Each row was run 150 times and the average was taken. Note that none of these benchmarks use explicit padding. Numbers represent time, so lower is better.

Benchmark Name                                          After    Before   % diff
apply_async_gpu_batch_64_channels_first                 0.0726   0.0726   -0.06%
apply_gpu_batch_64_channels_first                       0.0725   0.0725   -0.06%
apply_with_defun_gpu_batch_64_channels_first            0.0755   0.0756    0.07%
train_async_gpu_batch_16_channels_first                 0.0776   0.0778    0.27%
train_async_gpu_batch_32_channels_first                 0.1268   0.1271    0.23%
train_dataset_gpu_batch_16_channels_first               0.1085   0.1094    0.77%
train_dataset_gpu_batch_32_channels_first               0.1473   0.1477    0.28%
train_dataset_with_defun_gpu_batch_16_channels_first    0.0800   0.0803    0.37%
train_dataset_with_defun_gpu_batch_32_channels_first    0.1325   0.1326    0.09%
train_gpu_batch_16_channels_first                       0.0812   0.0813    0.18%
train_gpu_batch_32_channels_first                       0.1329   0.1325   -0.32%
train_with_defun_gpu_batch_16_channels_first            0.0789   0.0791    0.26%
train_with_defun_gpu_batch_32_channels_first            0.1325   0.1325   -0.02%

There is minimal impact on Eager performance.

2. tf_cnn_benchmarks

tf_cnn_benchmarks was run internally with the following flags: --batch_size=128 --model=resnet50

It was run 60 times with and without this change. With this change, tf_cnn_benchmarks had all instances of a tf.pad followed by a Conv2D replaced with an explicitly padded Conv2D. It got 330.96 images/sec with this change and 330.80 without, and the difference is likely noise. Therefore, this change does not improve tf_cnn_benchmarks performance.

3. Conv2D benchmarks

The benchmarks added to conv_ops_test.py were run with this change, each 400 times, and the average was taken. They were not run without this change. The table groups the 8 benchmarks into 4 pairs, with each pair running two similar benchmarks: one with explicit padding and one without.

Benchmark name                 Explicit   Non-explicit   % diff
explicit/manual pad forward    0.001815   0.002006       10.56%
explicit/manual pad backward   0.006261   0.006937       10.79%
eager explicit/same pad        0.039320   0.038403       -2.33%
graph explicit/same pad        0.037039   0.037034       -0.11%

The first two rows show there are theoretical performance gains from using explicit padding over a manual tf.pad followed by the convolution. On Resnet50, we were not able to achieve this performance gain in practice, as tf_cnn_benchmarks saw no improvement. On models that use larger paddings than Resnet50, the performance gain will be less negligible.

The last two rows compare explicit padding to the equivalent SAME padding, to see if explicit padding adds any overhead. In Graph mode there is no overhead, but in Eager mode explicit padding has some overhead over SAME padding.

PiperOrigin-RevId: 228439591
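
The comparison between explicit padding and the equivalent SAME padding relies on knowing which explicit amounts reproduce SAME. As a minimal sketch, assuming no dilation and using a hypothetical helper name (this is not code from the change or from conv_ops_test.py), the SAME padding along one spatial dimension can be computed as follows. It follows the usual TensorFlow convention that any odd leftover padding goes at the end.

```python
def same_padding(input_size, filter_size, stride):
    """Return (pad_before, pad_after) equivalent to SAME padding.

    Hypothetical helper for illustration; assumes no dilation.
    """
    # SAME padding produces ceil(input_size / stride) outputs.
    output_size = -(-input_size // stride)
    total = max((output_size - 1) * stride + filter_size - input_size, 0)
    pad_before = total // 2
    # Any odd remainder is placed at the end of the dimension.
    return pad_before, total - pad_before


# A 3x3 filter with stride 1 pads one element on each side.
pads_3x3 = same_padding(5, 3, 1)
# A 7x7 filter with stride 2 on a 224-wide input pads unevenly.
pads_7x7 = same_padding(224, 7, 2)
```

Passing these amounts as explicit padding should give the same output as SAME padding, so a pair of such runs isolates the overhead of the explicit-padding code path itself.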