Commit a21dcdfc, authored Jan 25, 2017 by Brennan Saeta; committed by TensorFlower Gardener Jan 25, 2017

Improve the performance of the resize_bicubic_op.

This change results in a >2x speedup when scaling up an image and a ~1.6x speedup when scaling down an image. (Based on the benchmarks defined in image_ops_test.py.)

Additionally, we preserve the old behavior in a unit test, and ensure the new results do not deviate from it by more than 1e-5. (The computations are the same, but we've reordered them, so floating-point inaccuracies crop up.)

The two biggest performance wins come from:
1. Instead of storing values in array<float, 4>, manage them as individual variables. (This allows the compiler to avoid pushing things onto the stack and instead use registers.)
2. Cache previously computed intermediate values to avoid having to fetch and re-compute.
Change: 145577417
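
The two ideas above can be illustrated with a minimal 1-D sketch. This is not the code from resize_bicubic_op: the names (CountingRow, ResizeRowNaive, ResizeRowCached) are hypothetical, the Catmull-Rom weights are an assumed stand-in for the op's actual kernel, and Load() stands in for whatever intermediate value is expensive to fetch or compute. The cached version keeps its four taps in plain float variables and reuses them when consecutive output pixels map to the same or the next source position.

```cpp
#include <cmath>
#include <utility>
#include <vector>

// Four interpolation weights held as plain floats rather than array<float, 4>.
struct Weights { float w0, w1, w2, w3; };

// Catmull-Rom cubic weights for fractional offset t in [0, 1).
// Assumption: an illustrative kernel, not necessarily the one the op uses.
inline Weights CatmullRomWeights(float t) {
  const float t2 = t * t;
  const float t3 = t2 * t;
  return {0.5f * (-t3 + 2.0f * t2 - t),
          0.5f * (3.0f * t3 - 5.0f * t2 + 2.0f),
          0.5f * (-3.0f * t3 + 4.0f * t2 + t),
          0.5f * (t3 - t2)};
}

// Hypothetical source row that counts how often a value is fetched.
struct CountingRow {
  std::vector<float> values;
  int loads;
  explicit CountingRow(std::vector<float> v) : values(std::move(v)), loads(0) {}
  // Fetches values[x], clamping x to the valid range at the borders.
  float Load(int x) {
    ++loads;
    const int last = static_cast<int>(values.size()) - 1;
    return values[x < 0 ? 0 : (x > last ? last : x)];
  }
};

// Baseline: fetch all four taps again for every output pixel.
std::vector<float> ResizeRowNaive(CountingRow& row, int out_width) {
  const float scale = static_cast<float>(row.values.size()) / out_width;
  std::vector<float> out(out_width);
  for (int x = 0; x < out_width; ++x) {
    const float in_x = x * scale;
    const int base = static_cast<int>(std::floor(in_x));
    const Weights w = CatmullRomWeights(in_x - base);
    out[x] = w.w0 * row.Load(base - 1) + w.w1 * row.Load(base) +
             w.w2 * row.Load(base + 1) + w.w3 * row.Load(base + 2);
  }
  return out;
}

// Cached: keep the four taps in scalar variables and reuse them across
// neighbouring output pixels instead of fetching them again.
std::vector<float> ResizeRowCached(CountingRow& row, int out_width) {
  const float scale = static_cast<float>(row.values.size()) / out_width;
  std::vector<float> out(out_width);
  float v0 = 0.0f, v1 = 0.0f, v2 = 0.0f, v3 = 0.0f;
  bool have_window = false;
  int cached_base = 0;
  for (int x = 0; x < out_width; ++x) {
    const float in_x = x * scale;
    const int base = static_cast<int>(std::floor(in_x));
    const Weights w = CatmullRomWeights(in_x - base);
    if (have_window && base == cached_base) {
      // Same source window as the previous pixel: nothing to fetch.
    } else if (have_window && base == cached_base + 1) {
      // Window moved by one: shift three taps and fetch only the new one.
      v0 = v1; v1 = v2; v2 = v3;
      v3 = row.Load(base + 2);
    } else {
      // First pixel or a larger jump: fetch the whole window.
      v0 = row.Load(base - 1); v1 = row.Load(base);
      v2 = row.Load(base + 1); v3 = row.Load(base + 2);
    }
    have_window = true;
    cached_base = base;
    out[x] = w.w0 * v0 + w.w1 * v1 + w.w2 * v2 + w.w3 * v3;
  }
  return out;
}
```

In this sketch the saving is largest when scaling up, because several consecutive output pixels then fall into the same source window. Comparing the two versions within a small tolerance, rather than for exact equality, mirrors the 1e-5 check described above.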

parent 2d2ca484