Commit 418c7258 authored Sep 11, 2018 by Eugene Zhulenev, committed by TensorFlower Gardener on Sep 11, 2018

Optimize Spatial & Cuboid backward kernel convolutions.

Without a shuffle, TensorExecutor uses the optimized (specialized) gemm_pack_rhs to pack memory before the contraction. The custom RHS packer is much faster than contracting over the inner dimension with the default packer.
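To illustrate why packing the right-hand side (RHS) before a contraction helps, here is a toy C++ sketch of the general technique. It is not Eigen's gemm_pack_rhs and does not model the convolution-specific packer from this commit. The names `pack_rhs`, `gemm_packed`, and the panel width `kNr` are hypothetical. The packer copies B into panels of `kNr` columns so that, for each step along the contraction dimension k, the inner kernel reads `kNr` contiguous values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical panel width; a real GEMM picks this to match SIMD registers.
constexpr std::size_t kNr = 4;

// Toy RHS packer (illustrative only, not Eigen's gemm_pack_rhs).
// B is K x N, column-major with leading dimension ldb: (k, n) is at b[k + n * ldb].
// Output: panels of kNr columns; inside a panel, the kNr values for each k
// are contiguous, so the kernel walks packed memory sequentially.
std::vector<float> pack_rhs(const float* b, std::size_t ldb, std::size_t K,
                            std::size_t N) {
  assert(ldb >= K);
  const std::size_t panels = (N + kNr - 1) / kNr;
  std::vector<float> packed(panels * K * kNr, 0.0f);  // last panel zero-padded
  for (std::size_t p = 0; p < panels; ++p)
    for (std::size_t k = 0; k < K; ++k)
      for (std::size_t j = 0; j < kNr; ++j) {
        const std::size_t n = p * kNr + j;
        if (n < N) packed[(p * K + k) * kNr + j] = b[k + n * ldb];
      }
  return packed;
}

// C (M x N, row-major) = A (M x K, row-major) * B, reading B only via the packed buffer.
std::vector<float> gemm_packed(const std::vector<float>& a,
                               const std::vector<float>& packed, std::size_t M,
                               std::size_t K, std::size_t N) {
  assert(K > 0 && a.size() == M * K);
  const std::size_t panels = (N + kNr - 1) / kNr;
  assert(packed.size() == panels * K * kNr);
  std::vector<float> c(M * N, 0.0f);
  for (std::size_t m = 0; m < M; ++m)
    for (std::size_t p = 0; p < panels; ++p) {
      float acc[kNr] = {0.0f};  // one accumulator per column in the panel
      const float* panel = &packed[p * K * kNr];
      for (std::size_t k = 0; k < K; ++k) {
        const float a_mk = a[m * K + k];
        for (std::size_t j = 0; j < kNr; ++j) acc[j] += a_mk * panel[k * kNr + j];
      }
      // Write back only real columns; padded ones are discarded.
      for (std::size_t j = 0; j < kNr && p * kNr + j < N; ++j)
        c[m * N + p * kNr + j] = acc[j];
    }
  return c;
}
```

Zero-padding the final panel keeps the inner loop free of bounds checks. The cost is a one-time copy of B, which pays off when the packed panel is reused across many rows of A.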

1. CuboidConvolutionBwdKernel: ~10x-25x speedup
2. SpatialConvolutionBwdKernel: ~2x-10x speedup

PiperOrigin-RevId: 212506483

parent dad6912b