Cudnn RNN v2 kernels with autotune capability
CudnnRNN V2 kernels run all applicable cudnn rnn algorithms and pick the best one for following runs.
* To enable autotune, TF_CUDNN_RNN_USE_AUTOTUNE and TF_CUDNN_RNN_USE_V2 need to be set to {"1" or unset}.
* TF_CUDNN_RNN_USE_AUTOTUNE does not work with existing CudnnRNN kernels.
* V2 kernels work with existing cudnn checkpoints, since it doesn't change persistence format.
This change
* Introduces v2 kernels as templates inheriting the v1 kernels.
* Profiles fwd and bak runs in v2 kernel (forward pass)
* Exposes the chosen algorithm as fwd op output and bak op input.
* Changes rnn descriptor cache key to include AlgorithmDesc (since cudnn rnn descriptor can't be reused across different algorithms)
* Updates unittests s.t. it tests both v1 and v2 kernels. When testing v2 kernels, autotune is turned on.
PiperOrigin-RevId: 194333948
Loading
Please sign in to comment