LARS Optimizer in TensorFlow
Based on CL from Chris Ying and contributions from Y. You and Wang Tao.

Introduced by "Large Batch Training of Convolutional Networks" by Y. You, I. Gitman, and B. Ginsburg (https://arxiv.org/abs/1708.03888).

Implements the LARS learning rate scheme presented in the paper above. This optimizer is useful when scaling the batch size up to 32K without significant performance degradation. It is recommended to use the optimizer in conjunction with:
- Gradual learning rate warm-up
- Linear learning rate scaling
- Poly rule learning rate decay

With this optimizer, ResNet-50 now converges to 76.3% top-1 accuracy at batch size 32K on a JF Pod.

PiperOrigin-RevId: 208914187
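For readers unfamiliar with the scheme, the following is a minimal pure-Python sketch of the layer-wise learning rate from the paper, together with the three recommended schedule components. It is not the code in this change. All function names, parameter names, and default values (`trust_coeff`, `base_batch`, `power`, and so on) are hypothetical and chosen for illustration, and the fallback to a ratio of 1.0 when a norm is zero is an assumption of this sketch.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def lars_local_lr(weights, grads, trust_coeff=0.001, weight_decay=0.0):
    """Layer-wise rate: trust_coeff * ||w|| / (||g|| + weight_decay * ||w||).

    Assumption of this sketch: fall back to 1.0 when either norm is zero,
    so that the global learning rate is used unchanged.
    """
    w_norm = l2_norm(weights)
    g_norm = l2_norm(grads)
    if w_norm > 0.0 and g_norm > 0.0:
        return trust_coeff * w_norm / (g_norm + weight_decay * w_norm)
    return 1.0


def lars_step(weights, grads, velocity, global_lr,
              momentum=0.9, trust_coeff=0.001, weight_decay=0.0):
    """One momentum-SGD step for a single layer, scaled by the LARS rate."""
    local_lr = lars_local_lr(weights, grads, trust_coeff, weight_decay)
    new_velocity = [
        momentum * v + global_lr * local_lr * (g + weight_decay * w)
        for w, g, v in zip(weights, grads, velocity)
    ]
    new_weights = [w - v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


def scheduled_lr(step, base_lr, batch_size, warmup_steps, total_steps,
                 base_batch=256, power=2.0):
    """Global rate with linear scaling, gradual warm-up, and poly decay."""
    # Linear scaling: the peak rate grows in proportion to the batch size.
    peak_lr = base_lr * batch_size / base_batch
    # Gradual warm-up: ramp linearly up to the peak rate.
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    # Poly rule decay: peak_lr * (1 - progress) ** power, reaching zero at the end.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * (1.0 - min(progress, 1.0)) ** power
```

In this sketch the layer-wise rate is computed separately for each layer's weights, so layers whose gradient norm is large relative to their weight norm take proportionally smaller steps; the global schedule then multiplies every layer's rate equally.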