Local Regularizer Improves Generalization

  • Yikai Zhang, Rutgers University
  • Hui Qu, Rutgers University
  • Dimitris Metaxas, Rutgers University
  • Chao Chen, Stony Brook University

Abstract

Regularization plays an important role in the generalization of deep learning models. In this paper, we study the generalization power of an unbiased regularizer for training algorithms in deep learning. We focus on a class of training methods called Locally Regularized Stochastic Gradient Descent (LRSGD). LRSGD adds a proximal-type penalty to the gradient descent steps to regularize SGD during training. We show that, with carefully chosen parameters, LRSGD generalizes better than SGD. Our theoretical analysis is supported by experimental evidence. This work advances the theoretical understanding of deep learning and provides new perspectives on designing training algorithms. The code is available at https://github.com/huiqu18/LRSGD.
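To make the idea of a proximal-type penalty in a gradient step concrete, here is a minimal sketch. It is not the authors' implementation (see the repository linked above for that). The abstract does not specify the form of the penalty, so the sketch assumes a quadratic term (lam / 2) * ||w - anchor||^2 around a reference point that is refreshed every few steps. The names proximal_sgd_step, run_locally_regularized_sgd, lam, and anchor_every are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence


def proximal_sgd_step(
    w: Sequence[float],
    grad: Sequence[float],
    anchor: Sequence[float],
    lr: float,
    lam: float,
) -> List[float]:
    """One gradient step on loss(w) + (lam / 2) * ||w - anchor||^2.

    ASSUMPTION: the quadratic penalty and the anchor point are illustrative
    choices; the abstract only says the penalty is of "proximal type".
    """
    return [wi - lr * (gi + lam * (wi - ai)) for wi, gi, ai in zip(w, grad, anchor)]


def run_locally_regularized_sgd(
    grad_fn: Callable[[Sequence[float]], Sequence[float]],
    w0: Sequence[float],
    lr: float = 0.1,
    lam: float = 0.5,
    steps: int = 200,
    anchor_every: int = 10,
) -> List[float]:
    """Run penalized steps, refreshing the anchor every `anchor_every` steps.

    `grad_fn` stands in for a minibatch gradient; the refresh schedule is
    a hypothetical choice made for this sketch.
    """
    w = list(w0)
    anchor = list(w0)
    for t in range(steps):
        if t % anchor_every == 0:
            anchor = list(w)  # re-center the penalty at the current iterate
        w = proximal_sgd_step(w, grad_fn(w), anchor, lr, lam)
    return w


def toy_grad(w: Sequence[float]) -> List[float]:
    """Gradient of the toy objective 0.5 * (w - 3)^2 in one dimension."""
    return [w[0] - 3.0]


w_final = run_locally_regularized_sgd(toy_grad, [0.0])
```

Setting lam to 0 recovers a plain gradient step, so the penalty strength controls how strongly each update is pulled back toward the anchor. On the toy objective the iterate still approaches the minimizer at 3 because the anchor keeps moving with it.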

Published: 2020-04-03
Section: AAAI Technical Track: Machine Learning