AAAI Publications, Thirty-First AAAI Conference on Artificial Intelligence

Font Size: 
Generalization Error Bounds for Optimization Algorithms via Stability
Qi Meng, Yue Wang, Wei Chen, Taifeng Wang, Zhi-Ming Ma, Tie-Yan Liu

Last modified: 2017-02-13


Many machine learning tasks can be formulated as Regularized Empirical Risk Minimization (R-ERM), and solved by optimization algorithms such as gradient descent (GD), stochastic gradient descent (SGD), and stochastic variance reduction (SVRG). Conventional analysis on these optimization algorithms focuses on their convergence rates during the training process, however, people in the machine learning community may care more about the generalization performance of the learned model on unseen test data. In this paper, we investigate on this issue, by using stability as a tool. In particular, we decompose the generalization error for R-ERM, and derive its upper bound for both convex and nonconvex cases. In convex cases, we prove that the generalization error can be bounded by the convergence rate of the optimization algorithm and the stability of the R-ERM process, both in expectation (in the order of 𝒪(1/n)+ 𝔼ρ(T)), where ρ(T) is the convergence error and T is the number of iterations) and in high probability (in the order of 𝒪(log{1/δ / √n + ρ(T) with probability 1 – δ). For nonconvex cases, we can also obtain a similar expected generalization error bound. Our theorems indicate that 1) along with the training process, the generalization error will decrease for all the optimization algorithms under our investigation; 2) Comparatively speaking, SVRG has better generalization ability than GD and SGD. We have conducted experiments on both convex and nonconvex problems, and the experimental results verify our theoretical findings.


generalization;stability;optimization;stochastic gradient descent;variance reduction

Full Text: PDF