Cogra: Concept-Drift-Aware Stochastic Gradient Descent for Time-Series Forecasting

  • Kohei Miyaguchi, The University of Tokyo
  • Hiroshi Kajino, IBM Research - Tokyo


We address time-series forecasting in the presence of concept drift by automatically tuning the learning rate of stochastic gradient descent (SGD). An SGD-based approach is preferable to other concept-drift algorithms because it can be applied to any model and keeps learning efficiently while predicting online. Among existing SGD algorithms, variance-based SGD (vSGD) handles concept drift through automatic learning rate tuning, which reduces to an adaptive mean estimation problem. However, its performance is limited by its heuristic mean estimator. In this paper, we present concept-drift-aware stochastic gradient descent (Cogra), which is equipped with a more theoretically sound mean estimator called the sequential mean tracker (SMT). Our key contribution is a goodness criterion for mean estimators; SMT is designed to be optimal under this criterion. Comprehensive experiments show that (i) SMT estimates the mean better than vSGD’s estimator in the presence of concept drift, and (ii) Cogra reduces the predictive loss by 16–67% on real-world datasets, indicating that SMT significantly improves prediction accuracy.
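
To make the link between learning rate tuning and mean estimation concrete, the following is a minimal sketch of a scalar, vSGD-style update on a toy problem. It is an illustration under stated assumptions, not the Cogra algorithm or the SMT estimator: the quadratic loss, the function name `vsgd_style_mean_tracking`, the parameters `tau0` and `eps`, the initialisation, and the noise-free toy stream are all hypothetical choices made for this example. The point is only that the learning rate is computed from running means of the gradient and the squared gradient, so its quality depends on how well those means are tracked after a drift.

```python
def vsgd_style_mean_tracking(stream, tau0=2.0, eps=1e-12):
    """Track the mean of `stream` by SGD on the loss 0.5 * (theta - x)**2.

    Simplified, scalar, vSGD-style rule (an illustration, not Cogra/SMT):
    the learning rate is built from running means of the gradient and of
    the squared gradient, so tuning it amounts to estimating those means.
    """
    theta = stream[0]  # initial parameter (assumption: first observation)
    g_bar = 0.0        # running mean of gradients
    v_bar = 0.0        # running mean of squared gradients
    h_bar = 1.0        # running mean of curvature (always 1 for this loss)
    tau = tau0         # memory length of the running means
    estimates, rates = [], []
    for x in stream:
        g = theta - x  # gradient of the loss at theta
        h = 1.0        # second derivative of the loss
        w = 1.0 / tau
        g_bar = (1.0 - w) * g_bar + w * g
        v_bar = (1.0 - w) * v_bar + w * g * g
        h_bar = (1.0 - w) * h_bar + w * h
        # Squared mean gradient over mean squared gradient, in [0, 1];
        # the min() only guards against floating-point rounding.
        ratio = min(1.0, g_bar * g_bar / (v_bar + eps))
        eta = ratio / h_bar                # learning rate for this step
        theta -= eta * g                   # SGD step
        tau = (1.0 - ratio) * tau + 1.0    # shrink memory when gradients agree
        estimates.append(theta)
        rates.append(eta)
    return estimates, rates


# Toy stream with an abrupt drift: the target jumps from 0 to 5 at step 50.
stream = [0.0] * 50 + [5.0] * 200
estimates, rates = vsgd_style_mean_tracking(stream)
```

In this sketch the memory length `tau` grows while the stream is stationary, so the running means react slowly right after the jump and the learning rate recovers only gradually. That lag is the kind of behaviour a better mean estimator is meant to reduce.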