Asynchronous Stochastic Gradient Descent for Extreme-Scale Recommender Systems

Authors

Lewis Liu

University of Montreal, Quebec

Kun Zhao

Alibaba Group

Proceedings:

Proceedings of the AAAI Conference on Artificial Intelligence, 35

Issue:

No. 1: AAAI-21 Technical Tracks 1

Track:

AAAI Technical Track on Application Domains


Abstract:

Recommender systems are influential in many internet applications. As the datasets available to recommendation models grow rapidly, using that data effectively becomes critical. For a typical click-through rate (CTR) prediction model, daily samples can amount to hundreds of terabytes, which reaches dozens of petabytes at extreme scale once several days are taken into consideration. Data at this scale makes it essential to train the model in parallel and continuously. Traditional asynchronous stochastic gradient descent (ASGD) and its variants have proven efficient but often suffer from stale gradients, so model convergence tends to degrade as more workers are used. Moreover, existing adaptive optimizers, which are well suited to sparse data, falter in long-term training because of the significant imbalance between new and accumulated gradients. To address the challenges posed by extreme-scale data, we propose: 1) staleness normalization and data normalization to eliminate the turbulence of stale gradients when training asynchronously across hundreds or thousands of workers; 2) SWAP, a novel framework for adaptive optimizers that balances new and historical gradients by taking the sampling period into consideration. We implement these approaches in TensorFlow and apply them to CTR tasks in real-world e-commerce scenarios. Experiments show that the number of workers in asynchronous training can be extended to 3000 with guaranteed convergence, and the final AUC is improved by more than 5 percentage points.

DOI:

10.1609/aaai.v35i1.16108

