Long Short-Term Sample Distillation

Liang Jiang; Zujie Wen; Zhongping Liang; Yafang Wang; Gerard de Melo; Zhe Li; Liangzhuang Ma; Jiaxing Zhang; Xiaolong Li; Yuan Qi

doi:10.1609/aaai.v34i04.5859

Authors

Liang Jiang Ant Financial Services Group
Zujie Wen Ant Financial Services Group
Zhongping Liang Ant Financial Services Group
Yafang Wang Ant Financial Services Group
Gerard de Melo Rutgers University
Zhe Li Ant Financial Services Group
Liangzhuang Ma Ant Financial Services Group
Jiaxing Zhang Ant Financial Services Group
Xiaolong Li Ant Financial
Yuan Qi Ant Financial Services Group

Abstract

In the past decade, there has been substantial progress in training increasingly deep neural networks. Recent advances within the teacher–student training paradigm have established that information about past training updates shows promise as a source of guidance during subsequent training steps. Based on this notion, we propose Long Short-Term Sample Distillation, a novel training policy that simultaneously leverages multiple phases of the previous training process to guide later updates to a neural network, while proceeding efficiently in a single generation pass. With Long Short-Term Sample Distillation, the supervision signal for each sample is decomposed into two parts: a long-term signal and a short-term one. The long-term teacher draws on snapshots from several epochs ago in order to provide steadfast guidance and to guarantee teacher–student differences, while the short-term one yields more up-to-date cues with the goal of enabling higher-quality updates. Moreover, the teachers for each sample are unique, such that, overall, the model learns from a very diverse set of teachers. Comprehensive experimental results across a range of vision and NLP tasks demonstrate the effectiveness of this new training method.
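
The decomposition of the supervision signal described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the per-sample history buffer, the `lag` parameter, the KL-divergence terms, the weights `w_long` and `w_short`, and all class and function names are assumptions made for illustration only. The idea shown is that each sample keeps its own recent model outputs, the oldest stored output acts as the long-term teacher, the newest as the short-term teacher, and both are added to the ordinary label loss.

```python
import math
from collections import defaultdict, deque


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


class SampleTeacherCache:
    """Hypothetical per-sample store of past model outputs.

    Keeps the last `lag` recorded outputs for every sample. The oldest
    entry plays the role of the long-term teacher and the newest entry
    the role of the short-term teacher (an assumed scheme).
    """

    def __init__(self, lag=3):
        self.history = defaultdict(lambda: deque(maxlen=lag))

    def record(self, sample_id, probs):
        # Store a copy of the student's current output for this sample.
        self.history[sample_id].append(list(probs))

    def teachers(self, sample_id):
        # Returns (long_term, short_term), or (None, None) if unseen.
        past = self.history.get(sample_id)
        if not past:
            return None, None
        return past[0], past[-1]


def distillation_loss(student_logits, label, long_term, short_term,
                      w_long=0.5, w_short=0.5):
    """Label cross-entropy plus weighted KL terms toward both teachers."""
    student = softmax(student_logits)
    loss = -math.log(student[label])
    if long_term is not None:
        loss += w_long * kl_divergence(long_term, student)
    if short_term is not None:
        loss += w_short * kl_divergence(short_term, student)
    return loss
```

In a training loop under these assumptions, one would look up `cache.teachers(sample_id)` before each update, compute `distillation_loss`, and then call `cache.record(sample_id, softmax(student_logits))` so that later epochs can use the current output as a teacher. Because every sample has its own buffer, the teachers differ from sample to sample, which mirrors the abstract's remark about a diverse set of teachers.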
