Proceedings:
Proceedings of the AAAI Conference on Artificial Intelligence, 35
Issue:
No. 11: AAAI-21 Technical Tracks 11
Track:
AAAI Technical Track on Machine Learning IV
Abstract:
Multi-exit architectures, in which a sequence of intermediate classifiers is introduced at different depths of the feature layers, perform adaptive computation by early-exiting "easy" samples to speed up inference. In this paper, a novel Harmonized Dense Knowledge Distillation (HDKD) training method for multi-exit architectures is designed to encourage each exit to flexibly learn from all of its later exits. In particular, a general dense knowledge distillation training objective is proposed to incorporate all possible beneficial supervision information for multi-exit learning, and a harmonized weighting scheme is designed for the resulting multi-objective optimization problem, which consists of the multi-exit classification loss and the dense distillation loss. A bilevel optimization algorithm is introduced to alternately update the weights of the multiple objectives and the multi-exit network parameters. Specifically, the loss-weighting parameters are optimized by gradient descent with respect to performance on a validation set. Experiments on CIFAR100 and ImageNet show that the HDKD strategy harmoniously improves the performance of state-of-the-art multi-exit neural networks. Moreover, the method requires no architecture modifications and can be effectively combined with previously proposed training techniques to further boost performance.
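The dense distillation objective described in the abstract can be illustrated with a minimal sketch. This assumes a PyTorch multi-exit model that returns one logits tensor per exit (ordered shallow to deep); the softmax over learnable weights below is only a stand-in for the paper's harmonized, bilevel-optimized weighting, and all names (hdkd_loss, loss_weights, temperature) are illustrative, not the authors' implementation.

import torch
import torch.nn.functional as F


def hdkd_loss(exit_logits, targets, loss_weights, temperature=3.0):
    """Classification loss at every exit plus dense distillation:
    each exit i learns from the soft predictions of every later exit j > i."""
    num_exits = len(exit_logits)
    weights = torch.softmax(loss_weights, dim=0)  # normalize the per-term weights

    terms = []
    # Per-exit cross-entropy against the ground-truth labels.
    for logits in exit_logits:
        terms.append(F.cross_entropy(logits, targets))

    # Dense distillation: exit i mimics each deeper exit j (teacher detached).
    for i in range(num_exits):
        for j in range(i + 1, num_exits):
            student = F.log_softmax(exit_logits[i] / temperature, dim=1)
            teacher = F.softmax(exit_logits[j].detach() / temperature, dim=1)
            kd = F.kl_div(student, teacher, reduction="batchmean") * temperature ** 2
            terms.append(kd)

    # Weighted sum of all classification and distillation terms.
    return sum(w * t for w, t in zip(weights, terms))


if __name__ == "__main__":
    # Toy usage: 3 exits, batch of 8, 100 classes (as on CIFAR100).
    exits = [torch.randn(8, 100, requires_grad=True) for _ in range(3)]
    labels = torch.randint(0, 100, (8,))
    n_terms = 3 + 3  # 3 cross-entropy terms + C(3,2) = 3 distillation terms
    # In the paper these weights are tuned on validation data via the bilevel
    # scheme; that outer loop is omitted here.
    w = torch.zeros(n_terms, requires_grad=True)
    loss = hdkd_loss(exits, labels, w)
    loss.backward()
    print(float(loss))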
DOI:
10.1609/aaai.v35i11.17225