Proceedings: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 35
Issue: No. 16: AAAI-21 Technical Tracks 16
Track: AAAI Technical Track on Speech and Natural Language Processing III
Abstract:
Large model size and high computational complexity prevent neural machine translation (NMT) models from being deployed on low-resource devices (e.g., mobile phones). Due to the large vocabulary, the word embedding matrix in NMT models requires a large amount of storage memory; meanwhile, constructing the word probability distribution introduces high latency. By reusing the word embedding matrix in the softmax layer, it is possible to address both problems brought by the large vocabulary at the same time. In this paper, we propose Partial Vector Quantization (P-VQ) for NMT models, which can both compress the word embedding matrix and accelerate word probability prediction in the softmax layer. With P-VQ, the word embedding matrix is split into two low-dimensional matrices: the shared part and the exclusive part. We compress the shared part by vector quantization and leave the exclusive part unchanged to preserve the uniqueness of each word. For acceleration, we exploit this compression to replace most of the multiplication operations in the softmax layer with efficient lookup operations, reducing the computational complexity. Furthermore, we adopt curriculum learning and compact the word embedding matrix gradually to improve the compression quality. Experimental results on the Chinese-to-English translation task show that our method reduces the parameters of the word embedding by 74.35% and the FLOPs of the softmax layer by 74.42%, while the average BLEU score on the WMT test sets drops by only 0.04.
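To illustrate the core idea, the sketch below shows in NumPy how a quantized shared part can turn most softmax multiplications into lookups: shared-part scores are computed once per codeword instead of once per word, and only the small exclusive part needs a per-word matrix multiplication. This is a minimal sketch of the technique described in the abstract, not the authors' implementation; the dimensions, the codebook size K, and all variable names are assumptions.

```python
import numpy as np

# Hypothetical sizes (not from the paper): vocabulary V, shared-part
# dimension d_s (quantized), exclusive-part dimension d_e, codebook size K.
V, d_s, d_e, K = 30000, 384, 128, 256
rng = np.random.default_rng(0)

codebook = rng.standard_normal((K, d_s))   # shared-part codewords
assign = rng.integers(0, K, size=V)        # codeword index for each word
exclusive = rng.standard_normal((V, d_e))  # per-word exclusive vectors

def softmax_logits(h):
    """Word logits for a hidden state h = [h_s | h_e] of dim d_s + d_e.

    Shared-part scores cost K dot products (one per codeword) and are
    gathered per word by index lookup, replacing V full multiplications;
    only the exclusive part still needs a per-word matmul of width d_e.
    """
    h_s, h_e = h[:d_s], h[d_s:]
    shared_scores = codebook @ h_s             # (K,) one score per codeword
    return shared_scores[assign] + exclusive @ h_e  # (V,) logits

logits = softmax_logits(rng.standard_normal(d_s + d_e))
print(logits.shape)  # (30000,)
```

Under these assumed sizes, the shared-part cost drops from V x d_s to K x d_s multiply-adds, which is where the FLOP reduction in the softmax layer comes from; the exclusive part is kept unquantized so that words sharing a codeword remain distinguishable.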
DOI: 10.1609/aaai.v35i16.17688