Generating Diverse Translation by Manipulating Multi-Head Attention

Authors

  • Zewei Sun, Nanjing University
  • Shujian Huang, Nanjing University
  • Hao-Ran Wei, Nanjing University
  • Xin-yu Dai, Nanjing University
  • Jiajun Chen, Nanjing University

DOI:

https://doi.org/10.1609/aaai.v34i05.6429

Abstract

The Transformer model (Vaswani et al. 2017) has been widely used in machine translation tasks and has obtained state-of-the-art results. In this paper, we report an interesting phenomenon in its encoder-decoder multi-head attention: different attention heads of the final decoder layer align to different word translation candidates. We empirically verify this discovery and propose a method to generate diverse translations by manipulating heads. Furthermore, we make use of these diverse translations with the back-translation technique for better data augmentation. Experimental results show that our method generates diverse translations without a severe drop in translation quality. Experiments also show that back-translation with these diverse translations brings a significant improvement in performance on translation tasks. An auxiliary experiment on a conversation response generation task demonstrates the benefit of diversity as well.
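The sketch below illustrates the core observation in a minimal, self-contained way: given the encoder-decoder attention weights of the final decoder layer, each head may align the current target step to a different source token, and one plausible way to "manipulate heads" is to force all heads to follow a single chosen head. The tensor shapes, the `manipulate_heads` helper, and the forcing strategy are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def manipulate_heads(cross_attn, head_idx):
    """Force every head's encoder-decoder attention to follow one chosen head.

    cross_attn: array of shape (num_heads, tgt_len, src_len) holding the
    attention weights of the final decoder layer. Returns an array of the
    same shape in which all heads share head_idx's weights. This is a
    hypothetical illustration of head manipulation, not the paper's method.
    """
    return np.broadcast_to(cross_attn[head_idx], cross_attn.shape).copy()

# Toy example: 4 heads, 1 target step, 3 source tokens.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 1, 3))
attn = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)  # softmax per head

# Selecting a different head to dominate decoding points the model at a
# different source token, hence a different word translation candidate.
for h in range(attn.shape[0]):
    aligned_src = manipulate_heads(attn, h)[0, 0].argmax()
    print(f"head {h} aligns target step 0 to source token {aligned_src}")
```

Running the loop with different `head_idx` values stands in for decoding the same sentence several times, each time biased toward a different head, which is how multiple diverse translations of one source sentence could be produced.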

Published

2020-04-03

How to Cite

Sun, Z., Huang, S., Wei, H.-R., Dai, X.-y., & Chen, J. (2020). Generating Diverse Translation by Manipulating Multi-Head Attention. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05), 8976-8983. https://doi.org/10.1609/aaai.v34i05.6429

Section

AAAI Technical Track: Natural Language Processing