Self-attention mechanisms have recently drawn considerable interest in Natural Language Processing (NLP) tasks. Relative positional information is important for self-attention mechanisms. We propose a Faraway Mask, which focuses on (2m + 1)-gram windows of words, and a Scaled-Distance Mask, which applies a logarithmic distance penalty; these masks respectively prevent and weaken self-attention between distant words. To exploit these different masks, we present a Positional Self-Attention Layer that generates multiple Masked-Self-Attentions, followed by a Position-Fusion Layer in which the fused positional information is multiplied with the Masked-Self-Attentions to generate sentence embeddings. To evaluate our sentence embedding approach, the Multiple Positional Self-Attention Network (MPSAN), we perform comparison experiments on sentiment analysis, semantic relatedness, and sentence classification tasks. The results show that MPSAN outperforms state-of-the-art methods on five datasets, improving test accuracy by 0.81% and 0.6% on the SST and CR datasets, respectively. In addition, we reduce the number of training parameters and improve the time efficiency of MPSAN by lowering the dimensionality of self-attention and simplifying the fusion mechanism.
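The abstract names the two masks but does not give their formulas. The sketch below illustrates the general idea under stated assumptions: both masks are additive terms on scaled dot-product scores, the Faraway Mask is 0 within m positions of each word (a (2m + 1)-word window) and -inf elsewhere, and the Scaled-Distance Mask is assumed to be -log(1 + |i - j|). The query/key/value projections are taken as the identity, the Position-Fusion Layer is not shown, and all function names are hypothetical; this is not the authors' implementation.

```python
import math
from typing import List

Matrix = List[List[float]]
NEG_INF = float("-inf")


def faraway_mask(n: int, m: int) -> Matrix:
    """Hypothetical Faraway Mask: 0 inside the (2m + 1)-word window
    centred on each word, -inf outside (distant words get zero weight)."""
    return [[0.0 if abs(i - j) <= m else NEG_INF for j in range(n)] for i in range(n)]


def scaled_distance_mask(n: int) -> Matrix:
    """Hypothetical Scaled-Distance Mask: an ASSUMED logarithmic penalty
    -log(1 + |i - j|), so distant words are down-weighted, not removed."""
    return [[-math.log(1 + abs(i - j)) for j in range(n)] for i in range(n)]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax; exp(-inf) evaluates to 0.0 in Python."""
    mx = max(row)
    exps = [math.exp(x - mx) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def masked_self_attention(x: Matrix, mask: Matrix) -> Matrix:
    """Scaled dot-product self-attention with an additive mask.
    x holds n word vectors of dimension d; projections are assumed identity."""
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        # Score of word i against every word j, plus the positional mask term.
        scores = [
            sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d) + mask[i][j]
            for j in range(n)
        ]
        weights = softmax(scores)
        # Weighted sum of word vectors gives the attended representation.
        out.append([sum(weights[j] * x[j][k] for j in range(n)) for k in range(d)])
    return out


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5], [-1.0, 0.0]]
    trigram_view = masked_self_attention(tokens, faraway_mask(len(tokens), m=1))
    distance_view = masked_self_attention(tokens, scaled_distance_mask(len(tokens)))
    print(trigram_view[0], distance_view[0])
```

Expressing both masks as additive score terms lets one attention routine produce several masked views of the same sentence: the Faraway Mask removes attention beyond the window entirely, while the Scaled-Distance Mask only weakens it as distance grows, which mirrors the "prevent" versus "weaken" distinction in the abstract.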
Published Date: 2020-06-02
Registration: ISSN 2374-3468 (Online) ISSN 2159-5399 (Print) ISBN 978-1-57735-835-0 (10 issue set)
Copyright: Published by AAAI Press, Palo Alto, California, USA. Copyright © 2020, Association for the Advancement of Artificial Intelligence. All Rights Reserved.