Re-Attention for Visual Question Answering

Authors

  • Wenya Guo, Nankai University
  • Ying Zhang, Nankai University
  • Xiaoping Wu, Nankai University
  • Jufeng Yang, Nankai University
  • Xiangrui Cai, Nankai University
  • Xiaojie Yuan, Nankai University

DOI:

https://doi.org/10.1609/aaai.v34i01.5338

Abstract

Visual Question Answering (VQA) requires a simultaneous understanding of images and questions. Existing methods perform well by focusing on both key objects in images and key words in questions. However, the answer also contains rich information that can help to better describe the image and generate more accurate attention maps. In this paper, to utilize the information in the answer, we propose a re-attention framework for the VQA task. We first associate the image and the question by computing the similarity of each object-word pair in the feature space. Then, based on the answer, the learned model re-attends to the corresponding visual objects in the image and reconstructs the initial attention map to produce consistent results. Benefiting from the re-attention procedure, the question can be better understood and a satisfactory answer generated. Extensive experiments on the benchmark dataset demonstrate that the proposed method performs favorably against state-of-the-art approaches.
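To make the two attention passes in the abstract concrete, below is a minimal sketch, not the authors' released implementation: it assumes image objects, question words, and the answer are already embedded in a shared d-dimensional feature space, and the max-over-words pooling and KL-based consistency term are illustrative choices of ours, not details taken from the paper.

```python
# Minimal sketch of question-guided attention and answer-guided re-attention.
# All names (question_guided_attention, answer_guided_reattention, the KL
# consistency term) are hypothetical, for illustration only.
import torch
import torch.nn.functional as F

def question_guided_attention(obj_feats, word_feats):
    """Initial attention: object-word similarity, softmax over objects.

    obj_feats:  (num_objects, d) region features of the image
    word_feats: (num_words, d)   word features of the question
    returns:    (num_objects,)   attention weights over objects
    """
    sim = obj_feats @ word_feats.t()        # (num_objects, num_words) pairwise similarity
    per_object = sim.max(dim=1).values      # score each object by its best-matching word
    return F.softmax(per_object, dim=0)

def answer_guided_reattention(obj_feats, answer_feat):
    """Re-attention: the answer embedding re-attends to the visual objects."""
    sim = obj_feats @ answer_feat           # (num_objects,)
    return F.softmax(sim, dim=0)

# Toy usage: 36 detected objects, a 14-word question, 512-d features.
num_objects, num_words, d = 36, 14, 512
objs = torch.randn(num_objects, d)
words = torch.randn(num_words, d)
ans = torch.randn(d)

att_q = question_guided_attention(objs, words)
att_a = answer_guided_reattention(objs, ans)

# One way to encourage the reconstructed map to agree with the initial one
# (an assumed training signal, not necessarily the loss used in the paper).
consistency_loss = F.kl_div(att_q.log(), att_a, reduction="batchmean")
print(att_q.shape, att_a.shape, consistency_loss.item())
```

In this sketch the answer-conditioned map serves only as a training-time signal that pushes the question-conditioned attention toward the regions the answer actually refers to; at test time only the question-guided pass would be needed.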

Published

2020-04-03

How to Cite

Guo, W., Zhang, Y., Wu, X., Yang, J., Cai, X., & Yuan, X. (2020). Re-Attention for Visual Question Answering. Proceedings of the AAAI Conference on Artificial Intelligence, 34(01), 91-98. https://doi.org/10.1609/aaai.v34i01.5338

Section

AAAI Technical Track: AI and the Web