Text Embedding Bank for Detailed Image Paragraph Captioning

Authors

Arjun Gupta

University of Illinois at Urbana-Champaign

Zengming Shen

University of Illinois at Urbana-Champaign

Thomas Huang

University of Illinois at Urbana-Champaign

Proceedings:

Proceedings of the AAAI Conference on Artificial Intelligence, 35

Issue:

No. 18: AAAI-21 Student Papers and Demonstrations

Track:

AAAI Student Abstract and Poster Program

Abstract:

Existing deep learning-based image captioning models typically consist of an image encoder that extracts visual features and a language model decoder, an architecture that has shown promising results for generating a single high-level sentence. However, when the image encoder is optimized to extract visual features, only a word-level guiding signal is available. This mismatch between the parallel extraction of visual features and the sequential text supervision limits the architecture's success when the generated text is long (more than 50 words). To address this problem for image paragraph captioning, we propose a new module called the Text Embedding Bank (TEB). The module uses the paragraph vector model to learn a fixed-length feature representation from a variable-length paragraph; we refer to this fixed-length feature as the TEB. The TEB module plays two roles that benefit paragraph captioning performance. First, it acts as a form of global, coherent deep supervision that regularizes visual feature extraction in the image encoder. Second, it acts as a distributed memory that provides features of the whole paragraph to the language model, which alleviates the long-term dependency problem. Adding this module to two existing state-of-the-art methods achieves a new state-of-the-art result on the Stanford Visual Genome paragraph captioning dataset.
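The two roles described above can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify the loss, pooling, or fusion used. The following are assumptions made only for illustration: the TEB target for each training paragraph is precomputed by a trained paragraph vector model (here it is a hand-written constant); the image encoder predicts that vector by mean-pooling region features and applying a linear projection; the deep-supervision term is mean squared error weighted by a hypothetical `lam`; and the memory role is modeled as concatenating a paragraph vector to every decoder step's input. All function names (`mean_pool`, `linear`, `mse`, `cross_entropy`, `decoder_input`, `teb_loss`) are hypothetical. The abstract also does not say which vector the decoder receives at test time, when no reference paragraph exists.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def mean_pool(region_feats: Matrix) -> Vector:
    """Average any number of region feature vectors into one fixed-size vector."""
    n = len(region_feats)
    return [sum(col) / n for col in zip(*region_feats)]


def linear(weights: Matrix, x: Vector) -> Vector:
    """Matrix-vector product; each row of `weights` produces one output value."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cross_entropy(logits: Vector, target: int) -> float:
    """Softmax cross-entropy for one word position, computed stably via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


def decoder_input(word_emb: Vector, teb: Vector) -> Vector:
    """Role 2 (memory), assumed fusion: append the paragraph vector to a step's input."""
    return word_emb + teb


def teb_loss(
    word_logits: Matrix,
    targets: List[int],
    region_feats: Matrix,
    proj: Matrix,
    teb_target: Vector,
    lam: float,
) -> float:
    """Word-level caption loss plus a weighted TEB regression term.

    Role 1 (deep supervision), assumed form: the encoder's pooled features are
    projected to the TEB size and pulled toward the paragraph's TEB with MSE.
    """
    caption = sum(cross_entropy(z, t) for z, t in zip(word_logits, targets)) / len(targets)
    predicted_teb = linear(proj, mean_pool(region_feats))
    return caption + lam * mse(predicted_teb, teb_target)


if __name__ == "__main__":
    # Toy values only; in practice the TEB target would come from a trained
    # paragraph vector model applied to the reference paragraph.
    regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 regions, 2-dim features
    proj = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]     # hypothetical 2 -> 3 projection
    teb = [0.5, 0.5, 0.5]                           # stand-in paragraph vector
    logits = [[2.0, 0.0], [0.0, 2.0]]               # decoder scores for 2 steps
    targets = [0, 1]                                # reference word ids
    print(teb_loss(logits, targets, regions, proj, teb, lam=1.0))
    print(decoder_input([0.1, 0.2], teb))           # 2 + 3 = 5-dim step input
```

Setting `lam` to 0 in this sketch leaves only the word-level objective, the setting the abstract identifies as insufficient for long paragraphs. Because `mean_pool` works for any number of regions and the TEB has a fixed length, the extra supervision term has the same shape regardless of how long the reference paragraph is.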

DOI:

10.1609/aaai.v35i18.17892
