Segment-Then-Rank: Non-Factoid Question Answering on Instructional Videos

Authors

  • Kyungjae Lee, Yonsei University
  • Nan Duan, Microsoft Research Asia
  • Lei Ji, University of Chinese Academy of Sciences
  • Jason Li, Microsoft STCA Multimedia Group
  • Seung-won Hwang, Yonsei University

DOI:

https://doi.org/10.1609/aaai.v34i05.6327

Abstract

We study the problem of non-factoid QA on instructional videos. Existing work focuses on either the visual or the textual modality of video content to find answers matching the question. However, neither is flexible enough for our problem setting of non-factoid answers with varying lengths. Motivated by this, we propose a two-stage model: (a) multimodal segmentation of the video into span candidates and (b) length-adaptive ranking of the candidates with respect to the question. First, for segmentation, we propose Segmenter, which generates span candidates of diverse lengths, considering both the textual and visual modalities. Second, for ranking, we propose Ranker, which scores the candidates by dynamically combining two models with complementary strengths on short and long spans, respectively. Experimental results demonstrate that our model achieves state-of-the-art performance.
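
The two-stage idea in the abstract can be illustrated with a minimal, text-only sketch. This is not the authors' Segmenter or Ranker, which are learned multimodal models: here segmentation simply enumerates contiguous transcript spans, the two "complementary" scorers are hypothetical token-overlap heuristics (`short_scorer`, `long_scorer`), and the mixing weight controlled by `pivot` is an assumption made only for illustration. The visual modality is ignored entirely.

```python
from typing import List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase whitespace tokenization (toy stand-in for a real encoder)."""
    return set(text.lower().split())


def segment(sentences: List[str], max_len: int) -> List[Tuple[int, int]]:
    """Enumerate contiguous spans (start, end), inclusive, of up to max_len sentences."""
    spans = []
    for start in range(len(sentences)):
        for end in range(start, min(start + max_len, len(sentences))):
            spans.append((start, end))
    return spans


def short_scorer(question: str, span_text: str) -> float:
    """Hypothetical scorer: share of span tokens found in the question."""
    q, s = tokenize(question), tokenize(span_text)
    return len(q & s) / len(s) if s else 0.0


def long_scorer(question: str, span_text: str) -> float:
    """Hypothetical scorer: share of question tokens covered by the span."""
    q, s = tokenize(question), tokenize(span_text)
    return len(q & s) / len(q) if q else 0.0


def rank(question: str, sentences: List[str], max_len: int = 3, pivot: int = 2):
    """Score every candidate span, mixing the two scorers by span length."""
    results = []
    for start, end in segment(sentences, max_len):
        text = " ".join(sentences[start:end + 1])
        length = end - start + 1
        # Assumed mixing rule: longer spans lean more on long_scorer.
        w_long = length / (length + pivot)
        score = (1 - w_long) * short_scorer(question, text) \
            + w_long * long_scorer(question, text)
        results.append((score, (start, end)))
    # Highest score first; Python's sort is stable, so ties keep span order.
    results.sort(key=lambda item: item[0], reverse=True)
    return results


if __name__ == "__main__":
    transcript = [
        "open the hood",
        "remove the old filter",
        "insert the new filter",
        "close the hood",
    ]
    for score, span in rank("how to replace the filter", transcript)[:3]:
        print(span, round(score, 3))
```

Separating candidate generation from scoring means the length-dependent weighting can be changed without touching how spans are produced, which mirrors the segment-then-rank split described above.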

Published

2020-04-03

How to Cite

Lee, K., Duan, N., Ji, L., Li, J., & Hwang, S.-w. (2020). Segment-Then-Rank: Non-Factoid Question Answering on Instructional Videos. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05), 8147-8154. https://doi.org/10.1609/aaai.v34i05.6327

Section

AAAI Technical Track: Natural Language Processing