Proceedings:
Proceedings of the AAAI Conference on Artificial Intelligence, 33
Volume/Issue:
No. 1: AAAI-19, IAAI-19, EAAI-20
Track:
AAAI Technical Track: Vision
Abstract:
This paper presents an efficient algorithm for temporal localization of activities in videos via sentence queries. The task differs from traditional action localization in three aspects: (1) Activities are combinations of various kinds of actions and may span a long period of time. (2) Sentence queries are not limited to a predefined list of classes. (3) The videos usually contain multiple different activity instances. Traditional proposal-based approaches for action localization that only consider the class-agnostic “actionness” of video snippets are insufficient for this task. We propose a novel Semantic Activity Proposal (SAP) which integrates the semantic information of sentence queries into the proposal generation process to obtain discriminative activity proposals. Visual and semantic information are jointly utilized for proposal ranking and refinement. We evaluate our algorithm on the TACoS dataset and the Charades-STA dataset. Experimental results show that our algorithm outperforms existing methods on both datasets, while reducing the number of proposals by a factor of at least 10.
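To make the abstract's central idea concrete, the sketch below illustrates, in a minimal and hypothetical form, how query semantics (rather than class-agnostic actionness alone) could drive proposal generation and how visual and semantic evidence could be combined for ranking. It is not the authors' implementation; the feature spaces, similarity threshold, gap tolerance, and scoring weight are all illustrative assumptions.

# Minimal sketch of the idea behind Semantic Activity Proposals (SAP):
# select and group video snippets using the sentence query's semantics,
# then rank the resulting proposals with a mix of visual and semantic scores.
# Shapes, thresholds, and weights are assumptions, not values from the paper.

import numpy as np


def semantic_activity_proposals(snippet_feats, query_feat,
                                sim_threshold=0.5, max_gap=1):
    """Group snippets whose visual features align with the query embedding.

    snippet_feats: (T, D) array of per-snippet visual features.
    query_feat:    (D,) sentence-query embedding in the same space.
    Returns a list of (start, end) snippet-index proposals.
    """
    # Cosine similarity between each snippet and the sentence query.
    sims = snippet_feats @ query_feat
    sims /= (np.linalg.norm(snippet_feats, axis=1)
             * np.linalg.norm(query_feat) + 1e-8)

    # Keep semantically relevant snippets, then merge nearby ones into
    # contiguous proposals (small gaps between kept snippets are tolerated).
    keep = np.where(sims > sim_threshold)[0]
    proposals = []
    for t in keep:
        if proposals and t - proposals[-1][1] <= max_gap:
            proposals[-1][1] = t          # extend the current proposal
        else:
            proposals.append([t, t])      # start a new proposal
    return [(s, e) for s, e in proposals]


def rank_proposals(proposals, snippet_feats, query_feat, alpha=0.5):
    """Score each proposal by combining semantic and (stand-in) visual evidence."""
    scores = []
    for s, e in proposals:
        seg = snippet_feats[s:e + 1].mean(axis=0)
        semantic = float(seg @ query_feat /
                         (np.linalg.norm(seg) * np.linalg.norm(query_feat) + 1e-8))
        visual = float(np.linalg.norm(seg) / (1.0 + np.linalg.norm(seg)))  # placeholder actionness
        scores.append(alpha * semantic + (1.0 - alpha) * visual)
    order = np.argsort(scores)[::-1]
    return [proposals[i] for i in order]

Because snippets are filtered by query similarity before grouping, far fewer candidate segments survive than with purely actionness-based proposal generation, which is the intuition behind the abstract's claim of an order-of-magnitude reduction in proposals.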
DOI:
10.1609/aaai.v33i01.33018199