In this paper, we propose a weakly supervised temporal action localization method on untrimmed videos based on prototypical networks. We observe two challenges posed by weakly supervision, namely action-background separation and action relation construction. Unlike the previous method, we propose to achieve action-background separation only by the original videos. To achieve this, a clustering loss is adopted to separate actions from backgrounds and learn intra-compact features, which helps in detecting complete action instances. Besides, a similarity weighting module is devised to further separate actions from backgrounds. To effectively identify actions, we propose to construct relations among actions for prototype learning. A GCN-based prototype embedding module is introduced to generate relational prototypes. Experiments on THUMOS14 and ActivityNet1.2 datasets show that our method outperforms the state-of-the-art methods.
Published Date: 2020-06-02
Registration: ISSN 2374-3468 (Online) ISSN 2159-5399 (Print) ISBN 978-1-57735-835-0 (10 issue set)
Copyright: Published by AAAI Press, Palo Alto, California USA Copyright © 2020, Association for the Advancement of Artificial Intelligence All Rights Reserved