Proceedings:
No. 2: AAAI-21 Technical Tracks 2
Volume
Issue:
Proceedings of the AAAI Conference on Artificial Intelligence, 35
Track:
AAAI Technical Track on Computer Vision I
Downloads:
Abstract:
Video classification is computationally expensive. In this paper, we address theproblem of frame selection to reduce the computational cost of video classification.Recent work has successfully leveraged frame selection for long, untrimmed videos,where much of the content is not relevant, and easy to discard. In this work, however,we focus on the more standard short, trimmed video classification problem. Weargue that good frame selection can not only reduce the computational cost of videoclassification but also increase the accuracy by getting rid of frames that are hard toclassify. In contrast to previous work, we propose a method that instead of selectingframes by considering one at a time, considers them jointly. This results in a moreefficient selection, where “good" frames are more effectively distributed over thevideo, like snapshots that tell a story. We call the proposed frame selection SMARTand we test it in combination with different backbone architectures and on multiplebenchmarks (Kinetics [5], Something-something [14], UCF101 [31]). We showthat the SMART frame selection consistently improves the accuracy compared toother frame selection strategies while reducing the computational cost by a factorof 4 to 10 times. Additionally, we show that when the primary goal is recognitionperformance, our selection strategy can improve over recent state-of-the-art modelsand frame selection strategies on various benchmarks (UCF101, HMDB51 [21],FCVID [17], and ActivityNet [4]).
DOI:
10.1609/aaai.v35i2.16235
AAAI
Proceedings of the AAAI Conference on Artificial Intelligence, 35