Published Date: 2018-02-08
Registration: ISSN 2374-3468 (Online) ISSN 2159-5399 (Print)
Copyright: Published by AAAI Press, Palo Alto, California, USA. Copyright © 2018, Association for the Advancement of Artificial Intelligence. All Rights Reserved.
Video-based action prediction is an important problem in computer vision with many applications, such as preventing accidents and criminal activities. Predicting actions at an early stage is challenging because early, partially observed videos differ greatly from complete ones. Intra-class variations further confuse predictors. In this paper, we propose a mem-LSTM model to predict actions at an early stage, in which a memory module is introduced to record several "hard-to-predict" samples and a variety of early observations. Our method uses a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) to model the partially observed video input. We augment the LSTM with a memory module to remember challenging video instances. With the memory module, our mem-LSTM model not only achieves impressive performance at an early stage but also makes predictions without prior knowledge of the observation ratio. Information in future frames is also exploited through a bi-directional LSTM layer. Experiments on the UCF-101 and Sports-1M datasets show that our method outperforms state-of-the-art methods.
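To make the idea of a memory that records "hard-to-predict" samples more concrete, the following is a minimal sketch in plain Python. It is not the mem-LSTM implementation: the abstract does not specify how the memory is addressed or updated, so the class name `HardSampleMemory`, the cosine-similarity lookup, the first-in-first-out eviction, and the rule "store a sample only when the memory currently mislabels it" are all assumptions made for illustration. The feature vectors stand in for whatever a CNN and LSTM would produce from a partially observed video.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class HardSampleMemory:
    """Hypothetical key-value memory: keys are feature vectors, values are action labels."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = []
        self.labels = []

    def query(self, feature):
        """Return (label, similarity) of the most similar stored key, or (None, 0.0) if empty."""
        if not self.keys:
            return None, 0.0
        sims = [cosine(feature, key) for key in self.keys]
        best = max(range(len(sims)), key=sims.__getitem__)
        return self.labels[best], sims[best]

    def update(self, feature, true_label):
        """Store the sample only if the memory currently mislabels it (assumed 'hard' criterion).

        Returns True when the sample was written to memory.
        """
        predicted, _ = self.query(feature)
        if predicted == true_label:
            return False  # already handled by the memory, nothing to record
        if len(self.keys) >= self.capacity:
            # Assumed eviction policy: drop the oldest entry.
            self.keys.pop(0)
            self.labels.pop(0)
        self.keys.append(list(feature))
        self.labels.append(true_label)
        return True


# Toy usage with 2-D stand-ins for features of partially observed videos.
memory = HardSampleMemory(capacity=2)
stored_first = memory.update([1.0, 0.0], "diving")    # memory is empty, so this is stored
stored_second = memory.update([0.9, 0.1], "diving")   # already predicted correctly, skipped
stored_third = memory.update([0.0, 1.0], "running")   # mislabeled as "diving", so stored
label, similarity = memory.query([0.1, 0.9])
```

In this toy setup the memory is queried with the feature alone, so a prediction does not depend on knowing what fraction of the video has been observed. That mirrors the property claimed in the abstract only loosely and under the assumptions stated above.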