AAAI Publications, Workshops at the Twenty-Eighth AAAI Conference on Artificial Intelligence

Multi-Modal Analysis of Movies for Rhythm Extraction
Devon Bates, Arnav Jhala

Last modified: 2014-06-18


This paper motivates a multi-modal approach to analyzing the aesthetic elements of films by integrating visual and auditory features. Prior work on characterizing aesthetic elements of film has predominantly focused on visual features. We present a comparison of analyses from multiple modalities in a rhythm extraction task. For detecting important events based on a model of rhythm/tempo, we compare the analysis of visual features with that of auditory features. We introduce an audio tempo function that characterizes the pacing of a video segment, and we compare this function with its visual pace counterpart. Preliminary results indicate that an integrated approach could reveal more semantic and aesthetic information from digital media. Combining information from the two signals can support tasks such as automatic identification of important narrative events, enabling deeper analysis of large-scale video corpora.


multi-modal analysis; aesthetics of video; signal processing

