Multimedia Information Extraction Roadmap

Critical Technical Challenges

Information about people, their activities, and their communication is one of the most important types of content to extract from video data. This information is expressed in both the audio and visual domains. The critical technical challenges for extracting such content include:

1) Understanding interactions between people (their relationships, functional roles, hierarchies, and dominance) as well as their activities.
2) Extending the robustness of multimodal information extraction techniques beyond narrowly constrained sets of actions, limited venues, and heavily instrumented collection environments.
3) Obtaining sufficient amounts of annotated data for training models and classifiers.
4) Developing novel multimodal fusion techniques for semantically complex tasks involving human behavior.