Efforts to automatically extract salient features from rich media documents (audio, video, still images, speech, etc.) have been underway and maturing for the last 15 to 20 years, and for longer than that if you count speech recognition research. However, there is still a paucity of real-world applications that leverage the technology, and scaling it to situations "outside the lab" has posed mighty challenges. In general, scalability, reliability/robustness, and accuracy have presented formidable obstacles to finding, and then reaching, a market. The following lists sketch what I believe are the major technical challenges and gaps in state-of-the-art multimedia information extraction (MMIE), as well as existing approaches and resources that can be quickly tapped for research and development.