We present an approach to robust, real-time person tracking in crowded and/or unknown environments using multimodal integration. We combine stereo, color, and face detection modules into a single robust system, and show an initial application: an interactive display in which users see their own faces distorted into various comic poses in real time. Stereo processing isolates the figure of a user from other objects and people in the background. Skin-hue classification identifies and tracks likely body parts within this foreground region, and face pattern detection discriminates and localizes the face among the tracked body parts. We discuss the failure modes of these individual components and report results with the complete system in trials with thousands of users.
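The cascade described above, in which each module narrows the region searched by the next, can be illustrated with a minimal sketch. It rests on several simplifying assumptions that are not taken from the paper. A per-pixel depth map (with `None` where stereo gives no match) stands in for the stereo output, and the user is taken to be the nearest surface plus a fixed depth `band`. Skin is a fixed hue interval (`hue_lo`..`hue_hi`, in degrees), used here instead of a learned color model. Connected skin regions are reported as bounding boxes that would be handed to a face pattern detector, which is not shown. All function names and parameter values are hypothetical.

```python
from collections import deque
from typing import List, Optional, Tuple

# Bounding box as (top, left, bottom, right), inclusive pixel coordinates.
Box = Tuple[int, int, int, int]


def foreground_mask(depth: List[List[Optional[float]]], band: float) -> List[List[bool]]:
    """Keep pixels within `band` of the nearest valid depth (assumed: user is closest)."""
    valid = [d for row in depth for d in row if d is not None]
    if not valid:
        return [[False] * len(row) for row in depth]
    nearest = min(valid)
    return [[d is not None and d <= nearest + band for d in row] for row in depth]


def skin_mask(hue: List[List[float]], fg: List[List[bool]],
              hue_lo: float, hue_hi: float) -> List[List[bool]]:
    """Classify skin by a fixed hue interval, but only inside the foreground."""
    return [[fg[r][c] and hue_lo <= h <= hue_hi for c, h in enumerate(row)]
            for r, row in enumerate(hue)]


def connected_boxes(mask: List[List[bool]], min_pixels: int) -> List[Box]:
    """Bounding boxes of 4-connected True regions with at least `min_pixels` pixels."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill of one region, tracking its extent.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            count = 0
            while queue:
                y, x = queue.popleft()
                count += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if count >= min_pixels:
                boxes.append((top, left, bottom, right))
    return sorted(boxes)


def candidate_face_regions(depth, hue, band=0.5, hue_lo=0.0, hue_hi=50.0,
                           min_pixels=2) -> List[Box]:
    """Stereo foreground -> skin hue -> regions; each box would go to a face detector."""
    fg = foreground_mask(depth, band)
    skin = skin_mask(hue, fg, hue_lo, hue_hi)
    return connected_boxes(skin, min_pixels)


# Toy scene: a near user (depth ~1 m) in front of a skin-colored background (3 m).
depth = [
    [1.0, 1.0, 3.0, 3.0, 1.1, 1.1],
    [1.0, 1.0, 3.0, 3.0, 1.1, 1.1],
    [3.0, 3.0, 3.0, 3.0, 3.0, 3.0],
    [None, 1.0, 3.0, 3.0, 3.0, 3.0],
]
hue = [[20.0] * 6 for _ in range(4)]
hue[0][4] = 200.0  # one non-skin pixel on the user
regions = candidate_face_regions(depth, hue)
print(regions)  # [(0, 0, 1, 1), (0, 4, 1, 5)]
```

In this toy scene the background pixels have a skin-like hue but are rejected by the depth test, and the isolated single-pixel blob falls below `min_pixels`. This is the kind of mutual filtering between modalities the abstract motivates, shown here only in a deliberately simplified form.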