We examine the problem of Object Discovery, the autonomous acquisition of object models, using a combination of shape, appearance and motion. We propose a novel multi-stage technique for detecting rigidly moving objects and modeling their appearance for recognition. First, a stereo camera is used to acquire a sequence of images and depth maps of a given scene. The scene is then oversegmented using normalized cuts based on a combination of shape and appearance. SIFT image features are matched between sequential pairs of images to identify groups of moving features. The 3D movement of these features is used to determine which regions in the segmentation of the scene correspond to objects, grouping oversegmented regions as necessary. Additional features are extracted from these regions and combined with the rigidly moving image features to create snapshots of the object's appearance. Over time, these snapshots are combined to produce object models. We show sample outputs for each stage of our approach and demonstrate the effectiveness of our object models for recognition.
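As a rough illustration of the motion-grouping step, the sketch below assumes that matched features have already been back-projected to 3D points in two consecutive frames using the depth maps. It then looks for the largest set of matches explained by a single rigid motion. The function names `fit_rigid_transform` and `find_rigid_group`, the tolerance, and the choice of RANSAC with a Kabsch (SVD) fit are illustrative assumptions, not necessarily the procedure used in this work.

```python
import numpy as np

def fit_rigid_transform(p, q):
    """Least-squares rotation r and translation t such that q ~ r @ p + t.

    p, q: (n, 3) arrays of corresponding 3D points, n >= 3 and not collinear.
    Uses the Kabsch (SVD) method.
    """
    p_mean, q_mean = p.mean(axis=0), q.mean(axis=0)
    h = (p - p_mean).T @ (q - q_mean)            # 3x3 cross-covariance
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = q_mean - r @ p_mean
    return r, t

def find_rigid_group(p, q, tol=0.01, iters=200, seed=0):
    """Boolean mask of the largest set of matches sharing one rigid motion.

    tol (same units as the points) and iters are assumed values, chosen only
    for illustration.
    """
    rng = np.random.default_rng(seed)
    best = np.zeros(len(p), dtype=bool)
    for _ in range(iters):
        # Three correspondences are the minimum needed to fix a rigid motion.
        idx = rng.choice(len(p), size=3, replace=False)
        r, t = fit_rigid_transform(p[idx], q[idx])
        residual = np.linalg.norm(p @ r.T + t - q, axis=1)
        inliers = residual < tol
        if inliers.sum() > best.sum():
            best = inliers
    return best
```

Under these assumptions, the returned mask could be used to vote for the oversegmented regions that contain the rigidly moving features. Removing the inliers and running the search again would look for further moving groups.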