We describe a machine learning architecture for hierarchical vision systems. These vision systems work by successively grouping visual constructs at one level, selecting the most promising, and passing them up to higher levels of processing. This continues from the pixel-level of the image to the object-modelevel. Traditionally, researchers have used static heuristics at each level to select the best constructs. In practice, this approach is brittle, because people have not been successful at surveying the evidence necessary for robust performance, and is static, because designers have not incorporated learning mechanisms that would let the system improve its performance with the aid of user feedback. The machine learning architecture proposed herein is an attempt to address both of these issues.