We examine how to use emerging far-infrared imager ensembles to detect certain objects of interest (e.g., faces, hands, people and animals) in synchronized RGB video streams at very low power. We formulate the problem as one of selecting which subsets of sensing elements in the ensembles to test, from among many thousands of possible subsets. The subset selection problem is naturally adaptive and online: testing certain elements early can obviate the need to test many others later, and selection policies must be updated at inference time. We pose the ensemble sensor selection problem as a structured extension of test-cost-sensitive classification, propose a principled suite of techniques that exploit ensemble structure to speed up processing, and show how to re-estimate policies quickly. We estimate reductions in power consumption of roughly 50x relative to even highly optimized implementations of face detection, a canonical object-detection problem. We also illustrate the benefits of adaptivity and online estimation.