We introduce an unsupervised approach to efficiently discover the underlying features in a data set via crowdsourcing. Our queries ask crowd members to articulate a feature common to two out of three displayed examples. In addition, we ask the crowd to provide binary labels for these discovered features on the remaining examples. The triples are chosen adaptively based on the labels of the previously discovered features on the data set. This approach is motivated by a formal framework of feature elicitation that we introduce and analyze in this paper. In two natural models of features, hierarchical and independent, we show that a simple adaptive algorithm recovers all features with less labor than any nonadaptive algorithm. The savings are as a result of automatically avoiding the elicitation of redundant features or synonyms. Experimental results validate the theoretical findings and the usefulness of this approach.
Published Date: 2015-11-12
Registration: ISBN 978-1-57735-740-7