Abstract:
We consider a finite-pool data categorization scenario which requires exhaustively classifying a given set of examples with a limited budget. We adopt a hybrid human-machine approach which blends automatic machine learning with human labeling across a tiered workforce composed of domain experts and crowd workers. To effectively achieve high-accuracy labels over the instances in the pool at minimal cost, we develop a novel approach based on decision-theoretic active learning. On the important task of biomedical citation screening for systematic reviews, results on real data show that our method achieves consistent improvements over baseline strategies. To foster further research by others, we have made our data available online.

Published Date: 2015-11-12
Registration: ISBN 978-1-57735-740-7
DOI:
10.1609/hcomp.v3i1.13225