Crowdsourcing can be a fast, flexible, and cost-effective way to obtain data for training and evaluating machine learning algorithms. In this paper, we discuss a novel crowdsourcing application: creating a dataset for evaluating name matchers. Name matching, the task of identifying which names refer to the same person, is both challenging and subjective, and it is crucial for effective entity disambiguation and search. We have developed an effective question interface and a work quality analysis algorithm for this task; both can be applied to other ranking tasks, such as search result ranking and recommendation system evaluation. We demonstrate that our crowdsourced dataset can be used successfully to evaluate automatic name-matching algorithms.
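The abstract does not specify how the work quality analysis operates. One common way to screen workers on a ranking task is to measure how closely each worker's ranking agrees with a consensus of the other workers. The sketch below is a generic illustration under that assumption, not the authors' algorithm: it uses Kendall's tau-a against a leave-one-out mean-rank consensus, and every function name, worker ID, and candidate name in it is hypothetical.

```python
"""Hypothetical worker-quality check for a crowdsourced ranking task.

Assumption: NOT the paper's algorithm. Each worker ranks candidate names by
how likely they are to refer to the same person as a query name, and is
scored by Kendall's tau-a agreement with a consensus of the other workers.
"""
from itertools import combinations
from statistics import mean


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings of the same items.

    Each ranking maps item -> position (1 = most likely the same person).
    """
    items = list(rank_a)
    n_pairs = len(items) * (len(items) - 1) / 2
    if n_pairs == 0:
        raise ValueError("need at least two items to compare")
    concordant = discordant = 0
    for x, y in combinations(items, 2):
        # Same sign of rank differences means both rankings order x, y alike.
        s = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / n_pairs


def consensus_ranking(rankings):
    """Order items by their mean position across the given rankings."""
    items = list(rankings[0])
    avg = {item: mean(r[item] for r in rankings) for item in items}
    ordered = sorted(items, key=avg.get)
    return {item: pos for pos, item in enumerate(ordered, start=1)}


def worker_quality(worker_rankings):
    """Score each worker against a consensus built from the other workers."""
    if len(worker_rankings) < 2:
        raise ValueError("need at least two workers")
    scores = {}
    for worker, ranking in worker_rankings.items():
        others = [r for w, r in worker_rankings.items() if w != worker]
        scores[worker] = kendall_tau(ranking, consensus_ranking(others))
    return scores


# Hypothetical judgments for the query name "Jon Smith".
ratings = {
    "w1": {"John Smith": 1, "J. Smith": 2, "Jonathan Smyth": 3, "Joan Smithers": 4},
    "w2": {"John Smith": 1, "J. Smith": 2, "Jonathan Smyth": 4, "Joan Smithers": 3},
    "w3": {"John Smith": 2, "J. Smith": 1, "Jonathan Smyth": 3, "Joan Smithers": 4},
    # w4 submits the reverse of the others' consensus, as a careless worker might.
    "w4": {"John Smith": 4, "J. Smith": 3, "Jonathan Smyth": 2, "Joan Smithers": 1},
}

quality = worker_quality(ratings)
print(quality)  # w4 scores -1.0: every pair disagrees with the others' consensus
```

Excluding each worker from their own consensus keeps a low-quality worker from pulling the reference ranking toward their own answers; with only a handful of workers per query, a simple threshold on these scores would be a reasonable (assumed) way to flag submissions for review.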
Published Date: 2013-11-10
Registration: ISBN 978-1-57735-607-3