Efficient Strategies for Improving Partitioning-Based Author Coreference by Incorporating Web Pages as Graph Nodes

Pallika H. Kanani, Andrew McCallum

Entity resolution in the domain of research paper authors is an important but difficult problem. It suffers from insufficient contextual information, so adding information from the web can significantly improve performance. We formulate the author coreference problem as one of graph partitioning with discriminatively trained edge weights. Building on our previous work, this paper presents improved and more comprehensive results for the method in which we incorporate web documents as additional nodes in the graph. We also propose efficient strategies to select a subset of nodes to add to the graph and to select a subset of queries to gather additional nodes, without significantly reducing the performance gain. We extend the classic set cover problem to develop a node selection criterion, which opens up interesting theoretical possibilities. Finally, we propose a hybrid approach that achieves 74.3% of the total improvement using only 18.3% of all additional mentions.
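
As a rough illustration of the set cover idea behind node selection, the sketch below runs the textbook greedy set cover heuristic. It is not the paper's extended criterion, which the abstract does not specify. The function name `greedy_cover`, the page identifiers, and the notion that each candidate web page "covers" a set of author mentions are all assumptions made for this example.

```python
def greedy_cover(candidates, universe):
    """Classic greedy set cover (illustrative only, not the paper's criterion).

    candidates: dict mapping a hypothetical web page id to the set of
                mention ids that page is assumed to cover.
    universe:   iterable of all mention ids we would like covered.
    Returns (chosen page ids in selection order, mentions left uncovered).
    """
    uncovered = set(universe)
    chosen = []
    while uncovered and candidates:
        # Pick the page covering the most still-uncovered mentions.
        # Sorting the keys makes tie-breaking deterministic.
        best = max(sorted(candidates), key=lambda p: len(candidates[p] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            break  # no remaining page covers anything new
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered


# Hypothetical data: four candidate pages and five author mentions.
pages = {
    "page_a": {1, 2, 3},
    "page_b": {3, 4},
    "page_c": {4, 5},
    "page_d": {1},
}
selected, missing = greedy_cover(pages, range(1, 6))
```

In this toy input, two of the four pages are enough to cover every mention, which mirrors the abstract's goal of adding only a subset of the available web nodes to the graph.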

Subjects: 12. Machine Learning and Discovery; 10. Knowledge Acquisition

Submitted: May 15, 2007

This page is copyrighted by AAAI. All rights reserved.