Name Disambiguation Using Web Connection

Yiming Lu, Zaiqing Nie, Taoyuan Cheng, Ying Gao, Ji-Rong Wen

Name disambiguation is an important challenge in data cleaning. In this paper, we focus on the problem of multiple real-world objects (e.g., authors, actors) in a dataset sharing the same name. We show that Web corpora can be exploited to significantly improve the accuracy (i.e., precision and recall) of name disambiguation. We introduce a novel approach called WebNaD (Web-based Name Disambiguation) that effectively measures and uses the Web connection between different object appearances sharing the same name in the local dataset. Our empirical study, conducted in the context of Libra, an academic search engine that indexes 1 million papers, shows the effectiveness of our approach.
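The abstract does not say how the Web connection is computed or how appearances are grouped, so the following is only a hypothetical sketch of the general setting, not WebNaD itself. It assumes that a Web-connection score between each pair of appearances of one name has already been computed (the `scores` dictionary), that pairs scoring at or above a chosen threshold are merged transitively (union-find), and that accuracy is measured with pairwise precision and recall, a common convention for disambiguation that is assumed here. The function names `cluster_by_connection` and `pairwise_precision_recall` are illustrative inventions.

```python
from itertools import combinations


def cluster_by_connection(n, scores, threshold):
    """Hypothetical grouping step: appearances 0..n-1 of one name are merged
    whenever their (assumed, precomputed) Web-connection score meets the
    threshold; merging is transitive via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (i, j), score in scores.items():
        if score >= threshold:
            parent[find(i)] = find(j)
    # Each appearance is labeled by the root of its group.
    return [find(i) for i in range(n)]


def pairwise_precision_recall(predicted, truth):
    """Pairwise evaluation (an assumed convention): a pair of appearances is
    positive when both are assigned to the same real-world object."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_pred = predicted[i] == predicted[j]
        same_true = truth[i] == truth[j]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Five appearances of one name; the true objects are {0, 1, 2} and {3, 4}.
truth = [0, 0, 0, 1, 1]
scores = {(0, 1): 0.9, (3, 4): 0.8, (2, 3): 0.6, (1, 2): 0.3}  # made-up values
labels = cluster_by_connection(5, scores, threshold=0.5)
print(pairwise_precision_recall(labels, truth))
```

With a threshold of 0.5, appearances 0 and 1 form one group and appearances 2, 3 and 4 form another, so two of the four predicted same-object pairs are correct and two of the four true same-object pairs are found: precision and recall are both 0.5. Raising or lowering the threshold trades one against the other, which is why a more discriminative connection measure can improve both.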

Subjects: 1.10 Information Retrieval; 1.6 Engineering And Science

Submitted: May 15, 2007

This page is copyrighted by AAAI. All rights reserved.