Probabilistic Representations for Integrating Unreliable Data Sources

David Mimno, Andrew McCallum, Gerome Miklau

Databases constructed automatically through web mining and information extraction often overlap with databases constructed and curated by hand. The two kinds of database are complementary: automatic extraction provides broader coverage, while hand curation provides higher accuracy. Because integrating them is inherently uncertain, the merged database should be able to represent multiple possible values for a field rather than commit to a single one. We present initial work on a system that integrates two bibliographic databases, DBLP and Rexa, retaining alternative values in merged records and assigning each a probabilistic confidence.
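
The idea of a merged record that keeps alternative values with confidences can be illustrated with a minimal sketch. This is not the authors' system: the per-source weights, the weight-and-normalize combination rule, the function names, and the example record are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical reliability weights per source (assumed, not from the paper):
# the curated source is trusted more than the automatically extracted one.
SOURCE_WEIGHT = {"dblp": 0.9, "rexa": 0.6}


def merge_field(observations):
    """Turn (source, value) pairs into a {value: confidence} distribution.

    Each source adds its weight to the value it reports; the totals are
    then normalized so the confidences for the field sum to 1.
    """
    scores = defaultdict(float)
    for source, value in observations:
        scores[value] += SOURCE_WEIGHT[source]
    total = sum(scores.values())
    return {value: score / total for value, score in scores.items()}


def merge_records(records):
    """Merge {source: {field: value}} into {field: {value: confidence}}."""
    fields = sorted({name for rec in records.values() for name in rec})
    merged = {}
    for name in fields:
        observations = [
            (source, rec[name]) for source, rec in records.items() if name in rec
        ]
        merged[name] = merge_field(observations)
    return merged


# Made-up example record: the two sources agree on the title, disagree on the year.
merged = merge_records({
    "dblp": {"title": "An Example Paper", "year": "2007"},
    "rexa": {"title": "An Example Paper", "year": "2006"},
})

for field, alternatives in merged.items():
    print(field, alternatives)
```

Under these assumed weights, a field on which the sources agree ends up with a single value, while a disputed field keeps both alternatives, with the value from the more trusted source receiving the larger share of confidence.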

Subjects: 11. Knowledge Representation; 3.4 Probabilistic Reasoning

Submitted: May 15, 2007

This page is copyrighted by AAAI. All rights reserved.