Heuristic Joins to Integrate Structured Heterogeneous Data

Scott B. Huffman and David Steier

Heterogeneous data sources often exhibit semantic heterogeneity at the data level; that is, the same real-world entity is referred to in different ways both within and across sources. This paper discusses heuristic join, a framework for combining information from such sources that extends the familiar equi-join for homogeneous sources. Heuristic join uses heuristic match operators rather than simple equality to determine whether tuples refer to the same entity. The inexactness of heuristic matching introduces a number of parameters into heuristic join that are not present in equi-joins. Our work is motivated by a real-world data integration problem that required the use of heuristic joins.
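
To make the contrast with an equi-join concrete, the following is a minimal sketch of the idea, not the authors' implementation. The abstract does not name the match operators or the parameters involved, so everything here is an assumption for illustration: the string-similarity operator built on Python's difflib, the `threshold` cutoff, and the `best_only` switch are hypothetical stand-ins for the kinds of parameters that inexact matching introduces.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower()
    )
    return " ".join(cleaned.split())


def similarity(a, b):
    """Hypothetical heuristic match operator: a score in [0, 1]."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def heuristic_join(left, right, left_key, right_key,
                   match=similarity, threshold=0.8, best_only=True):
    """Join two lists of dicts, pairing tuples whose match score
    reaches the threshold instead of requiring exact equality.

    threshold and best_only are illustrative parameters (assumptions),
    with no counterpart in an equi-join.
    """
    results = []
    for l_row in left:
        # Score every candidate on the right side against this left tuple.
        candidates = []
        for r_row in right:
            score = match(l_row[left_key], r_row[right_key])
            if score >= threshold:
                candidates.append((score, r_row))
        # Optionally keep only the highest-scoring candidate.
        if best_only and candidates:
            candidates = [max(candidates, key=lambda c: c[0])]
        for score, r_row in candidates:
            results.append((l_row, r_row, score))
    return results


if __name__ == "__main__":
    left = [{"company": "Acme Corp."}]
    right = [{"name": "ACME Corp"}, {"name": "Zenith Ltd"}]
    for l_row, r_row, score in heuristic_join(left, right, "company", "name"):
        print(l_row["company"], "<->", r_row["name"], round(score, 2))
```

In this sketch, "Acme Corp." and "ACME Corp" are paired even though an equality test on the raw strings would reject them. Passing an exact-equality function as `match` reduces the procedure to an ordinary equi-join, which is one way to read the claim that heuristic join extends it.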
