Son Dao and Brad Perry, Hughes Research Laboratories
We introduce an application of data mining techniques to heterogeneous database schema integration. We use attribute-oriented induction to mine characteristic and classification rules about individual attributes from heterogeneous databases. Each mining request is conditioned on a subset of attributes identified as common to the databases. We develop a method to compare the rules mined for two or more attributes (from different databases) and use the similarity between those rules as a basis for suggesting that the attributes themselves are similar. As a result, the schema integration process is driven by relationships among entire sets of attributes from multiple databases rather than by attributes considered in isolation. Our initial efforts and prototypes applying data mining to assist schema integration are promising and, we feel, identify a fruitful application area for data mining research.
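The comparison step can be illustrated with a minimal sketch. It is not the method developed in the paper: it assumes that a characteristic rule can be approximated by the relative frequency of generalized attribute values, conditioned on one common attribute, and that two rule sets are similar when those frequency distributions overlap. All names (`characteristic_rules`, `rule_similarity`, `salary_band`, `tenure_band`), the toy tables, and the thresholds are hypothetical.

```python
from collections import Counter, defaultdict


def characteristic_rules(rows, common_attr, target_attr, generalize=lambda v: v):
    """Map each value of common_attr to the relative frequency of
    generalized target_attr values seen with it (a stand-in for mined rules)."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[common_attr]][generalize(row[target_attr])] += 1
    rules = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        rules[key] = {value: n / total for value, n in counter.items()}
    return rules


def rule_similarity(rules_a, rules_b):
    """Average distribution overlap over the conditioning values both rule
    sets share; returns 0.0 when they share none."""
    shared = set(rules_a) & set(rules_b)
    if not shared:
        return 0.0
    total = 0.0
    for key in shared:
        dist_a, dist_b = rules_a[key], rules_b[key]
        values = set(dist_a) | set(dist_b)
        # Overlap of two distributions: sum of the pointwise minima.
        total += sum(min(dist_a.get(v, 0.0), dist_b.get(v, 0.0)) for v in values)
    return total / len(shared)


# Hypothetical generalization steps (assumed concept hierarchies).
def salary_band(amount):
    return "high" if amount >= 50000 else "low"


def tenure_band(years):
    return "high" if years >= 10 else "low"


# Two toy databases that share the common attribute "dept".
db1 = [
    {"dept": "eng", "salary": 70000},
    {"dept": "eng", "salary": 65000},
    {"dept": "ops", "salary": 30000},
    {"dept": "ops", "salary": 35000},
]
db2 = [
    {"dept": "eng", "pay": 72000, "years": 3},
    {"dept": "eng", "pay": 40000, "years": 20},
    {"dept": "ops", "pay": 31000, "years": 2},
    {"dept": "ops", "pay": 33000, "years": 25},
]

salary_rules = characteristic_rules(db1, "dept", "salary", salary_band)
pay_rules = characteristic_rules(db2, "dept", "pay", salary_band)
years_rules = characteristic_rules(db2, "dept", "years", tenure_band)

sim_pay = rule_similarity(salary_rules, pay_rules)
sim_years = rule_similarity(salary_rules, years_rules)
print("salary vs pay:", sim_pay)
print("salary vs years:", sim_years)
```

In this toy data, `pay` scores higher than `years` against `salary`, so `pay` would be the attribute suggested as a candidate match. Comparing band labels across differently named attributes is itself an assumption of the sketch; a real system would need a principled way to align generalized values.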