Rónán Páircéir, Sally McClean and Bryan Scotney
Data warehouses and statistical databases (Shoshani 1997) contain both numerical attributes (measures) and categorical attributes (dimensions). These data are often stored in a relational database with an associated hierarchical structure, yet few algorithms to date explicitly exploit this hierarchy when carrying out knowledge discovery on such data. We examine several aspects of knowledge discovery from a set of databases distributed over the Internet, including the following: discovery of statistical relationships, rules and exceptions from hierarchically structured data that may contain heterogeneous and non-independent instances; use of aggregates as sufficient statistics in place of base data for efficient model computation; leveraging the power of a relational database system for efficient computation of sufficient statistics; and use of statistical metadata to aid distributed data integration and knowledge discovery.
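To make the idea of aggregates as sufficient statistics concrete, the sketch below assumes a simple Gaussian model of a single measure. Under that assumption, the count, sum and sum of squares of each cell are sufficient for the mean and variance. These summaries can be computed separately at each distributed site, or at each node of a dimension hierarchy, and then added together, so the base data never need to be moved. The names `Aggregate`, `from_values` and `mean_and_variance` are hypothetical and chosen for illustration. This is not the authors' method, only one minimal instance of the general technique.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Aggregate:
    """Hypothetical per-cell summary of one measure: count, sum, sum of squares.

    Assumes a Gaussian model, for which (n, sum, sum of squares) are
    sufficient statistics for the mean and variance.
    """
    n: int
    total: float
    total_sq: float

    @classmethod
    def from_values(cls, values: Iterable[float]) -> "Aggregate":
        # Summarise base data once, e.g. at the site that holds it.
        vals = list(values)
        return cls(len(vals), sum(vals), sum(v * v for v in vals))

    def __add__(self, other: "Aggregate") -> "Aggregate":
        # Sufficient statistics of disjoint cells combine by simple addition,
        # which is what allows rolling up a hierarchy or merging remote sites.
        return Aggregate(self.n + other.n,
                         self.total + other.total,
                         self.total_sq + other.total_sq)


def mean_and_variance(agg: Aggregate) -> Tuple[float, float]:
    """Sample mean and (n - 1)-denominator variance from the aggregate alone."""
    if agg.n < 2:
        raise ValueError("need at least two observations for a sample variance")
    mean = agg.total / agg.n
    var = (agg.total_sq - agg.n * mean * mean) / (agg.n - 1)
    return mean, var


if __name__ == "__main__":
    # Two hypothetical sites each ship only their summary, not their rows.
    site_a = Aggregate.from_values([2.0, 4.0, 6.0])
    site_b = Aggregate.from_values([8.0, 10.0])
    print(mean_and_variance(site_a + site_b))
```

Running the script prints `(6.0, 10.0)`, the same result as computing directly on the five pooled values. In SQL terms, each `Aggregate` corresponds to a `COUNT`, `SUM` and `SUM` of squares grouped by the dimension attributes, which a relational engine can compute efficiently. The naive sum-of-squares formula can lose precision for large values with small spread; a pairwise or Welford-style update is a safer choice in practice.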