Track:
Integrated and Interactive Systems
Abstract:
One of the main obstacles in applying data mining techniques to large, real-world databases is the lack of efficient data management. In this paper, we present the design and implementation of an effective two-level architecture for a data mining environment. It consists of a mining tool and a parallel DBMS server. The mining tool organizes and controls the search process, while the DBMS provides optimal response times for the few query types being used by the tool. Key elements of our architecture are its use of fast and simple database operations, its re-use of results obtained by previous queries, its maximal use of main memory to keep the database hot-set resident, and its parallel computation of queries. We show that this architecture, apart from giving a clear separation of responsibilities, leads to competitive performance on large data sets. Moreover, it provides a flexible experimentation platform for further studies in the optimization of repetitive database queries and in quality-driven rule discovery schemes.
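The re-use of earlier query results can be illustrated with a minimal sketch. It is not the system described in the paper: the class `SelectionServer`, the function `refine`, the `(attribute, value)` query form, and the `rows_scanned` counter are all hypothetical names introduced here, and parallel execution is left out. The sketch assumes that the mining tool refines a rule by adding one condition at a time, so that a new selection can be computed from the cached result of a less specific one instead of from the full table.

```python
from typing import Dict, FrozenSet, List, Tuple

Condition = Tuple[str, object]   # (attribute, value); hypothetical query form
Row = Dict[str, object]


class SelectionServer:
    """Toy stand-in for the DBMS level: answers conjunctive selections."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows                      # whole table kept in main memory
        # The empty query selects every row and seeds the cache.
        self.cache: Dict[FrozenSet[Condition], List[int]] = {
            frozenset(): list(range(len(rows)))
        }
        self.rows_scanned = 0                 # work counter, for illustration only

    def select(self, conditions: FrozenSet[Condition]) -> List[int]:
        """Return indices of rows satisfying all conditions."""
        if conditions in self.cache:
            return self.cache[conditions]
        # Re-use: start from the smallest cached result whose
        # conditions are a subset of the requested ones.
        base = min(
            (key for key in self.cache if key <= conditions),
            key=lambda key: len(self.cache[key]),
        )
        remaining = conditions - base
        result = []
        for i in self.cache[base]:
            self.rows_scanned += 1
            if all(self.rows[i].get(a) == v for a, v in remaining):
                result.append(i)
        self.cache[conditions] = result
        return result


def refine(server: SelectionServer, conditions: FrozenSet[Condition],
           attribute: str, value: object) -> int:
    """Mining-tool level: add one condition and return the support count."""
    return len(server.select(conditions | {(attribute, value)}))
```

In this sketch the mining tool only decides which refinement to try next, while the server decides how to answer it cheaply. A refinement of an already answered selection scans only the rows of that earlier result, which is the kind of saving the architecture aims for with repetitive queries.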