Some Implementation Aspects of a Discovery System

Authors

Willi Klosgen

Track:

Discovery of Dependencies and Models

Abstract:

Explora supports Discovery in Databases through large-scale search for interesting instances of statistical patterns. Because of the variety of patterns and the immense number of combinatorial possibilities in studying relations between variables in subsets of data, at least two implementation problems arise. First, the user must be protected from being overwhelmed by a deluge of findings. This can be achieved by some basic organizing principles built into the search. One principle is to organize the search hierarchically and to study the strongest hypotheses first (in most cases, the most general ones). Weaker hypotheses are then eliminated from further search. But even in moderately sized data, that approach alone usually does not prevent large sets of findings. Therefore, in a second evaluation phase, a refinement strategy selects the most interesting verified statements and also addresses the overlapping problem (caused by correlations between independent variables). Further, the user can focus a discovery task by specifying the analysis problem in more detail. Second, discovery systems must manage the efficiency problem. Each hypothesis evaluated when processing the large search space refers to subsets of cases stored in a database. These subsets correspond to combinations of variables and their (taxonomical) values. In principle, each subset requires random access to many cases, which takes much computation time. We describe solutions implemented in the discovery system Explora to deal with these two problems. In an appendix, results of a discovery session in Explora are presented, and the need to add more statistical strategies at a "higher" discovery level is discussed. At this level, instances of patterns verified during basic search are selected, refined, and combined to achieve a higher quality of presented findings, including greater potential for interpretation.
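The hierarchical search principle mentioned in the abstract (study general hypotheses first, then drop their weaker specializations) can be illustrated with a minimal sketch. This is not Explora's implementation: the function name, the data layout, the thresholds `min_size` and `min_lift`, and the simple "share of target cases exceeds the overall share" criterion are all assumptions chosen for illustration.

```python
from itertools import combinations, product


def general_to_specific_search(rows, target, attributes,
                               min_size=2, min_lift=0.25, max_depth=2):
    """Hypothetical sketch: search subgroups from general to specific.

    A subgroup is a conjunction of (attribute, value) conditions. It counts
    as verified when its share of target cases exceeds the overall share by
    at least `min_lift` (an assumed, simplified interestingness criterion).
    """
    overall = sum(r[target] for r in rows) / len(rows)
    values = {a: sorted({r[a] for r in rows}) for a in attributes}
    verified = []   # condition sets of verified subgroups
    findings = []   # (conditions, subgroup size, target share)

    # Depth 1 holds the most general subgroups; deeper levels specialize them.
    for depth in range(1, max_depth + 1):
        for attrs in combinations(attributes, depth):
            for combo in product(*(values[a] for a in attrs)):
                conds = frozenset(zip(attrs, combo))
                # Prune: a specialization of a verified subgroup is a weaker
                # hypothesis and is not evaluated at all.
                if any(v < conds for v in verified):
                    continue
                members = [r for r in rows
                           if all(r[a] == v for a, v in conds)]
                if len(members) < min_size:
                    continue
                share = sum(r[target] for r in members) / len(members)
                if share - overall >= min_lift:
                    verified.append(conds)
                    findings.append((tuple(sorted(conds)), len(members), share))
    return findings


# Toy data, invented for this example only.
rows = [
    {"region": "north", "age": "young", "buys": 1},
    {"region": "north", "age": "old", "buys": 1},
    {"region": "north", "age": "young", "buys": 1},
    {"region": "south", "age": "young", "buys": 0},
    {"region": "south", "age": "old", "buys": 0},
    {"region": "south", "age": "old", "buys": 0},
    {"region": "south", "age": "young", "buys": 1},
    {"region": "south", "age": "young", "buys": 0},
]
findings = general_to_specific_search(rows, "buys", ["region", "age"])
print(findings)
```

On this toy data only the general subgroup `region = north` is reported. Its specializations (`north` combined with an age value) are never evaluated, which is the pruning effect the abstract describes. Note that every subgroup evaluation here scans all rows, so the sketch does nothing about the efficiency problem the abstract raises.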
