Discovering Attribute Dependence in Databases by Integrating Symbolic Learning and Statistical Techniques

I. F. Imam, R. S. Michalski, and L. Kerschberg

This paper presents a method for integrating two different types of data analysis: symbolic inductive learning and statistical methods. The method addresses the problem of discovering rules that characterize the dependence between a group of dependent attributes and a group of independent attributes, e.g., decision attributes and measurable attributes. The method proceeds along two paths. The first path generates a set of rules characterizing this dependency using a symbolic inductive learning program (AQ15). In the obtained rules, each attribute is assigned an "importance score," which represents the ratio of the total number of examples that can be classified by this attribute to the maximum number of classified examples. The second path calculates the correlation coefficient between the decision and measurable attributes using the chi-square test. Independent attributes with both a low importance score and a low correlation are considered irrelevant and are removed from the data. AQ15 is then applied again to the data set spanned over the reduced set of attributes, in order to determine the simplest expression characterizing the dependency. The method has been experimentally applied to two real-world problems: "Wind bracing" (determining the dependence of the type of wind bracing for tall buildings on the parameters of the building) and "Accident factors" (determining the dependence between the age of construction workers and the types of accidents). The results show that the proposed approach of combining two different methods for determining relevant attributes gives better results than those obtained by applying either method separately.
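The attribute-filtering step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not run AQ15. It assumes the learned rules are already available as pairs of (attributes used, number of examples covered), it reads the importance score as each attribute's covered-example total divided by the largest such total, and it uses the raw Pearson chi-square statistic in place of the paper's correlation measure. The function names, the attribute names, and both thresholds are hypothetical.

```python
from collections import Counter


def importance_scores(rules):
    """Score each attribute by the examples covered by rules that use it.

    `rules` is a list of (attributes_used, n_examples_covered) pairs
    (an assumed representation, not AQ15's output format). Scores are
    normalized by the largest per-attribute total, so the top attribute
    gets 1.0.
    """
    totals = Counter()
    for attrs, covered in rules:
        for attr in set(attrs):
            totals[attr] += covered
    top = max(totals.values(), default=0)
    return {a: (n / top if top else 0.0) for a, n in totals.items()}


def chi_square(xs, ys):
    """Pearson chi-square statistic for two equally long lists of nominal values."""
    n = len(xs)
    row_counts = Counter(xs)
    col_counts = Counter(ys)
    observed = Counter(zip(xs, ys))
    stat = 0.0
    for r, rn in row_counts.items():
        for c, cn in col_counts.items():
            expected = rn * cn / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat


def irrelevant_attributes(data, decision, rules, min_score, min_chi2):
    """Return attributes whose importance score AND chi-square are both low."""
    scores = importance_scores(rules)
    ys = [record[decision] for record in data]
    dropped = []
    for attr in data[0]:
        if attr == decision:
            continue
        stat = chi_square([record[attr] for record in data], ys)
        if scores.get(attr, 0.0) < min_score and stat < min_chi2:
            dropped.append(attr)
    return dropped


# Toy data with hypothetical attribute names (not taken from the paper).
data = [
    {"height": "low", "color": "red", "bracing": "A"},
    {"height": "low", "color": "blue", "bracing": "A"},
    {"height": "high", "color": "red", "bracing": "B"},
    {"height": "high", "color": "blue", "bracing": "B"},
]
rules = [(["height"], 4)]  # one assumed rule that covers all four examples
dropped = irrelevant_attributes(data, "bracing", rules, min_score=0.5, min_chi2=3.84)
print(dropped)
```

In this toy case only "color" fails both criteria, so it is the only attribute removed. In the method as described, the rule learner would then be rerun on the remaining attributes. The cutoff 3.84 is the 0.05 critical value of the chi-square distribution with one degree of freedom; the abstract does not state which thresholds the authors used.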
