S. H. Zeller
The diabetes dataset is characterized by a limited number of independent variables and by strong prior information about the causal relationships thought to exist. Consequently, careful statistical modeling using modern techniques should do a good job of determining the underlying relationships. It is of great interest, however, to see how well alternative approaches that require less intervention on the part of analysts compare with the statistical modeling approach. Two such alternative approaches to knowledge discovery are examined. The first is based on induction and takes two forms: tree-based regression and the C4.5 algorithm of Quinlan. The second employs a simple neural net and the backpropagation algorithm. The results from these two alternative approaches are then compared with the statistical model results.