On the Use of Statistical, Neural Net and Machine Learning Techniques to Find Structure in Medical Data: A Case Study with the Diabetes Dataset

S. H. Zeller

The diabetes dataset is characterized by a limited number of independent variables and by strong prior information about the causal relationships thought to exist. Consequently, careful statistical modeling using modern techniques should be able to do a good job of determining what the underlying relationships are. It is of great interest, however, to see how well alternative approaches that require less intervention on the part of analysts compare with the statistical modeling approach. Two such alternative approaches to knowledge discovery are examined. The first is based on induction and takes two forms: tree-based regression and the C4.5 algorithm of Quinlan. The second employs a simple neural net trained with the backpropagation algorithm. The results from these two alternative approaches are then compared with the results of the statistical model.

