Data Mining and Model Simplicity: A Case Study in Diagnosis

Gregory M. Provan (Rockwell Science Center) and Moninder Singh (University of Pennsylvania)

We describe the results of performing data mining on a challenging medical diagnosis domain, acute abdominal pain. This domain is well known to be difficult, yielding little more than 60% predictive accuracy for most human and machine diagnosticians. Moreover, many researchers argue that one of the simplest approaches, the naive Bayesian classifier, is optimal. By comparing the performance of the naive Bayesian classifier to its more general cousin, the Bayesian network classifier, and to selective Bayesian classifiers with just 10% of the total attributes, we show that the simplest models perform at least as well as the more complex models. We argue that simple models like the selective naive Bayesian classifier will perform as well as more complicated models for similarly complex domains with relatively small data sets, thereby calling into question the extra expense necessary to induce more complex models.
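
As a rough illustration of the models being compared, the following sketch trains a naive Bayesian classifier on categorical data and a selective variant that uses only a subset of the attributes. The abstract does not say how attributes were selected, so ranking them by mutual information with the class is an assumption made here for illustration, not necessarily the authors' method. The function names, the Laplace smoothing, and the toy data are likewise hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels, attrs):
    """Fit a categorical naive Bayes model using only the attribute indices in attrs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute, class) -> counts of each value
    values = defaultdict(set)            # attribute -> distinct values seen in training
    for row, y in zip(rows, labels):
        for a in attrs:
            value_counts[(a, y)][row[a]] += 1
            values[a].add(row[a])
    return class_counts, value_counts, values, list(attrs)


def predict_nb(model, row):
    """Return the class with the highest log posterior under the naive Bayes model."""
    class_counts, value_counts, values, attrs = model
    n = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c in sorted(class_counts):
        score = math.log(class_counts[c] / n)
        for a in attrs:
            # Laplace (add-one) smoothing; an assumption of this sketch.
            num = value_counts[(a, c)][row[a]] + 1
            den = class_counts[c] + len(values[a])
            score += math.log(num / den)
        if score > best_score:
            best, best_score = c, score
    return best


def mutual_information(rows, labels, a):
    """Empirical mutual information (in nats) between attribute a and the class."""
    n = len(rows)
    joint = Counter((row[a], y) for row, y in zip(rows, labels))
    p_attr = Counter(row[a] for row in rows)
    p_class = Counter(labels)
    return sum(
        (c / n) * math.log(c * n / (p_attr[v] * p_class[y]))
        for (v, y), c in joint.items()
    )


def select_attributes(rows, labels, k):
    """Keep the k attributes with the highest mutual information (assumed criterion)."""
    ranked = sorted(
        range(len(rows[0])),
        key=lambda a: mutual_information(rows, labels, a),
        reverse=True,
    )
    return sorted(ranked[:k])


# Made-up toy data: attribute 0 determines the class, the rest are weakly informative.
rows = [
    ("yes", "a", "x", "p"),
    ("yes", "b", "x", "q"),
    ("yes", "a", "y", "p"),
    ("no", "b", "y", "q"),
    ("no", "a", "x", "q"),
    ("no", "b", "y", "p"),
]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

full_model = train_nb(rows, labels, range(len(rows[0])))
selected = select_attributes(rows, labels, k=1)
selective_model = train_nb(rows, labels, selected)

query = ("yes", "b", "y", "q")
print(selected, predict_nb(full_model, query), predict_nb(selective_model, query))
```

The selective model here is the same classifier trained on fewer attributes, which is why it costs less to induce than a Bayesian network whose structure must also be learned.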

This page is copyrighted by AAAI. All rights reserved.