Evaluating Probability Estimates from Decision Trees

Nitesh Chawla, David Cieslak

Decision trees, a popular choice for classification, are limited in their ability to provide good-quality probability estimates. Typically, smoothing methods such as Laplace or the m-estimate are applied at the decision tree leaves to overcome the systematic bias introduced by frequency-based estimates. Ensembles of decision trees have also been shown to reduce the bias and variance of the leaf estimates, resulting in better-calibrated probabilistic predictions. In this work, we evaluate the calibration, or quality, of these estimates using various loss measures. We also examine the relationship between the quality of such estimates and the resulting rank-ordering of test instances. Our results quantify the impact of smoothing in terms of the loss measures and its coupled relationship with the AUC measure.
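To make the leaf-level smoothing concrete, the following is a minimal sketch of the raw frequency estimate alongside the commonly used textbook forms of the Laplace and m-estimate corrections. The function names, the parameter names (k, n, num_classes, prior, m), and the example values are illustrative assumptions and are not taken from the paper, which may use different variants or settings.

```python
# Illustrative sketch only: standard textbook forms of leaf smoothing.
# All names and default values here are hypothetical, not the authors' code.

def frequency_estimate(k: int, n: int) -> float:
    """Raw leaf estimate: fraction of the n training examples at the
    leaf that belong to the class of interest (k of them)."""
    return k / n


def laplace_estimate(k: int, n: int, num_classes: int = 2) -> float:
    """Laplace correction: add one pseudo-count per class, which pulls
    estimates from small leaves toward the uniform value 1 / num_classes."""
    return (k + 1) / (n + num_classes)


def m_estimate(k: int, n: int, prior: float, m: float) -> float:
    """m-estimate: add m pseudo-examples distributed according to a prior
    (for example the class base rate), which pulls small leaves toward it."""
    return (k + prior * m) / (n + m)


if __name__ == "__main__":
    # A small, pure leaf: 3 training examples, all of the positive class.
    k, n = 3, 3
    print(frequency_estimate(k, n))
    print(laplace_estimate(k, n))
    print(m_estimate(k, n, prior=0.1, m=10))
```

For a pure leaf holding only three examples, the frequency estimate is the extreme value 1.0, while the Laplace estimate is 4/5 and the m-estimate (with the assumed prior of 0.1 and m = 10) is 4/13. This is the kind of shrinkage away from 0 and 1 that smoothing is intended to provide.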

Subjects: 12. Machine Learning and Discovery; 15.6 Decision Trees

Submitted: May 25, 2006
