Decision Tree Pruning: Biased or Optimal?

Sholom M. Weiss, Nitin Indurkhya

We evaluate the performance of weakest-link pruning of decision trees using cross-validation. This technique maps tree pruning into a problem of tree selection: find the best (i.e., the right-sized) tree from a set of trees ranging in size from the unpruned tree to a null tree. For samples with at least 200 cases, extensive empirical evidence supports the following conclusions relative to tree selection: (a) 10-fold cross-validation is nearly unbiased; (b) not pruning a covering tree is highly biased; (c) 10-fold cross-validation is consistent with optimal tree selection for large sample sizes; and (d) the accuracy of tree selection by 10-fold cross-validation is largely dependent on sample size, irrespective of the population distribution.
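The tree-selection view described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions made only for brevity: a single numeric feature, binary labels in {0, 1}, Gini-based splits, synthetic data with 20% label noise, and candidate pruning levels taken directly at the weakest-link breakpoints of the full tree (CART-style implementations often use geometric midpoints between breakpoints instead). All function names (grow, weakest_link, prune_to, alpha_sequence, select_tree_by_cv) are hypothetical.

```python
import random

def grow(rows):
    """Grow an unpruned (covering) tree on (x, label) pairs with labels in {0, 1}."""
    rows = sorted(rows)
    n, ones = len(rows), sum(y for _, y in rows)
    node = {"pred": int(2 * ones > n), "err": min(ones, n - ones)}
    if node["err"] == 0 or rows[0][0] == rows[-1][0]:
        return node  # pure, or no threshold can separate the cases
    best, left_ones = None, 0
    for i in range(1, n):
        left_ones += rows[i - 1][1]
        if rows[i - 1][0] == rows[i][0]:
            continue  # cannot split between equal x values
        right_ones = ones - left_ones
        gini = left_ones * (i - left_ones) / i + right_ones * (n - i - right_ones) / (n - i)
        if best is None or gini < best[0]:
            best = (gini, i)
    i = best[1]
    node["t"] = rows[i - 1][0]  # cases with x <= t go left
    node["left"], node["right"] = grow(rows[:i]), grow(rows[i:])
    return node

def predict(node, x):
    while "t" in node:
        node = node["left"] if x <= node["t"] else node["right"]
    return node["pred"]

def weakest_link(node):
    """Return (training errors, leaves, g, link) for the subtree at node, where
    link is the internal node with the smallest error increase g per leaf removed."""
    if "t" not in node:
        return node["err"], 1, float("inf"), None
    le, ll, lg, ln = weakest_link(node["left"])
    re, rl, rg, rn = weakest_link(node["right"])
    err, leaves = le + re, ll + rl
    own = ((node["err"] - err) / (leaves - 1), node)
    g, link = min([own, (lg, ln), (rg, rn)], key=lambda c: c[0])
    return err, leaves, g, link

def prune_to(tree, alpha, n):
    """Collapse weakest links in place while their cost (as an error rate over n cases) is <= alpha."""
    while True:
        _, _, g, link = weakest_link(tree)
        if link is None or g / n > alpha:
            return tree
        for key in ("t", "left", "right"):
            del link[key]

def alpha_sequence(tree, n):
    """Pruning levels at which the tree shrinks, down to the null tree (prunes tree in place)."""
    alphas = [0.0]
    while "t" in tree:
        alphas.append(max(weakest_link(tree)[2] / n, alphas[-1]))
        prune_to(tree, alphas[-1], n)
    return sorted(set(alphas))

def select_tree_by_cv(rows, k=10, seed=0):
    """Pick the pruning level with the lowest k-fold cross-validated error."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    alphas = alpha_sequence(grow(rows), len(rows))
    cv_errors = [0] * len(alphas)
    for f in range(k):
        test = rows[f::k]
        train = [r for i, r in enumerate(rows) if i % k != f]
        tree = grow(train)
        for j, a in enumerate(alphas):  # alphas ascend, so in-place pruning is cumulative
            prune_to(tree, a, len(train))
            cv_errors[j] += sum(predict(tree, x) != y for x, y in test)
    # Ties are broken in favor of the larger alpha, i.e. the smaller tree.
    best = min(range(len(alphas)), key=lambda j: (cv_errors[j], -alphas[j]))
    final = prune_to(grow(rows), alphas[best], len(rows))
    return final, alphas[best], cv_errors[best] / len(rows)

# Synthetic sample of 200 cases: a step at x = 0.5 with 20% of labels flipped (assumed data).
rng = random.Random(1)
data = []
for _ in range(200):
    x = rng.random()
    data.append((x, int(x > 0.5) ^ (rng.random() < 0.2)))

unpruned = grow(data)
pruned, alpha, cv_error = select_tree_by_cv(data)
print(weakest_link(unpruned)[1], "leaves unpruned;", weakest_link(pruned)[1], "leaves selected")
print("selected alpha:", alpha, "cross-validated error rate:", cv_error)
```

In this sketch the unpruned tree fits the training sample exactly, so its apparent error is zero; the cross-validated error of the selected tree is the quantity that serves as the error estimate in the selection step.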

