Toward a Theoretical Understanding of Why and When Decision Tree Pruning Algorithms Fail

Authors

Tim Oates and David Jensen, University of Massachusetts

Proceedings:

Proceedings of the AAAI Conference on Artificial Intelligence, 16

Track:

Learning

Abstract:

Recent empirical studies revealed two surprising pathologies of several common decision tree pruning algorithms. First, tree size is often a linear function of training set size, even when additional tree structure yields no increase in accuracy (Oates and Jensen 1997). Second, building trees with data in which the class label and the attributes are independent often results in large trees (Oates and Jensen 1998). In both cases, the pruning algorithms fail to control tree growth as one would expect them to. We explore this behavior theoretically by constructing a statistical model of reduced error pruning (Quinlan 1987). The model explains why and when the pathologies occur, and makes predictions about how to lessen their effects. The predictions are operationalized in a variant of reduced error pruning that is shown to control tree growth far better than the original algorithm. Finally, we argue that several other common pruning techniques can be viewed within the same framework, thus explaining their pathological behavior as well.
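
For readers unfamiliar with the algorithm the abstract analyzes, the following is a minimal sketch of bottom-up reduced error pruning on binary attributes. It is an illustration only, not the paper's statistical model or its proposed variant. The `Node` class, the helper names, the toy tree, and the rule of pruning on ties are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: int                      # majority training class at this node (assumed given)
    attr: Optional[int] = None      # index of the binary attribute tested; None for a leaf
    left: Optional["Node"] = None   # branch taken when the attribute value is 0
    right: Optional["Node"] = None  # branch taken when the attribute value is 1


def predict(node, x):
    # Walk down the tree until a leaf is reached.
    while node.attr is not None:
        node = node.right if x[node.attr] else node.left
    return node.label


def subtree_errors(node, data):
    # Number of examples in `data` misclassified by the subtree rooted at `node`.
    return sum(predict(node, x) != y for x, y in data)


def size(node):
    # Total number of nodes (internal nodes plus leaves).
    if node.attr is None:
        return 1
    return 1 + size(node.left) + size(node.right)


def reduced_error_prune(node, data):
    # `data` holds the pruning-set examples that reach `node`.
    if node.attr is None:
        return node
    left_data = [(x, y) for x, y in data if not x[node.attr]]
    right_data = [(x, y) for x, y in data if x[node.attr]]
    node.left = reduced_error_prune(node.left, left_data)
    node.right = reduced_error_prune(node.right, right_data)
    # Errors if this node were collapsed to a leaf predicting its majority class.
    leaf_errors = sum(y != node.label for _, y in data)
    # Assumption: ties are resolved in favor of the smaller tree.
    if leaf_errors <= subtree_errors(node, data):
        return Node(label=node.label)
    return node


# Hypothetical toy tree: the split on attribute 1 does not help on the pruning set.
tree = Node(label=0, attr=0,
            left=Node(label=0),
            right=Node(label=1, attr=1,
                       left=Node(label=1),
                       right=Node(label=0)))
pruning_set = [((0, 0), 0), ((0, 1), 0), ((1, 0), 1), ((1, 1), 1)]

size_before = size(tree)
pruned = reduced_error_prune(tree, pruning_set)
print(size_before, size(pruned))
```

In this toy case the lower split is collapsed because a leaf makes no more pruning-set errors than the subtree it replaces, while the root split is kept because removing it would increase error. The pathologies described in the abstract concern how often such comparisons retain structure that does not improve accuracy.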

ISBN 978-0-262-51106-3

July 18-22, 1999, Orlando, Florida. Published by The AAAI Press, Menlo Park, California.
