Occam’s Razor and a Non-Syntactic Measure of Decision Tree Complexity

Authors

Goutam Paul

Proceedings:

Book One

Volume Issue:

Proceedings of the AAAI Conference on Artificial Intelligence, 19

Track:

Student Abstracts


Abstract:

Occam’s razor, attributed to the fourteenth-century English philosopher William of Occam, states: “plurality should not be assumed without necessity.” The machine learning interpretation of Occam’s razor is that if two models have the same performance on the training set, the simpler one should be chosen. Decision tree learning relies widely on Occam’s razor. Popular decision tree generating algorithms are based on the information gain criterion, which inherently prefers shorter trees (Mitchell 1997). Furthermore, decision tree pruning is common regardless of the splitting criterion. Experiments suggest that shorter trees do have better generalization accuracy (GA), typically estimated by prediction accuracy on a validation set. However, some case studies show evidence that apparently contradicts Occam’s razor. Webb (1996) built C4.5X, a version of the C4.5 decision tree classifier (Quinlan 1993) with a postprocessor that adds more nodes and branches to the tree generated by basic C4.5. He showed that although C4.5 and C4.5X have identical training set accuracies, the generalization accuracy of C4.5X is better on some datasets. But Webb’s argument rests on the traditional syntactic complexity measure of decision trees (the number of nodes). In this paper, we explore a non-syntactic measure of decision tree complexity using the notion of Kolmogorov complexity (Kolmogorov 1965) and show that, under this measure, the complexity of the C4.5X tree is on average less than that of the C4.5 tree. Hence, according to our measure of complexity, C4.5X does not violate Occam’s razor.
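
The distinction between syntactic and non-syntactic complexity can be illustrated with a minimal sketch. It is not the measure developed in the paper, which the abstract does not specify. It assumes a hypothetical tuple encoding of binary trees and uses zlib-compressed length as a crude, computable stand-in for Kolmogorov complexity, which is itself uncomputable. The names `node_count`, `predict`, and `behaviour_proxy` are illustrative only.

```python
import zlib
from itertools import product

# Assumed encoding: a tree is either a one-character class label (str)
# or a tuple (feature_index, left, right). Features are binary:
# go left when the feature is 0, right when it is 1.

def node_count(tree):
    """Syntactic complexity: total number of nodes, leaves included."""
    if isinstance(tree, str):
        return 1
    _, left, right = tree
    return 1 + node_count(left) + node_count(right)

def predict(tree, x):
    """Walk from the root to a leaf and return its label."""
    while not isinstance(tree, str):
        feature, left, right = tree
        tree = right if x[feature] else left
    return tree

def behaviour_proxy(tree, n_features):
    """Compressed length of the tree's labels over all binary inputs.

    This depends only on the function the tree computes, not on how
    the tree is written down. It is an illustrative proxy, not a
    true Kolmogorov complexity.
    """
    labels = "".join(predict(tree, x)
                     for x in product((0, 1), repeat=n_features))
    return len(zlib.compress(labels.encode("ascii"), 9))

# Two trees that compute the same function with different node counts.
small = (0, "a", "b")
padded = (0, (1, "a", "a"), (1, "b", "b"))

print(node_count(small), node_count(padded))
print(behaviour_proxy(small, 4), behaviour_proxy(padded, 4))
```

Here `padded` has more nodes than `small`, yet both receive the same behaviour-based score because they label every input identically. This shows only that the two kinds of measure can disagree; it does not reproduce the comparison between C4.5 and C4.5X.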

