Tom Bylander, Lisa Tatum
AdaBoost is a well-known, effective technique for increasing the accuracy of learning algorithms. However, it has the potential to overfit the training set because its objective is to minimize error on the training set. We demonstrate that overfitting in AdaBoost can be alleviated in a time-efficient manner using a combination of dagging and validation sets. Half of the training set is removed to form the validation set. The sequence of base classifiers, produced by AdaBoost from the training set, is applied to the validation set, creating a modified set of weights. The training and validation sets are then switched, and a second pass is performed. The final classifier votes using both sets of weights. We show that our algorithm performs comparably to AdaBoost on standard datasets and better when classification noise is added.
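The two-pass procedure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes decision stumps as base classifiers, labels in {-1, +1}, a random half split, and that the "modified set of weights" is obtained by rerunning AdaBoost's usual vote-weight and example-weight updates on the validation half with the classifier sequence held fixed. All function names (`train_stump`, `validation_alphas`, `two_pass_boost`, and so on) are hypothetical.

```python
import math
import random

def stump_predict(stump, x):
    """Decision stump: predict s if x[f] >= t, else -s."""
    f, t, s = stump
    return s if x[f] >= t else -s

def weighted_error(h, X, y, w):
    return sum(wi for xi, yi, wi in zip(X, y, w) if stump_predict(h, xi) != yi)

def train_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best_err, best = None, None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for s in (1, -1):
                err = weighted_error((f, t, s), X, y, w)
                if best_err is None or err < best_err:
                    best_err, best = err, (f, t, s)
    return best

def alpha_from_error(err, eps=1e-10):
    """Standard AdaBoost vote weight; err is clamped to avoid log(0)."""
    err = min(max(err, eps), 1 - eps)
    return 0.5 * math.log((1 - err) / err)

def reweight(h, a, X, y, w):
    """Standard AdaBoost example-weight update, renormalized to sum to 1."""
    w = [wi * math.exp(-a * yi * stump_predict(h, xi))
         for xi, yi, wi in zip(X, y, w)]
    total = sum(w)
    return [wi / total for wi in w]

def adaboost(X, y, rounds):
    """Plain AdaBoost on the training half; returns the stump sequence."""
    w = [1.0 / len(X)] * len(X)
    stumps = []
    for _ in range(rounds):
        h = train_stump(X, y, w)
        a = alpha_from_error(weighted_error(h, X, y, w))
        stumps.append(h)
        w = reweight(h, a, X, y, w)
    return stumps

def validation_alphas(stumps, Xv, yv):
    """Assumption: recompute vote weights from validation error, with the
    classifier sequence fixed. A weight can come out negative when a stump
    does worse than chance on the validation half."""
    w = [1.0 / len(Xv)] * len(Xv)
    alphas = []
    for h in stumps:
        a = alpha_from_error(weighted_error(h, Xv, yv, w))
        alphas.append(a)
        w = reweight(h, a, Xv, yv, w)
    return alphas

def two_pass_boost(X, y, rounds=10, seed=0):
    """Split in half, boost on one half, reweight on the other, then swap."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    first, second = idx[:half], idx[half:]
    ensemble = []
    for train, val in ((first, second), (second, first)):
        Xt, yt = [X[i] for i in train], [y[i] for i in train]
        Xv, yv = [X[i] for i in val], [y[i] for i in val]
        stumps = adaboost(Xt, yt, rounds)
        ensemble.extend(zip(validation_alphas(stumps, Xv, yv), stumps))
    return ensemble

def predict(ensemble, x):
    """Final classifier: weighted vote over the stumps from both passes."""
    score = sum(a * stump_predict(h, x) for a, h in ensemble)
    return 1 if score >= 0 else -1
```

In this sketch each base classifier is trained on only half of the data, which is the dagging-like part, and its vote weight is set from examples it never saw during training. Calling `two_pass_boost(X, y, rounds=5)` returns `2 * rounds` weighted stumps that `predict` combines in a single vote.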
Subjects: 12. Machine Learning and Discovery
Submitted: Feb 10, 2006