Track:
All Papers
Abstract:
For many text classification tasks, sets of background text are easily available from the Web and other online sources. We show that such background text can greatly improve text classification performance when it is treated as unlabeled data and iteratively labeled using existing EM-based techniques. Although results are most pronounced when the background text falls into categories that mirror those present in the training and test data, we show improved classification accuracy, especially when training data is limited, even though the use of background text violates many of the assumptions underlying the original approach.
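
To make the idea of iteratively labeling background text concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a multinomial naive Bayes classifier with Laplace smoothing, which the abstract does not specify. The function names (`train_nb`, `posterior`, `em_with_background`, `classify`) and the toy documents are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def train_nb(docs, weights, vocab, n_classes, alpha=1.0):
    """M-step: fit naive Bayes from documents with (possibly soft) class weights."""
    class_mass = [alpha] * n_classes
    word_mass = [{w: alpha for w in vocab} for _ in range(n_classes)]
    for tokens, probs in zip(docs, weights):
        counts = Counter(t for t in tokens if t in vocab)
        for c in range(n_classes):
            class_mass[c] += probs[c]
            for word, n in counts.items():
                word_mass[c][word] += probs[c] * n
    total = sum(class_mass)
    log_prior = [math.log(m / total) for m in class_mass]
    log_like = []
    for c in range(n_classes):
        z = sum(word_mass[c].values())
        log_like.append({w: math.log(v / z) for w, v in word_mass[c].items()})
    return log_prior, log_like


def posterior(tokens, log_prior, log_like):
    """E-step for one document: class probabilities under the current model."""
    scores = [
        lp + sum(ll[t] for t in tokens if t in ll)
        for lp, ll in zip(log_prior, log_like)
    ]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def em_with_background(labeled, labels, background, n_classes, iterations=10):
    """Treat background text as unlabeled data and relabel it on each EM pass."""
    vocab = {t for doc in labeled + background for t in doc}
    hard = [[1.0 if c == y else 0.0 for c in range(n_classes)] for y in labels]
    # Initial model uses the labeled training documents only.
    model = train_nb(labeled, hard, vocab, n_classes)
    for _ in range(iterations):
        # E-step: probabilistically label every background document.
        soft = [posterior(doc, *model) for doc in background]
        # M-step: retrain on labeled data plus the soft-labeled background text.
        model = train_nb(labeled + background, hard + soft, vocab, n_classes)
    return model


def classify(tokens, model):
    """Return the index of the most probable class for a tokenized document."""
    probs = posterior(tokens, *model)
    return max(range(len(probs)), key=probs.__getitem__)


# Toy data (hypothetical): class 0 = sports, class 1 = politics.
labeled = [["goal", "match"], ["election", "vote"]]
labels = [0, 1]
background = [
    ["goal", "striker", "match"],
    ["striker", "league", "goal"],
    ["vote", "senate", "election"],
    ["senate", "ballot", "vote"],
]
model = em_with_background(labeled, labels, background, n_classes=2)
print(classify(["striker", "league"], model))
print(classify(["senate", "ballot"], model))
```

In this toy setup the words "striker" and "senate" never occur in the labeled documents, so a classifier trained on those alone has no evidence about them. The background documents link them to words that are labeled, which is the mechanism the abstract relies on when training data is limited.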