Text Classification in USENET Newsgroups: A Progress Report

Scott A. Weiss, Simon Kasif, and Eric Brill

We report on our investigations into topic classification with USENET newsgroups. Our framework is to determine the newsgroup to which a new document should be posted. We train our system by forming "metadocuments" that represent each topic. We discuss our experiments with this method and provide evidence that choosing particular documents or words to use in these models degrades classification accuracy. We also describe a technique called classification-based retrieval for finding documents similar to a query document.
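
As a rough illustration of the metadocument idea, the sketch below pools all training posts from a newsgroup into a single bag of words and assigns a new document to the newsgroup whose metadocument it most resembles. The abstract does not specify the term weighting or the similarity measure, so the raw term counts, the cosine similarity, the whitespace tokenizer, and all function names and toy posts here are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the abstract gives no details.
    return text.lower().split()


def build_metadocuments(training_posts):
    """Pool every (newsgroup, text) pair into one term-count vector per newsgroup."""
    metadocs = {}
    for group, text in training_posts:
        metadocs.setdefault(group, Counter()).update(tokenize(text))
    return metadocs


def cosine(a, b):
    """Cosine similarity between two term-count vectors (assumed measure)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(metadocs, text):
    """Return the newsgroup whose metadocument is most similar to the text."""
    query = Counter(tokenize(text))
    return max(metadocs, key=lambda group: cosine(query, metadocs[group]))


# Hypothetical toy data, invented for this sketch only.
training_posts = [
    ("rec.autos", "the engine and brakes on my car"),
    ("rec.autos", "car dealer quoted a price for new tires"),
    ("sci.space", "the shuttle launch reached orbit"),
    ("sci.space", "nasa plans a lunar orbit mission"),
]
metadocs = build_metadocuments(training_posts)
predicted = classify(metadocs, "which tires fit my car")
```

Note that this sketch keeps every training post and every word in each metadocument, in line with the abstract's observation that selecting particular documents or words degraded accuracy.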


This page is copyrighted by AAAI. All rights reserved.