Knowledge Discovery for Document Classification

Chidanand Apte, Fred Damerau, and Sholom Weiss

We report on extensive experiments using rule-based induction methods for document classification. The goal is to automatically discover patterns in document classifications, potentially surpassing the humans who currently read and classify these documents. By using a decision rule model, we induce results in a form compatible with expensive human-engineered systems that have recently demonstrated excellent performance. Using computer-intensive rule induction methods, we have conducted experiments over a vast set of document families, including UPI, Reuters, NTIS, and the Library of Congress Catalog. We report on several approaches to classic problems in such applications, including choosing the right representation for text and handling high dimensionality.
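
To make the idea of a decision rule model concrete, the following is a minimal sketch of how such rules might be applied to a document once they have been induced. It assumes each topic is described by a disjunction of term conjunctions; the topics, terms, and the names `RULES`, `tokenize`, and `classify` are hypothetical illustrations and are not taken from the paper or its experiments.

```python
from typing import Dict, List, Set

# A topic maps to a list of conjunctions; each conjunction is a set of
# terms that must all appear in the document for the rule to fire.
RuleSet = Dict[str, List[Set[str]]]

# Invented example rules (assumption): not rules reported by the authors.
RULES: RuleSet = {
    "earnings": [{"profit", "quarter"}, {"dividend"}],
    "grain": [{"wheat"}, {"corn", "harvest"}],
}


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of punctuation-stripped words."""
    words = {w.strip(".,;:!?\"'()").lower() for w in text.split()}
    return words - {""}


def classify(text: str, rules: RuleSet = RULES) -> List[str]:
    """Return every topic for which at least one conjunction is satisfied."""
    words = tokenize(text)
    return sorted(
        topic
        for topic, conjunctions in rules.items()
        if any(conj <= words for conj in conjunctions)
    )
```

Because each rule is a readable list of terms, a set like this can be inspected or edited by a person, which is the sense in which induced rules are compatible with hand-built systems.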

This page is copyrighted by AAAI. All rights reserved.