Proceedings:
Proceedings of the AAAI Conference on Artificial Intelligence, 16
Track:
Natural Language and Information Retrieval
Abstract:
This paper investigates the effect of prior feature selection in Support Vector Machine (SVM) text categorization. The input space was gradually enlarged using mutual information (MI) filtering and part-of-speech (POS) filtering, which determine the portion of words appropriate for SVM learning from information-theoretic and linguistic perspectives, respectively. The results common to both filtering methods are that 1) the optimal number of features differed completely across categories, and 2) average performance across categories was best when all of the words were used. In addition, a comparison of the two experiments shows that 3) POS filtering consistently outperforms MI filtering, which indicates that SVMs cannot identify irrelevant parts of speech. These results suggest a simple strategy: use the full set of words picked up by a rough filtering technique such as part-of-speech tagging.
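As an illustration of MI filtering, the sketch below scores each word by the mutual information between its presence in a document and a document's membership in one category, then keeps the k highest-scoring words. The abstract does not give the paper's exact MI formula, so this assumes the common binary formulation (document-level word presence versus category membership, natural log). The function names `mutual_information` and `top_k_features` are hypothetical. Increasing k step by step corresponds to gradually enlarging the input space.

```python
import math
from collections import Counter


def mutual_information(docs, labels, category):
    """Score each word by the expected mutual information between
    word presence and membership in `category` (assumed binary formulation)."""
    n = len(docs)
    df_in_cat = Counter()  # documents in the category that contain the word
    df_total = Counter()   # all documents that contain the word
    n_cat = 0              # number of documents in the category
    for words, label in zip(docs, labels):
        in_cat = label == category
        n_cat += in_cat
        for w in set(words):  # document frequency: count each word once per doc
            df_total[w] += 1
            if in_cat:
                df_in_cat[w] += 1

    scores = {}
    for w, df in df_total.items():
        n11 = df_in_cat[w]       # word present, in category
        n10 = df - n11           # word present, not in category
        n01 = n_cat - n11        # word absent, in category
        n00 = n - df - n01       # word absent, not in category
        mi = 0.0
        # Each cell contributes P(t,c) * log(P(t,c) / (P(t) * P(c))).
        for n_tc, n_t, n_c in ((n11, df, n_cat), (n10, df, n - n_cat),
                               (n01, n - df, n_cat), (n00, n - df, n - n_cat)):
            if n_tc > 0:  # empty cells contribute 0 by convention
                mi += (n_tc / n) * math.log(n * n_tc / (n_t * n_c))
        scores[w] = mi
    return scores


def top_k_features(scores, k):
    """Return the k highest-scoring words (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [w for w, _ in ranked[:k]]


# Toy usage (hypothetical data): two finance and two sports documents.
docs = [["stock", "market", "rose"], ["market", "fell"],
        ["team", "won", "game"], ["game", "lost"]]
labels = ["finance", "finance", "sports", "sports"]
scores = mutual_information(docs, labels, "finance")
print(top_k_features(scores, 2))
```

In this toy data, `market` and `game` each separate the two categories perfectly, so both receive the maximum score of ln 2 for a balanced binary category, while a word such as `stock`, which appears in only one finance document, scores lower.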
ISBN 978-0-262-51106-3
July 18-22, 1999, Orlando, Florida. Published by The AAAI Press, Menlo Park, California.