Feature Selection in SVM Text Categorization

Authors

Hirotoshi Taira, NTT Communication Science Labs, and Masahiko Haruno, ATR Human Information Processing Research Labs

Proceedings:

Proceedings of the AAAI Conference on Artificial Intelligence, 16

Track:

Natural Language and Information Retrieval

Abstract:

This paper investigates the effect of prior feature selection in Support Vector Machine (SVM) text categorization. The input space was gradually increased by mutual information (MI) filtering and part-of-speech (POS) filtering, which determine the portion of words that are appropriate for SVM learning from the information-theoretic and linguistic perspectives, respectively. The results common to both filtering methods are that 1) the optimal number of features differed completely among categories, and 2) the average performance across categories was best when all of the words were used. In addition, a comparison of the two experiments shows that 3) POS filtering consistently outperforms MI filtering, which indicates that SVMs cannot find irrelevant parts of speech. These results suggest a simple strategy: use all of the words that pass a rough filtering technique such as part-of-speech tagging.
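As an illustration of the MI filtering idea mentioned in the abstract, the following is a minimal sketch of ranking words by their mutual information with a single binary category. The abstract does not give the exact formula or preprocessing the authors used, so the standard presence/absence definition of mutual information is assumed here; the function names `mutual_information` and `top_k_terms` and the toy documents are hypothetical, and the SVM training step is not shown.

```python
import math
from collections import Counter


def mutual_information(docs, labels, term):
    """MI in bits between presence of `term` and a binary category label.

    Assumed definition: sum over (term present?, in category?) of
    P(t, c) * log2(P(t, c) / (P(t) * P(c))), estimated from document counts.
    """
    n = len(docs)
    counts = Counter((term in doc, bool(label)) for doc, label in zip(docs, labels))
    mi = 0.0
    for t in (True, False):
        for c in (True, False):
            n_tc = counts[(t, c)]
            if n_tc == 0:
                continue  # a zero-count cell contributes nothing
            n_t = counts[(t, True)] + counts[(t, False)]
            n_c = counts[(True, c)] + counts[(False, c)]
            mi += (n_tc / n) * math.log2(n_tc * n / (n_t * n_c))
    return mi


def top_k_terms(docs, labels, k):
    """Return the k vocabulary words with the highest MI for this category."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda w: mutual_information(docs, labels, w), reverse=True)
    return ranked[:k]


# Toy data (hypothetical): each document is a set of tokens, label 1 = in category.
docs = [
    {"wheat", "price", "rose"},
    {"wheat", "export", "the"},
    {"bank", "rate", "the"},
    {"bank", "price", "fell"},
]
labels = [1, 1, 0, 0]

selected = top_k_terms(docs, labels, 2)
print(selected)
```

Increasing `k` step by step and retraining a classifier on the selected words would mirror the gradual growth of the input space described above; under the abstract's findings, the best `k` would be expected to vary by category.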

AAAI

ISBN 978-0-262-51106-3

July 18-22, 1999, Orlando, Florida. Published by The AAAI Press, Menlo Park, California.
