Published:
May 2004
Proceedings:
Proceedings of the Seventeenth International Florida Artificial Intelligence Research Society Conference (FLAIRS 2004)
Volume
Issue:
Proceedings of the Seventeenth International Florida Artificial Intelligence Research Society Conference (FLAIRS 2004)
Track:
All Papers
Downloads:
Abstract:
This paper presents work that uses Transductive Latent Semantic Indexing (LSI) for text classification. In addition to relying on labeled training data, we improve classification accuracy by incorporating the set of test examples in the classification process. Rather than performing LSI’s singular value decomposition (SVD) process solely on the training data, we instead use an expanded term-by-document matrix that includes both the labeled data as well as any available test examples. We report the performance of LSI on data sets both with and without the inclusion of the test examples, and we show that tailoring the SVD process to the test examples can be even more useful than adding additional training data. The test set can be a useful tool to combat the possible inclusion of unrelated data in the original corpus.
FLAIRS
Proceedings of the Seventeenth International Florida Artificial Intelligence Research Society Conference (FLAIRS 2004)
ISBN 978-1-57735-201-3
Published by The AAAI Press, Menlo Park, California.