This paper applies transductive Latent Semantic Indexing (LSI) to text classification. In addition to relying on labeled training data, we improve classification accuracy by incorporating the set of test examples into the classification process. Rather than performing LSI’s singular value decomposition (SVD) solely on the training data, we use an expanded term-by-document matrix that includes both the labeled data and any available test examples. We report the performance of LSI on data sets with and without the inclusion of the test examples, and we show that tailoring the SVD to the test examples can be even more useful than adding more training data. The test set can also help counteract unrelated data that may be present in the original corpus.
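As a rough illustration of the idea described above, the following Python sketch builds one term-by-document matrix from the training and test documents together, runs a truncated SVD on it, and labels each test document with the label of its most similar training document in the reduced space. This is only a sketch under stated assumptions: the function names, the toy documents, the use of raw term counts, the value of k, and the nearest-neighbor cosine rule are all illustrative choices that the abstract does not specify.

```python
import numpy as np


def term_document_matrix(docs, vocab):
    """Rows are terms, columns are documents; entries are raw term counts."""
    index = {term: i for i, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for token in doc.lower().split():
            matrix[index[token], j] += 1.0
    return matrix


def transductive_lsi_classify(train_docs, train_labels, test_docs, k=2):
    """Hypothetical helper: label test docs after an SVD over train + test."""
    # Expanded corpus: labeled documents followed by unlabeled test documents.
    all_docs = list(train_docs) + list(test_docs)
    vocab = sorted({tok for doc in all_docs for tok in doc.lower().split()})
    matrix = term_document_matrix(all_docs, vocab)

    # SVD of the expanded matrix; keep only the k largest singular values.
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(k, len(s))
    doc_vecs = (vt[:k, :] * s[:k, None]).T  # one k-dimensional row per document

    # Normalize rows so that dot products are cosine similarities.
    norms = np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    doc_vecs = doc_vecs / np.where(norms == 0, 1.0, norms)

    n_train = len(train_docs)
    train_vecs, test_vecs = doc_vecs[:n_train], doc_vecs[n_train:]

    # Assumed decision rule: copy the label of the nearest training document.
    nearest = (test_vecs @ train_vecs.T).argmax(axis=1)
    return [train_labels[i] for i in nearest]


# Toy data invented for this sketch (not taken from the paper).
train_docs = ["cat dog pet", "stock market trade"]
train_labels = ["animals", "finance"]
test_docs = ["dog pet kitten", "market trade bond", "kitten puppy"]

predicted = transductive_lsi_classify(train_docs, train_labels, test_docs, k=2)
print(predicted)
```

In this toy corpus the test document "kitten puppy" shares no terms with any training document. It can still be placed near the "animals" training document because another test document, "dog pet kitten", links its vocabulary to the training vocabulary inside the shared decomposition.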
Published Date: May 2004
Registration: ISBN 978-1-57735-201-3
Copyright: Published by The AAAI Press, Menlo Park, California.