Automatic Cross-Language Retrieval Using Latent Semantic Indexing

Susan T. Dumais, Todd A. Letsche, Michael L. Littman and Thomas K. Landauer

We describe a method for fully automated cross-language document retrieval in which no query translation is required. Queries in one language can retrieve documents in other languages as well as in the original language. This is accomplished by automatically constructing a multilingual semantic space using Latent Semantic Indexing (LSI). Strong test results for the cross-language LSI (CL-LSI) method are presented for a new French-English collection. We also provide evidence that this automatic method performs comparably to a retrieval method based on machine translation (MT-LSI), and we explore several practical training methods. By all available measures, CL-LSI performs quite well and is widely applicable.
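
As an illustration only, the following minimal sketch shows one way a multilingual semantic space of this kind might be built and queried. It assumes that training uses dual-language documents (each English text joined with its French translation), that the space comes from a truncated singular value decomposition of the raw term-by-document counts, and that monolingual queries and documents are folded into the space afterwards. The toy corpus, the function names (`build_space`, `fold_in`, `rank`), and the choice of k = 2 are hypothetical, and term weighting is omitted for brevity.

```python
import numpy as np

# Hypothetical toy training set: each entry pairs an English text
# with its French translation (an assumption made for this sketch).
TRAINING_PAIRS = [
    ("cat milk", "chat lait"),
    ("cat mouse", "chat souris"),
    ("dog meat", "chien viande"),
    ("dog bone", "chien os"),
]


def build_space(pairs, k):
    """Build a k-dimensional space from dual-language training documents."""
    # Each training document is the English text joined with its translation.
    docs = [(en + " " + fr).split() for en, fr in pairs]
    vocab = {t: i for i, t in enumerate(sorted({t for d in docs for t in d}))}
    # Term-by-document matrix of raw counts (no term weighting in this sketch).
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for term in doc:
            A[vocab[term], j] += 1.0
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return vocab, U[:, :k], s[:k]


def fold_in(text, vocab, U_k, s_k):
    """Project a monolingual text into the space: v = (q^T U_k) / s_k."""
    q = np.zeros(len(vocab))
    for term in text.split():
        if term in vocab:  # terms unseen in training are ignored
            q[vocab[term]] += 1.0
    return (q @ U_k) / s_k


def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def rank(query, documents, space):
    """Return (similarity, document) pairs, best match first."""
    vocab, U_k, s_k = space
    qv = fold_in(query, vocab, U_k, s_k)
    scored = [(cosine(qv, fold_in(d, vocab, U_k, s_k)), d) for d in documents]
    return sorted(scored, reverse=True)


space = build_space(TRAINING_PAIRS, k=2)
french_docs = ["chien os", "chat lait"]
results = rank("mouse", french_docs, space)  # English query, French documents
for score, doc in results:
    print(f"{score:.3f}  {doc}")
```

In this toy setting the English query "mouse" ranks the French document "chat lait" first, although the two share no words and no query translation takes place. The match arises only because the paired training documents place related English and French terms near one another in the reduced space.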


This page is copyrighted by AAAI. All rights reserved.