Rong Hu, Weizhu Chen, Jian Hu, Yansheng Lu, Zheng Chen, Qiang Yang
Query translation for Cross-Lingual Information Retrieval (CLIR) has received increasing attention from the research community. Previous work has mainly relied on machine translation systems, bilingual dictionaries, or web corpora to translate queries. However, most of these approaches require either expensive language resources or complex language models, and they cannot provide timely translations for new queries. In this paper, we propose a novel solution that automatically acquires query translation pairs from the knowledge hidden in click-through data, which records the URLs a user clicks after submitting a query to a search engine. Our solution consists of two stages: identifying bilingual URL pair patterns in the click-through data, and matching query translation pairs based on user click behavior. Experimental results on a real dataset show that our method not only generates existing query translation pairs with high precision, but also generates many timely query translation pairs that previous methods could not obtain. A comparative study between our system and two commercial online translation systems shows the advantage of the proposed method.
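The two stages can be illustrated with a minimal Python sketch. It is not the authors' implementation: the hard-coded language tokens (`en`, `zh`), the function names `url_pattern` and `candidate_pairs`, the co-click score, and the example URLs are all assumptions made for illustration. Stage one is approximated by treating two URLs as a bilingual pair when they differ only in a single language token; stage two pairs the queries whose clicks landed on the two sides of such a URL pair.

```python
import re
from collections import defaultdict

# Hypothetical language markers, hard-coded for this sketch only.
LANG_TOKEN = re.compile(r"(?<![a-z])(en|zh)(?![a-z])")


def url_pattern(url):
    """Return (pattern, lang) if the URL holds exactly one language token, else None."""
    tokens = LANG_TOKEN.findall(url)
    if len(tokens) != 1:
        return None
    return LANG_TOKEN.sub("{lang}", url), tokens[0]


def candidate_pairs(click_log, src="en", tgt="zh"):
    """Score candidate query translation pairs from (query, clicked_url) records.

    Two queries become a candidate pair when their clicked URLs share the same
    pattern but carry different language tokens. The score is a simple
    co-click count (an assumption, not a measure taken from the paper).
    """
    # pattern -> language -> query -> number of clicks
    by_pattern = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for query, url in click_log:
        parsed = url_pattern(url.lower())
        if parsed is not None:
            pattern, lang = parsed
            by_pattern[pattern][lang][query] += 1

    scores = defaultdict(int)
    for langs in by_pattern.values():
        for src_query, src_clicks in langs.get(src, {}).items():
            for tgt_query, tgt_clicks in langs.get(tgt, {}).items():
                scores[(src_query, tgt_query)] += min(src_clicks, tgt_clicks)
    return dict(scores)


# Made-up click-through records for illustration.
click_log = [
    ("microsoft office", "http://example.com/en/office"),
    ("微软办公软件", "http://example.com/zh/office"),
    ("weather", "http://example.org/weather"),
]
pairs = candidate_pairs(click_log)
```

Here `pairs` holds one candidate, `("microsoft office", "微软办公软件")`, because only those two queries led to URLs that share a pattern and differ in the language token. A real system would presumably need to discover the URL patterns from the data and filter noisy clicks rather than rely on a fixed token list.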
Subjects: 10. Knowledge Acquisition; 12. Machine Learning and Discovery
Submitted: Apr 14, 2008