How to Compile a Bilingual Collocational Lexicon Automatically

Frank Smadja

In this paper, we propose a technique for constructing bilingual collocation dictionaries fully automatically. The technique first identifies a set of collocations in one language and then attempts to translate them, using the Hansards as training data. To do this, we use Xtract, a collocation compiler [Smadja 92], to identify collocations, and mutual information statistics to translate the collocations into the other language. The algorithm we describe is an iterative method that builds the translation of a given collocation by adding words one by one. This technique allows a collocation containing n words to be translated into a collocation of p words, where p may differ from n. The paper describes the proposed algorithm and shows how it is applied in the translation of the following three collocations: "senior citizen," "Madam Speaker," and "election campaign."
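
To make the idea of building a translation "by adding words one by one" concrete, here is a minimal Python sketch. It is not the algorithm from the paper: the abstract does not give the scoring details, so the use of pointwise mutual information over sentence-level co-occurrence, the `min_count` cutoff, the rule of accepting a word when the score does not decrease, and all names (`pmi`, `translate_collocation`, `toy_corpus`) are assumptions made for illustration. The tiny corpus is invented toy data, not Hansard text.

```python
import math


def pmi(n_xy, n_x, n_y, n):
    """Pointwise mutual information from sentence-level co-occurrence counts."""
    if n_xy == 0:
        return float("-inf")
    return math.log2((n_xy * n) / (n_x * n_y))


def translate_collocation(source, aligned_pairs, min_count=2):
    """Greedily grow a set of target words associated with `source`.

    Assumption: a collocation "occurs" in a sentence when all of its
    words appear in it, regardless of order or adjacency.
    """
    src = set(source)
    pairs = [(set(s), set(t)) for s, t in aligned_pairs]
    n = len(pairs)
    src_hits = [src <= s for s, _ in pairs]
    n_src = sum(src_hits)
    if n_src == 0:
        return []
    # Candidate words: anything seen opposite the source collocation.
    vocab = set().union(*(t for (_, t), hit in zip(pairs, src_hits) if hit))

    chosen, best = [], float("-inf")
    while True:
        candidate, cand_score = None, best
        for w in sorted(vocab - set(chosen)):
            trial = set(chosen) | {w}
            tgt_hits = [trial <= t for _, t in pairs]
            n_both = sum(a and b for a, b in zip(src_hits, tgt_hits))
            if n_both < min_count:
                continue  # too rare to trust (hypothetical cutoff)
            score = pmi(n_both, n_src, sum(tgt_hits), n)
            if score >= cand_score:  # ties favor the longer translation
                candidate, cand_score = w, score
        if candidate is None:
            return chosen  # no word can be added without lowering the score
        chosen.append(candidate)
        best = cand_score


# Invented toy data: (English tokens, French tokens) per aligned sentence.
toy_corpus = [
    ("the election campaign started".split(), "la campagne electorale a commence".split()),
    ("an election campaign is expensive".split(), "une campagne electorale coute cher".split()),
    ("the election was fair".split(), "l election etait juste".split()),
    ("the campaign against smoking".split(), "la campagne contre le tabac".split()),
    ("the speaker started".split(), "le president a commence".split()),
]

result = translate_collocation(["election", "campaign"], toy_corpus)
print(result)
```

On this toy corpus the sketch selects "electorale" first and then "campagne". Words are returned in the order they were added, not in target-language word order. Nothing ties the number of selected words to the number of source words, which is how a collocation of n words can map to one of p words in a scheme of this kind.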

This page is copyrighted by AAAI. All rights reserved.