Debra S. Baddorf and Martha W. Evens
This paper describes our attempts to find information about phrases and their syntactic variants for inclusion in a computer lexicon. We started with a list of thirty phrases from the British Collins English Dictionary. After modifying a few phrases to accommodate American usage, we searched for occurrences in the Gutenberg corpus, the Wall Street Journal (1987, 1988, and 1989), and the Department of Energy technical abstracts from the ACL-DCI CD-ROM. Finding syntactic variants of phrases forced us to allow variations in word order, tense, and number, and to match the smaller words of a phrase loosely. We also needed flexible matching techniques to handle inserted or changed adjectives, adverbs, and prepositions.
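To make the idea of flexible matching concrete, the following is a minimal sketch in Python and not the procedure actually used in this work. It assumes that a phrase can be reduced to its content words, that a crude suffix stripper is an acceptable stand-in for real handling of tense and number (it will miss irregular forms such as "broke"), and that a small window of extra tokens is enough to absorb inserted adjectives and adverbs. The names SMALL_WORDS, crude_stem, find_variants, and max_gap are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical list of "small words" that may change or drop out in a variant.
SMALL_WORDS = {"a", "an", "the", "of", "in", "on", "at", "to", "for", "with", "by"}


def crude_stem(word):
    """Strip a few common endings; a rough stand-in for real morphology."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Split text into alphabetic tokens (apostrophes kept inside words)."""
    return re.findall(r"[A-Za-z']+", text)


def find_variants(phrase, sentence, max_gap=3):
    """Return spans of `sentence` that contain every content word of `phrase`.

    Content words may appear in any order and in inflected form; up to
    `max_gap` extra tokens (an assumed tolerance) may be inserted.
    """
    phrase_tokens = tokenize(phrase)
    targets = {crude_stem(w) for w in phrase_tokens if w.lower() not in SMALL_WORDS}
    tokens = tokenize(sentence)
    stems = [crude_stem(t) for t in tokens]
    window = len(phrase_tokens) + max_gap

    matches = []
    for start in range(len(tokens)):
        # A candidate span must begin on one of the phrase's content words.
        if stems[start] not in targets:
            continue
        seen = set()
        for offset, stem in enumerate(stems[start:start + window]):
            if stem in targets:
                seen.add(stem)
            if seen == targets:
                # Stop at the shortest span that covers all content words.
                matches.append(" ".join(tokens[start:start + offset + 1]))
                break
    return matches


print(find_variants("break the ice", "He was breaking the social ice"))
# ['breaking the social ice']
print(find_variants("break the ice", "The ice was breaking"))
# ['ice was breaking']
```

The first call shows a tense change plus an inserted adjective; the second shows a change in word order. A real system would need a proper morphological analyzer and a more careful treatment of the small words than simply ignoring them.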