WordNet and Distributional Analysis: A Class-based Approach to Lexical Discovery

Philip Resnik

It has become common in statistical studies of natural language data to use measures of lexical association, such as the information-theoretic measure of mutual information, to extract useful relationships between words. For example, [Hindle, 1990] uses an estimate of mutual information to determine which nouns a verb takes as its subjects and objects, based on distributions found in a large corpus of naturally occurring text. Lexical association has its limits, however: often the data are too sparse to provide reliable word/word correspondences, or a task requires more abstraction than word/word correspondences permit. In this paper I present a generalization of lexical association techniques that addresses these limitations by facilitating the statistical discovery of facts involving word classes rather than individual words. Although defining association measures over classes (as sets of words) is straightforward in theory, making direct use of such a definition is impractical because there are simply too many classes to consider: a vocabulary of n words has 2^n possible subsets. Rather than considering all possible classes, I propose constraining the set of possible word classes by using WordNet, a broad-coverage lexical/conceptual hierarchy.
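To make the idea concrete, here is a minimal sketch of class-based association, assuming pointwise mutual information between a verb v and a class c is estimated as log2 P(v, c) / (P(v) P(c)), where a class's probability pools the counts of every noun at or below it. The verb/object pairs and the small HIERARCHY dictionary are hypothetical stand-ins for corpus data and for WordNet, and the functions words_in, association, and best_class are illustrative names, not the paper's estimator. In particular, the sketch ignores nouns with several senses, which a real WordNet-based method must handle.

```python
import math

# Hypothetical toy data: (verb, object-noun) pairs as might be extracted from a parsed corpus.
PAIRS = [
    ("drink", "wine"), ("drink", "beer"), ("drink", "water"),
    ("drink", "coffee"), ("eat", "bread"), ("eat", "apple"),
    ("see", "wine"), ("see", "bread"), ("see", "dog"), ("see", "car"),
]

# Hypothetical toy hierarchy standing in for WordNet: class -> direct children.
HIERARCHY = {
    "entity": ["substance", "object"],
    "substance": ["beverage", "food"],
    "beverage": ["wine", "beer", "water", "coffee"],
    "food": ["bread", "apple"],
    "object": ["dog", "car"],
}


def words_in(cls):
    """Return the set of words at or below a class (a bare word is its own singleton class)."""
    if cls not in HIERARCHY:
        return {cls}
    result = set()
    for child in HIERARCHY[cls]:
        result |= words_in(child)
    return result


def association(pairs, verb, cls):
    """Pointwise mutual information, in bits, between a verb and a class of nouns."""
    total = len(pairs)
    members = words_in(cls)
    joint = sum(1 for v, n in pairs if v == verb and n in members) / total
    p_verb = sum(1 for v, _ in pairs if v == verb) / total
    p_cls = sum(1 for _, n in pairs if n in members) / total
    if joint == 0:
        return float("-inf")  # never observed together
    return math.log2(joint / (p_verb * p_cls))


def best_class(pairs, verb):
    """Score only the classes the hierarchy licenses, not all 2^n subsets of nouns."""
    return max(HIERARCHY, key=lambda cls: association(pairs, verb, cls))


if __name__ == "__main__":
    for verb in ("drink", "eat"):
        cls = best_class(PAIRS, verb)
        print(verb, cls, round(association(PAIRS, verb, cls), 3))
```

Restricting the search to the five classes in the toy hierarchy is what keeps the computation tractable; with these counts the sketch picks "beverage" for drink and "food" for eat, because pooling counts across class members rewards the class whose members co-occur with the verb more often than chance.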

Copyright AAAI. All rights reserved.