Abstract:
It has become common in statistical studies of natural language data to use measures of lexical association, such as the information-theoretic measure of mutual information, to extract useful relationships between words. For example, [Hindle, 1990] uses an estimate of mutual information to calculate what nouns a verb can take as its subjects and objects, based on distributions found within a large corpus of naturally occurring text. Lexical association has its limits, however, since oftentimes either the data are insufficient to provide reliable word/word correspondences, or a task requires more abstraction than word/word correspondences permit. In this paper I present a generalization of lexical association techniques that addresses these limitations by facilitating statistical discovery of facts involving word classes rather than individual words. Although defining association measures over classes (as sets of words) is straightforward in theory, making direct use of such a definition is impractical because there are simply too many classes to consider. Rather than considering all possible classes, I propose constraining the set of possible word classes by using WordNet, a broad-coverage lexical/conceptual hierarchy.
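As a rough illustration of the contrast drawn above, the following sketch computes a pointwise mutual information score for a verb/object pair and an analogous score for a verb paired with a class of objects. It is a minimal sketch under stated assumptions, not the measure defined in the paper: the counts are invented toy data, the function names `pmi` and `class_pmi` are hypothetical, and the class `BEVERAGE` is a hand-written set standing in for a class that would be drawn from WordNet.

```python
import math
from collections import Counter

# Toy (verb, object) co-occurrence counts; invented for illustration only.
counts = Counter({
    ("drink", "wine"): 3,
    ("drink", "beer"): 2,
    ("drink", "water"): 1,
    ("read", "book"): 4,
    ("read", "paper"): 2,
})

# Hypothetical word class, standing in for a WordNet class.
BEVERAGE = {"wine", "beer", "water", "tea"}


def pmi(pair_counts, verb, noun):
    """Word-level association: log2(p(verb, noun) / (p(verb) * p(noun)))."""
    total = sum(pair_counts.values())
    p_joint = pair_counts[(verb, noun)] / total
    if p_joint == 0:
        # Unseen pair: the word-level estimate carries no positive evidence.
        return float("-inf")
    p_verb = sum(c for (v, _), c in pair_counts.items() if v == verb) / total
    p_noun = sum(c for (_, n), c in pair_counts.items() if n == noun) / total
    return math.log2(p_joint / (p_verb * p_noun))


def class_pmi(pair_counts, verb, word_class):
    """Class-level association: the same ratio, with counts pooled over the class."""
    total = sum(pair_counts.values())
    p_joint = sum(
        c for (v, n), c in pair_counts.items() if v == verb and n in word_class
    ) / total
    if p_joint == 0:
        return float("-inf")
    p_verb = sum(c for (v, _), c in pair_counts.items() if v == verb) / total
    p_class = sum(c for (_, n), c in pair_counts.items() if n in word_class) / total
    return math.log2(p_joint / (p_verb * p_class))


word_score = pmi(counts, "drink", "tea")             # pair never observed
class_score = class_pmi(counts, "drink", BEVERAGE)   # pooled over the class
```

In this toy data the pair ("drink", "tea") never occurs, so the word-level score is negative infinity, while the class-level score for "drink" with `BEVERAGE` is positive because the counts for the observed class members are pooled. The sketch scores only one given class; the problem of choosing which classes to consider, which the abstract addresses by restricting attention to WordNet, is not modeled here.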