Mining On-line Sources for Definition Knowledge

Horacio Saggion and Robert Gaizauskas

Finding definitions in huge text collections is a challenging problem, not only because of the many ways in which definitions can be conveyed in natural language texts but also because the definiendum (i.e., the thing to be defined) does not, on its own, have enough discriminative power to allow selection of definition-bearing passages from the collection. We have developed a method that uses already available external sources to gather knowledge about the definiendum before trying to define it using the given text collection. This knowledge consists of lists of relevant secondary terms that frequently co-occur with the definiendum in definition-bearing passages or “definiens”. The external sources used to gather secondary terms are an on-line encyclopedia, a lexical database and the Web. These secondary terms, together with the definiendum, are used as an information retrieval query to select passages from the text collection. Further linguistic analysis is then carried out on each retrieved passage to extract definition strings, using a number of criteria including the presence of main and secondary terms or of definition patterns.
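The pipeline outlined above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the stopword list, the frequency-based ranking and the single copular pattern ("X is a ...") are all assumptions made for illustration, and the external sources are stood in for by a plain list of strings.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "in", "and", "or",
             "to", "that", "which", "as", "by", "for", "with", "on"}


def secondary_terms(definiendum, external_passages, top_n=5):
    """Rank words that co-occur with the definiendum in external passages."""
    target = definiendum.lower()
    target_words = set(target.split())
    counts = Counter()
    for passage in external_passages:
        text = passage.lower()
        if target not in text:
            continue  # only passages mentioning the definiendum contribute
        for token in re.findall(r"[a-z]+", text):
            if token not in STOPWORDS and token not in target_words:
                counts[token] += 1
    return [term for term, _ in counts.most_common(top_n)]


def build_query(definiendum, terms):
    """Combine the main term and the secondary terms into one query."""
    return [definiendum.lower()] + list(terms)


def extract_definitions(definiendum, passages, terms):
    """Keep sentences that mention the definiendum and either match a
    simple definition pattern or contain a secondary term."""
    pattern = re.compile(
        rf"\b{re.escape(definiendum)}\s+(?:is|are|was|were)\s+(?:a|an|the)\b",
        re.IGNORECASE,
    )
    results = []
    for passage in passages:
        for sentence in re.split(r"(?<=[.!?])\s+", passage):
            lowered = sentence.lower()
            if definiendum.lower() not in lowered:
                continue
            has_pattern = bool(pattern.search(sentence))
            has_secondary = any(
                re.search(rf"\b{re.escape(t)}\b", lowered) for t in terms
            )
            if has_pattern or has_secondary:
                results.append(sentence.strip())
    return results
```

In this sketch, `build_query` would feed whatever retrieval engine indexes the target collection, and `extract_definitions` would then run over the passages it returns. A realistic system would need a much richer pattern set and linguistic analysis than the single regular expression used here.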

