Information Extraction (IE) systems typically rely on extraction patterns encoding domain-specific knowledge. When matched against natural language texts, these patterns recognize with high accuracy information relevant to the extraction task. Adapting an IE system to a new extraction scenario entails devising a new collection of extraction patterns - a time-consuming and expensive process. To overcome this obstacle, we have implemented in CICERO, our IE system, a pattern acquisition mechanism that combines lexicosemantic knowledge available from WordNet with syntactic information collected from training corpora. The open-domain nature of the knowledge encoded in WordNet grants portability of our approach across multiple extraction domains.
Published Date: May 2001
Registration: ISBN 978-1-57735-133-7
Copyright: Published by The AAAI Press, Menlo Park, California.