Proceedings: Proceedings of the AAAI Conference on Artificial Intelligence, 16
Track: Natural Language and Information Retrieval
Abstract:
Information extraction systems usually require two dictionaries: a semantic lexicon containing domain-specific phrases and a dictionary of extraction patterns for the domain. We present a multi-level bootstrapping algorithm for building both the semantic lexicon and extraction patterns simultaneously. As input, our technique requires only unannotated training texts and a handful of "seed words" for a category. We use a "mutual bootstrapping" technique to alternately select the best extraction pattern for the category and bootstrap its extractions into the semantic lexicon, which is the basis for selecting the next extraction pattern. To make this approach more robust, we add a second level of bootstrapping ("meta-bootstrapping") that retains only the most reliable lexicon entries produced by mutual bootstrapping and then restarts the process. We evaluated this multi-level bootstrapping technique on a collection of corporate web pages and a corpus of terrorism news articles. The algorithm produced high-quality dictionaries for several semantic categories, and the dictionaries proved to be useful for extracting information from new web pages.
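The two levels of bootstrapping described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Several pieces are assumptions and all names and data are hypothetical: the pattern index (each extraction pattern mapped to the set of noun phrases it extracts from the unannotated texts), the RlogF-style pattern score `R * log2(F)`, the reliability score for new lexicon entries (the number of selected patterns that extract them), the default `keep=5`, and the function names and toy data.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical input format: each extraction pattern maps to the set of
# noun phrases it extracts from the unannotated training texts.
PatternIndex = Dict[str, Set[str]]


def score_pattern(extractions: Set[str], lexicon: Set[str]) -> float:
    """Assumed RlogF-style score R * log2(F): F is the number of extractions
    already in the lexicon and R = F / N is their share of all N extractions."""
    f = len(extractions & lexicon)
    if f == 0:
        return 0.0
    return (f / len(extractions)) * math.log2(f)


def mutual_bootstrap(
    index: PatternIndex, seeds: Set[str], iterations: int
) -> Tuple[List[str], Set[str]]:
    """Alternately pick the best unused pattern and add all of its
    extractions to the (temporary) semantic lexicon."""
    lexicon = set(seeds)
    selected: List[str] = []
    for _ in range(iterations):
        candidates = [p for p in sorted(index) if p not in selected]
        if not candidates:
            break
        # max() keeps the first of tied patterns, so ties break alphabetically.
        best = max(candidates, key=lambda p: score_pattern(index[p], lexicon))
        if score_pattern(index[best], lexicon) <= 0.0:
            break  # no remaining pattern is supported by the lexicon
        selected.append(best)
        lexicon |= index[best]
    return selected, lexicon


def meta_bootstrap(
    index: PatternIndex,
    seeds: Set[str],
    outer_iterations: int,
    inner_iterations: int,
    keep: int = 5,
) -> Set[str]:
    """Run mutual bootstrapping, keep only the `keep` most reliable new
    entries in the permanent lexicon, then restart from that lexicon."""
    permanent = set(seeds)
    for _ in range(outer_iterations):
        selected, temp_lexicon = mutual_bootstrap(index, permanent, inner_iterations)
        new_entries = temp_lexicon - permanent
        if not new_entries:
            break
        # Assumed reliability score: how many selected patterns extract the entry.
        support = {
            np_: sum(1 for p in selected if np_ in index[p]) for np_ in new_entries
        }
        # Highest support first; ties broken alphabetically for determinism.
        ranked = sorted(new_entries, key=lambda np_: (-support[np_], np_))
        permanent |= set(ranked[:keep])
    return permanent


# Toy data (hypothetical) for a LOCATION category.
INDEX: PatternIndex = {
    "headquartered in <x>": {"tokyo", "boston", "london", "the area"},
    "offices in <x>": {"tokyo", "paris", "london", "madrid"},
    "expanded into <x>": {"the area", "retail"},
}
SEEDS = {"tokyo", "boston", "paris"}

if __name__ == "__main__":
    patterns, temp = mutual_bootstrap(INDEX, SEEDS, iterations=3)
    print(patterns)
    print(sorted(temp))
    print(sorted(meta_bootstrap(INDEX, SEEDS, 1, 3, keep=1)))
```

With this toy data, mutual bootstrapping selects both location patterns but also absorbs the generic phrase "the area" into the temporary lexicon. With `keep=1`, one round of meta-bootstrapping retains only "london", the single new entry extracted by both selected patterns, which shows why re-filtering the lexicon before restarting makes the loop more robust.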
ISBN 978-0-262-51106-3
July 18-22, 1999, Orlando, Florida. Published by The AAAI Press, Menlo Park, California.