Published:
May 2004
Proceedings:
Proceedings of the Seventeenth International Florida Artificial Intelligence Research Society Conference (FLAIRS 2004)
Track:
All Papers
Abstract:
The wealth of information on the World Wide Web has created much interest in systems that integrate information from multiple sites. We describe a universal wrapper machine that can learn to extract information from the web given only a set of general rules describing the data domain. The machine cleanly separates site-independent and site-specific knowledge from the execution implementation. Site-independent knowledge is expressed in user-supplied domain rules, while site-specific knowledge is expressed in automatically generated context-free grammars that describe site structures. The two are combined by using the domain rules to semantically interpret the parse trees generated by the grammars. The resulting declarative wrapper specifications are easy for humans to understand and can be executed to perform information extraction. Once extracted, tuples can be queried by external agents using a high-level agent communication language.
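The separation the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation: the grammar here is hand-written rather than automatically generated, and every name in it (`DOMAIN_RULES`, `GRAMMAR`, `tokenize`, `parse`, `interpret`, `extract`) is hypothetical. A small site-specific grammar parses a page's token stream, and site-independent domain rules then assign meaning to the text leaves of each parse tree to produce tuples.

```python
import re

# Hypothetical site-independent domain rules: attribute -> pattern its values match.
DOMAIN_RULES = {
    "price": re.compile(r"^\$\d+(\.\d{2})?$"),
    "name": re.compile(r"^[A-Za-z][A-Za-z ]*$"),
}

# Hypothetical site-specific grammar (hand-written here for illustration).
# Keys are nonterminals; "TEXT" stands for any non-tag token.
GRAMMAR = {
    "row": ["<tr>", "cell", "cell", "</tr>"],
    "cell": ["<td>", "TEXT", "</td>"],
}


def tokenize(html):
    """Split markup into tags and text runs, dropping whitespace-only pieces."""
    return [t.strip() for t in re.findall(r"<[^>]+>|[^<]+", html) if t.strip()]


def parse(symbol, tokens, pos):
    """Return (tree, next_pos) if symbol matches at pos, otherwise None."""
    if symbol in GRAMMAR:
        children = []
        for part in GRAMMAR[symbol]:
            result = parse(part, tokens, pos)
            if result is None:
                return None
            child, pos = result
            children.append(child)
        return (symbol, children), pos
    if pos >= len(tokens):
        return None
    token = tokens[pos]
    if symbol == "TEXT" and not token.startswith("<"):
        return ("TEXT", token), pos + 1
    if symbol == token:
        return (symbol, token), pos + 1
    return None


def text_leaves(tree):
    """Collect the text values found at the leaves of a parse tree."""
    label, body = tree
    if label == "TEXT":
        return [body]
    if isinstance(body, list):
        return [leaf for child in body for leaf in text_leaves(child)]
    return []


def interpret(row_tree):
    """Use the domain rules to label each text leaf, yielding one tuple."""
    record = {}
    for value in text_leaves(row_tree):
        for attribute, pattern in DOMAIN_RULES.items():
            if attribute not in record and pattern.match(value):
                record[attribute] = value
                break
    return record


def extract(html):
    """Parse consecutive rows with the grammar, then interpret each parse tree."""
    tokens, pos, records = tokenize(html), 0, []
    while pos < len(tokens):
        result = parse("row", tokens, pos)
        if result is None:
            break
        tree, pos = result
        records.append(interpret(tree))
    return records
```

In this sketch the grammar only recovers structure and the domain rules alone decide which leaf is a name and which is a price, so supporting another site would mean swapping `GRAMMAR` while `DOMAIN_RULES` and the interpreter stay unchanged.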
FLAIRS
ISBN 978-1-57735-201-3
Published by The AAAI Press, Menlo Park, California.