Proceedings:
Proceedings of the AAAI Conference on Artificial Intelligence, 16
Volume / Issue:
Proceedings of the AAAI Conference on Artificial Intelligence, 16
Track:
Technical Papers
Abstract:
Recent work on Internet information integration assumes a library of wrappers, specialized information extraction procedures. Maintaining wrappers is difficult, because the formatting regularities on which they rely often change. The wrapper verification problem is to determine whether a wrapper is correct. Standard regression testing approaches are inappropriate, because both the formatting regularities and a site’s underlying content may change. We introduce RAPTURE, a fully implemented, domain-independent verification algorithm. RAPTURE uses well-motivated heuristics to compute the similarity between a wrapper’s expected and observed output. Experiments with 27 actual Internet sites show a substantial performance improvement over standard regression testing.
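
To illustrate why exact-match regression testing fails when a site's content changes, and how a similarity-based check can tolerate such changes, here is a minimal Python sketch. It is not RAPTURE's actual procedure: the abstract does not specify the heuristics, so the three features (item count, mean length, digit fraction), the ratio-based score, and the 0.8 threshold are all assumptions chosen for illustration, as are the function names and sample data.

```python
from statistics import mean

def features(values):
    """Summarize extracted strings with simple, content-independent features.

    The feature set is a hypothetical stand-in, not taken from the paper.
    """
    total_chars = sum(len(v) for v in values)
    digits = sum(ch.isdigit() for v in values for ch in v)
    return {
        "count": float(len(values)),
        "mean_length": mean(len(v) for v in values) if values else 0.0,
        "digit_fraction": digits / total_chars if total_chars else 0.0,
    }

def similarity(expected, observed):
    """Average, over features, of the ratio smaller/larger (1.0 means identical)."""
    fe, fo = features(expected), features(observed)
    scores = []
    for name in fe:
        a, b = fe[name], fo[name]
        larger = max(a, b)
        scores.append(1.0 if larger == 0 else min(a, b) / larger)
    return mean(scores)

def regression_test(expected, observed):
    """Standard regression testing: output must match exactly."""
    return expected == observed

def verify(expected, observed, threshold=0.8):
    """Accept the wrapper if its output looks similar enough (threshold is assumed)."""
    return similarity(expected, observed) >= threshold

# Hypothetical wrapper outputs for a price field.
expected = ["$12.99", "$7.50", "$103.00"]        # output recorded when the wrapper was known to work
content_changed = ["$15.25", "$8.10", "$99.95"]  # new prices, same formatting: wrapper still correct
format_changed = ["<b>Price</b>", "<td>", ""]    # site layout changed: wrapper now extracts junk
```

With this data, `regression_test(expected, content_changed)` rejects a wrapper that is still correct, because the prices differ. `verify` accepts it, since the new output has the same shape, and rejects `format_changed`, whose digit fraction drops to zero.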
ISBN 978-0-262-51106-3
July 18-22, 1999, Orlando, Florida. Published by The AAAI Press, Menlo Park, California.