Anti-Unification Based Learning of T-Wrappers for Information Extraction

Bernd Thomas

We present a method for learning wrappers for multi-slot extraction from semi-structured documents. The method learns to construct wrappers automatically from positive examples, which consist of text tuples occurring in the document. These wrappers (T-wrappers) are based on a pattern language for information extraction that builds on feature structure unification. The technique is an inductive machine learning method based on a modified version of least general generalization (td-Anti-Unification) for a subset of feature structures (tokens).
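
To illustrate the idea of least general generalization on tokens, the following is a minimal sketch and not the td-Anti-Unification procedure itself, whose modifications are not described in this abstract. It assumes that a token can be modeled as a flat map from feature names to atomic values, in which case the least general generalization of two tokens keeps exactly the feature-value pairs they share. The feature names (`type`, `tag`, `text`) and the helper names `lgg_token`, `lgg_sequence`, and `matches` are hypothetical and chosen only for illustration.

```python
def lgg_token(a, b):
    """Generalize two flat feature structures (dicts of atomic values).

    Assumption: tokens are flat attribute-value maps, so the least general
    generalization is the set of feature-value pairs common to both.
    """
    return {k: v for k, v in a.items() if k in b and b[k] == v}


def lgg_sequence(xs, ys):
    """Generalize two equal-length token sequences position by position."""
    if len(xs) != len(ys):
        raise ValueError("this sketch only handles sequences of equal length")
    return [lgg_token(a, b) for a, b in zip(xs, ys)]


def matches(pattern, token):
    """A pattern covers a token if the token agrees on every pattern feature."""
    return all(token.get(k) == v for k, v in pattern.items())


# Two hypothetical tokens taken from the context of positive examples.
t1 = {"type": "html", "tag": "b", "text": "<b>"}
t2 = {"type": "html", "tag": "i", "text": "<i>"}

general = lgg_token(t1, t2)  # only the shared feature "type" survives
```

In this sketch the generalized token covers both examples it was built from, which is the property an inductive learner relies on when it generalizes from positive examples only.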
