Abstract:
We present a method for learning wrappers for multi-slot extraction from semi-structured documents. The method learns to construct wrappers automatically from positive examples, which consist of text tuples occurring in the document. These wrappers (T-wrappers) are based on a pattern language for information extraction that relies on feature-structure unification. The technique is an inductive machine learning method based on a modified version of least general generalization (td-Anti-Unification) for a subset of feature structures (tokens).
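To make the core idea of least general generalization over tokens more concrete, here is a minimal Python sketch. It implements plain (classical) anti-unification, not the authors' td-Anti-Unification, under simplifying assumptions: tokens are modeled as flat feature structures (dictionaries from hypothetical feature names such as `string` and `type` to atomic values), and the examples being generalized are token sequences of equal length. The names `Var`, `lgg_tokens`, `lgg_sequences` and `matches` are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# Assumption: a token is a flat feature structure, attribute -> atomic value.
Token = Dict[str, str]


@dataclass(frozen=True)
class Var:
    """A generalization variable standing for 'any value'."""
    name: str


Pattern = Dict[str, Union[str, Var]]


def lgg_tokens(a: Token, b: Token, table: Dict[Tuple[str, str], Var]) -> Pattern:
    """Least general generalization of two flat tokens.

    Features missing from either token are dropped, equal values are kept,
    and differing values are replaced by a variable. The same pair of
    disagreeing values always maps to the same variable (classical LGG).
    """
    result: Pattern = {}
    for feat in sorted(a.keys() & b.keys()):
        va, vb = a[feat], b[feat]
        if va == vb:
            result[feat] = va
        else:
            if (va, vb) not in table:
                table[(va, vb)] = Var(f"X{len(table) + 1}")
            result[feat] = table[(va, vb)]
    return result


def lgg_sequences(s: List[Token], t: List[Token]) -> List[Pattern]:
    """Generalize two equal-length token sequences position by position."""
    if len(s) != len(t):
        raise ValueError("this sketch only generalizes equal-length sequences")
    table: Dict[Tuple[str, str], Var] = {}  # shared, so variables are reused
    return [lgg_tokens(x, y, table) for x, y in zip(s, t)]


def matches(pattern: Pattern, token: Token) -> bool:
    """True if the token agrees on every constant feature of the pattern.

    Variables act as wildcards here; consistent variable bindings are not checked.
    """
    return all(
        feat in token and (isinstance(v, Var) or token[feat] == v)
        for feat, v in pattern.items()
    )


if __name__ == "__main__":
    ex1 = [{"string": "Price", "type": "word"}, {"string": ":", "type": "punct"},
           {"string": "12", "type": "number"}]
    ex2 = [{"string": "Price", "type": "word"}, {"string": ":", "type": "punct"},
           {"string": "7", "type": "number"}]
    print(lgg_sequences(ex1, ex2))
    # [{'string': 'Price', 'type': 'word'}, {'string': ':', 'type': 'punct'},
    #  {'string': Var(name='X1'), 'type': 'number'}]
```

Reusing one variable per disagreeing value pair is what makes the result the least general generalization in the classical sense. A full wrapper learner would also have to handle sequences of different lengths, nested feature structures, and the marking of extraction slots for the text tuples, none of which this sketch attempts.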