Track:
Wrapper Learning
Abstract:
Recently, many systems have been built that automatically interact with Internet information resources. However, these resources are usually formatted for use by people; e.g., the relevant content is embedded in HTML pages. Wrappers are often used to extract a resource’s content, but hand-coding wrappers is tedious and error-prone. We advocate wrapper induction, a technique for automatically constructing wrappers. We have identified several wrapper classes that can be learned quickly (most sites require only a handful of examples, consuming a few CPU seconds of processing), yet are useful for handling numerous Internet resources (70% of surveyed sites can be handled by our techniques).
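
To make the idea of learning a wrapper from examples concrete, here is a minimal sketch in Python. It is not the algorithm or any of the wrapper classes referred to in the abstract, which are not specified here. It assumes a much simpler setting: a single attribute whose values are always enclosed by one fixed left delimiter and one fixed right delimiter. The function names (`induce_wrapper`, `apply_wrapper`) and the sample pages are hypothetical and chosen only for illustration.

```python
def common_prefix(strings):
    """Return the longest string that every input starts with."""
    shortest = min(strings, key=len)
    for i, ch in enumerate(shortest):
        if any(s[i] != ch for s in strings):
            return shortest[:i]
    return shortest


def common_suffix(strings):
    """Return the longest string that every input ends with."""
    return common_prefix([s[::-1] for s in strings])[::-1]


def induce_wrapper(examples):
    """Learn a (left, right) delimiter pair from labeled examples.

    Each example is (page, start, end), meaning page[start:end] is a
    value that the wrapper should extract. (Assumed labeling format.)
    """
    # Text immediately before each labeled value must end with the left delimiter.
    left = common_suffix([page[:start] for page, start, _ in examples])
    # Text immediately after each labeled value must begin with the right delimiter.
    right = common_prefix([page[end:] for page, _, end in examples])
    if not left or not right:
        raise ValueError("examples share no common delimiters")
    return left, right


def apply_wrapper(wrapper, page):
    """Extract every value enclosed by the learned delimiters."""
    left, right = wrapper
    values, pos = [], 0
    while True:
        start = page.find(left, pos)
        if start == -1:
            break
        start += len(left)
        end = page.find(right, start)
        if end == -1:
            break
        values.append(page[start:end])
        pos = end + len(right)
    return values


# Hypothetical training page with two hand-labeled country names.
train = "<html><b>Congo</b> <i>242</i><br><b>Spain</b> <i>34</i><br></html>"
examples = [(train, train.index(v), train.index(v) + len(v))
            for v in ("Congo", "Spain")]
wrapper = induce_wrapper(examples)

# Apply the learned wrapper to an unseen page with the same layout.
unseen = "<html><b>Egypt</b> <i>20</i><br><b>Belize</b> <i>501</i><br></html>"
print(wrapper, apply_wrapper(wrapper, unseen))
```

With these two labeled examples the sketch learns the delimiters `><b>` and `</b> <i>` and extracts `Egypt` and `Belize` from the unseen page. Taking the longest common context is a deliberately naive choice: if both training values had happened to be followed by the same characters (for instance, two codes starting with the same digit), the learned right delimiter would overfit and fail on new pages, which is one reason a practical learner needs to validate candidate delimiters against more examples.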