Published:
May 2001
Proceedings:
Proceedings of the Fourteenth International Florida Artificial Intelligence Research Society Conference (FLAIRS 2001)
Track:
All Papers
Abstract:
A new wrapper model for extracting text data from HTML documents is introduced. In this model, an HTML file is treated as an ordered labeled tree. The learning algorithm takes as input a sequence of pairs, each consisting of an HTML tree and a set of nodes. The nodes indicate the labels to extract from the HTML tree. The goal of the learning algorithm is to output a wrapper that extracts exactly those labels from the HTML trees.
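
To make the setting concrete, the following is a minimal Python sketch of the kind of input and output the abstract describes. It is not the algorithm from the paper: the `Node` class, the representation of a wrapper as a set of root-to-node label paths, and the function names `learn_wrapper`, `extract`, and `is_consistent` are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set, Tuple


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so nodes can go in sets
class Node:
    """One node of an ordered labeled tree (hypothetical stand-in for an HTML element)."""
    label: str
    children: List["Node"] = field(default_factory=list)
    text: str = ""


Path = Tuple[str, ...]


def paths(root: Node) -> Iterator[Tuple[Path, Node]]:
    """Yield (label path from the root, node) for every node, in document order."""
    stack = [((root.label,), root)]
    while stack:
        path, node = stack.pop()
        yield path, node
        # Push children reversed so the leftmost child is visited first.
        for child in reversed(node.children):
            stack.append((path + (child.label,), child))


def learn_wrapper(examples: List[Tuple[Node, Set[Node]]]) -> Set[Path]:
    """Assumed toy learner: collect the label paths of all marked nodes."""
    wrapper: Set[Path] = set()
    for tree, targets in examples:
        wrapper |= {p for p, n in paths(tree) if n in targets}
    return wrapper


def extract(wrapper: Set[Path], tree: Node) -> List[str]:
    """Return the text of every node whose label path is in the wrapper."""
    return [n.text for p, n in paths(tree) if p in wrapper]


def is_consistent(wrapper: Set[Path], examples: List[Tuple[Node, Set[Node]]]) -> bool:
    """Check that the wrapper selects exactly the marked nodes in each example."""
    return all(
        {n for p, n in paths(tree) if p in wrapper} == set(targets)
        for tree, targets in examples
    )


# One training pair: an HTML-like tree and the set of nodes to extract.
li_a, li_b = Node("li", text="a"), Node("li", text="b")
tree = Node("html", [Node("body", [Node("h1", text="Title"), Node("ul", [li_a, li_b])])])
examples = [(tree, {li_a, li_b})]

wrapper = learn_wrapper(examples)
extracted = extract(wrapper, tree)
consistent = is_consistent(wrapper, examples)
```

In this sketch a wrapper is considered acceptable only when `is_consistent` holds, which mirrors the abstract's requirement that the output wrapper extract exactly the indicated labels. A plain label path cannot tell apart siblings that share a label, so this toy learner can fail that check on inputs where only some of those siblings are marked.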
ISBN 978-1-57735-133-7
Published by The AAAI Press, Menlo Park, California.