Extracting Partial Structures from HTML Documents

Authors

Hiroshi Sakamoto

Yoshitsugu Murakami

Hiroki Arimura

Setsuo Arikawa

Kyushu University

Japan

Published:

May 2001

Proceedings:

Proceedings of the Fourteenth International Florida Artificial Intelligence Research Society Conference (FLAIRS 2001)

Track:

All Papers

Abstract:

A new wrapper model for extracting text data from HTML documents is introduced. In this model, an HTML file is treated as an ordered labeled tree. The learning algorithm takes as input a sequence of pairs, each consisting of an HTML tree and a set of its nodes. The nodes indicate the labels to extract from the HTML tree. The goal of the learning algorithm is to output a wrapper that extracts exactly those labels from the HTML trees.
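
To make the tree view concrete, the following is a minimal sketch, not the authors' algorithm. It assumes well-formed HTML in which every start tag has a matching end tag, and it assumes, purely for illustration, that a wrapper can be represented as a path of tag labels from the root. The names `Node`, `TreeBuilder`, `parse_html`, and `extract` are hypothetical; the learning step described in the abstract is not shown.

```python
from html.parser import HTMLParser


class Node:
    """A node of an ordered labeled tree: a tag label, ordered children, text."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = []  # list order preserves document order
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds the tree from start tags, end tags, and text events."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Assumption: input is well formed, so each end tag closes the
        # current node. Unclosed void tags such as <br> would break this.
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def parse_html(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def extract(node, path, prefix=()):
    """Return, in document order, the text of every node whose label path
    from the root equals `path` (an illustrative stand-in for a wrapper)."""
    results = []
    for child in node.children:
        child_path = prefix + (child.label,)
        if child_path == path and child.text:
            results.append(child.text)
        results.extend(extract(child, path, child_path))
    return results


page = (
    "<html><body><ul>"
    "<li><b>Apple</b><i>100</i></li>"
    "<li><b>Pear</b><i>80</i></li>"
    "</ul></body></html>"
)
tree = parse_html(page)
names = extract(tree, ("html", "body", "ul", "li", "b"))
prices = extract(tree, ("html", "body", "ul", "li", "i"))
```

Here `names` collects the text under every `b` node reached by the given label path, and `prices` does the same for the `i` nodes. A path of labels is only one simple way to pick out nodes; the wrapper model in the paper may be considerably more expressive.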

ISBN 978-1-57735-133-7

Published by The AAAI Press, Menlo Park, California.
