Content-related metadata plays an important role in the development of intelligent web applications. One of the most established ways of providing content-related metadata is the assignment of web pages to content categories. We describe the Spectacle system for classifying individual web pages on the basis of their syntactic structure. This classification requires the specification of classification rules that associate common page structures with predefined classes. In this paper, we propose an approach for the automatic acquisition of these classification rules using techniques from inductive logic programming, and we describe experiments in applying the approach to an existing web-based information system.
Published Date: May 2002
Registration: ISBN 978-1-57735-141-2
Copyright: Published by The AAAI Press, Menlo Park, California