Learning Semantic Descriptions of Web Information Sources

Mark J Carman, Craig A. Knoblock

The Internet is full of information sources providing various types of data from weather forecasts to travel deals. These sources can be accessed via web-forms, Web Services or RSS feeds. In order to make automated use of these sources, one needs to first model them semantically. Writing semantic descriptions for web sources is both tedious and error prone. In this paper we investigate the problem of automatically generating such models. We introduce a framework for learning Datalog definitions for web sources, in which we actively invoke sources and compare the data they produce with that of known sources of information. We perform an inductive search through the space of plausible source definitions in order to learn the best possible semantic model for each new source. The paper includes an empirical evaluation demonstrating the effectiveness of our approach on real-world web sources.

Subjects: 12. Machine Learning and Discovery


Submitted: Oct 16, 2006

This page is copyrighted by AAAI. All rights reserved. Your use of this site constitutes acceptance of all of AAAI's terms and conditions and privacy policy.