Populating the Semantic Web

Authors

Kristina Lerman

Cenk Gazen

Steven Minton

and Craig Knoblock


Abstract:

The vision of the Semantic Web is that a vast store of online information “meaningful to computers will unleash a revolution of new possibilities.” Unfortunately, the vast majority of information on the Web is formatted to be easily read by human users, not computer applications. To make the vision of the Semantic Web a reality, tools for automatically annotating Web content with semantic labels will be required. We describe the ADEL system, which automatically extracts records from Web sites and semantically labels the fields. The system exploits similarities in the layout of Web pages to learn the grammar that generated these pages. It then uses this grammar to extract structured records from these Web pages. The ADEL system also exploits the fact that sites in the same domain will provide the same or similar data. By collecting labeled examples of data during the training stage, we are able to learn structural descriptions of data fields and later use these descriptions to semantically label new data fields. We show that in a Used Car shopping domain, ADEL achieves a precision of 64% and a recall of 89% on extracting and labeling data columns.
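
The idea of learning structural descriptions from labeled examples and reusing them to label new fields can be illustrated with a small sketch. This is not ADEL's actual algorithm, which the abstract does not specify; it is a simplified stand-in that assumes a field's structure can be summarized as a sequence of coarse token types. The function names (`token_pattern`, `learn_descriptions`, `label_column`), the token types, and the sample values are all hypothetical.

```python
import re
from collections import Counter, defaultdict


def token_pattern(value):
    """Map a field value to a coarse sequence of token types (an assumed representation)."""
    types = []
    for tok in re.findall(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]", value):
        if tok.isdigit():
            types.append("NUM")
        elif tok.isalpha():
            types.append("CAPS" if tok[0].isupper() else "LOWER")
        else:
            types.append(tok)  # punctuation is kept literally
    return tuple(types)


def learn_descriptions(labeled_examples):
    """Training stage: count the token patterns observed for each semantic label."""
    descriptions = defaultdict(Counter)
    for label, value in labeled_examples:
        descriptions[label][token_pattern(value)] += 1
    return descriptions


def label_column(values, descriptions):
    """Labeling stage: choose the label whose learned patterns match the most values."""
    scores = {
        label: sum(1 for v in values if token_pattern(v) in patterns)
        for label, patterns in descriptions.items()
    }
    if not scores:
        return None
    best = max(scores, key=scores.get)
    # Leave the column unlabeled when no learned pattern matches any value.
    return best if scores[best] > 0 else None


# Hypothetical training data from a used-car domain.
training = [
    ("price", "$12,500"), ("price", "$8,999"),
    ("year", "1999"), ("year", "2003"),
    ("make", "Honda"), ("make", "Toyota"),
]
descriptions = learn_descriptions(training)
predicted = label_column(["$4,200", "$15,750"], descriptions)
```

Under these assumptions, a column of unseen prices shares the token pattern of the training prices and so receives the `price` label, while a column whose values match no learned pattern is left unlabeled.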
