Beginning to Understand Unstructured, Ungrammatical Text: An Information Integration Approach

Authors

Matthew Michelson

Craig A. Knoblock

Proceedings:

Machine Reading

Volume

Issue:

Papers from the 2007 AAAI Spring Symposium

Track:

Contents

Abstract:

As information agents become pervasive, they will need to read and understand the vast amount of information on the World Wide Web. One valuable source of such information is the unstructured, ungrammatical text that appears in data sources such as online auctions and internet classifieds. One way to begin to understand this text is to determine which entities it references. This can be framed as the semantic annotation problem, where the goal is to extract the attributes embedded within the text and then annotate the text with these extracted attributes. If enough attributes can be extracted, then the entity referenced in the text can be determined. For example, given a classified ad for a used car, identifying the make, model, and year within the post identifies the car for sale. However, information extraction is difficult because the text does not contain reliable structural or grammatical clues. In this paper we present an unsupervised approach to semantically annotating such unstructured and ungrammatical text, with the intention that this will help with the problem of machine understanding on the Web. Furthermore, we define an architecture that allows for better understanding over time. We present experiments showing that our annotation approach is competitive with the state of the art, which uses supervised machine learning, even though our technique is fully unsupervised.
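
To make the used-car example concrete, the following is a minimal sketch of attribute annotation, not the method described in the paper. It assumes a small, hypothetical table of known cars (`REFERENCE_SET`) and uses plain token overlap to choose the best-matching record; the function name `annotate`, the year pattern, and the sample post are all illustrative assumptions.

```python
import re

# Hypothetical table of known entities (an assumption, not data from the paper).
REFERENCE_SET = [
    {"make": "Honda", "model": "Civic"},
    {"make": "Honda", "model": "Accord"},
    {"make": "Toyota", "model": "Camry"},
]

# Assumed convention: a four-digit year between 1950 and 2049.
YEAR_PATTERN = re.compile(r"\b(19[5-9]\d|20[0-4]\d)\b")


def annotate(post):
    """Return the attributes found for a post, or an empty dict if none match."""
    # Lowercase alphanumeric tokens; punctuation and symbols are discarded.
    tokens = set(re.findall(r"[a-z0-9]+", post.lower()))

    # Pick the record sharing the most attribute values with the post.
    # Ties keep the earliest record, which is a deliberate simplification.
    best, best_score = None, 0
    for record in REFERENCE_SET:
        score = sum(1 for value in record.values() if value.lower() in tokens)
        if score > best_score:
            best, best_score = record, score

    annotation = dict(best) if best else {}

    # The year is read directly from the text rather than from the table.
    year = YEAR_PATTERN.search(post)
    if year:
        annotation["year"] = int(year.group(1))
    return annotation


if __name__ == "__main__":
    print(annotate("2001 civic 4 sale, runs grt!! $3500 obo"))
```

In this sketch the post never mentions the make, yet the annotation includes it, because matching the model against the table supplies the remaining attribute. Misspellings, abbreviations, and ambiguous matches are deliberately left unhandled here.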
