Unsupervised Context Sensitive Language Acquisition from Large, Untagged Corpora

Authors

Zach Solan

Eytan Ruppin

David Horn

and Shimon Edelman

Proceedings:

Language Learning: An Interdisciplinary Perspective

Issue:

Papers from the 2004 AAAI Spring Symposium

Track:

Contents

Abstract:

A central tenet of generative linguistics is that extensive innate knowledge of grammar is essential to explain the acquisition of language from positive-only data (Chomsky, 1986). We explore an alternative hypothesis, according to which syntax is an abstraction that emerges from exposure to language (Hopper, 1998), coexisting with the corpus data within the same representational mechanism. Far from parsimonious, the representation we introduce allows partial overlap of linguistic patterns or constructions (Croft, 2001). The incremental acquisition of patterns is driven both by structural similarities and by statistical information inherent in the data, so that frequent strings of similar composition come to be represented by the same pattern. The degree of abstraction of a pattern varies: it may be high, as in the case of a frame with several slots, each occupied by a member of an equivalence class associated with it, or low, as in the extreme case of idioms or formulaic language snippets, where there is no abstraction at all (Langacker, 1987; Wray, 2002). The acquired patterns fully represent the original data and, crucially, enable structure-sensitive generalization in the production and assimilation of unseen examples.
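To make the notion of "a frame with a slot occupied by a member of an equivalence class" concrete, here is a minimal toy sketch under our own simplifying assumptions. It is not the authors' method: it only considers single-slot frames, replaces the statistical criteria mentioned above with a hypothetical `min_support` threshold on the number of distinct fillers, and generalizes by substituting class members anywhere they occur. The helper names `find_slot_patterns` and `generate` and the three-sentence corpus are hypothetical illustrations.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import product


def find_slot_patterns(corpus: list[str], min_support: int = 2) -> dict:
    """Hypothetical helper: find one-slot frames shared by several sentences.

    A frame is a (prefix, suffix) pair of word tuples; the slot between them
    collects every word seen in that position. Frames with fewer than
    `min_support` distinct fillers are discarded (an assumed, crude stand-in
    for the statistical criteria the abstract mentions).
    """
    frames = defaultdict(set)
    for sentence in corpus:
        words = tuple(sentence.split())
        for i in range(len(words)):
            frames[(words[:i], words[i + 1:])].add(words[i])
    return {frame: fillers for frame, fillers in frames.items()
            if len(fillers) >= min_support}


def generate(corpus: list[str], classes: list[set[str]]) -> set[str]:
    """Hypothetical helper: generalize by swapping words for class-mates.

    Each corpus word that belongs to an equivalence class may be replaced by
    any member of that class; every resulting combination is returned.
    """
    word_to_class: dict[str, set[str]] = {}
    for cls in classes:
        for word in cls:
            word_to_class.setdefault(word, set()).update(cls)
    out = set()
    for sentence in corpus:
        # Words outside every class can only be replaced by themselves.
        options = [sorted(word_to_class.get(w, {w})) for w in sentence.split()]
        for combo in product(*options):
            out.add(" ".join(combo))
    return out


corpus = ["the cat is eating", "the dog is eating", "the cat is sleeping"]
patterns = find_slot_patterns(corpus)
for (prefix, suffix), fillers in patterns.items():
    print(" ".join(prefix), "[", " | ".join(sorted(fillers)), "]", " ".join(suffix))

# Sentences licensed by the learned classes but absent from the corpus.
novel = generate(corpus, list(patterns.values())) - set(corpus)
print(novel)  # prints {'the dog is sleeping'}
```

The two frames found here, "the [cat | dog] is eating" and "the cat is [eating | sleeping]", together license the unseen sentence "the dog is sleeping". Because this toy substitutes class members regardless of surrounding context, it would overgeneralize badly on real corpora; it is meant only to illustrate the slot-and-equivalence-class representation, not the context-sensitive acquisition procedure the paper describes.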
