Training Stochastic Grammars from Unlabelled Text Corpora

Julian Kupiec and John Maxwell

The paper describes practical aspects of applying the "Hidden Markov" approach to train the parameters of regular and context-free stochastic grammars. The approach enables grammars to be trained from unlabelled text corpora, which gives flexibility in the choice of syntactic categories and text domain. Part-of-speech tagging and parsing are discussed as applications. Linguistic considerations can be used to develop constrained grammars that provide appropriate higher-order context for disambiguation. Unconstrained grammars, in contrast, offer the opportunity to capture patterns that a specific hand-designed grammar does not cover. Experimental results are discussed for both alternatives.
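The abstract does not give the re-estimation formulas. For the regular-grammar case, the "Hidden Markov" approach conventionally means Baum-Welch (forward-backward) re-estimation, an EM procedure that learns transition and emission probabilities from unlabelled sequences. Its context-free analogue, the inside-outside algorithm, is not shown here. The following is a minimal textbook sketch under stated assumptions, not the authors' implementation. It assumes a first-order HMM over discrete symbols, uses scaled forward and backward passes to avoid underflow, and runs on a hypothetical toy corpus of word-class ids. All function names (`forward`, `backward`, `baum_welch_step`, `random_stochastic`) are illustrative.

```python
import math
import random

# Illustrative sketch only: generic Baum-Welch for a discrete first-order HMM,
# not the method or code from the paper.


def forward(obs, pi, A, B):
    """Scaled forward pass: alphas[t][i] = P(state i at t | obs[:t+1]);
    scales[t] = P(obs[t] | obs[:t]), so sum(log(scales)) = log P(obs)."""
    n = len(pi)
    alphas, scales, prev = [], [], None
    for t, o in enumerate(obs):
        if t == 0:
            a = [pi[i] * B[i][o] for i in range(n)]
        else:
            a = [sum(prev[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
        c = sum(a)
        prev = [x / c for x in a]
        alphas.append(prev)
        scales.append(c)
    return alphas, scales


def backward(obs, A, B, scales):
    """Scaled backward pass, reusing the scaling factors from forward()."""
    n, T = len(A), len(obs)
    betas = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        nxt, o = betas[t + 1], obs[t + 1]
        betas[t] = [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n)) / scales[t + 1]
                    for i in range(n)]
    return betas


def normalize(row):
    s = sum(row)
    return [x / s for x in row]


def baum_welch_step(corpus, pi, A, B):
    """One EM re-estimation over unlabelled sequences. Returns the new
    (pi, A, B) and the log-likelihood under the OLD parameters."""
    n, m = len(pi), len(B[0])
    pi_acc = [0.0] * n
    A_acc = [[0.0] * n for _ in range(n)]
    B_acc = [[0.0] * m for _ in range(n)]
    loglik = 0.0
    for obs in corpus:
        alphas, scales = forward(obs, pi, A, B)
        betas = backward(obs, A, B, scales)
        loglik += sum(math.log(c) for c in scales)
        for t, o in enumerate(obs):
            for i in range(n):
                gamma = alphas[t][i] * betas[t][i]  # P(state i at t | obs)
                if t == 0:
                    pi_acc[i] += gamma
                B_acc[i][o] += gamma
            if t + 1 < len(obs):
                o_next = obs[t + 1]
                for i in range(n):
                    for j in range(n):
                        # expected count of the transition i -> j at time t
                        A_acc[i][j] += (alphas[t][i] * A[i][j] * B[j][o_next]
                                        * betas[t + 1][j] / scales[t + 1])
    return (normalize(pi_acc), [normalize(r) for r in A_acc],
            [normalize(r) for r in B_acc], loglik)


def random_stochastic(rows, cols, rng):
    # strictly positive random rows, each summing to 1
    return [normalize([rng.random() + 0.1 for _ in range(cols)]) for _ in range(rows)]


# Hypothetical toy corpus: each "sentence" is a list of word-class ids 0..2.
corpus = [[0, 1, 2, 0, 1], [2, 1, 0], [0, 0, 1, 2], [1, 2, 0, 1, 2]]
rng = random.Random(0)
pi = random_stochastic(1, 2, rng)[0]   # 2 hidden states (assumed)
A = random_stochastic(2, 2, rng)
B = random_stochastic(2, 3, rng)
history = []
for _ in range(20):
    pi, A, B, ll = baum_welch_step(corpus, pi, A, B)
    history.append(ll)
print([round(x, 3) for x in history[:3]], round(history[-1], 3))
```

Each call reports the log-likelihood of the parameters it started from, so `history` should be non-decreasing, as EM guarantees. In a tagging setting, the hidden states would play the role of syntactic categories, and decoding would typically use the Viterbi algorithm, which is also not shown.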

This page is copyrighted by AAAI. All rights reserved.