Hidden Markov models (HMMs) are a powerful probabilistic tool for modeling time series data, and have been applied with success to many language-related tasks such as part of speech tagging, speech recognition, text segmentation and topic detection. This paper describes the application of HMMs to another language related task--information extraction--the problem of locating textual sub-segments that answer a particular information need. In our work, the HMM state transition probabilities and word emission probabilities are learned from labeled training data. As in many machine learning problems, however, the lack of sufficient labeled training data hinders the reliability of the model. The key contribution of this paper is the use of a statistical technique called "shrinkage" that significantly improves parameter estimation of the HMM emission probabilities in the facr of sparse training data. In experiments on seminar announcements and Reuters acquisitions articles, shrinkage is shown to reduce error by up to 40%, and the resulting HMM outperforms a state-of-the-art rule-learning system.