Proceedings:
Intelligent Integration and Use of Text, Image, Video, and Audio Corpora
Volume
Issue:
Papers from the 1997 AAAI Spring Symposium
Abstract:
Obtaining sufficient labelled training data is a persistent difficulty for speech recognition research. Although well-transcribed data is expensive to produce, closed-captioned television broadcasts provide a constant stream of challenging speech data accompanied by poor transcriptions. We describe a reliable unsupervised method for identifying accurately transcribed sections of these broadcasts, and show how these segments can be used to train a recognition system. Starting from acoustic models trained on the Wall Street Journal database, a single iteration of our training method reduced the word error rate on an independent broadcast television news test set from 62.2% to 59.5%.
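The abstract does not say how accurately transcribed sections are identified. As a purely illustrative sketch, one plausible criterion is to keep only the stretches where the caption text and a recognizer's output agree over several consecutive words. The function name `agreeing_segments`, the `min_run` threshold, and the example sentences below are all assumptions made for illustration, not the authors' method or data.

```python
from difflib import SequenceMatcher


def agreeing_segments(caption_words, hypothesis_words, min_run=3):
    """Return runs of caption words that the recognizer output reproduces exactly.

    Assumption (not from the paper): agreement over at least `min_run`
    consecutive words is treated as evidence that the caption is accurate there.
    """
    matcher = SequenceMatcher(a=caption_words, b=hypothesis_words, autojunk=False)
    segments = []
    for block in matcher.get_matching_blocks():
        # Each block is a maximal run of identical words in both sequences.
        if block.size >= min_run:
            segments.append(caption_words[block.a:block.a + block.size])
    return segments


# Hypothetical caption text and recognizer hypothesis for one utterance.
caption = "the president said today that the economy is growing".split()
hypothesis = "the president said to day that the economy was growing".split()
print(agreeing_segments(caption, hypothesis))
```

In a setup like this, the selected runs, together with their audio, would serve as additional training material, and the retrained models could then be used to repeat the selection step.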