Searching Recorded Speech Based on the Temporal Extent of Topic Labels

Douglas W. Oard and Anton Leuski

Recorded speech poses unusual challenges for the design of interactive end-user search systems. Automatic speech recognition is sufficiently accurate to support the automated components of interactive search systems in some applications, but finding useful recordings among those nominated by the system can be difficult because listening to audio is time-consuming and because recognition errors and speech disfluencies make it difficult to mitigate that effect by skimming automatic transcripts. Support for rapid browsing based on supervised learning for automatic classification has shown promise, however, and a segment-then-label framework has emerged as the dominant paradigm for applying that technique to news broadcasts. This paper argues for a more general framework, which we call an activation matrix, that provides a flexible representation for the mapping between labels and time. Three approaches to the generation of activation matrices are briefly described; the main focus of the paper is the use of activation matrices to support search and selection in interactive systems.

This page is copyrighted by AAAI. All rights reserved.