Abstract:
We report on an audio retrieval system which lets Internet users efficiently access a large text and audio corpus containing the transcripts and recordings of the proceedings of the United States House of Representatives. The audio has been temporally aligned to corresponding text transcripts (which are manually generated by the U.S. Government) using an automatic method based on speaker identification. This system is an example of using digital storage and structured media to make a large multimedia archive easily accessible.