Evaluating Speech-Driven Web Retrieval in the Third NTCIR Workshop

Atsushi Fujii and Katunobu Itou

Speech recognition has recently become a practical technology for real-world applications. To promote research and development in speech-driven retrieval, which allows users to retrieve information with spoken queries, we organized the speech-driven retrieval subtask in the NTCIR-3 Web retrieval task. Search topics for the Web retrieval main task were dictated by ten speakers and recorded as collections of spoken queries. We used those queries to evaluate the performance of our speech-driven retrieval system, which integrates speech recognition and text retrieval modules. The text retrieval module, which is based on a probabilistic model, indexed only the textual content of documents (Web pages) and did not use their HTML tags or hyperlink information. Experimental results showed that a) using the target documents for language modeling and b) increasing the vocabulary size in speech recognition were both effective in improving system performance.

