Comparing Real-Real, Simulated-Simulated, and Simulated-Real Spoken Dialogue Corpora

Hua Ai, Diane Litman

User simulation is used to generate large corpora from which reinforcement learning can automatically learn the best policy for spoken dialogue systems. Although this approach is becoming increasingly popular, the differences between simulated and real corpora are not well studied. We build two simulation models that interact with an intelligent tutoring system. Both models are trained separately on two different real corpora. Using several evaluation measures proposed in previous research, we compare our two simulated corpora with each other, the two original real corpora with each other, and the simulated corpora with the real ones. We then examine the differentiating power of these measures. Our results show that although these simple statistical measures can distinguish real corpora from simulated ones, they do not allow us to draw conclusions about how realistic the simulated corpora are, since even two real corpora can differ substantially when evaluated on the same measures.

Subjects: 13. Natural Language Processing; 18. Speech Understanding

Submitted: May 11, 2006
