Abstract:
As speech recognizers become more robust, they are widely accepted as an essential component of human-computer interaction. State-of-the-art speaker-independent speech recognizers achieve word recognition error rates below 10%. To achieve even higher and more robust recognition performance, multi-modal speech recognition techniques that combine video and audio information can be used. Speech reading, the video portion of a bimodal speech recognizer, introduces not only the additional computational cost of video processing, but also challenges in the design of the integrated audio-video recognizer.