Intelligent Talk-and-Touch Interfaces Using Multi-Modal Semantic Grammars

Bruce Krulwich, Chad Burkey

Multi-modal interfaces have been proposed as a way to capture the ease and expressivity of natural communication. Interfaces of this sort allow users to communicate with computers through combinations of speech, gesture, touch, facial expression, and other channels. A critical problem in developing such an interface is integrating these different inputs (e.g., spoken sentences and pointing gestures) into a single interpretation. For example, when combining speech and gesture, a system must relate each gesture to the appropriate part of the sentence. We are investigating this problem as it arises in our talk-and-touch interfaces, which combine full-sentence speech with screen touching. Our solution, which has been implemented in two completed prototypes, uses multi-modal semantic grammars to match screen touches to speech utterances. Through this mechanism, our systems can easily support wide variation in the speech patterns used to indicate touch references. Additionally, they can ask the user specific, focused questions when they are unable to understand the input. They can also incorporate other semantic information, such as contextual references or references to referents of previous sentences, through this single unified approach. Our two prototypes appear effective in providing a straightforward and powerful interface for novice computer users.
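The abstract does not spell out the grammar formalism itself. As a rough illustration of the idea, the following Python sketch binds screen touches to the slots of a toy semantic grammar, resolves "it" to the previous sentence's referent, and returns a focused question when a slot cannot be filled. Everything here is hypothetical: the names `GRAMMAR`, `Touch`, `Context`, and `interpret`, the rule format, and especially the simplifying assumption that deictic words ("this", "there") consume touches in temporal order. It is not the authors' method.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class Touch:
    time: float   # hypothetical: seconds since the utterance began
    target: str   # hypothetical: screen object or location under the finger

@dataclass
class Context:
    last_referent: Optional[str] = None  # referent carried over from the previous sentence

# Hypothetical toy grammar: uppercase symbols are semantic slots, others are literal words.
GRAMMAR = {
    "move": ["move", "OBJ", "to", "LOC"],
    "delete": ["delete", "OBJ"],
}
DEICTIC = {"this", "that", "here", "there"}  # words assumed to refer to a touch
ANAPHORIC = {"it"}                           # words assumed to refer to the last referent
QUESTIONS = {"OBJ": "Which object do you mean?", "LOC": "Where should it go?"}
FALLBACK = "Sorry, what would you like to do?"

def interpret(words: list, touches: list, ctx: Context) -> Union[str, tuple]:
    """Return (command, slot bindings), or a focused question if a slot is unfilled."""
    rule = GRAMMAR.get(words[0]) if words else None
    if rule is None or len(words) > len(rule):
        return FALLBACK
    queue = sorted(touches, key=lambda t: t.time)  # assumption: touches pair with words in time order
    bindings = {}
    for k, sym in enumerate(rule):
        word = words[k] if k < len(words) else None
        if not sym.isupper():               # literal word must match exactly
            if word != sym:
                return FALLBACK
            continue
        if word in DEICTIC and queue:
            bindings[sym] = queue.pop(0).target      # deictic word: take the next touch
        elif word in ANAPHORIC and ctx.last_referent:
            bindings[sym] = ctx.last_referent        # "it": reuse the previous referent
        elif word is not None and word not in (DEICTIC | ANAPHORIC):
            bindings[sym] = word                     # object named directly in speech
        else:
            return QUESTIONS[sym]                    # ask only about the missing slot
    if "OBJ" in bindings:
        ctx.last_referent = bindings["OBJ"]
    return rule[0], bindings
```

For example, "move this to there" with touches on `file-3` and then `trash` yields `("move", {"OBJ": "file-3", "LOC": "trash"})`; a following "delete it" resolves to `file-3`; and the same move command with only one touch returns "Where should it go?" rather than failing outright, mirroring the focused-question behavior described above.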


This page is copyrighted by AAAI. All rights reserved.