AAAI Publications, 2010 AAAI Fall Symposium Series

Eye Spy: Improving Vision through Dialog
Adam Vogel, Karthik Raghunathan, Dan Jurafsky

Last modified: 2010-11-03


Despite efforts to build robust vision systems, robots in new environments inevitably encounter new objects. Traditional supervised learning requires gathering and annotating sample images in the environment, usually in the form of bounding boxes or segmentations. This training interface takes some experience to use correctly and is quite tedious. We report work in progress on a robotic dialog system that learns names and attributes of objects through spoken interaction with a human teacher. The robot and human play a variant of the children's games "I Spy" and "20 Questions". In our game, the human places objects of interest in front of the robot, then picks one of them in her head. The robot asks a series of natural language questions about the target object, with the goal of pointing at the correct object while asking as few questions as possible. The questions range from attribute questions such as color ("Is it red?") to category questions ("Is it a cup?"). The robot selects which question to ask using an information gain criterion, choosing the question that minimizes the expected entropy of the visual model given the answer.
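The entropy-based question selection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the belief distribution over objects, the per-object answer probabilities, and the example data are all hypothetical. For each candidate yes/no question, we compute the expected entropy of the posterior belief over objects after hearing the answer, and ask the question with the lowest expected entropy (equivalently, the highest information gain).

```python
import math

def entropy(probs):
    """Shannon entropy (bits) of a distribution over candidate objects."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_entropy(belief, answer_prob):
    """Expected posterior entropy after asking one yes/no question.

    belief: dict mapping object -> P(object is the target)
    answer_prob: dict mapping object -> P(answer is "yes" | object is the target)
    """
    p_yes = sum(belief[o] * answer_prob[o] for o in belief)
    p_no = 1.0 - p_yes
    h = 0.0
    for p_ans, yes in ((p_yes, True), (p_no, False)):
        if p_ans <= 0:
            continue
        # Bayesian update of the belief given this answer.
        posterior = [
            belief[o] * (answer_prob[o] if yes else 1.0 - answer_prob[o]) / p_ans
            for o in belief
        ]
        h += p_ans * entropy(posterior)
    return h

def best_question(belief, questions):
    """Pick the question minimizing expected posterior entropy.

    questions: dict mapping question string -> answer_prob dict.
    """
    return min(questions, key=lambda q: expected_entropy(belief, questions[q]))
```

For example, with a belief concentrated on a cup, a sharp category question ("Is it a cup?") yields lower expected entropy than a noisier attribute question, so `best_question` selects it.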
