Our long-term objective is to develop robots that engage in natural language-mediated cooperative tasks with humans. To support this goal, we are developing an amodal representation called a grounded situation model (GSM), as well as a modular architecture in which the GSM resides in a centrally located module. We present an implemented system that enables a range of conversational and assistive behaviors by a manipulator robot. The robot updates beliefs about its physical environment and body based on a mixture of linguistic, visual and proprioceptive evidence. It can answer basic questions about the present or past and also perform actions through verbal interaction. Most importantly, a novel contribution of our approach is the robot’s ability to seamlessly integrate both language- and sensor-derived information about the situation: for example, the system can acquire parts of situations either by seeing them or by “imagining” them through descriptions given by the user, such as “There is a red ball at the left”. These situations can later be used to create mental imagery, thus enabling bidirectional translation between perception and language. This work constitutes a step towards robots that use situated natural language grounded in perception and action.