In this paper, we propose a human-robot interface based on speech and vision. The interface allows a person to interact with the robot as it explores. During exploration, the robot learns about its surroundings by building a 2D occupancy grid map and a database of more complex visual landmarks and their locations. Such knowledge enables the robot to act intelligently. However, it is difficult for a robot to decide autonomously which regions are the most interesting to visit. With the proposed interface, a human operator guides the robot during this exploration phase. Because the robot’s on-board computational power is insufficient, we use a distributed architecture to realize the system.