A number of researchers have studied methods for realizing autonomous robots capable of perception, planning, and execution in the real world. In most systems, the computer directs attention serially to each feature in the environment by moving a pair of cameras like human eyeballs. We consider it much easier, however, to build robot vision with an architecture different from the human one. Robot vision often needs to execute different tasks simultaneously using different eye motions. When a robot navigates in a real urban environment, it must follow the road, detect moving obstacles and traffic signs, and view and memorize interesting patterns along the route. We humans execute these tasks by moving our gaze and shifting our attention; a robot, however, can accomplish them through the cooperation of independent agents, each with its own eyes and computing power. We built a system with four cameras that move independently of each other. Each agent analyzes its image data and controls its own eye motion, for example fixating on a moving target, fixating on a static target for vision-guided navigation, or monitoring a wide area to find obstacles. The perceptual load is thus distributed over multiple agents, and the information they provide is integrated to determine the robot's action.
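The agent architecture described above can be sketched roughly as follows. This is a minimal illustration, not the system's actual implementation: all class names, behavior labels, and the urgency-based integration rule are assumptions introduced here for clarity.

```python
# Hypothetical sketch of the multi-agent vision architecture: several
# agents, each owning a camera and one eye-motion behavior, report
# observations that an integrator combines into a single robot action.
# All names and the integration rule are illustrative, not from the paper.

from dataclasses import dataclass

@dataclass
class Observation:
    agent: str        # which agent produced this report
    kind: str         # e.g. "obstacle" or "landmark"
    urgency: float    # higher values demand action sooner

class VisionAgent:
    """One camera plus one eye-motion behavior (tracking, fixation, or scanning)."""
    def __init__(self, name, behavior):
        self.name = name
        self.behavior = behavior  # "track_moving", "fixate_static", or "scan_wide"

    def sense(self, scene):
        """Analyze this agent's view and report what its behavior looks for."""
        target = {"track_moving": "obstacle",
                  "fixate_static": "landmark",
                  "scan_wide": "obstacle"}[self.behavior]
        if target in scene:
            return Observation(self.name, target, scene[target])
        return None

def integrate(observations):
    """Choose the robot action from the most urgent agent report."""
    reports = [o for o in observations if o is not None]
    if not reports:
        return "follow_road"
    top = max(reports, key=lambda o: o.urgency)
    return "avoid_obstacle" if top.kind == "obstacle" else "steer_to_landmark"

# Four independently moving cameras, as in the system described.
agents = [VisionAgent("cam1", "track_moving"),
          VisionAgent("cam2", "fixate_static"),
          VisionAgent("cam3", "scan_wide"),
          VisionAgent("cam4", "scan_wide")]

scene = {"obstacle": 0.9, "landmark": 0.4}  # toy per-frame scene summary
action = integrate(a.sense(scene) for a in agents)
```

In a real system each agent would of course run its own perception loop and servo its own camera; the point of the sketch is only that attention is divided across agents rather than time-shared by one moving gaze.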