Understanding and following verbal instructions is a crucial human ability, allowing quick transfer of knowledge about acting in the world. This paper presents the MARCO modular architecture for reasoning about and following route instructions. The Syntactic Parser parses route instruction texts. The Content Framer interprets the surface meaning of each sentence. The Instruction Modeler combines information across phrases and sentences: imperatively, as a representation of what to do under which circumstances, and declaratively, as a model of what is stated and implied about the world. The Executor reactively interleaves action and perception, executing the instructions in the context of the environment. The Robot Controller adapts low-level actions and local view descriptions to the particular follower’s motor and sensory capabilities. When the follower can interact with a director, the follower can plan dialog moves that increase its understanding and likelihood of reaching the destination.
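To make the data flow between the modules concrete, the following is a minimal sketch of the pipeline under strong simplifying assumptions. It is not the MARCO implementation: the function names, the `Frame` record, the keyword-based framing rules, and the toy grid-world `GridRobot` are all hypothetical stand-ins chosen for illustration, and the dialog component is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

@dataclass
class Frame:
    """Surface meaning of one sentence (hypothetical, heavily simplified)."""
    action: str                      # "turn" or "travel"
    direction: Optional[str] = None  # "left" or "right", for turns
    until: Optional[str] = None      # landmark that ends a travel action

def syntactic_parser(text: str) -> List[List[str]]:
    """Stand-in parser: split text into lower-cased token lists, one per sentence."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return [s.lower().split() for s in sentences]

def content_framer(tokens: List[str]) -> Frame:
    """Stand-in framer: map one tokenized sentence to a Frame by keyword rules."""
    if tokens[0] == "turn" and len(tokens) > 1:
        return Frame(action="turn", direction=tokens[1])
    if tokens[0] in ("go", "walk"):
        until = tokens[-1] if "until" in tokens else None
        return Frame(action="travel", until=until)
    raise ValueError("cannot frame sentence: " + " ".join(tokens))

def instruction_modeler(frames: List[Frame]) -> Tuple[List[Frame], Set[str]]:
    """Combine frames into an imperative plan (what to do, and until when)
    and a declarative model (landmarks the text implies exist)."""
    plan = list(frames)
    world_model = {f.until for f in frames if f.until is not None}
    return plan, world_model

class GridRobot:
    """Toy stand-in for the Robot Controller: a follower on a 2-D grid."""
    HEADINGS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # north, east, south, west

    def __init__(self, landmarks: Dict[Tuple[int, int], str]):
        self.position = (0, 0)
        self.heading = 0  # index into HEADINGS; starts facing north
        self.landmarks = landmarks

    def view(self) -> Optional[str]:
        """Local view description: the landmark at the current cell, if any."""
        return self.landmarks.get(self.position)

    def turn(self, direction: str) -> None:
        if direction not in ("left", "right"):
            raise ValueError("unknown turn direction: " + direction)
        self.heading = (self.heading + (1 if direction == "right" else -1)) % 4

    def forward(self) -> None:
        dx, dy = self.HEADINGS[self.heading]
        self.position = (self.position[0] + dx, self.position[1] + dy)

def executor(plan: List[Frame], robot: GridRobot, max_moves: int = 20) -> Tuple[int, int]:
    """Interleave perception and action: check the view before every move."""
    for step in plan:
        if step.action == "turn":
            robot.turn(step.direction)
        elif step.until is None:
            robot.forward()  # unconditioned travel: take a single step
        else:
            moves = 0
            while robot.view() != step.until and moves < max_moves:
                robot.forward()
                moves += 1
    return robot.position

def follow(text: str, robot: GridRobot) -> Tuple[Tuple[int, int], Set[str]]:
    """Chain the stages: parse, frame, model, then execute."""
    frames = [content_framer(tokens) for tokens in syntactic_parser(text)]
    plan, world_model = instruction_modeler(frames)
    return executor(plan, robot), world_model
```

For example, with a chair placed three cells west of the start, `follow("Turn left. Walk forward until you see the chair.", GridRobot({(-3, 0): "chair"}))` stops the toy follower at `(-3, 0)` and records `{"chair"}` as the landmark the instructions imply. Keeping the plan and the world model as separate outputs of the modeling stage mirrors the imperative and declarative readings described above.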