Following verbal route instructions requires knowledge of language, space, action, and perception. We present Marco, an agent that follows free-form, natural language route instructions by representing and executing a sequence of compound action specifications that model which actions to take under which conditions. Marco infers implicit actions from knowledge of linguistic conditional phrases and of spatial actions and local configurations. Thus, Marco performs explicit actions, implicit actions necessary to achieve the stated conditions, and exploratory actions to learn about the world. We gathered a corpus of 786 route instructions from six people in three large-scale virtual indoor environments. Thirty-six other people followed these instructions and rated them for quality. These human participants finished at the intended destination on 69% of the trials. Marco followed the same instructions in the same environments, with a success rate of 61%. We measured the efficacy of action inference with Marco variants lacking action inference: executing only explicit actions, Marco succeeded on just 28% of the trials. For this task, inferring implicit actions is essential for following poor instructions, and it is also crucial for following many highly rated route instructions.
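
To make the idea of a compound action specification concrete, the following is a minimal sketch in Python. It is not Marco's implementation. The toy one-corridor world and the names `Agent`, `Travel`, and `execute` are all hypothetical, introduced only for illustration. The sketch assumes that a travel action has an unstated precondition (an open path ahead) and a stated termination condition. With inference enabled, the agent first turns until the precondition holds; with inference disabled, it performs only the explicit travel.

```python
from dataclasses import dataclass, field
from typing import Callable, List

HEADINGS = ["north", "east", "south", "west"]


@dataclass
class Agent:
    """Hypothetical agent in a toy corridor of cells running west to east."""
    position: int = 0
    heading: str = "north"
    corridor_length: int = 5
    log: List[str] = field(default_factory=list)

    def path_ahead(self) -> bool:
        # Only east and west lead along the corridor; north and south are walls.
        if self.heading == "east":
            return self.position < self.corridor_length - 1
        if self.heading == "west":
            return self.position > 0
        return False

    def turn_right(self) -> None:
        self.heading = HEADINGS[(HEADINGS.index(self.heading) + 1) % 4]
        self.log.append("turn")

    def step(self) -> None:
        self.position += 1 if self.heading == "east" else -1
        self.log.append("move")


@dataclass
class Travel:
    """Compound action specification: move forward until `until` holds."""
    until: Callable[[Agent], bool]


def execute(agent: Agent, spec: Travel, infer_implicit: bool) -> bool:
    """Run a Travel specification; return True if its condition was reached."""
    if infer_implicit:
        # Implicit action (assumed): turn until a path is open ahead,
        # trying at most the four possible headings.
        for _ in range(4):
            if agent.path_ahead():
                break
            agent.turn_right()
    # Explicit action: travel until the stated condition is satisfied.
    while not spec.until(agent):
        if not agent.path_ahead():
            return False  # blocked before the condition was met
        agent.step()
    return True


# "Go down the hall to the chair", with the chair assumed to be at cell 3.
to_chair = Travel(until=lambda a: a.position == 3)
with_inference = Agent()
without_inference = Agent()
reached = execute(with_inference, to_chair, infer_implicit=True)
stuck = not execute(without_inference, to_chair, infer_implicit=False)
```

In this toy setting the agent starts facing a wall, so the explicit-only run is blocked immediately while the run with inference turns and then travels. The sketch takes the first open heading it finds, which is a simplification of the example only.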