The article presents an approach to modeling spatio-temporal comprehension in situated dialogue. The approach combines linguistic reasoning with reasoning about intentions and plans during the incremental interpretation of dialogue moves. The article explores how intention-directed planning can prime selectional attention in utterance comprehension in two ways: by disambiguating linguistic analyses on the basis of plan availability, and by raising expectations about which action(s) may be mentioned next. Planning can also complement linguistic analyses with details of the spatio-temporal-causal structure established in planning inferences, which makes these inferences available for future reference in the discourse context.
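The two priming effects named above can be illustrated with a minimal sketch. It is not the article's method. It assumes a toy plan library, a hypothetical dictionary `PLANS` mapping a goal to an ordered list of action steps, standing in for a real planner. The names `Analysis`, `plan_for`, `disambiguate`, and `expected_next` are likewise hypothetical. Candidate readings of an utterance are kept only if a plan exists for the goal they express. The remaining steps of that plan then serve as expectations about which actions may be mentioned next.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    """One candidate linguistic reading of an utterance (hypothetical representation)."""
    description: str
    goal: str  # the action the hearer is asked to carry out under this reading


# Hypothetical toy plan library standing in for a planner: goal -> ordered action steps.
PLANS = {
    "put(ball, box)": ["goto(ball)", "pick(ball)", "goto(box)", "place(ball, box)"],
}


def plan_for(goal):
    """Return a plan for the goal, or None if no plan is available."""
    return PLANS.get(goal)


def disambiguate(analyses):
    """Keep only the readings whose goal the (toy) planner can achieve."""
    return [a for a in analyses if plan_for(a.goal) is not None]


def expected_next(goal, done):
    """Plan steps not yet performed: candidates for what may be mentioned next."""
    return [step for step in (plan_for(goal) or []) if step not in done]


# Usage: two attachment readings of "put the ball in the box on the table".
analyses = [
    Analysis("put [the ball] [in the box on the table]", "put(ball, box)"),
    Analysis("put [the ball in the box] [on the table]", "put(ball, table)"),
]
readings = disambiguate(analyses)  # only the first reading has a plan
upcoming = expected_next(readings[0].goal, {"goto(ball)", "pick(ball)"})
# upcoming == ["goto(box)", "place(ball, box)"]
```

Filtering by plan existence is the simplest stand-in for "plan availability." A fuller model would rank readings incrementally as the utterance unfolds rather than applying a single yes/no filter.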