Human-produced route instructions are usually conveyed either verbally, in spoken discourse or written text, or graphically, i.e., by illustrating the route on a map or by drawing sketch maps, or by combining both kinds of external representation. Whereas verbal route instructions focus on the actions to be performed and treat the spatial environment only as the frame for these actions, maps and other pictorial representations foreground the spatial environment without possessing adequate means for representing the actions. Today, in the era of the World Wide Web and Geographical Information Systems, way-finding queries can be submitted to systems that provide "driving directions" consisting of canned text as well as different types of maps. In this paper I describe the principles and the architecture underlying a system for generating multimodal route instructions that combine natural language route descriptions with visualizations of the route to follow, such that the strengths of both means of communicating route knowledge are brought together.