Summarization of Documents that Include Graphics

Robert Futrelle

When documents include graphics such as diagrams, photos, and data plots, the graphics may also require summarization. This paper discusses essential differences in informational content and rhetorical structure between text and graphics, as well as their interplay. Three approaches to graphics summarization are discussed: Selection, in which a subset of figures is chosen; Merging, in which information in multiple figures is merged into one; and Distillation, in which a single diagram is reduced to a simpler form. These procedures have to consider the content and relations of the graphical elements within figures, the relations among a collection of figures, and the figure captions and discussions of figure content in the running text. We argue that for summarization to be successful, metadata, meaning a manipulable representation of the content of figures, needs to be generated or included initially. Textual references to figures are often not very informative, so it will be necessary either to generate metadata by diagram parsing, as we have done, or to develop intelligent authoring systems that allow the author to easily include metadata. This paper introduces this new area of research with manual summarization examples, followed by a discussion of automated techniques under development.

