Antoine Blais, Iana Atanassova, Jean-Pierre Desclès, Mimi Zhang, Leila Zighem
The exploitation of the discourse structure of a text and the identification of its discourse categories are essential for automatic summarization, as well as for textual information retrieval. In this paper, we describe an automatic summarization strategy that uses these elements as the basis for extracting the most relevant textual segments, which then constitute the summary. Certain linguistic markers allow us to automatically annotate a text with discourse categories, making its discourse structure and categories explicit. Our approach is domain-independent, and the discourse categories that we use for summarization are common to all natural languages. This makes it possible to apply our method to articles in various domains and in different languages.
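The abstract does not specify the marker lexicon or the selection rules. The following is therefore only a minimal sketch of the general idea: surface markers annotate sentences with discourse categories, and sentences carrying the requested categories are extracted in document order. The category names (`announcement`, `result`, `conclusion`), the regular-expression markers, and the helpers `annotate` and `summarize` are hypothetical illustrations, not the authors' system.

```python
import re
from dataclasses import dataclass, field

# Hypothetical marker lexicon (illustration only): each discourse category
# is triggered by a few surface expressions, matched case-insensitively.
MARKERS = {
    "announcement": [r"\bin this paper\b", r"\bwe (?:describe|present|propose)\b"],
    "result": [r"\bresults? (?:show|indicate)s?\b", r"\bwe (?:found|observed)\b"],
    "conclusion": [r"\bin conclusion\b", r"\bwe conclude\b", r"\bto sum up\b"],
}


@dataclass
class Segment:
    position: int                      # index of the sentence in the text
    text: str
    categories: set = field(default_factory=set)


def split_sentences(text):
    # Naive segmentation: split after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def annotate(text):
    # Attach every category whose markers occur in the sentence.
    segments = []
    for i, sentence in enumerate(split_sentences(text)):
        seg = Segment(i, sentence)
        for category, patterns in MARKERS.items():
            if any(re.search(p, sentence, re.IGNORECASE) for p in patterns):
                seg.categories.add(category)
        segments.append(seg)
    return segments


def summarize(text, wanted, max_segments=3):
    # Keep segments annotated with any wanted category, in original order.
    selected = [s for s in annotate(text) if s.categories & set(wanted)]
    return [s.text for s in selected[:max_segments]]
```

For example, `summarize("Summarization is hard. In this paper we describe a marker-based method. Results show that it works. In conclusion, markers help.", ["announcement", "conclusion"])` returns the second and fourth sentences. Because the markers are not tied to any subject area, the same procedure applies to texts from any domain. Supporting another language would mean supplying a marker list for that language.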
Subjects: 13. Natural Language Processing; 13.1 Discourse
Submitted: Feb 8, 2007