Multi-Document Summarization by Maximizing Informative Content-Words

Wen-tau Yih, Joshua Goodman, Lucy Vanderwende, Hisami Suzuki

We show that a simple procedure based on maximizing the number of informative content-words can produce some of the best reported results for multi-document summarization. We first assign a score to each term in the document cluster, using only frequency and position information, and then we find the set of sentences in the document cluster that maximizes the sum of these scores, subject to length constraints. Our overall results are the best reported on the DUC-2004 summarization task for the ROUGE-1 score, and are the best, but not statistically significantly different from the best system in MSE-2005. Our system is also substantially simpler than the previous best system.

Subjects: 13. Natural Language Processing; 13.1 Discourse

Submitted: Oct 16, 2006

This page is copyrighted by AAAI. All rights reserved. Your use of this site constitutes acceptance of all of AAAI's terms and conditions and privacy policy.