A Publicly Available Annotated Corpus for Supervised Email Summarization

Jan Ulrich, Gabriel Murray, Giuseppe Carenini

Annotated email corpora are necessary for both the training and the evaluation of machine learning summarization techniques, and their scarcity has been a limiting factor for research in this field. We describe our process for creating a new annotated corpus of email threads that will be made publicly available, and we discuss the trade-offs among the different annotation methods that could be used.

Subjects: 1.10 Information Retrieval; 13.1 Discourse

Submitted: May 5, 2008

This page is copyrighted by AAAI. All rights reserved.