Jan Ulrich, Gabriel Murray, Giuseppe Carenini
Annotated email corpora are necessary for training and evaluating machine learning summarization techniques. The scarcity of such corpora has been a limiting factor for research in this field. We describe our process for creating a new annotated email thread corpus that will be made publicly available, and we discuss the trade-offs among the different annotation methods that could be used.
Subjects: 1.10 Information Retrieval; 13.1 Discourse
Submitted: May 5, 2008