Investigations in Unsupervised Back-of-the-Book Indexing

Andras Csomai, Rada Mihalcea

This paper describes our experiments with unsupervised methods for back-of-the-book index construction. Through comparative evaluations performed on a gold standard data set of 29 books and their corresponding indexes, we draw conclusions as to what are the most accurate unsupervised methods for automatic index construction. We show that if the right sequence of methods and heuristics is used, the performance of an unsupervised back-of-the-book index construction system can be raised with up to 250% relative increase in F-measure as compared to the performance of a system based on the traditional tf*idf weighting scheme.

Subjects: 13. Natural Language Processing; 1.10 Information Retrieval

Submitted: Feb 11, 2007

This page is copyrighted by AAAI. All rights reserved. Your use of this site constitutes acceptance of all of AAAI's terms and conditions and privacy policy.