AAAI Publications, Twenty-Fourth International FLAIRS Conference

Automatic Reduction of a Document-Derived Noun Vocabulary
Sven Anderson, S. Rebecca Thomas, Camden Segal, Yu Wu

Last modified: 2011-03-20


We propose and evaluate five related algorithms that automatically derive limited-size noun vocabularies from text documents of 2,000 to 30,000 words. The proposed algorithms combine Personalized PageRank with principles of information maximization and are applied to the WordNet noun graph. For the best-performing algorithm, automatically generated reduced noun lexicons differ from those created by human writers by approximately 1 to 2 WordNet edges per lexical item. Our results also indicate the importance of performing word-sense disambiguation with sentence-level context information at the earliest stage of analysis.
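Personalized PageRank is one ingredient named above. The following is a minimal sketch of the generic technique, computed by power iteration. It assumes an undirected noun graph given as an adjacency list and seed weights taken from document word counts. The toy graph, the node names, the seed counts, the damping value of 0.85, and the function name `personalized_pagerank` are all hypothetical illustrations, not the authors' implementation. Neither the information-maximization step nor word-sense disambiguation is shown.

```python
from typing import Dict, List


def personalized_pagerank(
    graph: Dict[str, List[str]],
    seeds: Dict[str, float],
    damping: float = 0.85,
    iterations: int = 100,
    tol: float = 1e-10,
) -> Dict[str, float]:
    """Generic Personalized PageRank by power iteration (illustrative sketch)."""
    nodes = list(graph)
    total = sum(seeds.values())
    # Restart (teleport) distribution: only seed nodes, normalized to sum to 1.
    restart = {n: seeds.get(n, 0.0) / total for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            neighbors = graph[n]
            if neighbors:
                # Spread this node's damped mass evenly over its neighbors.
                share = damping * rank[n] / len(neighbors)
                for m in neighbors:
                    new_rank[m] += share
            else:
                # Dangling node: return its damped mass to the restart distribution.
                for m in nodes:
                    new_rank[m] += damping * rank[n] * restart[m]
        delta = sum(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy fragment of a noun hierarchy (hypernym links, undirected).
TOY_GRAPH = {
    "entity": ["animal", "artifact"],
    "animal": ["entity", "dog", "cat"],
    "artifact": ["entity", "car"],
    "dog": ["animal"],
    "cat": ["animal"],
    "car": ["artifact"],
}

# Hypothetical seed weights, e.g. noun counts observed in a document.
TOY_SEEDS = {"dog": 3.0, "cat": 1.0}

if __name__ == "__main__":
    scores = personalized_pagerank(TOY_GRAPH, TOY_SEEDS)
    for node in sorted(scores, key=scores.get, reverse=True):
        print(f"{node:10s} {scores[node]:.4f}")
```

Because the restart mass is concentrated on the document's nouns, concepts near those nouns (here `animal`) accumulate more score than unrelated branches (here `artifact`, `car`). A vocabulary-reduction step could then favor such high-scoring, more general nodes. How the authors actually select reduced lexicon items is not specified here.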
