Guang Can Liu, Yong Yu, Xing Zhu
One of the core components of information retrieval (IR) is the document term-weighting scheme. In this paper, we propose a novel learning-based term-weighting approach that improves the retrieval performance of the vector space model on homogeneous collections. We first introduce a simple learning system for weighting the index terms of documents. We then derive a formal computational approach based on results from matrix computation and statistical inference. Experiments on eight collections show that our approach outperforms classic TF.IDF weighting by about 20% to 45%.
Content Area: 19. Semantic Web, Information Retrieval, and Extraction
Subjects: 1.10 Information Retrieval; 12. Machine Learning and Discovery
Submitted: May 4, 2005