Eric Bloedorn, Inderjeet Mani, and T. Richard MacMillan
As more information becomes available electronically, tools that help users find information of interest become increasingly important. Building such tools is often complicated by the difficulty of articulating user interest in a form that can be used for searching. The goal of the research described here is to build a system that generates comprehensible user profiles which accurately capture user interest with minimal user interaction. Machine learning methods offer a promising approach to this problem. This work focuses on the importance of a suitable generalization hierarchy and representation for learning profiles that are both predictively accurate and comprehensible. In our experiments using AQ15c and C4.5, we evaluated both traditional features based on weighted term vectors and subject features corresponding to categories that could be drawn from a thesaurus. Our experiments, conducted in the context of a content-based profiling system for on-line newspapers on the World Wide Web (the IDD News Browser), demonstrate the importance of a generalization hierarchy in obtaining high predictive accuracy, precision and recall, and stability of learning.
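The abstract contrasts two kinds of document features: weighted term vectors and subject features drawn from a thesaurus-style generalization hierarchy. The sketch below illustrates the difference under stated assumptions. The term weighting uses plain tf-idf (tf × log(N/df)) as one common choice, and `TERM_TO_SUBJECT`, `PARENT`, and the toy categories are hypothetical stand-ins for a real thesaurus. This is not the IDD News Browser's actual feature extraction.

```python
import math
from collections import Counter

# Hypothetical toy thesaurus (an assumption, not the system's real resource):
# each term maps to a subject category, and each category maps to its parent
# category (None marks a root of the generalization hierarchy).
TERM_TO_SUBJECT = {
    "election": "politics", "senate": "politics",
    "goal": "soccer", "striker": "soccer",
}
PARENT = {"politics": "government", "soccer": "sports",
          "government": None, "sports": None}


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized doc, weight = tf * log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def subject_features(doc):
    """Return each known term's category plus all its ancestors in the hierarchy."""
    feats = set()
    for term in doc:
        cat = TERM_TO_SUBJECT.get(term)
        while cat is not None:  # walk up the hierarchy to the root
            feats.add(cat)
            cat = PARENT[cat]
    return feats


docs = [["election", "senate", "vote"], ["goal", "striker", "vote"]]
print(tfidf_vectors(docs))
print(sorted(subject_features(docs[0])))  # → ['government', 'politics']
```

Note that "vote" appears in every document and so receives zero tf-idf weight, while the subject features let a learner generalize from specific terms such as "election" and "senate" up to broader categories like "government", which is the kind of generalization the abstract argues matters for learning accurate, comprehensible profiles.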