Semi-supervised Learning for Multi-component Data Classification

Akinori Fujino, Naonori Ueda, Kazumi Saito

This paper presents a method for designing a semi-supervised classifier for multi-component data such as web pages consisting of text and link information. The proposed method is based on a hybrid of generative and discriminative approaches to take advantage of both approaches. With our hybrid approach, for each component, we consider an individual generative model trained on labeled samples and a model introduced to reduce the effect of the bias that results when there are few labeled samples. Then, we construct a hybrid classifier by combining all the models based on the maximum entropy principle. In our experimental results using three test collections such as web pages and technical papers, we confirmed that our hybrid approach was effective in improving the generalization performance of multi-component data classification.

Subjects: 12. Machine Learning and Discovery

n

Submitted: Oct 13, 2006


This page is copyrighted by AAAI. All rights reserved. Your use of this site constitutes acceptance of all of AAAI's terms and conditions and privacy policy.