This paper analyzes the effect of unlabeled training data in generative classifiers. We are interested in classification performance when unlabeled data are added to an existing pool of labeled data. We show that unlabeled data can degrade the performance of a classifier when there are discrepancies between modeling assumptions used to build the classifier and the actual model that generates the data; our analysis of this situation explains several seemingly disparate results in the literature.
Published Date: May 2002
Registration: ISBN 978-1-57735-141-2
Copyright: Published by The AAAI Press, Menlo Park, California