Semi-Supervised Learning for Maximizing the Partial AUC

  • Tomoharu Iwata, NTT Communication Science Laboratories
  • Akinori Fujino, NTT Communication Science Laboratories
  • Naonori Ueda, NTT Communication Science Laboratories

Abstract

The partial area under the receiver operating characteristic curve (pAUC) is a performance measure for binary classification problems that summarizes the true positive rate over a specific range of the false positive rate. Obtaining classifiers that achieve a high pAUC is important in a wide variety of applications, such as cancer screening and spam filtering. Although many methods have been proposed for maximizing the pAUC, existing methods require a large amount of labeled data for training. In this paper, we propose a semi-supervised learning method for maximizing the pAUC, which trains a classifier with a small amount of labeled data and a large amount of unlabeled data. To exploit the unlabeled data, we derive two approximations of the pAUC: the first is calculated from positive and unlabeled data, and the second from negative and unlabeled data. A classifier is trained by maximizing the weighted sum of these two approximations and the pAUC calculated from positive and negative data. Through experiments on various datasets, we demonstrate that the proposed method achieves higher test pAUCs than existing methods.

Published: 2020-04-03
Section: AAAI Technical Track: Machine Learning