Recovering the Propensity Score from Biased Positive Unlabeled Data

Authors

Walter Gerych

Worcester Polytechnic Institute

Thomas Hartvigsen

MIT CSAIL

Luke Buquicchio

Worcester Polytechnic Institute

Emmanuel Agu

Worcester Polytechnic Institute

Elke Rundensteiner

Worcester Polytechnic Institute

Proceedings:

No. 6: AAAI-22 Technical Tracks 6

Volume

Issue:

Proceedings of the AAAI Conference on Artificial Intelligence, 36

Track:

AAAI Technical Track on Machine Learning I

Downloads:

Download PDF

Abstract:

Positive-Unlabeled (PU) learning methods train a classifier to distinguish between the positive and negative classes given only positive and unlabeled data. While traditional PU methods require the labeled positive samples to be an unbiased sample of the positive distribution, in practice the labeled sample is often a biased draw from the true distribution. Prior work shows that if we know the likelihood that each positive instance will be selected for labeling, referred to as the propensity score, then the biased sample can be used for PU learning. Unfortunately, no prior work has been proposed an inference strategy for which the propensity score is identifiable. In this work, we propose two sets of assumptions under which the propensity score can be uniquely determined: one in which no assumption is made on the functional form of the propensity score (requiring assumptions on the data distribution), and the second which loosens the data assumptions while assuming a functional form for the propensity score. We then propose inference strategies for each case. Our empirical study shows that our approach significantly outperforms the state-of-the-art propensity estimation methods on a rich variety of benchmark datasets.

DOI:

10.1609/aaai.v36i6.20624

AAAI

Proceedings of the AAAI Conference on Artificial Intelligence, 36

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.