Combating Sampling Bias: A Self-Training Method in Credit Risk Models

Authors

Jingxian Liao

Intuit AI+Data, Intuit, Inc.; Department of Computer Science, University of California, Davis

Wei Wang

Intuit AI+Data, Intuit, Inc.

Jason Xue

Intuit AI+Data, Intuit, Inc.

Anthony Lei

QuickBooks Capital, Intuit, Inc.

Xue Han

Intuit AI+Data, Intuit, Inc.

Kun Lu

Intuit AI+Data, Intuit, Inc.

Proceedings:

Proceedings of the AAAI Conference on Artificial Intelligence, 36

Issue:

No. 11: IAAI-22, EAAI-22, AAAI-22 Special Programs and Special Track, Student Papers and Demonstrations

Track:

IAAI Technical Track on Emerging Applications of AI

Abstract:

A significant challenge in credit risk models for underwriting is bias in the model training data. Most credit risk models are built using only applicants who were funded for credit. This non-random sample, shaped predominantly by credit policymakers and by previous loan performance, can introduce sampling bias into the models and thus distort their predictions of loan default when screening applications from prospective borrowers. In this paper, we propose a novel data augmentation method that identifies and pseudo-labels a subset of historically declined loan applications to mitigate sampling bias in the training data. We also introduce a new business-oriented performance measure: the loan approval rate at various loan default rate levels. We compared our proposed methods with the original supervised learning model and with the sampling-bias remedies traditionally used in the industry. Experimental results and early production results from the deployed model show that the self-training method, using calibrated probability as the selection criterion for data augmentation, improved the ability of credit scoring to differentiate defaulting loan applications and, more importantly, can increase the loan approval rate by up to 8.8% while keeping a default rate similar to the baselines. These results have practical implications for how future underwriting models should be developed.
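The abstract names two mechanics without detailing them: selecting and pseudo-labeling declined applications by their calibrated default probability, and measuring the approval rate achievable at a given default rate. The Python sketch below illustrates one plausible reading of each; it is not the authors' implementation. The function names, the confidence thresholds `low` and `high`, and the rule of approving applicants in order of lowest predicted risk are all hypothetical assumptions. The probabilities are assumed to come from a model that has already been calibrated (for example with Platt scaling or isotonic regression).

```python
from typing import List, Sequence, Tuple


def select_pseudo_labels(
    default_probs: Sequence[float],
    low: float = 0.05,
    high: float = 0.60,
) -> List[Tuple[int, int]]:
    """Pseudo-label declined applications whose calibrated default
    probability is confidently low or confidently high.

    Returns (index, pseudo_label) pairs: 1 = predicted default,
    0 = predicted repayment. Applications between the thresholds stay
    unlabeled. The default thresholds are hypothetical placeholders;
    the abstract does not give the paper's actual selection rule.
    """
    if not 0.0 <= low < high <= 1.0:
        raise ValueError("require 0 <= low < high <= 1")
    selected = []
    for i, p in enumerate(default_probs):
        if p >= high:
            selected.append((i, 1))
        elif p <= low:
            selected.append((i, 0))
    return selected


def augment_training_set(funded_rows, funded_labels, declined_rows,
                         declined_probs, low=0.05, high=0.60):
    """Append confidently pseudo-labeled declined rows to the funded
    (observed-outcome) training data. Hypothetical helper."""
    picked = select_pseudo_labels(declined_probs, low, high)
    rows = list(funded_rows) + [declined_rows[i] for i, _ in picked]
    labels = list(funded_labels) + [y for _, y in picked]
    return rows, labels


def approval_rate_at_default_rate(
    default_probs: Sequence[float],
    defaulted: Sequence[int],
    max_default_rate: float,
) -> float:
    """Largest fraction of applications that can be approved, lowest
    predicted risk first, while the observed default rate among the
    approved applications stays at or below max_default_rate."""
    if len(default_probs) != len(defaulted) or not default_probs:
        raise ValueError("need equal-length, non-empty inputs")
    # Rank applications from lowest to highest predicted default risk.
    order = sorted(range(len(default_probs)), key=lambda i: default_probs[i])
    best = 0
    defaults = 0
    for k, i in enumerate(order, start=1):
        defaults += defaulted[i]
        # Keep the largest cutoff k that satisfies the constraint.
        if defaults / k <= max_default_rate:
            best = k
    return best / len(default_probs)


if __name__ == "__main__":
    print(select_pseudo_labels([0.02, 0.30, 0.75, 0.04, 0.55]))
    # [(0, 0), (2, 1), (3, 0)]
    print(approval_rate_at_default_rate([0.1, 0.2, 0.3, 0.4], [0, 0, 1, 1], 0.25))
    # 0.5
```

Because the default rate among approved applicants is not monotone in the cutoff, `approval_rate_at_default_rate` scans every cutoff and keeps the largest one that satisfies the constraint instead of stopping at the first violation. Comparing this value across models at the same `max_default_rate` is one way to read "approval rate at a similar default rate."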

DOI:

10.1609/aaai.v36i11.21528
