The Effect of Alternate Scaling Approaches on the Performance of Different Supervised Learning Algorithms. An Empirical Study in the Case of Credit Scoring

Harald Kauderer and Gholamreza Nakhaeizadeh

Building classification tools to discriminate between good and bad credit risks is a supervised learning task that can be solved with different approaches. In constructing such tools, a set of training data containing qualitative and quantitative attributes is generally used to learn the discriminant rules. In real-world credit applications, much of the available information about a customer and his or her payment behavior appears as qualitative, categorical attributes. Many supervised learning approaches, however, require quantitative, numerical input attributes, so qualitative attributes must first be transformed into numerical ones before they can be used in the learning process. A very simple way to handle this problem is to code each possible value of every qualitative categorical attribute as a new, separate binary attribute. This increases the number of input attributes, which makes learning more complicated and less reliable; neural networks in particular need more training time and often lose accuracy. In this paper we consider different scaling approaches, in which the number of attributes does not increase, for transforming categorical attributes into numerical ones. We use the scaled attributes as input variables to learn the discriminant rules, aiming to enhance the accuracy and stability of those rules. Using real-world credit data, we evaluate the different approaches and compare the results.
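The contrast between binary coding and scaling can be sketched as follows. This is an illustrative example only, not the paper's specific method: the one-hot coder is the simple binary approach the abstract describes, while `target_scale` is an assumed, representative scaling (mapping each category to its observed bad-risk rate) that keeps the attribute count fixed.

```python
def one_hot(values, categories):
    """Binary coding: one new 0/1 attribute per possible category value.
    A single categorical attribute becomes len(categories) input attributes."""
    return [[1 if v == c else 0 for c in categories] for v in values]

def target_scale(values, labels):
    """Illustrative scaling (an assumption, not the paper's exact approach):
    replace each category with the mean label (e.g. bad-risk rate) observed
    for that category in the training data. The attribute count stays one."""
    sums, counts = {}, {}
    for v, y in zip(values, labels):
        sums[v] = sums.get(v, 0) + y
        counts[v] = counts.get(v, 0) + 1
    rate = {v: sums[v] / counts[v] for v in sums}
    return [rate[v] for v in values]

# Hypothetical credit attribute "purpose" with bad-risk labels (1 = bad).
purpose = ["new car", "furniture", "new car", "business"]
bad = [0, 1, 1, 1]

print(one_hot(purpose, ["new car", "furniture", "business"]))
# three binary input columns replace the single attribute
print(target_scale(purpose, bad))
# one numerical column: each value is its category's bad-risk rate
```

With many categorical attributes, each carrying many values, the one-hot representation inflates the input dimension, while a scaling of this kind leaves it unchanged.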

This page is copyrighted by AAAI. All rights reserved.