We describe a case study in data mining for personal loan evaluation, carried out at ABN AMRO Bank in the Netherlands. Historical data on clients and their repayment behaviour are used to learn to predict whether a client will default. We show that, because applicants are pre-selected by a credit scoring system, the database is a sample from a different population than the one the bank is actually interested in; this necessarily restricts the inferences that can be drawn from it. Furthermore, we point out the importance of integrity and consistency checking when the data are entered into the system: noise is a serious problem. The experimental comparison involves a classical statistical method, linear discriminant analysis, and the classification tree algorithm C4.5. Both methods learn a classification function from one and the same training set, drawn from the historical database. On an independent test set they classify 71.4% and 73.6% of the cases correctly, respectively. McNemar's test of the null hypothesis of equal performance gives a p-value of 0.1417, so the difference in accuracy is not significant at the conventional 5% level. The classification tree constructed by C4.5 uses 10 of the 38 attributes to distinguish between defaulters and non-defaulters, and is consistent with the available theory on credit scoring; the linear discriminant function uses 17 variables. From the viewpoints of both predictive accuracy and comprehensibility, the classification tree performs better in this study. To make further progress, the level of noise in the data has to be reduced, and data have to be collected on loan applications that are rejected by the credit scoring system.
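McNemar's test compares two classifiers evaluated on the same test set using only the cases on which they disagree: b cases that the first classifier gets right and the second gets wrong, and c cases where the reverse holds. The abstract reports neither these counts nor which variant of the test was used, so the following minimal Python sketch shows both the chi-square approximation with continuity correction and the exact binomial version. The function names and the counts in the usage line are hypothetical and do not reproduce the study's result.

```python
import math


def mcnemar_chi2(b: int, c: int) -> tuple[float, float]:
    """McNemar's test with continuity correction (hypothetical helper).

    b: test cases classifier A gets right and classifier B gets wrong.
    c: test cases classifier B gets right and classifier A gets wrong.
    Cases both classifiers get right (or both get wrong) do not enter the test.
    Returns (statistic, two-sided p-value); the statistic is chi-square with 1 df.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs")
    # Clamp at zero so that b == c yields a statistic of 0 rather than 1/(b+c).
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # A chi-square variable with 1 df is Z**2 for standard normal Z,
    # so P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


def mcnemar_exact(b: int, c: int) -> float:
    """Exact two-sided McNemar p-value: binomial test of min(b, c) out of b + c at 0.5."""
    n = b + c
    if n == 0:
        raise ValueError("no discordant pairs")
    k = min(b, c)
    # Lower binomial tail P(K <= k) under H0, doubled for a two-sided test.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical disagreement counts, for illustration only (not the study's data).
stat, p = mcnemar_chi2(30, 45)
p_exact = mcnemar_exact(30, 45)
```

The chi-square form is the one usually quoted in textbooks; the exact form is preferable when b + c is small, since the chi-square approximation is then unreliable.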