Credit scoring models have been developed by banks and researchers to improve the assessment of creditworthiness during the credit evaluation process. The objective of a credit scoring model is to assign each credit applicant to either a “good risk” group that is likely to repay a financial obligation or a “bad risk” group that has a high possibility of defaulting on it. Construction of credit scoring models requires data mining techniques. Using historical payment data, demographic characteristics and statistical techniques, credit scoring models can help identify the important demographic characteristics related to credit risk and provide a score for each customer. This paper illustrates the use of data mining to improve the assessment of creditworthiness using credit scoring models. Due to privacy concerns and the unavailability of real financial data from banks, this study applies the credit scoring techniques to payment history data of members of a recreational club. The club has been facing a rising number of defaulters on its monthly subscription payments. The management would like to have a model that it can deploy to identify potential defaulters. The classification performance of a credit scorecard model, a logistic regression model and a decision tree model was compared. The classification error rates for the credit scorecard, logistic regression and decision tree models were 27.9%, 28.8% and 28.1%, respectively. Although no model clearly outperforms the others, scorecards are much easier to deploy in practical applications.
Credit scoring models are useful in many practical applications, especially for banks and financial institutions. Banks commonly decide whether to accept or reject a client’s credit application using judgmental techniques, credit scoring models, or both. The judgmental approach used by most banks and financial institutions is based on the 3C’s, 4C’s or 5C’s of credit: character, capital, collateral, capacity and condition. Credit scoring is a system creditors use to assign credit applicants to either a “good credit” group that is likely to repay a financial obligation or a “bad credit” group that has a high possibility of defaulting on it. Linear discriminant analysis and logistic regression are two popular statistical tools for constructing credit scoring models (Abdou et al., 2008; Desai et al., 1996; Gao et al., 2006; Hand and Henley, 1997; Thomas, 2000; Vojtek and Kocenda, 2006). However, with advances in information and computer technology, new techniques have appeared under the name of data mining. Data mining software such as SAS® Enterprise Miner and SPSS PASW® Modeler 13 provides not only the classical methods but also newer predictive modeling and classification techniques such as decision trees, neural networks, support vector machines (SVM) and k-nearest neighbors.
Although credit scoring methods are widely used for loan applications in financial and banking institutions, they can also be used to predict late payments in other types of organizations, such as insurance companies, real estate firms, telecommunication providers and recreational clubs. For example, Gschwind (2007) showed a data mining application in real estate for predicting late payments by tenants. Due to privacy concerns and the unavailability of data from banks, this paper uses the historical monthly subscription payments of members of a local recreational club. Club members are obliged to pay the monthly subscription fee in addition to the permanent membership fee. The management faces a rising number of defaulters. So far, there has been no significant effort to improve cash flow by using quantitative methods to predict non-payment proactively and take corrective action before a late payment happens. Discussion with the management of the club revealed that it uses judgmental techniques to separate defaulters from non-defaulters and to decide whether to terminate the membership of defaulters. The main source of income for most recreational clubs is the monthly membership payments. A large number of defaulters results in cash flow problems and loss of income for the club. This affects the financial planning of club activities, and the management must ensure that the club does not go bankrupt. The objective of this paper is to illustrate the use of data mining in assessing creditworthiness with credit scoring models and in predicting an event such as a default in payment, so that early intervention can prevent financial loss.
This paper is organized as follows. Section 2 provides a review of the applications of data mining and credit scoring models. Then, the conceptual framework is presented. The methodology for constructing the credit scoring models is covered in Section 3. The results are discussed in Section 4. Finally, the limitations of the data mining approach to the construction of credit scoring models are highlighted in the concluding section.
Over the past decade, the availability and high computing capability of data mining software have enabled business organizations to analyze their large customer databases and gain useful information from them. The main data mining techniques are predictive modeling, classification, cluster analysis and association analysis (also known as market basket analysis). These techniques are highly useful for credit scoring, target marketing, customer retention, customer profiling, marketing campaigns, fraud detection, churn modeling, customer segmentation, product bundling, and cross-selling and up-selling of products.
Here we discuss some limitations in constructing credit scoring models. The two main limitations are the availability of data and sample selection. All too often, a good credit scoring model cannot be obtained because the data are unavailable or of poor quality (recording errors and a high percentage of missing values). Moreover, a credit scoring model built using historical data on past applicants who were accepted is based on a biased sample when it is used to evaluate new applicants, because the outcomes of rejected applicants were never observed. To remedy this bias, SAS Enterprise Miner provides a Reject Inference node, in which the rejected applicants are scored (predicted as good or bad) using the model built on the accepted applicants. These scored data are then added to the accepted sample, and the augmented sample serves as the input to a second modeling run (SAS Institute Inc., 2009).
The next issue is which model is best. According to results from past studies, there is no overall “best” model. The performance of a credit scoring model depends on the data structure, the data quality and the objective of the classification. Sophisticated techniques such as artificial neural networks (ANNs), multivariate adaptive regression splines (MARS) and SVM have shown only slight improvements in classification accuracy. In practical applications, classification methods that are easy to understand, such as scorecards and decision trees, are more appealing to users.
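The reject inference procedure described above (score the rejected applicants with a model built on the accepted ones, append them to the accepted sample, then refit) can be sketched in a few lines of Python. This is only an illustrative sketch and not the implementation used by the SAS Enterprise Miner node: the single predictor, the synthetic data, the hand-written logistic regression, the 0.5 cutoff and all function names (`fit_logistic`, `predict_proba`, `reject_inference`) are assumptions made for the example.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, x):
    # Probability of "bad" (label 1) for one applicant; w = [bias, w1, ...].
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.1, epochs=500):
    # Plain batch gradient descent on the logistic log-loss.
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def reject_inference(X_acc, y_acc, X_rej, cutoff=0.5):
    # Step 1: build a model on accepted applicants, whose outcomes are known.
    model_1 = fit_logistic(X_acc, y_acc)
    # Step 2: score rejected applicants and label them good (0) or bad (1).
    inferred = [1 if predict_proba(model_1, x) >= cutoff else 0 for x in X_rej]
    # Step 3: augment the accepted sample with the scored rejects and refit.
    model_2 = fit_logistic(X_acc + X_rej, y_acc + inferred)
    return model_1, inferred, model_2

# Synthetic data (assumption): one standardized risk predictor; 1 = bad, 0 = good.
random.seed(0)
X_acc = [[random.gauss(0, 1)] for _ in range(200)]
y_acc = [1 if x[0] + random.gauss(0, 0.5) > 0.5 else 0 for x in X_acc]
X_rej = [[random.gauss(1, 1)] for _ in range(100)]  # rejects skew riskier

model_1, inferred, model_2 = reject_inference(X_acc, y_acc, X_rej)
print("first-run coefficients: ", model_1)
print("second-run coefficients:", model_2)
print("rejects inferred as bad:", sum(inferred), "of", len(inferred))
```

Because the inferred labels come from the first model itself, the second run largely reinforces the first model's decision boundary. The sketch illustrates the mechanics of augmentation only and makes no claim about how much of the selection bias is actually removed.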