Comparison of data mining techniques for the predictive accuracy of probability of default of credit card clients
|Article code|Publication year|English article|Persian translation|Word count|
|22134|2009|8-page PDF|Available to order|3633 words|
Publisher: Elsevier - Science Direct
Journal: Expert Systems with Applications, Volume 36, Issue 2, Part 1, March 2009, Pages 2473–2480
This research examines customers’ default payments in Taiwan and compares the predictive accuracy of the probability of default among six data mining methods. From the perspective of risk management, an accurate estimate of the probability of default is more valuable than a binary classification of clients as credible or not credible. Because the real probability of default is unknown, this study presents the novel “Sorting Smoothing Method” to estimate it. With the real probability of default as the response variable (Y) and the predicted probability of default as the independent variable (X), the simple linear regression Y = A + BX shows that the forecasting model produced by the artificial neural network has the highest coefficient of determination; its regression intercept (A) is close to zero and its regression coefficient (B) is close to one. Therefore, among the six data mining techniques, the artificial neural network is the only one that can accurately estimate the real probability of default.
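The regression check described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `calibration_fit` is hypothetical, the fit uses ordinary least squares via NumPy, and the data are synthetic rather than taken from the study.

```python
import numpy as np

def calibration_fit(predicted, real):
    """Fit real = A + B * predicted by ordinary least squares.

    Returns (A, B, r_squared). A near 0, B near 1 and r_squared near 1
    together indicate that predicted probabilities track the real ones.
    """
    x = np.asarray(predicted, dtype=float)
    y = np.asarray(real, dtype=float)
    B, A = np.polyfit(x, y, 1)  # polyfit returns the slope first
    residuals = y - (A + B * x)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return A, B, 1.0 - ss_res / ss_tot

# Synthetic illustration only (not data from the study):
# predictions that match the "real" probability up to small noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=500)
y = np.clip(x + rng.normal(0.0, 0.03, size=500), 0.0, 1.0)
A, B, r2 = calibration_fit(x, y)
print(f"A={A:.3f}, B={B:.3f}, R^2={r2:.3f}")
```

A model whose scores rank clients well but are systematically too high or too low would still show a high coefficient of determination, which is why the intercept and slope are inspected alongside it.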
In recent years, credit card issuers in Taiwan faced a cash and credit card debt crisis, and delinquency was expected to peak in the third quarter of 2006 (Chou, 2006). In order to increase market share, card-issuing banks in Taiwan over-issued cash and credit cards to unqualified applicants. At the same time, most cardholders, irrespective of their repayment ability, overused credit cards for consumption and accumulated heavy credit and cash-card debts. The crisis dealt a blow to confidence in consumer finance, and it is a big challenge for both banks and cardholders.
In a well-developed financial system, crisis management is downstream and risk prediction is upstream. The major purpose of risk prediction is to use financial information, such as business financial statements and customer transaction and repayment records, to predict business performance or individual customers’ credit risk and to reduce damage and uncertainty. Many statistical methods, including discriminant analysis, logistic regression, Bayes classifier, and nearest neighbor, have been used to develop models of risk prediction (Hand & Henley, 1997). With the evolution of artificial intelligence and machine learning, artificial neural networks and classification trees have also been employed to forecast credit risk (Koh & Chan, 2002; Thomas, 2000). Credit risk here means the probability of a delay in the repayment of the credit granted (Paolo, 2001). From the perspective of risk control, estimating the probability of default is more meaningful than classifying customers into two groups, risky and non-risky. Therefore, whether the estimated probability of default produced by data mining methods can represent the “real” probability of default is an important question.
Forecasting the probability of default is a challenge facing practitioners and researchers, and it needs more study (Baesens et al., 2003; Baesens et al., 2003; Desai et al., 1996; Hand & Henley, 1997; Jagielska & Jaworski, 1996; Lee et al., 2002; Rosenberg & Gleit, 1994; Thomas, 2000). Because the real probability of default is unknown, this study proposes the novel “Sorting Smoothing Method” to deduce the real default probability and addresses the following two questions: (1) Do the six data mining techniques differ in classification accuracy? (2) Can the estimated probability of default produced by data mining methods represent the real probability of default? In the next section, we review the six data mining techniques (discriminant analysis, logistic regression, Bayes classifier, nearest neighbor, artificial neural networks, and classification trees) and their applications to credit scoring. Then, using real credit risk data on cardholders in Taiwan, we compare their classification accuracy. Section 4 is dedicated to their predictive performance for the probability of default. Finally, Section 5 contains some concluding remarks.
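This excerpt names the Sorting Smoothing Method without defining it. The sketch below is one plausible reading of the name, offered as an assumption rather than as the authors' procedure: clients are sorted by predicted probability, and each client's "real" probability is taken as the mean of the binary default flags in a window of up to 2n + 1 neighbours. The function name `sorting_smoothing`, the default window half-width `n=50`, and the truncation of windows at the ends are all illustrative choices.

```python
import numpy as np

def sorting_smoothing(predicted, defaulted, n=50):
    """Assumed sort-then-smooth estimate of the real default probability.

    predicted: model scores in [0, 1]; defaulted: 0/1 outcomes.
    Returns (sorted_predicted, smoothed_real), both in ascending score order.
    """
    p = np.asarray(predicted, dtype=float)
    y = np.asarray(defaulted, dtype=float)
    order = np.argsort(p, kind="stable")      # sort clients by score
    p_sorted, y_sorted = p[order], y[order]
    # Prefix sums give each window total in O(1).
    csum = np.concatenate(([0.0], np.cumsum(y_sorted)))
    idx = np.arange(len(y_sorted))
    lo = np.maximum(idx - n, 0)               # window start (inclusive)
    hi = np.minimum(idx + n + 1, len(y_sorted))  # window end (exclusive)
    smoothed = (csum[hi] - csum[lo]) / (hi - lo)
    return p_sorted, smoothed

# Tiny hand-checkable example with n = 1.
scores, real = sorting_smoothing([0.9, 0.1, 0.5, 0.3], [1, 0, 1, 0], n=1)
print(scores, real)
```

The smoothed values can then serve as the response variable when checking how closely a model's predicted probabilities follow the estimated real ones.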
English conclusion
This paper examines six major classification techniques in data mining and compares their classification performance and predictive accuracy. The novel Sorting Smoothing Method is presented for the first time to estimate the real probability of default. In terms of classification accuracy, the results show little difference in error rates among the six methods but relatively large differences in area ratio. Area ratio is therefore the more sensitive criterion and is appropriate for measuring the classification accuracy of models. Artificial neural networks classify more accurately than the other five methods. In terms of the predictive accuracy of the probability of default, artificial neural networks also show the best performance based on R² (0.9647, close to 1), regression intercept (0.0145, close to 0), and regression coefficient (0.9971, close to 1). The predicted default probability produced by ANN is the only one that could be used to represent the real probability of default. From the perspective of risk control, estimating the probability of default is more meaningful than classifying clients into two groups, risky and non-risky. Therefore, artificial neural networks should be employed to score clients instead of other data mining techniques, such as logistic regression.
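To make "close to 0" and "close to 1" concrete, the short calculation below plugs the reported intercept and coefficient into the fitted line and measures how far it departs from the identity line over the probability range [0, 1]. The helper name `implied_real` is hypothetical; only the two coefficients come from the text.

```python
# Fitted line reported in the conclusion: real = A + B * predicted
A, B = 0.0145, 0.9971

def implied_real(predicted):
    """Real default probability implied by the reported regression line."""
    return A + B * predicted

# The gap A + (B - 1) * x is linear in x, so its absolute value on [0, 1]
# is largest at one of the two endpoints.
max_gap = max(abs(implied_real(x) - x) for x in (0.0, 1.0))
print(round(max_gap, 4))  # 0.0145
```

On this reading, the fitted line never departs from the predicted probability by more than about 1.5 percentage points. That describes the line only, not the scatter of individual clients around it.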