Persian translation of the article title

Credit card churn prediction using logistic regression and decision tree

English title

Credit card churn forecasting by logistic regression and decision tree

Article code	Publication year	Page count of the English article
1414	2011	13 pages (PDF)

Source

Publisher : Elsevier - Science Direct

Journal : Expert Systems with Applications, Volume 38, Issue 12, November–December 2011, Pages 15273–15285

Translated keywords

Credit card - Customer churn - Logistic regression - Decision tree - Data mining

English keywords

Credit card, Customer churn, Logistic regression, Decision tree, Data mining


Article preview

English abstract

In this paper, two data mining algorithms are applied to build a churn prediction model using credit card data collected from a real Chinese bank. The contributions of four variable categories are examined: customer information, card information, risk information, and transaction activity information. The paper analyzes the process of handling variables when the data are obtained from a database rather than a survey. Instead of entering all 135 variables into the model directly, it selects variables on the basis of both correlation and economic sense. In addition to the accuracy of the analytic results, the paper designs a misclassification cost measure that takes the two types of error and their economic meaning into account, which is better suited to evaluating a credit card churn prediction model. The algorithms used in this study are logistic regression and decision tree, both of which are mature and powerful classification algorithms. The test results show that logistic regression performs slightly better than the decision tree.
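The abstract does not give the paper's cost formula, so the following is only a minimal sketch of the general idea, assuming the misclassification cost is a weighted sum of the two error types. The function names and the cost weights (a missed churner costing five times a false alarm) are hypothetical illustration values, not figures from the paper.

```python
def accuracy(y_true, y_pred):
    """Share of cases where the prediction matches the label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def misclassification_cost(y_true, y_pred, cost_fn=5.0, cost_fp=1.0):
    """Weighted sum of the two error types (1 = churner, 0 = non-churner).

    cost_fn: assumed cost of missing a real churner (false negative).
    cost_fp: assumed cost of flagging a loyal customer (false positive).
    Both weights are illustrative assumptions.
    """
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return cost_fn * fn + cost_fp * fp


# Ten customers, three of whom actually churn.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
pred_a = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # misses two churners
pred_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # raises two false alarms

for name, pred in (("A", pred_a), ("B", pred_b)):
    print(name, accuracy(y_true, pred), misclassification_cost(y_true, pred))
```

Under these assumed weights both hypothetical models have the same accuracy, yet model A is five times as costly as model B, which illustrates why accuracy alone can be a misleading criterion for churn models.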

English introduction

Data mining refers to discovering knowledge from a large amount of data. In this paper, we discuss the application of data mining, including logistic regression and decision tree, to predict the churn of credit card users. Banks can then take corresponding actions to retain customers according to the suggestions of the models. With today's cost-cutting and intense competitive pressure, more companies are starting to focus on Customer Relationship Management (CRM). The unknown future behavior of customers is very important to CRM. Hence, it is crucial to detect customers' future decisions so that the company can take corresponding actions early (Glady, Baesens, & Croux, 2008). Customers who stop using the company's products are usually called churners. Finding the churners can help companies retain their customers. Gustafsson, Johnson, and Roos (2005) studied telecommunication services to examine the effects of customer satisfaction and behavior on customer retention. Their results indicated a need for CRM managers to determine customer satisfaction more accurately in order to reduce customer churn. One of the major reasons for this is that it costs less to retain existing customers than to acquire new ones (Roberts, 2000). It costs up to five times as much to make a sale to a new customer as it does to make an additional sale to an existing customer (Dixon, 1999; Floyd, 2000; Slater & Narver, 2000). It is also becoming more evident that the only way to remain a leader in this industry is not only to be customer-driven but also to focus on building long-term relationships. Due to the development of information technology, many companies have accumulated a large amount of data. Analyzing these data can help managers make the right marketing decisions and pinpoint the right customers to market to. Because of the large amount of accumulated data and the serious churn among credit card holders, the credit card business is a very good field in which to predict churn.
Several studies have demonstrated the value of customer retention. A bank is able to increase its profits by 85% due to a 5% improvement in the retention rate (Reichheld & Sasser, 1990). Van den Poel and Larivière (2004) calculated the financial impact of a one percent increase in the customer retention rate. The predictive power of a churn model can also persist for a relatively long time. Neslin, Gupta, Kamakura, Lu, and Mason (2006) found that churn models typically still perform very well when used to predict churn for a database compiled 3 months after the calibration data. As the economy develops in China, a large number of credit cards have been issued. As of the third quarter of 2008, 132 million cards had been issued in China. However, many of the card holders are not active (they are also called churn holders). With increasing competition among banks, customers are able to choose among multiple service providers and can easily exercise their right to switch from one service provider to another. If banks can predict future behavior before customers close their accounts or stop using the card to pay, they can market to retain these customers. The main purpose of this paper is not to provide a new data mining algorithm, but to focus on the application of churn prediction and to provide a framework for understanding the hidden patterns of card holders using the data of Chinese banks. From data preparation to useful knowledge, the goal is the application of churn prediction. In this paper, we introduce a way to carry out churn prediction that takes profit into account. The rest of the paper is organized as follows. The definition of churn and a summary of the algorithms and criteria are introduced in Section 2. The data used in the research are described in Section 3, and the modeling processes based on logistic regression and decision tree are presented in Sections 4 and 5, respectively. Section 6 concludes.

English conclusion

In this research, we have proposed a process for credit card churn prediction in China's banking industry. The purpose of this research is not to propose a new algorithm, but to focus on the execution and understanding of the model. A suitable design of derived variables and a systematic way of building a model can help in executing the rules. The two types of error alone do not adequately reflect the fit of the model. If evaluation relies only on accuracy, the result may mislead model users in their choice of model. In our research, we have developed a new evaluation criterion called misclassification cost, which brings the economic cost into the evaluation of the model. The empirical results of the case study in the paper have shown that the cost coefficient is an effective measure of the model's performance.
We have designed 135 variables to summarize the behaviors and choices of credit card users. After considering multicollinearity, 95 variables were chosen to build the model. They fall into the categories of customer personal information, basic card information, risk information, and transaction information. The best model in this paper (model 6), which is based on logistic regression, contains two customer personal information variables, four basic card information variables, three risk information variables, and six transaction information variables. The selected variables show that demographic information makes little contribution to churn prediction, whereas the card information and transaction information, which relate to behavior, work very well in the model.
The decision tree algorithm has also been used to build models. The test results have shown that logistic regression performs better than the decision tree. In existing research, multicollinearity has not been discussed in decision tree applications. In this paper, one additional model using all 135 variables was also tried. The results show that the variables without multicollinearity work better. However, decision tree-based models can provide rules in a form that is easy to understand, and these rules can guide banks in making marketing strategies. The decision tree rules have shown that customer behavior better reflects future customer decisions. Even if it is impossible to access customers' personal information, it is acceptable to build a model based only on the users' transaction data.
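The conclusion does not say how multicollinearity was screened, so the sketch below is only one common possibility: drop any variable whose absolute Pearson correlation with an already kept variable reaches a threshold. The function names, the 0.9 threshold, and the three example columns are all hypothetical and are not taken from the paper's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def drop_collinear(columns, threshold=0.9):
    """Greedy screen: keep a variable only if it is not strongly
    correlated with any variable kept before it (threshold is assumed)."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical derived variables for five card holders.
columns = {
    "monthly_spend": [100, 200, 300, 400, 500],
    "annual_spend": [1200, 2400, 3600, 4800, 6000],  # 12 x monthly_spend
    "card_age_months": [30, 5, 18, 44, 12],
}

print(drop_collinear(columns))
```

Here annual_spend is an exact multiple of monthly_spend, so the screen removes it and keeps the other two variables. A greedy pairwise screen depends on column order and does not catch collinearity among three or more variables; variance inflation factors are a common alternative when that matters.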