Predicting customer profitability during acquisition: Finding the optimal combination of data source and data mining technique
|Article code||Publication year||English article||Persian translation||Word count|
|22290||2013||6-page PDF||Order||Not calculated|
Publisher: Elsevier - ScienceDirect
Journal: Expert Systems with Applications, Volume 40, Issue 6, May 2013, Pages 2007–2012
The customer acquisition process is generally a stressful undertaking for sales representatives. Fortunately, models exist that help them select the ‘right’ leads to pursue. Two factors play a role in this selection: the probability that a lead converts into a customer, and the profitability of that lead once it has become a customer. This paper focuses on the latter and makes two main contributions to the existing literature. First, it investigates the predictive performance of two types of data: web data and commercially available data. The aim is to find out which of the two is the more accurate predictor of profitability, and whether combining them improves accuracy further. Second, it investigates the predictive performance of different data mining techniques. The results show that bagged decision trees consistently achieve the highest accuracy. Web data predicts profitability better than commercial data, and combining both performs better still. The added value of commercial data, although statistically significant, is fairly limited.
The acquisition of new customers is considered a multi-stage process in which only certain leads become real customers (Cooper & Budd, 2007; Patterson, 2007; Yu & Cai, 2007). This process is generally a stressful undertaking for sales representatives. Fortunately, models exist that help them select the ‘right’ leads to pursue. Two factors are important in selecting the ‘right’ lead: the probability that the lead will convert into a customer, and the profitability of that lead once he/she is a customer. This paper focuses on the latter. The goal is to design a model that predicts a dichotomous version of profitability (i.e., a customer either is or is not profitable). Profitability models exist, but their main bottleneck is a lack of quality data. A new data source is introduced to address this problem, and its performance is compared with that of a more traditional data source. Furthermore, we investigate how the choice of data mining technique affects the models estimated on both data sources, and we examine which combination provides the highest accuracy.

This paper investigates three techniques: logistic regression, decision trees and bagged decision trees. Logistic regression is a more basic data mining technique that is often used in research, whereas (bagged) decision trees are more advanced and less popular. The reason to consider different data mining techniques is twofold. First, according to Neslin, Gupta, Kamakura, Lu, and Mason (2006), the choice of data mining technique has an impact on the predictive performance of the resulting models. Employing different techniques is therefore a way to increase overall predictive performance by finding the optimal technique. Second, the data mining techniques are used as a proxy for data complexity and noisiness. Basic techniques are only capable of estimating simple, linear relations, while more advanced techniques are able to fit more complex, noisy data. If (bagged) decision trees do not perform better than logistic regression for a specific data source, we can conclude that this data source is most likely linear and noise-free in nature.

A quality model to predict profitability can only be constructed if quality data is available. Most models rely on commercial data purchased from specialized vendors (Rygielski et al., 2002; Wilson, 2006). A relatively new and underinvestigated source of input for customer profitability models is textual information extracted from websites. Web mining and text mining can be used to gather this information from existing and potential customers’ websites (Thorleuchter, Van den Poel, & Prinzie, 2012). However, textual information is seldom used as input for analyses in companies (Coussement & Van den Poel, 2009), because web data is unstructured and therefore hard to analyze. Nevertheless, latent indexing techniques can be used to give the data more structure and make it available as input for acquisition models (Thorleuchter et al., 2012).

This paper makes two main contributions to the existing literature. First, it investigates the predictive performance of two sources of data: web data and commercially available data. The aim is to find out which of the two is the more accurate predictor of profitability, and whether combining them improves accuracy further. Second, it investigates the predictive performance of different data mining techniques. The overall research question can therefore be formulated as follows: which technique is most accurate in combination with which data source? These two contributions also show how this paper differs from the one presented by Thorleuchter et al. (2012). It investigates and compares different data sources and data mining techniques instead of focusing only on web data modeled with logistic regression. In this way there is a clear benchmark (i.e., commercial data) against which web data can be compared. As a result, this paper can be seen as the first real test of using textual data extracted through web mining as input for profitability models. Furthermore, the results obtained in this paper are discussed in more detail.

The remainder of the paper is structured as follows. First, the web data and the commercially available data are discussed. Second, we go deeper into the different data mining techniques. Third, a short description of the data used is given. Fourth, the results are presented. Finally, we end with a conclusion and discussion, address the limitations of this paper and give suggestions for further research.
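To make the idea of bagging concrete, the following is a minimal sketch, not the authors' implementation. It assumes one-split decision stumps as base learners in place of full decision trees, a toy one-feature data set invented purely for illustration, and hypothetical helper names (`fit_stump`, `fit_bagged`, `bagged_score`). Each stump is trained on a bootstrap sample, and the ensemble score is the share of stumps that vote "profitable".

```python
import random


def fit_stump(rows):
    """Fit a one-split decision stump on (features, label) rows.

    Returns (feature_index, threshold, left_label, right_label), chosen to
    minimise the number of misclassified training rows.
    """
    best = None
    n_features = len(rows[0][0])
    for j in range(n_features):
        for threshold in sorted({x[j] for x, _ in rows}):
            left = [y for x, y in rows if x[j] <= threshold]
            right = [y for x, y in rows if x[j] > threshold]
            # Majority label on each side of the split (ties go to 0).
            left_label = int(sum(left) * 2 > len(left))
            right_label = int(sum(right) * 2 > len(right)) if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, j, threshold, left_label, right_label)
    return best[1:]


def stump_predict(stump, x):
    j, threshold, left_label, right_label = stump
    return left_label if x[j] <= threshold else right_label


def fit_bagged(rows, n_estimators=25, seed=0):
    """Train one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_estimators):
        sample = [rng.choice(rows) for _ in rows]
        stumps.append(fit_stump(sample))
    return stumps


def bagged_score(stumps, x):
    """Fraction of stumps that vote 'profitable' (label 1) for x."""
    return sum(stump_predict(s, x) for s in stumps) / len(stumps)


# Toy data (an assumption, not from the paper): one feature, and a lead is
# labelled profitable (1) when the feature value is at least 10.
rows = [((x,), int(x >= 10)) for x in range(20)]
stumps = fit_bagged(rows)
```

Averaging the votes of models trained on different bootstrap samples reduces the variance of a single tree, and the fraction returned by `bagged_score` can serve as a ranking score for leads.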
English conclusion
The goal of this paper was to investigate which data mining technique works best in predicting customer profitability in combination with which data source. The techniques under investigation were logistic regression, decision trees and bagged decision trees. Two types of data were used: data originating from web mining and data purchased from a specialized vendor. The web data is free and available to anyone with internet access. Regardless of data source, bagging of decision trees provided the highest AUC (except for commercial data, where logistic regression worked equally well). Web data had a higher predictive performance than commercial data, but the combination of both data types rendered the best results. This has the following managerial implications. Bagged decision trees should be preferred over logistic regression and single decision trees to build a model. Moreover, web data is the ideal starting input for this model. If the budget is available to buy external data, this can be combined with web data to further increase the predictive performance of the model. However, a cost-benefit analysis should be done to find out whether the high cost of buying data is justified by the (relatively) small increase in predictive power.
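The AUC used to compare the models can be read as the probability that a randomly chosen profitable customer receives a higher score than a randomly chosen unprofitable one. The sketch below computes it by comparing every positive-negative pair. The function name `auc` and the example labels and scores are illustrative assumptions, not results from the paper.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts, over all (positive, negative) pairs, how often the positive
    example has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical model scores for four leads (1 = profitable).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))  # 0.75: three of the four pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so models built on different data sources can be compared on the same scale.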