Toward a successful CRM: variable selection, sampling, and ensemble
Paper code | Year of publication | Pages (English article) |
---|---|---|
865 | 2006 | 12-page PDF |
Publisher: Elsevier - ScienceDirect
Journal : Decision Support Systems, Volume 41, Issue 2, January 2006, Pages 542–553
English Abstract
This paper studies the effects of variable selection and class distribution on the performance of specific logit regression (i.e., a primitive classifier system) and artificial neural network (ANN; a relatively more sophisticated classifier system) implementations in a customer relationship management (CRM) setting. Finally, ensemble models are constructed by combining the predictions of multiple classifiers. The results show that ANN ensembles with variable selection have the most stable performance across various class distributions.
English Introduction
Advances in data warehousing and the vast amount of demographic, psychographic, and behavioral information available about customers provide marketing managers with a new marketing channel: database micromarketing. Traditional mass marketing channels (e.g., television and newspaper advertisements) have been very successful and remain important. However, customer relationship management (CRM) programs for macromarketing are slowly giving way to new CRM programs for micromarketing, because micromarketing lets firms direct a marketing message toward the specific group of households that is most likely to be open to the customized message. Both marketing researchers [7], [13] and data mining researchers [3], [10] have presented various database marketing approaches for successful CRM programs. The simplest example is the RFM (recency, frequency, monetary) approach, which targets households using knowledge of each customer's purchase history [21]. When targeting new households with no prior relationship, one can instead analyze the relationship between demographics and the response to a test mailing sent to a representative sample of households. Piatetsky-Shapiro and Masand [18] explicitly formulated the profitability condition of a campaign as a function of the model's lift, a uniform campaign cost per mailing, and the marginal revenue per identified positive record. Chou et al. [9] devised an effective model for identifying prospective insurance buyers when buyer versus nonbuyer information is not available. Gersten et al. [12] presented a model to select prospects in the automotive industry, where the buying decision takes a long time. A good summary of related studies on CRM programs from the marketing and data mining communities can be found in Kim and Street [15]. Traditionally, the optimal selection of customer targets has been considered one of the most important factors in a successful CRM program.
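The link between a model's lift and campaign profitability can be illustrated with a simplified expected-profit calculation. This is an illustrative sketch, not the formulation in [18]: it assumes a fixed cost per mailing, a fixed marginal revenue per responder, and defines lift as the response rate among mailed customers divided by the overall base rate. The function names `campaign_profit` and `is_profitable` and all numbers are hypothetical.

```python
def campaign_profit(n_customers, base_rate, depth, lift, cost_per_mail, revenue_per_positive):
    """Expected profit from mailing the top `depth` fraction of customers ranked by a model.

    Illustrative assumptions: a uniform cost per mailing, a uniform marginal revenue
    per responder, and lift = (response rate among mailed customers) / base_rate.
    """
    mailed = depth * n_customers
    expected_positives = lift * base_rate * mailed
    return revenue_per_positive * expected_positives - cost_per_mail * mailed


def is_profitable(base_rate, lift, cost_per_mail, revenue_per_positive):
    """Break-even check: profit per mailed customer is lift * base_rate * revenue - cost."""
    return lift * base_rate * revenue_per_positive > cost_per_mail


# Hypothetical numbers: 100,000 customers, 2% base response rate, top 10% mailed,
# cost 1.0 per mailing and revenue 30.0 per responder (arbitrary monetary units).
with_model = campaign_profit(100_000, 0.02, 0.10, 3.0, 1.0, 30.0)
random_mailing = campaign_profit(100_000, 0.02, 0.10, 1.0, 1.0, 30.0)
```

Under these made-up numbers, mailing the top 10% selected by a model with lift 3 yields an expected profit of about 8,000 units, whereas a random mailing (lift 1) loses about 4,000. Because cost and revenue are uniform, the break-even condition reduces to lift × base rate × revenue per responder > cost per mailing.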
Thus, many models have been proposed to identify as many customers as possible who will respond to a specific solicitation campaign letter or who will end their relationship with the firm. In particular, because annual churn rates are exceptionally high (20–40%), firms in the mobile telecommunications industry try to develop predictive models that accurately identify the customers most likely to churn. In addition to predictive accuracy, the comprehensibility of a model is another important issue in developing CRM programs. A rule-based system consisting of too many if–then statements makes it difficult for marketing managers to understand the key drivers of consumer behavior. Poor comprehensibility can greatly reduce managers' trust in the system and prevent decision makers from developing long-term CRM programs. The ultimate goal of this study is to provide practitioners with useful guidelines for building predictive models for effective CRM programs. In particular, this paper investigates the importance of variable selection and class distribution and how they affect the performance of predictive models. Specific implementations of two different learning algorithms are used: logit regression (a primitive model) and artificial neural networks (ANNs; a relatively more sophisticated classifier model). Both algorithms have been widely used to develop predictive models for CRM applications, and ANNs have also been used in many other marketing applications such as customer clustering [1] and market segmentation [14]. In this study, ANNs and logit regression are used to estimate each individual's likelihood of ending the current relationship after learning linear or possibly nonlinear relationships between the given input variables and the churner indicator. Variable selection is used to enhance the comprehensibility of the models.
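As a minimal sketch of how a logit model turns input variables into an individual churn probability, the code below fits P(churn = 1 | x) = sigmoid(w · x + b) by batch gradient descent on the mean log-loss. The paper does not describe its estimation procedure, so the optimizer, learning rate, epoch count, helper names, and toy data are all assumptions; an ANN would replace the single linear score with one or more hidden layers to capture nonlinear relationships.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logit(X, y, lr=0.1, epochs=2000):
    """Fit P(churn = 1 | x) = sigmoid(w . x + b) by batch gradient descent on the
    mean log-loss. X is a list of feature lists; y is a list of 0/1 churn labels."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def churn_probability(w, b, x):
    """Estimated probability that a customer with inputs x is a churner."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical toy data: one scaled input (e.g., dropped calls per month) and a churn flag.
X = [[0.0], [0.5], [1.0], [2.0], [2.5], [3.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logit(X, y)
p_low, p_high = churn_probability(w, b, [0.0]), churn_probability(w, b, [3.0])
```

On this toy data the fitted weight is positive, so customers with a higher value of the input receive a higher estimated churn probability.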
Variable selection is the process of choosing a subset of the original predictive variables by eliminating variables that are either redundant or carry little predictive information. By identifying the key determinants of customers' churning behavior, variable selection can not only enhance the comprehensibility of predictive models but also save a great deal of computational time and cost. However, eliminating many input variables may affect the predictive accuracy of models differently, depending on their representational power and structural complexity. Therefore, this study analyzes the relationship between variable selection and the accuracy of predictive models over various class distributions. Various sampling techniques are also used to shorten computational time and to improve the accuracy of a predictive model by removing noise records, that is, records that have identical values for the input variables but differ in the class indicator. Sampling is also used to vary the class distribution in order to study how class distribution relates to the predictive accuracy of classifiers. When the class distributions of the training and test data differ significantly, a model calibrated on the training data may not perform well on the test data. However, giving the training and test data the same class distribution does not necessarily make the calibrated model perform best on the test set [23]. It is therefore necessary to vary the class distribution of the training data to find the optimal class distribution for calibrating predictive models; in particular, this paper aims to identify the optimal class distributions for different types of classifiers. This paper also presents an ensemble approach that combines the predictions of multiple models: the estimated probabilities of being a churner from multiple models are combined with equal weights.
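The two data-handling steps in this paragraph, sampling to control the training class distribution and equal-weight ensembling, can be sketched in a few lines. The sketch rests on assumptions not stated in the source: each record is an (inputs, label) pair with hashable inputs and label 1 for churners; "noise records" is read as groups of records whose inputs are identical but whose labels conflict; and the target class distribution is reached by undersampling non-churners. The helper names are hypothetical.

```python
import random
from collections import defaultdict


def drop_conflicting(records):
    """Remove every record whose input tuple also appears with a different label.

    Assumption: this is one reading of 'noise records'; exact duplicates with the
    same label are kept.
    """
    labels = defaultdict(set)
    for x, y in records:
        labels[x].add(y)
    return [(x, y) for x, y in records if len(labels[x]) == 1]


def resample_to_rate(records, target_pos_rate, seed=0):
    """Keep all churners (label 1) and undersample non-churners (label 0) so that
    churners make up roughly `target_pos_rate` of the returned training sample.

    If there are too few non-churners, all of them are kept and the realized
    churner rate ends up above the target.
    """
    if not 0.0 < target_pos_rate < 1.0:
        raise ValueError("target_pos_rate must be strictly between 0 and 1")
    rng = random.Random(seed)
    pos = [r for r in records if r[1] == 1]
    neg = [r for r in records if r[1] == 0]
    n_neg = min(len(neg), round(len(pos) * (1 - target_pos_rate) / target_pos_rate))
    sample = pos + rng.sample(neg, n_neg)
    rng.shuffle(sample)
    return sample


def ensemble_probability(models, x):
    """Equal-weight average of the churn probabilities predicted by several models.

    Each model is any callable mapping an input to P(churn = 1 | x).
    """
    return sum(model(x) for model in models) / len(models)
```

For example, with 10 churners and 200 non-churners, a target churner rate of 0.2 keeps all 10 churners and samples 40 non-churners. Refitting at several target rates and scoring each model on a test set with the natural class distribution is one way to search for the best training distribution.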
Furthermore, the effects of variable selection, the structural complexity of models, and class distribution on the ensemble models are empirically estimated to provide decision makers and data analysts with useful guidelines for developing an accurate model for CRM programs. This paper is organized as follows. Section 2 briefly reviews variable subset selection, sampling, and ensemble decision making. Section 3 introduces the original data set and presents the reduced variable subset obtained by a simple variable selection algorithm. Section 4 presents the experimental results of both the ANN and logit models; in particular, it contrasts and highlights the relative importance of variable selection, sampling, and ensemble decision making for the two algorithms. Section 5 concludes the paper and suggests directions for future research.
English Conclusion
This paper studies the effects of variable selection and class distribution on the performance of individual and ensemble classifiers. Fig. 2 graphically shows the performance of the logit and ANN models reported in the previous three tables. As a general guideline, it is very important for data analysts to consider the relationship between the amount of information and the complexity of predictive models. When highly sophisticated predictive models are chosen for data analysis, compact information obtained via variable selection is highly recommended: sophisticated models with compact information show great improvement in predictive accuracy and robustness. However, when the predictive model is primitive, more input variables provide additional information that enhances the classifier's performance. Most of all, by combining multiple classifiers into an ensemble, marketing managers can obtain an accurate model. Note, however, that an ensemble model is computationally expensive compared with a single predictive model. Although the results presented in this study are promising, they may be suboptimal. One possible reason is that forward selection may produce suboptimal solutions because its variable selection process is irrevocable and it cannot consider interactions among variables. For example, consider a variable $f_{k+1}$ that is not very predictive alone and is therefore not chosen among the top $k$ variables. It is nevertheless possible that, when combined with the other $k-1$ selected variables, $f_{k+1}$ would contribute more to predictive power than $f_1$, the variable chosen at the first stage; that is, swapping $f_1$ for $f_{k+1}$ could yield a better subset. This paper does not analyze this possibility further, in order to focus on its more fundamental research goal: estimating the relative effects of various factors in developing successful CRM programs. In future work, various classifiers for CRM programs will be compared and analyzed to confirm and complement the findings in this paper.
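The irrevocability problem described above can be reproduced with a toy greedy forward selection. The scoring function is hypothetical: the stand-alone scores and the interaction bonus are made-up numbers chosen so that f3 is weak alone but strong in combination with f2, mirroring the weak-alone variable in the example above. In practice, `score` would be a validation metric of a model trained on the candidate subset.

```python
from itertools import combinations


def forward_selection(candidates, score, k):
    """Greedy forward selection: repeatedly add the variable that maximizes
    score(selected + [variable]). Chosen variables are never removed, which is
    the irrevocable property discussed above."""
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical stand-alone scores (made-up numbers).
INDIVIDUAL = {"f1": 0.6, "f2": 0.5, "f3": 0.1}


def toy_score(subset):
    """Hypothetical subset score: additive stand-alone scores plus an interaction bonus."""
    s = sum(INDIVIDUAL[f] for f in subset)
    if "f2" in subset and "f3" in subset:
        s += 0.6  # f3 is only useful together with f2
    return s


greedy = forward_selection(INDIVIDUAL, toy_score, k=2)
best_pair = max(combinations(INDIVIDUAL, 2), key=lambda pair: toy_score(list(pair)))
```

With k = 2, greedy selection commits to f1 first and ends with {f1, f2} (score 1.1), while the pair {f2, f3} scores 1.2. Exhaustive search over subsets, or stepwise methods that allow variables to be dropped later, would find the better pair at greater computational cost.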
Note that this paper investigates one particular instance each of a primitive classifier (logit regression) and a sophisticated classifier (ANNs); hence, its findings apply only to the instances examined. The structural and representational characteristics of classifiers can affect the relative importance of variable selection and class distribution. For example, a naive Bayes classifier, a Bayesian learning method, can be more sensitive to variable selection and class distribution than the classifiers analyzed in this paper, because its classification rules are based on the frequencies of data combinations within the training data. Rule-based systems such as decision tree algorithms can be very sensitive to the chosen variable subset, because different variable subsets place different combinations of variables at the root and descendant nodes of the tree, which affects the performance of tree classifiers. Once such experimental results are collected and analyzed, more practical guidelines for developing CRM programs with various learning algorithms can be presented.
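To illustrate why a frequency-based classifier such as naive Bayes can be sensitive to class distribution, the sketch below fits a categorical naive Bayes model purely from counts; its class priors are simply the training class frequencies. It is not part of the paper's experiments, and the Laplace smoothing parameter `alpha`, the assumption that every input takes `n_values` distinct values, the helper names, and the toy data are choices made for this example.

```python
import math
from collections import Counter, defaultdict


def fit_naive_bayes(records):
    """Count-based categorical naive Bayes. `records` is a list of (inputs, label)
    pairs, where inputs is a tuple of categorical values."""
    class_counts = Counter(y for _, y in records)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value frequencies
    for x, y in records:
        for j, v in enumerate(x):
            value_counts[(j, y)][v] += 1
    return class_counts, value_counts


def posterior_nb(model, x, alpha=1.0, n_values=2):
    """Posterior class probabilities for input x, with Laplace smoothing `alpha`.
    Assumes every feature takes `n_values` distinct values (binary by default)."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    log_scores = {}
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # class prior = training class frequency
        for j, v in enumerate(x):
            score += math.log((value_counts[(j, c)][v] + alpha) / (n_c + alpha * n_values))
        log_scores[c] = score
    top = max(log_scores.values())  # subtract the max before exponentiating for stability
    z = sum(math.exp(s - top) for s in log_scores.values())
    return {c: math.exp(s - top) / z for c, s in log_scores.items()}


# Hypothetical training sets with the same per-class value frequencies but
# different class distributions (50% vs. 10% churners).
balanced = [((1,), 1)] * 8 + [((0,), 1)] * 2 + [((1,), 0)] * 2 + [((0,), 0)] * 8
skewed = [((1,), 1)] * 8 + [((0,), 1)] * 2 + [((1,), 0)] * 18 + [((0,), 0)] * 72
p_balanced = posterior_nb(fit_naive_bayes(balanced), (1,))[1]
p_skewed = posterior_nb(fit_naive_bayes(skewed), (1,))[1]
```

For the same input, the estimated churn probability is 0.75 when trained on the balanced sample but roughly 0.29 when non-churners outnumber churners 9 to 1, because the priors come directly from training frequencies.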