Bayesian variable selection for binary response models and direct marketing forecasting
|Article code||Publication year||English article||Persian translation||Word count|
|23577||2010||7-page PDF||Available to order||6,500 words|
Publisher : Elsevier - Science Direct
Journal : Expert Systems with Applications, Volume 37, Issue 12, December 2010, Pages 7656–7662
Selecting good variables to build forecasting models is a major challenge for direct marketing given the increasing amount and variety of data. This study adopts Bayesian variable selection (BVS) with informative priors to select variables for binary response models and direct marketing forecasting. The variable sets produced by forward selection and by BVS are applied to logistic regression and Bayesian networks. Validation on a holdout dataset and on the entire dataset suggests that BVS improves the performance of the logistic regression model over the forward selection and full variable sets, while Bayesian networks achieve better results using the BVS set. Thus, Bayesian variable selection can help to select variables and build accurate models using innovative forecasting methods.
The key objective of direct marketing forecasting is to identify potential customers from an existing database so that marketers can design accurately targeted marketing to increase sales and profitability. Meanwhile, today's businesses are capable of generating and collecting a huge amount of customer and transactional data in a relatively short period. Data reduction, and more specifically variable selection, is a major challenge in database marketing (Rossi & Allenby, 2003). Traditional methods of stepwise variable selection do not consider the interrelations among variables and may not identify the best subset for model building. Researchers need a more efficient method of variable selection to build accurate forecasting models and to take advantage of innovative modelling methods that have become increasingly viable. Recently, the Bayesian method has been proposed as a semi-automatic method for variable selection that provides a feasible solution for exhaustive search (George, 2000). In comparison with conventional statistical methods, Bayesian variable selection (BVS) is more beneficial for forecasting methods that are adept at handling nonlinearity and interactions among variables. However, how to execute BVS efficiently remains a significant challenge. Moreover, whether the Bayesian approach to automatic variable selection can improve the accuracy of forecasting with real data warrants investigation (Rossi & Allenby, 2003).
This study proposes Bayesian variable selection using informative priors to select variables for building direct marketing forecasting models. To compute analytically tractable priors and posterior model probabilities, we adopt the efficient algorithms of Chen, Ibrahim, and Yiannoutsos (1999), which require Gibbs samples from a single model only. We first perform variable selection using both the forward selection method and BVS. Then, we test the effect of the selected subsets on the forecast accuracy of both logistic regression and Bayesian networks on a holdout dataset and on the entire dataset. The validation results suggest that BVS improves the forecast accuracy of logistic regression over the forward selection and the full variable sets. Bayesian networks, a model of the joint probability distribution, achieve better results with the BVS set. These findings have meaningful implications for selecting variables to build forecasting models and for direct marketing.
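To make the idea of posterior model probabilities over candidate variable subsets concrete, the following is a minimal sketch and not the algorithm of Chen, Ibrahim, and Yiannoutsos (1999). It swaps in two simplifying assumptions: the marginal likelihood of each logistic model is approximated by BIC rather than computed from Gibbs samples, and the informative prior is reduced to independent prior inclusion probabilities. The data are synthetic, and all names (`posterior_model_probs`, `prior_incl`, and so on) are hypothetical.

```python
import itertools
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def max_log_likelihood(rows, y, steps=200, lr=1.0):
    """Fit a logistic regression by gradient ascent; return its log-likelihood."""
    n, k = len(rows), len(rows[0])
    beta = [0.0] * k
    for _ in range(steps):
        grad = [0.0] * k
        for x, t in zip(rows, y):
            err = t - sigmoid(sum(b * v for b, v in zip(beta, x)))
            for j in range(k):
                grad[j] += err * x[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    loglik = 0.0
    for x, t in zip(rows, y):
        z = sum(b * v for b, v in zip(beta, x))
        # t*z - log(1 + exp(z)), written in an overflow-safe form
        loglik += t * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return loglik


def posterior_model_probs(X, y, prior_incl):
    """Enumerate all subsets; score each by BIC-approximated evidence plus log prior."""
    n, p = len(X), len(X[0])
    log_post = {}
    for mask in itertools.product([0, 1], repeat=p):
        # Every model keeps an intercept column.
        rows = [[1.0] + [x[j] for j in range(p) if mask[j]] for x in X]
        k = 1 + sum(mask)
        log_evidence = max_log_likelihood(rows, y) - 0.5 * k * math.log(n)
        log_prior = sum(math.log(q if m else 1.0 - q)
                        for q, m in zip(prior_incl, mask))
        log_post[mask] = log_evidence + log_prior
    top = max(log_post.values())
    weights = {m: math.exp(v - top) for m, v in log_post.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


def inclusion_probs(model_probs, p):
    """Posterior probability that each variable appears in the model."""
    return [sum(pr for m, pr in model_probs.items() if m[j]) for j in range(p)]


# Synthetic data (assumption): only the first of three variables drives response.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [1 if random.random() < sigmoid(-1.0 + 2.0 * x[0]) else 0 for x in X]

# Hypothetical prior inclusion probabilities, e.g. informed by an earlier study.
prior_incl = [0.7, 0.5, 0.3]
model_probs = posterior_model_probs(X, y, prior_incl)
inclusion = inclusion_probs(model_probs, 3)
print([round(v, 3) for v in inclusion])
```

Full enumeration is only practical for a handful of variables, since the model space doubles with each candidate. That is the motivation for sampling-based schemes such as the one the study adopts.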
English conclusion
7.1. Findings and implications
First, given the large sample size, forward and backward selection of variables makes little difference in the variables that are actually selected. Stepwise selection is computationally inefficient with a large dataset. Second, BVS, by relying on informative priors, results in a different set of variables. Third, using the top decile lift as the performance criterion for the forecasting models, BVS improves the performance of logistic regression over the forward selection method in the holdout validation, but the improvement is negligible when validation is performed on the entire dataset. Furthermore, Bayesian networks, a model of the joint distribution, achieve better predictive results with the BVS set by exploring the interactions among variables, and thus benefit more from BVS than logistic regression does. In other words, BVS can potentially supply a set of variables with less noise and give a better opportunity to identify the underlying data distribution. Overall, the results suggest that BVS using informative priors from customer purchase history provides a feasible solution for exhaustive search to select variables and build forecasting models.
Having more variables and data is often a mixed blessing and does not guarantee better forecasting models. Managers face an ever-growing need to reduce the number of variables effectively. Although researchers can rely on prior experience and exercise their judgment in trial-and-error selection processes, the increasing variety and number of variables make an automated variable selection solution more desirable. BVS provides an efficient and exhaustive method to select variables for subsequent model building. It allows researchers to use the insight from previous studies to build more accurate forecasting models and can potentially improve the performance of direct marketing operations.
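For readers unfamiliar with the performance criterion, top decile lift is commonly defined as the response rate among the 10% of customers with the highest model scores, divided by the overall response rate. The sketch below illustrates that common definition; the function name and the toy scores and responses are hypothetical, and the study may report lift on a different scale (for example, as an index multiplied by 100).

```python
def top_decile_lift(scores, responses):
    """Response rate in the top-scored 10% divided by the overall response rate."""
    n = len(scores)
    if n == 0 or n != len(responses):
        raise ValueError("scores and responses must be non-empty and equal in length")
    overall_rate = sum(responses) / n
    if overall_rate == 0:
        raise ValueError("lift is undefined when there are no responders")
    k = max(1, n // 10)  # size of the top decile
    ranked = sorted(zip(scores, responses), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(r for _, r in ranked[:k]) / k
    return top_rate / overall_rate


# Toy data (assumption): 20 customers, 4 responders, scores already descending.
scores = [(20 - i) / 21 for i in range(20)]
responses = [0] * 20
for i in (0, 1, 5, 12):
    responses[i] = 1

lift = top_decile_lift(scores, responses)
print(lift)  # both top-decile customers responded: 1.0 / 0.2, i.e. about 5
```

A lift of 1 means the model ranks customers no better than random selection, while larger values mean responders are concentrated among the highest-scored customers.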
An advanced method of variable selection also requires more sophisticated modelling methods that consider the interrelations among variables. Given the demand for useful information on a timely basis, methodological and technological advances should be undertaken to greatly reduce the marketing research cycle time (Malhotra, Peterson, & Kleiser, 1999). As the amount and variety of data collected by marketers continue to grow, the method advanced here provides an efficient tool for marketing managers to extract and update knowledge from the continuous data inflow in a timely fashion and to select better subsets for building forecasting models and assisting management decisions.
7.2. Limitations and suggestions
Due to space and time limitations, we compare the BVS method only with forward and backward selection in logistic regression. Other approaches to variable selection could offer more interesting comparisons among the competing methods. The results of the study are based on a direct marketing dataset, so the proposed method and its generalizability need to be tested on other types of data and problems. Although the results are very encouraging for applications of BVS in business forecasting, the validation was done only on one holdout dataset and on the entire dataset from the same period. Validating the model and the BVS set on future data would provide stronger evidence of the merit of the proposed method and of its applicability and potential benefits under real-life business scenarios. This approach is also very useful when researchers have new variables to examine while incorporating the effect of pre-existing variables from a previous study. It is possible to give historical data different weights. Moreover, variable selection can also be seen as a natural by-product of Bayesian network learning (Chen, Hao, & Ibrahim, 2000).
Furthermore, the BVS approach can also be applied to other marketing research problems that are based on large customer databases, such as predicting brand switching, churn behaviour, loan default and other issues related to forecasting sales or losses and managing customer relationships. These problems are similar to direct marketing forecasting in many ways, including large datasets, a small class of target customers, and perhaps budget constraints that require accurate forecasting models and targeted marketing actions. These large, noisy datasets and the great number and variety of variables make an automated or semi-automated process of variable selection an attractive alternative. Given its efficiency in computing conditional and marginal probabilities, Bayesian variable selection has the potential to provide an efficient solution for fully automatic variable selection that can help to improve forecasting accuracy and the performance of marketing and business operations.