An intelligent system for customer targeting: a data mining approach
| Article code | Publication year | English article length |
|---|---|---|
| 5504 | 2004 | 14-page PDF |
Publisher: Elsevier - Science Direct (ScienceDirect)
Journal: Decision Support Systems, Volume 37, Issue 2, May 2004, Pages 215–228
English Abstract
We propose a data mining approach for market managers that uses artificial neural networks (ANNs) guided by genetic algorithms (GAs). Our predictive model allows the selection of an optimal target point where the expected profit from direct mailing is maximized. Our approach also produces models that are easier to interpret because they use a smaller number of predictive features. Through sensitivity analysis, we also show that our chosen model significantly outperforms the baseline algorithms in terms of hit rate and expected net profit at key target points.
English Introduction
The ultimate goal of decision support systems is to provide managers with information that is useful for understanding various managerial aspects of a problem and for choosing the best solution among many alternatives. In this paper, we focus on a very specific decision support system on behalf of market managers who want to develop and implement efficient marketing programs by fully utilizing a customer database. This is important because, due to the growing interest in micro-marketing, many firms devote considerable resources to identifying households that may be open to targeted marketing messages. This becomes even more critical with the easy availability of data warehouses combining demographic, psychographic and behavioral information. Both the marketing [8], [19] and [33] and data-mining communities [4], [32], [27] and [13] have presented various database-based approaches for direct marketing. A good review of how data mining can be integrated into knowledge-based marketing can be found in [41].

Traditionally, the optimal selection of mailing targets has been considered one of the most important factors for direct marketing to be successful. Thus, many models aim to identify as many customers as possible who will respond to a specific solicitation campaign letter, based on each customer's estimated probability of responding to the marketing program. This problem becomes more complicated when the interpretability of the model is important. For example, in database marketing applications, it is critical for managers to understand the key drivers of consumer response. A predictive model that is essentially a "black box" is not useful for developing comprehensive marketing strategies. At the same time, a rule-based system that consists of too many if-then statements can make it difficult for users to identify the key drivers. Note that the two principal goals, model interpretability and predictive accuracy, can be in conflict.

Another important but often neglected aspect of such models is the decision support function that helps market managers make strategic marketing plans. For example, market managers want to know how many customers should be targeted to maximize the expected net profit, or to increase market share while at least recovering the operational costs of a specific campaign. In order to attain this goal, market managers need a sensitivity analysis that shows how the value of the objective function (e.g., the expected net profit from the campaign) changes as campaign parameters vary (e.g., the campaign scope measured by the number of customers targeted).

In this paper, we propose a data-mining approach to building predictive models that satisfies these requirements efficiently and effectively. First, we show how to build predictive models that combine artificial neural networks (ANNs) [37] with genetic algorithms (GAs) [18] to help market managers identify prospective households. ANNs have been used in other marketing applications such as customer clustering [16] and [1] and market segmentation [21] and [2]. We use ANNs to identify optimal campaign targets based on each individual's likelihood of responding positively to the campaign message. This is done by learning linear or possibly nonlinear relationships between the given input variables and the response indicator. We go one step beyond this traditional approach.
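As a rough illustration of the sensitivity analysis described above (not part of the paper's own implementation), the expected net profit at a given target point can be sketched as a function of the campaign scope. The function name and the revenue and cost figures below are placeholder assumptions.

```python
# Minimal sketch: expected net profit when mailing the top `depth` fraction of
# customers ranked by a model's estimated response probability. The parameter
# values are illustrative assumptions, not figures from the paper.
import numpy as np

def expected_net_profit(scores, responded, depth, marginal_revenue=50.0, mail_cost=0.75):
    """scores: estimated response probabilities (n,), responded: 0/1 outcomes (n,),
    depth: target point in (0, 1], e.g. 0.2 mails the best 20% of prospects."""
    n_mailed = int(np.ceil(depth * len(scores)))
    top = np.argsort(-scores)[:n_mailed]      # best prospects first
    hits = responded[top].sum()               # actual responders reached
    return hits * marginal_revenue - n_mailed * mail_cost

# Sweeping over target points shows how profit changes with campaign scope:
# depths = np.linspace(0.05, 1.0, 20)
# profits = [expected_net_profit(scores, responded, d) for d in depths]
```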
Because we are also interested in isolating the key determinants of customer response, we select different subsets of variables using GAs and use only those selected variables to train different ANNs. GAs have become a very powerful tool in finance, economics, accounting, operations research, and other fields as an alternative to hill-climbing search algorithms. This is mainly because such heuristic algorithms can converge to a local optimum, while GAs are more likely to avoid local optima by evaluating multiple solutions simultaneously and adjusting their search bias toward more promising areas. Further, GAs have been shown to outperform other search algorithms on data sets with high dimensionality [28].

Second, we demonstrate through a sensitivity analysis that our approach can be used to determine the scope of a marketing campaign given the marginal revenue per customer and the marginal cost per campaign mailing. This can be a very useful tool for market managers who want to assess the impact of various factors, such as mailing cost and a limited campaign budget, on the outcomes of a marketing campaign.

Finally, we enhance the interpretability of our model by reducing the dimensionality of the data sets. Traditionally, feature extraction algorithms such as principal component analysis (PCA) have often been used for this purpose. However, PCA is not appropriate when the ultimate goal is not only to reduce the dimensionality, but also to obtain highly accurate predictive models. This is because PCA does not take into account the relationship between the dependent variable and the input variables in the process of data reduction. Further, the resulting principal components can be difficult to interpret when the space of input variables is huge. In our approach, data reduction is performed via feature selection. Feature selection is the process of choosing a subset of the original predictive variables by eliminating features that are either redundant or possess little predictive information. If we extract as much information as possible from a given data set while using the smallest number of features, we can not only save a great amount of computing time and cost, but also build a model that generalizes better to households not in the test mailing. Feature selection can also significantly improve the comprehensibility of the resulting classifier models. Even a complicated model, such as a neural network, can be more easily understood if it is constructed from only a few variables.

Our methodology exploits the desirable characteristics of GAs and ANNs to achieve the two principal goals of household targeting at a specific target point: model interpretability and predictive accuracy. A standard GA is used to search through the possible combinations of features. The input features selected by the GA are used to train ANNs. The trained ANN is tested on an evaluation set, and a proposed model is evaluated in terms of two quality measurements: cumulative hit rate (which is maximized) and complexity (which is minimized). We define the cumulative hit rate as the ratio of the number of actual customers identified to the total number of actual customers in a data set. This process is repeated many times as the algorithm searches for a desirable balance between predictive accuracy and model complexity. The result is a highly accurate predictive model that uses only a subset of the original features, thus simplifying the model and reducing the risk of overfitting.
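The following is a simplified sketch of this kind of GA wrapper around an ANN, assuming a bit-string encoding of feature subsets, scikit-learn's MLPClassifier standing in for the ANN, and a fitness that rewards cumulative hit rate while penalizing the number of selected features. The population size, mutation rate, penalty weight, and other settings are illustrative assumptions, not the paper's configuration.

```python
# Sketch of GA-driven feature selection with an ANN evaluator (illustrative only).
import numpy as np
from sklearn.neural_network import MLPClassifier

def cumulative_hit_rate(scores, y_true, depth):
    """Fraction of all actual responders captured in the top `depth` of the ranking."""
    n = int(np.ceil(depth * len(scores)))
    top = np.argsort(-scores)[:n]
    return y_true[top].sum() / max(y_true.sum(), 1)

def fitness(mask, X_tr, y_tr, X_ev, y_ev, depth=0.2, penalty=0.01):
    """Train an ANN on the selected features and score it at one target point."""
    if mask.sum() == 0:
        return 0.0
    ann = MLPClassifier(hidden_layer_sizes=(8,), max_iter=300, random_state=0)
    ann.fit(X_tr[:, mask == 1], y_tr)
    scores = ann.predict_proba(X_ev[:, mask == 1])[:, 1]
    # reward accuracy, penalize complexity (number of selected features)
    return cumulative_hit_rate(scores, y_ev, depth) - penalty * mask.sum()

def ga_feature_selection(X_tr, y_tr, X_ev, y_ev, n_gen=20, pop_size=30, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    n_feat = X_tr.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n_feat))            # random bit strings
    for _ in range(n_gen):
        fit = np.array([fitness(ind, X_tr, y_tr, X_ev, y_ev) for ind in pop])
        parents = pop[np.argsort(-fit)[: pop_size // 2]]          # keep the best half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_feat)                         # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n_feat) < 0.02                      # bit-flip mutation
            child[flip] = 1 - child[flip]
            children.append(child)
        pop = np.vstack([parents, children])
    fit = np.array([fitness(ind, X_tr, y_tr, X_ev, y_ev) for ind in pop])
    return pop[np.argmax(fit)]                                    # best feature mask
```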
It also provides useful information for reducing future data collection costs. In order to help market managers determine the campaign scope, we run the GA/ANN model repeatedly over different target points to obtain local solutions. A local solution is a predictive feature subset with the highest fitness value at a specific target point. At a target point i, where 0 ≤ i ≤ 100, our GA/ANN model searches for a model that is optimal when the best i% of customers in a new data set is targeted, based on the estimated probability of responding to the marketing campaign. Once we obtain the local solutions, we combine them into an Ensemble, a global solution that is used to choose the best target point. Note that our Ensemble model is different from popular ensemble algorithms such as Bagging [7] and Boosting [15], which combine the predictions of multiple models by voting. Each local solution in our Ensemble model scores and selects prospects at a specific target point independently of the other local solutions. Finally, in order to present the performance of the local solutions and the Ensemble, we use a lift curve that shows the relationship between target points and the corresponding cumulative hit rates.

This paper is organized as follows. In Section 2, we explain GAs for feature selection in detail and motivate the use of a GA to search for the global optimum. In Section 3, we describe the structure of the GA/ANN model and review the feature subset selection procedure. In Section 4, we present experimental results of both the GA/ANN model and a single ANN with the complete set of features. In particular, a global solution is constructed by incorporating the local solutions obtained over various target points. We show that such a model can be used to help market managers determine the best target point where the expected profit is maximized. In Section 5, we review related work on direct marketing from both the marketing and data-mining communities. Section 6 concludes the paper and provides suggestions for future research directions.
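To make the Ensemble of local solutions and the lift curve described above concrete, the following is a hedged sketch, assuming `local_models` maps each target point to the scoring function produced by the GA/ANN search at that point. The function names and profit parameters are illustrative assumptions, not the paper's interface.

```python
# Sketch: summarize local solutions as a lift curve and pick the profit-maximizing
# target point (illustrative only).
import numpy as np

def lift_curve(local_models, X_ev, y_ev):
    """Cumulative hit rate of each local solution at its own target point."""
    curve = {}
    for depth, score_fn in sorted(local_models.items()):
        scores = score_fn(X_ev)
        n = int(np.ceil(depth * len(scores)))
        top = np.argsort(-scores)[:n]
        curve[depth] = y_ev[top].sum() / max(y_ev.sum(), 1)
    return curve

def best_target_point(curve, n_customers, n_responders, marginal_revenue=50.0, mail_cost=0.75):
    """Choose the target point where expected net profit is maximized."""
    profits = {
        d: hit_rate * n_responders * marginal_revenue - d * n_customers * mail_cost
        for d, hit_rate in curve.items()
    }
    return max(profits, key=profits.get)
```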
English Conclusion
In this paper, we presented a novel approach to customer targeting in database marketing. We used a genetic algorithm to search through possible combinations of features and an artificial neural network to score customers. One of the clear strengths of the GA/ANN approach is its ability to construct predictive models that reflect the direct marketer's decision process. In particular, given information on campaign costs and the profit per additional actual customer, we showed that our system not only maximizes the hit rate at a fixed target point but also selects a "best" target point where the expected profit from direct mailing is maximized. Further, our models are made easier to interpret by using a smaller number of features.

In future work, we will look into a customer targeting model that allows the marginal revenue to differ across prospects. The data set we analyzed in this paper does not include critical information such as the monetary value that each customer spent to purchase a caravan insurance policy. It is also reasonable to assume that there is relatively little difference in monetary value among the insurance options available to customers. However, consider the case in which a non-profit organization sends solicitation letters to possible donors for charity. For this organization, maximizing the total amount of donated money is more important than identifying as many donors as possible, because, in an extreme case, a single donor can donate more money than all the other donors combined. In this case, value-based customer targeting becomes critical. By treating the monetary value raised from targeted customers as an objective, our GA/ANN model will be able to find an optimal solution with maximized monetary value. Related to this direction of research, it would be interesting to see whether a bi-level model with two separate procedures, one for estimating donation probability and the other for estimating donation amounts, can do better. It has been claimed that a single classifier model that learns two parameters is likely to make more errors in learning decision rules than a bi-level model [43].

Another research direction is to investigate whether or not the chosen feature subsets are related to the target points. Our experimental results in Section 4.2.1 show that different feature subsets are chosen at different target points, except for a few common features. We expected a good predictive subset of features to appear in most of the local solutions. We suspect that a certain subset of features can discriminate buyers from non-buyers well at one target point, but not as well as other features at different points. This could also happen because of strong correlation among the insurance-related features. However, further investigation is needed to support this speculation.
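A minimal sketch of the bi-level, value-based scoring idea mentioned above, assuming scikit-learn-style estimators: one model estimates the probability of a response (donation), a second estimates the expected amount given a response, and prospects are ranked by the product of the two. The function name and the particular model choices are illustrative assumptions, not the procedure of [43].

```python
# Sketch: expected value per prospect = P(respond) * E[amount | respond] (illustrative).
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingRegressor

def bilevel_scores(X_train, responded, amounts, X_new):
    clf = LogisticRegression(max_iter=1000).fit(X_train, responded)
    reg = GradientBoostingRegressor().fit(X_train[responded == 1], amounts[responded == 1])
    p_respond = clf.predict_proba(X_new)[:, 1]     # estimated response probability
    exp_amount = reg.predict(X_new)                # estimated amount given a response
    return p_respond * exp_amount                  # expected monetary value per prospect
```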