Distributed customer behavior prediction using multiplex data: a collaborative MK-SVM approach
|Article code|Year of publication|English article|Word count|
|20903|2012|9-page PDF|7459 words|
Publisher: Elsevier - ScienceDirect
Journal: Knowledge-Based Systems, Volume 35, November 2012, Pages 111–119
In the customer-centered marketplace, understanding customer behavior is a critical success factor. The large databases of an organization usually contain multiplex data, such as static, time series, symbolic sequential and textual data, which are stored separately in the databases of different departments. This poses a challenge to traditional centralized customer behavior prediction. In this study, a novel approach called collaborative multiple kernel support vector machine (C-MK-SVM) is developed for distributed customer behavior prediction using multiplex data. The alternating direction method of multipliers (ADMM) is used for the global optimization of the distributed sub-models in C-MK-SVM. Computational experiments on a practical retail dataset are reported. The results show that C-MK-SVM achieves better customer behavior prediction performance and higher computational speed than the support vector machine (SVM) and the multiple kernel support vector machine (MK-SVM).
Today, business is evolving from a product-centered to a customer-centered environment. An in-depth understanding of customer behavior is a critical success factor in building long-term, profitable relationships with specific customers in a globally competitive marketplace. Customer behavior prediction is therefore a crucial tool of analytical customer relationship management (CRM). Yan et al. proposed a framework of customer behavior prediction for customer retention and profit maximization in telecommunications. This framework consists of five components: churn prediction, churn reason prediction, offer acceptance prediction, revenue estimation, and collection risk estimation. Kim et al. proposed a methodology that identifies the propensity of a specific customer to buy a product, in order to enhance one-to-one marketing. Prinzie et al. used durable-goods acquisition sequences and duration information to propose a Next-Product-to-Buy model for cross-selling. In summary, customer behavior prediction mainly involves customer churn prediction and customer purchase prediction. Customer churn prediction is part of loyalty programs. It aims to identify the customers who are prone to switching at least some of their purchases away from a company, and to help companies improve their intervention strategies to convince these customers to stay. Customer purchase prediction aims to predict whether a customer is likely to purchase, or repeatedly purchase, a product, or to predict the product group from which the customer is likely to buy next. Customer purchase prediction is therefore regarded as an important basis for direct marketers to target personalized advertising and promotion activities at specific classes of customers. Data is becoming one of the top priorities for information services executives.
From the data mining perspective, customer behavior prediction can be regarded as a classification problem, which is one of the most common tasks in data mining. Data mining techniques such as artificial neural networks (ANN), support vector machines (SVM), Bayesian networks, and ensemble learning are widely used to predict customer behavior and provide potentially useful decision information. All of the above studies contribute to centralized customer behavior prediction. With the growth of database technologies, however, the limitations of traditional centralized customer behavior prediction become apparent. The databases within an organization that collect and store internal and external data are growing dramatically. Internal data are data generated by systems within the organization, such as customer demographic data, transactional data, product-based data, and customer review and complaint data. External data are data not generated by the organization's own systems, such as government census data, industry benchmark data, customer psychographic data, and economic data. Internal and external data come in multiple types. For example, customer demographic and psychographic data are static data; customer transactional data and economic data are multivariate time series data; product acquisition data are symbolic sequential data; and customer review and complaint data are textual data. In customer behavior prediction, time series data are usually transformed into static data through aggregation, and symbolic sequential and textual data are usually neglected. Moreover, multiplex data are usually stored separately in the databases of different departments. Traditional centralized customer behavior prediction mainly uses static data, and it faces the challenge of integrating multiple distributed data sources and multiple types of data to reach a combined prediction result.
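As a minimal sketch of the aggregation step described above, the following Python function collapses one customer's time series (for example, monthly spend) into a handful of static features. The feature set, the function name `aggregate_series`, and the sample data are illustrative assumptions, not the paper's preprocessing.

```python
from statistics import mean, pstdev


def aggregate_series(values):
    """Collapse one customer's time series into static features.

    Illustrative feature set (an assumption, not the paper's): mean, total,
    population standard deviation, last value, and least-squares trend slope.
    """
    if not values:
        raise ValueError("series must contain at least one observation")
    n = len(values)
    t_mean = (n - 1) / 2  # mean of the time indices 0..n-1
    v_mean = mean(values)
    # Ordinary least-squares slope of the value against the time index.
    denom = sum((t - t_mean) ** 2 for t in range(n))
    slope = 0.0 if denom == 0 else sum(
        (t - t_mean) * (v - v_mean) for t, v in enumerate(values)
    ) / denom
    return {
        "mean": v_mean,
        "total": sum(values),
        "std": pstdev(values),
        "last": values[-1],
        "slope": slope,
    }


# Hypothetical monthly spend for two customers.
history = {"c001": [120.0, 110.0, 95.0, 80.0], "c002": [40.0, 45.0, 50.0, 55.0]}
features = {cid: aggregate_series(s) for cid, s in history.items()}
```

Only the slope and the last value depend on the order of the observations; the mean, total, and standard deviation of a rising series and of the same values reversed are identical, which illustrates the kind of temporal information lost when time series are reduced to static data.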
With the emergence of advanced computing technologies such as sensor networks and cloud computing, much effort has been devoted to distributed learning in recent years. The support vector machine (SVM) is a state-of-the-art machine learning approach. The multiple kernel support vector machine (MK-SVM) is a popular topic in kernel methods. It aims to learn the optimal kernel function by optimizing a combination of multiple heterogeneous basic kernels, or of multiple basic kernels with different feature subsets or different hyperparameters. MK-SVM has been solved with optimization methods such as semi-infinite linear programming, quadratically constrained quadratic programming, gradient-based approaches, and block coordinate descent. MK-SVM is well suited to integrating multiplex data. Distributed SVM has been discussed extensively in recent years; however, little research has been done on distributed MK-SVM. To the best of our knowledge, this is the first paper on an improved MK-SVM for distributed customer behavior prediction using multiplex data. The alternating direction method of multipliers (ADMM) was first proposed by Glowinski and Marrocco in 1975 and by Gabay and Mercier in 1976. Boyd et al. provided a detailed discussion of ADMM, including its application to many convex optimization problems such as basis pursuit and the lasso. Moreover, Boyd et al. considered two ways of solving convex optimization problems in a distributed manner: splitting across examples and splitting across features. ADMM is thus a powerful algorithm for distributed convex optimization and has the potential to solve MK-SVM in a distributed way. In this study, a novel approach called collaborative multiple kernel support vector machine (C-MK-SVM) is developed for distributed customer behavior prediction using multiplex data.
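In a common MK-SVM formulation, the learned kernel is a weighted sum K = β1·K1 + β2·K2 + … with non-negative weights summing to one, where each basic kernel may act on a different data type. The minimal Python sketch below pairs a Gaussian RBF kernel on static features with a k-spectrum (k-gram count) kernel on symbolic purchase sequences. The choice of basic kernels, the function names, the toy customers, and the fixed weights are assumptions for illustration; an MK-SVM would learn the weights together with the SVM, for example with one of the optimization methods listed above, which this sketch does not do.

```python
import math
from collections import Counter


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian RBF kernel on numeric (static) feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def spectrum_kernel(s, t, k=2):
    """k-spectrum kernel: inner product of the k-gram counts of two symbol sequences."""
    cs = Counter(tuple(s[i:i + k]) for i in range(len(s) - k + 1))
    ct = Counter(tuple(t[i:i + k]) for i in range(len(t) - k + 1))
    return float(sum(cs[g] * ct[g] for g in cs))


def combined_kernel(a, b, weights):
    """Convex combination of basic kernels, each applied to its own data type.

    `a` and `b` are dicts holding a "static" vector and a "sequence" of symbols.
    The weights are fixed here (hypothetical); MK-SVM would learn them.
    """
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    basic = (
        rbf_kernel(a["static"], b["static"]),
        spectrum_kernel(a["sequence"], b["sequence"]),
    )
    return sum(w * k for w, k in zip(weights, basic))


def gram_matrix(samples, weights):
    """Pairwise combined-kernel matrix over all samples."""
    return [[combined_kernel(a, b, weights) for b in samples] for a in samples]


# Hypothetical customers: static features plus a product acquisition sequence.
customers = [
    {"static": [0.2, 1.0], "sequence": ["tv", "dvd", "tv", "dvd"]},
    {"static": [0.3, 0.8], "sequence": ["tv", "dvd", "game"]},
    {"static": [1.5, -0.4], "sequence": ["phone", "case"]},
]
K = gram_matrix(customers, weights=[0.6, 0.4])
```

Because a non-negative combination of valid kernels is itself a valid kernel, the resulting Gram matrix could be handed to any SVM solver that accepts precomputed kernels. The spectrum kernel is left unnormalized here, so its values can dominate the RBF term; basic kernels are often normalized before weighting.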
Compared with traditional customer behavior prediction, the major contributions of this study are as follows. First, a framework of distributed customer behavior prediction is proposed that integrates multiple distributed data sources and multiple types of data to improve prediction accuracy. Second, within this framework, a collaborative MK-SVM (C-MK-SVM) approach is developed that models multiple feature subsets and multiple sample subsets in a decomposition-coordination manner. In C-MK-SVM, ADMM is applied to the global optimization of the sub-models, each with multiple basic kernels, located in the local processors. The paper is organized as follows. Section 2 explains the fundamentals of SVM. Section 3 specifies the proposed C-MK-SVM method. Section 4 develops a framework of C-MK-SVM for distributed customer behavior prediction. Section 5 reports the computational experiments, and Section 6 presents the conclusions.
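The decomposition-coordination idea can be illustrated with consensus ADMM, splitting across examples as described by Boyd et al. The minimal Python sketch below is a deliberately simplified stand-in, not C-MK-SVM: it fits a ridge-regularized least-squares model instead of an SVM, so that each local update has a closed form. Each hypothetical local processor holds one block of samples (A_i, b_i) and solves its own subproblem; a coordinator averages the local estimates into the global variable z; and each processor then updates its scaled dual variable u_i. All function names, parameter values, and the toy data are assumptions.

```python
def solve(M, v):
    """Solve the small dense system M x = v (Gaussian elimination, partial pivoting)."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def local_update(A, b, z, u, rho):
    """Local step: argmin_x 0.5*||A x - b||^2 + (rho/2)*||x - z + u||^2.

    Closed form: (A^T A + rho*I) x = A^T b + rho*(z - u).
    """
    d = len(z)
    lhs = [[sum(row[i] * row[j] for row in A) + (rho if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    rhs = [sum(row[i] * bk for row, bk in zip(A, b)) + rho * (z[i] - u[i])
           for i in range(d)]
    return solve(lhs, rhs)


def consensus_admm(partitions, d, rho=1.0, lam=0.1, iters=500):
    """Consensus ADMM (splitting across examples) for
    minimize sum_i 0.5*||A_i x - b_i||^2 + (lam/2)*||x||^2.

    `partitions` holds one (A_i, b_i) block per hypothetical local processor.
    """
    n_proc = len(partitions)
    z = [0.0] * d
    us = [[0.0] * d for _ in range(n_proc)]
    for _ in range(iters):
        # 1. Each local processor solves its own subproblem (parallelizable).
        xs = [local_update(A, b, z, u, rho) for (A, b), u in zip(partitions, us)]
        # 2. The coordinator averages x_i + u_i and applies the ridge shrinkage.
        avg = [sum(x[j] + u[j] for x, u in zip(xs, us)) / n_proc for j in range(d)]
        z = [n_proc * rho * a / (lam + n_proc * rho) for a in avg]
        # 3. Each processor updates its scaled dual variable.
        us = [[u[j] + x[j] - z[j] for j in range(d)] for x, u in zip(xs, us)]
    return z


# Hypothetical toy data: rows are [1 (intercept), x], targets follow y = -1 + 2x,
# split across two local processors.
parts = [
    ([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]], [-1.0, 1.0, 3.0]),
    ([[1.0, -1.0], [1.0, 3.0]], [-3.0, 5.0]),
]
z = consensus_admm(parts, d=2, rho=1.0, lam=0.1)

# Centralized ridge solution on the pooled data, for comparison: with z = u = 0
# and rho = lam, the local step reduces to (A^T A + lam*I) w = A^T b.
A_all = [row for A, _ in parts for row in A]
b_all = [bk for _, b in parts for bk in b]
w_central = local_update(A_all, b_all, [0.0, 0.0], [0.0, 0.0], rho=0.1)
```

After enough iterations, z agrees with the centralized ridge solution computed from all samples at once, although no processor ever sees another processor's data; only local estimates and dual variables pass through the coordinator.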
Conclusion
Multiple kernel learning has drawn increasing attention in recent years. To the best of our knowledge, however, relatively little research concerns distributed multiple kernel learning. In this study, a framework of distributed customer behavior prediction using multiplex data is proposed to suit the emerging distributed computing environment. Within it, a novel approach called collaborative multiple kernel support vector machine (C-MK-SVM) is developed for modeling multiplex data in a distributed manner, and the alternating direction method of multipliers (ADMM) is used for the global optimization of C-MK-SVM. Computational experiments are conducted on a practical retail dataset. The experimental results show that: (1) C-MK-SVM obtains higher PCC, sensitivity and AUC, and requires far less computational time, than SVM and MK-SVM; (2) C-MK-SVM performs well on imbalanced data and is robust to noisy data; and (3) using time series and symbolic sequential variables improves the prediction performance of C-MK-SVM. Customer behavior prediction is a standard binary classification problem, so this study focuses only on binary C-MK-SVM. Although centralized multiclass MK-SVM has been studied, distributed multiclass MK-SVM is a new and interesting topic. Multiclass C-MK-SVM using ADMM, as an extension of the binary C-MK-SVM using ADMM proposed in this study, is left for future work.
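The three evaluation measures named in the results can be computed as in the minimal Python sketch below. It assumes that PCC stands for the percentage of correctly classified cases, as is common in the churn prediction literature, and computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one (ties count one half). The labels, scores, and 0.5 threshold in the example are hypothetical.

```python
def pcc(y_true, y_pred):
    """Percentage correctly classified (assumed meaning of PCC), i.e. accuracy in %."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


def sensitivity(y_true, y_pred, positive=1):
    """True positive rate: share of actual positives (e.g. churners) that are caught."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    pos = sum(t == positive for t in y_true)
    return tp / pos


def auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive outscores a random negative.

    Direct pairwise comparison, O(n_pos * n_neg); ties count one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = positive class) and model scores.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold
print(pcc(y_true, y_pred), sensitivity(y_true, y_pred), auc(y_true, scores))
```

On imbalanced data, PCC alone can be misleading: a model that labels every customer as a non-churner can reach a high PCC with a sensitivity of zero, which is why sensitivity and the threshold-free AUC are useful alongside it.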