A dynamic understanding of customer behavior processes based on clustering and sequence mining
Paper code | Publication year | Length of English paper |
---|---|---|
20910 | 2014 | 10 pages (PDF) |
Publisher: Elsevier - Science Direct
Journal: Expert Systems with Applications, Volume 41, Issue 10, August 2014, Pages 4648–4657
Abstract
In this paper, a novel approach towards enabling the exploratory understanding of the dynamics inherent in the capture of customers’ data at different points in time is outlined. The proposed methodology combines state-of-the-art data mining clustering techniques with a tuned sequence mining method to discover prominent customer behavior trajectories in databases, which, when combined, represent the “behavior process” as it is followed by particular groups of customers. The framework is applied to a real-life case of an event organizer; it is shown how behavior trajectories can help to explain consumer decisions and to improve business processes that are influenced by customer actions.
Introduction
Various data mining techniques have proven to be a valuable approach in the quest for knowledge discovery in data from an exploratory point of view. Clustering techniques, for instance, combined with strong visualization techniques, allow analysts to gain fast insights into the data they are confronted with. For these reasons, techniques such as k-means clustering and self-organizing maps have been widely and successfully applied in practice and extensively discussed in the literature (Kohonen, 2001). When executed at one specific moment in time, however, as often happens, these techniques offer a static picture describing the composition of the data set at hand based on certain patterns derived from the attributes characterizing the instances in this data set (see e.g. Zorrilla and Garcia-Saiz, 2013; Li et al., 2011; Carlei et al., 2012). It would, however, be of great interest for the analyst to be able to understand the dynamics associated with the items represented in the database, hence recording a “movie” of the data set instead of static pictures at specific points in time. This concept is denoted “trajectory” or “customer behavior trajectory” in the remainder of this paper. By describing an object using different attributes, it is possible to obtain a state which describes this object. When repeating this description at different points in time, a sequence of states, or trajectory, is obtained and can be analyzed. In a case where the object of interest is a customer, different attributes linked to her behavior can be captured at a specific point in time and will provide a description of the state of this customer, also called customer behavior. By repeating this description, a customer behavior trajectory is obtained.
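The notion of a trajectory as a time-ordered sequence of states can be illustrated with a minimal sketch. The function name `build_trajectories`, the snapshot layout `(customer_id, period, state)` and the sample attributes are assumptions made for illustration only; they do not come from the paper.

```python
from collections import defaultdict

def build_trajectories(snapshots):
    """Group (customer_id, period, state) snapshots into time-ordered state sequences."""
    by_customer = defaultdict(list)
    for customer_id, period, state in snapshots:
        by_customer[customer_id].append((period, state))
    # Sort each customer's observations by period and keep only the states.
    return {
        customer_id: [state for _, state in sorted(obs, key=lambda o: o[0])]
        for customer_id, obs in by_customer.items()
    }

# Hypothetical data: each state is (tickets_bought, average_spend) in one period.
snapshots = [
    ("c1", 2, (3, 40.0)),
    ("c1", 1, (1, 25.0)),
    ("c2", 1, (0, 0.0)),
    ("c2", 2, (2, 30.0)),
    ("c1", 3, (5, 42.5)),
]
trajectories = build_trajectories(snapshots)
```

Each value in `trajectories` is one customer behavior trajectory in the sense used above: the same attributes described repeatedly, ordered in time.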
In this paper, which is a journal extension of Seret, vanden Broucke, Baesens, and Vanthienen (in press), an approach enabling the exploratory understanding of such dynamics inherent in the capture of customers’ data at different points in time is proposed. The contribution of this paper is twofold. First, a general methodology is proposed that offers a comprehensible way to analyze movements in high-dimensional spaces using unsupervised methods and visualization. Although multiple researchers are working on dynamic clustering or trajectory mining, few of them focus on the comprehensibility of the results for practitioners, which is one of the main research motivations of this work. Broadly summarized, our novel approach is based on a two-step clustering approach, incorporating both self-organizing maps and k-means, that generates coordinate sequences used as input for a sequence mining technique. The proposed methodology combines these methods to discover prominent customer behavior trajectories in databases, which together help analysts to understand the behavior process as it is followed by particular groups of customers. Second, the methodology is applied in order to answer a complex business question in a real-life ticketing context. From a business perspective, understanding the dynamics of customer behaviors is a logical next step for companies applying segmentation techniques to understand their customers since, by definition, customers may not stay indefinitely in the same segments. Capturing these movements then becomes a crucial objective which can only be achieved if comprehensible techniques are proposed. With this in mind, the step-wise visual approach proposed in this work aims not only at identifying the movements but also at reporting them in a way comprehensible for end-users. These different considerations show the relevance of this work for both researchers and practitioners.
Moreover, thanks to the general methodology proposed in Section 3.4, the experiments of Section 4 can be easily repeated in other contexts. The remainder of the paper is structured as follows. In Section 2, an overview of related work is provided. Next, in Section 3, the different techniques and approaches used in the remainder of the paper are introduced from a theoretical perspective. In Section 4, an application using real-life data from the concert industry is proposed and illustrates how the different concepts and techniques can be combined in order to answer advanced business questions. Section 5 concludes the paper.
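As a rough illustration of how a two-step clustering can turn a trajectory of states into a coordinate sequence, the sketch below assumes that a self-organizing map has already been trained and that k-means has already grouped its units into segments. The codebook values, the `unit_to_segment` mapping and all function names are hypothetical; the training steps themselves are not shown, and this is not the authors' implementation.

```python
import math

# Hypothetical, already-trained 2x2 SOM codebook: grid coordinate -> weight vector.
codebook = {
    (0, 0): (0.0, 0.0),
    (0, 1): (1.0, 20.0),
    (1, 0): (3.0, 35.0),
    (1, 1): (5.0, 45.0),
}

# Hypothetical result of the second step: k-means over the unit weights
# has grouped the SOM units into two segments, "A" and "B".
unit_to_segment = {(0, 0): "A", (0, 1): "A", (1, 0): "B", (1, 1): "B"}

def best_matching_unit(state, codebook):
    """Return the grid coordinate whose weight vector is closest to `state`."""
    return min(codebook, key=lambda unit: math.dist(state, codebook[unit]))

def to_sequences(states, codebook, unit_to_segment):
    """Map time-ordered states to a SOM coordinate sequence and a segment sequence."""
    units = [best_matching_unit(s, codebook) for s in states]
    segments = [unit_to_segment[u] for u in units]
    return units, segments

states = [(1, 25.0), (3, 40.0), (5, 42.5)]  # one hypothetical customer trajectory
units, segments = to_sequences(states, codebook, unit_to_segment)
```

Here `units` is the coordinate sequence on the map and `segments` is the corresponding cluster-level trajectory; sequences of this kind are what a sequence mining step would take as input.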
Conclusion
In this paper, a novel approach enabling the exploratory analysis of customer dynamics is proposed. The main aim of this paper was to provide the analyst with techniques enabling the stepwise exploration of the data by constantly enriching the insights about the movements present in the database. In Section 3 the self-organizing maps, the knowledge-based constrained clustering, the k-means algorithm and the generalized sequential pattern algorithm were presented from a theoretical point of view and combined in a generic methodology enabling the understanding of the dynamics of items present in a data set. To achieve this, cluster-level trajectories are created and used as input for two approaches capturing the main trends in these trajectories. The first approach aims at finding frequent trajectories that are then plotted on the SOM, providing a powerful visualization facility. In order to summarize the trends using statistical approaches instead of visualization, the second approach captures the main trends by focusing on specific segments of the trajectories and calculating deltas, which are further clustered and interpreted. The proposed methodology has been applied to real data and advanced business-oriented questions in a ticketing context. The methodology was illustrated and explained in detail while guiding the reader through one way to use it, creating new insights into the dynamics of the customers’ behavior preceding the first subscription. From a research perspective, this paper contributes to the literature by presenting a general methodology enabling a comprehensible exploration of movements in high-dimensional spaces while introducing prior knowledge in the clustering task. The methodology is based on a combination of different algorithms, some of them well-known and some of them contributions of this work.
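The two ways of summarizing cluster-level trajectories mentioned above can be sketched as follows. The first function counts the support of ordered patterns by brute-force enumeration, which is a deliberate simplification and not the generalized sequential pattern algorithm itself (no candidate generation or pruning). The second computes attribute deltas between consecutive states, which could then be clustered. All names, the sample sequences and the support threshold are assumptions for illustration.

```python
from itertools import product

def contains(sequence, pattern):
    """True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)

def frequent_patterns(sequences, length, min_support):
    """Enumerate all label patterns of a given length and keep the frequent ones.

    Support is the fraction of sequences that contain the pattern.
    """
    labels = sorted({label for seq in sequences for label in seq})
    result = {}
    for pattern in product(labels, repeat=length):
        support = sum(contains(seq, pattern) for seq in sequences) / len(sequences)
        if support >= min_support:
            result[pattern] = support
    return result

def deltas(states):
    """Attribute-wise differences between consecutive states of one trajectory."""
    return [
        tuple(b - a for a, b in zip(prev, curr))
        for prev, curr in zip(states, states[1:])
    ]

# Hypothetical cluster-level trajectories of four customers.
sequences = [["A", "B", "B"], ["A", "A", "B"], ["B", "A"], ["A", "B"]]
patterns = frequent_patterns(sequences, length=2, min_support=0.75)
steps = deltas([(1, 25.0), (3, 40.0), (5, 42.5)])
```

With these toy inputs, the only pattern of length two that reaches the threshold is the move from segment "A" to segment "B". Brute-force enumeration grows exponentially with pattern length, so a real analysis would rely on a proper sequential pattern mining algorithm.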
Being able to understand and report movements in high-dimensional spaces has been identified as a gap in the dynamic clustering literature and is one of the main motivations of this paper. By proposing two approaches to understand generated trajectories, the first moves towards comprehensible dynamic techniques are made. Moreover, by using a technique allowing the introduction of some prior knowledge in order to guide the clustering algorithm, this paper contributes to the literature on constrained clustering and illustrates how prior knowledge can be used in an unsupervised setting. From a business perspective, the application of the proposed methodology in a ticketing context in order to answer a complex business question with a time-dimensional aspect is, to the best of our knowledge, a novel exercise. By making each step of the proposed methodology value-adding and comprehensible, this work allows other practitioners to explore the dynamics of their databases in an unsupervised way. As a result of this experiment, the business involved decided to investigate the daily usage of dynamic techniques in order to approach their customers in an appropriate way. The main insights relate to the validation of the hypothesis that cross-cluster movements exist (and can be captured and reported) and the fact that a single answer to a complex business question may lead to incorrect conclusions. Although the experiments reported in this work are limited to one application of the proposed methodology, multiple projects in other contexts are already planned. In future work, possibilities towards applying the proposed approach as a basis for predictive use cases can be investigated. It is indeed important to notice that this paper aims at exploring the dynamics underlying a data set and at providing the analyst with summarization tools, hence covering the descriptive aspects of the analysis.
A next step could then consist of a model making use of the knowledge generated with the proposed approach as input to better predict future states of identified trajectories (both for new data instances, based on attributes available from the start, and for instances that have already visited different clusters), hence including predictive aspects in the analysis. Finally, further research should focus on the creation of new techniques enabling the description of full trajectories and movements in a way comprehensible for both humans and machines in order to balance the subjectivity introduced by purely visual techniques.