Calling communities analysis and identification using machine learning techniques
Article code | Publication year | Length of English article |
---|---|---|
20889 | 2009 | 9-page PDF |
Publisher : Elsevier - Science Direct
Journal : Expert Systems with Applications, Volume 36, Issue 3, Part 2, April 2009, Pages 6218–6226
English abstract
The analysis of logs related to social communities has recently received considerable attention for its importance in shedding light on social concerns by identifying different groups, which helps in resolving issues such as predicting terrorist groups. In the customer analysis domain, identifying calling communities can be used to determine a particular customer’s value according to the general behavior pattern of the community that the customer belongs to; this supports the design of effective targeted marketing, which is significantly important for increasing profitability. In the telecommunication industry, machine learning techniques have been applied to the Call Detail Record (CDR) to predict customer behavior, such as churn. In this paper, we pursue identifying the calling communities and demonstrate how cluster analysis can be used to effectively identify communities using information derived from the CDR data. We use the information extracted from the cluster analysis to identify customer calling patterns. Customers’ calling patterns are then given to a classification algorithm to generate a classifier model for predicting the calling communities of a customer. We apply different machine learning techniques to build classifier models and compare them in terms of classification accuracy and computational performance. The reported test results demonstrate the applicability and effectiveness of the proposed approach.
English introduction
Although it started as a subfield of sociology for identifying and studying the relations between people and the things they do, social network analysis in general, and community identification in particular, has received considerable and increasing attention over the past decade as computing-based prediction and analysis techniques have advanced. This attention spans different domains, including web mining (Flake et al., 2000 and Kleinberg, 1999) and biological networks (Girvan & Newman, 2002), among others. Identifying social communities is thus an emerging research area that has already attracted the attention of many research groups. The main theme is to analyze logs that reflect social communication between different parties. The analysis leads to valuable discoveries that may have substantial social and economic impact. From a social perspective, the discoveries may highlight terrorist groups, family relationships, friendships, etc. From an economic perspective, the analysis may identify specific target customer groups. We concentrate on the latter perspective in this paper. In particular, we investigate customer relationships by analyzing call detail records obtained from a telecommunication company. Other researchers have concentrated on identifying terrorist groups (e.g., Nasrullah & Larsen, 2006).

In telecommunication, a Call Detail Record (CDR) is a record containing information about recent system usage, such as the identities of sources (points of origin), the identities of destinations (endpoints), the duration of each call, the amount billed for each call, the total usage time in the billing period, the total free time remaining in the billing period, and the running total charged during the billing period. The format of the CDR varies among telecom providers and call-logging software.

In recent years, we have witnessed a dramatic increase in competition among telecommunication companies seeking to retain their current customers and acquire new ones.
For this reason, the ability to dynamically classify and predict customers’ behaviors according to their calling patterns obtained from CDR data has attracted considerable attention in the research community; it is beneficial for (Yan, Fassino, & Baldasare, 2005):

• Churn Prediction: The goal is to understand when and why a company’s customers are likely to leave so that appropriate action can be planned. Customers become “churners” when they discontinue their subscription and move their business to a competitor. Churn prediction has been developed in the telecommunication industry using data mining techniques. Data mining is applied in this area to perform two major tasks: 1. predict whether a particular customer will churn and when this will happen; 2. understand why particular customers churn. By predicting which customers are likely to churn, the telecommunication company can reduce the churn rate by offering those customers new incentives to stay.

• Identifying Calling Communities: This can be used to determine a particular customer’s value according to the general behavior pattern of the community that the customer belongs to. It supports the design of effective targeted marketing, which is significantly important for increasing profitability in the telecommunication industry.

The main focus of this study is to use an unsupervised machine learning technique, namely clustering, to group the customers of a mobile service provider into appropriate calling communities according to statistics extracted from the CDR data. Clustering is one of the most prominent approaches for identifying unknown classes among a group of objects, and it has been successfully used as a tool in many fields such as biology, image analysis, and finance. The classification algorithms evaluated in this paper use an unsupervised learning mechanism, wherein unlabeled training data is grouped based on similarity.
Once an acceptable clustering has been found using the similarities and dissimilarities in the training data set, the clustering is transformed into a classifier by employing a classification technique. In our approach, the clusters are labeled, and a new object is classified with the label of the cluster to which it is most similar. However, the type of classifier used to identify calling communities and customers’ value is crucial. For instance, in a marketing campaign, selecting inappropriate customers makes it very costly to verify the impact of the campaign when the intent is to adjust the strategy toward the target group.

In this study, the agglomerative hierarchical clustering approach has been applied for the clustering task. It can produce an ordering of the objects (a cluster tree), which may be more informative about the nature of the data being analyzed and investigated. For building the classifier model, we use supervised Machine Learning (ML) techniques to automatically classify and identify customers’ communities based on customers’ characteristics extracted from the clusters. An ML algorithm constructs a classifier by exploring the data objects (the training set) to find a set of rules that determine the class of each object according to its attributes. These rules are later used to predict the class or missing attribute value of unseen objects whose class might not be known. Supervised machine learning techniques have been widely used in real-world classification applications such as increasing revenue, preventing theft, medical diagnosis, market analysis, and signaling potentially fraudulent transactions. In this paper, we evaluate and compare the performance of our classifier model, in terms of accuracy and time complexity, by applying several machine learning techniques. The reported results are promising.

The rest of this paper is organized as follows. Section 2 covers CDR data and calling neighbors. Section 3 presents details of the proposed models. Section 4 reports experimental results. Section 5 concludes the paper.
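As a rough illustration of the cluster-then-classify idea described in the introduction, the following minimal Python sketch groups customers with a small average-linkage agglomerative procedure and then assigns a new customer the label of the most similar cluster. Everything specific here is an assumption made for illustration: the two features (calls per day, mean call minutes), the Euclidean distance, the nearest-centroid rule standing in for a trained classifier, and the function names `agglomerative`, `centroid`, and `classify`. It is not the similarity measure or the classifiers used in the paper.

```python
from math import dist
from statistics import fmean


def agglomerative(points, k):
    """Merge the two closest clusters (average linkage) until k remain."""
    clusters = [[i] for i in range(len(points))]  # every point starts alone

    def linkage(a, b):
        # Average pairwise Euclidean distance between members of a and b.
        return fmean(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        # Pick the pair of cluster positions (a, b), a < b, with smallest linkage.
        pairs = (
            (a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        a, b = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters.pop(b)  # b > a, so position a is still valid
        clusters[a].extend(merged)
    return clusters


def centroid(points, members):
    """Mean feature vector of the points whose indices are in members."""
    dims = range(len(points[0]))
    return tuple(fmean(points[i][d] for i in members) for d in dims)


def classify(x, centroids):
    """Return the index (label) of the cluster whose centroid is nearest to x."""
    return min(range(len(centroids)), key=lambda c: dist(x, centroids[c]))


# Hypothetical per-customer features: (calls per day, mean call minutes).
customers = [
    (2.0, 1.0), (2.5, 1.2), (3.0, 0.8),
    (20.0, 6.0), (22.0, 5.5), (21.0, 6.5),
]
clusters = agglomerative(customers, k=2)
centroids = [centroid(customers, c) for c in clusters]
new_label = classify((19.0, 5.0), centroids)  # label for an unseen customer
```

In this toy data the light and heavy callers end up in separate clusters, and the unseen heavy caller receives the label of the heavy-caller cluster. A supervised learner trained on the cluster labels would replace `classify` in a fuller treatment.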
English conclusion
In this paper, we demonstrated how cluster analysis can be used to effectively identify calling communities by using information derived from the CDR data. We have proposed a similarity measure that combines both the first- and second-order distances for cluster formation. The agglomerative hierarchical clustering approach has been applied for the clustering task. We use the information extracted from the cluster analysis to identify customer calling patterns. We have constructed a variety of features to represent a customer’s behavior within his or her own cluster as well as toward other clusters. Customers’ calling patterns are then given to a classification algorithm to generate a classifier model for predicting the calling communities of a customer. Different machine learning algorithms have been evaluated for automated community identification in terms of performance and classifier build time. Our work is especially important for targeted marketing campaigns in the telecommunication industry, since the CDR data is often the only primary data source available for the customers. Based on the assumption that customers in the same calling community might behave similarly, targeted efforts can be focused on certain communities. Further, once a community’s leader has been identified, marketing efforts can be directed to that leader, since community leaders are believed to influence the behavior of other community members. In the context of automated calling community identification, the type of classifier used to identify calling communities and customers’ value is crucial. For instance, in a marketing campaign, selecting inappropriate customers makes it very costly to verify the impact of the campaign when the intent is to adjust the strategy toward the target group.
Fuzzy classification is one possible approach that offers more convenience for selecting customer subgroups and for measuring the efficiency and validity of the communities with respect to marketing campaign design. As the next step, we would like to build a flexible fuzzy classifier system that uses membership functions to increase or decrease the homogeneity among the targeted customers, depending on whether the proposed products are very specific or intended for a large community.
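To make the role of membership functions concrete, here is a small Python sketch under stated assumptions. The triangular membership shape, the "medium usage" community, the monthly-minutes feature, and the names `triangular` and `select` are all hypothetical; the text above only outlines fuzzy classification as future work and does not specify a design. The sketch shows how raising or lowering a membership threshold tightens or widens the targeted group.

```python
def triangular(x, low, peak, high):
    """Triangular membership: 0 outside (low, high), rising linearly to 1 at peak."""
    if x <= low or x >= high:
        return 0.0
    if x <= peak:
        return (x - low) / (peak - low)
    return (high - x) / (high - peak)


def select(values, membership, threshold):
    """Keep the customers whose membership degree reaches the threshold."""
    return [name for name, v in values.items() if membership(v) >= threshold]


def medium_usage(minutes):
    # Hypothetical community profile: centered on 300 monthly call minutes.
    return triangular(minutes, low=100, peak=300, high=500)


# Hypothetical monthly call minutes per customer.
usage = {"a": 100, "b": 250, "c": 300, "d": 380, "e": 550}

narrow = select(usage, medium_usage, threshold=0.7)  # very specific product
broad = select(usage, medium_usage, threshold=0.5)   # product for a larger group
```

With these numbers the stricter threshold keeps only customers close to the community profile, while the looser one also admits a customer further from the peak.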