Using supervised and unsupervised learning methods for telecommunications fraud detection
Article code | Publication year | Pages in English article |
---|---|---|
17703 | 2008 | 6 pages (PDF) |
Publisher : Elsevier - Science Direct
Journal : Knowledge-Based Systems, Volume 21, Issue 7, October 2008, Pages 721–726
English abstract
This paper investigates the usefulness of applying different learning approaches to a problem of telecommunications fraud detection. Five different user models are compared by means of both supervised and unsupervised learning techniques, namely the multilayer perceptron and hierarchical agglomerative clustering. One aim of the study is to identify the user model that best identifies fraud cases. The second task is to explore different views of the same problem and see what can be learned from the application of each different technique. All data come from real defrauded user accounts in a telecommunications network. The models are compared in terms of their performances. Each technique’s outcome is evaluated with appropriate measures.
English introduction
Telecommunications fraud can be simply described as any activity by which telecommunications service is obtained without intention of paying [10]. Telecommunications fraud has certain characteristics that make it particularly attractive to fraudsters. The main one is that the danger of localization is small. This is because all actions are performed from a distance, which, in conjunction with the mesh topology and the size of networks, makes the process of localization time-consuming and expensive. Additionally, no particularly sophisticated equipment is needed, if any is needed at all. The simple knowledge of an access code, which can be acquired even with methods of social engineering, makes the implementation of fraud feasible. Finally, the product of telecommunications fraud, a phone call, is directly convertible to money [16]. Several categories of telecommunications fraud have been reported. The main ones are technical fraud, contractual fraud, hacking fraud, and procedural fraud [10]. In [1], 12 distinct fraud types are identified, while combinations of them have also been reported [13]. The most common fraud scenario in private networks is superimposed fraud. This is the case of an employee, the fraudster, who uses another employee’s authorization code to access outgoing trunks and costly services. Thus, the fraudster’s activity is superimposed over the legitimate user’s activity. Telecommunications fraud has drawn the attention of many researchers in recent years, not only due to the huge economic burden on companies’ accounts but also due to the interesting aspect of user behavior characterization. Fraud detection techniques involve the monitoring of users’ behavior in order to identify deviations from some expected norm. Research in telecommunications fraud detection is mainly motivated by fraudulent activities in mobile technologies [1], [4], [10], [20], [24] and [31].
The techniques used come from the area of statistical modeling, such as rule discovery [2], [7], [24] and [30], clustering [3] and [27], Bayesian rules [4], visualization methods [5], Markov models [31] or neural network classification [14], [21], [24] and [32]. Combinations of more than one method have also been proposed [17], [28] and [31]. In [8] one can find a bibliography on the use of data mining and machine learning methods for automatic fraud detection. The site is updated up to November 2004. Most of the aforementioned approaches use a combination of legitimate user behavior examples and some fraud examples. The aim is to detect any usage changes in the legitimate user’s history. The industry’s interest in fraud detection problems is also stressed by the high number of relevant patents. A quick search with the keywords “fraud detection” in an online search engine [9] in July 2007 revealed 76 patents, 22 of them being relevant to telecommunications. In general, all fraud cases can actually be viewed as fraud scenarios which are related to the way the access to the network was acquired. Detection techniques tailored to one case may fail to detect other types of fraud. For example, velocity traps, which can identify the use of a cloned cell phone, will fail to detect a case of contractual fraud. So, fraud detection focuses on the analysis of users’ activity. The related approaches are divided into two main subcategories: the absolute analysis, which searches for thresholds between legal and fraudulent behavior, and the differential approach, which tries to detect extreme changes in a user’s behavior. In both cases, analysis is achieved by means of statistical and probabilistic methods, neural networks and rule-based systems. However, the use of indicators of excessive usage has been criticized, as they may not only imply fraud but may also point to the best customers [25].
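The difference between the absolute and the differential analysis can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the fixed threshold of 500 minutes, the z-score limit of 3 and the weekly usage figures are all hypothetical, chosen only to show how a fixed threshold can flag a heavy but legitimate customer while a per-user deviation test does not.

```python
from statistics import mean, stdev

def absolute_alarm(current_usage, threshold):
    """Absolute analysis: one fixed threshold applied to every user."""
    return current_usage > threshold

def differential_alarm(history, current_usage, max_z=3.0):
    """Differential analysis: compare current usage with the user's own history.

    Uses a simple z-score; max_z is a hypothetical cut-off, not a value
    reported in the paper.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all counts as a deviation.
        return current_usage != mu
    return abs(current_usage - mu) / sigma > max_z

# Hypothetical weekly call minutes (illustrative numbers only).
heavy_user_history = [900, 950, 1000, 980]   # a consistently heavy, legitimate user
light_user_history = [20, 25, 22, 24]        # a light user whose code may be stolen

heavy_abs = absolute_alarm(960, threshold=500)
heavy_diff = differential_alarm(heavy_user_history, 960)
light_abs = absolute_alarm(300, threshold=500)
light_diff = differential_alarm(light_user_history, 300)

print("heavy user:", heavy_abs, heavy_diff)
print("light user:", light_abs, light_diff)
```

In this toy setting the fixed threshold raises an alarm for the heavy user and misses the sudden jump of the light user, while the differential test does the opposite. Real systems would need more robust statistics than a four-week z-score.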
In the present paper, we are interested in the different lessons that can be learned from the application of different learning algorithms to different user behavior representations (profiles). Both supervised and unsupervised learning methods are applied. One would expect the findings of one method to be used as inputs to the other one, e.g. first use the unsupervised method and then apply the supervised one in order to boost the learning process. However, this is not the case in the present work. Each method is applied independently of the other and is expected to reveal different aspects of the modeling approach. The main task is to cross-check the effectiveness of different user profiles in discriminating between legitimate and fraudulent activity, to identify the elements that are important in the learning process, and to compare the conclusions from the application of the two methods. The paper proceeds as follows. In the next section, the data that were used are described along with the user modeling approach. In Section 3 the experimental procedure, i.e., the learning methods, is presented. The experimental results are given in Section 4. In the last section conclusions are drawn.
English conclusion
In the present paper, supervised and unsupervised learning techniques were applied to a fraud detection problem. In particular, a multilayer perceptron classifier and the hierarchical agglomerative clustering technique were applied to five models (profiles) of telecommunications users’ behavior. The profiles are used as a user characterization method, in order to discriminate legitimate from fraudulent usage in a telecommunications environment. The input data consisted of real user accounts which have been defrauded. Neural network classifiers performed very well on the problem, giving TP rates better than 80% with FP rates less than 2%. However, from the data analysis point of view, neural networks work like black boxes, so they do not reveal the nature of the discriminating characteristics. Hence, an unsupervised approach was also used, independently of the supervised one, to show different aspects of the same problem. Experiments with cluster analysis showed that the outcome depends on the distance measure used. Euclidean distance produced a distinct cluster of outliers regardless of their class membership, while correlation separated the fraudulent from normal cases more clearly. This observation is consistent with the success of the feed-forward neural network (FFNN) on the problem. One of the FFNN’s abilities is that it can easily find correlations between large numbers of variables. From both analyses it is concluded that characteristics of a user accumulated over time yield better discrimination results. Aggregating a user’s behavior for periods longer than a week was avoided in order to preserve some level of on-line detection ability. Both approaches gave better results with Profile1. This provided us with enough confidence to use Profile1 as the appropriate user behavior characterization approach. It was used as a first step in the development of an expert system that is tailored to identify superimposed fraud in the telecommunications environment from which the data came.
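The dependence of the clustering outcome on the distance measure can be seen in a small Python sketch. This is an illustration under stated assumptions, not the authors' procedure: the average linkage rule, the function names and the four toy profile vectors are hypothetical, and the data were constructed so that call volume and call pattern pull in different directions.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance: sensitive to the overall magnitude of the vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def correlation_distance(a, b):
    """1 - Pearson correlation: sensitive to the shape of the vectors, not their scale."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    na = math.sqrt(sum((x - ma) ** 2 for x in a))
    nb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 1.0 - cov / (na * nb)

def average_linkage(c1, c2, points, dist):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(dist(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerative(points, dist, n_clusters):
    """Bottom-up clustering: start with singletons, merge the closest pair repeatedly."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], points, dist),
        )
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
    return sorted(sorted(c) for c in clusters)

# Hypothetical profiles (toy numbers, not from the paper's data set).
profiles = [
    [2, 3, 2, 3, 2, 3],        # 0: legitimate pattern, low volume
    [20, 30, 20, 30, 20, 30],  # 1: legitimate pattern, high volume
    [3, 2, 3, 2, 3, 2],        # 2: fraud-like pattern, low volume
    [30, 20, 30, 20, 30, 20],  # 3: fraud-like pattern, high volume
]

by_euclidean = agglomerative(profiles, euclidean, 2)
by_correlation = agglomerative(profiles, correlation_distance, 2)
print("Euclidean:  ", by_euclidean)
print("Correlation:", by_correlation)
```

On these toy vectors, Euclidean distance groups the profiles by volume, mixing the two patterns, whereas correlation distance groups them by pattern regardless of volume. This mirrors the qualitative observation above, although the paper's actual profiles and linkage settings may differ.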
Clustering also revealed that misclassification occurs due to mixed types of behavior. That is, there are cases in which the legitimate user acts like a fraudster, e.g. making long or expensive calls, while at the same time a fraudster may act like a normal user. This observation also reveals the complexity of a fraud detection problem. In general, the task of detecting fraud in an unsupervised manner is a more difficult one, given the dynamic appearance of new fraud types. The application of the more sophisticated self-organizing map (SOM) is considered the next step after the present work. Moreover, any accurate fraud detection technique is bound to be proprietary to the environment in which it is working.