Detecting credit card fraud by genetic algorithm and scatter search
|Article code||Publication year||Length of English article||Persian translation|
|17742||2011||7 pages (PDF)||14 pages (Word)|
Publisher : Elsevier - Science Direct
Journal : Expert Systems with Applications, Volume 38, Issue 10, 15 September 2011, Pages 13057–13063
Discussion and results
Summary and conclusions
In this study we develop a method that improves a credit card fraud detection solution currently used in a bank. With this solution each transaction is scored, and based on these scores the transactions are classified as fraudulent or legitimate. In fraud detection solutions the typical objective is to minimize the number of wrongly classified transactions. In reality, however, misclassifying one transaction does not have the same effect as misclassifying another, because once a card is in the hands of fraudsters its whole available limit is used up. Thus, the misclassification cost should be taken as the available limit of the card. This is what we aim to minimize in this study. As for the solution method, we suggest a novel combination of two well-known meta-heuristic approaches, namely genetic algorithms and scatter search. The method is applied to real data and very successful results are obtained compared to current practice.
This study is motivated by an industrial consultancy project. Our industrial partner (a major bank in Turkey) has been using an internally developed credit card fraud detection solution for some years. Although that solution has been regarded as successful, the bank authorities thought that it could be improved further for two reasons. First, the weights of the parameters used could be better adjusted using recent card usage behavior and recent fraud cases. Second, it has been understood that a good solution is not necessarily the one that detects many frauds but the one that detects frauds that are perhaps fewer in number but larger in risk.
Fraud can be defined as the illegal usage of any system or good. Correspondingly, legal activities can be called legitimate. Fraud is encountered in a variety of domains including banking, insurance, telecommunications, health care and public services. In banking, fraud can be observed in the use of credit cards, debit cards, internet banking accounts and call centers (telephone banking). Money laundering and personnel fraud are the other banking-related fraud types. The losses due to fraud add up to huge amounts, and fraud is a major threat to the legal economy. Owing to its importance, it has attracted the interest of many scientists. According to ISI Web of Knowledge data, a search with the keyword “fraud” finds 1361 articles published during the last 10 years (1999–2009). In this study we are concerned only with credit card fraud. When we analyzed the data of our industrial partner and several other banks, we observed that only a few out of every 100,000 transactions are fraudulent. The rest are legitimate. This extremely high imbalance between the two classes makes fraud detection a challenging task. Fraud detection has usually been seen as a data mining problem where the objective is to correctly classify the transactions as legitimate or fraudulent.
For classification problems many performance measures are defined, most of which are related to the number of cases classified correctly. Among these, the accuracy ratio, the capture rate, the hit rate, the Gini index and the lift are the most popular ones (Gadi et al., 2008; Kim and Han, 2003). Reflecting the popularity of the topic, the literature contains many studies on fraud detection using various data mining algorithms including decision trees, regression and artificial neural networks. Quah and Srinagesh (2008) suggest a framework that can be applied in real time, where first an outlier analysis is made separately for each customer using self-organizing maps and then a predictive algorithm is utilized to classify the abnormal-looking transactions. Panigrahi, Kundu, Sural, and Majumdar (2009) suggest a fraud detection solution made of four components connected in series. The main idea is first to determine a set of suspicious transactions and then to run a Bayesian learning algorithm on this list to predict the frauds. Sanchez, Vila, Cerda, and Serrano (2009) present a different approach: they use association rule mining to define the patterns of normal card usage and mark the transactions that do not fit these patterns as suspicious. The study of Bolton and Hand (2002) provides a very good summary of the literature on fraud detection problems. In these studies, the performance of the algorithms is mostly measured by the above measures. When fraudsters obtain a card, they usually spend its entire available (unused) limit. According to the statistics, they do this in four or five transactions on average. Thus, although the above-mentioned measures are quite relevant for the fraud detection problem, the bank authorities indicated that a measure of the loss that can be prevented on the cards whose transactions are identified as fraudulent is more important.
In other words, detecting a fraud on a card with a large available limit is more valuable than detecting a fraud on a card with a small available limit. As a result, what we are faced with is a classification problem with variable misclassification costs. As the classical data mining (DM) algorithms are not designed for such a misclassification cost structure, they are not directly applicable to our case (they work well when the objective is to minimize the number of incorrectly classified cases). Either they should be modified or new algorithms should be developed specifically for this purpose. (In some popular DM software packages such as SAS Enterprise Miner or SPSS PASW Modeler it is possible to introduce different misclassification costs for the two classes, but there has to be a fixed ratio between them, and thus they are not sufficient to handle our case.) As the classical DM algorithms are not directly usable, we need alternative methods for our classification problem. In this regard, we thought that meta-heuristic algorithms, which are applicable to many different problem domains, could serve. After analyzing the main characteristics of the popular meta-heuristic algorithms, we decided to use the genetic algorithm (GA) and the scatter search (SS) in a combined manner for our problem. We call our hybrid solution method GASS. Genetic algorithms are evolutionary algorithms which aim at obtaining better solutions as time progresses (Mitchell, 1998). Since their first introduction by Holland (1975), they have been successfully applied to many problem domains, from astronomy (Charbonneau, 1995) to sports (Charbonneau, 1995), and from optimization (Levi et al., 2007; Krzysztof and Peter, 2004) to computer science (Kaya, 2010). They have also been used in data mining, mainly for variable selection (Bidgoli, Kashy, Kortemeyer, & Punch, 2003), and are mostly coupled with other DM algorithms. Scatter search is another type of evolutionary algorithm.
It was first introduced by Glover (1977). Afterwards, it was almost forgotten for about 20 years, and since its re-introduction in 1997 (Glover, 1997) it has been applied to many different problems. However, to the best of our knowledge, nobody has used it in DM problems so far. The contributions of this study to the literature are twofold. First, a new classification cost function for the fraud detection problem is introduced. Second, a novel implementation of two well-known meta-heuristic algorithms is made. The rest of the paper is organized as follows. In the next section, the fraud detection problem we faced is described in detail together with the current detection system used by our industrial partner. Section 3 briefly summarizes the basic principles of genetic algorithms and scatter search and then details the GASS implementation. The results obtained on the sample databases and the selection of the best solution parameters are discussed in Section 4. A sensitivity analysis of the parameter values is also presented in that section. The paper concludes in Section 5 with a summary of the study and the major conclusions reached.
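The difference between counting misclassified transactions and weighting them by the available card limit can be illustrated with a minimal Python sketch. The `Transaction` class, the function names and the toy figures are hypothetical and are not taken from the paper. Only missed frauds are costed here, which is a simplifying assumption, since the text above does not specify how false alarms are priced.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    is_fraud: bool          # ground-truth label
    available_limit: float  # unused card limit at the time of the transaction


def misclassified_count(transactions, flagged):
    """Classical objective: number of wrongly classified transactions."""
    return sum(f != t.is_fraud for t, f in zip(transactions, flagged))


def missed_limit_cost(transactions, flagged):
    """Variable cost: available limit left exposed on frauds that were not flagged."""
    return sum(
        t.available_limit
        for t, f in zip(transactions, flagged)
        if t.is_fraud and not f
    )


# Hypothetical toy data: two frauds on cards with very different limits.
transactions = [
    Transaction(is_fraud=True, available_limit=500.0),
    Transaction(is_fraud=True, available_limit=10000.0),
    Transaction(is_fraud=False, available_limit=2000.0),
]
detector_a = [True, False, False]  # catches only the small-limit fraud
detector_b = [False, True, False]  # catches only the large-limit fraud

for name, flags in (("A", detector_a), ("B", detector_b)):
    print(name, misclassified_count(transactions, flags),
          missed_limit_cost(transactions, flags))
# A 1 10000.0
# B 1 500.0
```

Both hypothetical detectors misclassify exactly one transaction, so a count-based measure cannot separate them, while the limit-based cost clearly prefers detector B.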
Conclusion (English)
In this study we undertook the problem of detecting fraudulent credit card transactions. The study is based on a real application project in which we try to improve the performance of an existing fraud detection system solely by adjusting the values of its parameters. The objective of the study differs from that of typical classification problems in that we had a variable misclassification cost. As the standard data mining algorithms do not fit this situation well, we decided to use meta-heuristic algorithms. For this purpose we combined two well-known methods: genetic algorithms and scatter search. At the end of the study we had improved the performance of the existing solution by about 200%. As far as the effectiveness of the variables in detecting fraud is concerned, the statistics related to the popular and unpopular regions for a credit card holder were found to be the most important. Some types of variables, such as the MCC and country statistics, were not included in the scope of this study. Thus, the findings obtained here may not generalize to the global fraud detection problem. However, when such data become available, the methodology described here can easily be extended to cope with them. As future work, effective algorithms that perform well on classification problems with variable misclassification costs could be developed.
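As a rough illustration of tuning parameter values with a genetic algorithm under a variable misclassification cost, the Python sketch below evolves two score weights on toy data. It is a plain genetic algorithm with elitism, uniform crossover and Gaussian mutation. It is not the GASS hybrid and it contains no scatter search components. The data, the fixed threshold, the false-alarm cost and every function name are assumptions made for this example only.

```python
import random

# Hypothetical toy data: (feature vector, is_fraud, available_limit).
DATA = [
    ((0.9, 0.1), True, 8000.0),
    ((0.2, 0.8), True, 300.0),
    ((0.1, 0.2), False, 5000.0),
    ((0.3, 0.1), False, 1200.0),
    ((0.8, 0.7), True, 4000.0),
    ((0.2, 0.3), False, 700.0),
]
THRESHOLD = 1.0    # assumed fixed alert threshold on the weighted score
ALERT_COST = 50.0  # assumed cost of one false alarm (not from the paper)


def cost(weights):
    """Limit lost on missed frauds plus an assumed cost per false alarm."""
    total = 0.0
    for features, is_fraud, limit in DATA:
        score = sum(w * x for w, x in zip(weights, features))
        flagged = score >= THRESHOLD
        if is_fraud and not flagged:
            total += limit
        elif flagged and not is_fraud:
            total += ALERT_COST
    return total


def evolve(generations=30, pop_size=20, seed=0):
    """Return the best weight vector and the best cost seen at each step."""
    rng = random.Random(seed)
    population = [[rng.uniform(0.0, 3.0) for _ in range(2)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=cost)
        history.append(cost(population[0]))
        parents = population[: pop_size // 2]  # elitism: best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each weight comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Gaussian mutation, clipped so that weights stay non-negative.
            child = [max(0.0, w + rng.gauss(0.0, 0.2)) for w in child]
            children.append(child)
        population = parents + children
    best = min(population, key=cost)
    return best, history + [cost(best)]


best, history = evolve()
print("best weights:", best)
print("cost at start and end:", history[0], history[-1])
```

Because the best half of each generation is carried over unchanged, the best cost in `history` can never increase from one generation to the next. How far it falls depends on the toy data and the random seed.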