Detecting Fraud in Online Games of Chance and Lotteries
Article Code | Publication Year | English Article Length
---|---|---
17744 | 2011 | 12-page PDF
Publisher: Elsevier - Science Direct
Journal: Expert Systems with Applications, Volume 38, Issue 10, 15 September 2011, Pages 13158–13169
English Abstract
Fraud detection has been an important topic of research in the data mining community for the past two decades. Supervised, semi-supervised, and unsupervised approaches to fraud detection have been proposed for the telecommunications, credit, insurance, and health-care industries. We describe a novel hybrid system for detecting fraud in the rapidly growing sector of lotteries and online games of chance. While the objectives of fraudsters in this sector are not unique, money laundering and insider attack scenarios are much more prevalent in lotteries than in the previously studied sectors. The lack of labeled data for supervised classifier design, user anonymity, and the size of the data-sets are the other key factors differentiating the problem from previous studies, and are the key drivers behind the design and implementation decisions for the system described here. The system employs online algorithms that optimally aggregate statistical information from raw data, applies a number of pre-specified checks against known fraud scenarios as well as novel clustering-based algorithms for outlier detection, and fuses their results to produce alerts with high detection rates at acceptable false-alarm levels.
English Introduction
Cyber-crime has been a constant threat to for-profit as well as non-profit organizations since the beginning of the commercial internet in the mid-nineties. In response, several approaches to countering cyber-criminal attacks have been proposed by various communities: stronger authentication and other techniques from the computer and communications security communities aim at preventing the falsification or theft of identities (client or server alike), while intrusion-detection and fraud-detection techniques aim at detecting deviations from "normal profile behavior", or equivalently, at detecting "anomalous behavior". For anomaly or outlier detection, a wide range of traditional techniques from the machine learning and data mining literature have been proposed, including supervised learning approaches, where a set of training data is provided that contains labeled "normal" and "fraudulent" transactions (Koutsoutos et al., 2007 and Sherman, 2002), as well as unsupervised approaches (Breunig et al., 2000 and Yamanishi et al., 2004). So far, the major industries that have taken steps to prevent, or at least detect, fraud include the insurance, credit card and banking, telecommunications (Fawcett & Provost, 1997), and health-care sectors (Phua et al., 2004 and Phua et al., 2005).

However, government-run as well as privately-held lotteries and bookmakers, online or not, also face very significant threats to their operations from fraudsters attempting to appropriate a portion of the value of the gambles taking place there. In fact, given the very large revenues generated by lotteries and betting companies, detecting and preventing fraud in this sector is a major issue that, to our knowledge, has not been sufficiently studied. In the games-of-chance sector, the most prevalent problem is money laundering. While not immediately detrimental in financial terms to the organization running the game of chance or lottery (the organization does not lose money), its long-term effects are loss of reputation and a public perception that the organization is run for the benefit of criminals; such effects can significantly harm the image and profitability of the organization in the long run. Therefore, organizations running games of chance should take every possible step to show they are battling money laundering schemes as best as they can.

The second most prevalent problem that an organization running lotteries and betting games faces is that of insider attacks. Authorized agents running terminals for public participation in a lottery have been known to attempt to scam the organization's systems for their own profit; some of the scenarios they use are described in Section 3.

Finally, large data sizes are very common in the fraud-detection domain, mainly because it is through the sheer data size that the perpetrator can hope to hide their criminal intent. In the case of lotteries and betting games, however, the volume of data can reach up to 14,000,000 transactions in a single day, and issues such as I/O, space requirements, and the linear-time complexity of the algorithms employed become major considerations in the design of the system.
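These constraints, a single pass over tens of millions of daily transactions with bounded memory and linear time, are exactly what the paper's "online algorithms" address. As a minimal sketch of the idea, not the authors' implementation, the following Python keeps running per-agent statistics in one pass using Welford's algorithm; the entity key and the stream format are illustrative assumptions.

```python
from collections import defaultdict

class RunningStats:
    """One-pass (Welford) accumulator: O(1) memory per entity, O(n) total time."""
    __slots__ = ("n", "mean", "m2")

    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)  # running sum of squared deviations

    @property
    def variance(self) -> float:
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

# One accumulator per agent terminal; a full day's transactions stream through once.
stats = defaultdict(RunningStats)
for agent_id, amount in [("A17", 30.0), ("A17", 5.0), ("B02", 900.0)]:  # stand-in stream
    stats[agent_id].update(amount)
```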
1.1. Our contribution

Our contribution starts with the development of highly efficient data structures that keep the necessary and sufficient statistics for performing a combination of statistical tests and cluster analysis, in order to detect highly unlikely events in lotteries and betting games at both the individual-user level and the aggregator (agent) level. These structures make it possible to detect anomalies in the behavior of individual users as well as of groups of anonymous users or lottery agents. Based on the analysis in Section 3 of the major attack scenarios that organizations running lotteries and games of chance face, we have designed and implemented a special-purpose data structure, namely a data-cube inspired by OLAP databases, that holds the statistical information necessary to detect deviations in very large data sets. Unlike standard data-cube structures, however, which usually hold only a single quantity in each cell, we design the cube to keep several quantities in each cell, including frequency statistics of various quantities that are important for fraud detection. Transaction aggregation for the purposes of detecting fraud has been proposed before by Whitrow, Hand, Juszcak, Weston, and Adams (2009), and very recently by Krivko (2010), but both studies only consider aggregating credit card transactions along a single dimension, namely time, whereas our data-cube aggregates data simultaneously along multiple dimensions, including, of course, the time dimension. Another major difference between our study and those of Whitrow et al. (2009) and Krivko (2010) is that they deal with supervised learning, designing optimized classifiers from a labeled set of transactions that provides a ground-truth model of what constitutes normal or fraudulent activity.

We present novel clustering-based outlier-detection techniques that partition the data-cube into a large number of clusters, using structural entropy or the Sum of Squared Errors (SSE) as the clustering criterion, and detect smaller-than-expected clusters that are candidate outlier clusters. The new cluster-ensemble coordination techniques we employ allow the data-cube to be efficiently partitioned into a large number of clusters in a near-optimal way, avoiding the central limit catastrophe (Baum, 1986) that is known to plague most other clustering algorithms. Experimental results verify that our new techniques produce much better detection rates for a given false-alarm rate than standard techniques such as the Local Outlier Factor (Breunig et al., 2000), as discussed in Section 7. Another major characteristic of our system, not found in other work, is that it operates on batch (anonymous) data coming from agents' terminals as well as on individual (identified) user transactions; in the latter case, individual user profiles are built and monitored in near real-time to detect large deviations from "expected" behavior. We discuss both modes of system operation in detail; two small illustrative sketches of these ideas follow below.
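First, the multi-statistic cell: a minimal Python sketch, not the paper's implementation, of a cube keyed on (agent, game, time-bucket) whose cells accumulate several fraud-relevant quantities at once, including a stake-frequency histogram. All field names and the bucketing scheme are illustrative assumptions.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field

@dataclass
class CubeCell:
    """One cube cell holds several fraud-relevant statistics, not a single measure."""
    n_tx: int = 0
    total_stake: float = 0.0
    total_payout: float = 0.0
    stake_hist: Counter = field(default_factory=Counter)  # frequency statistics per stake band

    def add(self, stake: float, payout: float) -> None:
        self.n_tx += 1
        self.total_stake += stake
        self.total_payout += payout
        self.stake_hist[round(stake, -1)] += 1  # bucket stakes to the nearest 10

# The cube aggregates along several dimensions at once, time being only one of them.
cube = defaultdict(CubeCell)
for agent, game, hour, stake, payout in [("A17", "keno", 14, 20.0, 0.0),
                                         ("A17", "keno", 14, 25.0, 120.0)]:
    cube[(agent, game, hour)].add(stake, payout)
```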
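Second, the smaller-than-expected-cluster test. Here plain k-means stands in for the paper's entropy/SSE cluster-ensemble method (EXAMCE), which is not reproduced; scikit-learn is assumed to be available, and the cluster count and size threshold are illustrative parameters.

```python
import numpy as np
from sklearn.cluster import KMeans

def small_cluster_outliers(X: np.ndarray, k: int = 50, min_frac: float = 0.01):
    """Flag rows that fall in clusters far smaller than the expected n/k size."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    sizes = np.bincount(labels, minlength=k)
    suspicious = np.where(sizes < min_frac * len(X))[0]  # candidate outlier clusters
    return np.isin(labels, suspicious)                   # boolean mask over the rows of X

# Rows would be feature vectors derived from data-cube cells; random stand-in data here.
mask = small_cluster_outliers(np.random.default_rng(0).normal(size=(1000, 8)))
```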
The rest of this paper is organized as follows: in Section 2 we present a detailed review of related work carried out during the past two decades. In Section 3, we describe the major attack scenarios an organization running lotteries and games of chance faces, together with an analysis of these threats, and in Section 4 we describe the data-cube data-structure designed to handle very large volumes of data. Then, in Section 5, we present the suite of algorithms that we have implemented to detect anomalies in the transactions processed by the system. We show the user's interaction with the system and the visualization of detected abnormalities in Section 6. In Section 7 we present the results of running the system both on benchmark data-sets and on real-world data-sets provided by an international organization running lotteries and games of chance. Finally, we present our conclusions and a list of directions for future research.
English Conclusion
Organizations running lotteries and games of chance face significant risk from fraud-related threats. Money laundering, fixing games, and agent cheating are only a few of the fraud schemes that have been perpetrated in the past. The sheer number of transactions carried out by such organizations makes fraud detection a real challenge. We have designed and successfully implemented a system that utilizes novel data-structures, inspired by OLAP database technology, to aggregate necessary and sufficient statistical information about individual transactions. A combination of outlier-detection strategies is applied to this information, and their results are then fused together by means of the data-cube cells to produce a final list of alerts, sorted in order of importance, that the system administrator can further investigate.

We have combined (I) test results from running pre-specified scenarios, (II) standard statistical inference tests, (III) density-aware clustering methods (LOF), and (IV) cluster ensembles that cluster data-cube cells into a large number of groups using structural entropy or MSSC criteria (EXAMCE). Experiments have shown that both LOF and EXAMCE are highly suitable methods for detecting outliers in high-dimensional data-sets. Results from each method are weighted and fused together, by aggregating them into the data-cube cells or slices from which they arise, to produce a sorted list of alerts.

The prototype implementation compresses the original transactional (coupon) data by such factors that the aggregated statistics fit into main memory, enabling the online algorithms previously discussed to operate on them. New data are aggregated at periodic intervals and incrementally added to the historical statistical information kept in stored data-cubes. In this way, the system is kept up-to-date with very small delays (on the order of a few hours), and is able to run the full suite of its algorithms against the operational databases of an international provider of lotteries and games of chance in less than 1 h on a commodity server.
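As a final sketch, and again only an illustration of the weighted-fusion step rather than the system's actual code, the following Python combines per-cell scores from each detector into a single ranked alert list; the detector names, weights, and cell keys are placeholders.

```python
def fuse_alerts(per_method_scores: dict, weights: dict, top: int = 20):
    """Fuse per-cell outlier scores from several detectors into one ranked alert list."""
    fused = {}
    for method, cell_scores in per_method_scores.items():
        scale = max(cell_scores.values()) or 1.0  # scale each method's scores into [0, 1]
        for cell, score in cell_scores.items():
            fused[cell] = fused.get(cell, 0.0) + weights[method] * score / scale
    # Highest fused score first: the administrator inspects alerts from the top down.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:top]

# Placeholder detector outputs, keyed by data-cube cell (agent, game, hour).
alerts = fuse_alerts(
    {"lof":    {("A17", "keno", 14): 3.2, ("B02", "lotto", 9): 1.1},
     "examce": {("A17", "keno", 14): 0.9, ("B02", "lotto", 9): 2.4}},
    weights={"lof": 0.5, "examce": 0.5},
)
print(alerts)
```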