ISI article No. 17733
Title

The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature
Article code: 17733
Year of publication: 2011
Length of English article: 11 pages (PDF)
Source

Publisher: Elsevier - ScienceDirect

Journal: Decision Support Systems, Volume 50, Issue 3, February 2011, Pages 559–569

Keywords
Financial fraud, Fraud detection, Literature review, Data mining, Business intelligence

Abstract

This paper presents a review of, and a classification scheme for, the literature on the application of data mining techniques to the detection of financial fraud. Although financial fraud detection (FFD) is an emerging topic of great importance, a comprehensive literature review of the subject has yet to be carried out. This paper thus represents the first systematic, identifiable and comprehensive academic literature review of the data mining techniques that have been applied to FFD. Forty-nine journal articles on the subject published between 1997 and 2008 were analyzed and classified into four categories of financial fraud (bank fraud, insurance fraud, securities and commodities fraud, and other related financial fraud) and six classes of data mining techniques (classification, regression, clustering, prediction, outlier detection, and visualization). The findings of this review clearly show that data mining techniques have been applied most extensively to the detection of insurance fraud, although corporate fraud and credit card fraud have also attracted a great deal of attention in recent years. In contrast, we find a distinct lack of research on mortgage fraud, money laundering, and securities and commodities fraud. The main data mining techniques used for FFD are logistic models, neural networks, Bayesian belief networks, and decision trees, all of which provide primary solutions to the problems inherent in the detection and classification of fraudulent data. This paper also addresses the gaps between FFD and the needs of the industry to encourage additional research on neglected topics, and concludes with several suggestions for further FFD research.

Introduction

In recent years, financial fraud, including credit card fraud, corporate fraud and money laundering, has attracted a great deal of concern and attention. The Oxford English Dictionary [55, p. 562] defines fraud as “wrongful or criminal deception intended to result in financial or personal gain.” Phua et al. [58] describe fraud as leading to the abuse of a profit organization's system without necessarily leading to direct legal consequences. Although there is no universally accepted definition of financial fraud, Wang et al. [78, p. 1120] define it as “a deliberate act that is contrary to law, rule, or policy with intent to obtain unauthorized financial benefit.” Economically, financial fraud is becoming an increasingly serious problem. A striking case in point is the Ponzi scheme perpetrated by former NASDAQ chairman Bernard Madoff, which has led to the loss of approximately US$50 billion worldwide [34]. Another example is that of Joseph Hirko, former co-chief executive officer of Enron Broadband Services (EBS), who has pledged to forfeit approximately US$8.7 million in restitution to Enron victims through the U.S. Securities and Exchange Commission's Enron Fair Fund after pleading guilty to wire fraud [34]. According to a 2007 BBC news report [8], fraudulent insurance claims cost UK insurers a total of 1.6 billion pounds a year. The overall losses caused by financial fraud are incalculable. Financial fraud detection (FFD) is vital for the prevention of the often devastating consequences of financial fraud. FFD involves distinguishing fraudulent financial data from authentic data, thereby disclosing fraudulent behavior or activities and enabling decision makers to develop appropriate strategies to decrease the impact of fraud. Data mining plays an important role in FFD, as it is often applied to extract and uncover the hidden truths behind very large quantities of data.
Bose and Mahapatra [14] define data mining as a process of identifying interesting patterns in databases that can then be used in decision making. Turban et al. [73] define data mining as a process that uses statistical, mathematical, artificial intelligence, and machine-learning techniques to extract and identify useful information and subsequently gain knowledge from a large database. Frawley et al. [35] state that the objective of data mining is to obtain useful, non-explicit information from data stored in large repositories. Kou et al. [47] highlight that an important advantage of data mining is that it can be used to develop a new class of models to identify new attacks before they can be detected by human experts. Phua et al. [58] point out that fraud detection has become one of the best established applications of data mining in both industry and government. Various data mining techniques have been applied in FFD, such as neural networks [18], [27], [31], [38], [45] and [75], logistic regression models [10], [54], [65] and [85], the naïve Bayes method [11] and [77], and decision trees [45] and [46], among others.
Over the past few years, a number of review articles have appeared in conference or journal publications. Bolton and Hand [13], for example, have reviewed statistical methods of detecting fraud, including credit card fraud, money laundering, telecommunications fraud, etc. Zhang and Zhou [88] have surveyed financial applications of data mining including stock market and bankruptcy predictions and fraud detection. Phua et al. [58] present a survey of data mining-based fraud detection research, including credit transaction fraud, telecoms subscription fraud, automobile insurance fraud and the like. Others have reviewed insurance fraud [24] and financial statement fraud [86]. However, the survey presented herein is an up-to-date, comprehensive and state-of-the-art review of data mining applications in FFD.
This paper has three objectives. The first is to develop a framework for classifying the applications of data mining to FFD. The second is to provide a systematic and comprehensive review of existing research articles on the applications of data mining to FFD. The third is to use the review and framework to generate a roadmap for researchers and practitioners seeking to better comprehend this field.
The remainder of this article is structured as follows. Section 2 presents the methodological framework for research. Section 3 provides our classification framework for the application of data mining in FFD. Section 4 analyzes FFD research according to this classification framework. Section 5 concludes our research and suggests further research directions.

Conclusion

A critical part of any new research venture is the construction of a good classification framework and the establishment of a reference collection of relevant literature. The research area of FFD is no exception. Although the importance of data mining techniques in the detection of financial fraud has been recognized, a comprehensive classification framework or a systematic review of their application in FFD research studies is lacking. In this study, we conduct an extensive review of academic articles and provide a comprehensive bibliography and classification framework for the applications of data mining to FFD. Our intention is to inform both academics and practitioners of the areas in which specific data mining techniques can be applied to FFD, and to report and compile a systematic review of the burgeoning literature on FFD. Although our study cannot claim to be exhaustive, we believe that it will prove a useful resource for anyone interested in FFD research, and will help stimulate further interest in the field. The results of our study lead to the following conclusions.
• Of the four financial fraud categories, insurance fraud has attracted the greatest attention from researchers. Phua et al. [58] point out that insurance fraud is more likely to be committed by offenders, which may be why this type of fraud has gained so much research attention. Insurance fraud is also the area of FFD to which data mining techniques are most commonly applied (24 articles out of 49, or 49%), with automobile insurance fraud in particular being described in 17 out of the 24 articles. Artís et al. [5] argue that this is a subject of major concern for both companies and consumers.
• There are only a few studies on money laundering, mortgage fraud, mass marketing fraud, and securities and commodities fraud. Further, there is only one article that discusses the application of data mining to the detection of money laundering, and no articles reporting its application to the other three fraud types. Nevertheless, these fraudulent activities are important and deserve more research. Gao and Ye [36] emphasize that anti-money laundering research is of critical significance to national financial stability and international security, and the UN Office on Drugs and Crime (UNODC) estimates that the total amount of “black” money circulating worldwide reached 320 billion dollars in 2008 [69].
• The data mining techniques of outlier detection and visualization have seen only limited use. The lack of research on the application of outlier detection techniques to FFD may be due to the difficulty of detecting outliers. Indeed, Agyemang et al. [2] point out that outlier detection is a very complex task akin to finding a needle in a haystack. Distinct from other data mining techniques, outlier detection techniques are dedicated to finding rare patterns associated with very few data objects. In the field of FFD, outlier detection is highly suitable for distinguishing fraudulent data from authentic data, and thus deserves more investigation. Similarly, visualization techniques have a strong ability to recognize and present data anomalies, which could make the identification and quantification of fraud schemes much easier [64].
We suggest that one of the reasons for the limited number of relevant journal articles (49) published between 1997 and 2008 is the difficulty of obtaining sufficient research data. Fanning and Cogger [31] highlight the challenge of obtaining fraudulent financial statements, and note that this creates enormous obstacles in FFD research.
The most urgent challenge facing FFD is to bridge the gap between practitioners and researchers. The existing FFD research concentrates on particular types of data mining techniques or models, but future research should direct its attention toward finding more practical principles and solutions for practitioners to help them to design, develop, and implement data mining and business intelligence systems that can be applied to FFD. We predict that increasing amounts of privacy-preserving financial data will be publicly available in the near future due to increased collaboration between practitioners and researchers, and that this should lead to more investigations of data mining techniques that can be applied to privacy-preserving data.
A further problem faced by FFD is that of cost sensitivity. The cost of misclassification (false positive and false negative errors) differs, with a false negative error (misclassifying a fraudulent activity as a normal activity) usually being more costly than a false positive error (misclassifying a normal activity as a fraudulent activity) [58]. Few studies have explicitly included cost in their FFD modeling [74], but future research on the application of data mining techniques to FFD problems should take into account cost sensitivity considerations.
This study has two major limitations. First, our review applied several keywords to search only nine online databases for articles published between 1997 and 2008. A future review could be expanded in scope. Second, we considered only articles written in English. Future research could be expanded to include relevant articles published in other languages.
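
The cost-sensitivity point made in the conclusion can be illustrated with a small sketch. This is not a method from the reviewed paper: it only assumes that some model has already assigned each case a fraud score, and it picks the decision threshold that minimizes total misclassification cost. The cost values (10 for a false negative, 1 for a false positive), the toy scores, and the function names are all hypothetical and chosen purely for illustration.

```python
# Hypothetical per-error costs. The text says only that a false negative is
# usually costlier than a false positive; the 10:1 ratio is an assumption.
COST_FALSE_NEGATIVE = 10.0
COST_FALSE_POSITIVE = 1.0


def misclassification_cost(scores, labels, threshold,
                           cost_fn=COST_FALSE_NEGATIVE,
                           cost_fp=COST_FALSE_POSITIVE):
    """Total cost when every case with score >= threshold is flagged as fraud."""
    total = 0.0
    for score, is_fraud in zip(scores, labels):
        flagged = score >= threshold
        if is_fraud and not flagged:
            total += cost_fn   # fraud treated as normal (false negative)
        elif flagged and not is_fraud:
            total += cost_fp   # normal case treated as fraud (false positive)
    return total


def best_threshold(scores, labels, candidates, **costs):
    """Return the candidate threshold with the lowest total cost."""
    return min(candidates,
               key=lambda t: misclassification_cost(scores, labels, t, **costs))


# Toy fraud scores and ground-truth labels (True = fraudulent), made up here.
scores = [0.05, 0.20, 0.35, 0.40, 0.45, 0.60, 0.90]
labels = [False, False, True, False, False, True, True]
candidates = [0.1, 0.3, 0.5, 0.7]

cost_sensitive = best_threshold(scores, labels, candidates)
cost_blind = best_threshold(scores, labels, candidates, cost_fn=1.0, cost_fp=1.0)
print(cost_sensitive, cost_blind)
```

On this toy data, weighting both errors equally favors the higher threshold, whereas the assumed 10:1 cost ratio favors the lower one, accepting more false alarms in order to miss less fraud.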