Title

Data mining journal entries for fraud detection: An exploratory study

Article code | Publication year | Length of English article
17722 | 2010 | 25 pages (PDF)

Source

Publisher: Elsevier - ScienceDirect

Journal: International Journal of Accounting Information Systems, Volume 11, Issue 3, September 2010, Pages 157–181

Keywords

Fraud, Journal entries, Data mining, Auditing, Accounting information systems

Abstract

Fraud detection has become a critical component of financial audits, and audit standards have placed heightened emphasis on journal entries as part of fraud detection. This paper canvasses perspectives on applying data mining techniques to journal entries. The main impediment to research on journal entry data mining has been getting access to journal entry data sets, which may explain why the published research in this area is a null set. For this project, we had access to journal entry data sets for 29 different organizations. Our initial exploratory tests of these data sets produced some interesting preliminary findings. (1) For all 29 entities, the distribution of the first digits of journal dollar amounts differed from that expected under Benford's Law. (2) Unlike first digits, which are expected to follow a logarithmic distribution, last digits would be expected to follow a uniform distribution. Our test found that the distribution was not uniform for many of the entities; in fact, in eight entities one digit value occurred three times more often than expected. (3) We compared the number of accounts associated with the five most frequently occurring last-three-digit combinations. Four entities had a very high number of occurrences of these combinations involving only a small set of accounts; one entity had a low number of occurrences involving a large set of accounts; and 24 had a low number of occurrences involving a small set of accounts. In general, the first group of four entities would probably pose the highest risk of fraud, because this pattern could indicate that a fraudster is covering up or falsifying a particular class of transactions. In the future, we will apply more data mining techniques to discover other patterns and relationships in the data sets.
We also want to seed the dataset with fraud indicators (e.g., pairs of accounts that would not be expected in a journal entry) and compare the sensitivity of the different data mining techniques to find these seeded indicators.
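The first two digit tests in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code. The function names are hypothetical, and amounts are assumed to be currency values rounded to cents. The last-digit test assumes, hypothetically, that "last digit" means the last whole-dollar digit. The cut-offs are the standard 5% chi-square critical values: 15.507 for 8 degrees of freedom and 16.919 for 9.

```python
import math
from collections import Counter
from typing import Iterable, Optional, Tuple

# Benford's Law: expected share of first digit d is log10(1 + 1/d), d = 1..9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# 5% critical values of the chi-square distribution.
CHI2_CRIT_DF8 = 15.507  # 9 first-digit categories -> 8 degrees of freedom
CHI2_CRIT_DF9 = 16.919  # 10 last-digit categories -> 9 degrees of freedom


def first_digit(amount: float) -> Optional[int]:
    """Leading non-zero digit of |amount| rounded to cents; None if it rounds to zero."""
    digits = f"{abs(amount):.2f}".lstrip("0.")
    return int(digits[0]) if digits else None


def chi_square(observed: Counter, expected_share: dict, n: int) -> float:
    """Pearson chi-square statistic of observed counts against expected shares."""
    return sum(
        (observed.get(k, 0) - n * p) ** 2 / (n * p) for k, p in expected_share.items()
    )


def benford_first_digit_test(amounts: Iterable[float]) -> Tuple[float, bool]:
    """Return (chi-square statistic, True if it exceeds the 5% critical value)."""
    digits = [d for d in map(first_digit, amounts) if d is not None]
    if not digits:
        raise ValueError("no non-zero amounts")
    chi2 = chi_square(Counter(digits), BENFORD, len(digits))
    return chi2, chi2 > CHI2_CRIT_DF8


def last_digit_test(amounts: Iterable[float]) -> Tuple[float, bool, float]:
    """Test the last whole-dollar digit (hypothetical definition) against a uniform distribution.

    Amounts under 10 are skipped because their last digit is also their first digit.
    Also returns the largest observed/expected ratio (3.0 means one digit value
    appears three times as often as a uniform distribution predicts).
    """
    digits = [int(abs(a)) % 10 for a in amounts if abs(a) >= 10]
    if not digits:
        raise ValueError("no amounts of 10 or more")
    counts = Counter(digits)
    uniform = {d: 0.1 for d in range(10)}
    chi2 = chi_square(counts, uniform, len(digits))
    max_ratio = max(counts.values()) / (len(digits) * 0.1)
    return chi2, chi2 > CHI2_CRIT_DF9, max_ratio
```

For example, `benford_first_digit_test([500.0] * 100)` raises the flag because every amount starts with 5. A flag only says that the observed digits differ from the reference distribution. Benford's Law rests on assumptions about the underlying data that journal entries may not meet, so a flag is a prompt for investigation rather than evidence of fraud.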

Introduction

This paper explores emerging research issues related to the application of statistical data mining technology to fraud detection in journal entries. Over the last decade, the detection of fraud, and particularly of financial statement fraud¹, has become an increasingly important component of the financial statement audit. A number of important financial statement frauds have involved fraudulent journal entries or management override of controls carried out through journal entries within computerized accounting information systems. Such journal entries have often been used in well-known types of financial statement fraud, including inappropriate revenue recognition, inappropriate capitalization of expenses and a wide variety of inappropriate accruals. Given that fraudsters are likely to respond to known patterns of fraudulent journal entries, such as non-standard journal entries², and given the enormous volume of journal entries in typical computerized accounting information systems, it is questionable whether direct auditor assessment of small samples of journal entries will effectively and efficiently detect likely patterns of fraudulent activity. Auditing standards in the U.S. and internationally have increasingly mandated automated auditor analysis of journal entries, and some degree of direct computerized analysis of journal entries is now part of the toolkit of audit teams on major audit engagements. Little is known, however, about the efficacy of this important class of audit procedures. Although there are large bodies of literature on data mining in other domains, a broad search of the audit literature did not locate any research on the data mining of journal entries.³ Yet auditing standards require that auditors consider fraud in their financial audits, and those standards specifically require that auditors examine journal entries.
Based on the successful applications of data mining in other domains, data mining appears to have the potential to improve both the effectiveness and the efficiency of auditors' analysis of journal entries and fraud detection. This is in line with recent calls for research on the role of journal entries in the audit process (Curtis et al. 2009). In this paper, we set out the underlying issues that will guide effective and efficient data mining of journal entries. We review the standards from auditing regulators and guidance from the professional audit community, and explore the potential for statistical data mining of large sets of journal entries. We then examine the statistical properties of journal entries in an exploratory study, taking the first steps toward data mining such entries, which we test on a set of journal entries from 29 entities. We consider the essential elements of the journal entries, explore their statistical properties with a focus on their departures from known distributions, and identify some preliminary patterns within the entries. The paper makes an important contribution to the literature on data analysis, data mining and fraud detection within journal entries. The remainder of this paper proceeds as follows. The next section provides general background material, then specifically addresses the role of journal entries in committing fraud and draws lessons from recent frauds that used journal entries. The section also summarizes the responses of standard setters to the heightened fraud risk environment since the late 1990s. In the third section, we explore the issues involved in data mining journal entries. We discuss both the technical and the statistical properties of journal entries, and how data mining can leverage the economic relationships embedded in the account combinations represented in a journal entry. In the fourth section, we introduce our data set.
In the next section, we discuss our initial exploration of the statistical properties of the journal entries in our data set. In the final section, we draw conclusions and point to a research agenda.

Conclusion

Fraud detection has become an increasingly important element of the financial statement audit. There is clear evidence of the importance of journal entries in the conduct of financial statement frauds over the last decade, one of the most egregious being WorldCom. It is hardly surprising, then, that a key element of recent professional developments increasing the fraud detection requirements in the financial statement audit has been significantly heightened requirements to assess the controls over journal entries and to conduct substantive tests on them. Unfortunately, research on data mining journal entries from a fraud detection perspective is essentially a null set. In this paper, we canvass a number of perspectives on such data mining. The nature and form of the population of journal entries posted to the general ledger in computerized accounting information systems is a function of several technological and entity-level characteristics. In a modern ERP system, journal entries will be highly granular, even atomic. In more traditional accounting information systems, general ledger journal entries may be highly aggregated, because the general ledger receives summarized journals from subsidiary systems. These summarized journal entries capture information with a very different profile from entries in an ERP system. Journal entries flow from a variety of other systems and business processes: they may come from consolidation systems, from automated or semi-automated general ledger processes and from manual entries. Data mining approaches must be sufficiently flexible to accommodate these different data structures and flows. There is a clear and pressing need for research on a variety of interrelated areas in data mining journal entries.
Data mining journal entries must bring together five characteristics, namely (a) amount, (b) chart of accounts code, to establish the impact on the general ledger, (c) source of the journal entry, (d) control characteristics surrounding the individual journal entry and (e) opening and, by extension, closing general ledger balances. The biggest impediment to research on data mining of journal entries is getting access to one or more real-world journal entry data sets. For this project, we had access to 36 different data sets, of which 29 were appropriate for our initial analysis; the seven excluded data sets had fewer than 12 months of data. Many more data mining techniques could potentially be applied to these data sets. However, our digital analysis techniques did bring up some interesting preliminary findings, including:
• For all 29 entities we tested, a chi-square test indicates that the distribution of the first digits of journal dollar amounts differs from that expected under Benford's Law. If, on the one hand, we assume that Benford's Law should apply to journal entries, these variations mean that auditors would have a tremendous number of red flags to investigate. On the other hand, Benford's Law builds on certain assumptions about the underlying data, so further research is needed to explore whether, or how, journal entries violate one or more of those assumptions.
• Professional guidance recommends identifying journal entries that contain round numbers or a consistent ending number. Unlike first digits, which are expected to follow a logarithmic distribution, last digits would be expected to follow a uniform distribution. Our test found that the distribution was definitely not uniform for many of the entities. In eight of the 29 entities, one of the fourth-digit values occurred three times more often than expected. However, there could be situations in organizations that make some numbers appear more often, and auditors would have to identify them.
• Since investigating false positives could be expensive, auditors will have to develop and select audit methodologies appropriate to the characteristics of the journal entries. We compared the number of accounts associated with the five most frequently occurring last-three-digit combinations. Of the 29 entities, four had a very high number of occurrences of the top-five combinations involving only a small set of accounts; one had a low number of occurrences involving a large set of accounts; and 24 had a low number of occurrences involving a small set of accounts. In general, all else being equal, the first four firms probably pose the highest risk of fraud for the auditors, since they had a very high number of rounded-number or consistent-number transactions posted to just a few accounts, which could indicate that a fraudster is covering up or falsifying a particular class of transactions.
• In terms of general patterns of transaction volumes, there did not appear to be any. We expected to see increases at quarter-end or year-end, but we did not find consistent examples of this in our 29 entities.
Our initial analysis of the 29 journal entry data sets is only the beginning of the potential analysis of these data sets. In the future, we expect to apply many more data mining techniques to discover other patterns and relationships in the data sets. We also want to start seeding the data sets with fraud indicators (e.g., pairs of accounts that would not be expected in a journal entry) and compare the sensitivity of the different data mining techniques in finding these seeded indicators.
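The comparison of last-three-digit combinations against the accounts they touch can be sketched the same way. Again, this is a hypothetical illustration rather than the authors' procedure. It assumes each journal line is available as an (account code, amount) pair and takes the last three digits of the whole-dollar amount, with cents truncated. It also skips amounts below 100, which have fewer than three whole-dollar digits.

```python
from collections import Counter, defaultdict
from typing import Iterable, List, Tuple


def top_last_three_digit_combos(
    entries: Iterable[Tuple[str, float]], k: int = 5
) -> List[Tuple[str, int, int]]:
    """Return (combination, occurrences, distinct accounts) for the k most
    frequent last-three-digit combinations of the whole-dollar amount.

    entries: hypothetical (account_code, amount) pairs, one per journal line.
    """
    occurrences: Counter = Counter()
    accounts = defaultdict(set)  # combination -> set of account codes it appears in
    for account, amount in entries:
        dollars = int(abs(amount))  # cents truncated (assumption)
        if dollars < 100:
            continue  # fewer than three whole-dollar digits
        combo = f"{dollars % 1000:03d}"  # e.g. 5000 -> "000", 1250 -> "250"
        occurrences[combo] += 1
        accounts[combo].add(account)
    # most_common orders by count; ties keep first-encountered order
    return [(c, n, len(accounts[c])) for c, n in occurrences.most_common(k)]
```

For each entity, the result lists how often each top combination occurs and how many distinct accounts it touches. A profile with many occurrences concentrated in a few accounts corresponds to the group the authors rank as the highest risk. The sketch deliberately sets no threshold for "very high" or "small set", since the text does not give one.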