Subscription fraud prevention in telecommunications using fuzzy rules and neural networks
Article code | Publication year | Length of English article |
---|---|---|
17690 | 2006 | 8 pages (PDF) |
Publisher: Elsevier - Science Direct
Journal: Expert Systems with Applications, Volume 31, Issue 2, August 2006, Pages 337–344
Abstract
A system to prevent subscription fraud in fixed telecommunications with high impact on long-distance carriers is proposed. The system consists of a classification module and a prediction module. The classification module classifies subscribers according to their previous historical behavior into four different categories: subscription fraudulent, otherwise fraudulent, insolvent and normal. The prediction module allows us to identify potential fraudulent customers at the time of subscription. The classification module was implemented using fuzzy rules. It was applied to a database containing information of over 10,000 real subscribers of a major telecom company in Chile. In this database, a subscription fraud prevalence of 2.2% was found. The prediction module was implemented as a multilayer perceptron neural network. It was able to identify 56.2% of the true fraudsters, screening only 3.5% of all the subscribers in the test set. This study shows the feasibility of significantly preventing subscription fraud in telecommunications by analyzing the application information and the customer antecedents at the time of application.
Introduction
The biggest revenue leakage area in the telecom industry is fraud (Wieland, 2004). Global telecommunications fraud losses are estimated in the tens of billions of dollars every year (FML, 2003; Hoath, 1998). The history of telecommunications crime, including several types of fraudulent activities, was reviewed by Collins (1999a, 1999b, 2000). Some authors have emphasized the importance of distinguishing between fraud prevention and fraud detection (Bolton & Hand, 2002). Fraud prevention describes measures to stop fraud from occurring in the first place. In contrast, fraud detection involves identifying fraud as quickly as possible once it has been committed. Shawe-Taylor, Howker, Gosset, Hyland, Verrelst and Moreau (2000) distinguished six different fraud scenarios: subscription fraud, the manipulation of Private Branch Exchange (PBX) facilities or dial-through fraud, freephone fraud, premium rate service fraud, handset theft and roaming fraud. Subscription fraud, which is defined as the use of telephone services with no intention of paying, is probably the most significant and prevalent telecom fraud worldwide (FML, 2003; Hoath, 1998). Subscription fraud can be subdivided into two categories: (a) for profit, i.e. mainly for selling long-distance calls, and (b) for personal usage. Subscription fraud can be committed on fixed and mobile telephones, and it is usually difficult to distinguish from bad debt, particularly if the fraud is for personal usage. Both subscription fraud and bad debt are major problems for telecom companies in developing and third world countries (Hoath, 1999). Two strategies have been proposed for detecting subscription fraud: examining account applications and tracking customer behavior (Fawcett & Provost, 2002b). Other efforts have focused on formalizing and predicting the deceiving intention of fraudsters (Barghava, Zhong, & Lu, 2003).
The detection of fraud in mobile telecommunications was investigated in the European project Advanced Security for Personal Communications Technologies (ASPeCT) (Burge & Shawe-Taylor, 2001; Shawe-Taylor et al., 1999, 2000). The ASPeCT fraud detection tool is based on investigating sequences of call detail records (CDRs), which contain the details of each mobile phone call attempt for billing purposes. The information produced for billing also contains usage behavior information valuable for fraud detection. A differential analysis is performed to identify a fraudster by profiling the behavior of a user. The analysis of user profiles is based on a comparison of recent and longer-term behavior histories derived from the toll ticket data. Alarms are activated when the usage pattern of a mobile phone changes significantly over a short period of time. The ASPeCT fraud detection tool uses a rule-based system for identifying certain frauds, and neural networks (NNs) to deal with novel or abnormal instances or scenarios. Rosset, Murad, Neumann, Idan, and Pinkas (1999) used customer data, in addition to CDRs, to discover rules for identifying subscription fraud. According to Cahill, Lambert, Pinheiro, and Sun (2002), a fraud detection algorithm has two components: (a) a summary of the activity on an account that can be kept current and (b) rules that are applied to account summaries to identify accounts with fraudulent activity. A popular approach is to reduce the CDRs for an account to several statistics that are computed for each period, e.g. average call duration, and compare them to thresholds. Fawcett and Provost (1997, 1997b) and Fawcett (2002a) developed a method for choosing account-specific thresholds rather than universal thresholds. Their procedure takes daily traffic summaries for a set of accounts that experienced at least 30 days of fraud-free traffic activity followed by a period of fraud.
This method was applied to cellular cloning, in which fraudulent usage is superimposed upon the legitimate usage of an account. For each account, a set of rules that distinguish fraud from non-fraud was developed. The superset of the rules for all accounts was then pruned by keeping only those that cover many accounts, with possibly different thresholds for different accounts. Cahill et al. (2002) defined account signatures to track legitimate call behaviors in real time. An account signature describes which call variables (e.g. call duration) are likely and which are unlikely for the account. Signatures evolve with each new call that is not considered fraudulent, so each established customer eventually has its own signature. Likewise, fraud signatures are defined for each kind of fraud using the same structure as an account signature. A call is scored by comparing its probability of belonging to the account signature with its probability of belonging to a fraud signature. For new accounts, the first calls are used to assign signature components, associating them with the calling patterns of a segment of customers with similar initial information. Cortes et al. (2001, 2003) applied large dynamic graphs, represented as the union of small sub-graphs called communities of interest, to telecommunications fraud detection. The nodes in the graphs are network IDs and the edges represent communications between pairs of network IDs. In one application, the ‘guilt by association’ argument was used to detect new cases of fraud in the network one week after the new accounts were activated. It was found that the probability that an account is fraudulent is an increasing function of the number of fraudulent nodes in its community of interest. A second example used a distance metric between communities of interest to suggest when an individual whose account had recently been disconnected for fraud had assumed a new network identity.
This assumed that the calling patterns of the new account had not changed very much from those of the previous account. In the last decade, modern intelligent systems have been applied to fraud detection. Bolton and Hand (2002) reviewed the statistical and machine learning technologies for fraud detection, including their application to detecting activities in money laundering, e-commerce, credit card fraud, telecommunication fraud and computer intrusion. Weatherford (2002) presented several real-world applications of intelligent fraud detection technologies. Kou, Lu, Sirwongwattana, and Huang (2004) surveyed fraud detection techniques used in telecommunications, as well as in credit card fraud and computer intrusion. Phua, Lee, Smith, and Gayler (2005) comprehensively surveyed data mining techniques applied to fraud detection. Hong and Weiss (2001) presented several predictive models for data mining applied to fraud detection and insurance risk assessment. Some authors have provided comprehensive surveys of applications of NNs (Vellido et al., 1999; Wong et al., 1997) and Expert Systems (ES) (Liao, 2005; Wong & Monaco, 1995) in business. Vellido et al. (1999) found that published applications of NNs at real-world scale are scant. One difficulty in publishing results is the need for confidentiality of private companies operating in a tough competitive environment. The main advantages of NNs are: (a) their suitability for handling incomplete, missing or noisy data; (b) their non-parametric nature, which requires no a priori assumptions about the distribution and/or mapping of the data; and (c) their demonstrated capability to approximate any continuous function. The lack of explanatory capabilities is considered the main shortcoming of NNs in applications.
Hence, several attempts have been made to integrate NNs and ES; a synergistic effect between them is expected, as ES are characterized by their capability of explaining their own reasoning process. Other authors have used data mining techniques to develop a decision support system for predicting customer insolvency in telecommunications (Daskalaki, Kopanas, Goudara, & Avouris, 2003). In their approach, it is assumed that insolvent customers behave differently on average from the rest of the customers, especially during a critical period preceding the due date for payment. The problem of predicting customer insolvency for a telecommunications company was found to be similar to the fraud detection problems in mobile and conventional telecommunications, as well as in credit or calling card operations. Among the common characteristics found are the following: significant loss of revenue; unpredictability of human behavior; information retrieval that involves processing huge amounts of data from several different sources; and fraudulent cases that are rare compared to legitimate ones. Ezawa and Norton (1996) constructed Bayesian networks to predict uncollectible telecommunications accounts. The related problem of subscriber churning in mobile telecommunications, i.e. the movement of subscribers from one provider to another, has been investigated using NNs (Mozer, Wolniewicz, Grimes, Johnson, & Kaushansky, 2000) and data mining (Wei & Chiu, 2002). Mozer et al. (2000) used techniques from statistical machine learning to evaluate the benefits of predicting churn. Experiments were carried out using a database of 47,000 subscribers that included information about their usage (CDRs, quality of service), billing, credit, application for service (contract details, rate plan, and credit report), and complaint history.
The outcome was expressed using a lift curve, which plots the fraction of all churners having churn probability above a threshold versus the fraction of all subscribers having churn probability above the threshold. Wei and Chiu (2002) built a model that predicts churning from subscriber contractual information and call-pattern changes extracted from CDRs. The proposed churn-prediction technique used a decision tree induction algorithm for learning. A randomly selected data set included 1.5–2% churners and 98–98.5% non-churners. The proposed technique was capable of identifying a group of 10% of the subscribers that contained 54% of the true churners. According to Chan, Fan, Prodromidis, and Stolfo (1999), the fraud detection task is characterized by (a) a skewed distribution of data, i.e. many more transactions are legitimate than fraudulent, and (b) non-uniform cost per error, e.g. the cost of failing to detect a fraud varies with each transaction. The authors addressed skewness by partitioning the data set into subsets with a desired distribution, applying mining techniques to the subsets, and combining the mined classifiers. The issue of non-uniform cost was addressed by developing an appropriate cost model and biasing the methods towards reducing cost. Stolfo, Fan, Lee, Prodromidis, and Chan (1997) argued that for the fraud detection domain, the fraud catching rate (true positive rate) and false alarm rate (false positive rate) are better metrics than overall accuracy because of unequal error costs and uneven class distributions. Given a skewed distribution in the original data, artificially balanced training data with a 50%/50% fraud/non-fraud distribution lead to classifiers with the highest true positive rate and a low false positive rate. Provost and Fawcett (1997) presented a method called the ROC convex hull, which combines techniques from receiver operating characteristic (ROC) analysis with a decision analysis method for analyzing and visualizing classifier performance.
ROC graphs depict tradeoffs between the hit (true positive) rate and the false alarm (false positive) rate. However, ROC graphs illustrate the behavior of a classifier without regard to class distribution or error cost (Fawcett, 2003). An iso-performance line is defined in ROC space such that all classifiers corresponding to points on the line have the same expected cost. Each set of class and cost distributions defines a family of iso-performance lines. The optimal classifier is the point on the convex hull that intersects the iso-performance line with the largest true positive rate intercept. The method assumes that there are only two classes and that costs do not vary within a given type of error. Chile has one of the most liberal telecommunications regimes in the world (Stehmann, 1995). Chile has pioneered the privatization and deregulation of both the long-distance and the local telephony markets (Paredes, 2005). By June 2002, there were 20 carriers operating in the long-distance market, 13 operators in the local telephony market and six operators in the mobile communications market (SUBTEL, 2002). Although the ratio of mobile phones to fixed lines reached 1.6 in 2002, the traffic generated in the fixed network was six times higher than the traffic generated in the mobile network. The aim of this research is to develop a system for scoring the risk of subscription fraud at the time of application for fixed telephone lines. In particular, our study focused on the identification of subscribers who would order new fixed lines to make use of long-distance services without paying the corresponding telephone bills. This corresponds mainly to the fraud-for-profit category. For this problem, the objective was to detect as many subscription fraudsters as possible while minimizing false alarms. Our study was conducted with real data provided by a major telecom operating in Chile.
For confidentiality reasons the telecom name and some particular information, such as the full list of variables and rules, are not published.
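The ROC iso-performance argument reviewed earlier in this introduction can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the procedure used in this study: the classifier operating points and the error costs are hypothetical, and only the 2.2% prevalence comes from the abstract. Lines of equal expected cost in ROC space have slope p(N)·c(FP) / (p(P)·c(FN)), so the preferred point on the convex hull is the one with the largest intercept TPR − slope·FPR.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def iso_performance_slope(p_pos: float, cost_fp: float, cost_fn: float) -> float:
    """Slope of equal-expected-cost lines in ROC space."""
    return ((1.0 - p_pos) * cost_fp) / (p_pos * cost_fn)


def roc_convex_hull(points: List[Point]) -> List[Point]:
    """Upper convex hull of ROC points, with the trivial (0, 0) and (1, 1) added."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull: List[Point] = []
    for p in pts:
        # Drop the last hull point while it lies on or below the chord to p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def best_operating_point(points: List[Point], p_pos: float,
                         cost_fp: float, cost_fn: float) -> Point:
    """Hull point touched by the iso-performance line with the largest TPR intercept."""
    slope = iso_performance_slope(p_pos, cost_fp, cost_fn)
    return max(roc_convex_hull(points), key=lambda pt: pt[1] - slope * pt[0])


# Hypothetical operating points of candidate classifiers (not from the paper).
candidates = [(0.02, 0.55), (0.10, 0.75), (0.20, 0.60), (0.30, 0.90)]
hull = roc_convex_hull(candidates)

# Assumed costs: a missed fraudster costs 50 times as much as a false alarm.
costly_miss = best_operating_point(candidates, p_pos=0.022, cost_fp=1.0, cost_fn=50.0)
# With equal costs and 2.2% prevalence, the slope is so steep that flagging nobody wins.
equal_cost = best_operating_point(candidates, p_pos=0.022, cost_fp=1.0, cost_fn=1.0)
print(hull, costly_miss, equal_cost)
```

The equal-cost case shows why overall accuracy is a poor guide when fraud is rare: unless a missed fraud is assumed to cost far more than a false alarm, the trivial classifier at (0, 0) minimizes expected cost.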
Conclusion
In contrast to fraud detection systems that operate once the fraud has been committed, the proposed system is predictive and operates at the time of application. Demographics and commercial antecedents, as well as other characteristics associated with the application for a new phone line, were used as predictors. The predictive module flagged 3.5% of the subscribers, a group that contained 56.2% of the true fraudsters. A manual analysis of errors showed that most of the false positive (FP) cases corresponded to the insolvent category. One third of these corresponded to customers who never paid their bills but had a typical residential average expenditure. This pattern corresponds to the category of fraud for personal usage, and could be considered a kind of subscription fraud. In the future, the proposed system could be enhanced by adding information about the subscriber's call patterns. In this way, cases marked as potentially risky by the system at the time of application could be followed up after the installation date for closer examination. It is well known that the patterns, levels and costs of fraud change very quickly over time. Because of this, any fraud system could rapidly become obsolete. In our system architecture, the classification module should operate continuously to monitor the prevalence of fraud and to provide new fraud cases for adjusting the prediction module. This study was carried out on fixed telecommunications, but the techniques proposed here could be extended to subscription fraud in mobile communications, as well as to other markets.
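As a rough worked example, the reported screening result can be turned into lift, precision and false positive rate. The Python sketch below assumes that the 2.2% prevalence found in the full database also holds in the test set, which the text does not state; the function name `screening_metrics` is illustrative only.

```python
def screening_metrics(flagged_fraction: float, recall: float, prevalence: float) -> dict:
    """Derive lift, precision and false positive rate from three summary rates."""
    true_pos = recall * prevalence           # flagged fraudsters, as a share of all subscribers
    false_pos = flagged_fraction - true_pos  # flagged non-fraudsters, same basis
    return {
        "lift": recall / flagged_fraction,
        "precision": true_pos / flagged_fraction,
        "false_positive_rate": false_pos / (1.0 - prevalence),
    }


# 3.5% of subscribers flagged, containing 56.2% of fraudsters (reported figures).
# prevalence=0.022 is an ASSUMPTION carried over from the full database.
metrics = screening_metrics(flagged_fraction=0.035, recall=0.562, prevalence=0.022)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under that assumption, the flagged group is about 16 times richer in fraudsters than a random sample, roughly 35% of flagged applicants are true fraudsters, and about 2.3% of legitimate applicants are flagged.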