English ISI Article No. 17752

Title
A probabilistic approach to fraud detection in telecommunications
Article code: 17752
Publication year: 2012
Number of pages: 13 (PDF)
Source

Publisher: Elsevier - ScienceDirect

Journal: Knowledge-Based Systems, Volume 26, February 2012, Pages 246–258

Keywords
Kullback–Leibler divergence, Latent Dirichlet Allocation, Fraud detection, User profiling, Telecommunications

Abstract

In this paper, a method for telecommunications fraud detection is proposed. The method is based on user profiling with Latent Dirichlet Allocation (LDA). Fraudulent behavior is detected by a threshold-type classification algorithm that assigns telecommunication accounts to one of two classes: fraudulent account and non-fraudulent account. The paper also provides a method for automatic threshold computation. The accounts are classified using the Kullback–Leibler divergence (KL-divergence); therefore, we also introduce three methods for approximating the KL-divergence between two LDAs. Finally, the results of an experimental study on KL-divergence approximation and fraud detection in telecommunications are reported.

Introduction

The Kullback–Leibler divergence (KL-divergence) is a well-known quantity, widely used in probability theory, statistics, and information theory. It was introduced by Kullback and Leibler in [16]. In probability and statistics it measures the dissimilarity between two probability distributions, while in information theory it is known as relative entropy. The KL-divergence has a wide range of applications, including multivariate data analysis (for example, pattern recognition and discriminant analysis), estimation, approximation, and regression. We focus on its use in fraud detection in telecommunications, which can be regarded as a recognition problem. In our study, this recognition reduces to binary classification, i.e., assigning each account to one of two classes: fraudulent account and non-fraudulent account. We applied a simple threshold-type classification algorithm with automatic threshold setting, which is computationally simple and efficient.

There are many fraud detection problems, including credit card fraud, money laundering, computer intrusion, and telecommunications fraud, to name but a few. Among them, fraud detection in telecommunications appears to be one of the most difficult, since a large amount of data needs to be analyzed while only a small number of fraudulent call samples is available as training data for learning-based methods. This essentially limits the application of learning-based techniques such as neural-network-based classifiers. Fraud detection systems generally fall into two categories: rule-based systems and user-profile-based systems. The latter approach is regarded as more effective and has become more popular in real-world applications.
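For discrete distributions p and q over the same outcomes, the KL-divergence described above is D_KL(p || q) = sum_i p_i log(p_i / q_i). The sketch below computes it directly, in nats; it illustrates the quantity itself, not the paper's approximations for LDA models. The function name `kl_divergence` is illustrative. The usage lines also show that the measure is not symmetric, so the order of its arguments matters when two profiles are compared.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i), in nats.

    p and q are discrete distributions over the same outcomes. Terms with
    p_i == 0 contribute nothing; if q_i == 0 where p_i > 0 the result is
    infinite.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue
        if q_i == 0.0:
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # about 0.511
print(kl_divergence(q, p))  # about 0.368 (not symmetric)
print(kl_divergence(p, p))  # 0.0
```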

Conclusion

The paper proposed a method for fraud detection in telecommunications based on user profiling and classification. Telecommunication users were profiled with the LDA probabilistic model. Fraudulent activity was detected by a threshold-type classification algorithm that assigns each telecommunication account to one of two classes: fraudulent account and non-fraudulent account. The classification was based on the KL-divergence between the LDA model of the classified account and the LDA model of a reference account. The paper also provided a method for automatic threshold computation. In addition, we introduced the Multinomial Mixture Model (MMM), defined by formula (9), and proposed two methods for approximating the KL-divergence between a pair of such models. This was a necessary step toward the algorithms for approximating the KL-divergence between two LDA models, since the LDA model incorporates two MMMs, as shown in the joint distribution (16). The ability to determine the KL-divergence between LDAs was, in turn, required by our classification algorithm. Accordingly, the paper then introduced two methods for approximating the KL-divergence between two LDA models and, in the experimental study, compared them with the Monte-Carlo simulation method. For small numbers of latent classes (one to six), our methods yield output very similar to that of the Monte-Carlo method, which, given their much lower computational complexity, can be regarded as a satisfactory result. Finally, we presented the results of telecommunications fraud detection. We assessed the performance of our method by comparison with the GMM-based method, using ROC curves and their AUROC, HDZF, and LFMD values as evaluation metrics. The experiments were carried out on both real-world and simulated data.
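The conclusion names Monte-Carlo simulation as the baseline for the KL-divergence approximations. Formula (9) is not reproduced in this section, so the sketch below assumes the standard multinomial mixture p(x) = sum_k w_k Mult(x; n, theta_k) over count vectors x with a fixed number of trials n and strictly positive theta entries. It implements the generic sampling estimator D_KL(p || q) ≈ (1/S) sum_s [log p(x_s) - log q(x_s)] with x_s drawn from p; it is not one of the authors' approximation methods. The names `log_mixture_kernel` and `mc_kl_multinomial_mixture` and their parameters are hypothetical.

```python
import numpy as np

def log_mixture_kernel(x, weights, thetas):
    """log sum_k w_k * prod_i theta_ki ** x_i for each row of x.

    The multinomial coefficient n! / prod_i x_i! is omitted: it is the same
    for p and q when both use the same n, so it cancels in log p - log q.
    Shapes: x (S, V) counts, weights (K,), thetas (K, V) with positive entries.
    """
    log_terms = np.log(weights)[None, :] + x @ np.log(thetas).T  # (S, K)
    m = log_terms.max(axis=1, keepdims=True)  # log-sum-exp for stability
    return m[:, 0] + np.log(np.exp(log_terms - m).sum(axis=1))

def mc_kl_multinomial_mixture(p, q, n_trials, n_samples=10_000, seed=0):
    """Monte-Carlo estimate of KL(p || q) for two multinomial mixtures.

    p and q are (weights, thetas) pairs over the same V categories.
    """
    rng = np.random.default_rng(seed)
    w_p, th_p = (np.asarray(a, dtype=float) for a in p)
    w_q, th_q = (np.asarray(a, dtype=float) for a in q)
    # Sample a component label from p's weights, then counts from that component.
    comps = rng.choice(len(w_p), size=n_samples, p=w_p)
    x = np.array([rng.multinomial(n_trials, th_p[k]) for k in comps], dtype=float)
    return float(np.mean(log_mixture_kernel(x, w_p, th_p)
                         - log_mixture_kernel(x, w_q, th_q)))

# Sanity check with single-component "mixtures": the exact value is
# n * KL(a || b) = 10 * 0.5 * log(25 / 9), about 5.11.
p = ([1.0], [[0.5, 0.5]])
q = ([1.0], [[0.9, 0.1]])
print(mc_kl_multinomial_mixture(p, q, n_trials=10))
```

The estimator's cost grows linearly with the number of samples, which is the practical reason cheaper analytic approximations are attractive.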
Our fraud detection method proved superior to the GMM-based method in both of the studied cases, which confirms the effectiveness of the proposed approach. A limitation of the approach is that, in certain cases, users who fall outside the threshold circle of the classification algorithm are not necessarily fraudsters; they may be, for example, the best customers of a telecommunication provider (discussed in Section 7). On the other hand, the method has two advantages: it can cope with a small number of fraudulent samples because it avoids a training process, and it uses only three CDR features (destination, start time, and call duration), which makes detection simpler and faster. Future research may focus on integrating various data analysis techniques to increase the final fraud detection rate, as done, for example, in [18]. Further work may also address improvements to the KL-divergence approximation between LDAs and to the threshold-setting method.
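The conclusion notes that only three CDR features are used: destination, start time, and call duration. How these features are turned into the discrete "words" of an LDA profile is not described in this section, so the sketch below assumes one possible encoding: destination as a category label, start time reduced to the hour of day, and duration binned into three hypothetical classes. Each account then becomes a bag of such tokens that could be passed to an LDA implementation; fitting the model is not shown. All function names and bin edges are illustrative assumptions.

```python
from collections import Counter
from datetime import datetime

def duration_bucket(seconds):
    """Hypothetical duration bins: under 1 min, under 10 min, longer."""
    if seconds < 60:
        return "short"
    if seconds < 600:
        return "medium"
    return "long"

def cdr_to_token(destination, start_time, duration_s):
    """Encode one call record as a single discrete token built from the
    three CDR features: destination, start hour, and duration class."""
    return f"{destination}|h{start_time.hour:02d}|{duration_bucket(duration_s)}"

def account_documents(cdrs):
    """Group call records into one bag-of-words document per account.

    cdrs: iterable of (account_id, destination, start_time, duration_s).
    Returns {account_id: Counter mapping token -> number of calls}.
    """
    docs = {}
    for account, destination, start, duration in cdrs:
        token = cdr_to_token(destination, start, duration)
        docs.setdefault(account, Counter())[token] += 1
    return docs

records = [
    ("A", "local", datetime(2012, 2, 1, 9, 15), 45),
    ("A", "local", datetime(2012, 2, 1, 9, 40), 30),
    ("B", "international", datetime(2012, 2, 1, 2, 5), 1800),
]
for account, doc in account_documents(records).items():
    print(account, dict(doc))
```

Coarser bins give fewer distinct tokens and denser counts per account, which matters when an account has only a handful of calls.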