Download English ISI article No. 17747
Article title

Detection of frauds and other non-technical losses in a power utility using Pearson coefficient, Bayesian networks and decision trees
Article code: 17747
Publication year: 2012
Length of the English article: 9 pages (PDF)
Source

Publisher: Elsevier - ScienceDirect

Journal: International Journal of Electrical Power & Energy Systems, Volume 34, Issue 1, January 2012, Pages 90–98

Keywords
Non-technical loss, Data mining, Pearson correlation coefficient, Decision tree, Bayesian network
Article preview

Abstract

For the electrical sector, minimizing non-technical losses is a very important task because they have a high impact on company profits. This paper describes new advances in the detection of non-technical losses among the customers of one of the most important power utilities in Spain and Latin America: Endesa Company. The study is part of the MIDAS project, which is being developed at the Electronic Technology Department of the University of Seville with funding from this company. The advances presented in this article aim to detect customers with anomalous drops in their consumed energy (the most frequent symptom of a non-technical loss in a customer) by means of a windowed analysis using the Pearson coefficient. In addition, Bayesian networks and decision trees have been used to detect other types of non-technical loss patterns. The algorithms have been tested with real customers from the Endesa Company database. The system is currently in operation.

Introduction

A non-technical loss (NTL) is defined as any consumed energy or service that is not billed because of a measurement equipment failure or an ill-intentioned and fraudulent manipulation of that equipment. For the electrical distribution business, detecting NTLs is a very important task; for instance, in Spain it is estimated that fraud accounts for about 35–45% of total NTLs in terms of energy. Although the literature contains many works on fraud and NTL detection in other fields [1], [2], [3], [4], [5], [6], [7], [8] and [9], there is not much research on NTL detection in power utilities [10], [11], [12], [13], [14] and [15], even though the percentage of NTLs is high in this field. Besides, these works are basically theoretical and limited to a few types of detection techniques (rough sets, support vector machines and wavelet transform). Thus, the current methodology adopted by electrical companies for the detection of NTLs is basically of two kinds. The first is based on making in situ inspections of some users (chosen after a consumption study) from a previously chosen zone. The second is based on the study of users who have null consumption during a certain period. The main problem with the first alternative is that it requires a large number of inspectors and, therefore, involves a high cost. The problem with the second option is that it can detect only users with null consumption (these are only the clearest cases of non-technical losses) and not those customers whose consumption is non-null but considerably lower than it should be. Nowadays, data mining techniques [16] and [17] are applied in many fields, and the power utility industry is one in which they have recently met with success [18], [19], [20], [21] and [22].
This work is part of the MIDAS project, which is being developed at the Electronic Technology Department of the University of Seville with funding from the electrical company. We have previously presented results of the MIDAS project using a detection process based on rule extraction and clustering techniques [23] and [24], as well as preliminary versions of the algorithms for the detection of drops [25]. This article describes new advances in the data mining process applied to the detection of NTLs in power utilities. It also includes a complete NTL detection process based on the databases of the Endesa Company. Additional lines of work have been developed in order to detect other types of NTLs. One of the ideas behind these methods is to identify patterns of drastic drops in consumption, because the main symptom of an NTL is known to be a drop in the customer's billed energy. For this purpose, the methods apply the Pearson coefficient [26] and [27] to the evolution of the customer's consumption. Besides, in order to detect NTLs that show other types of consumption patterns, a model based on a Bayesian network [18] and a decision tree [18] has been developed. A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. Bayesian networks are applied in cases of uncertainty, when certain probabilities are known and unknown probabilities are sought under specific conditions. Some applications of Bayesian networks are churn prevention [28], diagnosis generation in medicine [29], pattern recognition in vision [30], and fault diagnosis [31] as well as forecasting [32] in power systems. These networks have also been used to detect anomalies and fraud in disciplines other than power utilities, such as credit cards or telecommunication networks [2], [33] and [34].
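As a rough illustration of applying the Pearson coefficient to the evolution of a customer's consumption, the sketch below correlates consumption with a time index over a sliding window and flags windows with a strongly negative coefficient. The window length, the threshold, the function names and the sample readings are all assumptions made for this example; the paper's actual windowing scheme may differ.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # flat series: no trend to measure
    return cov / sqrt(var_x * var_y)


def windows_with_drop(consumption, window=6, threshold=-0.8):
    """Return start indices of windows in which consumption falls steadily.

    `window` and `threshold` are illustrative values, not taken from the paper.
    """
    time_index = list(range(window))
    flagged = []
    for start in range(len(consumption) - window + 1):
        r = pearson(time_index, consumption[start:start + window])
        if r <= threshold:
            flagged.append(start)
    return flagged


# Hypothetical periodic readings (kWh): one stable series, one with a sustained drop.
steady = [500, 510, 495, 505, 500, 498, 502, 507]
dropping = [500, 510, 495, 505, 420, 340, 260, 180, 100, 95]

print(windows_with_drop(steady))
print(windows_with_drop(dropping))
```

A coefficient close to -1 means that consumption decreases almost linearly across the window, which is why a negative threshold is used here as the drop indicator.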
On the other hand, some works suggest the use of decision trees in power systems [35] and [36] and for detecting some types of fraud [7] and [37]. However, apart from our studies [23] and [24], as stated above, little research has been done on the detection of NTLs and fraud in power utilities [12], [13], [14], [15], [16] and [17], and none on the detection of consumption drops or the development of models using Bayesian networks. To carry out the data mining process (including the algorithms as well as the Bayesian network and decision tree models), we used IBM SPSS Modeler 14, a software package used extensively in data mining. This software provides quick access to the databases and many libraries for generating models such as clustering processes, decision trees, neural networks and Bayesian networks. The article is structured as follows: Section 2 describes the sample set that was used to develop the algorithms and to select the customers to be inspected by the company. Sections 3 and 4 describe the developed models. Finally, Section 5 contains the results as well as the conclusions of the study.
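The kind of inference a Bayesian network performs (known probabilities in, an unknown conditional probability out) can be illustrated with its smallest case: one hypothesis node and one evidence node, resolved with Bayes' rule. This is only a sketch; the variable names and every probability below are hypothetical and are not taken from the paper or from IBM SPSS Modeler.

```python
def posterior(prior, likelihood, false_alarm):
    """P(hypothesis | evidence) for a binary hypothesis, via Bayes' rule.

    prior       -- P(hypothesis)
    likelihood  -- P(evidence | hypothesis)
    false_alarm -- P(evidence | not hypothesis)
    """
    evidence = likelihood * prior + false_alarm * (1.0 - prior)
    if evidence == 0:
        raise ValueError("the evidence has zero probability under both hypotheses")
    return likelihood * prior / evidence


# Hypothetical numbers: 5% of customers have an NTL, 60% of NTL customers show
# a given consumption pattern, and 10% of regular customers show it as well.
p_ntl_given_pattern = posterior(prior=0.05, likelihood=0.60, false_alarm=0.10)
print(round(p_ntl_given_pattern, 2))
```

A full Bayesian network chains many such conditional tables across several variables; the two-node case only shows the direction of the reasoning.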

Conclusion

Prior to sending the customers for inspection, the results of all generated models were cross-checked to see how many customers matched and to ensure that the different algorithms were not redundant. After merging the customers detected by each of the algorithms (the 2 algorithms to detect drops, the Bayesian network and the decision tree), the results shown in Table 1 were obtained.

Table 1. Customers selected to be inspected by the company.

Algorithm                                              No. of customers selected
Progressive drop and stabilization                     18
Drastic drop and stabilization                         23
Bayesian network                                       43
Decision tree                                          64
Total                                                  148
After merging the results of the previous algorithms   140

As is evident from the table, only 8 of the 148 selected customers were detected by more than one algorithm. We could therefore deduce that each algorithm detected a different type of NTL pattern. Thus, a list of 140 customers with an evident and suspicious consumption pattern pointing to NTLs was obtained. These cases could be due to a drop in the electrical demand of the customer's business, but never to a low contract, because in that case they would have reading information in their equipment. It was therefore useful, as additional information, to study the type of business of these suspicious customers in order to know whether it was a business in which demand is currently falling (e.g., currently, the construction business in Spain). Thus, we studied the business information for each customer in order to control for this fact and to avoid unnecessary inspections. The inspectors of the Company know that the following types of business are more likely to have consumption drops that are inherent to their use of energy (and not due to possible NTLs): wells, lighting, irrigation pumps, water purification and construction (previously mentioned).
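The merge arithmetic reported in Table 1 can be checked with a short sketch. The per-algorithm counts and the merged total come from the table; the `merge_selections` helper and the customer IDs are hypothetical and only show how a union of per-algorithm lists exposes customers picked more than once.

```python
# Counts per algorithm as reported in Table 1.
selected = {
    "Progressive drop and stabilization": 18,
    "Drastic drop and stabilization": 23,
    "Bayesian network": 43,
    "Decision tree": 64,
}
total_selected = sum(selected.values())
merged_unique = 140  # reported size of the list after merging
repeated_selections = total_selected - merged_unique


def merge_selections(lists_by_algorithm):
    """Union the per-algorithm customer IDs; also return IDs chosen more than once."""
    seen, repeated = set(), set()
    for ids in lists_by_algorithm.values():
        for customer_id in ids:
            if customer_id in seen:
                repeated.add(customer_id)
            else:
                seen.add(customer_id)
    return seen, repeated


# Hypothetical customer IDs, for illustration only.
example = {
    "drops": ["C1", "C2"],
    "bayesian_network": ["C2", "C3"],
    "decision_tree": ["C4"],
}
merged, repeated = merge_selections(example)
print(total_selected, repeated_selections, sorted(merged), sorted(repeated))
```

A small overlap relative to the total is what supports the reading that the algorithms capture largely different consumption patterns.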
So, from the 148 selected customers, we filtered out those with these types of contracts (which are therefore likely to show an anomalous consumption pattern), and a definitive list of 101 was obtained. In summary, a complete flow chart is shown in Fig. 10. This diagram shows the global scheme and the different steps for the detection of NTLs.

Fig. 10. Flow chart of the detection process.

Currently, the Endesa Company is carrying out inspections on a set of customers drawn from those detected by the presented methods. Up to now, according to the results of the in situ inspections, the system has reached a success rate of 38%. These results are considered very satisfactory taking into account, first, the success rate of the company in its routine inspections (less than 10%) and, second, the small amount of input information used by the algorithms (only the evolution of the customer's consumption and the type of contract). In conclusion, NTL is an important issue in power utilities because it has a high impact on company profits. Despite this, the NTL detection methodology of the companies is currently very limited, since they use detection methods that do not exploit data mining techniques. Different methods to detect NTLs have been developed and tested on a real database supplied by the Endesa Company. Specifically, this paper has presented a line of work based on 5 different algorithms for the detection of NTLs using the Pearson coefficient, Bayesian networks and decision trees. Before sending the customers for inspection, a table analysis involving a filtering task by type of contract was carried out in order to improve the accuracy of the detections. The system obtained a success rate of 38% in the inspection of real customers aided by the presented algorithms.
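The filtering by type of business and the success-rate calculation described above can be sketched as follows. The excluded business types come from the text; the record layout, the string labels, the function names and the sample customers are assumptions made for this example.

```python
# Business types that inspectors associate with consumption drops inherent to
# their use of energy. The list is from the text; the labels are assumptions.
EXCLUDED_BUSINESS_TYPES = {
    "wells",
    "lighting",
    "irrigation pumps",
    "water purification",
    "construction",
}


def filter_for_inspection(customers):
    """Keep only customers whose business type is not in the exclusion list."""
    return [c for c in customers if c["business"] not in EXCLUDED_BUSINESS_TYPES]


def success_rate(confirmed, inspected):
    """Fraction of inspections that confirmed an NTL."""
    if inspected == 0:
        raise ValueError("no inspections carried out")
    return confirmed / inspected


# Hypothetical records, for illustration only.
suspects = [
    {"id": "C1", "business": "bakery"},
    {"id": "C2", "business": "construction"},
    {"id": "C3", "business": "irrigation pumps"},
    {"id": "C4", "business": "hotel"},
]
shortlist = filter_for_inspection(suspects)
print([c["id"] for c in shortlist])
```

With this definition, a reported success rate of 38% corresponds to 38 confirmed NTLs per 100 inspections.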
At present, in terms of energy, these detections are equivalent to a total energy recovery of about 2 million kWh, which means that a large amount of money is recovered for the Endesa Company. The contributions of this work with respect to the existing ones are as follows:
– The development of a system based on various complementary models, applying techniques not implemented until now in the literature on the topic [11], [12], [13], [14], [15], [16], [23], [24] and [25].
– A system tested in the field and currently in operation at one of the most important power utilities in the world (Endesa Company).
– Good results (better than those obtained with the lines of work existing in the literature), obtained in both verification (the different tests of the algorithms) and validation (the inspections carried out in situ).
Also, to avoid having this filtering task carried out by a human, we are currently developing an expert system that automatically performs the last filtering process (depending on the type of business of the customers) and the final selection of customers (from those selected by the algorithms described in this paper) to be inspected in situ by the Endesa Company.