An Effective Early Fraud Detection Method for Online Auctions
Article code | Publication year | Length of English article |
---|---|---|
17755 | 2012 | 15 pages (PDF) |
Publisher: Elsevier - Science Direct
Journal : Electronic Commerce Research and Applications, Volume 11, Issue 4, July–August 2012, Pages 346–360
English Abstract
While online auctions continue to increase, so does the incidence of online auction fraud. To avoid discovery, fraudsters often disguise themselves as honest members by imitating normal trading behaviors. Therefore, maintaining vigilance is not sufficient to prevent fraud. Participants in online auctions need a more proactive approach to protect their profits, such as an early fraud detection system. In practice, both accuracy and timeliness are equally important when designing an effective detection system. An instant but incorrect message to the users is not acceptable. However, a lengthy detection procedure is also unsatisfactory in assisting traders to place timely bids. The detection result would be more helpful if it can report potential fraudsters as early as possible. This study proposes a new early fraud detection method that considers accuracy and timeliness simultaneously. To determine the most appropriate attributes that distinguish between normal traders and fraudsters, a modified wrapper procedure is developed to select a subset of attributes from a large candidate attribute pool. Using these attributes, a complement phased modeling procedure is then proposed to extract the features of the latest part of traders’ transaction histories, reducing the time and resources needed for modeling and data collection. An early fraud detection model can be obtained by constructing decision trees or by instance-based learning. Our experimental results show that the performance of the selected attributes is superior to other attribute sets, while the hybrid complement phased models markedly improve the accuracy of fraud detection.
English Introduction
The Internet has changed the way people interact with each other. Family and friends can instantly and conveniently get in touch with each other in ways that were unimaginable only a few decades earlier. Such speed and convenience also opened e-commerce to both businesses and individuals around the world. Nowhere has this been more obvious or lucrative than in the case of online auctions, where millions of transactions can occur in the blink of an eye. Physical goods, as well as service packages, are traded as online commodities without the limitations of time and physical location. Because the Internet offers many opportunities to interact with strangers (Resnick et al. 2000), this anonymity, combined with the convenience of the Internet, has allowed online auctions to prosper. For example, eBay, the largest worldwide auction site, posted US$9.2 billion in revenue and US$1.8 billion net income for 2010 (eBay 2010), while the total revenue of Taiwan’s online auction market reached NT$15.3 million (III 2010). Unfortunately, such vast profits also attract the attention of criminals who use fraud to cash in on the lucrative online trading market. According to annual reports of the Internet Complaint Center, online auction fraud ranks as one of the top two serious Internet crimes in recent years, contributing to a risky situation for online auction participants (NW3C 2010). Most online auction houses realize that fraud corrodes not only their trustworthiness but also the prosperity of the entire market. For instance, multiple online identities are easy to create: a fraudster could use his many accounts to execute sophisticated schemes, while camouflaging his malicious intent and evading traditional detection methods that merely examine individual identities (Chau 2011). As more and more inexperienced traders become targeted victims, they begin to distrust the market, resulting in fewer buyers and fewer sellers (Gavish and Tucci 2008).
To help promote trust in the online market, auction houses developed reputation systems to assist users in evaluating potential trading partners. Reputation systems help buyers decide whether to purchase a product based on a feedback score. After each trade, both the seller and the buyer can leave ratings and feedback comments on the other party. Over time, these comments and feedback accumulate in the trader’s transaction history, of which the feedback score is one part. This kind of reputation system is simple and easy to understand. It uses +1, 0, and −1 to denote the level of satisfaction for a trade. However, this kind of scoring mechanism has some drawbacks (Rubin et al., 2005 and Buchegger and Boudec, 2003). For instance, the level of satisfaction cannot express participants’ precise thoughts or insights. In addition, a buyer may hesitate to give negative feedback to the seller to avoid receiving vengeful feedback in return. While reputation systems provide a certain degree of protection, they are not enough to protect traders from fraudulent schemes. Most online auction houses adopt passive approaches to the coordination of reputation systems and management policies that could address fraudulent schemes. However, if users had more proactive approaches, such as an automatic fraud detector, online trading could be safer. From the perspective of crime prevention, the capability of early warning is indispensable for fraud detection (Burge et al., 1997 and Dohono, 2004). The detection procedure must not only identify fraud that has already occurred, but must also warn traders of potential fraudsters. The identification of a fraudster should not rely only on behavioral features that appear once the fraud has been activated. Even a less-experienced trader can distinguish between a normal trader and a fraudster if the fraudster’s transaction history reveals more negative ratings.
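The +1, 0, −1 rating scheme described above can be illustrated with a few lines of arithmetic. This is a minimal sketch, assuming that the feedback score is the plain sum of the ratings; the function name `feedback_summary`, the derived `positive_ratio`, and the sample history are hypothetical and are not taken from any auction site's API.

```python
from collections import Counter


def feedback_summary(ratings):
    """Summarize ratings, each +1 (positive), 0 (neutral), or -1 (negative)."""
    counts = Counter(ratings)
    positive, negative = counts[1], counts[-1]
    rated = positive + negative
    return {
        # Assumption: the feedback score is the net sum of all ratings.
        "score": positive - negative,
        "positive": positive,
        "neutral": counts[0],
        "negative": negative,
        # Share of non-neutral ratings that are positive (None if there are none).
        "positive_ratio": positive / rated if rated else None,
    }


# Hypothetical transaction history for one trader.
history = [1, 1, 0, 1, -1, 1]
summary = feedback_summary(history)
print(summary)
```

A single net score compresses the whole history into one number, which is one way to see why the text argues that the score alone says little about what participants actually experienced.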
An early fraud detection system would provide a method that alerts users before a fraud is activated. Auction houses can both help and benefit from an effective early fraud detection system. Despite possible misjudgment of suspicious accounts, the auction houses can mark potential fraudsters that are under surveillance as early as possible. As a result, the quality of services of auction houses will be improved by eliminating potential fraudulent events. Prior research has proposed methods for online auction fraud detection (Chau and Faloutsos 2005, Ku et al., 2007, Pandit et al., 2007 and Zhang et al., 2008). The typical detection procedure used in the previous work consists of two fundamental steps: (1) a set of attributes is devised and their values are extracted from the transaction histories to distinguish between normal traders and fraudsters; and (2) a detection model based on these attributes is built by machine learning techniques, such as decision trees or instance-based learning. In terms of devising a set of attributes, for example, the speed of obtaining feedback scores is an effective attribute to identify a fraudster who builds his reputation rapidly with fake transactions. In general, the detection accuracy is strongly related to the effectiveness of the attribute set and the appropriateness of the modeling method. Further, the cost of detection is affected by how efficiently the attributes can be extracted from the transaction histories. Previous work has made some progress in this area; however, some problems remain to be resolved.
• Latent fraudulent behavior is usually sophisticated and opaque, often including several fraudsters working together to scam buyers. For example, fraudsters can help each other by pretending to be buyers and leaving a lot of positive feedback to persuade other traders that the fraudster is actually a reliable trading partner.
• Using a larger number of measured attributes incurs more computation effort and does not necessarily raise fraud detection accuracy. In fact, the detection accuracy will be degraded if irrelevant attributes are incorporated into model construction. Additionally, even though some algorithms, such as the naïve-Bayes algorithm, are robust with respect to irrelevant attributes, the performance may degrade quickly if correlated attributes are added (Kohavi and John 1997). Applying the expectation-maximization (EM) algorithm, which calculates the fraudster cluster probabilities as expected class values and maximizes the likelihood of the distributions, might be helpful in fraud detection.
• Data retrieval for long-lived accounts is limited, and constructing a complete transaction history is impossible. Even if such retrieval were possible, analyzing such a huge amount of data would be too time-consuming. Therefore, it is practical to construct a more parsimonious detection model that does not require navigating numerous complex web pages to differentiate between legitimate users and fraudsters. Meanwhile, the detection accuracy can be similar to or better than that of models generated by complicated and costly steps.
To detect fraud effectively and efficiently in online auctions, this study aims to develop a detection method with higher accuracy but lower cost. First, we focus on devising a concise set of measured attributes for early fraud detection. For this purpose, a rich attribute set comprising 44 elements is evaluated using a modified wrapper procedure. Based on the results, 10 of the 44 attributes are chosen. A complement phased modeling procedure is then proposed to further improve the detection accuracy and to reduce the cost of model construction. To test the effectiveness of the proposed procedure, 2475 real transaction records were collected from Yahoo! in Taiwan for analysis.
The results demonstrate that the set of 10 selected attributes is superior to two larger attribute sets. This finding suggests that detection accuracy can be maintained or even improved by using fewer attributes in the model. Moreover, the hybrid complement phased modeling can materially improve the detection accuracy to more than 90%. The rest of this paper includes a literature review followed by a section on the methods of measured attribute selection used in this study. Section 4 discusses how to apply complement phased modeling to construct early detection models with the selected measured attributes. Section 5 presents the experimental results, followed by the conclusion and suggestions for future work in the final section.
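A wrapper procedure of the kind mentioned in the introduction evaluates candidate attribute subsets by training and scoring the detection model itself. The following is a minimal sketch and not the modified wrapper procedure of the study, whose details are not given in this excerpt. It assumes greedy forward selection with a 1-nearest-neighbour learner (a simple form of instance-based learning) scored by leave-one-out accuracy. The function names, the three toy attributes, and the data are all hypothetical.

```python
def loo_accuracy(rows, labels, attrs):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier using only `attrs`."""
    correct = 0
    for i, row in enumerate(rows):
        best_j, best_d = None, float("inf")
        for j, other in enumerate(rows):
            if i == j:
                continue  # never compare a record with itself
            d = sum((row[a] - other[a]) ** 2 for a in attrs)
            if d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(rows)


def forward_wrapper(rows, labels, n_attrs):
    """Greedy forward selection: keep adding the attribute that most improves accuracy."""
    selected, best_score = [], 0.0
    remaining = list(range(n_attrs))
    while remaining:
        scored = [(loo_accuracy(rows, labels, selected + [a]), a) for a in remaining]
        score, attr = max(scored, key=lambda s: s[0])  # first best candidate on ties
        if score <= best_score:
            break  # no candidate improves the model, so stop
        selected.append(attr)
        remaining.remove(attr)
        best_score = score
    return selected, best_score


# Hypothetical data: attribute 0 separates the classes, attributes 1 and 2 are noise.
rows = [
    [0.10, 5.0, 3.0], [0.20, 1.0, 9.0], [0.15, 8.0, 1.0], [0.05, 3.0, 7.0],  # normal
    [0.90, 4.0, 2.0], [0.80, 2.0, 8.0], [0.85, 7.0, 4.0], [0.95, 6.0, 6.0],  # fraud
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

selected, score = forward_wrapper(rows, labels, n_attrs=3)
print(selected, score)
```

On this toy data the search stops after picking the single informative attribute, which mirrors the claim that a compact attribute set can match or beat a larger one. A real evaluation would use held-out data and a stopping rule suited to the learner.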
English Conclusion
The prosperity of online auctions has produced tremendous revenue and profit growth. However, with more profit comes more temptation, and the result is more fraudsters. Internet crime statistics demonstrate that the number of online auction fraudsters continues to increase rapidly (NW3C 2010). To obtain more benefits, fraudsters develop new sophisticated schemes to increase the success rate of scamming and avoid being discovered. Experienced traders identify fraudsters by examining negative feedback with correlated information recorded in transaction histories. However, we cannot assume that all negative feedback occurs as a result of scams. Miscommunication can also lead to negative feedback. Sometimes negative feedback is left by malicious competitors. Even good traders can have occasional negative feedback in their transaction histories. Evidently, relying on traders to protect themselves is not enough to prevent fraud. Formulating the behavior patterns in the latency period reduces the risk of being defrauded. It is more practical to reduce the threat of fraud if there is an effective early detection system that warns traders about highly suspicious fraudsters as soon as possible. For this purpose, this study identified and tested a concise subset consisting of 10 measured attributes with hybrid complement phased models to construct an early fraud detection system. The reduced attribute set not only uses less computational effort in modeling, but also decreases the possibility of mutual interference or correlation among attributes. The contribution of this study is to show that a modified wrapper procedure for attribute selection can be efficiently integrated with complement phased modeling to construct a fraud detection system. It is noteworthy that most of the attributes selected by the wrapper procedure are rating-related. The result implies that the binary reputation system is imperfect but still valuable.
Moreover, it is not enough to detect fraud by looking only at prices and items. In fact, experienced users can identify some fraudsters by investigating the relationships among ratings, prices and items simultaneously. For those who lack tacit knowledge about ratings, an early fraud detection system can make it explicit with learning algorithms. Another contribution of this study is to show how the hidden information needed for fraud detection can be extracted from the existing reputation systems. The experimental results demonstrate that our approach is useful in identifying both activated and latent fraudsters. On average, the F measure is around 90%, indicating the feasibility of early fraud detection in the real world. These results confirm that both the appropriate selection of attributes and the use of the complement phased model not only enhance the effectiveness of a fraud detection model, but also increase the accuracy of the prediction. As mentioned earlier, researchers (Chau and Faloutsos, 2005, Chang and Chang, 2009 and McGlohon et al., 2009) have proposed many attributes for use in fraud detection. Intuitively, a large attribute set with many kinds of features would seem to increase detection accuracy. However, according to the results of this research, this assumption is not necessarily true. On the contrary, a carefully selected, compact attribute set can be more effective than an all-inclusive one. In addition, many price-related and item-related attributes were proposed in previous research; however, we found that the rating-related attributes are more effective than the others. This indicates a new direction in the development of attribute sets. In fact, McGlohon et al. (2009) and Chua et al. (2006a,b) have considered the issue of improving the detection accuracy by identifying the fraudsters’ accomplices.
We think it is a promising line of research, but these methods need to be more carefully designed to detect latent fraudsters, such as by including the hybrid phased modeling proposed in our work. The early fraud detection system can be applied by users, auction houses, and law enforcement. Users gain additional objective information to assist them in their final bidding decisions. Auction houses can monitor any suspects, and then warn those who have just paid to pay attention to what they are buying. In addition, law enforcement can use the system to investigate the validity of a fraud report. It is important to note that fraudsters also observe experienced traders’ behavior and adjust their own accordingly to stay one step ahead in this adversarial situation. As a result, fraudulent schemes are constantly evolving. Moreover, fraudsters can also use the same learning algorithm to learn the behavior of a detection system. Therefore, the detection system should be revised continuously according to the evolution of fraud schemes to maintain an acceptable level of accuracy; otherwise, the system will become obsolete in a short time. That is the main concern of our future work. One limitation of this study is that text comments were not converted into measured attributes. In fact, an online auction user can leave text feedback for a completed trade, describing in detail his or her feelings about and level of satisfaction with the transaction. People can read a lot of intention between the lines. Obviously, these texts can provide deeper insight than a simple positive, neutral or negative feedback score. Thus, using text mining to analyze comments could further enhance the precision of fraud identification. In practice, however, textual analysis increases the computing effort as well as the time it takes to predict potential fraud. In addition, it is hard to tell messages from real buyers apart from those from fake buyers.
Even so, tools for automatic content analysis or sentiment analysis will be considered in our future work. Another limitation is that not all transaction history information kept on the auction sites is available to the public. As a result, outsiders cannot obtain other trading information that could be useful for identifying fraudsters, and only a limited number of attributes can be designed for the fraud detection system. Combining clustering techniques into the detection system could be another direction for future work. Using probabilities and clustering techniques rather than making arbitrary decisions seems beneficial because they allow the detection procedure to converge slowly instead of jumping to conclusions that may be wrong. In applications, coupling naïve Bayes and EM algorithms in this manner performs well in the domain of document classification. Clustering techniques such as EM are good at dealing with unlabeled instances. Understanding the nature of fraudulent behavior by clustering is helpful in detecting latent fraudsters. This is a tradeoff when labeled instances are expensive but unlabeled ones are virtually free. However, for the purpose of early fraud detection, it is necessary to identify the few labeled fraudsters to train the learner. The EM clustering algorithm assumes that the data is generated randomly from a mixture of different probability distributions. In particular, a one-to-one correspondence between mixture components and classes has been assumed. However, when the learner is being trained, our goal is to label known instances of fraud. In our future work, EM could be applied to cluster the types of fraudulent behavior in advance for labeling unknown instances. The ability to identify how fraudulent behavior develops would enhance the accuracy of prediction models.
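The EM clustering idea raised as future work can be sketched on a single attribute. The example below is only an illustration under stated assumptions and is not part of the study: it fits a mixture of two one-dimensional Gaussians, treating each component as one class (the one-to-one correspondence mentioned above). The attribute (a share of negative ratings per trader), the data values, the initialization, and the function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(xs, iterations=50):
    """Fit a two-component 1-D Gaussian mixture; returns (weights, means, variances)."""
    n = len(xs)
    overall_mean = sum(xs) / n
    overall_var = sum((x - overall_mean) ** 2 for x in xs) / n
    means = [min(xs), max(xs)]            # crude initialization at the extremes
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: probability that each component generated each point.
        resp = []
        for x in xs:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate parameters from the soft assignments.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var, 1e-6)  # floor to avoid a collapsed component
    return weights, means, variances


# Hypothetical, unlabeled values of one attribute for nine traders.
xs = [0.05, 0.10, 0.12, 0.08, 0.15, 0.80, 0.90, 0.85, 0.95]
weights, means, variances = em_two_gaussians(xs)
print(weights, means, variances)
```

The soft assignments computed in the E-step are what let the procedure converge gradually rather than commit to a hard label early. Deciding which component corresponds to fraudsters still requires a few labeled instances, as the text notes.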