Article title

Assessing the severity of phishing attacks: A hybrid data mining approach

Article code	Publication year	Pages (English article)
22195	2011	11 (PDF)

Source

Publisher: Elsevier - ScienceDirect

Journal: Decision Support Systems, Volume 50, Issue 4, March 2011, Pages 662–672

Keywords

Phishing, Risk, Supervised classification, Text phrase extraction, Variable importance

Abstract

Phishing is an online crime that increasingly plagues firms and their consumers. We assess the severity of phishing attacks in terms of their risk levels and the potential loss in market value suffered by the targeted firms. We analyze 1030 phishing alerts released on a public database as well as financial data related to the targeted firms using a hybrid method that predicts the severity of the attack with up to 89% accuracy using text phrase extraction and supervised classification. Our research identifies some important textual and financial variables that impact the severity of the attacks and potential financial loss.

Introduction

Phishing is a major security threat to the online community. It is a form of identity theft that uses social engineering and technical subterfuge to entice unsuspecting online consumers into giving away their personal information and financial credentials [5]. A typical phishing attack consists of four phases: preparation, mass broadcast, mature, and account hijack [8]. The financial impact of phishing is borne out by an estimated loss of US $3.2 billion affecting 3.6 million people from September 2006 to August 2007 [40]. The number of reported phishing incidents also grew rapidly, increasing by 293.7% from 8,829 in December 2004 to 34,758 in October 2008 [4,5]. Phishing attacks not only cause financial loss but also shatter customers' confidence in conducting e-commerce. Managers of some US super-regional banks have indicated that deteriorating customer trust is a major concern with respect to phishing [46]. A recent survey found that most customers of European banks use online banking only to check their account balances rather than to conduct transactions, for fear of being phished [15]. Another study reported that this fear among customers has resulted in a 20% decrease in the rate at which genuine emails are opened [10].
To make customers aware of the latest phishing attacks, some international organizations and government statutory bodies, such as the Anti-Phishing Working Group (APWG), have published phishing alerts on their websites. To assess the risk level of each phishing attack, some firms have sought help from information security experts, who evaluate reported phishing incidents based on the contents of the phishing email and the phishing website. However, as phishing incidents continue to increase at a tremendous rate, manual risk assessment by experts may be too slow.
Data mining techniques can improve the assessment of phishing attacks. They can discover the knowledge embedded in the traits of prior phishing attacks and identify the inherent characteristics that contribute to the different risk levels of an attack. This helps predict the risk level of a new phishing incident quickly and with reasonable accuracy. Furthermore, the risk level, which is based on the technical sophistication of a phishing attack, may not be directly related to the financial loss the attack causes. Past research has shown that the impact of sophisticated phishing alerts on stock markets is not as significant as that of phishing alerts whose risk level is considered moderate [33]. Nevertheless, the financial loss resulting from a phishing attack is always of great concern to an organization's security administrators and consumers. Therefore, a warning mechanism that can identify phishing incidents that are either very risky or likely to cause a large financial loss will be of great interest to shareholders and senior managers of the targeted companies.
In this research we use supervised classification techniques, a major stream of data mining, to assess the severity of phishing attacks. At the same time, we identify the key antecedents that contribute to a high risk level or a high financial loss. We use a hybrid approach that combines key phrase extraction with supervised classification, drawing on both the textual description of the phishing attack and the financial data of the targeted company to assess the severity of an attack according to its risk level or its potential to generate financial loss. The three classifiers used for this purpose achieve a classification accuracy of up to 89%. Our results also show that the key identifying variables for risk level differ from those for potential financial loss. A high risk level is associated with phishing emails that ask customers of large firms to update their accounts, whereas a high financial loss is characterized by phishing attacks that target customers of large firms with high total liabilities.
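The hybrid idea described above (key phrase indicators from the alert text combined with financial figures of the targeted firm, then fed to a supervised classifier) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the phrase list, the two financial columns (firm size and total liabilities), the toy data, and the nearest-centroid classifier, which merely stands in for the three classifiers that this excerpt does not name.

```python
import math

# Hypothetical key phrases; the phrases actually extracted in the paper
# are not listed in this excerpt.
KEY_PHRASES = ["update your account", "verify your identity", "suspended"]

def text_features(alert_text):
    """Binary indicator per key phrase: 1.0 if the phrase occurs in the text."""
    lowered = alert_text.lower()
    return [1.0 if phrase in lowered else 0.0 for phrase in KEY_PHRASES]

def hybrid_features(alert_text, financials):
    """Concatenate text indicators with log-scaled financial figures."""
    return text_features(alert_text) + [math.log1p(v) for v in financials]

def train_centroids(rows, labels):
    """Nearest-centroid training: store the mean feature vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, row):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Toy, made-up training data: (alert text, (firm size, total liabilities), label).
alerts = [
    ("Please update your account to avoid suspension", (5000.0, 4200.0), "high"),
    ("Your account is suspended, verify your identity now", (8000.0, 7000.0), "high"),
    ("Win a free gift card today", (50.0, 20.0), "low"),
    ("Newsletter survey with a small reward", (80.0, 30.0), "low"),
]
X = [hybrid_features(text, fin) for text, fin, _ in alerts]
y = [label for _, _, label in alerts]
model = train_centroids(X, y)

new_alert = hybrid_features("Update your account details immediately", (6000.0, 5000.0))
print(predict(model, new_alert))  # prints "high" for this toy data
```

The point of the sketch is only the feature construction: text-derived and financial variables sit in one vector, so a single classifier can weigh both kinds of evidence.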

Conclusion

Phishing has become one of the biggest threats to the online community, and many researchers have explored ways to deter such crime. Information security specialists and anti-phishing organizations have set up phishing alert databases that assess each reported phishing incident in terms of its risk level. In view of the increasing number of reported phishing incidents, we believe that such a manual assessment approach is not efficient enough to provide timely reports, and that it is also incomplete because it ignores the possible financial impact of phishing incidents. In this research, we adopted a hybrid text and data mining model that used a key phrase extraction technique to discover important semantic categories in the textual content of phishing alerts, and combined those categories with financial data of the targeted companies to classify both the risk level of an attack and the loss in market value that it was likely to cause the firm. The hybrid model was superior in terms of top decile lift and accuracy, demonstrating the need to consider textual as well as financial data when predicting the severity of a phishing alert. Furthermore, our results showed that risk level and cumulative abnormal return (CAR) were fundamentally different from each other, as different textual and financial factors affected them. This implies that both must be evaluated to fully assess the severity of a phishing alert, a practice we recommend that all anti-phishing organizations adopt in future so that their members are better informed about the severity of phishing attacks.
In this research, we assumed equal misclassification costs. Future researchers can repeat the experiments with unequal misclassification costs. For example, a false positive alarm will mislead investors and other stakeholders of the company: classifying a low risk or low CAR attack as high risk or high CAR will cause unnecessary worry among investors. However, misclassifying a high risk or high CAR phishing attack as low can have a far more severe impact, and such a false negative may turn out to be very costly for the firm. It will be interesting to see whether unequal misclassification costs lead to predictions similar to those of the current research.
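Two quantities discussed in this section, top decile lift and the effect of unequal misclassification costs, can be made concrete with a short Python sketch. The function names, scores, labels, and cost values below are illustrative assumptions and do not come from the paper; the cost rule is the standard minimum-expected-cost decision, not necessarily the procedure a follow-up study would use.

```python
def top_decile_lift(scores, labels):
    """Positive rate in the top 10% of cases by score, divided by the overall rate.

    labels are 1 (high severity) or 0 (low severity); at least one label
    must be 1, otherwise the base rate is zero and the ratio is undefined.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, len(ranked) // 10)
    top_rate = sum(label for _, label in ranked[:n_top]) / n_top
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate

def min_cost_label(p_high, cost_fp, cost_fn):
    """Pick the label with the lower expected misclassification cost.

    Predicting "high" costs cost_fp when the attack is really low (false alarm);
    predicting "low" costs cost_fn when the attack is really high (missed attack).
    """
    expected_cost_high = (1.0 - p_high) * cost_fp
    expected_cost_low = p_high * cost_fn
    return "high" if expected_cost_high <= expected_cost_low else "low"

# Made-up scores for 20 alerts, highest first, with 5 truly severe cases.
scores = [i / 20 for i in range(20, 0, -1)]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(top_decile_lift(scores, labels))   # 4.0: both top-2 cases are severe, base rate 0.25

# Same predicted probability, different cost assumptions.
print(min_cost_label(0.3, cost_fp=1.0, cost_fn=1.0))  # "low" under equal costs
print(min_cost_label(0.3, cost_fp=1.0, cost_fn=5.0))  # "high" when misses cost 5x more
```

With equal costs the rule reduces to a 0.5 probability threshold. Raising the cost of a false negative lowers that threshold, so more alerts are flagged as high severity. That shift is the behavior the proposed follow-up experiment would examine.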