Random effects logistic regression model for anomaly detection
Publisher : Elsevier - Science Direct
Journal : Expert Systems with Applications, Volume 37, Issue 10, October 2010, Pages 7162–7166
As the influence of the internet as a medium for communications and commerce continues to expand, the threat from spammers, system attackers, and criminal enterprises has grown accordingly. This paper proposes a random effects logistic regression model for anomaly detection. Unlike previous studies on anomaly detection, it applies a random effects model, which accommodates not only the risk factors of the exposures but also the uncertainty not explained by such factors. Specific risk factors retained in the proposed model include ‘protocol type’ and ‘logged in’. The research is based on a random sample of 49,427 observations on 42 variables from the KDD-cup 1999 (Data Mining and Knowledge Discovery competition) data set, which contains ‘normal’ and ‘anomaly’ connections. The proposed model has a classification accuracy of 98.94% on the training data set and 98.68% on the validation data set.
As advances in networking technology help to connect people around the globe, the internet continues to expand its influence as a medium for communications and commerce. At a similar speed, the threat from spammers, system attackers, and criminal enterprises has continually escalated. Intrusion Detection Systems (IDS) analyze audit trail data to detect unusual user behavior. In addition, an IDS detects hostile activities or exploits in a network (Depren, Topallar, Anarim, & Ciliz, 2005). The idea behind intrusion detection is that simple patterns of legitimate user behavior can be captured, so that the behavior of an anomalous user can be distinguished from that of normal users (Anderson, 1980). In practice, however, detecting abnormal behavior remains difficult because attacks are unpredictable (Wang, 2005).
Statistical analysis is the most widely used technique; it defines normal behavior by collecting data on the behavior of legitimate users over a period of time (Anderson, Lunt, Javitz, Tamaru, & Valdes, 1995). The statistical techniques that have been adapted to anomaly detection include principal component analysis (Shyu, Chen, Sarinnapakorn, & Chang, 2003), cluster and multivariate analysis (Taylor & Alves-Foss, 2002), Bayesian analysis (Barbará, Wu, & Jajodia, 2001), frequency and simple significance tests (Masum et al., 2000; Qin & Hwang, 2004; Zhou & Lang, 2003), and multinomial logistic regression (Wang, 2005). Gowadia, Farkas, and Valtorta (2005) adapted the occurrence probability of specific attacks in an existing Bayesian network-based anomaly detection system. By observing the input parameters, they were able to anticipate the occurrence probability of specific attacks corresponding to the sequence of input parameters.
Lee, Kim, and Kwon (2008) proposed a cluster analysis method for proactive detection of DDoS attacks that exploits the attack architecture, which consists of the selection of handlers and agents, communication and compromise, and the attack itself. Wu and Zhang (2006) presented a novel anomaly detection method and clustering algorithm for network anomaly detection based on factor analysis and the Mahalanobis distance. Depren et al. (2005) proposed a novel IDS architecture utilizing both anomaly and misuse detection approaches. The proposed anomaly detection module used a Self-Organizing Map (SOM) structure to model normal behavior. SOM is a neural network model for analyzing and visualizing high-dimensional data. Arranz, Cruz, Sanz-Bobi, Ruiz, and Coutino (2008) used a neural network to detect anomalies.
Statistical approaches to anomaly detection have both advantages and disadvantages. One disadvantage is that skilled attackers can gradually accustom a statistical anomaly detector to their activity, leaving it unable to tell abnormal behavior from normal behavior. It can also be difficult to determine thresholds that balance the likelihood of false positives against the likelihood of false negatives. In addition, statistical methods need accurate statistical distributions, but not all behaviors can be modeled using purely statistical methods (Patcha & Park, 2007). On the other hand, statistical approaches can detect novel or unknown attacks, and they do not require prior knowledge of security flaws or attacks. They can also provide accurate notification of malicious activities that typically occur over extended periods of time and are good indicators of impending attacks. One popular statistical approach is the fixed effects logistic regression model, which accommodates predictors of anomalous behavior. However, this model does not accommodate variation that cannot be explained by such predictors.
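The threshold trade-off mentioned above can be illustrated with a minimal sketch of a fixed effects logistic score. The coefficients, the two predictors, and the eight toy connections are hypothetical values chosen for illustration; they are not taken from the paper or from the KDD-cup data.

```python
import math

def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, beta0, beta):
    """Fixed effects logistic model: P(anomaly | x) = sigmoid(beta0 + beta . x)."""
    return sigmoid(beta0 + sum(b * xi for b, xi in zip(beta, x)))

def error_rates(probs, labels, threshold):
    """Return (false_positive_rate, false_negative_rate) at the given threshold."""
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    return fp / labels.count(0), fn / labels.count(1)

# Hypothetical coefficients and toy connections (assumptions, not from the paper).
# Each connection is (predictor vector, label) with label 1 = anomaly, 0 = normal.
BETA0, BETA = -2.0, [1.5, 0.8]
connections = [
    ([0.0, 0.0], 0), ([0.5, 1.0], 0), ([1.0, 0.5], 0), ([2.0, 0.0], 0),
    ([1.0, 1.0], 1), ([2.0, 1.0], 1), ([3.0, 0.5], 1), ([0.5, 0.5], 1),
]
probs = [predict_proba(x, BETA0, BETA) for x, _ in connections]
labels = [y for _, y in connections]

for t in (0.2, 0.5, 0.8):
    fpr, fnr = error_rates(probs, labels, t)
    print(f"threshold={t:.1f}  FPR={fpr:.2f}  FNR={fnr:.2f}")
```

On this toy data, raising the threshold from 0.2 to 0.8 lowers the false positive rate from 0.75 to 0.00 while the false negative rate rises from 0.00 to 0.50, which is the balance a practitioner has to strike.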
Accordingly, this paper proposes a random effects logistic regression model. The advantage of using a random effects model for anomaly detection is that it accommodates not only the network environment characteristics but also the uncertainty that cannot be explained by those characteristics. The random effects model has frequently been used to accommodate both ‘between cluster variation’ and ‘within cluster variation’ (Sohn, 1996, 1997, 1999, 2002; Sohn & Choi, 2006; Sohn & Park, 1998). The outline of this study is as follows: Section 2 introduces anomaly detection, and Section 3 presents the random effects logistic regression model for anomaly detection. Section 4 contains an empirical case study and its results. Finally, Section 5 summarizes the results of the study.
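For orientation, the two models contrasted in this introduction are commonly written as follows. This is a sketch of the standard random-intercept formulation; the notation ($y_{ij}$, $x_{ij}$, $\beta$, $u_j$, $\sigma_u^2$) is assumed here for illustration, and the paper's own specification in its Section 3 may differ.

```latex
\begin{align}
  % Fixed effects model: connection i in cluster j, predictors x_{ij}
  \operatorname{logit} \Pr(y_{ij} = 1 \mid x_{ij})
    &= \beta_0 + x_{ij}^{\top} \beta \\
  % Random effects model: adds a cluster-level intercept u_j
  \operatorname{logit} \Pr(y_{ij} = 1 \mid x_{ij}, u_j)
    &= \beta_0 + x_{ij}^{\top} \beta + u_j,
    \qquad u_j \sim N(0, \sigma_u^2)
\end{align}
```

Under this formulation, $\sigma_u^2$ captures the between-cluster variation that the predictors do not explain, and setting $\sigma_u^2 = 0$ recovers the fixed effects model.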
Conclusion
As the detection of system attacks has become an important factor in strengthening the competitiveness of a country, the Korean government has been increasing its investment in computer security systems. To manage system attack problems effectively, an accurate anomaly detection model is needed. Many anomaly detection systems have used models based on logistic regression, multiple discriminant analysis, neural networks, and clustering. However, these approaches share a weakness: they do not accommodate the situation where systems exhibit different attack probabilities under the same conditions. In this paper, a random effects logistic regression model is proposed for anomaly detection that considers not only system characteristics but also the uncertainty that cannot be explained by such predictor characteristics. Data from the third International Knowledge Discovery and Data Mining Tools Competition (KDD-cup) 1999 were analyzed. The empirical results indicate that the random effects logistic regression model achieves high classification accuracy, and on this basis the model is recommended for anomaly detection. Given more information about the target variables, further analysis is required to predict various anomaly levels.
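The situation described above, where systems show different attack probabilities under the same conditions, can be sketched with a random-intercept logistic model. All values below (coefficients, the standard deviation of the random effect, the four system names, and the predictor vector) are hypothetical and are not estimates from the paper.

```python
import math
import random

def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def attack_probability(x, beta0, beta, u):
    """Random-intercept logistic model: sigmoid(beta0 + beta . x + u)."""
    return sigmoid(beta0 + sum(b * xi for b, xi in zip(beta, x)) + u)

# Hypothetical parameters (assumptions for illustration only).
BETA0, BETA, SIGMA_U = -1.0, [0.9, 0.4], 1.2

# Draw one random intercept per system from N(0, SIGMA_U^2); seeded for repeatability.
rng = random.Random(0)
system_effects = {f"system_{j}": rng.gauss(0.0, SIGMA_U) for j in range(4)}

# The same observed conditions for every system.
x_same = [1.0, 1.0]

# A fixed effects model (u = 0) assigns one probability to all systems ...
fixed_only = attack_probability(x_same, BETA0, BETA, u=0.0)
# ... while the random intercept lets each system deviate from it.
by_system = {
    name: attack_probability(x_same, BETA0, BETA, u)
    for name, u in system_effects.items()
}

print(f"fixed effects only: {fixed_only:.3f}")
for name, p in by_system.items():
    print(f"{name}: u={system_effects[name]:+.3f}  p={p:.3f}")
```

A system with a positive intercept ends up with a higher attack probability than the fixed effects value, and one with a negative intercept ends up with a lower one, even though the predictors are identical.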