Data-mining based SQL injection attack detection using internal query trees
|Article code|Publication year|English article|Persian translation|Word count|
|22314|2014|15-page PDF|Order|12,910 words|
Publisher: Elsevier - Science Direct
Journal: Expert Systems with Applications, Volume 41, Issue 11, 1 September 2014, Pages 5416–5430
Detecting SQL injection attacks (SQLIAs) is becoming increasingly important for database-driven web sites. Until now, most studies on SQLIA detection have focused on the structure of the structured query language (SQL) statement at the application level. Unfortunately, this approach inevitably fails to detect attacks that use stored procedures and data already present within the database system. In this paper, we propose a framework to detect SQLIAs at the database level by using SVM classification and various kernel functions. The key issue for the SQLIA detection framework is how to represent the internal query tree collected from the database log in a form suitable for the SVM classification algorithm, so as to achieve good detection performance. To solve this issue, we first propose a novel method that converts the query tree into an n-dimensional feature vector by using a multi-dimensional sequence as an intermediate representation. The intermediate representation is needed because the complexity and variability of the query tree structure make it difficult to convert the query tree directly into an n-dimensional feature vector. Second, we propose a method to extract semantic features as well as syntactic features when generating the feature vector. Third, we propose a method to transform string feature values into numeric feature values by combining multiple statistical models. The combined model maps each string value to one numeric value that reflects multiple characteristics of that string. To demonstrate the feasibility of our proposals in practical environments, we implement the SQLIA detection system on PostgreSQL, a popular open source database system, and perform experiments. The experimental results using the internal query trees of PostgreSQL validate that our proposal is effective in detecting SQLIAs: the probability that a malicious query is scored as more likely to be an SQLIA than a normal query is at least 99.6%. Finally, we perform additional experiments to compare our proposal with syntax-focused feature extraction and with feature transformation based on a single statistical model. The experimental results show that, compared with these previous methods, our proposal significantly increases the probability of correctly detecting SQLIAs for various SQL statements.
With the development of information technology, a massive amount of sensitive personal and proprietary information has accumulated in databases, which are considered the most valuable asset of organizations. However, as the economic value of data increases, so do attempts to steal it. As database security concerns grow, Gartner has recognized the emergence of database audit and protection (DAP) tools (Wheatman, 2012). In addition, Gartner considers the detection and prevention of database intrusion attacks to be an increasingly important use case of DAP. Database intrusion attacks can be broadly categorized into two types, depending on the access point. In the first type, malicious users with privileged or compromised user accounts directly access the database and abuse the structured query language (SQL) to harvest data. In the second type, malicious users access the database indirectly by exploiting vulnerabilities in database-driven web applications. That is, they attack the database by altering the original SQL statements within the applications through user input values. This type of database intrusion is well known as an SQL injection attack (SQLIA). For the first type of database intrusion, the detection of abnormal user behavior has been studied extensively through the analysis of database logs (Bertino et al., 2005, Mathew et al., 2010 and Shebaro et al., 2013). For the second type, namely SQLIA, the detection of malicious user inputs has been studied mainly through the analysis of queries generated within the web application (Shar & Tan, 2013). However, detecting SQLIAs at the application level turns out to miss some types of attacks (which will be discussed later in more detail).
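To make the second type of intrusion concrete, the following is a minimal sketch of how a user input value can alter the original SQL statement inside an application. The table, the credentials, and the function names `login_concatenated` and `login_parameterized` are hypothetical and chosen only for illustration; the example uses an in-memory SQLite database rather than the PostgreSQL system studied in the paper.

```python
import sqlite3

# Hypothetical schema and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "s3cret"), ("bob", "hunter2")],
)

def login_concatenated(name, password):
    """Build the SQL text from user input, so the input can alter the statement."""
    query = (
        "SELECT name FROM users WHERE name = '" + name
        + "' AND password = '" + password + "'"
    )
    return conn.execute(query).fetchall()

def login_parameterized(name, password):
    """Pass user input as bound values, so the statement structure stays fixed."""
    query = "SELECT name FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchall()

# A classic tautology payload: it closes the string literal and appends OR '1'='1'.
malicious = "' OR '1'='1"
injected_rows = login_concatenated("alice", malicious)
safe_rows = login_parameterized("alice", malicious)
```

With the concatenated query the WHERE clause becomes true for every row, so `injected_rows` contains both users, while `safe_rows` is empty because the payload is compared as a plain password string.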
The aim of this paper is to design an efficient and accurate method to detect such SQLIAs at the database level, using the query tree (an internal representation of an SQL statement) written in database logs. The basis of our proposal is to model SQLIA detection as a data-mining based binary classification problem that separates malicious query trees from normal query trees. Data-mining based SQLIA detection is beneficial because it can detect unknown attacks with high accuracy despite the rapid emergence of various forms of attack (Choi et al., 2012, Pinzón et al., 2011, Santos et al., 2011 and Wu and Yen, 2009). We use the Support Vector Machine (SVM) as the binary classifier. The SVM is known to provide high accuracy in binary classification and to handle high-dimensional data (Boser et al., 1992 and Burges, 1998). During classification, we use a non-linear vector kernel function as the vector similarity measure. The non-linear vector kernel function extends the linear binary classifier to a non-linear binary classifier (Hofmann, Schölkopf, & Smola, 2008). In most cases, the non-linear classifier has better accuracy than the linear classifier (Ben-Hur & Weston, 2010). Through experiments, we select the non-linear vector kernel function and good values for the kernel parameters that are suited to SQLIA detection, and we report the evaluation results of the SVM model with the chosen kernel function and kernel parameters. The experimental results show that the area under the receiver operating characteristic curve (AUC) is 0.999 for SELECT and INSERT statements and 0.996 for stored procedures. This means that, with our SQLIA detection method, the probability that a malicious query is scored as more likely to be an SQLIA than a normal query is at least 99.6%.
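The two ideas in this paragraph, a non-linear kernel as a vector similarity measure and the AUC as a ranking probability, can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's implementation: the Gaussian RBF kernel is only one common non-linear kernel (the excerpt does not say which kernel was chosen), the `gamma` value is arbitrary, and the scores below are made-up numbers rather than experimental results.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel: a similarity in (0, 1] that equals 1 when x == y.

    Assumption: RBF is used here only as a familiar example of a non-linear kernel.
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def auc(malicious_scores, normal_scores):
    """Fraction of (malicious, normal) pairs in which the malicious query
    receives the higher score; ties count as half a win."""
    wins = 0.0
    for m in malicious_scores:
        for n in normal_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(malicious_scores) * len(normal_scores))

# Made-up classifier scores (higher means "more likely SQLIA").
malicious_scores = [0.9, 0.8, 0.4]
normal_scores = [0.1, 0.3, 0.5, 0.2]
value = auc(malicious_scores, normal_scores)  # 11 of the 12 pairs are ranked correctly
```

Read this way, an AUC of 0.996 says that in 99.6% of malicious/normal query pairs the classifier gives the malicious query the higher score.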
The rest of this paper is structured as follows. In Section 2, we describe related work. In Section 3, we propose the framework to detect SQLIA at the database level and describe our contributions. In Section 4, we describe a method to convert the query tree into a feature vector representation, which is our major contribution. We report experimental results on our proposal in Section 5. Section 6 presents our conclusions and future work.
English conclusion
In this paper, we present a framework for detecting SQLIA at the database level, based on a data-mining technique. As our main contribution, we propose a novel feature vector generation method, which converts the query trees of a real database system into n-dimensional feature vectors through multi-dimensional sequences. In the conversion of the query tree, we use a feature extraction method that extracts semantic features as well as syntactic features, and a feature transformation method that combines multiple statistical models. These methods decrease the computation time while increasing the probability of correctly detecting SQLIA, and hence improve the overall performance. We implement the SQLIA detection system using SVM classification and various kernel functions, and demonstrate the effectiveness of our proposal with experiments. Our proposal achieves a high probability of correctly detecting SQLIA for stored procedures, as well as for SELECT and INSERT statements. In addition, we compare our proposal with experiments that use only syntactic features and with experiments that use only a single statistical model, and show that our proposal is competitive against these other methods. In our work, the most difficult task is to search the criteria internal paths, owing to the variability of the query tree; a further issue is the time consumed in generating a multi-dimensional sequence from a query tree. In future work, we will study a method to easily search the internal paths for query trees of various database systems, and we will attempt to reduce the overall time for feature vector generation. Next, we plan to apply our methods to detecting abnormal user behavior and to study how to cooperate with integrated techniques for detecting database intrusion attacks in DAP tools.
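The feature vector generation summarized above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the nested-dictionary tree, its keys, the list of (path, value) pairs standing in for the multi-dimensional sequence, and the combination of string length with character entropy standing in for the multiple statistical models. The excerpt does not specify the PostgreSQL query tree format, the sequence definition, or the statistical models actually used.

```python
import math
from collections import Counter

def flatten(tree, path=()):
    """Walk a nested dict/list tree and yield (path, leaf_value) pairs in order.

    The resulting list of pairs is a stand-in for an intermediate sequence.
    """
    if isinstance(tree, dict):
        for key in sorted(tree):
            yield from flatten(tree[key], path + (key,))
    elif isinstance(tree, list):
        for index, item in enumerate(tree):
            yield from flatten(item, path + (str(index),))
    else:
        yield path, tree

def string_to_number(text):
    """Combine two simple statistics, length and character entropy, into one number.

    Assumption: a plain sum is used here; the paper's combined model is not shown.
    """
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return total + entropy

def to_feature_vector(tree, feature_paths):
    """Map a tree to a fixed-length vector with one slot per chosen path.

    Paths missing from the tree contribute 0.0; string leaves are transformed
    to numbers with string_to_number.
    """
    values = dict(flatten(tree))
    vector = []
    for path in feature_paths:
        value = values.get(path, 0.0)
        if isinstance(value, str):
            value = string_to_number(value)
        vector.append(float(value))
    return vector

# Hypothetical toy tree and feature paths.
toy_tree = {
    "command": "select",
    "tables": [{"name": "users"}],
    "where": {"operator": "=", "constant": "aaaa"},
}
paths = [("command",), ("where", "constant"), ("where", "operator"), ("limit",)]
vector = to_feature_vector(toy_tree, paths)
```

Fixing the list of paths in advance is what gives every query tree a vector of the same length n, regardless of how the tree itself varies.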