Data mining and preprocessing application on component reports of an airline company in Turkey
Article code | Publication year | Length of English article |
---|---|---|
22223 | 2011 | 9 pages (PDF) |
Publisher: Elsevier (ScienceDirect)
Journal: Expert Systems with Applications, Volume 38, Issue 6, June 2011, Pages 6618–6626
Abstract
Risk and safety have always been important considerations in aviation. With the rapid growth in air travel, flight delays, cancellations and incidents/accidents have also increased dramatically in recent years (Nazeri & Jianping, 2002). A large amount of knowledge and data has accumulated in the aviation industry. These data can be stored in the form of pilot reports, maintenance reports, incident reports or delay reports. This paper focuses on different preprocessing and feature selection techniques applied to the 15 component reports of an airline company in Turkey in order to understand and clean the data set. Regression analysis, anomaly detection analysis, Find Dependencies and rough sets are used in this study to reduce the data set. Data mining classification techniques are also used to predict the warning level of the component, which serves as the class attribute. For this purpose, the PolyAnalyst, SPSS Clementine, Minitab and Rosetta software tools are used. The Find Laws module of PolyAnalyst is used to discover relations and retrieve information about the components' warning levels.
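The abstract names rough sets as one of the techniques used to reduce the data set. The study ran this step in Rosetta, whose exact settings are not given here; as a minimal sketch of the underlying idea only, the Python below computes the rough-set dependency degree (the share of records whose condition-attribute values determine the class without ambiguity) and uses the greedy QuickReduct heuristic to pick a smaller attribute subset with the same dependency. The attribute names (`age`, `removals`, `shop`, `warning`) and the records are hypothetical, not taken from the airline's component reports.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

# A record maps attribute names to discrete (already discretized) values.
Row = Dict[str, str]


def partition(rows: Sequence[Row], attrs: Sequence[str]) -> List[List[int]]:
    """Indiscernibility classes: indices of rows that agree on every attribute in attrs."""
    blocks: Dict[tuple, List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows: Sequence[Row], attrs: Sequence[str], decision: str) -> float:
    """Rough-set dependency degree: share of rows in the positive region,
    i.e. rows whose indiscernibility class carries a single decision value."""
    if not rows:
        return 0.0
    positive = sum(
        len(block)
        for block in partition(rows, attrs)
        if len({rows[i][decision] for i in block}) == 1
    )
    return positive / len(rows)


def quick_reduct(rows: Sequence[Row], conditions: Sequence[str], decision: str) -> List[str]:
    """Greedy QuickReduct: repeatedly add the attribute that raises the dependency
    degree most, until the subset matches the dependency of all condition attributes.
    Being greedy, it may return a non-minimal subset or stop early."""
    target = dependency(rows, conditions, decision)
    reduct: List[str] = []
    current = dependency(rows, reduct, decision)
    while current < target:
        best, best_gamma = None, current
        for attr in conditions:
            if attr in reduct:
                continue
            gamma = dependency(rows, reduct + [attr], decision)
            if gamma > best_gamma:
                best, best_gamma = attr, gamma
        if best is None:  # no single attribute improves gamma: stop instead of looping forever
            break
        reduct.append(best)
        current = best_gamma
    return reduct


# Hypothetical component records; "warning" plays the role of the class attribute.
EXAMPLE_ROWS = [
    {"age": "old", "removals": "high", "shop": "A", "warning": "red"},
    {"age": "new", "removals": "high", "shop": "B", "warning": "red"},
    {"age": "old", "removals": "low", "shop": "A", "warning": "yellow"},
    {"age": "new", "removals": "low", "shop": "B", "warning": "green"},
    {"age": "new", "removals": "low", "shop": "A", "warning": "green"},
]

print(quick_reduct(EXAMPLE_ROWS, ["age", "removals", "shop"], "warning"))  # ['removals', 'age']
```

On these toy records `shop` is dropped because `removals` and `age` together already classify every record unambiguously. Real tools such as Rosetta offer several reduct algorithms; the greedy search is shown only because it is short.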
Introduction
Data mining methods have been successfully applied in many fields, and the aviation industry is one of them. With the rapid growth in air travel, flight delays, cancellations and incidents have also increased dramatically in recent years (Nazeri & Jianping, 2002). As a result, a large amount of knowledge and data has accumulated in the aviation industry. These data can be stored in the form of pilot reports, maintenance reports, incident reports, component reports or delay reports. This paper explains the preprocessing and data mining application on the component reports of an airline company in Turkey. Nowadays such data are analyzed automatically, and analysts have difficulty handling the growing volume of data efficiently and on time. Consequently, capable tools are needed for the automatic and intelligent analysis of the high-volume, complex-structured data in the aviation industry. Data mining is one such tool: it is used to reveal information hidden in the data that was not known before and is potentially useful (Jiawei & Kamber, 2001).
The science of extracting useful information from large data sets or databases is known as data mining. It is a new discipline, lying at the intersection of statistics, machine learning, data management and databases, pattern recognition/artificial intelligence, and other areas (Hand, Mannila, & Smyth, 2001). Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner. The relationships and summaries derived through a data mining exercise are often referred to as models or patterns (Hand et al., 2001).
Data mining, also popularly referred to as knowledge discovery in databases (KDD), is the automated or convenient extraction of patterns representing knowledge implicitly stored in large databases, data warehouses, and other massive information repositories (Jiawei & Kamber, 2001). The KDD process is outlined in Fig. 1 (Dunham, 2002). It is interactive and iterative, involving, more or less, the following steps (Mitra & Acharya, 2003):
1. Understanding the application domain: This includes relevant prior knowledge and the goals of the application.
2. Extracting the target data set: This means selecting a data set, or focusing on a subset of variables, using feature ranking and selection techniques.
3. Data preprocessing: This is required to improve the quality of the actual data for mining. It also increases mining efficiency by reducing the time required to mine the preprocessed data. Data preprocessing involves data cleaning, data transformation, data integration, and data reduction or data compression for compact representation.
   (a) Data cleaning: This consists of basic operations such as normalization, noise removal, handling of missing data, and reduction of redundancy. Data from real-world sources are often erroneous, incomplete, and inconsistent, perhaps due to operational errors or system implementation flaws. Such low-quality data need to be cleaned prior to data mining.
   (b) Data integration: Integration plays an important role in KDD. This operation includes integrating multiple, heterogeneous data sets generated from different sources.
   (c) Data reduction and projection: This includes finding useful features to represent the data (depending on the goal of the task) and using dimensionality reduction, feature discretization, and feature extraction (or transformation) methods. Applying the principles of data compression can play an important role in data reduction and is a possible area of future development, particularly for knowledge discovery from multimedia data sets.
4. Data mining: This comprises one or more of the following functions: classification, regression, clustering, summarization, image retrieval, discovery of association rules and functional dependencies, and rule extraction.
5. Interpretation: This includes interpreting the discovered patterns, possibly with (low-dimensional) visualization of the extracted patterns. Visualization is an important aid that makes the patterns easier for humans to understand. The mined patterns can be evaluated automatically or semiautomatically to identify those that are truly interesting or useful to the user.
6. Using discovered knowledge: This includes incorporating the knowledge into the performance system and taking actions based on it.
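Step 3(a) lists normalization, handling of missing data and reduction of redundancy as basic cleaning operations. The following is a minimal pure-Python sketch of those three operations, assuming numeric fields stored in dictionaries with `None` marking a missing value. Mean imputation and min-max scaling are common choices picked here for illustration, not necessarily the ones the authors applied, and the field names `flight_hours` and `removals` are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional, Sequence

# A record maps numeric field names to values; None marks a missing value.
Record = Dict[str, Optional[float]]


def drop_duplicates(records: Sequence[Record]) -> List[Record]:
    """Reduce redundancy: keep only the first occurrence of each identical record."""
    seen = set()
    unique: List[Record] = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def impute_mean(records: Sequence[Record], field: str) -> List[Record]:
    """Handle missing data: replace None with the mean of the observed values.
    Assumes at least one observed value (statistics.mean raises otherwise)."""
    fill = mean(r[field] for r in records if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]


def min_max(records: Sequence[Record], field: str) -> List[Record]:
    """Normalization: rescale a field linearly to [0, 1]; a constant field maps to 0."""
    values = [r[field] for r in records]
    low, high = min(values), max(values)
    span = high - low
    return [{**r, field: 0.0 if span == 0 else (r[field] - low) / span} for r in records]


def preprocess(records: Sequence[Record], fields: Sequence[str]) -> List[Record]:
    """Deduplicate first, then impute and normalize each field in turn."""
    cleaned = drop_duplicates(records)
    for field in fields:
        cleaned = impute_mean(cleaned, field)
        cleaned = min_max(cleaned, field)
    return cleaned


example = [
    {"flight_hours": 100.0, "removals": 2.0},
    {"flight_hours": None, "removals": 4.0},
    {"flight_hours": 300.0, "removals": 0.0},
    {"flight_hours": 100.0, "removals": 2.0},  # exact duplicate of the first record
]
print(preprocess(example, ["flight_hours", "removals"]))
# [{'flight_hours': 0.0, 'removals': 0.5}, {'flight_hours': 0.5, 'removals': 1.0}, {'flight_hours': 1.0, 'removals': 0.0}]
```

Deduplicating before imputation matters: otherwise repeated records would pull the imputed mean toward their own values.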
Conclusion
In this study we have explored the use of different preprocessing techniques on aviation component data. We have also discussed the basic concepts and principles of data mining. In the first step, preprocessing was applied to clean the data and to find dependencies between input and output attributes. In the second step, the Find Laws classification algorithm was applied to extract patterns from the aviation component data. The resulting equations for all data sets are shown in Table 36. As Table 36 shows, the data sets reduced by rough sets (Rs) and Find Dependencies (Fd) are more effective, so we report the rules found on these data sets. We discussed the rules found by this analysis with experts from the airline company, who judged them to be usable.
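The conclusion describes finding dependencies between input attributes and the output (class) attribute. PolyAnalyst's Find Dependencies algorithm is not reproduced here; as a swapped-in, commonly used stand-in, the sketch below ranks input attributes by information gain with respect to a hypothetical `warning` class attribute. The attribute names and records are illustrative only.

```python
from collections import Counter
from math import log2
from typing import Dict, List, Sequence

# A record maps attribute names to discrete values.
Row = Dict[str, str]


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows: Sequence[Row], attr: str, target: str) -> float:
    """Drop in class entropy after splitting the rows on the values of attr."""
    groups: Dict[str, List[str]] = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    conditional = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - conditional


def rank_attributes(rows: Sequence[Row], attrs: Sequence[str], target: str) -> List[str]:
    """Input attributes ordered from most to least informative about the target."""
    return sorted(attrs, key=lambda a: information_gain(rows, a, target), reverse=True)


# Hypothetical records: the warning level follows "removals" and ignores "shop".
EXAMPLE_ROWS = [
    {"removals": "high", "shop": "A", "warning": "red"},
    {"removals": "high", "shop": "B", "warning": "red"},
    {"removals": "low", "shop": "A", "warning": "green"},
    {"removals": "low", "shop": "B", "warning": "green"},
]

print(rank_attributes(EXAMPLE_ROWS, ["shop", "removals"], "warning"))  # ['removals', 'shop']
```

Attributes near the bottom of such a ranking are candidates for removal before classification, which is the same role the reduced (Rs, Fd) data sets play in the study.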