Evolutionary-based feature selection approaches with new criteria for data mining: A case study of credit approval data
| Article code | Publication year | English article | Word count |
|---|---|---|---|
| 22154 | 2009 | 9-page PDF | 4920 words |
Publisher: Elsevier - ScienceDirect
Journal: Expert Systems with Applications, Volume 36, Issue 3, Part 2, April 2009, Pages 5900–5908
In this paper, the feature selection problem was formulated as a multi-objective optimization problem, and new criteria were proposed to achieve this goal. First, the data were pre-processed with a missing-value replacement scheme, a re-sampling procedure, a data-type transformation procedure, and a min-max normalization procedure. After that, a wide variety of classifiers and feature selection methods were applied and evaluated. Finally, the paper presented comprehensive experiments showing the relative performance of the classification tasks. The experimental results demonstrated the success of the proposed methods on the credit approval data. In addition, the numeric results provide guidance for selecting feature selection methods and classifiers in the knowledge discovery process.
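The abstract names four pre-processing steps but does not specify the exact schemes. The following is a minimal Python sketch under stated assumptions. It uses mean/mode imputation for missing values and random oversampling of the minority class as the re-sampling procedure. The data-type transformation maps categorical values to integer codes, and normalization uses the standard min-max formula x' = (x - min) / (max - min). The function names and toy records are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter
from statistics import mean


def replace_missing(rows, numeric_cols):
    """Missing-value replacement (assumed scheme): column mean if numeric, column mode otherwise."""
    n_cols = len(rows[0])
    fills = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        if j in numeric_cols:
            fills.append(mean(observed))
        else:
            fills.append(Counter(observed).most_common(1)[0][0])
    return [[fills[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def oversample(rows, labels, seed=0):
    """Re-sampling (assumed scheme): duplicate random rows of smaller classes until classes balance."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = [list(r) for r in rows], list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(list(rng.choice(pool)))
            out_labels.append(cls)
    return out_rows, out_labels


def encode_categorical(rows, numeric_cols):
    """Data-type transformation (assumed scheme): map each categorical value to an integer code."""
    n_cols = len(rows[0])
    codes = {}
    for j in range(n_cols):
        if j not in numeric_cols:
            values = sorted({r[j] for r in rows})
            codes[j] = {v: i for i, v in enumerate(values)}
    return [[codes[j][r[j]] if j in codes else r[j] for j in range(n_cols)] for r in rows]


def min_max_normalize(rows):
    """Min-max normalization per column: x' = (x - min) / (max - min); constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(r, lows, highs)]
        for r in rows
    ]


# Hypothetical toy records (income, employed, years_at_job); None marks a missing value.
rows = [[30.0, "yes", 2.0], [None, "no", 0.5], [55.0, None, 6.0], [42.0, "yes", None]]
labels = ["approve", "reject", "approve", "approve"]
numeric = {0, 2}

# Steps applied in the order the abstract lists them.
rows = replace_missing(rows, numeric)
rows, labels = oversample(rows, labels)
rows = encode_categorical(rows, numeric)
rows = min_max_normalize(rows)
```

Imputation runs before re-sampling, so duplicated minority rows never carry missing values. Normalization runs last, after every column is numeric, so each feature lies in [0, 1] before it reaches a distance-based classifier.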
Recently, data mining, or knowledge discovery in databases (KDD), has emerged as a very active and evolving area of information technology. Hundreds of novel mining algorithms and new applications in fields such as medicine, business, and engineering have been proposed in the last decade. The aim of data mining is to extract knowledge from data, i.e., to help humans find and interpret the 'hidden information' in massive amounts of raw data. The information and knowledge mined from these large quantities of data must be meaningful enough to lead to some advantage, usually an economic one (Witten & Frank, 2005). Credit scoring is the set of decision models and underlying techniques that assist lenders in granting consumer credit (Thomas, 2000). It has been used extensively for credit admission evaluation in recent years. Its basic principle is to analyze the past performance of consumers in order to predict the credit scores of applicants who will be assessed. In fact, its essential operations and philosophy are similar to those of the knowledge discovery process. Researchers have developed a variety of parametric statistical models for credit scoring, such as linear discriminant analysis (LDA) and logistic regression (Desai, Crook, & Overstreet, 1996). However, these methods rely on assumptions about the underlying probability distribution, and they also assume linear relationships between attributes. These restrictions reduce the predictive accuracy of credit scoring models and limit their success. In this paper, we applied meta-heuristic search techniques to find approximations of the Pareto optimal set for the feature selection problem. Moreover, we proposed two new objectives for this combinatorial optimization problem. Several pre-processing steps were performed before the knowledge discovery process. The primary contributions of the paper are as follows:

1. Since the feature selection problem can be considered a combinatorial optimization problem, the paper proposed new criteria for single- and multi-objective evolutionary feature selection. It also presented comprehensive experiments showing the relative performance of the classification tasks in the knowledge discovery process.
2. An empirical study compared the relative performance of five different feature selection techniques. The results show that (a) the new criteria combined with an evolutionary algorithm outperform the other feature selection methods, and (b) the k-nearest neighbor classifier usually performs poorly regardless of the performance measure used.

The remainder of this paper is organized as follows. Section 2 describes the workflow of the knowledge discovery process, including exactly how the data instances were pre-processed. Section 3 introduces the feature selection problem and the proposed solutions, including the new objectives for single- and multi-objective optimization. Section 4 presents the experimental settings and results. Finally, Section 5 concludes the paper.
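The notion of a Pareto optimal set of feature subsets can be made concrete with a short sketch. Each candidate subset is a bit mask paired with a tuple of objective values, all to be minimized, and only the non-dominated candidates are kept. The two objectives used here, validation error rate and number of selected features, are illustrative assumptions. The paper's two proposed objectives are not defined in this excerpt, and the objective values below are made up.

```python
from typing import List, Tuple

# A candidate is (feature_mask, objectives); every objective is to be minimized.
Candidate = Tuple[Tuple[int, ...], Tuple[float, ...]]


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """a Pareto-dominates b if it is no worse on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates (a candidate never dominates itself)."""
    return [c for c in candidates if not any(dominates(o[1], c[1]) for o in candidates)]


# Hypothetical objective values: (validation error rate, number of selected features).
subsets: List[Candidate] = [
    ((1, 1, 1, 1), (0.12, 4)),
    ((1, 0, 1, 0), (0.14, 2)),
    ((1, 1, 0, 0), (0.15, 2)),
    ((0, 0, 1, 0), (0.20, 1)),
    ((1, 1, 1, 0), (0.13, 4)),
]
front = pareto_front(subsets)
for mask, objs in front:
    print(mask, objs)
```

In this example, (1, 1, 0, 0) and (1, 1, 1, 0) drop out. Each is matched or beaten on both objectives by another subset, and strictly beaten on error. A multi-objective evolutionary search approximates this front without enumerating every subset.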
Conclusion
In this paper, we proposed new criteria for single- and multi-objective evolutionary feature selection algorithms. Comprehensive experiments were conducted to show the relative performance of the classification tasks. The evolutionary-based feature selection with the new criteria was observed to outperform other techniques in finding a large and important part of the Pareto front of the feature selection problem. Since data pre-processing and feature selection are important steps in the knowledge discovery process, future work will apply these techniques, together with other classifiers, to large-scale problems. Moreover, new multi-objective evolutionary algorithms should be considered.
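Evolutionary wrapper feature selection can be illustrated with a genetic algorithm over bit-string feature masks. Its fitness comes from a leave-one-out k-nearest neighbor classifier. This is a minimal single-objective sketch, not the paper's method. The fitness (accuracy minus a penalty alpha times the fraction of features kept), truncation selection, one-point crossover, bit-flip mutation, the parameter values, and the synthetic data are all hypothetical choices.

```python
import random
from collections import Counter


def knn_loo_accuracy(X, y, mask, k=3):
    """Leave-one-out accuracy of k-NN using only the features where mask[j] == 1."""
    feats = [j for j, bit in enumerate(mask) if bit]
    if not feats:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        # Squared Euclidean distance from sample i to every other sample, paired with its label.
        dists = sorted(
            (sum((X[i][j] - X[t][j]) ** 2 for j in feats), y[t])
            for t in range(len(X))
            if t != i
        )
        votes = Counter(label for _, label in dists[:k])
        if votes.most_common(1)[0][0] == y[i]:
            correct += 1
    return correct / len(X)


def fitness(X, y, mask, alpha=0.05):
    """Hypothetical scalar fitness: accuracy minus a penalty on the fraction of features kept."""
    return knn_loo_accuracy(X, y, mask) - alpha * sum(mask) / len(mask)


def evolve(X, y, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """Genetic algorithm over bit-string masks with truncation selection and elitism."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(X, y, m), reverse=True)
        elite = ranked[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda m: fitness(X, y, m))


# Hypothetical data: feature 0 carries the class signal, features 1-3 are uniform noise.
data_rng = random.Random(42)
X, y = [], []
for _ in range(40):
    label = data_rng.randint(0, 1)
    X.append([label + data_rng.gauss(0, 0.1)] + [data_rng.random() for _ in range(3)])
    y.append(label)

best = evolve(X, y)
print("selected mask:", best, "fitness:", round(fitness(X, y, best), 4))
```

The top half of each generation is carried over unchanged, so the best fitness in the population never decreases from one generation to the next. A multi-objective variant would replace the scalar fitness with Pareto ranking of (error, subset size), using a dominance test such as the one sketched earlier.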