حداکثر حاشیه یک الگوریتم ژنتیکی برای طبقه بندی اشتباه به حداقل رساندن هزینه در قابلیت انتخاب مسئله
کد مقاله | سال انتشار | تعداد صفحات مقاله انگلیسی |
---|---|---|
8247 | 2013 | 8 صفحه PDF |
Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)
Journal : Expert Systems with Applications, Volume 40, Issue 10, August 2013, Pages 3918–3925
چکیده انگلیسی
We consider a feature selection problem where the decision-making objective is to minimize overall misclassification cost by selecting relevant features from a training dataset. We propose a two-stage solution approach for solving misclassification cost minimizing feature selection (MCMFS) problem. Additionally, we propose a maximum-margin genetic algorithm (MMGA) that maximizes margin of separation between classes by taking into account all examples as opposed to maximizing margin of separation using a few support vectors. Feature selection is carried out by either an exhaustive or a heuristic simulated annealing approach in the first stage and a cost sensitive classification using either MMGA or cost sensitive support vector machines (SVM) in the second stage. Using simulated and real-world data sets and different misclassification cost matrices, we test our two-stage approach for solving the MCMFS problem. Our results indicate that feature selection plays an important role when misclassification cost asymmetries increase and the MMGA shows equal or better performance than the SVM.
مقدمه انگلیسی
Feature selection (FS) problem, also known as dimensionality reduction problem, has been extensively studied in statistics, pattern recognition (Zavaschi, Britto, Oliveira, & Koerich, 2013), finance (Won, Kim, & Bae, 2012), medicine (Gorunescu, Belciug, Gorunescu, & Badea, 2012) and data mining literature (Pinheiro et al., 2012, Saeys et al., 2007 and Zhuang et al., 2012). Among the advantages of FS are better generalization due to lower model overfitting; and efficient and cost effective data mining models (Saeys et al., 2007 and Stracuzzi and Utgoff, 2004). There are several statistical and machine learning methods that are available for solving the FS problem. Guyon and Elisseeff (2003) provide a good introduction to some of these methods. While FS methods are well established, for datasets containing large number of decision-making attributes, selecting features from training data is not always easy and is often computationally intensive (Saeys et al., 2007). For datasets containing a large number of input decision-making attributes, an automatic approach is required for identifying low cardinality subset of relevant input decision-making attributes (Stracuzzi & Utgoff, 2004). Reducing cardinality of input decision-making attributes reduces organizational information acquisition cost and improves generalizability of a classifier. Since several real-world classification problems require FS (Fahlman and Lebiere, 1990 and Kivinen and Warmuth, 1997), a variety of FS approaches are proposed in data mining literature. These FS approaches from the literature can be divided into two main categories: filter methods and wrapper methods. The primary difference between filter and wrapper methods is that the filter methods select input decision-making attributes independent of the learning algorithm, whereas the wrapper methods make learner-dependent selection of input decision-making attributes. Filter methods use statistical or information theoretical measures to reduce dimensionality. Among the techniques used in filter methods are cross entropy, correlations, principal component analysis, χ2 test and distance metrics ( Dunteman, 1989, Hall, 1999, Kira and Rendell, 1992, Koller and Sahami, 1996 and Liu and Setino, 1997). Wrapper methods, on the other hand, use search based methods for selecting input decision-making attributes. These search based methods include greedy search (Caruana & Freitag, 1994), backward elimination beam search (Aha & Bankert, 1994), nearest neighborhood search (Langley & Sage, 1994), backward best first search (Kohavi & John, 1997), randomized search (Stracuzzi & Utgoff, 2004), and heuristic genetic algorithms (Vafaie & Jong, 1995). A general consensus in the literature is that the wrapper methods outperform the filter methods (Stracuzzi & Utgoff, 2004). To our knowledge, most studies on the FS problem have assumed symmetric misclassification error costs. The FS problem under asymmetric misclassification error cost did not receive a whole lot of attention in the literature. It is possible that the misclassification cost asymmetry and the number of selected features may interact with each other making certain features more relevant when misclassification costs are symmetric and certain other features more relevant when misclassification costs are asymmetric. Additionally, the benefit of using FS when misclassification cost asymmetries increase is not clearly established. That is, it is not clear if FS is more beneficial, less beneficial or about the same when misclassification cost asymmetries increase. In our research, we implemented a misclassification cost minimizing feature selection (MCMFS) problem to gain better understanding of the FS problem under varying symmetric and asymmetric misclassification costs. When solving the MCMFS, we are essentially solving two problems: the FS problem and the misclassification cost minimization problem. In our research, we solved these two problems separately in two different stages. In the first stage, we selected decision-making features and in the second stage, we learned a misclassification cost minimizing discriminant function based on features selected in the first stage. We propose a two-stage wrapper method procedure for solving the MCMFS. Additionally, we propose two search based methods for solving the FS problem in the first stage. The first FS method uses exhaustive backtracking search and considers all possible features of cardinality greater than or equal to two decision-making attributes. The major drawback of exhaustive backtracking search is increased computational requirements making it unrealistic for a large number of input decision-making attributes. As a result, we propose a second FS method that uses simulated annealing (SA) heuristic to identify features of cardinality greater than one. The SA heuristic procedure does not guarantee an optimal solution but uses realistic computational time to provide a heuristic solution. For learning a discriminant function that minimizes misclassification cost in the second stage, we use two different methods as well. Our first method is the misclassification cost minimizing support vector machine (SVM) proposed by Masnadi-Shirazi and Vasconcelos (2010). We design our second method by considering some principles of the SVM (Bradley, Fayyad, & Mangasarian, 1999) where a classification function that minimizes misclassification cost also minimizes an upper bound on generalization error. While there are several classification algorithms that allow decision-makers to minimize misclassification costs, Pendharkar and Nanda (2006) showed that genetic algorithm (GA) based misclassification cost minimizing classifiers generally perform the best and are more flexible. Thus, we use a GA based linear classifier that minimizes misclassification cost and provides a separating plane that maximizes the margin of separation between classes. We call our misclassification cost minimizing classifier a max-margin genetic algorithm (MMGA). The rest of the paper is organized as follows. In Section 2, we formally present the MCMFS and propose a framework for two-stage solution approach. In Section 3, we describe the SVM and the MMGA classifier for solving classification problems involving asymmetric misclassification costs. In Section 4, using the proposed framework of two stage solution approach, we describe three hybrid approaches – backtracking-MMGA (BT-MMGA), BT-SVM and simulated annealing MMGA (SA-MMGA) – for solving the MCMFS. In Section 5, we describe our simulated and real-world data; and experiments and results. In Section 6, we conclude our paper with the summary of our findings.
نتیجه گیری انگلیسی
We proposed an MCMFS problem and provided exhaustive and heuristic procedures to solve the MCMFS problem. Additionally, we formulated an MMGA for solving the misclassification cost minimizing classification problem. Using simulated and real-world datasets; three different misclassification cost matrices and proposed solution procedures; we solved the MMCA problem using three feature selection procedures (BT-MMGA, SA-MMGA and BT-SVM) and traditional misclassification cost minimizing non-feature selection based SVM. The results of our experiments showed that the BT-MMGA was the best approach when misclassification costs were asymmetric and neither the SA-MMGA nor the BT-SVM could perform better than the BT-MMGA. When misclassification costs were symmetric, FS did not appear to be a good strategy and non-FS based SVM always performed better than the FS based methods. Our experiments also indicated that the set of relevant features was dependent on misclassification cost asymmetry and a set of features that were best for one misclassification cost asymmetry may not always be the best when another different misclassification cost asymmetry was chosen. The BT-MMGA and the BT-SVM provided good results for up to 10 decision-making attributes. However, for datasets containing more than 10 decision-making attributes, exhaustive search BT procedure became computationally intensive. Under such cases, SA-MMGA or other heuristic procedures were the only viable options for solving the MCMFS problem. Our two stage framework was a general framework which is suitable to include other misclassification cost minimizing classifiers such as probabilistic neural networks. Future research may focus on comparing the performance of MMGA and other misclassification cost minimizing classifiers for solving the MCMFS problem. For large scale problems containing over 10 attributes, when exhaustive search procedures cannot be used, it may be desirable to develop procedures to establish lower bounds on overall misclassification cost. These lower bounds can provide a good benchmark for performance comparisons of heuristic techniques.