Combining functional networks and sensitivity analysis as a method for feature selection
Publisher: Elsevier - Science Direct
Journal: Expert Systems with Applications, Volume 38, Issue 10, 15 September 2011, Pages 12930–12938
In this paper, a new wrapper method for feature selection, IAFN-FS (Incremental ANalysis Of VAriance and Functional Networks for Feature Selection), is presented. The method uses the AFN (ANOVA and Functional Networks) learning method as its induction algorithm; follows a backward non-sequential strategy starting from the complete set of features (so that several variables can be discarded in one step, which reduces computational time); and is able to consider “multivariate” relations between features. An important characteristic of the method is that its results can be interpreted by the user, because the relevance of each selected or rejected feature is given in terms of its variance. IAFN-FS is applied to several real-world benchmark classification data sets and shows adequate performance. A comparison with the results obtained by other wrapper methods is also carried out, showing that the proposed method performs better on average.
Feature extraction addresses the problem of finding the most compact and informative set of features for a given problem, in order to improve the efficiency of data storage or processing. The problem of feature extraction is decomposed into two steps: feature construction and feature selection. Feature construction methods complement human expertise in converting “raw” data into a set of useful features. Feature construction may be considered a preprocessing transformation that can include standardization, normalization, discretization, signal enhancement, extraction of local features, etc. Some construction methods do not alter the dimensionality of the space, while others enlarge it, reduce it, or can act in either direction. However, one should take care not to lose information at the feature construction stage. Guyon and Elisseeff (2003) argued that it is always better to err on the side of being too inclusive rather than risk discarding useful information. Adding many features seems reasonable, but it comes at a price: it increases the dimensionality of the patterns and thereby immerses the relevant information in a sea of possibly irrelevant, noisy or redundant features.
Feature selection is the process in which the number of initial features is reduced and a subset that retains enough information to obtain good, or even better, performance results is selected. Feature selection is primarily performed to select relevant and informative features, but it can have other motivations, including:
• general data reduction, to limit storage requirements and increase algorithm speed,
• feature set reduction, to save resources in the next round of data collection or during utilization,
• performance improvement, to gain in predictive accuracy,
• data understanding, to gain knowledge about the process that generated the data or simply to visualize the data,
• reduction of the time required, providing faster and more effective models.
There are several ways of classifying feature selection methods.
The most common taxonomy divides them into wrapper and filter methods (Kohavi & John, 1997). Filter methods rely on general characteristics of the training data to provide a complete ordering of the features using a relevance index, without optimizing the performance of a predictor (Guyon & Elisseeff, 2003). Wrapper methods use a learning algorithm, along with a statistical re-sampling technique such as cross-validation, to score subsets of features according to their predictive value (Kohavi & John, 1997). Wrapper methods are usually more expensive computationally, but they also tend to result in better performance (Blum & Langley, 1997).
Another important classification of feature selection methods is the distinction between univariate and multivariate algorithms. In this respect, many filters (ANOVA, t-test, Kolmogorov–Smirnov test, etc.), and also some wrappers (Setiono & Liu, 1997; Yu & Chen, 2005; Yu et al., 2005), rank or select the features according to some consistency or redundancy measure that assumes feature independence. This assumption has some limitations:
• features that are not individually relevant may become relevant in the context of others;
• features that are individually relevant may not all be useful because of possible redundancies.
Some existing methods do not effectively capitalize on the nature of multivariate associations between classes. This is especially important for data sets with overlapping patterns, which cannot be efficiently separated on the basis of distance measures or decision boundaries. So-called “multivariate” methods take feature dependencies into account (Guyon et al., 2002; Lai et al., 2006). Multivariate methods can potentially achieve better results because they do not make the simplifying assumption of variable/feature independence. One justification of multivariate methods is that features that are individually irrelevant may become relevant when used in combination (for example, in the XOR problem).
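The XOR case can be checked directly. The following is a minimal sketch, not part of the original paper: the helper `best_accuracy` is a hypothetical name introduced here, and it measures the best accuracy any classifier could reach when it only sees a chosen subset of the two binary features.

```python
from itertools import product

# The four points of the XOR problem: label = x1 XOR x2.
X = [(a, b) for a, b in product([0, 1], repeat=2)]
y = [a ^ b for a, b in X]


def best_accuracy(columns):
    """Best accuracy of any classifier that only sees the given columns.

    Hypothetical helper for illustration: rows are grouped by the values of
    the visible features, and the majority label is predicted in each group.
    """
    groups = {}
    for row, label in zip(X, y):
        key = tuple(row[c] for c in columns)
        groups.setdefault(key, []).append(label)
    correct = sum(max(labels.count(0), labels.count(1))
                  for labels in groups.values())
    return correct / len(y)


acc_x1 = best_accuracy([0])       # first feature alone
acc_x2 = best_accuracy([1])       # second feature alone
acc_both = best_accuracy([0, 1])  # both features together
print(acc_x1, acc_x2, acc_both)
```

Each feature alone does no better than chance, while the pair separates the classes perfectly, which is the situation a univariate ranking would miss.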
Another justification of multivariate methods is that they take feature redundancy into account and yield more compact subsets of features.
Wrappers use different strategies to search through the space of feature subsets. The size of this space is exponential in the number of features, and thus an exhaustive search is intractable in most real situations. Search strategies are mainly backward or forward, although there are some new methods that start at an intermediate point. In a forward selection method (Bo & Jonassen, 2002), one starts with an empty set of features and progressively adds the features that improve a performance index. In a backward elimination procedure, one starts with all the features and progressively eliminates the least useful ones. Both procedures are reasonably fast and robust against overfitting, and both provide nested feature subsets. However, they may lead to different subsets and, depending on the application and the objectives, one approach may be preferred over the other. Backward elimination procedures may yield better performance, but at the expense of possibly larger feature sets. However, if the selected feature set is reduced too much, the performance of the method may drop drastically.
Although most recent applications of feature selection methods are in very high-dimensional spaces with a comparatively small number of samples (Guyon & Elisseeff, 2003), there are still several issues that deserve attention in wrappers, such as the stability of the selection method and the use of knowledge to guide the search. In this paper, a wrapper algorithm for feature selection is presented. The method, called IAFN-FS (Incremental ANOVA and Functional Networks Feature Selection), is based on functional networks (Castillo, Cobo, Gutiérrez, & Pruneda, 1998) and analysis of variance decomposition.
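The generic backward elimination procedure described above can be sketched as follows. This is the plain sequential variant that removes one feature per step, not the non-sequential IAFN-FS strategy. The names `backward_elimination`, `score`, `tolerance` and `toy_score` are assumptions introduced for illustration; in a real wrapper, `score` would typically be the cross-validated accuracy of the induction algorithm trained on the given subset.

```python
def backward_elimination(features, score, tolerance=0.0):
    """Greedy sequential backward elimination (illustrative sketch).

    `score(subset)` is a hypothetical callable that returns an estimate of
    predictive performance for a list of features; higher is better.
    """
    current = list(features)
    current_score = score(current)
    while len(current) > 1:
        # Try dropping each remaining feature and keep the best candidate.
        candidates = [[f for f in current if f != drop] for drop in current]
        best = max(candidates, key=score)
        best_score = score(best)
        # Stop when even the best removal lowers the score by more than
        # `tolerance`.
        if best_score < current_score - tolerance:
            break
        current, current_score = best, best_score
    return current, current_score


# Toy score (an assumption, not real data): features "a" and "b" carry all
# the signal, and every feature kept costs a small penalty.
def toy_score(subset):
    signal = 0.45 * ("a" in subset) + 0.45 * ("b" in subset)
    return signal - 0.01 * len(subset)


selected, final_score = backward_elimination(["a", "b", "c", "d"], toy_score)
print(selected, final_score)
```

Because each step evaluates one candidate subset per remaining feature, the search visits a number of subsets that grows quadratically with the number of features rather than exponentially, at the cost of possibly missing the best subset.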
Functional networks (FN) are a generalization of neural networks that bring together domain knowledge, to determine the structure of the problem, and data, to estimate the unknown functional neurons (Castillo et al., 1998). Knowledge can therefore be used to guide the feature selection. The IAFN-FS method follows a backward strategy and considers “multivariate” relations between features. IAFN-FS also has several other advantages; for example, it allows several variables to be discarded in a single step. Another important advantage of the method is that its results can be interpreted by the user, because the relevance of each selected or rejected feature is given in terms of variance. The proposed method is applied to real-world classification data sets from the UCI Machine Learning Repository, and its performance is compared with that of other wrapper methods. As will be shown, IAFN-FS achieves good accuracy while maintaining a reduced set of variables.
The paper is structured as follows: Section 2 gives a brief introduction to functional networks and describes the ANOVA decomposition, including how to obtain the sensitivity indices. Section 3 describes the proposed wrapper method for feature selection. Section 4 presents the results achieved on different benchmark data sets together with a comparative study. Finally, Section 5 is devoted to conclusions and future work.
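For orientation, the ANOVA decomposition and sensitivity indices mentioned above are commonly written in the following standard (Sobol-type) form. This is a general sketch only: the symbols $f_0$, $f_\nu$, $D_\nu$ and $S_\nu$ are assumptions of this note and need not match the notation used in Section 2 of the paper.

```latex
% Functional ANOVA decomposition of a square-integrable function of
% n independent inputs into terms of increasing interaction order
f(x_1,\dots,x_n) = f_0 + \sum_{i=1}^{n} f_i(x_i)
  + \sum_{1 \le i < j \le n} f_{ij}(x_i, x_j) + \dots + f_{12\dots n}(x_1,\dots,x_n)

% The terms are mutually orthogonal, so the total variance splits additively
D = \sum_{i} D_i + \sum_{i<j} D_{ij} + \dots + D_{12\dots n},
\qquad D_{\nu} = \operatorname{Var}\bigl(f_{\nu}\bigr)

% Sensitivity index of the term indexed by the subset \nu of inputs
S_{\nu} = \frac{D_{\nu}}{D}, \qquad \sum_{\nu} S_{\nu} = 1
```

Read this way, a feature whose own index and whose interaction indices are all close to zero explains almost none of the output variance, which is one way of expressing relevance "in terms of variance".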
Conclusion
A new wrapper method for binary (two-class) data sets, called IAFN-FS, has been described. The method is based on ANOVA decomposition and functional networks, and it allows the selection performed over the feature set to be interpreted in terms of the variance of each feature, alone or in combination with others. Another important advantage of the method is the use of a non-sequential backward search that permits discarding more than one variable in each selection step, thus reducing computational time. The performance of the method, in terms of both final accuracy and final number of features, is tested on several artificial and real data sets and compared with that achieved by other wrapper methods based on well-known classifiers such as naïve Bayes, C4.5 or SVM using different search strategies. As the experimental results show, the proposed method obtains better or similar performance.
As future work, since the algorithm only allows feature selection in binary data sets, we plan to extend the method to data sets with multiple classes. There are different ways to deal with multiple-class problems; a common technique, which will be used in this case, consists of dividing the problem into several binary problems, solving each one individually and then unifying the results. The need for an appropriate symbolic-to-numeric conversion to improve performance was also identified; therefore, different conversion methods will be tested. Finally, a more complicated point is the use of the algorithm on data sets with a very high number of input features. In this case, the exponential complexity of the decomposition makes IAFN-FS difficult to apply. Thus, we plan to develop a hybrid algorithm that applies a filter, or an ensemble of filters, as a first step and uses the wrapper proposed here as a second step.
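The multi-class extension mentioned above as future work could look roughly like the following one-vs-rest sketch. It is not the authors' implementation: `one_vs_rest_selection`, `select_binary` and `toy_selector` are hypothetical names, and merging the per-class results by set union is an assumption, since the text does not say how the results are unified.

```python
def one_vs_rest_selection(X, y, select_binary):
    """Run a binary-only feature selector once per class and merge results.

    `select_binary(X, labels)` is a hypothetical callable standing in for
    any two-class selector; it returns the selected column indices.
    Merging by set union is an assumption made for this sketch.
    """
    per_class = {}
    for cls in sorted(set(y)):
        # Relabel the data as "this class" (1) versus "all the others" (0).
        binary_labels = [1 if label == cls else 0 for label in y]
        per_class[cls] = set(select_binary(X, binary_labels))
    merged = set().union(*per_class.values())
    return merged, per_class


def toy_selector(X, labels, threshold=0.6):
    """Toy stand-in: keep columns whose class means differ by > threshold."""
    selected = set()
    for col in range(len(X[0])):
        pos = [row[col] for row, lab in zip(X, labels) if lab == 1]
        neg = [row[col] for row, lab in zip(X, labels) if lab == 0]
        if abs(sum(pos) / len(pos) - sum(neg) / len(neg)) > threshold:
            selected.add(col)
    return selected


# Tiny made-up data set with three classes and three columns.
X = [(1, 0, 5), (1, 0, 5), (0, 1, 5), (0, 1, 5), (0, 0, 5), (0, 0, 5)]
y = ["a", "a", "b", "b", "c", "c"]
merged, per_class = one_vs_rest_selection(X, y, toy_selector)
print(merged, per_class)
```

Keeping the per-class sets alongside the merged set preserves the information about which features matter for which class, which other unification rules (for example, intersection or voting) could also use.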