Sparse ensembles using weighted combination methods based on linear programming
Article Code | Publication Year | English Article Length |
---|---|---|
25238 | 2011 | 10 pages (PDF) |
Publisher: Elsevier - Science Direct
Journal: Pattern Recognition, Volume 44, Issue 1, January 2011, Pages 97–106
English Abstract
An ensemble of multiple classifiers is widely considered an effective technique for improving the accuracy and stability of a single classifier. This paper proposes a framework of sparse ensembles and deals with new linear weighted combination methods for sparse ensembles. A sparse ensemble combines the outputs of multiple classifiers by means of a sparse weight vector. When the continuous outputs of multiple classifiers are provided, our methods formulate the problem of solving for the sparse weight vector as linear programming (LP) problems that exploit the hinge loss and/or the 1-norm regularization, both of which are sparsity-inducing techniques used in machine learning. Only classifiers with nonzero weight coefficients take part in the ensemble. In these LP-based methods, the ensemble training error is minimized while the weight vector of ensemble learning is controlled, which can be viewed as implementing the structural risk minimization rule and naturally explains the good performance of these methods. Promising experimental results on UCI data sets and radar high-resolution range profile data are presented.
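As a rough illustration only (a sketch of my own, not necessarily the exact formulation used in the paper), a 1-norm-regularized, hinge-loss LP for a two-class ensemble can be written with labels $y_i \in \{-1,+1\}$, classifier outputs $f_t(x_i)$ for $t = 1,\dots,T$, a nonnegative weight vector $w$, slack variables $\xi_i$, and a trade-off constant $C$:

```latex
\begin{aligned}
\min_{w,\;\xi}\quad & \sum_{t=1}^{T} w_t + C \sum_{i=1}^{N} \xi_i \\
\text{s.t.}\quad & y_i \sum_{t=1}^{T} w_t\, f_t(x_i) \;\ge\; 1 - \xi_i, \qquad i = 1,\dots,N, \\
& w_t \ge 0, \qquad \xi_i \ge 0 .
\end{aligned}
```

Because $w \ge 0$, the first term equals the 1-norm of $w$; together with the hinge-loss slacks, it tends to drive many weights to exactly zero, which is what makes the resulting ensemble sparse.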
English Introduction
Recently, combining multiple classifiers has been a very active research area. It is widely accepted that combining multiple classifiers can achieve better classification performance than a single (best) classifier, as supported by experimental results [1], [2] and [3]. An ensemble combines multiple versions of a single classifier or several different classifiers; a classifier used in an ensemble is called an individual or component classifier. There are two important issues in combining multiple classifiers. The first is that an ensemble of classifiers must be both diverse and accurate in order to achieve better performance. Diversity ensures that the individual classifiers make uncorrelated errors: if classifiers make the same errors, these errors are propagated to the ensemble and no improvement can be achieved by combining them. In ensemble learning, there are two schemes for obtaining diversity [4]. One scheme is to seek diversity explicitly (i.e., to define a diversity measure and optimize it), and the other is to seek diversity implicitly. Here we consider the scheme of seeking diversity implicitly. One common way is to train individual classifiers on different (randomly selected) training sets [5], [6] and [7]; Bagging [5] and Boosting [6] are well-known examples of successful iterative methods for reducing generalization error. The other way is to train multiple classifiers on different feature sets [8] and [9]. In addition, the accuracy of individual classifiers is also important, since too many poor classifiers can suppress the correct predictions of good classifiers.

The second issue concerns combination (or fusion) rules, i.e., how to combine the outputs of individual classifiers. Many combination rules have been proposed so far [10], [11], [12], [13], [14], [15] and [16]. If only class labels are available, a simple (majority) voting (SV) rule can be used [10]. If continuous outputs such as posterior probabilities are supplied, average, linear, or nonlinear combination rules can be employed [10], [12] and [16]. Linear weighted voting is the most frequently used rule [11], [12] and [15]. Work on weighted voting has addressed the problem of weight estimation in a regression setting [11], [14] and [15], or in a classification setting [12], [17] and [18]. A linear weighted voting rule based on the minimum classification error (WV-MCE) criterion is presented in [12] and solved by gradient descent methods. In [17], a genetic algorithm (GA) is used to select the best subset of classifiers and the corresponding weight coefficients in neural network ensembles. Grove and Schuurmans [18] suggest making the minimum margin of learned ensembles as large as possible by minimizing the training set error; they propose the LP-Adaboost method to find a sparse weight vector. The LP-Adaboost method in [18] and the GA-based method in [17] mark the beginning of sparse ensembles. By sparse ensembles, we mean combining the outputs of all classifiers with a sparse weight vector: each classifier model has its own weight value, zero or nonzero, and only classifiers corresponding to nonzero coefficients play a role in the ensemble. As is known, a sparse model representation in machine learning is expected to improve generalization performance and computational efficiency [19], [20] and [21].
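For concreteness, a linear weighted voting rule over continuous outputs takes only a few lines; the sketch below uses hypothetical names and assumes each individual classifier supplies class-posterior estimates (or discriminant values) for every sample:

```python
import numpy as np

def weighted_vote(posteriors, weights):
    """Linear weighted combination of classifier outputs.

    posteriors: array of shape (T, N, C) -- T classifiers, N samples, C classes,
                each entry an estimated posterior probability (or discriminant value).
    weights:    array of shape (T,) -- one weight per classifier; a sparse weight
                vector means many entries are exactly zero.
    Returns the predicted class index for each of the N samples.
    """
    # Weighted sum over the classifier axis -> ensemble scores of shape (N, C)
    ensemble_scores = np.tensordot(weights, posteriors, axes=([0], [0]))
    return ensemble_scores.argmax(axis=1)

# Plain averaging is the special case of uniform weights:
# weighted_vote(posteriors, np.ones(T) / T)
```

Simple voting, weighted voting, and sparse ensembles then differ only in how the weight vector is chosen: uniform, accuracy-proportional, or mostly zero.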
The mechanism of maximizing the sparseness of a model representation can be thought of as an approximate form of the minimum description length principle, which can be used to improve generalization performance [7]. Sparsity in machine learning can be measured by the number of nonzero coefficients in a decision function. The combination rules above, except for the LP-Adaboost and GA-based methods, try to combine all classifiers in an ensemble. In general classifier ensembles, it is necessary to combine all individual classifiers to ensure good performance, which results in large memory requirements and slow classification speed [22]. Selective ensembles, also called pruned ensembles, are designed to remedy these drawbacks: only a fraction of the individual classifiers is selected and combined using simple or weighted voting. In [22], several methods for selecting a subset of individual classifiers are introduced, and their performance is compared on several benchmark classification tasks. Selecting the optimal subset of classifiers is a combinatorial search problem, assuming that generalization performance can be estimated by some quantity measured on the training set [22]. Recently, global optimization methods, e.g., GA [23] and semi-definite programming [24], have been used to solve this combinatorial search problem. Since the global methods are computationally expensive, suboptimal ensemble pruning methods based on ordered aggregation have been proposed, including reduce-error pruning [25], margin distance minimization (MDM) [26], orientation ordering [27], boosting-based ordering [28], expectation propagation [29], and so on. Among the pruning techniques, the MDM and boosting-based ordering methods provide similar or better classification performance [22]. Actually, the concept of pruned ensembles is identical to that of sparse ensembles: in a pruned ensemble, the coefficients of selected classifiers are nonzero and those of unselected classifiers are zero, which yields a sparse weight vector. Generally, pruned ensembles use simple voting or weighted voting. In simple voting, the nonzero coefficients take the value one [22]; in weighted voting, they take a value proportional to the classification accuracy of the corresponding classifier [30] and [31], or are found by optimization methods [23], [24] and [29].

This paper gives a framework of sparse ensemble learning and proposes new weighted combination methods for sparse ensembles. The key problem in sparse ensembles is to find a sparse weight vector. Grove and Schuurmans use a linear programming method to find a sparse weight vector; the objective of LP-Adaboost in [18] is to maximize the minimum margin. Here, our goal is to find a sparse weight vector by minimizing the ensemble training error while simultaneously controlling the weight vector of ensemble learning, which can be taken as implementing the structural risk minimization rule from the viewpoint of machine learning. In our methods, the continuous outputs (estimated posterior probabilities or discriminant function values) of each individual classifier are required. This learning problem can also be formulated as linear programming problems in which the sparsity-inducing techniques, namely the hinge loss and/or the 1-norm regularization, are used. In our experiments, we take the kNN classifier as the individual classifier and apply the new linear weighted combination rules to combine multiple kNN classifiers.
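As an illustration of how such a sparse weight vector might be computed, the following sketch solves the toy LP shown after the abstract with scipy.optimize.linprog. It is a simplified two-class version under my own assumptions (signed classifier outputs in F, labels in {-1, +1}, a hypothetical trade-off constant C), not a reproduction of the paper's exact WV-LP formulations.

```python
import numpy as np
from scipy.optimize import linprog

def sparse_ensemble_weights(F, y, C=1.0):
    """Solve  min  sum(w) + C * sum(xi)
       s.t.   y_i * (F[i] @ w) >= 1 - xi_i,   w >= 0,  xi >= 0.

    F: (N, T) continuous outputs of T classifiers on N training samples.
    y: (N,) labels in {-1, +1}.
    C: hypothetical trade-off between the 1-norm term and the hinge-loss slacks.
    Returns a (typically sparse) nonnegative weight vector of length T.
    """
    N, T = F.shape
    # Decision variables are stacked as [w_1..w_T, xi_1..xi_N].
    c = np.concatenate([np.ones(T), C * np.ones(N)])
    # Margin constraints rewritten for linprog's form:  -y_i * F[i] @ w - xi_i <= -1
    A_ub = np.hstack([-(y[:, None] * F), -np.eye(N)])
    b_ub = -np.ones(N)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (T + N), method="highs")
    return res.x[:T]
```

Classifiers whose returned weight is (numerically) zero can simply be dropped, and the surviving weights feed a weighted voting rule like the one sketched earlier.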
The rest of this paper is organized as follows. In Section 2, we propose the framework of sparse ensembles and review related work, including some classical combination rules. Section 3 presents the new linear weighted voting methods based on LP. In Section 4, we compare our methods with a single kNN classifier and with kNN ensemble classifiers based on seven other combination rules on UCI data sets and on radar high-resolution range profile (HRRP) data. Section 5 concludes the paper.
English Conclusion
This paper deals with linear weighted combination methods based on LP, which are applied to sparse ensembles. In these ensembles, we consider minimizing the ensemble training error over all learned individual classifiers. The problem can be cast as linear programming problems in which the hinge loss and/or the 1-norm regularization are adopted; both techniques can induce a sparse solution. The optimization goal of these LP-based methods is to minimize the ensemble training error while controlling the weight vector of ensemble learning, which accounts for their good performance. Our combination rules can be applied to an ensemble of any classifiers whose posterior probabilities or discriminant values can be obtained. In the experiments, we compare our methods with other methods by ensembling kNN classifiers. Experimental results on UCI data sets and radar high-resolution range profile data confirm the validity of our rules. Our methods achieve promising sparseness and generalization performance; in particular, the WV-LP1 method performs very well on most of the data sets investigated here.