Diversity measures for multiple classifier system analysis and design
Article code | Year of publication | Pages in English article |
---|---|---|
27961 | 2005 | 16 pages (PDF) |
Publisher : Elsevier - Science Direct
Journal : Information Fusion, Volume 6, Issue 1, March 2005, Pages 21–36
Abstract
In the context of Multiple Classifier Systems, diversity among base classifiers is known to be a necessary condition for improvement in ensemble performance. In this paper the ability of several pair-wise diversity measures to predict generalisation error is compared. A new pair-wise measure, which is computed between pairs of patterns rather than pairs of classifiers, is also proposed for two-class problems. It is shown experimentally that the proposed measure is well correlated with base classifier test error as base classifier complexity is systematically varied. However, correlation with unity-weighted sum and vote is shown to be weaker, demonstrating the difficulty in choosing base classifier complexity for optimal fusion. An alternative strategy based on weighted combination is also investigated and shown to be less sensitive to number of training epochs.
Introduction
A method of designing pattern recognition systems, known as the Multiple Classifier System (MCS) or committee/ensemble approach, has emerged over recent years to address the practical problem of designing classification systems with improved accuracy and efficiency. The aim is to design a composite system that outperforms any individual classifier by pooling together the decisions of all classifiers. The rationale is that it may be more difficult to optimise the design of a single complex classifier than to optimise the design of a combination of relatively simple classifiers. Attempts to understand the effectiveness of the MCS framework have prompted the development of various measures. The Margin (Section 4.1) concept was used originally to help explain Boosting and Support Vector Machines. Bias and Variance (Section 4.2) are concepts from regression theory that have motivated modified definitions for 0/1 loss function for characterising Bagging and other ensemble techniques. Various diversity measures (Section 3) have been studied with the intention of determining whether they correlate with ensemble accuracy. However, the question of whether the information available from any of these measures can be used to assist MCS design is open. Most commonly, MCS parameter values are set with the help of either a validation set or cross-validation techniques [1]. In [2] these measures are described and explained in the context of a vote counting framework. In this paper, in contrast to [2], the proposed measure relaxes the assumption on Hamming Distance (Eq. (22)) and is experimentally compared with various pair-wise diversity measures. Although it is known that diversity among base classifiers is a necessary condition for improvement in ensemble performance, there is no general agreement about how to quantify the notion of diversity among a set of classifiers. Diversity measures can be categorised into two types [3], pair-wise and non-pair-wise. 
In order to apply pair-wise measures to finding overall diversity of a set of classifiers it is necessary to average over the set. Non-pair-wise measures attempt to measure diversity of a set of classifiers directly, based for example on Variance, entropy or proportion of classifiers that fail on randomly selected patterns. The main difficulty with diversity measures is the so-called accuracy–diversity dilemma. As explained in [4], as base classifiers approach the highest levels of accuracy, diversity must decrease so that it is expected that there will be a trade-off between diversity and accuracy. There has been no convincing theory or experimental study to suggest that there exists any measure that can reliably predict generalisation error of an ensemble. In [3] the desirability of using negatively correlated base classifiers in an ensemble is recognised, and it is shown experimentally that four pair-wise diversity measures are similarly related to majority vote accuracy when classifier dependency is systematically changed. The conclusion in [5] was that the Double Fault measure (Eq. (17)) showed reasonable correlation with some combination methods. Since there is a lack of a general theory on how diversity impacts ensemble performance, experimental studies provide an important contribution to discovering whether a relationship exists and if so whether it can be quantified and understood. To be really useful for MCS design, a measure should be capable of extracting relevant information from the training set. Model selection from training data is known to require a built-in assumption, since realistic learning problems are in general ill-posed [6]. The assumption here is that base classifier complexity is varied over a suitable range and that over-fitting of the training set is detected by observing changes in diversity or correlation.
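The averaging step for pair-wise measures can be illustrated with a small sketch. It uses the definitions of the disagreement and double-fault measures commonly given in the ensemble literature, which may differ in detail from the equations in the paper itself. The function names and the `oracle` data (1 where a classifier labels a pattern correctly, 0 otherwise) are hypothetical and serve only as an illustration.

```python
from itertools import combinations

def pair_counts(a, b):
    """Joint outcome counts for two classifiers' oracle outputs (1 = correct)."""
    n11 = sum(1 for x, y in zip(a, b) if x and y)          # both correct
    n00 = sum(1 for x, y in zip(a, b) if not x and not y)  # both wrong
    n10 = sum(1 for x, y in zip(a, b) if x and not y)      # only first correct
    n01 = sum(1 for x, y in zip(a, b) if not x and y)      # only second correct
    return n11, n00, n10, n01

def disagreement(a, b):
    """Fraction of patterns on which exactly one of the two classifiers is correct."""
    _, _, n10, n01 = pair_counts(a, b)
    return (n10 + n01) / len(a)

def double_fault(a, b):
    """Fraction of patterns that both classifiers get wrong."""
    _, n00, _, _ = pair_counts(a, b)
    return n00 / len(a)

def mean_pairwise(measure, oracle):
    """Average a pair-wise measure over all distinct pairs of classifiers."""
    pairs = list(combinations(oracle, 2))
    return sum(measure(a, b) for a, b in pairs) / len(pairs)

# Hypothetical oracle outputs: 3 classifiers (rows) on 4 patterns (columns).
oracle = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
]
mean_dis = mean_pairwise(disagreement, oracle)
mean_df = mean_pairwise(double_fault, oracle)
print(mean_dis, mean_df)
```

With three classifiers there are three pairs, so each overall value is the mean of three pair-wise values.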
It is shown experimentally in Section 6 that, over a range of datasets, some measures are well correlated with base classifier test error when number of training epochs is varied. As with Bias/Variance definitions (Section 4.2) one must assume that the underlying probability distributions are well-behaved, and it is easy to construct examples of probability distributions for which the method fails. The results in Section 6 also demonstrate that correlation with unity-weighting test error is not as strong as with the mean base classifier test error, illustrating the difficulty of choosing base classifier complexity for optimal fusion. An alternative strategy based on weighted combination is also investigated and the sensitivity of combined test error to number of epochs is compared with unity weighting. The paper is organised as follows. A measure of correlation, based on a spectral representation of a Boolean function, is defined in Section 2. Conventional pair-wise measures are described in Section 3, which also includes proposal of a new pair-wise measure computed over pairs of patterns rather than pairs of classifiers. Margin and Bias/Variance are discussed in Section 4, and in Section 5 various weighted combination schemes are proposed. Experimental evidence, incorporating multi-layer perceptron (MLP) base classifiers in an MCS framework, is presented and evaluated in Section 6.
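The difference between the vote and sum rules mentioned above, and the effect of replacing unity weights with other weights, can be sketched for a two-class problem. This is a minimal illustration that assumes each base classifier produces a real-valued output whose sign gives the predicted class. The weighting schemes proposed in Section 5 of the paper are not reproduced here, so the weights below are purely hypothetical.

```python
def sign(x):
    """Map a real-valued output to a class label in {-1, +1}; ties go to +1."""
    return 1 if x >= 0 else -1

def majority_vote(outputs):
    """Each classifier casts one hard vote; the ensemble takes the majority."""
    return sign(sum(sign(o) for o in outputs))

def weighted_sum(outputs, weights=None):
    """Combine soft outputs; with no weights given this is the unity-weighted sum."""
    if weights is None:
        weights = [1.0] * len(outputs)
    return sign(sum(w * o for w, o in zip(weights, outputs)))

# Hypothetical soft outputs of three base classifiers for a single pattern.
outputs = [0.9, -0.2, -0.1]

vote_label = majority_vote(outputs)                       # two of three vote -1
sum_label = weighted_sum(outputs)                         # 0.9 - 0.2 - 0.1 = 0.6
weighted_label = weighted_sum(outputs, [0.1, 1.0, 1.0])   # 0.09 - 0.2 - 0.1 = -0.21
print(vote_label, sum_label, weighted_label)
```

In this example one confident classifier outweighs two weakly confident ones under the sum rule but not under the vote, and down-weighting that classifier reverses the sum decision.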
Conclusion
The experiments reported in this paper demonstrate how various measures and test error vary with the complexity of the MLP base classifier. In order to quantify this relationship, correlation coefficients with respect to test error for a varying number of training epochs were calculated for each dataset. The mean correlation coefficients for artificial and real data show that the proposed pair-wise measure, calculated over patterns rather than classifiers, is well correlated with base classifier test error and warrants further investigation. The results suggest that it may be possible to select base classifier complexity to minimise mean base classifier test error based on information extracted from the training set. MAJ or SUM is generally optimised at a smaller number of epochs than the base classifier, but design of optimal fusion based on the proposed measure may be feasible if a relationship between base classifier and fused test error can be established. An alternative strategy is to use a weighted combination, which is shown in this paper to be much less sensitive to the number of training epochs. Further work is aimed at applying these measures to multi-class problems, by incorporating (Error Correcting) Output Coding [24], which decomposes multi-class into a set of complementary two-class problems.
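The correlation analysis described above can be sketched as follows. The excerpt does not say which correlation coefficient the paper uses, so the Pearson coefficient is assumed here. The numbers are hypothetical placeholders chosen to be exactly linear, not results from the paper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values recorded at four increasing numbers of training epochs.
measure_by_epoch = [0.10, 0.20, 0.35, 0.50]     # a measure computed on the training set
test_error_by_epoch = [0.30, 0.26, 0.20, 0.14]  # mean base classifier test error

r = pearson(measure_by_epoch, test_error_by_epoch)
print(round(r, 6))
```

A coefficient close to +1 or -1 across the range of epochs would indicate that the measure tracks test error. Repeating the calculation per dataset and averaging corresponds to the mean correlation coefficients mentioned above.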