Ensemble classifier generation using non-uniform layered cluster and Genetic Algorithm
Article code | Publication year | English paper length |
---|---|---|
8155 | 2013 | 13 pages (PDF) |
Publisher: Elsevier - Science Direct
Journal: Knowledge-Based Systems, Volume 43, May 2013, Pages 30–42
English Abstract
In this paper, we propose a novel cluster-oriented ensemble classifier generation method and a Genetic Algorithm-based approach to optimize its parameters. In the proposed method the data set is partitioned into a variable number of clusters at different layers, and base classifiers are trained on the clusters of each layer. Because the number of clusters varies from layer to layer, the cluster compositions in one layer differ from those in another layer. Due to this difference in cluster contents, the base classifiers trained at different layers are diverse with respect to each other. A test pattern is classified by the base classifier of its nearest cluster at each layer, and the decisions from the different layers are fused using majority voting. The accuracy of the proposed method depends on the number of layers and the number of clusters at each layer. A Genetic Algorithm-based search is incorporated to obtain the optimal number of layers and clusters. The Genetic Algorithm is evaluated under three different objective functions: optimizing (i) accuracy, (ii) diversity, and (iii) accuracy × diversity. We have conducted a number of experiments to evaluate the effectiveness of the different objective functions.
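The abstract describes a layered, cluster-based ensemble: cluster the data with a different number of clusters at each layer, train one base classifier per cluster, classify a test pattern with the base classifier of its nearest cluster at each layer, and fuse the per-layer decisions by majority voting. Below is a minimal sketch of that pipeline under stated assumptions: k-means for clustering, decision trees as base classifiers, integer non-negative class labels, and hand-chosen cluster counts per layer (the paper tunes the number of layers and clusters with a GA). This is an illustration of the idea, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier


class LayeredClusterEnsemble:
    """Train one base classifier per cluster at several layers with different
    numbers of clusters, and fuse the layer decisions by majority vote."""

    def __init__(self, clusters_per_layer=(2, 3, 5)):
        # One entry per layer: the number of clusters used at that layer.
        self.clusters_per_layer = clusters_per_layer
        self.layers = []  # list of (fitted KMeans, {cluster_id: classifier or memorized label})

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        for k in self.clusters_per_layer:
            km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
            models = {}
            for c in range(k):
                idx = km.labels_ == c
                yc = y[idx]
                if len(np.unique(yc)) == 1:
                    models[c] = yc[0]  # homogeneous cluster: memorize the single class label
                else:
                    models[c] = DecisionTreeClassifier(random_state=0).fit(X[idx], yc)
            self.layers.append((km, models))
        return self

    def predict_layers(self, X):
        """Per-layer decisions; integer array of shape (n_layers, n_samples)."""
        X = np.asarray(X)
        all_layers = []
        for km, models in self.layers:
            cluster_ids = km.predict(X)  # nearest cluster of each test pattern at this layer
            layer_pred = np.empty(len(X), dtype=int)
            for c, m in models.items():
                idx = cluster_ids == c
                if np.any(idx):
                    layer_pred[idx] = m.predict(X[idx]) if hasattr(m, "predict") else m
            all_layers.append(layer_pred)
        return np.array(all_layers)

    def predict(self, X):
        votes = self.predict_layers(X)
        # Majority voting across layers (assumes non-negative integer class labels).
        return np.array([np.bincount(col).argmax() for col in votes.T])
```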
English Introduction
An ensemble classifier [1], [2] and [9] refers to a collection of base classifiers that are trained on the data and whose decisions on a pattern are combined to obtain a classification verdict. Ensemble classifiers are also known as multiple classifier systems and committees of classifiers. The training process in an ensemble classifier aims to produce base classifiers that are accurate and that also differ from each other in terms of the errors they make on identical patterns. This phenomenon is known as diversity [3], [4] and [5]. The fusion methods, on the other hand, explore ways to merge the decisions from the base classifiers into a final verdict. A commonly used approach to generating the base classifiers is to train them on different subsets of the data. This ensures diversified learning of the base classifiers and achieves higher accuracy. The subset selection algorithm varies among the different ensemble generation methods. Patterns in a data set tend to scatter over the Euclidean space and form subgroups, and the clustering process identifies these natural subgroups [6]. The divide-and-conquer approach to ensemble classifier generation [7] produces training subsets for the base classifiers by clustering the data set. The space decomposition process identifies multiple clusters within the classified data (Fig. 1). Classified data refers to a labelled data set where each pattern is associated with a class label. A cluster produced in this way can be homogeneous or heterogeneous in nature. A homogeneous cluster contains patterns belonging to a single class, so only the class label needs to be memorized because the decision is always unique. A heterogeneous cluster contains overlapping patterns from multiple classes that are close in Euclidean space. In the divide-and-conquer approach each cluster is learned by a base classifier, so the learning of the base classifiers is focussed and specialized. A pattern is classified by finding its nearest cluster and letting the corresponding base classifier provide the verdict.
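The introduction defines diversity as the extent to which base classifiers make different errors on identical patterns. The paper cites several diversity measures [3], [4] and [5] without restating one here, so as an illustration only, the sketch below computes one common measure: pairwise disagreement, the fraction of patterns on which two classifiers output different labels, averaged over all classifier pairs.

```python
import numpy as np
from itertools import combinations


def pairwise_disagreement(predictions):
    """predictions: array of shape (n_classifiers, n_samples) of predicted labels.
    Returns the average fraction of patterns on which a pair of classifiers disagree."""
    preds = np.asarray(predictions)
    pairs = list(combinations(range(preds.shape[0]), 2))
    if not pairs:
        return 0.0
    return float(np.mean([np.mean(preds[i] != preds[j]) for i, j in pairs]))


# Example: three classifiers on five patterns (hypothetical labels).
print(pairwise_disagreement([[0, 1, 1, 0, 1],
                             [0, 1, 0, 0, 1],
                             [1, 1, 1, 0, 0]]))
```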
English Conclusion
Space decomposition or divide-and-conquer (i.e. clustering classified data) has been in practice for quite a while. As reported in [8], the performance of this approach is in some cases even worse than that of a single classifier. This is because a pattern can belong to only one cluster, so its decision is obtained from a single classifier. The concept of diversity therefore does not apply to this approach: only one classifier is in fact trained on a pattern, leading to poor classification performance. We aim to address this lack of diversity using overlapping clustering. In this regard we make use of the fact that a data set can be partitioned into different numbers of clusters. In the k-means clustering algorithm, k indicates the number of clusters, and in the hierarchical clustering algorithm [42] the cutoff threshold can be used to define the final number of clusters. To achieve diversity, the data set can be independently partitioned n times into variable numbers of clusters, so that identical patterns belong to n alternative clusters. In this paper we use the term n layers to refer to these n alternative clusterings of the data set. The decisions provided by the base classifiers trained on the non-uniform clusters at the n layers can be fused to obtain the final verdict on a pattern. The clusters at different layers are non-uniform because the number of clusters varies from layer to layer. With clustering we generate the base classifiers, and with layers we achieve diversity. Note that the number of layers at which maximum diversity is achieved depends on the characteristics of the data set and needs to be optimized. We have adopted a Genetic Algorithm-based search to identify the optimal number of layers and clusters. The optimal result in a GA depends on the objective function; we have used three different objective functions: optimizing (i) accuracy, (ii) diversity, and (iii) accuracy × diversity.

The research presented in this paper is based on the above philosophy and aims to: (i) develop a novel method for generating an ensemble of classifiers using non-uniform cluster layers and optimizing the number of layers, (ii) investigate the impact of the number of clusters and the number of layers on classification accuracy, (iii) obtain a comparative analysis of the different objective functions in the GA, and (iv) obtain a comparative analysis of how well the proposed approach performs compared to commonly used approaches for ensemble classifier generation.

The paper is organized as follows. Section 2 reviews existing approaches for ensemble classifier generation and decision fusion, some commonly used base classifiers, and the Genetic Algorithm. Section 3 presents the proposed approach to generate ensemble classifiers. The experimental platform is presented in Section 4. Section 5 presents the experimental results and discussion. Finally, Section 6 concludes the paper.
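The conclusion describes a GA search over the number of layers and the number of clusters per layer, evaluated under three objective functions. The hand-rolled sketch below illustrates one way such a search could look: a chromosome is a tuple of cluster counts, one per layer (so its length fixes the number of layers), and the fitness shown is the accuracy × diversity objective (iii) evaluated on a held-out split. The encoding, selection, crossover, mutation, and all hyper-parameters are illustrative assumptions rather than the paper's configuration; `LayeredClusterEnsemble` and `pairwise_disagreement` are the sketches given earlier on this page, not the authors' code.

```python
import random
import numpy as np
from sklearn.model_selection import train_test_split


def fitness(chromosome, X, y):
    """Accuracy x diversity of the layered ensemble encoded by the chromosome."""
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
    ens = LayeredClusterEnsemble(clusters_per_layer=chromosome).fit(X_tr, y_tr)
    acc = np.mean(ens.predict(X_val) == y_val)
    div = pairwise_disagreement(ens.predict_layers(X_val))
    return acc * div  # objective (iii); use acc or div alone for objectives (i)/(ii)


def ga_search(X, y, pop_size=10, generations=15, max_layers=5, max_clusters=8, seed=0):
    rng = random.Random(seed)

    def random_chromosome():
        n_layers = rng.randint(1, max_layers)
        return tuple(rng.randint(2, max_clusters) for _ in range(n_layers))

    population = [random_chromosome() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda c: fitness(c, X, y), reverse=True)
        parents = ranked[: pop_size // 2]                  # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(0, min(len(a), len(b)))      # one-point crossover
            child = list(a[:cut] + b[cut:])
            if rng.random() < 0.3:                         # mutate one randomly chosen gene
                child[rng.randrange(len(child))] = rng.randint(2, max_clusters)
            children.append(tuple(child))
        population = parents + children
    return max(population, key=lambda c: fitness(c, X, y))
```

Truncation selection and one-point crossover are used here only to keep the sketch short; any standard GA operators would fit the same chromosome encoding.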