Download English ISI article No. 7514
Persian translation of the article title

Measurement-based fuzzy partitioning methods using the variable string length artificial bee colony algorithm

English title
Automatic fuzzy partitioning approach using Variable string length Artificial Bee Colony (VABC) algorithm
Article code: 7514
Publication year: 2012
Length of English article: 21 pages (PDF)
Source

Publisher : Elsevier - ScienceDirect

Journal : Applied Soft Computing, Volume 12, Issue 11, November 2012, Pages 3421–3441

Keywords (translated from the Persian listing)
Swarm intelligence - Artificial bee colony - Variable string length

Abstract

Swarm intelligence based automatic fuzzy clustering has recently become an important and interesting unsupervised learning problem. In this article, an automatic fuzzy clustering technique is proposed based on a novel version of the Artificial Bee Colony (ABC) algorithm. The idea of variable length genotypes is introduced into the ABC, yielding a novel version called the Variable string length Artificial Bee Colony (VABC) algorithm. The VABC algorithm is derived from the ABC by redefining or modifying some of its operations: fixed length strings are replaced by variable length strings, the scheme for producing candidate solutions is modified, and some mutation operations are introduced. Using the VABC allows a variable number of clusters to be encoded, so the VABC based Fuzzy C-Means clustering technique (VABC-FCM) does not require a priori specification of the number of clusters. Moreover, the VABC-FCM has powerful global search ability under a rational parameter setting. Several artificial and real-life data sets are used to validate the performance of VABC-FCM. The experimental results show that VABC-FCM can automatically evolve the optimal number of clusters and find a proper fuzzy partitioning for these data sets when a rational validity index is adopted. Finally, the performance of VABC-FCM is compared with that of Variable string length Genetic Algorithm based Fuzzy C-Means clustering (VGA-FCM), Particle Swarm Optimization based Fuzzy C-Means clustering (PSO-FCM), and Differential Evolution based Fuzzy C-Means clustering (DE-FCM). The results show that VABC-FCM outperforms VGA-FCM, PSO-FCM and DE-FCM in most cases.
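For orientation, the candidate-producing step that the abstract says is modified can be sketched in its standard, fixed-length ABC form. The sketch below is not the VABC scheme, which this excerpt does not spell out; the function names `abc_candidate` and `greedy_select`, the scalar bounds, and the cost-minimizing convention are assumptions made for illustration. Note that the update indexes the partner string at the same position j, which presupposes strings of equal length.

```python
import random

def abc_candidate(population, i, rng, lower, upper):
    """Standard ABC neighbor move for food source i (fixed-length strings)."""
    x = population[i]
    # Pick a partner k != i and a single position j to perturb.
    k = rng.choice([idx for idx in range(len(population)) if idx != i])
    j = rng.randrange(len(x))
    phi = rng.uniform(-1.0, 1.0)
    v = list(x)
    # v_ij = x_ij + phi * (x_ij - x_kj); requires len(population[k]) > j.
    v[j] = x[j] + phi * (x[j] - population[k][j])
    # Clamp to the search bounds (a common convention, assumed here).
    v[j] = min(max(v[j], lower), upper)
    return v

def greedy_select(x, v, cost):
    """Keep the candidate only if it has strictly lower cost."""
    return v if cost(v) < cost(x) else x
```

When strings in the same population have different lengths, position j of the partner may not exist or may belong to a different cluster, which is presumably one reason a modified scheme is needed.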

Introduction

Clustering can be considered the most important unsupervised learning problem; like every other problem of this kind, it deals with finding a natural partitioning of a data set such that data points within the same cluster are more similar to each other than to those in different clusters. Existing clustering algorithms can be broadly classified into two categories: hierarchical clustering and partitional clustering [40]. Fuzzy C-Means (FCM) [6] is a well-known partitional clustering technique that uses the principles of fuzzy sets to evolve a partition matrix for the unlabeled data points. However, FCM has three major limitations: (1) it often gets stuck at suboptimal solutions, depending on the initial configuration; (2) it requires a priori specification of the number of clusters (c); and (3) it can detect only hyperspherical clusters. In this paper, we focus on the first two issues and propose a novel fuzzy clustering technique. To overcome the first limitation, evolutionary algorithms and swarm intelligence techniques can be applied. Genetic Algorithms (GAs) [10] and [32] are randomized search and optimization techniques guided by the principles of evolution and natural genetics. Genetic and other evolutionary algorithms such as Differential Evolution (DE) have earlier been used for clustering data; see, without any claim of completeness, [1], [7], [9], [12], [26], [27], [29], [30] and [35]. Over the last decade, modeling the behavior of social animals such as birds, ants, and bees for the purpose of search and optimization has become an emerging area of swarm intelligence and has been successfully applied to clustering. Some ant colony algorithm based clustering techniques were presented in [13], [18] and [41], and their performance was compared with that of GA, Tabu Search (TS) and Simulated Annealing (SA). Particle Swarm Optimization (PSO), which simulates bird flocking, was used for clustering in [17], [25], [31], [36] and [47].
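As a rough illustration of the FCM iteration described above, the following sketch alternates the textbook membership and center updates with fuzzifier m. It is a minimal sketch and not the authors' implementation; the function name `fcm`, its parameters, and the choice to pass the initial centers explicitly are assumptions. Because the result depends on `init_centers` and the number of centers is fixed by that argument, the first two limitations listed above are visible directly in the signature.

```python
import math

def fcm(points, init_centers, m=2.0, iters=100, tol=1e-6):
    """Plain Fuzzy C-Means; returns (centers, memberships)."""
    centers = [list(v) for v in init_centers]
    c, n, dim = len(centers), len(points), len(points[0])
    u = []
    for _ in range(iters):
        # Membership update: u[i][k] is the degree of point i in cluster k.
        u = []
        for p in points:
            # Clamp distances so a point sitting on a center does not divide by zero.
            d = [max(math.dist(p, v), 1e-12) for v in centers]
            u.append([1.0 / sum((d[k] / d[j]) ** (2.0 / (m - 1.0))
                                for j in range(c))
                      for k in range(c)])
        # Center update: mean of the points weighted by u^m.
        new_centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            new_centers.append([sum(w[i] * points[i][a] for i in range(n)) / total
                                for a in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # stop once the centers have essentially stopped moving
            break
    return centers, u
```

In an evolutionary or swarm setting, `init_centers` would typically come from a candidate solution rather than from a single random draw, which is how such methods try to escape poor local optima.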
Honey-bees are among the most closely studied social insects. Their foraging behavior, learning, memorizing and information sharing characteristics have recently been one of the most interesting research areas in swarm intelligence [45]. Recently, Karaboga [20] and Karaboga and Basturk [21] described an Artificial Bee Colony (ABC) algorithm, based on the foraging behavior of honey-bees, for numerical optimization problems. They compared the performance of the ABC algorithm with that of other well-known modern heuristic algorithms such as GA, DE and PSO on unconstrained optimization problems [21], and remarked that the ABC performs better than GA, DE and PSO in most cases. This motivated the ABC based K-means clustering techniques [23] and [48] and ABC based fuzzy clustering (ABC-FCM) [22]. The performance of ABC based clustering techniques was compared with that of popular heuristic algorithms for clustering such as GA, SA, TS, and PSO [22], [23] and [48]. The comparison reveals that ABC based clustering gives very encouraging results in terms of solution quality and processing time. In all of the above works, however, the number of clusters is assumed to be fixed a priori and/or the clusters are assumed to be crisp in nature. In most real-life situations, the number of clusters in a data set is not known a priori. The real challenge in this situation is to automatically evolve a proper value of c and to provide an appropriate partitioning of the data set. To overcome the second limitation, Maulik and Bandyopadhyay [28] attempt to automatically evolve the appropriate number of clusters as well as the fuzzy partitioning of a data set. For this purpose, a Variable string length GA (VGA), in which different chromosomes in the population may encode different numbers of clusters, is used. Thus, the so-called Fuzzy-VGA clustering technique is proposed.
In this method, a clustering validity index such as the Xie-Beni (XB) index of the partitioning encoded in a chromosome is used to measure its fitness value. To handle variable string lengths, the crossover and mutation operators are redefined accordingly. Following Maulik and Bandyopadhyay, Saha and Bandyopadhyay [37] propose a fuzzy genetic clustering technique based on a new Point Symmetry distance, called Fuzzy-VGAPS, which can not only automatically evolve the number of clusters but also handle clusters of different shapes. Beyond these, there is little literature on this issue to date. It is therefore both necessary and interesting to propose novel methods for this issue. As already remarked, the ABC algorithm has recently become one of the most popular swarm intelligence algorithms for numerical function optimization and clustering. However, the conventional ABC based clustering techniques [22], [23] and [48] cannot automatically determine the number of clusters. It is therefore interesting to propose a novel version of the ABC algorithm that shares this property of the VGA. In other words, as a counterpart to the VGA, a Variable string length ABC (VABC) can be defined, in which different strings in the same population encode different numbers of clusters. To deal with variable string lengths, the original exploration scheme in the ABC is modified. In addition, to avoid suboptimal solutions and to accelerate convergence, some mutation operations are introduced. With the VABC, a novel Fuzzy C-Means algorithm that can automatically evolve the number of clusters is thus proposed to partition unlabeled data points. The objective of this paper is to propose a novel fuzzy partitioning approach using the Variable string length ABC. The two main contributions of this study are the proposed VABC algorithm and the VABC based fuzzy partitioning approach. The rest of this paper is organized as follows.
Section 2 introduces the Variable string length ABC algorithm (VABC) after recalling the principles of the ABC algorithm. Section 3 presents the VABC based fuzzy partitioning approach (VABC-FCM). Section 4 designs a numerical experiment, using an artificial data set, to analyze the influence of control parameters on the performance of VABC-FCM. In Section 5, further artificial and real-life data sets are used to validate VABC-FCM by comparing it with other clustering approaches. The last section concludes the paper.
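The Xie-Beni (XB) index mentioned in the introduction as a fitness measure can be sketched as follows. This uses the commonly cited form of the index, the membership-weighted squared distances divided by n times the smallest squared distance between two centers, with lower values indicating a more compact and better separated partition. The function name `xie_beni` and the data layout (a list of points, a list of centers, and a row-per-point membership matrix) are assumptions; how the index is mapped to a fitness value in the paper is not stated in this excerpt.

```python
import math

def xie_beni(points, centers, u):
    """Xie-Beni validity index; needs at least two centers. Lower is better."""
    n = len(points)
    # Compactness: squared distances weighted by squared memberships.
    compactness = sum(u[i][k] ** 2 * math.dist(points[i], centers[k]) ** 2
                      for i in range(n) for k in range(len(centers)))
    # Separation: smallest squared distance between two distinct centers.
    separation = min(math.dist(a, b) ** 2
                     for idx, a in enumerate(centers)
                     for b in centers[idx + 1:])
    return compactness / (n * separation)
```

Because the index can be evaluated for any number of centers, it gives a common scale on which candidate solutions encoding different numbers of clusters can be compared.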

Conclusion

In this study, a novel fuzzy clustering technique (VABC-FCM) is proposed based on the Variable string length Artificial Bee Colony (VABC) algorithm. The VABC-FCM can automatically evolve the number of clusters and find a proper fuzzy partitioning for a wide variety of data sets. The technique is developed in two steps. First, the idea of variable length genotypes is introduced into the Artificial Bee Colony algorithm, yielding the so-called Variable string length Artificial Bee Colony (VABC) algorithm. The motivations for the VABC algorithm are as follows: firstly, the ABC algorithm has recently become a popular swarm intelligence algorithm, and a Variable string length ABC is the natural counterpart of the Variable string length GA (VGA); secondly, as remarked by Karaboga [21], the ABC outperforms the GA in most cases for numerical optimization, so a variable string length version of the ABC is of interest; finally, the idea of variable length genotypes is useful and widely applied in many fields, and introducing it into the ABC provides a novel method from an algorithmic point of view and broadens the application of the ABC algorithm from a practical point of view. The VABC algorithm is derived from the ABC by redefining or modifying some of its operations: fixed length strings are replaced by variable length strings, the scheme for producing candidate solutions is modified, and some mutation operations are introduced. Second, the VABC is used to address two main open issues for clustering techniques, i.e., to make fuzzy clustering find a proper fuzzy partitioning and to automatically evolve the number of clusters without requiring its a priori specification. Using the Variable string length ABC allows a variable number of clusters to be encoded.
This makes the VABC-FCM require no a priori specification of the number of clusters; only an upper bound on the number of clusters is needed. Using an artificial data set, the influence of the control parameters and of the initialization method on the performance of VABC-FCM is analyzed. The analysis concludes that the performance of VABC-FCM is insensitive to the initialization method and sensitive to the choice of the mutation probability threshold, and suggests that this threshold should not be too small. Following these suggestions, several artificial and real-life data sets are used to validate the VABC-FCM. The experimental results show that the VABC-FCM can automatically evolve the optimal number of clusters and find a proper fuzzy partitioning for these data sets when a rational validity index is adopted. On the same data sets, the performance of VABC-FCM is compared with that of VGA-FCM, PSO-FCM and DE-FCM. The comparison indicates that in most cases the VABC-FCM outperforms VGA-FCM, PSO-FCM and DE-FCM with respect to MS, classification accuracy, convergence speed and the ability to find the true number of clusters. This study also shows that the choice of validity index is important for clustering techniques. Thus, one direction for future work is to study VABC based multi-objective clustering techniques. Moreover, it is essential to make the VABC-FCM suitable for clustering data sets with any type of similar density clusters, irrespective of their geometrical shape and overlapping nature. This is left as further work.
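The idea of encoding a variable number of clusters under an upper bound can be illustrated with a small sketch. It assumes that a string is simply a list of cluster centers, so that its length is the number of clusters it encodes; this center-based encoding, the lower bound of two clusters, and the helper names `random_string` and `mutate_length` are all assumptions for illustration. In particular, `mutate_length` is a hypothetical length-changing mutation and is not one of the mutation operations defined in the paper, which this excerpt does not describe.

```python
import random

def random_string(data, c_max, rng, c_min=2):
    """A string of c centers drawn from the data, with c in [c_min, c_max]."""
    c = rng.randint(c_min, c_max)
    return [list(p) for p in rng.sample(data, c)]

def mutate_length(string, data, c_max, rng, c_min=2):
    """Hypothetical mutation: add or drop one center, staying within bounds."""
    s = [list(v) for v in string]  # copy so the input string is left untouched
    can_grow = len(s) < c_max
    can_shrink = len(s) > c_min
    if can_grow and (rng.random() < 0.5 or not can_shrink):
        s.append(list(rng.choice(data)))  # new center taken from a data point
    elif can_shrink:
        s.pop(rng.randrange(len(s)))      # remove a randomly chosen center
    return s
```

With this layout, only `c_max` has to be supplied by the user; the number of clusters of any candidate is read off as `len(string)`.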