Download English ISI article No. 7742
Article title (translated from the Persian)

A novel ensemble algorithm for medical classification based on ant colony optimization

English title
A novel ensemble algorithm for biomedical classification based on Ant Colony Optimization
Article code: 7742
Year of publication: 2011
Length of English article: 10 pages (PDF)
Source

Publisher: Elsevier - Science Direct

Journal: Applied Soft Computing, Volume 11, Issue 8, December 2011, Pages 5674–5683

Keywords (translated from the Persian)
Ant colony optimization - Rough set - Ensemble learning - Medical classification

Abstract

One of the major tasks in biomedicine is the classification and prediction of biomedical data. Ensemble learning is an effective way to improve the generalization ability of classifiers and has therefore attracted growing attention in the biomedical community. However, most existing ensemble learning techniques use all of the trained component classifiers to build the ensemble, which can make it unnecessarily large and add memory and computation costs. To improve both the generalization ability and the efficiency of ensembles for biomedical classification, this paper proposes an ensemble approach based on Ant Colony Optimization and rough set theory. The two are combined to select a subset of the trained component classifiers for aggregation. Experimental results show that, compared with existing methods, the approach not only reduces the size of the ensemble but also achieves higher prediction performance.

Introduction

One of the major tasks in biomedicine is the classification and prediction of biomedical data. Such analysis could help elucidate the secrets of life or point to ways of preventing diseases that are currently incurable, such as HIV infection. Although laboratory experiments are the most effective way to investigate the data, they are costly in both money and labor. With the rapid growth of biomedical databases, it is essential to use computational algorithms and tools to automate the classification process. Many machine learning algorithms, such as decision trees, k-nearest neighbor and artificial neural networks, have therefore been widely used for the classification of biomedical data [1]. Ensemble methods are one of the major advances in machine learning in recent years. An ensemble method is a learning algorithm that trains a set of component classifiers and then combines their predictions to classify new examples [2]. As an effective way to improve classification performance, ensemble techniques are well suited to biomedical data and are attracting growing attention in the biomedical community. However, ensemble methods have two important drawbacks. First, they require much more memory to store all the learned models in the ensemble; second, they take much more computation time to produce a prediction for an unlabeled example. Both storage and computation time increase with the number of component classifiers. Most existing ensemble learning techniques use all of the trained component classifiers, producing ensembles that are sometimes unnecessarily large and that incur extra memory and computation costs. These problems frequently limit the application of ensemble methods to the classification of biomedical data.
Rough set theory, introduced by Pawlak in 1982, is a formal mathematical tool for dealing with imprecision, uncertainty and vagueness [3].
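The ensemble idea described above (train several component classifiers, then combine their predictions) can be illustrated with a minimal sketch. It assumes plurality voting as the combination rule, which this preview does not confirm as the paper's choice; the names `majority_vote` and `ensemble_predict` and the three toy threshold classifiers are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine one label per component classifier into a single label.

    Ties are broken in favor of the label that appears first in `predictions`.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def ensemble_predict(classifiers, example):
    """Query every component classifier, then aggregate by plurality vote."""
    return majority_vote([clf(example) for clf in classifiers])


# Three toy "classifiers" (plain functions, purely illustrative) that
# threshold a single numeric feature at different cut-offs.
classifiers = [
    lambda x: "positive" if x > 0.3 else "negative",
    lambda x: "positive" if x > 0.5 else "negative",
    lambda x: "positive" if x > 0.7 else "negative",
]

# Two of the three cut-offs are exceeded, so the vote is "positive".
print(ensemble_predict(classifiers, 0.6))
```

Every prediction calls each stored classifier once, so both memory and prediction time grow with the number of components. This is the cost that motivates selecting a smaller subset.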
As an important feature selection method, rough set theory preserves the meaning of the features. The essence of the rough set approach to feature selection is to find a subset of the original features. However, the number of possible subsets grows quickly with the number of features N: there are 2^N subsets, so examining all of them exhaustively is impractical, and finding the optimal one is an NP-hard problem. It is therefore necessary to investigate fast and effective approximate algorithms. Previous methods used an incremental hill-climbing algorithm to select features, but this often led to a non-minimal feature combination. Ant Colony Optimization (ACO) is a population-based paradigm that can be used to find approximate solutions to difficult optimization problems. The first ACO algorithm was introduced in the early 1990s by Colorni and Dorigo [4], and many variants of the basic principle have since been reported in the literature. ACO is inspired by the social behavior of ant colonies searching for the shortest path to food sources. Although they cannot see, ants are able to find the shortest route between a food source and their nest by means of a chemical substance called pheromone, which they deposit as they move. As an important branch of Swarm Intelligence, a recently developed form of artificial intelligence, ACO has been shown to be an effective tool for finding good solutions. It has an advantage over simulated annealing and genetic algorithm approaches when the graph may change dynamically, because it can run continuously and adapt to changes in real time [5].
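The contrast drawn above between exhaustive search over 2^N subsets and incremental hill-climbing can be made concrete with a small sketch. It assumes the standard rough-set dependency degree (the fraction of rows whose equivalence class carries a single decision label) as the quality measure and a simple greedy forward selection as the hill-climbing scheme; the function names and the six-row toy table are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def dependency(rows, labels, features):
    """Rough-set dependency degree: the fraction of rows whose equivalence
    class (rows with identical values on `features`) has a single label."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[tuple(row[f] for f in features)].append(label)
    consistent = sum(len(g) for g in groups.values() if len(set(g)) == 1)
    return consistent / len(rows)


def greedy_reduct(rows, labels):
    """Hill-climbing: repeatedly add the feature that raises dependency most."""
    all_features = list(range(len(rows[0])))
    target = dependency(rows, labels, all_features)
    chosen = []
    while dependency(rows, labels, chosen) < target:
        best = max((f for f in all_features if f not in chosen),
                   key=lambda f: dependency(rows, labels, chosen + [f]))
        chosen.append(best)
    return chosen


def exhaustive_reduct(rows, labels):
    """Try subsets by increasing size: exact, but up to 2^N candidates."""
    all_features = list(range(len(rows[0])))
    target = dependency(rows, labels, all_features)
    for size in range(len(all_features) + 1):
        for subset in combinations(all_features, size):
            if dependency(rows, labels, list(subset)) == target:
                return list(subset)


# Toy decision table: the label is the XOR of features 1 and 2, while
# feature 0 looks best on its own but is not needed.
rows = [(0, 0, 0), (1, 0, 1), (2, 0, 0), (2, 0, 1), (2, 1, 0), (2, 1, 1)]
labels = [0, 1, 0, 1, 1, 0]

print(greedy_reduct(rows, labels))      # [0, 1, 2]
print(exhaustive_reduct(rows, labels))  # [1, 2]
```

On this table the greedy search commits to feature 0 first and ends with all three features, whereas the exhaustive search finds the two-feature subset. This is the kind of non-minimal result attributed above to hill-climbing.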
The ACO algorithm was first applied to the traveling salesman problem (TSP) [6] and has since been applied successfully to many difficult problems, such as the quadratic assignment problem (QAP), routing in telecommunication networks, graph coloring and feature selection [7]. ACO is particularly attractive for feature selection because no heuristic information can guide the search to the optimal minimal subset every time, whereas ants can discover the best feature combinations as they traverse the graph when the features are represented as a graph. To improve the prediction ability and efficiency of biomedical data classification, this paper proposes an ensemble algorithm based on ACO and rough set theory. ACO and rough set theory are combined to select a subset of all the trained component classifiers for aggregation. Experimental results show that, compared with existing methods, the algorithm not only reduces the size of the ensemble but also achieves higher prediction performance on biomedical data.
The remainder of the paper is organized as follows. Section 2 gives an overview of related work. Section 3 introduces the basic background on ensemble learning, rough sets and ACO for the sake of further discussion. Section 4 introduces the incorporation of ACO with rough sets for feature selection. Section 5 describes the proposed ensemble algorithm in detail. Section 6 discusses the experimental results. Finally, Section 7 presents concluding remarks and directions for future work.
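To give a feel for how pheromone-guided search might select a subset of component classifiers, here is a deliberately simplified sketch. It is not the paper's algorithm, whose details are not given in this preview. It assumes that each classifier's votes on a validation set are treated as a feature column, that rough-set dependency on the true labels is the stopping criterion, and that ants choose the next classifier using pheromone alone. The names `aco_select` and `dependency`, the parameter values and the toy vote table are all hypothetical.

```python
import random
from collections import defaultdict


def dependency(votes, labels, subset):
    """Fraction of examples whose vote pattern, restricted to the classifiers
    in `subset`, is shared only by examples with the same true label."""
    groups = defaultdict(list)
    for row, label in zip(votes, labels):
        groups[tuple(row[j] for j in subset)].append(label)
    consistent = sum(len(g) for g in groups.values() if len(set(g)) == 1)
    return consistent / len(labels)


def aco_select(votes, labels, n_ants=10, n_iters=20, rho=0.2, seed=0):
    """Pheromone-only ACO sketch: each ant adds classifiers until the subset
    is as informative as the full ensemble; the smallest subset is kept."""
    rng = random.Random(seed)
    n = len(votes[0])
    target = dependency(votes, labels, range(n))
    pheromone = [1.0] * n
    best = list(range(n))  # start from the full ensemble
    for _ in range(n_iters):
        for _ in range(n_ants):
            subset = []
            while dependency(votes, labels, subset) < target:
                candidates = [j for j in range(n) if j not in subset]
                weights = [pheromone[j] for j in candidates]
                subset.append(rng.choices(candidates, weights=weights)[0])
            if len(subset) < len(best):
                best = sorted(subset)
        # Evaporate everywhere, then reinforce the best subset found so far.
        pheromone = [(1 - rho) * p for p in pheromone]
        for j in best:
            pheromone[j] += 1.0 / len(best)
    return best


# Toy validation data: one row per example, one column per classifier.
votes = [
    (0, 0, 0, 1),
    (0, 1, 0, 0),
    (1, 1, 0, 1),
    (1, 1, 1, 0),
    (0, 1, 1, 1),
    (1, 0, 0, 0),
]
labels = [0, 0, 1, 1, 1, 0]

selected = aco_select(votes, labels)
print(selected, dependency(votes, labels, selected))
```

Because every ant keeps adding classifiers until the dependency of the full ensemble is reached, the returned subset is always as informative as the whole ensemble under this measure. The sketch gives no guarantee that the subset is minimal.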

Conclusion

With the rapid growth of biomedical databases, classifying the data effectively and efficiently has become critical. Ensemble learning is an active research topic in machine learning and has great potential for application in biomedicine. To improve the efficiency and effectiveness of ensembles for biomedical data, a novel ensemble algorithm based on ACO and rough set theory is proposed. The proposed algorithm is compared with popular methods on four biomedical datasets. The experimental results indicate that it performs much better than artificial neural networks, Bagging, Boosting, kNN and SVM. In future work, we plan to combine the proposed algorithm with feature selection to further improve classification performance.