Applying Ant Colony Optimization to configuring stacking ensembles for data mining
Article code | Publication year | English article length |
---|---|---|
22310 | 2014 | 15 pages (PDF) |
Publisher : Elsevier - Science Direct
Journal : Expert Systems with Applications, Volume 41, Issue 6, May 2014, Pages 2688–2702
English abstract
An ensemble is a collective decision-making system that applies a strategy to combine the predictions of learned classifiers and generate its prediction for new instances. Early research has shown, both empirically and theoretically, that ensemble classifiers can in most cases be more accurate than any single component classifier. Although many ensemble approaches have been proposed, it is still not an easy task to find a suitable ensemble configuration for a specific dataset. In some early works, the ensemble is selected manually according to the experience of specialists. Metaheuristic methods can be alternative solutions for finding such configurations, and Ant Colony Optimization (ACO) is one popular metaheuristic. In this work, we propose a new ensemble construction method that applies ACO to the stacking ensemble construction process to generate domain-specific configurations. A number of experiments are performed to compare the proposed approach with some well-known ensemble methods on 18 benchmark data mining datasets. The approach is also applied to learning ensembles for a real-world cost-sensitive data mining problem. The experimental results show that the new approach can generate better stacking ensembles.
English introduction
Over years of development, it has become more and more difficult to significantly improve the performance of a single classifier. Recently, there has been growing research interest in methods that combine different classifiers to achieve better performance. Such a combining method is referred to as an ensemble. In early research, ensembles were shown empirically and theoretically to perform more accurately than any single component classifier in most cases. If an ensemble is generated from a set of classifiers trained with the same learning algorithm, it is a homogeneous ensemble; if it is generated from a set of classifiers trained with different learning algorithms, it is a heterogeneous ensemble (Dietterich, 2000). For example, Bagging (Breiman, 1996) and Boosting (Schapire, 1990) are homogeneous ensembles, while stacking (Wolpert, 1992) is a heterogeneous ensemble.

To generate an ensemble that achieves the expected results, two things should be considered carefully. The first is to introduce enough diversity into the components of the ensemble. The second is to choose a suitable combining method to merge the diverse outputs into a single output (Polikar, 2006). Diversity is the foundation of an ensemble. However, as diversity increases, its marginal effect decreases after a certain threshold: the memory and computing costs grow significantly while the performance no longer improves steadily.

For the early Bagging and Boosting methods, diversity is achieved through a re-sampling strategy. The classifiers included in Bagging are trained on data subsets that are randomly sampled from the original dataset, and a majority voting scheme is applied as the combining method to make a collective decision. Boosting uses a weighted re-sampling strategy: the weights of all instances are initialized equally, and if an instance is misclassified, its weight is increased, so misclassified instances are more likely to be selected into the next training subset. The diversity-generating process stops when the errors become too small. The combining scheme of Boosting is weighted majority voting.

Compared to Bagging and Boosting, stacking does not manipulate the training dataset directly. Instead, the ensemble of classifiers is generated on two levels. At the base level, multiple classifiers are trained with different learning algorithms; diversity is introduced because different learning algorithms make different errors on the same dataset. A meta-classifier is then applied to generate the final prediction. The meta-classifier is trained with a learning algorithm on a meta-dataset that combines the outputs of the base-level classifiers with the real class label.
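To make the two-level structure of stacking concrete, the following minimal sketch builds a small heterogeneous stacking ensemble with scikit-learn. The choice of base learners, meta-classifier, and dataset here is only illustrative, not a configuration taken from the paper.

```python
# Minimal stacking sketch (illustrative configuration, not the paper's):
# base-level classifiers are trained on the original features, and a
# meta-classifier is trained on their (cross-validated) predictions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Base level: different learning algorithms supply the diversity.
base_level = [
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("nb", GaussianNB()),
    ("knn", KNeighborsClassifier()),
]

# Meta level: a classifier trained on the base-level outputs.
stack = StackingClassifier(estimators=base_level,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)
stack.fit(X_train, y_train)
print("stacking accuracy:", stack.score(X_test, y_test))
```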
One problem of stacking is how to obtain an "appropriate" configuration of the base-level classifiers and the meta-classifier for each domain-specific dataset. The number of base-level classifiers and the kinds of learning algorithms used are closely related to the diversity, and the kind of meta-classifier is also important to the fusion of the base-level classifiers. However, finding such a configuration is still a "black art" (Wolpert, 1992). Some researchers have proposed methods to determine the configuration of stacking. Ting and Witten addressed two issues: the type of the meta-classifier and the kinds of its input attributes (Ting & Witten, 1999). Džeroski and Ženko introduced Multi-Response Model Trees as the meta-classifier (Džeroski & Ženko, 2002). Zheng and Padmanabhan (2007) and Zhu (2010) proposed Data Envelopment Analysis (DEA) based approaches. Ledezma et al. (2002) and Ordóñez et al. (2008) proposed approaches that search the ensemble configuration using Genetic Algorithms (GAs).

In this work, we propose an approach that uses Ant Colony Optimization (ACO) to optimize the stacking configuration (a rough sketch of this idea is given at the end of this section). ACO is a metaheuristic algorithm inspired by the foraging behaviour of real ant colonies. Several approaches have recently applied ACO to data mining: Parpinelli et al. proposed Ant-Miner to extract classification rules (Parpinelli, Lopes, & Freitas, 2002), and other approaches apply ACO to feature subset selection tasks (Al-Ani, 2006 and Zhang et al., 2010).

The rest of this paper is organized as follows. Section 2 introduces the background of this work, including the related ensemble approaches and the Ant Colony Optimization method. Section 3 presents the details of our approach. Section 4 describes the experiments conducted to compare our approach with other ensemble methods and discusses the results. Section 5 applies our approach to a real-world data mining problem. The last section concludes.
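The sketch below gives a rough idea of how an ACO-style search over stacking configurations can work: each ant assembles a candidate set of base-level classifiers guided by pheromone trails, the candidates are evaluated, and the trails of the best configuration are reinforced. This is a simplified, generic ant-colony loop, not the authors' ACO-Stacking algorithm; the pheromone update rule, selection probabilities, and fitness function are assumptions made for illustration.

```python
# Simplified ACO-style search over stacking configurations (a sketch,
# not the paper's ACO-Stacking algorithm).
import random

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Candidate base-level classifiers (the components an ant can pick).
candidates = [
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("nb", GaussianNB()),
    ("knn", KNeighborsClassifier()),
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
]

def fitness(selection):
    """Cross-validated accuracy of the stacking defined by `selection`."""
    estimators = [candidates[i] for i in selection]
    stack = StackingClassifier(estimators=estimators,
                               final_estimator=LogisticRegression(max_iter=1000),
                               cv=3)
    return cross_val_score(stack, X, y, cv=3).mean()

pheromone = [1.0] * len(candidates)        # one trail per candidate classifier
evaporation, n_ants, n_iterations = 0.1, 3, 3
best_selection, best_score = None, -1.0

random.seed(0)
for _ in range(n_iterations):
    for _ in range(n_ants):
        # Include each candidate with a pheromone-dependent probability
        # p / (1 + p); ensure at least two base-level classifiers.
        selection = [i for i, p in enumerate(pheromone)
                     if random.random() < p / (1.0 + p)]
        if len(selection) < 2:
            selection = random.sample(range(len(candidates)), 2)
        score = fitness(selection)
        if score > best_score:
            best_selection, best_score = selection, score
    # Evaporate all trails, then reinforce the best configuration found.
    pheromone = [(1 - evaporation) * p for p in pheromone]
    for i in best_selection:
        pheromone[i] += best_score

print("best base-level classifiers:",
      [candidates[i][0] for i in best_selection],
      "score:", round(best_score, 4))
```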
English conclusion
6.1. Findings

In this work, a comprehensive study is conducted to optimize the performance of ACO-Stacking. In the study, we develop different versions of the ACO-Stacking approach by considering different ideas, such as the adoption of local information. In the first version (ACO-S1), no local information is used and only one learning algorithm (a C4.5 decision tree) is used to create the meta-classifier; we focus on the effect of ACO in guiding the search and the combination of base-level classifiers. This version aims to find as many combinations of base-level classifiers as possible with the same meta-classifier.

The second version (ACO-S2) is quite different from the first. We introduce the concept of a classifier pool: the base-level classifiers are all trained prior to the stacking search process, instead of being trained when they are selected by a stacking. The classifier pool may improve the efficiency of the training process (Ordóñez et al., 2008). The second difference is that we extend the search space of the stacking by introducing a set of meta-classifiers. Local information is also introduced in this version to accelerate convergence towards the optimal solution. The third version (ACO-S3) is similar to the second; the major change is that the correlative differences of the base-level classifiers are used as the local information.

From the comparison between ACO-Stacking and other ensemble methods, including AdaBoost, Bagging, Random Forest, StackingC and GA-Ensemble, on the 18 benchmark datasets, it can be observed that ACO-Stacking performs better than the other ensemble methods on many datasets. Using Holm's procedure (Holm, 1979), ACO-S3 outperforms Bagging, Random Forest and AdaBoost at the 5% level of significance, and outperforms GA-Ensemble at the 10% level of significance. Comparing the three versions of ACO-Stacking, ACO-S3 beats ACO-S1 on 11 benchmark datasets and beats ACO-S2 on 12 benchmark datasets, while ACO-S1 beats ACO-S2 on 11 benchmark datasets. We found that, without local information integrated into ACO-Stacking, the pure ACO-Stacking approach is more stochastic than the other versions in generating ensembles. The pool of base-level classifiers is expected to provide better results. However, since ACO-S2 uses precision as the local information, base-level classifiers with higher precision may have similar decision boundaries on certain difficult problems, while some base-level classifiers with lower precision may have better decision boundaries. If such situations occur frequently in the search process, the performance of ACO-S2 suffers, which explains why ACO-S2 is significantly outperformed by ACO-S3 on six datasets at the 5% level. ACO-S3 uses the correlative differences of the base-level classifiers to overcome this problem and obtain a more diverse combination of base-level classifiers.

For the real-world cost-sensitive data mining application in direct marketing, the proposed approach is able to generate good cost-sensitive ensembles, as it significantly outperforms most of the other methods, including Logistic Regression, Bayesian Networks, Bagging, AdaCost and AdaC2, in both cumulative response lifts and cumulative profit lifts.
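The exact formula for the correlative differences is not reproduced in this excerpt, so the snippet below only illustrates one plausible reading of such a diversity-oriented local heuristic: a pairwise disagreement rate between the predictions of two base-level classifiers, where higher values indicate less correlated (more diverse) classifiers. The function and its definition are assumptions for illustration, not the paper's measure.

```python
import numpy as np

def disagreement_rate(preds_a, preds_b):
    """Fraction of instances on which two classifiers predict differently.

    One plausible reading of 'correlative differences': higher values mean
    more diverse (less correlated) base-level classifiers.  This exact
    definition is an assumption, not taken from the paper.
    """
    preds_a, preds_b = np.asarray(preds_a), np.asarray(preds_b)
    return float(np.mean(preds_a != preds_b))

# Toy example: two classifiers' predictions on six validation instances.
p1 = [0, 1, 1, 0, 1, 0]
p2 = [0, 1, 0, 0, 1, 1]
print(disagreement_rate(p1, p2))  # 2 of 6 instances differ -> ~0.333
```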
6.2. Contributions and implications

The contributions of this work are threefold. Firstly, this is the first work to apply Ant Colony Optimization to the stacking configuration problem. Stacking is a well-known ensemble; however, how to configure an optimal stacking for a specific dataset is still regarded as a "black art". Furthermore, although Ant Colony Optimization performs well in many applications, it had not previously been applied to stacking configuration problems. In this study, ACO is integrated into the stacking configuration search process for the first time.

Secondly, we incorporate local information into the ACO-Stacking process. Several kinds of local information are studied to improve the performance of ACO. The correlative differences, which represent the variations between the predictions of different classifiers, are adopted in our latest version of ACO-Stacking.

Thirdly, the approach can be applied to different data mining problems, including real-world direct marketing problems. Direct marketing data is often very unbalanced and cost-sensitive, which makes such problems hard to solve with regular data mining models. ACO-Stacking is modified with cost-sensitive measures to tackle this problem. It is important to emphasize that the learning algorithms used to generate the base-level classifiers and the meta-level classifier do not themselves need to be cost-sensitive: by using our ACO-Stacking method, non-cost-sensitive learning algorithms can be employed to handle cost-sensitive data mining problems. In comparison with other approaches, our approach performs better, achieving a higher cumulative response rate and greater profits on this dataset (a minimal sketch of the cumulative response lift measure is given at the end of this section).

6.3. Future work

In this work, we limit our ACO-Stacking approach to a single performance evaluation criterion for each application; for example, only accuracy is used on the benchmark datasets and only the cumulative profit lift is used for the direct marketing application. ACO has been shown to be strong on multi-criteria optimization problems, so one possible future direction is to extend ACO-Stacking to find multi-criteria ensembles. Furthermore, only two measures of local information (precision and correlative differences) are applied in the approach; many other criteria could be employed as local information, so the best metric for local information can be explored further.

A relatively short execution time is essential for a practical application. Another future direction is therefore to modify ACO-Stacking to run in parallel to improve its efficiency; much research has been done on parallelizing ACO on Graphics Processing Units to accelerate execution without requiring many additional resources. Finally, ensembles are not limited to ensembles of classifiers: they are now widely used in clustering and regression tasks (Zhou et al., 2001 and Fern and Brodley, 2003). In the future, we may apply our ACO-Stacking approach to clustering and regression tasks.
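As a reference for the evaluation measure used in the direct marketing study, the sketch below shows one common way to compute a cumulative response lift: customers are ranked by predicted score, and the response rate within the top fraction is divided by the overall response rate. The paper's exact formulation may differ, so treat this as a generic illustration.

```python
# Generic sketch of a cumulative response lift at a given depth
# (e.g. the top 10% of customers ranked by predicted score).
# This is a common definition; the paper's exact formulation may differ.
import numpy as np

def cumulative_response_lift(y_true, scores, depth=0.1):
    """Response rate in the top `depth` fraction of ranked customers,
    divided by the overall response rate."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    order = np.argsort(-scores)                 # best scores first
    n_top = max(1, int(round(depth * len(y_true))))
    top_rate = y_true[order[:n_top]].mean()     # responders in the top segment
    overall_rate = y_true.mean()                # responders overall
    return top_rate / overall_rate

# Toy example: 10 customers, 3 responders overall, 2 of them in the top 30%.
y = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
p = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(cumulative_response_lift(y, p, depth=0.3))  # (2/3) / (3/10) = 2.22...
```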