القای درخت های تصمیم گیری با الگوریتم بهینه سازی کلونی مورچه ها
کد مقاله | سال انتشار | تعداد صفحات مقاله انگلیسی |
---|---|---|
7833 | 2012 | 12 صفحه PDF |
Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)
Journal : Applied Soft Computing, Volume 12, Issue 11, November 2012, Pages 3615–3626
چکیده انگلیسی
Decision trees have been widely used in data mining and machine learning as a comprehensible knowledge representation. While ant colony optimization (ACO) algorithms have been successfully applied to extract classification rules, decision tree induction with ACO algorithms remains an almost unexplored research area. In this paper we propose a novel ACO algorithm to induce decision trees, combining commonly used strategies from both traditional decision tree induction algorithms and ACO. The proposed algorithm is compared against three decision tree induction algorithms, namely C4.5, CART and cACDT, in 22 publicly available data sets. The results show that the predictive accuracy of the proposed algorithm is statistically significantly higher than the accuracy of both C4.5 and CART, which are well-known conventional algorithms for decision tree induction, and the accuracy of the ACO-based cACDT decision tree algorithm.
مقدمه انگلیسی
One of the most studied data mining tasks in the literature is the classification task [15] and [29]. In essence, the classification task consists of learning a predictive relationship between input values and a desired output. Each example (data instance or record) is described by a set of features (attributes)—referred to as predictor attributes—and a class attribute. Given a set of examples, a classification algorithm aims at creating a model, which represents the relationship between predictor attributes values and class values (labels), and which is able to predict the class label of a new (unseen) example based on the values of its predictor attributes. Classification problems can be viewed as optimisation problems, where the goal is to find the best function (model) that represents the predictive relationships in the data. A classification problem can be formally specified as: • Given: {(e1,ce1),…,(en,cen)}{(e1,ce1),…,(en,cen)} pairs representing the training data D, where each ei denotes the set of predictor attributes’ values of the i-th example (1 ≤ i ≤ n, where n is the total number or examples), and each ceicei denotes the class label associated with the i-th example out of m different class labels available in the set C. • Find: a function f : D → C that maps each example ei in D to its correspondent class label ceicei in C. The main goal of a classification algorithm is to build a model that maximises the predictive accuracy—the number of correct predictions—in the test data (unseen during training), although in many application domains the comprehensibility of the model plays an important role [15], [8] and [19]. For instance, in medical diagnosis the classification model should be validated and interpreted by doctors; in credit scoring the classification model should be interpreted by an expert, improving their confidence in the model; in protein function prediction the classification model should be interpreted to provide useful insights about the correlation of protein features and their functions and ultimately improve the current biological knowledge about protein functions. In these domains, it is crucial to produce comprehensible classification models. Ant colony optimization (ACO) [11], [12] and [13] algorithms involve a colony of ants (agents), which despite the relative simplicity of their individuals’ behaviours, cooperate with one another to achieve an unified intelligent behaviour. As a result, the colony produces a system capable of performing a robust search to find high-quality solutions for optimisation problems with a large search space. In the context of the classification task in data mining, ACO algorithms have the advantage of performing a flexible robust search for a good combination of predictor attributes, less likely to be affected by the problem of attribute interaction [10] and [17]. Decision trees are widely used as a comprehensible representation model, given that they can be easily represented in a graphical form and also be represented as a set of classification rules, which generally can be expressed in natural language in the form of IF-THEN rules. Most ACO algorithms for classification in data mining have focused on extracting classification rules [22] and [18]. In this paper, we propose a novel ACO algorithm for the induction of decision trees. The proposed algorithm—called Ant-Tree-Miner (ant colony optimization-based decision tree induction)—is compared against two well-known decision tree induction algorithms, namely C4.5 [31] and CART [6], and the ACO-based cACDT algorithm [5] in 22 publicly available data sets in terms of both predictive accuracy and size of the induced decision trees. The remainder of this paper is organised as follows. Section 2 presents the background of this paper, discussing the top-down strategy commonly used to induce decision trees, an overview of ant colony optimization (ACO) and the related work on ACO algorithms for induction of tree structures. The proposed algorithm is described in Section 3. The computational results are presented in Section 4. Finally, Section 5 concludes this paper and presents future research directions.
نتیجه گیری انگلیسی
This paper has proposed a novel ant colony algorithm, called Ant-Tree-Miner, for the induction of decision trees in the context of the classification task in data mining. When used as a classification model, decision trees have the advantage of representing a comprehensible model and being easily expressed in a graphical form or in a set of classification rules. The Ant-Tree-Miner algorithm follows a top-down approach by probabilistically selecting attributes to be added as decision nodes based on the amount of pheromone and heuristic information. The pheromone matrix representation takes into account the level in the decision tree that a particular branch—(attribute, value) pair—appears to discriminate between multiples occurrences of the same branch. Four variations of the Ant-Tree-Miner algorithm were compared against two well-known decision tree induction algorithms, namely C4.5 (Weka's J48 algorithm) and CART (Weka's SimpleCART algorithm), and the ACO-based cACDT decision tree algorithm in 22 publicly available data sets in terms of both predictive accuracy and number of leaf nodes of the discovered decision trees. The results have shown the Ant-Tree-Miner algorithm outperforms C4.5, CART and cACDT in terms of predictive accuracy, achieving a statistically significantly better average rank across all data sets. It also presents a good trade-off between predictive accuracy and size of the classification model (number of leaf nodes in the discovered trees): Ant-Tree-Miner achieved the best rank in predictive accuracy and second best rank in number of leaf nodes; C4.5 achieved the fourth best rank in predictive accuracy and sixth rank in number of leaf nodes; CART achieved the sixth rank in predictive accuracy and first rank in number of leaf nodes; cACDT achieved the seventh rank in predictive accuracy and the third rank in number of leaf nodes. The evaluation of different variations of the Ant-Tree-Miner algorithm have shown that the binary (single interval) discretisation in combination with the two-step (classification accuracy and C4.5's error-based) pruner provides the best performance in terms of both predictive accuracy and size of the classification model. There are several potential avenues for future research. It would be interesting to investigate different heuristic information measures, as well as different discretisation procedures, which can improve the selection of attributes in decision nodes and consequently the structure of the decision trees. The use of vertices to represent leaf nodes in the construction graph, allowing the ants to choose when to add a leaf node, could potentially avoid the need for a pruning procedure. This would not only simplify the algorithm, but also provide an elegant solution for the problem of overfitting the training data during the construction of a candidate decision tree.