روش برنامه ریزی پویا برای بهینه سازی قواعد تصمیم گیری تقریبی
|کد مقاله||سال انتشار||مقاله انگلیسی||ترجمه فارسی||تعداد کلمات|
|25877||2013||16 صفحه PDF||سفارش دهید||10149 کلمه|
Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)
Journal : Information Sciences, Volume 221, 1 February 2013, Pages 403–418
This paper is devoted to the study of an extension of dynamic programming approach which allows sequential optimization of approximate decision rules relative to the length and coverage. We introduce an uncertainty measure R(T) which is the number of unordered pairs of rows with different decisions in the decision table T. For a nonnegative real number β, we consider β-decision rules that localize rows in subtables of T with uncertainty at most β. Our algorithm constructs a directed acyclic graph Δβ(T) which nodes are subtables of the decision table T given by systems of equations of the kind “attribute = value”. This algorithm finishes the partitioning of a subtable when its uncertainty is at most β. The graph Δβ(T) allows us to describe the whole set of so-called irredundant β-decision rules. We can describe all irredundant β-decision rules with minimum length, and after that among these rules describe all rules with maximum coverage. We can also change the order of optimization. The consideration of irredundant rules only does not change the results of optimization. This paper contains also results of experiments with decision tables from UCI Machine Learning Repository.
Decision rules are widely used as parts of algorithms (parallel or nondeterministic), as a way for knowledge representation, and also as parts of classifiers (predictors). Exact decision rules can be overfitted, i.e., depend essentially on the noise or adjusted too much to the existing examples. Approximate rules are less dependent on the noise. They have usually smaller number of attributes and are better from the point of view of understanding. Classifiers based on approximate decision rules have often better accuracy than the classifiers based on exact decision rules. Therefore approximate decision rules and closely connected with them approximate reducts are studied intensively last years by Nguyen et al. , , , , , , , , , , , , ,  and . There are different approaches to the construction of decision rules and reducts: brute-force approach which is applicable to tables with relatively small number of attributes, genetic algorithms ,  and , Apriori algorithm , simulated annealing , Boolean reasoning ,  and , separate-and-conquer approach (algorithms based on a sequential covering procedure) , , ,  and , ant colony optimization ,  and , algorithms based on decision tree construction ,  and , different kinds of greedy algorithms  and . Each method can have different modifications. For example, as in the case of decision trees, we can use greedy algorithms based on different uncertainty measures (Gini index, entropy, etc.) for construction of decision rules. We are interested in the construction of short rules which cover many objects. In particular, the choice of short rules is connected with the Minimum Description Length principle . The rule coverage is important to discover major patterns in the data. Unfortunately, the problems of minimization of length and maximization of coverage of decision rules are NP-hard. Using results of Feige  it is possible to prove that under reasonable assumptions on the class NP there are no approximate algorithms with high accuracy and polynomial complexity for minimization of decision rule length. The most part of approaches mentioned above (with the exception of brute-force, Apriori, and Boolean reasoning) cannot guarantee the construction of shortest rules or rules with maximum coverage. To work with approximate decision rules, we introduce an uncertainty measure R(T) that is the number of unordered pairs of rows with different decisions in the decision table T. Then we consider β-decision rules that localize rows in subtables of T with uncertainty at most β. To construct optimal β-decision rules we use approach based on an extension of dynamic programming. We construct a directed acyclic graph Δβ(T) which nodes are subtables of the decision table T given by systems of equations of the kind “attribute = value”. We finish the partitioning of a subtable when its uncertainty is at most β. The parameter β helps us to control computational complexity and makes the algorithm applicable to solving more complex problems. The constructed graph allows us to describe the whole set of so-called irredundant β-decision rules. Then we can find (in fact, describe) all irredundant β-decision rules with minimum length, and after that among these rules find all rules with maximum coverage. We can change the order of optimization: find all irredundant β-decision rules with maximum coverage, and after that find among such rules all irredundant β-decision rules with minimum length. We prove that by removal of some conditions from the left-hand side of each β-decision rule that is not irredundant and by changing the decision on the right-hand side of this rule we can obtain an irredundant β-decision rule which length is at most the length of initial rule and the coverage is at least the coverage of initial rule. It means that we work not only with optimal rules among irredundant β-decision rules but also with optimal ones among all β-decision rules. Similar approach to decision tree optimization was considered in ,  and . First results for decision rules based on dynamic programming methods were obtained in . The aim of this study was to find one decision rule with minimum length for each row. In this paper, we consider algorithms for optimization of β-decision rules relative to the length and coverage, and results of experiments with some decision tables from UCI Machine Learning Repository  based on Dagger software system created in KAUST. This paper consists of nine sections. Section 2 contains main notions. Section 3 is devoted to the consideration of irredundant β-decision rules. In Section 4, we study directed acyclic graph Δβ(T) which allows to describe the whole set of irredundant β-decision rules. In Section 5, we consider a procedure of optimization of this graph (in fact, corresponding β-decision rules) relative to the length, and in Section 6 – relative to the coverage. In Section 7, we discuss possibilities of sequential optimization of rules relative to the length and coverage. Section 8 contains results of experiments with decision tables from UCI Machine Learning Repository, and Section 9 – conclusions.
نتیجه گیری انگلیسی
We studied an extension of dynamic programming approach to the optimization of β-decision rules relative to the length and coverage. This extension allows us to describe the whole set of irredundant β-decision rules and optimize these rules sequentially relative to the length and coverage or relative to the coverage and length. We discussed the notion of totally optimal β-decision rule. We considered also results of experiments with decision tables from UCI Machine Learning Repository . We can optimize rules also relative to the number of misclassifications. In this case, is possible to make sequential optimization relative to arbitrary subset and order of cost functions length l, coverage c, number of misclassifications m: (l, c), (c, l), (l, m), (m, l), (m, c), (c, m), (l, c, m), (l, m, c), (c, l, m), (c, m, l), (m, l, c), (m, c, l). Optimizations of irredundant β-decision rules relative to the length and coverage and construction of totally optimal rules can be considered as tools which support design of classifiers. To predict the value of decision attribute for a new object we can use in a classifier only totally optimal rules or rules with maximum coverage, etc. Moreover, it is known that, very often, classifiers based on approximate decision rules have better accuracy than classifiers based on exact decision rules. Short rules which cover many objects can be useful also in knowledge discovery to represent knowledge extracted from decision tables. In this case, rules with smaller number of descriptors are more understandable. Future investigations will be connected with the study of other cost functions, uncertainty measures and construction of classifiers.