Domains of competence of the semi-naive Bayesian network classifiers
| Article code | Publication year | English article pages |
|---|---|---|
| 29295 | 2014 | 29 pages (PDF) |
Publisher: Elsevier - Science Direct
Journal: Information Sciences, Volume 260, 1 March 2014, Pages 120–148
Abstract
The motivation for this paper comes from observing the recent tendency to assert that, rather than a unique and globally superior classifier, there exist local winners. Hence, the proposal of new classifiers can be seen as an attempt to cover new areas of the complexity space of datasets, or even to compete with those previously assigned to others. Several complexity measures for supervised classification have been designed to define these areas. In this paper, we want to discover which types of datasets, defined by certain ranges of values of the complexity measures for supervised classification, suit some of the best-known semi-naive Bayesian network classifiers. This study is carried out on continuous and discrete domains for naive Bayes and Averaged One-Dependence Estimators (AODE), which are two widely used incremental classifiers that provide some of the best trade-offs between error performance and efficiency. Furthermore, an automatic procedure to advise on the best semi-naive BNC to use for classification, based on the values of certain complexity measures, is proposed.
Introduction
The use of complexity measures (CMs) for supervised classification has received increasing attention since their formal definition in Ho and Basu [28]. These measures were designed to determine the intrinsic characteristics of real-world classification problems. They aim at characterising the complexity of a classification problem through a set of geometrical descriptors, such as the degree of linear separability amongst classes, Fisher's discriminant ratio or the width of the class boundary. Hence, many subsequent studies have applied these CMs to find the domains of competence of different classifiers [3], [4], [56], [38], [39] and [41]. Other works have attempted to generalise these measures to problems with multiple classes [47] and [49], as the original definition only covers binary-class problems, although there is still no agreement on the set of measures to use in the multi-class setting.

There is no work, however, relating to semi-naive Bayesian network classifiers (BNCs), or to discrete domains, since these CMs have traditionally been applied to numeric domains. The simplest of the BNCs is naive Bayes (NB), which assumes that all the attributes are independent given the class. In spite of this naive assumption, it performs surprisingly well in certain domains [13]. Hence, numerous techniques have been proposed that aim to improve the accuracy of NB by alleviating the attribute interdependence problem. We refer to them as semi-naive Bayesian network classifiers, a term first introduced by Kononenko [34]. It is now used to group together the BNCs that extend the naive Bayes graphical structure by adding extra arcs, with the goal of alleviating naive Bayes' independence assumption in a computationally efficient manner. Hence, they either do not perform a structural search or, if they do, it is very simple (usually quadratic in the number of attributes in the worst case).

The family of semi-naive Bayesian network classifiers is extensively used for machine learning and data mining in a variety of scientific applications, especially the Averaged One-Dependence Estimator (AODE) [69]. These two classifiers, namely NB and AODE, have already been endorsed by the research community and are considered good representatives of the family of semi-naive BNCs. Surprisingly, their domains of competence have remained unexplored until this paper. We focus on this family of classifiers also because they are able to deal naturally with uncertainty under similar complexity demands: they provide relatively good error rates with low computational requirements, and they can learn incrementally, an important property for learning from big data as well.

In the literature we can find studies based on data complexity measures for classifiers belonging to different paradigms, e.g. decision trees, fuzzy classifiers, probabilistic classifiers, support vector machines, neural networks and nearest-neighbour classification [3], [4], [56], [38], [39], [41] and [54]. In this type of study, classifiers of (very) different complexity are analysed, with NB usually taken as the representative of probabilistic reasoning. The training/classification time and space these classifiers require vary substantially with the number of attributes, instances, classes and values per attribute.
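To make the contrast between NB and AODE concrete, the standard textbook decision rules for both classifiers over attributes $x_1,\dots,x_n$ are sketched below. These are the usual formulations from the literature, not excerpts from the paper; $m$ denotes AODE's minimum frequency threshold for a super-parent value.

```latex
% Naive Bayes: all attributes assumed independent given the class c
\hat{c}_{\mathrm{NB}} = \arg\max_{c}\; P(c) \prod_{i=1}^{n} P(x_i \mid c)

% AODE: average of one-dependence estimators, each using one attribute
% value x_j as a "super-parent" (only values seen at least m times)
\hat{c}_{\mathrm{AODE}} = \arg\max_{c} \sum_{j:\,\mathrm{count}(x_j)\ge m}
    P(c, x_j) \prod_{i=1}^{n} P(x_i \mid c, x_j)
```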
In this paper we focus on the probabilistic setting, and so we assume that the user wants to use a probabilistic classifier; this is very common when one has to deal with uncertain relations between the variables and/or a probabilistic outcome is desired besides the predicted class label. Furthermore, we want to compare the domains of competence of classifiers of similar complexity, and for this reason we set the focus on the semi-naive BNC family, where little or no effort is devoted to structural learning, leaving out more complex BNCs such as those based on directly learning a Bayesian network. Hence, the two major contributions of this paper are (1) a description of the common characteristics of the ideal datasets for this family of classifiers, and (2) a recommendation mechanism to select the best of these classifiers for a given dataset.

Thus, our main objective in this paper is to explore the behaviour of some of the semi-naive BNCs according to the values of the CMs in the literature for a particular group of datasets, where behaviour refers to predictive power in terms of the accuracy obtained. Since the natural domain of BNCs comprises exclusively discrete attributes, we also want to analyse the descriptive power of the CMs on discrete domains. Our study begins with the definition of the domains of competence for NB and AODE, according to the values of a selected group of CMs. We select these two BNCs because NB represents the baseline and AODE is (perhaps) the most outstanding semi-naive BNC; neither requires structural learning and, in both cases, the learning algorithm has no tunable parameters, which allows for a cleaner and unbiased study. To do so, we obtain rules which describe both good and bad behaviour for these two classifiers. These rules take the values of data complexity metrics in their antecedents, so the behaviour of the classifiers is predicted from the values of several CMs of a particular dataset prior to their application; a sketch of this idea follows this overview. This study is carried out on both discrete and numeric domains; we analyse the global tendency followed by the values of these CMs when they are calculated on the discretised version of numeric datasets. Furthermore, we propose an automatic procedure to advise on the best semi-naive BNC to be used for prediction on a particular dataset. For this purpose we consider NB, AODE, the Hidden One-Dependence Estimator (HODE), Tree-Augmented NB (TAN) and the k-dependence Bayesian classifier (KDB).

The rest of the paper is organised as follows. Section 2 contains four subsections devoted to preliminaries and previous work: Section 2.1 describes the semi-naive BNCs considered; Sections 2.2 and 2.3 give a short background on data complexity and describe the different complexity measures, respectively; and Section 2.4 briefly describes an algorithm to extract ranges of good and bad behaviour in a set of datasets given a particular complexity measure. Section 3 provides the first case study, in which we characterise NB and AODE (in discrete and continuous domains) based on several CMs. In Section 4 (case study II), a meta-classifier to predict the best semi-naive BNC, based on some of these CMs, is proposed and empirically tested. Section 5 presents the main conclusions and outlines future work. Finally, Appendix A includes several graphs showing bivariate relationships between some of the CMs on NB.
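As an illustration of how a complexity measure can feed such rules, the following Python sketch computes Fisher's discriminant ratio (F1, one of the CMs defined in Ho and Basu [28]) for a binary dataset and applies a toy threshold rule. The function name, the synthetic data and the threshold `theta` are hypothetical choices for illustration; they are not the rule-extraction method used in the paper.

```python
import numpy as np

def fisher_discriminant_ratio(X, y):
    """Fisher's discriminant ratio (F1) for a binary-class dataset.

    For each attribute, compute (mu1 - mu2)^2 / (var1 + var2); F1 is the
    maximum of these ratios over all attributes, following Ho and Basu's
    definition for two-class problems.
    """
    classes = np.unique(y)
    assert len(classes) == 2, "the original F1 is defined for binary problems"
    X1, X2 = X[y == classes[0]], X[y == classes[1]]
    num = (X1.mean(axis=0) - X2.mean(axis=0)) ** 2
    den = X1.var(axis=0) + X2.var(axis=0)
    ratios = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return float(ratios.max())

# Toy rule with a CM value in its antecedent, in the spirit of
# "if F1 >= theta then good behaviour is expected for NB".
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 3)),   # class 0
               rng.normal(2.0, 1.0, (50, 3))])  # class 1: shifted means
y = np.array([0] * 50 + [1] * 50)
theta = 0.5  # hypothetical threshold; the paper learns such ranges from data
f1 = fisher_discriminant_ratio(X, y)
print(f"F1 = {f1:.2f} -> {'good' if f1 >= theta else 'bad'} behaviour expected")
```

The maximum over attributes reflects that a single highly discriminative attribute is enough to make a problem easy for a linear-style decision.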
Conclusion
This paper makes the following contributions: (1) it defines the common characteristics of the ideal datasets to be classified with the family of the most popular semi-naive BNCs, and (2) it provides an automatic method to find the best BNC (in terms of prediction error) for a given dataset. Motivated by the increasing evidence that an "all-powerful" classifier does not exist, we define the characteristics of the datasets for which these semi-naive BNCs provide accurate results. For this purpose, we have used several recently proposed complexity measures. These measures have already shown their power to characterise classifiers of a different nature, although mainly on continuous datasets.

We have characterised NB and AODE on discrete and continuous datasets and, contrary to our initial assumption, it has been easier to do so on discrete domains. The most important result is that it is possible to characterise both NB and AODE in both domains and to obtain disjoint rules that predict whether the classifier will perform well or poorly, depending on the values of some of the complexity measures. Finally, an automatic procedure to advise on the best semi-naive BNC to use for classification is proposed. This implies that the machine learning community could predict, with an estimated error of 13%, the best possible semi-naive BNC by training a multi-label classifier.

The study carried out in this paper can easily be extended in several directions. Firstly, the test bed of considered datasets can be enlarged by incorporating the datasets from the Landscape contest, which was created to cover a wider range of the complexity measurement space. Secondly, in our work the selection of measures (out of the 14 measures considered initially) has been made using empirical criteria, mainly based on the patterns obtained by plotting the complexity measures. This might not be the best method, so a more theoretical way to establish the reliability of a measure for characterising a particular classifier remains to be explored. Thirdly, Appendix A includes a preliminary study of the relationships between pairs of measures to characterise the accuracy results for NB; it remains for future work to extend this promising study to other classifiers and, probably, to higher-dimensionality relations. Fourthly, we propose to consider the bias/variance decomposition [33] in future studies: even though accuracy provides a good general view of performance, using the bias and variance components in isolation could provide more knowledge about the good or bad performance of the different classifiers on several datasets.
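For intuition on that last point, the familiar squared-loss form of the decomposition is sketched below; the classification setting cited as [33] relies on an analogous decomposition of the 0-1 loss, so this is an intuition-level illustration rather than the decomposition the paper would apply.

```latex
% Expected squared error of an estimator \hat{f} of f at a point x,
% averaged over training sets, with irreducible noise variance \sigma^2
\mathbb{E}\left[(\hat{f}(x) - y)^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \sigma^2
```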