Stochastic margin-based structure learning of Bayesian network classifiers
| Article code | Publication year | English article | Persian translation | Word count |
|---|---|---|---|---|
| 29194 | 2013 | 8-page PDF | available on order | 5,920 words |
Publisher : Elsevier - Science Direct
Journal : Pattern Recognition, Volume 46, Issue 2, February 2013, Pages 464–471
The margin criterion for parameter learning in graphical models has gained significant traction in recent years. We use the maximum margin score to discriminatively optimize the structure of Bayesian network classifiers. Greedy hill-climbing and simulated annealing search heuristics are applied to determine the classifier structures. In the experiments, we demonstrate the advantages of maximum-margin-optimized Bayesian network structures in terms of classification performance compared to traditionally used discriminative structure learning methods. Stochastic simulated annealing requires fewer score evaluations than greedy heuristics. Additionally, we compare generative and discriminative parameter learning on both generatively and discriminatively structured Bayesian network classifiers. Margin-optimized Bayesian network classifiers achieve classification performance similar to support vector machines. Moreover, missing feature values during classification can be handled by discriminatively optimized Bayesian network classifiers, a case where purely discriminative classifiers usually require mechanisms to complete unknown feature values in the data first.

Highlights

- Stochastic margin-based structure learning of Bayesian network classifiers (BNC).
- We introduce the maximum margin criterion for discriminative structure learning of BNC.
- We empirically show its advantages using deterministic/stochastic search heuristics.
- We compare generative/discriminative parameter learning on generatively/discriminatively structured BNC.
- Missing features can be handled by discriminatively optimized BNC.
Generative probabilistic classifiers optimize the joint probability distribution of the features X and the corresponding class labels C using maximum likelihood (ML) estimation. The class label is usually predicted using the maximum a posteriori estimate of the class posteriors P(C|X) obtained by applying Bayes' rule. Discriminative probabilistic classifiers such as logistic regression model P(C|X) directly. Discriminative classifiers may lead to better classification performance, particularly when the class-conditional distributions poorly approximate the true distribution. In Bayesian network classifiers, both the parameters and the structure can be learned either generatively or discriminatively. Discriminative learning requires objective functions, such as classification rate (CR), conditional log-likelihood (CL), or the margin (as we propose to use in this paper), that optimize the model for a particular inference scenario, e.g. a classification task. We are particularly interested in learning the discriminative structure of a generative Bayesian network classifier that factorizes as P(C,X)=P(X|C)P(C). Learning the graph structure of a Bayesian network classifier is hard. Optimally learning various forms of constrained Bayesian network structures is NP-hard, even in the generative sense. Recently, approaches for finding the globally optimal generative Bayesian network structure have been proposed. These methods are based on dynamic programming, branch-and-bound techniques, or search over various variable orderings. Experiments with these exact methods are restricted to roughly 50 variable nodes. Alternatively, approximate methods such as stochastic search or greedy heuristics are used, which can handle cases with many more variables. Discriminative structure learning is no less difficult because of the non-decomposability of the scores.
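The generative classification scheme above can be sketched in a few lines. This is a toy illustration (all numbers and names are ours, not the paper's): a naive Bayes factorization P(C,X) = P(C) ∏ᵢ P(xᵢ|C), with MAP prediction via Bayes' rule.

```python
import numpy as np

# Toy sketch of generative classification: P(C|X) ∝ P(X|C) P(C),
# predicting the maximum a posteriori class. Illustrative numbers only.

log_prior = np.log(np.array([0.6, 0.4]))          # P(C) for two classes
# Class-conditional likelihoods P(x_i=1 | C) for three binary features,
# assuming a naive Bayes factorization P(X|C) = prod_i P(x_i|C).
theta = np.array([[0.8, 0.1, 0.7],                # P(x_i=1 | C=0)
                  [0.3, 0.9, 0.2]])               # P(x_i=1 | C=1)

def predict(x):
    """MAP prediction: argmax_c [log P(C=c) + sum_i log P(x_i | C=c)]."""
    log_lik = np.where(x == 1, np.log(theta), np.log(1 - theta)).sum(axis=1)
    return int(np.argmax(log_prior + log_lik))

print(predict(np.array([1, 0, 1])))   # feature pattern favoring class 0
```

A discriminative parameter learner would instead adjust `log_prior` and `theta` to optimize a score such as CR, CL, or the margin, rather than via ML counts.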
Discriminative structure learning methods relevant for learning Bayesian network classifiers are usually approximate methods based on local search heuristics. In prior work, a greedy hill-climbing heuristic is used to learn a classifier structure using the CR score: at each iteration, one edge is added to the structure, subject to the restrictions of the network topology and the acyclicity constraints of a Bayesian network. In a similar algorithm, CL has been applied for discriminative structure learning. Recently, we introduced a computationally efficient order-based greedy search heuristic for finding discriminative structures. Our order-based structure learning builds on earlier observations and shows similarities to the K2 heuristic. However, we proposed to use a discriminative scoring metric (i.e. CR) and suggested approaches for establishing the variable ordering based on conditional mutual information. One of the most successful discriminative classifiers, the support vector machine (SVM), finds a decision boundary that maximizes the margin between samples of distinct classes, resulting in good generalization properties of the classifier. Recently, the margin criterion has been applied to learn the parameters of probabilistic models. Taskar et al. observed that undirected graphical models can be efficiently trained to maximize the margin. More recently, Guo et al. introduced margin maximization for parameter learning in Bayesian networks based on convex relaxation. We proposed to use a conjugate gradient algorithm for maximum margin optimization of the parameters and showed its advantages with respect to computational requirements. Further generative and discriminative parameter learning methods for Bayesian network classifiers are summarized in the literature and the references therein. In this paper, we use the maximum margin (MM) criterion for discriminative structure learning.
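To make the MM criterion concrete, the following is a minimal sketch of a common multi-class margin formulation for probabilistic classifiers, d(c|x) = log P(c,x) − max_{c'≠c} log P(c',x): a sample is classified correctly iff its margin is positive. The hinge cap λ and the function name are our own illustration; the paper's exact score may differ.

```python
import numpy as np

# Sketch of a multi-class margin score over a batch of samples.
# log_joint[n, c] holds log P(C=c, x_n); labels holds the true classes.
# Each per-sample margin is capped at lam (hinge) to limit outlier influence.

def margin_score(log_joint, labels, lam=1.0):
    n = log_joint.shape[0]
    true_lp = log_joint[np.arange(n), labels]     # log P(true class, x)
    masked = log_joint.copy()
    masked[np.arange(n), labels] = -np.inf        # exclude the true class
    competitor = masked.max(axis=1)               # strongest wrong class
    margins = true_lp - competitor                # > 0 iff classified correctly
    return np.minimum(margins, lam).sum()

log_joint = np.log(np.array([[0.7, 0.2, 0.1],
                             [0.1, 0.3, 0.6]]))
print(margin_score(log_joint, np.array([0, 2])))
```

A structure search heuristic would evaluate this score for each candidate graph; because the score couples all samples and classes, it does not decompose over the network's families, which is exactly the non-decomposability problem noted above.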
We use greedy hill-climbing (HC) and stochastic search heuristics, namely simulated annealing (SA), for learning discriminative classifier structures. SA is less prone to getting stuck in local optima. We empirically evaluate our margin-based discriminative structure learning heuristics on two handwritten digit recognition tasks, one spam e-mail data set, and one remote sensing data set. We use naive Bayes (NB) as well as generatively and discriminatively optimized tree-augmented naive Bayes (TAN) structures. Furthermore, we experimentally compare discriminative and generative parameter learning on both discriminatively and generatively structured Bayesian network classifiers. Maximum margin structure learning outperforms recently proposed generative and discriminative structure learning approaches. SA heuristics mostly lead to better-performing structures at a lower number of score evaluations (CR or MM) than HC methods. Discriminative parameter learning yields significantly better classification performance than ML parameter learning on the same classifier structure. This holds especially for cases where the structure of the underlying model is not optimized for classification. We introduced the MM score for structure learning using the HC heuristic in earlier work. The benefit of the MM score over other discriminative scores (i.e. CR) remained open there, since the HC heuristic might get trapped in locally optimal solutions. This makes the reported performance gain of the MM score during structure learning ambiguous: either MM is useful, or the HC heuristic using CR gets stuck in low-performing local optima. For this reason, we use SA, which partially alleviates this problem. Recently, we also used the MM score for exact structure learning of Bayesian network classifiers. This method is capable of finding the globally optimal solution.
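The SA heuristic's escape mechanism can be sketched generically. This is a textbook skeleton, not the paper's exact schedule or move set: a candidate structure is perturbed locally (e.g. add/remove/reverse one edge), a worse candidate is accepted with probability exp(Δ/T), and T is cooled geometrically, so the search degenerates to hill-climbing as T → 0.

```python
import math
import random

# Generic simulated-annealing skeleton for maximizing a score over structures.
# `neighbor` proposes a local change; worse moves are accepted with
# probability exp(delta / T); T follows a geometric cooling schedule.

def simulated_annealing(init, score, neighbor, t0=1.0, cooling=0.95,
                        steps=200, seed=0):
    rng = random.Random(seed)
    current, current_score = init, score(init)
    best, best_score = current, current_score
    t = t0
    for _ in range(steps):
        cand = neighbor(current, rng)
        delta = score(cand) - current_score
        if delta > 0 or rng.random() < math.exp(delta / t):
            current, current_score = cand, current_score + delta
            if current_score > best_score:
                best, best_score = current, current_score
        t *= cooling                 # geometric cooling: accept fewer bad moves
    return best, best_score

# Toy demo: maximize -(x - 3)^2 over the integers with +/-1 moves.
best, val = simulated_annealing(
    0, lambda x: -(x - 3) ** 2, lambda x, r: x + r.choice((-1, 1)))
print(best, val)
```

For structure learning, `init` would be an NB structure, `neighbor` a random legal edge modification respecting acyclicity, and `score` the (non-decomposable) CR or MM score, which is why reducing the number of score evaluations matters.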
It is based on branch-and-bound techniques within a linear programming framework, which offers the advantage of an any-time solution: premature termination of the algorithm returns the currently best solution together with a worst-case sub-optimality bound. Empirically, MM-optimized structures are shown to compete with SVMs and to outperform generatively learned network structures. Unfortunately, those experiments are limited to rather small-scale data sets from the UCI repository. To overcome these limitations, we use approximate methods for structure learning in this paper. The paper is organized as follows: In Section 2, we introduce Bayesian network classifiers as well as NB and TAN structures. In Section 3, we present the non-decomposable discriminative scores CL, CR, and MM. Additionally, we discuss techniques for making the computation of these discriminative scores competitive. Section 4 introduces different structure learning heuristics, with particular focus on SA, which is rarely used for discriminative learning of Bayesian network structures. In Section 5, we present experimental results. Section 6 concludes the paper.
English Conclusion
In this paper, we proposed to use the maximum margin score for learning discriminative Bayesian network classifier structures. Furthermore, we replaced the traditional greedy hill-climbing heuristics for discriminative structure optimization with stochastic simulated annealing, which offers mechanisms for escaping local optima. The main results of this work are: (i) Maximum margin structure learning mostly outperforms other generative and discriminative structure learning methods. (ii) In most cases, stochastic optimization leads to better-performing Bayesian network structures and requires fewer score evaluations than greedy hill-climbing. (iii) We also provide results for discriminative parameter learning on top of generatively or discriminatively optimized structures. Margin-optimized Bayesian networks perform on par with SVMs in terms of classification rate; however, the Bayesian network classifiers can directly deal with missing features, a case where discriminative classifiers usually require feature imputation techniques.
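The missing-feature advantage follows from the generative model itself: an unobserved variable is marginalized out of the joint. Under a naive Bayes factorization this is especially simple, since Σₓ P(xᵢ=x|C) = 1 means the missing feature's likelihood term drops out. A minimal sketch with illustrative numbers (the function and parameters are our own, not the paper's):

```python
import numpy as np

# Handling missing features by marginalization in a naive Bayes classifier.
# A feature given as None is summed out, i.e. its likelihood term is dropped.

log_prior = np.log(np.array([0.5, 0.5]))
theta = np.array([[0.9, 0.2],      # P(x_i = 1 | C = 0)
                  [0.1, 0.8]])     # P(x_i = 1 | C = 1)

def predict_missing(x):
    """x entries: 1, 0, or None (missing). Missing features are summed out."""
    score = log_prior.copy()
    for i, xi in enumerate(x):
        if xi is None:
            continue               # marginalization: the term contributes 1
        p = theta[:, i] if xi == 1 else 1 - theta[:, i]
        score += np.log(p)
    return int(np.argmax(score))

print(predict_missing([1, None]))  # classify with the second feature missing
```

A purely discriminative classifier such as an SVM has no such joint distribution to marginalize, which is why it typically needs the missing values imputed before classification.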