دانلود مقاله ISI انگلیسی شماره 29316
ترجمه فارسی عنوان مقاله

الگوریتم ترکیبی برای یادگیری ساختار شبکه های بیزی و کاربرد آن در آموزش چند برچسب

عنوان انگلیسی
A hybrid algorithm for Bayesian network structure learning with application to multi-label learning
کد مقاله سال انتشار تعداد صفحات مقاله انگلیسی
29316 2014 18 صفحه PDF
منبع

Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)

Journal : Expert Systems with Applications, Volume 41, Issue 15, 1 November 2014, Pages 6755–6772

ترجمه کلمات کلیدی
- شبکه های بیزی - آموزش چند برچسب - مرز مارکوف - ویژگی انتخاب زیر مجموعه -
کلمات کلیدی انگلیسی
Bayesian networks, Multi-label learning, Markov boundary, Feature subset selection,
پیش نمایش مقاله
پیش نمایش مقاله  الگوریتم ترکیبی برای یادگیری ساختار شبکه های بیزی و کاربرد آن در آموزش چند برچسب

چکیده انگلیسی

We present a novel hybrid algorithm for Bayesian network structure learning, called H2PC. It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. The algorithm is based on divide-and-conquer constraint-based subroutines to learn the local structure around a target variable. We conduct two series of experimental comparisons of H2PC against Max–Min Hill-Climbing (MMHC), which is currently the most powerful state-of-the-art algorithm for Bayesian network structure learning. First, we use eight well-known Bayesian network benchmarks with various data sizes to assess the quality of the learned structure returned by the algorithms. Our extensive experiments show that H2PC outperforms MMHC in terms of goodness of fit to new data and quality of the network structure with respect to the true dependence structure of the data. Second, we investigate H2PC’s ability to solve the multi-label learning problem. We provide theoretical results to characterize and identify graphically the so-called minimal label powersets that appear as irreducible factors in the joint distribution under the faithfulness condition. The multi-label learning problem is then decomposed into a series of multi-class classification problems, where each multi-class variable encodes a label powerset. H2PC is shown to compare favorably to MMHC in terms of global classification accuracy over ten multi-label data sets covering different application domains. Overall, our experiments support the conclusions that local structural learning with H2PC in the form of local neighborhood induction is a theoretically well-motivated and empirically effective learning framework that is well suited to multi-label learning. The source code (in R) of H2PC as well as all data sets used for the empirical tests are publicly available.

مقدمه انگلیسی

A Bayesian network (BN) is a probabilistic model formed by a structure and parameters. The structure of a BN is a directed acyclic graph (DAG), whilst its parameters are conditional probability distributions associated with the variables in the model. The problem of finding the DAG that encodes the conditional independencies present in the data attracted a great deal of interest over the last years (Gasse et al., 2012, Kojima et al., 2010, Peña, 2012, Perrier et al., 2008, Rodrigues de Morais and Aussem, 2010a, Scutari, 2010, Scutari and Brogini, 2012 and Villanueva and Maciel, 2012). The inferred DAG is very useful for many applications, including feature selection (Aliferis et al., 2010, Peña et al., 2007 and Rodrigues de Morais and Aussem, 2010b), causal relationships inference from observational data (Aliferis et al., 2010, Aussem et al., 2012, Aussem et al., 2010, Brown and Tsamardinos, 2008, Cawley, 2008, Ellis and Wong, 2008 and Prestat et al., 2013) and more recently multi-label learning (Dembczyski et al., 2012, Guo and Gu, 2011 and Zhang and Zhang, 2010). Ideally the DAG should coincide with the dependence structure of the global distribution, or it should at least identify a distribution as close as possible to the correct one in the probability space. This step, called structure learning, is similar in approaches and terminology to model selection procedures for classical statistical models. Basically, constraint-based (CB) learning methods systematically check the data for conditional independence relationships and use them as constraints to construct a partially oriented graph representative of a BN equivalence class, whilst search-and-score (SS) methods make use of a goodness-of-fit score function for evaluating graphical structures with regard to the data set. Hybrid methods attempt to get the best of both worlds: they learn a skeleton with a CB approach and constrain on the DAGs considered during the SS phase. In this study, we present a novel hybrid algorithm for Bayesian network structure learning, called H2PC.1 It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. The algorithm is based on divide-and-conquer constraint-based subroutines to learn the local structure around a target variable. HPC may be thought of as a way to compensate for the large number of false negatives at the output of the weak PC learner, by performing extra computations. As this may arise at the expense of the number of false positives, we control the expected proportion of false discoveries (i.e. false positive nodes) among all the discoveries made in PCTPCT. We use a modification of the Incremental association Markov boundary algorithm (IAMB), initially developed by Tsamardinos et al. in Tsamardinos, Aliferis, and Statnikov (2003) and later modified by Peña in Peña (2008) to control the FDR of edges when learning Bayesian network models. HPC scales to thousands of variables and can deal with many fewer samples (n<qn<q). To illustrate its performance by means of empirical evidence, we conduct two series of experimental comparisons of H2PC against Max–Min Hill-Climbing (MMHC), which is currently the most powerful state-of-the-art algorithm for BN structure learning (Tsamardinos, Brown, & Aliferis, 2006), using well-known BN benchmarks with various data sizes, to assess the goodness of fit to new data as well as the quality of the network structure with respect to the true dependence structure of the data. We then address a real application of H2PC where the true dependence structure is unknown. More specifically, we investigate H2PC’s ability to encode the joint distribution of the label set conditioned on the input features in the multi-label classification (MLC) problem. Many challenging applications, such as photo and video annotation and web page categorization, can benefit from being formulated as MLC tasks with large number of categories (Dembczyski et al., 2012, Kocev et al., 2007, Madjarov et al., 2012, Read et al., 2009 and Tsoumakas et al., 2010b). Recent research in MLC focuses on the exploitation of the label conditional dependency in order to better predict the label combination for each example. We show that local BN structure discovery methods offer an elegant and powerful approach to solve this problem. We establish two theorems (Theorems Theorem 6 and Theorem 7) linking the concepts of marginal Markov boundaries, joint Markov boundaries and so-called label powersets under the faithfulness assumption. These Theorems offer a simple guideline to characterize graphically: (i) the minimal label powerset decomposition, (i.e. into minimal subsets YLP⊆YYLP⊆Y such that View the MathML sourceYLPY⧹YLP|X), and (ii) the minimal subset of features, w.r.t an Information Theory criterion, needed to predict each label powerset, thereby reducing the input space and the computational burden of the multi-label classification. To solve the MLC problem with BNs, the DAG obtained from the data plays a pivotal role. So in this second series of experiments, we assess the comparative ability of H2PC and MMHC to encode the label dependency structure by means of an indirect goodness of fit indicator, namely the 0/10/1 loss function, which makes sense in the MLC context. The rest of the paper is organized as follows: In the Section 2, we review the theory of BN and discuss the main BN structure learning strategies. We then present the H2PC algorithm in details in Section 3. Section 4 evaluates our proposed method and shows results for several tasks involving artificial data sampled from known BNs. Then we report, in Section 5, on our experiments on real-world data sets in a multi-label learning context so as to provide empirical support for the proposed methodology. The main theoretical results appear formally as two theorems (Theorems 6 and 7) in Section 5. Their proofs are established in the Appendix. Finally, Section 6 raises several issues for future work and we conclude in Section 7 with a summary of our contribution.

نتیجه گیری انگلیسی

We first discussed a hybrid algorithm for global or local (around target nodes) BN structure learning called Hybrid HPC (H2PC). Our extensive experiments showed that H2PC outperforms the state-of-the-art MMHC by a significant margin in terms of edge recall without sacrificing the number of extra edges, which is crucial for the soundness of the super-structure used during the second stage of hybrid methods, like the ones proposed in Perrier et al., 2008 and Kojima et al., 2010. The code of H2PC is open-source and publicly available online at https://github.com/madbix/bnlearn-clone-3.4. Second, we discussed an application of H2PC to the multi-label learning problem which is a challenging problem in many real-world application domains. We established theoretical results, under the faithfulness condition, in order to characterize graphically the so-called minimal label powersets that appear as irreducible factors in the joint distribution and their respective Markov boundaries. As far as we know, this is the first investigation of Markov boundary principles to the optimal variable/feature selection problem in multi-label learning. These formal results offer a simple guideline to characterize graphically: (i) the minimal label powerset decomposition, (i.e. into minimal subsets YLP⊆YYLP⊆Y such that View the MathML sourceYLPY⧹YLP|X), and (ii) the minimal subset of features, w.r.t an Information Theory criterion, needed to predict each label powerset, thereby reducing the input space and the computational burden of the multi-label classification. The theoretical analysis laid the foundation for a practical multi-label classification procedure. Another set of experiments were carried out on a broad range of multi-label data sets from different problem domains to demonstrate its effectiveness. H2PC was shown to outperform MMHC by a significant margin in terms of global accuracy. We suggest several avenues for future research. As far as BN structure learning is concerned, future work will aim at: (1) ascertaining which independence test (e.g. tests targeting specific distributions, employing parametric assumptions etc.) is most suited to the data at hand (Scutari, 2011 and Tsamardinos and Borboudakis, 2010); (2) controlling the false discovery rate of the edges in the graph output by H2PC (Peña, 2008) especially when dealing with more nodes than samples, e.g. learning gene networks from gene expression data. In this study, H2PC was run independently on each node without keeping track of the dependencies found previously. This lead to some loss of efficiency due to redundant calculations. The optimization of the H2PC code is currently being undertaken to lower the computational cost while maintaining its performance. These optimizations will include the use of a cache to store the (in) dependencies and the use of a global structure. Other interesting research avenues to explore are extensions and modifications to the greedy search-and-score procedure which is the most promising part to optimize in terms of computational gains. Regarding the multi-label learning problem, experiments with several thousands of labels are currently been conducted and will be reported in due course. We also intend to work on relaxing the faithfulness assumption and derive a practical multi-label classification algorithm based on H2PC that is correct under milder assumptions underlying the joint distribution (e.g. Composition, Intersection). This needs further substantiation through more analysis.