As a powerful formalism, Bayesian networks play an increasingly important role in the Uncertainty Field. This paper proposes a hybrid method to discover the knowledge represented in Bayesian networks. The hybrid method combines dependency analysis, ant colony optimization (ACO), and the simulated annealing strategy. Firstly, the new method uses order-0 independence tests with a self-adjusting threshold value to reduce the size of the search space, so that the search process takes less time to find the near-optimal solution. Secondly, better Bayesian network models are generated by using an improved ACO algorithm, where a new heuristic function is introduced to further enhance the search effectiveness and efficiency. Finally, an optimization scheme based on simulated annealing is employed to improve the optimization efficiency in the stochastic search process of ants. In a number of experiments and comparisons, the hybrid method outperforms the original ACO-B which uses ACO and some other network learning algorithms.
Bayesian networks (BNs) are important probabilistic models within the field of artificial intelligence, and also powerful formalisms to model the uncertainty in the real world. A Bayesian network uses a graphical model to depict conditional independence among random variables in the domain and encodes the joint probability distribution. Given a network and observations of some variables, the values of other unobserved variables can be predicted by a probabilistic inference. Nowadays, many systems have been constructed based on this paradigm in a variety of different areas including vision recognition, medial diagnosis, trouble-shooting, information retrieval and so on.
With the development and popularity of BNs, earning BN structure from data has received considerable attention, and researchers have proposed various learning algorithms [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12] and [13]. Generally, these algorithms can be classified into two main categories [3]: the dependency analysis approach, and the score-and-search approach. The first poses BN learning as a constraint satisfaction problem, and constructs a BN by dependency tests [2] and [3]. The second poses BN learning as an optimization problem, and uses a search method to find a network structure with the best score where a scoring metric is employed to evaluate candidate networks [1] and [4]. Unfortunately, both approaches have their own drawbacks. For example, the first approach has to perform an exponential number of dependency tests and some test results of higher order are unreliable, while the second approach often traps in a local optimum due to huge search spaces and the limitation of search methods. To solve these problems, new algorithms have been developed in recent years. For instance, there are three efficient approaches using a meta-heuristic mechanism to get the global near-optimum in the candidate network space. The first uses Genetic Algorithm (GA) [5] and [7], the second applies Evolutionary Programming (EP) [8] and [11], and the third employs ant colony optimization (ACO) [6] and [9]. Moreover, there is a research focus [10] and [11] that combines basic ideas of the dependency analysis approach and the score-and-search approach. These hybrid methods first use a dependency analysis method to reduce the search space of candidate solutions, then employ a score-and-search method to search in the reduced space. Different methods in dependency analysis and score-and-search phases can be used, which compose different hybrid methods.
In this paper, we propose a hybrid method to learn BNs. The hybrid method consists of two phases, namely, the Conditional Independence (CI) test phase and the search phase. In the CI test phase, order-0 independence tests with a self-adjusting threshold value are conducted to dynamically restrict search spaces of feasible solutions, so that the search process in the next phase can be accelerated while keeping good solution quality. In the search phase, an improved ACO for learning BNs is used to find good models. Here we use two techniques: 1. A new heuristic function combining the global score-increase of a solution with local mutual information between nodes is introduced to enhance the search effectiveness and efficiency. 2. An optimization strategy based on a Metropolis rule of simulated annealing is employed to further improve the optimization efficiency in the stochastic searching of ants. We call our new method hybrid ant colony optimization for Bayesian network learning (HACO-B). In a number of experiments, we perform an analytical study to compare the new method to ACO-B and some other network learning algorithms. The experimental results on benchmark data sets show that the hybrid algorithm outperforms the original ACO-B and some other network learning algorithms.
The paper is organized as follows. In Section 2, we present the background of Bayesian networks and the basic idea of the ant colony optimization for learning Bayesian networks. In Section 3, we describe our new algorithm in detail. Section 4 reports our experimental results. Finally, we conclude the paper in Section 5.
In this paper, we propose a new algorithm, HACO-B, for learning Bayesian networks effectively and efficiently. The algorithm combines dependency analysis, ant colony optimization (ACO) and the simulated annealing strategy. Based on the mechanism of learning BNs by ACO, three new improvements are described. Empirical results illustrate that the new algorithm is superior in terms of the computational time on all data sets we have tested, while it can also achieve better solution quality on most data sets. Most importantly, our algorithm greatly enhances the convergence speed on large scale data sets compared to the original ACO-B algorithm. We have also presented the performance comparison between HACO-B, HEA, MDLEP, PheGT2R, BNPC, and WinMine, and found that HACO-B can discover better BNs effectively.
The three new strategies used with HACO-B yield improvements in the reduction of search spaces, the enhancement of the exploration ability of search algorithms and the adjustment of local optimizations, all of which are equally significant for stochastic search algorithms to solve other optimization problems. Our future work is to extend our study to more complex problems in learning BNs, e.g., problems with incomplete data, hidden variables and multi-relational data.