Computing the posterior probability distribution of a set of query variables by searching the network is an important inference task with a Bayesian network. In real applications, however, it is also necessary to make inferences when the evidence is not contained in the training data. In this paper, we augment Bayesian network inference with a learning function, extending the classical “search”-based inference to “search + learning”-based inference. Based on the support vector machine, we use a class of hyperplanes to construct the hypothesis space, and then use the method of solving an optimal hyperplane to find a maximum likelihood hypothesis for evidence values not contained in the training data. Further, we give a convergent Gibbs sampling algorithm for approximate probabilistic inference in the presence of maximum likelihood parameters. Preliminary experiments show the feasibility of our proposed methods.
As a graphical representation of probabilistic causal relationships, Bayesian networks (BNs) are effective and widely used frameworks (Cooper and Herskovits, 1992, Heckerman and Wellman, 1995, Pearl, 1998 and Russell and Norvig, 2002). A Bayesian network can be constructed by means of statistical learning from sample data (Buntine, 1996, Cheng et al., 2002 and Pearl, 1987).
The basic task of probabilistic inference in a Bayesian network is to compute the posterior probability distribution of a set of query variables by searching the network, given some observed evidence. For example, in Fig. 1, we can deduce the probability distribution of cholesterol standards for somebody whose age is 60. In real applications, however, queries are also frequently posed with arbitrary evidence values. For example, if we know John is 65 years old, how can we deduce the probability distribution of his cholesterol standards? Ordinary search-based inference with the Bayesian network in Fig. 1 cannot answer this question, since there are no data about 65-year-old patients in the training sample. This leads to our idea of extending “search”-based inference to “search + learning”-based inference with a Bayesian network. “Learning” means determining a hypothesis space H and finding the most probable hypothesis h in H, given the training sample.
Fig. 1. A simple Bayesian network about Age and Cholesterol.
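To make the gap concrete, the following minimal sketch (with hypothetical probabilities; only the Age–Cholesterol structure comes from Fig. 1) shows why a pure table lookup cannot answer the query for an unseen evidence value such as Age = 65:

```python
# Hypothetical entries of P(Cholesterol = high | Age), estimated from the
# training sample; only ages that actually occur in the sample appear.
cpt = {50: 0.30, 60: 0.45, 70: 0.60}

print(cpt[60])   # answerable by "search": age 60 occurs in the sample
# print(cpt[65]) # KeyError: age 65 never occurs, so pure search fails;
                 # a learned hypothesis h(65) is needed to fill the gap
```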
Fortunately, the support vector machine (SVM) is a machine learning method based on statistical learning theory. The support vector machine not only addresses problems that trouble many other learning methods, such as small samples, overfitting, high dimensionality and local minima, but also has a fairly high generalization (forecasting) ability (Burges, 1998 and Chang and Liu, 2001). These characteristics of SVMs make it possible to extend general Bayesian network inference by augmenting it with a learning function to obtain the desired hypothesis.
In this paper, our purpose is to discuss Bayesian network inference with a learning function, as well as the corresponding network construction. We extend general Bayesian networks by augmenting them with maximum likelihood parameters, so that inference can be carried out when evidence values are not contained in the training sample data. Based on the support vector machine, we use a class of hyperplanes to construct the hypothesis space and use the method of solving an optimal hyperplane to find a maximum likelihood hypothesis, where both the linear and nonlinear cases are discussed. We can thus obtain conditional probability tables of a Bayesian network that include maximum likelihood parameters. This approach not only extends the expressive power of Bayesian networks, but also finds a new application for SVMs.
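As a rough illustration of this idea (not the exact formulation developed in Section 3): one can fit an SVM regressor to the conditional probability entries that the training data does support, and read off an estimate for an unseen evidence value. The sketch below uses scikit-learn's SVR with an RBF kernel for the nonlinear case (kernel="linear" covers the linear case); all ages and probabilities are hypothetical.

```python
import numpy as np
from sklearn.svm import SVR

ages = np.array([[40], [50], [60], [70], [80]])      # evidence values seen in training
p_high = np.array([0.20, 0.30, 0.45, 0.60, 0.72])    # P(Chol = high | Age) from the CPT

# Nonlinear case: RBF kernel; set kernel="linear" for the linear case.
model = SVR(kernel="rbf", C=10.0, epsilon=0.01)
model.fit(ages, p_high)

# Maximum-likelihood-style estimate for the unseen evidence value 65,
# clipped so the result remains a valid probability.
p65 = float(np.clip(model.predict([[65]])[0], 0.0, 1.0))
print(f"estimated P(Chol = high | Age = 65) ≈ {p65:.3f}")
```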
Further, in this paper we give a Gibbs sampling algorithm for approximate probabilistic inference in the presence of maximum likelihood parameters.
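For readers unfamiliar with the underlying scheme, the following is a minimal sketch of standard Gibbs sampling on a hypothetical binary network (A → B, A → C, evidence C = 1, query P(B = 1 | C = 1)); the network, variable names and probabilities are all assumptions for illustration, and any CPT entry could equally be an SVM-learned maximum likelihood parameter.

```python
import random

p_a = 0.4                    # P(A = 1)
p_b = {0: 0.2, 1: 0.7}       # P(B = 1 | A = a)
p_c = {0: 0.1, 1: 0.8}       # P(C = 1 | A = a)

def resample_a(b, c):
    # Full conditional of A, proportional to P(A) * P(b | A) * P(c | A).
    w1 = p_a * (p_b[1] if b else 1 - p_b[1]) * (p_c[1] if c else 1 - p_c[1])
    w0 = (1 - p_a) * (p_b[0] if b else 1 - p_b[0]) * (p_c[0] if c else 1 - p_c[0])
    return int(random.random() < w1 / (w1 + w0))

random.seed(0)
a, b, c = 0, 0, 1            # C is clamped to the observed evidence
hits, burn_in, n = 0, 1000, 20000
for t in range(burn_in + n):
    a = resample_a(b, c)                # resample each non-evidence variable in turn
    b = int(random.random() < p_b[a])   # B's Markov blanket is just its parent A
    if t >= burn_in:
        hits += b
print(f"P(B = 1 | C = 1) ≈ {hits / n:.3f}")   # exact value is about 0.621
```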
Preliminary experiments were conducted to test the accuracy of our proposed method for learning the maximum likelihood parameters and the convergence of the algorithm for corresponding approximate inferences. Experimental results show that our proposed methods are feasible.
The remainder of this paper is organized as follows. In Section 2, related work is introduced. In Section 3, we propose the method for maximum likelihood hypothesis learning based on SVMs. In Section 4, we give the approximate inference algorithm for Bayesian networks with the learning function. In Section 5, experimental results are shown. Finally, we conclude and discuss future work in Section 6.
Computing the posterior probability distribution of a set of query variables by searching the network is an important inference task with a Bayesian network. In real applications, it is also necessary to make inferences when the evidence is not contained in the training data.
In this paper, we extend Bayesian networks with a learning function, and extend the classical “search”-based inference to “search + learning”-based inference. Based on the support vector machine, we use a class of hyperplanes to construct the hypothesis space and use the method of solving an optimal hyperplane to find a maximum likelihood hypothesis for values not contained in the training data. In our methods, both the linear and the nonlinear cases are considered.
Further, we give a convergent Gibbs sampling algorithm for approximate probabilistic inference in the presence of maximum likelihood parameters.
Preliminary experiments were conducted to test the accuracy of our proposed method for learning the maximum likelihood parameters and the convergence of the algorithm for corresponding approximate inferences. Experimental results show that our proposed methods are feasible.
Our proposed methods also raise some other interesting research issues. Motivated by the idea of extending general Bayesian networks with a learning function, the learning of maximum likelihood parameters would be more efficient if it could be carried out from the existing conditional probability tables alone, instead of from the training data. Based on SVMs, Bayesian network learning from small samples can also be further studied. These are directions for our future work.