رویکرد مبتنی بر بهینه سازی برای طراحی شبکه های بیزی
کد مقاله | سال انتشار | تعداد صفحات مقاله انگلیسی |
---|---|---|
28638 | 2008 | 14 صفحه PDF |
Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)
Journal : Mathematical and Computer Modelling, Volume 48, Issues 7–8, October 2008, Pages 1265–1278
چکیده انگلیسی
Bayesian networks model conditional dependencies among the domain variables, and provide a way to deduce their interrelationships as well as a method for the classification of new instances. One of the most challenging problems in using Bayesian networks, in the absence of a domain expert who can dictate the model, is inducing the structure of the network from a large, multivariate data set. We propose a new methodology for the design of the structure of a Bayesian network based on concepts of graph theory and nonlinear integer optimization techniques.
مقدمه انگلیسی
A Bayesian network is a graphical representation of an nn-dimensional probability distribution. It is a directed acyclic graph (DAG) in which each node represents a variable of interest, and the arcs represent dependencies between the variables. The strengths of the dependencies are quantified by conditional probabilities. Thus, the architecture of a Bayesian network involves two components: (1) the structure of the network, which describes the direct association pairs of variables, and (2) the parameters of the network that represent the probability distribution. In some contexts, the structure of the network may have a causal interpretation, i.e., the ending node of an arc may be a direct effect of the node at the beginning of the arc. In such cases, a Bayesian network is known as a causal network [33]. To represent causal relations, we identify the parents of each node (nodes incident to the tails of all arcs into the node) that are the direct causes of the node. Bayesian networks have been used in a wide variety of domains, including biology [18], computer games [42], data mining [29] and [4], diagnosis of failures [5] and [37], medicine [38], [3], [2], [28] and [43], prediction [1] and [44], reliability analysis [41], students advising [13] and [35] and weather forecasting [24]. There are several advantages in representing a probability distribution by a Bayesian network. Bayesian networks allow a fast and intuitive understanding of the relations (dependence vs. independence) between the variables. The chain-rule allows joint distributions to be represented as a product of conditional distributions. In order to use a Bayesian network, it is necessary to have at least a subset of the conditional distributions available. However, in general, it is less difficult to derive those conditional distributions than it is to specify the joint distribution. The probability distribution represented by a Bayesian network can be used, for example, to do Bayesian inference, i.e., if values of some of the variables are known and they are introduced in the network, it is possible to use the network to compute other probabilities, given partial evidence. Two approaches are available for determining the structure of a Bayesian network. One, a human expert who is an expert in the domain dictates the arcs that connect the nodes in the network. Two, a computerized induction method is used to derive a structure directly from a relevant data set. Construction of Bayesian networks using the former method is time consuming and both benefits from, and is limited by, the subjective judgment of the expert consulted. For those reasons, there has been substantial recent interest in identifying a computerized, objective methodology for inducing the graphical structure of the Bayesian network from a multivariate data set. We propose a new approach, using a mathematical program, to create the network structure. In order to formulate the objective function of that mathematical program, we define coefficients of dependence between variables. The coefficients are used to define a measure of the global dependence in the network, which is then maximized over the space of structures of Bayesian networks.
نتیجه گیری انگلیسی
Bayesian network models are attractive for capturing relationships among variables because they explicitly represent the presence or absence of dependencies, and, if appropriate, may also provide a mechanism for detecting causal links. In this paper, we proposed a new, optimization-based method for inducing the network structure solely from the data, as opposed to requiring guidance from an expert human. Creating a model using mathematical programming produces an architecture that is replicable, and that may be more objective than one produced by a human. When human input to the modeling process is available and reliable, our method might be used to suggest possible structures for review by the expert, speeding up the modeling process when the data set is too large or complex to be effectively scanned by a human. The numerical results on three data sets support the conjecture that our modeling approach is capable of constructing an adequate recovery of a known network structure and that the structures proposed by our method yield good predictive results. Our approach does have several limitations. Our objective function only considers the conditional independence between two variables given a single other variable, using what we term 1-step dependence coefficients. The results on the Asia network, some of the relationships which involve more complex relationships, show that our method may therefore include additional arcs that are not present in the true network, that is, our method may include a small number of spurious conditional dependencies where the conditional dependence between two variables involves two or more other variables. The extra arcs do not appear to detract from the predictive power of our networks, but they may affect the interpretation of the overall model. Additionally, because our 1-step coefficients are symmetric, our mathematical program does not indicate the direction of the arcs, only their presence or absence. As a result, in some cases there may be some arcs with ambiguous direction. The ambiguity does not affect the predictive power of the result, but, again, it may affect the model’s interpretation. Finally, our formulation requires that all variables be discrete, and transforming a continuous variable into a discrete one may result in an undesirable loss of information. In the future it would be interesting to extend the current mathematical programming model to address the above issues. It is possible to extend the 1-step dependence coefficient to higher levels, at the cost of a more complex optimization problem, but with the benefit that the resulting network would be less likely to include spurious arcs. Our model is based on a chi-square association measure, but other such measures exist. A likelihood-ratio measure might yield different or better network architectures. A more complex formulation might resolve the issue of ambiguous direction arcs, and might also be able to incorporate directly continuous variables.