# آموزش ساختار شبکه های بیزی: به سوی گراف ضروری توسط ابزار برنامه ریزی خطی عدد صحیح

کد مقاله | سال انتشار | مقاله انگلیسی | ترجمه فارسی | تعداد کلمات |
---|---|---|---|---|

25486 | 2014 | 29 صفحه PDF | سفارش دهید | محاسبه نشده |

**Publisher :** Elsevier - Science Direct (الزویر - ساینس دایرکت)

**Journal :** International Journal of Approximate Reasoning, Volume 55, Issue 4, June 2014, Pages 1043–1071

#### چکیده انگلیسی

The basic idea of the geometric approach to learning a Bayesian network (BN) structure is to represent every BN structure by a certain vector. If the vector representative is chosen properly, it allows one to re-formulate the task of finding the global maximum of a score over BN structures as an integer linear programming (ILP) problem. Such a suitable zero-one vector representative is the characteristic imset, introduced by Studený, Hemmecke and Lindner in 2010, in the proceedings of the 5th PGM workshop. In this paper, extensions of characteristic imsets are considered which additionally encode chain graphs without flags equivalent to acyclic directed graphs. The main contribution is a polyhedral description of the respective domain of the ILP problem, that is, by means of a set of linear inequalities. This theoretical result opens the way to the application of ILP software packages. The advantage of our approach is that, as a by-product of the ILP optimization procedure, one may get the essential graph, which is a traditional graphical BN representative. We also describe some computational experiments based on this idea.

#### مقدمه انگلیسی

Learning a Bayesian network (BN) structure is the statistical task of model choice, where the candidate statistical structural models are ascribed to acyclic directed graphs. A score-and-search approach to this learning task consists in maximization of a quality criterion QQ, also called a score or a scoring function , which is a real function of the (acyclic directed) graph G and the observed database D . The value Q(G,D)Q(G,D) says how much the BN structure defined by the graph G, that is, the statistical model ascribed to G, fits the database D. Note that some researchers in machine learning are accustomed to identify a BN structure with the respective equivalence class of graphs and prefer to talk about learning an equivalence class of Bayesian networks. Two important technical assumptions on the criterion QQ were pinpointed in the literature in connection with computational aspects of the above-mentioned maximization task. Since the goal is to learn a BN structure, QQ should be score equivalent [3], which means it ascribes the same value to equivalent graphs, that is, to graphs defining the same BN structure. The other assumption is that QQ is decomposable , which means Q(G,D)Q(G,D) decomposes into contributions which correspond to factors in the factorization according to the graph G . Typically, it is required that QQ is additively decomposable [5], that is, Q(G,D)Q(G,D) is the sum of such local contributions, called local scores. The geometric approach is to represent every BN structure by a certain vector so that such a criterion QQ becomes an affine function of the vector representative, that is, a linear function plus a constant term. This idea was introduced by Studený already in 2005 [22] and then deepened by Studený, Vomlel and Hemmecke in 2010 [23]. A suitable (uniquely determined) zero-one vector BN representative seems to be the characteristic imset, introduced recently by Studený, Hemmecke and Lindner [24] and [12]. Jaakkola et al. [13] and Cussens [6] and [7] came independently with an analogous geometric approach. The main difference is that they used certain special zero-one vector codes of (acyclic) directed graphs to represent (non-uniquely) BN structures. On the other hand, they made much more progress with the practical application of integer linear programming (ILP) tools. To overcome technical problems with the exponential length of their vectors they utilized the idea of the reduction of the search space developed by de Campos et al. [9] and [10], based on a particular form of databases and quality criteria occurring in practice. We have compared [28] both methods of BN structure vector representation and found that the characteristic imset can be viewed as a (many-to-one) linear function of the above mentioned zero-one codes of directed graphs. We also found [28] that by transforming the polyhedral approximation used by Jaakkola et al. [13] and Cussens [7] one also gets a polyhedral approximation for the characteristic imset polytope. However, an unpleasant finding was that, while there is a one-to-one correspondence for their acyclicity-encoding inequalities, their basic non-negativity and equality constraints are transformed to much higher number of inequalities. Thus, direct transformation of their inequalities does not lead to a suitable ILP problem formulation in terms of characteristic imsets. Lindner [17] in her thesis dealt with the problem of finding a workable LP relaxation of the characteristic imset polytope, that is, an outer polyhedral approximation such that the only vectors with integer components satisfying the considered inequalities are the characteristic imsets. To overcome/avoid this problem she used the idea of extending the characteristic imset with additional components, which is a useful standard trick in combinatorial optimization. Her additional components allowed her to encode acyclic directed graphs inducing the characteristic imset. She also reported on some computational experiments based her approach. This paper is another step on the way to develop a consistent ILP approach based on characteristic imsets. Being inspired by Lindner [17], we introduce an extended vector BN structure representative which includes the characteristic imset and, moreover, encodes a certain special graph (equivalent to an acyclic directed graph). The main theoretical result is a polyhedral characterization of the domain of the respective ILP problem. Note that the objective in this ILP problem only depends on the characteristic imset and the additional components play an auxiliary role; this is different from the approach based on direct encoding of graphs [13] and [7]. More specifically, a set of linear inequalities is presented such that the only vectors with integer components in the polyhedron specified by those inequalities are the above mentioned extended characteristic imsets. The presented inequalities are classified in four groups. The number of inequalities in the first two groups is polynomial in the number of variables, that is, in the number of nodes of the graph, while the number of remaining inequalities is exponential. However, provided that the length of the vector representatives is limited/reduced to a quasi-polynomial number by the idea presented by de Campos and Ji [10], the number of inequalities in the third group can be reduced to a quasi-polynomial number as well. The inequalities in the fourth group correspond to acyclicity restrictions. In general, they cannot be reduced to a polynomial number, but the method of iterative constraint adding may be applied to solve the respective ILP problem. The overall number of our inequalities is lower than Lindner [17] provided. The main advantage of our approach is that once an optimal solution is found one can use the obtained (extended) characteristic imset to get the corresponding essential graph, which is known as a standard (unique) graphical BN representative [2]. This graph can be obtained as the solution of a secondary ILP problem: it has another objective but shares with the main ILP problem the first two groups of inequalities. Thus, the second ILP problem has both polynomial length of vectors and polynomial number of inequalities. We have also performed some computational experiments whose aim was to verify the feasibility of the approach. This was confirmed but the experiments also indicated that the number of our inequalities is too high to be able to compete with the running times presented at GOBNILP web page [32]. On the other hand, the experiments confirmed the assumption that the second ILP problem, used to get the essential graph, is easily solvable. In Section 2 we recall basic concepts and some special results on chain graphs we use later in the paper. In Section 3 more details on the ILP approach to learning BN structure are given; specifically, we describe how to compute the objective in the characteristic imsets case (from local scores) and how to utilize the search space reduction [10] in the context of characteristic imsets. Section 4 describes the theoretical basis of our specific ILP approach; we say which graphs are encoded in our extended characteristic imsets, list/comment the inequalities and formulate the main results. In Section 5 we describe both phases to learn the optimal essential graph. Section 6 deals with two specific methods to solve the main ILP problem that we tested in our computational experiments, described then in Section 7. In Conclusions we comment the results and discuss further perspectives. Appendix A contains the proofs.

#### نتیجه گیری انگلیسی

The theoretical advantage of the presented approach (in comparison with the other ILP approaches) is that the solution is a unique BN representative and the vector representatives are more compressed. The augmentation does not kill this comparative advantage because the extended characteristic imsets are still more than View the MathML source|N|3-times shorter than the straightforward graph codes. However, as the computational experiments showed, the situation is reversed as the result of usual pre-processing procedure, namely the search space reduction described in Section 3.2. This procedure seems to be particularly advantageous for the straightforward graph codes because it allows one to better exploit the results of the pruning. Our experiments confirmed that typically our pruned cc-vector (as suggested in Section 3.2) is longer the respective pruned η-vector. On the other hand, additional experiments (see Section 7.3) suggested that if one considers maximally shortened pruned characteristic imset then one can obtain a shorter vector than the pruned η-vector in practice, as well. The contribution of the paper is mainly theoretical, motivated by the aim to develop a consistent ILP approach to learning BN structure based on characteristic imsets. We reformulated the BN structure learning task as a special ILP optimization problem and derived the formula for the respective objective (see Section 3.1). We have also extended the characteristic imset, our unique vector BN structure representative, by additional components with the aim to encode certain relevant graphs (Section 4.1). Finally, we came with polyhedral characterization of the domain of the respective ILP optimization problem, which includes as an option the case of graphs with possible cycles (see Section 4.2 and Section 4.3). Our approach is particularly suitable for getting the essential graph, the standard unique graphical BN structure representative, as the solution of a simple additional ILP optimization problem. Note that this second ILP problem is, in fact, independent of the main ILP problem, that is, of the task of finding optimal BN structure and be utilized as an extension to other ILP methods for BN structure learning. Thus, the main practical benefit of the presented approach is that it allows one to get directly the essential graph as the result of solving a secondary ILP problem, that is, to avoid in this way the need for additional graph-reconstruction procedure. Motivated by the aim at practical solving of our main ILP optimization problem by the method of iterative constraint adding (see Section 6, specifically Section 6.2) we performed two series of computational experiments. The goal of the experiments was just to verify the ability to learn the BN structure, not to compete in running times with other, more developed, approaches. We fairly admit that, in our experiments, we have not succeeded to get better running times than reported by Cussens and Bartlett [32]. The experiments, mainly the second series, seem to confirm our opinion that the main reason for the comparative inefficiency of our approach is the large number of our non-acyclicity inequalities to handle. Although the number of these inequalities is theoretically polynomial in |N||N|, namely O(|N|3)O(|N|3), in the tested cases, it was much higher in comparison with simple linear number of non-negativity and equality constraints for the η-vector. Note that the exponential class of acyclicity cluster inequalities is the same for both approaches. On the other hand, the experiments confirmed that the secondary ILP procedure to find the essential graph on basis of the characteristic imset (see Section 5.2) is fast. Thus, provided we find an optimal acyclic directed graph by some other method, the LP relaxation proposed in this paper can be used to get by a simple ILP procedure the respective essential graph or an equivalent acyclic directed graph with a prescribed preference of arrow directions. One of the questions raised in Conclusions of the original PGM'12 conference paper [25] was whether, for the efficiency of learning, a smaller number of inequalities is better than a tighter LP relaxation. It seems that the experiments gave us a clear answer that to get an efficient ILP BN learning procedure one surely has to avoid large number of inequalities or come with a clever way to iteratively add them. Therefore, our future theoretical effort will be directed to finding an LP relaxation for characteristic imsets with a small number of inequalities. We hope there is a way of avoiding the exponential number of acyclicity inequalities, perhaps even by combining our approach with other ILP learning approaches. For example, Cussens et al. [8] recently came in the context of pedigree learning with a method of ruling out cycles based on |N||N| additional integral components to the η -vector and O(|N|2)O(|N|2) inequalities. An alternative research direction could be to apply the idea of iterative constraint adding to the collection of our non-acyclicity inequalities and find a way to handle the constraints by a specialized user cut detection algorithm.