This paper examines concepts of independence for full conditional probabilities; that is, for set-functions that encode conditional probabilities as primary objects, and that allow conditioning on events of probability zero. Full conditional probabilities have been used in economics, philosophy, statistics, and artificial intelligence. This paper characterizes the structure of full conditional probabilities under various concepts of independence; limitations of existing concepts are examined with respect to the theory of Bayesian networks. The concept of layer independence (factorization across layers) is introduced; this seems to be the first concept of independence for full conditional probabilities that satisfies the graphoid properties of Symmetry, Redundancy, Decomposition, Weak Union, and Contraction. A theory of Bayesian networks is proposed where full conditional probabilities are encoded using infinitesimals, with a brief discussion of hyperreal full conditional probabilities.
A standard probability measure is a real-valued, non-negative, countably additive set-function such that the possibility space gets probability 1. In fact, if the space is finite, as we assume in this paper, there is no need to be concerned with countable additivity, and one deals only with finite additivity. In standard probability theory, the primitive concept is the “unconditional” probability P(A) of an event A; from this concept one defines the conditional probability P(A|B) of event A given event B as the ratio P(A∩B)/P(B). This definition, however, applies only if P(B)>0; otherwise, the conditional probability P(A|B) is left undefined.
A full conditional probability is a real-valued, non-negative set-function, but now the primitive concept is the conditional probability P(A|B) for event A given event B. This quantity is only restricted by the relationship P(A∩B)=P(A|B)P(B). Note that P(A|B) is a well-defined quantity even if P(B)=0.
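To make the contrast with standard conditioning concrete, the following is a minimal Python sketch (our own illustration, not a construction from this paper) of a full conditional probability on a finite space. It uses a standard layer-style construction: a sequence of probability measures with disjoint supports, where P(A|B) is computed in the first layer that assigns positive probability to B.

```python
# Minimal sketch of a full conditional probability on a finite space,
# built from a sequence of "layers" (a lexicographic-style construction;
# the names here are illustrative, not taken from the paper).

def make_full_conditional(layers):
    """layers: list of dicts mapping outcomes to probabilities, each
    summing to 1, with disjoint supports that together cover the space."""
    def P(A, B):
        # Use the first (lowest) layer that assigns positive mass to B.
        for layer in layers:
            pB = sum(p for w, p in layer.items() if w in B)
            if pB > 0:
                pAB = sum(p for w, p in layer.items() if w in A and w in B)
                return pAB / pB
        raise ValueError("conditioning on the empty event")
    return P

# Omega = {1, 2, 3}; outcome 1 carries all standard probability mass.
layers = [{1: 1.0}, {2: 0.25, 3: 0.75}]
P = make_full_conditional(layers)
Omega = {1, 2, 3}

print(P({1}, Omega))     # 1.0  (standard probability of {1})
print(P({2, 3}, Omega))  # 0.0  ({2, 3} is a null event)
print(P({2}, {2, 3}))    # 0.25 (well defined even though P({2,3}) = 0)
# The defining constraint P(A∩B) = P(A|B) P(B) holds:
assert P({2} & {2, 3}, Omega) == P({2}, {2, 3}) * P({2, 3}, Omega)
```

In standard probability theory the quantity P({2}|{2,3}) would simply be undefined, since {2,3} has probability zero; here it is a first-class value.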
Full conditional probabilities offer an alternative to standard probabilities that has found applications in economics [6], [7], [8] and [35], in decision theory [26] and [45], in statistics [9] and [40], in philosophy [24] and [33], and in artificial intelligence, particularly in dealing with default reasoning [1], [11], [13], [15], [23] and [30]. Applications in statistics and artificial intelligence are usually connected with the theory of coherent probabilities; indeed, a set of probability assessments is said to be coherent if and only if the assessments can be extended to a full conditional probability on some suitable space [19], [28], [39] and [45]. Full conditional probabilities are related to other uncertainty representations, such as lexicographic probabilities [7] and [30] and hyperreal probabilities [25] and [27].
In this paper we study concepts of independence applied to full conditional probabilities. We characterize the structure of joint full conditional probabilities when various judgments of independence are enforced. We examine difficulties caused by failure of some graphoid properties and by non-uniqueness of joint probabilities under judgments of independence. We discuss such difficulties within the usual theory of Bayesian networks [38].
We then propose the concept of layer independence as it satisfies the graphoid properties of Symmetry, Redundancy, Decomposition, Weak Union, and Contraction. We also propose a theory of Bayesian networks that accommodates full conditional probabilities by resorting to infinitesimals, and comment on a theory of hyperreal full conditional probabilities.
This paper should be relevant to researchers concerned with full conditional probabilities and their applications, for instance in game theory and default reasoning, and also to anyone interested in uncertainty modeling where conditional probabilities are the primary object of interest. The paper is organized as follows. Section 2 reviews the necessary background on full conditional probabilities. Section 3 characterizes the structure of full conditional probabilities under various judgments of independence. Section 4 introduces layer factorization, defines layer independence, and analyzes its graphoid properties. Section 5 examines the challenges posed by failure of graphoid properties and non-uniqueness, paying special attention to the theory of Bayesian networks. We suggest a strategy to specify joint full conditional probabilities through Bayesian networks, by resorting to infinitesimals. Section 6 offers brief remarks on a theory of hyperreal full conditional probabilities.
We have studied concepts of independence for full conditional probabilities, and the construction of joint full distributions from marginal and conditional ones using judgments of independence. We have derived the structure of joint full conditional probabilities under epistemic/h-/full independence, and examined the semi-graphoid properties of these (and other) concepts of independence. We have introduced the condition of layer factorization; the derived concept of layer independence is particularly interesting because it satisfies all semi-graphoid properties.
We have also examined non-uniqueness of joint full conditional probabilities under various concepts of independence. We suggested a specification strategy that adapts the theory of Bayesian networks to full conditional probabilities, by parameterizing probability values with an infinitesimal ϵ. We closed by commenting on a theory of hyperreal full conditional probabilities.
Our proposal concerning modeling tools, such as Bayesian networks, can be summarized as follows. Whenever a modeling tool, originally built for standard probability measures, is to be used to specify full conditional probabilities, the most effective way to do so is to extend the tool into the hyperreal line, so that the specification of probability values deals only with positive values. Instead of trying to completely change the semantics of modeling tools so as to cope with failures of graphoid properties and of uniqueness, it is better to view these modeling tools as devices that specify approximating sequences. Full conditional probabilities are then obtained in the limit, and there are no concerns about non-uniqueness.
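The limiting strategy above can be sketched in a few lines of Python. In this illustration (ours, with illustrative names; the paper's formal construction may differ), each probability value is a polynomial in a positive infinitesimal ϵ, represented as a dict mapping powers to coefficients, and conditional probabilities are recovered as limits of ratios when ϵ → 0⁺.

```python
# Sketch of the "infinitesimal" specification strategy: all probability
# masses are positive polynomials in a small eps > 0 (stored as
# {power: coefficient} dicts), and conditional probabilities are obtained
# as limits when eps -> 0+. An illustration, not the paper's formal theory.

def poly_sum(polys):
    """Sum polynomials in eps, dropping terms with zero coefficient."""
    out = {}
    for poly in polys:
        for k, c in poly.items():
            out[k] = out.get(k, 0) + c
    return {k: c for k, c in out.items() if c != 0}

def limit_ratio(num, den):
    """Limit of num(eps)/den(eps) as eps -> 0+. Since A∩B ⊆ B, the
    numerator's order is never lower than the denominator's, so the
    limit is 0 or the ratio of the lowest-order coefficients."""
    if not num:
        return 0.0
    n_ord, d_ord = min(num), min(den)
    if n_ord > d_ord:
        return 0.0                   # numerator vanishes faster
    return num[n_ord] / den[d_ord]   # same order: ratio of coefficients

# Outcome masses: P(1) = 1 - eps - eps^2, P(2) = eps, P(3) = eps^2.
mass = {1: {0: 1, 1: -1, 2: -1}, 2: {1: 1}, 3: {2: 1}}

def P(A, B):
    num = poly_sum([mass[w] for w in A & B])
    den = poly_sum([mass[w] for w in B])
    return limit_ratio(num, den)

Omega = {1, 2, 3}
print(P({2, 3}, Omega))  # 0.0: a null event in the limit
print(P({2}, {2, 3}))    # 1.0: still well defined as P({2,3}) -> 0
print(P({3}, {2, 3}))    # 0.0
```

Because every mass is strictly positive for small ϵ > 0, all conditional probabilities are specified by ordinary ratios before the limit is taken, and the limit then yields one definite full conditional probability; this is the sense in which the infinitesimal parameterization removes concerns about non-uniqueness.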