Skip to the Main Content

Note:These pages make extensive use of the latest XHTML and CSS Standards. They ought to look great in any standards-compliant modern browser. Unfortunately, they will probably look horrible in older browsers, like Netscape 4.x and IE 4.x. Moreover, many posts use MathML, which is, currently only supported in Mozilla. My best suggestion (and you will thank me when surfing an ever-increasing number of sites on the web which have been crafted to use the new standards) is to upgrade to the latest version of your browser. If that's not possible, consider moving to the Standards-compliant and open-source Mozilla browser.

December 6, 2007

Progic V

Posted by David Corfield

I’ve come across something promissing for the Progic project. Apparently there is a way to complete the analogy:

propositional logic : predicate logic :: Bayesian networks: ?

The answer, it is claimed, is ‘probabilistic relational models’. Now before we take a look a them, it would be worth considering in what sense Bayesian networks are propositional.

And before that we need to know what a Bayesian network is. Well, first of all, it’s a directed acyclic graph (DAG), so a graph with directed arrows, and no cycles. Each vertex corresponds to a variable which may take any of a set of values, e.g., , AA taking values in {a i}\{a_i\}. A ‘parent’ of vertex AA is a vertex with an edge pointing from it to AA. Then the final ingredient for a Bayesian network is a conditional probability distribution for each vertex given its parents, Pr(a i|Pa(A) j)Pr(a_i | Pa(A)_j).

A Bayesian network with vertices, say, A,B,C,DA, B, C, D, gives a joint probability distibution over the variables. We know we can factorise any joint distribution as, say,

P(a,b,c,d)=P(d)P(c|d)P(b|c,d)P(a|b,c,d). P(a, b, c, d) = P(d) \cdot P(c | d) \cdot P(b | c, d) \cdot P(a | b, c, d).

This corresponds to a network with arrows from DD to AA, BB and CC, from CC to AA and BB, and from BB to AA. AA is childless.

Of course, we could permute the variables and have, say, DD the childless vertex. But what we hope happens is that there are some independencies between variables. So for instance, perhaps P(b|c,d)=P(b|d)P(b | c, d) = P(b | d) and P(a|b,c,d)=P(a|b,c)P(a | b, c, d) = P(a | b, c). Or in words, BB is independent of CC conditional on DD, and AA is independent of DD conditional on BB and CC. Then a DAG may be drawn to represent these independences, with arrows going from DD to BB and CC, and arrows converging from BB and CC to AA. The distribution is encoded far more efficiently if the DAG is sparse.

If Bayesian networks are to be thought of as ‘propositional’, you’d imagine it’s because each variable is an atomic proposition. But this suggests that each vertex ought to be binary valued. While there are binary Bayesian networks, there is no need for them to have this property. Even restricting to discrete variables, they may take values in a set of any countable cardinality. One natural response is to wonder why in propositional logic we restrict ourselves to pairs of propositions, {P,¬P}\{P, \neg P\}, precisely one of which is true. Don’t we encounter situations where we’re reasoning about an entity which may be in one of three states, say, a gear stick in drive, neutral or reverse? Why treat them as three separate atoms, only to add the premise that precisely one holds?

Another issue for us at the Café is that a Bayesian network seems to be very much like an arrow from the product of the parentless vertices to probability distributions over the product of the childless vertices, an arrow in the Kleisli category. This suggests that if we have two networks, the childless vertices of the first matching the parentless vertices of the second, then they could be composed. So with a pair of simple graphs, CBC \to B and BAB \to A, we would have:

P(a i|c k)= b jP(a i|b j)P(b j|c k). P(a_i | c_k) = \sum_{b_j}P(a_i | b_j) \cdot P(b_j | c_k).

What doesn’t quite fit however is that a Bayesian network represents a distribution over the product of all the node variables. In the case above we ought to have composed a joint distribution over CC and BB with one over BB and AA resulting in one over C,BC, B and AA.

Maybe the issue is that there’s an invisible 1, out of which arrows feed into the parentless vertices, giving an (unconditional) distribution over them. If so, one ought not to compose as I suggested as the domain and codomain would not match.

If we include these unwritten arrows, e.g., 1CB1 \to C \to B as a network representing a joint distribution over BB and CC, the final node doesn’t mention CC. So we might think to draw an edge out of the parent CC to a copy of itself at the bottom (right, here) of the network, with obvious distribution P(c i|c j)=δ ijP(c_i | c_j) = \delta_{ij}. Then we’d have a network 1CB×C1 \to C \to B \times C.

Something a bit odd is going on here. Should we be after multispans, rather than arrows in a monoidal category?

Posted at December 6, 2007 12:16 PM UTC

TrackBack URL for this Entry:   https://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/1524

7 Comments & 0 Trackbacks

Re: Progic V

I guess an important part of what’s going on here is that there is no category with sets as objects, and morphisms joint probability distributions over the product of domain and codomain.

In a category like Rel we can compose R:ABR: A \to B and S:BCS: B \to C to give a relation SR:ACS \cdot R: A \to C. But given distributions over A×BA \times B and B×CB \times C, there’s no obvious composition. It’s only when we have conditional distributions P(B|A)P(B | A) and P(C|B)P(C | B) that we can compose, in the Kleislian way.

So maybe the graphical representation of the Bayesian network is rather misleading.

Posted by: David Corfield on December 7, 2007 10:11 AM | Permalink | Reply to this

Re: Progic V

Would we ever represent a relation between, say, sets A,B,C,andDA, B, C, and D, as the subset of DD featuring in the relation, followed by for each dd the subset of CC, for each dd and cc the subset of BB, and for each d,c,andbd, c, and b the subset of AA?

Posted by: David Corfield on December 14, 2007 9:24 AM | Permalink | Reply to this

Re: Progic V

Ooh, Coecke and Spekkens have brought their category theoretical picture calculus to bear on Bayesian inference here.

Posted by: David Corfield on February 14, 2011 8:33 PM | Permalink | Reply to this

Re: Progic V

My, don’t things grow up! See what they can do with Bayesian networks and category theory these days:

  • Tobias Fritz, Andreas Klingler, The d-separation criterion in Categorical Probability (arXiv:2207.05740)
Posted by: David Corfield on July 22, 2022 9:50 AM | Permalink | Reply to this

Re: Progic V

It’s quite intriguing to be reading this post ~15 years later. David wrote in 2007:

If we include these unwritten arrows, e.g., 1CB1 \to C \to B as a network representing a joint distribution over BB and CC, the final node doesn’t mention CC. So we might think to draw an edge out of the parent CC to a copy of itself at the bottom (right, here) of the network, with obvious distribution P(c i|c j)=δ ijP(c_i|c_j)=\delta_{ij}. Then we’d have a network 1CB×C1 \to C \to B \times C.

That’s actually exactly how it works! It’s one of the many uses of the “copy” morphisms that seem to be so ubiquitous in probability theory. For Bayesian networks, this idea isn’t ours, but has already been implemented in Brendan Fong’s 2012 MSc thesis.

Posted by: Tobias Fritz on July 26, 2022 10:56 AM | Permalink | Reply to this

Re: Progic V

Yes, a pleasing anticipation. Applied category theory before its time.

By the way, given your interest both in dd-separation, and so causality, and in De Finetti’s theorem done category-theoretically, how about a category-theoretic treatment of Causal de Finetti?

Posted by: David Corfield on July 26, 2022 12:01 PM | Permalink | Reply to this

Re: Progic V

Thank you for the relevant pointer – I didn’t know about that paper. I wonder how it’s related to A generalization of hierarchical exchangeability on trees to directed acyclic graphs. In any case, this is indeed one of the directions that we have in mind, and on which my coauthor Tomáš Gonda has some partial results.

Posted by: Tobias Fritz on July 26, 2022 12:54 PM | Permalink | Reply to this

Post a New Comment