## February 7, 2007

### Category Theoretic Probability Theory

#### Posted by David Corfield

Having noticed (e.g., here and here) that what I do in my day job (statistical learning theory) has much to do with my hobby (things discussed here), I ought to be thinking about probability theory in category theoretic terms. What would seem to be the most promising approach is described by Prakash Panangaden in Probabilistic Relations.

The category SRel (stochastic relations) has as objects sets equipped with a $\sigma$-field. Morphisms are conditional probability densities or stochastic kernels. So, a morphism from $( X, \Sigma_X)$ to $( Y, \Sigma_Y)$ is a function $h: X \times \Sigma_Y \to [0, 1]$ such that

1. $\forall B \in \Sigma_Y . \lambda x \in X . h(x, B)$ is a bounded measurable function,
2. $\forall x \in X . \lambda B \in \Sigma_Y . h(x, B)$ is a subprobability measure on $\Sigma_Y$.

If $k$ is a morphism from $Y$ to $Z$, then $k \cdot h$ from $X$ to $Z$ is defined as $(k \cdot h)(x, C) = \int_Y k(y, C)h(x, d y)$.

Apparently this is based on work by Michele Giry, which in turn was based on earlier work by Lawvere. This definition differs from Giry’s in the second clause where subprobability measures are allowed, rather than ordinary probability measures.

Panangaden notes that something very similar to the way that the category of relations can be constructed from the powerset functor is at stake. Just as the category of relations is the Kleisli category of the powerset functor over the category of sets, SRel is the Kleisli category of the functor over the category of measurable spaces and measurable functions which sends a measurable space, $X$, to the measurable space of subprobability measures on $X$. This functor gives rise to a monad.

Now I’d like to know

1. What is gained by the move from probability measures to subprobability measures?
2. How to put this work into relation with the fact that there is a natural choice of metric with which to give a riemannian geometry to a space of probability distributions, and also to a space of conditional distributions (see section 6, p. 39, of this).
Posted at February 7, 2007 11:50 AM UTC

TrackBack URL for this Entry:   http://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/1152

### Re: Category Theoretic Probability Theory

Can you tell us what a subprobability measure is? It doesn’t seem to be mentioned in Panangaden’s paper.

Also, you might like to take a look at Alex Simpson’s very interesting talk notes, Probabilistic Observations and Valuations.

Posted by: Tom Leinster on February 7, 2007 3:58 PM | Permalink | Reply to this

### Re: Category Theoretic Probability Theory

A subprobability measure on a space is allowed to take a value less than or equal to 1 on the whole space.

I’m not sure why they do this rather than Giry’s original construction. One motivation seems to be to model probabilistic processes from $X$ to a coproduct $X + Y$. Then you can iterate to form a process which looks to see where in Y you eventually end up. This relates to SRel being traced.

I don’t know if there’s a strictly probabilistic reason, though. Might be worth checking what the dual of the Giry version is, to correspond to the duality between SRel and what they call stochastic predicate transformations.

What can be said about enrichment of categories and Riemannian structures?

Posted by: David Corfield on February 7, 2007 4:23 PM | Permalink | Reply to this

### Re: Category Theoretic Probability Theory

There’s a monad on Measure Spaces, $1 + -$: Mes $\to$ Mes. A probability measure on $1 + X$ is a subprobability measure on $X$. Panangaden’s monad is a composite of Giry’s and $1 + -$.

I guess, to answer my own question, the opposite of the Kleisli category of Giry’s monad has as morphisms $X \to Y$, linear maps from bounded functions on $X$ to bounded functions on $Y$, which send the characteristic function on $X$ to the characteristic function on $Y$.

Posted by: David Corfield on February 8, 2007 1:05 PM | Permalink | Reply to this

### Re: Category Theoretic Probability Theory

David Mumford (yes, that one) has an interesting paper about refounding mathematics using probability measures rather than sets. (Experts will surely complain about my phrasing there; see the paper for yourself.) Apparently it settles the Continuum Hypothesis and the Axiom of Choice.

It’s been a while since I read it but I don’t recall it being particularly categorical.

Posted by: Allen Knutson on February 7, 2007 4:08 PM | Permalink | Reply to this

### Re: Category Theoretic Probability Theory

Another, and IMHO more promising, approach to fundations is Girard’s, who shows that sets are level -1, categories are level -2 and operator algebras are level -3, the last level. See
http://iml.univ-mrs.fr/~girard/truth.pdf

Posted by: Ricatego on February 8, 2007 12:23 PM | Permalink | Reply to this

### Re: Category Theoretic Probability Theory

Another, and IMHO more promising, approach to fundations is Girard’s, who shows that sets are level -1, categories are level -2 and operator algebras are level -3, the last level.

This is very intersting! But why should we expect stop there? Girard’s level -3 is bicategorial, and already here is much to learn. But eventually we will go further; ultimately, the foundations of mathematics should be explicitly ∞categorial.

Of course, this direction is opposite to the usual idea of foundations (which is why Girard is using negative numbers), but it is what we need to understand now.

Posted by: Toby Bartels on February 9, 2007 5:30 AM | Permalink | Reply to this

### Re: Category Theoretic Probability Theory

Another, and IMHO more promising, approach to fundations is Girard’s #, who shows that sets are level -1, categories are level -2 and operator algebras are level -3, the last level.

This is very intersting! But why should we expect stop there? Girard’s level -3 is bicategorial, […]

Ah! That helps me understand the former statement.

So Girard is talking about the bicategory of (operator) algebras? Objects algebras, morphisms algebra morphisms (or maybe more genrally bimodules) and 2-morphisms the obvious 2-morphisms?

(I did briefly look at the first link that Ricatego provided, but didn’t see any operator algebras discussed.)

Posted by: urs on February 9, 2007 9:30 AM | Permalink | Reply to this

### Re: Category Theoretic Probability Theory

(I did briefly look at the first link that Ricatego provided, but didn’t see any operator algebras discussed.)

I don’t see them discussed in the second link either. (The problem here is that the first link is too brief, and the second too complete. We need a third link which will be just right!) I think that the reference to ‘operator algebras’ is just a vague sense of the idea, with any actual connection a bit convoluted (operator algebras — C* algebras — Hilbert spaces — quantum logic — linear logic — level -3).

Girard promotes linear logic. So he says: * Level -1: classical logic; * Level -2: intuitionistic logic; * Level -3: linear logic.

The models of these are: * Level -1: Boolean algebras; * Level -2: Cartesian closed categories; * Level -3: linear bicategories.

As Boolean algebras are, of course, simply structured sets, that is why Girard’s levels are: * Level -1: sets; * Level -2: categories; * Level -3: bicategories.

(I say this even though Girard himself seems to find bicategorial models of little use for linear logic, unlike CCCs, which he uses a lot in his discussion of intuitionistic logic.)

I don’t know what comes next —Girard certainly seems to regard linear logic as overcoming all deficiencies of classical and intuitionistic logic, so he states there is no level below -3— but I’m sure that we’ll want it eventually.

Posted by: Toby Bartels on February 12, 2007 3:13 AM | Permalink | Reply to this

### Re: Category Theoretic Probability Theory

I wrote (in part):

The models of [linear logic] are […] bicategories.

A reference (which I have not yet read): http://www.math.mcgill.ca/rags/bicats/bicat.*.

Posted by: Toby Bartels on February 12, 2007 3:21 AM | Permalink | Reply to this

### Re: Category Theoretic Probability Theory

I think I heard Mumford give a talk about that at Berkeley. He was referring to Chris Freiling’s 1986 paper on CH and throwing darts. The negation of CH is equivalent to his axiom that for every function from $\mathbb{R}$ to the countable subsets of $\mathbb{R}$, there exist $x$ and $y$ such that $x\not\in f(y)$ and $y\not\in f(x)$. He motivates this axiom by a probabilistic intuition, that if we were to choose $x$ and $y$ independently at random, then $x$ would have probability 0 of being in $f(y)$ and $y$ would have probability 0 of being in $f(x)$. He uses a similar move to motivate axioms requiring the continuum to be at least $\aleph_\omega$.

I think set theorists largely agree with his conclusions, but think that Freiling’s motivations aren’t as good as many other motivations.

Admittedly, I have no idea how Mumford incorporates Freiling’s stuff here.

Posted by: Kenny Easwaran on February 13, 2007 11:42 PM | Permalink | Reply to this

### Re: Category Theoretic Probability Theory

Kleisli Morphisms and Randomized Congruences for the Giry Monad by E.-E. Doberkat

Posted by: David Corfield on February 7, 2007 5:04 PM | Permalink | Reply to this

### Re: Category Theoretic Probability Theory

For the CS folk out there, Avi Pfeffer and Norman Ramsey make nice application of probability monads in their Stochastic lambda calculus paper. Pfeffer has a full-fledged probabilistic modelling language, IBAL, based on the calculus.

Another implementation, in which the monads are used directly (rather than implicitly, as in IBAL) is by Martin Erwig in his Probabilistic Functional Programming project (in Haskell). What I like about Erwig’s implementation is that you can take the same monadic description of a model, and use it in two different semantic contexts, one corresponding to direct Bayesian updates, the other corresponding to sampling.

This degree of reuse is rarely seen in statistical software, and reminds me of a point made by John Langford in his blog that Bayesianism, seen as an extension of logic, naturally interpolates from the statistical setting all the way through to pure software engineering/computer programming. This point seemed very deep to me when I first read it — not n-categorically deep maybe, but enough for me!

Posted by: Allan E on February 8, 2007 5:19 AM | Permalink | Reply to this

### Re: Category Theoretic Probability Theory

I am unfortunately very much out of my mathematical depth here, but convinced there is something in it.

Here is a relevant old reference:

Statistical Isomorphism.
Norman Morse. Richard Sacksteder. The Annals of Mathematical Statistics, Vol. 37, No. 1, 203-214. Feb., 1966.

I myself built on this (and related work by Le Cam) in:
Dawid, A. P. (1980). Conditional independence for statistical operations. Ann. Statist. 8, 598–617.

Another important book-length work is:

Statistical decision rules and optimal inference. N.N.Čencov (Chentsov).
Amer. Math. Soc. (1982) (Translated from Russian: 1972 - Nauka, Moscow).

In particular Chentsov develops information geometry from a category-theoretic angle.

Posted by: Philip Dawid on February 11, 2007 6:02 PM | Permalink | Reply to this
Read the post Category Theoretic Probability Theory II
Weblog: The n-Category Café
Excerpt: Encouraged by the comment of a statistician as eminent as Phil Dawid, I shall continue with what category theory has to say about probability theory. Recall that we were considering a monad on the category of measurable spaces, Meas....
Tracked: February 13, 2007 6:58 PM
Read the post Category Theory in Machine Learning
Weblog: The n-Category Café
Excerpt: Does category theory have a future in machine learning?
Tracked: September 5, 2007 4:04 PM
Read the post Girard on the Limitations of Categories
Weblog: The n-Category Café
Excerpt: Girard on categories
Tracked: July 23, 2008 9:55 AM
Read the post Coalgebraic Modal Logic
Weblog: The n-Category Café
Excerpt: Putting together ideas of modal logic and coalgebra with ways of modelling first-order theories.
Tracked: September 7, 2009 12:52 PM

### Re: Category Theoretic Probability Theory

Those reading about this topic may also be interested in: N. N. Cencov. Statistical Decisions Rules and Optimal Inference. Translations of Mathematical Monographs. Volume 53. American Mathematical Society. 1982.

Cencov’s “category of statistical decisions” coincides with Giry’s (Lawvere’s) category. I have the sense that Cencov discovered this category independently of Lawvere although years later.

Posted by: Ralph Wojtowicz on December 23, 2009 4:04 AM | Permalink | Reply to this

### Re: Category Theoretic Probability Theory

Those reading this topic may be interested in: N. N. Cencov. Statistical Decisions Rules and Optimal Inference. Translations of Mathematical Monographs. Volume 53. American Mathematical Society. 1982.

Cencov’s “category of statistical decisions” coincides with the category of Giry and Lawvere. I believe that Cencov discovered it independently although many years after Lawvere.

Posted by: Ralph Wojtowicz on December 23, 2009 4:10 AM | Permalink | Reply to this

Post a New Comment