## July 19, 2022

### Probability Monads as Codensity Monads

#### Posted by Tom Leinster

My PhD student Ruben Van Belle has just published his first paper!

Ruben Van Belle, Probability monads as codensity monads. Theory and Applications of Categories 38 (2022), 811–842.

It’s a treasure trove of theorems demonstrating how many of the monads loosely referred to as “probability monads” arise as codensity monads in a certain uniform manner, which I’ll tell you about now.

A probability monad is a monad that takes a space $X$ as input and produces as output the space of probability measures on $X$. It’s an imprecise term, but there are a number of monads that are universally agreed to count as probability monads, of which the Giry monad is the best known. There are also some boundary cases: should the ultrafilter monad count as a probability monad? (An ultrafilter is a kind of primitive probability measure.)

The codensity monad of a functor $G: \mathbf{B} \to \mathbf{A}$ is a certain canonical monad on $\mathbf{A}$. If $G$ has a left adjoint $F$, then the codensity monad of $G$ is the composite $G F$. But even if $G$ doesn’t have a left adjoint, its codensity monad is often well-defined. By definition, it’s the right Kan extension of $G$ along itself. A good non-trivial example: the codensity monad of the inclusion $\mathbf{FinSet} \hookrightarrow \mathbf{Set}$ is the ultrafilter monad.

Ruben’s insight was as follows. There is something fundamentally countable about probability theory. For instance, probability is additive on countable families of mutually exclusive events, but not general families. You can define a probability measure on a countable set $X$ as an $X$-indexed family of nonnegative reals summing to $1$, but that definition doesn’t work when $X$ is larger. And — more nebulously — although probability theory very often involves uncountable spaces such as $\mathbb{R}^n$ and uncountable families of random variables (arising in stochastic processes), there is a feeling that the heart of probability theory remains countable, that the extension from countable to uncountable infinities is somehow automatic.

So, take some category of spaces $\mathbf{A}$ and a probability monad $T$ on it. Let $\mathbf{A}_c$ be the subcategory of $\mathbf{A}$ consisting of just the countable spaces. Then you can restrict $T$ to $\mathbf{A}_c$ to get a functor $G: \mathbf{A}_c \to \mathbf{A}$. If the idea expounded in the last paragraph is correct, then it should be possible to reconstruct $T$ from $G$ (its restriction to countable spaces) in some automatic way. The substance of the definition of $T$ should be in the countable part $\mathbf{A}_c$ of $\mathbf{A}$; everything else should be fluff.

Ruben shows that this is indeed the case. And as you’ll probably have guessed, the way that $T$ is reconstructed from $G$ is as its codensity monad. That is, in many cases:

A probability monad is the codensity monad of its restriction to countable spaces.

Sometimes it’s even simpler: you can get away with just finite spaces. Sometimes there are some complicating wrinkles. But that’s the main idea.

Here are four theorems illustrating this principle, all from Ruben’s paper.

• Let $\mathbf{A}$ be the category of measurable spaces and $T$ the Giry monad on $\mathbf{A}$, which assigns to each space $X$ the space $T(X)$ of probability measures on it. Then $T$ is the codensity monad of the functor from countable sets to $\mathbf{A}$ that assigns to each countable set the space of probability measures on it.

This theorem embodies the idea that once you have the definition of probability measure on a countable set, the definition for general measurable spaces is automatic.

• Let $\mathbf{A}$ be the category of compact Hausdorff spaces and $T$ the Radon monad on $\mathbf{A}$, which assigns to each space $X$ the space $T(X)$ of Radon probability measures on it. (A Radon measure is a suitably nice measure on the Borel sets, i.e. the smallest $\sigma$-algebra containing the open sets.) Then $T$ is the codensity monad of the functor $G$ from finite sets to $\mathbf{A}$ that assigns to each finite set $X$ the space of probability measures on it.

So for compact Hausdorff spaces, once you’ve got the definition of probability measure on finite sets, the general definition of Radon probability measure follows automatically. A striking feature of this theorem was that although the input functor $G$ was entirely finite in nature, the output monad involves countable additivity. We got that for free.

• Let $\mathbf{A}$ be the category of compact metric spaces and distance-decreasing maps. Let $T$ be the “bounded Lipschitz monad” on $\mathbf{A}$. It assigns to each space $X$ the space $T(X)$ of Radon probability measures on it, with a certain metric called the “bounded Lipschitz metric”. This is closely related to the Wasserstein/Kantorovich metric, and you can find the definition on page 829 of Ruben’s paper. And again, $T$ is the codensity monad of its restriction to finite sets.

• Finally, let $\mathbf{A}$ be the category of Hausdorff spaces. There is a probability monad $T$ on $\mathbf{A}$ that Ruben dubs the Baire monad, assigning to each space $X$ the space of “Baire probability measures” on $X$. I won’t give the definition (see page 825 if you want it), but the point is that the general notion of Baire probability measure reduces to the usual notion of probability measure when $X$ is metric or compact Hausdorff, but still behaves well on the category of Hausdorff spaces as a whole.

In this case, the result is that $T$ is the codensity monad of its restriction to a certain category of countable discrete spaces (i.e. countable sets). It’s not the full subcategory; only certain maps are included. But I’ll leave Ruben to explain that; it’s in section 6 of his paper.

All of this provokes a categorical question. Given a monad $T$ on a category $\mathbf{A}$ and a subcategory $\mathbf{B}$ of $\mathbf{A}$, when is it the case that $T$ is the codensity monad of its restriction to $\mathbf{B}$? Or more generally, given a monad $T$ on $\mathbf{A}$ and a functor $G: \mathbf{B} \to \mathbf{A}$, when is it the case that $T$ is the codensity monad of the composite functor

$\mathbf{B} \stackrel{G}{\to} \mathbf{A} \stackrel{T}{\to} \mathbf{A}?$

We know the answer when $T$ is the identity monad. For in that case, the question is asking “for which functors $G$ is the codensity monad of $G$ the identity?”, and the answer is “when $G$ is codense”. It’s not too hard to show that in general, $T$ is the codensity monad of $T G$ if and only if the composite functor

$\mathbf{B} \stackrel{G}{\to} \mathbf{A} \to \mathbf{A}_T$

is codense, where $\mathbf{A} \to \mathbf{A}_T$ is the free functor into the Kleisli category of $T$. I’m sure Ruben and I aren’t the first to come across this fact, but can anyone point to where it appears in the literature? And does anyone know other applications of it?

Posted at July 19, 2022 9:47 AM UTC

TrackBack URL for this Entry:   https://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/3408

## 8 Comments & 0 Trackbacks

### Re: Probability Monads as Codensity Monads

Well done, Ruben!

Is there some abstract way to relate this codensity account to the graded monad account of Fritz and Perrone, as described here at nLab: graded monad?

…taking the left Kan extension of the graded list monad $\mathbf{B} \mathbb{N} \to [Set,Set]$ described above results in the usual list monad on $Set$, given by a lax monoidal functor $1 \to [Set,Set]$. Based on a similar construction on the category of complete metric spaces, (Fritz & Perrone 17) have contructed a monad of Radon probability measures without any appeal to measure theory; the intuitive idea being that a probability measure can be thought of as an idealized version of a finite sample, and spaces of finite samples make up a graded monad. Forgetting the grading by taking the above Kan extension then produces the Kantorovich monad, containing all Radon probability measures of finite first moment. This construction reduces certain problems in measure-theoretic probability to purely combinatorial problems.

Posted by: David Corfield on July 19, 2022 1:40 PM | Permalink | Reply to this

### Re: Probability Monads as Codensity Monads

Thank you for your comment.

I don’t immediately see a way to relate these two constructions. In the construction of the Kantorovich monad of Fritz and Perrone, the space of measures is constructed as a colimit of finitely supported measures and relies on using complete metric spaces. In the codensity construction the space of measures is formed as a limit of countably supported measures, but doesn’t need completeness of the spaces.

Posted by: Ruben Van Belle on July 25, 2022 11:46 AM | Permalink | Reply to this

### Re: Probability Monads as Codensity Monads

In the case of the inclusion $\iota:\mathbf{Meas}_f \rightarrow \mathbf{Meas}$ the right Kan extension $Ran_{\iota}(\mathcal{G} \circ\iota)$ is, as you know, is just the natural transformation $\mu:\mathcal{G}^2 \Rightarrow \mathcal{G}$ restricted to the subcategory $\mathbf{Meas}_f$. Tacking on another $\mathcal{G}$ to $\iota$ doesn’t change anything - the universal arrow $Ran_{\iota}(\mathcal{G} \circ \iota) \Rightarrow \iota$ still factors through the multiplication natural tranformation $\mu$. I don’t think replacing the inclusion map $\iota$ by $G$ changes the answer - does that universal arrow factor through $\mu$?

I agree that the essence of the Giry monad (and integration) is countability which is the beauty of what Ruben’s result implies. That is because the fundamental Giry algebra is the measurable function $\epsilon_{N}: \mathcal{G}(N) \rightarrow N$, where $N$ is the set of natural numbers with the powerset $\sigma$-algebra, specified by $\sum_{i \in N} p_i \delta_i \mapsto min_i \{i | p_i \gt 0\}$ which is a countably affine function. When I say it is the fundamental $\sigma$-algebra I am saying the two object $N$ is, if you restrict to standard measurable spaces which have a $\sigma$-algebra specified by $\sigma(F)$ where $F$ is a field that has a countably basis, then the subcategory consisting of $N$ and all monotonic functions is codense in standard measurable spaces. (It’s hard to say anything about $\mathbf{Meas}$ because getting a codense functor into $\mathbf{Meas}$ is nontrivial.)

Moreover, both those objects $\mathcal{G}(N)$ and $N$ have a super convex space structure, and the full subcategory of super convex space specified by the single object $\mathcal{G}(N)$ is dense in super convex spaces. (That is actually trivial because super convex spaces are defined using $\mathcal{G}(N)$. (See the nLab page.)) To find all the algebras requires dropping down several layers into the category of super convex spaces - one first seemingly has to drop down to the subcategory where those two objects, $N$ and $\mathcal{G}(N)$, viewed as super convex spaces, are adequate (dense and codense).

OK. My apologies for the long comment. But I found Ruben’s work very exciting also.

Posted by: Kirk Sturtz on July 19, 2022 2:02 PM | Permalink | Reply to this

### Re: Probability Monads as Codensity Monads

Thank you for your comment.

Yes, extending $G$ along the inclusion of countable measurable spaces in measurable spaces would also give the Giry monad. This is essentially because the measurable maps $X\to A$, for some measurable space $X$ and a countable set $A$, contain enough information about the measurable space. However, for topological spaces this is not the case anymore, since continuous maps $X\to A$ only detect clopen subsets of $X$.

Instead of using a subcategory of countable spaces, we could also use a subcategory with $\mathbb{N}$ as the only object and the codensity construction would still work. This implies that $\mathbb{N}$ is codense in the Kleisli category of the probability monad. In the case of the Radon monad the full category with 3, the discrete three element space, as only object would even be enough and therefore this object is codense in the Kleisli category of the Radon monad.

I would be interested to hear more about the condition that $\mathbb{N}$ and $\mathcal{G}(\mathbb{N})$ need to be dense and codense in the category of algebras of the Giry monad. Could you explain this more?

Posted by: Ruben Van Belle on July 25, 2022 12:42 PM | Permalink | Reply to this

### Re: Probability Monads as Codensity Monads

Ah, I like your explanation. I should be thinking in terms of a codense functor into the Kleisi category to get the codensity monad $\mathbb{G}$. I wasn’t aware that the discrete space 3 was codense for the Radon monad. Nice!

The short answer to why the inclusion functor $\iota$ of the full subcategory of super convex spaces, $\mathbf{SCvx}$, consisting of the two objects is $\mathbb{N}$ and $\Delta_{\mathbb{N}}$ and denoted $\Omega$, needs to be codense is because to be able to factorize the Giry monad through a subcategory of $\mathbf{SCvx}$, we need the right Kan extension of $j \circ \mathbf{\Sigma}': \Omega \rightarrow \mathbf{Std}_2 \hookrightarrow \mathbf{Std}$, where $\mathbf{\Sigma}': \Omega \rightarrow \mathbf{Std}_2$ just views those two objects as measurable spaces, along $\iota$ to give us the underlying set of a super convex space endowed with a $\sigma$-algebra (the initial $\sigma$-algebra) such that all the countably affine maps into those objects become measurable. If we only require countably affine maps into $\mathbb{N}$ to be measurable then $\Delta_{\mathbb{N}}$ collapses to the measurable space which is isomorphic to $\mathbb{N}$, and hence the counit $\epsilon:\mathbf{\Sigma} \circ \mathcal{P} \Rightarrow \mathbf{1}$ for $\Delta_{\mathbb{N}}$ will not yield a barycenter map. (The right Kan extension can also be characterized in terms of ideals - see the nLab page for super convex spaces to get the basic idea.)

The reason we need the algebras for $\mathbf{Std}$ is because there the algebras need only be measurable - as opposed to simply continuous which one gets with the Radon monad, and any probability monad on a topological space. For applications, such as quantum computation the discrete space (and their quotients) such as $\epsilon_2: \mathcal{G}(\mathbf{2}) \rightarrow \mathbf{2}$ are necessary. The recent research by Sam Staton and Ned Summers addressed the issue of quantum computation using locally convex compact Hausdorff spaces for which $\epsilon_2$ is not an algebra. But I suspect that locally convex compact Hausdorff spaces are in fact super convex spaces. (The caveat here is that the space $\mathbb{C}$ needs to be replaced by the one point compactification $\mathbb{C}_{\infty}$.)

Posted by: Kirk Sturtz on July 26, 2022 4:21 AM | Permalink | Reply to this

### Re: Probability Monads as Codensity Monads

Oops. The counit $\epsilon$ should read $\mathcal{P} \circ \mathbf{\Sigma} \Rightarrow \mathbf{one}$.

Basically, that right Kan extension $\mathbf{\Sigma}$ yields non-separated measurable spaces for all super convex spaces which are a quotient space of a free space. The example obtained by taking the coequalizer of two points on the unit interval gives a nice example of that.

Posted by: Kirk Sturtz on July 26, 2022 5:10 AM | Permalink | Reply to this

### Re: Probability Monads as Codensity Monads

The condition of being codense is NOT necessary or desirable! That would cut down the category to only free objects. An example is given by taking the coequalizer of the two distinct interior points on the unit interval to get the discrete space $A=\{0,1,u\}$ with three points. There are no countably affine maps from $A$ into either $\mathbb{N}$ or into $\mathcal{G}(\mathbb{N})$ which can separate $0$ and $1$.

Posted by: Kirk Sturtz on July 27, 2022 3:35 AM | Permalink | Reply to this

### Re: Probability Monads as Codensity Monads

Incidentally, I wrote:

should the ultrafilter monad count as a probability monad? (An ultrafilter is a kind of primitive probability measure.)

Although in many ways I do like to think of the ultrafilter monad as a probability monad, here’s a reason not to: failure of Fubini.

I’ll explain what I mean. Let $\mathcal{U}$ be an ultrafilter on a set $X$. For any finite set $S$ and function $f: X \to S$, there is a unique element $s \in S$ such that $f^{-1}(s) \in \mathcal{U}$. This element $s$ deserves to be called $\int_X f \, d\mathcal{U}$. After all, if we’re thinking of $\mathcal{U}$ as something like a probability measure on $X$, then $\int_X f \,d\mathcal{U}$ should be the expected value of $f$ with respect to $\mathcal{U}$. But $f(x) = s$ for almost all $x \in X$, so $\int_X f \, d\mathcal{U} = s$.

By “failure of Fubini”, I mean that for ultrafilters $\mathcal{U}$ on $X$ and $\mathcal{V}$ on $Y$, and a function $f$ from $X \times Y$ to a finite set $S$, it can happen that

$\int_Y \int_X f(x, y) \, d\mathcal{U}(x) \, d\mathcal{V}(y) \neq \int_X \int_Y f(x, y) \, d\mathcal{V}(y) \, d\mathcal{U}(x).$

A counterexample: let $X = Y = \mathbb{N}$, let $\mathcal{U} = \mathcal{V}$ be a nonprincipal ultrafilter on $\mathbb{N}$, and let $f$ be the characteristic function of the subset $\{(x, y): x \leq y\}$ of $\mathbb{N} \times \mathbb{N}$.

I guess this failure of Fubini is equivalent to the statement that the ultrafilter monad is not commutative, but I haven’t thought it through… is that right?

Posted by: Tom Leinster on July 25, 2022 12:33 PM | Permalink | Reply to this

Post a New Comment