### Probability Monads as Codensity Monads

#### Posted by Tom Leinster

My PhD student Ruben Van Belle has just published his first paper!

Ruben Van Belle, Probability monads as codensity monads.

Theory and Applications of Categories38 (2022), 811–842.

It’s a treasure trove of theorems demonstrating how many of the monads loosely referred to as “probability monads” arise as codensity monads in a certain uniform manner, which I’ll tell you about now.

A **probability monad** is a monad that takes a space $X$ as input and
produces as output the space of probability measures on $X$. It’s an
imprecise term, but there are a number of monads that are
universally agreed to count as probability monads, of which the Giry
monad is the best known. There are also some boundary cases: should
the ultrafilter monad count as a probability monad? (An ultrafilter is a
kind of primitive probability measure.)

The **codensity monad** of a functor $G: \mathbf{B} \to \mathbf{A}$ is a
certain canonical monad on $\mathbf{A}$. If $G$ has a left adjoint $F$,
then the codensity monad of $G$ is the composite $G F$. But even if $G$
doesn’t have a left adjoint, its codensity monad is often well-defined. By
definition, it’s the right Kan extension of $G$ along itself. A good
non-trivial example: the codensity monad of the inclusion $\mathbf{FinSet}
\hookrightarrow \mathbf{Set}$ is the ultrafilter monad.

Ruben’s insight was as follows. There is something fundamentally countable
about probability theory. For instance, probability is additive on
*countable* families of mutually exclusive events, but not general
families. You can define a probability measure on a *countable* set $X$ as
an $X$-indexed family of nonnegative reals summing to $1$, but that
definition doesn’t work when $X$ is larger. And — more nebulously
— although probability theory very often involves uncountable spaces
such as $\mathbb{R}^n$ and uncountable families of random variables
(arising in stochastic processes), there is a feeling that the heart of
probability theory remains countable, that the extension from countable to
uncountable infinities is somehow automatic.

So, take some category of spaces $\mathbf{A}$ and a probability monad $T$ on it. Let $\mathbf{A}_c$ be the subcategory of $\mathbf{A}$ consisting of just the countable spaces. Then you can restrict $T$ to $\mathbf{A}_c$ to get a functor $G: \mathbf{A}_c \to \mathbf{A}$. If the idea expounded in the last paragraph is correct, then it should be possible to reconstruct $T$ from $G$ (its restriction to countable spaces) in some automatic way. The substance of the definition of $T$ should be in the countable part $\mathbf{A}_c$ of $\mathbf{A}$; everything else should be fluff.

Ruben shows that this is indeed the case. And as you’ll probably have guessed, the way that $T$ is reconstructed from $G$ is as its codensity monad. That is, in many cases:

A probability monad is the codensity monad of its restriction to countable spaces.

Sometimes it’s even simpler: you can get away with just *finite*
spaces. Sometimes there are some complicating wrinkles. But that’s the main
idea.

Here are four theorems illustrating this principle, all from Ruben’s paper.

Let $\mathbf{A}$ be the category of measurable spaces and $T$ the Giry monad on $\mathbf{A}$, which assigns to each space $X$ the space $T(X)$ of probability measures on it. Then $T$ is the codensity monad of the functor from countable sets to $\mathbf{A}$ that assigns to each countable set the space of probability measures on it.

This theorem embodies the idea that once you have the definition of probability measure on a

*countable*set, the definition for general measurable spaces is automatic.Let $\mathbf{A}$ be the category of compact Hausdorff spaces and $T$ the Radon monad on $\mathbf{A}$, which assigns to each space $X$ the space $T(X)$ of Radon probability measures on it. (A Radon measure is a suitably nice measure on the Borel sets, i.e. the smallest $\sigma$-algebra containing the open sets.) Then $T$ is the codensity monad of the functor $G$ from

*finite*sets to $\mathbf{A}$ that assigns to each finite set $X$ the space of probability measures on it.So for compact Hausdorff spaces, once you’ve got the definition of probability measure on finite sets, the general definition of Radon probability measure follows automatically. A striking feature of this theorem was that although the input functor $G$ was entirely

*finite*in nature, the output monad involves*countable*additivity. We got that for free.Let $\mathbf{A}$ be the category of compact metric spaces and distance-decreasing maps. Let $T$ be the “bounded Lipschitz monad” on $\mathbf{A}$. It assigns to each space $X$ the space $T(X)$ of Radon probability measures on it, with a certain metric called the “bounded Lipschitz metric”. This is closely related to the Wasserstein/Kantorovich metric, and you can find the definition on page 829 of Ruben’s paper. And again, $T$ is the codensity monad of its restriction to

*finite*sets.Finally, let $\mathbf{A}$ be the category of Hausdorff spaces. There is a probability monad $T$ on $\mathbf{A}$ that Ruben dubs the

**Baire monad**, assigning to each space $X$ the space of “Baire probability measures” on $X$. I won’t give the definition (see page 825 if you want it), but the point is that the general notion of Baire probability measure reduces to the usual notion of probability measure when $X$ is metric or compact Hausdorff, but still behaves well on the category of Hausdorff spaces as a whole.In this case, the result is that $T$ is the codensity monad of its restriction to a certain category of countable discrete spaces (i.e. countable sets). It’s not the

*full*subcategory; only certain maps are included. But I’ll leave Ruben to explain that; it’s in section 6 of his paper.

All of this provokes a categorical question. Given a monad $T$ on a category $\mathbf{A}$ and a subcategory $\mathbf{B}$ of $\mathbf{A}$, when is it the case that $T$ is the codensity monad of its restriction to $\mathbf{B}$? Or more generally, given a monad $T$ on $\mathbf{A}$ and a functor $G: \mathbf{B} \to \mathbf{A}$, when is it the case that $T$ is the codensity monad of the composite functor

$\mathbf{B} \stackrel{G}{\to} \mathbf{A} \stackrel{T}{\to} \mathbf{A}?$

We know the answer when $T$ is the identity monad. For in that case, the question is asking “for which functors $G$ is the codensity monad of $G$ the identity?”, and the answer is “when $G$ is codense”. It’s not too hard to show that in general, $T$ is the codensity monad of $T G$ if and only if the composite functor

$\mathbf{B} \stackrel{G}{\to} \mathbf{A} \to \mathbf{A}_T$

is codense, where $\mathbf{A} \to \mathbf{A}_T$ is the free functor into the Kleisli category of $T$. I’m sure Ruben and I aren’t the first to come across this fact, but can anyone point to where it appears in the literature? And does anyone know other applications of it?

## Re: Probability Monads as Codensity Monads

Well done, Ruben!

Is there some abstract way to relate this codensity account to the graded monad account of Fritz and Perrone, as described here at nLab: graded monad?