### A Categorical Look at Random Variables

#### Posted by Tom Leinster

*guest post by Mark Meckes*

For the past several years I’ve been thinking on and off about whether there’s a fruitful category-theoretic perspective on probability theory, or at least a perspective with a category-theoretic flavor.

(You can see this MathOverflow question by Pete Clark for some background, though I started thinking about this question somewhat earlier. The fact that I’m writing this post should tell you something about my attitude toward my own answer there. On the other hand, that answer indicates something of the perspective I’m coming from.)

I’m a long way from finding such a perspective I’m happy with, but I have some observations I’d like to share with other n-Category Café patrons on the subject, in hopes of stirring up some interesting discussion. The main idea here was pointed out to me by Tom, who I pester about this subject on an approximately annual basis.

Let’s first dispense with one rather banal observation. Let $\mathbf{Prob}$ be the category whose objects are probability spaces (measure spaces with total measure $1$), and whose morphisms are almost-everywhere-equality equivalence classes of measure-preserving maps. Then:

> Probability theory is **not** about the category $\mathbf{Prob}$.

To put it a little less (ahem) categorically, probability theory is not about the category $\mathbf{Prob}$, in the sense that group theory or topology might be said (however incompletely) to be about the categories $\mathbf{Grp}$ or $\mathbf{Top}$. The most basic justification of this assertion is that isomorphic objects in $\mathbf{Prob}$ are not “the same” from the point of view of probability theory. Indeed, the distributions of

- a uniform random variable in an interval,
- an infinite sequence of independent coin flips, and
- Brownian motion $\{B_t : t \ge 0\}$

are radically different things in probability theory, but they’re all isomorphic to each other in $\mathbf{Prob}$!

Anyway, as any probabilist will tell you, probability theory isn’t
about probability spaces. The fundamental “objects” in probability
theory are actually the *morphisms* of $\mathbf{Prob}$: random
variables.

Typically, a random variable is defined to be a measurable map
$X:\Omega \to E$, where $(\Omega, \mathbb{P})$ is a probability space
and $E$ is, a priori, just a measurable space. (I’m suppressing
$\sigma$-algebras here, which indicates how modest the scope of this
post is: serious probability theory works with multiple
$\sigma$-algebras on a single space.) But every random variable
canonically induces a probability measure on its codomain, its
**distribution** $\mu = X_\# \mathbb{P}$ defined by

$\mu(A) = \mathbb{P}(X^{-1}(A))$

for every measurable $A \subseteq E$. This formula is precisely what it means to say that $X:(\Omega, \mathbb{P}) \to (E, \mu)$ is measure-preserving.
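To make the pushforward concrete, here is a minimal finite sketch in Python (the toy space, the random variable, and the `pushforward` helper are my own illustrative choices, not anything from the post):

```python
from fractions import Fraction

# A toy probability space: Omega = {0,...,5} with uniform measure (a fair die).
P = {w: Fraction(1, 6) for w in range(6)}

# A random variable X: Omega -> E, here with codomain E = {"even", "odd"}.
def X(w):
    return "even" if w % 2 == 0 else "odd"

def pushforward(P, X):
    """The distribution mu = X_# P, i.e. mu(A) = P(X^{-1}(A)),
    returned as a dict of point masses on the codomain."""
    mu = {}
    for w, p in P.items():
        mu[X(w)] = mu.get(X(w), Fraction(0)) + p
    return mu

mu = pushforward(P, X)
print(mu)  # {'even': Fraction(1, 2), 'odd': Fraction(1, 2)}
```

In the finite case "measure-preserving" reduces to checking point masses, which is why summing over fibers of $X$ suffices here.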

In probability theory, the only questions we’re allowed to ask about $X$ are about its distribution. On the other hand, two random variables which have the same distribution are not thought of as “the same random variable” in the same way that isomorphic groups are “the same group”. In fact, a probabilist’s favorite trick is to replace a random variable $X:\Omega \to E$ with another random variable $X':\Omega' \to E$ which has the same distribution, but which is in some way easier to analyze. For example, $X'$ may factor in a useful way as the composition of two morphisms in $\mathbf{Prob}$ (although probabilists don’t normally write about things in those terms).
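The "same distribution, different random variable" point can be sketched in the same finite style (again an illustrative toy example of my own, not the post's):

```python
from fractions import Fraction

def pushforward(P, X):
    """The distribution X_# P as a dict of point masses."""
    mu = {}
    for w, p in P.items():
        mu[X(w)] = mu.get(X(w), Fraction(0)) + p
    return mu

# Two random variables on *different* probability spaces...
P1 = {w: Fraction(1, 6) for w in range(6)}      # a fair die
X1 = lambda w: 1 if w < 3 else 0                # indicator of a "low roll"

P2 = {"H": Fraction(1, 2), "T": Fraction(1, 2)}  # a fair coin
X2 = lambda w: 1 if w == "H" else 0

# ...which have the same distribution (fair Bernoulli on {0, 1}),
# and so are interchangeable for any purely distributional question:
assert pushforward(P1, X1) == pushforward(P2, X2)
```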

Now let’s fix a codomain $E$. Then there is a category $\mathbf{R}(E)$ whose objects are $E$-valued random variables; if $X$ and $X'$ are two random variables with domains $(\Omega, \mathbb{P})$ and $(\Omega', \mathbb{P}')$ respectively, then a morphism from $X$ to $X'$ is a measure-preserving map $f:\Omega \to \Omega'$ such that $X' \circ f = X$. (Figuring out how to typeset the commutative triangle here is more trouble than I feel like going to.)

In this case

$X_{\#} \mathbb{P} = (X' \circ f)_{\#} \mathbb{P} = X'_{\#} f_{\#} \mathbb{P} = X'_{\#} \mathbb{P}',$

so if a morphism $X \to X'$ exists, then $X$ and $X'$ have the same distribution. Moreover, if $\mu$ is a probability measure on $E$, there is a canonical random variable with distribution $\mu$, namely the identity map $Id_E$ on $(E,\mu)$, and any random variable $X$ with distribution $\mu$ is itself (as a measure-preserving map) a morphism from the object $X$ to the object $Id_E$.
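Here is a finite check of that last claim (my own sketch; the names are illustrative): the map $X$ itself satisfies both conditions for being a morphism $X \to Id_E$ in $\mathbf{R}(E)$.

```python
from fractions import Fraction

def pushforward(P, X):
    """The distribution X_# P as a dict of point masses."""
    mu = {}
    for w, p in P.items():
        mu[X(w)] = mu.get(X(w), Fraction(0)) + p
    return mu

P = {w: Fraction(1, 6) for w in range(6)}   # domain: a fair die
X = lambda w: w % 2                         # X: Omega -> E = {0, 1}

mu = pushforward(P, X)                      # the distribution of X
Id_E = lambda e: e                          # the canonical RV on (E, mu)

# f = X is measure-preserving (X_# P = mu holds by construction)...
assert pushforward(P, X) == mu
# ...and satisfies Id_E . f = X, so it is a morphism X -> Id_E:
assert all(Id_E(X(w)) == X(w) for w in P)
```

So every object of $\mathbf{R}(E,\mu)$ admits a morphism to $Id_E$, which is why the component is connected.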

It follows that the family $\mathbf{R}(E, \mu)$ of random variables with distribution $\mu$ is a connected component of $\mathbf{R}(E)$. (I don’t know whether the construction of $\mathbf{R}(E)$ from $\mathbf{Prob}$ has a standard name, but I have learned that its connected components $\mathbf{R}(E, \mu)$ are slice categories of $\mathbf{Prob}$.)

Now a typical theorem in probability theory starts by taking a family of random variables $X_i : \Omega \to E_i$ all defined on the same domain $\Omega$. That’s no problem in this picture: this is the same as a single random variable $X : \Omega \to \prod_i E_i$. (There’s also always some kind of assumption about the relationships among the $X_i$ — independence, for example, though that’s only the simplest such relationship that people think about — I don’t (yet!) have any thoughts to share about expressing those relationships in terms of the picture here.)
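The repackaging of a family $(X_i)$ as a single product-valued random variable, and the simplest relationship among the $X_i$ (independence), can both be seen in a small sketch (my own toy example, not the post's):

```python
from fractions import Fraction
from itertools import product

# Two random variables on a common domain: two fair coin flips.
P = {w: Fraction(1, 4) for w in product("HT", repeat=2)}
X1 = lambda w: w[0]            # X1: Omega -> E1, the first flip
X2 = lambda w: w[1]            # X2: Omega -> E2, the second flip
X = lambda w: (X1(w), X2(w))   # the single RV Omega -> E1 x E2

def pushforward(P, f):
    """The distribution f_# P as a dict of point masses."""
    mu = {}
    for w, p in P.items():
        mu[f(w)] = mu.get(f(w), Fraction(0)) + p
    return mu

# Here X1 and X2 happen to be independent: the joint distribution
# (the distribution of X) factors as the product of the marginals.
joint = pushforward(P, X)
assert all(joint[(a, b)] == pushforward(P, X1)[a] * pushforward(P, X2)[b]
           for (a, b) in joint)
```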

The next thing is to cook up a new random variable defined on $\Omega$ by applying some measurable function $F:\prod_i E_i \to E$. A prototype is the function (well, family of functions)

$F: \mathbb{R}^n \to \mathbb{R}, \qquad (x_1, \ldots, x_n) \mapsto \sum_{i=1}^n x_i,$

which has a starring role in all the classics: the Weak and Strong Laws of Large Numbers, the Central Limit Theorem, the Law of the Iterated Logarithm, Cramér’s Theorem, etc. This fits nicely into the picture, too: any measurable map $F:E \to E'$ induces a functor $F_!:\mathbf{R}(E) \to \mathbf{R}(E')$ in an obvious way (a morphism in $\mathbf{R}(E)$ given by a measure-preserving $f:\Omega \to \Omega'$ is mapped to the morphism in $\mathbf{R}(E')$ given by the same $f$ — that point is probably obvious to most of the people here, but I needed to think about it a bit to convince myself that $F_!$ really is a functor).
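On objects, $F_!$ is just post-composition: $F_!(X) = F \circ X$. A finite sketch of this with the summation prototype (my own example: $X$ a pair of independent fair dice, $F$ the sum of coordinates):

```python
from fractions import Fraction
from itertools import product

# Omega = E = {1..6}^2 with uniform measure; X is the identity random variable.
P = {w: Fraction(1, 36) for w in product(range(1, 7), repeat=2)}
X = lambda w: w                    # X: Omega -> E = {1..6}^2
F = lambda e: e[0] + e[1]          # F: E -> E', the sum of the coordinates

def pushforward(P, f):
    """The distribution f_# P as a dict of point masses."""
    mu = {}
    for w, p in P.items():
        mu[f(w)] = mu.get(f(w), Fraction(0)) + p
    return mu

# The distribution of F_!(X) = F . X is the pushforward along F o X:
dist = pushforward(P, lambda w: F(X(w)))
print(dist[7])   # -> 1/6, the familiar fact that 7 is the most likely sum
```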

Finally, as I said, a probabilist may go about understanding the distribution of the random variable $F(X)$ — that is, the object $F_!(X)$ of $\mathbf{R}(E)$ — by instead working with another object $Y$ in the same connected component of $\mathbf{R}(E)$. Both the assumptions on $X$ and the structure of $F$ may be used to help cook up $Y$.

This is quite different from any category-theoretic perspective I’ve ever encountered in, say, algebra or topology, but my ignorance of those fields is broad and deep. If anyone finds this kind of category-theoretic picture familiar, I’d love to hear about it!

One last observation here is that I believe (I haven’t tried writing out all the details) that the mappings

$E \mapsto \mathbf{R}(E), \qquad F \mapsto F_!$

define a functor $\mathbf{Meas} \to \mathbf{Cat}$. I have no idea what, if anything, this observation may do for probability theory.

## Re: A Categorical Look at Random Variables

It’s very nearly the case that in $\mathbf{Prob}$ every morphism is an epimorphism (because the image of a measure-preserving map has to have outer measure $1$). So the category $\mathbf{Prob}$ reminds me of the category $\mathbf{Field}$, in which every morphism is a monomorphism.

In probability people fix a sample space $\Omega$ and study the relationships between the random variables $\Omega\to X$. We’re supposed to think of $\Omega$ as arbitrary and not very important. It exists merely to support the random variables, and it could easily be replaced by any $\Omega'$ with a map $\Omega'\to\Omega$.

Does a similar situation occur in the theory of fields? From my Galois theory classes I vaguely remember seeing arguments in which we cared about algebraic extensions of $\mathbb{Q}$, but in order to prove the desired result we had to fix an arbitrary algebraically closed extension of $\mathbb{Q}$ and consider their embeddings inside it. If I’m remembering correctly then this would be dual to the situation in probability. We don’t care about fields on their own; we only care about how their embeddings into a larger field interact.

Does this analogy hold up? Also, could we tweak the definition of $\mathbf{Prob}$ so that every morphism actually is epimorphic?