A Categorical Look at Random Variables
Posted by Tom Leinster
guest post by Mark Meckes
For the past several years I’ve been thinking on and off about whether there’s a fruitful category-theoretic perspective on probability theory, or at least a perspective with a category-theoretic flavor.
(You can see this MathOverflow question by Pete Clark for some background, though I started thinking about this question somewhat earlier. The fact that I’m writing this post should tell you something about my attitude toward my own answer there. On the other hand, that answer indicates something of the perspective I’m coming from.)
I’m a long way from finding such a perspective I’m happy with, but I have some observations I’d like to share with other n-Category Café patrons on the subject, in hopes of stirring up some interesting discussion. The main idea here was pointed out to me by Tom, who I pester about this subject on an approximately annual basis.
Let’s first dispense with one rather banal observation. Let $\mathbf{Prob}$ be the category whose objects are probability spaces (measure spaces with total measure $1$), and whose morphisms are almost-everywhere-equality equivalence classes of measure-preserving maps. Then:
Probability theory is not about the category $\mathbf{Prob}$.
To put it a little less (ahem) categorically, probability theory is not about the category $\mathbf{Prob}$, in the sense that group theory or topology might be said (however incompletely) to be about the categories $\mathbf{Grp}$ or $\mathbf{Top}$. The most basic justification of this assertion is that isomorphic objects in $\mathbf{Prob}$ are not “the same” from the point of view of probability theory. Indeed, the distributions of
- a uniform random variable in an interval,
- an infinite sequence of independent coin flips, and
- Brownian motion
are radically different things in probability theory, but they’re all isomorphic to each other in $\mathbf{Prob}$!
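(A quick way to see the first of these isomorphisms: sending a point of $[0,1]$ to the sequence of its binary digits is, off a null set, a measure-preserving bijection onto the space of coin-flip sequences. Here is a minimal numerical sketch of that claim, assuming NumPy; all names are ad hoc.)

```python
import numpy as np

# Sketch: the k-th binary digit of x in [0,1] is floor(2^k * x) mod 2.
# Under Lebesgue measure these digits are i.i.d. fair coin flips, which is
# what makes "uniform on [0,1]" and "infinite coin flips" isomorphic in Prob.
rng = np.random.default_rng(0)
x = rng.random(100_000)

digits = np.array([(np.floor(x * 2**k) % 2).astype(int) for k in range(1, 4)])
print("digit means (each should be ~0.5):", digits.mean(axis=1))

# All 8 patterns of the first three digits should be ~equally likely.
codes = 4 * digits[0] + 2 * digits[1] + digits[2]
print("pattern frequencies (each ~0.125):",
      np.bincount(codes, minlength=8) / x.size)
```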
Anyway, as any probabilist will tell you, probability theory isn’t about probability spaces. The fundamental “objects” in probability theory are actually the morphisms of $\mathbf{Prob}$: random variables.
Typically, a random variable is defined to be a measurable map $X \colon \Omega \to E$, where $\Omega$ is a probability space and $E$ is, a priori, just a measurable space. (I’m suppressing $\sigma$-algebras here, which indicates how modest the scope of this post is: serious probability theory works with multiple $\sigma$-algebras on a single space.) But every random variable canonically induces a probability measure on its codomain, its distribution $\mu_X$, defined by
$$\mu_X(A) = \mathbb{P}(X^{-1}(A))$$
for every measurable $A \subseteq E$. This formula is precisely what it means to say that $X \colon (\Omega, \mathbb{P}) \to (E, \mu_X)$ is measure-preserving.
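In concrete terms, $\mu_X$ is just the pushforward of $\mathbb{P}$ along $X$, which one can check by brute-force sampling. A minimal Monte Carlo sketch, assuming NumPy, with a hypothetical choice of $X$ and $A$:

```python
import numpy as np

# Monte Carlo check of mu_X(A) = P(X^{-1}(A)).  Take Omega = [0,1] with
# Lebesgue measure P, the random variable X(omega) = omega**2, and the
# measurable set A = [0, 1/4]; then X^{-1}(A) = [0, 1/2].
rng = np.random.default_rng(1)
omega = rng.random(100_000)         # samples from (Omega, P)
X = omega**2

mu_X_A = np.mean(X <= 0.25)         # mu_X(A): how often X lands in A
P_preimage = np.mean(omega <= 0.5)  # P(X^{-1}(A)), computed upstairs on Omega
print(mu_X_A, P_preimage)           # both ~0.5, equal up to sampling error
```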
In probability theory, the only questions we’re allowed to ask about $X$ are about its distribution. On the other hand, two random variables which have the same distribution are not thought of as “the same random variable” in the same way that isomorphic groups are “the same group”. In fact, a probabilist’s favorite trick is to replace a random variable $X$ with another random variable $Y$ which has the same distribution, but which is in some way easier to analyze. For example, $Y$ may factor in a useful way as the composition of two morphisms in $\mathbf{Prob}$ (although probabilists don’t normally write about things in those terms).
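One standard instance of the trick: any real-valued random variable can be traded for one defined on $([0,1], \text{Lebesgue})$ with the same distribution, via the quantile function. A small sketch, assuming NumPy, using Exponential(1) as a hypothetical example:

```python
import numpy as np

# If F is a distribution function with quantile function Q, and U is uniform
# on [0,1], then Q(U) has distribution F.  For Exponential(1), Q(u) = -log(1-u).
rng = np.random.default_rng(2)
u = rng.random(100_000)
y = -np.log1p(-u)                  # Y = Q(U), built on ([0,1], Lebesgue)
x = rng.exponential(size=100_000)  # X sampled directly

qs = [0.25, 0.5, 0.9]
print(np.quantile(y, qs))          # the two empirical quantile vectors
print(np.quantile(x, qs))          # agree up to sampling error
```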
Now let’s fix a codomain $E$. Then there is a category $\mathbf{RV}(E)$ whose objects are $E$-valued random variables; if $X$ and $Y$ are two random variables with domains $\Omega_1$ and $\Omega_2$ respectively, then a morphism from $X$ to $Y$ is a measure-preserving map $f \colon \Omega_1 \to \Omega_2$ such that $X = Y \circ f$. (Figuring out how to typeset the commutative triangle here is more trouble than I feel like going to.)
In this case
$$\mu_X(A) = \mathbb{P}_1(X^{-1}(A)) = \mathbb{P}_1(f^{-1}(Y^{-1}(A))) = \mathbb{P}_2(Y^{-1}(A)) = \mu_Y(A),$$
so if a morphism $X \to Y$ exists, then $X$ and $Y$ have the same distribution. Moreover, if $\mu$ is a probability measure on $E$, there is a canonical random variable with distribution $\mu$, namely, the identity map on $(E, \mu)$; and any random variable $X$ with distribution $\mu$ itself defines a morphism from the object $X$ to that object $\mathrm{id}_{(E, \mu)}$.
It follows that the family of random variables with distribution $\mu$ is a connected component of $\mathbf{RV}(E)$. (I don’t know whether the construction of $\mathbf{RV}(E)$ from $\mathbf{Prob}$ has a standard name, but I have learned that its connected components are slice categories of $\mathbf{Prob}$.)
Now a typical theorem in probability theory starts by taking a family of random variables $X_1, \dots, X_n \colon \Omega \to E$ all defined on the same domain $\Omega$. That’s no problem in this picture: this is the same as a single random variable $X \colon \Omega \to E^n$. (There’s also always some kind of assumption about the relationships among the $X_i$ — independence, for example, though that’s only the simplest such relationship that people think about — I don’t (yet!) have any thoughts to share about expressing those relationships in terms of the picture here.)
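Numerically, the bundling is nothing but pairing coordinates; the point is that the joint behaviour of the $X_i$ is carried by the distribution of the single bundled map. A toy sketch, assuming NumPy, with a deliberately dependent pair:

```python
import numpy as np

# Two random variables on a common Omega are one E^2-valued random variable:
# just pair the coordinates.  Their joint behavior lives in the distribution
# of the paired map.
rng = np.random.default_rng(3)
omega = rng.random(100_000)                   # Omega = [0,1], Lebesgue
X1, X2 = np.sin(2 * np.pi * omega), np.cos(2 * np.pi * omega)
X = np.stack([X1, X2], axis=-1)               # X : Omega -> R^2
print(X.shape)                                # (100000, 2)
print(np.max(np.abs(X1**2 + X2**2 - 1)))      # ~0: the pair is stuck on the
                                              # unit circle, so X1 and X2 are
                                              # highly dependent
```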
The next thing is to cook up a new random variable defined on $\Omega$ by applying some measurable function $\phi \colon E^n \to F$. A prototype is the function (well, family of functions)
$$s_n \colon \mathbb{R}^n \to \mathbb{R}, \qquad s_n(x_1, \dots, x_n) = \sum_{i=1}^n x_i,$$
which has a starring role in all the classics: the Weak and Strong Laws of Large Numbers, the Central Limit Theorem, the Law of the Iterated Logarithm, Cramér’s Theorem, etc. This fits nicely into this picture, too: any measurable map $\phi \colon E \to F$ induces a functor $\phi_* \colon \mathbf{RV}(E) \to \mathbf{RV}(F)$ in an obvious way (a morphism in $\mathbf{RV}(E)$ given by a measure-preserving map $f$ is mapped to a morphism in $\mathbf{RV}(F)$ given by the same $f$ — that point is probably obvious to most of the people here, but I needed to think about it a bit to convince myself that $\phi_*$ really is a functor).
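For whatever it’s worth, here is a toy model of $\phi_*$ in code (everything here is hypothetical scaffolding, not a serious formalization): objects carry their underlying map, $\phi_*$ post-composes with $\phi$, and the underlying map of a morphism is left alone, so functoriality reduces to associativity of composition.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Toy model of RV(E) and the induced functor phi_*.  An object is a map out of
# a (finite, for illustration) sample space; phi_* post-composes with phi and
# leaves the underlying map of a morphism untouched.

@dataclass
class RV:
    domain: list                 # the sample space Omega (measure omitted here)
    map: Callable[[Any], Any]    # the measurable map X : Omega -> E

def pushforward(phi: Callable[[Any], Any]) -> Callable[[RV], RV]:
    """phi_* on objects: X |-> phi o X.  (On morphisms: f |-> f.)"""
    return lambda X: RV(X.domain, lambda w: phi(X.map(w)))

# Functoriality, (psi o phi)_* = psi_* o phi_*, is associativity of composition:
X = RV([0, 1, 2, 3], lambda w: w % 2)
phi = lambda e: e + 10
psi = lambda e: 2 * e
lhs = pushforward(lambda e: psi(phi(e)))(X)
rhs = pushforward(psi)(pushforward(phi)(X))
assert all(lhs.map(w) == rhs.map(w) for w in X.domain)
```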
Finally, as I said, a probabilist may go about understanding the distribution of the random variable $\phi(X)$ — that is, the object $\phi_*(X)$ of $\mathbf{RV}(F)$ — by instead working with another object $Y$ in the same connected component of $\mathbf{RV}(F)$. Both the assumptions on $X$ and the structure of $\phi$ may be used to help cook up $Y$.
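A toy instance of this replacement: if $\phi(x_1, x_2) = (x_1 + x_2)/\sqrt{2}$ and $X$ is a pair of independent standard Gaussians, then $\phi_*(X)$ lies in the same connected component as the canonical object, the identity on $(\mathbb{R}, N(0,1))$, so one may as well study the latter. A quick numerical check, assuming NumPy:

```python
import numpy as np

# phi(x1, x2) = (x1 + x2)/sqrt(2) applied to a pair of independent standard
# Gaussians has distribution N(0,1), so we may trade phi_*(X) for the
# canonical object: the identity map on (R, N(0,1)).
rng = np.random.default_rng(4)
x1, x2 = rng.standard_normal((2, 100_000))
phi_X = (x1 + x2) / np.sqrt(2)       # the complicated object phi_*(X)
Y = rng.standard_normal(100_000)     # stand-in for the canonical object

qs = [0.1, 0.5, 0.9]
print(np.quantile(phi_X, qs))        # the two agree up to sampling error
print(np.quantile(Y, qs))
```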
This is quite different from any category-theoretic perspective I’ve ever encountered in, say, algebra or topology, but my ignorance of those fields is broad and deep. If anyone finds this kind of category-theoretic picture familiar, I’d love to hear about it!
One last observation here is that I believe (I haven’t tried writing out all the details) that the mappings
$$E \mapsto \mathbf{RV}(E), \qquad \phi \mapsto \phi_*$$
define a functor $\mathbf{Meas} \to \mathbf{Cat}$. I have no idea what, if anything, this observation may do for probability theory.
Re: A Categorical Look at Random Variables
It’s very nearly the case that in $\mathbf{Prob}$ every morphism is an epimorphism (because the image of a measure-preserving map has to have outer measure $1$). So the category $\mathbf{Prob}$ reminds me of the category of fields, in which every morphism is a monomorphism.
In probability, people fix a sample space $\Omega$ and study the relationships between the random variables $\Omega \to E$. We’re supposed to think of $\Omega$ as arbitrary and not very important. It exists merely to support the random variables, and it could easily be replaced by any $\Omega'$ equipped with a measure-preserving map $\Omega' \to \Omega$.
Does a similar situation occur in the theory of fields? From my Galois theory classes I vaguely remember seeing arguments in which we cared about algebraic extensions of $\mathbb{Q}$, but in order to prove the desired result we had to fix an arbitrary algebraically closed extension of $\mathbb{Q}$ and consider their embeddings inside it. If I’m remembering correctly then this would be dual to the situation in probability. We don’t care about fields on their own; we only care about how their embeddings into a larger field interact.
Does this analogy hold up? Also, could we tweak the definition of $\mathbf{Prob}$ so that every morphism actually is epimorphic?