September 5, 2018

A Categorical Look at Random Variables

Posted by Tom Leinster

guest post by Mark Meckes

For the past several years I’ve been thinking on and off about whether there’s a fruitful category-theoretic perspective on probability theory, or at least a perspective with a category-theoretic flavor.

(You can see this MathOverflow question by Pete Clark for some background, though I started thinking about this question somewhat earlier. The fact that I’m writing this post should tell you something about my attitude toward my own answer there. On the other hand, that answer indicates something of the perspective I’m coming from.)

I’m a long way from finding such a perspective I’m happy with, but I have some observations I’d like to share with other n-Category Café patrons on the subject, in hopes of stirring up some interesting discussion. The main idea here was pointed out to me by Tom, who I pester about this subject on an approximately annual basis.

Let’s first dispense with one rather banal observation. Let $\mathbf{Prob}$ be the category whose objects are probability spaces (measure spaces with total measure $1$), and whose morphisms are almost-everywhere-equality equivalence classes of measure-preserving maps. Then:

Probability theory is not about the category $\mathbf{Prob}$.

To put it a little less (ahem) categorically, probability theory is not about the category $\mathbf{Prob}$, in the sense that group theory or topology might be said (however incompletely) to be about the categories $\mathbf{Grp}$ or $\mathbf{Top}$. The most basic justification of this assertion is that isomorphic objects in $\mathbf{Prob}$ are not “the same” from the point of view of probability theory. Indeed, the distributions of

  1. a uniform random variable in an interval,
  2. an infinite sequence of independent coin flips, and
  3. Brownian motion $\{B_t : t \ge 0\}$

are radically different things in probability theory, but they’re all isomorphic to each other in $\mathbf{Prob}$!

Anyway, as any probabilist will tell you, probability theory isn’t about probability spaces. The fundamental “objects” in probability theory are actually the morphisms of $\mathbf{Prob}$: random variables.

Typically, a random variable is defined to be a measurable map $X:\Omega \to E$, where $(\Omega, \mathbb{P})$ is a probability space and $E$ is, a priori, just a measurable space. (I’m suppressing $\sigma$-algebras here, which indicates how modest the scope of this post is: serious probability theory works with multiple $\sigma$-algebras on a single space.) But every random variable canonically induces a probability measure on its codomain, its distribution $\mu = X_\# \mathbb{P}$ defined by

$$ \mu(A) = \mathbb{P}(X^{-1}(A)) $$

for every measurable $A \subseteq E$. This formula is precisely what it means to say that $X:(\Omega, \mathbb{P}) \to (E, \mu)$ is measure-preserving.
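For readers who like to see the pushforward formula in a finite setting, here is a minimal sketch. It assumes a finite probability space in which every subset is measurable, so a measure is just a dictionary of point masses; the function name `pushforward` and the coin-flip example are illustrative choices, not anything from the post.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward(P, X):
    """Distribution of X on a finite space: mu({e}) = P(X^{-1}({e})).

    P maps each outcome to its probability; X is a function on outcomes.
    """
    mu = defaultdict(Fraction)  # Fraction() is 0, so masses accumulate from 0
    for omega, p in P.items():
        mu[X(omega)] += p
    return dict(mu)


# Hypothetical example: two fair coin flips, X counts the heads.
P = {(a, b): Fraction(1, 4) for a in "HT" for b in "HT"}
X = lambda omega: omega.count("H")
mu = pushforward(P, X)
```

By construction $X$ is then measure-preserving as a map from `P` to `mu`, which is exactly the statement in the paragraph above.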

In probability theory, the only questions we’re allowed to ask about $X$ are about its distribution. On the other hand, two random variables which have the same distribution are not thought of as “the same random variable” in the same way that isomorphic groups are “the same group”. In fact, a probabilist’s favorite trick is to replace a random variable $X:\Omega \to E$ with another random variable $X':\Omega' \to E$ which has the same distribution, but which is in some way easier to analyze. For example, $X'$ may factor in a useful way as the composition of two morphisms in $\mathbf{Prob}$ (although probabilists don’t normally write about things in those terms).

Now let’s fix a codomain $E$. Then there is a category $\mathbf{R}(E)$ whose objects are $E$-valued random variables; if $X$ and $X'$ are two random variables with domains $(\Omega, \mathbb{P})$ and $(\Omega', \mathbb{P}')$ respectively, then a morphism from $X$ to $X'$ is a measure-preserving map $f:\Omega \to \Omega'$ such that $X' \circ f = X$. (Figuring out how to typeset the commutative triangle here is more trouble than I feel like going to.)

In this case

$$ X_{\#} \mathbb{P} = (X' \circ f)_{\#} \mathbb{P} = X'_{\#} f_{\#} \mathbb{P} = X'_{\#} \mathbb{P}', $$

so if a morphism $X \to X'$ exists, then $X$ and $X'$ have the same distribution. Moreover, if $\mu$ is a probability measure on $E$, there is a canonical random variable with distribution $\mu$, namely, the identity map $Id_E$ on $(E,\mu)$, and any random variable $X$ with distribution $\mu$ itself defines a morphism from the object $X$ to that object $Id_E$.
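The last claim can be checked by hand in a toy case. The sketch below again assumes finite spaces with all subsets measurable; `pushforward`, `same_measure`, and `is_morphism` are hypothetical helper names introduced only for this illustration. It verifies that $X$ itself is a morphism from the object $X$ to the object $Id_E$ on $(E, \mu)$.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward(P, X):
    """Distribution of X under the finite measure P (dict of point masses)."""
    mu = defaultdict(Fraction)
    for omega, p in P.items():
        mu[X(omega)] += p
    return dict(mu)


def same_measure(P, Q):
    """Compare two finite measures, treating missing points as mass 0."""
    return all(P.get(k, 0) == Q.get(k, 0) for k in set(P) | set(Q))


def is_morphism(P, X, P2, X2, f):
    """Check that f is measure-preserving from P to P2 and that X2 ∘ f = X."""
    measure_preserving = same_measure(pushforward(P, f), P2)
    commutes = all(X2(f(w)) == X(w) for w in P)
    return measure_preserving and commutes


# Hypothetical example: two fair bits, X reads off the first one.
P = {(a, b): Fraction(1, 4) for a in (0, 1) for b in (0, 1)}
X = lambda w: w[0]
mu = pushforward(P, X)      # distribution of X on E = {0, 1}
identity = lambda e: e      # the canonical object Id_E on (E, mu)
ok = is_morphism(P, X, mu, identity, X)
```

Here `ok` comes out true because $f = X$ pushes `P` forward to `mu` by definition and $Id_E \circ X = X$.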

It follows that the family $\mathbf{R}(E, \mu)$ of random variables with distribution $\mu$ is a connected component of $\mathbf{R}(E)$. (I don’t know whether the construction of $\mathbf{R}(E)$ from $\mathbf{Prob}$ has a standard name, but I have learned that its connected components $\mathbf{R}(E, \mu)$ are slice categories of $\mathbf{Prob}$.)

Now a typical theorem in probability theory starts by taking a family of random variables $X_i : \Omega \to E_i$ all defined on the same domain $\Omega$. That’s no problem in this picture: this is the same as a single random variable $X : \Omega \to \prod_i E_i$. (There’s also always some kind of assumption about the relationships among the $X_i$: independence, for example, though that’s only the simplest such relationship that people think about. I don’t (yet!) have any thoughts to share about expressing those relationships in terms of the picture here.)

The next thing is to cook up a new random variable defined on $\Omega$ by applying some measurable function $F:\prod_i E_i \to E$. A prototype is the function (well, family of functions)

$$ F: \mathbb{R}^n \to \mathbb{R}, \qquad (x_1, \ldots, x_n) \mapsto \sum_{i=1}^n x_i, $$

which has a starring role in all the classics: the Weak and Strong Laws of Large Numbers, Central Limit Theorem, Law of the Iterated Logarithm, Cramér’s Theorem, etc. This fits nicely into this picture, too: any measurable map $F:E \to E'$ induces a functor $F_!:\mathbf{R}(E) \to \mathbf{R}(E')$ in an obvious way (a morphism in $\mathbf{R}(E)$ given by a measure-preserving $f:\Omega \to \Omega'$ is mapped to a morphism in $\mathbf{R}(E')$ given by the same $f$; that point is probably obvious to most of the people here, but I needed to think about it a bit to convince myself that $F_!$ really is a functor).
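The reason the same $f$ still works is that $(F \circ X') \circ f = F \circ (X' \circ f) = F \circ X$. A small sketch of this on finite spaces, with $F$ the coordinate sum, follows; the name `F_shriek` and the particular spaces are assumptions made for the example, and only the commuting triangle is checked (the map $f$ is measure-preserving here by inspection).

```python
from fractions import Fraction
from itertools import product


def F_shriek(F, X):
    """Object part of F_!: send the random variable X to F ∘ X."""
    return lambda w: F(X(w))


# Hypothetical finite example: three fair bits mapping onto two fair bits.
Omega = {w: Fraction(1, 8) for w in product((0, 1), repeat=3)}
Omega2 = {w: Fraction(1, 4) for w in product((0, 1), repeat=2)}
X = lambda w: (w[0], w[1])   # X  : Omega  -> R^2
X2 = lambda w: w             # X' : Omega' -> R^2
f = lambda w: (w[0], w[1])   # f  : Omega  -> Omega', with X' ∘ f = X
F = sum                      # F  : R^2 -> R, the coordinate sum

FX, FX2 = F_shriek(F, X), F_shriek(F, X2)

# The triangle commutes before applying F_! and still commutes afterwards.
triangle_before = all(X2(f(w)) == X(w) for w in Omega)
triangle_after = all(FX2(f(w)) == FX(w) for w in Omega)
```

Nothing about $f$ changes under $F_!$, which is why identities and composites are preserved automatically.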

Finally, as I said, a probabilist may go about understanding the distribution of the random variable $F(X)$ (that is, the object $F_!(X)$ of $\mathbf{R}(E)$) by instead working with another object $Y$ in the same connected component of $\mathbf{R}(E)$. Both the assumptions on $X$ and the structure of $F$ may be used to help cook up $Y$.

This is quite different from any category-theoretic perspective I’ve ever encountered in, say, algebra or topology, but my ignorance of those fields is broad and deep. If anyone finds this kind of category-theoretic picture familiar, I’d love to hear about it!

One last observation here is that I believe (I haven’t tried writing out all the details) that the mappings

$$ E \mapsto \mathbf{R}(E), \qquad F \mapsto F_! $$

define a functor $\mathbf{Meas} \to \mathbf{Cat}$. I have no idea what, if anything, this observation may do for probability theory.

Posted at September 5, 2018 10:24 PM UTC

35 Comments & 0 Trackbacks

Re: A Categorical Look at Random Variables

It’s very nearly the case that in $\mathbf{Prob}$ every morphism is an epimorphism (because the image of a measure-preserving map has to have outer-measure $1$). So the category $\mathbf{Prob}$ reminds me of the category $\mathbf{Field}$ in which every morphism is a monomorphism.

In probability people fix a sample space $\Omega$ and study the relationships between the random variables $\Omega\to X$. We’re supposed to think of $\Omega$ as arbitrary and not very important. It exists merely to support the random variables, and it could easily be replaced by any $\Omega'$ with a map $\Omega'\to\Omega$.

Does a similar situation occur in the theory of fields? From my Galois theory classes I vaguely remember seeing arguments in which we cared about algebraic extensions of $\mathbb{Q}$, but in order to prove the desired result we had to fix an arbitrary algebraically closed extension of $\mathbb{Q}$ and consider their embeddings inside it. If I’m remembering correctly then this would be dual to the situation in probability. We don’t care about fields on their own; we only care about how their embeddings into a larger field interact.

Does this analogy hold up? Also, could we tweak the definition of $\mathbf{Prob}$ so that every morphism actually is epimorphic?

Posted by: Oscar Cunningham on September 6, 2018 12:30 AM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

I believe that with the definition I adopted here for $\mathbf{Prob}$, in which morphisms are a.e.-equivalence classes of functions, every morphism is indeed an epimorphism.

Your proposed analogy is intriguing (and I must admit that, despite having taught Galois theory several times, it never occurred to me). One of my goals in this project is to better understand what makes one part of mathematics different from other parts. Since the theory of fields is kind of weird from the perspective of other parts of algebra, this would be a very interesting way of pinpointing part of how probability is different from algebra, or how field theory is different from other parts of algebra.

Posted by: Mark Meckes on September 6, 2018 1:30 PM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

I have an example of a morphism that isn’t epic. Let $A=\{*\}$ be the probability space with one point, and let $B=\{0,1\}$ be the probability space with two points in which the only measurable sets are $\emptyset$ and $B$.

Now let $m:A\to B$ send $*$ to $0$, let $f:B\to B$ be the identity and let $g:B\to B$ be the constant $0$ function.

Then $f$ isn’t equal to $g$ almost everywhere, since they differ on $1$, which isn’t contained in any measure-zero set. But $m\circ f$ is equal to $m\circ g$, so $m$ isn’t epic.

Posted by: Oscar Cunningham on September 6, 2018 2:59 PM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

Ah, right. Maybe one could tweak things in a way that basically says “No stupid $\sigma$-algebras”, but I don’t know off-hand what version of that, if any, would do the job.

Posted by: Mark Meckes on September 6, 2018 4:44 PM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

Perhaps instead of saying that $f,g:B\to C$ are the same if they agree almost everywhere, we should instead say that $f$ and $g$ are the same whenever there is a $k:D\to B$ such that the maps $k\circ f$ and $k\circ g$ agree almost everywhere*. That definitely forces every morphism in $\mathbf{Prob}$ to be epic, while still not quotienting together any maps that we would genuinely want to be different. In particular $f$ and $g$ are distinct if they differ on a set of positive measure.

Alternatively, instead of passing to a quotient category, we could just define $\mathbf{Prob}$ to be the subcategory of the measure-preserving maps which are surjective. This would also force every morphism to be epic.

These two choices of definitions for $\mathbf{Prob}$ reminded me of Remark 2 in these notes by Terry Tao. Essentially the first definition of $\mathbf{Prob}$ is for people who think that “probability $0$” and “empty set” should be identical notions, and the second definition of $\mathbf{Prob}$ is for people who want to preserve the distinction. It’s nice that we end up with a category of epimorphisms no matter which route we take.

*[I can’t actually prove that this is an equivalence relation. Suppose we have $f,g,h:B\to C$ and $k:D\to B$, $k':D'\to B$ such that $k\circ f = k\circ g$ and $k'\circ g = k'\circ h$ almost everywhere. Then how can we conclude that $f$ and $h$ must be equivalent? It would suffice if we could prove that for any maps $k:D\to B$, $k':D'\to B$ there was an $E$ and maps $j:E\to D$, $j':E\to D'$ such that $j'\circ k' = j\circ k$ (in category theoretic language: “every cospan has a cone”). But I don’t know of any measure theory result that proves this. It is similar to disintegration.

Certainly the dual result is true in $\mathbf{Field}$. If we have a field $K$ with two extensions $L$, $L'$ then there is always a field $F$ forming a cocone of the span.]

Posted by: Oscar Cunningham on September 6, 2018 6:22 PM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

One occasionally used solution, which I’ve learned from Robert Furber, is to identify two measurable maps $f,g : A \to B$ if for every measurable $S\subseteq B$, the symmetric difference of $f^{-1}(S)$ and $g^{-1}(S)$ has measure zero. This will identify $f$ and $g$ in Oscar’s example. More generally, it implies that every morphism is epic, simply because the morphisms are measure-preserving by definition.
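To see this criterion act on Oscar’s example, here is a rough sketch for finite spaces. It assumes a $\sigma$-algebra is stored as a list of `frozenset`s and a measure as a dictionary defined on exactly those measurable sets; the function name `identified` and this encoding are choices made for the illustration only.

```python
from fractions import Fraction


def identified(points, measure, sigma_codomain, f, g):
    """True if f^{-1}(S) Δ g^{-1}(S) has measure zero for every measurable S.

    `measure` is a dict defined on the measurable subsets of the domain.
    """
    for S in sigma_codomain:
        pre_f = frozenset(w for w in points if f(w) in S)
        pre_g = frozenset(w for w in points if g(w) in S)
        if measure[pre_f ^ pre_g] != 0:
            return False
    return True


B = frozenset({0, 1})
f = lambda b: b   # identity
g = lambda b: 0   # constant 0

# Oscar's B: only the empty set and B itself are measurable.
trivial_sigma = [frozenset(), B]
trivial_measure = {frozenset(): Fraction(0), B: Fraction(1)}
oscar = identified(B, trivial_measure, trivial_sigma, f, g)

# For contrast: all subsets measurable, uniform measure.
discrete_sigma = [frozenset(), frozenset({0}), frozenset({1}), B]
discrete_measure = {
    frozenset(): Fraction(0),
    frozenset({0}): Fraction(1, 2),
    frozenset({1}): Fraction(1, 2),
    B: Fraction(1),
}
contrast = identified(B, discrete_measure, discrete_sigma, f, g)
```

With the trivial $\sigma$-algebra the two maps are identified, while with the discrete one they are kept apart, since they differ on the positive-measure set $\{1\}$.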

Posted by: Tobias Fritz on September 6, 2018 6:25 PM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

It probably also makes sense to identify probability spaces whose symmetric difference has measure zero, for that matter.

Posted by: Mark Meckes on September 12, 2018 11:46 AM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

Can you elaborate a bit on what you mean by that? What is the symmetric difference of probability spaces?

One nice feature of using a definition of $\mathbf{Prob}$ in which maps which agree almost surely are identified is that you can add or remove events of measure zero to/from a probability space, and you will get another probability space which is isomorphic to the original one as objects in $\mathbf{Prob}$. Is this what you’re after?

Since my alternative definition of $\mathbf{Prob}$ identifies in particular maps which agree almost surely, this happens with my proposed redefinition of $\mathbf{Prob}$ as well.

Posted by: Tobias Fritz on September 12, 2018 2:00 PM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

By symmetric difference I just mean the symmetric difference of the underlying sets, so that adding or removing events of measure zero to/from $\Omega$ results in a probability space $\Omega'$ which is identified with $\Omega$.

As you say, if maps which agree almost surely are identified with each other, then $\Omega$ and $\Omega'$ are already guaranteed to be isomorphic in $\mathbf{Prob}$. But as I explained in the post, there are many isomorphic probability spaces in $\mathbf{Prob}$ which should not be identified with each other in probability theory.

Posted by: Mark Meckes on September 12, 2018 2:52 PM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

From the categorical perspective, isn’t it perfectly enough to have an isomorphism $\Omega\cong\Omega'$ in such a situation rather than an on-the-nose identification? If you try to identify such $\Omega$ and $\Omega'$ from the beginning, then it will be difficult to define some version of $\mathbf{Prob}$ with those equivalence classes as objects, since it’s hard to write down a well-defined notion of morphism. But for all practical purposes, having an isomorphism $\Omega\cong\Omega'$ is perfectly enough.

Concerning your three examples of probability spaces which should not be identified, I believe that the reason that they should not be identified is that considering them as mere objects of $\mathbf{Prob}$ does not capture all of the relevant structure. Of course, this does not mean that the additional structure is not amenable to a categorical treatment! More concretely, your second and third examples are stochastic processes indexed by $\mathbb{N}$ and $\mathbb{R}_+$, respectively. So categorically, they are not just objects in $\mathbf{Prob}$, but really diagrams in $\mathbf{Prob}$, whose shape is that of a bunch of morphisms with the same domain (like a span), indexed by $\mathbb{N}$ and $\mathbb{R}_+$, respectively. (In each of your examples, these morphisms also all happen to have the same codomain, namely $\{\mathrm{heads},\mathrm{tails}\}$ or $\mathbb{R}$ respectively, but this is arguably less relevant.)

So depending on what exactly you’re interested in about those spaces, different kinds of categorical structure will be relevant. So if you want to consider the expectation value of the $[0,1]$-valued random variable in the first example, then you will have to consider $[0,1]$ as an Eilenberg-Moore algebra of a suitable probability monad on $\mathbf{Prob}$. If you also want to talk about higher moments, then the pushforwards along the power maps $[0,1]\to [0,1]$ are relevant as well in addition to the algebra structure.

In summary, the reason that those situations are different from the perspective of probability is the same reason as to why they are different from the perspective of category theory. It’s just different kinds of structure that one encounters. But most or all of that structure is amenable to a categorical treatment. (I’m not aware of anything that’s not.) In this sense, I would argue that probability theory is about the category $\mathbf{Prob}$ or variants thereof, at least when equipped with suitable additional structure such as a probability monad.

This probably agrees with your perspective and will not be new to you, but I haven’t seen it mentioned explicitly in the discussion, so I thought I might as well make it more explicit.

Posted by: Tobias Fritz on September 12, 2018 3:38 PM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

Hi Tobias, Which monad on $\mathbf{Prob}$ did you have in mind? In Alex Simpson’s work there is a nice monad on probability sheaves (e.g. Sec 5 here). I think the Kleisli maps between representables are spans, or perhaps “couplings”, modulo …. So it matches what you informally said, although I don’t see that it is a monad on $\mathbf{Prob}$ itself. Is this roughly what you had in mind?

Posted by: Sam Staton on September 13, 2018 7:05 AM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

Sam, sorry, I was speaking too informally and said something stupid. What I have in mind is the Giry monad, and even more so its many cousins on other categories of spaces, definitely spaces that are not yet equipped with measures. One then obtains (variants of) $\mathbf{Prob}$ by taking the category of elements of the underlying functor of the monad.

In probability theory, one is very often interested in weak convergence of probability measures. In order to capture this categorically, taking the category of spaces to contain mere measurable spaces is not enough; one also needs some topological or metric structure to talk about convergence, and the morphisms should preserve (at least some of) this structure. The most well-behaved probability monads that I’m aware of are the Radon monad on compact Hausdorff spaces and the Kantorovich monad on complete metric spaces. Using a suitable monad of this type, it should be possible to generalize the Kolmogorov extension theorem to a theorem saying that the underlying functor of the monad commutes with filtered limits. (And the Kolmogorov extension theorem lies at the foundation of the theory of stochastic processes. The point is that an infinite product space is the filtered limit of its finite subproduct spaces.)

Unfortunately I don’t know much about probability sheaves. From my naive understanding, I’m worried that there may be many probability sheaves that are not interesting or meaningful from the probability theory perspective. Intuitively it feels like probability sheaves form a category which is rather large. Do you happen to have any insight into this?

Posted by: Tobias Fritz on September 13, 2018 3:00 PM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

Thanks Tobias. Regarding convergence, I’ve been thinking about this quite a bit too. For example, since you mention filtered limits, and Stone spaces are the free profinite completion of finite sets, I suspect that there is a Radon monad on Stone spaces that is a canonical filtered-limit-preserving extension of the finite probability monad, so that Kolmogorov extension becomes more of a definition than a theorem. I couldn’t find this explicit in the literature but perhaps you know it. (Or perhaps it doesn’t work!)

Regarding probability sheaves, I think I understand how you feel: some objects are like $\mathrm{RV}(E)$, some are like probability spaces, but what about arbitrary objects? Perhaps it is an instance of nice objects versus nice categories! It does seem to be a very nice category. For example, coming back to the other point, Alex Simpson relates dependent choice to Kolmogorov extension.

Posted by: Sam Staton on September 14, 2018 8:45 PM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

I like the dichotomy between nice objects and nice categories. In the standard examples like schemes or generalized smooth spaces, there are also plenty of non-nice objects that come up in nature, such as $\mathrm{Spec}(\mathbb{Z})$ in the scheme case or orbifolds in the generalized smooth space case, and this is a good argument for working with those particular nice categories. So are there any non-nice probability sheaves that you’ve come across in nature?

Posted by: Tobias Fritz on September 16, 2018 11:06 AM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

Thanks for your comment, Tobias. It really gets at everything that I’ve been struggling to sort out for myself here.

Regarding your first paragraph:

From the categorical perspective, isn’t it perfectly enough to have an isomorphism $\Omega \cong \Omega'$ in such a situation rather than an on-the-nose identification?

But for all practical purposes, having an isomorphism $\Omega \cong \Omega'$ is perfectly enough.

The issue is, “perfectly enough” for which “practical purposes”? The point I was making with the three examples was that isomorphic probability spaces are not necessarily “the same” from the point of view of probability theory. However, spaces which are equal up to events of measure zero are “the same”. So isomorphism is not enough to describe “sameness” in this sense.

I think your second paragraph points toward the right solution to this conundrum. To add one more detail to what you said: isomorphism really is “perfectly enough” to describe sameness for the domains of random variables. But the codomains need to be thought of as more than just objects of $\mathbf{Prob}$. And it’s in trying to describe that extra structure in a categorical way that my meager knowledge of category theory gives out. But you’ve given me some good words to go look up and move forward with.

So yes, everything you said agrees with my perspective, but some of it is new to me.

(One more comment as an aside: it is absolutely irrelevant that my three examples involved families of morphisms with the same codomain; that was merely for simplicity’s sake.)

Posted by: Mark Meckes on September 12, 2018 8:43 PM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

I’m glad you’ve found my comment useful. If you or anyone else is looking for an introduction to probability monads, there’s a very nice exposition in Chapter 1 of Paolo’s thesis.

Posted by: Tobias Fritz on September 13, 2018 2:38 PM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

What a load of category theory.

Posted by: df on September 6, 2018 6:12 AM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

Thank you Mark for your post. I have also been thinking for a while about developing a more categorical understanding of random variables, with a very similar picture in mind. So let me share some of my thoughts (quite preliminary) on that topic. One way to formulate the notion of random variable is to start from the forgetful functor

$$ p : \mathbf{Prob}\longrightarrow\mathbf{Meas} $$

Here, I define $\mathbf{Prob}$ as the category whose objects are probability spaces (measure spaces with total measure $1$) and whose morphisms are measure-preserving maps, and $\mathbf{Meas}$ as the category whose objects are measurable spaces, and whose morphisms are measurable functions. In recent work with Noam Zeilberger, we develop the idea, inspired by type theory and traceable back to Jean Bénabou, that every functor

$$ p : \mathcal{E}\longrightarrow\mathcal{B} $$

defines what we call a refinement system. The intuition is that every object $R$ of the total category $\mathcal{E}$ is a refinement of its image $A=p(R)$ in the basis category $\mathcal{B}$. In other words, and in the passive mode, every object $A$ of the basis category $\mathcal{B}$ is “refined” by the elements $R$ of its fiber $\mathcal{E}_A$ along the functor $p$.

Now, the nice thing is that in this basic setting, one may define a left (and lax) notion of refinement as follows. A left refinement of an object $A$ in the basis category $\mathcal{B}$ is defined as a pair $(R,f)$ consisting of an object $R$ in the total category $\mathcal{E}$ and of a morphism $f:p(R)\to A$. Details of the construction appear in our paper with Noam, An Isbell Duality Theorem for Type Refinement Systems, where what I call left refinement here is called positive representation there (see Def. 3.3).

Now, coming back to our original discussion, let us consider again the forgetful functor

$$ p : \mathbf{Prob}\longrightarrow\mathbf{Meas} $$

but this time as a refinement system. It appears then that a random variable

$$ X : \Omega\longrightarrow E $$

on a measurable space $E$ may be simply defined as a left refinement of the measurable space $E$, consisting of a probability space $(\Omega,\mathbb{P})$ in the total category $\mathbf{Prob}$ together with a measurable function

$$ X : p(\Omega,\mathbb{P})\longrightarrow E $$

in the basis category. This is just another way to formulate random variables in a categorical language. The idea of an object $R$ refining an object $A$ up to a morphism $f:p(R)\to A$ is obviously reminiscent of the notion of homotopy fiber in algebraic topology, and it is thus nice to see it at work at the very heart of probability theory. And I fully agree with Mark (thank you for your post again!) that much remains to be done in order to connect the two fields more seriously, besides these preliminary observations.

Posted by: Paul-Andre Mellies on September 6, 2018 10:50 AM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

That’s very interesting, thanks! Is your comment about homotopy fibers the same thing Simon mentions below?

Posted by: Mark Meckes on September 6, 2018 4:54 PM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

Finally, as I said, a probabilist may go about understanding the distribution of the random variable $F(X)$ (that is, the object $F_!(X)$ of $\mathbf{R}(E)$) by instead working with another object $Y$ in the same connected component of $\mathbf{R}(E)$. Both the assumptions on $X$ and the structure of $F$ may be used to help cook up $Y$.

Can you give any (very basic!) examples of this? The one thing that it vaguely reminded me of was in homotopy theory when you use a (co)fibrant replacement (e.g. a nice resolution) which is equivalent in an appropriate sense to the original object.

Posted by: Simon Willerton on September 6, 2018 11:19 AM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

Can you give any (very basic!) examples of this?

Sure, if you could also return the favor and explain (and maybe give some very basic examples of) this:

The one thing that it vaguely reminded me of was in homotopy theory when you use a (co)fibrant replacement (e.g. a nice resolution) which is equivalent in an appropriate sense to the original object.

Here’s the most basic example I can think of. The same thing can be done more simply by other means, but it serves to illustrate the phenomenon. I’ll first write as a probabilist would:

Let $X$ be an integrable real-valued random variable whose distribution is symmetric; that is, $-X$ has the same distribution as $X$. Let $Y$ be another random variable, independent of $X$, which has the values $\pm 1$ each with probability $1/2$. Then $X$ has the same distribution as $X Y$. By independence it follows that

$$ \mathbb{E} X = \mathbb{E} (X Y) = (\mathbb{E} X) (\mathbb{E} Y) = (\mathbb{E} X) \cdot 0 = 0. $$

(Here $\mathbb{E}$ denotes expectation, i.e., the integral with respect to the probability measure on the domain of $X$.)

Here’s what’s going on in detail in terms of the category-theoretic picture. We start with a random variable $X:\Omega \to \mathbb{R}$ (an object in $\mathbf{R}(\mathbb{R})$), and assume that $X$ is integrable and symmetric. These are assumptions about the distribution, so they specify a subcollection of connected components of $\mathbf{R}(\mathbb{R})$ in which $X$ lies. I don’t currently have a way of explaining integrability nicely in this picture, but symmetry means that if $F:\mathbb{R} \to \mathbb{R}$ is the function $F(x) = -x$, then the object $F_!(X)$ lies in the same connected component of $\mathbf{R}(\mathbb{R})$ as $X$.

Now I define a new random variable $Z:\Omega \times \{-1, 1\} \to \mathbb{R}^2$ by $Z(\omega, y) = (X(\omega), y)$, and equip $\Omega \times \{-1, 1\}$ with the probability measure which is the product of $\mathbb{P}$ (the original probability measure on $\Omega$) and the uniform probability measure on $\{-1,1\}$. Denote by $\pi_i:\mathbb{R}^2 \to \mathbb{R}$ the coordinate projection maps for $i=1,2$. Then $Y := \pi_2(Z)$ has the claimed distribution; $X' = \pi_1(Z)$ and $Y$ are independent; and $X$, $X'$, and $X' Y$ all have the same distribution (lie in the same component of $\mathbf{R}(\mathbb{R})$).

Then

$$ \mathbb{E} X = \mathbb{E} X' = \mathbb{E} (X' Y) = (\mathbb{E} X') (\mathbb{E} Y) = (\mathbb{E} X') \cdot 0 = 0. $$

Note that the probabilist’s way of writing not only elides the fact that the domain of the random variable changes; it also ignores the fact that when that happens, you technically must be working with a different random variable.
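The change of domain can be made completely explicit in a finite toy case. The sketch below assumes a four-point $\Omega$ with a symmetric distribution chosen purely for illustration; the helper names `pushforward` and `expectation` are hypothetical, and `X1` plays the role of $X'$.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward(P, X):
    """Distribution of X under the finite measure P (dict of point masses)."""
    mu = defaultdict(Fraction)
    for w, p in P.items():
        mu[X(w)] += p
    return dict(mu)


def expectation(P, X):
    """Integral of X against the finite measure P."""
    return sum(p * X(w) for w, p in P.items())


# Hypothetical symmetric X on a four-point Omega.
P = {-2: Fraction(1, 8), -1: Fraction(3, 8), 1: Fraction(3, 8), 2: Fraction(1, 8)}
X = lambda w: w

# New domain Omega x {-1, 1} with the product measure.
P_prod = {(w, y): p * Fraction(1, 2) for w, p in P.items() for y in (-1, 1)}
Z = lambda wy: (X(wy[0]), wy[1])
X1 = lambda wy: Z(wy)[0]            # X' = pi_1(Z)
Y = lambda wy: Z(wy)[1]             # Y  = pi_2(Z)
X1Y = lambda wy: X1(wy) * Y(wy)     # the product X' Y

same_component = (
    pushforward(P, X) == pushforward(P_prod, X1) == pushforward(P_prod, X1Y)
)
```

Note that `X` and `X1` are different functions on different domains, even though they share a distribution, which is the point made in the paragraph above.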

Posted by: Mark Meckes on September 6, 2018 2:47 PM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

I just realized I rather unkindly changed the meanings of all the letters around. $Y$ in the post corresponds to either $X' = \pi_1(Z)$ or $X' Y$ in the comment, and $F$ in the post corresponds to the unmentioned identity function in the comment. $\pi_2$ and $Z$ don’t correspond to anything named explicitly in the post; they’re part of the cooking up of $Y$ in the context of the post.

Posted by: Mark Meckes on September 6, 2018 4:50 PM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

I just spotted a typo in the last paragraph: $\mathbf{Prob}$ should be $\mathbf{Meas}$ there. (Maybe some moderator who sees this could fix it.)

Posted by: Mark Meckes on September 6, 2018 1:24 PM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

Fixed.

Posted by: Tom Leinster on September 13, 2018 7:14 PM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

Thank you Mark for the very interesting post!

I see that most comments are about your (nice!) idea about expressing random variables in terms of the slice category construction. I would like to address more, instead, what you say about applying category theory to probability, since I kinda share your concern.

You make a very good point: probability theory is not about probability spaces, nor about spaces of probability measures.

In order for category theory to be helpful to other branches of math, one does not just need to express the “foundational” part in terms of categories. That is of course important, but it is only setting the stage. In every math book, that’s usually Chapter 0. What about the rest of the book?

I believe that, in order to seriously apply category theory to probability (the way it is applied, for example, to algebraic geometry and topology), the first question that needs to be asked is, rather: what are the problems that are studied in probability theory and related fields? What are the main results? Sometimes, by passing to the categorical formalism, the objects change radically. Sometimes results become definitions: think of the definitions of boundary and coboundary map in algebraic topology and homological algebra.

So, it’s true, the objects that probability theorists use are mostly random variables. But I don’t think that those should necessarily be the main objects of a categorical treatment, nor should one focus on “what the main objects are”. What are probability theorists saying in terms of random variables? And why do they express it in terms of random variables?

Here is one possible answer, out of many. The reason why people use random variables, as opposed for example to just their laws (probability distributions), is that random variables may interact, i.e. have “correlation”. This is what makes probability theory so interesting. For example, in order for the law of large numbers to hold, the trials should not only be identically distributed, but also stochastically independent. In order to express the same concept there are many different alternatives: people in optimal transport prefer using directly joint distributions (which they call couplings). People in dynamical systems prefer talking about stochastic maps, or Markov kernels, which again may exhibit statistical dependence or not. The same concept, stochastic interaction, can be expressed in many ways, and the reason why people in different fields use different formalisms is largely historical and sociological.

(Of course, how to express random variables categorically remains an interesting question - especially since it seems to have analogies to other parts of math!)

Just like stochastic interaction, another key concept in probability theory seems to be that of average, or expectation. This, thankfully, has been addressed a bit more by category theorists, since it is the basic idea behind probability monads and their algebras. While probability monads at the present time are mostly only “setting the stage”, I think their introduction is going already in the right direction. (But of course I’m biased here, since that’s part of my work.)

That said, of course, there is also work to be done at the foundational level. For example, I find that from the categorical perspective, the concept of valuation has better properties than the usual measures. But that’s another long story, so I think I’ll stop here :)

Posted by: Paolo Perrone on September 6, 2018 5:44 PM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

Mark and I had a nice conversation about this last week, much of which centered on the larger issue of what the goal of a categorical perspective on probability theory would look like and what it would do. We talked a lot about how developments in homotopy theory led people to consider different categorical formulations of what the subject was even about and how that was useful in solving problems or making folk statements concrete. I think this last point (making folk statements concrete) is easy to overlook, but often a good reason to modify your foundations in some way.

Posted by: Nick Gurski on September 6, 2018 9:11 PM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

This is a very good point - and making folk statements concrete is a very important task for structuralists. It is generally a sign of good theory-building. This is also one of the reasons why a structural perspective doesn’t necessarily “abstract the matter away from the practice”.

Posted by: Paolo Perrone on September 9, 2018 4:29 PM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

This is maybe a good spot to bring up Tim Gowers’s essay “The Two Cultures of Mathematics” (which has been discussed here before). Part of my interest here is in trying to chip away at the barrier between theory-building and problem-solving cultures, or to better understand the nature of the barrier in the process.

Part of the difference appears to be that some fields have found that making folk statements precise has been extremely fruitful, whereas other fields (Gowers discusses mainly his own areas of combinatorics and Banach space geometry here) have powerful general principles that resist being summarized in theorems that are simultaneously precise enough to be theorems and general enough to be useful.

Posted by: Mark Meckes on September 11, 2018 2:52 PM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

As a category theorist who studies probability, I’m always amazed by how little the communities of probability and category theorists know about each other’s fields (with a few notable exceptions). When I talk to a category theorist, often I have to explain the basic idea of the law of large numbers, and when I talk to a probabilist, I often have to explain the basic idea behind the concept of functor. In my (limited) experience, both understand the new concepts well, provided that they are presented in a way which is compatible with their way of thinking. I think that part of this divide can be explained by the way the basic concepts in the respective fields are taught, which in turn has historical and sociological reasons, not necessarily pertaining to the mathematics itself. Of course people in different fields think differently, and people who think differently tend to go to different fields. But I find it at least curious that, for example, whenever I explain the concept of a functor to a probability theorist, they generally come up with examples of functors in probability themselves. Analogously, most category theorists seem to understand how Markov kernels work if I present them as Kleisli morphisms of a probability monad.

Whatever the reason for this divide, I would also like to see less of it. Us-versus-them attitudes are usually far from fruitful, in mathematics as well as elsewhere.
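
For readers on the probability side, the phrase “Kleisli morphisms of a probability monad” can be made concrete in a toy setting. The sketch below assumes the monad of finitely supported distributions on sets (a much simpler stand-in for the monads used in practice), and the two-state `step` kernel is a hypothetical example, not something from the discussion above.

```python
from collections import defaultdict
from fractions import Fraction


def unit(x):
    """Monad unit: the point mass at x."""
    return {x: Fraction(1)}


def bind(dist, kernel):
    """Average the distributions kernel(x) against the weights of dist."""
    out = defaultdict(Fraction)
    for x, p in dist.items():
        for y, q in kernel(x).items():
            out[y] += p * q
    return dict(out)


def kleisli(first, second):
    """Kleisli composition of two kernels: apply `first`, then `second`."""
    return lambda x: bind(first(x), second)


def step(state):
    """Hypothetical Markov kernel on the two states 'sun' and 'rain'."""
    if state == "sun":
        return {"sun": Fraction(3, 4), "rain": Fraction(1, 4)}
    return {"sun": Fraction(1, 2), "rain": Fraction(1, 2)}


# Two steps of the chain are the Kleisli composite of `step` with itself.
two_steps = kleisli(step, step)
```

In this finite setting a kernel is a function from states to distributions, and Kleisli composition is the familiar multiplication of transition matrices, with `unit` playing the role of the identity matrix.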

Posted by: Paolo Perrone on September 11, 2018 4:34 PM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

I agree that the focus here should be on what goes on in probability theory: what are the theorems, and what are the techniques? After all (as you and Nick allude to), you can change what you think the central objects are, sometimes quite fruitfully. In fact, though I didn’t discuss this in the post, my criterion for a perspective I’m happy with is precisely whether it provides a reasonable way of looking at the main techniques and theorems of probability theory.

I don’t find your argument for random variables over distributions very compelling, since it’s easy enough to address by saying, “Don’t look at just the individual distributions, look at the joint distribution of all the random variables.” (This is closely related to what I said in another comment below.) And indeed some of the hypotheses about interactions of random variables are easier to state in terms of measures. For example, you could say the classical limit theorems like the law of large numbers are theorems about product measures.

What I do find more persuasive is the following argument. As I described in the post, the structure of a typical theorem in probability is:

Start with a collection of random variables $X_i$ defined on a common domain, apply a function $f:\prod_i E_i \to E$, and look at the distribution of $f(\{X_i\})$.

You can try to phrase this in terms of distributions only:

Start with a probability distribution $\mu$ on a set $\Omega$ (i.e., the joint distribution of the $E_i$), apply a function $f:\Omega \to E$, and look at the push-forward $f_\# \mu$.

But even though the phrase “random variable” wasn’t mentioned, there’s still one right there: $f$ itself.

The point is that the theorems of probability theory are unavoidably about random variables. As you say, that doesn’t necessarily mean that random variables have to be the main objects of a structuralist perspective on probability theory. And it certainly doesn’t mean that a categorical approach to probability has to involve a category whose objects are random variables. But it does suggest that random variables are the right starting point.

Posted by: Mark Meckes on September 7, 2018 10:16 PM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

Right, your argument for using random variables is more compelling than mine :)

It is true, while correlation or “interactions” can be taken care of by looking at joints, the idea of a random variable in the sense of a “measurable map” (along which one pushes forward a probability distribution) is one of the central concepts. I agree that any structural treatment should contain some version of this idea.

Posted by: Paolo Perrone on September 9, 2018 4:35 PM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

Hi Mark: glad to see some people with working experience of probability theory are still interested in this.

I can’t remember if I already brought this up on those old MathOverflow posts/discussions, but is there a good rejoinder to the POV set out in the preface to Williams’s Probability with Martingales, some of which I quote below:

At the level of this book, the theory would be more ‘elegant’ if we regarded a random variable as an equivalence class of measurable functions on the sample space, two functions belonging to the same equivalence class if and only if they are equal almost everywhere…

I have however chosen the ‘inelegant’ route… I hope that this book will tempt you to progress to the much more interesting, and much more important, theory where the parameter set of our process is uncountable… the equivalence-class formulation just will not work: the ‘cleverness’ of introducing quotient spaces loses the subtlety which is essential even for formulating the results on existence of continuous modifications, etc.

Apologies if this is orthogonal to what you were considering in your post, but I think it is a genuinely interesting issue that any categorical approach to probability theory has to reckon with.

Posted by: Yemon Choi on September 6, 2018 6:23 PM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

Hi, Yemon. Thanks for your comment. I think it’s actually very much on point.

My rejoinder is that when you’re working with a stochastic process $\{X_t : t \in T\}$ indexed by some parameter set $T$, you shouldn’t think of it as a collection of random variables, each of which is an equivalence class. Rather, it is an equivalence class of random functions $X(\omega):T \to E$, that is, a random element of $E^T$. The tricky point is that

For every $t \in T$, for almost every $\omega$, $X_t(\omega) = X_t'(\omega)$

does not imply

For almost every $\omega$, for every $t \in T$, $X_t(\omega) = X_t'(\omega)$

(There’s an existential quantifier implicit in “for almost every”!) The first statement is what you’d think of as equivalence for each of a collection of random variables; the second is equivalence of random functions.
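
A standard example shows how badly the implication can fail. It is not taken from Williams’s book or from the post; it assumes $\Omega = T = [0,1]$ with Lebesgue measure and $E = \mathbb{R}$.

```latex
\[
  X_t(\omega) = 0,
  \qquad
  X_t'(\omega) =
  \begin{cases}
    1 & \text{if } \omega = t, \\
    0 & \text{otherwise,}
  \end{cases}
  \qquad \omega, t \in [0,1].
\]
% For each fixed t the two variables differ only on the null set {t}:
\[
  \mathbb{P}\bigl(\{\omega : X_t(\omega) \neq X_t'(\omega)\}\bigr)
  = \mathbb{P}(\{t\}) = 0 .
\]
% But every omega is exceptional for the index t = omega:
\[
  \{\omega : X_t(\omega) = X_t'(\omega) \text{ for every } t \in T\}
  = \emptyset .
\]
```

So the first statement holds for every $t$, while the set on which the second statement asks for agreement is empty. The uncountably many null sets $\{t\}$ have union $[0,1]$, which is why the problem only appears for uncountable parameter sets.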

I think Williams understands this very well and decided to handle things differently for pedagogical reasons. (And some authors don’t deal with the issue at all.)

I didn’t want to dwell on this issue in the post, either, but it’s hinted at in the fact that in order to talk about a family of random variables (i.e., a stochastic process), I worked with a “random variable” whose codomain is a product set; thus, a random function. But now that I think about it, Williams’s point about uncountable parameter sets may indicate that my approach of treating morphisms in $\mathbf{Prob}$ as equivalence classes could run into trouble down the line.

Posted by: Mark Meckes on September 6, 2018 10:50 PM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

Nice post! Just to point out a couple of points in the literature (which you may already be aware of).

The first and most relevant is I think Alex Simpson’s work on probability sheaves. He finds profit in thinking of random variables as sheaves $\mathbf{Prob}^\mathrm{op}\to \mathbf{Set}$ and then studying them within that sheaf topos. See for instance Probability Sheaves and the Giry Monad, Category-theoretic Structure for Independence and Conditional Independence, and various online talks and slides. His sheaves are very similar to your $\mathbf{RV}(E)$, as far as I can tell, perhaps via discrete fibrations.

The second point is our idea of quasi-Borel spaces. One starting point for that is the idea that the $\sigma$-algebra on $E$ is not as important as the random variables. So we define a quasi-Borel space to be a set $E$ together with a set $M\subseteq [\Omega\to E]$ from some fixed $\Omega$, thought of as the admissible random variables. In many examples $M$ comprises the measurable functions, but other instances are also useful, especially to get Cartesian closure.

Posted by: Sam Staton on September 9, 2018 8:44 AM | Permalink | Reply to this

Re: A Categorical Look at Random Variables

It may be relevant to note that Juan Pablo Vigneaux will be giving a talk « Une introduction à la topologie de l’information » based on his recent article (https://arxiv.org/abs/1709.07807).

The talk is in Francois Metayer’s ‘groupe de travail’ on Friday 28th September in Paris. Contact Francois if you are interested in attending.

Posted by: Tim Porter on September 17, 2018 2:29 PM | Permalink | Reply to this
