On the Law of Large Numbers (Such As 60)

December 20, 2011

On the Law of Large Numbers (Such As 60)

Posted by Tom Leinster

$MathML-enabled post (click for more details).$

I spent yesterday morning in the computer science department at Strathclyde, for the 60th birthday celebrations of Peter Hancock. Often these events are months away from the person’s actual birthday, but in this case it was only one day off. Happy birthday, Hank!

One of the talks I went to was by Alex Simpson, the title of which was too good not to use as the title of this post. Simpson’s recent work has re-examined foundational questions about probability and randomness. It’s also the most exciting use of locales I think I’ve ever encountered. I want to tell you something about it here.

$MathML-enabled post (click for more details).$

Let me begin by making clear that all mistakes here are definitely mine. The chances of mistakes are higher than usual, as I don’t understand this stuff nearly as well as I’d like to. In particular, Simpson’s work has made me realize that my understanding of locales is not as good as I’d like. If you want correct, precise statements, you should read his paper Measure, randomness and sublocales, which covers much of what I’m going to say.

We’ll get to locales in a bit. First, though, a basic question:

What is a random sequence?

If someone tells you that the binary digits of $\pi$ are thought to behave like a random sequence, you know roughly what they mean, even though the binary digits of $\pi$ aren’t random at all. Probably they’re using “random” to include properties such as “there are as many 0s as 1s” or “the time you have to wait before the appearance of a given $n$ -digit block is on average the same as it would be for a random sequence” — that kind of thing. There’s some intuitive understanding here, perhaps one that can be made precise in several different ways, but an intuition all the same.

Simpson takes as his model scenario an infinite sequence of fair coin tosses. The set of all sequences of coin tosses is

$2^\mathbb{N} = 2 \times 2 \times \cdots,$

where $2$ is a two-element set. As a coin belonging to a theoretical computer scientist, its faces are labelled $0$ and $1$ , not heads and tails; so the two elements of $2$ will be called $0$ and $1$ .

We’ll need two pieces of structure on the set $2^\mathbb{N}$ .

Topology Give $2 = \{0, 1\}$ the discrete topology, and give $2^\mathbb{N}$ the resulting product topology. The topological space $2^\mathbb{N}$ is then called the Cantor set (or more logically, Cantor space). People often think of the Cantor set as embedded in $\mathbb{R}$ , via the map sending $x = (x_0, x_1, \ldots)$ to $\sum 2x_n 3^{-(n + 1)}$ , but we won’t need that viewpoint here.

Simpson emphasized the following influential idea: in informatic terms, open subsets can be regarded as observable properties. I don’t know much about the history of this idea. Alex told me afterwards that it was made very explicit in papers of Mike Smyth, under the name of “semidecidable property”. The name “observable property” is due to Samson Abramsky, I believe.

The Cantor set provides a good example. Imagine someone tossing a coin repeatedly, while you stand there watching. Because you’re a finite creature, all you can observe is how the sequence of coin-tosses starts. Now, a subset $U$ of $2^\mathbb{N}$ is open if and only if it has the following property: for all $x = (x_0, x_1, \ldots) \in U$ , there exists $n \in \mathbb{N}$ such that

$\{ y \in 2^\mathbb{N} : (x_0, \ldots, x_n) = (y_0, \ldots, y_n)\} \subseteq U.$

That’s just how the product topology is defined, but you can see that it also embodies the idea that a set is open if it consists of all sequences satisfying some observable property.

Measure The Cantor set carries a canonical measure. It’s the unique probability measure with the obvious symmetry and self-similarity properties. For example, it gives measure $1/4$ to the set of all sequences beginning $10$ . I’ll call this measure $\lambda$ .

Aside As far as a measure theorist is concerned, $2^\mathbb{N}$ with its measure $\lambda$ is the same as $[0, 1]$ with Lebesgue measure. Indeed, there’s a surjection $f: 2^\mathbb{N} \to [0, 1]$ sending a sequence $x$ to the real number with binary expansion $0.x_0 x_1 x_2 \ldots$ . It’s not quite injective, because numbers of the form $m/2^n$ have more than one binary expansion. On the other hand, the set of such numbers has measure zero, and every other real number has just one binary expansion. So from a measure-theoretic point of view, $f$ might as well be a bijection. Also, $f$ turns the measure $\lambda$ on $2^\mathbb{N}$ into Lebesgue measure on $[0, 1]$ . For example, the image under $f$ of the set of sequences beginning $10$ is $[1/2, 3/4]$ , which has measure $1/4$ . So you could replace $2^\mathbb{N}$ by $[0, 1]$ if you wanted.

Now let’s return to the question:

What is a random sequence?

Here’s Simpson’s bold idea. We think of a sequence as ‘random’ — remember the example of $\pi$ — if it has all ‘generic’ properties. In other words, a sequence is random if it belongs to all subspaces of $2^\mathbb{N}$ of measure one. So, he defines the space $\mathbf{Ran}$ of random sequences to be the intersection of all measure-one subspaces of $2^\mathbb{N}$ .

This is so bold that it’s obviously wrong. No sequence has all generic properties: any given sequence $x$ , no matter how ‘random’ it might look, fails to have the generic property of not being equal to $x$ . That is, any given sequence $x \in 2^\mathbb{N}$ fails to belong to the measure-one subspace $2^\mathbb{N} \setminus \{x\}$ .

But this doesn’t deter Simpson, because he knows the following: just because a space has no points doesn’t mean it’s trivial.

If you know about locales, you’ll have guessed that they’re how Simpson makes sense of this. If you don’t know about locales, keep reading anyway, because I’ll tell you now.

The idea is childishly simple. In a topological space, the open sets form a partially ordered set with respect to inclusion. This poset has a few properties: e.g. every pair of elements in the poset has a greatest lower bound, since the intersection of two open sets is open. A locale is a poset with these properties.

Why bother? Partly because of applications like the one I’m about to tell you, but also for much more basic reasons. Many mathematicians are unhappy with topological spaces in which the points can’t be well separated by open sets (e.g. non-Hausdorff spaces). There’s an instinct that that’s not in the true spirit of topology. For example, indiscrete spaces are non-separated in the most extreme way. To some extent, locales relieve this problem: e.g. all nonempty indiscrete spaces give rise to the same locale, the two-element totally ordered set.

Although points play no part in the definition of locale, there is a definition of ‘point of a locale’. That is, every locale has an underlying set of points. I won’t tell you how what the definition is, but as a sanity check, if you start with a Hausdorff topological space $X$ , the points of the resulting locale are exactly the points of $X$ .

Now here’s the crucial thing: there are useful locales with no points at all. For example, there’s a ‘locale of surjections $\mathbb{N} \to \mathbb{R}$ ’. Without saying anything about what that locale is, it’s clear that it can’t have any points.

Simpson’s locale $\mathbf{Ran}$ of random sequences, which I’ll define in a moment, is another locale with no points. No individual sequence is random, but there’s a meaningful space of random sequences.

By definition, $\mathbf{Ran}$ is the intersection of all measure-one open sublocales of $2^\mathbb{N}$ . It’s also equal to the intersection of all measure-one not-necessarily-open sublocales of $2^\mathbb{N}$ . To understand those statements, you evidently need to understand what ‘sublocale’ means, but you can appreciate the idea anyway: a sequence is random if it’s contained in every measure-one subspace.

Another way to say it is that

$\mathbf{Ran} = Open(2^\mathbb{N})/\sim.$

Here $Open(2^\mathbb{N})$ is the poset of open subsets of $2^\mathbb{N}$ , and the equivalence relation $\sim$ is given by $U \sim V$ if and only if $\lambda(U \Delta V) = 0$ . Remember that $\lambda$ is the canonical measure on $2^\mathbb{N}$ . Also, I’m using $\Delta$ to denote symmetric difference. So this is a standard idea in measure theory: identify two sets if their symmetric difference has measure zero.

I still haven’t got to the Law of Large Numbers — henceforth, the LLN.

For a sequence $x$ of zeros and ones, the LLN says that

$\lim_{n \to \infty} \frac{1}{n} \sum_{i = 0}^n x_i = \frac{1}{2}.$

Do all random sequences satisfy the LLN? In other words, is the set of sequences satisfying the LLN the whole of the locale $\mathbf{Ran}$ ?

Well… we have a problem. The set $L$ of sequences satisfying the LLN is some subset of $2^\mathbb{N}$ , and not even an open subset at that. It doesn’t make sense to ask whether it’s the whole of $\mathbf{Ran}$ . Quite simply, $L$ and $\mathbf{Ran}$ are objects of different types. So how do we begin to compare them?

Much of Simpson’s talk was about this. He took advantage of the fact that $L$ is a Borel set in $2^\mathbb{N}$ . Recall that in any topological space, the class of Borel sets is, by definition, the smallest class of subsets containing all the open sets and closed under complements and countable unions. The Borel sets are, therefore, also closed under countable intersections. Now, unwinding the definition of limit, the set $L$ of sequences satisfying the law of large numbers is

$L = \bigcap_{\varepsilon \gt 0} \bigcup_{N \in \mathbb{N}} \bigcap_{n \geq N} \Bigl\{ x \in 2^\mathbb{N} : \Big| \frac{1}{n} \sum_{i = 0}^n x_i - \frac{1}{2} \Big| < \varepsilon \Bigr\}.$

This doesn’t quite look like a Borel set, because the first intersection is over an uncountable collection of $\varepsilon$ s. On the other hand, it makes no difference if we restrict $\varepsilon$ to lie in the countable set $\{1, 1/2, 1/3, \ldots\}$ , say. So $L$ is Borel.

I said earlier that points play no part in the definition of locale, but you can still define ‘point of a locale’. Similarly, Borel sets play no part in the definition of locale, but you can still define ‘Borel set in a locale’. In particular, this gives the notion of ‘Borel set in $\mathbf{Ran}$ ’. There’s a particular Borel set in $\mathbf{Ran}$ consisting of all the sequences satisfying the LLN. I’ll call that Borel set $L$ , too.

The question is, then: is $L = \mathbf{Ran}$ ?

Simpson stated an informal version of this question near the beginning of his talk, and spent most of the middle part setting up the language to state it precisely. It was only in the last few minutes that he answered it. Like a Hollywood movie whose ending is obvious to everyone in the cinema, it was clear that he wouldn’t have bothered doing all this if the answer was no. So a ripple of surprise went round the room when he told us that the answer is, in fact, no.

What’s going on? The moral, he said, is that the LLN is not derivable from observable properties of randomness. You can only observe a sequence of coin tosses for a finite amount of time, and that tells you nothing about its long-term behaviour. You can’t gather any evidence about whether the sequence will satisfy the LLN. This relates to the fact that the open sets in $2^\mathbb{N}$ , which make up our locale, correspond to the observable properties of sequences.

(Actually, when he told us the ‘moral’, he mis-spoke and called it the ‘morel’, to rhyme with Borel. This is, as someone in the audience pointed out, a kind of mushroom.)

The very end of the talk was a precise statement of the following slogan: LLN is ‘true’ in the sense that it’s consistent with observable facts. As I understand it, this is the strongest positive statement that could possibly be made once you’ve understood the limited power of finite-time observation. But for the details of that, you might have to wait for Alex’s next paper.

Posted at December 20, 2011 11:49 PM UTC

TrackBack URL for this Entry: https://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/2481

19 Comments & 0 Trackbacks

Re: On the Law of Large Numbers (Such As 60)

$MathML-enabled post (click for more details).$

For no good reason at all, I was looking at this interesting webpage which includes, among other things, the alarming pieces of Haskel

> forsome, forevery :: (Cantor -> Bool) -> Bool
> find :: (Cantor -> Bool) -> Cantor

…

> find = find_i

> forsome p = p(find(\a -> p a))
> forevery p = not(forsome(\a -> not(p a)))

and

> find_i :: (Cantor -> Bool) -> Cantor
> find_i p = if forsome(\a -> p(Zero # a))
>            then Zero # find_i(\a -> p(Zero # a))
>            else One  # find_i(\a -> p(One  # a))

… The internet rhymes, and that is nifty.

Posted by: Jesse C. McKeown on December 21, 2011 4:45 AM | Permalink | Reply to this

Re: On the Law of Large Numbers (Such As 60)

Nice post!

Tom wrote:

open subsets can be regarded as observable properties.

The idea that open subsets can be regarded as ‘stable’ properties — properties which when true don’t become false under tiny perturbations — is both obvious and widely discussed.

In physics it shows up as part of a long conversation about what’s a physically meaningful statement: for example, what would it mean to say the length of my desk is a rational (or irrational) number of centimeters?

From this viewpoint, climbing up the Borel hierarchy means working with more and more idealized sorts of propositions, which are further and further removed from being stable, but still nice to have around mathematically.

In mathematics, some famous guy observed a long time ago that we get models of the intuitionistic propositional calculus by taking propositions to be open sets and negation to be ‘interior of the complement’. Who was that guy—Kuratowski??? Heyting??? Anyway, someone who came along shortly after Brouwer.

The idea that a random sequence is one with all ‘generic’ properties of some sort seems to go back to Richard von Mises. There are a lot of papers trying to make it precise, most famously by Martin-Löf.

Hmm, I hadn’t known this:

Per Martin-Löf began bird watching in his youth and remains an enthusiastic bird-watcher. As a teenager, he published an article on estimating the mortality rates of birds, using data from bird ringing, in a Swedish zoological journal. This paper was soon cited in leading international journals, and this paper continues to be cited

Posted by: John Baez on December 21, 2011 5:18 AM | Permalink | Reply to this

Re: On the Law of Large Numbers (Such As 60)

$MathML-enabled post (click for more details).$

Thanks for all that history, John. Alex did apologize for not giving very many references in his talk, but evidently there would have been a substantial history to describe. And there are references to the papers of Martin-Löf, for one, in Alex’s first paper on this.

Posted by: Tom Leinster on December 21, 2011 10:49 AM | Permalink | Reply to this

Re: On the Law of Large Numbers (Such As 60)

In mathematics, some famous guy observed a long time ago that we get models of the intuitionistic propositional calculus by taking propositions to be open sets and negation to be ‘interior of the complement’. Who was that guy—Kuratowski??? Heyting??? Anyway, someone who came along shortly after Brouwer.

It was Tarski:

A. Tarski, Der Aussagenkalk¼l und die Topologie, Fundamenta Mathemeticae 31 (1938), pp. 103–134.

Posted by: Ulrik Buchholtz on December 22, 2011 1:31 AM | Permalink | Reply to this

Re: On the Law of Large Numbers (Such As 60)

as a sanity check, if you start with a topological space X, the points of the resulting locale are exactly the points of X.

Well, no, and that's connected to the complaint that some spaces X are pathological in some ways, with the extreme example of indiscrete spaces (which, if inhabited, all give rise to the same locale, as you remarked, and this locale has only one point).

You might say it like this:

as a sanity check, if you start with a sane topological space X, the points of the resulting locale are exactly the points of X.

Except that people use the term ‘sober’ instead of ‘sane’ here.

Posted by: Toby Bartels on December 21, 2011 6:24 AM | Permalink | Reply to this

Re: On the Law of Large Numbers (Such As 60)

$MathML-enabled post (click for more details).$

Oh yes, oops. Thanks for pointing that out.

That’s a wrong enough thing to say that I’ll go back and change it.

Posted by: Tom Leinster on December 21, 2011 10:50 AM | Permalink | Reply to this

Re: On the Law of Large Numbers (Such As 60)

$MathML-enabled post (click for more details).$

This is beautiful! One minor correction:

if you start with a topological space X, the points of the resulting locale are exactly the points of X.

That’s not quite true — in fact it contradicts your earlier statement that all inhabited indiscrete spaces give rise to the same locale. But it is true if $X$ is Hausdorff, or more generally if it is sober.

Roughly, one might say that the points of the locale corresponding to $X$ are the points whose existence you could infer if you were only allowed to measure open sets. In an (inhabited) indiscrete space, you could only infer the existence of one point. In some other spaces, you could infer the existence of points that aren’t actually there. But in a sober space, every inferrable point corresponds to a unique actual point.

Posted by: Mike Shulman on December 21, 2011 7:19 AM | Permalink | Reply to this

Re: On the Law of Large Numbers (Such As 60)

$MathML-enabled post (click for more details).$

Right, thanks. Mental blip. As I just said in reply to Toby catching the same error, it’s misleading enough that I’m going to go back and change it. But your and Toby’s comments will remain helpful.

Posted by: Tom Leinster on December 21, 2011 10:52 AM | Permalink | Reply to this

Re: On the Law of Large Numbers (Such As 60)

$MathML-enabled post (click for more details).$

So this is a standard idea in measure theory: identify two sets if their symmetric difference has measure zero.

The whole post, but this line in particular, makes me wonder how much of the basic language of functional analysis can be translated into locale-based terms. The Riesz representation theorem can be thought of as giving an isomorphism between ${C_0(X)}^*$ and a vector space generated by (some class of) measures on the locale of points of $X$ , yes? Using that as your translation, statements like “we equate $L^p$ functions if they differ on a set of measure zero” ought to become things like “the locale of support of any $L^p$ function is contained within $\mathbf{Ran}$ ,” which sounds a lot more natural to me even though it’s saying exactly the same thing.

I obviously need to think about this sometime when I’m not busy packing…

Posted by: MJSS on December 22, 2011 5:36 AM | Permalink | Reply to this

Re: On the Law of Large Numbers (Such As 60)

It is true that locales allow us to formulate measure theory
as it is used in functional analysis and other areas very nicely,
although in a slightly different way than Simpson’s approach
(which leaves open the question of relating the two approaches).

As it turns out, the “right” category of measurable spaces is in fact
a full subcategory of the category of locales.
This subcategory is also equivalent to the opposite category of
the category of commutative von Neumann algebras.

In this formalism there is no such thing as “almost everywhere”:
functions are just equal, not equal up to some negligible part.

See this link for details.

Posted by: Dmitri Pavlov on December 22, 2011 3:10 PM | Permalink | Reply to this

Re: On the Law of Large Numbers (Such As 60)

Thanks for the link! That does look like the genuine formalism I was trying to handwave at. :)

Posted by: MJSS on December 22, 2011 4:06 PM | Permalink | Reply to this

Re: On the Law of Large Numbers (Such As 60)

Choose Markdown and put ASCII angle brackets around the link to make it live: . (It doesn't look like this is actually working, but more likely it's my phone's fault, so I'll post this anyway.)

Posted by: Toby Bartels on December 22, 2011 11:20 PM | Permalink | Reply to this

Re: On the Law of Large Numbers (Such As 60)

$MathML-enabled post (click for more details).$

I have made Dmitri’s above link to measurable locale appear.

I also added links to this entry to measurable space (to which I once had added the list of references to Mitri’s MO posts, and where you, I think, have a “revision plan” that includes a note “localic”) and to measure space.

This just in case someone feels inspired to work on harmonizing / interrelating these three entries further.

Posted by: Urs Schreiber on December 23, 2011 12:26 PM | Permalink | Reply to this

Re: On the Law of Large Numbers (Such As 60)

Thanks, Urs.

Posted by: Tom Leinster on December 23, 2011 10:55 PM | Permalink | Reply to this

Re: On the Law of Large Numbers (Such As 60)

For anybody wondering what went wrong with my comment, my phone's browser seems to have edited both my submission to the Cafe's server and that server's response to it, in ways that I find hard to comprehend. But there are exactly zero decent web browsers available on Android, so this is what we must live with sometimes. The instructions given there do work:

<http://ncatlab.org/nlab/show/measurable+locale>

produces

‘http://ncatlab.org/nlab/show/measurable+locale’.

Posted by: Toby Bartels on December 24, 2011 10:11 AM | Permalink | Reply to this

Re: On the Law of Large Numbers (Such As 60)

If someone tells you that the binary digits of π are thought to behave like a random sequence, you know roughly what they mean, even though the binary digits of π aren’t random at all. Probably they’re using “random” to include properties such as “there are as many 0s as 1s” or “the time you have to wait before the appearance of a given n-digit block is on average the same as it would be for a random sequence” — that kind of thing. There’s some intuitive understanding here, perhaps one that can be made precise in several different ways, but an intuition all the same.

Somewhat off topic:

For a sequence of samples of iid (independent, identically distributed) random numbers it is of course possible to define statistical tests in order to decide if a given finite sequence should be accepted or not. These tests are usually applied to e.g. pseudo-random number generators (see the page random number generator on the Azimuth wiki).

No individual sequence is random, but there’s a meaningful space of random sequences.

This reminds me of an anecdote I read in the book “Numerical Recipes”: One of the authors noticed that linear congruence pseudo-random number generators generate numbers that accumulate on hyperplanes, so that they don’t “look random” at all (these kinds of generators are used as a deterrent example today). When asked about this the programming consultant replied:

We guarantee that each number is random individually, but we don’t guarantee that more than one of them is random.

Posted by: Tim van Beek on December 22, 2011 3:22 PM | Permalink | Reply to this

Re: On the Law of Large Numbers (Such As 60)

This is fantastic. After the break, I think I may try to tell this story to some of the undergraduates here. Thanks Tom.

Posted by: Emily Riehl on December 22, 2011 7:09 PM | Permalink | Reply to this

Re: On the Law of Large Numbers (Such As 60)

$MathML-enabled post (click for more details).$

I’m glad you like it! And if you do end up talking to the undergrads about it, I’d be interested to know how it goes.

If only Alex were here to take part in this conversation, but alas, I think he said he was out of internet contact for the holidays.

Posted by: Tom Leinster on December 23, 2011 10:58 PM | Permalink | Reply to this

Re: On the Law of Large Numbers (Such As 60)

For additional reading on the Borel hierarchy, I would suggest the following by Arnold Miller:

http://projecteuclid.org/euclid.lnl/1235423346

Posted by: Pete on December 30, 2011 1:14 PM | Permalink | Reply to this

The n-Category Café

Skip to the Main Content

December 20, 2011