### On the Law of Large Numbers (Such As 60)

#### Posted by Tom Leinster

I spent yesterday morning in the computer science department at Strathclyde,
for the
60th
birthday celebrations of Peter Hancock. Often these events are months away
from the person’s *actual* birthday, but in this case it was only one day
off. Happy birthday, Hank!

One of the talks I went to was by Alex Simpson, the title of which was too good not to use as the title of this post. Simpson’s recent work has re-examined foundational questions about probability and randomness. It’s also the most exciting use of locales I think I’ve ever encountered. I want to tell you something about it here.

Let me begin by making clear that all mistakes here are definitely mine. The chances of mistakes are higher than usual, as I don’t understand this stuff nearly as well as I’d like to. In particular, Simpson’s work has made me realize that my understanding of locales is not as good as I’d like. If you want correct, precise statements, you should read his paper Measure, randomness and sublocales, which covers much of what I’m going to say.

We’ll get to locales in a bit. First, though, a basic question:

What is a random sequence?

If someone tells you that the binary digits of $\pi$ are thought to behave like a random sequence, you know roughly what they mean, even though the binary digits of $\pi$ aren’t random at all. Probably they’re using “random” to include properties such as “there are as many 0s as 1s” or “the time you have to wait before the appearance of a given $n$-digit block is on average the same as it would be for a random sequence” — that kind of thing. There’s some intuitive understanding here, perhaps one that can be made precise in several different ways, but an intuition all the same.

Simpson takes as his model scenario an infinite sequence of fair coin tosses. The set of all sequences of coin tosses is

$2^\mathbb{N} = 2 \times 2 \times \cdots,$

where $2$ is a two-element set. As a
coin belonging to a *theoretical* computer scientist, its faces are
labelled $0$ and $1$, not heads and tails; so the two elements of $2$ will be
called $0$ and $1$.

We’ll need two pieces of structure on the set $2^\mathbb{N}$.

**Topology** Give $2 = \{0, 1\}$ the discrete topology, and give
$2^\mathbb{N}$ the resulting product topology. The topological space
$2^\mathbb{N}$ is then called the **Cantor set** (or more logically,
**Cantor space**). People often think of the Cantor set as embedded in
$\mathbb{R}$, via the map sending $x = (x_0, x_1, \ldots)$ to $\sum 2x_n 3^{-(n
+ 1)}$, but we won’t need that viewpoint here.

Simpson emphasized the following influential idea: in informatic terms, *open
subsets can be regarded as observable properties*. I don’t know much about the history of this idea. Alex told me afterwards that it was made very explicit in papers of Mike Smyth, under the name of “semidecidable property”. The name “observable property” is due to Samson Abramsky, I believe.

The Cantor set provides a good
example. Imagine someone tossing a coin repeatedly, while you stand there
watching. Because you’re a finite creature, all you can observe is how the
sequence of coin-tosses
*starts*. Now, a subset $U$ of $2^\mathbb{N}$ is open if and only if it
has the following property: for all $x = (x_0, x_1, \ldots) \in U$, there
exists $n \in \mathbb{N}$ such that

$\{ y \in 2^\mathbb{N} : (x_0, \ldots, x_n) = (y_0, \ldots, y_n)\} \subseteq U.$

That’s just how the product topology is defined, but you can see that it also embodies the idea that a set is open if it consists of all sequences satisfying some observable property.

**Measure**
The Cantor set carries a canonical measure. It’s the unique probability
measure with the obvious symmetry and
self-similarity properties. For
example, it gives measure $1/4$ to the set of all sequences beginning $10$.
I’ll call this measure $\lambda$.

AsideAs far as a measure theorist is concerned, $2^\mathbb{N}$ with its measure $\lambda$ is the same as $[0, 1]$ with Lebesgue measure. Indeed, there’s a surjection $f: 2^\mathbb{N} \to [0, 1]$ sending a sequence $x$ to the real number with binary expansion $0.x_0 x_1 x_2 \ldots$. It’s not quite injective, because numbers of the form $m/2^n$ have more than one binary expansion. On the other hand, the set of such numbers has measure zero, and every other real number has just one binary expansion. So from a measure-theoretic point of view, $f$ might as well be a bijection. Also, $f$ turns the measure $\lambda$ on $2^\mathbb{N}$ into Lebesgue measure on $[0, 1]$. For example, the image under $f$ of the set of sequences beginning $10$ is $[1/2, 3/4]$, which has measure $1/4$. So you could replace $2^\mathbb{N}$ by $[0, 1]$ if you wanted.

Now let’s return to the question:

What is a random sequence?

Here’s Simpson’s bold idea. We think of a sequence as ‘random’ — remember the example of $\pi$ — if it has all ‘generic’ properties. In other words, a sequence is random if it belongs to all subspaces of $2^\mathbb{N}$ of measure one. So, he defines the space $\mathbf{Ran}$ of random sequences to be the intersection of all measure-one subspaces of $2^\mathbb{N}$.

This is so bold that it’s obviously wrong. *No* sequence has *all*
generic properties: any given sequence $x$, no matter how ‘random’ it might
look, fails to have the generic property of not being equal to $x$. That is,
any given sequence $x \in 2^\mathbb{N}$ fails to belong to the measure-one
subspace $2^\mathbb{N} \setminus \{x\}$.

But this doesn’t deter Simpson, because he knows the following: just because a space has no points doesn’t mean it’s trivial.

If you know about locales, you’ll have guessed that they’re how Simpson makes sense of this. If you don’t know about locales, keep reading anyway, because I’ll tell you now.

The idea is childishly simple. In a topological space, the open sets form a
partially
ordered set with respect to inclusion. This poset has a few
properties: e.g. every
pair of elements in the poset has a greatest lower bound, since the
intersection of two open sets is open. A **locale** is a
poset with these properties.

Why bother? Partly because of applications like the one I’m about to tell you, but also for much more basic reasons. Many mathematicians are unhappy with topological spaces in which the points can’t be well separated by open sets (e.g. non-Hausdorff spaces). There’s an instinct that that’s not in the true spirit of topology. For example, indiscrete spaces are non-separated in the most extreme way. To some extent, locales relieve this problem: e.g. all nonempty indiscrete spaces give rise to the same locale, the two-element totally ordered set.

Although points play no part in the *definition* of locale, there
*is* a
definition of ‘point of a locale’. That is, every locale has an underlying set
of points. I won’t tell you how what the definition is, but as a sanity check,
if you start with a Hausdorff topological space $X$, the points of the resulting locale are exactly the points of $X$.

Now here’s the crucial thing: *there are useful locales with no points at
all*. For example, there’s a ‘locale of surjections $\mathbb{N} \to
\mathbb{R}$’. Without saying anything about what that locale is, it’s clear
that it can’t have any points.

Simpson’s locale $\mathbf{Ran}$ of random sequences, which I’ll define in a
moment, is another
locale with no points. No *individual* sequence is random, but there’s a
meaningful *space* of random sequences.

By definition, $\mathbf{Ran}$ is the intersection of all measure-one open sublocales of $2^\mathbb{N}$. It’s also equal to the intersection of all measure-one not-necessarily-open sublocales of $2^\mathbb{N}$. To understand those statements, you evidently need to understand what ‘sublocale’ means, but you can appreciate the idea anyway: a sequence is random if it’s contained in every measure-one subspace.

Another way to say it is that

$\mathbf{Ran} = Open(2^\mathbb{N})/\sim.$

Here $Open(2^\mathbb{N})$ is the poset of open subsets of $2^\mathbb{N}$, and the equivalence relation $\sim$ is given by $U \sim V$ if and only if $\lambda(U \Delta V) = 0$. Remember that $\lambda$ is the canonical measure on $2^\mathbb{N}$. Also, I’m using $\Delta$ to denote symmetric difference. So this is a standard idea in measure theory: identify two sets if their symmetric difference has measure zero.

I still haven’t got to the Law of Large Numbers — henceforth, the LLN.

For a sequence $x$ of zeros and ones, the LLN says that

$\lim_{n \to \infty} \frac{1}{n} \sum_{i = 0}^n x_i = \frac{1}{2}.$

Do all random sequences satisfy the LLN? In other words, is the set of sequences satisfying the LLN the whole of the locale $\mathbf{Ran}$?

Well… we have a problem. The set $L$ of sequences satisfying the LLN is some subset of $2^\mathbb{N}$, and not even an open subset at that. It doesn’t make sense to ask whether it’s the whole of $\mathbf{Ran}$. Quite simply, $L$ and $\mathbf{Ran}$ are objects of different types. So how do we begin to compare them?

Much of Simpson’s talk was about this. He took advantage of the fact that
$L$ is a Borel set in $2^\mathbb{N}$. Recall that in any topological
space, the class of
**Borel sets**
is, by definition, the smallest class of subsets containing all the open sets
and closed under
complements and countable unions. The Borel sets are, therefore, also closed
under countable intersections. Now, unwinding the definition of limit, the set $L$ of sequences satisfying the
law of large numbers is

$L = \bigcap_{\varepsilon \gt 0} \bigcup_{N \in \mathbb{N}} \bigcap_{n \geq N} \Bigl\{ x \in 2^\mathbb{N} : \Big| \frac{1}{n} \sum_{i = 0}^n x_i - \frac{1}{2} \Big| < \varepsilon \Bigr\}.$

This doesn’t quite look like a Borel set, because the first intersection is over an uncountable collection of $\varepsilon$s. On the other hand, it makes no difference if we restrict $\varepsilon$ to lie in the countable set $\{1, 1/2, 1/3, \ldots\}$, say. So $L$ is Borel.

I said earlier that points play no part in the definition of locale, but you can still define ‘point of a locale’. Similarly, Borel sets play no part in the definition of locale, but you can still define ‘Borel set in a locale’. In particular, this gives the notion of ‘Borel set in $\mathbf{Ran}$’. There’s a particular Borel set in $\mathbf{Ran}$ consisting of all the sequences satisfying the LLN. I’ll call that Borel set $L$, too.

The question is, then: is $L = \mathbf{Ran}$?

Simpson stated an informal version of this question near the beginning of his talk, and spent most of the middle part setting up the language to state it precisely. It was only in the last few minutes that he answered it. Like a Hollywood movie whose ending is obvious to everyone in the cinema, it was clear that he wouldn’t have bothered doing all this if the answer was no. So a ripple of surprise went round the room when he told us that the answer is, in fact, no.

What’s going on? The moral, he said, is that *the LLN is not derivable
from observable properties of randomness*. You can only observe a
sequence of coin tosses for a finite amount of time, and that tells you
nothing about its long-term behaviour. You can’t gather *any* evidence about whether the sequence will satisfy the LLN. This relates to the fact that the
open sets in $2^\mathbb{N}$, which make up our locale, correspond to the
observable properties of sequences.

(Actually, when he told us the ‘moral’, he mis-spoke and called it the ‘morel’, to rhyme with Borel. This is, as someone in the audience pointed out, a kind of mushroom.)

The very end of the talk was a precise statement of the following slogan: LLN is ‘true’ in the sense that it’s consistent with observable facts. As I understand it, this is the strongest positive statement that could possibly be made once you’ve understood the limited power of finite-time observation. But for the details of that, you might have to wait for Alex’s next paper.

## Re: On the Law of Large Numbers (Such As 60)

For no good reason at all, I was looking at this interesting webpage which includes, among other things, the alarming pieces of Haskel

…

and

… The internet rhymes, and that is nifty.