Planet Musings

August 19, 2018

John BaezOpen Petri Nets (Part 3)

I’ve been talking about my new paper with Jade Master:

• John Baez and Jade Master, Open Petri nets.

In Part 1 we saw the double category of open Petri nets; in Part 2 we saw the reachability semantics for open Petri nets as a double functor. Now I’d like to wrap up by explaining some of the machinery beneath the hood of our results.

I fell in love with Petri nets when I realized that they were really just presentations of free symmetric monoidal categories. If you like category theory, this turns Petri nets from something mysterious into something attractive.

In any category you can compose morphisms f\colon X \to Y and g\colon Y \to Z and get a morphism gf \colon X \to Z. In a monoidal category you can also tensor morphisms f \colon X \to X' and g \colon Y \to Y' and get a morphism f \otimes g \colon X \otimes X' \to Y \otimes Y'. This of course relies on your ability to tensor objects. In a symmetric monoidal category you also have X \otimes Y \cong Y \otimes X. And of course, there is more to it than this. But this is enough to get started.

A Petri net has ‘places’ and also ‘transitions’ going between multisets of places:

From this data we can try to generate a symmetric monoidal category whose objects are built from the places and whose morphisms are built from the transitions. So, for example, the above Petri net would give a symmetric monoidal category with an object

2 susceptible + 3 infected

and a morphism from this to the object

susceptible + 2 infected

(built using the transition infection), and a morphism
from this to the object

susceptible + infected + resistant

(built using the transition recovery) and so on. Here we are using + to denote the tensor product in our symmetric monoidal category, as usual in chemistry.

When we do this sort of construction, the resulting symmetric monoidal category is ‘free’. That is, we are not imposing any really interesting equations: the objects are freely generated by the places in our Petri net by tensoring, and the morphisms are freely generated by the transitions by tensoring and composition.

That’s the basic idea. The problem is making this idea precise!

Many people have tried in many different ways. I like this approach the best:

• José Meseguer and Ugo Montanari, Petri nets are monoids, Information and Computation 88 (1990), 105–155.

but I think it can be simplified a bit, so let me describe what Jade and I did in our paper.

The problem is that there are different notions of symmetric monoidal category, and also different notions of morphism between Petri nets. We take the maximally strict approach, and work with ‘commutative’ monoidal categories. These are just commutative monoid objects in \mathrm{Cat}, so their associator:

\alpha_{A,B,C} \colon (A + B) + C \stackrel{\sim}{\longrightarrow} A + (B + C)

their left and right unitor:

\lambda_A \colon 0 + A \stackrel{\sim}{\longrightarrow} A

\rho_A \colon A + 0 \stackrel{\sim}{\longrightarrow} A

and even—disturbingly—their braiding:

\sigma_{A,B} \colon A + B \stackrel{\sim}{\longrightarrow} B + A

are all identity morphisms.

The last would ordinarily be seen as ‘going too far’, since while every symmetric monoidal category is equivalent to one with trivial associator and unitors, this ceases to be true if we also require the braiding to be trivial. However, it seems that Petri nets most naturally serve to present symmetric monoidal categories of this very strict sort. There just isn’t enough information in a Petri net to make it worthwhile giving them a nontrivial braiding

\sigma_{A,B} \stackrel{\sim}{\longrightarrow} A + B \cong B + A

It took me a while to accept this, but now it seem obvious. If you want a nontrivial braiding, you should be using something a bit fancier than a Petri net.

Thus, we construct adjoint functors between a category of Petri nets, which we call \textrm{Petri}, and a category of ‘commutative monoidal categories’, which we call \textrm{CMC}.

An object of \textrm{Petri} is a Petri net: that is, a set S of places, a set T of transitions, and source and target functions

s, t \colon T \to \mathbb{N}[S]

where \mathbb{N}[S] is the underlying set of the free commutative monoid on S.

More concretely, \mathbb{N}[S] is the set of formal finite linear combinations of elements of S with natural number coefficients. The set S naturally includes in \mathbb{N}[S], and for any function

f \colon S \to S'

there is a unique monoid homomorphism

\mathbb{N}[f] \colon \mathbb{N}[S] \to \mathbb{N}[S']

extending f.

A Petri net morphism from a Petri net

s, t \colon T \to \mathbb{N}[S]

to a Petri net

s', t' \colon T' \to \mathbb{N}[S']

is a pair of functions

f \colon T \to T'

g \colon S \to S'

making the two obvious diagrams commute:

There is a category \textrm{Petri} with Petri nets as objects and Petri net morphisms as morphisms.

On the other hand, a commutative monoidal category is a commutative monoid object in \mathrm{Cat}. Explicitly, it’s a strict monoidal category (C,+,0) such that for all objects A and B we have

A + B = B + A

and for all morphisms f and g

f + g = g + f

Note that a commutative monoidal category is the same as a strict symmetric monoidal category where the symmetry isomorphisms

\sigma_{A,B} \colon A + B \stackrel{\sim}{\longrightarrow} B+A

are all identity morphisms. Every strict monoidal functor between commutative monoidal categories is automatically a strict symmetric monoidal functor. So, we let \mathrm{CMC} be the category whose objects are commutative monoidal categories and whose morphisms are strict monoidal functors.

There’s a functor

U \colon \mathrm{CMC} \to \mathrm{Petri}

sending any commutative monoidal category C to its underlying Petri net. This Petri net has the set of objects \mathrm{Ob}(C) as its set of places and the set of morphisms \mathrm{Mor}(C) as its set of transitions, and

s, t \colon \mathrm{Mor}(C) \to \mathrm{Ob}(C) \hookrightarrow \mathbb{N}[\mathrm{Ob}(C)]

as its source and target maps.

Proposition. The functor U \colon \mathrm{CMC} \to \mathrm{Petri} has a left adjoint

F \colon \mathrm{Petri} \to \mathrm{CMC}

This is Proposition 10 in our paper, and we give an explicit construction of this left adjoint.

So that’s our conception of the free commutative monoidal category on a Petri net. It’s pretty simple. How could anyone have done anything else?

Montanari and Meseguer do almost the same thing, but our category of Petri nets is a subcategory of theirs: our morphisms of Petri nets send places to places, while they allow more general maps that send a place to a formal linear combination of places. On the other hand, they consider a full subcategory of our \mathrm{CMC} containing only commutative monoidal categories whose objects form a free commutative monoid.

Other papers do a variety of more complicated things. I don’t have the energy to explain them all, but you can see some here:

• Pierpaolo Degano, José Meseguer and Ugo Montanari, Axiomatizing net computations and processes, in Logic in Computer Science 1989, IEEE, New Jersey, 1989, pp. 175–185.

• Vladimiro Sassone, Strong concatenable processes: an approach to the category of Petri net computations, BRICS Report Series, Dept. of Computer Science, U. Aarhus, 1994.

• Vladimiro Sassone, On the category of Petri net computations, in Colloquium on Trees in Algebra and Programming, Springer, Berlin, 1995.

• Vladimiro Sassone, An axiomatization of the algebra of Petri net concatenable processes, in Theoretical Computer Science 170 (1996), 277–296.

• Vladimiro Sassone and Pavel Sobociński, A congruence for Petri nets, Electronic Notes in Theoretical Computer Science 127 (2005), 107–120.

Getting the free commutative monoidal category on a Petri net right is key to developing the reachability semantics for open Petri nets in a nice way. But to see that, you’ll have to read our paper!

Part 1: the double category of open Petri nets.

Part 2: the reachability semantics for open Petri nets.

Part 3: the free symmetric monoidal category on a Petri net.

John BaezOpen Petri Nets (Part 1)

Jade Master and I have just finished a paper on open Petri nets:

• John Baez and Jade Master, Open Petri nets.

Abstract. The reachability semantics for Petri nets can be studied using open Petri nets. For us an ‘open’ Petri net is one with certain places designated as inputs and outputs via a cospan of sets. We can compose open Petri nets by gluing the outputs of one to the inputs of another. Open Petri nets can be treated as morphisms of a category, which becomes symmetric monoidal under disjoint union. However, since the composite of open Petri nets is defined only up to isomorphism, it is better to treat them as morphisms of a symmetric monoidal double category \mathbb{O}\mathbf{pen}(\mathrm{Petri}). Various choices of semantics for open Petri nets can be described using symmetric monoidal double functors out of \mathbb{O}\mathbf{pen}(\mathrm{Petri}). Here we describe the reachability semantics, which assigns to each open Petri net the relation saying which markings of the outputs can be obtained from a given marking of the inputs via a sequence of transitions. We show this semantics gives a symmetric monoidal lax double functor from \mathbb{O}\mathbf{pen}(\mathrm{Petri}) to the double category of relations. A key step in the proof is to treat Petri nets as presentations of symmetric monoidal categories; for this we use the work of Meseguer, Montanari, Sassone and others.

I’m excited about this, especially because our friends at Statebox are planning to use open Petri nets in their software. They’ve recently come out with a paper too:

• Fabrizio Romano Genovese and Jelle Herold, Executions in (semi-)integer Petri nets are compact closed categories.

Petri nets are widely used to model open systems in subjects ranging from computer science to chemistry. There are various kinds of Petri net, and various ways to make them ‘open’, and my paper with Jade only handles the simplest. But our techniques are flexible, so they can be generalized.

What’s an open Petri net? For us, it’s a thing like this:

The yellow circles are called ‘places’ (or in chemistry, ‘species’). The aqua rectangles are called ‘transitions’ (or in chemistry, ‘reactions’). There can in general be lots of places and lots of transitions. The bold arrows from places to transitions and from transitions to places complete the structure of a Petri net. There are also arbitrary functions from sets X and Y into the set of places. This makes our Petri net into an ‘open’ Petri net.

We can think of open Petri nets as morphisms between finite sets. There’s a way to compose them! Suppose we have an open Petri net P from X to Y, where now I’ve given names to the points in these sets:

We write this as P \colon X \nrightarrow Y for short, where the funky arrow reminds us this isn’t a function between sets. Given another open Petri net Q \colon Y \nrightarrow Z, for example this:

the first step in composing P and Q is to put the pictures together:

At this point, if we ignore the sets X,Y,Z, we have a new Petri net whose set of places is the disjoint union of those for P and Q.

The second step is to identify a place of P with a place of Q whenever both are images of the same point in Y. We can then stop drawing everything involving Y, and get an open Petri net QP \colon X \nrightarrow Z, which looks like this:

Formalizing this simple construction leads us into a bit of higher category theory. The process of taking the disjoint union of two sets of places and then quotienting by an equivalence relation is a pushout. Pushouts are defined only up to canonical isomorphism: for example, the place labeled C in the last diagram above could equally well have been labeled D or E. This is why to get a category, with composition strictly associative, we need to use isomorphism classes of open Petri nets as morphisms. But there are advantages to avoiding this and working with open Petri nets themselves. Basically, it’s better to work with things than mere isomorphism classes of things! If we do this, we obtain not a category but a bicategory with open Petri nets as morphisms.

However, this bicategory is equipped with more structure. Besides composing open Petri nets, we can also ‘tensor’ them via disjoint union: this describes Petri nets being run in parallel rather than in series. The result is a symmetric monoidal bicategory. Unfortunately, the axioms for a symmetric monoidal bicategory are cumbersome to check directly. Double categories turn out to be more convenient.

Double categories were introduced in the 1960s by Charles Ehresmann. More recently they have found their way into applied mathematics. They been used to study various things, including open dynamical systems:

• Eugene Lerman and David Spivak, An algebra of open continuous time dynamical systems and networks.

open electrical circuits and chemical reaction networks:

• Kenny Courser, A bicategory of decorated cospans, Theory and Applications of Categories 32 (2017), 995–1027.

open discrete-time Markov chains:

• Florence Clerc, Harrison Humphrey and P. Panangaden, Bicategories of Markov processes, in Models, Algorithms, Logics and Tools, Lecture Notes in Computer Science 10460, Springer, Berlin, 2017, pp. 112–124.

and coarse-graining for open continuous-time Markov chains:

• John Baez and Kenny Courser, Coarse-graining open Markov processes. (Blog article here.)

As noted by Shulman, the easiest way to get a symmetric monoidal bicategory is often to first construct a symmetric monoidal double category:

• Mike Shulman, Constructing symmetric monoidal bicategories.

The theory of ‘structured cospans’ gives a systematic way to build symmetric monoidal double categories—Kenny Courser and I are writing a paper on this—and Jade and I use this to construct the symmetric monoidal double category of open Petri nets.

A 2-morphism in a double category can be drawn as a square like this:

We call X_1,X_2,Y_1 and Y_2 ‘objects’, f and g ‘vertical 1-morphisms’, M and N ‘horizontal 1-cells’, and \alpha a ‘2-morphism’. We can compose vertical 1-morphisms to get new vertical 1-morphisms and compose horizontal 1-cells to get new horizontal 1-cells. We can compose the 2-morphisms in two ways: horizontally and vertically. (This is just a quick sketch of the ideas, not the full definition.)

In our paper, Jade and I start by constructing a symmetric monoidal double category \mathbb{O}\mathbf{pen}(\textrm{Petri}) with:

• sets X, Y, Z, \dots as objects,

• functions f \colon X \to Y as vertical 1-morphisms,

• open Petri nets P \colon X \nrightarrow Y as horizontal 1-cells,

• morphisms between open Petri nets as 2-morphisms.

(Since composition of horizontal 1-cells is associative only up to an invertible 2-morphism, this is technically a pseudo double category.)

What are the morphisms between open Petri nets like? A simple example may be help give a feel for this. There is a morphism from this open Petri net:

to this one:

mapping both primed and unprimed symbols to unprimed ones. This describes a process of ‘simplifying’ an open Petri net. There are also morphisms that include simple open Petri nets in more complicated ones, etc.

This is just the start. Our real goal is to study the semantics of open Petri nets: that is, how they actually describe processes! More on that soon!

Part 1: the double category of open Petri nets.

Part 2: the reachability semantics for open Petri nets.

Part 3: the free symmetric monoidal category on a Petri net.

John BaezOpen Petri Nets (Part 2)

I’d like to continue talking about this paper:

• John Baez and Jade Master, Open Petri nets.

Last time I explained, in a sketchy way, the double category of open Petri nets. This time I’d like to describe a ‘semantics’ for open Petri nets.

In his famous thesis Functorial Semantics of Algebraic Theories, Lawvere introduced the idea that semantics, as a map from expressions to their meanings, should be a functor between categories. This has been generalized in many directions, and the same idea works for double categories. So, we describe our semantics for open Petri nets as a map

\blacksquare \colon \mathbb{O}\mathbf{pen}(\mathrm{Petri}) \to \mathbb{R}\mathbf{el}

from our double category of open Petri nets to a double category of relations. This map sends any open Petri net to its ‘reachability relation’.

In Petri net theory, a marking of a set X is a finite multisubset of X. We can think of this as a way of placing finitely ‘tokens’—little black dots—on the elements of X. A Petri net lets us start with some marking of its places and then repeatedly change the marking by moving tokens around, using the transitions. This is how Petri nets describe processes!

For example, here’s a Petri net from chemistry:

Here’s a marking of its places:

In this example there are only 0 or 1 tokens in each place: we’ve got one atom of carbon, one molecule of oxygen, one molecule of sodium hydroxide, one molecule of hydrochloric acid, and nothing else. But in general we could have any natural number of tokens in each place, as long as the total number of tokens is finite.

Using the transitions we can repeatedly change the markings, like this:




People say one marking is reachable from another if you can get it using a finite sequence of transitions in this manner. (Our paper explains this well-known notion more formally.)

We need something slightly fancier in our paper since we are dealing with open Petri nets. Let \mathbb{N}[X] denote the set of markings of X. Given an open Petri net P \colon X \nrightarrow Y there is a reachability relation

\blacksquare P \subseteq \mathbb{N}[X] \times \mathbb{N}[Y]

This relation holds when we can take a given marking of X, feed those tokens into the Petri net P, move them around using transitions in P, and have them pop out and give a certain marking of Y, leaving no tokens behind.

For example, consider this open Petri net P \colon X \nrightarrow Y:

Here is a marking of X:

We can feed these tokens into P and move them around using transitions in P:

They can then pop out into Y, leaving none behind:

This gives a marking of Y that is ‘reachable’ from the original marking of X.

The main result of our paper is that the map sending an open Petri net P to its reachability relation \blacksquare P extends to a ‘lax double functor’

\blacksquare \colon \mathbb{O}\mathbf{pen}(\mathrm{Petri}) \to \mathbb{R}\mathbf{el}

where \mathbb{O}\mathbf{pen}(\mathrm{Petri}) is a double category having open Petri nets as horizontal 1-cells and \mathbb{R}\mathbf{el} is a double category having relations as horizontal 1-cells.

I can give you a bit more detail on those double categories, and also give you a clue about what ‘lax’ means, without it becoming too stressful.

Last time I said the double category \mathbb{O}\mathbf{pen}(\mathrm{Petri}) has:

• sets X, Y, Z, \dots as objects,

• functions f \colon X \to Y as vertical 1-morphisms,

• open Petri nets P \colon X \nrightarrow Y as horizontal 1-cells—they look like this:

• morphisms between open Petri nets as 2-morphisms—an example would be the visually obvious map from this open Petri net:

to this one:

What about \mathbb{R}\mathbf{el}? This double category has

• sets X, Y, Z, \dots as objects,

• functions f \colon X \to Y as vertical 1-morphisms,

• relations R \subseteq X \times Y as horizontal 1-cells from X to Y, and

• maps between relations as 2-morphisms. Here a map between relations is a square

that obeys

(f \times g) R \subseteq S

So, the idea of the reachability semantics is that it maps:

• any set X to the set \mathbb{N}[X] consisting of all markings of that set.

• any function f \colon X \to Y to the obvious function

\mathbb{N}(f) \colon \mathbb{N}[X] \to \mathbb{N}[Y]

(Yes, \mathbb{N} is a really a functor.)

• any open Petri net P \colon X \nrightarrow Y to its reachability relation

\blacksquare P \colon \mathbb{N}[X] \to \mathbb{N}[Y]

• any morphism between Petri nets to the obvious map between their reachability relations.

Especially if you draw some examples, all this seems quite reasonable and nice. But it’s important to note that \blacksquare is a lax double functor. This means that it does not send a composite open Petri net PQ to the composite of the reachability relations for P and Q. So, we do not have

\blacksquare Q \; \blacksquare P = \blacksquare (QP)

Instead, we just have

\blacksquare Q \; \blacksquare P \subseteq \blacksquare (QP)

It’s easy to see why. Take P \colon X \nrightarrow Y to be this open Petri net:

and take Q \colon Y \nrightarrow Z to be this one:

Then their composite QP \colon X \nrightarrow Y is this:

It’s easy to see that \blacksquare (QP) is a proper subset of \blacksquare Q \; \blacksquare P. In QP a token can move all the way from point 1 to point 5. But it does not do so by first moving through P and then moving through Q! It has to take a more complicated zig-zag path where it first leaves P and enters Q, then comes back into P, and then goes to Q.

In our paper, Jade and I conjecture that we get

\blacksquare Q \; \blacksquare P = \blacksquare (QP)

if we restrict the reachability semantics to a certain specific sub-double category of \mathbb{O}\mathbf{pen}(\mathrm{Petri}) consisting of ‘one-way’ open Petri nets.

Finally, besides showing that

\blacksquare \colon \mathbb{O}\mathbf{pen}(\mathrm{Petri}) \to \mathbb{R}\mathbf{el}

is a lax double functor, we also show that it’s symmetric monoidal. This means that the reachability semantics works as you’d expect when you run two open Petri nets ‘in parallel’.

In a way, the most important thing about our paper is that it illustrates some methods to study semantics for symmetric monoidal double categories. Kenny Courser and I will describe these methods more generally in our paper “Structured cospans.” They can be applied to timed Petri nets, colored Petri nets, and various other kinds of Petri nets. One can also develop a reachability semantics for open Petri nets that are glued together along transitions as well as places.

I hear that the company Statebox wants these and other generalizations. We aim to please—so we’d like to give it a try.

Next time I’ll wrap up this little series of posts by explaining how Petri nets give symmetric monoidal categories.

Part 1: the double category of open Petri nets.

Part 2: the reachability semantics for open Petri nets.

Part 3: the free symmetric monoidal category on a Petri net.

August 18, 2018

Terence TaoQED version 2.0: an interactive text in first-order logic

As readers who have followed my previous post will know, I have been spending the last few weeks extending my previous interactive text on propositional logic (entitied “QED”) to also cover first-order logic.  The text has now reached what seems to be a stable form, with a complete set of deductive rules for first-order logic with equality, and no major bugs as far as I can tell (apart from one weird visual bug I can’t eradicate, in that some graphics elements can occasionally temporarily disappear when one clicks on an item).  So it will likely not change much going forward.

I feel though that there could be more that could be done with this sort of framework (e.g., improved GUI, modification to other logics, developing the ability to write one’s own texts and libraries, exploring mathematical theories such as Peano arithmetic, etc.).  But writing this text (particularly the first-order logic sections) has brought me close to the limit of my programming ability, as the number of bugs introduced with each new feature implemented has begun to grow at an alarming rate.  I would like to repackage the code so that it can be re-used by more adept programmers for further possible applications, though I have never done something like this before and would appreciate advice on how to do so.   The code is already available under a Creative Commons licence, but I am not sure how readable and modifiable it will be to others currently.

[One thing I noticed is that I would probably have to make more of a decoupling between the GUI elements, the underlying logical elements, and the interactive text.  For instance, at some point I made the decision (convenient at the time) to use some GUI elements to store some of the state variables of the text, e.g. the exercise buttons are currently storing the status of what exercises are unlocked or not.  This is presumably not an example of good programming practice, though it would be relatively easy to fix.  More seriously, due to my inability to come up with a good general-purpose matching algorithm (or even specification of such an algorithm) for the the laws of first-order logic, many of the laws have to be hard-coded into the matching routine, so one cannot currently remove them from the text.  It may well be that the best thing to do in fact is to rework the entire codebase from scratch using more professional software design methods.]



Doug NatelsonPhonons and negative mass

There has been quite a bit of media attention given to this paper, which looks at whether sound waves involve the transport of mass (and therefore whether they should interact with gravitational fields and produce gravitational fields of their own). 

The authors conclude that, under certain circumstances, sound wavepackets (phonons, in the limit where we really think about quantized excitations) rise in a downward-directed gravitational field.  Considered as a distinct object, such a wavepacket has some property, the amount of "invariant mass" that it transports as it propagates along, that turns out to be negative.

Now, most people familiar with the physics of conventional sound would say, hang on, how do sound waves in some medium transport any mass at all?  That is, we think of ordinary sound in a gas like air as pressure waves, with compressions and rarefactions, regions of alternating enhanced and decreased density (and pressure).  In the limit of small amplitudes (the "linear regime"), we can consider the density variations in the wave to be mathematically small, meaning that we can use the parameter \(\delta \rho/rho_{0}\) as a small perturbation, where \(\rho_{0}\) is the average density and \(\delta \rho\) is the change.  Linear regime sound usually doesn't transport mass.  The same is true for sound in the linear regime in a conventional liquid or a solid. 

In the paper, the authors do an analysis where they find that the mass transported by sound is proportional with a negative sign to \(dc_{\mathrm{s}}/dP\), how the speed of sound \(c_{\mathrm{s}}\) changes with pressure for that medium.  (Note that for an ideal gas, \(c_{\mathrm{s}} = \sqrt{\gamma k_{\mathrm{B}}T/m}\), where \(\gamma\) is the ratio of heat capacities at constant pressure and volume, \(m\) is the mass of a gas molecule, and \(T\) is the temperature.  There is no explicit pressure dependence, and sound is "massless" in that case.)

I admit that I don't follow all the details, but it seems to me that the authors have found that for a nonlinear medium such that \(dc_{\mathrm{s}}/dP > 0\), sound wavepackets have a bit less mass than the average density of the surrounding medium.  That means that they experience buoyancy (they "fall up" in a downward-directed gravitational field), and exert an effectively negative gravitational potential compared to their background medium.  It's a neat result, and I can see where there could be circumstances where it might be important (e.g. sound waves in neutron stars, where the density is very high and you could imagine astrophysical consequences).  That being said, perhaps someone in the comments can explain why this is being portrayed as so surprising - I may be missing something.

August 17, 2018

Matt von HippelCurrent Themes 2018

I’m at Current Themes in High Energy Physics and Cosmology this week, the yearly conference of the Niels Bohr International Academy. (I talked about their trademark eclectic mix of topics last year.)

This year, the “current theme” was broadly gravitational (though with plenty of exceptions!).


For example, almost getting kicked out of the Botanical Garden

There were talks on phenomena we observe gravitationally, like dark matter. There were talks on calculating amplitudes in gravity theories, both classical and quantum. There were talks about black holes, and the overall shape of the universe. Subir Sarkar talked about his suspicion that the expansion of the universe isn’t actually accelerating, and while I still think the news coverage of it was overblown I sympathize a bit more with his point. He’s got a fairly specific worry, that we’re in a region that’s moving unusually with respect to the surrounding universe, that hasn’t really been investigated in much detail before. I don’t think he’s found anything definitive yet, but it will be interesting as more data accumulates to see what happens.

Of course, current themes can’t stick to just one theme, so there were non-gravitational talks as well. Nima Arkani-Hamed’s talk covered some results he’s talked about in the past, a geometric picture for constraining various theories, but with an interesting new development: while most of the constraints he found restrict things to be positive, one type of constraint he investigated allowed for a very small negative region, around thirty orders of magnitude smaller than the positive part. The extremely small size of the negative region was the most surprising part of the story, as it’s quite hard to get that kind of extremely small scale out of the math we typically invoke in physics (a similar sense of surprise motivates the idea of “naturalness” in particle physics).

There were other interesting talks, which I might talk about later. They should have slides up online soon in case any of you want to have a look.

David Hoggdetecting sources in imaging

I spent a large part of the morning going through the figures, exposition, and connection to the literature in the draft by Dustin Lang (Toronto) and me about how to detect a star in a collection of probably multi-band images. There is interesting and interestingly different prior work, which wasn't prior when we started this project! Yes, it's an old paper. One of the points Lang makes is that what you detect—if, say, you use Bayes or posterior ratios—depends both on your priors and your utility. And when there are complex combinations of things creating detections in your images (including asteroids, cosmic rays, red and blue galaxies, quasars, stars, and high-redshift g dropouts, say), you want those priors to be reasonable if you want your catalogs to be reasonable.

I also spoke with Kate Storey-Fisher (NYU) about adversarial systematics in cosmology and other radical projects, and [unnamed collaborators] about our HST proposal, which is nearly ready (and anonymous, which is cool!).

August 16, 2018

David Hogga likelihood function for element abundances

On the flight back to NYC, I worked on building a likelihood function for the dependence of element abundances on dynamical actions. It is a hard problem; what freedom to give the model? I was doing crazy binning experiments, trying to make the model fully non-parametric, but it has annoying pathologies, like that the assignments of stars to bins is strongly dynamical-parameter-dependent. And thus you get a jagged likelihood function. At the end of a long set of numerical experiments on my laptop, I realized the following simple thing (duh):

All the abundance-ratio dependences on vertical action are extremely smooth! So using a parameterized rather than non-parametric form should be absolutely fine. By the time I had this realization, I was done all I could do for the day, so: Tomorrow!

David Hogglast day in HD

It was my last day in Heidelberg today! Very sad to go; I love it here. I spent my day making sure my projects are on target as I disappear to an undisclosed location for the next week and a half. Also Simon J Murphy (Sydney) showed up and we talked delta Scuti stars and also observation planning for time-domain discovery.

August 14, 2018

Doug NatelsonAPS March Meeting 2019 - DCMP invited symposia, DMP focused topics

A reminder to my condensed matter colleagues who go to the APS March Meeting:  We know the quality of the meeting depends strongly on getting good invited talks, the 30+6 minute talks that either come all in a group (an "invited session" or "invited symposium") or sprinkled down individually in the contributed sessions.

Now is the time to put together nominations for these things.  The more high quality nominations, the better the content of the meeting.

The APS Division of Condensed Matter Physics is seeking nominations for invited symposia.  See here for the details.  The online submission deadline is August 24th!

Similarly, the APS Division of Materials Physics is seeking nominations for invited talks as part of their Focus Topic sessions.  The list of Focus Topics is here.  The online submission deadline for these is August 29th. 

August 13, 2018

Andrew JaffePlanck: Demographics and Diversity

Another aspect of Planck’s legacy bears examining.

A couple of months ago, the 2018 Gruber Prize in Cosmology was awarded to the Planck Satellite. This was (I think) a well-deserved honour for all of us who have worked on Planck during the more than 20 years since its conception, for a mission which confirmed a standard model of cosmology and measured the parameters which describe it to accuracies of a few percent. Planck is the latest in a series of telescopes and satellites dating back to the COBE Satellite in the early 90s, through the MAXIMA and Boomerang balloons (among many others) around the turn of the 21st century, and the WMAP Satellite (The Gruber Foundation seems to like CMB satellites: COBE won the Prize in 2006 and WMAP in 2012).

Well, it wasn’t really awarded to the Planck Satellite itself, of course: 50% of the half-million-dollar award went to the Principal Investigators of the two Planck instruments, Jean-Loup Puget and Reno Mandolesi, and the other half to the “Planck Team”. The Gruber site officially mentions 334 members of the Collaboration as recipients of the Prize.

Unfortunately, the Gruber Foundation apparently has some convoluted rules about how it makes such group awards, and the PIs were not allowed to split the monetary portion of the prize among the full 300-plus team. Instead, they decided to share the second half of the funds amongst “43 identified members made up of the Planck Science Team, key members of the Planck editorial board, and Co-Investigators of the two instruments.” Those words were originally on the Gruber site but in fact have since been removed — there is no public recognition of this aspect of the award, which is completely appropriate as it is the whole team who deserves the award. (Full disclosure: as a member of the Planck Editorial Board and a Co-Investigator, I am one of that smaller group of 43, chosen not entirely transparently by the PIs.)

I also understand that the PIs will use a portion of their award to create a fund for all members of the collaboration to draw on for Planck-related travel over the coming years, now that there is little or no governmental funding remaining for Planck work, and those of us who will also receive a financial portion of the award will also be encouraged to do so (after, unfortunately, having to work out the tax implications of both receiving the prize and donating it back).

This seems like a reasonable way to handle a problem with no real fair solution, although, as usual in large collaborations like Planck, the communications about this left many Planck collaborators in the dark. (Planck also won the Royal Society 2018 Group Achievement Award which, because there is no money involved, could be uncontroversially awarded to the ESA Planck Team, without an explicit list. And the situation is much better than for the Nobel Prize.)

However, this seemingly reasonable solution reveals an even bigger, longer-standing, and wider-ranging problem: only about 50 of the 334 names on the full Planck team list (roughly 15%) are women. This is already appallingly low. Worse still, none of the 43 formerly “identified” members officially receiving a monetary prize are women (although we would have expected about 6 given even that terrible fraction). Put more explicitly, there is not a single woman in the upper reaches of Planck scientific management.

This terrible situation was also noted by my colleague Jean-Luc Starck (one of the larger group of 334) and Olivier Berné. As a slight corrective to this, it was refreshing to see Nature’s take on the end of Planck dominated by interviews with young members of the collaboration including several women who will, we hope, be dominating the field over the coming years and decades.

David Hoggso many M dwarfs!

In a great victory, Jessica Birky (UCSD) has used The Cannon to put internally consistent labels on more than 10,000 M-type dwarf stars observed in (both) the SDSS-IV APOGEE and Gaia surveys. Her labels are based on a data-driven model of spectra, but they correlate beautifully with the Gaia photometry and astrometry, and they do a good job of predicting photometric measures of temperature. The spectral model is also remarkable: Using only two parameters (effective temperature and a mean composition) the model explains a typical 8000-pixel APOGEE spectrum to percent-level accuracy. So I am pretty happy. This has implications for TESS too. We spent time late in the day writing the paper.

BackreactionBook Review: “Through Two Doors at Once” by Anil Ananthaswamy

Through Two Doors at Once: The Elegant Experiment That Captures the Enigma of Our Quantum Reality Hardcover  By Anil Ananthaswamy Dutton (August 7, 2018) The first time I saw the double-slit experiment, I thought it was a trick, an elaborate construction with mirrors, cooked up by malicious physics teachers. But no, it was not, as I was soon to learn. A laser beam pointed at a plate with two

August 12, 2018

Doug NatelsonWhat is (dielectric) polarization?

This post is an indirect follow-on from here, and was spawned by a request that I discuss the "modern theory of polarization".  I have to say, this has been very educational for me.   Before I try to give a very simple explanation of the issues, those interested in some more technical meat should look here, or here, or here, or at this nice blog post.  

Colloquially, an electric dipole is an overall neutral object with some separation between its positive and negative charge.  A great example is a water molecule, which has a little bit of excess negative charge on the oxygen atom, and a little deficit of electrons on the hydrogen atoms.  

Once we pick an origin for our coordinate system, we can define the electric dipole moment of some charge distribution as \(\mathbf{p} \equiv \int \mathbf{r}\rho(\mathbf{r}) d^{3}\mathbf{r}\), where \(\rho\) is the local charge density.  Often we care about the induced dipole, the dipole moment that is produced when some object like a molecule has its charges rearrange due to an applied electric field.  In that case, \(\mathbf{p}_{\mathrm{ind}} = \alpha \cdot \mathbf{E}\), where \(\alpha\) is the polarizability.  (In general \(\alpha\) is a tensor, because \(\mathbf{p}\) and \(\mathbf{E}\) don't have to point in the same direction.)

If we stick a slab of some insulator between metal plates and apply a voltage across the plates to generate an electric field, we learn in first-year undergrad physics that the charges inside the insulator slightly redistribute themselves - the material polarizes.  If we imagine dividing the material into little chunks, we can define the polarization \(\mathbf{P}\) as the electric dipole moment per unit volume.  For a solid, we can pick some volume and define \(\mathbf{P} = \mathbf{p}/V\), where \(V\) is the volume over which the integral is done for calculating \(\mathbf{p}\).

We can go farther than that.  If we say that the insulator is built up out of a bunch of little polarizable objects each with polarization \(\alpha\), then we can do a self-consistent calculation, where we let each polarizable object see both the externally applied electric field and the electric field from its neighboring dipoles.  Then we can solve for \(\mathbf{P}\) and therefore the relative dielectric constant in terms of \(\alpha\).  The result is called the Clausius-Mossotti relation.

In crystalline solids, however, it turns out that there is a serious problem!  As explained clearly here, because the charge in a crystal is distributed periodically in space, the definition of \(\mathbf{P}\) given above is ambiguous because there are many ways to define the "unit cell" over which the integral is performed.  This is a big deal.  

The "modern theory of polarization" resolves this problem, and actually involves the electronic Berry Phase.  First, it's important to remember that polarization is really defined experimentally by how much charge flows when that capacitor described above has the voltage applied across it.  So, the problem we're really trying to solve is, find the integrated current that flows when an electric field is ramped up to some value across a periodic solid.  We can find that by adding up all the contributions of the different electronic states that are labeled by wavevectors \(\mathbf{k}\).  For each \(\mathbf{k}\) in a given band, there is a contribution that has to do with how the energy varies with \(\mathbf{k}\) (that's the part that looks roughly like a classical velocity), and there's a second piece that has to do with how the actual electronic wavefunctions vary with \(\mathbf{k}\), which is proportional to the Berry curvature.   If you add up all the \(\mathbf{k}\) contributions over the filled electronic states in the insulator, the first terms all cancel out, but the second terms don't, and actually give you a well-defined amount of charge.   

Bottom line:  In an insulating crystal, the actual polarization that shows up in an applied electric field comes from how the electronic states vary with \(\mathbf{k}\) within the filled bands.  This is a really surprising and deep result, and it was only realized in the 1990s.  It's pretty neat that even "simple" things like crystalline insulators can still contain surprises (in this case, one that foreshadowed the whole topological insulator boom). 

David Hoggmaking the angle distribution uniform

Years ago, Jo Bovy (now Toronto) and I wrote this crazy paper, in which we infer the force law in the Solar System from a snapshot of the 8 planets' positions and velocities. Because you can't infer dynamics from initial conditions in general, we had to make additional assumptions; we made the assumptions that the system is old, non-resonant, and being observed at no special time. That led to the conclusion that the distribution function should depend only on actions and not on conjugate angles.

But that's not enough: How to do inference? The frequentist solution is orbital roulette, in which you choose the force law(s) in which the conjugate angles look well mixed or uniformly distributed. That's clever, but what's the Bayesian generalization? (Or, really, specification?)

It turns out that there is no way to generate the data with a likelihood function and also insist that the angles be mixed. In Bayesian inference, all you can do is generate the data, and the data can be generated with functions that don't depend on angles. But beyond the generative model, you can't additionally insist that the angles look mixed. That isn't part of the generative model! So the solution (which was expensive) was to just model the kinematic snapshot with a very general form for the distribution function, which has a lot of flexibility but only depends on actions, generate the angles uniformly, and hope for the best. And it worked.

Why am I saying all of this? Because exactly the same issue came up today (and in the last few weeks) between Rix (MPIA) and me: I have this project to find the potential in which the chemical abundances don't vary with angle. And I can make frequentist methods that are based on minimizing angle dependences. But the only Bayesian methods I can create don't really directly insist that the abundances don't depend on angle: They only insist that the abundance distribution is controlled by the actions alone. I spent the non-discussion part of the day coding up relevant stuff.

John BaezThe Philosophy and Physics of Noether’s Theorems


I’ll be speaking at a conference celebrating the centenary of Emmy Noether’s work connecting symmetries and conservation laws:

The Philosophy and Physics of Noether’s Theorems, 5-6 October 2018, Fischer Hall, 1-4 Suffolk Street, London, UK. Organized by Bryan W. Roberts (LSE) and Nicholas Teh (Notre Dame).

They write:

2018 brings with it the centenary of a major milestone in mathematical physics: the publication of Amalie (“Emmy”) Noether’s theorems relating symmetry and physical quantities, which continue to be a font of inspiration for “symmetry arguments” in physics, and for the interpretation of symmetry within philosophy.

In order to celebrate Noether’s legacy, the University of Notre Dame and the LSE Centre for Philosophy of Natural and Social Sciences are co-organizing a conference that will bring together leading mathematicians, physicists, and philosophers of physics in order to discuss the enduring impact of Noether’s work.

There’s a registration fee, which you can see on the conference website, along with a map showing the conference location, a schedule of the talks, and other useful stuff.

Here are the speakers:

John Baez (UC Riverside)

Jeremy Butterfield (Cambridge)

Anne-Christine Davis (Cambridge)

Sebastian De Haro (Amsterdam and Cambridge)

Ruth Gregory (Durham)

Yvette Kosmann-Schwarzbach (Paris)

Peter Olver (UMN)

Sabrina Pasterski (Harvard)

Oliver Pooley (Oxford)

Tudor Ratiu (Shanghai Jiao Tong and Geneva)

Kasia Rejzner (York)

Robert Spekkens (Perimeter)

I’m looking forward to analyzing the basic assumptions behind various generalizations of Noether’s first theorem, the one that shows symmetries of a Lagrangian give conserved quantities. Having generalized it to Markov processes, I know there’s a lot more to what’s going on here than just the wonders of Lagrangian mechanics:

• John Baez and Brendan Fong, A Noether theorem for Markov processes, J. Math. Phys. 54 (2013), 013301. (Blog article here.)

I’ve been trying to get to the bottom of it ever since.

August 11, 2018

n-Category Café The Philosophy and Physics of Noether's Theorems

Nicholas Teh tells me that there is to be a conference held in London, UK, on October 5-6, 2018, celebrating the centenary of Emmy Noether’s work in mathematical physics.

2018 brings with it the centenary of a major milestone in mathematical physics: the publication of Amalie (“Emmy”) Noether’s theorems relating symmetry and physical quantities, which continue to be a font of inspiration for “symmetry arguments” in physics, and for the interpretation of symmetry within philosophy.

In order to celebrate Noether’s legacy, the University of Notre Dame and the LSE Centre for Philosophy of Natural and Social Sciences are co-organizing a conference that will bring together leading mathematicians, physicists, and philosophers of physics in order to discuss the enduring impact of Noether’s work.

Speakers include our very own John Baez.

We have the entry nLab: Noether’s theorem. Since this (the first theorem) concerns group symmetries and conserved quantities, and since we are at the nn-Category Café, naturally we’re interested in higher Noetherian constructions, involving actions by higher groups. For an example of this you can turn to Urs Schreiber’s Higher prequantum geometry and its talk of ‘higher Noether currents’ as a L L_{\infty}-algebra extension (p. 21).

Here are all the conference speakers:

August 10, 2018

Matt von HippelDifferent Fields, Different Worlds

My grandfather is a molecular biologist. When we meet, we swap stories: the state of my field and his, different methods and focuses but often a surprising amount of common ground.

Recently he forwarded me an article by Raymond Goldstein, a biological physicist, arguing that biologists ought to be more comfortable with physical reasoning. The article is interesting in its own right, contrasting how physicists and biologists think about the relationship between models, predictions, and experiments. But what struck me most about the article wasn’t the content, but the context.

Goldstein’s article focuses on a question that seemed to me oddly myopic: should physical models be in the Results section, or the Discussion section?

As someone who has never written a paper with either a Results section or a Discussion section, I wondered why anyone would care. In my field, paper formats are fairly flexible. We usually have an Introduction and a Conclusion, yes, but in between we use however many sections we need to explain what we need to. In contrast, biology papers seem to have a very fixed structure: after the Introduction, there’s a Results section, a Discussion section, and a Materials and Methods section at the end.

At first blush, this seemed incredibly bizarre. Why describe your results before the methods you used to get them? How do you talk about your results without discussing them, but still take a full section to do it? And why do reviewers care how you divide things up in the first place?

It made a bit more sense once I thought about how biology differs from theoretical physics. In theoretical physics, the “methods” are most of the result: unsolved problems are usually unsolved because existing methods don’t solve them, and we need to develop new methods to make progress. Our “methods”, in turn, are often the part of the paper experts are most eager to read. In biology, in contrast, the methods are much more standardized. While papers will occasionally introduce new methods, there are so many unexplored biological phenomena that most of the time researchers don’t need to invent a new method: just asking a question no-one else has asked can be enough for a discovery. In that environment, the “results” matter a lot more: they’re the part that takes the most scrutiny, that needs to stand up on its own.

I can even understand the need for a fixed structure. Biology is a much bigger field than theoretical physics. My field is small enough that we all pretty much know each other. If a paper is hard to read, we’ll probably get a chance to ask the author what they meant. Biology, in contrast, is huge. An important result could come from anywhere, and anyone. Having a standardized format makes it a lot easier to scan through an unfamiliar paper and find what you need, especially when there might be hundreds of relevant papers.

The problem with a standardized system, as always, is the existence of exceptions. A more “physics-like” biology paper is more readable with “physics-like” conventions, even if the rest of the field needs to stay “biology-like”. Because of that, I have a lot of sympathy for Goldstein’s argument, but I can’t help but feel that he should be asking for more. If creating new mathematical models and refining them with observation is at the heart of what Goldstein is doing, then maybe he shouldn’t have to use Results/Discussion/Methods in the first place. Maybe he should be allowed to write biology papers that look more like physics papers.

Doug NatelsonHydraulic jump: New insights into a very old phenomenon

Ever since I learned about them, I thought that hydraulic jumps were cool.  As I wrote here, a hydraulic jump is an analog of a standing shockwave.  The key dimensionless parameter in a shockwave in a gas is the Mach number, the ratio between the fluid speed \(v\) and the local speed of sound, \(c_{\mathrm{s}}\).   The gas that goes from supersonic (\(\mathrm{Ma} > 1\)) on one side of the shock to subsonic (\(\mathrm{Ma} < 1\)) on the other side.

For a looong time, the standard analysis of hydraulic jumps assumed that the relevant dimensionless number here was the Froude number, the ratio of fluid speed to the speed of (gravitationally driven) shallow water waves, \(\sqrt{g h}\), where \(g\) is the gravitational acceleration and \(h\) is the thickness of the liquid (say on the thin side of the jump).  That's basically correct for macroscopic jumps that you might see in a canal or in my previous example.

However, a group from Cambridge University has shown that this is not the right way to think about the kind of hydraulic jump you see in your sink when the stream of water from the faucet hits the basin.  (Sorry that I can't find a non-pay link to the paper.)  They show this conclusively by the very simple, direct method of producing hydraulic jumps by shooting water streams horizontally onto a wall, and vertically onto a "ceiling".  The fact that hydraulic jumps look the same in all these cases clearly shows that gravity can't be playing the dominant role in this case.  Instead, the correct analysis is to worry about not just gravity but also surface tension.  They do a general treatment (which is quite elegant and understandable to fluid mechanics-literate undergrads) and find that the condition for a hydraulic jump to form is now \(\mathrm{We}^{-1} + \mathrm{Fr}^{-2} = 1\), where \(\mathrm{Fr} \sim v/\sqrt{g h}\) as usual, and the Weber number \(\mathrm{We} \sim \rho v^{2} h/\gamma\), where \(\rho\) is the fluid density and \(\gamma\) is the surface tension.   The authors do a convincing analysis of experimental data with this model, and it works well.  I think it's very cool that we can still get new insights into phenomena, and this is an example understandable at the undergrad level where some textbook treatments will literally have to be rewritten.

August 09, 2018

Jordan EllenbergDinner experiment

Ground lamb from Double Ewe Farm (Arena, WI) bought at Conscious Carnivore, stir-fried with scallions/mushrooms/cabbage/garlic/soy sauce/sesame oil and served on top of shiso leaves from Crossroads Community Farm (Cross Plains, WI) with Hot Mama’s habanero sauce from Belize.

I would include a picture of it but it actually didn’t look very pretty.  It tasted great, though!


Jordan EllenbergEighth Grade

Saw this with CJ.  Good movie.  If you’re wondering, can you see this with your adolescent, definitely yes.  If you’re wondering, will my adolescent have a deep conversation with me afterwards about the challenges of growing up, well, that’s not really CJ’s style but good luck with it!

My favorite thing about Eighth Grade is the way it captures the adolescent challenge seeing other human beings as actual people, like oneself, with their own interior lives.  Other people, for Kayla, are still mostly instruments, things to do something with, or things from which to get a response.  Or maybe she’s just at the moment of learning that other people are not just that?  Very good the way she records Olivia’s name in her phone as “Olivia High School” — other people are roles, they fit in slots — the crush, the shadow, the rival.  Olivia, older, engages with Kayla’s real self in a way that Kayla isn’t yet ready to reciprocate.

But why did she have to say at the end that high school was going to be cool, except math?  Come on, teen movie, be better.


John PreskillThe Curious Behavior of Topological Insulators

IQIM hosts a Summer Research Institute that invites high school Physics teachers to work directly with staff, students, and researchers in the lab.  Last summer I worked with Marcus Teague, a highly intelligent and very patient Caltech Staff Scientist in the Yeh Group, to help set up an experiment for studying exotic material samples under circularly polarized light.  I had researched, ordered, and assembled parts for the optics and vacuum chamber.  As I returned to Caltech this summer, I was eager to learn how the Yeh Group had proceeded with the study.

Yeh group 2017

Yeh group (2017): I am the one on the front-left of the picture, next to Dr. Yeh and in front of Kyle Chen. Benjamin Fackrell, another physics teacher interning at the Yeh lab, is all the way to the right.

The optics equipment I had researched, ordered, and helped to set up last summer is being used currently to study topological insulator (TI) samples that Kyle Chien-Chang Chen, a doctoral candidate, has worked on in the Yeh Lab.  Yes, a high school Physics teacher played a small role in their real research! It is exciting and humbling to have a connection to real-time research.


Quartz quarter-wave plates are important elements in many experiments involving light. They convert linearly polarized light to circularly polarized light.

Kyle receives a variety of TI samples from UCLA; the current sample up for review is Bismuth Antimony Telluride \mathrm{(BiSb)}_2\mathrm{Te}_3.  Depending on the particular sample and the type of testing, Kyle has a variety of procedures to prep the samples for study.  And this summer, Kyle has help from visiting Canadian student Adrian Llanos. Below are figures of some of the monolayer and bilayer structures for topological insulators studied in the lab.

2016 0808 sample from UCLA

Pictures of samples from UCLA

Under normal conditions, a topological insulator (TI) is only conductive on the surface. The center of a TI sample is an insulator. But when the surface states open an energy gap, the surface of the TI becomes insulating. The energy gap is the amount of energy necessary to remove an electron from the top valence band to become free to move about.  This gap is the result of the interaction between the conduction band and valence band surface states from the opposing surfaces of a thin film. The resistance of the conducting surface actually increases. The Yeh group is hoping that the circularly polarized light can help align the spin of the Chromium electrons, part of the bilayer of the TI.  At the same time, light has other effects, like photo-doping, which excites more electrons into the conduction bands and thus reduces the resistance. The conductivity of the surface of the TI changes as the preferentially chosen spin up or spin down is manipulated by the circularly polarized light or by the changing magnetic field.


A physical property measurement system.

This interesting experiment on TI samples is taking place within a device called a Physical Property Measurement System (PPMS).  The PPMS is able to house the TI sample and the optics equipment to generate circularly polarized light, while allowing the researchers to vary the temperature and magnetic field.  The Yeh Group is able to artificially turn up the magnetic field or the circularly polarized light in order to control the resistance and current signal within the sample.  The properties of surface conductivity are studied up to 8 Tesla (over one-hundred thousand times the Earth’s magnetic field), and from room temperature (just under 300 Kelvin) to just below 2 Kelvin (colder than outer space).


Right-Hand-Rule used to determine the direction of the magnetic (Lorentz) force.

In the presence of a magnetic field, when a current is applied to a conductor, the electrons will experience a force at a right angle to the magnetic field, following the right-hand rule (or the Physics gang sign, as we affectionately call it in my classroom).  This causes the electrons to curve perpendicular to their original path and perpendicular to the magnetic field. The build up of electrons on one end of the conductor creates a potential difference. This potential difference perpendicular to the original current is known as the ordinary Hall Effect.  The ratio of the induced voltage to the applied current is known as the Hall Resistance.

Under very low temperatures, the Quantum Hall Effect is observed. As the magnetic field is changed, the Hall Voltage increases in set quantum amounts, as opposed to gradually. Likewise, the Hall Resistance is quantized.  It is a such an interesting phenomenon!

For a transport measurement of the TI samples, Kyle usually uses a Hall Bar Geometry in order to measure the Hall Effect accurately. Since the sample is sufficiently large, he can simply solder it for measurement.


Transport Measurements of TI Samples follow the same setup as Quantum Hall measurements on graphene: Current runs through electrodes attached to the North/South ends of the sample, while electron flow is measured longitudinally, as well as along the East/West ends (Hall conductance).

What is really curious is that the Bismuth Antimony Telluride samples are exhibiting the Hall Effect even when no external magnetic field is applied!  When the sample is measured, there is a Hall Resistance despite no external magnetic field. Hence the sample itself must be magnetic.  This phenomenon is called the Anomalous Hall Effect.

According to Kyle, there is no fancy way to measure the magnetization directly; it is only a matter of measuring a sample’s Hall Resistance. The Hall Resistance should be zero when there is no Anomalous Hall Effect, and when there is ferromagnetism (spins want to align in the direction of their neighbors), you see a non-zero value.  What is really interesting is that they assume ferromagnetism would break the time-reversal symmetry and thus open a gap at the surface states.  A very strange behavior that is also observed is that the longitudinal resistance increases gradually.  

Running PPMS

Running PPMS

Typically the quantum Hall Resistance increases in quantum increments.  Even if the surface gap is open, the sample is not insulating because the gap is small (<0.3 eV); hence, under these conditions this TI is behaving much more like a semiconductor!

Next, the group will examine these samples using the Scanning Tunneling Microscope (STM).  The STM will be able to provide local topological information by examining 1 micron by 1 micron areas.  In comparison, the PPMS research with these samples is telling the story of the global behavior of the sample.  The combination of information from the PPMS and STM research will provide a more holistic story of the behavior of these unique samples.

I am thrilled to see how the group has used what we started with last summer to find interesting new results.  I am fascinated to see what they learn in the coming months with the different samples and STM testing. And I am quite excited to share these applications with my students in the upcoming new school year.  Another summer packed with learning!

Terence TaoBirkar, Figalli, Scholze, Venkatesh

Every four years at the International Congress of Mathematicians (ICM), the Fields Medal laureates are announced.     Today, at the 2018 ICM in Rio de Janeiro, it was announced that the Fields Medal was awarded to Caucher Birkar, Alessio Figalli, Peter Scholze, and Akshay Venkatesh.

After the two previous congresses in 2010 and 2014, I wrote blog posts describing some of the work of each of the winners.  This time, though, I happened to be a member of the Fields Medal selection committee, and as such had access to a large number of confidential letters and discussions about the candidates with the other committee members; in order to have the opinions and discussion as candid as possible, it was explicitly understood that these communications would not be publicly disclosed.  Because of this, I will unfortunately not be able to express much of a comment or opinion on the candidates or the process as an individual (as opposed to a joint statement of the committee).  I can refer you instead to the formal citations of the laureates (which, as a committee member, I was involved in crafting, and then signing off on), or the profiles of the laureates by Quanta magazine; see also the short biographical videos of the laureates by the Simons Foundation that accompanied the formal announcements of the winners. I am sure, though, that there will be plenty of other mathematicians who will be able to present the work of each of the medalists (for instance, there was a laudatio given at the ICM for each of the winners, which should eventually be made available at this link).

I know that there is a substantial amount of interest in finding out more about the inner workings of the Fields Medal selection process.  For the reasons stated above, I as an individual will unfortunately be unable to answer any questions about this process (e.g., I cannot reveal any information about other nominees, or of any comparisons between any two candidates or nominees).  I think I can safely express the following two personal opinions though.  Firstly, while I have served on many prize committees in the past, the process for the Fields Medal committee was by far the most thorough and deliberate of any I have been part of, and I for one learned an astonishing amount about the mathematical work of all of the shortlisted nominees, which was an absolutely essential component of the deliberations, in particular giving the discussions a context which would have been very difficult to obtain for an individual mathematician not in possession of all the confidential letters, presentations, and other information available to the committee (in particular, some of my preconceived impressions about the nominees going into the process had to be corrected in light of this more complete information).  Secondly, I believe the four medalists are all extremely deserving recipients of the prize, and I fully stand by the decision of the committee to award the Fields medals this year to these four.

I’ll leave the comments to this post open for anyone who wishes to discuss the work of the medalists.  But, for the reasons above, I will not participate in the discussion myself.

[Edit, Aug 1: looks like the ICM site is (barely) up and running now, so links have been added.  At this time of writing, there does not seem to be an online announcement of the composition of the committee, but this should appear in due course. -T.]

[Edit, Aug 9: the composition of the Fields Medal Committee for 2018 (which included myself) can be found here. -T.]


August 07, 2018

Doug NatelsonFaculty position at Rice - experimental atomic/molecular/optical

Faculty Position in Experimental Atomic/Molecular/Optical Physics at Rice University

The Department of Physics and Astronomy at Rice University in Houston, TX ( invites applications for a tenure-track faculty position in experimental atomic, molecular, and optical physics.  The Department expects to make an appointment at the assistant professor level. Applicants should have an outstanding research record and recognizable potential for excellence in teaching and mentoring at the undergraduate and graduate levels. The successful candidate is expected to establish a distinguished, externally funded research program and support the educational and service missions of the Department and University.

Applicants must have a PhD in physics or related field, and they should submit the following: (1) cover letter; (2) curriculum vitae; (3) research statement; (4) three publications; (5) teaching statement; and (6) the names, professional affiliations, and email addresses of three references. For full details and to apply, please visit: The review of applications will begin November 1, 2018, but all those received by December 1, 2018 will be assured full consideration. The appointment is expected to start in July 2019.  Further inquiries should be directed to the chair of the search committee, Prof. Thomas C. Killian (

Rice University is an Equal Opportunity Employer with commitment to diversity at all levels, and considers for employment qualified applicants without regard to race, color, religion, age, sex, sexual orientation, gender identity, national or ethnic origin, genetic information, disability or protected veteran status.

BackreactionDear Dr B: Is it possible that there is a universe in every particle?

“Is it possible that our ‘elementary’ particles are actually large scale aggregations of a different set of something much smaller? Then, from a mathematical point of view, there could be an infinite sequence of smaller (and larger) building blocks and universes.”                                                                       ~Peter Letts Dear Peter, I love the idea that there is a

BackreactionLimits of Reductionism

Almost forgot to mention I made it 3rd prize in the 2018 FQXi essay contest “What is fundamental?” The new essay continues my thoughts about whether free will is or isn’t compatible with what we know about the laws of nature. For many years I was convinced that the only way to make free will compatible with physics is to adopt a meaningless definition of free will. The current status is that I

BackreactionWhat's the purpose of working in the foundations of physics?

That’s me. Photo by George Musser. Yes, I need a haircut. [Several people asked me for a transcript of my intro speech that I gave yesterday in Utrecht at the 19th UK and European conference on foundations of physics. So here it is.] Thank you very much for the invitation to this 19th UK and European conference on Foundations of physics. The topic of this conference combines everything that

BackreactionEvidence for modified gravity is now evidence against it.

Hamster. Not to scale.Img src: Petopedia. It’s day 12,805 in the war between modified gravity and dark matter. That’s counting the days since the publication of Mordehai Milgrom’s 1983 paper. In this paper he proposed to alter Einstein’s theory of general relativity rather than conjecturing invisible stuff. Dark matter, to remind you, are hypothetical clouds of particles that hover around

August 06, 2018

John PreskillThe Quantum Wave in Computing

Summer is a great time for academics. Imagine: three full months off! Hit the beach. Tune that golf pitch. Hike the sierras. Go on a cruise. Watch soccer with the brazilenos (there’s been better years for that one). Catch the sunset by the Sydney opera house. Take a nap.


A visiting researcher taking full advantage of the Simons Institute’s world-class relaxation facilities. And yes, I bet you he’s proving a theorem at the same time.

Think that’s outrageous? We have it even better. Not only do we get to travel the globe worry-free, but we prove theorems while doing it. For some of us summer is the only time of year when we manage to prove theorems. Ideas accumulate during the year, blossom during the conferences and workshops that mark the start of the summer, and hatch during the few weeks that many of us set aside as “quiet time” to finally “wrap things up”.

I recently had the pleasure of contributing to the general well-being of my academic colleagues by helping to co-organize (with Andrew Childs, Ignacio Cirac, and Umesh Vazirani) a 2-month long program on “Challenges in Quantum Computation” at the Simons Institute in Berkeley. In this post I report on the program and describe one of the highlights discussed during it: Mahadev’s very recent breakthrough on classical verification of quantum computation.

Challenges in Quantum Computation

The Simons Institute has been in place on the UC Berkeley campus since the Fall of 2013, and in fact one of their first programs was on “Quantum Hamiltonian Complexity”, in Spring 2014 (see my account of one of the semester’s workshops here). Since then the institute has been hosting a pair of semester-long programs at a time, in all areas of theoretical computer science and neighboring fields. Our “summer cluster” had a slightly different flavor: shorter, smaller, it doubled up as the prelude to a full semester-long program scheduled for Spring 2020 (provisional title: The Quantum Wave in Computing, a title inspired from Umesh Vazirani’s recent tutorial at STOC’18 in Los Angeles) — (my interpretation of) the idea being that the ongoing surge in experimental capabilities supports a much broader overhaul of some of the central questions of computer science, from the more applied (such as, programming languages and compilers), to the most theoretical (such as, what complexity classes play the most central role).

This summer’s program hosted a couple dozen participants at a time. Some stayed for the full 2 months, while others visited for shorter times. The Simons Institute is a fantastic place for collaborative research. The three-story building is entirely devoted to us. There are pleasant yet not-too-comfortable shared offices, but the highlight is the two large communal rooms meant for organized and spontaneous discussion. Filled with whiteboards, bright daylight, comfy couches, a constant supply of tea, coffee, and cookies, and eager theorists!

After a couple weeks of settling down the program kicked off with an invigorating workshop. Our goal for the workshop was to frame the theoretical questions raised by the sudden jump in the capabilities of experimental quantum devices that we are all witnessing. There were talks describing progress in experiments (superconducting qubits, ion traps, and cold atoms were represented), suggesting applications for the new devices (from quantum simulation & quantum chemistry to quantum optimization and machine learning through “quantum supremacy” and randomness generation), and laying the theoretical framework for trustworthy interaction with the quantum devices (interactive proofs, testing, and verifiable delegation). We had an outstanding line-up of speakers. All talks (except the panel discussions, unfortunately) were recorded, and you can watch them here.

The workshop was followed by five additional weeks of “residency”, that allowed long-term participants to digest and develop the ideas presented during the workshop. In my experience these few additional weeks, right after the workshop, make all the difference. It is the same difference as between a quick conference call and a leisurely afternoon at the whiteboard: while the former may feel productive and bring the adrenaline levels up, the latter is more suited to in-depth exploration and unexpected discoveries.

There would be much to say about the ideas discussed during the workshop and following weeks. I will describe a single one of these ideas — in my opinion, one of the most outstanding ideas to have emerged at the interface of quantum computing and theoretical computer science in recent years! The result, “Classical Verification of Quantum Computations”, is by Urmila Mahadev, a Ph.D.~student at UC Berkeley (I think she just graduated). Urmila gave a wonderful talk on her result at the workshop, and I highly recommend watching the recorded video. In the remainder of this post I’ll provide an overview of the result. I also wrote a slightly more technical introduction that eager readers will find here.

A cryptographic leash on quantum systems

Mahadev’s result is already famous: announced on the blog of Scott Aaronson, it has earned her a long-standing 25$ prize, awarded for “solving the problem of proving the results of an arbitrary quantum computation to a classical skeptic”. Or, in complexity-theoretic linguo, for showing that “every language in the class BQP admits an interactive protocol where the prover is in BQP and the verifier is in BPP”. What does this mean?

Verifying quantum computations in the high complexity regime

On his blog Scott Aaronson traces the question back to a talk given by Daniel Gottesman in 2004. An eloquent formulation appears in a subsequent paper by Dorit Aharonov and Umesh Vazirani, aptly titled “Is Quantum Mechanics Falsifiable? A computational perspective on the foundations of Quantum Mechanics”.

Here is the problem. As readers of this blog are well aware, Feynman’s idea of a quantum computer, and the subsequent formalization by Bernstein and Vazirani of the Quantum Turing Machine, layed the theoretical foundation for the construction of computing devices whose inner functioning is based on the laws of quantum physics. Most readers also probably realize that we currently believe that these quantum devices will have the ability to efficiently solve computational problems (the class of which is denoted BQP) that are thought to be beyond the reach of classical computers (represented by the class BPP). A prominent example is factoring, but there are many others. The most elementary example is arguably Feynman’s original proposal: a quantum computer can be used to simulate the evolution of any quantum mechanical system “in real time”. In contrast, the best classical simulations available can take exponential time to converge even on concrete examples of practical interest. This places a computational impediment to scientific progress: the work of many physicists, chemists, and biologists, would be greatly sped up if only they could perform simulations at will.

So this hypothetical quantum device claims (or will likely claim) that it has the ability to efficiently solve computational problems for which there is no known efficient classical algorithm. Not only this but, as is widely believed in complexity-theoretic circles (a belief recently strenghtened by the proof of an oracle separation between BQP and PH by Tal and Raz), for some of these problems, even given the answer, there does not exist a classical proof that the answer is correct. The quantum device’s claim cannot be verified! This seems to place the future of science at the mercy of an ingenuous charlatan, with good enough design & marketing skills, that would convince us that it is providing the solution to exponentially complex problems by throwing stardust in our eyes. (Wait, did this happen already?)

Today is the most exciting time in quantum computing since the discovery of Shor’s algorithm for factoring: while we’re not quite ready to run that particular algorithm yet, experimental capabilities have ramped up to the point where we are just about to probe the “high-complexity” regime of quantum mechanics, by making predictions that cannot be emulated, or even verified, using the most powerful classical supercomputers available. What confidence will we have that the predictions have been obtained correctly? Note that this question is different from the question of testing the validity of the theory of quantum mechanics itself. The result presented here assumes the validity of quantum mechanics. What it offers is a method to test, assuming the correctness of quantum mechanics, that a device performs the calculation that it claims to have performed. If the device has supra-quantum powers, all bets are off. Even assuming the correctness of quantum mechanics, however, the device may, intentionally or not (e.g. due to faulty hardware), mislead the experimentalist. This is the scenario that Mahadev’s result aims to counter.

Interactive proofs

The first key idea is to use the power of interaction. The situation can be framed as follows: given a certain computation, such that a device (henceforth called “prover”) has the ability to perform the computation, but another entity, the classical physicist (henceforth called “verifier”) does not, is there a way for the verifier to extract the right answer from the prover with high confidence — given that the prover may not be trusted, and may attempt to use its superior computing power to mislead the verifier instead of performing the required computation?

The simplest scenario would be one where the verifier can execute the computation herself, and check the prover’s outcome. The second simplest scenario is one where the verifier cannot execute the computation, but there is a short proof that the prover can provide that allows her to fully certify the outcome. These two scenario correspond to problems in BPP and NP respectively; an example of the latter is factoring. As argued earlier, not all quantum computations (BQP) are believed to fall within these two classes. Both direct computation and proof verification are ruled out. What can we do? Use interaction!

The framework of interactive proofs originates in complexity theory in the 1990s. An interactive proof is a protocol through which a verifier (typically a computationally bounded entity, such as the physicist and her classical laptop) interacts with a more powerful, but generally untrusted, prover (such as the experimental quantum device). The goal of the protocol is for the verifier to certify the validity of a certain computational statement.

Here is a classical example (the expert — or impatient — reader may safely skip this). The example is for a problem that lies in co-NP, but is not believed to lie in NP. Suppose that both the verifier and prover have access to two graphs, {G} and {H}, such that the verifier wishes to raise an “ACCEPT” flag if and only if the two graphs are not isomorphic. In general this is a hard decision to make, because the verifier would have to check all possible mappings from one graph to the other, of which there are exponentially many. Here is how the verifier can extract the correct answer by interacting with a powerful, untrusted prover. First, the verifier flips a fair coin. If the coin comes up heads, she selects a random relabeling of the vertices of {G}. If the coin comes up tail, she selects a random relabeling of the vertices of {H}. The verifier then sends the relabeled graph to the prover, and asks the prover to guess which graph the verifier has hidden. If the prover provides the correct answer (easy to check), the verifier concludes that the graphs were not isomorphic. Otherwise, she concludes that they were isomorphic. It is not hard to see that, if the graphs are indeed not isomorphic, the prover always has a means to correctly identify the hidden graph, and convince the verifier to make the right decision. But if the graphs are isomorphic, then there is no way for the prover to distinguish the random relabelings (since the distributions obtained by randomly relabeling each graph are identical), and so the verifier makes the right decision with probability 1/2. Repeating the protocol a few times, with a different choice of relabeling each time, quickly drives the probability of making an error to {0}.

A deep result from the 1990s exactly charaterizes the class of computational problems (languages) that a classical polynomial-time verifier can decide in this model: IP = PSPACE. In words, any problem whose solution can be found in polynomial space has an interactive proof in which the verifier only needs polynomial time. Now observe that PSPACE contains NP, and much more: in fact PSPACE contains BQP as well (and even QMA)! (See this nice recent article in Quanta for a gentle introduction to these complexity classes, and more.) Thus any problem that can be decided on a quantum computer can also be decided without a quantum computer, by interacting with a powerful entity, the prover, that can convince the verifier of the right answer without being able to induce her in error (in spite of the prover’s greater power).

Are we not done? We’re not! The problem is that the result PSPACE = IP, even when specialized to BQP, requires (for all we know) a prover whose power matches that of PSPACE (almost: see e.g. this recent result for a slighlty more efficient prover). And as much as our experimental quantum device inches towards the power of BQP, we certainly wouldn’t dare ask it to perform a PSPACE-hard computation. So even though in principle there do exist interactive proofs for BQP-complete languages, these interactive proofs require a prover whose computational power goes much beyond what we believe is physically achievable. But that’s useless (for us): back to square zero.

Interactive proofs with quantum provers

Prior to Mahadev’s result, a sequence of beautiful results in the late 2000’s introduced a clever extension of the model of interactive proofs by allowing the verifier to make use of a very limited quantum computer. For example, the verifier may have the ability to prepare single qubits in two possible bases of her choice, one qubit at a time, and send them to the prover. Or the verifier may have the ability to receive single qubits from the prover, one at a time, and measure them in one of two bases of her choice. In both cases it was shown that the verifier could combine such limited quantum capacity with the possibility to interact with a quantum polynomial-time prover to verify arbitrary polynomial-time quantum computation. The idea for the protocols crucially relied on the ability of the verifier to prepare qubits in a way that any deviation by the prover from the presecribed honest behavior would be detected (e.g. by encoding information in mutually unbiased bases unknown to the prover). For a decade the question remained open: can a completely classical verifier certify the computation performed by a quantum prover?

Mahadev’s result brings a positive resolution to this question. Mahadev describes a protocol with the following properties. First, as expected, for any quantum computation, there is a quantum prover that will convince the classical verifier of the right outcome for the computation. This property is called completeness of the protocol. Second, no prover can convince the classical verifier to accept a wrong outcome. This property is called soundness of the protocol. In Mahadev’s result the latter property comes with a twist: soundness holds provided the prover cannot break post-quantum cryptography. In contrast, the earlier results mentioned in the previous paragraph obtained protocols that were sound against an arbitrarily powerful prover. The additional cryptographic assumption gives Mahadev’s result a “win-win” flavor: either the protocol is sound, or someone in the quantum cloud has figured out how to break an increasingly standard cryptographic assumption (namely, post-quantum security of the Learning With Errors problem) — in all cases, a verified quantum feat!

In the remaining of the post I will give a high-level overview of Mahadev’s protocol and its analysis. For more detail, see the accompanying blog post.

The protocol is constructed in two steps. The first step builds on insights from works preceding this one. This step reduces the problem of verifying the outcome of an arbitrary quantum computation to a seemingly much simpler problem, that nevertheless encapsulates all the subtlety of the verification task. The problem is the following — in keeping with the terminology employed by Mahadev, I’ll call it the qubit commitment problem. Suppose that a prover claims to have prepared a single-qubit state of its choice; call it {| \psi \rangle} ({| \psi \rangle} is not known to the verifier). Suppose the verifier challenges the prover for the outcome of a measurement performed on {| \psi \rangle}, either in the computational basis (the eigenbasis of the Pauli Z), or in the Hadamard basis (the eigenbasis of the Pauli X). Which basis to use is the verifier’s choice, but of course only one basis can be asked. Does there exist a protocol that guarantees that, at the end of the protocol, the verifier will be able to produce a bit that matches the true outcome of a measurement of {| \psi \rangle} in the chosen basis? (More precisely, it should be that the verifier’s final bit has the same distribution as the outcome of a measurement of {| \psi \rangle} in the chosen basis.)

The reduction that accomplishes this first step combines Kitaev’s circuit-to-Hamiltonian construction with some gadgetry from perturbation theory, and I will not describe it here. An important property of the reduction is that it is ultimately sufficient that the verifier has the guarantee that the measurement outcomes she obtains in either case, computational or Hadamard, are consistent with measurement outcomes for the correct measurements performed on some quantum state. In principle the state does not need to be related to anything the prover does (though of course the analysis will eventually define that state from the prover), it only needs to exist. Specifically, we wish to rule out situations where e.g. the prover claims that both outcomes are deterministically “0”, a claim that would violate the uncertainty principle. (For the sake of the argument, let’s ignore that in the case of a single qubit the space of outcomes allowed by quantum mechanics can be explicitly mapped out — in the actual protocol, the prover commits to {n} qubits, not just one.)

Committing to a qubit

The second step of the protocol construction introduces a key idea. In order to accomplish the sought-after commitment, the verifier is going to engage in an initial commitment phase with the prover. In this phase, the prover is required to provide classical information to the verifier, that “commits” it to a specific qubit. This committed qubit is the state on which the prover will later perform the measurement asked by the verifier. The classical information obtained in the commitment phase will bind the prover to reporting the correct outcome, for both of the verifier’s basis choice — or risk being caught cheating.bit_commit_cartoon

How does this work? Commitments to bits, or even qubits, are an old story in cryptography. The standard method for committing to a bit {b} is based on the use of a one-way permutation {f}, together with a hardcore predicate {h} for {f} (i.e.~an efficiently computable function {h: \{0,1\}^n\rightarrow \{0,1\}} such that given {f(x)}, it is hard to predict {h(x)}). The construction goes as follows. The committer selects a uniformly random string {r} and sends {(y,m)=(f(r),h(r)\oplus b)}. To unveil the commitment {b}, it is enough to reveal a string {r} such that {f(r)=y}; the receiver can then compute {h(r)} and decode {b=h(r)\oplus m}. The point is that since {f} is a permutation, the value {y} uniquely “commits” the sender to an {r}, and thus to a {b}; however, given {y=f(r)} for an unknown {r} the hardcore predicate {h(r)} looks uniformly random, thus {(y,m)} reveals no information about {b} to the receiver.

What is new in Mahadev’s scheme is not only that the commitment is to a qubit, instead of a bit, but even more importabtly that the commitment is provided by classical information, which is necessary to obtain a classical protocol. (Commitments to qubits, using qubits, can be obtained by combining the quantum one-time pad with the commitment scheme described above.) To explain how this is achieved we’ll need a slightly more advanced crypographic primitive: a pair of injective trapdoor one-way functions {f_0,f_1:\{0,1\}^n\rightarrow\{0,1\}^n}. This means that it is easy to evaluate both functions on any input, but that given a value {y} in their common range, it is hard to find a preimage of {y} under either function — except if one is given the trapdoor information. (Note that this is an over-simplification of the actual primitive used by Mahadev, which has additional properties, including that of being “claw-free”.)

The commitment phase of the protocol works as follows. Starting from a state {| \psi \rangle=\alpha| 0 \rangle+\beta| 1 \rangle} of its choice, the prover is supposed to perform the following steps. First, the prover creates a uniform superposition over the common domain of {f_0} and {f_1}. Then it evaluates either function, {f_0} or {f_1}, in an additional register, by controlling on the qubit of {| \psi \rangle}. Finally, the prover measures the register that contains the image of {f_0} or {f_1}. This achieves the following sequence of transformations:

\displaystyle \begin{array}{rcl} \alpha| 0 \rangle+\beta| 1 \rangle &\mapsto& (\alpha| 0 \rangle + \beta| 1 \rangle) \otimes \Big(2^{-n/2} \sum_{x\in\{0,1\}^n} | x \rangle\Big) \\ &\mapsto & 2^{-n/2} \sum_x \alpha | 0 \rangle| x \rangle| f_0(x) \rangle + \beta | 1 \rangle| f_1(x) \rangle\\ &\mapsto & \big(\alpha| 0 \rangle| x_0 \rangle+\beta| 1 \rangle| x_1 \rangle\big)| y \rangle\;, \end{array}

where {y\in\{0,1\}^n} is the measured image. The string {y} is called the prover’s commitment string. It is required to report it to the verifier.

In what sense is {y} a commitment to the state {| \psi \rangle}? The key point is that, once it has measured {y}, the prover has “lost control” over its qubit — it has effectively handed over that control to the verifier. For example, the prover no longer has the ability to perform an arbitrary rotation on its qubit. Why? The prover knows {y} (it had to report it to the verifier) but not {x_0} and {x_1} (this is the claw-free assumption on the pair {(f_0,f_1)}). What this means — though of course it has to be shown — is that the prover can no longer recover the state {| \psi \rangle}! It does not have the ability to “uncompute” {x_0} and {x_1}. Thus the qubit has been “set in cryptographic stone”. In contrast, the verifier can use the trapdoor information to recover {x_0} and {x_1}. This gives her extra leverage on the prover. This asymmetry, introduced by cryptography, is what eventually allows the verifier to extract a truthful measurement outcome from the prover (or detect lying).

It is such a wonderful idea! It stuns me every time Urmila explains it. Proving it is of course rather delicate. In this post I make an attempt at going into the idea in a little more depth. The best resource remains Urmila’s paper, as well as her talk at the Simons Institute.

Open questions

What is great about this result is not that it closes a decades-old open question, but that by introducing a truly novel idea it opens up a whole new field of investigation. Some of the ideas that led to the result were already fleshed out by Mahadev in her work on homomorphic encryption for quantum circuits, and I expect many more results to continue building on these ideas.

An obvious outstanding question is whether the cryptography is needed at all: could there be a scheme achieving the same result as Mahadev’s, but without computational assumptions on the prover? It is known that if such a scheme exists, it is unlikely to have the property of being blind, meaning that the prover learns nothing about the computation that the verifier wishes it to execute (aside from an upper bound on its length); see this paper for “implausibility” results in this direction. Mahadev’s protocol relies on “post-hoc” verification, and is not blind. Urmila points out that it is likely the protocol could be made blind by composing it with her protocol for homomorphic encryption. Could there be a different protocol, that would not go through post-hoc verification, but instead directly guide the prover through the evaluation of a universal circuit on an encrypted input, gate by gate, as did some previous works?



John BaezApplied Category Theory Course: Collaborative Design

In my online course we’re reading the fourth chapter of Fong and Spivak’s book Seven Sketches. Chapter 4 is about collaborative design: building big projects from smaller parts. This is based on work by Andrea Censi:

• Andrea Censi, A mathematical theory of co-design.

The main mathematical content of this chapter is the theory of enriched profunctors. We’ll mainly talk about enriched profunctors between categories enriched in monoidal preorders. The picture above shows what one of these looks like!

Here are my lectures so far:

Lecture 55 – Chapter 4: Enriched Profunctors and Collaborative Design
Lecture 56 – Chapter 4: Feasibility Relations
Lecture 57 – Chapter 4: Feasibility Relations
Lecture 58 – Chapter 4: Composing Feasibility Relations
Lecture 59 – Chapter 4: Cost-Enriched Profunctors
Lecture 60 – Chapter 4: Closed Monoidal Preorders
Lecture 61 – Chapter 4: Closed Monoidal Preorders
Lecture 62 – Chapter 4: Constructing Enriched Categories
Lecture 63 – Chapter 4: Composing Enriched Profunctors
Lecture 64 – Chapter 4: The Category of Enriched Profunctors
Lecture 65 – Chapter 4: Collaborative Design
Lecture 66 – Chapter 4: Collaborative Design
Lecture 67 – Chapter 4: Feedback in Collaborative Design
Lecture 68 – Chapter 4: Feedback in Collaborative Design
Lecture 69 – Chapter 4: Feedback in Collaborative Design
Lecture 70 – Chapter 4: Tensoring Enriched Profunctors

August 05, 2018

Jordan EllenbergBefore the Golden Age, and memories of memories

When I was a kid, one of my favorite books was Before the Golden Age, a thousand-page anthology Isaac Asimov edited of his favorite stories from the pulp era of science fiction, the early 1930s, when Asimov was a teenager.  I was reading those stories at about the same age Asimov was when he read them.  Asimov put this anthology together in 1974, and remarks in his afterwords on his surprise at how well he remembered these stories.  I, reading them in my own adulthood, am surprised by the same thing.  The armored fighting suits with all the different-colored rays!  1930s science fiction was really into “rays.”

On the other hand, reading these stories again now, and thinking about whether I’d want to lend this book to CJ, I’m stopped short by, well, how super-racist a lot of these stories are?  I hadn’t remembered this at all.  Like, you write a story (“Awlo of Ulm”) about a guy who makes himself smaller than an atom and discovers an entirely new subnuclear universe, and the wildest thing you can imagine finding there is… that the black-skinned subnuclear people are cannibalistic savages, and the yellow-skinned, slant-eyed ones are hyperrational, technically advanced, and cruel, and the white-skinned ones are sort of feudal and hapless but really standup guys when you get to know them?

Anyway, then I read the story, and then I read Asimov’s 1974 afterwords, when he writes about how he was stopped short, reading the stories again then, by how super-racist a lot of the stories were, and that he hadn’t remembered that at all.

So not only did I forget the stories had a lot of racism, I also forgot about Asimov forgetting about the stories having a lot of racism!

1930s SF was really worried about (but also, I think, kind of attracted to) the idea that humans, by relying on machines for aid, would become less and less physically capable, transforming first into big-headed weaklings and finally into animate brains, maybe with tiny eyes or beaks or tentacles attached.  This image comes up in at least three of the stories I’ve read so far (but is most vividly portrayed in “The Man Who Evolved.”)

Of course, you can ask:  was this actually a dominant concern of 1930s SF, or was it a dominant concern of nerdy teen Isaac Asimov?  What I know about the pulps is what I know from this anthology, so my memory of it is my memory of his memory of it.

When I was a kid, by the way, I sent Isaac Asimov a fan letter.  I was really into his collections of popular science essays, which I read again and again.  I told him “I’ll bet I’m your only seven-year-old fan.”  He sent back a postcard that said “I’ll bet you are not my only seven-year-old fan.”  Damn, Asimov, you burned me good.

August 03, 2018

Noncommutative GeometryMaster Class on Noncommutative geometry: spaces, bundles and connections

Registration is now open for this master class. For more details please check their website  here.

Matt von HippelAdversarial Collaborations for Physics

Sometimes physics debates get ugly. For the scientists reading this, imagine your worst opponents. Think of the people who always misinterpret your work while using shoddy arguments to prop up their own, where every question at a talk becomes a screaming match until you just stop going to the same conferences at all.

Now, imagine writing a paper with those people.

Adversarial collaborations, subject of a recent a contest on the blog Slate Star Codex, are a proposed method for resolving scientific debates. Two scientists on opposite sides of an argument commit to writing a paper together, describing the overall state of knowledge on the topic. For the paper to get published, both sides have to sign off on it: they both have to agree that everything in the paper is true. This prevents either side from cheating, or from coming back later with made-up objections: if a point in the paper is wrong, one side or the other is bound to catch it.

This won’t work for the most vicious debates, when one (or both) sides isn’t interested in common ground. But for some ongoing debates in physics, I think this approach could actually help.

One advantage of adversarial collaborations is in preventing accusations of bias. The debate between dark matter and MOND-like proposals is filled with these kinds of accusations: claims that one group or another is ignoring important data, being dishonest about the parameters they need to fit, or applying standards of proof they would never require of their own pet theory. Adversarial collaboration prevents these kinds of accusations: whatever comes out of an adversarial collaboration, both sides would make sure the other side didn’t bias it.

Another advantage of adversarial collaborations is that they make it much harder for one side to move the goalposts, or to accuse the other side of moving the goalposts. From the sidelines, one thing that frustrates me watching string theorists debate whether the theory can describe de Sitter space is that they rarely articulate what it would take to decisively show that a particular model gives rise to de Sitter. Any conclusion of an adversarial collaboration between de Sitter skeptics and optimists would at least guarantee that both parties agreed on the criteria. Similarly, I get the impression that many debates about interpretations of quantum mechanics are bogged down by one side claiming they’ve closed off a loophole with a new experiment, only for the other to claim it wasn’t the loophole they were actually using, something that could be avoided if both sides were involved in the experiment from the beginning.

It’s possible, even likely, that no-one will try adversarial collaboration for these debates. Even if they did, it’s quite possible the collaborations wouldn’t be able to agree on anything! Still, I have to hope that someone takes the plunge and tries writing a paper with their enemies. At minimum, it’ll be an interesting read!

July 30, 2018

John PreskillSo long, and thanks for all the Fourier transforms

The air conditioning in Caltech’s Annenberg Center for Information Science and Technology broke this July. Pasadena reached 87°F on the fourth, but my office missed the memo. The thermostat read 62°.

Hyperactive air conditioning suits a thermodynamicist’s office as jittery wifi suits an electrical-engineering building. Thermodynamicists call air conditioners “heat pumps.” A heat pump funnels heat—the energy of random motion—from cooler bodies to hotter. Heat flows spontaneously only from hot to cold on average, according to the Second Law of Thermodynamics. Pumping heat against its inclination costs work, organized energy drawn from a reliable source.

Reliable sources include batteries, coiled springs, and ACME anvils hoisted into the air. Batteries have chemical energy that power electric fans. ACME anvils have gravitational potential energy that splat coyotes.


I hoisted binder after binder onto my desk this July. The binders felt like understudies for ACME anvils, bulging with papers. They contained notes I’d written, and articles I’d read, for research throughout the past five years. My Caltech sojourn was switching off its lights and drawing its shutters. A control theorist was inheriting my desk. I had to move my possessions to an office downstairs, where I’d moonlight until quitting town.

Quitting town.

I hadn’t expected to feel at home in southern California, after stints in New and old England. But research and researchers drew me to California and then hooked me. Caltech’s Institute for Quantum Information and Matter (IQIM) has provided an intellectual home, colleagues-cum-friends, and a base from which to branch out to other scholars and institutions.

The IQIM has provided also the liberty to deck out my research program as a college dorm room with posters—according to my tastes, values, and exuberances. My thesis demanded the title “Quantum steampunk: Quantum information, thermodynamics, their intersection, and applications thereof across physics.” I began developing the concept of quantum steampunk on this blog. Writing a manifesto for the concept, in the thesis’s introduction, proved a delight:

The steampunk movement has invaded literature, film, and art over the past three decades. Futuristic technologies mingle, in steampunk works, with Victorian and wild-west settings. Top hats, nascent factories, and grimy cities counterbalance time machines, airships, and automata. The genre arguably originated in 1895, with the H.G. Wells novel The Time Machine. Recent steampunk books include the best-selling The Invention of Hugo Cabret; films include the major motion picture Wild Wild West; and artwork ranges from painting to jewelry to sculpture.

Steampunk captures the romanticism of fusing the old with the cutting-edge. Technologies proliferated during the Victorian era: locomotives, Charles Babbage’s analytical engine, factories, and more. Innovation facilitated exploration. Add time machines, and the spirit of adventure sweeps you away. Little wonder that fans flock to steampunk conventions, decked out in overcoats, cravats, and goggles.

What steampunk fans dream, quantum-information thermodynamicists live.

Thermodynamics budded during the late 1800s, when steam engines drove the Industrial Revolution. Sadi Carnot, Ludwig Boltzmann, and other thinkers wondered how efficiently engines could operate. Their practical questions led to fundamental insights—about why time flows; how much one can know about a physical system; and how simple macroscopic properties, like temperature, can capture complex behaviors, like collisions by steam particles. An idealization of steam—the classical ideal gas—exemplifies the conventional thermodynamic system. Such systems contain many particles, behave classically, and are often assumed to remain in equilibrium.

But thermodynamic concepts—such as heat, work, and equilibrium—characterize small scales, quantum systems, and out-of-equilibrium processes. Today’s experimentalists probe these settings, stretching single DNA strands with optical tweezers [4], cooling superconducting qubits to build quantum computers [5, 6], and extracting work from single-electron boxes [7]. These settings demand reconciliation with 19th-century thermodynamics. We need a toolkit for fusing the old with the new.

Quantum information (QI) theory provides such a toolkit. Quantum phenomena serve as resources for processing information in ways impossible with classical systems. Quantum computers can solve certain computationally difficult problems quickly; quantum teleportation transmits information as telephones cannot; quantum cryptography secures messages; and quantum metrology centers on high- precision measurements. These applications rely on entanglement (strong correlations between quantum systems), disturbances by measurements, quantum uncertainty, and discreteness.

Technological promise has driven fundamental insights, as in thermodynamics. QI theory has blossomed into a mathematical toolkit that includes entropies, uncertainty relations, and resource theories. These tools are reshaping fundamental science, in applications across physics, computer science, and chemistry.

QI is being used to update thermodynamics, in the field of quantum thermodynamics (QT) [8, 9]. QT features entropies suited to small scales; quantum engines; the roles of coherence in thermalization and transport; and the transduction of information into work, à la Maxwell’s demon [10].

This thesis (i) contributes to the theory of QI thermodynamics and (ii) applies the theory, as a toolkit, across physics. Spheres touched on include atomic, molecular, and optical (AMO) physics; nonequilibrium statistical mechanics; condensed matter; chemistry; and high-energy physics. I propose the name quantum steampunk for this program…

Never did I anticipate, in college, that a PhD could reflect my identity and style. I feared losing myself and my perspective in a subproblem of a subproblem of a subproblem. But I found myself blessed with the chance to name the aesthetic that’s guided my work, the scent I’ve unconsciously followed from book to class to research project to conversation, to paper, since…middle school, come to think of it. I’m grateful for that opportunity.

Q. steampunk

Whump, went my quantum-engine binder on my desk. I’d stuck an address label, pointing to Annenberg, to the binder. If the binder walked away, whoever found it would know where it belonged. Scratching at the label with a fingernail failed to budge the sticker. I stuck a label addressed to Cambridge, Massachusetts alongside the Pasadena address.

I’m grateful to be joining Harvard as an ITAMP (Institute for Theoretical Atomic, Molecular, and Optical Physics) Postdoctoral Fellow. You’ll be able to catch me in Harvard’s physics department, in ITAMP, or at MIT, starting this September.

While hunting for a Cambridge apartment, I skyped with potential roommates. I’d inquire about locations, about landlords and landladies, about tidiness, and about heating. The heating system’s pretty old, most tenants would admit. We keep the temperature between 60 and 65 degrees, to keep costs down. I’d nod and extol the layering of sweaters, but I shivered inside.

One tenant surprised me. The heating…works too well, she said. It’s pretty warm, to tell the truth. I thought about heat pumps and quantum engines, about picnics in the Pasadena sunshine, about the Julys I’d enjoyed while the world around me had sweated. Within an hour, I’d committed to sharing the apartment.


Some of you have asked whether I’ll continue blogging for Quantum Frontiers. Yes: Extricating me from the IQIM requires more than 3,000 miles.

See you in Cambridge.


With apologies to Douglas Adams.

July 29, 2018

Jordan EllenbergUnitarians, legislative districting, and fairness

I gave a talk about gerrymandering at the Prairie Unitarian Universalist society.  As usual, I showed how you can pretty easily district a state with a 60-40 partisan split to give the popular majority party 60% of the seats, 40% of the seats, or 100% of the seats.  After I do that, I ask the audience which map they consider the most fair and which they consider the least fair.  Usually, people rate the proportional representation map the fairest, and the map where the popular minority controls the legislature the least fair.

But not this time!  The Unitarians, almost to a one, thought the districting where the popular majority controlled all the seats was the least fair.  I take from this that the Unitarian ethos rates “the minority rules over the majority” as a lesser evil than “the minority is given no voice at all.”




Terence TaoGamifying propositional logic: QED, an interactive textbook

About six years ago on this blog, I started thinking about trying to make a web-based game based around high-school algebra, and ended up using Scratch to write a short but playable puzzle game in which one solves linear equations for an unknown {x} using a restricted set of moves. (At almost the same time, there were a number of more professionally made games released along similar lines, most notably Dragonbox.)

Since then, I have thought a couple times about whether there were other parts of mathematics which could be gamified in a similar fashion. Shortly after my first blog posts on this topic, I experimented with a similar gamification of Lewis Carroll’s classic list of logic puzzles, but the results were quite clunky, and I was never satisfied with the results.

Over the last few weeks I returned to this topic though, thinking in particular about how to gamify the rules of inference of propositional logic, in a manner that at least vaguely resembles how mathematicians actually go about making logical arguments (e.g., splitting into cases, arguing by contradiction, using previous result as lemmas to help with subsequent ones, and so forth). The rules of inference are a list of a dozen or so deductive rules concerning propositional sentences (things like “({A} AND {B}) OR (NOT {C})”, where {A,B,C} are some formulas). A typical such rule is Modus Ponens: if the sentence {A} is known to be true, and the implication “{A} IMPLIES {B}” is also known to be true, then one can deduce that {B} is also true. Furthermore, in this deductive calculus it is possible to temporarily introduce some unproven statements as an assumption, only to discharge them later. In particular, we have the deduction theorem: if, after making an assumption {A}, one is able to derive the statement {B}, then one can conclude that the implication “{A} IMPLIES {B}” is true without any further assumption.

It took a while for me to come up with a workable game-like graphical interface for all of this, but I finally managed to set one up, now using Javascript instead of Scratch (which would be hopelessly inadequate for this task); indeed, part of the motivation of this project was to finally learn how to program in Javascript, which turned out to be not as formidable as I had feared (certainly having experience with other C-like languages like C++, Java, or lua, as well as some prior knowledge of HTML, was very helpful). The main code for this project is available here. Using this code, I have created an interactive textbook in the style of a computer game, which I have titled “QED”. This text contains thirty-odd exercises arranged in twelve sections that function as game “levels”, in which one has to use a given set of rules of inference, together with a given set of hypotheses, to reach a desired conclusion. The set of available rules increases as one advances through the text; in particular, each new section gives one or more rules, and additionally each exercise one solves automatically becomes a new deduction rule one can exploit in later levels, much as lemmas and propositions are used in actual mathematics to prove more difficult theorems. The text automatically tries to match available deduction rules to the sentences one clicks on or drags, to try to minimise the amount of manual input one needs to actually make a deduction.

Most of one’s proof activity takes place in a “root environment” of statements that are known to be true (under the given hypothesis), but for more advanced exercises one has to also work in sub-environments in which additional assumptions are made. I found the graphical metaphor of nested boxes to be useful to depict this tree of sub-environments, and it seems to combine well with the drag-and-drop interface.

The text also logs one’s moves in a more traditional proof format, which shows how the mechanics of the game correspond to a traditional mathematical argument. My hope is that this will give students a way to understand the underlying concept of forming a proof in a manner that is more difficult to achieve using traditional, non-interactive textbooks.

I have tried to organise the exercises in a game-like progression in which one first works with easy levels that train the player on a small number of moves, and then introduce more advanced moves one at a time. As such, the order in which the rules of inference are introduced is a little idiosyncratic. The most powerful rule (the law of the excluded middle, which is what separates classical logic from intuitionistic logic) is saved for the final section of the text.

Anyway, I am now satisfied enough with the state of the code and the interactive text that I am willing to make both available (and open source; I selected a CC-BY licence for both), and would be happy to receive feedback on any aspect of the either. In principle one could extend the game mechanics to other mathematical topics than the propositional calculus – the rules of inference for first-order logic being an obvious next candidate – but it seems to make sense to focus just on propositional logic for now.

Jordan EllenbergWisconsin municipal vexillology update

Madison is changing its flag!  The old one

has a Zuni sun symbol in the middle of it, which people correctly feel is a sort of random and annoying and unrelated-to-Madison vic of somebody else’s religious symbol.  On the other hand, on pure design grounds it’s kind of a great flag!  Simple, but you see the lakes, the isthmus, the Capitol.  The new flag elegantly keeps all that while skimming off the cultural appropriation:


Meanwhile, in Milwaukee, pressure is mounting to adopt the People’s Flag. Milwaukee’s existing flag is an ungepotchkit mess, routinely ranked among the nation’s worst city banners.  I mean:

I think my favorite part of this mess is that there are two miniflags inside this flag, and the one that’s not the U.S. flag nobody even remembers what it is!

Anyway, this is the proposed new flag, currently the subject of hot civic dissent:

I think this is great.  Daring color choices, you get your lake, you get your big flat lake, you get your optimistic sense of sunrise.  Make the right choice, Cream City!

July 27, 2018

Matt von HippelJournalists Need to Adapt to Preprints, Not Ignore Them

Nature has an article making the rounds this week, decrying the dangers of preprints.

On the surface, this is a bit like an article by foxes decrying the dangers of henhouses. There’s a pretty big conflict of interest when a journal like Nature, that makes huge amounts of money out of research scientists would be happy to publish for free, gets snippy about scientists sharing their work elsewhere. I was expecting an article about how “important” the peer review process is, how we can’t just “let anyone” publish, and the like.

Instead, I was pleasantly surprised. The article is about a real challenge, the weakening of journalistic embargoes. While this is still a problem I think journalists can think their way around, it’s a bit subtler than the usual argument.

For the record, peer review is usually presented as much more important than it actually is. When a scientific article gets submitted to a journal, it gets sent to two or three experts in the field for comment. In the best cases, these experts read the paper carefully and send criticism back. They don’t replicate the experiments, they don’t even (except for a few heroic souls) reproduce the calculations. That kind of careful reading is important, but it’s hardly unique: it’s something scientists do on their own when they want to build off of someone else’s paper, and it’s what good journalists get when they send a paper to experts for comments before writing an article. If peer review in a journal is important, it’s to ensure that this careful reading happens at least once, a sort of minimal evidence that the paper is good enough to appear on a scientist’s CV.

The Nature article points out that peer review serves another purpose, specifically one of delay. While a journal is preparing to publish an article they can send it out to journalists, after making them sign an agreement (an embargo) that they won’t tell the public until the journal publishes. This gives the journalists a bit of lead time, so the more responsible ones can research and fact-check before publishing.

Open-access preprints cut out the lead time. If the paper just appears online with no warning and no embargoes, journalists can write about it immediately. The unethical journalists can skip fact-checking and publish first, and the ethical ones have to follow soon after, or risk publishing “old news”. Nobody gets the time to properly vet, or understand, a new paper.

There’s a simple solution I’ve seen from a few folks on Twitter: “Don’t be an unethical journalist!” That doesn’t actually solve the problem though. The question is, if you’re an ethical journalist, but other people are unethical journalists, what do you do?

Apparently, what some ethical journalists do is to carry on as if preprints didn’t exist. The Nature article describes journalists who, after a preprint has been covered extensively by others, wait until a journal publishes it and then cover it as if nothing had happened. The article frames this as virtuous, but doomed: journalists sticking to their ethics even if it means publishing “old news”.

To be 100% clear here, this is not virtuous. If you present a paper’s publication in a journal as news, when it was already released as a preprint, you are actively misleading the public. I can’t count the number of times I’ve gotten messages from readers, confused because they saw a scientific result covered again months later and thought it was new. It leads to a sort of mental “double-counting”, where the public assumes that the scientific result was found twice, and therefore that it’s more solid. Unless the publication itself is unexpected (something that wasn’t expected to pass peer review, or something controversial like Mochizuki’s proof of the ABC conjecture) mere publication in a journal of an already-public result is not news.

What science journalists need to do here is to step back, and think about how their colleagues cover stories. Current events these days don’t have embargoes, they aren’t fed through carefully managed press releases. There’s a flurry of initial coverage, and it gets things wrong and misses details and misleads people, because science isn’t the only field that’s complicated, real life is complicated. Journalists have adapted to this schedule, mostly, by specializing. Some journalists and news outlets cover breaking news as it happens, others cover it later with more in-depth analysis. Crucially, the latter journalists don’t present the topic as new. They write explicitly in the light of previous news, as a response to existing discussion. That way, the public isn’t misled, and their existing misunderstandings can be corrected.

The Nature article brings up public health, and other topics where misunderstandings can do lasting damage, as areas where embargoes are useful. While I agree, I would hope many of these areas would figure out embargoes on their own. My field certainly does: the big results of scientific collaborations aren’t just put online as preprints, they’re released only after the collaboration sets up its own journalistic embargoes, and prepares its own press releases. In a world of preprints, this sort of practice needs to happen for important controversial public health and environmental results as well. Unethical scientists might still release too fast, to keep journalists from fact-checking, but they could do that anyway, without preprints. You don’t need a preprint to call a journalist on the phone and claim you cured cancer.

As open-access preprints become the norm, journalists will have to adapt. I’m confident they will be able to, but only if they stop treating science journalism as unique, and start treating it as news. Science journalism isn’t teaching, you’re not just passing down facts someone else has vetted. You’re asking the same questions as any other journalist: who did what? And what really happened? If you can do that, preprints shouldn’t be scary.

July 26, 2018

Sean CarrollMindscape Podcast

For anyone who hasn’t been following along on other social media, the big news is that I’ve started a podcast, called Mindscape. It’s still young, but early returns are promising!

I won’t be posting each new episode here; the podcast has a “blog” of its own, and episodes and associated show notes will be published there. You can subscribe by RSS as usual, or there is also an email list you can sign up for. For podcast aficionados, Mindscape should be available wherever finer podcasts are served, including iTunes, Google Play, Stitcher, Spotify, and so on.

As explained at the welcome post, the format will be fairly conventional: me talking to smart people about interesting ideas. It won’t be all, or even primarily, about physics; much of my personal motivation is to get the opportunity to talk about all sorts of other interesting things. I’m expecting there will be occasional solo episodes that just have me rambling on about one thing or another.

We’ve already had a bunch of cool guests, check these out:

And there are more exciting episodes on the way. Enjoy, and spread the word!

July 23, 2018

Terence TaoA corollary of the ergodic theorem

Let {(X,T,\mu)} be a measure-preserving system – a probability space {(X,\mu)} equipped with a measure-preserving translation {T} (which for simplicity of discussion we shall assume to be invertible). We will informally think of two points {x,y} in this space as being “close” if {y = T^n x} for some {n} that is not too large; this allows one to distinguish between “local” structure at a point {x} (in which one only looks at nearby points {T^n x} for moderately large {n}) and “global” structure (in which one looks at the entire space {X}). The local/global distinction is also known as the time-averaged/space-averaged distinction in ergodic theory.

A measure-preserving system is said to be ergodic if all the invariant sets are either zero measure or full measure. An equivalent form of this statement is that any measurable function {f: X \rightarrow {\bf R}} which is locally essentially constant in the sense that {f(Tx) = f(x)} for {\mu}-almost every {x}, is necessarily globally essentially constant in the sense that there is a constant {c} such that {f(x) = c} for {\mu}-almost every {x}. A basic consequence of ergodicity is the mean ergodic theorem: if {f \in L^2(X,\mu)}, then the averages {x \mapsto \frac{1}{N} \sum_{n=1}^N f(T^n x)} converge in {L^2} norm to the mean {\int_X f\ d\mu}. (The mean ergodic theorem also applies to other {L^p} spaces with {1 < p < \infty}, though it is usually proven first in the Hilbert space {L^2}.) Informally: in ergodic systems, time averages are asymptotically equal to space averages. Specialising to the case of indicator functions, this implies in particular that {\frac{1}{N} \sum_{n=1}^N \mu( E \cap T^n E)} converges to {\mu(E)^2} for any measurable set {E}.

In this short note I would like to use the mean ergodic theorem to show that ergodic systems also have the property that “somewhat locally constant” functions are necessarily “somewhat globally constant”; this is not a deep observation, and probably already in the literature, but I found it a cute statement that I had not previously seen. More precisely:

Corollary 1 Let {(X,T,\mu)} be an ergodic measure-preserving system, and let {f: X \rightarrow {\bf R}} be measurable. Suppose that

\displaystyle  \limsup_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N \mu( \{ x \in X: f(T^n x) = f(x) \} ) \geq \delta \ \ \ \ \ (1)

for some {0 \leq \delta \leq 1}. Then there exists a constant {c} such that {f(x)=c} for {x} in a set of measure at least {\delta}.

Informally: if {f} is locally constant on pairs {x, T^n x} at least {\delta} of the time, then {f} is globally constant at least {\delta} of the time. Of course the claim fails if the ergodicity hypothesis is dropped, as one can simply take {f} to be an invariant function that is not essentially constant, such as the indicator function of an invariant set of intermediate measure. This corollary can be viewed as a manifestation of the general principle that ergodic systems have the same “global” (or “space-averaged”) behaviour as “local” (or “time-averaged”) behaviour, in contrast to non-ergodic systems in which local properties do not automatically transfer over to their global counterparts.

Proof: By composing {f} with (say) the tangent function, we may assume without loss of generality that {f} is bounded. Let {k>0}, and partition {X} as {\bigcup_{m \in {\bf Z}} E_{m,k}}, where {E_{m,k}} is the level set

\displaystyle  E_{m,k} := \{ x \in X: m 2^{-k} \leq f(x) < (m+1) 2^{-k} \}.

For each {k}, only finitely many of the {E_{m,k}} are non-empty. By (1), one has

\displaystyle  \limsup_{N \rightarrow \infty} \sum_m \frac{1}{N} \sum_{n=1}^N \mu( E_{m,k} \cap T^n E_{m,k} ) \geq \delta.

Using the ergodic theorem, we conclude that

\displaystyle  \sum_m \mu( E_{m,k} )^2 \geq \delta.

On the other hand, {\sum_m \mu(E_{m,k}) = 1}. Thus there exists {m_k} such that {\mu(E_{m_k,k}) \geq \delta}, thus

\displaystyle  \mu( \{ x \in X: m_k 2^{-k} \leq f(x) < (m_k+1) 2^{-k} \} ) \geq \delta.

By the Bolzano-Weierstrass theorem, we may pass to a subsequence where {m_k 2^{-k}} converges to a limit {c}, then we have

\displaystyle  \mu( \{ x \in X: c-2^{-k} \leq f(x) \leq c+2^{-k} \}) \geq \delta

for infinitely many {k}, and hence

\displaystyle  \mu( \{ x \in X: f(x) = c \}) \geq \delta.

The claim follows. \Box

July 20, 2018

Matt von HippelThe Physics Isn’t New, We Are

Last week, I mentioned the announcement from the IceCube, Fermi-LAT, and MAGIC collaborations of high-energy neutrinos and gamma rays detected from the same source, the blazar TXS 0506+056. Blazars are sources of gamma rays, thought to be enormous spinning black holes that act like particle colliders vastly more powerful than the LHC. This one, near Orion’s elbow, is “aimed” roughly at Earth, allowing us to detect the light and particles it emits. On September 22, a neutrino with energy around 300 TeV was detected by IceCube (a kilometer-wide block of Antarctic ice stuffed with detectors), coming from the direction of TXS 0506+056. Soon after, the satellite Fermi-LAT and ground-based telescope MAGIC were able to confirm that the blazar TXS 0506+056 was flaring at the time. The IceCube team then looked back, and found more neutrinos coming from the same source in earlier years. There are still lingering questions (Why didn’t they see this kind of behavior from other, closer blazars?) but it’s still a nice development in the emerging field of “multi-messenger” astronomy.

It also got me thinking about a conversation I had a while back, before one of Perimeter’s Public Lectures. An elderly fellow was worried about the LHC. He wondered if putting all of that energy in the same place, again and again, might do something unprecedented: weaken the fabric of space and time, perhaps, until it breaks? He acknowledged this didn’t make physical sense, but what if we’re wrong about the physics? Do we really want to take that risk?

At the time, I made the same point that gets made to counter fears of the LHC creating a black hole: that the energy of the LHC is less than the energy of cosmic rays, particles from space that collide with our atmosphere on a regular basis. If there was any danger, it would have already happened. Now, knowing about blazars, I can make a similar point: there are “galactic colliders” with energies so much higher than any machine we can build that there’s no chance we could screw things up on that kind of scale: if we could, they already would have.

This connects to a broader point, about how to frame particle physics. Each time we build an experiment, we’re replicating something that’s happened before. Our technology simply isn’t powerful enough to do something truly unprecedented in the universe: we’re not even close! Instead, the point of an experiment is to reproduce something where we can see it. It’s not the physics itself, but our involvement in it, our understanding of it, that’s genuinely new.

The IceCube experiment itself is a great example of this: throughout Antarctica, neutrinos collide with ice. The only difference is that in IceCube’s ice, we can see them do it. More broadly, I have to wonder how much this is behind the “unreasonable effectiveness of mathematics”: if mathematics is just the most precise way humans have to communicate with each other, then of course it will be effective in physics, since the goal of physics is to communicate the nature of the world to humans!

There may well come a day when we’re really able to do something truly unprecedented, that has never been done before in the history of the universe. Until then, we’re playing catch-up, taking laws the universe has tested extensively and making them legible, getting humanity that much closer to understanding physics that, somewhere out there, already exists.

n-Category Café Compositionality: the Editorial Board

An editorial board has now been chosen for the journal Compositionality, and they’re waiting for people to submit papers.

We are happy to announce the founding editorial board of Compositionality, featuring established researchers working across logic, computer science, physics, linguistics, coalgebra, and pure category theory (see the full list below). Our steering board considered many strong applications to our initial open call for editors, and it was not easy narrowing down to the final list, but we think that the quality of this editorial board and the general response bodes well for our growing research community.

In the meantime, we hope you will consider submitting something to our first issue. Look out in the coming weeks for the journal’s official open-for-submissions announcement.

The editorial board of Compositionality:

• Corina Cristea, University of Southampton, UK

• Ross Duncan, University of Strathclyde, UK

• Andrée Ehresmann, University of Picardie Jules Verne, France

• Tobias Fritz, Max Planck Institute, Germany

• Neil Ghani, University of Strathclyde, UK

• Dan Ghica, University of Birmingham, UK

• Jeremy Gibbons, University of Oxford, UK

• Nick Gurski, Case Western Reserve University, USA

• Helle Hvid Hansen, Delft University of Technology, Netherlands

• Chris Heunen, University of Edinburgh, UK

• Aleks Kissinger, Radboud University, Netherlands

• Joachim Kock, Universitat Autònoma de Barcelona, Spain

• Martha Lewis, University of Amsterdam, Netherlands

• Samuel Mimram, École Polytechnique, France

• Simona Paoli, University of Leicester, UK

• Dusko Pavlovic, University of Hawaii, USA

• Christian Retoré, Université de Montpellier, France

• Mehrnoosh Sadrzadeh, Queen Mary University, UK

• Peter Selinger, Dalhousie University, Canada

• Pawel Sobocinski, University of Southampton, UK

• David Spivak, MIT, USA

• Jamie Vicary, University of Birmingham, UK

• Simon Willerton, University of Sheffield, UK

Joshua Tan, Brendan Fong, and Nina Otter
Executive editors, Compositionality

July 19, 2018

Andrew Jaffe(Almost) The end of Planck

This week, we released (most of) the final set of papers from the Planck collaboration — the long-awaited Planck 2018 results (which were originally meant to be the “Planck 2016 results”, but everything takes longer than you hope…), available on the ESA website as well as the arXiv. More importantly for many astrophysicists and cosmologists, the final public release of Planck data is also available.

Anyway, we aren’t quite finished: those of you up on your roman numerals will notice that there are only 9 papers but the last one is “XII” — the rest of the papers will come out over the coming months. So it’s not the end, but at least it’s the beginning of the end.

And it’s been a long time coming. I attended my first Planck-related meeting in 2000 or so (and plenty of people had been working on the projects that would become Planck for a half-decade by that point). For the last year or more, the number of people working on Planck has dwindled as grant money has dried up (most of the scientists now analysing the data are doing so without direct funding for the work).

(I won’t rehash the scientific and technical background to the Planck Satellite and the cosmic microwave background (CMB), which I’ve been writing about for most of the lifetime of this blog.)

Planck 2018: the science

So, in the language of the title of the first paper in the series, what is the legacy of Planck? The state of our science is strong. For the first time, we present full results from both the temperature of the CMB and its polarization. Unfortunately, we don’t actually use all the data available to us — on the largest angular scales, Planck’s results remain contaminated by astrophysical foregrounds and unknown “systematic” errors. This is especially true of our measurements of the polarization of the CMB, unfortunately, which is probably Planck’s most significant limitation.

The remaining data are an excellent match for what is becoming the standard model of cosmology: ΛCDM, or “Lambda-Cold Dark Matter”, which is dominated, first, by a component which makes the Universe accelerate in its expansion (Λ, Greek Lambda), usually thought to be Einstein’s cosmological constant; and secondarily by an invisible component that seems to interact only by gravity (CDM, or “cold dark matter”). We have tested for more exotic versions of both of these components, but the simplest model seems to fit the data without needing any such extensions. We also observe the atoms and light which comprise the more prosaic kinds of matter we observe in our day-to-day lives, which make up only a few percent of the Universe.

All together, the sum of the densities of these components are just enough to make the curvature of the Universe exactly flat through Einstein’s General Relativity and its famous relationship between the amount of stuff (mass) and the geometry of space-time. Furthermore, we can measure the way the matter in the Universe is distributed as a function of the length scale of the structures involved. All of these are consistent with the predictions of the famous or infamous theory of cosmic inflation), which expanded the Universe when it was much less than one second old by factors of more than 1020. This made the Universe appear flat (think of zooming into a curved surface) and expanded the tiny random fluctuations of quantum mechanics so quickly and so much that they eventually became the galaxies and clusters of galaxies we observe today. (Unfortunately, we still haven’t observed the long-awaited primordial B-mode polarization that would be a somewhat direct signature of inflation, although the combination of data from Planck and BICEP2/Keck give the strongest constraint to date.)

Most of these results are encoded in a function called the CMB power spectrum, something I’ve shown here on the blog a few times before, but I never tire of the beautiful agreement between theory and experiment, so I’ll do it again: PlanckSpectra (The figure is from the Planck “legacy” paper; more details are in others in the 2018 series, especially the Planck “cosmological parameters” paper.) The top panel gives the power spectrum for the Planck temperature data, the second panel the cross-correlation between temperature and the so-called E-mode polarization, the left bottom panel the polarization-only spectrum, and the right bottom the spectrum from the gravitational lensing of CMB photons due to matter along the line of sight. (There are also spectra for the B mode of polarization, but Planck cannot distinguish these from zero.) The points are “one sigma” error bars, and the blue curve gives the best fit model.

As an important aside, these spectra per se are not used to determine the cosmological parameters; rather, we use a Bayesian procedure to calculate the likelihood of the parameters directly from the data. On small scales (corresponding to 𝓁>30 since 𝓁 is related to the inverse of an angular distance), estimates of spectra from individual detectors are used as an approximation to the proper Bayesian formula; on large scales (𝓁<30) we use a more complicated likelihood function, calculated somewhat differently for data from Planck’s High- and Low-frequency instruments, which captures more of the details of the full Bayesian procedure (although, as noted above, we don’t use all possible combinations of polarization and temperature data to avoid contamination by foregrounds and unaccounted-for sources of noise).

Of course, not all cosmological data, from Planck and elsewhere, seem to agree completely with the theory. Perhaps most famously, local measurements of how fast the Universe is expanding today — the Hubble constant — give a value of H0 = (73.52 ± 1.62) km/s/Mpc (the units give how much faster something is moving away from us in km/s as they get further away, measured in megaparsecs (Mpc); whereas Planck (which infers the value within a constrained model) gives (67.27 ± 0.60) km/s/Mpc . This is a pretty significant discrepancy and, unfortunately, it seems difficult to find an interesting cosmological effect that could be responsible for these differences. Rather, we are forced to expect that it is due to one or more of the experiments having some unaccounted-for source of error.

The term of art for these discrepancies is “tension” and indeed there are a few other “tensions” between Planck and other datasets, as well as within the Planck data itself: weak gravitational lensing measurements of the distortion of light rays due to the clustering of matter in the relatively nearby Universe show evidence for slightly weaker clustering than that inferred from Planck data. There are tensions even within Planck, when we measure the same quantities by different means (including things related to similar gravitational lensing effects). But, just as “half of all three-sigma results are wrong”, we expect that we’ve mis- or under-estimated (or to quote the no-longer-in-the-running-for-the-worst president ever, “misunderestimated”) our errors much or all of the time and should really learn to expect this sort of thing. Some may turn out to be real, but many will be statistical flukes or systematic experimental errors.

(If you were looking a briefer but more technical fly-through the Planck results — from someone not on the Planck team — check out Renee Hlozek’s tweetstorm.)

Planck 2018: lessons learned

So, Planck has more or less lived up to its advanced billing as providing definitive measurements of the cosmological parameters, while still leaving enough “tensions” and other open questions to keep us cosmologists working for decades to come (we are already planning the next generation of ground-based telescopes and satellites for measuring the CMB).

But did we do things in the best possible way? Almost certainly not. My colleague (and former grad student!) Joe Zuntz has pointed out that we don’t use any explicit “blinding” in our statistical analysis. The point is to avoid our own biases when doing an analysis: you don’t want to stop looking for sources of error when you agree with the model you thought would be true. This works really well when you can enumerate all of your sources of error and then simulate them. In practice, most collaborations (such as the Polarbear team with whom I also work) choose to un-blind some results exactly to be able to find such sources of error, and indeed this is the motivation behind the scores of “null tests” that we run on different combinations of Planck data. We discuss this a little in an appendix of the “legacy” paper — null tests are important, but we have often found that a fully blind procedure isn’t powerful enough to find all sources of error, and in many cases (including some motivated by external scientists looking at Planck data) it was exactly low-level discrepancies within the processed results that have led us to new systematic effects. A more fully-blind procedure would be preferable, of course, but I hope this is a case of the great being the enemy of the good (or good enough). I suspect that those next-generation CMB experiments will incorporate blinding from the beginning.

Further, although we have released a lot of software and data to the community, it would be very difficult to reproduce all of our results. Nowadays, experiments are moving toward a fully open-source model, where all the software is publicly available (in Planck, not all of our analysis software was available to other members of the collaboration, much less to the community at large). This does impose an extra burden on the scientists, but it is probably worth the effort, and again, needs to be built into the collaboration’s policies from the start.

That’s the science and methodology. But Planck is also important as having been one of the first of what is now pretty standard in astrophysics: a collaboration of many hundreds of scientists (and many hundreds more of engineers, administrators, and others without whom Planck would not have been possible). In the end, we persisted, and persevered, and did some great science. But I learned that scientists need to learn to be better at communicating, both from the top of the organisation down, and from the “bottom” (I hesitate to use that word, since that is where much of the real work is done) up, especially when those lines of hoped-for communication are usually between different labs or Universities, very often between different countries. Physicists, I have learned, can be pretty bad at managing — and at being managed. This isn’t a great combination, and I say this as a middle-manager in the Planck organisation, very much guilty on both fronts.

Andrew JaffeLoncon 3

Briefly (but not brief enough for a single tweet): I’ll be speaking at Loncon 3, the 72nd World Science Fiction Convention, this weekend (doesn’t that website have a 90s retro feel?).

At 1:30 on Saturday afternoon, I’ll be part of a panel trying to answer the question “What Is Science?” As Justice Potter Stewart once said in a somewhat more NSFW context, the best answer is probably “I know it when I see it” but we’ll see if we can do a little better than that tomorrow. My fellow panelists seem to be writers, curators, philosophers and theologians (one of whom purports to believe that the “the laws of thermodynamics prove the existence of God” — a claim about which I admit some skepticism…) so we’ll see what a proper physicist can add to the discussion.

At 8pm in the evening, for participants without anything better to do on a Saturday night, I’ll be alone on stage discussing “The Random Universe”, giving an overview of how we can somehow learn about the Universe despite incomplete information and inherently random physical processes.

There is plenty of other good stuff throughout the convention, which runs from 14 to 18 August. Imperial Astrophysics will be part of “The Great Cosmic Show”, with scientists talking about some of the exciting astrophysical research going on here in London. And Imperial’s own Dave Clements is running the whole (not fictional) science programme for the convention. If you’re around, come and say hi to any or all of us.

n-Category Café The Duties of a Mathematician

What are the ethical responsibilities of a mathematician? I can think of many, some of which I even try to fulfill, but this document raises one that I have mixed feelings about:


The ethical responsibility of mathematicians includes a certain duty, never precisely stated in any formal way, but of course felt by and known to serious researchers: to dedicate an appropriate amount of time to study each new groundbreaking theory or proof in one’s general area. Truly groundbreaking theories are rare, and this duty is not too cumbersome. This duty is especially applicable to researchers who are in the most active research period of their mathematical life and have already senior academic positions. In real life this informal duty can be taken to mean that a reasonable number of mathematicians in each major mathematical country studies such groundbreaking theories.

My first reaction to this claimed duty was quite personal: namely, that I couldn’t possibly meet it. My research is too thinly spread over too many fields to “study each new groundbreaking theory or proof” in my general area. While Fesenko says that “truly groundbreaking theories are rare, and this duty is not too cumbersome”, I feel the opposite. I’d really love to learn more about the Langlands program, and the amplitudohedron, and Connes’ work on the Riemann Hypothesis, and Lurie’s work on (,1)(\infty,1)-topoi, and homotopy type theory, and Monstrous Moonshine, and new developments in machine learning, and … many other things. But there’s not enough time!

More importantly, while it’s undeniably good to know what’s going on, that doesn’t make it a “duty”. I believe mathematicians should be free to study what they’re interested in.

But perhaps Fesenko has a specific kind of mathematician in mind, without mentioning it: not the larks who fly free, but the solid, established “gatekeepers” and “empire-builders”. These are the people who master a specific field, gain academic power, and strongly influence the field’s development, often by making pronouncements about what’s important and what’s not.

For such people to ignore promising developments in their self-proclaimed realm of expertise can indeed be damaging. Perhaps these people have a duty to spend a certain amount of time studying each new ground-breaking theory in their ambit. But I’m fundamentally suspicious of these people in the first place! So, I’m not eager to figure out their duties.

What do you think about “the duties of a mathematician”?

Of course I would be remiss not to mention the obvious, namely that Fesenko is complaining about the reception of Mochizuki’s work on inter-universal Teichmüller theory. If you read his whole article, that will be completely clear. But this is a controversial subject, and “hard cases make bad law”—so while it makes a fascinating read, I’d rather talk about the duties of a mathematician more generally. If you want to discuss what Fesenko has to say about inter-universal Teichmüller theory, Peter Woit’s blog might be a better place, since he’s jumped right into the middle of that conversation:

As for me, my joy is to learn new mathematics, figure things out, explain things, and talk to people about math. My duties include helping students who are having trouble, trying to make mathematics open-access, and coaxing mathematicians to turn their skills toward saving the planet. The difference is that joy makes me do things spontaneously, while duty taps me on the shoulder and says “don’t forget….”

July 17, 2018

Terence Tao246C notes 2: Circle packings, conformal maps, and quasiconformal maps

We now leave the topic of Riemann surfaces, and turn now to the (loosely related) topic of conformal mapping (and quasiconformal mapping). Recall that a conformal map {f: U \rightarrow V} from an open subset {U} of the complex plane to another open set {V} is a map that is holomorphic and bijective, which (by Rouché’s theorem) also forces the derivative of {f} to be nowhere vanishing. We then say that the two open sets {U,V} are conformally equivalent. From the Cauchy-Riemann equations we see that conformal maps are orientation-preserving and angle-preserving; from the Newton approximation {f( z_0 + \Delta z) \approx f(z_0) + f'(z_0) \Delta z + O( |\Delta z|^2)} we see that they almost preserve small circles, indeed for {\varepsilon} small the circle {\{ z: |z-z_0| = \varepsilon\}} will approximately map to {\{ w: |w - f(z_0)| = |f'(z_0)| \varepsilon \}}.

In previous quarters, we proved a fundamental theorem about this concept, the Riemann mapping theorem:

Theorem 1 (Riemann mapping theorem) Let {U} be a simply connected open subset of {{\bf C}} that is not all of {{\bf C}}. Then {U} is conformally equivalent to the unit disk {D(0,1)}.

This theorem was proven in these 246A lecture notes, using an argument of Koebe. At a very high level, one can sketch Koebe’s proof of the Riemann mapping theorem as follows: among all the injective holomorphic maps {f: U \rightarrow D(0,1)} from {U} to {D(0,1)} that map some fixed point {z_0 \in U} to {0}, pick one that maximises the magnitude {|f'(z_0)|} of the derivative (ignoring for this discussion the issue of proving that a maximiser exists). If {f(U)} avoids some point in {D(0,1)}, one can compose {f} with various holomorphic maps and use Schwarz’s lemma and the chain rule to increase {|f'(z_0)|} without destroying injectivity; see the previous lecture notes for details. The conformal map {\phi: U \rightarrow D(0,1)} is unique up to Möbius automorphisms of the disk; one can fix the map by picking two distinct points {z_0,z_1} in {U}, and requiring {\phi(z_0)} to be zero and {\phi(z_1)} to be positive real.

It is a beautiful observation of Thurston that the concept of a conformal mapping has a discrete counterpart, namely the mapping of one circle packing to another. Furthermore, one can run a version of Koebe’s argument (using now a discrete version of Perron’s method) to prove the Riemann mapping theorem through circle packings. In principle, this leads to a mostly elementary approach to conformal geometry, based on extremely classical mathematics that goes all the way back to Apollonius. However, in order to prove the basic existence and uniqueness theorems of circle packing, as well as the convergence to conformal maps in the continuous limit, it seems to be necessary (or at least highly convenient) to use much more modern machinery, including the theory of quasiconformal mapping, and also the Riemann mapping theorem itself (so in particular we are not structuring these notes to provide a completely independent proof of that theorem, though this may well be possible).

To make the above discussion more precise we need some notation.

Definition 2 (Circle packing) A (finite) circle packing is a finite collection {(C_j)_{j \in J}} of circles {C_j = \{ z \in {\bf C}: |z-z_j| = r_j\}} in the complex numbers indexed by some finite set {J}, whose interiors are all disjoint (but which are allowed to be tangent to each other), and whose union is connected. The nerve of a circle packing is the finite graph whose vertices {\{z_j: j \in J \}} are the centres of the circle packing, with two such centres connected by an edge if the circles are tangent. (In these notes all graphs are undirected, finite and simple, unless otherwise specified.)

It is clear that the nerve of a circle packing is connected and planar, since one can draw the nerve by placing each vertex (tautologically) in its location in the complex plane, and drawing each edge by the line segment between the centres of the circles it connects (this line segment will pass through the point of tangency of the two circles). Later in these notes we will also have to consider some infinite circle packings, most notably the infinite regular hexagonal circle packing.

The first basic theorem in the subject is the following converse statement:

Theorem 3 (Circle packing theorem) Every connected planar graph is the nerve of a circle packing.

Of course, there can be multiple circle packings associated to a given connected planar graph; indeed, since reflections across a line and Möbius transformations map circles to circles (or lines), they will map circle packings to circle packings (unless one or more of the circles is sent to a line). It turns out that once one adds enough edges to the planar graph, the circle packing is otherwise rigid:

Theorem 4 (Koebe-Andreev-Thurston theorem) If a connected planar graph is maximal (i.e., no further edge can be added to it without destroying planarity), then the circle packing given by the above theorem is unique up to reflections and Möbius transformations.

Exercise 5 Let {G} be a connected planar graph with {n \geq 3} vertices. Show that the following are equivalent:

  • (i) {G} is a maximal planar graph.
  • (ii) {G} has {3n-6} edges.
  • (iii) Every drawing {D} of {G} divides the plane into faces that have three edges each. (This includes one unbounded face.)
  • (iv) At least one drawing {D} of {G} divides the plane into faces that have three edges each.

(Hint: use Euler’s formula {V-E+F=2}, where {F} is the number of faces including the unbounded face.)

Thurston conjectured that circle packings can be used to approximate the conformal map arising in the Riemann mapping theorem. Here is an informal statement:

Conjecture 6 (Informal Thurston conjecture) Let {U} be a simply connected domain, with two distinct points {z_0,z_1}. Let {\phi: U \rightarrow D(0,1)} be the conformal map from {U} to {D(0,1)} that maps {z_0} to the origin and {z_1} to a positive real. For any small {\varepsilon>0}, let {{\mathcal C}_\varepsilon} be the portion of the regular hexagonal circle packing by circles of radius {\varepsilon} that are contained in {U}, and let {{\mathcal C}'_\varepsilon} be an circle packing of {D(0,1)} with all “boundary circles” tangent to {D(0,1)}, giving rise to an “approximate map” {\phi_\varepsilon: U_\varepsilon \rightarrow D(0,1)} defined on the subset {U_\varepsilon} of {U} consisting of the circles of {{\mathcal C}_\varepsilon}, their interiors, and the interstitial regions between triples of mutually tangent circles. Normalise this map so that {\phi_\varepsilon(z_0)} is zero and {\phi_\varepsilon(z_1)} is a positive real. Then {\phi_\varepsilon} converges to {\phi} as {\varepsilon \rightarrow 0}.

A rigorous version of this conjecture was proven by Rodin and Sullivan. Besides some elementary geometric lemmas (regarding the relative sizes of various configurations of tangent circles), the main ingredients are a rigidity result for the regular hexagonal circle packing, and the theory of quasiconformal maps. Quasiconformal maps are what seem on the surface to be a very broad generalisation of the notion of a conformal map. Informally, conformal maps take infinitesimal circles to infinitesimal circles, whereas quasiconformal maps take infinitesimal circles to infinitesimal ellipses of bounded eccentricity. In terms of Wirtinger derivatives, conformal maps obey the Cauchy-Riemann equation {\frac{\partial \phi}{\partial \overline{z}} = 0}, while (sufficiently smooth) quasiconformal maps only obey an inequality {|\frac{\partial \phi}{\partial \overline{z}}| \leq \frac{K-1}{K+1} |\frac{\partial \phi}{\partial z}|}. As such, quasiconformal maps are considerably more plentiful than conformal maps, and in particular it is possible to create piecewise smooth quasiconformal maps by gluing together various simple maps such as affine maps or Möbius transformations; such piecewise maps will naturally arise when trying to rigorously build the map {\phi_\varepsilon} alluded to in the above conjecture. On the other hand, it turns out that quasiconformal maps still have many vestiges of the rigidity properties enjoyed by conformal maps; for instance, there are quasiconformal analogues of fundamental theorems in conformal mapping such as the Schwarz reflection principle, Liouville’s theorem, or Hurwitz’s theorem. Among other things, these quasiconformal rigidity theorems allow one to create conformal maps from the limit of quasiconformal maps in many circumstances, and this will be how the Thurston conjecture will be proven. A key technical tool in establishing these sorts of rigidity theorems will be the theory of an important quasiconformal (quasi-)invariant, the conformal modulus (or, equivalently, the extremal length, which is the reciprocal of the modulus).

— 1. Proof of the circle packing theorem —

We loosely follow the treatment of Beardon and Stephenson. It is slightly more convenient to temporarily work in the Riemann sphere {{\bf C} \cup \{\infty\}} rather than the complex plane {{\bf C}}, in order to more easily use Möbius transformations. (Later we will make another change of venue, working in the Poincaré disk {D(0,1)} instead of the Riemann sphere.)

Define a Riemann sphere circle to be either a circle in {{\bf C}} or a line in {{\bf C}} together with {\infty}, together with one of the two components of the complement of this circle or line designated as the “interior”. In the case of a line, this “interior” is just one of the two half-planes on either side of the line; in the case of the circle, this is either the usual interior or the usual exterior plus the point at infinity; in the last case, we refer to the Riemann sphere circle as an exterior circle. (One could also equivalently work with an orientation on the circle rather than assigning an interior, since the interior could then be described as the region to (say) the left of the circle as one traverses the circle along the indicated orientation.) Note that Möbius transforms map Riemann sphere circles to Riemann sphere circles. If one views the Riemann sphere as a geometric sphere in Euclidean space {{\bf R}^3}, then Riemann sphere circles are just circles on this geometric sphere, which then have a centre on this sphere that lies in the region designated as the interior of the circle. We caution though that this “Riemann sphere” centre does not always correspond to the Euclidean notion of the centre of a circle. For instance, the real line, with the upper half-plane designated as interior, will have {i} as its Riemann sphere centre; if instead one designates the lower half-plane as the interior, the Riemann sphere centre will now be {-i}. We can then define a Riemann sphere circle packing in exact analogy with circle packings in {{\bf C}}, namely finite collections of Riemann sphere circles whose interiors are disjoint and whose union is connected; we also define the nerve as before. This is now a graph that can be drawn in the Riemann sphere, using great circle arcs in the Riemann sphere rather than line segments; it is also planar, since one can apply a Möbius transformation to move all the points and edges of the drawing away from infinity.

By Exercise 5, a maximal planar graph with at least three vertices can be drawn as a triangulation of the Riemann sphere. If there are at least four vertices, then it is easy to see that each vertex has degree at least three (a vertex of degree zero, one or two in a triangulation with simple edges will lead to a connected component of at most three vertices). It is a topological fact, not established here, that any two triangulations of such a graph are homotopic up to reflection (to reverse the orientation). If a Riemann sphere circle packing has the nerve of a maximal planar graph {G} of at least four vertices, then we see that this nerve induces an explicit triangulation of the Riemann sphere by connecting the centres of any pair of tangent circles with the great circle arc that passes through the point of tangency. If {G} was not maximal, one no longer gets a triangulation this way, but one still obtains a partition of the Riemann sphere into spherical polygons.

We remark that the triangles in this triangulation can also be described purely from the abstract graph {G}. Define a triangle in {G} to be a triple {w_1,w_2,w_3} of vertices in {G} which are all adjacent to each other, and such that the removal of these three vertices from {G} does not disconnect the graph. One can check that there is a one-to-one correspondence between such triangles in a maximal planar graph {G} and the triangles in any Riemann sphere triangulation of this graph.

Theorems 3, 4 are then a consequence of

Theorem 7 (Riemann sphere circle packing theorem) Let {G} be a maximal planar graph with at least four vertices, drawn as a triangulation of the Riemann sphere. Then there exists a Riemann sphere circle packing with nerve {G} whose triangulation is homotopic to the given triangulation. Furthermore, this packing is unique up to Möbius transformations.

Exercise 8 Deduce Theorems 3, 4 from Theorem 7. (Hint: If one has a non-maximal planar graph for Theorem 3, add a vertex at the interior of each non-triangular face of a drawing of that graph, and connect that vertex to the vertices of the face, to create a maximal planar graph to which Theorem 4 or Theorem 7 can be applied. Then delete these “helper vertices” to create a packing of the original planar graph that does not contain any “unwanted” tangencies. You may use without proof the above assertion that any two triangulations of a maximal planar graph are homotopic up to reflection.)

Exercise 9 Verify Theorem 7 when {G} has exactly four vertices. (Hint: for the uniqueness, one can use Möbius transformations to move two of the circles to become parallel lines.)

To prove this theorem, we will make a reduction with regards to the existence component of Theorem 7. For technical reasons we will need to introduce a notion of non-degeneracy. Let {G} be a maximal planar graph with at least four vertices, and let {v} be a vertex in {G}. As discussed above, the degree {d} of {v} is at least three. Writing the neighbours of {v} in clockwise or counterclockwise order (with respect to a triangulation) as {v_1,\dots,v_d} (starting from some arbitrary neighbour), we see that each {v_i} is adjacent to {v_{i-1}} and {v_{i+1}} (with the conventions {v_0=v_d} and {v_{d+1}=v_1}). We say that {v} is non-degenerate if there are no further adjacencies between the {v_1,\dots,v_d}, and if there is at least one further vertex in {G} besides {v,v_1,\dots,v_d}. Here is another characterisation:

Exercise 10 Let {G} be a maximal planar graph with at least four vertices, let {v} be a vertex in {G}, and let {v_1,\dots,v_d} be the neighbours of {v}. Show that the following are equivalent:

  • (i) {v} is non-degenerate.
  • (ii) The graph {G \backslash \{ v, v_1, \dots, v_d \}} is connected and non-empty, and every vertex in {v_1,\dots,v_d} is adjacent to at least one vertex in {G \backslash \{ v, v_1, \dots, v_d \}}.

We will then derive Theorem 7 from

Theorem 11 (Inductive step) Let {G} be a maximal planar graph with at least four vertices {V}, drawn as a triangulation of the Riemann sphere. Let {v} be a non-degenerate vertex of {G}, and let {G - \{v\}} be the graph formed by deleting {v} (and edges emenating from {v}) from {G}. Suppose that there exists a Riemann sphere circle packing {(C_w)_{w \in V \backslash \{v\}}} whose nerve is at least {G - \{v\}} (that is, {C_w} and {C_{w'}} are tangent whenever {w,w'} are adjacent in {G - \{v\}}, although we also allow additional tangencies), and whose associated subdivision of the Riemann sphere into spherical polygons is homotopic to the given triangulation with {v} removed. Then there is a Riemann sphere circle packing {(\tilde C_w)_{w \in V}} with nerve {G} whose triangulation is homotopic to the given triangulation. Furthermore this circle packing {(\tilde C_w)_{w \in V}} is unique up to Möbius transformations.

Let us now see how Theorem 7 follows from Theorem 11. Fix {G} as in Theorem 7. By Exercise 9 and induction we may assume that {G} has at least five vertices, and that the claim has been proven for any smaller number of vertices.

First suppose that {G} contains a non-degenerate vertex {v}. Let {v_1,\dots,v_d} be the the neighbours of {v}. One can then form a new graph {G'} with one fewer vertex by deleting {v}, and then connecting {v_3,\dots,v_{d-1}} to {v_1} (one can think of this operation as contracting the edge {\{v,v_1\}} to a point). One can check that this is still a maximal planar graph that can triangulate the Riemann sphere in a fashion compatible with the original triangulation of {G} (in that all the common vertices, edges, and faces are unchanged). By induction hypothesis, {G'} is the nerve of a circle packing that is compatible with this triangulation, and hence this circle packing has nerve at least {G - \{v\}}. Applying Theorem 11, we then obtain the required claim for {G}.

Now suppose that {G} contains a degenerate vertex {v}. Let {v_1,\dots,v_d} be the neighbours of {v} traversed in order. By hypothesis, there is an additional adjacency between the {v_1,\dots,v_d}; by relabeling we may assume that {v_1} is adjacent to {v_k} for some {3 \leq k \leq d-1}. The vertices {V} in {G} can then be partitioned as

\displaystyle  V = \{v\} \cup \{ v_1,\dots,v_d\} \cup V_1 \cup V_2

where {V_1} denotes those vertices in {V \backslash \{ v_1,\dots,v_d\}} that lie in the region enclosed by the loop {v_1,\dots,v_k, v_1} that does not contain {v}, and {V_2} denotes those vertices in {V \backslash \{ v_1,\dots,v_d\}} that lie in the region enclosed by the loop {v_k,\dots,v_d,v_1, v_k} that does not contain {v}. One can then form two graphs {G_1, G_2}, formed by restricting {G} to the vertices {\tilde V_1 := \{v, v_1,\dots,v_k\} \cup V_1} and {\tilde V_2 := \{ v, v_k, \dots, v_d, v_1\} \cup V_2} respectively; furthermore, these graphs are also maximal planar (with triangulations that are compatible with those of {G}). By induction hypothesis, we can find a circle packing {(C_w)_{w \in \tilde V_1}} with nerve {G_1}, and a circle packing {(C'_w)_{w \in \tilde V_2}} with nerve {G_2}. Note that the circles {C_v, C_{v_1}, C_{v_k}} are mutually tangent, as are {C'_v, C'_{v_1}, C'_{v_k}}. By applying a Möbius transformation one may assume that these circles agree, thus (cf. Exercise 9) {C_v = C'_v}, {C_{v_1} = C'_{v_1}, C_{v_k} = C'_{v_k}}. The complement of the these three circles (and their interiors) determine two connected “interstitial” regions (that are in the shape of an arbelos, up to Möbius transformation); one can check that the remaining circles in {(C_w)_{w \in \tilde V_1}} will lie in one of these regions, and the remaining circles in {(C'_w)_{w \in \tilde V_2}} lie in the other. Hence one can glue these circle packings together to form a single circle packing with nerve {G}, which is homotopic to the given triangulation. Also, since a Möbius transformation that fixes three mutually tangent circles has to be the identity, the uniqueness of this circle packing up to Möbius transformations follows from the uniqueness for the two component circle packings {(C_w)_{w \in \tilde V_1}}, {(C'_w)_{w \in \tilde V_2}}.

It remains to prove Theorem 11. To help fix the freedom to apply Möbius transformations, we can normalise the target circle packing {(\tilde C_w)_{w \in V}} so that {\tilde C_v} is the exterior circle {\{ |z|=1\}}, thus all the other circles {\tilde C_w} in the packing will lie in the closed unit disk {\overline{D(0,1)}}. Similarly, by applying a suitable Möbius transformation one can assume that {\infty} lies outside of the interior of all the circles {C_w} in the original packing, and after a scaling one may then assume that all the circles {C_w} lie in the unit disk {D(0,1)}.

At this point it becomes convenient to switch from the “elliptic” conformal geometry of the Riemann sphere {{\bf C} \cup \{\infty\}} to the “hyperbolic” conformal geometry of the unit disk {D(0,1)}. Recall that the Möbius transformations that preserve the disk {D(0,1)} are given by the maps

\displaystyle  z \mapsto e^{i\theta} \frac{z-\alpha}{1-\overline{\alpha} z} \ \ \ \ \ (1)

for real {\theta} and {\alpha \in D(0,1)} (see Theorem 19 of these notes). It comes with a natural metric that interacts well with circles:

Exercise 12 Define the Poincaré distance {d(z_1,z_2)} between two points of {D(0,1)} by the formula

\displaystyle  d(z_1,z_2) := 2 \mathrm{arctanh} |\frac{z_1-z_2}{1-z_1 \overline{z_2}}|.

Given a measurable subset {E} of {D(0,1)}, define the hyperbolic area of {E} to be the quantity

\displaystyle  \mathrm{area}(E) := \int_E \frac{4\ dx dy}{(1-|z|^2)^2}

where {dx dy} is the Euclidean area element on {D(0,1)}.

  • (i) Show that the Poincaré distance is invariant with respect to Möbius automorphisms of {D(0,1)}, thus {d(Tz_1, Tz_2) = d(z_1,z_2)} whenever {T} is a transformation of the form (1). Similarly show that the hyperbolic area is invariant with respect to such transformations.
  • (ii) Show that the Poincaré distance defines a metric on {D(0,1)}. Furthermore, show that any two distinct points {z_1,z_2} are connected by a unique geodesic, which is a portion of either a line or a circle that meets the unit circle orthogonally at two points. (Hint: use the symmetries of (i) to normalise the points one is studying.)
  • (iii) If {C} is a circle in the interior of {D(0,1)}, show that there exists a point {z_C} in {D(0,1)} and a positive real number {r_C} (which we call the hyperbolic center and hyperbolic radius respectively) such that {C = \{ z \in D(0,1): d(z,z_C) = r_C \}}. (In general, the hyperbolic center and radius will not quite agree with their familiar Euclidean counterparts.) Conversely, show that for any {z_C \in D(0,1)} and {r_C > 0}, the set {\{ z \in D(0,1): d(z,z_C) = r_C \}} is a circle in {D(0,1)}.
  • (iv) If two circles {C_1, C_2} in {D(0,1)} are externally tangent, show that the geodesic connecting the hyperbolic centers {z_{C_1}, z_{C_2}} passes through the point of tangency, orthogonally to the two tangent circles.

Exercise 13 (Schwarz-Pick theorem) Let {f: D(0,1) \rightarrow D(0,1)} be a holomorphic map. Show that {d(f(z_1),f(z_2)) \leq d(z_1,z_2)} for all {z_1,z_2 \in D(0,1)}. If {z_1 \neq z_2}, show that equality occurs if and only if {f} is a Möbius automorphism (1) of {D(0,1)}. (This result is known as the Schwarz-Pick theorem.)

We will refer to circles that lie in the closure {\overline{D(0,1)}} of the unit disk as hyperbolic circles. These can be divided into the finite radius hyperbolic circles, which lie in the interior of the unit disk (as per part (iii) of the above exercise), and the horocycles, which are internally tangent to the unit circle. By convention, we view horocycles as having infinite radius, and having center at their point of tangency to the unit circle; they can be viewed as the limiting case of finite radius hyperbolic circles when the radius goes to infinity and the center goes off to the boundary of the disk (at the same rate as the radius, as measured with respect to the Poincaré distance). We write {C(p,r)} for the hyperbolic circle with hyperbolic centre {p} and hyperbolic radius {r} (thus either {0 < r < \infty} and {p \in D(0,1)}, or {r = \infty} and {p} is on the unit circle); there is an annoying caveat that when {r=\infty} there is more than one horocycle {C(p,\infty)} with hyperbolic centre {p}, but we will tolerate this breakdown of functional dependence of {C} on {p} and {r} in order to simplify the notation. A hyperbolic circle packing is a circle packing {(C(p_v,r_v))_{v \in V}} in which all circles are hyperbolic circles.

We also observe that the geodesic structure extends to the boundary of the unit disk: for any two distinct points {z_1,z_2} in {\overline{D(0,1)}}, there is a unique geodesic that connects them.

In view of the above discussion, Theorem 7 may now be formulated as follows:

Theorem 14 (Inductive step, hyperbolic formulation) Let {G} be a maximal planar graph with at least four vertices {V}, let {v} be a non-degenerate vertex of {G}, and let {v_1,\dots,v_d} be the vertices adjacent to {v}. Suppose that there exists a hyperbolic circle packing {(C(p_w,r_w))_{w \in V \backslash \{v\}}} whose nerve is at least {G - \{v\}}. Then there is a hyperbolic circle packing {(C(\tilde p_w,\tilde r_w))_{V \backslash \{v\}}} homotopic to {(C(p_w,r_w))_{w \in V \backslash \{v\}}} such that the boundary circles {C(\tilde p_{v_j}, \tilde r_{v_j})}, {j=1,\dots,d} are all horocycles. Furthermore, this packing is unique up to Möbius automorphisms (1) of the disk {D(0,1)}.

Indeed, once one adjoints the exterior unit circle to {(C(p_w,r_w))_{w \in V \backslash \{v\}}}, one obtains a Riemann sphere circle packing whose nerve is at least {G}, and hence equal to {G} since {G} is maximal.

To prove this theorem, the intuition is to “inflate” the hyperbolic radius of the circles of {C_w} until the boundary circles all become infinite radius (i.e., horocycles). The difficulty is that one cannot just arbitrarily increase the radius of any given circle without destroying the required tangency properties. The resolution to this difficulty given in the work of Beardon and Stephenson that we are following here was inspired by Perron’s method of subharmonic functions, in which one faced an analogous difficulty that one could not easily manipulate a harmonic function without destroying its harmonicity. There, the solution was to work instead with the more flexible class of subharmonic functions; here we similarly work with the concept of a subpacking.

We will need some preliminaries to define this concept precisely. We first need some hyperbolic trigonometry. We define a hyperbolic triangle to be the solid (and closed) region in {\overline{D(0,1)}} enclosed by three distinct points {z_1,z_2,z_3} in {\overline{D(0,1)}} and the geodesic arcs connecting them. (Note that we allow one or more of the vertices to be on the boundary of the disk, so that the sides of the triangle could have infinite length.) Let {T := (0,+\infty]^3 \backslash \{ (\infty,\infty,\infty)\}} be the space of triples {(r_1,r_2,r_3)} with {0 < r_1,r_2,r_3 \leq \infty} and not all of {r_1,r_2,r_3} infinite. We say that a hyperbolic triangle with vertices {p_1,p_2,p_3} is a {(r_1,r_2,r_3)}-triangle if there are hyperbolic circles {C(p_i,r_1), C(p_2,r_2), C(p_3,r_3)} with the indicated hyperbolic centres and hyperbolic radii that are externally tangent to each other; note that this implies that the sidelengths opposite {p_1,p_2,p_3} have length {r_2+r_3, r_1+r_3, r_1+r_2} respectively (see Figure 3 of Beardon and Stephenson). It is easy to see that for any {(r_1,r_2,r_3) \in T}, there exists a unique {(r_1,r_2,r_3)}-triangle in {\overline{D(0,1)}} up to reflections and Möbius automorphisms (use Möbius transforms to fix two of the hyperbolic circles, and consider all the circles externally tangent to both of these circles; the case when one or two of the {r_1,r_2,r_3} are infinite may need to be treated separately.). As a consequence, there is a well defined angle {\alpha_i(r_1,r_2,r_3) \in [0,\pi)} for {i=1,2,3} subtended by the vertex {p_i} of an {(r_1,r_2,r_3)} triangle. We need some basic facts from hyperbolic geometry:

Exercise 15 (Hyperbolic trigonometry)

  • (i) (Hyperbolic cosine rule) For any {0 < r_1,r_2,r_3 < \infty}, show that the quantity {\cos \alpha_1(r_1,r_2,r_3)} is equal to the ratio

    \displaystyle  \frac{\cosh( r_1+r_2) \cosh(r_1+r_3) - \cosh(r_2+r_3)}{\sinh(r_1+r_2) \sinh(r_1+r_3)}.

    Furthermore, establish the limiting angles

    \displaystyle  \alpha_1(\infty,r_2,r_3) = \alpha_1(\infty,\infty,r_3) = \alpha_1(\infty,r_2,\infty) = 0

    \displaystyle  \cos \alpha_1(r_1,\infty,r_3) = \frac{\cosh(r_1+r_3) - \exp(r_3-r_1)}{\sinh(r_1+r_3)}

    \displaystyle  \cos \alpha_1(r_1,r_2,\infty) = \frac{\cosh(r_1+r_2) - \exp(r_2-r_1)}{\sinh(r_1+r_2)}

    \displaystyle  \cos \alpha_1(r_1,\infty,\infty) = 1 - 2\exp(-2r_1).

    (Hint: to facilitate computations, use a Möbius transform to move the {p_1} vertex to the origin when the radius there is finite.) Conclude in particular that {\alpha_1: T \rightarrow [0,\pi)} is continuous (using the topology of the extended real line for each component of {T}). Discuss how this rule relates to the Euclidean cosine rule in the limit as {r_1,r_2,r_3} go to zero. Of course, by relabeling one obtains similar formulae for {\alpha_2(r_1,r_2,r_3)} and {\alpha_3(r_1,r_2,r_3)}.

  • (ii) (Area rule) Show that the area of a hyperbolic triangle is given by {\pi - \alpha_1-\alpha_2-\alpha_3}, where {\alpha_1,\alpha_2,\alpha_3} are the angles of the hyperbolic triangle. (Hint: there are several ways to proceed. For instance, one can prove this for small hyperbolic triangles (of diameter {O(\varepsilon)}) up to errors of size {o(\varepsilon^2)} after normalising as in (ii), and then establish the general case by subdividing a large hyperbolic triangle into many small hyperbolic triangles. This rule is also a special case of the Gauss-Bonnet theorem in Riemannian geometry. One can also first establish the case when several of the radii are infinite, and use that to derive finite cases.) In particular, the area {\mathrm{Area}(r_1,r_2,r_3)} of a {(r_1,r_2,r_3)}-triangle is given by the formula

    \displaystyle  \pi - \alpha_1(r_1,r_2,r_3) - \alpha_2(r_1,r_2,r_3) - \alpha_3(r_1,r_2,r_3). \ \ \ \ \ (2)

  • (iii) Show that the area of the interior of a hyperbolic circle {C(p,r)} with {r<\infty} is equal to {4\pi \sinh^2(r/2)}.

Henceforth we fix {G, v, v_1,\dots,v_d, {\mathcal C} = (C(p_w,r_w))_{w \in V \backslash \{v\}}} as in Theorem 14. We refer to the vertices {v_1,\dots,v_d} as boundary vertices of {G - \{v\}} and the remaining vertices as interior vertices; edges between boundary vertices are boundary edges, all other edges will be called interior edges (including edges that have one vertex on the boundary). Triangles in {G -\{v\}} that involve two boundary vertices (and thus necessarily one interior vertex) will be called boundary triangles; all other triangles (including ones that involve one boundary vertex) will be called interior triangles. To any triangle {w_1,w_2,w_3} of {G - \{v\}}, we can form the hyperbolic triangle {\Delta_{\mathcal C}(w_1,w_2,w_3)} with vertices {p_{w_1}, p_{w_2}, p_{w_3}}; this is an {(r_{w_1}, r_{w_2}, r_{w_3})}-triangle. Let {\Sigma} denote the collection of such hyperbolic triangles; because {{\mathcal C}} is a packing, we see that these triangles have disjoint interiors. They also fit together in the following way: if {e} is a side of a hyperbolic triangle in {\Sigma}, then there will be another hyperbolic triangle in {\Sigma} that shares that side precisely when {e} is associated to an interior edge of {G - \{v\}}. The union of all these triangles is homeomorphic to the region formed by starting with a triangulation of the Riemann sphere by {G} and removing the triangles containing {v} as a vertex, and is therefore homeomorphic to a disk. One can think of the collection {\Sigma} of hyperbolic triangles, together with the vertices and edges shared by these triangles, as a two-dimensional (hyperbolic) simplicial complex, though we will not develop the full machinery of such complexes here.

Our objective is to find another hyperbolic circle packing {\tilde {\mathcal C} = (C(\tilde p_w, \tilde r_w))_{w \in V \backslash \{v\}}} homotopic to the existing circle packing {{\mathcal C}}, such at all the boundary circles (circles centred at boundary vertices) are horocycles. We observe that such a hyperbolic circle packing is completely described (up to Möbius transformations) by the hyperbolic radii {(\tilde r_w)_{w \in V \backslash \{v\}}} of these circles. Indeed, suppose one knows the values of these hyperbolic radii. Then each hyperbolic triangle {\Delta_{\mathcal C}(w_1,w_2,w_3)} in {\Sigma} is associated to a hyperbolic triangle {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} whose sides and angles are known from Exercise 15. As the orientation of each hyperbolic triangle is fixed, each hyperbolic triangle is determined up to a Möbius automorphism of {D(0,1)}. Once one fixes one hyperbolic triangle, the adjacent hyperbolic triangles (that share a common side with the first triangle) are then also fixed; continuing in this fashion we see that the entire hyperbolic circle packing {\tilde {\mathcal C}} is determined.

On the other hand, not every choice of radii {(\tilde r_w)_{w \in V \backslash \{v\}}} will lead to a hyperbolic circle packing {\tilde {\mathcal C}} with the required properties. There are two obvious constraints that need to be satisfied:

  • (i) (Local constraint) The angles {\alpha_1( \tilde r_w, \tilde r_{w_1}, \tilde r_{w_2})} of all the hyperbolic triangles {\Delta_{\tilde {\mathcal C}}(w,w_1,w_2)} around any given interior vertex {w} must sum to exactly {2\pi}.
  • (ii) (Boundary constraint) The radii associated to boundary vertices must be infinite.

There could potentially also be a global constraint, in that one requires the circles of the packing to be disjoint – including circles that are not necessarily adjacent to each other. In general, one can easily create configurations of circles that are local circle packings but not global ones (see e.g., Figure 7 of Beardon-Stephenson). However, it turns out that one can use the boundary constraint and topological arguments to prevent this from happening. We first need a topological lemma:

Lemma 16 (Topological lemma) Let {U, V} be bounded connected open subsets of {{\bf C}} with {V} simply connected, and let {f: \overline{U} \rightarrow \overline{V}} be a continuous map such that {f(\partial U) \subset \partial V} and {f(U) \subset V}. Suppose furthermore that the restriction of {f} to {U} is a local homeomorphism. Then {f} is in fact a global homeomorphism.

The requirement that the restriction of {f} to {U} be a local homeomorphism can in fact be relaxed to local injectivity thanks to the invariance of domain theorem. The complex numbers {{\bf C}} can be replaced here by any finite-dimensional vector space.

Proof: The preimage {f^{-1}(p)} of any point {p} in the interior of {V} is closed, discrete, and disjoint from {\partial U}, and is hence finite. Around each point in the preimage, there is a neighbourhood on which {f} is a homeomorphism onto a neighbourhood of {p}. If one deletes the closure of these neighbourhoods, the image under {f} is compact and avoids {p}, and thus avoids a neighbourhood of {p}. From this we can show that {f} is a covering map from {U} to {V}. As the base {V} is simply connected, it is its own universal cover, and hence (by the connectedness of {U}) {f} must be a homeomorphism as claimed. \Box

Proposition 17 Suppose we assign a radius {\tilde r_w \in (0,+\infty]} to each {w \in V \backslash \{v\}} that obeys the local constraint (i) and the boundary constraint (ii). Then there is a hyperbolic circle packing {(C(\tilde p_w, \tilde r_w))_{w \in V \backslash \{v\}}} with nerve {G - \{v\}} and the indicated radii.

Proof: We first create the hyperbolic triangles {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} associated with the required hyperbolic circle packing, and then verify that this indeed arises from a circle packing.

Start with a single triangle {(w^0_1,w^0_2,w^0_3)} in {G - \{v\}}, and arbitrarily select a {(\tilde r_{w^0_1}, \tilde r_{w^0_2}, \tilde r_{w^0_3})}-triangle {\Delta_{\tilde {\mathcal C}}(w^0_1,w^0_2,w^0_3)} with the same orientation as {\Delta_{{\mathcal C}}(w_1,w_2,w_3)}. By Exercise 15(i), such a triangle exists (and is unique up to Möbius automorphisms of the disk). If a hyperbolic triangle {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} has been fixed, and {(w_2,w_3,w_4)} (say) is an adjacent triangle in {G - \{v\}}, we can select {\Delta_{\tilde {\mathcal C}}(w_2,w_3,w_4)} to be the unique {(r_{w_2}, r_{w_3}, r_{w_4})}-triangle with the same orientation as {\Delta_{{\mathcal C}}(w_2,w_3,w_4)} that shares the {w_2,w_3} side in common with {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} (with the {w_2} and {w_3} vertices agreeing). Similarly for other permutations of the labels. As {G} is a maximal planar graph with {v} non-degenerate (so in particular the set of internal vertices is connected), we can continue this construction to eventually fix every triangle in {G - \{v\}}. There is the potential issue that a given triangle {\Delta_{{\mathcal C}}(w_1,w_2,w_3)} may depend on the order in which one arrives at that triangle starting from {(w^0_1,w^0_2,w^0_3)}, but one can check from a monodromy argument (in the spirit of the monodromy theorem) using the local constraint (i) and the simply connected nature of the triangulation associated to {{\mathcal C}} that there is in fact no dependence on the order. (The process resembles that of laying down jigsaw pieces in the shape of hyperbolic triangles together, with the local constraint ensuring that there is always a flush fit locally.)

Now we show that the hyperbolic triangles {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} have disjoint interiors inside the disk {D(0,1)}. Let {X} denote the topological space formed by taking the disjoint union of the hyperbolic triangles {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} (now viewed as abstract topological spaces rather than subsets of the disk) and then gluing together all common edges, e.g. identifying the {\{w_2,w_3\}} edge of {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} with the same edge of {\Delta_{\tilde {\mathcal C}}(w_2,w_3,w_4)} if {(w_1,w_2,w_3)} and {(w_2,w_3,w_4)} are adjacent triangles in {G - \{v\}}. This space is homeomorphic to the union of the original hyperbolic triangles {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)}, and is thus homeomorphic to the closed unit disk. There is an obvious projection map {\pi} from {X} to the union of the {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)}, which maps the abstract copy in {X} of a given hyperbolic triangle {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} to its concrete counterpart in {\overline{D(0,1)}} in the obvious fashion. This map is continuous. It does not quite cover the full closed disk, mainly because (by the boundary condition (ii)) the boundary hyperbolic triangles {\Delta_{\tilde {\mathcal C}}(v_i,v_{i+1},w)} touch the boundary of the disk at the vertices associated to {v_i} and {v_{i+1}} but do not follow the boundary arc connecting these vertices, being bounded instead by the geodesic from the {v_i} vertex to the {v_{i+1}} vertex; the missing region is a lens-shaped region bounded by two circular arcs. However, by applying another homeomorphism (that does not alter the edges from {v_i} to {w} or {v_{i+1}} to {w}), one can “push out” the {\{v_i,v_{i+1}\}} edge of this hyperbolic triangle across the lens to become the boundary arc from {v_i} to {v_{i+1}}. If one performs this modification for each boundary triangle, one arrives at a modified continuous map {\tilde \pi} from {X} to {\overline{D(0,1)}}, which now has the property that the boundary of {X} maps to the boundary of the disk, and the interior of {X} maps to the interior of the disk. Also one can check that this map is a local homeomorphism. By Lemma 16, {\tilde \pi} is injective; undoing the boundary modifications we conclude that {\pi} is injective. Thus the hyperbolic triangles {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} have disjoint interiors. Furthermore, the arguments show that for each boundary triangle {\Delta_{\tilde {\mathcal C}}(v_i,v_{i+1},w)}, the lens-shaped regions between the boundary arc between the vertices associated to {v_i, v_{i+1}} and the corresponding edge of the boundary triangle are also disjoint from the hyperbolic triangles and from each other. On the other hand, all of the hyperbolic circles and in {{\tilde {\mathcal C}}} and their interiors are contained in the union of the hyperbolic triangles {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} and the lens-shaped regions, with each hyperbolic triangle containing portions only of the hyperbolic circles with hyperbolic centres at the vertices of the triangle, and similarly for the lens-shaped regions. From this one can verify that the interiors of the hyperbolic circles are all disjoint from each other, and give a hyperbolic circle packing with the required properties. \Box

In view of the above proposition, the only remaining task is to find an assignment of radii {(\tilde r_w)_{w \in V \backslash \{v\}}} obeying both the local condition (i) and the boundary condition (ii). This is analogous to finding a harmonic function with specified boundary data. To do this, we perform the following analogue of Perron’s method. Define a subpacking to be an assignment {(\tilde r_w)_{w \in V \backslash \{v\}}} of radii {\tilde r_w \in (0,+\infty]} obeying the following

  • (i’) (Local sub-condition) The angles {\alpha_1( \tilde r_w, \tilde r_{w_1}, \tilde r_{w_2})} around any given interior vertex {w} sum to at least {2\pi}.

This can be compared with the definition of a (smooth) subharmonic function as one where the Laplacian is always at least zero. Note that we always have at least one subpacking, namely the one provided by the radii of the original hyperbolic circle packing {{\mathcal C}}. Intuitively, in each subpacking, the radius {\tilde r_w} at an interior vertex {w} is either “too small” or “just right”.

We now need a key monotonicity property, analogous to how the maximum of two subharmonic functions is again subharmonic:

Exercise 18 (Monotonicity)

  • (i) Show that the angle {\alpha_1( r_1, r_2, r_3)} (as defined in Exercise 15(i)) is strictly decreasing in {r_1} and strictly increasing in {r_2} or {r_3} (if one holds the other two radii fixed). Do these claims agree with your geometric intuition?
  • (ii) Conclude that whenever {{\mathcal R}' = (r'_w)_{w \in V \backslash \{v\}}} and {{\mathcal R}'' = (r''_w)_{w \in V \backslash \{v\}}} are subpackings, that {\max( {\mathcal R}' , {\mathcal R}'' ) := (\max(r'_w, r''_w))_{w \in V \backslash \{v\}}} is also a subpacking.
  • (iii) Let {(r_1,r_2,r_3), (r'_1,r'_2,r'_3) \in T} be such that {r_i \leq r'_i} for {i=1,2,3}. Show that {\mathrm{Area}(r_1,r_2,r_3) \leq \mathrm{Area}(r'_1,r'_2,r'_3)}, with equality if and only if {r_i=r'_i} for all {i=1,2,3}. (Hint: increase just one of the radii {r_1,r_2,r_3}. One can either use calculus (after first disposing of various infinite radii cases) or one can argue geometrically.)

As with Perron’s method, we can now try to construct a hyperbolic circle packing by taking the supremum of all the subpackings. To avoid degeneracies we need an upper bound:

Proposition 19 (Upper bound) Let {(\tilde r_w)_{w \in V \backslash \{v\}}} be a subpacking. Then for any interior vertex {w} of degree {d}, one has {\tilde r_w \leq \sqrt{d}}.

The precise value of {\sqrt{d}} is not so important for our arguments, but the fact that it is finite will be. This boundedness of interior circles in a circle packing is a key feature of hyperbolic geometry that is not present in Euclidean geometry, and is one of the reasons why we moved to a hyperbolic perspective in the first place.

Proof: By the subpacking property and pigeonhole principle, there is a triangle {w, w_1, w_2} in {G - \{v\}} such that {\alpha_1(w,w_1,w_2) \geq \frac{2\pi}{d}}. The hyperbolic triangle associated to {(w_1,w_2,w_3)} has area at most {\pi} by (2); on the other hand, it contains a sector of a hyperbolic circle of radius {\tilde r_w} and angle {\frac{2\pi}{d}}, and hence has area at least {\frac{1}{d} 4\pi \sinh^2(r/2) \geq \frac{\pi r^2}{d}}, thanks to Exercise 15(iv). Comparing the two bounds gives the claim. \Box

Now define {{\mathcal R} = ( \tilde r_w )_{w \in V \backslash \{v\}}} to be the (pointwise) supremum of all the subpackings. By the above proposition, {\tilde r_w} is finite at every interior vertex. By Exercise 18, one can view {{\mathcal R}} as a monotone increasing limit of subpackings, and is thus again a subpacking (due to the continuity properties of {\alpha_1} as long as at least one of the radii stays bounded); thus {{\mathcal R}} is the maximal subpacking. On the other hand, if {\tilde r_w} is finite at some boundary vertex, then by Exercise 18(i) one could replace that radius by a larger quantity without destroying the subpacking property, contradicting the maximality of {{\mathcal R}}. Thus all the boundary radii are infinite, that is to say the boundary condition (ii) holds. Finally, if the sum of the angles at an interior vertex {w} is strictly greater than {\pi}, then by Exercise 18 we could increase the radius at this vertex slightly without destroying the subpacking property at {w} or at any other of the interior vertices, again contradicting the maximality of {{\mathcal R}}. Thus {{\mathcal R}} obeys the local condition (i), and we have demonstrated existence of the required hyperbolic circle packing.

Finally we establish uniqueness. It suffices to establish that {{\mathcal R}} is the unique tuple that obeys the local condition (i) and the boundary condition (ii). Suppose we had another tuple {{\mathcal R}' = ( r'_w )_{w \in V \backslash \{v\}}} other than {{\mathcal R}} that obeyed these two conditions. Then by the maximality of {{\mathcal R}}, we have {r'_w \leq \tilde r_w} for all {w}. By Exercise 18(iii), this implies that

\displaystyle  \mathrm{Area}( r'_{w_1}, r'_{w_2}, r'_{w_3} ) \leq \mathrm{Area}( \tilde r_{w_1}, \tilde r_{w_2}, \tilde r_{w_3} )

for any triangle {(w_1,w_2,w_3)} in {T}. Summing over all triangles and using (2), we conclude that

\displaystyle  \sum_{w \in V \backslash \{v\}} \sum_{w_1,w_2: (w,w_1,w_2) \hbox{ triangle}} \alpha_1(r'_{w}, r'_{w_1}, r'_{w_2})

\displaystyle \geq \sum_{w \in V \backslash \{v\}} \sum_{w_1,w_2: (w,w_1,w_2) \hbox{ triangle}} \alpha_1(\tilde r_{w}, \tilde r_{w_1}, \tilde r_{w_2})

where the inner sum is over the pairs {w_1,w_2} such that {(w,w_1,w_2)} forms a triangle in {G - \{v\}}. But by the local condition (i) and the boundary condition (ii), the inner sum on either side is equal to {2\pi} for an interior vertex and {0} for a boundary vertex. Thus the two sides agree, which by Exercise 18(iii) implies that {r'_w = \tilde r_w} for all {w}. This proves Theorem 14 and thus Theorems 7, 3, 4.

— 2. Quasiconformal maps —

In this section we set up some of the foundational theory of quasiconformal mapping, which are generalisations of the conformal mapping concept that can tolerate some deviations from perfect conformality, while still retaining many of the good properties of conformal maps (such as being preserved under uniform limits), though with the notable caveat that in contrast to conformal maps, quasiconformal maps need not be smooth. As such, this theory will come in handy when proving convergence of circle packings to the Riemann map. The material here is largely drawn from the text of Lehto and Virtanen.

We first need the following refinement of the Riemann mapping theorem, known as Carathéodory’s theorem:

Theorem 20 (Carathéodory’s theorem) Let {U} be a bounded simply connected domain in {{\bf C}} whose boundary {\partial U} is a Jordan curve, and let {\phi: D(0,1) \rightarrow U} be a conformal map between {D(0,1)} and {U} (as given by the Riemann mapping theorem). Then {\phi} extends to a continuous homeomorphism from {\overline{D}(0,1)} to {\overline{U}}.

The condition that {\partial U} be a Jordan curve is clearly necessary, since if {\partial U} is not simple then there are paths in {D(0,1)} that end up at different points in {\partial D(0,1)} but have the same endpoint in {\partial U} after applying {\phi}, which prevents {\phi} being continuously extended to a homeomorphism. If one relaxes the requirement that {\partial U} be a Jordan curve to the claim that {{\bf C} \backslash U} is locally connected, then it is possible to modify this argument to still obtain a continuous extension of {U} to {\overline{U}}, although the extension will no longer be necessarily a homeomorphism, but we will not prove this fact here.

Proof: We first prove continuous extension to the boundary. It suffices to show that for every point {\zeta} on the boundary of the unit circle, the diameters of the sets {\phi( D(0,1) \cap D( \zeta, r_n ) )} go to zero for some sequence of radii {r_n \rightarrow 0}.

First observe from the change of variables formula that the area of {U = \phi(D(0,1))} is given by {\int_{D(0,1)} |\phi'(z)|^2\ dx dy}, where {dx dy} denotes Lebesgue measure (or the area element). In particular, this integral is finite. Expanding in polar coordinates around {\zeta}, we conclude that

\displaystyle  \int_0^2 \left(\int_{0}^{2\pi} 1_{D(0,1)}(\zeta+re^{i\theta}) |\phi'( \zeta + r e^{i\theta} )|^2\ d \theta\right) r dr < \infty.

Since {\int_0^2 \frac{dr}{r}} diverges near {r=0}, we conclude from the pigeonhole principle that there exists a sequence of radii {0 < r_n < 2} decreasing to zero such that

\displaystyle  r_n^2 \int_{0}^{2\pi} 1_{D(0,1)}(\zeta+re^{i\theta}) |\phi'( \zeta + r_n e^{i\theta} )|^2\ d \theta \rightarrow 0

and hence by Cauchy-Schwarz

\displaystyle  r_n \int_{0}^{2\pi} 1_{D(0,1)}(\zeta+re^{i\theta}) |\phi'( \zeta + r_n e^{i\theta} )|\ d \theta \rightarrow 0

If we let {C_n} denote the circular arc {\{ \zeta + r_n e^{i\theta}: 0 \leq \theta \leq 2\pi \} \cap D(0,1)}, we conclude from this and the triangle inequality (and chain rule) that {\phi(C_n)} is a rectifiable curve with length going to zero as {n \rightarrow \infty}. Let {a_n, b_n} denote the endpoints of this curve. Clearly they lie in {\overline{U}}. If (say) {a_n} was in {U}, then as {\phi} is a homeomorphism from {D(0,1)} to {U}, {C_n} would have one endpoint in {D(0,1)} rather than {\partial D(0,1)}, which is absurd. Thus {a_n} lies in {\partial U}, and similarly for {b_n}. Since the length of {\phi(C_n)} goes to zero, the distance between {a_n} and {b_n} goes to zero. Since {\partial U} is a Jordan curve, it can be parameterised homeomorphically by {\partial D(0,1)}, and so by compactness we also see that the distance between the parameterisations of {a_n} and {b_n} in {\partial D(0,1)} must also go to zero, hence (by uniform continuity of the inverse parameterisation) {a_n} and {b_n} are connected along {\partial U} by an arc whose diameter goes to zero. Combining this arc with {\phi(C_n)}, we obtain a Jordan curve of diameter going to zero which separates {\phi(D(0,1) \cap D(\zeta, r_n))} from the rest of {U}. Sending {r} to infinity, we see that {\phi(D(0,1) \cap D(\zeta, r_n))} (which decreases with {n}) must eventually map in the interior of this curve rather than the exterior, and so the diameter goes to zero as claimed.

The above construction shows that {\phi} extends to a continuous map (which by abuse of notation we continue to call {\phi}) from {\overline{D(0,1)}} to {\overline{U}}, and the proof also shows that {\partial D(0,1)} maps to {\partial U}. As {\phi(\overline{D(0,1)})} is a compact subset of {\overline{U}} that contains {U}, it must surject onto {\overline{U}}. As both {\overline{D(0,1)}} and {\overline{U}} are compact Hausdorff spaces, we will now be done if we can show injectivity. The only way injectivity can fail is if there are two distinct points {\zeta,\omega} on {\partial D(0,1)} that map to the same point. Let {C} be the line segment connecting {\zeta} with {\omega}, then {\phi(C)} is a Jordan curve in {\overline{U}} that meets {\partial U} only at {\phi(\zeta) = \phi(\omega)}. {C} divides {\overline{D(0,1)}} into two regions; one of which must map to the interior of {\phi(C)}, which implies that there is an entire arc of {\partial D(0,1)} which maps to the single point {\phi(\zeta)=\phi(\omega)}. But then by the Schwarz reflection principle, {\phi} extends conformally across this arc and is constant in a non-isolated set, thus is constant everywhere by analytic continuation, which is absurd. This establishes the required injectivity. \Box

This has the following consequence. Define a Jordan quadrilateral to be the open region {Q} enclosed by a Jordan curve with four distinct marked points {p_1,p_2,p_3,p_4} on it in counterclockwise order, which we call the vertices of the quadrilateral. The arcs in {\partial Q} connecting {p_1} to {p_2} or {p_3} to {p_4} will be called the {a}-sides; the arcs connecting {p_2} to {p_3} or {p_4} to {p_1} will be called {b}-sides. (Thus for instance each cyclic permutation of the {p_1,p_2,p_3,p_4} vertices will swap the {a}-sides and {b}-sides, while keeping the interior region {Q} unchanged.) A key example of a Jordan quadrilateral are the (Euclidean) rectangles, in which the vertices {p_1,\dots,p_4} are the usual corners of the rectangle, traversed counterclockwise. The {a}-sides then are line segments of some length {a}, and the {b}-sides are line segments of some length {b} that are orthogonal to the {a}-sides. A vertex-preserving conformal map from one Jordan quadrilateral {Q} to another {Q'} will be a conformal map that extends to a homeomorphism from {\overline{Q}} to {\overline{Q'}} that maps the corners of {Q} to the respective corners of {Q'} (in particular, {a}-sides get mapped to {a}-sides, and similarly for {b}-sides).

Exercise 21 Let {Q} be a Jordan quadrilateral with vertices {p_1,p_2,p_3,p_4}.

  • (i) Show that there exists {r > 1} and a conformal map {\phi: Q \rightarrow \mathbf{H}} to the upper half-plane {\mathbf{H} := \{ z: \mathrm{Im} z > 0 \}} (viewed as a subset of the Riemann sphere) that extends continuously to a homeomorphism {\phi: \overline{Q} \rightarrow \overline{\mathbf{H}}} and which maps {p_1,p_2,p_3,p_4} to {-r, -1, 1, r} respectively. (Hint: first map {p_1,p_2,p_3,p_4} to increasing elements of the real line, then use the intermediate value theorem to enforce {\phi(p_1)+\phi(p_4) = \phi(p_2)+\phi(p_3)}.)
  • (ii) Show that there is a vertex-preserving conformal map {\psi: Q \rightarrow R} from {Q} to a rectangle {R} (Hint: use Schwarz-Christoffel mapping.)
  • (iii) Show that the rectangle {R} in part (ii) is unique up to affine transformations. (Hint: if one has a conformal map between rectangles that preserves the vertices, extend it via repeated use of the Schwarz reflection principle to an entire map.)

This allows for the following definition: the conformal modulus {\mathrm{mod}(Q)} (or modulus for short, also called module in older literature) of a Jordan quadrilateral with vertices {p_1,p_2,p_3,p_4} is the ratio {b/a}, where {a,b} are the lengths of the {a}-sides and {b}-sides of a rectangle {R} that is conformal to {Q} in a vertex-preserving vashion.. This is a number between {0} and {\infty}; each cyclic permutation of the vertices replaces the modulus with its reciprocal. It is clear from construction that the modulus of a Jordan quadrilateral is unaffected by vertex-preserving conformal transformations.

Now we define quasiconformal maps. Informally, conformal maps are homeomorphisms that map infinitesimal circles to infinitesimal circles; quasiconformal maps are homeomorphisms that map infinitesimal circles to curves that differ from an infinitesimal circle by “bounded distortion”. However, for the purpose of setting up the foundations of the theory, it is slightly more convenient to work with rectangles instead of circles (it is easier to partition rectangles into subrectangles than disks into subdisks). We therefore introduce

Definition 22 Let {K \geq 1}. An orientation-preserving homeomorphism {\phi: U \rightarrow V} between two domains {U,V} in {{\bf C}} is said to be {K}-quasiconformal if one has {\mathrm{mod}(\phi(Q)) \leq K \mathrm{mod}(Q)} for every Jordan quadrilateral {Q} in {U}. (In these notes, we do not consider orientation-reversing homeomorphisms to be quasiconformal.)

Note that by cyclically permuting the vertices of {Q}, we automatically also obtain the inequality

\displaystyle  \frac{1}{\mathrm{mod}(\phi(Q))} \leq K \frac{1}{\mathrm{mod}(Q)}

or equivalently

\displaystyle  \frac{1}{K} \mathrm{mod}(Q) \leq \mathrm{mod}(\phi(Q))

for any Jordan quadrilateral. Thus it is not possible to have any {K}-quasiconformal maps for {K<1} (excluding the degenerate case when {U,V} are empty), and a map is {1}-conformal if and only if it preserves the modulus. In particular, conformal maps are {1}-conformal; we will shortly establish that the converse claim is also true. It is also clear from the definition that the inverse of a {K}-quasiconformal map is also {K}-quasiconformal, and the composition of a {K}-quasiconformal map and a {K'}-quasiconformal map is a {KK'}-quasiconformal map.

It is helpful to have an alternate characterisation of the modulus that does not explicitly mention conformal mapping:

Proposition 23 (Alternate definition of modulus) Let {Q} be a Jordan quadrilateral with vertices {p_1,p_2,p_3,p_4}. Then {\mathrm{mod}(Q)} is the smallest quantity with the following property: for any Borel measurable {\rho: Q \rightarrow [0,+\infty)} one can find a curve {\gamma} in {Q} connecting one {a}-side of {Q} to another, and which is locally rectifiable away from endpoints, such that

\displaystyle  \left(\int_\gamma \rho(z)\ |dz|\right)^2 \leq \mathrm{mod}(Q) \int_Q \rho^2(z)\ dx dy

where {\int_\gamma |dz|} denotes integration using the length element of {\gamma} (not to be confused with the contour integral {\int_\gamma\ dz}).

The reciprocal of this notion of modulus generalises to the concept of extremal length, which we will not develop further here.

Proof: Observe from the change of variables formula that if {\phi: Q \rightarrow Q'} is a vertex-preserving conformal mapping between Jordan quadrilaterals {Q,Q'}, and {\gamma} is a locally rectifiable curve connecting one {a}-side of {Q} to another, then {\phi \circ \gamma} is a locally rectifiable curve connecting one {a}-side of {Q'} to another, with

\displaystyle  \int_{\phi \circ \gamma} \rho \circ \phi^{-1}(w) \ |dw| = \int_\phi \rho(z) |\phi'(z)|\ |dz|


\displaystyle  \int_{Q'} |\rho \circ \phi^{-1}(w)|^2\ dw \overline{dw} = \int_Q |\rho(z)|^2 |\phi'(z)|^2\ dx dy.

As a consequence, if the proposition holds for {Q} it also holds for {Q'}. Thus we may assume without loss of generality that {Q} is a rectangle, which we may normalise to be {\{ x+iy: 0 \leq y \leq 1; 0 \leq x \leq M \}} with vertices {i, 0, M, M+i}, so that the modulus is {M}. For any measurable {\rho: Q \rightarrow [0,+\infty)}, we have from Cauchy-Schwarz and Fubini’s theorem that

\displaystyle  \int_0^1 \left(\int_0^M \rho(x+iy)\ dx\right)^2 dy \leq M \int_0^1 \int_0^M \rho^2(x+iy)\ dx dy

\displaystyle  = M \int_Q \rho^2(z)\ dz \overline{dz}

and hence by the pigeonhole principle there exists {y} such that

\displaystyle  \left(\int_0^M \rho(x+iy)\ dx\right)^2 \leq M \int_Q \rho^2(z)\ dz \overline{dz}.

On the other hand, if we set {\rho=1}, then {\int_Q \rho^2(z)\ dz \overline{dz} = M}, and for any curve {\gamma} connecting the {a}-side from {0} to {i} to the {a}-side from {M} to {M+i}, we have

\displaystyle  \int_\gamma \rho\ |dz| \geq \left| \int_\gamma \rho\ dx \right| = M.

Thus {M} is the best constant with the required property, proving the claim. \Box

Here are some quick and useful consequences of this characterisation:

Note: it would be more logical to reverse the order of the next two exercises, since Exercise 25 can be used to prove Exercise 24. I will do this at the conclusion of this course.

Exercise 24 (Superadditivity)

  • (i) If {Q_1, Q_2} are disjoint Jordan quadrilaterals that share a common {a}-side, and which can be glued together along this side to form a new Jordan quadrilateral {Q_1 \cup Q_2}, show that {\mathrm{mod}(Q_1 \cup Q_2) \geq \mathrm{mod}(Q_1) + \mathrm{mod}(Q_2)}. If equality occurs, show that after conformally mapping {Q_1 \cup Q_2} to a rectangle (in a vertex preserving fashion), {Q_1}, {Q_2} are mapped to subrectangles (formed by cutting the original parallel to the {a}-side).
  • (ii) If {Q_1, Q_2} are disjoint Jordan quadrilaterals that share a common {b}-side, and which can be glued together along this side to form a new Jordan quadrilateral {Q_1 \cup Q_2}, show that {\frac{1}{\mathrm{mod}(Q_1 \cup Q_2)} \geq \frac{1}{\mathrm{mod}(Q_1)} + \frac{1}{\mathrm{mod}(Q_2)}}. If equality occurs, show that after conformally mapping {Q_1 \cup Q_2} to a rectangle (in a vertex preserving fashion), {Q_1}, {Q_2} are mapped to subrectangles (formed by cutting the original parallel to the {b}-side).

Exercise 25 (Rengel’s inequality) Let {Q} be a Jordan quadrilateral of area {A}, let {b} be the shortest (Euclidean) distance between a point on one {a}-side and a point on the other {a}-side, and similarly let {a} be the shortest (Euclidean) distance between a point on one {b}-side and a point on the other {b}-side. Show that

\displaystyle  \frac{b^2}{A} \leq \mathrm{mod}(Q) \leq \frac{A}{a^2}

and that equality in either case occurs if and only if {Q} is a rectangle.

Exercise 26 (Continuity from below) Suppose {Q_n} is a sequence of Jordan quadrilaterals which converge to another Jordan quadrilateral {Q}, in the sense that the vertices of {Q_n} converge to their respective counterparts in {Q}, each {a}-side in {Q_n} converges (in the Hausdorff sense) to the {a}-side of {Q}, and the similarly for {b}-sides. Suppose also that {Q_n \subset Q} for all {n}. Show that {\mathrm{mod}(Q_n)} converges to {\mathrm{mod}(Q)}. (Hint: map {Q} to a rectangle and use Rengel’s inequality.)

Proposition 27 (Local quasiconformality implies quasiconformality) Let {K \geq 1}, and let {\phi: U \rightarrow V} be an orientation-preserving homeomorphism between complex domains {U,V} which is locally {K}-quasiconformal in the sense that for every {z_0 \in U} there is a neighbourhood {U_{z_0}} of {z_0} in {U} such that {\phi} is {K}-quasiconformal from {U_{z_0}} to {\phi(U_{z_0})}. Then {\phi} is {K}-quasiconformal.

Proof: We need to show that {\mathrm{mod}(\phi(Q)) \leq K \mathrm{mod}(Q)} for any Jordan quadrilateral {Q} in {U}. The hypothesis gives this claim for all quadrilaterals in the sufficiently small neighbourhood of any point in {U}. For any natural number {n}, we can subdivide {\phi(Q)} into {n} quadrilaterals {\phi(Q_1),\dots,\phi(Q_n)} with modulus {\frac{1}{n} \mathrm{mod}(\phi(Q))} with adjacent {a}-sides, by first conformally mapping {\phi(Q)} to a rectangle and then doing an equally spaced vertical subdivision. Similarly, each quadrilateral {Q_i} can be subdivided into {n} quadrilaterals {Q_{i,1},\dots,Q_{i,n}} of modulus {n \mathrm{mod}(Q_i)} by mapping {Q_i} to a rectangle and performing horizontal subdivision. By the local {K}-quasiconformality of {\phi}, we will have

\displaystyle  \mathrm{mod}( \phi(Q_{i,j}) ) \leq K \mathrm{mod}( Q_{i,j} ) = K n \mathrm{mod}(Q_i)

for all {i,j=1,\dots,n}, if {n} is large enough. By superadditivity this implies that

\displaystyle  \mathrm{mod}( \phi(Q_i) ) \leq K\mathrm{mod}(Q_i)

for each {i}, and hence

\displaystyle  \mathrm{mod}(Q_i) \geq \frac{1}{Kn} \mathrm{mod}(\phi(Q)).

Applying superadditivity again we obtain

\displaystyle  \mathrm{mod}(Q) \geq \frac{1}{K} \mathrm{mod}(\phi(Q)).

giving the claim. \Box

We can now reverse the implication that conformal maps are {1}-conformal:

Proposition 28 Every {1}-conformal map {\phi: U \rightarrow V} is conformal.

Proof: By covering {U} by quadrilaterals we may assume without loss of generality that {U} (and hence also {\phi(U)=V}) is a Jordan quadrilateral; by composing on left and right with conformal maps we may assume that {U} and {V} are rectangles. As {\phi} is {1}-conformal, the rectangles have the same modulus, so after a further affine transformation we may assume that {U=V} is the rectangle with vertices {i, 0, M, M+i} for some modulus {M}. If one subdivides {U} into two rectangles along an intermediate vertical line segment connecting say {x} to {x+i} for some {0 < x < M}, the moduli of these rectangles are {x} and {M-x}. Applying the {1}-conformal map and the converse portion of Exercise 24, we conclude that these rectangles must be preserved by {\phi}, thus {\phi} preserves the {x} coordinate. Similarly {\phi} preserves the {y} coordinate, and is therefore the identity map, which is of course conformal. \Box

Next, we can give a simple criterion for quasiconformality in the continuously differentiable case:

Theorem 29 Let {K \geq 1}, and let {\phi: U \rightarrow V} be an orientation-preserving diffeomorphism (a continuously (real) differentiable homeomorphism whose derivative is always nondegenerate) between complex domains {U,V}. Then the following are equivalent:

  • (i) {\phi} is {K}-quasiconformal.
  • (ii) For any point {z_0 \in U} and phases {v,w \in S^1 := \{ z \in {\bf C}: |z|=1\}}, one has

    \displaystyle  |D_v \phi(z_0)| \leq K|D_w \phi(z_0)|

    where {D_v \phi(z_0) := \frac{\partial}{\partial t} \phi(z_0+tv)|_{t=0}} denotes the directional derivative.

Proof: Let us first show that (ii) implies (i). Let {Q} be a Jordan quadrilateral in {U}; we have to show that {\mathrm{mod}(\phi(Q)) \leq K \mathrm{mod}(Q)}. From the chain rule one can check that condition (ii) is unchanged by composing {\phi} with conformal maps on the left or right, so we may assume without loss of generality that {Q} and {\phi(Q)} are rectangles; in fact we may normalise {Q} to have vertices {i, 0, T, T+i} and {\phi(Q)} to have vertices {i, 0, T', T'+i} where {T = \mathrm{mod}(Q)} and {T' = \mathrm{mod}(Q')}. From the change of variables formula (and the singular value decomposition), followed by Fubini’s theorem and Cauchy-Schwarz, we have

\displaystyle  T' = \int_{\phi(Q)}\ dx dy

\displaystyle  = \int_Q \mathrm{det}(D\phi)(z)\ dx dy

\displaystyle  = \int_Q \max_{v \in S^1} |D_v \phi(z)| \min_{w \in S^1} |D_w \phi(z)|\ dx dy

\displaystyle  \geq \int_Q \frac{1}{K} \left|\frac{\partial}{\partial x} \phi(z)\right|^2\ dx dy

\displaystyle = \frac{1}{K} \int_0^1 \int_0^T \left|\frac{\partial}{\partial x} \phi(x+iy)\right|^2\ dx dy

\displaystyle  \geq \frac{1}{K T} \int_0^1 \left|\int_0^T \frac{\partial}{\partial x} \phi(x+iy)\right|^2\ dx dy

\displaystyle  = \frac{1}{K T} \int_0^1 (T')^2\ dy

3 and hence {T' \leq K T}, giving the claim.

Now suppose that (ii) failed, then by the singular value decomposition we can find {z_0 \in U} and a phase {v \in S^1} such that

\displaystyle  D_{iv} \phi(z_0) = i \lambda D_v \phi(z_0)

for some real {\lambda} with {\lambda > K}. After translations and rotations we may normalise so that

\displaystyle  \phi(0) = 0; \frac{\partial}{\partial x} \phi(0) = 1; \frac{\partial}{\partial y} \phi(0) = i\lambda.

But then from Rengel’s inequality and Taylor expansion one sees that {\phi} will map a unit square with vertices {0, -\varepsilon, -\varepsilon+i\varepsilon, i\varepsilon} to a quadrilateral of modulus converging to {\lambda} as {\varepsilon \rightarrow 0}, contradicting (i). \Box

Exercise 30 Show that the conditions (i), (ii) in the above theorem are also equivalent to the bound

\displaystyle  \left|\frac{\partial}{\partial \overline{z}} \phi(z_0)\right| \leq \frac{K-1}{K+1} \left|\frac{\partial }{\partial z} \phi(z_0)\right|

for all {z_0 \in U}, where

\displaystyle  \frac{\partial }{\partial z} := \frac{1}{2} ( \frac{\partial }{\partial x} - i \frac{\partial }{\partial y}); \quad \frac{\partial }{\partial \overline{z}} := \frac{1}{2} ( \frac{\partial }{\partial x} + i \frac{\partial }{\partial y})

are the Wirtinger derivatives.

We now prove a technical regularity result on quasiconformal maps.

Proposition 31 (Absolute continuity on lines) Let {\phi: U \rightarrow V} be a {K}-quasiconformal map between two complex domains {U,V} for some {K}. Suppose that {U} contains the closed rectangle with endpoints {0, i, T+i, T}. Then for almost every {0 \leq y \leq 1}, the map {x \mapsto \phi(x+iy)} is absolutely continuous on {[0,R]}.

Proof: For each {y}, let {A(y)} denote the area of the image {\{ \phi(x+iy'): 0 \leq x \leq T; 0 \leq y \leq y'\}} of the rectangle with endpoints {0, iy, T+iy, T}. This is a bounded monotone function on {[0,1]} and is hence differentiable almost everywhere. It will thus suffice to show that the map {x \mapsto \phi(x+iy)} is absolutely continuous on {[0,R]} whenever {y \in (0,1)} is a point of differentiability of {A}.

Let {\varepsilon > 0}, and let {[x_1,x'_1],\dots,[x_m,x'_m]} be disjoint intervals in {[0,R]} of total length {\sum_{j=1}^m x'_j-x_j \leq \varepsilon}. To show absolute continuity, we need a bound on {\sum_{j=1}^m |\phi(x'_j) - \phi(x_j)|} that goes to zero as {\varepsilon \rightarrow 0} uniformly in the choice of intervals. Let {\delta>0} be a small number (that can depend on the intervals), and for each {j=1,\dots,m} let {R_j} be the rectangle with vertices {x_j+i(y_j+\delta)}, {x_j+iy_j}, {x'_y+iy_j}, {x'_j+i(y_j+\delta)} This rectangle has modulus {(x'_j-x_j)/\delta}, and hence {\phi(R_j)} has modulus at most {K (x'_j-x_j)/\delta}. On the other hand, by Rengel’s inequality this modulus is at least {|\phi(x'_j)-\phi(x_j)-o(1)|^2 / \mathrm{Area}(\phi(R_j))}, where {o(1)} is a quantity that goes to zero as {\delta \rightarrow 0} (holding the intervals fixed). We conclude that

\displaystyle  |\phi(x'_j)-\phi(x_j)|^2 \leq \frac{K}{\delta} (x'_j-x_j) \mathrm{Area}(\phi(R_j)) + o(1).

On the other hand, we have

\displaystyle  \sum_{j=1}^m \mathrm{Area}(\phi(R_j)) \leq A(y+\delta) - A(y) = (A'(y)+o(1)) \delta.

By Cauchy-Schwarz, we thus have

\displaystyle  (\sum_{j=1}^m |\phi(x'_j)-\phi(x_j)|)^2 \leq K A'(y) \sum_{j=1}^m (x'_j-x_j) + o(1);

sending {\delta \rightarrow 0}, we conclude

\displaystyle  \sum_{j=1}^m |\phi(x'_j)-\phi(x_j)| \leq K^{1/2} A'(y)^{1/2} \varepsilon^{1/2}

giving the claim. \Box

Exercise 32 Let {\phi: U \rightarrow V} be a {K}-quasiconformal map between two complex domains {U,V} for some {K}. Suppose that there is a closed set {S \subset {\bf C}} of Lebesgue measure zero such that {\phi} is conformal on {U \backslash S}. Show that {\phi} is {1}-conformal (and hence conformal, by Proposition 28). (Hint: Arguing as in the proof of Theorem 29, it suffices to show that of {\phi} maps the rectangle with endpoints {0, i, T+i, T} to the rectangle with endpoints {0, i, T'+i, T'}, then {T' \leq T}. Repeat the proof of that theorem, using the absolute continuity of lines at a crucial juncture to justify using the fundamental theorem of calculus.)

Recall Hurwitz’s theorem that the locally uniform limit of conformal maps is either conformal or constant. It turns out there is a similar result for quasiconformal maps. We will just prove a weak version of the result (see Theorem II.5.5 of Lehto-Virtanen for the full statement):

Theorem 33 Let {K \geq 1}, and let {\phi_n: U \rightarrow V_n} be a sequence of {K}-quasiconformal maps that converge locally uniformly to an orientation-preserving homeomorphism {\phi: U \rightarrow V}. Then {\phi} is also {K}-quasiconformal.

It is important for this theorem that we do not insist that quasiconformal maps are necessarily differentiable. Indeed for applications to circle packing we will be working with maps that are only piecewise smooth, or possibly even worse, even though at the end of the day we will recover a smooth conformal map in the limit.

Proof: Let {Q} be a Jordan quadrilateral in {U}. We need to show that {\mathrm{mod}(\phi(Q)) \leq K \mathrm{mod}(Q)}. By restricting {U} we may assume {U=Q}. By composing {\phi, \phi_n} with a conformal map we may assume that {Q} is a rectangle. We can write {Q} as the increasing limit of rectangles {Q_m} of the same modulus, then for any {n,m} we have {\mathrm{mod}(\phi_n(Q_m)) \leq K \mathrm{mod}(Q)}. By choosing {n_m} going to infinity sufficiently rapidly, {\phi_{n_m}(Q_m)} stays inside {\phi(Q)} and converges to {\phi(Q)} in the sense of Exercise 26, and the claim then follows from that exercise. \Box

Another basic property of conformal mappings (a consequence of Morera’s theorem) is that they can be glued along a common edge as long as the combined map is also a homeomorphism; this fact underlies for instance the Schwarz reflection principle. We have a quasiconformal analogue:

Theorem 34 Let {K \geq 1}, and let {\phi: U \rightarrow V} be an orientation-preserving homeomorphism. Let {C} be a real analytic (and topologically closed) contour that lies in {U} except possibly at the endpoints. If {\phi: U \backslash C \rightarrow \phi(U \backslash C)} is {K}-quasiconformal, then {\phi: U \rightarrow V} is {K}-quasiconformal.

We will generally apply this theorem in the case when {C} disconnects {U} into two components, in which case {\phi} can be viewed as the gluing of the restrictions of this map to the two components.

Proof: As in the proof of the previous theorem, we may take {U} to be a rectangle {Q}, and it suffices to show that {\mathrm{mod}(\phi(Q)) \leq K \mathrm{mod}(Q)}. We may normalise {Q} to have vertices {i, 0, M, M+i} where {M = \mathrm{mod}(Q)}, and similarly normalise {\phi(Q)} to be a rectangle of vertices {i, 0, M', M'+i}, so we now need to show that {M' \leq KM}. The real analytic contour {C} meets {Q} in a finite number of curves, which can be broken up further into a finite horizontal line segments and graphs {\{ f_j(y) + iy: y \in I_j\}} for various closed intervals {I_j \subset [0,1]} and real analytic {f_j: I_j \rightarrow [0,M]}. For any {\varepsilon>0}, we can then use the uniform continuity of the {f_j} to subdivide {Q} into a finite number of rectangles {Q_k= \{ x+iy: y \in J_k, 0 \leq x \leq M \}} where on each such rectangle, {C} meets the interior of {Q_k} in a bounded number of graphs {\{ f_j(y) +iy: y \in J_k\}} whose horizontal variation is {O(\varepsilon)}. This subdivides {Q_k} into a bounded number of Jordan quadrilaterals {Q_{k,j}}. If we let {s_{k,j}} denote the distance between the {a}-sides of {\phi(Q_{k,j})}, then by uniform continuity of {\phi} and the triangle inequality we have

\displaystyle  \sum_j s_{k,j} \geq (1+o(1)) M'

as {\varepsilon \rightarrow 0}. By Rengel’s inequality, we have

\displaystyle  \mathrm{mod}( \phi(Q_{k,j}) ) \geq \frac{s_{k,j}^2}{\mathrm{Area}(\phi(Q_{k,j}))};

since {\mathrm{mod}( \phi(Q_{k,j}) ) \leq K \mathrm{mod}( Q_{k,j} )}, we conclude using superadditivity that

\displaystyle  \mathrm{mod}(Q_j) \geq \frac{1}{K} \sum_k \frac{s_{k,j}^2}{\mathrm{Area}(\phi(Q_{k,j}))}

and hence by Cauchy-Schwarz

\displaystyle  \mathrm{mod}(Q_j) \geq \frac{1}{K} \frac{(\sum_k s_{k,j})^2}{\sum_k \mathrm{Area}(\phi(Q_{k,j}))}

and thus

\displaystyle  \frac{1}{\mathrm{mod}(Q_j)} \leq (1+o(1)) K \frac{\mathrm{Area}(\phi(Q_j))}{(M')^2}.

Summing in {j}, we obtain

\displaystyle  \frac{1}{M} \leq (1+o(1)) K \frac{M'}{(M')^2}

giving the desired bound {M' \leq K M} after sending {\varepsilon \rightarrow 0}. \Box

It will be convenient to study analogues of the modulus when quadrilaterals are replaced by generalisations of annuli. We define a ring domain to be a region bounded between two Jordan curves {C_1, C_2}, where {C_1} (the inner boundary) is contained inside the interior of {C_2} (the outer boundary). For instance, the annulus {\{ z: r < |z-z_0| < R \}} is a ring domain for any {z_0 \in {\bf C}} and {0 < r < R < \infty}. In the spirit of Proposition 23, define the modulus {\mathrm{mod}(A)} of a ring domain {A} to be the supremum of all the quantities {M} with the following property: for any Borel measurable {\rho: A \rightarrow [0,+\infty)} one can find a rectifable curve {\gamma} in {A} winding once around the inner boundary {C_1}, such that

\displaystyle  \left(\int_\gamma \rho(z)\ |dz|\right)^2 \leq \frac{2\pi}{M} \int_A \rho^2(z)\ dz \overline{dz}.

We record some basic properties of this modulus:

Exercise 35 Show that every ring domain is conformal to an annulus. (There are several ways to proceed here. One is to start by using Perron’s method to construct a harmonic function that is {1} on one of the boundaries of the annulus and {0} on the other. Another is to apply a logarithm map to transform the annulus to a simply connected domain with a “parabolic” group of discrete translation symmetries, use the Riemann mapping theorem to map this to a disc, and use the uniqueness aspect of the Riemann mapping theorem to figure out what happens to the symmetry.)

Exercise 36

As a basic application of this concept we have the fact that the complex plane cannot be quasiconformal to any proper subset:

Proposition 37 Let {\phi: {\bf C} \rightarrow V} be a {K}-quasiconformal map for some {K \geq 1}; then {V = {\bf C}}.

Proof: As {V} is homeomorphic to {{\bf C}}, it is simply connected. Thus, if we assume for contradiction that {V \neq {\bf C}}, then by the Riemann mapping theorem {V} is conformal to {D(0,1)}, so we may assume without loss of generality that {V = D(0,1)}.

By Exercise 36(i), the moduli {\log R} of the annuli {\{ z: 1 \leq |z| \leq R \}} goes to infinity as {R \rightarrow \infty}, and hence (by Exercise 36(ii) (applied to {\phi^{-1}}) the moduli of the ring domains {\{ \phi(z): 1 \leq |z| \leq R \}} must also go to infinity. However, as the inner boundary of this domain is fixed and the outer one is bounded, all these ring domains can be contained inside a common annulus, contradicting Exercise 36(iii). \Box

For some further applications of the modulus of ring domains, we need the following result of Grötzsch:

Theorem 38 (Grötzsch modulus theorem) Let {0 < r < 1}, and let {A} be the ring domain formed from {D(0,1)} by deleting the line segment from {0} to {r}. [Technically, {A} is not quite a ring domain as defined above, but as mentioned in Exercise 36(ii), the definition of modulus remains valid in this case.] Let {A'} be another ring domain contained in {D(0,1)} whose inner boundary encloses both {0} and {r}. Then {\mathrm{mod}(A') \leq \mathrm{mod}(A)}.

Proof: Let {R := \exp(\mathrm{mod}(A))}, then by Exercise 35 we can find a conformal map {f} from {A} to the annulus {\{ z: 1 \leq |z| \leq R \}}. As {A} is symmetric around the real axis, and the only conformal automorphisms of the annulus that preserve the inner and outer boundaries are rotations (as can be seen for instance by using the Schwarz reflection principle repeatedly to extend such automorphisms to an entire function of linear growth), we may assume that {f} obeys the symmetry {f(\overline{z}) = \overline{f(z)}}. Let {\rho: A \rightarrow {\bf R}^+} be the function {\rho := |f'/f|}, then {\rho} is symmetric around the real axis. One can view {\rho} as a measurable function on {A'}; from the change of variables formula we have

\displaystyle  \int_{A'} \rho^2\ dz d\overline{z} = \int_{1 \leq |z| \leq R} \frac{1}{|z|^2}\ dz d\overline{z} = 2\pi \log R,

so in particular {\rho} is square-integrable. Our task is to show that {\mathrm{mod}(A') \leq \log R}; by the definition of modulus, it suffices to show that

\displaystyle  (\int_\gamma \rho\ d|z|)^2 \leq \frac{2\pi}{\log R} \int_{A'} \rho^2\ dz d\overline{z}

for any rectifiable curve {\gamma} that goes once around {A'}, and thus once around {0} and {r} in {D(0,1)}. By a limiting argument we may assume that {\gamma} is polygonal. By repeatedly reflecting around the real axis whenever {\gamma} crosses the line segment between {0} and {r}, we may assume that {\gamma} does not actually cross this segment, and then by perturbation we may assume it is contained in {A}. But then by change of variables we have

\displaystyle  \int_\gamma \rho\ d|z| = \int_{f(\gamma)} \frac{d|z|}{|z|} \leq |\int_{f(\gamma)} \frac{dz}{z}| = 2\pi

by the Cauchy integral formula, and the claim follows. \Box

Exercise 39 Let {\phi_n: U \rightarrow V_n} be a sequence of {K}-quasiconformal maps for some {K \geq 1}, such that all the {V_n} are uniformly bounded. Show that the {\phi_n} are a normal family, that is to say every sequence in {\phi_n} contains a subsequence that converges locally uniformly. (Hint: use an argument similar to that in the proof of Proposition 37, combined with Theorem 38, to establish some equicontinuity of the {\phi_n}.)

There are many further basic properties of the conformal modulus for both quadrilaterals and annuli; we refer the interested reader to Lehto-Virtanen for details.

— 3. Rigidity of the hexagonal circle packing —

We return now to circle packings. In order to understand finite circle packings, it is convenient (in order to use some limiting arguments) to consider some infinite circle packings. A basic example of an infinite circle packing is the regular hexagonal circle packing

\displaystyle  {\mathcal H} := ( z_0 + S^1 )_{z_0 \in \Gamma}

where {\Gamma} is the hexagonal lattice

\displaystyle  \Gamma := \{ 2 n + 2 e^{2\pi i/3} m: n,m \in {\bf Z} \}

and {z_0 + S^1 := \{ z_0 + e^{i \theta}: \theta \in {\bf R} \}} is the unit circle centred at {z_0}. This is clearly an (infinite) circle packing, with two circles {z_0+S^1, z_1+S^1} in this packing (externally) tangent if and only if they differ by twice a sixth root of unity. Between any three mutually tangent circles in this packing is an open region that we will call an interstice. It is inscribed in a dual circle that meets the three original circles orthogonally and can be computed to have radius {1/\sqrt{3}}; the interstice can then be viewed as a hyperbolic triangle in this dual circle in which all three sides have infinite length.

Next we need two simple geometric lemmas, due to Rodin and Sullivan.

Lemma 40 (Ring lemma) Let {C} be a circle that is externally tangent to a chain {C_1,\dots,C_n} of circles with disjoint interiors, with each {C_i} externally tangent to {C_{i+1}} (with the convention {C_{n+1}=C_1}). Then there is a constant {c_n} depending only on {n}, such that the radii of each of the {C_i} is at least {c_n} times the radius of {C}.

Proof: Without loss of generality we may assume that {C} has radius {1} and that the radius {r_1} of {C_1} is maximal among the radii {r_i} of the {C_i}. As the polygon connecting the centers of the {C_i} has to contain {C}, we see that {r_1 \gg_n 1}. This forces {r_2 \gg_n 1}, for if {r_2} was too small then {C_2} would be so deep in the cuspidal region between {C} and {C_1} that it would not be possible for {C_3, C_4,\dots C_n} to escape this cusp and go around {C_1}. A similar argument then gives {r_3 \gg_n 1}, and so forth, giving the claim. \Box

Lemma 41 (Length-area lemma) Let {n \geq 1}, and let {{\mathcal H}_n} consist of those circles in {{\mathcal H}} that can be connected to the circle {0 + S^1} by a path of length at most {n} (going through consecutively tangent circles in {{\mathcal H}}). Let {{\mathcal C}_n} be circle packing with the same nerve as {{\mathcal H}_n} that is contained in a disk of radius {R}. Then the circle {C_0} in {{\mathcal C}_n} associated to the circle {0+S^1} in {{\mathcal H}_n} has radius {O(\frac{R}{\log^{1/2} n})}.

The point of this bound is that when {R} is bounded and {n \rightarrow \infty}, the radius of {C_0} is forced to go to zero.

Proof: We can surround {0+S^1} by {n} disjoint chains {(C_{j,i})_{i=1}^{6j}, j=1,\dots,n} of consecutively tangent circles {z_{j,i}+S^1}, {i=1,\dots, 6j} in {{\mathcal H}_n}. Each circle is associated to a corresponding circle in {{\mathcal C}} of some radius {r_{j,i}}. The total area {\sum_{j=1}^n \sum_{i=1}^{6j} \pi r_{ij}^2} of these circles is at most the area {\pi R^2} of the disk of radius {R}. Since {\sum_{j=1}^n \frac{1}{n} \gg \log n}, this implies from the pigeonhole principle that there exists {j} for which

\displaystyle  \sum_{i=1}^{6j} \pi r_{ij}^2 \ll \frac{R^2}{j \log n}

and hence by Cauchy-Schwarz

\displaystyle  \sum_{i=1}^{6j} r_{ij} \ll \frac{R}{\log^{1/2} n}.

Connecting the centers of these circles, we obtain a polygonal path of length {O( \frac{R}{\log^{1/2} n})} that goes around {C_0}, and the claim follows. \Box

For every circle {z_0 + S^1} in the circle packing {{\mathcal H}}, we can form the inversion map {\iota_{z_0}: {\bf C} \cup \{\infty\} \rightarrow {\bf C} \cup \{\infty\}} across this circle on the Riemann sphere, defined by setting

\displaystyle  \iota_{z_0}( z_0 + re^{i\theta} ) := z_0 + \frac{1}{r} e^{i\theta}

for {0 < r < \infty} and {\theta \in {\bf R}}, with the convention that {\iota_{z_0}} maps {z_0} to {\infty} and vice versa. These are conjugates of Möbius transformations; they preserve the circle {z_0+S^1} and swap the interior with the exterior. Let {G} be the group of transformations of {{\bf C} \cup \{\infty\}} generated by these inversions {\iota_{z_0}}; this is essentially a Schottky group (except for the fact that we are are allowing for conjugate Möbius transformations in addition to ordinary Möbius transformations). Let {I} denote the union of all the interstices in {{\mathcal H}}, and let {GI := \bigcup_{g \in G} g(I)} be the union of the images of the interstitial regions {I} under all of these transformations. We have the following basic fact:

Proposition 42 {{\bf C} \backslash GI} has Lebesgue measure zero.

Proof: (Sketch) I thank Mario Bonk for this argument. Let {G {\mathcal H}} denote all the circles formed by applying an element of {G} to the circles in {{\mathcal H}}. If {z} lies in {{\bf C} \backslash GI}, then it lies inside one of the circles in {{\mathcal H}}, and then after inverting through that circle it lies in another circle in {{\mathcal H}}, and so forth; undoing the inversions, we conclude that {z} lies in infinite number of nested circles. Let {C} be one of these circles. {GI} contains a union of six interstices bounded by {C} and a cycle of six circles internally tangent to {C} and consecutively externally tangent to each other. Applying the same argument used to establish the ring lemma (Lemma 40), we see that the six internal circles have radii comparable to that of {C}, and hence {GI} has density {\gg 1} in the disk enclosed by {C}, which also contains {z}. The ring lemma also shows that the radius of each circle in the nested sequence is at most {1-c} times the one enclosing it for some absolute constant {c>0}, so in particular the disks shrink to zero in size. Thus {z} cannot be a point of density of {{\bf C} \backslash GI}, and hence by the Lebesgue density theorem this set has measure zero. \Box

We also need another simple geometric observation:

Exercise 43 Let {C_1,C_2,C_3} be mutually externally tangent circles, and let {C'_1, C'_2, C'_3} be another triple of mutually external circles, with the same orientation (e.g. {C_1,C_2,C_3} and {C'_1,C'_2,C'_3} both go counterclockwise around their interstitial region). Show that there exists a Möbius transformation {\phi} that maps each {C_i} to {C'_i} and which maps the interstice of {C_1,C_2,C_3} conformally onto the interstice of {C'_1,C'_2, C'_3}.

Now we can give a rigidity result for the hexagonal circle packing, somewhat in the spirit of Theorem 4 (though it does not immediately follow from that theorem), and also due to Rodin and Sullivan:

Proposition 44 (Rigidity of infinite hexagonal packing) Let {{\mathcal C}} be an infinite circle packing in {{\bf C}} with the same nerve as the hexagonal circle packing {{\mathcal H}}. Then {{\mathcal C}} is in fact equal to the hexagonl circle packing up to affine transformations and reflections.

Proof: By applying a reflection we may assume that {{\mathcal C}} and {{\mathcal H}} have the same orientation. For each interstitial region {I_j} of {{\mathcal H}} there is an associated interstitial region {I'_j} of {{\mathcal C}}, and by Exercise 43 there is a Möbius transformation {T_j: I_j \rightarrow I'_j}. These can be glued together to form a map {\phi_0} that is initially defined (and conformal) on the interstitial regions {I =\bigcup_j I_j}; we would like to extend it to the entire complex plane by defining it also inside the circles {z_j + S^1}.

Now consider a circle {z_j+S^1} in {{\mathcal H}}. It is bounded by six interstitial regions {I_1,\dots,I_6}, which map to six interstitial regions {I'_1,\dots,I'_6} that lie between the circle {C_0} corresponding to {z_j+S^1} and six tangent circles {C_1,\dots,C_6}. By the ring lemma, all of the circles {C_1,\dots,C_6} have radii comparable to the radius {r_j} of {C_0}. As a consequence, the map {\phi_0}, which is defined (and piecewise Möbius) on the boundary of {z_j + S^1} as a map to the boundary of {C_0}, has derivative comparable in magnitude to {r_j} also. By extending this map radially (in the sense of defining {\phi(z_j + r e^{i\theta}) := w_j + r r_j (\phi(z_j + e^{i\theta})-w_j)} for {0 < r < 1} and {\theta \in {\bf R}}, where {w_j} is the centre of {C_0}, we see from Theorem 29 that we can extend {\phi_0} to be {K}-quasiconformal in the interior of {z_j+S^1} except possibly at {z_j} for some {K=O(1)}, and to a homeomorphism from {{\bf C}} to the region {\phi_0({\bf C})} consisting of the union of the disks in {{\mathcal C}} and their interstitial regions. By many applications of Theorem 34, {\phi_0} is now {K}-quasiconformal on all of {{\bf C}}, and conformal in the interstitial regions {I}. By Proposition 37, {\phi_0} surjects onto {{\bf C}}, thus the circle packing {{\mathcal C}} and all of its interstitial regions cover the entire complex plane.

Next, we use a version of the Schwarz reflection principle to replace {\phi_0} by another {K}-quasiconformal map that is conformal on a larger region than {I}. Namely, pick a circle {z_j+S^1} in {{\mathcal H}}, and let {C_0} be the corresponding circle in {{\mathcal C}}. Let {\iota_j} and {\iota'_j} be the inversions across {z_j+S^1} and {C_0} respectively. Note that {\phi_0} maps the circle {z_j+S^1} to {C_0}, with the interior mapping to the interior and exterior mapping to the exterior. We can then define a modified map {\phi_1} by setting {\phi_1(z)} equal to {\phi_0(z)} on or outside {z_j+S_1}, and {\phi_1(z)} equal to {\iota'_j \circ \phi_0 \circ \iota_j(z)} inside {z_j+S_1} (with the convention that {\phi_0} maps {\infty} to {\infty}). This is still an orientation-preserving function {{\bf C}}; by Theorem 34 it is still {K}-quasiconformal. It remains conformal on the interstitial region {I}, but is now also conformal on the additional interstitial region {\iota_j(I)}. Repeating this construction one can find a sequence {\phi_n:{\bf C} \rightarrow {\bf C}} of {K}-quasiconformal maps that map each circle {z_j+S^1} to their counterparts {C_0}, and which are conformal on a sequence {I_n} of sets that increase up to {GI}. By Exercise 39, the restriction of {\phi_n} to any compact set forms a normal family (the fact that the circles {z_j+S^1} map to the circles {C_0} will give the required uniform boundedness for topological reasons), and hence (by the usual diagonalisation argument) the {\phi_n} themselves are a normal family; similarly for {\phi_n^{-1}}. Thus, by passing to a subsequence, we may assume that the {\phi_n} converge locally uniformly to a limit {\phi}, and that {\phi_n^{-1}} also converge locally uniformly to a limit which must then invert {\phi}. Thus {\phi} is a homeomorphism, and thus {K}-quasiconformal by Theorem 33. It is conformal on {GI}, and hence by Proposition 32 it is conformal. But the only conformal maps of the complex plane are the affine maps (see Proposition 15 of this previous blog post), and hence {{\mathcal C}} is an affine copy of {{\mathcal H}} as required. \Box

By a standard limiting argument, the perfect rigidity of the infinite circle packing can be used to give approximate rigidity of finite circle packings:

Corollary 45 (Approximate rigidity of finite hexagonal packings) Let {\varepsilon>0}, and suppose that {n} is sufficiently large depending on {\varepsilon}. Let {{\mathcal H}_n} and {{\mathcal C}_n} be as in Lemma 41. Let {r_0} be the radius of the circle {C_1} in {{\mathcal C}_n} associated to {0+S_1}, and let {r_1} be the radius of an adjacent circle {C_1}. Then {1-\varepsilon \leq \frac{r_1}{r_0} \leq 1+\varepsilon}.

Proof: We may normalise {r_0=1} and {C_0=S^1}. Suppose for contradiction that the claim failed, then one can find a sequence {n} tending to infinity, and circle packings {{\mathcal C}_n} with nerve {{\mathcal H}_n} with {C_0 = C_{0,n} = S^1}, such that the radius {r_{1,n}} of the adjacent circle {C_1 = C_{1,n}} stays bounded away from {1}. By many applications of the ring lemma, for each circle {z + S^1} of {{\mathcal H}}, the corresponding circle {C_{z,n}} in {{\mathcal C}_n} has radius bounded above and below by zero. Passing to a subsequence using Bolzano-Weierstrass and using the Arzela-Ascoli diagonalisation argument, we may assume that the radii {r_{z,n}} of these circles converge to a positive finite limit {r_{z,\infty}}. Applying a rotation we may also assume that the circles {C_{1,n}} converge to a limit circle {C_{1,\infty}} (using the obvious topology on the space of circles); we can also assume that the orientation of the {{\mathcal C}_n} does not depend on {n}. A simple induction then shows that {C_{z,n}} converges to a limit circle {C_{z,\infty}}, giving a circle packing {{\mathcal C}_\infty} with the same nerve as {{\mathcal H}}. But then by Lemma 44, {{\mathcal C}_\infty} is an affine copy of {{\mathcal H}}, which among other things implies that {r_{1,\infty} = r_{0,\infty} = 1}. Thus {r_{1,n}} converges to {1}, giving the required contradiction. \Box

A more quantitative version of this corollary was worked out by He. There is also a purely topological proof of the rigidity of the infinite hexagonal circle packing due to Schramm.

— 4. Approximating a conformal map by circle packing —

Let {U} be a simply connected bounded region in {{\bf C}} with two distinct distinguished points {z_0, z_1 \in U}. By the Riemann mapping theorem, there is a unique conformal map {\phi: U \rightarrow D(0,1)} that maps {z_0} to {0} and {z_1} to a positive real. However, many proofs of this theorem are rather nonconstructive, and do not come with an effective algorithm to locate, or at least approximate, this map {\phi}.

It was conjectured by Thurston, and later proven by Rodin and Sullivan, that one could achieve this by applying the circle packing theorem (Theorem 3) to a circle packing in {U} by small circles. To formalise this, we need some more notation. Let {\varepsilon>0} be a small number, and let {\varepsilon \cdot {\mathcal H}} be the infinite hexagonal packing scaled by {\varepsilon}. For every circle in {\varepsilon \cdot {\mathcal H}}, define the flower to be the union of this circle, its interior, the six interstices bounding it, and the six circles tangent to the circle (together with their interiors). Let {C_0} be a circle in {\varepsilon \cdot {\mathcal H}} such that {z_0} lies in its flower. For {\varepsilon} small enough, this flower is contained in {U}. Let {{\mathcal I}_\varepsilon} denote all circles in {\varepsilon \cdot {\mathcal H}} that can be reached from {C_0} by a finite chain of consecutively tangent circles in {\varepsilon \cdot {\mathcal H}}, whose flowers all lie in {U}. Elements of {{\mathcal I}_\varepsilon} will be called inner circles, and circles in {\varepsilon \cdot {\mathcal H}} that are not an inner circle but are tangent to it will be called border circles. Because {U} is simply connected, the union of all the flowers of inner circles is also simply connected. As a consequence, one can traverse the border circles by a cycle of consecutively tangent circles, with the inner circles enclosed by this cycle. Let {{\mathcal C}_\varepsilon} be the circle packing consisting of the inner circles and border circles. Applying Theorem 3 followed by a Möbius transformation, one can then find a circle packing {{\mathcal C}'_\varepsilon} in {D(0,1)} with the same nerve and orientation as {{\mathcal C}_\varepsilon}, such that all the circles in {{\mathcal C}'_\varepsilon} associated to border circles of {{\mathcal C}_\varepsilon} are internally tangent to {D(0,1)}. Applying a Möbius transformation, we may assume that the flower containing {z_0} in {{\mathcal C}_\varepsilon} is mapped to the flower containing {0}, and the flower containing {z_1} is mapped to a flower containing a positive real. (From the exercise below {z_1} will lie in such a flower for {\varepsilon} small enough.)

Let {U_\varepsilon} be the union of all the solid closed equilateral triangles formed by the centres of mutually tangent circles in {{\mathcal C}_\varepsilon}, and let {D_\varepsilon} be the corresponding union of the solid closed triangles from {{\mathcal C}'_\varepsilon}. Let {\phi_\varepsilon} be the piecewise affine map from {U_\varepsilon} to {D_\varepsilon} that maps each triangle in {U_\varepsilon} to the associated triangle in {D_\varepsilon}.

Exercise 46 Show that {U_\varepsilon} converges to {U} as {\varepsilon \rightarrow 0} in the Hausdorff sense. In particular, {z_1} lies in {U_\varepsilon} for sufficiently small {\varepsilon}.

Exercise 47 By modifying the proof of the length-area lemma, show that all the circles {C} in {{\mathcal C}'_\varepsilon} have radius that goes uniformly to zero as {\varepsilon \rightarrow 0}. (Hint: for circles {C} deep in the interior, the length-area lemma works as is; for circles {C} near the boundary, one has to encircle {C} by a sequence of chains that need not be closed, but may instead terminate on the boundary of {D(0,1)}. The argument may be viewed as a discrete version of the one used to prove Theorem 20.) Using this and the previous exercise, show that {D_\varepsilon} converges to {D(0,1)} in the Hausdorff sense.

From Corollary 45 we see that as {\varepsilon \rightarrow 0}, the circles in {{\mathcal C}'_\varepsilon} corresponding to adjacent circles of {{\mathcal C}_\varepsilon} in a fixed compact subset {R} of {U} have radii differing by a ratio of {1+o(1)}. We conclude that in any compact subset {R'} of {D(0,1)}, adjacent circles in {{\mathcal C}'_\varepsilon} in {R'} also have radii differing by a ratio of {1+o(1)}, which implies by trigonometry that the triangles of {D_\varepsilon} in {R'} are approximately equilateral in the sense that their angles are {\frac{\pi}{3}+o(1)}. By Theorem 29 {\phi_\varepsilon} is {1+o(1)}-quasiconformal on each such triangle, and hence by Theorem 34 it is {1+o(1)}-quasiconformal on {R}. By Exercise 39 every sequence of {\phi_\varepsilon} has a subsequence which converges locally uniformly on {U}, and whose inverses converge locally uniformly on {D}; the limit is then a homeomorphism from {U} to {D} that maps {z_0} to {0} and {z_1} to a positive real. By Theorem 33 the limit is locally {1}-conformal and hence conformal, hence by uniqueness of the Riemann mapping it must equal {\phi}. As {\phi} is the unique limit point of all subsequences of the {\phi_\varepsilon}, this implies (by the Urysohn subsequence principle) that {\phi_\varepsilon} converges locally uniformly to {\phi}, thus making precise the sense in which the circle packings converge to the Riemann map.

July 16, 2018

Tommaso DorigoA Beautiful New Spectroscopy Measurement

What is spectroscopy ? 
(A) the observation of ghosts by infrared visors or other optical devices
(B) the study of excited states of matter through observation of energy emissions

If you answered (A), you are probably using a lousy internet search engine; and btw, you are rather dumb. Ghosts do not exist. 

Otherwise you are welcome to read on. We are, in fact, about to discuss a cutting-edge spectroscopy measurement, performed by the CMS experiment using lots of proton-proton collisions by the CERN Large Hadron Collider (LHC). 

read more

July 12, 2018

Matt Strassler“Seeing” Double: Neutrinos and Photons Observed from the Same Cosmic Source

There has long been a question as to what types of events and processes are responsible for the highest-energy neutrinos coming from space and observed by scientists.  Another question, probably related, is what creates the majority of high-energy cosmic rays — the particles, mostly protons, that are constantly raining down upon the Earth.

As scientists’ ability to detect high-energy neutrinos (particles that are hugely abundant, electrically neutral, very light-weight, and very difficult to observe) and high-energy photons (particles of light, though not necessarily of visible light) have become more powerful and precise, there’s been considerable hope of getting an answer to these question.  One of the things we’ve been awaiting (and been disappointed a couple of times) is a violent explosion out in the universe that produces both high-energy photons and neutrinos at the same time, at a high enough rate that both types of particles can be observed at the same time coming from the same direction.

In recent years, there has been some indirect evidence that blazars — narrow jets of particles, pointed in our general direction like the barrel of a gun, and created as material swirls near and almost into giant black holes in the centers of very distant galaxies — may be responsible for the high-energy neutrinos.  Strong direct evidence in favor of this hypothesis has just been presented today.   Last year, one of these blazars flared brightly, and the flare created both high-energy neutrinos and high-energy photons that were observed within the same period, coming from the same place in the sky.

I have written about the IceCube neutrino observatory before; it’s a cubic kilometer of ice under the South Pole, instrumented with light detectors, and it’s ideal for observing neutrinos whose motion-energy far exceeds that of the protons in the Large Hadron Collider, where the Higgs particle was discovered.  These neutrinos mostly pass through Ice Cube undetected, but one in 100,000 hits something, and debris from the collision produces visible light that Ice Cube’s detectors can record.   IceCube has already made important discoveries, detecting a new class of high-energy neutrinos.

On Sept 22 of last year, one of these very high-energy neutrinos was observed at IceCube. More precisely, a muon created underground by the collision of this neutrino with an atomic nucleus was observed in IceCube.  To create the observed muon, the neutrino must have had a motion-energy tens of thousand times larger than than the motion-energy of each proton at the Large Hadron Collider (LHC).  And the direction of the neutrino’s motion is known too; it’s essentially the same as that of the observed muon.  So IceCube’s scientists knew where, on the sky, this neutrino had come from.

(This doesn’t work for typical cosmic rays; protons, for instance, travel in curved paths because they are deflected by cosmic magnetic fields, so even if you measure their travel direction at their arrival to Earth, you don’t then know where they came from. Neutrinos, beng electrically neutral, aren’t affected by magnetic fields and travel in a straight line, just as photons do.)

Very close to that direction is a well-known blazar (TXS-0506), four billion light years away (a good fraction of the distance across the visible universe).

The IceCube scientists immediately reported their neutrino observation to scientists with high-energy photon detectors.  (I’ve also written about some of the detectors used to study the very high-energy photons that we find in the sky: in particular, the Fermi/LAT satellite played a role in this latest discovery.) Fermi/LAT, which continuously monitors the sky, was already detecting high-energy photons coming from the same direction.   Within a few days the Fermi scientists had confirmed that TXS-0506 was indeed flaring at the time — already starting in April 2017 in fact, six times as bright as normal.  With this news from IceCube and Fermi/LAT, many other telescopes (including the MAGIC cosmic ray detector telescopes among others) then followed suit and studied the blazar, learning more about the properties of its flare.

Now, just a single neutrino on its own isn’t entirely convincing; is it possible that this was all just a coincidence?  So the IceCube folks went back to their older data to snoop around.  There they discovered, in their 2014-2015 data, a dramatic flare in neutrinos — more than a dozen neutrinos, seen over 150 days, had come from the same direction in the sky where TXS-0506 is sitting.  (More precisely, nearly 20 from this direction were seen, in a time period where normally there’d just be 6 or 7 by random chance.)  This confirms that this blazar is indeed a source of neutrinos.  And from the energies of the neutrinos in this flare, yet more can be learned about this blazar, and how it makes  high-energy photons and neutrinos at the same time.  Interestingly, so far at least, there’s no strong evidence for this 2014 flare in photons, except perhaps an increase in the number of the highest-energy photons… but not in the total brightness of the source.

The full picture, still emerging, tends to support the idea that the blazar arises from a supermassive black hole, acting as a natural particle accelerator, making a narrow spray of particles, including protons, at extremely high energy.  These protons, millions of times more energetic than those at the Large Hadron Collider, then collide with more ordinary particles that are just wandering around, such as visible-light photons from starlight or infrared photons from the ambient heat of the universe.  The collisions produce particles called pions, made from quarks and anti-quarks and gluons (just as protons are), which in turn decay either to photons or to (among other things) neutrinos.  And its those resulting photons and neutrinos which have now been jointly observed.

Since cosmic rays, the mysterious high energy particles from outer space that are constantly raining down on our planet, are mostly protons, this is evidence that many, perhaps most, of the highest energy cosmic rays are created in the natural particle accelerators associated with blazars. Many scientists have suspected that the most extreme cosmic rays are associated with the most active black holes at the centers of galaxies, and now we have evidence and more details in favor of this idea.  It now appears likely that that this question will be answerable over time, as more blazar flares are observed and studied.

The announcement of this important discovery was made at the National Science Foundation by Francis Halzen, the IceCube principal investigator, Olga Botner, former IceCube spokesperson, Regina Caputo, the Fermi-LAT analysis coordinator, and Razmik Mirzoyan, MAGIC spokesperson.

The fact that both photons and neutrinos have been observed from the same source is an example of what people are now calling “multi-messenger astronomy”; a previous example was the observation in gravitational waves, and in photons of many different energies, of two merging neutron stars.  Of course, something like this already happened in 1987, when a supernova was seen by eye, and also observed in neutrinos.  But in this case, the neutrinos and photons have energies millions and billions of times larger!


July 09, 2018

n-Category Café Beyond Classical Bayesian Networks

guest post by Pablo Andres-Martinez and Sophie Raynor

In the final installment of the Applied Category Theory seminar, we discussed the 2014 paper “Theory-independent limits on correlations from generalized Bayesian networks” by Henson, Lal and Pusey.

In this post, we’ll give a short introduction to Bayesian networks, explain why quantum mechanics means that one may want to generalise them, and present the main results of the paper. That’s a lot to cover, and there won’t be a huge amount of category theory, but we hope to give the reader some intuition about the issues involved, and another example of monoidal categories used in causal theory.


Bayesian networks are a graphical modelling tool used to show how random variables interact. A Bayesian network consists of a pair (G,P)(G,P) of directed acyclic graph (DAG) GG together with a joint probability distribution PP on its nodes, satisfying the Markov condition. Intuitively the graph describes a flow of information.

The Markov condition says that the system doesn’t have memory. That is, the distribution on a given node YY is only dependent on the distributions on the nodes XX for which there is an edge XYX \rightarrow Y. Consider the following chain of binary events. In spring, the pollen in the air may cause someone to have an allergic reaction that may make them sneeze.

a poset

In this case the Markov condition says that given that you know that someone is having an allergic reaction, whether or not it is spring is not going to influence your belief about the likelihood of them sneezing. Which seems sensible.

Bayesian networks are useful

  • as an inference tool, thanks to belief propagation algorithms,

  • and because, given a Bayesian network (G,P)(G,P), we can describe d-separation properties on GG which enable us to discover conditional independences in PP.

It is this second point that we’ll be interested in here.

Before getting into the details of the paper, let’s try to motivate this discussion by explaining its title: “Theory-independent limits on correlations from generalized Bayesian networks" and giving a little more background to the problem it aims to solve.

Crudely put, the paper aims to generalise a method that assumes classical mechanics to one that holds in quantum and more general theories.

Classical mechanics rests on two intuitively reasonable and desirable assumptions, together called local causality,

  • Causality:

    Causality is usually treated as a physical primitive. Simply put it is the principle that there is a (partial) ordering of events in space time. In order to have information flow from event AA to event BB, AA must be in the past of BB.

    Physicists often define causality in terms of a discarding principle: If we ignore the outcome of a physical process, it doesn’t matter what process has occurred. Or, put another way, the outcome of a physical process doesn’t change the initial conditions.

  • Locality:

    Locality is the assumption that, at any given instant, the values of any particle’s properties are independent of any other particle. Intuitively, it says that particles are individual entities that can be understood in isolation of any other particle.

    Physicists usually picture particles as having a private list of numbers determining their properties. The principle of locality would be violated if any of the entries of such a list were a function whose domain is another particle’s property values.

In 1935 Einstein, Podolsky and Rosen showed that quantum mechanics (which was a recently born theory) predicted that a pair of particles could be prepared so that applying an action on one of them would instantaneously affect the other, no matter how distant in space they were, thus contradicting local causality. This seemed so unreasonable that the authors presented it as evidence that quantum mechanics was wrong.

But Einstein was wrong. In 1964, John S. Bell set the bases for an experimental test that would demonstrate that Einstein’s “spooky action at a distance” (Einstein’s own words), now known as entanglement, was indeed real. Bell’s experiment has been replicated countless of times and has plenty of variations. This video gives a detailed explanation of one of these experiments, for a non-physicist audience.

But then, if acting on a particle has an instantaneous effect on a distant point in space, one of the two principle above is violated: On one hand, if we acted on both particles at the same time, each action being a distinct event, both would be affecting each other’s result, so it would not be possible to decide on an ordering; causality would be broken. The other option would be to reject locality: a property’s value may be given by a function, so the resulting value may instantaneously change when the distant ‘domain’ particle is altered. In that case, the particles’ information was never separated in space, as they were never truly isolated, so causality is preserved.

Since causality is integral to our understanding of the world and forms the basis of scientific reasoning, the standard interpretation of quantum mechanics is to accept non-locality.

The definition of Bayesian networks implies a discarding principle and hence there is a formal sense in which they are causal (even if, as we shall see, the correlations they model do not always reflect the temporal order). Under this interpretation, the causal theory Bayesian networks describe is classical. Precisely, they can only model probability distributions that satisfy local causality. Hence, in particular, they are not sufficient to model all physical correlations.

The goal of the paper is to develop a framework that generalises Bayesian networks and d-separation results, so that we can still use graph properties to reason about conditional dependence under any given causal theory, be it classical, quantum, or even more general. In particular, this theory will be able to handle all physically observed correlations, and all theoretically postulated correlations.

Though category theory is not mentioned explicitly, the authors achieve their goal by using the categorical framework of operational probablistic theories (OPTs).

Bayesian networks and d-separation

Consider the situation in which we have three Boolean random variables. Alice is either sneezing or she is not, she either has a a fever or she does not, and she may or may not have flu.

Now, flu can cause both sneezing and fever, that is

P(sneezing|flu)P(sneezing) and likewise P(fever|flu)P(fever)P(sneezing \ | \ flu ) \neq P( sneezing) \ \text{ and likewise } \ P(fever \ | \ flu ) \neq P( fever)

so we could represent this graphically as

a poset

Moreover, intuitively we wouldn’t expect there to be any other edges in the above graph. Sneezing and fever, though correlated - each is more likely if Alice has flu - are not direct causes of each other. That is,

P(sneezing|fever)P(sneezing) but P(sneezing|fever,flu)=P(sneezing|flu).P(sneezing \ | \ fever ) \neq P(sneezing) \ \text{ but } \ P(sneezing \ | \ fever, \ flu ) = P(sneezing \ | \ flu).

Bayesian networks

Let GG be a directed acyclic graph or DAG GG. (Here a directed graph is a presheaf on (\bullet \rightrightarrows \bullet)).

The set Pa(Y)Pa(Y) of parents of a node YY of GG contains those nodes XX of GG such that there is a directed edge XYX \to Y.

So, in the example above Pa(flu)=Pa(flu) = \emptyset while Pa(fever)=Pa(sneezing)={flu}Pa(fever) = Pa(sneezing) = \{ flu \}.

To each node XX of a directed graph GG, we may associate a random variable, also denoted XX. If VV is the set of nodes of GG and (x X) XV(x_X)_{X \in V} is a choice of value x Xx_X for each node XX, such that yy is the chosen value for YY, then pa(y)pa(y) will denote the Pa(Y)Pa(Y)-tuple of values (x X) XPa(Y)(x_X)_{X \in Pa(Y)}.

To define Bayesian networks, and establish the notation, let’s revise some probability basics.

Let P(x,y|z)P(x,y \ | \ z) mean P(X=x and Y=y|Z=z)P(X = x \text{ and } \ Y = y \ | \ Z = z), the probability that XX has the value xx, and YY has the value yy given that ZZ has the value zz. Recall that this is given by

P(x,y|z)=P(x,y,z)P(z).P(x,y \ |\ z) = \frac{ P(x,y,z) }{P(z)}.

The chain rule says that, given a value xx of XX and sets of values Ω,Λ\Omega, \Lambda of other random variables,

P(x,Ω|Λ)=P(x|Λ)P(Ω|x,Λ).P(x, \Omega \ | \ \Lambda) = P( x \ | \ \Lambda) P( \Omega \ | \ x, \Lambda).

Random variables XX and YY are said to be conditionally independent given ZZ, written XY|ZX \perp\!\!\!\!\!\!\!\perp Y \ | \ Z, if for all values xx of XX, yy of YY and zz of ZZ

P(x,y|z)=P(x|z)P(y|z).P(x,y \ | \ z) = P(x \ | \ z) P(y \ | \ z).

By the chain rule this is equivalent to

P(x|y,z)=P(x|z),x,y,z.P(x \ | \ y,z ) = P (x \ | \ z) , \ \forall x,y, z.

More generally, we may replace X,YX,Y and ZZ with sets of random variables. So, in the special case that ZZ is empty, then XX and YY are independent if and only if P(x,y)=P(x)P(y)P(x, y) = P(x)P(y) for all x,yx,y.

Markov condition

A joint probability distribution PP on the nodes of a DAG GG is said to satisfy the Markov condition if for any set of random variable {X i} i=1 n\{X_i\}_{i = 1}^n on the nodes of GG, with choice of values {x i} i=1 n\{x_i\}_{i = 1}^n

P(x i,,x n)= i=1 nP(x i|pa(x i)).P(x_i, \dots, x_n) = \prod_{i = 1}^n P(x_i \ | \ {pa(x_i)}).

So, for the flu, fever and sneezing example above, a distribution PP satisfies the Markov condition if

P(flu,fever,sneezing)=P(fever|flu)P(sneezing|flu)P(flu).P(flu, \ fever, \ sneezing) = P(fever \ | \ flu) P(sneezing \ | \ flu) P(flu).

A Bayesian network is defined as a pair (G,P)(G,P) of a DAG GG and a joint probability distribution PP on the nodes of GG that satisfies the Markov condition with respect to GG. This means that each node in a Bayesian network is conditionally independent, given its parents, of any of the remaining nodes.

In particular, given a Bayesian network (G,P)(G,P) such that there is a directed edge XYX \to Y, the Markov condition implies that

yP(x,y)= yP(x)P(y|x)=P(x) yP(y|x)=P(x)\sum_{y} P(x,y) = \sum_y P(x) P(y \ | \ x) = P(x) \sum_y P(y \ | \ x) = P(x)

which may be interpreted as a discard condition. (The ordering is reflected by the fact that we can’t derive P(y)P(y) from xP(x,y)= xP(x)P(y|x)\sum_{x} P(x,y) = \sum_x P(x) P(y \ | \ x).)

Let’s consider some simple examples.


In the example of flu, sneezing and fever above, the graph has a fork shape. For a probability distribution PP to satisfy the Markov condition for this graph we must have

P(x,y,z)=P(x|z)P(y|z)P(z),x,y,z.P(x, y, z) = P(x \ | \ z) P(y \ | \ z)P(z), \ \forall x,y,z.

However, P(x,y)P(x)P(y)P(x,y) \neq P(x) P(y).

In other words, XY|ZX \perp\!\!\!\!\!\!\!\perp Y \ | \ Z, though XX and YY are not independent. This makes sense, we wouldn’t expect sneezing and fever to be uncorrelated, but given that we know whether or not Alice has flu, telling us that she has fever isn’t going to tell us anything about her sneezing.


Reversing the arrows in the fork graph above gives a collider as in the following example.

a poset

Clearly whether or not Alice has allergies other than hayfever is independent of what season it is. So we’d expect a distribution on this graph to satisfy XY|X \perp\!\!\!\!\!\!\!\perp Y \ | \ \emptyset. However, if we know that Alice is having an allergic reaction, and it happens to be spring, we will likely assume that she has some allergy, i.e. XX and YY are not conditionally independent given ZZ.

Indeed, the Markov condition and chain rule for this graph gives us XY|X \perp\!\!\!\!\!\!\!\perp Y \ | \ \emptyset:

P(x,y,z)=P(x)P(y)P(z|x,y)=P(z|x,y)P(x|y)P(y)x,y,z.P(x, y, z) = P(x)P(y) P(z \ | \ x,\ y) = P(z \ | \ x,\ y) P( x\ | \ y) P(y) \ \forall x,y,z.

from which we cannot derive P(x|z)P(y|z)=P(x,y|z)P(x \ | \ z) P(y \ | \ z) = P(x,y \ | \ z). (However, it could still be true for some particular choice of probability distribution.)


Finally, let us return to the chain of correlations presented in the introduction.

Clearly the probabilities that it is spring and that Alice is sneezing are not independent, and indeed, we cannot derive P(x,y)=P(x)P(y)P(x, y) = P(x) P(y). However observe that, by the chain rule, a Markov distribution on the chain graph must satisfy XY|ZX\perp\!\!\!\!\!\!\!\perp Y \ | \ Z. If we know Alice is having an allergic reaction that is not hayfever, whether or not she is sneezing is not going to affect our guess as to what season it is.

Crucially, in this case, knowing the season is also not going to affect whether we think Alice is sneezing. By definition, conditional independence of XX and YY given ZZ is symmetric in XX and YY. In other words, a joint distribution PP on the variables X,Y,ZX,Y,Z satisfies the Markov condition with respect to the chain graph

XZYX \longrightarrow Z \longrightarrow Y

if and only if PP satisfies the Markov condition on

YZX.Y \longrightarrow Z \longrightarrow X .


The above observations can be generalised to statements about conditional independences in any Bayesian network. That is, if (G,P)(G,P) is a Bayesian network then the structure of GG is enough to derive all the conditional independences in PP that are implied by the graph GG (in reality there may be more that have not been included in the network!).

Given a DAG GG and a set of vertices UU of GG, let m(U)m(U) denote the union of UU with all the vertices vv of GG such that there is a directed edge from UU to vv. The set W(U)W(U) will denote the non-inclusive future of UU, that is, the set of vertices vv of GG for which there is no directed (possibly trivial) path from vv to UU.

For a graph GG, let X,Y,ZX, Y, Z now denote disjoint subsets of the vertices of GG (and their corresponding random variables). Set W:=W(XYZ)W := W(X \cup Y \cup Z).

Then XX and YY are said to be d-separated by ZZ, written XY|ZX \perp Y \ | \ Z, if there is a partition {U,V,W,Z}\{U,V,W,Z\} of the nodes of GG such that

  • XUX \subseteq U and YVY \subseteq V, and

  • m(U)m(V)W,m(U) \cap m(V) \subseteq W, in other words UU and VV have no direct influence on each other.

(This is lemma 19 in the paper.)

Now d-separation is really useful since it tells us everything there is to know about the conditional dependences on Bayesian networks with underlying graph GG. Indeed,

Theorem 5

  • Soundness of d-separation (Verma and Pearl, 1988) If PP is a Markov distribution with respect to a graph GG then for all disjoint subsets X,Y,ZX,Y,Z of nodes of GG XY|ZX \perp Y \ | \ Z implies that XY|ZX \perp\!\!\!\!\!\!\!\perp Y \ | \ Z.

  • Completeness of d-separation (Meek, 1995) If XY|ZX \perp\!\!\!\!\!\!\!\perp Y \ | \ Z for all PP Markov with respect to GG, then XY|ZX \perp Y \ | \ Z.

We can combine the previous examples of fork, collider and chain graphs to get the following

a poset

A priori, Allergic reaction is conditionally independent of Fever. Indeed, we have the partition

a poset

which clearly satisfies d-separation. However, if Sneezing is known then W=W = \emptyset, so Allergic reaction and Fever are not independent. Indeed, if we use the same sets UU and VV as before, then m(U)m(V)={Sneezing}m(U) \cap m(V) = \{ Sneezing \}, so the condition for d-separation fails; and it does for any possible choice of UU and VV. Interestingly, if Flu is also known, we again obtain conditional independence between Allergic reaction and Fever, as shown below.

a poset

Before describing the limitations of this setup and why we may want to generalise it, it is worth observing that Theorem 5 is genuinely useful computationally. Theorem 5 says that given a Bayesian network (G,P)(G,P), the structure of GG gives us a recipe to factor PP, thereby greatly increasing computation efficiency for Bayesian inference.

Latent variables, hidden variables, and unobservables

In the context of Bayesian networks, there are two reasons that we may wish to add variables to a probabilistic model, even if we are not entirely sure what the variables signify or how they are distributed. The first reason is statistical and the second is physical.

Consider the example of flu, fever and sneezing discussed earlier. Although our analysis told us FeverSneezing|FluFever \perp\!\!\!\!\!\!\!\perp Sneezing \ | \ Flu, if we conduct an experiment we are likely to find:

P(fever|sneezing,flu)P(fever|flu).P(fever \ | \ sneezing, \ flu) \neq P(fever \ | \ flu).

The problem is caused by the graph not properly modelling reality, but a simplification of it. After all, there are a whole bunch of things that can cause sneezing and flu. We just don’t know what they all are or how to measure them. So, to make the network work, we may add a hypothetical latent variable that bunches together all the unknown joint causes, and equip it with a distribution that makes the whole network Bayesian, so that we are still able to perform inference methods like belief propagation.

a poset

On the other hand, we may want to add variables to a Bayesian network if we have evidence that doing so will provide a better model of reality.

For example, consider the network with just two connected nodes

a poset

Every distribution on this graph is Markov, and we would expect there to be a correlation between a road being wet and the grass next to it being wet as well, but most people would claim that there’s something missing from the picture. After all, rain could be a ‘common cause’ of the road and the grass being wet. So, it makes sense to add a third variable.

But maybe we can’t observe whether it has rained or not, only whether the grass and/or road are wet. Nonetheless, the correlation we observe suggests that they have a common cause. To deal with such cases, we could make the third variable hidden. We may not know what information is included in a hidden variable, nor its probability distribution.

All that matters is that the hidden variable helps to explain the observed correlations.

a poset

So, latent variables are a statistical tool that ensure the Markov condition holds. Hence they are inherently classical, and can, in theory, be known. But the universe is not classical, so, even if we lump whatever we want into as many classical hidden variables as we want and put them wherever we need, in some cases, there will still be empirically observed correlations that do not satisfy the Markov condition.

Most famously, Bell’s experiment shows that it is possible to have distinct variables AA and BB that exhibit correlations that cannot be explained by any classical hidden variable, since classical variables are restricted by the principle of locality.

In other words, though AB|ΛA \perp B \ | \ \Lambda,

P(a|b,λ)P(a|λ).P(a \ | b,\ \lambda) \neq P(a \ | \ \lambda).

Implicitly, this means that a classical Λ\Lambda is not enough. If we want P(a|b,λ)P(a|λ)P(a \ | b,\ \lambda) \neq P(a \ | \ \lambda) to hold, Λ\Lambda must be a non-local (non-classical) variable. Quantum mechanics implies that we can’t possibly empirically find the value of a non-local variable (for similar reasons to the Heisenberg’s uncertainty principle), so non-classical variables are often called unobservables. In particular, it is irrelevant to question whether AB|ΛA \perp\!\!\!\!\!\!\!\perp B \ | \ \Lambda, as we would need to know the value of Λ\Lambda in order to condition over it.

Indeed, this is the key idea behind what follows. We declare certain variables to be unobservable and then insist that conditional (in)dependence only makes sense between observable variables conditioned over observable variables.

Generalising classical causality

The correlations observed in the Bell experiment can be explained by quantum mechanics. But thought experiments such as the one described here suggest that theoretically, correlations may exist that violate even quantum causality.

So, given that graphical models and d-separation provide such a powerful tool for causal reasoning in the classical context, how can we generalise the Markov condition and Theorem 5 to quantum, and even more general causal theories? And, if we have a theory-independent Markov condition, are there d-separation results that don’t correspond to any given causal theory?

Clearly the first step in answering these questions is to fix a definition of a causal theory.

Operational probabilistic theories

An operational theory is a symmetric monoidal category (C,,I)(\mathsf {C}, \otimes, I) whose objects are known as systems or resources. Morphisms are finite sets f={𝒞 i} iIf = \{\mathcal {C}_i\}_{i \in I} called tests, whose elements are called outcomes. Tests with a single element are called deterministic, and for each system Aob(C)A \in ob (\mathsf {C}), the identity id A(A,A)id_A \in \mathsf (A,A) is a deterministic test.

In this discussion, we’ll identify tests {𝒞 i} i,{𝒟 j} j\{\mathcal {C}_i \}_i , \{\mathcal {D}_j\}_j in C\mathsf {C} if we may always replace one with the other without affecting the distributions in C(I,I)\mathsf {C}(I, I).

Given {𝒞 i} iC(B,C)\{\mathcal {C}_i \}_i \in \mathsf {C}(B, C) and {𝒟 j}C(A,B)\{\mathcal {D}_j \} \in \mathsf {C}(A, B), their composition fgf \circ g is given by

{𝒞 i𝒟 j} i,jC(A,C).\{ \mathcal {C}_i \circ \mathcal {D}_j \}_{i,j} \in \mathsf {C}(A, C).

First apply 𝒟\mathcal {D} with output BB then apply 𝒞\mathcal {C} with outcome CC.

The monoidal composition {𝒞 i𝒟 j} i,jC(AC,BD)\{ \mathcal {C}_i \otimes \mathcal {D}_j \}_{i, j} \in \mathsf {C}(A \otimes C, B \otimes D) corresponds to applying {𝒞 i} iC(A,B)\{\mathcal {C}_i\}_i \in \mathsf {C}(A,B) and {𝒟 j} j\{ \mathcal {D}_j \}_j separately on AA and CC.

An operational probabilistic theory or OPT is an operational theory such that every test III \to I is a probability distribution.

A morphism {𝒞 i} iC(A,I)\{ \mathcal {C}_i \}_i \in \mathsf {C}(A, I) is called an effect on AA. An OPT C\mathsf {C} is called causal or a causal theory if, for each system Aob(C)A \in ob (\mathsf {C}), there is a unique deterministic effect AC(A,I)\top_A \in \mathsf {C}( A, I) which we call the discard of AA.

In particular, for a causal OPT C\mathsf {C}, uniqueness of the discard implies that, for all systems A,Bob(C)A, B \in ob (\mathsf {C}),

A B= AB,\top_A \otimes \top_B = \top_{A \otimes B}, and, given any determinstic test 𝒞C(A,B)\mathcal {C} \in \mathsf {C}(A, B),

B𝒞= A.\top_B \circ \mathcal {C} = \top_A.

The existence of a discard map allows a definition of causal morphisms in a causal theory. For example, as we saw in January when we discussed Kissinger and Uijlen’s paper, a test {𝒞 i} iC(A,B)\{ \mathcal {C}_i \}_i \in \mathsf {C} (A, B) is causal if

B{𝒞 i} i= AC(A,I).\top_B \circ \{ \mathcal {C}_i \}_i = \top_A \in \mathsf {C}( A, I).

In other words, for a causal test, discarding the outcome is the same as not performing the test. Intuitively it is not obvious why such morphisms should be called causal. But this definition enables the formulation of a non-signalling condition that describes the conditions under which the possibility of cause-effect correlation is excluded, in particular, it implies the impossibility of time travel.


The category Mat( +)Mat(\mathbb {R}_+) of natural numbers and with Mat( +)(m,n)Mat(\mathbb {R}_+)(m,n) the set of n×mn \times m matrices, has the structure of a causal OPT. The causal morphisms in Mat( +)Mat(\mathbb {R}_+) are the stochastic maps (the matrices whose columns sum to 1). This category describes classical probability theory.

The category CPM\mathsf{CPM} of sets of linear operators on Hilbert spaces and completely positive maps between them is an OPT and describes quantum relations. The causal morphisms are the trace preserving completely positive maps.

Finally, Boxworld is the theory that allows to describe any correlation between two variables as the cause of some resource of the theory in the past.

Generalised Bayesian networks

So, we’re finally ready to give the main construction and results of the paper. As mentioned before, to get a generalised d-separation result, the idea is that we will distinguish observable and unobservable variables, and simply insist that conditional independence is only defined relative to observable variables.

To this end, a generalised DAG or GDAG is a DAG GG together with a partition on the nodes of GG into two subsets called observed and unobserved. We’ll represent observed nodes by triangles, and unobserved nodes by circles. An edge out of an (un)observed node will be called (un)observed and represented by a (solid) dashed arrow.

In order to get a generalisation of Theorem 5, we still need to come up with a sensible generalisation of the Markov property which will essentially say that at an observed node that has only observed parents, the distribution must be Markov. However, if an observed node has an unobserved parent, the latter’s whole history is needed to describe the distribution.

To state this precisely, we will associate a causal theory (C,,I)(\mathsf {C}, \otimes, I) to a GDAG GG via an assignment of systems to edges of GG and tests to nodes of GG, such that the observed edges of GG will ‘carry’ only the outcomes of classical tests (so will say something about conditional probability) whereas unobserved edges will carry only the output system.

Precisely, such an assignment PP satisfies the generalised Markov condition (GMC) and is called a generalised Markov distribution if

  • Each unobserved edge corresponds to a distinct system in the theory.

  • If we can’t observe what is happening at a node, we can’t condition over it: To each unobserved node and each value of its observed parents, we assign a deterministic test from the system defined by the product of its incoming (unobserved) edges to the system defined by the product of its outgoing (unobserved) edges.

  • Each observed node XX is an observation test, i.e. a morphism in C(A,I)\mathsf {C}(A, I) for the system Aob(C)A \in ob( \mathsf {C}) corresponding to the product of the systems assigned to the unobserved input edges of XX. Since C\mathsf {C} is a causal theory, this says that XX is assigned a classical random variable, also denoted XX, and that if YY is an observed node, and has observed parent XX, the distribution at YY is conditionally dependent on the distribution at XX (see here for details).

  • It therefore follows that each observed edge is assigned the trivial system II.

  • The joint probability distribution on the observed nodes of GG is given by the morphism C(I,I)\mathsf {C}(I, I) that results from these assignments.

A generalised Bayesian network consists of a GDAG GG together with a generalised Markov distribution PP on GG.


Consider the following GDAG

a poset

Let’s build its OPT morphism as indictated by the generalised Markov condition.

The observed node XX has no incoming edges so it corresponds to a C(I,I)\mathsf {C}(I, I) morphism, and thus we assign a probability distribution to it.

The unobserved node A depends on XX, and has no unobserved inputs, so we assign a deterministic test A(x):IAA(x): I \to A for each value xx of XX.

a poset

The observed node YY has one incoming unobserved edge and no incoming observed edges so we assign to it a test Y:AIY: A \to I such that, for each value xx of XX, YA(x)Y \circ A(x) is a probability distribution.

Building up the rest of the picture gives an OPT diagram of the form

a poset

which is a C(I,I)\mathsf {C}(I, I) morphism that defines the joint probability distribution P(x,y,z,w)P(x,y,z,w). We now have all the ingredients to state Theorem 22, the generalised d-separation theorem. This is the analogue of Theorem 5 for generalised Markov distributions.

Theorem 22

Given a GDAG GG and subsets X,Y,ZX,Y, Z of observed nodes

  • if a probability distribution PP is generalised Markov relative to GG then XY|ZXY|ZX \perp Y \ | \ Z \Rightarrow X\perp\!\!\!\!\!\!\!\perp Y \ | \ Z.

  • If XY|ZX\perp\!\!\!\!\!\!\!\perp Y \ | \ Z holds for all generalised Markov probability distributions on GG, then XY|ZX \perp Y \ | \ Z.

Note in particular that there is no change in the definition of d-separation: d-separation of a GDAG GG is simply d-separation with respect to its underlying DAG. There is also no change in the definition of conditional independence. Now, however, we restrict to statements of conditional independence with respect to observed nodes only. This enables the generalised soundness and completeness statements of the theorem.

The proof of soundness uses uniqueness of discarding, and completeness follows since generalised Markov is a stronger condition on a distribution than classically Markov.

Classical distributions on GDAGs

Theorem 22 is all well and good. But does it really generalise the classical case? That is, can we recover Theorem 5 for all classical Bayesian networks from Theorem 22?

As a first step, Proposition 17 states that if all the nodes of a generalised Bayesian network are observed, then it is a classical bayesian network. In fact, this follows pretty immediately from the definitions.

Moreover, it is easily checked that, given a classical Bayesian network, even if it has hidden or latent variables, it can still be expressed directly as a generalised Bayesian network with no unobserved nodes.

In fact, Theorem 22 generalises Theorem 5 in a stricter sense. That is, the generalised Bayesian network setup together with classical causality adds nothing extra to the theory of classical Bayesian networks. If a generalised Markov distribution is classical (then hidden and latent variables may be represented by unobserved nodes), it can be viewed as a classical Bayesian network. More precisely, Lemma 18 says that, given any generalised Bayesian network (G,P)(G,P) with underlying DAG GG' and distribution P𝒞P \in \mathcal {C}, we can construct a classical Bayesian network (G,P)(G', P') such that PP' agrees with PP on the observed nodes.

It is worth voicing a note of caution. The authors themselves mention in the conclusion that the construction based on GDAGs with two types of nodes is not entirely satisfactory. The problem is that, although the setups and results presented here do give a generalisation of Theorem 5, they do not, as such, provide a way of generalising Bayesian networks as they are used for probabilistic inference to non-classical settings. For example, belief propagation works through observed nodes, but there is no apparent way of generalising it for unobserved nodes.

Theory independence

More generally, given a GDAG GG, we can look at the set of distributions on GG that are generalised Markov with respect to a given causal theory. Of particular importance are the following.

  • The set 𝒞\mathcal {C} of generalised Markov distributions in Mat( +)Mat(\mathbb {R}_+) on GG.

  • The set 𝒬\mathcal {Q} of generalised Markov distributions in CPM\mathsf{CPM} on GG.

  • The set 𝒢\mathcal {G} of all generalised Markov distributions on GG. (This is the set of generalised Markov distributions in Boxworld.)

Moreover, we can distinguish another class of distributions on GG, by not restricting to d-seperation of observed nodes, but considering distributions that satisfy the observable conditional independences given by any d-separation properties on the graph. Theorem 22 implies, in particular that GIG \subseteq I.

And, so, since Mat( +)Mat(\mathbb {R}_+) embeds into CPM\mathsf{CPM}, we have 𝒞𝒬𝒢\mathcal {C} \subseteq \mathcal {Q} \subseteq \mathcal {G} \subseteq \mathcal {I}.

This means that one can ask for which graphs (some or all of) these inequalities are strict, and the last part of the paper explores these questions. In the original paper, a sufficient condition is given for graphs to satisfy 𝒞\mathcal {C} \neq \mathcal {I}. I.e. for these graphs it is guaranteed that the causal structure admits correlations that are non-local. Moreover the authors show that their condition is necessary for small enough graphs.

Another interesting result is that there exist graphs for which 𝒢\mathcal {G} \neq \mathcal {I}. This means that using a theory of resources, whatever theory it may be, to explain correlations imposes constraints that are stronger than those imposed by the relations themselves.

What next?

This setup represents one direction for using category theory to generalise Bayesian networks. In our group work at the ACT workshop, we considered another generalisation of Bayesian networks, this time staying within the classical realm. Namely, building on the work of Bonchi, Gadducci, Kissinger, Sobocinski, and Zanasi, we gave a functorial Markov condition on directed graphs admitting cycles. Hopefully we’ll present this work here soon.

July 04, 2018

n-Category Café Symposium on Compositional Structures

There’s a new conference series, whose acronym is pronounced “psycho”. It’s part of the new trend toward the study of “compositionality” in many branches of thought, often but not always using category theory:

  • First Symposium on Compositional Structures (SYCO1), School of Computer Science, University of Birmingham, 20-21 September, 2018. Organized by Ross Duncan, Chris Heunen, Aleks Kissinger, Samuel Mimram, Simona Paoli, Mehrnoosh Sadrzadeh, Pawel Sobocinski and Jamie Vicary.

The Symposium on Compositional Structures is a new interdisciplinary series of meetings aiming to support the growing community of researchers interested in the phenomenon of compositionality, from both applied and abstract perspectives, and in particular where category theory serves as a unifying common language. We welcome submissions from researchers across computer science, mathematics, physics, philosophy, and beyond, with the aim of fostering friendly discussion, disseminating new ideas, and spreading knowledge between fields. Submission is encouraged for both mature research and work in progress, and by both established academics and junior researchers, including students.

More details below! Our very own David Corfield is one of the invited speakers.

The Symposium on Compositional Structures is a new interdisciplinary series of meetings aiming to support the growing community of researchers interested in the phenomenon of compositionality, from both applied and abstract perspectives, and in particular where category theory serves as a unifying common language. We welcome submissions from researchers across computer science, mathematics, physics, philosophy, and beyond, with the aim of fostering friendly discussion, disseminating new ideas, and spreading knowledge between fields. Submission is encouraged for both mature research and work in progress, and by both established academics and junior researchers, including students.

Submission is easy, with no format requirements or page restrictions. The meeting does not have proceedings, so work can be submitted even if it has been submitted or published elsewhere.

While no list of topics could be exhaustive, SYCO welcomes submissions with a compositional focus related to any of the following areas, in particular from the perspective of category theory:

  • logical methods in computer science, including classical and quantum programming, type theory, concurrency, natural language processing and machine learning;
  • graphical calculi, including string diagrams, Petri nets and reaction networks;
  • languages and frameworks, including process algebras, proof nets, type theory and game semantics;
  • abstract algebra and pure category theory, including monoidal category theory, higher category theory, operads, polygraphs, and relationships to homotopy theory;
  • quantum algebra, including quantum computation and representation theory;
  • tools and techniques, including rewriting, formal proofs and proof assistants, and game theory;
  • industrial applications, including case studies and real-world problem descriptions.

This new series aims to bring together the communities behind many previous successful events which have taken place over the last decade, including “Categories, Logic and Physics”, “Categories, Logic and Physics (Scotland)”, “Higher-Dimensional Rewriting and Applications”, “String Diagrams in Computation, Logic and Physics”, “Applied Category Theory”, “Simons Workshop on Compositionality”, and the “Peripatetic Seminar in Sheaves and Logic”.

The steering committee hopes that SYCO will become a regular fixture in the academic calendar, running regularly throughout the year, and becoming over time a recognized venue for presentation and discussion of results in an informal and friendly atmosphere. To help create this community, in the event that more good-quality submissions are received than can be accommodated in the timetable, we may choose to defer some submissions to a future meeting, rather than reject them. This would be done based on submission order, giving an incentive for early submission, and avoiding any need to make difficult choices between strong submissions. Deferred submissions would be accepted for presentation at any future SYCO meeting without the need for peer review. This will allow us to ensure that speakers have enough time to present their ideas, without creating an unnecessarily competitive atmosphere. Meetings would be held sufficiently frequently to avoid a backlog of deferred papers.

Invited Speakers

  • David Corfield, Department of Philosophy, University of Kent: “The ubiquity of modal type theory”.

  • Jules Hedges, Department of Computer Science, University of Oxford: “Compositional game theory”

Important Dates

All times are anywhere-on-earth.

  • Submission deadline: Sunday 5 August 2018
  • Author notification: Monday 13 August 2018
  • Travel support application deadline: Monday 20 August 2018
  • Symposium dates: Thursday 20 September and Friday 21 September 2018


Submission is by EasyChair, via the following link:

Submissions should present research results in sufficient detail to allow them to be properly considered by members of the programme committee, who will assess papers with regards to significance, clarity, correctness, and scope. We encourage the submission of work in progress, as well as mature results. There are no proceedings, so work can be submitted even if it has been previously published, or has been submitted for consideration elsewhere. There is no specific formatting requirement, and no page limit, although for long submissions authors should understand that reviewers may not be able to read the entire document in detail.


Some funding is available to cover travel and subsistence costs, with a priority for PhD students and junior researchers. To apply for this funding, please contact the local organizer Jamie Vicary at by the deadline given above, with a short statement of your travel costs and funding required.

Programme Committee

The symposium managed by the following people, who also serve as the programme committee.

  • Ross Duncan, University of Strathclyde
  • Chris Heunen, University of Edinburgh
  • Aleks Kissinger, Radboud University Nijmegen
  • Samuel Mimram, École Polytechnique
  • Simona Paoli, University of Leicester
  • Mehrnoosh Sadrzadeh, Queen Mary, University of London
  • Pawel Sobocinski, University of Southampton
  • Jamie Vicary, University of Birmingham and University of Oxford (local organizer)

Tommaso DorigoChasing The Higgs Self Coupling: New CMS Results

Happy Birthday Higgs boson! The discovery of the last fundamental particle of the Standard Model was announced exactly 6 years ago at CERN (well, plus one day, since I decided to postpone to July 5 the publication of this post...).

In the Standard Model, the theory of fundamental interactions among elementary particles which enshrines our current understanding of the subnuclear world,  particles that constitute matter are fermionic: they have a haif-integer value of a quantity we call spin; and particles that mediate interactions between those fermions, keeping them together and governing their behaviour, are bosonic: they have an integer value of spin. 

read more

July 02, 2018

John PreskillThe World Cup from a Quantum Perspective

Two weeks into the Football World Cup and the group stages are over: 16 teams have gone home, leaving the top 16 teams to contend the knock-out stages. Those fans who enjoy a bet will be poring over the odds in search of a bargain—a mis-calculation on the part of the bookmakers. Is now the time to back Ronaldo for the golden boot, whilst Harry Kane dominates the headlines and sports betting markets? Will the hosts Russia continue to defy lowly pre-tournament expectations and make the semi-finals? Are France about to emerge from an unconvincing start to the tournament and blossom as clear front-runners?

But, whilst for most the sports betting markets may lead to the enhanced enjoyment of the tournament that a bet can bring (as well as the possibility of making a little money), for others they represent a window into the fascinating world of sports statistics. A fascination that can be captured by the simple question: how do they set the odds?

Suppose that a bookmaker has in their possession a list of outcome probabilities for matches between each pair of national football teams in the world and wants to predict the overall winner. There are 32768 possible ways for the tournament knock-out rounds to pan-out—a large, but not insurmountable number of iterations by modern computational standards.

However, if the bookmaker instead considers the tennis grand-slams, with 128 competitors in the first round, then there are a colossal 1.7 × 1038 permutations. Indeed, in a knock-out format there are 2n-1 permutations, where n is the number of entrants. And for those of a certain mindset, this exponentially growing space immediately raises the question of whether a quantum algorithm can yield a speed-up for the related prediction tasks.

A Tiny Cup

The immediate question which we want to answer here is, perhaps, who will win the World Cup. We will walk through the idea on the blackboard first, and then implement it as a quantum algorithm—which, hopefully, will give some insight into how and where quantum computers can outperform classical ones, for this particular way of answering the question.

Let us take a toy setup with four teams A, B, C and D;
the knockout stage starts with a game A vs. B, and C vs. D.
Whoever wins each game will play against each other, so here we have four possible final games: A vs. C, A vs. D, B vs. C, or B vs. D.
Let’s denote by p(X, Y) the probability that X wins when playing against Y.

The likelihood of A winning the cup is then simply given by

p(A, B) × ( p(C, D) × p(A, C) + p(D, C) × p(A, D) ),

i.e. the probability that A wins against B, times the probabilities of A winning against C in case C won against D, plus the probability of A winning against D in case D won.

How can we obtain the same quantity with a quantum algorithm?

First, we set up our Hilbert space so that it can represent all possible Cup scenarios.
Since we have four teams, we need a four-dimensional quantum system as our smallest storage unit—we commonly call those qudits as generalizations of a qubit, which having dimension 2 would be fit to store two teams only (we can always “embed” a qudit into a few qubits of the same dimension.

Remember: k qubits have dimension 2k, so we could also store the qudit as two qubits).
If we write |A\rangle, this simply stands for a qudit representing team A; if we write |A\rangle |B\rangle, then we have a state representing two teams.

To represent a full knockout tree, we follow the same logic: Take four qudits for the initial draw; add two qudits for the winners of the first two matches, and one qudit for the final winner.

For instance, one possible knockout scenario would be

|\text{Game 1}\rangle = \underbrace{|A\rangle |B\rangle |C\rangle |D\rangle}_\text{Initial Draw} \ \underbrace{|A\rangle |D\rangle}_\text{Finals} \ |D\rangle.

The probability associated with Game 1 is then precisely p(A, B) × p(D, C) × p(D, A).

Here is where quantum computing comes in.

Starting from an initial state |A\rangle |B\rangle |C\rangle |D\rangle, we create two new slots in a superposition over all possible match outcomes, weighted by the square-root of their probabilities (which we call q instead of p):

\begin{aligned} |\text{Step 1}\rangle = |A\rangle |B\rangle |C\rangle |D\rangle \big(\ \ &\text{q(A, B)q(C, D)} \,|A\rangle\ |C\rangle +\\ &\text{q(A, B)q(D, C)} \,|A\rangle\ |D\rangle +\\ &\text{q(B, A)q(C, D)} \,|B\rangle\ |C\rangle +\\ &\text{q(B, A)q(D, C)} \,|B\rangle\ |D\rangle\ \big). \end{aligned}

For the final round, we perform the same operation on those two last slots; e.g. we would map |A\rangle |C\rangle to a state |A\rangle |C\rangle ( q(A, C) |A\rangle + q(C, A) |C\rangle ). The final state is thus a superposition over eight possible weighted games (as we would expect).

So you can tell me who wins the World Cup?

Yes. Or well, probably. We find out by measuring the rightmost qudit.
As we know, the probability of obtaining a certain measurement outcome, say A, will then be determined by the square of the weights in front of the measured state; since we put in the square-roots initially we recover the original probabilities. Neat!

And since there are two possible game trees that lead to a victory of A, we have to sum them up—and we get precisely the probability we calculated by hand above. This means the team that is most likely to win will be the most likely measurement outcome.

So what about the World Cup? We have 16 teams; one team can thus be stored in four qubits. The knockout tree has 31 vertices, and a naive implementation can be done on a quantum computer with 124 qubits. Of course we are only a bit naive, so we can simulate this quantum computer on a classical one and obtain the following winning probabilities:

0.194 Brazil
0.168 Spain
0.119 France
0.092 Belgium
0.082 Argentina
0.075 England
0.049 Croatia
0.041 Colombia
0.04 Portugal
0.032 Uruguay
0.031 Russia
0.022 Switzerland
0.019 Denmark
0.018 Sweden
0.012 Mexico
0.006 Japan

It is worth noting that all operations we described can be implemented efficiently with a quantum computer, and the number of required qubits is quite small; for the four teams, we could get away with seven qudits, or fourteen qubits (and we could even save some, by ignoring the first four qudits which are always the same).

So for this particular algorithm there is an exponential speedup over its non-probabilistic classical counterpart: as mentioned, one would have to iterate over all trees; tedious for the World Cup, practically impossible for tennis. However…

Classical vs. Quantum

Does using a quantum algorithm give us a speedup for this task? Here, the answer is no; one could obtain similar results in comparable time using probabilistic methods, for instance, by doing Monte Carlo sampling.

But there are several interesting related questions that we could ask for which there might be a quantum advantage.

For some team A, we can easily create a state that has all game trees in superposition that lead to a victory of A—even weighting them using their respective probabilities.
Given this state as a resource, we can think of questions like “which game tree is most likely, given that we fix A and B as semifinalists”, or “which team should A play in the knockout stages to maximize the probability that B wins the tournament”.

Or, more controversially: can we optimize the winning chances for some team by rearranging the initial draw?

Some questions like these lend themselves to applying Grover search, for which there is a known speedup over classical computers. To inquire deeper into the utility of quantum algorithms, we need to invent the right kind of question to ask of this state.

Let us think of one more toy example. Being part physicists, we assume cows are spheres—so we might as well also assume that if A is likely to win a match against B, it always wins—even if the probability is only 51%. Let’s call this exciting sport “deterministic football”. For a set of teams playing a tournament of deterministic football, does there exist a winning initial draw for every team?

This becomes an especially interesting question in cases where there is a non-trivial cyclic relation between the teams’ abilities, a simple example being: A always beats B, B always beats C, and C always beats A. For example, if this problem turns out to be NP-hard, then it would be reasonable to expect that the quadratic improvement achieved by quantum search is the best we can hope for in using a quantum algorithm for the task of finding a winning initial draw for a chosen team—at least for deterministic football (phew).

To the finals and beyond

World Cup time is an exciting time: whatever the question, we are essentially dealing with binary trees, and making predictions can be translated into finding partitions or cuts that satisfy certain properties defined through a function of the edge weights (here the pairwise winning probabilities). We hope this quantum take on classical bookmaking might point us in the direction of new and interesting applications for quantum algorithms.

Hopefully a good bargain!

(Written with Steven Herbert and Sathyawageeswar Subramanian)

June 25, 2018

Sean CarrollOn Civility

Alex Wong/Getty Images

White House Press Secretary Sarah Sanders went to have dinner at a local restaurant the other day. The owner, who is adamantly opposed to the policies of the Trump administration, politely asked her to leave, and she did. Now (who says human behavior is hard to predict?) an intense discussion has broken out concerning the role of civility in public discourse and our daily life. The Washington Post editorial board, in particular, called for public officials to be allowed to eat in peace, and people have responded in volume.

I don’t have a tweet-length response to this, as I think the issue is more complex than people want to make it out to be. I am pretty far out to one extreme when it comes to the importance of engaging constructively with people with whom we disagree. We live in a liberal democracy, and we should value the importance of getting along even in the face of fundamentally different values, much less specific political stances. Not everyone is worth talking to, but I prefer to err on the side of trying to listen to and speak with as wide a spectrum of people as I can. Hell, maybe I am even wrong and could learn something.

On the other hand, there is a limit. At some point, people become so odious and morally reprehensible that they are just monsters, not respected opponents. It’s important to keep in our list of available actions the ability to simply oppose those who are irredeemably dangerous/evil/wrong. You don’t have to let Hitler eat in your restaurant.

This raises two issues that are not so easy to adjudicate. First, where do we draw the line? What are the criteria by which we can judge someone to have crossed over from “disagreed with” to “shunned”? I honestly don’t know. I tend to err on the side of not shunning people (in public spaces) until it becomes absolutely necessary, but I’m willing to have my mind changed about this. I also think the worry that this particular administration exhibits authoritarian tendencies that could lead to a catastrophe is not a completely silly one, and is at least worth considering seriously.

More importantly, if the argument is “moral monsters should just be shunned, not reasoned with or dealt with constructively,” we have to be prepared to be shunned ourselves by those who think that we’re moral monsters (and those people are out there).  There are those who think, for what they take to be good moral reasons, that abortion and homosexuality are unforgivable sins. If we think it’s okay for restaurant owners who oppose Trump to refuse service to members of his administration, we have to allow staunch opponents of e.g. abortion rights to refuse service to politicians or judges who protect those rights.

The issue becomes especially tricky when the category of “people who are considered to be morally reprehensible” coincides with an entire class of humans who have long been discriminated against, e.g. gays or transgender people. In my view it is bigoted and wrong to discriminate against those groups, but there exist people who find it a moral imperative to do so. A sensible distinction can probably be made between groups that we as a society have decided are worthy of protection and equal treatment regardless of an individual’s moral code, so it’s at least consistent to allow restaurant owners to refuse to serve specific people they think are moral monsters because of some policy they advocate, while still requiring that they serve members of groups whose behaviors they find objectionable.

The only alternative, as I see it, is to give up on the values of liberal toleration, and to simply declare that our personal moral views are unquestionably the right ones, and everyone should be judged by them. That sounds wrong, although we do in fact enshrine certain moral judgments in our legal codes (murder is bad) while leaving others up to individual conscience (whether you want to eat meat is up to you). But it’s probably best to keep that moral core that we codify into law as minimal and widely-agreed-upon as possible, if we want to live in a diverse society.

This would all be simpler if we didn’t have an administration in power that actively works to demonize immigrants and non-straight-white-Americans more generally. Tolerating the intolerant is one of the hardest tasks in a democracy.



John PreskillWhat’s the worst that could happen?

The archaeologist Howard Carter discovered Tutankhamun’s burial site in 1922. No other Egyptian pharaoh’s tomb had survived mostly intact until the modern era. Gold and glass and faience, statues and pendants and chariots, had evaded looting. The discovery would revolutionize the world’s understanding of, and enthusiasm for, ancient Egypt.

First, the artifacts had to leave the tomb.

Tutankhamun lay in three coffins nested like matryoshka dolls. Carter describes the nesting in his book The Tomb of Tutankhamen. Lifting the middle coffin from the outer coffin raised his blood pressure:

Everything may seem to be going well until suddenly, in the crisis of the process, you hear a crack—little pieces of surface ornament fall. Your nerves are at an almost painful tension. What is happening? All available room in the narrow space is crowded by your men. What action is needed to avert a catastrophe?

In other words, “What’s the worst that could happen?”

Matryoshka dolls

Collaborators and I asked that question in a paper published last month. We had in mind less Egyptology than thermodynamics and information theory. But never mind the distinction; you’re reading Quantum Frontiers! Let’s mix the fields like flour and oil in a Biblical grain offering.

Carter’s team had trouble separating the coffins: Ancient Egyptian priests (presumably) had poured fluid atop the innermost, solid-gold coffin. The fluid had congealed into a brown gunk, gluing the gold coffin to the bottom of the middle coffin. Removing the gold coffin required work—thermodynamic work.

Work consists of “well-ordered” energy usable in tasks like levering coffins out of sarcophagi and motoring artifacts from Egypt’s Valley of the Kings toward Cairo. We can model the gunk as a spring, one end of which was fixed to the gold coffin and one end of which was fixed to the middle coffin. The work W required to stretch a spring depends on the spring’s stiffness (the gunk’s viscosity) and on the distance stretched through.

W depends also on details: How many air molecules struck the gold coffin from above, opposing the team’s effort? How quickly did Carter’s team pull? Had the gunk above Tuankhamun’s nose settled into a hump or spread out? How about the gunk above Tutankhamun’s left eye socket? Such details barely impact the work required to open a 6.15-foot-long coffin. But air molecules would strongly impact W if Tutankhamun measured a few nanometers in length. So imagine Egyptian matryoshka dolls as long as stubs of DNA.


Imagine that Carter found one million sets of these matryoshka dolls. Lifting a given set’s innermost coffin would require an amount W of work that would vary from set of coffins to set of coffins. W would satisfy fluctuation relations, equalities I’ve blogged about many times.

Fluctuation relations resemble the Second Law of Thermodynamics, which illuminates why time flows in just one direction. But fluctuation relations imply more-precise predictions about W than the Second Law does.

Some predictions concern dissipated work: Carter’s team could avoid spending much work by opening the coffin infinitesimally slowly. Speeding up would heat the gunk, roil air molecules, and more. The heating and roiling would cost extra work, called dissipated work, denoted by W_{\rm diss}.

Suppose that Carter’s team has chosen a lid-opening speed v. Consider the greatest W_{\rm diss} that the team might have to waste on any nanoscale coffin. W_{\rm diss}^{\rm worst} is proportional to each of three information-theoretic quantities, my coauthors and I proved.

For experts: Each information-theoretic quantity is an order-infinity Rényi divergence D_\infty ( X || Y). The Rényi divergences generalize the relative entropy D ( X || Y ). D quantifies how efficiently one can distinguish between probability distributions, or quantum states, X and Y on average. The average is over many runs of a guessing game.

Imagine the worst possible run, which offers the lowest odds of guessing correctly. D_\infty quantifies your likelihood of winning. We related W_{\rm diss}^{\rm worst} to a D_\infty between two statistical-mechanical phase-space distributions (when we described classical systems), to a D_\infty between two quantum states (when we described quantum systems), and to a D_\infty between two probability distributions over work quantities W (when we described systems quantum and classical).


The worst case marks an extreme. How do the extremes consistent with physical law look? As though they’ve escaped from a mythologist’s daydream.

In an archaeologist’s worst case, arriving at home in the evening could lead to the following conversation:

“How was your day, honey?”

“The worst possible.”

“What happened?”

“I accidentally eviscerated a 3.5-thousand-year-old artifact—the most valuable, best-preserved, most information-rich, most lavishly wrought ancient Egyptian coffin that existed yesterday.”

Suppose that the archaeologist lived with a physicist. My group (guided by a high-energy physicist) realized that the conversation could continue as follows:

“And how was your day?”

“Also the worst possible.”

“What happened?”

“I created a black hole.”

General relativity and high-energy physics have begun breeding with quantum information and thermodynamics. The offspring bear extremes like few other systems imaginable. I wonder what our results would have to say about those offspring.


National Geographic reprinted Carter’s The Tomb of Tutankhamen in its “Adventure Classics” series. The series title fits Tomb as a mummy’s bandages fit the mummy. Carter’s narrative stretches from Egypt’s New Kingdom (of 3.5 thousand years ago) through the five-year hunt for the tomb (almost fruitless until the final season), to a water boy’s discovery of steps into the tomb, to the unsealing of the burial chamber, to the confrontation of Tutankhamun’s mummy.

Carter’s book guided me better than any audio guide could have at the California Science Center. The center is hosting the exhibition “King Tut: Treasures of the Golden Pharaoh.” After debuting in Los Angeles, the exhibition will tour the world. The tour showcases 150 artifacts from Tutankhamun’s tomb.

Those artifacts drove me to my desk—to my physics—as soon as I returned home from the museum. Tutankhamun’s tomb, Carter argues in his book, ranks amongst the 20th century’s most important scientific discoveries. I’d seen a smidgeon of the magnificence that Carter’s team— with perseverance, ingenuity, meticulousness, and buckets of sweat shed in Egypt’s heat—had discovered. I don’t expect to discover anything a tenth as magnificent. But how can a young scientist resist trying?

People say, “Prepare for the worst. Hope for the best.” I prefer “Calculate the worst. Hope and strive for a Tutankhamun.”

Outside exhibition

Postscript: Carter’s team failed to unglue the gold coffin by just “stretching” the gunky “spring.” The team resorted to heat, a thermodynamic quantity alternative to work: The team flipped the middle coffin upside-down above a heat lamp. The lamp raised the temperature to 932°F, melting the goo. The melting, with more work, caused the gold coffin to plop out of the middle coffin.

June 22, 2018

ResonaancesBoth g-2 anomalies

Two months ago an experiment in Berkeley announced a new ultra-precise measurement of the fine structure constant α using interferometry techniques. This wasn't much noticed because the paper is not on arXiv, and moreover this kind of research is filed under metrology, which is easily confused with meteorology. So it's worth commenting on why precision measurements of α could be interesting for particle physics. What the Berkeley group really did was to measure the mass of the cesium-133 atom, achieving the relative accuracy of 4*10^-10, that is 0.4 parts par billion (ppb). With that result in hand, α can be determined after a cavalier rewriting of the high-school formula for the Rydberg constant:   
Everybody knows the first 3 digits of the Rydberg constant, Ry≈13.6 eV, but actually it is experimentally known with the fantastic accuracy of 0.006 ppb, and the electron-to-atom mass ratio has also been determined precisely. Thus the measurement of the cesium mass can be translated into a 0.2 ppb measurement of the fine structure constant: 1/α=137.035999046(27).

You may think that this kind of result could appeal only to a Pythonesque chartered accountant. But you would be wrong. First of all, the new result excludes  α = 1/137 at 1 million sigma, dealing a mortal blow to the field of epistemological numerology. Perhaps more importantly, the result is relevant for testing the Standard Model. One place where precise knowledge of α is essential is in calculation of the magnetic moment of the electron. Recall that the g-factor is defined as the proportionality constant between the magnetic moment and the angular momentum. For the electron we have
Experimentally, ge is one of the most precisely determined quantities in physics,  with the most recent measurement quoting a= 0.00115965218073(28), that is 0.0001 ppb accuracy on ge, or 0.2 ppb accuracy on ae. In the Standard Model, ge is calculable as a function of α and other parameters. In the classical approximation ge=2, while the one-loop correction proportional to the first power of α was already known in prehistoric times thanks to Schwinger. The dots above summarize decades of subsequent calculations, which now include O(α^5) terms, that is 5-loop QED contributions! Thanks to these heroic efforts (depicted in the film  For a Few Diagrams More - a sequel to Kurosawa's Seven Samurai), the main theoretical uncertainty for the Standard Model prediction of ge is due to the experimental error on the value of α. The Berkeley measurement allows one to reduce the relative theoretical error on adown to 0.2 ppb:  ae = 0.00115965218161(23), which matches in magnitude the experimental error and improves by a factor of 3 the previous prediction based on the α measurement with rubidium atoms.

At the spiritual level, the comparison between the theory and experiment provides an impressive validation of quantum field theory techniques up to the 13th significant digit - an unimaginable  theoretical accuracy in other branches of science. More practically, it also provides a powerful test of the Standard Model. New particles coupled to the electron may contribute to the same loop diagrams from which ge is calculated, and could shift the observed value of ae away from the Standard Model predictions. In many models, corrections to the electron and muon magnetic moments are correlated. The latter famously deviates from the Standard Model prediction by 3.5 to 4 sigma, depending on who counts the uncertainties. Actually, if you bother to eye carefully the experimental and theoretical values of ae beyond the 10th significant digit you can see that they are also discrepant, this time at the 2.5 sigma level. So now we have two g-2 anomalies! In a picture, the situation can be summarized as follows:

If you're a member of the Holy Church of Five Sigma you can almost preach an unambiguous discovery of physics beyond the Standard Model. However, for most of us this is not the case yet. First, there is still some debate about the theoretical uncertainties entering the muon g-2 prediction. Second, while it is quite easy to fit each of the two anomalies separately, there seems to be no appealing model to fit both of them at the same time.  Take for example the very popular toy model with a new massive spin-1 Z' boson (aka the dark photon) kinetically mixed with the ordinary photon. In this case Z' has, much like the ordinary photon, vector-like and universal couplings to electron and muons. But this leads to a positive contribution to g-2, and it does not fit well the ae measurement which favors a new negative contribution. In fact, the ae measurement provides the most stringent constraint in part of the parameter space of the dark photon model. Conversely, a Z' boson with purely axial couplings to matter does not fit the data as it gives a negative contribution to g-2, thus making the muon g-2 anomaly worse. What might work is a hybrid model with a light Z' boson having lepton-flavor violating interactions: a vector coupling to muons and a somewhat smaller axial coupling to electrons. But constructing a consistent and realistic model along these lines is a challenge because of other experimental constraints (e.g. from the lack of observation of μ→eγ decays). Some food for thought can be found in this paper, but I'm not sure if a sensible model exists at the moment. If you know one you are welcome to drop a comment here or a paper on arXiv.

More excitement on this front is in store. The muon g-2 experiment in Fermilab should soon deliver first results which may confirm or disprove the muon anomaly. Further progress with the electron g-2 and fine-structure constant measurements is also expected in the near future. The biggest worry is that, if the accuracy improves by another two orders of magnitude, we will need to calculate six loop QED corrections... 

June 16, 2018

Tommaso DorigoOn The Residual Brightness Of Eclipsed Jovian Moons

While preparing for another evening of observation of Jupiter's atmosphere with my faithful 16" dobsonian scope, I found out that the satellite Io will disappear behind the Jovian shadow tonight. This is a quite common phenomenon and not a very spectacular one, but still quite interesting to look forward to during a visual observation - the moon takes some time to fully disappear, so it is fun to follow the event.
This however got me thinking. A fully eclipsed jovian moon should still be able to reflect back some light picked up from the still lit other satellites - so it should not, after all, appear completely dark. Can a calculation be made of the effect ? Of course - and it's not that difficult.

read more

June 10, 2018

Sean CarrollIntro to Cosmology Videos

In completely separate video news, here are videos of lectures I gave at CERN several years ago: “Cosmology for Particle Physicists” (May 2005). These are slightly technical — at the very least they presume you know calculus and basic physics — but are still basically accurate despite their age.

  1. Introduction to Cosmology
  2. Dark Matter
  3. Dark Energy
  4. Thermodynamics and the Early Universe
  5. Inflation and Beyond

Update: I originally linked these from YouTube, but apparently they were swiped from this page at CERN, and have been taken down from YouTube. So now I’m linking directly to the CERN copies. Thanks to commenters Bill Schempp and Matt Wright.

Tim GowersA new journal in combinatorics

This post is to announce that a new journal, Advances in Combinatorics, has just opened for submissions. I shall also say a little about the journal, about other new journals, about my own experiences of finding journals I am happy to submit to, and about whether we are any nearer a change to more sensible systems of dissemination and evaluation of scientific papers.

Advances in Combinatorics

Advances in Combinatorics is set up as a combinatorics journal for high-quality papers, principally in the less algebraic parts of combinatorics. It will be an arXiv overlay journal, so free to read, and it will not charge authors. Like its cousin Discrete Analysis (which has recently published its 50th paper) it will be run on the Scholastica platform. Its minimal costs are being paid for by the library at Queen’s University in Ontario, which is also providing administrative support. The journal will start with a small editorial board. Apart from me, it will consist of Béla Bollobás, Reinhard Diestel, Dan Kral, Daniela Kühn, James Oxley, Bruce Reed, Gabor Sarkozy, Asaf Shapira and Robin Thomas. Initially, Dan Kral and I will be the managing editors, though I hope to find somebody to replace me in that role once the journal is established. While I am posting this, Dan is simultaneously announcing the journal at the SIAM conference in Discrete Mathematics, where he has just given a plenary lecture. The journal is also being announced by COAR, the Confederation of Open Access Repositories. This project aligned well with what they are trying to do, and it was their director, Kathleen Shearer, who put me in touch with the library at Queen’s.

As with Discrete Analysis, all members of the editorial board will be expected to work: they won’t just be lending their names to give the journal bogus prestige. Each paper will be handled by one of the editors, who, after obtaining external opinions (when the paper warrants them) will make a recommendation to the rest of the board. All decisions will be made collectively. The job of the managing editors will be to make sure that this process runs smoothly, but when it comes to decisions, they will have no more say than any other editor.

The rough level that the journal is aiming at is that of a top specialist journal such as Combinatorica. The reason for setting it up is that there is a gap in the market for an “ethical” combinatorics journal at that level — that is, one that is not published by one of the major commercial publishers, with all the well known problems that result. We are not trying to destroy the commercial combinatorial journals, but merely to give people the option of avoiding them if they would prefer to submit to a journal that is not complicit in a system that uses its monopoly power to ruthlessly squeeze library budgets.

We are not the first ethical journal in combinatorics. Another example is The Electronic Journal of Combinatorics, which was set up by Herb Wilf back in 1994. The main difference between EJC and Advances in Combinatorics is that we plan to set a higher bar for acceptance, even if it means that we accept only a small number of papers. (One of the great advantages of a fully electronic journal is that we do not have a fixed number of issues per year, so we will not have to change our standards artificially in order to fill issues or clear backlogs.) We thus hope that EJC and AIC will between them offer suitable potential homes for a wide range of combinatorics papers. And on the more algebraic side, one should also mention Algebraic Combinatorics, which used to be the Springer journal The Journal of Algebraic Combinatorics (which officially continues with an entirely replaced editorial board — I don’t know whether it’s getting many submissions though), and also the Australasian Journal of Combinatorics.

So if you’re a combinatorialist who is writing up a result that you think is pretty good, then please consider submitting it to us. What do we mean by “pretty good”? My personal view — that is, I am not speaking for the rest of the editorial board — is that the work in a good paper should have a clear reason for others to be interested in it (so not, for example, incremental progress in some pet project of the author) and should have something about it that makes it count as a significant achievement, such as solving a well-known problem, clearing a difficult technical hurdle, inventing a new and potentially useful technique, or giving a beautiful and memorable proof.

What other ethical journals are there?

Suppose that you want to submit an article to a journal that is free to read and does not charge authors. What are your options? I don’t have a full answer to this question, so I would very much welcome feedback from other people, especially in areas of mathematics far from my own, about what the options are for them. But a good starting point is to consult the list of current member journals in the Free Journal Network, which Advances in Combinatorics hopes to join in due course.

Three notable journals not on that list are the following.

  1. Acta Mathematica. This is one of a tiny handful of the very top journals in mathematics. Last year it became fully open access without charging author fees. So for a really good paper it is a great option.
  2. Annales Henri Lebesgue. This is a new journal that has not yet published any articles, but is open for submissions. Like Acta Mathematica, it covers all of mathematics. It aims for a very high standard, but it is not yet clear what that means in practice: I cannot say that it will be roughly at the level of Journal X. But perhaps it will turn out to be suitable for a very good paper that is just short of the level of Annals, Acta, or JAMS.
  3. Algebra and Number Theory. I am told that this is regarded as the top specialist journal in number theory. From a glance at the article titles, I don’t see much analytic number theory, but there are notable analytic number theorists on the editorial board, so perhaps I have just not looked hard enough.

Added later: I learn from Benoît Kloeckner and Emmanuel Kowalski in the comments below that my information about Algebra and Number Theory was wrong, since articles in that journal are not free to read until they are five years old. However, it is published by MSP, which is a nonprofit organization, so as subscription journals go it is at the ethical end of the spectrum.

Further update: I have heard from the editors of Annales Henri Lebesgue that they have had a number of strong submissions and expect the level of the journal to be at least as high as that of journals such as Advances in Mathematics, Mathematische Annalen and the Israel Journal of Mathematics, and perhaps even slightly higher.

In what areas are ethical journals most needed?

I would very much like to hear from people who would prefer to avoid the commercially published journals, but can’t, because there are no ethical journals of a comparable standard in their area. I hope that combinatorialists will no longer have that problem. My impression is that there is a lack of suitable journals in analysis and I’m told that the same is true of logic. I’m not quite sure what the situation is in geometry or algebra. (In particular, I don’t know whether Algebra and Number Theory is also considered as the top specialist journal for algebraists.) Perhaps in some areas there are satisfactory choices for papers of some standards but not of others: that too would be interesting to know. Where do you think the gaps are? Let me know in the comments below.

Starting a new journal.

I want to make one point loud and clear, which is that the mechanics of starting a new, academic-run journal are now very easy. Basically, the only significant obstacle is getting together an editorial board with the right combination of reputation in the field and willingness to work. What’s more, unless the journal grows large, the work is quite manageable — all the more so if it is spread reasonably uniformly amongst the editorial board. Creating the journal itself can be done on one of a number of different platforms, either for no charge or for a very small charge. Some examples are the Mersenne platform, which hosts the Annales Henri Lebesgue, the Episciences platform, which hosts the Epijournal de Géométrie Algébrique, and Scholastica, which, as I mentioned above, hosts Discrete Analysis and Advances in Combinatorics.

Of these, Scholastica charges a submission fee of $10 per article and the other two are free. There are a few additional costs — for example, Discrete Analysis pays a subscription to CrossRef in order to give DOIs to articles — but the total cost of running a new journal that isn’t too large is of the order of a few hundred dollars per year, as long as nobody is paid for what they do. (Discrete Analysis, like Advances in Combinatorics, gets very useful assistance from librarians, provided voluntarily, but even if they were paid the going rate, the total annual costs would be of the same order of magnitude as one “article processing charge” of the traditional publishers, which is typically around $1500 per article.)

What’s more, those few hundred dollars are not an obstacle either. For example, I know of a fund that is ready to support at least one other journal of a similar size to Discrete Analysis, there are almost certainly other libraries that would be interested in following the enlightened example of Queen’s University Library and supporting a journal (if you are a librarian reading this, then I strongly recommend doing so, as it will be helping to weaken the hold of the system that is currently costing you orders of magnitude more money), and I know various people who know about other means of obtaining funding. So if you are interested in starting a journal and think you can put together a credible editorial board, then get in touch: I can offer advice, funding (if the proposal looks a good one), and contact with several other people who are knowledgeable and keen to help.

A few remarks about my own relationship with mathematical publishing.

My attitudes to journals and the journal system have evolved quite a lot in the last few years. The alert reader may have noticed that I’ve got a long way through this post before mentioning the E-word. I still think that Elsevier is the publisher that does most damage, and have stuck rigidly to my promise made over six years ago not to submit a paper to them or to do editorial or refereeing work. However, whereas then I thought of Springer as somehow more friendly to mathematics, thanks to its long tradition of publishing important textbooks and monographs, I now feel pretty uncomfortable about all the big four — Elsevier, Springer, Wiley, and Taylor and Francis — with Springer having got a whole lot worse after merging with Nature Macmillan. And in some respects Elsevier is better than Springer: for example, they make all mathematics papers over four years old freely available, while Springer refuses to do so. Admittedly this was basically a sop to mathematicians to keep us quiet, but as sops go it was a pretty good one, and I see now that Elsevier’s open archive, as they call it, includes some serious non-mathematical journals such as Cell. (See their list of participating journals for details.)

I’m also not very comfortable with the society journals and university presses, since although they use their profits to benefit mathematics in various ways, they are fully complicit in the system of big deals, the harm of which outweighs those benefits.

The result is that if I have a paper to submit, I tend to have a lot of trouble finding a suitable home for it, and I end up having to compromise on my principles to some extent (particularly if, as happened recently, I have a young coauthor from a country that uses journal rankings to evaluate academics). An obvious place to submit to would be Discrete Analysis, but I feel uncomfortable about that for a different reason, especially now that I have discovered that the facility that enables all the discussion of a paper to be hidden from selected editors does not allow me, as administrator of the website, to hide a paper from myself. (I won’t have this last problem with Advances in Combinatorics, since the librarians at Queens will have the administrator role on the system.)

So my personal options are somewhat limited, but getting better. If I have willing coauthors, then I would now consider (if I had a suitable paper), Acta Mathematica, Annales Henri Lebesgue, Journal de l’École Polytechnique, Discrete Analysis perhaps (but only if the other editors agreed to process my paper offline), Advances in Combinatorics, the Theory of Computing, Electronic Research Announcements in the Mathematical Sciences, the Electronic Journal of Combinatorics, and the Online Journal of Analytic Combinatorics. I also wouldn’t rule out Forum of Mathematics. A couple of journals to which I have an emotional attachment even if I don’t really approve of their practices are GAFA and Combinatorics, Probability and Computing. (The latter bothers me because it is a hybrid journal — that is, it charges subscriptions but also lets authors pay large APCs to make their articles open access, and I heard recently that if you choose the open access option, CUP retains copyright, so you’re not getting that much for your money. But I think not many authors choose this option. The former is also a hybrid journal, and is published by Springer.) Annals of Mathematics, if I’m lucky enough to have an Annals-worthy paper (though I think now I’d try Acta first), is not too bad — although its articles aren’t open access, their subscription costs are much more reasonable than most journals.

That’s a list off the top of my head: if you think I’ve missed out a good option, then I’d be very happy to hear about it.

As an editor, I have recently made the decision that I want to devote all my energies to promoting journals and “post-journal” systems that I fully approve of. So in order to make time for the work that will be involved in establishing Advances in Combinatorics, I have given notice to Forum of Mathematics and Mathematika, the two journals that took up the most of my time, that I will leave their editorial boards at the end of 2018. I feel quite sad about Forum of Mathematics, since I was involved in it from the start, and I really like the way it runs, with proper discussions amongst all the editors about the decisions we make. Also, I am less hostile (for reasons I’ve given in the past) to its APC model than most mathematicians. However, although I am less hostile, I could never say that I have positively liked it, and I came to the conclusion quite a while ago that, as many others have also said, it simply can’t be made to work satisfactorily: it will lead to just as bad market abuses as there are with the subscription system. In the UK it has been a disaster — government open-access mandates have led to universities paying as much as ever for subscriptions and then a whole lot extra for APCs. And there is a real worry that subscription big deals will be replaced by APC big deals, where a country pays a huge amount up front to a publisher in return for people from that country being able to publish with them. This, for example, is what Germany is pushing for. Fortunately, for the moment (if I understand correctly, though I don’t have good insider information on this) they are asking for the average fee per article to be much lower than Elsevier is prepared to accept: long may that impasse continue.

So my leaving Forum of Mathematics is not a protest against it, but simply a practical step that will allow me to focus my energies where I think they can do the most good. I haven’t yet decided whether I ought to resign in protest from some other editorial boards of journals that don’t ask anything of me. Actually, even the practice of having a long list of names of editors, most of whom have zero involvement in the decisions of the journal, is one that bothers me. I recently heard of an Elsevier journal where almost all the editorial board would be happy to resign en masse and set up an ethical version, but the managing editor is strongly against. “But why don’t the rest of the board resign in that case?” I naively asked, to which the answer was, “Because he’s the one who does all the work!” From what I understood, this is literally true — the managing editor handles all the papers and makes all the decisions — but I’m not 100% sure about that.

Is there any point in starting new journals?

Probably major change, if it happens, will be the result of decisions made by major players such as government agencies, national negotiators, and so on. Compared with big events like the Elsevier negotiations in Germany, founding a new journal is a very small step. And even if all mathematicians gave up using the commercial publishers (not something I expect to see any time soon), that would have almost no direct effect, since mathematics journals are bundled together with journals in other subjects, which would continue with the current system.

However, this is a familiar situation in politics. Big decisions are taken by people in positions of power, but what prompts them to make those decisions is often the result of changes in attitudes and behaviour of voters. And big behavioural changes do happen in academia. For example, as we all know, many people have got into the habit of posting all their work on the arXiv, and this accumulation of individual decisions has had the effect of completely changing the way dissemination works in some subjects, including mathematics, a change that has significantly weakened the hold that journals have — or would have if they weren’t bundled together with other journals. Who would ever subscribe at vast expense to a mathematics journal when almost all its content is available online in preprint form?

So I see Advances in Combinatorics as a small step certainly, but a step that needs to be taken. I hope that it will demonstrate once again that starting a serious new journal is not that hard. I also hope that the current trickle of such journals will turn into a flood, that after the flood it will not be possible for people to argue that they are forced to submit articles to the commercial publishers, and that at some point, someone in a position of power will see what is going on, understand better the absurdities of the current system, and take a decision that benefits us all.

Tommaso DorigoModeling Issues Or New Physics ? Surprises From Top Quark Kinematics Study

Simulation, noun:
1. Imitation or enactment
2. The act or process of pretending; feigning.
3. An assumption or imitation of a particular appearance or form; counterfeit; sham.

Well, high-energy physics is all about simulations. 

We have a theoretical model that predicts the outcome of the very energetic particle collisions we create in the core of our giant detectors, but we only have approximate descriptions of the inputs to the theoretical model, so we need simulations. 

read more

June 09, 2018

ResonaancesDark Matter goes sub-GeV

It must have been great to be a particle physicist in the 1990s. Everything was simple and clear then. They knew that, at the most fundamental level, nature was described by one of the five superstring theories which, at low energies, reduced to the Minimal Supersymmetric Standard Model. Dark matter also had a firm place in this narrative, being identified with the lightest neutralino of the MSSM. This simple-minded picture strongly influenced the experimental program of dark matter detection, which was almost entirely focused on the so-called WIMPs in the 1 GeV - 1 TeV mass range. Most of the detectors, including the current leaders XENON and LUX, are blind to sub-GeV dark matter, as slow and light incoming particles are unable to transfer a detectable amount of energy to the target nuclei.

Sometimes progress consists in realizing that you know nothing Jon Snow. The lack of new physics at the LHC invalidates most of the historical motivations for WIMPs. Theoretically, the mass of the dark matter particle could be anywhere between 10^-30 GeV and 10^19 GeV. There are myriads of models positioned anywhere in that range, and it's hard to argue with a straight face that any particular one is favored. We now know that we don't know what dark matter is, and that we should better search in many places. If anything, the small-scale problem of the 𝞚CDM cosmological model can be interpreted as a hint against the boring WIMPS and in favor of light dark matter. For example, if it turns out that dark matter has significant (nuclear size) self-interactions, that can only be realized with sub-GeV particles. 
It takes some time for experiment to catch up with theory, but the process is already well in motion. There is some fascinating progress on the front of ultra-light axion dark matter, which deserves a separate post. Here I want to highlight the ongoing  developments in direct detection of dark matter particles with masses between MeV and GeV. Until recently, the only available constraint in that regime was obtained by recasting data from the XENON10 experiment - the grandfather of the currently operating XENON1T.  In XENON detectors there are two ingredients of the signal generated when a target nucleus is struck:  ionization electrons and scintillation photons. WIMP searches require both to discriminate signal from background. But MeV dark matter interacting with electrons could eject electrons from xenon atoms without producing scintillation. In the standard analysis, such events would be discarded as background. However,  this paper showed that, recycling the available XENON10 data on ionization-only events, one can exclude dark matter in the 100 MeV ballpark with the cross section for scattering on electrons larger than ~0.01 picobarn (10^-38 cm^2). This already has non-trivial consequences for concrete models; for example, a part of the parameter space of milli-charged dark matter is currently best constrained by XENON10.   

It is remarkable that so much useful information can be extracted by basically misusing data collected for another purpose (earlier this year the DarkSide-50 recast their own data in the same manner, excluding another chunk of the parameter space).  Nevertheless, dedicated experiments will soon  be taking over. Recently, two collaborations published first results from their prototype detectors:  one is SENSEI, which uses 0.1 gram of silicon CCDs, and the other is SuperCDMS, which uses 1 gram of silicon semiconductor.  Both are sensitive to eV energy depositions, thanks to which they can extend the search region to lower dark matter mass regions, and set novel limits in the virgin territory between 0.5 and 5 MeV.  A compilation of the existing direct detection limits is shown in the plot. As you can see, above 5 MeV the tiny prototypes cannot yet beat the XENON10 recast. But that will certainly change as soon as full-blown detectors are constructed, after which the XENON10 sensitivity should be improved by several orders of magnitude.
Should we be restless waiting for these results? Well, for any single experiment the chance of finding nothing are immensely larger than that of finding something. Nevertheless, the technical progress and the widening scope of searches offer some hope that the dark matter puzzle may be solved soon.