Planet Musings

June 19, 2018

Doug Natelson: Scientific American - what the heck is this?

Today, Scientific American ran this on their blogs page.  This article calls to mind weird mysticism stuff like crystal energy, homeopathy, and tree waves (a reference that attendees of mid-1990s APS meetings might get), and would not be out of place in Omni Magazine in about 1979.

I’ve written before about SciAm and their blogs.  My offer still stands, if they ever want a condensed matter/nano blog that I promise won’t verge into hype or pseudoscience.

Backreaction: Astrophysicists try to falsify multiverse, find they can’t.

Ben Carson, trying to make sense of the multiverse. The idea that we live in a multiverse – an infinite collection of universes from which ours is merely one – is interesting but unscientific. It postulates the existence of entities that are unnecessary to describe what we observe. All those other universes are inaccessible to experiment. Science, therefore, cannot say anything about their

John Baez: Applied Category Theory Course: Databases


In my online course we’re now into the third chapter of Fong and Spivak’s book Seven Sketches. Now we’re talking about databases!

To some extent this is just an excuse to (finally) introduce categories, functors, natural transformations, adjoint functors and Kan extensions. Great stuff, and databases are a great source of easy examples.

But it’s also true that Spivak helps run a company called Categorical Informatics that actually helps design databases using category theory! And his partner, Ryan Wisnesky, would be happy to talk to people about it. If you’re interested, click the link: he’s attending my course.

To read and join discussions on Chapter 3 go here:

Chapter 3

You can also do exercises and puzzles, and see other people’s answers to these.

Here are the lectures I’ve given so far:

Lecture 34 – Chapter 3: Categories
Lecture 35 – Chapter 3: Categories versus Preorders
Lecture 36 – Chapter 3: Categories from Graphs
Lecture 37 – Chapter 3: Presentations of Categories
Lecture 38 – Chapter 3: Functors
Lecture 39 – Chapter 3: Databases
Lecture 40 – Chapter 3: Relations
Lecture 41 – Chapter 3: Composing Functors
Lecture 42 – Chapter 3: Transforming Databases
Lecture 43 – Chapter 3: Natural Transformations
Lecture 44 – Chapter 3: Categories, Functors and Natural Transformations
Lecture 45 – Chapter 3: Composing Natural Transformations
Lecture 46 – Chapter 3: Isomorphisms
Lecture 47 – Chapter 3: Adjoint Functors
Lecture 48 – Chapter 3: Adjoint Functors

June 18, 2018

David Hogg: Dr Pearson

It was my great pleasure to sit on the PhD defense committee for the successful defense of Sarah Pearson. She wrote a thesis about low-mass galaxies and globular clusters, considering both their interactions with each other, and with the bigger galaxies into which they later fall. She has some nice analyses of the Palomar 5 tidal stream, and what its morphology might tell us about the Milky Way halo and bar. And also nice results on gas bridges and streams around pairs of dwarf galaxies.

I was most interested in her stellar-stream results, including several things I hadn't thought about before: One is that prograde streams are more affected by the bar and spiral arms in the disk than retrograde streams. Another is that we might be able to find globular-cluster streams around other galaxies nearby. That would be incredible! And since (as she showed) you can learn a lot about a galaxy just from the shape of a stream, we might not need to do much more than detect streams around other galaxies to learn a lot. It was a pleasure to serve on the committee, and it is a beautiful body of work.

John Baez: Cognition, Convexity, and Category Theory

Two more students in the Applied Category Theory 2018 school wrote a blog article about a paper they read:

• Tai-Danae Bradley and Brad Theilman, Cognition, convexity and category theory, The n-Category Café, 10 March 2018.

Tai-Danae Bradley is a mathematics PhD student at the CUNY Graduate Center and well-known math blogger. Brad Theilman is a grad student in neuroscience at the Gentner Lab at U. C. San Diego. I was happy to get to know both of them when the school met in Leiden.

In their blog article, they explain this paper:

• Joe Bolt, Bob Coecke, Fabrizio Genovese, Martha Lewis, Dan Marsden, and Robin Piedeleu, Interacting conceptual spaces I.

Fans of convex sets will enjoy this!

n-Category Café: ∞-Atomic Geometric Morphisms

Today’s installment in the ongoing project to sketch the $\infty$-elephant: atomic geometric morphisms.

Chapter C3 of Sketches of an Elephant studies various classes of geometric morphisms between toposes. Pretty much all of this chapter has been categorified, except for section C3.5 about atomic geometric morphisms. To briefly summarize the picture:

  • Sections C3.1 (open geometric morphisms) and C3.3 (locally connected geometric morphisms) are steps $n=-1$ and $n=0$ on an infinite ladder of locally $n$-connected geometric morphisms, for $-1 \le n \le \infty$. A geometric morphism between $(n+1,1)$-toposes is locally $n$-connected if its inverse image functor is locally cartesian closed and has a left adjoint. More generally, a geometric morphism between $(m,1)$-toposes is locally $n$-connected, for $n\lt m$, if it is “locally” locally $n$-connected on $n$-truncated maps.

  • Sections C3.2 (proper geometric morphisms) and C3.4 (tidy geometric morphisms) are likewise steps $n=-1$ and $n=0$ on an infinite ladder of $n$-proper geometric morphisms.

  • Section C3.6 (local geometric morphisms) is also step $n=0$ on an infinite ladder: a geometric morphism between $(n+1,1)$-toposes is $n$-local if its direct image functor has an indexed right adjoint. Cohesive toposes, which have attracted a lot of attention around here, are both locally $\infty$-connected and $\infty$-local. (Curiously, the $n=-1$ case of locality doesn’t seem to be mentioned in the 1-Elephant; has anyone seen it before?)

So what about C3.5? An atomic geometric morphism between elementary 1-toposes is usually defined as one whose inverse image functor is logical. This is an intriguing prospect to categorify, because it appears to mix the “elementary” and “Grothendieck” aspects of topos theory: geometric morphisms are arguably the natural morphisms between Grothendieck toposes, while logical functors are more natural for the elementary sort (where “natural” means “preserves all the structure in the definition”). So now that we’re starting to see some progress on elementary higher toposes (my post last year has now been followed by a preprint by Rasekh), we might hope to be able to make some progress on it.

Unfortunately, the definitions of elementary $(\infty,1)$-topos currently under consideration have a problem when it comes to defining logical functors. A logical functor between 1-toposes can be defined as a cartesian closed functor that preserves the subobject classifier, i.e. $F(\Omega) \cong \Omega$. The higher analogue of the subobject classifier is an object classifier — but note the switch from definite to indefinite article! For Russellian size reasons, we can’t expect to have one object classifier that classifies all objects, only a tower of “universes” each of which classifies some subcollection of “small” objects.

What does it mean for a functor to “preserve” the tower of object classifiers? If an $(\infty,1)$-topos came equipped with a specified tower of object classifiers (indexed by $\mathbb{N}$, say, or maybe by the ordinal numbers), then we could ask a logical functor to preserve them one by one. This would probably be the relevant kind of “logical functor” when discussing categorical semantics of homotopy type theory: since type theory does have a specified tower of universe types $U_0 : U_1 : U_2 : \cdots$, the initiality conjecture for HoTT should probably say that the syntactic category is an elementary $(\infty,1)$-topos that’s initial among logical functors of this sort.

However, Grothendieck $(\infty,1)$-topoi don’t really come equipped with such a tower. And even if they did, preserving it level by level doesn’t seem like the right sort of “logical functor” to use in defining atomic geometric morphisms; there’s no reason to expect such a functor to “preserve size” exactly.

What do we want of a logical functor? Well, glancing through some of the theorems about logical functors in the 1-Elephant, one result that stands out to me is the following: if $F:\mathbf{S}\to \mathbf{E}$ is a logical functor with a left adjoint $L$, then $L$ induces isomorphisms of subobject lattices $Sub_{\mathbf{E}}(A) \cong Sub_{\mathbf{S}}(L A)$. This is easy to prove using the adjointness $L\dashv F$ and the fact that $F$ preserves the subobject classifier:

$$Sub_{\mathbf{E}}(A) \cong \mathbf{E}(A,\Omega_{\mathbf{E}}) \cong \mathbf{E}(A,F \Omega_{\mathbf{S}}) \cong \mathbf{S}(L A,\Omega_{\mathbf{S}})\cong Sub_{\mathbf{S}}(L A).$$

What would be the analogue for $(\infty,1)$-topoi? Well, if we imagine hypothetically that we had a classifier $U$ for all objects, then the same argument would show that $L$ induces an equivalence between entire slice categories $\mathbf{E}/A \simeq \mathbf{S}/L A$. (Actually, I’m glossing over something here: the direct arguments with $\Omega$ and $U$ show only an equivalence between sets of subobjects and cores of slice categories. The rest comes from the fact that $F$ preserves local cartesian closure as well as the (sub)object classifier, so that we can enhance $\Omega$ to an internal poset and $U$ to an internal full subcategory and both of these are preserved by $F$ as well.)
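Spelled out, the hypothetical argument would presumably run along the following lines. This is only a sketch of the analogy: $U_{\mathbf{E}}$ and $U_{\mathbf{S}}$ stand for the imagined classifiers of all objects, which do not actually exist for the size reasons above, and $\mathrm{Core}$ (notation introduced here for illustration) denotes the maximal sub-$\infty$-groupoid of a slice category.

```latex
\mathrm{Core}(\mathbf{E}/A)
  \simeq \mathbf{E}(A, U_{\mathbf{E}})
  \simeq \mathbf{E}(A, F U_{\mathbf{S}})
  \simeq \mathbf{S}(L A, U_{\mathbf{S}})
  \simeq \mathrm{Core}(\mathbf{S}/L A)
```

The second step uses the assumption that $F$ preserves the classifier, and the third uses the adjunction $L \dashv F$, exactly as in the 1-categorical chain for subobjects.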

In fact, the converse is true too: reversing the above argument shows that $F$ preserves $\Omega$ if and only if $L$ induces isomorphisms of subobject lattices, and similarly $F$ preserves $U$ if and only if $L$ induces equivalences of slice categories. The latter condition, however, is something that can be said without reference to the nonexistent $U$. So if we have a functor $F:\mathbf{S}\to \mathbf{E}$ between $(\infty,1)$-toposes that has a left adjoint $L$, then I think it’s reasonable to define $F$ to be logical if it is locally cartesian closed and $L$ induces equivalences $\mathbf{E}/A \simeq \mathbf{S}/L A$.

Furthermore, a logical functor between 1-toposes has a left adjoint if and only if it has a right adjoint. (This follows from the monadicity of the powerset functor $P : \mathbf{E}^{op} \to \mathbf{E}$ for 1-toposes, which we don’t have an analogue of (yet) in the $\infty$-case.) In particular, if the inverse image functor in a geometric morphism is logical, then it automatically has a left adjoint, so that the above characterization of logical-ness applies. And since a logical functor is locally cartesian closed, this geometric morphism is automatically locally connected as well. This suggests the following:

Definition: A geometric morphism $p:\mathbf{E}\to \mathbf{S}$ between $(\infty,1)$-topoi is $\infty$-atomic if

  1. It is locally $\infty$-connected, i.e. $p^\ast$ is locally cartesian closed and has a left adjoint $p_!$, and
  2. $p_!$ induces equivalences of slice categories $\mathbf{E}/A \simeq \mathbf{S}/p_! A$ for all $A\in \mathbf{E}$.

This seems natural to me, but it’s very strong! In particular, taking $A=1$ we get an equivalence $\mathbf{E}\simeq \mathbf{E}/1 \simeq \mathbf{S}/p_! 1$, so that $\mathbf{E}$ is equivalent to a slice category of $\mathbf{S}$. In other words, $\infty$-atomic geometric morphisms coincide with local homeomorphisms!

Is that really reasonable? Actually, I think it is. Consider the simplest example of an atomic geometric morphism of 1-topoi that is not a local homeomorphism: $[G,Set] \to Set$ for a group $G$. The corresponding geometric morphism of $(\infty,1)$-topoi $[G,\infty Gpd] \to \infty Gpd$ is a local homeomorphism! Specifically, we have $[G,\infty Gpd] \simeq \infty Gpd / B G$. So in a sense, the difference between atomic and locally-homeomorphic vanishes in the limit $n\to \infty$.

To be sure, there are other atomic geometric morphisms of 1-topoi that do not extend to local homeomorphisms of $(\infty,1)$-topoi, such as $Cont(G) \to Set$ for a topological group $G$. But it seems reasonable to me to regard these as “1-atomic morphisms that are not $\infty$-atomic” — a thing which we should certainly expect to exist, just as there are locally 0-connected morphisms that are not locally $\infty$-connected, and 0-proper morphisms that are not $\infty$-proper.

We can also “see” how the difference gets “pushed off to $\infty$” to vanish, in terms of sites of definition. In C3.5.8 of the 1-Elephant it is shown that every atomic Grothendieck topos has a site of definition in which (among other properties) all morphisms are effective epimorphisms. If we trace through the proof, we see that this effective-epi condition comes about as the “dual” class to the monomorphisms that the left adjoint of a logical functor induces an equivalence on. Since an $(n+1,1)$-topos has classifiers for $n$-truncated objects, we would expect an atomic one to have a site of definition in which all morphisms belong to the dual class of the $n$-truncated morphisms, i.e. the $n$-connected morphisms. So as $n\to \infty$, we get stronger and stronger conditions on the morphisms in our site, until in the limit we have a classifier for all morphisms, and the morphisms in our site are all required to be equivalences. In other words, the site is itself an $\infty$-groupoid, and thus the topos of (pre)sheaves on it is a slice of $\infty Gpd$.

However, it could be that I’m missing something and this is not the best categorification of atomic geometric morphisms. Any thoughts from readers?

June 17, 2018

Backreaction: Physicists are very predictable

I have a new paper on the arXiv, which came out of a collaboration with Tobias Mistele and Tom Price. We fed a neural network with data about the publication activity of physicists and tried to make a “fake” prediction, for which we used data from the years 1996 up to 2008 to predict the next ten years. Data come from the arXiv via the Open Archives Initiative. To train the network, we took a

Backreaction: Science Magazine had my book reviewed, and it’s glorious, glorious.

Science Magazine had my book “Lost in Math” reviewed by Dr. Djuna Lize Croon, a postdoctoral associate at the Department of Physics at Dartmouth College, in New Hampshire, USA. Dr. Croon has worked on the very research topics that my book exposes as mathematical fantasies, such as “extra natural inflation” or “holographic composite Higgs models,” so choosing her to review my book is an

Doug Natelson: Water at the nanoscale

One reason the nanoscale is home to some interesting physics and chemistry is that the nanometer is a typical scale for molecules.   When the size of your system becomes comparable to the molecular scale, you can reasonably expect something to happen, in the sense that it should no longer be possible to ignore the fact that your system is actually built out of molecules.

Consider water as an example.  Water molecules have finite size (on the order of 0.2 nm between the hydrogens), a definite angled shape, and have a bit of an electric dipole moment (the oxygen has a slight excess of electron density and the hydrogens have a slight deficit).  In the liquid state, the water molecules are basically jostling around and have a typical intermolecular distance comparable to the size of the molecule.  If you confine water down to a nanoscale volume, you know at some point the finite size and interactions (steric and otherwise) between the water molecules have to matter.  For example, squeeze water down to a few molecular layers between solid boundaries, and it starts to act more like an elastic solid than a viscous fluid.  

Another consequence of this confinement in water can be seen in measurements of its dielectric properties - how charge inside rearranges itself in response to an external electric field.  In bulk liquid water, there are two components to the dielectric response.  The electronic clouds in the individual molecules can polarize a bit, and the molecules themselves (with their electric dipole moments) can reorient.  This latter contribution ends up being very important for dc electric fields, and as a result the dc relative dielectric permittivity of water, \(\kappa\), is about 80 (compared with 1 for the vacuum, and around 3.9 for SiO2).   At the nanoscale, however, the motion of the water molecules should be hindered, especially near a surface.  That should depress \(\kappa\) for nanoconfined water.

In a preprint on the arXiv this week, that is exactly what is found.  Using a clever design, water is confined in nanoscale channels defined by a graphite floor, hexagonal boron nitride (hBN) walls, and an hBN roof.  A conductive atomic force microscope tip is used as a top electrode, the graphite is used as a bottom electrode, and the investigators are able to see results consistent with \(\kappa\) falling to roughly 2.1 for layers about 0.6-0.9 nm thick adjacent to the channel floor and ceiling.  The result is neat, and it should provide a very interesting test case for attempts to model these confinement effects computationally.
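To get a feel for what a low-\(\kappa\) interfacial layer does to a whole channel, here is a rough series-capacitor estimate. It is a sketch under my own assumptions, not the analysis in the preprint: the channel is treated as two interfacial layers (taken as 0.75 nm thick, the midpoint of the quoted range, with \(\kappa = 2.1\)) in series with bulk-like water (\(\kappa = 80\)) in between. The function name and the channel heights are hypothetical.

```python
def effective_permittivity(height_nm, layer_nm=0.75, kappa_layer=2.1, kappa_bulk=80.0):
    """Effective out-of-plane permittivity of a water-filled channel.

    Assumed model (not taken from the post): one interfacial layer at the
    floor and one at the ceiling, each layer_nm thick with permittivity
    kappa_layer, in series with bulk-like water filling the rest.
    """
    interfacial = min(2.0 * layer_nm, height_nm)  # layers cannot exceed the channel
    bulk = height_nm - interfacial
    # Capacitors in series: the ratios thickness / permittivity add up.
    return height_nm / (interfacial / kappa_layer + bulk / kappa_bulk)


for h in (1.5, 5.0, 20.0, 100.0):
    print(f"h = {h:6.1f} nm  ->  kappa_eff = {effective_permittivity(h):5.1f}")
```

In this toy model the thin layers dominate: even a channel tens of nanometers tall ends up with an effective \(\kappa\) well below the bulk value of 80.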

June 16, 2018

Tommaso Dorigo: On The Residual Brightness Of Eclipsed Jovian Moons

While preparing for another evening of observation of Jupiter's atmosphere with my faithful 16" dobsonian scope, I found out that the satellite Io will disappear behind the Jovian shadow tonight. This is a quite common phenomenon and not a very spectacular one, but still quite interesting to look forward to during a visual observation - the moon takes some time to fully disappear, so it is fun to follow the event.
This however got me thinking. A fully eclipsed Jovian moon should still be able to reflect back some light picked up from the still lit other satellites - so it should not, after all, appear completely dark. Can a calculation be made of the effect? Of course - and it's not that difficult.


David Hogg: #TASI, day 9

Today was my last day and fifth lecture at TASI. This lecture was crowd-sourced in content! I spoke about Fisher information, linear algebra tips and tricks, and decision theory and model selection. On the latter I strongly advocated engineering methods like cross-validation!
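For readers who have not used cross-validation for model selection, here is a minimal sketch of the idea. It is not from the lecture: the toy data, the two candidate models (a constant and a straight line), and the function names are all made up for illustration.

```python
def fit_constant(xs, ys):
    """Model 1: predict the training mean everywhere."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_line(xs, ys):
    """Model 2: ordinary least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)


def cv_score(fit, xs, ys, k=5):
    """Mean squared prediction error on held-out points, over k interleaved folds."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        held_out = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        model = fit(train_x, train_y)
        total += sum((ys[i] - model(xs[i])) ** 2 for i in held_out)
    return total / n


# Toy data: a straight line plus a small deterministic wiggle standing in for noise.
xs = [i / 10 for i in range(30)]
ys = [1.0 + 2.0 * x + 0.1 * ((7 * i) % 3 - 1) for i, x in enumerate(xs)]

scores = {"constant": cv_score(fit_constant, xs, ys), "line": cv_score(fit_line, xs, ys)}
best = min(scores, key=scores.get)
print(scores, best)
```

The model with the lowest held-out error wins, so the comparison is about predictive performance and needs no prior over models.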

Over lunch I had a great set of conversations with Zach Berta-Thompson about precise measurement for exoplanets, and also hack weeks like the #GaiaSprint. We went deep into the limits on ultra-precise photometry from the ground. We wondered at the point that the best imaging systems get the best precision (on photometry, of point sources) by de-focusing. That has always struck me as somehow absurd, though it's true that you don't have to understand your system nearly so well when you are out of focus (for many reasons).

We had one very good idea: Instead of de-focusing, put in an objective prism! You could get many of the benefits of de-focus but also get far more information about the atmosphere and speckle and scintillation and so on. In principle, you might beat the best measurements made to date. And it is a cheap experiment to perform.

June 15, 2018

Matt von Hippel: Quelques Houches

For the last two weeks I’ve been at Les Houches, a village in the French Alps, for the Summer School on Structures in Local Quantum Field Theory.


To assist, we have a view of some very large structures in local quantum field theory

Les Houches has a long history of prestigious summer schools in theoretical physics, going back to the activity of Cécile DeWitt-Morette after the second world war. This was more of a workshop than a “school”, though: each speaker gave one talk, and they weren’t really geared for students.

The workshop was organized by Dirk Kreimer and Spencer Bloch, who both have a long track record of work on scattering amplitudes with a high level of mathematical sophistication. The group they invited was an even mix of physicists interested in mathematics and mathematicians interested in physics. The result was a series of talks that managed to both be thoroughly technical and ask extremely deep questions, including “is quantum electrodynamics really an asymptotic series?”, “are there simple graph invariants that uniquely identify Feynman integrals?”, and several talks about something called the Spine of Outer Space, which still sounds a bit like a bad sci-fi novel. Along the way there were several talks showcasing the growing understanding of elliptic polylogarithms, giving me an opportunity to quiz Johannes Broedel about his recent work.

While some of the more mathematical talks went over my head, they spurred a lot of productive dialogues between physicists and mathematicians. Several talks had last-minute slides, added as a result of collaborations that happened right there at the workshop. There was even an entire extra talk, by David Broadhurst, based on work he did just a few days before.

We also had a talk by Jaclyn Bell, a former student of one of the participants who was on a BBC reality show about training to be an astronaut. She’s heavily involved in outreach now, and honestly I’m a little envious of how good she is at it.

John Baez: Applied Category Theory 2018/2019

A lot happened at Applied Category Theory 2018. Even as it’s still winding down, we’re already starting to plan a followup in 2019, to be held in Oxford. Here are some notes Joshua Tan sent out:

  1. Discussions: Minutes from the discussions can be found here.
  2. Photos: Ross Duncan took some very glamorous photos of the conference, which you can find here.

  3. Videos: Videos of talks are online here: courtesy of Jelle Herold and Fabrizio Genovese.

  4. Next year’s workshop: Bob Coecke will be organizing ACT 2019, to be hosted in Oxford sometime spring/summer. There will be a call for papers.

  5. Next year’s school: Daniel Cicala is helping organize next year’s ACT school. Please contact him if you would like to get involved.

  6. Look forward to the official call for submissions, coming soon, for the first issue of Compositionality!

The minutes mentioned above contain interesting thoughts on these topics:

• Day 1: Causality
• Day 2: AI & Cognition
• Day 3: Dynamical Systems
• Day 4: Systems Biology
• Day 5: Closing

Jordan Ellenberg: The Lovasz number of the plane is about 3.48287

As seen in this comment on Polymath and explicated further in Fernando de Oliveira Filho’s thesis, section 4.4.

I actually spent much of today thinking about this so let me try to explain it in a down-to-earth way, because it involved me thinking about Bessel functions for the first time ever, surely a life event worthy of recording.

So here’s what we’re going to do.  As I mentioned last week, you can express this problem as follows:  suppose you have a map h: R^2 -> V, for some normed vector space V, which is a unit-distance embedding; that is, if |x-x’|_{R^2} = 1, then |h(x)-h(x’)|_V = 1.  (We don’t ask that h is an isometry, only that it preserves the distance-1 set.)

Then let t be the squared radius of the smallest hypersphere in V containing h(R^2).

Then any graph embeddable in R^2 with all edges of length 1 is sent to a unit-distance graph in V contained in that hypersphere; this turns out to be equivalent to saying the Lovasz number of G (ok, really I mean the Lovasz number of the complement of G) is at most 1/(1-2t).  So we want to show that t is bounded below 1/2, is the point.  Or rather:  we can find a V and a map from R^2 to V to make this the case.

So here’s one!  Let V be the space of L^2 functions on R^2 with the usual inner product.  Choose a square-integrable function F on R^2 — in fact let’s normalize to make F^2 integrate to 1 — and for each a in R^2 we let h(a) be the function F(x-a).

We want the distance between F(x-a) and F(x-b) to be the same for every pair of points at distance 1 from each other; the easiest way to arrange that is to insist that F(x) be a radially symmetric function F(x) = f(|x|); then it’s easy to see that the distance between F(x-a) and F(x-b) in V is a function G(a-b) which depends only on |a-b|.  We write

g(r) = \int_{\mathbf{R}^2} F(x)F(x-r) dx

so that the squared distance between F(x) and F(x-r) is

\int F(x)^2 dx - 2 \int F(x)F(x-r) dx + \int F(x-r)^2 dx = 2(1-g(r)).

In particular, if two points in R^2 are at distance 1, the squared distance between their images in V is 2(1-g(1)).  Note also that g(0) is the square integral of F, which is 1.
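If you want to see that identity in action, here is a quick numerical sanity check. It is a sketch under my own assumptions: F is taken to be a Gaussian (chosen only because its overlap integral has the closed form exp(-r^2/4) when the width is 1), and the integrals are plain Riemann sums on a grid. A Gaussian has g > 0 everywhere, so it is useless for the bound itself; it just illustrates the formula.

```python
import math

SIGMA = 1.0  # width of the Gaussian; an arbitrary illustrative choice


def F(x, y):
    """Radially symmetric Gaussian, normalized so that F^2 integrates to 1."""
    return math.exp(-(x * x + y * y) / (2 * SIGMA**2)) / (SIGMA * math.sqrt(math.pi))


def integrate(fn, half_width=8.0, step=0.1):
    """Riemann sum of fn over the square [-half_width, half_width]^2."""
    n = int(round(2 * half_width / step))
    total = 0.0
    for i in range(n + 1):
        x = -half_width + i * step
        for j in range(n + 1):
            y = -half_width + j * step
            total += fn(x, y)
    return total * step * step


r = 1.0  # shift along the x-axis; by radial symmetry the direction is irrelevant
norm = integrate(lambda x, y: F(x, y) ** 2)                   # should be 1
g_r = integrate(lambda x, y: F(x, y) * F(x - r, y))           # g(r)
dist_sq = integrate(lambda x, y: (F(x, y) - F(x - r, y)) ** 2)

print(norm, g_r, dist_sq, 2 * (1 - g_r))
```

The last two printed numbers should agree, which is the statement that the squared distance between F(x) and F(x-r) is 2(1-g(r)).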

What kind of hypersphere encloses all the points F(x-a) in V?  We can just go ahead and take the “center” of our hypersphere to be 0; since |F| = 1, every point in h(R^2) lies in (indeed, lies on) the sphere of radius 1 around the origin.

Hey but remember:  we want to study a unit-distance embedding of R^2 in V.  Right now, h sends unit distances to the squared distance 2(1-g(1)), whatever that is.  We can fix that by dividing h by the square root of that number.  So now h sends unit distances to unit distances, and its image is enclosed in a hypersphere of radius

1/\sqrt{2(1-g(1))}.

The more negative g(1) is, the smaller this sphere is, which means the more we can “fold” R^2 into a small space.  Remember, the relationship between hypersphere number and Lovasz theta is

2t + 1/\theta = 1

and plugging in the above bound for the hypersphere number, t = 1/(2(1-g(1))), we find that the Lovasz theta number of R^2, and thus the Lovasz theta number of any unit-distance graph in R^2, is at most

1/(1-2t) = 1 - 1/g(1).

So the only question is — what is g(1)?

Well, that depends on what g is.

Which depends on what F is.

Which depends on what f is.

And of course we get to choose what f is, in order to make g(1) as negative as possible.

How do we do this?  Well, here’s the trick.  The function G is not arbitrary; if it were, we could make g(1) whatever we wanted.  It’s not hard to see that G is what’s called a positive definite function on R^2.  And moreover, if G is positive definite, there exists some f giving rise to it.  (Roughly speaking, this is the fact that a positive definite symmetric matrix has a square root.)  So we ask:  if G is a positive definite (radially symmetric) function on R^2, and g(0) = 1, how small can g(1) be?

And now there’s an old theorem of (Wisconsin’s own!) Isaac Schoenberg which helpfully classifies the positive definite functions on R^2; they are precisely the functions G(x) = g(|x|) where g is a mixture of scalings of the Bessel function $J_0$:

g(r) = \int_0^\infty J_0(ur) A(u) du

for some everywhere nonnegative A(u).  (Actually it’s more correct to say that A is a distribution and we are integrating J_0(ur) against a non-decreasing measure.)

So g(1) can be no smaller than the minimum value of J_0 on [0,\infty), and in fact can be exactly that small if you let A become narrowly supported around the minimum argument.  This is basically just taking g to be a rescaled version of J_0 which achieves its minimum at 1.  That minimum value is about -0.4, and so the Lovasz theta for any unit-distance subgraph on the plane is bounded above by a number that’s about 1 + 1/0.4 = 3.5.
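Here is a small sketch that recovers the number in the title. The assumptions are mine: J_0 is evaluated from its power series, the minimum is located by a simple ternary search (function names are made up), and the bound is taken to be 1 - 1/g(1), which is what 2t + 1/\theta = 1 gives when t is read as the squared radius 1/(2(1-g(1))) of the rescaled sphere.

```python
import math


def bessel_j0(x, terms=60):
    """J_0(x) from its power series, sum_k (-1)^k (x^2/4)^k / (k!)^2."""
    q = x * x / 4.0
    term, total = 1.0, 1.0
    for k in range(1, terms):
        term *= -q / (k * k)
        total += term
    return total


def argmin_unimodal(fn, lo, hi, iterations=200):
    """Ternary search for the minimizer of a function with a single dip on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if fn(m1) < fn(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


# J_0 decreases from 1 at x = 0 to its first (and deepest) minimum, then rises
# until about x = 7, so the interval [0, 7] contains a single dip.
u_star = argmin_unimodal(bessel_j0, 0.0, 7.0)
g1 = bessel_j0(u_star)        # g(1) for the choice g(r) = J_0(u_star * r)
theta_bound = 1.0 - 1.0 / g1  # assumed form of the bound, as described above

print(u_star, g1, theta_bound)
```

The minimizing argument comes out near 3.83 and the minimum value near -0.4028, so the rounded "1 + 1/0.4 = 3.5" sharpens to roughly 3.4829.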

To sum up:  I give you a set of points in the plane, I connect every pair that’s at distance 1, and I ask how you can embed that graph in a small hypersphere keeping all the distances 1.  And you say:  “Oh, I know what to do, just assign to each point a the radially symmetrized Bessel function J_0(|x-a|) on R^2, the embedding of your graph in the finite-dimensional space of functions spanned by those Bessel translates will do the trick!”

That is cool!

Remark: Oliveira’s thesis does this for Euclidean space of every dimension (it gets more complicated).  And I think (using analysis I haven’t really tried to understand) he doesn’t just give an upper bound for the Lovasz number of the plane as I do in this post, he really computes that number on the nose.

Update:  DeCorte, Oliveira, and Vallentin just posted a relevant paper on the arXiv this morning!

John Baez: Dynamical Systems and Their Steady States


As part of the Applied Category Theory 2018 school, Maru Sarazola wrote a blog article on open dynamical systems and their steady states. Check it out:

• Maru Sarazola, Dynamical systems and their steady states, 2 April 2018.

She compares two papers:

• David Spivak, The steady states of coupled dynamical systems compose according to matrix arithmetic.

• John Baez and Blake Pollard, A compositional framework for reaction networks, Reviews in Mathematical Physics 29 (2017), 1750028.
(Blog article here.)

It’s great, because I’d never really gotten around to understanding the precise relationship between these two approaches. I wish I knew the answers to the questions she raises at the end!

John Baez: The Behavioral Approach to Systems Theory


Two more students in the Applied Category Theory 2018 school wrote a blog article about something they read:

• Eliana Lorch and Joshua Tan, The behavioral approach to systems theory, 15 June 2018.

Eliana Lorch is a mathematician based in San Francisco. Joshua Tan is a grad student in computer science at the University of Oxford and one of the organizers of Applied Category Theory 2018.

They wrote a great summary of this paper, which has been an inspiration to me and many others:

• Jan Willems, The behavioral approach to open and interconnected systems, IEEE Control Systems 27 (2007), 46–99.

They also list many papers influenced by it, and raise a couple of interesting problems with Willems’ idea, which can probably be handled by generalizing it.

n-Category Café: The Behavioral Approach to Systems Theory

guest post by Eliana Lorch and Joshua Tan

As part of the Applied Category Theory seminar, we discussed an article commonly cited as an inspiration by many papers[1] taking a categorical approach to systems theory, The Behavioral Approach to Open and Interconnected Systems. In this sprawling monograph for the IEEE Control Systems Magazine, legendary control theorist Jan Willems poses and answers foundational questions like how to define the very concept of mathematical model, gives fully-worked examples of his approach to modeling from physical first principles, provides various arguments in favor of his framework versus others, and finally proves several theorems about the special case of linear time-invariant differential systems.

In this post, we’ll summarize the behavioral approach, Willems’ core definitions, and his “systematic procedure” for creating behavioral models; we’ll also examine the limitations of Willems’ framework, and conclude with a partial reference list of Willems-inspired categorical approaches to understanding systems.

The behavioral approach

Here’s the view from 10,000 feet of the behavioral approach in contrast with the traditional signal-flow approach:

Image: Comparison table of traditional, functional approach vs Willems' relational approach

Willems’ approach breaks down into: (1) considering a dynamical system as a ‘behavior,’ and (2) defining interconnection as variable sharing.

Dynamical system as behavior

Willems goes so far as to claim: “It is remarkable that the idea of viewing a system in terms of inputs and outputs, in terms of cause and effect, kept its central place in systems and control theory throughout the 20th century. Input/output thinking is not an appropriate starting point in a field that has modeling of physical systems as one of its main concerns.”

To get a sense of the inappropriateness of input/output-based modeling: consider a freely swinging pendulum in 2-dimensional space with a finite-sized bob. Now consider adding to this system a model representing the right-hand half-plane being filled with cement. With soft-contact mechanics, we could determine what force the cement exerts on the pendulum when it bounces against it — that is, when the pendulum bob’s center of mass comes within its radius of the right half-plane.

Traditionally, we might define a function that takes the pendulum’s position as input and produces a force as output. But this is insufficient to model the effect of the wall, which also prevents the pendulum bob’s center of mass from ever being in the right-hand half-plane; the wall imposes a constraint on the possible states of the world. How can we capture this kind of constraint? In this case, we can extend the state model with inequalities delineating the feasible region of the state space.²

Willems’ insight is that the entire modeling framework can be subsumed by a sufficiently broad notion of “feasible region.” A dynamical system is simply a relation on the variables, forming a subset of all conceivable trajectories — a ‘behavior.’

Interconnection as variable sharing

The signal-flow approach to systems modeling requires labeling the terminals of a system as inputs or outputs before a model can be formulated; Willems argues this does not respect the actual physics. Most physical terminals (electrical wires, mechanical linkages or gears, thermal couplings, etc.) do not have an intrinsic, a priori directionality, and may permit “signals” to “flow” in either or both directions. Rather, physical interconnections constrain certain variables on either side to be equal (or equal-and-opposite). After having modeled a system, one may be able to prove that it obeys a certain partitioning of variables into inputs and outputs, but assuming this up front obscures the underlying physical reality. This paradigm shift amounts to moving from functional composition (given $a$ we compute $b=f(a)$, then $c=g(b)$) to relational composition: $(a,c)\in(R;S)$ iff $\exists ((a,b_0),(b_1,c))\in(R\times S).\;b_0=b_1$, which can be read as “variable sharing” between $b_0$ and $b_1$. This is a way of restoring symmetry to composition, giving no precedence either between the two entities being composed or between each entity’s domain and codomain.
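To make the contrast concrete, here is a minimal Python sketch of both kinds of composition on finite sets. The relations `R` and `S` and the helper names are invented for illustration; they do not come from Willems' article.

```python
from itertools import product

def compose_functions(f, g):
    """Functional composition: f's output is fed into g, so a direction is built in."""
    return lambda a: g(f(a))

def compose_relations(R, S):
    """Relational composition R;S by variable sharing.

    (a, c) is in R;S iff some (a, b0) in R and (b1, c) in S satisfy b0 == b1.
    """
    return {(a, c) for (a, b0), (b1, c) in product(R, S) if b0 == b1}

def converse(R):
    """Swap the two sides of a relation; neither side is privileged."""
    return {(b, a) for (a, b) in R}

# Toy relations (illustrative data only).
R = {(1, 10), (2, 10), (3, 30)}
S = {(10, "x"), (20, "y")}

RS = compose_relations(R, S)
```

Reading the composite backwards is just another relational composite: `converse(RS)` equals `compose_relations(converse(S), converse(R))`, whereas a composite function has no converse in general.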

Image: Examples of systems to model in various domains
Examples of systems to model within the behavioral framework.

Core definitions

Given some natural phenomenon we wish to model mathematically, the first step is to establish the universum, the set of all a priori feasible outcomes, notated $\mathbb{V}$. Then, Willems asserts a mathematical model to be a restriction of possible outcomes to a subset of $\mathbb{V}$.³ This subset itself is called the behavior of the model, and is written $\mathcal{B}$. This concept is, as the name suggests, at the center of Willems’ “behavioral approach”: he asserts that “equivalence of models, properties of models, model representations, and system identification must refer to the behavior.”

A dynamical system is a model in which elements of the universum $\mathbb{V}$ are functions of time, that is, a triple

$$\Sigma = \left(\mathbb{T}, \mathbb{W}, \mathcal{B}\right)$$

in which $\mathcal{B} \subseteq \mathbb{V} := \mathbb{W}^\mathbb{T}$. $\mathbb{T}$ is referred to as the time set (which may be discrete or continuous), and $\mathbb{W}$ is referred to as the signal space. The elements of $\mathcal{B}$ are trajectories $w: \mathbb{T}\rightarrow\mathbb{W}$.

A dynamical system with latent variables is one whose signal space is a Cartesian product of manifest variables and latent variables: the “full” system is a tuple

$$\Sigma_{full}= \left(\mathbb{T}, \mathbb{M}, \mathbb{L}, \mathcal{B}_{full}\right)$$

where the behavior $\mathcal{B}_{full}\subseteq \mathbb{V} := \left(\mathbb{M} \times \mathbb{L}\right)^\mathbb{T}$. Here $\mathbb{M}$ is the set of manifest values and $\mathbb{L}$ is the set of latent values.

A full behavior $\Sigma_{full}$ is said to induce or represent a manifest dynamical system $\Sigma=\left(\mathbb{T}, \mathbb{M}, \mathcal{B}\right)$, with the manifest behavior $\mathcal{B}$ defined by

$$\mathcal{B}:=\left\{m: \mathbb{T} \rightarrow \mathbb{M}\;|\;\exists \ell: \mathbb{T} \rightarrow \mathbb{L}. \left\langle m,\ell\right\rangle \in \mathcal{B}_{full}\right\}$$

The behavior of all the variables is determined by the equations specifying the first-principles physical laws, together with the equations expressing all the constraints from interconnection. Willems treats interconnection as simply “variable sharing,” that is, restricting behavior such that the trajectories assigned to the interconnected variables are constrained to be equal (or sometimes “equal and opposite,” depending on sign conventions).
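The definitions above can be tried out on a small finite example. The sketch below assumes a three-step discrete time set, binary manifest and latent values, and a made-up "toy law"; none of these come from Willems' article. It computes the manifest behavior by eliminating the latent variable, and treats interconnection of two systems that share all their variables as intersection of behaviors.

```python
from itertools import product

TIMES = (0, 1, 2)     # assumed finite, discrete time set T
MANIFEST = (0, 1)     # assumed manifest values M
LATENT = (0, 1)       # assumed latent values L

def universum(values, times):
    """All conceivable trajectories times -> values, encoded as tuples."""
    return set(product(values, repeat=len(times)))

def toy_law(w):
    """Hypothetical law: the latent value toggles every step and m(t) == l(t)."""
    tracks = all(m == l for (m, l) in w)
    toggles = all(w[t][1] != w[t + 1][1] for t in range(len(w) - 1))
    return tracks and toggles

# Full behavior: a subset of the universum (M x L)^T.
full_universum = universum(tuple(product(MANIFEST, LATENT)), TIMES)
B_full = {w for w in full_universum if toy_law(w)}

def manifest_behavior(full_behavior):
    """Keep a manifest trajectory m iff some latent l has (m, l) in the full behavior."""
    return {tuple(m for (m, _) in w) for w in full_behavior}

def interconnect(b1, b2):
    """Share every variable of two behaviors: keep what both systems allow."""
    return b1 & b2

B = manifest_behavior(B_full)
constant = {w for w in universum(MANIFEST, TIMES) if len(set(w)) == 1}
```

Here `B` contains only the two alternating trajectories, and interconnecting it with the behavior of constant trajectories leaves nothing: the two sets of laws are incompatible.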

An interconnection architecture is a sort of wiring diagram (analogous to operadic wiring diagrams) that describes the way in which a collection of systems is interconnected. Willems formalises this as a graph with leaves: a set $V$ of vertices (which are systems or modules), a set $E$ of edges (terminals), and a set $L$ of leaves (open wires), with an assignment map $\mathcal{A}$: to each edge an unordered pair of vertices and to each leaf a single vertex. A leaf is depicted as an open half-edge emanating from the graph, like a wire sticking out of a circuit: the “open” part of “open and interconnected systems.” (Note that this interpretation of a “leaf” differs from the usual graph-theoretic “vertex with degree one”; here, a leaf is like an “edge with degree one.”) There’s a type-checking condition that the set of leaves and internal half-edges emanating from a vertex must be completely matched to the set of terminals of the associated module, so that you don’t have hidden dangling wires.
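A graph with leaves is easy to write down as a data structure. The following Python sketch uses hypothetical names (`Architecture`, `type_checks`) and simplifies the type-checking condition to a count: it only checks that each vertex has as many half-edges as its module has terminals, whereas the condition described above asks for an actual matching between the two.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Architecture:
    vertices: set   # V: systems or modules
    edges: dict     # E: edge name -> unordered pair of vertices, stored as a 2-tuple
    leaves: dict    # L: leaf name -> the single vertex it is attached to

def half_edge_counts(arch):
    """Count internal half-edges plus leaves emanating from each vertex."""
    counts = Counter({v: 0 for v in arch.vertices})
    for u, v in arch.edges.values():
        counts[u] += 1
        counts[v] += 1
    for v in arch.leaves.values():
        counts[v] += 1
    return counts

def type_checks(arch, terminal_counts):
    """Simplified check: no module has more or fewer terminals than half-edges."""
    counts = half_edge_counts(arch)
    return all(counts[v] == terminal_counts[v] for v in arch.vertices)

# Three modules in a chain, with one open wire at each end (illustrative only).
arch = Architecture(
    vertices={"a", "b", "c"},
    edges={"e1": ("a", "b"), "e2": ("b", "c")},
    leaves={"l1": "a", "l2": "c"},
)
```

With two terminals per module the chain type-checks; if module `b` were declared with three terminals, one of them would be a hidden dangling wire and the check would fail.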

Finally, the interconnection architecture requires a module embedding that specifies how each vertex is interpreted as a module: either a primitive model made of physical laws, or a sub-model within which there’s a further module embedding. Here we get a sense of the “zooming” nature of the modeling procedure.

Tearing, Zooming, Linking

Willems proposes a “systematic procedure” for generating models in this behavioral form: first decompose (tear) the system under investigation into smaller subsystems, then recursively apply the modeling process (zoom in) to each subsystem, and finally compose (link) the resulting submodels together into an overall system model. Rendered in pseudocode, it looks like this:

define makeModel(System system) => Model {
  if (system is directly governed by known physics) {
    return knownModel(system)
  } else {
    WiringDiagram<System> decomposition := tear(system)
    List<Model> submodels := decomposition.listSubsystems().fmap(makeModel)
    // assumed completion: link composes the submodels along the decomposition
    return link(decomposition, submodels)
  }
}

As an example, Willems analyzes an open hydraulic system made of two tanks interconnected by a pipe:

Image: Two tanks interconnected by a pipe

In the “tear” step, he breaks the system apart into three subsystems: the two tanks, (1) and (3) in the figure, and the pipe (2). In the “zoom” step, each of the three subsystems is “simple enough to be modeled using first-principles physical laws,” so he fills in the known model for each one (reaching the recursive base case, rather than starting again from “tear”). For the pipe, flows on each end are equal and opposite, and the difference in pressure is proportional to the flow; for each of the tanks, conservation of mass and Bernoulli’s laws relate the pressures, flows, and height of water in the tank.

Then, in the “link” step, he starts with, for each subsystem, a copy of the corresponding model (initially each relating a completely separate set of variables), then combines the models according to the links between subsystems, using the appropriate “interconnection laws” for each pair of connected terminals. In this example, the interconnection laws consist of setting connected pressures to be equal and connected flows to be equal and opposite.
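The three steps can be sketched as a runnable Python toy in which a model is just a list of constraint strings. Everything here is an assumption made for illustration: the `System` class, the terminal names such as `1b` and `2a`, the constant `alpha`, and the placeholder strings standing in for the tank laws are not taken from Willems' article.

```python
from dataclasses import dataclass

@dataclass
class System:
    name: str
    laws: tuple = ()    # non-empty when first-principles laws are known
    parts: tuple = ()   # subsystems obtained by tearing
    links: tuple = ()   # pairs of connected terminal names

def link(submodels, links):
    """Link: collect the submodels and add interconnection laws per connection."""
    constraints = [c for model in submodels for c in model]
    for a, b in links:
        constraints.append(f"p_{a} = p_{b}")        # connected pressures are equal
        constraints.append(f"f_{a} + f_{b} = 0")    # connected flows are equal and opposite
    return constraints

def make_model(system):
    """Tear, zoom, link: recurse until every part has known laws."""
    if system.laws:                                            # base case
        return list(system.laws)
    submodels = [make_model(part) for part in system.parts]   # tear, then zoom
    return link(submodels, system.links)

tank1 = System("tank 1", laws=("conservation of mass (tank 1)", "Bernoulli's law (tank 1)"))
pipe = System("pipe", laws=("f_2a + f_2b = 0", "p_2a - p_2b = alpha * f_2a"))
tank2 = System("tank 2", laws=("conservation of mass (tank 2)", "Bernoulli's law (tank 2)"))
plant = System("two tanks", parts=(tank1, pipe, tank2), links=(("1b", "2a"), ("2b", "3a")))

model = make_model(plant)
```

The resulting `model` holds the six subsystem laws plus four interconnection laws, two for each of the two connections.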

An essential claim of Willems’ philosophy is that for physical systems for which we can use his modeling procedure of breaking systems into subsystems with interconnections, the hierarchical structure of our model will match reality closely enough that there will be straightforward physical principles governing the interactions at the interfaces (which tend to correspond to partitions of physical space). Many engineered systems deliberately have their important interfaces clearly delineated, but he explicitly acknowledges that there are forms of interaction, such as gravitational interaction and colliding objects, which do not fit this framework perfectly.

Limitations of Willems’ framework

Not all interconnections fit. This framework assumes that the interface of a module can be specified—as a finite set of terminals—prior to composition with other modules, and Willems identifies three situations where this assumption fails:

  • $n$-body problems exhibiting “virtual terminals” which are not so much a property of each module, but of each pair of modules. The classic example of this phenomenon is the $n$-body problem in gravitational (or electrostatic, etc.) dynamics: an $n$-body system has $O(n^2)$ interactions, but the combination of an $n$-body system and an $m$-body system has more than $O(n^2+m^2)$ interactions.

  • “Distributed interconnections” in which a terminal has continuous spatial extent (e.g. heat conduction along a surface), calling for partial differential equations involving coordinate variables.

  • Contact mechanics such as rolling, sliding, bouncing, collisions, etc., in which interconnections appear and vanish depending on the values of certain position variables, as objects come into and out of contact.
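The counting behind the first item can be made explicit. As a small sketch (the function names are ours), assume every unordered pair of bodies interacts: combining an $n$-body and an $m$-body system then creates $nm$ interactions that belong to neither subsystem, which is why no fixed, finite interface declared in advance can account for them.

```python
def pairwise_interactions(n):
    """Number of unordered pairs among n bodies."""
    return n * (n - 1) // 2

def cross_interactions(n, m):
    """Interactions in the combined system that are internal to neither subsystem."""
    return (pairwise_interactions(n + m)
            - pairwise_interactions(n)
            - pairwise_interactions(m))
```

For example, a 3-body and a 4-body system have 3 and 6 internal interactions, while the combined 7-body system has 21, leaving 12 = 3 × 4 cross interactions.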

Directional components and systems. In contrast to an a posteriori partitioning of variables into inputs and outputs (meaning that any setting of the “inputs” uniquely determines the trajectories of the “outputs”), some components fundamentally exhibit a priori input/output behavior (that is, they cannot be back-driven), and Willems’ framework can’t accommodate these.

  • Ideal amplifier. The behavior of an ideal amplifier with gain $K$, input $x$ and output $y$ would be $\left\{ (x,y) \mid y=K x\right\}$ (constant-gain model), yet Willems’ approach here would make the incorrect prediction that we could back-drive terminal $y$ by interconnecting it to a signal source and expect to observe the signal scaled by $1/K$ at terminal $x$. However, an “ideal amplifier” is not a first-principles physical law; the modeling procedure might suggest we “tear” an amplifier further into its component parts, and then “tear” the constituent transistors with a deconstruction (such as the Gummel-Poon model) into passive circuit primitives. This might result in a more realistic model of actual amplifier behavior, though it would have at least an order of magnitude more components than the constant-gain model.

  • Humans, etc. Willems lists additional signal-flow systems for which the behavioral approach is not quite adequate: actuator inputs and sensor outputs interconnected via a controller, reactions of humans or animals, devices that respond to external commands, or switches and other logical devices.

Cartesian latent/manifest partitioning. Among Willems’ arguments against mandatory input/output partitioning is the simple and compelling example of a particle moving on a sphere, whose position is truly an output and whose velocity is truly an input—yet even in this seemingly favorable setup, the full state space (the tangent bundle of the sphere) cannot be decomposed as a Cartesian product of positions on the sphere with any vector space. However, Willems uses exactly the same hidden assumption in his definition of a dynamical system with latent variables. If a dynamical system’s full state space can’t be written as a Cartesian product, then its behavior can’t be represented in the way Willems defines.

Probability. Willems’ non-deterministic approach to behaviors is a kind of unquantified uncertainty; it doesn’t natively give us a way of associating probabilities with elements of a “behavior” (although behaviors could be considered as always having an implicitly uniform distribution). Nontrivial distributions could also be modeled by defining the universum $\mathbb{V}:=P\!\left(\mathbb{W}^\mathbb{T}\right)$ (where $P$ is probability, namely the Giry monad), but the non-determinism of $\mathcal{B}\subseteq \mathbb{V}$ introduces “Knightian uncertainty”, in that models are now sets of distributions, with no probabilities specified at the top level, and it’s unclear how such models should compose with non-stochastic models.

Categorical approaches to systems

As mentioned earlier, Willems has been an inspiration for many papers in applied category theory. One common feature is that many take a relational approach to semantics, providing a functor into (some subcategory of) $\mathbf{Rel}_\times$. Here is a (non-exhaustive!) reference list of these and related works.

Applications to specific domains:

  • Passive linear networks: Baez and Fong, 2015. Constructs a “black-boxing” functor from a decorated-cospan category of passive linear circuits (composed of resistors, inductors and capacitors) to a behavior category of Lagrangian relations.

  • Generalized circuit networks: Baez, Coya and Rebro, 2018. Generalizes the black-boxing functor to potentially nonlinear components/circuits.

  • Reversible Markov processes: Baez, Fong and Pollard, 2016. Constructs a “black-boxing” functor from a decorated-cospan category of reversible Markov processes to a category of linear relations describing steady states.

  • Petri nets / reaction networks: Baez and Pollard, 2017. Constructs a “black-boxing” functor from a decorated-cospan category of Petri nets to a category of semi-algebraic relations describing steady states, with an intermediate stop at a “grey-box” category of algebraic vector fields.

  • Digital circuits: Ghica, Jung and Lopez, 2017. Defines a symmetric monoidal theory of circuits including a discrete-delay operator and feedback, with operational semantics.

  • Discrete linear time-invariant dynamical systems (LTIDS): Fong, Sobocinski, and Rapisarda, 2016. Constructs a full and faithful functor from a freely generated symmetric monoidal theory into the PROP of LTIDSs and characterizes controllability in terms of spans, among other things—not only using Willems’ definitions and philosophy, but even some of his theorems.

General frameworks:

  • Algebra of Open and Interconnected Systems: Brendan Fong’s 2016 doctoral thesis. Covers his technique of decorated cospans as well as more recent work on decorated corelations, both of which are especially useful to construct syntactic categories for various kinds of non-categorical diagrams.

  • Topos of behavior types: Schultz and Spivak, 2017. Constructs a temporal type theory, as the internal logic of the category of sheaves on an interval domain, in which every object represents a behavior that seems to be essentially in Willems’ sense of the word.

  • Operad of wiring diagrams: Vagner, Spivak and Lerman, 2015. Formalises construction of systems of differential equations on manifolds using Spivak’s “operad of wiring diagrams” approach to composition, which is conceptually similar to Willems’ notion of hierarchical zooming into modules.

  • Signal flow graphs: Bonchi, Sobocinski and Zanasi, 2017. Sound and complete axiomatization of signal flow graphs, arguably the primary incumbent against which Willems’ behavioral approach contends.

  • Bond graphs: Brandon Coya, 2017. Defines a category of bond graphs (an older general modeling framework which Willems acknowledges as a step in the right direction toward a behavioral approach) with functorial semantics as Lagrangian relations.

  • Cospan/Span(Graph): Gianola, Kasangian and Sabadini in 2017 review a line of work mostly done in the 90’s by Sabadini, Walters and collaborators on what are essentially open and interconnected labeled-transition systems.

Many thanks to Pawel Sobocinski and Brendan Fong for feedback on this post, and to Sophie Raynor and other members of the seminar for thoughts and discussions.

1 see e.g. Spivak and Schultz’s Temporal type theory; Fong, Sobocinski, and Rapisarda’s Categorical approach to open and interconnected dynamical systems; Bonchi, Sobocinski, and Zanasi’s Categorical semantics of signal flow graphs

2 Depending on the formalisation, a system of differential equations could technically contain equality constraints without any actual derivatives, such as $x^2 + y^2 - 1 = 0$, which can restrict the feasible region without augmenting the modeling framework to include inequalities. We could even impose the constraint $x\leq 0$ by using a non-analytic function: $0 = \begin{cases}e^{-1/x}& \text{if } x > 0\\ 0& \text{if } x\leq 0\end{cases}$

3 Note: From a computer science perspective, this says that any “mathematical model” must be a non-deterministic model, as opposed to, on the one hand, a deterministic model (which would pick out an element of $\mathbb{V}$), or, on the other hand, a probabilistic model (which would give a distribution over $\mathbb{V}$). If we are given free choice of $\mathbb{V}$, any of these kinds of model is encodable as any other, but the choice is significant when it comes to composition.

June 14, 2018

David Hogg#TASI, day 8

Today my lecture was crowd-sourced! In response to popular opinions from the students, I spoke about cosmological large-scale structure experiments. I spoke about how the large surveys are collapsed to symmetry-respecting mean, variance, and three-point functions, and how simulations of large-scale structure are used to build surrogate likelihood functions for these summary statistics.

BackreactionLost in Math: Out Now.

Today is the official publication date of my book “Lost in Math: How Beauty Leads Physics Astray.” There’s an interview with me in the current issue of “Der Spiegel” (in German) with a fancy photo, and an excerpt at Scientific American. In the book I don’t say much about myself or my own research. I felt that was both superfluous and not particularly interesting. However, a lot of people have

Jordan EllenbergRebecca Dallet and the gerrymandered Assembly map

The fate of the current Wisconsin Assembly district map, precision-engineered to maintain a Republican majority in the face of anything short of a major Democratic wave election, is in the hands of the Supreme Court, which could announce a decision in Gill v. Whitford any day.

One theory of gerrymandering is that the practice isn’t much of a problem, because the power of a gerrymandered map “decays” with time — a map that suits a party in 2010 may, due to shifting demographics, be reasonably fair a few years later.

How’s the Wisconsin gerrymander doing in 2018?  We just had a statewide election in which Rebecca Dallet, the more liberal candidate, beat her conservative rival by 12 points, an unusually large margin for a Wisconsin statewide race.

The invaluable J. Miles Coleman broke the race down by Assembly district:

Dallet won in 58% of seats while getting 56% of the vote.  That sounds fair, but in fact a candidate who wins by 12 points is typically going to win in more seats than that.  (That’s why the courts are right to say proportional representation isn’t a reasonable expectation!)

Here’s the breakdown by Assembly district, shown a little bigger:

Dallet won by 2 points or less in 8 of the Assembly districts.  So, as a rough estimate, if she’d gotten 2% of the vote less, and won 54-46 instead of 56-44, you might guess she’d have won 49 out of 99 seats.  That’s consistent with the analysis of Herschlag, Ravier, and Mattingly conducted last year, which estimates that under current maps Democrats would need an 8-12 point statewide lead in order to win half the Assembly seats. (Figure 5 in the linked paper.)
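The back-of-the-envelope estimate can be written out as a few lines of Python. The figure of 57 seats is inferred from "58% of seats" out of 99 and is not stated directly above; the sketch also assumes a uniform 2-point swing, which flips exactly the districts won by 2 points or less.

```python
TOTAL_SEATS = 99
seats_won = round(0.58 * TOTAL_SEATS)   # 58% of 99 seats, about 57 (inferred)
close_wins = 8                          # districts won by 2 points or less
seats_after_swing = seats_won - close_wins
has_majority = seats_after_swing > TOTAL_SEATS / 2
```

Under these assumptions `seats_after_swing` is 49, one short of the 50 needed for a majority, even with a 54-46 statewide win.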

I don’t think the gerrymander is decaying very much.  I think it’s robust enough to make GOP legislative control very likely through 2020, at which point it can be updated to last another ten years, and so on and so on.  This isn’t the same kind of softcore gerrymandering the Supreme Court allowed to stand in 1986, and I hope the 2018 Supreme Court decides to do something about it.

June 13, 2018

David Hogg#TASI, day 7

Today was my second day of lecturing at TASI, and I gave one (morning) lecture, on the use of MCMC sampling. In the afternoon, I looked at (for the first time) the GALAH data on detailed element abundances of stars. I looked at the question of whether the chemical abundances could be used to predict the Galactocentric radii. The idea is: If the gas involved in star formation is azimuthally mixed, there ought to be relationships between radius in the disk and chemical abundances. They didn't jump out! I have various ideas about why, but for now this will be back-burner.

David Hogg#TASI, day 6

Today was the sixth day (but my first day) of the Theory Advanced Study Institute summer school at CU Boulder. I gave two 75-minute lectures on data analysis, my first two of five lectures this week. In the first, I tried to boil down data-analysis to a set of over-arching principles. I got 8 principles. Maybe this is the introduction to the book I will never write! In the second lecture I spoke about fitting a model, from a frequentist perspective, but with a focus on the likelihood function. I am loving the interactive audience. See the wiki for a (constantly updating) description of the lectures I am giving.

n-Category Café Fun for Everyone

There’s been a lot of progress on the ‘field with one element’ since I discussed it back in “week259”. I’ve been starting to learn more about it, and especially its possible connections to the Riemann Hypothesis. This is a great place to start:

Abstract. This text serves as an introduction to $\mathbb{F}_1$-geometry for the general mathematician. We explain the initial motivations for $\mathbb{F}_1$-geometry in detail, provide an overview of the different approaches to $\mathbb{F}_1$ and describe the main achievements of the field.

Lorscheid’s paper describes various approaches. Since I’m hoping the key to $\mathbb{F}_1$-mathematics is something very simple and natural that we haven’t fully explored yet, I’m especially attracted to these:

  • Deitmar’s approach, which generalizes algebraic geometry by replacing commutative rings with commutative monoids. A lot of stuff in algebraic geometry, like the ideals and spectra of commutative rings, or the theory of schemes, doesn’t really require the additive aspect of a ring! So, for many purposes we can get away with commutative monoids, where we think of the monoid operation as multiplication. Sometimes it’s good to use commutative monoids equipped with a ‘zero’ element. The main problem is that algebraic geometry without addition seems to be approximately the same as toric geometry: a wonderful subject, but not general enough to handle everything we want from schemes over $\mathbb{F}_1$.

  • Toën and Vaquié’s approach, which goes further and replaces commutative rings by commutative monoid objects in symmetric monoidal categories (which work best when they’re complete, cocomplete and cartesian closed). If our symmetric monoidal category is $(\mathbf{AbGp}, \otimes)$ we’re back to commutative rings, if it’s $(\mathbf{Set}, \times)$ we’ve got commutative monoids, but there are plenty of other nice choices: for example if it’s $(\mathbf{CommMon}, \otimes)$ we get commutative rigs, which are awfully nice.

One can also imagine ‘homotopy-coherent’ or ‘$\infty$-categorical’ analogues of these two approaches, which might provide a good home for certain ways the sphere spectrum shows up in this business as a substitute for the integers. For example, one could imagine that the ultimate replacement for a commutative ring is an $E_\infty$ algebra inside a symmetric monoidal $(\infty,1)$-category.

However, it’s not clear to me that homotopical thinking is the main thing we need to penetrate the mysteries of $\mathbb{F}_1$. There seem to be some other missing ideas….

Lorscheid’s own approach uses ‘blueprints’. A blueprint $(R,S)$ is a commutative rig $R$ equipped with a subset $S \subseteq R$ that’s closed under multiplication, contains $0$ and $1$, and generates $R$ as a rig.
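To get a feel for the definition, here is a small Python sketch that checks the three conditions in one finite family of examples chosen for illustration: the rig $R = \mathbb{Z}/n \times \mathbb{Z}/n$ with componentwise operations (for $n \geq 2$). The function names are ours. Because $S$ is closed under multiplication and contains $1$, the sub-rig it generates is just the set of finite sums of elements of $S$, so "generates $R$ as a rig" reduces to an additive reachability check.

```python
from itertools import product

def add(a, b, n):
    """Componentwise addition in Z/n x Z/n."""
    return ((a[0] + b[0]) % n, (a[1] + b[1]) % n)

def mul(a, b, n):
    """Componentwise multiplication in Z/n x Z/n."""
    return ((a[0] * b[0]) % n, (a[1] * b[1]) % n)

def additive_span(S, n):
    """All finite sums of elements of S (the empty sum is zero)."""
    zero = (0, 0)
    reached, frontier = {zero}, [zero]
    while frontier:
        r = frontier.pop()
        for s in S:
            t = add(r, s, n)
            if t not in reached:
                reached.add(t)
                frontier.append(t)
    return reached

def is_blueprint(S, n):
    """Check whether (Z/n x Z/n, S) satisfies the blueprint conditions."""
    S = set(S)
    R = set(product(range(n), repeat=2))
    contains_0_and_1 = (0, 0) in S and (1, 1) in S
    closed = all(mul(a, b, n) in S for a in S for b in S)
    generates = additive_span(S, n) == R
    return S <= R and contains_0_and_1 and closed and generates
```

With $n = 3$, the subset $\{0, 1\}$ fails because it only generates the diagonal, while adding the idempotents $(1,0)$ and $(0,1)$ gives a blueprint.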

I have trouble, just on general aesthetic grounds, believing that blueprints are the final, ultimate solution to the quest for a generalization of commutative rings that can handle the “field with one element”. They just don’t seem ‘god-given’ the way commutative monoids or commutative objects are. But they do various nice things.

Maybe someone has answered this already, since it’s a kind of obvious question:

Question. Is there a symmetric monoidal category $\mathbf{C}$ in which blueprints are the commutative monoid objects?

Maybe something like the category of ‘abelian groups equipped with a set of generators’?

Of course you should want to know what morphisms of blueprints are, because really we should want the category of commutative monoid objects in $\mathbf{C}$ to be equivalent to the category of blueprints. Luckily Lorscheid’s morphisms of blueprints are the obvious thing: a morphism $f : (R,S) \to (R',S')$ is a morphism of commutative rigs $f: R \to R'$ with $f(S) \subseteq S'$.

Anyway, there’s a lot more to say about $\mathbb{F}_1$, but Lorscheid’s paper is a great way to get into this subject.

June 11, 2018

BackreactionAre the laws of nature beautiful? (2nd book trailer)

Here is the other video trailer for my book "Lost in Math: How Beauty Leads Physics Astray".  Since I have been asked repeatedly, let me emphasize again that the book is aimed at non-experts, or "the interested lay-reader" as they say. You do not need to bring any background knowledge in math or physics and, no, there are no equations in the book, just a few numbers. It's really about the

June 10, 2018

Sean CarrollIntro to Cosmology Videos

In completely separate video news, here are videos of lectures I gave at CERN several years ago: “Cosmology for Particle Physicists” (May 2005). These are slightly technical — at the very least they presume you know calculus and basic physics — but are still basically accurate despite their age.

  1. Introduction to Cosmology
  2. Dark Matter
  3. Dark Energy
  4. Thermodynamics and the Early Universe
  5. Inflation and Beyond

Update: I originally linked these from YouTube, but apparently they were swiped from this page at CERN, and have been taken down from YouTube. So now I’m linking directly to the CERN copies. Thanks to commenters Bill Schempp and Matt Wright.

Tim GowersA new journal in combinatorics

This post is to announce that a new journal, Advances in Combinatorics, has just opened for submissions. I shall also say a little about the journal, about other new journals, about my own experiences of finding journals I am happy to submit to, and about whether we are any nearer a change to more sensible systems of dissemination and evaluation of scientific papers.

Advances in Combinatorics

Advances in Combinatorics is set up as a combinatorics journal for high-quality papers, principally in the less algebraic parts of combinatorics. It will be an arXiv overlay journal, so free to read, and it will not charge authors. Like its cousin Discrete Analysis (which has recently published its 50th paper) it will be run on the Scholastica platform. Its minimal costs are being paid for by the library at Queen’s University in Ontario, which is also providing administrative support. The journal will start with a small editorial board. Apart from me, it will consist of Béla Bollobás, Reinhard Diestel, Dan Kral, Daniela Kühn, James Oxley, Bruce Reed, Gabor Sarkozy, Asaf Shapira and Robin Thomas. Initially, Dan Kral and I will be the managing editors, though I hope to find somebody to replace me in that role once the journal is established. While I am posting this, Dan is simultaneously announcing the journal at the SIAM conference in Discrete Mathematics, where he has just given a plenary lecture. The journal is also being announced by COAR, the Confederation of Open Access Repositories. This project aligned well with what they are trying to do, and it was their director, Kathleen Shearer, who put me in touch with the library at Queen’s.

As with Discrete Analysis, all members of the editorial board will be expected to work: they won’t just be lending their names to give the journal bogus prestige. Each paper will be handled by one of the editors, who, after obtaining external opinions (when the paper warrants them) will make a recommendation to the rest of the board. All decisions will be made collectively. The job of the managing editors will be to make sure that this process runs smoothly, but when it comes to decisions, they will have no more say than any other editor.

The rough level that the journal is aiming at is that of a top specialist journal such as Combinatorica. The reason for setting it up is that there is a gap in the market for an “ethical” combinatorics journal at that level — that is, one that is not published by one of the major commercial publishers, with all the well known problems that result. We are not trying to destroy the commercial combinatorial journals, but merely to give people the option of avoiding them if they would prefer to submit to a journal that is not complicit in a system that uses its monopoly power to ruthlessly squeeze library budgets.

We are not the first ethical journal in combinatorics. Another example is The Electronic Journal of Combinatorics, which was set up by Herb Wilf back in 1994. The main difference between EJC and Advances in Combinatorics is that we plan to set a higher bar for acceptance, even if it means that we accept only a small number of papers. (One of the great advantages of a fully electronic journal is that we do not have a fixed number of issues per year, so we will not have to change our standards artificially in order to fill issues or clear backlogs.) We thus hope that EJC and AIC will between them offer suitable potential homes for a wide range of combinatorics papers. And on the more algebraic side, one should also mention Algebraic Combinatorics, which used to be the Springer journal The Journal of Algebraic Combinatorics (which officially continues with an entirely replaced editorial board — I don’t know whether it’s getting many submissions though), and also the Australasian Journal of Combinatorics.

So if you’re a combinatorialist who is writing up a result that you think is pretty good, then please consider submitting it to us. What do we mean by “pretty good”? My personal view — that is, I am not speaking for the rest of the editorial board — is that the work in a good paper should have a clear reason for others to be interested in it (so not, for example, incremental progress in some pet project of the author) and should have something about it that makes it count as a significant achievement, such as solving a well-known problem, clearing a difficult technical hurdle, inventing a new and potentially useful technique, or giving a beautiful and memorable proof.

What other ethical journals are there?

Suppose that you want to submit an article to a journal that is free to read and does not charge authors. What are your options? I don’t have a full answer to this question, so I would very much welcome feedback from other people, especially in areas of mathematics far from my own, about what the options are for them. But a good starting point is to consult the list of current member journals in the Free Journal Network, which Advances in Combinatorics hopes to join in due course.

Three notable journals not on that list are the following.

  1. Acta Mathematica. This is one of a tiny handful of the very top journals in mathematics. Last year it became fully open access without charging author fees. So for a really good paper it is a great option.
  2. Annales Henri Lebesgue. This is a new journal that has not yet published any articles, but is open for submissions. Like Acta Mathematica, it covers all of mathematics. It aims for a very high standard, but it is not yet clear what that means in practice: I cannot say that it will be roughly at the level of Journal X. But perhaps it will turn out to be suitable for a very good paper that is just short of the level of Annals, Acta, or JAMS.
  3. Algebra and Number Theory. I am told that this is regarded as the top specialist journal in number theory. From a glance at the article titles, I don’t see much analytic number theory, but there are notable analytic number theorists on the editorial board, so perhaps I have just not looked hard enough.

Added later: I learn from Benoît Kloeckner and Emmanuel Kowalski in the comments below that my information about Algebra and Number Theory was wrong, since articles in that journal are not free to read until they are five years old. However, it is published by MSP, which is a nonprofit organization, so as subscription journals go it is at the ethical end of the spectrum.

Further update: I have heard from the editors of Annales Henri Lebesgue that they have had a number of strong submissions and expect the level of the journal to be at least as high as that of journals such as Advances in Mathematics, Mathematische Annalen and the Israel Journal of Mathematics, and perhaps even slightly higher.

In what areas are ethical journals most needed?

I would very much like to hear from people who would prefer to avoid the commercially published journals, but can’t, because there are no ethical journals of a comparable standard in their area. I hope that combinatorialists will no longer have that problem. My impression is that there is a lack of suitable journals in analysis and I’m told that the same is true of logic. I’m not quite sure what the situation is in geometry or algebra. (In particular, I don’t know whether Algebra and Number Theory is also considered as the top specialist journal for algebraists.) Perhaps in some areas there are satisfactory choices for papers of some standards but not of others: that too would be interesting to know. Where do you think the gaps are? Let me know in the comments below.

Starting a new journal.

I want to make one point loud and clear, which is that the mechanics of starting a new, academic-run journal are now very easy. Basically, the only significant obstacle is getting together an editorial board with the right combination of reputation in the field and willingness to work. What’s more, unless the journal grows large, the work is quite manageable — all the more so if it is spread reasonably uniformly amongst the editorial board. Creating the journal itself can be done on one of a number of different platforms, either for no charge or for a very small charge. Some examples are the Mersenne platform, which hosts the Annales Henri Lebesgue, the Episciences platform, which hosts the Epijournal de Géométrie Algébrique, and Scholastica, which, as I mentioned above, hosts Discrete Analysis and Advances in Combinatorics.

Of these, Scholastica charges a submission fee of $10 per article and the other two are free. There are a few additional costs — for example, Discrete Analysis pays a subscription to CrossRef in order to give DOIs to articles — but the total cost of running a new journal that isn’t too large is of the order of a few hundred dollars per year, as long as nobody is paid for what they do. (Discrete Analysis, like Advances in Combinatorics, gets very useful assistance from librarians, provided voluntarily, but even if they were paid the going rate, the total annual costs would be of the same order of magnitude as one “article processing charge” of the traditional publishers, which is typically around $1500 per article.)

What’s more, those few hundred dollars are not an obstacle either. For example, I know of a fund that is ready to support at least one other journal of a similar size to Discrete Analysis, there are almost certainly other libraries that would be interested in following the enlightened example of Queen’s University Library and supporting a journal (if you are a librarian reading this, then I strongly recommend doing so, as it will be helping to weaken the hold of the system that is currently costing you orders of magnitude more money), and I know various people who know about other means of obtaining funding. So if you are interested in starting a journal and think you can put together a credible editorial board, then get in touch: I can offer advice, funding (if the proposal looks a good one), and contact with several other people who are knowledgeable and keen to help.

A few remarks about my own relationship with mathematical publishing.

My attitudes to journals and the journal system have evolved quite a lot in the last few years. The alert reader may have noticed that I’ve got a long way through this post before mentioning the E-word. I still think that Elsevier is the publisher that does most damage, and have stuck rigidly to my promise made over six years ago not to submit a paper to them or to do editorial or refereeing work. However, whereas then I thought of Springer as somehow more friendly to mathematics, thanks to its long tradition of publishing important textbooks and monographs, I now feel pretty uncomfortable about all the big four — Elsevier, Springer, Wiley, and Taylor and Francis — with Springer having got a whole lot worse after merging with Nature Macmillan. And in some respects Elsevier is better than Springer: for example, they make all mathematics papers over four years old freely available, while Springer refuses to do so. Admittedly this was basically a sop to mathematicians to keep us quiet, but as sops go it was a pretty good one, and I see now that Elsevier’s open archive, as they call it, includes some serious non-mathematical journals such as Cell. (See their list of participating journals for details.)

I’m also not very comfortable with the society journals and university presses, since although they use their profits to benefit mathematics in various ways, they are fully complicit in the system of big deals, the harm of which outweighs those benefits.

The result is that if I have a paper to submit, I tend to have a lot of trouble finding a suitable home for it, and I end up having to compromise on my principles to some extent (particularly if, as happened recently, I have a young coauthor from a country that uses journal rankings to evaluate academics). An obvious place to submit to would be Discrete Analysis, but I feel uncomfortable about that for a different reason, especially now that I have discovered that the facility that enables all the discussion of a paper to be hidden from selected editors does not allow me, as administrator of the website, to hide a paper from myself. (I won’t have this last problem with Advances in Combinatorics, since the librarians at Queen’s will have the administrator role on the system.)

So my personal options are somewhat limited, but getting better. If I have willing coauthors, then I would now consider (if I had a suitable paper), Acta Mathematica, Annales Henri Lebesgue, Journal de l’École Polytechnique, Discrete Analysis perhaps (but only if the other editors agreed to process my paper offline), Advances in Combinatorics, the Theory of Computing, Electronic Research Announcements in the Mathematical Sciences, the Electronic Journal of Combinatorics, and the Online Journal of Analytic Combinatorics. I also wouldn’t rule out Forum of Mathematics. A couple of journals to which I have an emotional attachment even if I don’t really approve of their practices are GAFA and Combinatorics, Probability and Computing. (The latter bothers me because it is a hybrid journal — that is, it charges subscriptions but also lets authors pay large APCs to make their articles open access, and I heard recently that if you choose the open access option, CUP retains copyright, so you’re not getting that much for your money. But I think not many authors choose this option. The former is also a hybrid journal, and is published by Springer.) Annals of Mathematics, if I’m lucky enough to have an Annals-worthy paper (though I think now I’d try Acta first), is not too bad — although its articles aren’t open access, their subscription costs are much more reasonable than most journals.

That’s a list off the top of my head: if you think I’ve missed out a good option, then I’d be very happy to hear about it.

As an editor, I have recently made the decision that I want to devote all my energies to promoting journals and “post-journal” systems that I fully approve of. So in order to make time for the work that will be involved in establishing Advances in Combinatorics, I have given notice to Forum of Mathematics and Mathematika, the two journals that took up the most of my time, that I will leave their editorial boards at the end of 2018. I feel quite sad about Forum of Mathematics, since I was involved in it from the start, and I really like the way it runs, with proper discussions amongst all the editors about the decisions we make. Also, I am less hostile (for reasons I’ve given in the past) to its APC model than most mathematicians. However, although I am less hostile, I could never say that I have positively liked it, and I came to the conclusion quite a while ago that, as many others have also said, it simply can’t be made to work satisfactorily: it will lead to just as bad market abuses as there are with the subscription system. In the UK it has been a disaster — government open-access mandates have led to universities paying as much as ever for subscriptions and then a whole lot extra for APCs. And there is a real worry that subscription big deals will be replaced by APC big deals, where a country pays a huge amount up front to a publisher in return for people from that country being able to publish with them. This, for example, is what Germany is pushing for. Fortunately, for the moment (if I understand correctly, though I don’t have good insider information on this) they are asking for the average fee per article to be much lower than Elsevier is prepared to accept: long may that impasse continue.

So my leaving Forum of Mathematics is not a protest against it, but simply a practical step that will allow me to focus my energies where I think they can do the most good. I haven’t yet decided whether I ought to resign in protest from some other editorial boards of journals that don’t ask anything of me. Actually, even the practice of having a long list of names of editors, most of whom have zero involvement in the decisions of the journal, is one that bothers me. I recently heard of an Elsevier journal where almost all the editorial board would be happy to resign en masse and set up an ethical version, but the managing editor is strongly against. “But why don’t the rest of the board resign in that case?” I naively asked, to which the answer was, “Because he’s the one who does all the work!” From what I understood, this is literally true — the managing editor handles all the papers and makes all the decisions — but I’m not 100% sure about that.

Is there any point in starting new journals?

Probably major change, if it happens, will be the result of decisions made by major players such as government agencies, national negotiators, and so on. Compared with big events like the Elsevier negotiations in Germany, founding a new journal is a very small step. And even if all mathematicians gave up using the commercial publishers (not something I expect to see any time soon), that would have almost no direct effect, since mathematics journals are bundled together with journals in other subjects, which would continue with the current system.

However, this is a familiar situation in politics. Big decisions are taken by people in positions of power, but what prompts them to make those decisions is often the result of changes in attitudes and behaviour of voters. And big behavioural changes do happen in academia. For example, as we all know, many people have got into the habit of posting all their work on the arXiv, and this accumulation of individual decisions has had the effect of completely changing the way dissemination works in some subjects, including mathematics, a change that has significantly weakened the hold that journals have — or would have if they weren’t bundled together with other journals. Who would ever subscribe at vast expense to a mathematics journal when almost all its content is available online in preprint form?

So I see Advances in Combinatorics as a small step certainly, but a step that needs to be taken. I hope that it will demonstrate once again that starting a serious new journal is not that hard. I also hope that the current trickle of such journals will turn into a flood, that after the flood it will not be possible for people to argue that they are forced to submit articles to the commercial publishers, and that at some point, someone in a position of power will see what is going on, understand better the absurdities of the current system, and take a decision that benefits us all.

Tommaso DorigoModeling Issues Or New Physics ? Surprises From Top Quark Kinematics Study

Simulation, noun:
1. Imitation or enactment
2. The act or process of pretending; feigning.
3. An assumption or imitation of a particular appearance or form; counterfeit; sham.

Well, high-energy physics is all about simulations. 

We have a theoretical model that predicts the outcome of the very energetic particle collisions we create in the core of our giant detectors, but we only have approximate descriptions of the inputs to the theoretical model, so we need simulations. 


June 09, 2018

n-Category Café Sets of Sets of Sets of Sets of Sets of Sets

The covariant power set functor \(P : Set \to Set\) can be made into a monad whose multiplication \(m_X: P(P(X)) \to P(X)\) turns a subset of the set of subsets of \(X\) into a subset of \(X\) by taking their union. Algebras of this monad are complete semilattices.
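To make the monad structure concrete, here is a minimal sketch for finite sets, using Python's `frozenset` as a stand-in for subsets. The names `unit`, `mult` and `pmap` are illustrative choices, not notation from the paper, and the sample sets are made up.

```python
def unit(x):
    """Unit of the monad: send an element x to the singleton {x}."""
    return frozenset([x])

def mult(family):
    """Multiplication m_X: P(P(X)) -> P(X), taking the union of a set of subsets."""
    return frozenset().union(*family)

def pmap(f, subset):
    """Action of P on a function f: the direct image of a subset under f."""
    return frozenset(f(x) for x in subset)

# A sample element of P(P(P(X))) with X = {1, 2, 3, 4} (illustrative data).
A = frozenset({1, 2})
B = frozenset({2, 3})
C = frozenset({4})
triple = frozenset({frozenset({A, B}), frozenset({C, frozenset()})})

# Associativity: flattening the outer layers first, or the inner layers first.
outer_first = mult(mult(triple))
inner_first = mult(pmap(mult, triple))

# Unit laws, checked on the subset A.
left_unit = mult(unit(A))
right_unit = mult(pmap(unit, A))
```

On this sample both ways of flattening give the same subset of \(X\), which is the associativity law the question below asks about for \(P^2\).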

But what about powers of the power set functor? Yesterday Jules Hedges pointed out this paper:

The authors prove that \(P^n\) cannot be made into a monad for \(n \ge 2\).

I’ve mainly looked at their proof for the case \(n = 2\). I haven’t completely worked through it, but it focuses on the unit of any purported monad structure for \(P^2\), rather than its multiplication. Using a cute Yoneda trick they show there are only four possible units, corresponding to the four elements of \(P(P(1))\). Then they show these can’t work. The argument involves sets like this:

As far as I’ve seen, they don’t address the following question:

Question. Does there exist an associative multiplication \(m: P^2 P^2 \Rightarrow P^2\)? In other words, is there a natural transformation \(m: P^2 P^2 \Rightarrow P^2\) such that

\[ P^2 P^2 P^2 \stackrel{m P^2}{\Rightarrow} P^2 P^2 \stackrel{m}{\Rightarrow} P^2 \]

equals

\[ P^2 P^2 P^2 \stackrel{P^2 m}{\Rightarrow} P^2 P^2 \stackrel{m}{\Rightarrow} P^2 . \]

I’m not very good at these things, so this question might be very easy to answer. But if the answer were “obviously no” then you’d think Klin and Salamanca might have mentioned that. They do prove there is no distributive law \(P P \Rightarrow P P\). But they also give examples of monads \(T\) for which there’s no distributive law \(T T \Rightarrow T T\), yet there’s still a way to make \(T^2\) into a monad.

As far as I can tell, my question is fairly useless: does anyone consider “semigroupads”, namely monads without unit? Nonetheless I’m curious.

If there were a positive answer, we’d have a natural way to take a set of sets of sets of sets and turn it into a set of sets in such a way that the two most obvious resulting ways to turn a set of sets of sets of sets of sets of sets into a set of sets agree!

ResonaancesDark Matter goes sub-GeV

It must have been great to be a particle physicist in the 1990s. Everything was simple and clear then. They knew that, at the most fundamental level, nature was described by one of the five superstring theories which, at low energies, reduced to the Minimal Supersymmetric Standard Model. Dark matter also had a firm place in this narrative, being identified with the lightest neutralino of the MSSM. This simple-minded picture strongly influenced the experimental program of dark matter detection, which was almost entirely focused on the so-called WIMPs in the 1 GeV - 1 TeV mass range. Most of the detectors, including the current leaders XENON and LUX, are blind to sub-GeV dark matter, as slow and light incoming particles are unable to transfer a detectable amount of energy to the target nuclei.

Sometimes progress consists in realizing that you know nothing, Jon Snow. The lack of new physics at the LHC invalidates most of the historical motivations for WIMPs. Theoretically, the mass of the dark matter particle could be anywhere between 10^-30 GeV and 10^19 GeV. There are myriads of models positioned anywhere in that range, and it's hard to argue with a straight face that any particular one is favored. We now know that we don't know what dark matter is, and that we had better search in many places. If anything, the small-scale problem of the ΛCDM cosmological model can be interpreted as a hint against the boring WIMPs and in favor of light dark matter. For example, if it turns out that dark matter has significant (nuclear size) self-interactions, that can only be realized with sub-GeV particles.
It takes some time for experiment to catch up with theory, but the process is already well in motion. There is some fascinating progress on the front of ultra-light axion dark matter, which deserves a separate post. Here I want to highlight the ongoing  developments in direct detection of dark matter particles with masses between MeV and GeV. Until recently, the only available constraint in that regime was obtained by recasting data from the XENON10 experiment - the grandfather of the currently operating XENON1T.  In XENON detectors there are two ingredients of the signal generated when a target nucleus is struck:  ionization electrons and scintillation photons. WIMP searches require both to discriminate signal from background. But MeV dark matter interacting with electrons could eject electrons from xenon atoms without producing scintillation. In the standard analysis, such events would be discarded as background. However,  this paper showed that, recycling the available XENON10 data on ionization-only events, one can exclude dark matter in the 100 MeV ballpark with the cross section for scattering on electrons larger than ~0.01 picobarn (10^-38 cm^2). This already has non-trivial consequences for concrete models; for example, a part of the parameter space of milli-charged dark matter is currently best constrained by XENON10.   
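For readers who want to check the unit conversion quoted above, here is a small sketch. The helper name `picobarn_to_cm2` is made up for illustration; the only inputs are the standard definitions 1 barn = 10^-24 cm^2 and pico = 10^-12.

```python
BARN_IN_CM2 = 1e-24  # 1 barn = 1e-24 cm^2 (standard definition)
PICO = 1e-12         # SI prefix "pico"

def picobarn_to_cm2(sigma_pb):
    """Convert a cross section from picobarns to square centimetres."""
    return sigma_pb * PICO * BARN_IN_CM2

# The XENON10 recast limit quoted in the text: about 0.01 picobarn.
limit_cm2 = picobarn_to_cm2(0.01)
print(f"{limit_cm2:.1e} cm^2")
```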

It is remarkable that so much useful information can be extracted by basically misusing data collected for another purpose (earlier this year the DarkSide-50 collaboration recast their own data in the same manner, excluding another chunk of the parameter space). Nevertheless, dedicated experiments will soon be taking over. Recently, two collaborations published first results from their prototype detectors: one is SENSEI, which uses 0.1 gram of silicon CCDs, and the other is SuperCDMS, which uses 1 gram of silicon semiconductor. Both are sensitive to eV energy depositions, thanks to which they can extend the search to lower dark matter masses, and set novel limits in the virgin territory between 0.5 and 5 MeV. A compilation of the existing direct detection limits is shown in the plot. As you can see, above 5 MeV the tiny prototypes cannot yet beat the XENON10 recast. But that will certainly change as soon as full-blown detectors are constructed, after which the XENON10 sensitivity should be improved by several orders of magnitude.
Should we be restless waiting for these results? Well, for any single experiment the chances of finding nothing are immensely larger than those of finding something. Nevertheless, the technical progress and the widening scope of searches offer some hope that the dark matter puzzle may be solved soon.

Terence TaoHeat flow and zeroes of polynomials II: zeroes on a circle

This is a sequel to this previous blog post, in which we discussed the effect of the heat flow evolution

\displaystyle  \partial_t P(t,z) = \partial_{zz} P(t,z)

on the zeroes of a time-dependent family of polynomials {z \mapsto P(t,z)}, with a particular focus on the case when the polynomials {z \mapsto P(t,z)} had real zeroes. Here (inspired by some discussions I had during a recent conference on the Riemann hypothesis in Bristol) we record the analogous theory in which the polynomials instead have zeroes on a circle {\{ z: |z| = \sqrt{q} \}}, with the heat flow slightly adjusted to compensate for this. As we shall discuss shortly, a key example of this situation arises when {P} is the numerator of the zeta function of a curve.

More precisely, let {g} be a natural number. We will say that a polynomial

\displaystyle  P(z) = \sum_{j=0}^{2g} a_j z^j

of degree {2g} (so that {a_{2g} \neq 0}) obeys the functional equation if the {a_j} are all real and

\displaystyle  a_j = q^{g-j} a_{2g-j}

for all {j=0,\dots,2g}, thus

\displaystyle  P(\overline{z}) = \overline{P(z)}


\displaystyle  P(q/z) = q^g z^{-2g} P(z)

for all non-zero {z}. This means that the {2g} zeroes {\alpha_1,\dots,\alpha_{2g}} of {P(z)} (counting multiplicity) lie in {{\bf C} \backslash \{0\}} and are symmetric with respect to complex conjugation {z \mapsto \overline{z}} and inversion {z \mapsto q/z} across the circle {\{ |z| = \sqrt{q}\}}. We say that this polynomial obeys the Riemann hypothesis if all of its zeroes actually lie on the circle {\{ |z| = \sqrt{q}\}}. For instance, in the {g=1} case, the polynomial {z^2 - a_1 z + q} obeys the Riemann hypothesis if and only if {|a_1| \leq 2\sqrt{q}}.
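The {g=1} criterion can be verified directly from the quadratic formula. The following short check is a sketch that assumes {q > 0} and real {a_1}; the labels {z_\pm} for the two zeroes are introduced here only for this computation.

```latex
\begin{aligned}
z_\pm &= \frac{a_1 \pm \sqrt{a_1^2 - 4q}}{2}, \qquad z_+ z_- = q, \\
a_1^2 \leq 4q &\implies z_- = \overline{z_+} \implies |z_\pm|^2 = z_+ z_- = q, \\
a_1^2 > 4q &\implies z_+ \neq z_- \text{ are real with } z_+ z_- = q > 0 .
\end{aligned}
```

In the last case the two zeroes have the same sign, so if both had modulus {\sqrt{q}} they would coincide, which contradicts {z_+ \neq z_-}; hence at least one zero lies off the circle.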

Such polynomials arise in number theory as follows: if {C} is a projective curve of genus {g} over a finite field {\mathbf{F}_q}, then, as famously proven by Weil, the associated local zeta function {\zeta_{C,q}(z)} (as defined for instance in this previous blog post) is known to take the form

\displaystyle  \zeta_{C,q}(z) = \frac{P(z)}{(1-z)(1-qz)}

where {P} is a degree {2g} polynomial obeying both the functional equation and the Riemann hypothesis. In the case that {C} is an elliptic curve, then {g=1} and {P} takes the form {P(z) = z^2 - a_1 z + q}, where {a_1} is the number of {{\bf F}_q}-points of {C} minus {q+1}. The Riemann hypothesis in this case is a famous result of Hasse.

Another key example of such polynomials arises from rescaled characteristic polynomials

\displaystyle  P(z) := \det( z - \sqrt{q} F ) \ \ \ \ \ (1)

of {2g \times 2g} matrices {F} in the compact symplectic group {Sp(g)}. These polynomials obey both the functional equation and the Riemann hypothesis. The Sato-Tate conjecture (in higher genus) asserts, roughly speaking, that “typical” polynomials {P} arising from the number theoretic situation above are distributed like the rescaled characteristic polynomials (1), where {F} is drawn uniformly from {Sp(g)} with Haar measure.

Given a polynomial {z \mapsto P(0,z)} of degree {2g} with coefficients

\displaystyle  P(0,z) = \sum_{j=0}^{2g} a_j(0) z^j,

we can evolve it in time by the formula

\displaystyle  P(t,z) = \sum_{j=0}^{2g} \exp( t(j-g)^2 ) a_j(0) z^j,

thus {a_j(t) = \exp(t(j-g)^2) a_j(0)} for {t \in {\bf R}}. Informally, as one increases {t}, this evolution accentuates the effect of the extreme monomials, particularly {z^0} and {z^{2g}}, at the expense of the intermediate monomials such as {z^g}, and conversely as one decreases {t}. This family of polynomials obeys the heat-type equation

\displaystyle  \partial_t P(t,z) = (z \partial_z - g)^2 P(t,z). \ \ \ \ \ (2)
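As a quick numerical illustration of this coefficient flow, the sketch below evolves a sample polynomial and checks that the functional equation {a_j = q^{g-j} a_{2g-j}} survives the evolution. The helper names and the sample coefficients (with {g=2}, {q=3}) are made up for illustration and are not from the post.

```python
import math

def evolve_coeffs(a0, t):
    """Return a_j(t) = exp(t (j - g)^2) a_j(0) for a polynomial of degree 2g."""
    g = (len(a0) - 1) // 2
    return [math.exp(t * (j - g) ** 2) * a for j, a in enumerate(a0)]

def obeys_functional_equation(a, q, rel_tol=1e-9):
    """Check a_j = q^(g-j) a_(2g-j) for every j, up to rounding error."""
    g = (len(a) - 1) // 2
    return all(
        math.isclose(a[j], q ** (g - j) * a[2 * g - j], rel_tol=rel_tol)
        for j in range(len(a))
    )

# Sample data: g = 2, q = 3, coefficients a_0..a_4 chosen to obey the symmetry.
q = 3.0
a0 = [9.0, -6.0, 5.0, -2.0, 1.0]
t = 0.7
a_t = evolve_coeffs(a0, t)

print(obeys_functional_equation(a0, q), obeys_functional_equation(a_t, q))
```

The check works because the weight {\exp(t(j-g)^2)} is symmetric under {j \mapsto 2g-j}, so both sides of the functional equation are multiplied by the same factor.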

In view of the results of Marcus, Spielman, and Srivastava, it is also very likely that one can interpret this flow in terms of expected characteristic polynomials involving conjugation over the compact symplectic group {Sp(g)}, and should also be tied to some sort of “{\beta=\infty}” version of Brownian motion on this group, but we have not attempted to work this connection out in detail.

It is clear that if {z \mapsto P(0,z)} obeys the functional equation, then so does {z \mapsto P(t,z)} for any other time {t}. Now we investigate the evolution of the zeroes. Suppose at some time {t_0} that the zeroes {\alpha_1(t_0),\dots,\alpha_{2g}(t_0)} of {z \mapsto P(t_0,z)} are distinct, then

\displaystyle  P(t_0,z) = a_{2g}(0) \exp( t_0g^2 ) \prod_{j=1}^{2g} (z - \alpha_j(t_0) ).

From the inverse function theorem we see that for times {t} sufficiently close to {t_0}, the zeroes {\alpha_1(t),\dots,\alpha_{2g}(t)} of {z \mapsto P(t,z)} continue to be distinct (and vary smoothly in {t}), with

\displaystyle  P(t,z) = a_{2g}(0) \exp( t g^2 ) \prod_{j=1}^{2g} (z - \alpha_j(t) ).

Differentiating this at any {z} not equal to any of the {\alpha_j(t)}, we obtain

\displaystyle  \partial_t P(t,z) = P(t,z) ( g^2 - \sum_{j=1}^{2g} \frac{\alpha'_j(t)}{z - \alpha_j(t)})


\displaystyle  \partial_z P(t,z) = P(t,z) ( \sum_{j=1}^{2g} \frac{1}{z - \alpha_j(t)})


\displaystyle  \partial_{zz} P(t,z) = P(t,z) ( \sum_{1 \leq j,k \leq 2g: j \neq k} \frac{1}{(z - \alpha_j(t))(z - \alpha_k(t))}).

Inserting these formulae into (2) (expanding {(z \partial_z - g)^2} as {z^2 \partial_{zz} - (2g-1) z \partial_z + g^2}) and canceling some terms, we conclude that

\displaystyle  - \sum_{j=1}^{2g} \frac{\alpha'_j(t)}{z - \alpha_j(t)} = z^2 \sum_{1 \leq j,k \leq 2g: j \neq k} \frac{1}{(z - \alpha_j(t))(z - \alpha_k(t))}

\displaystyle  - (2g-1) z \sum_{j=1}^{2g} \frac{1}{z - \alpha_j(t)}

for {t} sufficiently close to {t_0}, and {z} not equal to {\alpha_1(t),\dots,\alpha_{2g}(t)}. Extracting the residue at {z = \alpha_j(t)}, we conclude that

\displaystyle  - \alpha'_j(t) = 2 \alpha_j(t)^2 \sum_{1 \leq k \leq 2g: k \neq j} \frac{1}{\alpha_j(t) - \alpha_k(t)} - (2g-1) \alpha_j(t)

which we can rearrange as

\displaystyle  \frac{\alpha'_j(t)}{\alpha_j(t)} = - \sum_{1 \leq k \leq 2g: k \neq j} \frac{\alpha_j(t)+\alpha_k(t)}{\alpha_j(t)-\alpha_k(t)}.

If we make the change of variables {\alpha_j(t) = \sqrt{q} e^{i\theta_j(t)}} (noting that one can make {\theta_j} depend smoothly on {t} for {t} sufficiently close to {t_0}), this becomes

\displaystyle  \partial_t \theta_j(t) = \sum_{1 \leq k \leq 2g: k \neq j} \cot \frac{\theta_j(t) - \theta_k(t)}{2}. \ \ \ \ \ (3)
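For completeness, here is the computation behind this change of variables, written out as a sketch; it uses only the substitution {\alpha_j = \sqrt{q} e^{i\theta_j}} and the previous displayed equation.

```latex
\begin{aligned}
\frac{\alpha'_j(t)}{\alpha_j(t)} &= i\, \partial_t \theta_j(t), \\
\frac{\alpha_j + \alpha_k}{\alpha_j - \alpha_k}
 &= \frac{e^{i(\theta_j - \theta_k)/2} + e^{-i(\theta_j - \theta_k)/2}}{e^{i(\theta_j - \theta_k)/2} - e^{-i(\theta_j - \theta_k)/2}}
  = \frac{2 \cos \frac{\theta_j - \theta_k}{2}}{2 i \sin \frac{\theta_j - \theta_k}{2}}
  = - i \cot \frac{\theta_j - \theta_k}{2}, \\
i\, \partial_t \theta_j(t) &= - \sum_{k \neq j} \left( - i \cot \frac{\theta_j(t) - \theta_k(t)}{2} \right)
  = i \sum_{k \neq j} \cot \frac{\theta_j(t) - \theta_k(t)}{2} .
\end{aligned}
```

Dividing by {i} gives (3); note that the factor {\sqrt{q}} cancels in the ratio, so the phase dynamics do not depend on {q}.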

Intuitively, this equation asserts that the phases {\theta_j} repel each other if they are real (and attract each other if their difference is imaginary). If {z \mapsto P(t_0,z)} obeys the Riemann hypothesis, then the {\theta_j} are all real at time {t_0}, and the Picard uniqueness theorem (applied to {\theta_j(t)} and its complex conjugate) then shows that the {\theta_j} are also real for {t} sufficiently close to {t_0}. If we then define the entropy functional

\displaystyle  H(\theta_1,\dots,\theta_{2g}) := \sum_{1 \leq j < k \leq 2g} \log \frac{1}{|\sin \frac{\theta_j-\theta_k}{2}| }

then the above equation becomes a gradient flow

\displaystyle  \partial_t \theta_j(t) = - 2 \frac{\partial H}{\partial \theta_j}( \theta_1(t),\dots,\theta_{2g}(t) )

which implies in particular that {H(\theta_1(t),\dots,\theta_{2g}(t))} is non-increasing in time. This shows that as one evolves time forward from {t_0}, there is a uniform lower bound on the separation between the phases {\theta_1(t),\dots,\theta_{2g}(t)}, and hence the equation can be solved indefinitely; in particular, {z \mapsto P(t,z)} obeys the Riemann hypothesis for all {t > t_0} if it does so at time {t_0}. Our argument here assumed that the zeroes of {z \mapsto P(t_0,z)} were simple, but this assumption can be removed by the usual limiting argument.
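The monotonicity of {H} and the relaxation towards equal spacing are easy to observe numerically. The sketch below integrates (3) with a plain explicit Euler scheme for four real phases; the step size, the time horizon and the initial phases are arbitrary illustrative choices, and Euler stepping is only a crude stand-in for the exact flow.

```python
import math

def velocity(theta):
    """Right-hand side of (3): sum over k != j of cot((theta_j - theta_k)/2)."""
    return [
        sum(1.0 / math.tan((tj - tk) / 2.0) for k, tk in enumerate(theta) if k != j)
        for j, tj in enumerate(theta)
    ]

def entropy(theta):
    """H = sum over j < k of log(1 / |sin((theta_j - theta_k)/2)|)."""
    n = len(theta)
    return sum(
        -math.log(abs(math.sin((theta[j] - theta[k]) / 2.0)))
        for j in range(n) for k in range(j + 1, n)
    )

def evolve(theta, dt=1e-3, steps=5000):
    """Explicit Euler integration of (3); returns final phases and entropy history."""
    theta = list(theta)
    history = [entropy(theta)]
    for _ in range(steps):
        v = velocity(theta)
        theta = [t + dt * vj for t, vj in zip(theta, v)]
        history.append(entropy(theta))
    return theta, history

def gaps(theta):
    """Consecutive angular gaps around the circle, after reducing mod 2*pi."""
    s = sorted(t % (2 * math.pi) for t in theta)
    return [(s[(i + 1) % len(s)] - s[i]) % (2 * math.pi) for i in range(len(s))]

# Four phases (2g = 4), symmetric under conjugation as the functional equation requires.
start = [0.3, -0.3, 2.0, -2.0]
final, history = evolve(start)
print(history[0], history[-1])
print(gaps(final))
```

In this run the entropy decreases along the trajectory and the four gaps approach {\pi/2}, consistent with the repulsion picture above.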

For any polynomial {z \mapsto P(0,z)} obeying the functional equation, the rescaled polynomials {z \mapsto e^{-g^2 t} P(t,z)} converge locally uniformly to {a_{2g}(0) (z^{2g} + q^g)} as {t \rightarrow +\infty}. By Rouché’s theorem, we conclude that the zeroes of {z \mapsto P(t,z)} converge to the equally spaced points {\{ \sqrt{q} e^{2\pi i(j+1/2)/2g}: j=1,\dots,2g\}} on the circle {\{ |z| = \sqrt{q}\}}. Together with the symmetry properties of the zeroes, this implies in particular that {z \mapsto P(t,z)} obeys the Riemann hypothesis for all sufficiently large positive {t}. In the opposite direction, when {t \rightarrow -\infty}, the polynomials {z \mapsto P(t,z)} converge locally uniformly to {a_g(0) z^g}, so if {a_g(0) \neq 0}, {g} of the zeroes converge to the origin and the other {g} converge to infinity. In particular, {z \mapsto P(t,z)} fails the Riemann hypothesis for sufficiently large negative {t}. Thus (if {a_g(0) \neq 0}), there must exist a real number {\Lambda}, which we call the de Bruijn-Newman constant of the original polynomial {z \mapsto P(0,z)}, such that {z \mapsto P(t,z)} obeys the Riemann hypothesis for {t \geq \Lambda} and fails the Riemann hypothesis for {t < \Lambda}. The situation is a bit more complicated if {a_g(0)} vanishes; if {k} is the first natural number such that {a_{g+k}(0)} (or equivalently, {a_{g-k}(0)}) does not vanish, then by the above arguments one finds in the limit {t \rightarrow -\infty} that {g-k} of the zeroes go to the origin, {g-k} go to infinity, and the remaining {2k} zeroes converge to the equally spaced points {\{ \sqrt{q} e^{2\pi i(j+1/2)/2k}: j=1,\dots,2k\}}. In this case the de Bruijn-Newman constant remains finite except in the degenerate case {k=g}, in which case {\Lambda = -\infty}.

For instance, consider the case when {g=1} and {P(0,z) = z^2 - a_1 z + q} for some real {a_1} with {|a_1| \leq 2\sqrt{q}}. Then the quadratic polynomial

\displaystyle  P(t,z) = e^t z^2 - a_1 z + e^t q

has zeroes

\displaystyle  \frac{a_1 \pm \sqrt{a_1^2 - 4 e^{2t} q}}{2e^t}

and one easily checks that these zeroes lie on the circle {\{ |z|=\sqrt{q}\}} when {t \geq \log \frac{|a_1|}{2\sqrt{q}}}, and are on the real axis otherwise. Thus in this case we have {\Lambda = \log \frac{|a_1|}{2\sqrt{q}}} (with {\Lambda=-\infty} if {a_1=0}). Note how as {t} increases to {+\infty}, the zeroes repel each other and eventually converge to {\pm i \sqrt{q}}, while as {t} decreases to {-\infty}, the zeroes collide and then separate on the real axis, with one zero going to the origin and the other to infinity.
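The {g=1} example is simple enough to check by machine. The sketch below evaluates the zeroes from the quadratic formula above and compares against the threshold {\Lambda}; the function names and the sample values {q=4}, {a_1=2} are illustrative assumptions.

```python
import cmath
import math

def zeros(t, a1, q):
    """Zeroes of P(t,z) = e^t z^2 - a1 z + e^t q via the quadratic formula."""
    et = math.exp(t)
    disc = cmath.sqrt(a1 * a1 - 4.0 * et * et * q)
    return (a1 + disc) / (2.0 * et), (a1 - disc) / (2.0 * et)

def on_circle(t, a1, q, tol=1e-9):
    """True if both zeroes at time t lie on the circle |z| = sqrt(q)."""
    return all(abs(abs(z) - math.sqrt(q)) < tol for z in zeros(t, a1, q))

def de_bruijn_newman(a1, q):
    """Lambda = log(|a1| / (2 sqrt(q))), with Lambda = -infinity when a1 = 0."""
    if a1 == 0:
        return -math.inf
    return math.log(abs(a1) / (2.0 * math.sqrt(q)))

# Sample values (illustrative): q = 4, a1 = 2, so |a1| <= 2 sqrt(q) = 4.
q, a1 = 4.0, 2.0
lam = de_bruijn_newman(a1, q)

print(lam)
print(on_circle(lam + 0.1, a1, q), on_circle(lam - 0.1, a1, q))
print(zeros(20.0, a1, q))
```

Just above {\Lambda} both zeroes sit on the circle, just below it they are real and off the circle, and for large {t} they are close to {\pm i\sqrt{q}}.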

The arguments in my paper with Brad Rodgers (discussed in this previous post) indicate that for a “typical” polynomial {P} of degree {g} that obeys the Riemann hypothesis, the expected time to relaxation to equilibrium (in which the zeroes are equally spaced) should be comparable to {1/g}, basically because the average spacing is {1/g} and hence by (3) the typical velocity of the zeroes should be comparable to {g}, and the diameter of the unit circle is comparable to {1}, thus requiring time comparable to {1/g} to reach equilibrium. Taking contrapositives, this suggests that the de Bruijn-Newman constant {\Lambda} should typically take on values comparable to {-1/g} (since typically one would not expect the initial configuration of zeroes to be close to evenly spaced). I have not attempted to formalise or prove this claim, but presumably one could do some numerics (perhaps using some of the examples of {P} given previously) to explore this further.

June 08, 2018

Doug NatelsonWhat are steric interactions?

When I was first reading chemistry papers, one piece of jargon jumped out at me: "steric hindrance", which is an abstruse way of saying that you can't force pieces of molecules (atoms or groups of atoms) to pass through each other. In physics jargon, they have a "hard core repulsion". If you want to describe the potential energy of two atoms as you try to squeeze one into the volume of the other, you get a term that blows up very rapidly, like \(1/r^{12}\), where \(r\) is the distance between the nuclei. Basically, you can do pretty well treating atoms like impenetrable spheres with diameters given by their outer electronic orbitals. Indeed, Robert Hooke went so far as to infer, from the existence of faceted crystals, that matter is built from effectively impenetrable little spherical atoms.

It's a common thing in popular treatments of physics to point out that atoms are "mostly empty space".  With hydrogen, for example, if you said that the proton was the size of a pea, then the 1s orbital (describing the spatial probability distribution for finding the point-like electron) would be around 250 m in radius.  So, if atoms are such big, puffy objects, then why can't two atoms overlap in real space?  It's not just the electrostatic repulsion, since each atom is overall neutral.
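The pea comparison is easy to check with a few lines of arithmetic. The sketch below assumes a proton charge radius of about 0.84 fm, takes the Bohr radius as the size of the 1s orbital, and assumes a pea radius of 4 mm; the pea size is an illustrative assumption, not a number from the post.

```python
# Scale a hydrogen atom up so that the proton is pea-sized.
# All inputs are approximate; PEA_RADIUS_M is an assumed value.
PROTON_RADIUS_M = 0.84e-15   # proton charge radius, roughly 0.84 fm
BOHR_RADIUS_M = 5.29e-11     # Bohr radius, characteristic size of the 1s orbital
PEA_RADIUS_M = 4.0e-3        # assumed pea radius (4 mm)

scale = PEA_RADIUS_M / PROTON_RADIUS_M       # magnification factor
orbital_radius_m = BOHR_RADIUS_M * scale     # 1s orbital size at that magnification
print(f"scaled 1s orbital radius: about {orbital_radius_m:.0f} m")
```

With these inputs the scaled orbital comes out at a couple of hundred metres, in line with the figure quoted above; a smaller or larger pea shifts the answer proportionally.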

The answer is (once again) the Pauli exclusion principle (PEP) and the fact that electrons obey Fermi statistics.  Sometimes the PEP is stated in a mathematically formal way that can obscure its profound consequences.  For our purposes, the bottom line is:  It is apparently a fundamental property of the universe that you can't stick two identical fermions (including having the same spin) in the same quantum state.    At the risk of getting technical, this can mean a particular atomic orbital, or more generally it can be argued to mean the same little "cell" of volume \(h^{3}\) in r-p phase space.  It just can't happen.

If you try to force it, what happens instead?  In practice, to get two carbon atoms, say, to overlap in real space, you would have to make the electrons in one of the atoms leave their ordinary orbitals and make transitions to states with higher kinetic energies.  That energy has to come from somewhere - you have to do work and supply that energy to squeeze two atoms into the volume of one.  Books have been written about this.

Leaving aside for a moment the question of why rigid solids are rigid, it's pretty neat to realize that the physics principle that keeps you from falling through your chair or the floor is really the same principle that holds up white dwarf stars.

Matt von HippelAn Omega for Every Alpha

In particle physics, we almost always use approximations.

Often, we assume the forces we consider are weak. We use a “coupling constant”, some number written g or a or \alpha, and we assume it’s small, so \alpha is greater than \alpha^2 is greater than \alpha^3. With this assumption, we can start drawing Feynman diagrams, and each “loop” we add to the diagram gives us a higher power of \alpha.

If \alpha isn’t small, then the trick stops working, the diagrams stop making sense, and we have to do something else.

Except for some times, when everything keeps working fine. This week, along with Simon Caron-Huot, Lance Dixon, Andrew McLeod, and Georgios Papathanasiou, I published what turned out to be a pretty cute example.


We call this fellow \Omega. It’s a family of diagrams that we can write down for any number of loops: to get more loops, just extend the “…”, adding more boxes in the middle. Count the number of lines sticking out, and you get six: these are “hexagon functions”, the type of function I’ve used to calculate six-particle scattering in N=4 super Yang-Mills.

The fun thing about \Omega is that we don’t have to think about it this way, one loop at a time. We can add up all the loops, \alpha times one loop plus \alpha^2 times two loops plus \alpha^3 times three loops, all the way up to infinity. And we’ve managed to figure out what those loops sum to.


The result ends up beautifully simple. This formula isn’t just true for small coupling constants, it’s true for any number you care to plug in, making the forces as strong as you’d like.
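As a toy illustration of how a sum over all loops can stay meaningful at strong coupling, here is a made-up series whose L-loop term is (-\alpha)^L. This is not the \Omega formula, and the coefficients are purely hypothetical; the point is only that the loop-by-loop partial sums diverge for large \alpha while the closed form of the sum, 1/(1+\alpha), is perfectly finite there.

```python
def partial_sum(alpha, loops):
    """Toy perturbative series: sum of (-alpha)**L for L = 0..loops (hypothetical coefficients)."""
    return sum((-alpha) ** L for L in range(loops + 1))

def resummed(alpha):
    """Closed form 1/(1 + alpha) of the toy series, defined for any alpha != -1."""
    return 1.0 / (1.0 + alpha)

for alpha in (0.5, 3.0):
    sums = [partial_sum(alpha, n) for n in (5, 10, 20)]
    print(f"alpha={alpha}: partial sums {sums}, resummed {resummed(alpha)}")
```

At \alpha = 0.5 the partial sums settle onto the resummed value; at \alpha = 3 they oscillate with growing size, yet the resummed expression still gives a sensible number.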

We can do this with \Omega because we have equations relating different loops together. Solving those equations with a few educated guesses, we can figure out the full sum. We can also go back, and use those equations to take the \Omegas at each loop apart, finding a basis of functions needed to describe them.

That basis is the real reward here. It’s not the full basis of “hexagon functions”: if you wanted to do a full six-particle calculation, you’d need more functions than the ones \Omega is made of. What it is, though, is a basis we can describe completely, stating exactly what it’s made of for any number of loops.

We can’t do that with the hexagon functions, at least not yet: we have to build them loop by loop, one at a time before we can find the next ones. The hope, though, is that we won’t have to do this much longer. The \Omega basis covers some of the functions we need. Our hope is that other nice families of diagrams can cover the rest. If we can identify more functions like \Omega, things that we can sum to any number of loops, then perhaps we won’t have to think loop by loop anymore. If we know the right building blocks, we might be able to guess the whole amplitude, to find a formula that works for any \alpha you’d like.

That would be a big deal. N=4 super Yang-Mills isn’t the real world, but it’s complicated in some of the same ways. If we can calculate there without approximations, it should at least give us an idea of what part of the real-world answer can look like. And for a field that almost always uses approximations, that’s some pretty substantial progress.

ResonaancesMassive Gravity, or You Only Live Twice

Proving Einstein wrong is the ultimate ambition of every crackpot and physicist alike. In particular, Einstein's theory of gravitation - general relativity - has been a victim of constant harassment. That is to say, it is trivial to modify gravity at large energies (short distances), for example by embedding it in string theory, but it is notoriously difficult to change its long distance behavior. At the same time, motivations to keep trying go beyond intellectual gymnastics. For example, the accelerated expansion of the universe may be a manifestation of modified gravity (rather than of a small cosmological constant).

In Einstein's general relativity, gravitational interactions are mediated by a massless spin-2 particle - the so-called graviton. This is what gives it its hallmark properties: the long range and the universality. One obvious way to screw with Einstein is to add mass to the graviton, as entertained already in 1939 by Fierz and Pauli. The Particle Data Group quotes the constraint m ≤ 6*10^−32 eV, so we are talking about a de Broglie wavelength comparable to the size of the observable universe. Yet even that teeny mass may cause massive troubles. In 1970 the Fierz-Pauli theory was killed by the van Dam-Veltman-Zakharov (vDVZ) discontinuity. The problem stems from the fact that a massive spin-2 particle has 5 polarization states (0,±1,±2) unlike a massless one which has only two (±2). It turns out that the polarization-0 state couples to matter with a strength similar to that of the usual polarization ±2 modes, even in the limit where the mass goes to zero, and thus mediates an additional force which differs from the usual gravity. One finds that, in massive gravity, light bending would be 25% smaller, in conflict with the very precise observations of stars' deflection around the Sun. vDV concluded that "the graviton has rigorously zero mass". Dead for the first time...

The second coming was heralded soon after by Vainshtein, who noticed that the troublesome polarization-0 mode can be shut off in the proximity of stars and planets. This can happen in the presence of graviton self-interactions of a certain type. Technically, what happens is that the polarization-0 mode develops a background value around massive sources which, through the derivative self-interactions, renormalizes its kinetic term and effectively diminishes its interaction strength with matter. See here for a nice review and more technical details. Thanks to the Vainshtein mechanism, the usual predictions of general relativity are recovered around large massive sources, which is exactly where we can best measure gravitational effects. The possible self-interactions leading to a healthy theory without ghosts have been classified, and go under the name of the dRGT massive gravity.

There is however one inevitable consequence of the Vainshtein mechanism. The graviton self-interaction strength grows with energy, and at some point becomes inconsistent with the unitarity limits that every quantum theory should obey. This means that massive gravity is necessarily an effective theory with a limited validity range and has to be replaced by a more fundamental theory at some cutoff scale 𝞚. This is of course nothing new for gravity: the usual Einstein gravity is also an effective theory valid at most up to the Planck scale MPl~10^19 GeV.  But for massive gravity the cutoff depends on the graviton mass and is much smaller for realistic theories. At best,
So the massive gravity theory in its usual form cannot be used at distance scales shorter than ~300 km. For particle physicists that would be a disaster, but for cosmologists this is fine, as one can still predict the behavior of galaxies, stars, and planets. While the theory certainly cannot be used to describe the results of table top experiments,  it is relevant for the  movement of celestial bodies in the Solar System. Indeed, lunar laser ranging experiments or precision studies of Jupiter's orbit are interesting probes of the graviton mass.
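The ~300 km figure can be reproduced at the order-of-magnitude level. The sketch below assumes the scaling usually quoted for the strong-coupling scale of dRGT massive gravity, 𝞚max ~ (m^2 MPl)^1/3; that formula, the graviton mass of 10^-32 eV, and the two Planck-mass conventions tried are all assumptions made here for illustration rather than inputs taken from the text.

```python
HBAR_C_EV_M = 1.973e-7      # hbar*c in eV*m, converts an energy into an inverse length
M_GRAVITON_EV = 1e-32       # assumed graviton mass, near the experimental upper limit
PLANCK_MASSES_EV = {
    "Planck mass": 1.22e28,          # about 1.22e19 GeV
    "reduced Planck mass": 2.4e27,   # about 2.4e18 GeV
}

def cutoff_length_km(m_ev, m_planck_ev):
    """Length hbar*c / Lambda, in km, for the assumed cutoff Lambda = (m^2 * M_Pl)^(1/3)."""
    cutoff_ev = (m_ev ** 2 * m_planck_ev) ** (1.0 / 3.0)
    return HBAR_C_EV_M / cutoff_ev / 1e3

lengths_km = {name: cutoff_length_km(M_GRAVITON_EV, mpl)
              for name, mpl in PLANCK_MASSES_EV.items()}
for name, km in lengths_km.items():
    print(f"{name}: cutoff length of roughly {km:.0f} km")
```

Either convention lands at a few hundred km, so under these assumptions the quoted distance scale is what one would expect.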

Now comes the latest twist in the story. Some time ago this paper showed that not everything is allowed in effective theories. Assuming the full theory is unitary, causal and local implies non-trivial constraints on the possible interactions in the low-energy effective theory. These techniques are suitable to constrain, via dispersion relations, derivative interactions of the kind required by the Vainshtein mechanism. Applying them to the dRGT gravity one finds that it is inconsistent to assume the theory is valid all the way up to 𝞚max. Instead, it must be replaced by a more fundamental theory already at a much lower cutoff scale, parameterized as 𝞚 = g*^1/3 𝞚max (the parameter g* is interpreted as the coupling strength of the more fundamental theory). The allowed parameter space in the g*-m plane is shown in this plot:

Massive gravity must live in the lower left corner, outside the gray area excluded theoretically and where the graviton mass satisfies the experimental upper limit m~10^−32 eV. This implies g* ≼ 10^-10, and thus the validity range of the theory is some 3 orders of magnitude lower than 𝞚max. In other words, massive gravity is not a consistent effective theory at distance scales below ~1 million km, and thus cannot be used to describe the motion of falling apples, GPS satellites or even the Moon. In this sense, it's not much of a competition to, say, Newton. Dead for the second time.
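The step from g* to a distance scale is simple arithmetic: with 𝞚 = g*^1/3 𝞚max, the cutoff energy drops by the factor g*^1/3 and the shortest trustworthy distance grows by the inverse factor. A quick check, taking the ~300 km scale quoted earlier as the starting point and an Earth-Moon distance of about 384,000 km for comparison:

```python
G_STAR_MAX = 1e-10                  # upper bound on g* quoted in the text
LENGTH_AT_MAX_CUTOFF_KM = 300.0     # distance scale corresponding to the maximal cutoff
EARTH_MOON_DISTANCE_KM = 3.84e5     # approximate mean Earth-Moon distance

suppression = G_STAR_MAX ** (1.0 / 3.0)                 # factor by which the cutoff energy drops
min_length_km = LENGTH_AT_MAX_CUTOFF_KM / suppression   # shortest distance the theory can describe

print(f"cutoff lowered by a factor of about {suppression:.1e}")
print(f"theory valid only above roughly {min_length_km:.1e} km")
print(f"beyond the Moon's orbit: {min_length_km > EARTH_MOON_DISTANCE_KM}")
```

The suppression factor is a bit under 10^-3, and the resulting distance is of order a million km, which is indeed larger than the radius of the Moon's orbit.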

Is this the end of the story? For the third coming we would need a more general theory with additional light particles beyond the massive graviton, which is consistent theoretically in a larger energy range, realizes the Vainshtein mechanism, and is in agreement with the current experimental observations. This is hard but not impossible to imagine. Whatever the outcome, what I like in this story is the role of theory in driving the progress, which is rarely seen these days. In the process, we have understood a lot of interesting physics whose relevance goes well beyond one specific theory. So the trip was certainly worth it, even if we find ourselves back at the departure point.

June 07, 2018

ResonaancesCan MiniBooNE be right?

The experimental situation in neutrino physics is confusing. On one hand, a host of neutrino experiments has established a consistent picture where the neutrino mass eigenstates are mixtures of the 3 Standard Model neutrino flavors νe, νμ, ντ. The measured mass differences between the eigenstates are Δm12^2 ≈ 7.5*10^-5 eV^2 and Δm13^2 ≈ 2.5*10^-3 eV^2, suggesting that all Standard Model neutrinos have masses below 0.1 eV. That is well in line with cosmological observations which find that the radiation budget of the early universe is consistent with the existence of exactly 3 neutrinos with the sum of the masses less than 0.2 eV. On the other hand, several rogue experiments refuse to conform to the standard 3-flavor picture. The most severe anomaly is the appearance of electron neutrinos in a muon neutrino beam observed by the LSND and MiniBooNE experiments.

This story begins in the previous century with the LSND experiment in Los Alamos, which claimed to observe νμ→νe antineutrino oscillations with 3.8σ significance.  This result was considered controversial from the very beginning due to limitations of the experimental set-up. Moreover, it was inconsistent with the standard 3-flavor picture which, given the masses and mixing angles measured by other experiments, predicted that νμ→νe oscillation should be unobservable in short-baseline (L ≼ km) experiments. The MiniBooNE experiment at Fermilab was conceived to conclusively prove or disprove the LSND anomaly. To this end, a beam of mostly muon neutrinos or antineutrinos with energies E~1 GeV is sent to a detector at the distance L~500 meters away. In general, neutrinos can change their flavor with the probability oscillating as P ~ sin^2(Δm^2 L/4E). If the LSND excess is really due to neutrino oscillations, one expects to observe electron neutrino appearance in the MiniBooNE detector given that L/E is similar in the two experiments. Originally, MiniBooNE was hoping to see a smoking gun in the form of an electron neutrino excess oscillating as a function of L/E, that is peaking at intermediate energies and then decreasing towards lower energies (possibly with several wiggles). That didn't happen. Instead, MiniBooNE finds an excess increasing towards low energies with a similar shape as the backgrounds. Thus the confusion lingers on: the LSND anomaly has neither been killed nor robustly confirmed.
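To see why the standard 3-flavor splittings are invisible at this baseline, one can evaluate the oscillation factor sin^2(Δm^2 L/4E) directly. With Δm^2 in eV^2, L in km and E in GeV, the phase Δm^2 L/4E becomes 1.267 Δm^2 L/E. The function name below is hypothetical, and the mixing-angle prefactor is left out, so this is only the L/E-dependent part of the probability.

```python
import math

def oscillation_factor(dm2_ev2, length_km, energy_gev):
    """sin^2(1.267 * dm2 * L / E): the L/E-dependent part of a two-flavor oscillation probability."""
    phase = 1.267 * dm2_ev2 * length_km / energy_gev
    return math.sin(phase) ** 2

L_KM, E_GEV = 0.5, 1.0   # MiniBooNE-like baseline and beam energy from the text

for label, dm2 in (("dm12^2 = 7.5e-5 eV^2", 7.5e-5), ("dm13^2 = 2.5e-3 eV^2", 2.5e-3)):
    print(f"{label}: factor = {oscillation_factor(dm2, L_KM, E_GEV):.2e}")
```

Both factors come out at the 10^-6 level or below even before multiplying by a mixing amplitude, so a visible appearance signal at this L/E needs a much larger Δm^2.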

In spite of these doubts, the LSND and MiniBooNE anomalies continue to arouse interest. This is understandable: as the results do not fit the 3-flavor framework, if confirmed they would prove the existence of new physics beyond the Standard Model. The simplest fix would be to introduce a sterile neutrino νs with the mass in the eV ballpark, in which case MiniBooNE would be observing the νμ→νs→νe oscillation chain. With the recent MiniBooNE update the evidence for the electron neutrino appearance increased to 4.8σ, which has stirred some commotion on Twitter and in the blogosphere. However, I find the excitement a bit misplaced. The anomaly is not really new: similar results showing a 3.8σ excess of νe-like events were already published in 2012.  The increase of the significance is hardly relevant: at this point we know anyway that the excess is not a statistical fluke, while a systematic effect due to underestimated backgrounds would also lead to a growing anomaly. If anything, there are now fewer reasons than in 2012 to believe in the sterile neutrino origin of the MiniBooNE anomaly, as I will argue in the following.

What has changed since 2012? First, there are new constraints on νe appearance from the OPERA experiment (yes, this OPERA) who did not see any excess νe in the CERN-to-Gran-Sasso νμ beam. This excludes a large chunk of the relevant parameter space corresponding to large mixing angles between the active and sterile neutrinos. From this point of view, the MiniBooNE update actually adds more stress on the sterile neutrino interpretation by slightly shifting the preferred region towards larger mixing angles...  Nevertheless, a not-too-horrible fit to all appearance experiments can still be achieved in the region with Δm^2~0.5 eV^2 and the mixing angle sin^2(2θ) of order 0.01.     

Next, the cosmological constraints have become more stringent. The CMB observations by the Planck satellite do not leave room for an additional neutrino species in the early universe. But for the parameters preferred by LSND and MiniBooNE, the sterile neutrino would be abundantly produced in the hot primordial plasma, thus violating the Planck constraints. To avoid it, theorists need to deploy a battery of  tricks (for example, large sterile-neutrino self-interactions), which makes realistic models rather baroque.

But the killer punch is delivered by disappearance analyses. Benjamin Franklin famously said that only two things in this world were certain: death and probability conservation. Thus whenever an electron neutrino appears in a νμ beam, a muon neutrino must disappear. However, the latter process is severely constrained by long-baseline neutrino experiments, and recently the limits have been further strengthened thanks to the MINOS and IceCube collaborations. A recent combination of the existing disappearance results is available in this paper.  In the 3+1 flavor scheme, the probability of a muon neutrino transforming into an electron  one in a short-baseline experiment is
where U is the 4x4 neutrino mixing matrix.  The matrix element Uμ4 also controls the νμ survival probability
The νμ disappearance data from MINOS and IceCube imply |Uμ4|≼0.1, while |Ue4|≼0.25 from solar neutrino observations. All in all, the disappearance results imply that the effective mixing angle sin^2(2θ) controlling the νμ→νs→νe oscillation must be much smaller than the 0.01 required to fit the MiniBooNE anomaly. The disagreement between the appearance and disappearance data had already existed before, but was actually made worse by the MiniBooNE update.
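The tension can be made quantitative under the usual 3+1 assumption that the effective appearance amplitude is sin^2(2θ) = 4|Ue4|^2|Uμ4|^2. That relation is the standard textbook form and is assumed here rather than taken from the text. Plugging in the quoted bounds:

```python
U_MU4_MAX = 0.1               # bound on |U_mu4| from nu_mu disappearance (MINOS, IceCube)
U_E4_MAX = 0.25               # bound on |U_e4| from solar neutrino observations
REQUIRED_SIN2_2THETA = 0.01   # amplitude needed to fit the MiniBooNE excess

# Assumed 3+1 relation: sin^2(2 theta) = 4 |U_e4|^2 |U_mu4|^2
max_sin2_2theta = 4 * U_E4_MAX ** 2 * U_MU4_MAX ** 2

print(f"largest allowed sin^2(2 theta): {max_sin2_2theta:.4f}")
print(f"required by the appearance fit: {REQUIRED_SIN2_2THETA}")
```

Even with both bounds saturated at once, the allowed amplitude is about 0.0025, a factor of four below what the appearance data want.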
So the hypothesis of a 4th sterile neutrino does not stand scrutiny as an explanation of the MiniBooNE anomaly. It does not mean that there is no other possible explanation (more sterile neutrinos? non-standard interactions? neutrino decays?). However, any realistic model will have to delve deep into the crazy side in order to satisfy the constraints from other neutrino experiments, flavor physics, and cosmology. Fortunately, the current confusing situation should not last forever. The MiniBooNE photon background from π0 decays may be clarified by the ongoing MicroBooNE experiment. On the timescale of a few years the controversy should be closed by the SBN program in Fermilab, which will add one near and one far detector to the MicroBooNE beamline. Until then... years of painful experience have taught us to assign a high prior to the Standard Model hypothesis. Currently, by far the most plausible explanation of the existing data is an experimental error on the part of the MiniBooNE collaboration.

Terence Tao246C notes 4: Brownian motion, conformal invariance, and SLE

Important note: As this is not a course in probability, we will try to avoid developing the general theory of stochastic calculus (which includes such concepts as filtrations, martingales, and Ito calculus). This will unfortunately limit what we can actually prove rigorously, and so at some places the arguments will be somewhat informal in nature. A rigorous treatment of many of the topics here can be found for instance in Lawler’s Conformally Invariant Processes in the Plane, from which much of the material here is drawn.

In these notes, random variables will be denoted in boldface.

Definition 1 A real random variable {\mathbf{X}} is said to be normally distributed with mean {x_0 \in {\bf R}} and variance {\sigma^2 > 0} if one has

\displaystyle \mathop{\bf E} F(\mathbf{X}) = \frac{1}{\sqrt{2\pi} \sigma} \int_{\bf R} e^{-(x-x_0)^2/2\sigma^2} F(x)\ dx

for all test functions {F \in C_c({\bf R})}. Similarly, a complex random variable {\mathbf{Z}} is said to be normally distributed with mean {z_0 \in {\bf C}} and variance {\sigma^2>0} if one has

\displaystyle \mathop{\bf E} F(\mathbf{Z}) = \frac{1}{\pi \sigma^2} \int_{\bf C} e^{-|z-z_0|^2/\sigma^2} F(z)\ dx dy

for all test functions {F \in C_c({\bf C})}, where {dx dy} is the area element on {{\bf C}}.

A real Brownian motion with base point {x_0 \in {\bf R}} is a random, almost surely continuous function {\mathbf{B}^{x_0}: [0,+\infty) \rightarrow {\bf R}} (using the locally uniform topology on continuous functions) with the property that (almost surely) {\mathbf{B}^{x_0}(0) = x_0}, and for any sequence of times {0 \leq t_0 < t_1 < t_2 < \dots < t_n}, the increments {\mathbf{B}^{x_0}(t_i) - \mathbf{B}^{x_0}(t_{i-1})} for {i=1,\dots,n} are independent real random variables that are normally distributed with mean zero and variance {t_i - t_{i-1}}. Similarly, a complex Brownian motion with base point {z_0 \in {\bf C}} is a random, almost surely continuous function {\mathbf{B}^{z_0}: [0,+\infty) \rightarrow {\bf C}} with the property that {\mathbf{B}^{z_0}(0) = z_0} and for any sequence of times {0 \leq t_0 < t_1 < t_2 < \dots < t_n}, the increments {\mathbf{B}^{z_0}(t_i) - \mathbf{B}^{z_0}(t_{i-1})} for {i=1,\dots,n} are independent complex random variables that are normally distributed with mean zero and variance {t_i - t_{i-1}}.

Remark 2 Thanks to the central limit theorem, the hypothesis that the increments {\mathbf{B}^{x_0}(t_i) - \mathbf{B}^{x_0}(t_{i-1})} be normally distributed can be dropped from the definition of a Brownian motion, so long as one retains the independence and the normalisation of the mean and variance (technically one also needs some uniform integrability on the increments beyond the second moment, but we will not detail this here). A similar statement is also true for the complex Brownian motion (where now we need to normalise the variances and covariances of the real and imaginary parts of the increments).

Real and complex Brownian motions exist from any base point {x_0} or {z_0}; see e.g. this previous blog post for a construction. We have the following simple invariances:

Exercise 3

  • (i) (Translation invariance) If {\mathbf{B}^{x_0}} is a real Brownian motion with base point {x_0 \in {\bf R}}, and {h \in {\bf R}}, show that {\mathbf{B}^{x_0}+h} is a real Brownian motion with base point {x_0+h}. Similarly, if {\mathbf{B}^{z_0}} is a complex Brownian motion with base point {z_0 \in {\bf C}}, and {h \in {\bf C}}, show that {\mathbf{B}^{z_0}+h} is a complex Brownian motion with base point {z_0+h}.
  • (ii) (Dilation invariance) If {\mathbf{B}^{0}} is a real Brownian motion with base point {0}, and {\lambda \in {\bf R}} is non-zero, show that {t \mapsto \lambda \mathbf{B}^0(t / |\lambda|^{2})} is also a real Brownian motion with base point {0}. Similarly, if {\mathbf{B}^0} is a complex Brownian motion with base point {0}, and {\lambda \in {\bf C}} is non-zero, show that {t \mapsto \lambda \mathbf{B}^0(t / |\lambda|^{2})} is also a complex Brownian motion with base point {0}.
  • (iii) (Real and imaginary parts) If {\mathbf{B}^0} is a complex Brownian motion with base point {0}, show that {\sqrt{2} \mathrm{Re} \mathbf{B}^0} and {\sqrt{2} \mathrm{Im} \mathbf{B}^0} are independent real Brownian motions with base point {0}. Conversely, if {\mathbf{B}^0_1, \mathbf{B}^0_2} are independent real Brownian motions of base point {0}, show that {\frac{1}{\sqrt{2}} (\mathbf{B}^0_1 + i \mathbf{B}^0_2)} is a complex Brownian motion with base point {0}.
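A quick Monte Carlo sanity check of the normalisations in Definition 1 and Exercise 3. This is only a sketch: it samples the value of the motion at a single time, so it tests the variance bookkeeping (dilation by {\lambda} paired with rescaling time by {|\lambda|^2}, and the factor {\sqrt{2}} in part (iii)) and says nothing about continuity or independence of increments. The helper names are hypothetical.

```python
import math
import random

def complex_normal(variance, rng):
    """Sample a mean-zero complex normal with E|Z|^2 = variance (normalisation of Definition 1)."""
    s = math.sqrt(variance / 2.0)   # real and imaginary parts each carry variance/2
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

rng = random.Random(0)
t, lam, n = 2.0, 1.0 + 2.0j, 200_000

b_t = [complex_normal(t, rng) for _ in range(n)]                             # samples of B^0(t)
dilated = [lam * complex_normal(t / abs(lam) ** 2, rng) for _ in range(n)]   # lambda * B^0(t / |lambda|^2)

var_b = mean(abs(z) ** 2 for z in b_t)                 # should be close to t
var_dilated = mean(abs(z) ** 2 for z in dilated)       # should also be close to t
var_sqrt2_re = mean(2.0 * z.real ** 2 for z in b_t)    # variance of sqrt(2) Re B^0(t), close to t

print(var_b, var_dilated, var_sqrt2_re)
```

All three estimates should sit near {t = 2}, up to Monte Carlo noise of a few parts in a thousand.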

The next lemma is a special case of the optional stopping theorem.

Lemma 4 (Optional stopping identities)

  • (i) (Real case) Let {\mathbf{B}^{x_0}} be a real Brownian motion with base point {x_0 \in {\bf R}}. Let {\mathbf{t}} be a bounded stopping time – a bounded random variable with the property that for any time {t \geq 0}, the event that {\mathbf{t} \leq t} is determined by the values of the trajectory {\mathbf{B}^{x_0}} for times up to {t} (or more precisely, this event is measurable with respect to the {\sigma}-algebra generated by this portion of the trajectory). Then

    \displaystyle \mathop{\bf E} \mathbf{B}^{x_0}(\mathbf{t}) = x_0


    \displaystyle \mathop{\bf E} (\mathbf{B}^{x_0}(\mathbf{t})-x_0)^2 - \mathbf{t} = 0


    \displaystyle \mathop{\bf E} (\mathbf{B}^{x_0}(\mathbf{t})-x_0)^4 = O( \mathop{\bf E} \mathbf{t}^2 ).

  • (ii) (Complex case) Let {\mathbf{B}^{z_0}} be a complex Brownian motion with base point {z_0 \in {\bf C}}. Let {\mathbf{t}} be a bounded stopping time – a bounded random variable with the property that for any time {t \geq 0}, the event that {\mathbf{t} \leq t} is determined by the values of the trajectory {\mathbf{B}^{z_0}} for times up to {t}. Then

    \displaystyle \mathop{\bf E} \mathbf{B}^{z_0}(\mathbf{t}) = z_0

    \displaystyle \mathop{\bf E} (\mathrm{Re}(\mathbf{B}^{z_0}(\mathbf{t})-z_0))^2 - \frac{1}{2} \mathbf{t} = 0

    \displaystyle \mathop{\bf E} (\mathrm{Im}(\mathbf{B}^{z_0}(\mathbf{t})-z_0))^2 - \frac{1}{2} \mathbf{t} = 0

    \displaystyle \mathop{\bf E} \mathrm{Re}(\mathbf{B}^{z_0}(\mathbf{t})-z_0) \mathrm{Im}(\mathbf{B}^{z_0}(\mathbf{t})-z_0) = 0

    \displaystyle \mathop{\bf E} |\mathbf{B}^{z_0}(\mathbf{t})-z_0|^4 = O( \mathop{\bf E} \mathbf{t}^2 ).

Proof: (Slightly informal) We just prove (i) and leave (ii) as an exercise. By translation invariance we can take {x_0=0}. Let {T} be an upper bound for {\mathbf{t}}. Since {\mathbf{B}^0(T)} is a real normally distributed variable with mean zero and variance {T}, we have

\displaystyle \mathop{\bf E} \mathbf{B}^0( T ) = 0


\displaystyle \mathop{\bf E} \mathbf{B}^0( T )^2 = T


\displaystyle \mathop{\bf E} \mathbf{B}^0( T )^4 = 3T^2.

By the law of total expectation, we thus have

\displaystyle \mathop{\bf E} \mathop{\bf E}(\mathbf{B}^0( T ) | \mathbf{t}, \mathbf{B}^{0}(\mathbf{t}) ) = 0


\displaystyle \mathop{\bf E} \mathop{\bf E}((\mathbf{B}^0( T ))^2 | \mathbf{t}, \mathbf{B}^{0}(\mathbf{t}) ) = T


\displaystyle \mathop{\bf E} \mathop{\bf E}((\mathbf{B}^0( T ))^4 | \mathbf{t}, \mathbf{B}^{0}(\mathbf{t}) ) = 3T^2

where the inner conditional expectations are with respect to the event that {(\mathbf{t}, \mathbf{B}^{0}(\mathbf{t}))} attains a particular point in {[0,T] \times {\bf R}}. However, from the independent increment nature of Brownian motion, once one conditions {(\mathbf{t}, \mathbf{B}^{0}(\mathbf{t}))} to a fixed point {(t, x)}, the random variable {\mathbf{B}^0(T)} becomes a real normally distributed variable with mean {x} and variance {T-t}. Thus we have

\displaystyle \mathop{\bf E}(\mathbf{B}^0( T ) | \mathbf{t}, \mathbf{B}^{0}(\mathbf{t}) ) = \mathbf{B}^{0}(\mathbf{t})


\displaystyle \mathop{\bf E}( (\mathbf{B}^0( T ))^2 | \mathbf{t}, \mathbf{B}^{0}(\mathbf{t}) ) = \mathbf{B}^{0}(\mathbf{t})^2 + T - \mathbf{t}


\displaystyle \mathop{\bf E}( (\mathbf{B}^0( T ))^4 | \mathbf{t}, \mathbf{B}^{0}(\mathbf{t}) ) = \mathbf{B}^{0}(\mathbf{t})^4 + 6(T - \mathbf{t}) \mathbf{B}^{0}(\mathbf{t})^2 + 3(T - \mathbf{t})^2

which give the first two claims, and (after some algebra) the identity

\displaystyle \mathop{\bf E} \mathbf{B}^{0}(\mathbf{t})^4 - 6 \mathbf{t} \mathbf{B}^{0}(\mathbf{t})^2 + 3 \mathbf{t}^2 = 0

which then also gives the third claim. \Box
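The identities in part (i) can be checked numerically. The sketch below replaces Brownian motion by a Gaussian random walk with time step {dt} (for which the three quantities are still exact martingales at the grid times), and uses as a hypothetical bounded stopping time the first time the walk leaves {(-1,1)}, capped at time {1}. The parameters and function name are illustrative assumptions.

```python
import math
import random

def stopped_walk(rng, dt, horizon, barrier):
    """Run a Gaussian random walk from 0 (a discretised Brownian motion) until it
    leaves (-barrier, barrier) or time reaches horizon; return (t, B(t))."""
    n_max = int(round(horizon / dt))
    step_sd = math.sqrt(dt)
    x = 0.0
    for n in range(1, n_max + 1):
        x += rng.gauss(0.0, step_sd)
        if abs(x) >= barrier:
            return n * dt, x
    return n_max * dt, x

rng = random.Random(1)
samples = [stopped_walk(rng, dt=0.01, horizon=1.0, barrier=1.0) for _ in range(20_000)]
n = len(samples)

first = sum(b for _, b in samples) / n                                        # E B(t)
second = sum(b * b - t for t, b in samples) / n                               # E (B(t)^2 - t)
fourth = sum(b ** 4 - 6 * t * b * b + 3 * t * t for t, b in samples) / n      # E (B^4 - 6 t B^2 + 3 t^2)

print(first, second, fourth)
```

All three averages should be zero up to Monte Carlo noise, matching the first two claims and the fourth-moment identity used to derive the third.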

Exercise 5 Prove the second part of Lemma 4.

— 1. Conformal invariance of Brownian motion —

Let {U} be an open subset of {{\bf C}}, and {z_0} a point in {U}. We can define the complex Brownian motion with base point {z_0} restricted to {U} to be the restriction {\mathbf{B}^{z_0}: [0,\mathbf{t}) \rightarrow U} of a complex Brownian motion {\mathbf{B}^{z_0}} with base point {z_0} to the first time {\mathbf{t} \in (0,+\infty]} in which the Brownian motion exits {U} (or {+\infty} if no such time exists). We have a fundamental conformal invariance theorem of Lévy:

Theorem 6 (Lévy’s theorem on conformal invariance of Brownian motion) Let {\phi: U \rightarrow V} be a conformal map between two open subsets {U,V} of {{\bf C}}, and let {\mathbf{B}^{z_0}: [0, \mathbf{t}) \rightarrow U} be a complex Brownian motion with base point {z_0} restricted to {U}. Define a rescaling {\mathbf{\tau}: [0, \mathbf{t}) \rightarrow [0,+\infty)} by

\displaystyle \mathbf{\tau}(t) := \int_0^t |\phi'(\mathbf{B}^{z_0}(s))|^2\ ds.

Note that this is almost surely a continuous strictly monotone increasing function. Set {\mathbf{t}' := \lim_{t \rightarrow \mathbf{t}} \mathbf{\tau}(t)} (so that {\mathbf{\tau}} is a homeomorphism from {[0,\mathbf{t})} to {[0,\mathbf{t}')}), and let {\tilde{\mathbf{B}}^{\phi(z_0)}: [0, \mathbf{t}') \rightarrow V} be the function defined by the formula

\displaystyle \tilde{\mathbf{B}}^{\phi(z_0)}(\mathbf{\tau}(t)) := \phi( \mathbf{B}^{z_0}( t) ).

Then {\tilde{\mathbf{B}}^{\phi(z_0)}} is a complex Brownian motion with base point {\phi(z_0)} restricted to {V}.

Note that this significantly generalises the translation and dilation invariance of complex Brownian motion.
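One numerical consistency check of the time change, offered as a sketch under assumptions rather than as part of the argument: take {\phi = \exp} (entire and locally conformal, though only injective on strips) and {z_0 = 0}. If {\phi(\mathbf{B}^0(t))} is a complex Brownian motion run for time {\mathbf{\tau}(t)}, one would expect, in the spirit of the second-moment identities of Lemma 4(ii) and ignoring the boundedness hypothesis there, that {\mathop{\bf E} |\phi(\mathbf{B}^0(t)) - \phi(0)|^2 = \mathop{\bf E} \mathbf{\tau}(t)}. For this choice both sides can be computed by hand to equal {e^t - 1}. The simulation parameters are arbitrary.

```python
import cmath
import math
import random

def simulate(rng, t_final, n_steps):
    """Return (phi(B(t_final)), tau(t_final)) for phi = exp and B a complex Brownian
    motion from 0 with E|B(t)|^2 = t; tau is a left-endpoint Riemann sum."""
    dt = t_final / n_steps
    s = math.sqrt(dt / 2.0)                    # per-component standard deviation of an increment
    z = 0j
    tau = 0.0
    for _ in range(n_steps):
        tau += abs(cmath.exp(z)) ** 2 * dt     # |phi'(B(s))|^2 ds, with phi' = exp
        z += complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
    return cmath.exp(z), tau

rng = random.Random(2)
T_FINAL, N_STEPS, N_PATHS = 0.5, 40, 20_000
runs = [simulate(rng, T_FINAL, N_STEPS) for _ in range(N_PATHS)]

mean_sq_displacement = sum(abs(w - 1.0) ** 2 for w, _ in runs) / N_PATHS   # E|phi(B(t)) - phi(0)|^2
mean_tau = sum(tau for _, tau in runs) / N_PATHS                           # E tau(t)
exact = math.exp(T_FINAL) - 1.0

print(mean_sq_displacement, mean_tau, exact)
```

The two Monte Carlo averages should agree with each other and with {e^{1/2}-1 \approx 0.65} up to sampling noise and a small discretisation bias in the Riemann sum.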

Proof: (Somewhat informal – to do things properly one should first set up Ito calculus) To avoid technicalities we will assume that {|\phi'|} is bounded above and below on {U}, so that the map {\mathbf{\tau}} is uniformly bilipschitz; the general case can be obtained from this case by a limiting argument that is not detailed here. With this assumption, we see that {\tilde{\mathbf{B}}^{\phi(z_0)}} almost surely extends continuously to the endpoint time {\mathbf{t}'} if this time is finite. Once one conditions on the value of {\mathbf{t}'} and {\tilde{\mathbf{B}}^{\phi(z_0)}} up to this time {\mathbf{t}'}, we then extend this motion further (if {\mathbf{t}' < \infty}) by declaring {t' \mapsto \tilde{\mathbf{B}}^{\phi(z_0)}(t')} for {t' \geq \mathbf{t}'} to be a complex Brownian motion with base point {\tilde{\mathbf{B}}^{\phi(z_0)}(\mathbf{t'})}, translated in time by {\mathbf{t}'}. Now {\tilde{\mathbf{B}}^{\phi(z_0)}} is defined on all of {[0,+\infty)}, and it will suffice to show that this is a complex Brownian motion based at {\phi(z_0)}. The basing is clear, so it suffices to show for all times {0 \leq t'_0 < t'_1 < \dots < t'_n} that the increments {\tilde{\mathbf{B}}^{\phi(z_0)}(t'_i) - \tilde{\mathbf{B}}^{\phi(z_0)}(t'_{i-1})} are independent complex random variables that are normally distributed with mean zero and variance {t'_i - t'_{i-1}}. Conditioning on the motion up to the earlier time of each increment, this reduces to showing that for any {t_1 > 0}, the random variable {\tilde{\mathbf{B}}^{\phi(z_0)}(t_1)} is normally distributed with mean {\phi(z_0)} and variance {t_1}.

Let {F \in C^\infty_c({\bf C})} be a test function. It will suffice to show that

\displaystyle \mathop{\bf E} F( \tilde{\mathbf{B}}^{\phi(z_0)}(t_1) ) = \frac{1}{\pi t_1} \int_{\bf C} e^{-|z-\phi(z_0)|^2/t_1} F(z)\ dx dy.

If we define the field

\displaystyle u(t,z') := \frac{1}{\pi (t_1-t)} \int_{\bf C} e^{-|z-z'|^2/(t_1-t)} F(z)\ dx dy

for {0 \leq t < t_1} and {z' \in {\bf C}}, with {u(t_1,z') := F(z')}, then it will suffice to prove the more general claim

\displaystyle \tilde u(t,z') = u(t, z') \ \ \ \ \ (1)

for all {0 \leq t \leq t_1} and {z' \in {\bf C}} (with the convention that {\tilde{\mathbf{B}}^{z'}} is just Brownian motion based at {z'} if {z'} lies outside of {V}), where

\displaystyle \tilde u(t,z') := \mathop{\bf E} F( \tilde{\mathbf{B}}^{z'}( t_1-t ) )

As is well known, {u} is smooth on {[0,t_1] \times {\bf C}} and solves the backwards heat equation

\displaystyle \partial_t u = - \frac{1}{4} \partial_{xx} u - \frac{1}{4} \partial_{yy} u \ \ \ \ \ (2)

on this domain. The strategy will be to show that {\tilde u} also solves this equation.
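The constant in the backwards heat equation depends on the normalisation of Definition 1, and it can be checked numerically. With that normalisation the kernel {K_s(z) = \frac{1}{\pi s} e^{-|z|^2/s}} has variance {s/2} in each real coordinate, so it satisfies {\partial_s K_s = \frac{1}{4} (\partial_{xx} + \partial_{yy}) K_s}; since {u(t,\cdot)} is {F} convolved with {K_{t_1-t}}, this gives {\partial_t u = -\frac{1}{4}(\partial_{xx} u + \partial_{yy} u)}. The sketch below verifies the kernel identity by central finite differences at one arbitrarily chosen point.

```python
import math

def kernel(s, x, y):
    """Density of a mean-zero complex normal of variance s: exp(-|z|^2/s) / (pi s)."""
    return math.exp(-(x * x + y * y) / s) / (math.pi * s)

def derivatives(s, x, y, h=1e-4):
    """Return (dK/ds, Laplacian of K) at (s, x, y), estimated by central finite differences."""
    d_s = (kernel(s + h, x, y) - kernel(s - h, x, y)) / (2 * h)
    lap = (kernel(s, x + h, y) + kernel(s, x - h, y)
           + kernel(s, x, y + h) + kernel(s, x, y - h)
           - 4 * kernel(s, x, y)) / (h * h)
    return d_s, lap

d_s, lap = derivatives(0.7, 0.3, -0.2)
print(d_s, lap / 4)   # these two numbers should agree closely
```

The time derivative matches one quarter of the Laplacian, not the full Laplacian, which is the factor that enters when the Taylor expansion is combined with the second-moment identities of Lemma 4(ii).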

Let {0 \leq t \leq t_1} and {z' \in {\bf C}}. If {t=t_1} then clearly {\tilde u(t,z') = F(z') = u(t,z')}. If instead {t < t_1} and {z' \not \in V}, then {\tilde{\mathbf{B}}^{z'}} is a Brownian motion and then we have {\tilde u(t,z') = u(t,z')}. Now suppose that {t < t_1} and {z' \in V}, so that {z' = \phi(z)} for some {z \in U}. Let {dt > 0} be small enough that {t+ dt \leq t_1}, where {C} is an upper bound for {|\phi'|^2} on {U}. Let {\mathbf{t}} be the first time such that either {\mathbf{B}^z(\mathbf{t}) \not \in U} or

\displaystyle \int_0^{\mathbf t} |\phi'(\mathbf{B}^{z}(s))|^2\ ds = dt.

Then if we let {\mathbf{t}'} be the quantity

\displaystyle \mathbf{t}' := \int_0^{\mathbf t} |\phi'(\mathbf{B}^{z}(s))|^2\ ds,

then {0 \leq \mathbf{t}' \leq dt} and {\tilde{\mathbf{B}}^{z'}( t + \mathbf{t}') = \mathbf{B}^z(\mathbf{t})}. Let us now condition on a specific value of {\mathbf{t}}, and on the trajectory {\mathbf{B}^z} up to time {\mathbf{t}}. Then the (conditional) distribution of {\tilde{\mathbf{B}}^{z'}( t_1-t )} is that of {\tilde{\mathbf{B}}^{\mathbf{B}^z(\mathbf{t})}( t_1 - t - \mathbf{t'} )}, and hence the conditional expectation is {\tilde u( t + \mathbf{t}', \mathbf{B}^z(\mathbf{t}))}. By the law of total expectation, we conclude the identity

\displaystyle \tilde u(t,z') = \mathop{\bf E} \tilde u( t + \mathbf{t}', \mathbf{B}^z(\mathbf{t})).

Next, we obtain the analogous estimate

\displaystyle u(t,z') = \mathop{\bf E} u( t + \mathbf{t}', \mathbf{B}^z(\mathbf{t})) + O( dt^{3/2} ). \ \ \ \ \ (3)

From Taylor expansion we have

\displaystyle u( t + \mathbf{t}', \mathbf{B}^z(\mathbf{t})) = u(t,z) + \mathbf{t'} \partial_t u(t,z) + \mathrm{Re}(\mathbf{B}^z(\mathbf{t})-z) \partial_x u(t,z) + \mathrm{Im}(\mathbf{B}^z(\mathbf{t})-z) \partial_y u(t,z)

\displaystyle + \frac{1}{2} (\mathrm{Re}(\mathbf{B}^z(\mathbf{t})-z))^2 \partial_{xx} u(t,z)

\displaystyle + (\mathrm{Re}(\mathbf{B}^z(\mathbf{t})-z)) (\mathrm{Im}(\mathbf{B}^z(\mathbf{t})-z)) \partial_{xy} u(t,z)

\displaystyle + \frac{1}{2} (\mathrm{Im}(\mathbf{B}^z(\mathbf{t})-z))^2 \partial_{yy} u(t,z)

\displaystyle + O( |\mathbf{B}^z(\mathbf{t}) - z|^3 ).

Taking expectations and applying Lemma 4, (2) and Hölder’s inequality (which can interpolate between the bounds {\mathop{\bf E} |\mathbf{B}^z(\mathbf{t}) - z|^4 = O(dt^2)} and {\mathop{\bf E} |\mathbf{B}^z(\mathbf{t}) - z|^2 = O(dt)} to conclude {\mathop{\bf E} |\mathbf{B}^z(\mathbf{t}) - z|^3 = O(dt^{3/2})}), we obtain the desired claim (3). Subtracting, we now have

\displaystyle \tilde u(t,z') - u(t,z') = \mathop{\bf E} (\tilde u-u)( t + \mathbf{t}', \mathbf{B}^z(\mathbf{t})) + O( dt^{3/2} ).

The expression in the expectation vanishes unless {\mathbf{t}' = dt}, hence by the triangle inequality

\displaystyle \| \tilde u(t) - u(t)\|_{L^\infty({\bf C})} \leq \| \tilde u(t+dt) - u(t+dt) \|_{L^\infty({\bf C})} + O(dt^{3/2}).

Iterating this using the fact that {\tilde u-u} vanishes at {t=t_1}, and sending {dt} to zero (noting that the cumulative error term will go to zero since {3/2 > 1}), we conclude that {\tilde u(t)=u(t)} for all {0 \leq t \leq t_1}, giving the claim. \Box
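For the reader's convenience, here is a sketch of the two quantitative steps used at the end of this proof, under the assumption (not spelled out above) that the implied constants are uniform in {t} and {z'}. The symbols {X}, {e(t)} and {N} are shorthand introduced only for this remark. Writing {X := |\mathbf{B}^z(\mathbf{t}) - z|}, the interpolation step is an instance of the Cauchy-Schwarz inequality:

\displaystyle \mathop{\bf E} X^3 = \mathop{\bf E} (X \cdot X^2) \leq (\mathop{\bf E} X^2)^{1/2} (\mathop{\bf E} X^4)^{1/2} = O(dt^{1/2}) \cdot O(dt) = O(dt^{3/2}).

For the iteration, write {e(t) := \| \tilde u(t) - u(t) \|_{L^\infty({\bf C})}} and take {dt := (t_1-t)/N}. Applying the inequality {e(s) \leq e(s+dt) + O(dt^{3/2})} a total of {N} times gives

\displaystyle e(t) \leq e(t_1) + N \cdot O(dt^{3/2}) = 0 + O( (t_1-t) dt^{1/2} ),

which goes to zero as {N \rightarrow \infty}.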

One can use Lévy’s theorem (or variants of this theorem) to prove various results in complex analysis rather efficiently. As a quick example, we sketch a Brownian motion-based proof of Liouville’s theorem (omitting some technical steps). Suppose for contradiction that we have a nonconstant bounded entire function {f: {\bf C} \rightarrow {\bf C}}. If {\mathbf{B}^0} is a complex Brownian motion based at {0}, then a variant of Lévy’s theorem can be used to show that the image {f(\mathbf{B}^0)} is a time parameterisation of Brownian motion. But it is easy to show that Brownian motion is almost surely unbounded, so the image {f({\bf C})} cannot be bounded, a contradiction.

If {U} is an open subset of {{\bf C}} whose complement contains an arc, then one can show that for any {z_0 \in U}, the complex Brownian motion {\mathbf{B}^{z_0}} based at {z_0} will hit the boundary {\partial U} of {U} in a finite time {\mathbf{t}}. The location {\mathbf{B}^{z_0}(\mathbf{t})} where this motion first hits the boundary is then a random variable in {\partial U}; the law of this variable is called the harmonic measure of {U} with base point {z_0}, and we will denote it by {\mu^U_{z_0}}; it is a probability measure on {\partial U}. The reason for the terminology “harmonic measure” comes from the following:

Theorem 7 Let {U} be a bounded open subset of {{\bf C}}, and let {f: U \rightarrow {\bf C}} be a harmonic (or holomorphic) function that extends continuously to {\partial U}. Then for any {z_0 \in U}, one has the representation formula

\displaystyle f(z_0) = {\mathbf E} f( \mathbf{B}^{z_0}(\mathbf{t})) = \int_{\partial U} f(z)\ d\mu^U_{z_0}(z). \ \ \ \ \ (4)


Proof: (Informal) For simplicity let us assume that {f} extends smoothly to some open neighbourhood of {\partial U}. Let {\tilde {\mathbf B}^{z_0}} be the motion that is equal to {\mathbf{B}^{z_0}} up to time {\mathbf{t}}, and then is constant at {\mathbf{B}^{z_0}(\mathbf{t})} for all later times. A variant of the Taylor expansion argument used to prove Lévy’s theorem shows that

\displaystyle \mathop{\bf E} f( \tilde{\mathbf{B}}^{z_0}(t) ) = \mathop{\bf E} f( \tilde{\mathbf{B}}^{z_0}(t+dt) ) + O( dt^{3/2})

for any {0 \leq t < t+dt < \infty}, which on iterating and sending {dt} to zero implies that {\mathop{\bf E} f( \tilde{\mathbf{B}}^{z_0}(t) )} is independent of time. Since this quantity converges to {f(z_0)} as {t \rightarrow 0} and to {f( \mathbf{B}^{z_0}(\mathbf{t}))} as {t \rightarrow \infty}, the claim follows. \Box
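As a numerical sanity check of (4), here is a minimal Python sketch for the special case where {U} is the unit disk. It does not simulate Brownian paths directly; instead it uses the walk-on-spheres method, which is a standard substitute and not the method of these notes: since Brownian motion started at the centre of a disk exits that disk at a uniformly distributed point, one may jump straight to a uniform point on the largest circle around the current position that fits inside {U}, and repeat until one is within a small tolerance of {\partial U}. The function names, the tolerance `eps` and the sample sizes are illustrative choices.

```python
import cmath
import math
import random


def exit_point_unit_disk(z0, rng, eps=1e-4):
    """Approximate exit point of Brownian motion from D(0,1), started at z0.

    Walk-on-spheres: jump to a uniform point on the largest circle centred at
    the current position that fits in the disk, until within eps of the boundary.
    """
    z = z0
    while 1.0 - abs(z) > eps:
        radius = 1.0 - abs(z)
        z += radius * cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
    return z / abs(z)  # project the final position onto the unit circle


def harmonic_average(f, z0, n_samples=4000, seed=0):
    """Monte Carlo estimate of E f(B^{z0}(t)), with t the exit time from D(0,1)."""
    rng = random.Random(seed)
    total = sum(f(exit_point_unit_disk(z0, rng)) for _ in range(n_samples))
    return total / n_samples


if __name__ == "__main__":
    z0 = 0.3 + 0.2j
    f = lambda z: (z * z).real  # Re(z^2) is harmonic
    print("f(z0)       =", f(z0))
    print("MC estimate =", harmonic_average(f, z0))
```

The two printed numbers should agree up to Monte Carlo error, which decays like the inverse square root of `n_samples`.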

This theorem can also extend to unbounded domains provided that {f} does not grow too fast at infinity (for instance if {f} is bounded, basically thanks to the neighbourhood recurrent properties of complex Brownian motion); we do not give a precise statement here. Among other things, this theorem gives an immediate proof of the maximum principle for harmonic functions, since if {|f(z)| \leq M} on the boundary {\partial U} then from the triangle inequality one has {|f(z_0)|\leq M} for all {z_0 \in U}. It also gives an alternate route to Liouville’s theorem: if {f: {\bf C} \rightarrow {\bf C}} is entire and bounded, then applying the maximum principle to the complement of a small disk {D(z_1,\varepsilon)} we see that {f(z_0) = f(z_1)} for all distinct {z_0,z_1 \in {\bf C}}.

When the boundary {\partial U} is sufficiently nice (e.g. analytic), the harmonic measure becomes absolutely continuous with respect to one-dimensional Lebesgue measure; however, we will not pay too much attention to these sorts of regularity issues in this set of notes.

From Lévy’s theorem on the conformal invariance of Brownian motion we deduce the conformal invariance of harmonic measure: for any conformal map {f: U \rightarrow V} that extends continuously to the boundaries {\partial U, \partial V} and any {z_0 \in U}, the harmonic measure {\mu^V_{f(z_0)}} of {V} with base point {f(z_0)} is the pushforward under {f} of the harmonic measure {\mu^U_{z_0}} of {U} with base point {z_0}, thus

\displaystyle \int_{\partial V} g(w)\ d\mu^V_{f(z_0)}(w) = \int_{\partial U} g(f(z))\ d\mu^U_{z_0}(z)

for any continuous compactly supported test function {g}, and also

\displaystyle \mu^V_{f(z_0)}( E ) = \mu^U_{z_0}( f^{-1}(E) )

for any (Borel) measurable {E \subset \partial V}.
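The pushforward identity can be checked numerically in a simple case. The sketch below takes {U = V = D(0,1)} and assumes two standard facts that are not derived in this section: the harmonic measure of the unit disk seen from a point {a} has density {\frac{1-|a|^2}{2\pi |e^{i\theta}-a|^2}} with respect to {d\theta} (the Poisson kernel), and the Möbius map {z \mapsto \frac{z-a}{1-\overline{a} z}} is a conformal automorphism of the disk sending {a} to {0}. Harmonic measure seen from {0} is normalised arc length, so the measure of an arc seen from {a} should equal the normalised length of its image arc. The function names are hypothetical.

```python
import cmath
import math


def poisson_arc_measure(a, theta1, theta2, n=20000):
    """Harmonic measure, seen from a in D(0,1), of the arc
    {e^{i theta}: theta1 <= theta <= theta2}, by the midpoint rule
    applied to the Poisson kernel."""
    h = (theta2 - theta1) / n
    total = 0.0
    for k in range(n):
        theta = theta1 + (k + 0.5) * h
        total += (1.0 - abs(a) ** 2) / abs(cmath.exp(1j * theta) - a) ** 2
    return total * h / (2.0 * math.pi)


def mobius(a, z):
    """Disk automorphism sending a to 0."""
    return (z - a) / (1.0 - a.conjugate() * z)


def pushed_forward_arc_measure(a, theta1, theta2):
    """Harmonic measure from 0 of the image arc, i.e. its normalised length.

    The automorphism preserves orientation, so the image arc runs
    anticlockwise from the image of e^{i theta1} to that of e^{i theta2}.
    """
    w1 = mobius(a, cmath.exp(1j * theta1))
    w2 = mobius(a, cmath.exp(1j * theta2))
    angle = (cmath.phase(w2) - cmath.phase(w1)) % (2.0 * math.pi)
    return angle / (2.0 * math.pi)


if __name__ == "__main__":
    a = 0.5 + 0j
    print(poisson_arc_measure(a, 0.0, math.pi / 2))
    print(pushed_forward_arc_measure(a, 0.0, math.pi / 2))
```

The two printed values should agree up to quadrature error.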

Exercise 8 (Poisson kernel)

Exercise 9 (Brownian motion description of conformal mapping) Let {U} be the region enclosed by a Jordan curve {\partial U}, and let {z_1,z_2,z_3} be three distinct points on {\partial U} in anticlockwise order. Let {w_1,w_2,w_3} be three distinct points on the boundary {\partial D(0,1)} of the unit disk {D(0,1)}, again traversed in anticlockwise order. Let {\phi: U \rightarrow D(0,1)} be the conformal map that takes {z_j} to {w_j} for {j=1,2,3} (the existence and uniqueness of this map follows from the Riemann mapping theorem). Let {z_0 \in U}, and for {ij=12,23,31}, let {p_{ij}} be the probability that the terminal point {\mathbf{B}^{z_0}(\mathbf{t})} of Brownian motion in {U} with base point {z_0} lies in the arc between {z_i} and {z_j} (here we use the fact that the endpoints {z_i,z_j} are hit with probability zero, or in other words that the harmonic measure is continuous; see Exercise 15 below). Thus {p_{12}, p_{23}, p_{31}} are non-negative and sum to {1}. Let {\zeta_1,\zeta_2,\zeta_3 \in \partial D(0,1)} be the complex numbers {\zeta_1 := 1}, {\zeta_2 := e^{2\pi i p_{12}}}, {\zeta_3 := e^{2\pi i (p_{12} + p_{23})} = e^{-2\pi i p_{31}}}. Show the crossratio identity

\displaystyle \frac{(\phi(z_0) - w_1)(w_3 - w_2)}{(\phi(z_0)-w_2)(w_3-w_1)} = \frac{ \zeta_1 (\zeta_3 - \zeta_2)}{\zeta_2 (\zeta_3 - \zeta_1)}.

In principle, this allows one to describe conformal maps purely in terms of Brownian motion.

We remark that the link between Brownian motion and conformal mapping can help gain an intuitive understanding of the Carathéodory kernel theorem (Theorem 12 from Notes 3). Consider for instance the example in Exercise 13 from those notes. It is intuitively clear that a Brownian motion {\mathbf{B}^0} based at the origin will very rarely pass through the slit between {-1 + \frac{i}{2w_n}} and {-1 + \frac{i}{2w_n}}, instead hitting the right side of the boundary of {f_n(D(0,1))} first. As such, the harmonic measure of the left side of the boundary should be very small, and in fact one can use this to show that the preimage under {f_n} of the region to the left of the boundary goes to zero in diameter as {n \rightarrow \infty}, which helps explain why the limiting function {f} does not map to this region at all.

Exercise 10 (Brownian motion description of conformal radius)

  • (i) Let {0 < r_1 < r_2} and {z_0 \in {\bf C}} with {r_1 < |z_0| < r_2}. Show that the probability that the Brownian motion {\mathbf{B}^{z_0}} hits the circle {\{ |z| = r_1\}} before it hits {\{ |z| = r_2\}} is equal to {\frac{\log(r_2/|z_0|)}{\log(r_2/r_1)}}. (Hint: {\log|z|} is harmonic away from the origin.)
  • (ii) Let {U} be a simply connected proper subset of {{\bf C}}, let {z_0} be a point in {U}, and let {r} be the conformal radius of {U} around {z_0}. Show that for small {\varepsilon_2 > \varepsilon_1 > 0}, the probability that a Brownian motion based at a point {z_1} with {|z_1-z_0| = \varepsilon_2} will hit the circle {\{ |z-z_0| = \varepsilon_1\}} before it hits the boundary {\partial U} is equal to {\frac{\log(r/\varepsilon_2) + o(1)}{\log(r/\varepsilon_1)}}, where {o(1)} denotes a quantity that goes to zero as {\varepsilon_1,\varepsilon_2 \rightarrow 0}.
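The formula in part (i) can be tested empirically before one proves it. The following Python sketch estimates the hitting probability in an annulus by the walk-on-spheres method (jumping to a uniform point on the largest circle around the current position that stays inside the annulus), which is a simulation shortcut and not part of these notes. The function names, the tolerance `eps` and the sample size are illustrative assumptions.

```python
import cmath
import math
import random


def hits_inner_first(z0, r1, r2, rng, eps=1e-4):
    """True if the walk-on-spheres approximation of Brownian motion from z0
    comes within eps of {|z| = r1} before coming within eps of {|z| = r2}."""
    z = z0
    while True:
        d_inner = abs(z) - r1
        d_outer = r2 - abs(z)
        if d_inner <= eps:
            return True
        if d_outer <= eps:
            return False
        radius = min(d_inner, d_outer)  # largest circle inside the annulus
        z += radius * cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))


def estimate_inner_hit_probability(z0, r1, r2, n_samples=4000, seed=0):
    """Monte Carlo estimate of the probability in Exercise 10(i)."""
    rng = random.Random(seed)
    hits = sum(hits_inner_first(z0, r1, r2, rng) for _ in range(n_samples))
    return hits / n_samples


def exact_inner_hit_probability(z0, r1, r2):
    """The claimed value log(r2/|z0|) / log(r2/r1)."""
    return math.log(r2 / abs(z0)) / math.log(r2 / r1)


if __name__ == "__main__":
    z0, r1, r2 = 1.0 + 0j, 0.5, 2.0
    print("claimed  :", exact_inner_hit_probability(z0, r1, r2))
    print("simulated:", estimate_inner_hit_probability(z0, r1, r2))
```

Note that the answer depends only on {|z_0|}, which reflects the rotation invariance of Brownian motion.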

Exercise 11 Let {K} be a connected subset of {D(0,1)}, let {{\mathbf B}^0} be a Brownian motion based at the origin, and let {\mathbf{t}} be the first time this motion exits {D(0,1)}. Show that the probability that {\mathbf{B}^0([0,\mathbf{t}])} hits {K} is at least {c\mathrm{diam}(K)} for some absolute constant {c>0}. (Hint: one can control the event that {\mathbf{B}^0} makes a “loop” around a point in {K} at radius less than {\mathrm{diam}(K)}, which is enough to force intersection with {K}, at least if one works some distance away from the boundary of the disk.)

We now sketch the proof of a basic Brownian motion estimate that is useful in applications. We begin with a lemma that says, roughly speaking, that “folding” a set reduces the probability of it being hit by Brownian motion.

Lemma 12 Let {-1 < x < 1}, and let {K} be a closed subset of the unit disk {D(0,1)}. Write {K^+ := \{ z \in K: \mathrm{Im}(z) \geq 0 \}} and {K^- := \{ z \in K: \mathrm{Im}(z) < 0 \}}, and write {K' := K^+ \cup \overline{K^-}} (i.e. {K} reflected onto the upper half-plane). Let {{\mathbf B}^{x}} be a complex Brownian motion based at {x}, and let {\mathbf{t}} be the first time this motion hits the boundary of {D(0,1)}. Then

\displaystyle \mathbf{P}( \mathbf{B}^x([0,\mathbf{t}]) \hbox{ intersects } K ) \geq \mathbf{P}( \mathbf{B}^x([0,\mathbf{t}]) \hbox{ intersects } K' ).

Proof: (Informal) To illustrate the argument at a heuristic level, let us make the (almost surely false) assumption that the Brownian motion {\mathbf{B}^x} only crosses the real axis at a finite set of times {t_0=0 < t_1 < \dots < t_n < \mathbf{t}} before hitting the boundary of the disk. Then the Brownian motion {\mathbf{B}^x([0,\mathbf{t}])} would split into subcurves {\mathbf{B}^x([t_i,t_{i+1}])} for {i=0,\dots,n}, with the convention that {t_{n+1} = \mathbf{t}}. Each subcurve would lie in either the upper half-plane or the lower half-plane, with equal probability of each; furthermore, one could arbitrarily apply complex conjugation to one or more of these subcurves and still obtain a motion with the same law. Observe that if one conditions on the Brownian motion up to time {t_i}, and the subcurve {\mathbf{B}^x([t_i,t_{i+1}])} has a probability {p_i^+} of hitting {K^+} when it lies in the upper half-plane, and a probability {p_i^-} of hitting {K^-} when it lies in the lower half-plane, then it will have a probability of at most {p_i^+ + p_i^-} of hitting {K'} when it lies in the upper half-plane, and probability {0} of hitting {K'} when it lies in the lower half-plane; thus the probability of this subcurve hitting {K'} is less than or equal to that of it hitting {K}. In principle, the lemma now follows from repeatedly applying the law of total expectation.

This naive argument does not quite work because a Brownian motion starting at a real number will in fact almost surely cross the real axis an infinite number of times. However it is possible to adapt this argument by redefining the {t_i} so that after each time {t_i}, the Brownian motion is forced to move some small distance before one starts looking for the next time {t_{i+1}} it hits the real axis. See the proof of Lemma 6.1 of these notes of Lawler for a complete proof along these lines. \Box

This gives an inequality similar in spirit to the Grötzsch modulus estimate from Notes 2:

Corollary 13 (Beurling projection theorem) Let {0 < \varepsilon < 1}, and let {K} be a compact connected subset of the annulus {\{ \varepsilon \leq |z| \leq 1 \}} that intersects both boundary circles of the annulus. Let {{\mathbf B}^0} be a complex Brownian motion based at {0}, and let {\mathbf{t}} be the first time this motion hits the outer boundary {\{ |z|=1\}} of the annulus. Then the probability that {\mathbf{B}^0([0,\mathbf{t}])} intersects {K} is greater than or equal to the probability that {\mathbf{B}^0([0,\mathbf{t}])} intersects the interval {[\varepsilon,1]}.

Proof: (Sketch) One can use the above lemma to fold {K} around the real axis without increasing the probability of being hit by Brownian motion. By rotation, one can similarly fold {K} around any other line through the origin. By repeatedly folding {K} in this fashion to reduce its angular variation, one can eventually replace {K} with a set that lies inside the sector {\{ re^{i\theta}: \varepsilon \leq r \leq 1; 0 \leq \theta \leq \delta \}} for any {\delta}. However, by the monotone convergence theorem, the probability that {\mathbf{B}^0([0,\mathbf{t}])} intersects this sector converges to the probability that it intersects {[\varepsilon,1]} in the limit {\delta \rightarrow 0}, and the claim follows. \Box

Exercise 14 With the notation as in the above corollary, show that the probability that {\mathbf{B}^0([0,\mathbf{t}])} intersects the interval {[\varepsilon,1]} is {O(\varepsilon^{1/2})}. (Hint: apply a square root conformal map to the disk with {[\varepsilon,1]} removed, and then compare with the half-plane harmonic measure from Exercise 8(ii).)

The following consequence of the above estimate, giving a sort of Hölder regularity of Brownian measure, is particularly useful in applications.

Exercise 15 (Beurling estimate) Let {U} be an open set not containing {0}, with the property that the connected component of {{\bf C} \backslash U} containing {0} intersects the circle {\{ |z| = r \}}. Let {z_0 \in U} be such that {|z_0| \geq 2r}. Then for any {\varepsilon > 0}, one has {\mu^U_{z_0}( D(0,\varepsilon r) ) = O(\varepsilon^{1/2})}; that is to say, the probability that a Brownian motion based at {z_0} exits {U} at a point within {\varepsilon r} from the origin is {O(\varepsilon^{1/2})}. (Hint: one can use conformal mapping to show that the probability appearing at the end of Corollary 13 is {O(\varepsilon^{1/2})}.) Conclude in particular that harmonic measures {\mu^U_{z_0}} are always continuous (they assign zero to any point).

Exercise 16 Let {U} be a region bounded by a Jordan curve, let {z_0 \in U}, let {\mathbf{B}^{z_0}} be the Brownian motion based at {z_0}, and let {\mathbf{t}} be the first time this motion exits {U}. Then for any {R > 0}, show that the probability that the curve {\mathbf{B}^{z_0}([0,\mathbf{t}])} has diameter at least {R \mathrm{dist}(z_0, \partial U)} is at most {O(R^{-1/2})}.

Exercise 17 Let {f: D(0,1) \rightarrow U} be a conformal map with {f(0)=0}, and let {\gamma: [0,1] \rightarrow \overline{U}} be a curve with {\gamma(0) \in \partial U} and {\gamma(t) \in U} for {0 < t \leq 1}. Show that

\displaystyle \mathrm{diam}( f^{-1}( \gamma([0,1]) ) ) \leq O( (\frac{\mathrm{diam}(\gamma([0,1]))}{|f'(0)|})^{1/2}).

(Hint: use Exercise 11.)

— 2. Half-plane capacity —

One can use Brownian motion to construct other close relatives of harmonic measure, such as Green’s functions and excursion measures. See for instance these lecture notes of Lawler for more details. We will focus on one such use of Brownian motion, to interpret the concept of half-plane capacity; this is a notion that is particularly well adapted to the study of chordal Loewner equations (it plays a role analogous to that of conformal radius for the radial Loewner equation).

Let {\mathbf{H} := \{ z: \mathrm{Im}(z) > 0 \}} be the upper half-plane. A subset {A} of the upper half-plane {\mathbf{H}} is said to be a compact hull if it is bounded, closed in {\mathbf{H}}, and the complement {\mathbf{H} \backslash A} is simply connected. By the Riemann mapping theorem, for any compact hull {A}, there is a unique conformal map {g_A: \mathbf{H} \backslash A \rightarrow \mathbf{H}} which is normalised at infinity in the sense that

\displaystyle g_A(z) = z + \frac{b_1}{z} + \frac{b_2}{z^2} + \dots \ \ \ \ \ (5)

for some complex numbers {b_1, b_2, \dots}. The quantity {b_1} is particularly important and will be called the half-plane capacity of {A} and denoted {\mathrm{hcap}(A)}.

Exercise 18 (Examples of half-plane capacity)

In general, we have the following Brownian motion characterisation of half-plane capacity:

Proposition 19 Let {A} be a compact hull, with conformal map {g_A: \mathbf{H} \backslash A \rightarrow \mathbf{H}} and half-plane capacity {b_1}.

  • (i) If {\mathbf{B}^{z_0}} is complex Brownian motion based at some point {z_0 \in \mathbf{H} \backslash A}, and {\mathbf{t}} is the first time this motion exits {\mathbf{H} \backslash A}, then

    \displaystyle \mathrm{Im}(z_0) = \mathrm{Im}( g_A(z_0) ) + \mathbf{E} \mathrm{Im} \mathbf{B}^{z_0}(\mathbf{t}).

  • (ii) We have

    \displaystyle b_1 = \lim_{y \rightarrow +\infty} y \mathbf{E} \mathrm{Im} \mathbf{B}^{iy}(\mathbf{t}).

Proof: (Sketch) Part (i) follows from applying Theorem 7 to the bounded harmonic function {z \mapsto \mathrm{Im}(z - g_A(z))}. Part (ii) follows from part (i) by setting {z_0 = iy} for a large {y}, rearranging, and sending {y \rightarrow \infty} using (5). \Box

Among other things, this proposition demonstrates that {\mathrm{Im}(g_A(z_0)) \leq \mathrm{Im}(z_0)} for all {z_0 \in \mathbf{H} \backslash A}, and that the half-plane capacity is always non-negative (in fact it is not hard to show from the above proposition that it is strictly positive as long as {A} is non-empty).
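As a quick check of Proposition 19(ii), consider the vertical slit {A = (0, ih]}. We take as given (it is not derived in this section) the standard formula {g_A(z) = \sqrt{z^2+h^2}} for this hull, whose Laurent expansion is {z + \frac{h^2/2}{z} + \dots}, so that the half-plane capacity is {h^2/2}. By part (i), {\mathbf{E} \mathrm{Im} \mathbf{B}^{iy}(\mathbf{t}) = y - \mathrm{Im}\, g_A(iy) = y - \sqrt{y^2-h^2}} for {y > h}, and the short Python sketch below (with hypothetical function names) evaluates {y} times this quantity for large {y}.

```python
import math


def expected_exit_height(y, h):
    """E Im B^{iy}(t) for the slit A = (0, ih], computed via Proposition 19(i)
    as y - Im g_A(iy) = y - sqrt(y^2 - h^2), assuming g_A(z) = sqrt(z^2 + h^2).

    The difference is rationalised to avoid cancellation for large y.
    Requires y > h.
    """
    return h * h / (y + math.sqrt(y * y - h * h))


def capacity_estimate(y, h):
    """The quantity y * E Im B^{iy}(t), whose limit as y -> infinity is b_1."""
    return y * expected_exit_height(y, h)


if __name__ == "__main__":
    h = 2.0
    for y in (10.0, 100.0, 1000.0):
        print(y, capacity_estimate(y, h), "expected limit:", h * h / 2.0)
```

The estimates decrease towards {h^2/2} as {y} grows, in agreement with the limit in part (ii).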

If {A, A'} are two compact hulls with {A \subset A'}, then {g_A} will map {\mathbf{H} \backslash A'} conformally to the complement of {g_A( A' \backslash A)} in {\mathbf{H}}. Thus {g_A( A' \backslash A)} is also a compact hull, and by the uniqueness of Riemann maps we have the identity

\displaystyle g_{A'} = g_{g_A(A' \backslash A)} \circ g_A \ \ \ \ \ (8)

which on comparing Laurent expansions leads to the further identity

\displaystyle \mathrm{hcap}(A') = \mathrm{hcap}( g_A(A' \backslash A) ) + \mathrm{hcap}( A). \ \ \ \ \ (9)

In particular we have the monotonicity {\mathrm{hcap}(A') \geq \mathrm{hcap}(A)}, with equality if and only if {A'=A}. One may verify that these claims are consistent with Exercise 18.
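For completeness, here is a sketch of the comparison of Laurent expansions that yields (9); the letter {B := g_A(A' \backslash A)} is shorthand used only in this remark. From (5) we have {g_A(z) = z + \frac{\mathrm{hcap}(A)}{z} + O(|z|^{-2})} as {z \rightarrow \infty}, so in particular {\frac{1}{g_A(z)} = \frac{1}{z} + O(|z|^{-3})}, and hence

\displaystyle g_B(g_A(z)) = g_A(z) + \frac{\mathrm{hcap}(B)}{g_A(z)} + O(|z|^{-2}) = z + \frac{\mathrm{hcap}(A) + \mathrm{hcap}(B)}{z} + O(|z|^{-2}).

By (8), the left-hand side is {g_{A'}(z)}, whose {1/z} coefficient is {\mathrm{hcap}(A')}; matching coefficients gives (9).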

Exercise 20 (Submodularity of half-plane capacity) Let {A_1, A_2} be two compact hulls.

  • (i) If {z_0 \in \mathbf{H} \backslash (A_1 \cup A_2)}, show that

    \displaystyle \mathrm{Im}(g_{A_1}(z_0)) + \mathrm{Im}(g_{A_2}(z_0)) \geq \mathrm{Im}(g_{A_1 \cap A_2}(z_0)) + \mathrm{Im}(g_{A_1 \cup A_2}(z_0)).

    (Hint: use Proposition 19, and consider how the times in which a Brownian motion {\mathbf{B}^{z_0}} exits {\mathbf{H} \backslash A_1}, {\mathbf{H} \backslash A_2}, {\mathbf{H} \backslash (A_1 \cup A_2)}, and {\mathbf{H} \backslash (A_1 \cap A_2)} are related.)

  • (ii) Show that

    \displaystyle \mathrm{hcap}(A_1) + \mathrm{hcap}(A_2) \geq \mathrm{hcap}(A_1 \cap A_2) + \mathrm{hcap}(A_1 \cup A_2).

Exercise 21 Let {A} be a compact hull bounded in a disk {D(0,R)}. For any {x>R}, show that

\displaystyle \mathbf{P}( \mathbf{B}^{iy}( \mathbf{t} ) \in [x,+\infty) ) = \frac{1}{2} - \frac{g_A(x)}{\pi y} + o(1/y)

as {y \rightarrow +\infty}, where {\mathbf{B}^{iy}} is complex Brownian motion based at {iy} and {\mathbf{t}} is the first time it exits {\mathbf{H} \backslash A}. Similarly, for any {x < -R}, show that

\displaystyle \mathbf{P}( \mathbf{B}^{iy}( \mathbf{t} ) \in (-\infty,x] ) = \frac{1}{2} + \frac{g_A(x)}{\pi y} + o(1/y).

This formula gives a Brownian motion interpretation for {g_A} on the portion {(-\infty,-R) \cup (R,+\infty)} of the boundary of {\mathbf{H} \backslash A}. It can be used to give useful quantitative estimates for {g_A} in this region; see Section 3.4 of Lawler’s book.

— 3. The chordal Loewner equation —

We now develop (in a rather informal fashion) the theory of the chordal Loewner equation, which roughly speaking is to conformal maps from the upper half-plane {{\mathbf H}} to the complement {{\mathbf H} \backslash A} of compact hulls as the radial Loewner equation is to conformal maps from the unit disk to subsets of the complex plane. A more rigorous treatment can be found in Lawler’s book.

Suppose one has a simple curve {\gamma: [0,+\infty) \rightarrow \overline{\mathbf{H}}} such that {\gamma(0) \in {\bf R}} and {\gamma((0,+\infty)) \subset {\mathbf H}}. There are important and delicate issues regarding the regularity hypotheses on this curve (which become particularly important in SLE, when the regularity is quite limited), but for this informal discussion we will ignore all of these issues.

For each time {t}, the set {\gamma((0,t])} forms a compact hull, and so has some half-plane capacity {\mathrm{hcap}( \gamma((0,t]))}. From the monotonicity of capacity, this half-plane capacity is increasing in {t}. It is traditional to normalise the curve {\gamma} so that

\displaystyle \mathrm{hcap}( \gamma((0,t])) = 2t; \ \ \ \ \ (10)

this is analogous to normalising the Loewner chains from Notes 3 to have conformal radius {e^t} at time {t}. A basic example of such normalised curves would be the curves {\gamma(t) = x+2 t^{1/2} i} for some fixed {x}, since the normalisation follows from (6).

Let {g_t: \mathbf{H} \backslash \gamma((0,t]) \rightarrow \mathbf{H}} be the conformal maps associated to these compact hulls. From (8) we will have

\displaystyle g_{t+dt} = g_{t+dt \leftarrow t} \circ g_t \ \ \ \ \ (11)

for any {t \geq 0} and {dt>0}, where {g_{t + dt \leftarrow t}: \mathbf{H}\backslash g_t(\gamma((t,t+dt])) \rightarrow \mathbf{H}} is the conformal map associated to the compact hull {g_t(\gamma((t,t+dt]))}. From (9) this hull has half-plane capacity {2dt}, thus we have the Laurent expansion

\displaystyle g_{t+dt \leftarrow t}(z) = z + \frac{2dt}{z} + \dots .

It can be shown (using the Beurling estimate) that {g_t} extends continuously to the tip {\gamma(t)} of the curve {\gamma([0,t])}, and attains a real value {U_t := g_t(\gamma(t))} at that point; furthermore, {U_t} depends continuously on {t}. See Lemma 4.2 of Lawler’s book. As such, {g_t(\gamma((t,t+dt]))} should be a short arc (of diameter {O(dt^{1/2})}) starting at {U_t}. If {U_t=0}, it is possible to use a quantitative version of Exercise 21 (again using the Beurling estimate) to obtain an estimate basically of the form

\displaystyle g_{t+dt \leftarrow t}(z) = z + \frac{2dt}{z} + o(dt)

for any fixed {z \in \mathbf{H}}. If {U_t} is non-zero, we instead have

\displaystyle g_{t+dt \leftarrow t}(z) = z + \frac{2dt}{z - U_t} + o(dt). \ \ \ \ \ (12)

For instance, if {\gamma(t) = x + 2t^{1/2} i}, then {U_t = x} for all {t}, and from Exercise 18 we have the exact formula

\displaystyle g_{t+dt \leftarrow t}(z) = x + ((z-x)^2 + 4 dt)^{1/2} = z + \frac{2dt}{z - x} + o(dt).

Inserting (12) into (11) and using the chain rule, we obtain

\displaystyle g_{t+dt}(z) = g_t(z) + \frac{2dt}{g_t(z) - U_t} + o(dt)

and we then arrive at the (chordal) Loewner equation

\displaystyle \partial_t g_t(z) = \frac{2}{g_t(z) - U_t} \ \ \ \ \ (13)

for all {t \geq 0} and {z \in \mathbf{H} \backslash \gamma((0,t])}. This equation can be justified rigorously for any simple curve {\gamma}: see Proposition 4.4 of Lawler’s book. Note that the imaginary part of {\frac{2}{g_t(z)-U_t}} is negative, which is consistent with the observation made previously that the imaginary part of {g_t(z)} is decreasing in {t}.
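To see the Loewner equation (13) in action, here is a minimal Python sketch that integrates it numerically for a given driving function and compares the result, in the case of a constant driving term {U_t = x}, with the slit map {x + ((z-x)^2+4t)^{1/2}} that the exact formula above produces. The classical fourth-order Runge-Kutta scheme is just one convenient choice of integrator, and the function names are hypothetical. The sketch makes no attempt to detect the time at which a point is absorbed into the hull, so {z} should be kept away from the curve.

```python
import cmath


def solve_loewner(z, driving, t_max, n_steps=2000):
    """Integrate d/dt g_t(z) = 2 / (g_t(z) - U_t), g_0(z) = z, by classical RK4.

    `driving` is a function t -> U_t taking real values.
    """
    def rhs(t, g):
        return 2.0 / (g - driving(t))

    dt = t_max / n_steps
    g = complex(z)
    for k in range(n_steps):
        t = k * dt
        k1 = rhs(t, g)
        k2 = rhs(t + 0.5 * dt, g + 0.5 * dt * k1)
        k3 = rhs(t + 0.5 * dt, g + 0.5 * dt * k2)
        k4 = rhs(t + dt, g + dt * k3)
        g += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return g


def slit_map(z, x, t):
    """x + sqrt((z - x)^2 + 4t), with the square root taken in the upper
    half-plane; this solves (13) when U_t = x for all t."""
    w = cmath.sqrt((z - x) ** 2 + 4.0 * t)
    return x + (w if w.imag >= 0 else -w)


if __name__ == "__main__":
    z, x, t = 1.0 + 1.0j, 0.5, 1.0
    print("numerical:", solve_loewner(z, lambda s: x, t))
    print("exact    :", slit_map(z, x, t))
```

In both outputs the imaginary part is smaller than that of {z}, consistent with the monotonicity noted above.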

We have started with a chain of compact hulls {\gamma((0,t])} associated to a simple curve, and shown that the resulting conformal maps {g_t: \mathbf{H} \backslash \gamma((0,t]) \rightarrow \mathbf{H}} obey the Loewner equation for some continuous driving term {U_t: [0,+\infty) \rightarrow {\bf R}}. Conversely, suppose one is given a continuous driving term {U_t: [0,+\infty) \rightarrow {\bf R}}. It follows from the Picard existence and uniqueness theorem that for each {z \in \mathbf{H}} there is a unique maximal time of existence {0 < T(z) \leq +\infty} such that the ODE (13) with initial data {g_0(z) = z} can be solved for time {0 \leq t < T(z)}, with {g_t(z)} taking values in {\mathbf{H}}. If we set {H_t := \{ z \in \mathbf{H}: T(z) > t \}}, one can show that for each time {t}, {g_t} is a conformal map from {H_t} to {\mathbf{H}} with the Laurent expansion

\displaystyle g_t(z) = z + \frac{2t}{z} + \dots,

hence the complements {K_t := \mathbf{H} \backslash H_t} form an increasing family of compact hulls with half-plane capacity {2t}. Proving complex differentiability of {g_t} can be done from first principles, and the Laurent expansion near infinity is also not hard; the main difficulty is to show that the map {g_t: H_t \rightarrow \mathbf{H}} is surjective, which requires solving (13) backwards in time (and here one can do this indefinitely as now one is moving away from the real axis instead of towards it). See Theorem 4.6 of Lawler’s book for details (in fact a more general theorem is proven, in which the single point {U_t} is replaced by a probability measure, analogously to how the radial Loewner equation uses Herglotz functions instead of a single driving function when not restricted to slit domains).

However, there is a subtlety, in that the hulls {K_t} are not necessarily the image of simple curves {\gamma}. They often are for short times if the driving function {U_t} does not oscillate too wildly, but it can happen that the curve {\gamma} that one would expect to trace out {K_t} eventually intersects itself, in which case the region it then encloses must be absorbed into the hull {K_t} (cf. the “pinching off” phenomenon in the Carathéodory kernel theorem). Nevertheless, it is still possible to have Loewner chains that are “generated” by non-simple paths {\gamma: [0,+\infty) \rightarrow \mathbf{H}}, in the sense that {H_t} consists of the unbounded connected component of the complement {\mathbf{H} \backslash \gamma([0,t])}.

There are some symmetries of the transform from the {U_t} to the {g_t}. If one translates {U_t} by a constant, {\tilde U_t = U_t + x_0}, then the resulting domains {H_t} are also translated, {\tilde H_t = H_t + x_0}, and {\tilde g_t(z) = g_t(z-x_0) + x_0}. Slightly less trivially, for any {\lambda > 0}, if one performs a rescaled dilation {\tilde U_t := \lambda^{-1} U_{\lambda^2 t}}, then one can check using (13) that {\tilde H_t = \lambda^{-1} H_{\lambda^2 t}}, and the corresponding conformal maps {\tilde g_t} are given by {\tilde g_t(z) = \lambda^{-1} g_{\lambda^2 t}(\lambda z)}. On the other hand, just performing a scalar multiple {\tilde U_t = \lambda U_t} on the driving force {U_t} can transform the behavior of {g_t} dramatically; the transform from {U_t} to {g_t} is very definitely not linear!
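The scaling symmetry can be verified directly from (13); the following is a routine chain rule computation, included as a sketch. With {\tilde g_t(z) := \lambda^{-1} g_{\lambda^2 t}(\lambda z)} and {\tilde U_t := \lambda^{-1} U_{\lambda^2 t}} one has

\displaystyle \partial_t \tilde g_t(z) = \lambda^{-1} \cdot \lambda^2 \cdot \frac{2}{g_{\lambda^2 t}(\lambda z) - U_{\lambda^2 t}} = \frac{2}{\lambda^{-1} g_{\lambda^2 t}(\lambda z) - \lambda^{-1} U_{\lambda^2 t}} = \frac{2}{\tilde g_t(z) - \tilde U_t},

together with the initial condition {\tilde g_0(z) = \lambda^{-1} g_0(\lambda z) = z}. Thus {\tilde g_t} solves (13) with driving term {\tilde U_t}, and the claim follows from the uniqueness of solutions to this ODE. The factor {\lambda^2} in the time variable is exactly what is needed to cancel the extra powers of {\lambda}, which is why a plain multiplication {U_t \mapsto \lambda U_t} with no time change is not a symmetry.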

— 4. Schramm-Loewner evolution —

In the previous section, we have indicated that every continuous driving function {U: [0,\infty) \rightarrow {\bf R}} gives rise to a family {g_t: H_t \rightarrow \mathbf{H}} of conformal maps obeying the Loewner equation (13). The (chordal) Schramm-Loewner evolution ({SLE_\kappa}) with parameter {\kappa \geq 0} is the special case in which the driving function {t \mapsto U_t} takes the form {U_t = \sqrt{\kappa} \mathbf{B}_t} for some real Brownian motion based at the origin. Thus {\mathbf{g}_t: \mathbf{H}_t \rightarrow \mathbf{H}} is now a random conformal map from a random domain {\mathbf{H}_t}, defined by solving the Schramm-Loewner equation

\displaystyle \partial_t \mathbf{g}_t(z) = \frac{2}{\mathbf{g}_t(z) - \sqrt{\kappa} \mathbf{B}_t}

with initial condition {\mathbf{g}_0(z) = z} for {z \in \mathbf{H}}, and with {\mathbf{H}_t} defined as the set of all {z} for which the above ODE can be solved up to time {t} taking values in {\mathbf{H}}. The parameter {\kappa} cannot be scaled away by simple renormalisations such as scaling, and in fact the behaviour of {SLE_\kappa} is rather sensitive to the value of {\kappa}, with values such as {\kappa = 2, 8/3, 3, 4, 6, 8} playing particularly special roles; there is also a duality relationship between {SLE_\kappa} and {SLE_{16/\kappa}} which we will not discuss here.
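To make the definition concrete, here is a rough Python sketch that follows a single point {z} under the Schramm-Loewner flow. It is only an illustration under simplifying assumptions: the Brownian motion is sampled on a fixed time grid by summing independent Gaussian increments, the ODE is discretised by the forward Euler method, and a point is declared (heuristically) to have left {\mathbf{H}_t} once the imaginary part of its image drops below a small threshold. The function name and all parameters are hypothetical.

```python
import math
import random


def sle_flow(z, kappa, t_max, n_steps=2000, seed=0, min_imag=1e-6):
    """Euler scheme for d/dt g_t(z) = 2 / (g_t(z) - sqrt(kappa) B_t), g_0(z) = z.

    Returns (g, t_reached). If t_reached < t_max, the imaginary part of g
    fell below min_imag at that time and the integration was stopped.
    """
    rng = random.Random(seed)
    dt = t_max / n_steps
    g = complex(z)
    b = 0.0  # current value of the real Brownian motion B_t
    for k in range(n_steps):
        if g.imag < min_imag:
            return g, k * dt
        g += dt * 2.0 / (g - math.sqrt(kappa) * b)
        b += rng.gauss(0.0, math.sqrt(dt))  # increment of variance dt
    return g, t_max


if __name__ == "__main__":
    for kappa in (0.0, 2.0, 6.0):
        g, t = sle_flow(0.5 + 2.0j, kappa, t_max=0.5)
        print("kappa =", kappa, " g =", g, " reached t =", t)
```

Each Euler step lowers the imaginary part of {\mathbf{g}_t(z)}, mirroring the corresponding property of the exact flow. A more careful simulation would refine the time step as the point approaches the real axis.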

The {\kappa=0} case is rather boring, in which {\mathbf{g}_t(z) = \sqrt{z^2+4t}} is deterministic, and {\mathbf{H}_t} is just {\mathbf{H}} with the line segment between {0} and {2\sqrt{t}i} removed. The cases {\kappa>0} are substantially more interesting. It is a non-trivial theorem (particularly at the special value {\kappa = 8}) that {SLE_\kappa} is almost surely generated by some random path {\mathbf{\gamma}: [0,+\infty) \rightarrow \mathbf{H}}; see Theorem 6.3 of Lawler’s book. The nature of this path is sensitive to the choice of parameter {\kappa}:

  • For {0 \leq \kappa \leq 4}, the path is almost surely simple and goes to infinity as {t \rightarrow \infty}; it also avoids the real line (except at time {t=0}).
  • For {4 < \kappa < 8}, the path is almost surely neither simple nor space-filling, but one still has {\bigcap_{t>0} \mathbf{H}_t = \emptyset}; it also has non-trivial intersection with the real line.
  • For {\kappa \geq 8}, the path is almost surely space-filling (which of course also implies that {\bigcap_{t>0} \mathbf{H}_t = \emptyset}), and also hits every point on {{\bf R}}.

See Section 6.2 of Lawler’s book. The path becomes increasingly fractal as {\kappa} increases: it is a result of Rohde and Schramm and Beffara that the image almost surely has Hausdorff dimension {\min(1 + \frac{\kappa}{8}, 2)}.

We have asserted that {SLE_\kappa} defines a random path in {\mathbf{H}} that starts at the origin and generally “wanders off” to infinity (though for {\kappa > 4} it keeps recurring back to bounded sets infinitely often). By the Riemann mapping theorem, we can now extend this to other domains. Let {U} be a simply connected open proper subset of {{\bf C}} whose boundary we will assume for simplicity to be a Jordan curve (this hypothesis can be relaxed). Let {z_0, z_\infty} be two distinct points on the boundary {\partial U}. By the Riemann mapping theorem and Carathéodory’s theorem (Theorem 20 from Notes 2), there is a conformal map {\phi: U \rightarrow \mathbf{H}} whose continuous extension {\phi: \overline{U} \rightarrow \overline{\mathbf{H}}} maps {z_0} and {z_\infty} to {0} and {\infty} respectively; this map is unique up to rescalings {\phi \mapsto \lambda \phi} for {\lambda > 0}. One can then define the Schramm-Loewner evolution {SLE_\kappa} on {U} from {z_0} to {z_\infty} to be the family of conformal maps {\phi^{-1} \circ \mathbf{g}_t \circ \phi: \phi^{-1}(\mathbf{H}_t) \rightarrow U} for {t>0}, where {\mathbf{g}_t: \mathbf{H}_t \rightarrow \mathbf{H}} is the usual Schramm-Loewner evolution {SLE_\kappa} with parameter {\kappa}. The Schramm-Loewner evolution {SLE_\kappa} on {U} is well defined up to a time reparameterisation {t \mapsto \lambda^{-2} t}. The Markovian and stationary nature of Brownian motion translates to an analogous Markovian and conformally invariant property of {SLE_\kappa}. 
Roughly speaking, it is the following: if {U} is any reasonable domain with two boundary points {z_0, z_\infty}, {\mathbf{g}_t: \mathbf{U}_t \rightarrow U} is {SLE_\kappa} on this domain from {z_0} to {z_\infty} with associated path {\mathbf{\gamma}: [0,+\infty) \rightarrow U}, and {t_0>0} is any time, then after conditioning on the path up to time {t_0}, the remainder of the {SLE_\kappa} path has the same image as the {SLE_\kappa} path on the domain {\mathbf{U}_{t_0}} from {\mathbf{\gamma}(t_0)} to {z_\infty}. Conversely, under suitable regularity hypotheses, the {SLE_\kappa} processes are the only random path processes on domains with this property (much as Brownian motion is the only Markovian stationary process, once one normalises the mean and variance). As a consequence, whenever one now has a random path process that is known or suspected to enjoy some conformal invariance properties, it has become natural to conjecture that it obeys the law of {SLE_\kappa} (though in some cases it is more natural to work with other flavours of SLE than the chordal SLE discussed here, such as radial SLE or whole-plane SLE). For instance, in the pioneering work of Schramm, this line of reasoning was used to conjecture that the loop-erased random walk in a domain has the law of (radial) {SLE_2}; this conjecture was then established by Lawler, Schramm, and Werner. Many further processes have since been either proven or conjectured to be linked to one of the SLE processes, such as the limiting law of a uniform spanning tree (proven to be {SLE_8}), interfaces of the Ising model (proven to be {SLE_3}), or the scaling limit of self-avoiding random walks (conjectured to be {SLE_{8/3}}). Further discussion of these topics is beyond the scope of this course, and we refer the interested reader to Lawler’s book for more details.

June 04, 2018

Richard EastherHow to Talk to Kids With Dreams

A few weeks ago I listened to a bunch of super-enthusiastic high school students share their excitement about astronomy, astrophysics and the space industry. We were at the Royal Astronomical Society of New Zealand’s annual meeting; every year they fund a dozen or so keen students from around the country to attend the event. It’s competitive, with many more kids applying than there are places to take them. 

The students get to share what lit their passion for space, but as they recounted their stories maybe a third of them mentioned a teacher or career counsellor who had done their best to quench that fire. “There’s no jobs in that.” “It’s really hard,” “You can’t do that in New Zealand.” “It won’t help you get into university.” And I wondered who listens to a smart, ambitious youngster's hopes and dreams and tells them to play it safe?

It isn’t all teachers – others told of enthusiastic and supportive mentors – but it was potentially way too many of them, even if the nay-sayers imagine they are protecting kids from disappointment or quietly steering them away from games that are out of their league. Ironically, this is happening at a time when there is a lot of talk about reinventing school – getting kids to code, preparing them for a future where machine learning, AI and 3D printing are big things, building baby entrepreneurs. But none of that will be much use if some teachers are telling students not to chase their dreams.

Ironically, when educators talk about preparing kids for the future, part of their message is usually that the age of linear careers is behind us and today's kids will change jobs far more frequently than their parents did. Which makes it particularly strange for teachers to tell students to aim only at safe and dependable careers as they chart their educational trajectory.

So what should you say to a smart kid with dreams that would take them off the beaten track? Here's my list: 

  1. Tell them "Wow that's great. Go for it!"
  2. Say to them "That sounds like it's going to be a lot of work."
  3. Ask them if they know what they need to do to get to their goal. What skills will they need? 
  4. Tell them "That sounds awesome" (whatever it is).
  5. Tell them "Go for it. Whatever happens, you will be OK", but help them figure out what they are likely to hit if they miss their actual target. Don't be negative, it just never hurts to look before you leap, to make sure they will be ok.
  6. Ask them, "What can you do right now that will get you closer to your dream?" Get them doing things today that will engage their passion. 
  7. Ask "How can I help?" 

Item 1 is easy, and it works for anything. Item 2 is honest – no-one sleepwalks to being an astronaut or an All Black – and it is likely part of the fun.

Item 3 is a biggie, and for some goals the answer will be more obvious than others. What should someone who wants to be a professional athlete do, beyond just training for their sport? I'm not the best person to ask about that (really, I'm not) but whatever you want to do, my guess is that some of the things that will get you there won't always be obvious: one of my key skills as a scientist is that I like to write and know how to tell a story. (Seriously: every grant application is a story about what I would do if you gave me money – and it helps if I tell it well.) If a kid at your school wants to get into astronomy (something I do know something about), it could be a long wait before they can actually do astronomy. But astronomers need to know maths, computing, statistics and physics – does the kid you are talking to know this? Set them to finding out what is on their list. 

Item 4 leads into Item 5. If a kid chases their dreams, are they really taking their whole future into the Casino of Life and putting it all on 22? Or are they laying down skills that also lead to all sorts of other opportunities? If I am brutally honest, I will only retire once, and while my field is growing it is certainly not expanding fast enough to accommodate everyone who trains to work within it. Numbers as miserable as a 2 or 3% success rate for making the transition from a PhD to a permanent job in the field are thrown around – the situation is not as dire as that, but in astronomy the success rate may be around 20% for people who actually get PhDs. At some point, it is possible students stand a better chance of success if they don't spend too much time stressing about the odds. But all my former students are doing well. Some have tenure – and others have successful startups or jobs in Silicon Valley where they use knowledge they acquired while they were working with me.

Someone chasing their dreams may be walking a high wire, but they can build a safety net by laying down serious and transferable skills. (And many of the people I know who pulled the cord on their personal Plan B made a conscious decision to do something new; ambitions change.) Help them figure out what those will be for them. The real trick to Item 5 is to help kids take measured risks without hinting you don't think they can do it. And this goes double if you are talking to people who might feel that they won't fit naturally into their chosen field – my friend Chanda Prescod-Weinstein often says that African American students hear well-meaning talk about a personal Plan B as "I don't think this is for you"; the same likely goes for Māori and Pasifika kids in New Zealand, or girls heading into fields where women are scarce. 

Items 6 and 7: whatever it is they want to do, there will always be something they can start on right away. There's no time like the present. And be there for them. Check in. Ask how they are getting on. And wish them well.


And one last question: if you're a teacher, what's your ambition? It's got to be better to watch your students chase their dreams than, when they come to tell the story of their success, be remembered as the person who said they couldn't do it.

CODA: It is also true that if we are talking about academia, it is not just the student's job to be sure they have a Plan B, it is up to their teachers to recognise that many of them will need one, make sure that they get the chance to maximise their transferable skills and know how to demonstrate them to potential employers. But that's a story for another day.  

IMAGE CREDIT: The image is from NASA.  

Terence Tao246C notes 3: Univalent functions, the Loewner equation, and the Bieberbach conjecture

We now approach conformal maps from yet another perspective. Given an open subset {U} of the complex numbers {{\bf C}}, define a univalent function on {U} to be a holomorphic function {f: U \rightarrow {\bf C}} that is also injective. We will primarily be studying this concept in the case when {U} is the unit disk {D(0,1) := \{ z \in {\bf C}: |z| < 1 \}}.

Clearly, a univalent function {f: D(0,1) \rightarrow {\bf C}} on the unit disk is a conformal map from {D(0,1)} to the image {f(D(0,1))}; in particular, {f(D(0,1))} is simply connected, and not all of {{\bf C}} (since otherwise the inverse map {f^{-1}: {\bf C} \rightarrow D(0,1)} would violate Liouville’s theorem). In the converse direction, the Riemann mapping theorem tells us that every open simply connected proper subset {V \subsetneq {\bf C}} of the complex numbers is the image of a univalent function on {D(0,1)}. Furthermore, if {V} contains the origin, then the univalent function {f: D(0,1) \rightarrow {\bf C}} with this image becomes unique once we normalise {f(0) = 0} and {f'(0) > 0}. Thus the Riemann mapping theorem provides a one-to-one correspondence between open simply connected proper subsets of the complex plane containing the origin, and univalent functions {f: D(0,1) \rightarrow {\bf C}} with {f(0)=0} and {f'(0)>0}. We will focus particular attention on the univalent functions {f: D(0,1) \rightarrow {\bf C}} with the normalisation {f(0)=0} and {f'(0)=1}; such functions will be called schlicht functions.

One basic example of a univalent function on {D(0,1)} is the Cayley transform {z \mapsto \frac{1+z}{1-z}}, which is a Möbius transformation from {D(0,1)} to the right half-plane {\{ \mathrm{Re}(z) > 0 \}}. (The slight variant {z \mapsto \frac{1-z}{1+z}} is also referred to as the Cayley transform, as is the closely related map {z \mapsto \frac{z-i}{z+i}}, which maps the upper half-plane to {D(0,1)}.) One can square this map to obtain a further univalent function {z \mapsto \left( \frac{1+z}{1-z} \right)^2}, which now maps {D(0,1)} to the complex numbers with the negative real axis {(-\infty,0]} removed. One can normalise this function to be schlicht to obtain the Koebe function

\displaystyle  f(z) := \frac{1}{4}\left( \left( \frac{1+z}{1-z} \right)^2 - 1\right) = \frac{z}{(1-z)^2}, \ \ \ \ \ (1)

which now maps {D(0,1)} to the complex numbers with the half-line {(-\infty,-1/4]} removed. A little more generally, for any {\theta \in {\bf R}} we have the rotated Koebe function

\displaystyle  f(z) := \frac{z}{(1 - e^{i\theta} z)^2} \ \ \ \ \ (2)

that is a schlicht function that maps {D(0,1)} to the complex numbers with the half-line {\{ -re^{-i\theta}: r \geq 1/4\}} removed.

Every schlicht function {f: D(0,1) \rightarrow {\bf C}} has a convergent Taylor expansion

\displaystyle  f(z) = a_1 z + a_2 z^2 + a_3 z^3 + \dots

for some complex coefficients {a_1,a_2,\dots} with {a_1=1}. For instance, the Koebe function has the expansion

\displaystyle  f(z) = z + 2 z^2 + 3 z^3 + \dots = \sum_{n=1}^\infty n z^n

and similarly the rotated Koebe function has the expansion

\displaystyle  f(z) = z + 2 e^{i\theta} z^2 + 3 e^{2i\theta} z^3 + \dots = \sum_{n=1}^\infty n e^{i(n-1)\theta} z^n.
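As a quick numerical sanity check of this expansion, one can square the geometric series for {1/(1-e^{i\theta}z)} and compare with the claimed coefficients {n e^{i(n-1)\theta}}. The following is only a minimal sketch; the helper name `rotated_koebe_coeffs` and the sample angle are illustrative choices and not part of the notes.

```python
import cmath


def rotated_koebe_coeffs(theta, n_max):
    """Taylor coefficients a_1, ..., a_{n_max} of z / (1 - e^{i theta} z)^2.

    Obtained by taking the Cauchy product of the geometric series
    1/(1 - c z) = sum_k c^k z^k with itself, then shifting by one power of z.
    """
    c = cmath.exp(1j * theta)
    geom = [c ** k for k in range(n_max)]  # coefficients of 1/(1 - c z)
    square = [sum(geom[j] * geom[k - j] for j in range(k + 1))
              for k in range(n_max)]       # coefficients of 1/(1 - c z)^2
    return square                          # square[k] is the coefficient a_{k+1}


theta = 0.7  # arbitrary sample angle
coeffs = rotated_koebe_coeffs(theta, 8)
expected = [n * cmath.exp(1j * (n - 1) * theta) for n in range(1, 9)]
max_err = max(abs(a - e) for a, e in zip(coeffs, expected))
print("largest deviation from n e^{i(n-1) theta}:", max_err)
```

In particular {|a_n| = n} for every rotation, which is the extremal behaviour that the Bieberbach conjecture concerns.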

Intuitively, the Koebe function and its rotations should be the “largest” schlicht functions available. This is formalised by the famous Bieberbach conjecture, which asserts that for any schlicht function, the coefficients {a_n} should obey the bound {|a_n| \leq n} for all {n}. After a large number of partial results, this conjecture was eventually solved by de Branges; see for instance this survey of Korevaar or this survey of Koepf for a history.

It turns out that to resolve these sorts of questions, it is convenient to restrict attention to schlicht functions {g: D(0,1) \rightarrow {\bf C}} that are odd, thus {g(-z)=-g(z)} for all {z}, and the Taylor expansion now reads

\displaystyle  g(z) = b_1 z + b_3 z^3 + b_5 z^5 + \dots

for some complex coefficients {b_1,b_3,\dots} with {b_1=1}. One can transform a general schlicht function {f: D(0,1) \rightarrow {\bf C}} to an odd schlicht function {g: D(0,1) \rightarrow {\bf C}} by observing that the function {f(z^2)/z^2: D(0,1) \rightarrow {\bf C}}, after removing the singularity at zero, is a non-zero function that equals {1} at the origin, and thus (as {D(0,1)} is simply connected) has a unique holomorphic square root {(f(z^2)/z^2)^{1/2}} that also equals {1} at the origin. If one then sets

\displaystyle  g(z) := z (f(z^2)/z^2)^{1/2} \ \ \ \ \ (3)

it is not difficult to verify that {g} is an odd schlicht function which additionally obeys the equation

\displaystyle  f(z^2) = g(z)^2. \ \ \ \ \ (4)

Conversely, given an odd schlicht function {g}, the formula (4) uniquely determines a schlicht function {f}.

For instance, if {f} is the Koebe function (1), {g} becomes

\displaystyle  g(z) = \frac{z}{1-z^2} = z + z^3 + z^5 + \dots, \ \ \ \ \ (5)

which maps {D(0,1)} to the complex numbers with two slits {\{ \pm iy: y \geq 1/2 \}} removed, and if {f} is the rotated Koebe function (2), {g} becomes

\displaystyle  g(z) = \frac{z}{1- e^{i\theta} z^2} = z + e^{i\theta} z^3 + e^{2i\theta} z^5 + \dots. \ \ \ \ \ (6)

De Branges established the Bieberbach conjecture by first proving an analogous conjecture for odd schlicht functions known as Robertson’s conjecture. More precisely, we have

Theorem 1 (de Branges’ theorem) Let {n \geq 1} be a natural number.

  • (i) (Robertson conjecture) If {g(z) = b_1 z + b_3 z^3 + b_5 z^5 + \dots} is an odd schlicht function, then

    \displaystyle  \sum_{k=1}^n |b_{2k-1}|^2 \leq n.

  • (ii) (Bieberbach conjecture) If {f(z) = a_1 z + a_2 z^2 + a_3 z^3 + \dots} is a schlicht function, then

    \displaystyle  |a_n| \leq n.

It is easy to see that the Robertson conjecture for a given value of {n} implies the Bieberbach conjecture for the same value of {n}. Indeed, if {f(z) = a_1 z + a_2 z^2 + a_3 z^3 + \dots} is schlicht, and {g(z) = b_1 z + b_3 z^3 + b_5 z^5 + \dots} is the odd schlicht function given by (3), then from extracting the {z^{2n}} coefficient of (4) we obtain a formula

\displaystyle  a_n = \sum_{j=1}^n b_{2j-1} b_{2(n+1-j)-1}

for the coefficients of {f} in terms of the coefficients of {g}. Applying the Cauchy-Schwarz inequality, we derive the Bieberbach conjecture for this value of {n} from the Robertson conjecture for the same value of {n}. We remark that Littlewood and Paley had conjectured a stronger form {|b_{2k-1}| \leq 1} of Robertson’s conjecture, but this was disproved for {k=3} by Fekete and Szegö.
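The passage from {f} to {g}, the coefficient formula above, and the Cauchy-Schwarz step can all be tested on truncated power series. The sketch below is an illustration under stated assumptions: the helper names `odd_coeffs` and `recover_a` are hypothetical, and the two test functions (the Koebe function and the Möbius map {z/(1-z)}, both schlicht) are chosen only as examples. Writing {g(z) = z s(z^2)}, the series {s} is the square root of {f(w)/w} with {s(0)=1}, which can be computed term by term.

```python
def odd_coeffs(a, n_terms):
    """Return [b_1, b_3, ..., b_{2*n_terms-1}] for the odd g with f(z^2) = g(z)^2.

    `a` lists a_1, a_2, ... (with a_1 = 1), so a[n] is the coefficient of w^n
    in f(w)/w.  The series s = sqrt(f(w)/w) with s(0) = 1 is found recursively
    from the identity sum_{k=0}^n s_k s_{n-k} = a[n].
    """
    s = [complex(1)]
    for n in range(1, n_terms):
        cross = sum(s[k] * s[n - k] for k in range(1, n))
        s.append((a[n] - cross) / 2)
    return s


def recover_a(b, n):
    """a_n = sum_{j=1}^n b_{2j-1} b_{2(n+1-j)-1}, where b[j-1] stores b_{2j-1}."""
    return sum(b[j - 1] * b[n - j] for j in range(1, n + 1))


N = 8
examples = {
    "koebe": [float(n) for n in range(1, N + 1)],  # z/(1-z)^2 has a_n = n
    "moebius": [1.0] * N,                          # z/(1-z) has a_n = 1
}
report = {}
for name, a in examples.items():
    b = odd_coeffs(a, N)
    # How well the convolution formula reproduces a_1, ..., a_N.
    conv_err = max(abs(recover_a(b, n) - a[n - 1]) for n in range(1, N + 1))
    # Cauchy-Schwarz: |a_n| <= sum_{k=1}^n |b_{2k-1}|^2, so this gap is >= 0.
    cs_gap = min(sum(abs(x) ** 2 for x in b[:n]) - abs(a[n - 1])
                 for n in range(1, N + 1))
    report[name] = (b, conv_err, cs_gap)
    print(name, [round(x.real, 4) for x in b], conv_err, cs_gap)
```

For the Koebe function every {b_{2k-1}} equals {1}, so both the Cauchy-Schwarz step and the Robertson bound hold with equality.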

To prove the Robertson and Bieberbach conjectures, one first takes a logarithm and deduces both conjectures from a similar conjecture about the Taylor coefficients of {\log \frac{f(z)}{z}}, known as the Milin conjecture. Next, one continuously enlarges the image {f(D(0,1))} of the schlicht function to cover all of {{\bf C}}; done properly, this places the schlicht function {f} as the initial function {f = f_0} in a family {(f_t)_{t \geq 0}} of univalent maps {f_t: D(0,1) \rightarrow {\bf C}} known as a Loewner chain. The functions {f_t} obey a useful differential equation known as the Loewner equation, that involves an unspecified forcing term {\mu_t} (or {\theta(t)}, in the case that the image is a slit domain) coming from the boundary; this in turn gives useful differential equations for the Taylor coefficients of {f(z)}, {g(z)}, or {\log \frac{f(z)}{z}}. After some elementary calculus manipulations to “integrate” these equations, the Bieberbach, Robertson, and Milin conjectures are then reduced to establishing the non-negativity of a certain explicit hypergeometric function, which is non-trivial to prove (and will not be done here, except for small values of {n}) but for which several proofs exist in the literature.

The theory of Loewner chains subsequently became fundamental to a more recent topic in complex analysis, that of the Schramm-Loewner evolution (SLE), which is the focus of the next and final set of notes.

— 1. The area theorem and its consequences —

We begin with the area theorem of Grönwall.

Theorem 2 (Grönwall area theorem) Let {F: \{ z: |z| > 1 \} \rightarrow {\bf C}} be a univalent function with a convergent Laurent expansion

\displaystyle  F(z) = z + d_0 + \frac{d_1}{z} + \frac{d_2}{z^2} + \dots.

Then

\displaystyle  \sum_{n=0}^\infty n |d_n|^2 \leq 1.

Proof: By shifting {F} we may normalise {d_0=0}. By hypothesis we have {d_n = O_\varepsilon((1+\varepsilon)^n)} for any {\varepsilon>0}; by replacing {F} with {(1+\varepsilon)^{-1} F((1+\varepsilon) z)} and using a limiting argument, we may assume without loss of generality that the {d_n} have some exponential decay as {n \rightarrow \infty} (in order to justify some of the manipulations below).

Let {R>1} be a large parameter. If {z = R e^{i\theta}}, then {F(Re^{i\theta}) = R e^{i\theta} + d_1 R^{-1} e^{-i\theta} + O(R^{-2})} and {\frac{d}{d\theta} F(Re^{i\theta}) = i R e^{i\theta} - i d_1 R^{-1} e^{-i\theta} + O(R^{-2})}. The area enclosed by the simple curve {\{ F(Re^{i\theta}):0 \leq \theta \leq 2\pi \}} is equal to

\displaystyle  \int_0^{2\pi} \frac{1}{2} \mathrm{Im}( \overline{F(Re^{i\theta})} \frac{d}{d\theta} F(Re^{i\theta}) )\ d\theta

\displaystyle  = \frac{1}{2} \int_0^{2\pi} \mathrm{Im}( i R^2 + i \overline{d_1} e^{2i\theta} - i d_1 e^{-2i\theta} + O( R^{-1} ) )\ d\theta

\displaystyle  = \pi R^2 + O(1/R);

crucially, the error term here goes to zero as {R \rightarrow \infty}. Meanwhile, by the change of variables formula (using monotone convergence if desired to work in compact subsets of the annulus {\{z: 1 < |z| < R \}} initially) and Plancherel's theorem, the area of the region {\{ F(z): 1 < |z| < R \}} is

\displaystyle  \int_{1 < |z| < R} |F'(z)|^2\ dz d\overline{z}

\displaystyle  = \int_1^R \int_0^{2\pi} |1 - \frac{d_1}{r^2} e^{-2i\theta} - \frac{2d_2}{r^3} e^{-3i\theta} - \dots|^2\ d\theta\ r dr

\displaystyle  = 2\pi \int_1^R (1 + \frac{|d_1|^2}{r^4} + \frac{|2d_2|^2}{r^6} + \dots)\ r dr

\displaystyle  = \pi R^2 - \pi + \pi |d_1|^2 + 2 \pi |d_2|^2 + \dots - O(R^{-1}).

Comparing these bounds we conclude that

\displaystyle  \sum_{n=0}^\infty n |d_n|^2 \leq 1 + O(R^{-1});

sending {R} to infinity, we obtain the claim. \Box
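The mechanism of the proof can be illustrated numerically. When {F} has only finitely many Laurent coefficients, expanding the contour integral used above in Fourier modes gives the exact value {\pi( R^2 - \sum_n n |d_n|^2 R^{-2n} )} for the area enclosed by {\theta \mapsto F(Re^{i\theta})}, and the non-negativity of this area as {R \rightarrow 1} is the area theorem. The sketch below compares a Riemann sum for the contour integral with this closed form; the function name `enclosed_area` and the sample coefficients are illustrative assumptions (the coefficients were picked so that {\sum_n n|d_n| \leq 1}, a standard sufficient condition for univalence on {\{|z|>1\}}).

```python
import cmath
import math


def enclosed_area(d, R, samples=4096):
    """Approximate (1/2) * integral of Im(conj(F) dF/dtheta) over [0, 2 pi],

    where F(z) = z + sum_n d[n] z^{-n} and z = R e^{i theta}.
    `d` maps each index n >= 1 to the coefficient d_n.
    """
    total = 0.0
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        z = R * cmath.exp(1j * theta)
        F = z + sum(dn * z ** (-n) for n, dn in d.items())
        # d/dtheta of z is i z, and d/dtheta of z^{-n} is -i n z^{-n}.
        dF = 1j * z + sum(-1j * n * dn * z ** (-n) for n, dn in d.items())
        total += (F.conjugate() * dF).imag
    return 0.5 * total * (2 * math.pi / samples)


d = {1: 0.5, 2: 0.2j}  # sample coefficients, an assumption for illustration
weighted_sum = sum(n * abs(dn) ** 2 for n, dn in d.items())
results = {}
for R in (1.5, 3.0):
    numeric = enclosed_area(d, R)
    closed_form = math.pi * (R ** 2 - sum(n * abs(dn) ** 2 * R ** (-2 * n)
                                           for n, dn in d.items()))
    results[R] = (numeric, closed_form)
    print(R, numeric, closed_form)
print("sum of n |d_n|^2 =", weighted_sum)
```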

Exercise 3 Let {f: D(0,1) \rightarrow {\bf C}} be a univalent function with Taylor expansion

\displaystyle  f(z) = a_0 + a_1 z + a_2 z^2 + \dots

Show that the area of {f(D(0,1))} is equal to {\pi \sum_n n |a_n|^2}. (In particular, {f(D(0,1))} has finite area if and only if {\sum_n n |a_n|^2 < \infty}.)

Corollary 4 (Bieberbach inequality)

  • (i) If {g(z) = z + b_3 z^3 + b_5 z^5 + \dots} is an odd schlicht function, then {|b_3| \leq 1}.
  • (ii) If {f(z) = z + a_2 z^2 + a_3 z^3 + \dots} is a schlicht function, then {|a_2| \leq 2}.

Proof: For (i), we apply Theorem 2 to the univalent function {F: \{ |z| > 1 \} \rightarrow {\bf C}} defined by {F(z) := 1/g(\frac{1}{z})}, which has a Laurent expansion {F(z) = z - \frac{b_3}{z} + \dots}, to give the claim. For (ii), apply part (i) to the square root of {z \mapsto f(z^2)} with first term {z}. \Box

Exercise 5 Show that equality occurs in Corollary 4(i) if and only if {g} takes the form {g(z) = \frac{z}{1-e^{i\theta} z^2}} for some {\theta \in {\bf R}}, and in Corollary 4(ii) if and only if {f} takes the form of a rotated Koebe function {f(z) = \frac{z}{(1-e^{i\theta} z)^2}} for some {\theta \in {\bf R}}.

The Bieberbach inequality can be rescaled to bound the second coefficient of univalent functions:

Exercise 6 (Rescaled Bieberbach inequality) If {f: D(0,1) \rightarrow {\bf C}} is a univalent function, show that

\displaystyle  |f''(0)| \leq 4|f'(0)|.

When does equality hold?

The Bieberbach inequality gives a useful lower bound for the image of a univalent function, known as the Koebe quarter theorem:

Corollary 7 (Koebe quarter theorem) Let {f: D(0,1) \rightarrow {\bf C}} be a univalent function. Then {f(D(0,1))} contains the disk {D(f(0), |f'(0)|/4)}.

Proof: By applying a translation and rescaling, we may assume without loss of generality that {f} is a schlicht function, with Taylor expansion

\displaystyle  f(z) = z + a_2 z^2 + a_3 z^3 + \dots

Our task is now to show that for every {w \in D(0,1/4)}, the equation {f(z)=w} has a solution in {D(0,1)}. If this were not the case, then the function {w-f(z)} is invertible on {D(0,1)}, with inverse being univalent and having the Taylor expansion

\displaystyle  \frac{1}{w-f(z)} = \frac{1}{w} + \frac{f(z)}{w^2} + \frac{f(z)^2}{w^3} + \dots = \frac{1}{w} + \frac{z}{w^2} + (\frac{a_2}{w^2} + \frac{1}{w^3}) z^2 + \dots.

Applying Exercise 6 we then have

\displaystyle  2|\frac{a_2}{w^2} + \frac{1}{w^3}| \leq \frac{4}{|w|^2}

while from the Bieberbach inequality one also has {|a_2| \leq 2}. Hence by the triangle inequality {|1/w| \leq 4}, which is incompatible with the hypothesis {w \in D(0,1/4)}. \Box
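The Koebe function (1) shows that the constant {1/4} cannot be improved: its boundary values are {f(e^{i\theta}) = -\frac{1}{4\sin^2(\theta/2)}}, which sweep out the omitted half-line {(-\infty,-1/4]}. As a rough numerical illustration (a sketch only; the sample count `N` is an arbitrary choice), one can sample the unit circle and confirm that the boundary values are real, at most {-1/4}, and come within distance exactly {1/4} of the origin.

```python
import cmath
import math


def koebe(z):
    return z / (1 - z) ** 2


N = 3600  # even, so that theta = pi is one of the sample angles
# Skip k = 0, where z = 1 is the pole of the Koebe function.
boundary = [koebe(cmath.exp(2j * math.pi * k / N)) for k in range(1, N)]

closest = min(abs(w) for w in boundary)                    # distance to the origin
max_real = max(w.real for w in boundary)                   # should be about -1/4
max_rel_imag = max(abs(w.imag) / abs(w) for w in boundary)  # should be about 0

print("closest boundary value to the origin:", closest)
print("largest real part on the boundary:", max_real)
```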

Exercise 8 Show that the radius {|f'(0)|/4} is best possible in Corollary 7 (thus, {f(D(0,1))} does not contain any disk {D(f(0), |f'(0)|/4+\varepsilon)} with {\varepsilon>0}) if and only if {f} takes the form {f(z) = a + \frac{bz}{(1-e^{i\theta}z)^2}} for some complex numbers {a,b \in {\bf C}} and real {\theta}.

Remark 9 The univalence hypothesis is crucial in the Koebe quarter theorem. Consider for instance the functions {f_n: D(0,1) \rightarrow {\bf C}} defined by {f_n(z) := \frac{e^{nz}-1}{n}}. These are locally univalent functions (since {f_n} is holomorphic with non-zero derivative) and {f_n(0)=0}, {f'_n(0)=1}, but {f_n(D(0,1))} avoids the point {-\frac{1}{n}}.

Exercise 10 (Koebe distortion theorem) Let {f: D(0,1) \rightarrow {\bf C}} be a schlicht function, and let {z \in D(0,1)} have magnitude {|z|=r}.

Exercise 11 (Conformal radius) If {U} is a non-empty simply connected open subset of {{\bf C}} that is not all of {{\bf C}}, and {z_0} is a point in {U}, define the conformal radius of {U} at {z_0} to be the quantity {|\phi'(0)|}, where {\phi} is any conformal map from {D(0,1)} to {U} that maps {0} to {z_0} (the existence and uniqueness of this radius follows from the Riemann mapping theorem). Thus for instance the disk {D(z_0,r)} has conformal radius {r} around {z_0}.

We can use the distortion theorem to obtain a nice criterion for when univalent maps converge to a given limit, known as the Carathéodory kernel theorem.

Theorem 12 (Carathéodory kernel theorem) Let {U_1,U_2,\dots} be a sequence of simply connected open proper subsets of {{\bf C}} containing the origin, and let {U} be a further simply connected open proper subset of {{\bf C}} containing {0}. Let {f_n: D(0,1) \rightarrow U_n} and {f: D(0,1) \rightarrow U} be the conformal maps with {f_n(0)=f(0)=0} and {f'_n(0), f'(0) > 0} (the existence and uniqueness of these maps are given by the Riemann mapping theorem). Then the following are equivalent:

  • (i) {f_n} converges locally uniformly on compact sets to {f}.
  • (ii) For every subsequence {U_{n_j}} of the {U_n}, {U} is the set of all {w \in {\bf C}} such that there is an open connected set containing {0} and {w} that is contained in {U_{n_j}} for all sufficiently large {j}.

If conclusion (ii) holds, {U} is known as the kernel of the domains {U_n}.

Proof: Suppose first that {f_n} converges locally uniformly on compact sets to {f}. If {w \in U}, then {w \in f(B(0,r')) \subset f(B(0,r))} for some {0 < r' < r < 1}. If {w' \in f(B(0,r))}, then the holomorphic functions {f_n-w'} converge uniformly on {\overline{B(0,r)}} to the function {f-w'}, which is not identically zero but has a zero in {B(0,r)}. By Hurwitz’s theorem we conclude that {f_n-w'} also has a zero in {B(0,r)} for all sufficiently large {n}; indeed the same argument shows that one can replace {w'} by any element of a small neighbourhood of {w'} to obtain the same conclusion, uniformly in {n}. From compactness we conclude that for sufficiently large {n}, {f_n-w'} has a zero in {B(0,r)} for all {w' \in \overline{f(B(0,r'))}}, thus {f(B(0,r')) \subset U_n} for sufficiently large {n}. Since {f(B(0,r'))} is open connected and contains {0} and {w}, we see that {U} is contained in the set described in (ii).

Conversely, suppose that {U_{n_j}} is a subsequence of the {U_n} and {w \in {\bf C}} is such that there is an open connected set {V} containing {0} and {w} that is contained in {U_{n_j}} for sufficiently large {j}. The inverse maps {f_{n_j}^{-1}: V \rightarrow D(0,1)} are holomorphic and bounded, hence form a normal family by Montel’s theorem. By refining the subsequence we may thus assume that the {f_{n_j}^{-1}} converge locally uniformly to a holomorphic limit {g}. The function {g} takes values in {\overline{D(0,1)}}, but by the open mapping theorem it must in fact map to {D(0,1)}. In particular, {g(w) \in D(0,1)}. Since {f_{n_j}^{-1}(w)} converges to {g(w)}, and {f_{n_j}} converges locally uniformly to {f}, we conclude that {f_{n_j}( f_{n_j}^{-1}(w) )} converges to {f(g(w))}, thus {f(g(w)) = w} and hence {w \in U}. This establishes the derivation of (ii) from (i).

Now suppose that (ii) holds. It suffices to show that every subsequence {f_{n_j}} of {f_n} has a further subsequence that converges locally uniformly on compact sets to {f} (this is an instance of the Urysohn subsequence principle). From (ii) (and the fact that {U} contains {0}) there is in particular a disk {D(0,\varepsilon)} that is contained in the {U_n} for all sufficiently large {n}; on the other hand, as {U} is not all of {{\bf C}}, there is also a disk {D(0,R)} which is not contained in the {U_n} for all sufficiently large {n}. By Exercise 11, this implies that the conformal radii of the {U_n} around zero are bounded above and below, thus {f'_n(0)} is bounded above and below.

By Exercise 10(v), and rescaling, the functions {f_{n_j}} then form a normal family, thus there is a subsequence {f_{n'_j}} of the {f_{n_j}} that converges locally uniformly on compact sets to some limit {g}. Since {f'_{n'_j}(0)} is positive and bounded away from zero, {g'(0)} is also positive, so {g} is non-constant. By Hurwitz’s theorem, {g} is therefore also univalent, and thus maps {D(0,1)} to some region {V}. By the implication of (ii) from (i) (with {f,U} replaced by {g,V}) we conclude that {V} is the set of all {w \in {\bf C}} such that there is an open connected set containing {0} and {w} that is contained in {U_{n'_j}} for all sufficiently large {j}; but by hypothesis, this set is also {U}. Thus {U=V}, and then by the uniqueness part of the Riemann mapping theorem, {f=g} as desired. \Box

The condition in Theorem 12(ii) indicates that {U_n} “converges” to {U} in a rather complicated sense, in which large parts of {U_n} are allowed to be “pinched off” from {0} and disappear in the limit. This is illustrated in the following explicit example:

Exercise 13 (Explicit example of kernel convergence) Let {g} be the function from (5), thus {g} is a univalent function from {D(0,1)} to {{\bf C}} with the two vertical rays from {i/2} to {+i\infty}, and from {-i/2} to {-i\infty}, removed. For any natural number {n}, let {z_n := 1-\frac{1}{n}} and let {w_n := g(z_n) = \frac{n(n-1)}{2n-1}}, and define the transformed functions {f_n(z) := \frac{1}{w_n} g(\frac{z+z_n}{1+\overline{z_n} z}) - 1}.

  • (i) Show that {f_n} is a univalent function from {D(0,1)} to {{\bf C}} with the two vertical rays from {-1 + \frac{i}{2w_n}} to {-1+i\infty}, and from {-1 - \frac{i}{2w_n}} to {-1 -i\infty}, removed, and that {f_n(0)=0} and {f'_n(0)>0}.
  • (ii) Show that {f_n} converges locally uniformly to the function {f(z) := \frac{2z}{1-z}}, and that this latter map is a univalent map from {D(0,1)} to the half-plane {\{ x+iy: x > -1\}}. (Hint: one does not need to compute everything exactly; for instance, any terms of the form {O( \frac{1}{n^2})} can be written using the {O()} notation instead of expanded explicitly.)
  • (iii) Explain why these facts are consistent with the Carathéodory kernel theorem.
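Before attempting the exercise, it can be reassuring to see the convergence in part (ii) numerically. The sketch below is not a solution: it simply evaluates {f_n} at a few sample points of the disk (chosen arbitrarily) and compares with {\frac{2z}{1-z}}, using the fact that {z_n} is real so that {\overline{z_n} = z_n}.

```python
def g(z):
    """The odd function from (5)."""
    return z / (1 - z * z)


def f_n(n, z):
    z_n = 1 - 1 / n   # real, so conj(z_n) = z_n
    w_n = g(z_n)      # equals n (n - 1) / (2 n - 1)
    return g((z + z_n) / (1 + z_n * z)) / w_n - 1


def f_limit(z):
    return 2 * z / (1 - z)


sample_points = [0.5, -0.5, 0.3 + 0.4j, -0.2 + 0.7j, 0.9j, -0.9]
max_errors = {
    n: max(abs(f_n(n, z) - f_limit(z)) for z in sample_points)
    for n in (10, 100, 1000)
}
for n, err in max_errors.items():
    print(n, err)
```

The observed errors shrink by roughly a factor of {100} each time {n} increases by a factor of {10}, in line with the {O(\frac{1}{n^2})} terms mentioned in the hint.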

As another illustration of the theorem, let {U,V} be two distinct convex open proper subsets of {{\bf C}} containing the origin, and let {f_U, f_V} be the associated conformal maps from {D(0,1)} to {U,V} respectively with {f_U(0)=f_V(0)=0} and {f'_U(0), f'_V(0)>0}. Then the alternating sequence {f_U, f_V, f_U, f_V, \dots} does not converge locally uniformly to any limit. The set {U \cap V} is the set of all points that lie in a connected open set containing the origin that eventually is contained in the sequence {U,V,U,V,\dots}; but if one passes to the subsequence {U,U,U,\dots}, this set of points enlarges to {U}, and so the sequence {U,V,U,V,\dots} does not in fact have a kernel.

However, the kernel theorem simplifies significantly when the {U_n} are monotone increasing, which is already an important special case:

Corollary 14 (Monotone increasing case of kernel theorem) Let the notation and assumptions be as in Theorem 12. Assume furthermore that

\displaystyle  U_1 \subset U_2 \subset U_3 \subset \dots

and that {U = \bigcup_{n=1}^\infty U_n}. Then {f_n} converges locally uniformly on compact sets to {f}.

Loewner observed that the kernel theorem can be used to approximate univalent functions by functions mapping into slit domains. More precisely, define a slit domain to be an open simply connected subset of {{\bf C}} formed by deleting a half-infinite Jordan curve {\{ \gamma(t): t \geq 0 \}} connecting some finite point {\gamma(0)} to infinity; for instance, the image {{\bf C} \backslash (-\infty,-1/4]} of the Koebe function is a slit domain.

Theorem 15 (Loewner approximation theorem) Let {f: D(0,1) \rightarrow {\bf C}} be a univalent function. Then there exists a sequence {f_n: D(0,1) \rightarrow {\bf C}} of univalent functions whose images {f_n(D(0,1))} are slit domains, and which converge locally uniformly on compact subsets to {f}.

Proof: First suppose that {f} extends to a univalent function on a slightly larger disk {D(0,1+\varepsilon)} for some {\varepsilon>0}. Then the image {f( \partial D(0,1)) = \{ f(e^{i\theta}): 0 \leq \theta \leq 2\pi \}} of the unit circle is a Jordan curve enclosing the region {f(D(0,1))} in the interior. Applying the Jordan curve theorem (and the Möbius inversion {z \mapsto 1/z}), one can find a half-infinite Jordan curve {\gamma} from {f(1)} to infinity that stays outside of {f(\overline{D(0,1)})}. For any {n}, one can concatenate this curve with the arc {\{f(e^{i\theta}): 0 \leq \theta \leq 2\pi - \frac{1}{n} \}} to obtain another half-infinite Jordan curve {\gamma_n}, whose complement {U_n := {\bf C} \backslash \gamma_n} is a slit domain which has {f(D(0,1))} as kernel (why?). If we let {f_n} be the conformal maps from {D(0,1)} to {U_n} with {f_n(0)=0} and {f'_n(0)>0}, we conclude from the Carathéodory kernel theorem that {f_n} converges locally uniformly on compact sets to {f}.

If {f} is just univalent on {D(0,1)}, then it is the locally uniform limit of the dilations {f_n(z) := f( \frac{n-1}{n} z )}, which are univalent on the slightly larger disks {D(0,\frac{n}{n-1})}. By the previous arguments, each {f_n} is in turn the locally uniform limit of univalent functions whose images are slit domains, and the claim now follows from a diagonalisation argument. \Box

— 2. Loewner chains —

The material in this section is based on these lecture notes of Contreras.

An important tool in analysing univalent functions is to study one-parameter families {f_t: D(0,1) \rightarrow {\bf C}} of univalent functions, parameterised by a time parameter {t \geq 0}, in which the images {f_t(D(0,1))} are increasing in {t}; roughly speaking, these families allow one to study an arbitrary univalent function {f_0} by “integrating” along such a family from {t=\infty} back to {t=0}. Traditionally, we normalise these families into (radial) Loewner chains, which we now define:

Definition 16 (Loewner chain) A (radial) Loewner chain is a family {(f_t)_{t \geq 0}} of univalent maps {f_t: D(0,1) \rightarrow {\bf C}} with {f_t(0)=0} and {f'_t(0) = e^t} (so in particular {f_0} is schlicht), such that {f_s(D(0,1)) \subsetneq f_t(D(0,1))} for all {0 \leq s < t < \infty}. (In these notes we use the prime notation exclusively for differentiation in the {z} variable; we will use {\partial_t} later for differentiation in the {t} variable.)

A key example of a Loewner chain is the family

\displaystyle  f_t(z) := e^t \frac{z}{(1-z)^2} \ \ \ \ \ (7)

of dilated Koebe functions; note that the image {f_t(D(0,1))} of each {f_t} is the slit domain {{\bf C} \backslash (-\infty, -e^t/4]}, which is clearly monotone increasing in {t}. More generally, we have the rotated Koebe chains

\displaystyle  f_t(z) := e^t \frac{z}{(1- e^{i\theta} z)^2} \ \ \ \ \ (8)

for any real {\theta}.

Whenever one has a family {(U_t)_{t \geq 0}} of simply connected proper open subsets of {{\bf C}} containing {0} with {U_s \subsetneq U_t} for {0 \leq s<t<\infty}, the Riemann mapping theorem provides, for each {t \geq 0}, a unique univalent map {f_t: D(0,1) \rightarrow {\bf C}} with {f_t(0)=0}, {f'_t(0)>0}, and {f_t(D(0,1)) = U_t}. By definition, {f'_t(0)} is then the conformal radius of {U_t} around {0}, which is a strictly increasing function of {t} by Exercise 11. If this conformal radius {f'_t(0)} is equal to {1} at {t=0} and increases continuously to infinity as {t \rightarrow \infty}, then one can reparameterise the {t} variable so that {f'_t(0)=e^t}, at which point one obtains a Loewner chain.

From the Koebe quarter theorem we see that each image {U_t = f_t(D(0,1))} in a Loewner chain contains the disk {D(0,e^t/4)}. In particular the {U_t} increase to fill out all of {{\bf C}}: {{\bf C} = \bigcup_{t>0} U_t}.

Let {(f_t)_{t \geq 0}} be a Loewner chain, and let {0 \leq s \leq t < \infty}. The relation {f_s(D(0,1)) \subsetneq f_t(D(0,1))} is sometimes expressed as the assertion that {f_s} is subordinate to {f_t}. It has the consequence that one has a composition law of the form

\displaystyle  f_s = f_t \circ \phi_{t \leftarrow s} \ \ \ \ \ (9)

for a univalent function {\phi_{t \leftarrow s}: D(0,1) \rightarrow D(0,1)}, uniquely defined as {\phi_{t \leftarrow s} = f_t^{-1} \circ f_s}, noting that {f_t^{-1}: U_t \rightarrow D(0,1)} is well-defined on {U_s \subset U_t}. By construction, we have {\phi_{t \leftarrow s}(0)=0} and

\displaystyle  \phi_{t \leftarrow s}'(0) = e^{s-t}, \ \ \ \ \ (10)

as well as the composition laws

\displaystyle  \phi_{t \leftarrow t} = \mathrm{id}; \quad \phi_{t \leftarrow r} = \phi_{t \leftarrow s} \circ \phi_{s \leftarrow r} \ \ \ \ \ (11)

for {0 \leq r \leq s \leq t < \infty}. We will refer to the {\phi_{t \leftarrow s}} as transition functions.

From the Schwarz lemma, we have

\displaystyle  |\phi_{t \leftarrow s}(z)| \leq |z|

for {0 \leq s \leq t < \infty}, with strict inequality when {s<t}. In particular, if we introduce the function

\displaystyle  p_{t \leftarrow s}(z) := \frac{1 + e^{s-t}}{1-e^{s-t}} \frac{z - \phi_{t \leftarrow s}(z)}{z + \phi_{t \leftarrow s}(z)} \ \ \ \ \ (12)

for {0 \leq s<t< \infty} and {z \in D(0,1)}, then (after removing the singularity at the origin and using (10)) we see that {p_{t \leftarrow s}: D(0,1) \rightarrow \mathbf{H}} is a holomorphic map to the right half-plane {\mathbf{H} := \{ x+iy: x>0 \}}, normalised so that

\displaystyle  p_{t \leftarrow s}(0) = 1.
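
For the dilated Koebe chain (7) the transition functions can be written down explicitly, which gives a convenient way to test the claims above numerically. The following is only a sketch: the function names are ours, and the closed form for the inverse Koebe function (with the principal branch of the square root) is an assumption that one can check by hand.

```python
import cmath
import math

def koebe(z):
    """Koebe function z / (1 - z)^2."""
    return z / (1 - z) ** 2

def koebe_inverse(w):
    """Inverse of the Koebe function on C minus (-inf, -1/4].

    Solves w (1 - z)^2 = z for the root inside the unit disk,
    using the principal branch of the square root.
    """
    return 4 * w / (1 + cmath.sqrt(1 + 4 * w)) ** 2

def phi(t, s, z):
    """Transition function phi_{t<-s} = f_t^{-1} o f_s for f_t = e^t * koebe."""
    return koebe_inverse(math.exp(s - t) * koebe(z))

def p(t, s, z):
    """The function p_{t<-s} from (12), for z != 0 and s < t."""
    r = math.exp(s - t)
    w = phi(t, s, z)
    return (1 + r) / (1 - r) * (z - w) / (z + w)

z = 0.3 + 0.4j
# Composition law (11): phi_{2<-0.5} = phi_{2<-1} o phi_{1<-0.5}.
composition_error = abs(phi(2.0, 0.5, z) - phi(2.0, 1.0, phi(1.0, 0.5, z)))
# Schwarz lemma: |phi(z)| < |z|, so p takes values in the right half-plane.
contracts = abs(phi(2.0, 0.5, z)) < abs(z)
real_part = p(2.0, 0.5, z).real
```

Evaluating `p(t, s, z)` at a very small non-zero `z` should also return a value close to {1}, in line with the normalisation {p_{t \leftarrow s}(0)=1}.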

Define a Herglotz function to be a holomorphic function {p: D(0,1) \rightarrow \mathbf{H}}, thus {p_{t \leftarrow s}} is a Herglotz function for all {0 \leq s < t < \infty}. A key family of examples of Herglotz functions is given by the Möbius transforms {z \mapsto \frac{e^{i\theta}+z}{e^{i\theta}-z}} for {\theta \in {\bf R}}. In fact, all other Herglotz functions are basically just averages of these:

Exercise 17 (Herglotz representation theorem) Let {p: D(0,1) \rightarrow \mathbf{H}} be a Herglotz function, normalised so that {p(0)=1}.

  • (i) For any {0 < r < 1}, show that

    \displaystyle  p(z) = \frac{1}{2\pi} \int_0^{2\pi} \frac{re^{i\theta}+z}{re^{i\theta}-z} \mathrm{Re} p(re^{i\theta})\ d\theta.

    for {z \in D(0,r)}. (Hint: The real part of {p} is harmonic, and so has a Poisson kernel representation. Alternatively, one can use a Taylor expansion of {p}.)

  • (ii) Show that there exists a (Radon) probability measure {\mu} on {[0,2\pi)} such that

    \displaystyle  p(z) = \int_0^{2\pi} \frac{e^{i\theta}+z}{e^{i\theta}-z}\ d\mu(\theta)

    for all {z \in D(0,1)}. (One will need a measure-theoretic tool such as Prokhorov’s theorem, the Riesz representation theorem, or the Helly selection principle.) Conversely, show that every probability measure {\mu} on {[0,2\pi)} generates a Herglotz function {p} with {p(0)=1} by the above formula.

  • (iii) Show that the measure {\mu} constructed in (ii) is unique.

This has a useful corollary, namely a version of the Harnack inequality:

Exercise 18 (Harnack inequality) Let {p: D(0,1) \rightarrow \mathbf{H}} be a Herglotz function, normalised so that {p(0)=1}. Show that

\displaystyle  \frac{1-|z|}{1+|z|} \leq \mathrm{Re} p(z) \leq |p(z)| \leq \frac{1+|z|}{1-|z|}

for all {z \in D(0,1)}.
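
As a sanity check (not a proof), one can test the Harnack inequality numerically on Herglotz functions generated by discrete probability measures, as in part (ii) of Exercise 17. In the sketch below the helper names, the random seed, and the choice of five point masses are arbitrary illustrative assumptions.

```python
import cmath
import math
import random

def herglotz_from_measure(thetas, weights):
    """p(z) = sum_j weights[j] * (e^{i theta_j} + z) / (e^{i theta_j} - z)."""
    def p(z):
        return sum(w * (cmath.exp(1j * th) + z) / (cmath.exp(1j * th) - z)
                   for th, w in zip(thetas, weights))
    return p

def harnack_holds(p, z, tol=1e-9):
    """Check (1-|z|)/(1+|z|) <= Re p(z) <= |p(z)| <= (1+|z|)/(1-|z|)."""
    r = abs(z)
    lower, upper = (1 - r) / (1 + r), (1 + r) / (1 - r)
    val = p(z)
    return (lower - tol <= val.real
            and val.real <= abs(val) + tol
            and abs(val) <= upper + tol)

rng = random.Random(0)
thetas = [rng.uniform(0, 2 * math.pi) for _ in range(5)]
raw = [rng.random() for _ in range(5)]
weights = [x / sum(raw) for x in raw]   # normalise to a probability measure
p = herglotz_from_measure(thetas, weights)

# Sample points in the disk, staying away from the boundary circle.
samples = [rng.uniform(0, 0.99) * cmath.exp(1j * rng.uniform(0, 2 * math.pi))
           for _ in range(1000)]
all_ok = all(harnack_holds(p, z) for z in samples)
```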

This gives some useful Lipschitz regularity properties of the transition functions {\phi_{t \leftarrow s}} and univalent functions {f_t} in the {t} variable:

Lemma 19 (Lipschitz regularity) Let {K} be a compact subset of {D(0,1)}, and let {0 < T < \infty}. Use {O_{K,T}(X)} to denote a quantity bounded in magnitude by {C_{K,T} X}, where {C_{K,T}} depends only on {K,T}.

  • (i) For any {0 \leq s \leq t \leq u \leq T} and {z \in K}, one has

    \displaystyle  \phi_{u \leftarrow t}(z) - \phi_{u \leftarrow s}(z) = O_{K,T}(|s-t|).

  • (ii) For any {0 \leq s \leq t \leq T} and {z \in K}, one has

    \displaystyle  f_t(z) - f_s(z) = O_{K,T}(|s-t|).

One can make the bounds {O_{K,T}(|s-t|)} much more explicit if desired (see e.g. Lemma 2.3 of these notes of Contreras), but for our purposes any Lipschitz bound will suffice.

Proof: To prove (i), it suffices from (11) and the Schwarz-Pick lemma (Exercise 13 from Notes 2) to establish this claim when {u=t}. We can also assume that {s<t} since the claim is trivial when {s=t}. From the Harnack inequality one has

\displaystyle  p_{t \leftarrow s}(z) = O_{K,T}(1)

for {z \in K}, which by (12) and some computation gives

\displaystyle  \phi_{t \leftarrow s}(z) = z + O_{K,T}(|s-t|), \ \ \ \ \ (13)

giving the claim.

Now we prove (ii). We may assume without loss of generality that {K} is convex. From Exercise 10 (normalising {f_t} to be schlicht) we see that {f'_t(z) = O_{K,T}(1)} for {z \in K}, and hence {f_t} has a Lipschitz constant of {O_{K,T}(1)} on {K}. Since {f_t(z) - f_s(z) = f_t(z) - f_t( \phi_{t \leftarrow s}(z) )}, the claim now follows from (13). \Box

As a first application of this we show that every schlicht function starts a Loewner chain.

Lemma 20 Let {f: D(0,1) \rightarrow {\bf C}} be schlicht. Then there exists a Loewner chain {(f_t)_{t \geq 0}} with {f_0 = f}.

Proof: This will be similar to the proof of Theorem 15. First suppose that {f} extends to be univalent on {D(0,1+\varepsilon)} for some {\varepsilon>0}; then {f(\partial D(0,1))} is a Jordan curve. By Carathéodory’s theorem (Theorem 20 of Notes 2) (and the Möbius inversion {z \mapsto \frac{1}{z}}) one can find a conformal map {\phi} from the exterior of {f(\partial D(0,1))} to the exterior of {D(0,1)} that sends infinity to infinity. If we define {U_t} for {t>0} to be the region enclosed by the Jordan curve {\phi^{-1}( \partial D( 0, e^t) )}, then the {U_t} are increasing in {t} with conformal radius going to infinity as {t \rightarrow \infty}. If one sets {f_t: D(0,1) \rightarrow U_t} to be the conformal maps with {f_t(0)=0} and {f'_t(0)>0}, then {f_0 = f} (by the uniqueness of Riemann mapping) and by the Carathéodory kernel theorem, {f_s} converges locally uniformly to {f_t} as {s \rightarrow t}. In particular, the conformal radii {f'_t(0)} are continuous in {t}. Reparameterising in {t} one can then obtain the required Loewner chain.

Now suppose {f} is only univalent on {D(0,1)}. As in the proof of Theorem 15, one can express {f} as the locally uniform limit of schlicht functions {f_n}, each of which extends univalently to some larger disk {D(0,1+\varepsilon_n)}. By the preceding discussion, each of the {f_n} extends to a Loewner chain {(f_{n,t})_{t \geq 0}}. From the Lipschitz bounds (and the Koebe distortion theorem) one sees that these chains are locally uniformly equicontinuous in {t} and {z}, uniformly in {n}, and hence by Arzelà-Ascoli we can pass to a subsequence that converges locally uniformly in {t,z} to a limit {(f_t)_{t \geq 0}}; one can also assume that the transition functions {\phi_{n,t \leftarrow s}} converge locally uniformly to limits {\phi_{t \leftarrow s}}. It is then not difficult by Hurwitz’s theorem to verify the limiting relations (9), (11), and that {(f_t)_{t \geq 0}} is a Loewner chain with {f_0=f} as desired. \Box

Suppose that {0 \leq s < t < \infty} are close to each other: {s \approx t}. Then one heuristically has the approximations

\displaystyle \frac{1 + e^{s-t}}{1-e^{s-t}} \approx \frac{2}{t-s}; \quad \frac{z - \phi_{t \leftarrow s}(z)}{z + \phi_{t \leftarrow s}(z)} \approx \frac{1}{2z} (z - \phi_{t \leftarrow s}(z))

and hence by (12) and some rearranging

\displaystyle  \phi_{t \leftarrow s}(z) \approx z + (s-t) z p_{t \leftarrow s}(z)

and hence on applying {f_t}, (9), and the Newton approximation

\displaystyle  f_s(z) \approx f_t(z) + (s-t) z f'_t(z) p_{t \leftarrow s}(z).

This suggests that the {f_t} should obey the Loewner equation

\displaystyle  \partial_t f_t(z) = z f'_t(z) p_t(z) \ \ \ \ \ (14)

for some Herglotz function {p_t}. This is essentially the case:

Theorem 21 (Loewner equation) Let {(f_t)_{t \geq 0}} be a Loewner chain. Then, for {t} outside of an exceptional set {E \subset [0,+\infty)} of Lebesgue measure zero, the functions {f_t(z)} are differentiable in time for each {z \in D(0,1)}, and obey the equation (14) for all {t \in [0,+\infty) \backslash E} and {z \in D(0,1)}, and some Herglotz function {p_t} for each {t \in [0,+\infty) \backslash E} with {p_t(0)=1}. Furthermore, the maps {t \mapsto p_t(z)} are measurable for every {z \in D(0,1)}.

Proof: Let {Q} be a countable dense subset of {D(0,1)}. From Lemma 19, the function {t \mapsto f_t(z)} is Lipschitz continuous, and thus differentiable almost everywhere, for each {z \in Q}. Thus there exists a Lebesgue measure zero set {E \subset [0,+\infty)} such that {t \mapsto f_t(z)} is differentiable in {t} outside of {E} for each {z \in Q}. From the Koebe distortion theorem {f_t(z)} is also locally Lipschitz (hence locally uniformly equicontinuous) in the {z} variable, so in fact {t \mapsto f_t(z)} is differentiable in {t} outside of {E} for all {z \in D(0,1)}. Without loss of generality we may assume {E} contains zero.

Let {t \in [0,+\infty) \backslash E}, and let {z \in D(0,1)}. Then as {s} approaches {t} from below, we have

\displaystyle  f_s(z) = f_t(z) + (s-t) \partial_t f_t(z) + o(|s-t|)

uniformly; from (9) and Newton approximation we thus have

\displaystyle  \phi_{t \leftarrow s}(z) = z + (s-t) \partial_t f_t(z) / f'_t(z) + o(|s-t|)

which implies that

\displaystyle  \frac{z - \phi_{t \leftarrow s}(z)}{z + \phi_{t \leftarrow s}(z)} = (1+o(1)) \frac{t-s}{2z} \frac{\partial_t f_t(z)}{f'_t(z)}.

Also we have

\displaystyle \frac{1 + e^{s-t}}{1-e^{s-t}} = (1 + o(1)) \frac{2}{t-s}

and hence by (12)

\displaystyle  p_{t \leftarrow s}(z) = (1+o(1)) \frac{\partial_t f_t(z)}{z f'_t(z)}.

Taking limits, we see that the function {p_t(z) := \frac{\partial_t f_t(z)}{z f'_t(z)}} is Herglotz with {p_t(0)=1}, giving the claim. It is also easy to verify the measurability (because derivatives of Lipschitz functions are measurable). \Box

Example 22 The Loewner chain (7) solves the Loewner equation with the Herglotz function {p_t(z) = \frac{1-z}{1+z}}. With the rotated Koebe chains (8), we instead have {p_t(z) = \frac{1-e^{i\theta} z}{1+e^{i\theta} z}}.
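
The claims in this example are easy to test numerically. The sketch below is our own illustration: it evaluates both sides of (14) for the chains (7) and (8), using a hand-computed {z}-derivative and a central difference in {t} (the step size `h` is an arbitrary choice).

```python
import cmath
import math

def f(t, z, theta=0.0):
    """Rotated Koebe chain (8); theta = 0 gives the chain (7)."""
    return math.exp(t) * z / (1 - cmath.exp(1j * theta) * z) ** 2

def df_dz(t, z, theta=0.0):
    """z-derivative of f_t: e^t (1 + w) / (1 - w)^3 with w = e^{i theta} z."""
    w = cmath.exp(1j * theta) * z
    return math.exp(t) * (1 + w) / (1 - w) ** 3

def p_herglotz(z, theta=0.0):
    """The Herglotz function claimed in Example 22."""
    w = cmath.exp(1j * theta) * z
    return (1 - w) / (1 + w)

def loewner_residual(t, z, theta=0.0, h=1e-6):
    """|d/dt f_t(z) - z f_t'(z) p_t(z)|, with d/dt taken by central differences."""
    dt_f = (f(t + h, z, theta) - f(t - h, z, theta)) / (2 * h)
    return abs(dt_f - z * df_dz(t, z, theta) * p_herglotz(z, theta))

residual_koebe = loewner_residual(0.7, 0.3 + 0.4j)
residual_rotated = loewner_residual(0.7, 0.3 + 0.4j, theta=1.1)
```

Both residuals should be at the level of the finite-difference error, far smaller than the size of either side of (14).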

Although we will not need it in this set of notes, there is also a converse implication that for every family {p_t} of Herglotz functions depending measurably on {t}, one can associate a Loewner chain.

Let us now Taylor expand a Loewner chain {f_t(z)} at each time {t} as

\displaystyle  f_t(z) = a_1(t) z + a_2(t) z^2 + a_3(t) z^3 + \dots;

as {f'_t(0)=e^t}, we have {a_1(t) = e^t}. As {f_t(z)} is differentiable in almost every {t} for each {z}, and is locally uniformly continuous in {z}, we see from the Cauchy integral formulae that the {a_n(t)} are also differentiable almost everywhere in {t}. If we similarly write

\displaystyle  p_t(z) = c_0(t) + c_1(t) z + c_2(t) z^2 + \dots

for all {t} outside of {E}, then {c_0(t)=1}, and we obtain the equations

\displaystyle  \partial_t a_n(t) = \sum_{j=1}^n j a_j(t) c_{n-j}(t)

for every {n}, thus

\displaystyle  \partial_t a_2(t) = 2 a_2(t) + c_1(t) e^t \ \ \ \ \ (15)

\displaystyle  \partial_t a_3(t) = 3 a_3(t) + 2 c_1(t) a_2(t) + e^t c_2(t) \ \ \ \ \ (16)

and so forth. For instance, for the Loewner chain (7) one can verify that {a_n(t) = ne^t} and {c_n(t) = 2 (-1)^n} for {n \geq 1} solve these equations. For (8) one instead has {a_n(t) = e^{i(n-1) \theta} n e^t} and {c_n(t) = 2 (-1)^n e^{in\theta}}.
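
For the chain (7) one has {\partial_t a_n(t) = a_n(t)}, so the coefficient equations reduce to the algebraic identity {a_n(t) = \sum_{j=1}^n j a_j(t) c_{n-j}(t)}. A short script (a sketch with our own helper names, and an arbitrary choice of {t} and truncation {N}) confirms this for the first several {n}:

```python
import math

def coefficient_rhs(n, a, c):
    """Right-hand side sum_{j=1}^n j a_j c_{n-j} of the coefficient equation."""
    return sum(j * a[j] * c[n - j] for j in range(1, n + 1))

def koebe_coefficients(t, N):
    """a_n(t) = n e^t and c_n = 2 (-1)^n (with c_0 = 1) for the chain (7)."""
    a = [0.0] + [n * math.exp(t) for n in range(1, N + 1)]
    c = [1.0] + [2.0 * (-1) ** n for n in range(1, N + 1)]
    return a, c

t, N = 0.3, 12
a, c = koebe_coefficients(t, N)
# Since d/dt a_n = a_n for this chain, the equation reads a_n = coefficient_rhs(n).
max_error = max(abs(a[n] - coefficient_rhs(n, a, c)) for n in range(1, N + 1))
```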

We have the following bounds on the first few coefficients of {p}:

Exercise 23 Let {p(z) = 1 + c_1 z + c_2 z^2 + \dots} be a Herglotz function with {p(0)=1}. Let {\mu} be the measure coming from the Herglotz representation theorem.

  • (i) Show that {c_n = 2 \int_0^{2\pi} e^{-int}\ d\mu(t)} for all {n \geq 1}. In particular, {|c_n| \leq 2} for all {n \geq 1}. Use this to give an alternate proof of the upper bound in the Harnack inequality.
  • (ii) Show that {(\mathrm{Re}(c_1))^2 \leq 2 + \mathrm{Re} c_2}.
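
Both parts of this exercise can be explored numerically for discrete measures before attempting a proof. The following sketch uses the formula from part (i) with a randomly chosen probability measure supported on six points; the seed and the number of points are arbitrary assumptions.

```python
import cmath
import math
import random

def herglotz_coefficients(thetas, weights, N):
    """Taylor coefficients c_0, ..., c_N of p for a discrete measure,
    using c_n = 2 * sum_j weights[j] * e^{-i n theta_j} for n >= 1."""
    coeffs = [1 + 0j]
    for n in range(1, N + 1):
        coeffs.append(2 * sum(w * cmath.exp(-1j * n * th)
                              for th, w in zip(thetas, weights)))
    return coeffs

rng = random.Random(1)
thetas = [rng.uniform(0, 2 * math.pi) for _ in range(6)]
raw = [rng.random() for _ in range(6)]
weights = [x / sum(raw) for x in raw]

c = herglotz_coefficients(thetas, weights, 8)
largest_modulus = max(abs(c[n]) for n in range(1, 9))   # at most 2 by (i)
gap = 2 + c[2].real - c[1].real ** 2                    # non-negative by (ii)

# A single point mass gives equality in both bounds.
c_dirac = herglotz_coefficients([0.7], [1.0], 2)
dirac_gap = 2 + c_dirac[2].real - c_dirac[1].real ** 2
```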

We can use this to establish the first two cases of the Bieberbach conjecture:

Theorem 24 ({n=2,3} cases of Bieberbach) If {f(z) = z + a_2 z^2 + a_3 z^3 + \dots} is schlicht, then {|a_2| \leq 2} and {|a_3| \leq 3}.

The bound {|a_2| \leq 2} is not new, and indeed was implicitly used many times in the above arguments, but we include it to illustrate the use of the equations (15), (16).

Proof: By Lemma 20, we can write {f = f_0} (and {a_n = a_n(0)}) for some Loewner chain {(f_t)_{t \geq 0}}.

We can write (15) as {\partial_t (e^{-2t} a_2(t)) = c_1(t) e^{-t}}. On the other hand, from the Koebe distortion theorem applied to the schlicht functions {e^{-t} f_t}, we have {a_2(t) = O( e^{t})}, so in particular {e^{-2t} a_2(t)} goes to zero at infinity. We can integrate from {0} to infinity to obtain

\displaystyle  a_2 = - \int_0^\infty c_1(t) e^{-t}\ dt. \ \ \ \ \ (17)

From Exercise 23(i) we have {|c_1(t)| \leq 2}, giving the required bound {|a_2| \leq 2}.

In a similar vein, writing (16) as

\displaystyle  \partial_t (e^{-3t} a_3(t)) = 2 e^{-3t} c_1(t) a_2(t) + e^{-2t} c_2(t)

\displaystyle  = 2 (e^{-2t} a_2(t)) \partial_t (e^{-2t} a_2(t)) + e^{-2t} c_2(t)

we obtain

\displaystyle  \partial_t (e^{-3t} a_3(t) - e^{-4t} a_2(t)^2) = e^{-2t} c_2(t).

As {a_2(t), a_3(t) = O(e^t)}, we may integrate from {0} to infinity to obtain the identity

\displaystyle  a_3 - a_2^2 = - \int_0^\infty e^{-2t} c_2(t)\ dt.

Taking real parts using Exercise 23(ii) and (17), we have

\displaystyle  \mathrm{Re} a_3 - \mathrm{Re}((\int_0^\infty c_1(t) e^{-t}\ dt)^2) \leq 1 - \int_0^\infty e^{-2t} (\mathrm{Re}(c_1(t)))^2\ dt.

Since {\mathrm{Re}(z^2) \leq (\mathrm{Re}(z))^2}, we thus have

\displaystyle  \mathrm{Re} a_3 \leq 1 + (\int_0^\infty f(t) e^{-t}\ dt)^2 - \int_0^\infty e^{-2t} f(t)^2\ dt

where {f(t) := \mathrm{Re} c_1(t)}. By Cauchy-Schwarz, we have {(\int_0^\infty f(t) e^{-t}\ dt)^2 \leq \int_0^\infty f(t)^2 e^{-t}\ dt}, and from the bound {f(t)^2 \leq |c_1(t)|^2 \leq 4}, we thus have

\displaystyle \mathrm{Re} a_3 \leq 1 + 4 \int_0^\infty (e^{-t} - e^{-2t})\ dt = 3.

Replacing {f(z)} by the schlicht function {z \mapsto e^{-i\theta} f(e^{i\theta} z)} (which rotates {a_3} by {e^{2i\theta}}) and optimising in {\theta}, we obtain the claim {|a_3| \leq 3}. \Box

Exercise 25 Show that equality in the above bound {|a_3| \leq 3} is only attained when {f} is a rotated Koebe function.
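
To get a feel for the identities used in the proof of Theorem 24, one can evaluate {a_2} and {a_3} numerically from a choice of {c_1(t), c_2(t)}. The sketch below assumes Herglotz functions of the special form {p_t(z) = \frac{1+\kappa(t) z}{1-\kappa(t) z}} with {\kappa(t) = -e^{i\theta(t)}}, so that {c_1 = 2\kappa} and {c_2 = 2\kappa^2}; it also relies on the converse statement mentioned after Example 22 (not proven in these notes) that such a family does come from a Loewner chain. The function name, the driving function {\sin(3t)}, and the midpoint-rule discretisation are our own choices.

```python
import cmath
import math

def second_third_coefficients(theta, steps=20000):
    """Approximate a_2 and a_3 from (17) and the identity for a_3 - a_2^2.

    The substitution u = e^{-t} turns the integrals over [0, infinity) into
    integrals over (0, 1), which are evaluated with the midpoint rule.
    """
    h = 1.0 / steps
    int_c1 = 0j   # approximates the integral of c_1(t) e^{-t} dt
    int_c2 = 0j   # approximates the integral of c_2(t) e^{-2t} dt
    for i in range(steps):
        u = (i + 0.5) * h
        kappa = -cmath.exp(1j * theta(-math.log(u)))
        int_c1 += 2 * kappa * h
        int_c2 += 2 * kappa ** 2 * u * h
    a2 = -int_c1
    a3 = a2 ** 2 - int_c2
    return a2, a3

# Constant theta = 0 corresponds to the Koebe chain (7): a_2 = 2, a_3 = 3.
koebe_a2, koebe_a3 = second_third_coefficients(lambda t: 0.0)

# An oscillating driving function gives coefficients inside the bounds.
a2, a3 = second_third_coefficients(lambda t: math.sin(3 * t))
```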

The Loewner equation (14) takes a special form in the case of slit domains. Indeed, let {U = {\bf C} \backslash \{ \gamma(t): t \geq 0 \}} be a slit domain whose slit does not contain the origin, with conformal radius {1} around {0}, and let {(f_t)_{t \geq 0}} be the Loewner chain with {f_0(D(0,1)) = U}. We can parameterise {\gamma} so that the sets {U_{t_0} := {\bf C} \backslash \{ \gamma(t): t \geq t_0\}} have conformal radius {e^{t_0}} around {0} for every {t_0 \geq 0}, in which case we see that {f_{t_0}} must be the unique conformal map from {D(0,1)} to {U_{t_0}} with {f_{t_0}(0)=0} and {f'_{t_0}(0)>0}. For instance, for the chain (7) we would have {\gamma(t) = -e^t/4}.

Theorem 26 (Loewner equation for slit domains) In the above situation, we have the Loewner equation holding with

\displaystyle  p_t(z) = \frac{1+e^{i\theta(t)} z}{1-e^{i\theta(t)} z} \ \ \ \ \ (18)

for almost all {t} and some measurable {\theta: [0,+\infty) \rightarrow {\bf R}}.

Proof: Let {t>0} be a time where the Loewner equation holds, and let {0 < s < t}. The function {f_t: D(0,1) \rightarrow {\bf C} \backslash \gamma([t,+\infty))} extends continuously to the boundary, and is two-to-one on the slit {\gamma([t,+\infty))}, except at the tip {\gamma(t)} where there is a single preimage {e^{i\theta(t)}} on the unit circle; this can be seen by taking a holomorphic square root of {f_t - \gamma(t)}, using a Möbius transformation to map the resulting image to a set bounded by a Jordan curve, and applying Carathéodory’s theorem (Theorem 20 from Notes 2) to the resulting conformal map. The image {\phi_{t \leftarrow s}(D(0,1)) = f_t^{-1}( f_s(D(0,1)))} is then {D(0,1)} with a Jordan arc {\{ \eta_t(t'): s \leq t' \leq t \}} removed, where {\eta_t(t) =: e^{i\theta(t)}} is a point on the boundary of the disk. Applying Carathéodory’s theorem to a holomorphic square root of {\phi_{t \leftarrow s} - \eta_t(s)}, we see that {\phi_{t \leftarrow s}} extends continuously to be a map from {\overline{D(0,1)}} to {\overline{D(0,1)}}, with an arc {\{ e^{i\theta}: \theta \in I_{t,s} \}} on the boundary mapping (in two-to-one fashion) to the arc {\{ \eta_t(t'): s \leq t' \leq t \}}, and the endpoints of this arc mapping to {e^{i\theta(t)}}. From this and (12), we see that {\lim_{r \rightarrow 1^-} \mathrm{Re}(p_{t \leftarrow s})(re^{i\theta})} vanishes outside of the arc {I_{t,s}}, which by the Herglotz representation theorem implies that the measure {\mu_{t \leftarrow s}} associated to {p_{t \leftarrow s}} is supported on the arc {I_{t,s}}. An inspection of the proof of Carathéodory’s theorem also reveals that the {\phi_{t \leftarrow s}} are equicontinuous on {\overline{D(0,1)}} as {s \rightarrow t^-}, and thus converge uniformly to {\phi_{t \leftarrow t}} (which is the identity function) as {s \rightarrow t^-}. This implies that {I_{t,s}} must converge to the point {\theta(t)} as {s} approaches {t}, and so {\mu_{t \leftarrow s}} converges vaguely to the Dirac mass at {\theta(t)}. Since {p_{t \leftarrow s}} converges locally uniformly to {p_t}, we conclude the formula (18). As {p_t} depends measurably on {t}, we conclude that {\theta} does also. \Box

In fact one can show that {\theta} extends to a continuous function {\theta: [0,+\infty) \rightarrow {\bf R}}, and that the Loewner equation holds for all {t}, but this is a bit trickier to show (it requires some further distortion estimates on conformal maps, related to the arguments used to prove Carathéodory’s theorem in the previous notes) and will not be done here. One can think of the function {\theta} as a “driving force” that incrementally enlarges the slit via the Loewner equation; this perspective is often used when studying the Schramm-Loewner evolution, which is the topic of the next (and final) set of notes.

— 3. The Bieberbach conjecture —

We now turn to the resolution of the Bieberbach (and Robertson) conjectures. We follow the simplified treatment of de Branges’ original proof, due to FitzGerald and Pommerenke, though we omit the proof of one key ingredient, namely the non-negativity of a certain hypergeometric function.

The first step is to work not with the Taylor coefficients of a schlicht function {f(z)} or with an odd schlicht function {g(z)}, but rather with the (normalised) logarithm {\log \frac{f(z)}{z}} of a schlicht function {f}, as the coefficients end up obeying more tractable equations. To transfer to this setting we need the following elementary inequalities relating the coefficients of a power series with the coefficients of its exponential.

Lemma 27 (Second Lebedev-Milin inequality) Let {\sum_{n=1}^\infty c_n z^n} be a formal power series with complex coefficients and no constant term, and let {\sum_{n=0}^\infty b_n z^n} be its formal exponential, thus

\displaystyle  \sum_{n=0}^\infty b_n z^n = \exp( \sum_{n=1}^\infty c_n z^n ) \ \ \ \ \ (19)

where {\exp(w)} is the formal series {1 + w + \frac{w^2}{2!} + \frac{w^3}{3!} + \dots}. Then for any {n \geq 0}, one has

\displaystyle  \frac{1}{n+1} \sum_{k=0}^n |b_k|^2 \leq \exp( \frac{1}{n+1} \sum_{k=1}^n (n+1-k) (k |c_k|^2 - \frac{1}{k}) ). \ \ \ \ \ (20)

Proof: If we formally differentiate (19) in {z}, we obtain the identity

\displaystyle  \sum_{n=1}^\infty b_n n z^{n-1} = (\sum_{n=1}^\infty c_n n z^{n-1}) (\sum_{n=0}^\infty b_n z^n);

extracting the {z^{m-1}} coefficient for any {m \geq 1}, we obtain the formula

\displaystyle  b_{m} = \frac{1}{m} \sum_{k=1}^{m} k c_k b_{m-k}.

By Cauchy-Schwarz, we thus have

\displaystyle  |b_{m}|^2 \leq \frac{1}{m^2} (\sum_{k=1}^{m} k^2 |c_k|^2) (\sum_{k=0}^{m-1} |b_k|^2) \ \ \ \ \ (21)

which we can rearrange as

\displaystyle  \frac{1}{m+1} \sum_{k=0}^{m} |b_{k}|^2 \leq \frac{1}{m} \sum_{k=0}^{m-1} |b_{k}|^2 ( 1 - \frac{1}{m+1} + \frac{1}{m(m+1)} (\sum_{k=1}^{m} k^2 |c_k|^2)).

Using {1+x \leq \exp(x)} and telescoping series, it thus suffices to prove the identity

\displaystyle  \sum_{m=1}^n (-\frac{1}{m+1} + \frac{1}{m(m+1)} \sum_{k=1}^m k^2 |c_k|^2) = \frac{1}{n+1} \sum_{k=1}^n (n+1-k) (k |c_k|^2 - \frac{1}{k}).

But this follows from observing that

\displaystyle  \frac{1}{n+1} \sum_{k=1}^n (n+1-k) \frac{1}{k} = \sum_{k=1}^n \frac{1}{k} - \frac{n}{n+1} = \sum_{m=1}^n \frac{1}{m+1}

and that

\displaystyle  k^2 \sum_{m=k}^n \frac{1}{m(m+1)} = k^2 (\frac{1}{k} - \frac{1}{n+1}) = \frac{1}{n+1} (n+1-k) k

for all {1 \leq k \leq n}. \Box

Exercise 28 Show that equality holds in (20) for a given {n} if and only if there is {\theta \in {\bf R}} such that {c_k = e^{ik\theta}/k} for all {k=1,\dots,n}.
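
The recursion {b_m = \frac{1}{m} \sum_{k=1}^m k c_k b_{m-k}} from the proof of Lemma 27 also gives a practical way to compute the {b_n}, and hence to test (20) numerically. The sketch below (with our own function names and an arbitrary random choice of the {c_k}) does this, and also checks the equality case {c_k = e^{ik\theta}/k} from Exercise 28.

```python
import cmath
import math
import random

def exp_coefficients(c, n):
    """b_0, ..., b_n of exp(sum_{k>=1} c[k] z^k); c[0] is ignored."""
    b = [1 + 0j]
    for m in range(1, n + 1):
        b.append(sum(k * c[k] * b[m - k] for k in range(1, m + 1)) / m)
    return b

def lebedev_milin_sides(c, n):
    """Left- and right-hand sides of the second Lebedev-Milin inequality (20)."""
    b = exp_coefficients(c, n)
    lhs = sum(abs(x) ** 2 for x in b) / (n + 1)
    exponent = sum((n + 1 - k) * (k * abs(c[k]) ** 2 - 1 / k)
                   for k in range(1, n + 1)) / (n + 1)
    return lhs, math.exp(exponent)

n = 10
rng = random.Random(2)
c_random = [0j] + [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) / k
                   for k in range(1, n + 1)]
lhs_random, rhs_random = lebedev_milin_sides(c_random, n)

# Equality case: c_k = e^{i k theta} / k, for which b_k = e^{i k theta}.
theta = 0.4
c_extremal = [0j] + [cmath.exp(1j * k * theta) / k for k in range(1, n + 1)]
lhs_extremal, rhs_extremal = lebedev_milin_sides(c_extremal, n)
```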

Exercise 29 (First Lebedev-Milin inequality) With the notation as in the above lemma, and under the additional assumption {\sum_{k=1}^\infty k |c_k|^2 < \infty}, prove that

\displaystyle  \sum_{k=0}^\infty |b_k|^2 \leq \exp( \sum_{k=1}^\infty k |c_k|^2 ).

(Hint: using the Cauchy-Schwarz inequality as above, first show that the power series {\sum_{k=0}^\infty |b_k|^2 z^k} is bounded term-by-term by the power series of {\exp( \sum_{k=1}^\infty k |c_k|^2 z^k)}.) When does equality occur?

Exercise 30 (Third Lebedev-Milin inequality) With the notation as in the above lemma, show that

\displaystyle  |b_n|^2 \leq \exp( \sum_{k=1}^n (k |c_k|^2 - \frac{1}{k}) ).

(Hint: use the second Lebedev-Milin inequality and (21), together with the calculus inequality {x e^{-x} \leq 1/e} for all {x}.) When does equality occur?

Using these inequalities, one can reduce the Robertson and Bieberbach conjectures to the following conjecture of Milin, also proven by de Branges:

Theorem 31 (Milin conjecture) Let {f: D(0,1) \rightarrow {\bf C}} be a schlicht function. Let {\log \frac{f(z)}{z}} be the branch of the logarithm of {f(z)/z} that equals {0} at the origin, thus one has

\displaystyle  \log \frac{f(z)}{z} = d_1 z + d_2 z^2 + \dots

for some complex coefficients {d_1,d_2,\dots}. Then one has

\displaystyle  \sum_{k=1}^n (n+1-k) (k |d_k|^2 - \frac{4}{k}) \leq 0

for all {n \geq 1}.

Indeed, if

\displaystyle g(z) = b_1 z + b_3 z^3 + b_5 z^5 + \dots

is an odd schlicht function, let {f} be the schlicht function given by (4), then

\displaystyle \exp( \frac{1}{2} \log \frac{f(z)}{z} ) = \frac{g(z^{1/2})}{z^{1/2}} = b_1 + b_3 z + b_5 z^2 + \dots.

Applying Lemma 27 with {c_n := d_n/2}, we obtain the Robertson conjecture, and the Bieberbach conjecture follows.

Example 32 If {f} is the Koebe function (1), then

\displaystyle  \log \frac{f(z)}{z} = - 2\log (1-z) = 2z + \frac{2}{2} z^2 + \frac{2}{3} z^3 + \dots

so in this case {d_k = \frac{2}{k}} and {k |d_k|^2 - \frac{4}{k} = 0}. Similarly, for the rotated Koebe function (2) one has {d_k = \frac{2}{k} e^{ik\theta}} and again {k |d_k|^2 - \frac{4}{k} = 0}. If one works instead with the dilated Koebe function {e^t f}, we have {\log \frac{e^t f(z)}{z} = t + d_1 z + d_2 z^2 + \dots}, thus the time parameter only affects the constant term in {\log \frac{e^t f(z)}{z}}. This is already a hint that the coefficients of {\log \frac{f(z)}{z}} could be worth studying further in this problem.

To prove the Milin conjecture, we use the Loewner chain method. It suffices by Theorem 15 and a limiting argument to do so in the case that {f(D(0,1))} is a slit domain. Then, by Theorem 26, {f} is the initial function {f_0} of a Loewner chain {(f_t)_{t \geq 0}} that solves the Loewner equation

\displaystyle  \partial_t f_t(z) = z f'_t(z) \frac{1-e^{i\theta(t)} z}{1+e^{i\theta(t)} z}

for all {z} and almost every {t}, and some function {\theta: [0,+\infty) \rightarrow {\bf R}}.

We can transform this into an equation for {\log \frac{f_t(z)}{z}}. Indeed, for non-zero {z} we may divide by {f_t} to obtain

\displaystyle  \partial_t \log f_t(z) = z (\frac{d}{dz} \log f_t(z)) \frac{1-e^{i\theta(t)} z}{1+e^{i\theta(t)} z}

(for any local branch of the logarithm) and hence

\displaystyle  \partial_t \log \frac{f_t(z)}{z} = (z \frac{d}{dz} \log \frac{f_t(z)}{z} + 1) \frac{1-e^{i\theta(t)} z}{1+e^{i\theta(t)} z}.

Since {f'_t(0)=e^t}, {\log \frac{f_t(z)}{z}} is equal to {t} at the origin (for an appropriate branch of the logarithm). Thus we can write

\displaystyle  \log \frac{f_t(z)}{z} = t + d_1(t) z + d_2(t) z^2 + \dots.

The {d_n(t)} are locally Lipschitz in {t} (basically thanks to Lemma 19) and for almost every {t} we have the Taylor expansions

\displaystyle  \partial_t \log \frac{f_t(z)}{z} = 1 +\partial_t d_1(t) z + \partial_t d_2(t) z^2 + \dots

\displaystyle  z \frac{d}{dz} \log \frac{f_t(z)}{z} + 1 = 1 + d_1(t) z + 2 d_2(t) z^2 + \dots


\displaystyle  \frac{1-e^{i\theta(t)} z}{1+e^{i\theta(t)} z} = 1 - 2e^{i\theta(t)} z + 2 e^{2i\theta(t)} z^2 - \dots.

Comparing coefficients, we arrive at the system of ordinary differential equations

\displaystyle  \partial_t d_k(t) = k d_k(t) + 2 \sum_{j=1}^{k-1} j d_j(t) (-e^{i\theta(t)})^{k-j} + 2(-e^{i\theta(t)})^{k} \ \ \ \ \ (22)

for every {k \geq 1}.
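
As a consistency check on (22), recall from Example 32 that for the rotated Koebe chain (8) the coefficients {d_k = \frac{2}{k} e^{ik\theta}} do not depend on {t}, with {\theta} constant. The right-hand side of (22) should therefore vanish identically in this case, which the following sketch (our own helper, with arbitrary values of {\theta} and the truncation {N}) confirms numerically.

```python
import cmath

def rhs_22(k, d, theta):
    """Right-hand side of (22); d[j] holds d_j for j >= 1 (d[0] is unused)."""
    kappa = -cmath.exp(1j * theta)
    return (k * d[k]
            + 2 * sum(j * d[j] * kappa ** (k - j) for j in range(1, k))
            + 2 * kappa ** k)

theta, N = 0.8, 10
# Time-independent coefficients of the rotated Koebe chain.
d = [0j] + [2 / k * cmath.exp(1j * k * theta) for k in range(1, N + 1)]
max_rhs = max(abs(rhs_22(k, d, theta)) for k in range(1, N + 1))
```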

Fix {n \geq 1} (we will not need to use any induction on {n} here). We would like to use the system (22) to show that

\displaystyle  \sum_{k=1}^n (n+1-k) (k |d_k(0)|^2 - \frac{4}{k}) \leq 0.

The most naive attempt to do this would be to show that one has a monotonicity formula

\displaystyle  \partial_t \sum_{k=1}^n (n+1-k) (k |d_k(t)|^2 - \frac{4}{k}) \geq 0

for all {t \geq 0}, and that the expression {\sum_{k=1}^n (n+1-k) (k |d_k(t)|^2 - \frac{4}{k})} goes to zero as {t \rightarrow\infty}, as the claim would then follow from the fundamental theorem of calculus. This turns out to not quite work; however it turns out that a slight modification of this idea does work. Namely, we introduce the quantities

\displaystyle  \Omega(t) := \sum_{k=1}^n \sigma_{k}(t) (k |d_k(t)|^2 - \frac{4}{k})

where for each {k=1,\dots,n}, {\sigma_{k}: [0,+\infty) \rightarrow {\bf R}} is a continuously differentiable function to be chosen later. If we have the initial condition

\displaystyle  \sigma_{k}(0) = n+1-k \ \ \ \ \ (23)

for all {k=1,\dots,n}, then the Milin conjecture is equivalent to asking that {\Omega(0) \leq 0}. On the other hand, if we impose a boundary condition

\displaystyle  \lim_{t \rightarrow \infty} \sigma_{k}(t) = 0 \ \ \ \ \ (24)

for {k=1,\dots,n}, then we also have {\Omega(t) \rightarrow 0} as {t \rightarrow \infty}, since {e^{-t} f_t} is schlicht and hence {\log \frac{e^{-t} f_t(z)}{z} = \log \frac{f_t(z)}{z} - t} is a normal family, implying that the {d_k(t)} are bounded in {t} for each {k}. Thus, to solve the Milin, Robertson, and Bieberbach conjectures, it suffices to find a choice of weights {\sigma_k(t)} obeying the initial and boundary conditions (23), (24), and such that

\displaystyle  \partial_t \Omega(t) \geq 0 \ \ \ \ \ (25)

for almost every {t} (note that {\Omega} will be Lipschitz, so the fundamental theorem of calculus applies).

Let us now try to establish (25) using (22). We first write {\kappa(t) := -e^{i\theta(t)}}, and drop the explicit dependence on {t}, thus

\displaystyle  \partial_t d_k = k d_k + 2 \sum_{j=1}^{k-1} j d_j \kappa^{k-j} + 2 \kappa^k

for {k \geq 1}. To simplify this equation, we make a further transformation, introducing the functions

\displaystyle  f_k := \sum_{j=1}^k j d_j \kappa^{-j}

(with the convention {f_0=0}); then we can write the above equation as

\displaystyle  \partial_t d_k = \kappa^k ( f_k + f_{k-1} + 2 ).

We can recover the {d_k} from the {f_k} by the formula

\displaystyle  d_k = \frac{1}{k} \kappa^k (f_k - f_{k-1}).

It may be worth recalling at this point that in the example of the rotated Koebe Loewner chain (2) one has {d_k = \frac{2}{k} e^{ik\theta}}, {\kappa = -e^{i\theta}}, and {f_k = -1 + (-1)^k}, for some real constant {\theta}. Observe that {f_k} has a simpler form than {d_k} in this example, suggesting again that the decision to transform the problem to one about the {f_k} rather than the {d_k} is on the right track.

We now calculate

\displaystyle  \partial_t \Omega(t) = \sum_{k=1}^n 2 k \sigma_k \mathrm{Re} \overline{d_k} \partial_t d_k + (\partial_t \sigma_k) (k |d_k|^2 - \frac{4}{k})

\displaystyle  = \sum_{k=1}^n 2 k \sigma_k \mathrm{Re} \overline{d_k} \kappa^k (f_k + f_{k-1}+2) + \frac{\partial_t \sigma_k}{k} (|f_k - f_{k-1}|^2 - 4)

\displaystyle  = \sum_{k=1}^n 2 \sigma_k \mathrm{Re} ((\overline{f_k} - \overline{f_{k-1}}) (f_k + f_{k-1}+2)) + \frac{\partial_t \sigma_k}{k} (|f_k - f_{k-1}|^2 - 4).

Conveniently, the unknown function {\kappa} no longer appears explicitly! Some simple algebra shows that

\displaystyle  \mathrm{Re} (\overline{f_k} - \overline{f_{k-1}}) (f_k + f_{k-1}+2) = (|f_k|^2 + 2 \mathrm{Re} f_k) - (|f_{k-1}|^2 + 2 \mathrm{Re} f_{k-1})

and hence by summation by parts

\displaystyle  \partial_t \Omega(t) = \sum_{k=1}^n 2 (\sigma_k - \sigma_{k+1}) (|f_k|^2 + 2 \mathrm{Re} f_k) + \frac{\partial_t \sigma_k}{k} (|f_k - f_{k-1}|^2 - 4)

with the convention {\sigma_{n+1}=0}.

In the example of the rotated Koebe function, with {f_k = -1 + (-1)^k}, the factors {|f_k|^2 + 2 \mathrm{Re} f_k} and {|f_k - f_{k-1}|^2 - 4} both vanish, which is consistent with the fact that {\Omega} vanishes in this case regardless of the choice of weights {\sigma_k}. So these two factors look to be related to each other. On the other hand, for more general choices of {f_k}, these two expressions do not have any definite sign. For comparison, the quantity {|f_k + f_{k-1} + 2|^2} also vanishes when {f_k = -1 + (-1)^k}, and has a definite sign. So it is natural to see if these three factors are related to each other. After a little bit of experimentation, one eventually discovers the following elementary identity giving such a connection:

\displaystyle  |f_k - f_{k-1}|^2 - 4 = 2 (|f_k|^2 + 2 \mathrm{Re} f_k) + 2 (|f_{k-1}|^2 + 2 \mathrm{Re} f_{k-1})

\displaystyle  - |f_k + f_{k-1} + 2|^2.
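
Since this identity is the algebraic heart of the argument, it is reassuring to test it on random complex inputs. In the sketch below the two arguments play the roles of {f_k} and {f_{k-1}}; the function names and sampling range are our own choices.

```python
import random

def identity_lhs(fk, fk_prev):
    """|f_k - f_{k-1}|^2 - 4."""
    return abs(fk - fk_prev) ** 2 - 4

def identity_rhs(fk, fk_prev):
    """2 g(f_k) + 2 g(f_{k-1}) - |f_k + f_{k-1} + 2|^2 with g(w) = |w|^2 + 2 Re w."""
    def g(w):
        return abs(w) ** 2 + 2 * w.real
    return 2 * g(fk) + 2 * g(fk_prev) - abs(fk + fk_prev + 2) ** 2

rng = random.Random(3)
pairs = [(complex(rng.uniform(-3, 3), rng.uniform(-3, 3)),
          complex(rng.uniform(-3, 3), rng.uniform(-3, 3)))
         for _ in range(1000)]
max_discrepancy = max(abs(identity_lhs(a, b) - identity_rhs(a, b))
                      for a, b in pairs)
```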

Inserting this identity into the above equation, we obtain

\displaystyle  \partial_t \Omega(t) = \sum_{k=1}^n 2 (\sigma_k - \sigma_{k+1}) (|f_k|^2 + 2 \mathrm{Re} f_k) + 2 \frac{\partial_t \sigma_k}{k} (|f_k|^2 + 2 \mathrm{Re} f_k)

\displaystyle  + 2 \frac{\partial_t \sigma_k}{k} (|f_{k-1}|^2 + 2 \mathrm{Re} f_{k-1}) - \frac{\partial_t \sigma_k}{k} |f_k + f_{k-1} + 2|^2

which can be rearranged as

\displaystyle  \partial_t \Omega(t) = 2 \sum_{k=1}^n (\sigma_k - \sigma_{k+1} + \frac{\partial_t \sigma_k}{k} + \frac{\partial_t \sigma_{k+1}}{k+1}) (|f_k|^2 + 2 \mathrm{Re} f_k)

\displaystyle  - \sum_{k=1}^n \frac{\partial_t \sigma_k}{k} |f_k + f_{k-1} + 2|^2.

We can kill the first summation by fiat, by imposing the requirement that the {\sigma_k} obey the system of differential equations

\displaystyle  \sigma_k - \sigma_{k+1} = - \frac{\partial_t \sigma_k}{k} - \frac{\partial_t \sigma_{k+1}}{k+1}, \ \ \ \ \ (26)

for {k=1,\dots,n}; then we just have

\displaystyle  \partial_t \Omega(t) = - \sum_{k=1}^n \frac{\partial_t \sigma_k}{k} |f_k + f_{k-1} + 2|^2.

Hence if we also have the non-negativity condition

\displaystyle  -\partial_t \sigma_k(t) \geq 0 \ \ \ \ \ (27)

for all {k=1,\dots,n} and {t \geq 0}, we will have obtained the desired monotonicity (25).

To summarise, in order to prove the Milin conjecture for a fixed value of {n}, we need to find functions {\sigma_1,\dots,\sigma_n} obeying the initial condition (23), the boundary condition (24), the differential equation (26), and the nonnegativity condition (27), with the convention {\sigma_{n+1}=0}. This is a significant reduction to the problem, as one just has to write down an explicit formula for such functions and verify all the properties.

Let us work out some simple cases. First consider the case {n=1}. Now our task is to solve the system

\displaystyle  \sigma_1(0) = 1

\displaystyle  \lim_{t \rightarrow \infty} \sigma_1(t) = 0

\displaystyle  \sigma_1(t) = - \partial_t \sigma_1(t)

\displaystyle  -\partial_t \sigma_1(t) \geq 0

for all {0 \leq t < \infty}. This is easy: we just take {\sigma_1(t) = e^{-t}} (indeed this is the unique choice). This gives the {n=1} case of the Milin conjecture (which corresponds to the {n=2} case of Bieberbach).

Next consider the case {n=2}. The system is now

\displaystyle  \sigma_1(0) = 2; \quad \sigma_2(0) = 1

\displaystyle  \lim_{t \rightarrow \infty} \sigma_1(t) = \lim_{t \rightarrow \infty} \sigma_2(t) = 0

\displaystyle  \sigma_1(t) - \sigma_2(t) = - \partial_t \sigma_1(t) - \frac{1}{2} \partial_t \sigma_2(t)

\displaystyle  \sigma_2(t) = - \frac{1}{2} \partial_t \sigma_2(t)

\displaystyle  - \partial_t \sigma_1(t), - \partial_t \sigma_2(t) \geq 0.

Again, a routine computation shows that there is a unique solution here, namely {\sigma_1(t) = 4e^{-t} - 2e^{-2t}} and {\sigma_2(t) = e^{-2t}}. This gives the {n=2} case of the Milin conjecture (which corresponds to the {n=3} case of Bieberbach). One should compare this argument to that in Theorem 24; in particular, one should see very similar weight functions emerging.

Let us now move on to {n=3}. The system is now

\displaystyle  \sigma_1(0) = 3; \quad \sigma_2(0) = 2; \quad \sigma_3(0) = 1

\displaystyle  \lim_{t \rightarrow \infty} \sigma_1(t) = \lim_{t \rightarrow \infty} \sigma_2(t) = \lim_{t \rightarrow \infty} \sigma_3(t) = 0

\displaystyle  \sigma_1(t) - \sigma_2(t) = - \partial_t \sigma_1(t) - \frac{1}{2} \partial_t \sigma_2(t)

\displaystyle  \sigma_2(t) - \sigma_3(t) = - \frac{1}{2} \partial_t \sigma_2(t) - \frac{1}{3} \partial_t \sigma_3(t)

\displaystyle  \sigma_3(t) = - \frac{1}{3} \partial_t \sigma_3(t)

\displaystyle  - \partial_t \sigma_1(t), - \partial_t \sigma_2(t), - \partial_t \sigma_3(t) \geq 0.

A slightly lengthier calculation gives the unique explicit solution

\displaystyle  \sigma_1(t) = 10 e^{-t} - 12 e^{-2t} + 5 e^{-3t}

\displaystyle  \sigma_2(t) = 6 e^{-2t} - 4 e^{-3t}

\displaystyle  \sigma_3(t) = e^{-3t}

to the above conditions.

These simple cases already indicate that there is basically only one candidate for the weights {\sigma_k} that will work. A calculation can give the explicit formula:

Exercise 33 Let {n \geq 1}.

  • (i) Show there is a unique choice of continuously differentiable functions {\sigma_1,\dots,\sigma_n: [0,+\infty) \rightarrow {\bf R}} that solve the differential equations (26) with initial condition (23), with the convention {\sigma_{n+1}=0}. (Use the Picard existence theorem.)
  • (ii) For any {0 \leq k \leq n}, show that the expression

    \displaystyle  \sum_{j=0}^{n-k} (-1)^j \binom{2k+2j}{j} \binom{n+j+k+1}{n-k-j}

    is equal to {1} when {n-k} is even and {0} when {n-k} is odd.

  • (iii) Show that the functions

    \displaystyle  \sigma_k(t) = k \sum_{j=0}^{n-k} (-1)^j \binom{2k+2j}{j} \binom{n+k+j+1}{n-k-j} \frac{1}{k+j} e^{-(k+j)t}

    for {k=1,\dots,n} obey the properties (23), (26), (24). (Hint: for (23), first use (ii) to show that {\partial_t \sigma_k(0)} is equal to {-k} when {n-k} is even and {0} when {n-k} is odd, then use (26).)
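
For readers who want to check parts (ii) and (iii) mechanically for small {n}, here is a minimal Python sketch in exact rational arithmetic. It assumes that the initial condition (23) reads {\sigma_k(0) = n+1-k}, which is what the worked cases {n=1,2,3} above suggest; the function names are invented for illustration. A finite check of this kind is no substitute for the proof asked for in the exercise.

```python
from fractions import Fraction
from math import comb


def alternating_sum(n, k):
    """The sum in part (ii)."""
    return sum((-1) ** j * comb(2 * k + 2 * j, j) * comb(n + k + j + 1, n - k - j)
               for j in range(n - k + 1))


def sigma_coeffs(n, k):
    """Coefficients c_m with sigma_k(t) = sum_m c_m e^{-m t}, from part (iii)."""
    if k > n:
        return {}  # convention sigma_{n+1} = 0
    return {
        k + j: Fraction((-1) ** j * k * comb(2 * k + 2 * j, j)
                        * comb(n + k + j + 1, n - k - j), k + j)
        for j in range(n - k + 1)
    }


def derivative(coeffs):
    """Differentiate sum_m c_m e^{-m t} term by term."""
    return {m: -m * c for m, c in coeffs.items()}


def ode_residual(n, k):
    """Coefficients of sigma_k - sigma_{k+1} + sigma_k'/k + sigma_{k+1}'/(k+1).

    Equation (26) holds exactly when every coefficient vanishes.
    """
    s_k, s_next = sigma_coeffs(n, k), sigma_coeffs(n, k + 1)
    d_k, d_next = derivative(s_k), derivative(s_next)
    return {
        m: s_k.get(m, 0) - s_next.get(m, 0)
        + Fraction(d_k.get(m, 0), k) + Fraction(d_next.get(m, 0), k + 1)
        for m in set(s_k) | set(s_next)
    }


def check(n):
    """True if the assumed (23) and the equation (26) hold for this n."""
    for k in range(1, n + 1):
        if sum(sigma_coeffs(n, k).values()) != n + 1 - k:
            return False
        if any(ode_residual(n, k).values()):
            return False
    return True


print(all(check(n) for n in range(1, 7)))
```

For {n=3} and {k=1} the coefficients returned by `sigma_coeffs` are {10, -12, 5}, matching the explicit solution written out above.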

The Bieberbach conjecture is then reduced to the claim that

\displaystyle  k \sum_{j=0}^{n-k} (-1)^j \binom{2k+2j}{j} \binom{n+k+j+1}{n-k-j} e^{-(k+j)t} \geq 0 \ \ \ \ \ (28)

for any {1 \leq k \leq n} and {t \geq 0}. This inequality can be directly verified for any fixed {n}; for general {n} it follows from general inequalities on Jacobi polynomials by Askey and Gasper, with an alternate proof given subsequently by Gasper. A further proof of (28), based on a variant of the above argument due to Weinstein that avoids explicit use of (28), appears in this article of Koepf. We will not detail these arguments here.
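
The remark that (28) can be verified directly for fixed {n} can be illustrated numerically. The sketch below samples the left-hand side of (28) on a grid of {t} values for small {n}; the grid range and step count are arbitrary choices, and a sampled minimum is only a sanity check, not a proof of the inequality.

```python
import math


def lhs_28(n, k, t):
    """Left-hand side of (28) at time t, evaluated in floating point."""
    terms = [
        (-1) ** j * k * math.comb(2 * k + 2 * j, j)
        * math.comb(n + k + j + 1, n - k - j) * math.exp(-(k + j) * t)
        for j in range(n - k + 1)
    ]
    return math.fsum(terms)


def min_on_grid(n, t_max=10.0, steps=2000):
    """Smallest sampled value of (28) over k = 1..n and t in [0, t_max]."""
    return min(
        lhs_28(n, k, t_max * i / steps)
        for k in range(1, n + 1)
        for i in range(steps + 1)
    )


for n in range(1, 7):
    # Allow a tiny negative tolerance for rounding error near zeros of the sum.
    print(n, min_on_grid(n) >= -1e-9)
```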

n-Category Café Applied Category Theory: Resource Theories

My course on applied category theory is continuing! After a two-week break where the students did exercises, I went back to lecturing about Fong and Spivak’s book Seven Sketches. The second chapter is about ‘resource theories’.

Resource theories help us answer questions like this:

  1. Given what I have, is it possible to get what I want?
  2. Given what I have, how much will it cost to get what I want?
  3. Given what I have, how long will it take to get what I want?
  4. Given what I have, what is the set of ways to get what I want?
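
To make the first question concrete, here is a toy Python sketch. It treats a collection of resources as a bag (multiset), lets a few made-up reactions turn one bag into another, and searches for a bag that contains what I want. The reactions, names and search cap are all hypothetical, chosen only for illustration; they are not taken from the course or the book.

```python
from collections import Counter, deque

# Hypothetical reactions: each turns a bag of inputs into a bag of outputs.
REACTIONS = [
    (Counter({"H2": 2, "O2": 1}), Counter({"H2O": 2})),
    (Counter({"H2O": 1, "CaO": 1}), Counter({"CaOH2": 1})),
]


def contains(bag, needed):
    """True if the bag holds at least the needed amount of every item."""
    return all(bag[item] >= count for item, count in needed.items())


def freeze(bag):
    """Hashable form of a bag, used to remember visited states."""
    return tuple(sorted((item, count) for item, count in bag.items() if count > 0))


def can_obtain(have, want, reactions=REACTIONS, max_states=10_000):
    """Breadth-first search over bags reachable from `have`.

    Leftover resources may be thrown away, so success means reaching a bag
    that contains `want`. A False result only means nothing was found
    within `max_states` explored bags.
    """
    start = Counter(have)
    seen = {freeze(start)}
    queue = deque([start])
    while queue and len(seen) <= max_states:
        bag = queue.popleft()
        if contains(bag, want):
            return True
        for inputs, outputs in reactions:
            if contains(bag, inputs):
                nxt = bag - inputs + outputs
                key = freeze(nxt)
                if key not in seen:
                    seen.add(key)
                    queue.append(nxt)
    return False


print(can_obtain({"H2": 2, "O2": 1, "CaO": 1}, {"CaOH2": 1}))
```

Attaching a cost or a duration to each reaction and running a shortest-path search instead would address questions 2 and 3 in the same spirit.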

Resource theories in their modern form were arguably born in these papers:

We are lucky to have Tobias in our course, helping the discussions along! He’s already posted some articles on resource theory on the Azimuth blog:

In the course, we had fun bouncing between the relatively abstract world of monoidal preorders and their very concrete real-world applications to chemistry, scheduling, manufacturing and other topics. Here are the lectures:

June 02, 2018

John PreskillA finger painting for John Preskill

I’d completed about half my candidacy exam.

Four Caltech faculty members sat in front of me, in a bare seminar room. I stood beside a projector screen, explaining research I’d undertaken. The candidacy exam functions as a milepost in year three of our PhD program. The committee confirms that the student has accomplished research and should continue.

I was explaining a quantum-thermodynamics problem. I reviewed the problem’s classical doppelgänger and a strategy for solving the doppelgänger. Could you apply the classical strategy in the quantum problem? Up to a point. Beyond it, you’d need


“Does anyone here like the Beatles?” I asked the committee. Three professors had never participated in an exam committee before. The question from the examinee appeared to startle them.

One committee member had participated in cartloads of committees. He recovered first, raising a hand.

The committee member—John Preskill—then began singing the Beatles song.

In the middle of my candidacy exam.

The moment remains one of the highlights of my career.


Throughout my PhD career, I’ve reported to John. I’ve emailed an update every week and requested a meeting about once a month. I sketch the work that’s firing me, relate my plans, and request feedback.

Much of the feedback, I’ve discerned over the years, condenses into aphorisms buried in our conversations. I doubt whether John has noticed his aphorisms. But they’ve etched themselves in me, and I hope they remain there.

“Think big.” What would impact science? Don’t buff a teapot if you could be silversmithing.

Education serves as “money in the bank.” Invest in yourself, and draw on the interest throughout your career.

“Stay broad.” (A stretching outward of both arms accompanies this aphorism.) Embrace connections with diverse fields. Breadth affords opportunities to think big.

“Keep it simple,” but “do something technical.” A teapot cluttered with filigree, spouts, and eighteen layers of gold leaf doesn’t merit a spot at the table. A Paul Revere does.

“Do what’s best for Nicole.” I don’t know how many requests to speak, to participate on committees, to explain portions of his lecture notes, to meet, to contribute to reports, and more John receives per week. The requests I receive must look, in comparison, like a mouse to a mammoth. But John exhorts me to guard my time for research—perhaps, partially, because he gives so much time, including to students.

“Move on.” If you discover an opportunity, study background information for a few months, seize the opportunity, wrap up the project, and seek the next window.

John has never requested my updates, but he’s grown used to them. I’ve grown used to how meetings end. Having brought him questions, I invite him to ask questions of me.

“Are you having fun?” he says.


I tell the Beatles story when presenting that quantum-thermodynamics problem in seminars.

“I have to digress,” I say when the “Help!” image appears. “I presented this slide at a talk at Caltech, where John Preskill was in the audience. Some of you know John.” People nod. “He’s a…mature gentleman.”

I borrowed the term from the apparel industry. “Mature gentleman” means “at a distinguished stage by which one deserves to have celebrated a birthday of his with a symposium.”

Many physicists lack fluency in apparel-industry lingo. My audience members take “mature” at face value.

Some audience members grin. Some titter. Some tilt their heads from side to side, as though thinking, “Eh…”

John has impact. He’s logged boatloads of technical achievements. He has the scientific muscle of a scientific rhinoceros.

And John has fun. He doesn’t mind my posting an article about audience members giggling about him.

Friends ask me whether professors continue doing science after meriting birthday symposia, winning Nobel Prizes, and joining the National Academy of Sciences. I point to the number of papers with which John has, with coauthors, electrified physics over the past 20 years. Has coauthored because science is fun. It merits singing about during candidacy exams. Satisfying as passing the exam felt two years ago, I feel more honored when John teases me about my enthusiasm for science.


A year ago, I ate lunch with an alumnus who’d just graduated from our group. Students, he reported, have a tradition of gifting John a piece of art upon graduating. I relayed the report to another recent alumnus.

“Really?” the second alumnus said. “Maybe someone gave John a piece of art and then John invented the tradition.”

Regardless of its origin, the tradition appealed to me. John has encouraged me to blog as he’s encouraged me to do theoretical physics. Writing functions as art. And writing resembles theoretical physics: Each requires little more than a pencil, paper, and thought. Each requires creativity, aesthetics, diligence, and style. Each consists of ideas, of abstractions; each lacks substance but can outlive its creator. Let this article serve as a finger painting for John Preskill.

Thanks for five fun years.


With my PhD-thesis committee, after my thesis defense. Photo credit to Nick Hutzler, who cracked the joke that accounts for everyone’s laughing. (Left to right: Xie Chen, Fernando Brandão, John Preskill, Nicole Yunger Halpern, Manuel Endres.)

June 01, 2018

ResonaancesWIMPs after XENON1T

After today's update from the XENON1T experiment, the situation on the front of direct detection of WIMP dark matter is as follows

A WIMP can be loosely defined as a dark matter particle with mass in the 1 GeV - 10 TeV range and significant interactions with ordinary matter. Historically, WIMP searches have stimulated enormous interest because this type of dark matter can be easily realized in models with low scale supersymmetry. Now that we are older and wiser, many physicists would rather put their money on other realizations, such as axions, MeV dark matter, or primordial black holes. Nevertheless, WIMPs remain a viable possibility that should be further explored.
To detect WIMPs heavier than a few GeV, currently the most successful strategy is to use huge detectors filled with xenon atoms, hoping one of them is hit by a passing dark matter particle. XENON1T beats the competition from the LUX and Panda-X experiments because it has a bigger tank. Technologically speaking, we have come a long way in the last 30 years. XENON1T is now sensitive to 40 GeV WIMPs interacting with nucleons with the cross section of 40 yoctobarn (1 yb = 10^-12 pb = 10^-48 cm^2). This is 6 orders of magnitude better than what the first direct detection experiment in the Homestake mine could achieve back in the 80s. Compared to last year, the limit is better by a factor of two at the most sensitive mass point. At high mass the improvement is somewhat smaller than expected due to a small excess of events observed by XENON1T, which is probably just a 1 sigma upward fluctuation of the background.
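
For those who don't juggle barn prefixes daily, the unit arithmetic in the parenthesis can be checked in a few lines of Python. This is only a sketch of the conversion (1 barn = 10^-24 cm^2 and the standard SI prefixes); the function names are made up.

```python
BARN_IN_CM2 = 1e-24  # 1 barn = 10^-24 cm^2
SI_PREFIX = {"pico": 1e-12, "yocto": 1e-24}


def to_cm2(value, prefix):
    """Convert a cross section given in <prefix>barn to cm^2."""
    return value * SI_PREFIX[prefix] * BARN_IN_CM2


def yocto_to_pico(value):
    """Convert yoctobarn to picobarn."""
    return value * SI_PREFIX["yocto"] / SI_PREFIX["pico"]


limit_cm2 = to_cm2(40, "yocto")  # the 40 yb sensitivity quoted above
print(f"40 yb = {limit_cm2:.1e} cm^2 = {yocto_to_pico(40):.1e} pb")
```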

What we are learning about WIMPs is how they can (or cannot) interact with us. Of course, at this point in the game we don't see qualitative progress, but rather incremental quantitative improvements. One possible scenario is that WIMPs experience one of the Standard Model forces, such as the weak or the Higgs force. The former option is strongly constrained by now. If WIMPs had interacted in the same way as our neutrino does, that is by exchanging a Z boson, they would have been found in the Homestake experiment. XENON1T is probing models where the dark matter coupling to the Z boson is suppressed by a factor cχ ~ 10^-3 - 10^-4 compared to that of an active neutrino. On the other hand, dark matter could be participating in weak interactions only by exchanging W bosons, which can happen for example when it is a part of an SU(2) triplet. In the plot you can see that XENON1T is approaching but not yet excluding this interesting possibility. As for models using the Higgs force, XENON1T is probing the (subjectively) most natural parameter space where WIMPs couple with order one strength to the Higgs field.

And the arms race continues. The search in XENON1T will go on until the end of this year, although at this point a discovery is extremely unlikely. Further progress is expected on a timescale of a few years thanks to the next generation xenon detectors XENONnT and LUX-ZEPLIN, which should achieve yoctobarn sensitivity. DARWIN may be the ultimate experiment along these lines, in the sense that it will reach the irreducible background from atmospheric neutrinos, after which new detection techniques will be needed. For dark matter mass closer to 1 GeV, several orders of magnitude of pristine parameter space will be covered by the SuperCDMS experiment. Until then we are kept in suspense. Is dark matter made of WIMPs? And if yes, does it stick above the neutrino sea?

Matt von HippelBe Rational, Integrate Our Way!

I’ve got another paper up this week with Jacob Bourjaily, Andrew McLeod, and Matthias Wilhelm, about integrating Feynman diagrams.

If you’ve been following this blog for a while, you might be surprised: most of my work avoids Feynman diagrams at all costs. I’ve changed my mind, in part, because it turns out integrating Feynman diagrams can be a lot easier than I had thought.

At first, I thought Feynman integrals would be hard purely because they’re integrals. Those of you who’ve taken calculus might remember that, while taking derivatives was just a matter of following the rules, doing integrals required a lot more thought. Rather than one set of instructions, you had a set of tricks, meant to try to match your integral to the derivative of some known function. Sometimes the tricks worked, sometimes you just ended up completely lost.

As it turns out, that’s not quite the problem here. When I integrate a Feynman diagram, most of the time I’m expecting a particular kind of result, called a polylogarithm. If you know that’s the end goal, then you really can just follow the rules, using partial-fractioning to break your integral up into simpler integrations, linear pieces that you can match to the definition of polylogarithms. There are even programs that do this for you: Erik Panzer’s HyperInt is an especially convenient one.


Or it would be convenient, if Maple’s GUI wasn’t cursed…

Still, I wouldn’t have expected Feynman integrals to work particularly well, because they require too many integrations. You need to integrate a certain number of times to define a polylogarithm: for the ones we get out of Feynman diagrams, it’s two integrations for each loop the diagram has. The usual ways we calculate Feynman diagrams lead to a lot more integrations: the systematic method, using something called Symanzik polynomials, involves one integration per particle line in the diagram, which usually adds up to a lot more than two per loop.

When I arrived at the Niels Bohr Institute, I assumed everyone in my field knew about Symanzik polynomials. I was surprised when it turned out Jake Bourjaily hadn’t even heard of them. He was integrating Feynman diagrams by what seemed like a plodding, unsystematic method, taking the intro example from textbooks and just applying it over and over, gaining no benefit from all of the beautiful graph theory that goes into the Symanzik polynomials.

I was even more surprised when his method turned out to be the better one.

Avoid Symanzik polynomials, and you can manage with a lot fewer integrations. Suddenly we were pretty close to the “two integrations per loop” sweet spot, with only one or two “extra” integrations to do.

A few more advantages, and Feynman integrals were actually looking reasonable. The final insight came when we realized that just writing the problem in the right variables made a huge difference.

HyperInt, as I mentioned, tries to break a problem up into simpler integrals. Specifically, it’s trying to make things linear in the integration variable. In order to do this, sometimes it has to factor quadratic polynomials, like so:


Notice the square roots in this formula? Those can make your life a good deal trickier. Once you’ve got irrational functions in the game, HyperInt needs extra instructions for how to handle them, and integration is a lot more cumbersome.

The last insight, then, and the key point in our paper, is to avoid irrational functions. To do that, we use variables that rationalize the square roots.
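
To give a feel for what “rationalizing” a square root means, here is a toy Python sketch. It is a textbook substitution, not the momentum-twistor parametrization from our paper: for the square root of 1 + x^2, writing x = 2t/(1 - t^2) turns the root into the rational function (1 + t^2)/(1 - t^2). The function names are invented for illustration.

```python
from fractions import Fraction


def x_of(t):
    """Toy change of variables x = 2t / (1 - t^2)."""
    return 2 * t / (1 - t * t)


def root_of(t):
    """Claimed rational value of sqrt(1 + x^2) in terms of t."""
    return (1 + t * t) / (1 - t * t)


# Exact check with rational numbers: 1 + x^2 is a perfect square in t.
for t in (Fraction(1, 2), Fraction(1, 3), Fraction(2, 5)):
    x = x_of(t)
    assert 1 + x * x == root_of(t) ** 2
    print(t, x, root_of(t))
```

In the new variable there is no square root left to factor around, so everything stays a ratio of polynomials.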

We get these variables from one of the mainstays of our field, called momentum twistors. These variables are most useful in our favorite theory of N=4 super Yang-Mills, but they’re useful in other contexts too. By parametrizing them with a good “chart”, one with only the minimum number of variables we need to capture the integral, we can rationalize most of the square roots we encounter.

That “most” is going to surprise some people. We rationalized all of the expected square roots, letting us do integrals all the way to four loops in a few cases. But there were some unexpected square roots, and those we couldn’t rationalize.

These unexpected square roots don’t just make our life more complicated: if they stick around in a physically meaningful calculation, they’ll upset a few other conjectures as well. People had expected that these integrals were made of certain kinds of “letters”, organized by a mathematical structure called a cluster algebra. That cluster algebra structure doesn’t have room for square roots, which suggests that it can’t be the full story here.

The integrals that we can do, though, with no surprise square roots? They’re much easier than anyone expected, much easier than with any other method. Rather than running around doing something fancy, we just integrated things the simple, rational way…and it worked!

Tommaso DorigoMiniBoone Confirms Neutrino Anomaly

Neutrinos, the most mysterious and fascinating of all elementary particles, continue to puzzle physicists. 20 years after the experimental verification of a long-debated effect whereby the three neutrino species can "oscillate", changing their nature by turning one into the other as they propagate in vacuum and in matter, the jury is still out to decide what really is the matter with them. And a new result by the MiniBoone collaboration is stirring waters once more.


May 31, 2018

Doug NatelsonComing attractions and short items

Here are a few items of interest. 

I am planning to write a couple of posts about why solids are rigid, and in the course of thinking about this, I made a couple of discoveries:

  • When you google "why are solids rigid?", you find a large number of websites that all have exactly the same wording:  "Solids are rigid because the intermolecular forces of attraction that are present in solids are very strong. The constituent particles of solids cannot move from their positions they can only vibrate from their mean positions."  Note that this is (1) not correct, and (2) also not much of an answer.  It seems that the wording is popular because it's an answer that has appeared on the IIT entrance examinations in India.
  • I came across an absolutely wonderful paper by Victor Weisskopf, "Of Atoms, Mountains, and Stars:  A Study in Qualitative Physics", Science 187, 605-612 (1975).  Here is the only link I could find that might be reachable without a subscription.  It is a great example of "thinking like a physicist", showing how far one can get by starting from simple ideas and using order-of-magnitude estimates.  This seems like something that should be required reading of most undergrad physics majors, and more besides.
In politics-of-science news:

  • There is an amendment pending in the US Congress on the big annual defense bill that has the potential to penalize US researchers who have received any (presently not well-defined) resources from Chinese talent recruitment efforts.  (Russia, Iran, and North Korea are also mentioned, but they're irrelevant here, since they are not running such programs.)  The amendment would allow the DOD to deny these folks research funding.  The idea seems to be that such people are perceived by some as a risk in terms of taking DOD-relevant knowledge and giving China an economic or strategic benefit.  Many major US research universities have been encouraging closer ties with China and Chinese universities in the last 15 years.  Makes you wonder how many people would be affected.
  • The present US administration, according to AP, is apparently about to put in place (June 11?) new limitations on Chinese graduate student visas, for those working in STEM (and especially in fields mentioned explicitly in the Chinese government's big economic plan).   It would make relevant student visas one year in duration.  Given that the current visa renewal process can already barely keep up with the demand, it seems like this could become an enormous headache.  I could go on at length about why I think this is a bad idea.  Given that it's just AP that is reporting this so far, perhaps it won't happen or will be more narrowly construed.  We'll see.

Terence TaoCommutators close to the identity – an update

I have just uploaded to the arXiv my paper “Commutators close to the identity“, submitted to the Journal of Operator Theory. This paper resulted from some progress I made on the problem discussed in this previous post. Recall in that post the following result of Popa: if {D,X \in B(H)} are bounded operators on a Hilbert space {H} whose commutator {[D,X] := DX-XD} is close to the identity in the sense that

\displaystyle  \| [D,X] - I \|_{op} \leq \varepsilon \ \ \ \ \ (1)

for some {\varepsilon > 0}, then one has the lower bound

\displaystyle  \| X \|_{op} \|D \|_{op} \geq \frac{1}{2} \log \frac{1}{\varepsilon}. \ \ \ \ \ (2)

In the other direction, for any {0 < \varepsilon < 1}, there are examples of operators {D,X \in B(H)} obeying (1) such that

\displaystyle  \| X \|_{op} \|D \|_{op} \ll \varepsilon^{-2}. \ \ \ \ \ (3)

In this paper we improve the upper bound to come closer to the lower bound:

Theorem 1 For any {0 < \varepsilon < 1/2}, and any infinite-dimensional {H}, there exist operators {D,X \in B(H)} obeying (1) such that

\displaystyle  \| X \|_{op} \|D \|_{op} \ll \log^{16} \frac{1}{\varepsilon}. \ \ \ \ \ (4)

One can probably improve the exponent {16} somewhat by a modification of the methods, though it does not seem likely that one can lower it all the way to {1} without a substantially new idea. Nevertheless I believe it plausible that the lower bound (2) is close to optimal.

We now sketch the methods of proof. The construction giving (3) proceeded by first identifying {B(H)} with the algebra {M_2(B(H))} of {2 \times 2} matrices that have entries in {B(H)}. It is then possible to find two matrices {D, X \in M_2(B(H))} whose commutator takes the form

\displaystyle  [D,X] = \begin{pmatrix} I & u \\ 0 & I \end{pmatrix}

for some bounded operator {u \in B(H)} (for instance one can take {u} to be an isometry). If one then conjugates {D, X} by the diagonal operator {\mathrm{diag}(\varepsilon,1)}, one can ensure that (1) and (3) both hold.

It is natural to adapt this strategy to {n \times n} matrices {D,X \in M_n(B(H))} rather than {2 \times 2} matrices, where {n} is a parameter at one’s disposal. If one can find matrices {D,X \in M_n(B(H))} that are almost upper triangular (in that only the entries on or above the lower diagonal are non-zero), whose commutator {[D,X]} only differs from the identity in the top right corner, thus

\displaystyle  [D, X] = \begin{pmatrix} I & 0 & 0 & \dots & 0 & S \\ 0 & I & 0 & \dots & 0 & 0 \\ 0 & 0 & I & \dots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \dots & I & 0 \\ 0 & 0 & 0 & \dots & 0 & I \end{pmatrix}.

for some {S}, then by conjugating by a diagonal matrix such as {\mathrm{diag}( \mu^{n-1}, \mu^{n-2}, \dots, 1)} for some {\mu} and optimising in {\mu}, one can improve the bound {\varepsilon^{-2}} in (3) to {O_n( \varepsilon^{-\frac{2}{n-1}} )}; if the bounds in the implied constant in the {O_n(1)} are polynomial in {n}, one can then optimise in {n} to obtain a bound of the form (4) (perhaps with the exponent {16} replaced by a different constant).
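
The final optimisation in {n} can be illustrated numerically. The sketch below minimises a model bound of the shape {n^C \varepsilon^{-\frac{2}{n-1}}} over integers {n \geq 2}, working with logarithms to avoid overflow. The exponent {C} (called `power` in the code) stands in for the unspecified polynomial loss in {n}, and the value used is purely illustrative.

```python
import math


def log_bound(n, eps, power):
    """Logarithm of n**power * eps**(-2/(n-1))."""
    return power * math.log(n) + (2.0 / (n - 1)) * math.log(1.0 / eps)


def best_n(eps, power, n_max=10_000):
    """Integer n in [2, n_max] minimising the model bound."""
    return min(range(2, n_max + 1), key=lambda n: log_bound(n, eps, power))


eps = 1e-100
power = 16  # hypothetical polynomial degree, for illustration only
n_star = best_n(eps, power)
print(n_star, log_bound(n_star, eps, power), log_bound(2, eps, power))
```

In this model the minimiser is comparable to {\log \frac{1}{\varepsilon}}, at which point the factor {\varepsilon^{-\frac{2}{n-1}}} is bounded and only a power of {\log \frac{1}{\varepsilon}} remains.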

The task is then to find almost upper triangular matrices {D, X} whose commutator takes the required form. The lower diagonals of {D,X} must then commute; it took me a while to realise then that one could (usually) conjugate one of the matrices, say {X} by a suitable diagonal matrix, so that the lower diagonal consisted entirely of the identity operator, which would make the other lower diagonal consist of a single operator, say {u}. After a lot of further lengthy experimentation, I eventually realised that one could conjugate {X} further by unipotent upper triangular matrices so that all remaining entries other than those on the far right column vanished. Thus, without too much loss of generality, one can assume that {X} takes the normal form

\displaystyle  X := \begin{pmatrix} 0 & 0 & 0 & \dots & 0 & b_1 \\ I & 0 & 0 & \dots & 0 & b_2 \\ 0 & I & 0 & \dots & 0 & b_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \dots & 0 & b_{n-1} \\ 0 & 0 & 0 & \dots & I & b_n \end{pmatrix}.

\displaystyle  D := \begin{pmatrix} v & I & 0 & \dots & 0 & b_1 u \\ u & v & 2 I & \dots & 0 & b_2 u \\ 0 & u & v & \dots & 0 & b_3 u \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \dots & v & (n-1) I + b_{n-1} u \\ 0 & 0 & 0 & \dots & u & v + b_n u \end{pmatrix}

for some {u,v \in B(H)}, solving the system of equations

\displaystyle  [v, b_i] + [u, b_{i-1}] + i b_{i+1} + b_i [u, b_n] = 0 \ \ \ \ \ (5)

for {i=2,\dots,n-1}, and also

\displaystyle  [v, b_n] + [u, b_{n-1}] + b_n [u, b_n] = n \cdot 1_{B(H)}. \ \ \ \ \ (6)

It turns out to be possible to solve this system of equations by a contraction mapping argument if one takes {u,v} to be a “Hilbert’s hotel” pair of isometries as in the previous post, though the contraction is very slight, leading to polynomial losses in {n} in the implied constant.

There is a further question raised in Popa’s paper which I was unable to resolve. As a special case of one of the main theorems (Theorem 2.1) of that paper, the following result was shown: if {A \in B(H)} obeys the bounds

\displaystyle  \|A \| = O(1)

and

\displaystyle  \| A \| = O( \mathrm{dist}( A, {\bf C} + K(H) )^{2/3} ) \ \ \ \ \ (7)

(where {{\bf C} + K(H)} denotes the space of all operators of the form {\lambda I + T} with {\lambda \in {\bf C}} and {T} compact), then there exist operators {D,X \in B(H)} with {\|D\|, \|X\| = O(1)} such that {A = [D,X]}. (In fact, Popa’s result covers a more general situation in which one is working in a properly infinite {W^*} algebra with non-trivial centre.) We sketch a proof of this result as follows. Suppose that {\mathrm{dist}(A, {\bf C} + K(H)) = \varepsilon} and {\|A\| = O( \varepsilon^{2/3})} for some {0 < \varepsilon \ll 1}. A standard greedy algorithm argument (see this paper of Brown and Pearcy) allows one to find orthonormal vectors {e_n, f_n, g_n} for {n=1,2,\dots} such that for each {n}, one has {A e_n = \varepsilon_n f_n + v_n} for some {\varepsilon_n} comparable to {\varepsilon}, and some {v_n} orthogonal to all of the {e_n,f_n,g_n}. After some conjugation (and a suitable identification of {B(H)} with {M_2(B(H))}), one can thus place {A} in a normal form

\displaystyle  A = \begin{pmatrix} \varepsilon^{2/3} x & \varepsilon v^* \\ \varepsilon^{2/3} y & \varepsilon^{2/3} z \end{pmatrix}

where {v \in B(H)} is an isometry with infinite deficiency, and {x,y,z \in B(H)} have norm {O(1)}. Setting {\varepsilon' := \varepsilon^{1/3}}, it then suffices to solve the commutator equation

\displaystyle  [D,X] = \begin{pmatrix} x & \varepsilon' v^* \\ y & z \end{pmatrix}

with {\|D\|_{op} \|X\|_{op} \ll (\varepsilon')^{-2}}; note the similarity with (3).

By the usual Hilbert’s hotel construction, one can complement {v} with another isometry {u} obeying the “Hilbert’s hotel” identity

\displaystyle  uu^* + vv^* = I

and also {u^* u = v^* v = I}, {u^* v = v^* u = 0}. Proceeding as in the previous post, we can try the ansatz

\displaystyle  D = \begin{pmatrix} \frac{1}{2} u^* & 0 \\ a & \frac{1}{2} u^* - v^* \end{pmatrix}, X = \begin{pmatrix} b & \varepsilon' I \\ c & d \end{pmatrix}

for some operators {a,b,c,d \in B(H)}, leading to the system of equations

\displaystyle  [\frac{1}{2} u^*, b] + [\frac{1}{2} u^* - v^*, d] = x+z

\displaystyle  \varepsilon' a = [\frac{1}{2} u^*, b] - x

\displaystyle  (\frac{1}{2} u^* - v^*) c - \frac{1}{2} c u^* + ab-da = y.

Using the first equation to solve for {b,d}, the second to then solve for {a}, and the third to then solve for {c}, one can obtain matrices {D,X} with the required properties.

Thus far, my attempts to extend this construction to larger matrices with good bounds on {D,X} have been unsuccessful. A model problem would be to express

\displaystyle  \begin{pmatrix} I & 0 & \varepsilon v^* \\ 0 & I & 0 \\ 0 & 0 & I \end{pmatrix}

as a commutator {[D,X]} with {\|D\| \|X\|} significantly smaller than {O(\varepsilon^{-2})}. The construction in my paper achieves something like this, but with {v^*} replaced by a more complicated operator. One would also need variants of this result in which one is allowed to perturb the above operator by an arbitrary finite rank operator of bounded operator norm.

May 30, 2018

Jordan EllenbergMissing LeBron

When I was a postdoc in Princeton I subscribed to the Trenton Times, because I felt it was important to be in touch with what was going on in my local community and not just follow national news.  The only story I remember was one that said “hey, a basketball team from Akron is coming to play against a top prep-school team in Trenton, and they’ve got this kid LeBron James they say is incredible, you should come check it out.”  And I really did think about it, but I was a postdoc, I was trying to write papers, I was busy, too busy to drive into Trenton for a high-school basketball game.

So I guess what I’m trying to say is, yes, subscribe to your local paper because local journalism badly needs financial support, and maybe actually take seriously the local events it alerts you to.


May 29, 2018

Doug NatelsonWhat is tunneling?

I first learned about quantum tunneling from science fiction, specifically a short story by Larry Niven.  The idea is often tossed out there as one of those "quantum is weird and almost magical!" concepts.  It is surely far from our daily experience.

Imagine a car of mass \(m\) rolling along a road toward a small hill.  Let’s make the car and the road ideal – we’re not going to worry about friction or drag from the air or anything like that.   You know from everyday experience that the car will roll up the hill and slow down.  This ideal car’s total energy is conserved, and it has (conventionally) two pieces, the kinetic energy \(p^2/2m\) (where \(p\) is the momentum; here I’m leaving out the rotational contribution of the tires), and the gravitational potential energy, \(mgz\), where \(g\) is the gravitational acceleration and \(z\) is the height of the center of mass above some reference level.  As the car goes up, so does its potential energy, meaning its kinetic energy has to fall.  When the kinetic energy hits zero, the car stops momentarily before starting to roll backward down the hill.  The spot where the car stops is called a classical turning point.  Without some additional contribution to the energy, you won’t ever find the car on the other side of that hill, because the shaded region is “classically forbidden”.  We’d either have to sacrifice conservation of energy, or the car would have to have negative kinetic energy to exist in the forbidden region.  Since the kinetic piece is proportional to \(p^2\), to have negative kinetic energy would require \(p\) to be imaginary (!).

However, we know that the car is really a quantum object, built out of a huge number (more than \(10^{27}\)) of other quantum objects.  The spatial locations of quantum objects can be described with “wavefunctions”, and you need to know a couple of things about these to get a feel for tunneling.  For the ideal case of a free particle with a definite momentum, the wavefunction really looks like a wave with a wavelength \(h/p\), where \(h\) is Planck’s constant.  Because a wave extends throughout all space, the probability of finding the ideal free particle anywhere is equal, in agreement with the oft-quoted uncertainty principle. 

Here’s the essential piece of physics:  In a classically forbidden region, the wavefunction decays exponentially with distance (mathematically equivalent to the wave having an imaginary wavelength), but it can’t change abruptly.  That means that if you solve the problem of a quantum particle incident on a finite (in energy and spatial size) barrier from one side, there is always some probability that the particle will be found on the far side of the classically forbidden region.  
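
For the simplest textbook case, a rectangular barrier of height \(V\) and width \(L\) hit by a particle of mass \(m\) and energy \(E < V\) (these symbols are introduced here purely for illustration), the decay can be sketched as:

\[
\psi(x) \propto e^{-\kappa x}, \qquad \kappa = \frac{\sqrt{2m\,(V - E)}}{\hbar}, \qquad T \sim e^{-2\kappa L} \quad (\kappa L \gg 1),
\]

where \(\hbar = h/2\pi\) and \(T\) is the probability of getting through.  The factor of 2 in the exponent appears because probability is the squared magnitude of the wavefunction.  This is only the leading exponential; the exact result also carries a prefactor that depends on \(E/V\).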

This means that it’s technically possible for the car to “tunnel” through the hillside and end up on the downslope.  I would not recommend this as a transportation strategy, though, because that’s incredibly unlikely.  The more massive the particle, and the more forbidden the region (that is, the more negative the classical kinetic energy of the particle would have to be in the barrier), the faster the exponential decay of the probability of getting through.  For a 1000 kg car trying to tunnel through a 10 cm high speed bump 1 m long, the probability is around exp(-2.7e37).  That kind of number is why quantum tunneling is not an obvious part of your daily existence.  For something much less massive, like an electron, the tunneling probability from, say, a metal tip to a metal surface decays by around a factor of \(e^2\) for every 0.1 nm of tip-surface separation.  It’s that exponential sensitivity to geometry that makes scanning tunneling microscopy possible.
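
Here is a rough numerical check of those two estimates, using the standard thick-barrier approximation that the probability goes like exp(-2κL) with κ = sqrt(2m(V - E))/ħ.  The function names are made up for this sketch.  Two assumptions are mine rather than the post's: the car arrives with negligible kinetic energy, so V - E is about mgh, and the electron sees an effective barrier of 4 eV, a typical order of magnitude for a metal work function.

```python
import math

HBAR = 1.054571817e-34          # reduced Planck constant, J s
G = 9.81                        # gravitational acceleration, m/s^2
M_ELECTRON = 9.1093837015e-31   # electron mass, kg
EV = 1.602176634e-19            # one electron-volt in joules


def decay_constant(mass, barrier_deficit):
    """kappa = sqrt(2 m (V - E)) / hbar in 1/m; barrier_deficit is V - E in joules."""
    return math.sqrt(2.0 * mass * barrier_deficit) / HBAR


def tunneling_exponent(mass, barrier_deficit, width):
    """Return 2 * kappa * L, so the probability is roughly exp(-exponent)."""
    return 2.0 * decay_constant(mass, barrier_deficit) * width


# Car: 1000 kg, speed bump 10 cm high and 1 m long.
# Assumption: negligible kinetic energy, so V - E is about m * g * h.
car_exponent = tunneling_exponent(1000.0, 1000.0 * G * 0.10, 1.0)

# Electron crossing 0.1 nm of vacuum gap.
# Assumption: effective barrier height of 4 eV.
electron_exponent = tunneling_exponent(M_ELECTRON, 4.0 * EV, 0.1e-9)

print(f"car: probability ~ exp(-{car_exponent:.2g})")
print(f"electron: probability falls by ~ exp(-{electron_exponent:.2f}) per 0.1 nm")
```

The exponent scales as the square root of the mass times the barrier height, which is why swapping a car for an electron changes the answer so drastically.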

However, quantum tunneling is very much a part of your life.  Protons can tunnel through the repulsion of their positive charges to bind to each other – that’s what powers the sun.  Electrons routinely tunnel in zillions of chemical reactions going on in your body right now, as well as in the photosynthesis process that drives most plant life. 

On a more technological note, tunneling is a key ingredient in the physics of flash memory.  Flash is based on field-effect transistors, and as I described the other day, transistors are switched on or off depending on the voltage applied to a gate electrode.  Flash storage uses transistors with a “floating gate”, a conductive island surrounded by insulating material, some kind of glassy oxide.  Charge can be parked on that gate or removed from it, and depending on the amount of charge there, the underlying transistor channel is either conductive or not.   How does charge get on or off the island?  By a flavor of tunneling called field emission.  The insulator around the floating gate functions as a potential energy barrier for electrons.  If a big electric field is applied via some other electrodes, the barrier’s shape is distorted, allowing electrons to tunnel through it efficiently.  This is a tricky aspect of flash design.  The barrier has to be high/thick enough that charge stuck on the floating gate can stay there a very long time - you wouldn’t want the bits in your SSD or your flash drive losing their status on the timescale of months, right? - but ideally tunable enough that the data can be rewritten quickly, with low error rates, at low voltages.

May 25, 2018

Matt von HippelCalabi-Yaus for Higgs Phenomenology

less joking title:

You Didn’t Think We’d Stop at Elliptics, Did You?

When calculating scattering amplitudes, I like to work with polylogarithms. They’re a very well-understood type of mathematical function, and thus pretty easy to work with.

Even for our favorite theory of N=4 super Yang-Mills, though, they’re not the whole story. You need other types of functions to represent amplitudes, elliptic polylogarithms that are only just beginning to be properly understood. We had our own modest contribution to that topic last year.

You can think of the difference between these functions in terms of more and more complicated curves. Polylogarithms just need circles or spheres, while elliptic polylogarithms can be described with a torus.

A torus is far from the most complicated curve you can think of, though.

String theorists have done a lot of research into complicated curves, in particular ones with a property called Calabi-Yau. They were looking for ways to curl up six or seven extra dimensions, to get down to the four we experience. They wanted to find ways of curling that preserved some supersymmetry, in the hope that they could use it to predict new particles, and it turned out that Calabi-Yau was the condition they needed.

That hope, for the most part, didn’t pan out. There were too many Calabi-Yaus to check, and the LHC hasn’t seen any supersymmetric particles. Today, “string phenomenologists”, who try to use string theory to predict new particles, are a relatively small branch of the field.

This research did, however, have lasting impact: due to string theorists’ interest, there are huge databases of Calabi-Yau curves, and fruitful dialogues with mathematicians about classifying them.

This has proven quite convenient for us, as we happen to have some Calabi-Yaus to classify.


Our midnight train going anywhere…in the space of Calabi-Yaus

We call Feynman diagrams like the one above “traintrack integrals”. With two loops, it’s the elliptic integral we calculated last year. With three, though, you need a type of Calabi-Yau curve called a K3. With four loops, it looks like you start needing Calabi-Yau three-folds, the type of space used to compactify string theory to four dimensions.

“We” in this case is myself, Jacob Bourjaily, Andrew McLeod, Matthias Wilhelm, and Yang-Hui He, a Calabi-Yau expert we brought on to help us classify these things. Our new paper investigates these integrals, and the more and more complicated curves needed to compute them.

Calabi-Yaus had been seen in amplitudes before, in diagrams called “sunrise” or “banana” integrals. Our example shows that they should occur much more broadly. “Traintrack” integrals appear in our favorite N=4 super Yang-Mills theory, but they also appear in theories involving just scalar fields, like the Higgs boson. For enough loops and particles, we’re going to need more and more complicated functions, not just the polylogarithms and elliptic polylogarithms that people understand.

(And to be clear, no, nobody needs to do this calculation for Higgs bosons in practice. This diagram would calculate the result of two Higgs bosons colliding and producing ten or more Higgs bosons, all at energies so high you can ignore their mass, which is…not exactly relevant for current collider phenomenology. Still, the title proved too tempting to resist.)

Is there a way to understand traintrack integrals like we understand polylogarithms? What kinds of Calabi-Yaus do they pick out, in the vast space of these curves? We’d love to find out. For the moment, we just wanted to remind all the people excited about elliptic polylogarithms that there’s quite a bit more strangeness to find, even if we don’t leave the tracks.

May 21, 2018

Andrew JaffeLeon Lucy, R.I.P.

I have the unfortunate duty of using this blog to announce the death a couple of weeks ago of Professor Leon B Lucy, who had been a Visiting Professor working here at Imperial College from 1998.

Leon got his PhD in the early 1960s at the University of Manchester, and after postdoctoral positions in Europe and the US, worked at Columbia University and the European Southern Observatory over the years, before coming to Imperial. He made significant contributions to the study of the evolution of stars, understanding in particular how they lose mass over the course of their evolution, and how very close binary stars interact and evolve inside their common envelope of hot gas.

Perhaps most importantly, early in his career Leon realised how useful computers could be in astrophysics. He made two major methodological contributions to astrophysical simulations. First, he realised that by simulating randomised trajectories of single particles, he could take into account more physical processes that occur inside stars. This is now called “Monte Carlo Radiative Transfer” (scientists often use the term “Monte Carlo” — after the European gambling capital — for techniques using random numbers). He also invented the technique now called smoothed-particle hydrodynamics which models gases and fluids as aggregates of pseudo-particles, now applied to models of stars, galaxies, and the large scale structure of the Universe, as well as many uses outside of astrophysics.

Leon’s other major numerical contributions comprise advanced techniques for interpreting the complicated astronomical data we get from our telescopes. In this realm, he was most famous for developing the methods, now known as Lucy-Richardson deconvolution, that were used for correcting the distorted images from the Hubble Space Telescope, before NASA was able to send a team of astronauts to install correcting lenses in the early 1990s.

For all of this work Leon was awarded the Gold Medal of the Royal Astronomical Society in 2000. Since then, Leon kept working on data analysis and stellar astrophysics — even during his illness, he asked me to help organise the submission and editing of what turned out to be his final papers, on extracting information on binary-star orbits and (a subject dear to my heart) the statistics of testing scientific models.

Until the end of last year, Leon was a regular presence here at Imperial, always ready to contribute an occasionally curmudgeonly but always insightful comment on the science (and sociology) of nearly any topic in astrophysics. We hope that we will be able to appropriately memorialise his life and work here at Imperial and elsewhere. He is survived by his wife and daughter. He will be missed.

John PreskillA quantum podcast

A few months ago I sat down with Craig Cannon of Y Combinator for a discussion about quantum technology and other things. A lightly edited version was published this week on the Y Combinator blog. The video is also on YouTube:

If you’re in a hurry, or can’t stand the sound of my voice, you might prefer to read the transcript, which is appended below. Only by watching the video, however, can you follow the waving of my hands.

I grabbed the transcript from the Y Combinator blog post, so you can read it there if you prefer, but I’ve corrected some of the typos. (There are a few references to questions and comments that were edited out, but that shouldn’t cause too much confusion.)

Here we go:

Craig Cannon [00:00:00] – Hey, how’s it going? This is Craig Cannon, and you’re listening to Y Combinator’s Podcast. Today’s episode is with John Preskill. John’s a theoretical physicist and the Richard P. Feynman Professor of Theoretical Physics at Caltech. He once won a bet with Stephen Hawking and he writes that it made him briefly almost famous. Basically, what happened is John and Kip Thorne bet that singularities could exist outside of black holes. After six years, Hawking conceded. He said that they were possible in very special, “non-generic conditions.” I’ll link up some more details to that in the description. In this episode, we cover what John’s been focusing on for years, which is quantum information, quantum computing, and quantum error correction. Alright, here we go. What was the revelation that made scientists and physicists think that a quantum computer could exist?

John Preskill [00:00:54] – It’s not obvious. A lot of people thought it couldn’t. The idea that a quantum computer would be powerful was emphasized over 30 years ago by Richard Feynman, the Caltech physicist. It was interesting how he came to that realization. Feynman was interested in computation his whole life. He had been involved during the war in Los Alamos. He was the head of the computation group. He was the guy who fixed the little mechanical calculators, and he had a whole crew of people who were calculating, and he figured out how to flow the work from one computer to another. All that kind of stuff. As computing technology started to evolve, he followed that. In the 1970s, a particle physicist like Feynman, that’s my background too, got really interested in using computers to study the properties of elementary particles like the quarks inside a nucleus, you know? We know a proton isn’t really a fundamental object. It’s got little beans rattling around inside, but they’re quantum beans. Gell-Mann, who’s good at names, called them quarks.

John Preskill [00:02:17] – Now we’ve had a theory since the 1970s of how quarks behave, and so in principle, you know everything about the theory, you can compute everything, but you can’t because it’s just too hard. People started to simulate that physics with digital computers in the ’70s, and there were some things that they could successfully compute, and some things they couldn’t because it was just too hard. The resources required, the memory, the time were out of reach. Feynman, in the early ’80s said nature is quantum mechanical damn it, so if you want a simulation of nature, it should be quantum mechanical. You should use a quantum system to behave like another quantum system. At the time, he called it a universal quantum simulator.

John Preskill [00:03:02] – Now we call it a quantum computer. The idea caught on about 10 years later when Peter Shor made the suggestion that we could solve problems which don’t seem to have anything to do with physics, which are really things about numbers like finding the prime factors of a big integer. That caused a lot of excitement, in part because the implications for cryptography are a bit disturbing. But then physicists — good physicists — started to consider, can we really build this thing? Some concluded and argued fairly cogently that no, you couldn’t because of this difficulty that it’s so hard to isolate systems from the environment well enough for them to behave quantumly. It took a few years for that to sort out at the theoretical level. In the mid ’90s we developed a theory called quantum error correction. It’s about how to encode the quantum state that you’d like to protect in such a clever way that even if there are some interactions with the environment that you can’t control, it still stays robust.

John Preskill [00:04:17] – At first, that was just kind of a theorist’s fantasy — it was a little too far ahead of the technology. But 20 years later, the technology is catching up, and now this idea of quantum error correction has become something you can do in the lab.

Craig Cannon [00:04:31] – How does quantum error correction work? I’ve seen a bunch of diagrams, so maybe this is difficult to explain, but how would you explain it?

John Preskill [00:04:39] – Well, I would explain it this way. I don’t think I’ve said the word entanglement yet, have I?

Craig Cannon [00:04:43] – Well, I haven’t been checking off all the Bingo words yet.

John Preskill [00:04:45] – Okay, so let’s talk about entanglement because it’s part of the answer to your question, which I’m still not done answering, what is quantum physics? What do we mean by entanglement? It’s really the characteristic way, maybe the most important way that we know in which quantum is different from ordinary stuff, from classical. Now what does it mean, entanglement? It means that you can have a physical system which has many parts, which have interacted with one another, so it’s in kind of a complex correlated state of all those parts, and when you look at the parts one at a time it doesn’t tell you anything about the state of the whole thing. The whole thing’s in some definite state — there’s information stored in it — and now you’d like to access that information … Let me be a little more concrete. Suppose it’s a book.

John Preskill [00:05:40] – Okay? It’s a book, it’s 100 pages long. If it’s an ordinary book, 100 people could each take a page, and read it, they know what’s on that page, and then they could get together and talk, and now they’d know everything that’s in the book, right? But if it’s a quantum book written in qubits where these pages are very highly entangled, there’s still a lot of information in the book, but you can’t read it the way I just described. You can look at the pages one at a time, but a single page when you look at it just gives you random gibberish. It doesn’t reveal anything about the content of the book. Why is that? There’s information in the book, but it’s not stored in the individual pages. It’s encoded almost entirely in how those pages are correlated with one another. That’s what we mean by quantum entanglement: Information stored in those correlations which you can’t see when you look at the parts one at a time. You asked about quantum error correction?

John Preskill [00:06:39] – What’s the basic idea? It’s to take advantage of that property of entanglement. Because let’s say you have a system of many particles. The environment is kind of kicking them around, it’s interacting with them. You can’t really completely turn off those interactions no matter how hard you try, but suppose we’ve encoded the information in entanglement. So, say, if you look at one atom, it’s not telling you anything about the information you’re trying to protect. The environment isn’t learning anything when it looks at the atoms one at a time.

John Preskill [00:07:15] – This is kind of the key thing — that what makes quantum information so fragile is that when you look at it, you disturb it. This ordinary water bottle isn’t like that. Let’s say we knew it was either here or here, and we didn’t know. I would look at it, I’d find out it’s here. I was ignorant of where it was to start with, and now I know. With a quantum system, when you look at it, you really change the state. There’s no way to avoid that. So if the environment is looking at it in the sense that information is leaking out to the environment, that’s going to mess it up. We have to encode the information so the environment, so to speak, can’t find out anything about what the information is, and that’s the idea of quantum error correction. If we encode it in entanglement, the environment is looking at the parts one at a time, but it doesn’t find out what the protected information is.
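
The “quantum book” picture can be checked in the smallest possible case: two qubits standing in for two pages. The sketch below is an illustration only, not an error-correcting code (two qubits are too few to correct anything), and the function name is invented for it. It stores one bit in the relative sign of an entangled pair and shows that looking at the first qubit alone gives the same 50/50 description whichever bit was stored.

```python
import math


def reduced_density_matrix_first_qubit(state):
    """Trace out the second qubit of a two-qubit pure state.

    `state` lists the amplitudes of |00>, |01>, |10>, |11> in that order.
    Returns the 2x2 density matrix describing the first qubit on its own.
    """
    rho = [[0j, 0j], [0j, 0j]]
    for a in range(2):
        for a_prime in range(2):
            for b in range(2):
                left = complex(state[2 * a + b])
                right = complex(state[2 * a_prime + b])
                rho[a][a_prime] += left * right.conjugate()
    return rho


s = 1 / math.sqrt(2)
state_zero = [s, 0, 0, s]    # stores "0" as (|00> + |11>) / sqrt(2)
state_one = [s, 0, 0, -s]    # stores "1" as (|00> - |11>) / sqrt(2)

rho_zero = reduced_density_matrix_first_qubit(state_zero)
rho_one = reduced_density_matrix_first_qubit(state_one)

# The two global states are perfectly distinguishable (zero overlap) ...
overlap = sum(complex(x).conjugate() * y for x, y in zip(state_zero, state_one))
print("overlap of the two encoded states:", abs(overlap))

# ... yet a single "page" looks identical in both cases.
print("first qubit alone, bit 0:", rho_zero)
print("first qubit alone, bit 1:", rho_one)
```

Both single-qubit descriptions come out as half the identity matrix, which is the “random gibberish” of a single page: the stored bit lives only in the correlation between the two qubits.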

Craig Cannon [00:08:06] – In other words, it’s kind of measuring probability the whole way along, right?

John Preskill [00:08:12] – I’m not sure what you mean by that.

Craig Cannon [00:08:15] – Is it Grover’s algorithm that was as quantum bits roll through, go through gates– The probability is determined of what information’s being passed through? What’s being computed?

John Preskill [00:08:30] – Grover’s algorithm is a way of sort of doing an exhaustive search through many possibilities. Let’s say I’m trying to solve some problem like a famous one is the traveling salesman problem. I’ve told you what the distances are between all the pairs of cities, and now I want to find the shortest route I can that visits them all. That’s a really hard problem. It’s still hard for a quantum computer, but not quite as hard because there’s a way of solving it, which is to try all the different routes, and measure how long they are, and then find the one that’s shortest, and you’ve solved the problem. The reason it’s so hard to solve is there’s such a vast number of possible routes. Now what Grover’s algorithm does is it speeds up that exhaustive search.

John Preskill [00:09:29] – In practice, it’s not that big a deal. What it means is that if you had the same processing speed, you can handle about twice as many cities before the problem becomes too hard to solve, as you could if you were using a classical processor. As far as what’s quantum about Grover, it takes advantage of the property in quantum physics that probabilities … tell me if I’m getting too inside baseball …

Craig Cannon [00:10:03] – No, no, this is perfect.

John Preskill [00:10:05] – That probabilities are the squares of amplitudes. This is interference. Again, this is another part of the answer. Well, we can spend the whole hour answering the question, what is quantum physics? Another essential part of it is what we call interference, and this is really crucial for understanding how quantum computing works. Ordinarily, probabilities add. If you know the probability of one alternative, and you know the probability of another, then you can add those together and find the probability that one or the other occurred. It’s not like that in quantum physics. The famous example is the double slit interference experiment. I’m sending electrons, let’s say — it could be basketballs, but it’s an easier experiment to do with electrons —

John Preskill [00:11:02] – at a screen, and there are two holes in the screen. You can try to detect the electron on the other side of the screen, and when you do that experiment many times, you can plot a graph showing where the electron was detected in each run, or make a histogram of all the different outcomes. And the graph wiggles, okay? If you could say there’s some probability of going through the first hole, and some probability of going through the second, and each time you detected it, it went through either one or the other, there’d be no wiggles in that graph. It’s the interference that makes it wiggle. The essence of the interference is that nobody can tell you whether it went through the first slit or the second slit. The question is sort of inadmissible. This interference then occurs when we can add up these different alternatives in a way which is different from what we’re used to. It’s not right to say that the electron was detected at this point because it had some probability of going through the first hole, and some probability of going through the second

John Preskill [00:12:23] – and we add those probabilities up. That doesn’t give the right answer. The different alternatives can interfere. This is really important for quantum computing because what we’re trying to do is enhance the probability of finding, or reduce the time it takes to find, the solution to a problem, and this interference can work to our advantage. We want to have, when we’re doing our search, we want to have a higher chance of getting the right answer, and a lower chance of getting the wrong answer. If the different wrong answers can interfere, they can cancel one another out, and that enhances the probability of getting the right answer. Sorry it’s such a long-winded answer, but this is how Grover’s algorithm works.
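
The difference between adding probabilities and adding amplitudes in the two-slit experiment can be sketched in a few lines. The geometry (wavelength, slit separation, screen distance, all in arbitrary units) and the function names are illustrative assumptions, and the outputs are relative detection rates rather than normalized probabilities.

```python
import cmath
import math


def slit_amplitudes(x, wavelength=1.0, slit_separation=5.0, screen_distance=100.0):
    """Amplitudes for reaching screen position x via slit 1 and via slit 2.

    Each path contributes a unit-magnitude phase factor exp(2*pi*i*r/wavelength),
    where r is the straight-line distance from that slit to the point x.
    """
    r1 = math.hypot(screen_distance, x - slit_separation / 2)
    r2 = math.hypot(screen_distance, x + slit_separation / 2)
    k = 2 * math.pi / wavelength
    return cmath.exp(1j * k * r1), cmath.exp(1j * k * r2)


def add_probabilities(x):
    """'It went through one slit or the other': no wiggles."""
    a1, a2 = slit_amplitudes(x)
    return 0.5 * (abs(a1) ** 2 + abs(a2) ** 2)


def add_amplitudes(x):
    """Quantum rule: add the amplitudes first, then square."""
    a1, a2 = slit_amplitudes(x)
    return 0.5 * abs(a1 + a2) ** 2


for x in range(0, 21, 2):
    print(f"x={x:3d}  add probabilities: {add_probabilities(x):.2f}  "
          f"add amplitudes: {add_amplitudes(x):.2f}")
```

The first column stays flat while the second swings between roughly 0 and 2: those swings are the wiggles in the histogram, and the places where the rate drops to nearly zero are where the two alternatives cancel.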

John Preskill [00:13:17] – It can speed up exhaustive search by taking advantage of that interference phenomenon.
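
As a minimal sketch of that cancellation, here is a classical simulation of the textbook Grover iteration on an unstructured list with one marked item. The function name and the choice of 1024 items are illustrative. The simulation tracks every amplitude explicitly, so it is not itself fast; it only shows how the wrong answers' amplitudes shrink while the right one grows.

```python
import math


def grover_search(num_items, marked_index):
    """Simulate Grover's algorithm with a real-valued state vector.

    Returns the number of oracle calls used and the final probability
    of each item being found on measurement.
    """
    amplitude = [1 / math.sqrt(num_items)] * num_items   # uniform superposition
    iterations = math.floor(math.pi / 4 * math.sqrt(num_items))
    for _ in range(iterations):
        # Oracle: flip the sign of the marked item's amplitude.
        amplitude[marked_index] = -amplitude[marked_index]
        # Diffusion: reflect every amplitude about the mean amplitude.
        mean = sum(amplitude) / num_items
        amplitude = [2 * mean - a for a in amplitude]
    probabilities = [a * a for a in amplitude]
    return iterations, probabilities


iterations, probabilities = grover_search(num_items=1024, marked_index=123)
print(f"{iterations} oracle calls, success probability {probabilities[123]:.4f}")
```

A classical exhaustive search over 1024 items needs about 512 lookups on average, while the number of Grover iterations grows only as the square root of the list size. That is a quadratic speedup, not an exponential one.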

Craig Cannon [00:13:20] – Well this is kind of one of the underlying questions among many of the questions from Twitter. You’ve hit our record for most questions asked. Basically, many people are wondering what quantum computers really will do if and when it becomes a reality that they outperform classical computers. What are they going to be really good at?

John Preskill [00:13:44] – Well, you know what? I’m not really sure. If you look at the history of technology, it would be hubris to expect me to know. It’s a whole different way of dealing with information. Quantum information is not just … a quantum computer is not just a faster way of computing. It deals with information in a completely new way because of this interference phenomenon, because of entanglement that we’ve talked about. We have limited vision when it comes to predicting decades out what the impact will be of an entirely new way of doing things. Information processing, in particular. I mean you know this well. We go back to the 1960s, and people are starting to put a few transistors on a chip. Where is that going to lead? Nobody knew.

Craig Cannon [00:14:44] – Even early days of the internet.

John Preskill [00:14:45] – Yeah, good example.

Craig Cannon [00:14:46] – Even the first browser. No one really knew what anyone was going to do with it. It makes total sense.

John Preskill [00:14:52] – For good or ill. Yeah. But we have some ideas, you know? I think … why are we confident there will be some transformative effect on society? Of the things we know about, and I emphasize again, probably the most important ones are things we haven’t thought of when it comes to applications of quantum computing, the ones which will affect everyday life, I think, are better methods for understanding and inventing new materials, new chemical compounds. Things like that can be really important. If you find a better way of capturing carbon by designing a better catalyst, or you can design pharmaceuticals that have new effects, materials that have unusual properties. These are quantum physics problems because those properties of the molecule or the material really have to do with the underlying quantum behavior of the particles, and we don’t have a good way for solving such problems or predicting that behavior using ordinary digital computers. That’s what a quantum computer is good at. It’s good — but maybe not the only thing it’s good at — one thing it should certainly be good at is telling us quantitatively how quantum systems behave. In the two contexts I just mentioned, there’s little question that there will be practical impact of that.

Craig Cannon [00:16:37] – It’s not just doing the traveling salesman problem through the table of elements for why it can find these compounds.

John Preskill [00:16:49] – No. If it were, that wouldn’t be very efficient.

Craig Cannon [00:16:52] – Exactly.

John Preskill [00:16:53] – Yeah. No, it’s much trickier than that. Like I said, the exhaustive search, though conceptually it’s really interesting that quantum can speed it up because of interference, from a practical point of view it may not be that big a deal. It means that, well like I said, in the same amount of time you can solve an instance which is twice as big of the problem. What we really get excited about are the so-called exponential speed ups. That was why Shor’s algorithm was exciting in 1994, because factoring large numbers was a problem that had been studied by smart people for a long time, and on that basis, the fact that there weren’t any fast ways of solving it was pretty good evidence it’s a hard problem. Actually, we don’t know how to prove that from first principles. Maybe somebody will come along one day and figure out how to solve factoring very fast on a digital computer. It doesn’t seem very likely because people have been trying for so long to solve problems like that, and it’s just intractable with ordinary computers. You could say the same thing about these quantum physics problems. Maybe some brilliant graduate student is going to drop a paper on the arXiv tomorrow which will say, “Here, I solved quantum chemistry, and I can do it on a digital computer.” But we don’t think that’s very likely because we’ve been working pretty hard on these problems for decades and they seem to be really hard. Those cases, like these number theoretic problems,

John Preskill [00:18:40] – which have cryptological implications, and tasks for simulating the behavior of quantum systems, we’re pretty sure those are hard problems classically, and we’re pretty sure quantum computers … I mean we have algorithms that have been proposed, but which we can’t really run currently because our quantum computers aren’t big enough on the scale that’s needed to solve problems people really care about.

Craig Cannon [00:19:09] – Maybe we should jump to one of the questions from Twitter which is related to that. Travis Scholten (@Travis_Sch) asked, what are the most pressing problems in physics, let’s say specifically around quantum computers, that you think substantial progress ought to be made in to move the field forward?

John Preskill [00:19:27] – I know Travis. He was an undergrad here. How you doing, Travis? The problems that we need to solve to make quantum computing closer to realization at the level that would solve problems people care about? Well, let’s go over where we are now.

Craig Cannon [00:19:50] – Yeah, definitely.

John Preskill [00:19:51] – People have been working on quantum hardware for 20 years, working hard, and there are a number of different approaches to building the hardware, and nobody really knows which is going to be the best. I think we’re far from collapsing to one approach which everybody agrees has the best long-term prospects for scalability. And so it’s important that a lot of different types of hardware are being pursued. We can come back to what some of the different approaches are later. Where are we now? We think in a couple of years we’ll have devices with about 50 qubits to 100, and we’ll be able to control them pretty well. That’s an interesting range because even though it’s only 50 to 100 qubits, doesn’t sound like that big a deal, but that’s already too many to simulate with a digital computer, even with the most powerful supercomputers today. From that point of view, these relatively small, near-term quantum computers which we’ll be fooling around with over the next five years or so, are doing something that’s kind of super-classical.
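
One way to see why 50 qubits is already a lot: a brute-force classical simulation stores one complex amplitude per basis state, and there are 2^n basis states. The sketch below assumes 16 bytes per amplitude (two 64-bit floats), and the function name is illustrative. Brute-force state-vector storage is only one simulation strategy, and cleverer methods exist for special circuits, so treat this as a rough scale rather than a hard limit.

```python
def state_vector_bytes(num_qubits, bytes_per_amplitude=16):
    """Memory needed to store all 2**n complex amplitudes of an n-qubit state."""
    return (2 ** num_qubits) * bytes_per_amplitude


for n in (30, 40, 50, 100):
    pebibytes = state_vector_bytes(n) / 2 ** 50
    print(f"{n:3d} qubits: {pebibytes:.3g} PiB")
```

Each added qubit doubles the requirement, so 50 qubits needs 16 PiB under these assumptions and 100 qubits is far beyond any conceivable memory.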

John Preskill [00:21:14] – At least, we don’t know how to do exactly the same things with ordinary computers. Now that doesn’t mean they’ll be able to do anything that’s practically important, but we’re going to try. We’re going to try, and there are ideas about things we’ll try out, including baby versions of these problems in chemistry, and materials, and ways of speeding up optimization problems. Nobody knows how well those things are going to work at these small scales. Part of the reason is not just the number of qubits is small, but they’re also not perfect. We can perform elementary operations on pairs of qubits, which we call quantum gates like the gates in ordinary logic. But they have an error rate a little bit below an error every 100 gates. If you have a circuit with 1000 gates, that’s a lot of noise.
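
A crude way to see how gate errors pile up is to assume each gate fails independently with the same probability and that a single failure spoils the run. Both are simplifying assumptions of this sketch, and the function name and the 1000-gate circuit size are illustrative.

```python
def no_error_probability(gate_error_rate, num_gates):
    """Chance that every gate in the circuit succeeds, assuming independent errors."""
    return (1.0 - gate_error_rate) ** num_gates


for rate in (1e-2, 1e-3, 1e-4):
    p = no_error_probability(rate, 1000)
    print(f"gate error rate {rate:g}: P(no error in 1000 gates) = {p:.3g}")
```

Under these assumptions the chance of a clean run falls off exponentially with circuit size, so each order-of-magnitude improvement in gate error buys roughly an order of magnitude more gates.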

Craig Cannon [00:22:18] – Exactly. Does, for instance, a 100-qubit quantum computer really mean a 100-qubit quantum computer, or do you need a certain amount of backup going on?

John Preskill [00:22:29] – In the near term, we’re going to be trying out, and probably we have the best hopes for, kind of hybrid classical-quantum methods with some kind of classical feedback. You try to do something on the quantum computer, you make a measurement that gives you some information, then you change the way you did it a little bit, and try to converge on some better answer. That’s one possible way of addressing optimization that might be faster on a quantum computer. But I just wanted to emphasize that the number of qubits isn’t the only metric. How good they are, and in particular, the reliability of the gates, how well we can perform them … that’s equally important. Anyway, coming back to Travis’ question, there are lots of things that we’d like to be able to do better. But just having much better qubits would be huge, right? If you … more or less, with the technology we have now, you can have a gate error rate of a few parts in 1,000, you know? If you can improve that by orders of magnitude, then obviously, you could run bigger circuits. That would be very enabling.

John Preskill [00:23:58] – Even if you stick with 100 qubits just by having a circuit with more depth, more layers of gates, that increases the range of what you could do. That’s always going to be important. Because, I mean look at how crappy that is. A gate error rate, even if it’s one part in 1,000, that’s pretty lousy compared to if you look at where–

Craig Cannon [00:24:21] – Your phone has a billion transistors in it. Something like that, and 0%–

John Preskill [00:24:27] – You don’t worry about the … it’s gotten to the point where there is some error protection built in at the hardware level in a processor, because I mean, we’re doing these crazy things like going down to the 11 nanometer scale for features on a chip.

Craig Cannon [00:24:45] – How are folks trying to deal with interference right now?

John Preskill [00:24:50] – You mean, what types of devices? Yeah, so that’s interesting too because there are a range of different ways to do it. I mentioned that we could store information, we could make a qubit out of a single atom, for example. That’s one approach. You have to control a whole bunch of atoms and get them to interact with one another. One way of doing that is with what we call trapped ions. That means the atoms have electrical charges. That’s a good thing because then you could control them with electric fields. You could hold them in a trap, and you can isolate them, like I said, in a very high vacuum so they’re not interacting too much with other things in the laboratory, including stray electric and magnetic fields. But that’s not enough because you got to get them to talk to one another. You got to get them to interact. We have this set of desiderata, which are kind of in tension with one another. On the one hand, we want to isolate the qubits very well. On the other hand, we want to control them from the outside and get them to do what we want them to do, and eventually, we want to read them out. You have to be able to read out the result of the computation. But the key thing is the control. You could have two of those qubits in your device interact with one another in a specified way, and to do that very accurately you have to have some kind of bus that gets the two to talk to one another.

John Preskill [00:26:23] – The way they do that in an ion trap is pretty interesting. It’s by using lasers and controlling how the ions vibrate in the trap. With a laser, you kind of excite wiggles of the ion, and then by determining whether the ions are wiggling or not, you can go address another ion, and that way you can do a two-qubit interaction. You can do that pretty well. Another way is really completely different. What I just described was encoding information at the one atom level. But another way is to use superconductivity, meaning circuits in which electric current flows without any dissipation. In that case, you have a lot of freedom to sort of engineer the circuits to behave in a quantum way. There are many nuances there, but the key thing is that you can encode information now in a system that might involve the collective motion of billions of electrons, and yet you can control it as though it were a single atom. I mean, here’s one oversimplified way of thinking about it.

John Preskill [00:27:42] – Suppose you have a little loop of wire, and there’s current flowing in the loop. It’s a superconducting wire so it just keeps flowing. Normally, there’d be resistance, which would dissipate that as heat, but not for the superconducting circuit, which of course, has to be kept very cold so it stays superconducting. But you can imagine in this little loop that the current is either circulating clockwise or counterclockwise. That’s a way of encoding information. It could also be both at once, and that’s what makes it a qubit.

Craig Cannon [00:28:14] – Right.

John Preskill [00:28:15] – And so in that case, even though it involves lots of particles, the magic is that you can control that system extremely well. I mentioned individual electrons. That’s another approach. Put the qubit in the spin of a single electron.

Craig Cannon [00:28:32] – You also mentioned better qubits. What did you mean by that?

John Preskill [00:28:35] – Well, what I really care about is how well I can do the gates. There’s a whole other approach, which is motivated by the desire to have much, much better control over the quantum information than we do in those systems that I mentioned so far, superconducting circuits and trapped ions. That’s actually what Microsoft is pushing very hard. We call it topological quantum computing. Topological is a word physicists and mathematicians love. It means, well, we’ll come back to what it means. Anyway, let me just tell you what they’re trying to do. They’re trying to make a much, much better qubit, which they can control much, much better using a completely different hardware approach.

Craig Cannon [00:29:30] – Okay.

John Preskill [00:29:32] – It’s very ambitious because at this point, it’s not even clear they have a single qubit, but if that approach is successful, and it’s making progress, we will see a validated qubit of this type soon. Maybe next year. Nobody really knows where it goes from there, but suppose it’s the case that you could do a two-qubit gate with an error rate of one in a million instead of one in 1,000. That would be huge. Now, scaling all these technologies up, is really challenging from a number of perspectives, including just the control engineering.

Craig Cannon [00:30:17] – How are they doing it or attempting to do it?

John Preskill [00:30:21] – You know, you could ask, where did all this progress come from over 20 years, or so? For example, with the superconducting circuits, a sort of crucial measure is what we call the coherence time of the qubit, which, roughly speaking, means how long it holds up before interactions with the outside world disturb it. The longer the coherence time, the better. The rate of what we call decoherence is essentially how much it’s getting buffeted around by outside influences. For the superconducting circuits, those coherence times have increased about a factor of 10 every three years, going back 15 years or so.

Craig Cannon [00:31:06] – Wow.

John Preskill [00:31:07] – Now, it won’t necessarily go on like that indefinitely, but achieving that type of progress took better materials, better fabrication, better control. The way you control these things is with microwave circuitry. Not that different from the kind of things that are going on in communication devices. All those things are important, but going forward, the control is really the critical thing. Coherence times are already getting pretty long, I mean having them longer is certainly good. But the key thing is to get two qubits to interact just the way you want them to. Even if there is, now I keep saying the key thing is the environment, it’s not the only key thing, right? Because you have some qubit, like if you think about that electron spin, one way of saying it is I said it can be both up and down at the same time. Well, there’s a simpler way of saying that. It might not point either up or down. It might point some other way. But there really is a continuum of ways it could point. That’s not like a bit. See, it’s much easier to stabilize a bit because it’s got two states.

John Preskill [00:32:31] – But if it can kind of wander around in the space of possible configurations for a qubit, that makes it much harder to control. People have gotten better at that, a lot better at that in the last few years.

Craig Cannon [00:32:44] – Interesting. Joshua Harmon asked, what engineering strategy for quantum computers do you think has the most promise?

John Preskill [00:32:53] – Yeah, so I mentioned some of these different approaches, and I guess I’ll interpret the question as, which one is the winning horse? I know better than to answer that question! They’re all interesting. For the near term, the most advanced are superconducting circuits and trapped ions, which is why I mentioned those first. I think that will remain true over the next five to 10 years. Other technologies have the potential — like these topologically protected qubits — to surpass those, but it’s not going to happen real soon. I kind of like superconducting circuits because there’s so much phase space of things you can do with them. Of ways you can engineer and configure them, and imagine scaling them up.

John Preskill [00:33:54] – They have the advantage of being faster. The cycle time, time to do a gate, is faster than with the trapped ions. Just the basic physics of the interactions is different. In the long term, those electron spins could catapult ahead of these other things. That’s something that you can naturally do in silicon, and it’s potentially easy to integrate with silicon technology. Right now, the qubits and gates aren’t as good as the other technologies, but that can change. I mean, from a theorist’s perspective, this topological approach is very appealing. We can imagine it takes off maybe 10 years from now and it becomes the leader. I think it’s important to emphasize we don’t really know what’s going to scale the best.

Craig Cannon [00:34:50] – Right. And are there multiple attempts being made around programming quantum computers?

John Preskill [00:34:55] – Yeah. I mean, some of these companies that are working on quantum technology now, which include well-known big players like IBM, and Google, and Microsoft and Intel, but also a lot of startups now, are trying to encompass the full stack, so they’re interested in the hardware, and the fabrication, and the control technology. But also, the software, the applications, the user interface. All those things are certainly going to be important eventually.

Craig Cannon [00:35:38] – Yeah, they’re pushing it almost to like an AWS layer. Where you interact with your quantum computer in a server farm and you don’t even touch it.

John Preskill [00:35:49] – That’s how it will be in the near term. Most of us won’t have a quantum computer sitting on our desktop, or in our pocket. Maybe someday. In the near term, it’ll be in the cloud, and you’ll be able to run applications on it by some kind of web interface. Ideally, that should be designed so the user doesn’t have to know anything about quantum physics in order to program or use it, and I think that’s part of what some of these companies are moving toward.

Craig Cannon [00:36:24] – Do you think it will get to the level where it’s in your pocket? How do you deal with that when you’re below one kelvin?

John Preskill [00:36:32] – Well, if it’s in your pocket, it probably won’t be one kelvin.

Craig Cannon [00:36:35] – Yeah, probably not.

John Preskill [00:36:38] – What do you do? Well, there’s one approach, as an example, which I guess I mentioned in passing before, where maybe it doesn’t have to be at such low temperature, and that’s nuclear spins. Because they’re very weakly interacting with the outside world, you can have quantum information in a nuclear spin, which — I’m not saying that it would be undisturbed for years, but seconds, which is pretty good. And you can imagine that getting significantly longer. Someday you might have a little quantum smart card in your pocket. The nice thing about that particular technology is you could do it at room temperature. Still have long coherence times. If you go to the ATM and you’re worried that there’s a rogue bank that’s going to steal your information, one solution to that problem — I’m not saying there aren’t other solutions — is to have a quantum card where the bank will be able to authenticate it without being able to forge it.

Craig Cannon [00:37:54] – We should talk about the security element. Kevin Su asked what risk would quantum computers pose to current encryption schemes? So public key, and what changes should people be thinking about if quantum computers come in the next five years, 10 years?

John Preskill [00:38:12] – Yeah. Quantum computers threaten those systems that are in widespread use. Whenever you’re using a web browser and you see that little padlock and you’re at an HTTPS site, you’re using a public key cryptosystem to protect your privacy. Those cryptosystems rely for their security on the presumed hardness of computational problems. That is, it’s possible to crack them, but it’s just too hard. RSA, which is one of the ones that’s widely used … as typically practiced today, to break it you’d have to do something like factor a number which is over 2000 bits long, 2048. That’s too hard to do now. But that’s what quantum computers will be good at. Another one that’s widely used is called elliptic curve cryptography. Doesn’t really matter exactly what it is.

John Preskill [00:39:24] – But the point is that it’s also vulnerable to quantum attack, so we’re going to have to protect our privacy in different ways when quantum computers are prevalent.
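To make the link between factoring and RSA concrete, here is a toy sketch with deliberately tiny, hypothetical numbers (real keys use a modulus around 2048 bits, as mentioned above). Brute-force trial division stands in for the factoring step that a large quantum computer is expected to make feasible; everything after the factoring is ordinary arithmetic. The function name is an assumption for illustration.

```python
def factor_by_trial_division(n):
    """Brute-force factoring; only feasible because n is tiny in this toy."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 1
    raise ValueError("n has no nontrivial factor")

# Toy public key (hypothetical values): modulus n and public exponent e.
n, e = 3233, 17
message = 65
ciphertext = pow(message, e, n)  # anyone can encrypt with the public key

# Attacker: factor n, then rebuild the private exponent d from the factors.
p, q = factor_by_trial_division(n)
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)              # modular inverse (Python 3.8+)
recovered = pow(ciphertext, d, n)

print(f"factors: {p} x {q}, private exponent: {d}, recovered message: {recovered}")
```

The only hard step for the attacker is the factoring; once the two primes are known, the private key follows immediately.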

Craig Cannon [00:39:37] – What are the attempts being made right now?

John Preskill [00:39:39] – There are two main classes of attempts. One is just to come up with a cryptographic protocol not so different conceptually from what’s done now, but based on a problem that’s hard for quantum computers.

Craig Cannon [00:39:59] – There you go.

John Preskill [00:40:02] – It turns out that what has sort of become the standard way doesn’t have that feature, and there are alternatives that people are working on. We speak of post-quantum cryptography, meaning the protocols that we’ll have to use when we’re worried that our adversaries have quantum computers. There’s a long list of proposed cryptosystems by now which people think are candidates for being quantum resistant, for being unbreakable, or hard to break, by quantum computers. But I don’t think there’s any one that the world has sufficient confidence in now, as being really hard for a quantum adversary, that we’re all going to switch over to it. But it’s certainly time to be thinking about it. When people worry about their privacy, of course different users have different standards, but the US Government sometimes says they would like a system to stay secure for 50 years. They’d like to be able to use it for 20, roughly speaking, and then have the intercepted traffic be protected for another 30 after that. I don’t think, though I could be wrong, that we’re likely to have quantum computers that can break those public key cryptosystems in 10 years, but in 50 years seems not unlikely,

John Preskill [00:41:33] – and so we should really be worrying about it. The other one is actually using quantum communication for privacy. In other words, if you and I could send qubits to one another instead of bits, it opens up new possibilities. The way to think about these public key schemes — or one way — that we’re using now, is I want you to send me a private message, and I can send you a lockbox. It has a padlock on it, but I keep the key, okay? But you can close up the box and send it to me. But I’m the only one with the key. The key thing is that if you have the padlock you can’t reverse engineer the key. Of course, it’s a digital box and key, but that’s the idea of public key. The idea of what we call quantum key distribution, which is a particular type of quantum cryptography, is that I can actually send you the key, or you can send me your key, but why can’t any eavesdropper then listen in and know the key? Well it’s because it’s quantum, and remember, it has that property that if you look at it, you disturb it.

John Preskill [00:42:59] – So if you collect information about my key, or if the adversary does, that will cause some change in the key, and there are ways in which we can check whether what you received is really what I sent. And if it turns out it’s not, or it has too many errors in it, then we’ll be suspicious that there was an adversary who tampered with it, and then we won’t use that key. Because we haven’t used it yet — we’re just trying to establish the key. We do the test to see whether an adversary interfered. If it passes the test, then we can use the key. And if it fails the test, we throw that key away and we try again. That’s how quantum cryptography works, but it requires a much different infrastructure than what we’re using now. We have to be able to send qubits … well, it’s not completely different because you can do it with photons. Of course, that’s how we communicate through optical fiber now — we’re sending photons. It’s a little trickier sending quantum information through an optical fiber, because of that issue that interactions with the environment can disturb it. But nowadays, you can send quantum information through an optical fiber over tens of kilometers with a low enough error rate so it’s useful for communication.
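The check-then-keep-or-discard procedure can be simulated in a simplified form. The interview does not name a protocol; the sketch below assumes BB84, a standard textbook key distribution scheme, and an eavesdropper who measures each qubit and resends it. Qubits are modeled as (bit, basis) pairs, which is enough for this particular attack but is an assumption of the sketch, not a general quantum simulation. Function names are hypothetical.

```python
import random

def measure(bit, prep_basis, meas_basis, rng):
    """Measuring in the preparation basis returns the bit; otherwise a coin flip."""
    return bit if prep_basis == meas_basis else rng.randint(0, 1)

def sifted_error_rate(num_qubits, eavesdrop, seed=1):
    """Fraction of mismatched bits among rounds where sender and receiver bases agree."""
    rng = random.Random(seed)
    errors = kept = 0
    for _ in range(num_qubits):
        bit, send_basis = rng.randint(0, 1), rng.randint(0, 1)
        state_bit, state_basis = bit, send_basis
        if eavesdrop:
            # Intercept and resend: looking at the qubit disturbs it.
            eve_basis = rng.randint(0, 1)
            state_bit = measure(state_bit, state_basis, eve_basis, rng)
            state_basis = eve_basis
        recv_basis = rng.randint(0, 1)
        received = measure(state_bit, state_basis, recv_basis, rng)
        if recv_basis == send_basis:  # keep only rounds with matching bases
            kept += 1
            errors += received != bit
    return errors / kept

print(f"no eavesdropper:   error rate {sifted_error_rate(20000, False):.3f}")
print(f"with eavesdropper: error rate {sifted_error_rate(20000, True):.3f}")
```

Without an eavesdropper the compared bits always agree in this idealized model; with the intercept-and-resend attack about a quarter of them disagree, which is the signal to throw the key away and try again.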

Craig Cannon [00:44:22] – Wow.

John Preskill [00:44:23] – Of course, we’d like to be able to scale that up to global distances.

Craig Cannon [00:44:26] – Sure.

John Preskill [00:44:27] – And there are big challenges in that. But anyway, so that’s another approach to the future of privacy that people are interested in.

Craig Cannon [00:44:35] – Does that necessitate quantum computers on both ends?

John Preskill [00:44:38] – Yes, but not huge ones. The reason … well, yes and no. At the scale of tens of kilometers, no. You can do that now. There are prototype systems that are in existence. But if you really want to scale it up, in other words, to send things over longer distances, then you have to bring this quantum error correction idea into the game.

John Preskill [00:45:10] – Because at least with our current photonics technology, there’s no way I can send a single photon from here to China without there being a very high probability that it gets lost in the fiber somewhere. We have to have what we call quantum repeaters, which can boost the signal. But it’s not like the usual type of repeater that we have in communication networks now. The usual type is you measure the signal, and then you resend it. That won’t work for quantum because as soon as you measure it you’re going to mess it up. You have to find a way of boosting it without knowing what it is. Of course, it’s important that it works that way because otherwise, the adversary could just intercept it and resend it. And so it will require some quantum processing to get that quantum error correction in the quantum repeater to work. But it’s a much more modest scale quantum processor than we would need to solve hard problems.
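The distance problem is easy to quantify under one assumption not stated in the interview: a fiber loss of about 0.2 dB per kilometer, a typical figure for telecom fiber. Loss in decibels adds up with distance, so the chance that a single photon survives falls off exponentially. The function name and default value are hypothetical.

```python
def photon_survival_probability(distance_km, loss_db_per_km=0.2):
    """Chance a single photon makes it through the fiber (assumed loss figure)."""
    return 10 ** (-loss_db_per_km * distance_km / 10)

for distance_km in (10, 50, 1000, 10000):
    p = photon_survival_probability(distance_km)
    print(f"{distance_km:>6} km: survival probability {p:.3g}")
```

At tens of kilometers a useful fraction of photons arrives; at intercontinental distances essentially none do, which is why some form of quantum repeater is needed.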

Craig Cannon [00:46:14] – Okay. Gotcha. What are the other things you’re both excited about, and worried about for potential business opportunities? Snehan, I’m mispronouncing names all the time, Snehan Kekre asks, budding entrepreneurs, what should they be thinking about in the context of quantum computing?

John Preskill [00:46:37] – There’s more to quantum technology than computing. Something which has good potential to have an impact in the relatively near future is improved sensing. Quantum systems, partly because of that property that I keep emphasizing, that they can’t be perfectly isolated from the outside, are good at sensing things. Sometimes, you want to detect it when something in the outside world messes around with your qubit. Again, using this technology of nuclear spins, which I mentioned you can do at room temperature potentially, you can make a pretty good sensor, and it can potentially achieve higher sensitivity and spatial resolution, look at things on shorter distance scales than other existing sensing technology. One of the things people are excited about is the biological and medical implications of that.

John Preskill [00:47:53] – If you can monitor the behavior of molecular machines, probe biological systems at the molecular level using very powerful sensors, that would surely have a lot of applications. One interesting question you can ask is, can you use these quantum error correction ideas to make those sensors even more powerful? That’s another area of current basic research, where you could see significant potential economic impact.

Craig Cannon [00:48:29] – Interesting. In terms of your research right now, what are you working on that you find both interesting and incredibly difficult?

John Preskill [00:48:40] – Everything I work on–

Craig Cannon [00:48:41] – 100%.

John Preskill [00:48:42] – Is both interesting and incredibly difficult. Well, let me change direction a little from what we’ve been talking about so far. Well, I’m going to tell you a little bit about me.

Craig Cannon [00:48:58] – Sure.

John Preskill [00:49:00] – I didn’t start out interested in information in my career. I’m a physicist. I was trained as an elementary particle theorist, studying the fundamental interactions and the elementary particles. That drew me into an interest in gravitation because one thing that we still have a very poor understanding of is how gravity fits together with the other fundamental interactions. The way physicists usually say it is we don’t have a quantum theory of gravity, at least not one that we think is complete and satisfactory. I’ve been interested in that question for many decades, and then got sidetracked because I got excited about quantum computing. But you know what? I’ve always looked at quantum information not just as a technology. I’m a physicist, I’m not an engineer. I’m not trying to build a better computer, necessarily, though that’s very exciting, and worth doing, and if my work can contribute to that, that’s very pleasing. I see quantum information as a new frontier in the exploration of the physical sciences. Sometimes I call it the entanglement frontier. Physicists, we like to talk about frontiers, and stuff. Short distance frontier. That’s what we’re doing at CERN in the Large Hadron Collider, trying to discern new properties of matter at distances which are shorter than we’ve ever been able to explore before.

John Preskill [00:50:57] – There’s a long distance frontier in cosmology. We’re trying to look deeper into the universe and understand its structure and behavior at earlier times. Those are both very exciting frontiers. This entanglement frontier is increasingly going to be at the forefront of basic physics research in the 21st century. By entanglement frontier, I just mean scaling up quantum systems to larger and larger complexity where it becomes harder and harder to simulate those systems with our existing digital tools. That means we can’t very well anticipate the types of behavior that we’re going to see. That’s a great opportunity for new discovery, and that’s part of what’s going to be exciting even in the relatively near term. When we have 100 qubits … there are some things that we can do to understand the behavior of the dynamics of a highly complex system of 100 qubits that we’ve never been able to experimentally probe before. That’s going to be very interesting. But what we’re starting to see now is that these quantum information ideas are connecting to these fundamental questions about gravitation, and how to think about it quantumly. And it turns out, as is true for most of the broader implications of quantum physics, the key thing is entanglement.

John Preskill [00:52:36] – We can think of the microscopic structure of spacetime, the geometry of where we live. Geometry just means who’s close to who else. If we’re in the auditorium, and I’m in the first row and you’re in the fourth row, the geometry is how close we are to one another. Of course, that’s very fundamental in both space and time. How far apart are we in space? How far apart are we in time? Is geometry really a fundamental thing, or is it something that’s kind of emergent from some even more fundamental concept? It seems increasingly likely that it’s really an emergent property.

John Preskill [00:53:29] – That there’s something deeper than geometry. What is it? We think it’s quantum entanglement. That you can think of the geometry as arising from quantum correlations among parts of a system. That’s really what defines who’s close to who. We’re trying to explore that idea more deeply, and one of the things that comes in is the idea of quantum error correction. Remember the whole idea of quantum error correction was that we could make a quantum system behave the way we want it to because it’s well-protected against the damaging effects of noise. It seems like quantum error correction is part of the deep secret of how spacetime geometry works. It has a kind of intrinsic robustness coming from these ideas of quantum error correction that makes space meaningful, so that it doesn’t just evaporate when you tap on it. If you wanted to, you could think of the spacetime, the space that you’re in and the space that I’m in, as parts of a system that are entangled with one another.

John Preskill [00:54:45] – What would happen if we broke that entanglement and your part of space became disentangled from my part? Well what we think that would mean is that there’d be no way to connect us anymore. There wouldn’t be any path through space that starts over here with me and ends with you. It’d become broken apart into two pieces. It’s really the entanglement which holds space together, which keeps it from falling apart into little pieces. We’re trying to get a deeper grasp of what that means.

Craig Cannon [00:55:19] – How do you make any progress on that? That seems like the most unbelievably difficult problem to work on.

John Preskill [00:55:26] – It’s difficult because, well for a number of reasons, but in particular, because it’s hard to get guidance from experiment, which is how physics historically–

Craig Cannon [00:55:38] – All science.

John Preskill [00:55:38] – Has advanced.

Craig Cannon [00:55:39] – Yeah.

John Preskill [00:55:41] – Although it was fun a moment ago to talk about what would happen if we disentangled your part of space from mine, I don’t know how to do that in the lab right now. Of course, part of the reason is we have the audacity to think we can figure these things out just by thinking about them. Maybe that’s not true. Nobody knows, right? We should try. Solving these problems is a great challenge, and it may be that the apes that evolved on Earth don’t have the capacity to understand things like the quantum structure of spacetime. But maybe we do, so we should try. Now in the longer term, and maybe not such a long term, maybe we can get some guidance from experiment. In particular, what we’re going to be doing with quantum computers and the other quantum technologies that are becoming increasingly sophisticated in the next couple of decades, is we’ll be able to control very well highly entangled complex quantum systems. That should mean that in a laboratory, on a tabletop, I can sort of make my own little toy space time …

John Preskill [00:57:02] – with an emergent geometry arising from the properties of that entanglement, and I think that’ll teach us lessons because systems like that are the types of system that, because they’re so highly entangled, digital computers can’t simulate them. It seems like only quantum computers are potentially up to the task. So that won’t be quite the same as disentangling your side of the room from mine, in real life. But we’d be able to do it in a laboratory setting using model systems, which I think would help us to understand the basic principles better.

Craig Cannon [00:57:39] – Wild. Yeah, desktop space time seems pretty cool, if you could figure it out.

John Preskill [00:57:43] – Yeah, it’s pretty fundamental. We didn’t really talk about what people sometimes, we did implicitly, but not in so many words. We didn’t talk about what people sometimes call quantum non-locality. It’s another way of describing quantum entanglement, actually. There’s this notion of Bell’s theorem that when you look at the correlations among the parts of a quantum system, that they’re different from any possible classical correlations. Some things that you read give you the impression that you can use that to instantaneously send information over long distances. It is true that if we have two qubits, electron spins, say, and they’re entangled with one another, then what’s kind of remarkable is that I can measure my qubit to see along some axis whether it’s up or down, and you can measure yours, and we will get perfectly correlated results. When I see up, you’ll see up, say, and when I see down, you’ll see down. And sometimes, people make it sound like that’s remarkable. That’s not remarkable in itself. Somebody could’ve flipped a pair of coins, you know,

John Preskill [00:59:17] – so that they came up both heads or both tails, and given one to you and one –

Craig Cannon [00:59:20] – Split them apart.

John Preskill [00:59:20] – to me.

Craig Cannon [00:59:21] – Yeah.

John Preskill [00:59:22] – And gone a light year apart, and then we both …  hey, mine’s heads. Mine’s heads too!

Craig Cannon [00:59:24] – And then they call it quantum teleportation on YouTube.

John Preskill [00:59:28] – Yeah. Of course, what’s really important about entanglement that makes it different from just those coins is that there’s more than one way of looking at a qubit. We have what we call complementary ways of measuring it, so you can ask whether it’s up or down along this axis or along that axis. There’s nothing like that for the coins. There’s just one way to look at it. What’s cool about entanglement is that we’ll get perfectly correlated results if we both measure in the same way, but there’s more than one possible way that we could measure. What sometimes gets said, or the impression people get, is that that means that when I do something to my qubit, it instantaneously affects your qubit, even if we’re on different sides of the galaxy. But that’s not what entanglement does. It just means they’re correlated in a certain way.
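A small classical simulation of one entangled pair can illustrate the point about correlation without signaling. The sketch assumes a specific maximally entangled state, (|00> + |11>)/sqrt(2), and two complementary measurements labeled "Z" and "X"; these choices and all names are illustrative assumptions. It shows the correlations in both bases and the statistics of one side alone, but it is not a test of Bell's theorem.

```python
import math
import random

H = 1 / math.sqrt(2)

def apply_hadamard(state, qubit):
    """Apply a Hadamard to one qubit of a 2-qubit state [a00, a01, a10, a11]."""
    new = [0.0] * 4
    shift = 1 - qubit  # qubit 0 is the left (high) bit of the index
    for index, amp in enumerate(state):
        bit = (index >> shift) & 1
        other = index ^ (1 << shift)
        zero_index, one_index = (index, other) if bit == 0 else (other, index)
        new[zero_index] += H * amp
        new[one_index] += H * amp * (-1 if bit else 1)
    return new

def measure_pair(basis_a, basis_b, rng):
    """Measure both halves of the pair; each basis is 'Z' or 'X'."""
    state = [H, 0.0, 0.0, H]  # (|00> + |11>) / sqrt(2)
    if basis_a == "X":
        state = apply_hadamard(state, 0)  # rotate so X outcomes read out as Z
    if basis_b == "X":
        state = apply_hadamard(state, 1)
    weights = [amp * amp for amp in state]
    outcome = rng.choices(range(4), weights=weights)[0]
    return outcome >> 1, outcome & 1

def statistics(basis_a, basis_b, trials=5000, seed=0):
    """Return (fraction of rounds where results agree, fraction where B sees 1)."""
    rng = random.Random(seed)
    results = [measure_pair(basis_a, basis_b, rng) for _ in range(trials)]
    agreement = sum(a == b for a, b in results) / trials
    b_ones = sum(b for _, b in results) / trials
    return agreement, b_ones

for bases in (("Z", "Z"), ("X", "X"), ("Z", "X"), ("X", "Z")):
    agreement, b_ones = statistics(*bases)
    print(bases, f"agree={agreement:.2f}", f"B sees 1: {b_ones:.2f}")
```

When both sides measure the same way the results always agree, for either choice of measurement, which a pair of pre-flipped coins cannot reproduce for both choices. The second number stays near one half whatever the first party does, so nothing about that choice can be read off from one side alone.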

John Preskill [01:00:30] – When you look at yours, if we have maximally entangled qubits, you just see a random bit. It could be a zero or a one, each occurring with probability 1/2. That’s going to be true no matter what I did to my qubit, and so you can’t tell what I did by just looking at it. It’s only that if we compare notes later we can see how they’re correlated, and that correlation holds for either one of these two complementary ways in which we could both measure. It’s that fact that we have these complementary ways to measure that makes it impossible for a classical system to reproduce those same correlations. So that’s one misconception that’s pretty widespread. Another one is about quantum computing. In trying to explain why quantum computers are powerful, people will sometimes say, well, it’s because you can superpose (I used that word before), you can add together many different possibilities. That means that, whereas an ordinary computer would just do a computation once, acting on a superposition a quantum computer can do a vast number of computations all at once.

John Preskill [01:01:54] – There’s a certain sense in which that’s mathematically true if you interpret it right, but it’s very misleading. Because in the end, you’re going to have to make some measurement to read out the result. When you read it out, there’s a limited amount of information you can get. You’re not going to be able to read out the results of some huge number of computations in a single shot measurement. Really the key thing that makes it work is this idea of interference, which we discussed briefly when you asked about Grover’s algorithm. The art of a quantum algorithm is to make sure that the wrong answers interfere and cancel one another out, so the right answer is enhanced. That’s not automatic. It requires that the quantum algorithm be designed in just the right way.
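The interference idea can be seen in the smallest case of Grover's algorithm, a search over four items. The sketch below is a classical simulation of the four amplitudes, not a program for a quantum computer, and the choice of which item is marked is arbitrary. One step flips the sign of the marked item's amplitude and then reflects every amplitude about the average.

```python
import math

def grover_iteration(amplitudes, marked):
    """One Grover step: flip the marked amplitude's sign, then reflect about the mean."""
    flipped = [-a if i == marked else a for i, a in enumerate(amplitudes)]
    mean = sum(flipped) / len(flipped)
    return [2 * mean - a for a in flipped]

n_items, marked = 4, 2
state = [1 / math.sqrt(n_items)] * n_items  # equal superposition of all items
state = grover_iteration(state, marked)
probabilities = [a * a for a in state]
print(probabilities)  # → [0.0, 0.0, 1.0, 0.0]
```

The amplitudes of the three wrong answers cancel to zero and all the probability lands on the marked item. For larger searches the cancellation is only partial per step, so the step has to be repeated, and the design of those steps is what makes the algorithm work.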

Craig Cannon [01:02:50] – Right. The diagrams I’ve seen online at least, involve usually you’re squaring the output as it goes along, and then essentially, that flips the correct answer to the positive, and the others are in the negative position. Is that accurate?

John Preskill [01:03:08] – I wouldn’t have said it the way you did, because you can’t really measure it as you go along. Once you measure it, the magic of superposition is going to be lost.

John Preskill [01:03:19] – It means that now there’s some definite outcome or state. To take advantage of this interference phenomenon, you need to delay the measurement. Remember when we were talking about the double slit and I said, if you actually see these wiggles in the probability of detection, which is the signal of interference, that means that there’s no way anybody could know whether the electron went through hole one or hole two? It’s the same way with quantum computing. If you think of the computation as being a superposition of different possible computations, it wouldn’t work — there wouldn’t be a speed up — if you could know which of those paths the computation followed. It’s important that you don’t know. And so you have to sum up all the different computations, and that’s how the interference phenomenon comes into play.
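
The interference idea can be made concrete with a small classical simulation of Grover's search. This is a sketch under stated assumptions, not anything shown in the interview: it tracks the amplitudes directly in a Python list, uses a hypothetical search space of 8 items, and the function name `grover_probabilities` is illustrative. The amplitudes are squared only once, at the end, which mirrors the point that measurement has to be delayed.

```python
import math


def grover_probabilities(n_items, marked, iterations):
    """Return the measurement probabilities after the given number of Grover iterations."""
    amps = [1 / math.sqrt(n_items)] * n_items  # uniform superposition
    for _ in range(iterations):
        amps[marked] = -amps[marked]           # oracle: flip the sign of the marked amplitude
        mean = sum(amps) / n_items
        amps = [2 * mean - a for a in amps]    # diffusion: reflect every amplitude about the mean
    return [a * a for a in amps]               # probabilities are taken only here, at the end


N, MARKED = 8, 5
best = math.floor(math.pi / 4 * math.sqrt(N))  # roughly (pi/4) * sqrt(N) iterations
probs = grover_probabilities(N, MARKED, best)
overshoot = grover_probabilities(N, MARKED, best + 1)
print(f"{best} iterations: P(marked) = {probs[MARKED]:.3f}")
print(f"{best + 1} iterations: P(marked) = {overshoot[MARKED]:.3f}")
```

With these numbers, two iterations push the probability of the marked item to about 0.945, while a third iteration lowers it again. The enhancement comes from the sign flip and the reflection making the unmarked amplitudes partially cancel, and it depends on stopping at the right point.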

Craig Cannon [01:04:17] – To take a little sidetrack, you mentioned Feynman before. And before we started recording you mentioned working with him. I know I’m in the Feynman fan club, for sure. What was that experience like?

John Preskill [01:04:32] – We never really collaborated. I mean, we didn’t write a paper together, or anything like that. We overlapped for five years at Caltech. I arrived here in 1983. He died in 1988. We had offices on the same corridor, and we talked pretty often because we were both interested in the fundamental interactions, and in particular, what we call quantum chromodynamics. It’s our theory of how nuclear matter behaves, how quarks interact, what holds the proton together, those kinds of things. One big question is what does hold the proton together? Why don’t the quarks just fall apart? That was an example of a problem that both he and I were very interested in, and which we talked about sometimes. Now, this was pretty late in his career. When I think about it now, when I arrived at Caltech, that was 1983, Feynman was born in 1918, so he was 65. I’m 64 now, so maybe he wasn’t so old, right? But at the time, he seemed pretty ancient to me. Since I was 30.

John Preskill [01:05:58] – Those who interacted with Dick Feynman when he was really at his intellectual peak in the ’40s, and ’50s, and ’60s, probably saw even more extraordinary intellectual feats than I witnessed interacting with the 65 year old Feynman. He just loved physics, you know? He just thought everything was so much fun. He loved talking about it. He wasn’t as good a listener as a talker, but actually – well that’s a little unfair, isn’t it? It was kind of funny because Feynman, he always wanted to think things through for himself, sort of from first principles, rather than rely on the guidance from experts who have thought about these things before. Well that’s fine. You should try to understand things as deeply as you can on your own, and sort of reconstruct the knowledge from the ground up. That’s very enabling, and gives you new insights. But he was a little too dismissive, in my view, of what the other guys knew. But I could slip it in because I didn’t tell him, “Dick, you should read this paper by Polyakov” — well maybe I did, but he wouldn’t have even heard that  — because he solved that problem that you’re talking about.

John Preskill [01:07:39] – But I knew what Polyakov had said about it, so I would say, “Oh well, look, why don’t we look at it this way?” And so he thought I was, that I was having all these insights, but the truth was the big difference between Feynman and me in the mid 1980s was I was reading literature, and he wasn’t.

Craig Cannon [01:08:00] – That’s funny.

John Preskill [01:08:01] – Probably, if he had been, he would’ve been well served, but that wasn’t the way he liked to work on things. He wanted to find his own approach. Of course, that had worked out pretty well for him throughout his career.

Craig Cannon [01:08:15] – What other qualities did you notice about him when he was roaming the corridors?

John Preskill [01:08:21] – He’d always be drumming. So you would know he was around because he’d actually be walking down the hallway drumming on the wall.

Craig Cannon [01:08:27] – Wait, with his hands, or with sticks, or–

John Preskill [01:08:29] – No, hands. He’d just be tapping.

Craig Cannon [01:08:32] – Just a bongo thing.

John Preskill [01:08:33] – Yeah. That was one thing. He loved to tell stories. You’ve probably read the books that Ralph Leighton put together based on the stories Feynman told. Ralph did an amazing job of capturing Feynman’s personality in writing those stories down, because I’d heard a lot of them. I’m sure he told the same stories to many people many times, because he loved telling stories. But the book really captures his voice pretty well.

John Preskill [01:09:12] – If you had heard him tell some of these stories, and then you read the way Ralph Leighton transcribed them, you can hear Feynman talking. At the time that I knew him, one of the experiences that he went through was he was on the Challenger commission after the space shuttle blew up. He was in Washington a lot of the time, but he’d come back from time to time, and he would sort of sit back and relax in our seminar room and start bringing us up to date on all the weird things that were happening on the Challenger commission. That was pretty fun.

Craig Cannon [01:09:56] – That’s really cool.

John Preskill [01:09:56] – A lot of that got captured in the second volume. I guess it’s the one called, What Do You Care What Other People Think? There’s a chapter about him telling stories about the Challenger commission. He was interested in everything. It wasn’t just physics. He was very interested in biology. He was interested in computation. I remember how excited he was when he got his first IBM PC. Probably not long after I got to Caltech. Yeah, it was what they called the AT. We thought it was a pretty sexy machine. I had one, too. He couldn’t wait to start programming it in BASIC.

Craig Cannon [01:10:50] – Very cool.

John Preskill [01:10:51] – Because that was so much fun.

Craig Cannon [01:10:52] – There was a question I was curious to hear your answer to. Tika asks about, essentially, teaching about quantum computers. They say, many kids in grade 10 can code. Some can play with machine learning tools without knowing the math. Can quantum computing become as simple and/or accessible?

John Preskill [01:11:17] – Maybe so. At some level, when people say quantum mechanics is counterintuitive, it’s hard for us to grasp, it’s so foreign to our experience, that’s true. The way things behave at the microscopic scale is, like we discussed earlier, really different from the way ordinary stuff behaves. But it’s a question of familiarity. What I wouldn’t be surprised by is that if you go out a few decades, kids who are 10 years old are going to be playing quantum games. That’s an application area that doesn’t get discussed very much, but there could be a real market there because people love games. Quantum games are different, and the strategies are different, and what you have to do to win is different. If you play the game enough, you start to get the hang of it.

John Preskill [01:12:26] – I don’t see any reason why kids who have not necessarily deeply studied physics can’t get a pretty good feel for how quantum mechanics works. You know, the way ordinary physics works, maybe it’s not so intuitive. Newton’s laws … Aristotle couldn’t get it right. He thought you had to keep pushing on something to get it to keep moving. That wasn’t right. Galileo was able to roll balls down a ramp, and things like that, and see he didn’t have to keep pushing to keep it moving. He could see that it was uniformly accelerated in a gravitational field. Newton took that to a much more general and powerful level. You fool around with stuff, and you get the hang of it. And I think quantum stuff can be like that. We’ll experience it in a different way, but when we have quantum computers, in a way, that opens the opportunity for trying things out and seeing what happens.

John Preskill [01:13:50] – After you’ve played the game enough, you start to anticipate. And actually, it’s an important point about the applications. One of the questions you asked me at the beginning was what are we able to do with quantum computers? And I said, I don’t know. So how are we going to discover new applications? It might just be, at least in part, by fooling around. A lot of classical algorithms that people use on today’s computers were discovered, or that they were powerful was discovered, by experimenting. By trying it. I don’t know … what’s an example of that? Well, the simplex method that we use in linear programming. I don’t think there was a mathematical proof that it was fast at first, but people did experiments, and they said, hey, this is pretty fast.

Craig Cannon [01:14:53] – Well, you’re seeing it a lot now in machine learning.

John Preskill [01:14:57] – Yeah, well that’s a good example.

Craig Cannon [01:14:58] – You test it out a million times over when you’re running simulations, and it turns out, that’s what works. Following the thread of education, and maybe your political interest, given it’s the year that it is, do you have thoughts on how you would adjust or change STEM education?

John Preskill [01:15:23] – Well, no particularly original thoughts. But I do think that STEM education … we shouldn’t think of it as we’re going to need this technical workforce, and so we better train them. The key thing is we want the general population to be able to reason effectively, and to recognize when an argument is phony and when it’s authentic. To think about, well how can I check whether what I just read on Facebook is really true? And I see that as part of the goal of STEM education. When you’re teaching kids in school how to understand the world by doing experiments, by looking at the evidence, by reasoning from the evidence, this is something that we apply in everyday life, too. I don’t know exactly how to implement this–

John Preskill [01:16:36] – But I think we should have that perspective that we’re trying to educate a public, which is going to eventually make critical decisions about our democracy, and they should understand how to tell when something is true or not. That’s a hard thing to do in general, but you know what I mean. That there are some things that, if you’re a person with some — I mean it doesn’t necessarily have to be technical — but if you’re used to evaluating evidence and making a judgment based on that evidence about whether it’s a good argument or not, you can apply that to all the things you hear and read, and make better judgments.

Craig Cannon [01:17:23] – What about on the policy side? Let’s see, JJ Francis asked that, if you or any of your colleagues would ever consider running for office. Curious about science policy in the US.

John Preskill [01:17:38] – Well, it would be good if we had more scientifically trained people in government. Very few members of Congress. I know of one, Bill Foster’s a physicist in Illinois. He was a particle physicist, and he worked at Fermilab, and now he’s in Congress, and very interested in the science and educational policy aspects of government. Rush Holt was a congressman from New Jersey who had a background in physics. He retired from the House a couple of years ago, but he was in Congress for something like 18 years, and he had a positive influence, because he had a voice that people respected when it came to science policy. Having more people like that would help. Now, another thing, it doesn’t have to be elective office.

Craig Cannon [01:18:39] – Right.

John Preskill [01:18:42] – There are a lot of technically trained people in government, many of them making their careers in agencies that deal with technical issues. Department of Defense, of course, there are a lot of technical issues. In the Obama Administration we had two successive secretaries of energy who were very, very good physicists. Steve Chu is a Nobel Prize-winning physicist. Then Ernie Moniz, who’s a real authority on nuclear energy and weapons. That kind of expertise makes a difference in government.

John Preskill [01:19:24] – Now the Secretary of Energy is Rick Perry. It’s a different background.

Craig Cannon [01:19:28] – Yeah, you could say that. Just kind of historical reference, what policies did they put in place that you really felt their hand as a physicist move forward?

John Preskill [01:19:44] – You mean in particular–

Craig Cannon [01:19:45] – I’m talking the Obama Administration.

John Preskill [01:19:49] – Well, I think the Department of Energy, DOE, tried to facilitate technical innovation by seeding new technologies, by supporting startup companies that were trying to do things that would improve battery technology, and solar power, and things like that, which could benefit future generations. They had an impact by doing that. You don’t have to be a Nobel Prize winning physicist to think that’s a good idea. That the administration felt that was a priority made a difference, and appointing a physicist at Department of Energy was, if nothing else, highly symbolic of how important those things are.

Craig Cannon [01:20:52] – On the quantum side, Vikas Karad asked where the Quantum Valley might be. Do you have thoughts, as in a Silicon Valley for quantum computing?

John Preskill [01:21:06] – Well… I don’t know, but you look at what’s happening the last couple of years, there have been a number of quantum startups. A notable number of them are in the Bay Area. Why so? Well, that’s where the tech industry is concentrated and where the people who are interested in financing innovative technical startups are concentrated. If you are an entrepreneur interested in starting a company, and you’re concerned about how to fundraise for it, it kind of makes sense to locate in that area. Now, that’s what’s sort of happening now, and may not continue, of course. It might not be like that indefinitely. Nothing lasts forever, but I would say… That’s the place, Silicon Valley is likely to be Quantum Valley, the way things are right now.

Craig Cannon [01:22:10] – Well then what about the physicists who might be listening to this? If they’re thinking about starting a company, do you have advice for them?

John Preskill [01:22:22] – Just speaking very generally, if you’re putting a team together… Different people have different expertise. Take quantum computing as an example, like we were saying earlier, some of the big players and the startups, they want to do everything. They want to build the hardware, figure out better ways to fabricate it. Better control, better software, better applications. Nobody can be an expert on all those things. Of course, you’ll hire a software person to write your software, and microwave engineer to figure out your control, and of course that’s the right thing to do. But I think in that arena, and it probably applies to other entrepreneurial activity relating to physics, being able to communicate across those boundaries is very valuable, and you can see it in quantum computing now. That if the man or woman who’s involved in the software has that background, but there’s not a big communication barrier talking to the people who are doing the control engineering, that can be very helpful. It makes sense to give some preference to the people who maybe are comfortable doing so, or have the background that stretches across more than one of those areas of expertise. That can be very enabling in a technology arena like quantum computing today, where we’re trying to do really, really hard stuff, and you don’t know whether you’ll succeed, and you want to give it your best go by seeing the connections between those different things.

Craig Cannon [01:24:28] – Would you advise someone then to maybe teach or try to explain it to, I don’t know, their young cousins? Because Feynman is maybe recognized as the king of communicating physics, at least for a certain period of time. How would you advise someone to get better at it so they can be more effective?

John Preskill [01:24:50] – Practice. There are different aspects of that. This isn’t what you meant at all, but I’ll say it anyway, because what you asked brought it to mind. If you teach, you learn. We have this odd model in the research university that a professor like me is supposed to do research and teach. Why don’t we hire teachers and researchers? Why do we have the same people doing both? Well, part of the reason for me is most of what I know, what I’ve learned since my own school education ended, is knowledge I acquired by trying to teach it. To keep our intellect rejuvenated, we have to have that experience of trying to teach new things that we didn’t know that well before to other people. That deepens your knowledge. Just thinking about how you convey it makes you ask questions that you might not think to ask otherwise, and you say “Hey, I don’t know the answer to that.” Then you have to try to figure it out. So I think that applies at varying levels to any situation in which a scientist, or somebody with a technical background, is trying to communicate.

John Preskill [01:26:21] – By thinking about how to get it across to other people, we can get new insights, you know? We can look at it in a different way. It’s not a waste of time. Aside from the benefits of actually successfully communicating, we benefit from it in this other way. But other than that… Have fun with it, you know? Don’t look at it as a burden, or some kind of task you have to do along with all the other things you’re doing. It should be a pleasure. When it’s successful, it’s very gratifying. If you put a lot of thought into how to communicate something and you think people are getting it, that’s one of the ways that somebody in my line of work can get a lot of satisfaction.

Craig Cannon [01:27:23] – If now were to be your opportunity to teach a lot of people about physics, and you could just point someone to things, who would you advise someone to be? They want to learn more about quantum computing, they want to learn about physics. What should they be reading? What YouTube channel should they follow? What should they pay attention to?

John Preskill [01:27:44] – Well one communicator who I have great admiration for is Leonard Susskind, who’s at Stanford. You mentioned Feynman as the great communicator, and that’s fair, but in terms of style and personality of physicists who are currently active, I think Lenny Susskind is the most similar to Feynman of anyone I can think of. He’s a no bullshit kind of guy. He wants to give you the straight stuff. He doesn’t want to water it down for you. But he’s very gifted when it comes to making analogies and creating the illusion that you’re understanding what he’s saying. He has … if you just go to YouTube and search Leonard Susskind you’ll see lectures that he’s given at Stanford where they have some kind of extension school for people who are not Stanford students, people in the community. A lot of them in the tech community because it’s Stanford, and he’s giving courses. Yeah, and on quite sophisticated topics, but also on more basic topics, and he’s in the process of turning those into books. I’m not sure how many of those have appeared, but he has a series called The Theoretical Minimum

John Preskill [01:29:19] – which is supposed to be the gentle introduction to different topics like classical physics, quantum physics, and so on. He’s pretty special I think in his ability to do that.

Craig Cannon [01:29:32] – I need to subscribe. Actually, here’s a question then. In the things you’ve relearned while teaching over the past, I guess it’s 35 years now.

John Preskill [01:29:46] – Shit, is that right?

Craig Cannon [01:29:47] – Something like that.

John Preskill [01:29:48] – That’s true. Yeah.

Craig Cannon [01:29:51] – What were the big things, what were the revelations?

John Preskill [01:29:55] – That’s how I learned quantum computing, for one thing. I was not at all knowledgeable about information science. That wasn’t my training. Back when I was in school, physicists didn’t learn much about things like information theory, computer science, complexity theory. One of the great things about quantum computing is its interdisciplinary character, that it brings these different things into contact, which traditionally had not been part of the common curriculum of any community of scholars. I decided 20 years ago that I should teach a quantum information class at Caltech, and I worked very hard on it that year. Not that I’m an expert, or anything, but I learned a lot about information theory, and things like channel capacity, and computational complexity (how we classify the hardness of problems), and algorithms. Things like that, which I didn’t really know very well. I had sort of a passing familiarity with some of those things from reading some of the quantum computing literature. That’s no substitute for teaching a class because then you really have to synthesize it and figure out your way of presenting it. Most of the notes are typed up and you can still get to them on my website. That was pretty transformative for me … and it was easier then, 20 years ago, I guess than it is now because it was such a new topic.

John Preskill [01:31:49] – But I really felt I was kind of close enough to the cutting edge on most of those topics by the time I’d finished the class that I wasn’t intimidated by another paper I’d read or a new thing I’d hear about those things. That was probably the one case where it really made a difference in my foundation of knowledge which enabled me to do things. But I had the same experience in particle physics. When I was a student, I read a lot. I was very broadly interested in physics. But the first time (I was still at Harvard at the time; later I taught a similar course here), I’m in my late 20s, I’m just a year or two out of graduate school, and I decide to teach a very comprehensive class on elementary particles … in particular, quantum chromodynamics, the theory of nuclear forces like we talked about before. It just really expanded my knowledge to have that experience of teaching that class. I still draw on that. I can still remember that experience and I think I get ideas that I might not otherwise have because I went through that.

Craig Cannon [01:33:23] – I want to get involved now. I want to go back to school, or maybe teach a class. I don’t know.

John Preskill [01:33:27] – Well, what’s stopping you?

Craig Cannon [01:33:29] – Nothing. Alright, thanks John.

John Preskill [01:33:32] – Okay, thank you Craig.

May 19, 2018

Tommaso DorigoPiero Martin At TedX: An Eulogy Of The Error

Living in Padova has its merits. I moved here on January 1st and am enjoying every bit of it. I used to live in Venice, my home town, and commute to Padova on weekdays, but a number of factors led me to decide on this move (not least the fact that I could afford to buy a spacious place close to my office in Padova, while in Venice I was confined to a rented apartment).


May 18, 2018

Matt von HippelThe Amplitudes Long View

Occasionally, other physicists ask me what the goal of amplitudes research is. What’s it all about?

I want to give my usual answer: we’re calculating scattering amplitudes! We’re trying to compute them more efficiently, taking advantage of simplifications and using a big toolbox of different approaches, and…

Usually by this point in the conversation, it’s clear that this isn’t what they were asking.

When physicists ask me about the goal of amplitudes research, they’ve got a longer view in mind. Maybe they’ve seen a talk by Nima Arkani-Hamed, declaring that spacetime is doomed. Maybe they’ve seen papers arguing that everything we know about quantum field theory can be derived from a few simple rules. Maybe they’ve heard slogans, like “on-shell good, off-shell bad”. Maybe they’ve heard about the conjecture that N=8 supergravity is finite, or maybe they’ve just heard someone praise the field as “demoting the sacred cows like fields, Lagrangians, and gauge symmetry”.

Often, they’ve heard a little bit of all of these. Sometimes they’re excited, sometimes they’re skeptical, but either way, they’re usually more than a little confused. They’re asking how all of these statements fit into a larger story.

The glib answer is that they don’t. Amplitudes has always been a grab-bag of methods: different people with different backgrounds, united by their interest in a particular kind of calculation.

With that said, I think there is a shared philosophy, even if each of us approaches it a little differently. There is an overall principle that unites the amplituhedron and color-kinematics duality, the CHY string and bootstrap methods, BCFW and generalized unitarity.

If I had to describe that principle in one word, I’d call it minimality. Quantum field theory involves hugely complicated mathematical machinery: Lagrangians and path integrals, Feynman diagrams and gauge fixing. At the end of the day, if you want to answer a concrete question, you’re computing a few specific kinds of things: mostly, scattering amplitudes and correlation functions. Amplitudes tries to start from the other end, and ask what outputs of this process are allowed. The idea is to search for something minimal: a few principles that, when applied to a final answer in a particular form, specify it uniquely. The form in question varies: it can be a geometric picture like the amplituhedron, or a string-like worldsheet, or a constructive approach built up from three-particle amplitudes. The goal, in each case, is the same: to skip the usual machinery, and understand the allowed form for the answer.

From this principle, where do the slogans come from? How could minimality replace spacetime, or solve quantum gravity?

It can’t…if we stick to only matching quantum field theory. As long as each calculation matches one someone else could do with known theories, even if we’re more efficient, these minimal descriptions won’t really solve these kinds of big-picture mysteries.

The hope (and for the most part, it’s a long-term hope) is that we can go beyond that. By exploring minimal descriptions, the hope is that we will find not only known theories, but unknown ones as well, theories that weren’t expected in the old understanding of quantum field theory. The amplituhedron doesn’t need spacetime, so it might lead the way to a theory that doesn’t have spacetime. If N=8 supergravity is finite, it could suggest new theories that are finite. The story repeats, with variations, whenever amplitudeologists explore the outlook of our field. If we know the minimal requirements for an amplitude, we could find amplitudes that nobody expected.

I’m not claiming we’re the only field like this: I feel like the conformal bootstrap could tell a similar story. And I’m not saying everyone thinks about our field this way: there’s a lot of deep mathematics in just calculating amplitudes, and it fascinated people long before the field caught on with the Princeton set.

But if you’re asking what the story is for amplitudes, the weird buzz you catch bits and pieces of and can’t quite put together…well, if there’s any unifying story, I think it’s this one.

May 17, 2018

ResonaancesProton's weak charge, and what's it for

In the particle world the LHC still attracts the most attention, but in parallel there is ongoing progress at the low-energy frontier. A new episode in that story is the Qweak experiment at Jefferson Lab in the US, which just published its final results. Qweak was shooting a beam of 1 GeV electrons on a hydrogen (so basically proton) target to determine how the scattering rate depends on the electron's polarization. Electrons and protons interact with each other via the electromagnetic and weak forces. The former is much stronger, but it is parity-invariant, i.e. it does not care about the direction of polarization. On the other hand, since the classic Wu experiment in 1956, the weak force is known to violate parity. Indeed, the Standard Model postulates that the Z boson, which mediates the weak force, couples with different strengths to left- and right-handed particles. The resulting asymmetry between the low-energy electron-proton scattering cross sections of left- and right-handed polarized electrons is predicted to be at the 10^-7 level. That has been experimentally observed many times before, but Qweak was able to measure it with the best precision to date (relative 4%), and at a lower momentum transfer than the previous experiments.

What is the point of this exercise? Low-energy parity violation experiments are often sold as precision measurements of the so-called Weinberg angle, which is a function of the electroweak gauge couplings - the fundamental parameters of the Standard Model. I don't much like that perspective because the electroweak couplings, and thus the Weinberg angle, can be more precisely determined from other observables, and Qweak is far from achieving a competitive accuracy. The utility of Qweak is more visible in the effective theory picture. At low energies one can parameterize the relevant parity-violating interactions between protons and electrons by the contact term
where v ≈ 246 GeV, and QW is the so-called weak charge of the proton. In the Standard Model, such interactions arise from Z boson exchange between electrons and the quarks that make up the proton. At low energies, the exchange diagram is well approximated by the contact term above with QW = 0.0708 (somewhat smaller than the "natural" value QW ~ 1 due to numerical accidents making the Z boson effectively protophobic). The measured polarization asymmetry in electron-proton scattering can be re-interpreted as a determination of the proton weak charge: QW = 0.0719 ± 0.0045, in perfect agreement with the Standard Model prediction.
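
As a quick check of the numbers quoted above, here is a minimal Python sketch. The pull uses only values from the text. The tree-level relation QW = 1 - 4 sin²θW and the value sin²θW ≈ 0.2312 are assumptions brought in for illustration (they are not stated in the post); they are meant only to show why the weak charge comes out so much smaller than 1.

```python
# Numbers quoted in the text.
QW_SM = 0.0708        # Standard Model prediction for the proton weak charge
QW_MEASURED = 0.0719  # Qweak central value
QW_SIGMA = 0.0045     # Qweak uncertainty

# Distance between measurement and prediction in units of the uncertainty.
pull = (QW_MEASURED - QW_SM) / QW_SIGMA

# Assumption (not from the text): tree-level relation QW = 1 - 4 sin^2(theta_W),
# evaluated with an illustrative sin^2(theta_W) = 0.2312.
SIN2_THETA_W = 0.2312
qw_tree = 1 - 4 * SIN2_THETA_W

print(f"pull = {pull:.2f} sigma")
print(f"tree-level QW = {qw_tree:.4f}")
```

The pull is about 0.24 sigma. The tree-level estimate of about 0.075 is small because sin²θW happens to be close to 1/4; it is not expected to match 0.0708 exactly, since the quoted prediction includes effects beyond this rough estimate.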

New physics may affect the magnitude of the proton weak charge in two distinct ways. One is by altering the strength with which the Z boson couples to matter. This happens for example when light quarks mix with their heavier exotic cousins with different quantum numbers, as is often the case in the models from the Randall-Sundrum family. More generally, modified couplings to the Z boson could be a sign of quark compositeness. Another way is by generating new parity-violating contact interactions between electrons and quarks. This can be a result of yet unknown short-range forces which distinguish left- and right-handed electrons. Note that the observation of lepton flavor universality violation in B-meson decays can be interpreted as a hint of the existence of such forces (although for that purpose the new force carriers do not need to couple to 1st generation quarks). Qweak's measurement puts novel limits on such broad scenarios. Whatever the origin, simple dimensional analysis allows one to estimate the possible change of the proton weak charge as
where M* is the mass scale of new particles beyond the Standard Model, and g* is their coupling strength to matter. Thus, Qweak can constrain new weakly coupled particles with masses up to a few TeV, or even 50 TeV particles if they are strongly coupled to matter (g*~4π).
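
The quoted mass reach can be reproduced with a rough sketch. It assumes the dimensional-analysis scaling δQW ~ g*² v² / M*² with the order-one coefficient set to 1 (the displayed formula is not reproduced in the text, so this form is an inference from the definitions of v, M* and g*), and it takes the Qweak uncertainty of 0.0045 as the size of the shift that can be probed. The function name `mass_reach_tev` is illustrative.

```python
import math

V_GEV = 246.0      # electroweak scale v quoted in the text
DELTA_QW = 0.0045  # Qweak uncertainty on the weak charge quoted in the text


def mass_reach_tev(g_star, delta_qw=DELTA_QW, v_gev=V_GEV):
    """Mass scale M* (in TeV) at which g*^2 v^2 / M*^2 equals delta_qw.

    Assumes the scaling delta_QW ~ g*^2 v^2 / M*^2 with unit coefficient.
    """
    return g_star * v_gev / math.sqrt(delta_qw) / 1000.0


weak = mass_reach_tev(1.0)            # weakly coupled, g* ~ 1
strong = mass_reach_tev(4 * math.pi)  # strongly coupled, g* ~ 4 pi
print(f"g* = 1:    M* up to about {weak:.1f} TeV")
print(f"g* = 4 pi: M* up to about {strong:.0f} TeV")
```

Under these assumptions the reach is roughly 3.7 TeV for g* ~ 1 and roughly 46 TeV for g* ~ 4π, in line with the "few TeV" and "50 TeV" figures above.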

What is the place of Qweak in the larger landscape of precision experiments? One can illustrate it by considering a simple example where heavy new physics modifies only the vector couplings of the Z boson to up and down quarks. The best existing constraints on such a scenario are displayed in this plot:
From the size of the rotten egg region you see that the Z boson couplings to light quarks are currently known with a per-mille accuracy. Somewhat surprisingly, the LEP collider, which back in the 1990s produced tens of millions of Z bosons to precisely study their couplings, is not at all the leader in this field. In fact, better constraints come from precision measurements at very low energies: pion, kaon, and neutron decays, parity-violating transitions in cesium atoms, and the latest Qweak results, which make a difference too. The importance of Qweak is even more pronounced in more complex scenarios where the parameter space is multi-dimensional.

Qweak is certainly not the last salvo on the low-energy frontier. Similar but more precise experiments are being prepared as we read (I wish the follow-up were called SuperQweak, or SQweak for short). Who knows, maybe quarks are made of more fundamental building blocks at the scale of ~100 TeV, and we'll first find it out thanks to parity violation at very low energies.

May 16, 2018

Terence Tao246C notes 2: Circle packings, conformal maps, and quasiconformal maps

We now leave the topic of Riemann surfaces, and turn to the (loosely related) topic of conformal mapping (and quasiconformal mapping). Recall that a conformal map {f: U \rightarrow V} from an open subset {U} of the complex plane to another open set {V} is a map that is holomorphic and bijective, which (by Rouché’s theorem) also forces the derivative of {f} to be nowhere vanishing. We then say that the two open sets {U,V} are conformally equivalent. From the Cauchy-Riemann equations we see that conformal maps are orientation-preserving and angle-preserving; from the Newton approximation {f( z_0 + \Delta z) \approx f(z_0) + f'(z_0) \Delta z + O( |\Delta z|^2)} we see that they almost preserve small circles, indeed for {\varepsilon} small the circle {\{ z: |z-z_0| = \varepsilon\}} will approximately map to {\{ w: |w - f(z_0)| = |f'(z_0)| \varepsilon \}}.
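As a quick numerical illustration of the last claim, here is a minimal sketch. The map {f = \exp}, the base point and the value of {\varepsilon} are arbitrary choices made for this example, not taken from the notes. It pushes a small circle through {f} and compares the distances of the image points from {f(z_0)} with the predicted radius {|f'(z_0)| \varepsilon}.

```python
import cmath
import math

def image_radius_range(f, z0, eps, n=360):
    """Map the circle |z - z0| = eps through f and return the smallest and
    largest distance of the image points from f(z0)."""
    w0 = f(z0)
    dists = [abs(f(z0 + eps * cmath.exp(2j * math.pi * k / n)) - w0)
             for k in range(n)]
    return min(dists), max(dists)

# Illustrative choices (assumptions for this sketch): f = exp, z0 = 0.3 + 0.2i.
z0 = 0.3 + 0.2j
eps = 1e-3
predicted = abs(cmath.exp(z0)) * eps      # |f'(z0)| * eps, since exp' = exp
lo, hi = image_radius_range(cmath.exp, z0, eps)

# The image is a circle of the predicted radius up to a relative error O(eps).
relative_spread = (hi - lo) / predicted
print(predicted, lo, hi, relative_spread)
```

Shrinking `eps` should shrink `relative_spread` proportionally, in line with the {O(|\Delta z|^2)} error term.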

In previous quarters, we proved a fundamental theorem about this concept, the Riemann mapping theorem:

Theorem 1 (Riemann mapping theorem) Let {U} be a simply connected open subset of {{\bf C}} that is not all of {{\bf C}}. Then {U} is conformally equivalent to the unit disk {D(0,1)}.

This theorem was proven in these 246A lecture notes, using an argument of Koebe. At a very high level, one can sketch Koebe’s proof of the Riemann mapping theorem as follows: among all the injective holomorphic maps {f: U \rightarrow D(0,1)} from {U} to {D(0,1)} that map some fixed point {z_0 \in U} to {0}, pick one that maximises the magnitude {|f'(z_0)|} of the derivative (ignoring for this discussion the issue of proving that a maximiser exists). If {f(U)} avoids some point in {D(0,1)}, one can compose {f} with various holomorphic maps and use Schwarz’s lemma and the chain rule to increase {|f'(z_0)|} without destroying injectivity; see the previous lecture notes for details. The conformal map {\phi: U \rightarrow D(0,1)} is unique up to Möbius automorphisms of the disk; one can fix the map by picking two distinct points {z_0,z_1} in {U}, and requiring {\phi(z_0)} to be zero and {\phi(z_1)} to be positive real.

It is a beautiful observation of Thurston that the concept of a conformal mapping has a discrete counterpart, namely the mapping of one circle packing to another. Furthermore, one can run a version of Koebe’s argument (using now a discrete version of Perron’s method) to prove the Riemann mapping theorem through circle packings. In principle, this leads to a mostly elementary approach to conformal geometry, based on extremely classical mathematics that goes all the way back to Apollonius. However, in order to prove the basic existence and uniqueness theorems of circle packing, as well as the convergence to conformal maps in the continuous limit, it seems to be necessary (or at least highly convenient) to use much more modern machinery, including the theory of quasiconformal mapping, and also the Riemann mapping theorem itself (so in particular we are not structuring these notes to provide a completely independent proof of that theorem, though this may well be possible).

To make the above discussion more precise we need some notation.

Definition 2 (Circle packing) A (finite) circle packing is a finite collection {(C_j)_{j \in J}} of circles {C_j = \{ z \in {\bf C}: |z-z_j| = r_j\}} in the complex numbers indexed by some finite set {J}, whose interiors are all disjoint (but which are allowed to be tangent to each other), and whose union is connected. The nerve of a circle packing is the finite graph whose vertices {\{z_j: j \in J \}} are the centres of the circle packing, with two such centres connected by an edge if the circles are tangent. (In these notes all graphs are undirected, finite and simple, unless otherwise specified.)

It is clear that the nerve of a circle packing is connected and planar, since one can draw the nerve by placing each vertex (tautologically) in its location in the complex plane, and drawing each edge by the line segment between the centres of the circles it connects (this line segment will pass through the point of tangency of the two circles). Later in these notes we will also have to consider some infinite circle packings, most notably the infinite regular hexagonal circle packing.
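To make the definition concrete, the following sketch computes the nerve of a finite packing given as a list of centres and radii. The function name `nerve`, the tolerance, and the example configuration (three mutually tangent unit circles with a small fourth circle in the gap between them) are all assumptions made for illustration.

```python
import itertools
import math

def nerve(circles, tol=1e-9):
    """Return the edges (i, j) of the nerve: index pairs of externally tangent
    circles. `circles` is a list of (centre, radius) with complex centres.
    Raises if two interiors overlap, since then the input is not a packing.
    Connectedness of the union is not checked here."""
    edges = []
    pairs = itertools.combinations(enumerate(circles), 2)
    for (i, (zi, ri)), (j, (zj, rj)) in pairs:
        gap = abs(zi - zj) - (ri + rj)
        if gap < -tol:
            raise ValueError(f"circles {i} and {j} overlap")
        if abs(gap) <= tol:
            edges.append((i, j))
    return edges

# Hypothetical example: unit circles centred at the corners of an equilateral
# triangle of side 2, plus the small circle that fits in the middle.
s = 2 / math.sqrt(3)                      # distance from centroid to a corner
angles = (math.pi / 2, math.pi / 2 + 2 * math.pi / 3, math.pi / 2 + 4 * math.pi / 3)
circles = [(s * complex(math.cos(a), math.sin(a)), 1.0) for a in angles]
circles.append((0j, s - 1.0))
edges = nerve(circles)
print(edges)
```

In this example every pair of circles is tangent, so the nerve is the complete graph on four vertices.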

The first basic theorem in the subject is the following converse statement:

Theorem 3 (Circle packing theorem) Every connected planar graph is the nerve of a circle packing.

Of course, there can be multiple circle packings associated to a given connected planar graph; indeed, since reflections across a line and Möbius transformations map circles to circles (or lines), they will map circle packings to circle packings (unless one or more of the circles is sent to a line). It turns out that once one adds enough edges to the planar graph, the circle packing is otherwise rigid:

Theorem 4 (Koebe-Andreev-Thurston theorem) If a connected planar graph is maximal (i.e., no further edge can be added to it without destroying planarity), then the circle packing given by the above theorem is unique up to reflections and Möbius transformations.

Exercise 5 Let {G} be a connected planar graph with {n \geq 3} vertices. Show that the following are equivalent:

  • (i) {G} is a maximal planar graph.
  • (ii) {G} has {3n-6} edges.
  • (iii) Every drawing {D} of {G} divides the plane into faces that have three edges each. (This includes one unbounded face.)
  • (iv) At least one drawing {D} of {G} divides the plane into faces that have three edges each.

(Hint: use Euler’s formula {V-E+F=2}, where {F} is the number of faces including the unbounded face.)
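Without spoiling the exercise, one can at least sanity-check the counts in (ii) and (iii) on small examples. The sketch below assumes the input graph is connected and planar (neither is verified), and the helper names and example graphs are hypothetical.

```python
def edge_and_face_counts(n_vertices, edges):
    """For a connected planar graph, return (E, F), where F is obtained from
    Euler's formula V - E + F = 2 (so F counts the unbounded face too)."""
    e = len({frozenset(pair) for pair in edges})   # ignore duplicate edges
    return e, 2 - n_vertices + e

def satisfies_edge_criterion(n_vertices, edges):
    """Criterion (ii): the graph has exactly 3n - 6 edges."""
    e, _ = edge_and_face_counts(n_vertices, edges)
    return e == 3 * n_vertices - 6

# Octahedron: poles 0 and 5, equator 1-2-3-4.
octahedron = [(0, 1), (0, 2), (0, 3), (0, 4), (5, 1), (5, 2), (5, 3), (5, 4),
              (1, 2), (2, 3), (3, 4), (4, 1)]
# A square with one diagonal: planar but not maximal.
square_with_diagonal = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]

e, f = edge_and_face_counts(6, octahedron)
print(e, f, 3 * f == 2 * e)   # all faces triangular means 3F = 2E
print(satisfies_edge_criterion(4, square_with_diagonal))
```

For the octahedron the identity {3F = 2E} holds, consistent with every face having three edges; for the square with a diagonal the unbounded face has four edges and the edge count falls short of {3n-6}.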

Thurston conjectured that circle packings can be used to approximate the conformal map arising in the Riemann mapping theorem. Here is an informal statement:

Conjecture 6 (Informal Thurston conjecture) Let {U} be a simply connected domain, with two distinct points {z_0,z_1}. Let {\phi: U \rightarrow D(0,1)} be the conformal map from {U} to {D(0,1)} that maps {z_0} to the origin and {z_1} to a positive real. For any small {\varepsilon>0}, let {{\mathcal C}_\varepsilon} be the portion of the regular hexagonal circle packing by circles of radius {\varepsilon} that are contained in {U}, and let {{\mathcal C}'_\varepsilon} be a circle packing of {D(0,1)} with all “boundary circles” tangent to {D(0,1)}, giving rise to an “approximate map” {\phi_\varepsilon: U_\varepsilon \rightarrow D(0,1)} defined on the subset {U_\varepsilon} of {U} consisting of the circles of {{\mathcal C}_\varepsilon}, their interiors, and the interstitial regions between triples of mutually tangent circles. Normalise this map so that {\phi_\varepsilon(z_0)} is zero and {\phi_\varepsilon(z_1)} is a positive real. Then {\phi_\varepsilon} converges to {\phi} as {\varepsilon \rightarrow 0}.

A rigorous version of this conjecture was proven by Rodin and Sullivan. Besides some elementary geometric lemmas (regarding the relative sizes of various configurations of tangent circles), the main ingredients are a rigidity result for the regular hexagonal circle packing, and the theory of quasiconformal maps. Quasiconformal maps are what seem on the surface to be a very broad generalisation of the notion of a conformal map. Informally, conformal maps take infinitesimal circles to infinitesimal circles, whereas quasiconformal maps take infinitesimal circles to infinitesimal ellipses of bounded eccentricity. In terms of Wirtinger derivatives, conformal maps obey the Cauchy-Riemann equation {\frac{\partial \phi}{\partial \overline{z}} = 0}, while (sufficiently smooth) quasiconformal maps only obey an inequality {|\frac{\partial \phi}{\partial \overline{z}}| \leq \frac{K-1}{K+1} |\frac{\partial \phi}{\partial z}|}. As such, quasiconformal maps are considerably more plentiful than conformal maps, and in particular it is possible to create piecewise smooth quasiconformal maps by gluing together various simple maps such as affine maps or Möbius transformations; such piecewise maps will naturally arise when trying to rigorously build the map {\phi_\varepsilon} alluded to in the above conjecture. On the other hand, it turns out that quasiconformal maps still have many vestiges of the rigidity properties enjoyed by conformal maps; for instance, there are quasiconformal analogues of fundamental theorems in conformal mapping such as the Schwarz reflection principle, Liouville’s theorem, or Hurwitz’s theorem. Among other things, these quasiconformal rigidity theorems allow one to create conformal maps from the limit of quasiconformal maps in many circumstances, and this will be how the Thurston conjecture will be proven. 
A key technical tool in establishing these sorts of rigidity theorems will be the theory of an important quasiconformal (quasi-)invariant, the conformal modulus (or, equivalently, the extremal length, which is the reciprocal of the modulus).
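The simplest non-conformal example is a real-linear map {z \mapsto a z + b \overline{z}} with {|b| < |a|}, whose Wirtinger derivatives are the constants {a} and {b}. The sketch below (the function names and the sample map are assumptions chosen for illustration) computes the smallest {K} for which the inequality above holds, namely {K = (|a|+|b|)/(|a|-|b|)}, and checks numerically that the unit circle is sent to an ellipse whose axis ratio is that same {K}.

```python
import cmath
import math

def dilatation(a, b):
    """Smallest K with |b| <= (K-1)/(K+1) * |a| for the map z -> a z + b conj(z)."""
    mu = abs(b) / abs(a)
    if mu >= 1:
        raise ValueError("need |b| < |a| for an orientation-preserving map")
    return (1 + mu) / (1 - mu)

def axis_ratio(a, b, n=3600):
    """Ratio of the longest to the shortest radius of the image of the unit
    circle under z -> a z + b conj(z), estimated by sampling n points."""
    radii = []
    for k in range(n):
        w = cmath.exp(2j * math.pi * k / n)
        radii.append(abs(a * w + b * w.conjugate()))
    return max(radii) / min(radii)

# Illustrative map: (x, y) -> (2x, y), i.e. a = 3/2, b = 1/2.
a, b = 1.5, 0.5
K = dilatation(a, b)
print(K, axis_ratio(a, b))
```

Here the horizontal stretch by a factor of two gives {K = 2}, and the inequality {|b| \leq \frac{K-1}{K+1} |a|} holds with equality.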

— 1. Proof of the circle packing theorem —

We loosely follow the treatment of Beardon and Stephenson. It is slightly more convenient to temporarily work in the Riemann sphere {{\bf C} \cup \{\infty\}} rather than the complex plane {{\bf C}}, in order to more easily use Möbius transformations. (Later we will make another change of venue, working in the Poincaré disk {D(0,1)} instead of the Riemann sphere.)

Define a Riemann sphere circle to be either a circle in {{\bf C}} or a line in {{\bf C}} together with {\infty}, together with one of the two components of the complement of this circle or line designated as the “interior”. In the case of a line, this “interior” is just one of the two half-planes on either side of the line; in the case of the circle, this is either the usual interior or the usual exterior plus the point at infinity; in the last case, we refer to the Riemann sphere circle as an exterior circle. (One could also equivalently work with an orientation on the circle rather than assigning an interior, since the interior could then be described as the region to (say) the left of the circle as one traverses the circle along the indicated orientation.) Note that Möbius transforms map Riemann sphere circles to Riemann sphere circles. If one views the Riemann sphere as a geometric sphere in Euclidean space {{\bf R}^3}, then Riemann sphere circles are just circles on this geometric sphere, which then have a centre on this sphere that lies in the region designated as the interior of the circle. We caution though that this “Riemann sphere” centre does not always correspond to the Euclidean notion of the centre of a circle. For instance, the real line, with the upper half-plane designated as interior, will have {i} as its Riemann sphere centre; if instead one designates the lower half-plane as the interior, the Riemann sphere centre will now be {-i}. We can then define a Riemann sphere circle packing in exact analogy with circle packings in {{\bf C}}, namely finite collections of Riemann sphere circles whose interiors are disjoint and whose union is connected; we also define the nerve as before. This is now a graph that can be drawn in the Riemann sphere, using great circle arcs in the Riemann sphere rather than line segments; it is also planar, since one can apply a Möbius transformation to move all the points and edges of the drawing away from infinity.

By Exercise 5, a maximal planar graph with at least three vertices can be drawn as a triangulation of the Riemann sphere. If there are at least four vertices, then it is easy to see that each vertex has degree at least three (a vertex of degree zero, one or two in a triangulation with simple edges will lead to a connected component of at most three vertices). It is a topological fact, not established here, that any two triangulations of such a graph are homotopic up to reflection (to reverse the orientation). If a Riemann sphere circle packing has the nerve of a maximal planar graph {G} of at least four vertices, then we see that this nerve induces an explicit triangulation of the Riemann sphere by connecting the centres of any pair of tangent circles with the great circle arc that passes through the point of tangency. If {G} was not maximal, one no longer gets a triangulation this way, but one still obtains a partition of the Riemann sphere into spherical polygons.

We remark that the triangles in this triangulation can also be described purely from the abstract graph {G}. Define a triangle in {G} to be a triple {w_1,w_2,w_3} of vertices in {G} which are all adjacent to each other, and such that the removal of these three vertices from {G} does not disconnect the graph. One can check that there is a one-to-one correspondence between such triangles in a maximal planar graph {G} and the triangles in any Riemann sphere triangulation of this graph.
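The graph-theoretic description of triangles translates directly into a brute-force search. The following is a sketch under the assumption that the graph is given as an edge list; the helper names and the example (a triangular bipyramid, which contains a separating triple) are hypothetical.

```python
import itertools

def adjacency(edges):
    """Build a dict mapping each vertex to the set of its neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def is_connected(adj, vertices):
    """Check whether the subgraph induced on `vertices` is connected
    (the empty set is treated as connected)."""
    vertices = set(vertices)
    if not vertices:
        return True
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in vertices and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices

def triangles(adj):
    """Triples of mutually adjacent vertices whose removal leaves the graph connected."""
    found = []
    for a, b, c in itertools.combinations(sorted(adj), 3):
        mutually_adjacent = b in adj[a] and c in adj[a] and c in adj[b]
        if mutually_adjacent and is_connected(adj, set(adj) - {a, b, c}):
            found.append((a, b, c))
    return found

# Triangular bipyramid: equator 0, 1, 2 with apexes 3 and 4.
bipyramid = adjacency([(0, 1), (1, 2), (0, 2),
                       (0, 3), (1, 3), (2, 3),
                       (0, 4), (1, 4), (2, 4)])
faces = triangles(bipyramid)
print(faces)
```

The equatorial triple {0,1,2} is mutually adjacent but separates the two apexes, so it is rejected; the six surviving triples match the {2n-4 = 6} faces that Euler's formula predicts for a triangulation with {n=5} vertices.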

Theorems 3, 4 are then a consequence of

Theorem 7 (Riemann sphere circle packing theorem) Let {G} be a maximal planar graph with at least four vertices, drawn as a triangulation of the Riemann sphere. Then there exists a Riemann sphere circle packing with nerve {G} whose triangulation is homotopic to the given triangulation. Furthermore, this packing is unique up to Möbius transformations.

Exercise 8 Deduce Theorems 3, 4 from Theorem 7. (Hint: If one has a non-maximal planar graph for Theorem 3, add a vertex at the interior of each non-triangular face of a drawing of that graph, and connect that vertex to the vertices of the face, to create a maximal planar graph to which Theorem 4 or Theorem 7 can be applied. Then delete these “helper vertices” to create a packing of the original planar graph that does not contain any “unwanted” tangencies. You may use without proof the above assertion that any two triangulations of a maximal planar graph are homotopic up to reflection.)

Exercise 9 Verify Theorem 7 when {G} has exactly four vertices. (Hint: for the uniqueness, one can use Möbius transformations to move two of the circles to become parallel lines.)

To prove this theorem, we will make a reduction with regards to the existence component of Theorem 7. For technical reasons we will need to introduce a notion of non-degeneracy. Let {G} be a maximal planar graph with at least four vertices, and let {v} be a vertex in {G}. As discussed above, the degree {d} of {v} is at least three. Writing the neighbours of {v} in clockwise or counterclockwise order (with respect to a triangulation) as {v_1,\dots,v_d} (starting from some arbitrary neighbour), we see that each {v_i} is adjacent to {v_{i-1}} and {v_{i+1}} (with the conventions {v_0=v_d} and {v_{d+1}=v_1}). We say that {v} is non-degenerate if there are no further adjacencies between the {v_1,\dots,v_d}, and if there is at least one further vertex in {G} besides {v,v_1,\dots,v_d}. Here is another characterisation:

Exercise 10 Let {G} be a maximal planar graph with at least four vertices, let {v} be a vertex in {G}, and let {v_1,\dots,v_d} be the neighbours of {v}. Show that the following are equivalent:

  • (i) {v} is non-degenerate.
  • (ii) The graph {G \backslash \{ v, v_1, \dots, v_d \}} is connected and non-empty, and every vertex in {v_1,\dots,v_d} is adjacent to at least one vertex in {G \backslash \{ v, v_1, \dots, v_d \}}.

We will then derive Theorem 7 from

Theorem 11 (Inductive step) Let {G} be a maximal planar graph with at least four vertices {V}, drawn as a triangulation of the Riemann sphere. Let {v} be a non-degenerate vertex of {G}, and let {G - \{v\}} be the graph formed by deleting {v} (and edges emanating from {v}) from {G}. Suppose that there exists a Riemann sphere circle packing {(C_w)_{w \in V \backslash \{v\}}} whose nerve is at least {G - \{v\}} (that is, {C_w} and {C_{w'}} are tangent whenever {w,w'} are adjacent in {G - \{v\}}, although we also allow additional tangencies), and whose associated subdivision of the Riemann sphere into spherical polygons is homotopic to the given triangulation with {v} removed. Then there is a Riemann sphere circle packing {(\tilde C_w)_{w \in V}} with nerve {G} whose triangulation is homotopic to the given triangulation. Furthermore this circle packing {(\tilde C_w)_{w \in V}} is unique up to Möbius transformations.

Let us now see how Theorem 7 follows from Theorem 11. Fix {G} as in Theorem 7. By Exercise 9 and induction we may assume that {G} has at least five vertices, and that the claim has been proven for any smaller number of vertices.

First suppose that {G} contains a non-degenerate vertex {v}. Let {v_1,\dots,v_d} be the neighbours of {v}. One can then form a new graph {G'} with one fewer vertex by deleting {v}, and then connecting {v_3,\dots,v_{d-1}} to {v_1} (one can think of this operation as contracting the edge {\{v,v_1\}} to a point). One can check that this is still a maximal planar graph that can triangulate the Riemann sphere in a fashion compatible with the original triangulation of {G} (in that all the common vertices, edges, and faces are unchanged). By induction hypothesis, {G'} is the nerve of a circle packing that is compatible with this triangulation, and hence this circle packing has nerve at least {G - \{v\}}. Applying Theorem 11, we then obtain the required claim for {G}.

Now suppose that {G} contains a degenerate vertex {v}. Let {v_1,\dots,v_d} be the neighbours of {v} traversed in order. By hypothesis, there is an additional adjacency between the {v_1,\dots,v_d}; by relabeling we may assume that {v_1} is adjacent to {v_k} for some {3 \leq k \leq d-1}. The vertices {V} in {G} can then be partitioned as

\displaystyle  V = \{v\} \cup \{ v_1,\dots,v_d\} \cup V_1 \cup V_2

where {V_1} denotes those vertices in {V \backslash \{ v_1,\dots,v_d\}} that lie in the region enclosed by the loop {v_1,\dots,v_k, v_1} that does not contain {v}, and {V_2} denotes those vertices in {V \backslash \{ v_1,\dots,v_d\}} that lie in the region enclosed by the loop {v_k,\dots,v_d,v_1, v_k} that does not contain {v}. One can then form two graphs {G_1, G_2}, formed by restricting {G} to the vertices {\tilde V_1 := \{v, v_1,\dots,v_k\} \cup V_1} and {\tilde V_2 := \{ v, v_k, \dots, v_d, v_1\} \cup V_2} respectively; furthermore, these graphs are also maximal planar (with triangulations that are compatible with those of {G}). By induction hypothesis, we can find a circle packing {(C_w)_{w \in \tilde V_1}} with nerve {G_1}, and a circle packing {(C'_w)_{w \in \tilde V_2}} with nerve {G_2}. Note that the circles {C_v, C_{v_1}, C_{v_k}} are mutually tangent, as are {C'_v, C'_{v_1}, C'_{v_k}}. By applying a Möbius transformation one may assume that these circles agree, thus (cf. Exercise 9) {C_v = C'_v}, {C_{v_1} = C'_{v_1}, C_{v_k} = C'_{v_k}}. The complement of these three circles (and their interiors) determines two connected “interstitial” regions (that are in the shape of an arbelos, up to Möbius transformation); one can check that the remaining circles in {(C_w)_{w \in \tilde V_1}} will lie in one of these regions, and the remaining circles in {(C'_w)_{w \in \tilde V_2}} lie in the other. Hence one can glue these circle packings together to form a single circle packing with nerve {G}, which is homotopic to the given triangulation. Also, since a Möbius transformation that fixes three mutually tangent circles has to be the identity, the uniqueness of this circle packing up to Möbius transformations follows from the uniqueness for the two component circle packings {(C_w)_{w \in \tilde V_1}}, {(C'_w)_{w \in \tilde V_2}}.

It remains to prove Theorem 11. To help fix the freedom to apply Möbius transformations, we can normalise the target circle packing {(\tilde C_w)_{w \in V}} so that {\tilde C_v} is the exterior circle {\{ |z|=1\}}, thus all the other circles {\tilde C_w} in the packing will lie in the closed unit disk {\overline{D(0,1)}}. Similarly, by applying a suitable Möbius transformation one can assume that {\infty} lies outside of the interior of all the circles {C_w} in the original packing, and after a scaling one may then assume that all the circles {C_w} lie in the unit disk {D(0,1)}.

At this point it becomes convenient to switch from the “elliptic” conformal geometry of the Riemann sphere {{\bf C} \cup \{\infty\}} to the “hyperbolic” conformal geometry of the unit disk {D(0,1)}. Recall that the Möbius transformations that preserve the disk {D(0,1)} are given by the maps

\displaystyle  z \mapsto e^{i\theta} \frac{z-\alpha}{1-\overline{\alpha} z} \ \ \ \ \ (1)

for real {\theta} and {\alpha \in D(0,1)} (see Theorem 19 of these notes). It comes with a natural metric that interacts well with circles:

Exercise 12 Define the Poincaré distance {d(z_1,z_2)} between two points of {D(0,1)} by the formula

\displaystyle  d(z_1,z_2) := 2 \mathrm{arctanh} |\frac{z_1-z_2}{1-z_1 \overline{z_2}}|.

Given a measurable subset {E} of {D(0,1)}, define the hyperbolic area of {E} to be the quantity

\displaystyle  \mathrm{area}(E) := \int_E \frac{4\ dx dy}{(1-|z|^2)^2}

where {dx dy} is the Euclidean area element on {D(0,1)}.

  • (i) Show that the Poincaré distance is invariant with respect to Möbius automorphisms of {D(0,1)}, thus {d(Tz_1, Tz_2) = d(z_1,z_2)} whenever {T} is a transformation of the form (1). Similarly show that the hyperbolic area is invariant with respect to such transformations.
  • (ii) Show that the Poincaré distance defines a metric on {D(0,1)}. Furthermore, show that any two distinct points {z_1,z_2} are connected by a unique geodesic, which is a portion of either a line or a circle that meets the unit circle orthogonally at two points. (Hint: use the symmetries of (i) to normalise the points one is studying.)
  • (iii) If {C} is a circle in the interior of {D(0,1)}, show that there exists a point {z_C} in {D(0,1)} and a positive real number {r_C} (which we call the hyperbolic center and hyperbolic radius respectively) such that {C = \{ z \in D(0,1): d(z,z_C) = r_C \}}. (In general, the hyperbolic center and radius will not quite agree with their familiar Euclidean counterparts.) Conversely, show that for any {z_C \in D(0,1)} and {r_C > 0}, the set {\{ z \in D(0,1): d(z,z_C) = r_C \}} is a circle in {D(0,1)}.
  • (iv) If two circles {C_1, C_2} in {D(0,1)} are externally tangent, show that the geodesic connecting the hyperbolic centers {z_{C_1}, z_{C_2}} passes through the point of tangency, orthogonally to the two tangent circles.
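Parts (i) and (iii) of this exercise lend themselves to a numerical sanity check. The sketch below is an illustration rather than a proof: the function names, the sample points and the sample circle are assumptions. The hyperbolic centre and radius are computed by using the fact that the diameter through the Euclidean centre is a geodesic along which the Poincaré distance from the origin is {2 \mathrm{arctanh}} of the (signed) Euclidean coordinate.

```python
import cmath
import math

def poincare_distance(z1, z2):
    """Poincare distance on the unit disk, as defined in the exercise."""
    return 2 * math.atanh(abs((z1 - z2) / (1 - z1 * z2.conjugate())))

def disk_automorphism(theta, alpha):
    """The map z -> e^{i theta} (z - alpha) / (1 - conj(alpha) z) from (1)."""
    def T(z):
        return cmath.exp(1j * theta) * (z - alpha) / (1 - alpha.conjugate() * z)
    return T

def hyperbolic_centre_and_radius(c, rho):
    """Hyperbolic centre and radius of the Euclidean circle |z - c| = rho,
    assumed to lie strictly inside the unit disk."""
    m = abs(c)
    if m + rho >= 1:
        raise ValueError("circle must lie inside the open unit disk")
    near = 2 * math.atanh(m - rho)     # signed hyperbolic coordinates of the
    far = 2 * math.atanh(m + rho)      # two points on the diameter through c
    direction = c / m if m > 0 else 1.0
    centre = complex(math.tanh((near + far) / 4) * direction)
    return centre, (far - near) / 2

# (i): invariance under a sample automorphism.
T = disk_automorphism(0.7, 0.3 - 0.4j)
z1, z2 = 0.1 + 0.2j, -0.5 + 0.3j
invariance_gap = abs(poincare_distance(T(z1), T(z2)) - poincare_distance(z1, z2))

# (iii): every point of a sample Euclidean circle is at hyperbolic distance r from p.
c, rho = 0.4 + 0.2j, 0.3
p, r = hyperbolic_centre_and_radius(c, rho)
worst = max(abs(poincare_distance(c + rho * cmath.exp(2j * math.pi * k / 100), p) - r)
            for k in range(100))
print(invariance_gap, worst, abs(p), abs(c))
```

In this example the hyperbolic centre `p` lies further from the origin than the Euclidean centre `c`, illustrating the parenthetical remark in (iii).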

Exercise 13 (Schwarz-Pick theorem) Let {f: D(0,1) \rightarrow D(0,1)} be a holomorphic map. Show that {d(f(z_1),f(z_2)) \leq d(z_1,z_2)} for all {z_1,z_2 \in D(0,1)}. If {z_1 \neq z_2}, show that equality occurs if and only if {f} is a Möbius automorphism (1) of {D(0,1)}. (This result is known as the Schwarz-Pick theorem.)
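One can watch the Schwarz-Pick inequality in action on a grid of sample points. In the sketch below the test map {z \mapsto z^2}, the automorphism parameters and the grid are arbitrary choices for illustration; the code only checks finitely many pairs and is of course no substitute for the proof.

```python
import cmath
import math

def poincare_distance(z1, z2):
    """Poincare distance on the unit disk."""
    return 2 * math.atanh(abs((z1 - z2) / (1 - z1 * z2.conjugate())))

def contraction_ratios(f, points):
    """Ratios d(f(z1), f(z2)) / d(z1, z2) over all pairs of distinct sample points."""
    ratios = []
    for i, z1 in enumerate(points):
        for z2 in points[i + 1:]:
            ratios.append(poincare_distance(f(z1), f(z2)) / poincare_distance(z1, z2))
    return ratios

def automorphism(z, theta=0.4, alpha=0.5j):
    """A sample Mobius automorphism of the disk of the form (1)."""
    return cmath.exp(1j * theta) * (z - alpha) / (1 - alpha.conjugate() * z)

# A coarse grid of distinct points inside the unit disk.
grid = [complex(x, y) / 10
        for x in range(-9, 10, 3) for y in range(-9, 10, 3)
        if x * x + y * y < 100]

square_ratios = contraction_ratios(lambda z: z * z, grid)
auto_ratios = contraction_ratios(automorphism, grid)
print(max(square_ratios), min(auto_ratios), max(auto_ratios))
```

The squaring map strictly contracts every sampled pair, while the automorphism preserves all distances up to rounding error, matching the equality case of the theorem.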

We will refer to circles that lie in the closure {\overline{D(0,1)}} of the unit disk as hyperbolic circles. These can be divided into the finite radius hyperbolic circles, which lie in the interior of the unit disk (as per part (iii) of the above exercise), and the horocycles, which are internally tangent to the unit circle. By convention, we view horocycles as having infinite radius, and having center at their point of tangency to the unit circle; they can be viewed as the limiting case of finite radius hyperbolic circles when the radius goes to infinity and the center goes off to the boundary of the disk (at the same rate as the radius, as measured with respect to the Poincaré distance). We write {C(p,r)} for the hyperbolic circle with hyperbolic centre {p} and hyperbolic radius {r} (thus either {0 < r < \infty} and {p \in D(0,1)}, or {r = \infty} and {p} is on the unit circle); there is an annoying caveat that when {r=\infty} there is more than one horocycle {C(p,\infty)} with hyperbolic centre {p}, but we will tolerate this breakdown of functional dependence of {C} on {p} and {r} in order to simplify the notation. A hyperbolic circle packing is a circle packing {(C(p_v,r_v))_{v \in V}} in which all circles are hyperbolic circles.

We also observe that the geodesic structure extends to the boundary of the unit disk: for any two distinct points {z_1,z_2} in {\overline{D(0,1)}}, there is a unique geodesic that connects them.

In view of the above discussion, Theorem 11 may now be formulated as follows:

Theorem 14 (Inductive step, hyperbolic formulation) Let {G} be a maximal planar graph with at least four vertices {V}, let {v} be a non-degenerate vertex of {G}, and let {v_1,\dots,v_d} be the vertices adjacent to {v}. Suppose that there exists a hyperbolic circle packing {(C(p_w,r_w))_{w \in V \backslash \{v\}}} whose nerve is at least {G - \{v\}}. Then there is a hyperbolic circle packing {(C(\tilde p_w,\tilde r_w))_{w \in V \backslash \{v\}}} homotopic to {(C(p_w,r_w))_{w \in V \backslash \{v\}}} such that the boundary circles {C(\tilde p_{v_j}, \tilde r_{v_j})}, {j=1,\dots,d} are all horocycles. Furthermore, this packing is unique up to Möbius automorphisms (1) of the disk {D(0,1)}.

Indeed, once one adjoins the exterior unit circle to {(C(\tilde p_w,\tilde r_w))_{w \in V \backslash \{v\}}}, one obtains a Riemann sphere circle packing whose nerve is at least {G}, and hence equal to {G} since {G} is maximal.

To prove this theorem, the intuition is to “inflate” the hyperbolic radius of the circles of {C_w} until the boundary circles all become infinite radius (i.e., horocycles). The difficulty is that one cannot just arbitrarily increase the radius of any given circle without destroying the required tangency properties. The resolution to this difficulty given in the work of Beardon and Stephenson that we are following here was inspired by Perron’s method of subharmonic functions, in which one faced an analogous difficulty that one could not easily manipulate a harmonic function without destroying its harmonicity. There, the solution was to work instead with the more flexible class of subharmonic functions; here we similarly work with the concept of a subpacking.

We will need some preliminaries to define this concept precisely. We first need some hyperbolic trigonometry. We define a hyperbolic triangle to be the solid (and closed) region in {\overline{D(0,1)}} enclosed by three distinct points {z_1,z_2,z_3} in {\overline{D(0,1)}} and the geodesic arcs connecting them. (Note that we allow one or more of the vertices to be on the boundary of the disk, so that the sides of the triangle could have infinite length.) Let {T := (0,+\infty]^3 \backslash \{ (\infty,\infty,\infty)\}} be the space of triples {(r_1,r_2,r_3)} with {0 < r_1,r_2,r_3 \leq \infty} and not all of {r_1,r_2,r_3} infinite. We say that a hyperbolic triangle with vertices {p_1,p_2,p_3} is a {(r_1,r_2,r_3)}-triangle if there are hyperbolic circles {C(p_1,r_1), C(p_2,r_2), C(p_3,r_3)} with the indicated hyperbolic centres and hyperbolic radii that are externally tangent to each other; note that this implies that the sidelengths opposite {p_1,p_2,p_3} have length {r_2+r_3, r_1+r_3, r_1+r_2} respectively (see Figure 3 of Beardon and Stephenson). It is easy to see that for any {(r_1,r_2,r_3) \in T}, there exists a unique {(r_1,r_2,r_3)}-triangle in {\overline{D(0,1)}} up to reflections and Möbius automorphisms (use Möbius transforms to fix two of the hyperbolic circles, and consider all the circles externally tangent to both of these circles; the case when one or two of the {r_1,r_2,r_3} are infinite may need to be treated separately). As a consequence, there is a well defined angle {\alpha_i(r_1,r_2,r_3) \in [0,\pi)} for {i=1,2,3} subtended by the vertex {p_i} of an {(r_1,r_2,r_3)}-triangle. We need some basic facts from hyperbolic geometry:

Exercise 15 (Hyperbolic trigonometry)

  • (i) (Hyperbolic cosine rule) For any {0 < r_1,r_2,r_3 < \infty}, show that the quantity {\cos \alpha_1(r_1,r_2,r_3)} is equal to the ratio

    \displaystyle  \frac{\cosh( r_1+r_2) \cosh(r_1+r_3) - \cosh(r_2+r_3)}{\sinh(r_1+r_2) \sinh(r_1+r_3)}.

    Furthermore, establish the limiting angles

    \displaystyle  \alpha_1(\infty,r_2,r_3) = \alpha_1(\infty,\infty,r_3) = \alpha_1(\infty,r_2,\infty) = 0

    \displaystyle  \cos \alpha_1(r_1,\infty,r_3) = \frac{\cosh(r_1+r_3) - \exp(r_3-r_1)}{\sinh(r_1+r_3)}

    \displaystyle  \cos \alpha_1(r_1,r_2,\infty) = \frac{\cosh(r_1+r_2) - \exp(r_2-r_1)}{\sinh(r_1+r_2)}

    \displaystyle  \cos \alpha_1(r_1,\infty,\infty) = 1 - 2\exp(-2r_1).

    (Hint: to facilitate computations, use a Möbius transform to move the {p_1} vertex to the origin when the radius there is finite.) Conclude in particular that {\alpha_1: T \rightarrow [0,\pi)} is continuous (using the topology of the extended real line for each component of {T}). Discuss how this rule relates to the Euclidean cosine rule in the limit as {r_1,r_2,r_3} go to zero. Of course, by relabeling one obtains similar formulae for {\alpha_2(r_1,r_2,r_3)} and {\alpha_3(r_1,r_2,r_3)}.

  • (ii) (Area rule) Show that the area of a hyperbolic triangle is given by {\pi - \alpha_1-\alpha_2-\alpha_3}, where {\alpha_1,\alpha_2,\alpha_3} are the angles of the hyperbolic triangle. (Hint: there are several ways to proceed. For instance, one can prove this for small hyperbolic triangles (of diameter {O(\varepsilon)}) up to errors of size {o(\varepsilon^2)} after normalising as in (i), and then establish the general case by subdividing a large hyperbolic triangle into many small hyperbolic triangles. This rule is also a special case of the Gauss-Bonnet theorem in Riemannian geometry. One can also first establish the case when several of the radii are infinite, and use that to derive finite cases.) In particular, the area {\mathrm{Area}(r_1,r_2,r_3)} of a {(r_1,r_2,r_3)}-triangle is given by the formula

    \displaystyle  \pi - \alpha_1(r_1,r_2,r_3) - \alpha_2(r_1,r_2,r_3) - \alpha_3(r_1,r_2,r_3). \ \ \ \ \ (2)

  • (iii) Show that the area of the interior of a hyperbolic circle {C(p,r)} with {r<\infty} is equal to {4\pi \sinh^2(r/2)}.
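The formulae in this exercise are straightforward to evaluate numerically, which gives a cheap consistency check on the limiting cases. The sketch below (function names are hypothetical, and infinite radii are represented by `math.inf`) implements {\alpha_1} on all of {T}, the area formula (2), and a midpoint-rule integration of the hyperbolic area element over a disk centred at the origin, using the fact that the hyperbolic circle of radius {r} about {0} has Euclidean radius {\tanh(r/2)}.

```python
import math

INF = math.inf

def angle(r1, r2, r3):
    """alpha_1(r1, r2, r3): angle at the vertex carrying radius r1, using the
    cosine rule and its limiting forms from part (i)."""
    if r1 == INF:
        return 0.0
    if r2 == INF and r3 == INF:
        c = 1 - 2 * math.exp(-2 * r1)
    elif r2 == INF:
        c = (math.cosh(r1 + r3) - math.exp(r3 - r1)) / math.sinh(r1 + r3)
    elif r3 == INF:
        c = (math.cosh(r1 + r2) - math.exp(r2 - r1)) / math.sinh(r1 + r2)
    else:
        c = ((math.cosh(r1 + r2) * math.cosh(r1 + r3) - math.cosh(r2 + r3))
             / (math.sinh(r1 + r2) * math.sinh(r1 + r3)))
    return math.acos(max(-1.0, min(1.0, c)))   # clamp against rounding

def triangle_area(r1, r2, r3):
    """Formula (2); alpha_2 and alpha_3 are obtained by relabelling."""
    return math.pi - angle(r1, r2, r3) - angle(r2, r1, r3) - angle(r3, r1, r2)

def disk_area_numeric(r, n=20000):
    """Midpoint-rule integral of 4 dx dy / (1 - |z|^2)^2 over the disk of
    hyperbolic radius r centred at the origin, in polar coordinates."""
    R = math.tanh(r / 2)
    h = R / n
    total = 0.0
    for k in range(n):
        rho = (k + 0.5) * h
        total += 8 * math.pi * rho / (1 - rho * rho) ** 2 * h
    return total

print(angle(1.0, 30.0, 0.5), angle(1.0, INF, 0.5))   # continuity at infinity
print(angle(1e-3, 1e-3, 1e-3), math.pi / 3)          # Euclidean limit
print(triangle_area(1.0, 1.0, 1.0))
print(disk_area_numeric(1.0), 4 * math.pi * math.sinh(0.5) ** 2)
```

For three tiny equal radii the angle approaches {\pi/3}, as one expects from the Euclidean cosine rule applied to an equilateral triangle.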

Henceforth we fix {G, v, v_1,\dots,v_d, {\mathcal C} = (C(p_w,r_w))_{w \in V \backslash \{v\}}} as in Theorem 14. We refer to the vertices {v_1,\dots,v_d} as boundary vertices of {G - \{v\}} and the remaining vertices as interior vertices; edges between boundary vertices are boundary edges, all other edges will be called interior edges (including edges that have one vertex on the boundary). Triangles in {G -\{v\}} that involve two boundary vertices (and thus necessarily one interior vertex) will be called boundary triangles; all other triangles (including ones that involve one boundary vertex) will be called interior triangles. To any triangle {w_1,w_2,w_3} of {G - \{v\}}, we can form the hyperbolic triangle {\Delta_{\mathcal C}(w_1,w_2,w_3)} with vertices {p_{w_1}, p_{w_2}, p_{w_3}}; this is an {(r_{w_1}, r_{w_2}, r_{w_3})}-triangle. Let {\Sigma} denote the collection of such hyperbolic triangles; because {{\mathcal C}} is a packing, we see that these triangles have disjoint interiors. They also fit together in the following way: if {e} is a side of a hyperbolic triangle in {\Sigma}, then there will be another hyperbolic triangle in {\Sigma} that shares that side precisely when {e} is associated to an interior edge of {G - \{v\}}. The union of all these triangles is homeomorphic to the region formed by starting with a triangulation of the Riemann sphere by {G} and removing the triangles containing {v} as a vertex, and is therefore homeomorphic to a disk. One can think of the collection {\Sigma} of hyperbolic triangles, together with the vertices and edges shared by these triangles, as a two-dimensional (hyperbolic) simplicial complex, though we will not develop the full machinery of such complexes here.

Our objective is to find another hyperbolic circle packing {\tilde {\mathcal C} = (C(\tilde p_w, \tilde r_w))_{w \in V \backslash \{v\}}} homotopic to the existing circle packing {{\mathcal C}}, such that all the boundary circles (circles centred at boundary vertices) are horocycles. We observe that such a hyperbolic circle packing is completely described (up to Möbius transformations) by the hyperbolic radii {(\tilde r_w)_{w \in V \backslash \{v\}}} of these circles. Indeed, suppose one knows the values of these hyperbolic radii. Then each hyperbolic triangle {\Delta_{\mathcal C}(w_1,w_2,w_3)} in {\Sigma} is associated to a hyperbolic triangle {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} whose sides and angles are known from Exercise 15. As the orientation of each hyperbolic triangle is fixed, each hyperbolic triangle is determined up to a Möbius automorphism of {D(0,1)}. Once one fixes one hyperbolic triangle, the adjacent hyperbolic triangles (that share a common side with the first triangle) are then also fixed; continuing in this fashion we see that the entire hyperbolic circle packing {\tilde {\mathcal C}} is determined.

On the other hand, not every choice of radii {(\tilde r_w)_{w \in V \backslash \{v\}}} will lead to a hyperbolic circle packing {\tilde {\mathcal C}} with the required properties. There are two obvious constraints that need to be satisfied:

  • (i) (Local constraint) The angles {\alpha_1( \tilde r_w, \tilde r_{w_1}, \tilde r_{w_2})} of all the hyperbolic triangles {\Delta_{\tilde {\mathcal C}}(w,w_1,w_2)} around any given interior vertex {w} must sum to exactly {2\pi}.
  • (ii) (Boundary constraint) The radii associated to boundary vertices must be infinite.

There could potentially also be a global constraint, in that one requires the circles of the packing to be disjoint – including circles that are not necessarily adjacent to each other. In general, one can easily create configurations of circles that are local circle packings but not global ones (see e.g., Figure 7 of Beardon-Stephenson). However, it turns out that one can use the boundary constraint and topological arguments to prevent this from happening. We first need a topological lemma:

Lemma 16 (Topological lemma) Let {U, V} be bounded connected open subsets of {{\bf C}} with {V} simply connected, and let {f: \overline{U} \rightarrow \overline{V}} be a continuous map such that {f(\partial U) \subset \partial V} and {f(U) \subset V}. Suppose furthermore that the restriction of {f} to {U} is a local homeomorphism. Then {f} is in fact a global homeomorphism.

The requirement that the restriction of {f} to {U} be a local homeomorphism can in fact be relaxed to local injectivity thanks to the invariance of domain theorem. The complex numbers {{\bf C}} can be replaced here by any finite-dimensional vector space.

Proof: The preimage {f^{-1}(p)} of any point {p} in the interior of {V} is closed, discrete, and disjoint from {\partial U}, and is hence finite. Around each point in the preimage, there is a neighbourhood on which {f} is a homeomorphism onto a neighbourhood of {p}. If one deletes the closure of these neighbourhoods, the image under {f} is compact and avoids {p}, and thus avoids a neighbourhood of {p}. From this we can show that {f} is a covering map from {U} to {V}. As the base {V} is simply connected, it is its own universal cover, and hence (by the connectedness of {U}) {f} must be a homeomorphism as claimed. \Box

Proposition 17 Suppose we assign a radius {\tilde r_w \in (0,+\infty]} to each {w \in V \backslash \{v\}} that obeys the local constraint (i) and the boundary constraint (ii). Then there is a hyperbolic circle packing {(C(\tilde p_w, \tilde r_w))_{w \in V \backslash \{v\}}} with nerve {G - \{v\}} and the indicated radii.

Proof: We first create the hyperbolic triangles {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} associated with the required hyperbolic circle packing, and then verify that this indeed arises from a circle packing.

Start with a single triangle {(w^0_1,w^0_2,w^0_3)} in {G - \{v\}}, and arbitrarily select a {(\tilde r_{w^0_1}, \tilde r_{w^0_2}, \tilde r_{w^0_3})}-triangle {\Delta_{\tilde {\mathcal C}}(w^0_1,w^0_2,w^0_3)} with the same orientation as {\Delta_{{\mathcal C}}(w^0_1,w^0_2,w^0_3)}. By Exercise 15(i), such a triangle exists (and is unique up to Möbius automorphisms of the disk). If a hyperbolic triangle {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} has been fixed, and {(w_2,w_3,w_4)} (say) is an adjacent triangle in {G - \{v\}}, we can select {\Delta_{\tilde {\mathcal C}}(w_2,w_3,w_4)} to be the unique {(\tilde r_{w_2}, \tilde r_{w_3}, \tilde r_{w_4})}-triangle with the same orientation as {\Delta_{{\mathcal C}}(w_2,w_3,w_4)} that shares the {w_2,w_3} side in common with {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} (with the {w_2} and {w_3} vertices agreeing). Similarly for other permutations of the labels. As {G} is a maximal planar graph with {v} non-degenerate (so in particular the set of interior vertices is connected), we can continue this construction to eventually fix every triangle in {G - \{v\}}. There is the potential issue that a given triangle {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} may depend on the order in which one arrives at that triangle starting from {(w^0_1,w^0_2,w^0_3)}, but one can check from a monodromy argument (in the spirit of the monodromy theorem) using the local constraint (i) and the simply connected nature of the triangulation associated to {{\mathcal C}} that there is in fact no dependence on the order. (The process resembles that of laying down jigsaw pieces in the shape of hyperbolic triangles together, with the local constraint ensuring that there is always a flush fit locally.)

Now we show that the hyperbolic triangles {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} have disjoint interiors inside the disk {D(0,1)}. Let {X} denote the topological space formed by taking the disjoint union of the hyperbolic triangles {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} (now viewed as abstract topological spaces rather than subsets of the disk) and then gluing together all common edges, e.g. identifying the {\{w_2,w_3\}} edge of {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} with the same edge of {\Delta_{\tilde {\mathcal C}}(w_2,w_3,w_4)} if {(w_1,w_2,w_3)} and {(w_2,w_3,w_4)} are adjacent triangles in {G - \{v\}}. This space is homeomorphic to the union of the original hyperbolic triangles {\Delta_{{\mathcal C}}(w_1,w_2,w_3)}, and is thus homeomorphic to the closed unit disk. There is an obvious projection map {\pi} from {X} to the union of the {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)}, which maps the abstract copy in {X} of a given hyperbolic triangle {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} to its concrete counterpart in {\overline{D(0,1)}} in the obvious fashion. This map is continuous. It does not quite cover the full closed disk, mainly because (by the boundary condition (ii)) the boundary hyperbolic triangles {\Delta_{\tilde {\mathcal C}}(v_i,v_{i+1},w)} touch the boundary of the disk at the vertices associated to {v_i} and {v_{i+1}} but do not follow the boundary arc connecting these vertices, being bounded instead by the geodesic from the {v_i} vertex to the {v_{i+1}} vertex; the missing region is a lens-shaped region bounded by two circular arcs. However, by applying another homeomorphism (that does not alter the edges from {v_i} to {w} or {v_{i+1}} to {w}), one can “push out” the {\{v_i,v_{i+1}\}} edge of this hyperbolic triangle across the lens to become the boundary arc from {v_i} to {v_{i+1}}.
If one performs this modification for each boundary triangle, one arrives at a modified continuous map {\tilde \pi} from {X} to {\overline{D(0,1)}}, which now has the property that the boundary of {X} maps to the boundary of the disk, and the interior of {X} maps to the interior of the disk. Also one can check that this map is a local homeomorphism. By Lemma 16, {\tilde \pi} is injective; undoing the boundary modifications we conclude that {\pi} is injective. Thus the hyperbolic triangles {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} have disjoint interiors. Furthermore, the arguments show that for each boundary triangle {\Delta_{\tilde {\mathcal C}}(v_i,v_{i+1},w)}, the lens-shaped regions between the boundary arc between the vertices associated to {v_i, v_{i+1}} and the corresponding edge of the boundary triangle are also disjoint from the hyperbolic triangles and from each other. On the other hand, all of the hyperbolic circles in {{\tilde {\mathcal C}}} and their interiors are contained in the union of the hyperbolic triangles {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} and the lens-shaped regions, with each hyperbolic triangle containing portions only of the hyperbolic circles with hyperbolic centres at the vertices of the triangle, and similarly for the lens-shaped regions. From this one can verify that the interiors of the hyperbolic circles are all disjoint from each other, and give a hyperbolic circle packing with the required properties. \Box

In view of the above proposition, the only remaining task is to find an assignment of radii {(\tilde r_w)_{w \in V \backslash \{v\}}} obeying both the local condition (i) and the boundary condition (ii). This is analogous to finding a harmonic function with specified boundary data. To do this, we perform the following analogue of Perron’s method. Define a subpacking to be an assignment {(\tilde r_w)_{w \in V \backslash \{v\}}} of radii {\tilde r_w \in (0,+\infty]} obeying the following condition:

  • (i’) (Local sub-condition) The angles {\alpha_1( \tilde r_w, \tilde r_{w_1}, \tilde r_{w_2})} around any given interior vertex {w} sum to at least {2\pi}.

This can be compared with the definition of a (smooth) subharmonic function as one where the Laplacian is always at least zero. Note that we always have at least one subpacking, namely the one provided by the radii of the original hyperbolic circle packing {{\mathcal C}}. Intuitively, in each subpacking, the radius {\tilde r_w} at an interior vertex {w} is either “too small” or “just right”.

We now need a key monotonicity property, analogous to how the maximum of two subharmonic functions is again subharmonic:

Exercise 18 (Monotonicity)

  • (i) Show that the angle {\alpha_1( r_1, r_2, r_3)} (as defined in Exercise 15(i)) is strictly decreasing in {r_1} and strictly increasing in {r_2} or {r_3} (if one holds the other two radii fixed). Do these claims agree with your geometric intuition?
  • (ii) Conclude that whenever {{\mathcal R}' = (r'_w)_{w \in V \backslash \{v\}}} and {{\mathcal R}'' = (r''_w)_{w \in V \backslash \{v\}}} are subpackings, the pointwise maximum {\max( {\mathcal R}' , {\mathcal R}'' ) := (\max(r'_w, r''_w))_{w \in V \backslash \{v\}}} is also a subpacking.
  • (iii) Let {(r_1,r_2,r_3), (r'_1,r'_2,r'_3) \in T} be such that {r_i \leq r'_i} for {i=1,2,3}. Show that {\mathrm{Area}(r_1,r_2,r_3) \leq \mathrm{Area}(r'_1,r'_2,r'_3)}, with equality if and only if {r_i=r'_i} for all {i=1,2,3}. (Hint: increase just one of the radii {r_1,r_2,r_3}. One can either use calculus (after first disposing of various infinite radii cases) or one can argue geometrically.)
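As a sanity check on parts (i) and (iii), one can evaluate the angle numerically. The sketch below is only illustrative and rests on two assumptions that are not restated in this section: that an {(r_1,r_2,r_3)}-triangle with finite radii has side lengths {r_1+r_2}, {r_1+r_3}, {r_2+r_3} (the three circles being mutually tangent), and that the area is given by the angle defect {\pi - (\alpha_1+\alpha_2+\alpha_3)} in curvature {-1}. The function names `angle` and `area` are hypothetical helpers, not notation from the text.

```python
import math

def angle(r1, r2, r3):
    """Angle at the first vertex of a hyperbolic triangle (curvature -1) whose
    adjacent sides have lengths r1+r2 and r1+r3 and whose opposite side has
    length r2+r3. Finite radii only; uses the hyperbolic law of cosines."""
    a, b, c = r1 + r2, r1 + r3, r2 + r3
    cos_alpha = (math.cosh(a) * math.cosh(b) - math.cosh(c)) / (math.sinh(a) * math.sinh(b))
    # Clamp to [-1, 1] to guard against rounding error before calling acos.
    return math.acos(max(-1.0, min(1.0, cos_alpha)))

def area(r1, r2, r3):
    """Hyperbolic area via the angle defect: pi minus the sum of the three angles."""
    return math.pi - (angle(r1, r2, r3) + angle(r2, r3, r1) + angle(r3, r1, r2))

base = angle(1.0, 1.0, 1.0)
bigger_r1 = angle(2.0, 1.0, 1.0)   # enlarge the circle at the vertex being measured
bigger_r2 = angle(1.0, 2.0, 1.0)   # enlarge a neighbouring circle instead
print(base, bigger_r1, bigger_r2)
print(area(1.0, 1.0, 1.0), area(2.0, 1.0, 1.0))
```

On this sample, enlarging {r_1} shrinks the angle at the first vertex, enlarging {r_2} widens it, and the area grows when a radius grows, in line with what the exercise asks one to prove. A numerical check of this kind is of course no substitute for the proof.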

As with Perron’s method, we can now try to construct a hyperbolic circle packing by taking the supremum of all the subpackings. To avoid degeneracies we need an upper bound:

Proposition 19 (Upper bound) Let {(\tilde r_w)_{w \in V \backslash \{v\}}} be a subpacking. Then for any interior vertex {w} of degree {d}, one has {\tilde r_w \leq \sqrt{d}}.

The precise value of {\sqrt{d}} is not so important for our arguments, but the fact that it is finite will be. This boundedness of interior circles in a circle packing is a key feature of hyperbolic geometry that is not present in Euclidean geometry, and is one of the reasons why we moved to a hyperbolic perspective in the first place.

Proof: By the subpacking property and the pigeonhole principle, there is a triangle {(w, w_1, w_2)} in {G - \{v\}} such that {\alpha_1(\tilde r_w, \tilde r_{w_1}, \tilde r_{w_2}) \geq \frac{2\pi}{d}}. The hyperbolic triangle associated to {(w,w_1,w_2)} has area at most {\pi} by (2); on the other hand, it contains a sector of a hyperbolic circle of radius {\tilde r_w} and angle at least {\frac{2\pi}{d}}, and hence has area at least {\frac{1}{d} 4\pi \sinh^2(\tilde r_w/2) \geq \frac{\pi \tilde r_w^2}{d}}, thanks to Exercise 15(iv). Comparing the two bounds gives the claim. \Box
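The comparison of the two area bounds can be made concrete with a few lines of arithmetic. The following sketch assumes only the sector area {\frac{1}{d} 4\pi \sinh^2(r/2)} quoted in the proof; the helper names `sector_area` and `radius_cap` are hypothetical. It solves {\frac{1}{d} 4\pi \sinh^2(r/2) = \pi} exactly, giving the slightly sharper cap {2 \sinh^{-1}(\sqrt{d}/2)}, and prints it next to {\sqrt{d}}.

```python
import math

def sector_area(r, d):
    """Area of a sector of angle 2*pi/d in a hyperbolic disk of radius r (curvature -1)."""
    return 4.0 * math.pi * math.sinh(r / 2.0) ** 2 / d

def radius_cap(d):
    """Largest r with sector_area(r, d) <= pi, namely 2*asinh(sqrt(d)/2)."""
    return 2.0 * math.asinh(math.sqrt(d) / 2.0)

for d in (3, 6, 12):
    # The exact cap never exceeds sqrt(d), because asinh(x) <= x for x >= 0.
    print(d, radius_cap(d), math.sqrt(d))
```

Since {\sinh^{-1}(x) \leq x} for {x \geq 0}, the exact cap is always at most {\sqrt{d}}, which is consistent with the remark that the precise constant is unimportant so long as it is finite.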

Now define {{\mathcal R} = ( \tilde r_w )_{w \in V \backslash \{v\}}} to be the (pointwise) supremum of all the subpackings. By the above proposition, {\tilde r_w} is finite at every interior vertex. By Exercise 18, one can view {{\mathcal R}} as a monotone increasing limit of subpackings, and it is thus again a subpacking (due to the continuity properties of {\alpha_1} as long as at least one of the radii stays bounded); thus {{\mathcal R}} is the maximal subpacking. On the other hand, if {\tilde r_w} is finite at some boundary vertex, then by Exercise 18(i) one could replace that radius by a larger quantity without destroying the subpacking property, contradicting the maximality of {{\mathcal R}}. Thus all the boundary radii are infinite, that is to say the boundary condition (ii) holds. Finally, if the sum of the angles at an interior vertex {w} is strictly greater than {2\pi}, then by Exercise 18 we could increase the radius at this vertex slightly without destroying the subpacking property at {w} or at any other of the interior vertices, again contradicting the maximality of {{\mathcal R}}. Thus {{\mathcal R}} obeys the local condition (i), and we have demonstrated existence of the required hyperbolic circle packing.
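To see the maximality argument in the simplest possible case, consider a single interior vertex {w} surrounded by {d} boundary vertices, all of which carry infinite radius. The sketch below is an illustration under assumptions that are not taken from the text: the formula {\alpha_1(r,\infty,\infty) = \arccos(1 - 2e^{-2r})} is obtained here by letting the two boundary radii tend to infinity in the hyperbolic law of cosines (curvature {-1}), and the closed form {-\log \sin(\pi/d)} used for comparison follows from that formula. The names `center_angle` and `maximal_radius` are hypothetical.

```python
import math

def center_angle(r):
    """Angle at the interior vertex of an (r, inf, inf)-triangle.
    Assumed limiting form of the hyperbolic law of cosines as both
    boundary radii tend to infinity."""
    return math.acos(1.0 - 2.0 * math.exp(-2.0 * r))

def maximal_radius(d, tol=1e-12):
    """Bisect for the largest r at which d equal angles still sum to at least 2*pi."""
    lo, hi = 0.0, math.sqrt(d)  # Proposition 19 caps the interior radius at sqrt(d)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if d * center_angle(mid) >= 2.0 * math.pi:
            lo = mid  # still a subpacking, so the radius can be increased
        else:
            hi = mid  # angle sum has dropped below 2*pi
    return lo

for d in (3, 4, 6, 10):
    print(d, maximal_radius(d), -math.log(math.sin(math.pi / d)), math.sqrt(d))
```

Because the angle at {w} is decreasing in the radius at {w}, the set of admissible radii is an interval, and its right endpoint is exactly where the angle sum equals {2\pi}. In a general graph the radii interact, so this one-variable bisection is only a toy version of taking the supremum over all subpackings.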

Finally we establish uniqueness. It suffices to establish that {{\mathcal R}} is the unique tuple that obeys the local condition (i) and the boundary condition (ii). Suppose we had another tuple {{\mathcal R}' = ( r'_w )_{w \in V \backslash \{v\}}} other than {{\mathcal R}} that obeyed these two conditions. Then by the maximality of {{\mathcal R}}, we have {r'_w \leq \tilde r_w} for all {w}. By Exercise 18(iii), this implies that

\displaystyle  \mathrm{Area}( r'_{w_1}, r'_{w_2}, r'_{w_3} ) \leq \mathrm{Area}( \tilde r_{w_1}, \tilde r_{w_2}, \tilde r_{w_3} )

for any triangle {(w_1,w_2,w_3)} in {G - \{v\}}. Summing over all triangles and using (2), we conclude that

\displaystyle  \sum_{w \in V \backslash \{v\}} \sum_{w_1,w_2: (w,w_1,w_2) \hbox{ triangle}} \alpha_1(r'_{w}, r'_{w_1}, r'_{w_2})

\displaystyle \geq \sum_{w \in V \backslash \{v\}} \sum_{w_1,w_2: (w,w_1,w_2) \hbox{ triangle}} \alpha_1(\tilde r_{w}, \tilde r_{w_1}, \tilde r_{w_2})

where the inner sum is over the pairs {w_1,w_2} such that {(w,w_1,w_2)} forms a triangle in {G - \{v\}}. But by the local condition (i) and the boundary condition (ii), the inner sum on either side is equal to {2\pi} for an interior vertex and {0} for a boundary vertex. Thus the two sides agree, which by Exercise 18(iii) implies that {r'_w = \tilde r_w} for all {w}. This proves Theorem 14 and thus Theorems 7, 3, 4.

— 2. Quasiconformal maps —

In this section we set up some of the foundational theory of quasiconformal mappings, which are generalisations of the conformal mapping concept that can tolerate some deviations from perfect conformality, while still retaining many of the good properties of conformal maps (such as being preserved under uniform limits), though with the notable caveat that in contrast to conformal maps, quasiconformal maps need not be smooth. As such, this theory will come in handy when proving convergence of circle packings to the Riemann map. The material here is largely drawn from the text of Lehto and Virtanen.

We first need the following refinement of the Riemann mapping theorem, known as Carathéodory’s theorem:

Theorem 20 (Carathéodory’s theorem) Let {U} be a bounded simply connected domain in {{\bf C}} whose boundary {\partial U} is a Jordan curve, and let {\phi: D(0,1) \rightarrow U} be a conformal map between {D(0,1)} and {U} (as given by the Riemann mapping theorem). Then {\phi} extends to a continuous homeomorphism from {\overline{D}(0,1)} to {\overline{U}}.

The condition that {\partial U} be a Jordan curve is clearly necessary, since if {\partial U} is not simple then there are paths in {D(0,1)} that end up at different points in {\partial D(0,1)} but have the same endpoint in {\partial U} after applying {\phi}, which prevents {\phi} being continuously extended to a homeomorphism.

Proof: We first prove continuous extension to the boundary. It suffices to show that for every point {\zeta} on the unit circle, the diameters of the sets {\phi( D(0,1) \cap D( \zeta, r_n ) )} go to zero for some sequence of radii {r_n \rightarrow 0}.

First observe from the change of variables formula that the area of {U = \phi(D(0,1))} is given by {\int_{D(0,1)} |\phi'(z)|^2\ dx dy}, where {dx dy} denotes Lebesgue measure (or the area element). In particular, this integral is finite. Expanding in polar coordinates around