I’ve written before about SciAm and their blogs. My offer still stands, if they ever want a condensed matter/nano blog that I promise won’t verge into hype or pseudoscience.
In my online course we’re now into the third chapter of Fong and Spivak’s book Seven Sketches. Now we’re talking about databases!
To some extent this is just an excuse to (finally) introduce categories, functors, natural transformations, adjoint functors and Kan extensions. Great stuff, and databases are a great source of easy examples.
But it’s also true that Spivak helps run a company called Categorical Informatics that actually helps design databases using category theory! And his partner, Ryan Wisnesky, would be happy to talk to people about it. If you’re interested, click the link: he’s attending my course.
To read and join discussions on Chapter 3 go here:
You can also do exercises and puzzles, and see other people’s answers to these.
Here are the lectures I’ve given so far:
• Lecture 34 – Chapter 3: Categories
• Lecture 35 – Chapter 3: Categories versus Preorders
• Lecture 36 – Chapter 3: Categories from Graphs
• Lecture 37 – Chapter 3: Presentations of Categories
• Lecture 38 – Chapter 3: Functors
• Lecture 39 – Chapter 3: Databases
• Lecture 40 – Chapter 3: Relations
• Lecture 41 – Chapter 3: Composing Functors
• Lecture 42 – Chapter 3: Transforming Databases
• Lecture 43 – Chapter 3: Natural Transformations
• Lecture 44 – Chapter 3: Categories, Functors and Natural Transformations
• Lecture 45 – Chapter 3: Composing Natural Transformations
• Lecture 46 – Chapter 3: Isomorphisms
• Lecture 47 – Chapter 3: Adjoint Functors
• Lecture 48 – Chapter 3: Adjoint Functors
It was my great pleasure to sit on the committee for the successful PhD defense of Sarah Pearson. She wrote a thesis about low-mass galaxies and globular clusters, considering their interactions both with each other and with the bigger galaxies into which they later fall. She has some nice analyses of the Palomar 5 tidal stream and what its morphology might tell us about the Milky Way halo and bar, and also nice results on gas bridges and streams around pairs of dwarf galaxies.
I was most interested in her stellar-stream results, including several things I hadn't thought about before: One is that prograde streams are more affected by the bar and spiral arms in the disk than retrograde streams. Another is that we might be able to find globular-cluster streams around other galaxies nearby. That would be incredible! And since (as she showed) you can learn a lot about a galaxy just from the shape of a stream, we might not need to do much more than detect streams around other galaxies to learn a lot. It was a pleasure to serve on the committee, and it is a beautiful body of work.
Two more students in the Applied Category Theory 2018 school wrote a blog article about a paper they read:
• Tai-Danae Bradley and Brad Theilman, Cognition, convexity and category theory, The n-Category Café, 10 March 2018.
Tai-Danae Bradley is a mathematics PhD student at the CUNY Graduate Center and well-known math blogger. Brad Theilman is a grad student in neuroscience at the Gentner Lab at U. C. San Diego. I was happy to get to know both of them when the school met in Leiden.
In their blog article, they explain this paper:
• Joe Bolt, Bob Coecke, Fabrizio Genovese, Martha Lewis, Dan Marsden, and Robin Piedeleu, Interacting conceptual spaces I.
Fans of convex sets will enjoy this!
Today’s installment in the ongoing project to sketch the $\infty$-elephant: atomic geometric morphisms.
Chapter C3 of Sketches of an Elephant studies various classes of geometric morphisms between toposes. Pretty much all of this chapter has been categorified, except for section C3.5 about atomic geometric morphisms. To briefly summarize the picture:
Sections C3.1 (open geometric morphisms) and C3.3 (locally connected geometric morphisms) are steps $n=-1$ and $n=0$ on an infinite ladder of locally n-connected geometric morphisms, for $-1 \le n \le \infty$. A geometric morphism between $(n+1,1)$-toposes is locally $n$-connected if its inverse image functor is locally cartesian closed and has a left adjoint. More generally, a geometric morphism between $(m,1)$-toposes is locally $n$-connected, for $n\lt m$, if it is “locally” locally $n$-connected on $n$-truncated maps.
Sections C3.2 (proper geometric morphisms) and C3.4 (tidy geometric morphisms) are likewise steps $n=-1$ and $n=0$ on an infinite ladder of n-proper geometric morphisms.
Section C3.6 (local geometric morphisms) is also step $n=0$ on an infinite ladder: a geometric morphism between $(n+1,1)$-toposes is $n$-local if its direct image functor has an indexed right adjoint. Cohesive toposes, which have attracted a lot of attention around here, are both locally $\infty$-connected and $\infty$-local. (Curiously, the $n=-1$ case of locality doesn’t seem to be mentioned in the 1-Elephant; has anyone seen it before?)
So what about C3.5? An atomic geometric morphism between elementary 1-toposes is usually defined as one whose inverse image functor is logical. This is an intriguing prospect to categorify, because it appears to mix the “elementary” and “Grothendieck” aspects of topos theory: geometric morphisms are arguably the natural morphisms between Grothendieck toposes, while logical functors are more natural for the elementary sort (where “natural” means “preserves all the structure in the definition”). So now that we’re starting to see some progress on elementary higher toposes (my post last year has now been followed by a preprint by Rasekh), we might hope to be able to make some progress on it.
Unfortunately, the definitions of elementary $(\infty,1)$-topos currently under consideration have a problem when it comes to defining logical functors. A logical functor between 1-toposes can be defined as a cartesian closed functor that preserves the subobject classifier, i.e. $F(\Omega) \cong \Omega$. The higher analogue of the subobject classifier is an object classifier — but note the switch from definite to indefinite article! For Russellian size reasons, we can’t expect to have one object classifier that classifies all objects, only a tower of “universes” each of which classifies some subcollection of “small” objects.
What does it mean for a functor to “preserve” the tower of object classifiers? If an $(\infty,1)$-topos came equipped with a specified tower of object classifiers (indexed by $\mathbb{N}$, say, or maybe by the ordinal numbers), then we could ask a logical functor to preserve them one by one. This would probably be the relevant kind of “logical functor” when discussing categorical semantics of homotopy type theory: since type theory does have a specified tower of universe types $U_0 : U_1 : U_2 : \cdots$, the initiality conjecture for HoTT should probably say that the syntactic category is an elementary $(\infty,1)$-topos that’s initial among logical functors of this sort.
However, Grothendieck $(\infty,1)$-topoi don’t really come equipped with such a tower. And even if they did, preserving it level by level doesn’t seem like the right sort of “logical functor” to use in defining atomic geometric morphisms; there’s no reason to expect such a functor to “preserve size” exactly.
What do we want of a logical functor? Well, glancing through some of the theorems about logical functors in the 1-Elephant, one result that stands out to me is the following: if $F:\mathbf{S}\to \mathbf{E}$ is a logical functor with a left adjoint $L$, then $L$ induces isomorphisms of subobject lattices $Sub_{\mathbf{E}}(A) \cong Sub_{\mathbf{S}}(L A)$. This is easy to prove using the adjointness $L\dashv F$ and the fact that $F$ preserves the subobject classifier:
$Sub_{\mathbf{E}}(A) \cong \mathbf{E}(A,\Omega_{\mathbf{E}}) \cong \mathbf{E}(A,F \Omega_{\mathbf{S}}) \cong \mathbf{S}(L A,\Omega_{\mathbf{S}})\cong Sub_{\mathbf{S}}(L A).$
What would be the analogue for $(\infty,1)$-topoi? Well, if we imagine hypothetically that we had a classifier $U$ for all objects, then the same argument would show that $L$ induces an equivalence between entire slice categories $\mathbf{E}/A \simeq \mathbf{S}/L A$. (Actually, I’m glossing over something here: the direct arguments with $\Omega$ and $U$ show only an equivalence between sets of subobjects and cores of slice categories. The rest comes from the fact that $F$ preserves local cartesian closure as well as the (sub)object classifier, so that we can enhance $\Omega$ to an internal poset and $U$ to an internal full subcategory and both of these are preserved by $F$ as well.)
In fact, the converse is true too: reversing the above argument shows that $F$ preserves $\Omega$ if and only if $L$ induces isomorphisms of subobject lattices, and similarly $F$ preserves $U$ if and only if $L$ induces equivalences of slice categories. The latter condition, however, is something that can be said without reference to the nonexistent $U$. So if we have a functor $F:\mathbf{S}\to \mathbf{E}$ between $(\infty,1)$-toposes that has a left adjoint $L$, then I think it’s reasonable to define $F$ to be logical if it is locally cartesian closed and $L$ induces equivalences $\mathbf{E}/A \simeq \mathbf{S}/L A$.
Furthermore, a logical functor between 1-toposes has a left adjoint if and only if it has a right adjoint. (This follows from the monadicity of the powerset functor $P : \mathbf{E}^{op} \to \mathbf{E}$ for 1-toposes, which we don’t have an analogue of (yet) in the $\infty$-case.) In particular, if the inverse image functor in a geometric morphism is logical, then it automatically has a left adjoint, so that the above characterization of logical-ness applies. And since a logical functor is locally cartesian closed, this geometric morphism is automatically locally connected as well. This suggests the following:
Definition: A geometric morphism $p:\mathbf{E}\to \mathbf{S}$ between $(\infty,1)$-topoi is $\infty$-atomic if its inverse image functor $p^*$ is logical, i.e. if $p^*$ is locally cartesian closed and its left adjoint $p_!$ induces equivalences $\mathbf{E}/A \simeq \mathbf{S}/p_! A$.
This seems natural to me, but it’s very strong! In particular, taking $A=1$ we get an equivalence $\mathbf{E}\simeq \mathbf{E}/1 \simeq \mathbf{S}/p_! 1$, so that $\mathbf{E}$ is equivalent to a slice category of $\mathbf{S}$. In other words, $\infty$-atomic geometric morphisms coincide with local homeomorphisms!
Is that really reasonable? Actually, I think it is. Consider the simplest example of an atomic geometric morphism of 1-topoi that is not a local homeomorphism: $[G,Set] \to Set$ for a group $G$. The corresponding geometric morphism of $(\infty,1)$-topoi $[G,\infty Gpd] \to \infty Gpd$ is a local homeomorphism! Specifically, we have $[G,\infty Gpd] \simeq \infty Gpd / B G$. So in a sense, the difference between atomic and locally-homeomorphic vanishes in the limit $n\to \infty$.
To be sure, there are other atomic geometric morphisms of 1-topoi that do not extend to local homeomorphisms of $(\infty,1)$-topoi, such as $Cont(G) \to Set$ for a topological group $G$. But it seems reasonable to me to regard these as “1-atomic morphisms that are not $\infty$-atomic” — a thing which we should certainly expect to exist, just as there are locally 0-connected morphisms that are not locally $\infty$-connected, and 0-proper morphisms that are not $\infty$-proper.
We can also “see” how the difference gets “pushed off to $\infty$” to vanish, in terms of sites of definition. In C3.5.8 of the 1-Elephant it is shown that every atomic Grothendieck topos has a site of definition in which (among other properties) all morphisms are effective epimorphisms. If we trace through the proof, we see that this effective-epi condition comes about as the “dual” class to the monomorphisms that the left adjoint of a logical functor induces an equivalence on. Since an $(n+1,1)$-topos has classifiers for $n$-truncated objects, we would expect an atomic one to have a site of definition in which all morphisms belong to the dual class of the $n$-truncated morphisms, i.e. the $n$-connected morphisms. So as $n\to \infty$, we get stronger and stronger conditions on the morphisms in our site, until in the limit we have a classifier for all morphisms, and the morphisms in our site are all required to be equivalences. In other words, the site is itself an $\infty$-groupoid, and thus the topos of (pre)sheaves on it is a slice of $\infty Gpd$.
However, it could be that I’m missing something and this is not the best categorification of atomic geometric morphisms. Any thoughts from readers?
Today was my last day and fifth lecture at TASI. This lecture was crowd-sourced in content! I spoke about Fisher information, linear algebra tips and tricks, and decision theory and model selection. On the latter I strongly advocated engineering methods like cross-validation!
Over lunch I had a great set of conversations with Zach Berta-Thompson about precise measurement for exoplanets, and also hack weeks like the #GaiaSprint. We went deep into the limits on ultra-precise photometry from the ground. We marveled at the fact that the best imaging systems get the best precision (on photometry of point sources) by de-focusing. That has always struck me as somehow absurd, though it's true that you don't have to understand your system nearly so well when you are out of focus (for many reasons).
We had one very good idea: Instead of de-focusing, put in an objective prism! You could get many of the benefits of de-focus but also get far more information about the atmosphere and speckle and scintillation and so on. In principle, you might beat the best measurements made to date. And it is a cheap experiment to perform.
For the last two weeks I’ve been at Les Houches, a village in the French Alps, for the Summer School on Structures in Local Quantum Field Theory.
Les Houches has a long history of prestigious summer schools in theoretical physics, going back to the activity of Cécile DeWitt-Morette after the Second World War. This was more of a workshop than a “school”, though: each speaker gave one talk, and they weren’t really geared for students.
The workshop was organized by Dirk Kreimer and Spencer Bloch, who both have a long track record of work on scattering amplitudes with a high level of mathematical sophistication. The group they invited was an even mix of physicists interested in mathematics and mathematicians interested in physics. The result was a series of talks that managed to both be thoroughly technical and ask extremely deep questions, including “is quantum electrodynamics really an asymptotic series?”, “are there simple graph invariants that uniquely identify Feynman integrals?”, and several talks about something called the Spine of Outer Space, which still sounds a bit like a bad sci-fi novel. Along the way there were several talks showcasing the growing understanding of elliptic polylogarithms, giving me an opportunity to quiz Johannes Broedel about his recent work.
While some of the more mathematical talks went over my head, they spurred a lot of productive dialogues between physicists and mathematicians. Several talks had last-minute slides, added as a result of collaborations that happened right there at the workshop. There was even an entire extra talk, by David Broadhurst, based on work he did just a few days before.
We also had a talk by Jaclyn Bell, a former student of one of the participants who was on a BBC reality show about training to be an astronaut. She’s heavily involved in outreach now, and honestly I’m a little envious of how good she is at it.
A lot happened at Applied Category Theory 2018. Even as it’s still winding down, we’re already starting to plan a followup in 2019, to be held in Oxford. Here are some notes Joshua Tan sent out:
Photos: Ross Duncan took some very glamorous photos of the conference, which you can find here.
Videos: Videos of talks are online here, courtesy of Jelle Herold and Fabrizio Genovese.
Next year’s workshop: Bob Coecke will be organizing ACT 2019, to be hosted in Oxford sometime spring/summer. There will be a call for papers.
Next year’s school: Daniel Cicala is helping organize next year’s ACT school. Please contact him at if you would like to get involved.
Look forward to the official call for submissions, coming soon, for the first issue of Compositionality!
The minutes mentioned above contain interesting thoughts on these topics:
• Day 1: Causality
• Day 2: AI & Cognition
• Day 3: Dynamical Systems
• Day 4: Systems Biology
• Day 5: Closing
As seen in this comment on Polymath and explicated further in Fernando de Oliveira Filho’s thesis, section 4.4.
I actually spent much of today thinking about this so let me try to explain it in a down-to-earth way, because it involved me thinking about Bessel functions for the first time ever, surely a life event worthy of recording.
So here’s what we’re going to do. As I mentioned last week, you can express this problem as follows: suppose you have a map h: R^2 -> V, for some normed vector space V, which is a unit-distance embedding; that is, if |x-x’|_{R^2} = 1, then |h(x)-h(x’)|_V = 1. (We don’t ask that h is an isometry, only that it preserves the distance-1 set.)
Then let t be the radius of the smallest hypersphere in V containing h(R^2).
Then any graph embeddable in R^2 with all edges of length 1 is sent to a unit-distance graph in V contained in the hypersphere of radius t; this turns out to be equivalent to saying the Lovasz number of G (ok, really I mean the Lovasz number of the complement of G) is at most 1/(1-2t^2). The point is that we want to show t is bounded below 1. Or rather: we can find a V and a map from R^2 to V to make this the case.
So here’s one! Let V be the space of L^2 functions on R^2 with the usual inner product. Choose a square-integrable function F on R^2 — in fact let’s normalize to make F^2 integrate to 1 — and for each a in R^2 we let h(a) be the function F(x-a).
We want the distance between F(x-a) and F(x-b) to be the same for every pair of points at distance 1 from each other; the easiest way to arrange that is to insist that F(x) be a radially symmetric function F(x) = f(|x|); then it’s easy to see that the distance between F(x-a) and F(x-b) in V is a function G(a-b) which depends only on |a-b|. We write

$g(|a-b|) = \langle F(x-a), F(x-b)\rangle_V$

so that the squared distance between F(x) and F(x-r) is

$\|F(x) - F(x-r)\|_V^2 = 2g(0) - 2g(r).$
In particular, if two points in R^2 are at distance 1, the squared distance between their images in V is 2(1-g(1)). Note also that g(0) is the square integral of F, which is 1.
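As a quick sanity check of the claim that the distance between translates of a radially symmetric F depends only on |a-b|, here is a small pure-Python computation on a discretized grid, with a Gaussian choice of f (my choice for illustration; any radial f would do):

```python
import math

# Discretize R^2 on a grid and take F(x) = exp(-|x|^2), a radially
# symmetric (hypothetical) choice of F.
STEP = 0.25
GRID = [(i * STEP, j * STEP) for i in range(-24, 25) for j in range(-24, 25)]

def F(x, y):
    return math.exp(-(x * x + y * y))

def dist(a, b):
    # L^2 distance between the translates F(x - a) and F(x - b),
    # approximated by a Riemann sum over the grid
    s = sum((F(x - a[0], y - a[1]) - F(x - b[0], y - b[1])) ** 2
            for (x, y) in GRID)
    return math.sqrt(s * STEP * STEP)

# three pairs of points, all at separation |a - b| = 1, in different
# directions and positions; the distances should agree
d1 = dist((0.0, 0.0), (1.0, 0.0))
d2 = dist((0.5, 0.5), (0.5, 1.5))
d3 = dist((0.0, 0.0), (1 / math.sqrt(2), 1 / math.sqrt(2)))
```

The three distances agree to many decimal places, as the radial symmetry of F predicts.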
What kind of hypersphere encloses all the points F(x-a) in V? We can just go ahead and take the “center” of our hypersphere to be 0; since |F| = 1, every point in h(R^2) lies in (indeed, lies on) the sphere of radius 1 around the origin.
Hey but remember: we want to study a unit-distance embedding of R^2 in V. Right now, h sends unit distances to the distance $\sqrt{2(1-g(1))}$, whatever that is. We can fix that by rescaling h by the reciprocal of that number. So now h sends unit distances to unit distances, and its image is enclosed in a hypersphere of radius

$(2(1-g(1)))^{-1/2}.$
The more negative g(1) is, the smaller this sphere is, which means the more we can “fold” R^2 into a small space. Remember, the relationship between the hypersphere radius t and Lovasz theta is

$\vartheta \le \frac{1}{1-2t^2}$

and plugging in the above bound for the hypersphere radius, we find that the Lovasz theta number of R^2, and thus the Lovasz theta number of any unit-distance graph in R^2, is at most
1-1/g(1).
So the only question is — what is g(1)?
Well, that depends on what g is.
Which depends on what F is.
Which depends on what f is.
And of course we get to choose what f is, in order to make g(1) as negative as possible.
How do we do this? Well, here’s the trick. The function G is not arbitrary; if it were, we could make g(1) whatever we wanted. It’s not hard to see that G is what’s called a positive definite function on R^2. And moreover, if G is positive definite, there exists some f giving rise to it. (Roughly speaking, this is the fact that a positive definite symmetric matrix has a square root.) So we ask: if G is a positive definite (radially symmetric) function on R^2, and g(0) = 1, how small can g(1) be?
And now there’s an old theorem of (Wisconsin’s own!) Isaac Schoenberg which helpfully classifies the positive definite functions on R^2; they are precisely the functions G(x) = g(|x|) where g is a mixture of scalings of the Bessel function $J_0$:

$g(r) = \int_0^\infty J_0(u r)\, A(u)\, du$

for some everywhere nonnegative A(u). (Actually it’s more correct to say that A is a distribution and we are integrating J_0(ur) against a non-decreasing measure.)
So g(1) can be no smaller than the minimum value of J_0 on [0,∞), and in fact can be exactly that small if you let A become narrowly supported around the minimum argument. This is basically just taking g to be a rescaled version of J_0 which achieves its minimum at 1. That minimum value is about -0.4, and so the Lovasz theta for any unit-distance subgraph of the plane is bounded above by a number that’s about 1 + 1/0.4 = 3.5.
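You can check the numbers yourself with a few lines of Python, using the standard integral representation of $J_0$ (the identity $J_0(x) = \frac{1}{\pi}\int_0^\pi \cos(x \sin t)\,dt$, not anything from the post) and a simple grid scan for the minimum:

```python
import math

def j0(x, n=200):
    # Bessel function J_0 via its integral representation
    # J_0(x) = (1/pi) * integral_0^pi cos(x sin t) dt, midpoint rule
    h = math.pi / n
    return sum(math.cos(x * math.sin((k + 0.5) * h)) for k in range(n)) * h / math.pi

# scan [0, 8] for the first (and, on this range, global) minimum of J_0
vals = [j0(i * 0.002) for i in range(4001)]
min_val = min(vals)
argmin = 0.002 * vals.index(min_val)

# the resulting upper bound on the Lovasz theta of the plane
bound = 1 - 1 / min_val
```

This gives a minimum of about -0.4028 at x ≈ 3.83, so the bound is about 3.48, agreeing with the "about 3.5" above.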
To sum up: I give you a set of points in the plane, I connect every pair that’s at distance 1, and I ask how you can embed that graph in a small hypersphere keeping all the distances 1. And you say: “Oh, I know what to do, just assign to each point a the radially symmetrized Bessel function J_0(|x-a|) on R^2, the embedding of your graph in the finite-dimensional space of functions spanned by those Bessel translates will do the trick!”
That is cool!
Remark: Oliveira’s thesis does this for Euclidean space of every dimension (it gets more complicated.) And I think (using analysis I haven’t really tried to understand) he doesn’t just give an upper bound for the Lovasz number of the plane as I do in this post, he really computes that number on the nose.
Update: DeCorte, Oliveira, and Vallentin just posted a relevant paper on the arXiv this morning!
As part of the Applied Category Theory 2018 school, Maru Sarazola wrote a blog article on open dynamical systems and their steady states. Check it out:
• Maru Sarazola, Dynamical systems and their steady states, 2 April 2018.
She compares two papers:
• David Spivak, The steady states of coupled dynamical systems compose according to matrix arithmetic.
• John Baez and Blake Pollard, A compositional framework for reaction networks, Reviews in Mathematical Physics 29 (2017), 1750028.
(Blog article here.)
It’s great, because I’d never really gotten around to understanding the precise relationship between these two approaches. I wish I knew the answers to the questions she raises at the end!
Two more students in the Applied Category Theory 2018 school wrote a blog article about something they read:
• Eliana Lorch and Joshua Tan, The behavioral approach to systems theory, 15 June 2018.
Eliana Lorch is a mathematician based in San Francisco. Joshua Tan is a grad student in computer science at the University of Oxford and one of the organizers of Applied Category Theory 2018.
They wrote a great summary of this paper, which has been an inspiration to me and many others:
• Jan Willems, The behavioral approach to open and interconnected systems, IEEE Control Systems 27 (2007), 46–99.
They also list many papers influenced by it, and raise a couple of interesting problems with Willems’ idea, which can probably be handled by generalizing it.
guest post by Eliana Lorch and Joshua Tan
As part of the Applied Category Theory seminar, we discussed an article commonly cited as an inspiration by many papers^{1} taking a categorical approach to systems theory, The Behavioral Approach to Open and Interconnected Systems. In this sprawling monograph for the IEEE Control Systems Magazine, legendary control theorist Jan Willems poses and answers foundational questions like how to define the very concept of mathematical model, gives fully-worked examples of his approach to modeling from physical first principles, provides various arguments in favor of his framework versus others, and finally proves several theorems about the special case of linear time-invariant differential systems.
In this post, we’ll summarize the behavioral approach, Willems’ core definitions, and his “systematic procedure” for creating behavioral models; we’ll also examine the limitations of Willems’ framework, and conclude with a partial reference list of Willems-inspired categorical approaches to understanding systems.
Here’s the view from 10,000 feet of the behavioral approach in contrast with the traditional signal-flow approach:
Willems’ approach breaks down into: (1) considering a dynamical system as a ‘behavior,’ and (2) defining interconnection as variable sharing.
Willems goes so far as to claim: “It is remarkable that the idea of viewing a system in terms of inputs and outputs, in terms of cause and effect, kept its central place in systems and control theory throughout the 20th century. Input/output thinking is not an appropriate starting point in a field that has modeling of physical systems as one of its main concerns.”
To get a sense of the inappropriateness of input/output-based modeling: consider a freely swinging pendulum in 2-dimensional space with a finite-sized bob. Now consider adding to this system a model representing the right-hand half-plane being filled with cement. With soft-contact mechanics, we could determine what force the cement exerts on the pendulum when it bounces against it — that is, when the pendulum bob’s center of mass comes within its radius of the right half-plane.
Traditionally, we might define a function that takes the pendulum’s position as input and produces a force as output. But this is insufficient to model the effect of the wall, which also prevents the pendulum bob’s center of mass from ever being in the right-hand half-plane; the wall imposes a constraint on the possible states of the world. How can we capture this kind of constraint? In this case, we can extend the state model with inequalities delineating the feasible region of the state space.^{2}
Willems’ insight is that the entire modeling framework can be subsumed by a sufficiently broad notion of “feasible region.” A dynamical system is simply a relation on the variables, forming a subset of all conceivable trajectories — a ‘behavior.’
The signal-flow approach to systems modeling requires labeling the terminals of a system as inputs or outputs before a model can be formulated; Willems argues this does not respect the actual physics. Most physical terminals — electrical wires, mechanical linkages or gears, thermal couplings, etc. — do not have an intrinsic, a priori directionality, and may permit “signals” to “flow” in either or both directions. Rather, physical interconnections constrain certain variables on either side to be equal (or equal-and-opposite). After having modeled a system, one may be able to prove that it obeys a certain partitioning of variables into inputs and outputs, but assuming this up front obscures the underlying physical reality. This paradigm shift amounts to moving from functional composition (given $a$ we compute $b=f(a)$, then $c=g(b)$) to relational composition: $(a,c)\in(R;S)$ iff $\exists ((a,b_0),(b_1,c))\in(R\times S).\;b_0=b_1$, which can be read as “variable sharing” between $b_0$ and $b_1$. This is a way of restoring symmetry to composition: it gives precedence neither to either of the two entities being composed, nor to either entity’s domain over its codomain.
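The shift from functional to relational composition is easy to make concrete in code (a generic illustration, not taken from Willems’ paper): represent relations as sets of pairs, and compose them by sharing the middle variable.

```python
def compose(R, S):
    # relational composition R ; S: (a, c) is in R;S iff there is some
    # shared value b with (a, b) in R and (b, c) in S
    return {(a, c) for (a, b0) in R for (b1, c) in S if b0 == b1}

# A function is just the special case of a relation with exactly one
# "output" per "input"; general relations may be partial or multi-valued,
# and composition never asks which side is the input.
f = {(x, x * x) for x in range(4)}                        # graph of x -> x^2
leq = {(a, b) for a in range(10) for b in range(10) if a <= b}

# (x, c) is in f;leq exactly when x^2 <= c
R = compose(f, leq)
```

Note that `compose` works unchanged on relations that are not functions, which is exactly the symmetry the behavioral approach is after.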
Examples of systems to model within the behavioral framework.
Given some natural phenomenon we wish to model mathematically, the first step is to establish the universum, the set of all a priori feasible outcomes, notated $\mathbb{V}$. Then, Willems asserts a mathematical model to be a restriction of possible outcomes to a subset of $\mathbb{V}$.^{3} This subset itself is called the behavior of the model, and is written $\mathcal{B}$. This concept is, as the name suggests, at the center of Willems’ “behavioral approach”: he asserts that “equivalence of models, properties of models, model representations, and system identification must refer to the behavior.”
A dynamical system is a model in which elements of the universum $\mathbb{V}$ are functions of time, that is, a triple
$\Sigma = \left(\mathbb{T}, \mathbb{W}, \mathcal{B}\right)$
in which $\mathcal{B} \subseteq \mathbb{V} := \mathbb{W}^\mathbb{T}$. $\mathbb{T}$ is referred to as the time set (which may be discrete or continuous), and $\mathbb{W}$ is referred to as the signal space. The elements of $\mathcal{B}$ are trajectories $w: \mathbb{T}\rightarrow\mathbb{W}$.
A dynamical system with latent variables is one whose signal space is a Cartesian product of manifest variables and latent variables: the “full” system is a tuple
$\Sigma_{full}= \left(\mathbb{T}, \mathbb{M}, \mathbb{L}, \mathcal{B}_{full}\right)$
where the behavior $\mathcal{B}_{{full}}\subseteq \mathbb{V} := \left(\mathbb{M} \times \mathbb{L}\right)^\mathbb{T}$. Here $\mathbb{M}$ is the set of manifest values and $\mathbb{L}$ is the set of latent values.
A full behavior $\Sigma_{{full}}$ is said to induce or represent a manifest dynamical system $\Sigma=\left(\mathbb{T}, \mathbb{M}, \mathcal{B}\right)$, with the manifest behavior $\mathcal{B}$ defined by
$\mathcal{B}:=\left\{m: \mathbb{T} \rightarrow \mathbb{M}\;|\;\exists \ell: \mathbb{T} \rightarrow \mathbb{L}. \left\langle m,\ell\right\rangle \in \mathcal{B}_{{full}}\right\}$
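In code, a (discrete-time, finite-horizon) behavior is just a set of trajectories, and the manifest behavior is its projection along the latent component. Here is a toy rendering of the definitions above — the "physical law" and all names are my own invention for illustration:

```python
from itertools import product

# toy time set and value sets
T = range(3)        # three time steps
M = (0, 1)          # manifest values
L_vals = (0, 1)     # latent values

def law(traj):
    # a made-up first-principles law: the latent value is an internal
    # state that never changes, and the manifest value must track it
    for t, (m, l) in enumerate(traj):
        if m != l:
            return False
        if t > 0 and l != traj[t - 1][1]:
            return False
    return True

# full behavior: all trajectories T -> M x L satisfying the law
all_trajs = product(product(M, L_vals), repeat=len(T))
B_full = {traj for traj in all_trajs if law(traj)}

# manifest behavior: m is in B iff SOME latent trajectory makes (m, l) feasible
B = {tuple(m for (m, _) in traj) for traj in B_full}
```

Here the law forces the latent state to be constant, so the manifest behavior consists of exactly the two constant trajectories: the latent variable has been existentially quantified away, just as in the definition of $\mathcal{B}$.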
The behavior of all the variables is determined by the equations specifying the first-principles physical laws, together with the equations expressing all the constraints from interconnection. Willems treats interconnection as simply “variable sharing,” that is, restricting behavior such that the trajectories assigned to the interconnected variables are constrained to be equal (or sometimes “equal and opposite,” depending on sign conventions).
An interconnection architecture is a sort of wiring diagram (analogous to operadic wiring diagrams) that describes the way in which a collection of systems is interconnected. Willems formalises this as a graph with leaves: a set $V$ of vertices (which are systems or modules), a set $E$ of edges (terminals), and a set $L$ of leaves (open wires), with an assignment map $\mathcal{A}$: to each edge an unordered pair of vertices and to each leaf a single vertex. A leaf is depicted as an open half-edge emanating from the graph, like a wire sticking out of a circuit — the “open” part of “open and interconnected systems.” (Note that this interpretation of a “leaf” differs from the usual graph-theoretic “vertex with degree one”; here, a leaf is like an “edge with degree one.”) There’s a type-checking condition that the set of leaves and internal half-edges emanating from a vertex must be completely matched to the set of terminals of the associated module, so that you don’t have hidden dangling wires.
Finally, the interconnection architecture requires a module embedding that specifies how each vertex is interpreted as a module: either a primitive model made of physical laws, or a sub-model within which there’s a further module embedding. Here we get a sense of the “zooming” nature of the modeling procedure.
Willems proposes a “systematic procedure” for generating models in this behavioral form: first decompose (tear) the system under investigation into smaller subsystems, then recursively apply the modeling process (zoom in) to each subsystem, and finally compose (link) the resulting submodels together into an overall system model. Rendered in pseudocode, it looks like this:
define makeModel(System system) => Model {
  if (system is directly governed by known physics) {
    return knownModel(system)
  } else {
    WiringDiagram<System> decomposition := tear(system)
    List<Model> submodels := decomposition.listSubsystems().fmap(makeModel)
    return decomposition.link(submodels)
  }
}
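For concreteness, here is one way the tear/zoom/link recursion might look in runnable Python, with models represented as sets of feasible terminal assignments and `link` implementing interconnection as variable sharing (all representations and names here are my own sketch, not Willems’):

```python
def make_model(system):
    # recursively model a system: the base case uses a known physical
    # law; otherwise tear into subsystems, model each, and link them
    if "law" in system:                 # directly governed by known physics
        return system["law"]            # a model: set of feasible assignments
    submodels = [make_model(s) for s in system["subsystems"]]
    return system["link"](submodels)

# toy example: two components sharing a terminal "a"
comp1 = {"law": {("a", v) for v in (1, 2, 3)}}   # terminal may be 1, 2 or 3
comp2 = {"law": {("a", v) for v in (2, 3, 4)}}   # terminal may be 2, 3 or 4

def link(models):
    # interconnection as variable sharing: keep only the assignments
    # that both submodels allow on the shared terminal
    m1, m2 = models
    return m1 & m2

system = {"subsystems": [comp1, comp2], "link": link}
model = make_model(system)
```

Linking the two components restricts the shared terminal to the values both laws permit, namely 2 and 3 — a (very) degenerate instance of "connected pressures equal."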
As an example, Willems analyzes an open hydraulic system made of two tanks interconnected by a pipe:
In the “tear” step, he breaks the system apart into three subsystems: the two tanks, (1) and (3) in the figure, and the pipe (2). In the “zoom” step, each of the three subsystems is “simple enough to be modeled using first-principles physical laws,” so he fills in the known model for each one (reaching the recursive base case, rather than starting again from “tear”). For the pipe, flows on each end are equal and opposite, and the difference in pressure is proportional to the flow; for each of the tanks, conservation of mass and Bernoulli’s laws relate the pressures, flows, and height of water in the tank.
Then, in the “link” step, he takes, for each subsystem, a copy of the corresponding model (each initially relating a completely separate set of variables), and combines the models according to the links between subsystems, applying the appropriate “interconnection laws” to each pair of connected terminals. In this example, the interconnection laws set connected pressures equal and connected flows equal and opposite.
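The “link” step is easy to make concrete if we represent each behavioral model as a predicate on variable assignments: the linked system is then the conjunction of the submodels and the interconnection laws. A minimal sketch, with invented variable names and constants rather than anything from Willems' paper:

```python
# A behavioral model is a predicate over a dict of named variables.
def pipe_model(v, k=2.0):
    # flows at the two ends are equal and opposite; the pressure
    # difference is proportional to the flow
    return (abs(v["f_left"] + v["f_right"]) < 1e-9 and
            abs((v["p_left"] - v["p_right"]) - k * v["f_left"]) < 1e-9)

def interconnection_law(v, p1, f1, p2, f2):
    # connected pressures equal; connected flows equal and opposite
    return abs(v[p1] - v[p2]) < 1e-9 and abs(v[f1] + v[f2]) < 1e-9

def link(*models):
    # conjunction of behaviors: a point is allowed iff every model allows it
    return lambda v: all(m(v) for m in models)

# Interconnect the pipe's left terminal to a terminal of tank 1:
system = link(
    pipe_model,
    lambda v: interconnection_law(v, "p_tank1", "f_tank1", "p_left", "f_left"),
)
point = {"p_tank1": 5.0, "f_tank1": -1.0, "p_left": 5.0, "f_left": 1.0,
         "p_right": 3.0, "f_right": -1.0}
assert system(point)  # this assignment satisfies both pipe law and link law
```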
An essential claim of Willems’ philosophy is that, for physical systems to which we can apply his modeling procedure of breaking systems into subsystems with interconnections, the hierarchical structure of our model will match reality closely enough that there will be straightforward physical principles governing the interactions at the interfaces (which tend to correspond to partitions of physical space). Many engineered systems deliberately have their important interfaces clearly delineated, but Willems explicitly acknowledges that there are forms of interaction, such as gravitational interaction and colliding objects, which do not perfectly fit this framework.
Not all interconnections fit. This framework assumes that the interface of a module can be specified—as a finite set of terminals—prior to composition with other modules, and Willems identifies three situations where this assumption fails:
$n$-body problems, exhibiting “virtual terminals” which are not so much a property of each module as of each pair of modules. The classic example of this phenomenon is the $n$-body problem in gravitational (or electrostatic, etc.) dynamics: an $n$-body system has $\binom{n}{2}$ pairwise interactions, but the combination of an $n$-body system and an $m$-body system has $nm$ interactions beyond those of the two systems taken separately.
“Distributed interconnections” in which a terminal has continuous spatial extent (e.g. heat conduction along a surface), calling for partial differential equations involving coordinate variables.
Contact mechanics such as rolling, sliding, bouncing, collisions, etc., in which interconnections appear and vanish depending on the values of certain position variables, as objects come into and out of contact.
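The counting behind the “virtual terminals” point is quick to verify: $n+m$ bodies have $\binom{n+m}{2}$ pairwise interactions, which exceeds $\binom{n}{2}+\binom{m}{2}$ by exactly the $nm$ cross-interactions between the two subsystems, so no fixed, pre-composition interface can account for them. A one-liner check (the particular $n$ and $m$ are arbitrary):

```python
from math import comb

n, m = 5, 7
combined = comb(n + m, 2)           # interactions in the merged system
separate = comb(n, 2) + comb(m, 2)  # interactions within each subsystem alone
assert combined - separate == n * m  # the cross-pair interactions are all new
print(combined, separate, n * m)    # prints: 66 31 35
```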
Directional components and systems. In contrast to an a posteriori partitioning of variables into inputs and outputs (meaning that any setting of the “inputs” uniquely determines the trajectories of the “outputs”), some components fundamentally exhibit a priori input/output behavior (that is, they cannot be back-driven), and Willems’ framework can’t accommodate these.
Ideal amplifier. The behavior of an ideal amplifier with gain $K$, input $x$ and output $y$ would be $\left\{ (x,y) | y=K x\right\}$ (constant-gain model), yet Willems’ approach here would make the incorrect prediction that we could back-drive terminal $y$ by interconnecting it to a signal source and expect to observe the signal scaled by $1/K$ at terminal $x$. However, an “ideal amplifier” is not a first-principles physical law; the modeling procedure might suggest we “tear” an amplifier further into its component parts, and then “tear” the constituent transistors with a deconstruction (such as the Gummel-Poon model) into passive circuit primitives. This might result in a more realistic model of actual amplifier behavior, though it would have at least an order of magnitude more components than the constant-gain model.
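The back-driving problem is visible already in the relational semantics: the constant-gain behavior $\left\{ (x,y) \mid y=Kx\right\}$ is just a relation between its two terminals, so querying it “backwards” is exactly as legal as querying it forwards. A sketch (the variable names and gain are mine):

```python
K = 10.0

# The behavior {(x, y) | y = K x}, as a predicate on terminal values.
def behavior(x, y):
    return abs(y - K * x) < 1e-12

# Forward use: drive terminal x, observe y = K x at the output.
assert behavior(0.5, 5.0)

# "Back-driving": fix y with a signal source; the relation happily
# predicts x = y / K at the other terminal -- which a real amplifier
# will not do, since it cannot be driven from its output.
y = 3.0
assert behavior(y / K, y)
```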
Humans, etc. Willems lists additional signal-flow systems for which the behavioral approach is not quite adequate: actuator inputs and sensor outputs interconnected via a controller, reactions of humans or animals, devices that respond to external commands, or switches and other logical devices.
Cartesian latent/manifest partitioning. Among Willems’ arguments against mandatory input/output partitioning is the simple and compelling example of a particle moving on a sphere, whose position is truly an output and whose velocity is truly an input—yet even in this seemingly favorable setup, the full state space (the tangent bundle of the sphere) cannot be decomposed as a Cartesian product of positions on the sphere with any vector space. However, Willems uses exactly the same hidden assumption in his definition of a dynamical system with latent variables. If a dynamical system’s full state space can’t be written as a Cartesian product, then its behavior can’t be represented in the way Willems defines.
Probability. Willems’ non-deterministic approach to behaviors is a kind of unquantified uncertainty; it doesn’t natively give us a way of associating probabilities with elements of a “behavior” (although behaviors could be considered as always having an implicitly uniform distribution). Nontrivial distributions could also be modeled by defining the universum $\mathbb{V}:=P\!\left(\mathbb{W}^\mathbb{T}\right)$ (where $P$ is probability, namely the Giry monad), but the non-determinism of $\mathcal{B}\subseteq \mathbb{V}$ introduces “Knightian uncertainty”, in that models are now sets of distributions, with no probabilities specified at the top level—and it’s unclear how such models should compose with non-stochastic models.
As mentioned earlier, Willems has been an inspiration for many papers in applied category theory. One common feature is that many take a relational approach to semantics, providing a functor into (some subcategory of) $\mathbf{Rel}_\times$. Here is a (non-exhaustive!) reference list of these and related works.
Applications to specific domains:
Passive linear networks: Baez and Fong, 2015. Constructs a “black-boxing” functor from a decorated-cospan category of passive linear circuits (composed of resistors, inductors and capacitors) to a behavior category of Lagrangian relations.
Generalized circuit networks: Baez, Coya and Rebro, 2018. Generalizes the black-boxing functor to potentially nonlinear components/circuits.
Reversible Markov processes: Baez, Fong and Pollard, 2016. Constructs a “black-boxing” functor from a decorated-cospan category of reversible Markov processes to a category of linear relations describing steady states.
Petri nets / reaction networks: Baez and Pollard, 2017. Constructs a “black-boxing” functor from a decorated-cospan category of Petri nets to a category of semi-algebraic relations describing steady states, with an intermediate stop at a “grey-box” category of algebraic vector fields.
Digital circuits: Ghica, Jung and Lopez, 2017. Defines a symmetric monoidal theory of circuits including a discrete-delay operator and feedback, with operational semantics.
Discrete linear time-invariant dynamical systems (LTIDS): Fong, Sobocinski, and Rapisarda, 2016. Constructs a full and faithful functor from a freely generated symmetric monoidal theory into the PROP of LTIDSs and characterizes controllability in terms of spans, among other things—not only using Willems’ definitions and philosophy, but even some of his theorems.
General frameworks:
Algebra of Open and Interconnected Systems: Brendan Fong’s 2016 doctoral thesis. Covers his technique of decorated cospans as well as more recent work on decorated corelations, both of which are especially useful to construct syntactic categories for various kinds of non-categorical diagrams.
Topos of behavior types: Schultz and Spivak, 2017. Constructs a temporal type theory, as the internal logic of the category of sheaves on an interval domain, in which every object represents a behavior that seems to be essentially in Willems’ sense of the word.
Operad of wiring diagrams: Vagner, Spivak and Lerman, 2015. Formalises construction of systems of differential equations on manifolds using Spivak’s “operad of wiring diagrams” approach to composition, which is conceptually similar to Willems’ notion of hierarchical zooming into modules.
Signal flow graphs: Bonchi, Sobocinski and Zanasi, 2017. Sound and complete axiomatization of signal flow graphs, arguably the primary incumbent against which Willems’ behavioral approach contends.
Bond graphs: Brandon Coya, 2017. Defines a category of bond graphs (an older general modeling framework which Willems acknowledges as a step in the right direction toward a behavioral approach) with functorial semantics as Lagrangian relations.
Cospan/Span(Graph): Gianola, Kasangian and Sabadini, 2017. Reviews a line of work, mostly done in the 1990s by Sabadini, Walters and collaborators, on what are essentially open and interconnected labeled-transition systems.
Many thanks to Pawel Sobocinski and Brendan Fong for feedback on this post, and to Sophie Raynor and other members of the seminar for thoughts and discussions.
^{1} see e.g. Spivak and Schultz’s Temporal type theory; Fong, Sobocinski, and Rapisarda’s Categorical approach to open and interconnected dynamical systems; Bonchi, Sobocinski, and Zanasi’s Categorical semantics of signal flow graphs
^{2} Depending on the formalisation, a system of differential equations could technically contain equality constraints without any actual derivatives, such as $x^2 + y^2 - 1 = 0$, which can restrict the feasible region without augmenting the modeling framework to include inequalities. We could even impose the constraint $x \leq 0$ by using a non-analytic function: $0 = \begin{cases} e^{-1/x} & \text{if } x > 0 \\ 0 & \text{if } x \leq 0 \end{cases}$
^{3} Note: From a computer science perspective, this says that any “mathematical model” must be a non-deterministic model, as opposed to, on the one hand, a deterministic model (which would pick out an element of $\mathbb{V}$), or, on the other hand, a probabilistic model (which would give a distribution over $\mathbb{V}$). If we are given free choice of $\mathbb{V}$, any of these kinds of model is encodable as any other, but the choice is significant when it comes to composition.
Today my lecture was crowd-sourced! In response to popular demand from the students, I spoke about cosmological large-scale structure experiments. I described how the large surveys are collapsed to symmetry-respecting mean, variance, and three-point functions, and how simulations of large-scale structure are used to build surrogate likelihood functions for these summary statistics.
The fate of the current Wisconsin Assembly district map, precision-engineered to maintain a Republican majority in the face of anything short of a major Democratic wave election, is in the hands of the Supreme Court, which could announce a decision in Gill v. Whitford any day.
One theory of gerrymandering is that the practice isn’t much of a problem, because the power of a gerrymandered map “decays” with time — a map that suits a party in 2010 may, due to shifting demographics, be reasonably fair a few years later.
How’s the Wisconsin gerrymander doing in 2018? We just had a statewide election in which Rebecca Dallet, the more liberal candidate, beat her conservative rival by 12 points, an unusually large margin for a Wisconsin statewide race.
The invaluable J. Miles Coleman broke the race down by Assembly district:
Dallet won in 58% of seats while getting 56% of the vote. That sounds fair, but in fact a candidate who wins by 12 points is typically going to win in more seats than that. (That’s why the courts are right to say proportional representation isn’t a reasonable expectation!)
Here’s the breakdown by Assembly district, shown a little bigger:
Dallet won by 2 points or less in 8 of the Assembly districts. So, as a rough estimate, if her vote share had been 2 points lower, making it a 54-46 rather than a 56-44 race, you might guess she'd have won 49 out of 99 seats. That's consistent with the analysis of Herschlag, Ravier, and Mattingly conducted last year, which estimates that under the current maps Democrats would need an 8-12 point statewide lead in order to win half the Assembly seats. (Figure 5 in the linked paper.)
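The rough estimate above amounts to a uniform-swing calculation: shift every district's margin by the same statewide amount and count the seats that survive. A sketch with made-up margins consistent with the numbers quoted (the real district-level figures are in Coleman's breakdown):

```python
def seats_after_swing(margins, swing):
    # margins: the liberal candidate's margin in each district, in points;
    # apply a uniform statewide swing and count districts still carried
    return sum(1 for m in margins if m + swing > 0)

# toy data: 8 districts won by 2 points or less, the rest safely decided,
# totalling 99 districts with 57 wins (about 58% of seats)
margins = [1.5] * 8 + [10.0] * 49 + [-8.0] * 42
assert seats_after_swing(margins, 0) == 57
# a 2-point statewide shift against her flips exactly the 8 close seats
assert seats_after_swing(margins, -2) == 49
```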
I don’t think the gerrymander is decaying very much. I think it’s robust enough to make GOP legislative control very likely through 2020, at which point it can be updated to last another ten years, and so on and so on. This isn’t the same kind of softcore gerrymandering the Supreme Court allowed to stand in 1986, and I hope the 2018 Supreme Court decides to do something about it.
Today was my second day of lecturing at TASI, and I gave one (morning) lecture, on the use of MCMC sampling. In the afternoon, I looked at (for the first time) the GALAH data on detailed element abundances of stars. I looked at the question of whether the chemical abundances could be used to predict the Galactocentric radii. The idea is: If the gas involved in star formation is azimuthally mixed, there ought to be relationships between radius in the disk and chemical abundances. They didn't jump out! I have various ideas about why, but for now this will be back-burner.
Today was the sixth day (but my first day) of the Theory Advanced Study Institute summer school at CU Boulder. I gave two 75-minute lectures on data analysis, my first two of five lectures this week. In the first, I tried to boil down data analysis to a set of over-arching principles. I got 8 principles. Maybe this is the introduction to the book I will never write! In the second lecture I spoke about fitting a model, from a frequentist perspective, but with a focus on the likelihood function. I am loving the interactive audience. See the wiki for a (constantly updating) description of the lectures I am giving.
There’s been a lot of progress on the ‘field with one element’ since I discussed it back in “week259”. I’ve been starting to learn more about it, and especially its possible connections to the Riemann Hypothesis. This is a great place to start:
Abstract. This text serves as an introduction to $\mathbb{F}_1$-geometry for the general mathematician. We explain the initial motivations for $\mathbb{F}_1$-geometry in detail, provide an overview of the different approaches to $\mathbb{F}_1$ and describe the main achievements of the field.
Lorscheid’s paper describes various approaches. Since I’m hoping the key to $\mathbb{F}_1$-mathematics is something very simple and natural that we haven’t fully explored yet, I’m especially attracted to these:
Deitmar’s approach, which generalizes algebraic geometry by replacing commutative rings with commutative monoids. A lot of stuff in algebraic geometry, like the ideals and spectra of commutative rings, or the theory of schemes, doesn’t really require the additive aspect of a ring! So, for many purposes we can get away with commutative monoids, where we think of the monoid operation as multiplication. Sometimes it’s good to use commutative monoids equipped with a ‘zero’ element. The main problem is that algebraic geometry without addition seems to be approximately the same as toric geometry — a wonderful subject, but not general enough to handle everything we want from schemes over $\mathbb{F}_1$.
Toën and Vaquié’s approach, which goes further and replaces commutative rings by commutative monoid objects in symmetric monoidal categories (which work best when they’re complete, cocomplete and cartesian closed). If our symmetric monoidal category is $(\mathbf{AbGp}, \otimes)$ we’re back to commutative rings, if it’s $(\mathbf{Set}, \times)$ we’ve got commutative monoids, but there are plenty of other nice choices: for example if it’s $(\mathbf{CommMon}, \otimes)$ we get commutative rigs, which are awfully nice.
One can also imagine ‘homotopy-coherent’ or ‘$\infty$-categorical’ analogues of these two approaches, which might provide a good home for certain ways the sphere spectrum shows up in this business as a substitute for the integers. For example, one could imagine that the ultimate replacement for a commutative ring is an $E_\infty$ algebra inside a symmetric monoidal $(\infty,1)$-category.
However, it’s not clear to me that homotopical thinking is the main thing we need to penetrate the mysteries of $\mathbb{F}_1$. There seem to be some other missing ideas….
Lorscheid’s own approach uses ‘blueprints’. A blueprint $(R,S)$ is a commutative rig $R$ equipped with a subset $S \subseteq R$ that’s closed under multiplication, contains $0$ and $1$, and generates $R$ as a rig.
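As a toy instance of the definition (my own example, not one from Lorscheid's paper): the natural numbers $\mathbb{N}$ form a commutative rig, and $S = \{0, 1\}$ is closed under multiplication, contains $0$ and $1$, and generates $\mathbb{N}$ as a rig, since every natural number is a finite sum of $1$s. A quick mechanical check over a finite window:

```python
window = range(10)   # a finite window into the rig N (toy check only)
S = {0, 1}

# S is closed under multiplication and contains 0 and 1:
assert all(a * b in S for a in S for b in S)
assert 0 in S and 1 in S

# S generates N as a rig: every natural number n is the sum of n copies of 1
assert all(sum([1] * n) == n for n in window)
```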
I have trouble, just on general aesthetic grounds, believing that blueprints are the final, ultimate solution to the quest for a generalization of commutative rings that can handle the “field with one element”. They just don’t seem ‘god-given’ the way commutative monoids or commutative monoid objects are. But they do various nice things.
Maybe someone has answered this already, since it’s a kind of obvious question:
Question. Is there a symmetric monoidal category $\mathbf{C}$ in which blueprints are the commutative monoid objects?
Maybe something like the category of ‘abelian groups equipped with a set of generators’?
Of course you should want to know what morphisms of blueprints are, because really we should want the category of commutative monoid objects in $\mathbf{C}$ to be equivalent to the category of blueprints. Luckily Lorscheid’s morphisms of blueprints are the obvious thing: a morphism $f : (R,S) \to (R',S')$ is a morphism of commutative rigs $f: R \to R'$ with $f(S) \subseteq S'$.
Anyway, there’s a lot more to say about $\mathbb{F}_1$, but Lorscheid’s paper is a great way to get into this subject.
In completely separate video news, here are videos of lectures I gave at CERN several years ago: “Cosmology for Particle Physicists” (May 2005). These are slightly technical — at the very least they presume you know calculus and basic physics — but are still basically accurate despite their age.
Update: I originally linked these from YouTube, but apparently they were swiped from this page at CERN, and have been taken down from YouTube. So now I’m linking directly to the CERN copies. Thanks to commenters Bill Schempp and Matt Wright.
This post is to announce that a new journal, Advances in Combinatorics, has just opened for submissions. I shall also say a little about the journal, about other new journals, about my own experiences of finding journals I am happy to submit to, and about whether we are any nearer a change to more sensible systems of dissemination and evaluation of scientific papers.
Advances in Combinatorics is set up as a combinatorics journal for high-quality papers, principally in the less algebraic parts of combinatorics. It will be an arXiv overlay journal, so free to read, and it will not charge authors. Like its cousin Discrete Analysis (which has recently published its 50th paper) it will be run on the Scholastica platform. Its minimal costs are being paid for by the library at Queen’s University in Ontario, which is also providing administrative support. The journal will start with a small editorial board. Apart from me, it will consist of Béla Bollobás, Reinhard Diestel, Dan Kral, Daniela Kühn, James Oxley, Bruce Reed, Gabor Sarkozy, Asaf Shapira and Robin Thomas. Initially, Dan Kral and I will be the managing editors, though I hope to find somebody to replace me in that role once the journal is established. While I am posting this, Dan is simultaneously announcing the journal at the SIAM conference in Discrete Mathematics, where he has just given a plenary lecture. The journal is also being announced by COAR, the Confederation of Open Access Repositories. This project aligned well with what they are trying to do, and it was their director, Kathleen Shearer, who put me in touch with the library at Queen’s.
As with Discrete Analysis, all members of the editorial board will be expected to work: they won’t just be lending their names to give the journal bogus prestige. Each paper will be handled by one of the editors, who, after obtaining external opinions (when the paper warrants them) will make a recommendation to the rest of the board. All decisions will be made collectively. The job of the managing editors will be to make sure that this process runs smoothly, but when it comes to decisions, they will have no more say than any other editor.
The rough level that the journal is aiming at is that of a top specialist journal such as Combinatorica. The reason for setting it up is that there is a gap in the market for an “ethical” combinatorics journal at that level — that is, one that is not published by one of the major commercial publishers, with all the well known problems that result. We are not trying to destroy the commercial combinatorial journals, but merely to give people the option of avoiding them if they would prefer to submit to a journal that is not complicit in a system that uses its monopoly power to ruthlessly squeeze library budgets.
We are not the first ethical journal in combinatorics. Another example is The Electronic Journal of Combinatorics, which was set up by Herb Wilf back in 1994. The main difference between EJC and Advances in Combinatorics is that we plan to set a higher bar for acceptance, even if it means that we accept only a small number of papers. (One of the great advantages of a fully electronic journal is that we do not have a fixed number of issues per year, so we will not have to change our standards artificially in order to fill issues or clear backlogs.) We thus hope that EJC and AIC will between them offer suitable potential homes for a wide range of combinatorics papers. And on the more algebraic side, one should also mention Algebraic Combinatorics, which used to be the Springer journal The Journal of Algebraic Combinatorics (which officially continues with an entirely replaced editorial board — I don’t know whether it’s getting many submissions though), and also the Australasian Journal of Combinatorics.
So if you’re a combinatorialist who is writing up a result that you think is pretty good, then please consider submitting it to us. What do we mean by “pretty good”? My personal view — that is, I am not speaking for the rest of the editorial board — is that the work in a good paper should have a clear reason for others to be interested in it (so not, for example, incremental progress in some pet project of the author) and should have something about it that makes it count as a significant achievement, such as solving a well-known problem, clearing a difficult technical hurdle, inventing a new and potentially useful technique, or giving a beautiful and memorable proof.
Suppose that you want to submit an article to a journal that is free to read and does not charge authors. What are your options? I don’t have a full answer to this question, so I would very much welcome feedback from other people, especially in areas of mathematics far from my own, about what the options are for them. But a good starting point is to consult the list of current member journals in the Free Journal Network, which Advances in Combinatorics hopes to join in due course.
Three notable journals not on that list are the following.
Added later: I learn from Benoît Kloeckner and Emmanuel Kowalski in the comments below that my information about Algebra and Number Theory was wrong, since articles in that journal are not free to read until they are five years old. However, it is published by MSP, which is a nonprofit organization, so as subscription journals go it is at the ethical end of the spectrum.
Further update: I have heard from the editors of Annales Henri Lebesgue that they have had a number of strong submissions and expect the level of the journal to be at least as high as that of journals such as Advances in Mathematics, Mathematische Annalen and the Israel Journal of Mathematics, and perhaps even slightly higher.
I would very much like to hear from people who would prefer to avoid the commercially published journals, but can’t, because there are no ethical journals of a comparable standard in their area. I hope that combinatorialists will no longer have that problem. My impression is that there is a lack of suitable journals in analysis and I’m told that the same is true of logic. I’m not quite sure what the situation is in geometry or algebra. (In particular, I don’t know whether Algebra and Number Theory is also considered as the top specialist journal for algebraists.) Perhaps in some areas there are satisfactory choices for papers of some standards but not of others: that too would be interesting to know. Where do you think the gaps are? Let me know in the comments below.
I want to make one point loud and clear, which is that the mechanics of starting a new, academic-run journal are now very easy. Basically, the only significant obstacle is getting together an editorial board with the right combination of reputation in the field and willingness to work. What’s more, unless the journal grows large, the work is quite manageable — all the more so if it is spread reasonably uniformly amongst the editorial board. Creating the journal itself can be done on one of a number of different platforms, either for no charge or for a very small charge. Some examples are the Mersenne platform, which hosts the Annales Henri Lebesgue, the Episciences platform, which hosts the Epijournal de Géométrie Algébrique, and Scholastica, which, as I mentioned above, hosts Discrete Analysis and Advances in Combinatorics.
Of these, Scholastica charges a submission fee of $10 per article and the other two are free. There are a few additional costs — for example, Discrete Analysis pays a subscription to CrossRef in order to give DOIs to articles — but the total cost of running a new journal that isn’t too large is of the order of a few hundred dollars per year, as long as nobody is paid for what they do. (Discrete Analysis, like Advances in Combinatorics, gets very useful assistance from librarians, provided voluntarily, but even if they were paid the going rate, the total annual costs would be of the same order of magnitude as one “article processing charge” of the traditional publishers, which is typically around $1500 per article.)
What’s more, those few hundred dollars are not an obstacle either. For example, I know of a fund that is ready to support at least one other journal of a similar size to Discrete Analysis, there are almost certainly other libraries that would be interested in following the enlightened example of Queen’s University Library and supporting a journal (if you are a librarian reading this, then I strongly recommend doing so, as it will be helping to weaken the hold of the system that is currently costing you orders of magnitude more money), and I know various people who know about other means of obtaining funding. So if you are interested in starting a journal and think you can put together a credible editorial board, then get in touch: I can offer advice, funding (if the proposal looks a good one), and contact with several other people who are knowledgeable and keen to help.
My attitudes to journals and the journal system have evolved quite a lot in the last few years. The alert reader may have noticed that I’ve got a long way through this post before mentioning the E-word. I still think that Elsevier is the publisher that does most damage, and have stuck rigidly to my promise made over six years ago not to submit a paper to them or to do editorial or refereeing work. However, whereas then I thought of Springer as somehow more friendly to mathematics, thanks to its long tradition of publishing important textbooks and monographs, I now feel pretty uncomfortable about all the big four — Elsevier, Springer, Wiley, and Taylor and Francis — with Springer having got a whole lot worse after merging with Nature Macmillan. And in some respects Elsevier is better than Springer: for example, they make all mathematics papers over four years old freely available, while Springer refuses to do so. Admittedly this was basically a sop to mathematicians to keep us quiet, but as sops go it was a pretty good one, and I see now that Elsevier’s open archive, as they call it, includes some serious non-mathematical journals such as Cell. (See their list of participating journals for details.)
I’m also not very comfortable with the society journals and university presses, since although they use their profits to benefit mathematics in various ways, they are fully complicit in the system of big deals, the harm of which outweighs those benefits.
The result is that if I have a paper to submit, I tend to have a lot of trouble finding a suitable home for it, and I end up having to compromise on my principles to some extent (particularly if, as happened recently, I have a young coauthor from a country that uses journal rankings to evaluate academics). An obvious place to submit to would be Discrete Analysis, but I feel uncomfortable about that for a different reason, especially now that I have discovered that the facility that enables all the discussion of a paper to be hidden from selected editors does not allow me, as administrator of the website, to hide a paper from myself. (I won’t have this last problem with Advances in Combinatorics, since the librarians at Queen’s will have the administrator role on the system.)
So my personal options are somewhat limited, but getting better. If I have willing coauthors, then I would now consider (if I had a suitable paper), Acta Mathematica, Annales Henri Lebesgue, Journal de l’École Polytechnique, Discrete Analysis perhaps (but only if the other editors agreed to process my paper offline), Advances in Combinatorics, the Theory of Computing, Electronic Research Announcements in the Mathematical Sciences, the Electronic Journal of Combinatorics, and the Online Journal of Analytic Combinatorics. I also wouldn’t rule out Forum of Mathematics. A couple of journals to which I have an emotional attachment even if I don’t really approve of their practices are GAFA and Combinatorics, Probability and Computing. (The latter bothers me because it is a hybrid journal — that is, it charges subscriptions but also lets authors pay large APCs to make their articles open access, and I heard recently that if you choose the open access option, CUP retains copyright, so you’re not getting that much for your money. But I think not many authors choose this option. The former is also a hybrid journal, and is published by Springer.) Annals of Mathematics, if I’m lucky enough to have an Annals-worthy paper (though I think now I’d try Acta first), is not too bad — although its articles aren’t open access, their subscription costs are much more reasonable than most journals.
That’s a list off the top of my head: if you think I’ve missed out a good option, then I’d be very happy to hear about it.
As an editor, I have recently made the decision that I want to devote all my energies to promoting journals and “post-journal” systems that I fully approve of. So in order to make time for the work that will be involved in establishing Advances in Combinatorics, I have given notice to Forum of Mathematics and Mathematika, the two journals that took up the most of my time, that I will leave their editorial boards at the end of 2018. I feel quite sad about Forum of Mathematics, since I was involved in it from the start, and I really like the way it runs, with proper discussions amongst all the editors about the decisions we make. Also, I am less hostile (for reasons I’ve given in the past) to its APC model than most mathematicians. However, although I am less hostile, I could never say that I have positively liked it, and I came to the conclusion quite a while ago that, as many others have also said, it simply can’t be made to work satisfactorily: it will lead to just as bad market abuses as there are with the subscription system. In the UK it has been a disaster — government open-access mandates have led to universities paying as much as ever for subscriptions and then a whole lot extra for APCs. And there is a real worry that subscription big deals will be replaced by APC big deals, where a country pays a huge amount up front to a publisher in return for people from that country being able to publish with them. This, for example, is what Germany is pushing for. Fortunately, for the moment (if I understand correctly, though I don’t have good insider information on this) they are asking for the average fee per article to be much lower than Elsevier is prepared to accept: long may that impasse continue.
So my leaving Forum of Mathematics is not a protest against it, but simply a practical step that will allow me to focus my energies where I think they can do the most good. I haven’t yet decided whether I ought to resign in protest from some other editorial boards of journals that don’t ask anything of me. Actually, even the practice of having a long list of names of editors, most of whom have zero involvement in the decisions of the journal, is one that bothers me. I recently heard of an Elsevier journal where almost all the editorial board would be happy to resign en masse and set up an ethical version, but the managing editor is strongly against. “But why don’t the rest of the board resign in that case?” I naively asked, to which the answer was, “Because he’s the one who does all the work!” From what I understood, this is literally true — the managing editor handles all the papers and makes all the decisions — but I’m not 100% sure about that.
Probably major change, if it happens, will be the result of decisions made by major players such as government agencies, national negotiators, and so on. Compared with big events like the Elsevier negotiations in Germany, founding a new journal is a very small step. And even if all mathematicians gave up using the commercial publishers (not something I expect to see any time soon), that would have almost no direct effect, since mathematics journals are bundled together with journals in other subjects, which would continue with the current system.
However, this is a familiar situation in politics. Big decisions are taken by people in positions of power, but what prompts them to make those decisions is often the result of changes in attitudes and behaviour of voters. And big behavioural changes do happen in academia. For example, as we all know, many people have got into the habit of posting all their work on the arXiv, and this accumulation of individual decisions has had the effect of completely changing the way dissemination works in some subjects, including mathematics, a change that has significantly weakened the hold that journals have — or would have if they weren’t bundled together with other journals. Who would ever subscribe at vast expense to a mathematics journal when almost all its content is available online in preprint form?
So I see Advances in Combinatorics as a small step certainly, but a step that needs to be taken. I hope that it will demonstrate once again that starting a serious new journal is not that hard. I also hope that the current trickle of such journals will turn into a flood, that after the flood it will not be possible for people to argue that they are forced to submit articles to the commercial publishers, and that at some point, someone in a position of power will see what is going on, understand better the absurdities of the current system, and take a decision that benefits us all.
The covariant power set functor $P : Set \to Set$ can be made into a monad whose multiplication $m_X: P(P(X)) \to P(X)$ turns a subset of the set of subsets of $X$ into a subset of $X$ by taking their union. Algebras of this monad are complete semilattices.
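As a toy illustration (a Python sketch of our own, using frozensets to model finite subsets; none of these names come from any library), the multiplication is just a union:

```python
def m(sss):
    """Multiplication m_X of the power set monad: union a set of
    subsets of X into a single subset of X."""
    return frozenset().union(*sss)

# An element of P(P(X)) for X = {1, 2, 3}:
sss = frozenset({frozenset({1, 2}), frozenset({2, 3}), frozenset()})
assert m(sss) == frozenset({1, 2, 3})
```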
But what about powers of the power set functor? Yesterday Jules Hedges pointed out this paper:
The authors prove that $P^n$ cannot be made into a monad for $n \ge 2$.
I’ve mainly looked at their proof for the case $n = 2$. I haven’t completely worked through it, but it focuses on the unit of any purported monad structure for $P^2$, rather than its multiplication. Using a cute Yoneda trick they show there are only four possible units, corresponding to the four elements of $P(P(1))$. Then they show these can’t work. The argument involves sets like this:
As far as I’ve seen, they don’t address the following question:
Question. Does there exist an associative multiplication $m: P^2 P^2 \Rightarrow P^2$? In other words, is there a natural transformation $m: P^2 P^2 \Rightarrow P^2$ such that
$P^2 P^2 P^2 \stackrel{m P^2 }{\Rightarrow} P^2 P^2 \stackrel{m}{\Rightarrow} P^2$
equals
$P^2 P^2 P^2 \stackrel{P^2 m}{\Rightarrow} P^2 P^2 \stackrel{m}{\Rightarrow} P^2 .$
I’m not very good at these things, so this question might be very easy to answer. But if the answer were “obviously no” then you’d think Klin and Salamanca might have mentioned that. They do prove there is no distributive law $P P \Rightarrow P P$. But they also give examples of monads $T$ for which there’s no distributive law $T T \Rightarrow T T$, yet there’s still a way to make $T^2$ into a monad.
As far as I can tell, my question is fairly useless: does anyone consider “semigroupads”, namely monads without unit? Nonetheless I’m curious.
If there were a positive answer, we’d have a natural way to take a set of sets of sets of sets and turn it into a set of sets in such a way that the two most obvious resulting ways to turn a set of sets of sets of sets of sets of sets into a set of sets agree!
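For the power set monad itself ($n = 1$), the union multiplication does have exactly this associativity property, and it is easy to check numerically. Here is a small sanity check of the associativity square for $P$ (our own toy code; it says nothing about the open question for $P^2$):

```python
def flatten(ss):
    """m_X : P(P(X)) -> P(X), union of a set of sets."""
    return frozenset().union(*ss)

def pmap(f, s):
    """The functor action P(f): apply f to each element of s."""
    return frozenset(f(x) for x in s)

# An element of P(P(P(X))):
sss = frozenset({
    frozenset({frozenset({1}), frozenset({2, 3})}),
    frozenset({frozenset({3, 4})}),
})

# The two composites m . (m P) and m . (P m) agree on P^3:
assert flatten(flatten(sss)) == flatten(pmap(flatten, sss))
assert flatten(flatten(sss)) == frozenset({1, 2, 3, 4})
```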
This is a sequel to this previous blog post, in which we discussed the effect of the heat flow evolution
on the zeroes of a time-dependent family of polynomials , with a particular focus on the case when the polynomials had real zeroes. Here (inspired by some discussions I had during a recent conference on the Riemann hypothesis in Bristol) we record the analogous theory in which the polynomials instead have zeroes on a circle , with the heat flow slightly adjusted to compensate for this. As we shall discuss shortly, a key example of this situation arises when is the numerator of the zeta function of a curve.
More precisely, let be a natural number. We will say that a polynomial
of degree (so that ) obeys the functional equation if the are all real and
for all , thus
and
for all non-zero . This means that the zeroes of (counting multiplicity) lie in and are symmetric with respect to complex conjugation and inversion across the circle . We say that this polynomial obeys the Riemann hypothesis if all of its zeroes actually lie on the circle . For instance, in the case, the polynomial obeys the Riemann hypothesis if and only if .
Such polynomials arise in number theory as follows: if is a projective curve of genus over a finite field , then, as famously proven by Weil, the associated local zeta function (as defined for instance in this previous blog post) is known to take the form
where is a degree polynomial obeying both the functional equation and the Riemann hypothesis. In the case that is an elliptic curve, then and takes the form , where is the number of -points of minus . The Riemann hypothesis in this case is a famous result of Hasse.
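Hasse's bound is easy to test numerically. The following sketch brute-forces the point count of a hypothetical elliptic curve $y^2 = x^3 + x + 1$ over a small prime field (curve and prime chosen purely for illustration) and checks that the trace has absolute value at most $2\sqrt{p}$:

```python
import math

def count_points(a, b, p):
    """Brute-force count of points on y^2 = x^3 + a*x + b over F_p,
    including the point at infinity."""
    num_roots = {}  # c -> number of y in F_p with y^2 = c
    for y in range(p):
        c = y * y % p
        num_roots[c] = num_roots.get(c, 0) + 1
    total = 1  # the point at infinity
    for x in range(p):
        total += num_roots.get((x * x * x + a * x + b) % p, 0)
    return total

p = 101
a_p = p + 1 - count_points(1, 1, p)  # the quantity called the trace in the text
assert abs(a_p) <= 2 * math.sqrt(p)  # Hasse's bound
```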
Another key example of such polynomials arise from rescaled characteristic polynomials
of matrices in the compact symplectic group . These polynomials obey both the functional equation and the Riemann hypothesis. The Sato-Tate conjecture (in higher genus) asserts, roughly speaking, that “typical” polynomials arising from the number theoretic situation above are distributed like the rescaled characteristic polynomials (1), where is drawn uniformly from with Haar measure.
Given a polynomial of degree with coefficients
we can evolve it in time by the formula
thus for . Informally, as one increases , this evolution accentuates the effect of the extreme monomials, particularly, and at the expense of the intermediate monomials such as , and conversely as one decreases . This family of polynomials obeys the heat-type equation
In view of the results of Marcus, Spielman, and Srivastava, it is also very likely that one can interpret this flow in terms of expected characteristic polynomials involving conjugation over the compact symplectic group , and should also be tied to some sort of “” version of Brownian motion on this group, but we have not attempted to work this connection out in detail.
It is clear that if obeys the functional equation, then so does for any other time . Now we investigate the evolution of the zeroes. Suppose at some time that the zeroes of are distinct, then
From the inverse function theorem we see that for times sufficiently close to , the zeroes of continue to be distinct (and vary smoothly in ), with
Differentiating this at any not equal to any of the , we obtain
and
and
Inserting these formulae into (2) (expanding as ) and canceling some terms, we conclude that
for sufficiently close to , and not equal to . Extracting the residue at , we conclude that
which we can rearrange as
If we make the change of variables (noting that one can make depend smoothly on for sufficiently close to ), this becomes
Intuitively, this equation asserts that the phases repel each other if they are real (and attract each other if their difference is imaginary). If obeys the Riemann hypothesis, so that the are all real at time , then the Picard uniqueness theorem (applied to and its complex conjugate) shows that the are also real for sufficiently close to . If we then define the entropy functional
then the above equation becomes a gradient flow
which implies in particular that is non-increasing in time. This shows that as one evolves time forward from , there is a uniform lower bound on the separation between the phases , and hence the equation can be solved indefinitely; in particular, obeys the Riemann hypothesis for all if it does so at time . Our argument here assumed that the zeroes of were simple, but this assumption can be removed by the usual limiting argument.
For any polynomial obeying the functional equation, the rescaled polynomials converge locally uniformly to as . By Rouche’s theorem, we conclude that the zeroes of converge to the equally spaced points on the circle . Together with the symmetry properties of the zeroes, this implies in particular that obeys the Riemann hypothesis for all sufficiently large positive . In the opposite direction, when , the polynomials converge locally uniformly to , so if , of the zeroes converge to the origin and the other converge to infinity. In particular, fails the Riemann hypothesis for sufficiently large negative . Thus (if ), there must exist a real number , which we call the de Bruijn-Newman constant of the original polynomial , such that obeys the Riemann hypothesis for and fails the Riemann hypothesis for . The situation is a bit more complicated if vanishes; if is the first natural number such that (or equivalently, ) does not vanish, then by the above arguments one finds in the limit that of the zeroes go to the origin, go to infinity, and the remaining zeroes converge to the equally spaced points . In this case the de Bruijn-Newman constant remains finite except in the degenerate case , in which case .
For instance, consider the case when and for some real with . Then the quadratic polynomial
has zeroes
and one easily checks that these zeroes lie on the circle when , and are on the real axis otherwise. Thus in this case we have (with if ). Note how as increases to , the zeroes repel each other and eventually converge to , while as decreases to , the zeroes collide and then separate on the real axis, with one zero going to the origin and the other to infinity.
The arguments in my paper with Brad Rodgers (discussed in this previous post) indicate that for a “typical” polynomial of degree that obeys the Riemann hypothesis, the expected time to relaxation to equilibrium (in which the zeroes are equally spaced) should be comparable to , basically because the average spacing is and hence by (3) the typical velocity of the zeroes should be comparable to , and the diameter of the unit circle is comparable to , thus requiring time comparable to to reach equilibrium. Taking contrapositives, this suggests that the de Bruijn-Newman constant should typically take on values comparable to (since typically one would not expect the initial configuration of zeroes to be close to evenly spaced). I have not attempted to formalise or prove this claim, but presumably one could do some numerics (perhaps using some of the examples of given previously) to explore this further.
In particle physics, we almost always use approximations.
Often, we assume the forces we consider are weak. We use a “coupling constant”, some number written or or , and we assume it’s small, so is greater than is greater than . With this assumption, we can start drawing Feynman diagrams, and each “loop” we add to the diagram gives us a higher power of .
If isn’t small, then the trick stops working, the diagrams stop making sense, and we have to do something else.
Except for some times, when everything keeps working fine. This week, along with Simon Caron-Huot, Lance Dixon, Andrew McLeod, and Georgios Papathanasiou, I published what turned out to be a pretty cute example.
We call this fellow . It’s a family of diagrams that we can write down for any number of loops: to get more loops, just extend the “…”, adding more boxes in the middle. Count the number of lines sticking out, and you get six: these are “hexagon functions”, the type of function I’ve used to calculate six-particle scattering in N=4 super Yang-Mills.
The fun thing about is that we don’t have to think about it this way, one loop at a time. We can add up all the loops, times one loop plus times two loops plus times three loops, all the way up to infinity. And we’ve managed to figure out what those loops sum to.
The result ends up beautifully simple. This formula isn’t just true for small coupling constants, it’s true for any number you care to plug in, making the forces as strong as you’d like.
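As a toy analogy (ours, not the actual formula for the diagram family in the paper), think of a "loop expansion" whose coefficients happen to form a geometric series: the truncated sum only works at weak coupling, while the resummed closed form makes sense at any coupling strength:

```python
def partial_sum(g, loops):
    """Truncated 'loop expansion': sum of g**L over L = 0..loops."""
    return sum(g ** L for L in range(loops + 1))

def resummed(g):
    """Closed form of the full sum, 1 / (1 - g): sensible for any
    g != 1, even where the series itself diverges."""
    return 1.0 / (1.0 - g)

# Weak coupling: a few loops already agree with the closed form
assert abs(partial_sum(0.1, 10) - resummed(0.1)) < 1e-10
# Strong coupling: the truncated series is useless, the closed form is not
assert resummed(5.0) == -0.25
```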
We can do this with because we have equations relating different loops together. Solving those equations with a few educated guesses, we can figure out the full sum. We can also go back, and use those equations to take the s at each loop apart, finding a basis of functions needed to describe them.
That basis is the real reward here. It’s not the full basis of “hexagon functions”: if you wanted to do a full six-particle calculation, you’d need more functions than the ones is made of. What it is, though, is a basis we can describe completely, stating exactly what it’s made of for any number of loops.
We can’t do that with the hexagon functions, at least not yet: we have to build them loop by loop, one at a time before we can find the next ones. The hope, though, is that we won’t have to do this much longer. The basis covers some of the functions we need. Our hope is that other nice families of diagrams can cover the rest. If we can identify more functions like , things that we can sum to any number of loops, then perhaps we won’t have to think loop by loop anymore. If we know the right building blocks, we might be able to guess the whole amplitude, to find a formula that works for any you’d like.
That would be a big deal. N=4 super Yang-Mills isn’t the real world, but it’s complicated in some of the same ways. If we can calculate there without approximations, it should at least give us an idea of what part of the real-world answer can look like. And for a field that almost always uses approximations, that’s some pretty substantial progress.
Important note: As this is not a course in probability, we will try to avoid developing the general theory of stochastic calculus (which includes such concepts as filtrations, martingales, and Ito calculus). This will unfortunately limit what we can actually prove rigorously, and so at some places the arguments will be somewhat informal in nature. A rigorous treatment of many of the topics here can be found for instance in Lawler’s Conformally Invariant Processes in the Plane, from which much of the material here is drawn.
In these notes, random variables will be denoted in boldface.
Definition 1 A real random variable is said to be normally distributed with mean and variance if one has
for all test functions . Similarly, a complex random variable is said to be normally distributed with mean and variance if one has
for all test functions , where is the area element on .
A real Brownian motion with base point is a random, almost surely continuous function (using the locally uniform topology on continuous functions) with the property that (almost surely) , and for any sequence of times , the increments for are independent real random variables that are normally distributed with mean zero and variance . Similarly, a complex Brownian motion with base point is a random, almost surely continuous function with the property that and for any sequence of times , the increments for are independent complex random variables that are normally distributed with mean zero and variance .
Remark 2 Thanks to the central limit theorem, the hypothesis that the increments be normally distributed can be dropped from the definition of a Brownian motion, so long as one retains the independence and the normalisation of the mean and variance (technically one also needs some uniform integrability on the increments beyond the second moment, but we will not detail this here). A similar statement is also true for the complex Brownian motion (where now we need to normalise the variances and covariances of the real and imaginary parts of the increments).
Real and complex Brownian motions exist from any base point or ; see e.g. this previous blog post for a construction. We have the following simple invariances:
Exercise 3
- (i) (Translation invariance) If is a real Brownian motion with base point , and , show that is a real Brownian motion with base point . Similarly, if is a complex Brownian motion with base point , and , show that is a complex Brownian motion with base point .
- (ii) (Dilation invariance) If is a real Brownian motion with base point , and is non-zero, show that is also a real Brownian motion with base point . Similarly, if is a complex Brownian motion with base point , and is non-zero, show that is also a complex Brownian motion with base point .
- (iii) (Real and imaginary parts) If is a complex Brownian motion with base point , show that and are independent real Brownian motions with base point . Conversely, if are independent real Brownian motions of base point , show that is a complex Brownian motion with base point .
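A minimal simulation sketch of these increments (stdlib Python, our own code; the convention that the real and imaginary parts are independent with variance $dt$ each over a time step $dt$ is one common normalisation, assumed here):

```python
import random
import statistics

def complex_increment(rng, dt):
    """One complex Brownian increment over time dt: independent Gaussian
    real and imaginary parts of mean 0 and variance dt (assumed convention)."""
    return complex(rng.gauss(0, dt ** 0.5), rng.gauss(0, dt ** 0.5))

rng = random.Random(0)
# 2000 independent samples of the increment B(1) - B(0):
endpoints = [complex_increment(rng, 1.0) for _ in range(2000)]
re = [z.real for z in endpoints]
im = [z.imag for z in endpoints]
# Empirically: mean ~ 0 and variance ~ dt = 1 in each part
assert abs(statistics.fmean(re)) < 0.1
assert abs(statistics.pvariance(re) - 1.0) < 0.15
assert abs(statistics.fmean(im)) < 0.1
```

The translation and dilation invariances of Exercise 3 then amount to simple affine transformations of such samples.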
The next lemma is a special case of the optional stopping theorem.
Lemma 4 (Optional stopping identities)
- (i) (Real case) Let be a real Brownian motion with base point . Let be a bounded stopping time – a bounded random variable with the property that for any time , the event that is determined by the values of the trajectory for times up to (or more precisely, this event is measurable with respect to the algebra generated by this portion of the trajectory). Then
and
and
- (ii) (Complex case) Let be a complex Brownian motion with base point . Let be a bounded stopping time – a bounded random variable with the property that for any time , the event that is determined by the values of the trajectory for times up to . Then
Proof: (Slightly informal) We just prove (i) and leave (ii) as an exercise. By translation invariance we can take . Let be an upper bound for . Since is a real normally distributed variable with mean zero and variance , we have
and
and
By the law of total expectation, we thus have
and
and
where the inner conditional expectations are with respect to the event that attains a particular point in . However, from the independent increment nature of Brownian motion, once one conditions to a fixed point , the random variable becomes a real normally distributed variable with mean and variance . Thus we have
and
and
which give the first two claims, and (after some algebra) the identity
which then also gives the third claim.
Exercise 5 Prove the second part of Lemma 4.
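The first identity of Lemma 4 (real case with base point 0: the stopped motion has expectation zero) can be checked by a crude Monte Carlo simulation. The discretization below is our own rough sketch, not a rigorous construction:

```python
import random
import statistics

def stopped_value(rng, dt=0.01, barrier=1.0, t_cap=2.0):
    """Discretized real Brownian motion from 0, stopped at the first time
    |B| >= barrier or at the cap t_cap (so the stopping time is bounded)."""
    b, t = 0.0, 0.0
    while t < t_cap:
        b += rng.gauss(0, dt ** 0.5)
        t += dt
        if abs(b) >= barrier:
            break
    return b

rng = random.Random(1)
samples = [stopped_value(rng) for _ in range(4000)]
# Optional stopping: the expectation of the stopped motion is ~ 0
assert abs(statistics.fmean(samples)) < 0.1
```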
— 1. Conformal invariance of Brownian motion —
Let be an open subset of , and a point in . We can define the complex Brownian motion with base point restricted to to be the restriction of a complex Brownian motion with base point to the first time in which the Brownian motion exits (or if no such time exists). We have a fundamental conformal invariance theorem of Lévy:
Theorem 6 (Lévy’s theorem on conformal invariance of Brownian motion) Let be a conformal map between two open subsets of , and let be a complex Brownian motion with base point restricted to . Define a rescaling by
Note that this is almost surely a continuous strictly monotone increasing function. Set (so that is a homeomorphism from to ), and let be the function defined by the formula
Then is a complex Brownian motion with base point restricted to .
Note that this significantly generalises the translation and dilation invariance of complex Brownian motion.
Proof: (Somewhat informal – to do things properly one should first set up Ito calculus) To avoid technicalities we will assume that is bounded above and below on , so that the map is uniformly bilipschitz; the general case can be obtained from this case by a limiting argument that is not detailed here. With this assumption, we see that almost surely extends continuously to the endpoint time if this time is finite. Once one conditions on the value of and up to this time , we then extend this motion further (if ) by declaring for to be a complex Brownian motion with base point , translated in time by . Now is defined on all of , and it will suffice to show that this is a complex Brownian motion based at . The basing is clear, so it suffices to show for all times , the random variable is normally distributed with mean and variance .
Let be a test function. It will suffice to show that
If we define the field
for and , with , then it will suffice to prove the more general claim
for all and (with the convention that is just Brownian motion based at if lies outside of ), where
As is well known, is smooth on and solves the backwards heat equation
on this domain. The strategy will be to show that also solves this equation.
Let and . If then clearly . If instead and , then is a Brownian motion and we have . Now let be small enough that , where is an upper bound for on . Let be the first time such that either or
Then if we let be the quantity
then and . Let us now condition on a specific value of , and on the trajectory up to time . Then the (conditional) distribution of is that of , and hence the conditional expectation is . By the law of total expectation, we conclude the identity
Next, we obtain the analogous estimate
From Taylor expansion we have
Taking expectations and applying Lemma 4, (2) and Hölder’s inequality (which can interpolate between the bounds and to conclude ), we obtain the desired claim (3). Subtracting, we now have
The expression in the expectation vanishes unless , hence by the triangle inequality
Iterating this using the fact that vanishes at , and sending to zero (noting that the cumulative error term will go to zero since ), we conclude that for all , giving the claim.
One can use Lévy’s theorem (or variants of this theorem) to prove various results in complex analysis rather efficiently. As a quick example, we sketch a Brownian motion-based proof of Liouville’s theorem (omitting some technical steps). Suppose for contradiction that we have a nonconstant bounded entire function . If is a complex Brownian motion based at , then a variant of Lévy’s theorem can be used to show that the image is a time parameterisation of Brownian motion. But it is easy to show that Brownian motion is almost surely unbounded, so the image cannot be bounded.
If is an open subset of whose complement contains an arc, then one can show that for any , the complex Brownian motion based at will hit the boundary of in a finite time . The location where this motion first hits the boundary is then a random variable in ; the law of this variable is called the harmonic measure of with base point , and we will denote it by ; it is a probability measure on . The reason for the terminology “harmonic measure” comes from the following:
Theorem 7 Let be a bounded open subset of , and let be a harmonic (or holomorphic) function that extends continuously to . Then for any , one has the representation formula
Proof: (Informal) For simplicity let us assume that extends smoothly to some open neighbourhood of . Let be the motion that is equal to up to time , and then is constant at for all later times. A variant of the Taylor expansion argument used to prove Lévy’s theorem shows that
for any , which on iterating and sending to zero implies that is independent of time. Since this quantity converges to as and to as , the claim follows.
This theorem can also extend to unbounded domains provided that does not grow too fast at infinity (for instance if is bounded, basically thanks to the neighbourhood recurrent properties of complex Brownian motion); we do not give a precise statement here. Among other things, this theorem gives an immediate proof of the maximum principle for harmonic functions, since if on the boundary then from the triangle inequality one has for all . It also gives an alternate route to Liouville’s theorem: if is entire and bounded, then applying the maximum principle to the complement of a small disk we see that for all distinct .
When the boundary is sufficiently nice (e.g. analytic), the harmonic measure becomes absolutely continuous with respect to one-dimensional Lebesgue measure; however, we will not pay too much attention to these sorts of regularity issues in this set of notes.
From Lévy’s theorem on the conformal invariance of Brownian motion we deduce the conformal invariance of harmonic measure: for any conformal map that extends continuously to the boundaries and any , the harmonic measure of with base point is the pushforward of the harmonic measure of with base point , thus
for any continuous compactly supported test function , and also
for any (Borel) measurable .
Exercise 8
- (i) If and , show that the measure on the unit circle is given by
where is arclength measure. In particular, when , then is the uniform measure on the unit circle.
- (ii) If and , show that the measure on the real line is given by
(For this exercise one can assume that harmonic measure is well defined for unbounded domains, and that the representation formula (4) continues to hold for bounded harmonic or holomorphic functions.)
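The special case that harmonic measure from the centre of the disk is uniform invites a quick simulation (a rough discretized sketch of our own, projecting the small overshoot back onto the circle):

```python
import random

def exit_point_unit_disk(rng, dt=2e-3):
    """Run a discretized complex Brownian motion from 0 until it leaves
    the unit disk, then project the overshoot back onto the circle."""
    z = 0j
    while abs(z) < 1.0:
        z += complex(rng.gauss(0, dt ** 0.5), rng.gauss(0, dt ** 0.5))
    return z / abs(z)

rng = random.Random(2)
n = 1500
hits = [exit_point_unit_disk(rng) for _ in range(n)]
upper = sum(1 for z in hits if z.imag > 0) / n
right = sum(1 for z in hits if z.real > 0) / n
# From the centre, harmonic measure on the circle is uniform:
assert abs(upper - 0.5) < 0.06
assert abs(right - 0.5) < 0.06
```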
Exercise 9 (Brownian motion description of conformal mapping) Let be the region enclosed by a Jordan curve , and let be three distinct points on in anticlockwise order. Let be three distinct points on the boundary of the unit disk , again traversed in anticlockwise order. Let be the conformal map that takes to for (the existence and uniqueness of this map follows from the Riemann mapping theorem). Let , and for , let be the probability that the terminal point of Brownian motion at with base point lies in the arc between and (here we use the fact that the endpoints are hit with probability zero, or in other words that the harmonic measure is continuous; see Exercise 15 below). Thus are non-negative and sum to . Let be the complex numbers , , . Show the crossratio identity
In principle, this allows one to describe conformal maps purely in terms of Brownian motion.
We remark that the link between Brownian motion and conformal mapping can help gain an intuitive understanding of the Carathéodory kernel theorem (Theorem 12 from Notes 3). Consider for instance the example in Exercise 13 from those notes. It is intuitively clear that a Brownian motion based at the origin will very rarely pass through the slit between and , instead hitting the right side of the boundary of first. As such, the harmonic measure of the left side of the boundary should be very small, and in fact one can use this to show that the preimage under of the region to the left of the boundary goes to zero in diameter as , which helps explain why the limiting function does not map to this region at all.
Exercise 10 (Brownian motion description of conformal radius)
- (i) Let and with . Show that the probability that the Brownian motion hits the circle before it hits is equal to . (Hint: is harmonic away from the origin.)
- (ii) Let be a simply connected proper subset of , let be a point in , and let be the conformal radius of around . Show that for small , the probability that a Brownian motion based at a point with will hit the circle before it hits the boundary is equal to , where denotes a quantity that goes to zero as .
We now sketch the proof of a basic Brownian motion estimate that is useful in applications. We begin with a lemma that says, roughly speaking, that “folding” a set reduces the probability of it being hit by Brownian motion.
Lemma 12 Let , and let be a closed subset of the unit disk . Write and , and write (i.e. reflected onto the upper half-plane). Let be a complex Brownian motion based at , and let be the first time this motion hits the boundary of . Then
Proof: (Informal) To illustrate the argument at a heuristic level, let us make the (almost surely false) assumption that the Brownian motion only crosses the real axis at a finite set of times before hitting the disk. Then the Brownian motion would split into subcurves for , with the convention that . Each subcurve would lie in either the upper half-plane or the lower half-plane, with equal probability of each; furthermore, one could arbitrarily apply complex conjugation to one or more of these subcurves and still obtain a motion with the same law. Observe that if one conditions on the Brownian motion up to time , and the subcurve has a probability of hitting when it lies in the upper half-plane, and a probability of hitting when it lies in the lower half-plane, then it will have a probability of at most of hitting when it lies in the upper half-plane, and probability of hitting when it lies in the lower half-plane; thus the probability of this subcurve hitting is less than or equal to that of it hitting . In principle, the lemma now follows from repeatedly applying the law of total expectation.
This naive argument does not quite work because a Brownian motion starting at a real number will in fact almost surely cross the real axis an infinite number of times. However it is possible to adapt this argument by redefining the so that after each time , the Brownian motion is forced to move some small distance before one starts looking for the next time it hits the real axis. See the proof of Lemma 6.1 of these notes of Lawler for a complete proof along these lines.
This gives an inequality similar in spirit to the Grötzsch modulus estimate from Notes 2:
Corollary 13 (Beurling projection theorem) Let , and let be a compact connected subset of the annulus that intersects both boundary circles of the annulus. Let be a complex Brownian motion based at , and let be the first time this motion hits the outer boundary of the annulus. Then the probability that intersects is greater than or equal to the probability that intersects the interval .
Proof: (Sketch) One can use the above lemma to fold around the real axis without increasing the probability of being hit by Brownian motion. By rotation, one can similarly fold around any other line through the origin. By repeatedly folding in this fashion to reduce its angular variation, one can eventually replace with a set that lies inside the sector for any . However, by the monotone convergence theorem, the probability that intersects this sector converges to the probability that it intersects in the limit , and the claim follows.
Exercise 14 With the notation as the above corollary, show that the probability that intersects the interval is . (Hint: apply a square root conformal map to the disk with removed, and then compare with the half-plane harmonic measure from Exercise 8(ii).)
The following consequence of the above estimate, giving a sort of Hölder regularity of harmonic measure, is particularly useful in applications.
Exercise 15 (Beurling estimate) Let be an open set not containing , with the property that the connected component of containing intersects the unit circle . Let be such that . Then for any , one has ; that is to say, the probability that a Brownian motion based at exits at a point within from the origin is . (Hint: one can use conformal mapping to show that the probability appearing at the end of Corollary 13 is .) Conclude in particular that harmonic measures are always continuous (they assign zero to any point).
Exercise 16 Let be a region bounded by a Jordan curve, let , let be the Brownian motion based at , and let be the first time this motion exits . Then for any , show that the probability that the curve has diameter at least is at most .
Exercise 17 Let be a conformal map with , and let be a curve with and for . Show that
(Hint: use Exercise 11.)
— 2. Half-plane capacity —
One can use Brownian motion to construct other close relatives of harmonic measure, such as Green’s functions and excursion measures. See for instance these lecture notes of Lawler for more details. We will focus on one such use of Brownian motion, to interpret the concept of half-plane capacity; this is a notion that is particularly well adapted to the study of chordal Loewner equations (it plays a role analogous to that of conformal radius for the radial Loewner equation).
Let be the upper half-plane. A subset of the upper half-plane is said to be a compact hull if it is bounded, closed in , and the complement is simply connected. By the Riemann mapping theorem, for any compact hull , there is a unique conformal map which is normalised at infinity in the sense that
for some complex numbers . The quantity is particularly important and will be called the half-plane capacity of and denoted .
Exercise 18 (Examples of half-plane capacity)
- (i) Show the translation invariance and the dilation invariance for any compact hull , any , and any , where and .
- (ii) If is a vertical line segment for some and , show that is given by
(using the standard branch of the square root), so that
- (iii) If is the semicircle , then is given by
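The examples in Exercise 18 can be checked numerically by extracting the half-plane capacity from the Laurent coefficient of the normalised map at infinity. Below is a minimal sketch (not from the text; the helper names are my own, and the explicit slit and semicircle maps used here are the standard ones):

```python
import numpy as np

# Sketch (not from the text): extract the half-plane capacity hcap(A) from
# the normalised conformal map g_A, using the Laurent expansion
#     g_A(z) = z + hcap(A)/z + O(1/z^2),
# so that z*(g_A(z) - z) -> hcap(A) as z -> infinity.

def hcap(g, z=1e5):
    """Estimate hcap from the leading Laurent coefficient of g(z) - z."""
    return (z * (g(z) - z)).real

# Vertical slit from x to x + i*h: g(z) = x + sqrt((z - x)^2 + h^2),
# with exact half-plane capacity h^2/2.
slit = lambda x, h: (lambda z: x + np.sqrt((z - x) ** 2 + h ** 2))
assert abs(hcap(slit(0.0, 2.0)) - 2.0) < 1e-3            # h^2/2 = 2

# Semicircle of radius r: g(z) = z + r^2/z, exact capacity r^2.
semi = lambda r: (lambda z: z + r ** 2 / z)
assert abs(hcap(semi(1.5)) - 2.25) < 1e-3

# Translation invariance and the dilation law hcap(rA) = r^2 hcap(A):
assert abs(hcap(slit(5.0, 2.0)) - hcap(slit(0.0, 2.0))) < 1e-3
assert abs(hcap(slit(0.0, 3.0)) - 9.0 * hcap(slit(0.0, 1.0))) < 1e-2
```

The evaluation point is taken large but finite, so the estimates carry a small truncation error; the tolerances above account for this.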
In general, we have the following Brownian motion characterisation of half-plane capacity:
Proposition 19 Let be a compact hull, with conformal map and half-plane capacity .
- (i) If is complex Brownian motion based at some point , and is the first time this motion exits , then
- (ii) We have
Proof: (Sketch) Part (i) follows from applying Theorem 7 to the bounded harmonic function . Part (ii) follows from part (i) by setting for a large , rearranging, and sending using (5).
Among other things, this proposition demonstrates that for all , and that the half-plane capacity is always non-negative (in fact it is not hard to show from the above proposition that it is strictly positive as long as is non-empty).
If are two compact hulls with , then will map conformally to the complement of in . Thus is also a compact hull, and by the uniqueness of Riemann maps we have the identity
which on comparing Laurent expansions leads to the further identity
In particular we have the monotonicity , with equality if and only if . One may verify that these claims are consistent with Exercise 18.
Exercise 20 (Submodularity of half-plane capacity) Let be two compact hulls.
- (i) If , show that
(Hint: use Proposition 19, and consider how the times in which a Brownian motion exits , , , and are related.)
- (ii) Show that
Exercise 21 Let be a compact hull bounded in a disk . For any , show that
as , where is complex Brownian motion based at and is the first time it exits . Similarly, for any , show that
This formula gives a Brownian motion interpretation for on the portion of the boundary of . It can be used to give useful quantitative estimates for in this region; see Section 3.4 of Lawler’s book.
— 3. The chordal Loewner equation —
We now develop (in a rather informal fashion) the theory of the chordal Loewner equation, which roughly speaking is to conformal maps from the upper half-plane to the complements of compact hulls as the radial Loewner equation is to conformal maps from the unit disk to subsets of the complex plane. A more rigorous treatment can be found in Lawler’s book.
Suppose one has a simple curve such that and . There are important and delicate issues regarding the regularity hypotheses on this curve (which become particularly important in SLE, when the regularity is quite limited), but for this informal discussion we will ignore all of these issues.
For each time , the set forms a compact hull, and so has some half-plane capacity . From the monotonicity of capacity, this half-plane capacity is increasing in . It is traditional to normalise the curve so that
this is analogous to normalising the Loewner chains from Notes 3 to have conformal radius at time . A basic example of such normalised curves would be the curves for some fixed , since the normalisation follows from (6).
Let be the conformal maps associated to these compact hulls. From (8) we will have
for any and , where is the conformal map associated to the compact hull . From (9) this hull has half-plane capacity , thus we have the Laurent expansion
It can be shown (using the Beurling estimate) that extends continuously to the tip of the curve , and attains a real value at that point; furthermore, depends continuously on . See Lemma 4.2 of Lawler’s book. As such, should be a short arc (of length ) starting at . If , it is possible to use a quantitative version of Exercise 21 (again using the Beurling estimate) to obtain an estimate basically of the form
for any fixed . If is non-zero, we instead have
For instance, if , then for all , and from Exercise 18 we have the exact formula
Inserting (12) into (11) and using the chain rule, we obtain
and we then arrive at the (chordal) Loewner equation
for all and . This equation can be justified rigorously for any simple curve : see Proposition 4.4 of Lawler’s book. Note that the imaginary part of is negative, which is consistent with the observation made previously that the imaginary part of is decreasing in .
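The chordal Loewner equation can be integrated numerically for a given driving term. The following sketch (not from the text; a simple RK4 integrator, with names of my own choosing) checks the solver against the zero driving function, for which the solution is the explicit vertical-slit map:

```python
import numpy as np

# Sketch (not from the text): integrate the chordal Loewner equation
#     d/dt g_t(z) = 2 / (g_t(z) - U(t)),   g_0(z) = z,
# with a standard RK4 step, for a user-supplied driving function U.
# For U = 0 the exact solution is g_t(z) = sqrt(z^2 + 4t) (the map
# associated to a growing vertical slit), which we use as a check.

def loewner_flow(z0, U, t_max, n_steps=10_000):
    """Return an RK4 approximation to g_{t_max}(z0)."""
    dt = t_max / n_steps
    g = complex(z0)
    for k in range(n_steps):
        t = k * dt
        f = lambda s, w: 2.0 / (w - U(s))
        k1 = f(t, g)
        k2 = f(t + dt / 2, g + dt / 2 * k1)
        k3 = f(t + dt / 2, g + dt / 2 * k2)
        k4 = f(t + dt, g + dt * k3)
        g += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return g

z0 = 1.0 + 1.0j
g1 = loewner_flow(z0, lambda t: 0.0, 1.0)
exact = np.sqrt(z0 ** 2 + 4.0)
assert abs(g1 - exact) < 1e-6
```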
We have started with a chain of compact hulls associated to a simple curve, and shown that the resulting conformal maps obey the Loewner equation for some continuous driving term . Conversely, suppose one is given a continuous driving term . It follows from the Picard existence and uniqueness theorem that for each there is a unique maximal time of existence such that the ODE (13) with initial data can be solved up to time . One can show that for each time , is a conformal map from to with the Laurent expansion
hence the complements form an increasing family of compact hulls with half-plane capacity . Proving complex differentiability of can be done from first principles, and the Laurent expansion near infinity is also not hard; the main difficulty is to show that the map is surjective, which requires solving (13) backwards in time (and here one can do this indefinitely, as now one is moving away from the real axis instead of towards it). See Theorem 4.6 of Lawler’s book for details (in fact a more general theorem is proven, in which the single point is replaced by a probability measure, analogously to how the radial Loewner equation uses Herglotz functions instead of a single driving function when not restricted to slit domains). However, there is a subtlety, in that the hulls are not necessarily the images of simple curves . They often are for short times if the driving function does not oscillate too wildly, but it can happen that the curve that one would expect to trace out eventually intersects itself, in which case the region it then encloses must be absorbed into the hull (cf. the “pinching off” phenomenon in the Carathéodory kernel theorem). Nevertheless, it is still possible to have Loewner chains that are “generated” by non-simple paths , in the sense that consists of the unbounded connected component of the complement .
There are some symmetries of the transform from the to the . If one translates by a constant, , then the resulting domains are also translated, , and . Slightly less trivially, for any , if one performs a rescaled dilation , then one can check using (13) that , and the corresponding conformal maps are given by . On the other hand, just multiplying the driving function by a scalar can transform the behaviour of dramatically; the transform from to is very definitely not linear!
— 4. Schramm-Loewner evolution —
In the previous section, we have indicated that every continuous driving function gives rise to a family of conformal maps obeying the Loewner equation (13). The (chordal) Schramm-Loewner evolution () with parameter is the special case in which the driving function takes the form for some real Brownian motion based at the origin. Thus is now a random conformal map from a random domain , defined by solving the Schramm-Loewner equation
with initial condition for , and with defined as the set of all for which the above ODE can be solved up to time taking values in . The parameter cannot be scaled away by simple renormalisations such as scaling, and in fact the behaviour of is rather sensitive to the value of , with particularly special behaviour at various values such as ; there is also a duality relationship between and which we will not discuss here.
The case is rather boring: in this case is deterministic, and is just with the line segment between and removed. The cases are substantially more interesting. It is a non-trivial theorem (particularly at the special value ) that is almost surely generated by some random path ; see Theorem 6.3 of Lawler’s book. The nature of this path is sensitive to the choice of parameter :
See Section 6.2 of Lawler’s book. The path becomes increasingly fractal as increases: it is a result of Rohde and Schramm and Beffara that the image almost surely has Hausdorff dimension .
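One crude way to simulate an SLE trace numerically (not from the text, and only a sketch with names of my own choosing) is to approximate the driving function as piecewise constant and compose the corresponding inverse vertical-slit maps, taking care to use the branch of the square root that maps into the upper half-plane. For zero driving the scheme recovers the deterministic vertical slit exactly, which serves as a sanity check:

```python
import numpy as np

# Crude sketch (not from the text) of an SLE_kappa trace.  The driving
# function U_t = sqrt(kappa) B_t is approximated as piecewise constant on
# steps of size dt; each step grows the hull by a small vertical slit, whose
# inverse map is F_k(z) = U_{k-1} + sqrt_H((z - U_{k-1})^2 - 4 dt), where
# sqrt_H is the branch of the square root mapping into the closed upper
# half-plane.  The tip of the trace at time m*dt is approximated by
# F_1 o F_2 o ... o F_m(U_m).  (This is only a first-order scheme, and the
# branch choice is ambiguous for large driving increments.)

def sqrt_H(w):
    """Square root with branch cut on the positive real axis, image in H."""
    return 1j * np.sqrt(complex(-w))

def sle_trace(kappa, t_max=1.0, n=200, seed=0):
    rng = np.random.default_rng(seed)
    dt = t_max / n
    U = np.concatenate([[0.0], np.cumsum(
        np.sqrt(kappa * dt) * rng.standard_normal(n))])
    trace = []
    for m in range(1, n + 1):
        z = complex(U[m])             # image of the tip under g_{t_m}
        for k in range(m, 0, -1):     # unwind one slit map at a time
            z = U[k - 1] + sqrt_H((z - U[k - 1]) ** 2 - 4.0 * dt)
        trace.append(z)
    return np.array(trace)

# kappa = 0: the trace should be exactly the segment t -> 2i sqrt(t).
tip = sle_trace(0.0, t_max=1.0, n=200)[-1]
assert abs(tip - 2j) < 1e-9
```

This naive composition costs quadratic time in the number of steps; faster algorithms exist, but the sketch above suffices for small simulations.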
We have asserted that defines a random path in that starts at the origin and generally “wanders off” to infinity (though for it keeps recurring back to bounded sets infinitely often). By the Riemann mapping theorem, we can now extend this to other domains. Let be a simply connected open proper subset of whose boundary we will assume for simplicity to be a Jordan curve (this hypothesis can be relaxed). Let be two distinct points on the boundary . By the Riemann mapping theorem and Carathéodory’s theorem (Theorem 20 from Notes 2), there is a conformal map whose continuous extension maps and to and respectively; this map is unique up to rescalings for . One can then define the Schramm-Loewner evolution on from to to be the family of conformal maps for , where is the usual Schramm-Loewner evolution with parameter . The Schramm-Loewner evolution on is well defined up to a time reparameterisation . The Markovian and stationary nature of Brownian motion translates to an analogous Markovian and conformally invariant property of . Roughly speaking, it is the following: if is any reasonable domain with two boundary points , is on this domain from to with associated path , and is any time, then after conditioning on the path up to time , the remainder of the path has the same image as the path on the domain from to . Conversely, under suitable regularity hypotheses, the processes are the only random path processes on domains with this property (much as Brownian motion is the only Markovian stationary process, once one normalises the mean and variance). As a consequence, whenever one encounters a random path process that is known or suspected to enjoy some conformal invariance properties, it has become natural to conjecture that it obeys the law of (though in some cases it is more natural to work with other flavours of SLE than the chordal SLE discussed here, such as radial SLE or whole-plane SLE).
For instance, in the pioneering work of Schramm, this line of reasoning was used to conjecture that the loop-erased random walk in a domain has the law of (radial) ; this conjecture was then established by Lawler, Schramm, and Werner. Many further processes have since been either proven or conjectured to be linked to one of the SLE processes, such as the limiting law of a uniform spanning tree (proven to be ), interfaces of the Ising model (proven to be ), or the scaling limit of self-avoiding random walks (conjectured to be ). Further discussion of these topics is beyond the scope of this course, and we refer the interested reader to Lawler’s book for more details.
A few weeks ago I listened to a bunch of super-enthusiastic high school students share their excitement about astronomy, astrophysics and the space industry. We were at the Royal Astronomical Society of New Zealand’s annual meeting; every year they fund a dozen or so keen students from around the country to attend the event. It’s competitive, with many more kids applying than there are places to take them.
The students get to share what lit their passion for space, but as they recounted their stories maybe a third of them mentioned a teacher or career counsellor who had done their best to quench that fire. “There’s no jobs in that.” “It’s really hard.” “You can’t do that in New Zealand.” “It won’t help you get into university.” And I wondered: who listens to a smart, ambitious youngster's hopes and dreams and tells them to play it safe?
It isn’t all teachers – others told of enthusiastic and supportive mentors – but it was potentially way too many of them, even if the nay-sayers imagine they are protecting kids from disappointment or quietly steering them away from games that are out of their league. Ironically, this is happening at a time when there is a lot of talk about reinventing school – getting kids to code, preparing them for a future where machine learning, AI and 3D printing are big things, building baby entrepreneurs. But none of that will be much use if some teachers are telling students not to chase their dreams.
What's more, when educators talk about preparing kids for the future, part of their message is usually that the age of linear careers is behind us and today's kids will change jobs far more frequently than their parents did. Which makes it particularly strange for teachers to tell students to aim only at safe and dependable careers as they chart their educational trajectory.
So what should you say to a smart kid with dreams that would take them off the beaten track? Here's my list:
Item 1 is easy, and it works for anything. Item 2 is honest – no-one sleepwalks to being an astronaut or an All Black – and it is likely part of the fun.
Item 3 is a biggie, and for some goals the answer will be more obvious than others. What should someone who wants to be a professional athlete do, beyond just training for their sport? I'm not the best person to ask about that (really, I'm not) but whatever you want to do, my guess is that some of the things that will get you there won't always be obvious: one of my key skills as a scientist is that I like to write and know how to tell a story. (Seriously: every grant application is a story about what I would do if you gave me money – and it helps if I tell it well.) If a kid at your school wants to get into astronomy (something I do know something about), it could be a long wait before they can actually do astronomy. But astronomers need to know maths, computing, statistics and physics – does the kid you are talking to know this? Set them to finding out what is on their list.
Item 4 leads into Item 5. If a kid chases their dreams, are they really taking their whole future into the Casino of Life and putting it all on 22? Or are they laying down skills that also lead to all sorts of other opportunities? If I am brutally honest, I will only retire once, and while my field is growing it is certainly not expanding fast enough to accommodate everyone who trains to work within it. Numbers as miserable as a 2 or 3% success rate for making the transition from a PhD to a permanent job in the field are thrown around – the situation is not as dire as that, but in astronomy the success rate may be around 20% for people who actually get PhDs. At some point, it is possible that students stand a better chance of success if they don't spend too much time stressing about the odds. But all my former students are doing well. Some have tenure – and others have successful startups or jobs in Silicon Valley where they use knowledge they acquired while they were working with me.
Someone chasing their dreams may be walking a high wire, but they can build a safety net by laying down serious and transferable skills. (And many of the people I know who pulled the cord on their personal Plan B made a conscious decision to do something new; ambitions change.) Help them figure out what those skills will be for them. The real trick to Item 5 is to help kids take measured risks without hinting you don't think they can do it. And this goes double if you are talking to people who might feel that they won't fit naturally into their chosen field – my friend Chanda Prescod-Weinstein often says that African American students hear well-meaning talk about a personal Plan B as "I don't think this is for you"; the same likely goes for Māori and Pasifika kids in New Zealand, or girls heading into fields where women are scarce.
Items 6 and 7: whatever it is they want to do, there will always be something they can start on right away. There's no time like the present. And be there for them. Check in. Ask how they are getting on. And wish them well.
And one last question: if you're a teacher, what's your ambition? It's got to be better to watch your students chase their dreams than, when they come to tell the story of their success, be remembered as the person who said they couldn't do it.
CODA: It is also true that if we are talking about academia, it is not just the student's job to be sure they have a Plan B; it is up to their teachers to recognise that many of them will need one, to make sure that they get the chance to maximise their transferable skills, and to help them learn how to demonstrate those skills to potential employers. But that's a story for another day.
IMAGE CREDIT: The image is from NASA.
We now approach conformal maps from yet another perspective. Given an open subset of the complex numbers , define a univalent function on to be a holomorphic function that is also injective. We will primarily be studying this concept in the case when is the unit disk .
Clearly, a univalent function on the unit disk is a conformal map from to the image ; in particular, is simply connected, and not all of (since otherwise the inverse map would violate Liouville’s theorem). In the converse direction, the Riemann mapping theorem tells us that every open simply connected proper subset of the complex numbers is the image of a univalent function on . Furthermore, if contains the origin, then the univalent function with this image becomes unique once we normalise and . Thus the Riemann mapping theorem provides a one-to-one correspondence between open simply connected proper subsets of the complex plane containing the origin, and univalent functions with and . We will focus particular attention on the univalent functions with the normalisation and ; such functions will be called schlicht functions.
One basic example of a univalent function on is the Cayley transform , which is a Möbius transformation from to the right half-plane . (The slight variant is also referred to as the Cayley transform, as is the closely related map , which maps to the upper half-plane.) One can square this map to obtain a further univalent function , which now maps to the complex numbers with the negative real axis removed. One can normalise this function to be schlicht to obtain the Koebe function
which now maps to the complex numbers with the half-line removed. A little more generally, for any we have the rotated Koebe function
that is a schlicht function that maps to the complex numbers with the half-line removed.
Every schlicht function has a convergent Taylor expansion
for some complex coefficients with . For instance, the Koebe function has the expansion
and similarly the rotated Koebe function has the expansion
Intuitively, the Koebe function and its rotations should be the “largest” schlicht functions available. This is formalised by the famous Bieberbach conjecture, which asserts that for any schlicht function, the coefficients should obey the bound for all . After a large number of partial results, this conjecture was eventually solved by de Branges; see for instance this survey of Korevaar or this survey of Koepf for a history.
It turns out that to resolve these sorts of questions, it is convenient to restrict attention to schlicht functions that are odd, thus for all , and the Taylor expansion now reads
for some complex coefficients with . One can transform a general schlicht function to an odd schlicht function by observing that the function , after removing the singularity at zero, is a non-zero function that equals at the origin, and thus (as is simply connected) has a unique holomorphic square root that also equals at the origin. If one then sets
it is not difficult to verify that is an odd schlicht function which additionally obeys the equation
Conversely, given an odd schlicht function , the formula (4) uniquely determines a schlicht function .
For instance, if is the Koebe function (1), becomes
which maps to the complex numbers with two slits removed, and if is the rotated Koebe function (2), becomes
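These expansions are easy to confirm with a computer algebra system. The following quick check (not from the text; it assumes sympy is available) verifies that the Koebe function k(z) = z/(1-z)^2 has Taylor coefficients a_n = n, and that its square-root transform h(z) = z/(1-z^2) is odd with h(z)^2 = k(z^2):

```python
import sympy as sp

# Check (not from the text; assumes sympy is available): the Koebe function
# k(z) = z/(1-z)^2 has Taylor coefficients a_n = n, saturating the
# Bieberbach bound, and its square-root transform h(z) = z/(1-z^2) is an
# odd schlicht-normalised function with h(z)^2 = k(z^2).
z = sp.symbols('z')
k = z / (1 - z) ** 2
h = z / (1 - z ** 2)

# Taylor coefficients of the Koebe function: [0, 1, 2, ..., 7].
coeffs = sp.Poly(sp.series(k, z, 0, 8).removeO(), z).all_coeffs()[::-1]
assert coeffs[1:] == [sp.Integer(n) for n in range(1, 8)]

# h is odd, and squaring it recovers k(z^2).
assert sp.simplify(h.subs(z, -z) + h) == 0
assert sp.simplify(h ** 2 - k.subs(z, z ** 2)) == 0
```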
De Branges established the Bieberbach conjecture by first proving an analogous conjecture for odd schlicht functions known as Robertson’s conjecture. More precisely, we have
Theorem 1 (de Branges’ theorem) Let be a natural number.
- (i) (Robertson conjecture) If is an odd schlicht function, then
- (ii) (Bieberbach conjecture) If is a schlicht function, then
It is easy to see that the Robertson conjecture for a given value of implies the Bieberbach conjecture for the same value of . Indeed, if is schlicht, and is the odd schlicht function given by (3), then from extracting the coefficient of (4) we obtain a formula
for the coefficients of in terms of the coefficients of . Applying the Cauchy-Schwarz inequality, we derive the Bieberbach conjecture for this value of from the Robertson conjecture for the same value of . We remark that Littlewood and Paley had conjectured a stronger form of Robertson’s conjecture, but this was disproved for by Fekete and Szegö.
To prove the Robertson and Bieberbach conjectures, one first takes a logarithm and deduces both conjectures from a similar conjecture about the Taylor coefficients of , known as the Milin conjecture. Next, one continuously enlarges the image of the schlicht function to cover all of ; done properly, this places the schlicht function as the initial function in a sequence of univalent maps known as a Loewner chain. The functions obey a useful differential equation known as the Loewner equation, that involves an unspecified forcing term (or , in the case that the image is a slit domain) coming from the boundary; this in turn gives useful differential equations for the Taylor coefficients of , , or . After some elementary calculus manipulations to “integrate” these equations, the Bieberbach, Robertson, and Milin conjectures are then reduced to establishing the non-negativity of a certain explicit hypergeometric function, which is non-trivial to prove (and will not be done here, except for small values of ) but for which several proofs exist in the literature.
The theory of Loewner chains subsequently became fundamental to a more recent topic in complex analysis, that of the Schramm-Loewner equation (SLE), which is the focus of the next and final set of notes.
— 1. The area theorem and its consequences —
We begin with the area theorem of Grönwall.
Theorem 2 (Grönwall area theorem) Let be a univalent function with a convergent Laurent expansion
Then
Proof: By shifting we may normalise . By hypothesis we have for any ; by replacing with and using a limiting argument, we may assume without loss of generality that the have some exponential decay as (in order to justify some of the manipulations below).
Let be a large parameter. If , then and . The area enclosed by the simple curve is equal to
crucially, the error term here goes to zero as . Meanwhile, by the change of variables formula (using monotone convergence if desired to work in compact subsets of the annulus initially) and Plancherel's theorem, the area of the region is
Comparing these bounds we conclude that
sending to infinity, we obtain the claim.
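The key identity in the proof, comparing the enclosed area with the Laurent coefficients, can be checked numerically on a concrete example. Below is a sketch (not from the text; the function g(z) = 1/z + b*z and all names are my own choices) where the image of the unit circle is an explicit ellipse:

```python
import numpy as np

# Numerical check (not from the text) of the area computation in the proof:
# for g(z) = 1/z + b*z with |b| < 1 (univalent on the punctured disk), the
# image of the unit circle is an ellipse, and the area it encloses is
# pi * (1 - sum_n n |b_n|^2) = pi * (1 - |b|^2).  We compute the enclosed
# area from the contour integral (1/2) |oint Im(conj(w) dw)|.
b = 0.5
N = 4096
theta = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
w = np.exp(-1j * theta) + b * np.exp(1j * theta)           # g(e^{i theta})
dw = -1j * np.exp(-1j * theta) + 1j * b * np.exp(1j * theta)
area = 0.5 * abs(np.sum((np.conj(w) * dw).imag) * (2.0 * np.pi / N))
assert abs(area - np.pi * (1.0 - b ** 2)) < 1e-9
```

Since the integrand is a trigonometric polynomial, the equispaced Riemann sum here is exact up to floating-point error.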
Exercise 3 Let be a univalent function with Taylor expansion
Show that the area of is equal to . (In particular, has finite area if and only if .)
Corollary 4 (Bieberbach inequality)
- (i) If is an odd schlicht function, then .
- (ii) If is a schlicht function, then .
Proof: For (i), we apply Theorem 2 to the univalent function defined by , which has a Laurent expansion , to give the claim. For (ii), apply part (i) to the square root of with first term .
Exercise 5 Show that equality occurs in Corollary 4(i) if and only if takes the form for some , and in Corollary 4(ii) if and only if takes the form of a rotated Koebe function for some .
The Bieberbach inequality can be rescaled to bound the second coefficient of univalent functions:
Exercise 6 (Rescaled Bieberbach inequality) If is a univalent function, show that
When does equality hold?
The Bieberbach inequality gives a useful lower bound for the image of a univalent function, known as the Koebe quarter theorem:
Corollary 7 (Koebe quarter theorem) Let be a univalent function. Then contains the disk .
Proof: By applying a translation and rescaling, we may assume without loss of generality that is a schlicht function, with Taylor expansion
Our task is now to show that for every , the equation has a solution in . If this were not the case, then the function is invertible on , with inverse being univalent and having the Taylor expansion
Applying Exercise 6 we then have
while from the Bieberbach inequality one also has . Hence by the triangle inequality , which is incompatible with the hypothesis .
Exercise 8 Show that the radius is best possible in Corollary 7 (thus, does not contain any disk with ) if and only if takes the form for some complex numbers and real .
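The sharpness of the constant in the quarter theorem can be seen numerically for the Koebe function itself. A small sketch (not from the text; names are my own):

```python
import numpy as np

# Numerical illustration (not from the text) that the constant 1/4 in the
# Koebe quarter theorem is sharp: the Koebe function k(z) = z/(1-z)^2 maps
# the unit circle (minus the point 1) onto the half-line (-inf, -1/4], so
# the boundary of the image comes within exactly 1/4 of the origin.
theta = np.linspace(1e-6, 2.0 * np.pi - 1e-6, 100_000)
zc = np.exp(1j * theta)
boundary = zc / (1.0 - zc) ** 2
d = np.abs(boundary).min()
assert abs(d - 0.25) < 1e-6
```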
Remark 9 The univalence hypothesis is crucial in the Koebe quarter theorem. Consider for instance the functions defined by . These are locally univalent functions (since is holomorphic with non-zero derivative) and , , but avoids the point .
Exercise 10 (Koebe distortion theorem) Let be a schlicht function, and let have magnitude .
- (i) Show that
(Hint: compose on the right with a Möbius automorphism of that sends to and then apply the rescaled Bieberbach inequality.)
- (ii) Show that
(Hint: use (i) to control the radial derivative of .)
- (iii) Show that
- (iv) Show that
(This cannot be directly derived from (ii) and (iii). Instead, compose on the right with a Mobius automorphism that sends to and to , rescale it to be schlicht, and apply (iii) to this function at .)
- (v) Show that the space of schlicht functions is a normal family. In other words, if is any sequence of schlicht functions, then there is a subsequence that converges locally uniformly on compact sets.
- (vi) (Qualitative Bieberbach conjecture) Show that for each natural number there is a constant such that whenever is a schlicht function with Taylor expansion
Exercise 11 (Conformal radius) Let be a simply connected open proper subset of , and let . By the Riemann mapping theorem there is a conformal map with and ; define the conformal radius of around to be .
- (i) Show that the conformal radius is strictly monotone in : if are non-empty simply connected open subsets of , and , then the conformal radius of around is strictly greater than that of .
- (ii) Show that the conformal radius of a disk around an element of the disk is given by the formula .
- (iii) Show that the conformal radius of around lies between and , where is the radius of the maximal disk that is contained in .
- (iv) If the conformal radius of around is equal to , show that for all sufficiently small , the ring domain has modulus , where denotes a quantity that goes to zero as , and the modulus of a ring domain was defined in Notes 2.
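The disk formula in item (ii) above can be verified numerically. A minimal sketch (not from the text; the Möbius automorphism and all names are my own choices):

```python
import numpy as np

# Numerical check (not from the text) of the disk formula: the conformal
# radius of the unit disk around a point z0 in the disk is 1 - |z0|^2.
# The Riemann map sending 0 to z0 is the Mobius automorphism
#     f(z) = (z + z0) / (1 + conj(z0) z),
# and the conformal radius is |f'(0)|, estimated here by a central
# difference.
z0 = 0.3 + 0.4j
f = lambda z: (z + z0) / (1.0 + np.conj(z0) * z)

eps = 1e-6
fprime0 = (f(eps) - f(-eps)) / (2.0 * eps)
assert abs(abs(fprime0) - (1.0 - abs(z0) ** 2)) < 1e-8
```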
We can use the distortion theorem to obtain a nice criterion for when univalent maps converge to a given limit, known as the Carathéodory kernel theorem.
Theorem 12 (Carathéodory kernel theorem) Let be a sequence of simply connected open proper subsets of containing the origin, and let be a further simply connected open proper subset of containing . Let and be the conformal maps with and (the existence and uniqueness of these maps are given by the Riemann mapping theorem). Then the following are equivalent:
- (i) converges locally uniformly on compact sets to .
- (ii) For every subsequence of the , is the set of all such that there is an open connected set containing and that is contained in for all sufficiently large .
If conclusion (ii) holds, is known as the kernel of the domains .
Proof: Suppose first that converges locally uniformly on compact sets to . If , then for some . If , then the holomorphic functions converge uniformly on to the function , which is not identically zero but has a zero in . By Hurwitz’s theorem we conclude that also has a zero in for all sufficiently large ; indeed the same argument shows that one can replace by any element of a small neighbourhood of to obtain the same conclusion, uniformly in . From compactness we conclude that for sufficiently large , has a zero in for all , thus for sufficiently large . Since is open connected and contains and , we see that is contained in the set described in (ii).
Conversely, suppose that is a subsequence of the and is such that there is an open connected set containing and that is contained in for sufficiently large . The inverse maps are holomorphic and bounded, hence form a normal family by Montel’s theorem. By refining the subsequence we may thus assume that the converge locally uniformly to a holomorphic limit . The function takes values in , but by the open mapping theorem it must in fact map to . In particular, . Since converges to , and converges locally uniformly to , we conclude that converges to , thus and hence . This establishes the derivation of (ii) from (i).
Now suppose that (ii) holds. It suffices to show that every subsequence of has a further subsequence that converges locally uniformly on compact sets to (this is an instance of the Urysohn subsequence principle). Since contains , there is in particular a disk that is contained in the for all sufficiently large ; on the other hand, as is not all of , there is also a disk which is not contained in the for all sufficiently large . By Exercise 11, this implies that the conformal radii of the around zero are bounded above and below, thus is bounded above and below.
By Exercise 10(v), and rescaling, the functions then form a normal family, thus there is a subsequence of the that converges locally uniformly on compact sets to some limit . Since is positive and bounded away from zero, is also positive, so is non-constant. By Hurwitz’s theorem, is therefore also univalent, and thus maps to some region . By the implication of (ii) from (i) (with replaced by ) we conclude that is the set of all such that there is an open connected set containing and that is contained in for all sufficiently large ; but by hypothesis, this set is also . Thus , and then by the uniqueness part of the Riemann mapping theorem, as desired.
The condition in Theorem 12(ii) indicates that “converges” to in a rather complicated sense, in which large parts of are allowed to be “pinched off” from and disappear in the limit. This is illustrated in the following explicit example:
Exercise 13 (Explicit example of kernel convergence) Let be the function from (5), thus is a univalent function from to with the two vertical rays from to , and from to , removed. For any natural number , let and let , and define the transformed functions .
- (i) Show that is a univalent function from to with the two vertical rays from to , and from to , removed, and that and .
- (ii) Show that converges locally uniformly to the function , and that this latter map is a univalent map from to the half-plane . (Hint: one does not need to compute everything exactly; for instance, any terms of the form can be written using the notation instead of expanded explicitly.)
- (iii) Explain why these facts are consistent with the Carathéodory kernel theorem.
As another illustration of the theorem, let be two distinct convex open proper subsets of containing the origin, and let be the associated conformal maps from to respectively with and . Then the alternating sequence does not converge locally uniformly to any limit. The set is the set of all points that lie in a connected open set containing the origin that eventually is contained in the sequence ; but if one passes to the subsequence , this set of points enlarges to , and so the sequence does not in fact have a kernel.
However, the kernel theorem simplifies significantly when the are monotone increasing, which is already an important special case:
Corollary 14 (Monotone increasing case of kernel theorem) Let the notation and assumptions be as in Theorem 12. Assume furthermore that
and that . Then converges locally uniformly on compact sets to .
Loewner observed that the kernel theorem can be used to approximate univalent functions by functions mapping into slit domains. More precisely, define a slit domain to be an open simply connected subset of formed by deleting a half-infinite Jordan curve connecting some finite point to infinity; for instance, the image of the Koebe function is a slit domain.
Theorem 15 (Loewner approximation theorem) Let be a univalent function. Then there exists a sequence of univalent functions whose images are slit domains, and which converge locally uniformly on compact subsets to .
Proof: First suppose that extends to a univalent function on a slightly larger disk for some . Then the image of the unit circle is a Jordan curve enclosing the region in the interior. Applying the Jordan curve theorem (and the Möbius inversion ), one can find a half-infinite Jordan curve from to infinity that stays outside of . For any , one can concatenate this curve with the arc to obtain another half-infinite Jordan curve , whose complement is a slit domain which has as kernel (why?). If we let be the conformal maps from to with and , we conclude from the Carathéodory kernel theorem that converges locally uniformly on compact sets to .
If is just univalent on , then it is the locally uniform limit of the dilations , which are univalent on the slightly larger disks . By the previous arguments, each is in turn the locally uniform limit of univalent functions whose images are slit domains, and the claim now follows from a diagonalisation argument.
— 2. Loewner chains —
The material in this section is based on these lecture notes of Contreras.
An important tool in analysing univalent functions is to study one-parameter families of univalent functions, parameterised by a time parameter , in which the images are increasing in ; roughly speaking, these families allow one to study an arbitrary univalent function by “integrating” along such a family from back to . Traditionally, we normalise these families into (radial) Loewner chains, which we now define:
Definition 16 (Loewner chain) A (radial) Loewner chain is a family of univalent maps with and (so in particular is schlicht), such that for all . (In these notes we use the prime notation exclusively for differentiation in the variable; we will use later for differentiation in the variable.)
A key example of a Loewner chain is the family
of dilated Koebe functions; note that the image of each is the slit domain , which is clearly monotone increasing in . More generally, we have the rotated Koebe chains
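Since the displayed formulas are elided here, let me record the standard normalisations (an assumption on my part, matching the usual conventions for these chains):

```latex
% Dilated Koebe chain: f_t maps the unit disk onto the plane minus a ray
f_t(z) = \frac{e^t z}{(1-z)^2}, \qquad f_t(\mathbb{D}) = \mathbb{C} \setminus (-\infty, -e^t/4],
% and its rotations, for a fixed phase \theta:
f_{t,\theta}(z) = \frac{e^t z}{(1 - e^{i\theta} z)^2}.
```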
Whenever one has a family of simply connected proper open subsets of containing with for , and . By definition, is then the conformal radius of around , which is a strictly increasing function of by Exercise 11. If this conformal radius is equal to at and increases continuously to infinity as , then one can reparameterise the variable so that , at which point one obtains a Loewner chain.
From the Koebe quarter theorem we see that each image in a Loewner chain contains the disk . In particular the increase to fill out all of : .
Let be a Loewner chain, and let . The relation is sometimes expressed as the assertion that is subordinate to . It has the consequence that one has a composition law of the form
for a univalent function , uniquely defined as , noting that is well-defined on . By construction, we have and
as well as the composition laws
for . We will refer to the as transition functions.
From the Schwarz lemma, we have
for , with strict inequality when . In particular, if we introduce the function
for and , then (after removing the singularity at infinity and using (10)) we see that is a holomorphic map to the right half-plane , normalised so that
Define a Herglotz function to be a holomorphic function , thus is a Herglotz function for all . A key family of examples of Herglotz functions is given by the Möbius transforms for . In fact, all other Herglotz functions are basically just averages of this one:
Exercise 17 (Herglotz representation theorem) Let be a Herglotz function, normalised so that .
- (i) For any , show that
for . (Hint: The real part of is harmonic, and so has a Poisson kernel representation. Alternatively, one can use a Taylor expansion of .)
- (ii) Show that there exists a (Radon) probability measure on such that
for all . (One will need a measure-theoretic tool such as Prokhorov’s theorem, the Riesz representation theorem, or the Helly selection principle.) Conversely, show that every probability measure on generates a Herglotz function with by the above formula.
- (iii) Show that the measure constructed in (ii) is unique.
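As the displayed formula is elided, let me note the standard form of the representation as an assumption: f(z) = ∫_{|ζ|=1} (ζ+z)/(ζ−z) dμ(ζ). Here is a quick numerical sanity check with a discrete measure:

```python
import numpy as np

# Herglotz representation (standard form, stated here as an assumption since
# the displays are elided): f(z) = sum_j w_j * (zeta_j + z) / (zeta_j - z),
# where the zeta_j lie on the unit circle and the weights w_j >= 0 sum to 1.
rng = np.random.default_rng(0)
thetas = rng.uniform(0, 2 * np.pi, size=5)
zetas = np.exp(1j * thetas)                  # points on the unit circle
w = rng.uniform(size=5)
w /= w.sum()                                 # a discrete probability measure

def herglotz(z):
    return np.sum(w * (zetas + z) / (zetas - z))

# f(0) = 1, and Re f > 0 on the disk:
print(abs(herglotz(0) - 1))                  # ~ 0
zs = 0.9 * np.exp(1j * rng.uniform(0, 2 * np.pi, 50)) * rng.uniform(size=50)
print(all(herglotz(z).real > 0 for z in zs)) # True
```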
This has a useful corollary, namely a version of the Harnack inequality:
Exercise 18 (Harnack inequality) Let be a Herglotz function, normalised so that . Show that
for all .
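The displayed bounds are elided; in the standard form (my assumption) they read (1−|z|)/(1+|z|) ≤ Re f(z) ≤ |f(z)| ≤ (1+|z|)/(1−|z|) for a Herglotz function normalised by f(0) = 1. The basic Möbius example attains both bounds on the real axis:

```python
import numpy as np

# Harnack bounds for a Herglotz function f with f(0) = 1 (standard form,
# stated as an assumption since the displayed inequality is elided):
#   (1 - |z|)/(1 + |z|) <= Re f(z) <= |f(z)| <= (1 + |z|)/(1 - |z|).
# The Moebius example f(z) = (1 + z)/(1 - z) attains both bounds on (-1, 1).
def moebius(z):
    return (1 + z) / (1 - z)

for r in [0.1, 0.5, 0.9]:
    lower, upper = (1 - r) / (1 + r), (1 + r) / (1 - r)
    print(np.isclose(moebius(-r).real, lower),   # lower bound attained at z = -r
          np.isclose(moebius(r).real, upper))    # upper bound attained at z = +r
```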
This gives some useful Lipschitz regularity properties of the transition functions and univalent functions in the variable:
Lemma 19 (Lipschitz regularity) Let be a compact subset of , and let . Use to denote a quantity bounded in magnitude by , where depends only on .
- (i) For any and , one has
- (ii) For any and , one has
One can make the bounds much more explicit if desired (see e.g. Lemma 2.3 of these notes of Contreras), but for our purposes any Lipschitz bound will suffice.
Proof: To prove (i), it suffices from (11) and the Schwarz-Pick lemma (Exercise 13 from Notes 2) to establish this claim when . We can also assume that since the claim is trivial when . From the Harnack inequality one has
for , which by (12) and some computation gives
Now we prove (ii). We may assume without loss of generality that is convex. From Exercise 10 (normalising to be schlicht) we see that for , and hence has a Lipschitz constant of on . Since , the claim now follows from (13).
As a first application of this we show that every schlicht function starts a Loewner chain.
Lemma 20 Let be schlicht. Then there exists a Loewner chain with .
Proof: This will be similar to the proof of Theorem 15. First suppose that extends to be univalent on for some ; then is a Jordan curve. Then by Carathéodory’s theorem (Theorem 20 of Notes 2) (and the Möbius inversion ) one can find a conformal map from the exterior of to the exterior of that sends infinity to infinity. If we define for to be the region enclosed by the Jordan curve , then the are increasing in with conformal radius going to infinity as . If one sets to be the conformal maps with and , then (by the uniqueness part of the Riemann mapping theorem) and by the Carathéodory kernel theorem, converges locally uniformly to as . In particular, the conformal radii are continuous in . Reparameterising in one can then obtain the required Loewner chain.
Now suppose is only univalent on . As in the proof of Theorem 15, one can express as the locally uniform limit of schlicht functions , each of which extends univalently to some larger disk . By the preceding discussion, each of the extends to a Loewner chain . From the Lipschitz bounds (and the Koebe distortion theorem) one sees that these chains are locally uniformly equicontinuous in and , uniformly in , and hence by Arzelà–Ascoli we can pass to a subsequence that converges locally uniformly in to a limit ; one can also assume that the transition functions converge locally uniformly to limits . It is then not difficult, using Hurwitz’s theorem, to verify the limiting relations (9), (11), and that is a Loewner chain with as desired.
Suppose that are close to each other: . Then one heuristically has the approximations
and hence by (12) and some rearranging
and hence on applying , (9), and the Newton approximation
This suggests that the should obey the Loewner equation
for some Herglotz function . This is essentially the case:
Theorem 21 (Loewner equation) Let be a Loewner chain. Then, for outside of an exceptional set of Lebesgue measure zero, the functions are differentiable in time for each , and obey the equation (14) for all and , and some Herglotz function for each with . Furthermore, the maps are measurable for every .
Proof: Let be a countable dense subset of . From Lemma 19, the function is Lipschitz continuous, and thus differentiable almost everywhere, for each . Thus there exists a Lebesgue measure zero set such that is differentiable in outside of for each . From the Koebe distortion theorem is also locally Lipschitz (hence locally uniformly equicontinuous) in the variable, so in fact is differentiable in outside of for all . Without loss of generality we may assume contains zero.
Let , and let . Then as approaches from below, we have
uniformly; from (9) and Newton approximation we thus have
which implies that
Also we have
and hence by (12)
Taking limits, we see that the function is Herglotz with , giving the claim. It is also easy to verify the measurability (because derivatives of Lipschitz functions are measurable).
Example 22 The Loewner chain (7) solves the Loewner equation with the Herglotz function . With the rotated Koebe chains (8), we instead have .
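Since the formulas in Example 22 are elided, here is a symbolic check of the Koebe case under standard conventions (an assumption on my part): with f_t(z) = e^t z/(1−z)^2 and the Loewner equation written as ∂_t f_t(z) = z f_t′(z) p(z), the Herglotz function is p(z) = (1−z)/(1+z).

```python
import sympy as sp

# Symbolic check (a sketch; sign conventions vary between sources) that the
# dilated Koebe chain f_t(z) = e^t z/(1-z)^2 solves the radial Loewner
# equation  d/dt f_t(z) = z * f_t'(z) * p(z)  with  p(z) = (1 - z)/(1 + z),
# a Herglotz function normalised by p(0) = 1.
z, t = sp.symbols('z t')
f = sp.exp(t) * z / (1 - z)**2
p = (1 - z) / (1 + z)

residual = sp.simplify(sp.diff(f, t) - z * sp.diff(f, z) * p)
print(residual)  # 0
```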
Although we will not need it in this set of notes, there is also a converse implication: to every family of Herglotz functions depending measurably on , one can associate a Loewner chain.
Let us now Taylor expand a Loewner chain at each time as
as , we have . As is differentiable in almost every for each , and is locally uniformly continuous in , we see from the Cauchy integral formulae that the are also differentiable almost everywhere in . If we similarly write
for all outside of , then , and we obtain the equations
and so forth. For instance, for the Loewner chain (7) one can verify that and for solve these equations. For (8) one instead has and .
We have the following bounds on the first few coefficients of :
Exercise 23 Let be a Herglotz function with . Let be the measure coming from the Herglotz representation theorem.
- (i) Show that for all . In particular, for all . Use this to give an alternate proof of the upper bound in the Harnack inequality.
- (ii) Show that .
We can use this to establish the first two cases of the Bieberbach conjecture:
Theorem 24 ( cases of Bieberbach) If is schlicht, then and .
The bound is not new, and indeed was implicitly used many times in the above arguments, but we include it to illustrate the use of the equations (15), (16).
Proof: By Lemma 20, we can write (and ) for some Loewner chain .
We can write (15) as . On the other hand, from the Koebe distortion theorem applied to the schlicht functions , we have , so in particular goes to zero at infinity. We can integrate from to infinity to obtain
From Harnack’s inequality we have , giving the required bound .
In a similar vein, writing (16) as
we obtain
As , we may integrate from to infinity to obtain the identity
Taking real parts using Exercise 23(ii) and (17), we have
Since , we thus have
where . By Cauchy-Schwarz, we have , and from the bound , we thus have
Replacing by the schlicht function (which rotates by ) and optimising in , we obtain the claim .
Exercise 25 Show that equality in the above bound is only attained when is a rotated Koebe function.
The Loewner equation (14) takes a special form in the case of slit domains. Indeed, let be a slit domain not containing the origin, with conformal radius around , and let be the Loewner chain with . We can parameterise so that the sets have conformal radius around for every , in which case we see that must be the unique conformal map from to with and . For instance, for the chain (7) we would have .
Theorem 26 (Loewner equation for slit domains) In the above situation, we have the Loewner equation holding with
Proof: Let be a time where the Loewner equation holds. For , the function extends continuously to the boundary, and is two-to-one on the slit , except at the tip where there is a single preimage on the unit circle; this can be seen by taking a holomorphic square root of , using a Möbius transformation to map the resulting image to a set bounded by a Jordan curve, and applying Carathéodory's theorem (Theorem 20 from Notes 2) to the resulting conformal map. The image is then with a Jordan arc removed, where is a point on the boundary of the sphere. Applying Carathéodory’s theorem to a holomorphic square root of , we see that extends continuously to be a map from to , with an arc on the boundary mapping (in two-to-one fashion) to the arc , and the endpoints of this arc mapping to . From this and (12), we see that converges to zero outside of the arc , which by the Herglotz representation theorem implies that the measure associated to is supported on the arc . An inspection of the proof of Carathéodory’s theorem also reveals that the are equicontinuous on as , and thus converge uniformly to (which is the identity function) as . This implies that must converge to the point as approaches , and so converges vaguely to the Dirac mass at . Since converges locally uniformly to , we conclude the formula (18). As depends measurably on , we conclude that does also.
In fact one can show that extends to a continuous function , and that the Loewner equation holds for all , but this is a bit trickier to show (it requires some further distortion estimates on conformal maps, related to the arguments used to prove Carathéodory’s theorem in the previous notes) and will not be done here. One can think of the function as a “driving force” that incrementally enlarges the slit via the Loewner equation; this perspective is often used when studying the Schramm-Loewner evolution, which is the topic of the next (and final) set of notes.
— 3. The Bieberbach conjecture —
We now turn to the resolution of the Bieberbach (and Robertson) conjectures. We follow the simplified treatment of de Branges’ original proof, due to FitzGerald and Pommerenke, though we omit the proof of one key ingredient, namely the non-negativity of a certain hypergeometric function.
The first step is to work not with the Taylor coefficients of a schlicht function or with an odd schlicht function , but rather with the (normalised) logarithm of a schlicht function , as the coefficients end up obeying more tractable equations. To transfer to this setting we need the following elementary inequalities relating the coefficients of a power series with the coefficients of its exponential.
Lemma 27 (Second Lebedev-Milin inequality) Let be a formal power series with complex coefficients and no constant term, and let be its formal exponential, thus
where is the formal series . Then for any , one has
Proof: If we formally differentiate (19) in , we obtain the identity
extracting the coefficient for any , we obtain the formula
By Cauchy-Schwarz, we thus have
Using and telescoping series, it thus suffices to prove the identity
But this follows from observing that
and that
for all .
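Since the displays above are elided, let me record the standard form of the second Lebedev-Milin inequality as an assumption and spot-check it numerically; the coefficient recursion used below is exactly the differentiated identity from the proof.

```python
import math, random

# Numerical check of the second Lebedev-Milin inequality (standard form,
# stated as an assumption since the displays are elided):
#   sum_{m=0}^n |b_m|^2
#     <= (n+1) * exp( (1/(n+1)) * sum_{m=1}^n sum_{k=1}^m (k|c_k|^2 - 1/k) ),
# where exp(sum_k c_k z^k) = sum_m b_m z^m.  The b_m are computed via the
# recursion m*b_m = sum_{k=1}^m k*c_k*b_{m-k}, obtained by differentiating.
random.seed(1)
n = 8
c = [0] + [complex(random.gauss(0, 0.3), random.gauss(0, 0.3)) for _ in range(n)]

b = [1.0 + 0j]                       # b_0 = exp(0) = 1
for m in range(1, n + 1):
    b.append(sum(k * c[k] * b[m - k] for k in range(1, m + 1)) / m)

lhs = sum(abs(bm)**2 for bm in b)
rhs = (n + 1) * math.exp(
    sum(k * abs(c[k])**2 - 1 / k for m in range(1, n + 1) for k in range(1, m + 1))
    / (n + 1))
print(lhs <= rhs + 1e-12)  # True
```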
Exercise 28 Show that equality holds in (20) for a given if and only if there is such that for all .
Exercise 29 (First Lebedev-Milin inequality) With the notation as in the above lemma, and under the additional assumption , prove that
(Hint: using the Cauchy-Schwarz inequality as above, first show that the power series is bounded term-by-term by the power series of .) When does equality occur?
Exercise 30 (Third Lebedev-Milin inequality) With the notation as in the above lemma, show that
(Hint: use the second Lebedev-Milin inequality and (21), together with the calculus inequality for all .) When does equality occur?
Using these inequalities, one can reduce the Robertson and Bieberbach conjectures to the following conjecture of Milin, also proven by de Branges:
Theorem 31 (Milin conjecture) Let be a schlicht function. Let be the branch of the logarithm of that equals at the origin, thus one has
for some complex coefficients . Then one has
for all .
Indeed, if
is an odd schlicht function and is the schlicht function given by (4), then
Applying Lemma 27 with , we obtain the Robertson conjecture, and the Bieberbach conjecture follows.
Example 32 If is the Koebe function (1), then
so in this case and . Similarly, for the rotated Koebe function (2) one has and again . If one works instead with the dilated Koebe function , we have , thus the time parameter only affects the constant term in . This is already a hint that the coefficients of could be worth studying further in this problem.
To prove the Milin conjecture, we use the Loewner chain method. It suffices by Theorem 15 and a limiting argument to do so in the case that is a slit domain. Then, by Theorem 26, is the initial function of a Loewner chain that solves the Loewner equation
for all and almost every , and some function .
We can transform this into an equation for . Indeed, for non-zero we may divide by to obtain
(for any local branch of the logarithm) and hence
Since , is equal to at the origin (for an appropriate branch of the logarithm). Thus we can write
The are locally Lipschitz in (basically thanks to Lemma 19) and for almost every we have the Taylor expansions
and
Comparing coefficients, we arrive at the system of ordinary differential equations
Fix (we will not need to use any induction on here). We would like to use the system (22) to show that
The most naive attempt to do this would be to show that one has a monotonicity formula
for all , and that the expression goes to zero as , as the claim would then follow from the fundamental theorem of calculus. This turns out to not quite work; however it turns out that a slight modification of this idea does work. Namely, we introduce the quantities
where for each , is a continuously differentiable function to be chosen later. If we have the initial condition
for all , then the Milin conjecture is equivalent to asking that . On the other hand, if we impose a boundary condition
for , then we also have as , since is schlicht and hence is a normal family, implying that the are bounded in for each . Thus, to solve the Milin, Robertson, and Bieberbach conjectures, it suffices to find a choice of weights obeying the initial and boundary conditions (23), (24), and such that
for almost every (note that will be Lipschitz, so the fundamental theorem of calculus applies).
Let us now try to establish (25) using (22). We first write , and drop the explicit dependence on , thus
for . To simplify this equation, we make a further transformation, introducing the functions
(with the convention ); then we can write the above equation as
We can recover the from the by the formula
It may be worth recalling at this point that in the example of the rotated Koebe Loewner chain (2) one has , , and , for some real constant . Observe that has a simpler form than in this example, suggesting again that the decision to transform the problem to one about the rather than the is on the right track.
We now calculate
Conveniently, the unknown function no longer appears explicitly! Some simple algebra shows that
and hence by summation by parts
with the convention .
In the example of the rotated Koebe function, with , the factors and both vanish, which is consistent with the fact that vanishes in this case regardless of the choice of weights . So these two factors look to be related to each other. On the other hand, for more general choices of , these two expressions do not have any definite sign. For comparison, the quantity also vanishes when , and has a definite sign. So it is natural to see whether these three factors are related to each other. After a little bit of experimentation, one eventually discovers the following elementary identity giving such a connection:
Inserting this identity into the above equation, we obtain
which can be rearranged as
We can kill the first summation by fiat, by imposing the requirement that the obey the system of differential equations
Hence if we also have the non-negativity condition
for all and , we will have obtained the desired monotonicity (25).
To summarise, in order to prove the Milin conjecture for a fixed value of , we need to find functions obeying the initial condition (23), the boundary condition (24), the differential equation (26), and the nonnegativity condition (27), with the convention . This is a significant reduction of the problem, as one just has to write down an explicit formula for such functions and verify all the properties.
Let us work out some simple cases. First consider the case . Now our task is to solve the system
for all . This is easy: we just take (indeed this is the unique choice). This gives the case of the Milin conjecture (which corresponds to the case of Bieberbach).
Next consider the case . The system is now
Again, a routine computation shows that there is a unique solution here, namely and . This gives the case of the Milin conjecture (which corresponds to the case of Bieberbach). One should compare this argument to that in Theorem 24, in particular one should see very similar weight functions emerging.
Let us now move on to . The system is now
A slightly lengthier calculation gives the unique explicit solution
to the above conditions.
These simple cases already indicate that there is basically only one candidate for the weights that will work. A calculation can give the explicit formula:
Exercise 33 Let .
- (i) Show there is a unique choice of continuously differentiable functions that solve the differential equations (26) with initial condition (23), with the convention . (Use the Picard existence theorem.)
- (ii) For any , show that the expression
is equal to when is even and when is odd.
- (iii) Show that the functions
for obey the properties (23), (26), (24). (Hint: for (23), first use (ii) to show that is equal to when is even and when is odd, then use (26).)
The Bieberbach conjecture is then reduced to the claim that
for any and . This inequality can be directly verified for any fixed ; for general it follows from general inequalities on Jacobi polynomials due to Askey and Gasper, with an alternate proof given subsequently by Gasper. A further proof of the Milin conjecture, based on a variant of the above argument due to Weinstein that avoids explicit use of (28), appears in this article of Koepf. We will not detail these arguments here.
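The Askey-Gasper inequality can be stated (in one standard form, my assumption since the display is elided) as the nonnegativity of the partial sums of the Jacobi polynomials P_j^{(α,0)} on [−1,1] for α > −1. A minimal pure-Python spot check:

```python
import math

def jacobi(n, alpha, x):
    # Jacobi polynomial P_n^{(alpha, 0)}(x) via the standard finite sum
    # P_n^{(a,b)}(x) = sum_s C(n+a, n-s) C(n+b, s) ((x-1)/2)^s ((x+1)/2)^(n-s).
    total = 0.0
    for s in range(n + 1):
        c1 = math.gamma(n + alpha + 1) / (math.gamma(alpha + s + 1) * math.gamma(n - s + 1))
        c2 = math.comb(n, s)  # C(n + 0, s), since beta = 0 here
        total += c1 * c2 * ((x - 1) / 2)**s * ((x + 1) / 2)**(n - s)
    return total

# Askey-Gasper inequality (one standard form): for alpha > -1 and
# -1 <= x <= 1, the partial sums sum_{j=0}^n P_j^{(alpha, 0)}(x) are >= 0.
ok = True
for alpha in [0.0, 0.5, 2.0]:
    for n in range(8):
        for x in [-1 + 0.25 * i for i in range(9)]:
            ok = ok and sum(jacobi(j, alpha, x) for j in range(n + 1)) >= -1e-9
print(ok)  # True
```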
My course on applied category theory is continuing! After a two-week break where the students did exercises, I went back to lecturing about Fong and Spivak’s book Seven Sketches. The second chapter is about ‘resource theories’.
Resource theories help us answer questions like this:
Resource theories in their modern form were arguably born in these papers:
Bob Coecke, Tobias Fritz and Robert W. Spekkens, A mathematical theory of resources.
Tobias Fritz, Resource convertibility and ordered commutative monoids.
We are lucky to have Tobias in our course, helping the discussions along! He’s already posted some articles on resource theory on the Azimuth blog:
Tobias Fritz, Resource convertibility (part 1), Azimuth, 7 April 2015.
Tobias Fritz, Resource convertibility (part 2), Azimuth, 10 April 2015.
Tobias Fritz, Resource convertibility (part 3), Azimuth, 13 April 2015.
In the course, we had fun bouncing between the relatively abstract world of monoidal preorders and their very concrete real-world applications to chemistry, scheduling, manufacturing and other topics. Here are the lectures:
I’d completed about half my candidacy exam.
Four Caltech faculty members sat in front of me, in a bare seminar room. I stood beside a projector screen, explaining research I’d undertaken. The candidacy exam functions as a milepost in year three of our PhD program. The committee confirms that the student has accomplished research and should continue.
I was explaining a quantum-thermodynamics problem. I reviewed the problem’s classical doppelgänger and a strategy for solving the doppelgänger. Could you apply the classical strategy in the quantum problem? Up to a point. Beyond it, you’d need help.
“Does anyone here like the Beatles?” I asked the committee. Three professors had never participated in an exam committee before. The question from the examinee appeared to startle them.
One committee member had participated in cartloads of committees. He recovered first, raising a hand.
The committee member—John Preskill—then began singing the Beatles song “Help!”.
In the middle of my candidacy exam.
The moment remains one of the highlights of my career.
Throughout my PhD career, I’ve reported to John. I’ve emailed an update every week and requested a meeting about once a month. I sketch the work that’s firing me, relate my plans, and request feedback.
Much of the feedback, I’ve discerned over the years, condenses into aphorisms buried in our conversations. I doubt whether John has noticed his aphorisms. But they’ve etched themselves in me, and I hope they remain there.
“Think big.” What would impact science? Don’t buff a teapot if you could be silversmithing.
Education serves as “money in the bank.” Invest in yourself, and draw on the interest throughout your career.
“Stay broad.” (A stretching outward of both arms accompanies this aphorism.) Embrace connections with diverse fields. Breadth affords opportunities to think big.
“Keep it simple,” but “do something technical.” A teapot cluttered with filigree, spouts, and eighteen layers of gold leaf doesn’t merit a spot at the table. A Paul Revere does.
“Do what’s best for Nicole.” I don’t know how many requests to speak, to participate on committees, to explain portions of his lecture notes, to meet, to contribute to reports, and more John receives per week. The requests I receive must look, in comparison, like a mouse to a mammoth. But John exhorts me to guard my time for research—perhaps, partially, because he gives so much time, including to students.
“Move on.” If you discover an opportunity, study background information for a few months, seize the opportunity, wrap up the project, and seek the next window.
John has never requested my updates, but he’s grown used to them. I’ve grown used to how meetings end. Having brought him questions, I invite him to ask questions of me.
“Are you having fun?” he says.
I tell the Beatles story when presenting that quantum-thermodynamics problem in seminars.
“I have to digress,” I say when the “Help!” image appears. “I presented this slide at a talk at Caltech, where John Preskill was in the audience. Some of you know John.” People nod. “He’s a…mature gentleman.”
I borrowed the term from the apparel industry. “Mature gentleman” means “at a distinguished stage by which one deserves to have celebrated a birthday of his with a symposium.”
Many physicists lack fluency in apparel-industry lingo. My audience members take “mature” at face value.
Some audience members grin. Some titter. Some tilt their heads from side to side, as though thinking, “Eh…”
John has impact. He’s logged boatloads of technical achievements. He has the scientific muscle of a scientific rhinoceros.
And John has fun. He doesn’t mind my posting an article about audience members giggling about him.
Friends ask me whether professors continue doing science after meriting birthday symposia, winning Nobel Prizes, and joining the National Academy of Sciences. I point to the number of papers with which John has, with coauthors, electrified physics over the past 20 years. He has coauthored them because science is fun. It merits singing about during candidacy exams. Satisfying as passing the exam felt two years ago, I feel more honored when John teases me about my enthusiasm for science.
A year ago, I ate lunch with an alumnus who’d just graduated from our group. Students, he reported, have a tradition of gifting John a piece of art upon graduating. I relayed the report to another recent alumnus.
“Really?” the second alumnus said. “Maybe someone gave John a piece of art and then John invented the tradition.”
Regardless of its origin, the tradition appealed to me. John has encouraged me to blog as he’s encouraged me to do theoretical physics. Writing functions as art. And writing resembles theoretical physics: Each requires little more than a pencil, paper, and thought. Each requires creativity, aesthetics, diligence, and style. Each consists of ideas, of abstractions; each lacks substance but can outlive its creator. Let this article serve as a finger painting for John Preskill.
Thanks for five fun years.
With my PhD-thesis committee, after my thesis defense. Photo credit to Nick Hutzler, who cracked the joke that accounts for everyone’s laughing. (Left to right: Xie Chen, Fernando Brandão, John Preskill, Nicole Yunger Halpern, Manuel Endres.)
I’ve got another paper up this week with Jacob Bourjaily, Andrew McLeod, and Matthias Wilhelm, about integrating Feynman diagrams.
If you’ve been following this blog for a while, you might be surprised: most of my work avoids Feynman diagrams at all costs. I’ve changed my mind, in part, because it turns out integrating Feynman diagrams can be a lot easier than I had thought.
At first, I thought Feynman integrals would be hard purely because they’re integrals. Those of you who’ve taken calculus might remember that, while taking derivatives was just a matter of following the rules, doing integrals required a lot more thought. Rather than one set of instructions, you had a set of tricks, meant to try to match your integral to the derivative of some known function. Sometimes the tricks worked, sometimes you just ended up completely lost.
As it turns out, that’s not quite the problem here. When I integrate a Feynman diagram, most of the time I’m expecting a particular kind of result, called a polylogarithm. If you know that’s the end goal, then you really can just follow the rules, using partial-fractioning to break your integral up into simpler integrations, linear pieces that you can match to the definition of polylogarithms. There are even programs that do this for you: Erik Panzer’s HyperInt is an especially convenient one.
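As a toy illustration of this “just follow the rules” philosophy (my own example, not one from the paper): the dilogarithm is exactly such an iterated integral of linear pieces, Li₂(z) = −∫₀ᶻ log(1−t)/t dt = Σ_{k≥1} z^k/k².

```python
import math

# Polylogarithms arise as iterated integrals of simple "linear" pieces, e.g.
#   Li_2(z) = -int_0^z log(1 - t)/t dt = sum_{k>=1} z^k / k^2.
# A crude sketch: compare midpoint-rule quadrature with the defining series.
def li2_integral(z, steps=100000):
    h = z / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += -math.log(1 - t) / t * h
    return total

def li2_series(z, terms=200):
    return sum(z**k / k**2 for k in range(1, terms + 1))

print(abs(li2_integral(0.5) - li2_series(0.5)))  # small
# Known special value: Li_2(1/2) = pi^2/12 - log(2)^2/2.
print(abs(li2_series(0.5) - (math.pi**2 / 12 - math.log(2)**2 / 2)))  # small
```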
Still, I wouldn’t have expected Feynman integrals to work particularly well, because they require too many integrations. You need to integrate a certain number of times to define a polylogarithm: for the ones we get out of Feynman diagrams, it’s two integrations for each loop the diagram has. The usual ways we calculate Feynman diagrams lead to a lot more integrations: the systematic method, using something called Symanzik polynomials, involves one integration per particle line in the diagram, which usually adds up to a lot more than two per loop.
When I arrived at the Niels Bohr Institute, I assumed everyone in my field knew about Symanzik polynomials. I was surprised when it turned out Jake Bourjaily hadn’t even heard of them. He was integrating Feynman diagrams by what seemed like a plodding, unsystematic method, taking the intro example from textbooks and just applying it over and over, gaining no benefit from all of the beautiful graph theory that goes into the Symanzik polynomials.
I was even more surprised when his method turned out to be the better one.
Avoid Symanzik polynomials, and you can manage with a lot fewer integrations. Suddenly we were pretty close to the “two integrations per loop” sweet spot, with only one or two “extra” integrations to do.
A few more advantages, and Feynman integrals were actually looking reasonable. The final insight came when we realized that just writing the problem in the right variables made a huge difference.
HyperInt, as I mentioned, tries to break a problem up into simpler integrals. Specifically, it’s trying to make things linear in the integration variable. In order to do this, sometimes it has to factor quadratic polynomials, like so:
Notice the square roots in this formula? Those can make your life a good deal trickier. Once you’ve got irrational functions in the game, HyperInt needs extra instructions for how to handle them, and integration is a lot more cumbersome.
The last insight, then, and the key point in our paper, is to avoid irrational functions. To do that, we use variables that rationalize the square roots.
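Here is the kind of change of variables involved, checked with sympy in Python. This is an illustration of the general idea, not the actual momentum-twistor parametrization from the paper: the substitution x = (t² − 1)/(2t) turns the irrational function √(x² + 1) into a rational function of t.

```python
import sympy as sp

x, t = sp.symbols('x t', positive=True)

# Substitute x = (t**2 - 1)/(2*t); then x**2 + 1 becomes the
# perfect square ((t**2 + 1)/(2*t))**2, so the root is rational in t
root = sp.sqrt(x**2 + 1).subs(x, (t**2 - 1) / (2 * t))
rationalized = sp.simplify(root)
```

Once every square root in the problem has been traded for a rational function like this, the partial-fractioning machinery can run without any extra instructions.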
We get these variables from one of the mainstays of our field, called momentum twistors. These variables are most useful in our favorite theory of N=4 super Yang-Mills, but they’re useful in other contexts too. By parametrizing them with a good “chart”, one with only the minimum number of variables we need to capture the integral, we can rationalize most of the square roots we encounter.
That “most” is going to surprise some people. We rationalized all of the expected square roots, letting us do integrals all the way to four loops in a few cases. But there were some unexpected square roots, and those we couldn’t rationalize.
These unexpected square roots don’t just make our life more complicated: if they stick around in a physically meaningful calculation, they’ll upset a few other conjectures as well. People had expected that these integrals were made of certain kinds of “letters”, organized by a mathematical structure called a cluster algebra. That cluster algebra structure doesn’t have room for square roots, which suggests that it can’t be the full story here.
The integrals that we can do, though, with no surprise square roots? They’re much easier than anyone expected, much easier than with any other method. Rather than running around doing something fancy, we just integrated things the simple, rational way…and it worked!
I have just uploaded to the arXiv my paper “Commutators close to the identity“, submitted to the Journal of Operator Theory. This paper resulted from some progress I made on the problem discussed in this previous post. Recall in that post the following result of Popa: if are bounded operators on a Hilbert space whose commutator is close to the identity in the sense that
for some , then one has the lower bound
In the other direction, for any , there are examples of operators obeying (1) such that
In this paper we improve the upper bound to come closer to the lower bound:
Theorem 1 For any , and any infinite-dimensional , there exist operators obeying (1) such that
One can probably improve the exponent somewhat by a modification of the methods, though it does not seem likely that one can lower it all the way to without a substantially new idea. Nevertheless I believe it plausible that the lower bound (2) is close to optimal.
We now sketch the methods of proof. The construction giving (3) proceeded by first identifying with the algebra of matrices that have entries in . It is then possible to find two matrices whose commutator takes the form
for some bounded operator (for instance one can take to be an isometry). If one then conjugates by the diagonal operator , one can ensure that (1) and (3) both hold.
It is natural to adapt this strategy to matrices rather than matrices, where is a parameter at one’s disposal. If one can find matrices that are almost upper triangular (in that only the entries on or above the lower diagonal are non-zero), whose commutator only differs from the identity in the top right corner, thus
for some , then by conjugating by a diagonal matrix such as for some and optimising in , one can improve the bound in (3) to ; if the bounds in the implied constant in the are polynomial in , one can then optimise in to obtain a bound of the form (4) (perhaps with the exponent replaced by a different constant).
The task is then to find almost upper triangular matrices whose commutator takes the required form. The lower diagonals of must then commute; it took me a while to realise that one could (usually) conjugate one of the matrices, say by a suitable diagonal matrix, so that the lower diagonal consisted entirely of the identity operator, which would make the other lower diagonal consist of a single operator, say . After a lot of further lengthy experimentation, I eventually realised that one could conjugate further by unipotent upper triangular matrices so that all remaining entries other than those on the far right column vanished. Thus, without too much loss of generality, one can assume that takes the normal form
for some , solving the system of equations
It turns out to be possible to solve this system of equations by a contraction mapping argument if one takes to be a “Hilbert’s hotel” pair of isometries as in the previous post, though the contraction is very slight, leading to polynomial losses in in the implied constant.
There is a further question raised in Popa’s paper which I was unable to resolve. As a special case of one of the main theorems (Theorem 2.1) of that paper, the following result was shown: if obeys the bounds
(where denotes the space of all operators of the form with and compact), then there exist operators with such that . (In fact, Popa’s result covers a more general situation in which one is working in a properly infinite algebra with non-trivial centre.) We sketch a proof of this result as follows. Suppose that and for some . A standard greedy algorithm argument (see this paper of Brown and Pearcy) allows one to find orthonormal vectors for such that for each , one has for some comparable to , and some orthogonal to all of the . After some conjugation (and a suitable identification of with ), one can thus place in a normal form
where is an isometry with infinite deficiency, and have norm . Setting , it then suffices to solve the commutator equation
with ; note the similarity with (3).
By the usual Hilbert’s hotel construction, one can complement with another isometry obeying the “Hilbert’s hotel” identity
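For orientation, one standard way to set up such a “Hilbert’s hotel” pair of isometries $u, v$, with orthogonal ranges that together span the whole space, is via identities of the form (a reconstruction of the usual setup, not necessarily the exact display intended here):

```latex
u^* u = v^* v = 1, \qquad u^* v = v^* u = 0, \qquad u u^* + v v^* = 1.
```

The last identity is the “hotel” part: it splits the Hilbert space into two infinite-dimensional copies of itself, leaving infinitely many spare rooms to absorb perturbations.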
and also , . Proceeding as in the previous post, we can try the ansatz
for some operators , leading to the system of equations
Using the first equation to solve for , the second to then solve for , and the third to then solve for , one can obtain matrices with the required properties.
Thus far, my attempts to extend this construction to larger matrices with good bounds on have been unsuccessful. A model problem would be to express
as a commutator with significantly smaller than . The construction in my paper achieves something like this, but with replaced by a more complicated operator. One would also need variants of this result in which one is allowed to perturb the above operator by an arbitrary finite rank operator of bounded operator norm.
When I was a postdoc in Princeton I subscribed to the Trenton Times, because I felt it was important to be in touch with what was going on in my local community and not just follow national news. The only story I remember was one that said “hey, a basketball team from Akron is coming to play against a top prep-school team in Trenton, and they’ve got this kid LeBron James they say is incredible, you should come check it out.” And I really did think about it, but I was a postdoc, I was trying to write papers, I was busy, too busy to drive into Trenton for a high-school basketball game.
So I guess what I’m trying to say is, yes, subscribe to your local paper because local journalism badly needs financial support, and maybe actually take seriously the local events it alerts you to.
When calculating scattering amplitudes, I like to work with polylogarithms. They’re a very well-understood type of mathematical function, and thus pretty easy to work with.
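As a taste of how well-understood they are: classical polylogarithm identities can be checked numerically to any precision you like. A quick check in Python with mpmath (my own illustration, assuming mpmath is available; not part of any calculation in this post):

```python
from mpmath import mp, polylog, pi, log

mp.dps = 30  # work to 30 significant digits

# Landen's classical identity: Li_2(1/2) = pi**2/12 - log(2)**2/2
li2_half = polylog(2, mp.mpf(1) / 2)
closed_form = pi**2 / 12 - log(2)**2 / 2
difference = abs(li2_half - closed_form)
```

The two sides agree to all thirty digits, the sort of rigid structure that makes these functions such comfortable territory.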
Even for our favorite theory of N=4 super Yang-Mills, though, they’re not the whole story. You need other types of functions to represent amplitudes, elliptic polylogarithms that are only just beginning to be properly understood. We had our own modest contribution to that topic last year.
You can think of the difference between these functions in terms of more and more complicated curves. Polylogarithms just need circles or spheres, while elliptic polylogarithms can be described with a torus.
A torus is far from the most complicated curve you can think of, though.
String theorists have done a lot of research into complicated curves, in particular ones with a property called Calabi-Yau. They were looking for ways to curl up six or seven extra dimensions, to get down to the four we experience. They wanted to find ways of curling that preserved some supersymmetry, in the hope that they could use it to predict new particles, and it turned out that Calabi-Yau was the condition they needed.
That hope, for the most part, didn’t pan out. There were too many Calabi-Yaus to check, and the LHC hasn’t seen any supersymmetric particles. Today, “string phenomenologists”, who try to use string theory to predict new particles, are a relatively small branch of the field.
This research did, however, have lasting impact: due to string theorists’ interest, there are huge databases of Calabi-Yau curves, and fruitful dialogues with mathematicians about classifying them.
This has proven quite convenient for us, as we happen to have some Calabi-Yaus to classify.
We call Feynman diagrams like the one above “traintrack integrals”. With two loops, it’s the elliptic integral we calculated last year. With three, though, you need a type of Calabi-Yau surface called a K3. With four loops, it looks like you start needing Calabi-Yau three-folds, the type of space used to compactify string theory to four dimensions.
“We” in this case is myself, Jacob Bourjaily, Andrew McLeod, Matthias Wilhelm, and Yang-Hui He, a Calabi-Yau expert we brought on to help us classify these things. Our new paper investigates these integrals, and the more and more complicated curves needed to compute them.
Calabi-Yaus had been seen in amplitudes before, in diagrams called “sunrise” or “banana” integrals. Our example shows that they should occur much more broadly. “Traintrack” integrals appear in our favorite N=4 super Yang-Mills theory, but they also appear in theories involving just scalar fields, like the Higgs boson. For enough loops and particles, we’re going to need more and more complicated functions, not just the polylogarithms and elliptic polylogarithms that people understand.
(And to be clear, no, nobody needs to do this calculation for Higgs bosons in practice. This diagram would calculate the result of two Higgs bosons colliding and producing ten or more Higgs bosons, all at energies so high you can ignore their mass, which is…not exactly relevant for current collider phenomenology. Still, the title proved too tempting to resist.)
Is there a way to understand traintrack integrals like we understand polylogarithms? What kinds of Calabi-Yaus do they pick out, in the vast space of these curves? We’d love to find out. For the moment, we just wanted to remind all the people excited about elliptic polylogarithms that there’s quite a bit more strangeness to find, even if we don’t leave the tracks.
I have the unfortunate duty of using this blog to announce the death, a couple of weeks ago, of Professor Leon B Lucy, who had been a Visiting Professor working here at Imperial College since 1998.
Leon got his PhD in the early 1960s at the University of Manchester, and after postdoctoral positions in Europe and the US, worked at Columbia University and the European Southern Observatory over the years, before coming to Imperial. He made significant contributions to the study of the evolution of stars, understanding in particular how they lose mass over the course of their evolution, and how very close binary stars interact and evolve inside their common envelope of hot gas.
Perhaps most importantly, early in his career Leon realised how useful computers could be in astrophysics. He made two major methodological contributions to astrophysical simulations. First, he realised that by simulating randomised trajectories of single particles, he could take into account more physical processes that occur inside stars. This is now called “Monte Carlo Radiative Transfer” (scientists often use the term “Monte Carlo” — after the European gambling capital — for techniques using random numbers). He also invented the technique now called smoothed-particle hydrodynamics which models gases and fluids as aggregates of pseudo-particles, now applied to models of stars, galaxies, and the large scale structure of the Universe, as well as many uses outside of astrophysics.
Leon’s other major numerical contributions comprise advanced techniques for interpreting the complicated astronomical data we get from our telescopes. In this realm, he was most famous for developing the methods, now known as Lucy-Richardson deconvolution, that were used for correcting the distorted images from the Hubble Space Telescope, before NASA was able to send a team of astronauts to install correcting lenses in the early 1990s.
For all of this work Leon was awarded the Gold Medal of the Royal Astronomical Society in 2000. Since then, Leon kept working on data analysis and stellar astrophysics — even during his illness, he asked me to help organise the submission and editing of what turned out to be his final papers, on extracting information on binary-star orbits and (a subject dear to my heart) the statistics of testing scientific models.
Until the end of last year, Leon was a regular presence here at Imperial, always ready to contribute an occasionally curmudgeonly but always insightful comment on the science (and sociology) of nearly any topic in astrophysics. We hope that we will be able to appropriately memorialise his life and work here at Imperial and elsewhere. He is survived by his wife and daughter. He will be missed.
A few months ago I sat down with Craig Cannon of Y Combinator for a discussion about quantum technology and other things. A lightly edited version was published this week on the Y Combinator blog. The video is also on YouTube:
If you’re in a hurry, or can’t stand the sound of my voice, you might prefer to read the transcript, which is appended below. Only by watching the video, however, can you follow the waving of my hands.
I grabbed the transcript from the Y Combinator blog post, so you can read it there if you prefer, but I’ve corrected some of the typos. (There are a few references to questions and comments that were edited out, but that shouldn’t cause too much confusion.)
Here we go:
Craig Cannon [00:00:00] – Hey, how’s it going? This is Craig Cannon, and you’re listening to Y Combinator’s Podcast. Today’s episode is with John Preskill. John’s a theoretical physicist and the Richard P. Feynman Professor of Theoretical Physics at Caltech. He once won a bet with Stephen Hawking and he writes that it made him briefly almost famous. Basically, what happened is John and Kip Thorne bet that singularities could exist outside of black holes. After six years, Hawking conceded. He said that they were possible in very special, “non-generic conditions.” I’ll link up some more details to that in the description. In this episode, we cover what John’s been focusing on for years, which is quantum information, quantum computing, and quantum error correction. Alright, here we go. What was the revelation that made scientists and physicists think that a quantum computer could exist?
John Preskill [00:00:54] – It’s not obvious. A lot of people thought it couldn’t. The idea that a quantum computer would be powerful was emphasized over 30 years ago by Richard Feynman, the Caltech physicist. It was interesting how he came to that realization. Feynman was interested in computation his whole life. He had been involved during the war in Los Alamos. He was the head of the computation group. He was the guy who fixed the little mechanical calculators, and he had a whole crew of people who were calculating, and he figured out how to flow the work from one computer to another. All that kind of stuff. As computing technology started to evolve, he followed that. In the 1970s, a particle physicist like Feynman, that’s my background too, got really interested in using computers to study the properties of elementary particles like the quarks inside a nucleus, you know? We know a proton isn’t really a fundamental object. It’s got little beans rattling around inside, but they’re quantum beans. Gell-Mann, who’s good at names, called them quarks.
John Preskill [00:02:17] – Now we’ve had a theory since the 1970s of how quarks behave, and so in principle, you know everything about the theory, you can compute everything, but you can’t because it’s just too hard. People started to simulate that physics with digital computers in the ’70s, and there were some things that they could successfully compute, and some things they couldn’t because it was just too hard. The resources required, the memory, the time were out of reach. Feynman, in the early ’80s said nature is quantum mechanical damn it, so if you want a simulation of nature, it should be quantum mechanical. You should use a quantum system to behave like another quantum system. At the time, he called it a universal quantum simulator.
John Preskill [00:03:02] – Now we call it a quantum computer. The idea caught on about 10 years later when Peter Shor made the suggestion that we could solve problems which don’t seem to have anything to do with physics, which are really things about numbers like finding the prime factors of a big integer. That caused a lot of excitement, in part because the implications for cryptography are a bit disturbing. But then physicists — good physicists — started to consider, can we really build this thing? Some concluded and argued fairly cogently that no, you couldn’t because of this difficulty that it’s so hard to isolate systems from the environment well enough for them to behave quantumly. It took a few years for that to sort out at the theoretical level. In the mid ’90s we developed a theory called quantum error correction. It’s about how to encode the quantum state that you’d like to protect in such a clever way that even if there are some interactions with the environment that you can’t control, it still stays robust.
John Preskill [00:04:17] – At first, that was just kind of a theorist’s fantasy — it was a little too far ahead of the technology. But 20 years later, the technology is catching up, and now this idea of quantum error correction has become something you can do in the lab.
Craig Cannon [00:04:31] – How does quantum error correction work? I’ve seen a bunch of diagrams, so maybe this is difficult to explain, but how would you explain it?
John Preskill [00:04:39] – Well, I would explain it this way. I don’t think I’ve said the word entanglement yet, have I?
Craig Cannon [00:04:43] – Well, I have been checking off all the Bingo words.
John Preskill [00:04:45] – Okay, so let’s talk about entanglement because it’s part of the answer to your question, which I’m still not done answering, what is quantum physics? What do we mean by entanglement? It’s really the characteristic way, maybe the most important way that we know in which quantum is different from ordinary stuff, from classical. Now what does it mean, entanglement? It means that you can have a physical system which has many parts, which have interacted with one another, so it’s in kind of a complex correlated state of all those parts, and when you look at the parts one at a time it doesn’t tell you anything about the state of the whole thing. The whole thing’s in some definite state — there’s information stored in it — and now you’d like to access that information … Let me be a little more concrete. Suppose it’s a book.
John Preskill [00:05:40] – Okay? It’s a book, it’s 100 pages long. If it’s an ordinary book, 100 people could each take a page, and read it, they know what’s on that page, and then they could get together and talk, and now they’d know everything that’s in the book, right? But if it’s a quantum book written in qubits where these pages are very highly entangled, there’s still a lot of information in the book, but you can’t read it the way I just described. You can look at the pages one at a time, but a single page when you look at it just gives you random gibberish. It doesn’t reveal anything about the content of the book. Why is that? There’s information in the book, but it’s not stored in the individual pages. It’s encoded almost entirely in how those pages are correlated with one another. That’s what we mean by quantum entanglement: Information stored in those correlations which you can’t see when you look at the parts one at a time. You asked about quantum error correction?
John Preskill [00:06:39] – What’s the basic idea? It’s to take advantage of that property of entanglement. Because let’s say you have a system of many particles. The environment is kind of kicking them around, it’s interacting with them. You can’t really completely turn off those interactions no matter how hard you try, but suppose we’ve encoded the information in entanglement. So, say, if you look at one atom, it’s not telling you anything about the information you’re trying to protect. The environment isn’t learning anything when it looks at the atoms one at a time.
John Preskill [00:07:15] – This is kind of the key thing — that what makes quantum information so fragile is that when you look at it, you disturb it. This ordinary water bottle isn’t like that. Let’s say we knew it was either here or here, and we didn’t know. I would look at it, I’d find out it’s here. I was ignorant of where it was to start with, and now I know. With a quantum system, when you look at it, you really change the state. There’s no way to avoid that. So if the environment is looking at it in the sense that information is leaking out to the environment, that’s going to mess it up. We have to encode the information so the environment, so to speak, can’t find out anything about what the information is, and that’s the idea of quantum error correction. If we encode it in entanglement, the environment is looking at the parts one at a time, but it doesn’t find out what the protected information is.
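The smallest textbook instance of this idea is the three-qubit bit-flip code — my example here, not one Preskill uses in the interview. The encoded amplitudes live in correlations among the qubits, and parity measurements locate a bit flip without ever revealing those amplitudes. A sketch in Python with numpy:

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])

def kron(*ops):
    out = np.array([[1.0]])
    for op in ops:
        out = np.kron(out, op)
    return out

def encode(a, b):
    # a|0> + b|1>  ->  a|000> + b|111>: the information is stored in
    # the correlations among the three qubits, not in any single one
    state = np.zeros(8)
    state[0], state[7] = a, b
    return state

def syndrome(state):
    # Parity checks Z1Z2 and Z2Z3: their outcomes reveal WHERE a bit
    # flip happened, but nothing about the encoded amplitudes a, b
    s1 = state @ kron(Z, Z, I2) @ state
    s2 = state @ kron(I2, Z, Z) @ state
    return round(s1), round(s2)

# Which qubit to flip back, for each syndrome pattern
recovery = {(1, 1): kron(I2, I2, I2), (-1, 1): kron(X, I2, I2),
            (-1, -1): kron(I2, X, I2), (1, -1): kron(I2, I2, X)}

state = encode(0.6, 0.8)
corrupted = kron(I2, X, I2) @ state          # the environment flips qubit 2
recovered = recovery[syndrome(corrupted)] @ corrupted
```

This code only protects against bit flips; full quantum error correction handles phase errors too, but the principle — measure correlations, never the data — is the same.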
Craig Cannon [00:08:06] – In other words, it’s kind of measuring probability the whole way along, right?
John Preskill [00:08:12] – I’m not sure what you mean by that.
Craig Cannon [00:08:15] – Is it Grover’s algorithm where, as quantum bits go through gates, the probability is determined of what information’s being passed through? What’s being computed?
John Preskill [00:08:30] – Grover’s algorithm is a way of sort of doing an exhaustive search through many possibilities. Let’s say I’m trying to solve some problem like a famous one is the traveling salesman problem. I’ve told you what the distances are between all the pairs of cities, and now I want to find the shortest route I can that visits them all. That’s a really hard problem. It’s still hard for a quantum computer, but not quite as hard because there’s a way of solving it, which is to try all the different routes, and measure how long they are, and then find the one that’s shortest, and you’ve solved the problem. The reason it’s so hard to solve is there’s such a vast number of possible routes. Now what Grover’s algorithm does is it speeds up that exhaustive search.
John Preskill [00:09:29] – In practice, it’s not that big a deal. What it means is that if you had the same processing speed, you can handle about twice as many cities before the problem becomes too hard to solve, as you could if you were using a classical processor. As far as what’s quantum about Grover, it takes advantage of the property in quantum physics that probabilities … tell me if I’m getting too inside baseball …
Craig Cannon [00:10:03] – No, no, this is perfect.
John Preskill [00:10:05] – That probabilities are the squares of amplitudes. This is interference. Again, this is another part of the answer. Well, we can spend the whole hour answering the question, what is quantum physics? Another essential part of it is what we call interference, and this is really crucial for understanding how quantum computing works. That is that probabilities add. If you know the probability of one alternative, and you know the probability of another, then you can add those together and find the probability that one or the other occurred. It’s not like that in quantum physics. The famous example is the double slit interference experiment. I’m sending electrons, let’s say — it could be basketballs, but it’s an easier experiment to do with electrons —
John Preskill [00:11:02] – at a screen, and there are two holes in the screen. You can try to detect the electron on the other side of the screen, and when you do that experiment many times, you can plot a graph showing where the electron was detected in each run, or make a histogram of all the different outcomes. And the graph wiggles, okay? If you could say there’s some probability of going through the first hole, and some probability of going through the second, and each time you detected it, it went through either one or the other, there’d be no wiggles in that graph. It’s the interference that makes it wiggle. The essence of the interference is that nobody can tell you whether it went through the first slit or the second slit. The question is sort of inadmissible. This interference then occurs when we can add up these different alternatives in a way which is different from what we’re used to. It’s not right to say that the electron was detected at this point because it had some probability of going through the first hole, and some probability of going through the second
John Preskill [00:12:23] – and we add those probabilities up. That doesn’t give the right answer. The different alternatives can interfere. This is really important for quantum computing because what we’re trying to do is enhance the probability or the time it takes to find the solution to a problem, and this interference can work to our advantage. We want to have, when we’re doing our search, we want to have a higher chance of getting the right answer, and a lower chance of getting the wrong answer. If the different wrong answers can interfere, they can cancel one another out, and that enhances the probability of getting the right answer. Sorry it’s such a long-winded answer, but this is how Grover’s algorithm works.
John Preskill [00:13:17] – It can speed up exhaustive search by taking advantage of that interference phenomenon.
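The amplitude bookkeeping behind Grover’s algorithm fits in a few lines. A toy simulation in Python with numpy (just tracking the amplitudes on a classical machine; the 16-item search and the variable names are my own illustration):

```python
import numpy as np

n_items = 16                 # unstructured search over 16 candidates
marked = 11                  # index of the one we're looking for

# Start in the uniform superposition: every item equally likely
amp = np.full(n_items, 1.0 / np.sqrt(n_items))

# The optimal number of Grover iterations is about (pi/4) * sqrt(N)
n_iter = int(round(np.pi / 4 * np.sqrt(n_items)))   # 3 for 16 items
for _ in range(n_iter):
    amp[marked] *= -1.0              # oracle: flip the sign of the marked amplitude
    amp = 2.0 * amp.mean() - amp     # diffusion: reflect each amplitude about the mean

p_success = amp[marked] ** 2         # probability of measuring the marked item
```

After three iterations the marked item carries over 95% of the probability, and you can see the quadratic speedup in the iteration count: sqrt(N) steps instead of the ~N looks a classical exhaustive search needs.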
Craig Cannon [00:13:20] – Well this is kind of one of the underlying questions among many of the questions from Twitter. You’ve hit our record for most questions asked. Basically, many people are wondering what quantum computers really will do if and when it becomes a reality that they outperform classical computers. What are they going to be really good at?
John Preskill [00:13:44] – Well, you know what? I’m not really sure. If you look at the history of technology, it would be hubris to expect me to know. It’s a whole different way of dealing with information. Quantum information is not just … a quantum computer is not just a faster way of computing. It deals with information in a completely new way because of this interference phenomenon, because of entanglement that we’ve talked about. We have limited vision when it comes to predicting decades out what the impact will be of an entirely new way of doing things. Information processing, in particular. I mean you know this well. We go back to the 1960s, and people are starting to put a few transistors on a chip. Where is that going to lead? Nobody knew.
Craig Cannon [00:14:44] – Even early days of the internet.
John Preskill [00:14:45] – Yeah, good example.
Craig Cannon [00:14:46] – Even the first browser. No one really knew what anyone was going to do with it. It makes total sense.
John Preskill [00:14:52] – For good or ill. Yeah. But we have some ideas, you know? I think … why are we confident there will be some transformative effect on society? Of the things we know about, and I emphasize again, probably the most important ones are things we haven’t thought of when it comes to applications of quantum computing, the ones which will affect everyday life, I think, are better methods for understanding and inventing new materials, new chemical compounds. Things like that can be really important. If you find a better way of capturing carbon by designing a better catalyst, or you can design pharmaceuticals that have new effects, materials that have unusual properties. These are quantum physics problems because those properties of the molecule or the material really have to do with the underlying quantum behavior of the particles, and we don’t have a good way for solving such problems or predicting that behavior using ordinary digital computers. That’s what a quantum computer is good at. It’s good — but maybe not the only thing it’s good at — one thing it should certainly be good at is telling us quantitatively how quantum systems behave. In the two contexts I just mentioned, there’s little question that there will be practical impact of that.
Craig Cannon [00:16:37] – It’s not just doing the traveling salesman problem through the table of elements to find these compounds.
John Preskill [00:16:49] – No. If it were, that wouldn’t be very efficient.
Craig Cannon [00:16:52] – Exactly.
John Preskill [00:16:53] – Yeah. No, it’s much trickier than that. Like I said, the exhaustive search, though conceptually it’s really interesting that quantum can speed it up because of interference, from a practical point of view it may not be that big a deal. It means that, well like I said, in the same amount of time you can solve an instance which is twice as big of the problem. What we really get excited about are the so-called exponential speed ups. That was why Shor’s algorithm was exciting in 1994, because factoring large numbers was a problem that had been studied by smart people for a long time, and on that basis, the fact that there weren’t any fast ways of solving it was pretty good evidence it’s a hard problem. Actually, we don’t know how to prove that from first principles. Maybe somebody will come along one day and figure out how to solve factoring very fast on a digital computer. It doesn’t seem very likely because people have been trying for so long to solve problems like that, and it’s just intractable with ordinary computers. You could say the same thing about these quantum physics problems. Maybe some brilliant graduate student is going to drop a paper on the arXiv tomorrow which will say, “Here, I solved quantum chemistry, and I can do it on a digital computer.” But we don’t think that’s very likely because we’ve been working pretty hard on these problems for decades and they seem to be really hard. Those cases, like these number theoretic problems,
John Preskill [00:18:40] – which have cryptological implications, and tasks for simulating the behavior of quantum systems, we’re pretty sure those are hard problems classically, and we’re pretty sure quantum computers … I mean we have algorithms that have been proposed, but which we can’t really run currently because our quantum computers aren’t big enough on the scale that’s needed to solve problems people really care about.
Craig Cannon [00:19:09] – Maybe we should jump to one of the questions from Twitter which is related to that. Travis Scholten (@Travis_Sch) asked, what are the most pressing problems in physics, let’s say specifically around quantum computers, that you think substantial progress ought to be made in to move the field forward?
John Preskill [00:19:27] – I know Travis. He was an undergrad here. How you doing, Travis? The problems that we need to solve to make quantum computing closer to realization at the level that would solve problems people care about? Well, let’s go over where we are now.
Craig Cannon [00:19:50] – Yeah, definitely.
John Preskill [00:19:51] – People have been working on quantum hardware for 20 years, working hard, and there are a number of different approaches to building the hardware, and nobody really knows which is going to be the best. I think we’re far from collapsing to one approach which everybody agrees has the best long-term prospects for scalability. And so it’s important that a lot of different types of hardware are being pursued. We can come back to what some of the different approaches are later. Where are we now? We think in a couple of years we’ll have devices with about 50 to 100 qubits, and we’ll be able to control them pretty well. That’s an interesting range because even though it’s only 50 to 100 qubits, doesn’t sound like that big a deal, but that’s already too many to simulate with a digital computer, even with the most powerful supercomputers today. From that point of view, these relatively small, near-term quantum computers which we’ll be fooling around with over the next five years or so, are doing something that’s kind of super-classical.
John Preskill [00:21:14] – At least, we don’t know how to do exactly the same things with ordinary computers. Now that doesn’t mean they’ll be able to do anything that’s practically important, but we’re going to try. We’re going to try, and there are ideas about things we’ll try out, including baby versions of these problems in chemistry, and materials, and ways of speeding up optimization problems. Nobody knows how well those things are going to work at these small scales. Part of the reason is not just that the number of qubits is small, but that they’re also not perfect. We can perform elementary operations on pairs of qubits, which we call quantum gates, like the gates in ordinary logic. But they have an error rate a little bit below an error every 100 gates. If you have a circuit with 1,000 gates, that’s a lot of noise.
Craig Cannon [00:22:18] – Exactly. Does, for instance, a 100-qubit quantum computer really mean a 100-qubit quantum computer, or do you need a certain amount of backup going on?
John Preskill [00:22:29] – In the near term, we’re going to be trying out, and probably we have the best hopes for, kind of hybrid classical-quantum methods with some kind of classical feedback. You try to do something on the quantum computer, you make a measurement that gives you some information, then you change the way you did it a little bit, and try to converge on some better answer. That’s one possible way of addressing optimization that might be faster on a quantum computer. But I just wanted to emphasize that the number of qubits isn’t the only metric. How good they are, and in particular, the reliability of the gates, how well we can perform them … that’s equally important. Anyway, coming back to Travis’ question, there are lots of things that we’d like to be able to do better. But just having much better qubits would be huge, right? If you … more or less, with the technology we have now, you can have a gate error rate of a few parts in 1,000, you know? If you can improve that by orders of magnitude, then obviously, you could run bigger circuits. That would be very enabling.
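The hybrid classical-quantum loop Preskill describes can be sketched as follows. The `measure_cost` function here is a hypothetical stand-in for running a parameterized quantum circuit and measuring an expectation value; in this sketch it is just a noisy classical function, but the feedback structure is the same:

```python
import random
random.seed(0)

def measure_cost(theta):
    """Stand-in for preparing a parameterized quantum state and
    measuring its energy/cost; here a noisy classical surrogate."""
    return (theta - 1.3) ** 2 + random.gauss(0, 0.01)

# Classical feedback loop: perturb the parameter, keep changes that
# lower the measured cost, and converge toward a better answer.
theta, step = 0.0, 0.5
best = measure_cost(theta)
for _ in range(200):
    candidate = theta + random.uniform(-step, step)
    cost = measure_cost(candidate)
    if cost < best:
        theta, best = candidate, cost

print(round(theta, 1))  # should land near the true optimum at 1.3
```

Real variational methods (VQE, QAOA) use the same outer structure: the quantum device only evaluates the cost, and a classical optimizer steers the parameters.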
John Preskill [00:23:58] – Even if you stick with 100 qubits just by having a circuit with more depth, more layers of gates, that increases the range of what you could do. That’s always going to be important. Because, I mean look at how crappy that is. A gate error rate, even if it’s one part in 1,000, that’s pretty lousy compared to if you look at where–
Craig Cannon [00:24:21] – Your phone has a billion transistors in it. Something like that, and 0%–
John Preskill [00:24:27] – You don’t worry about the … it’s gotten to the point where there is some error protection built in at the hardware level in a processor, because I mean, we’re doing these crazy things like going down from the 11 nanometer scale for features on a chip.
Craig Cannon [00:24:45] – How are folks trying to deal with interference right now?
John Preskill [00:24:50] – You mean, what types of devices? Yeah, so that’s interesting too because there are a range of different ways to do it. I mentioned that we could store information, we could make a qubit out of a single atom, for example. That’s one approach. You have to control a whole bunch of atoms and get them to interact with one another. One way of doing that is with what we call trapped ions. That means the atoms have electrical charges. That’s a good thing because then you could control them with electric fields. You could hold them in a trap, and you can isolate them, like I said, in a very high vacuum so they’re not interacting too much with other things in the laboratory, including stray electric and magnetic fields. But that’s not enough because you got to get them to talk to one another. You got to get them to interact. We have this set of desiderata, which are kind of in tension with one another. On the one hand, we want to isolate the qubits very well. On the other hand, we want to control them from the outside and get them to do what we want them to do, and eventually, we want to read them out. You have to be able to read out the result of the computation. But the key thing is the control. You could have two of those qubits in your device interact with one another in a specified way, and to do that very accurately you have to have some kind of bus that gets the two to talk to one another.
John Preskill [00:26:23] – The way they do that in an ion trap is pretty interesting. It’s by using lasers and controlling how the ions vibrate in the trap: with a laser you can kind of excite wiggles of the ion, and then, depending on whether the ions are wiggling or not, you can go address another ion, and that way you can do a two-qubit interaction. You can do that pretty well. Another way is really completely different. What I just described was encoding information at the one atom level. But another way is to use superconductivity — circuits in which electric current flows without any dissipation. In that case, you have a lot of freedom to sort of engineer the circuits to behave in a quantum way. There are many nuances there, but the key thing is that you can encode information now in a system that might involve the collective motion of billions of electrons, and yet you can control it as though it were a single atom. I mean, here’s one oversimplified way of thinking about it.
John Preskill [00:27:42] – Suppose you have a little loop of wire, and there’s current flowing in the loop. It’s a superconducting wire so it just keeps flowing. Normally, there’d be resistance, which would dissipate that as heat, but not for the superconducting circuit, which of course, has to be kept very cold so it stays superconducting. But you can imagine in this little loop that the current is either circulating clockwise or counterclockwise. That’s a way of encoding information. It could also be both at once, and that’s what makes it a qubit.
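The two circulating-current states can be written as a two-component state vector, with the “both at once” condition being a superposition of the two. A minimal NumPy sketch, where the two basis states stand for clockwise and counterclockwise current:

```python
import numpy as np

# A flux-style qubit: complex amplitudes for |clockwise> and
# |counterclockwise> current. Any normalized pair of amplitudes
# is a valid qubit state.
state = np.array([1, 1j]) / np.sqrt(2)  # equal parts of both at once

# Born rule: measurement probabilities are squared magnitudes.
probs = np.abs(state) ** 2
assert np.isclose(probs.sum(), 1.0)
print(probs)  # [0.5 0.5] -- half clockwise, half counterclockwise
```

The continuum of possible amplitude pairs is exactly the “many ways a qubit can point” that Preskill returns to later, and it is what makes a qubit harder to stabilize than a bit.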
Craig Cannon [00:28:14] – Right.
John Preskill [00:28:15] – And so in that case, even though it involves lots of particles, the magic is that you can control that system extremely well. I mentioned individual electrons. That’s another approach. Put the qubit in the spin of a single electron.
Craig Cannon [00:28:32] – You also mentioned better qubits. What did you mean by that?
John Preskill [00:28:35] – Well, what I really care about is how well I can do the gates. There’s a whole other approach, which is motivated by the desire to have much, much better control over the quantum information than we do in those systems that I mentioned so far, superconducting circuits and trapped ions. That’s actually what Microsoft is pushing very hard. We call it topological quantum computing. Topological is a word physicists and mathematicians love. It means, well, we’ll come back to what it means. Anyway, let me just tell you what they’re trying to do. They’re trying to make a much, much better qubit, which they can control much, much better using a completely different hardware approach.
Craig Cannon [00:29:30] – Okay.
John Preskill [00:29:32] – It’s very ambitious because at this point, it’s not even clear they have a single qubit, but if that approach is successful, and it’s making progress, we will see a validated qubit of this type soon. Maybe next year. Nobody really knows where it goes from there, but suppose it’s the case that you could do a two-qubit gate with an error rate of one in a million instead of one in 1,000. That would be huge. Now, scaling all these technologies up, is really challenging from a number of perspectives, including just the control engineering.
Craig Cannon [00:30:17] – How are they doing it or attempting to do it?
John Preskill [00:30:21] – You know, you could ask, where did all this progress come from over 20 years, or so? For example, with the superconducting circuits, a sort of crucial measure is what we call the coherence time of the qubit, which roughly speaking, means how much it interacts with the outside world. The longer the coherence time, the better. The rate of what we call decoherence is essentially how much it’s getting buffeted around by outside influences. For the superconducting circuits, those coherence times have increased about a factor of 10 every three years, going back 15 years or so.
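That growth rate compounds quickly. A one-line model of the trend he quotes, assuming the factor-of-10-every-3-years pace held exactly (it is an observed trend, not a law):

```python
def coherence_growth(years: float, factor: float = 10.0, period: float = 3.0) -> float:
    """Cumulative improvement if coherence grows by `factor` every `period` years."""
    return factor ** (years / period)

print(coherence_growth(15))  # 100000.0 -- five orders of magnitude over 15 years
```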
Craig Cannon [00:31:06] – Wow.
John Preskill [00:31:07] – Now, it won’t necessarily go on like that indefinitely, but achieving that type of progress took better materials, better fabrication, better control. The way you control these things is with microwave circuitry. Not that different from the kind of things that are going on in communication devices. All those things are important, but going forward, the control is really the critical thing. Coherence times are already getting pretty long, and having them longer is certainly good. But the key thing is to get two qubits to interact just the way you want them to. Now, I keep saying the key thing is the environment, but it’s not the only key thing, right? Because you have some qubit, like if you think about that electron spin, one way of saying it is I said it can be both up and down at the same time. Well, there’s a simpler way of saying that. It might not point either up or down. It might point some other way. But there really is a continuum of ways it could point. That’s not like a bit. See, it’s much easier to stabilize a bit because it’s got two states.
John Preskill [00:32:31] – But if it can kind of wander around in the space of possible configurations for a qubit, that makes it much harder to control. People have gotten better at that, a lot better at that in the last few years.
Craig Cannon [00:32:44] – Interesting. Joshua Harmon asked, what engineering strategy for quantum computers do you think has the most promise?
John Preskill [00:32:53] – Yeah, so I mentioned some of these different approaches, and I guess I’ll interpret the question as, which one is the winning horse? I know better than to answer that question! They’re all interesting. For the near term, the most advanced are superconducting circuits and trapped ions, which is why I mentioned those first. I think that will remain true over the next five to 10 years. Other technologies have the potential — like these topologically protected qubits — to surpass those, but it’s not going to happen real soon. I kind of like superconducting circuits because there’s so much phase space of things you can do with them. Of ways you can engineer and configure them, and imagine scaling them up.
John Preskill [00:33:54] – They have the advantage of being faster. The cycle time, time to do a gate, is faster than with the trapped ions. Just the basic physics of the interactions is different. In the long term, those electron spins could catapult ahead of these other things. That’s something that you can naturally do in silicon, and it’s potentially easy to integrate with silicon technology. Right now, the qubits and gates aren’t as good as the other technologies, but that can change. I mean, from a theorist’s perspective, this topological approach is very appealing. We can imagine it takes off maybe 10 years from now and it becomes the leader. I think it’s important to emphasize we don’t really know what’s going to scale the best.
Craig Cannon [00:34:50] – Right. And are there multiple attempts being made around programming quantum computers?
John Preskill [00:34:55] – Yeah. I mean, some of these companies working on quantum technology now include well-known big players like IBM, and Google, and Microsoft and Intel, but also a lot of startups. They are trying to encompass the full stack, so they’re interested in the hardware, and the fabrication, and the control technology, but also the software, the applications, the user interface. All those things are certainly going to be important eventually.
Craig Cannon [00:35:38] – Yeah, they’re pushing it almost to like an AWS layer. Where you interact with your quantum computer in a server farm and you don’t even touch it.
John Preskill [00:35:49] – That’s how it will be in the near term. You’re not going to have, most of us won’t, have a quantum computer sitting on your desktop, or in your pocket. Maybe someday. In the near term, it’ll be in the Cloud, and you’ll be able to run applications on it by some kind of web interface. Ideally, that should be designed so the user doesn’t have to know anything about quantum physics in order to program or use it, and I think that’s part of what some of these companies are moving toward.
Craig Cannon [00:36:24] – Do you think it will get to the level where it’s in your pocket? How do you deal with that when you’re below one kelvin?
John Preskill [00:36:32] – Well, if it’s in your pocket, it probably won’t be one kelvin.
Craig Cannon [00:36:35] – Yeah, probably not.
John Preskill [00:36:38] – What do you do? Well, there’s one approach, as an example, which I guess I mentioned in passing before, where maybe it doesn’t have to be at such low temperature, and that’s nuclear spins. Because they’re very weakly interacting with the outside world, you can have quantum information in a nuclear spin, which — I’m not saying that it would be undisturbed for years, but seconds, which is pretty good. And you can imagine that getting significantly longer. Someday you might have a little quantum smart card in your pocket. The nice thing about that particular technology is you could do it at room temperature. Still have long coherence times. If you go to the ATM and you’re worried that there’s a rogue bank that’s going to steal your information, one solution to that problem — I’m not saying there aren’t other solutions — is to have a quantum card where the bank will be able to authenticate it without being able to forge it.
Craig Cannon [00:37:54] – We should talk about the security element. Kevin Su asked what risk would quantum computers pose to current encryption schemes? So public key, and what changes should people be thinking about if quantum computers come in the next five years, 10 years?
John Preskill [00:38:12] – Yeah. Quantum computers threaten those systems that are in widespread use. Whenever you’re using a web browser and you see that little padlock and you’re at an HTTPS site, you’re using a public key cryptosystem to protect your privacy. Those cryptosystems rely for their security on the presumed hardness of computational problems. That is, it’s possible to crack them, but it’s just too hard. RSA, which is one of the ones that’s widely used … as typically practiced today, to break it you’d have to do something like factor a number which is over 2000 bits long, 2048. That’s too hard to do now. But that’s what quantum computers will be good at. Another one that’s widely used is called elliptic curve cryptography. Doesn’t really matter exactly what it is.
John Preskill [00:39:24] – But the point is that it’s also vulnerable to quantum attack, so we’re going to have to protect our privacy in different ways when quantum computers are prevalent.
Craig Cannon [00:39:37] – What are the attempts being made right now?
John Preskill [00:39:39] – There are two main classes of attempts. One is just to come up with a cryptographic protocol not so different conceptually from what’s done now, but based on a problem that’s hard for quantum computers.
Craig Cannon [00:39:59] – There you go.
John Preskill [00:40:02] – It turns out that what has sort of become the standard way doesn’t have that feature, and there are alternatives that people are working on. We speak of post-quantum cryptography, meaning the protocols that we’ll have to use when we’re worried that our adversaries have quantum computers. There’s a long list of proposed cryptosystems by now which people think are candidates for being quantum resistant, for being unbreakable or hard to break by quantum computers. But I don’t think there’s any one that the world has sufficient confidence in yet, any one we’re all going to switch over to. It’s certainly time to be thinking about it, though. When people worry about their privacy, of course different users have different standards, but the US Government sometimes says they would like a system to stay secure for 50 years. They’d like to be able to use it for 20, roughly speaking, and then have the intercepted traffic be protected for another 30 after that. I don’t think, though I could be wrong, that we’re likely to have quantum computers that can break those public key cryptosystems in 10 years, but in 50 years it seems not unlikely,
John Preskill [00:41:33] – and so we should really be worrying about it. The other one is actually using quantum communication for privacy. In other words, if you and I could send qubits to one another instead of bits, it opens up new possibilities. The way to think about these public key schemes — or one way — that we’re using now, is I want you to send me a private message, and I can send you a lockbox. It has a padlock on it, but I keep the key, okay? But you can close up the box and send it to me. But I’m the only one with the key. The key thing is that if you have the padlock you can’t reverse engineer the key. Of course, it’s a digital box and key, but that’s the idea of public key. The idea of what we call quantum key distribution, which is a particular type of quantum cryptography, is that I can actually send you the key, or you can send me your key, but why can’t any eavesdropper then listen in and know the key? Well it’s because it’s quantum, and remember, it has that property that if you look at it, you disturb it.
John Preskill [00:42:59] – So if you collect information about my key, or if the adversary does, that will cause some change in the key, and there are ways in which we can check whether what you received is really what I sent. And if it turns out it’s not, or it has too many errors in it, then we’ll be suspicious that there was an adversary who tampered with it, and then we won’t use that key. Because we haven’t used it yet — we’re just trying to establish the key. We do the test to see whether an adversary interfered. If it passes the test, then we can use the key. And if it fails the test, we throw that key away and we try again. That’s how quantum cryptography works, but it requires a much different infrastructure than what we’re using now. We have to be able to send qubits … well, it’s not completely different because you can do it with photons. Of course, that’s how we communicate through optical fiber now — we’re sending photons. It’s a little trickier sending quantum information through an optical fiber, because of that issue that interactions with the environment can disturb it. But nowadays, you can send quantum information through an optical fiber over tens of kilometers with a low enough error rate so it’s useful for communication.
Craig Cannon [00:44:22] – Wow.
John Preskill [00:44:23] – Of course, we’d like to be able to scale that up to global distances.
Craig Cannon [00:44:26] – Sure.
John Preskill [00:44:27] – And there are big challenges in that. But anyway, so that’s another approach to the future of privacy that people are interested in.
Craig Cannon [00:44:35] – Does that necessitate quantum computers on both ends?
John Preskill [00:44:38] – Yes, but not huge ones. The reason … well, yes and no. At the scale of tens of kilometers, no. You can do that now. There are prototype systems that are in existence. But if you really want to scale it up — in other words, to send things longer distance — then you have to bring this quantum error correction idea into the game.
John Preskill [00:45:10] – Because at least with our current photonics technology, there’s no way I can send a single photon from here to China without there being a very high probability that it gets lost in the fiber somewhere. We have to have what we call quantum repeaters, which can boost the signal. But it’s not like the usual type of repeater that we have in communication networks now. The usual type is you measure the signal, and then you resend it. That won’t work for quantum because as soon as you measure it you’re going to mess it up. You have to find a way of boosting it without knowing what it is. Of course, it’s important that it works that way because otherwise, the adversary could just intercept it and resend it. And so it will require some quantum processing to get that quantum error correction in the quantum repeater to work. But it’s a much more modest scale quantum processor than we would need to solve hard problems.
Craig Cannon [00:46:14] – Okay. Gotcha. What are the other things you’re both excited about, and worried about for potential business opportunities? Snehan, I’m mispronouncing names all the time, Snehan Kekre asks: budding entrepreneurs, what should they be thinking about in the context of quantum computing?
John Preskill [00:46:37] – There’s more to quantum technology than computing. Something which has good potential to have an impact in the relatively near future is improved sensing. Quantum systems, partly because of that property I keep emphasizing, that they can’t be perfectly isolated from the outside, are good at sensing things. Sometimes, you want to detect it when something in the outside world messes around with your qubit. Again, using this technology of nuclear spins, which I mentioned you can do at room temperature potentially, you can make a pretty good sensor, and it can potentially achieve higher sensitivity and spatial resolution, look at things on shorter distance scales than other existing sensing technology. One of the things people are excited about are the biological and medical implications of that.
John Preskill [00:47:53] – If you can monitor the behavior of molecular machines, probe biological systems at the molecular level using very powerful sensors, that would surely have a lot of applications. One interesting question you can ask is, can you use these quantum error correction ideas to make those sensors even more powerful? That’s another area of current basic research, where you could see significant potential economic impact.
Craig Cannon [00:48:29] – Interesting. In terms of your research right now, what are you working on that you find both interesting and incredibly difficult?
John Preskill [00:48:40] – Everything I work on–
Craig Cannon [00:48:41] – 100%.
John Preskill [00:48:42] – Is both interesting and incredibly difficult. Well, let me change direction a little from what we’ve been talking about so far. Well, I’m going to tell you a little bit about me.
Craig Cannon [00:48:58] – Sure.
John Preskill [00:49:00] – I didn’t start out interested in information in my career. I’m a physicist. I was trained as an elementary particle theorist, studying the fundamental interactions and the elementary particles. That drew me into an interest in gravitation because one thing that we still have a very poor understanding of is how gravity fits together with the other fundamental interactions. The way physicists usually say it is we don’t have a quantum theory of gravity, at least not one that we think is complete and satisfactory. I’ve been interested in that question for many decades, and then got sidetracked because I got excited about quantum computing. But you know what? I’ve always looked at quantum information not just as a technology. I’m a physicist, I’m not an engineer. I’m not trying to build a better computer, necessarily, though that’s very exciting, and worth doing, and if my work can contribute to that, that’s very pleasing. I see quantum information as a new frontier in the exploration of the physical sciences. Sometimes I call it the entanglement frontier. Physicists, we like to talk about frontiers, and stuff. Short distance frontier. That’s what we’re doing at CERN in the Large Hadron Collider, trying to discern new properties of matter at distances which are shorter than we’ve ever been able to explore before.
John Preskill [00:50:57] – There’s a long distance frontier in cosmology. We’re trying to look deeper into the universe and understand its structure and behavior at earlier times. Those are both very exciting frontiers. This entanglement frontier is increasingly going to be at the forefront of basic physics research in the 21st century. By entanglement frontier, I just mean scaling up quantum systems to larger and larger complexity where it becomes harder and harder to simulate those systems with our existing digital tools. That means we can’t very well anticipate the types of behavior that we’re going to see. That’s a great opportunity for new discovery, and that’s part of what’s going to be exciting even in the relatively near term. When we have 100 qubits … there are some things that we can do to understand the behavior of the dynamics of a highly complex system of 100 qubits that we’ve never been able to experimentally probe before. That’s going to be very interesting. But what we’re starting to see now is that these quantum information ideas are connecting to these fundamental questions about gravitation, and how to think about it quantumly. And it turns out, as is true for most of the broader implications of quantum physics, the key thing is entanglement.
John Preskill [00:52:36] – We can think of the microscopic structure of spacetime, the geometry of where we live. Geometry just means who’s close to who else. If we’re in the auditorium, and I’m in the first row and you’re in the fourth row, the geometry is how close we are to one another. Of course, that’s very fundamental in both space and time. How far apart are we in space? How far apart are we in time? Is geometry really a fundamental thing, or is it something that’s kind of emergent from some even more fundamental concept? It seems increasingly likely that it’s really an emergent property.
John Preskill [00:53:29] – That there’s something deeper than geometry. What is it? We think it’s quantum entanglement. That you can think of the geometry as arising from quantum correlations among parts of a system. That’s really what defines who’s close to who. We’re trying to explore that idea more deeply, and one of the things that comes in is the idea of quantum error correction. Remember the whole idea of quantum error correction was that we could make a quantum system behave the way we want it to because it’s well-protected against the damaging effects of noise. It seems like quantum error correction is part of the deep secret of how spacetime geometry works. It has a kind of intrinsic robustness coming from these ideas of quantum error correction that makes space meaningful, so that it doesn’t just evaporate when you tap on it. If you wanted to, you could think of the spacetime, the space that you’re in and the space that I’m in, as parts of a system that are entangled with one another.
John Preskill [00:54:45] – What would happen if we broke that entanglement and your part of space became disentangled from my part? Well what we think that would mean is that there’d be no way to connect us anymore. There wouldn’t be any path through space that starts over here with me and ends with you. It’d become broken apart into two pieces. It’s really the entanglement which holds space together, which keeps it from falling apart into little pieces. We’re trying to get a deeper grasp of what that means.
Craig Cannon [00:55:19] – How do you make any progress on that? That seems like the most unbelievably difficult problem to work on.
John Preskill [00:55:26] – It’s difficult because, well for a number of reasons, but in particular, because it’s hard to get guidance from experiment, which is how physics historically–
Craig Cannon [00:55:38] – All science.
John Preskill [00:55:38] – Has advanced.
Craig Cannon [00:55:39] – Yeah.
John Preskill [00:55:41] – Although it was fun a moment ago to talk about what would happen if we disentangled your part of space from mine, I don’t know how to do that in the lab right now. Of course, part of the reason is we have the audacity to think we can figure these things out just by thinking about them. Maybe that’s not true. Nobody knows, right? We should try. Solving these problems is a great challenge, and it may be that the apes that evolved on Earth don’t have the capacity to understand things like the quantum structure of spacetime. But maybe we do, so we should try. Now in the longer term, and maybe not such a long term, maybe we can get some guidance from experiment. In particular, what we’re going to be doing with quantum computers and the other quantum technologies that are becoming increasingly sophisticated in the next couple of decades, is we’ll be able to control very well highly entangled complex quantum systems. That should mean that in a laboratory, on a tabletop, I can sort of make my own little toy space time …
John Preskill [00:57:02] – with an emergent geometry arising from the properties of that entanglement, and I think that’ll teach us lessons because systems like that are the types of system that, because they’re so highly entangled, digital computers can’t simulate them. It seems like only quantum computers are potentially up to the task. So that won’t be quite the same as disentangling your side of the room from mine, in real life. But we’d be able to do it in a laboratory setting using model systems, which I think would help us to understand the basic principles better.
Craig Cannon [00:57:39] – Wild. Yeah, desktop space time seems pretty cool, if you could figure it out.
John Preskill [00:57:43] – Yeah, it’s pretty fundamental. We didn’t really talk about what people sometimes call quantum non-locality. Well, we did implicitly, but not in so many words. It’s another way of describing quantum entanglement, actually. There’s this notion of Bell’s theorem: when you look at the correlations among the parts of a quantum system, they’re different from any possible classical correlations. Some things that you read give you the impression that you can use that to instantaneously send information over long distances. It is true that if we have two qubits, electron spins, say, and they’re entangled with one another, then what’s kind of remarkable is that I can measure my qubit to see along some axis whether it’s up or down, and you can measure yours, and we will get perfectly correlated results. When I see up, you’ll see up, say, and when I see down, you’ll see down. And sometimes, people make it sound like that’s remarkable. That’s not remarkable in itself. Somebody could’ve flipped a pair of coins, you know,
John Preskill [00:59:17] – so that they came up both heads or both tails, and given one to you and one –
Craig Cannon [00:59:20] – Split them apart.
John Preskill [00:59:20] – to me.
Craig Cannon [00:59:21] – Yeah.
John Preskill [00:59:22] – And gone a light year apart, and then we both … hey, mine’s heads. Mine’s heads too!
Craig Cannon [00:59:24] – And then they call it quantum teleportation on YouTube.
John Preskill [00:59:28] – Yeah. Of course, what’s really important about entanglement that makes it different from just those coins is that there’s more than one way of looking at a qubit. We have what we call complementary ways of measuring it, so you can ask whether it’s up or down along this axis or along that axis. There’s nothing like that for the coins. There’s just one way to look at it. What’s cool about entanglement is that we’ll get perfectly correlated results if we both measure in the same way, but there’s more than one possible way that we could measure. What sometimes gets said, or the impression people get, is that that means that when I do something to my qubit, it instantaneously affects your qubit, even if we’re on different sides of the galaxy. But that’s not what entanglement does. It just means they’re correlated in a certain way.
John Preskill [01:00:30] – When you look at yours, if we have maximally entangled qubits, you just see a random bit. It could be a zero or a one, each occurring with probability 1/2. That’s going to be true no matter what I did to my qubit, and so you can’t tell what I did by just looking at it. It’s only that if we compared notes later we can see how they’re correlated, and that correlation holds for either one of these two complementary ways in which we could both measure. It’s that fact that we have these complementary ways to measure that makes it impossible for a classical system to reproduce those same correlations. So that’s one misconception that’s pretty widespread. Another one is this about quantum computing, which is in trying to explain why quantum computers are powerful, people will sometimes say, well, it’s because you can superpose –I used that word before, you can add together many different possibilities. That means that, whereas an ordinary computer would just do a computation once, acting on a superposition a quantum computer can do a vast number of computations all at once.
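The correlation structure described here can be checked with a few lines of linear algebra. The following numpy sketch (my own illustration, not part of the conversation) prepares the Bell state (|00⟩ + |11⟩)/√2, measures both qubits along axes in the x-z plane, and confirms both points: same-axis measurements are perfectly correlated, yet each local outcome is a 50/50 coin flip no matter what the other side does, so no signal can be sent.

```python
import numpy as np

# Bell state |Phi+> = (|00> + |11>)/sqrt(2) as a 4-component vector
phi_plus = np.array([1, 0, 0, 1]) / np.sqrt(2)

def measurement_projectors(theta):
    """Projectors onto the +/- eigenstates of a spin measured along
    an axis at angle theta in the x-z plane."""
    up = np.array([np.cos(theta / 2), np.sin(theta / 2)])
    down = np.array([-np.sin(theta / 2), np.cos(theta / 2)])
    return [np.outer(up, up), np.outer(down, down)]

def joint_probs(state, theta_a, theta_b):
    """Probability of each (a, b) outcome pair when the two qubits are
    measured along axes theta_a and theta_b respectively."""
    probs = {}
    for a, Pa in enumerate(measurement_projectors(theta_a)):
        for b, Pb in enumerate(measurement_projectors(theta_b)):
            P = np.kron(Pa, Pb)
            probs[(a, b)] = float(state @ P @ state)
    return probs

# Same axis on both sides: outcomes are perfectly correlated.
same = joint_probs(phi_plus, 0.0, 0.0)
print(same)  # (0,0) and (1,1) each have probability 1/2; mixed outcomes 0

# Whatever the other side measures, each local marginal stays 50/50,
# so the choice of measurement axis transmits no information.
for theta_b in (0.0, np.pi / 2, 1.234):
    p = joint_probs(phi_plus, 0.0, theta_b)
    marginal_a0 = p[(0, 0)] + p[(0, 1)]
    print(round(marginal_a0, 6))  # always 0.5
```

This is just the textbook calculation done numerically; the perfect correlation reappears for any shared axis in the x-z plane, which is the "more than one way of looking at a qubit" that coins cannot imitate.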
John Preskill [01:01:54] – There’s a certain sense in which that’s mathematically true if you interpret it right, but it’s very misleading. Because in the end, you’re going to have to make some measurement to read out the result. When you read it out, there’s a limited amount of information you can get. You’re not going to be able to read out the results of some huge number of computations in a single shot measurement. Really the key thing that makes it work is this idea of interference, which we discussed briefly when you asked about Grover’s algorithm. The art of a quantum algorithm is to make sure that the wrong answers interfere and cancel one another out, so the right answer is enhanced. That’s not automatic. It requires that the quantum algorithm be designed in just the right way.
Craig Cannon [01:02:50] – Right. The diagrams I’ve seen online at least, involve usually you’re squaring the output as it goes along, and then essentially, that flips the correct answer to the positive, and the others are in the negative position. Is that accurate?
John Preskill [01:03:08] – I wouldn’t have said it the way you did, because you can’t really measure it as you go along. Once you measure it, the magic of superposition is going to be lost.
John Preskill [01:03:19] – It means that now there’s some definite outcome or state. To take advantage of this interference phenomenon, you need to delay the measurement. Remember when we were talking about the double slit and I said, if you actually see these wiggles in the probability of detection, which is the signal of interference, that means that there’s no way anybody could know whether the electron went through hole one or hole two? It’s the same way with quantum computing. If you think of the computation as being a superposition of different possible computations, it wouldn’t work — there wouldn’t be a speed up — if you could know which of those paths the computation followed. It’s important that you don’t know. And so you have to sum up all the different computations, and that’s how the interference phenomenon comes into play.
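The interference described here is easy to see in a classical simulation of the amplitudes. This toy Python sketch (my own illustration, not from the interview) runs Grover's algorithm on 8 items: the oracle flips the sign of the marked amplitude, and "inversion about the mean" makes the unmarked amplitudes partially cancel while the marked one is enhanced.

```python
import numpy as np

# Grover search over N = 8 items with one marked item, simulated by
# tracking the amplitude vector directly.
N = 8
marked = 5

state = np.full(N, 1 / np.sqrt(N))          # uniform superposition
oracle = np.ones(N)
oracle[marked] = -1                          # phase-flips the marked item

for step in range(2):                        # ~ (pi/4) * sqrt(N) iterations
    state = oracle * state                   # mark by a sign flip
    state = 2 * state.mean() - state         # inversion about the mean:
                                             # wrong answers interfere and
                                             # shrink, the right one grows
    print(step, state[marked] ** 2)
```

After two iterations the probability of reading out the marked item is about 0.95, versus 1/8 for a random guess; delaying the measurement until the end is exactly what lets the cancellations happen.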
Craig Cannon [01:04:17] – To take a little sidetrack, you mentioned Feynman before. And before we started recording you mentioned working with him. I know I’m in the Feynman fan club, for sure. What was that experience like?
John Preskill [01:04:32] – We never really collaborated. I mean, we didn’t write a paper together, or anything like that. We overlapped for five years at Caltech. I arrived here in 1983. He died in 1988. We had offices on the same corridor, and we talked pretty often because we were both interested in the fundamental interactions, and in particular, what we call quantum chromodynamics. It’s our theory of how nuclear matter behaves, how quarks interact, what holds the proton together, those kinds of things. One big question is what does hold the proton together? Why don’t the quarks just fall apart? That was an example of a problem that both he and I were very interested in, and which we talked about sometimes. Now, this was pretty late in his career. When I think about it now, when I arrived at Caltech, that was 1983, Feynman was born in 1918, so he was 65. I’m 64 now, so maybe he wasn’t so old, right? But at the time, he seemed pretty ancient to me. Since I was 30.
John Preskill [01:05:58] – Those who interacted with Dick Feynman when he was really at his intellectual peak in the ’40s, and ’50s, and ’60s, probably saw even more extraordinary intellectual feats than I witnessed interacting with the 65 year old Feynman. He just loved physics, you know? He just thought everything was so much fun. He loved talking about it. He wasn’t as good a listener as a talker, but actually – well that’s a little unfair, isn’t it? It was kind of funny because Feynman, he always wanted to think things through for himself, sort of from first principles, rather than rely on the guidance from experts who have thought about these things before. Well that’s fine. You should try to understand things as deeply as you can on your own, and sort of reconstruct the knowledge from the ground up. That’s very enabling, and gives you new insights. But he was a little too dismissive, in my view, of what the other guys knew. But I could slip it in because I didn’t tell him, “Dick, you should read this paper by Polyakov” — well maybe I did, but he wouldn’t have even heard that — because he solved that problem that you’re talking about.
John Preskill [01:07:39] – But I knew what Polyakov had said about it, so I would say, “Oh well, look, why don’t we look at it this way?” And so he thought I was, that I was having all these insights, but the truth was the big difference between Feynman and me in the mid 1980s was I was reading literature, and he wasn’t.
Craig Cannon [01:08:00] – That’s funny.
John Preskill [01:08:01] – Probably, if he had been, he would’ve been well served, but that wasn’t the way he liked to work on things. He wanted to find his own approach. Of course, that had worked out pretty well for him throughout his career.
Craig Cannon [01:08:15] – What other qualities did you notice about him when he was roaming the corridors?
John Preskill [01:08:21] – He’d always be drumming. So you would know he was around because he’d actually be walking down the hallway drumming on the wall.
Craig Cannon [01:08:27] – Wait, with his hands, or with sticks, or–
John Preskill [01:08:29] – No, hands. He’d just be tapping.
Craig Cannon [01:08:32] – Just a bongo thing.
John Preskill [01:08:33] – Yeah. That was one thing. He loved to tell stories. You’ve probably read the books that Ralph Leighton put together based on the stories Feynman told. Ralph did an amazing job, of capturing Feynman’s personality in writing those stories down because I’d heard a lot of them. I’m sure he told the same stories to many people many times, because he loved telling stories. But the book really captures his voice pretty well.
John Preskill [01:09:12] – If you had heard him tell some of these stories, and then you read the way Ralph Leighton transcribed them, you can hear Feynman talking. At the time that I knew him, one of the experiences that he went through was he was on the Challenger commission after the space shuttle blew up. He was in Washington a lot of the time, but he’d come back from time to time, and he would sort of sit back and relax in our seminar room and start bringing us up to date on all the weird things that were happening on the Challenger commission. That was pretty fun.
Craig Cannon [01:09:56] – That’s really cool.
John Preskill [01:09:56] – A lot of that got captured in the second volume. I guess it’s the one called, What Do You Care What Other People Think? There’s a chapter about him telling stories about the Challenger commission. He was interested in everything. It wasn’t just physics. He was very interested in biology. He was interested in computation. I remember how excited he was when he got his first IBM PC. Probably not long after I got to Caltech. Yeah, it was what they called the AT. We thought it was a pretty sexy machine. I had one, too. He couldn’t wait to start programming it in BASIC.
Craig Cannon [01:10:50] – Very cool.
John Preskill [01:10:51] – Because that was so much fun.
Craig Cannon [01:10:52] – There was a question that I was kind of curious to hear your answer to. Tika asks about, essentially, teaching about quantum computers. They say, many kids in grade 10 can code. Some can play with machine learning tools without knowing the math. Can quantum computing become as simple and/or accessible?
John Preskill [01:11:17] – Maybe so. At some level, when people say quantum mechanics is counterintuitive, it’s hard for us to grasp, it’s so foreign to our experience, that’s true. The way things behave at the microscopic scale are, like we discussed earlier, really different from the way ordinary stuff behaves. But it’s a question of familiarity. What I wouldn’t be surprised by is that if you go out a few decades, kids who are 10 years old are going to be playing quantum games. That’s an application area that doesn’t get discussed very much, but there could be a real market there because people love games. Quantum games are different, and the strategies are different, and what you have to do to win is different. If you play the game enough, you start to get the hang of it.
John Preskill [01:12:26] – I don’t see any reason why kids who have not necessarily deeply studied physics can’t get a pretty good feel for how quantum mechanics works. You know, the way ordinary physics works, maybe it’s not so intuitive. Newton’s laws … Aristotle couldn’t get it right. He thought you had to keep pushing on something to get it to keep moving. That wasn’t right. Galileo was able to roll balls down a ramp, and things like that, and see he didn’t have to keep pushing to keep it moving. He could see that it was uniformly accelerated in a gravitational field. Newton took that to a much more general and powerful level. You fool around with stuff, and you get the hang of it. And I think quantum stuff can be like that. We’ll experience it in a different way, but when we have quantum computers, in a way, that opens the opportunity for trying things out and seeing what happens.
John Preskill [01:13:50] – After you’ve played the game enough, you start to anticipate. And actually, it’s an important point about the applications. One of the questions you asked me at the beginning was what are we able to do with quantum computers? And I said, I don’t know. So how are we going to discover new applications? It might just be, at least in part, by fooling around. A lot of classical algorithms that people use on today’s computers were discovered, or that they were powerful was discovered, by experimenting. By trying it. I don’t know … what’s an example of that? Well, the simplex method that we use in linear programming. I don’t think there was a mathematical proof that it was fast at first, but people did experiments, and they said, hey, this is pretty fast.
Craig Cannon [01:14:53] – Well, you’re seeing it a lot now in machine learning.
John Preskill [01:14:57] – Yeah, well that’s a good example.
Craig Cannon [01:14:58] – You test it out a million times over when you’re running simulations, and it turns out, that’s what works. Following the thread of education, and maybe your political interest, given it’s the year that it is, do you have thoughts on how you would adjust or change STEM education?
John Preskill [01:15:23] – Well, no particularly original thoughts. But I do think that STEM education … we shouldn’t think of it as we’re going to need this technical workforce, and so we better train them. The key thing is we want the general population to be able to reason effectively, and to recognize when an argument is phony and when it’s authentic. To think about, well how can I check whether what I just read on Facebook is really true? And I see that as part of the goal of STEM education. When you’re teaching kids in school how to understand the world by doing experiments, by looking at the evidence, by reasoning from the evidence, this is something that we apply in everyday life, too. I don’t know exactly how to implement this–
John Preskill [01:16:36] – But I think we should have that perspective that we’re trying to educate a public, which is going to eventually make critical decisions about our democracy, and they should understand how to tell when something is true or not. That’s a hard thing to do in general, but you know what I mean. That there are some things that, if you’re a person with some — I mean it doesn’t necessarily have to be technical — but if you’re used to evaluating evidence and making a judgment based on that evidence about whether it’s a good argument or not, you can apply that to all the things you hear and read, and make better judgments.
Craig Cannon [01:17:23] – What about on the policy side? Let’s see, JJ Francis asked that, if you or any of your colleagues would ever consider running for office. Curious about science policy in the US.
John Preskill [01:17:38] – Well, it would be good if we had more scientifically trained people in government. Very few members of Congress. I know of one, Bill Foster’s a physicist in Illinois. He was a particle physicist, and he worked at Fermilab, and now he’s in Congress, and very interested in the science and educational policy aspects of government. Rush Holt was a congressman from New Jersey who had a background in physics. He retired from the House a couple of years ago, but he was in Congress for something like 18 years, and he had a positive influence, because he had a voice that people respected when it came to science policy. Having more people like that would help. Now, another thing, it doesn’t have to be elective office.
Craig Cannon [01:18:39] – Right.
John Preskill [01:18:42] – There are a lot of technically trained people in government, many of them making their careers in agencies that deal with technical issues. Department of Defense, of course, there are a lot of technical issues. In the Obama Administration we had two successive secretaries of energy who were very, very good physicists. Steve Chu was Nobel Prize winning physicist. Then Ernie Moniz, who’s a real authority on nuclear energy and weapons. That kind of expertise makes a difference in government.
John Preskill [01:19:24] – Now the Secretary of Energy is Rick Perry. It’s a different background.
Craig Cannon [01:19:28] – Yeah, you could say that. Just kind of historical reference, what policies did they put in place that you really felt their hand as a physicist move forward?
John Preskill [01:19:44] – You mean in particular–
Craig Cannon [01:19:45] – I’m talking the Obama Administration.
John Preskill [01:19:49] – Well, I think the Department of Energy, DOE, tried to facilitate technical innovation by seeding new technologies, by supporting startup companies that were trying to do things that would improve battery technology, and solar power, and things like that, which could benefit future generations. They had an impact by doing that. You don’t have to be a Nobel Prize winning physicist to think that’s a good idea. That the administration felt that was a priority made a difference, and appointing a physicist at Department of Energy was, if nothing else, highly symbolic of how important those things are.
Craig Cannon [01:20:52] – On the quantum side, Vikas Karad asked where the Quantum Valley might be. Do you have thoughts, as in a Silicon Valley for quantum computing?
John Preskill [01:21:06] – Well… I don’t know, but you look at what’s happening the last couple of years, there have been a number of quantum startups. A notable number of them are in the Bay Area. Why so? Well, that’s where the tech industry is concentrated and where the people who are interested in financing innovative technical startups are concentrated. If you are an entrepreneur interested in starting a company, and you’re concerned about how to fundraise for it, it kind of makes sense to locate in that area. Now, that’s what’s sort of happening now, and may not continue, of course. It might not be like that indefinitely. Nothing lasts forever, but I would say… That’s the place, Silicon Valley is likely to be Quantum Valley, the way things are right now.
Craig Cannon [01:22:10] – Well then what about the physicists who might be listening to this? If they’re thinking about starting a company, do you have advice for them?
John Preskill [01:22:22] – Just speaking very generally, if you’re putting a team together… Different people have different expertise. Take quantum computing as an example, like we were saying earlier, some of the big players and the startups, they want to do everything. They want to build the hardware, figure out better ways to fabricate it. Better control, better software, better applications. Nobody can be an expert on all those things. Of course, you’ll hire a software person to write your software, and microwave engineer to figure out your control, and of course that’s the right thing to do. But I think in that arena, and it probably applies to other entrepreneurial activity relating to physics, being able to communicate across those boundaries is very valuable, and you can see it in quantum computing now. That if the man or woman who’s involved in the software has that background, but there’s not a big communication barrier talking to the people who are doing the control engineering, that can be very helpful. It makes sense to give some preference to the people who maybe are comfortable doing so, or have the background that stretches across more than one of those areas of expertise. That can be very enabling in a technology arena like quantum computing today, where we’re trying to do really, really hard stuff, and you don’t know whether you’ll succeed, and you want to give it your best go by seeing the connections between those different things.
Craig Cannon [01:24:28] – Would you advise someone then to maybe teach, or try and explain it to, I don’t know, their young cousins? Because Feynman is maybe recognized as the king of communicating physics, at least for a certain period of time. How would you advise someone to get better at it so they can be more effective?
John Preskill [01:24:50] – Practice. There are different aspects of that. This isn’t what you meant at all, but I’ll say it anyway, because what you asked brought it to mind. If you teach, you learn. We have this odd model in the research university that a professor like me is supposed to do research and teach. Why don’t we hire teachers and researchers? Why do we have the same people doing both? Well, part of the reason for me is most of what I know, what I’ve learned since my own school education ended, is knowledge I acquired by trying to teach it. To keep our intellect rejuvenated, we have to have that experience of trying to teach new things that we didn’t know that well before to other people. That deepens your knowledge. Just thinking about how you convey it makes you ask questions that you might not think to ask otherwise, and you say “Hey, I don’t know the answer to that.” Then you have to try to figure it out. So I think that applies at varying levels to any situation in which a scientist, or somebody with a technical background, is trying to communicate.
John Preskill [01:26:21] – By thinking about how to get it across to other people, we can get new insights, you know? We can look at it in a different way. It’s not a waste of time. Aside from the benefits of actually successfully communicating, we benefit from it in this other way. But other than that… Have fun with it, you know? Don’t look at it as a burden, or some kind of task you have to do along with all the other things you’re doing. It should be a pleasure. When it’s successful, it’s very gratifying. If you put a lot of thought into how to communicate something and you think people are getting it, that’s one of the ways that somebody in my line of work can get a lot of satisfaction.
Craig Cannon [01:27:23] – If now were to be your opportunity to teach a lot of people about physics, and you could just point someone to things, who would you point them to? They want to learn more about quantum computing, they want to learn about physics. What should they be reading? What YouTube channel should they follow? What should they pay attention to?
John Preskill [01:27:44] – Well one communicator who I have great admiration for is Leonard Susskind, who’s at Stanford. You mentioned Feynman as the great communicator, and that’s fair, but in terms of style and personality of physicists who are currently active, I think Lenny Susskind is the most similar to Feynman of anyone I can think of. He’s a no bullshit kind of guy. He wants to give you the straight stuff. He doesn’t want to water it down for you. But he’s very gifted when it comes to making analogies and creating the illusion that you’re understanding what he’s saying. He has … if you just go to YouTube and search Leonard Susskind you’ll see lectures that he’s given at Stanford where they have some kind of extension school for people who are not Stanford students, people in the community. A lot of them in the tech community because it’s Stanford, and he’s giving courses. Yeah, and on quite sophisticated topics, but also on more basic topics, and he’s in the process of turning those into books. I’m not sure how many of those have appeared, but he has a series called The Theoretical Minimum
John Preskill [01:29:19] – which is supposed to be the gentle introduction to different topics like classical physics, quantum physics, and so on. He’s pretty special I think in his ability to do that.
Craig Cannon [01:29:32] – I need to subscribe. Actually, here’s a question then. In the things you’ve relearned while teaching over the past, I guess it’s 35 years now.
John Preskill [01:29:46] – Shit, is that right?
Craig Cannon [01:29:47] – Something like that.
John Preskill [01:29:48] – That’s true. Yeah.
Craig Cannon [01:29:51] – What were the big thing, what were the revelations?
John Preskill [01:29:55] – That’s how I learned quantum computing, for one thing. I was not at all knowledgeable about information science. That wasn’t my training. Back when I was in school, physicists didn’t learn much about things like information theory, computer science, complexity theory. One of the great things about quantum computing is its interdisciplinary character, that it brings these different things into contact, which traditionally had not been part of the common curriculum of any community of scholars. I decided 20 years ago that I should teach a quantum information class at Caltech, and I worked very hard on it that year. Not that I’m an expert, or anything, but I learned a lot about information theory, and things like channel capacity, and computational complexity — how we classify the hardness of problems — and algorithms. Things like that, which I didn’t really know very well. I had sort of a passing familiarity with some of those things from reading some of the quantum computing literature. That’s no substitute for teaching a class because then you really have to synthesize it and figure out your way of presenting it. Most of the notes are typed up and you can still get to them on my website. That was pretty transformative for me … and it was easier then, 20 years ago, I guess, than it is now because it was such a new topic.
John Preskill [01:31:49] – But I really felt I was kind of close enough to the cutting edge on most of those topics by the time I’d finished the class that I wasn’t intimidated by another paper I’d read or a new thing I’d hear about those things. That was probably the one case where it really made a difference in my foundation of knowledge which enabled me to do things. But I had the same experience in particle physics. When I was a student, I read a lot. I was very broadly interested in physics. But when the first time, I was still at Harvard at the time –later I taught a similar course here — I’m in my late 20s, I’m just a year or two out of graduate school, and I decide to teach a very comprehensive class on elementary particles … in particular, quantum chromodynamics, the theory of nuclear forces like we talked about before. It just really expanded my knowledge to have that experience of teaching that class. I still draw on that. I can still remember that experience and I think I get ideas that I might not otherwise have because I went through that.
Craig Cannon [01:33:23] – I want to get involved now. I want to go back to school, or maybe teach a class. I don’t know.
John Preskill [01:33:27] – Well, what’s stopping you?
Craig Cannon [01:33:29] – Nothing. Alright, thanks John.
John Preskill [01:33:32] – Okay, thank you Craig.
Occasionally, other physicists ask me what the goal of amplitudes research is. What’s it all about?
I want to give my usual answer: we’re calculating scattering amplitudes! We’re trying to compute them more efficiently, taking advantage of simplifications and using a big toolbox of different approaches, and…
Usually by this point in the conversation, it’s clear that this isn’t what they were asking.
When physicists ask me about the goal of amplitudes research, they’ve got a longer view in mind. Maybe they’ve seen a talk by Nima Arkani-Hamed, declaring that spacetime is doomed. Maybe they’ve seen papers arguing that everything we know about quantum field theory can be derived from a few simple rules. Maybe they’ve heard slogans, like “on-shell good, off-shell bad”. Maybe they’ve heard about the conjecture that N=8 supergravity is finite, or maybe they’ve just heard someone praise the field as “demoting the sacred cows like fields, Lagrangians, and gauge symmetry”.
Often, they’ve heard a little bit of all of these. Sometimes they’re excited, sometimes they’re skeptical, but either way, they’re usually more than a little confused. They’re asking how all of these statements fit into a larger story.
The glib answer is that they don’t. Amplitudes has always been a grab-bag of methods: different people with different backgrounds, united by their interest in a particular kind of calculation.
With that said, I think there is a shared philosophy, even if each of us approaches it a little differently. There is an overall principle that unites the amplituhedron and color-kinematics duality, the CHY string and bootstrap methods, BCFW and generalized unitarity.
If I had to describe that principle in one word, I’d call it minimality. Quantum field theory involves hugely complicated mathematical machinery: Lagrangians and path integrals, Feynman diagrams and gauge fixing. At the end of the day, if you want to answer a concrete question, you’re computing a few specific kinds of things: mostly, scattering amplitudes and correlation functions. Amplitudes tries to start from the other end, and ask what outputs of this process are allowed. The idea is to search for something minimal: a few principles that, when applied to a final answer in a particular form, specify it uniquely. The form in question varies: it can be a geometric picture like the amplituhedron, or a string-like worldsheet, or a constructive approach built up from three-particle amplitudes. The goal, in each case, is the same: to skip the usual machinery, and understand the allowed form for the answer.
From this principle, where do the slogans come from? How could minimality replace spacetime, or solve quantum gravity?
It can’t…if we stick to only matching quantum field theory. As long as each calculation matches one someone else could do with known theories, even if we’re more efficient, these minimal descriptions won’t really solve these kinds of big-picture mysteries.
The hope (and for the most part, it’s a long-term hope) is that we can go beyond that. By exploring minimal descriptions, we hope to find not only known theories, but unknown ones as well: theories that weren’t expected in the old understanding of quantum field theory. The amplituhedron doesn’t need space-time, so it might lead the way to a theory that doesn’t have space-time. If N=8 supergravity is finite, it could suggest new theories that are finite. The story repeats, with variations, whenever amplitudeologists explore the outlook of our field. If we know the minimal requirements for an amplitude, we could find amplitudes that nobody expected.
I’m not claiming we’re the only field like this: I feel like the conformal bootstrap could tell a similar story. And I’m not saying everyone thinks about our field this way: there’s a lot of deep mathematics in just calculating amplitudes, and it fascinated people long before the field caught on with the Princeton set.
But if you’re asking what the story is for amplitudes, the weird buzz you catch bits and pieces of and can’t quite put together…well, if there’s any unifying story, I think it’s this one.
We now leave the topic of Riemann surfaces, and turn to the (loosely related) topic of conformal mapping (and quasiconformal mapping). Recall that a conformal map $\phi: U \to V$ from an open subset $U$ of the complex plane to another open set $V$ is a map that is holomorphic and bijective, which (by Rouché’s theorem) also forces the derivative of $\phi$ to be nowhere vanishing. We then say that the two open sets $U, V$ are conformally equivalent. From the Cauchy-Riemann equations we see that conformal maps are orientation-preserving and angle-preserving; from the Newton approximation $\phi(z_0 + \Delta z) \approx \phi(z_0) + \phi'(z_0) \Delta z$ we see that they almost preserve small circles, indeed for small $r$ the circle $\{ z: |z - z_0| = r \}$ will approximately map to $\{ w: |w - \phi(z_0)| = |\phi'(z_0)| r \}$.
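The near-preservation of small circles is easy to verify numerically. Here is a short Python check (my own illustration, not from the notes; the map $\phi(z) = z^2$ is just a convenient example, conformal away from the origin):

```python
import numpy as np

# Check numerically that a conformal map sends a small circle around z0
# to (approximately) a circle of radius |phi'(z0)| * r around phi(z0).
phi = lambda z: z ** 2        # example conformal map (away from z = 0)
dphi = lambda z: 2 * z        # its derivative

z0, r = 1 + 1j, 1e-3
thetas = np.linspace(0, 2 * np.pi, 400, endpoint=False)
image = phi(z0 + r * np.exp(1j * thetas))

# Distances from phi(z0), normalised by the predicted radius |phi'(z0)| r.
dist = np.abs(image - phi(z0))
scale = abs(dphi(z0)) * r
print(dist.min() / scale, dist.max() / scale)  # both close to 1
```

The two ratios differ from 1 by an amount of order $r$, reflecting the quadratic error term in the Newton approximation.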
In previous quarters, we proved a fundamental theorem about this concept, the Riemann mapping theorem:
Theorem 1 (Riemann mapping theorem) Let $U$ be a simply connected open subset of ${\bf C}$ that is not all of ${\bf C}$. Then $U$ is conformally equivalent to the unit disk $D(0,1)$.
This theorem was proven in these 246A lecture notes, using an argument of Koebe. At a very high level, one can sketch Koebe’s proof of the Riemann mapping theorem as follows: among all the injective holomorphic maps $\phi: U \to D(0,1)$ that map some fixed point $z_0 \in U$ to $0$, pick one that maximises the magnitude $|\phi'(z_0)|$ of the derivative (ignoring for this discussion the issue of proving that a maximiser exists). If $\phi(U)$ avoids some point in $D(0,1)$, one can compose $\phi$ with various holomorphic maps and use Schwarz’s lemma and the chain rule to increase $|\phi'(z_0)|$ without destroying injectivity; see the previous lecture notes for details. The conformal map $\phi$ is unique up to Möbius automorphisms of the disk; one can fix the map by picking two distinct points $z_0, z_1$ in $U$, and requiring $\phi(z_0)$ to be zero and $\phi(z_1)$ to be positive real.
It is a beautiful observation of Thurston that the concept of a conformal mapping has a discrete counterpart, namely the mapping of one circle packing to another. Furthermore, one can run a version of Koebe’s argument (using now a discrete version of Perron’s method) to prove the Riemann mapping theorem through circle packings. In principle, this leads to a mostly elementary approach to conformal geometry, based on extremely classical mathematics that goes all the way back to Apollonius. However, in order to prove the basic existence and uniqueness theorems of circle packing, as well as the convergence to conformal maps in the continuous limit, it seems to be necessary (or at least highly convenient) to use much more modern machinery, including the theory of quasiconformal mapping, and also the Riemann mapping theorem itself (so in particular we are not structuring these notes to provide a completely independent proof of that theorem, though this may well be possible).
To make the above discussion more precise we need some notation.
Definition 2 (Circle packing) A (finite) circle packing is a finite collection $(C_j)_{j \in J}$ of circles in the complex plane indexed by some finite set $J$, whose interiors are all disjoint (but which are allowed to be tangent to each other), and whose union is connected. The nerve of a circle packing is the finite graph whose vertices are the centres of the circle packing, with two such centres connected by an edge if the corresponding circles are tangent. (In these notes all graphs are undirected, finite and simple, unless otherwise specified.)
It is clear that the nerve of a circle packing is connected and planar, since one can draw the nerve by placing each vertex (tautologically) in its location in the complex plane, and drawing each edge by the line segment between the centres of the circles it connects (this line segment will pass through the point of tangency of the two circles). Later in these notes we will also have to consider some infinite circle packings, most notably the infinite regular hexagonal circle packing.
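As a quick numerical illustration of Definition 2 (not from the text; the function and variable names are my own), one can recover the nerve of a concrete circle packing by declaring two circles adjacent exactly when the distance between their centres equals the sum of their radii:

```python
import itertools
import math

def nerve(circles, tol=1e-9):
    """Compute the nerve of a circle packing.

    `circles` maps a vertex label to a pair (centre, radius), with the
    centre given as a complex number.  Two circles with disjoint interiors
    are externally tangent exactly when the distance between their centres
    equals the sum of their radii (up to floating-point tolerance).
    """
    edges = set()
    for (u, (cu, ru)), (v, (cv, rv)) in itertools.combinations(circles.items(), 2):
        if math.isclose(abs(cu - cv), ru + rv, rel_tol=0, abs_tol=tol):
            edges.add(frozenset((u, v)))
    return edges

# Three mutually tangent unit circles centred at the vertices of an
# equilateral triangle of side 2: the nerve is a triangle.
packing = {
    "a": (0 + 0j, 1.0),
    "b": (2 + 0j, 1.0),
    "c": (1 + math.sqrt(3) * 1j, 1.0),
}
print(nerve(packing))  # three edges: ab, bc, ca
```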
The first basic theorem in the subject is the following converse statement:
Theorem 3 (Circle packing theorem) Every connected planar graph is the nerve of a circle packing.
Of course, there can be multiple circle packings associated to a given connected planar graph; indeed, since reflections across a line and Möbius transformations map circles to circles (or lines), they will map circle packings to circle packings (unless one or more of the circles is sent to a line). It turns out that once one adds enough edges to the planar graph, the circle packing is otherwise rigid:
Theorem 4 (Koebe-Andreev-Thurston theorem) If a connected planar graph is maximal (i.e., no further edge can be added to it without destroying planarity), then the circle packing given by the above theorem is unique up to reflections and Möbius transformations.
Exercise 5 Let $G$ be a connected planar graph with $n \geq 3$ vertices. Show that the following are equivalent:
- (i) $G$ is a maximal planar graph.
- (ii) $G$ has $3n-6$ edges.
- (iii) Every drawing of $G$ divides the plane into faces that have three edges each. (This includes the unbounded face.)
- (iv) At least one drawing of $G$ divides the plane into faces that have three edges each.
(Hint: use Euler’s formula $V - E + F = 2$, where $F$ is the number of faces of a drawing, including the unbounded face.)
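For instance, the implication (iv) $\Rightarrow$ (ii) can be seen by double counting: in a drawing in which every face has three edges, each edge borders exactly two faces, so $3F = 2E$; combining this with Euler's formula gives the edge count:

```latex
V - E + F = 2, \qquad 3F = 2E
\;\Longrightarrow\; V - E + \tfrac{2}{3} E = 2
\;\Longrightarrow\; E = 3V - 6.
```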
Thurston conjectured that circle packings can be used to approximate the conformal map arising in the Riemann mapping theorem. Informally: if one approximates a simply connected domain by finer and finer portions of the regular hexagonal circle packing, and repacks each such configuration into the unit disk with the same nerve, then the induced correspondence between the two packings should converge to the conformal map given by the Riemann mapping theorem.
A rigorous version of this conjecture was proven by Rodin and Sullivan. Besides some elementary geometric lemmas (regarding the relative sizes of various configurations of tangent circles), the main ingredients are a rigidity result for the regular hexagonal circle packing, and the theory of quasiconformal maps. Quasiconformal maps are what seem on the surface to be a very broad generalisation of the notion of a conformal map. Informally, conformal maps take infinitesimal circles to infinitesimal circles, whereas quasiconformal maps take infinitesimal circles to infinitesimal ellipses of bounded eccentricity. In terms of Wirtinger derivatives, conformal maps obey the Cauchy-Riemann equation , while (sufficiently smooth) quasiconformal maps only obey an inequality . As such, quasiconformal maps are considerably more plentiful than conformal maps, and in particular it is possible to create piecewise smooth quasiconformal maps by gluing together various simple maps such as affine maps or Möbius transformations; such piecewise maps will naturally arise when trying to rigorously build the map alluded to in the above conjecture. On the other hand, it turns out that quasiconformal maps still have many vestiges of the rigidity properties enjoyed by conformal maps; for instance, there are quasiconformal analogues of fundamental theorems in conformal mapping such as the Schwarz reflection principle, Liouville’s theorem, or Hurwitz’s theorem. Among other things, these quasiconformal rigidity theorems allow one to create conformal maps from the limit of quasiconformal maps in many circumstances, and this will be how the Thurston conjecture will be proven. A key technical tool in establishing these sorts of rigidity theorems will be the theory of an important quasiconformal (quasi-)invariant, the conformal modulus (or, equivalently, the extremal length, which is the reciprocal of the modulus).
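For the reader's convenience, here are the relevant formulae in the standard normalisation (standard notation, not taken verbatim from this text; $K \geq 1$ denotes the maximal dilatation of a $K$-quasiconformal map):

```latex
% Wirtinger derivatives (standard normalisation):
\partial_z = \tfrac{1}{2}\bigl(\partial_x - i\,\partial_y\bigr), \qquad
\partial_{\bar z} = \tfrac{1}{2}\bigl(\partial_x + i\,\partial_y\bigr).
% Conformal maps obey the Cauchy-Riemann equation
\partial_{\bar z} f = 0,
% while (sufficiently smooth) K-quasiconformal maps only obey the inequality
|\partial_{\bar z} f| \;\leq\; \frac{K-1}{K+1}\, |\partial_z f|.
```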
— 1. Proof of the circle packing theorem —
We loosely follow the treatment of Beardon and Stephenson. It is slightly more convenient to temporarily work in the Riemann sphere rather than the complex plane , in order to more easily use Möbius transformations. (Later we will make another change of venue, working in the Poincaré disk instead of the Riemann sphere.)
Define a Riemann sphere circle to be either a circle in or a line in together with , together with one of the two components of the complement of this circle or line designated as the “interior”. In the case of a line, this “interior” is just one of the two half-planes on either side of the line; in the case of the circle, this is either the usual interior or the usual exterior plus the point at infinity; in the last case, we refer to the Riemann sphere circle as an exterior circle. (One could also equivalently work with an orientation on the circle rather than assigning an interior, since the interior could then be described as the region to (say) the left of the circle as one traverses the circle along the indicated orientation.) Note that Möbius transforms map Riemann sphere circles to Riemann sphere circles. If one views the Riemann sphere as a geometric sphere in Euclidean space , then Riemann sphere circles are just circles on this geometric sphere, which then have a centre on this sphere that lies in the region designated as the interior of the circle. We caution though that this “Riemann sphere” centre does not always correspond to the Euclidean notion of the centre of a circle. For instance, the real line, with the upper half-plane designated as interior, will have as its Riemann sphere centre; if instead one designates the lower half-plane as the interior, the Riemann sphere centre will now be . We can then define a Riemann sphere circle packing in exact analogy with circle packings in , namely finite collections of Riemann sphere circles whose interiors are disjoint and whose union is connected; we also define the nerve as before. This is now a graph that can be drawn in the Riemann sphere, using great circle arcs in the Riemann sphere rather than line segments; it is also planar, since one can apply a Möbius transformation to move all the points and edges of the drawing away from infinity.
By Exercise 5, a maximal planar graph with at least three vertices can be drawn as a triangulation of the Riemann sphere. If there are at least four vertices, then it is easy to see that each vertex has degree at least three (a vertex of degree zero, one or two in a triangulation with simple edges will lead to a connected component of at most three vertices). It is a topological fact, not established here, that any two triangulations of such a graph are homotopic up to reflection (to reverse the orientation). If a Riemann sphere circle packing has as its nerve a maximal planar graph with at least four vertices, then we see that this nerve induces an explicit triangulation of the Riemann sphere by connecting the centres of any pair of tangent circles with the great circle arc that passes through the point of tangency. If the graph is not maximal, one no longer gets a triangulation this way, but one still obtains a partition of the Riemann sphere into spherical polygons.
We remark that the triangles in this triangulation can also be described purely from the abstract graph . Define a triangle in to be a triple of vertices in which are all adjacent to each other, and such that the removal of these three vertices from does not disconnect the graph. One can check that there is a one-to-one correspondence between such triangles in a maximal planar graph and the triangles in any Riemann sphere triangulation of this graph.
Theorems 3, 4 are then a consequence of
Theorem 7 (Riemann sphere circle packing theorem) Let be a maximal planar graph with at least four vertices, drawn as a triangulation of the Riemann sphere. Then there exists a Riemann sphere circle packing with nerve whose triangulation is homotopic to the given triangulation. Furthermore, this packing is unique up to Möbius transformations.
Exercise 8 Deduce Theorems 3, 4 from Theorem 7. (Hint: If one has a non-maximal planar graph for Theorem 3, add a vertex at the interior of each non-triangular face of a drawing of that graph, and connect that vertex to the vertices of the face, to create a maximal planar graph to which Theorem 4 or Theorem 7 can be applied. Then delete these “helper vertices” to create a packing of the original planar graph that does not contain any “unwanted” tangencies. You may use without proof the above assertion that any two triangulations of a maximal planar graph are homotopic up to reflection.)
Exercise 9 Verify Theorem 7 when has exactly four vertices. (Hint: for the uniqueness, one can use Möbius transformations to move two of the circles to become parallel lines.)
To prove this theorem, we will make a reduction with regards to the existence component of Theorem 7. For technical reasons we will need to introduce a notion of non-degeneracy. Let be a maximal planar graph with at least four vertices, and let be a vertex in . As discussed above, the degree of is at least three. Writing the neighbours of in clockwise or counterclockwise order (with respect to a triangulation) as (starting from some arbitrary neighbour), we see that each is adjacent to and (with the conventions and ). We say that is non-degenerate if there are no further adjacencies between the , and if there is at least one further vertex in besides . Here is another characterisation:
Exercise 10 Let be a maximal planar graph with at least four vertices, let be a vertex in , and let be the neighbours of . Show that the following are equivalent:
- (i) is non-degenerate.
- (ii) The graph is connected and non-empty, and every vertex in is adjacent to at least one vertex in .
We will then derive Theorem 7 from Theorem 14 below.
Let us now see how Theorem 7 follows from Theorem 14. Fix as in Theorem 7. By Exercise 9 and induction we may assume that has at least five vertices, and that the claim has been proven for any smaller number of vertices.
First suppose that contains a non-degenerate vertex . Let be the neighbours of . One can then form a new graph with one fewer vertex by deleting , and then connecting to (one can think of this operation as contracting the edge to a point). One can check that this is still a maximal planar graph that can triangulate the Riemann sphere in a fashion compatible with the original triangulation of (in that all the common vertices, edges, and faces are unchanged). By induction hypothesis, is the nerve of a circle packing that is compatible with this triangulation, and hence this circle packing has nerve at least . Applying Theorem 14, we then obtain the required claim for .
Now suppose that contains a degenerate vertex . Let be the neighbours of traversed in order. By hypothesis, there is an additional adjacency between the ; by relabeling we may assume that is adjacent to for some . The vertices in can then be partitioned as
where denotes those vertices in that lie in the region enclosed by the loop that does not contain , and denotes those vertices in that lie in the region enclosed by the loop that does not contain . One can then form two graphs , formed by restricting to the vertices and respectively; furthermore, these graphs are also maximal planar (with triangulations that are compatible with those of ). By induction hypothesis, we can find a circle packing with nerve , and a circle packing with nerve . Note that the circles are mutually tangent, as are . By applying a Möbius transformation one may assume that these circles agree, thus (cf. Exercise 9) , . The complement of these three circles (and their interiors) determines two connected “interstitial” regions (that are in the shape of an arbelos, up to Möbius transformation); one can check that the remaining circles in will lie in one of these regions, and the remaining circles in lie in the other. Hence one can glue these circle packings together to form a single circle packing with nerve , which is homotopic to the given triangulation. Also, since a Möbius transformation that fixes three mutually tangent circles has to be the identity, the uniqueness of this circle packing up to Möbius transformations follows from the uniqueness for the two component circle packings , .
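The normalisation step above relies on the fact that a Möbius transformation is uniquely determined by the images of three points. A minimal numerical sketch of this fact (helper names are mine), computing the transformation via cross-ratios represented as $2 \times 2$ matrices:

```python
import numpy as np

def mobius_through(z_pts, w_pts):
    """Return (as a 2x2 complex matrix, acting projectively) the unique
    Möbius transformation sending the three points z_pts = (z1, z2, z3)
    to w_pts = (w1, w2, w3)."""
    def to_zero_one_inf(p1, p2, p3):
        # Matrix of the cross-ratio map z -> ((z - p1)(p2 - p3)) / ((z - p3)(p2 - p1)),
        # which sends p1, p2, p3 to 0, 1, infinity respectively.
        return np.array([[p2 - p3, -p1 * (p2 - p3)],
                         [p2 - p1, -p3 * (p2 - p1)]], dtype=complex)
    Sz = to_zero_one_inf(*z_pts)
    Sw = to_zero_one_inf(*w_pts)
    return np.linalg.inv(Sw) @ Sz   # composition S_w^{-1} o S_z

def apply_mobius(M, z):
    (a, b), (c, d) = M
    return (a * z + b) / (c * z + d)

M = mobius_through((0, 1, 2), (1j, -1, 3))
for z, w in zip((0, 1, 2), (1j, -1, 3)):
    print(abs(apply_mobius(M, z) - w) < 1e-9)  # True for all three points
```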
It remains to prove Theorem 14. To help fix the freedom to apply Möbius transformations, we can normalise the target circle packing so that is the exterior circle , thus all the other circles in the packing will lie in the closed unit disk . Similarly, by applying a suitable Möbius transformation one can assume that lies outside of the interior of all the circles in the original packing, and after a scaling one may then assume that all the circles lie in the unit disk .
At this point it becomes convenient to switch from the “elliptic” conformal geometry of the Riemann sphere to the “hyperbolic” conformal geometry of the unit disk . Recall that the Möbius transformations that preserve the disk are given by the maps

$\displaystyle z \mapsto e^{i\theta} \frac{z - a}{1 - \bar{a} z} \ \ \ \ \ (1)$

for real $\theta$ and $a \in D(0,1)$ (see Theorem 19 of these notes). The disk comes with a natural metric that interacts well with circles:
Exercise 12 Define the Poincaré distance between two points of by the formula
Given a measurable subset of , define the hyperbolic area of to be the quantity
where is the Euclidean area element on .
- (i) Show that the Poincaré distance is invariant with respect to Möbius automorphisms of , thus whenever is a transformation of the form (1). Similarly show that the hyperbolic area is invariant with respect to such transformations.
- (ii) Show that the Poincaré distance defines a metric on . Furthermore, show that any two distinct points are connected by a unique geodesic, which is a portion of either a line or a circle that meets the unit circle orthogonally at two points. (Hint: use the symmetries of (i) to normalise the points one is studying.)
- (iii) If is a circle in the interior of , show that there exists a point in and a positive real number (which we call the hyperbolic center and hyperbolic radius respectively) such that . (In general, the hyperbolic center and radius will not quite agree with their familiar Euclidean counterparts.) Conversely, show that for any and , the set is a circle in .
- (iv) If two circles in are externally tangent, show that the geodesic connecting the hyperbolic centers passes through the point of tangency, orthogonally to the two tangent circles.
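The invariance in part (i) can be checked numerically. The code below (not from the text) uses one standard closed form for the Poincaré distance in the curvature $-1$ normalisation, namely $d(z,w) = 2\,\mathrm{artanh}\left|\frac{z-w}{1-\bar{w}z}\right|$, which I am assuming matches the formula given in Exercise 12:

```python
import cmath
import math

def poincare_dist(z, w):
    # One standard formula for the Poincaré distance on the unit disk,
    # in the curvature -1 normalisation (an assumption on the text's
    # conventions): d(z, w) = 2 artanh |(z - w) / (1 - conj(w) z)|.
    return 2 * math.atanh(abs((z - w) / (1 - w.conjugate() * z)))

def disk_automorphism(a, theta):
    # Möbius automorphism of the disk: z -> e^{i theta} (z - a) / (1 - conj(a) z),
    # with |a| < 1; this is a transformation of the form (1).
    return lambda z: cmath.exp(1j * theta) * (z - a) / (1 - a.conjugate() * z)

z, w = 0.3 + 0.1j, -0.2 + 0.5j
f = disk_automorphism(0.4 - 0.2j, 1.7)
print(abs(poincare_dist(f(z), f(w)) - poincare_dist(z, w)) < 1e-12)  # True
```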
Exercise 13 (Schwarz-Pick theorem) Let be a holomorphic map. Show that for all . If , show that equality occurs if and only if is a Möbius automorphism (1) of . (This result is known as the Schwarz-Pick theorem.)
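Here is a quick (non-rigorous) numerical sanity check of the Schwarz-Pick inequality for a sample holomorphic self-map of the disk, again using the curvature $-1$ distance formula (an assumed normalisation, not taken from the text): any holomorphic self-map of the disk is a contraction of the Poincaré metric.

```python
import math
import random

def poincare_dist(z, w):
    # Poincaré distance in the curvature -1 normalisation (assumed).
    return 2 * math.atanh(abs((z - w) / (1 - w.conjugate() * z)))

random.seed(0)
f = lambda z: z * z   # a holomorphic self-map of the disk, not an automorphism
ok = True
for _ in range(1000):
    z = complex(random.uniform(-0.7, 0.7), random.uniform(-0.7, 0.7))
    w = complex(random.uniform(-0.7, 0.7), random.uniform(-0.7, 0.7))
    # Schwarz-Pick: distances can only shrink (strictly, unless f is in (1)).
    ok &= poincare_dist(f(z), f(w)) <= poincare_dist(z, w) + 1e-12
print(ok)  # True
```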
We will refer to circles that lie in the closure of the unit disk as hyperbolic circles. These can be divided into the finite radius hyperbolic circles, which lie in the interior of the unit disk (as per part (iii) of the above exercise), and the horocycles, which are internally tangent to the unit circle. By convention, we view horocycles as having infinite radius, and having center at their point of tangency to the unit circle; they can be viewed as the limiting case of finite radius hyperbolic circles when the radius goes to infinity and the center goes off to the boundary of the disk (at the same rate as the radius, as measured with respect to the Poincaré distance). We write for the hyperbolic circle with hyperbolic centre and hyperbolic radius (thus either and , or and is on the unit circle); there is an annoying caveat that when there is more than one horocycle with hyperbolic centre , but we will tolerate this breakdown of functional dependence of on and in order to simplify the notation. A hyperbolic circle packing is a circle packing in which all circles are hyperbolic circles.
We also observe that the geodesic structure extends to the boundary of the unit disk: for any two distinct points in , there is a unique geodesic that connects them.
In view of the above discussion, the remaining inductive step may now be formulated as follows:
Theorem 14 (Inductive step, hyperbolic formulation) Let be a maximal planar graph with at least four vertices , let be a non-degenerate vertex of , and let be the vertices adjacent to . Suppose that there exists a hyperbolic circle packing whose nerve is at least . Then there is a hyperbolic circle packing homotopic to such that the boundary circles , are all horocycles. Furthermore, this packing is unique up to Möbius automorphisms (1) of the disk .
Indeed, once one adjoins the exterior unit circle to , one obtains a Riemann sphere circle packing whose nerve is at least , and hence equal to since is maximal.
To prove this theorem, the intuition is to “inflate” the hyperbolic radius of the circles of until the boundary circles all become infinite radius (i.e., horocycles). The difficulty is that one cannot just arbitrarily increase the radius of any given circle without destroying the required tangency properties. The resolution to this difficulty given in the work of Beardon and Stephenson that we are following here was inspired by Perron’s method of subharmonic functions, in which one faced an analogous difficulty that one could not easily manipulate a harmonic function without destroying its harmonicity. There, the solution was to work instead with the more flexible class of subharmonic functions; here we similarly work with the concept of a subpacking.
We will need some preliminaries to define this concept precisely. We first need some hyperbolic trigonometry. We define a hyperbolic triangle to be the solid (and closed) region in enclosed by three distinct points in and the geodesic arcs connecting them. (Note that we allow one or more of the vertices to be on the boundary of the disk, so that the sides of the triangle could have infinite length.) Let be the space of triples with and not all of infinite. We say that a hyperbolic triangle with vertices is a -triangle if there are hyperbolic circles with the indicated hyperbolic centres and hyperbolic radii that are externally tangent to each other; note that this implies that the sidelengths opposite have length respectively (see Figure 3 of Beardon and Stephenson). It is easy to see that for any , there exists a unique -triangle in up to reflections and Möbius automorphisms (use Möbius transforms to fix two of the hyperbolic circles, and consider all the circles externally tangent to both of these circles; the case when one or two of the are infinite may need to be treated separately). As a consequence, there is a well-defined angle for subtended by the vertex of an -triangle. We need some basic facts from hyperbolic geometry:
Exercise 15 (Hyperbolic trigonometry)
- (i) (Hyperbolic cosine rule) For any , show that the quantity is equal to the ratio
Furthermore, establish the limiting angles
(Hint: to facilitate computations, use a Möbius transform to move the vertex to the origin when the radius there is finite.) Conclude in particular that is continuous (using the topology of the extended real line for each component of ). Discuss how this rule relates to the Euclidean cosine rule in the limit as go to zero. Of course, by relabeling one obtains similar formulae for and .
- (ii) (Area rule) Show that the area of a hyperbolic triangle is given by $\pi - \alpha - \beta - \gamma$, where $\alpha, \beta, \gamma$ are the angles of the hyperbolic triangle. (Hint: there are several ways to proceed. For instance, one can prove this for small hyperbolic triangles up to small errors after normalising as in (i), and then establish the general case by subdividing a large hyperbolic triangle into many small hyperbolic triangles. This rule is also a special case of the Gauss-Bonnet theorem in Riemannian geometry. One can also first establish the case when several of the radii are infinite, and use that to derive the finite cases.) In particular, the area of a -triangle is $\pi$ minus the sum of its three angles.
- (iii) Show that the area of the interior of a hyperbolic circle of hyperbolic radius $r < \infty$ is equal to $4\pi \sinh^2(r/2)$.
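Part (iii) can be tested numerically. The code below (not from the text) assumes the curvature $-1$ normalisation, under which the hyperbolic area element is $\frac{4}{(1-|z|^2)^2}$ times Lebesgue measure, a hyperbolic circle of hyperbolic radius $r$ centred at the origin has Euclidean radius $\tanh(r/2)$, and the area in question is $4\pi\sinh^2(r/2)$:

```python
import math

def hyperbolic_disk_area(r, n=200000):
    # Numerically integrate the hyperbolic area element 4/(1-|z|^2)^2 dA
    # over the hyperbolic circle of hyperbolic radius r centred at 0,
    # whose Euclidean radius is tanh(r/2) (curvature -1 normalisation).
    rho = math.tanh(r / 2)
    total = 0.0
    for k in range(n):
        s = (k + 0.5) * rho / n          # midpoint rule in the radial variable
        total += 4 / (1 - s * s) ** 2 * 2 * math.pi * s * (rho / n)
    return total

r = 1.5
print(hyperbolic_disk_area(r), 4 * math.pi * math.sinh(r / 2) ** 2)
# the two printed values agree to several decimal places
```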
Henceforth we fix as in Theorem 14. We refer to the vertices as boundary vertices of and the remaining vertices as interior vertices; edges between boundary vertices are boundary edges, all other edges will be called interior edges (including edges that have one vertex on the boundary). Triangles in that involve two boundary vertices (and thus necessarily one interior vertex) will be called boundary triangles; all other triangles (including ones that involve one boundary vertex) will be called interior triangles. To any triangle of , we can form the hyperbolic triangle with vertices ; this is an -triangle. Let denote the collection of such hyperbolic triangles; because is a packing, we see that these triangles have disjoint interiors. They also fit together in the following way: if is a side of a hyperbolic triangle in , then there will be another hyperbolic triangle in that shares that side precisely when is associated to an interior edge of . The union of all these triangles is homeomorphic to the region formed by starting with a triangulation of the Riemann sphere by and removing the triangles containing as a vertex, and is therefore homeomorphic to a disk. One can think of the collection of hyperbolic triangles, together with the vertices and edges shared by these triangles, as a two-dimensional (hyperbolic) simplicial complex, though we will not develop the full machinery of such complexes here.
Our objective is to find another hyperbolic circle packing homotopic to the existing circle packing , such that all the boundary circles (circles centred at boundary vertices) are horocycles. We observe that such a hyperbolic circle packing is completely described (up to Möbius transformations) by the hyperbolic radii of these circles. Indeed, suppose one knows the values of these hyperbolic radii. Then each hyperbolic triangle in is associated to a hyperbolic triangle whose sides and angles are known from Exercise 15. As the orientation of each hyperbolic triangle is fixed, each hyperbolic triangle is determined up to a Möbius automorphism of . Once one fixes one hyperbolic triangle, the adjacent hyperbolic triangles (that share a common side with the first triangle) are then also fixed; continuing in this fashion we see that the entire hyperbolic circle packing is determined.
On the other hand, not every choice of radii will lead to a hyperbolic circle packing with the required properties. There are two obvious constraints that need to be satisfied:
- (i) (Local condition) At each interior vertex, the angles subtended by the hyperbolic triangles meeting at that vertex must sum to exactly $2\pi$.
- (ii) (Boundary condition) The radii assigned to the boundary vertices must be infinite, so that the boundary circles are horocycles.
There could potentially also be a global constraint, in that one requires the circles of the packing to be disjoint – including circles that are not necessarily adjacent to each other. In general, one can easily create configurations of circles that are local circle packings but not global ones (see e.g., Figure 7 of Beardon-Stephenson). However, it turns out that one can use the boundary constraint and topological arguments to prevent this from happening. We first need a topological lemma:
Lemma 16 (Topological lemma) Let be bounded connected open subsets of with simply connected, and let be a continuous map such that and . Suppose furthermore that the restriction of to is a local homeomorphism. Then is in fact a global homeomorphism.
The requirement that the restriction be a local homeomorphism can in fact be relaxed to local injectivity, thanks to the invariance of domain theorem. The complex numbers can be replaced here by any finite-dimensional vector space.
Proof: The preimage of any point in the interior of is closed, discrete, and disjoint from , and is hence finite. Around each point in the preimage, there is a neighbourhood on which is a homeomorphism onto a neighbourhood of . If one deletes the closure of these neighbourhoods, the image under is compact and avoids , and thus avoids a neighbourhood of . From this we can show that is a covering map from to . As the base is simply connected, it is its own universal cover, and hence (by the connectedness of ) must be a homeomorphism as claimed.
Proposition 17 Suppose we assign a radius to each that obeys the local constraint (i) and the boundary constraint (ii). Then there is a hyperbolic circle packing with nerve and the indicated radii.
Proof: We first create the hyperbolic triangles associated with the required hyperbolic circle packing, and then verify that this indeed arises from a circle packing.
Start with a single triangle in , and arbitrarily select a -triangle with the same orientation as . By Exercise 15(i), such a triangle exists (and is unique up to Möbius automorphisms of the disk). If a hyperbolic triangle has been fixed, and (say) is an adjacent triangle in , we can select to be the unique -triangle with the same orientation as that shares the side in common with (with the and vertices agreeing). Similarly for other permutations of the labels. As is a maximal planar graph with non-degenerate (so in particular the set of internal vertices is connected), we can continue this construction to eventually fix every triangle in . There is the potential issue that a given triangle may depend on the order in which one arrives at that triangle starting from , but one can check from a monodromy argument (in the spirit of the monodromy theorem) using the local constraint (i) and the simply connected nature of the triangulation associated to that there is in fact no dependence on the order. (The process resembles that of laying down jigsaw pieces in the shape of hyperbolic triangles together, with the local constraint ensuring that there is always a flush fit locally.)
Now we show that the hyperbolic triangles have disjoint interiors inside the disk . Let denote the topological space formed by taking the disjoint union of the hyperbolic triangles (now viewed as abstract topological spaces rather than subsets of the disk) and then gluing together all common edges, e.g. identifying the edge of with the same edge of if and are adjacent triangles in . This space is homeomorphic to the union of the original hyperbolic triangles , and is thus homeomorphic to the closed unit disk. There is an obvious projection map from to the union of the , which maps the abstract copy in of a given hyperbolic triangle to its concrete counterpart in in the obvious fashion. This map is continuous. It does not quite cover the full closed disk, mainly because (by the boundary condition (ii)) the boundary hyperbolic triangles touch the boundary of the disk at the vertices associated to and but do not follow the boundary arc connecting these vertices, being bounded instead by the geodesic from the vertex to the vertex; the missing region is a lens-shaped region bounded by two circular arcs. However, by applying another homeomorphism (that does not alter the edges from to or to ), one can “push out” the edge of this hyperbolic triangle across the lens to become the boundary arc from to . If one performs this modification for each boundary triangle, one arrives at a modified continuous map from to , which now has the property that the boundary of maps to the boundary of the disk, and the interior of maps to the interior of the disk. Also one can check that this map is a local homeomorphism. By Lemma 16, is injective; undoing the boundary modifications we conclude that is injective. Thus the hyperbolic triangles have disjoint interiors. 
Furthermore, the arguments show that for each boundary triangle , the lens-shaped regions between the boundary arc between the vertices associated to and the corresponding edge of the boundary triangle are also disjoint from the hyperbolic triangles and from each other. On the other hand, all of the hyperbolic circles and in and their interiors are contained in the union of the hyperbolic triangles and the lens-shaped regions, with each hyperbolic triangle containing portions only of the hyperbolic circles with hyperbolic centres at the vertices of the triangle, and similarly for the lens-shaped regions. From this one can verify that the interiors of the hyperbolic circles are all disjoint from each other, and give a hyperbolic circle packing with the required properties.
In view of the above proposition, the only remaining task is to find an assignment of radii obeying both the local condition (i) and the boundary condition (ii). This is analogous to finding a harmonic function with specified boundary data. To do this, we perform the following analogue of Perron’s method. Define a subpacking to be an assignment of radii obeying the following relaxation of the local condition: at each interior vertex, the angles subtended by the hyperbolic triangles meeting at that vertex must sum to at least $2\pi$.
This can be compared with the definition of a (smooth) subharmonic function as one where the Laplacian is always at least zero. Note that we always have at least one subpacking, namely the one provided by the radii of the original hyperbolic circle packing . Intuitively, in each subpacking, the radius at an interior vertex is either “too small” or “just right”.
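To build intuition for this "too small or just right" dichotomy, here is a toy numerical analogue (not from the text, and in Euclidean rather than hyperbolic geometry; all function names are mine). For a single "flower" of a central circle surrounded by mutually tangent petals of given radii, one can solve for the central radius making the angle sum exactly $2\pi$ by bisection, using the fact that the angle sum is strictly decreasing in the central radius (cf. Exercise 18(i)):

```python
import math

def angle(rv, ru, rw):
    # Euclidean angle at the centre of the circle of radius rv in a triple
    # of mutually externally tangent circles: apply the law of cosines to
    # the triangle of centres, whose side lengths are sums of radii.
    a, b, c = rv + ru, rv + rw, ru + rw
    return math.acos((a * a + b * b - c * c) / (2 * a * b))

def angle_sum(rv, petals):
    # petals: radii of the neighbours, listed in cyclic order around the centre
    k = len(petals)
    return sum(angle(rv, petals[i], petals[(i + 1) % k]) for i in range(k))

def solve_flower(petals, iters=100):
    # Bisection (in log scale): the angle sum is strictly decreasing in the
    # central radius, mirroring the monotonicity used in the Perron-style
    # argument; an angle sum exceeding 2*pi means the radius is "too small".
    lo, hi = 1e-9, 1e9
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if angle_sum(mid, petals) > 2 * math.pi:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Six unit petals: the hexagonal flower, whose central radius is 1.
print(solve_flower([1.0] * 6))  # approximately 1.0
```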
We now need a key monotonicity property, analogous to how the maximum of two subharmonic functions is again subharmonic:
Exercise 18 (Monotonicity)
- (i) Show that the angle (as defined in Exercise 15(i)) is strictly decreasing in and strictly increasing in or (if one holds the other two radii fixed). Do these claims agree with your geometric intuition?
- (ii) Conclude that whenever two assignments of radii are subpackings, their pointwise maximum is also a subpacking.
- (iii) Let be such that for . Show that , with equality if and only if for all . (Hint: increase just one of the radii . One can either use calculus (after first disposing of various infinite radii cases) or one can argue geometrically.)
As with Perron’s method, we can now try to construct a hyperbolic circle packing by taking the supremum of all the subpackings. To avoid degeneracies we need an upper bound:
Proposition 19 (Upper bound) Let be a subpacking. Then for any interior vertex of degree , one has .
The precise value of is not so important for our arguments, but the fact that it is finite will be. This boundedness of interior circles in a circle packing is a key feature of hyperbolic geometry that is not present in Euclidean geometry, and is one of the reasons why we moved to a hyperbolic perspective in the first place.
Proof: By the subpacking property and pigeonhole principle, there is a triangle in such that . The hyperbolic triangle associated to has area at most by (2); on the other hand, it contains a sector of a hyperbolic circle of radius and angle , and hence has area at least , thanks to Exercise 15(iv). Comparing the two bounds gives the claim.
Now define to be the (pointwise) supremum of all the subpackings. By the above proposition, is finite at every interior vertex. By Exercise 18, one can view as a monotone increasing limit of subpackings, and is thus again a subpacking (due to the continuity properties of as long as at least one of the radii stays bounded); thus is the maximal subpacking. On the other hand, if is finite at some boundary vertex, then by Exercise 18(i) one could replace that radius by a larger quantity without destroying the subpacking property, contradicting the maximality of . Thus all the boundary radii are infinite, that is to say the boundary condition (ii) holds. Finally, if the sum of the angles at an interior vertex is strictly greater than $2\pi$, then by Exercise 18 we could increase the radius at this vertex slightly without destroying the subpacking property at or at any other of the interior vertices, again contradicting the maximality of . Thus obeys the local condition (i), and we have demonstrated existence of the required hyperbolic circle packing.
Finally we establish uniqueness. It suffices to establish that is the unique tuple that obeys the local condition (i) and the boundary condition (ii). Suppose we had another tuple other than that obeyed these two conditions. Then by the maximality of , we have for all . By Exercise 18(iii), this implies that
for any triangle in . Summing over all triangles and using (2), we conclude that
where the inner sum is over the pairs such that forms a triangle in . But by the local condition (i) and the boundary condition (ii), the inner sum on either side is equal to $2\pi$ for an interior vertex and $0$ for a boundary vertex. Thus the two sides agree, which by Exercise 18(iii) implies that for all . This proves Theorem 14 and thus Theorems 7, 3, 4.
— 2. Quasiconformal maps —
In this section we set up some of the foundational theory of quasiconformal maps, which are generalisations of conformal maps that can tolerate some deviations from perfect conformality, while still retaining many of the good properties of conformal maps (such as being preserved under uniform limits), though with the notable caveat that, in contrast to conformal maps, quasiconformal maps need not be smooth. As such, this theory will come in handy when proving convergence of circle packings to the Riemann map. The material here is largely drawn from the text of Lehto and Virtanen.
We first need the following refinement of the Riemann mapping theorem, known as Carathéodory’s theorem:
Theorem 20 (Carathéodory’s theorem) Let be a bounded simply connected domain in whose boundary is a Jordan curve, and let be a conformal map from the unit disk to (as given by the Riemann mapping theorem). Then extends to a continuous homeomorphism from the closed unit disk to the closure of .
The condition that the boundary be a Jordan curve is clearly necessary, since if the boundary is not simple then there are paths in the unit disk that end up at different points of the unit circle but whose images under the conformal map have the same endpoint in the boundary of the domain, which prevents the map from being continuously extended to a homeomorphism.
Proof: We first prove continuous extension to the boundary. It suffices to show that for every point on the unit circle, the diameters of the sets go to zero for some sequence of radii .
First observe from the change of variables formula that the area of is given by , where denotes Lebesgue measure (or the area element). In particular, this integral is finite. Expanding in polar coordinates around