Planet Musings

April 26, 2018

Jordan Ellenberg: The Lovasz number of the plane is about 3.48287

As seen in this comment on Polymath and explicated further in Fernando de Oliveira Filho’s thesis, section 4.4.

I actually spent much of today thinking about this so let me try to explain it in a down-to-earth way, because it involved me thinking about Bessel functions for the first time ever, surely a life event worthy of recording.

So here’s what we’re going to do.  As I mentioned last week, you can express this problem as follows:  suppose you have a map h: R^2 -> V, for some normed vector space V, which is a unit-distance embedding; that is, if |x-x’|_{R^2} = 1, then |h(x)-h(x’)|_V = 1.  (We don’t ask that h is an isometry, only that it preserves the distance-1 set.)

Then let t be the squared radius of the smallest hypersphere in V containing h(R^2).

Then any graph G embeddable in R^2 with all edges of length 1 is sent to a unit-distance graph in V contained in the hypersphere of squared radius t; this turns out to be equivalent to saying the Lovasz number of G (ok, really I mean the Lovasz number of the complement of G) is at most 1/(1-2t).  So we want to show that t is bounded below 1/2, is the point.  Or rather:  we can find a V and a map from R^2 to V to make this the case.

So here’s one!  Let V be the space of L^2 functions on R^2 with the usual inner product.  Choose a square-integrable function F on R^2 — in fact let’s normalize to make F^2 integrate to 1 — and for each a in R^2 we let h(a) be the function F(x-a).

We want the distance between F(x-a) and F(x-b) to be the same for every pair of points at distance 1 from each other; the easiest way to arrange that is to insist that F(x) be a radially symmetric function F(x) = f(|x|); then it’s easy to see that the distance between F(x-a) and F(x-b) in V is a function G(a-b) which depends only on |a-b|.  We write

g(r) = \int_{\mathbf{R}^2} F(x)F(x-r) dx

so that the squared distance between F(x) and F(x-r) is

\int F(x)^2 dx - 2\int F(x)F(x-r) dx + \int F(x-r)^2 dx = 2(1-g(r)).

In particular, if two points in R^2 are at distance 1, the squared distance between their images in V is 2(1-g(1)).  Note also that g(0) is the square integral of F, which is 1.
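Here is a quick numerical sanity check of that identity, as a sketch only. It assumes one particular F, a Gaussian normalized so that F^2 integrates to 1; that choice, the grid, and the function names are mine and purely illustrative. For this F one can work out by hand that g(r) = exp(-|r|^2/2), so g(1) is positive, which makes the Gaussian a poor choice for what follows.

```python
import math

def F(x, y):
    # Gaussian normalized so that the integral of F^2 over R^2 is 1 (illustrative choice)
    return math.sqrt(2.0 / math.pi) * math.exp(-(x * x + y * y))

def grid_integral(func, half_width=6.0, step=0.1):
    # Plain Riemann sum over a square grid; adequate because F decays very fast
    n = int(round(2 * half_width / step))
    total = 0.0
    for i in range(n + 1):
        x = -half_width + i * step
        for j in range(n + 1):
            y = -half_width + j * step
            total += func(x, y)
    return total * step * step

r = 1.0  # shift by the vector (r, 0)
g_r = grid_integral(lambda x, y: F(x, y) * F(x - r, y))
dist_sq = grid_integral(lambda x, y: (F(x, y) - F(x - r, y)) ** 2)

print(g_r, math.exp(-r * r / 2))  # numerical g(r) next to the closed form for this F
print(dist_sq, 2 * (1 - g_r))     # squared distance next to 2(1 - g(r))
```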

What kind of hypersphere encloses all the points F(x-a) in V?  We can just go ahead and take the “center” of our hypersphere to be 0; since |F| = 1, every point in h(R^2) lies in (indeed, lies on) the sphere of radius 1 around the origin.

Hey but remember:  we want to study a unit-distance embedding of R^2 in V.  Right now, h sends unit distances to the squared distance 2(1-g(1)), whatever that is.  We can fix that by dividing h by the square root of that number.  So now h sends unit distances to unit distances, and its image is enclosed in a hypersphere whose squared radius (the quantity t that enters the Lovasz relation) is

\frac{1}{2(1-g(1))}.

The more negative g(1) is, the smaller this sphere is, which means the more we can “fold” R^2 into a small space.  Remember, the relationship between hypersphere number and Lovasz theta is

2t + 1/\theta = 1

and plugging in the above bound for the hypersphere number, we find that the Lovasz theta number of R^2, and thus the Lovasz theta number of any unit-distance graph in R^2, is at most

1 - \frac{1}{g(1)}.

So the only question is — what is g(1)?

Well, that depends on what g is.

Which depends on what F is.

Which depends on what f is.

And of course we get to choose what f is, in order to make g(1) as negative as possible.

How do we do this?  Well, here’s the trick.  The function G is not arbitrary; if it were, we could make g(1) whatever we wanted.  It’s not hard to see that G is what’s called a positive definite function on R^2.  And moreover, if G is positive definite, there exists some f giving rise to it.  (Roughly speaking, this is the fact that a positive definite symmetric matrix has a square root.)  So we ask:  if G is a positive definite (radially symmetric) function on R^2, and g(0) = 1, how small can g(1) be?

And now there’s an old theorem of (Wisconsin’s own!) Isaac Schoenberg which helpfully classifies the positive definite functions on R^2; they are precisely the functions G(x) = g(|x|) where g is a mixture of scalings of the Bessel function J_0:

g(r) = \int_0^\infty J_0(ur) A(u) du

for some everywhere nonnegative A(u).  (Actually it’s more correct to say that A is a distribution and we are integrating J_0(ur) against a non-decreasing measure.)

So g(1) can be no smaller than the minimum value of J_0 on [0,\infty), and in fact can be exactly that small if you let A become narrowly supported around the minimum argument.  This is basically just taking g to be a rescaled version of J_0 which achieves its minimum at 1.  That minimum value is about -0.4, and so the Lovasz theta for any unit-distance subgraph on the plane is bounded above by a number that’s about 1 + 1/0.4 = 3.5.
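If you want more digits than "about 3.5", here is a rough sketch that finds the minimum of J_0 numerically and plugs it into 1 - 1/g(1). It assumes only the standard integral representation J_0(x) = (1/\pi) \int_0^\pi \cos(x \sin t) dt; the function names, the search interval [2.5, 5.5] (which brackets the first and deepest dip of J_0) and the golden-section search are my own choices, not anything from the thesis.

```python
import math

def bessel_j0(x, n=200):
    # J_0(x) = (1/pi) * integral over [0, pi] of cos(x sin t) dt, via the midpoint rule
    h = math.pi / n
    return sum(math.cos(x * math.sin((k + 0.5) * h)) for k in range(n)) / n

def argmin_golden(f, lo, hi, tol=1e-10):
    # Golden-section search; assumes f has a single minimum on [lo, hi]
    phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - phi * (b - a)
    d = a + phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b = d
        else:
            a = c
        c = b - phi * (b - a)
        d = a + phi * (b - a)
    return (a + b) / 2

u_min = argmin_golden(bessel_j0, 2.5, 5.5)  # where J_0 is smallest
j_min = bessel_j0(u_min)                    # the most negative g(1) available
theta_bound = 1 - 1 / j_min                 # upper bound for the Lovasz theta number

print(u_min, j_min, theta_bound)
```

The rescaling g(r) = J_0(u_min r) is the "rescaled version of J_0 which achieves its minimum at 1" from the paragraph above.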

To sum up:  I give you a set of points in the plane, I connect every pair that’s at distance 1, and I ask how you can embed that graph in a small hypersphere keeping all the distances 1.  And you say:  “Oh, I know what to do, just assign to each point a the radially symmetrized Bessel function J_0(|x-a|) on R^2, the embedding of your graph in the finite-dimensional space of functions spanned by those Bessel translates will do the trick!”

That is cool!

Remark: Oliveira’s thesis does this for Euclidean space of every dimension (it gets more complicated).  And I think (using analysis I haven’t really tried to understand) he doesn’t just give an upper bound for the Lovasz number of the plane as I do in this post, he really computes that number on the nose.

Update:  DeCorte, Oliveira, and Vallentin just posted a relevant paper on the arXiv this morning!

April 25, 2018

Terence Tao: 246C notes 2: Circle packings, conformal maps, and quasiconformal maps

We now leave the topic of Riemann surfaces, and turn now to the (loosely related) topic of conformal mapping (and quasiconformal mapping). Recall that a conformal map {f: U \rightarrow V} from an open subset {U} of the complex plane to another open set {V} is a map that is holomorphic and bijective, which (by Rouché’s theorem) also forces the derivative of {f} to be nowhere vanishing. We then say that the two open sets {U,V} are conformally equivalent. From the Cauchy-Riemann equations we see that conformal maps are orientation-preserving and angle-preserving; from the Newton approximation {f( z_0 + \Delta z) \approx f(z_0) + f'(z_0) \Delta z + O( |\Delta z|^2)} we see that they almost preserve small circles, indeed for {\varepsilon} small the circle {\{ z: |z-z_0| = \varepsilon\}} will approximately map to {\{ w: |w - f(z_0)| = |f'(z_0)| \varepsilon \}}.
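To illustrate the last claim numerically, here is a minimal sketch, assuming the sample map {f(z) = z^2}, the base point {z_0 = 1+i} and the radius {\varepsilon = 10^{-3}} (all chosen purely for illustration). It measures how far the image of the small circle deviates from the predicted circle of radius {|f'(z_0)| \varepsilon} around {f(z_0)}.

```python
import cmath
import math

def f(z):
    # Sample holomorphic map (an illustrative choice)
    return z * z

def f_prime(z):
    return 2 * z

z0 = 1 + 1j
eps = 1e-3
predicted_radius = abs(f_prime(z0)) * eps

# Ratio of actual distance from f(z0) to the predicted radius, sampled around the circle
ratios = []
for k in range(360):
    dz = eps * cmath.exp(1j * math.radians(k))
    ratios.append(abs(f(z0 + dz) - f(z0)) / predicted_radius)

print(min(ratios), max(ratios))  # both should be within O(eps) of 1
```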

In previous quarters, we proved a fundamental theorem about this concept, the Riemann mapping theorem:

Theorem 1 (Riemann mapping theorem) Let {U} be a simply connected open subset of {{\bf C}} that is not all of {{\bf C}}. Then {U} is conformally equivalent to the unit disk {D(0,1)}.

This theorem was proven in these 246A lecture notes, using an argument of Koebe. At a very high level, one can sketch Koebe’s proof of the Riemann mapping theorem as follows: among all the injective holomorphic maps {f: U \rightarrow D(0,1)} from {U} to {D(0,1)} that map some fixed point {z_0 \in U} to {0}, pick one that maximises the magnitude {|f'(z_0)|} of the derivative (ignoring for this discussion the issue of proving that a maximiser exists). If {f(U)} avoids some point in {D(0,1)}, one can compose {f} with various holomorphic maps and use Schwarz’s lemma and the chain rule to increase {|f'(z_0)|} without destroying injectivity; see the previous lecture notes for details. The conformal map {\phi: U \rightarrow D(0,1)} is unique up to Möbius automorphisms of the disk; one can fix the map by picking two distinct points {z_0,z_1} in {U}, and requiring {\phi(z_0)} to be zero and {\phi(z_1)} to be positive real.

It is a beautiful observation of Thurston that the concept of a conformal mapping has a discrete counterpart, namely the mapping of one circle packing to another. Furthermore, one can run a version of Koebe’s argument (using now a discrete version of Perron’s method) to prove the Riemann mapping theorem through circle packings. In principle, this leads to a mostly elementary approach to conformal geometry, based on extremely classical mathematics that goes all the way back to Apollonius. However, in order to prove the basic existence and uniqueness theorems of circle packing, as well as the convergence to conformal maps in the continuous limit, it seems to be necessary (or at least highly convenient) to use much more modern machinery, including the theory of quasiconformal mapping, and also the Riemann mapping theorem itself (so in particular we are not structuring these notes to provide a completely independent proof of that theorem, though this may well be possible).

To make the above discussion more precise we need some notation.

Definition 2 (Circle packing) A (finite) circle packing is a finite collection {(C_j)_{j \in J}} of circles {C_j = \{ z \in {\bf C}: |z-z_j| = r_j\}} in the complex numbers indexed by some finite set {J}, whose interiors are all disjoint (but which are allowed to be tangent to each other), and whose union is connected. The nerve of a circle packing is the finite graph whose vertices {\{z_j: j \in J \}} are the centres of the circle packing, with two such centres connected by an edge if the circles are tangent. (In these notes all graphs are undirected, finite and simple, unless otherwise specified.)

It is clear that the nerve of a circle packing is connected and planar, since one can draw the nerve by placing each vertex (tautologically) in its location in the complex plane, and drawing each edge by the line segment between the centres of the circles it connects (this line segment will pass through the point of tangency of the two circles). Later in these notes we will also have to consider some infinite circle packings, most notably the infinite regular hexagonal circle packing.
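As a concrete illustration of Definition 2, the following sketch computes the nerve of a small explicit packing. The data layout (a dictionary from labels to centre and radius), the tolerance and the function names are assumptions made for this example only.

```python
import math
from itertools import combinations

def nerve(circles, tol=1e-9):
    """circles: dict label -> (centre as complex, radius). Returns the set of tangent pairs."""
    edges = set()
    for (a, (za, ra)), (b, (zb, rb)) in combinations(circles.items(), 2):
        gap = abs(za - zb) - (ra + rb)
        if gap < -tol:
            raise ValueError(f"interiors of {a} and {b} overlap")
        if abs(gap) <= tol:
            edges.add(frozenset((a, b)))  # externally tangent circles
    return edges

def is_connected(labels, edges):
    # Depth-first search on the nerve
    labels = list(labels)
    seen, stack = {labels[0]}, [labels[0]]
    while stack:
        u = stack.pop()
        for e in edges:
            if u in e:
                (w,) = e - {u}
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return len(seen) == len(labels)

# Three mutually tangent unit circles, plus a fourth tangent only to "b"
packing = {
    "a": (0 + 0j, 1.0),
    "b": (2 + 0j, 1.0),
    "c": (1 + math.sqrt(3) * 1j, 1.0),
    "d": (4 + 0j, 1.0),
}
edges = nerve(packing)
print(sorted(tuple(sorted(e)) for e in edges), is_connected(packing, edges))
```

Here the nerve is a triangle on a, b, c with a pendant edge from b to d, which is connected and planar as expected.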

The first basic theorem in the subject is the following converse statement:

Theorem 3 (Circle packing theorem) Every connected planar graph is the nerve of a circle packing.

Of course, there can be multiple circle packings associated to a given connected planar graph; indeed, since reflections across a line and Möbius transformations map circles to circles (or lines), they will map circle packings to circle packings (unless one or more of the circles is sent to a line). It turns out that once one adds enough edges to the planar graph, the circle packing is otherwise rigid:

Theorem 4 (Koebe-Andreev-Thurston theorem) If a connected planar graph is maximal (i.e., no further edge can be added to it without destroying planarity), then the circle packing given by the above theorem is unique up to reflections and Möbius transformations.

Exercise 5 Let {G} be a connected planar graph with {n \geq 3} vertices. Show that the following are equivalent:

  • (i) {G} is a maximal planar graph.
  • (ii) {G} has {3n-6} edges.
  • (iii) Every drawing {D} of {G} divides the plane into faces that have three edges each. (This includes one unbounded face.)
  • (iv) At least one drawing {D} of {G} divides the plane into faces that have three edges each.

(Hint: use Euler’s formula {V-E+F=2}, where {F} is the number of faces including the unbounded face.)

Thurston conjectured that circle packings can be used to approximate the conformal map arising in the Riemann mapping theorem. Here is an informal statement:

Conjecture 6 (Informal Thurston conjecture) Let {U} be a simply connected domain, with two distinct points {z_0,z_1}. Let {\phi: U \rightarrow D(0,1)} be the conformal map from {U} to {D(0,1)} that maps {z_0} to the origin and {z_1} to a positive real. For any small {\varepsilon>0}, let {{\mathcal C}_\varepsilon} be the portion of the regular hexagonal circle packing by circles of radius {\varepsilon} that are contained in {U}, and let {{\mathcal C}'_\varepsilon} be a circle packing of {D(0,1)} with the same nerve as {{\mathcal C}_\varepsilon} and with all “boundary circles” tangent to {D(0,1)}, giving rise to an “approximate map” {\phi_\varepsilon: U_\varepsilon \rightarrow D(0,1)} defined on the subset {U_\varepsilon} of {U} consisting of the circles of {{\mathcal C}_\varepsilon}, their interiors, and the interstitial regions between triples of mutually tangent circles. Normalise this map so that {\phi_\varepsilon(z_0)} is zero and {\phi_\varepsilon(z_1)} is a positive real. Then {\phi_\varepsilon} converges to {\phi} as {\varepsilon \rightarrow 0}.

A rigorous version of this conjecture was proven by Rodin and Sullivan. Besides some elementary geometric lemmas (regarding the relative sizes of various configurations of tangent circles), the main ingredients are a rigidity result for the regular hexagonal circle packing, and the theory of quasiconformal maps. Quasiconformal maps are what seem on the surface to be a very broad generalisation of the notion of a conformal map. Informally, conformal maps take infinitesimal circles to infinitesimal circles, whereas quasiconformal maps take infinitesimal circles to infinitesimal ellipses of bounded eccentricity. In terms of Wirtinger derivatives, conformal maps obey the Cauchy-Riemann equation {\frac{\partial \phi}{\partial \overline{z}} = 0}, while (sufficiently smooth) quasiconformal maps only obey an inequality {|\frac{\partial \phi}{\partial \overline{z}}| \leq \frac{K-1}{K+1} |\frac{\partial \phi}{\partial z}|}. As such, quasiconformal maps are considerably more plentiful than conformal maps, and in particular it is possible to create piecewise smooth quasiconformal maps by gluing together various simple maps such as affine maps or Möbius transformations; such piecewise maps will naturally arise when trying to rigorously build the map {\phi_\varepsilon} alluded to in the above conjecture. On the other hand, it turns out that quasiconformal maps still have many vestiges of the rigidity properties enjoyed by conformal maps; for instance, there are quasiconformal analogues of fundamental theorems in conformal mapping such as the Schwarz reflection principle, Liouville’s theorem, or Hurwitz’s theorem. Among other things, these quasiconformal rigidity theorems allow one to create conformal maps from the limit of quasiconformal maps in many circumstances, and this will be how the Thurston conjecture will be proven. 
A key technical tool in establishing these sorts of rigidity theorems will be the theory of an important quasiconformal (quasi-)invariant, the conformal modulus (or, equivalently, the extremal length, which is the reciprocal of the modulus).
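For the simplest building blocks mentioned above, namely real-linear maps, the constant {K} in the quasiconformal inequality can be computed directly. The sketch below writes a linear map as {f(z) = a z + b \overline{z}}, so that {\frac{\partial f}{\partial z} = a} and {\frac{\partial f}{\partial \overline{z}} = b}, and returns the smallest admissible {K = \frac{|a|+|b|}{|a|-|b|}}; the function names and the sample matrices are assumptions made for illustration.

```python
def wirtinger_of_linear_map(m11, m12, m21, m22):
    # (x, y) -> (m11 x + m12 y, m21 x + m22 y), rewritten as f(z) = a z + b conj(z)
    a = 0.5 * complex(m11 + m22, m21 - m12)  # df/dz
    b = 0.5 * complex(m11 - m22, m21 + m12)  # df/dzbar
    return a, b

def dilatation(a, b):
    # Smallest K with |b| <= (K-1)/(K+1) |a|; requires an orientation-preserving map
    if abs(a) <= abs(b):
        raise ValueError("map is not orientation-preserving")
    return (abs(a) + abs(b)) / (abs(a) - abs(b))

stretch = wirtinger_of_linear_map(3, 0, 0, 1)    # (x, y) -> (3x, y)
rotation = wirtinger_of_linear_map(0, -1, 1, 0)  # rotation by a right angle
shear = wirtinger_of_linear_map(1, 1, 0, 1)      # (x, y) -> (x + y, y)

print(dilatation(*stretch), dilatation(*rotation), dilatation(*shear))
```

The stretch sends circles to ellipses with axis ratio 3, and the rotation, being conformal, has {K=1}.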

— 1. Proof of the circle packing theorem —

We loosely follow the treatment of Beardon and Stephenson. It is slightly more convenient to temporarily work in the Riemann sphere {{\bf C} \cup \{\infty\}} rather than the complex plane {{\bf C}}, in order to more easily use Möbius transformations. (Later we will make another change of venue, working in the Poincaré disk {D(0,1)} instead of the Riemann sphere.)

Define a Riemann sphere circle to be either a circle in {{\bf C}} or a line in {{\bf C}} together with {\infty}, together with one of the two components of the complement of this circle or line designated as the “interior”. In the case of a line, this “interior” is just one of the two half-planes on either side of the line; in the case of the circle, this is either the usual interior or the usual exterior plus the point at infinity; in the last case, we refer to the Riemann sphere circle as an exterior circle. (One could also equivalently work with an orientation on the circle rather than assigning an interior, since the interior could then be described as the region to (say) the left of the circle as one traverses the circle along the indicated orientation.) Note that Möbius transforms map Riemann sphere circles to Riemann sphere circles. If one views the Riemann sphere as a geometric sphere in Euclidean space {{\bf R}^3}, then Riemann sphere circles are just circles on this geometric sphere, which then have a centre on this sphere that lies in the region designated as the interior of the circle. We caution though that this “Riemann sphere” centre does not always correspond to the Euclidean notion of the centre of a circle. For instance, the real line, with the upper half-plane designated as interior, will have {i} as its Riemann sphere centre; if instead one designates the lower half-plane as the interior, the Riemann sphere centre will now be {-i}. We can then define a Riemann sphere circle packing in exact analogy with circle packings in {{\bf C}}, namely finite collections of Riemann sphere circles whose interiors are disjoint and whose union is connected; we also define the nerve as before. This is now a graph that can be drawn in the Riemann sphere, using great circle arcs in the Riemann sphere rather than line segments; it is also planar, since one can apply a Möbius transformation to move all the points and edges of the drawing away from infinity.

By Exercise 5, a maximal planar graph with at least three vertices can be drawn as a triangulation of the Riemann sphere. If there are at least four vertices, then it is easy to see that each vertex has degree at least three (a vertex of degree zero, one or two in a triangulation with simple edges will lead to a connected component of at most three vertices). It is a topological fact, not established here, that any two triangulations of such a graph are homotopic up to reflection (to reverse the orientation). If a Riemann sphere circle packing has the nerve of a maximal planar graph {G} of at least four vertices, then we see that this nerve induces an explicit triangulation of the Riemann sphere by connecting the centres of any pair of tangent circles with the great circle arc that passes through the point of tangency. If {G} was not maximal, one no longer gets a triangulation this way, but one still obtains a partition of the Riemann sphere into spherical polygons.

We remark that the triangles in this triangulation can also be described purely from the abstract graph {G}. Define a triangle in {G} to be a triple {w_1,w_2,w_3} of vertices in {G} which are all adjacent to each other, and such that the removal of these three vertices from {G} does not disconnect the graph. One can check that there is a one-to-one correspondence between such triangles in a maximal planar graph {G} and the triangles in any Riemann sphere triangulation of this graph.

Theorems 3, 4 are then a consequence of

Theorem 7 (Riemann sphere circle packing theorem) Let {G} be a maximal planar graph with at least four vertices, drawn as a triangulation of the Riemann sphere. Then there exists a Riemann sphere circle packing with nerve {G} whose triangulation is homotopic to the given triangulation. Furthermore, this packing is unique up to Möbius transformations.

Exercise 8 Deduce Theorems 3, 4 from Theorem 7. (Hint: If one has a non-maximal planar graph for Theorem 3, add a vertex at the interior of each non-triangular face of a drawing of that graph, and connect that vertex to the vertices of the face, to create a maximal planar graph to which Theorem 4 or Theorem 7 can be applied. Then delete these “helper vertices” to create a packing of the original planar graph that does not contain any “unwanted” tangencies. You may use without proof the above assertion that any two triangulations of a maximal planar graph are homotopic up to reflection.)

Exercise 9 Verify Theorem 7 when {G} has exactly four vertices. (Hint: for the uniqueness, one can use Möbius transformations to move two of the circles to become parallel lines.)

To prove this theorem, we will make a reduction with regards to the existence component of Theorem 7. For technical reasons we will need to introduce a notion of non-degeneracy. Let {G} be a maximal planar graph with at least four vertices, and let {v} be a vertex in {G}. As discussed above, the degree {d} of {v} is at least three. Writing the neighbours of {v} in clockwise or counterclockwise order (with respect to a triangulation) as {v_1,\dots,v_d} (starting from some arbitrary neighbour), we see that each {v_i} is adjacent to {v_{i-1}} and {v_{i+1}} (with the conventions {v_0=v_d} and {v_{d+1}=v_1}). We say that {v} is non-degenerate if there are no further adjacencies between the {v_1,\dots,v_d}, and if there is at least one further vertex in {G} besides {v,v_1,\dots,v_d}. Here is another characterisation:

Exercise 10 Let {G} be a maximal planar graph with at least four vertices, let {v} be a vertex in {G}, and let {v_1,\dots,v_d} be the neighbours of {v}. Show that the following are equivalent:

  • (i) {v} is non-degenerate.
  • (ii) The graph {G \backslash \{ v, v_1, \dots, v_d \}} is connected and non-empty, and every vertex in {v_1,\dots,v_d} is adjacent to at least one vertex in {G \backslash \{ v, v_1, \dots, v_d \}}.

We will then derive Theorem 7 from

Theorem 11 (Inductive step) Let {G} be a maximal planar graph with at least four vertices {V}, drawn as a triangulation of the Riemann sphere. Let {v} be a non-degenerate vertex of {G}, and let {G - \{v\}} be the graph formed by deleting {v} (and edges emanating from {v}) from {G}. Suppose that there exists a Riemann sphere circle packing {(C_w)_{w \in V \backslash \{v\}}} whose nerve is at least {G - \{v\}} (that is, {C_w} and {C_{w'}} are tangent whenever {w,w'} are adjacent in {G - \{v\}}, although we also allow additional tangencies), and whose associated subdivision of the Riemann sphere into spherical polygons is homotopic to the given triangulation with {v} removed. Then there is a Riemann sphere circle packing {(\tilde C_w)_{w \in V}} with nerve {G} whose triangulation is homotopic to the given triangulation. Furthermore this circle packing {(\tilde C_w)_{w \in V}} is unique up to Möbius transformations.

Let us now see how Theorem 7 follows from Theorem 11. Fix {G} as in Theorem 7. By Exercise 9 and induction we may assume that {G} has at least five vertices, and that the claim has been proven for any smaller number of vertices.

First suppose that {G} contains a non-degenerate vertex {v}. Let {v_1,\dots,v_d} be the neighbours of {v}. One can then form a new graph {G'} with one fewer vertex by deleting {v}, and then connecting {v_3,\dots,v_{d-1}} to {v_1} (one can think of this operation as contracting the edge {\{v,v_1\}} to a point). One can check that this is still a maximal planar graph that can triangulate the Riemann sphere in a fashion compatible with the original triangulation of {G} (in that all the common vertices, edges, and faces are unchanged). By induction hypothesis, {G'} is the nerve of a circle packing that is compatible with this triangulation, and hence this circle packing has nerve at least {G - \{v\}}. Applying Theorem 11, we then obtain the required claim for {G}.
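The contraction step can be tried out on a small example. The sketch below assumes the octahedron graph (six vertices, twelve edges) with {v} the top vertex and its four neighbours listed in cyclic order; the representation of graphs as sets of two-element frozensets and the helper names are hypothetical choices for illustration. It checks non-degeneracy of {v}, forms {G'}, and confirms the edge count {3n-6} from Exercise 5 before and after.

```python
from itertools import combinations

# Octahedron: top vertex "v", bottom vertex "b", equator v1..v4 in cyclic order
ring = ["v1", "v2", "v3", "v4"]
vertices = {"v", "b", *ring}
edges = {frozenset(p) for p in
         [("v", w) for w in ring]
         + [("b", w) for w in ring]
         + [(ring[i], ring[(i + 1) % 4]) for i in range(4)]}

def is_non_degenerate(v, ring, vertices, edges):
    # No adjacencies among the neighbours beyond the cyclic ones,
    # and at least one vertex besides v and its neighbours
    d = len(ring)
    cyclic = {frozenset((ring[i], ring[(i + 1) % d])) for i in range(d)}
    extra = [e for e in map(frozenset, combinations(ring, 2))
             if e in edges and e not in cyclic]
    return not extra and len(vertices) > d + 1

def contract(v, ring, vertices, edges):
    # Delete v, then connect v_3, ..., v_{d-1} to v_1
    new_vertices = vertices - {v}
    new_edges = {e for e in edges if v not in e}
    for w in ring[2:-1]:
        new_edges.add(frozenset((ring[0], w)))
    return new_vertices, new_edges

non_degenerate = is_non_degenerate("v", ring, vertices, edges)
new_vertices, new_edges = contract("v", ring, vertices, edges)
print(non_degenerate, len(edges), len(new_vertices), len(new_edges))
```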

Now suppose that {G} contains a degenerate vertex {v}. Let {v_1,\dots,v_d} be the neighbours of {v} traversed in order. By hypothesis, there is an additional adjacency between the {v_1,\dots,v_d}; by relabeling we may assume that {v_1} is adjacent to {v_k} for some {3 \leq k \leq d-1}. The vertices {V} in {G} can then be partitioned as

\displaystyle  V = \{v\} \cup \{ v_1,\dots,v_d\} \cup V_1 \cup V_2

where {V_1} denotes those vertices in {V \backslash \{ v_1,\dots,v_d\}} that lie in the region enclosed by the loop {v_1,\dots,v_k, v_1} that does not contain {v}, and {V_2} denotes those vertices in {V \backslash \{ v_1,\dots,v_d\}} that lie in the region enclosed by the loop {v_k,\dots,v_d,v_1, v_k} that does not contain {v}. One can then form two graphs {G_1, G_2}, formed by restricting {G} to the vertices {\tilde V_1 := \{v, v_1,\dots,v_k\} \cup V_1} and {\tilde V_2 := \{ v, v_k, \dots, v_d, v_1\} \cup V_2} respectively; furthermore, these graphs are also maximal planar (with triangulations that are compatible with those of {G}). By induction hypothesis, we can find a circle packing {(C_w)_{w \in \tilde V_1}} with nerve {G_1}, and a circle packing {(C'_w)_{w \in \tilde V_2}} with nerve {G_2}. Note that the circles {C_v, C_{v_1}, C_{v_k}} are mutually tangent, as are {C'_v, C'_{v_1}, C'_{v_k}}. By applying a Möbius transformation one may assume that these circles agree, thus (cf. Exercise 9) {C_v = C'_v}, {C_{v_1} = C'_{v_1}, C_{v_k} = C'_{v_k}}. The complement of these three circles (and their interiors) determines two connected “interstitial” regions (that are in the shape of an arbelos, up to Möbius transformation); one can check that the remaining circles in {(C_w)_{w \in \tilde V_1}} will lie in one of these regions, and the remaining circles in {(C'_w)_{w \in \tilde V_2}} lie in the other. Hence one can glue these circle packings together to form a single circle packing with nerve {G}, which is homotopic to the given triangulation. Also, since a Möbius transformation that fixes three mutually tangent circles has to be the identity, the uniqueness of this circle packing up to Möbius transformations follows from the uniqueness for the two component circle packings {(C_w)_{w \in \tilde V_1}}, {(C'_w)_{w \in \tilde V_2}}.

It remains to prove Theorem 11. To help fix the freedom to apply Möbius transformations, we can normalise the target circle packing {(\tilde C_w)_{w \in V}} so that {\tilde C_v} is the exterior circle {\{ |z|=1\}}, thus all the other circles {\tilde C_w} in the packing will lie in the closed unit disk {\overline{D(0,1)}}. Similarly, by applying a suitable Möbius transformation one can assume that {\infty} lies outside of the interior of all the circles {C_w} in the original packing, and after a scaling one may then assume that all the circles {C_w} lie in the unit disk {D(0,1)}.

At this point it becomes convenient to switch from the “elliptic” conformal geometry of the Riemann sphere {{\bf C} \cup \{\infty\}} to the “hyperbolic” conformal geometry of the unit disk {D(0,1)}. Recall that the Möbius transformations that preserve the disk {D(0,1)} are given by the maps

\displaystyle  z \mapsto e^{i\theta} \frac{z-\alpha}{1-\overline{\alpha} z} \ \ \ \ \ (1)

for real {\theta} and {\alpha \in D(0,1)} (see Theorem 19 of these notes). It comes with a natural metric that interacts well with circles:

Exercise 12 Define the Poincaré distance {d(z_1,z_2)} between two points of {D(0,1)} by the formula

\displaystyle  d(z_1,z_2) := 2 \mathrm{arctanh} |\frac{z_1-z_2}{1-z_1 \overline{z_2}}|.

Given a measurable subset {E} of {D(0,1)}, define the hyperbolic area of {E} to be the quantity

\displaystyle  \mathrm{area}(E) := \int_E \frac{4\ dz d\overline{z}}{(1-|z|^2)^2}

where {dz d\overline{z}} is the Euclidean area element on {D(0,1)}.

  • (i) Show that the Poincaré distance is invariant with respect to Möbius automorphisms of {D(0,1)}, thus {d(Tz_1, Tz_2) = d(z_1,z_2)} whenever {T} is a transformation of the form (1). Similarly show that the hyperbolic area is invariant with respect to such transformations.
  • (ii) Show that the Poincaré distance defines a metric on {D(0,1)}. Furthermore, show that any two distinct points {z_1,z_2} are connected by a unique geodesic, which is a portion of either a line or a circle that meets the unit circle orthogonally at two points. (Hint: use the symmetries of (i) to normalise the points one is studying.)
  • (iii) If {C} is a circle in the interior of {D(0,1)}, show that there exists a point {z_C} in {D(0,1)} and a positive real number {r_C} (which we call the hyperbolic center and hyperbolic radius respectively) such that {C = \{ z \in D(0,1): d(z,z_C) = r_C \}}. (In general, the hyperbolic center and radius will not quite agree with their familiar Euclidean counterparts.) Conversely, show that for any {z_C \in D(0,1)} and {r_C > 0}, the set {\{ z \in D(0,1): d(z,z_C) = r_C \}} is a circle in {D(0,1)}.
  • (iv) If two circles {C_1, C_2} in {D(0,1)} are externally tangent, show that the geodesic connecting the hyperbolic centers {z_{C_1}, z_{C_2}} passes through the point of tangency, orthogonally to the two tangent circles.
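Before attempting the exercise, one can test part (i) numerically. The sketch below is only a spot check on sample data (the values of {\theta}, {\alpha}, {z_1}, {z_2} and the function names are arbitrary illustrative choices), not a proof. It also checks, in the spirit of part (iii), that the circle of hyperbolic radius {r} around the origin has Euclidean radius {\tanh(r/2)}.

```python
import cmath
import math

def mobius(theta, alpha):
    # Automorphism z -> e^{i theta} (z - alpha) / (1 - conj(alpha) z) of the unit disk
    def T(z):
        return cmath.exp(1j * theta) * (z - alpha) / (1 - alpha.conjugate() * z)
    return T

def poincare_distance(z1, z2):
    return 2 * math.atanh(abs((z1 - z2) / (1 - z1 * z2.conjugate())))

T = mobius(0.7, 0.3 + 0.4j)
z1, z2 = 0.1 + 0.2j, -0.5 + 0.3j

before = poincare_distance(z1, z2)
after = poincare_distance(T(z1), T(z2))
print(before, after)  # should agree up to rounding

# A point at Euclidean distance tanh(r/2) from the origin is at hyperbolic distance r
r = 1.0
radius_check = poincare_distance(0j, complex(math.tanh(r / 2)))
print(radius_check)
```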

Exercise 13 (Schwarz-Pick theorem) Let {f: D(0,1) \rightarrow D(0,1)} be a holomorphic map. Show that {d(f(z_1),f(z_2)) \leq d(z_1,z_2)} for all {z_1,z_2 \in D(0,1)}. If {z_1 \neq z_2}, show that equality occurs if and only if {f} is a Möbius automorphism (1) of {D(0,1)}. (This result is known as the Schwarz-Pick theorem.)
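The contraction property can likewise be spot-checked on a sample map. The following sketch assumes the map {f(z) = z^2} (which is holomorphic from the disk to itself but not an automorphism) and a small polar grid of test points; both are illustrative choices, and a numerical check of this kind is of course no substitute for the proof.

```python
import math
from itertools import combinations

def poincare_distance(z1, z2):
    return 2 * math.atanh(abs((z1 - z2) / (1 - z1 * z2.conjugate())))

def f(z):
    # Sample holomorphic self-map of the unit disk
    return z * z

# Polar grid of test points with radii 0.15, 0.30, ..., 0.90
points = [complex(0.15 * i * math.cos(math.pi * k / 4),
                  0.15 * i * math.sin(math.pi * k / 4))
          for i in range(1, 7) for k in range(8)]

# Count pairs where the image distance exceeds the original distance
violations = sum(
    1 for z1, z2 in combinations(points, 2)
    if poincare_distance(f(z1), f(z2)) > poincare_distance(z1, z2) + 1e-12
)
print(len(points), violations)
```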

We will refer to circles that lie in the closure {\overline{D(0,1)}} of the unit disk as hyperbolic circles. These can be divided into the finite radius hyperbolic circles, which lie in the interior of the unit disk (as per part (iii) of the above exercise), and the horocycles, which are internally tangent to the unit circle. By convention, we view horocycles as having infinite radius, and having center at their point of tangency to the unit circle; they can be viewed as the limiting case of finite radius hyperbolic circles when the radius goes to infinity and the center goes off to the boundary of the disk (at the same rate as the radius, as measured with respect to the Poincaré distance). We write {C(p,r)} for the hyperbolic circle with hyperbolic centre {p} and hyperbolic radius {r} (thus either {0 < r < \infty} and {p \in D(0,1)}, or {r = \infty} and {p} is on the unit circle); there is an annoying caveat that when {r=\infty} there is more than one horocycle {C(p,\infty)} with hyperbolic centre {p}, but we will tolerate this breakdown of functional dependence of {C} on {p} and {r} in order to simplify the notation. A hyperbolic circle packing is a circle packing {(C(p_v,r_v))_{v \in V}} in which all circles are hyperbolic circles.

We also observe that the geodesic structure extends to the boundary of the unit disk: for any two distinct points {z_1,z_2} in {\overline{D(0,1)}}, there is a unique geodesic that connects them.

In view of the above discussion, Theorem 11 may now be formulated as follows:

Theorem 14 (Inductive step, hyperbolic formulation) Let {G} be a maximal planar graph with at least four vertices {V}, let {v} be a non-degenerate vertex of {G}, and let {v_1,\dots,v_d} be the vertices adjacent to {v}. Suppose that there exists a hyperbolic circle packing {(C(p_w,r_w))_{w \in V \backslash \{v\}}} whose nerve is at least {G - \{v\}}. Then there is a hyperbolic circle packing {(C(\tilde p_w,\tilde r_w))_{w \in V \backslash \{v\}}} homotopic to {(C(p_w,r_w))_{w \in V \backslash \{v\}}} such that the boundary circles {C(\tilde p_{v_j}, \tilde r_{v_j})}, {j=1,\dots,d} are all horocycles. Furthermore, this packing is unique up to Möbius automorphisms (1) of the disk {D(0,1)}.

Indeed, once one adjoins the exterior unit circle to {(C(p_w,r_w))_{w \in V \backslash \{v\}}}, one obtains a Riemann sphere circle packing whose nerve is at least {G}, and hence equal to {G} since {G} is maximal.

To prove this theorem, the intuition is to “inflate” the hyperbolic radius of the circles of {C_w} until the boundary circles all become infinite radius (i.e., horocycles). The difficulty is that one cannot just arbitrarily increase the radius of any given circle without destroying the required tangency properties. The resolution to this difficulty given in the work of Beardon and Stephenson that we are following here was inspired by Perron’s method of subharmonic functions, in which one faced an analogous difficulty that one could not easily manipulate a harmonic function without destroying its harmonicity. There, the solution was to work instead with the more flexible class of subharmonic functions; here we similarly work with the concept of a subpacking.

We will need some preliminaries to define this concept precisely. We first need some hyperbolic trigonometry. We define a hyperbolic triangle to be the solid (and closed) region in {\overline{D(0,1)}} enclosed by three distinct points {z_1,z_2,z_3} in {\overline{D(0,1)}} and the geodesic arcs connecting them. (Note that we allow one or more of the vertices to be on the boundary of the disk, so that the sides of the triangle could have infinite length.) Let {T := (0,+\infty]^3 \backslash \{ (\infty,\infty,\infty)\}} be the space of triples {(r_1,r_2,r_3)} with {0 < r_1,r_2,r_3 \leq \infty} and not all of {r_1,r_2,r_3} infinite. We say that a hyperbolic triangle with vertices {p_1,p_2,p_3} is a {(r_1,r_2,r_3)}-triangle if there are hyperbolic circles {C(p_1,r_1), C(p_2,r_2), C(p_3,r_3)} with the indicated hyperbolic centres and hyperbolic radii that are externally tangent to each other; note that this implies that the sidelengths opposite {p_1,p_2,p_3} have length {r_2+r_3, r_1+r_3, r_1+r_2} respectively (see Figure 3 of Beardon and Stephenson). It is easy to see that for any {(r_1,r_2,r_3) \in T}, there exists a unique {(r_1,r_2,r_3)}-triangle in {\overline{D(0,1)}} up to reflections and Möbius automorphisms (use Möbius transforms to fix two of the hyperbolic circles, and consider all the circles externally tangent to both of these circles; the case when one or two of the {r_1,r_2,r_3} are infinite may need to be treated separately). As a consequence, there is a well defined angle {\alpha_i(r_1,r_2,r_3) \in [0,\pi)} for {i=1,2,3} subtended by the vertex {p_i} of an {(r_1,r_2,r_3)}-triangle. We need some basic facts from hyperbolic geometry:

Exercise 15 (Hyperbolic trigonometry)

  • (i) (Hyperbolic cosine rule) For any {0 < r_1,r_2,r_3 < \infty}, show that the quantity {\cos \alpha_1(r_1,r_2,r_3)} is equal to the ratio

    \displaystyle  \frac{\cosh( r_1+r_2) \cosh(r_1+r_3) - \cosh(r_2+r_3)}{\sinh(r_1+r_2) \sinh(r_1+r_3)}.

    Furthermore, establish the limiting angles

    \displaystyle  \alpha_1(\infty,r_2,r_3) = \alpha_1(\infty,\infty,r_3) = \alpha_1(\infty,r_2,\infty) = 0

    \displaystyle  \cos \alpha_1(r_1,\infty,r_3) = \frac{\cosh(r_1+r_3) - \exp(r_3-r_1)}{\sinh(r_1+r_3)}

    \displaystyle  \cos \alpha_1(r_1,r_2,\infty) = \frac{\cosh(r_1+r_2) - \exp(r_2-r_1)}{\sinh(r_1+r_2)}

    \displaystyle  \cos \alpha_1(r_1,\infty,\infty) = 1 - 2\exp(-2r_1).

    (Hint: to facilitate computations, use a Möbius transform to move the {p_1} vertex to the origin when the radius there is finite.) Conclude in particular that {\alpha_1: T \rightarrow [0,\pi)} is continuous (using the topology of the extended real line for each component of {T}). Discuss how this rule relates to the Euclidean cosine rule in the limit as {r_1,r_2,r_3} go to zero. Of course, by relabeling one obtains similar formulae for {\alpha_2(r_1,r_2,r_3)} and {\alpha_3(r_1,r_2,r_3)}.

  • (ii) (Area rule) Show that the area of a hyperbolic triangle is given by {\pi - \alpha_1-\alpha_2-\alpha_3}, where {\alpha_1,\alpha_2,\alpha_3} are the angles of the hyperbolic triangle. (Hint: one can prove this for small hyperbolic triangles (of diameter {O(\varepsilon)}) up to errors of size {o(\varepsilon^2)} after normalising as in (i), and then establish the general case by subdividing a large hyperbolic triangle into many small hyperbolic triangles. This rule is also a special case of the Gauss-Bonnet theorem in Riemannian geometry.) In particular, the area {\mathrm{Area}(r_1,r_2,r_3)} of a {(r_1,r_2,r_3)}-triangle is given by the formula

    \displaystyle  \pi - \alpha_1(r_1,r_2,r_3) - \alpha_2(r_1,r_2,r_3) - \alpha_3(r_1,r_2,r_3). \ \ \ \ \ (2)

  • (iii) Show that the area of the interior of a hyperbolic circle {C(p,r)} with {r<\infty} is equal to {4\pi \sinh^2(r/2)}.
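The formulae of Exercise 15 are easy to sanity-check numerically. The following is a minimal Python sketch, not a solution to the exercise: the function names are ours, only the finite-radius cosine rule and the limiting formula for {\alpha_1(r_1,\infty,r_3)} from part (i) are used, and an infinite radius is imitated by a large finite one (an assumption of this sketch).

```python
import math

def cos_alpha1(r1, r2, r3):
    """Hyperbolic cosine rule of Exercise 15(i), for finite radii only."""
    a, b, c = r1 + r2, r1 + r3, r2 + r3   # side lengths of the triangle
    return (math.cosh(a) * math.cosh(b) - math.cosh(c)) / (math.sinh(a) * math.sinh(b))

def cos_alpha1_r2_infinite(r1, r3):
    """Limiting formula for cos alpha_1(r1, infinity, r3) from Exercise 15(i)."""
    return (math.cosh(r1 + r3) - math.exp(r3 - r1)) / math.sinh(r1 + r3)

def alpha1(r1, r2, r3):
    """Angle at the vertex p_1; the clamp guards against rounding just outside [-1, 1]."""
    return math.acos(max(-1.0, min(1.0, cos_alpha1(r1, r2, r3))))

def area(r1, r2, r3):
    """Area of an (r1, r2, r3)-triangle via formula (2); the other angles come from relabeling."""
    return math.pi - alpha1(r1, r2, r3) - alpha1(r2, r1, r3) - alpha1(r3, r1, r2)

if __name__ == "__main__":
    # Small radii: the angles of an equilateral triangle approach the Euclidean value pi/3.
    print(alpha1(1e-3, 1e-3, 1e-3), math.pi / 3)
    # A large r2 imitates a horocycle at p_2; the two values should nearly agree.
    print(cos_alpha1(1.0, 25.0, 0.7), cos_alpha1_r2_infinite(1.0, 0.7))
    # The area of a hyperbolic triangle always lies strictly between 0 and pi.
    print(area(1.0, 1.0, 1.0))
```

The first printed pair illustrates the Euclidean limit mentioned in the exercise; the second illustrates the continuity of {\alpha_1} as one radius goes to infinity.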

Henceforth we fix {G, v, v_1,\dots,v_d, {\mathcal C} = (C(p_w,r_w))_{w \in V \backslash \{v\}}} as in Theorem 14. We refer to the vertices {v_1,\dots,v_d} as boundary vertices of {G - \{v\}} and the remaining vertices as interior vertices; edges between boundary vertices are boundary edges, all other edges will be called interior edges (including edges that have one vertex on the boundary). Triangles in {G -\{v\}} that involve two boundary vertices (and thus necessarily one interior vertex) will be called boundary triangles; all other triangles (including ones that involve one boundary vertex) will be called interior triangles. To any triangle {w_1,w_2,w_3} of {G - \{v\}}, we can form the hyperbolic triangle {\Delta_{\mathcal C}(w_1,w_2,w_3)} with vertices {p_{w_1}, p_{w_2}, p_{w_3}}; this is an {(r_{w_1}, r_{w_2}, r_{w_3})}-triangle. Let {\Sigma} denote the collection of such hyperbolic triangles; because {{\mathcal C}} is a packing, we see that these triangles have disjoint interiors. They also fit together in the following way: if {e} is a side of a hyperbolic triangle in {\Sigma}, then there will be another hyperbolic triangle in {\Sigma} that shares that side precisely when {e} is associated to an interior edge of {G - \{v\}}. The union of all these triangles is homeomorphic to the region formed by starting with a triangulation of the Riemann sphere by {G} and removing the triangles containing {v} as a vertex, and is therefore homeomorphic to a disk. One can think of the collection {\Sigma} of hyperbolic triangles, together with the vertices and edges shared by these triangles, as a two-dimensional (hyperbolic) simplicial complex, though we will not develop the full machinery of such complexes here.

Our objective is to find another hyperbolic circle packing {\tilde {\mathcal C} = (C(\tilde p_w, \tilde r_w))_{w \in V \backslash \{v\}}} homotopic to the existing circle packing {{\mathcal C}}, such that all the boundary circles (circles centred at boundary vertices) are horocycles. We observe that such a hyperbolic circle packing is completely described (up to Möbius transformations) by the hyperbolic radii {(\tilde r_w)_{w \in V \backslash \{v\}}} of these circles. Indeed, suppose one knows the values of these hyperbolic radii. Then each hyperbolic triangle {\Delta_{\mathcal C}(w_1,w_2,w_3)} in {\Sigma} is associated to a hyperbolic triangle {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} whose sides and angles are known from Exercise 15. As the orientation of each hyperbolic triangle is fixed, each hyperbolic triangle is determined up to a Möbius automorphism of {D(0,1)}. Once one fixes one hyperbolic triangle, the adjacent hyperbolic triangles (that share a common side with the first triangle) are then also fixed; continuing in this fashion we see that the entire hyperbolic circle packing {\tilde {\mathcal C}} is determined.

On the other hand, not every choice of radii {(\tilde r_w)_{w \in V \backslash \{v\}}} will lead to a hyperbolic circle packing {\tilde {\mathcal C}} with the required properties. There are two obvious constraints that need to be satisfied:

  • (i) (Local constraint) The angles {\alpha_1( \tilde r_w, \tilde r_{w_1}, \tilde r_{w_2})} of all the hyperbolic triangles {\Delta_{\tilde {\mathcal C}}(w,w_1,w_2)} around any given interior vertex {w} must sum to exactly {2\pi}.
  • (ii) (Boundary constraint) The radii associated to boundary vertices must be infinite.

There could potentially also be a global constraint, in that one requires the circles of the packing to be disjoint – including circles that are not necessarily adjacent to each other. In general, one can easily create configurations of circles that are local circle packings but not global ones (see e.g., Figure 7 of Beardon-Stephenson). However, it turns out that one can use the boundary constraint and topological arguments to prevent this from happening. We first need a topological lemma:

Lemma 16 (Topological lemma) Let {U, V} be bounded connected open subsets of {{\bf C}} with {V} simply connected, and let {f: \overline{U} \rightarrow \overline{V}} be a continuous map such that {f(\partial U) \subset \partial V} and {f(U) \subset V}. Suppose furthermore that the restriction of {f} to {U} is a local homeomorphism. Then {f} is in fact a global homeomorphism.

The requirement that the restriction of {f} to {U} be a local homeomorphism can in fact be relaxed to local injectivity thanks to the invariance of domain theorem. The complex numbers {{\bf C}} can be replaced here by any finite-dimensional vector space.

Proof: The preimage {f^{-1}(p)} of any point {p} in the interior of {V} is closed, discrete, and disjoint from {\partial U}, and is hence finite. Around each point in the preimage, there is a neighbourhood on which {f} is a homeomorphism onto a neighbourhood of {p}. If one deletes the closure of these neighbourhoods, the image under {f} is compact and avoids {p}, and thus avoids a neighbourhood of {p}. From this we can show that {f} is a covering map from {U} to {V}. As the base {V} is simply connected, it is its own universal cover, and hence (by the connectedness of {U}) {f} must be a homeomorphism as claimed. \Box

Proposition 17 Suppose we assign a radius {\tilde r_w \in (0,+\infty]} to each {w \in V \backslash \{v\}} that obeys the local constraint (i) and the boundary constraint (ii). Then there is a hyperbolic circle packing {(C(\tilde p_w, \tilde r_w))_{w \in V \backslash \{v\}}} with nerve {G - \{v\}} and the indicated radii.

Proof: We first create the hyperbolic triangles {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} associated with the required hyperbolic circle packing, and then verify that this indeed arises from a circle packing.

Start with a single triangle {(w^0_1,w^0_2,w^0_3)} in {G - \{v\}}, and arbitrarily select a {(\tilde r_{w^0_1}, \tilde r_{w^0_2}, \tilde r_{w^0_3})}-triangle {\Delta_{\tilde {\mathcal C}}(w^0_1,w^0_2,w^0_3)} with the same orientation as {\Delta_{{\mathcal C}}(w^0_1,w^0_2,w^0_3)}. By Exercise 15(i), such a triangle exists (and is unique up to Möbius automorphisms of the disk). If a hyperbolic triangle {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} has been fixed, and {(w_2,w_3,w_4)} (say) is an adjacent triangle in {G - \{v\}}, we can select {\Delta_{\tilde {\mathcal C}}(w_2,w_3,w_4)} to be the unique {(\tilde r_{w_2}, \tilde r_{w_3}, \tilde r_{w_4})}-triangle with the same orientation as {\Delta_{{\mathcal C}}(w_2,w_3,w_4)} that shares the {w_2,w_3} side in common with {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} (with the {w_2} and {w_3} vertices agreeing). Similarly for other permutations of the labels. As {G} is a maximal planar graph with {v} non-degenerate (so in particular the set of internal vertices is connected), we can continue this construction to eventually fix every triangle in {G - \{v\}}. There is the potential issue that a given triangle {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} may depend on the order in which one arrives at that triangle starting from {(w^0_1,w^0_2,w^0_3)}, but one can check from a monodromy argument (in the spirit of the monodromy theorem) using the local constraint (i) and the simply connected nature of the triangulation associated to {{\mathcal C}} that there is in fact no dependence on the order. (The process resembles that of laying down jigsaw pieces in the shape of hyperbolic triangles together, with the local constraint ensuring that there is always a flush fit locally.)

Now we show that the hyperbolic triangles {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} have disjoint interiors inside the disk {D(0,1)}. Let {X} denote the topological space formed by taking the disjoint union of the hyperbolic triangles {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} (now viewed as abstract topological spaces rather than subsets of the disk) and then gluing together all common edges, e.g. identifying the {\{w_2,w_3\}} edge of {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} with the same edge of {\Delta_{\tilde {\mathcal C}}(w_2,w_3,w_4)} if {(w_1,w_2,w_3)} and {(w_2,w_3,w_4)} are adjacent triangles in {G - \{v\}}. This space is homeomorphic to the union of the original hyperbolic triangles {\Delta_{{\mathcal C}}(w_1,w_2,w_3)}, and is thus homeomorphic to the closed unit disk. There is an obvious projection map {\pi} from {X} to the union of the {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)}, which maps the abstract copy in {X} of a given hyperbolic triangle {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} to its concrete counterpart in {\overline{D(0,1)}} in the obvious fashion. This map is continuous. It does not quite cover the full closed disk, mainly because (by the boundary condition (ii)) the boundary hyperbolic triangles {\Delta_{\tilde {\mathcal C}}(v_i,v_{i+1},w)} touch the boundary of the disk at the vertices associated to {v_i} and {v_{i+1}} but do not follow the boundary arc connecting these vertices, being bounded instead by the geodesic from the {v_i} vertex to the {v_{i+1}} vertex; the missing region is a lens-shaped region bounded by two circular arcs. However, by applying another homeomorphism (that does not alter the edges from {v_i} to {w} or {v_{i+1}} to {w}), one can “push out” the {\{v_i,v_{i+1}\}} edge of this hyperbolic triangle across the lens to become the boundary arc from {v_i} to {v_{i+1}}.
If one performs this modification for each boundary triangle, one arrives at a modified continuous map {\tilde \pi} from {X} to {\overline{D(0,1)}}, which now has the property that the boundary of {X} maps to the boundary of the disk, and the interior of {X} maps to the interior of the disk. Also one can check that this map is a local homeomorphism. By Lemma 16, {\tilde \pi} is injective; undoing the boundary modifications we conclude that {\pi} is injective. Thus the hyperbolic triangles {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} have disjoint interiors. Furthermore, the arguments show that for each boundary triangle {\Delta_{\tilde {\mathcal C}}(v_i,v_{i+1},w)}, the lens-shaped regions between the boundary arc between the vertices associated to {v_i, v_{i+1}} and the corresponding edge of the boundary triangle are also disjoint from the hyperbolic triangles and from each other. On the other hand, all of the hyperbolic circles in {{\tilde {\mathcal C}}} and their interiors are contained in the union of the hyperbolic triangles {\Delta_{\tilde {\mathcal C}}(w_1,w_2,w_3)} and the lens-shaped regions, with each hyperbolic triangle containing portions only of the hyperbolic circles with hyperbolic centres at the vertices of the triangle, and similarly for the lens-shaped regions. From this one can verify that the interiors of the hyperbolic circles are all disjoint from each other, and give a hyperbolic circle packing with the required properties. \Box

In view of the above proposition, the only remaining task is to find an assignment of radii {(\tilde r_w)_{w \in V \backslash \{v\}}} obeying both the local condition (i) and the boundary condition (ii). This is analogous to finding a harmonic function with specified boundary data. To do this, we perform the following analogue of Perron’s method. Define a subpacking to be an assignment {(\tilde r_w)_{w \in V \backslash \{v\}}} of radii {\tilde r_w \in (0,+\infty]} obeying the following

  • (i’) (Local sub-condition) The angles {\alpha_1( \tilde r_w, \tilde r_{w_1}, \tilde r_{w_2})} around any given interior vertex {w} sum to at least {2\pi}.

This can be compared with the definition of a (smooth) subharmonic function as one where the Laplacian is always at least zero. Note that we always have at least one subpacking, namely the one provided by the radii of the original hyperbolic circle packing {{\mathcal C}}. Intuitively, in each subpacking, the radius {\tilde r_w} at an interior vertex {w} is either “too small” or “just right”.

We now need a key monotonicity property, analogous to how the maximum of two subharmonic functions is again subharmonic:

Exercise 18 (Monotonicity)

  • (i) Show that the angle {\alpha_1( r_1, r_2, r_3)} (as defined in Exercise 15(i)) is strictly decreasing in {r_1} and strictly increasing in {r_2} or {r_3} (if one holds the other two radii fixed). Do these claims agree with your geometric intuition?
  • (ii) Conclude that whenever {{\mathcal R}' = (r'_w)_{w \in V \backslash \{v\}}} and {{\mathcal R}'' = (r''_w)_{w \in V \backslash \{v\}}} are subpackings, that {\max( {\mathcal R}' , {\mathcal R}'' ) := (\max(r'_w, r''_w))_{w \in V \backslash \{v\}}} is also a subpacking.
  • (iii) Let {(r_1,r_2,r_3), (r'_1,r'_2,r'_3) \in T} be such that {r_i \leq r'_i} for {i=1,2,3}. Show that {\mathrm{Area}(r_1,r_2,r_3) \leq \mathrm{Area}(r'_1,r'_2,r'_3)}, with equality if and only if {r_i=r'_i} for all {i=1,2,3}. (Hint: increase just one of the radii {r_1,r_2,r_3}. One can either use calculus (after first disposing of various infinite radii cases) or one can argue geometrically.)
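Part (i) of this exercise can be probed numerically before attempting a proof. The sketch below is only a finite-difference check on a small grid of finite radii (the grid, the step size, and the function names are our own choices), so it is evidence rather than an argument.

```python
import itertools
import math

def alpha1(r1, r2, r3):
    """Angle at p_1 of an (r1, r2, r3)-triangle, from the hyperbolic cosine rule."""
    a, b, c = r1 + r2, r1 + r3, r2 + r3
    cos_val = (math.cosh(a) * math.cosh(b) - math.cosh(c)) / (math.sinh(a) * math.sinh(b))
    return math.acos(max(-1.0, min(1.0, cos_val)))

def check_monotone(grid, h=1e-3):
    """Return True if, on the grid, alpha_1 drops when r1 grows and rises when r2 or r3 grows."""
    for r1, r2, r3 in itertools.product(grid, repeat=3):
        base = alpha1(r1, r2, r3)
        if not alpha1(r1 + h, r2, r3) < base:
            return False
        if not alpha1(r1, r2 + h, r3) > base:
            return False
        if not alpha1(r1, r2, r3 + h) > base:
            return False
    return True

if __name__ == "__main__":
    print(check_monotone([0.1, 0.5, 1.0, 2.0, 4.0]))
```

Geometrically, inflating the circle at {p_1} pushes the other two centres further away, while inflating a neighbouring circle pushes the other neighbour around it, which opens up the angle at {p_1}.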

As with Perron’s method, we can now try to construct a hyperbolic circle packing by taking the supremum of all the subpackings. To avoid degeneracies we need an upper bound:

Proposition 19 (Upper bound) Let {(\tilde r_w)_{w \in V \backslash \{v\}}} be a subpacking. Then for any interior vertex {w} of degree {d}, one has {\tilde r_w \leq \sqrt{d}}.

The precise value of {\sqrt{d}} is not so important for our arguments, but the fact that it is finite will be. This boundedness of interior circles in a circle packing is a key feature of hyperbolic geometry that is not present in Euclidean geometry, and is one of the reasons why we moved to a hyperbolic perspective in the first place.

Proof: By the subpacking property and pigeonhole principle, there is a triangle {w, w_1, w_2} in {G - \{v\}} such that {\alpha_1(\tilde r_w, \tilde r_{w_1}, \tilde r_{w_2}) \geq \frac{2\pi}{d}}. The hyperbolic triangle associated to {(w,w_1,w_2)} has area at most {\pi} by (2); on the other hand, it contains a sector of a hyperbolic circle of radius {\tilde r_w} and angle {\frac{2\pi}{d}}, and hence has area at least {\frac{1}{d} 4\pi \sinh^2(\tilde r_w/2) \geq \frac{\pi \tilde r_w^2}{d}}, thanks to Exercise 15(iii). Comparing the two bounds gives the claim. \Box
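For completeness, here is one way to spell out the final comparison; it uses the elementary inequality {\sinh x \geq x} for {x \geq 0}, which is not stated above but is standard. Writing {r} for the radius {\tilde r_w}, the two area bounds combine to

\displaystyle  \pi \geq \frac{1}{d} 4\pi \sinh^2(r/2) \geq \frac{1}{d} 4\pi (r/2)^2 = \frac{\pi r^2}{d},

so that {r^2 \leq d}, which is the bound {\tilde r_w \leq \sqrt{d}}.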

Now define {{\mathcal R} = ( \tilde r_w )_{w \in V \backslash \{v\}}} to be the (pointwise) supremum of all the subpackings. By the above proposition, {\tilde r_w} is finite at every interior vertex. By Exercise 18, one can view {{\mathcal R}} as a monotone increasing limit of subpackings, and it is thus again a subpacking (due to the continuity properties of {\alpha_1} as long as at least one of the radii stays bounded); thus {{\mathcal R}} is the maximal subpacking. On the other hand, if {\tilde r_w} is finite at some boundary vertex, then by Exercise 18(i) one could replace that radius by a larger quantity without destroying the subpacking property, contradicting the maximality of {{\mathcal R}}. Thus all the boundary radii are infinite, that is to say the boundary condition (ii) holds. Finally, if the sum of the angles at an interior vertex {w} is strictly greater than {2\pi}, then by Exercise 18 we could increase the radius at this vertex slightly without destroying the subpacking property at {w} or at any other of the interior vertices, again contradicting the maximality of {{\mathcal R}}. Thus {{\mathcal R}} obeys the local condition (i), and we have demonstrated existence of the required hyperbolic circle packing.
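To see the "inflate until just right" mechanism in the simplest possible case, suppose (as an illustrative assumption, not a case singled out in the text) that {G - \{v\}} has a single interior vertex {w} surrounded by {d} boundary vertices. With the boundary radii infinite, the only unknown is {\tilde r_w}, and by the monotonicity in Exercise 18(i) the angle sum at {w} is decreasing in {\tilde r_w}, so the maximal subpacking can be found by bisection. In the Python sketch below the names are ours, and the infinite boundary radii are imitated by the large finite value BIG.

```python
import math

BIG = 30.0  # stand-in for an infinite (horocycle) radius; an assumption of this sketch

def alpha1(r1, r2, r3):
    """Angle at p_1 of an (r1, r2, r3)-triangle, from the hyperbolic cosine rule."""
    a, b, c = r1 + r2, r1 + r3, r2 + r3
    cos_val = (math.cosh(a) * math.cosh(b) - math.cosh(c)) / (math.sinh(a) * math.sinh(b))
    return math.acos(max(-1.0, min(1.0, cos_val)))

def angle_sum(r, d):
    """Sum of the d equal angles around the single interior vertex of radius r."""
    return d * alpha1(r, BIG, BIG)

def maximal_interior_radius(d, steps=60):
    """Largest r whose angle sum is still at least 2*pi (the maximal subpacking)."""
    lo, hi = 1e-9, math.sqrt(d)   # Proposition 19 supplies the upper end of the bracket
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if angle_sum(mid, d) >= 2 * math.pi:
            lo = mid              # still a subpacking: the radius can be inflated
        else:
            hi = mid              # angle sum too small: the radius overshot
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for d in (3, 4, 6, 12):
        r = maximal_interior_radius(d)
        print(d, r, angle_sum(r, d) / (2 * math.pi))
```

The last printed column should be very close to 1, reflecting that the maximal subpacking satisfies the local constraint (i) exactly. For a general graph one has one unknown per interior vertex and the radii interact, so this one-dimensional bisection is only a toy model of the argument.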

Finally we establish uniqueness. It suffices to establish that {{\mathcal R}} is the unique tuple that obeys the local condition (i) and the boundary condition (ii). Suppose we had another tuple {{\mathcal R}' = ( r'_w )_{w \in V \backslash \{v\}}} other than {{\mathcal R}} that obeyed these two conditions. Then by the maximality of {{\mathcal R}}, we have {r'_w \leq \tilde r_w} for all {w}. By Exercise 18(iii), this implies that

\displaystyle  \mathrm{Area}( r'_{w_1}, r'_{w_2}, r'_{w_3} ) \leq \mathrm{Area}( \tilde r_{w_1}, \tilde r_{w_2}, \tilde r_{w_3} )

for any triangle {(w_1,w_2,w_3)} in {G - \{v\}}. Summing over all triangles and using (2), we conclude that

\displaystyle  \sum_{w \in V \backslash \{v\}} \sum_{w_1,w_2: (w,w_1,w_2) \hbox{ triangle}} \alpha_1(r'_{w}, r'_{w_1}, r'_{w_2})

\displaystyle \geq \sum_{w \in V \backslash \{v\}} \sum_{w_1,w_2: (w,w_1,w_2) \hbox{ triangle}} \alpha_1(\tilde r_{w}, \tilde r_{w_1}, \tilde r_{w_2})

where the inner sum is over the pairs {w_1,w_2} such that {(w,w_1,w_2)} forms a triangle in {G - \{v\}}. But by the local condition (i) and the boundary condition (ii), the inner sum on either side is equal to {2\pi} for an interior vertex and {0} for a boundary vertex. Thus the two sides agree, which by Exercise 18(iii) implies that {r'_w = \tilde r_w} for all {w}. This proves Theorem 14 and thus Theorems 7, 3, 4.

— 2. Quasiconformal maps —

In this section we set up some of the foundational theory of quasiconformal maps, which are generalisations of conformal maps that can tolerate some deviations from perfect conformality, while still retaining many of the good properties of conformal maps (such as being preserved under uniform limits), though with the notable caveat that in contrast to conformal maps, quasiconformal maps need not be smooth. As such, this theory will come in handy when proving convergence of circle packings to the Riemann map. The material here is largely drawn from the text of Lehto and Virtanen.

We first need the following refinement of the Riemann mapping theorem, known as Carathéodory’s theorem:

Theorem 20 (Carathéodory’s theorem) Let {U} be a bounded simply connected domain in {{\bf C}} whose boundary {\partial U} is a Jordan curve, and let {\phi: D(0,1) \rightarrow U} be a conformal map between {D(0,1)} and {U} (as given by the Riemann mapping theorem). Then {\phi} extends to a homeomorphism from {\overline{D(0,1)}} to {\overline{U}}.

The condition that {\partial U} be a Jordan curve is clearly necessary, since if {\partial U} is not simple then there are paths in {D(0,1)} that end up at different points in {\partial D(0,1)} but have the same endpoint in {\partial U} after applying {\phi}, which prevents {\phi} being continuously extended to a homeomorphism.

Proof: We first prove continuous extension to the boundary. It suffices to show that for every point {\zeta} on the unit circle {\partial D(0,1)}, the diameters of the sets {\phi( D(0,1) \cap D( \zeta, r_n ) )} go to zero for some sequence of radii {r_n \rightarrow 0}.

First observe from the change of variables formula that the area of {U = \phi(D(0,1))} is given by {\int_{D(0,1)} |\phi'(z)|^2 dz d\overline{z}}, where {dz d\overline{z}} is Lebesgue measure. In particular, this integral is finite. Expanding in polar coordinates around {\zeta}, we conclude that

\displaystyle  \int_0^2 \left(\int_{0}^{2\pi} 1_{D(0,1)}(\zeta+re^{i\theta}) |\phi'( \zeta + r e^{i\theta} )|^2\ d \theta\right) r dr < \infty.

Since {\int_0^2 \frac{dr}{r}} diverges near {r=0}, we conclude from the pigeonhole principle that there exists a sequence of radii {0 < r_n < 2} decreasing to zero such that

\displaystyle  r_n^2 \int_{0}^{2\pi} 1_{D(0,1)}(\zeta+r_ne^{i\theta}) |\phi'( \zeta + r_n e^{i\theta} )|^2\ d \theta \rightarrow 0

and hence by Cauchy-Schwarz

\displaystyle  r_n \int_{0}^{2\pi} 1_{D(0,1)}(\zeta+r_ne^{i\theta}) |\phi'( \zeta + r_n e^{i\theta} )|\ d \theta \rightarrow 0.
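To spell out these two steps: if the quantity {r^2 \int_{0}^{2\pi} 1_{D(0,1)}(\zeta+re^{i\theta}) |\phi'( \zeta + r e^{i\theta} )|^2\ d \theta} were bounded below by some {\varepsilon > 0} for all sufficiently small {r}, then the integrand of the finite {r}-integral above would be at least {\varepsilon/r} near {r=0}, which is not integrable; this produces the sequence {r_n}. The Cauchy-Schwarz step, applied on the circle of radius {r_n} to the function {f(\theta) := 1_{D(0,1)}(\zeta+r_ne^{i\theta}) |\phi'( \zeta + r_n e^{i\theta} )|}, reads

\displaystyle  \left( r_n \int_0^{2\pi} f(\theta)\ d\theta \right)^2 \leq 2\pi\, r_n^2 \int_0^{2\pi} f(\theta)^2\ d\theta,

and the right-hand side goes to zero by the choice of {r_n} (note that the indicator function equals its own square).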

If we let {C_n} denote the circular arc {\{ \zeta + r_n e^{i\theta}: 0 \leq \theta \leq 2\pi \} \cap D(0,1)}, we conclude from this and the triangle inequality (and chain rule) that {\phi(C_n)} is a rectifiable curve with length going to zero as {n \rightarrow \infty}. Let {a_n, b_n} denote the endpoints of this curve. Clearly they lie in {\overline{U}}. If (say) {a_n} was in {U}, then as {\phi} is a homeomorphism from {D(0,1)} to {U}, {C_n} would have one endpoint in {D(0,1)} rather than {\partial D(0,1)}, which is absurd. Thus {a_n} lies in {\partial U}, and similarly for {b_n}. Since the length of {\phi(C_n)} goes to zero, the distance between {a_n} and {b_n} goes to zero. Since {\partial U} is a Jordan curve, it can be parameterised homeomorphically by {\partial D(0,1)}, and so by compactness we also see that the distance between the parameterisations of {a_n} and {b_n} in {\partial D(0,1)} must also go to zero, hence (by uniform continuity of the inverse parameterisation) {a_n} and {b_n} are connected along {\partial U} by an arc whose diameter goes to zero. Combining this arc with {\phi(C_n)}, we obtain a Jordan curve of diameter going to zero which separates {\phi(D(0,1) \cap D(\zeta, r_n))} from the rest of {U}. Sending {n} to infinity, we see that {\phi(D(0,1) \cap D(\zeta, r_n))} (which decreases with {n}) must eventually map in the interior of this curve rather than the exterior, and so the diameter goes to zero as claimed.

The above construction shows that {\phi} extends to a continuous map (which by abuse of notation we continue to call {\phi}) from {\overline{D(0,1)}} to {\overline{U}}, and the proof also shows that {\partial D(0,1)} maps to {\partial U}. As {\phi(\overline{D(0,1)})} is a compact subset of {\overline{U}} that contains {U}, it must surject onto {\overline{U}}. As both {\overline{D(0,1)}} and {\overline{U}} are compact Hausdorff spaces, we will now be done if we can show injectivity. The only way injectivity can fail is if there are two distinct points {\zeta,\omega} on {\partial D(0,1)} that map to the same point. Let {C} be the line segment connecting {\zeta} with {\omega}, then {\phi(C)} is a Jordan curve in {\overline{U}} that meets {\partial U} only at {\phi(\zeta) = \phi(\omega)}. {C} divides {\overline{D(0,1)}} into two regions; one of which must map to the interior of {\phi(C)}, which implies that there is an entire arc of {\partial D(0,1)} which maps to the single point {\phi(\zeta)=\phi(\omega)}. But then by the Schwarz reflection principle, {\phi} extends conformally across this arc and is constant in a non-isolated set, thus is constant everywhere by analytic continuation, which is absurd. This establishes the required injectivity. \Box

This has the following consequence. Define a Jordan quadrilateral to be the open region {Q} enclosed by a Jordan curve with four distinct marked points {p_1,p_2,p_3,p_4} on it in counterclockwise order, which we call the vertices of the quadrilateral. The arcs in {\partial Q} connecting {p_1} to {p_2} or {p_3} to {p_4} will be called the {a}-sides; the arcs connecting {p_2} to {p_3} or {p_4} to {p_1} will be called {b}-sides. (Thus for instance each cyclic permutation of the {p_1,p_2,p_3,p_4} vertices will swap the {a}-sides and {b}-sides, while keeping the interior region {Q} unchanged.) Key examples of Jordan quadrilaterals are the (Euclidean) rectangles, in which the vertices {p_1,\dots,p_4} are the usual corners of the rectangle, traversed counterclockwise. The {a}-sides then are line segments of some length {a}, and the {b}-sides are line segments of some length {b} that are orthogonal to the {a}-sides. A vertex-preserving conformal map from one Jordan quadrilateral {Q} to another {Q'} will be a conformal map that extends to a homeomorphism from {\overline{Q}} to {\overline{Q'}} that maps the corners of {Q} to the respective corners of {Q'} (in particular, {a}-sides get mapped to {a}-sides, and similarly for {b}-sides).

Exercise 21 Let {Q} be a Jordan quadrilateral with vertices {p_1,p_2,p_3,p_4}.

  • (i) Show that there exists {r > 1} and a conformal map {\phi: Q \rightarrow \mathbf{H}} to the upper half-plane {\mathbf{H} := \{ z: \mathrm{Im} z > 0 \}} that extends continuously to a homeomorphism {\phi: \overline{Q} \rightarrow \overline{\mathbf{H}}} and which maps {p_1,p_2,p_3,p_4} to {-r, -1, 1, r} respectively. (Hint: first map {p_1,p_2,p_3,p_4} to increasing elements of the real line, then use the intermediate value theorem to enforce {\phi(p_1)+\phi(p_4) = \phi(p_2)+\phi(p_3)}.)
  • (ii) Show that there is a vertex-preserving conformal map {\psi: Q \rightarrow R} from {Q} to a rectangle {R} (Hint: use Schwarz-Christoffel mapping.)
  • (iii) Show that the rectangle {R} in part (ii) is unique up to affine transformations. (Hint: if one has a conformal map between rectangles that preserves the vertices, extend it via repeated use of the Schwarz reflection principle to an entire map.)

This allows for the following definition: the conformal modulus {\mathrm{mod}(Q)} (or modulus for short, also called module in older literature) of a Jordan quadrilateral with vertices {p_1,p_2,p_3,p_4} is the ratio {b/a}, where {a,b} are the lengths of the {a}-sides and {b}-sides of a rectangle {R} that is conformal to {Q} in a vertex-preserving fashion. This is a number between {0} and {\infty}; each cyclic permutation of the vertices replaces the modulus with its reciprocal. It is clear from construction that the modulus of a Jordan quadrilateral is unaffected by vertex-preserving conformal transformations.
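For a quadrilateral already put in the normal form of Exercise 21(i), with vertices sent to {-r,-1,1,r}, the modulus can be computed explicitly. The sketch below assumes the classical fact (not proved in these notes) that the Schwarz-Christoffel integral {\int_0^z \frac{dt}{\sqrt{(1-t^2)(1-k^2t^2)}}} with {k = 1/r} maps the upper half-plane to a rectangle with corners {\pm K(k)} and {\pm K(k) + i K(\sqrt{1-k^2})}, where {K} is the complete elliptic integral of the first kind, together with the standard formula for {K} in terms of the arithmetic-geometric mean. The function names are ours.

```python
import math

def agm(x, y, steps=40):
    """Arithmetic-geometric mean of two non-negative numbers (converges quadratically)."""
    for _ in range(steps):
        x, y = 0.5 * (x + y), math.sqrt(x * y)
    return 0.5 * (x + y)

def modulus_half_plane(r):
    """Modulus b/a of the upper half-plane with vertices -r, -1, 1, r (requires r > 1).

    Under the assumed Schwarz-Christoffel map with k = 1/r, the a-sides [-r,-1] and
    [1,r] become vertical sides of length K(k') and the b-side [-1,1] becomes a
    horizontal side of length 2 K(k), where k' = sqrt(1 - k^2) and
    K(k) = pi / (2 * agm(1, k')).
    """
    k = 1.0 / r
    k_prime = math.sqrt(1.0 - k * k)
    return 2.0 * agm(1.0, k) / agm(1.0, k_prime)

if __name__ == "__main__":
    for r in (1.1, math.sqrt(2.0), 2.0, 10.0):
        print(r, modulus_half_plane(r))
```

The modulus decreases as {r} increases, which matches the picture: for {r} close to {1} the {a}-sides {[-r,-1]} and {[1,r]} are very short, so the equivalent rectangle is long in the {b} direction.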

Now we define quasiconformal maps. Informally, conformal maps are homeomorphisms that map infinitesimal circles to infinitesimal circles; quasiconformal maps are homeomorphisms that map infinitesimal circles to curves that differ from an infinitesimal circle by “bounded distortion”. However, for the purpose of setting up the foundations of the theory, it is slightly more convenient to work with rectangles instead of circles (it is easier to partition rectangles into subrectangles than disks into subdisks). We therefore introduce

Definition 22 Let {K \geq 1}. An orientation-preserving homeomorphism {\phi: U \rightarrow V} between two domains {U,V} in {{\bf C}} is said to be {K}-quasiconformal if one has {\mathrm{mod}(\phi(Q)) \leq K \mathrm{mod}(Q)} for every Jordan quadrilateral {Q} in {U}. (In these notes, we do not consider orientation-reversing homeomorphisms to be quasiconformal.)

Note that by cyclically permuting the vertices of {Q}, we automatically also obtain the inequality

\displaystyle  \frac{1}{\mathrm{mod}(\phi(Q))} \leq K \frac{1}{\mathrm{mod}(Q)}

or equivalently

\displaystyle  \frac{1}{K} \mathrm{mod}(Q) \leq \mathrm{mod}(\phi(Q))

for any Jordan quadrilateral. Thus it is not possible to have any {K}-quasiconformal maps for {K<1} (excluding the degenerate case when {U,V} are empty), and a map is {1}-quasiconformal if and only if it preserves the modulus. In particular, conformal maps are {1}-quasiconformal; we will shortly establish that the converse claim is also true. It is also clear from the definition that the inverse of a {K}-quasiconformal map is also {K}-quasiconformal, and the composition of a {K}-quasiconformal map and a {K'}-quasiconformal map is a {KK'}-quasiconformal map.
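As a simple illustration (checked here only on axis-parallel rectangles, which by itself does not establish quasiconformality), consider the stretch {\phi(x+iy) := Kx + iy} for some {K \geq 1}. If {R} is a rectangle whose {a}-sides have length {a} and are parallel to the imaginary axis, and whose {b}-sides have length {b} and are parallel to the real axis, then {\phi(R)} is a rectangle with side lengths {a} and {Kb}, so that

\displaystyle  \mathrm{mod}(\phi(R)) = \frac{Kb}{a} = K \mathrm{mod}(R).

If instead the {a}-sides of {R} are parallel to the real axis, then {\phi(R)} has side lengths {Ka} and {b}, and

\displaystyle  \mathrm{mod}(\phi(R)) = \frac{b}{Ka} = \frac{1}{K} \mathrm{mod}(R).

Thus both inequalities above can be attained with equality, and the constant {K} cannot be improved for this map.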

It is helpful to have an alternate characterisation of the modulus that does not explicitly mention conformal mapping:

Proposition 23 (Alternate definition of modulus) Let {Q} be a Jordan quadrilateral with vertices {p_1,p_2,p_3,p_4}. Then {\mathrm{Mod}(Q)} is the smallest quantity with the following property: for any Lebesgue measurable {\rho: Q \rightarrow [0,+\infty)} one can find a curve {\gamma} in {Q} connecting one {a}-side of {Q} to another, and which is locally rectifiable away from endpoints, such that

\displaystyle  \left(\int_\gamma \rho(z)\ |dz|\right)^2 \leq \mathrm{Mod}(Q) \int_Q \rho^2(z)\ dz \overline{dz}

where {\int_\gamma |dz|} denotes integration using the length element of {\gamma} (not to be confused with the contour integral {\int_\gamma\ dz}).

The reciprocal of this notion of modulus generalises to the concept of extremal length, which we will not develop further here.

Proof: Observe from the change of variables formula that if {\phi: Q \rightarrow Q'} is a vertex-preserving conformal mapping between Jordan quadrilaterals {Q,Q'}, and {\gamma} is a locally rectifiable curve connecting one {a}-side of {Q} to another, then {\phi \circ \gamma} is a locally rectifiable curve connecting one {a}-side of {Q'} to another, with

\displaystyle  \int_{\phi \circ \gamma} \rho \circ \phi^{-1}(w) \ |dw| = \int_\gamma \rho(z) |\phi'(z)|\ |dz|

and

\displaystyle  \int_{Q'} |\rho \circ \phi^{-1}(w)|^2\ dw \overline{dw} = \int_Q |\rho(z)|^2 |\phi'(z)|^2\ dz \overline{dz}.

As a consequence, if the proposition holds for {Q} it also holds for {Q'}. Thus we may assume without loss of generality that {Q} is a rectangle, which we may normalise to be {\{ x+iy: 0 \leq y \leq 1; 0 \leq x \leq M \}} with vertices {0, i, M+i, M}, so that the modulus is {M}. For any measurable {\rho: Q \rightarrow [0,+\infty)}, we have from Cauchy-Schwarz and Fubini’s theorem that

\displaystyle  \int_0^1 \left(\int_0^M \rho(x+iy)\ dx\right)^2 dy \leq M \int_0^1 \int_0^M \rho^2(x+iy)\ dx dy

\displaystyle  = M \int_Q \rho^2(z)\ dz \overline{dz}

and hence by the pigeonhole principle there exists {y} such that

\displaystyle  \left(\int_0^M \rho(x+iy)\ dx\right)^2 \leq M \int_Q \rho^2(z)\ dz \overline{dz}.

On the other hand, if we set {\rho=1}, then {\int_Q \rho^2(z)\ dz \overline{dz} = M}, and for any curve {\gamma} connecting the {a}-side from {0} to {i} to the {a}-side from {M} to {M+i}, we have

\displaystyle  \int_\gamma \rho\ |dz| \geq \left| \int_\gamma \rho\ dx \right| = M.

Thus {M} is the best constant with the required property, proving the claim. \Box
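The first half of this argument (Cauchy-Schwarz on each horizontal line, then pigeonhole in {y}) can be mirrored numerically on the normalised rectangle. The following Python sketch uses midpoint Riemann sums on a grid; the function name, the grid sizes, and the sample weight {\rho(x+iy) = 1 + xy} are our own illustrative choices.

```python
def best_horizontal_line(rho, M, nx=200, ny=200):
    """Find the horizontal segment {0 <= x <= M} at height y with the smallest rho-length.

    Returns (y, (rho-length of that segment)^2, M * integral of rho^2 over Q),
    all computed with midpoint sums on an nx-by-ny grid over [0, M] x [0, 1].
    """
    dx, dy = M / nx, 1.0 / ny
    xs = [(i + 0.5) * dx for i in range(nx)]
    ys = [(j + 0.5) * dy for j in range(ny)]
    # rho-length of each horizontal segment
    lengths = [sum(rho(x, y) for x in xs) * dx for y in ys]
    # integral of rho^2 over the rectangle
    energy = sum(rho(x, y) ** 2 for y in ys for x in xs) * dx * dy
    j = min(range(ny), key=lambda idx: lengths[idx])
    return ys[j], lengths[j] ** 2, M * energy

if __name__ == "__main__":
    y, lhs, rhs = best_horizontal_line(lambda x, y: 1.0 + x * y, M=2.0)
    print(y, lhs, rhs, lhs <= rhs)
    # With rho = 1 both sides equal M^2, matching the sharpness part of the proof.
    print(best_horizontal_line(lambda x, y: 1.0, M=2.0))
```

The inequality between the last two returned values holds for the discrete sums for the same reason as in the proof: Cauchy-Schwarz on each row, followed by the observation that the minimum over rows is at most the average.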

Here are some quick and useful consequences of this characterisation:

Exercise 24 (Superadditivity)

  • (i) If {Q_1, Q_2} are disjoint Jordan quadrilaterals that share a common {a}-side, and which can be glued together along this side to form a new Jordan quadrilateral {Q_1 \cup Q_2}, show that {\mathrm{mod}(Q_1 \cup Q_2) \geq \mathrm{mod}(Q_1) + \mathrm{mod}(Q_2)}. If equality occurs, show that after conformally mapping {Q_1 \cup Q_2} to a rectangle (in a vertex preserving fashion), {Q_1}, {Q_2} are mapped to subrectangles (formed by cutting the original parallel to the {a}-side).
  • (ii) If {Q_1, Q_2} are disjoint Jordan quadrilaterals that share a common {b}-side, and which can be glued together along this side to form a new Jordan quadrilateral {Q_1 \cup Q_2}, show that {\frac{1}{\mathrm{mod}(Q_1 \cup Q_2)} \geq \frac{1}{\mathrm{mod}(Q_1)} + \frac{1}{\mathrm{mod}(Q_2)}}. If equality occurs, show that after conformally mapping {Q_1 \cup Q_2} to a rectangle (in a vertex preserving fashion), {Q_1}, {Q_2} are mapped to subrectangles (formed by cutting the original parallel to the {b}-side).
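As a sanity check (not a solution to the exercise), consider the model case of Euclidean rectangles. If {Q_1, Q_2} are rectangles whose {a}-sides have a common length {a} and whose {b}-sides have lengths {b_1, b_2}, glued along a common {a}-side, then {Q_1 \cup Q_2} is a rectangle with {a}-sides of length {a} and {b}-sides of length {b_1+b_2}, so that

\displaystyle  \mathrm{mod}(Q_1 \cup Q_2) = \frac{b_1+b_2}{a} = \mathrm{mod}(Q_1) + \mathrm{mod}(Q_2).

Similarly, if the {b}-sides have a common length {b} and the {a}-sides have lengths {a_1,a_2}, then gluing along a common {b}-side gives

\displaystyle  \frac{1}{\mathrm{mod}(Q_1 \cup Q_2)} = \frac{a_1+a_2}{b} = \frac{1}{\mathrm{mod}(Q_1)} + \frac{1}{\mathrm{mod}(Q_2)}.

So equality holds in both parts for rectangles cut into subrectangles, consistent with the equality cases described in the exercise.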

Exercise 25 (Rengel’s inequality) Let {Q} be a Jordan quadrilateral of area {A}, let {b} be the shortest (Euclidean) distance between a point on one {a}-side and a point on the other {a}-side, and similarly let {a} be the shortest (Euclidean) distance between a point on one {b}-side and a point on the other {b}-side. Show that

\displaystyle  \frac{b^2}{A} \leq \mathrm{Mod}(Q) \leq \frac{A}{a^2}

and that equality in either case occurs if and only if {Q} is a rectangle.

Exercise 26 (Continuity from below) Suppose {Q_n} is a sequence of Jordan quadrilaterals which converge to another Jordan quadrilateral {Q}, in the sense that the vertices of {Q_n} converge to their respective counterparts in {Q}, each {a}-side in {Q_n} converges (in the Hausdorff sense) to the corresponding {a}-side of {Q}, and similarly for the {b}-sides. Suppose also that {Q_n \subset Q} for all {n}. Show that {\mathrm{Mod}(Q_n)} converges to {\mathrm{Mod}(Q)}. (Hint: map {Q} to a rectangle and use Rengel’s inequality.)

Proposition 27 (Local quasiconformality implies quasiconformality) Let {K \geq 1}, and let {\phi: U \rightarrow V} be an orientation-preserving homeomorphism between complex domains {U,V} which is locally {K}-quasiconformal in the sense that for every {z_0 \in U} there is a neighbourhood {U_{z_0}} of {z_0} in {U} such that {\phi} is {K}-quasiconformal from {U_{z_0}} to {\phi(U_{z_0})}. Then {\phi} is {K}-quasiconformal.

Proof: We need to show that {\mathrm{Mod}(\phi(Q)) \leq K \mathrm{Mod}(Q)} for any Jordan quadrilateral {Q} in {U}. The hypothesis gives this claim for all quadrilaterals in a sufficiently small neighbourhood of any point in {U}. For any natural number {n}, we can subdivide a rectangle {R} conformally equivalent (in a vertex-preserving fashion) with {Q} into {n^2} subrectangles of the same modulus; for {n} large enough, a compactness argument then shows that {Q} can be subdivided into {n^2} Jordan quadrilaterals of the same modulus, that are small enough for the hypothesis to apply to each of these quadrilaterals. The claim then follows from many applications of Exercise 24. \Box

We can now reverse the implication that conformal maps are {1}-conformal:

Proposition 28 Every {1}-conformal map {\phi: U \rightarrow V} is conformal.

Proof: By covering {U} by quadrilaterals we may assume without loss of generality that {U} (and hence also {\phi(U)=V}) is a Jordan quadrilateral; by composing on left and right with conformal maps we may assume that {U} and {V} are rectangles. As {\phi} is {1}-conformal, the rectangles have the same modulus, so after a further affine transformation we may assume that {U=V} is the rectangle with vertices {0, i, M+i, M} for some modulus {M}. If one subdivides {U} into two rectangles along an intermediate vertical line segment connecting say {x} to {x+i} for some {0 < x < M}, the moduli of these rectangles are {x} and {M-x}. Applying the {1}-conformal map and the converse portion of Exercise 24, we conclude that these rectangles must be preserved by {\phi}, thus {\phi} preserves the {x} coordinate. Similarly {\phi} preserves the {y} coordinate, and is therefore the identity map, which is of course conformal. \Box

Next, we can give a simple criterion for quasiconformality in the continuously differentiable case:

Theorem 29 Let {K \geq 1}, and let {\phi: U \rightarrow V} be an orientation-preserving diffeomorphism (a continuously (real) differentiable homeomorphism whose derivative is always nondegenerate) between complex domains {U,V}. Then the following are equivalent:

  • (i) {\phi} is {K}-quasiconformal.
  • (ii) For any point {z_0 \in U} and phases {v,w \in S^1 := \{ z \in {\bf C}: |z|=1\}}, one has

    \displaystyle  |D_v \phi(z_0)| \leq K|D_w \phi(z_0)|

    where {D_v \phi(z_0) := \frac{\partial}{\partial t} \phi(z_0+tv)|_{t=0}} denotes the directional derivative.

Proof: Let us first show that (ii) implies (i). Let {Q} be a Jordan quadrilateral in {U}; we have to show that {\mathrm{Mod}(\phi(Q)) \leq K \mathrm{Mod}(Q)}. From the chain rule one can check that condition (ii) is unchanged by composing {\phi} with conformal maps on the left or right, so we may assume without loss of generality that {Q} and {\phi(Q)} are rectangles; in fact we may normalise {Q} to have vertices {0, i, T+i, T} and {\phi(Q)} to have vertices {0, i, T'+i,T'} where {T = \mathrm{Mod}(Q)} and {T' = \mathrm{Mod}(\phi(Q))}. From the change of variables formula (and the singular value decomposition), followed by Fubini’s theorem and Cauchy-Schwarz, we have

\displaystyle  T' = \int_{\phi(Q)} dw d\overline{w}

\displaystyle  = \int_Q \mathrm{det}(D\phi)(z)\ dz d\overline{z}

\displaystyle  = \int_Q \max_{v \in S^1} |D_v \phi(z)| \min_{w \in S^1} |D_w \phi(z)|\ dz d\overline{z}

\displaystyle  \geq \int_Q \frac{1}{K} \left|\frac{\partial}{\partial x} \phi(z)\right|^2\ dz d\overline{z}

\displaystyle = \frac{1}{K} \int_0^1 \int_0^T \left|\frac{\partial}{\partial x} \phi(x+iy)\right|^2\ dx dy

\displaystyle  \geq \frac{1}{K T} \int_0^1 \left|\int_0^T \frac{\partial}{\partial x} \phi(x+iy)\ dx\right|^2\ dy

\displaystyle  = \frac{1}{K T} \int_0^1 (T')^2\ dy

and hence {T' \leq K T}, giving the claim.

Now we show that (i) implies (ii). Suppose for contradiction that (ii) failed; then by the singular value decomposition we can find {z_0 \in U} and a phase {v \in S^1} such that

\displaystyle  D_{iv} \phi(z_0) = i \lambda D_v \phi(z_0)

for some real {\lambda} with {\lambda > K}. After translations and rotations we may normalise so that

\displaystyle  \phi(0) = 0; \frac{\partial}{\partial x} \phi(0) = 1; \frac{\partial}{\partial y} \phi(0) = i\lambda.

But then from Rengel’s inequality and Taylor expansion one sees that {\phi} will map a small square with vertices {0, -\varepsilon, -\varepsilon+i\varepsilon, i\varepsilon} to a quadrilateral of modulus converging to {\lambda} as {\varepsilon \rightarrow 0}, contradicting (i). \Box

Exercise 30 Show that the conditions (i), (ii) in the above theorem are also equivalent to the bound

\displaystyle  \left|\frac{\partial}{\partial \overline{z}} \phi(z_0)\right| \leq \frac{K-1}{K+1} \left|\frac{\partial }{\partial z} \phi(z_0)\right|

for all {z_0 \in U}, where

\displaystyle  \frac{\partial }{\partial z} := \frac{1}{2} ( \frac{\partial }{\partial x} - i \frac{\partial }{\partial y}); \quad \frac{\partial }{\partial \overline{z}} := \frac{1}{2} ( \frac{\partial }{\partial x} + i \frac{\partial }{\partial y})

are the Wirtinger derivatives.
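
To illustrate how condition (ii) of Theorem 29 relates to the bound in Exercise 30, consider the real-linear map {\phi(z) = az + b\overline{z}}, for which {\frac{\partial}{\partial z} \phi = a}, {\frac{\partial}{\partial \overline{z}} \phi = b} and {D_v \phi = av + b\overline{v}}. The following sketch samples the phases {v} numerically; the coefficients and the helper name `dilatation_data` are hypothetical choices made only for illustration.

```python
import cmath
import math

def dilatation_data(a, b, samples=3600):
    """For phi(z) = a*z + b*conj(z), sample |D_v phi| = |a*v + b*conj(v)|
    over phases v on the unit circle and return (max, min)."""
    mags = []
    for k in range(samples):
        v = cmath.exp(2j * math.pi * k / samples)
        mags.append(abs(a * v + b * v.conjugate()))
    return max(mags), min(mags)

a, b = 2 + 1j, 0.5 - 0.3j  # hypothetical coefficients with |b| < |a|
longest, shortest = dilatation_data(a, b)
K = longest / shortest               # smallest K for which (ii) holds
beltrami_ratio = abs(b) / abs(a)     # |d phi / d zbar| / |d phi / dz|
print(K, beltrami_ratio, (K - 1) / (K + 1))
```

Up to sampling error, the last two printed quantities should agree, since the maximum and minimum of {|D_v \phi|} are {|a|+|b|} and {|a|-|b|} respectively.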

We now prove a technical regularity result on quasiconformal maps.

Proposition 31 (Absolute continuity on lines) Let {\phi: U \rightarrow V} be a {K}-quasiconformal map between two complex domains {U,V} for some {K}. Suppose that {U} contains the closed rectangle with vertices {0, i, T+i, T}. Then for almost every {0 \leq y \leq 1}, the map {x \mapsto \phi(x+iy)} is absolutely continuous on {[0,T]}.

Proof: For each {y}, let {A(y)} denote the area of the image {\{ \phi(x+iy'): 0 \leq x \leq T; 0 \leq y' \leq y\}} of the rectangle with vertices {0, iy, T+iy, T}. This is a bounded monotone function on {[0,1]} and is hence differentiable almost everywhere. It will thus suffice to show that the map {x \mapsto \phi(x+iy)} is absolutely continuous on {[0,T]} whenever {y \in (0,1)} is a point of differentiability of {A}.

Let {\varepsilon > 0}, and let {[x_1,x'_1],\dots,[x_m,x'_m]} be disjoint intervals in {[0,T]} of total length {\sum_{j=1}^m x'_j-x_j \leq \varepsilon}. To show absolute continuity, we need a bound on {\sum_{j=1}^m |\phi(x'_j) - \phi(x_j)|} (where we abbreviate {\phi(x+iy)} as {\phi(x)}) that goes to zero as {\varepsilon \rightarrow 0} uniformly in the choice of intervals. Let {\delta>0} be a small number (that can depend on the intervals), and for each {j=1,\dots,m} let {R_j} be the rectangle with vertices {x_j+iy}, {x_j+i(y+\delta)}, {x'_j+i(y+\delta)}, {x'_j+iy}. This rectangle has modulus {(x'_j-x_j)/\delta}, and hence {\phi(R_j)} has modulus at most {K (x'_j-x_j)/\delta}. On the other hand, by Rengel’s inequality this modulus is at least {|\phi(x'_j)-\phi(x_j)-o(1)|^2 / \mathrm{Area}(\phi(R_j))}, where {o(1)} is a quantity that goes to zero as {\delta \rightarrow 0} (holding the intervals fixed). We conclude that

\displaystyle  |\phi(x'_j)-\phi(x_j)|^2 \leq \frac{K}{\delta} (x'_j-x_j) \mathrm{Area}(\phi(R_j)) + o(1).

On the other hand, we have

\displaystyle  \sum_{j=1}^m \mathrm{Area}(\phi(R_j)) \leq A(y+\delta) - A(y) = (A'(y)+o(1)) \delta.

By Cauchy-Schwarz, we thus have

\displaystyle  (\sum_{j=1}^m |\phi(x'_j)-\phi(x_j)|)^2 \leq K A'(y) \sum_{j=1}^m (x'_j-x_j) + o(1);

sending {\delta \rightarrow 0}, we conclude

\displaystyle  \sum_{j=1}^m |\phi(x'_j)-\phi(x_j)| \leq K^{1/2} A'(y)^{1/2} \varepsilon^{1/2}

giving the claim. \Box

Exercise 32 Let {\phi: U \rightarrow V} be a {K}-quasiconformal map between two complex domains {U,V} for some {K}. Suppose that there is a closed set {S \subset {\bf C}} of Lebesgue measure zero such that {\phi} is conformal on {U \backslash S}. Show that {\phi} is {1}-conformal (and hence conformal, by Proposition 28). (Hint: Arguing as in the proof of Theorem 29, it suffices to show that if {\phi} maps the rectangle with vertices {0, i, T+i, T} to the rectangle with vertices {0, i, T'+i, T'}, then {T' \leq T}. Repeat the proof of that theorem, using absolute continuity on lines (Proposition 31) at a crucial juncture to justify using the fundamental theorem of calculus.)

Recall Hurwitz’s theorem that the locally uniform limit of conformal maps is either conformal or constant. It turns out there is a similar result for quasiconformal maps. We will just prove a weak version of the result (see Theorem II.5.5 of Lehto-Virtanen for the full statement):

Theorem 33 Let {K \geq 1}, and let {\phi_n: U \rightarrow V_n} be a sequence of {K}-quasiconformal maps that converge locally uniformly to an orientation-preserving homeomorphism {\phi: U \rightarrow V}. Then {\phi} is also {K}-quasiconformal.

It is important for this theorem that we do not insist that quasiconformal maps are necessarily differentiable. Indeed for applications to circle packing we will be working with maps that are only piecewise smooth, or possibly even worse, even though at the end of the day we will recover a smooth conformal map in the limit.

Proof: Let {Q} be a Jordan quadrilateral in {U}. We need to show that {\mathrm{Mod}(\phi(Q)) \leq K \mathrm{Mod}(Q)}. By restricting {U} we may assume {U=Q}. By composing {\phi, \phi_n} with a conformal map we may assume that {Q} is a rectangle. We can write {Q} as the increasing limit of rectangles {Q_m} of the same modulus, then for any {n,m} we have {\mathrm{Mod}(\phi_n(Q_m)) \leq K \mathrm{Mod}(Q)}. By choosing {n_m} going to infinity sufficiently rapidly, {\phi_{n_m}(Q_m)} stays inside {\phi(Q)} and converges to {\phi(Q)} in the sense of Exercise 26, and the claim then follows from that exercise. \Box

Another basic property of conformal mappings (a consequence of Morera’s theorem) is that they can be glued along a common edge as long as the combined map is also a homeomorphism; this fact underlies for instance the Schwarz reflection principle. We have a quasiconformal analogue:

Theorem 34 Let {K \geq 1}, and let {\phi: U \rightarrow V} be an orientation-preserving homeomorphism. Let {C} be a real analytic (and topologically closed) contour that lies in {U} except possibly at the endpoints. If {\phi: U \backslash C \rightarrow \phi(U \backslash C)} is {K}-quasiconformal, then {\phi: U \rightarrow V} is {K}-quasiconformal.

We will generally apply this theorem in the case when {C} disconnects {U} into two components, in which case {\phi} can be viewed as the gluing of the restrictions of this map to the two components.

Proof: As in the proof of the previous theorem, we may take {U} to be a rectangle {Q}, and it suffices (after cyclically permuting vertices) to show that {\mathrm{Mod}(\phi(Q)) \geq \frac{1}{K} \mathrm{Mod}(Q)}. We may normalise {Q} to have vertices {0, i, M+i, M} where {M = \mathrm{Mod}(Q)}. The real analytic contour {C} meets {Q} in a finite number of curves, which can be broken up further into a finite number of horizontal line segments and graphs {\{ f_j(y) + iy: y \in I_j\}} for various closed intervals {I_j \subset [0,1]} and real analytic {f_j: I_j \rightarrow [0,M]}. For any {\varepsilon>0}, we can then use the uniform continuity of the {f_j} to subdivide {Q} into a finite number of rectangles {Q_k= \{ x+iy: y \in J_k, 0 \leq x \leq M \}} where on each such rectangle, {C} meets the interior of {Q_k} in a bounded number of graphs {\{ f_j(y) +iy: y \in J_k\}} whose horizontal variation is {O(\varepsilon)}. This subdivides {Q_k} into a bounded number of Jordan quadrilaterals {Q_{k,j}}; by Rengel’s inequality we see that the moduli of these quadrilaterals add up to {(1+O(\varepsilon)) \mathrm{Mod}(Q_k)}. Applying {\phi} (which is {K}-quasiconformal on the {Q_{k,j}}) and then Exercise 24 we conclude that {\mathrm{Mod}(\phi(Q_k)) \geq (1+O(\varepsilon)) \frac{1}{K} \mathrm{Mod}(Q_k)}, and then by Exercise 24 again we conclude that {\mathrm{Mod}(\phi(Q)) \geq (1+O(\varepsilon)) \frac{1}{K} \mathrm{Mod}(Q)}; sending {\varepsilon \rightarrow 0} we obtain the claim. \Box

It will be convenient to study analogues of the modulus when quadrilaterals are replaced by generalisations of annuli. We define a ring domain to be a region bounded between two Jordan curves {C_1, C_2}, where {C_1} (the inner boundary) is contained inside the interior of {C_2} (the outer boundary). For instance, the annulus {\{ z: r < |z-z_0| < R \}} is a ring domain for any {z_0 \in {\bf C}} and {0 < r < R < \infty}. In the spirit of Proposition 23, define the modulus {\mathrm{Mod}(A)} of a ring domain {A} to be the supremum of all the quantities {M} with the following property: for any Lebesgue measurable {\rho: A \rightarrow [0,+\infty)} one can find a rectifiable curve {\gamma} in {A} winding once around the inner boundary {C_1}, such that

\displaystyle  \left(\int_\gamma \rho(z)\ |dz|\right)^2 \leq \frac{2\pi}{M} \int_A \rho^2(z)\ dz \overline{dz}.

We record some basic properties of this modulus:

Exercise 35

Exercise 36 Show that every ring domain is conformal to an annulus. (There are several ways to proceed here. One is to start by using Perron’s method to construct a harmonic function that is {1} on one of the boundaries of the annulus and {0} on the other. Another is to apply a logarithm map to transform the annulus to a simply connected domain with a “parabolic” group of discrete translation symmetries, use the Riemann mapping theorem to map this to a disc, and use the uniqueness aspect of the Riemann mapping theorem to figure out what happens to the symmetry.) Use this to give an alternate definition of the modulus of a ring domain that is analogous to the original definition of the modulus of a quadrilateral.

As a basic application of this concept we have the fact that the complex plane cannot be quasiconformal to any proper subset:

Proposition 37 Let {\phi: {\bf C} \rightarrow V} be a {K}-quasiconformal map for some {K \geq 1}; then {V = {\bf C}}.

Proof: As {V} is homeomorphic to {{\bf C}}, it is simply connected. Thus, if we assume for contradiction that {V \neq {\bf C}}, then by the Riemann mapping theorem {V} is conformal to {D(0,1)}, so we may assume without loss of generality that {V = D(0,1)}.

By Exercise 35(i), the moduli {\log R} of the annuli {\{ z: 1 \leq |z| \leq R \}} go to infinity as {R \rightarrow \infty}, and hence (by Exercise 35(ii), applied to {\phi^{-1}}) the moduli of the ring domains {\{ \phi(z): 1 \leq |z| \leq R \}} must also go to infinity. However, as the inner boundary of this domain is fixed and the outer one is bounded, all these ring domains can be contained inside a common annulus, contradicting Exercise 35(iii). \Box

For some further applications of the modulus of ring domains, we need the following result of Grötzsch:

Theorem 38 (Grötzsch modulus theorem) Let {0 < r < 1}, and let {A} be the ring domain formed from {D(0,1)} by deleting the line segment from {0} to {r}. Let {A'} be another ring domain contained in {D(0,1)} whose inner boundary encloses both {0} and {r}. Then {\mathrm{Mod}(A') \leq \mathrm{Mod}(A)}.

Proof: Let {R := \exp(\mathrm{Mod}(A))}, then by Exercise 36 we can find a conformal map {f} from {A} to the annulus {\{ z: 1 \leq |z| \leq R \}}. As {A} is symmetric around the real axis, and the only conformal automorphisms of the annulus that preserve the inner and outer boundaries are rotations (as can be seen for instance by using the Schwarz reflection principle repeatedly to extend such automorphisms to an entire function of linear growth), we may assume that {f} obeys the symmetry {f(\overline{z}) = \overline{f(z)}}. Let {\rho: A \rightarrow {\bf R}^+} be the function {\rho := |f'/f|}, then {\rho} is symmetric around the real axis. One can view {\rho} as a measurable function on {A'}; from the change of variables formula we have

\displaystyle  \int_{A'} \rho^2\ dz d\overline{z} \leq \int_{A} \rho^2\ dz d\overline{z} = \int_{1 \leq |z| \leq R} \frac{1}{|z|^2}\ dz d\overline{z} = 2\pi \log R,

so in particular {\rho} is square-integrable. Our task is to show that {\mathrm{Mod}(A') \leq \log R}; by the definition of modulus, it suffices to show that

\displaystyle  \left(\int_\gamma \rho\ |dz|\right)^2 \geq \frac{2\pi}{\log R} \int_{A'} \rho^2\ dz d\overline{z}

for any rectifiable curve {\gamma} that goes once around {A'}, and thus once around {0} and {r} in {D(0,1)}. By a limiting argument we may assume that {\gamma} is polygonal. By repeatedly reflecting around the real axis whenever {\gamma} crosses the line segment between {0} and {r}, we may assume that {\gamma} does not actually cross this segment, and then by perturbation we may assume it is contained in {A}. But then by change of variables we have

\displaystyle  \int_\gamma \rho\ |dz| = \int_{f(\gamma)} \frac{|dz|}{|z|} \geq \left|\int_{f(\gamma)} \frac{dz}{z}\right| = 2\pi

by the Cauchy integral formula, and the claim follows. \Box

Exercise 39 Let {\phi_n: U \rightarrow V_n} be a sequence of {K}-quasiconformal maps for some {K \geq 1}, such that all the {V_n} are uniformly bounded. Show that the {\phi_n} are a normal family, that is to say every sequence in {\phi_n} contains a subsequence that converges locally uniformly. (Hint: use an argument similar to that in the proof of Proposition 37, combined with Theorem 38, to establish some equicontinuity of the {\phi_n}.)

There are many further basic properties of the conformal modulus for both quadrilaterals and annuli; we refer the interested reader to Lehto-Virtanen for details.

— 3. Rigidity of the hexagonal circle packing —

We return now to circle packings. In order to understand finite circle packings, it is convenient (in order to use some limiting arguments) to consider some infinite circle packings. A basic example of an infinite circle packing is the regular hexagonal circle packing

\displaystyle  {\mathcal H} := ( z_0 + S^1 )_{z_0 \in \Gamma}

where {\Gamma} is the hexagonal lattice

\displaystyle  \Gamma := \{ 2 n + 2 e^{2\pi i/3} m: n,m \in {\bf Z} \}

and {z_0 + S^1 := \{ z_0 + e^{i \theta}: \theta \in {\bf R} \}} is the unit circle centred at {z_0}. This is clearly an (infinite) circle packing, with two circles {z_0+S^1, z_1+S^1} in this packing (externally) tangent if and only if their centres {z_0, z_1} differ by twice a sixth root of unity. Between any three mutually tangent circles in this packing is an open region that we will call an interstice. It is inscribed in a dual circle that meets the three original circles orthogonally and can be computed to have radius {1/\sqrt{3}}; the interstice can then be viewed as a hyperbolic triangle in this dual circle in which all three sides have infinite length. Let {I} denote the union of all the interstices.
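
The tangency pattern and the radius {1/\sqrt{3}} of the dual circle can be checked numerically. The sketch below builds a finite patch of {\Gamma} and uses Heron's formula for the circumradius of the three tangency points; the helper names and the patch size are illustrative assumptions.

```python
import cmath
import math

OMEGA = cmath.exp(2j * math.pi / 3)

def lattice(size):
    """Finite patch {2n + 2*OMEGA*m : |n|, |m| <= size} of the lattice Gamma."""
    return [2 * n + 2 * OMEGA * m
            for n in range(-size, size + 1)
            for m in range(-size, size + 1)]

def tangent_neighbours(z0, points, tol=1e-9):
    """Centres whose unit circles are externally tangent to z0 + S^1."""
    return [z for z in points if abs(abs(z - z0) - 2) < tol]

def circumradius(p, q, r):
    """Circumradius of the triangle pqr via Heron's formula."""
    a, b, c = abs(q - r), abs(p - r), abs(p - q)
    s = (a + b + c) / 2
    area = math.sqrt(s * (s - a) * (s - b) * (s - c))
    return a * b * c / (4 * area)

points = lattice(3)
nbrs = tangent_neighbours(0, points)

# Three mutually tangent circles, centred at 0, 2 and 2 + 2*OMEGA.
c0, c1, c2 = 0, 2, 2 + 2 * OMEGA
tangency = [(c0 + c1) / 2, (c1 + c2) / 2, (c0 + c2) / 2]
dual_radius = circumradius(*tangency)
dual_centre = (c0 + c1 + c2) / 3
print(len(nbrs), dual_radius, 1 / math.sqrt(3))
```

Two circles with centres {c, c'} and radii {r, r'} meet orthogonally exactly when {|c-c'|^2 = r^2 + (r')^2}, which gives a quick way to confirm the orthogonality claim for the dual circle as well.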

For every circle {z_0 + S^1} in this circle packing, we can form the inversion map {\iota_{z_0}: {\bf C} \cup \{\infty\} \rightarrow {\bf C} \cup \{\infty\}} across this circle on the Riemann sphere, defined by setting

\displaystyle  \iota_{z_0}( z_0 + re^{i\theta} ) := z_0 + \frac{1}{r} e^{i\theta}

for {0 < r < \infty} and {\theta \in {\bf R}}, with the convention that {\iota_{z_0}} maps {z_0} to {\infty} and vice versa. These are conjugates of Möbius transformations; they preserve the circle {z_0+S^1} and swap the interior with the exterior. Let {G} be the group of transformations of {{\bf C} \cup \{\infty\}} generated by these inversions {\iota_{z_0}}; this is essentially a Schottky group (except for the fact that we are allowing for conjugate Möbius transformations in addition to ordinary Möbius transformations). Let {GI := \bigcup_{g \in G} g(I)} be the union of the images of the interstitial regions {I} under all of these transformations. We have the following basic fact:
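
In complex notation the inversion can be written as {\iota_{z_0}(z) = z_0 + 1/\overline{(z-z_0)}} for {z \neq z_0, \infty}. Here is a minimal sketch of this formula, with the function name `inversion` and the sample points chosen purely for illustration; the points {z_0} and {\infty} are not handled.

```python
import cmath

def inversion(z0, z):
    """Inversion across the unit circle centred at z0:
    z0 + r e^{i theta} -> z0 + (1/r) e^{i theta}, for z != z0."""
    w = z - z0
    return z0 + 1 / w.conjugate()

z0 = 2 + 2j                      # hypothetical centre
z = 0.3 - 1.1j                   # hypothetical point outside the circle
on_circle = z0 + cmath.exp(0.7j)

print(abs(inversion(z0, inversion(z0, z)) - z))   # involution defect
print(abs(inversion(z0, on_circle) - on_circle))  # circle is fixed pointwise
```

Both printed quantities should vanish up to rounding error, reflecting that {\iota_{z_0}} is an involution that fixes {z_0+S^1} pointwise.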

Proposition 40 {{\bf C} \backslash GI} has Lebesgue measure zero.

Proof: (Sketch) I thank Mario Bonk for this argument. Let {G {\mathcal H}} denote all the circles formed by applying an element of {G} to the circles in {{\mathcal H}}. If {z} lies in {{\bf C} \backslash GI}, then it lies inside one of the circles in {{\mathcal H}}, and then after inverting through that circle it lies in another circle in {{\mathcal H}}, and so forth; undoing the inversions, we conclude that {z} lies in an infinite number of nested circles. Let {C} be one of these circles. {GI} contains a union of six interstices bounded by {C} and a cycle of six circles internally tangent to {C} and consecutively externally tangent to each other. A slight modification of the ring lemma shows that the six internal circles have radii comparable to that of {C}, and hence {GI} has density {\gg 1} in the disk enclosed by {C}, which also contains {z}. The ring lemma also shows that the radius of each circle in the nested sequence is at most {1-c} times the one enclosing it for some absolute constant {c>0}, so in particular the disks shrink to zero in size. Thus {z} cannot be a point of density of {{\bf C} \backslash GI}, and hence by the Lebesgue density theorem this set has measure zero. \Box

Next we need two simple geometric lemmas, due to Rodin and Sullivan.

Lemma 41 (Ring lemma) Let {C} be a circle that is externally tangent to a chain {C_1,\dots,C_n} of circles with disjoint interiors, with each {C_i} externally tangent to {C_{i+1}} (with the convention {C_{n+1}=C_1}). Then there is a constant {c_n>0} depending only on {n}, such that the radius of each of the {C_i} is at least {c_n} times the radius of {C}.

Proof: Without loss of generality we may assume that {C} has radius {1} and that the radius {r_1} of {C_1} is maximal among the radii {r_i} of the {C_i}. As the polygon connecting the centers of the {C_i} has to contain {C}, we see that {r_1 \gg_n 1}. This forces {r_2 \gg_n 1}, for if {r_2} was too small then {C_2} would be so deep in the cuspidal region between {C} and {C_1} that it would not be possible for {C_3, C_4,\dots C_n} to escape this cusp and go around {C_1}. A similar argument then gives {r_3 \gg_n 1}, and so forth, giving the claim. \Box

Lemma 42 (Length-area lemma) Let {n \geq 1}, and let {{\mathcal H}_n} consist of those circles in {{\mathcal H}} that can be connected to the circle {0 + S^1} by a path of length at most {n} (going through consecutively tangent circles in {{\mathcal H}}). Let {{\mathcal C}_n} be a circle packing with the same nerve as {{\mathcal H}_n} that is contained in a disk of radius {R}. Then the circle {C_0} in {{\mathcal C}_n} associated to the circle {0+S^1} in {{\mathcal H}_n} has radius {O(\frac{R}{\log^{1/2} n})}.

The point of this bound is that when {R} is bounded and {n \rightarrow \infty}, the radius of {C_0} is forced to go to zero.

Proof: We can surround {0+S^1} by {n} disjoint chains {(C_{j,i})_{i=1}^{6j}, j=1,\dots,n} of consecutively tangent circles {z_{j,i}+S^1}, {i=1,\dots, 6j} in {{\mathcal H}_n}. Each circle is associated to a corresponding circle in {{\mathcal C}_n} of some radius {r_{j,i}}. The total area {\sum_{j=1}^n \sum_{i=1}^{6j} \pi r_{j,i}^2} of these circles is at most the area {\pi R^2} of the disk of radius {R}. Since {\sum_{j=1}^n \frac{1}{j} \gg \log n}, this implies from the pigeonhole principle that there exists {j} for which

\displaystyle  \sum_{i=1}^{6j} \pi r_{j,i}^2 \ll \frac{R^2}{j \log n}

and hence by Cauchy-Schwarz

\displaystyle  \sum_{i=1}^{6j} r_{j,i} \ll \frac{R}{\log^{1/2} n}.

Connecting the centers of these circles, we obtain a polygonal path of length {O( \frac{R}{\log^{1/2} n})} that goes around {C_0}, and the claim follows. \Box
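
The pigeonhole and Cauchy-Schwarz steps in this proof are purely arithmetic, and can be illustrated with the following minimal sketch. The radii here are random numbers rescaled so that the sum of their squares is {R^2}; they are not the radii of an actual circle packing, and the helper name `shortest_chain` is hypothetical.

```python
import math
import random

def shortest_chain(radii):
    """radii[j-1] lists the 6j radii r_{j,i} of the j-th chain, j = 1..n.
    Returns the index j minimising j * sum_i r_{j,i}^2, together with the
    sum of squares and sum of radii of that chain and the harmonic sum."""
    n = len(radii)
    harmonic = sum(1.0 / j for j in range(1, n + 1))
    sq_sums = [sum(r * r for r in chain) for chain in radii]
    j_star = min(range(1, n + 1), key=lambda j: j * sq_sums[j - 1])
    return j_star, sq_sums[j_star - 1], sum(radii[j_star - 1]), harmonic

random.seed(1)
n, R = 50, 1.0
radii = [[random.random() for _ in range(6 * j)] for j in range(1, n + 1)]
# Rescale so that the total area pi * sum r^2 equals pi * R^2.
total = sum(r * r for chain in radii for r in chain)
scale = R / math.sqrt(total)
radii = [[scale * r for r in chain] for chain in radii]

j_star, sq_sum, length_sum, harmonic = shortest_chain(radii)
print(j_star * sq_sum <= R * R / harmonic)                      # pigeonhole
print(length_sum <= math.sqrt(6.0) * R / math.sqrt(harmonic))   # Cauchy-Schwarz
```

Since the harmonic sum is comparable to {\log n}, the second inequality is the bound {\sum_{i=1}^{6j} r_{j,i} \ll R/\log^{1/2} n} with an explicit constant.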

We also need another simple geometric observation:

Exercise 43 Let {C_1,C_2,C_3} be mutually externally tangent circles, and let {C'_1, C'_2, C'_3} be another triple of mutually externally tangent circles, with the same orientation (e.g. {C_1,C_2,C_3} and {C'_1,C'_2,C'_3} both go counterclockwise around their interstitial region). Show that there exists a Möbius transformation {\phi} that maps each {C_i} to {C'_i} and which maps the interstice of {C_1,C_2,C_3} conformally onto the interstice of {C'_1,C'_2, C'_3}.

Now we can give a rigidity result for the hexagonal circle packing, somewhat in the spirit of Theorem 4 (though it does not immediately follow from that theorem), and also due to Rodin and Sullivan:

Proposition 44 (Rigidity of infinite hexagonal packing) Let {{\mathcal C}} be an infinite circle packing in {{\bf C}} with the same nerve as the hexagonal circle packing {{\mathcal H}}. Then {{\mathcal C}} is in fact equal to the hexagonal circle packing up to affine transformations and reflections.

Proof: By applying a reflection we may assume that {{\mathcal C}} and {{\mathcal H}} have the same orientation. For each interstitial region {I_j} of {{\mathcal H}} there is an associated interstitial region {I'_j} of {{\mathcal C}}, and by Exercise 43 there is a Möbius transformation {T_j: I_j \rightarrow I'_j}. These can be glued together to form a map {\phi_0} that is initially defined (and conformal) on the interstitial regions {I =\bigcup_j I_j}; we would like to extend it to the entire complex plane by defining it also inside the circles {z_j + S^1}.

Now consider a circle {z_j+S^1} in {{\mathcal H}}. It is bounded by six interstitial regions {I_1,\dots,I_6}, which map to six interstitial regions {I'_1,\dots,I'_6} that lie between the circle {C_0} corresponding to {z_j+S^1} and six tangent circles {C_1,\dots,C_6}. By the ring lemma, all of the circles {C_1,\dots,C_6} have radii comparable to the radius {r_j} of {C_0}. As a consequence, the map {\phi_0}, which is defined (and piecewise Möbius) on the boundary of {z_j + S^1} as a map to the boundary of {C_0}, has derivative comparable in magnitude to {r_j} also. By extending this map radially (in the sense of defining {\phi_0(z_j + r e^{i\theta}) := w_j + r (\phi_0(z_j + e^{i\theta})-w_j)} for {0 < r < 1} and {\theta \in {\bf R}}, where {w_j} is the centre of {C_0}), we see from Theorem 29 that we can extend {\phi_0} to be {K}-quasiconformal in the interior of {z_j+S^1} except possibly at {z_j} for some {K=O(1)}, and to a homeomorphism from {{\bf C}} to the region {\phi_0({\bf C})} consisting of the union of the disks in {{\mathcal C}} and their interstitial regions. By many applications of Theorem 34, {\phi_0} is now {K}-quasiconformal on all of {{\bf C}}, and conformal in the interstitial regions {I}. By Proposition 37, {\phi_0} surjects onto {{\bf C}}, thus the circle packing {{\mathcal C}} and all of its interstitial regions cover the entire complex plane.

Next, we use a version of the Schwarz reflection principle to replace {\phi_0} by another {K}-quasiconformal map that is conformal on a larger region than {I}. Namely, pick a circle {z_j+S^1} in {{\mathcal H}}, and let {C_0} be the corresponding circle in {{\mathcal C}}. Let {\iota_j} and {\iota'_j} be the inversions across {z_j+S^1} and {C_0} respectively. Note that {\phi_0} maps the circle {z_j+S^1} to {C_0}, with the interior mapping to the interior and exterior mapping to the exterior. We can then define a modified map {\phi_1} by setting {\phi_1(z)} equal to {\phi_0(z)} on or outside {z_j+S^1}, and {\phi_1(z)} equal to {\iota'_j \circ \phi_0 \circ \iota_j(z)} inside {z_j+S^1} (with the convention that {\phi_0} maps {\infty} to {\infty}). This is still an orientation-preserving homeomorphism of {{\bf C}}; by Theorem 34 it is still {K}-quasiconformal. It remains conformal on the interstitial region {I}, but is now also conformal on the additional interstitial region {\iota_j(I)}. Repeating this construction one can find a sequence {\phi_n:{\bf C} \rightarrow {\bf C}} of {K}-quasiconformal maps that map each circle {z_j+S^1} to its counterpart {C_0}, and which are conformal on a sequence {I_n} of sets that increase up to {GI}. By Exercise 39, the restriction of {\phi_n} to any compact set forms a normal family (the fact that the circles {z_j+S^1} map to the circles {C_0} will give the required uniform boundedness for topological reasons), and hence (by the usual diagonalisation argument) the {\phi_n} themselves are a normal family; similarly for {\phi_n^{-1}}. Thus, by passing to a subsequence, we may assume that the {\phi_n} converge locally uniformly to a limit {\phi}, and that {\phi_n^{-1}} also converge locally uniformly to a limit which must then invert {\phi}. Thus {\phi} is a homeomorphism, and thus {K}-quasiconformal by Theorem 33. It is conformal on {GI}, and hence by Proposition 40 and Exercise 32 it is conformal.
But the only conformal maps of the complex plane are the affine maps (see Proposition 15 of this previous blog post), and hence {{\mathcal C}} is an affine copy of {{\mathcal H}} as required. \Box

By a standard limiting argument, the perfect rigidity of the infinite circle packing can be used to give approximate rigidity of finite circle packings:

Corollary 45 (Approximate rigidity of finite hexagonal packings) Let {\varepsilon>0}, and suppose that {n} is sufficiently large depending on {\varepsilon}. Let {{\mathcal H}_n} and {{\mathcal C}_n} be as in Lemma 42. Let {r_0} be the radius of the circle {C_0} in {{\mathcal C}_n} associated to {0+S^1}, and let {r_1} be the radius of an adjacent circle {C_1}. Then {1-\varepsilon \leq \frac{r_1}{r_0} \leq 1+\varepsilon}.

Proof: We may normalise {r_0=1} and {C_0=S^1}. Suppose for contradiction that the claim failed, then one can find a sequence {n} tending to infinity, and circle packings {{\mathcal C}_n} with nerve {{\mathcal H}_n} with {C_0 = C_{0,n} = S^1}, such that the radius {r_{1,n}} of the adjacent circle {C_1 = C_{1,n}} stays bounded away from {1}. By many applications of the ring lemma, for each circle {z + S^1} of {{\mathcal H}}, the corresponding circle {C_{z,n}} in {{\mathcal C}_n} has radius bounded above and bounded away from zero, uniformly in {n}. Passing to a subsequence using Bolzano-Weierstrass and using the Arzela-Ascoli diagonalisation argument, we may assume that the radii {r_{z,n}} of these circles converge to a positive finite limit {r_{z,\infty}}. Applying a rotation we may also assume that the circles {C_{1,n}} converge to a limit circle {C_{1,\infty}} (using the obvious topology on the space of circles); we can also assume that the orientation of the {{\mathcal C}_n} does not depend on {n}. A simple induction then shows that {C_{z,n}} converges to a limit circle {C_{z,\infty}}, giving a circle packing {{\mathcal C}_\infty} with the same nerve as {{\mathcal H}}. But then by Proposition 44, {{\mathcal C}_\infty} is an affine copy of {{\mathcal H}}, which among other things implies that {r_{1,\infty} = r_{0,\infty} = 1}. Thus {r_{1,n}} converges to {1}, giving the required contradiction. \Box

A more quantitative version of this corollary was worked out by He. There is also a purely topological proof of the rigidity of the infinite hexagonal circle packing due to Schramm.

— 4. Approximating a conformal map by circle packing —

Let {U} be a simply connected bounded region in {{\bf C}} with two distinct distinguished points {z_0, z_1 \in U}. By the Riemann mapping theorem, there is a unique conformal map {\phi: U \rightarrow D(0,1)} that maps {z_0} to {0} and {z_1} to a positive real. However, many proofs of this theorem are rather nonconstructive, and do not come with an effective algorithm to locate, or at least approximate, this map {\phi}.

It was conjectured by Thurston, and later proven by Rodin and Sullivan, that one could achieve this by applying the circle packing theorem (Theorem 3) to a circle packing in {U} by small circles. To formalise this, we need some more notation. Let {\varepsilon>0} be a small number, and let {\varepsilon \cdot {\mathcal H}} be the infinite hexagonal packing scaled by {\varepsilon}. For every circle in {\varepsilon \cdot {\mathcal H}}, define the flower to be the union of this circle, its interior, and the six interstices bounding it. Let {C_0} be a circle in {\varepsilon \cdot {\mathcal H}} such that {z_0} lies in its flower. For {\varepsilon} small enough, this flower is contained in {U}. Let {{\mathcal I}_\varepsilon} denote all circles in {\varepsilon \cdot {\mathcal H}} that can be reached from {C_0} by a finite chain of consecutively tangent circles in {\varepsilon \cdot {\mathcal H}}, whose flowers all lie in {U}. Elements of {{\mathcal I}_\varepsilon} will be called inner circles, and circles in {\varepsilon \cdot {\mathcal H}} that are not inner circles but are tangent to an inner circle will be called border circles. Because {U} is simply connected, the union of all the flowers of inner circles is also simply connected. As a consequence, one can traverse the border circles by a cycle of consecutively tangent circles, with the inner circles enclosed by this cycle. Let {{\mathcal C}_\varepsilon} be the circle packing consisting of the inner circles and border circles. Applying Theorem 3 followed by a Möbius transformation, one can then find a circle packing {{\mathcal C}'_\varepsilon} in {D(0,1)} with the same nerve and orientation as {{\mathcal C}_\varepsilon}, such that all the circles in {{\mathcal C}'_\varepsilon} associated to border circles of {{\mathcal C}_\varepsilon} are internally tangent to {D(0,1)}. Applying a further Möbius transformation, we may assume that the flower containing {z_0} in {{\mathcal C}_\varepsilon} is mapped to the flower containing {0}, and the flower containing {z_1} is mapped to a flower containing a positive real. (From the exercise below {z_1} will lie in such a flower for {\varepsilon} small enough.)

Let {U_\varepsilon} be the union of all the solid closed equilateral triangles formed by the centres of mutually tangent circles in {{\mathcal C}_\varepsilon}, and let {D_\varepsilon} be the corresponding union of the solid closed triangles from {{\mathcal C}'_\varepsilon}. Let {\phi_\varepsilon} be the piecewise affine map from {U_\varepsilon} to {D_\varepsilon} that maps each triangle in {U_\varepsilon} to the associated triangle in {D_\varepsilon}.

Exercise 46 Show that {U_\varepsilon} converges to {U} as {\varepsilon \rightarrow 0} in the Hausdorff sense. In particular, {z_1} lies in {U_\varepsilon} for sufficiently small {\varepsilon}.

Exercise 47 By modifying the proof of the length-area lemma, show that all the circles {C} in {{\mathcal C}'_\varepsilon} have radius that goes uniformly to zero as {\varepsilon \rightarrow 0}. (Hint: for circles {C} deep in the interior, the length-area lemma works as is; for circles {C} near the boundary, one has to encircle {C} by a sequence of chains that need not be closed, but may instead terminate on the boundary of {D(0,1)}. The argument may be viewed as a discrete version of the one used to prove Theorem 20.) Using this and the previous exercise, show that {D_\varepsilon} converges to {D(0,1)} in the Hausdorff sense.

From Corollary 45 we see that as {\varepsilon \rightarrow 0}, the circles in {{\mathcal C}'_\varepsilon} corresponding to adjacent circles of {{\mathcal C}_\varepsilon} in a fixed compact subset {R} of {U} have radii differing by a ratio of {1+o(1)}. We conclude that in any compact subset {R'} of {D(0,1)}, adjacent circles in {{\mathcal C}'_\varepsilon} in {R'} also have radii differing by a ratio of {1+o(1)}, which implies by trigonometry that the triangles of {D_\varepsilon} in {R'} are approximately equilateral in the sense that their angles are {\frac{\pi}{3}+o(1)}. By Theorem 29 {\phi_\varepsilon} is {1+o(1)}-quasiconformal on each such triangle, and hence by Theorem 34 it is {1+o(1)}-quasiconformal on {R}. By Exercise 39 every sequence of {\phi_\varepsilon} has a subsequence which converges locally uniformly on {U}, and whose inverses converge locally uniformly on {D(0,1)}; the limit is then a homeomorphism from {U} to {D(0,1)} that maps {z_0} to {0} and {z_1} to a positive real. By Theorem 33 the limit is {1}-quasiconformal and hence conformal, hence by uniqueness of the Riemann mapping it must equal {\phi}. This implies that {\phi_\varepsilon} converges locally uniformly to {\phi}, thus making precise the sense in which the circle packings converge to the Riemann map.
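The "by trigonometry" step can be illustrated numerically. The sketch below is only an illustration and not part of the proof: it assumes three mutually externally tangent circles, so that the triangle of centres has side lengths equal to sums of pairs of radii, and uses the law of cosines. The function name `centre_triangle_angles` and the sample radii are hypothetical choices.

```python
import math

def centre_triangle_angles(r1, r2, r3):
    """Angles (at centres 1, 2, 3) of the triangle formed by the centres
    of three mutually externally tangent circles of radii r1, r2, r3."""
    a = r2 + r3  # side opposite centre 1
    b = r1 + r3  # side opposite centre 2
    c = r1 + r2  # side opposite centre 3

    def angle(opposite, s, t):
        # Law of cosines: opposite^2 = s^2 + t^2 - 2 s t cos(angle)
        return math.acos((s * s + t * t - opposite * opposite) / (2 * s * t))

    return angle(a, b, c), angle(b, a, c), angle(c, a, b)

def worst_deviation(delta):
    """Largest deviation from pi/3 when the radii are 1, 1+delta, 1-delta."""
    angles = centre_triangle_angles(1.0, 1.0 + delta, 1.0 - delta)
    return max(abs(t - math.pi / 3) for t in angles)

for delta in (0.5, 0.1, 0.01, 0.001):
    print(f"delta = {delta:<6} max |angle - pi/3| = {worst_deviation(delta):.6f}")
```

As the radius ratios tend to {1}, the printed deviation shrinks with them, which is the sense in which the triangles become approximately equilateral.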

Doug NatelsonPostdoc opportunity

While I have already spammed a number of faculty colleagues about this, I wanted to point out a competitive, endowed postdoctoral opportunity at Rice, made possible through the Smalley-Curl Institute.  (I am interested in hiring a postdoc in general, but the endowed opportunity is a nice one to pursue as well.)

The endowed program is the J Evans Attwell Welch Postdoctoral Fellowship.  This is a competitive, two-year fellowship that additionally includes travel funds and research supplies/minor equipment resources.  The application deadline is July 1, 2018, with an anticipated start date around September 2018.

I'd be delighted to work with someone on an application for this, and I am looking for a great postdoc in any case.  The best applicant would be a strong student who is interested in working on (i) noise and transport measurements in spin-orbit systems including 2d TIs; (ii) nanoscale studies (incl noise and transport) of correlated materials and non-Fermi liquids; and/or (iii) combined electronic and optical studies down to the molecular scale via plasmonic structures.  If you're a student finishing up and are interested, please contact me, and if you're a faculty member working with possible candidates, please feel free to point out this opportunity.

BackreactionA black hole merger... merger... merger

For my 40th birthday I got a special gift: 2.5 σ evidence for quantum gravity. It came courtesy of Niayesh Afshordi, Professor of astrophysics at Perimeter Institute, and in contrast to what you might think he didn’t get the 2.5 σ on Ebay. No, he got it from a LIGO-data analysis, results of which he presented at the 2016 conference on “Experimental Search for Quantum Gravity.” Frankly I

April 24, 2018

David Hoggready for DR2

Today I spent all my research time on details in preparation for Gaia DR2, which happens on Wednesday. Unfortunately, my preparation wasn't exactly research: I was working on building access, catering details, room arrangement, invitations, and encouragement. We have some forty people (not all of them astronomers) converging on Flatiron to work together on the new data.

One thing has become absolutely clear over the last few weeks, in part because of hard things some people have said to me, about themselves and about others and about perceptions: Our goal this week is to have fun. And learn. It isn't to be first on things. It is to learn things we couldn't have known before. The idea is to cooperate, to share, and to support the global Gaia community. I think we have been doing that for years now (I sure hope we have), but it is worth re-stating daily, especially when there is a lot of anticipation and excitement and, frankly, anxiety, about the upcoming data release.

Here's to 1.6 BILLION stars. In less than 36 hours.

April 23, 2018

Tommaso DorigoThe T-Index: More Meaningful Metrics For Scientists

Academics direly need objective, meaningful metrics to judge the impact their publications have on their field of expertise. Nowadays any regular Joe will be able to show many authored papers in their CV, and it will be impossible to objectively assess the relative merits of each and every one of them, if you are trying to rank Joe in a list of candidates for tenure, or just a research job at a University. 

Jordan EllenbergThe first pancake is always strangely shaped

Alena Pirutka gave a great algebraic geometry seminar here last week, about (among many other things!) families of smooth projective varieties containing both rational and non-rational members.   We were talking about how you have to give a talk several times before it really starts to be well-put together, and she told me there’s a Russian proverb on the subject:  “The first pancake is always strangely shaped.”  I am totally going to go around saying this from now on.


Jordan EllenbergOcular regression

A phrase I learned from Aaron Clauset’s great colloquium on the non-ubiquity of scale-free networks.  “Ocular regression” is the practice of squinting at the data until it looks linear.


I am giving (another) seminar in Heidelberg on Wednesday (April 25th), this time about my upcoming book. May 1st is a national holiday in Germany (labor day) and I’ll be off-grid due to family affairs for some days. May 7th to 9th I am in Stockholm to get yelled at (it’s complicated). On May 26th I am in Hay-on-Wye which is a village someplace UK that hosts an event called How The Light Gets

David Hoggfinished a single-author paper!

In parallel-working session this morning, I finished and prepared for submission (to arXiv) my paper on a likelihood function for Bayesian data analyses with the Gaia data.

April 21, 2018

David Hoggfiber robots; gravitational waves

At lunch, Mike Blanton (NYU) and I discussed operational matters for SDSS-V. One thing we discussed was how to have different cadences for different types of stars, when we have a huge field of view and finite target densities for each stellar type. His view is that we should re-formulate the question in terms of sky patches, and set cadences for particular sky patches, and then observe the stars inside those patches as makes sense given the patch cadences. We also asked how to formulate this problem in terms of a scalar objective function or cost function, which is essential if we are going to let loose with optimizers.

The other thing we talked about is positioning a dense set of fibers. There are configurational constraints on the path that the fiber robots can take if they are going to avoid collisions and conflicts. Can we resolve these? And what engineering literature do we look to for the best or standard solutions to problems of this type. I am sure there is a huge literature, because it connects to all sorts of things like milling machines and warehousing and things like that. But I need keywords. I promised to deliver some to the SDSS-V Collaboration.

At the end of the day, Vicky Kalogera (Northwestern) gave a great talk about gravitational wave observing. Her group has been essential in converting the theory of gravitational-wave sources into practical schemes for performing principled probabilistic inferences on the data. She said, in her talk, that in the process she has become an observer, but she only observes in the gravitational-wave sector! And it is really true: She referred consistently in her talk to astronomers as “electromagnetic observers”. I love that! But really, the LIGO results are incredible, and Kalogera deserves a lot of credit for them.

David Hogga non-parametric model of the MW acceleration field

At Stars group meeting, I spoke about Ana Bonaca and my new paper looking at the information content of cold stellar streams in the Milky-Way halo. It is a huge document, with lots of results, but my absolute favorite is this: As we make the potential model for the Milky Way more flexible, each stream constrains each potential parameter less well. This is the issue with information studies: They depend strongly on the model flexibility! But something cool happens in the limit of very flexible potential model: Each stream appears to end up constraining the local acceleration field, local to the current position (not past position) of the stream. This has lots of consequences: One is that if this is true, we can just model each stream independently, in a flexible potential, and then interpolate the acceleration constraints they deliver with a flexible or non-parametric model as an interpolator! That would make stream fitting more tractable than it is now, not less (and most other ideas we have are computationally impossible at present).

In the discussion, Vasily Belokurov (Cambridge) suggested that we might get more information—and more global information—if we modeled the density of stars along the stream. He is reacting to the point that the Bonaca stream model is a stream-track model, not a full six-dimensional distribution function. Belokurov might be right; we should add something like this to the paper.

After I spoke, Jackie Faherty (AMNH) got us really excited about what Gaia has done and will do for nearby moving groups of young stars (like open clusters). She believes that several of the “connected components” in the Oh et al paper are new, previously undiscovered young clusters, and that Gaia DR2 might find hundreds of new members, going down the main sequence! That's amazing. I hope it's true.

Doug NatelsonThe Einstein-de Haas effect

Angular momentum in classical physics is a well-defined quantity tied to the motion of mass about some axis - its value (magnitude and direction) is tied to a particular choice of coordinates.  When we think about some extended object spinning around an axis with some angular velocity \(\mathbf{\omega}\), we can define the angular momentum associated with that rotation by \(\mathbf{I}\cdot \mathbf{\omega}\), where \(\mathbf{I}\) is the "inertia tensor" that keeps track of how mass is distributed in space around the axis.  In general, conservation of angular momentum in isolated systems is a consequence of the rotational symmetry of the laws of physics (Noether's theorem). 

The idea of quantum particles possessing some kind of intrinsic angular momentum is a pretty weird one, but it turns out to be necessary to understand a huge amount of physics.  That intrinsic angular momentum is called "spin", but it's *not* correct to think of it as resulting from the particle being an extended physical object actually spinning.  As I learned from reading The Story of Spin (cool book by Tomonaga, though I found it a bit impenetrable toward the end - more on that below), Kronig first suggested that electrons might have intrinsic angular momentum and used the intuitive idea of spinning to describe it; Pauli pushed back very hard on Kronig about the idea that there could be some physical rotational motion involved - the intrinsic angular momentum is some constant on the order of \(\hbar\).  If it were the usual mechanical motion, dimensionally this would have to go something like \(m r v\), where \(m\) is the mass, \(r\) is the size of the particle, and \(v\) is a speed; as \(r\) gets small, like even approaching a scale we know to be much larger than any intrinsic size of the electron, \(v\) would exceed \(c\), the speed of light.  Pauli pounded on Kronig hard enough that Kronig didn't publish his ideas, and two years later Goudsmit and Uhlenbeck established intrinsic angular momentum, calling it "spin".
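Pauli's dimensional objection can be checked with a couple of lines of arithmetic. This is a rough sketch under stated assumptions: it sets \(m r v = \hbar\), ignores all order-one factors (including the factor of 1/2 in the electron's spin), and takes \(r\) to be the classical electron radius, which is a choice made here purely for illustration of "a scale much larger than any intrinsic size of the electron". The constants are CODATA values; the function name is hypothetical.

```python
# Order-of-magnitude check of the "spinning ball" picture: m * r * v ~ hbar.
HBAR = 1.054571817e-34          # reduced Planck constant, J s
M_ELECTRON = 9.1093837015e-31   # electron mass, kg
C_LIGHT = 2.99792458e8          # speed of light, m/s
R_CLASSICAL = 2.8179403262e-15  # classical electron radius, m (illustrative choice of r)

def required_speed(radius_m, angular_momentum=HBAR, mass_kg=M_ELECTRON):
    """Speed v solving m * r * v = angular_momentum, ignoring order-one factors."""
    return angular_momentum / (mass_kg * radius_m)

v = required_speed(R_CLASSICAL)
ratio_to_c = v / C_LIGHT
print(f"required v = {v:.3e} m/s, i.e. about {ratio_to_c:.0f} times c")
```

With this choice of \(r\) the required speed comes out at roughly a hundred times \(c\), and any smaller radius only makes it worse.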

Because of its weird intrinsic nature, when we teach undergrads about spin, we often don't emphasize that it is just as much angular momentum as the classical mechanical kind.  If you somehow do something to a system of a bunch of spins, that can have mechanical consequences.  I've written about one example before, a thought experiment described by Feynman and approximately implemented in micromechanical devices.  A related concept is the Einstein-de Haas effect, where flipping spins again exerts some kind of mechanical torque.  A new preprint on the arxiv shows a cool implementation of this, using ultrafast laser pulses to demagnetize a ferromagnetic material.  The sudden change of the spin angular momentum of the electrons results, through coupling to the atoms, in the launching of a mechanical shear wave as the angular momentum is dumped into the lattice.   The wave is then detected by time-resolved x-ray measurements.  Pretty cool!

(The part of Tomonaga's book that was hard for me to appreciate deals with the spin-statistics theorem, the quantum field theory statement that fermions have spins that are half-integer multiples of \(\hbar\) while bosons have spins that are integer multiples.  There is a claim that even Feynman could not come up with a good undergrad-level explanation of the argument.  Have any of my readers ever come across a clear, accessible hand-wave proof of the spin-statistics theorem?)

April 20, 2018

Matt von HippelBubbles of Nothing

I recently learned about a very cool concept, called a bubble of nothing.

Read about physics long enough, and you’ll hear all sorts of cosmic disaster scenarios. If the Higgs vacuum decays, and the Higgs field switches to a different value, then the masses of most fundamental particles would change. It would be the end of physics, and life, as we know it.

A bubble of nothing is even more extreme. In a bubble of nothing, space itself ceases to exist.

The idea was first explored by Witten in 1982. Witten started with a simple model, a world with our four familiar dimensions of space and time, plus one curled-up extra dimension. What he found was that this simple world is unstable: quantum mechanics (and, as was later found, thermodynamics) lets it “tunnel” to another world, one that contains a small “bubble”, a sphere in which nothing at all exists.


Except perhaps the Nowhere Man

A bubble of nothing might sound like a black hole, but it’s quite different. Throw a particle into a black hole and it will fall in, never to return. Throw it into a bubble of nothing, though, and something more interesting happens. As you get closer, the extra dimension of space gets smaller and smaller. Eventually, it stops, smoothly closing off. The particle you threw in will just bounce back, smoothly, off the outside of the bubble. Essentially, it reached the edge of the universe.

The bubble starts out small, comparable to the size of the curled-up dimension. But it doesn’t stay that way. In Witten’s setup, the bubble grows, faster and faster, until it’s moving at the speed of light, erasing the rest of the universe from existence.

You probably shouldn’t worry about this happening to us. As far as I’m aware, nobody has written down a realistic model that can transform into a bubble of nothing.

Still, it’s an evocative concept, and one I’m surprised isn’t used more often in science fiction. I could see writers using a bubble of nothing as a risk from an experimental FTL drive, or using a stable (or slowly growing) bubble as the relic of some catastrophic alien war. The idea of a bubble of literal nothing is haunting enough that it ought to be put to good use.

ResonaancesMassive Gravity, or You Only Live Twice

Proving Einstein wrong is the ultimate ambition of every crackpot and physicist alike. In particular, Einstein's theory of gravitation - general relativity - has been a victim of constant harassment. That is to say, it is trivial to modify gravity at large energies (short distances), for example by embedding it in string theory, but it is notoriously difficult to change its long distance behavior. At the same time, motivations to keep trying go beyond intellectual gymnastics. For example, the accelerated expansion of the universe may be a manifestation of modified gravity (rather than of a small cosmological constant).

In Einstein's general relativity, gravitational interactions are mediated by a massless spin-2 particle - the so-called graviton. This is what gives it its hallmark properties: the long range and the universality. One obvious way to screw with Einstein is to add mass to the graviton, as entertained already in 1939 by Fierz and Pauli. The Particle Data Group quotes the constraint m ≤ 6*10^−32 eV, so we are talking about a de Broglie wavelength comparable to the size of the observable universe. Yet even that teeny mass may cause massive troubles. In 1970 the Fierz-Pauli theory was killed by the van Dam-Veltman-Zakharov (vDVZ) discontinuity. The problem stems from the fact that a massive spin-2 particle has 5 polarization states (0,±1,±2) unlike a massless one which has only two (±2). It turns out that the polarization-0 state couples to matter with a strength similar to that of the usual polarization ±2 modes, even in the limit where the mass goes to zero, and thus mediates an additional force which differs from the usual gravity. One finds that, in massive gravity, light bending would be 25% smaller, in conflict with the very precise observations of stars' deflection around the Sun. vDV concluded that "the graviton has rigorously zero mass". Dead for the first time...

The second coming was heralded soon after by Vainshtein, who noticed that the troublesome polarization-0 mode can be shut off in the proximity of stars and planets. This can happen in the presence of graviton self-interactions of a certain type. Technically, what happens is that the polarization-0 mode develops a background value around massive sources which, through the derivative self-interactions, renormalizes its kinetic term and effectively diminishes its interaction strength with matter. See here for a nice review and more technical details. Thanks to the Vainshtein mechanism, the usual predictions of general relativity are recovered around large massive sources, which is exactly where we can best measure gravitational effects. The possible self-interactions leading to a healthy theory without ghosts have been classified, and go under the name of the dRGT massive gravity.

There is however one inevitable consequence of the Vainshtein mechanism. The graviton self-interaction strength grows with energy, and at some point becomes inconsistent with the unitarity limits that every quantum theory should obey. This means that massive gravity is necessarily an effective theory with a limited validity range and has to be replaced by a more fundamental theory at some cutoff scale 𝞚. This is of course nothing new for gravity: the usual Einstein gravity is also an effective theory valid at most up to the Planck scale MPl~10^19 GeV.  But for massive gravity the cutoff depends on the graviton mass and is much smaller for realistic theories. At best,

𝞚max ~ (MPl m^2)^1/3 ~ (300 km)^−1.

So the massive gravity theory in its usual form cannot be used at distance scales shorter than ~300 km. For particle physicists that would be a disaster, but for cosmologists this is fine, as one can still predict the behavior of galaxies, stars, and planets. While the theory certainly cannot be used to describe the results of table top experiments, it is relevant for the movement of celestial bodies in the Solar System. Indeed, lunar laser ranging experiments or precision studies of Jupiter's orbit are interesting probes of the graviton mass.
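The ~300 km figure can be sanity-checked with a few lines of arithmetic. The sketch below assumes the cutoff is the scale (MPl m^2)^1/3 usually quoted for dRGT massive gravity, and uses the reduced Planck mass (about 2.4*10^18 GeV) and m = 10^−32 eV; the formula, the convention for MPl, and the function names are assumptions made for illustration rather than statements taken from the post.

```python
# Rough estimate of the maximal cutoff of massive gravity and its length scale.
HBAR_C_EV_M = 1.973269804e-7   # hbar * c in eV * metres
M_PLANCK_REDUCED_EV = 2.435e27 # reduced Planck mass in eV (assumed convention)

def cutoff_energy_ev(graviton_mass_ev, planck_mass_ev=M_PLANCK_REDUCED_EV):
    """Assumed cutoff scale (MPl * m^2)^(1/3), in eV."""
    return (planck_mass_ev * graviton_mass_ev ** 2) ** (1.0 / 3.0)

def energy_to_length_km(energy_ev):
    """Convert an energy scale to the corresponding length hbar*c / E, in km."""
    return HBAR_C_EV_M / energy_ev / 1.0e3

graviton_mass_ev = 1.0e-32
cutoff_ev = cutoff_energy_ev(graviton_mass_ev)
cutoff_length_km = energy_to_length_km(cutoff_ev)
print(f"cutoff ~ {cutoff_ev:.2e} eV, i.e. a length of ~ {cutoff_length_km:.0f} km")
```

Under these assumptions the length comes out at a few hundred kilometres, in line with the number quoted in the text; using the unreduced Planck mass shifts it by a factor of order one.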

Now comes the latest twist in the story. Some time ago this paper showed that not everything is allowed in effective theories. Assuming that the full theory is unitary, causal and local leads to non-trivial constraints on the possible interactions in the low-energy effective theory. These techniques are suitable to constrain, via dispersion relations, derivative interactions of the kind required by the Vainshtein mechanism. Applying them to the dRGT gravity one finds that it is inconsistent to assume the theory is valid all the way up to 𝞚max. Instead, it must be replaced by a more fundamental theory already at a much lower cutoff scale, parameterized as 𝞚 = g*^1/3 𝞚max (the parameter g* is interpreted as the coupling strength of the more fundamental theory). The allowed parameter space in the g*-m plane is shown in this plot:

Massive gravity must live in the lower left corner, outside the gray area excluded theoretically and where the graviton mass satisfies the experimental upper limit m~10^−32 eV. This implies g* ≲ 10^-10, and thus the validity range of the theory is some 3 orders of magnitude lower than 𝞚max. In other words, massive gravity is not a consistent effective theory at distance scales below ~1 million km, and thus cannot be used to describe the motion of falling apples, GPS satellites or even the Moon. In this sense, it's not much of a competitor to, say, Newton. Dead for the second time.
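The jump from ~300 km to ~1 million km follows from the scaling 𝞚 = g*^1/3 𝞚max quoted above, since the minimal length scale is inversely proportional to the cutoff. A minimal sketch of that arithmetic, taking the ~300 km and g* ~ 10^-10 figures from the text (the function names are hypothetical):

```python
import math

def cutoff_suppression(g_star):
    """Factor g*^(1/3) by which the cutoff is lowered relative to its maximum."""
    return g_star ** (1.0 / 3.0)

def rescaled_length_km(length_at_max_cutoff_km, g_star):
    """Minimal length scale once the cutoff is lowered by g*^(1/3)."""
    return length_at_max_cutoff_km / cutoff_suppression(g_star)

g_star = 1.0e-10
suppression = cutoff_suppression(g_star)
orders_of_magnitude = -math.log10(suppression)
min_length_km = rescaled_length_km(300.0, g_star)
print(f"cutoff lowered by {suppression:.1e} (~{orders_of_magnitude:.1f} orders of magnitude)")
print(f"minimal length scale ~ {min_length_km:.1e} km")
```

The result is several hundred thousand kilometres, which is beyond the Earth-Moon distance of roughly 384,000 km and consistent with the "~1 million km" estimate.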

Is this the end of the story? For the third coming we would need a more general theory with additional light particles beyond the massive graviton, which is consistent theoretically in a larger energy range, realizes the Vainshtein mechanism, and is in agreement with the current experimental observations. This is hard but not impossible to imagine. Whatever the outcome, what I like in this story is the role of theory in driving the progress, which is rarely seen these days. In the process, we have understood a lot of interesting physics whose relevance goes well beyond one specific theory. So the trip was certainly worth it, even if we find ourselves back at the departure point.

April 19, 2018

Terence Tao246C notes 1: Meromorphic functions on Riemann surfaces, and the Riemann-Roch theorem

The fundamental objects of study in real differential geometry are the real manifolds: Hausdorff topological spaces {M = M^n} that locally look like open subsets of a Euclidean space {{\bf R}^n}, and which can be equipped with an atlas {(\phi_\alpha: U_\alpha \rightarrow V_\alpha)_{\alpha \in A}} of coordinate charts {\phi_\alpha: U_\alpha \rightarrow V_\alpha} from open subsets {U_\alpha} covering {M} to open subsets {V_\alpha} in {{\bf R}^n}, which are homeomorphisms; in particular, the transition maps {\tau_{\alpha,\beta}: \phi_\alpha( U_\alpha \cap U_\beta ) \rightarrow \phi_\beta( U_\alpha \cap U_\beta )} defined by {\tau_{\alpha,\beta} := \phi_\beta \circ \phi_\alpha^{-1}} are all continuous. (It is also common to impose the requirement that the manifold {M} be second countable, though this will not be important for the current discussion.) A smooth real manifold is a real manifold in which the transition maps are all smooth.

In a similar fashion, the fundamental objects of study in complex differential geometry are the complex manifolds, in which the model space is {{\bf C}^n} rather than {{\bf R}^n}, and the transition maps {\tau_{\alpha,\beta}} are required to be holomorphic (and not merely smooth or continuous). In the real case, the one-dimensional manifolds (curves) are quite simple to understand, particularly if one requires the manifold to be connected; for instance, all compact connected one-dimensional real manifolds are homeomorphic to the unit circle (why?). However, in the complex case, the connected one-dimensional manifolds – the ones that look locally like subsets of {{\bf C}} – are much richer, and are known as Riemann surfaces. For the sake of completeness we give the (somewhat lengthy) formal definition:

Definition 1 (Riemann surface) If {M} is a Hausdorff connected topological space, a (one-dimensional complex) atlas is a collection {(\phi_\alpha: U_\alpha \rightarrow V_\alpha)_{\alpha \in A}} of homeomorphisms from open subsets {(U_\alpha)_{\alpha \in A}} of {M} that cover {M} to open subsets {V_\alpha} of the complex numbers {{\bf C}}, such that the transition maps {\tau_{\alpha,\beta}: \phi_\alpha( U_\alpha \cap U_\beta ) \rightarrow \phi_\beta( U_\alpha \cap U_\beta )} defined by {\tau_{\alpha,\beta} := \phi_\beta \circ \phi_\alpha^{-1}} are all holomorphic. Here {A} is an arbitrary index set. Two atlases {(\phi_\alpha: U_\alpha \rightarrow V_\alpha)_{\alpha \in A}}, {(\phi'_\beta: U'_\beta \rightarrow V'_\beta)_{\beta \in B}} on {M} are said to be equivalent if their union is also an atlas, thus the transition maps {\phi'_\beta \circ \phi_\alpha^{-1}: \phi_\alpha(U_\alpha \cap U'_\beta) \rightarrow \phi'_\beta(U_\alpha \cap U'_\beta)} and their inverses are all holomorphic. A Riemann surface is a Hausdorff connected topological space {M} equipped with an equivalence class of one-dimensional complex atlases.

A map {f: M \rightarrow M'} from one Riemann surface {M} to another {M'} is holomorphic if the maps {\phi'_\beta \circ f \circ \phi_\alpha^{-1}: \phi_\alpha(U_\alpha \cap f^{-1}(U'_\beta)) \rightarrow {\bf C}} are holomorphic for any charts {\phi_\alpha: U_\alpha \rightarrow V_\alpha}, {\phi'_\beta: U'_\beta \rightarrow V'_\beta} of an atlas of {M} and {M'} respectively; it is not hard to see that this definition does not depend on the choice of atlas. It is also clear that the composition of two holomorphic maps is holomorphic (and in fact the class of Riemann surfaces with their holomorphic maps forms a category).

Here are some basic examples of Riemann surfaces.

Example 2 (Quotients of {{\bf C}}) The complex numbers {{\bf C}} clearly form a Riemann surface (using the identity map {\phi: {\bf C} \rightarrow {\bf C}} as the single chart for an atlas). Of course, maps {f: {\bf C} \rightarrow {\bf C}} that are holomorphic in the usual sense will also be holomorphic in the sense of the above definition, and vice versa, so the notion of holomorphicity for Riemann surfaces is compatible with that of holomorphicity for complex maps. More generally, given any discrete additive subgroup {\Lambda} of {{\bf C}}, the quotient {{\bf C}/\Lambda} is a Riemann surface. There are an infinite number of possible atlases to use here; one such is to pick a sufficiently small neighbourhood {U} of the origin in {{\bf C}} and take the atlas {(\phi_\alpha: U_\alpha \rightarrow U)_{\alpha \in {\bf C}/\Lambda}} where {U_\alpha := \alpha+U} and {\phi_\alpha(\alpha+z) := z} for all {z \in U}. In particular, given any non-real complex number {\omega}, the complex torus {{\bf C} / \langle 1, \omega \rangle} formed by quotienting {{\bf C}} by the lattice {\langle 1, \omega \rangle := \{ n + m \omega: n,m \in {\bf Z}\}} is a Riemann surface.

Example 3 Any open connected subset {U} of {{\bf C}} is a Riemann surface. By the Riemann mapping theorem, all simply connected open {U \subset {\bf C}}, other than {{\bf C}} itself, are isomorphic (as Riemann surfaces) to the unit disk (or, equivalently, to the upper half-plane).

Example 4 (Riemann sphere) The Riemann sphere {{\bf C} \cup \{\infty\}}, as a topological manifold, is the one-point compactification of {{\bf C}}. Topologically, this is a sphere and is in particular connected. One can cover the Riemann sphere by the two open sets {U_1 := {\bf C}} and {U_2 := {\bf C} \cup \{\infty\} \backslash \{0\}}, and give these two open sets the charts {\phi_1: U_1 \rightarrow {\bf C}} and {\phi_2: U_2 \rightarrow {\bf C}} defined by {\phi_1(z) := z} for {z \in {\bf C}}, {\phi_2(z) := 1/z} for {z \in {\bf C} \backslash \{0\}}, and {\phi_2(\infty) := 0}. This is a complex atlas since the map {z \mapsto 1/z} is holomorphic on {{\bf C} \backslash \{0\}}.

An alternate way of viewing the Riemann sphere is as the projective line {\mathbf{CP}^1}. Topologically, this is the punctured complex plane {{\bf C}^2 \backslash \{(0,0)\}} quotiented out by non-zero complex dilations, thus elements of this space are equivalence classes {[z,w] := \{ (\lambda z, \lambda w): \lambda \in {\bf C} \backslash \{0\}\}} with the usual quotient topology. One can cover this space by two open sets {U_1 := \{ [z,1]: z \in {\bf C} \}} and {U_2 := \{ [1,w]: w \in {\bf C} \}} and give these two open sets the charts {\phi_1: U_1 \rightarrow {\bf C}} and {\phi_2: U_2 \rightarrow {\bf C}} defined by {\phi_1([z,1]) := z} for {z \in {\bf C}}, {\phi_2([1,w]) := w}. This is a complex atlas, basically because {[z,1] = [1,1/z]} for {z \in {\bf C} \backslash \{0\}} and {1/z} is holomorphic on {{\bf C} \backslash \{0\}}.
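For readers who like to see the charts in action, here is a small numerical sketch of the two charts on {\mathbf{CP}^1} and their transition map. It represents a point {[z,w]} by any non-zero pair of Python complex numbers; the function names and the sample values are hypothetical choices for illustration only.

```python
def phi1(point):
    """Chart on U_1 = {[z,1]}: a representative (z, w) with w != 0 is [z/w, 1]."""
    z, w = point
    if w == 0:
        raise ValueError("point is not in U_1")
    return z / w

def phi2(point):
    """Chart on U_2 = {[1,w]}: a representative (z, w) with z != 0 is [1, w/z]."""
    z, w = point
    if z == 0:
        raise ValueError("point is not in U_2")
    return w / z

def phi1_inverse(z):
    """Inverse of the first chart: z goes to the class [z, 1]."""
    return (z, 1)

def transition(z):
    """Transition map phi2 o phi1^{-1}, defined for z != 0."""
    return phi2(phi1_inverse(z))

z = 2 + 1j
lam = 0.3 - 4j  # an arbitrary non-zero dilation
print(transition(z), 1 / z)                      # the two values agree
print(phi1((lam * z, lam * 1)), phi1((z, 1)))    # charts ignore the dilation
```

The first printed pair shows the transition map agreeing with {1/z}, and the second shows that the chart value does not depend on which representative of the equivalence class is used.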

Exercise 5 Verify that the Riemann sphere is isomorphic (as a Riemann surface) to the projective line.

Example 6 (Smooth algebraic plane curves) Let {P(z_1,z_2,z_3)} be a complex polynomial in three variables which is homogeneous of some degree {d \geq 1}, thus

\displaystyle P( \lambda z_1, \lambda z_2, \lambda z_3) = \lambda^d P( z_1, z_2, z_3). \ \ \ \ \ (1)


Define the complex projective plane {\mathbf{CP}^2} to be the punctured space {{\bf C}^3 \backslash \{0\}} quotiented out by non-zero complex dilations, with the usual quotient topology. (There is another important topology to place here of fundamental importance in algebraic geometry, namely the Zariski topology, but we will ignore this topology here.) This is a compact space, whose elements are equivalence classes {[z_1,z_2,z_3] := \{ (\lambda z_1, \lambda z_2, \lambda z_3)\}}. Inside this plane we can define the (projective, degree {d}) algebraic curve

\displaystyle Z(P) := \{ [z_1,z_2,z_3] \in \mathbf{CP}^2: P(z_1,z_2,z_3) = 0 \};

this is well defined thanks to (1). It is easy to verify that {Z(P)} is a closed subset of {\mathbf{CP}^2} and hence compact; it is non-empty thanks to the fundamental theorem of algebra.

Suppose that {P} is irreducible, which means that it is not the product of polynomials of smaller degree. As we shall show in the appendix, this makes the algebraic curve connected. (Actually, algebraic curves remain connected even in the reducible case, thanks to Bezout’s theorem, but we will not prove that theorem here.) We will in fact make the stronger nonsingularity hypothesis: there is no triple {(z_1,z_2,z_3) \in {\bf C}^3 \backslash \{(0,0,0)\}} such that the four numbers {P(z_1,z_2,z_3), \frac{\partial}{\partial z_j} P(z_1,z_2,z_3)} simultaneously vanish for {j=1,2,3}. (This looks like four constraints, but is in fact essentially just three, due to the Euler identity

\displaystyle \sum_{j=1}^3 z_j \frac{\partial}{\partial z_j} P(z_1,z_2,z_3) = d P(z_1,z_2,z_3)

that arises from differentiating (1) in {\lambda}. The fact that nonsingularity implies irreducibility is another consequence of Bezout’s theorem, which is not proven here.) For instance, the polynomial {z_1^2 z_3 - z_2^3} is irreducible but singular (there is a “cusp” singularity at {[0,0,1]}). With this hypothesis, we call the curve {Z(P)} smooth.
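As a quick sanity check of the Euler identity and of the cusp example, here is a minimal Python sketch. The dictionary representation of a homogeneous polynomial and the helper names `evaluate`, `partial`, `euler_defect` and `is_singular_at` are illustrative assumptions, not notation from these notes.

```python
# A homogeneous polynomial in (z1, z2, z3) is stored as {(e1, e2, e3): coeff}.
# This representation and the helper names are illustrative choices.

def evaluate(poly, point):
    """Evaluate the polynomial at a point (z1, z2, z3)."""
    total = 0
    for (e1, e2, e3), c in poly.items():
        total += c * point[0] ** e1 * point[1] ** e2 * point[2] ** e3
    return total

def partial(poly, j):
    """Partial derivative with respect to the j-th variable (j = 0, 1, 2)."""
    result = {}
    for exps, c in poly.items():
        if exps[j] == 0:
            continue
        new_exps = list(exps)
        new_exps[j] -= 1
        key = tuple(new_exps)
        result[key] = result.get(key, 0) + c * exps[j]
    return result

def euler_defect(poly, degree, point):
    """sum_j z_j dP/dz_j - d * P at the point; zero if the Euler identity holds."""
    lhs = sum(point[j] * evaluate(partial(poly, j), point) for j in range(3))
    return lhs - degree * evaluate(poly, point)

def is_singular_at(poly, point):
    """True if P and all three partial derivatives vanish at the point."""
    values = [evaluate(poly, point)]
    values += [evaluate(partial(poly, j), point) for j in range(3)]
    return all(v == 0 for v in values)

# The cuspidal cubic z1^2 z3 - z2^3 from the text (degree 3).
cusp = {(2, 0, 1): 1, (0, 3, 0): -1}
```

With these definitions, `euler_defect(cusp, 3, (2, -5, 7))` vanishes, and `is_singular_at(cusp, (0, 0, 1))` detects the cusp, whereas the point `(1, 1, 1)` lies on the curve but is not singular.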

Now suppose {[z_1,z_2,z_3]} is a point in {Z(P)}; without loss of generality we may take {z_3} non-zero, and then we can normalise {z_3=1}. Now one can think of {P(z_1,z_2,1)} as an inhomogeneous polynomial in just two variables {z_1,z_2}, and by the nonsingularity hypothesis (together with the Euler identity) we see that the gradient {(\frac{\partial}{\partial z_1} P(z_1,z_2,1), \frac{\partial}{\partial z_2} P(z_1,z_2,1))} is non-zero whenever {P(z_1,z_2,1)=0}. By the (complexified) implicit function theorem, this ensures that the affine algebraic curve

\displaystyle Z(P)_{aff} := \{ (z_1,z_2) \in {\bf C}^2: P(z_1,z_2,1) = 0 \}

is a Riemann surface in a neighbourhood of {(z_1,z_2)}; we leave this as an exercise. This can be used to give a coordinate chart for {Z(P)} in a neighbourhood of {[z_1,z_2,z_3]} when {z_3 \neq 0}. Similarly when {z_1} or {z_2} is non-zero. This can be shown to give an atlas on {Z(P)}, which (assuming the connectedness claim that we will prove later) gives {Z(P)} the structure of a Riemann surface.

Exercise 7 State and prove a complex version of the implicit function theorem that justifies the above claim that the charts in the above example form an atlas, and an algebraic curve associated to a non-singular polynomial is a Riemann surface.

Exercise 8

  • (i) Show that all (irreducible plane projective) algebraic curves of degree {1} are isomorphic to the Riemann sphere. (Hint: reduce to an explicit linear polynomial such as {z_3}.)
  • (ii) Show that all (irreducible plane projective) algebraic curves of degree {2} are isomorphic to the Riemann sphere. (Hint: to reduce computation, first use some linear algebra to reduce the homogeneous quadratic polynomial to a standard form, such as {z_1^2+z_2^2+z_3^2} or {z_2 z_3 - z_1^2}.)

Exercise 9 If {a,b} are complex numbers, show that the projective cubic curve

\displaystyle \{ [z_1, z_2, z_3]: z_2^2 z_3 = z_1^3 + a z_1 z_3^2 + b z_3^3 \}

is nonsingular if and only if the discriminant {-16 (4a^3 + 27b^2)} is non-zero. (When this occurs, the curve is called an elliptic curve (in Weierstrass form), which is a fundamentally important example of a Riemann surface in many areas of mathematics, and number theory in particular. One can also define the discriminant for polynomials of higher degree, but we will not do so here.)
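One instance of this exercise can be checked by hand or by machine. The sketch below is only a spot check at two illustrative parameter choices and does not replace the exercise. It writes the cubic as {F = z_2^2 z_3 - z_1^3 - a z_1 z_3^2 - b z_3^3}, with the gradient computed by hand; the function names are assumptions made for this illustration.

```python
# Weierstrass cubic F = z2^2 z3 - z1^3 - a z1 z3^2 - b z3^3 and its gradient,
# written out by hand. Function names are illustrative.

def discriminant(a, b):
    return -16 * (4 * a**3 + 27 * b**2)

def cubic_and_gradient(a, b, z1, z2, z3):
    """Return (F, dF/dz1, dF/dz2, dF/dz3) at the point (z1, z2, z3)."""
    F = z2**2 * z3 - z1**3 - a * z1 * z3**2 - b * z3**3
    F1 = -3 * z1**2 - a * z3**2
    F2 = 2 * z2 * z3
    F3 = z2**2 - 2 * a * z1 * z3 - 3 * b * z3**2
    return (F, F1, F2, F3)

# a = -3, b = 2: z1^3 - 3 z1 + 2 = (z1 - 1)^2 (z1 + 2) has a double root,
# the discriminant vanishes, and [1, 0, 1] is a singular point of the curve.
singular_case = cubic_and_gradient(-3, 2, 1, 0, 1)

# a = -1, b = 0: the discriminant is non-zero.
smooth_disc = discriminant(-1, 0)
```

Here `singular_case` consists of four zeroes, matching the vanishing discriminant, while the second choice of parameters has non-zero discriminant.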

A recurring theme in mathematics is that an object {X} is often best studied by understanding spaces of “good” functions on {X}. In complex analysis, there are two basic types of good functions:

Definition 10 Let {X} be a Riemann surface. A holomorphic function on {X} is a holomorphic map from {X} to {{\bf C}}; the space of all such functions will be denoted {{\mathcal O}(X)}. A meromorphic function on {X} is a holomorphic map from {X} to the Riemann sphere {{\bf C} \cup \{\infty\}}, that is not identically equal to {\infty}; the space of all such functions will be denoted {M(X)}.

One can also define holomorphicity and meromorphicity in terms of charts: a function {f: X \rightarrow {\bf C}} is holomorphic if and only if, for any chart {\phi_\alpha: U_\alpha \rightarrow {\bf C}}, the map {f \circ \phi^{-1}_\alpha: \phi_\alpha(U_\alpha) \rightarrow {\bf C}} is holomorphic in the usual complex analysis sense; similarly, a function {f: X \rightarrow {\bf C} \cup \{\infty\}} is meromorphic if and only if the preimage {f^{-1}(\{\infty\})} is discrete (otherwise, by analytic continuation and the connectedness of {X}, {f} will be identically equal to {\infty}) and for any chart {\phi_\alpha: U_\alpha \rightarrow {\bf C}}, the map {f \circ \phi_\alpha^{-1}: \phi_\alpha(U_\alpha) \rightarrow {\bf C} \cup \{\infty\}} becomes a meromorphic function in the usual complex analysis sense, after removing the discrete set of complex numbers where this map is infinite. One consequence of this alternate definition is that the space {{\mathcal O}(X)} of holomorphic functions is a commutative complex algebra (a complex vector space closed under pointwise multiplication), while the space {M(X)} of meromorphic functions is a complex field (a commutative complex algebra where every non-zero element has an inverse). Another consequence is that one can define the notion of a zero of given order {k}, or a pole of order {k}, for a holomorphic or meromorphic function, by composing with a chart map and using the usual complex analysis notions there, noting (from the holomorphicity of transition maps and their inverses) that this does not depend on the choice of chart. (However, one cannot similarly define the residue of a meromorphic function on {X} this way, as the residue turns out to be chart-dependent thanks to the chain rule. Residues should instead be applied to meromorphic {1}-forms, a concept we will introduce later.) A third consequence is analytic continuation: if two holomorphic or meromorphic functions on {X} agree on a non-empty open set, then they agree everywhere.

On the complex numbers {{\bf C}}, there are of course many holomorphic functions and meromorphic functions; for instance any power series with an infinite radius of convergence will give a holomorphic function, and the quotient of any two such functions (with non-zero denominator) will give a meromorphic function. Furthermore, we have extremely wide latitude in how to specify the zeroes of the holomorphic function, or the zeroes and poles of the meromorphic function, thanks to tools such as the Weierstrass factorisation theorem or the Mittag-Leffler theorem (covered in previous quarters).

It turns out, however, that the situation changes dramatically when the Riemann surface {X} is compact, with the holomorphic and meromorphic functions becoming much more rigid. First of all, compactness eliminates all holomorphic functions except for the constants:

Lemma 11 Let {f \in \mathcal{O}(X)} be a holomorphic function on a compact Riemann surface {X}. Then {f} is constant.

This result should be seen as a close sibling of Liouville’s theorem that all bounded entire functions are constant. (Indeed, in the case of a complex torus, this lemma is a corollary of Liouville’s theorem.)

Proof: As {f} is continuous and {X} is compact, {|f|} must attain a maximum at some point {z_0 \in X}. Working in a chart around {z_0} and applying the maximum principle, we conclude that {f} is constant in a neighbourhood of {z_0}, and hence is constant everywhere by analytic continuation. \Box

This dramatically cuts down the number of possible meromorphic functions – indeed, for an abstract Riemann surface, it is not immediately obvious that there are any non-constant meromorphic functions at all! As the poles are isolated and the surface is compact, a meromorphic function can only have finitely many poles, and if one prescribes the location of the poles and the maximum order at each pole, then we shall see that the space of meromorphic functions is now finite dimensional. The precise dimensions of these spaces are in fact rather interesting, and obey a basic duality law known as the Riemann-Roch theorem. We will give a mostly self-contained proof of the Riemann-Roch theorem in these notes, omitting only some facts about genus and Euler characteristic, as well as construction of certain meromorphic {1}-forms (also known as Abelian differentials).

— 1. Divisors —

To discuss the zeroes and poles of meromorphic functions, it is convenient to introduce an abstraction of the concept of “a collection of zeroes and poles”, known as a divisor.

Definition 12 (Divisor) Let {X} be a compact Riemann surface. A divisor on {X} is a formal integer linear combination {\sum_P c_P \cdot (P)}, where {P} ranges over a finite collection of points in {X}, and {c_P} are integers, with the obvious additive group structure; equivalently, the space {\mathrm{Div}(X)} of divisors is the free abelian group with generators {(P)} with {P \in X} (where we make the usual convention {1 \cdot (P) = (P)}). The number {\sum_P c_P} is the degree of the divisor; we call each {c_P} the order of the divisor {D} at {P}, with the convention that the order is zero for points not appearing in the sum. A divisor is non-negative (or effective) if all the {c_P} are non-negative, and we partially order the divisors by writing {D_1 \geq D_2} if {D_1-D_2} is non-negative. This makes {\mathrm{Div}(X)} a lattice, so we can define the maximum {\max(D_1,D_2)} or minimum {\min(D_1,D_2)} of two divisors. Given a non-zero meromorphic function {f \in M(X)}, the principal divisor {(f)} associated to {f} is the divisor {\sum_P \mathrm{ord}_P(f) \cdot (P)}, where {P} ranges over the zeroes and poles of {f}, and {\mathrm{ord}_P(f)} is the order of zero (or negative the order of pole) at {P}. (Note that as zeroes and poles are isolated, and {X} is compact, the number of zeroes and poles is automatically finite.)

Informally, one should think of {c_P \cdot (P)} as the abstraction of a zero of order {c_P} at {P}, or a pole of order {-c_P} if {c_P} is negative.
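To make the bookkeeping in Definition 12 concrete, here is a minimal Python sketch of divisors as finite dictionaries from points to integer orders. The class name `Divisor`, its method names, and the use of string labels for points are illustrative assumptions; nothing here knows about the underlying Riemann surface.

```python
class Divisor:
    """A formal integer combination of points, stored as {point: order}.

    Points may be any hashable labels; the class and method names are illustrative.
    """

    def __init__(self, orders=None):
        # Drop zero coefficients so that equal divisors have equal dictionaries.
        self.orders = {p: c for p, c in (orders or {}).items() if c != 0}

    def order(self, point):
        return self.orders.get(point, 0)

    def degree(self):
        return sum(self.orders.values())

    def _combine(self, other, op):
        points = set(self.orders) | set(other.orders)
        return Divisor({p: op(self.order(p), other.order(p)) for p in points})

    def __add__(self, other):
        return self._combine(other, lambda x, y: x + y)

    def __sub__(self, other):
        return self._combine(other, lambda x, y: x - y)

    def __eq__(self, other):
        return self.orders == other.orders

    def is_effective(self):
        return all(c >= 0 for c in self.orders.values())

    def __ge__(self, other):
        # D1 >= D2 means that D1 - D2 is effective.
        return (self - other).is_effective()

    def maximum(self, other):
        return self._combine(other, max)

    def minimum(self, other):
        return self._combine(other, min)


D1 = Divisor({"P": 2, "Q": 1, "R": -1})
D2 = Divisor({"P": 1, "R": 3})
```

For these two divisors neither dominates the other, but both are dominated by `D1.maximum(D2)`, and one has the lattice identity `D1.maximum(D2) + D1.minimum(D2) == D1 + D2`.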

Example 13 Consider a rational function

\displaystyle f(z) = \alpha \frac{(z-z_1) \dots (z-z_m)}{(z-w_1)\dots(z-w_n)}

for some non-zero complex number {\alpha} and some complex numbers {z_1,\dots,z_m,w_1,\dots,w_n}. This is a meromorphic function on {{\bf C}}, and {f(1/z)} is also meromorphic, so {f} extends to a meromorphic function on the Riemann sphere {{\bf C} \cup \{\infty\}}. It has zeroes at {z_1,\dots,z_m} and poles at {w_1,\dots,w_n}, and also has a zero of order {n-m} (or a pole of order {m-n}) at {\infty}, as can be seen by inspection of {z \mapsto f(1/z)} near the origin (or the growth of {f(z)} near infinity), and thus

\displaystyle (f) = \sum_{j=1}^m (z_j) - \sum_{j=1}^n (w_j) + (n-m) \cdot (\infty).

In particular, {(f)} has degree zero.
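The computation in this example is easy to mechanise. The following Python sketch builds the divisor of a rational function from its lists of zeroes and poles; the function name and the string label `"inf"` for the point at infinity are assumptions made for illustration.

```python
from collections import Counter

def rational_function_divisor(zeros, poles):
    """Divisor of alpha * prod (z - z_j) / prod (z - w_j) on the Riemann sphere.

    `zeros` and `poles` are lists of complex numbers, repeated according to
    multiplicity; a point listed in both cancels automatically. The string
    "inf" is an illustrative label for the point at infinity.
    """
    orders = Counter()
    for z in zeros:
        orders[z] += 1
    for w in poles:
        orders[w] -= 1
    # Order at infinity: a zero of order n - m, where m and n count the
    # numerator and denominator factors respectively.
    orders["inf"] += len(poles) - len(zeros)
    return {p: c for p, c in orders.items() if c != 0}

# f(z) = z^2 (z - 1) / (z + i): m = 3, n = 1, so a pole of order 2 at infinity.
div_f = rational_function_divisor([0, 0, 1], [-1j])
degree = sum(div_f.values())
```

The degree comes out as zero regardless of the input lists, since the contribution {n-m} at infinity exactly balances the finite zeroes and poles.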

Exercise 14 Show that all meromorphic functions on the Riemann sphere come from rational functions as in the above example. In particular, every principal divisor on the Riemann sphere has degree zero. Give an alternate proof of this latter fact using the residue theorem. (We will generalise this fact to other Riemann surfaces shortly; see Proposition 24.)

It is easy to see (by working in a coordinate chart around {P}) that if {f, g \in M(X)} are non-zero meromorphic functions, then one has the valuation axioms

\displaystyle \mathrm{ord}_P(fg) = \mathrm{ord}_P(f) + \mathrm{ord}_P(g)

\displaystyle \mathrm{ord}_P(f/g) = \mathrm{ord}_P(f) - \mathrm{ord}_P(g)

\displaystyle \mathrm{ord}_P(f+g) \geq \min( \mathrm{ord}_P(f), \mathrm{ord}_P(g) )

for any {P \in X} (adopting the convention the zero function has order {+\infty} everywhere); thus we have

\displaystyle (fg) = (f) + (g), \quad (f/g) = (f) - (g); \quad (f+g) \geq \min( (f), (g) ) \ \ \ \ \ (2)


again adopting the convention that {(0)} is larger than every divisor. In particular, the space {\mathrm{PDiv}(X)} of principal divisors of {X} is a subgroup of {\mathrm{Div}(X)}. We call two divisors linearly equivalent if they differ by a principal divisor; this is clearly an equivalence relation.
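The valuation axioms can be illustrated in a single chart centred at {P}, where (in the holomorphic case) a function is given by a power series in the local coordinate. The sketch below uses polynomials stored as coefficient lists, which is an assumption made purely for illustration; the helper names are hypothetical.

```python
import math

def order_at_zero(coeffs):
    """Order of vanishing at z = 0 of a0 + a1 z + ... given as [a0, a1, ...].

    The zero polynomial gets order +infinity, matching the convention in the text.
    """
    for k, a in enumerate(coeffs):
        if a != 0:
            return k
    return math.inf

def multiply(f, g):
    """Product of two non-empty coefficient lists."""
    product = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            product[i + j] += a * b
    return product

def add(f, g):
    """Sum of two coefficient lists, padding the shorter one with zeroes."""
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return [a + b for a, b in zip(f, g)]

f = [0, 0, 3, 1]      # 3 z^2 + z^3, order 2
g = [0, 0, -3, 0, 5]  # -3 z^2 + 5 z^4, order 2
```

Here the product has order exactly {2+2=4}, while the sum has order {3}, strictly larger than the minimum {2} because the leading terms cancel; this is why the third axiom is only an inequality.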

The properties (2) have the following consequence. Given a divisor {D}, let {L(D)} be the space of all meromorphic functions {f \in M(X)} such that {(f)+D \geq 0} (including, by convention, the zero function {0}); thus, if {D = \sum_P c_P \cdot (P)}, then {L(D)} consists of functions that have at worst a pole of order {c_P} at {P} (or a zero of order {-c_P} or greater, if {c_P} is negative). For instance, {L( 2(P) + (Q) - (R))} is the space of meromorphic functions that have at most a double pole at {P}, at most a simple pole at {Q}, and at least a simple zero at {R}, if {P,Q,R} are distinct points in {X}. From (2) (and the fact that non-zero constant functions have principal divisor zero) we see that each {L(D)} is a vector space. We clearly have the nesting properties {L(D_1) \subset L(D_2)} if {D_1 \leq D_2}, and also if {f \in L(D_1), g \in L(D_2)} then {fg \in L(D_1+D_2)}.

Remark 15 In the language of vector bundles, one can identify a divisor {D} with a certain holomorphic line bundle on {X}, and {L(D)} can be identified with the space of sections of this bundle. This is arguably the more natural way to think about divisors; however, we will not adopt this language here.

If {D \leq 0} and {f \in L(D)}, then {f} is holomorphic on {X} and hence (by Lemma 11) constant. We can thus easily compute {L(D)} for zero or negative divisors:

Corollary 16 Let {X} be a compact Riemann surface. Then {L(0)} consists only of the constant functions, and {L(D)} consists only of {0} if {D<0}. In particular, {L(D)} has dimension {1} when {D=0} and {0} when {D<0}.

Exercise 17 If {(f)} and {(g)} are principal divisors with {(f) \leq (g)}, show that {f} is a constant multiple of {g} with {(f) = (g)}.

Exercise 18 Let {D} be a divisor. Show that {\mathrm{dim}(L(D)) > 0} if and only if {D} is linearly equivalent to an effective divisor.

The situation for {D \not \leq 0} (i.e., {D} has positive order at at least one point) is more interesting. We first have a simple observation from linear algebra:

Lemma 19 Let {X} be a compact Riemann surface, {D} be a divisor, and {P \in X} be a point. Then {L(D)} has codimension at most {1} in {L(D+(P))}.

Proof: Let {\phi: U \rightarrow {\bf C}} be a chart that maps {P} to the origin, and suppose that {D} has order {m} at {P} (so that {D+(P)} has order {m+1}). Then functions {f \in L(D+(P))}, when composed with the inverse {\phi^{-1}} of the chart function, have Laurent expansion

\displaystyle f( \phi^{-1}(z) ) = \frac{a_{m+1}}{z^{m+1}} + \frac{a_m}{z^m} + \dots

for some complex coefficients {a_{m+1},a_m,\dots} (which will depend on the choice of chart). The map {f \mapsto a_{m+1}} is clearly a linear map from {L(D+(P))} to {{\bf C}}, whose kernel is {L(D)}, and the claim follows. \Box

As a corollary of this lemma and Corollary 16, we see that the spaces {L(D)} are all finite dimensional, with the dimension {\mathrm{dim}(L(D))} increasing by zero or one each time one adds an additional pole to {D}.

Here is another simple linear algebra relation between the dimensions of the spaces {L(D)}:

Lemma 20 Let {X} be a compact Riemann surface, and let {D_1,D_2} be divisors. Then

\displaystyle \mathrm{dim} L(D_1) + \mathrm{dim} L(D_2)

\displaystyle \leq \mathrm{dim} L(\min(D_1,D_2)) + \mathrm{dim} L(\max(D_1,D_2)).

Proof: From linear algebra we have

\displaystyle \mathrm{dim} L(D_1) + \mathrm{dim} L(D_2) = \mathrm{dim}(L(D_1) \cap L(D_2)) + \mathrm{dim} (L(D_1)+L(D_2)).

Since {L(D_1) \cap L(D_2) = L(\min(D_1,D_2))} and {L(D_1) + L(D_2) \subset L(\max(D_1,D_2))}, the claim follows. \Box

If {D} is a divisor and {(f)} is a principal divisor, then (2) gives an isomorphism between {L(D)} and {L(D+(f))}, by mapping {g \in L(D)} to {g/f \in L(D+(f))}. In particular, the dimensions {\mathrm{dim}(L(D))} and {\mathrm{dim}(L(D+(f)))} of the linearly equivalent divisors {D, D+(f)} are the same. If we define a divisor class to be a coset {D + \mathrm{PDiv}(X) = \{ D + (f): f \in M(X) \backslash \{0\}\}} of the principal divisors in {\mathrm{Div}(X)} (that is to say, an equivalence class for linear equivalence), then we conclude that the dimension {\mathrm{dim}(L(D))} depends only on the divisor class {D + \mathrm{PDiv}(X)} of {D}. The space {\mathrm{Div}(X)/\mathrm{PDiv}(X)} of divisor classes is an abelian group, which is known as the divisor class group. (For nonsingular algebraic curves, this group also coincides with the Picard group, though the situation is more subtle if one allows singularities.)

It is now easy to understand the spaces {L(D)} for the Riemann sphere:

Exercise 21 Show that two divisors on the Riemann sphere are linearly equivalent if and only if they have the same degree, so that the degree map gives an isomorphism between the divisor class group of the Riemann sphere and the integers. If {D} is a divisor on the Riemann sphere, show that {\mathrm{dim}(L(D))} is equal to {\max( 0, \mathrm{deg}(D) + 1 )}. (Hint: first show that for any integer {m}, {L(m \cdot (\infty))} is the space of polynomials of degree at most {m}.)

From the above exercise we observe in particular that

\displaystyle \mathrm{dim}(L(D)) - \mathrm{dim}(L(K-D)) = \mathrm{deg}(D) + 1, \ \ \ \ \ (3)


whenever {K} has degree {-2}; as we will see later, this is a special case of the Riemann-Roch theorem.
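The identity (3) is a one-line consequence of the formula in Exercise 21, but it is easy to get the signs wrong, so here is a short Python check. It takes the formula {\mathrm{dim}(L(D)) = \max(0, \mathrm{deg}(D)+1)} as given rather than proving it, and the function names are illustrative.

```python
def dim_L_sphere(degree):
    """dim L(D) for a divisor D of the given degree on the Riemann sphere,
    taking the formula of Exercise 21 as given."""
    return max(0, degree + 1)

def riemann_roch_defect(degree, canonical_degree=-2):
    """dim L(D) - dim L(K - D) - (deg(D) + 1); zero when (3) holds."""
    return (dim_L_sphere(degree)
            - dim_L_sphere(canonical_degree - degree)
            - (degree + 1))

defects = [riemann_roch_defect(d) for d in range(-10, 11)]

# Adding one point to D raises the degree by one; by Lemma 19 the dimension
# should then jump by zero or one.
jumps = [dim_L_sphere(d + 1) - dim_L_sphere(d) for d in range(-10, 11)]
```

All entries of `defects` vanish, and `jumps` only takes the values {0} and {1}, consistent with Lemma 19.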

— 2. Meromorphic {1}-forms —

To proceed further, we will introduce the concept of a meromorphic {1}-form on a compact Riemann surface {X}. To motivate this concept, observe that one can think of a meromorphic function {f \in M(X)} on {X} as a collection of meromorphic functions {f_\phi := f \circ \phi^{-1}: \phi(U) \rightarrow {\bf C} \cup \{\infty\}} on open subsets of the complex plane, where {\phi: U \rightarrow {\bf C}} ranges over a suitable atlas of {X}. These meromorphic functions {f_\phi} are compatible with each other in the following sense: if {\phi: U \rightarrow {\bf C}} and {\psi: V \rightarrow {\bf C}} are charts, then we have

\displaystyle f_\psi(z) = f_\phi( \phi \circ \psi^{-1}(z) ) \ \ \ \ \ (4)


for all {z \in \psi(U \cap V)} (this condition is vacuous if {U,V} do not overlap). As already noted, one can define such concepts as the order of {f} at a pole {P} by declaring it to be the order of {f_\phi} at {\phi(P)} for any chart {\phi: U \rightarrow {\bf C}} that contains {P} in its domain, and the compatibility condition (4) ensures that this definition is well defined.

On the other hand, several other basic notions in complex analysis do not seem to be well defined for such meromorphic functions. Consider for instance the question of how to define the residue of {f} at a pole {P}. The natural thing to do is to again pick a chart {\phi} around {P} and use the residue of {f_\phi}; however one can check that this is not independent of the choice of chart in general, as from (4) one will find that the residues of {f_\psi} and {f_\phi} are related to each other, but not equal. Similarly, one encounters a difficulty integrating {f} on a contour {\gamma} in {X}, even if the contour is short enough to fit into the domain {U} of a single chart {\phi: U \rightarrow {\bf C}} and also avoids all the poles of {f}; the natural thing to do is to compute {\int_{\phi \circ \gamma} f_\phi(z)\ dz}, but again this will depend on the choice of chart (substituting (4) will reveal that {\int_{\psi \circ \gamma} f_\psi(z)\ dz} is not equal to {\int_{\phi \circ \gamma} f_\phi(z)\ dz} in general due to an additional Jacobian factor). Finally, one encounters a difficulty trying to differentiate a meromorphic function {f \in M(X)}; on each chart {\phi: U \rightarrow {\bf C}} one would like to just differentiate {f_\phi}, but the resulting derivatives {(f_\phi)'} do not obey the compatibility condition (4), but instead (by the chain rule) obey the slightly different condition

\displaystyle (f_\psi)'(z) = (f_\phi)'( \phi \circ \psi^{-1}(z) ) (\phi \circ \psi^{-1})'(z).

The solution to all of these issues is to introduce a new type of object on {X}, the meromorphic {1}-forms.

Definition 22 A meromorphic {1}-form {\omega} on {X} is a collection of expressions {\omega_\phi(z)\ dz} for each coordinate chart {\phi: U \rightarrow {\bf C}} of {X}, with {\omega_\phi} meromorphic on {\phi(U)}, which obey the compatibility condition

\displaystyle \omega_\psi(z) = \omega_\phi( \phi \circ \psi^{-1}(z) ) (\phi \circ \psi^{-1})'(z) \ \ \ \ \ (5)


for any pair {\phi: U \rightarrow {\bf C}}, {\psi: V \rightarrow {\bf C}} of charts and any {z \in \psi(U \cap V)}. If all the {\omega_\phi} are holomorphic, we say that {\omega} is holomorphic also. The space of meromorphic {1}-forms will be denoted {M\Omega^1(X)}.

As with meromorphic functions, we can define the order {\mathrm{ord}_P(\omega)} of {\omega} at a point {P \in X} to be the order of {\omega_\phi} at {\phi(P)} for some chart {\phi} that contains {P} in its domain; from (5) we see that this is well defined. Similarly we may define the divisor {(\omega)} of {\omega}. The divisor of a non-zero meromorphic {1}-form is called a canonical divisor. (We will show later that at least one non-zero meromorphic {1}-form is available, so that canonical divisors exist.)

Let {\omega} be a meromorphic {1}-form. Given a contour {\gamma: [a,b] \rightarrow X} that lies in the domain {U} of a single chart {\phi: U \rightarrow {\bf C}} and avoids the poles of {\omega}, we can define the integral {\int_\gamma \omega} to be equal to {\int_{\phi \circ \gamma} \omega_\phi(z)\ dz}. One checks from (5) and the change of variables formula that this definition is independent of the choice of chart. One then defines {\int_\gamma \omega} for longer contours by partitioning into short contours; again, one can check that this definition is independent of the choice of partition.

The residue of {\omega} at {P} can be defined as the residue of {\omega_\phi} at {\phi(P)} for a chart {\phi} that contains {P} in its domain, or equivalently (by the residue theorem) {\frac{1}{2\pi i} \int_\gamma \omega} where {\gamma} is a sufficiently small contour winding around {P} once anticlockwise (note that we have a consistent orientation on {X} since invertible holomorphic maps are orientation preserving).
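The difference between the transformation laws (4) and (5) can be seen numerically. In the sketch below, the set-up is entirely hypothetical: we suppose that in one chart {\phi} (mapping {P} to the origin) both a function and a {1}-form are represented by {1/z}, and that the transition map {\phi \circ \psi^{-1}} is {T(z) = 2z + z^2}, which is invertible near the origin. Residues are approximated by a Riemann sum around a small circle.

```python
import cmath

def contour_residue(func, radius=0.5, samples=256):
    """Approximate (1 / (2 pi i)) times the integral of func over the circle
    |z| = radius, traversed once anticlockwise, by an equally spaced Riemann sum."""
    total = 0j
    for k in range(samples):
        z = radius * cmath.exp(2j * cmath.pi * k / samples)
        dz = 2j * cmath.pi * z / samples  # dz = i z dtheta
        total += func(z) * dz
    return total / (2j * cmath.pi)

# Hypothetical transition map T = phi o psi^{-1} and its derivative.
def T(z):
    return 2 * z + z * z

def T_prime(z):
    return 2 + 2 * z

def f_phi(z):
    return 1 / z

def f_psi(z):
    # Function rule (4): no Jacobian factor.
    return f_phi(T(z))

def omega_psi(z):
    # 1-form rule (5), with omega_phi = f_phi: includes the factor T'(z).
    return f_phi(T(z)) * T_prime(z)

res_f_phi = contour_residue(f_phi)
res_f_psi = contour_residue(f_psi)
res_omega_psi = contour_residue(omega_psi)
```

The function has residue {1} in the first chart but {1/2} in the second, whereas the {1}-form has residue {1} in both, as the Jacobian factor in (5) exactly compensates for the change of variable.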

Meromorphic {1}-forms are also known as Abelian differentials, while holomorphic {1}-forms are Abelian differentials of the first kind. (Abelian differentials of the second kind are meromorphic {1}-forms in which all residues vanish, while Abelian differentials of the third kind are meromorphic {1}-forms in which all poles are simple.) To specify a meromorphic form {\omega}, it suffices to prescribe {\omega_\phi} for all {\phi} in a single atlas of {X}; as long as (5) is obeyed within this atlas, it is easy to see that {\omega_\psi} can then be defined uniquely using (5) for all other coordinate charts.

There are two basic ways to create meromorphic {1}-forms. One is to start with a meromorphic function {f \in M(X)} and form its differential {df}, which when evaluated at any chart {\phi: U \rightarrow {\bf C}} of {X} is given by the formula

\displaystyle (df)_\phi\ dz := (f \circ \phi^{-1})'(z)\ dz;

the compatibility condition (5) is then clear from the chain rule. Another way is to start with an existing meromorphic {1}-form {\omega} and multiply it by a meromorphic function {f} to give a new meromorphic {1}-form {f \omega}, which when evaluated at a given chart {\phi: U \rightarrow {\bf C}} of {X} is given by

\displaystyle (f\omega)_\phi\ dz := (f \circ \phi^{-1})(z) \omega_\phi(z)\ dz;

again, it is clear that the compatibility condition (5) holds. Conversely, given two meromorphic {1}-forms {\omega_1, \omega_2}, with {\omega_2} not identically zero, one can form the ratio {\omega_1/\omega_2} to be the unique meromorphic function {f} such that {\omega_1 = f \omega_2}; it is easy to see that {f} exists and is unique. These properties are compatible with taking divisors, thus {(f\omega) = (f) + (\omega)} and {(\omega_1/\omega_2) = (\omega_1) - (\omega_2)}.

Of course, one can also add two meromorphic {1}-forms to obtain another meromorphic {1}-form. Thus {M\Omega^1(X)} is in fact a one-dimensional vector space over the field {M(X)} (here we assume that non-zero meromorphic {1}-forms exist, a claim which we will return to later). In particular, the canonical divisor is unique up to linear equivalence.

Later on we will discuss a further way to create a meromorphic {1}-form, by taking the gradient of a harmonic function with specific types of singularities.

Example 23 The coordinate function {z} can be viewed as a meromorphic function on the Riemann sphere {{\bf C} \cup \{\infty\}} (it has a simple zero at {0} and a simple pole at {\infty}). Its derivative {dz} then has a double pole at infinity (note that in the reciprocal coordinate {w = 1/z}, {dz} transforms to {-\frac{1}{w^2} dw}), so {(dz) = - 2 \cdot (\infty)}. Any other meromorphic {1}-form is of the form {f(z) dz}, where {f} is a meromorphic function (that is to say, a rational function). In particular, since meromorphic functions have divisor of degree {0}, all meromorphic {1}-forms on the Riemann sphere have a divisor of degree {-2}; indeed, the canonical divisors here are precisely the divisors of degree {-2}.

We now give a key application of meromorphic {1}-forms to the divisors of meromorphic functions:

Proposition 24 Let {X} be a compact Riemann surface.

  • (i) For any meromorphic {1}-form {\omega}, the sum of all the residues of {\omega} vanishes.
  • (ii) Every principal divisor {(f)} has degree zero.

Proof: We begin with (i). By evaluating at coordinate charts, the counterclockwise integral of {\omega} around any small loop {\gamma} that avoids any pole is zero; thus {\omega} is closed outside of these poles, and hence by Stokes’ theorem we conclude that the integral of {\omega} around the sum of small counterclockwise loops around every pole is zero. On the other hand, by the residue theorem applied in each chart, this integral is equal to {2\pi i} times the sum of the residues, and the claim follows.

To prove (ii), apply (i) to the meromorphic {1}-form {df/f} (cf. the usual proof of the argument principle). \Box

Exercise 25 Let {X} be a compact Riemann surface, and let {D} be a divisor on {X}.

  • (i) If {\mathrm{deg}(D) < 0}, show that {\mathrm{dim}(L(D)) = 0}.
  • (ii) If {\mathrm{deg}(D) = 0}, show that {\mathrm{dim}(L(D))} is equal to {0} or {1}, with the latter occurring if and only if {D} is principal. Furthermore, any non-zero element of {L(D)} has divisor {-D}.
  • (iii) If {\mathrm{deg}(D) \geq 0}, establish the bound {\mathrm{dim}(L(D)) \leq \mathrm{deg}(D) + 1}.

We have already discussed how algebraic curves {Z(P)} give good examples of Riemann surfaces. In the converse direction, it is common for Riemann surfaces to map into algebraic curves, as hinted by the following exercise:

Exercise 26 Let {X} be a compact Riemann surface, and let {f, g} be two non-constant meromorphic functions on {X}. Show that there exists a non-zero polynomial {P(z_1,z_2)} of two variables with complex coefficients such that {P(f,g)=0}. (Hint: look at the monomials {f^i g^j} for {i,j \leq N} for some large {N}, and show that they lie in {L(D_N)} for a suitable divisor {D_N}. Then use part (iii) of the previous exercise and linear algebra.) Show furthermore that one can take {P} to be irreducible.

— 3. The case of a complex torus —

For the special case when the Riemann surface being studied is a complex torus {{\bf C}/\Lambda}, one can obtain more precise information on the dimensions {\mathrm{dim}(L(D))} by explicit computations. First observe we have a natural holomorphic {1}-form on {{\bf C}/\Lambda}, namely the form {dz}, defined in any small coordinate chart {\phi: U \hbox{ mod } \Lambda \rightarrow U} on a small disk {U} in {{\bf C}} (with {\phi(z+\Lambda) = z}) by {(dz)_\phi = dz}, and then defined for any other coordinate chart by compatibility. This form has no poles and zeroes, and so {0} is a canonical divisor. Using this {1}-form, we have a bijection between meromorphic functions and meromorphic {1}-forms on {{\bf C}/\Lambda} which maps {f(z)} to {f(z)\ dz}; in contrast to the situation with other Riemann surfaces with non-zero canonical divisor, this bijection does not affect the divisor. In particular, canonical divisors are principal and vice versa. Using this bijection, we can think of the differential {df} of a meromorphic function as another meromorphic function, which we call the derivative {f'}, as per the familiar formula {f' = df / dz}. Of course, with respect to the above coordinate charts, this derivative corresponds to the usual complex derivative.

We also have a fundamental meromorphic function on {{\bf C}/\Lambda}, or equivalently a {\Lambda}-periodic function on {{\bf C}}, namely the Weierstrass {p}-function

\displaystyle \wp(z) := \frac{1}{z^2} + \sum_{w \in \Lambda \backslash\{0\}} (\frac{1}{(z-w)^2} - \frac{1}{w^2}). \ \ \ \ \ (6)


It is easy to see that the sum converges outside of {\Lambda}, and that this is a meromorphic {\Lambda}-periodic function on {{\bf C}} that has a double pole at every point in {\Lambda}; this descends to a meromorphic function on {{\bf C}/\Lambda} whose only pole is a double pole at {0 + \Lambda}, and which therefore lies in {L(2 \cdot (0 + \Lambda))}. By translation we can then create a meromorphic function {\wp_P \in L(2 \cdot (P))} with a double pole at {P} for any {P \in {\bf C}/\Lambda}.
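The convergence and periodicity of (6) can be explored numerically. The following Python sketch truncates the sum to a square box of lattice points; the choice of the square lattice {\Lambda = {\bf Z} + i{\bf Z}}, the cutoff `N`, and the sample point are all illustrative assumptions, and the truncated sum is only approximately periodic.

```python
def wp_truncated(z, tau=1j, N=40):
    """Truncation of the series (6) for the lattice Z + tau Z.

    The sum runs over w = m + n*tau with |m|, |n| <= N; the square lattice
    tau = i and the cutoff N are illustrative choices.
    """
    total = 1 / z**2
    for m in range(-N, N + 1):
        for n in range(-N, N + 1):
            if m == 0 and n == 0:
                continue
            w = m + n * tau
            total += 1 / (z - w)**2 - 1 / w**2
    return total

z0 = 0.3 + 0.2j
value = wp_truncated(z0)
shift_by_one = wp_truncated(z0 + 1)
shift_by_tau = wp_truncated(z0 + 1j)
reflected = wp_truncated(-z0)
```

Since the box of lattice points is symmetric about the origin, the truncated sum is even in {z} up to rounding error, while translating by a period only changes it by a boundary contribution that shrinks as `N` grows.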

Using this function and some manipulations, we can compute {\mathrm{dim}(L(D))} for most divisors {D}:

Lemma 27 Let {{\bf C}/\Lambda} be a complex torus, and let {D} be a divisor.

  • (i) If {\mathrm{deg}(D) < 0}, then {\mathrm{dim}(L(D)) = 0}.
  • (ii) If {\mathrm{deg}(D) = 0}, then {\mathrm{dim}(L(D))} is equal to {0} or {1}. If {D = (P)-(Q)} for some distinct {P,Q \in {\bf C}/\Lambda}, then {\mathrm{dim}(L(D)) = 0}. Also, {\mathrm{dim}(L(D)) = \mathrm{dim}(L(-D))}.
  • (iii) If {\mathrm{deg}(D) > 0}, then {\mathrm{dim}(L(D)) = \mathrm{deg}(D)}.

Proof: Part (i) and the first claim of part (ii) follow from Exercise 25. To prove the second claim of part (ii), it suffices by Exercise 25 to show that there is no meromorphic function with divisor {(Q)-(P)}, that is to say a simple pole at {P} and a simple zero at {Q}. But this follows from Proposition 24(i) (and identifying meromorphic functions with meromorphic {1}-forms) since the residue at {P} is non-zero and there is no other residue to cancel it. The third claim of part (ii) comes from Exercise 25 and the observation that {D} is principal if and only if {-D} is.

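Call a divisor {D} good if {\mathrm{dim}(L(D)) = \mathrm{deg}(D)}. We need to show that all divisors of positive degree are good. First we check that {(P)} is good for a point {P}. A non-constant element of {L((P))} would have a simple pole at {P} and no other poles, which is ruled out by Proposition 24(i) (again identifying meromorphic functions with meromorphic {1}-forms), as the non-zero residue at {P} could not be cancelled. Thus the only meromorphic functions in {L((P))} are constant, hence {\mathrm{dim}(L((P))) = 1}, and so {(P)} is good.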

The Weierstrass {p}-function {\wp_P} at {P} gives an element of {L(2 \cdot (P))} which is non-constant (it has a double pole at {P}), so by Lemma 19 we have {\mathrm{dim}(L(2 \cdot (P))) = 2}, and so {2 \cdot (P)} is good. Taking a derivative of {\wp_P} to obtain a meromorphic function with a triple pole at {P}, we obtain a further element of {L(3 \cdot (P))} that is not in {L(2 \cdot (P))}, and so {\mathrm{dim}(L(3 \cdot (P))) = 3}, and so {3 \cdot (P)} is good. Continuing to differentiate in this fashion we see that {k \cdot (P)} is good for any natural number {k}.

Next, for any pair of distinct points {P, Q}, we write {Q = P + \zeta} for some complex number {\zeta}, and define the meromorphic function

\displaystyle \wp_{P \rightarrow Q}(z) := \int_\gamma \wp_{P+w}(z)\ dw

where {\gamma} is some contour from {0} to {\zeta}. (As {w \mapsto \wp_{P+w}(z)} locally has an antiderivative at every point, this definition does not depend on the choice of {\gamma}, though it is a little sensitive to the choice of {\zeta}.) One can check that this function is meromorphic with simple poles at {P} and {Q}, which shows that {\mathrm{dim}(L((P)+(Q))) \geq 2}. From Lemma 19 and the fact that {(P)} is good, we conclude that {(P)+(Q)} is good.

From Lemma 19 and Lemma 20 we see that if {D} is a divisor and {P,Q} are distinct points such that {D, D+(P), D+(Q)} are good, then {D+(P)+(Q)} is also good. We have just shown that all effective divisors of degree {1} and {2} are good; by induction one can now show that all effective divisors of positive degree are good.

Call a degree one divisor {D} very good if {D+D'} is good for every {D' \geq 0}. We have shown that {(P)} is very good for all {P}. We now claim that if {D} is very good then so is {\tilde D := D + (P) - (Q)} for any distinct points {P,Q}. First note that {\tilde D - (P)} and {\tilde D - (Q)} cannot both be principal, since their difference {(P)-(Q)} is not principal. Thus by Exercise 25, at least one of {\mathrm{dim}(L(\tilde D-(P)))} or {\mathrm{dim}(L(\tilde D- (Q)))} vanishes, and hence {\mathrm{dim}(L(\tilde D)) \leq 1} by Lemma 19. On the other hand, as {D} is very good, {\tilde D+(Q) = D+(P)} is good, and so {\mathrm{dim}(L(\tilde D + (Q))) = 2}. By Lemma 19 we conclude that {\tilde D} is good.

For any {R}, we know that {\tilde D} and {\tilde D + (Q) + (R) = D + (P) + (R)} are good, hence by Lemma 19 the intermediate divisor {\tilde D + (R)} must also be good. Iterating this argument we see that {\tilde D +D'} is good for every {D' \geq 0}, thus {\tilde D} is very good. Iterating this we see that all degree one divisors are very good, giving (iii). \Box

An alternate way to show that {(P)+(Q)} is good was shown to me by Redmond McNamara as follows. By subtracting a constant from {\wp_P} one can find a meromorphic function {f} with a double pole at {P} and at least a single zero at {Q}. If it is exactly a single zero, multiplying by {\wp_Q} will create a function with precisely a simple pole at {Q} and at most a double pole at {P}; subtracting a multiple of {\wp_P} if necessary will then give the required non-trivial element of {L((P)+(Q))}. If {f} has a double zero at {Q} instead, multiply {f'} by {\wp_Q} and subtract multiples of {\wp_P} and {\wp'_P} to obtain the same result. Note that {f} cannot have more than a double zero because of Proposition 24(ii).

As a corollary of Lemma 27 we obtain the complex torus case of the Riemann-Roch theorem:

\displaystyle \mathrm{dim}(L(D)) - \mathrm{dim}(L(-D)) = \mathrm{deg}(D), \ \ \ \ \ (7)


valid for any divisor {D} (regardless of degree); compare with (3). The one remaining point is to work out which degree zero divisors are principal. It turns out that there is an additional constraint beyond degree zero:

Exercise 28 Suppose that {(P_1)+\dots+(P_n)-(Q_1)-\dots-(Q_n)} is a principal divisor on a complex torus {{\bf C}/\Lambda} (we allow repetition). Show that {P_1+\dots+P_n-Q_1-\dots-Q_n=0} using the group law on {{\bf C}/\Lambda}. (Hint: if {f} is a meromorphic function with zeroes at {P_1,\dots,P_n} and poles at {Q_1,\dots,Q_n}, integrate {z \frac{f'(z)}{f(z)}} around a parallelogram fundamental domain of {{\bf C}/\Lambda} (translating if necessary so that the boundary of the parallelogram avoids the zeroes and poles).)

In fact, this is the only condition:

Proposition 29 A degree zero divisor {(P_1)+\dots+(P_n)-(Q_1)-\dots-(Q_n)} is principal if and only if {P_1+\dots+P_n-Q_1-\dots-Q_n=0}.

Proof: By the above exercise it suffices to establish the “if” direction. We may of course assume {n \geq 1}. By Lemma 27, the space {L( (P_1) + \dots + (P_n) - (Q_1) - \dots - (Q_{n-1}) )} is one-dimensional, thus there exists a non-zero meromorphic function {f} with poles at {P_1,\dots,P_n}, zeroes at {Q_1,\dots,Q_{n-1}}, and no further poles (counting multiplicity). By Proposition 24(ii) {f} must have one further zero, and by the above exercise this zero must be {Q_n}. The claim follows. \Box

One can explicitly write down a formula for these meromorphic functions using theta functions, but we will not do so here.

The above proposition links the group law on a complex torus with the group law on divisors. This is part of a more general relation involving the Jacobian variety of a curve and the Abel-Jacobi theorem, but we will not discuss this further in this course.

Exercise 30 Let {{\bf C}/\Lambda} be a complex torus. Show that the Weierstrass function {\wp} obeys the differential equation

\displaystyle \wp'(z)^2 = 4\wp(z)^3 - g_2 \wp(z) - g_3

for some complex numbers {g_2,g_3} depending on {\Lambda}. Also show that the map {z \mapsto [\wp(z), \wp'(z),1]} for {z \in {\bf C} /\Lambda \backslash \{0+\Lambda\}} (with {0+\Lambda} mapping to {[0,1,0]}) is a holomorphic invertible map from {{\bf C}/\Lambda} to the algebraic curve

\displaystyle \{ [z_1,z_2,z_3]: z_2^2 z_3 = 4 z_1^3 - g_2 z_1 z_3^2 - g_3 z_3^3 \}

which is non-singular and irreducible. (Thus, every complex torus is isomorphic to an elliptic curve. The converse is also true, but will not be established here.)
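As a numerical plausibility check of the differential equation in Exercise 30 (not a solution of the exercise), here is a minimal Python sketch using truncated lattice sums. It assumes the standard series {\wp(z) = z^{-2} + \sum_{\omega \in \Lambda \backslash \{0\}} ((z-\omega)^{-2} - \omega^{-2})} and the standard formulas {g_2 = 60 \sum_{\omega \neq 0} \omega^{-4}}, {g_3 = 140 \sum_{\omega \neq 0} \omega^{-6}}, which are not stated above; the function names, the truncation parameter `n`, the square lattice and the sample point are all illustrative choices.

```python
def lattice_points(w1, w2, n):
    """Non-zero points a*w1 + b*w2 with |a|, |b| <= n (a symmetric truncation)."""
    return [a * w1 + b * w2
            for a in range(-n, n + 1)
            for b in range(-n, n + 1)
            if (a, b) != (0, 0)]

def wp(z, pts):
    """Truncated series for the Weierstrass function (assumed standard form)."""
    return 1 / z**2 + sum(1 / (z - w)**2 - 1 / w**2 for w in pts)

def wp_prime(z, pts):
    """Truncated series for the derivative of the Weierstrass function."""
    return -2 / z**3 - 2 * sum(1 / (z - w)**3 for w in pts)

def invariants(pts):
    """Truncated g_2 = 60 * sum w^-4 and g_3 = 140 * sum w^-6 (assumed formulas)."""
    g2 = 60 * sum(w**-4 for w in pts)
    g3 = 140 * sum(w**-6 for w in pts)
    return g2, g3

def relative_residual(z, w1=1, w2=1j, n=40):
    """Size of wp'^2 - (4 wp^3 - g2 wp - g3), relative to |wp'^2|."""
    pts = lattice_points(w1, w2, n)
    g2, g3 = invariants(pts)
    p, dp = wp(z, pts), wp_prime(z, pts)
    lhs = dp**2
    rhs = 4 * p**3 - g2 * p - g3
    return abs(lhs - rhs) / abs(lhs)

# Should be small, and should shrink as the truncation n grows.
print(relative_residual(0.3 + 0.2j))
```

The residual is only approximately zero because the lattice sums are truncated; a small value is evidence for the identity, not a proof of it.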

Proposition 31 Let {{\bf C}/\Lambda} be a complex torus, and let {f: {\bf C}/\Lambda \rightarrow {\bf C}/\Lambda} be a function. Show that {f} is holomorphic if and only if it takes the form

\displaystyle f( z + \Lambda ) = \alpha z + z_0 + \Lambda

for all {z \in {\bf C}}, and some complex numbers {\alpha,z_0}, with {\alpha} lying in the set {\Gamma := \{ \alpha \in {\bf C}: \alpha \lambda \in \Lambda \hbox{ for all } \lambda \in \Lambda \}}. Furthermore, show that {\Gamma} is either equal to the integers, or to a lattice of the form {\{ n + m \alpha_0: n,m \in {\bf Z} \}} for some quadratic algebraic integer {\alpha_0} (thus {\alpha_0} obeys an equation {\alpha_0^2 + b \alpha_0 + c = 0} for some integers {b,c}). In the latter case, the complex torus is said to have complex multiplication.
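To make the set {\Gamma} concrete, the following Python sketch tests numerically whether a given {\alpha} satisfies {\alpha \Lambda \subset \Lambda} for a lattice {\Lambda} with basis `w1, w2`. By linearity it is enough to test the two basis vectors. The helper names and the tolerance are hypothetical choices made for illustration, and a floating-point test of this kind can only suggest membership, not prove it.

```python
import cmath

def coordinates(c, w1, w2):
    """Real numbers (a, b) with c = a*w1 + b*w2, for an R-basis w1, w2 of C."""
    w1, w2 = complex(w1), complex(w2)
    det = (w1 * w2.conjugate()).imag
    a = (c * w2.conjugate()).imag / det
    b = -(c * w1.conjugate()).imag / det
    return a, b

def is_multiplier(alpha, w1, w2, tol=1e-9):
    """Check (up to tol) that alpha*w1 and alpha*w2 lie in Z*w1 + Z*w2."""
    for w in (w1, w2):
        a, b = coordinates(alpha * w, w1, w2)
        if abs(a - round(a)) > tol or abs(b - round(b)) > tol:
            return False
    return True

tau = cmath.exp(2j * cmath.pi / 3)         # a root of x^2 + x + 1 = 0

print(is_multiplier(1j, 1, 1j))            # square lattice, alpha = i
print(is_multiplier(tau, 1, tau))          # hexagonal lattice, alpha = tau
print(is_multiplier(1j, 1, 0.3 + 1.7j))    # an arbitrarily chosen lattice
```

In the first two cases the multiplier is a quadratic algebraic integer ({i^2 + 1 = 0} and {\tau^2 + \tau + 1 = 0}), as the statement predicts for tori with complex multiplication.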

— 4. The Riemann Roch theorem —

We now leave the example of the complex torus and return to more general compact Riemann surfaces {X}. We would like to generalise the identity (7) (or (3)) to this setting. As a first step we establish

Proposition 32 (Baby Riemann Roch theorem) Let {K} be a canonical divisor in a compact Riemann surface {X}, and let {D} be an effective divisor. Then

\displaystyle \mathrm{dim} L(D) - \mathrm{dim}L(K-D) \leq \mathrm{deg}(D) + 1 - \mathrm{dim}(L(K)).

Proof: Write {D = \sum_{P \in {\mathcal P}} n_P \cdot (P)} where {P} ranges over some finite set {{\mathcal P}} of points in {X}, and {n_P} are positive integers. As {K} is a canonical divisor, we can find a meromorphic {1}-form {\omega}, not identically zero, with divisor {K}. If {h \in L(K)}, then {h \omega} is a holomorphic {1}-form. The proof relies on using linear algebra to combine the following three observations that tie together {L(D)}, {L(K-D)}, and {L(K)}:

  1. If {f \in L(D)} and {h \in L(K)}, then {\sum_{P \in {\mathcal P}} \mathrm{Res}(f h \omega, P) = 0}. This follows from Proposition 24(i) and the fact that the only possible poles of {f h \omega} are in {{\mathcal P}}.
  2. If {f \in L(D)} and {h \in L(K-D)}, then we have the stronger assertion that {\mathrm{Res}(f h \omega, P) = 0} for each individual {P \in {\mathcal P}}. This follows because the divisor of {f h \omega} is at least {-D - (K-D) + K = 0}, and so {f h \omega} has no pole at {P}. Furthermore this is a local statement: it holds even if {f} is only defined on a small neighborhood of {P}, rather than on all of {X}. Finally, the claim is sharp: if {h \not \in L(K-D)} then one can find {P \in {\mathcal P}} and some {f} defined locally near {P} in {L(D)} for which {\mathrm{Res}(f h \omega, P) \neq 0}.
  3. If {h \in L(K)}, {P \in {\mathcal P}}, and {f \in M(X)} is holomorphic at {P}, then {\mathrm{Res}(f h \omega, P) = 0}, since {h \omega} is also holomorphic at {P}. Again, this is a local statement, and holds even if {f} is only defined in a neighbourhood of {P}.

Let us now see how these facts combine to give the proposition. Around each {P} let us form a chart {\phi_P: U_P \rightarrow {\bf C}} that maps {P} to {0}. Then for any {f \in L(D)}, {f \circ \phi_P^{-1}} has a pole of order at most {n_P} at the origin, and can thus be written as

\displaystyle f \circ \phi_P^{-1}(z) = \sum_{j=1}^{n_P} \frac{c_{P,j}}{z^j} + h_P(z)

for {z} near {0}, where {c_{P,j}} are complex numbers and {h_P(z)} is holomorphic at the origin. We call the expression {\sum_{j=1}^{n_P} \frac{c_{P,j}}{z^j}} the principal part of {f} (uniformised by {\phi_P}) at {P}. If we let {V} denote the collection of tuples {(\sum_{j=1}^{n_P} \frac{c_{P,j}}{z^j})_{P \in {\mathcal P}}} with {c_{P,j}} complex, then {V} is a complex vector space of dimension {\sum_{P \in {\mathcal P}} n_P = \mathrm{deg}(D)}. Inside this space we have the subspace {W} of tuples that can actually arise as the principal parts of a meromorphic function {f} in {L(D)}. Observe that if two functions {f,g \in L(D)} have the same principal parts, then their difference {f-g} is holomorphic and hence constant by Lemma 11. Thus, the space {W} has dimension exactly {\mathrm{dim}(L(D)) - 1}.

As {K} is a canonical divisor, we have a meromorphic {1}-form {\omega} with divisor {K}. If {h \in L(K)}, then {h \omega} is a holomorphic {1}-form. If {v = (\sum_{j=1}^{n_P} \frac{c_{P,j}}{z^j})_{P \in {\mathcal P}}} is a tuple in {V}, we can define a pairing {\langle h, v \rangle} by the formula

\displaystyle \langle h, v \rangle := \sum_{P \in {\mathcal P}} \mathrm{Res}( ((h \omega) \circ \phi_P^{-1})(z) \sum_{j=1}^{n_P} \frac{c_{P,j}}{z^j}, 0).

This is a bilinear pairing from {L(K) \times V} to {{\bf C}}. If {v \in W}, then all the components of {v} are principal parts of some {f \in L(D)}, and by Observation 3 one can then write {\langle h,v \rangle} as {\sum_{P \in {\mathcal P}} \mathrm{Res}(f h \omega, P)}, which then vanishes by Observation 1. Thus {\langle h,v \rangle = 0} whenever {h \in L(K)} and {v \in W}. As row rank equals column rank, we conclude that there is a subspace {U} of {L(K)} of dimension at least

\displaystyle \mathrm{dim} L(K) - (\mathrm{dim}(V) - \mathrm{dim}(W)) = \mathrm{dim} L(K) - \mathrm{deg}(D) + \mathrm{dim} L(D) - 1

such that {\langle h, v \rangle = 0} whenever {h \in U} and {v \in V}. But then if {h \in U}, {h \omega} must vanish to order at least {n_P} at each {P}, hence {(h \omega) \geq D}, which is equivalent to {(h) \geq D-K} and hence to {h \in L(K-D)}; this is Observation 2. One concludes that

\displaystyle \mathrm{dim} L(K-D) \geq \mathrm{dim} L(K) - \mathrm{deg}(D) + \mathrm{dim} L(D) - 1

and the claim follows by rearranging. \Box

One can amplify this proposition if one is in possession of the following three non-trivial claims.

  1. There is at least one non-zero meromorphic {1}-form; in particular, canonical divisors exist.
  2. Every canonical divisor has degree {2g-2}, where {g} is the (topological) genus of {X}.
  3. The space of holomorphic {1}-forms has dimension {g}. Equivalently, for any canonical divisor {K}, {\mathrm{dim} L(K) = g}. (In algebraic geometry language, this asserts that for compact Riemann surfaces, the topological genus is equal to the geometric genus.)

Example 33 The Riemann sphere {{\bf C} \cup \{\infty\}} has genus {g=0}. All meromorphic {1}-forms, such as {dz}, have degree {-2} and so cannot be holomorphic, so there are no holomorphic {1}-forms. Meanwhile, a complex torus {{\bf C}/\Lambda} has genus {1}. All meromorphic {1}-forms, such as {dz}, have degree {0}. In particular, a holomorphic {1}-form is {dz} times a holomorphic function, so by Lemma 11 the space of holomorphic {1}-forms is one-dimensional.

Assuming these claims, the above proposition gives, for any canonical divisor {K}, that

\displaystyle \mathrm{dim} L(D) - \mathrm{dim}L(K-D) \leq \mathrm{deg}(D) + 1 - g

when {D} is effective and (replacing {D} by {K-D})

\displaystyle \mathrm{dim} L(K-D) - \mathrm{dim}L(D) \leq (2g-2 - \mathrm{deg}(D)) + 1 - g

when {K-D} is effective. Since the second right-hand side is the negative of the first, we conclude that

\displaystyle \mathrm{dim} L(D) - \mathrm{dim}L(K-D) = \mathrm{deg}(D) + 1 - g

whenever {D} and {K-D} are both effective. In fact we have the more general

Theorem 34 (Riemann-Roch theorem) Let {X} be a compact Riemann surface of genus {g}, let {K} be a canonical divisor, and let {D} be any divisor. Then

\displaystyle \mathrm{dim} L(D) - \mathrm{dim}L(K-D) = \mathrm{deg}(D) + 1 - g.

This of course generalises (3) on the Riemann sphere (which has genus zero) and (7) on a complex torus (which has genus one).
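The two special cases can be checked mechanically. The sketch below is illustrative only: it assumes the standard fact that on the Riemann sphere {\mathrm{dim} L(D) = \max(0, \mathrm{deg}(D)+1)}, uses Lemma 27 for the torus, and takes {K} of degree {2g-2} (with {K = 0}, the divisor of {dz}, on the torus). The function names are hypothetical.

```python
def dim_L_sphere(deg):
    """dim L(D) on the Riemann sphere; depends only on deg(D) (assumed fact)."""
    return max(0, deg + 1)

def dim_L_torus(deg, principal=False):
    """dim L(D) on a complex torus, following Lemma 27.

    For deg(D) = 0 the answer is 1 if D is principal and 0 otherwise.
    """
    if deg > 0:
        return deg
    if deg < 0:
        return 0
    return 1 if principal else 0

def sphere_defect(deg):
    """dim L(D) - dim L(K-D) - (deg(D) + 1 - g) with g = 0, deg(K) = -2."""
    return dim_L_sphere(deg) - dim_L_sphere(-2 - deg) - (deg + 1)

def torus_defect(deg, principal=False):
    """dim L(D) - dim L(-D) - deg(D), i.e. the case g = 1 with K = 0.

    D is principal if and only if -D is, so the flag is shared.
    """
    return dim_L_torus(deg, principal) - dim_L_torus(-deg, principal) - deg

print([sphere_defect(n) for n in range(-5, 6)])
print([torus_defect(n) for n in range(-5, 6)])
```

Both lists should consist entirely of zeroes, which is the content of the Riemann-Roch identity in genus zero and genus one.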

It remains to establish the above three claims, and to obtain the Riemann-Roch theorem in full generality. I have not been able to locate particularly simple proofs of these steps that do not require significant machinery outside of complex analysis, so will only sketch some arguments justifying each of these.

To create meromorphic {1}-forms one can take gradients of harmonic functions, in the spirit of the proof of the uniformization theorem that was (mostly) given in these 246A lecture notes. A function {f: X \rightarrow {\bf R}} is said to be harmonic if, for every coordinate chart {\phi: U \rightarrow {\bf C}}, {f \circ \phi^{-1}: \phi(U) \rightarrow {\bf R}} is harmonic; as the property of being harmonic on open subsets of the complex plane is unaffected by conformal transformations, this definition does not depend on the choice of atlas that the charts are drawn from. If {f} is harmonic, one can form a holomorphic {1}-form {Df} on {X} by defining

\displaystyle (Df)_\phi(x+iy)\ d(x+iy) := (\frac{\partial}{\partial x} (f \circ \phi^{-1})(x+iy)

\displaystyle - i \frac{\partial}{\partial y} (f \circ \phi^{-1})(x+iy)) d(x+iy)

for each chart {\phi: U \rightarrow {\bf C}} and {x+iy \in \phi(U)}.

For instance, on {{\bf C}}, the harmonic function {x^2-y^2} gives rise to the holomorphic {1}-form {(2x + 2iy) d(x+iy) = 2z dz}.
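The same computation can be approximated numerically. The Python sketch below estimates the coefficient of {d(x+iy)} in {Df} by central differences, in the identity chart on (a subset of) {{\bf C}}; the function names and the step size `h` are illustrative assumptions. It is applied to {x^2-y^2} and to {\log |z|}, which is harmonic away from the origin and for which the same formula gives {dz/z}.

```python
import math

def D_coefficient(f, x, y, h=1e-5):
    """Approximate (d/dx - i d/dy) f at x + iy using central differences."""
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return complex(fx, -fy)

def u(x, y):
    """The harmonic function x^2 - y^2."""
    return x * x - y * y

def log_abs(x, y):
    """The function log|z|, harmonic away from z = 0."""
    return 0.5 * math.log(x * x + y * y)

z = complex(0.7, -0.4)
print(D_coefficient(u, z.real, z.imag), 2 * z)        # compare with 2z
print(D_coefficient(log_abs, z.real, z.imag), 1 / z)  # compare with 1/z
```

In each line the two printed numbers should agree up to discretisation error.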

Exercise 35 Show that this definition indeed defines a holomorphic {1}-form (thus the {(Df)_\phi} are all holomorphic and obey the compatibility condition (5)). (The computations are slightly less tedious if one uses Wirtinger derivatives.)

Unfortunately, for compact Riemann surfaces {X}, the same maximum principle argument used to prove Lemma 11 shows that there are no non-constant globally harmonic functions on {X}, so we cannot use this construction directly to produce non-trivial holomorphic or meromorphic {1}-forms on {X}. However, one can produce harmonic functions with logarithmic singularities, a prototypical example of which is the function {\log|z|} on the Riemann sphere, which is harmonic except at {z=0} and {z=\infty}. More generally, one has

Proposition 36 (Existence of dipole Green’s function) Let {X} be a Riemann surface, and let {P,Q} be distinct points in {X}. Then there exists a harmonic function {f} on {X \backslash \{P,Q\}} with the property that for any chart {\phi: U \rightarrow {\bf C}} that maps {P} to {0}, {f \circ \phi^{-1}(z)} is equal to {\log |z|} plus a bounded function near {0}, and for any chart {\psi: V \rightarrow {\bf C}} that maps {Q} to {0}, {f \circ \psi^{-1}(z)} is equal to {-\log |z|} plus a bounded function near {0}.

This proposition is essentially Proposition 65 of these 246A notes and can be proven using (a somewhat technical modification of) Perron’s method of subharmonic functions; we will not do so here. One can combine this proposition with the preceding construction to obtain a non-constant meromorphic {1}-form:

Exercise 37 Using the above proposition, show that if {X} is a compact Riemann surface and {P,Q} are distinct points in {X}, then there is a meromorphic {1}-form {\omega_{(P)-(Q)}} on {X} with poles only at {P,Q}, with a residue of {1} at {P} and a residue of {-1} at {Q}.

Using this, conclude the Riemann existence theorem: for any compact Riemann surface {X} and distinct points {P,Q} in {X}, there exists a meromorphic function on {X} that takes different values at {P} and {Q} and is in particular non-constant. (In other words, the meromorphic functions separate points.)

To prove the full Riemann-Roch theorem we will also need a variant of this exercise, not proven here:

Proposition 38 If {X} is a compact Riemann surface, {P} is a point in {X}, and {n \geq 2}, then there exists a meromorphic {1}-form {\omega_{n \cdot (P)}} on {X} with a pole of order {n} at {P} and no other poles.

The {1}-forms {\omega_{n \cdot (P)}} constructed by this proposition can be viewed as generalisations of the Weierstrass functions {\wp_P} (and their derivatives) to other Riemann surfaces, much as the {1}-forms {\omega_{(P)-(Q)}} constructed in Exercise 37 are generalisations of first integrals of these functions. (Indeed, one can think of {\omega_{2 \cdot (P)}} as a sort of “derivative” of {\omega_{(P)-(Q)}}, formed as {Q} approaches {P} and taking a suitable renormalised limit; more generally, one can think of {\omega_{(n+1) \cdot (P)}} as a suitably renormalised limit of {\omega_{n \cdot (P)} - \omega_{n \cdot (Q)}} as {Q} approaches {P}.) Note from Proposition 24 that the {\omega_{n \cdot (P)}} constructed by the above proposition automatically have vanishing residue at {P} (in classical language, these are Abelian differentials of the second kind, while the {\omega_{(P)-(Q)}} are Abelian differentials of the third kind).

The first claim is now settled by Exercise 37, so we now turn to the second. A non-constant meromorphic function {f} on {X} can be viewed as a non-constant holomorphic map from {X} to the Riemann sphere {{\bf C} \cup \{\infty\}}. By Proposition 24(ii), the number of times {f} equals {\infty} (counting multiplicity) equals the number of times {f} equals {0} (counting multiplicity). Calling this number {d} (the degree of {f}), we have {d \geq 1} by Lemma 11, and we see (by again applying Proposition 24(ii) to {f-c} for any constant {c}) that {f} attains each value {c} on the Riemann sphere {d} times (counting multiplicity). As long as one stays away from the zeroes of {f'}, the zeroes of {f(z)-c} are all simple, and vary continuously in {c} by the inverse function theorem (or Rouché’s theorem), and hence after deleting a finite number of ramification points from {{\bf C} \cup \{\infty\}} (and also deleting their preimages from {X}), one can think of {f} as a {d}-fold covering map from (a punctured version of) {X} to (a punctured version of) {{\bf C} \cup \{\infty\}}, that is to say {X} is a {d}-fold branched cover of {{\bf C} \cup \{\infty\}}. By applying a fractional linear transformation if necessary, we may assume that {\infty} is not a ramification point of this cover (this is mainly for notational convenience).

One can use such a branched covering, together with some algebraic topology (which we will assume here as “black boxes”), to verify the second claim. Instead of working directly with the genus {g} of {X}, one can work instead with the Euler characteristic {\chi(X)} of {X}, which is known from algebraic topology to equal {2-2g}. For instance, the Riemann sphere {{\bf C} \cup \{\infty\}} has genus {0} and Euler characteristic {2}, while a complex torus {{\bf C}/\Lambda} has genus {1} and Euler characteristic {0}.

If one has a {d}-fold covering map from one surface {X} to another {Y}, one can show that the Euler characteristics are related by the formula {\chi(X) = d \chi(Y)}. With branched coverings this is not quite the case, but there is a substitute formula that takes into account the ramification points, known as the Riemann-Hurwitz formula. Basically, since we have a {d}-fold cover from a punctured version {X \backslash S} of {X} to a punctured version {{\bf C} \cup \{\infty\} \backslash R} of the Riemann sphere, we have

\displaystyle \chi(X \backslash S) = d \chi( {\bf C} \cup \{\infty\} \backslash R ).

On the other hand, if one reinserts a point {z} from {R} back into the punctured Riemann sphere, and also inserts all the preimages {f^{-1}(z)} of that point back into {X}, one can calculate that the Euler characteristic of the punctured sphere increases by {1}, while the Euler characteristic of the punctured version of {X} increases by the cardinality {\# f^{-1}(z)} of the preimage. We conclude the Riemann-Hurwitz formula

\displaystyle \chi(X) - \sum_{z \in R} \# f^{-1}(z) = d (\chi({\bf C} \cup \{\infty\}) - |R|).

As {f} has degree {d}, we have

\displaystyle \sum_{P \in f^{-1}(z)} \mathrm{ord}_P(f-z) = d

for each {z \in R} and so we can rearrange the above (using {\chi(X) = 2-2g} and {\chi({\bf C} \cup \{\infty\}) = 2}) as

\displaystyle \sum_{z \in R} \sum_{P \in f^{-1}(z)} (\mathrm{ord}_P(f-z)-1) - 2d = - (2-2g). \ \ \ \ \ (8)


The meromorphic {1}-form {dz} on the Riemann sphere has a double pole at {\infty} and no zeroes. As {\infty} is not a point of ramification, the pullback {f^* dz} of this form then has a double pole at each of the {d} preimages in {f^{-1}(\infty)}. However, it also acquires a zero of order {\mathrm{ord}_P(f-z)-1} whenever {z \in R} and {P \in f^{-1}(z)}. Taking divisors, we conclude that the left-hand side of (8) is equal to the degree of {(f^* dz)}, which is a canonical divisor. Since all canonical divisors have the same degree, this gives the second claim.
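To see formula (8) in action, one can solve it for {g} given the degree {d} and the orders {\mathrm{ord}_P(f-z)} at the ramified points. The following Python sketch does just that; the function name and the sample ramification data are illustrative assumptions, and unramified preimages may be omitted since they contribute {\mathrm{ord}_P(f-z)-1 = 0} to the sum.

```python
def genus_from_branched_cover(d, orders):
    """Solve (8) for g.

    d      -- degree of the map f to the Riemann sphere
    orders -- the values ord_P(f - z) at the ramified points P
    """
    total = sum(e - 1 for e in orders)
    two_g = total - 2 * d + 2          # (8) rearranged: 2g = total - 2d + 2
    if two_g < 0 or two_g % 2:
        raise ValueError("ramification data inconsistent with (8)")
    return two_g // 2

# Degree one map with no ramification: the sphere itself.
print(genus_from_branched_cover(1, []))
# Degree two map with four simple ramification points, as for the
# Weierstrass function on a complex torus.
print(genus_from_branched_cover(2, [2, 2, 2, 2]))
# Degree two map with six simple ramification points.
print(genus_from_branched_cover(2, [2] * 6))
```

The second example recovers genus one for the complex torus, consistent with Example 33.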

Exercise 39 Let {X} and {Y} be compact Riemann surfaces, with {Y} having higher genus than {X}. Show that there does not exist any non-constant holomorphic map from {X} to {Y}.

Now we discuss the third claim. It is relatively easy to show that the dimension of the space of holomorphic {1}-forms is upper bounded by {g}. Indeed, we may assume without loss of generality that there exists at least one non-zero holomorphic {1}-form, giving an effective canonical divisor {K}, which we have just shown to have degree {2g-2}. From Proposition 32 applied to {D=K} we then have

\displaystyle \mathrm{dim} L(K) - 1 \leq 2g-2 + 1 - \mathrm{dim}(L(K))

and hence {\mathrm{dim}(L(K))\leq g}, giving the claimed upper bound.

The lower bound is harder. Basically, it asserts that the pairing {\langle,\rangle} in Proposition 32, when quotiented down to a pairing between {L(K)/L(K-D)} and {V/W}, is non-degenerate. This is a special case of an algebraic geometry fact known as Serre duality, which we will not prove here. It can also be proven from Hodge theory, using the fact that the first de Rham cohomology has dimension {2g}; we do not pursue this approach here. Alternatively, one can try to explicitly construct {g} linearly independent holomorphic {1}-forms on the Riemann surface {X}. We will not do this in general, but show how to do this in the case of a smooth algebraic curve {Z(P)} of degree {d}. The genus of such a curve turns out to be given by the genus-degree formula

\displaystyle g = \frac{(d-1)(d-2)}{2}.

One can sketch a proof of this using the Riemann-Hurwitz formula. For simplicity of notation let us assume that the polynomial is in “general position” in a number of senses that we will not specify precisely. We focus on the affine curve

\displaystyle Z(P)_{aff} = \{ [z_1,z_2,1]: P(z_1,z_2,1) = 0 \};

generically this is {Z(P)} with {d} points deleted, and thus will have an Euler characteristic of {2-2g-d}. The projection map from {Z(P)_{aff}} to {{\bf C}} (which has Euler characteristic {1}) that maps {[z_1,z_2,1]} to {z_1} has ramification points whenever {\frac{\partial P}{\partial z_2}(z_1,z_2,1)} vanishes, which generically will be simple; away from these points one has a {d}-fold covering. Bezout’s theorem shows that this happens {d(d-1)} times. A modification of the proof of (8) then gives

\displaystyle d(d-1) - d = - (2 - 2g - d)

which gives the claim.
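The last step is a short rearrangement, which can be double-checked by machine. This Python sketch (function names are illustrative) solves the displayed identity {d(d-1) - d = -(2-2g-d)} for {g} and compares the result with {\frac{(d-1)(d-2)}{2}}.

```python
def genus_from_projection(d):
    """Solve d(d-1) - d = -(2 - 2g - d) for g."""
    two_g = d * (d - 1) - 2 * d + 2    # rearranged: 2g = d(d-1) - 2d + 2
    assert two_g % 2 == 0              # (d-1)(d-2) is always even
    return two_g // 2

def genus_degree(d):
    """The genus-degree formula g = (d-1)(d-2)/2."""
    return (d - 1) * (d - 2) // 2

for d in range(1, 7):
    print(d, genus_from_projection(d), genus_degree(d))
```

For instance a smooth cubic ({d=3}) has genus one, in agreement with Exercise 30.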

To construct {\frac{(d-1)(d-2)}{2}} linearly independent holomorphic {1}-forms on {Z(P)} one can argue as follows. Again it is convenient for notational reasons to work on the affine curve {Z(P)_{aff}} and assume that {P} is in general position. The cases {d=0,1,2} can be worked out by hand, so suppose {d \geq 3}. Taking the differential of the coordinate function {z_1} gives a meromorphic {1}-form {dz_1}, which generically has simple zeroes whenever the degree {d-1} polynomial {\frac{\partial P}{\partial z_2}(z_1,z_2,1)} vanishes, and has a pole of order {2} at the {d} points in {Z(P) \backslash Z(P)_{aff}} (i.e., the {d} points where {Z(P)} meets the line at infinity). This implies that for any polynomial {Q(z_1,z_2)} of degree at most {d-3}, the {1}-form

\displaystyle \frac{Q(z_1,z_2)}{\frac{\partial P}{\partial z_2}(z_1,z_2,1)} dz_1

is holomorphic (we have killed all the poles and removed the simple zeroes, while possibly creating new zeroes where {Q} vanishes). The space of such polynomials has dimension {\frac{(d-1)(d-2)}{2} = g}, giving the claim.
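The dimension count can be confirmed by listing the monomials {z_1^i z_2^j} with {i+j \leq d-3}, which form a basis of the space of polynomials {Q}. A minimal Python sketch, with a hypothetical helper name:

```python
def monomials(max_degree):
    """Exponent pairs (i, j) of the monomials z_1^i z_2^j with i + j <= max_degree."""
    return [(i, j)
            for i in range(max_degree + 1)
            for j in range(max_degree + 1 - i)]

for d in range(3, 8):
    # number of monomials of degree at most d-3, versus (d-1)(d-2)/2
    print(d, len(monomials(d - 3)), (d - 1) * (d - 2) // 2)
```

For {d=3} the only choice is a constant {Q}, matching the one-dimensional space of holomorphic {1}-forms on a curve of genus one.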

It remains to remove the condition that {D} and {K-D} be effective to obtain the Riemann-Roch theorem in full generality. We first prove a weaker version known as Riemann’s inequality:

Proposition 40 (Riemann’s inequality) Let {X} be a compact Riemann surface of genus {g}, and let {D} be a divisor. Then {\mathrm{dim} L(D) \geq \mathrm{deg}(D) + 1 - g}.

Proof: Let {K} be a canonical divisor. Choose a non-zero effective divisor {D'} such that {K+D' \geq D}. We will show that

\displaystyle \mathrm{dim} L(K+D') = \mathrm{deg}(D') + g - 1;

since from Lemma 19 we have {\mathrm{dim} L(K+D') \leq \mathrm{dim} L(D) + \mathrm{deg}(K+D'-D)}, and {\mathrm{deg}(K) = 2g-2}, Riemann’s inequality will follow after a brief calculation.

Dividing through by a meromorphic {1}-form of divisor {K}, we see that {\mathrm{dim} L(K+D')} is the dimension of the space {M} of meromorphic {1}-forms with divisor at least {-D'}. If {D' = \sum_{P \in {\mathcal P}} n_P \cdot (P)} with {n_P \geq 1}, {M} is the space of meromorphic {1}-forms {\omega} that have poles of order at most {n_P} at each {P \in {\mathcal P}}, and no other poles.

As in the proof of Proposition 32, let {V} be the space of tuples {(\sum_{j=1}^{n_P} \frac{c_{P,j}}{z^j})_{P \in {\mathcal P}}} with {c_{P,j}} complex; this has dimension {\mathrm{deg}(D') \geq 1}, and there is a linear map {T: M \rightarrow V} that takes a meromorphic {1}-form {\omega} in {M} to the tuple of its principal parts at points in {{\mathcal P}}. The image is constrained by Proposition 24(i), which forces the residues {c_{P,1}} to sum to zero. On the other hand, by taking linear combinations of the meromorphic {1}-forms from Exercise 37 and Proposition 38, we see conversely that any tuple in {V} whose residues sum to zero lies in the image of {T}. Thus the image of {T} has dimension {\mathrm{deg}(D') - 1}. On the other hand, the kernel of {T} is simply the space of holomorphic {1}-forms, which has dimension {g}. The claim follows. \Box

Now we prove the Riemann-Roch theorem. We split into cases, depending on the dimensions of {L(D)} and {L(K-D)}.

First suppose that {L(D)} and {L(K-D)} are both positive dimensional. By Exercise 18, {D} is linearly equivalent to an effective divisor, hence by Proposition 32 we have

\displaystyle \mathrm{dim} L(D) - \mathrm{dim}L(K-D) \leq \mathrm{deg}(D) + 1 - g

and similarly (replacing {D} by {K-D})

\displaystyle \mathrm{dim} L(K-D) - \mathrm{dim}L(D) \leq \mathrm{deg}(K-D) + 1 - g

and the claim then follows by using {\mathrm{deg}(K) = 2g-2}.

Now suppose that {L(D)} and {L(K-D)} are both trivial. Riemann’s inequality then gives

\displaystyle 0 \geq \mathrm{deg}(D) + 1-g, \mathrm{deg}(K-D) + 1-g,

which (again using {\mathrm{deg}(K)=2g-2}) gives {\mathrm{deg}(D) = g-1}, and the claim again follows.

Now suppose that {L(K-D)} is trivial but {L(D)} is positive dimensional. From Exercise 18 and Proposition 32 as before we have

\displaystyle \mathrm{dim} L(D) - \mathrm{dim}L(K-D) \leq \mathrm{deg}(D) + 1 - g,

while from Riemann’s inequality and the triviality of {L(K-D)} we have

\displaystyle \mathrm{dim} L(D) - \mathrm{dim}L(K-D) \geq \mathrm{deg}(D) + 1 - g,

giving the claim. The final case when {L(D)} is trivial and {L(K-D)} is positive dimensional then follows by swapping {D} with {K-D}.

Exercise 41 Let {X} be a compact Riemann surface of genus one, and let {\infty} be a point on {X}. Show that for any points {P,Q} on {X}, there is a unique point {P+Q} on {X} such that {(P) + (Q) - (P+Q) - (\infty)} is a principal divisor. Furthermore show that this defines an abelian group law on {X}. What is this group law in the case that {X} is an elliptic curve?

Exercise 42 Let {X} be a compact Riemann surface, and suppose that there exists a meromorphic function {f} on {X} with one simple pole and no other poles. Show that {f} is an isomorphism between {X} and the Riemann sphere. Conclude in particular that the Riemann sphere is the only genus zero compact Riemann surface (up to isomorphism, of course).

Exercise 43 Let {X} be a compact Riemann surface of genus {g}, and let {D} be a divisor of degree {2g-2}. Show that {\mathrm{dim}L(D)=g} when {D} is a canonical divisor, and {\mathrm{dim} L(D)=g-1} otherwise.

Exercise 44 (Gap theorems) Let {X} be a compact Riemann surface of genus {g}.

  • (i) (Weierstrass gap theorem) If {P} is a point in {X}, show that there are precisely {g} positive integers {n} with the property that there does not exist a meromorphic function on {X} with a pole of order {n} at {P}, and no other poles. Show in addition that all of these integers are less than or equal to {2g-1}.
  • (ii) (Noether gap theorem) If {P_1, P_2, \dots} is a sequence of distinct points in {X}, show that there are precisely {g} positive integers {n} with the property that there does not exist a meromorphic function with a simple pole at {P_n}, at most a simple pole at {P_1,\dots,P_{n-1}}, and no other poles. Show in addition that all of these integers are less than or equal to {2g-1}.

— 5. Appendix: connectedness of irreducible algebraic curves —

In this section we prove

Theorem 45 Let {P(z_1,z_2,z_3)} be an irreducible homogeneous polynomial of degree {d \geq 1}. Then {Z(P)} is connected.

We begin with the affine version of this theorem:

Proposition 46 Let {P(z_1,z_2)} be an irreducible polynomial of degree {d \geq 1}. Then the affine curve

\displaystyle Z(P)_{aff} := \{ (z_1,z_2) \in {\bf C}^2: P(z_1,z_2) = 0 \}

is connected.

We observe that this theorem fails if one replaces the complex numbers by the real ones; for instance, the quadratic polynomial {x_2^2 - x_1^2 - 1} is irreducible, but the hyperbola it defines in {{\bf R}^2} is disconnected. Thus we will need properties of the complex numbers that are not true for the reals. We will rely in particular on the fundamental theorem of algebra, the removability of bounded singularities, the generalised Liouville theorem that entire functions of polynomial growth are polynomial, and the fact that the complex numbers remain connected even after removing finitely many points.

We now prove the proposition. We will use the classical approach of thinking of {Z(P)_{aff}} as a branched {d}-fold cover over the complex numbers, possibly after some preparatory change of variables; the main difficulty is then to work around the ramification points of this cover. We turn to the details. Let {P(z_1,z_2)} be an irreducible polynomial of degree {d \geq 1}; then we can write it as

\displaystyle P(z_1,z_2) = P_d(z_1) z_2^d + P_{d-1}(z_1) z_2^{d-1} + \dots + P_0(z_1)

where for {j=0,\dots,d}, {P_j} lies in the space {\mathrm{Poly}_{\leq d-j}} of polynomials of one variable of degree at most {d-j}. In particular {P_d = P_d(z_1)} is a constant. It could happen that this constant vanishes (e.g., consider the example {d=2} and {P(z_1,z_2) = z_2 - z_1^2}); but in that case we will make a change of variables and consider instead the polynomial {P( z_1 + \lambda z_2, z_2 )} for a complex parameter {\lambda}. Now the analogue of {P_d} is a non-trivial polynomial function of {\lambda} (because one of the {P_j} must have degree exactly {d-j}), and so this quantity will be non-zero for some {\lambda} (in fact for all but at most {d} values of {\lambda}).
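The effect of the change of variables can be traced on the example {P(z_1,z_2) = z_2 - z_1^2} from the text. In the Python sketch below, a polynomial is stored as a dictionary from exponent pairs to coefficients; this representation and the helper names are assumptions made for illustration.

```python
from math import comb

def shear(poly, lam):
    """Substitute z_1 -> z_1 + lam*z_2 in a polynomial given as
    a dict {(i, j): coefficient of z_1^i z_2^j}."""
    out = {}
    for (i, j), c in poly.items():
        # (z_1 + lam z_2)^i = sum_k C(i, k) z_1^(i-k) lam^k z_2^k
        for k in range(i + 1):
            key = (i - k, j + k)
            out[key] = out.get(key, 0) + c * comb(i, k) * lam**k
    return out

def top_coefficient(poly, d):
    """Coefficient of z_2^d, i.e. the constant P_d."""
    return poly.get((0, d), 0)

P = {(0, 1): 1, (2, 0): -1}    # z_2 - z_1^2, of degree d = 2

for lam in (0, 1, 2):
    print(lam, top_coefficient(shear(P, lam), 2))
```

In this example the coefficient of {z_2^2} after the substitution is {-\lambda^2}, which vanishes only at {\lambda=0}.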

Henceforth we assume we have placed {P(z_1,z_2)} into a form where {P_d} is non-zero. Then, for each {z_1}, the function {P_{z_1}: z_2 \mapsto P(z_1,z_2)} is a polynomial of one variable of degree exactly {d}, so it has {d} roots (counting multiplicity) by the fundamental theorem of algebra. Let us call this set of roots {\Sigma_{z_1}}, thus

\displaystyle Z(P)_{aff} = \bigcup_{z_1 \in {\bf C}} \{z_1\} \times \Sigma_{z_1}

and

\displaystyle P_{z_1}(z_2) = P_d \prod_{\zeta \in \Sigma_{z_1}} (z_2 - \zeta).

From Rouché’s theorem we know that the zero set varies continuously in {z_1} in the following sense: for any {z_1} and {\varepsilon>0}, each point of {\Sigma_{z'_1}} will stay within {\varepsilon} of some point in {\Sigma_{z_1}} if {z'_1} is sufficiently close to {z_1}. (In other words, {z_1 \mapsto \Sigma_{z_1}} is continuous with respect to Hausdorff distance.) We also see that the elements in {\Sigma_{z_1}} grow at most polynomially in {z_1}. For some values of {z_1}, some of these roots in {\Sigma_{z_1}} may be repeated. For instance, if {P(z_1,z_2) = z_2^2 + z_1^2 - 1}, then {P_{z_1}(z_2) = z_2^2 + (z_1^2-1)}, which has a double root at {z_2=0} if {z_1=-1} or {z_1=+1}. However, if this occurs for some {z_1}, then the degree {d} polynomial {P_{z_1}(z_2)} and the degree {d-1} polynomial {P'_{z_1}(z_2)} have a common root {w_{z_1}}. This only occurs when the resultant {\mathrm{Res}(P_{z_1}, P'_{z_1})} of {P_{z_1}} and {P'_{z_1}} vanishes. (One can also use the discriminant of {P_{z_1}} here in place of the resultant; the two are constant multiples of each other.) From the definition of the resultant we see that {\mathrm{Res}(P_{z_1}, P'_{z_1})} is a polynomial in {z_1}, and furthermore we have a Bezout identity

\displaystyle \mathrm{Res}(P_{z_1}, P'_{z_1}) = P_{z_1} A_{z_1} + P'_{z_1} B_{z_1}

where {A_{z_1}, B_{z_1}} are polynomials of degree at most {d-2} and {d-1} in {z_2} respectively, with coefficients that are polynomials in {z_1}. The resultant cannot vanish identically, as this would mean that {P_{z_1}} divides {P'_{z_1} B_{z_1}} viewed as polynomials in {z_1,z_2}, which contradicts unique factorisation and the irreducibility of {P} since {P_{z_1}} cannot divide the lower degree polynomials {P'_{z_1}} or {B_{z_1}}. Thus the resultant can only vanish for a finite number of {z_1}, and so for all but finitely many {z_1} the roots of {P_{z_1}} are distinct, thus {\Sigma_{z_1}} has cardinality exactly {d} and {P'_{z_1}} is non-vanishing at each element of {\Sigma_{z_1}}.
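To make the Bezout identity concrete, here is a worked sketch for the example {P(z_1,z_2) = z_2^2 + z_1^2 - 1} from above, so that {P'_{z_1}(z_2) = 2 z_2}. Computing the resultant as the determinant of the Sylvester matrix (sign and normalisation conventions for the resultant vary, so the constant below should be read up to such conventions) gives

\displaystyle \mathrm{Res}(P_{z_1}, P'_{z_1}) = \det \begin{pmatrix} 1 & 0 & z_1^2 - 1 \\ 2 & 0 & 0 \\ 0 & 2 & 0 \end{pmatrix} = 4(z_1^2 - 1),

and one can take {A_{z_1} = 4} and {B_{z_1} = -2 z_2} in the Bezout identity, since

\displaystyle 4 ( z_2^2 + z_1^2 - 1 ) - 2 z_2 \cdot 2 z_2 = 4(z_1^2 - 1).

The resultant vanishes exactly at {z_1 = \pm 1}, matching the two values of {z_1} at which the double root was observed.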

From this and the inverse function theorem we see that for {z_1} outside of a finite number of points (known as ramification points), the set {\Sigma_{z_1}} varies holomorphically with {z_1}. Locally, one can thus describe {\Sigma_{z_1}} as a family {\{f_1(z_1),\dots,f_d(z_1)\}} of holomorphic functions, although the order in which one labels these points is arbitrary; as one varies {z_1} around one of the ramification points, this ordering may be permuted (consider for instance the case {P_{z_1}(z_2) = z_2^2 - z_1} as {z_1} goes around the origin, in which we can write {\Sigma_{z_1} = \{ + z_1^{1/2}, - z_1^{1/2} \}} for various branches of the square root function).
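The permutation of the roots can be seen numerically. The following is a minimal sketch, not part of the argument: it follows one root of {z_2^2 = z_1} by continuity as {z_1} travels once around the unit circle. The function name `follow_root` and the step count are illustrative choices made here.

```python
import cmath

def follow_root(start_root, steps=1000):
    """Track one root of z2**2 = z1 by continuity as z1 = exp(i*theta)
    goes once around the unit circle, starting from z1 = 1."""
    root = start_root
    for k in range(1, steps + 1):
        z1 = cmath.exp(2j * cmath.pi * k / steps)
        r = cmath.sqrt(z1)  # principal square root; the two roots are +r and -r
        # Keep whichever of the two roots is closer to the previously tracked one.
        root = r if abs(r - root) <= abs(-r - root) else -r
    return root

after_one_loop = follow_root(1 + 0j)
after_two_loops = follow_root(after_one_loop)
print(after_one_loop, after_two_loops)
```

Starting from the root {+1}, one loop around the origin lands on the other root {-1}, and a second loop returns to {+1}: the labelling of the two sheets is swapped by the ramification point at the origin.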

Now suppose that {Z(P)_{aff}} is disconnected, so it splits into two non-empty clopen subsets {Z_1 \cup Z_2}. At each non-ramified point {z_1}, the set {Z_1} meets some subset of {\{z_1 \}\times \Sigma_{z_1}}. As the {f_j(z_1)} are distinct and vary continuously with {z_1} in local coordinates, the number of points in which {Z_1} meets {\{z_1\} \times \Sigma_{z_1}} is locally constant; since {{\bf C}} with finitely many points removed is connected, this number is then globally constant, thus there is {0 \leq k \leq d} such that {Z_1 \cap (\{z_1\} \times \Sigma_{z_1})} has cardinality precisely {k} for every non-ramified {z_1}. This lets us factor {P_{z_1} = P_d Q_{z_1} R_{z_1}}, where {Q_{z_1}} is the degree {k} polynomial

\displaystyle Q_{z_1}(z_2) := \prod_{(z_1,\zeta) \in Z_1 \cap (\{z_1\} \times \Sigma_{z_1})} (z_2 - \zeta)

and {R_{z_1}} is the degree {d-k} polynomial

\displaystyle R_{z_1}(z_2) := \prod_{(z_1,\zeta) \in Z_2 \cap (\{z_1\} \times \Sigma_{z_1})} (z_2 - \zeta).

The coefficients of these polynomials are functions of {z_1} that vary holomorphically with {z_1} outside of the ramification points; they also stay bounded as one approaches these points and grow at most polynomially. Hence (by the removability of bounded singularities and the generalised Liouville theorem) they depend polynomially on {z_1}, thus {Q_{z_1}(z_2)} and {R_{z_1}(z_2)} are in fact polynomials jointly in {z_1,z_2}. But this contradicts the irreducibility of {P}, unless {k=0} or {k=d}. We conclude that {Z(P)_{aff}} is connected after deleting its ramification points. But from the continuous dependence of {\Sigma_{z_1}} on {z_1}, the ramification points adhere to the rest of {Z(P)_{aff}} (the zeroes of {P_{z_1}} are stable under small perturbations, even at points of ramification), so that {Z(P)_{aff}} is connected, proving the proposition.

Now we prove the theorem. The case {d=1} can be done by hand, so assume {d>1}. Let {P(z_1,z_2,z_3)} be an irreducible homogeneous polynomial of degree {d}. Then {P(z_1,z_2,1)} is an irreducible polynomial of degree {d} (its degree cannot be less than {d}, as this would force {P(z_1,z_2,z_3)} to contain a factor of {z_3}, which would make it reducible since {d>1}). As a consequence, we see from the proposition that the affine part {Z(P) \cap \{ [z_1,z_2,z_3] \in \mathbf{CP}^2: z_3 \neq 0\}} is connected; similarly if we replace the condition {z_3 \neq 0} with {z_1 \neq 0} or {z_2 \neq 0}. As these three pieces of {Z(P)} cover the whole zero locus, it will suffice to show that they intersect each other; for instance, it will suffice to show that the zero set of {P(z_1,z_2,1)} is not completely contained in any line. But this is clear from the proof of the proposition, which shows that (after a linear transformation) almost every vertical line meets this zero set in {d>1} points.

The following exercise will be moved to a more appropriate position later (to avoid renumbering for homework exercises).

Exercise 47 Let {f: X \rightarrow Y} be a holomorphic map between a compact Riemann surface {X} and a Riemann surface {Y}. Show that {f} is either surjective or constant.

April 18, 2018

John BaezApplied Category Theory at NIST (Part 2)

Here are links to the slides and videos for most of the talks from this workshop:

Applied Category Theory: Bridging Theory & Practice, March 15–16, 2018, NIST, Gaithersburg, Maryland, USA. Organized by Spencer Breiner and Eswaran Subrahmanian.

They give a pretty good picture of what went on. Spencer Breiner put them up here; what follows is just a copy of what’s on his site.

Unfortunately, the end of Dusko Pavlovic’s talk, as well as Ryan Wisnesky’s and Steve Huntsman’s talks, were lost due to a technical error. You can also find a YouTube playlist with all of the videos here.

Introduction to NIST:

Ram Sriram – NIST and Category Theory


Spencer Breiner – Introduction

Invited talks:

Bob Coecke – From quantum foundations to cognition via pictures


Dusko Pavlovic – Security Science in string diagrams (partial video)


John Baez – Compositional design and tasking of networks (part 1)


John Foley – Compositional design and tasking of networks (part 2)


David Spivak – A higher-order temporal logic for dynamical systems


Lightning Round Talks:

Ryan Wisnesky – Categorical databases (no video)

Steve Huntsman – Towards an operad of goals (no video)


Bill Regli – Disrupting interoperability (no slides)


Evan Patterson – Applied category theory in data science


Brendan Fong – data structures for network languages


Stephane Dugowson – A short introduction to a general theory of interactivity


Michael Robinson – Sheaf methods for inference


Cliff Joslyn – Seeking a categorical systems theory via the category of hypergraphs


Emilie Purvine – A category-theoretical investigation of the type hierarchy for heterogeneous sensor integration


Helle Hvid Hansen – Long-term values in Markov decision processes, corecursively


Alberto Speranzon – Localization and planning for autonomous systems via (co)homology computation


Josh Tan – Indicator frameworks (no slides)

Breakout round report

John BaezApplied Category Theory Course: Ordered Sets

My applied category theory course based on Fong and Spivak’s book Seven Sketches is going well. Over 250 people have registered for the course, which allows them to ask questions and discuss things. But even if you don’t register you can read my “lectures”.

Here are all the lectures on Chapter 1, which is about adjoint functors between posets, and how they interact with meets and joins. We study the applications to logic – both classical logic based on subsets, and the nonstandard version of logic based on partitions. And we show how this math can be used to understand “generative effects”: situations where the whole is more than the sum of its parts!

Lecture 1 – Introduction
Lecture 2 – What is Applied Category Theory?
Lecture 3 – Chapter 1: Preorders
Lecture 4 – Chapter 1: Galois Connections
Lecture 5 – Chapter 1: Galois Connections
Lecture 6 – Chapter 1: Computing Adjoints
Lecture 7 – Chapter 1: Logic
Lecture 8 – Chapter 1: The Logic of Subsets
Lecture 9 – Chapter 1: Adjoints and the Logic of Subsets
Lecture 10 – Chapter 1: The Logic of Partitions
Lecture 11 – Chapter 1: The Poset of Partitions
Lecture 12 – Chapter 1: Generative Effects
Lecture 13 – Chapter 1: Pulling Back Partitions
Lecture 14 – Chapter 1: Adjoints, Joins and Meets
Lecture 15 – Chapter 1: Preserving Joins and Meets
Lecture 16 – Chapter 1: The Adjoint Functor Theorem for Posets
Lecture 17 – Chapter 1: The Grand Synthesis

If you want to discuss these things, please visit the Azimuth Forum and register! Use your full real name as your username, with no spaces, and use a real working email address. If you don’t, I won’t be able to register you. Your email address will be kept confidential.

I’m finding this course a great excuse to put my thoughts about category theory into a more organized form, and it’s displaced most of the time I used to spend on Google+ and Twitter. That’s what I wanted: the conversations in the course are more interesting!

John BaezApplied Category Theory 2018 Schedule

Here’s the schedule of the ACT2018 workshop:


They put me on last, either because my talk will be so boring that it’s okay if everyone has already left, or because my talk will be so exciting that nobody will want to leave. I haven’t dared ask the organizers which one.

On the other hand, they’ve put me on first for the “school” which occurs one week before the workshop. Here’s the schedule for the ACT 2018 Adjoint School:

BackreactionGuest Post: Brian Keating about his book “Losing the Nobel Prize”

My editor always said “Don’t read reviews”... But given that I’ve received some pretty amazing reviews lately, how bad could it be? Nature even made a delightfully whimsical custom-illustration of my conjecture: that some of my fellow astronomers look to the skies for the Nobel Prize: Illustration by Stephan Schmitz for Nature. When I saw Sabine had finally gotten round to reading my book,

David HoggGaia writing, neutrino masses

In a remarkable turn of events, I finished writing a paper today! More specifically, I finished what I would call the “zeroth draft” of my paper (or really just short note) on the Gaia likelihood function. I checked in with Hans-Walter Rix (MPIA) and he encouraged me to take it through some revisions and submit it to arXiv. We shall see if I make it.

At lunch-time, the NYU CCPP Brown Bag talk was by Derek Inman (NYU) on neutrino masses. He explained what's known from oscillation experiments, from beta-decay experiments, and from cosmology. The laboratory bounds put lower limits on the neutrino masses, and the cosmological bounds put upper limits. The cosmological bounds are very strong, but they are also very dependent on having a very good cosmogonic model. That is, they are not even close to being model-independent. He did a nice job explaining how the flavor eigenstates relate to the propagation eigenstates, and the mass hierarchies. It is a nice set of problems.

April 17, 2018

Terence TaoPolymath15, eighth thread: going below 0.28

This is the eighth “research” thread of the Polymath15 project to upper bound the de Bruijn-Newman constant {\Lambda}, continuing this post. Discussion of the project of a non-research nature can continue for now in the existing proposal thread. Progress will be summarised at this Polymath wiki page.

Significant progress has been made since the last update; by implementing the “barrier” method to establish zero free regions for H_t by leveraging the extensive existing numerical verification of the Riemann hypothesis (which establishes zero free regions for H_0), we have been able to improve our upper bound on \Lambda from 0.48 to 0.28. Furthermore, there appears to be a bit of room to improve the bounds further by tweaking the parameters t_0, y_0, X used in the argument (we are currently using t_0=0.2, y_0 = 0.4, X = 5 \times 10^9); the most recent idea is to try to use exponential sum estimates to improve the bounds on the derivative of the approximation to H_t that is used in the barrier method, which currently is the most computationally intensive step of the argument.

Jordan EllenbergWhat is the Lovasz number of the plane?

There are lots of interesting invariants of a graph which bound its chromatic number!  Most famous is the Lovász number, which asks, roughly:  I attach vectors v_x to each vertex x such that v_x and v_y are orthogonal whenever x and y are adjacent, and I try to stuff all those vectors into a small cone; the half-angle of the cone tells you the Lovász number, which gets bigger and bigger as the smallest cone gets closer and closer to a hemisphere.

Here’s an equivalent formulation:  If G is a graph and V(G) its vertex set, I try to find a function f: V(G) -> R^d, for some d, such that

|f(x) – f(y)| = 1 whenever x and y are adjacent.

This is called a unit distance embedding, for obvious reasons.

The hypersphere number t(G) of the graph is the radius of the smallest sphere containing a unit distance embedding of G.  Computing t(G) is equivalent to computing the Lovász number, but let’s not worry about that now.  I want to generalize it a bit.  We say a finite sequence (t_1, t_2, t_3, … ,t_d) is big enough for G if there’s a unit-distance embedding of G contained in an ellipsoid with major radii t_1^{1/2}, t_2^{1/2}, …, t_d^{1/2}.  (We could also just consider infinite sequences with all but finitely many terms zero; that would be a little cleaner.)

Physically I think of it like this:  the graph is trying to fold itself into Euclidean space and fit into a small region, with the constraint that the edges are rigid and have to stay length 1.

Sometimes it can fold a lot!  Like if it’s bipartite.  Then the graph can totally fold itself down to a line segment of length 1, with all the black vertices going to one end and the white vertices going to the other.  And the big enough sequences are just those with some entry bigger than 1.

On the other hand, if G is a complete graph on k vertices, a unit-distance embedding has to be a simplex, so certainly anything with k of the t_i of size at least 1-1/k is big enough.   (Is that an if and only if?  To know this I’d have to know whether an ellipse containing an equilateral triangle can have a radius shorter than that of the circumcircle.)

Let’s face it, it’s confusing to think about ellipsoids circumscribing embedded graphs, so what about instead we define t(p,G) to be the minimum value of the L^p norm of (t_1, t_2, …) over ellipsoids enclosing a unit-distance embedding of G.

Then a graph has a unit-distance embedding in the plane iff t(0,G) <= 2.  And t(oo,G) is just the hypersphere number again, right?  If G has a k-clique then t(p,G) >= t(p,K_k) for any p, while if G has a k-coloring (i.e. a map to K_k) then t(p,G) <= t(p,K_k) for any p.  In particular, a regular k-simplex with unit edges fits into a sphere of squared radius 1-1/k, so t(oo,G) <= 1-1/k.

So… what’s the relation between these invariants?  Is there a graph with t(0,G) = 2 and t(oo,G) > 4/5?  If so, there would be a non-5-colorable unit distance graph in the plane.  But I guess the relationship between these various “norms” feels interesting to me irrespective of any relation to plane-coloring.  What is the max of t(oo,G) with t(0,G)=2?

The intermediate t(p,G) all give functions which upper-bound clique number and lower-bound chromatic number; are any of them interesting?  Are any of them easily calculable, like the Lovász number?


  1. I called this post “What is the Lovász number of the plane?” but the question “how big can t(oo,G) be if t(0,G)=2?” is more a question about finite subgraphs of the plane and their Lovász numbers.  Another way to ask “What is the Lovász number of the plane” would be to adopt the point of view that the Lovász number of a graph has to do with extremizers on the set of positive semidefinite matrices whose (i,j) entry is nonzero only when i and j are adjacent vertices or i=j.  So there must be some question one could ask about the space of positive semidefinite symmetric kernels K(x,y) on R^2  x R^2 which are supported on the locus ||x-y||=1 and the diagonal, which question would rightly be called “What is the Lovász number of the plane?” But I’m not sure what it is.
  2. Having written this, I wonder whether it might be better, rather than thinking about enclosing ellipsoids of a set of points in R^d, just to think of the n points as an nxd matrix X and compute the singular values of X^T X, which would be kind of an “approximating ellipsoid” to the points.  Maybe later I’ll think about what that would measure.  Or you can!







April 16, 2018

BackreactionBook Review: “Losing the Nobel Prize” by Brian Keating

Losing the Nobel Prize: A Story of Cosmology, Ambition, and the Perils of Science’s Highest Honor
Brian Keating
W. W. Norton & Company (April 24, 2018)

Brian Keating hasn’t won a Nobel Prize. Who doesn’t know the feeling? But Keating, professor of physics at UC San Diego, isn’t like you and me. He had a good shot at winning. Or at least he thought he had. And that’s what his book is about.

BackreactionNo, that galaxy without dark matter has not ruled out modified gravity

A smear with dots,  also known as NGC 5264-HST. Did you really have to ask? And if you had to ask, why did you have to ask me? You sent me like two million messages and comments and emails asking what I think about NGC 1052-DF2, that galaxy which supposedly doesn’t contain dark matter. Thanks. I am very flattered by your faith. But I’m not an astrophysicist, I’m a theorist. I invent

John PreskillCatching up with the quantum-thermo crowd

You have four hours to tour Oxford University.

What will you visit? The Ashmolean Museum, home to da Vinci drawings, samurai armor, and Egyptian mummies? The Bodleian, one of Europe’s oldest libraries? Turf Tavern, where former president Bill Clinton reportedly “didn’t inhale” marijuana?

Felix Binder showed us a cemetery.

Of course he showed us a cemetery. We were at a thermodynamics conference.

The Fifth Quantum Thermodynamics Conference took place in the City of Dreaming Spires.1 Participants enthused about energy, information, engines, and the flow of time. About 160 scientists attended, roughly 60 more than attended the first conference, co-organizer Janet Anders estimated.


Weak measurements and quasiprobability distributions were trending. The news delighted me, Quantum Frontiers regulars won’t be surprised to hear.

Measurements disturb quantum systems, as early-20th-century physicist Werner Heisenberg intuited. Measure a system’s position strongly, and you forfeit your ability to predict the outcomes of future momentum measurements. Weak measurements don’t disturb the system much. In exchange, weak measurements provide little information about the system. But you can recoup information by performing a weak measurement in each of many trials, then processing the outcomes.

Strong measurements lead to probability distributions: Imagine preparing a particle in some quantum state, then measuring its position strongly, in each of many trials. From the outcomes, you can infer a probability distribution \{ p(x) \}, wherein p(x) denotes the probability that the next trial will yield position x.

Weak measurements lead analogously to quasiprobability distributions. Quasiprobabilities resemble probabilities but can misbehave: Probabilities are real numbers no less than zero. Quasiprobabilities can dip below zero and can assume nonreal values.


What relevance have weak measurements and quasiprobabilities to quantum thermodynamics? Thermodynamics involves work and heat. Work is energy harnessed to perform useful tasks, like propelling a train from London to Oxford. Heat is energy that jiggles systems randomly.

Quantum properties obscure the line between work and heat. (Here’s an illustration for experts: Consider an isolated quantum system, such as a spin chain. Let H(t) denote the Hamiltonian that evolves with the time t \in [0, t_f]. Consider preparing the system in an energy eigenstate | E_i(0) \rangle. This state has zero diagonal entropy: Measuring the energy yields E_i(0) deterministically. Consider tuning H(t), as by changing a magnetic field. This change constitutes work, we learn in electrodynamics class. But if H(t) changes quickly, the state can acquire weight on multiple energy eigenstates. The diagonal entropy rises. The system’s energetics have gained an unreliability characteristic of heat absorption. But the system has remained isolated from any heat bath. Work mimics heat.)

Quantum thermodynamicists have defined work in terms of a two-point measurement scheme: Initialize the quantum system, such as by letting heat flow between the system and a giant, fixed-temperature heat reservoir until the system equilibrates. Measure the system’s energy strongly, and call the outcome E_i. Isolate the system from the reservoir. Tune the Hamiltonian, performing the quantum equivalent of propelling the London train up a hill. Measure the energy, and call the outcome E_f.

Any change \Delta E in a system’s energy comes from heat Q and/or from work W, by the First Law of Thermodynamics, \Delta E = Q + W.  Our system hasn’t exchanged energy with any heat reservoir between the measurements. So the energy change consists of work: E_f - E_i =: W.


Imagine performing this protocol in each of many trials. Different trials will require different amounts W of work. Upon recording the amounts, you can infer a distribution \{ p(W) \}. p(W) denotes the probability that the next trial will require an amount W of work.
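The two-point measurement scheme can be sketched in a few lines of code. What follows is a toy model under loudly stated assumptions, none of which come from the conference: a two-level system with initial energies (0, 1) and final energies (0, 2), a sudden change of Hamiltonian whose final energy eigenbasis is rotated by an angle `theta` relative to the initial one, and a thermal initial state at inverse temperature `beta`. The function name `work_distribution` is made up for this illustration.

```python
import math

def work_distribution(beta, theta, e_initial=(0.0, 1.0), e_final=(0.0, 2.0)):
    """Two-point-measurement work distribution for a suddenly quenched qubit
    (hypothetical toy model, see the assumptions in the text above)."""
    # First strong measurement: thermal (Gibbs) probabilities of the initial energies.
    weights = [math.exp(-beta * e) for e in e_initial]
    z = sum(weights)
    p_initial = [w / z for w in weights]

    # Sudden quench: the state has no time to change, so the probability of going
    # from initial level i to final level f is |<f|i>|^2 for bases rotated by theta.
    c2, s2 = math.cos(theta) ** 2, math.sin(theta) ** 2
    transition = [[c2, s2], [s2, c2]]

    # Second strong measurement: accumulate p(W) over all pairs, with W = E_f - E_i.
    p_work = {}
    for i, e_i in enumerate(e_initial):
        for f, e_f in enumerate(e_final):
            w = e_f - e_i
            p_work[w] = p_work.get(w, 0.0) + p_initial[i] * transition[i][f]
    return p_work

p_work = work_distribution(beta=1.0, theta=0.3)
for w in sorted(p_work):
    print(f"W = {w:+.1f}: p(W) = {p_work[w]:.4f}")
```

Each entry of the returned dictionary plays the role of p(W) for one possible amount of work; the entries are non-negative and sum to one, as an honest probability distribution should.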

Measuring the system’s energy disturbs the system, squashing some of its quantum properties. (The measurement eliminates coherences, relative to the energy eigenbasis, from the state.) Quantum properties star in quantum thermodynamics. So the two-point measurement scheme doesn’t satisfy everyone.

Enter weak measurements. They can provide information about the system’s energy without disturbing the system much. Work probability distributions \{ p(W) \} give way to quasiprobability distributions \{ \tilde{p}(W) \}.

So propose Solinas and Gasparinetti, in these papers. Other quantum thermodynamicists apply weak measurements and quasiprobabilities differently.2 I proposed applying them to characterize chaos, and the scrambling of quantum information in many-body systems, at the conference.3 Feel free to add your favorite applications to the “comments” section.


All the quantum ladies: The conference’s female participants gathered for dinner one conference night.

Wednesday afforded an afternoon for touring. Participants congregated at the college of conference co-organizer Felix Binder.4 His tour evoked, for me, the ghosts of thermo conferences past: One conference, at the University of Cambridge, had brought me to the grave of thermodynamicist Arthur Eddington. Another conference, about entropies in information theory, had convened near Canada’s Banff Cemetery. Felix’s tour began with St. Edmund Hall’s cemetery. Thermodynamics highlights equilibrium, a state in which large-scale properties, like temperature and pressure, remain constant. Some things never change.



With thanks to Felix, Janet, and the other coordinators for organizing the conference.

1Oxford derives its nickname from an elegy by Matthew Arnold. Happy National Poetry Month!


3Michele Campisi joined me in introducing out-of-time-ordered correlators (OTOCs) into the quantum-thermo conference: He, with coauthor John Goold, combined OTOCs with the two-point measurement scheme.

4Oxford University contains 38 colleges, the epicenters of undergraduates’ social, dining, and housing experiences. Graduate students and postdoctoral scholars affiliate with colleges, and senior fellows (faculty members) govern the colleges.

April 14, 2018

Terence TaoPolymath16 now launched: simplifying the lower bound argument for the Hadwiger-Nelson problem

Just a quick announcement that Dustin Mixon and Aubrey de Grey have just launched the Polymath16 project over at Dustin’s blog.  The main goal of this project is to simplify the recent proof by Aubrey de Grey that the chromatic number of the unit distance graph of the plane is at least 5, thus making progress on the Hadwiger-Nelson problem.  The current proof is computer assisted (in particular it requires one to control the possible 4-colorings of a certain graph with over a thousand vertices), but one of the aims of the project is to reduce the amount of computer assistance needed to verify the proof; already a number of such reductions have been found.  See also this blog post where the polymath project was proposed, as well as the wiki page for the project.  Non-technical discussion of the project will continue at the proposal blog post.

Terence TaoCommutators close to the identity

I am recording here some notes on a nice problem that Sorin Popa shared with me recently. To motivate the question, we begin with the basic observation that the differentiation operator {Df(x) := \frac{d}{dx} f(x)} and the position operator {Xf(x) := xf(x)} in one dimension formally obey the commutator equation

\displaystyle  [D,X] = 1 \ \ \ \ \ (1)

where {1} is the identity operator and {[D,X] := DX-XD} is the commutator. Among other things, this equation is fundamental in quantum mechanics, leading for instance to the Heisenberg uncertainty principle.

The operators {D,X} are unbounded on spaces such as {L^2({\bf R})}. One can ask whether the commutator equation (1) can be solved using bounded operators {D,X \in B(H)} on a Hilbert space {H} rather than unbounded ones. In the finite dimensional case when {D, X} are just {n \times n} matrices for some {n \geq 1}, the answer is clearly negative, since the left-hand side of (1) has trace zero and the right-hand side does not. What about in infinite dimensions, when the trace is not available? As it turns out, the answer is still negative, as was first worked out by Wintner and Wielandt. A short proof can be given as follows. Suppose for contradiction that we can find bounded operators {D, X} obeying (1). From (1) and an easy induction argument, we obtain the identity

\displaystyle  [D,X^n] = n X^{n-1} \ \ \ \ \ (2)

for all natural numbers {n}. From the triangle inequality, this implies that

\displaystyle  n \| X^{n-1} \|_{op} \leq 2 \|D\|_{op} \| X^n \|_{op}.

Iterating this, we conclude that

\displaystyle  \| X \|_{op} \leq \frac{(2 \|D\|_{op})^{n-1}}{n!} \|X^n \|_{op}

for any {n}. Bounding {\|X^n\|_{op} \leq \|X\|_{op}^n} and then sending {n \rightarrow \infty}, we conclude that {\|X\|_{op}=0}, which clearly contradicts (1). (Note the argument can be generalised without difficulty to the case when {D,X} lie in a Banach algebra, rather than be bounded operators on a Hilbert space.)

It was observed by Popa that there is a quantitative version of this result:

Theorem 1 Let {D, X \in B(H)} be such that

\displaystyle  \| [D,X] - I \|_{op} \leq \varepsilon

for some {\varepsilon > 0}. Then we have

\displaystyle  \| X \|_{op} \|D \|_{op} \geq \frac{1}{2} \log \frac{1}{\varepsilon}. \ \ \ \ \ (3)

Proof: By multiplying {D} by a suitable constant and dividing {X} by the same constant, we may normalise {\|D\|_{op}=1/2}. Write {DX - XD = 1 + E} with {\|E\|_{op} \leq \varepsilon}. Then the same induction that established (2) now shows that

\displaystyle  [D,X^n]= n X^{n-1} + X^{n-1} E + X^{n-2} E X + \dots + E X^{n-1}

and hence by the triangle inequality

\displaystyle  n \| X^{n-1} \|_{op} \leq \| X^n \|_{op} + n \varepsilon \|X\|_{op}^{n-1}.

We divide by {n!} and sum to conclude that

\displaystyle  \sum_{n=0}^\infty \frac{\|X^n\|_{op}}{n!} \leq \sum_{n=1}^\infty \frac{\|X^n\|_{op}}{n!} + \varepsilon \exp( \|X\|_{op} )

giving the claim.
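To spell out this last step (a sketch of the omitted arithmetic): both series converge because {\|X^n\|_{op} \leq \|X\|_{op}^n}, so the terms with {n \geq 1} can be cancelled from both sides, leaving only the {n=0} term {\|X^0\|_{op}/0! = 1} on the left. Thus

\displaystyle 1 \leq \varepsilon \exp( \|X\|_{op} ), \quad \hbox{i.e.} \quad \|X\|_{op} \geq \log \frac{1}{\varepsilon},

and since we normalised {\|D\|_{op} = 1/2}, this is exactly {\|X\|_{op} \|D\|_{op} \geq \frac{1}{2} \log \frac{1}{\varepsilon}}, which is (3).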

Again, the argument generalises easily to any Banach algebra. Popa then posed the question of whether the quantity {\frac{1}{2} \log \frac{1}{\varepsilon}} can be replaced by any substantially larger function of {\varepsilon}, such as a polynomial in {\frac{1}{\varepsilon}}. As far as I know, the above simple bound has not been substantially improved.

In the opposite direction, one can ask for constructions of operators {X,D} that are not too large in operator norm, such that {[D,X]} is close to the identity. Again, one cannot do this in finite dimensions: {[D,X]} has trace zero, so at least one of its eigenvalues must lie outside the disk {\{ z: |z-1| < 1\}}, and therefore {\|[D,X]-1\|_{op} \geq 1} for any finite-dimensional {n \times n} matrices {X,D}.
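The trace obstruction is easy to check by hand or by machine. Here is a minimal sketch with plain nested lists; the particular {2 \times 2} integer matrices are arbitrary choices made for this illustration, and the helper names are hypothetical.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(d, x):
    """[D, X] = DX - XD."""
    dx, xd = matmul(d, x), matmul(x, d)
    n = len(d)
    return [[dx[i][j] - xd[i][j] for j in range(n)] for i in range(n)]

def trace(m):
    return sum(m[i][i] for i in range(len(m)))

D = [[1, 2], [3, 4]]   # arbitrary example matrices
X = [[0, 1], [5, -2]]
C = commutator(D, X)
print(C, trace(C))
```

Whatever matrices one substitutes for `D` and `X`, the trace of the commutator is zero, whereas the identity matrix has trace {n}; this is the finite-dimensional obstruction used above.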

However, it was shown in 1965 by Brown and Pearcy that in infinite dimensions, one can construct operators {D,X} with {[D,X]} arbitrarily close to {1} in operator norm (in fact one can prescribe any operator for {[D,X]} as long as it is not equal to a non-zero multiple of the identity plus a compact operator). In the above paper of Popa, a quantitative version of the argument (based in part on some earlier work of Apostol and Zsido) was given as follows. The first step is to observe the following Hilbert space version of Hilbert’s hotel: in an infinite dimensional Hilbert space {H}, one can locate isometries {u, v \in B(H)} obeying the equation

\displaystyle  uu^* + vv^* = 1, \ \ \ \ \ (4)

where {u^*} denotes the adjoint of {u}. For instance, if {H} has a countable orthonormal basis {e_1, e_2, \dots}, one could set

\displaystyle  u := \sum_{n=1}^\infty e_{2n-1} e_n^*


\displaystyle  v := \sum_{n=1}^\infty e_{2n} e_n^*,

where {e_n^*} denotes the linear functional {x \mapsto \langle x, e_n \rangle} on {H}. Observe that (4) is again impossible to satisfy in finite dimension {n}, as the left-hand side must have trace {2n} while the right-hand side has trace {n}.

As {u,v} are isometries, we have

\displaystyle  v^* v = u^* u = 1; \ \ \ \ \ (5)

Multiplying (4) on the left by {v^*} and right by {u}, or on the left by {u^*} and right by {v}, then gives

\displaystyle  v^* u = u^* v = 0. \ \ \ \ \ (6)
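The relations (4), (5), (6) can be checked mechanically for the explicit choice of {u,v} above. The sketch below is only an illustration: it represents a finitely supported vector {\sum_n c_n e_n} as a Python dictionary mapping the index {n \geq 1} to the coefficient {c_n}, and the helper names are made up here.

```python
def u(x):
    """u sends e_n to e_{2n-1}."""
    return {2 * n - 1: c for n, c in x.items()}

def v(x):
    """v sends e_n to e_{2n}."""
    return {2 * n: c for n, c in x.items()}

def u_star(x):
    """Adjoint of u: sends e_{2n-1} to e_n and kills the even-indexed basis vectors."""
    return {(m + 1) // 2: c for m, c in x.items() if m % 2 == 1}

def v_star(x):
    """Adjoint of v: sends e_{2n} to e_n and kills the odd-indexed basis vectors."""
    return {m // 2: c for m, c in x.items() if m % 2 == 0}

def add(x, y):
    """Sum of two finitely supported vectors, dropping zero coefficients."""
    out = dict(x)
    for k, c in y.items():
        out[k] = out.get(k, 0) + c
    return {k: c for k, c in out.items() if c != 0}

x = {1: 3, 2: -1, 5: 2, 8: 7}                  # an arbitrary test vector
print(add(u(u_star(x)), v(v_star(x))) == x)    # relation (4)
print(u_star(u(x)) == x, v_star(v(x)) == x)    # relation (5)
print(v_star(u(x)) == {}, u_star(v(x)) == {})  # relation (6)
```

In words: {uu^*} keeps the odd-indexed coordinates and {vv^*} keeps the even-indexed ones, so together they recover the whole vector, which is the Hilbert's hotel picture behind (4).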

From (4), (5) we see in particular that, while we cannot express {1} as a commutator of bounded operators, we can at least express it as the sum of two commutators:

\displaystyle  [u^*, u] + [v^*, v] =1.

We can rewrite this somewhat strangely as

\displaystyle  [\frac{1}{2} u^*, 4u+2v] + [\frac{1}{2} u^* - v^*, -2v] = 2

and hence there exists a bounded operator {a} such that

\displaystyle  [\frac{1}{2} u^*, 4u+2v] = 1+a; \quad [\frac{1}{2} u^* - v^*, -2v] = 1-a.

Moving now to the Banach algebra of {2 \times 2} matrices with entries in {B(H)} (which can be equivalently viewed as {B(H \oplus H)}), a short computation then gives the identity

\displaystyle  \left[ \begin{pmatrix} \frac{1}{2} u^* & 0 \\ a & \frac{1}{2} u^* - v^* \end{pmatrix}, \begin{pmatrix} 4u+2v & 1 \\ 0 & -2v \end{pmatrix} \right] = \begin{pmatrix} 1 & v^* \\ b & 1 \end{pmatrix}

for some bounded operator {b} whose exact form will not be relevant for the argument. Now, by Neumann series (and the fact that {u,v} have unit operator norm), we can find another bounded operator {c} such that

\displaystyle  c + \frac{1}{2} v c u^* = b,

and then another brief computation shows that

\displaystyle  \left[ \begin{pmatrix} \frac{1}{2} u^* & 0 \\ a & \frac{1}{2} u^* - v^* \end{pmatrix}, \begin{pmatrix} 4u+2v & 1 \\ vc & -2v \end{pmatrix} \right] = \begin{pmatrix} 1 & v^* \\ 0 & 1 \end{pmatrix}.

Thus we can express the operator {\begin{pmatrix} 1 & v^* \\ 0 & 1 \end{pmatrix}} as the commutator of two operators of norm {O(1)}. Conjugating by {\begin{pmatrix} \varepsilon^{1/2} & 0 \\ 0 & \varepsilon^{-1/2} \end{pmatrix}} for any {0 < \varepsilon \leq 1}, we may then express {\begin{pmatrix} 1 & \varepsilon v^* \\ 0 & 1 \end{pmatrix}} as the commutator of two operators of norm {O(\varepsilon^{-1})}. This shows that the right-hand side of (3) cannot be replaced with anything that blows up faster than {\varepsilon^{-2}} as {\varepsilon \rightarrow 0}. Can one improve this bound further?
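For the record, the conjugation step acts entrywise as follows. This is a routine check rather than anything from the paper; {D} is shorthand for the diagonal matrix above, and {p,q,r,s} are placeholder entries.

```latex
D := \begin{pmatrix} \varepsilon^{1/2} & 0 \\ 0 & \varepsilon^{-1/2} \end{pmatrix},
\qquad
D \begin{pmatrix} p & q \\ r & s \end{pmatrix} D^{-1}
  = \begin{pmatrix} p & \varepsilon q \\ \varepsilon^{-1} r & s \end{pmatrix},
\qquad
D\,[A,B]\,D^{-1} = [D A D^{-1},\, D B D^{-1}].
```

So the upper right entry {v^*} of the commutator becomes {\varepsilon v^*}, while the lower left entries {a} and {vc} of the two factors pick up a factor of {\varepsilon^{-1}}; for {0 < \varepsilon \leq 1} this is the only source of growth, which gives the {O(\varepsilon^{-1})} norm bound.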

April 13, 2018

Matt von Hippel: By Any Other Author Would Smell as Sweet

I was chatting with someone about this paper (which probably deserves a post in its own right, once I figure out an angle that isn’t just me geeking out about how much I could do with their new setup), and I referred to it as “Claude’s paper”. This got me chided a bit: the paper has five authors, experts on Feynman diagrams and elliptic integrals. It’s not just “Claude’s paper”. So why do I think of it that way?

Part of it, I think, comes from the experience of reading a paper. We want to think of a paper as a speech act: someone talking to us, explaining something, leading us through a calculation. Our brain models that as a conversation with a single person, so we naturally try to put a single face to a paper. With a collaborative paper, this is almost never how it was written: different sections are usually written by different people, who then edit each other’s work. But unless you know the collaborators well, you aren’t going to know who wrote which section, so it’s easier to just picture one author for the whole thing.

Another element comes from how I think about the field. Just as it’s easier to think of a paper as the speech of one person, it’s easier to think of new developments as continuations of a story. I at least tend to think about the field in terms of specific programs: these people worked on this, which is a continuation of that. You can follow those kinds of threads through the field, but in reality they’re tangled together: collaborations are an opportunity for two programs to meet. In other fields you might have a “first author” to default to, but in theoretical physics we normally write authors alphabetically. For “Claude’s paper”, it just feels like the sort of thing I’d expect Claude Duhr to write, like a continuation of the other things he’s known for, even if it couldn’t have existed without the other four authors.

You’d worry that associating papers with people like this takes away deserved credit. I don’t think it’s quite that simple, though. In an older post I described this paper as the work of Anastasia Volovich and Mark Spradlin. On some level, that’s still how I think about it. Nevertheless, when I heard that Cristian Vergu was going to be at the Niels Bohr Institute next year, I was excited: we’re hiring one of the authors of GSVV! Even if I don’t think of him immediately when I think of the paper, I think of the paper when I think of him.

That, I think, is more important for credit. If you’re a hiring committee, you’ll start out by seeing names of applicants. It’s important, at that point, that you know what they did, that the authors of important papers stand out, that you assign credit where it’s due. It’s less necessary on the other end, when you’re reading a paper and casually classify it in your head.

Nevertheless, I should be more careful about credit. It’s important to remember that “Claude Duhr’s paper” is also “Johannes Broedel’s paper” and “Falko Dulat’s paper”, “Brenda Penante’s paper” and “Lorenzo Tancredi’s paper”. It gives me more of an appreciation of where it comes from, so I can get back to having fun applying it.

John Baez: Applied Category Theory Course

It just became a lot easier to learn about applied category theory, thanks to this free book:

• Brendan Fong and David Spivak, Seven Sketches in Compositionality: An Invitation to Applied Category Theory.

I’ve started an informal online course based on this book on the Azimuth Forum. I’m getting pretty sick of the superficial quality of my interactions on social media. This could be a way to do something more interesting.

The idea is that you can read chapters of this book, discuss them, try the exercises in the book, ask and answer questions, and maybe team up to create software that implements some of the ideas. I’ll try to keep things moving forward. For example, I’ll explain some stuff and try to help answer questions that people are stuck on. I may also give some talks or run discussions on Google Hangouts or similar software, but only when I have time: I’m more of a text-based guy. I may get really busy sometimes, and leave the rest of you alone for a while. But I like writing about math for at least 15 minutes a day, and more when I have time. Furthermore, I’m obsessed with applied category theory and plan to stay that way for at least a few more years.

If this sounds interesting, let me know here—and please visit the Azimuth Forum and register! Use your full real name as your username, with no spaces. I will add spaces and that will become your username. Use a real working email address. If you don’t, the registration process may not work.

Over 70 people have registered so far, so this process will take a while.

The main advantage of the Forum over this blog is that you can initiate new threads and edit your comments. Like here, you can write equations in LaTeX. Like here, that ability is severely limited: for example you can’t define macros, and you can’t use TikZ. (Maybe someone could fix that.) But equations are better typeset over there and, more importantly, the ability to edit comments makes it a lot easier to correct errors in your LaTeX.

Please let me know what you think.

What follows is the preface to Fong and Spivak’s book, just so you can get an idea of what it’s like.


Category theory is becoming a central hub for all of pure mathematics. It is unmatched in its ability to organize and layer abstractions, to find commonalities between structures of all sorts, and to facilitate communication between different mathematical communities. But it has also been branching out into science, informatics, and industry. We believe that it has the potential to be a major cohesive force in the world, building rigorous bridges between disparate worlds, both theoretical and practical. The motto at MIT is mens et manus, Latin for mind and hand. We believe that category theory—and pure math in general—has stayed in the realm of mind for too long; it is ripe to be brought to hand.

Purpose and audience

The purpose of this book is to offer a self-contained tour of applied category theory. It is an invitation to discover advanced topics in category theory through concrete real-world examples. Rather than try to give a comprehensive treatment of these topics (which include adjoint functors, enriched categories, proarrow equipments, toposes, and much more), we merely provide a taste. We want to give readers some insight into how it feels to work with these structures as well as some ideas about how they might show up in practice.

The audience for this book is quite diverse: anyone who finds the above description intriguing. This could include a motivated high school student who hasn’t seen calculus yet but has loved reading a weird book on mathematical logic they found at the library. Or a machine learning researcher who wants to understand what vector spaces, design theory, and dynamical systems could possibly have in common. Or a pure mathematician who wants to imagine what sorts of applications their work might have. Or a recently-retired programmer who’s always had an eerie feeling that category theory is what they’ve been looking for to tie it all together, but who’s found the usual books on the subject impenetrable.

For example, we find it something of a travesty that in 2018 there seems to be no introductory material available on monoidal categories. Even beautiful modern introductions to category theory, e.g. by Riehl or Leinster, do not include anything on this rather central topic. The basic idea is certainly not too abstract; modern human intuition seems to include a pre-theoretical understanding of monoidal categories that is just waiting to be formalized. Is there anyone who wouldn’t correctly understand the basic idea being communicated in the following diagram?

Many applied category theory topics seem to take monoidal categories as their jumping off point. So one aim of this book is to provide a reference—even if unconventional—for this important topic.

We hope this book inspires both new visions and new questions. We intend it to be self-contained in the sense that it is approachable with minimal prerequisites, but not in the sense that the complete story is told here. On the contrary, we hope that readers use this as an invitation to further reading, to orient themselves in what is becoming a large literature, and to discover new applications for themselves.

This book is, unashamedly, our take on the subject. While the abstract structures we explore are important to any category theorist, the specific topics have simply been chosen to our personal taste. Our examples are ones that we find simple but powerful, concrete but representative, entertaining but in a way that feels important and expansive at the same time. We hope our readers will enjoy themselves and learn a lot in the process.

How to read this book

The basic idea of category theory—which threads through every chapter—is that if one pays careful attention to structures and coherence, the resulting systems will be extremely reliable and interoperable. For example, a category involves several structures: a collection of objects, a collection of morphisms relating objects, and a formula for combining any chain of morphisms into a morphism. But these structures need to cohere or work together in a simple commonsense way: a chain of chains is a chain, so combining a chain of chains should be the same as combining the chain. That’s it!

We will see structures and coherence come up in pretty much every definition we give: “here are some things and here are how they fit together.” We ask the reader to be on the lookout for structures and coherence as they read the book, and to realize that as we layer abstraction on abstraction, it is the coherence that makes everything function like a well-oiled machine.

Each chapter in this book is motivated by a real-world topic, such as electrical circuits, control theory, cascade failures, information integration, and hybrid systems. These motivations lead us into and through various sorts of category-theoretic concepts.

We generally have one motivating idea and one category-theoretic purpose per chapter, and this forms the title of the chapter, e.g. Chapter 4 is “Collaborative design: profunctors, categorification, and monoidal categories.” In many math books, the difficulty is roughly a monotonically-increasing function of the page number. In this book, this occurs in each chapter, but not so much in the book as a whole. The chapters start out fairly easy and progress in difficulty.

The upshot is that if you find the end of a chapter very difficult, hope is certainly not lost: you can start on the next one and make good progress. This format lends itself to giving you a first taste now, but also leaving open the opportunity for you to come back at a later date and get more deeply into it. But by all means, if you have the gumption to work through each chapter to its end, we very much encourage that!

We include many exercises throughout the text. Usually these exercises are fairly straightforward; the only thing they demand is that the reader’s mind changes state from passive to active, rereads the previous paragraphs with intent, and puts the pieces together. A reader becomes a student when they work the exercises; until then they are more of a tourist, riding on a bus and listening off and on to the tour guide. Hey, there’s nothing wrong with that, but we do encourage you to get off the bus and make contact with the natives as often as you can.

April 12, 2018

n-Category Café Torsion: Graph Magnitude Homology Meets Combinatorial Topology

As I explained in my previous post, magnitude homology categorifies the magnitude of graphs. There are two questions that will jump out to seasoned students of homology.

  • Are there two graphs which have the same magnitude but different homology groups?
  • Is there a graph with torsion in its homology groups?

Both of these were stated as open questions by Richard Hepworth and me in our paper as we were unable to answer them, despite thinking about them a fair bit. However, recently both of these have been answered positively!

The first question has been answered positively by Yuzhou Gu in a reply to my post. Well, this is essentially answered, in the sense that he has given two graphs both of which we know (provably) the magnitude of, one of which we know (provably) the magnitude homology groups of and the other of which we can compute the magnitude homology of using mine and James Cranch’s SageMath software. So this just requires verification that the program result is correct! I have no doubt that it is correct though.

The second question on the existence of torsion is what I want to concentrate on in this post. This question has been answered positively by Ryuki Kaneta and Masahiko Yoshinaga in their paper

It is a consequence of what they prove in their paper that the graph below has 2-torsion in its magnitude homology; SageMath has drawn it as a directed graph, but you can ignore the arrows.


In their paper they prove that if you have a finite triangulation $T$ of an $m$-dimensional manifold $M$ then you can construct a graph $G(T)$ so that the reduced homology groups of $M$ embed in the magnitude homology groups of $G(T)$:

$$\widetilde{\mathrm{H}}_i(M)\hookrightarrow MH_{i+2, m+2}(G(T)) \quad \text{for } 0\le i \le m.$$

Following the suggestion in their paper, I’ve taken a minimal triangulation $T_0$ of the real projective plane $\mathbb{R}P^2$ and used that to construct the above graph. As we know $\mathrm{H}_1(\mathbb{R}P^2)=\mathbb{Z}/2\mathbb{Z}$, we know that there is 2-torsion in $MH_{3,4}(G(T_0))$.

In the rest of this post I’ll explain the construction of the graph and show explicitly how to give a 2-torsion class in $MH_{3,4}(G(T_0))$. I’ll illustrate the theory of Kaneta and Yoshinaga by working through a specific example. Barycentric subdivision plays a key role!

The minimal triangulation of the projective plane

We are going to construct our graph from the minimal triangulation of $\mathbb{R}P^2$ so let’s have a look at that first. We want to see how the 2-torsion in the homology of $\mathbb{R}P^2$ can be expressed using this triangulation as we will need that later for the 2-torsion in the graph magnitude homology.

The real projective plane can be thought of as the two-sphere quotiented out by the antipodal map. The antipodal map acts on the icosahedral triangulation of the two-sphere. So quotienting the icosahedron by the antipodal map gives us a triangulation $T_0$ of $\mathbb{R}P^2$ which is in fact the triangulation with fewest simplices. Here is a picture of it.


I’ve numbered the vertices and we will label each simplex by its vertices, so $(0, 1, 2)$ is a 2-simplex you can see in the triangulation above. The label for a simplex will have the vertices in linear order.

Let’s recall how we get the 2-torsion element in homology. If we take the boundary $\partial$ of the ten 2-simplices with the orientation drawn above then we get

$$(3, 4)+(4,5)-(3,5) + (3, 4)+(4,5)-(3,5).$$

(We write $-(3,5)$ rather than $(5,3)$ because of the orientation conventions which I will gloss over. You can figure them out if you’re interested/concerned. I think they are right!)

As $\partial^2=0$ this boundary chain is a cycle. To get homology we quotient cycles out by boundaries, so this cycle is trivial in homology, which means

$$2[(3, 4)+(4,5)-(3,5)] = 0 \in \mathrm{H}_1^{\mathrm{simp}}(T_0).$$

Thus $[(3, 4)+(4,5)-(3,5)]$ is a 2-torsion element. Off the top of my head I can’t think of a nice argument showing that this is non-trivial in homology, but hopefully someone can provide one in the comments! Anyway, we’re going to need this later.
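Both claims can at least be checked by brute force. The sketch below makes some assumptions: the facet list is one standard labelling of the six-vertex triangulation (chosen to be consistent with the simplices named in this post, but not read off the picture), and the boundary of an ordered triangle $(a,b,c)$ is taken to be $(b,c)-(a,c)+(a,b)$. It searches all $2^{10}$ orientations for one whose boundary is twice the cycle, and then checks non-triviality mod 2: if the cycle were an integer boundary, its reduction mod 2 would be the mod-2 boundary of some subset of the triangles.

```python
from itertools import product

# Assumed facets of the 6-vertex triangulation of RP^2 (one standard labelling).
FACETS = [(0, 1, 2), (0, 2, 3), (0, 1, 5), (0, 4, 5), (0, 3, 4),
          (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 5), (2, 4, 5)]

def boundary(tri):
    """Boundary of an ordered 2-simplex (a, b, c): (b,c) - (a,c) + (a,b)."""
    a, b, c = tri
    return {(b, c): 1, (a, c): -1, (a, b): 1}

def signed_sum(signs):
    """Boundary of the sum of all triangles, each weighted by a sign +1 or -1."""
    total = {}
    for s, tri in zip(signs, FACETS):
        for edge, coeff in boundary(tri).items():
            total[edge] = total.get(edge, 0) + s * coeff
    return {e: c for e, c in total.items() if c != 0}

def vertex_boundary(chain):
    """Boundary of a 1-chain: each edge (a, b) contributes (b) - (a)."""
    out = {}
    for (a, b), c in chain.items():
        out[b] = out.get(b, 0) + c
        out[a] = out.get(a, 0) - c
    return {v: c for v, c in out.items() if c != 0}

def mod2_boundary(triangles):
    """Set of edges lying in an odd number of the given triangles."""
    edges = set()
    for tri in triangles:
        for edge in boundary(tri):
            edges ^= {edge}
    return edges

z = {(3, 4): 1, (4, 5): 1, (3, 5): -1}
two_z = {e: 2 * c for e, c in z.items()}

# Orientations of the ten triangles whose total boundary is exactly 2z.
orientations = [s for s in product((1, -1), repeat=len(FACETS))
                if signed_sum(s) == two_z]

# Subsets of triangles whose mod-2 boundary is z reduced mod 2.
bounding_subsets = [bits for bits in product((0, 1), repeat=len(FACETS))
                    if mod2_boundary([t for b, t in zip(bits, FACETS) if b]) == set(z)]

print(vertex_boundary(z), len(orientations), len(bounding_subsets))
```

An empty `bounding_subsets` list means no combination of triangles bounds the cycle even mod 2, so the class is non-trivial, while a non-empty `orientations` list confirms that twice the cycle is a boundary.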

Constructing the graph and the magnitude cycles

Following Kaneta and Yoshinaga, we are now going to use the above triangulation $T_0$ to build our graph $G(T_0)$. We take the simplices of the triangulation as the nodes of the graph and have an edge $\sigma \to \tau$ if $\sigma$ is a facet of $\tau$ (remember that a facet is a face of maximal dimension). We then add top and bottom nodes, so $\mathrm{bottom}$ has an arrow to each 0-simplex and $\mathrm{top}$ has an arrow from each 2-dimensional simplex. Here is the graph again. You should be able to see the six vertices, fifteen edges and ten faces of the original triangulation.


A more sophisticated way of saying what we’ve done is the following. The simplices of the triangulation form a poset, the face poset, with the ordering ‘is a face of’. We add top and bottom elements to that poset. We then take the Hasse diagram, which is the graph that has the elements of the poset as its nodes and where there is an edge $x\to y$ if $x < y$ but there is no $z$ with $x < z < y$. Clearly this process gives us a graph $G(T)$ from any triangulation $T$ of a manifold.

We can obtain the graph $G(T_0)$ in SageMath with a couple of commands:

triangulation = simplicial_complexes.RealProjectivePlane()
poset = triangulation.face_poset().with_bounds()
graph = poset.hasse_diagram()

To see the above picture you can use the following command:


For what we do next it doesn’t matter if we stick with this directed graph or take the associated undirected graph as we are going to be forced to only consider upward pointing edges.
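For readers without SageMath, the same construction can be sketched in plain Python. The facet list here is an assumption (one standard labelling of the six-vertex triangulation, which may differ from SageMath's), and `hasse_diagram` is a hypothetical helper name, not the SageMath method.

```python
from itertools import combinations

# Assumed facets of the 6-vertex triangulation of RP^2 (one standard labelling).
FACETS = [(0, 1, 2), (0, 2, 3), (0, 1, 5), (0, 4, 5), (0, 3, 4),
          (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 5), (2, 4, 5)]

def hasse_diagram(facets):
    """Nodes and directed edges of the face poset with 'bottom' and 'top' added."""
    # Every nonempty subset of a facet is a simplex of the complex.
    simplices = {face for f in facets
                 for k in range(1, len(f) + 1)
                 for face in combinations(f, k)}
    edges = set()
    for s in simplices:
        if len(s) == 1:
            edges.add(("bottom", s))          # bottom -> each 0-simplex
        else:
            for facet in combinations(s, len(s) - 1):
                edges.add((facet, s))         # facet -> simplex
    for f in facets:
        edges.add((f, "top"))                 # each maximal simplex -> top
    return simplices, edges

simplices, edges = hasse_diagram(FACETS)
by_dimension = [sum(1 for s in simplices if len(s) == k) for k in (1, 2, 3)]
print(by_dimension, len(edges))
```

Counting nodes by dimension should recover the six vertices, fifteen edges and ten faces mentioned above.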

The magnitude chains for this graph

In the previous post I explained that for a finite graph $G$ the magnitude chain groups are defined as follows.

A chain generator is a tuple of the form $c= (x_0, x_1,\dots, x_{k-1},x_k)$, where each $x_i$ is a node of the graph $G$ and $x_{i-1}\ne x_{i}$. The degree is $\mathrm{deg}(c)=k$ and the length is $\mathrm{len}(c)=\sum \mathrm{d}(x_{i-1}, x_i)$.

The face map $\partial_i$ for $i=1,\dots, k-1$ is defined by:

$$\partial_{i}(x_0,\ldots,x_k) = \begin{cases} (x_0,\ldots,\widehat{x_i},\ldots,x_k) & \text{if } x_{i-1} < x_{i} < x_{i+1}, \\ 0 & \text{otherwise}, \end{cases}$$

where $x_{i-1} < x_{i} < x_{i+1}$ means that $x_i$ lies on a shortest path between $x_{i-1}$ and $x_{i+1}$, i.e., $\mathrm{d}(x_{i-1},x_i)+\mathrm{d}(x_i,x_{i+1})=\mathrm{d}(x_{i-1},x_{i+1})$.
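As a small illustration of these definitions, here is a sketch of the length and the face map on a 5-cycle. The example graph and the function names are illustrative choices; `None` stands for the zero chain.

```python
from collections import deque

def distances(adj):
    """All-pairs shortest-path distances in an undirected graph, by BFS."""
    dist = {}
    for source in adj:
        d = {source: 0}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in d:
                    d[y] = d[x] + 1
                    queue.append(y)
        dist[source] = d
    return dist

def length(chain, dist):
    """Sum of the distances between consecutive entries of the chain."""
    return sum(dist[a][b] for a, b in zip(chain, chain[1:]))

def face(chain, i, dist):
    """i-th face map: drop x_i if it lies on a shortest path from x_{i-1}
    to x_{i+1}; otherwise return None, standing for zero."""
    a, x, b = chain[i - 1], chain[i], chain[i + 1]
    if dist[a][x] + dist[x][b] == dist[a][b]:
        return chain[:i] + chain[i + 1:]
    return None

# The 5-cycle with nodes 0..4.
adj = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
dist = distances(adj)

print(face((0, 1, 2), 1, dist))   # node 1 lies between 0 and 2
print(face((0, 2, 1), 1, dist))   # node 2 does not lie between 0 and 1
```

In the first case the face has the same length and endpoints as the original chain, which is the observation used next.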

Neither the length nor the endpoints of chains are altered by the face maps, so the magnitude complex splits up into subcomplexes with specified endpoints. So if we define

$$MC^{y,z}_{k,l}(G)=\left\langle (y, x_1,\dots, x_{k-1},z) \mid \text{chain generator of length } l\right\rangle,$$

then the magnitude chain complex splits as

$$MC_{\ast,\ast}(G)=\bigoplus_{y,z\in G}\bigoplus_l MC^{y,z}_{\ast,l}(G).$$

We will concentrate on the subcomplex of length-four chains from the bottom element to the top element in our graph (here, four is the dimension of $\mathbb{R}P^2$ plus two). Writing $\mathrm{b}$ and $\mathrm{t}$ for the bottom and top elements we consider the magnitude chain complex $MC^{\mathrm{b},\mathrm{t}}_{\ast,4}(G(T_0))$. We will see that the homology of this is isomorphic to $\widetilde{\mathrm{H}}_{\ast-2}(\mathbb{R}P^2)$ and so we get the embedding $\widetilde{\mathrm{H}}_{\ast}(\mathbb{R}P^2)\hookrightarrow MH_{\ast+2,4}(G(T_0))$.

Looking at our graph it is easy to see that a length four chain must be of the form

$$(\mathrm{b}, \sigma_1, \dots,\sigma_{k-1},\mathrm{t})$$

where $\sigma_1\subset \dots\subset\sigma_{k-1}$ is a sequence of simplices, each of which is a face of the following one; in other words it is a flag of simplices in our original triangulation. Such a flag can be a full flag like $(0)\subset(0,1)\subset(0,1,2)$ in which each is a facet of the following simplex, or it can be a partial flag like $(0)\subset (0,1,2)$.

Looking at the formula for the face maps you can see that given a generator corresponding to a flag of simplices, each facet of the generator corresponds to a flag with one of the simplices removed.
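The claim that the length-four chains from bottom to top are exactly the flags can be checked by enumeration. The sketch below assumes one standard labelling of the six-vertex triangulation and uses the undirected graph; all function names are illustrative.

```python
from collections import Counter, deque
from itertools import combinations

# Assumed facets of the 6-vertex triangulation of RP^2 (one standard labelling).
FACETS = [(0, 1, 2), (0, 2, 3), (0, 1, 5), (0, 4, 5), (0, 3, 4),
          (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 5), (2, 4, 5)]

def build_graph(facets):
    """Undirected Hasse diagram of the face poset with bottom 'b' and top 't'."""
    simplices = {c for f in facets for k in range(1, len(f) + 1)
                 for c in combinations(f, k)}
    adj = {s: set() for s in simplices}
    adj["b"], adj["t"] = set(), set()

    def link(x, y):
        adj[x].add(y)
        adj[y].add(x)

    for s in simplices:
        if len(s) == 1:
            link("b", s)
        for facet in combinations(s, len(s) - 1):
            if facet:                      # skip the empty tuple
                link(facet, s)
    for f in facets:
        link(f, "t")
    return adj

def distances(adj):
    """All-pairs shortest-path distances by BFS."""
    dist = {}
    for source in adj:
        d = {source: 0}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in d:
                    d[y] = d[x] + 1
                    queue.append(y)
        dist[source] = d
    return dist

def chains(adj, dist, start, end, total):
    """All chain generators from start to end of the given total length."""
    found = []

    def extend(chain, used):
        last = chain[-1]
        if last == end and used == total:
            found.append(tuple(chain))
        for x in adj:
            step = dist[last][x]
            # Prune: the remaining length is at least d(x, end).
            if x != last and used + step + dist[x][end] <= total:
                extend(chain + [x], used + step)

    extend([start], 0)
    return found

def is_flag(chain):
    """True if the entries strictly between the endpoints are nested simplices."""
    middle = chain[1:-1]
    return all(set(x) < set(y) for x, y in zip(middle, middle[1:]))

adj = build_graph(FACETS)
dist = distances(adj)
found = chains(adj, dist, "b", "t", 4)
by_degree = Counter(len(c) - 1 for c in found)
print(sorted(by_degree.items()), all(is_flag(c) for c in found))
```

The counts in degrees 2, 3 and 4 should match the numbers of vertices, edges and triangles of the barycentric subdivision discussed below, with the single degree-1 chain $(\mathrm{b},\mathrm{t})$ playing the role of the empty flag.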

Those of you familiar with combinatorial topology might have spotted the connection with the barycentric subdivision construction. We will look a little more closely at these flags now.

Barycentric subdivision

One of the things you usually learn in a first course on algebraic topology is that if you have a triangulation of a space then you can form a finer triangulation – the barycentric subdivision – in the following way. Take the midpoint of each simplex in the original triangulation and use these as the vertices of the new triangulation, decomposing each of the old $n$-simplices into $(n+1)!$ new $n$-simplices in ‘the obvious fashion’.

For instance, if we take a triangle in our triangulation then each 1-simplex gets split into 2 new 1-simplices and the 2-simplex gets split into 6 new 2-simplices.


Each new vertex can be labelled by the $n$-simplex that it is the midpoint of. What then is ‘the obvious fashion’ for creating the new simplices? Well each new $n$-simplex in the subdivision corresponds to a flag of old simplices $\sigma_0\subset \sigma_1\subset \dots \subset \sigma_n$. We can picture the 2-simplex corresponding to $(0)\subset(0,1)\subset(0,1,2)$ and the 1-simplex corresponding to $(0)\subset (0,1,2)$ as follows.


It ought to be clear that for the $n$-simplex in the subdivision corresponding to a flag $\sigma_0\subset \sigma_1\subset \dots \subset \sigma_n$, each facet corresponds to a flag where one of the old simplices has been removed. So for $(0)\subset(0,1)\subset(0,1,2)$ the facets correspond to $(0,1)\subset(0,1,2)$, $(0)\subset(0,1,2)$ and $(0)\subset(0,1)$.
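A short sketch of the flag description for a single simplex: the full flags of faces correspond to the orderings of its vertices, which gives the $(n+1)!$ count, and the facets of a flag are obtained by dropping one term. The function names are illustrative.

```python
from itertools import permutations
from math import factorial

def full_flags(simplex):
    """Full flags of faces of one simplex, one for each ordering of its vertices."""
    flags = set()
    for order in permutations(simplex):
        flags.add(tuple(tuple(sorted(order[:k]))
                        for k in range(1, len(order) + 1)))
    return flags

def facets_of_flag(flag):
    """The flags obtained by removing exactly one simplex from the given flag."""
    return [flag[:i] + flag[i + 1:] for i in range(len(flag))]

triangle_flags = full_flags((0, 1, 2))
edge_flags = full_flags((0, 1))
print(len(triangle_flags), len(edge_flags))
print(facets_of_flag(((0,), (0, 1), (0, 1, 2))))
```

The two counts correspond to the six small triangles in a subdivided triangle and the two halves of a subdivided edge.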

So the barycentric subdivision $T'$, as a simplicial complex, is isomorphic to the simplicial complex of flags where an $n$-simplex is a flag $\sigma_0\subset \sigma_1\subset \dots \subset \sigma_n$ of simplices of the original triangulation $T$. Well, there’s a subtlety here in that we shouldn’t forget the empty flag! The flags naturally form an augmented simplicial complex, meaning that there is a unique simplex in degree $-1$, the empty flag, which is the facet of every 0-simplex.

So the augmented chain complex $\widetilde{\mathrm{C}}_\ast^{\mathrm{simp}}(T')$, which is obtained from the usual chain complex by sticking a unique generator in degree $-1$, is isomorphic to the complex of flags. The homology of the augmented chain complex gives the reduced homology $\widetilde{\mathrm{H}}_\ast^{\mathrm{simp}}(T')\cong \widetilde{\mathrm{H}}_\ast(M)$ which in practice means you kill off a copy of $\mathbb{Z}$ in degree zero from the usual homology.

The important thing is that we are seeing here precisely the same structure that we saw in the magnitude chain group $MC_{\ast, l}^{\mathrm{b},\mathrm{t}}(G)$. We should now make this precise.

Synthesis of the two sides: the theorem

I’ve hopefully given the impression that a complex of flags of simplices is isomorphic to both the augmented chain complex of the barycentric subdivision of a triangulation and to a subcomplex of the magnitude chain complex of the graph of the triangulation. Let’s give a proper statement of this now. Kaneta and Yoshinaga have a more general statement in their paper, involving ranked posets rather than just triangulations, but for the purpose of finding torsion, the following will suffice.

Theorem. Suppose that $T$ is a finite triangulation of an $m$-manifold $M$, that $T'$ is its barycentric subdivision, and that $G(T)$ is the graph obtained as above from the poset structure. Then the isomorphism of (augmented) chain groups for $k\ge -1$

$$\begin{aligned} \widetilde{\mathrm{C}}^{\mathrm{simp}}_k (T')&\xrightarrow{\sim} MC^{\mathrm{b},\mathrm{t}}_{k+2,m+2}(G(T)); \\ \sigma_0\subset \sigma_1\subset \dots \subset \sigma_k &\mapsto (\mathrm{b}, \sigma_0,\dots, \sigma_k,\mathrm{t}) \end{aligned}$$

commutes up to sign with the differentials. Thus this induces an isomorphism of homology groups

$$\widetilde{\mathrm{H}}^{\mathrm{simp}}_\ast (T')\xrightarrow{\simeq} MH^{\mathrm{b},\mathrm{t}}_{\ast+2,m+2}(G(T)).$$

As a corollary we get that the homology of MM embeds in the magnitude homology of the graph.

Corollary. With $M$, $T$, $T'$ and $G(T)$ as in the above theorem, the homology of $M$ embeds in the magnitude homology of the graph $G(T)$ via the following sequence of isomorphisms and embeddings:

$$\widetilde{\mathrm{H}}_\ast(M) \cong \widetilde{\mathrm{H}}_\ast^{\mathrm{simp}}(T) \xrightarrow{\simeq} \widetilde{\mathrm{H}}^{\mathrm{simp}}_\ast (T') \xrightarrow{\simeq} MH^{\mathrm{b},\mathrm{t}}_{\ast+2,m+2}(G(T)) \hookrightarrow MH_{\ast+2,m+2}(G(T)).$$

The payoff: a torsion element in the homology of our graph

We saw earlier on in this post that

$$[(3, 4)+(4,5)-(3,5)] \in \mathrm{H}_1^{\mathrm{simp}}(T_0)$$

is the non-trivial 2-torsion element. We can follow this element through the sequence of maps above. We just need to note that the map from 1-chains on $T$ to 1-chains on $T'$ rewrites each edge as the (signed) sum of its two half-edges, namely

$$\begin{aligned} \mathrm{C}_1^{\mathrm{simp}}(T) &\to \mathrm{C}_1^{\mathrm{simp}}(T') \\ (a,b)&\mapsto \bigl((a) \subset (a,b)\bigr) - \bigl((b)\subset (a,b)\bigr). \end{aligned}$$

Then we get our non-trivial 2-torsion element in $MH_{3,4}(G(T_0))$ to be

$$\begin{split} [(\mathrm{b}, (3), (3,4), \mathrm{t}) &-(\mathrm{b}, (4), (3,4), \mathrm{t}) +(\mathrm{b}, (4), (4,5), \mathrm{t})\\ &-(\mathrm{b}, (5), (4,5), \mathrm{t}) +(\mathrm{b}, (5), (3,5), \mathrm{t}) -(\mathrm{b}, (3), (3,5), \mathrm{t})]. \end{split}$$

Thus we have a graph with torsion in its magnitude homology groups!
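As a sanity check, one can verify that the six-term chain above really is a cycle in the magnitude chain complex. The sketch below assumes one standard labelling of the six-vertex triangulation and takes the differential to be the alternating sum of the face maps (the sign convention is an assumption here, though the cycle condition does not depend on an overall sign).

```python
from collections import deque
from itertools import combinations

# Assumed facets of the 6-vertex triangulation of RP^2 (one standard labelling).
FACETS = [(0, 1, 2), (0, 2, 3), (0, 1, 5), (0, 4, 5), (0, 3, 4),
          (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 5), (2, 4, 5)]

def build_graph(facets):
    """Undirected Hasse diagram of the face poset with bottom 'b' and top 't'."""
    simplices = {c for f in facets for k in range(1, len(f) + 1)
                 for c in combinations(f, k)}
    adj = {s: set() for s in simplices}
    adj["b"], adj["t"] = set(), set()
    for s in simplices:
        below = [("b",)] if len(s) == 1 else [(f,) for f in combinations(s, len(s) - 1)]
        for (x,) in below:
            adj[x].add(s)
            adj[s].add(x)
    for f in facets:
        adj[f].add("t")
        adj["t"].add(f)
    return adj

def distances(adj):
    """All-pairs shortest-path distances by BFS."""
    dist = {}
    for source in adj:
        d = {source: 0}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in d:
                    d[y] = d[x] + 1
                    queue.append(y)
        dist[source] = d
    return dist

def differential(chain_sum, dist):
    """Alternating sum of face maps, extended linearly to {chain: coefficient}."""
    out = {}
    for chain, coeff in chain_sum.items():
        for i in range(1, len(chain) - 1):
            a, x, b = chain[i - 1], chain[i], chain[i + 1]
            if dist[a][x] + dist[x][b] == dist[a][b]:
                f = chain[:i] + chain[i + 1:]
                out[f] = out.get(f, 0) + (-1) ** i * coeff
    return {c: v for c, v in out.items() if v != 0}

dist = distances(build_graph(FACETS))

cycle = {
    ("b", (3,), (3, 4), "t"): 1,  ("b", (4,), (3, 4), "t"): -1,
    ("b", (4,), (4, 5), "t"): 1,  ("b", (5,), (4, 5), "t"): -1,
    ("b", (5,), (3, 5), "t"): 1,  ("b", (3,), (3, 5), "t"): -1,
}
lengths = {sum(dist[a][b] for a, b in zip(c, c[1:])) for c in cycle}
print(lengths, differential(cycle, dist))
```

An empty differential, with every generator of length 4, confirms that the chain is a cycle in $MC^{\mathrm{b},\mathrm{t}}_{3,4}(G(T_0))$; that it is non-trivial 2-torsion still relies on the theorem above.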

April 11, 2018

Doug Natelson: Chapman Lecture: Using Topology to Build a Better Qubit

Yesterday, we hosted Prof. Charlie Marcus of the Niels Bohr Institute and Microsoft for our annual Chapman Lecture on Nanotechnology.   He gave a very fun, engaging talk about the story of Majorana fermions as a possible platform for topological quantum computing. 

Charlie used quipu to introduce the idea of topology as a way to store information, and made a very nice heuristic argument about how topology encodes information in a global rather than a local sense.  That is, if you have a big, loose tangle of string on the ground, and you do local measurements of little bits of the string, you really can't tell whether it's actually tied in a knot (topologically nontrivial) or just lying in a heap.  This hints at the idea that local interactions (measurements, perturbations) can't necessarily disrupt the topological state of a quantum system.

The talk was given a bit of a historical narrative flow, pointing out that while there had been a lot of breathless prose written about the long search for Majoranas, etc., in fact the timeline was actually rather compressed. In 2001, Alexei Kitaev proposed a possible way of creating effective Majorana fermions, particles that encode topological information, using semiconductor nanowires coupled to a (non-existent) p-wave superconductor. In this scheme, Majorana quasiparticles localize at the ends of the wire. You can get some feel for the concept by imagining string leading off from the ends of the wire, say downward through the substrate and off into space. If you could sweep the Majoranas around each other somehow, the history of that wrapping would be encoded in the braiding of the strings, and even if the quasiparticles end up back where they started, there is a difference in the braiding depending on the history of the motion of the quasiparticles. Theorists got very excited about the braiding concept and published lots of ideas, including how one might do quantum computing operations by this kind of braiding.

In 2010, other theorists pointed out that it should be possible to implement the Majoranas in much more accessible materials - InAs semiconductor nanowires and conventional s-wave superconductors, for example.  One experimental feature that could be sought would be a peak in the conductance of a superconductor/nanowire/superconductor device, right at zero voltage, that should turn on above a threshold magnetic field (in the plane of the wire).  That's really what jumpstarted the experimental action.  Fast forward a couple of years, and you have a paper that got a ton of attention, reporting the appearance of such a peak.  I pointed out at the time that that peak alone is not proof, but it's suggestive.  You have to be very careful, though, because other physics can mimic some aspects of the expected Majorana signature in the data.

A big advance was the recent success in growing epitaxial Al on the InAs wires.  Having atomically precise lattice registry between the semiconductor and the aluminum appears to improve the contacts significantly.   Note that this can be done in 2d as well, opening up the possibility of many investigations into proximity-induced superconductivity in gate-able semiconductor devices.  This has enabled some borrowing of techniques from other quantum computing approaches (transmons).

The main take-aways from the talk:

  • Experimental progress has actually been quite rapid, once a realistic material system was identified.
  • While many things point to these platforms as really having Majorana quasiparticles, the true unambiguous proof in the form of some demonstration of non-Abelian statistics hasn't happened yet.  Getting close.
  • Like many solid-state endeavors before, the true enabling advances here have come from high quality materials growth.
  • If this does work, scale-up may actually be do-able, since this does rely on planar semiconductor fabrication for the most part, and topological qubits may have a better ratio of physical qubits to logical qubits than other approaches.
  • Charlie Marcus remains an energetic, engaging speaker, something I first learned when I worked as the TA for the class he was teaching 24 years ago. 

April 10, 2018

Tommaso Dorigo: Interpreting The Predictions Of Deep Neural Networks

For a couple of years now, CERN has had an inter-experimental working group on Machine Learning. Besides organizing monthly meetings and other activities fostering the dissemination of knowledge and active research on the topic, the group holds a yearly meeting at CERN where, along with interesting presentations on advances and summaries, there are tutorials to teach participants the use of the fast-growing arsenal of tools that any machine-learning enthusiast should master these days.


Jordan EllenbergThe chromatic number of the plane is at least 5

That is:  any coloring of the plane with four colors has two points of the same color at distance 1 from each other.  So says a paper just posted by Aubrey de Grey.

The idea:  given a set S of points in the plane, its unit distance graph G_S is the graph whose vertices are S and where two points are adjacent if they’re at distance 1 in the plane.  If you can find S such that G_S has chromatic number k, then the chromatic number of the plane is at least k.  And de Grey finds a set of 1,567 points whose unit distance graph can’t be 4-colored.
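To make the unit distance graph idea concrete, here is a minimal Python sketch. It does not reproduce de Grey's 1,567-point set; instead it uses the classical 7-point Moser spindle, which gives the older lower bound of 4. The function names and the brute-force coloring check are my own choices for illustration.

```python
import math
from itertools import combinations, product

def rotate(p, theta):
    """Rotate the point p about the origin by the angle theta."""
    x, y = p
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

def unit_distance_graph(points, tol=1e-9):
    """Edges of G_S: index pairs whose points are at distance 1 (up to tol)."""
    return [(i, j) for i, j in combinations(range(len(points)), 2)
            if abs(math.dist(points[i], points[j]) - 1.0) < tol]

def is_colorable(n, edges, k):
    """Brute force: try every assignment of k colors to n vertices."""
    return any(all(c[i] != c[j] for i, j in edges)
               for c in product(range(k), repeat=n))

# Moser spindle: a rhombus made of two unit equilateral triangles, plus a copy
# rotated about the shared vertex A so that the two far tips are at distance 1.
A, B, C = (0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)
D = (B[0] + C[0], B[1] + C[1])
theta = 2 * math.asin(1 / (2 * math.sqrt(3)))
points = [A, B, C, D] + [rotate(p, theta) for p in (B, C, D)]

edges = unit_distance_graph(points)
print(len(points), len(edges))
print("3-colorable:", is_colorable(len(points), edges, 3))
print("4-colorable:", is_colorable(len(points), edges, 4))
```

The brute-force check is only feasible for tiny point sets; a graph on 1,567 vertices needs a much smarter search.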

It’s known that the chromatic number of the plane is at most 7.  Idle question:  is there any chance of a “polynomial method”-style proof that there is no subset S of the plane whose unit distance graph has chromatic number 7?  Such a graph would have a lot of unit distances, and ruling out lots of repetitions of the same distance is something the polynomial method can in principle do.

Though be warned:  as far as I know the polynomial method has generated no improvement so far on older bounds on the unit distance problem (“how many unit distances can there be among pairs drawn from S?”) while it has essentially solved the distinct distance problem (“how few distinct distances can there be among pairs drawn from S?”)


April 09, 2018

ResonaancesPer kaons ad astra

NA62 is a precision experiment at CERN. From their name you wouldn't suspect that they're doing anything noteworthy: the collaboration was running in the contest for the most unimaginative name, only narrowly losing to CMS...  NA62 employs an intense beam of charged kaons to search for the very rare decay K+ → 𝝿+ 𝜈 𝜈. The Standard Model predicts the branching fraction BR(K+ → 𝝿+ 𝜈 𝜈) = 8.4x10^-11 with a small, 10% theoretical uncertainty (precious stuff in the flavor business). The previous measurement by the BNL-E949 experiment reported BR(K+ → 𝝿+ 𝜈 𝜈) = (1.7 ± 1.1)x10^-10, consistent with the Standard Model, but still leaving room for large deviations.  NA62 is expected to pinpoint the decay and measure the branching fraction with a 10% accuracy, thus severely constraining new physics contributions. The wires, pipes, and gory details of the analysis were nicely summarized by Tommaso. Let me jump directly to explaining what it is good for from the theory point of view.

To this end it is useful to adopt the effective theory perspective. At a more fundamental level, the decay occurs due to the strange quark inside the kaon undergoing the transformation s̄ → d̄ 𝜈 𝜈̄. In the Standard Model, the amplitude for that process is dominated by one-loop diagrams with W/Z bosons and heavy quarks. But kaons live at low energies and do not really see the fine details of the loop amplitude. Instead, they effectively see the 4-fermion contact interaction:
The mass scale suppressing this interaction is quite large, more than 1000 times larger than the W boson mass, which is due to the loop factor and small CKM matrix elements entering the amplitude. The strong suppression is the reason why the K+ → 𝝿+ 𝜈 𝜈  decay is so rare in the first place. The corollary is that even a small new physics effect inducing that effective interaction may dramatically change the branching fraction. Even a particle with a mass as large as 1 PeV coupled to the quarks and leptons with order one strength could produce an observable shift of the decay rate.  In this sense, NA62 is a microscope probing physics down to 10^-20 cm  distances, or up to PeV energies, well beyond the reach of the LHC or other colliders in this century. If the new particle is lighter, say order TeV mass, NA62 can be sensitive to a tiny milli-coupling of that particle to quarks and leptons.
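The correspondence between PeV energies and 10^-20 cm distances quoted above is just the usual conversion length ≈ ħc / energy. Here is a quick sketch of that arithmetic; the constant is the standard rounded value ħc ≈ 197.327 MeV·fm, and the function name is mine.

```python
HBAR_C_MEV_FM = 197.327   # hbar * c in MeV * fm (rounded)
FM_IN_CM = 1e-13          # 1 femtometre in centimetres
PEV_IN_MEV = 1e9          # 1 PeV = 10^15 eV = 10^9 MeV

def length_scale_cm(energy_mev):
    """Distance scale (in cm) probed at a given energy (in MeV)."""
    return HBAR_C_MEV_FM / energy_mev * FM_IN_CM

print(length_scale_cm(PEV_IN_MEV))   # a couple of times 10^-20 cm
```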

So, from a model-independent perspective, the advantages of studying the K+ → 𝝿+ 𝜈 𝜈 decay are quite clear. A less trivial question is what can the future NA62 measurements teach us about our cherished models of new physics. One interesting application is in the industry of explaining the apparent violation of lepton flavor universality in B → K l+ l- and B → D l 𝜈 decays. Those anomalies involve the 3rd generation bottom quark, thus a priori they do not need to have anything to do with kaon decays. However, many of the existing models introduce flavor symmetries controlling the couplings of the new particles to matter (instead of just ad-hoc interactions to address the anomalies). The flavor symmetries may then relate the couplings of different quark generations, and thus predict correlations between new physics contributions to B meson and to kaon decays. One nice example is illustrated in this plot:

The observable RD(*) parametrizes the preference for B → D 𝜏 𝜈 over similar decays with electrons and muons, and its measurement by the BaBar collaboration deviates from the Standard Model prediction by roughly 3 sigma. The plot shows that, in a model based on U(2)xU(2) flavor symmetry, a significant contribution to RD(*) generically implies a large enhancement of BR(K+ → 𝝿+ 𝜈 𝜈), unless the model parameters are tuned to avoid that.  The anomalies in the B → K(*) 𝜇 𝜇 decays can also be correlated with large effects in K+ → 𝝿+ 𝜈 𝜈, see here for an example. Finally, in the presence of new light invisible particles, such as axions, the NA62 observations can be polluted by exotic decay channels, such as K+ → axion 𝝿+.

The  K+ → 𝝿+ 𝜈 𝜈 decay is by no means the magic bullet that will inevitably break the Standard Model.  It should be seen as one piece of a larger puzzle that may or may not provide crucial hints about new physics. For the moment, NA62 has analyzed only a small batch of data collected in 2016, and their error bars are still larger than those of BNL-E949. That should change soon when the 2017  dataset is analyzed. More data will be acquired this year, with 20 signal events expected  before the long LHC shutdown. Simultaneously, another experiment called KOTO studies an even more rare process where neutral kaons undergo the CP-violating decay KL → 𝝿0 𝜈 𝜈,  which probes the imaginary part of the effective operator written above. As I wrote recently, my feeling is that low-energy precision experiments are currently our best hope for a better understanding of fundamental interactions, and I'm glad to see a good pace of progress on this front.

April 06, 2018

Matt von HippelA Paper About Ranking Papers

If you’ve ever heard someone list problems in academia, citation-counting is usually near the top. Hiring and tenure committees want easy numbers to judge applicants with: number of papers, number of citations, or related statistics like the h-index. Unfortunately, these metrics can be gamed, leading to a host of bad practices that get blamed for pretty much everything that goes wrong in science. In physics, it’s not even clear that these statistics tell us anything: papers in our field have been including more citations over time, and for thousand-person experimental collaborations the number of citations and papers don’t really reflect any one person’s contribution.

It’s pretty easy to find people complaining about this. It’s much rarer to find a proposed solution.

That’s why I quite enjoyed Alessandro Strumia and Riccardo Torre’s paper last week, on Biblioranking fundamental physics.

Some of their suggestions are quite straightforward. With the number of citations per paper increasing, it makes sense to divide the weight of each citation by the number of citations the citing paper contains: it means more to get cited by a paper with ten citations than by a paper with one hundred. Similarly, you could divide credit for a paper among its authors, rather than giving each author full credit.

Some are more elaborate. They suggest using a variant of Google’s PageRank algorithm to rank papers and authors. Essentially, the algorithm imagines someone wandering from paper to paper and tries to figure out which papers are more central to the network. This is apparently an old idea, but by combining it with their normalization by number of citations they eke a bit more mileage from it. (I also found their treatment a bit clearer than the older papers they cite. There are a few more elaborate setups in the literature as well, but they seem to have a lot of free parameters so Strumia and Torre’s setup looks preferable on that front.)
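For readers who have not seen PageRank, here is a minimal sketch of the textbook algorithm applied to a toy citation network. This is not Strumia and Torre's specific variant: the damping value of 0.85, the treatment of papers without references, and the four-paper network are all assumptions made for illustration. Note that the random reader leaving a paper picks one of its references uniformly, which is exactly the "divide by the number of citations given" normalization.

```python
def pagerank(cites, damping=0.85, iterations=100):
    """cites maps each paper to the list of papers it cites."""
    papers = sorted(set(cites) | {q for refs in cites.values() for q in refs})
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # With probability 1 - damping the reader jumps to a random paper.
        new = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            refs = cites.get(p, [])
            if refs:
                # Otherwise the reader follows one reference, chosen uniformly.
                share = damping * rank[p] / len(refs)
                for q in refs:
                    new[q] += share
            else:
                # A paper with no references sends the reader anywhere.
                for q in papers:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

# Hypothetical network: A and B cite C, and C cites D.
cites = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
rank = pagerank(cites)
for paper in sorted(rank, key=rank.get, reverse=True):
    print(paper, round(rank[paper], 3))
```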

One final problem they consider is that of self-citations, and citation cliques. In principle, you could boost your citation count by citing yourself. While that’s easy to correct for, you could also be one of a small number of authors who cite each other a lot. To keep the system from being gamed in this way, they propose a notion of a “CitationCoin” that counts (normalized) citations received minus (normalized) citations given. The idea is that, just as you can’t make anyone richer just by passing money between your friends without doing anything with it, so a small community can’t earn “CitationCoins” without getting the wider field interested.
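The bookkeeping behind this idea can be sketched in a few lines. What follows is only my reading of the one-sentence description above (normalized citations received minus normalized citations given); the actual definition in the paper may differ in its details, and the papers, authors, and the rule that each citing paper hands out one unit in total are assumptions.

```python
def citation_coin(papers):
    """papers maps a paper id to (author, list of cited paper ids).

    Each citing paper splits one unit equally among its references; an
    author's score is what they receive minus what they give.
    """
    score = {author: 0.0 for author, _ in papers.values()}
    for author, refs in papers.values():
        refs = [r for r in refs if r in papers]
        for r in refs:
            share = 1.0 / len(refs)
            score[papers[r][0]] += share   # received by the cited author
            score[author] -= share         # given by the citing author
    return score

# Hypothetical data: Ann and Bob cite each other, Cat cites both of them.
papers = {
    "ann1": ("Ann", []),
    "bob1": ("Bob", ["ann1"]),
    "ann2": ("Ann", ["bob1"]),
    "cat1": ("Cat", ["ann1", "bob1"]),
}
score = citation_coin(papers)
print(score)
```

In this toy example the citations Ann and Bob exchange cancel out, so whatever they end up with comes only from Cat, and the scores of the whole community add up to zero.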

There are still likely problems with these ideas. Dividing each paper by its number of authors seems like overkill: a thousand-person paper is not typically going to get a thousand times as many citations. I also don’t know whether there are ways to game this system: since the metrics are based in part on citations given, not just citations received, I worry there are situations where it would be to someone’s advantage to cite others less. I think they manage to avoid this by normalizing by number of citations given, and they emphasize that PageRank itself is estimating something we directly care about: how often people read a paper. Still, it would be good to see more rigorous work probing the system for weaknesses.

In addition to the proposed metrics, Strumia and Torre’s paper is full of interesting statistics about the arXiv and InSpire databases, both using more traditional metrics and their new ones. Whether or not the methods they propose work out, the paper is definitely worth a look.

Tommaso DorigoMachine Learning For Phenomenology

These days the use of machine learning is exploding, as problems which can be solved more effectively with it are ubiquitous, and the construction of deep neural networks or similar advanced tools is within reach of sixth graders.  So it is not surprising to see theoretical physicists joining the fun. If you think that the work of a particle theorist is too abstract to benefit from ML applications, you had better think again.


April 05, 2018

n-Category Café Magnitude Homology Reading Seminar, II

guest post by Scott Balchin

Following on from Simon’s introductory post, this is the second installment regarding the reading group at Sheffield on magnitude homology, and the first installment which looks at the paper of Leinster and Shulman. In this post, we will be discussing the concept of magnitude for enriched categories.

The idea of magnitude is to capture the essence of size of a (finite) enriched category. By changing the ambient enrichment, this magnitude carries different meanings. For example, when we enrich over the monoidal category $[0,\infty]$ we capture metric space invariants, while changing the enrichment to $\{\text{true},\text{false}\}$ we capture poset invariants.

We will introduce the concept of magnitude via the use of zeta functions of enriched categories, which depend on the choice of a size function for the underlying enriching category. Then, we describe magnitude in a more general way using the theory of weightings. The latter will have the advantage that it is invariant under equivalence of categories, a highly desirable property.

What is presented here is taken almost verbatim from Section 2 of Leinster and Shulman’s Magnitude homology of enriched categories and metric spaces. It is, however, enhanced using comments from various other papers and, of course, multiple $n$-Café posts.

Sizes on monoidal categories

Recall that:

  • A symmetric monoidal category consists of a triple $(\mathbf{V},\otimes,I)$ where $\mathbf{V}$ is a category, and $\otimes$ is a symmetric bifunctor $\mathbf{V} \times \mathbf{V} \to \mathbf{V}$ with identity object $I$.
  • A semiring (or rig) $\mathbb{K}$ is a ring without additive inverses.

Definition: A size is a function $\# \colon \text{ob}(\mathbf{V}) \to \mathbb{K}$ such that:

  1. $\#$ is invariant under isomorphism: $a \cong b \Rightarrow \# a = \# b$.
  2. $\#$ is multiplicative: $\#(I) = 1$ and $\#(a \otimes b) = \# a \cdot \# b$.

Example: Let $(\mathbf{V},\otimes,I) = (\text{FinSet},\times,\{\star\})$ and $\mathbb{K} = \mathbb{N}$. Then we can take $\#$ to be the cardinality. Note that $\mathbb{N}$ is the initial object in the category of semirings, and therefore by defining a size valued in $\mathbb{N}$, we obtain a size valued in any other semiring $S$ by composing with the unique map $\phi \colon \mathbb{N} \to S$, $\phi(1) = 1_{S}$.

Example: Let $(\mathbf{V},\otimes,I) = ([0,\infty],+,0)$. Here $[0,\infty]$ is the category whose objects are the non-negative reals together with $\infty$, where there is a morphism $a \to b$ if and only if $a \geq b$, and the monoidal structure is addition $+$. We take $\mathbb{K} = \mathbb{R}$ and set $\# a = e^{-a}$. The choice of $e$ here is arbitrary: we could take any positive real number $q$, and note that we have $q^{a} = e^{-t a}$ for $t = -\ln q$.
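As a quick sanity check of this example, the following sketch verifies numerically that $a \mapsto e^{-ta}$ satisfies the two multiplicativity conditions for the monoidal structure $+$, and that replacing $e$ by another base $q$ amounts to a rescaling of $t$. The function name `size` and the sample numbers are mine; the convention that $\infty$ has size $0$ is the usual reading of $e^{-\infty}$.

```python
import math

def size(a, t=1.0):
    """Size e^{-t a} of an object a of [0, inf]; infinity is sent to 0."""
    return 0.0 if math.isinf(a) else math.exp(-t * a)

a, b = 0.7, 2.5

# Multiplicativity: the unit 0 has size 1, and sums go to products.
print(size(0.0), math.isclose(size(a + b), size(a) * size(b)))

# A different base q is the same family with a rescaled parameter t.
q = 0.3
t = -math.log(q)
print(math.isclose(q ** a, size(a, t)))
```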

Let $\mathbf{V}$ be essentially small, and $\mathbb{K} = \mathbb{N}[\text{ob}(\mathbf{V})/\cong]$ be the monoid semiring of the monoid of isomorphism classes of objects in $\mathbf{V}$. This is the universal example in that any other size on $\mathbf{V}$ factors uniquely through it.

For example, if $\mathbf{V} = [0,\infty]$ as before, then the elements of this universal semiring are formal $\mathbb{N}$-linear combinations of numbers in $[0,\infty]$, and are therefore of the form

$$a_{1}[\ell_{1}] + a_{2}[\ell_{2}] + \cdots + a_{n}[\ell_{n}].$$

Since multiplication in $\mathbb{K}$ is defined via $[\ell_{1}] \cdot [\ell_{2}] = [\ell_{1} + \ell_{2}]$, it makes more sense to write $[\ell]$ as $q^{\ell}$ for a formal variable $q$. Therefore we can see the elements of $\mathbb{K}$ represented as generalised polynomials

$$a_{1}q^{\ell_{1}} + a_{2}q^{\ell_{2}} + \cdots + a_{n}q^{\ell_{n}}$$

where the $\ell_{i} \in [0,\infty]$. We write this semiring of generalised polynomials as $\mathbb{N}[q^{[0,\infty]}]$.

We can now compare this universal size construction with the previous example of $\# a = e^{-a}$. There is an evaluation map $\mathbb{N}[q^{[0,\infty]}] \to \mathbb{R}$ that substitutes $e^{-1}$ (or any other positive real number) for $q$. Therefore, the universal size valued in $\mathbb{N}[q^{[0,\infty]}]$ contains all of the information of the sizes $a \mapsto e^{-t a}$ for all values of $t$.

Here are some further examples of sizes associated to other symmetric monoidal categories. However, we will not be considering any of these in the rest of this post.


  • $(\mathbf{V},\otimes,I) = (\mathbf{sSet},\times,\Delta[0])$. We can take $\#$ to be the Euler characteristic of the realisation of the simplicial set.
  • $(\mathbf{V},\otimes,I) = (\mathbf{FDVect},\otimes,\mathbb{C})$. We can take $\#$ to be the dimension of the vector space.
  • $(\mathbf{V},\otimes,I) = ([0,\infty],\max,0)$. This is the same category as above, however we have changed the monoidal structure to be the maximum instead of addition. Categories enriched over this are ultrametric spaces, such as the $p$-adic numbers. In this case $\#$ cannot be $e^{-a}$; instead, it needs to be some form of indicator function. We (arbitrarily) choose the interval $[0,1]$ and say that $\# a = 1$ if $a \leq 1$, and $0$ otherwise.

Enriched categories

Many people think that an enriched category is a category in which the hom-sets have extra structure. Whilst such a thing is usually an enriched category, the notion of enriched category is much more encompassing than that. The homs in an enriched category might not be sets, they might just be objects in some abstract category, so they might not even have elements, as we will see in the metric space example below.

Definition: For $(\mathbf{V},\otimes,I)$ a monoidal category, a category enriched over $\mathbf{V}$ (or a $\mathbf{V}$-category) $X$ consists of a set of objects $\operatorname{ob}(X)$ such that the following hold:

  1. for each pair $a,b \in \operatorname{ob}(X)$ there is a specified object $X(a,b) \in \mathbf{V}$ called the hom-object;
  2. for each triple $a,b,c \in \operatorname{ob}(X)$ there is a specified morphism $X(a,b) \otimes X(b,c) \to X(a,c)$ in $\mathbf{V}$ called composition;
  3. for each element $a \in \operatorname{ob}(X)$ there is a specified morphism $id_a \colon I \to X(a,a)$ in $\mathbf{V}$ called the identity.

These are required to satisfy associativity and identity axioms which we won’t go into here; see the nLab for the details.

Example: If $(\mathbf{V},\otimes,I) = (\text{FinSet},\times,\{\star\})$ then a $\mathbf{V}$-category is precisely a small category with finite hom-sets.

Example: If $(\mathbf{V},\otimes,I) = ([0,\infty],+,0)$ then a $\mathbf{V}$-category is an extended quasi-pseudo metric space (in the sense of Lawvere). The various adjectives here mean the following:

  • pseudo: $d(x,y) = 0$ does not imply $x = y$.
  • quasi: $d(x,y)$ is not necessarily equal to $d(y,x)$.
  • extended: $d(x,y)$ is allowed to be $\infty$.

Why does this enrichment give us something like a metric space? For each pair of objects $x,y$, we will denote the hom-object $X(x,y) \in [0,\infty]$ as $d(x,y)$. Now, the identity axiom of the enrichment tells us that for each object $x \in X$ there is a morphism $0 \to d(x,x)$ in $[0,\infty]$, which tells us that $0 \geq d(x,x) \geq 0$, and therefore $d(x,x) = 0$. Finally, the composition tells us that for all triples of objects $x,y,z \in X$ we have a morphism $d(x,y) + d(y,z) \to d(x,z)$, and therefore $d(x,y) + d(y,z) \geq d(x,z)$, which gives us the triangle inequality.

Magnitude of finite enriched categories

For us, a square matrix will be one whose rows and columns are indexed by the same finite set (this means that we do not impose an ordering on the rows and columns). In particular, there is a category whose objects are finite sets, and whose morphisms $A \to B$ are functions $A \times B \to \mathbb{K}$ with composition by matrix multiplication. The square matrices that we are interested in are the endomorphisms of this category. Note that this latter description illuminates what we mean by such a square matrix being invertible.

Definition: Let $X$ be a $\mathbf{V}$-category with finitely many objects, where we denote the hom-object by $X(x,y)$. Then its zeta function is the $\text{ob}(X) \times \text{ob}(X)$ matrix over $\mathbb{K}$ such that

$$Z_{X,\mathbb{K}}(x,y) = \#(X(x,y)).$$

Our notation is slightly different here to the usual, in that we wish to explicitly keep track of the semiring $\mathbb{K}$.

Definition: We say that $X$ has Möbius inversion (with respect to $\mathbb{K}$ and $\#$) if $Z_{X,\mathbb{K}}$ is invertible over $\mathbb{K}$. If $X$ has Möbius inversion, then its magnitude, $\operatorname{Mag}_{\mathbb{K}}(X)$, is the sum of all the entries of $Z_{X,\mathbb{K}}^{-1}$.
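The definition is easy to try out numerically. Below is a minimal sketch for finite metric spaces with the real-valued size $e^{-td}$: it builds the zeta matrix, inverts it with a hand-rolled Gauss-Jordan routine (to stay within the standard library), and sums the entries. The function names and the two-point example are my own; for two points at distance $d$ the result should agree with the closed form $2/(1+e^{-td})$.

```python
import math

def invert(m):
    """Gauss-Jordan inverse of a square matrix given as a list of rows."""
    n = len(m)
    aug = [list(row) + [1 if i == j else 0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0:
            raise ValueError("zeta matrix is singular: no Moebius inversion")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def magnitude(dist, t=1.0):
    """Magnitude of a finite metric space given by its distance matrix."""
    zeta = [[math.exp(-t * d) for d in row] for row in dist]
    return sum(sum(row) for row in invert(zeta))

d = 2.0
two_points = [[0.0, d], [d, 0.0]]
print(magnitude(two_points), 2 / (1 + math.exp(-d)))
```

As $t$ grows the two points look further and further apart, and the magnitude tends to 2, the number of points.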

Example: Let $\mathbf{V} = \text{FinSet}$, $\mathbb{K} = \mathbb{Q}$ and $\#$ be the cardinality. We take $X$ to be any finite category which is skeletal (i.e., isomorphic objects are necessarily equal) and contains no nonidentity endomorphisms. Then $X$ has Möbius inversion, and its magnitude is equal to the Euler characteristic of the geometric realisation of its nerve.

Note that if $X$ were not skeletal, then there would be two identical rows in $Z_{X,\mathbb{K}}$ and the determinant would be zero. This shows that magnitude, as defined via Möbius inversion, is not invariant under equivalence of categories.

Let us expand a bit on the last comment made in the example above. Let $X$ be a category of the above form. For $a,b \in X$, we say that an $n$-path from $a$ to $b$ is a diagram

$$a = a_{0} \xrightarrow{f_{1}} a_{1} \xrightarrow{f_{2}} \cdots \xrightarrow{f_{n}} a_{n} = b.$$

Such a path is a circuit if $a = b$, and non-degenerate if no $f_{i}$ is the identity. Then for our particular $X$, we have

$$Z_{X,\mathbb{Q}}^{-1}(a,b) = \sum_{n \geq 0} (-1)^{n} |\{\text{non-degenerate } n\text{-paths from } a \text{ to } b\}| \in \mathbb{Z}.$$

Now, we note that for our choice of $X$, the nerve contains only finitely many non-degenerate simplices and we get that

$$\chi(|NX|) = \sum_{n \geq 0} (-1)^{n} |\{\text{non-degenerate } n\text{-simplices in } NX\}|.$$

The claimed result then follows from this.

Example: Remember from above that if $\mathbf{V} = [0,\infty]$ then a $\mathbf{V}$-category is an extended quasi-pseudo metric space. With the family of $\mathbb{R}$-valued size functions $e^{-t d}$, the resulting magnitude of an (extended quasi-pseudo-)metric space is an object of interest and has been studied extensively.

A matrix is more likely to be invertible over a ring or a field than over a semiring. Therefore if $\mathbb{K}$ is given as a semiring, it makes sense to universally complete it to a ring or a field. The universal semirings can be completed to rings by allowing integer coefficients as opposed to natural number coefficients. These rings need not be integral domains; in particular, $\mathbb{Z}[q^{[0,\infty]}]$ contains zero divisors:

$$q^{\infty}(1 - q^{\infty}) = q^{\infty} - q^{\infty + \infty} = q^{\infty} - q^{\infty} = 0.$$

However, by omitting $\infty$ (and only caring about quasi-pseudo metrics) we indeed do get an integral domain $\mathbb{Z}[q^{[0,\infty)}]$. Its field of fractions, written $\mathbb{Q}(q^{[0,\infty)})$ (or more suggestively $\mathbb{Q}(q^{\mathbb{R}})$), consists of generalised rational functions

$$\frac{a_{1}q^{\ell_{1}} + a_{2}q^{\ell_{2}} + \cdots + a_{n}q^{\ell_{n}}}{b_{1}q^{k_{1}} + b_{2}q^{k_{2}} + \cdots + b_{m}q^{k_{m}}}$$

with $a_{i},b_{j} \in \mathbb{Q}$ and $\ell_{i},k_{j} \in \mathbb{R}$.

Theorem: Any finite quasi-metric space (i.e., a finite skeletal $[0,\infty)$-category) has Möbius inversion over $\mathbb{Q}(q^{\mathbb{R}})$.

To prove this we make the field $\mathbb{Q}(q^{\mathbb{R}})$ ordered by inheriting the order of $\mathbb{Q}$ and declaring the variable $q$ to be infinitesimal. Therefore we order the generalised polynomials lexicographically on their coefficients, starting with the most negative exponents of $q$.

The condition $d(x,x) = 0$ of a metric space gives us that the diagonal entries of $Z_{X,\mathbb{Q}(q^{\mathbb{R}})}$ are all $q^{0} = 1$. The skeletal condition ($d(x,y) > 0$ if $x \neq y$) means that the off-diagonal entries are $q^{d(x,y)}$, which is infinitesimal as $d(x,y) > 0$. Therefore, the determinant of $Z_{X,\mathbb{Q}(q^{\mathbb{R}})}$ is the sum of the product of the diagonal entries (each of which is $q^{d(x,x)} = q^{0} = 1$) and a finite number of infinitesimals, which is necessarily positive and therefore non-zero. Therefore $Z_{X,\mathbb{Q}(q^{\mathbb{R}})}$ is indeed invertible, and the theorem is proved.
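To see the argument in the smallest non-trivial case, here is a worked example that is not in the paper: a two-point space $\{x,y\}$ with $d(x,y) = d(y,x) = d > 0$, which I have chosen only for illustration.

```latex
Z = \begin{pmatrix} 1 & q^{d} \\ q^{d} & 1 \end{pmatrix},
\qquad
\det Z = 1 - q^{2d} > 0 \quad \text{since } q^{2d} \text{ is infinitesimal},

Z^{-1} = \frac{1}{1 - q^{2d}} \begin{pmatrix} 1 & -q^{d} \\ -q^{d} & 1 \end{pmatrix},
\qquad
\operatorname{Mag}(X) = \frac{2 - 2q^{d}}{1 - q^{2d}} = \frac{2}{1 + q^{d}}.
```

Substituting $e^{-t}$ for $q$ recovers the real-valued magnitude $2/(1+e^{-td})$ of the two-point space.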

Magnitudes via weightings

We now look at a way of generalising magnitudes using weightings. The advantage of this will be invariance under equivalence, a property highly desirable for any categorical invariant.

Definition: A weighting on a finite $\mathbf{V}$-category $X$ is a function $w \colon \text{ob}(X) \to \mathbb{K}$ such that $\sum_{y} \#(X(x,y)) \cdot w(y) = 1$ for all $x \in X$. A coweighting on $X$ is a weighting on $X^{op}$.

Here are some simple examples.


  • Consider the category (i.e., enriched over $\mathbf{FinSet}$) $b_{1} \leftarrow a \rightarrow b_{2}$. Then this category carries a unique weighting given by $w(a) = -1$ and $w(b_{i}) = 1$.
  • It is possible for a category to have no weightings. For an instance of this see Example 1.11(c) of Leinster’s paper The Euler characteristic of a category.
  • Consider a category with two objects and an isomorphism between them. A weighting is then given by a pair of rational numbers whose sum is 1. Therefore, there are infinitely many weightings on this category.

We can relate the notion of weighting to Möbius inversion in the following way.

Theorem: If $\mathbb{K}$ is a field, then a $\mathbf{V}$-category $X$ has Möbius inversion if and only if it has a unique weighting $w$, and if and only if it has a unique coweighting $v$, in which case $\operatorname{Mag}(X) = \sum_{x} w(x) = \sum_{x} v(x)$.


  • The category $b_{1} \leftarrow a \rightarrow b_{2}$ has zeta function given by $Z_{X,\mathbb{Q}} = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$. This matrix has inverse $Z_{X,\mathbb{Q}}^{-1} = \begin{pmatrix} 1 & -1 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$, and therefore its magnitude is $\operatorname{Mag}(X) = 1+1+1-1-1 = 1$. Reconsidering the weighting on $X$ found above, we see that the sum of the weights is $1+1-1 = 1$.

  • The category $\bullet \cong \bullet$ has zeta function given by $Z_{X,\mathbb{Q}} = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$, which clearly has no inverse, and therefore the category has no magnitude in the sense of Möbius inversion.
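The two examples above can be checked mechanically. The sketch below works over the rationals with Python's `fractions` module; the helper names are mine. It confirms the weighting and a coweighting of the span, and shows that the two-object groupoid admits several weightings even though its zeta matrix is singular.

```python
from fractions import Fraction

def is_weighting(zeta, w):
    """Check that sum_y zeta[x][y] * w[y] == 1 for every row x."""
    return all(sum(z * wy for z, wy in zip(row, w)) == 1 for row in zeta)

def transpose(m):
    return [list(col) for col in zip(*m)]

# The span b1 <- a -> b2, with objects ordered (a, b1, b2).
span = [[1, 1, 1],
        [0, 1, 0],
        [0, 0, 1]]
w = [Fraction(-1), Fraction(1), Fraction(1)]   # weighting from the text
v = [Fraction(1), Fraction(0), Fraction(0)]    # a coweighting: weighting on X^op
print(is_weighting(span, w), is_weighting(transpose(span), v), sum(w), sum(v))

# Two isomorphic objects: any pair of rationals summing to 1 is a weighting.
iso = [[1, 1],
       [1, 1]]
candidates = [[Fraction(1, 2), Fraction(1, 2)],
              [Fraction(1), Fraction(0)],
              [Fraction(3), Fraction(-2)]]
print([is_weighting(iso, c) for c in candidates])
```

In both cases the weights add up to 1, which is the magnitude in the generalised sense defined next.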

Theorem: If a $\mathbf{V}$-category $X$ has both a weighting $w$ and a coweighting $v$ then $\sum_{x} w(x) = \sum_{x} v(x)$.

Definition: A $\mathbf{V}$-category has magnitude if it has both a weighting $w$ and a coweighting $v$, in which case its magnitude is the common value of $\sum_{x} w(x)$ and $\sum_{x} v(x)$.

The advantage of this generalised notion of magnitude is that it is invariant under equivalence of $\mathbf{V}$-enriched categories, unlike the definition involving Möbius inversion.

Theorem: If $X$ and $X'$ are equivalent $\mathbf{V}$-enriched categories, and $X$ has a weighting, a coweighting, or has magnitude, then so does $X'$.

We provide a brief explanation of why this is true. Let $F \colon X \to X'$ be an equivalence. Given $a \in X$, we write $C_{a}$ for the number of objects in the isoclass of $a$, and similarly $C_{a'}$ for $a' \in X'$. Take a weighting $l$ on $X'$, and set $k(a) = (C_{Fa}/C_{a})\, l(Fa)$. Then $k$ is a weighting on $X$.
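Here is a small sketch of this transfer formula in action. The example is hypothetical: $X'$ is the span $b_1 \leftarrow a \rightarrow b_2$ with its weighting $l$, and $X$ is the equivalent category in which $a$ has been replaced by two isomorphic copies. The dictionaries labelling isomorphism classes are an encoding I chose for convenience.

```python
from collections import Counter
from fractions import Fraction

def transfer_weighting(F, l, iso_X, iso_Xp):
    """Return k(a) = (C_Fa / C_a) * l(Fa) for every object a of X.

    F maps objects of X to objects of X'; iso_X and iso_Xp label each
    object of X and X' with its isomorphism class.
    """
    size_X = Counter(iso_X.values())     # C_a for a in X
    size_Xp = Counter(iso_Xp.values())   # C_a' for a' in X'
    return {a: Fraction(size_Xp[iso_Xp[F[a]]], size_X[iso_X[a]]) * l[F[a]]
            for a in F}

objects_X = ["a1", "a2", "b1", "b2"]
zeta_X = [[1, 1, 1, 1],   # a1 maps once to each object
          [1, 1, 1, 1],   # a2 likewise
          [0, 0, 1, 0],
          [0, 0, 0, 1]]
F = {"a1": "a", "a2": "a", "b1": "b1", "b2": "b2"}
iso_X = {"a1": "A", "a2": "A", "b1": "B1", "b2": "B2"}
iso_Xp = {"a": "A", "b1": "B1", "b2": "B2"}
l = {"a": Fraction(-1), "b1": Fraction(1), "b2": Fraction(1)}

k = transfer_weighting(F, l, iso_X, iso_Xp)
row_sums = [sum(z * k[y] for z, y in zip(row, objects_X)) for row in zeta_X]
print(k, row_sums, sum(k.values()))
```

The weight $-1$ of $a$ is shared equally between its two copies, every row condition still holds, and the total weight is unchanged.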

Theorem: If $X$ and $X'$ are equivalent $\mathbf{V}$-enriched categories, and both have magnitude, then $\operatorname{Mag}(X) = \operatorname{Mag}(X')$.

Next time we will be looking at how Hochschild homology comes into the picture.

April 02, 2018

n-Category Café Dynamical Systems and Their Steady States

guest post by Maru Sarazola

Now that we know how to use decorated cospans to represent open networks, the Applied Category Theory Seminar has turned its attention to open reaction networks (aka Petri nets) and the dynamical systems associated to them.

In A Compositional Framework for Reaction Networks (summarized in this very blog by John Baez not too long ago), authors John Baez and Blake Pollard put Fong’s results to good use and define cospan categories $\mathbf{RxNet}$ and $\mathbf{Dynam}$ of (open) reaction networks and (open) dynamical systems. Once this is done, the main goal of the paper is to show that the mapping that associates to an open reaction network its corresponding dynamical system is compositional, as is the mapping that takes an open dynamical system to the relation that holds between its constituents in steady state. In other words, they show that the study of the whole can be done through the study of the parts.

I would like to place the focus on dynamical systems and the study of their steady states, taking a closer look at this correspondence called “black-boxing”, and comparing it to previous related work done by David Spivak.

Baez–Pollard’s approach

The category $\mathbf{Dynam}$ of open dynamical systems

Let’s start by introducing the main players. A dynamical system is usually defined as a manifold $M$ whose points are “states”, together with a smooth vector field on $M$ saying how these states evolve in time. Since the motivation in this paper comes from chemistry, our manifolds will be euclidean spaces $\mathbb{R}^S$, where $S$ should be thought of as the finite set of species involved, and a vector $c \in \mathbb{R}^S$ gives the concentration of each species. Then, the dynamical system is a differential equation

$$\frac{d c(t)}{d t} = v(c(t))$$

where $c \colon \mathbb{R} \to \mathbb{R}^S$ gives the concentrations as a function of time, and $v$ is a vector field on $\mathbb{R}^S$.

Now imagine our motivating chemical system is open; that is, we are allowed to inject molecules of some chosen species, and remove some others. An open dynamical system is a cospan of finite sets

$$X \xrightarrow{i} S \xleftarrow{o} Y$$

together with a vector field $v$ on $\mathbb{R}^S$. Here the legs of the cospan mark the species that we’re allowed to inject and remove, labeled $i$ for input and $o$ for output.

So, how can we build a category from this? Loosely citing a result of Fong, if the decorations of the cospan (in this case, the vector fields) can be given through a functor $F \colon (\mathbf{FinSet},+) \to (\mathbf{Set},\times)$ that is lax monoidal, then we can form a category whose objects are finite sets, and whose morphisms are (iso classes of) decorated cospans.

Indeed, this can be done in a very natural way, and therefore gives rise to the category $\mathbf{Dynam}$, whose morphisms are open dynamical systems.

The black-boxing functor $\blacksquare \colon \mathbf{Dynam} \to \mathbf{Rel}$

Given a dynamical system, one of the first things we might like to do is to study its fixed points; in our case, study the concentration vectors that remain constant in time. When working with an open dynamical system, it’s clear that the amounts that we choose to inject and remove will alter the change in concentration of our species, and hence it makes sense to consider the following.

For an open dynamical system (X\xrightarrow{i} S \xleftarrow{o} Y, v), together with a constant inflow I\in\mathbb{R}^X and constant outflow O\in\mathbb{R}^Y, a steady state (with inflows I and outflows O) is a constant vector of concentrations c\in\mathbb{R}^S such that

v(c)+i_{\ast}(I)-o_{\ast}(O)=0

Here i_{\ast}(I) is the vector in \mathbb{R}^S given by i_{\ast}(I)(s)=\sum_{x\in X: i(x)=s} I(x); that is, the inflow concentration of all species as marked by the input leg of the cospan. As the authors concisely put it, “in a steady state, the inflows and outflows conspire to exactly compensate for the reaction velocities”.

Note that the inflow and outflow functions I and O won’t affect any species not marked by the legs of the cospan, and so any steady state c must be such that v(c)=0 when restricted to these inner species that we can’t reach.
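To make the steady-state condition concrete, here is a minimal Python sketch. It assumes a hypothetical system with two species A and B and a single reaction A → B with unit rate constant; the names pushforward and is_steady_state are illustrative and do not come from the paper.

```python
SPECIES = ["A", "B"]

def v(c):
    """Vector field of the assumed reaction A -> B with rate constant 1."""
    return {"A": -c["A"], "B": c["A"]}

# Legs of the cospan X -> S <- Y, written as dictionaries (an assumption).
i = {"x": "A"}  # the input x marks species A
o = {"y": "B"}  # the output y marks species B

def pushforward(leg, flow, species):
    """Compute i_*(I)(s) = sum of I(x) over all x with i(x) = s."""
    out = {s: 0.0 for s in species}
    for x, s in leg.items():
        out[s] += flow[x]
    return out

def is_steady_state(v, i, o, I, O, c, species, tol=1e-9):
    """Check v(c) + i_*(I) - o_*(O) = 0 at every species."""
    vc = v(c)
    inflow = pushforward(i, I, species)
    outflow = pushforward(o, O, species)
    return all(abs(vc[s] + inflow[s] - outflow[s]) <= tol for s in species)

c = {"A": 2.0, "B": 5.0}
print(is_steady_state(v, i, o, {"x": 2.0}, {"y": 2.0}, c, SPECIES))
print(is_steady_state(v, i, o, {"x": 2.0}, {"y": 1.0}, c, SPECIES))
```

In this toy system the inflow into A has to match the rate at which A is consumed, and the outflow from B has to match the rate at which B is produced, which is the "conspiracy" the authors describe.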

What we want to do next is build a functor that, given an open dynamical system, records all possible combinations of input concentrations, output concentrations, inflows and outflows that hold in steady state. This process will be called black-boxing, since it discards information that can’t be seen at the inputs and outputs.

The black-boxing functor \blacksquare:\mathbf{Dynam}\to \mathbf{Rel} takes a finite set X to the vector space \mathbb{R}^X\oplus\mathbb{R}^X, and a morphism, that is, an open dynamical system f=(X\xrightarrow{i} S \xleftarrow{o} Y, v), to the subset

\blacksquare(f)\subseteq\mathbb{R}^X\oplus\mathbb{R}^X\oplus\mathbb{R}^Y\oplus\mathbb{R}^Y

\blacksquare(f)=\{(i^{\ast}(c),I,o^{\ast}(c),O): c \text{ is a steady state with inflows } I \text{ and outflows } O\}

where i^{\ast}(c) is the vector in \mathbb{R}^X defined by i^{\ast}(c)(x)=c(i(x)); that is, the concentration of the input species.
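As a small illustration of what black-boxing records, the following Python sketch computes the pullback i^{\ast}(c) and assembles one tuple (i^{\ast}(c), I, o^{\ast}(c), O). The species names, the legs, and the vector c are hypothetical, and c is simply assumed to be a steady state for the given inflow and outflow; the sketch does not verify that.

```python
def pullback(leg, c):
    """Compute i^*(c)(x) = c(i(x)): the concentration at each marked species."""
    return {x: c[s] for x, s in leg.items()}

def black_box_point(i, o, I, O, c):
    """Assemble the tuple (i^*(c), I, o^*(c), O) for a steady state c."""
    return (pullback(i, c), I, pullback(o, c), O)

# Hypothetical data: input x marks species A, output y marks species B.
i = {"x": "A"}
o = {"y": "B"}
c = {"A": 2.0, "B": 5.0}  # assumed to be a steady state for I and O below

point = black_box_point(i, o, {"x": 2.0}, {"y": 2.0}, c)
print(point)
```

Only the concentrations at the marked species survive in the tuple; anything about inner species is discarded, which is the sense in which the box is black.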

The authors prove that black-boxing is indeed a functor, which implies that if we want to study the steady states of a complex open dynamical system, we can break it up into smaller, simpler pieces and study their steady states. In other words, studying the steady states of a big system, which is given by the composition of smaller systems (as morphisms in the category \mathbf{Dynam}), amounts to studying the steady states of each of the smaller systems, and composing them (as morphisms in \mathbf{Rel}).

Spivak’s approach

The category \mathcal{W} of wiring diagrams

Instead of dealing with dynamical systems from the start, Spivak takes a step back and develops a syntax for boxes, which are things that admit inputs and outputs.

Let’s define the category \mathcal{W}_\mathcal{C} of \mathcal{C}-boxes and wiring diagrams, for a category \mathcal{C} with finite products. Its objects are pairs

X=(X^\text{in},X^\text{out})

where each of these coordinates is a finite product of objects of \mathcal{C}. For example, we interpret the pair (A_1\times A_2, B_1\times B_2\times B_3) as a box with input ports (a_1,a_2)\in A_1\times A_2 and output ports (b_1,b_2,b_3)\in B_1\times B_2\times B_3.

Its morphisms are wiring diagrams \varphi:X\to Y, that is, pairs of maps (\varphi^\text{in},\varphi^\text{out}) which we interpret as a rewiring of the box X inside of the box Y. The function \varphi^\text{in} indicates whether an input port of X should be attached to an input of Y or to an output of X itself; the function \varphi^\text{out} indicates how the outputs of X feed the outputs of Y. Examples of wirings are

Composition is given by a nesting of wirings.

Given boxes X and Y, we define their parallel composition by

X\boxtimes Y=(X^\text{in}\times Y^\text{in},X^\text{out}\times Y^\text{out})

This gives a monoidal structure to the category \mathcal{W}_\mathcal{C}. Parallel composition is true to its name, as illustrated by

The huge advantage of this approach is that one can now fill the boxes with suitable “inhabitants”, and model many different situations that look like wirings at their core. These inhabitants will be given through functors \mathcal{W}_\mathcal{C}\to\mathbf{Set}, taking a box to the set of its desired interpretations, and giving a meaning to the wiring of boxes.

The functor ODS:\mathcal{W}_{\mathbf{Euc}}\to\mathbf{Set} of open dynamical systems

The first of our inhabitants will be, as you probably guessed by now, open dynamical systems. Here \mathcal{C}=\mathbf{Euc} is the category of Euclidean spaces \mathbb{R}^n and smooth maps.

From the perspective of Spivak’s paper, an (\mathbb{R}^X,\mathbb{R}^Y)-open dynamical system is a 3-tuple (\mathbb{R}^S,f^\text{dyn},f^\text{rdt}) where

  • \mathbb{R}^S is the state space

  • f^\text{dyn}:\mathbb{R}^X\times\mathbb{R}^S\to\mathbb{R}^S is a vector field parametrized by the inputs \mathbb{R}^X, giving the differential equation of the system

  • f^\text{rdt}:\mathbb{R}^S\to\mathbb{R}^Y is the readout function at the outputs \mathbb{R}^Y.

One should notice the similarity with our previously defined dynamical systems, although it’s clear that the two definitions are not equivalent.

The functor ODS:\mathcal{W}_{\mathbf{Euc}}\to\mathbf{Set}, exhibiting dynamical systems as inhabitants of input-output boxes, takes a box X=(X^\text{in},X^\text{out}) to the set of all (\mathbb{R}^{X^\text{in}},\mathbb{R}^{X^\text{out}})-dynamical systems

ODS(X)=\{(\mathbb{R}^S,f^\text{dyn}:\mathbb{R}^{X^\text{in}}\times\mathbb{R}^S\to\mathbb{R}^S,f^\text{rdt}:\mathbb{R}^S\to\mathbb{R}^{X^\text{out}})\}

You can surely figure out how ODS acts on wirings by drawing a picture and doing a bit of careful bookkeeping.

Note that there’s a natural notion of parallel composition of two dynamical systems, which amounts to carrying out the processes indicated by the two dynamical systems in parallel. Spivak shows that ODS is a functor, and, furthermore, that

ODS(X\boxtimes Y)\simeq ODS(X)\boxtimes ODS(Y)

The functor Mat:\mathcal{W}_{\mathcal{C}}\to\mathbf{Set} of \mathbf{Set}-matrices

Our second inhabitants will be given by matrices of sets. For objects X,Y, an (X,Y)-matrix of sets is a function M that assigns to each pair (x,y) a set M_{x,y}. In other words, it is a matrix indexed by X\times Y that, instead of coefficients, has sets in each position.

The functor Mat:\mathcal{W}_{\mathcal{C}}\to\mathbf{Set}, exhibiting \mathbf{Set}-matrices as inhabitants of input-output boxes, takes a box X=(X^\text{in},X^\text{out}) to the set of all (X^\text{in},X^\text{out})-matrices of sets

Mat(X)=\{\{M_{i,j}\}_{X^\text{in}\times X^\text{out}} : M_{i,j} \text{ is a set}\}

Once again, it’s not too hard to figure out how Mat should act on wirings.

Like before, there’s a notion of parallel composition of two matrices of sets, and the author shows that Mat is a functor such that

Mat(X\boxtimes Y)\simeq Mat(X)\boxtimes Mat(Y)

The steady-state natural transformation Stst:ODS\to Mat

Finally, we explain how to use all this to study steady states of dynamical systems.

Given an (\mathbb{R}^X,\mathbb{R}^Y)-dynamical system f=(\mathbb{R}^S,f^\text{dyn},f^\text{rdt}) and an element (I,O)\in\mathbb{R}^X\times\mathbb{R}^Y, an (I,O)-steady state is a state c\in\mathbb{R}^S such that

f^\text{dyn}(I,c)=0 \text{ and } f^\text{rdt}(c)=O

Since dynamical systems are encoded by the functor ODS, it makes sense to study steady states through a natural transformation out of ODS. We define Stst:ODS\to Mat as the transformation that assigns to each box X the function

Stst_X:ODS(X)\longrightarrow Mat(X)

taking a dynamical system (\mathbb{R}^S,f^\text{dyn},f^\text{rdt}) to its matrix of steady states

M_{I,O}=\{c\in\mathbb{R}^S : f^\text{dyn}(I,c)=0, f^\text{rdt}(c)=O\}

where (I,O)\in \mathbb{R}^{X^\text{in}}\times \mathbb{R}^{X^\text{out}}. The author proceeds to show that Stst is a monoidal natural transformation.
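Membership in an entry M_{I,O} of the steady-state matrix can be tested directly from the two defining equations. The Python sketch below does this for a hypothetical one-dimensional system with f^\text{dyn}(I,c)=I-c and f^\text{rdt}(c)=c; both the system and the function name is_io_steady_state are assumptions made for illustration.

```python
def is_io_steady_state(f_dyn, f_rdt, I, O, c, tol=1e-9):
    """Return True when c lies in M_{I,O}: f_dyn(I, c) = 0 and f_rdt(c) = O."""
    dyn_zero = all(abs(d) <= tol for d in f_dyn(I, c))
    readout_ok = all(abs(r - out) <= tol for r, out in zip(f_rdt(c), O))
    return dyn_zero and readout_ok

# Hypothetical system on R^1: the state relaxes towards the input,
# and the readout simply reports the state.
def f_dyn(I, c):
    return (I[0] - c[0],)

def f_rdt(c):
    return (c[0],)

print(is_io_steady_state(f_dyn, f_rdt, (3.0,), (3.0,), (3.0,)))
print(is_io_steady_state(f_dyn, f_rdt, (3.0,), (1.0,), (3.0,)))
```

For this toy system M_{I,O} is the one-point set {I} when O = I and is empty otherwise, so the matrix of sets is "diagonal".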

Is it possible to use this machinery to draw the same conclusion as before, that is, that the steady states of a composition of systems come from the composition of the steady states of the parts?

Indeed, it is! Given two boxes X_1 and X_2, we recover the usual notion of (serial) composition by first setting them in parallel X_1 \boxtimes X_2,

and wiring this by \varphi:X_1 \boxtimes X_2\to Y as follows:

The fact that Stst is a monoidal natural transformation, combined with the facts that the functors ODS and Mat respect parallel composition, allows us to write the following diagram, where both squares are commutative

Then, chasing the diagram along the top and left sides gives the steady states of the serial composition of the dynamical systems X_1 and X_2, while chasing it along the right and bottom sides gives the composition of the steady states of X_1 and of X_2, and the two must agree.

The two approaches, side by side

So how are these two perspectives related? Looking at the definitions we can immediately see that Spivak’s approach has a broader scope than Baez and Pollard’s, so it’s apparent that his results won’t be implied by theirs.

For the converse direction, recall that in the first paper, a dynamical system is given by a decorated cospan f=(X\xrightarrow{i} S \xleftarrow{o} Y, v), and a steady state with inflows I and outflows O is a constant vector of concentrations c\in\mathbb{R}^S such that

v(c)+i_{\ast}(I)-o_{\ast}(O)=0

Thus, studying the steady states for this cospan system corresponds to studying the box system

f=(\mathbb{R}^S, f^\text{dyn}:\mathbb{R}^X\times\mathbb{R}^S\to\mathbb{R}^S, f^\text{rdt}:\mathbb{R}^S\to\mathbb{R}^Y)

with dynamics given by f^\text{dyn}(I,c)=v(c)+i_{\ast}(I)-o_{\ast}(f^\text{rdt}(c)), since its (I,O)-steady states are vectors c\in\mathbb{R}^S such that

f^\text{dyn}(I,c)=0 \text{ and } f^\text{rdt}(c)=O

Thus, the study of the steady states of a given cospan dynamical system can be done just as well by looking at it as a box dynamical system and running it through Spivak’s machinery. However, setting two such box systems in serial composition will not yield the box system representing the composition of the cospan systems as one would (naively?) hope, so it doesn’t seem that Spivak’s compositional results will imply those of Baez and Pollard.

This is a bit disconcerting, but instead of it being discouraging, I believe it should be seen as an invitation to delve into the semantics of open dynamical systems and find the right perspective, which manages to subsume both of the approaches presented here.

n-Category Café Linguistics Using Category Theory

guest post by Cory Griffith and Jade Master

Most recently, the Applied Category Theory Seminar took a step into linguistics by discussing the 2010 paper Mathematical Foundations for a Compositional Distributional Model of Meaning, by Bob Coecke, Mehrnoosh Sadrzadeh, and Stephen Clark.

Here is a summary and discussion of that paper.

In recent years, well-known advances in AI, such as the development of AlphaGo and the ongoing development of self-driving cars, have sparked interest in the general idea of machines examining and trying to understand complex data. In particular, a variety of accounts of successes in natural language processing (NLP) have reached wide audiences (see, for example, The Great AI Awakening).

One key tool for NLP practitioners is the concept of distributional semantics. There is a saying due to Firth that is so often repeated in NLP papers and presentations that even mentioning its ubiquity has become a cliche:

“You shall know a word by the company it keeps.”

The idea is that if we want to know if two words have similar meanings, we should examine the words they are used in conjunction with, and in some way measure how much overlap there is. While direct ancestry of this concept can be traced at least back to Wittgenstein, and the idea of characterizing an object by its relationship with other objects is one category theorists are already fond of, distributional semantics is distinguished by its essentially statistical methods. The variations are endless and complex, but in the cases relevant to our discussion, one starts with a corpus and a suitable way of determining what the context of a word is (simply being nearby, having a grammatical relationship, being in the same corpus at all, etc.), and ends up with a vector space in which the words in the corpus each specify a point. The distances between vectors (for an appropriate definition of distance) then correspond to relationships in meaning, often in surprising ways. The creators of the GloVe algorithm give the example of a vector space in which king - man + woman = queen.

There is also a “top down,” relatively syntax oriented analysis of meaning called categorial grammar. Categorial grammar has no accepted formal definition, but the underlying philosophy, called the principle of compositionality, is this: a meaningful sentence is composed of parts, each of which itself has a meaning. To determine the meaning of the sentence as a whole, we may combine the meanings of the constituent parts according to rules which are specified by the syntax of the sentence. Mathematically, this amounts to constructing some algebraic structure which represents grammatical rules. When this algebraic structure is a category, we call it a grammar category.

The Paper


Pregroups are the algebraic structure that this paper uses to model grammar. A pregroup P is a type of partially ordered monoid. Writing x \to y to specify that x \leq y in the order relation, we require the following additional property: for each p \in P, there exists a left adjoint p^l and a right adjoint p^r, such that p^l p \to 1 \to p p^l and p p^r \to 1 \to p^r p. Since pregroups are partial orders, we can regard them as categories. The monoid multiplication and adjoints then upgrade the category of a pregroup to a compact closed category. The equations referenced above are exactly the snake equations.

We can define a pregroup generated by a set X by freely adding adjoints, units and counits to the free monoid on X. Our grammar categories will be constructed as follows: take certain symbols, such as n for noun and s for sentence, to be primitive. We call these “word classes.” Generate a pregroup from them. The morphisms in the resulting category represent “grammatical reductions” of strings of word classes, with a particular string being deemed “grammatical” if it reduces to the word class s. For example, construct the pregroup Preg(\{n,s\}) generated by n and s. A transitive verb can be thought of as accepting two nouns, one on the left and one on the right, and returning a sentence. Using the powerful graphical language for compact closed categories, we can represent this as

Using the adjunctions, we can turn the two inputs into outputs to get

Therefore the type of a verb is n^r s n^l. Multiplying this on the left and right by n allows us to apply the counits of n to reduce n \cdot (n^r s n^l) \cdot n to the type s, as witnessed by the reduction (n n^r) s (n^l n) \to 1 \cdot s \cdot 1 = s.
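The reduction of n \cdot (n^r s n^l) \cdot n to s can also be checked mechanically. In the Python sketch below, a simple type is encoded as a pair (base, k), with k = -1 for a left adjoint, 0 for the base type and 1 for a right adjoint; this encoding and the function name contract are assumptions made for illustration. The function cancels adjacent pairs (a, k)(a, k+1) greedily from left to right, which is enough for this example but is not a complete pregroup parser.

```python
def contract(types):
    """Greedily apply the contractions (a, k)(a, k+1) -> 1, left to right."""
    stack = []
    for base, k in types:
        if stack and stack[-1] == (base, k - 1):
            stack.pop()  # x x^r -> 1, or equivalently x^l x -> 1
        else:
            stack.append((base, k))
    return stack

N = ("n", 0)
TRANSITIVE_VERB = [("n", 1), ("s", 0), ("n", -1)]  # n^r s n^l

sentence = [N] + TRANSITIVE_VERB + [N]
print(contract(sentence))
```

A string counts as grammatical here when the result is the single type ("s", 0); two nouns in a row, for instance, do not reduce at all.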

Let (\mathbf{FVect},\otimes, \mathbb{R}) be the symmetric monoidal category of finite dimensional vector spaces and linear transformations with the standard tensor product. Since any vector space we use in our applications will always come equipped with a basis, these vector spaces are all endowed with an inner product. Note that \mathbf{FVect} has a compact closed structure. The unit is the diagonal

\begin{array}{cccc} \eta^l = \eta^r \colon & \mathbb{R} & \to &V \otimes V \\ &1 &\mapsto & \sum_i \overrightarrow{e_i} \otimes \overrightarrow{e_i} \end{array}

and the counit is a linear extension of the inner product

\begin{array}{cccc} \epsilon^l = \epsilon^r \colon &V \otimes V &\to& \mathbb{R} \\ & \sum_{i,j} c_{i j} \vec{v_{i}} \otimes \vec{w_j} &\mapsto& \sum_{i,j} c_{i j} \langle \vec{v_i}, \vec{w_j} \rangle. \end{array}
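As a sanity check on the two maps \eta and \epsilon, the following Python sketch verifies one snake equation numerically: applying \epsilon to the first two legs of v \otimes \eta(1) should return v. It assumes the standard basis and represents tensors as nested lists of coefficients; the function names eta, epsilon and snake are illustrative.

```python
def eta(dim):
    """Coefficients of eta(1) = sum_i e_i (x) e_i as a dim x dim array."""
    return [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]

def epsilon(v, w):
    """epsilon on a pure tensor v (x) w: the inner product <v, w>."""
    return sum(a * b for a, b in zip(v, w))

def snake(v):
    """Apply (epsilon (x) id) to v (x) eta(1)."""
    dim = len(v)
    coeffs = eta(dim)
    basis = [[1.0 if k == i else 0.0 for k in range(dim)] for i in range(dim)]
    result = [0.0] * dim
    for i in range(dim):
        for j in range(dim):
            # contribution of the term coeffs[i][j] * <v, e_i> * e_j
            scale = coeffs[i][j] * epsilon(v, basis[i])
            for k in range(dim):
                result[k] += scale * basis[j][k]
    return result

print(snake([2.0, -1.0, 0.5]))
```

The output equals the input vector, which is the linear-algebra content of straightening a bent wire in the graphical language.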

The Model of Meaning

Let (P, \cdot) be a pregroup. The ingenious idea that the authors of this paper had was to combine categorial grammar with distributional semantics. We can rephrase their construction in more general terms by using a compact closed functor

F \colon (P, \cdot) \to (\mathbf{FVect}, \otimes, \mathbb{R}).

Unpacking this a bit, we assign each word class a vector space whose basis is a chosen finite set of context words. To each type reduction in P, we assign a linear transformation. Because F is strictly monoidal, a string of word classes p_1 p_2 \cdots p_n maps to a tensor product of vector spaces V_1 \otimes V_2 \otimes \cdots \otimes V_n.

To compute the meaning of a string of words you must:

  1. Assign to each word a string of symbols p_1 p_2 \cdots p_n according to the grammatical types of the word and your choice of pregroup formalism. This is nontrivial. For example, many nouns can also be used as adjectives.

  2. Compute the correlations between each word in your string and the context words of the chosen vector space (see the example below) to get a vector v_1 \otimes \cdots \otimes v_n \in V_1 \otimes \cdots \otimes V_n.

  3. Choose a type reduction f \colon p_1 p_2 \cdots p_n \to q_1 q_2 \cdots q_n in your grammar category (there may not always be a unique type reduction).

  4. Apply F(f) to your vector v_1 \otimes \cdots \otimes v_n.

  5. You now have a vector in whatever space you reduced to. This is the “meaning” of the string of words, according to your model.

This sweeps some things under the rug, because A. Preller proved that strict monoidal functors from a pregroup to \mathbf{FVect} actually force the relevant spaces to have dimension at most one. So for each word type, the best we can do is one context word. This is bad news, but the good news is that this problem disappears when more complicated grammar categories are used. In Lambek vs. Lambek, monoidal bi-closed categories are used, which allow for this functorial description. So even though we are not really dealing with a functor when the domain is a pregroup, it is a functor in spirit, and thinking of it this way will allow for generalization to more complicated models.

An Example

As before, we use the pregroup Preg(\{n,s\}). The nouns that we are interested in are

\{ Maria, John, Cynthia \}

These nouns form the basis vectors of our noun space. In the order they are listed, they can be represented as

\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.

The “sentence space” F(s) is taken to be a one dimensional space in which 0 corresponds to false and the basis vector 1_S corresponds to true. As before, transitive verbs have type n^r s n^l, so using our functor F, verbs will live in the vector space N \otimes S \otimes N. In particular, the verb “like” can be expressed uniquely as a linear combination of its basis elements. With knowledge of who likes who, we can encode this information into a matrix where the ij-th entry corresponds to the coefficient in front of v_i \otimes 1_S \otimes v_j. Specifically, we have

\begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}.

The ij-th entry is 1 if person i likes person j and 0 otherwise. To compute the meaning of the sentence “Maria likes Cynthia”, you compute the matrix product

\begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0\\ 0\\ 1 \end{bmatrix} =1

This means that the sentence “Maria likes Cynthia” is true.
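The computation above is small enough to reproduce in a few lines of Python. The sketch uses the noun basis and the “likes” matrix exactly as given; the helper names one_hot and meaning are illustrative, and the sentence space is identified with the real numbers (1 for true, 0 for false).

```python
NOUNS = ["Maria", "John", "Cynthia"]

# Entry [i][j] is 1 if person i likes person j, as in the matrix above.
LIKES = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
]

def one_hot(name):
    """Basis vector of the noun space for the given noun."""
    return [1 if noun == name else 0 for noun in NOUNS]

def meaning(subject, verb, obj):
    """Compute subject^T * verb * object, an element of the sentence space."""
    s, o = one_hot(subject), one_hot(obj)
    size = len(NOUNS)
    return sum(s[i] * verb[i][j] * o[j] for i in range(size) for j in range(size))

print(meaning("Maria", LIKES, "Cynthia"))
print(meaning("Maria", LIKES, "John"))
```

With this matrix the first sentence evaluates to 1 (true) and “Maria likes John” evaluates to 0 (false), since the entry in row 1, column 2 is 0.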

Food for Thought

As we said above, this model does not always give a unique meaning to a string of words, because at various points there are choices that need to be made. For example, the phrase “squad helps dog bite victim” has a different meaning depending on whether you take “bite” to be a verb or a noun. Also, if you reduce “dog bite victim” before applying it to the verb, you will get a different meaning than if you reduce “squad helps dog” and apply it to the verb “bite”. On the one hand, this is a good thing because those sentences should have different meanings. On the other hand, the presence of choices makes it harder to use this model in a practical algorithm.

Some questions arose which we did not have a clear way to address. Tensor products of spaces of high dimension quickly achieve staggering dimensionality; can this be addressed? How would one actually fit empirical data into this model? The “likes” example, which required us to know exactly who likes who, illustrates the potentially inaccessible information that seems to be necessary to assign vectors to words in a way compatible with the formalism. Admittedly, this is a necessary consequence of the fact that the evaluation is of the truth or falsity of the statement, but the issue also arises in general cases. Can this be resolved? In the paper, the authors are concerned with determining the meaning of grammatical sentences (although we can just as easily use non-grammatical strings of words), so that the computed meaning is always a vector in the sentence space F(s). What are the useful choices of structure for the sentence space?

This paper was not without precedent: suggestions and models related to its concepts had been floating around beforehand, and could be helpful in understanding the development of the central ideas. For example, Aerts and Gabora proposed elaborating on vector space models of meaning, incidentally using tensors as part of an elaborate quantum mechanical framework. Notably, they claimed their formalism solved the “pet fish” problem: English speakers rate goldfish as very poor representatives of fish as a whole, and of pets as a whole, but consider goldfish to be excellent representatives of “pet fish.” Existing descriptions of meaning in compositional terms struggled with this. In The Harmonic Mind, first published in 2005, Smolensky and Legendre argued for the use of tensor products in marrying linear algebra and formal grammar models of meaning. Mathematical Foundations for a Compositional Distributional Model of Meaning represents a crystallization of all this into a novel and exciting construction, which continues to be widely cited and discussed.

We would like to thank Martha Lewis, Brendan Fong, Nina Otter, and the other participants in the seminar.

Tommaso DorigoThe Magical Caves Of Frasassi

While spending a few vacation days on a trip around central Italy I made a stop in a place in the Appennini mountains, to visit some incredible caves. The caves of Frasassi were discovered in September 1971 by a few young speleologists, who had been tipped off by locals about the existence, atop a mountain near their village, of a hole in the ground, which emitted a strong draft wind - the unmistakable sign of underground hollows.


ResonaancesSingularity is now

Artificial intelligence (AI) is entering into our lives. It's been 20 years now since the watershed moment of Deep Blue versus Garry Kasparov. Today, people study the games of AlphaGo against itself to get a glimpse of what a superior intelligence would be like. But at the same time AI is getting better at copying human behavior. Many Apple users have got emotionally attached to Siri. Computers have not only learnt to drive cars, but also not to slow down when a pedestrian is crossing the road. The progress is very well visible to the blogging community. Bots commenting under my posts have evolved well past the !!!buy!!!viagra!!!cialis!!!hot!!!naked!!! sort of thing. Now they refer to the topic of the post, drop an informed comment, an interesting remark, or a relevant question, before pasting a link to a revenge porn website. Sometimes it's really a pity to delete those comments, as they can be more to-the-point than those written by human readers.

AI is also entering the field of science at an accelerated pace, and particle physics is as usual in the avant-garde. It's not a secret that physics analyses for the LHC papers (even if finally signed by 1000s of humans) are in reality performed by neural networks, which are just beefed up versions of Alexa developed at CERN. The hottest topic in experimental high-energy physics is now machine learning, where computers teach humans the optimal way of clustering jets, or telling quarks from gluons. The question is when, not if, AI will become sophisticated enough to perform the creative work of theoreticians.

It seems that the answer is now.

Some of you might have noticed a certain Alan Irvine, affiliated with the Los Alamos National Laboratory, regularly posting on arXiv single-author theoretical papers on fashionable topics such as the ATLAS diphoton excess, LHCb B-meson anomalies, DAMPE spectral feature, etc. Many of us have received emails from this author requesting citations. Recently I got one myself; it seemed overly polite, but otherwise it didn't differ in relevance or substance from other similar requests. During the last two and a half years, A. Irvine has accumulated a decent h-factor of 18. His papers have been submitted to prestigious journals in the field, such as PRL, JHEP, or PRD, and some of them were even accepted after revisions. The scandal broke out a week ago when a JHEP editor noticed that the extensive revision, together with a long cover letter, was submitted within 10 seconds of receiving the referee's comments. Upon investigation, it turned out that A. Irvine never worked in Los Alamos, nobody in the field has ever met him in person, and the IP from which the paper was submitted was that of the well-known Ragnarok Thor server. A closer analysis of his past papers showed that, although linguistically and logically correct, they were merely a compilation of equations and text from the previous literature without any original addition.

Incidentally, arXiv administrators have been aware that, for a few years, all source files in daily hep-ph listings were downloaded for an unknown purpose by automated bots. When you have excluded the impossible, whatever remains, however improbable, must be the truth. There is no doubt that A. Irvine is an AI bot that was trained on the real hep-ph input to produce genuine-looking particle theory papers.

The works of A. Irvine have been quietly removed from arXiv and journals, but difficult questions remain. What was the purpose of it? Was it a spoof? A parody? A social experiment? A Facebook research project? A Russian provocation?  And how could it pass unnoticed for so long within  the theoretical particle community?  What's most troubling is that, if there was one, there can easily be more. Which other papers on arXiv are written by AI? How can we recognize them?  Should we even try, or maybe the dam is already broken and we have to accept the inevitable?  Is Résonaances written by a real person? How can you be sure that you are real?

Update: obviously, this post is an April Fools' prank. It is absolutely unthinkable that the creative process of writing modern particle theory papers can ever be automatized. Also, the neural network referred to in the LHC papers is nothing like Alexa; it's simply a codename for PhD students.  Finally, I assure you that Résonaances is written by a hum 00105e0 e6b0 343b 9c74 0804 e7bc 0804 e7d5 0804 [core dump]

March 30, 2018

Matt von HippelWhy Physicists Leave Physics

It’s an open secret that many physicists end up leaving physics. How many depends on how you count things, but for a representative number, this report has 31% of US physics PhDs in the private sector after one year. I’d expect that number to grow with time post-PhD. While some of these people might still be doing physics, in certain sub-fields that isn’t really an option: it’s not like there are companies that do R&D in particle physics, astrophysics, or string theory. Instead, these physicists get hired in data science, or quantitative finance, or machine learning. Others stay in academia, but stop doing physics: either transitioning to another field, or taking teaching-focused jobs that don’t leave time for research.

There’s a standard economic narrative for why this happens. The number of students grad schools accept and graduate is much higher than the number of professor jobs. There simply isn’t room for everyone, so many people end up doing something else instead.

That narrative is probably true, if you zoom out far enough. On the ground, though, the reasons people leave academia don’t feel quite this “economic”. While they might be indirectly based on a shortage of jobs, the direct reasons matter. Physicists leave physics for a wide variety of reasons, and many of them are things the field could improve on. Others are factors that will likely be present regardless of how many students graduate, or how many jobs there are. I worry that an attempt to address physics attrition on a purely economic level would miss these kinds of details.

I thought I’d talk in this post about a few reasons why physicists leave physics. Most of this won’t be new information to anyone, but I hope some of it is at least a new perspective.

First, to get it out of the way: almost no-one starts a physics PhD with the intention of going into industry. I’ve met a grand total of one person who did, and he’s rather unusual. Almost always, leaving physics represents someone’s dreams not working out.

Sometimes, that just means realizing you aren’t suited for physics. These are people who feel like they aren’t able to keep up with the material, or people who find they aren’t as interested in it as they expected. In my experience, people realize this sort of thing pretty early. They leave in the middle of grad school, or they leave once they have their PhD. In some sense, this is the healthy sort of attrition: without the ability to perfectly predict our interests and abilities, there will always be people who start a career and then decide it’s not for them.

I want to distinguish this from a broader reason to leave, disillusionment. These are people who can do physics, and want to do physics, but encounter a system that seems bent on making them do anything but. Sometimes this means disillusionment with the field itself: phenomenologists sick of tweaking models to lie just beyond the latest experimental bounds, or theorists who had hoped to address the real world but begin to see that they can’t. This kind of motivation lay behind several great atomic physicists going into biology after the second world war, to work on “life rather than death”. Sometimes instead it’s disillusionment with academia: people who have been bludgeoned by academic politics or bureaucracy, who despair of getting the academic system to care about real research or teaching instead of its current screwed-up priorities or who just don’t want to face that kind of abuse again.

When those people leave, it’s at every stage in their career. I’ve seen grad students disillusioned into leaving without a PhD, and successful tenured professors who feel like the field no longer has anything to offer them. While occasionally these people just have a difference of opinion, a lot of the time they’re pointing out real problems with the system, problems that actually should be fixed.

Sometimes, life intervenes. The classic example is the two-body problem, where you and your spouse have trouble finding jobs in the same place. There aren’t all that many places in the world that hire theoretical physicists, and still fewer with jobs open. One or both partners end up needing to compromise, and that can mean switching to a career with a bit more choice in location. People also move to take care of their parents, or because of other connections.

This seems closer to the economic picture, but I don’t think it quite lines up. Even if there were a lot fewer physicists applying for the same number of jobs, it’s still not certain that there’s a job where you want to live, specifically. You’d still end up with plenty of people leaving the field.

A commenter here frequently asks why physicists have to travel so much. Especially for a theorist, why can’t we just work remotely? With current technology, shouldn’t that be pretty easy to do?

I’ve done a lot of remote collaboration, and it’s not impossible. But there really isn’t a substitute for working in the same place, for being able to meet someone in the hall and strike up a conversation around a blackboard. Remote collaborations are an ok way to keep a project going, but a rough way to start one. Institutes realize this, which is part of why most of the time they’ll only pay you a salary if they think you’re actually going to show up.

Could I imagine this changing? Maybe. The technology doesn’t exist right now, but maybe someday someone will design a social network with the right features, one where you can strike up and work on collaborations as naturally as you can in person. Then again, maybe I’m silly for imagining a technological solution to the problem in the first place.

What about more direct economic reasons? What about when people leave because of the academic job market itself?

This certainly happens. In my experience though, a lot of the time it’s pre-emptive. You’d think that people would apply for academic jobs, get rejected, and quit the field. More often, I’ve seen people notice the competition for jobs and decide at the outset that it’s not worth it for them. Sometimes this happens right out of grad school. Other times it’s later. In the latter case, these are often people who are “keeping up”, in that their career is moving roughly as fast as everyone else’s. They aren’t falling behind. Rather, it’s the stress of keeping ahead of the field, marketing themselves, applying for every grant in sight, and worrying that it could all come crashing down at any moment that ends up being too much to deal with.

What about the people who do get rejected over and over again?

Physics, like life in Jurassic Park, finds a way. Surprisingly often, these people manage to stick around. Without faculty positions they scrabble up postdoc after postdoc, short-term position after short-term position. They fund their way piece by piece, grant by grant. Often they get depressed, and cynical, and pissed off, and insist that this time they’re just going to quit the field altogether. But from what I’ve seen, once someone is that far in, they often don’t go through with it.

If fewer people went to physics grad school, or more professors were hired, would fewer people leave physics? Yes, absolutely. But there’s enough going on here, enough different causes and different motivations, that I suspect things wouldn’t work out quite as predicted. Some attrition is here to stay, some is independent of the economics. And some, perhaps, is due to problems we ought to actually solve.

March 29, 2018

n-Category Café On the Magnitude Function of Domains in Euclidean Space, II

joint post with Heiko Gimperlein and Magnus Goffeng.

In the previous post, On the Magnitude Function of Domains in Euclidean Space, I, Heiko and Magnus explained the main theorem in their paper

(Remember that here a domain X in R^n means a subset equal to the closure of its interior.)

The main theorem involves the asymptotic behaviour of the magnitude function \mathcal{M}_X(R) as R \to \infty and also the continuation of the magnitude function to a meromorphic function on the complex numbers.

In this post we have tried to tease out some of the analytical ideas that Heiko and Magnus use in the proof of their main theorem.

Heiko and Magnus build on the work of Mark Meckes,