Planet Musings

December 15, 2017

Terence Tao: Embedding the Boussinesq equations in the incompressible Euler equations on a manifold

The Boussinesq equations for inviscid, incompressible two-dimensional fluid flow in the presence of gravity are given by

\displaystyle  (\partial_t + u_x \partial_x+ u_y \partial_y) u_x = -\partial_x p \ \ \ \ \ (1)

\displaystyle  (\partial_t + u_x \partial_x+ u_y \partial_y) u_y = \rho - \partial_y p \ \ \ \ \ (2)

\displaystyle  (\partial_t + u_x \partial_x+ u_y \partial_y) \rho = 0 \ \ \ \ \ (3)

\displaystyle  \partial_x u_x + \partial_y u_y = 0 \ \ \ \ \ (4)

where {u: {\bf R} \times {\bf R}^2 \rightarrow {\bf R}^2} is the velocity field, {p: {\bf R} \times {\bf R}^2 \rightarrow {\bf R}} is the pressure field, and {\rho: {\bf R} \times {\bf R}^2 \rightarrow {\bf R}} is the density field (or, in some physical interpretations, the temperature field). In this post we shall restrict ourselves to formal manipulations, assuming implicitly that all fields are regular enough (or sufficiently decaying at spatial infinity) that the manipulations are justified. Using the material derivative {D_t := \partial_t + u_x \partial_x + u_y \partial_y}, one can abbreviate these equations as

\displaystyle  D_t u_x = -\partial_x p

\displaystyle  D_t u_y = \rho - \partial_y p

\displaystyle  D_t \rho = 0

\displaystyle  \partial_x u_x + \partial_y u_y = 0.

One can eliminate the role of the pressure {p} by working with the vorticity {\omega := \partial_x u_y - \partial_y u_x}. A standard calculation then leads us to the equivalent “vorticity-stream” formulation

\displaystyle  D_t \omega = \partial_x \rho

\displaystyle  D_t \rho = 0

\displaystyle  \omega = \partial_x u_y - \partial_y u_x

\displaystyle  \partial_x u_x + \partial_y u_y = 0

of the Boussinesq equations. The latter two equations can be used to recover the velocity field {u} from the vorticity {\omega} by the Biot-Savart law

\displaystyle  u_x := -\partial_y \Delta^{-1} \omega; \quad u_y = \partial_x \Delta^{-1} \omega.
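(As a concrete illustration of this formulation, here is a minimal pseudo-spectral sketch in Python/NumPy, added for this write-up rather than taken from the post: it recovers {u} from {\omega} via the Biot-Savart law on a doubly periodic grid, which is an assumption the post does not make, and advances {\omega} and {\rho} by one explicit Euler step of the vorticity-stream Boussinesq system. Grid size, time step and initial data are arbitrary placeholders.)

```python
import numpy as np

# Doubly periodic [0, 2*pi)^2 grid -- an illustrative assumption; the post works on R^2.
n = 128
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
k = np.fft.fftfreq(n, d=1.0 / n)            # integer wavenumbers
KX, KY = np.meshgrid(k, k, indexing="ij")
K2 = KX**2 + KY**2
K2[0, 0] = 1.0                              # avoid dividing by zero at the mean mode

def ddx(f):
    return np.real(np.fft.ifft2(1j * KX * np.fft.fft2(f)))

def ddy(f):
    return np.real(np.fft.ifft2(1j * KY * np.fft.fft2(f)))

def biot_savart(omega):
    """Recover (u_x, u_y) from omega via u_x = -d_y psi, u_y = d_x psi, where Delta psi = omega."""
    psi_hat = -np.fft.fft2(omega) / K2
    psi_hat[0, 0] = 0.0
    psi = np.real(np.fft.ifft2(psi_hat))
    return -ddy(psi), ddx(psi)

def euler_step(omega, rho, dt):
    """One explicit time step of  D_t omega = d_x rho,  D_t rho = 0."""
    u_x, u_y = biot_savart(omega)
    omega_new = omega + dt * (ddx(rho) - u_x * ddx(omega) - u_y * ddy(omega))
    rho_new = rho + dt * (-u_x * ddx(rho) - u_y * ddy(rho))
    return omega_new, rho_new

# Smooth, arbitrary initial data, and a single small step.
omega = np.sin(X) * np.cos(Y)
rho = np.cos(X)
omega, rho = euler_step(omega, rho, dt=1e-3)
```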

It has long been observed (see e.g. Section 5.4.1 of Bertozzi-Majda) that the Boussinesq equations are very similar, though not quite identical, to the three-dimensional inviscid incompressible Euler equations under the hypothesis of axial symmetry (with swirl). The Euler equations are

\displaystyle  \partial_t u + (u \cdot \nabla) u = - \nabla p

\displaystyle  \nabla \cdot u = 0

where now the velocity field {u: {\bf R} \times {\bf R}^3 \rightarrow {\bf R}^3} and pressure field {p: {\bf R} \times {\bf R}^3 \rightarrow {\bf R}} are over the three-dimensional domain {{\bf R}^3}. If one expresses {{\bf R}^3} in cylindrical coordinates {(z,r,\theta)} then one can write the velocity vector field {u} in these coordinates as

\displaystyle  u = u^z \frac{d}{dz} + u^r \frac{d}{dr} + u^\theta \frac{d}{d\theta}.

If we make the axial symmetry assumption that these components, as well as {p}, do not depend on the {\theta} variable, thus

\displaystyle  \partial_\theta u^z, \partial_\theta u^r, \partial_\theta u^\theta, \partial_\theta p = 0,

then after some calculation (which we give below the fold) one can eventually reduce the Euler equations to the system

\displaystyle  \tilde D_t \omega = \frac{1}{r^4} \partial_z \rho \ \ \ \ \ (5)

\displaystyle  \tilde D_t \rho = 0 \ \ \ \ \ (6)

\displaystyle  \omega = \frac{1}{r} (\partial_z u^r - \partial_r u^z) \ \ \ \ \ (7)

\displaystyle  \partial_z(ru^z) + \partial_r(ru^r) = 0 \ \ \ \ \ (8)

where {\tilde D_t := \partial_t + u^z \partial_z + u^r \partial_r} is the modified material derivative, and {\rho} is the field {\rho := (r u^\theta)^2}. This is almost identical with the Boussinesq equations except for some additional powers of {r}; thus, the intuition is that the Boussinesq equations are a simplified model for axially symmetric Euler flows when one stays away from the axis {r=0} and also does not wander off to {r=\infty}.

However, this heuristic is not rigorous; the above calculations do not actually give an embedding of the Boussinesq equations into Euler. (The equations do match on the cylinder {r=1}, but this is a measure zero subset of the domain, and so is not enough to give an embedding on any non-trivial region of space.) Recently, while playing around with trying to embed other equations into the Euler equations, I discovered that it is possible to make such an embedding into a four-dimensional Euler equation, albeit on a slightly curved manifold rather than in Euclidean space. More precisely, we use the Ebin-Marsden generalisation

\displaystyle  \partial_t u + \nabla_u u = - \mathrm{grad}_g p

\displaystyle  \mathrm{div}_g u = 0

of the Euler equations to an arbitrary Riemannian manifold {(M,g)} (ignoring any issues of boundary conditions for this discussion), where {u: {\bf R} \rightarrow \Gamma(TM)} is a time-dependent vector field, {p: {\bf R} \rightarrow C^\infty(M)} is a time-dependent scalar field, and {\nabla_u} is the covariant derivative along {u} using the Levi-Civita connection {\nabla}. In Penrose abstract index notation (using the Levi-Civita connection {\nabla}, and raising and lowering indices using the metric {g = g_{ij}}), the equations of motion become

\displaystyle  \partial_t u^i + u^j \nabla_j u^i = - \nabla^i p \ \ \ \ \ (9)

\displaystyle  \nabla_i u^i = 0;

in coordinates, this becomes

\displaystyle  \partial_t u^i + u^j (\partial_j u^i + \Gamma^i_{jk} u^k) = - g^{ij} \partial_j p

\displaystyle  \partial_i u^i + \Gamma^i_{ik} u^k = 0 \ \ \ \ \ (10)

where the Christoffel symbols {\Gamma^i_{jk}} are given by the formula

\displaystyle  \Gamma^i_{jk} := \frac{1}{2} g^{il} (\partial_j g_{lk} + \partial_k g_{lj} - \partial_l g_{jk}),

where {g^{il}} is the inverse to the metric tensor {g_{il}}. If the coordinates are chosen so that the volume form {dg} is the Euclidean volume form {dx}, thus {\mathrm{det}(g)=1}, then on differentiating we have {g^{ij} \partial_k g_{ij} = 0}, and hence {\Gamma^i_{ik} = 0}, and so the divergence-free equation (10) simplifies in this case to {\partial_i u^i = 0}. The Ebin-Marsden Euler equations are the natural generalisation of the Euler equations to arbitrary manifolds; for instance, they (formally) conserve the kinetic energy

\displaystyle  \frac{1}{2} \int_M |u|_g^2\ dg = \frac{1}{2} \int_M g_{ij} u^i u^j\ dg

and can be viewed as the formal geodesic flow equation on the infinite-dimensional manifold of volume-preserving diffeomorphisms on {M} (see this previous post for a discussion of this in the flat space case).

The specific four-dimensional manifold in question is the space {{\bf R} \times {\bf R}^+ \times {\bf R}/{\bf Z} \times {\bf R}/{\bf Z}} with metric

\displaystyle  dx^2 + dy^2 + y^{-1} dz^2 + y dw^2

and solutions to the Boussinesq equation on {{\bf R} \times {\bf R}^+} can be transformed into solutions to the Euler equations on this manifold. This is part of a more general family of embeddings into the Euler equations in which passive scalar fields (such as the field {\rho} appearing in the Boussinesq equations) can be incorporated into the dynamics via fluctuations in the Riemannian metric {g}. I am writing the details below the fold (partly for my own benefit).
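(As a quick symbolic sanity check, not part of the original post, the following SymPy sketch computes the Christoffel symbols of the metric {dx^2 + dy^2 + y^{-1} dz^2 + y dw^2} from the formula below and confirms that {\mathrm{det}(g)=1}, so that {\Gamma^i_{ik}=0} and the Riemannian divergence reduces to the ordinary divergence. The variable names are mine.)

```python
import sympy as sp

x, y, z, w = sp.symbols("x y z w", positive=True)
coords = (x, y, z, w)

# The metric dx^2 + dy^2 + y^{-1} dz^2 + y dw^2 on R x R^+ x R/Z x R/Z.
g = sp.diag(1, 1, 1 / y, y)
ginv = g.inv()
assert sp.simplify(g.det()) == 1                      # Euclidean volume form

def christoffel(i, j, k):
    """Gamma^i_{jk} = (1/2) g^{il} (d_j g_{lk} + d_k g_{lj} - d_l g_{jk})."""
    return sp.simplify(sum(
        sp.Rational(1, 2) * ginv[i, l] * (sp.diff(g[l, k], coords[j])
                                          + sp.diff(g[l, j], coords[k])
                                          - sp.diff(g[j, k], coords[l]))
        for l in range(4)))

# Since det(g) = 1, the contraction Gamma^i_{ik} vanishes for every k,
# so the Riemannian divergence of a vector field equals its ordinary divergence.
for k in range(4):
    assert sp.simplify(sum(christoffel(i, i, k) for i in range(4))) == 0
```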

Firstly, it is convenient to transform the Euler equations on an arbitrary Riemannian manifold to a covelocity formulation, by introducing the covelocity {1}-form {v_i := g_{ij} u^j}, as this formulation allows one to largely avoid having to work with covariant derivatives or Christoffel symbols. Lowering indices in the Euler equation (9) then gives the system

\displaystyle \partial_t v_i + u^j \nabla_j v_i = - \partial_i p

\displaystyle  u^j = g^{ij} v_i

\displaystyle  \nabla_i u^i = 0.

Noting that {u^j \nabla_i v_j = \frac{1}{2} \partial_i ( u^j v_j)}, and introducing the modified pressure {p' := p + \frac{1}{2} u^j v_j}, we arrive at the system

\displaystyle \partial_t v_i + u^j (\nabla_j v_i - \nabla_i v_j) = - \partial_i p'

\displaystyle  u^j = g^{ij} v_i

\displaystyle  \nabla_i u^i = 0.

As the Levi-Civita connection is torsion-free (or equivalently, one has the symmetry {\Gamma^i_{jk} = \Gamma^i_{kj})}, we have {\nabla_j v_i - \nabla_i v_j = \partial_j v_i - \partial_i v_j}, thus we arrive at the system

\displaystyle  \partial_t v_i + u^j (\partial_j v_i - \partial_i v_j) = - \partial_i p' \ \ \ \ \ (11)

\displaystyle  u^j = g^{ij} v_i \ \ \ \ \ (12)

\displaystyle  \nabla_i u^i = 0 \ \ \ \ \ (13)

which is equivalent to (and thus embeddable in) the Euler equations. The advantage of this formulation is that the metric {g} now plays no role whatsoever in the main equation (11), and only appears in (12) and (13). One can also interpret the expression {u^j (\partial_j v_i - \partial_i v_j)} as the Lie derivative of the covelocity {v} along the velocity {u}.

If one works in a coordinate system in which the volume form {dg} is Euclidean (that is to say, {\mathrm{det} g = 1}), then the Riemannian divergence {\nabla_i u^i} is the same as the ordinary divergence {\partial_i u^i}; this can be seen either by viewing the divergence as the adjoint of the gradient operator with respect to the volume form, or else by differentiating the condition {\mathrm{det} g = 1} to conclude that {g^{ij} \partial_k g_{ij} = 0}, which implies that {\Gamma^i_{ik}=0} and hence {\nabla_i u^i = \partial_i u^i}. But actually, as already observed in my previous paper, one can replace {\nabla_i u^i} with {\partial_i u^i} “for free”, even if one does not have the Euclidean volume form condition {\mathrm{det} g = 1}, if one is prepared to add an additional “dummy” dimension to the manifold {M}. More precisely, if {u, v, g, p'} solves the system

\displaystyle  \partial_t v_i + u^j (\partial_j v_i - \partial_i v_j) = - \partial_i p' \ \ \ \ \ (14)

\displaystyle  u^j = g^{ij} v_i \ \ \ \ \ (15)

\displaystyle  \partial_i u^i = 0 \ \ \ \ \ (16)

on some {d}-dimensional Riemannian manifold {(M,g)}, then one can introduce modified fields {\tilde u, \tilde v, \tilde g, \tilde p'} on a {d+1}-dimensional manifold {(\tilde M,\tilde g)} by defining

\displaystyle  \tilde M := M \times {\bf R}/{\bf Z}

\displaystyle  d\tilde g^2 := dg^2 + \mathrm{det}(g)^{-1} ds^2

\displaystyle  \tilde u^i(x,s) := u^i(x)

\displaystyle  \tilde u^s(x,s) := 0

\displaystyle  \tilde v_i(x,s) := v_i(x)

\displaystyle  \tilde v_s(x,s) := 0

\displaystyle  \tilde p'(x,s) := p'(x)

then these fields obey the same system, and hence (since the extended metric is block diagonal, so that {\mathrm{det}(\tilde g) = \mathrm{det}(g) \cdot \mathrm{det}(g)^{-1} = 1}) solve (11), (12), (13). Thus the above system is embeddable into the Euler equations in one higher dimension. To embed the Boussinesq equations into the four-dimensional Euler equations mentioned previously, it thus suffices to embed these equations into the system (14)–(16) for the three-dimensional manifold {{\bf R} \times {\bf R}^+ \times {\bf R}/{\bf Z}} with metric

\displaystyle  dx^2 + dy^2 + y^{-1} dz^2. \ \ \ \ \ (17)

Let us more generally consider the system (14)–(16) under the assumption that {M} splits as a product {M_1 \times M_2} of two manifolds {M_1,M_2}, with all data independent of the {M_2} coordinates (but, for added flexibility, we do not assume that the metric on {M} splits as the direct sum of metrics on {M_1} and {M_2}, allowing for twists and shears). Thus, if we use Roman indices to denote the {M_1} coordinates and Greek indices to denote the {M_2} coordinates (with summation only being restricted to these coordinates), and denote the inverse metric by the tensor with components {g^{ij}, g^{i\beta}, g^{\beta \gamma}}, then we have

\displaystyle  \partial_\alpha g^{ij}, \partial_\alpha g^{i\beta}, \partial_\alpha g^{\beta \gamma} = 0 \ \ \ \ \ (18)

\displaystyle  \partial_\alpha v_i, \partial_\alpha v_\beta = 0 \ \ \ \ \ (19)

\displaystyle  \partial_\alpha u^i, \partial_\alpha u^\beta = 0 \ \ \ \ \ (20)

\displaystyle  \partial_\alpha p' = 0, \ \ \ \ \ (21)

and then the system (14)–(16) in these split coordinates becomes

\displaystyle  \partial_t v_i + u^j (\partial_j v_i - \partial_i v_j) - u^\alpha \partial_i v_\alpha = - \partial_i p'

\displaystyle  \partial_t v_\beta + u^j \partial_j v_\beta = 0

\displaystyle  u^j = g^{ij} v_i + g^{\alpha j} v_\alpha

\displaystyle  u^\alpha = g^{\alpha \beta} v_\beta + g^{\alpha j} v_j

\displaystyle  \partial_i u^i = 0.

We can view this as a system of PDE on the smaller manifold {M_1}, which is then embedded into the Euler equations. Introducing the material derivative {D_t := \partial_t + u^j \partial_j}, this simplifies slightly to

\displaystyle  D_t v_i - u^j \partial_i v_j - u^\alpha \partial_i v_\alpha = - \partial_i p'

\displaystyle  D_t v_\beta = 0

\displaystyle  u^j = g^{ij} v_i + g^{\alpha j} v_\alpha

\displaystyle  u^\alpha = g^{\alpha \beta} v_\beta + g^{\alpha j} v_j

\displaystyle  \partial_i u^i = 0.

We substitute the third and fourth equations into the first, then drop the fourth (as it can be viewed as a definition of the field {u^\alpha}, which no longer plays any further role), to obtain

\displaystyle  D_t v_i - g^{jk} v_k \partial_i v_j - g^{\alpha j} v_\alpha \partial_i v_j - g^{\alpha \beta} v_\beta \partial_i v_\alpha - g^{\alpha j} v_j \partial_i v_\alpha = - \partial_i p'

\displaystyle  D_t v_\beta = 0

\displaystyle  u^j = g^{ij} v_i + g^{\alpha j} v_\alpha

\displaystyle  \partial_i u^i = 0.

We can reverse the pressure modification by writing

\displaystyle  p := p' - \frac{1}{2} g^{jk} v_j v_k - g^{\alpha j} v_j v_\alpha - \frac{1}{2} g^{\alpha \beta} v_\alpha v_\beta,

to move some derivatives off of the covelocity fields and onto the metric, so that the system now becomes

\displaystyle  D_t v_i + \frac{1}{2} v_k v_j \partial_i g^{jk} + v_j v_\alpha \partial_i g^{j\alpha} + \frac{1}{2} v_\alpha v_\beta \partial_i g^{\alpha \beta} = - \partial_i p \ \ \ \ \ (22)

\displaystyle  D_t v_\beta = 0 \ \ \ \ \ (23)

\displaystyle  u^j = g^{ij} v_i + g^{\alpha j} v_\alpha \ \ \ \ \ (24)

\displaystyle  \partial_i u^i = 0. \ \ \ \ \ (25)

At this point one can specialise to various special cases to obtain some possibly simpler dynamics. For instance, if one sets {M_2} to be flat (so that {g^{\alpha \beta}} is constant), and sets {v_i} and {p} to both vanish, then we obtain the simple-looking (but somewhat overdetermined) system

\displaystyle  D_t v_\beta = 0

\displaystyle  u^j = g^{\alpha j} v_\alpha

\displaystyle  \partial_i u^i = 0.

This is basically the system I worked with in my previous paper. For instance, one could set one of the components of {v_\alpha}, say {v_\zeta}, to be identically {1}, and {g^{\zeta j}} to be an arbitrary divergence-free vector field for that component; then {u^j = g^{\zeta j}}, and all the other components {v_\beta} of {v} are transported by this static velocity field, leading for instance to exponential growth of vorticity if {g^{\zeta j}} has a hyperbolic fixed point and the initial data of the components of {v} other than {v_\zeta} are generic. (Alas, I was not able to modify this example to obtain something more dramatic than exponential growth, such as finite time blowup.)

Alternatively, one can set {g^{j\alpha}} to vanish, leaving one with

\displaystyle  D_t v_i + \frac{1}{2} v_k v_j \partial_i g^{jk} + \frac{1}{2} v_\alpha v_\beta \partial_i g^{\alpha \beta} = - \partial_i p. \ \ \ \ \ (26)

\displaystyle  D_t v_\beta = 0 \ \ \ \ \ (27)

\displaystyle  u^j = g^{ij} v_i \ \ \ \ \ (28)

\displaystyle  \partial_i u^i = 0. \ \ \ \ \ (29)

If {M_2} consists of a single coordinate {\zeta}, then on setting {\rho := \frac{1}{2} v_\zeta^2}, this simplifies to

\displaystyle  D_t v_i + \frac{1}{2} v_k v_j \partial_i g^{jk} = - \partial_i p - \rho \partial_i g^{\zeta\zeta}

\displaystyle  D_t \rho = 0

\displaystyle  u^j = g^{ij} v_i

\displaystyle  \partial_i u^i = 0.

If we take {M_1} to be {{\bf R} \times {\bf R}^+} with the Euclidean metric {g^{ij}}, and set {g^{\zeta\zeta} = y} (so that {M} has the metric (17)), then one obtains the Boussinesq system (1)–(3), giving the claimed embedding.

Now we perform a similar analysis for the axially symmetric Euler equations. The cylindrical coordinate system {(z,r,\theta)} is slightly inconvenient to work with because the volume form {r\ dz dr d\theta} is not Euclidean. We therefore introduce Turkington coordinates

\displaystyle  (x,y,\zeta) := (z,r^2/2,\theta)

to rewrite the metric {dz^2 + dr^2 + r^2 d\theta^2} as

\displaystyle  dx^2 + \frac{1}{2y} dy^2 + 2y d\zeta^2

so that the volume form {dx dy d\zeta} is now Euclidean, and the Euler equations become (14)–(16). Splitting as before, with {M_1} being the two-dimensional manifold parameterised by {x,y}, and {M_2} the one-dimensional manifold parameterised by {\zeta}, the symmetry reduction (18)–(21) gives us (26)–(29) as before. Explicitly, one has

\displaystyle  D_t v_x = - \partial_x p

\displaystyle  D_t v_y + v_y^2 - \frac{1}{4y^2} v_\zeta^2 = - \partial_y p

\displaystyle  D_t v_\zeta = 0

\displaystyle  u^x = v_x; u^y = 2y v_y

\displaystyle  \partial_x u^x + \partial_y u^y = 0.
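(Here is a small SymPy check, again an illustration added here rather than part of the post, that the change of variables {(x,y,\zeta) := (z,r^2/2,\theta)} really does turn the cylindrical metric {dz^2 + dr^2 + r^2 d\theta^2} into {dx^2 + \frac{1}{2y} dy^2 + 2y d\zeta^2}, whose volume form is the Euclidean {dx dy d\zeta}.)

```python
import sympy as sp

x, y, zeta = sp.symbols("x y zeta", positive=True)

# Turkington coordinates: (x, y, zeta) = (z, r^2/2, theta), i.e. z = x, r = sqrt(2*y), theta = zeta.
old = sp.Matrix([x, sp.sqrt(2 * y), zeta])            # (z, r, theta) as functions of (x, y, zeta)
J = old.jacobian([x, y, zeta])
g_cyl = sp.diag(1, 1, old[1] ** 2)                    # dz^2 + dr^2 + r^2 dtheta^2 with r = sqrt(2*y)
g_new = sp.simplify(J.T * g_cyl * J)

# The pulled-back metric is dx^2 + dy^2/(2y) + 2y dzeta^2, with Euclidean volume form.
assert sp.simplify(g_new - sp.diag(1, 1 / (2 * y), 2 * y)) == sp.zeros(3, 3)
assert sp.simplify(g_new.det()) == 1
```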

Setting {\omega := \partial_x v_y - \partial_y v_x} to eliminate the pressure {p}, we obtain

\displaystyle  D_t \omega = \frac{1}{4y^2} \partial_x (v_\zeta^2)

\displaystyle  D_t v_\zeta = 0

\displaystyle \omega := \partial_x(\frac{1}{2y} u^y) - \partial_y u^x

\displaystyle  \partial_x u^x + \partial_y u^y = 0.

Since {u^y = r u^r}, {u^x = u^z}, {\partial_x = \partial_z}, {\partial_y = \frac{1}{r} \partial_r}, and {\rho = v_\zeta^2}, we obtain the system (5)–(8).

Returning to the general form of (22)–(25), one can obtain an interesting transformation of this system by writing {g_{ij}} for the inverse of {g^{ij}} (caution: in general, this is not the restriction of the original metric on {M} to {M_1}), and defining the modified covelocity

\displaystyle  \tilde v_i := g_{ij} u^j = v_i + g_{ij} g^{j\alpha} v_\alpha,

then by the Leibniz rule

\displaystyle  D_t \tilde v_i = D_t v_i + v_\alpha D_t (g_{ij} g^{j\alpha})

\displaystyle  = - \frac{1}{2} v_k v_j \partial_i g^{jk} - v_j v_\alpha \partial_i g^{j\alpha} - v_\alpha v_\beta \frac{1}{2} \partial_i g^{\alpha \beta} - \partial_i p

\displaystyle  + v_\alpha u^k \partial_k( g_{ij} g^{j\alpha} )

Replacing the covelocity with the modified covelocity, this becomes

\displaystyle  = - \frac{1}{2} \tilde v_k \tilde v_j \partial_i g^{jk} + \tilde v_k g_{jl} g^{l\alpha} v_\alpha \partial_i g^{jk} - \frac{1}{2} g_{km} g^{m\beta} v_\beta g_{jl} g^{l\alpha} v_\alpha \partial_i g^{jk}

\displaystyle  - \tilde v_j v_\alpha \partial_i g^{j\alpha} + g_{jl} g^{l \beta} v_\alpha v_\beta \partial_i g^{j\alpha} - v_\alpha v_\beta \frac{1}{2} \partial_i g^{\alpha \beta} - \partial_i p

\displaystyle  + v_\alpha u^k \partial_k( g_{ij} g^{j\alpha} ).

We thus have the system

\displaystyle  D_t \tilde v_i + \frac{1}{2} \tilde v_k \tilde v_j \partial_i g^{jk} = - \partial_i p + R_i^{j\alpha} \tilde v_j v_\alpha + S_i^{\alpha \beta} v_\alpha v_\beta

\displaystyle  D_t v_\alpha = 0

\displaystyle  u^i = g^{ij} \tilde v_j

\displaystyle  \partial_i u^i = 0

where

\displaystyle  R_i^{j\alpha} := g_{kl} g^{l\alpha} \partial_i g^{jk} - \partial_i g^{j\alpha} + g^{jk} \partial_k( g_{il} g^{l\alpha} )

\displaystyle  = g^{jk} (\partial_k (g^{l\alpha} g_{il}) - \partial_i(g^{l\alpha} g_{kl}))

\displaystyle  S_i^{\alpha \beta} := -\frac{1}{2} g_{km} g^{m\beta} g_{jl} g^{l\alpha} \partial_i g^{jk} + g_{jl} g^{l\beta} \partial_i g^{j\alpha} - \frac{1}{2} \partial_i g^{\alpha\beta} = \frac{1}{2} \partial_i (g^{m\beta} g^{l\alpha} g_{ml} - g^{\alpha \beta})

and so if one writes

\displaystyle  p' := p + \frac{1}{2} \tilde v_k \tilde v_j g^{jk}

\displaystyle  \theta^\alpha_i := g^{l\alpha} g_{il}

\displaystyle  \omega^\alpha_{ki} := \partial_k \theta^\alpha_i - \partial_i \theta^\alpha_k

\displaystyle  F^{\alpha \beta} := \frac{1}{2} (g^{\alpha \beta} - g^{m\beta} g^{l\alpha} g_{ml})

we obtain

\displaystyle  D_t \tilde v_i - u^j \partial_i \tilde v_j = - \partial_i p' + u^k v_\alpha \omega^\alpha_{ki} - v_\alpha v_\beta \partial_i F^{\alpha \beta}

\displaystyle  D_t v_\alpha = 0

\displaystyle  u^i = g^{ij} \tilde v_j

\displaystyle  \partial_i u^i = 0.

For each {\alpha,\beta}, we can specify {F^{\alpha\beta}} as an arbitrary smooth function of space (it has to be positive definite to keep the manifold {M} Riemannian, but one can add an arbitrary constant to delete this constraint), and {\omega^\alpha_{ki}} as an arbitrary time-independent exact {2}-form. Thus we obtain an incompressible Euler system with two new forcing terms, one term {v_\alpha v_\beta \partial_i F^{\alpha \beta}} coming from passive scalars {v_\alpha, v_\beta}, and another term {u^k v_\alpha \omega^\alpha_{ki}} that sets up some rotation between the components {\tilde v_i}, with the rotation speed determined by a passive scalar {v_\alpha}.

Remark 1 As a sanity check, one can observe that one still has conservation of the kinetic energy, which is equal to

\displaystyle  \frac{1}{2} \int g^{jk} v_j v_k + 2 g^{j\alpha} v_j v_\alpha + g^{\alpha \beta} v_\alpha v_\beta

and can be expressed in terms of {u^j} and {v_\alpha} as

\displaystyle  \frac{1}{2} \int g_{jk} u^j u^k + v_\alpha v_\beta (g^{\alpha \beta} - g^{j\alpha} g^{k\beta} g_{jk})

\displaystyle  = \frac{1}{2} \int u^j \tilde v_j + 2 v_\alpha v_\beta F^{\alpha \beta}.

One can check this is conserved by the above system (mainly due to the antisymmetry of {\omega}).

As one special case of this system, one can work with a one-dimensional fibre manifold {M_2}, and set {v_\zeta=1} and {F^{\zeta\zeta}=0} for the single coordinate {\zeta} of this manifold. This leads to the system

\displaystyle  D_t \tilde v_i - u^j \partial_i \tilde v_j = - \partial_i p' + u^k \omega_{ki}

\displaystyle  u^i = g^{ij} \tilde v_j

\displaystyle  \partial_i u^i = 0.

where {\omega_{ki}} is some smooth time-independent exact {2}-form that one is free to specify. This resembles an Euler equation in the presence of a “magnetic field” that rotates the velocity of the fluid. I am currently experimenting with trying to use this to force some sort of blowup, though I have not succeeded so far (one would obviously have to use the pressure term at some point, for if the pressure vanished then one could keep things bounded using the method of characteristics).



December 14, 2017

n-Category Café Entropy Modulo a Prime

In 1995, the German geometer Friedrich Hirzebruch retired, and a private booklet was put together to mark the occasion. That booklet included a short note by Maxim Kontsevich entitled “The 1\tfrac{1}{2}-logarithm”.

Kontsevich’s note didn’t become publicly available until five years later, when it was included as an appendix to a paper on polylogarithms by Philippe Elbaz-Vincent and Herbert Gangl. Towards the end, it contains the following provocative words:

Conclusion: If we have a random variable \xi which takes finitely many values with all probabilities in \mathbb{Q} then we can define not only the transcendental number H(\xi) but also its “residues modulo p” for almost all primes p!

Kontsevich’s note was very short and omitted many details. I’ll put some flesh on those bones, showing how to make sense of the sentence above, and much more.

The “H” that Kontsevich uses here is the symbol for entropy — or more exactly, Shannon entropy. So, I’ll begin by recalling what that is. That will pave the way for what I really want to talk about, which is a kind of entropy for probability distributions where the “probabilities” are not real numbers, but elements of the field \mathbb{Z}/p\mathbb{Z} of integers modulo a prime p.

Let \pi = (\pi_1, \ldots, \pi_n) be a finite probability distribution. (It would be more usual to write a probability distribution as p, but I want to reserve that letter for prime numbers.) The entropy of \pi is

H_\mathbb{R}(\pi) = - \sum_{i : \pi_i \neq 0} \pi_i \log \pi_i.

Usually this is just written as H, but I want to emphasize the role of the real numbers here: both the probabilities \pi_i and the entropy H_\mathbb{R}(\pi) belong to \mathbb{R}.

There are applications of entropy in dozens of branches of science… but none will be relevant here! This is purely a mathematical story, though if anyone can think of any possible application or interpretation of entropy modulo a prime, I’d love to hear it.

The challenge now is to find the correct analogue of entropy when the field \mathbb{R} is replaced by the field \mathbb{Z}/p\mathbb{Z} of integers mod p, for any prime p. So, we want to define a kind of entropy

H_p(\pi_1, \ldots, \pi_n) \in \mathbb{Z}/p\mathbb{Z}

when \pi_i \in \mathbb{Z}/p\mathbb{Z}.

We immediately run into an obstacle. Over \mathbb{R}, probabilities are required to be nonnegative. Indeed, the logarithm in the definition of entropy doesn’t make sense otherwise. But in \mathbb{Z}/p\mathbb{Z}, there is no notion of positive or negative. So, what are we even going to define the entropy of?

We take the simplest way out: ignore the problem. So, writing

\Pi_n = \{ (\pi_1, \ldots, \pi_n) \in (\mathbb{Z}/p\mathbb{Z})^n : \pi_1 + \cdots + \pi_n = 1 \},

we’re going to try to define

H_p(\pi) \in \mathbb{Z}/p\mathbb{Z}

for each \pi = (\pi_1, \ldots, \pi_n) \in \Pi_n.

Let’s try the most direct approach to doing this. That is, let’s stare at the formula defining real entropy…

H_\mathbb{R}(\pi) = - \sum_{i : \pi_i \neq 0} \pi_i \log \pi_i

… and try to write down the analogous formula over \mathbb{Z}/p\mathbb{Z}.

The immediate question is: what should play the role of the logarithm mod p?

The crucial property of the ordinary logarithm is that it converts multiplication into addition. Specifically, we’re concerned here with logarithms of nonzero probabilities, and \log defines a homomorphism from the multiplicative group (0, 1] of nonzero probabilities to the additive group \mathbb{R}.

Mod p, then, we want a homomorphism from the multiplicative group (\mathbb{Z}/p\mathbb{Z})^\times of nonzero probabilities to the additive group \mathbb{Z}/p\mathbb{Z}. And here we hit another obstacle: a simple argument using Lagrange’s theorem shows that apart from the zero map, no such homomorphism exists. (The multiplicative group has order p - 1, which is coprime to p, so the image of every element must have order 1 in the additive group.)

So, we seem to be stuck. Actually, we’re stuck in a way that often happens when you try to construct something new, working by analogy with something old: slavishly imitating the old situation, symbol for symbol, often doesn’t work. In the most interesting analogies, there are wrinkles.

To make some progress, instead of looking at the formula for entropy, let’s look at the properties of entropy.

The most important property is a kind of recursivity. In the language spoken by many patrons of the Café, finite probability distributions form an operad. Explicitly, this means the following.

Suppose I flip a coin. If it’s heads, I roll a die, and if it’s tails, I draw from a pack of cards. This is a two-stage process with 58 possible final outcomes: either the face of a die or a playing card. Assuming that the coin toss, die roll and card draw are all fair, the probability distribution on the 58 outcomes is

(1/12, \ldots, 1/12, 1/104, \ldots, 1/104),

with 6 copies of 1/12 and 52 copies of 1/104. Generally, given a probability distribution \gamma = (\gamma_1, \ldots, \gamma_n) on n elements and, for each i \in \{1, \ldots, n\}, a probability distribution \pi^i = (\pi^i_1, \ldots, \pi^i_{k_i}) on k_i elements, we get a composite distribution

\gamma \circ (\pi^1, \ldots, \pi^n) = (\gamma_1 \pi^1_1, \ldots, \gamma_1 \pi^1_{k_1}, \ldots, \gamma_n \pi^n_1, \ldots, \gamma_n \pi^n_{k_n})

on k_1 + \cdots + k_n elements.

For example, take the coin-die-card process above. Writing u_n for the uniform distribution on n elements, the final distribution on 58 elements is u_2 \circ (u_6, u_{52}), which I wrote out explicitly above.

The important recursivity property of entropy is called the chain rule, and it states that

H_\mathbb{R}(\gamma \circ (\pi^1, \ldots, \pi^n)) = H_\mathbb{R}(\gamma) + \sum_{i = 1}^n \gamma_i H_\mathbb{R}(\pi^i).

It’s easy to check that this is true. (It’s also nice to understand it in terms of information… but if I follow every tempting explanatory byway, I’ll run out of energy too soon.) And in fact, it characterizes entropy almost uniquely:
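Here is a quick numerical check of the chain rule on the coin-die-card example, u_2 \circ (u_6, u_{52}). (This little Python snippet is an illustration added to this write-up, not part of the original post; the function names are ad hoc.)

```python
from math import log, isclose

def H(pi):
    """Shannon entropy (natural logarithm) of a finite probability distribution."""
    return -sum(p * log(p) for p in pi if p > 0)

def compose(gamma, pis):
    """Operadic composite gamma o (pi^1, ..., pi^n): scale each pi^i by gamma_i and concatenate."""
    return [g * p for g, pi in zip(gamma, pis) for p in pi]

def uniform(n):
    return [1.0 / n] * n

gamma = uniform(2)                                    # fair coin
pis = [uniform(6), uniform(52)]                       # fair die, fair deck of cards
composite = compose(gamma, pis)                       # 58 outcomes: six 1/12's and fifty-two 1/104's

lhs = H(composite)
rhs = H(gamma) + sum(g * H(pi) for g, pi in zip(gamma, pis))
assert isclose(lhs, rhs)                              # the chain rule
```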

Theorem   Let I be a function assigning a real number I(\pi) to each finite probability distribution \pi. The following are equivalent:

  • I is continuous in \pi and satisfies the chain rule;

  • I = c H_\mathbb{R} for some constant c \in \mathbb{R}.

The theorem as stated is due to Faddeev, and I blogged about it earlier this year. In fact, you can weaken “continuous” to “measurable” (a theorem of Lee), but that refinement won’t be important here.

What is important is this. In our quest to imitate real entropy in \mathbb{Z}/p\mathbb{Z}, we now have something to aim for. Namely: we want a sequence of functions H_p : \Pi_n \to \mathbb{Z}/p\mathbb{Z} satisfying the obvious analogue of the chain rule. And if we’re really lucky, there will be essentially only one such sequence.

We’ll discover that this is indeed the case. Once we’ve found the right definition of H_p and proved this, we can very legitimately baptize H_p as “entropy mod p” — no matter what weird and wonderful formula might be used to define it — because it has the same characteristic properties as entropy over \mathbb{R}.

I think I’ll leave you on that cliff-hanger. If you’d like to guess what the definition of entropy mod p is, go ahead! Otherwise, I’ll tell you next time.

David Hogg: Galactic Center review

I spent the day at UCLA, reviewing the data-analysis work of the Galactic Center Group there, for reporting to the Keck Foundation. It was a great day on a great project. They have collected large amounts of data (for more than 20 years!), both imaging and spectroscopy, to tie down the orbits of the stars near the Galactic Center black hole, and also to tie down the Newtonian reference frame. The approach is to process imaging and spectroscopy into astrometric and kinematic measurements, and then fit those measurements with a physical model. Among the highlights of the day were arguments about priors on orbital parameters, and descriptions of post-Newtonian terms that matter if you want to test General Relativity. Or test for the presence of dark matter concentrated at the center of the Galaxy.

Robert Helling: What are the odds?

It's the time of year when you give out special problems in your classes. So this is mine for the blog. It is motivated by this picture of the home secretaries of the German federal states after their annual meeting, as well as some recent discussions on Facebook.
I would like to call it Summers' problem:

Let's have two real random variables $M$ and $F$ that are drawn according to two probability distributions $\rho_{M/F}(x)$ (for starters you may assume both to be Gaussians but possibly with different mean and variance). Take $N$ draws from each and order the $2N$ results. What is the probability that the $k$ largest ones are all from $M$ rather than $F$? Express your results in terms of the $\rho_{M/F}(x)$. We are also interested in asymptotic results for $N$ large and $k$ fixed as well as $N$ and $k$ large but $k/N$ fixed.
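Not a solution, but here is a small Monte Carlo sketch (in Python; the Gaussian means and variances below are arbitrary placeholders, and the function name is mine) that estimates the probability in question, which may be useful for checking any closed-form or asymptotic answer:

```python
import numpy as np

def prob_top_k_all_from_M(N, k, trials=20_000, mu_M=0.0, sigma_M=1.1,
                          mu_F=0.0, sigma_F=1.0, seed=0):
    """Estimate P(the k largest of the 2N pooled draws all come from M)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        m = rng.normal(mu_M, sigma_M, N)
        f = rng.normal(mu_F, sigma_F, N)
        # The top k are all from M exactly when M's k-th largest value beats every F draw.
        if np.sort(m)[-k] > f.max():
            hits += 1
    return hits / trials

# Example: equal means, slightly wider distribution for M, N = 1000, k = 16.
print(prob_top_k_all_from_M(N=1000, k=16))
```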

Last bonus question: How many of the people that say that they hire only based on merit and end up with an all male board realise that by this they say that women are not as good by quite a margin?

Secret Blogging Seminar: Fighting the grad student tax

I’m throwing this post up quickly, because time is of the essence. I had hoped someone else would do the work. If they did, please link them in the comments.

As many of you know, the US House and Senate have passed revisions to the tax code. According to the House, but not the Senate draft, graduate tuition remissions are taxed as income. Thus, here at U Michigan, our graduate stipend is 19K and our tuition is 12K. If the House version takes effect, our students would be billed as if receiving 31K without getting a penny more to pay it with.

It is thus crucial which version of the bill goes forward. The first meeting of the reconciliation committee is TONIGHT, at 6 PM Eastern Time. Please contact your congress people. You can look up their contact information here. Even if they are clearly on the right side of this issue, they need to be able to report how many calls they have gotten about it when arguing with their colleagues. Remember — be polite, make it clear what you are asking for, and make it clear that you live and vote in their district. If you work at a large public university in their district, you may want to point out the effect this will have on that university.

I’ll try to look up information about congress people who are specifically on the committee or otherwise particularly vulnerable. Jordan Ellenberg wrote a “friends only” facebook post relevant to this, which I encourage him to repost publicly, on his blog or in the comments here.

UPDATE According to the Wall Street Journal, the grad tax is out. Ordinarily, I thank my congress people when they’ve been on the right side of an issue and won. (Congress people are human, and appreciate thanks too!) In this case, I believe the negotiations happened largely in secret, so I’m not sure who deserves thanks. If anyone knows, feel free to post below.


n-Category Café The Icosahedron and E8

Here’s a draft of a little thing I’m writing for the Newsletter of the London Mathematical Society. The regular icosahedron is connected to many ‘exceptional objects’ in mathematics, and here I describe two ways of using it to construct \mathrm{E}_8. One uses a subring of the quaternions called the ‘icosians’, while the other uses Du Val’s work on the resolution of Kleinian singularities. I leave it as a challenge to find the connection between these two constructions!

(Dedicated readers of this blog may recall that I was struggling with the second construction in July. David Speyer helped me a lot, but I got distracted by other work and the discussion fizzled. Now I’ve made more progress… but I’ve realized that the details would never fit in the Newsletter, so I’m afraid anyone interested will have to wait a bit longer.)

You can get a PDF version here:

From the icosahedron to E8.

But blogs are more fun.

From the Icosahedron to E8

In mathematics, every sufficiently beautiful object is connected to all others. Many exciting adventures, of various levels of difficulty, can be had by following these connections. Take, for example, the icosahedron — that is, the regular icosahedron, one of the five Platonic solids. Starting from this it is just a hop, skip and a jump to the \mathrm{E}_8 lattice, a wonderful pattern of points in 8 dimensions! As we explore this connection we shall see that it also ties together many other remarkable entities: the golden ratio, the quaternions, the quintic equation, a highly symmetrical 4-dimensional shape called the 600-cell, and a manifold called the Poincaré homology 3-sphere.

Indeed, the main problem with these adventures is knowing where to stop. The story we shall tell is just a snippet of a longer one involving the McKay correspondence and quiver representations. It would be easy to bring in the octonions, exceptional Lie groups, and more. But it can be enjoyed without these digressions, so let us introduce the protagonists without further ado.

The icosahedron has a long history. According to a comment in Euclid’s Elements it was discovered by Plato’s friend Theaetetus, a geometer who lived from roughly 415 to 369 BC. Since Theaetetus is believed to have classified the Platonic solids, he may have found the icosahedron as part of this project. If so, it is one of the earliest mathematical objects discovered as part of a classification theorem. In any event, it was known to Plato: in his Timaeus, he argued that water comes in atoms of this shape.

The icosahedron has 20 triangular faces, 30 edges, and 12 vertices. We can take the vertices to be the four points

(0 , \pm 1 , \pm \Phi)

and all those obtained from these by cyclic permutations of the coordinates, where

\displaystyle{ \Phi = \frac{\sqrt{5} + 1}{2} }

is the golden ratio. Thus, we can group the vertices into three orthogonal golden rectangles: rectangles whose proportions are \Phi to 1.

In fact, there are five ways to do this. The rotational symmetries of the icosahedron permute these five ways, and any nontrivial rotation gives a nontrivial permutation. The rotational symmetry group of the icosahedron is thus a subgroup of \mathrm{S}_5. Moreover, this subgroup has 60 elements. After all, any rotation is determined by what it does to a chosen face of the icosahedron: it can map this face to any of the 20 faces, and it can do so in 3 ways. The rotational symmetry group of the icosahedron is therefore a 60-element subgroup of \mathrm{S}_5. Group theory therefore tells us that it must be the alternating group \mathrm{A}_5.
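(A tiny numerical aside, added here and not part of the article: one can generate the 12 vertices from the recipe above and confirm the edge count, hence the face count via Euler's formula.)

```python
import numpy as np

PHI = (np.sqrt(5) + 1) / 2

# The 12 vertices: cyclic permutations of (0, +-1, +-PHI).
verts = np.array([np.roll([0.0, s1, s2 * PHI], shift)
                  for s1 in (1, -1) for s2 in (1, -1) for shift in range(3)])
assert len(verts) == 12

# Pairwise distances; the smallest nonzero one is the edge length 2.
d = np.linalg.norm(verts[:, None, :] - verts[None, :, :], axis=-1)
edge_len = d[d > 1e-9].min()
assert np.isclose(edge_len, 2.0)

edges = int(np.sum(np.isclose(d, edge_len)) // 2)
assert edges == 30        # Euler's formula 12 - 30 + F = 2 then gives F = 20 faces
```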

The \mathrm{E}_8 lattice is harder to visualize than the icosahedron, but still easy to characterize. Take a bunch of equal-sized spheres in 8 dimensions. Get as many of these spheres to touch a single sphere as you possibly can. Then, get as many to touch those spheres as you possibly can, and so on. Unlike in 3 dimensions, where there is “wiggle room”, you have no choice about how to proceed, except for an overall rotation and translation. The balls will inevitably be centered at points of the \mathrm{E}_8 lattice!

We can also characterize the \mathrm{E}_8 lattice as the one giving the densest packing of spheres among all lattices in 8 dimensions. This packing was long suspected to be optimal even among those that do not arise from lattices — but this fact was proved only in 2016, by the young mathematician Maryna Viazovska [V].

We can also describe the \mathrm{E}_8 lattice more explicitly. In suitable coordinates, it consists of vectors for which:

• the components are either all integers or all integers plus \textstyle{\frac{1}{2}}, and

• the components sum to an even number.

This lattice consists of all integral linear combinations of the 8 rows of this matrix:

\left( \begin{array}{rrrrrrrr} 1&-1&0&0&0&0&0&0 \\ 0&1&-1&0&0&0&0&0 \\ 0&0&1&-1&0&0&0&0 \\ 0&0&0&1&-1&0&0&0 \\ 0&0&0&0&1&-1&0&0 \\ 0&0&0&0&0&1&-1&0 \\ 0&0&0&0&0&1&1&0 \\ -\frac{1}{2}&-\frac{1}{2}&-\frac{1}{2}&-\frac{1}{2}&-\frac{1}{2}&-\frac{1}{2}&-\frac{1}{2}&-\frac{1}{2} \end{array} \right)

The inner product of any row vector with itself is 2, while the inner product of distinct row vectors is either 0 or -1. Thus, any two of these vectors lie at an angle of either 90° or 120° from each other. If we draw a dot for each vector, and connect two dots by an edge when the angle between their vectors is 120° we get this pattern:

This is called the \mathrm{E}_8 Dynkin diagram. In the first part of our story we shall find the \mathrm{E}_8 lattice hiding in the icosahedron; in the second part, we shall find this diagram. The two parts of this story must be related — but the relation remains mysterious, at least to me.
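(As a quick check, added here rather than taken from the article, one can form the matrix of basis vectors above and verify the stated inner products, together with the fact that each row obeys the two membership rules: all-integer or all-half-integer coordinates, with even sum.)

```python
from fractions import Fraction
import itertools

half = Fraction(1, 2)
rows = [
    [1, -1, 0, 0, 0, 0, 0, 0],
    [0, 1, -1, 0, 0, 0, 0, 0],
    [0, 0, 1, -1, 0, 0, 0, 0],
    [0, 0, 0, 1, -1, 0, 0, 0],
    [0, 0, 0, 0, 1, -1, 0, 0],
    [0, 0, 0, 0, 0, 1, -1, 0],
    [0, 0, 0, 0, 0, 1, 1, 0],
    [-half] * 8,
]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Each row lies in the lattice: all integers or all integers plus 1/2, with even coordinate sum.
for r in rows:
    all_int = all(Fraction(x).denominator == 1 for x in r)
    all_half = all(Fraction(x).denominator == 2 for x in r)
    assert all_int or all_half
    assert sum(r) % 2 == 0

# Gram matrix: 2 on the diagonal, 0 or -1 off it (angles of 90 or 120 degrees).
for i, j in itertools.product(range(8), repeat=2):
    g = dot(rows[i], rows[j])
    assert g == 2 if i == j else g in (0, -1)
```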

The Icosians

The quickest route from the icosahedron to \mathrm{E}_8 goes through the fourth dimension. The symmetries of the icosahedron can be described using certain quaternions; the integer linear combinations of these form a subring of the quaternions called the ‘icosians’, but the icosians can be reinterpreted as a lattice in 8 dimensions, and this is the \mathrm{E}_8 lattice [CS]. Let us see how this works. The quaternions, discovered by Hamilton, are a 4-dimensional algebra

\displaystyle{ \mathbb{H} = \{a + b i + c j + d k \colon \; a,b,c,d\in \mathbb{R}\} }

with multiplication given as follows:

\displaystyle{ i^2 = j^2 = k^2 = -1, \qquad i j = k = - j i \ \text{and cyclic permutations} }

It is a normed division algebra, meaning that the norm

\displaystyle{ |a + b i + c j + d k| = \sqrt{a^2 + b^2 + c^2 + d^2} }

obeys

|q q'| = |q| |q'|

for all q, q' \in \mathbb{H}. The unit sphere in \mathbb{H} is thus a group, often called \mathrm{SU}(2) because its elements can be identified with 2 \times 2 unitary matrices with determinant 1. This group acts as rotations of 3-dimensional Euclidean space, since we can see any point in \mathbb{R}^3 as a purely imaginary quaternion x = b i + c j + d k, and the quaternion q x q^{-1} is then purely imaginary for any q \in \mathrm{SU}(2). Indeed, this action gives a double cover

\displaystyle{ \alpha \colon \mathrm{SU}(2) \to \mathrm{SO}(3) }

where \mathrm{SO}(3) is the group of rotations of \mathbb{R}^3.

We can thus take any Platonic solid, look at its group of rotational symmetries, get a subgroup of \mathrm{SO}(3), and take its double cover in \mathrm{SU}(2). If we do this starting with the icosahedron, we see that the 60-element group \mathrm{A}_5 \subset \mathrm{SO}(3) is covered by a 120-element group \Gamma \subset \mathrm{SU}(2), called the binary icosahedral group.

The elements of \Gamma are quaternions of norm one, and it turns out that they are the vertices of a 4-dimensional regular polytope: a 4-dimensional cousin of the Platonic solids. It deserves to be called the ‘hypericosahedron’, but it is usually called the 600-cell, since it has 600 tetrahedral faces. Here is the 600-cell projected down to 3 dimensions, drawn using Robert Webb’s Stella software:

Explicitly, if we identify \mathbb{H} with \mathbb{R}^4, the elements of \Gamma are the points

\displaystyle{ (\pm \textstyle{\frac{1}{2}}, \pm \textstyle{\frac{1}{2}},\pm \textstyle{\frac{1}{2}},\pm \textstyle{\frac{1}{2}}) }

\displaystyle{ (\pm 1, 0, 0, 0) }

\displaystyle{ \textstyle{\frac{1}{2}} (\pm \Phi, \pm 1 , \pm 1/\Phi, 0 ),}

and those obtained from these by even permutations of the coordinates. Since these points are closed under multiplication, if we take integral linear combinations of them we get a subring of the quaternions:

\displaystyle{ \mathbb{I} = \{ \sum_{q \in \Gamma} a_q q : \; a_q \in \mathbb{Z} \} \subset \mathbb{H} .}

Conway and Sloane [CS] call this the ring of icosians. The icosians are not a lattice in the quaternions: they are dense. However, any icosian is of the form a + bi + cj + dk where a, b, c, and d live in the golden field

\displaystyle{ \mathbb{Q}(\sqrt{5}) = \{ x + \sqrt{5} y : \; x,y \in \mathbb{Q}\} }

Thus we can think of an icosian as an 8-tuple of rational numbers. Such 8-tuples form a lattice in 8 dimensions.

In fact we can put a norm on the icosians as follows. For q \in \mathbb{I} the usual quaternionic norm has

\displaystyle{ |q|^2 = x + \sqrt{5} y }

for some rational numbers x and y, but we can define a new norm on \mathbb{I} by setting

\displaystyle{ \|q\|^2 = x + y }

With respect to this new norm, the icosians form a lattice that fits isometrically in 8-dimensional Euclidean space. And this is none other than \mathrm{E}_8!
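(Here is a small numerical sketch, an illustration added to this write-up rather than part of the article: it generates the 120 unit quaternions listed above and checks that they form a set closed under quaternion multiplication, as the icosian construction requires. The helper names are mine, and floating-point comparisons are done by rounding.)

```python
import itertools
from math import sqrt

PHI = (sqrt(5) + 1) / 2

def qmul(p, q):
    """Hamilton product of quaternions written as 4-tuples (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def key(q):
    """Round components so that floating-point quaternions can be compared and hashed."""
    return tuple(round(v, 9) for v in q)

# The 12 even permutations of four coordinates.
even_perms = [p for p in itertools.permutations(range(4))
              if sum(p[i] > p[j] for i in range(4) for j in range(i + 1, 4)) % 2 == 0]

signs = list(itertools.product((1.0, -1.0), repeat=4))
seeds = ([tuple(0.5 * s for s in sg) for sg in signs]                        # (+-1/2, ..., +-1/2)
         + [(sg[0], 0.0, 0.0, 0.0) for sg in signs]                          # (+-1, 0, 0, 0)
         + [(0.5 * sg[0] * PHI, 0.5 * sg[1], 0.5 * sg[2] / PHI, 0.0) for sg in signs])

group = {}
for seed in seeds:
    for perm in even_perms:
        q = tuple(seed[i] for i in perm)
        group[key(q)] = q                      # dedupe by rounded key, keep full precision
assert len(group) == 120                       # the binary icosahedral group Gamma

# Closure under quaternion multiplication (so these points really form a group of unit quaternions).
for p, q in itertools.product(list(group.values()), repeat=2):
    assert key(qmul(p, q)) in group
```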

Klein’s Icosahedral Function

Not only is the \mathrm{E}_8 lattice hiding in the icosahedron; so is the \mathrm{E}_8 Dynkin diagram. The space of all regular icosahedra of arbitrary size centered at the origin has a singularity, which corresponds to a degenerate special case: the icosahedron of zero size. If we resolve this singularity in a minimal way we get eight Riemann spheres, intersecting in a pattern described by the \mathrm{E}_8 Dynkin diagram!

This remarkable story starts around 1884 with Felix Klein’s Lectures on the Icosahedron [Kl]. In this work he inscribed an icosahedron in the Riemann sphere, \mathbb{C}\mathrm{P}^1. He thus got the icosahedron’s symmetry group, \mathrm{A}_5, to act as conformal transformations of \mathbb{C}\mathrm{P}^1 — indeed, rotations. He then found a rational function of one complex variable that is invariant under all these transformations. This function equals 0 at the centers of the icosahedron’s faces, 1 at the midpoints of its edges, and \infty at its vertices.

Here is Klein’s icosahedral function as drawn by Abdelaziz Nait Merzouk. The color shows its phase, while the contour lines show its magnitude:

We can think of Klein’s icosahedral function as a branched cover of the Riemann sphere by itself with 60 sheets:

\displaystyle{ \mathcal{I} \colon \mathbb{C}\mathrm{P}^1 \to \mathbb{C}\mathrm{P}^1 .}

Indeed, \mathrm{A}_5 acts on \mathbb{C}\mathrm{P}^1, and the quotient space \mathbb{C}\mathrm{P}^1/\mathrm{A}_5 is isomorphic to \mathbb{C}\mathrm{P}^1 again. The function \mathcal{I} gives an explicit formula for the quotient map \mathbb{C}\mathrm{P}^1 \to \mathbb{C}\mathrm{P}^1/\mathrm{A}_5 \cong \mathbb{C}\mathrm{P}^1.

Klein managed to reduce solving the quintic to the problem of solving the equation \mathcal{I}(z) = w for z. A modern exposition of this result is Shurman’s Geometry of the Quintic [Sh]. For a more high-powered approach, see the paper by Nash [N]. Unfortunately, neither of these treatments avoids complicated calculations. But our interest in Klein’s icosahedral function here does not come from its connection to the quintic: instead, we want to see its connection to \mathrm{E}_8.

For this we should actually construct Klein’s icosahedral function. To do this, recall that the Riemann sphere \mathbb{C}\mathrm{P}^1 is the space of 1-dimensional linear subspaces of \mathbb{C}^2. Let us work directly with \mathbb{C}^2. While \mathrm{SO}(3) acts on \mathbb{C}\mathrm{P}^1, this comes from an action of this group’s double cover \mathrm{SU}(2) on \mathbb{C}^2. As we have seen, the rotational symmetry group of the icosahedron, \mathrm{A}_5 \subset \mathrm{SO}(3), is double covered by the binary icosahedral group \Gamma \subset \mathrm{SU}(2). To build an \mathrm{A}_5-invariant rational function on \mathbb{C}\mathrm{P}^1, we should thus look for \Gamma-invariant homogeneous polynomials on \mathbb{C}^2.

It is easy to construct three such polynomials:

V, of degree 12, vanishing on the 1d subspaces corresponding to icosahedron vertices.

E, of degree 30, vanishing on the 1d subspaces corresponding to icosahedron edge midpoints.

F, of degree 20, vanishing on the 1d subspaces corresponding to icosahedron face centers.

Remember, we have embedded the icosahedron in \mathbb{C}\mathrm{P}^1, and each point in \mathbb{C}\mathrm{P}^1 is a 1-dimensional subspace of \mathbb{C}^2, so each icosahedron vertex determines such a subspace, and there is a linear function on \mathbb{C}^2, unique up to a constant factor, that vanishes on this subspace. The icosahedron has 12 vertices, so we get 12 linear functions this way. Multiplying them gives V, a homogeneous polynomial of degree 12 on \mathbb{C}^2 that vanishes on all the subspaces corresponding to icosahedron vertices! The same trick gives E, which has degree 30 because the icosahedron has 30 edges, and F, which has degree 20 because the icosahedron has 20 faces.

A bit of work is required to check that V, E and F are invariant under \Gamma, instead of changing by constant factors under group transformations. Indeed, if we had copied this construction using a tetrahedron or octahedron, this would not be the case. For details, see Shurman’s book [Sh], which is free online, or van Hoboken’s nice thesis [VH].

Since both F^3 and V^5 have degree 60, F^3/V^5 is homogeneous of degree zero, so it defines a rational function \mathcal{I} \colon \mathbb{C}\mathrm{P}^1 \to \mathbb{C}\mathrm{P}^1. This function is invariant under \mathrm{A}_5 because F and V are invariant under \Gamma. Since F vanishes at face centers of the icosahedron while V vanishes at vertices, \mathcal{I} = F^3/V^5 equals 0 at face centers and \infty at vertices. Finally, thanks to its invariance property, \mathcal{I} takes the same value at every edge center, so we can normalize V or F to make this value 1. Thus, \mathcal{I} has precisely the properties required of Klein’s icosahedral function!

The Appearance of E8

Now comes the really interesting part. Three polynomials on a 2-dimensional space must obey a relation, and V, E, and F obey a very pretty one, at least after we normalize them correctly:

\displaystyle{ V^5 + E^2 + F^3 = 0. }

We could guess this relation simply by noting that each term must have the same degree. Every \Gamma-invariant polynomial on \mathbb{C}^2 is a polynomial in V, E and F, and indeed

\displaystyle{ \mathbb{C}^2 / \Gamma \cong \{ (V,E,F) \in \mathbb{C}^3 \colon \; V^5 + E^2 + F^3 = 0 \} . }

This complex surface is smooth except at V = E = F = 0, where it has a singularity. And hiding in this singularity is \mathrm{E}_8!

To see this, we need to ‘resolve’ the singularity. Roughly, this means that we find a smooth complex surface S and an onto map

\displaystyle{ \pi \colon S \to \mathbb{C}^2/\Gamma }

that is one-to-one away from the singularity. (More precisely, if X is an algebraic variety with singular points X_{\mathrm{sing}} \subset X, \pi \colon S \to X is a resolution of X if S is smooth, \pi is proper, \pi^{-1}(X - X_{\mathrm{sing}}) is dense in S, and \pi is an isomorphism between \pi^{-1}(X - X_{\mathrm{sing}}) and X - X_{\mathrm{sing}}. For more details see Lamotke’s book [L].)

There are many such resolutions, but one minimal resolution, meaning that all others factor uniquely through this one:

What sits above the singularity in this minimal resolution? Eight copies of the Riemann sphere \mathbb{C}\mathrm{P}^1, one for each dot here:

Two of these \mathbb{C}\mathrm{P}^1s intersect in a point if their dots are connected by an edge: otherwise they are disjoint.

This amazing fact was discovered by Patrick Du Val in 1934 [DV]. Why is it true? Alas, there is not enough room in the margin, or even in the entire blog article, to explain this. The books by Kirillov [Ki] and Lamotke [L] fill in the details. But here is a clue. The \mathrm{E}_8 Dynkin diagram has ‘legs’ of lengths 5, 2 and 3:

On the other hand,

\displaystyle{ \mathrm{A}_5 \cong \langle v, e, f | v^5 = e^2 = f^3 = v e f = 1 \rangle }

where in terms of the rotational symmetries of the icosahedron:

v is a 1/5 turn around some vertex of the icosahedron,

e is a 1/2 turn around the center of an edge touching that vertex,

f is a 1/3 turn around the center of a face touching that vertex,

and we must choose the sense of these rotations correctly to obtain v e f = 1. As a result, the dots in the \mathrm{E}_8 Dynkin diagram correspond naturally to conjugacy classes in \mathrm{A}_5, with the dot labelled 1 corresponding to the identity. Each conjugacy class, in turn, gives a copy of \mathbb{C}\mathrm{P}^1 in the minimal resolution of \mathbb{C}^2/\Gamma.

Not only the \mathrm{E}_8 Dynkin diagram, but also the \mathrm{E}_8 lattice, can be found in the minimal resolution of \mathbb{C}^2/\Gamma. Topologically, this space is a 4-dimensional manifold. Its real second homology group is an 8-dimensional vector space with an inner product given by the intersection pairing. The integral second homology is a lattice in this vector space spanned by the 8 copies of \mathbb{C}\mathrm{P}^1 we have just seen—and it is a copy of the \mathrm{E}_8 lattice [KS].

But let us turn to a more basic question: what is \mathbb{C}^2/\Gamma like as a topological space? To tackle this, first note that we can identify a pair of complex numbers with a single quaternion, and this gives a homeomorphism

\mathbb{C}^2/\Gamma \cong \mathbb{H}/\Gamma

where we let \Gamma act by right multiplication on \mathbb{H}. So, it suffices to understand \mathbb{H}/\Gamma.

Next, note that sitting inside \mathbb{H}/\Gamma are the points coming from the unit sphere in \mathbb{H}. These points form the 3-dimensional manifold \mathrm{SU}(2)/\Gamma, which is called the Poincaré homology 3-sphere [KS]. This is a wonderful thing in its own right: Poincaré discovered it as a counterexample to his guess that any compact 3-manifold with the same homology as a 3-sphere is actually diffeomorphic to the 3-sphere, and it is deeply connected to \mathrm{E}_8. But for our purposes, what matters is that we can think of this manifold in another way, since we have a diffeomorphism

\mathrm{SU}(2)/\Gamma \cong \mathrm{SO}(3)/\mathrm{A}_5.

The latter is just the space of all icosahedra inscribed in the unit sphere in 3d space, where we count two as the same if they differ by a rotational symmetry.

This is a nice description of the points of \mathbb{H}/\Gamma coming from points in the unit sphere of \mathbb{H}. But every quaternion lies in some sphere centered at the origin of \mathbb{H}, of possibly zero radius. It follows that \mathbb{C}^2/\Gamma \cong \mathbb{H}/\Gamma is the space of all icosahedra centered at the origin of 3d space — of arbitrary size, including a degenerate icosahedron of zero size. This degenerate icosahedron is the singular point in \mathbb{C}^2/\Gamma. This is where \mathrm{E}_8 is hiding.

Clearly much has been left unexplained in this brief account. Most of the missing details can be found in the references. But it remains unknown — at least to me — how the two constructions of \mathrm{E}_8 from the icosahedron fit together in a unified picture.

Recall what we did. First we took the binary icosahedral group \Gamma \subset \mathbb{H}, took integer linear combinations of its elements, thought of these as forming a lattice in an 8-dimensional rational vector space with a natural norm, and discovered that this lattice is a copy of the \mathrm{E}_8 lattice. Then we took \mathbb{C}^2/\Gamma \cong \mathbb{H}/\Gamma, took its minimal resolution, and found that the integral 2nd homology of this space, equipped with its natural inner product, is a copy of the \mathrm{E}_8 lattice. From the same ingredients we built the same lattice in two very different ways! How are these constructions connected? This puzzle deserves a nice solution.

Acknowledgements

I thank Tong Yang for inviting me to speak on this topic at the Annual General Meeting of the Hong Kong Mathematical Society on May 20, 2017, and Guowu Meng for hosting me at the HKUST while I prepared that talk. I also thank the many people, too numerous to accurately list, who have helped me understand these topics over the years.

Bibliography

[CS] J. H. Conway and N. J. A. Sloane, Sphere Packings, Lattices and Groups, Springer, Berlin, 2013.

[DV] P. du Val, On isolated singularities of surfaces which do not affect the conditions of adjunction, I, II and III, Proc. Camb. Phil. Soc. 30, 453–459, 460–465, 483–491.

[KS] R. Kirby and M. Scharlemann, Eight faces of the Poincaré homology 3-sphere, Usp. Mat. Nauk. 37 (1982), 139–159. Available at https://tinyurl.com/ybrn4pjq.

[Ki] A. Kirillov, Quiver Representations and Quiver Varieties, AMS, Providence, Rhode Island, 2016.

[Kl] F. Klein, Lectures on the Ikosahedron and the Solution of Equations of the Fifth Degree, Trübner & Co., London, 1888. Available at https://archive.org/details/cu31924059413439.

[L] K. Lamotke, Regular Solids and Isolated Singularities, Vieweg & Sohn, Braunschweig, 1986.

[N] O. Nash, On Klein’s icosahedral solution of the quintic. Available as arXiv:1308.0955.

[Sh] J. Shurman, Geometry of the Quintic, Wiley, New York, 1997. Available at http://people.reed.edu/~jerry/Quintic/quintic.html.

[Sl] P. Slodowy, Platonic solids, Kleinian singularities, and Lie groups, in Algebraic Geometry, Lecture Notes in Mathematics 1008, Springer, Berlin, 1983, pp. 102–138.

[VH] J. van Hoboken, Platonic Solids, Binary Polyhedral Groups, Kleinian Singularities and Lie Algebras of Type A, D, E, Master’s Thesis, University of Amsterdam, 2002. Available at http://math.ucr.edu/home/baez/joris_van_hoboken_platonic.pdf.

[V] M. Viazovska, The sphere packing problem in dimension 8, Ann. Math. 185 (2017), 991–1015. Available at https://arxiv.org/abs/1603.04246.

December 13, 2017

John BaezFrom the Icosahedron to E8

Here’s a draft of a little thing I’m writing for the Newsletter of the London Mathematical Society. The regular icosahedron is connected to many ‘exceptional objects’ in mathematics, and here I describe two ways of using it to construct \mathrm{E}_8. One uses a subring of the quaternions called the ‘icosians’, while the other uses Patrick du Val’s work on the resolution of Kleinian singularities. I leave it as a challenge to find the connection between these two constructions!

You can see a PDF here:

From the icosahedron to E8.

Here’s the story:

From the Icosahedron to E8

In mathematics, every sufficiently beautiful object is connected to all others. Many exciting adventures, of various levels of difficulty, can be had by following these connections. Take, for example, the icosahedron—that is, the regular icosahedron, one of the five Platonic solids. Starting from this it is just a hop, skip and a jump to the \mathrm{E}_8 lattice, a wonderful pattern of points in 8 dimensions! As we explore this connection we shall see that it also ties together many other remarkable entities: the golden ratio, the quaternions, the quintic equation, a highly symmetrical 4-dimensional shape called the 600-cell, and a manifold called the Poincaré homology 3-sphere.

Indeed, the main problem with these adventures is knowing where to stop! The story we shall tell is just a snippet of a longer one involving the McKay correspondence and quiver representations. It would be easy to bring in the octonions, exceptional Lie groups, and more. But it can be enjoyed without these esoteric digressions, so let us introduce the protagonists without further ado.

The icosahedron has a long history. According to a comment in Euclid’s Elements it was discovered by Plato’s friend Theaetetus, a geometer who lived from roughly 415 to 369 BC. Since Theaetetus is believed to have classified the Platonic solids, he may have found the icosahedron as part of this project. If so, it is one of the earliest mathematical objects discovered as part of a classification theorem. It’s hard to be sure. In any event, it was known to Plato: in his Timaeus, he argued that water comes in atoms of this shape.

The icosahedron has 20 triangular faces, 30 edges, and 12 vertices. We can take the vertices to be the four points

\displaystyle{   (0 , \pm 1 , \pm \Phi)  }

and all those obtained from these by cyclic permutations of the coordinates, where

\displaystyle{   \Phi = \frac{\sqrt{5} + 1}{2} }

is the golden ratio. Thus, we can group the vertices into three orthogonal golden rectangles: rectangles whose proportions are \Phi to 1.

In fact, there are five ways to do this. The rotational symmetries of the icosahedron permute these five ways, and any nontrivial rotation gives a nontrivial permutation. The rotational symmetry group of the icosahedron is thus a subgroup of \mathrm{S}_5. Moreover, this subgroup has 60 elements. After all, any rotation is determined by what it does to a chosen face of the icosahedron: it can map this face to any of the 20 faces, and it can do so in 3 ways. The rotational symmetry group of the icosahedron is therefore a 60-element subgroup of \mathrm{S}_5. Group theory therefore tells us that it must be the alternating group \mathrm{A}_5.
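As a quick numerical sanity check of these coordinates (a minimal sketch, assuming Python with numpy; the variable names are just for illustration), one can list the 12 vertices and confirm that the nearest-neighbor pairs give exactly the 30 edges:

import itertools
import numpy as np

PHI = (np.sqrt(5) + 1) / 2   # the golden ratio

# The 12 vertices: (0, +-1, +-Phi) together with its cyclic permutations.
verts = []
for s1, s2 in itertools.product([1.0, -1.0], repeat=2):
    base = (0.0, s1, s2 * PHI)
    for shift in range(3):
        verts.append(tuple(base[(i - shift) % 3] for i in range(3)))
verts = np.array(sorted(set(verts)))
print(len(verts))            # 12

# The 30 edges are the pairs of vertices at the minimum nonzero distance.
diffs = verts[:, None, :] - verts[None, :, :]
dists = np.sqrt((diffs ** 2).sum(axis=-1))
edge_len = dists[dists > 1e-9].min()
n_edges = np.count_nonzero(np.isclose(dists, edge_len)) // 2
print(edge_len, n_edges)     # edge length 2.0, and 30 edges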

The \mathrm{E}_8 lattice is harder to visualize than the icosahedron, but still easy to characterize. Take a bunch of equal-sized spheres in 8 dimensions. Get as many of these spheres to touch a single sphere as you possibly can. Then, get as many to touch those spheres as you possibly can, and so on. Unlike in 3 dimensions, where there is ‘wiggle room’, you have no choice about how to proceed, except for an overall rotation and translation. The balls will inevitably be centered at points of the \mathrm{E}_8 lattice!

We can also characterize the \mathrm{E}_8 lattice as the one giving the densest packing of spheres among all lattices in 8 dimensions. This packing was long suspected to be optimal even among those that do not arise from lattices—but this fact was proved only in 2016, by the young mathematician Maryna Viazovska [V].

We can also describe the \mathrm{E}_8 lattice more explicitly. In suitable coordinates, it consists of vectors for which:

1) the components are either all integers or all integers plus \textstyle{\frac{1}{2}}, and

2) the components sum to an even number.

This lattice consists of all integral linear combinations of the 8 rows of this matrix:

\left( \begin{array}{rrrrrrrr}  1&-1&0&0&0&0&0&0 \\  0&1&-1&0&0&0&0&0 \\  0&0&1&-1&0&0&0&0 \\  0&0&0&1&-1&0&0&0 \\  0&0&0&0&1&-1&0&0 \\  0&0&0&0&0&1&-1&0 \\  0&0&0&0&0&1&1&0 \\  -\frac{1}{2}&-\frac{1}{2}&-\frac{1}{2}&-\frac{1}{2}&-\frac{1}{2}&-\frac{1}{2}&-\frac{1}{2}&-\frac{1}{2}   \end{array} \right)

The inner product of any row vector with itself is 2, while the inner product of distinct row vectors is either 0 or -1. Thus, any two of these vectors lie at an angle of either 90° or 120°. If we draw a dot for each vector, and connect two dots by an edge when the angle between their vectors is 120° we get this pattern:

This is called the \mathrm{E}_8 Dynkin diagram. In the first part of our story we shall find the \mathrm{E}_8 lattice hiding in the icosahedron; in the second part, we shall find this diagram. The two parts of this story must be related—but the relation remains mysterious, at least to me.
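Here is a small sketch (Python with numpy assumed) that checks the two defining conditions for these 8 rows and recovers the pattern of 90° and 120° angles just described:

import numpy as np

rows = np.array([
    [ 1, -1,  0,  0,  0,  0,  0,  0],
    [ 0,  1, -1,  0,  0,  0,  0,  0],
    [ 0,  0,  1, -1,  0,  0,  0,  0],
    [ 0,  0,  0,  1, -1,  0,  0,  0],
    [ 0,  0,  0,  0,  1, -1,  0,  0],
    [ 0,  0,  0,  0,  0,  1, -1,  0],
    [ 0,  0,  0,  0,  0,  1,  1,  0],
    [-0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5],
])

# Condition 1: each row is all integers or all integers plus 1/2.
frac = rows - np.floor(rows)
print(all(len(set(r)) == 1 for r in frac))            # True

# Condition 2: the components of each row sum to an even number.
print(np.allclose(rows.sum(axis=1) % 2, 0))           # True

# Gram matrix: 2 on the diagonal, 0 or -1 off the diagonal.
gram = rows @ rows.T
print(np.diag(gram))                                  # all 2's

# Pairs with inner product -1 lie at 120 degrees; joining them by edges
# reproduces the shape of the E8 Dynkin diagram drawn above.
edges = [(i + 1, j + 1) for i in range(8) for j in range(i + 1, 8)
         if np.isclose(gram[i, j], -1)]
print(edges)   # [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (5, 7), (7, 8)]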

The Icosians

The quickest route from the icosahedron to \mathrm{E}_8 goes through the fourth dimension. The symmetries of the icosahedron can be described using certain quaternions; the integer linear combinations of these form a subring of the quaternions called the ‘icosians’, but the icosians can be reinterpreted as a lattice in 8 dimensions, and this is the \mathrm{E}_8 lattice [CS]. Let us see how this works.

The quaternions, discovered by Hamilton, are a 4-dimensional algebra

\displaystyle{ \mathbb{H} = \{a + bi + cj + dk \colon \; a,b,c,d\in \mathbb{R}\}  }

with multiplication given as follows:

\displaystyle{i^2 = j^2 = k^2 = -1, }
\displaystyle{i j = k = - j i  \textrm{ and cyclic permutations} }

It is a normed division algebra, meaning that the norm

\displaystyle{ |a + bi + cj + dk| = \sqrt{a^2 + b^2 + c^2 + d^2} }

obeys

|q q'| = |q| |q'|

for all q,q' \in \mathbb{H}. The unit sphere in \mathbb{H} is thus a group, often called \mathrm{SU}(2) because its elements can be identified with 2 \times 2 unitary matrices with determinant 1. This group acts as rotations of 3-dimensional Euclidean space, since we can see any point in \mathbb{R}^3 as a purely imaginary quaternion x = bi + cj + dk, and the quaternion qxq^{-1} is then purely imaginary for any q \in \mathrm{SU}(2). Indeed, this action gives a double cover

\displaystyle{   \alpha \colon \mathrm{SU}(2) \to \mathrm{SO}(3) }

where \mathrm{SO}(3) is the group of rotations of \mathbb{R}^3.

We can thus take any Platonic solid, look at its group of rotational symmetries, get a subgroup of \mathrm{SO}(3), and take its double cover in \mathrm{SU}(2). If we do this starting with the icosahedron, we see that the 60-element group \mathrm{A}_5 \subset \mathrm{SO}(3) is covered by a 120-element group \Gamma \subset \mathrm{SU}(2), called the binary icosahedral group.

The elements of \Gamma are quaternions of norm one, and it turns out that they are the vertices of a 4-dimensional regular polytope: a 4-dimensional cousin of the Platonic solids. It deserves to be called the “hypericosahedron”, but it is usually called the 600-cell, since it has 600 tetrahedral faces. Here is the 600-cell projected down to 3 dimensions, drawn using Robert Webb’s Stella software:

Explicitly, if we identify \mathbb{H} with \mathbb{R}^4, the elements of \Gamma are the points

\displaystyle{    (\pm \textstyle{\frac{1}{2}}, \pm \textstyle{\frac{1}{2}},\pm \textstyle{\frac{1}{2}},\pm \textstyle{\frac{1}{2}}) }

\displaystyle{ (\pm 1, 0, 0, 0) }

\displaystyle{  \textstyle{\frac{1}{2}} (\pm \Phi, \pm 1 , \pm 1/\Phi, 0 ),}

and those obtained from these by even permutations of the coordinates. Since these points are closed under multiplication, if we take integral linear combinations of them we get a subring of the quaternions:

\displaystyle{    \mathbb{I} = \{ \sum_{q \in \Gamma} a_q  q  : \; a_q \in \mathbb{Z} \}  \subset \mathbb{H} .}

Conway and Sloane [CS] call this the ring of icosians. The icosians are not a lattice in the quaternions: they are dense. However, any icosian is of the form a + bi + cj + dk where a,b,c, and d live in the golden field

\displaystyle{   \mathbb{Q}(\sqrt{5}) = \{ x + \sqrt{5} y : \; x,y \in \mathbb{Q}\} }

Thus we can think of an icosian as an 8-tuple of rational numbers. Such 8-tuples form a lattice in 8 dimensions.

In fact we can put a norm on the icosians as follows. For q \in \mathbb{I} the usual quaternionic norm has

\displaystyle{  |q|^2 =  x + \sqrt{5} y }

for some rational numbers x and y, but we can define a new norm on \mathbb{I} by setting

\displaystyle{ \|q\|^2 = x + y }

With respect to this new norm, the icosians form a lattice that fits isometrically in 8-dimensional Euclidean space. And this is none other than \mathrm{E}_8!
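Before moving on, here is a rough numerical check of two claims above: that the points listed form a set of 120 unit quaternions, and that this set is closed under multiplication (a minimal sketch, assuming Python with numpy; the tolerances and helper names are mine):

import itertools
import numpy as np

PHI = (np.sqrt(5) + 1) / 2

def even_perms(seq):
    # All even permutations of a 4-tuple, found by counting inversions.
    for p in itertools.permutations(range(4)):
        inv = sum(1 for i in range(4) for j in range(i + 1, 4) if p[i] > p[j])
        if inv % 2 == 0:
            yield tuple(seq[i] for i in p)

pts = set()
# (+-1/2, +-1/2, +-1/2, +-1/2)
for signs in itertools.product([0.5, -0.5], repeat=4):
    pts.add(signs)
# (+-1, 0, 0, 0) and coordinate permutations
for i in range(4):
    for s in (1.0, -1.0):
        v = [0.0] * 4
        v[i] = s
        pts.add(tuple(v))
# (1/2)(+-Phi, +-1, +-1/Phi, 0) under even permutations of the coordinates
for s1, s2, s3 in itertools.product([1, -1], repeat=3):
    base = (s1 * PHI / 2, s2 * 0.5, s3 / (2 * PHI), 0.0)
    for q in even_perms(base):
        pts.add(q)

G = np.array(sorted(pts))
print(len(G))                                   # 120
print(np.allclose((G ** 2).sum(axis=1), 1.0))   # all unit quaternions

def qmul(p, q):
    # Hamilton product, treating the first coordinate as the real part.
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([a1*a2 - b1*b2 - c1*c2 - d1*d2,
                     a1*b2 + b1*a2 + c1*d2 - d1*c2,
                     a1*c2 - b1*d2 + c1*a2 + d1*b2,
                     a1*d2 + b1*c2 - c1*b2 + d1*a2])

closed = all(np.min(np.linalg.norm(G - qmul(p, q), axis=1)) < 1e-9
             for p in G for q in G)
print(closed)                                   # True: a group of order 120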

Klein’s Icosahedral Function

Not only is the \mathrm{E}_8 lattice hiding in the icosahedron; so is the \mathrm{E}_8 Dynkin diagram. The space of all regular icosahedra of arbitrary size centered at the origin has a singularity, which corresponds to a degenerate special case: the icosahedron of zero size. If we resolve this singularity in a minimal way we get eight Riemann spheres, intersecting in a pattern described by the \mathrm{E}_8 Dynkin diagram!

This remarkable story starts around 1884 with Felix Klein’s Lectures on the Icosahedron [Kl]. In this work he inscribed an icosahedron in the Riemann sphere, \mathbb{C}\mathrm{P}^1. He thus got the icosahedron’s symmetry group, \mathrm{A}_5, to act as conformal transformations of \mathbb{C}\mathrm{P}^1—indeed, rotations. He then found a rational function of one complex variable that is invariant under all these transformations. This function equals 0 at the centers of the icosahedron’s faces, 1 at the midpoints of its edges, and \infty at its vertices.

Here is Klein’s icosahedral function as drawn by Abdelaziz Nait Merzouk. The color shows its phase, while the contour lines show its magnitude:

We can think of Klein’s icosahedral function as a branched cover of the Riemann sphere by itself with 60 sheets:

\displaystyle{                \mathcal{I} \colon \mathbb{C}\mathrm{P}^1 \to \mathbb{C}\mathrm{P}^1 .}

Indeed, \mathrm{A}_5 acts on \mathbb{C}\mathrm{P}^1, and the quotient space \mathbb{C}\mathrm{P}^1/\mathrm{A}_5 is isomorphic to \mathbb{C}\mathrm{P}^1 again. The function \mathcal{I} gives an explicit formula for the quotient map \mathbb{C}\mathrm{P}^1 \to \mathbb{C}\mathrm{P}^1/\mathrm{A}_5 \cong \mathbb{C}\mathrm{P}^1.

Klein managed to reduce solving the quintic to the problem of solving the equation \mathcal{I}(z) = w for z. A modern exposition of this result is Shurman’s Geometry of the Quintic [Sh]. For a more high-powered approach, see the paper by Nash [N]. Unfortunately, neither of these treatments avoids complicated calculations. But our interest in Klein’s icosahedral function here does not come from its connection to the quintic: instead, we want to see its connection to \mathrm{E}_8.

For this we should actually construct Klein’s icosahedral function. To do this, recall that the Riemann sphere \mathbb{C}\mathrm{P}^1 is the space of 1-dimensional linear subspaces of \mathbb{C}^2. Let us work directly with \mathbb{C}^2. While \mathrm{SO}(3) acts on \mathbb{C}\mathrm{P}^1, this comes from an action of this group’s double cover \mathrm{SU}(2) on \mathbb{C}^2. As we have seen, the rotational symmetry group of the icosahedron, \mathrm{A}_5 \subset \mathrm{SO}(3), is double covered by the binary icosahedral group \Gamma \subset \mathrm{SU}(2). To build an \mathrm{A}_5-invariant rational function on \mathbb{C}\mathrm{P}^1, we should thus look for \Gamma-invariant homogeneous polynomials on \mathbb{C}^2.

It is easy to construct three such polynomials:

V, of degree 12, vanishing on the 1d subspaces corresponding to icosahedron vertices.

E, of degree 30, vanishing on the 1d subspaces corresponding to icosahedron edge midpoints.

F, of degree 20, vanishing on the 1d subspaces corresponding to icosahedron face centers.

Remember, we have embedded the icosahedron in \mathbb{C}\mathrm{P}^1, and each point in \mathbb{C}\mathrm{P}^1 is a 1-dimensional subspace of \mathbb{C}^2, so each icosahedron vertex determines such a subspace, and there is a linear function on \mathbb{C}^2, unique up to a constant factor, that vanishes on this subspace. The icosahedron has 12 vertices, so we get 12 linear functions this way. Multiplying them gives V, a homogeneous polynomial of degree 12 on \mathbb{C}^2 that vanishes on all the subspaces corresponding to icosahedron vertices! The same trick gives E, which has degree 30 because the icosahedron has 30 edges, and F, which has degree 20 because the icosahedron has 20 faces.

A bit of work is required to check that V,E and F are invariant under \Gamma, instead of changing by constant factors under group transformations. Indeed, if we had copied this construction using a tetrahedron or octahedron, this would not be the case. For details, see Shurman’s book [Sh], which is free online, or van Hoboken’s nice thesis [VH].

Since both F^3 and V^5 have degree 60, F^3/V^5 is homogeneous of degree zero, so it defines a rational function \mathcal{I} \colon \mathbb{C}\mathrm{P}^1 \to \mathbb{C}\mathrm{P}^1. This function is invariant under \mathrm{A}_5 because F and V are invariant under \Gamma. Since F vanishes at face centers of the icosahedron while V vanishes at vertices, \mathcal{I} = F^3/V^5 equals 0 at face centers and \infty at vertices. Finally, thanks to its invariance property, \mathcal{I} takes the same value at every edge center, so we can normalize V or F to make this value 1.

Thus, \mathcal{I} has precisely the properties required of Klein’s icosahedral function! And indeed, these properties uniquely characterize that function, so that function is \mathcal{I}.

The Appearance of E8

Now comes the really interesting part. Three polynomials on a 2-dimensional space must obey a relation, and V,E, and F obey a very pretty one, at least after we normalize them correctly:

\displaystyle{      V^5 + E^2 + F^3 = 0. }

We could guess this relation simply by noting that each term must have the same degree. Every \Gamma-invariant polynomial on \mathbb{C}^2 is a polynomial in V, E and F, and indeed

\displaystyle{          \mathbb{C}^2 / \Gamma \cong  \{ (V,E,F) \in \mathbb{C}^3 \colon \; V^5 + E^2 + F^3 = 0 \} . }

This complex surface is smooth except at V = E = F = 0, where it has a singularity. And hiding in this singularity is \mathrm{E}_8!

To see this, we need to ‘resolve’ the singularity. Roughly, this means that we find a smooth complex surface S and an onto map

\displaystyle{ \pi \colon S \to \mathbb{C}^2/\Gamma }

that is one-to-one away from the singularity. (More precisely, if X is an algebraic variety with singular points X_{\mathrm{sing}} \subset X, \pi \colon S \to X is a resolution of X if S is smooth, \pi is proper, \pi^{-1}(X - X_{\mathrm{sing}}) is dense in S, and \pi is an isomorphism between \pi^{-1}(X - X_{\mathrm{sing}}) and X - X_{\mathrm{sing}}. For more details see Lamotke’s book [L].)

There are many such resolutions, but there is one minimal resolution, meaning that all others factor uniquely through it.

What sits above the singularity in this minimal resolution? Eight copies of the Riemann sphere \mathbb{C}\mathrm{P}^1, one for each dot here:

Two of these \mathbb{C}\mathrm{P}^1s intersect in a point if their dots are connected by an edge: otherwise they are disjoint.

This amazing fact was discovered by Patrick du Val in 1934 [DV]. Why is it true? Alas, there is not enough room in the margin, or even in the entire blog article, to explain this. The books by Kirillov [Ki] and Lamotke [L] fill in the details. But here is a clue. The \mathrm{E}_8 Dynkin diagram has ‘legs’ of lengths 5, 2 and 3:

On the other hand,

\displaystyle{   \mathrm{A}_5 \cong \langle v, e, f | v^5 = e^2 = f^3 = v e f = 1 \rangle }

where in terms of the rotational symmetries of the icosahedron:

v is a 1/5 turn around some vertex of the icosahedron,

e is a 1/2 turn around the center of an edge touching that vertex,

f is a 1/3 turn around the center of a face touching that vertex,

and we must choose the sense of these rotations correctly to obtain vef = 1. As a result, the dots in the \mathrm{E}_8 Dynkin diagram correspond naturally to conjugacy classes in A_5, with the dot labelled 1 corresponding to the identity. Each conjugacy class, in turn, gives a copy of \mathbb{C}\mathrm{P}^1 in the minimal resolution of \mathbb{C}^2/\Gamma.

Not only the \mathrm{E}_8 Dynkin diagram, but also the \mathrm{E}_8 lattice, can be found in the minimal resolution of \mathbb{C}^2/\Gamma. Topologically, this space is a 4-dimensional manifold. Its real second homology group is an 8-dimensional vector space with an inner product given by the intersection pairing. The integral second homology is a lattice in this vector space spanned by the 8 copies of \mathbb{C}\mathrm{P}^1 we have just seen—and it is a copy of the \mathrm{E}_8 lattice [KS].

But let us turn to a more basic question: what is \mathbb{C}^2/\Gamma like as a topological space? To tackle this, first note that we can identify a pair of complex numbers with a single quaternion, and this gives a homeomorphism

\mathbb{C}^2/\Gamma \cong \mathbb{H}/\Gamma

where we let \Gamma act by right multiplication on \mathbb{H}. So, it suffices to understand \mathbb{H}/\Gamma.

Next, note that sitting inside \mathbb{H}/\Gamma are the points coming from the unit sphere in \mathbb{H}. These points form the 3-dimensional manifold \mathrm{SU}(2)/\Gamma, which is called the Poincaré homology 3-sphere [KS]. This is a wonderful thing in its own right: Poincaré discovered it as a counterexample to his guess that any compact 3-manifold with the same homology as a 3-sphere is actually diffeomorphic to the 3-sphere, and it is deeply connected to \mathrm{E}_8. But for our purposes, what matters is that we can think of this manifold in another way, since we have a diffeomorphism

\mathrm{SU}(2)/\Gamma \cong \mathrm{SO}(3)/\mathrm{A}_5.

The latter is just the space of all icosahedra inscribed in the unit sphere in 3d space, where we count two as the same if they differ by a rotational symmetry.

This is a nice description of the points of \mathbb{H}/\Gamma coming from points in the unit sphere of \mathbb{H}. But every quaternion lies in some sphere centered at the origin of \mathbb{H}, of possibly zero radius. It follows that \mathbb{C}^2/\Gamma \cong \mathbb{H}/\Gamma is the space of all icosahedra centered at the origin of 3d space—of arbitrary size, including a degenerate icosahedron of zero size. This degenerate icosahedron is the singular point in \mathbb{C}^2/\Gamma. This is where \mathrm{E}_8 is hiding.

Clearly much has been left unexplained in this brief account. Most of the missing details can be found in the references. But it remains unknown—at least to me—how the two constructions of \mathrm{E}_8 from the icosahedron fit together in a unified picture.

Recall what we did. First we took the binary icosahedral group \Gamma \subset \mathbb{H}, took integer linear combinations of its elements, thought of these as forming a lattice in an 8-dimensional rational vector space with a natural norm, and discovered that this lattice is a copy of the \mathrm{E}_8 lattice. Then we took \mathbb{C}^2/\Gamma \cong \mathbb{H}/\Gamma, took its minimal resolution, and found that the integral 2nd homology of this space, equipped with its natural inner product, is a copy of the \mathrm{E}_8 lattice. From the same ingredients we built the same lattice in two very different ways! How are these constructions connected? This puzzle deserves a nice solution.

Acknowledgements

I thank Tong Yang for inviting me to speak on this topic at the Annual General Meeting of the Hong Kong Mathematical Society on May 20, 2017, and Guowu Meng for hosting me at the HKUST while I prepared that talk. I also thank the many people, too numerous to accurately list, who have helped me understand these topics over the years.

Bibliography

[CS] J. H. Conway and N. J. A. Sloane, Sphere Packings, Lattices and Groups, Springer, Berlin, 2013.

[DV] P. du Val, On isolated singularities of surfaces which do not affect the conditions of adjunction, I, II and III, Proc. Camb. Phil. Soc. 30, 453–459, 460–465, 483–491.

[KS] R. Kirby and M. Scharlemann, Eight faces of the Poincaré homology 3-sphere, Usp. Mat. Nauk. 37 (1982), 139–159. Available at https://tinyurl.com/ybrn4pjq

[Ki] A. Kirillov, Quiver Representations and Quiver Varieties, AMS, Providence, Rhode Island, 2016.

[Kl] F. Klein, Lectures on the Ikosahedron and the Solution of Equations of the Fifth Degree, Trübner & Co., London, 1888. Available at https://archive.org/details/cu31924059413439

[L] K. Lamotke, Regular Solids and Isolated Singularities, Vieweg & Sohn, Braunschweig, 1986.

[N] O. Nash, On Klein’s icosahedral solution of the quintic. Available at https://arxiv.org/abs/1308.0955

[Sh] J. Shurman, Geometry of the Quintic, Wiley, New York, 1997. Available at http://people.reed.edu/~jerry/Quintic/quintic.html

[Sl] P. Slodowy, Platonic solids, Kleinian singularities, and Lie groups, in Algebraic Geometry, Lecture Notes in Mathematics 1008, Springer, Berlin, 1983, pp. 102–138.

[VH] J. van Hoboken, Platonic Solids, Binary Polyhedral Groups, Kleinian Singularities and Lie Algebras of Type A, D, E, Master’s Thesis, University of Amsterdam, 2002. Available at http://math.ucr.edu/home/baez/joris_van_hoboken_platonic.pdf

[V] M. Viazovska, The sphere packing problem in dimension 8, Ann. Math. 185 (2017), 991–1015. Available at https://arxiv.org/abs/1603.04246


December 12, 2017

Jonathan ShockReflections

It's been three years or so since my last post. It would be cliche to talk of all that has changed, but perhaps the more crucial aspects surround what has remained the same. Ironically, the changes have often been associated with the realisation of themes and patterns which recur again and again...perhaps it is simply the spotting of these patterns which leads to potential growth, though growth often feels like a multi-directional random walk, where over time you get further from where you started, but the directions only proliferate as you find more corners to turn down. The perpendicular nature of Euclidean dimensions is itself probably illusory as here, directions may curve back on themselves and become a lost corridor which you had previously explored and long since left behind, with dust hiding your former footsteps...but the particles still floating reminds you that you've been here before and that there are mistakes still to learn from what may be a narrow and uncomfortable passageway. So perhaps it is all some giant torus, and the trick is counting the closed cycles and trying to find ones which are enjoyable to go around, rather than trying to escape from them all, which is probably futile.

A cycle which recurs is hidden truth, or perhaps, more accurately the hiding of truth...it may be deeper than the hiding of truth from the map as the map-maker himself may be unaware of what is resistance/compromise/expectation/honesty/pain/frustration or joy in a given journey. Google Maps has to go back to the matrix to find any reality hidden within the neurons, and maybe the reality has been erased long ago...'safe passage', or 'dead end' being all that is left...where in fact it may not be a dead end at all, and the paralysis that ensues would be eased with only the simplest of secret code words being divulged...I don't like that...there is a better way...can you do this for me?

This is not to be read as deep or meaningful, but really as a little gush of words, to ease the fingers into writing some more, which I hope to do. This is a deep breath and a stretch after a long long sleep...


BackreactionResearch perversions are spreading. You will not like the proposed solution.

The ivory tower from The Neverending Story

Science has a problem. The present organization of academia discourages research that has tangible outcomes, and this wastes a lot of money. Of course scientific research is not exclusively pursued in academia, but much of basic research is. And if basic research doesn’t move forward, science by and large risks getting stuck. At the root of the problem

David Hoggthe assumptions underlying EPRV

The conversation on Friday with Cisewski and Bedell got me thinking all weekend. It appears that the problem of precise RV difference measurement becomes ill-posed once we permit the stellar spectrum to vary with time. I felt like I nearly had a breakthrough on this today. Let me start by backing up.

It is impossible to obtain exceedingly precise absolute radial velocities (RVs) of stars, because to get an absolute RV, you need a spectral model that puts the centroids of the absorption lines in precisely the correct locations. Right now physical models of convecting photospheres have imperfections that lead to small systematic differences in line shapes, depths, and locations between the models of stars and the observations of stars. Opinions vary, but most astronomers would agree that this limits absolute RV accuracy at the 0.3-ish km/s level (not m/s level, km/s level).

How is it, then, that we measure at the m/s level with extreme-precision RV (EPRV) projects? The answer is that as long as the stellar spectrum doesn't change with time, we can measure relative velocity changes to arbitrary accuracy! That has been an incredibly productive realization, leading as it did to the discovery, confirmation, or characterization of many hundreds of planets around other stars!
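To see why constancy buys so much, here is a toy sketch of a relative-velocity measurement (Python with numpy assumed; the line list, noise level, and velocities are all invented for illustration). A Doppler shift is just a translation in log-wavelength, so even a template whose absolute line positions are imperfect can be slid along the data to read off a velocity change:

import numpy as np

c = 299792458.0                        # speed of light, m/s
loglam = np.linspace(np.log(5000.0), np.log(5010.0), 4000)   # log-wavelength grid

def template(x):
    # A made-up stellar template: a few Gaussian absorption lines.
    lines = np.log(np.array([5001.3, 5003.7, 5006.2, 5008.9]))
    flux = np.ones_like(x)
    for l0 in lines:
        flux -= 0.6 * np.exp(-0.5 * ((x - l0) / 1e-5) ** 2)
    return flux

rng = np.random.default_rng(0)
v_true = 42.0                          # m/s, the shift we try to recover
obs = template(loglam - v_true / c) + rng.normal(0, 0.002, loglam.size)

# Grid search over trial velocities: compare the shifted template to the data.
trial_v = np.linspace(-200, 200, 401)  # m/s
chi2 = [np.sum((obs - template(loglam - v / c)) ** 2) for v in trial_v]
print(trial_v[np.argmin(chi2)])        # recovers v_true to within a few m/s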

The issue is: Stellar spectra do change with time! There is activity, and also turbulent convection, and also rotation. This puts a wrench in long-term EPRV plans. It might even partially explain why current EPRV projects never beat m/s accuracy, even when the data (on the face of it) seem good enough to do better. Now the question is: Do the time variations of stellar spectra put an absolute floor on relative-RV measurement? That is, do they limit ultimate precision?

I think the answer is no. But the Right Thing To Do (tm) might be hard. It will involve making some new assumptions. No longer will we assume that the stellar spectrum is constant with time. But we will have to assume that spectral variations are somehow uncorrelated (in the long run) with exoplanet phase. We might also have to assume that the exoplanet-induced RV variations are dynamically predictable. Time to work out exactly what we need to assume and how.

December 11, 2017

David Hoggall about radial velocities

The day started with a conversation among Stuermer (Chicago), Montet (Chicago), Bedell (Flatiron), and me about the problem of deriving radial velocities from two-d spectroscopic images rather than going through one-d extractions. We tried to find scope for a minimal paper on the subject.

The day ended with a great talk by Jessi Cisewski (Yale) about topological data analysis. She finally convinced me that there is some there there. I asked about using automation to find best statistics, and she agreed that it must be possible. Afterwards, Ben Wandelt (Paris) told me he has a nearly-finished project on this very subject. Before Cisewski's talk, she spoke to Bedell and me about our EPRV plans. That conversation got me concerned about the non-identifiability of radial velocity if you let the stellar spectrum vary with time. Hmm.

December 10, 2017

David Hoggwhat's the circular acceleration?

Ana Bonaca (Harvard) and I started the day with a discussion that was in part about how to present the enormous, combinatoric range of results we have created with our information-theory project. One tiny point there: How do you define the equivalent of the circular velocity in a non-axi-symmetric potential? There is no clear answer. One is to do something relating to averaging the acceleration around a circular ring. Another is to use v^2/R locally. Another is to use that locally, but on the radial component of the acceleration.
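For concreteness, here is a minimal sketch of the three options (Python with numpy; the toy potential and every parameter value are invented for illustration):

import numpy as np

v0, Rc, eps = 220.0, 1.0, 0.05   # toy parameters

def potential(x, y):
    # Toy non-axisymmetric potential: log halo plus a weak m=2 distortion.
    R2 = x**2 + y**2
    return 0.5 * v0**2 * np.log(R2 + Rc**2) + eps * v0**2 * (x**2 - y**2) / (R2 + Rc**2)

def acceleration(x, y, h=1e-5):
    # Numerical gradient of the potential; a = -grad(Phi).
    ax = -(potential(x + h, y) - potential(x - h, y)) / (2 * h)
    ay = -(potential(x, y + h) - potential(x, y - h)) / (2 * h)
    return np.array([ax, ay])

R, theta = 8.0, 0.7              # a sample point, in polar coordinates
x, y = R * np.cos(theta), R * np.sin(theta)
a = acceleration(x, y)
rhat = np.array([np.cos(theta), np.sin(theta)])

# Option 1: azimuthally average the inward radial acceleration around the ring.
thetas = np.linspace(0, 2 * np.pi, 360, endpoint=False)
a_R_ring = [-(acceleration(R*np.cos(t), R*np.sin(t)) @ np.array([np.cos(t), np.sin(t)]))
            for t in thetas]
v1 = np.sqrt(R * np.mean(a_R_ring))

# Option 2: v^2/R with the local magnitude of the acceleration.
v2 = np.sqrt(R * np.linalg.norm(a))

# Option 3: v^2/R with only the local radial component of the acceleration.
v3 = np.sqrt(R * max(-(a @ rhat), 0.0))

print(v1, v2, v3)   # three slightly different "circular velocities"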

While I was proctoring an exam, Megan Bedell (Flatiron) wrote me to say that our one-d, data-driven spectroscopic RV extraction code is now performing almost as well as the HARPS pipeline, on real data. That's exciting. We had a short conversation about extending our analysis to more stars to make the point better. We believe that our special sauce is our treatment of the tellurics, but we are not yet certain of this.

John BaezExcitonium

In certain crystals you can knock an electron out of its favorite place and leave a hole: a place with a missing electron. Sometimes these holes can move around like particles. And naturally these holes attract electrons, since they are places an electron would want to be.

Since an electron and a hole attract each other, they can orbit each other. An orbiting electron-hole pair is a bit like a hydrogen atom, where an electron orbits a proton. All of this is quantum-mechanical, of course, so you should be imagining smeared-out wavefunctions, not little dots moving around. But imagine dots if it’s easier.

An orbiting electron-hole pair is called an exciton, because while it acts like a particle in its own right, it’s really just a special kind of ‘excited’ electron—an electron with extra energy, not in its lowest energy state where it wants to be.

An exciton usually doesn’t last long: the orbiting electron and hole spiral towards each other, the electron finds the hole it’s been seeking, and it settles down.

But excitons can last long enough to do interesting things. In 1978 the Russian physicist Abrikosov wrote a short and very creative paper in which he raised the possibility that excitons could form a crystal in their own right! He called this new state of matter excitonium.

In fact his reasoning was very simple.

Just as electrons have a mass, so do holes. That sounds odd, since a hole is just a vacant spot where an electron would like to be. But such a hole can move around. It has more energy when it moves faster, and it takes force to accelerate it—so it acts just like it has a mass! The precise mass of a hole depends on the nature of the substance we’re dealing with.

Now imagine a substance with very heavy holes.

When a hole is much heavier than an electron, it will stand almost still when an electron orbits it. So, they form an exciton that’s very similar to a hydrogen atom, where we have an electron orbiting a much heavier proton.

Hydrogen comes in different forms: gas, liquid, solid… and at extreme pressures, like in the core of Jupiter, hydrogen becomes metallic. So, we should expect that excitons can come in all these different forms too!

We should be able to create an exciton gas… an exciton liquid… an exciton solid…. and under the right circumstances, a metallic crystal of excitons. Abrikosov called this metallic excitonium.

People have been trying to create this stuff for a long time. Some claim to have succeeded. But a new paper claims to have found something else: a Bose–Einstein condensate of excitons:

• Anshul Kogar et al, Signatures of exciton condensation in a transition metal dichalcogenide, Science (2017).

A lone electron acts like a fermion, so I guess a hole does too, and if so that means an exciton acts approximately like a boson. When it’s cold, a gas of bosons will ‘condense’, with a significant fraction of them settling into the lowest energy states available. I guess excitons have been seen to do this!

There’s a pretty good simplified explanation at the University of Illinois website:

• Siv Schwink, Physicists excited by discovery of new form of matter, excitonium, 7 December 2017.

However, the picture on this page, which I used above, shows domain walls moving through crystallized excitonium. I think that’s different than a Bose-Einstein condensate!

I urge you to look at Abrikosov’s paper. It’s short and beautiful:

• Alexei Alexeyevich Abrikosov, A possible mechanism of high temperature superconductivity, Journal of the Less Common Metals 62 (1978), 451–455.

(Cool journal title. Is there a journal of the more common metals?)

In this paper, Abrikosov points out that previous authors had the idea of metallic excitonium. Maybe his new idea was that this might be a superconductor—and that this might explain high-temperature superconductivity. The reason for his guess is that metallic hydrogen, too, is widely suspected to be a superconductor.

Later, Abrikosov won the Nobel prize for some other ideas about superconductors. I think I should read more of his papers. He seems like one of those physicists with great intuitions.

Puzzle 1. If a crystal of excitons conducts electricity, what is actually going on? That is, which electrons are moving around, and how?

This is a fun puzzle because an exciton crystal is a kind of abstract crystal created by the motion of electrons in another, ordinary, crystal. And that leads me to another puzzle, that I don’t know the answer to:

Puzzle 2. Is it possible to create a hole in excitonium? If so, is it possible to create an exciton in excitonium? If so, is it possible to create meta-excitonium: a crystal of excitons in excitonium?


December 09, 2017

David HoggGaia-based training data, GANs, and optical interferometry

In today's Gaia DR2 working meeting, I worked with Christina Eilers (MPIA) to build the APOGEE+TGAS training set we could use to train her post-Cannon model of stellar spectra. The important idea behind the new model is that we are no longer trying to specify the latent parameters that control the spectral generation; we are using uninterpreted latents. For this reason, we don't need complete labels (or any labels!) for the training set. That means we can train on, and predict, any labels or label subset we like. We are going to use absolute magnitude, and thereby put distances onto all APOGEE giants. And thereby map the Milky Way!

In stars group meeting, Richard Galvez (NYU) started a lively discussion by showing how generative adversarial networks work and giving some impressive examples on astronomical imaging data. This led into some good discussion about uses and abuses of complex machine-learning methods in astrophysics.

Also in stars meeting, Oliver Pfuhl (MPA) described to us how the VLT four-telescope interferometric imager GRAVITY works. It is a tremendously difficult technical problem to perform interferometric imaging in the optical: You have to keep everything aligned in real time to a tiny fraction of a micron, and you have little carts with mirrors zipping down tunnels at substantial speeds! The instrument is incredibly impressive: It is performing milli-arcsecond astrometry of the Galactic Center, and it can see star S2 move on a weekly basis!

Doug NatelsonFinding a quantum phase transition, part 1

I am going to try to get the post frequency back up now that some tasks are getting off the to-do list....

Last year, we found what seems to be a previously undiscovered quantum phase transition, and I think it's kind of a fun example of how this kind of science gets done, with a few take-away lessons for students.  The paper itself is here.

My colleague Jun Lou and I had been interested in low-dimensional materials with interesting magnetic properties for a while (back before it was cool, as the hipsters say).  The 2d materials craze continues, and a number of these are expected to have magnetic ordering of various kinds.  For example, even down to atomically thin single layers, Cr2Ge2Te6 is a ferromagnetic insulator (see here), as is CrI3 (see here).  The 2d material VS2 had been predicted to be a ferromagnet in the single-layer limit.  

In the pursuit of VS2, Prof. Lou's student Jiangtan Yuan found that the vanadium-sulphur phase diagram is rather finicky, and we ended up with a variety of crystals of V5S8 with thicknesses down to about 10 nm (a few unit cells).  

[Lesson 1:  Just because they're not the samples you want doesn't mean that they're uninteresting.]   

It turns out that V5S8  had been investigated in bulk form (that is, mm-cm sized crystals) rather heavily by several Japanese groups starting in the mid-1970s.  They discovered and figured out quite a bit.  Using typical x-ray methods they found the material's structure:  It's better to think of V5S8  as V0.25VS2.  There are VS2 layers with an ordered arrangement of vanadium atoms intercalated in the interlayer space.  By measuring electrical conduction, they found that the system as a whole is metallic.   Using neutron scattering, they showed that there are unpaired 3d electrons that are localized to those intercalated vanadium atoms, and that those local magnetic moments order antiferromagnetically below a Neel temperature of 32 K in the bulk.  The moments like to align (antialign) along a direction close to perpendicular to the VS2 layers, as shown in the top panel of the figure.   (Antiferromagnetism can be tough to detect, as it does not produce the big stray magnetic fields that we all associate with ferromagnetism. )

If a large magnetic field is applied perpendicular to the layers, the spins that are anti-aligned with the field become very energetically unfavored.  It becomes energetically favorable for the spins to find some way to avoid antialignment but still keep the antiferromagnetism.  The result is a spin-flop transition, in which the moments keep their antiferromagnetism but flop down toward the plane, as in the lower panel of the figure.  What's particularly nice in this system is that this ends up producing a kink in the electrical resistance vs. magnetic field that is a clear, unambiguous signature of the spin flop, and therefore a way of spotting antiferromagnetism electrically.

My student Will Hardy figured out how to make reliable electrical contact to the little, thin V5S8 crystals (not a trivial task), and we found the physics described above.  However, we also stumbled on a mystery that I'll leave you as a cliff-hanger until the next post:  Just below the Neel temperature, we didn't just find the spin-flop kink.  Instead, we found hysteresis in the magnetoresistance, over an extremely narrow temperature range, as shown here.

[Lesson 2:  New kinds of samples can make "old" materials young again.]

[Lesson 3:  Don't explore too coarsely.  We could easily have missed that entire ~2.5 K temperature window in which the hysteresis is visible with our magnetic field range.]

Tune in next time for the rest of the story....

December 08, 2017

John BaezWigner Crystals

I’d like to explain a conjecture about Wigner crystals, which we came up with in a discussion on Google+. It’s a purely mathematical conjecture that’s pretty simple to state, motivated by the picture above. But let me start at the beginning.

Electrons repel each other, so they don’t usually form crystals. But if you trap a bunch of electrons in a small space, and cool them down a lot, they will try to get as far away from each other as possible—and they can do this by forming a crystal!

This is sometimes called an electron crystal. It’s also called a Wigner crystal, because the great physicist Eugene Wigner predicted in 1934 that this would happen.

Only since the late 1980s have we been able to make electron crystals in the lab. Such a crystal can only form if the electron density is low enough. The reason is that even at absolute zero, a gas of electrons has kinetic energy. At absolute zero the gas will minimize its energy. But it can’t do this by having all the electrons in a state with zero momentum, since you can’t put two electrons in the same state, thanks to the Pauli exclusion principle. So, higher momentum states need to be occupied, and this means there’s kinetic energy. And there’s more kinetic energy if the density is high: if there’s less room in position space, the electrons are forced to occupy more room in momentum space.

When the density is high, this prevents the formation of a crystal: instead, we have lots of electrons whose wavefunctions are ‘sitting almost on top of each other’ in position space, but with different momenta. They’ll have lots of kinetic energy, so minimizing kinetic energy becomes more important than minimizing potential energy.

When the density is low, this effect becomes unimportant, and the electrons mainly try to minimize potential energy. So, they form a crystal with each electron avoiding the rest. It turns out they form a body-centered cubic: a crystal lattice formed of cubes, with an extra electron in the middle of each cube.

To know whether a uniform electron gas at zero temperature forms a crystal or not, you need to work out its so-called Wigner-Seitz radius. This is the average inter-particle spacing measured in units of the Bohr radius. The Bohr radius is the unit of length you can cook up from the electron mass, the electron charge and Planck’s constant:

\displaystyle{ a_0=\frac{\hbar^2}{m_e e^2} }

It’s mainly famous as the average distance between the electron and a proton in a hydrogen atom in its lowest energy state.

Simulations show that a 3-dimensional uniform electron gas crystallizes when the Wigner–Seitz radius is at least 106. The picture, however, shows an electron crystal in 2 dimensions, formed by electrons trapped on a thin film shaped like a disk. In 2 dimensions, Wigner crystals form when the Wigner–Seitz radius is at least 31. In the picture, the density is so low that we can visualize the electrons as points with well-defined positions.

So, the picture simply shows a bunch of points x_i trying to minimize the potential energy, which is proportional to

\displaystyle{ \sum_{i \ne j} \frac{1}{\|x_i - x_j\|} }

The lines between the dots are just to help you see what’s going on. They’re showing the Delaunay triangulation, where we draw a graph that divides the plane into regions closer to one electron than all the rest, and then take the dual of that graph.

Thanks to energy minimization, this triangulation wants to be a lattice of equilateral triangles. But since such a triangular lattice doesn’t fit neatly into a disk, we also see some ‘defects’:

Most electrons have 6 neighbors. But there are also some red defects, which are electrons with 5 neighbors, and blue defects, which are electrons with 7 neighbors.

Note that there are 6 clusters of defects. In each cluster there is one more red defect than blue defect. I think this is not a coincidence.

Conjecture. When we choose a sufficiently large number of points x_i on a disk in such a way that

\displaystyle{ \sum_{i \ne j} \frac{1}{\|x_i - x_j\|} }

is minimized, and draw the Delaunay triangulation, there will be 6 more vertices with 5 neighbors than vertices with 7 neighbors.

Here’s a bit of evidence for this, which is not at all conclusive. Take a sphere and triangulate it in such a way that each vertex has 5, 6 or 7 neighbors. Then here’s a cool fact: there must be 12 more vertices with 5 neighbors than vertices with 7 neighbors.

Puzzle. Prove this fact.

If we think of the picture above as the top half of a triangulated sphere, then each vertex in this triangulated sphere has 5, 6 or 7 neighbors. So, there must be 12 more vertices on the sphere with 5 neighbors than with 7 neighbors. So, it makes some sense that the top half of the sphere will contain 6 more vertices with 5 neighbors than with 7 neighbors. But this is not a proof.

I have a feeling this energy minimization problem has been studied with various numbers of points. So, there may either be a lot of evidence for my conjecture, or some counterexamples that will force me to refine it. The picture shows what happens with 600 points on the disk. Maybe something dramatically different happens with 599! Maybe someone has even proved theorems about this. I just haven’t had time to look for such work.
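For anyone who wants to poke at this numerically, here is a minimal sketch (Python, assuming numpy and scipy; the particle number, confinement penalty, and random seed are arbitrary choices): minimize the Coulomb energy of points held inside a disk, take the Delaunay triangulation, and count vertices with 5 and 7 neighbors. Local minima and boundary vertices can easily spoil the count, so this is only suggestive.

import numpy as np
from scipy.optimize import minimize
from scipy.spatial import Delaunay
from scipy.spatial.distance import pdist

N = 100                                  # number of "electrons" (arbitrary)
rng = np.random.default_rng(1)

def energy(flat):
    pts = flat.reshape(N, 2)
    coulomb = np.sum(1.0 / pdist(pts))   # sum over pairs of 1/|x_i - x_j|
    r = np.linalg.norm(pts, axis=1)
    wall = 1e4 * np.sum(np.clip(r - 1.0, 0.0, None) ** 2)   # soft disk boundary
    return coulomb + wall

start = rng.uniform(-0.6, 0.6, size=(N, 2)).ravel()   # random start in the disk
res = minimize(energy, start, method='L-BFGS-B', options={'maxiter': 500})
pts = res.x.reshape(N, 2)

# Delaunay triangulation and vertex degrees
tri = Delaunay(pts)
neighbors = [set() for _ in range(N)]
for simplex in tri.simplices:
    for i in simplex:
        neighbors[i].update(simplex)
degrees = np.array([len(s) - 1 for s in neighbors])    # exclude the vertex itself

n5 = np.count_nonzero(degrees == 5)
n7 = np.count_nonzero(degrees == 7)
print(n5, n7, n5 - n7)   # compare with the conjectured difference of 6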

The picture here was drawn by Arunas.rv and placed on Wikicommons under a Creative Commons Attribution-Share Alike 3.0 Unported license.


December 07, 2017

Tommaso DorigoAlpha Zero Teaches Itself Chess 4 Hours, Then Beats Dad

Peter Heine Nielsen, a Danish chess Grandmaster, summarized it quite well. "I always wondered, if some superior alien race came to Earth, how they would play chess. Now I know". The architecture that beat humans at the notoriously CPU-impervious game Go, AlphaGo by Google Deep Mind, was converted to allow the machine to tackle other "closed-rules" games. Subsequently, the program was given the rules of chess, and a huge battery of Google's GPUs to train itself on the game. Within four hours, the alien emerged. And it is indeed a new class of player.

read more

December 06, 2017

BackreactionThe cosmological constant is not the worst prediction ever. It’s not even a prediction.

Think fake news and echo chambers are a problem only in political discourse? Think again. You find many examples of myths and falsehoods on popular science pages. Most of them surround the hype of the day, but some of them have been repeated so often they now appear in papers, seminar slides, and textbooks. And many scientists, I have noticed with alarm, actually believe them. I can’t say

BackreactionBook Update

As you probably noticed from the uptick in blogposts, I’ve finished writing the book. The publication date is set for June 12, 2018. We have a cover image now: and we have a webpage, where you can preorder my masterwork. The publishing business continues to surprise me. I have no idea who wrote the text accompanying the Amazon page and, for all I can tell, the first sentence doesn’t even make

December 05, 2017

John PreskillThe light show


A strontium magneto-optical trap.

How did a quantum physics experiment end up looking like a night club? Due to a fortunate coincidence of nature, my lab mates and I at Endres Lab get to use three primary colors of laser light – red, blue, and green – to trap strontium atoms.  Let’s take a closer look at the physics behind this visually entrancing combination.

The spectrum


The electronic spectrum of strontium near the ground state.

The trick to research is finding a problem that is challenging enough to be interesting, but accessible enough to not be impossible.  Strontium embodies this maxim in its electronic spectrum.  While at first glance it may seem daunting, it’s not too bad once you get to know each other.  Two valence electrons divide the spectrum into a spin-singlet sector and a spin-triplet sector – a designation that roughly defines whether the electron spins point in the opposite or in the same direction.  Certain transitions between these sectors are extremely precisely defined, and currently offer the best clock standards in the world.  Although navigating this spectrum requires more lasers, it offers opportunities for quantum physics that singly-valent spectra do not.  In the end, the experimental complexity is still very much manageable, and produces some great visuals to boot.  Here are some of the lasers we use in our lab:

The blue

At the center of the .gif above is a pulsating cloud of strontium atoms, shining brightly blue.  This is a magneto-optical trap, produced chiefly by strontium’s blue transition at 461nm.


461nm blue laser light being routed through various paths.

The blue transition is exceptionally strong, scattering about 100 million photons per atom per second.  It is the transition we use to slow strontium atoms from a hot thermal beam traveling at hundreds of meters per second down to a cold cloud at about 1 milliKelvin.  In less than a second, this procedure gives us a couple hundred million atoms to work with.  As the experiment repeats, we get to watch this cloud pulse in and out of existence.

The red(s)


689nm red light.  Bonus: Fabry-Perot interference fringes on my camera!

While the blue transition is a strong workhorse, the red transition at 689nm trades off strength for precision.  It couples strontium’s spin-singlet ground state to an excited spin-triplet state, a much weaker but more precisely defined transition.  While it does not scatter as fast as the blue (only about 23,000 photons per atom per second), it allows us to cool our atoms to much colder temperatures, on the order of 1 microKelvin.

In addition to our red laser at 689nm, we have two other reds at 679nm and 707nm.  These are necessary to essentially plug “holes” in the blue transition, which eventually cause an atom to fall into long-lived states other than the ground state.  It is generally true that the more complicated an atomic spectrum gets, the more “holes” there are to plug, and this is many times the reason why certain atoms and molecules are harder to trap than others.

The green

After we have established a cold magneto-optical trap, it is time to pick out individual atoms from this cloud and load them into very tightly focused optical traps that we call tweezers.  Here, our green laser comes into play.  This laser’s wavelength is far away from any particular transition, as we do not want it to scatter any photons at all.  However, its large intensity creates a conservative trapping potential for the atom, allowing us to hold onto it and even move it around.  Furthermore, its wavelength is what we call “magic”, which means it is chosen such that the ground and excited state experience the same trapping potential.


The quite powerful green laser.  So powerful that you can see the beam in the air, like in the movies.

The invisible

Yet to be implemented are two more lasers slightly off the visible spectrum at both the ultraviolet and infrared sides.  Our ultraviolet laser will be crucial to elevating our experiment from single-body to many-body quantum physics, as it will allow us to drive our atoms to very highly excited Rydberg states which interact over long range.  Our infrared laser will allow us to trap atoms in the extremely precise clock state under “magic” conditions.

 

The combination of strontium’s various optical pathways allows for a lot of new tricks beyond just cooling and trapping.  Having Rydberg states alongside narrow-line transitions, for example, has as-yet unexplored potential for quantum simulation.  It is a playground that is very exciting without being utterly overwhelming.  Stay tuned as we continue our exploration – maybe we’ll have a yellow laser next time too.

 


n-Category Café The 2-Dialectica Construction: A Definition in Search of Examples

An adjunction is a pair of functors f:A\to B and g:B\to A along with a natural isomorphism

A(a,g b) \cong B(f a,b).

Question 1: Do we get any interesting things if we replace “isomorphism” in this definition by something else?

  • If we replace it by “function”, then the Yoneda lemma tells us we get just a natural transformation f g \to 1_B.
  • If we replace it by “retraction” then we get a unit and counit, as in an adjunction, satisfying one triangle identity but not the other.
  • If A and B are 2-categories and we replace it by “equivalence”, we get a biadjunction.
  • If A and B are 2-categories and we replace it by “adjunction”, we get a sort of lax 2-adjunction (a.k.a. “local adjunction”)

Are there other examples?

Question 2: What if we do the same thing for multivariable adjunctions?

A two-variable adjunction is a triple of functors f:A\times B\to C and g:A^{op}\times C\to B and h:B^{op}\times C\to A along with natural isomorphisms

C(f(a,b),c) \cong B(b,g(a,c)) \cong A(a,h(b,c)).

What does it mean to “replace ‘isomorphism’ by something else” here? It could mean different things, but one thing it might mean is to ask instead for a function

A(a,h(b,c)) \times B(b,g(a,c)) \to C(f(a,b),c).

Even more intriguingly, if A, B, C are 2-categories, we could ask for an ordinary two-variable adjunction between these three hom-categories; this would give a certain notion of “lax two-variable 2-adjunction”. Question 2 is, are notions like this good for anything? Are there any natural examples?

Now, you may, instead, be wondering about

Question 3: In what sense is a function A(a,h(b,c)) \times B(b,g(a,c)) \to C(f(a,b),c) a “replacement” for isomorphisms C(f(a,b),c) \cong B(b,g(a,c)) \cong A(a,h(b,c))?

But that question, I can answer; it has to do with comparing the Chu construction and the Dialectica construction.

Last month I told you about how multivariable adjunctions form a polycategory that sits naturally inside the 2-categorical Chu construction Chu(Cat,Set).

Now the classical Chu construction is, among other things, a way to produce \ast-autonomous categories, which are otherwise in somewhat short supply. At first, I found that rather disincentivizing to study either one: why would I be interested in a contrived way to construct things that don’t occur naturally? But then I realized that the same sentence would make sense if you replaced “Chu construction” with “sheaves on a site” and “\ast-autonomous categories” with “toposes”, and I certainly think those are interesting. So now it doesn’t bother me as much.

Anyway, there is also another general construction of *\ast-autonomous categories (and, in fact, more general things), which goes by the odd name of the “Dialectica construction”. The categorical Dialectica construction is an abstraction, due to Valeria de Paiva, of a syntactic construction due to Gödel, which in turn is referred to as the “Dialectica interpretation” apparently because it was published in the journal Dialectica. I must say that I cannot subscribe to this as a general principle for the naming of mathematical definitions; fortunately it does not seem to have been very widely adopted.

Anyway, however execrable its name, the Dialectica construction appears quite similar to the Chu construction. Both start from a closed symmetric monoidal category \mathcal{C} equipped with a chosen object, which in this post I’ll call \Omega. (Actually, there are various versions of both, but here I’m going to describe two versions that are maximally similar, as de Paiva did in her paper Dialectica and Chu constructions: Cousins?.) Moreover, both Chu(\mathcal{C},\Omega) and Dial(\mathcal{C},\Omega) have the same objects: triples A=(A^+,A^-,\underline{A}) where A^+,A^- are objects of \mathcal{C} and \underline{A} : A^+ \otimes A^- \to \Omega is a morphism in \mathcal{C}. Finally, the morphisms f:A\to B in both Chu(\mathcal{C},\Omega) and Dial(\mathcal{C},\Omega) consist of a pair of morphisms f^+ : A^+ \to B^+ and f^- : B^- \to A^- (note the different directions) subject to some condition.

The only difference is in the conditions. In Chu(\mathcal{C},\Omega), the condition is that the composites

A^+ \otimes B^- \xrightarrow{1\otimes f^-} A^+ \otimes A^- \xrightarrow{\underline{A}} \Omega

A^+ \otimes B^- \xrightarrow{f^+\otimes 1} B^+ \otimes B^- \xrightarrow{\underline{B}} \Omega

are equal. But in Dial(\mathcal{C},\Omega), we assume that \Omega is equipped with an internal preorder, and require that the first of these composites is \le the second with respect to this preorder.
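To make the difference between the two conditions concrete, here is a small finite-set sketch (in Python, purely for illustration; the objects, maps, and the two-element preorder standing in for \Omega are all invented): the same pair (f^+, f^-) can fail the Chu equality while satisfying the Dialectica inequality.

# Objects are triples (A_plus, A_minus, pairing), pairing : A_plus x A_minus -> Omega.

def is_chu_morphism(A, B, f_plus, f_minus):
    # Condition in Chu(Set, Omega): the two composites are equal.
    A_plus, A_minus, alpha = A
    B_plus, B_minus, beta = B
    return all(alpha[a, f_minus[b]] == beta[f_plus[a], b]
               for a in A_plus for b in B_minus)

def is_dial_morphism(A, B, f_plus, f_minus, leq):
    # Condition in Dial(Set, Omega): the first composite is <= the second.
    A_plus, A_minus, alpha = A
    B_plus, B_minus, beta = B
    return all(leq(alpha[a, f_minus[b]], beta[f_plus[a], b])
               for a in A_plus for b in B_minus)

# Example: Omega = {0, 1} with the usual order 0 <= 1.
leq = lambda u, v: u <= v

A_plus, A_minus = {'a0', 'a1'}, {'x0', 'x1'}
B_plus, B_minus = {'b0'}, {'y0'}
alpha = {(a, x): int(a == 'a1' and x == 'x1') for a in A_plus for x in A_minus}
beta = {('b0', 'y0'): 1}
A, B = (A_plus, A_minus, alpha), (B_plus, B_minus, beta)

f_plus = {'a0': 'b0', 'a1': 'b0'}      # forward map A_plus -> B_plus
f_minus = {'y0': 'x1'}                 # backward map B_minus -> A_minus

print(is_chu_morphism(A, B, f_plus, f_minus))        # False: 0 != 1 when a = 'a0'
print(is_dial_morphism(A, B, f_plus, f_minus, leq))  # True: only 0 <= 1 is needed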

Now you can probably see where Question 1 above comes from. The 2-category of categories and adjunctions sits inside Chu(Cat,Set) as the objects of the form (A,A^{op},hom_A). The analogous category sitting inside Dial(Cat,Set), where Set is regarded as an internal category in Cat in the obvious way, would consist of “generalized adjunctions” of the first sort, with simple functions A(a,g b) \to B(f a,b) rather than isomorphisms. Other “2-Dialectica constructions” would yield other sorts of generalized adjunction.

What about Questions 2 and 3? Well, back up a moment: the above description of the Chu and Dialectica constructions actually exaggerates their similarity, because it omits their monoidal structures. As a mere category, Chu(\mathcal{C},\Omega) is clearly the special case of Dial(\mathcal{C},\Omega) where \Omega has a discrete preorder (i.e. x\le y iff x=y). But Chu(\mathcal{C},\Omega) is always \ast-autonomous, as long as \mathcal{C} has pullbacks; whereas for Dial(\mathcal{C},\Omega) to be monoidal, closed, or \ast-autonomous we require the preorder \Omega to have those same properties, which a discrete preorder certainly does not always have. And even when a discrete preorder \Omega does have some or all of those properties, the resulting monoidal structure of Dial(\mathcal{C},\Omega) does not coincide with that of Chu(\mathcal{C},\Omega).

As happens so often, the situation is clarified by considering universal properties. That is, rather than comparing the concrete constructions of the tensor products in Chu(\mathcal{C},\Omega) and Dial(\mathcal{C},\Omega), we should compare the functors that they represent. A morphism A\otimes B\to C in Chu(\mathcal{C},\Omega) consists of three morphisms f:A^+\otimes B^+\to C^+ and g:A^+ \otimes C^- \to B^- and h:B^+ \otimes C^- \to A^- such that a certain three morphisms A^+ \otimes B^+ \otimes C^- \to \Omega are equal. In terms of “formal elements” a:A^+, b:B^+, c:C^- in the internal type theory of \mathcal{C}, these three morphisms can be written as

\underline{C}(f(a,b),c) \qquad \underline{B}(b,g(a,c)) \qquad \underline{A}(a,h(b,c))

just as in a two-variable adjunction. By contrast, a morphism A\otimes B\to C in Dial(\mathcal{C},\Omega) consists of three morphisms f,g,h of the same sorts, but such that

\underline{B}(b,g(a,c)) \boxtimes \underline{A}(a,h(b,c)) \le \underline{C}(f(a,b),c)

where \boxtimes denotes the tensor product of the monoidal preorder \Omega. Now you can probably see where Question 2 comes from: if in constructing Dial(Cat,Set) we equip Set with its usual monoidal structure, we get generalized 2-variable adjunctions with a function A(a,h(b,c)) \times B(b,g(a,c)) \to C(f(a,b),c), and for other choices of \Omega we get other kinds.

This is already somewhat of an answer to Question 3: the analogy between ordinary adjunctions and these “generalized adjunctions” is the same as between the Chu and Dialectica constructions. But it’s more satisfying to make both of those analogies precise, and we can do that by generalizing the Dialectica construction to allow \Omega to be an internal polycategory rather than merely an internal poset (or category). If this polycategory structure is representable, then we recover the original Dialectica construction. Whereas if we give an arbitrary object \Omega the (non-representable) “Frobenius-discrete” polycategory structure, in which a morphism (x_1,\dots,x_m) \to (y_1,\dots,y_n) is the assertion that x_1=\cdots=x_m=y_1=\cdots=y_n, then we recover the original Chu construction.

For a general internal polycategory \Omega, the resulting “Dialectica-Chu” construction will be only a polycategory. But it is representable in the Dialectica case if \Omega is representable, and it is representable in the Chu case if \mathcal{C} has pullbacks. This explains why the tensor products in Chu(\mathcal{C},\Omega) and Dial(\mathcal{C},\Omega) look different: they are representing two instances of the same functor, but they represent it for different reasons.

So… what about Questions 1 and 2? In other words: if the reason I care about the Chu construction is because it’s an abstraction of multivariable adjunctions, why should I care about the Dialectica construction?

December 04, 2017

Andrew JaffeWMAP Breaks Through

It was announced this morning that the WMAP team has won the $3 million Breakthrough Prize. Unlike the Nobel Prize, which infamously is only awarded to three people each year, the Breakthrough Prize was awarded to the whole 27-member WMAP team, led by Chuck Bennett, Gary Hinshaw, Norm Jarosik, Lyman Page, and David Spergel, but including everyone through postdocs and grad students who worked on the project. This is great, and I am happy to send my hearty congratulations to all of them (many of whom I know well and am lucky to count as friends).

I actually knew about the prize last week as I was interviewed by Nature for an article about it. Luckily I didn’t have to keep the secret for long. Although I admit to a little envy, it’s hard to argue that the prize wasn’t deserved. WMAP was ideally placed to solidify the current standard model of cosmology, a Universe dominated by dark matter and dark energy, with strong indications that there was a period of cosmological inflation at very early times, which had several important observational consequences. First, it made the geometry of the Universe — as described by Einstein’s theory of general relativity, which links the contents of the Universe with its shape — flat. Second, it generated the tiny initial seeds which eventually grew into the galaxies that we observe in the Universe today (and the stars and planets within them, of course).

By the time WMAP released its first results in 2003, a series of earlier experiments (including MAXIMA and BOOMERanG, which I had the privilege of being part of) had gone much of the way toward this standard model. Indeed, about ten years ago one of my Imperial colleagues, Carlo Contaldi, and I wanted to make that comparison explicit, so we used what were then considered fancy Bayesian sampling techniques to combine the data from balloons and ground-based telescopes (which are collectively known as “sub-orbital” experiments) and compare the results to WMAP. We got a plot like the following (which we never published), showing the main quantity that these CMB experiments measure, called the power spectrum (which I’ve discussed in a little more detail here). The horizontal axis corresponds to the size of structures in the map (actually, its inverse, so smaller is to the right) and the vertical axis to how large the signal is on those scales.

Grand unified spectrum

As you can see, the suborbital experiments, en masse, had data at least as good as WMAP on most scales except the very largest (leftmost; this is because you really do need a satellite to see the entire sky) and indeed were able to probe smaller scales than WMAP (to the right). Since then, I’ve had the further privilege of being part of the Planck Satellite team, whose work has superseded all of these, giving much more precise measurements over all of these scales.

Am I jealous? Ok, a little bit.

But it’s also true, perhaps for entirely sociological reasons, that the community is more apt to trust results from a single, monolithic, very expensive satellite than an ensemble of results from a heterogeneous set of balloons and telescopes, run on (comparative!) shoestrings. On the other hand, the overall agreement amongst those experiments, and between them and WMAP, is remarkable.

And that agreement remains remarkable, even if much of the effort of the cosmology community is devoted to understanding the small but significant differences that remain, especially between one monolithic and expensive satellite (WMAP) and another (Planck). Indeed, those “real and serious” (to quote myself) differences would be hard to see even if I plotted them on the same graph. But since both are ostensibly measuring exactly the same thing (the CMB sky), any differences — even those much smaller than the error bars — must be accounted for, and almost certainly boil down to differences in the analyses or to misunderstandings of each team’s own data. Somewhat more interesting are differences between CMB results and measurements of cosmology from other, very different, methods, but that’s a story for another day.

December 03, 2017

Tommaso DorigoAnother Nice Review Of "Anomaly!"

December 01, 2017

John BaezA Universal Snake-like Continuum

It sounds like jargon from a bad episode of Star Trek. But it’s a real thing. It’s a monstrous object that lives in the plane, but is impossible to draw.

Do you want to see how snake-like it is? Okay, but beware… this video clip is a warning:

This snake-like monster is also called the ‘pseudo-arc’. It’s the limit of a sequence of curves that get more and more wiggly. Here are the 5th and 6th curves in the sequence:



Here are the 8th and 10th:



But what happens if you try to draw the pseudo-arc itself, the limit of all these curves? It turns out to be infinitely wiggly—so wiggly that any picture of it is useless.

In fact Wayne Lewis and Piotr Minc wrote a paper about this, called Drawing the pseudo-arc. That’s where I got these pictures. The paper also shows stage 200, and it’s a big fat ugly black blob!



But the pseudo-arc is beautiful if you see through the pictures to the concepts, because it’s a universal snake-like continuum. Let me explain. This takes some math.

The nicest metric spaces are compact metric spaces, and each of these can be written as the union of connected components… so there’s a long history of interest in compact connected metric spaces. Except for the empty set, which probably doesn’t deserve to be called connected, these spaces are called continua.

Like all point-set topology, the study of continua is considered a bit old-fashioned, because people have been working on it for so long, and it’s hard to get good new results. But on the bright side, what this means is that many great mathematicians have contributed to it, and there are lots of nice theorems. You can learn about it here:

• W. T. Ingram, A brief historical view of continuum theory,
Topology and its Applications 153 (2006), 1530–1539.

• Sam B. Nadler, Jr, Continuum Theory: An Introduction, Marcel Dekker, New York, 1992.

Now, if we’re doing topology, we should really talk not about metric spaces but about metrizable spaces: that is, topological spaces where the topology comes from some metric, which is not necessarily unique. This nuance is a way of clarifying that we don’t really care about the metric, just the topology.

So, we define a continuum to be a nonempty compact connected metrizable space. When I think of this I think of a curve, or a ball, or a sphere. Or maybe something bigger like the Hilbert cube: the countably infinite product of closed intervals. Or maybe something full of holes, like the Sierpinski carpet:



or the Menger sponge:



Or maybe something weird like a solenoid:



Very roughly, a continuum is ‘snake-like’ if it’s long and skinny and doesn’t loop around. But the precise definition is a bit harder:

We say that an open cover 𝒰 of a space X refines an open cover 𝒱 if each element of 𝒰 is contained in an element of 𝒱. We call a continuum X snake-like if each open cover of X can be refined by a finite open cover U1, …, Un such that for any i, j the intersection of Ui and Uj is nonempty iff |i − j| ≤ 1: that is, iff i and j are equal or right next to each other.

Such a cover is called a chain, so a snake-like continuum is also called chainable. But ‘snake-like’ is so much cooler: we should take advantage of any opportunity to bring snakes into mathematics!

The simplest snake-like continuum is the closed unit interval [0,1]. It’s hard to think of others. But here’s what Mioduszewski proved in 1962: the pseudo-arc is a universal snake-like continuum. That is: it’s a snake-like continuum, and it has a continuous map onto every snake-like continuum!

This is a way of saying that the pseudo-arc is the most complicated snake-like continuum possible. A bit more precisely: it bends back on itself as much as possible while still going somewhere! You can see this from the pictures above, or from the construction on Wikipedia:

• Wikipedia, Pseudo-arc.

I like the idea that there’s a subset of the plane with this simple ‘universal’ property, which however is so complicated that it’s impossible to draw.

Here’s the paper where these pictures came from:

• Wayne Lewis and Piotr Minc, Drawing the pseudo-arc, Houston J. Math. 36 (2010), 905–934.

The pseudo-arc has other amazing properties. For example, it’s ‘indecomposable’. A nonempty connected closed subset of a continuum is a continuum in its own right, called a subcontinuum, and we say a continuum is indecomposable if it is not the union of two proper subcontinua.

It takes a while to get used to this idea, since all the examples of continua that I’ve listed so far are decomposable except for the pseudo-arc and the solenoid!

Of course a single point is an indecomposable continuum, but that example is so boring that people sometimes exclude it. The first interesting example was discovered by Brouwer in 1910. It’s the intersection of an infinite sequence of sets like this:


It’s called the Brouwer–Janiszewski–Knaster continuum or buckethandle. Like the solenoid, it shows up as an attractor in some chaotic dynamical systems.

It’s easy to imagine how if you write the buckethandle as the union of two closed proper subsets, at least one will be disconnected. And note: you don’t even need these subsets to be disjoint! So, it’s an indecomposable continuum.

But once you get used to indecomposable continua, you’re ready for the next level of weirdness. An even more dramatic thing is a hereditarily indecomposable continuum: one for which each subcontinuum is also indecomposable.

Apart from a single point, the pseudo-arc is the unique hereditarily indecomposable snake-like continuum! I believe this was first proved here:

• R. H. Bing, Concerning hereditarily indecomposable continua, Pacific J. Math. 1 (1951), 43–51.

Finally, here’s one more amazing fact about the pseudo-arc. To explain it, I need a bunch more nice math:

Every continuum arises as a closed subset of the Hilbert cube. There’s an obvious way to define the distance between two closed subsets of a compact metric space, called the Hausdorff distance—if you don’t know about this already, it’s fun to reinvent it yourself. The set of all closed subsets of a compact metric space thus forms a metric space in its own right—and by the way, the Blaschke selection theorem says this metric space is again compact!

Anyway, this stuff means that there’s a metric space whose points are all subcontinua of the Hilbert cube, and we don’t miss out on any continua by looking at these. So we can call this the space of all continua.

Now for the amazing fact: pseudo-arcs are dense in the space of all continua!

I don’t know who proved this. It’s mentioned here:

• Trevor L. Irwin and Sławomir Solecki, Projective Fraïssé limits and the pseudo-arc.

but they refer to this paper as a good source for such facts:

• Wayne Lewis, The pseudo-arc, Bol. Soc. Mat. Mexicana (3) 5 (1999), 25–77.

Abstract. The pseudo-arc is the simplest nondegenerate hereditarily indecomposable continuum. It is, however, also the most important, being homogeneous, having several characterizations, and having a variety of useful mapping properties. The pseudo-arc has appeared in many areas of continuum theory, as well as in several topics in geometric topology, and is beginning to make its appearance in dynamical systems. In this monograph, we give a survey of basic results and examples involving the pseudo-arc. A more complete treatment will be given in a book dedicated to this topic, currently under preparation by this author. We omit formal proofs from this presentation, but do try to give indications of some basic arguments and construction techniques. Our presentation covers the following major topics: 1. Construction 2. Homogeneity 3. Characterizations 4. Mapping properties 5. Hyperspaces 6. Homeomorphism groups 7. Continuous decompositions 8. Dynamics.

It may seem surprising that one can write a whole book about the pseudo-arc… but if you like continua, it’s a fundamental structure just like spheres and cubes!


Tommaso DorigoWhen Ignorance Kills Human Progress, And A Petition You Should Sign

An experiment designed to study neutrinos at the Gran Sasso Laboratories in Italy is under attack by populistic media. Why should you care? Because it's a glaring example of the challenges we face in the XXI century in our attempt to foster the progress of the human race.
What is a neutrino? Nothing - it's a particle as close to nothing as you can imagine. Almost massless, almost perfectly non-interacting, and yet incredibly mysterious and the key to the solution of many riddles in fundamental physics and cosmology. But it's really nothing you should worry about, or care about, if you want to lead your life oblivious of the intricacies of subnuclear physics. Which is fine of course - unless you try to use your ignorance to stop progress.

read more

November 30, 2017

BackreactionIf science is what scientists do, what happens if scientists stop doing science?

“Is this still science?” has become a recurring question in the foundations of physics. Whether it’s the multiverse, string theory, supersymmetry, or inflation, concerns abound that theoreticians have crossed a line. Science writer Jim Baggott called the new genre “fairy-tale science.” Historian Helge Kragh coined the term “higher speculations,” and Peter Woit, more recently, suggested the

November 29, 2017

John PreskillMachine learning the arXiv

Over the last year or so, the machine learning wave has really been sweeping through the field of condensed matter physics. Machine learning techniques have been applied to condensed matter physics before, but very sparsely and with little recognition. These days, I guess (partially) due to the general machine learning and AI hype, the amount of such studies skyrocketed (I admit to contributing to that..). I’ve been keeping track of this using the arXiv and Twitter (@Evert_v_N), but you should know about this website for getting an overview of the physics & machine learning papers: https://physicsml.github.io/pages/papers.html.

This effort of applying machine learning to physics is a serious attempt at trying to understand how such tools could be useful in a variety of ways. It isn’t very hard to get a neural network to learn ‘something’ from physics data, but it is really hard to find out what – and especially how – the network does that. That’s why toy cases such as the Ising model or the Kosterlitz-Thouless transition have been so important!

When you’re keeping track of machine learning and AI developments, you soon realize that there are examples out there of amazing feats. Being able to generate photo-realistic pictures given just a sentence, e.g. “a brown bird with golden speckles and red wings is sitting on a yellow flower with pointy petals”, is (I think) pretty cool. I can’t help but wonder if we’ll get to a point where we can ask it to generate “the groundstate of the Heisenberg model on a Kagome lattice of 100×100”…

Another feat I want to mention, and the main motivation for this post, is that of being able to encode words as vectors. That doesn’t immediately seem like a big achievement, but it is once you want to have ‘similar’ words have ‘similar’ vectors. That is, you intuitively understand that Queen and King are very similar, but differ basically only in gender. Can we teach that to a computer (read: neural network) by just having it read some text? Turns out we can. The general encoding of words to vectors is aptly named ‘Word2Vec’, and some of the top algorithms that do that were introduced here (https://arxiv.org/abs/1301.3781) and here (https://arxiv.org/abs/1310.4546). The neat thing is that we can actually do arithmetics with these words encoded as vectors, so that the network learns (with no other input than text!):

  • King – Man + Woman = Queen
  • Paris – France + Italy = Rome

In that spirit, I wondered if we can achieve the same thing with physics jargon. Everyone knows, namely, that “electrons + two dimensions + magnetic field = Landau levels”. But is that clear from condensed matter titles?

Try it yourself

If you decide at this point that the rest of the blog is too long, at least have a look here: everthemore.pythonanywhere.com or skip to the last section. That website demonstrates the main point of this post. If that sparks your curiosity, read on!

This post is mainly for entertainment, and so a small disclaimer is in order: in all of the results below, I am sure things can be improved upon. Consider this a ‘proof of principle’. However, I would be thrilled to see what kind of trained models you can come up with yourself! So for that purpose, all of the code (plus some bonus content!) can be found on this github repository: https://github.com/everthemore/physics2vec.

Harvesting the arXiv

The perfect dataset for our endeavor can be found in the form of the arXiv. I’ve written a small script (see github repository) that harvests the titles of a given section from the arXiv. It also has options for getting the abstracts, but I’ll leave that for a separate investigation. Note that in principle we could also get the source files of all of these papers, but doing that in bulk requires a payment; and getting them one by one will 1) take forever and 2) probably get us banned.
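For readers who want a feel for what such a harvester looks like (the actual script is in the repository linked above), here is a rough sketch against the public arXiv API; the endpoint and query parameters are the standard ones for that API, while the category, batch size, and delay are illustrative choices, and in practice one would loop over all the cond-mat subcategories.

```python
# A rough sketch of pulling titles from the arXiv API with feedparser.
import time
import urllib.parse
import feedparser  # pip install feedparser

BASE = "http://export.arxiv.org/api/query?"

def fetch_titles(category="cond-mat.str-el", batch=100, max_results=1000):
    titles = []
    for start in range(0, max_results, batch):
        query = urllib.parse.urlencode({
            "search_query": "cat:" + category,
            "start": start,
            "max_results": batch,
            "sortBy": "submittedDate",
            "sortOrder": "descending",
        })
        feed = feedparser.parse(BASE + query)
        titles += [entry.title.replace("\n", " ") for entry in feed.entries]
        time.sleep(3)  # be gentle with the API
    return titles
```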

Collecting all this data of the physics:cond-mat subsection took right about 1.5 hours and resulted in 240737 titles and abstracts (I last ran this script on November 20th, 2017). I’ve filtered them by year and month, and you can see the result in Fig.1 below. Seems like we have some catching up to do in 2017 still (although as the inset shows, we have nothing to fear. November is almost over, but we still have the ‘getting things done before x-mas’ rush coming up!).


Figure 1: The number of papers in the cond-mat arXiv section over the years. We’re behind, but the year isn’t over yet! (Data up to Nov 20th 2017)

Analyzing n-grams

After tidying up the titles (removing LaTeX, converting everything to lowercase, etc.), the next thing to do is to train a language model on finding n-grams. N-grams are basically fixed n-word expressions such as ‘cooper pair’ (bigram) or ‘metal insulator transition’ (trigram). This makes it easier to train a Word2Vec encoding, since these phrases are fixed and can be considered a single word. The python module we’ll use for Word2Vec is gensim (https://radimrehurek.com/gensim/), and it conveniently has phrase-detection built-in. The language model it builds reports back to us the n-grams it finds, and assigns them a score indicating how certain it is about them. Notice that this is not the same as how frequently it appears in the dataset. Hence an n-gram can appear fewer times than another, but have a higher certainty because it always appears in the same combination. For example, ‘de-haas-van-alphen’ appears less than, but is more certain than, ‘cooper-pair’, because ‘pair’ does not always come paired (pun intended) with ‘cooper’. I’ve analyzed up to 4-grams in the analysis below.
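As a rough sketch of what that phrase-detection step looks like in gensim (the parameter values here are made-up defaults, not the settings behind the actual model), one can chain two passes of Phrases to pick up n-grams of length up to four:

```python
# Merge frequent word pairs into single tokens, twice, so that e.g.
# "metal insulator transition" becomes the single token "metal_insulator_transition".
from gensim.models.phrases import Phrases, Phraser

def detect_ngrams(titles, min_count=5, threshold=10.0):
    sentences = [title.lower().split() for title in titles]
    bigram = Phraser(Phrases(sentences, min_count=min_count, threshold=threshold))
    trigram = Phraser(Phrases(bigram[sentences], min_count=min_count, threshold=threshold))
    return [trigram[bigram[s]] for s in sentences]
```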

I can tell you’re curious by now to find out what some of the most certain n-grams in cond-mat are (again, these are not necessarily the most frequent), so here are some interesting findings:

  • The most certain n-grams are all surname combo’s, Affleck-Kennedy-Lieb-Tasaki being the number 1. Kugel-Khomskii is the most certain 2-name combo and Einstein-Podolsky-Rosen the most certain 3-name combo.
  • The first certain non-name n-gram is a ‘quartz tuning fork’, followed by a ‘superconducting coplanar waveguide resonator’. Who knew.
  • The bigram ‘phys. rev.’ and trigram ‘phys. rev. lett.’ are relatively high up in the confidence lists. These seem to come from the “Comment on […]”-titles on the arXiv.
  • I learned that there is such a thing as a Lefschetz thimble. I also learned that those things are called thimbles in English (we (in Holland) call them ‘finger-hats’!).

In terms of frequency however, which is probably more of interest to us, the most dominant n-grams are Two-dimensional, Quantum dot, Phase transition, Magnetic field, One dimensional and Bose-Einstein (in descending order). It seems 2D is still more popular than 1D, and all in all the top n-grams do a good job at ‘defining’ condensed matter physics. I’ll refer you to the github repository code if you want to see a full list! You’ll find there a piece of code that produces wordclouds from the dominant words and n-grams too, such as this one:

caltechwordcloud.png

For fun though, before we finally get to the Word2Vec encoding, I’ve also kept track of all of these as a function of year, so that we can now turn to finding out which bigrams have been gaining the most popularity. The table below shows the top 5 n-grams for the period 2010 – 2016 (not including 2017) and for the period 2015 – Nov 20th 2017.

  2010-2016                               2015 – November 20th 2017
  -----------------------------------     ---------------------------------
  Spin liquids                            Topological phases & transitions
  Weyl semimetals                         Spin chains
  Topological phases & transitions        Machine learning
  Surface states                          Transition metal dichalcogenides
  Transition metal dichalcogenides        Thermal transport
  Many-body localization                  Open quantum systems

Actually, the real number 5 in the left column was ‘Topological insulators’, but given number 3 I skipped it. Also, this top 5 includes a number 6 (!), which I just could not leave off given that everyone seems to have been working on MBL. If we really want to be early adopters though, taking only the last 1.8 years (2015 – now, Nov 20th 2017)  in the right column of the table shows some interesting newcomers. Surprisingly, many-body localization is not even in the top 20 anymore. Suffice it to say, if you have been working on anything topology-related, you have nothing to worry about. Machine learning is indeed gaining lots of attention, but we’ve yet to see if it doesn’t go the MBL-route (I certainly don’t hope so!). Quantum computing does not seem to be on the cond-mat radar, but I’m certain we would find that high up in the quant-ph arXiv section.

CondMat2Vec

Alright, finally time to use some actual neural networks for machine learning. As I started this post, what we’re about to do is try to train a network to encode/decode words into vectors, while simultaneously making sure that similar words (by meaning!) have similar vectors. Now that we have the n-grams, we want the Word2Vec algorithm to treat these as words by themselves (they are, after all, fixed combinations).

In the Word2Vec algorithm, we get to decide the length of the vectors that encode words ourselves. Larger vectors mean more freedom in encoding words, but also make it harder to learn similarity. In addition, we get to choose a window size, indicating how many words the algorithm will look ahead to analyze relations between words. Both of these parameters are free for you to play with if you have a look at the source code repository. For the website everthemore.pythonanywhere.com, I’ve uploaded a model with vector size 100 and window size 10, which I found to produce sensible results. Sensible here means “based on my expectations”, such as the previous example of “2D + electrons + magnetic field = Landau levels”. Let’s ask our network some questions.
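A minimal training call with those settings might look like the following; ngram_sentences stands for the tokenized, phrase-merged titles from the sketches above (an illustrative name, not one from the repository), and note that gensim versions from around 2017 call the vector-length parameter size, while gensim 4.x renamed it to vector_size.

```python
from gensim.models import Word2Vec

# tokenized, phrase-merged titles from the earlier sketches (illustrative names)
ngram_sentences = detect_ngrams(fetch_titles())

model = Word2Vec(ngram_sentences,
                 size=100,      # length of the word vectors (vector_size in gensim 4.x)
                 window=10,     # how many words of context to look at
                 min_count=5,   # ignore very rare tokens
                 workers=4)
model.save("condmat2vec.model")
```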

First, as a simple check, let’s see what our encoding thinks some jargon is similar to:

  • Superconductor ~ Superconducting, Cuprate superconductor, Superconductivity, Layered superconductor, Unconventional superconductor, Superconducting gap, Cuprate, Weyl semimetal, …
  • Majorana ~ Majorana fermion, Majorana mode, Non-abelian, Zero-energy, braiding, topologically protected, …

It seems we could start to cluster words based on this. But the real test comes now, in the form of arithmetics. According to our network (I am listing the top two choices in some cases; the encoder outputs a list of similar vectors, ordered by similarity):

  • Majorana + Braiding = Non-Abelian
  • Electron + Hole = Exciton, Carrier
  • Spin + Magnetic field = Magnetization, Antiferromagnetic
  • Particle + Charge = Electron, Charged particle

And, sure enough:

  • 2D + electrons + magnetic field = Landau level, Magnetoresistance oscillation
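For the curious, querying such a trained gensim model looks roughly like this; the exact token strings are guesses, since they depend on how the phrase pass merged words, so treat it as a sketch rather than the code behind the website.

```python
from gensim.models import Word2Vec

model = Word2Vec.load("condmat2vec.model")   # the model trained in the sketch above

# nearest neighbours of a single token
print(model.wv.most_similar("superconductor", topn=5))

# "word arithmetic": 2D + electrons + magnetic field = ?
print(model.wv.most_similar(
    positive=["two-dimensional", "electron", "magnetic_field"], topn=2))

# electron + hole = ?
print(model.wv.most_similar(positive=["electron", "hole"], topn=2))
```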

The above is just a small sample of the things I’ve tried. See the link in the try it yourself section above if you want to have a go. Not all of the examples work nicely. For example, neither lattice + wave nor lattice + excitation nor lattice + force seem to result in anything related to the word ‘phonon’. I would guess that increasing the window size will help remedy this problem. Even better probably would be to include abstracts!

Outlook

I could play with this for hours, and I’m sure that by including the abstracts and tweaking the vector size (plus some more parameters I haven’t even mentioned) one could optimize this more. Once we have an optimized model, we could start to cluster the vectors to define research fields, visualizing the relations between n-grams (both suggestions thanks to Thomas Vidick and John Preskill!), and many other things. This post has become rather long already however, and I will leave further investigation to a possible future post. I’d be very happy to incorporate anything cool you find yourselves though, so please let me know!


Doug NatelsonVery busy time....

Sorry for the light blogging - between departmental duties and deadline-motivated writing, it's been very difficult to squeeze in much blogging.  Hopefully things will lighten up again in the next week or two.   In the meantime, I suggest watching old episodes of the excellent show Scrapheap Challenge (episode 1 here).  Please feel free to put in suggestions of future blogging topics in the comments below.  I'm thinking hard about doing a series on phases and phase transitions.

November 27, 2017

John PreskillGently yoking yin to yang

The architecture at the University of California, Berkeley mystified me. California Hall evokes a Spanish mission. The main library consists of white stone pillared by ionic columns. A sea-green building scintillates in the sunlight like a scarab. The buildings straddle the map of styles.

Architecture.001

So do Berkeley’s quantum scientists, information-theory users, and statistical mechanics.

The chemists rove from abstract quantum information (QI) theory to experiments. Physicists experiment with superconducting qubits, trapped ions, and numerical simulations. Computer scientists invent algorithms for quantum computers to perform.

Few activities light me up more than bouncing from quantum group to info-theory group to stat-mech group, hunting commonalities. I was honored to bounce from group to group at Berkeley this September.

What a trampoline Berkeley has.

The groups fan out across campus and science, but I found compatibility. Including a collaboration that illuminated quantum incompatibility.

Quantum incompatibility originated in studies by Werner Heisenberg. He and colleagues cofounded quantum mechanics during the early 20th century. Measuring one property of a quantum system, Heisenberg intuited, can affect another property.

The most famous example involves position and momentum. Say that I hand you an electron. The electron occupies some quantum state represented by | \Psi \rangle. Suppose that you measure the electron’s position. The measurement outputs one of many possible values x (unless | \Psi \rangle has an unusual form, the form of a Dirac delta function).

But we can’t say that the electron occupies any particular point x = x_0 in space. Measurement devices have limited precision. You can measure the position only to within some error \varepsilon: x = x_0 \pm \varepsilon.

Suppose that, immediately afterward, you measure the electron’s momentum. This measurement, too, outputs one of many possible values. What probability q(p) dp does the measurement have of outputting some value p? We can calculate q(p) dp, knowing the mathematical form of | \Psi \rangle and knowing the values of x_0 and \varepsilon.

q(p) is a probability density, which you can think of as a set of probabilities. The density can vary with p. Suppose that q(p) varies little: The probabilities spread evenly across the possible p values. You have no idea which value your momentum measurement will output. Suppose, instead, that q(p) peaks sharply at some value p = p_0. You can likely predict the momentum measurement’s outcome.

The certainty about the momentum measurement trades off with the precision \varepsilon of the position measurement. The smaller the \varepsilon (the more precisely you measured the position), the greater the momentum’s unpredictability. We call position and momentum complementary, or incompatible.
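In textbook notation, this trade-off is the Heisenberg uncertainty relation: writing \sigma_x and \sigma_p for the spreads (standard deviations) of the position and momentum distributions, it reads

\displaystyle \sigma_x \, \sigma_p \geq \frac{\hbar}{2}.

We won’t need the inequality itself below; it’s just the standard way of quantifying the statement above.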

You can’t measure incompatible properties, with high precision, simultaneously. Imagine trying to do so. Upon measuring the momentum, you ascribe a tiny range of momentum values p to the electron. If you measured the momentum again, an instant later, you could likely predict that measurement’s outcome: The second measurement’s q(p) would peak sharply (encode high predictability). But, in the first instant, you measure also the position. Hence, by the discussion above, q(p) would spread out widely. But we just concluded that q(p) would peak sharply. This contradiction illustrates that you can’t measure position and momentum, precisely, at the same time.

But you can simultaneously measure incompatible properties weakly. A weak measurement has an enormous \varepsilon. A weak position measurement barely spreads out q(p). If you want more details, ask a Quantum Frontiers regular; I’ve been harping on weak measurements for months.

Blame Berkeley for my harping this month. Irfan Siddiqi’s and Birgitta Whaley’s groups collaborated on weak measurements of incompatible observables. They tracked how the measured quantum state | \Psi (t) \rangle evolved in time (represented by t).

Irfan’s group manipulates superconducting qubits.1 The qubits sit in the physics building, a white-stone specimen stamped with an egg-and-dart motif. Across the street sit chemists, including members of Birgitta’s group. The experimental physicists and theoretical chemists teamed up to study a quantum lack of teaming up.

Phys. & chem. bldgs

The experiment involved one superconducting qubit. The qubit has properties analogous to position and momentum: A ball, called the Bloch ball, represents the set of states that the qubit can occupy. Imagine an arrow pointing from the sphere’s center to any point in the ball. This Bloch vector represents the qubit’s state. Consider an arrow that points upward from the center to the surface. This arrow represents the qubit state | 0 \rangle. | 0 \rangle is the quantum analog of the possible value 0 of a bit, or unit of information. The analogous downward-pointing arrow represents the qubit state | 1 \rangle, analogous to 1.

Infinitely many axes intersect the sphere. Different axes represent different observables that Irfan’s group can measure. Nonparallel axes represent incompatible observables. For example, the x-axis represents an observable \sigma_x analogous to position. The y-axis represents an observable \sigma_y analogous to momentum.

Tug-of-war

Siddiqi lab, decorated with the trademark for the paper’s tug-of-war between incompatible observables. Photo credit: Leigh Martin, one of the paper’s leading authors.

Irfan’s group stuck their superconducting qubit in a cavity, or box. The cavity contained light that interacted with the qubit. The interactions transferred information from the qubit to the light: The light measured the qubit’s state. The experimentalists controlled the interactions, controlling the axes “along which” the light was measured. The experimentalists weakly measured along two axes simultaneously.

Suppose that the axes coincided—say, at the x-axis \hat{x}. The qubit would collapse to the state | \Psi \rangle = \frac{1}{ \sqrt{2} } ( | 0 \rangle + | 1 \rangle ), represented by the arrow that points along \hat{x} to the sphere’s surface, or to the state | \Psi \rangle = \frac{1}{ \sqrt{2} } ( | 0 \rangle - | 1 \rangle ), represented by the opposite arrow.

0 deg

(Projection of) the Bloch Ball after the measurement. The system can access the colored points. The lighter a point, the greater the late-time state’s weight on the point.

Let \hat{x}' denote an axis near \hat{x}—say, 18° away. Suppose that the group weakly measured along \hat{x} and \hat{x}'. The state would partially collapse. The system would access points in the region straddled by \hat{x} and \hat{x}', as well as points straddled by - \hat{x} and - \hat{x}'.

18 deg

Finally, suppose that the group weakly measured along \hat{x} and \hat{y}. These axes stand in for position and momentum. The state would, loosely speaking, swirl around the Bloch ball.

90 deg

The Berkeley experiment illuminates foundations of quantum theory. Incompatible observables, physics students learn, can’t be measured simultaneously. This experiment blasts our expectations, using weak measurements. But the experiment doesn’t just destroy. It rebuilds the blast zone, by showing how | \Psi (t) \rangle evolves.

“Position” and “momentum” can hang together. So can experimentalists and theorists, physicists and chemists. So, somehow, can the California mission and the ionic columns. Maybe I’ll understand the scarab building when we understand quantum theory.2

With thanks to Birgitta’s group, Irfan’s group, and the rest of Berkeley’s quantum/stat-mech/info-theory community for its hospitality. The Bloch-sphere figures come from http://www.nature.com/articles/nature19762.

1The qubit is the quantum analog of a bit. The bit is the basic unit of information. A bit can be in one of two possible states, which we can label as 0 and 1. Qubits can manifest in many physical systems, including superconducting circuits. Such circuits are tiny quantum circuits through which current can flow, without resistance, forever.

2Soda Hall dazzled but startled me.


November 26, 2017

Terence TaoAn inverse theorem for an inequality of Kneser

I have just uploaded to the arXiv the paper “An inverse theorem for an inequality of Kneser“, submitted to a special issue of the Proceedings of the Steklov Institute of Mathematics in honour of Sergei Konyagin. It concerns an inequality of Kneser discussed previously in this blog, namely that

\displaystyle \mu(A+B) \geq \min(\mu(A)+\mu(B), 1) \ \ \ \ \ (1)

whenever {A,B} are compact non-empty subsets of a compact connected additive group {G} with probability Haar measure {\mu}.  (A later result of Kemperman extended this inequality to the nonabelian case.) This inequality is non-trivial in the regime

\displaystyle \mu(A), \mu(B), 1- \mu(A)-\mu(B) > 0. \ \ \ \ \ (2)

The connectedness of {G} is essential, otherwise one could form counterexamples involving proper subgroups of {G} of positive measure. In the blog post, I indicated how this inequality (together with a more “robust” strengthening of it) could be deduced from submodularity inequalities such as

\displaystyle \mu( (A_1 \cup A_2) + B) + \mu( (A_1 \cap A_2) + B)

\displaystyle \leq \mu(A_1+B) + \mu(A_2+B) \ \ \ \ \ (3)

which in turn easily follows from the identity {(A_1 \cup A_2) + B = (A_1+B) \cup (A_2+B)} and the inclusion {(A_1 \cap A_2) + B \subset (A_1 +B) \cap (A_2+B)}, combined with the inclusion-exclusion formula.
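Spelled out, the deduction is the one-line computation

\displaystyle \mu( (A_1 \cup A_2) + B) + \mu( (A_1 \cap A_2) + B) \leq \mu( (A_1+B) \cup (A_2+B) ) + \mu( (A_1+B) \cap (A_2+B) ) = \mu(A_1+B) + \mu(A_2+B),

where the inequality uses the identity and the inclusion just mentioned, and the final equality is inclusion-exclusion.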

In the non-trivial regime (2), equality can be attained in (1), for instance by taking {G} to be the unit circle {G = {\bf R}/{\bf Z}} and {A,B} to be arcs in that circle (obeying (2)). A bit more generally, if {G} is an arbitrary connected compact abelian group and {\xi: G \rightarrow {\bf R}/{\bf Z}} is a non-trivial character (i.e., a continuous homomorphism), then {\xi} must be surjective (as {{\bf R}/{\bf Z}} has no non-trivial connected subgroups), and one can take {A = \xi^{-1}(I)} and {B = \xi^{-1}(J)} for some arcs {I,J} in that circle (again choosing the measures of these arcs to obey (2)). The main result of this paper is an inverse theorem that asserts that this is the only way in which equality can occur in (1) (assuming (2)); furthermore, if (1) is close to being satisfied with equality and (2) holds, then {A,B} must be close (in measure) to an example of the above form {A = \xi^{-1}(I), B = \xi^{-1}(J)}. Actually, for technical reasons (and for the applications we have in mind), it is important to establish an inverse theorem not just for (1), but for the more robust version mentioned earlier (in which the sumset {A+B} is replaced by the partial sumset {A +_\varepsilon B} consisting of “popular” sums).

Roughly speaking, the idea is as follows. Let us informally call {(A,B)} a critical pair if (2) holds and the inequality (1) (or more precisely, a robust version of this inequality) is almost obeyed with equality. The notion of a critical pair obeys some useful closure properties. Firstly, it is symmetric in {A,B}, and invariant with respect to translation of either {A} or {B}. Furthermore, from the submodularity inequality (3), one can show that if {(A_1,B)} and {(A_2,B)} are critical pairs (with {\mu(A_1 \cap A_2)} and {1 - \mu(A_1 \cup A_2) - \mu(B)} positive), then {(A_1 \cap A_2,B)} and {(A_1 \cup A_2, B)} are also critical pairs. (Note that this is consistent with the claim that critical pairs only occur when {A,B} come from arcs of a circle.) Similarly, from associativity {(A+B)+C = A+(B+C)}, one can show that if {(A,B)} and {(A+B,C)} are critical pairs, then so are {(B,C)} and {(A,B+C)}.

One can combine these closure properties to obtain further ones. For instance, suppose {A,B,C} are such that {(A,C)} is a critical pair and the quantities {\mu(A+B)}, {\mu(C)}, and {1 - \mu(A+B) - \mu(C)} are all positive. Then (cheating a little bit), one can show that {(A+B,C)} is also a critical pair, basically because {A+B} is the union of the {A+b}, {b \in B}, the {(A+b,C)} are all critical pairs, and the {A+b} all intersect each other. This argument doesn’t quite work as stated because one has to apply the closure property under union an uncountable number of times, but it turns out that if one works with the robust version of sumsets and uses a random sampling argument to approximate {A+B} by the union of finitely many of the {A+b}, then the argument can be made to work.

Using all of these closure properties, it turns out that one can start with an arbitrary critical pair {(A,B)} and end up with a small set {C} such that {(A,C)} and {(kC,C)} are also critical pairs for all {1 \leq k \leq 10^4} (say), where {kC} is the {k}-fold sumset of {C}. (Intuitively, if {A,B} are thought of as secretly coming from the pullback of arcs {I,J} by some character {\xi}, then {C} should be the pullback of a much shorter arc by the same character.) In particular, {C} exhibits linear growth, in that {\mu(kC) = k\mu(C)} for all {1 \leq k \leq 10^4}. One can now use standard technology from inverse sumset theory to show first that {C} has a very large Fourier coefficient (and thus is biased with respect to some character {\xi}), and secondly that {C} is in fact almost of the form {C = \xi^{-1}(K)} for some arc {K}, from which it is not difficult to conclude similar statements for {A} and {B} and thus finish the proof of the inverse theorem.

In order to make the above argument rigorous, one has to be more precise about what the modifier “almost” means in the definition of a critical pair. I chose to do this in the language of “cheap” nonstandard analysis (aka asymptotic analysis), as discussed in this previous blog post; one could also have used the full-strength version of nonstandard analysis, but this does not seem to convey any substantial advantages. (One can also work in a more traditional “non-asymptotic” framework, but this requires one to keep much more careful account of various small error terms and leads to a messier argument.)

 

[Update, Nov 15: Corrected the attribution of the inequality (1) to Kneser instead of Kemperman.  Thanks to John Griesmer for pointing out the error.]



BackreactionAstrophysicist discovers yet another way to screw yourself over when modifying Einstein’s theory

Several people have informed me that phys.org has once again uncritically promoted a questionable paper, in this case by André Maeder from UNIGE. This story goes back to a press release by the author’s home institution and has since been hyped by a variety of other low-quality outlets. From what I gather from Maeder’s list of publications, he’s an astrophysicist who recently had the idea to

November 25, 2017

Sean CarrollThanksgiving

This year we give thanks for a simple but profound principle of statistical mechanics that extends the famous Second Law of Thermodynamics: the Jarzynski Equality. (We’ve previously given thanks for the Standard Model Lagrangian, Hubble’s Law, the Spin-Statistics Theorem, conservation of momentum, effective field theory, the error bar, gauge symmetry, Landauer’s Principle, the Fourier Transform, Riemannian Geometry, and the speed of light.)

The Second Law says that entropy increases in closed systems. But really it says that entropy usually increases; thermodynamics is the limit of statistical mechanics, and in the real world there can be rare but inevitable fluctuations around the typical behavior. The Jarzynski Equality is a way of quantifying such fluctuations, which is increasingly important in the modern world of nanoscale science and biophysics.

Our story begins, as so many thermodynamic tales tend to do, with manipulating a piston containing a certain amount of gas. The gas is of course made of a number of jiggling particles (atoms and molecules). All of those jiggling particles contain energy, and we call the total amount of that energy the internal energy U of the gas. Let’s imagine the whole thing is embedded in an environment (a “heat bath”) at temperature T. That means that the gas inside the piston starts at temperature T, and after we manipulate it a bit and let it settle down, it will relax back to T by exchanging heat with the environment as necessary.

Finally, let’s divide the internal energy into “useful energy” and “useless energy.” The useful energy, known to the cognoscenti as the (Helmholtz) free energy and denoted by F, is the amount of energy potentially available to do useful work. For example, the pressure in our piston may be quite high, and we could release it to push a lever or something. But there is also useless energy, which is just the entropy S of the system times the temperature T. That expresses the fact that once energy is in a highly-entropic form, there’s nothing useful we can do with it any more. So the total internal energy is the free energy plus the useless energy,

U = F + TS. \qquad \qquad (1)

Our piston starts in a boring equilibrium configuration a, but we’re not going to let it just sit there. Instead, we’re going to push in the piston, decreasing the volume inside, ending up in configuration b. This squeezes the gas together, and we expect that the total amount of energy will go up. It will typically cost us energy to do this, of course, and we refer to that energy as the work Wab we do when we push the piston from a to b.

Remember that when we’re done pushing, the system might have heated up a bit, but we let it exchange heat Q with the environment to return to the temperature T. So three things happen when we do our work on the piston: (1) the free energy of the system changes; (2) the entropy changes, and therefore the useless energy; and (3) heat is exchanged with the environment. In total we have

W_{ab} = \Delta F_{ab} + T\Delta S_{ab} - Q_{ab}.\qquad \qquad (2)

(There is no ΔT, because T is the temperature of the environment, which stays fixed.) The Second Law of Thermodynamics says that entropy increases (or stays constant) in closed systems. Our system isn’t closed, since it might leak heat to the environment. But really the Second Law says that the total of the last two terms on the right-hand side of this equation add up to a positive number; in other words, the increase in entropy will more than compensate for the loss of heat. (Alternatively, you can lower the entropy of a bottle of champagne by putting it in a refrigerator and letting it cool down; no laws of physics are violated.) One way of stating the Second Law for situations such as this is therefore

W_{ab} \geq \Delta F_{ab}. \qquad \qquad (3)

The work we do on the system is greater than or equal to the change in free energy from beginning to end. We can make this inequality into an equality if we act as efficiently as possible, minimizing the entropy/heat production: that’s an adiabatic process, and in practical terms amounts to moving the piston as gradually as possible, rather than giving it a sudden jolt. That’s the limit in which the process is reversible: we can get the same energy out as we put in, just by going backwards.

Awesome. But the language we’re speaking here is that of classical thermodynamics, which we all know is the limit of statistical mechanics when we have many particles. Let’s be a little more modern and open-minded, and take seriously the fact that our gas is actually a collection of particles in random motion. Because of that randomness, there will be fluctuations over and above the “typical” behavior we’ve been describing. Maybe, just by chance, all of the gas molecules happen to be moving away from our piston just as we move it, so we don’t have to do any work at all; alternatively, maybe there are more than the usual number of molecules hitting the piston, so we have to do more work than usual. The Jarzynski Equality, derived 20 years ago by Christopher Jarzynski, is a way of saying something about those fluctuations.

One simple way of taking our thermodynamic version of the Second Law (3) and making it still hold true in a world of fluctuations is simply to say that it holds true on average. To denote an average over all possible things that could be happening in our system, we write angle brackets \langle \cdots \rangle around the quantity in question. So a more precise statement would be that the average work we do is greater than or equal to the change in free energy:

\displaystyle \left\langle W_{ab}\right\rangle \geq \Delta F_{ab}. \qquad \qquad (4)

(We don’t need angle brackets around ΔF, because F is determined completely by the equilibrium properties of the initial and final states a and b; it doesn’t fluctuate.) Let me multiply both sides by -1, which means we  need to flip the inequality sign to go the other way around:

\displaystyle -\left\langle W_{ab}\right\rangle \leq -\Delta F_{ab}. \qquad \qquad (5)

Next I will exponentiate both sides of the inequality. Note that this keeps the inequality sign going the same way, because the exponential is a monotonically increasing function; if x is less than y, we know that ex is less than ey.

\displaystyle e^{-\left\langle W_{ab}\right\rangle} \leq e^{-\Delta F_{ab}}. \qquad\qquad (6)

(More typically we will see the exponents divided by kT, where k is Boltzmann’s constant, but for simplicity I’m using units where kT = 1.)

Jarzynski’s equality is the following remarkable statement: in equation (6), if we exchange  the exponential of the average work e^{-\langle W\rangle} for the average of the exponential of the work \langle e^{-W}\rangle, we get a precise equality, not merely an inequality:

\displaystyle \left\langle e^{-W_{ab}}\right\rangle = e^{-\Delta F_{ab}}. \qquad\qquad (7)

That’s the Jarzynski Equality: the average, over many trials, of the exponential of minus the work done, is equal to the exponential of minus the free energies between the initial and final states. It’s a stronger statement than the Second Law, just because it’s an equality rather than an inequality.

In fact, we can derive the Second Law from the Jarzynski equality, using a math trick known as Jensen’s inequality. For our purposes, this says that the exponential of an average is less than the average of an exponential, e^{\langle x\rangle} \leq \langle e^x \rangle. Thus we immediately get

\displaystyle e^{-\left\langle W_{ab}\right\rangle} \leq \left\langle e^{-W_{ab}}\right\rangle = e^{-\Delta F_{ab}}, \qquad\qquad (8)

as we had before. Then just take the log of both sides to get \langle W_{ab}\rangle \geq \Delta F_{ab}, which is one way of writing the Second Law.

So what does it mean? As we said, because of fluctuations, the work we needed to do on the piston will sometimes be a bit less than or a bit greater than the average, and the Second Law says that the average will be greater than the difference in free energies from beginning to end. Jarzynski’s Equality says there is a quantity, the exponential of minus the work, that averages out to be exactly the exponential of minus the free-energy difference. The function e^{-W} is convex and decreasing as a function of W. A fluctuation where W is lower than average, therefore, contributes a greater shift to the average of e^{-W} than a corresponding fluctuation where W is higher than average. To satisfy the Jarzynski Equality, we must have more fluctuations upward in W than downward in W, by a precise amount. So on average, we’ll need to do more work than the difference in free energies, as the Second Law implies.
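Here is a toy numerical check, in units where kT = 1, under the simplifying assumption that the work over many repetitions is Gaussian with some mean and standard deviation (the numbers below are made up). For a Gaussian, \langle e^{-W}\rangle = e^{-\langle W\rangle + \sigma^2/2}, so the Jarzynski Equality pins the free-energy change to \Delta F = \langle W\rangle - \sigma^2/2, which is indeed no larger than \langle W\rangle.

```python
# Toy check of the Jarzynski Equality for a Gaussian work distribution (kT = 1).
import numpy as np

rng = np.random.default_rng(0)
mean_W, sigma_W = 2.0, 0.8                    # illustrative numbers
W = rng.normal(mean_W, sigma_W, size=1_000_000)

lhs = np.mean(np.exp(-W))                     # <exp(-W)>, estimated from samples
delta_F = mean_W - sigma_W**2 / 2             # what the equality predicts for a Gaussian
print(lhs, np.exp(-delta_F))                  # agree up to sampling error
print(W.mean() >= delta_F)                    # and the Second Law holds on average
```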

It’s a remarkable thing, really. Much of conventional thermodynamics deals with inequalities, with equality being achieved only in adiabatic processes happening close to equilibrium. The Jarzynski Equality is fully non-equilibrium, achieving equality no matter how dramatically we push around our piston. It tells us not only about the average behavior of statistical systems, but about the full ensemble of possibilities for individual trajectories around that average.

The Jarzynski Equality has launched a mini-revolution in nonequilibrium statistical mechanics, the news of which hasn’t quite trickled to the outside world as yet. It’s one of a number of relations, collectively known as “fluctuation theorems,” which also include the Crooks Fluctuation Theorem, not to mention our own Bayesian Second Law of Thermodynamics. As our technological and experimental capabilities reach down to scales where the fluctuations become important, our theoretical toolbox has to keep pace. And that’s happening: the Jarzynski equality isn’t just imagination, it’s been experimentally tested and verified. (Of course, I remain just a poor theorist myself, so if you want to understand this image from the experimental paper, you’ll have to talk to someone who knows more about Raman spectroscopy than I do.)

November 23, 2017

John BaezThe Golden Ratio and the Entropy of Braids

Here’s a cute connection between topological entropy, braids, and the golden ratio. I learned about it in this paper:

• Jean-Luc Thiffeault and Matthew D. Finn, Topology, braids, and mixing in fluids.

Topological entropy

I’ve talked a lot about entropy on this blog, but not much about topological entropy. This is a way to define the entropy of a continuous map f from a compact topological space X to itself. The idea is that a map that mixes things up a lot should have a lot of entropy. In particular, any map defining a ‘chaotic’ dynamical system should have positive entropy, while non-chaotic maps maps should have zero entropy.

How can we make this precise? First, cover X with finitely many open sets U_1, \dots, U_k. Then take any point in X, apply the map f to it over and over, say n times, and report which open set the point lands in each time. You can record this information in a string of symbols. How much information does this string have? The easiest way to define this is to simply count the total number of strings that can be produced this way by choosing different points initially. Then, take the logarithm of this number.

Of course the answer depends on n, typically growing bigger as n increases. So, divide it by n and try to take the limit as n \to \infty. Or, to be careful, take the lim sup: this could be infinite, but it’s always well-defined. This will tell us how much new information we get, on average, each time we apply the map and report which set our point lands in.

And of course the answer also depends on our choice of open cover U_1, \dots, U_k. So, take the supremum over all finite open covers. This is called the topological entropy of f.

Believe it or not, this is often finite! Even though the log of the number of symbol strings we get will be larger when we use a cover with lots of small sets, when we divide by n and take the limit as n \to \infty this dependence often washes out.
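To see the recipe in action, here is a crude numerical sketch (my own toy example, not something from the Thiffeault and Finn paper) for the doubling map x \mapsto 2x mod 1 on the circle, using the two-piece cover {[0, 1/2), [1/2, 1)}; strictly speaking that’s a partition rather than an open cover, but it illustrates the counting, and the true answer is log 2.

```python
# Estimate the topological entropy of the doubling map by counting the distinct
# length-n symbol strings generated by the two-piece cover {[0, 1/2), [1/2, 1)}.
import numpy as np

def symbol_string(x, n):
    symbols = []
    for _ in range(n):
        symbols.append(0 if x < 0.5 else 1)
        x = (2 * x) % 1.0
    return tuple(symbols)

n = 12
points = np.random.rand(200_000)
strings = {symbol_string(x, n) for x in points}
print(np.log(len(strings)) / n, np.log(2))   # both are about 0.693
```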

Braids

Any braid gives a bunch of maps from the disc to itself. So, we define the entropy of a braid to be the minimum—or more precisely, the infimum—of the topological entropies of these maps.

How does a braid give a bunch of maps from the disc to itself? Imagine the disc as made of very flexible rubber. Grab it at some finite set of points and then move these points around in the pattern traced out by the braid. When you’re done you get a map from the disc to itself. The map you get is not unique, since the rubber is wiggly and you could have moved the points around in slightly different ways. So, you get a bunch of maps.

I’m being sort of lazy in giving precise details here, since the idea seems so intuitively obvious. But that could be because I’ve spent a lot of time thinking about braids, the braid group, and their relation to maps from the disc to itself!

This picture by Thiffeault and Finn may help explain the idea:



As we keep moving points around each other, we keep building up more complicated braids with 4 strands, and keep getting more complicated maps from the disc to itself. In fact, these maps are often chaotic! More precisely: they often have positive entropy.

In this other picture the vertical axis represents time, and we more clearly see the braid traced out as our 4 points move around:



Each horizontal slice depicts a map from the disc (or square: this is topology!) to itself, but we only see its effect on a little rectangle drawn in black.

The golden ratio

Okay, now for the punchline!

Puzzle 1. Which braid with 3 strands has the highest entropy per generator? What is its entropy per generator?

I should explain: any braid with 3 strands can be written as a product of generators \sigma_1, \sigma_2, \sigma_1^{-1}, \sigma_2^{-1}. Here \sigma_1 switches strands 1 and 2, moving them counterclockwise around each other, \sigma_2 does the same for strands 2 and 3, and \sigma_1^{-1} and \sigma_2^{-1} do the same but moving the strands clockwise.

For any braid we can write it as a product of n generators with n as small as possible, and then we can evaluate its entropy divided by n. This is the right way to compare the entropy of braids, because if a braid gives a chaotic map we expect powers of that braid to have entropy growing linearly with n.

Now for the answer to the puzzle!

Answer 1. A 3-strand braid maximizing the entropy per generator is \sigma_1 \sigma_2^{-1}. And the entropy of this braid, per generator, is the logarithm of the golden ratio:

\displaystyle{ \log \left( \frac{\sqrt{5} + 1}{2} \right) }

In other words, the entropy of this braid is

\displaystyle{ \log \left( \frac{\sqrt{5} + 1}{2} \right)^2 }

All this works regardless of which base we use for our logarithms. But if we use base e, which seems pretty natural, the maximum possible entropy per generator is

\displaystyle{ \ln \left( \frac{\sqrt{5} + 1}{2} \right) \approx 0.48121182506\dots }

Or if you prefer base 2, then each time you stir around a point in the disc with this braid, you’re creating

\displaystyle{ \log_2 \left( \frac{\sqrt{5} + 1}{2} \right) \approx 0.69424191363\dots }

bits of unknown information.

This fact was proved here:

• D. D’Alessandro, M. Dahleh and I. Mezić, Control of mixing in fluid flow: A maximum entropy approach, IEEE Transactions on Automatic Control 44 (1999), 1852–1863.

So, people call this braid \sigma_1 \sigma_2^{-1} the golden braid. But since you can use it to generate entropy forever, perhaps it should be called the eternal golden braid.

What does it all mean? Well, the 3-strand braid group is called \mathrm{B}_3, and I wrote a long story about it:

• John Baez, This Week’s Finds in Mathematical Physics (Week 233).

You’ll see there that \mathrm{B}_3 has a representation as 2 × 2 matrices:

\displaystyle{ \sigma_1 \mapsto \left(\begin{array}{rr} 1 & 1 \\ 0 & 1 \end{array}\right)}

\displaystyle{ \sigma_2 \mapsto \left(\begin{array}{rr} 1 & 0 \\ -1 & 1 \end{array}\right) }

These matrices are shears, which is connected to how the braids \sigma_1 and \sigma_2 give maps from the disc to itself that shear points. If we take the golden braid and turn it into a matrix using this representation, we get a matrix for which the magnitude of its largest eigenvalue is the square of the golden ratio! So, the amount of stretching going on is ‘the golden ratio per generator’.
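Here is a quick sanity check of that eigenvalue claim (my own snippet, using the representation written above): multiply the matrix for \sigma_1 by the inverse of the matrix for \sigma_2 and look at the spectrum of the product.

import numpy as np

s1 = np.array([[1.0, 1.0], [0.0, 1.0]])                      # sigma_1
s2_inv = np.linalg.inv(np.array([[1.0, 0.0], [-1.0, 1.0]]))  # sigma_2^{-1} = [[1, 0], [1, 1]]

M = s1 @ s2_inv                    # matrix of the golden braid sigma_1 sigma_2^{-1}
largest = max(abs(np.linalg.eigvals(M)))
phi = (1 + np.sqrt(5)) / 2

print(M)                           # [[2, 1], [1, 1]] (as floats)
print(largest, phi ** 2)           # both are about 2.618: the square of the golden ratio
print(np.log(largest) / 2)         # about 0.4812 = log(phi), the entropy per generator quoted above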

I guess this must be part of the story too:

Puzzle 2. Is it true that when we multiply n matrices of the form

\displaystyle{ \left(\begin{array}{rr} 1 & 1 \\ 0 & 1 \end{array}\right)  , \quad \left(\begin{array}{rr} 1 & 0 \\ -1 & 1 \end{array}\right) }

or their inverses:

\displaystyle{ \left(\begin{array}{rr} 1 & -1 \\ 0 & 1 \end{array}\right)  , \quad \left(\begin{array}{rr} 1 & 0 \\ 1 & 1 \end{array}\right) }

the magnitude of the largest eigenvalue of the resulting product can never exceed the nth power of the golden ratio?
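I have no proof to offer, but a crude numerical experiment is easy to run (again my own sketch, not from the papers above): multiply random words in these four shears and compare the spectral radius of the product with the corresponding power of the golden ratio.

import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[1.0, 0.0], [-1.0, 1.0]])
gens = [A, B, np.linalg.inv(A), np.linalg.inv(B)]
phi = (1 + np.sqrt(5)) / 2

rng = np.random.default_rng(0)
worst = 0.0
for _ in range(20000):
    n = int(rng.integers(1, 16))           # word length between 1 and 15
    P = np.eye(2)
    for _ in range(n):
        P = P @ gens[rng.integers(4)]
    spectral_radius = max(abs(np.linalg.eigvals(P)))
    worst = max(worst, spectral_radius / phi ** n)

print(worst)   # stayed at or below 1 (up to rounding) in the runs I tried, consistent with the puzzle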

There’s also a strong connection between braid groups, certain quasiparticles in the plane called Fibonacci anyons, and the golden ratio. But I don’t see the relation between these things and topological entropy! So, there is a mystery here—at least for me.

For more, see:

• Matthew D. Finn and Jean-Luc Thiffeault, Topological optimisation of rod-stirring devices, SIAM Review 53 (2011), 723–743.

Abstract. There are many industrial situations where rods are used to stir a fluid, or where rods repeatedly stretch a material such as bread dough or taffy. The goal in these applications is to stretch either material lines (in a fluid) or the material itself (for dough or taffy) as rapidly as possible. The growth rate of material lines is conveniently given by the topological entropy of the rod motion. We discuss the problem of optimising such rod devices from a topological viewpoint. We express rod motions in terms of generators of the braid group, and assign a cost based on the minimum number of generators needed to write the braid. We show that for one cost function—the topological entropy per generator—the optimal growth rate is the logarithm of the golden ratio. For a more realistic cost function, involving the topological entropy per operation where rods are allowed to move together, the optimal growth rate is the logarithm of the silver ratio, 1+ \sqrt{2}. We show how to construct devices that realise this optimal growth, which we call silver mixers.

Here is the silver ratio: 1 + \sqrt{2} \approx 2.41421356\dots

But now for some reason I feel it’s time to stop!


n-Category Café Real Sets

Good news! Janelidze and Street have tackled some puzzles that are perennial favorites here on the n-Café:

  • George Janelidze and Ross Street, Real sets, Tbilisi Mathematical Journal, 10 (2017), 23–49.

Abstract. After reviewing a universal characterization of the extended positive real numbers published by Denis Higgs in 1978, we define a category which provides an answer to the questions:

• what is a set with half an element?

• what is a set with π elements?

The category of these extended positive real sets is equipped with a countable tensor product. We develop somewhat the theory of categories with countable tensors; we call the commutative such categories series monoidal and conclude by only briefly mentioning the non-commutative possibility called ω-monoidal. We include some remarks on sets having cardinalities in [-\infty,\infty].

First they define a series magma, which is a set A equipped with an element 0 and a summation function

\sum \colon A^{\mathbb{N}} \to A

obeying a nice generalization of the law a + 0 = 0 + a = a. Then they define a series monoid in which this summation function obeys a version of the commutative law.

(Yeah, the terminology here seems a bit weird: their summation function already has associativity built in, so their ‘series magma’ is associative and their ‘series monoid’ is also commutative!)

The forgetful functor from series monoids to sets has a left adjoint, and as you’d expect, the free series monoid on the one-element set is \mathbb{N} \cup \{\infty\}. A more interesting series monoid is [0,\infty], and one early goal of the paper is to recall Higgs’ categorical description of this. That’s Denis Higgs. Peter Higgs has a boson, but Denis Higgs has a nice theorem.

First, some preliminaries:

Countable products of series monoids coincide with countable coproducts, just as finite products of commutative monoids coincide with finite coproducts.

There is a tensor product of series monoids, which is very similar to the tensor product of commutative monoids — or, to a lesser extent, the more familiar tensor product of abelian groups. Monoids with respect to this tensor product are called series rigs. For abstract nonsense reasons, because \mathbb{N} \cup \{\infty\} is the free series monoid on one element, it also becomes a series rig… with the usual multiplication and addition. (Well, more or less usual: if you’re not familiar with this stuff, a good exercise is to figure out what 0 times \infty must be.)

Now for the characterization of [0,\infty]. Given an endomorphism f \colon A \to A of a series monoid A you can define a new endomorphism \overline{f} \colon A \to A by

\overline{f} = f + f\circ f + f \circ f \circ f + \cdots

where the infinite sum is defined using the series monoid structure on A. Following Higgs, Janelidze and Street define a Zeno morphism to be an endomorphism h \colon A \to A such that

\overline{h} = 1_A

The reason for this name is that in [0,\infty] we have

1 = \frac{1}{2} + \left(\frac{1}{2}\right)^2 + \left(\frac{1}{2}\right)^3 + \cdots

putting us in mind of Zeno’s paradox:

That which is in locomotion must arrive at the half-way stage before it arrives at the goal. — Aristotle, Physics VI:9, 239b10.

So, it makes lots of sense to think of any Zeno morphism h \colon A \to A as a ‘halving’ operation. Hence the name h.

In particular, one can show any Zeno morphism obeys

h + h = 1_A

Higgs called a series monoid equipped with a Zeno morphism a magnitude module, and he showed that the free magnitude module on one element is [0,\infty]. By the same flavor of abstract nonsense as before, this implies that [0,\infty] is a series rig… with the usual addition and multiplication.

Categorification

Next, Janelidze and Street categorify the entire discussion so far! They define a ‘series monoidal category’ to be a category A with an object 0 \in A and a summation functor

\sum \colon A^{\mathbb{N}} \to A

obeying some reasonable properties… up to natural isomorphisms that themselves obey some reasonable properties. So, it’s a category where we can add infinite sequences of objects. For example, every series monoid gives a series monoidal category with only identity morphisms. The maps between series monoidal categories are called ‘series monoidal functors’.

They define a ‘Zeno functor’ to be a series monoidal functor h \colon A \to A obeying a categorified version of the definition of Zeno morphism. A series monoidal category with a Zeno functor is called a ‘magnitude category’.

As you’d guess, there are also ‘magnitude functors’ and ‘magnitude natural transformations’, giving a 2-category MgnCat. There’s a forgetful 2-functor

U \colon MgnCat \to Cat

and it has a left adjoint (or, as Janelidze and Street say, a left ‘biadjoint’)

F \colon Cat \to MgnCat

Applying F to the terminal category 1, they get a magnitude category RSet_g of positive real sets. These are like sets, but their cardinality can be anything in [0,\infty]!

For example, Janelidze and Street construct a positive real set of cardinality \pi. Unfortunately they do it starting from the binary expansion of \pi, so it doesn’t connect in a very interesting way with anything I know about the number \pi.

What’s that little subscript g? Well, unfortunately RSet_g is a groupoid: the only morphisms between positive real sets we get from this construction are the isomorphisms.

So, there’s a lot of great stuff here, but apparently a lot left to do.

Digressive Postlude

There is more to say, but I need to get going — I have to walk 45 minutes to Paris 7 to talk to Mathieu Anel about symplectic geometry, and then have lunch with him and Paul-André Melliès. Paul-André kindly invited me to participate in his habilitation defense on Monday, along with Gordon Plotkin, André Joyal, Jean-Yves Girard, Thierry Coquand, Pierre-Louis Curien, George Gonthier, and my friend Karine Chemla (an expert on the history of Chinese mathematics). Paul-André has some wonderful ideas on linear logic, Frobenius pseudomonads, game semantics and the like, and we want to figure out more precisely how all this stuff is connected to topological quantum field theory. I think nobody has gotten to the bottom of this! So, I hope to spend more time here, figuring it out with Paul-André.

n-Category Café Star-autonomous Categories are Pseudo Frobenius Algebras

A little while ago I talked about how multivariable adjunctions naturally form a polycategory: a structure like a multicategory, but in which codomains as well as domains can involve multiple objects. Now I want to talk about some structures we can define inside this polycategory MVar.

What can you define inside a polycategory? Well, to start with, a polycategory has an underlying multicategory, consisting of the arrows with unary target; so anything we can define in a multicategory we can define in a polycategory. And the most basic thing we can define in a multicategory is a monoid object — in fact, there are some senses in which this is the canonical thing we can define in a multicategory.

So what is a monoid object in MVar?

Well, actually it’s more interesting to ask about pseudomonoid objects, using the 2-categorical structure of MVar. In this case what we have is an object A, a (0,1)-variable adjunction i \colon () \to A (which, recall, is just an object i \in A), and a (2,1)-variable adjunction m \colon (A,A) \to A, together with coherent associativity and unit isomorphisms. The left adjoint part of m is a functor A \times A \to A, and the associativity and unit isomorphisms then make A into a monoidal category. And to say that this functor extends to a multivariable adjunction is precisely to say that A is a closed monoidal category, i.e. that its tensor product has a right adjoint in each variable:

A(x\otimes y,z) \cong A(y, x\multimap z) \cong A(x, z \;⟜\; y)

Similarly, we can define braided pseudomonoids and symmetric pseudomonoids in any 2-multicategory, and in MVar these specialize to braided and symmetric closed monoidal categories.

Now, what can we define in a polycategory that we can’t define in a multicategory? The most obvious monoid-like thing that involves multiple objects in a codomain is a comonoid. So what is a pseudo-comonoid in MVar?

I think this question is easiest to answer if we use the duality of MVar to turn everything around. So a pseudo-comonoid structure on a category A is the same as a pseudo-monoid structure on A^{op}. In terms of A, that means it’s a monoidal structure that’s co-closed, i.e. the tensor product functor has a left adjoint in each variable:

A(x,yz)A(yx,z)A(xz,y). A(x, y \odot z) \cong A(y \rhd x, z) \cong A(x \lhd z , y).

The obvious next thing to do is to mix a monoid structure with a comonoid structure. In general, there’s more than one way to do that: we could think about bimonoids, Hopf monoids, or Frobenius monoids. However, while all of these can be defined in any symmetric monoidal category (or PROP), in a polycategory, bimonoids and Hopf monoids don’t make sense, because their axioms involve composing along multiple objects at once, whereas in a polycategory we are only allowed to compose along one object at a time.

Frobenius algebras, however, make perfect sense in a polycategory. If you look at the usual definition in a monoidal category, you can see that the axioms only involve composing along one object at once; when they’re written topologically that corresponds to the “absence of holes”.

So what is a pseudo Frobenius algebra in MVar? Actually, let’s ask a more general question first: what is a lax Frobenius algebra in MVar? By a lax Frobenius algebra I mean an object with a pseudo-monoid structure and a pseudo-comonoid structure, together with not-necessarily invertible “Frobenius-ator” 2-cells

satisfying some coherence axioms, which can be found for instance in this paper (pages 52-55). This isn’t quite as scary as it looks; there are 20 coherence diagrams listed there, but the first 2 are the associativity pentagons for the pseudomonoid and pseudo-comonoid, while the last 8 are the unit axioms for the pseudomonoid and pseudo-comonoid (of which the 17^{\mathrm{th}} and 18^{\mathrm{th}} imply the other 6, by an old observation of Kelly). Of the remaining 10 axioms, 6 assert compatibility of the Frobenius-ators with the associativities, while 4 assert their compatibility with the units.

Now, to work out what a lax Frobenius algebra in MVar is, we need to figure out what (2,2)-variable adjunctions (A,A) \to (A,A) those pictures represent. To work out what these functors are, I find it helpful to draw the monoid and comonoid structures with all the possible choices for input/output:

By the mates correspondence, to characterize a 2-cell in MVar it suffices to consider one of the functors involved in the multivariable adjunctions, which means we should pick one of the copies of A to be the “output” and consider all the others as the “input”. I find it easier to pick different copies of A for the two Frobenius-ators. For the first one, let’s pick the second copy of A in the codomain; this gives

In the domain of the 2-cell, on the right, x and y come in and get combined into x\otimes y, and then that gets treated as w and gets combined with u coming in from the lower-left to give u\rhd (x\otimes y). In the codomain of the 2-cell, on the left, first x gets combined with u to give u\rhd x, then that gets multiplied with y to give (u\rhd x) \otimes y. Thus, the first Frobenius-ator is

u\rhd (x\otimes y) \to (u\rhd x) \otimes y.

For the second Frobenius-ator, let’s dually pick the first copy of A in the codomain to be the output:

Thus the second Frobenius-ator is

(x\otimes y)\lhd v \to x\otimes (y\lhd v).

What is this? Well, let’s take mates once with respect to the co-closed monoidal structure to reexpress both Frobenius-ators in terms of \otimes and \odot. The first gives

(u \odot x) \otimes y \to u \odot (u\rhd ((u \odot x) \otimes y)) \to u\odot ((u\rhd (u \odot x)) \otimes y) \to u \odot (x\otimes y).

and the second dually gives

x \otimes (y\odot v) \to ((x \otimes (y\odot v)) \lhd v) \odot v \to (x \otimes ((y\odot v) \lhd v)) \odot v \to (x\otimes y)\odot v.

These two transformations (u \odot x) \otimes y \to u \odot (x\otimes y) and x \otimes (y\odot v) \to (x\otimes y)\odot v have exactly the shape of the “linear distributivity” transformations in a linearly distributive category! (Remember from last time that linearly distributive categories are the “representable” version of polycategories.) The latter are supposed to satisfy their own coherence axioms, which aren’t listed on the nLab, but if you look up the original Cockett-Seely paper and count them there are… 10 axioms… 6 asserting compatibility with associativity of \otimes and \odot, and 4 asserting compatibility with the unit. In other words,

A lax Frobenius algebra in MVar is precisely a linearly distributive category! (In which \otimes is closed and \odot is co-closed.)

Note that this is at least an approximate instance of the microcosm principle. (I have to admit that I have not actually checked that the two groups of coherence axioms coincide under the mates correspondence, but I find it inconceivable that they don’t.)

The next thing to ask is what a pseudo Frobenius algebra is, i.e. what it means for the Frobenius-ators to be invertible. If you’ve come this far (or if you read the title of the post) you can probably guess the answer: a \ast-autonomous category, i.e. a linearly distributive category in which all objects have duals (in the polycategorical sense I defined in the first post).

First note that in a \ast-autonomous category, \otimes is always closed and \odot is co-closed, with (x\multimap z) = (x^\ast \odot z) and (u\rhd w) = (u^\ast \otimes w) and so on. With these definitions, the Frobenius-ators become just associativity isomorphisms:

u\rhd (x\otimes y) = u^\ast \otimes (x\otimes y) \cong (u^\ast \otimes x) \otimes y = (u\rhd x) \otimes y.

(x\otimes y)\lhd v = (x\otimes y)\otimes v^\ast \cong x\otimes (y\otimes v^\ast) = x\otimes (y\lhd v).

Thus, a \ast-autonomous category is a pseudo Frobenius algebra in MVar. Conversely, if A is a pseudo Frobenius algebra in MVar, then letting x = i be the unit object of \otimes, we have

u\rhd y \cong u\rhd (i\otimes y) \cong (u\rhd i) \otimes y

giving an isomorphism

A(y,uv)A(uy,v)A((ui)y,v).A(y, u\odot v) \cong A(u\rhd y, v) \cong A((u\rhd i) \otimes y, v).

Thus u\rhd i behaves like a dual of u, and with a little more work we can show that it actually is. (I’m totally glossing over the symmetric/non-symmetric distinction here; in the non-symmetric case one has to distinguish between left and right duals, blah blah blah, but it all works.) So

A pseudo Frobenius algebra in MVar is precisely a \ast-autonomous category!

The fact that there’s a relationship between Frobenius algebras and \ast-autonomous categories is not new. In this paper, Brian Day and Ross Street showed that pseudo Frobenius algebras in Prof can be identified with “pro-\ast-autonomous categories”, i.e. promonoidal categories that are \ast-autonomous in a suitable sense. In this paper Jeff Egger showed that Frobenius algebras in the \ast-autonomous category Sup of suplattices can be identified with \ast-autonomous cocomplete posets. And Jeff has told me personally that he also noticed that lax Frobenius algebras correspond to mere linear distributivity. (By the way, the above characterization of \ast-autonomous categories as closed and co-closed linearly distributive ones such that certain transformations are invertible is due to Cockett and Seely.)

What’s new here is that the pseudo Frobenius algebras in MVar are exactly \ast-autonomous categories — not pro, not posets, not cocomplete.

There’s more that could be said. For instance, it’s known that Frobenius algebras can be defined in many different ways. One example is that instead of giving an algebra and coalgebra structure related by a Frobenius axiom, we could give just the algebra structure along with a compatible nondegenerate pairing A\otimes A \to I. This is also true for pseudo Frobenius algebras in a polycategory, and in MVar such a pairing (A,A) \to () corresponds to a contravariant self-equivalence (-)^\ast \colon A \simeq A^{op}, leading to the perhaps-more-common definition of star-autonomous category involving such a self-duality. And so on; but maybe I’ll stop here.

John PreskillMajorana update

If you are, by any chance, following progress in the field of Majorana bound states, then you are for sure super excited about ample Majorana results arriving this Fall. On the other hand, if you just heard about these elusive states recently, it is time for an update. For physicists working in the field, this Fall was perhaps the most exciting time since the first experimental reports from 2012. In the last few weeks there were not just one but at least three interesting manuscripts reporting new, insightful data which may finally provide a definitive experimental verification of the existence of these states in condensed matter systems.

But before I dive into these new results, let me give a brief history on the topic of  Majorana states and their experimental observation. The story starts with the young talented physicist Ettore Majorana, who hypothesized back in 1937 the existence of fermionic particles which were their own antiparticles. These hypothetical particles, now called Majorana fermions, were proposed in the context of elementary particle physics, but never observed. Some 60 years later, in the early 2000s, theoretical work emerged showing that Majorana fermionic states can exist as the quasiparticle excitations in certain low-dimensional superconducting systems (not a real particle as originally proposed, but otherwise having the exact same properties). Since then theorists have proposed half a dozen possible ways to realize Majorana modes using readily available materials such as superconductors, semiconductors, magnets, as well as topological insulators (for curious readers, I recommend manuscripts [1, 2, 3] for an overview of the different proposed methods to realize Majorana states in the lab).

The most fascinating thing about Majorana states is that they belong to the class of anyons, which means that they behave neither as bosons nor as fermions upon exchange. For example, if you have two identical fermionic (or bosonic) states and you exchange their positions, the quantum mechanical function describing the two states will acquire a phase factor of -1 (or +1). Anyons, on the other hand, can have an arbitrary phase factor e^{iφ} upon exchange. For this reason, they are considered to be a starting point for topological quantum computation. If you want to learn more about anyons, check out the video below featuring IQIM’s Gil Refael and Jason Alicea.

 

Back in 2012, a group in Delft (led by Prof. Leo Kouwenhoven) announced the observation of zero-energy states in a nanoscale device consisting of a semiconductor nanowire coupled to a superconductor. These states behaved very similarly to the Majoranas that were previously predicted to occur in this system. The key word here is ‘similar’, since the behavior of these modes was not fully consistent with the theoretical predictions. Namely, the electrical conductance carried through the observed zero energy states was only about ~5% of the expected perfect transmission value for Majoranas. This part of the data was very puzzling, and immediately cast some doubts throughout the community. The physicists were quickly divided into what I will call enthusiasts (believers that these initial results indeed originated from Majorana states) and skeptics (who were pointing out that effects, other than Majoranas, can result in similarly looking zero energy peaks). And thus a great debate started.

In the years that followed, experimentalists tried to observe zero energy features in improved devices, track how these features evolve with external parameters, such as gate voltages, length of the wires, etc., or focus on completely different platforms for hosting Majorana states, such as magnetic flux vortices in topological superconductors and magnetic atomic chains placed on a superconducting surface.  However, these results were not enough to convince skeptics that the observed states indeed originated from the Majoranas and not some other yet-to-be-discovered phenomenon. And so, the debate continued. With each generation of the experiments some of the alternative proposed scenarios were ruled out, but the final verification was still missing.

Fast forward to the events of this Fall and the exciting recent results. The manuscript I would like to invite you to read was just posted on ArXiv a couple of weeks ago. The main result is the observation of the perfectly quantized 2e2/h conductance at zero energy, the long sought signature of the Majorana states. This quantization implies that in this latest generation of semiconducting-superconducting devices zero-energy states exhibit perfect electron-hole symmetry and thus allow for perfect Andreev reflection. These remarkable results may finally end the debate and convince most of the skeptics out there.


Figure 1. (a,b) Comparison between devices and measurements from 2012 and 2017. (a) In 2012 a device made by combining a superconductor (a Niobium Titanium Nitride alloy) and an Indium Antimonide nanowire resulted in the first signature of zero energy states, but the conductance peak was only about 0.1 x e2/h. Adapted from Mourik et al., Science 2012. (b) Similar device from 2017 made by carefully depositing superconducting Aluminum on Indium Arsenide. The fully developed 2e2/h conductance peak was observed. Adapted from Zhang et al., ArXiv 2017. (c) Schematics of the Andreev reflection through the Normal (N)/Superconductor (S) interface. (d,e) Alternative view of the Andreev reflection process as a tunneling through a double barrier without and with Majorana modes (shown in yellow).

To fully appreciate these results, it is useful to quickly review the physics of Andreev reflection (Fig. 1c-e) that occurs at the interface between a normal region and a superconductor [4]. As the electron (blue) in the normal region enters the superconductor and pulls an additional electron with it to form a Cooper pair, an extra hole (red) is left behind (Fig. 1(c)). You can also think about this process as the transmission through two leads, one connecting the superconductor to the electrons and the other to the holes (Fig. 1d). This allows us to view this problem as transmission through a double barrier, which is generally low. In the presence of a Majorana state, however, there is a resonant level at zero energy which is coupled with the same amplitude to both electrons and holes. This in turn results in resonant Andreev reflection with a perfect quantization of 2e2/h (Fig. 1e). Note that, even in the configuration without Majorana modes, perfect quantization is possible but highly unlikely as it requires very careful tuning of the barrier potential (the authors did show that their quantization is robust against tuning the voltages on the gates, ruling out this possibility).
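To see why equal coupling to electrons and holes is the whole game, here is a toy Breit-Wigner resonance sketch (my own illustration, not taken from the papers discussed): a single level at zero energy coupled to the electron and hole channels with rates Gamma_e and Gamma_h transmits perfectly, and hence gives the full 2e2/h peak, only when the two rates are equal, which a Majorana mode guarantees by symmetry and a generic level achieves only by fine tuning.

def andreev_conductance(E, gamma_e, gamma_h):
    """Breit-Wigner transmission through a level pinned at zero energy; conductance in units of 2e^2/h."""
    return gamma_e * gamma_h / (E ** 2 + ((gamma_e + gamma_h) / 2) ** 2)

print(andreev_conductance(0.0, 0.1, 0.1))    # 1.0   -> full 2e^2/h peak (symmetric couplings, Majorana-like)
print(andreev_conductance(0.0, 0.1, 0.02))   # ~0.56 -> reduced peak for asymmetric couplings
print(andreev_conductance(0.2, 0.1, 0.1))    # 0.2   -> the peak falls off away from zero energy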

Going back to the experiments, you may wonder what made this breakthrough possible? It seems to be the combination of various factors, including using epitaxially grown  superconductors and more sophisticated fabrication methods. As often happens in experimental physics, this milestone did not come from one ingenious idea, but rather from numerous technical improvements obtained by several generations of hard-working grad students and postdocs.

If you are up for more Majorana reading, you can find two more recent eye-catching manuscripts here and here. Note that the list of interesting recent Majorana papers is a mere selection by the author and not complete by any means. A few months ago, my IQIM colleagues wrote a nice blog entry about topological qubits arriving in 2018. Although this may sound overly optimistic, the recent results suggest that the field is definitely taking off. While there are certainly many challenges to be solved, we may see the next generation of experiments designed to probe control over the Majorana states quite soon. Stay tuned for more!!!!!!


November 21, 2017

Doug NatelsonMax the Demon and the Entropy of Doom

My readers know I've complained/bemoaned repeatedly how challenging it can be to explain condensed matter physics on a popular level in an engaging way, even though that's the branch of physics that arguably has the greatest impact on our everyday lives.  Trying to take such concepts and reach an audience of children is an even greater, more ambitious task, and teenagers might be the toughest crowd of all.  A graphic novel or comic format is one visually appealing approach that is a lot less dry and perhaps more nonthreatening than straight prose.   Look at the success of xkcd and Randall Munroe!   The APS has had some reasonable success with their comics about their superhero Spectra.  Prior to that, Larry Gonick had done a very nice job on the survey side with the Cartoon Guide to Physics.  (On the parody side, I highly recommend Science Made Stupid (pdf) by Tom Weller, a key text from my teen years.  I especially liked Weller's description of the scientific method, and his fictional periodic table.)

Max the Demon and the Entropy of Doom is a new entry in the field, by Assa Auerbach and Richard Codor.  Prof. Auerbach is a well-known condensed matter theorist who usually writes more weighty tomes, and Mr. Codor is a professional cartoonist and illustrator.  The book is an entertaining explanation of the laws of thermodynamics, with a particular emphasis on the Second Law, using a humanoid alien, Max (the Demon), as an effective superhero.  

The comic does a good job, with nicely illustrated examples, of getting the point across about entropy as counting how many (microscopic) ways there are to do things. One of Max's powers is the ability to see and track microstates (like the detailed arrangement and trajectory of every air molecule in this room), when mere mortals can only see macrostates (like the average density and temperature). It also illustrates what we mean by temperature and heat with nice examples (and a not very subtle at all environmental message). There's history (through the plot device of time travel), action, adventure, and a Bad Guy who is appropriately not nice (and has a connection to history that I was irrationally pleased about guessing before it was revealed). My kids thought it was good, though my sense is that some aspects were too conceptually detailed for a 12-year-old and others were a bit too cute for a world-weary 15-year-old. Still, a definite good review from a tough crowd, and efforts like this should be applauded - overall I was very impressed.

November 19, 2017

Jordan Ellenberg“Worst of the worst maps”: a factual mistake in Gill v. Whitford

The oral arguments in Gill v. Whitford, the Wisconsin gerrymandering case, are now a month behind us.  But there’s a factual error in the state’s case, and I don’t want to let it be forgotten.  Thanks to Mira Bernstein for pointing this issue out to me.

Misha Tseytlin, Wisconsin’s solicitor general, was one of two lawyers arguing that the state’s Republican-drawn legislative boundaries should be allowed to stand.  Tseytlin argued that the metrics that flagged Wisconsin’s maps as drastically skewed in the GOP’s favor were unreliable:

And I think the easiest way to see this is to take a look at a chart that plaintiff’s own expert created, and that’s available on Supplemental Appendix 235. This is plain — plaintiff’s expert studied maps from 30 years, and he identified the 17 worst of the worst maps. What is so striking about that list of 17 is that 10 were neutral draws.  There were court-drawn maps, commission-drawn maps, bipartisan drawn maps, including the immediately prior Wisconsin drawn map.

That’s a strong claim, which jumped out at me when I read the transcripts–10 of the 17 very worst maps, according to the metrics, were drawn by neutral parties!  That really makes it sound like whatever those metrics are measuring, it’s not partisan gerrymandering.

But the claim isn’t true.

(To be clear, I believe Tseytlin made a mistake here, not a deliberate misrepresentation.)

The table he’s referring to is on p.55 of this paper by Simon Jackman, described as follows:

Of these, 17 plans are utterly unambiguous with respect to the sign of the efficiency gap estimates recorded over the life of the plan:

Let me unpack what Jackman’s saying here.  These are the 17 maps where we can be sure the efficiency gap favored the same party, three elections in a row.  You might ask: why wouldn’t we be sure about which side the map favors?  Isn’t the efficiency gap something we can compute precisely?  Not exactly.  The basic efficiency gap formula assumes both parties are running candidates in every district.  If there’s an uncontested race, you have to make your best estimate for what the candidate’s vote shares would have been if there had been candidates of both parties.  So you have an estimate for the efficiency gap, but also some uncertainty.  The more uncontested races, the more uncertain you are about the efficiency gap.
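For readers who want the formula spelled out, here is a minimal sketch of the basic efficiency gap computation described above (my own illustration; the district names and vote totals are invented). Wasted votes are all votes cast for the losing side plus the winner's votes beyond the 50% needed to win; the point about uncontested races is that some of these vote counts must be imputed rather than observed, which is where the uncertainty comes from.

def efficiency_gap(districts):
    """districts: list of (dem_votes, rep_votes) pairs, one per district.
    Returns (Dem wasted - Rep wasted) / total votes; the sign shows which party the map disadvantages."""
    wasted_dem = wasted_rep = total = 0.0
    for dem, rep in districts:
        votes = dem + rep
        total += votes
        threshold = votes / 2.0
        if dem > rep:
            wasted_dem += dem - threshold    # winner's surplus votes
            wasted_rep += rep                # all losing votes
        else:
            wasted_rep += rep - threshold
            wasted_dem += dem
    return (wasted_dem - wasted_rep) / total

# Hypothetical four-district map: one packed Democratic district, three narrow Republican wins.
print(efficiency_gap([(90, 10), (45, 55), (45, 55), (45, 55)]))   # 0.375, a large gap favoring the GOP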

So the maps on this list aren’t the 17 “worst of the worst maps.”  They’re not the ones with the highest efficiency gaps, not the ones most badly gerrymandered by any measure.  They’re the ones in states with so few uncontested races that we can be essentially certain the efficiency gap favored the same party three years running.

Tseytlin’s argument is supposed to make you think that big efficiency gaps are as likely to come from neutral maps as partisan ones.  But that’s not true.  Maps drawn by Democratic legislatures have an average efficiency gap favoring Democrats; those drawn by Republican legislatures on average favor the GOP; neutral maps are in between, and have smaller efficiency gaps overall.

That’s from p.35 of another Jackman paper.  Note the big change after 2010.  It wasn’t always the case that partisan legislators automatically thumbed the scales strongly in their favor when drawing the maps.  But these days, it kind of is.  Is that because partisanship is worse now?  Or because cheaper, faster computation makes it easier for one-party legislatures to do what they always would have done, if they could?  I can’t say for sure.

Efficiency gap isn’t a perfect measure, and neither side in this case is arguing it should be the single or final arbiter of unconstitutional gerrymandering.  But the idea that efficiency gap flags neutral maps as often as partisan maps is just wrong, and it shouldn’t have been part of the state’s argument before the court.


Chad OrzelMeet Charlie

It’s been a couple of years since we lost the Queen of Niskayuna, and we’ve held off getting a dog until now because we were planning a big home renovation– adding on to the mud room, creating a new bedroom on the second floor, and gutting and replacing the kitchen. This was quite the undertaking, and we would not have wanted to put a dog through that. It was bad enough putting us through that…

With the renovation complete, we started looking for a dog a month or so back, and eventually ended up working with a local rescue group with the brilliantly unsubtle name Help Orphan Puppies. This weekend, we officially adopted this cutie:

Charlie, the new pupper at Chateau Steelypips, showing off his one pointy ear.

He was listed on the website as “Prince,” but his foster family had been calling him “Charlie,” and the kids liked that name a lot, so we’re keeping it. He’s a Plott Hound mix (the “mix” being evident in the one ear that sticks up while the other flops down), one of six puppies found with his mother back in May in a ravine in I think they said South Carolina. He’s the last of the litter to find a permanent home. The name change is appropriate, as Emmy was listed as “Princess” before we adopted her and changed her name.

Charlie’s a sweet and energetic boy, who’s basically housebroken, and sorta-kinda crate trained, which is about the same as Emmy when we got her. He knows how to sit, and is learning other commands. He’s very sweet with people, and we haven’t really met any other dogs yet, but he was fostered in a home with two other dogs, so we hope he’ll do well. And he’s super good at jumping– he cleared a 28″ child safety gate we were attempting to use to keep him in the mud room– and does a zoom with the best of them:

Charlie does a zoom.

The kids are absolutely over the moon about having a dog again, as you can see from their paparazzi turn:

Charlie poses for the paparazzi.

He’s a very good boy, all in all, and we’re very pleased to have him. I can’t really describe how good it felt on Saturday afternoon to once again settle down on the couch with a football game on tv, and drop my hand down to pet a dog lying on the floor next to me. I still miss some things about Emmy, but Charlie’s already filling a huge void.

Chad OrzelGo On Till You Come to the End; Then Stop

ScienceBlogs is coming to an end. I don’t know that there was ever a really official announcement of this, but the bloggers got email a while back letting us know that the site will be closing down. I’ve been absolutely getting crushed between work and the book-in-progress and getting Charlie the pupper, but I did manage to export and re-import the content to an archive site back on steelypips.org. (The theme there is an awful default WordPress one, but I’m too slammed with work to make it look better; the point is just to have an online archive for the temporary redirects to work with.)

I’m one of a handful who were there from the very beginning to the bitter end– I got asked to join up in late 2005, and the first new post here was on January 11, 2006 (I copied over some older content before it went live, so it wasn’t just a blank page with a “Welcome to my new blog!” post). It seems fitting to have the last post be on the site’s last day of operation.

The history of ScienceBlogs and my place in it was… complicated. There were some early efforts to build a real community among the bloggers, but we turned out to be an irascible lot, and after a while that kind of fell apart. The site was originally associated with Seed magazine, which folded, then it was a stand-alone thing for a bit, then partnered with National Geographic, and the last few years it’s been an independent entity again. I’ve been mostly blogging at Forbes since mid-2015, so I’ve been pretty removed from the network– I’m honestly not even sure what blogs have been active in the past few years. I’ll continue to blog at Forbes, and may or may not re-launch more personal blogging at the archive site. A lot of that content is now posted to

What led to the slow demise of ScienceBlogs? Like most people who’ve been associated with it over the years, I have Thoughts on the subject, but I don’t really feel like airing them at this point. (If somebody else wants to write an epic oral history of SB, email me, and we can talk…) I don’t think it was ever going to be a high-margin business, and there were a number of mis-steps over the years that undercut the money-making potential even more. I probably burned or at least charred some bridges by staying with the site as long as I did, but whatever. And it’s not like anybody else is getting fabulously wealthy from running blog networks that pay reasonable rates.

ScienceBlogs unquestionably gave an enormous boost to my career. I’ve gotten any number of cool opportunities as a direct result of blogging here, most importantly my career as a writer of pop-physics books. There were some things along the way that didn’t pan out as I’d hoped, but this site launched me to what fame I have, and I’ll always be grateful for that.

So, ave atque vale, ScienceBlogs. It was a noble experiment, and the good days were very good indeed.

November 17, 2017

Tommaso DorigoMy Interview On Physics Today

Following the appearance of Kent Staley's review of my book "Anomaly!" in the November 2017 issue of Physics Today, the online site of the magazine offers, starting today, an interview with yours truly. I think the piece is quite readable and I encourage you to give it a look. Here I only quote a couple of passages for the laziest readers.


November 16, 2017

Dirac Sea ShoreWhat’s on my mind

November 12, 2017

November 11, 2017

Terence TaoContinuous approximations to arithmetic functions

Basic objects of study in multiplicative number theory are the arithmetic functions: functions {f: {\bf N} \rightarrow {\bf C}} from the natural numbers to the complex numbers. Some fundamental examples of such functions include the constant function {1}, the Kronecker delta {\delta} (equal to {1} at {n=1} and vanishing elsewhere), the logarithm function {L(n) := \log n}, the divisor function {d_2}, the Möbius function {\mu}, and the von Mangoldt function {\Lambda}, all of which appear in the identities below.

Given an arithmetic function {f}, we are often interested in statistics such as the summatory function

\displaystyle \sum_{n \leq x} f(n), \ \ \ \ \ (1)

 

the logarithmically (or harmonically) weighted summatory function

\displaystyle \sum_{n \leq x} \frac{f(n)}{n}, \ \ \ \ \ (2)

 

or the Dirichlet series

\displaystyle {\mathcal D}[f](s) := \sum_n \frac{f(n)}{n^s}.

In the latter case, one typically has to first restrict {s} to those complex numbers whose real part is large enough in order to ensure the series on the right converges; but in many important cases, one can then extend the Dirichlet series to almost all of the complex plane by analytic continuation. One is also interested in correlations involving additive shifts, such as {\sum_{n \leq x} f(n) f(n+h)}, but these are significantly more difficult to study and cannot be easily estimated by the methods of classical multiplicative number theory.

A key operation on arithmetic functions is that of Dirichlet convolution, which when given two arithmetic functions {f,g: {\bf N} \rightarrow {\bf C}}, forms a new arithmetic function {f*g: {\bf N} \rightarrow {\bf C}}, defined by the formula

\displaystyle f*g(n) := \sum_{d|n} f(d) g(\frac{n}{d}).

Thus for instance {1*1 = d_2}, {1 * \Lambda = L}, {1 * \mu = \delta}, and {\delta * f = f} for any arithmetic function {f}. Dirichlet convolution and Dirichlet series are related by the fundamental formula

\displaystyle {\mathcal D}[f * g](s) = {\mathcal D}[f](s) {\mathcal D}[g](s), \ \ \ \ \ (3)

 

at least when the real part of {s} is large enough that all sums involved become absolutely convergent (but in practice one can use analytic continuation to extend this identity to most of the complex plane). There is also the identity

\displaystyle {\mathcal D}[Lf](s) = - \frac{d}{ds} {\mathcal D}[f](s), \ \ \ \ \ (4)

 

at least when the real part of {s} is large enough to justify interchange of differentiation and summation. As a consequence, many Dirichlet series can be expressed in terms of the Riemann zeta function {\zeta = {\mathcal D}[1]}, thus for instance

\displaystyle {\mathcal D}[d_2](s) = \zeta^2(s)

\displaystyle {\mathcal D}[L](s) = - \zeta'(s)

\displaystyle {\mathcal D}[\delta](s) = 1

\displaystyle {\mathcal D}[\mu](s) = \frac{1}{\zeta(s)}

\displaystyle {\mathcal D}[\Lambda](s) = -\frac{\zeta'(s)}{\zeta(s)}.
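As a quick numerical sanity check (my own, not part of the original discussion), one can verify the identities {1*1 = d_2}, {1 * \mu = \delta} and {1 * \Lambda = L} for small {n} directly from the definition of Dirichlet convolution:

import math

def factorize(n):
    """Trial-division factorization, returning a dictionary {prime: exponent}."""
    fac, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            fac[p] = fac.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        fac[n] = fac.get(n, 0) + 1
    return fac

def dirichlet_convolve(f, g, n):
    return sum(f(d) * g(n // d) for d in range(1, n + 1) if n % d == 0)

def one(n):
    return 1

def mobius(n):
    fac = factorize(n)
    return 0 if any(e > 1 for e in fac.values()) else (-1) ** len(fac)

def von_mangoldt(n):
    fac = factorize(n)
    return math.log(min(fac)) if len(fac) == 1 else 0.0

for n in range(1, 13):
    print(n,
          dirichlet_convolve(one, one, n),                                    # d_2(n), the number of divisors
          dirichlet_convolve(one, mobius, n),                                 # delta(n): 1 at n = 1, else 0
          round(dirichlet_convolve(one, von_mangoldt, n) - math.log(n), 12))  # 1*Lambda(n) - log n, always ~0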

Much of the difficulty of multiplicative number theory can be traced back to the discrete nature of the natural numbers {{\bf N}}, which form a rather complicated abelian semigroup with respect to multiplication (in particular the set of generators is the set of prime numbers). One can obtain a simpler analogue of the subject by working instead with the half-infinite interval {{\bf N}_\infty := [1,+\infty)}, which is a much simpler abelian semigroup under multiplication (being a one-dimensional Lie semigroup). (I will think of this as a sort of “completion” of {{\bf N}} at the infinite place {\infty}, hence the terminology.) Accordingly, let us define a continuous arithmetic function to be a locally integrable function {f: {\bf N}_\infty \rightarrow {\bf C}}. The analogue of the summatory function (1) is then an integral

\displaystyle \int_1^x f(t)\ dt,

and similarly the analogue of (2) is

\displaystyle \int_1^x \frac{f(t)}{t}\ dt.

The analogue of the Dirichlet series is the Mellin-type transform

\displaystyle {\mathcal D}_\infty[f](s) := \int_1^\infty \frac{f(t)}{t^s}\ dt,

which will be well-defined at least if the real part of {s} is large enough and if the continuous arithmetic function {f: {\bf N}_\infty \rightarrow {\bf C}} does not grow too quickly, and hopefully will also be defined elsewhere in the complex plane by analytic continuation.

For instance, the continuous analogue of the discrete constant function {1: {\bf N} \rightarrow {\bf C}} would be the constant function {1_\infty: {\bf N}_\infty \rightarrow {\bf C}}, which maps any {t \in [1,+\infty)} to {1}, and which we will denote by {1_\infty} in order to keep it distinct from {1}. The two functions {1_\infty} and {1} have approximately similar statistics; for instance one has

\displaystyle \sum_{n \leq x} 1 = \lfloor x \rfloor \approx x-1 = \int_1^x 1\ dt

and

\displaystyle \sum_{n \leq x} \frac{1}{n} = H_{\lfloor x \rfloor} \approx \log x = \int_1^x \frac{1}{t}\ dt

where {H_n} is the {n^{th}} harmonic number, and we are deliberately vague as to what the symbol {\approx} means. Continuing this analogy, we would expect

\displaystyle {\mathcal D}[1](s) = \zeta(s) \approx \frac{1}{s-1} = {\mathcal D}_\infty[1_\infty](s)

which reflects the fact that {\zeta} has a simple pole at {s=1} with residue {1}, and no other poles. Note that the identity {{\mathcal D}_\infty[1_\infty](s) = \frac{1}{s-1}} is initially only valid in the region {\mathrm{Re} s > 1}, but clearly the right-hand side can be continued analytically to the entire complex plane except for the pole at {1}, and so one can define {{\mathcal D}_\infty[1_\infty]} in this region also.

In a similar vein, the logarithm function {L: {\bf N} \rightarrow {\bf C}} is approximately similar to the logarithm function {L_\infty: {\bf N}_\infty \rightarrow {\bf C}}, giving for instance the crude form

\displaystyle \sum_{n \leq x} L(n) = \log \lfloor x \rfloor! \approx x \log x - x = \int_1^x L_\infty(t)\ dt

of Stirling’s formula, or the Dirichlet series approximation

\displaystyle {\mathcal D}[L](s) = -\zeta'(s) \approx \frac{1}{(s-1)^2} = {\mathcal D}_\infty[L_\infty](s).
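These approximations are easy to test numerically; the following quick check (my own) compares the discrete sums above with their continuous analogues at {x = 10^5}, and shows agreement up to the lower-order terms that the vague symbol {\approx} is sweeping under the rug.

import math

x = 10 ** 5
harmonic_sum = sum(1.0 / n for n in range(1, x + 1))
log_sum = sum(math.log(n) for n in range(1, x + 1))

print(x, x - 1)                          # floor(x) versus x - 1
print(harmonic_sum, math.log(x))         # H_x versus log x (they differ by roughly gamma = 0.577...)
print(log_sum, x * math.log(x) - x)      # log(x!) versus x log x - x (Stirling)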

The continuous analogue of Dirichlet convolution is multiplicative convolution using the multiplicative Haar measure {\frac{dt}{t}}: given two continuous arithmetic functions {f_\infty, g_\infty: {\bf N}_\infty \rightarrow {\bf C}}, one can define their convolution {f_\infty *_\infty g_\infty: {\bf N}_\infty \rightarrow {\bf C}} by the formula

\displaystyle f_\infty *_\infty g_\infty(t) := \int_1^t f_\infty(s) g_\infty(\frac{t}{s}) \frac{ds}{s}.

Thus for instance {1_\infty *_\infty 1_\infty = L_\infty}. A short computation using Fubini’s theorem shows the analogue

\displaystyle D_\infty[f_\infty *_\infty g_\infty](s) = D_\infty[f_\infty](s) D_\infty[g_\infty](s)

of (3) whenever the real part of {s} is large enough that Fubini’s theorem can be justified; similarly, differentiation under the integral sign shows that

\displaystyle D_\infty[L_\infty f_\infty](s) = -\frac{d}{ds} D_\infty[f_\infty](s) \ \ \ \ \ (5)

 

again assuming that the real part of {s} is large enough that differentiation under the integral sign (or some other tool like this, such as the Cauchy integral formula for derivatives) can be justified.
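Here is a small numerical sketch (my own) of the multiplicative convolution just defined, checking the identity {1_\infty *_\infty 1_\infty = L_\infty} by quadrature:

import math
from scipy.integrate import quad

def mult_convolve(f, g, t):
    """Multiplicative convolution int_1^t f(s) g(t/s) ds/s with respect to the Haar measure ds/s."""
    value, _error = quad(lambda s: f(s) * g(t / s) / s, 1.0, t)
    return value

def one(t):
    return 1.0

for t in (2.0, 5.0, 20.0):
    print(t, mult_convolve(one, one, t), math.log(t))   # the last two columns agree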

Direct calculation shows that for any complex number {\rho}, one has

\displaystyle \frac{1}{s-\rho} = D_\infty[ t \mapsto t^{\rho-1} ](s)

(at least for the real part of {s} large enough), and hence by several applications of (5)

\displaystyle \frac{1}{(s-\rho)^k} = D_\infty[ t \mapsto \frac{1}{(k-1)!} t^{\rho-1} \log^{k-1} t ](s)

for any natural number {k}. This can lead to the following heuristic: if a Dirichlet series {D[f](s)} behaves like a linear combination of poles {\frac{1}{(s-\rho)^k}}, in that

\displaystyle D[f](s) \approx \sum_\rho \frac{c_\rho}{(s-\rho)^{k_\rho}}

for some set {\rho} of poles and some coefficients {c_\rho} and natural numbers {k_\rho} (where we again are vague as to what {\approx} means, and how to interpret the sum {\sum_\rho} if the set of poles is infinite), then one should expect the arithmetic function {f} to behave like the continuous arithmetic function

\displaystyle t \mapsto \sum_\rho \frac{c_\rho}{(k_\rho-1)!} t^{\rho-1} \log^{k_\rho-1} t.

In particular, if we only have simple poles,

\displaystyle D[f](s) \approx \sum_\rho \frac{c_\rho}{s-\rho}

then we expect to have {f} behave like continuous arithmetic function

\displaystyle t \mapsto \sum_\rho c_\rho t^{\rho-1}.

Integrating this from {1} to {x}, this heuristically suggests an approximation

\displaystyle \sum_{n \leq x} f(n) \approx \sum_\rho c_\rho \frac{x^\rho-1}{\rho}

for the summatory function, and similarly

\displaystyle \sum_{n \leq x} \frac{f(n)}{n} \approx \sum_\rho c_\rho \frac{x^{\rho-1}-1}{\rho-1},

with the convention that {\frac{x^\rho-1}{\rho}} is {\log x} when {\rho=0}, and similarly {\frac{x^{\rho-1}-1}{\rho-1}} is {\log x} when {\rho=1}. One can make these sorts of approximations more rigorous by means of Perron’s formula (or one of its variants) combined with the residue theorem, provided that one has good enough control on the relevant Dirichlet series, but we will not pursue these rigorous calculations here. (But see for instance this previous blog post for some examples.)

For instance, using the more refined approximation

\displaystyle \zeta(s) \approx \frac{1}{s-1} + \gamma

to the zeta function near {s=1}, we have

\displaystyle {\mathcal D}[d_2](s) = \zeta^2(s) \approx \frac{1}{(s-1)^2} + \frac{2 \gamma}{s-1}

we would expect that

\displaystyle d_2 \approx L_\infty + 2 \gamma

and thus for instance

\displaystyle \sum_{n \leq x} d_2(n) \approx x \log x - x + 2 \gamma x

which matches what one actually gets from the Dirichlet hyperbola method (see e.g. equation (44) of this previous post).
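A quick numerical check of this heuristic (my own sketch) uses the fact that {\sum_{n \leq x} d_2(n) = \sum_{d \leq x} \lfloor x/d \rfloor}:

import math

gamma = 0.5772156649015329   # Euler-Mascheroni constant

def divisor_summatory(x):
    """sum_{n <= x} d_2(n), computed as sum_{d <= x} floor(x/d)."""
    return sum(x // d for d in range(1, x + 1))

for x in (10 ** 3, 10 ** 5):
    exact = divisor_summatory(x)
    heuristic = x * math.log(x) - x + 2 * gamma * x
    print(x, exact, round(heuristic, 1), round(exact - heuristic, 1))   # the discrepancy is of lower order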

Or, noting that {\zeta(s)} has a simple pole at {s=1} and assuming simple zeroes elsewhere, the log derivative {-\zeta'(s)/\zeta(s)} will have simple poles of residue {+1} at {s=1} and {-1} at all the zeroes, leading to the heuristic

\displaystyle {\mathcal D}[\Lambda](s) = -\frac{\zeta'(s)}{\zeta(s)} \approx \frac{1}{s-1} - \sum_\rho \frac{1}{s-\rho}

suggesting that {\Lambda} should behave like the continuous arithmetic function

\displaystyle t \mapsto 1 - \sum_\rho t^{\rho-1}

leading for instance to the summatory approximation

\displaystyle \sum_{n \leq x} \Lambda(n) \approx x - \sum_\rho \frac{x^\rho-1}{\rho}

which is a heuristic form of the Riemann-von Mangoldt explicit formula (see Exercise 45 of these notes for a rigorous version of this formula).
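Again the leading term is easy to check numerically (my own sketch): the Chebyshev function {\psi(x) = \sum_{n \leq x} \Lambda(n)} stays close to {x}, with the sum over zeroes supplying the lower-order oscillations.

import math

def psi(x):
    """Chebyshev function: sum of log p over prime powers p^k <= x."""
    total = 0.0
    for p in range(2, x + 1):
        if all(p % q for q in range(2, math.isqrt(p) + 1)):   # p is prime
            pk = p
            while pk <= x:
                total += math.log(p)
                pk *= p
    return total

for x in (10 ** 3, 10 ** 4):
    print(x, psi(x))   # both values stay close to x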

Exercise 1 Go through some of the other explicit formulae listed at this Wikipedia page and give heuristic justifications for them (up to some lower order terms) by similar calculations to those given above.

Given the “adelic” perspective on number theory, I wonder if there are also {p}-adic analogues of arithmetic functions to which a similar set of heuristics can be applied, perhaps to study sums such as {\sum_{n \leq x: n = a \hbox{ mod } p^j} f(n)}. A key problem here is that there does not seem to be any good interpretation of the expression {\frac{1}{t^s}} when {s} is complex and {t} is a {p}-adic number, so it is not clear that one can analyse a Dirichlet series {p}-adically. For similar reasons, we don’t have a canonical way to define {\chi(t)} for a Dirichlet character {\chi} (unless its conductor happens to be a power of {p}), so there doesn’t seem to be much to say in the {q}-aspect either.


Filed under: expository, math.NT Tagged: Dirichlet series, prime number theorem

Tommaso DorigoAnomaly Reviewed On Physics Today

Another quite positive review of my book "Anomaly! Collider Physics and the Quest for New Phenomena at Fermilab"  (which these days is 40% off at the World Scientific site I am linking) has appeared on Physics Today this month.


November 09, 2017

Robert HellingWhy is there a supercontinent cycle?

One of the most influential books of my early childhood was my "Kinderatlas".
There were many things to learn about the world (maps were actually only the last third of the book) and for example I blame my fascination for scuba diving on this book. Also last year, when we visited the Mont-Doré in Auvergne and I had to explain how volcanoes are formed to my kids to make them forget how many stairs were still ahead of them to the summit, I did that while mentally picturing the pages in that book about plate tectonics.


But there is one thing I about tectonics that has been bothering me for a long time and I still haven't found a good explanation for (or at least an acknowledgement that there is something to explain): Since the days of Alfred Wegener we know that the jigsaw puzzle pieces of the continents fit in a way that geologists believe that some hundred million years ago they were all connected as a supercontinent Pangea.
[Animation: the breakup of Pangea (USGS animation A08, uploaded by en:User:Tbower, public domain, via Wikipedia).]

In fact, that was only the last in a series of supercontinents, that keep forming and breaking up in the "supercontinent cycle".
[Image: the supercontinent cycle (by SimplisticReps, CC BY-SA 4.0, via Wikipedia).]

So here is the question: I am happy with the idea of several (say $N$) plates roughly containing a continent each that are floating around on the magma, driven by all kinds of convection processes in the liquid part of the earth. They are moving around in a pattern that looks to me to be pretty chaotic (in the non-technical sense), and of course for random motion you would expect that from time to time two of those collide and then maybe stick for a while.

It would then also be possible that a third plate collides with the two, but that would be a coincidence (two random lines typically intersect, but three random lines typically intersect in pairs and not in a triple intersection). But to form a supercontinent, you need all $N$ plates to miraculously collide at the same time. This order-$N$ process seems highly unlikely if the motion is random, let alone the fact that it seems to repeat. So this motion cannot be random (yes, Sabine, this is a naturalness argument). This needs an explanation.

So, why, every few hundred million years, do all the land masses of the earth assemble on one side of the earth?

One explanation could, for example, be that during those times the center of mass of the earth is not at the symmetry center, so the water of the oceans flows to one side of the earth and reveals the seabed on the opposite side. Then you would have essentially one big island. But this seems not to be the case, as the continents (those parts that are above sea level) appear to be stable on much longer time scales. It is not that the seabed comes up on one side and the land on the other goes under water; rather, the land masses actually move around to meet on one side.

I have already asked this question whenever I ran into people with a geosciences education, but it is still open (and I have to admit that in a non-zero number of cases I failed even to make it clear that an $N$-body collision is something that needs an explanation). But I am sure you, my readers, know the answer, or even better can come up with one.

November 08, 2017

Terence TaoIPAM program in quantitative linear algebra, Mar 19-Jun 15 2018

Alice Guionnet, Assaf Naor, Gilles Pisier, Sorin Popa, Dimitri Shlyakhtenko, and I are organising a three-month program here at the Institute for Pure and Applied Mathematics (IPAM) on the topic of Quantitative Linear Algebra.  The purpose of this program is to bring together mathematicians and computer scientists (both junior and senior) working in various quantitative aspects of linear operators, particularly in large finite dimension.  Such aspects include, but are not restricted to, discrepancy theory, spectral graph theory, random matrices, geometric group theory, ergodic theory, and von Neumann algebras, as well as specific research directions such as the Kadison-Singer problem, the Connes embedding conjecture and the Grothendieck inequality.  There will be several workshops and tutorials during the program (for instance I will be giving a series of introductory lectures on random matrix theory).

While we already have several confirmed participants, we are still accepting applications for this program until Dec 4; details of the application process may be found at this page.


Filed under: advertising Tagged: ipam, linear algebra

November 07, 2017

Doug NatelsonTaxes and grad student tuition

As has happened periodically over the last couple of decades (I remember a scare about this when Newt Gingrich's folks ran Congress in the mid-1990s), a tax bill has been put forward in the US House that would treat graduate student tuition waivers like taxable income (roughly speaking).   This is discussed a little bit here, and here.

Here's an example of why this is an ill-informed idea.  Suppose a first-year STEM grad student comes to a US university, and they are supported by, say, departmental fellowship funds or a TA position during that first year.  Their stipend is something like $30K.  These days the university waives their graduate tuition - that is, they do not expect the student to pony up tuition funds.  At Rice, that tuition is around $45K.  Under the proposed legislation, the student would end up getting taxed as if their income was $75K, when their actual gross pay is $30K.   
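To see how the arithmetic plays out, here is a rough back-of-the-envelope sketch in Python. The `tax` helper and the bracket thresholds and rates are my own illustrative choices (roughly 2017-style single-filer numbers), and the calculation ignores deductions, exemptions and credits, so the outputs only indicate the scale of the jump, not anyone's actual tax bill.

```python
def tax(income, brackets):
    """Progressive tax on `income`, given (threshold, marginal_rate) pairs;
    each rate applies to the slice of income above its threshold and below
    the next one."""
    owed = 0.0
    for (lo, rate), (hi, _) in zip(brackets, brackets[1:] + [(float("inf"), 0.0)]):
        if income > lo:
            owed += (min(income, hi) - lo) * rate
    return owed

# Illustrative brackets only (roughly 2017-style single-filer numbers).
BRACKETS = [(0, 0.10), (9_325, 0.15), (37_950, 0.25), (91_900, 0.28)]

stipend = 30_000          # rough stipend from the example above
waived_tuition = 45_000   # rough tuition waiver from the example above

print("tax on stipend alone:         ", round(tax(stipend, BRACKETS)))
print("tax if the waiver were income:", round(tax(stipend + waived_tuition, BRACKETS)))
```

With these made-up brackets the bill jumps from roughly $4,000 to roughly $14,500, all paid out of a $30K stipend.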

That would be extremely bad for both graduate students and research universities.  Right off the bat this would create unintended (I presume) economic incentives for grad students to drop out of their programs, and/or for universities to play funny games with what they say is graduate tuition.   

This has been pitched multiple times before, and my hypothesis is that it's put forward by congressional staffers who do not understand graduate school (and/or think that this is the same kind of tuition waiver as when a faculty member's child gets a vastly reduced tuition for attending the parent's employing university).  Because it is glaringly dumb, it has been fixed whenever it's come up before.  In the present environment, the prudent thing to do would be to exercise caution and let legislators know that this is a problem that needs to be fixed.

October 31, 2017

Doug NatelsonLinks + coming soon

Real life is a bit busy right now, but I wanted to point out a couple of links and talk about what's coming up.
  • I've been looking for ways to think about and discuss topological materials that might be more broadly accessible to non-experts, and I found this paper and videos like this one and this one.  Very cool, and I'm sorry I'd missed it back in '15 when it came out.
  • In the experimental literature talking about realizations of Majorana fermions in the solid state, a key signature is a peak in the conductance at zero voltage - that's an indicator that there is a "zero-energy mode" in the system.  There are other ways to get zero-bias peaks, though, and nailing down whether this has the expected properties (magnitude, response to magnetic fields) has been a lingering issue.  This seems to nail down the situation more firmly.
  • Discussions about "quantum supremacy" strictly in terms of how many qubits can be simulated on a classical computer right now seem a bit silly to me.  Ok, so IBM managed to simulate a handful of additional qubits (56 rather than 49).  It wouldn't shock me if they could get up to 58 - supercomputers are powerful and programmers can be very clever.  Are we going to get a flurry of news stories every time about how this somehow moves the goalposts for quantum computers?    
  • I'm hoping to put out a review of Max the Demon and the Entropy of Doom, since I received my beautifully printed copies this past weekend.

October 27, 2017

Terence TaoUCLA Math Undergraduate Merit Scholarship for 2018

In 2010, the UCLA mathematics department launched a scholarship opportunity for entering freshman students with exceptional background and promise in mathematics. We are able to offer one scholarship each year.  The UCLA Math Undergraduate Merit Scholarship provides for full tuition, and a room and board allowance for 4 years, contingent on continued high academic performance. In addition, scholarship recipients follow an individualized accelerated program of study, as determined after consultation with UCLA faculty.   The program of study leads to a Masters degree in Mathematics in four years.

More information and an application form for the scholarship can be found on the web at:

http://www.math.ucla.edu/ugrad/mums

To be considered for Fall 2018, candidates must apply for the scholarship and also for admission to UCLA on or before November 30, 2017.


Filed under: advertising Tagged: scholarship, UCLA, undergraduate study

October 24, 2017

Jordan EllenbergThe greatest Astro/Dodger

The World Series is here and so it’s time again to figure out which player in the history of baseball has had the most distinguished joint record of contributions to both teams in contention for the title.  (Last year:  Riggs Stephenson was the greatest Cub/Indian.)  Astros history just isn’t that long, so it’s a little surprising to find we come up with a really solid winner this year:  Jimmy Wynn, “The Toy Cannon,” a longtime Astro who moved to LA in 1974 and had arguably his best season, finishing 5th in MVP voting and leading the Dodgers to a pennant.  Real three-true-outcomes guy:  led the league in walks twice and strikeouts once, and was top-10 in the National League in home runs four times in the Astrodome.  Career total of 41.4 WAR for the Astros, and 12.3 for the Dodgers in just two years there.

As always, thanks to the indispensable Baseball Reference Play Index for making this search possible.

Other contenders:  Don Sutton is clearly tops among pitchers.  Sutton was the flip side of Wynn; he had just two seasons for Houston but they were pretty good.  Beyond that it’s slim pickings.  Jeff Kent put in some years for both teams.  So did Joe Ferguson.

Who are we rooting for?  On the “ex-Orioles on the WS roster” front I guess the Dodgers have the advantage, with Rich Hill and Justin Turner (I have to admit I have no memory of Turner playing for the Orioles at all, even though it wasn’t that long ago!  It was in 2009, a season I have few occasions to recall.)  But both these teams are stocked with players I just plain like:  Kershaw, Puig, Altuve, the great Carlos Beltran…

 

 


Andrew JaffeThe Chandrasekhar Mass and the Hubble Constant

The first direct detection of gravitational waves was announced in February of 2016 by the LIGO team, after decades of planning, building and refining their beautiful experiment. Since that time, the US-based LIGO has been joined by the European Virgo gravitational wave telescope (and more are planned around the globe).

The first four events that the teams announced were from the spiralling in and eventual mergers of pairs of black holes, with masses ranging from about seven to about forty times the mass of the sun. These masses are perhaps a bit higher than we expect to be typical, which might raise intriguing questions about how such black holes were formed and evolved, although even comparing the results to the predictions is a hard problem depending on the details of the statistical properties of the detectors and the astrophysical models for the evolution of black holes and the stars from which (we think) they formed.

Last week, the teams announced the detection of a very different kind of event, the collision of two neutron stars, each about 1.4 times the mass of the sun. Neutron stars are one possible end state of the evolution of a star, when its atoms are no longer able to withstand the pressure of the gravity trying to force them together. This was first understood by S Chandrasekhar in the early years of the 20th Century, who realised that there was a limit to the mass of a star held up simply by the quantum-mechanical repulsion of the electrons at the outskirts of the atoms making up the star. When you surpass this mass, known, appropriately enough, as the Chandrasekhar mass, the star will collapse in upon itself, combining the electrons and protons into neutrons and likely releasing a vast amount of energy in the form of a supernova explosion. After the explosion, the remnant is likely to be a dense ball of neutrons, whose properties are actually determined fairly precisely by similar physics to that of the Chandrasekhar limit (discussed for this case by Oppenheimer, Volkoff and Tolman), giving us the magic 1.4 solar mass number.

(Last week also coincidentally would have seen Chandrasekhar’s 107th birthday, and Google chose to illustrate their home page with an animation in his honour for the occasion. I was a graduate student at the University of Chicago, where Chandra, as he was known, spent most of his career. Most of us students were far too intimidated to interact with him, although it was always seen as an auspicious occasion when you spotted him around the halls of the Astronomy and Astrophysics Center.)

This process can therefore make a single 1.4 solar-mass neutron star, and we can imagine that in some rare cases we can end up with two neutron stars orbiting one another. Indeed, the fact that LIGO saw one, but only one, such event during its year-and-a-half run allows the teams to constrain how often that happens, albeit with very large error bars, between 320 and 4740 events per cubic gigaparsec per year; a cubic gigaparsec is about 3 billion light-years on each side, so these are rare events indeed. These results and many other scientific inferences from this single amazing observation are reported in the teams’ overview paper.

A series of other papers discuss those results in more detail, covering everything from the physics of neutron stars to limits on departures from Einstein’s theory of gravity (for more on some of these other topics, see this blog, or this story from the NY Times). To me as a cosmologist, the most exciting of the results was the use of the event as a “standard siren”, an object whose gravitational wave properties are well-enough understood that we can deduce the distance to the object from the LIGO results alone. Although the idea came from Bernard Schutz in 1986, the term “standard siren” was coined somewhat later (by Sean Carroll) in analogy to the (heretofore?) more common cosmological standard candles and standard rulers: objects whose intrinsic brightness or size is known, and so whose distances can be measured by observations of their apparent brightness or size, just as you can roughly deduce how far away a light bulb is by how bright it appears, or how far away a familiar object or person is by how big it looks.

Gravitational wave events are standard sirens because our understanding of relativity is good enough that an observation of the shape of the gravitational wave pattern as a function of time can tell us the properties of its source. Knowing that, we also then know the amplitude of that pattern when it was released. Over the time since then, as the gravitational waves have travelled across the Universe toward us, the amplitude has gone down (further objects look dimmer and sound quieter); the expansion of the Universe also causes the frequency of the waves to decrease — this is the cosmological redshift that we observe in the spectra of distant objects’ light.

Unlike LIGO’s previous detections of binary-black-hole mergers, this new observation of a binary-neutron-star merger was also seen in photons: first as a gamma-ray burst, and then as a “nova”: a new dot of light in the sky. Indeed, the observation of the afterglow of the merger by teams of literally thousands of astronomers in gamma and x-rays, optical and infrared light, and in the radio, is one of the more amazing pieces of academic teamwork I have seen.

And these observations allowed the teams to identify the host galaxy of the original neutron stars, and to measure the redshift of its light (the lengthening of the light’s wavelength due to the movement of the galaxy away from us). It is most likely a previously unexceptional galaxy called NGC 4993, with a redshift z=0.009, putting it about 40 megaparsecs away, relatively close on cosmological scales.

But this means that we can measure all of the factors in one of the most celebrated equations in cosmology, Hubble’s law: cz = H₀d, where c is the speed of light, z is the redshift just mentioned, and d is the distance measured from the gravitational wave burst itself. This just leaves H₀, the famous Hubble Constant, giving the current rate of expansion of the Universe, usually measured in kilometres per second per megaparsec. The old-fashioned way to measure this quantity is via the so-called cosmic distance ladder, bootstrapping up from nearby objects of known distance to more distant ones whose properties can only be calibrated by comparison with those more nearby. But errors accumulate in this process and we can be susceptible to the weakest rung on the chain (see recent work by some of my colleagues trying to formalise this process). Alternately, we can use data from cosmic microwave background (CMB) experiments like the Planck Satellite (see here for lots of discussion on this blog); the typical size of the CMB pattern on the sky is something very like a standard ruler. Unfortunately, it, too, needs to be calibrated, implicitly by other aspects of the CMB pattern itself, and so ends up being a somewhat indirect measurement. Currently, the best cosmic-distance-ladder measurement gives something like 73.24 ± 1.74 km/sec/Mpc whereas Planck gives 67.81 ± 0.92 km/sec/Mpc; these numbers disagree by “a few sigma”, enough that it is hard to explain as simply a statistical fluctuation.
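As a quick sanity check on the numbers quoted above, one can plug the redshift and rough distance of NGC 4993 straight into Hubble's law, ignoring the peculiar-velocity and inclination complications discussed below; a minimal sketch:

```python
C_KM_S = 299_792.458  # speed of light in km/s

def hubble_constant(z, distance_mpc):
    """Naive H0 = c z / d from Hubble's law; only sensible for small redshift."""
    return C_KM_S * z / distance_mpc

# Redshift and rough distance of NGC 4993 quoted in the text above.
print(round(hubble_constant(z=0.009, distance_mpc=40.0), 1), "km/s/Mpc")
# -> about 67 km/s/Mpc, in the same ballpark as the Planck and
#    distance-ladder values quoted above.
```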

Unfortunately, the new LIGO results do not solve the problem. Because we cannot observe the inclination of the neutron-star binary (i.e., the orientation of its orbit), this blows up the error on the distance to the object, due to the Bayesian marginalisation over this unknown parameter (just as the Planck measurement requires marginalization over all of the other cosmological parameters to fully calibrate the results). Because the host galaxy is relatively nearby, the teams must also account for the fact that the redshift includes the effect not only of the cosmological expansion but also the movement of galaxies with respect to one another due to the pull of gravity on relatively large scales; this so-called peculiar velocity has to be modelled which adds further to the errors.

This procedure gives a final measurement of H₀ = 70.0 +12.0/−8.0 km/sec/Mpc, with the full shape of the probability curve shown in the Figure, taken directly from the paper. Both the Planck and distance-ladder results are consistent with these rather large error bars. But this is calculated from a single object; as more of these events are seen these error bars will go down, typically by something like the square root of the number of events, so it might not be too long before this is the best way to measure the Hubble Constant.

[Figure: the probability distribution for the Hubble Constant from the gravitational-wave standard-siren measurement, taken directly from the paper.]

[Apologies: too long, too technical, and written late at night while trying to get my wonderful not-quite-three-week-old daughter to sleep through the night.]

Steinn Sigurðssonbusiness vs science: an anecdote

many years ago, when the Web was young, I was talking to an acquaintance - a friend-of-a-friend - a SoCal business person.
They had heard about this new Web thing, and were asking me about what use it was.

Now, if you had asked me, I'd have guessed this was end of '94,  but I checked and it must have been the summer of '95.

I had just seen Kelson and Trager order pizza on the web, so clearly I was an expert, but my acquaintance wanted to know if this Web thing could be used to help sell cars - not directly, SSL 3.0 had not been implemented yet, but to show models continuously, update inventory, specs, pricing etc.

I opined, "sure, why not", seemed like it'd be a good fit, and I would explore how feasible this might be.  So I searched (this was before google so AltaVista probably...[sigh]) and discovered that Edmunds had already done this.

At this point there was a bifurcation:  my acquaintance was very excited, this was clearly a proven concept and much more interesting than they had appreciated, and would I be interested in working with them on this;  and I was totally not interested,  somebody had done it, it was not new, booooring.

This was a revelation for me.  The concept was only interesting to me when it was new and innovative and unproven. As soon as I discovered it had been done and worked, I became uninterested.
My acquaintance, the businessman, really only became interested when they discovered it was a proven idea already done by a player.

So, they got some SoCal hack to put up a website and sold more cars, and got even richer,
and I went off to think about black holes and shit.

Learned something.

October 23, 2017

John PreskillParadise

The word dominates chapter one of Richard Holmes’s book The Age of Wonder. Holmes writes biographies of Romantic-Era writers: Mary Wollstonecraft, Percy Shelley, and Samuel Taylor Coleridge populate his bibliography. They have cameos in Age. But their scientific counterparts star.

Their “natural-philosopher” counterparts, I should say. The word “scientist” emerged as the Romantic Era closed. Romanticism, a literary and artistic movement, flourished between the 1700s and the 1800s. Romantics championed self-expression, individuality, and emotion over convention and artificiality. Romantics wondered at, and drew inspiration from, the natural world. So, Holmes argues, did Romantic-Era natural philosophers. They explored, searched, and innovated with Wollstonecraft’s, Shelley’s, and Coleridge’s zest.

[Image: cover of The Age of Wonder]

Holmes depicts Wilhelm and Caroline Herschel, a German brother and sister, discovering the planet Uranus. Humphry Davy, an amateur poet from Penzance, inventing a lamp that saved miners’ lives. Michael Faraday, a working-class Londoner, inspired by Davy’s chemistry lectures.

Joseph Banks in paradise.

So Holmes entitled chapter one.

Banks studied natural history as a young English gentleman during the 1760s. He then sailed around the world, a botanist on exploratory expeditions. The second expedition brought Banks aboard the HMS Endeavour. Captain James Cook steered the ship to Brazil, Tahiti, Australia, and New Zealand. Banks brought a few colleagues onboard. They studied the native flora, fauna, skies, and tribes.

Banks, with fellow botanist Daniel Solander, accumulated over 30,000 plant samples. Artist Sydney Parkinson drew the plants during the voyage. Parkinson’s drawings underlay 743 copper engravings that Banks commissioned upon returning to England. Banks planned to publish the engravings as the book Florilegium. He never succeeded. Two institutions executed Banks’s plan more than 200 years later.

Banks’s Florilegium crowns an exhibition at the University of California at Santa Barbara (UCSB). UCSB’s Special Research Collections will host “Botanical Illustrations and Scientific Discovery—Joseph Banks and the Exploration of the South Pacific, 1768–1771” until May 2018. The exhibition features maps of Banks’s journeys, biographical sketches of Banks and Cook, contemporary art inspired by the engravings, and the Florilegium.

[Image: online poster for the exhibition]

The exhibition spotlights “plants that have subsequently become important ornamental plants on the UCSB campus, throughout Santa Barbara, and beyond.” One sees, roaming Santa Barbara, slivers of Banks’s paradise.

[Image: two bougainvilleas]

In Santa Barbara resides the Kavli Institute for Theoretical Physics (KITP). The KITP is hosting a program about the physics of quantum information (QI). QI scientists are congregating from across the world. Everyone visits for a few weeks or months, meeting some participants and missing others (those who have left or will arrive later). Participants attend and present tutorials, explore beyond their areas of expertise, and initiate research collaborations.

A conference capstoned the program, one week this October. Several speakers had founded subfields of physics: quantum error correction (how to fix errors that dog quantum computers), quantum computational complexity (how quickly quantum computers can solve hard problems), topological quantum computation, AdS/CFT (a parallel between certain gravitational systems and certain quantum systems), and more. Swaths of science exist because of these thinkers.

[Image: the Kavli Institute for Theoretical Physics]

One evening that week, I visited the Joseph Banks exhibition.

Joseph Banks in paradise.

I’d thought that, by “paradise,” Holmes had meant “physical attractions”: lush flowers, vibrant colors, fresh fish, and warm sand. Another meaning occurred to me, after the conference talks, as I stood before a glass case in the library.

Joseph Banks, disembarking from the Endeavour, didn’t disembark onto just an island. He disembarked onto terra incognita. Never had he or his colleagues seen the blossoms, seed pods, or sprouts before him. Swaths of science awaited. What could the natural philosopher have craved more?

QI scientists of a certain age reminisce about the 1990s, the cowboy days of QI. When impactful theorems, protocols, and experiments abounded. When they dangled, like ripe fruit, just above your head. All you had to do was look up, reach out, and prove a pineapple.

[Image: a cowboy — typical 1990s quantum-information scientist]

That generation left mine few simple theorems to prove. But QI hasn’t suffered extinction. Its frontiers have advanced into other fields of science. Researchers are gaining insight into thermodynamics, quantum gravity, condensed matter, and chemistry from QI. The KITP conference highlighted connections with quantum gravity.

…in paradise.

What could a natural philosopher crave more?

[Image: “Sprawling Neobiotic Chimera (After Banks’ Florilegium),” by Rose Briccetti — contemporary artwork commissioned by the UCSB library]

Most KITP talks are recorded and released online. You can access talks from the conference here. My talk, about quantum chaos and thermalization, appears here. 

With gratitude to the KITP, and to the program organizers and the conference organizers, for the opportunity to participate. 


October 20, 2017

Steinn Sigurðsson"In the beginning..."


"In the beginning was the command line..."

a lot of people who ought to have read Neal Stephenson's essay on the UI as a metaphor have not done so.

This is a public service.

Go get a copy, then carry it with you until you get stuck at O'Hare long enough to read it, or whatever works for you.


October 19, 2017

Steinn Sigurðssonthe best things in life are free

The arXiv wants your $

arXiv  is an e-print service in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics

each day it receives several hundred e-prints, mostly preprints, categorizes them and distributes the list of papers, with full access to the pdf and, when available, the TeX source  - which is most of the time, TeX rocks

authors submit the e-prints, with license to distribute, and users receive the list of the day's papers, for the categories they express interest in, and access to content, free, originally by e-mail, now web

almost all papers in theoretical physics, mathematics and astrophysics now go on the arXiv, as does an increasing fraction from the newer fields
there are multiple other *Xivs  covering other subject areas with varying degree of success

the arXiv now holds over a million e-prints, going back to 1991, and a little bit beyond, as people have sent in old stuff to archive on the arXiv; e-prints are coming in at about 10,000 per month and growing

the service is slim, by design, almost minimalistic, but oh so powerful

you all use it, obsessively, you couldn't do without it!

arXiv is not actually cost free, it has an IT staff, its own servers and a management team and development team
a lot of the current cost is covered by member institutions, and Cornell University

but... we could use more, so the annual fundraising drive is under way, this week only
 - ok, you can give any time, but this is sorta special

Steinn SigurðssonWhy, yes, it is all about me...


Dynamics of Cats is back.

This is the original, Ye Olde, blog, started back in March of 2005; it had a decent run through June 2006, at which point I was invited to join the Scienceblogs collective at what was then SEED magazine.

A fun decade ensued, as blogs boomed,  markets crashed, SEED realized Sb was keeping the rest of the group going, and then National Geographic ate Sb, which then started to shrivel.

I blog strictly to amuse myself, no promises.

I find blogging is good warmup for serious writing.  Actual output is very sensitive to Real Life; I tend to be more prolific when busy doing stuff, and less prolific when subsumed by administrivia and other intrusions from The World.

I'm a physicist, educated in England, PhD from a small private university in California.
Postdocs in NoCal and back to UK, and now a Professor of Astronomy and Astrophysics at The Pennsylvania State University.
Which is Good In Parts.

I am a member of:

I have never taken a class in astronomy, for credit.

In my copious spare time I am a Science Editor for the AAS Journals,
I am also a member of the Aspen Center for Physics 
and most recently I became the Scientific Director of arXiv

Steinn Sigurdsson Appointed as arXiv Scientific Director

As noted above (you did read the disclaimer...),
I do not speak for any of these Institutions upon this here blog.

We also have a dog.
Gunnar.
Named after the poet-warrior of the Sagas.
Don't ask, he had the name when he moved in with us.



We used to have cats, but they, sadly, died.

October 17, 2017

Matt StrasslerThe Significance of Yesterday’s Gravitational Wave Announcement: an FAQ

Yesterday’s post on the results from the LIGO/VIRGO network of gravitational wave detectors was aimed at getting information out, rather than providing the pedagogical backdrop.  Today I’m following up with a post that attempts to answer some of the questions that my readers and my personal friends asked me.  Some wanted to understand better how to visualize what had happened, while others wanted more clarity on why the discovery was so important.  So I’ve put together a post which (1) explains what neutron stars and black holes are and what their mergers are like, (2) clarifies why yesterday’s announcement was important — and there were many reasons, which is why it’s hard to reduce it all to a single soundbite — and (3) answers some miscellaneous questions at the end.

First, a disclaimer: I am *not* an expert in the very complex subject of neutron star mergers and the resulting explosions, called kilonovas.  These are much more complicated than black hole mergers.  I am still learning some of the details.  Hopefully I’ve avoided errors, but you’ll notice a few places where I don’t know the answers … yet.  Perhaps my more expert colleagues will help me fill in the gaps over time.

Please, if you spot any errors, don’t hesitate to comment!!  And feel free to ask additional questions whose answers I can add to the list.

BASIC QUESTIONS ABOUT NEUTRON STARS, BLACK HOLES, AND MERGERS

What are neutron stars and black holes, and how are they related?

Every atom is made from a tiny atomic nucleus, made of neutrons and protons (which are very similar), and loosely surrounded by electrons. Most of an atom is empty space, so it can, under extreme circumstances, be crushed — but only if every electron and proton convert to a neutron (which remains behind) and a neutrino (which heads off into outer space.) When a giant star runs out of fuel, the pressure from its furnace turns off, and it collapses inward under its own weight, creating just those extraordinary conditions in which the matter can be crushed. Thus: a star’s interior, with a mass one to several times the Sun’s mass, is all turned into a several-mile(kilometer)-wide ball of neutrons — the number of neutrons approaching a 1 with 57 zeroes after it.
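As a quick check of that “1 with 57 zeroes” figure, divide one solar mass by the mass of a single neutron; a minimal sketch:

```python
SOLAR_MASS_KG = 1.989e30     # mass of the Sun
NEUTRON_MASS_KG = 1.675e-27  # mass of a single neutron

n_neutrons = SOLAR_MASS_KG / NEUTRON_MASS_KG
print(f"{n_neutrons:.1e}")   # about 1.2e57 -- roughly a 1 with 57 zeroes after it
```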

If the star is big but not too big, the neutron ball stiffens and holds its shape, and the star explodes outward, blowing itself to pieces in what is called a core-collapse supernova. The ball of neutrons remains behind; this is what we call a neutron star. It’s a ball of the densest material that we know can exist in the universe — a pure atomic nucleus many miles (kilometers) across. It has a very hard surface; if you tried to go inside a neutron star, your experience would be a lot worse than running into a closed door at a hundred miles per hour.

If the star is very big indeed, the neutron ball that forms may immediately (or soon) collapse under its own weight, forming a black hole. A supernova may or may not result in this case; the star might just disappear. A black hole is very, very different from a neutron star. Black holes are what’s left when matter collapses irretrievably upon itself under the pull of gravity, shrinking down endlessly. While a neutron star has a surface that you could smash your head on, a black hole has no surface — it has an edge that is simply a point of no return, called a horizon. In Einstein’s theory, you can just go right through, as if passing through an open door. You won’t even notice the moment you go in. [Note: this is true in Einstein’s theory. But there is a big controversy as to whether the combination of Einstein’s theory with quantum physics changes the horizon into something novel and dangerous to those who enter; this is known as the firewall controversy, and would take us too far afield into speculation.]  But once you pass through that door, you can never return.

Black holes can form in other ways too, but not those that we’re observing with the LIGO/VIRGO detectors.

Why are their mergers the best sources for gravitational waves?

One of the easiest and most obvious ways to make gravitational waves is to have two objects orbiting each other.  If you put your two fists in a pool of water and move them around each other, you’ll get a pattern of water waves spiraling outward; this is in rough (very rough!) analogy to what happens with two orbiting objects, although, since the objects are moving in space, the waves aren’t in a material like water.  They are waves in space itself.

To get powerful gravitational waves, you want objects each with a very big mass that are orbiting around each other at very high speed. To get the fast motion, you need the force of gravity between the two objects to be strong; and to get gravity to be as strong as possible, you need the two objects to be as close as possible (since, as Isaac Newton already knew, gravity between two objects grows stronger when the distance between them shrinks.) But if the objects are large, they can’t get too close; they will bump into each other and merge long before their orbit can become fast enough. So to get a really fast orbit, you need two relatively small objects, each with a relatively big mass — what scientists refer to as compact objects. Neutron stars and black holes are the most compact objects we know about. Fortunately, they do indeed often travel in orbiting pairs, and do sometimes, for a very brief period before they merge, orbit rapidly enough to produce gravitational waves that LIGO and VIRGO can observe.

Why do we find these objects in pairs in the first place?

Stars very often travel in pairs… they are called binary stars. They can start their lives in pairs, forming together in large gas clouds, or even if they begin solitary, they can end up pairing up if they live in large densely packed communities of stars where it is common for multiple stars to pass nearby. Perhaps surprisingly, their pairing can survive the collapse and explosion of either star, leaving two black holes, two neutron stars, or one of each in orbit around one another.

What happens when these objects merge?

Not surprisingly, there are three classes of mergers which can be detected: two black holes merging, two neutron stars merging, and a neutron star merging with a black hole. The first class was observed in 2015 (and announced in 2016), the second was announced yesterday, and it’s a matter of time before the third class is observed. The two objects may orbit each other for billions of years, very slowly radiating gravitational waves (an effect observed in the 70’s, leading to a Nobel Prize) and gradually coming closer and closer together. Only in the last day of their lives do their orbits really start to speed up. And just before these objects merge, they begin to orbit each other once per second, then ten times per second, then a hundred times per second. Visualize that if you can: objects a few dozen miles (kilometers) across, a few miles (kilometers) apart, each with the mass of the Sun or greater, orbiting each other 100 times each second. It’s truly mind-boggling — a spinning dumbbell beyond the imagination of even the greatest minds of the 19th century. I don’t know any scientist who isn’t awed by this vision. It all sounds like science fiction. But it’s not.
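As a rough order-of-magnitude check on how extreme that is, one can estimate the orbital speed those numbers imply; the separation and orbital frequency below are just the illustrative figures from the paragraph above.

```python
import math

C_KM_S = 299_792.458  # speed of light in km/s

def orbital_speed_km_s(separation_km, orbits_per_second):
    """Each object circles the centre of mass at radius ~ separation/2, so its
    speed is the circumference (pi * separation) times the orbital frequency."""
    return math.pi * separation_km * orbits_per_second

v = orbital_speed_km_s(separation_km=30.0, orbits_per_second=100.0)
print(f"{v:.0f} km/s, about {v / C_KM_S:.0%} of the speed of light")
# -> roughly 9000 km/s, a few percent of the speed of light
```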

How do we know this isn’t science fiction?

We know, if we believe Einstein’s theory of gravity (and I’ll give you a very good reason to believe in it in just a moment). Einstein’s theory predicts that such a rapidly spinning, large-mass dumbbell formed by two orbiting compact objects will produce a telltale pattern of ripples in space itself — gravitational waves. That pattern is both complicated and precisely predicted. In the case of black holes, the predictions go right up to and past the moment of merger, to the ringing of the larger black hole that forms in the merger. In the case of neutron stars, the instants just before, during and after the merger are more complex and we can’t yet be confident we understand them, but during tens of seconds before the merger Einstein’s theory is very precise about what to expect. The theory further predicts how those ripples will cross the vast distances from where they were created to the location of the Earth, and how they will appear in the LIGO/VIRGO network of three gravitational wave detectors. The prediction of what to expect at LIGO/VIRGO thus involves not just one prediction but many: the theory is used to predict the existence and properties of black holes and of neutron stars, the detailed features of their mergers, the precise patterns of the resulting gravitational waves, and how those gravitational waves cross space. That LIGO/VIRGO have detected the telltale patterns of these gravitational waves, and that these wave patterns agree with Einstein’s theory in every detail, is the strongest evidence ever obtained that there is nothing wrong with Einstein’s theory when used in these combined contexts.  That then in turn gives us confidence that our interpretation of the LIGO/VIRGO results is correct, confirming that black holes and neutron stars really exist and really merge. (Notice the reasoning is slightly circular… but that’s how scientific knowledge proceeds, as a set of detailed consistency checks that gradually and eventually become so tightly interconnected as to be almost impossible to unwind.  Scientific reasoning is not deductive; it is inductive.  We do it not because it is logically ironclad but because it works so incredibly well — as witnessed by the computer, and its screen, that I’m using to write this, and the wired and wireless internet and computer disk that will be used to transmit and store it.)

THE SIGNIFICANCE(S) OF YESTERDAY’S ANNOUNCEMENT OF A NEUTRON STAR MERGER

What makes it difficult to explain the significance of yesterday’s announcement is that it consists of many important results piled up together, rather than a simple takeaway that can be reduced to a single soundbite. (That was also true of the black hole mergers announcement back in 2016, which is why I wrote a long post about it.)

So here is a list of important things we learned.  No one of them, by itself, is earth-shattering, but each one is profound, and taken together they form a major event in scientific history.

First confirmed observation of a merger of two neutron stars: We’ve known these mergers must occur, but there’s nothing like being sure. And since these things are too far away and too small to see in a telescope, the only way to be sure these mergers occur, and to learn more details about them, is with gravitational waves.  We expect to see many more of these mergers in coming years as gravitational wave astronomy increases in its sensitivity, and we will learn more and more about them.

New information about the properties of neutron stars: Neutron stars were proposed almost a hundred years ago and were confirmed to exist in the 60’s and 70’s.  But their precise details aren’t known; we believe they are like a giant atomic nucleus, but they’re so vastly larger than ordinary atomic nuclei that we can’t be sure we understand all of their internal properties, and there are debates in the scientific community that can’t be easily answered… until, perhaps, now.

From the detailed pattern of the gravitational waves of this one neutron star merger, scientists have already learned two things. First, we confirm that Einstein’s theory correctly predicts the basic pattern of gravitational waves from orbiting neutron stars, as it does for orbiting and merging black holes. Unlike black holes, however, there are more questions about what happens to neutron stars when they merge. The question of what happened to this pair after they merged is still open — did they form a neutron star, an unstable neutron star that, slowing its spin, eventually collapsed into a black hole, or a black hole straightaway?

But something important was already learned about the internal properties of neutron stars. The stresses of being whipped around at such incredible speeds would tear you and me apart, and would even tear the Earth apart. We know neutron stars are much tougher than ordinary rock, but how much more? If they were too flimsy, they’d have broken apart at some point during LIGO/VIRGO’s observations, and the simple pattern of gravitational waves that was expected would have suddenly become much more complicated. That didn’t happen until perhaps just before the merger.   So scientists can use the simplicity of the pattern of gravitational waves to infer some new things about how stiff and strong neutron stars are.  More mergers will improve our understanding.  Again, there is no other simple way to obtain this information.

First visual observation of an event that produces both immense gravitational waves and bright electromagnetic waves: Black hole mergers aren’t expected to create a brilliant light display, because, as I mentioned above, they’re more like open doors to an invisible playground than they are like rocks, so they merge rather quietly, without a big bright and hot smash-up.  But neutron stars are big balls of stuff, and so the smash-up can indeed create lots of heat and light of all sorts, just as you might naively expect.  By “light” I mean not just visible light but all forms of electromagnetic waves, at all wavelengths (and therefore at all frequencies.)  Scientists divide up the range of electromagnetic waves into categories. These categories are radio waves, microwaves, infrared light, visible light, ultraviolet light, X-rays, and gamma rays, listed from lowest frequency and largest wavelength to highest frequency and smallest wavelength.  (Note that these categories and the dividing lines between them are completely arbitrary, but the divisions are useful for various scientific purposes.  The only fundamental difference between yellow light, a radio wave, and a gamma ray is the wavelength and frequency; otherwise they’re exactly the same type of thing, a wave in the electric and magnetic fields.)
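Since the only fundamental difference between those categories is wavelength (equivalently frequency, via f = c/λ), it is easy to put rough numbers on them; the example wavelengths below are representative values I have chosen for illustration, not anything from the observations discussed here.

```python
C_M_S = 2.998e8  # speed of light in m/s

examples = {            # representative wavelengths, chosen for illustration
    "radio wave":   1.0,       # metres
    "yellow light": 580e-9,
    "X-ray":        1e-10,
}
for name, wavelength_m in examples.items():
    frequency_hz = C_M_S / wavelength_m
    print(f"{name:12s}  wavelength {wavelength_m:.1e} m  frequency {frequency_hz:.1e} Hz")
```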

So if and when two neutron stars merge, we expect both gravitational waves and electromagnetic waves, the latter of many different frequencies created by many different effects that can arise when two huge balls of neutrons collide.  But just because we expect them doesn’t mean they’re easy to see.  These mergers are pretty rare — perhaps one every hundred thousand years in each big galaxy like our own — so the ones we find using LIGO/VIRGO will generally be very far away.  If the light show is too dim, none of our telescopes will be able to see it.

But this light show was plenty bright.  Gamma ray detectors out in space detected it instantly, confirming that the gravitational waves from the two neutron stars led to a collision and merger that produced very high frequency light.  Already, that’s a first.  It’s as though one had seen lightning for years but never heard thunder; or as though one had observed the waves from hurricanes for years but never observed one in the sky.  Seeing both allows us a whole new set of perspectives; one plus one is often much more than two.

Over time — hours and days — effects were seen in visible light, ultraviolet light, infrared light, X-rays and radio waves.  Some were seen earlier than others, which itself is a story, but each one contributes to our understanding of what these mergers are actually like.

Confirmation of the best guess concerning the origin of “short” gamma ray bursts:  For many years, bursts of gamma rays have been observed in the sky.  Among them, there seems to be a class of bursts that are shorter than most, typically lasting just a couple of seconds.  They come from all across the sky, indicating that they come from distant intergalactic space, presumably from distant galaxies.  Among other explanations, the most popular hypothesis concerning these short gamma-ray bursts has been that they come from merging neutron stars.  The only way to confirm this hypothesis is with the observation of the gravitational waves from such a merger.  That test has now been passed; it appears that the hypothesis is correct.  That in turn means that we have, for the first time, both a good explanation of these short gamma ray bursts and, because we know how often we observe these bursts, a good estimate as to how often neutron stars merge in the universe.

First distance measurement to a source using both a gravitational wave measure and a redshift in electromagnetic waves, allowing a new calibration of the distance scale of the universe and of its expansion rate:  The pattern over time of the gravitational waves from a merger of two black holes or neutron stars is complex enough to reveal many things about the merging objects, including a rough estimate of their masses and the orientation of the spinning pair relative to the Earth.  The overall strength of the waves, combined with the knowledge of the masses, reveals how far the pair is from the Earth.  That by itself is nice, but the real win comes when the discovery of the object using visible light, or in fact any light with frequency below gamma-rays, can be made.  In this case, the galaxy that contains the neutron stars can be determined.

Once we know the host galaxy, we can do something really important.  We can, by looking at the starlight, determine how rapidly the galaxy is moving away from us.  For distant galaxies, the speed at which the galaxy recedes should be related to its distance because the universe is expanding.

How rapidly the universe is expanding has been recently measured with remarkable precision, but the problem is that there are two different methods for making the measurement, and they disagree.   This disagreement is one of the most important problems for our understanding of the universe.  Maybe one of the measurement methods is flawed, or maybe — and this would be much more interesting — the universe simply doesn’t behave the way we think it does.

What gravitational waves do is give us a third method: the gravitational waves directly provide the distance to the galaxy, and the electromagnetic waves directly provide the speed of recession.  There is no other way to make this type of joint measurement directly for distant galaxies.  The method is not accurate enough to be useful in just one merger, but once dozens of mergers have been observed, the average result will provide important new information about the universe’s expansion.  When combined with the other methods, it may help resolve this all-important puzzle.

Best test so far of Einstein’s prediction that the speed of light and the speed of gravitational waves are identical: Since gamma rays from the merger and the peak of the gravitational waves arrived within two seconds of one another after traveling 130 million years — that is, about 5 thousand million million seconds — we can say that the speed of light and the speed of gravitational waves are both equal to the cosmic speed limit to within one part in 2 thousand million million.  Such a precise test requires the combination of gravitational wave and gamma ray observations.
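The arithmetic behind that statement is simple enough to check directly; a quick sketch:

```python
SECONDS_PER_YEAR = 3.156e7                  # about pi x 10^7 seconds

travel_time_s = 130e6 * SECONDS_PER_YEAR    # ~130 million years of travel, in seconds
arrival_gap_s = 2.0                         # gamma rays vs. gravitational-wave peak

print(f"travel time ~ {travel_time_s:.1e} s")
print(f"fractional speed difference < {arrival_gap_s / travel_time_s:.1e}")
# -> about 5e-16, i.e. the two speeds agree to within roughly one part
#    in two thousand million million.
```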

Efficient production of heavy elements confirmed:  It’s long been said that we are star-stuff, or stardust, and it’s been clear for a long time that it’s true.  But there’s been a puzzle when one looks into the details.  While it’s known that all the chemical elements from hydrogen up to iron are formed inside of stars, and can be blasted into space in supernova explosions to drift around and eventually form planets, moons, and humans, it hasn’t been quite as clear how the other elements with heavier atoms — atoms such as iodine, cesium, gold, lead, bismuth, uranium and so on — predominantly formed.  Yes they can be formed in supernovas, but not so easily; and there seem to be more atoms of heavy elements around the universe than supernovas can explain.  There are many supernovas in the history of the universe, but the efficiency for producing heavy chemical elements is just too low.

It was proposed some time ago that the mergers of neutron stars might be a suitable place to produce these heavy elements.  Even though these mergers are rare, they might be much more efficient, because the nuclei of heavy elements contain lots of neutrons and, not surprisingly, a collision of two neutron stars would produce lots of neutrons in its debris, suitable perhaps for making these nuclei.   A key indication that this is going on would be the following: if a neutron star merger could be identified using gravitational waves, and if its location could be determined using telescopes, then one would observe a pattern of light that would be characteristic of what is now called a “kilonova” explosion.   Warning: I don’t yet know much about kilonovas and I may be leaving out important details. A kilonova is powered by the process of forming heavy elements; most of the nuclei produced are initially radioactive — i.e., unstable — and they break down by emitting high energy particles, including the particles of light (called photons) which are in the gamma ray and X-ray categories.  The resulting characteristic glow would be expected to have a pattern of a certain type: it would be initially bright but would dim rapidly in visible light, with a long afterglow in infrared light.  The reasons for this are complex, so let me set them aside for now.  The important point is that this pattern was observed, confirming that a kilonova of this type occurred, and thus that, in this neutron star merger, enormous amounts of heavy elements were indeed produced.  So we now have a lot of evidence, for the first time, that almost all the heavy chemical elements on and around our planet were formed in neutron star mergers.  Again, we could not know this if we did not know that this was a neutron star merger, and that information comes only from the gravitational wave observation.

MISCELLANEOUS QUESTIONS

Did the merger of these two neutron stars result in a new black hole, a larger neutron star, or an unstable rapidly spinning neutron star that later collapsed into a black hole?

We don’t yet know, and maybe we won’t know.  Some scientists involved appear to be leaning toward the possibility that a black hole was formed, but others seem to say the jury is out.  I’m not sure what additional information can be obtained over time about this.

If the two neutron stars formed a black hole, why was there a kilonova?  Why wasn’t everything sucked into the black hole?

Black holes aren’t vacuum cleaners; they pull things in via gravity just the same way that the Earth and Sun do, and don’t suck things in some unusual way.  The only crucial thing about a black hole is that once you go in you can’t come out.  But just as when trying to avoid hitting the Earth or Sun, you can avoid falling in if you orbit fast enough or if you’re flung outward before you reach the edge.

The point in a neutron star merger is that the forces at the moment of merger are so intense that one or both neutron stars are partially ripped apart.  The material that is thrown outward in all directions, at an immense speed, somehow creates the bright, hot flash of gamma rays and eventually the kilonova glow from the newly formed atomic nuclei.  Those details I don’t yet understand, but I know they have been carefully studied both with approximate equations and in computer simulations such as this one and this one.  However, the accuracy of the simulations can only be confirmed through the detailed studies of a merger, such as the one just announced.  It seems, from the data we’ve seen, that the simulations did a fairly good job.  I’m sure they will be improved once they are compared with the recent data.

 

 

 


Filed under: Astronomy, Gravitational Waves Tagged: black holes, Gravitational Waves, LIGO, neutron stars

Matt StrasslerLIGO and VIRGO Announce a Joint Observation of a Black Hole Merger

Welcome, VIRGO!  Another merger of two big black holes has been detected, this time by both LIGO’s two detectors and by VIRGO as well.

Aside from the fact that this means that the VIRGO instrument actually works, which is great news, why is this a big deal?  By adding a third gravitational wave detector, built by the VIRGO collaboration, to LIGO’s Washington and Louisiana detectors, the scientists involved in the search for gravitational waves now can determine fairly accurately the direction from which a detected gravitational wave signal is coming.  And this allows them to do something new: to tell their astronomer colleagues roughly where to look in the sky, using ordinary telescopes, for some form of electromagnetic waves (perhaps visible light, gamma rays, or radio waves) that might have been produced by whatever created the gravitational waves.

The point is that with three detectors, one can triangulate.  The gravitational waves travel for billions of years, traveling at the speed of light, and when they pass by, they are detected at both LIGO detectors and at VIRGO.  But because it takes light a few thousandths of a second to travel the diameter of the Earth, the waves arrive at slightly different times at the LIGO Washington site, the LIGO Louisiana site, and the VIRGO site in Italy.  The precise timing tells the scientists what direction the waves were traveling in, and therefore roughly where they came from.  In a similar way, using the fact that sound travels at a known speed, the times that a gunshot is heard at multiple locations can be used by police to determine where the shot was fired.
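For the curious, here is a minimal sketch of the triangulation idea, not the collaborations' actual analysis pipeline: for a plane wave arriving from unit direction n, the extra arrival delay at detector i relative to a reference detector is (r_i − r_0)·n/c, so two independent baselines pin down n up to a mirror ambiguity about the plane of the detectors. The detector coordinates and source direction below are made-up numbers, purely for illustration.

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s

def candidate_directions(positions_m, delays_s):
    """Plane-wave directions consistent with arrival-time delays at 3 detectors.

    For a plane wave travelling along unit vector n, the delay at detector i
    relative to detector 0 is (r_i - r_0) . n / C.  Two independent baselines
    fix the component of n in their plane; |n| = 1 then fixes the size of the
    out-of-plane component but not its sign, hence the mirror ambiguity.
    """
    baselines = positions_m[1:] - positions_m[0]                 # shape (2, 3)
    rhs = C * np.asarray(delays_s)[1:]
    n_inplane, *_ = np.linalg.lstsq(baselines, rhs, rcond=None)  # min-norm => in-plane
    normal = np.cross(baselines[0], baselines[1])
    normal /= np.linalg.norm(normal)
    perp = np.sqrt(max(0.0, 1.0 - n_inplane @ n_inplane))
    return n_inplane + perp * normal, n_inplane - perp * normal

# Made-up detector positions (metres) and a made-up source direction.
detectors = np.array([[0.0, 0.0, 0.0],
                      [4.0e6, 1.0e6, 0.0],
                      [1.0e6, 5.0e6, 2.0e6]])
true_n = np.array([0.3, -0.5, 0.81])
true_n /= np.linalg.norm(true_n)
delays = (detectors - detectors[0]) @ true_n / C   # simulated arrival-time delays

for n_hat in candidate_directions(detectors, delays):
    print(np.round(n_hat, 3))   # one of the two candidates matches true_n
```

With only a single baseline (two detectors) the same algebra constrains the direction only to a ring on the sky, which is roughly why the two-detector regions in the image below are so much larger than the three-detector one.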

You can see the impact in the picture below, which is an image of the sky drawn as a sphere, as if seen from outside the sky looking in.  In previous detections of black hole mergers by LIGO’s two detectors, the scientists could only determine a large swath of sky where the observed merger might have occurred; those are the four colored regions that stretch far across the sky.  But notice the green splotch at lower left.  That’s the region of sky where the black hole merger announced today occurred.  The fact that this region is many times smaller than the other four reflects what including VIRGO makes possible.  It’s a small enough region that one can search using an appropriate telescope for something that is making visible light, or gamma rays, or radio waves.

Skymap of the LIGO/Virgo black hole mergers.

Image credit: LIGO/Virgo/Caltech/MIT/Leo Singer (Milky Way image: Axel Mellinger)

 

While a black hole merger isn’t expected to be observable by other telescopes, and indeed nothing was observed by other telescopes this time, other events that LIGO might detect, such as a merger of two neutron stars, may create an observable effect. We can hope for such exciting news over the next year or two.


Filed under: Astronomy, Gravitational Waves Tagged: black holes, Gravitational Waves, LIGO

October 16, 2017

Matt StrasslerA Scientific Breakthrough! Combining Gravitational and Electromagnetic Waves

Gravitational waves are now the most important new tool in the astronomer’s toolbox.  Already they’ve been used to confirm that large black holes — with masses ten or more times that of the Sun — and mergers of these large black holes to form even larger ones, are not uncommon in the universe.   Today it goes a big step further.

It’s long been known that neutron stars, remnants of collapsed stars that have exploded as supernovas, are common in the universe.  And it’s been known almost as long that sometimes neutron stars travel in pairs.  (In fact that’s how gravitational waves were first discovered, indirectly, back in the 1970s.)  Stars often form in pairs, and sometimes both stars explode as supernovas, leaving their neutron star relics in orbit around one another.  Neutron stars are small — just ten or so kilometers (miles) across.  According to Einstein’s theory of gravity, a pair of stars should gradually lose energy by emitting gravitational waves into space, and slowly but surely the two objects should spiral in on one another.   Eventually, after many millions or even billions of years, they collide and merge into a larger neutron star, or into a black hole.  This collision does two things.

  1. It makes some kind of brilliant flash of light — electromagnetic waves — whose details are only guessed at.  Some of those electromagnetic waves will be in the form of visible light, while much of it will be in invisible forms, such as gamma rays.
  2. It makes gravitational waves, whose details are easier to calculate and which are therefore distinctive, but couldn’t have been detected until LIGO and VIRGO started taking data, LIGO over the last couple of years, VIRGO over the last couple of months.

It’s possible that we’ve seen the light from neutron star mergers before, but no one could be sure.  Wouldn’t it be great, then, if we could see gravitational waves AND electromagnetic waves from a neutron star merger?  It would be a little like seeing the flash and hearing the sound from fireworks — seeing and hearing is better than either one separately, with each one clarifying the other.  (Caution: scientists are often speaking as if detecting gravitational waves is like “hearing”.  This is only an analogy, and a vague one!  It’s not at all the same as acoustic waves that we can hear with our ears, for many reasons… so please don’t take it too literally.)  If we could do both, we could learn about neutron stars and their properties in an entirely new way.

Today, we learned that this has happened.  LIGO, with the world’s first two gravitational observatories, detected the waves from two merging neutron stars, 130 million light years from Earth, on August 17th.  (Neutron star mergers last much longer than black hole mergers, so the two are easy to distinguish; and this one was so close, relatively speaking, that it was seen for a long while.)  VIRGO, with the third detector, allows scientists to triangulate and determine roughly where mergers have occurred.  They saw only a very weak signal, but that was extremely important, because it told the scientists that the merger must have occurred in a small region of the sky where VIRGO has a relative blind spot.  That told scientists where to look.

The merger was detected for more than a full minute… to be compared with black holes whose mergers can be detected for less than a second.  It’s not exactly clear yet what happened at the end, however!  Did the merged neutron stars form a black hole or a neutron star?  The jury is out.

At almost exactly the moment at which the gravitational waves reached their peak, a blast of gamma rays — electromagnetic waves of very high frequencies — was detected by a different scientific team, the one from FERMI.  FERMI detects gamma rays from the distant universe every day, and a two-second gamma-ray burst is not unusual.  And INTEGRAL, another gamma ray experiment, also detected it.   The teams communicated within minutes.   The FERMI and INTEGRAL gamma ray detectors can only indicate the rough region of the sky from which their gamma rays originate, and LIGO/VIRGO together also only give a rough region.  But the scientists saw that those regions overlapped.  The evidence was clear.  And with that, astronomy entered a new, highly anticipated phase.

Already this was a huge discovery.  Brief gamma-ray bursts have been a mystery for years.  One of the best guesses as to their origin has been neutron star mergers.  Now the mystery is solved; that guess is apparently correct.  (Or is it?  Probably, but the gamma-ray burst was surprisingly dim, given how close it was.  So there are still questions to ask.)

The fact that these signals arrived within a couple of seconds of one another, after traveling from the same source for over 100 million years, also confirms that the speed of light and the speed of gravitational waves are the same to extraordinary precision, both of them equal to the cosmic speed limit, just as Einstein’s theory of gravity predicts.
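
The arithmetic behind this is a back-of-envelope one-liner: a two-second arrival gap over a travel time of roughly 130 million years (about four million billion seconds) limits any fractional difference between the two speeds to

\displaystyle  \frac{|c_{light} - c_{grav}|}{c} \lesssim \frac{\Delta t}{T} \approx \frac{2 \hbox{ s}}{4 \times 10^{15} \hbox{ s}} \approx 5 \times 10^{-16},

i.e. the two speeds agree to better than a part in a thousand trillion.  (The numbers here are rounded, so this is only an order-of-magnitude estimate.)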

Next, these teams quickly told their astronomer friends to train their telescopes in the general area of the source. Dozens of telescopes, from every continent and from space, and looking for electromagnetic waves at a huge range of frequencies, pointed in that rough direction and scanned for anything unusual.  (A big challenge: the object was near the Sun in the sky, so it could be viewed in darkness only for an hour each night!) Light was detected!  At all frequencies!  The object was very bright, making it easy to find the galaxy in which the merger took place.  The brilliant glow was seen in gamma rays, ultraviolet light, infrared light, X-rays, and radio.  (Neutrinos, particles that can serve as another way to observe distant explosions, were not detected this time.)

And with so much information, so much can be learned!

Most important, perhaps, is this: from the pattern of the spectrum of light, the conjecture seems to be confirmed that mergers of neutron stars are important sources, perhaps the dominant one, of many of the heavy chemical elements — iodine, iridium, cesium, gold, platinum, and so on — that are forged in the intense heat of these collisions.  It used to be thought that the same supernovas that form neutron stars in the first place were the most likely source.  But now it seems that this second stage of neutron star life — merger, rather than birth — is just as important.  That’s fascinating, because neutron star mergers are much rarer than the supernovas that form them.  There’s a supernova in our Milky Way galaxy every century or so, but it’s tens of millennia or more between these “kilonovas”, created in neutron star mergers.

If there’s anything disappointing about this news, it’s this: almost everything that was observed by all these different experiments was predicted in advance.  Sometimes it’s more important and useful when some of your predictions fail completely, because then you realize how much you have to learn.  Apparently our understanding of gravity, of neutron stars and their mergers, and of all the sources of electromagnetic radiation produced in those mergers, is even better than we might have thought.  But fortunately there are a few new puzzles.  The X-rays were late; the gamma rays were dim… we’ll hear more about this shortly, as NASA is holding a second news conference.

Some highlights from the second news conference:

  • New information about neutron star interiors, which affects how large they are and therefore how exactly they merge, has been obtained
  • The first ever visible-light image of a gravitational wave source, from the Swope telescope, at the outskirts of a distant galaxy; the galaxy’s center is the blob of light, and the arrow points to the explosion.

  • The theoretical calculations for a kilonova explosion suggest that debris from the blast should rather quickly block the visible light, so the explosion dims quickly in visible light — but infrared light lasts much longer.  The observations by the visible and infrared light telescopes confirm this aspect of the theory; and you can see evidence for that in the picture above, where four days later the bright spot is both much dimmer and much redder than when it was discovered.
  • Estimate: the total mass of the gold and platinum produced in this explosion is vastly larger than the mass of the Earth.
  • Estimate: these neutron stars were formed about 10 or so billion years ago.  They’ve been orbiting each other for most of the universe’s history, and ended their lives just 130 million years ago, creating the blast we’ve so recently detected.
  • Big Puzzle: all of the gamma-ray bursts seen up to now have also shone in ultraviolet light and X-rays as well as gamma rays.   But X-rays didn’t show up this time, at least not initially.  This was a big surprise.  It took 9 days for the Chandra telescope to observe X-rays from the source, which were too faint for any other X-ray telescope to detect.  Does this mean that the two neutron stars created a black hole, which then created a jet of matter that points not quite directly at us but off-axis, and shines by illuminating the matter in interstellar space?  This had been suggested as a possibility twenty years ago, but this is the first time there’s been any evidence for it.
  • One more surprise: it took 16 days for radio waves from the source to be discovered, with the Very Large Array, the most powerful existing radio telescope.  The radio emission has been growing brighter since then!  As with the X-rays, this seems also to support the idea of an off-axis jet.
  • Nothing quite like this gamma-ray burst has been seen — or rather, recognized — before.  When a gamma-ray burst doesn’t have an X-ray component showing up right away, it simply looks odd and a bit mysterious.  It’s harder to observe than most bursts, because without a jet pointing right at us, its afterglow fades quickly.  Moreover, a jet pointing at us is bright, so it blinds us to the more detailed and subtle features of the kilonova.  But this time, LIGO/VIRGO told scientists that “Yes, this is a neutron star merger”, leading to detailed study at all electromagnetic frequencies, including patient study over many days of the X-rays and radio.  In other cases those observations would have stopped after just a short time, and the whole story couldn’t have been properly interpreted.

