# Planet Musings

## August 29, 2016

### Doug Natelson — Amazon book categories are a joke

A brief non-physics post.  Others have pointed this out, but Amazon's categorizations for books are broken in such a way that they almost have to be designed to encourage scamming.  As an example, my book is, at this instant (and that's also worth noting - these things seem to fluctuate nearly minute-to-minute), the number 30 best seller in "Books > Science & Math > Physics > Solid State Physics".  That sounds cool, but it's completely meaningless, since if you click on that category you find that it contains such solid state physics classics as "Ugly's Electrical References, 2014 ed.", "Barron's 500 Flash Cards of American Sign Language", "The Industrial Design Reader", and "Electrical Motor Controls for Integrated Systems", along with real solid state books like Kittel, Simon, and Ashcroft & Mermin.  Not quite as badly, the Nanostructures category is filled with "Strength of Materials" texts and books about mechanical structures.  Weird, and completely fixable if Amazon actually cared, which they seem not to.

### Backreaction — Dear Dr. B: How come we never hear of a force that the Higgs boson carries?

“Dear Dr. Hossenfelder,

First, I love your blog. You provide a great insight into the world of physics for us laymen. I have read in popular science books that the bosons are the ‘force carriers.’ For example the photon carries the electromagnetic force, the gluon, the strong force, etc. How come we never hear of a force that the Higgs boson carries?

Ramiro Rodriguez”

Dear Ramiro,

The short answer is that you never hear of a force that the Higgs boson carries because it doesn’t carry one. The longer answer is that not all bosons are alike. This of course raises the question of just how the Higgs-boson is different, so let me explain.

The standard model of particle physics is based on gauge symmetries. This basically means that the laws of nature have to remain invariant under transformations in certain internal spaces, and these transformations can change from one place to the next and one moment to the next. They are what physicists call “local” symmetries, as opposed to “global” symmetries whose transformations don’t change in space or time.

Amazingly enough, the requirement of gauge symmetry automatically explains how particles interact. It works like this. You start with fermions, which are particles of half-integer spin, like electrons, muons, quarks and so on. And you require that the fermions’ behavior must respect a gauge symmetry, which is classified by a symmetry group. Then you ask what equations you can possibly get that do this.

Since the fermions can move around, the equations that describe what they do must contain derivatives both in space and in time. This causes a problem, because if you want to know how the fermions’ motion changes from one place to the next you’d also have to know what the gauge transformation does from one place to the next, otherwise you can’t tell apart the change in the fermions from the change in the gauge transformation. But if you’d need to know that transformation, then the equations wouldn’t be invariant.

From this you learn that the only way the fermions can respect the gauge symmetry is if you introduce additional fields – the gauge fields – which exactly cancel the contribution from the space-time dependence of the gauge transformation. In the standard model the gauge fields all have spin 1, which means they are bosons. That's because to cancel the terms that came from the space-time derivative, the fields need to have the same transformation behavior as the derivative, which is that of a vector, hence spin 1.

To really follow this chain of arguments – from the assumption of gauge symmetry to the presence of gauge-bosons – requires several years’ worth of lectures, but the upshot is that the bosons which exchange the forces aren’t added by hand to the standard model, they are a consequence of symmetry requirements. You don’t get to pick the gauge-bosons, neither their number nor their behavior – their properties are determined by the symmetry.

In the standard model, there are 12 such force-carrying bosons: the photon (γ), the W+, W-, Z, and 8 gluons. They belong to three gauge symmetries, U(1), SU(2) and SU(3). Whether a fermion does or doesn’t interact with a gauge-boson depends on whether the fermion is “gauged” under the respective symmetry, i.e., transforms under it. Only the quarks, for example, are gauged under the SU(3) symmetry of the strong interaction, hence only the quarks couple to gluons and participate in that interaction. The so-introduced bosons are sometimes specifically referred to as “gauge-bosons” to indicate their origin.
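The count of 12 can be read off from the three gauge groups, assuming the standard fact that a gauge symmetry contributes one gauge boson per generator: U(1) has 1 generator and SU(n) has n² − 1. A minimal sketch of that bookkeeping (the helper name is illustrative, not from the post); note that the photon, W+, W- and Z are combinations of the four U(1) × SU(2) bosons after electroweak symmetry breaking, so the tally matches by number rather than name by name.

```python
def num_generators_su(n: int) -> int:
    """Dimension of SU(n), i.e. its number of generators: n**2 - 1."""
    return n * n - 1


# One gauge boson per generator of each factor of U(1) x SU(2) x SU(3).
gauge_bosons = {
    "U(1)": 1,                      # a single generator
    "SU(2)": num_generators_su(2),  # 3 generators
    "SU(3)": num_generators_su(3),  # 8 generators: the 8 gluons
}
total = sum(gauge_bosons.values())
print(gauge_bosons, "total:", total)
```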

The Higgs-boson in contrast is not introduced by a symmetry requirement. It has an entirely different function, which is to break a symmetry (the electroweak one) and thereby give mass to particles. The Higgs doesn’t have spin 1 (like the gauge-bosons) but spin 0. Indeed, it is the only presently known elementary particle with spin zero. Sheldon Glashow has charmingly referred to the Higgs as the “flush toilet” of the standard model – it’s there for a purpose, not because we like the smell.

The distinction between fermions and bosons can be removed by postulating an exchange symmetry between these two types of particles, known as supersymmetry. It works basically by generalizing the concept of a space-time direction to not merely be bosonic, but also fermionic, so that there is now a derivative that behaves like a fermion.

In the supersymmetric extension of the standard model there are then partner particles to all already known particles, denoted either by adding an “s” before the particle’s name if it’s a boson (selectron, stop, and so on) or adding “ino” after the particle’s name if it’s a fermion (Wino, photino, and so on). There is then also the Higgsino, which is the partner particle of the Higgs and has spin 1/2. It is gauged under the standard model symmetries, hence participates in the interactions, but it is still not itself a consequence of a gauge symmetry.

In the standard model most of the bosons are also force-carriers, but bosons and force-carriers just aren’t the same category. To use a crude analogy, just because most of the men you know (most of the bosons in the standard model) have short hair (are force-carriers) doesn’t mean that to be a man (to be a boson) you must have short hair (exchange a force). Bosons are defined by having integer spin, as opposed to the half-integer spin that fermions have, and not by their ability to exchange interactions.

In summary, the answer to your question is that certain types of bosons – the gauge bosons – are a consequence of symmetry requirements from which it follows that these bosons do exchange forces. The Higgs isn’t one of them.

Thanks for an interesting question!

 Peter Higgs receiving the Nobel Prize from the King of Sweden.[Img Credits: REUTERS/Claudio Bresciani/TT News Agency]


### John Preskill — Upending my equilibrium

Few settings foster equanimity like Canada’s Banff International Research Station (BIRS). Mountains tower above the center, softened by pines. Mornings have a crispness that would turn air fresheners evergreen with envy. The sky looks designed for a laundry-detergent label.

Doesn’t it?

One day into my visit, equilibrium shattered my equanimity.

I was participating in the conference “Beyond i.i.d. in information theory.” What “beyond i.i.d.” means is explained in these articles.  I was to present about resource theories for thermodynamics. Resource theories are simple models developed in quantum information theory. The original thermodynamic resource theory modeled systems that exchange energy and information.

Imagine a quantum computer built from tiny, cold, quantum circuits. An air particle might bounce off the circuit. The bounce can slow the particle down, transferring energy from particle to circuit. The bounce can entangle the particle with the circuit, transferring quantum information from computer to particle.

Suppose that particles bounced off the computer for ages. The computer would thermalize, or reach equilibrium: The computer’s energy would flatline. The computer would reach a state called the canonical ensemble. The canonical ensemble looks like this:  $e^{ - \beta H } / { Z }$.
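When the Hamiltonian is diagonal in a known basis, $e^{-\beta H}/Z$ reduces to Boltzmann weights on its energy levels. A minimal sketch under that assumption; the two-level energies and $\beta$ below are hypothetical numbers, not a model of any particular circuit.

```python
import math


def canonical_populations(energies, beta):
    """Diagonal of e^{-beta H} / Z when H is diagonal with these eigenvalues.

    Each level gets the Boltzmann weight exp(-beta * E); Z is their sum.
    """
    e_min = min(energies)  # a constant shift cancels after normalizing and avoids overflow
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


# Hypothetical two-level circuit: levels at 0 and 1 (arbitrary units), beta = 2.
pops = canonical_populations([0.0, 1.0], beta=2.0)
print(pops)
```

The ratio of the two populations is $e^{-\beta \Delta E}$, here $e^{-2}$.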

Joe Renes and I had extended these resource theories. Thermodynamic systems can exchange quantities other than energy and information. Imagine white-bean soup cooling on a stovetop. Gas condenses on the pot’s walls, and liquid evaporates. The soup exchanges not only heat, but also particles, with its environment. Imagine letting the soup cool for ages. It would thermalize to the grand canonical ensemble, $e^{ - \beta (H - \mu N) } / { Z }$. Joe and I had modeled systems that exchange diverse thermodynamic observables.*
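If $H$ and $N$ commute, every joint eigenstate carries a definite energy $E$ and particle number $N$, and the grand canonical ensemble again reduces to weights, now $e^{-\beta(E - \mu N)}$. A sketch assuming that commuting case; the single-orbital example is hypothetical.

```python
import math


def grand_canonical_populations(states, beta, mu):
    """Populations of e^{-beta (H - mu N)} / Z for commuting H and N.

    `states` lists (energy, particle_number) pairs labeling joint eigenstates.
    """
    weights = [math.exp(-beta * (energy - mu * number)) for energy, number in states]
    z = sum(weights)  # grand partition function
    return [w / z for w in weights]


# Hypothetical single orbital: empty (E=0, N=0) or holding one particle (E=1, N=1).
pops = grand_canonical_populations([(0.0, 0), (1.0, 1)], beta=1.0, mu=0.5)
print(pops)
```

For this one-orbital case the occupied probability is $1/(e^{\beta(E-\mu)} + 1)$, the Fermi-Dirac form.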

What if, fellow beyond-i.i.d.-er Jonathan Oppenheim asked, those observables didn’t commute with each other?

Mathematical objects called operators represent observables. Let $\hat{H}$ represent a system’s energy, and let $\hat{N}$ represent the number of particles in the system. The operators fail to commute if multiplying them in one order differs from multiplying them in the opposite order: $\hat{H} \hat{N} \neq \hat{N} \hat{H}$.
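A quick numerical check of noncommutation, using the spin-1/2 components $\hat{S}_x$ and $\hat{S}_y$ (units with $\hbar = 1$) rather than $\hat{H}$ and $\hat{N}$. Matrices are plain nested lists, so the sketch needs no libraries.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def commutator(a, b):
    """[a, b] = ab - ba; it is the zero matrix exactly when a and b commute."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]


# Spin-1/2 components in units where hbar = 1.
S_x = [[0, 0.5], [0.5, 0]]
S_y = [[0, -0.5j], [0.5j, 0]]
S_z = [[0.5, 0], [0, -0.5]]

comm = commutator(S_x, S_y)
print(comm)  # equals i * S_z, so S_x S_y != S_y S_x
```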

Suppose that our quantum circuit has observables represented by noncommuting operators $\hat{H}$ and $\hat{N}$. The circuit cannot have a well-defined energy and a well-defined particle number simultaneously. Physicists call this inability the Uncertainty Principle. Uncertainty and noncommutation infuse quantum mechanics as a Cashmere Glow™ infuses a Downy fabric softener.

Quantum uncertainty and noncommutation.

I glowed at Jonathan: All the coolness in Canada couldn’t have pleased me more than finding someone interested in that question.** Suppose that a quantum system exchanges observables $\hat{Q}_1$ and $\hat{Q}_2$ with the environment. Suppose that $\hat{Q}_1$ and $\hat{Q}_2$ don’t commute, like components $\hat{S}_x$ and $\hat{S}_y$ of quantum angular momentum. Would the system thermalize? Would the thermal state have the form $e^{ \mu_1 \hat{Q}_1 + \mu_2 \hat{Q}_2 } / { Z }$? Could we model the system with a resource theory?
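The candidate state $e^{\mu_1 \hat{Q}_1 + \mu_2 \hat{Q}_2}/Z$ can be computed directly for $\hat{Q}_1 = \hat{S}_x$, $\hat{Q}_2 = \hat{S}_y$. As in the post, any factor of $-\beta$ is absorbed into the $\mu_j$. This sketch uses a truncated Taylor series for the matrix exponential, a simplification that is adequate only for tiny matrices of modest norm; the $\mu$ values are arbitrary.

```python
import math


def matmul(a, b):
    """Product of square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm(m, terms=40):
    """exp(m) by a truncated Taylor series; fine for tiny, modest-norm matrices."""
    n = len(m)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, m)]  # m^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def thermal_state(charges, mus):
    """gamma = exp(sum_j mu_j Q_j) / Z, with Z the trace of the numerator."""
    n = len(charges[0])
    exponent = [[sum(mu * q[i][j] for mu, q in zip(mus, charges))
                 for j in range(n)] for i in range(n)]
    numerator = expm(exponent)
    z = sum(numerator[i][i] for i in range(n))
    return [[x / z for x in row] for row in numerator]


S_x = [[0, 0.5], [0.5, 0]]
S_y = [[0, -0.5j], [0.5j, 0]]
gamma = thermal_state([S_x, S_y], mus=[1.0, 0.5])
print(gamma)
```

As a check: $(\mu_1 \hat{S}_x + \mu_2 \hat{S}_y)^2 = r^2 I$ with $r = \sqrt{\mu_1^2 + \mu_2^2}/2$, so the state has the closed form $I/2 + \tanh(r)\,(\mu_1 \hat{S}_x + \mu_2 \hat{S}_y)/(2r)$.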

Jonathan proposed that we chat.

The chat sucked in beyond-i.i.d.-ers Philippe Faist and Andreas Winter. We debated strategies while walking to dinner. We exchanged results on the conference building’s veranda. We huddled over a breakfast table after colleagues had pushed their chairs back. Information flowed from chalkboard to notebook; energy flowed in the form of coffee; food particles flowed onto the floor as we brushed crumbs from our notebooks.

Exchanges of energy and particles.

The idea morphed and split. It crystallized months later. We characterized, in three ways, the thermal state of a quantum system that exchanges noncommuting observables with its environment.

First, we generalized the microcanonical ensemble. The microcanonical ensemble is the thermal state of an isolated system. An isolated system exchanges no observables with any other system. The quantum computer and the air molecules can form an isolated system. So can the white-bean soup and its kitchen. Our quantum system and its environment form an isolated system. But they cannot necessarily occupy a microcanonical ensemble, thanks to noncommutation.

We generalized the microcanonical ensemble. The generalization involves approximation, unlikely measurement outcomes, and error tolerances. The microcanonical ensemble has a simple definition—sharp and clean as Banff air. We relaxed the definition to accommodate noncommutation. If the microcanonical ensemble resembles laundry detergent, our generalization resembles fabric softener.

Suppose that our system and its environment occupy this approximate microcanonical ensemble. Tracing out (mathematically ignoring) the environment yields the system’s thermal state. The thermal state basically has the form we expected, $\gamma = e^{ \sum_j \mu_j \hat{Q}_j } / { Z }$.
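"Tracing out" is the partial trace. The sketch below shows only that operation, not the generalized microcanonical argument. It assumes the composite basis is ordered with the system index first; the example states are hypothetical.

```python
def kron(a, b):
    """Tensor (Kronecker) product of two square matrices."""
    na, nb = len(a), len(b)
    return [[a[i // nb][j // nb] * b[i % nb][j % nb] for j in range(na * nb)]
            for i in range(na * nb)]


def trace_out_environment(rho, d_sys, d_env):
    """Partial trace over the environment of a state on system (x) environment.

    Basis index of rho is i_sys * d_env + i_env (system factor first).
    """
    return [[sum(rho[a * d_env + e][b * d_env + e] for e in range(d_env))
             for b in range(d_sys)] for a in range(d_sys)]


# Product state: the system's own state is recovered.
rho_sys = [[0.7, 0.1], [0.1, 0.3]]
rho_env = [[0.5, 0.0], [0.0, 0.5]]
reduced_product = trace_out_environment(kron(rho_sys, rho_env), 2, 2)

# Entangled state (|00> + |11>)/sqrt(2): the system alone looks maximally mixed.
bell = [[0.5 if i in (0, 3) and j in (0, 3) else 0.0 for j in range(4)]
        for i in range(4)]
reduced_bell = trace_out_environment(bell, 2, 2)
print(reduced_product, reduced_bell)
```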

This exponential state, we argued, follows also from time evolution. The white-bean soup equilibrates upon exchanging heat and particles with the kitchen air for ages. Our quantum system can exchange observables $\hat{Q}_j$ with its environment for ages. The system equilibrates, we argued, to the state $\gamma$. The argument relies on a quantum-information tool called canonical typicality.

Third, we defined a resource theory for thermodynamic exchanges of noncommuting observables. In a thermodynamic resource theory, the thermal states are the worthless states: From a thermal state, one can’t extract energy usable to lift a weight or to power a laptop. The worthless states, we showed, have the form of $\gamma$.

Three paths lead to the form $\gamma$ of the thermal state of a quantum system that exchanges noncommuting observables with its environment. We published the results this summer.

Not only was Team Banff spilling coffee over $\gamma$. So were teams at Imperial College London and the University of Bristol. Our conclusions overlap, suggesting that everyone calculated correctly. Our methodologies differ, generating openings for exploration. The mountain passes between our peaks call out for mapping.

So does the path to physical reality. Do these thermal states form in labs? Could they? Cold atoms offer promise for realizations. In addition to experiments and simulations, master equations merit study. Dynamical typicality, Team Banff argued, suggests that $\gamma$ results from equilibration. Master equations model equilibration. Does some Davies-type master equation have $\gamma$ as its fixed point? Email me if you have leads!

Experimentalists, can you realize the thermal state $e^{ \sum_j \mu_j \hat{Q}_j } / Z$ whose charges $\hat{Q}_j$ don’t commute?

A photo of Banff could illustrate Merriam-Webster’s entry for “equanimity.” Banff equanimity deepened our understanding of quantum equilibrium. But we wouldn’t have understood quantum equilibrium if questions hadn’t shattered our tranquility. Give me the disequilibrium of recognizing problems, I pray, and the equilibrium to solve them.

*By “observable,” I mean “property that you can measure.”

**Teams at Imperial College London and Bristol asked that question, too. More pleasing than three times the coolness in Canada!

## August 28, 2016

### John Baez — Topological Crystals (Part 4)

Okay, let’s look at some examples of topological crystals. These are what got me excited in the first place. We’ll get some highly symmetrical crystals, often in higher-dimensional Euclidean spaces. The ‘triamond’, above, is a 3d example.

### Review

First let me remind you how it works. We start with a connected graph $X.$ This has a space $C_0(X,\mathbb{R})$ of 0-chains, which are formal linear combinations of vertices, and a space $C_1(X,\mathbb{R})$ of 1-chains, which are formal linear combinations of edges.

We choose a vertex in $X.$ Each path $\gamma$ in $X$ starting at this vertex determines a 1-chain $c_\gamma,$ namely the sum of its edges. These 1-chains give some points in $C_1(X,\mathbb{R}).$ These points are the vertices of a graph $\overline{X}$ called the maximal abelian cover of $X.$ The maximal abelian cover has an edge from $c_\gamma$ to $c_{\gamma'}$ whenever the path $\gamma'$ is obtained by adding an extra edge to $\gamma.$ We can think of this edge as a straight line segment from $c_\gamma$ to $c_{\gamma'}.$
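A small sketch of this construction, under the assumption that each edge carries an orientation, so that an edge walked backwards enters $c_\gamma$ with a minus sign. It enumerates the finitely many vertices $c_\gamma$ of the infinite cover reachable by paths of bounded length; the function name is illustrative.

```python
def cover_vertices(edges, base, max_len):
    """1-chains c_gamma for all paths of length <= max_len starting at `base`.

    A chain is a tuple of integer edge coefficients. Walking edge k from its
    tail to its head adds +1 to entry k; walking it backwards adds -1.
    """
    start = (base, (0,) * len(edges))
    seen = {start}
    frontier = {start}
    for _ in range(max_len):
        new = set()
        for vertex, chain in frontier:
            for k, (tail, head) in enumerate(edges):
                for frm, to, sign in ((tail, head, 1), (head, tail, -1)):
                    if frm == vertex:
                        step = list(chain)
                        step[k] += sign
                        new.add((to, tuple(step)))
        frontier = new - seen
        seen |= frontier
    return {chain for _, chain in seen}


# Two vertices joined by three edges, all oriented 0 -> 1 (the graphene graph).
theta = [(0, 1), (0, 1), (0, 1)]
print(len(cover_vertices(theta, base=0, max_len=2)))  # 10 chains
```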

So, we get a graph $\overline{X}$ sitting inside $C_1(X,\mathbb{R}).$ But this is a high-dimensional space. To get something nicer we’ll project down to a lower-dimensional space.

There’s a boundary operator

$\partial : C_1(X,\mathbb{R}) \to C_0(X,\mathbb{R})$

sending any edge to the difference of its two endpoints. The kernel of this operator is the space of 1-cycles, $Z_1(X,\mathbb{R}).$ There’s an inner product on the space of 1-chains such that edges form an orthonormal basis, so we get an orthogonal projection

$\pi : C_1(X,\mathbb{R}) \to Z_1(X,\mathbb{R})$
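One way to realize $\pi$ concretely, assuming edges are stored as (tail, head) pairs: $Z_1 = \ker \partial$ is the orthogonal complement of the row space of the matrix of $\partial$, so $\pi$ subtracts the component of a chain lying in that row space. Gram-Schmidt over exact fractions keeps everything rational; the helper names are illustrative.

```python
from fractions import Fraction


def boundary_matrix(num_vertices, edges):
    """Matrix of the boundary map: column k is head - tail for edge k."""
    d = [[0] * len(edges) for _ in range(num_vertices)]
    for k, (tail, head) in enumerate(edges):
        d[head][k] += 1
        d[tail][k] -= 1
    return d


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cycle_projection(num_vertices, edges):
    """Return pi: 1-chains -> 1-cycles, the orthogonal projection onto ker(d)."""
    basis = []  # orthogonal (unnormalized) basis of the row space of d
    for row in boundary_matrix(num_vertices, edges):
        r = [Fraction(x) for x in row]
        for b in basis:
            coeff = dot(r, b) / dot(b, b)
            r = [x - coeff * y for x, y in zip(r, b)]
        if any(r):
            basis.append(r)

    def pi(chain):
        v = [Fraction(x) for x in chain]
        for b in basis:
            coeff = dot(v, b) / dot(b, b)
            v = [x - coeff * y for x, y in zip(v, b)]
        return v

    return pi


theta = [(0, 1), (0, 1), (0, 1)]
pi = cycle_projection(2, theta)
print(pi((1, 0, 0)))  # e_1 minus (1/3)(e_1 + e_2 + e_3)
```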

We can use this to take the maximal abelian cover $\overline{X}$ and project it down to the space of 1-cycles. The hard part is checking that $\pi$ is one-to-one on $\overline{X}.$ But that’s what I explained last time! It’s true whenever our original graph $X$ has no bridges: that is, edges whose removal would disconnect our graph, like this:

So, when $X$ is a bridgeless graph, we get a copy of the maximal abelian cover embedded in $Z_1(X,\mathbb{R}).$ This is our topological crystal.
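Whether a graph qualifies is easy to test by brute force: delete each edge in turn and check connectivity. This is a sketch for small examples only, costing O(E·(V+E)); a linear-time depth-first-search method due to Tarjan does the same job for large graphs.

```python
def is_connected(num_vertices, edges):
    """Depth-first search from vertex 0, treating edges as undirected."""
    adj = {v: [] for v in range(num_vertices)}
    for tail, head in edges:
        adj[tail].append(head)
        adj[head].append(tail)
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == num_vertices


def bridges(num_vertices, edges):
    """Indices of edges whose removal disconnects the graph."""
    return [k for k in range(len(edges))
            if not is_connected(num_vertices, edges[:k] + edges[k + 1:])]


# Two triangles joined by a single edge: that joining edge is the only bridge.
two_triangles = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
print(bridges(6, two_triangles))  # [6]
```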

Let’s do some examples.

### Graphene

I showed you this one before, but it’s a good place to start. Let $X$ be this graph:

Since this graph has 3 edges, its space of 1-chains is 3-dimensional. Since this graph has 2 holes, its 1-cycles form a plane in that 3d space. If we take paths $\gamma$ in $X$ starting at the red vertex, form the 1-chains $c_\gamma,$ and project them down to this plane, we get this:

Here the 1-chains $c_\gamma$ are the white and red dots. They’re the vertices of the maximal abelian cover $\overline{X},$ while the line segments between them are the edges of $\overline{X}.$ Projecting these vertices and edges onto the plane of 1-cycles, we get our topological crystal:

This is the pattern of graphene, a miraculous 2-dimensional form of carbon. The more familiar 3d crystal called graphite is made of parallel layers of graphene connected with some other bonds.

Puzzle 1. Classify bridgeless connected graphs with 2 holes (or more precisely, a 2-dimensional space of 1-cycles). What are the resulting 2d topological crystals?

### Diamond

Now let’s try this graph:

Since it has 3 holes, it gives a 3d crystal:

This crystal structure is famous! It’s the pattern used by a diamond. Up to translation it has two kinds of atoms, corresponding to the two vertices of the original graph.

### Triamond

Now let’s try this graph:

Since it has 3 holes, it gives another 3d crystal:

This is also famous: it’s sometimes called a ‘triamond’. If you’re a bug crawling around on this crystal, locally you experience the same topology as if you were crawling around on a wire-frame model of a tetrahedron. But you’re actually on the maximal abelian cover!

Up to translation the triamond has 4 kinds of atoms, corresponding to the 4 vertices of the tetrahedron. Each atom has 3 equally distant neighbors lying in a plane at 120° angles from each other. These planes lie in 4 families, each parallel to one face of a regular tetrahedron. This structure was discovered by the crystallographer Laves, and it was dubbed the Laves graph by Coxeter. Later Sunada called it the ‘$\mathrm{K}_4$ lattice’ and studied its energy minimization properties. Theoretically it seems to be a stable form of carbon. Crystals in this pattern have not yet been seen, but this pattern plays a role in the structure of certain butterfly wings.

Puzzle 2. Classify bridgeless connected graphs with 3 holes (or more precisely, a 3d space of 1-cycles). What are the resulting 3d topological crystals?

### Lonsdaleite and hyperquartz

There’s a crystal form of carbon called lonsdaleite that looks like this:

It forms in meteor impacts. It does not arise as a 3-dimensional topological crystal.

Puzzle 3. Show that this graph gives a 5-dimensional topological crystal which can be projected down to give lonsdaleite in 3d space:

Puzzle 4. Classify bridgeless connected graphs with 4 holes (or more precisely, a 4d space of 1-cycles). What are the resulting 4d topological crystals? A crystallographer with the wonderful name of Eon calls this one hyperquartz, because it’s a 4-dimensional analogue of quartz:

All these classification problems are quite manageable if you notice there are certain ‘boring’, easily understood ways to get new bridgeless connected graphs with $n$ holes from old ones.

### Platonic crystals

For any connected graph $X,$ there is a covering map

$q : \overline{X} \to X$

The vertices of $\overline{X}$ come in different kinds, or ‘colors’, depending on which vertex of $X$ they map to. It’s interesting to look at the group of ‘covering symmetries’, $\mathrm{Cov}(X),$ consisting of all symmetries of $\overline{X}$ that map vertices of the same color to vertices of the same color. Greg Egan and I showed that if $X$ has no bridges, $\mathrm{Cov}(X)$ also acts as symmetries of the topological crystal associated to $X.$ This group fits into a short exact sequence:

$1 \longrightarrow H_1(X,\mathbb{Z}) \longrightarrow \mathrm{Cov}(X) \longrightarrow \mathrm{Aut}(X) \longrightarrow 1$

where $\mathrm{Aut}(X)$ is the group of all symmetries of $X.$ Thus, every symmetry of $X$ is covered by some symmetry of its topological crystal, while $H_1(X,\mathbb{Z})$ acts as translations of the crystal, in a way that preserves the color of every vertex.

For example consider the triamond, which comes from the tetrahedron. The symmetry group of the tetrahedron is this Coxeter group:

$\mathrm{A}_3 = \langle s_1, s_2, s_3 \;| \; (s_1s_2)^3 = (s_2s_3)^3 = (s_1s_3)^2 = s_1^2 = s_2^2 = s_3^2 = 1\rangle$

Thus, the group of covering symmetries of the triamond is an extension of $\mathrm{A}_3$ by $\mathbb{Z}^3.$ Beware the notation here: this is not the alternating group on 3 letters. In fact it’s the symmetric group on 4 letters, namely the vertices of the tetrahedron!
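One way to see the identification is to send $s_1, s_2, s_3$ to the transpositions swapping adjacent tetrahedron vertices (0 1), (1 2), (2 3). The sketch checks that these satisfy the Coxeter relations, including $(s_1 s_3)^2 = 1$ from the standard presentation of $\mathrm{A}_3$, and generate a group of order 24. That is consistent with $\mathrm{A}_3 \cong \mathrm{S}_4$ but not a proof. The isomorphism also needs $|\mathrm{A}_3| = 24$, a standard fact about Coxeter groups.

```python
def compose(p, q):
    """Permutation product: (p * q)(i) = p(q(i)), permutations as tuples."""
    return tuple(p[q[i]] for i in range(len(q)))


def power(p, k):
    """p composed with itself k times."""
    result = tuple(range(len(p)))
    for _ in range(k):
        result = compose(result, p)
    return result


identity = (0, 1, 2, 3)
s1 = (1, 0, 2, 3)  # swap tetrahedron vertices 0 and 1
s2 = (0, 2, 1, 3)  # swap vertices 1 and 2
s3 = (0, 1, 3, 2)  # swap vertices 2 and 3

relations_hold = all([
    power(s1, 2) == identity,
    power(s2, 2) == identity,
    power(s3, 2) == identity,
    power(compose(s1, s2), 3) == identity,
    power(compose(s2, s3), 3) == identity,
    power(compose(s1, s3), 2) == identity,  # non-adjacent generators commute
])


def generated_group(generators):
    """All products of the generators, found by closure under composition."""
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


order = len(generated_group([s1, s2, s3]))
print(relations_hold, order)  # True 24
```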

We can also look at other ‘Platonic crystals’. The symmetry group of the cube and octahedron is the Coxeter group

$\mathrm{B}_3 = \langle s_1, s_2, s_3 \;| \; (s_1s_2)^3 = (s_2s_3)^4 = (s_1s_3)^2 = s_1^2 = s_2^2 = s_3^2 = 1\rangle$

Since the cube has 6 faces, Euler’s formula shows that the graph formed by its vertices and edges has a 5d space of 1-cycles. The corresponding topological crystal is thus 5-dimensional, and its group of covering symmetries is an extension of $\mathrm{B}_3$ by $\mathbb{Z}^5.$ Similarly, the octahedron gives a 7-dimensional topological crystal whose group of covering symmetries is an extension of $\mathrm{B}_3$ by $\mathbb{Z}^7.$

The symmetry group of the dodecahedron and icosahedron is

$\mathrm{H}_3 = \langle s_1, s_2, s_3 \;| \; (s_1s_2)^3 = (s_2s_3)^5 = (s_1s_3)^2 = s_1^2 = s_2^2 = s_3^2 = 1\rangle$

and these solids give crystals of dimensions 11 and 19. If you’re a bug crawling around on the second of these, locally you experience the same topology as if you were crawling around on a wire-frame model of an icosahedron. But you’re actually in 19-dimensional space, crawling around on the maximal abelian cover!

There is also an infinite family of degenerate Platonic solids called ‘hosohedra’ with two vertices, $n$ edges and $n$ faces. These faces cannot be made flat, since each face has just 2 edges, but that is not relevant to our construction: the vertices and edges still give a graph. For example, when $n = 6,$ we have the ‘hexagonal hosohedron’:

The corresponding crystal has dimension $n-1,$ and its group of covering symmetries is an extension of $\mathrm{S}_n \times \mathbb{Z}/2$ by $\mathbb{Z}^{n-1}.$ The case $n = 3$ gives the graphene crystal, while $n = 4$ gives the diamond.
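The dimensions quoted above follow from one count: a connected graph has a space of 1-cycles of dimension $|E| - |V| + 1$. For the Platonic solids, Euler's formula $V - E + F = 2$ turns this into $F - 1$. A sketch checking the arithmetic, with the cube's counts cross-checked by building its graph; the face counts are standard and the helper names are illustrative.

```python
from itertools import product


def cycle_space_dim(num_vertices, num_edges, components=1):
    """First Betti number of a graph: b_1 = |E| - |V| + #components."""
    return num_edges - num_vertices + components


# (vertices, edges, faces) of each Platonic solid.
solids = {
    "tetrahedron": (4, 6, 4),
    "cube": (8, 12, 6),
    "octahedron": (6, 12, 8),
    "dodecahedron": (20, 30, 12),
    "icosahedron": (12, 30, 20),
}
dims = {name: cycle_space_dim(v, e) for name, (v, e, f) in solids.items()}
# Euler's formula V - E + F = 2 makes b_1 = F - 1 for each of these graphs.
faces_check = all(dims[name] == f - 1 for name, (v, e, f) in solids.items())


def hosohedron_dim(n):
    """A hosohedron's graph has 2 vertices and n parallel edges, so b_1 = n - 1."""
    return cycle_space_dim(2, n)


# Cube graph: vertices are {0,1}^3, edges join vertices differing in one coordinate.
cube_vertices = list(product((0, 1), repeat=3))
cube_edges = [(u, w) for i, u in enumerate(cube_vertices)
              for w in cube_vertices[i + 1:]
              if sum(a != b for a, b in zip(u, w)) == 1]
print(dims, hosohedron_dim(3), hosohedron_dim(4))
```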

### Exotic crystals

We can also get crystals from more exotic highly symmetrical graphs. For example, take the Petersen graph:

Its symmetry group is the symmetric group $\mathrm{S}_5.$ It has 10 vertices and 15 edges, so its Euler characteristic is $-5,$ which implies that its space of 1-cycles is 6-dimensional. It thus gives a 6-dimensional crystal whose group of covering symmetries is an extension of $\mathrm{S}_5$ by $\mathbb{Z}^6.$
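A sketch confirming these counts, using the standard description of the Petersen graph: vertices are the 2-element subsets of a 5-element set, and edges join disjoint subsets. This model also makes the $\mathrm{S}_5$ action visible, since permuting the 5 points permutes the subsets.

```python
from itertools import combinations


def petersen_graph():
    """Vertices: 2-element subsets of {0,...,4}; edges join disjoint subsets."""
    vertices = [frozenset(p) for p in combinations(range(5), 2)]
    edges = [(a, b) for a, b in combinations(vertices, 2) if not a & b]
    return vertices, edges


def count_components(vertices, edges):
    """Number of connected components, via union-find."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(v) for v in vertices})


vertices, edges = petersen_graph()
euler_char = len(vertices) - len(edges)
b1 = len(edges) - len(vertices) + count_components(vertices, edges)
print(len(vertices), len(edges), euler_char, b1)  # 10 15 -5 6
```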

Two more nice examples come from Klein’s quartic curve, a Riemann surface of genus three on which the 336-element group $\mathrm{PGL}(2,\mathbb{F}_7)$ acts as isometries. These isometries preserve a tiling of Klein’s quartic curve by 56 triangles, with 7 meeting at each vertex. This picture is topologically correct, though not geometrically:

From this tiling we obtain a graph $X$ embedded in Klein’s quartic curve. This graph has $56 \times 3 / 2 = 84$ edges and $56 \times 3 / 7 = 24$ vertices, so it has Euler characteristic $-60.$ It thus gives a 61-dimensional topological crystal whose group of covering symmetries is an extension of $\mathrm{PGL}(2,\mathbb{F}_7)$ by $\mathbb{Z}^{61}.$

There is also a dual tiling of Klein’s curve by 24 heptagons, 3 meeting at each vertex. This gives a graph with 84 edges and 56 vertices, hence Euler characteristic $-28.$ From this we obtain a 29-dimensional topological crystal whose group of covering symmetries is an extension of $\mathrm{PGL}(2,\mathbb{F}_7)$ by $\mathbb{Z}^{29}.$
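The counts for both tilings come from the same bookkeeping: every edge borders 2 faces, and every vertex is shared by the same number of faces. A sketch of that arithmetic, which also recovers genus 3 from the surface's Euler characteristic $V - E + F = 2 - 2g$. It assumes, as holds for a tiling of a connected surface, that the resulting graph is connected.

```python
def tiling_graph_counts(num_faces, sides, faces_per_vertex):
    """(vertices, edges) of the graph of a tiling where each face has `sides`
    edges and `faces_per_vertex` faces meet at each vertex."""
    edges, r_e = divmod(num_faces * sides, 2)  # each edge borders 2 faces
    vertices, r_v = divmod(num_faces * sides, faces_per_vertex)
    if r_e or r_v:
        raise ValueError("face data do not fit together")
    return vertices, edges


def klein_summary(num_faces, sides, faces_per_vertex):
    v, e = tiling_graph_counts(num_faces, sides, faces_per_vertex)
    surface_euler = v - e + num_faces  # equals 2 - 2g for the tiled surface
    return {
        "vertices": v,
        "edges": e,
        "graph_euler": v - e,
        "crystal_dim": e - v + 1,      # b_1 of a connected graph
        "genus": (2 - surface_euler) // 2,
    }


triangles = klein_summary(56, 3, 7)
heptagons = klein_summary(24, 7, 3)
print(triangles)
print(heptagons)
```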

### The packing fraction

Now that we’ve got a supply of highly symmetrical crystals in higher dimensions, we can try to study their structure. We’ve only made a bit of progress on this.

One interesting property of a topological crystal is its ‘packing fraction’. I like to call the vertices of a topological crystal atoms, for the obvious reason. The set $A$ of atoms is periodic. It’s usually not a lattice. But it’s contained in the lattice $L$ obtained by projecting the integral 1-chains down to the space of 1-cycles:

$L = \{ \pi(c) : \; c \in C_1(X,\mathbb{Z}) \}$

We can ask what fraction of the points in this lattice are actually atoms. Let’s call this the packing fraction. Since $Z_1(X,\mathbb{Z})$ acts as translations on both $A$ and $L,$ we can define it to be

$\displaystyle{ \frac{|A/Z_1(X,\mathbb{Z})|}{|L/Z_1(X,\mathbb{Z})|} }$

For example, suppose $X$ is the graph that gives graphene:

Then the packing fraction is 2/3, as can be seen here:

For any bridgeless connected graph $X,$ it turns out that the packing fraction equals

$\displaystyle{ \frac{|V|}{|T|} }$

where $V$ is the set of vertices and $T$ is the set of spanning trees. The main tool used to prove this is Bacher, de la Harpe and Nagnibeda’s work on integral cycles and integral cuts, which in turn relies on Kirchhoff’s matrix tree theorem.
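Kirchhoff's matrix tree theorem says $|T|$ is the determinant of the graph Laplacian with any one row and column deleted. That makes the packing fraction straightforward to compute exactly. This is an independent Python sketch, not the Mathematica code mentioned below. It reproduces the graphene value 2/3, and for the tetrahedron it gives the 16 spanning trees predicted by Cayley's formula.

```python
from fractions import Fraction


def laplacian(num_vertices, edges):
    """Graph Laplacian D - A; parallel edges add up, self-loops contribute nothing."""
    lap = [[0] * num_vertices for _ in range(num_vertices)]
    for u, v in edges:
        if u == v:
            continue
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    return lap


def determinant(m):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def spanning_tree_count(num_vertices, edges):
    """Matrix tree theorem: delete row and column 0 of the Laplacian."""
    lap = laplacian(num_vertices, edges)
    return int(determinant([row[1:] for row in lap[1:]]))


def packing_fraction(num_vertices, edges):
    """|V| / |T| for a bridgeless connected graph."""
    return Fraction(num_vertices, spanning_tree_count(num_vertices, edges))


theta = [(0, 1), (0, 1), (0, 1)]
print(packing_fraction(2, theta))  # 2/3
```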

Greg Egan used Mathematica to count the spanning trees in the examples discussed above, and this let us work out their packing fractions. They tend to be very low! For example, the maximal abelian cover of the dodecahedron gives an 11-dimensional crystal with packing fraction 1/27,648, while the heptagonal tiling of Klein’s quartic gives a 29-dimensional crystal with packing fraction 1/688,257,368,064,000,000.

So, we have some very delicate, wispy crystals in high-dimensional spaces, built from two simple ideas in topology: the maximal abelian cover of a graph, and the natural inner product on 1-chains. They have intriguing connections to tropical geometry, but they are just beginning to be understood in detail. Have fun with them!

For more, see:

• John Baez, Topological crystals.

### n-Category Café — Topological Crystals (Part 4)

Okay, let’s look at some examples of topological crystals. These are what got me excited in the first place. We’ll get some highly symmetrical crystals, often in higher-dimensional Euclidean spaces. The ‘triamond’, above, is a 3d example.

### Review

First let me remind you how it works. We start with a connected graph $X$. This has a space $C_0(X,\mathbb{R})$ of 0-chains, which are formal linear combinations of vertices, and a space $C_1(X,\mathbb{R})$ of 1-chains, which are formal linear combinations of edges.

We choose a vertex in $X$. Each path $\gamma$ in $X$ starting at this vertex determines a 1-chain $c_\gamma$, namely the sum of its edges. These 1-chains give some points in $C_1(X,\mathbb{R})$. These points are the vertices of a graph $\overline{X}$ called the maximal abelian cover of $X$. The maximal abelian cover has an edge from $c_\gamma$ to $c_{\gamma'}$ whenever the path $\gamma'$ is obtained by adding an extra edge to $\gamma$. We can think of this edge as a straight line segment from $c_\gamma$ to $c_{\gamma'}$.

So, we get a graph $\overline{X}$ sitting inside $C_1(X,\mathbb{R})$. But this is a high-dimensional space. To get something nicer we’ll project down to a lower-dimensional space.

There’s a boundary operator

$\partial : C_1(X,\mathbb{R}) \to C_0(X,\mathbb{R})$

sending any edge to the difference of its two endpoints. The kernel of this operator is the space of 1-cycles, $Z_1(X,\mathbb{R})$. There’s an inner product on the space of 1-chains such that edges form an orthonormal basis, so we get an orthogonal projection

$\pi : C_1(X,\mathbb{R}) \to Z_1(X,\mathbb{R})$

We can use this to take the maximal abelian cover $\overline{X}$ and project it down to the space of 1-cycles. The hard part is checking that $\pi$ is one-to-one on $\overline{X}$. But that’s what I explained last time! It’s true whenever our original graph $X$ has no bridges: that is, edges whose removal would disconnect our graph, like this:

So, when $X$ is a bridgeless graph, we get a copy of the maximal abelian cover embedded in $Z_1(X,\mathbb{R})$. This is our topological crystal.

Let’s do some examples.

### Graphene

I showed you this one before, but it’s a good place to start. Let $X$ be this graph:

Since this graph has 3 edges, its space of 1-chains is 3-dimensional. Since this graph has 2 holes, its 1-cycles form a plane in that 3d space. If we take paths $\gamma$ in $X$ starting at the red vertex, form the 1-chains $c_\gamma$, and project them down to this plane, we get this:

Here the 1-chains $c_\gamma$ are the white and red dots. They’re the vertices of the maximal abelian cover $\overline{X}$, while the line segments between them are the edges of $\overline{X}$. Projecting these vertices and edges onto the plane of 1-cycles, we get our topological crystal:

This is the pattern of graphene, a miraculous 2-dimensional form of carbon. The more familiar 3d crystal called graphite is made of parallel layers of graphene connected with some other bonds.

Puzzle 1. Classify bridgeless connected graphs with 2 holes (or more precisely, a 2-dimensional space of 1-cycles). What are the resulting 2d topological crystals?

### Diamond

Now let’s try this graph:

Since it has 3 holes, it gives a 3d crystal:

This crystal structure is famous! It’s the pattern used by a diamond. Up to translation it has two kinds of atoms, corresponding to the two vertices of the original graph.

### Triamond

Now let’s try this graph:

Since it has 3 holes, it gives another 3d crystal:

This is also famous: it’s sometimes called a ‘triamond’. If you’re a bug crawling around on this crystal, locally you experience the same topology as if you were crawling around on a wire-frame model of a tetrahedron. But you’re actually on the maximal abelian cover!

Up to translation the triamond has 4 kinds of atoms, corresponding to the 4 vertices of the tetrahedron. Each atom has 3 equally distant neighbors lying in a plane at $120{}^\circ$ angles from each other. These planes lie in 4 families, each parallel to one face of a regular tetrahedron. This structure was discovered by the crystallographer Laves, and it was dubbed the ‘Laves graph’ by Coxeter. Later Sunada called it the ‘$\mathrm{K}_4$ lattice’ and studied its energy minimization properties. Theoretically it seems to be a stable form of carbon. Crystals in this pattern have not yet been seen, but this pattern plays a role in the structure of certain butterfly wings.

Puzzle 2. Classify bridgeless connected graphs with 3 holes (or more precisely, a 3d space of 1-cycles). What are the resulting 3d topological crystals?

### Lonsdaleite and hyperquartz

There’s a crystal form of carbon called lonsdaleite that looks like this:

It forms in meteor impacts. It does not arise as a 3-dimensional topological crystal.

Puzzle 3. Show that this graph gives a 5-dimensional topological crystal which can be projected down to give lonsdaleite in 3d space:

Puzzle 4. Classify bridgeless connected graphs with 4 holes (or more precisely, a 4d space of 1-cycles). What are the resulting 4d topological crystals? A crystallographer with the wonderful name of Eon calls this one hyperquartz, because it’s a 4-dimensional analogue of quartz:

All these classification problems are quite manageable if you notice there are certain ‘boring’, easily understood ways to get new bridgeless graphs with $n$ holes from old ones.

### Platonic crystals

For any connected graph $X$, there is a covering map

$q : \overline{X} \to X$

The vertices of $\overline{X}$ come in different kinds, or ‘colors’, depending on which vertex of $X$ they map to. It’s interesting to look at the group of ‘covering symmetries’, $\mathrm{Cov}(X)$, consisting of all symmetries of $\overline{X}$ that map vertices of the same color to vertices of the same color. Greg Egan and I showed that when $X$ has no bridges, $\mathrm{Cov}(X)$ also acts as symmetries of the topological crystal associated to $X$. This group fits into a short exact sequence:

$1 \longrightarrow H_1(X,\mathbb{Z}) \longrightarrow \mathrm{Cov}(X) \longrightarrow \mathrm{Aut}(X) \longrightarrow 1$

where $\mathrm{Aut}(X)$ is the group of all symmetries of $X$. Thus, every symmetry of $X$ is covered by some symmetry of its topological crystal, while $H_1(X,\mathbb{Z})$ acts as translations of the crystal, in a way that preserves the color of every vertex.

For example consider the triamond, which comes from the tetrahedron. The symmetry group of the tetrahedron is this Coxeter group:

$\mathrm{A}_3 = \langle s_1, s_2, s_3 \;| \; (s_1s_2)^3 = (s_2s_3)^3 = (s_1s_3)^2 = s_1^2 = s_2^2 = s_3^2 = 1\rangle$

Thus, the group of covering symmetries of the triamond is an extension of $\mathrm{A}_3$ by $\mathbb{Z}^3$. Beware the notation here: this is not the alternating group on 3 letters. In fact it’s the permutation group on 4 letters, namely the vertices of the tetrahedron!
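As a sanity check on this identification (not a proof that the presentation defines $\mathrm{S}_4$), here is a short Python sketch. It assumes the standard choice of adjacent transpositions $s_1 = (1\,2)$, $s_2 = (2\,3)$, $s_3 = (3\,4)$ of the tetrahedron's vertices, relabelled 0 to 3 in the code, checks the Coxeter relations, including $(s_1s_3)^2 = 1$ (the relation saying $s_1$ and $s_3$ commute), and counts the group these three transpositions generate. The function names are our own.

```python
# Permutations of the tetrahedron's vertices 0, 1, 2, 3, stored as tuples p
# with p[i] the image of i; compose(p, q) applies q first, then p.
def compose(p, q):
    return tuple(p[q[i]] for i in range(len(q)))

def power(p, n):
    result = tuple(range(len(p)))
    for _ in range(n):
        result = compose(result, p)
    return result

identity = (0, 1, 2, 3)
s1 = (1, 0, 2, 3)  # swap vertices 0 and 1
s2 = (0, 2, 1, 3)  # swap vertices 1 and 2
s3 = (0, 1, 3, 2)  # swap vertices 2 and 3

relations = [
    power(s1, 2), power(s2, 2), power(s3, 2),
    power(compose(s1, s2), 3),   # (s1 s2)^3
    power(compose(s2, s3), 3),   # (s2 s3)^3
    power(compose(s1, s3), 2),   # (s1 s3)^2: s1 and s3 commute
]

def generated_group(gens):
    """All products of the generators, found by breadth-first search."""
    group, frontier = {identity}, [identity]
    while frontier:
        new = []
        for g in frontier:
            for s in gens:
                h = compose(g, s)
                if h not in group:
                    group.add(h)
                    new.append(h)
        frontier = new
    return group

G = generated_group([s1, s2, s3])
print(all(r == identity for r in relations), len(G))  # True 24
```

Brute-force closure is fine here because the group is tiny. It would not work for the covering group of the crystal itself, which is infinite.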

We can also look at other ‘Platonic crystals’. The symmetry group of the cube and octahedron is the Coxeter group

$\mathrm{B}_3 = \langle s_1, s_2, s_3 \;| \; (s_1s_2)^3 = (s_2s_3)^4 = (s_1s_3)^2 = s_1^2 = s_2^2 = s_3^2 = 1\rangle$

Since the cube has 6 faces, the graph formed by its vertices and edges has a 5d space of 1-cycles. The corresponding topological crystal is thus 5-dimensional, and its group of covering symmetries is an extension of $\mathrm{B}_3$ by $\mathbb{Z}^5$. Similarly, the octahedron gives a 7-dimensional topological crystal whose group of covering symmetries is an extension of $\mathrm{B}_3$ by $\mathbb{Z}^7$.

The symmetry group of the dodecahedron and icosahedron is

$\mathrm{H}_3 = \langle s_1, s_2, s_3 \;| \; (s_1s_2)^3 = (s_2s_3)^5 = (s_1s_3)^2 = s_1^2 = s_2^2 = s_3^2 = 1\rangle$

and these solids give crystals of dimensions 11 and 19. If you’re a bug crawling around on the second of these, locally you experience the same topology as if you were crawling around on a wire-frame model of an icosahedron. But you’re actually in 19-dimensional space, crawling around on the maximal abelian cover!

There is also an infinite family of degenerate Platonic solids called ‘hosohedra’ with two vertices, $n$ edges and $n$ faces. These faces cannot be made flat, since each face has just 2 edges, but that is not relevant to our construction: the vertices and edges still give a graph. For example, when $n = 6$, we have the ‘hexagonal hosohedron’:

The corresponding crystal has dimension $n-1$, and its group of covering symmetries is an extension of $\mathrm{S}_n \times \mathbb{Z}/2$ by $\mathbb{Z}^{n-1}$. The case $n = 3$ gives the graphene crystal, while $n = 4$ gives the diamond.

### Exotic crystals

We can also get crystals from more exotic highly symmetrical graphs. For example, take the Petersen graph:

Its symmetry group is the symmetric group $\mathrm{S}_5$. It has 10 vertices and 15 edges, so its Euler characteristic is $-5$, which implies that its space of 1-cycles is 6-dimensional. It thus gives a 6-dimensional crystal whose group of covering symmetries is an extension of $\mathrm{S}_5$ by $\mathbb{Z}^6$.

Two more nice examples come from Klein’s quartic curve, a Riemann surface of genus three on which the 336-element group $\mathrm{PGL}(2,\mathbb{F}_7)$ acts as isometries. These isometries preserve a tiling of Klein’s quartic curve by 56 triangles, with 7 meeting at each vertex. This picture is topologically correct, though not geometrically:

From this tiling we obtain a graph $X$ embedded in Klein’s quartic curve. This graph has $56 \times 3 / 2 = 84$ edges and $56 \times 3 / 7 = 24$ vertices, so it has Euler characteristic $-60$. It thus gives a 61-dimensional topological crystal whose group of covering symmetries is an extension of $\mathrm{PGL}(2,\mathbb{F}_7)$ by $\mathbb{Z}^{61}$.

There is also a dual tiling of Klein’s curve by 24 heptagons, 3 meeting at each vertex. This gives a graph with 84 edges and 56 vertices, hence Euler characteristic $-28$. From this we obtain a 29-dimensional topological crystal whose group of covering symmetries is an extension of $\mathrm{PGL}(2,\mathbb{F}_7)$ by $\mathbb{Z}^{29}$.
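All the dimensions quoted in the last two sections come from one piece of arithmetic. For a connected graph, the space of 1-cycles has dimension $|E| - |V| + 1$, which is one minus the Euler characteristic. The Python tally below uses the vertex and edge counts stated above. For Klein's quartic it derives them from the face counts, as the text does, using the fact that each edge borders 2 faces. The helper names are illustrative.

```python
def cycle_rank(num_vertices, num_edges):
    """dim Z_1 = |E| - |V| + 1 for a connected graph (1 minus the Euler characteristic)."""
    return num_edges - num_vertices + 1

def tiling_counts(faces, sides, faces_per_vertex):
    """(vertices, edges) of a tiling by `faces` polygons with `sides` sides each,
    where every edge borders 2 faces and every vertex lies on `faces_per_vertex` faces."""
    incidences = faces * sides
    return incidences // faces_per_vertex, incidences // 2

examples = {
    "tetrahedron": (4, 6),
    "cube": (8, 12),
    "octahedron": (6, 12),
    "dodecahedron": (20, 30),
    "icosahedron": (12, 30),
    "hexagonal hosohedron": (2, 6),
    "Petersen graph": (10, 15),
    "Klein quartic, triangles": tiling_counts(56, 3, 7),
    "Klein quartic, heptagons": tiling_counts(24, 7, 3),
}

for name, (v, e) in examples.items():
    print(f"{name}: V={v}, E={e}, Euler characteristic={v - e}, "
          f"crystal dimension={cycle_rank(v, e)}")
```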

### The packing fraction

Another interesting property of a topological crystal is its ‘packing fraction’. I like to call the vertices of a topological crystal atoms, for the obvious reason. The set $A$ of atoms is periodic. It’s usually not a lattice. But it’s contained in the lattice $L$ obtained by projecting the integral 1-chains down to the space of 1-cycles:

$L = \{ \pi(c) : \; c \in C_1(X,\mathbb{Z}) \}$

We can ask what fraction of the points in this lattice are actually atoms. Let’s call this the packing fraction. Since $Z_1(X,\mathbb{Z})$ acts as translations on both $A$ and $L$, we can define it to be

$\displaystyle{ \frac{|A/Z_1(X,\mathbb{Z})|}{|L/Z_1(X,\mathbb{Z})|} }$

For example, suppose $X$ is the graph that gives graphene:

Then the packing fraction is $\frac{2}{3}$, as can be seen here:

For any bridgeless connected graph $X$, it turns out that the packing fraction is:

$\displaystyle{ \frac{|V|}{|T|} }$

where $V$ is the set of vertices and $T$ is the set of spanning trees. The main tool used to prove this is Bacher, de la Harpe and Nagnibeda’s work on integral cycles and integral cuts, which in turn relies on Kirchhoff’s matrix tree theorem.
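Kirchhoff's matrix tree theorem says that $|T|$ equals the determinant of the graph Laplacian with any one row and the matching column deleted. Below is a minimal Python sketch of the resulting packing-fraction computation, using exact rational arithmetic. It is not Greg Egan's Mathematica code. The edge-list format, which allows the parallel edges of the graphene and diamond graphs, and the function names are assumptions made for illustration.

```python
from fractions import Fraction

def spanning_tree_count(num_vertices, edges):
    """Kirchhoff's matrix tree theorem: determinant of the reduced Laplacian.

    Vertices are 0..num_vertices-1; edges is a list of (u, v) pairs
    (parallel edges allowed; self-loops ignored, since no spanning tree uses one)."""
    n = num_vertices
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        if u != v:
            L[u][u] += 1
            L[v][v] += 1
            L[u][v] -= 1
            L[v][u] -= 1
    # Delete row and column 0, then take the determinant by Gaussian elimination.
    M = [row[1:] for row in L[1:]]
    det = Fraction(1)
    for col in range(len(M)):
        pivot = next((r for r in range(col, len(M)) if M[r][col] != 0), None)
        if pivot is None:
            return 0  # disconnected graph: no spanning trees
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            det = -det
        det *= M[col][col]
        for r in range(col + 1, len(M)):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return int(det)

def packing_fraction(num_vertices, edges):
    """|V| / |T|, assuming the graph is connected and bridgeless as in the text."""
    return Fraction(num_vertices, spanning_tree_count(num_vertices, edges))

theta3 = [(0, 1)] * 3                                   # graphene graph
theta4 = [(0, 1)] * 4                                   # diamond graph
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]   # tetrahedron (triamond)
cube = [(u, u ^ (1 << k)) for u in range(8) for k in range(3) if u < u ^ (1 << k)]

print(packing_fraction(2, theta3))  # 2/3
print(packing_fraction(2, theta4))  # 1/2
print(packing_fraction(4, k4))      # 1/4
print(packing_fraction(8, cube))    # 1/48
```

For graphene this recovers the value $2/3$ found above. The tetrahedron and cube values show how quickly $|T|$ outgrows $|V|$ as the graphs get larger.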

Greg Egan used Mathematica to count the spanning trees in all the examples discussed above, and this let us work out their packing fractions. They tend to be very low! For example, the maximal abelian cover of the dodecahedron gives an 11-dimensional crystal with packing fraction $1/27,648$, while the heptagonal tiling of Klein’s quartic gives a 29-dimensional crystal with packing fraction $1/688,257,368,064,000,000$.

So, we have some very delicate, wispy crystals in high-dimensional spaces, built from two simple ideas in topology: the maximal abelian cover of a graph, and the natural inner product on 1-chains. They have intriguing connections to tropical geometry, but they are just beginning to be understood in detail. Have fun with them!

For more, see:

### John Baez — Topological Crystals (Part 3)

Last time I explained how to create the ‘maximal abelian cover’ of a connected graph. Now I’ll say more about a systematic procedure for embedding this into a vector space. That will give us a topological crystal, like the one above.

Some remarkably symmetrical patterns arise this way! For example, starting from this graph:

we get this:

Nature uses this pattern for crystals of graphene.

Starting from this graph:

we get this:

Nature uses this for crystals of diamond! Since the construction depends only on the topology of the graph we start with, we call this embedded copy of its maximal abelian cover a topological crystal.

Today I’ll remind you how this construction works. I’ll also outline a proof that it gives an embedding of the maximal abelian cover if and only if the graph has no bridges: that is, edges that disconnect the graph when removed. I’ll skip all the hard steps of the proof, but they can be found here:

• John Baez, Topological crystals.

### The homology of graphs

I’ll start with some standard stuff that’s good to know. Let $X$ be a graph. Remember from last time that we’re working in a setup where every edge $e$ goes from a vertex called its source $s(e)$ to a vertex called its target $t(e)$. We write $e: x \to y$ to indicate that $e$ is going from $x$ to $y$. You can think of the edge as having an arrow on it, and if you turn the arrow around you get the inverse edge, $e^{-1}: y \to x$. Also, $e^{-1} \ne e$.

The group of integral 0-chains on $X$, $C_0(X,\mathbb{Z})$, is the free abelian group on the set of vertices of $X$. The group of integral 1-chains on $X$, $C_1(X,\mathbb{Z})$, is the quotient of the free abelian group on the set of edges of $X$ by relations $e^{-1} = -e$ for every edge $e$. The boundary map is the homomorphism

$\partial : C_1(X,\mathbb{Z}) \to C_0(X,\mathbb{Z})$

such that

$\partial e = t(e) - s(e)$

for each edge $e$, and

$Z_1(X,\mathbb{Z}) = \ker \partial$

is the group of integral 1-cycles on $X$.

Remember, a path in a graph is a sequence of edges, the target of each one being the source of the next. Any path $\gamma = e_1 \cdots e_n$ in $X$ determines an integral 1-chain:

$c_\gamma = e_1 + \cdots + e_n$

For any path $\gamma$ we have

$c_{\gamma^{-1}} = -c_{\gamma},$

and if $\gamma$ and $\delta$ are composable then

$c_{\gamma \delta} = c_\gamma + c_\delta$
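To make these definitions concrete, here is a small Python sketch using our own representation, not anything from the paper. An oriented edge is a name, its inverse is written with a trailing `^-1`, and an integral 1-chain is a dictionary from edge names to integer coefficients, with $e^{-1}$ stored as $-e$. The sketch computes $c_\gamma$ for a path and the boundary $\partial$, and checks $c_{\gamma^{-1}} = -c_\gamma$ and $c_{\gamma\delta} = c_\gamma + c_\delta$ on the graphene graph, which has two vertices $x$, $y$ and edges $a$, $b$, $c$ from $x$ to $y$.

```python
from collections import Counter

# Hypothetical example graph: edge name -> (source, target).
EDGES = {"a": ("x", "y"), "b": ("x", "y"), "c": ("x", "y")}

def parse(step):
    """Split 'e' or 'e^-1' into (edge name, sign)."""
    if step.endswith("^-1"):
        return step[:-3], -1
    return step, 1

def ends(step):
    """Source and target of an edge or of its inverse."""
    name, sign = parse(step)
    s, t = EDGES[name]
    return (s, t) if sign == 1 else (t, s)

def chain_of_path(path):
    """c_gamma = e_1 + ... + e_n, using the relation e^-1 = -e."""
    for prev, nxt in zip(path, path[1:]):
        assert ends(prev)[1] == ends(nxt)[0], "not a path: edges do not compose"
    chain = Counter()
    for step in path:
        name, sign = parse(step)
        chain[name] += sign
    return {e: k for e, k in chain.items() if k != 0}

def boundary(chain):
    """Linear extension of d(e) = t(e) - s(e)."""
    result = Counter()
    for name, k in chain.items():
        s, t = EDGES[name]
        result[t] += k
        result[s] -= k
    return {v: k for v, k in result.items() if k != 0}

def inverse_path(path):
    """gamma^-1: reverse the order and invert every edge."""
    return [name if sign == -1 else name + "^-1"
            for name, sign in map(parse, reversed(path))]

gamma = ["a", "b^-1", "c"]              # x -> y -> x -> y
loop = ["a", "b^-1"]                    # x -> y -> x
print(chain_of_path(gamma))             # {'a': 1, 'b': -1, 'c': 1}
print(boundary(chain_of_path(gamma)))   # {'y': 1, 'x': -1}
print(boundary(chain_of_path(loop)))    # {}  (an integral 1-cycle)
print(chain_of_path(inverse_path(gamma)) == {e: -k for e, k in chain_of_path(gamma).items()})  # True
```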

Last time I explained what it means for two paths to be ‘homologous’. Here’s the quick way to say it. There’s a groupoid called the fundamental groupoid of $X$, where the objects are the vertices of $X$ and the morphisms are freely generated by the edges except for relations saying that the inverse of $e: x \to y$ really is $e^{-1}: y \to x$. We can abelianize the fundamental groupoid by imposing relations saying that $\gamma \delta = \delta \gamma$ whenever this equation makes sense. Each path $\gamma : x \to y$ gives a morphism which I’ll call $[[\gamma]] : x \to y$ in the abelianized fundamental groupoid. We say two paths $\gamma, \gamma' : x \to y$ are homologous if $[[\gamma]] = [[\gamma']]$.

Here’s a nice thing:

Lemma A. Let $X$ be a graph. Two paths $\gamma, \delta : x \to y$ in $X$ are homologous if and only if they give the same 1-chain: $c_\gamma = c_\delta$.

Proof. See the paper. You could say they give ‘homologous’ 1-chains, too, but for graphs that’s the same as being equal.   █

We define vector spaces of 0-chains and 1-chains by

$C_0(X,\mathbb{R}) = C_0(X,\mathbb{Z}) \otimes \mathbb{R}, \qquad C_1(X,\mathbb{R}) = C_1(X,\mathbb{Z}) \otimes \mathbb{R},$

respectively. We extend the boundary map to a linear map

$\partial : C_1(X,\mathbb{R}) \to C_0(X,\mathbb{R})$

We let $Z_1(X,\mathbb{R})$ be the kernel of this linear map, or equivalently,

$Z_1(X,\mathbb{R}) = Z_1(X,\mathbb{Z}) \otimes \mathbb{R} ,$

and we call elements of this vector space 1-cycles. Since $Z_1(X,\mathbb{Z})$ is a free abelian group, it forms a lattice in the space of 1-cycles. Any edge of $X$ can be seen as a 1-chain, and there is a unique inner product on $C_1(X,\mathbb{R})$ such that edges form an orthonormal basis (with each edge $e^{-1}$ counting as the negative of $e$). There is thus an orthogonal projection

$\pi : C_1(X,\mathbb{R}) \to Z_1(X,\mathbb{R}) .$

This is the key to building topological crystals!

### The embedding of atoms

We now come to the main construction, first introduced by Kotani and Sunada. To build a topological crystal, we start with a connected graph $X$ with a chosen basepoint $x_0$. We define an atom to be a homology class of paths starting at the basepoint, like

$[[\alpha]] : x_0 \to x$

Last time I showed that these atoms are the vertices of the maximal abelian cover of $X$. Now let’s embed these atoms in a vector space!

Definition. Let $X$ be a connected graph with a chosen basepoint. Let $A$ be its set of atoms. Define the map

$i : A \to Z_1(X,\mathbb{R})$

by

$i([[ \alpha ]]) = \pi(c_\alpha) .$

That $i$ is well-defined follows from Lemma A. The interesting part is this:
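Here is a minimal Python sketch of $\pi$ and of the atom map $i$, using exact rational arithmetic. It computes $\pi(c) = c - \partial^{\mathsf T}\phi$, where $\phi$ solves $L\phi = \partial c$ for the graph Laplacian $L = \partial\partial^{\mathsf T}$. The subtracted term lies in the image of $\partial^{\mathsf T}$, which is the orthogonal complement of $\ker \partial$. This is one standard way to compute the projection, not necessarily the one used in the paper, and all names are our own. The example is the graphene graph: vertices $x$ and $y$, edges $a$, $b$, $c$ from $x$ to $y$, with basepoint $x$.

```python
from fractions import Fraction

def project_to_cycles(vertices, edges, chain):
    """Orthogonal projection pi: C_1(X,R) -> Z_1(X,R) for a connected graph.

    edges is a list of (source, target) pairs; chain lists one coefficient
    per edge in the orthonormal basis of edges. Returns a list of Fractions."""
    idx = {v: k for k, v in enumerate(vertices)}
    n = len(vertices)
    bd = [Fraction(0)] * n                        # boundary of the chain
    lap = [[Fraction(0)] * n for _ in range(n)]   # graph Laplacian d d^T
    for (s, t), x in zip(edges, chain):
        bd[idx[t]] += x
        bd[idx[s]] -= x
        if s != t:                                # self-loops have zero boundary
            i, j = idx[s], idx[t]
            lap[i][i] += 1
            lap[j][j] += 1
            lap[i][j] -= 1
            lap[j][i] -= 1
    # L phi = d(chain) is singular but solvable; fix phi = 0 at vertex 0 and
    # solve the remaining (invertible) system by Gauss-Jordan elimination.
    m = n - 1
    aug = [lap[r][1:] + [bd[r]] for r in range(1, n)]
    for col in range(m):
        piv = next(r for r in range(col, m) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(m):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    phi = [Fraction(0)] + [aug[r][m] / aug[r][r] for r in range(m)]
    # pi(c) = c - d^T phi, where (d^T phi)(e) = phi(t(e)) - phi(s(e)).
    return [Fraction(x) - (phi[idx[t]] - phi[idx[s]])
            for (s, t), x in zip(edges, chain)]

# Graphene graph: edges a, b, c, all from x to y; basepoint x.
VERTICES = ["x", "y"]
EDGES = [("x", "y")] * 3
NAMES = {"a": 0, "b": 1, "c": 2}

def atom_position(path):
    """i([[alpha]]) = pi(c_alpha) for a path alpha from the basepoint, e.g. ["a", "b^-1"]."""
    chain = [0] * len(EDGES)
    for step in path:
        name, sign = (step[:-3], -1) if step.endswith("^-1") else (step, 1)
        chain[NAMES[name]] += sign
    return project_to_cycles(VERTICES, EDGES, chain)

for path in ([], ["a"], ["b"], ["c"], ["a", "b^-1"], ["a", "b^-1", "c"]):
    print(" ".join(path) or "(empty path)", [str(x) for x in atom_position(path)])
# a prints ['2/3', '-1/3', '-1/3']; a b^-1 prints ['1', '-1', '0']
```

All six atoms land at distinct points, as Theorem A predicts for this bridgeless graph. The last one, reached by $a b^{-1} c$, sits at $i([[c]])$ plus the integral 1-cycle $a - b$, which illustrates how $Z_1(X,\mathbb{Z})$ acts on atoms by translation.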

Theorem A. The following are equivalent:

(1) The graph $X$ has no bridges.

(2) The map $i : A \to Z_1(X,\mathbb{R})$ is one-to-one.

Proof. The map $i$ is one-to-one if and only if for any atoms $[[ \alpha ]]$ and $[[ \beta ]]$, $i([[ \alpha ]]) = i([[ \beta ]])$ implies $[[ \alpha ]]= [[ \beta ]]$. Note that $\gamma = \beta^{-1} \alpha$ is a path in $X$ with $c_\gamma = c_{\alpha} - c_\beta$, so

$\pi(c_\gamma) = \pi(c_{\alpha} - c_\beta) = i([[ \alpha ]]) - i([[ \beta ]])$

Since $\pi(c_\gamma)$ vanishes if and only if $c_\gamma$ is orthogonal to every 1-cycle, we have

$c_{\gamma} \textrm{ is orthogonal to every 1-cycle} \; \iff \; i([[ \alpha ]]) = i([[ \beta ]])$

On the other hand, Lemma A says

$c_\gamma = 0 \; \iff \; [[ \alpha ]]= [[ \beta ]].$

Thus, to prove (1)$\iff$(2), it suffices to show that $X$ has no bridges if and only if every 1-chain $c_\gamma$ orthogonal to every 1-cycle has $c_\gamma =0$. This is Lemma D below.   █

The following lemmas are the key to the theorem above — and also a deeper one saying that if $X$ has no bridges, we can extend $i : A \to Z_1(X,\mathbb{R})$ to an embedding of the whole maximal abelian cover of $X$.

For now, we just need to show that any nonzero 1-chain coming from a path in a bridgeless graph has nonzero inner product with some 1-cycle. The following lemmas, inspired by an idea of Ilya Bogdanov, yield an algorithm for actually constructing such a 1-cycle. This 1-cycle also has other desirable properties, which will come in handy later.

To state these, let a simple path be one in which each vertex appears at most once. Let a simple loop be a loop $\gamma : x \to x$ in which each vertex except $x$ appears at most once, while $x$ appears exactly twice, as the starting point and ending point. Let the support of a 1-chain $c$, denoted $\mathrm{supp}(c)$, be the set of edges $e$ such that $\langle c, e\rangle> 0$. This excludes edges with $\langle c, e \rangle= 0$, but also those with $\langle c , e \rangle < 0$, which are inverses of edges in the support. Note that

$c = \sum_{e \in \mathrm{supp}(c)} \langle c, e \rangle \, e .$

Thus, $\mathrm{supp}(c)$ is the smallest set of edges such that $c$ can be written as a positive linear combination of edges in this set.

Okay, here are the lemmas!

Lemma B. Let $X$ be any graph and let $c$ be an integral 1-cycle on $X$. Then for some $n$ we can write

$c = c_{\sigma_1} + \cdots + c_{\sigma_n}$

where $\sigma_i$ are simple loops with $\mathrm{supp}(c_{\sigma_i}) \subseteq \mathrm{supp}(c)$.

Proof. See the paper. The proof is an algorithm that builds a simple loop $\sigma_1$ with $\mathrm{supp}(c_{\sigma_1}) \subseteq \mathrm{supp}(c)$. We subtract this from $c$, and if the result isn’t zero we repeat the algorithm, continuing to subtract off 1-cycles $c_{\sigma_i}$ until there’s nothing left.   █

Lemma C. Let $\gamma: x \to y$ be a path in a graph $X$. Then for some $n \ge 0$ we can write

$c_\gamma = c_\delta + c_{\sigma_1} + \cdots + c_{\sigma_n}$

where $\delta: x \to y$ is a simple path and $\sigma_i$ are simple loops with $\mathrm{supp}(c_\delta), \mathrm{supp}(c_{\sigma_i}) \subseteq \mathrm{supp}(c_\gamma)$.

Proof. This relies on the previous lemma, and the proof is similar — but when we can’t subtract off any more $c_{\sigma_i}$’s we show what’s left is $c_\delta$ for a simple path $\delta: x \to y$.   █
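Lemmas B and C are constructive. Below is a hypothetical Python sketch of one such algorithm, a greedy walk. The paper's algorithm may differ in its details. A 1-chain is stored by its support, as a dictionary mapping oriented edges `(source, target, label)` to positive integer coefficients. For an integral 1-cycle, every vertex the support enters it also leaves, so a walk along support edges must eventually revisit a vertex. The stretch between the two visits is a simple loop, which we subtract before repeating. For Lemma C with $\gamma: x \to y$ and $x \ne y$, we add an auxiliary edge $y \to x$ to turn $c_\gamma$ into a cycle, decompose it, and delete the auxiliary edge from the one loop that contains it. What remains of that loop is a simple path $\delta: x \to y$.

```python
from collections import defaultdict

def support_of_path(path, edges):
    """supp(c_gamma) with coefficients, as {(source, target, label): k} with k > 0.

    path is a list of steps "e" or "e^-1"; edges maps each label to (source, target).
    An edge whose net coefficient is negative is recorded as its inverse."""
    coeff = defaultdict(int)
    for step in path:
        name, sign = (step[:-3], -1) if step.endswith("^-1") else (step, 1)
        coeff[name] += sign
    supp = {}
    for name, k in coeff.items():
        s, t = edges[name]
        if k > 0:
            supp[(s, t, name)] = k
        elif k < 0:
            supp[(t, s, name + "^-1")] = -k
    return supp

def peel_simple_loops(supp):
    """Lemma B: split an integral 1-cycle, given by its support, into simple loops.

    Each loop is a list of oriented edges (source, target, label). The input must
    be a cycle, so the walk below never gets stuck. supp is consumed."""
    loops = []
    while supp:
        out = defaultdict(list)
        for e in supp:
            out[e[0]].append(e)
        v = next(iter(supp))[0]
        walk, first_exit = [], {v: 0}
        while True:                     # walk along support edges until a vertex repeats
            e = out[v][0]
            walk.append(e)
            v = e[1]
            if v in first_exit:
                loop = walk[first_exit[v]:]
                break
            first_exit[v] = len(walk)
        for e in loop:                  # subtract c_sigma from the chain
            supp[e] -= 1
            if supp[e] == 0:
                del supp[e]
        loops.append(loop)
    return loops

def decompose_path(path, edges, x, y):
    """Lemma C: c_gamma = c_delta + sum of c_sigma, with delta: x -> y a simple path.

    Adds a hypothetical auxiliary edge y -> x so the chain becomes a cycle, peels
    off simple loops, and opens up the one loop that uses the auxiliary edge."""
    supp = support_of_path(path, edges)
    if x == y:
        return [], peel_simple_loops(supp)
    aux = (y, x, "auxiliary")
    supp[aux] = 1
    loops = peel_simple_loops(supp)
    [closing] = [loop for loop in loops if aux in loop]
    k = closing.index(aux)
    delta = closing[k + 1:] + closing[:k]   # rotate so delta starts at x and ends at y
    return delta, [loop for loop in loops if aux not in loop]

# Graphene graph again: edges a, b, c, all from x to y.
GRAPHENE = {"a": ("x", "y"), "b": ("x", "y"), "c": ("x", "y")}
gamma = ["a", "b^-1", "c", "b^-1", "a"]          # x -> y -> x -> y -> x -> y
delta, loops = decompose_path(gamma, GRAPHENE, "x", "y")
print([e[2] for e in delta])                      # ['c']
print([[e[2] for e in loop] for loop in loops])   # [['a', 'b^-1'], ['a', 'b^-1']]
```

Here $c_\gamma = 2a - 2b + c$ splits as the simple path $c$ plus two copies of the simple loop $a b^{-1}$. Every edge used lies in $\mathrm{supp}(c_\gamma)$, as both lemmas require.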

Lemma D. Let $X$ be a graph. Then the following are equivalent:

(1) $X$ has no bridges.

(2) For any path $\gamma$ in $X$, if $c_\gamma$ is orthogonal to every 1-cycle then $c_\gamma = 0$.

Proof. It’s easy to show a bridge $e$ gives a nonzero 1-chain $c_e$ that’s orthogonal to all 1-cycles, so the hard part is showing that for a bridgeless graph, if $c_\gamma$ is orthogonal to every 1-cycle then $c_\gamma = 0$. The idea is to start with a path for which $c_\gamma \ne 0$. We hit this path with Lemma C, which lets us replace $\gamma$ by a simple path $\delta$. The point is that a simple path is a lot easier to deal with than a general path: a general path could wind around crazily, passing over every edge of our graph multiple times.

Then, assuming $X$ has no bridges, we use Ilya Bogdanov’s idea to build a 1-cycle that’s not orthogonal to $c_\delta$. The basic idea is to take the path $\delta : x \to y$ and write it out as $\delta = e_1 \cdots e_n$. Since the last edge $e_n$ is not a bridge, there must be a path from $y$ back to $x$ that does not use the edge $e_n$ or its inverse. Combining this path with $\delta$ we can construct a loop, which gives a cycle having nonzero inner product with $c_\delta$ and thus with $c_\gamma$.

I’m deliberately glossing over some difficulties that can arise, so see the paper for details!   █
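The easy half of Lemma D can be checked numerically. An edge $e$ is a bridge exactly when $\pi(e) = 0$, that is, when $c_e$ is orthogonal to every 1-cycle. Because $\pi$ is an orthogonal projection, $\langle \pi(e), e\rangle = \|\pi(e)\|^2$. The Python sketch below uses a hypothetical example graph, two triangles joined by a single edge. It computes $\pi$ with exact fractions by solving a Laplacian system and flags the edges with $\pi(e) = 0$. It does not implement Bogdanov's construction for general paths, and the names are our own.

```python
from fractions import Fraction

def project_to_cycles(vertices, edges, chain):
    """Orthogonal projection onto Z_1 = ker(d): pi(c) = c - d^T phi with L phi = d c."""
    idx = {v: k for k, v in enumerate(vertices)}
    n = len(vertices)
    bd = [Fraction(0)] * n                        # boundary of the chain
    lap = [[Fraction(0)] * n for _ in range(n)]   # graph Laplacian d d^T
    for (s, t), x in zip(edges, chain):
        bd[idx[t]] += x
        bd[idx[s]] -= x
        if s != t:
            i, j = idx[s], idx[t]
            lap[i][i] += 1
            lap[j][j] += 1
            lap[i][j] -= 1
            lap[j][i] -= 1
    # Fix phi = 0 at vertex 0; solve the reduced system by Gauss-Jordan elimination.
    m = n - 1
    aug = [lap[r][1:] + [bd[r]] for r in range(1, n)]
    for col in range(m):
        piv = next(r for r in range(col, m) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(m):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    phi = [Fraction(0)] + [aug[r][m] / aug[r][r] for r in range(m)]
    return [Fraction(x) - (phi[idx[t]] - phi[idx[s]])
            for (s, t), x in zip(edges, chain)]

# Hypothetical example: triangles 0-1-2 and 3-4-5 joined by the edge (2, 3).
VERTICES = list(range(6))
EDGES = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]

bridges = []
for k, e in enumerate(EDGES):
    unit = [1 if j == k else 0 for j in range(len(EDGES))]
    p = project_to_cycles(VERTICES, EDGES, unit)
    # <pi(e), e> = |pi(e)|^2, which vanishes exactly when pi(e) = 0.
    print(e, "<pi(e), e> =", p[k])
    if all(x == 0 for x in p):
        bridges.append(e)
print("bridges:", bridges)  # bridges: [(2, 3)]
```

Each triangle edge gives $\langle \pi(e), e\rangle = 1/3$, because $\pi(e)$ is one third of the triangle's 1-cycle. The joining edge projects to zero.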

### Embedding the whole crystal

Okay: so far, we’ve taken a connected bridgeless graph $X$ and embedded its atoms into the space of 1-cycles via a map

$i : A \to Z_1(X,\mathbb{R}) .$

These atoms are the vertices of the maximal abelian cover $\overline{X}$. Now we’ll extend $i$ to an embedding of the whole graph $\overline{X}$ — or to be precise, its geometric realization $|\overline{X}|$. Remember, for us a graph is an abstract combinatorial gadget; its geometric realization is a topological space where the edges become closed intervals.

The idea is that just as $i$ maps each atom to a point in the vector space $Z_1(X,\mathbb{R})$, $j$ maps each edge of $|\overline{X}|$ to a straight line segment between such points. These line segments serve as the ‘bonds’ of a topological crystal. The only challenge is to show that these bonds do not cross each other.
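The map $j$ sends the edge of $\overline{X}$ from $[[\alpha]]$ to $[[\alpha e]]$ to the segment from $\pi(c_\alpha)$ to $\pi(c_\alpha) + \pi(e)$. So every bond is a translate of some $\pi(e)$, and the local geometry of the crystal can be read off from these vectors. The Python sketch below is our own illustration, again computing $\pi$ with exact fractions via a Laplacian solve. It does this for the graph with two vertices and $n$ parallel edges. The $n$ bonds at an atom turn out to have equal length and pairwise cosine $-1/(n-1)$, which gives $120^\circ$ for graphene ($n = 3$) and the tetrahedral angle, with cosine $-1/3$, for diamond ($n = 4$).

```python
from fractions import Fraction

def project_to_cycles(vertices, edges, chain):
    """Orthogonal projection onto Z_1 = ker(d): pi(c) = c - d^T phi with L phi = d c."""
    idx = {v: k for k, v in enumerate(vertices)}
    n = len(vertices)
    bd = [Fraction(0)] * n
    lap = [[Fraction(0)] * n for _ in range(n)]
    for (s, t), x in zip(edges, chain):
        bd[idx[t]] += x
        bd[idx[s]] -= x
        if s != t:
            i, j = idx[s], idx[t]
            lap[i][i] += 1
            lap[j][j] += 1
            lap[i][j] -= 1
            lap[j][i] -= 1
    m = n - 1
    aug = [lap[r][1:] + [bd[r]] for r in range(1, n)]
    for col in range(m):
        piv = next(r for r in range(col, m) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(m):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    phi = [Fraction(0)] + [aug[r][m] / aug[r][r] for r in range(m)]
    return [Fraction(x) - (phi[idx[t]] - phi[idx[s]])
            for (s, t), x in zip(edges, chain)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def theta_bonds(n):
    """Bond vectors pi(e_1), ..., pi(e_n) for 2 vertices joined by n parallel edges."""
    vertices, edges = ["x", "y"], [("x", "y")] * n
    return [project_to_cycles(vertices, edges, [int(j == k) for j in range(n)])
            for k in range(n)]

for n in (3, 4):
    bonds = theta_bonds(n)
    lengths_sq = {dot(b, b) for b in bonds}
    # When all bonds have the same length, cos(angle) = <b_j, b_k> / |b_j|^2.
    cosines = {dot(bonds[j], bonds[k]) / dot(bonds[j], bonds[j])
               for j in range(n) for k in range(n) if j != k}
    print(n, [str(x) for x in lengths_sq], [str(x) for x in cosines])
# 3 ['2/3'] ['-1/2']
# 4 ['3/4'] ['-1/3']
```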

Theorem B. If $X$ is a connected graph with basepoint, the map $i : A \to Z_1(X,\mathbb{R})$ extends to a continuous map

$j : |\overline{X}| \to Z_1(X,\mathbb{R})$

sending each edge of $|\overline{X}|$ to a straight line segment in $Z_1(X,\mathbb{R})$. If $X$ has no bridges, then $j$ is one-to-one.

Proof. The first part is easy; the second part takes real work! The problem is to show the edges don’t cross. Greg Egan and I couldn’t do it using just Lemma D above. However, there’s a nice argument that goes back and uses Lemma C — read the paper for details.

As usual, history is different than what you read in math papers: David Speyer gave us a nice proof of Lemma D, and that was good enough to prove that atoms are mapped into the space of 1-cycles in a one-to-one way, but we only came up with Lemma C after weeks of struggling to prove the edges don’t cross.   █

### Connections to tropical geometry

Tropical geometry sets up a nice analogy between Riemann surfaces and graphs. The Abel–Jacobi map embeds any Riemann surface $\Sigma$ in its Jacobian, which is the torus $H_1(\Sigma,\mathbb{R})/H_1(\Sigma,\mathbb{Z})$. We can similarly define the Jacobian of a graph $X$ to be $H_1(X,\mathbb{R})/H_1(X,\mathbb{Z})$. Theorem B yields a way to embed a graph, or more precisely its geometric realization $|X|$, into its Jacobian. This is the analogue, for graphs, of the Abel–Jacobi map.
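In coordinates, the graph analogue described here can be sketched as follows. Pick a $\mathbb{Z}$-basis $z_1, \dots, z_g$ of $H_1(X,\mathbb{Z}) = Z_1(X,\mathbb{Z})$. Then $\pi(c) = \sum_k t_k z_k$, where the $t_k$ solve the Gram system $G t = (\langle c, z_k\rangle)_k$, and reducing $t$ mod 1 gives a point of the Jacobian $H_1(X,\mathbb{R})/H_1(X,\mathbb{Z})$. The Python sketch below is our own illustration, not Baker and Norine's formulation, and the cycle basis is chosen by hand. For the graphene graph it sends the vertex $y$ to the class of $\pi(c_\alpha)$ for a path $\alpha: x \to y$. It also checks that different paths give the same point of the torus, as they must, since their 1-chains differ by an integral cycle.

```python
from fractions import Fraction

def solve(matrix, rhs):
    """Solve a small nonsingular linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[r][n] / aug[r][r] for r in range(n)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def abel_jacobi(chain, cycle_basis):
    """Coordinates of pi(chain) in a Z-basis of Z_1(X,Z), reduced mod 1.

    pi(chain) = sum_k t_k z_k with G t = (<chain, z_k>)_k, G the Gram matrix."""
    gram = [[dot(z, w) for w in cycle_basis] for z in cycle_basis]
    t = solve(gram, [dot(chain, z) for z in cycle_basis])
    return [x % 1 for x in t]

# Graphene graph: edges a, b, c from x to y, listed in that order.
# Hypothetical choice of Z-basis for Z_1(X,Z): z1 = a - b, z2 = b - c.
basis = [[1, -1, 0], [0, 1, -1]]
for name, chain in [("a", [1, 0, 0]), ("c", [0, 0, 1]), ("a b^-1 a", [2, -1, 0])]:
    print(name, [str(x) for x in abel_jacobi(chain, basis)])
# all three paths x -> y give ['2/3', '1/3']: one point of the Jacobian torus
```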

After I put this paper on the arXiv, I got an email from Matt Baker saying that he had already proved Theorem A — or to be precise, something that’s clearly equivalent. It’s Theorem 1.8 here:

• Matthew Baker and Serguei Norine, Riemann–Roch and Abel–Jacobi theory on a finite graph.

This says that the vertices of a bridgeless graph $X$ are embedded in its Jacobian by means of the graph-theoretic analogue of the Abel–Jacobi map.

What I really want to know is whether someone’s written up a proof that this map embeds the whole graph, not just its vertices, into its Jacobian in a one-to-one way. That would imply Theorem B. For more on this, try my conversation with David Speyer.

Anyway, there’s a nice connection between topological crystallography and tropical geometry, and not enough communication between the two communities. Once I figure out what the tropical folks have proved, I will revise my paper to take that into account.

Next time I’ll talk about more examples of topological crystals!

## August 27, 2016

### David Hogg — MCMC tutorial and stellar ages

I spent my research time today working on finishing the instruction manual on MCMC that Foreman-Mackey and I are writing as part of my Data Analysis Recipes series. Our goal is to get this posted to arXiv this week. I enjoy writing but learned (once again) in the re-reading that I am not properly critical of my own prose. It really takes an outsider—or a long break—to see what needs to be fixed. The long-term preservation of science is in the hands of scientists, so the writing we do is important! Anyway, enough philosophy; this is pedagogy, not research: I am trying to make this document the most useful thing it can be. I also read and commented on a big new paper by Anna Y. Q. Ho on red giant masses and ages measured in the LAMOST project. Her paper includes an incredible map of the stellar ages as a function of position on the sky, and the different components of the Galaxy are obvious!

### n-Category Café — Jobs at Heriot-Watt

We at the mathematics department at the University of Edinburgh are doing more and more things in conjunction with our sisters and brothers at Heriot–Watt University, also in Edinburgh. For instance, our graduate students take classes together, and about a dozen of them are members of both departments simultaneously. We’re planning to strengthen those links in the years to come.

The news is that Heriot–Watt are hiring.

They’re looking for one or more “pure” mathematicians. These are permanent jobs at any level, from most junior to most senior. There’s significant interest in category theory there, in contexts such as mathematical physics and semigroup theory — e.g. when I taught an introductory category theory course last year, there was a good bunch of participants from Heriot–Watt.

In case you were wondering, Heriot was goldsmith to the royal courts of Scotland and Denmark in the 16th century. Gold $\mapsto$ money $\mapsto$ university, apparently. Watt is the Scottish engineer James Watt, as in “60-watt lightbulb”.

## August 26, 2016

### Terence Tao — An erratum to “Global regularity of wave maps. II. Small energy in two dimensions”

Fifteen years ago, I wrote a paper entitled Global regularity of wave maps. II. Small energy in two dimensions, in which I established global regularity of wave maps from two spatial dimensions to the unit sphere, assuming that the initial data had small energy. Recently, Hao Jia (personal communication) discovered a small gap in the argument that requires a slightly non-trivial fix. The issue does not really affect the subsequent literature, because the main result has since been reproven and extended by methods that avoid the gap (see in particular this subsequent paper of Tataru), but I have decided to describe the gap and its fix on this blog.

I will assume familiarity with the notation of my paper. In Section 10, some complicated spaces ${S[k] = S[k]({\bf R}^{1+n})}$ are constructed for each frequency scale ${k}$, and then a further space ${S(c) = S(c)({\bf R}^{1+n})}$ is constructed for a given frequency envelope ${c}$ by the formula

$\displaystyle \| \phi \|_{S(c)({\bf R}^{1+n})} := \|\phi \|_{L^\infty_t L^\infty_x({\bf R}^{1+n})} + \sup_k c_k^{-1} \| \phi_k \|_{S[k]({\bf R}^{1+n})} \ \ \ \ \ (1)$

where ${\phi_k := P_k \phi}$ is the Littlewood-Paley projection of ${\phi}$ to frequency magnitudes ${\sim 2^k}$. Then, given a spacetime slab ${[-T,T] \times {\bf R}^n}$, we define the restrictions

$\displaystyle \| \phi \|_{S(c)([-T,T] \times {\bf R}^n)} := \inf \{ \| \tilde \phi \|_{S(c)({\bf R}^{1+n})}: \tilde \phi \downharpoonright_{[-T,T] \times {\bf R}^n} = \phi \}$

where the infimum is taken over all extensions ${\tilde \phi}$ of ${\phi}$ to the Minkowski spacetime ${{\bf R}^{1+n}}$; similarly one defines

$\displaystyle \| \phi_k \|_{S[k]([-T,T] \times {\bf R}^n)} := \inf \{ \| \tilde \phi_k \|_{S[k]({\bf R}^{1+n})}: \tilde \phi_k \downharpoonright_{[-T,T] \times {\bf R}^n} = \phi_k \}.$

The gap in the paper is as follows: it was implicitly assumed that one could restrict (1) to the slab ${[-T,T] \times {\bf R}^n}$ to obtain the equality

$\displaystyle \| \phi \|_{S(c)([-T,T] \times {\bf R}^n)} = \|\phi \|_{L^\infty_t L^\infty_x([-T,T] \times {\bf R}^n)} + \sup_k c_k^{-1} \| \phi_k \|_{S[k]([-T,T] \times {\bf R}^n)}.$

(This equality is implicitly used to establish the bound (36) in the paper.) Unfortunately, (1) only gives the lower bound, not the upper bound, and it is the upper bound which is needed here. The problem is that the extensions ${\tilde \phi_k}$ of ${\phi_k}$ that are optimal for computing ${\| \phi_k \|_{S[k]([-T,T] \times {\bf R}^n)}}$ are not necessarily the Littlewood-Paley projections of the extensions ${\tilde \phi}$ of ${\phi}$ that are optimal for computing ${\| \phi \|_{S(c)([-T,T] \times {\bf R}^n)}}$.

To remedy the problem, one has to prove an upper bound of the form

$\displaystyle \| \phi \|_{S(c)([-T,T] \times {\bf R}^n)} \lesssim \|\phi \|_{L^\infty_t L^\infty_x([-T,T] \times {\bf R}^n)} + \sup_k c_k^{-1} \| \phi_k \|_{S[k]([-T,T] \times {\bf R}^n)}$

for all Schwartz ${\phi}$ (actually we need affinely Schwartz ${\phi}$, but one can easily normalise to the Schwartz case). Without loss of generality we may normalise the RHS to be ${1}$. Thus

$\displaystyle \|\phi \|_{L^\infty_t L^\infty_x([-T,T] \times {\bf R}^n)} \leq 1 \ \ \ \ \ (2)$

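$\displaystyle \|P_k \phi \|_{S[k]([-T,T] \times {\bf R}^n)} \leq c_k \ \ \ \ \ (3)$

for each ${k}$, and we need to find an extension ${\tilde \phi}$ of ${\phi}$ to ${{\bf R}^{1+n}}$ such that

$\displaystyle \|\tilde \phi \|_{L^\infty_t L^\infty_x({\bf R}^{1+n})} \lesssim 1 \ \ \ \ \ (4)$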

$\displaystyle \|P_k \tilde \phi \|_{S[k]({\bf R}^{1+n})} \lesssim c_k \ \ \ \ \ (5)$

for each ${k}$. Achieving a ${\tilde \phi}$ that obeys (4) is trivial (just extend ${\phi}$ by zero), but such extensions do not necessarily obey (5). On the other hand, from (3) we can find extensions ${\tilde \phi_k}$ of ${P_k \phi}$ such that

$\displaystyle \|\tilde \phi_k \|_{S[k]({\bf R}^{1+n})} \lesssim c_k; \ \ \ \ \ (6)$

the extension ${\tilde \phi := \sum_k \tilde \phi_k}$ will then obey (5) (here we use Lemma 9 from my paper), but unfortunately is not guaranteed to obey (4) (the ${S[k]}$ norm does control the ${L^\infty_t L^\infty_x}$ norm, but a key point about frequency envelopes for the small energy regularity problem is that the coefficients ${c_k}$, while bounded, are not necessarily summable).

This can be fixed as follows. For each ${k}$ we introduce a time cutoff ${\eta_k}$ supported on ${[-T-2^{-k}, T+2^{-k}]}$ that equals ${1}$ on ${[-T-2^{-k-1},T+2^{-k-1}]}$ and obeys the usual derivative estimates in between (the ${j^{th}}$ time derivative of size ${O_j(2^{jk})}$ for each ${j}$). Later we will prove the truncation estimate

$\displaystyle \| \eta_k \tilde \phi_k \|_{S[k]({\bf R}^{1+n})} \lesssim \| \tilde \phi_k \|_{S[k]({\bf R}^{1+n})}. \ \ \ \ \ (7)$

Assuming this estimate, then if we set ${\tilde \phi := \sum_k \eta_k \tilde \phi_k}$, then using Lemma 9 in my paper and (6), (7) (and the local stability of frequency envelopes) we have the required property (5). (There is a technical issue arising from the fact that ${\tilde \phi}$ is not necessarily Schwartz due to slow decay at temporal infinity, but by considering partial sums in the ${k}$ summation and taking limits we can check that ${\tilde \phi}$ is the strong limit of Schwartz functions, which suffices here; we omit the details for sake of exposition.) So the only issue is to establish (4), that is to say that

$\displaystyle \| \sum_k \eta_k(t) \tilde \phi_k(t) \|_{L^\infty_x({\bf R}^n)} \lesssim 1$

for all ${t \in {\bf R}}$.

For ${t \in [-T,T]}$ this is immediate from (2). Now suppose that ${t \in [T+2^{-k_0-1}, T+2^{-k_0}]}$ for some integer ${k_0}$ (the case when ${t \in [-T-2^{-k_0}, -T-2^{-k_0-1}]}$ is treated similarly). Then we can split

$\displaystyle \sum_k \eta_k(t) \tilde \phi_k(t) = \Phi_1 + \Phi_2 + \Phi_3$

where

$\displaystyle \Phi_1 := \sum_{k < k_0} \tilde \phi_k(T)$

$\displaystyle \Phi_2 := \sum_{k < k_0} \tilde \phi_k(t) - \tilde \phi_k(T)$

$\displaystyle \Phi_3 := \eta_{k_0}(t) \tilde \phi_{k_0}(t).$

The contribution of the ${\Phi_3}$ term is acceptable by (6) and estimate (82) from my paper. The term ${\Phi_1}$ sums to ${P_{<k_0} \phi(T)}$, which is acceptable by (2). So it remains to control the ${L^\infty_x}$ norm of ${\Phi_2}$. By the triangle inequality and the fundamental theorem of calculus, we can bound

$\displaystyle \| \Phi_2 \|_{L^\infty_x} \leq (t-T) \sum_{k < k_0} \| \partial_t \tilde \phi_k \|_{L^\infty_t L^\infty_x({\bf R}^{1+n})}.$

By hypothesis, ${t-T \leq 2^{-k_0}}$. Using the first term in (79) of my paper and Bernstein’s inequality followed by (6) we have

$\displaystyle \| \partial_t \tilde \phi_k \|_{L^\infty_t L^\infty_x({\bf R}^{1+n})} \lesssim 2^k \| \tilde \phi_k \|_{S[k]({\bf R}^{1+n})} \lesssim 2^k;$

and then we are done by summing the geometric series in ${k}$.

It remains to prove the truncation estimate (7). This estimate is similar in spirit to the algebra estimates already in my paper, but unfortunately does not seem to follow immediately from these estimates as written, and so one has to repeat the somewhat lengthy decompositions and case checkings used to prove these estimates. We do this below the fold.

— 1. Proof of truncation estimate —

Firstly, by rescaling (and changing ${T}$ as necessary) we may assume that ${k=0}$. By the triangle inequality and time translation invariance, it suffices to show an estimate of the form

$\displaystyle \| \eta \phi \|_{S[0]} \lesssim \| \phi \|_{S[0]}$

where ${\eta}$ is a smooth time cutoff that equals ${1}$ on ${(-\infty,0]}$ and is supported in ${(-\infty,1]}$, and all norms are understood to be on ${{\bf R}^{1+n}}$. We may normalise the right-hand side to be ${1}$, thus ${\phi}$ is supported in frequencies ${|\xi| \sim 1}$, and by equation (79) of my paper one has the estimates

$\displaystyle \| \nabla_{x,t} \phi \|_{L^\infty_t \dot H^{n/2-1}_x} \lesssim 1 \ \ \ \ \ (8)$

$\displaystyle \| \nabla_{x,t} \phi \|_{\dot X_0^{n/2-1,1/2,\infty}} \lesssim 1 \ \ \ \ \ (9)$

$\displaystyle \sum_{\kappa \in K_l} \| P_{0,\pm \kappa} Q^\pm_{<-2l} \phi \|_{S[0,\kappa]}^2 \lesssim 1 \ \ \ \ \ (10)$

for all ${l>10}$, and we need to show that

$\displaystyle \| \nabla_{x,t}( \eta \phi) \|_{L^\infty_t \dot H^{n/2-1}_x} \lesssim 1 \ \ \ \ \ (11)$

$\displaystyle \| \nabla_{x,t} (\eta \phi) \|_{\dot X_0^{n/2-1,1/2,\infty}} \lesssim 1 \ \ \ \ \ (12)$

$\displaystyle \sum_{\kappa \in K_l} \| P_{0,\pm \kappa} Q^\pm_{<-2l} (\eta \phi) \|_{S[0,\kappa]}^2 \lesssim 1 \ \ \ \ \ (13)$

for all ${l>10}$.

The bound (11) easily follows from (8), the Leibniz rule, and using the frequency localisation of ${\phi}$ to ignore spatial derivatives. Now we turn to (12). From the definition of the ${\dot X_0^{n/2-1,1/2,\infty}}$ norms, we have

$\displaystyle \| Q_j \nabla_{x,t} \phi \|_{L^2_t L^2_x} \lesssim 2^{-j/2} \ \ \ \ \ (14)$

for all integers ${j}$, and we need to show that

$\displaystyle \| Q_j \nabla_{x,t} (\eta \phi) \|_{L^2_t L^2_x} \lesssim 2^{-j/2} \ \ \ \ \ (15)$

for all integers ${j}$.

Fix ${j}$. We can use Littlewood-Paley operators to split ${\eta = \eta_{<j-10} + \eta_{\geq j-10}}$, where ${\eta_{<j-10}}$ is supported on time frequencies ${|\tau| \leq 2^{j-10}}$ and ${\eta_{\geq j-10}}$ is supported on time frequencies ${|\tau| \geq 2^{j-11}}$. For the contribution of ${\eta_{<j-10}}$, one can replace ${\phi}$ in (15) by ${Q_{j-5 < \cdot < j+5} \phi}$ (say), and the claim then follows from (14), the Leibniz rule, and Hölder’s inequality (again ignoring spatial derivatives). For the contribution of ${\eta_{\geq j-10}}$, we discard ${Q_j}$ and observe that ${\eta_{\geq j-10}}$ has an ${L^2_t L^\infty_x}$ norm of ${2^{-j/2}}$ (and its time derivative has an ${L^2_t L^\infty_x}$ norm of ${2^{j/2}}$), so this contribution is then acceptable from (8) and Hölder’s inequality.

Finally we need to show (13). Similarly to before, we split ${\eta = \eta_{<-2l-10} + \eta_{\geq -2l-10}}$. We also split ${\phi = Q^\pm_{<-2l} \phi + Q^\mp_{<-2l} \phi + Q_{\geq -2l} \phi}$, leaving us with the task of proving the four estimates

$\displaystyle \sum_{\kappa \in K_l} \| P_{0,\pm \kappa} Q^\pm_{<-2l} (\eta_{<-2l-10} Q^\pm_{<-2l} \phi) \|_{S[0,\kappa]}^2 \lesssim 1 \ \ \ \ \ (16)$

$\displaystyle \sum_{\kappa \in K_l} \| P_{0,\pm \kappa} Q^\pm_{<-2l} (\eta Q_{\geq -2l} \phi) \|_{S[0,\kappa]}^2 \lesssim 1 \ \ \ \ \ (17)$

$\displaystyle \sum_{\kappa \in K_l} \| P_{0,\pm \kappa} Q^\pm_{<-2l} (\eta_{\geq -2l-10} Q^\pm_{<-2l} \phi) \|_{S[0,\kappa]}^2 \lesssim 1. \ \ \ \ \ (18)$

$\displaystyle \sum_{\kappa \in K_l} \| P_{0,\pm \kappa} Q^\pm_{<-2l} (\eta Q^\mp_{< -2l} \phi) \|_{S[0,\kappa]}^2 \lesssim 1 \ \ \ \ \ (19)$

We begin with (16). The multiplier ${P_{0,\pm \kappa} Q^\pm_{<-2l}}$ is disposable in the sense of the paper, and similarly if one replaces ${P_{0,\pm \kappa}}$ by a slightly larger multiplier; this lets us bound the left-hand side of (16) by

$\displaystyle \sum_{\kappa \in K_l} \| P_{0,\pm \kappa} (\eta_{<-2l-10} Q^\pm_{<-2l} \phi) \|_{S[0,\kappa]}^2 \lesssim 1.$

The time cutoff ${\eta_{<-2l-10}}$ commutes with the spatial Fourier projection ${P_{0,\pm \kappa}}$ and can then be discarded by equation (66) of my paper. This term is thus acceptable thanks to (10).

Now we turn to (17). We can freely insert a factor of ${Q^\pm_{<-2l+5}}$ in front of ${P_{0,\pm \kappa} Q^\pm_{<-2l}}$. Applying estimate (75) from my paper, it then suffices to show that

$\displaystyle \| Q^\pm_{<-2l+5} (\eta Q_{\geq -2l} \phi) \|_{\dot X_0^{n/2,1/2,1}} \lesssim 1.$

From the Fourier support of the expression inside the norm, the left-hand side is bounded by

$\displaystyle \lesssim 2^{-l} \| \eta Q_{\geq -2l} \phi \|_{L^2_t L^2_x};$

discarding the cutoff ${\eta}$ and using (9) we see that this contribution is acceptable.

Next, we show (18). Here we use the energy estimate from equation (27) (and (25)) of the paper. By repeating the proof of (11) (and using Lemma 4 from my paper) we see that

$\displaystyle \| \eta_{\geq -2l-10} Q^\pm_{<-2l} \phi[0] \|_{\dot H^{n/2} \times \dot H^{n/2-1}} \lesssim 1$

so it suffices to show that

$\displaystyle \| \Box( \eta_{\geq -2l-10} Q^\pm_{<-2l} \phi ) \|_{L^1_t L^2_x} \lesssim 1.$

Expanding out the d’Alembertian using the Leibniz rule, we are reduced to showing the estimates

$\displaystyle \| \eta_{\geq -2l-10} \Box Q^\pm_{<-2l} \phi \|_{L^1_t L^2_x} \lesssim 1. \ \ \ \ \ (20)$

$\displaystyle \| \eta'_{\geq -2l-10} \partial_t Q^\pm_{<-2l} \phi \|_{L^1_t L^2_x} \lesssim 1. \ \ \ \ \ (21)$

$\displaystyle \| \eta''_{\geq -2l-10} Q^\pm_{<-2l} \phi \|_{L^1_t L^2_x} \lesssim 1. \ \ \ \ \ (22)$

For (20) we note that ${\eta_{\geq -2l-10}}$ has an ${L^2_t L^\infty_x}$ norm of ${O( 2^l )}$, while from (9) ${\Box Q^\pm_{<-2l} \phi}$ has an ${L^2_t L^2_x}$ norm of ${O(2^{-l})}$, so the claim follows from Hölder’s inequality. For (21) we can similarly observe that ${\eta'_{\geq -2l-10}}$ has an ${L^1_t L^\infty_x}$ norm of ${O(1)}$ while from (8) we see that ${\partial_t Q^\pm_{<-2l} \phi}$ has an ${L^\infty_t L^2_x}$ norm of ${O(1)}$, so the claim again follows from Hölder’s inequality. A similar argument gives (22) (with an additional gain of ${2^{-2l}}$ coming from the second derivative on ${\eta_{\geq -2l-10}}$).
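As a check on the exponents, the Hölder steps behind (20) and (21) pair up as follows. This only restates the bounds quoted in the paragraph above; for (20) the factor containing the d’Alembertian has to be measured in ${L^2_t L^2_x}$ so that the time exponents ${\frac12 + \frac12 = 1}$ and the space exponents ${0 + \frac12 = \frac12}$ match the target ${L^1_t L^2_x}$ norm.

$\displaystyle \| \eta_{\geq -2l-10} \Box Q^\pm_{<-2l} \phi \|_{L^1_t L^2_x} \leq \| \eta_{\geq -2l-10} \|_{L^2_t L^\infty_x} \| \Box Q^\pm_{<-2l} \phi \|_{L^2_t L^2_x} \lesssim 2^{l} \cdot 2^{-l} = 1$

$\displaystyle \| \eta'_{\geq -2l-10} \partial_t Q^\pm_{<-2l} \phi \|_{L^1_t L^2_x} \leq \| \eta'_{\geq -2l-10} \|_{L^1_t L^\infty_x} \| \partial_t Q^\pm_{<-2l} \phi \|_{L^\infty_t L^2_x} \lesssim 1 \cdot 1 = 1$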

Finally, for (19), we observe from the Fourier separation between ${Q^\pm_{<-2l}}$ and ${Q^\mp_{<-2l}}$ that we may replace ${\eta}$ by ${\eta_{\geq -2l-10}}$ (in fact one could do a much more drastic replacement if desired). The claim now follows from repeating the proof of (18).

Filed under: math.AP, update Tagged: Hao Jia, wave maps

### Jordan Ellenberg — Arithmetic progression joke

Why was 7.333… disgusted by 7.666…?

Because 7.666… ate twenty-five turds.

### Particlebites — The Delirium over Beryllium

Article: Particle Physics Models for the 17 MeV Anomaly in Beryllium Nuclear Decays
Authors: J.L. Feng, B. Fornal, I. Galon, S. Gardner, J. Smolinsky, T. M. P. Tait, F. Tanedo
Reference: arXiv:1608.03591 (Submitted to Phys. Rev. D)
Also featuring the results from:
— Gulyás et al., “A pair spectrometer for measuring multipolarities of energetic nuclear transitions” (description of detector; 1504.00489; NIM)
— Krasznahorkay et al., “Observation of Anomalous Internal Pair Creation in 8Be: A Possible Indication of a Light, Neutral Boson” (experimental result; 1504.01527; PRL version; note PRL version differs from arXiv)
— Feng et al., “Protophobic Fifth-Force Interpretation of the Observed Anomaly in 8Be Nuclear Transitions” (phenomenology; 1604.07411; PRL)

Editor’s note: the author is a co-author of the paper being highlighted.

Recently there’s some press (see links below) regarding early hints of a new particle observed in a nuclear physics experiment. In this bite, we’ll summarize the result that has raised the eyebrows of some physicists, and the hackles of others.

## A crash course on nuclear physics

Nuclei are bound states of protons and neutrons. They can have excited states analogous to the excited states of atoms, which are bound states of nuclei and electrons. The particular nucleus of interest is beryllium-8, which has four neutrons and four protons and which you may know from the triple alpha process. There are three nuclear states to be aware of: the ground state, the 18.15 MeV excited state, and the 17.64 MeV excited state.

Beryllium-8 excited nuclear states. The 18.15 MeV state (red) exhibits an anomaly. Both the 18.15 MeV and 17.64 MeV states decay to the ground state through a magnetic, p-wave transition. Image adapted from Savage et al. (1987).

Most of the time the excited states fall apart into a lithium-7 nucleus and a proton. But sometimes, these excited states decay into the beryllium-8 ground state by emitting a photon (γ-ray). Even more rarely, these states can decay to the ground state by emitting an electron–positron pair from a virtual photon: this is called internal pair creation and it is these events that exhibit an anomaly.

## The beryllium-8 anomaly

Physicists at the Atomki nuclear physics institute in Hungary were studying the nuclear decays of excited beryllium-8 nuclei. The team, led by Attila J. Krasznahorkay, produced beryllium excited states by bombarding lithium-7 nuclei with protons.

Beryllium-8 excited states are prepared by bombarding lithium-7 with protons.

The proton beam is tuned to very specific energies so that one can ‘tickle’ specific beryllium excited states. When the protons have around 1.03 MeV of kinetic energy, they excite lithium into the 18.15 MeV beryllium state. This has two important features:

1. Picking the proton energy allows one to only produce a specific excited state so one doesn’t have to worry about contamination from decays of other excited states.
2. Because the 18.15 MeV beryllium nucleus is produced at resonance, one has a very high yield of these excited states. This is very good when looking for very rare decay processes like internal pair creation.
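The link between the proton beam energy and the beryllium level being excited is two-body kinematics. As a rough sketch (not taken from the post), one can assume the non-relativistic relation $E_x \approx S_p + E_p\, m_{\mathrm{Li}}/(m_{\mathrm{Li}} + m_p)$, where $S_p \approx 17.255$ MeV is the energy released when lithium-7 captures a proton and the mass ratio converts the lab-frame proton energy into the centre-of-mass energy. The masses and $S_p$ below are approximate values supplied for illustration.

```python
# Approximate inputs, assumed for illustration (not taken from the post)
S_P = 17.255   # MeV released in 7Li + p -> 8Be (proton separation energy of 8Be)
M_LI7 = 7.016  # lithium-7 mass in atomic mass units
M_P = 1.00728  # proton mass in atomic mass units

def excitation_energy(proton_lab_energy):
    """Approximate 8Be excitation energy (MeV) for a proton of the given lab energy (MeV)."""
    # Only the centre-of-mass share of the beam energy is available to excite the nucleus.
    com_energy = proton_lab_energy * M_LI7 / (M_LI7 + M_P)
    return S_P + com_energy

print(f"{excitation_energy(1.03):.3f} MeV")
```

For a 1.03 MeV beam this gives about 18.16 MeV, within roughly 10 keV of the 18.15 MeV level, which is why that beam energy selects this particular state.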

What one expects is that most of the electron–positron pairs have small opening angles, with the number of pairs decreasing smoothly toward larger opening angles.

Expected distribution of opening angles for ordinary internal pair creation events. Each line corresponds to a nuclear transition that is electric (E) or magnetic (M) with a given orbital quantum number, l. The beryllium transitions that we’re interested in are mostly M1. Adapted from Gulyás et al. (1504.00489).

Instead, the Atomki team found an excess of events with large electron–positron opening angle. In fact, even more intriguing: the excess occurs around a particular opening angle (140 degrees) and forms a bump.

Number of events (dN/dθ) for different electron–positron opening angles and plotted for different excitation energies (Ep). For Ep=1.10 MeV, there is a pronounced bump at 140 degrees which does not appear to be explainable from the ordinary internal pair conversion. This may be suggestive of a new particle. Adapted from Krasznahorkay et al., PRL 116, 042501.

Here’s why a bump is particularly interesting:

1. The distribution of ordinary internal pair creation events is smoothly decreasing and so this is very unlikely to produce a bump.
2. Bumps can be signs of new particles: if there is a new, light particle that can facilitate the decay, one would expect a bump at an opening angle that depends on the new particle mass.

Schematically, the new particle interpretation looks like this:

Schematic of the Atomki experiment and new particle (X) interpretation of the anomalous events. In summary: protons of a specific energy bombard stationary lithium-7 nuclei and excite them to the 18.15 MeV beryllium-8 state. These decay into the beryllium-8 ground state. Some of these decays are mediated by the new X particle, which then decays into electron–positron pairs of a certain opening angle that are detected in the Atomki pair spectrometer detector. Image from 1608.03591.

As an exercise for those with a background in special relativity, one can use the relation $(p_{e^+} + p_{e^-})^2 = m_X^2$ to prove the result:

$m_{X}^2 = \left(1-\left(\frac{E_{e^+}-E_{e^-}}{E_{e^+}+E_{e^-}}\right)^2\right) (E_{e^+}+E_{e^-})^2 \sin^2 \frac{\theta}{2}+\mathcal{O}(m_e^2)$

This relates the mass of the proposed new particle, X, to the opening angle θ and the energies E of the electron and positron. The opening angle bump would then be interpreted as a new particle with mass of roughly 17 MeV. To match the observed number of anomalous events, the rate at which the excited beryllium decays via the X boson must be $6\times 10^{-6}$ times the rate at which it goes into a γ-ray.
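The formula above is easy to check numerically. The sketch below compares the leading-order expression with the exact invariant mass of the pair (keeping $m_e$), for a hypothetical event in which the X carries essentially all of the 18.15 MeV transition energy (nuclear recoil neglected) and shares it equally between the electron and positron at the 140 degree bump angle. The symmetric energy split is an assumption chosen for illustration.

```python
import math

M_E = 0.511  # electron mass in MeV

def x_mass_approx(e_plus, e_minus, theta_deg):
    """Leading-order formula from the text (electron mass neglected)."""
    total = e_plus + e_minus
    y = (e_plus - e_minus) / total  # energy asymmetry of the pair
    s = math.sin(math.radians(theta_deg) / 2)
    return math.sqrt((1 - y * y) * total ** 2 * s ** 2)

def x_mass_exact(e_plus, e_minus, theta_deg):
    """Invariant mass (p_e+ + p_e-)^2 of the pair, keeping the electron mass."""
    p_plus = math.sqrt(e_plus ** 2 - M_E ** 2)
    p_minus = math.sqrt(e_minus ** 2 - M_E ** 2)
    cos_t = math.cos(math.radians(theta_deg))
    m2 = 2 * M_E ** 2 + 2 * (e_plus * e_minus - p_plus * p_minus * cos_t)
    return math.sqrt(m2)

# Hypothetical symmetric event: 18.15 MeV split equally, 140 degree opening angle
print(x_mass_approx(9.075, 9.075, 140), x_mass_exact(9.075, 9.075, 140))
```

Both functions give about 17.06 MeV for this configuration, in line with the roughly 17 MeV interpretation; the $\mathcal{O}(m_e^2)$ correction shifts the result by only a few keV.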

The anomaly has a significance of 6.8σ. This means that it’s highly unlikely to be a statistical fluctuation, as the 750 GeV diphoton bump appears to have been. Indeed, the conservative bet would be some not-understood systematic effect, akin to the 130 GeV Fermi γ-ray line.
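To put 6.8σ in perspective, the sketch below converts a Gaussian significance into a one-sided tail probability using only the standard library. It assumes the quoted significance is a one-sided Gaussian z-score; the post does not say whether it is local or global, or how systematic uncertainties enter.

```python
import math

def one_sided_p_value(z):
    """Probability that a standard normal variable exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

for z in (3.0, 5.0, 6.8):
    print(f"{z} sigma -> p = {one_sided_p_value(z):.2e}")
```

Under that assumption, 6.8σ corresponds to a probability of roughly $5\times 10^{-12}$, far below the conventional 5σ discovery threshold (about $3\times 10^{-7}$). Such a number only quantifies statistical fluctuations; an unidentified systematic effect is not covered by it.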

## The beryllium that cried wolf?

Some physicists are concerned that beryllium may be the ‘boy that cried wolf,’ and point to papers by the late Fokke de Boer as early as 1996 and all the way to 2001. de Boer made strong claims about evidence for a new 10 MeV particle in the internal pair creation decays of the 17.64 MeV beryllium-8 excited state. These claims didn’t pan out, and in fact the instrumentation paper by the Atomki experiment rules out that original anomaly.

The proposed evidence for “de Boeron” is shown below:

The de Boer claim for a 10 MeV new particle. Left: distribution of opening angles for internal pair creation events in an E1 transition of carbon-12. This transition has similar energy splitting to the beryllium-8 17.64 MeV transition and shows good agreement with the expectations, as shown by the flat “signal – background” on the bottom panel. Right: the same analysis for the M1 internal pair creation events from the 17.64 MeV beryllium-8 states. The “signal – background” now shows a broad excess across all opening angles. Adapted from de Boer et al. PLB 368, 235 (1996).

When the Atomki group studied the same 17.64 MeV transition, they found that including a key background component, subdominant E1 decays from nearby excited states, dramatically improved the fit; this component was not included in the original de Boer analysis. This is the last nail in the coffin for the proposed 10 MeV “de Boeron.”

However, the Atomki group also highlights how their new anomaly in the 18.15 MeV state behaves differently. Unlike the broad excess in the de Boer result, the new excess is concentrated in a bump. There is no known way in which additional internal pair creation backgrounds can add a bump to the opening angle distribution; as noted above, all of these distributions are smoothly falling.

The Atomki group goes on to suggest that the new particle appears to fit the bill for a dark photon, a reasonably well-motivated copy of the ordinary photon that differs in its overall interaction strength and has a non-zero (17 MeV?) mass.

## Theory part 1: Not a dark photon

With the Atomki result published and peer reviewed in Physical Review Letters, the game was afoot for theorists to understand how it would fit into a theoretical framework like the dark photon. A group from UC Irvine, University of Kentucky, and UC Riverside found that, actually, dark photons have a hard time fitting the anomaly while remaining consistent with other experimental constraints. In the visual language of this recent ParticleBite, the situation was this:

It turns out that the minimal model of a dark photon cannot simultaneously explain the Atomki beryllium-8 anomaly without running afoul of other experimental constraints. Image adapted from this ParticleBite.

The main reason for this is that a dark photon with mass and interaction strength to fit the beryllium anomaly would necessarily have been seen by the NA48/2 experiment. This experiment looks for dark photons in the decay of neutral pions (π0). These pions typically decay into two photons, but if there’s a 17 MeV dark photon around, some fraction of those decays would go into dark-photon — ordinary-photon pairs. The non-observation of these unique decays rules out the dark photon interpretation.
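The strength of the NA48/2 argument can be seen from the standard expression for a kinetically mixed dark photon $A'$ with mixing parameter $\varepsilon$: $\mathrm{BR}(\pi^0\to\gamma A') = 2\varepsilon^2 (1 - m_{A'}^2/m_{\pi^0}^2)^3\, \mathrm{BR}(\pi^0\to\gamma\gamma)$. This formula and the numerical inputs below are supplied here rather than taken from the post, and the value of $\varepsilon$ is a hypothetical placeholder. For a 17 MeV particle the phase-space factor is close to 1, so kinematics cannot make the branching ratio small.

```python
M_PI0 = 134.977         # neutral pion mass in MeV
BR_GAMMA_GAMMA = 0.988  # approximate pi0 -> gamma gamma branching ratio

def br_pi0_to_gamma_darkphoton(epsilon, m_x):
    """Branching ratio of pi0 -> gamma A' for a kinetically mixed dark photon."""
    if m_x >= M_PI0:
        return 0.0  # kinematically forbidden
    phase_space = (1.0 - (m_x / M_PI0) ** 2) ** 3
    return 2.0 * epsilon ** 2 * phase_space * BR_GAMMA_GAMMA

# Hypothetical mixing parameter, chosen only to illustrate the scaling
eps = 1e-3
print(br_pi0_to_gamma_darkphoton(eps, 16.7))
```

At 16.7 MeV the phase-space factor is about 0.95, so the rate is essentially fixed by $\varepsilon^2$: any dark photon coupled strongly enough to explain the beryllium signal produces a correspondingly large $\pi^0 \to \gamma A'$ rate.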

The theorists then decided to “break” the dark photon theory in order to try to make it fit. They generalized the types of interactions that a new photon-like particle, X, could have, allowing protons, for example, to have completely different charges than electrons rather than having exactly opposite charges. Doing this does gross violence to the theoretical consistency of a theory, but the goal was just to see what a new particle interpretation would have to look like. They found that if a new photon-like particle talked to neutrons but not protons (that is, if the new force were protophobic), then a theory might hold together.

Schematic description of how model-builders “hacked” the dark photon theory to fit the beryllium anomaly while remaining consistent with other experiments. This hack isn’t pretty (and indeed comes at the cost of potentially invalidating the mathematical consistency of the theory), but the exercise demonstrates how a complete theory might have to behave. Image adapted from this ParticleBite.

## Theory appendix: pion-phobia is protophobia

Editor’s note: what follows is for readers with some physics background interested in a technical detail; others may skip this section.

How does a new particle that is allergic to protons avoid the neutral pion decay bounds from NA48/2? Pions decay into pairs of photons through the well-known triangle diagrams of the axial anomaly. The decay into photon–dark-photon pairs proceeds through similar diagrams. The goal is then to make sure that these diagrams cancel.

A cute way to look at this is to assume that at low energies, the relevant particles running in the loop aren’t quarks, but rather nucleons (protons and neutrons). In fact, since only the proton can talk to the photon, one only needs to consider proton loops. Thus if the new photon-like particle, X, doesn’t talk to protons, then there’s no diagram for the pion to decay into γX. This would be great if the story weren’t completely wrong.

Avoiding NA48/2 bounds requires that the new particle, X, is pion-phobic. It turns out that this is equivalent to X being protophobic. The correct way to see this is on the left, making sure that the contribution of up-quark loops cancels the contribution from down-quark loops. A slick (but naively completely wrong) calculation is on the right, arguing that effectively only protons run in the loop.

The correct way of seeing this is to treat the pion as a quantum superposition of an up–anti-up and down–anti-down bound state, and then make sure that the X charges are such that the contributions of the two states cancel. The resulting charges turn out to be protophobic.
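The quark-level statement can be made concrete with a few lines of exact arithmetic. The sketch assumes vector couplings $\varepsilon_u, \varepsilon_d$ of X to up and down quarks (in units of $e$), the quark content $p = uud$ and $n = udd$, and drops all common factors (colour, $f_\pi$, $1/\sqrt2$). Since $\pi^0 \propto u\bar u - d\bar d$, the $\gamma X$ amplitude is proportional to $Q_u\varepsilon_u - Q_d\varepsilon_d = (2\varepsilon_u + \varepsilon_d)/3$, which is one third of the proton coupling.

```python
from fractions import Fraction

Q_U, Q_D = Fraction(2, 3), Fraction(-1, 3)  # electric charges of u and d quarks

def nucleon_couplings(eps_u, eps_d):
    """X couplings of proton (uud) and neutron (udd), adding quark couplings."""
    return 2 * eps_u + eps_d, eps_u + 2 * eps_d

def pion_gamma_x_amplitude(eps_u, eps_d):
    """Relative pi0 -> gamma X amplitude: u-ubar loop minus d-dbar loop."""
    return Q_U * eps_u - Q_D * eps_d

# Dark photon: X couples like the photon, eps_q = eps * Q_q (here eps = 1)
dark = (Q_U, Q_D)
# Protophobic choice: eps_d = -2 eps_u makes the proton coupling vanish
phobic = (Fraction(1), Fraction(-2))

for name, (eu, ed) in [("dark photon", dark), ("protophobic", phobic)]:
    ep, en = nucleon_couplings(eu, ed)
    amp = pion_gamma_x_amplitude(eu, ed)
    print(name, "eps_p =", ep, "eps_n =", en, "amplitude =", amp)
```

The dark photon gives $\varepsilon_p = 1$, $\varepsilon_n = 0$ and a non-zero amplitude, while the protophobic choice gives $\varepsilon_p = 0$ and an amplitude of exactly zero: being pion-phobic and being protophobic are the same condition.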

The fact that the “proton-in-the-loop” picture gives the correct charges, however, is no coincidence. Indeed, this was precisely how Jack Steinberger calculated the correct pion decay rate. The key here is whether one treats the quarks/nucleons linearly or non-linearly in chiral perturbation theory. The relation to the Wess-Zumino-Witten term—which is what really encodes the low-energy interaction—is carefully explained in chapter 6a.2 of Georgi’s revised Weak Interactions.

## Theory part 2: Not a spin-0 particle

The above considerations focus on a new particle with the same spin and parity as a photon (spin-1, parity odd). Another result of the UCI study was a systematic exploration of other possibilities. They found that the beryllium anomaly could not be consistent with spin-0 particles. For a parity-even, spin-0 particle, one cannot simultaneously conserve angular momentum and parity in the decay of the excited beryllium-8 state. (Parity violating effects are negligible at these energies.)

Parity and angular momentum conservation prohibit a “dark Higgs” (parity even scalar) from mediating the anomaly.
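The selection rule can be checked by brute force. The sketch below assumes the quantum numbers $J^P = 1^+$ for the 18.15 MeV state and $0^+$ for the ground state. For a massive X of spin $s$ and intrinsic parity $P_X$, it lists the orbital angular momenta $L$ that conserve angular momentum ($s$ and $L$ must couple to total $J = 1$) and parity ($+1 = P_X (-1)^L$).

```python
def allowed_orbital_l(spin_x, parity_x, j_initial=1, parity_initial=+1, l_max=4):
    """Orbital L values for 1+ -> 0+ + X that conserve J and parity.

    The final nucleus has J = 0, so the total final angular momentum comes
    from coupling the X spin to L: |s - L| <= J <= s + L.
    """
    allowed = []
    for l in range(l_max + 1):
        couples = abs(spin_x - l) <= j_initial <= spin_x + l
        parity_ok = parity_initial == parity_x * (-1) ** l  # ground-state parity is +1
        if couples and parity_ok:
            allowed.append(l)
    return allowed

cases = {
    "scalar (0+, 'dark Higgs')": (0, +1),
    "pseudoscalar (0-)": (0, -1),
    "vector (1-)": (1, -1),
    "axial vector (1+)": (1, +1),
}
for name, (s, p) in cases.items():
    print(name, allowed_orbital_l(s, p))
```

The scalar list comes out empty, matching the caption; the pseudoscalar is allowed by these rules (in a p-wave) and is instead excluded by the axion-like-particle bounds discussed next.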

For a parity-odd pseudoscalar, the bounds on axion-like particles at 20 MeV suffocate any reasonable coupling. Measured in terms of the pseudoscalar–photon–photon coupling (which has dimensions of inverse GeV), this interaction is ruled out down to the inverse Planck scale.

Bounds on axion-like particles exclude a 20 MeV pseudoscalar with couplings to photons stronger than the inverse Planck scale. Adapted from 1205.2671 and 1512.03069.

• Dark Z bosons, cousins of the dark photon with spin-1 but indeterminate parity. These are very constrained by atomic parity violation.
• Axial vectors, spin-1 bosons with positive parity. These remain a theoretical possibility, though their unknown nuclear matrix elements make it difficult to write a predictive model. (See section II.D of 1608.03591.)

## Theory part 3: Nuclear input

The plot thickens when one also includes results from nuclear theory. Recent results from Saori Pastore, Bob Wiringa, and collaborators point out a very important fact: the 18.15 MeV beryllium-8 state that exhibits the anomaly and the 17.64 MeV state which does not are actually closely related.

Recall (e.g. from the first figure at the top) that the 18.15 MeV and 17.64 MeV states are both spin-1 and parity-even. They differ in mass and in one other key aspect: the 17.64 MeV state carries isospin charge, while the 18.15 MeV state and ground state do not.

Isospin is the nuclear symmetry that relates protons to neutrons and is tied to electroweak symmetry in the full Standard Model. At nuclear energies, isospin charge is approximately conserved. This brings us to the following puzzle:

If the new particle has mass around 17 MeV, why do we see its effects in the 18.15 MeV state but not the 17.64 MeV state?

Naively, if the emitted new particle, X, carries no isospin charge, then isospin conservation prohibits the decay of the 17.64 MeV state through emission of an X boson. However, the Pastore et al. result tells us that, actually, the isospin-neutral and isospin-charged states mix quantum mechanically, so that the observed 18.15 and 17.64 MeV states are mixtures of iso-neutral and iso-charged states. In fact, this mixing is rather large, with a mixing angle of around 10 degrees!

The result of this is that one cannot invoke isospin conservation to explain the non-observation of an anomaly in the 17.64 MeV state. In fact, the only way to avoid this is to assume that the mass of the X particle is on the heavier side of the experimentally allowed range. The rate for emission goes like the 3-momentum cubed (see section II.E of 1608.03591), so a small increase in the mass can suppress the rate of emission by the lighter state by a lot.
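The momentum-cubed scaling can be made quantitative with a short sketch. It keeps only the phase-space factor $|\vec p_X|^3$ quoted above, neglects nuclear recoil, and takes the transition energies to be 18.15 and 17.64 MeV. Any difference in nuclear matrix elements (including the isospin mixing just described) would multiply this ratio and is not modelled here.

```python
import math

E_HIGH = 18.15  # MeV, state showing the anomaly
E_LOW = 17.64   # MeV, state without a visible anomaly

def x_momentum(transition_energy, m_x):
    """|p_X| for emission of X with mass m_x (nuclear recoil neglected)."""
    if transition_energy <= m_x:
        return 0.0  # kinematically forbidden
    return math.sqrt(transition_energy ** 2 - m_x ** 2)

def phase_space_ratio(m_x):
    """(p_low / p_high)^3: how much the 17.64 MeV emission is suppressed."""
    return (x_momentum(E_LOW, m_x) / x_momentum(E_HIGH, m_x)) ** 3

for m in (16.7, 17.0, 17.3, 17.6):
    print(f"m_X = {m} MeV -> ratio = {phase_space_ratio(m):.3f}")
```

Going from 16.7 to 17.6 MeV reduces the relative phase-space factor from about 0.5 to about 0.02, which is why a modest shift toward the heavier end of the allowed range helps explain the absence of a 17.64 MeV signal.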

The UCI collaboration of theorists went further and extended the Pastore et al. analysis to include a phenomenological parameterization of explicit isospin violation. Independent of the Atomki anomaly, they found that including isospin violation improved the fit for the 18.15 MeV and 17.64 MeV electromagnetic decay widths within the Pastore et al. formalism. The results of including all of the isospin effects end up changing the particle physics story of the Atomki anomaly significantly:

The rate of X emission (colored contours) as a function of the X particle’s couplings to protons (horizontal axis) versus neutrons (vertical axis). The best fit for a 16.7 MeV new particle is the dashed line in the teal region. The vertical band is the region allowed by the NA48/2 experiment. Solid lines show the dark photon and protophobic limits. Left: the case for perfect (unrealistic) isospin. Right: the case when isospin mixing and explicit violation are included. Observe that incorporating realistic isospin happens to have only a modest effect in the protophobic region. Figure from 1608.03591.

The results of the nuclear analysis are thus that:

1. An interpretation of the Atomki anomaly in terms of a new particle tends to push for a slightly heavier X mass than the reported best fit. (Remark: the Atomki paper does not do a combined fit for the mass and coupling nor does it report the difficult-to-quantify systematic errors associated with the fit. This information is important for understanding the extent to which the X mass can be pushed to be heavier.)
2. The effects of isospin mixing and violation are important to include, especially as one drifts away from the purely protophobic limit.

## Theory part 4: towards a complete theory

The theoretical structure presented above gives a framework to do phenomenology: fitting the observed anomaly to a particle physics model and then comparing that model to other experiments. This, however, doesn’t guarantee that a nice—or even self-consistent—theory exists that can stretch over the scaffolding.

Indeed, a few challenges appear:

• The isospin mixing discussed above means the X mass must be pushed to the heavier values allowed by the Atomki observation.
• The “protophobic” limit is not obviously anomaly-free: simply asserting that known particles have arbitrary charges does not generically produce a mathematically self-consistent theory.
• Atomic parity violation constraints require that the X couple in the same way to left-handed and right-handed matter. The left-handed coupling implies that X must also talk to neutrinos, which opens up new experimental constraints.

The Irvine/Kentucky/Riverside collaboration first note the need for a careful experimental analysis of the actual mass ranges allowed by the Atomki observation, treating the new particle mass and coupling as simultaneously free parameters in the fit.

Next, they observe that protophobic couplings can be relatively natural. Indeed: the Standard Model Z boson is approximately protophobic at low energies—a fact well known to those hunting for dark matter with direct detection experiments. For exotic new physics, one can engineer protophobia through a phenomenon called kinetic mixing where two force particles mix into one another. A tuned admixture of electric charge and baryon number, (Q-B), is protophobic.
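Both statements in this paragraph can be checked with simple charge counting. The sketch below uses the tree-level convention $g_V = T_3 - 2Q\sin^2\theta_W$ for the Z vector coupling with an approximate value $\sin^2\theta_W \approx 0.231$ (an assumption; the low-energy value is slightly larger), and adds quark couplings for $p = uud$ and $n = udd$. The same counting shows that the combination $Q - B$ assigns charge 0 to the proton and $-1$ to the neutron.

```python
SIN2_THETA_W = 0.231  # approximate weak mixing angle (assumed value)

def z_vector_coupling(t3, charge, s2w=SIN2_THETA_W):
    """Tree-level Z vector coupling, convention g_V = T3 - 2 Q sin^2(theta_W)."""
    return t3 - 2 * charge * s2w

g_u = z_vector_coupling(+0.5, 2 / 3)
g_d = z_vector_coupling(-0.5, -1 / 3)
g_proton = 2 * g_u + g_d   # uud: equals 1/2 - 2 sin^2(theta_W)
g_neutron = g_u + 2 * g_d  # udd: equals -1/2 exactly

# Electric charge Q and baryon number B of the nucleons
nucleons = {"proton": (1, 1), "neutron": (0, 1)}
q_minus_b = {name: q - b for name, (q, b) in nucleons.items()}

print(f"Z: g_p = {g_proton:.3f}, g_n = {g_neutron:.3f}")
print("Q - B charges:", q_minus_b)
```

The proton’s Z vector coupling comes out more than ten times smaller than the neutron’s, which is the sense in which the Z is approximately protophobic; the $Q - B$ combination makes the cancellation exact.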

Baryon number, however, is an anomalous global symmetry—this means that one has to work hard to make a baryon-boson that mixes with the photon (see 1304.0576 and 1409.8165 for examples). Another alternative is if the photon kinetically mixes with not baryon number, but the anomaly-free combination of “baryon-minus-lepton number,” Q-(B-L). This then forces one to apply additional model-building modules to deal with the neutrino interactions that come along with this scenario.

In the language of the ‘model building blocks’ above, the result of this process looks schematically like this:

A complete theory is completely mathematically self-consistent and satisfies existing constraints. The additional bells and whistles required for consistency make additional predictions for experimental searches. Pieces of the theory can sometimes be used to address other anomalies.

The theory collaboration presented examples of the two cases, and pointed out how the additional ‘bells and whistles’ required may tie to additional experimental handles to test these hypotheses. These are simple existence proofs for how complete models may be constructed.

## What’s next?

We have delved rather deeply into the theoretical considerations of the Atomki anomaly. The analysis revealed some unexpected features with the types of new particles that could explain the anomaly (dark photon-like, but not exactly a dark photon), the role of nuclear effects (isospin mixing and breaking), and the kinds of features a complete theory needs to have to fit everything (be careful with anomalies and neutrinos). The single most important next step, however, is and has always been experimental verification of the result.

While the Atomki experiment continues to run with an upgraded detector, what’s really exciting is that a swath of experiments that are either ongoing or in construction will be able to probe the exact interactions required by the new particle interpretation of the anomaly. This means that the result can be independently verified or excluded within a few years. A selection of upcoming experiments is highlighted in section IX of 1608.03591:

Other experiments that can probe the new particle interpretation of the Atomki anomaly. The horizontal axis is the new particle mass, the vertical axis is its coupling to electrons (normalized to the electric charge). The dark blue band is the target region for the Atomki anomaly. Figure from 1608.03591; assuming 100% branching ratio to electrons.

We highlight one particularly interesting search: recently a joint team of theorists and experimentalists at MIT proposed a way for the LHCb experiment to search for dark photon-like particles with masses and interaction strengths that were previously unexplored. The proposal makes use of LHCb’s ability to pinpoint the production position of charged particle pairs and the copious amounts of D mesons produced at Run 3 of the LHC. As seen in the figure above, the LHCb reach with this search thoroughly covers the Atomki anomaly region.

## Implications

So where we stand is this:

• There is an unexpected result in a nuclear experiment that may be interpreted as a sign for new physics.
• The next steps in this story are independent experimental cross-checks; the threshold for a ‘discovery’ is if another experiment can verify these results.
• Meanwhile, a theoretical framework for understanding the results in terms of a new particle has been built and is ready-and-waiting. Some of the results of this analysis are important for faithful interpretation of the experimental results.

What if it’s nothing?

This is the conservative take—and indeed, we may well find that in a few years, the possibility that Atomki was observing a new particle will be completely dead. Or perhaps a source of systematic error will be identified and the bump will go away. That’s part of doing science.

Meanwhile, there are some important take-aways in this scenario. First is the reminder that the search for light, weakly coupled particles is an important frontier in particle physics. Second, for this particular anomaly, there are some neat take-aways, such as a demonstration of how effective field theory can be applied to nuclear physics (see e.g. chapter 3.1.2 of the new book by Petrov and Blechman) and how tweaking our models of new particles can avoid troublesome experimental bounds. Finally, it’s a nice example of how particle physics and nuclear physics are not-too-distant cousins and how progress can be made in particle–nuclear collaborations: one of the Irvine group authors (Susan Gardner) is a bona fide nuclear theorist who was on sabbatical from the University of Kentucky.

What if it’s real?

This is a big “what if.” On the other hand, a 6.8σ effect is not a statistical fluctuation and there is no known nuclear physics to produce a new-particle-like bump given the analysis presented by the Atomki experimentalists.

The threshold for “real” is independent verification. If other experiments can confirm the anomaly, then this could be a huge step in our quest to go beyond the Standard Model. While this type of particle is unlikely to help with the Hierarchy problem of the Higgs mass, it could be a sign for other kinds of new physics. One example is the grand unification of the electroweak and strong forces; some of the ways in which these forces unify imply the existence of an additional force particle that may be light and may even have the types of couplings suggested by the anomaly.

Could it be related to other anomalies?

The Atomki anomaly isn’t the first particle physics curiosity to show up at the MeV scale. While none of these other anomalies are necessarily related to the type of particle required for the Atomki result (they may not even be compatible!), it is helpful to remember that the MeV scale may still have surprises in store for us.

• The KTeV anomaly: The rate at which neutral pions decay into electron–positron pairs appears to be off from the expectations based on chiral perturbation theory. In 0712.0007, a group of theorists found that this discrepancy could be fit to a new particle with axial couplings. If one fixes the mass of the proposed particle to be 20 MeV, the resulting couplings happen to be in the same ballpark as those required for the Atomki anomaly. The important caveat here is that parameters for an axial vector to fit the Atomki anomaly are unknown, and mixed vector–axial states are severely constrained by atomic parity violation.

The KTeV anomaly interpreted as a new particle, U. From 0712.0007.

• The anomalous magnetic moment of the muon and the cosmic lithium problem: much of the progress in the field of light, weakly coupled forces comes from Maxim Pospelov. The anomalous magnetic moment of the muon, (g-2)μ, has a long-standing discrepancy from the Standard Model (see e.g. this blog post). While this may come from an error in the very, very intricate calculation and the subtle ways in which experimental data feed into it, Pospelov (and also Fayet) noted that the shift may come from a light (in the 10s of MeV range!), weakly coupled new particle like a dark photon. Similarly, Pospelov and collaborators showed that a new light particle in the 1-20 MeV range may help explain another longstanding mystery: the surprising lack of lithium in the universe (APS Physics synopsis).

Could it be related to dark matter?

A lot of recent progress in dark matter has revolved around the possibility that in addition to dark matter, there may be additional light particles that mediate interactions between dark matter and the Standard Model. If these particles are light enough, they can change the way that we expect to find dark matter in sometimes surprising ways. One interesting avenue is called self-interacting dark matter and is based on the observation that these light force carriers can deform the dark matter distribution in galaxies in ways that seem to fit astronomical observations. A 20 MeV dark photon-like particle even fits the profile of what’s required by the self-interacting dark matter paradigm, though it is very difficult to make such a particle consistent with both the Atomki anomaly and the constraints from direct detection.

Should I be excited?

Given all of the caveats listed above, some feel that it is too early to be in “drop everything, this is new physics” mode. Others may take this as a hint that’s worth exploring further, as has been done for many anomalies in the recent past. For researchers, it is prudent to be cautious, and it is paramount to be careful; but so long as one does both, then being excited about a new possibility is part of what makes our job fun.

For the general public, the tentative hopes of new physics that pop up (whether it’s the Atomki anomaly, or the 750 GeV diphoton bump, a GeV bump from the galactic center, γ-ray lines at 3.5 keV and 130 GeV, or penguins at LHCb) are the signs that we’re making use of all of the data available to search for new physics. Sometimes these hopes fizzle away; often they leave behind useful lessons about physics and directions forward. Maybe one of these days an anomaly will stick and show us the way forward.

Here is some of the popular-level press coverage of the Atomki result. See the references at the top of this ParticleBite for the primary literature.

### David Hogg — new space!

In a low-research day, I got my first view of the new location of the NYU Center for Data Science, in the newly renovated building at 60 Fifth Ave. The space is a mix of permanent, hoteling, and studio space for faculty, researchers, staff, and students, designed to meet very diverse needs and wants. It is cool! I also discussed briefly with Daniela Huppenkothen (NYU) the scope of her first paper on the states of GRS 1915, the black-hole source with extremely complex x-ray timing characteristics.

### n-Category Café — Monoidal Categories with Projections

Monoidal categories are often introduced as an abstraction of categories with products. Instead of having the categorical product $\times$, we have some other product $\otimes$, and it’s required to behave in a somewhat product-like way.

But you could try to abstract more of the structure of a category with products than monoidal categories do. After all, when a category has products, it also comes with special maps $X \times Y \to X$ and $X \times Y \to Y$ for every $X$ and $Y$ (the projections). Abstracting this leads to the notion of “monoidal category with projections”.

I’m writing this because over at this thread on magnitude homology, we’re making heavy use of semicartesian monoidal categories. These are simply monoidal categories whose unit object is terminal. But the word “semicartesian” is repellently technical, and you’d be forgiven for believing that any mathematics using “semicartesian” anythings is bound to be going about things the wrong way. Name aside, you might simply think it’s rather ad hoc; the nLab article says it initially sounds like centipede mathematics.

I don’t know whether semicartesian monoidal categories are truly necessary to the development of magnitude homology. But I do know that they’re a more reasonable and less ad hoc concept than they might seem, because:

Theorem   A semicartesian monoidal category is the same thing as a monoidal category with projections.

So if you believe that “monoidal category with projections” is a reasonable or natural concept, you’re forced to believe the same about semicartesian monoidal categories.

I’m going to keep this post light and sketchy. A monoidal category with projections is a monoidal category $V = (V, \otimes, I)$ together with a distinguished pair of maps

$\pi^1_{X, Y} \colon X \otimes Y \to X, \qquad \pi^2_{X, Y} \colon X \otimes Y \to Y$

for each pair of objects $X$ and $Y$. We might call these “projections”. The projections are required to satisfy whatever equations they satisfy when $\otimes$ is categorical product $\times$ and the unit object $I$ is terminal. For instance, if you have three objects $X$, $Y$ and $Z$, then I can think of two ways to build a “projection” map $X \otimes Y \otimes Z \to X$:

• think of $X \otimes Y \otimes Z$ as $X \otimes (Y \otimes Z)$ and take $\pi^1_{X, Y \otimes Z}$; or

• think of $X \otimes Y \otimes Z$ as $(X \otimes Y) \otimes Z$, use $\pi^1_{X \otimes Y, Z}$ to project down to $X \otimes Y$, then use $\pi^1_{X, Y}$ to project from there to $X$.

One of the axioms for a monoidal category with projections is that these two maps are equal. You can guess the others.

A monoidal category is said to be cartesian if its monoidal structure is given by the categorical (“cartesian”) product. So, any cartesian monoidal category becomes a monoidal category with projections in an obvious way: take the projections $\pi^i_{X, Y}$ to be the usual product-projections.

That’s the motivating example of a monoidal category with projections, but there are others. For instance, take the ordered set $(\mathbb{N}, \geq)$, and view it as a category in the usual way but with a reversal of direction: there’s one object for each natural number $n$, and there’s a map $n \to m$ iff $n \geq m$. It’s monoidal under addition, with $0$ as the unit. Since $m + n \geq m$ and $m + n \geq n$ for all $m$ and $n$, we have maps $m + n \to m$ and $m + n \to n$.

In this way, $(\mathbb{N}, \geq)$ is a monoidal category with projections. But it’s not cartesian, since the categorical product of $m$ and $n$ in $(\mathbb{N}, \geq)$ is $\max\{m, n\}$, not $m + n$.

Now, a monoidal category $(V, \otimes, I)$ is semicartesian if the unit object $I$ is terminal. Again, any cartesian monoidal category gives an example, but this isn’t the only kind of example. And again, the ordered set $(\mathbb{N}, \geq)$ demonstrates this: with the monoidal structure just described, $0$ is the unit object, and it’s terminal.
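The $(\mathbb{N}, \geq)$ example can be checked mechanically on a finite range of objects. Because it is a preorder, any two parallel maps are equal, so coherence conditions such as the one comparing the two maps $X \otimes Y \otimes Z \to X$ hold automatically; the sketch below therefore only tests existence of maps. It is a minimal illustration under that finite-range assumption, and the helper names (`has_map`, `tensor`, `is_product`, `check`) are our own.

```python
# Sketch: the poset (N, >=) viewed as a category, checked on a finite range.
# Hypothetical helper names; objects are restricted to range(bound).

def has_map(n: int, m: int) -> bool:
    """There is exactly one map n -> m when n >= m, and none otherwise."""
    return n >= m

def tensor(m: int, n: int) -> int:
    """Monoidal product: addition, with unit object 0."""
    return m + n

UNIT = 0

def is_product(p: int, m: int, n: int, bound: int) -> bool:
    """Check, for test objects q < bound, that p is a categorical product of m and n."""
    if not (has_map(p, m) and has_map(p, n)):
        return False
    # Universal property: any q with maps to m and n has a map to p.
    return all(has_map(q, p) for q in range(bound) if has_map(q, m) and has_map(q, n))

def check(bound: int = 20) -> None:
    for m in range(bound):
        # The unit 0 is terminal, so the monoidal category is semicartesian.
        assert has_map(m, UNIT)
        for n in range(bound):
            s = tensor(m, n)
            # Projections m + n -> m and m + n -> n exist.
            assert has_map(s, m) and has_map(s, n)
            # The categorical product is max(m, n) ...
            assert is_product(max(m, n), m, n, 3 * bound)
            # ... which differs from m + n whenever both are non-zero.
            if m > 0 and n > 0:
                assert not is_product(s, m, n, 3 * bound)
    print("ok")

check()
```

Running it prints `ok`: the same structure is semicartesian and has projections, while its monoidal product is not the categorical one.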

The point of this post is:

Theorem   A semicartesian monoidal category is the same thing as a monoidal category with projections.

I’ll state it no more precisely than that. I don’t know who this result is due to; the nLab page on semicartesian monoidal categories suggests it might be Eilenberg and Kelly, but I learned it from a Part III problem sheet of Peter Johnstone.

The proof goes roughly like this.

Start with a semicartesian monoidal category $V$. To build a monoidal category with projections, we have to define, for each $X$ and $Y$, a projection map $X \otimes Y \to X$ (and similarly for $Y$). Now, since $I$ is terminal, we have a unique map $!\colon Y \to I$. Tensoring with $X$ gives a map $X \otimes Y \to X \otimes I$. But $X \otimes I \cong X$, so we’re done. That is, $\pi^1_{X, Y}$ is the composite

$X \otimes Y \stackrel{X \otimes !}{\longrightarrow} X \otimes I \cong X.$

After a few checks, we see that this makes $V$ into a monoidal category with projections.

In the other direction, start with a monoidal category $V$ with projections. We need to show that $V$ is semicartesian. In other words, we have to prove that for each object $X$, there is exactly one map $X \to I$. There’s at least one, because we have

$X \cong X \otimes I \stackrel{\pi^2_{X, I}}{\longrightarrow} I.$

I’ll skip the proof that there’s at most one, but it uses the axiom that the projections are natural transformations. (I didn’t mention that axiom, but of course it’s there.)

So we now have a way of turning a semicartesian monoidal category into a monoidal category with projections and vice versa. To finish the proof of the theorem, we have to show that these two processes are mutually inverse. That’s straightforward.

Here’s something funny about all this. A monoidal category with projections appears to be a monoidal category with extra structure, whereas a semicartesian monoidal category is a monoidal category with a certain property. The theorem tells us that in fact, there’s at most one possible way to equip a monoidal category with projections (and there is a way if and only if $I$ is terminal). So having projections turns out to be a property, not structure.

And that is my defence of semicartesian monoidal categories.

### Particlebites — Jets aren’t just a game of tag anymore

Article: Probing Quarkonium Production Mechanisms with Jet Substructure
Authors: Matthew Baumgart, Adam Leibovich, Thomas Mehen, and Ira Rothstein
Reference: arXiv:1406.2295 [hep-ph]

“Tag…you’re it!” is a popular game to play with jets these days at particle accelerators like the LHC. These collimated sprays of radiation are common in various types of high-energy collisions and can present a nasty challenge to both theorists and experimentalists (for more on the basic ideas and importance of jet physics, see my July bite on the subject). The process of tagging a jet generally means identifying the type of particle that initiated the jet. Since jets provide a significant contribution to backgrounds at high energy colliders, identifying where they come from can make doing things like discovering new particles much easier. While identifying backgrounds to new physics is important, in this bite I want to focus on how theorists are now using jets to study the production of hadrons in a unique way.

Over the years, a host of theoretical tools have been developed for making the study of jets tractable. The key steps of “reconstructing” jets are:

1. Choose a jet algorithm (i.e. basically pick a metric that decides which particles it thinks are “clustered”),
2. Identify potential jet axes (i.e. the centers of the jets),
3. Decide which particles are in/out of the jets based on your jet algorithm.
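To make the three steps concrete, here is a toy sequential-recombination clusterer. The choice of the anti-$k_t$ distance measure is ours (the post does not name an algorithm), and the sketch assumes massless particles given as $(p_T, y, \phi)$ and uses a simplified $p_T$-weighted recombination rather than full four-vector addition; real analyses use a dedicated package such as FastJet. Step 1 is the distance definition, step 2 the $(y, \phi)$ of each final jet (its axis), and step 3 the `members` bookkeeping.

```python
import math
from dataclasses import dataclass

@dataclass
class PseudoJet:
    pt: float       # transverse momentum (must be > 0)
    y: float        # rapidity
    phi: float      # azimuthal angle in radians
    members: tuple  # indices of the input particles clustered into this object

def wrap(dphi: float) -> float:
    """Map an azimuthal difference into [-pi, pi)."""
    return (dphi + math.pi) % (2 * math.pi) - math.pi

def delta_r2(a: PseudoJet, b: PseudoJet) -> float:
    return (a.y - b.y) ** 2 + wrap(a.phi - b.phi) ** 2

def merge(a: PseudoJet, b: PseudoJet) -> PseudoJet:
    # Simplified pt-weighted recombination (not full four-vector addition).
    pt = a.pt + b.pt
    y = (a.pt * a.y + b.pt * b.y) / pt
    phi = wrap(a.phi + b.pt * wrap(b.phi - a.phi) / pt)
    return PseudoJet(pt, y, phi, a.members + b.members)

def cluster(particles, R=0.4, p=-1):
    """Sequential recombination with the generalized-kt distance; p = -1 is anti-kt.

    particles: list of (pt, y, phi) tuples. Returns jets sorted by decreasing pt.
    """
    active = [PseudoJet(pt, y, phi, (i,)) for i, (pt, y, phi) in enumerate(particles)]
    jets = []
    while active:
        # Step 1: find the smallest beam distance d_iB or pairwise distance d_ij.
        best = None  # (distance, i, j), with j = None meaning "closest to the beam"
        for i, a in enumerate(active):
            d_ib = a.pt ** (2 * p)
            if best is None or d_ib < best[0]:
                best = (d_ib, i, None)
            for j in range(i + 1, len(active)):
                b = active[j]
                d_ij = min(a.pt ** (2 * p), b.pt ** (2 * p)) * delta_r2(a, b) / R ** 2
                if d_ij < best[0]:
                    best = (d_ij, i, j)
        _, i, j = best
        if j is None:
            # Steps 2-3: this object is a final jet; (y, phi) is its axis and
            # `members` records which particles ended up inside it.
            jets.append(active.pop(i))
        else:
            merged = merge(active[i], active[j])
            active.pop(j)  # j > i, so remove j first to keep index i valid
            active.pop(i)
            active.append(merged)
    return sorted(jets, key=lambda jet: jet.pt, reverse=True)
```

For example, `cluster([(100.0, 0.0, 0.0), (20.0, 0.1, 0.05), (80.0, 2.0, 3.0), (10.0, 2.1, 3.1)])` returns two jets, with $p_T$ 120 and 90, built from particles {0, 1} and {2, 3} respectively.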

Figure 1: A basic 3-jet event where one of the reconstructed jets is found to have been initiated by a b quark. The process of finding such jets is called “tagging.”

Deciphering the particle content of a jet can often help to uncover what particle initiated the jet. While this is often enough for many analyses, one can ask the next obvious question: how are the momenta of the particles within the jet distributed? In other words, what does the inner geometry of the jet look like?

There are a number of observables that one can look at to study a jet’s geometry. These are generally referred to as jet substructure observables. Two basic examples are:

1. Jet-shape: This takes a jet of radius R and then identifies a sub-jet within it of radius r. By measuring the energy fraction contained within sub-jets of variable radius r, one can study where the majority of the jet’s energy/momentum is concentrated.
2. Jet mass: By measuring the invariant mass of all of the particles in a jet (while simultaneously considering the jet’s energy and radius) one can gain insight into how focused a jet is.
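Both observables reduce to short computations on the jet constituents. The sketch below assumes natural units ($c = 1$), four-momenta given as `(E, px, py, pz)` for the mass, and constituents given as `(E, y, phi)` with a known jet axis for the integrated jet shape; the function names are illustrative, not taken from the paper.

```python
import math

def jet_mass(four_momenta):
    """Invariant mass of a set of particles, each given as (E, px, py, pz) with c = 1."""
    E = sum(p[0] for p in four_momenta)
    px = sum(p[1] for p in four_momenta)
    py = sum(p[2] for p in four_momenta)
    pz = sum(p[3] for p in four_momenta)
    m2 = E ** 2 - (px ** 2 + py ** 2 + pz ** 2)
    return math.sqrt(max(m2, 0.0))  # clip tiny negative values caused by rounding

def integrated_jet_shape(constituents, axis, r):
    """Fraction of the jet's energy within angular distance r of the jet axis.

    constituents: list of (E, y, phi); axis: (y, phi) of the jet axis.
    """
    y0, phi0 = axis

    def distance(y, phi):
        dphi = (phi - phi0 + math.pi) % (2 * math.pi) - math.pi
        return math.hypot(y - y0, dphi)

    total = sum(e for e, _, _ in constituents)
    inside = sum(e for e, y, phi in constituents if distance(y, phi) < r)
    return inside / total
```

Scanning `r` from 0 up to the jet radius $R$ traces out how concentrated the jet's energy is around its axis.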

Figure 2: A basic way to produce quarkonium via the fragmentation of a gluon. The interactions highlighted in blue are calculated using standard perturbative QCD. The green zone is where things get tricky and non-perturbative models that are extracted from data must often be used.

One way in which phenomenologists are utilizing jet substructure technology is in the study of hadron production. In arXiv:1406.2295, Baumgart et al. introduced a way to connect the world of jet physics with the world of quarkonia. These bound states of charm-anticharm or bottom-antibottom quarks are the source of two things: great buzzwords for impressing your friends and several outstanding problems within the Standard Model. While we’ve been studying quarkonia such as the $J/\psi(c\bar{c})$ and the $\Upsilon(b\bar{b})$ for more than four decades, there are still a bunch of very basic questions we have about how they are produced (more on this topic in future bites).

This paper offers a fresh approach to studying the various ways in which quarkonia are produced at the LHC by focusing on how they are produced within jets. The wealth of available jet physics technology then provides a new family of interesting observables. The authors first describe the various mechanisms by which quarkonia are produced. In the formalism of non-relativistic QCD (NRQCD), the $J/\psi$, for example, is most frequently produced at the LHC (see Fig. 2) when a high-energy gluon splits into a $c\bar{c}$ pair in one of several possible angular-momentum and color quantum states. This pair then undergoes non-perturbative effects (i.e. effects we can’t really calculate using standard techniques in quantum field theory) and becomes a color-singlet final-state particle (as any reasonably minded particle should do). While this model makes some sense, we have no idea how often quarkonia are produced via each mechanism.

Figure 3: This plot from arXiv:1406.2295 shows how the probability that a gluon or quark fragments into a jet with a specific energy $E$ that contains a $J/\psi$ carrying a fraction $z$ of the original quark/gluon’s momentum varies for different mechanisms. The spectroscopic notation should be familiar from basic quantum mechanics. It gives the angular momentum and color quantum numbers of the $q\bar{q}$ pair that eventually becomes quarkonium. Notice that for different values of $z$ and $E$, the different mechanisms behave differently. Thus this observable (i.e. that mouthful of a probability distribution I described) is said to have discriminating power between the different channels by which a $J/\psi$ is typically formed.

This paper introduces a theoretical formalism that looks at the following question: what is the probability that a parton (quark/gluon) hadronizes into a jet with a certain substructure and that contains a specific hadron with some fraction $z$ of the original parton’s energy? The authors show that the answer to this question is correlated with the answer to the question: how often are quarkonia produced via the different intermediate angular-momentum/color states of NRQCD? In other words, they show that studying the geometry of the jets that contain quarkonia may lead to answers to decades-old questions about how quarkonia are produced!

There are several other efforts to study hadron production through the lens of jet physics that have also done preliminary comparisons with ATLAS/CMS data (one such study will be the subject of my next bite). These studies look at the production of more general classes of hadrons and numbers of jets in events and see promising results when compared with 7 TeV data from ATLAS and CMS.

The moral of this story is that jets are now being viewed less as a source of troublesome backgrounds to new physics and more as a laboratory for studying long-standing questions about the underlying nature of hadronization. Jet physics offers innovative ways to look at old problems, offering a host of new and exciting observables to study at the LHC and other experiments.

1. The November Revolution: https://www.slac.stanford.edu/history/pubs/gilmannov.pdf. This transcript of a talk provides some nice background on, amongst other things, the momentous discovery of the $J/\psi$ in 1974, in what is often referred to as the November Revolution.
2. An Introduction to the NRQCD Factorization Approach to Heavy Quarkonium https://cds.cern.ch/record/319642/files/9702225.pdf. As good as it gets when it comes to outlines of the basics of this tried-and-true effective theory. This article will definitely take some familiarity with QFT but provides a great outline of the basics of the NRQCD Lagrangian, fields, decays etc.

## August 25, 2016

### Terence Tao — Notes on the “slice rank” of tensors

[This blog post was written jointly by Terry Tao and Will Sawin.]

In the previous blog post, one of us (Terry) implicitly introduced a notion of rank for tensors which is a little different from the usual notion of tensor rank, and which (following BCCGNSU) we will call “slice rank”. This notion of rank could then be used to encode the Croot-Lev-Pach-Ellenberg-Gijswijt argument that uses the polynomial method to control capsets.

Afterwards, several papers have applied the slice rank method to further problems – to control tri-colored sum-free sets in abelian groups (BCCGNSU, KSS) and from there to the triangle removal lemma in vector spaces over finite fields (FL), to control sunflowers (NS), and to bound progression-free sets in ${p}$-groups (P).

In this post we investigate the notion of slice rank more systematically. In particular, we show how to give lower bounds for the slice rank. In many cases, we can show that the upper bounds on slice rank given in the aforementioned papers are sharp to within a subexponential factor. This still leaves open the possibility of getting a better bound for the original combinatorial problem using the slice rank of some other tensor, but for very long arithmetic progressions (at least eight terms), we show that the slice rank method cannot improve over the trivial bound using any tensor.

It will be convenient to work in a “basis independent” formalism, namely working in the category of abstract finite-dimensional vector spaces over a fixed field ${{\bf F}}$. (In the applications to the capset problem one takes ${{\bf F}={\bf F}_3}$ to be the finite field of three elements, but most of the discussion here applies to arbitrary fields.) Given ${k}$ such vector spaces ${V_1,\dots,V_k}$, we can form the tensor product ${\bigotimes_{i=1}^k V_i}$, generated by the tensor products ${v_1 \otimes \dots \otimes v_k}$ with ${v_i \in V_i}$ for ${i=1,\dots,k}$, subject to the constraint that the tensor product operation ${(v_1,\dots,v_k) \mapsto v_1 \otimes \dots \otimes v_k}$ is multilinear. For each ${1 \leq j \leq k}$, we have the smaller tensor products ${\bigotimes_{1 \leq i \leq k: i \neq j} V_i}$, as well as the ${j^{th}}$ tensor product

$\displaystyle \otimes_j: V_j \times \bigotimes_{1 \leq i \leq k: i \neq j} V_i \rightarrow \bigotimes_{i=1}^k V_i$

defined in the obvious fashion. Elements of ${\bigotimes_{i=1}^k V_i}$ of the form ${v_j \otimes_j v_{\hat j}}$ for some ${v_j \in V_j}$ and ${v_{\hat j} \in \bigotimes_{1 \leq i \leq k: i \neq j} V_i}$ will be called rank one functions, and the slice rank (or rank for short) ${\hbox{rank}(v)}$ of an element ${v}$ of ${\bigotimes_{i=1}^k V_i}$ is defined to be the least nonnegative integer ${r}$ such that ${v}$ is a linear combination of ${r}$ rank one functions. If ${V_1,\dots,V_k}$ are finite-dimensional, then the rank is always well defined as a non-negative integer (in fact it cannot exceed ${\min( \hbox{dim}(V_1), \dots, \hbox{dim}(V_k))}$). It is also clearly subadditive:

$\displaystyle \hbox{rank}(v+w) \leq \hbox{rank}(v) + \hbox{rank}(w). \ \ \ \ \ (1)$

For ${k=1}$, ${\hbox{rank}(v)}$ is ${0}$ when ${v}$ is zero, and ${1}$ otherwise. For ${k=2}$, ${\hbox{rank}(v)}$ is the usual rank of the ${2}$-tensor ${v \in V_1 \otimes V_2}$ (which can for instance be identified with a linear map from ${V_1}$ to the dual space ${V_2^*}$). The usual notion of tensor rank for higher order tensors uses complete tensor products ${v_1 \otimes \dots \otimes v_k}$, ${v_i \in V_i}$ as the rank one objects, rather than ${v_j \otimes_j v_{\hat j}}$, giving a rank that is greater than or equal to the slice rank studied here.
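For ${k=2}$, writing ${v = \sum_{s,t} c_{s,t} e_s \otimes f_t}$ in bases of ${V_1}$ and ${V_2}$, the slice rank is just the rank of the coefficient matrix ${(c_{s,t})}$. A minimal sketch over a prime field ${{\bf F}_p}$ (with ${p=3}$ as in the capset application), using Gaussian elimination; the function name `rank_mod_p` is our own and `pow(x, -1, p)` needs Python 3.8 or later.

```python
def rank_mod_p(matrix, p):
    """Rank of a matrix over the prime field F_p, by Gaussian elimination.

    For k = 2, `matrix` holds the coefficients c[s][t] of v = sum c[s][t] e_s (x) f_t,
    and the returned rank is the slice rank of v.
    """
    rows = [[x % p for x in row] for row in matrix]
    ncols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(ncols):
        # Find a row at or below position `rank` with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse (Python 3.8+)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        # Eliminate this column from every other row.
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % p for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank
```

For instance, the diagonal tensor ${e_1 \otimes f_1 + e_2 \otimes f_2 + e_3 \otimes f_3}$ (the identity matrix) has rank 3, while over ${{\bf F}_3}$ the matrix `[[1, 2], [2, 1]]` has rank 1, since its second row is twice the first.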

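Lemma 1 Let ${V_1,\dots,V_k}$ be finite-dimensional vector spaces over ${{\bf F}}$, let ${v}$ be an element of ${\bigotimes_{i=1}^k V_i}$, and let ${r}$ be a non-negative integer. From basic linear algebra, the following are equivalent: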

• (i) One has ${\hbox{rank}(v) \leq r}$.
• (ii) One has a representation of the form

$\displaystyle v = \sum_{j=1}^k \sum_{s \in S_j} v_{j,s} \otimes_j v_{\hat j,s}$

where ${S_1,\dots,S_k}$ are finite sets of total cardinality ${|S_1|+\dots+|S_k|}$ at most ${r}$, and for each ${1 \leq j \leq k}$ and ${s \in S_j}$, ${v_{j,s} \in V_j}$ and ${v_{\hat j,s} \in \bigotimes_{1 \leq i \leq k: i \neq j} V_i}$.

• (iii) One has

$\displaystyle v \in \sum_{j=1}^k U_j \otimes_j \bigotimes_{1 \leq i \leq k: i \neq j} V_i$

where for each ${j=1,\dots,k}$, ${U_j}$ is a subspace of ${V_j}$ of total dimension ${\hbox{dim}(U_1)+\dots+\hbox{dim}(U_k)}$ at most ${r}$, and we view ${U_j \otimes_j \bigotimes_{1 \leq i \leq k: i \neq j} V_i}$ as a subspace of ${\bigotimes_{i=1}^k V_i}$ in the obvious fashion.

• (iv) (Dual formulation) There exist subspaces ${W_j}$ of the dual space ${V_j^*}$ for ${j=1,\dots,k}$, of total dimension at least ${\hbox{dim}(V_1)+\dots+\hbox{dim}(V_k) - r}$, such that ${v}$ is orthogonal to ${\bigotimes_{j=1}^k W_j}$, in the sense that one has the vanishing

$\displaystyle \langle \bigotimes_{j=1}^k w_j, v \rangle = 0$

for all ${w_j \in W_j}$, where ${\langle, \rangle: \bigotimes_{j=1}^k V_j^* \times \bigotimes_{j=1}^k V_j \rightarrow {\bf F}}$ is the obvious pairing.

Proof: The equivalence of (i) and (ii) is clear from the definition. To get from (ii) to (iii) one simply takes ${U_j}$ to be the span of the ${v_{j,s}}$, and conversely to get from (iii) to (ii) one takes the ${v_{j,s}}$ to be a basis of the ${U_j}$ and computes ${v_{\hat j,s}}$ by using a basis for the tensor product ${U_j \otimes_j \bigotimes_{1 \leq i \leq k: i \neq j} V_i}$ consisting entirely of functions of the form ${v_{j,s} \otimes_j e}$ for various ${e}$. To pass from (iii) to (iv) one takes ${W_j}$ to be the annihilator ${\{ w_j \in V_j^*: \langle w_j, v_j \rangle = 0 \forall v_j \in U_j \}}$ of ${U_j}$, and conversely to pass from (iv) to (iii) one takes ${U_j}$ to be the annihilator of ${W_j}$ in ${V_j}$. $\Box$

One corollary of the formulation (iv) is that the set of tensors of slice rank at most ${r}$ is Zariski closed (if the field ${{\mathbf F}}$ is algebraically closed), and so the slice rank itself is a lower semi-continuous function. This is in contrast to the usual tensor rank, which is not necessarily semicontinuous.

Proof: In view of Lemma 1(i and iv), this set is the union over tuples of integers ${d_1,\dots,d_k}$ with ${d_1 + \dots + d_k \geq \hbox{dim}(V_1)+\dots+\hbox{dim}(V_k) - r}$ of the projection from ${\hbox{Gr}(d_1, V_1) \times \dots \times \hbox{Gr}(d_k, V_k) \times ( V_1 \otimes \dots \otimes V_k)}$ of the set of tuples ${(W_1,\dots,W_k, v)}$ with ${ v}$ orthogonal to ${W_1 \times \dots \times W_k}$, where ${\hbox{Gr}(d,V)}$ is the Grassmannian parameterizing ${d}$-dimensional subspaces of ${V}$.

One can check directly that the set of tuples ${(W_1,\dots,W_k, v)}$ with ${ v}$ orthogonal to ${W_1 \times \dots \times W_k}$ is Zariski closed in ${\hbox{Gr}(d_1, V_1) \times \dots \times \hbox{Gr}(d_k, V_k) \times V_1 \otimes \dots \otimes V_k}$ using a set of equations of the form ${\langle \bigotimes_{j=1}^k w_j, v \rangle = 0}$ locally on ${\hbox{Gr}(d_1, V_1) \times \dots \times \hbox{Gr}(d_k, V_k) }$. Hence, because the Grassmannian is a complete variety, the projection of this set to ${V_1 \otimes \dots \otimes V_k}$ is also Zariski closed. So the finite union over tuples ${d_1,\dots,d_k}$ of these projections is also Zariski closed.

$\Box$

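We also have good behaviour with respect to linear transformations:

Lemma 3 For ${j=1,\dots,k}$, let ${\phi_j: V_j \rightarrow W_j}$ be linear maps between finite-dimensional vector spaces. Then for any ${v \in \bigotimes_{j=1}^k V_j}$ one has

$\displaystyle \hbox{rank}( (\bigotimes_{j=1}^k \phi_j)(v) ) \leq \hbox{rank}(v), \ \ \ \ \ (2)$

with equality whenever all the ${\phi_j}$ are injective.

Thus, for instance, the rank of a tensor ${v \in \bigotimes_{j=1}^k V_j}$ is intrinsic in the sense that it is unaffected by any enlargements of the spaces ${V_1,\dots,V_k}$.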

Proof: The bound (2) is clear from the formulation (ii) of rank in Lemma 1. For equality, apply (2) to the injective ${\phi_j}$, as well as to some arbitrarily chosen left inverses ${\phi_j^{-1}: W_j \rightarrow V_j}$ of the ${\phi_j}$. $\Box$

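Computing the rank of a tensor is difficult in general; however, the problem becomes a combinatorial one if one has a suitably sparse representation of that tensor in some basis, where we will measure sparsity by the property of being an antichain.

Proposition 4 Let ${V_1,\dots,V_k}$ be finite-dimensional vector spaces over a field ${{\bf F}}$. For each ${1 \leq j \leq k}$, let ${(v_{j,s})_{s \in S_j}}$ be a linearly independent set in ${V_j}$ indexed by a finite set ${S_j}$. Let ${\Gamma}$ be a subset of ${S_1 \times \dots \times S_k}$, and let ${v \in \bigotimes_{j=1}^k V_j}$ be a tensor of the form

$\displaystyle v = \sum_{(s_1,\dots,s_k) \in \Gamma} c_{s_1,\dots,s_k} v_{1,s_1} \otimes \dots \otimes v_{k,s_k} \ \ \ \ \ (3)$

where the ${c_{s_1,\dots,s_k}}$ are coefficients in ${{\bf F}}$. Then

$\displaystyle \hbox{rank}(v) \leq \min_{\Gamma = \Gamma_1 \cup \dots \cup \Gamma_k} |\pi_1(\Gamma_1)| + \dots + |\pi_k(\Gamma_k)| \ \ \ \ \ (4)$

where the minimum ranges over all coverings of ${\Gamma}$ by sets ${\Gamma_1,\dots,\Gamma_k}$, and ${\pi_j: S_1 \times \dots \times S_k \rightarrow S_j}$ denotes the ${j^{th}}$ coordinate projection. Now suppose that the ${c_{s_1,\dots,s_k}}$ are all non-zero, that each ${S_j}$ is equipped with a total ordering ${\leq_j}$, and let ${\Gamma'}$ be the set of maximal elements of ${\Gamma}$, that is, the tuples ${(s_1,\dots,s_k) \in \Gamma}$ for which there is no other ${(s'_1,\dots,s'_k) \in \Gamma}$ with ${s_j \leq_j s'_j}$ for all ${j}$. Then

$\displaystyle \hbox{rank}(v) \geq \min_{\Gamma' = \Gamma_1 \cup \dots \cup \Gamma_k} |\pi_1(\Gamma_1)| + \dots + |\pi_k(\Gamma_k)|. \ \ \ \ \ (5)$

In particular, if ${\Gamma}$ is an antichain in the product ordering (so that ${\Gamma' = \Gamma}$), equality holds in (4).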

Proof: By Lemma 3 (or by enlarging the bases ${v_{j,s_j}}$), we may assume without loss of generality that each of the ${V_j}$ is spanned by the ${v_{j,s_j}}$. By relabeling, we can also assume that each ${S_j}$ is of the form

$\displaystyle S_j = \{1,\dots,|S_j|\}$

with the usual ordering, and by Lemma 3 we may take each ${V_j}$ to be ${{\bf F}^{|S_j|}}$, with ${v_{j,s_j} = e_{s_j}}$ the standard basis.

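Let ${r}$ denote the rank of ${v}$. To show (4), it suffices to show the inequality

$\displaystyle r \leq |\pi_1(\Gamma_1)| + \dots + |\pi_k(\Gamma_k)| \ \ \ \ \ (6)$

for any covering of ${\Gamma}$ by ${\Gamma_1,\dots,\Gamma_k}$. Removing elements from the ${\Gamma_j}$ can only decrease the right-hand side, so we may assume the ${\Gamma_j}$ are disjoint; then by the subadditivity (1) it suffices to show that for each ${1 \leq j \leq k}$ the tensor

$\displaystyle \sum_{(s_1,\dots,s_k) \in \Gamma_j} c_{s_1,\dots,s_k} e_{s_1} \otimes \dots \otimes e_{s_k}$

has rank at most ${|\pi_j(\Gamma_j)|}$. But this tensor can (after collecting terms) be written as

$\displaystyle \sum_{s_j \in \pi_j(\Gamma_j)} e_{s_j} \otimes_j v_{\hat j,s_j}$

for some ${v_{\hat j,s_j} \in \bigotimes_{1 \leq i \leq k: i \neq j} V_i}$, which is a linear combination of ${|\pi_j(\Gamma_j)|}$ rank one functions.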

Now assume that the ${c_{s_1,\dots,s_k}}$ are all non-zero and that ${\Gamma'}$ is the set of maximal elements of ${\Gamma}$. To conclude the proposition, it suffices to show the reverse inequality

$\displaystyle r \geq |\pi_1(\Gamma_1)| + \dots + |\pi_k(\Gamma_k)| \ \ \ \ \ (7)$

for some ${\Gamma_1,\dots,\Gamma_k}$ covering ${\Gamma'}$. By Lemma 1(iv) (shrinking the subspaces if necessary), there exist subspaces ${W_j}$ of ${({\bf F}^{|S_j|})^*}$ whose dimensions ${d_j := \hbox{dim}(W_j)}$ sum to

$\displaystyle \sum_{j=1}^k d_j = \sum_{j=1}^k |S_j| - r \ \ \ \ \ (8)$

such that ${v}$ is orthogonal to ${\bigotimes_{j=1}^k W_j}$.

Let ${1 \leq j \leq k}$. Using Gaussian elimination, one can find a basis ${w_{j,1},\dots,w_{j,d_j}}$ of ${W_j}$ whose representation in the standard dual basis ${e^*_{1},\dots,e^*_{|S_j|}}$ of ${({\bf F}^{|S_j|})^*}$ is in row-echelon form. That is to say, there exist natural numbers

$\displaystyle 1 \leq s_{j,1} < \dots < s_{j,d_j} \leq |S_j|$

such that for all ${1 \leq t \leq d_j}$, ${w_{j,t}}$ is a linear combination of the dual vectors ${e^*_{s_{j,t}},\dots,e^*_{|S_j|}}$, with the ${e^*_{s_{j,t}}}$ coefficient equal to one.

We now claim that ${\prod_{j=1}^k \{ s_{j,t}: 1 \leq t \leq d_j \}}$ is disjoint from ${\Gamma'}$. Suppose for contradiction that this were not the case, thus there exists ${1 \leq t_j \leq d_j}$ for each ${1 \leq j \leq k}$ such that

$\displaystyle (s_{1,t_1}, \dots, s_{k,t_k}) \in \Gamma'.$

As ${\Gamma'}$ is the set of maximal elements of ${\Gamma}$, this implies that

$\displaystyle (s'_1,\dots,s'_k) \not \in \Gamma$

for any tuple ${(s'_1,\dots,s'_k) \in \prod_{j=1}^k \{ s_{j,t_j}, \dots, |S_j|\}}$ other than ${(s_{1,t_1}, \dots, s_{k,t_k})}$. On the other hand, we know that ${w_{j,t_j}}$ is a linear combination of ${e^*_{s_{j,t_j}},\dots,e^*_{|S_j|}}$, with the ${e^*_{s_{j,t_j}}}$ coefficient one. We conclude that the tensor product ${\bigotimes_{j=1}^k w_{j,t_j}}$ is equal to

$\displaystyle \bigotimes_{j=1}^k e^*_{s_{j,t_j}}$

plus a linear combination of other tensor products ${\bigotimes_{j=1}^k e^*_{s'_j}}$ with ${(s'_1,\dots,s'_k)}$ not in ${\Gamma}$. Taking inner products with (3), we conclude that ${\langle v, \bigotimes_{j=1}^k w_{j,t_j}\rangle = c_{s_{1,t_1},\dots,s_{k,t_k}} \neq 0}$, contradicting the fact that ${v}$ is orthogonal to ${\prod_{j=1}^k W_j}$. Thus we have ${\prod_{j=1}^k \{ s_{j,t}: 1 \leq t \leq d_j \}}$ disjoint from ${\Gamma'}$.

For each ${1 \leq j \leq k}$, let ${\Gamma_j}$ denote the set of tuples ${(s_1,\dots,s_k)}$ in ${\Gamma'}$ with ${s_j}$ not in ${\{ s_{j,t}: 1 \leq t \leq d_j \}}$. From the previous discussion we see that the ${\Gamma_j}$ cover ${\Gamma'}$, and we clearly have ${|\pi_j(\Gamma_j)| \leq |S_j| - d_j}$; summing over ${j}$ and using (8), we obtain (7) as claimed. $\Box$
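The Gaussian elimination step in the proof, which produces a basis ${w_{j,1},\dots,w_{j,d_j}}$ of ${W_j}$ in row-echelon form with pivots ${s_{j,1} < \dots < s_{j,d_j}}$, can be sketched over a prime field ${{\bf F}_p}$. Here a dual vector is a list of its coordinates in the standard dual basis ${e^*_1,\dots,e^*_{|S_j|}}$, and pivots are reported 1-indexed to match the text; the helper name `echelon_basis` is our own.

```python
def echelon_basis(vectors, p):
    """Row-echelon basis of span(vectors) over F_p.

    Returns (basis, pivots): each basis row is zero before its pivot and has
    coefficient 1 there; pivots are 1-indexed and strictly increasing.
    """
    rows = [[x % p for x in v] for v in vectors]
    n = len(rows[0]) if rows else 0
    basis, pivots = [], []
    for col in range(n):
        # All remaining rows vanish in earlier columns; look for one non-zero here.
        idx = next((i for i, r in enumerate(rows) if r[col]), None)
        if idx is None:
            continue
        row = rows.pop(idx)
        inv = pow(row[col], -1, p)  # modular inverse (Python 3.8+)
        row = [(x * inv) % p for x in row]  # leading coefficient becomes 1
        # Clear this column from the remaining rows.
        rows = [[(a - r[col] * b) % p for a, b in zip(r, row)] for r in rows]
        basis.append(row)
        pivots.append(col + 1)
    return basis, pivots
```

For example, over ${{\bf F}_3}$ the vectors `[0, 1, 2]` and `[0, 2, 1]` span a one-dimensional space, returned as the single row `[0, 1, 2]` with pivot 2.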

As an instance of this proposition, we recover the computation of diagonal rank from the previous blog post:

Example 5 Let ${V_1,\dots,V_k}$ be finite-dimensional vector spaces over a field ${{\bf F}}$ for some ${k \geq 2}$. Let ${d}$ be a natural number, and for ${1 \leq j \leq k}$, let ${e_{j,1},\dots,e_{j,d}}$ be a linearly independent set in ${V_j}$. Let ${c_1,\dots,c_d}$ be non-zero coefficients in ${{\bf F}}$. Then

$\displaystyle \sum_{t=1}^d c_t e_{1,t} \otimes \dots \otimes e_{k,t}$

has rank ${d}$. Indeed, one applies the proposition with ${S_1,\dots,S_k}$ all equal to ${\{1,\dots,d\}}$, with ${\Gamma}$ the diagonal in ${S_1 \times \dots \times S_k}$; this is an antichain if we give one of the ${S_i}$ the standard ordering, and another of the ${S_i}$ the opposite ordering (and ordering the remaining ${S_i}$ arbitrarily). In this case, the ${\pi_j}$ are all bijective, and so it is clear that the minimum in (4) is simply ${d}$.
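The minimum on the right-hand side of (4) and (5) is a finite combinatorial problem, so for tiny examples it can be computed by brute force. The sketch below uses an equivalent reformulation: choose sets ${T_j \subset S_j}$ (playing the role of ${\pi_j(\Gamma_j)}$) such that every tuple in ${\Gamma}$ has some coordinate ${s_j \in T_j}$, and minimise ${\sum_j |T_j|}$. It is exponential in the size of ${\Gamma}$ and intended only as an illustration; the function name is hypothetical.

```python
from itertools import combinations

def min_cover_cost(gamma, k):
    """min over coverings Gamma = Gamma_1 u ... u Gamma_k of sum_j |pi_j(Gamma_j)|.

    Equivalently, the fewest (coordinate, value) pairs such that every tuple t in
    gamma agrees with at least one chosen pair. Brute force: tiny inputs only.
    """
    candidates = sorted({(j, t[j]) for t in gamma for j in range(k)})
    for size in range(len(candidates) + 1):
        for chosen in combinations(candidates, size):
            chosen_set = set(chosen)
            if all(any((j, t[j]) in chosen_set for j in range(k)) for t in gamma):
                return size
    return len(candidates)
```

For the diagonal of Example 5 with ${d = 3}$ and ${k = 3}$, `min_cover_cost([(t, t, t) for t in range(3)], 3)` returns 3, matching the rank ${d}$.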

The combinatorial minimisation problem in the above proposition can be solved asymptotically when working with tensor powers, using the notion of the Shannon entropy ${h(X)}$ of a discrete random variable ${X}$.

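Proposition 6 Let ${v \in \bigotimes_{j=1}^k V_j}$ be a tensor of the form (3) for some coefficients ${c_{s_1,\dots,s_k}}$. For each natural number ${n}$, let ${v^{\otimes n}}$ be the tensor power of ${n}$ copies of ${v}$, viewed as an element of ${\bigotimes_{j=1}^k V_j^{\otimes n}}$. Then

$\displaystyle \hbox{rank}(v^{\otimes n}) \leq \exp( (H + o(1)) n ) \ \ \ \ \ (9)$

where

$\displaystyle H = \hbox{sup}_{(X_1,\dots,X_k)} \hbox{min}( h(X_1), \dots, h(X_k) ) \ \ \ \ \ (10)$

and the supremum ranges over all random variables ${(X_1,\dots,X_k)}$ taking values in ${\Gamma}$. If moreover the ${c_{s_1,\dots,s_k}}$ are all non-zero, each ${S_j}$ is equipped with a total ordering, and ${\Gamma'}$ is the set of maximal elements of ${\Gamma}$ (as in Proposition 4), then

$\displaystyle \hbox{rank}(v^{\otimes n}) \geq \exp( (H' + o(1)) n ) \ \ \ \ \ (11)$

where ${H'}$ is defined as in (10) but with ${\Gamma}$ replaced by ${\Gamma'}$. Both bounds hold as ${n \rightarrow \infty}$. In particular, if the maximizer in (10) is supported on the maximal elements of ${\Gamma}$ (which always holds if ${\Gamma}$ is an antichain in the product ordering), then equality holds in (9).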
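For very small ${\Gamma}$, the quantity ${H}$ in (10) can be estimated numerically. The sketch below (natural logarithms; hypothetical helper names) searches over distributions on ${\Gamma}$ whose probabilities are multiples of `1/steps`, so in general it only gives a lower estimate of the supremum; it is exact when a maximizer happens to lie on the grid.

```python
import math
from itertools import product

def entropy(probs):
    """Shannon entropy (in nats) of a finite probability vector."""
    return -sum(q * math.log(q) for q in probs if q > 0)

def marginal_entropies(dist, k):
    """dist maps tuples in Gamma to probabilities; returns [h(X_1), ..., h(X_k)]."""
    hs = []
    for j in range(k):
        marg = {}
        for t, q in dist.items():
            marg[t[j]] = marg.get(t[j], 0.0) + q
        hs.append(entropy(marg.values()))
    return hs

def grid_H(gamma, k, steps=20):
    """Lower estimate of H = sup min_j h(X_j) over a grid of distributions on gamma."""
    best = 0.0
    n = len(gamma)
    for counts in product(range(steps + 1), repeat=n - 1):
        last = steps - sum(counts)
        if last < 0:
            continue
        weights = list(counts) + [last]
        dist = {t: w / steps for t, w in zip(gamma, weights)}
        best = max(best, min(marginal_entropies(dist, k)))
    return best
```

For ${\Gamma = \{(0,0),(0,1),(1,0)\}}$ and ${k = 2}$, this returns ${\log 2}$, attained by the uniform distribution on ${\{(0,1),(1,0)\}}$. That is consistent with the fact that a tensor with non-zero coefficients on this ${\Gamma}$ is an invertible ${2 \times 2}$ matrix, whose ${n}$-th tensor (Kronecker) power has rank ${2^n}$.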

Proof:

It will suffice to show that

$\displaystyle \min_{\Gamma^n = \Gamma_{n,1} \cup \dots \cup \Gamma_{n,k}} |\pi_{n,1}(\Gamma_{n,1})| + \dots + |\pi_{n,k}(\Gamma_{n,k})| = \exp( (H + o(1)) n ) \ \ \ \ \ (12)$

as ${n \rightarrow \infty}$, where ${\pi_{n,j}: \prod_{i=1}^k S_i^n \rightarrow S_j^n}$ is the projection map. Then the same thing will apply to ${\Gamma'}$ and ${H'}$. Then applying Proposition 4, using the lexicographical ordering on ${S_j^n}$ and noting that, if ${\Gamma'}$ are the maximal elements of ${\Gamma}$, then ${\Gamma'^n}$ are the maximal elements of ${\Gamma^n}$, we obtain both (9) and (11).

We first prove the lower bound. By compactness (and the continuity properties of entropy), we can find a random variable ${(X_1,\dots,X_k)}$ taking values in ${\Gamma}$ such that

$\displaystyle H = \hbox{min}( h(X_1), \dots, h(X_k) ). \ \ \ \ \ (13)$

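Let ${\varepsilon > 0}$ be small, and let ${\Sigma \subset \Gamma^n}$ denote the set of tuples ${(a_1,\dots,a_n) \in \Gamma^n}$ such that

$\displaystyle |\frac{|\{ 1 \leq l \leq n: a_l = a \}|}{n} - {\bf P}( (X_1,\dots,X_k) = a )| \leq \varepsilon$

for every ${a \in \Gamma}$. By the asymptotic equipartition property (letting ${\varepsilon}$ go to zero sufficiently slowly as ${n \rightarrow \infty}$), the cardinality of ${\Sigma}$ can be computed to be

$\displaystyle |\Sigma| = \exp( (h( X_1,\dots,X_k)+o(1)) n ) \ \ \ \ \ (14)$

and similarly, for each ${1 \leq j \leq k}$,

$\displaystyle |\pi_{n,j}(\Sigma)| = \exp( (h( X_j)+o(1)) n ),$

and, for each ${s_{n,j} \in S_j^n}$,

$\displaystyle |\{ \sigma \in \Sigma: \pi_{n,j}(\sigma) = s_{n,j} \}| \leq \exp( (h( X_1,\dots,X_k)-h(X_j)+o(1)) n ). \ \ \ \ \ (15)$

Now let ${\Gamma^n = \Gamma_{n,1} \cup \dots \cup \Gamma_{n,k}}$ be any covering. By the pigeonhole principle, there is some ${1 \leq j \leq k}$ with

$\displaystyle |\Gamma_{n,j} \cap \Sigma| \geq \frac{1}{k} |\Sigma|$

and hence, by (14) and (15),

$\displaystyle |\pi_{n,j}( \Gamma_{n,j} \cap \Sigma)| \geq \frac{1}{k} \exp( (h( X_j)+o(1)) n )$

which by (13) implies that

$\displaystyle |\pi_{n,1}(\Gamma_{n,1})| + \dots + |\pi_{n,k}(\Gamma_{n,k})| \geq \exp( (H + o(1)) n )$

(noting that the ${\frac{1}{k}}$ factor can be absorbed into the ${o(1)}$ error). This gives the lower bound in (12).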

Now we prove the upper bound. We can cover ${\Gamma^n}$ by ${O(\exp(o(n)))}$ sets of the form ${\Sigma_{X_1,\dots,X_k}}$ for various choices of random variables ${(X_1,\dots,X_k)}$ taking values in ${\Gamma}$. For each such random variable ${(X_1,\dots,X_k)}$, we can find ${1 \leq j \leq k}$ such that ${h(X_j) \leq H}$; we then place all of ${\Sigma_{X_1,\dots,X_k}}$ in ${\Gamma_{n,j}}$. It is then clear that the ${\Gamma_{n,j}}$ cover ${\Gamma^n}$ and that

$\displaystyle |\pi_{n,j}(\Gamma_{n,j})| \leq \exp( (H+o(1)) n )$

for all ${j=1,\dots,k}$, giving the required upper bound. $\Box$
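The counting estimate (14) rests on the method of types: the number of length-${n}$ sequences with a prescribed empirical distribution is a multinomial coefficient, and its logarithm divided by ${n}$ tends to the entropy of that distribution. A minimal numerical check, assuming a three-symbol distribution chosen purely for illustration (helper names are ours):

```python
import math

def log_type_class_size(counts):
    """log of the multinomial coefficient n! / prod(c!), i.e. the number of
    length-n sequences with exactly these symbol counts."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)

def entropy(probs):
    """Shannon entropy (in nats) of a finite probability vector."""
    return -sum(q * math.log(q) for q in probs if q > 0)

probs = [0.5, 0.25, 0.25]
for n in (40, 400, 4000):
    counts = [round(q * n) for q in probs]
    # The per-symbol exponent approaches entropy(probs) from below as n grows.
    print(n, log_type_class_size(counts) / n, entropy(probs))
```

The gap shrinks roughly like ${(\log n)/n}$, which is the kind of subexponential factor absorbed into the ${o(1)}$ terms above.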

It is of interest to compute the quantity ${H}$ in (10). We have the following criterion for when a maximiser occurs:

Proposition 7 Let ${S_1,\dots,S_k}$ be finite sets, and ${\Gamma \subset S_1 \times \dots \times S_k}$ be non-empty. Let ${H}$ be the quantity in (10). Let ${(X_1,\dots,X_k)}$ be a random variable taking values in ${\Gamma}$, and let ${\Gamma^* \subset \Gamma}$ denote the essential range of ${(X_1,\dots,X_k)}$, that is to say the set of tuples ${(t_1,\dots,t_k)\in \Gamma}$ such that ${{\bf P}( X_1=t_1, \dots, X_k = t_k)}$ is non-zero. Then the following are equivalent:

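• (i) ${(X_1,\dots,X_k)}$ attains the maximum in (10).
• (ii) There exist weights ${w_1,\dots,w_k \geq 0}$, not all zero, and a real number ${D \geq 0}$, such that ${w_j = 0}$ whenever ${h(X_j) \neq H}$, and such that

$\displaystyle \sum_{j=1}^k w_j \log \frac{1}{{\bf P}(X_j = t_j)} \leq D \ \ \ \ \ (16)$

for all ${(t_1,\dots,t_k) \in \Gamma}$, with equality if ${(t_1,\dots,t_k) \in \Gamma^*}$.

Furthermore, when (i) and (ii) hold, one has

$\displaystyle D = H \sum_{j=1}^k w_j. \ \ \ \ \ (17)$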

Proof: We first show that (i) implies (ii). The function ${p \mapsto p \log \frac{1}{p}}$ is concave on ${[0,1]}$. As a consequence, if we define ${C}$ to be the set of tuples ${(h_1,\dots,h_k) \in [0,+\infty)^k}$ such that there exists a random variable ${(Y_1,\dots,Y_k)}$ taking values in ${\Gamma}$ with ${h(Y_j) \geq h_j}$ for all ${j}$, then ${C}$ is convex (the marginal entropies of a mixture of two distributions are at least the corresponding mixtures of their marginal entropies). On the other hand, by (10), ${C}$ is disjoint from the orthant ${(H,+\infty)^k}$. Thus, by the hyperplane separation theorem, we conclude that there exists a half-space

$\displaystyle \{ (h_1,\dots,h_k) \in {\bf R}^k: w_1 h_1 + \dots + w_k h_k \geq c \},$

where ${w_1,\dots,w_k}$ are reals that are not all zero, and ${c}$ is another real, which contains ${(h(X_1),\dots,h(X_k))}$ on its boundary and ${(H,+\infty)^k}$ in its interior, such that ${C}$ avoids the interior of the half-space. Since ${(h(X_1),\dots,h(X_k))}$ is also on the boundary of ${(H,+\infty)^k}$, we see that the ${w_j}$ are non-negative, and that ${w_j = 0}$ whenever ${h(X_j) \neq H}$.

By construction, the quantity

$\displaystyle w_1 h(Y_1) + \dots + w_k h(Y_k)$

is maximised when ${(Y_1,\dots,Y_k) = (X_1,\dots,X_k)}$. At this point we could use the method of Lagrange multipliers to obtain the required constraints, but because we have some boundary conditions on the ${(Y_1,\dots,Y_k)}$ (namely, that the probability that they attain a given element of ${\Gamma}$ has to be non-negative) we will work things out by hand. Let ${t = (t_1,\dots,t_k)}$ be an element of ${\Gamma}$, and ${s = (s_1,\dots,s_k)}$ an element of ${\Gamma^*}$. For ${\varepsilon>0}$ small enough, we can form a random variable ${(Y_1,\dots,Y_k)}$ taking values in ${\Gamma}$, whose probability distribution is the same as that for ${(X_1,\dots,X_k)}$ except that the probability of attaining ${(t_1,\dots,t_k)}$ is increased by ${\varepsilon}$, and the probability of attaining ${(s_1,\dots,s_k)}$ is decreased by ${\varepsilon}$. If there is any ${j}$ for which ${{\bf P}(X_j = t_j)=0}$ and ${w_j \neq 0}$, then one can check that

$\displaystyle w_1 h(Y_1) + \dots + w_k h(Y_k) - (w_1 h(X_1) + \dots + w_k h(X_k)) \gg \varepsilon \log \frac{1}{\varepsilon}$

for sufficiently small ${\varepsilon}$, contradicting the maximality of ${(X_1,\dots,X_k)}$; thus we have ${{\bf P}(X_j = t_j) > 0}$ whenever ${w_j \neq 0}$. Taylor expansion then gives

$\displaystyle w_1 h(Y_1) + \dots + w_k h(Y_k) - (w_1 h(X_1) + \dots + w_k h(X_k)) = (A_t - A_s) \varepsilon + O(\varepsilon^2)$

for small ${\varepsilon}$, where

$\displaystyle A_t := \sum_{j=1}^k w_j \log \frac{1}{{\bf P}(X_j = t_j)}$

and similarly for ${A_s}$. We conclude that ${A_t \leq A_s}$ for all ${s \in \Gamma^*}$ and ${t \in \Gamma}$, thus there exists a quantity ${D}$ such that ${A_s = D}$ for all ${s \in \Gamma^*}$, and ${A_t \leq D}$ for all ${t \in \Gamma}$. By construction ${D}$ must be nonnegative. Sampling ${(t_1,\dots,t_k)}$ using the distribution of ${(X_1,\dots,X_k)}$, one has

$\displaystyle \sum_{j=1}^k w_j \log \frac{1}{{\bf P}(X_j = t_j)} = D$

almost surely; taking expectations we conclude that

$\displaystyle \sum_{j=1}^k w_j \sum_{t_j \in S_j} {\bf P}( X_j = t_j) \log \frac{1}{{\bf P}(X_j = t_j)} = D.$

The inner sum is ${h(X_j)}$, which equals ${H}$ when ${w_j}$ is non-zero, giving (17).

Now we show conversely that (ii) implies (i). As noted previously, the function ${p \mapsto p \log \frac{1}{p}}$ is concave on ${[0,1]}$, with derivative ${\log \frac{1}{p} - 1}$. This gives the inequality

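$\displaystyle q \log \frac{1}{q} \leq p \log \frac{1}{p} + (q-p) ( \log \frac{1}{p} - 1 ) \ \ \ \ \ (18)$

for all ${0 < p \leq 1}$ and ${0 \leq q \leq 1}$. Now let ${(Y_1,\dots,Y_k)}$ be any random variable taking values in ${\Gamma}$. Applying (18) with ${p = {\bf P}(X_j = t_j)}$ and ${q = {\bf P}(Y_j = t_j)}$, multiplying by ${w_j}$, and summing over ${j}$ and ${t_j \in S_j}$, we obtain

$\displaystyle \sum_{j=1}^k w_j h(Y_j) \leq \sum_{j=1}^k w_j h(X_j)$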

$\displaystyle + \sum_{j=1}^k \sum_{t_j \in S_j} w_j ({\bf P}(Y_j = t_j) - {\bf P}(X_j = t_j)) ( \log \frac{1}{{\bf P}(X_j=t_j)} - 1 ).$

By construction, one has

$\displaystyle \sum_{j=1}^k w_j h(X_j) = \min(h(X_1),\dots,h(X_k)) \sum_{j=1}^k w_j$

and

$\displaystyle \sum_{j=1}^k w_j h(Y_j) \geq \min(h(Y_1),\dots,h(Y_k)) \sum_{j=1}^k w_j$

so to prove that ${\min(h(Y_1),\dots,h(Y_k)) \leq \min(h(X_1),\dots,h(X_k))}$ (which would give (i)), it suffices to show that

$\displaystyle \sum_{j=1}^k \sum_{t_j \in S_j} w_j ({\bf P}(Y_j = t_j) - {\bf P}(X_j = t_j)) ( \log \frac{1}{{\bf P}(X_j=t_j)} - 1 ) \leq 0,$

or equivalently that the quantity

$\displaystyle \sum_{j=1}^k \sum_{t_j \in S_j} w_j {\bf P}(Y_j = t_j) ( \log \frac{1}{{\bf P}(X_j=t_j)} - 1 )$

is maximised when ${(Y_1,\dots,Y_k) = (X_1,\dots,X_k)}$. Since

$\displaystyle \sum_{j=1}^k \sum_{t_j \in S_j} w_j {\bf P}(Y_j = t_j) = \sum_{j=1}^k w_j$

it suffices to show this claim for the quantity

$\displaystyle \sum_{j=1}^k \sum_{t_j \in S_j} w_j {\bf P}(Y_j = t_j) \log \frac{1}{{\bf P}(X_j=t_j)}.$

One can view this quantity as

$\displaystyle {\bf E}_{(Y_1,\dots,Y_k)} \sum_{j=1}^k w_j \log \frac{1}{{\bf P}_{X_j}(X_j=Y_j)}.$

By (ii), this quantity is bounded by ${D}$, with equality if ${(Y_1,\dots,Y_k)}$ is equal to ${(X_1,\dots,X_k)}$ (and is in particular ranging in ${\Gamma^*}$), giving the claim. $\Box$

The second half of the proof of Proposition 7 only uses the marginal distributions ${{\bf P}(X_j=t_j)}$ and the inequality (16), not the actual distribution of ${(X_1,\dots,X_k)}$. It can therefore also be used to prove an upper bound on ${H}$ when the exact maximizing distribution is not known, given suitable probability distributions in each variable. The logarithm of the probability distribution here plays the role that the weight functions do in BCCGNSU.
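The upper bound can be turned into a small computation. For any random variable ${(Y_1,\dots,Y_k)}$ on ${\Gamma}$, Gibbs' inequality gives ${h(Y_j) \leq \sum_{t_j} {\bf P}(Y_j=t_j) \log \frac{1}{p_j(t_j)}}$ for any distributions ${p_j}$ on the ${S_j}$. Hence, for any weights ${w_j \geq 0}$ that are not all zero, ${H \leq \max_{t \in \Gamma} \sum_j w_j \log \frac{1}{p_j(t_j)} / \sum_j w_j}$. This is essentially the estimate obtained above from (18). Conversely, any distribution on ${\Gamma}$ gives the lower bound ${\min_j h(X_j) \leq H}$. The sketch below is plain Python and is not code from the post. The function names and the dictionary representation of distributions are illustrative choices.

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy (natural logarithm) of a dict {value: probability}."""
    return sum(p * math.log(1 / p) for p in dist.values() if p > 0)


def lower_bound_H(joint, k):
    """min_j h(X_j) for a distribution {tuple in Gamma: probability}.

    Any distribution supported on Gamma gives a lower bound for H in (10)."""
    marginals = [defaultdict(float) for _ in range(k)]
    for t, p in joint.items():
        for j in range(k):
            marginals[j][t[j]] += p
    return min(entropy(m) for m in marginals)


def upper_bound_H(Gamma, marginals, weights):
    """max over t in Gamma of sum_j w_j log(1/p_j(t_j)), divided by sum_j w_j.

    Assumes the weights are non-negative and not all zero, and that p_j(t_j) > 0
    whenever w_j > 0 and t_j is the j-th coordinate of some element of Gamma."""
    D = max(
        sum(w * math.log(1 / p[tj]) for w, p, tj in zip(weights, marginals, t) if w > 0)
        for t in Gamma
    )
    return D / sum(weights)


# Toy check: for Gamma = {0,1}^2 both bounds equal log 2, so H = log 2 there.
square = [(0, 0), (0, 1), (1, 0), (1, 1)]
uniform = {0: 0.5, 1: 0.5}
print(lower_bound_H({t: 0.25 for t in square}, 2),
      upper_bound_H(square, [uniform, uniform], [1, 1]))
```

When the two bounds coincide, ${H}$ is determined exactly. This is what happens in Propositions 9 and 10 below.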

Remark 8 Suppose one is in the situation of (i) and (ii) above; assume the nondegeneracy condition that ${H}$ is positive (or equivalently that ${D}$ is positive). We can assign a “degree” ${d_j(t_j)}$ to each element ${t_j \in S_j}$ by the formula

$\displaystyle d_j(t_j) := w_j \log \frac{1}{{\bf P}(X_j = t_j)}, \ \ \ \ \ (19)$

then every tuple ${(t_1,\dots,t_k)}$ in ${\Gamma}$ has total degree at most ${D}$, and those tuples in ${\Gamma^*}$ have degree exactly ${D}$. In particular, every tuple in ${\Gamma^n}$ has degree at most ${nD}$, and hence by (17), each such tuple has a ${j}$-component of degree less than or equal to ${nHw_j}$ for some ${j}$ with ${w_j>0}$. On the other hand, we can compute from (19) and the fact that ${h(X_j) = H}$ for ${w_j > 0}$ that ${Hw_j = {\bf E} d_j(X_j)}$. Thus, by asymptotic equipartition, and assuming ${w_j \neq 0}$, the number of “monomials” in ${S_j^n}$ of total degree at most ${nHw_j}$ is at most ${\exp( (h(X_j)+o(1)) n )}$; one can use (19) and (18) to show that this bound is sharp. This gives a direct way to cover ${\Gamma^n}$ by sets ${\Gamma_{n,1},\dots,\Gamma_{n,k}}$ with ${|\pi_j(\Gamma_{n,j})| \leq \exp( (H+o(1)) n)}$, which is in the spirit of the Croot-Lev-Pach-Ellenberg-Gijswijt arguments from the previous post.
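The counting step can be seen numerically in a toy case. Assume a single factor ${S_j = \{0,1\}}$ with ${{\bf P}(X_j = 0) = 2/3}$, ${{\bf P}(X_j = 1) = 1/3}$ and ${w_j = 1}$; this is the marginal that reappears in Proposition 10. Then (19) gives ${d_j(0) = \log \frac{3}{2}}$ and ${d_j(1) = \log 3 = \log \frac{3}{2} + \log 2}$, and ${Hw_j = {\bf E} d_j(X_j) = \log \frac{3}{2} + \frac{1}{3} \log 2}$. So an ${n}$-tuple with ${m}$ ones has degree at most ${nHw_j}$ exactly when ${m \leq n/3}$. The sketch below (plain Python, illustrative names) counts these tuples and compares the growth rate with ${h(X_j) = \log(3/2^{2/3})}$.

```python
import math


def low_degree_count(n):
    """Number of n-tuples in {0,1}^n with at most n/3 ones.

    In the toy setting above these are exactly the tuples whose total
    degree (19) is at most n * H * w_j."""
    return sum(math.comb(n, m) for m in range(n // 3 + 1))


h = math.log(3) - (2 / 3) * math.log(2)  # h(X_j) = log(3 / 2^(2/3))

for n in (30, 300, 3000):
    rate = math.log(low_degree_count(n)) / n
    print(f"n = {n:5d}: (1/n) log(count) = {rate:.5f}, h(X_j) = {h:.5f}")
```

The rate stays below ${h(X_j)}$ and approaches it slowly. The gap shrinks roughly like ${(\log n)/n}$, which is the ${o(1)}$ term.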

We can now show that the rank computation for the capset problem is sharp:

Proposition 9 Let ${V_1^{\otimes n} = V_2^{\otimes n} = V_3^{\otimes n}}$ denote the space of functions from ${{\bf F}_3^n}$ to ${{\bf F}_3}$. Then the function ${(x,y,z) \mapsto \delta_{0^n}(x+y+z)}$ from ${{\bf F}_3^n \times {\bf F}_3^n \times {\bf F}_3^n}$ to ${{\bf F}_3}$, viewed as an element of ${V_1^{\otimes n} \otimes V_2^{\otimes n} \otimes V_3^{\otimes n}}$, has slice rank ${\exp( (H^*+o(1)) n )}$ as ${n \rightarrow \infty}$, where ${H^* \approx 1.013455}$ is given by the formula

$\displaystyle H^* = \alpha \log \frac{1}{\alpha} + \beta \log \frac{1}{\beta} + \gamma \log \frac{1}{\gamma} \ \ \ \ \ (20)$

$\displaystyle \alpha = \frac{32}{3(15 + \sqrt{33})} \approx 0.51419$

$\displaystyle \beta = \frac{4(\sqrt{33}-1)}{3(15+\sqrt{33})} \approx 0.30495$

$\displaystyle \gamma = \frac{(\sqrt{33}-1)^2}{6(15+\sqrt{33})} \approx 0.18086.$

Proof: In ${{\bf F}_3 \times {\bf F}_3 \times {\bf F}_3}$, we have

$\displaystyle \delta_0(x+y+z) = 1 - (x+y+z)^2$

$\displaystyle = (1-x^2) - y^2 - z^2 + xy + yz + zx.$

Thus, if we let ${V_1=V_2=V_3}$ be the space of functions from ${{\bf F}_3}$ to ${{\bf F}_3}$ (with domain variable denoted ${x,y,z}$ respectively), and define the basis functions

$\displaystyle v_{1,0} := 1; v_{1,1} := x; v_{1,2} := x^2$

$\displaystyle v_{2,0} := 1; v_{2,1} := y; v_{2,2} := y^2$

$\displaystyle v_{3,0} := 1; v_{3,1} := z; v_{3,2} := z^2$

of ${V_1,V_2,V_3}$ indexed by ${S_1=S_2=S_3 := \{ 0,1,2\}}$ (with the usual ordering), respectively, and set ${\Gamma \subset S_1 \times S_2 \times S_3}$ to be the set

$\displaystyle \{ (2,0,0), (0,2,0), (0,0,2), (1,1,0), (0,1,1), (1,0,1),(0,0,0) \}$

then ${\delta_0(x+y+z)}$ is a linear combination of the ${v_{1,t_1} \otimes v_{2,t_2} \otimes v_{3,t_3}}$ with ${(t_1,t_2,t_3) \in \Gamma}$, and all coefficients non-zero. Since ${(0,0,0)}$ lies below every other element of ${\Gamma}$, the set of maximal elements of ${\Gamma}$ is ${\Gamma'= \{ (2,0,0), (0,2,0), (0,0,2), (1,1,0), (0,1,1), (1,0,1) \}}$. We will show that the quantity ${H}$ of (10) agrees with the quantity ${H^*}$ of (20), and that the optimizing distribution is supported on ${\Gamma'}$. Then, by Proposition 6, the slice rank of ${\delta_{0^n}(x+y+z)}$ is ${\exp( (H+o(1)) n)}$.
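The identity behind this choice of ${\Gamma}$ is easy to confirm by brute force. The sketch below is plain Python; the coefficient dictionary is read off from the expansion above. It checks that ${(1-x^2) - y^2 - z^2 + xy + yz + zx}$ agrees with ${\delta_0(x+y+z)}$ at all 27 points of ${{\bf F}_3^3}$. It then computes the maximal elements of the support under the product of the usual orderings on ${\{0,1,2\}}$.

```python
from itertools import product

# Coefficients of x^i y^j z^k in (1 - x^2) - y^2 - z^2 + xy + yz + zx over F_3.
coeffs = {
    (0, 0, 0): 1, (2, 0, 0): -1, (0, 2, 0): -1, (0, 0, 2): -1,
    (1, 1, 0): 1, (0, 1, 1): 1, (1, 0, 1): 1,
}


def delta0(u):
    """Indicator of u = 0 in F_3."""
    return 1 if u % 3 == 0 else 0


def expansion(x, y, z):
    """Evaluate the polynomial mod 3 (Python's 0**0 == 1 matches the constant term)."""
    return sum(c * x**i * y**j * z**k for (i, j, k), c in coeffs.items()) % 3


# The identity holds at every point of F_3^3.
assert all(expansion(x, y, z) == delta0(x + y + z)
           for x, y, z in product(range(3), repeat=3))

Gamma = set(coeffs)  # support of the expansion


def strictly_below(s, t):
    """s < t in the product of the usual orderings on {0, 1, 2}."""
    return s != t and all(a <= b for a, b in zip(s, t))


Gamma_prime = {t for t in Gamma if not any(strictly_below(t, u) for u in Gamma)}
print(sorted(Gamma_prime))
```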

To compute the quantity at (10), we use the criterion in Proposition 7. We take ${(X_1,X_2,X_3)}$ to be the random variable taking values in ${\Gamma}$ that attains each of the values ${(2,0,0), (0,2,0), (0,0,2)}$ with a probability of ${\gamma \approx 0.18086}$, and each of ${(1,1,0), (0,1,1), (1,0,1)}$ with a probability of ${\alpha - 2\gamma = \beta/2 \approx 0.15247}$. Then each of the ${X_j}$ attains the values ${0,1,2}$ with probabilities ${\alpha,\beta,\gamma}$ respectively, so in particular ${h(X_1)=h(X_2)=h(X_3)}$ is equal to the quantity ${H^*}$ in (20). If we now set ${w_1 = w_2 = w_3 := 1}$ and

$\displaystyle D := 2\log \frac{1}{\alpha} + \log \frac{1}{\gamma} = \log \frac{1}{\alpha} + 2 \log \frac{1}{\beta} = 3H^* \approx 3.04036$

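we can verify the condition (16) with equality for all ${(t_1,t_2,t_3) \in \Gamma'}$. For the remaining element ${(0,0,0)}$ of ${\Gamma}$, the left-hand side of (16) is ${3 \log \frac{1}{\alpha} < D}$. By (17), this gives ${H=H^*}$ as desired. $\Box$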
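The arithmetic in this proof can be checked numerically. The sketch below is plain Python with illustrative variable names. It builds ${\alpha,\beta,\gamma}$ from (20) and the joint distribution of ${(X_1,X_2,X_3)}$ described above. It then evaluates the left-hand side of (16) with ${w_1=w_2=w_3=1}$ at every element of ${\Gamma}$. The six elements of ${\Gamma'}$ should all give ${D = 3H^*}$, and ${(0,0,0)}$ should give a strictly smaller value. The printed ${H^*}$ can be compared with the approximation quoted in Proposition 9.

```python
import math

s = math.sqrt(33)
alpha = 32 / (3 * (15 + s))
beta = 4 * (s - 1) / (3 * (15 + s))
gamma = (s - 1) ** 2 / (6 * (15 + s))
marginal = {0: alpha, 1: beta, 2: gamma}  # common law of each X_j

# Joint distribution of (X_1, X_2, X_3) from the proof, supported on Gamma'.
joint = {t: gamma for t in [(2, 0, 0), (0, 2, 0), (0, 0, 2)]}
joint.update({t: beta / 2 for t in [(1, 1, 0), (0, 1, 1), (1, 0, 1)]})

H_star = sum(p * math.log(1 / p) for p in marginal.values())  # formula (20)
D = 2 * math.log(1 / alpha) + math.log(1 / gamma)


def degree(t):
    """Left-hand side of (16) with w_1 = w_2 = w_3 = 1."""
    return sum(math.log(1 / marginal[tj]) for tj in t)


def marginal_of(j):
    """Distribution of X_j computed from the joint distribution."""
    m = {0: 0.0, 1: 0.0, 2: 0.0}
    for t, p in joint.items():
        m[t[j]] += p
    return m


print(f"H* = {H_star:.6f}, D = {D:.5f}, 3H* = {3 * H_star:.5f}")
for t in sorted(joint) + [(0, 0, 0)]:
    print(t, f"{degree(t):.5f}")
```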

This statement already follows from the result of Kleinberg-Sawin-Speyer, which gives a “tri-colored sum-free set” in ${\mathbb F_3^n}$ of size ${\exp((H^*+o(1))n)}$, as the slice rank of this tensor is an upper bound for the size of a tri-colored sum-free set. If one were to go over the proofs more carefully to evaluate the subexponential factors, this argument would give a stronger lower bound than KSS, as it avoids the substantial loss that comes from Behrend’s construction. However, because it actually constructs a set, the KSS result rules out more possible approaches to give an exponential improvement of the upper bound for capsets. The lower bound on slice rank shows that the bound cannot be improved using only the slice rank of this particular tensor, whereas KSS shows that the bound cannot be improved using any method that does not take advantage of the “single-colored” nature of the problem.

We can also show that the slice rank upper bound in a result of Naslund-Sawin is similarly sharp:

Proposition 10 Let ${V_1^{\otimes n} = V_2^{\otimes n} = V_3^{\otimes n}}$ denote the space of functions from ${\{0,1\}^n}$ to ${\mathbb C}$. Then the function ${(x,y,z) \mapsto \prod_{i=1}^n (x_i+y_i+z_i-1)}$ from ${\{0,1\}^n \times \{0,1\}^n \times \{0,1\}^n}$ to ${\mathbb C}$, viewed as an element of ${V_1^{\otimes n} \otimes V_2^{\otimes n} \otimes V_3^{\otimes n}}$, has slice rank ${(3/2^{2/3})^n e^{o(n)}}$.

Proof: Let ${v_{1,0}=1}$ and ${v_{1,1}=x}$ be a basis for the space ${V_1}$ of functions on ${\{0,1\}}$, itself indexed by ${S_1=\{0,1\}}$. Choose similar bases for ${V_2}$ and ${V_3}$, with ${v_{2,0}=1, v_{2,1}=y}$ and ${v_{3,0}=1,v_{3,1}=z-1}$.

Set ${\Gamma = \{(1,0,0),(0,1,0),(0,0,1)\}}$. Then ${x+y+z-1 = v_{1,1} \otimes v_{2,0} \otimes v_{3,0} + v_{1,0} \otimes v_{2,1} \otimes v_{3,0} + v_{1,0} \otimes v_{2,0} \otimes v_{3,1}}$ is a linear combination of the ${v_{1,t_1} \otimes v_{2,t_2} \otimes v_{3,t_3}}$ with ${(t_1,t_2,t_3) \in \Gamma}$, and all coefficients non-zero. Order ${S_1,S_2,S_3}$ the usual way so that ${\Gamma}$ is an antichain. We will show that the quantity ${H}$ of (10) is ${\log(3/2^{2/3})}$. Applying the last statement of Proposition 6, we then conclude that the slice rank of the function in the proposition is ${\exp( (\log(3/2^{2/3})+o(1)) n)= (3/2^{2/3})^n e^{o(n)}}$.

Let ${(X_1,X_2,X_3)}$ be the random variable taking values in ${\Gamma}$ that attains each of the values ${(1,0,0),(0,1,0),(0,0,1)}$ with a probability of ${1/3}$. Then each of the ${X_i}$ attains the value ${1}$ with probability ${1/3}$ and ${0}$ with probability ${2/3}$, so

$\displaystyle h(X_1)=h(X_2)=h(X_3) = (1/3) \log (3) + (2/3) \log(3/2) = \log 3 - (2/3) \log 2= \log (3/2^{2/3})$

Setting ${w_1=w_2=w_3=1}$ and ${D=3 \log(3/2^{2/3})=3 \log 3 - 2 \log 2}$, we can verify the condition (16) with equality for all ${(t_1,t_2,t_3) \in \Gamma'}$, which from (17) gives ${H=\log (3/2^{2/3})}$ as desired. $\Box$
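The arithmetic for this case is short enough to check directly. The sketch below is plain Python with illustrative names. It evaluates the left-hand side of (16) with ${w_1=w_2=w_3=1}$ at the three elements of ${\Gamma}$ and compares it with ${D}$ and with ${3h(X_1)}$.

```python
import math

p = {0: 2 / 3, 1: 1 / 3}                   # law of each X_i
Gamma = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

h = sum(q * math.log(1 / q) for q in p.values())  # h(X_1) = h(X_2) = h(X_3)
D = 3 * math.log(3) - 2 * math.log(2)


def degree(t):
    """Left-hand side of (16) with w_1 = w_2 = w_3 = 1."""
    return sum(math.log(1 / p[tj]) for tj in t)


print(h, math.log(3 / 2 ** (2 / 3)), D, [degree(t) for t in Gamma])
```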

We used a slightly different method in each of the last two results. In the first one, we use the most natural bases for all three vector spaces, and distinguish ${\Gamma}$ from its set of maximal elements ${\Gamma'}$. In the second one we modify one basis element slightly, with ${v_{3,1}=z-1}$ instead of the more obvious choice ${z}$, which allows us to work with ${\Gamma = \{(1,0,0),(0,1,0),(0,0,1)\}}$ instead of ${\Gamma=\{(1,0,0),(0,1,0),(0,0,1),(0,0,0)\}}$. Because ${\Gamma}$ is an antichain, we do not need to distinguish ${\Gamma}$ and ${\Gamma'}$. Both methods in fact work with either problem, and they are both about equally difficult, but we include both as either might turn out to be substantially more convenient in future work.

Proposition 11 Let ${k \geq 8}$ be a natural number and let ${G}$ be a finite abelian group. Let ${{\bf F}}$ be any field. Let ${V_1 = \dots = V_k}$ denote the space of functions from ${G}$ to ${{\bf F}}$.

Let ${F}$ be any ${{\bf F}}$-valued function on ${G^k}$ that is nonzero only when the ${k}$ elements of ${G}$ form a ${k}$-term arithmetic progression, and is nonzero on every ${k}$-term constant progression.

Then the slice rank of ${F}$ is ${|G|}$.

Proof: We apply Proposition 4, using the standard bases of ${V_1,\dots,V_k}$. Let ${\Gamma}$ be the support of ${F}$. Suppose that we have ${k}$ orderings on ${G}$ such that the constant progressions are maximal elements of ${\Gamma}$, so that all constant progressions lie in ${\Gamma'}$. Then for any partition ${\Gamma_1,\dots, \Gamma_k}$ of ${\Gamma'}$, ${\Gamma_j}$ can contain at most ${|\pi_j(\Gamma_j)|}$ constant progressions. As all ${|G|}$ constant progressions must lie in one of the ${\Gamma_j}$, we must have ${\sum_{j=1}^k |\pi_j(\Gamma_j)| \geq |G|}$. By Proposition 4, this implies that the slice rank of ${F}$ is at least ${|G|}$. Since ${F}$ is a ${|G| \times \dots \times |G|}$ tensor, the slice rank is at most ${|G|}$, hence exactly ${|G|}$.

So it is sufficient to find ${k}$ orderings on ${G}$ such that the constant progressions are maximal elements of ${\Gamma}$. We make several simplifying reductions. We may as well assume that ${\Gamma}$ consists of all the ${k}$-term arithmetic progressions, because if the constant progressions are maximal among the set of all progressions then they are maximal among its subset ${\Gamma}$. So we are looking for an ordering in which the constant progressions are maximal among all ${k}$-term arithmetic progressions. We may as well assume that ${G}$ is cyclic: if for each cyclic group we have an ordering where constant progressions are maximal, then on an arbitrary finite abelian group the lexicographic product of these orderings is an ordering for which the constant progressions are maximal. We may assume ${k=8}$, as if we have an ${8}$-tuple of orderings where constant progressions are maximal, we may add arbitrary orderings and the constant progressions will remain maximal.

So it is sufficient to find ${8}$ orderings on the cyclic group ${\mathbb Z/n}$ such that the constant progressions are maximal elements of the set of ${8}$-term progressions in ${\mathbb Z/n}$ in the ${8}$-fold product ordering. To do that, let the first, second, third, and fifth orderings be the reverse of the usual order on ${\{0,\dots,n-1\}}$, and let the fourth, sixth, seventh, and eighth orderings be the usual order on ${\{0,\dots,n-1\}}$.

Then let ${(c,c,c,c,c,c,c,c)}$ be a constant progression and, for contradiction, assume that ${(a,a+b,a+2b,a+3b,a+4b,a+5b,a+6b,a+7b)}$ is a progression greater than ${(c,c,c,c,c,c,c,c)}$ in this ordering. We may assume that ${c \in [0, (n-1)/2]}$. Otherwise we may reverse the order of the progression, which has the effect of reversing all eight orderings, and then apply the transformation ${x \mapsto n-1-x}$, which again reverses the eight orderings. This brings us back to the original problem but with ${c \in [0,(n-1)/2]}$.

Take a representative of the residue class ${b}$ in the interval ${[-n/2,n/2]}$; abusing notation, we also call this ${b}$. By the choice of orderings, ${a}$, ${a+b}$, ${a+2b}$, and ${a+4b}$ are all congruent modulo ${n}$ to elements of ${[0,c]}$, while ${a+3b}$ is congruent to an element of ${\{0,\dots,n-1\}}$ that is greater than or equal to ${c}$. Take a representative of the residue class ${a}$ in the interval ${[0,c]}$. Then ${a+b}$ is in the interval ${[mn,mn+c]}$ for some integer ${m}$. The distance between any two distinct intervals of this type is greater than ${n/2}$, but the distance between ${a}$ and ${a+b}$ is at most ${n/2}$, so ${a+b}$ is in the interval ${[0,c]}$. By the same reasoning, ${a+2b}$ is in the interval ${[0,c]}$. Therefore ${|b| \leq c/2< n/4}$. But then the distance between ${a+2b}$ and ${a+4b}$ is less than ${n/2}$, so by the same reasoning ${a+4b}$ is in the interval ${[0,c]}$. Because ${a+3b}$ lies between ${a+2b}$ and ${a+4b}$, it also lies in the interval ${[0,c]}$. Since ${a+3b}$ is in ${[0,c]}$ and is congruent mod ${n}$ to an element of ${\{0,\dots,n-1\}}$ greater than or equal to ${c}$, it must equal ${c}$. Then, remembering that ${a+2b = c-b}$ and ${a+4b = c+b}$ lie in ${[0,c]}$, we have ${c-b \leq c}$ and ${c+b \leq c}$, so ${b=0}$. Hence ${a=c}$, and thus ${(a,\dots,a+7b)=(c,\dots,c)}$, which contradicts the assumption that ${(a,\dots,a+7b)>(c,\dots,c)}$. $\Box$
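For small moduli the claim can also be confirmed exhaustively. The sketch below is plain Python with illustrative helper names. It lists every triple ${(a,b,c)}$ for which the ${8}$-term progression ${a, a+b, \dots, a+7b}$ mod ${n}$ lies strictly above ${(c,\dots,c)}$ in the product ordering. It checks two assignments of orderings. In the first, the first, second, third and fifth orderings are reversed; in its mirror image, the fourth, sixth, seventh and eighth are. The map ${x \mapsto n-1-x}$ interchanges the two assignments, so they succeed or fail together. With all eight orderings taken to be the usual order, counterexamples appear at once (for instance ${b = 0}$, ${a = c+1}$).

```python
from itertools import product


def dominates(prog, c, reversed_positions):
    """True if prog >= (c, ..., c) coordinatewise, using the reverse of the
    usual order on {0, ..., n-1} at reversed_positions and the usual order
    elsewhere."""
    for i, x in enumerate(prog):
        if i in reversed_positions:
            if x > c:  # in the reverse order, "x >= c" means x <= c
                return False
        elif x < c:
            return False
    return True


def counterexamples(n, reversed_positions, k=8):
    """All (a, b, c) for which the k-term progression a, a+b, ..., a+(k-1)b
    mod n lies strictly above the constant progression (c, ..., c)."""
    found = []
    for a, b, c in product(range(n), repeat=3):
        prog = [(a + i * b) % n for i in range(k)]
        if prog != [c] * k and dominates(prog, c, reversed_positions):
            found.append((a, b, c))
    return found


# Orderings are numbered 1..8 in the text; positions here are 0-based.
MIRROR_A = frozenset({0, 1, 2, 4})  # orderings 1, 2, 3, 5 reversed
MIRROR_B = frozenset({3, 5, 6, 7})  # orderings 4, 6, 7, 8 reversed

for n in range(1, 31):
    assert counterexamples(n, MIRROR_A) == []
    assert counterexamples(n, MIRROR_B) == []
```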

In fact, given a ${k}$-term progression mod ${n}$ and a constant, we can form a ${k}$-term binary sequence with a ${1}$ for each step of the progression that is greater than the constant and a ${0}$ for each step that is less. Because a rotation map, viewed as a dynamical system, has zero topological entropy, the number of ${k}$-term binary sequences that appear grows subexponentially in ${k}$. Hence, for large enough ${k}$, there must be at least one sequence that does not appear. In this proof we exploit a sequence that does not appear for ${k=8}$.

Filed under: expository, math.CO, math.RA Tagged: polynomial method, tensors, Will Sawin

### David Hogg — spectral signatures of convection; photo-zs without training data

I have spent part of the summer working with Megan Bedell (Chicago) to see if there is any evidence that radial velocity measurements with the HARPS instrument might be being affected by calibration issues or helped by taking some kind of hierarchical approach to calibration. We weren't building that hierarchical model, we were looking to see if there is evidence in the residuals for information that a hierarchical model could latch on to. We found nothing, to my surprise. I think this means that the HARPS pipelines are absolutely awesome. I think they are closed-source, so we can't do much but inspect the output.

Given this, we decided to start looking at stellar diagnostics. If it isn't the instrument calibration, then maybe it is actually the star itself: we need to ask whether we can see spectral signatures that predict radial velocity. This is a very general causal formulation of the problem. We do not expect that a star's spectrum will vary with the phase of an exoplanet's orbit (unless it is a very hot planet!), so if anything about the spectrum predicts the radial velocity, we have something to latch on to. The idea is that we might see the spectral signature of hot up-welling or cold down-welling at the stellar surface. There is much work in this area, but I am not sure that anyone has done anything truly data driven (in the style, for example, of The Cannon). We discussed first steps towards doing that, with Bedell assigned plotting tasks, and me writing down some methodological ideas.

Over lunch, Boris Leistedt and I caught up on all the various projects we like to discuss. He has had the breakthrough that, if you build a proper generative model for galaxy imaging data, you don't need to have spectroscopic training sets, nor good galaxy spectral models, to get good photometric redshifts. The idea is that once you have multi-band photometry, you can predict the appearance of any observed galaxy as it would appear at any other redshift using a flexible, non-parametric SED model that isn't tied to any physical galaxy model. The idea is that we use all of, but only, what we believe about how the redshift works, physically. Most machine-learning methods aren't required to get the redshift physics right, and most template-based models assume lots of auxiliary things about stars and stellar populations and dust. We also realized that, if done correctly, this method could subsume into itself the cross-correlation redshifts that the LSST project is excited about.

### Doug Natelson — Proxima Centauri's planet and the hazards of cool animations

It was officially announced today that Proxima Centauri has a potentially earthlike planet.  That's great, especially for fans of science fiction.  Here is a relevant video by Nature:

Did you spot the mistake?  The scientists discovered the planet by seeing the wobble in the star's motion (measured by painstaking spectroscopy of the starlight, and using the Doppler shift of the spectrum to "see" the tiny motion of the star).  The animation tries to show this at 0:55-1:12.  The wobble is because the star and planet actually orbit around a common center of mass located on the line between them.  Instead, the video seems to show the center of mass of the star+planet tracing out a circle around empty space.  Whoops.   Someone should've caught that.  Still an impressive result.

Update:  The makers of the video have updated with a link to a more accurate animation of the Doppler approach:  https://youtu.be/B-oZYm3L1JE.

### Tommaso Dorigo — A Great Blitz Game

As an old time chessplayer who's stopped competing in tournaments, I often entertain myself with the odd blitz game in some internet chess server. And more often than not, I play rather crappy chess. So nothing to report there... However fluctuations do occur.
I just played a combinative-style game which I wish to share, although I did not have the time yet (and I think I won't have time in the near future) to check the moves with a computer program. So my moves might well be flawed. Regardless, I enjoyed playing the game so that's enough motivation to report it here.

### David Hogg — the best image differencing ever

I had the pleasure today of reading two draft papers, one by Dun Wang on our alternative to difference imaging based on our data-driven pixel-level model of the Kepler K2 data, and the other by Huanian Zhang (Arizona) on H-alpha emission from the outskirts of distant galaxies. Wang's paper shows (what I believe to be) the most precise image differences ever created. Of course we had amazing data to start with! But his method for image differencing is unusual; it doesn't require a model of either PSF or of the difference between them. It just empirically figures out what linear combinations of pixels in the target image predict each pixel in the target image, using the other images to determine these predictor combinations. It works very well and has been used to find microlensing events in the K2C9 data, but it has the disadvantage that it needs to run on a variability campaign; it can't be run on just two images.

The Zhang paper uses enormous numbers of galaxy-spectrum pairs in the SDSS spectroscopic samples to find H-alpha emission from the outskirts of (or—more precisely—angularly correlated with) nearby galaxies. He detects a signal! And it is 30 times fainter than any previous upper limit. So it is big news, I think, and has implications for the radiation environments of galaxies in the nearby Universe.

## August 24, 2016

### David Hogg — halo occupation and assembly bias

My research highlight today was a conversation with MJ Vakili about the paper he wrote this summer about halo occupation and what's known as “assembly bias”. Perhaps the most remarkable thing about contemporary cosmology is that the dark-matter-only simulations do a great job of explaining the large-scale structure in the galaxy distribution, despite the fact that we don't understand galaxy formation! The connection is a “halo occupation function” that puts galaxies into halos. It turns out that incredibly simple prescriptions work.

I have always been suspicious about halo occupation, because galaxy halos are not fundamental objects in gravity or cosmology; they are defined by a prescription, running on the output of a simulation. That is, they are just effective crutches, used for convenience. There was no reason to put any reality onto a halo (or a sub-halo or anything of the sort). Really there is just a density field! However, empirically, the halo description of the Universe has been both easy and useful.

Now that cosmology is seeking ever higher precision, work has started along the lines of asking what halo properties (mass, velocity amplitude, concentration, and so on) are relevant to the galaxies that form within them. The answer from the data seems to be that mass is the main driving factor. The community has expected a bias or occupation that depends on the time of formation of the halo (which itself relates to the halo concentration parameter). Vakili has been testing this, and the main punchline is that if the effect is there, it is a small one! It is a great result and he is nearly ready to submit.

My question is: Can we step out of the halo box and consider all the ways we might put galaxies into the dark-matter field? Could the data tell us what is most relevant?

### Backreaction — What if the universe was like a pile of laundry?

What if the universe was like a pile of laundry?

Have one.

See this laundry pile? Looks just like our universe.

No?

Here, have another.

See it now? It’s got three dimensions and all.

But look again.

The shirts and towels, they’re really crinkled and interlocked two-dimensional surfaces.

Wait.

It’s one-dimensional yarn, knotted up tightly.

You ok?

Have another.

I see it clearly now. It’s everything at once, one-two-three dimensional. Just depends on how closely you look at it.

Amazing, don’t you think? What if our universe was just like that?

Universal Laundry Pile. [Img Src: Clipartkid]

It doesn’t sound like a sober thought, but it’s got math behind it, so physicists think there might be something to it. Indeed the math piled up lately. They call it “dimensional reduction,” the idea that space on short distances has fewer than three dimensions – and it might help physicists to quantize gravity.

We’ve gotten used to space with additional dimensions, rolled up so small we can’t observe them. But how do you get rid of dimensions instead? To understand how it works we first have to clarify what we mean by “dimension.”

We normally think about dimensions of space by picturing lines which spread from a point. How quickly the lines dilute with the distance from the point tells us the “Hausdorff dimension” of a space. The faster the lines diverge from each other with distance, the larger the Hausdorff dimension. If you speak through a pipe, for example, sound waves spread less and your voice carries farther. The pipe hence has a lower Hausdorff dimension than our normal 3-dimensional office cubicles. It’s the Hausdorff dimension that we colloquially refer to as just dimension.

For dimensional reduction, however, it is not the Hausdorff dimension which is relevant, but instead the “spectral dimension,” which is a slightly different concept. We can calculate it by first getting rid of the “time” in “space-time” and making it into space (period). We then place a random walker at one point and measure the probability that it returns to the same point during its walk. The smaller the average return probability, the higher the probability the walker gets lost, and the higher the number of spectral dimensions.
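For ordinary flat space, with no quantum effects, the random-walk recipe can be tried out in a few lines. The return probability P(σ) after σ steps falls off as σ^(-d_s/2), so the spectral dimension is d_s = -2 Δlog P / Δlog σ. The sketch below is an illustration, not taken from the article. It assumes a walk on the d-dimensional lattice in which every coordinate moves one step left or right at each tick. This simplification makes the return probability exactly the one-dimensional value raised to the power d. On such a flat lattice the estimate reproduces d, as it should when the two notions of dimension agree.

```python
import math


def log_return_prob_1d(t):
    """log of the probability that a +/-1 walk on the integers is back at the
    origin after 2t steps, i.e. log(C(2t, t) / 4^t)."""
    return math.lgamma(2 * t + 1) - 2 * math.lgamma(t + 1) - t * math.log(4)


def spectral_dimension(d, t1=1000, t2=2000):
    """Estimate d_s from P ~ sigma^(-d_s/2) for the d-dimensional walk in which
    every coordinate steps independently, so log P = d * log P_1d."""
    slope = d * (log_return_prob_1d(t2) - log_return_prob_1d(t1)) / math.log(t2 / t1)
    return -2 * slope


for d in (1, 2, 3, 4):
    print(d, round(spectral_dimension(d), 3))
```

In the quantum-gravity approaches described below, it is this log-log slope that runs from 4 towards 2 as the walk probes shorter distances.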

Normally, for a non-quantum space, both notions of dimension are identical. However, add quantum mechanics and the spectral dimension at short distances goes down from four to two. The return probability for short walks becomes larger than expected, and the walker is less likely to get lost – this is what physicists mean by “dimensional reduction.”

The spectral dimension is not necessarily an integer; it can take on any value. This value starts at 4 when quantum effects can be neglected, and decreases when the walker’s sensitivity to quantum effects at shortest distances increases. Physicists therefore also like to say that the spectral dimension “runs,” meaning its value depends on the resolution at which space-time is probed.

Dimensional reduction is an attractive idea because quantizing gravity is considerably easier in lower dimensions where the infinities that plague traditional attempts to quantize gravity go away. A theory with a reduced number of dimensions at shortest distances therefore has much higher chances to remain consistent and so to provide a meaningful theory for the quantum nature of space and time. Not so surprisingly thus, among physicists, dimensional reduction has received quite some attention lately.

This strange property of quantum-spaces was first found in Causal Dynamical Triangulation (hep-th/0505113), an approach to quantum gravity that relies on approximating curved spaces by triangular patches. In this work, the researchers did a numerical simulation of a random walk in such a triangulized quantum-space, and found that the spectral dimension goes down from four to two. Or actually to 1.80 ± 0.25 if you want to know precisely.

Instead of doing numerical simulations, it is also possible to study the spectral dimension mathematically, which has since been done in various other approaches. For this, physicists exploit that the behavior of the random walk is governed by a differential equation – the diffusion equation – which depends on the curvature of space. In quantum gravity, the curvature has quantum fluctuations, and then it’s instead its average value which enters the diffusion equation. From the diffusion equation one then calculates the return probability for the random walk.

This way, physicists have inferred the spectral dimension also in Asymptotically Safe Gravity (hep-th/0508202), an approach to quantum gravity which relies on the resolution-dependence (the “running”) of quantum field theories. And they found the same drop from four to two spectral dimensions.

Another indication comes from Loop Quantum Gravity, where the scaling of the area operator with length changes at short distances. In this case it is somewhat questionable whether the notion of curvature makes sense at all on short distances. But ignoring this, one can construct the diffusion equation and finds that the spectral dimension drops from four to two (0812.2214).

And then there is Horava-Lifshitz gravity, yet another modification of gravity which some believe helps with quantizing it. Here too, dimensional reduction has been found (0902.3657).

It is difficult to visualize what is happening with the dimensionality of space if it goes down continuously, rather than in discrete steps as in the example with the laundry pile. Maybe a good way to picture it, as Calcagni, Eichhorn and Saueressig suggest, is to think of the quantum fluctuations of space-time hindering a particle’s random walk, thereby slowing it down. It wouldn’t have to be that way. Quantum fluctuations could also kick the particle around wildly, thereby increasing the spectral dimension rather than decreasing it. But that’s not what the math tells us.

One shouldn’t take this picture too seriously though, because we’re talking about a random walk in space, not space-time, and so it’s not a real physical process. Turning time into space might seem strange, but it is a common mathematical simplification which is often used for calculations in quantum theory. Still, it makes it difficult to interpret what is happening physically.

I find it intriguing that several different approaches to quantum gravity share a behavior like this. Maybe it is a general property of quantum space-time. But then, there are many different types of random walks, and while these different approaches to quantum gravity share a similar scaling behavior for the spectral dimension, they differ in the type of random walk that produces this scaling (1304.7247). So maybe the similarities are only superficial.

And of course this idea has no observational evidence speaking for it. Maybe never will. But one day, I’m sure, all the math will click into place and everything will make perfect sense. Meanwhile, have another.

[This article first appeared on Starts With A Bang under the title Dimensional Reduction: The Key To Physics' Greatest Mystery?]

### Steinn Sigurðsson — A Pale Red Dot: The Closest Exoplanet

The Pale Red Dot project has found a planet.

It is a terrestrial planet, orbiting in the formal habitable zone of Proxima Centauri, the nearest star to the Solar System.

This wide-field image shows the Milky Way stretching across the southern sky. The beautiful Carina Nebula (NGC 3372) is seen at the right of the image glowing in red. It is within this spiral arm of our Milky Way that the bright star cluster NGC 3603 resides. At the centre of the image is the constellation of Crux (The Southern Cross). The bright yellow/white star at the left of the image is Alpha Centauri, in fact a system of three stars, at a distance of about 4.4 light-years from Earth. The star Alpha Centauri C, Proxima Centauri, is the closest star to the Solar System.

Proxima Centauri is a low-mass red dwarf, and is part of a triple system, the other two stars being α Centauri A and B, which are solar-like stars in a close orbit around each other.

3D map of all known stellar systems in the solar neighbourhood within a radius of 12.5 light-years. The Sun is at the centre. The colour is indicative of the temperature and the spectral class — white stars are (main-sequence) A and F dwarfs; yellow stars like the Sun are G dwarfs; orange stars are K dwarfs; and red stars are M dwarfs, by far the most common type of star in the solar neighbourhood. The blue axes are oriented along the galactic coordinate system, and the radii of the rings are 5, 10, and 15 light-years, respectively.

The whole system is a little over 4 light years away, the nearest stars to the Sun, and Proxima is the closest of the three stars.

This image of the sky around the bright star Alpha Centauri AB also shows the much fainter red dwarf star, Proxima Centauri, the closest star to the Solar System. The picture was created from pictures forming part of the Digitized Sky Survey 2. The blue halo around Alpha Centauri AB is an artifact of the photographic process; the star is really pale yellow in colour like the Sun.

α Centauri B was thought to have a planet, but the evidence for that particular planet is looking shaky.

This picture combines a view of the southern skies over the ESO 3.6-metre telescope at the La Silla Observatory in Chile with images of the stars Proxima Centauri (lower-right) and the double star Alpha Centauri AB (lower-left) from the NASA/ESA Hubble Space Telescope. Proxima Centauri is the closest star to the Solar System and is orbited by the planet Proxima b, which was discovered using the HARPS instrument on the ESO 3.6-metre telescope.

It may have been noise. α Cen A and B were in conjunction, making them hard to observe, but are now separating, and observing campaigns to look for planets around those stars are continuing.

ESO researchers, using the radial velocity variability technique, have detected quite a robust signature of a planet with a mass of 1.3 Earth masses or more, in an 11-day orbit around Proxima Centauri.

This plot shows how the motion of Proxima Centauri towards and away from Earth is changing with time over the first half of 2016. Sometimes Proxima Centauri is approaching Earth at about 5 kilometres per hour — normal human walking pace — and at times receding at the same speed. This regular pattern of changing radial velocities repeats with a period of 11.2 days. Careful analysis of the resulting tiny Doppler shifts showed that they indicated the presence of a planet with a mass at least 1.3 times that of the Earth, orbiting about 7 million kilometres from Proxima Centauri — only 5% of the Earth-Sun distance.
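The numbers in this caption hang together through two textbook relations. The first is Kepler's third law, which gives the orbital distance from the period. The second is the radial-velocity relation, which gives a minimum mass m sin i from the wobble amplitude K, assuming a circular orbit and a planet much lighter than its star. The sketch below is a back-of-the-envelope check, not the analysis used in the paper. It takes K ≈ 5 km/h and P = 11.2 days from the caption. The stellar mass of 0.12 solar masses is our assumption, as are the rounded physical constants.

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30     # kg
M_EARTH = 5.972e24   # kg
AU = 1.496e11        # m
DAY = 86400.0        # s


def semi_major_axis_m(period_days: float, star_mass_msun: float) -> float:
    """Kepler's third law, a^3 = G M P^2 / (4 pi^2), neglecting the planet's mass."""
    p = period_days * DAY
    return (G * star_mass_msun * M_SUN * p ** 2 / (4 * math.pi ** 2)) ** (1 / 3)


def minimum_mass_kg(k_m_s: float, period_days: float, star_mass_msun: float) -> float:
    """Minimum mass m sin(i) from the radial-velocity semi-amplitude K.

    Assumes a circular orbit and a planet much lighter than its star:
    m sin(i) = K * (P / (2 pi G))^(1/3) * M_star^(2/3)."""
    p = period_days * DAY
    return k_m_s * (p / (2 * math.pi * G)) ** (1 / 3) * (star_mass_msun * M_SUN) ** (2 / 3)


K = 5.0 / 3.6        # "about 5 kilometres per hour", converted to m/s
P = 11.2             # orbital period in days
M_STAR = 0.12        # assumed mass of Proxima Centauri, in solar masses

a = semi_major_axis_m(P, M_STAR)
print(f"a = {a / 1e9:.1f} million km = {a / AU:.3f} AU")
print(f"m sin i = {minimum_mass_kg(K, P, M_STAR) / M_EARTH:.2f} Earth masses")
```

With these inputs, the orbit comes out at roughly 7.2 million km (about 5% of an AU), matching the caption. The minimum mass is a bit under 1.2 Earth masses, somewhat below the quoted 1.3. That gap is unsurprising given the rounded amplitude and the assumed stellar mass.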

Since Proxima Centauri is almost 1,000 times fainter than the Sun, this puts the putative planet well within the habitable zone of the star, near the inner edge of the zone, but formally inside it.
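The reasoning here is the inverse-square law: the flux a planet receives scales as L/a². For a star a factor f fainter than the Sun, the distance receiving Earth-like flux therefore shrinks by √f. For f ≈ 1000, that is a factor of about 32. A minimal sketch follows. The bolometric luminosity of about 0.0015 solar luminosities is our assumption, sharper than the round "1,000 times fainter", and the 0.048 AU distance is the rounded value from the caption above.

```python
import math


def relative_insolation(luminosity_lsun: float, distance_au: float) -> float:
    """Flux at the planet relative to the flux Earth gets from the Sun (inverse-square law)."""
    return luminosity_lsun / distance_au ** 2


def earth_equivalent_distance_au(luminosity_lsun: float) -> float:
    """Distance at which a planet would receive exactly Earth's flux."""
    return math.sqrt(luminosity_lsun)


L_PROXIMA = 0.0015   # assumed bolometric luminosity of Proxima Centauri, solar units
A_PLANET = 0.048     # AU, roughly 7 million km

print(relative_insolation(L_PROXIMA, A_PLANET))
print(earth_equivalent_distance_au(L_PROXIMA))
```

Under these assumptions the planet receives roughly two thirds of the flux Earth gets from the Sun. Where that sits relative to the habitable-zone edges depends on climate models that this arithmetic does not capture.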

This infographic compares the orbit of the planet around Proxima Centauri (Proxima b) with the same region of the Solar System. Proxima Centauri is smaller and cooler than the Sun and the planet orbits much closer to its star than Mercury. As a result it lies well within the habitable zone, where liquid water can exist on the planet’s surface.

The planet would most likely be tidally locked to the star, and might either have one face locked to the star (like the Moon to the Earth) or, conceivably, be in a 2:3 tidal lock, like Mercury is with the Sun.
In either case, it is conceivable for this planet to have liquid water on its surface, IF it has an atmosphere of reasonable thickness and nice enough composition.

This artist’s impression shows the planet Proxima b orbiting the red dwarf star Proxima Centauri, the closest star to the Solar System. The double star Alpha Centauri AB also appears in the image between the planet and Proxima itself. Proxima b is a little more massive than the Earth and orbits in the habitable zone around Proxima Centauri, where the temperature is suitable for liquid water to exist on its surface.

This is the nearest star to the Sun.

The relative sizes of a number of objects, including the three (known) members of Alpha Centauri triple system and some other stars for which the angular sizes have also been measured with the Very Large Telescope Interferometer (VLTI) at the ESO Paranal Observatory. The Sun and planet Jupiter are also shown for comparison.

It has an Earth-like planet orbiting in a nice orbit.
It will be about as easy to characterise as any exoplanet ever.
There really are a lot of exoplanets everywhere, and a lot of them are Earth mass, and a lot of those are in nice orbits.

What a nice Universe.

Next, a nice intense workshop on how to find lots of low mass planets in the habitable zones of lots of stars in the near future.
Should be a fun workshop.

ESO press release – with bonus videos and extra graphics and links.

“A terrestrial planet candidate in a temperate orbit around Proxima Centauri” Anglada-Escudé et al. Nature, 25 August, 2016

Two new papers on the topic:

“Proxima Centauri as a Benchmark for Stellar Activity Indicators in the Near Infrared” Paul Robertson et al., ApJ submitted – stellar photometric activity may mimic low amplitude radial velocity variability, and cause false positive signals for candidate planets. This paper looks at multi-band photometry over a long timeline for Proxima Centauri to characterize time scales on which the star varies.
tl;dr there is very little variability on 5-15 day time scales, which makes it very unlikely the planet candidate is a false positive due to stellar variability.

“Effects of Proxima Centauri on Planet Formation in Alpha Centauri” Worth & Sigurdsson, ApJ in press – theory paper by Rachel Worth, my PhD student, on planet formation models for the α Centauri system, taking its putative dynamical history into account.
tl;dr – theoretically there can be planets in the system, few, low mass in close orbits, including around Proxima Centauri. Details could elucidate past history and formation of system.

### Tommaso Dorigo — Anomaly!: Book News And A Clip

The book "Anomaly! Collider Physics and the Quest for New Phenomena at Fermilab" is going to press as we speak, and its distribution in bookstores is foreseen for the beginning of November. In the meantime, I am getting ready to present it in several laboratories and institutes. I am posting here the coordinates of events which are already scheduled, in case anybody lives nearby and/or has an interest in attending.
- On November 29th at 4PM there will be a presentation at CERN (more details will follow).

### Jordan Ellenberg — The Wisconsin Supreme Court gets home rule wrong and right

The Supreme Court made a decision in the Milwaukee police officer residency requirement case I wrote about, peevishly and at length, earlier this year.  Chief Justice Michael Gableman is still claiming the home rule amendment says something it doesn’t say; whether he’s confused or cynical I can’t say.

the home rule amendment gives cities and villages the ability “to determine their local affairs and government, subject only to this constitution and to such enactments of the legislature of statewide concern as with uniformity shall affect every city or every village.”  In other words, a city or village may, under its home rule authority, create a law that deals with its local affairs, but the Legislature has the power to statutorily override the city’s or village’s law if the state statute touches upon a matter of statewide concern or if the state statute uniformly affects every city or village. See Madison Teachers, 358 Wis. 2d 1, ¶101.

“In other words,” phooey.  The amendment says a state enactment has to be of statewide concern and uniform in its effect.  Gableman turns the “and” into an “or,” giving the state much greater leeway to bend cities to its will.  The citation, by the way, is to his own opinion in the Act 10 case, where he’s wrong for the same reason.

But here’s the good news.  Rebecca Bradley, the newest justice, wrote a blistering concurrence (scroll to paragraph 52 of the opinion) which gets the amendment right.  She agrees with the majority that the state has constitutional authority to block Milwaukee’s residency requirement.  But the majority’s means of reaching that conclusion is wrong.  Bradley explains: by the home rule amendment’s plain text and by what its drafters said at the time of its composition, it is and, not or; for a state law to override a city law, it has to involve a matter of statewide concern and apply uniformly to all municipalities.  Here’s Daniel Hoan, mayor of Milwaukee, and one of the main authors of the home rule amendment:

We submit that this wording is not ambiguous as other constitutional Home Rule amendments may be. It does not say——subject to state laws, subject to state laws of state-wide concern, or subject to laws uniformly affecting cities, but it does say——subject only to such state laws as are therein defined, and these laws must meet two tests: First——do they involve a subject of statewide concern, and second——do they with uniformity affect every city or village?

Bradley concedes that decades of Supreme Court precedent interpret the amendment wrongly.  So screw the precedent, she writes!  OK, she doesn’t actually write that.  But words to that effect.

I know I crap on Scalia-style originalism a lot, partly because I think it’s often a put-on.  But this is the real thing.

## August 23, 2016

### Clifford Johnson — Sometimes a Sharpie…

Sometimes a sharpie and a bit of bristol are the best defense against getting lost in the digital world*...

(Throwing down some additional faces for a story in the book. Just wasn't feeling it in [...]

The post Sometimes a Sharpie… appeared first on Asymptotia.

### Doug Natelson — Statistical and Thermal Physics

Eight years ago I taught Rice's undergraduate Statistical and Thermal Physics course, and now after teaching the honors intro physics class for a while, I'm returning to it.   I posted about the course here, and I still feel the same - the subject matter is intellectually very deep, and it's the third example in the undergraduate curriculum (after electricity and magnetism and quantum mechanics) where students really need to pick up a different way of thinking about the world, a formalism that can seem far removed from their daily experience.

One aspect of the course, the classical thermodynamic potentials and how one goes back and forth between them, nearly always comes across as obscure and quasi-magical the first (or second) time students are exposed to it.  Since the last time I taught the course, a nice expository article about why the math works has appeared in the American Journal of Physics (arxiv version).
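For reference, here are the standard relations in question for a simple system with a fixed number of particles. The chemical-potential terms are dropped for brevity. Each potential is a Legendre transform of the internal energy U that swaps one natural variable for its conjugate, and the differentials follow directly from dU:

$$
\begin{aligned}
dU &= T\,dS - p\,dV, \\
F &= U - TS, & dF &= -S\,dT - p\,dV, \\
H &= U + pV, & dH &= T\,dS + V\,dp, \\
G &= U - TS + pV, & dG &= -S\,dT + V\,dp.
\end{aligned}
$$

For example, dF = dU - T dS - S dT, and substituting dU cancels the T dS term, leaving -S dT - p dV.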

Any readers have insights/suggestions on other nice, recent pedagogical resources for statistical and thermal physics?

## August 22, 2016

### Particlebites — The CMB sheds light on galaxy clusters: Observing the kSZ signal with ACT and BOSS

Article: Detection of the pairwise kinematic Sunyaev-Zel’dovich effect with BOSS DR11 and the Atacama Cosmology Telescope
Authors: F. De Bernardis, S. Aiola, E. M. Vavagiakis, M. D. Niemack, N. Battaglia, and the ACT Collaboration
Reference: arXiv:1607.02139

Editor’s note: this post is written by one of the students involved in the published result.

Like X-rays shining through your body can inform you about your health, the cosmic microwave background (CMB) shining through galaxy clusters can tell us about the universe we live in. When light from the CMB is distorted by the high energy electrons present in galaxy clusters, it’s called the Sunyaev-Zel’dovich effect. A new 4.1σ measurement of the kinematic Sunyaev-Zel’dovich (kSZ) signal has been made from the most recent Atacama Cosmology Telescope (ACT) cosmic microwave background (CMB) maps and galaxy data from the Baryon Oscillation Spectroscopic Survey (BOSS). With steps forward like this one, the kinematic Sunyaev-Zel’dovich signal could become a probe of cosmology, astrophysics and particle physics alike.

### The Kinematic Sunyaev-Zel’dovich Effect

It rolls right off the tongue, but what exactly is the kinematic Sunyaev-Zel’dovich signal? Galaxy clusters distort the cosmic microwave background before it reaches Earth, so we can learn about these clusters by looking at these CMB distortions. In our X-ray metaphor, the map of the CMB is the image of the X-ray of your arm, and the galaxy clusters are the bones. Galaxy clusters are the largest gravitationally bound structures we can observe, so they serve as important tools to learn more about our universe. In its essence, the Sunyaev-Zel’dovich effect is inverse-Compton scattering of cosmic microwave background photons off of the gas in these galaxy clusters, whereby the photons gain a “kick” in energy by interacting with the high energy electrons present in the clusters.

The Sunyaev-Zel’dovich effect can be divided up into two categories: thermal and kinematic. The thermal Sunyaev-Zel’dovich (tSZ) effect is the spectral distortion of the cosmic microwave background in a characteristic manner due to the photons gaining, on average, energy from the hot (~10⁷–10⁸ K) gas of the galaxy clusters. The kinematic (or kinetic) Sunyaev-Zel’dovich (kSZ) effect is a second-order effect, about a factor of 10 smaller than the tSZ effect, that is caused by the motion of galaxy clusters with respect to the cosmic microwave background rest frame. If the CMB photons pass through galaxy clusters that are moving, they are Doppler shifted due to the cluster’s peculiar velocity (the velocity that cannot be explained by Hubble’s law, which states that objects recede from us at a speed proportional to their distance). The kinematic Sunyaev-Zel’dovich effect is the only known way to directly measure the peculiar velocities of objects at cosmological distances, and is thus a valuable source of information for cosmology. It allows us to probe megaparsec and gigaparsec scales – that’s around 30,000 times the diameter of the Milky Way!
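In the simplest non-relativistic, single-scattering approximation, the kSZ shift is ΔT/T_CMB = -τ v_los/c. Here τ is the cluster's optical depth to scattering off free electrons, and v_los is its line-of-sight peculiar velocity, taken as positive when receding. The quick sketch below shows the size of the effect. The optical depth of 0.001 and velocity of 300 km/s are illustrative values of our own choosing, not numbers from the paper, and the function name is hypothetical.

```python
T_CMB_K = 2.7255          # mean CMB temperature, kelvin
C_KM_S = 299792.458       # speed of light, km/s


def ksz_delta_t_uk(tau: float, v_los_km_s: float) -> float:
    """kSZ temperature shift in microkelvin: dT/T = -tau * v_los / c.

    v_los > 0 means the cluster recedes from us (a decrement);
    v_los < 0 means it approaches (an increment)."""
    return -T_CMB_K * 1e6 * tau * v_los_km_s / C_KM_S


print(ksz_delta_t_uk(1e-3, 300.0))    # receding cluster
print(ksz_delta_t_uk(1e-3, -300.0))   # approaching cluster
```

With these inputs the shift is only a few microkelvin per cluster, which is why the measurement leans on statistics over many objects rather than on individual detections.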

A schematic of the Sunyaev-Zel’dovich effect resulting in higher energy (or blue shifted) photons of the cosmic microwave background (CMB) when viewed through the hot gas present in galaxy clusters. Source: UChicago Astronomy.

### Measuring the kSZ Effect

To make the measurement of the kinematic Sunyaev-Zel’dovich signal, the Atacama Cosmology Telescope (ACT) collaboration used a combination of cosmic microwave background maps from two years of observations by ACT. The CMB map used for the analysis overlapped with ~68000 galaxy sources from the Large Scale Structure (LSS) DR11 catalog of the Baryon Oscillation Spectroscopic Survey (BOSS). The catalog lists the coordinate positions of galaxies along with some of their properties. The most luminous of these galaxies were assumed to be located at the centers of galaxy clusters, so temperature signals from the CMB map were taken at the coordinates of these galaxy sources in order to extract the Sunyaev-Zel’dovich signal.

While the smallness of the kSZ signal with respect to the tSZ signal and the noise level in current CMB maps poses an analysis challenge, there exist several approaches to extracting the kSZ signal. To make their measurement, the ACT collaboration employed a pairwise statistic. “Pairwise” refers to the momentum between pairs of galaxy clusters, and “statistic” indicates that a large sample is used to rule out the influence of unwanted effects.

Here’s the approach: nearby galaxy clusters move towards each other on average, due to gravity. We can’t easily measure the three-dimensional momentum of clusters, but the average pairwise momentum can be estimated by using the line of sight component of the momentum, along with other information such as redshift and angular separations between clusters. The line of sight momentum is directly proportional to the measured kSZ signal: the microwave temperature fluctuation which is measured from the CMB map. We want to know if we’re measuring the kSZ signal when we look in the direction of galaxy clusters in the CMB map. Using the observed CMB temperature to find the line of sight momenta of galaxy clusters, we can estimate the mean pairwise momentum as a function of cluster separation distance, and check to see if we find that nearby galaxies are indeed falling towards each other. If so, we know that we’re observing the kSZ effect in action in the CMB map.
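The estimator described here can be sketched in a few lines. This is a minimal, unoptimized version written from the description above, combined with the form of the pairwise estimator we believe is standard in this line of work. Each pair's temperature difference is weighted by a geometric factor c_ij = r̂_ij · (r̂_i + r̂_j)/2, which projects the pair's separation onto the two lines of sight. The function and parameter names are ours, and the weight should be checked against arXiv:1607.02139 before being relied on. With the sign convention used here, a negative value in a separation bin means pairs at that separation are, on average, approaching each other.

```python
from __future__ import annotations

import math

Vec3 = tuple[float, float, float]


def _norm(v: Vec3) -> float:
    return math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)


def pairwise_ksz(positions: list[Vec3], delta_t: list[float],
                 bin_edges: list[float]) -> list[float | None]:
    """Mean pairwise kSZ estimate in separation bins (hypothetical helper).

    positions: comoving Cartesian positions, observer at the origin.
    delta_t:   kSZ temperature shift measured at each cluster.
    Returns one value per bin, or None where a bin contains no pairs."""
    nbins = len(bin_edges) - 1
    num = [0.0] * nbins
    den = [0.0] * nbins
    for i in range(len(positions)):
        ri, di = positions[i], _norm(positions[i])
        for j in range(i + 1, len(positions)):
            rj, dj = positions[j], _norm(positions[j])
            sep_vec = (ri[0] - rj[0], ri[1] - rj[1], ri[2] - rj[2])
            sep = _norm(sep_vec)
            if sep == 0.0:
                continue
            # Geometric weight c_ij = rhat_ij . (rhat_i + rhat_j) / 2
            c = sum(s / sep * (a / di + b / dj) / 2
                    for s, a, b in zip(sep_vec, ri, rj))
            for k in range(nbins):
                if bin_edges[k] <= sep < bin_edges[k + 1]:
                    num[k] += (delta_t[i] - delta_t[j]) * c
                    den[k] += c * c
                    break
    # Minus sign: the kSZ shift is proportional to minus the line-of-sight velocity.
    return [-n / d if d > 0 else None for n, d in zip(num, den)]


# Two clusters on one line of sight: the near one recedes (decrement), the far one
# approaches (increment), so they are falling toward each other.
print(pairwise_ksz([(0.0, 0.0, 100.0), (0.0, 0.0, 110.0)], [-1.0, 1.0], [0.0, 5.0, 20.0]))
```

The double loop costs O(N²) in the number of clusters. For tens of thousands of sources, a real analysis would vectorize it or restrict it to pairs within the largest separation bin.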

For the measurement quoted in their paper, the ACT collaboration finds the average pairwise momentum as a function of galaxy cluster separation, and explores a variety of error determinations and sources of systematic error. The most conservative errors based on simulations give signal-to-noise estimates that vary between 3.6 and 4.1.

The mean pairwise momentum estimator and best fit model for a selection of 20000 objects from the DR11 Large Scale Structure catalog, plotted as a function of comoving separation. The dashed line is the linear model, and the solid line is the model prediction including nonlinear redshift space corrections. The best fit provides a 4.1σ evidence of the kSZ signal in the ACTPol-ACT CMB map. Source: arXiv:1607.02139.

The ACT and BOSS results are an improvement on the 2012 ACT detection, and are comparable with results from the South Pole Telescope (SPT) collaboration that use galaxies from the Dark Energy Survey. The ACT and BOSS measurement represents a step forward towards improved extraction of kSZ signals from CMB maps. Future surveys such as Advanced ACTPol, SPT-3G, the Simons Observatory, and next-generation CMB experiments will be able to apply the methods discussed here to improved CMB maps in order to achieve strong detections of the kSZ effect. With new data that will enable better measurements of galaxy cluster peculiar velocities, the pairwise kSZ signal will become a powerful probe of our universe in the years to come.

### Implications and Future Experiments

One interesting consequence for particle physics will be more stringent constraints on the sum of the neutrino masses from the pairwise kinematic Sunyaev-Zel’dovich effect. Upper bounds on the neutrino mass sum from cosmological measurements of large scale structure and the CMB have the potential to determine the neutrino mass hierarchy, one of the next major unknowns of the Standard Model to be resolved, if the mass hierarchy is indeed a “normal hierarchy” with ν₃ being the heaviest mass state. If the upper bound of the neutrino mass sum is measured to be less than 0.1 eV, the inverted hierarchy scenario would be ruled out, due to there being a lower limit on the mass sum of ~0.095 eV for an inverted hierarchy and ~0.056 eV for a normal hierarchy.
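The two lower limits follow from the measured mass-squared splittings by setting the lightest neutrino mass to zero. The sketch below reproduces that arithmetic. The splitting values are approximate round numbers we chose, not figures from the post. As a simplification, the same "atmospheric" splitting is used for both orderings.

```python
import math

DM2_21 = 7.5e-5    # eV^2, "solar" mass splitting (approximate, assumed)
DM2_ATM = 2.4e-3   # eV^2, size of the "atmospheric" splitting (approximate, assumed)


def neutrino_mass_sum(lightest_ev: float, ordering: str) -> float:
    """Sum of the three neutrino masses in eV, given the lightest mass.

    Simplification: the same atmospheric splitting is used for both orderings."""
    m0 = lightest_ev
    if ordering == "normal":        # nu_1 lightest, nu_3 heaviest
        m1 = m0
        m2 = math.sqrt(m1 ** 2 + DM2_21)
        m3 = math.sqrt(m1 ** 2 + DM2_ATM)
    elif ordering == "inverted":    # nu_3 lightest, nu_1 and nu_2 nearly degenerate
        m3 = m0
        m1 = math.sqrt(m3 ** 2 + DM2_ATM)
        m2 = math.sqrt(m1 ** 2 + DM2_21)
    else:
        raise ValueError("ordering must be 'normal' or 'inverted'")
    return m1 + m2 + m3


for ordering in ("normal", "inverted"):
    print(ordering, round(neutrino_mass_sum(0.0, ordering), 3))
```

With these inputs, the minimum sums come out at about 0.058 eV (normal) and 0.099 eV (inverted). That is close to, though not exactly, the ~0.056 and ~0.095 eV quoted, since the limits shift with the adopted splittings. Any cosmological upper bound below the inverted minimum excludes that ordering.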

Forecasts for kSZ measurements in combination with input from Planck predict possible constraints on the neutrino mass sum with a precision of 0.29 eV, 0.22 eV and 0.096 eV for Stage II (ACTPol + BOSS), Stage III (Advanced ACTPol + BOSS) and Stage IV (next generation CMB experiment + DESI) surveys respectively, with the possibility of much improved constraints with optimal conditions. As cosmic microwave background maps are improved and Sunyaev-Zel’dovich analysis methods are developed, we have a lot to look forward to.

### Sean Carroll — Maybe We Do Not Live in a Simulation: The Resolution Conundrum

Greetings from bucolic Banff, Canada, where we’re finishing up the biennial Foundational Questions Institute conference. To a large extent, this event fulfills the maxim that physicists like to fly to beautiful, exotic locations, and once there they sit in hotel rooms and talk to other physicists. We did manage to sneak out into nature a couple of times, but even there we were tasked with discussing profound questions about the nature of reality. Evidence: here is Steve Giddings, our discussion leader on a trip up the Banff Gondola, being protected from the rain as he courageously took notes on our debate over “What Is an Event?” (My answer: an outdated notion, a relic of our past classical ontologies.)

One fun part of the conference was a “Science Speed-Dating” event, where a few of the scientists and philosophers sat at tables to chat with interested folks who switched tables every twenty minutes. One of the participants was philosopher David Chalmers, who decided to talk about the question of whether we live in a computer simulation. You probably heard about this idea long ago, but public discussion of the possibility was recently re-ignited when Elon Musk came out as an advocate.

At David’s table, one of the younger audience members raised a good point: even simulated civilizations will have the ability to run simulations of their own. But a simulated civilization won’t have access to as much computing power as the one that is simulating it, so the lower-level sims will necessarily have lower resolution. No matter how powerful the top-level civilization might be, there will be a bottom level that doesn’t actually have the ability to run realistic civilizations at all.

This raises a conundrum, I suggest, for the standard simulation argument — i.e. not only the offhand suggestion “maybe we live in a simulation,” but the positive assertion that we probably do. Here is one version of that argument:

1. We can easily imagine creating many simulated civilizations.
2. Things that are that easy to imagine are likely to happen, at least somewhere in the universe.
3. Therefore, there are probably many civilizations being simulated within the lifetime of our universe. Enough that there are many more simulated people than people like us.
4. Likewise, it is easy to imagine that our universe is just one of a large number of universes being simulated by a higher civilization.
5. Given a meta-universe with many observers (perhaps of some specified type), we should assume we are typical within the set of all such observers.
6. A typical observer is likely to be in one of the simulations (at some level), rather than a member of the top-level civilization.
7. Therefore, we probably live in a simulation.

Of course one is welcome to poke holes in any of the steps of this argument. But let’s for the moment imagine that we accept them. And let’s add the observation that the hierarchy of simulations eventually bottoms out, at a set of sims that don’t themselves have the ability to perform effective simulations. Given the above logic, including the idea that civilizations that have the ability to construct simulations usually construct many of them, we inevitably conclude:

• We probably live in the lowest-level simulation, the one without an ability to perform effective simulations. That’s where the vast majority of observers are to be found.
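A toy count shows why the observers pile up at the bottom. Suppose there is one top-level civilization, every civilization above the bottom runs k simulations, and nesting stops after a fixed depth. Then level l holds k^l civilizations, and the deepest level holds a fraction of order (k-1)/k of all of them. The model and its assumptions are ours, purely for illustration: a fixed branching number, a fixed depth, and equal numbers of observers per civilization.

```python
from fractions import Fraction


def bottom_level_fraction(sims_per_civ: int, depth: int) -> Fraction:
    """Share of all civilizations (simulated or not) sitting at the deepest level.

    Toy assumptions: one top-level civilization at level 0; every civilization
    above the bottom runs exactly `sims_per_civ` simulations; nesting stops at
    `depth`; every civilization holds the same number of observers."""
    counts = [sims_per_civ ** level for level in range(depth + 1)]
    return Fraction(counts[-1], sum(counts))


for depth in (1, 2, 5):
    print(depth, float(bottom_level_fraction(10, depth)))
```

With ten simulations per civilization, the bottom level holds roughly 90% of all civilizations however deep the tower goes. Under the typicality premise, that is where "we" would most likely be.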

Hopefully the conundrum is clear. The argument started with the premise that it wasn’t that hard to imagine simulating a civilization — but the conclusion is that we shouldn’t be able to do that at all. This is a contradiction, therefore one of the premises must be false.

This isn’t such an unusual outcome in these quasi-anthropic “we are typical observers” kinds of arguments. The measure on all such observers often gets concentrated on some particular subset of the distribution, which might not look like we look at all. In multiverse cosmology this shows up as the “youngness paradox.”

Personally I think that premise 1. (it’s easy to perform simulations) is a bit questionable, and premise 5. (we should assume we are typical observers) is more or less completely without justification. If we know that we are members of some very homogeneous ensemble, where every member is basically the same, then by all means typicality is a sensible assumption. But when ensembles are highly heterogeneous, and we actually know something about our specific situation, there’s no reason to assume we are typical. As James Hartle and Mark Srednicki have pointed out, that’s a fake kind of humility — by asserting that “we are typical” in the multiverse, we’re actually claiming that “typical observers are like us.” Who’s to say that is true?

I highly doubt this is an original argument, so probably simulation cognoscenti have debated it back and forth, and likely there are standard responses. But it illustrates the trickiness of reasoning about who we are in a very big cosmos.

### Tommaso Dorigo — Post-Doctoral Positions In Experimental Physics For Foreigners

The Italian National Institute for Nuclear Physics offers 20 post-doctoral positions in experimental physics to foreigners with a PhD obtained no earlier than November 2008.
So if you have a PhD (in Physics, but I guess other disciplines are also valid as long as your CV conforms) and you like Italy, or if you would like to come and work with me on the search and study of the Higgs boson with the CMS experiment (or even if you would like to do something very different, in another town, with another experiment), you might consider applying!

The economic conditions are not extraordinary in an absolute sense, but you would still end up getting a salary more or less like mine, which in Italy sort of allows one to live a decent life.

### John Preskill — Toward a Coherent US Government Strategy for QIS

In an upbeat  recent post, Spiros reported some encouraging news about quantum information science from the US National Science and Technology Council. Today I’ll chime in with some further perspective and background.

The Interagency Working Group on Quantum Information Science (IWG on QIS), which began its work in late 2014, was charged “to assess Federal programs in QIS, monitor the state of the field, provide a forum for interagency coordination and collaboration, and engage in strategic planning of Federal QIS activities and investments.”  The IWG recently released a  well-crafted report, Advancing Quantum Information Science: National Challenges and Opportunities. The report recommends that “quantum information science be considered a priority for Federal coordination and investment.”

All the major US government agencies supporting QIS were represented on the IWG, which was co-chaired by officials from DOE, NSF, and NIST:

• Steve Binkley, who heads the Advanced Scientific Computing Research (ASCR) program in the Department of Energy Office of Science,
• Denise Caldwell, who directs the Physics Division of the National Science Foundation,
• Carl Williams, Deputy Director of the Physical Measurement Laboratory at the National Institute of Standards and Technology.

Denise and Carl have been effective supporters of QIS over many years of government service. Steve has recently emerged as another eloquent advocate for the field’s promise and importance.

At our request, the three co-chairs fielded questions about the report, with the understanding that their responses would be broadly disseminated. Their comments reinforced the message of the report — that all cognizant agencies favor a “coherent, all-of-government approach to QIS.”

Science funding in the US differs from elsewhere in the world. QIS is a prime example — for over 20 years, various US government agencies, each with its own mission, goals, and culture, have had a stake in QIS research. By providing more options for supporting innovative ideas, the existence of diverse autonomous funding agencies can be a blessing. But it can also be bewildering for scientists seeking support, and it poses challenges for formulating and executing effective national science policy. It’s significant that many different agencies worked together in the IWG, and were able to align with a shared vision.

“I think that everybody in the group has the same goals,” Denise told us. “The nation has a tremendous opportunity here. This is a terrifically important field for all of us involved, and we all want to see it succeed.” Carl added, “All of us believe that this is an area in which the US must be competitive, it is very important for both scientific and technological reasons … The differences [among agencies] are minor.”

Asked about the timing of the IWG and its report, Carl noted the recent trend toward “emerging niche applications” of QIS such as quantum sensors, and Denise remarked that government agencies are responding to a plea from industry for a cross-disciplinary work force broadly trained in QIS. At the same time, Denise emphasized, the IWG recognizes that “there are still many open basic science questions that are important for this field, and we need to focus investment onto these basic science questions, as well as look at investments or opportunities that lead into the first applications.”

DOE’s FY2017 budget request includes \$10M to fund a new QIS research program, coordinated with NIST and NSF. Steve explained the thinking behind that request:  “There are problems in the physical science space, spanned by DOE Office of Science programs, where quantum computation would be a useful a tool. This is the time to start making investments in that area.” Asked about the longer term commitment of DOE to QIS research, Steve was cautious. “What it will grow into over time is hard to tell — we’re right at the beginning.”

What can the rest of us in the QIS community do to amplify the impact of the report? Carl advised: “All of us should continue getting the excitement of the field out there, [and point to] the potential long term payoffs,  whether they be in searches for dark matter or building better clocks or better GPS systems or better sensors. Making everybody aware of all the potential is good for our economy, for our country, and for all of us.”

Taking an even longer view, Denise reminded us that effective advocacy for QIS can get young people “excited about a field they can work in, where they can get jobs, where they can pursue science — that can be critically important.  If we all think back to our own beginning careers, at some point in time we got excited about science. And so whatever one can do to excite the next generation about science and technology, with the hope of bringing them into studying and developing careers in this field, to me this is tremendously valuable. ”

All of us in the quantum information science community owe a debt to the IWG for their hard work and eloquent report, and to the agencies they represent for their vision and support. And we are all fortunate to be participating in the early stages of a new quantum revolution. As the IWG report makes clear, the best is yet to come.

## August 21, 2016

### Chad Orzel — 314-335/366: Massive Backlog

It’s been over a month since I did a photo-a-day post, largely because I haven’t been taking many pictures for a variety of reasons. I do still mean to get a year’s worth of good photos done, but the “daily” part has completely disintegrated at this point.

As a way of getting somewhat back on track, I’ve edited up the best of the shots I took in the loooong break since the last batch I posted. This spans all the way up to the present, and a few are just cell-phone snaps, but it’s better than nothing. Since these aren’t really associated with particular days, I’ll group them thematically instead.

Flora and Fauna:

314/366: Spiderweb

A spiderweb in the back yard of Chateau Steelypips, catching the early-morning light.

315/366: Lit Tree

A low-hanging branch catching the morning sunlight.

The second of these is what I went outside to try to photograph: I liked the light effect of the sun hitting this branch (which has since been pruned away, because it hung down to about mid-chest level on me, and was a major impediment to moving around the back yard). When I went out to get that shot, though, I noticed the spiderweb stretched between it and the tall arbor vitae (it’s just visible in the lower right of the tree picture), which was an even better photo.

316/366: Lilies

Tiger lilies in front of our house.

317/366: Inordinately Fond

Beetle on the lilies in front of Chateau Steelypips.

These flowers are right in front of the bay window on the front of our house, and are really beautiful when they bloom. They also attract all manner of bugs, but that’s not a terrible thing…

318/366: Nest

The bird nest in our front-porch light.

Last year, we had house sparrows nesting in a hole in the outside wall near SteelyKid’s room. We plugged that hole up, so in retaliation, they moved to our front porch light. One of the glass panes broke a while back when I was changing the bulb, and I didn’t bother trying to find a replacement, thinking there wasn’t any problem with leaving it open. Shows what I know…

It stays light enough late enough that we don’t really use this light in the summer, so I think they managed to successfully hatch and raise some chicks in this. At least, I used to hear lots of frantic peeping as I went in and out the front door, and I don’t hear it any more…

When it gets cold, I’ll clean this out, and maybe cover over the gap so they don’t do this again. Which probably means next summer they’ll be nesting inside my car, or something.

319/366: Why?

Wild turkey crossing Balltown Road in Niskayuna, taken through the windshield of my car.

Why did the turkey cross the road? To get into a photo dump post, of course.

Miscellany:

320/366: Waves

The wave pool at Six Flags Great Escape.

This is actually a still frame from a short video of the wave pool at Six Flags, that I used for a physics post over at Forbes. The kids could’ve stayed in this all day, and the next time we go there, they probably will.

321/366: Focus

Shots of a couple of toys with different lenses, demonstrating how to make distant background objects look huge.

Another image from a post at Forbes, this one on using long focal lengths to make background objects look huge.

322/366: Physics

The computer apparatus for the spring lab I’m writing up.

323/366: Alignment

The hanging spring for the lab I’m working on, fortuitously aligned with the corner of the wall behind it.

One of the many projects I’m juggling at the moment is to write up a pedagogical paper based on an intro mechanics lab I do with springs. This involves using a computer to record the motion of a mass on a spring, so just in case I need it, I took a photo when I was taking example data.

The second picture just shows the hanging spring, and wouldn’t be interesting except for a chance alignment: without realizing it, I managed to line the metal pole holding the whole apparatus up with the corner of a pillar in the wall behind. I probably wouldn’t’ve been able to do this on purpose, but by not paying attention, I was able to create kind of a weird effect with the background.

324/366: Demos

Steve Rolston and Emily Edwards of JQI doing demos at the Schrodinger Sessions workshop.

One of the things keeping me away from the computer was a second round of the Schrodinger Sessions workshop at JQI. I forgot to bring the good camera with me (which tells you something about what my state of mind has been of late), so the only documentation you get is this crappy cell-phone snap of Steve and Emily doing magnetic levitation demos while writers take pictures with their phones…

325/366: Rainbow

SteelyKid’s arm, covered with rainbows that she mostly painted herself.

The kids had a couple of play-dates in there, which included getting crazy with the face paint. SteelyKid did most of these rainbows (on her left arm) herself.

326/366: Glasses

Loops in the power line behind the Schenectady Amtrak station look like glasses.

Credit for this one ought to go to The Pip. We were up on the platform at the Amtrak station, for reasons that will become clear later, and he said “Hey, that wire looks like glasses!”

327/366: Party Prep

White plastic chairs getting ready for SteelyPalooza 2016.

Another big time sink in the last month was SteelyKid’s eighth birthday party, hosted at our house, which required a fair bit of prep. This shot of a scattering of white plastic chairs in front of the play set is your random artsy shot for this batch.

327/366: Hoops

Noon hoops at the Viniar Center.

I actually took this for eventual use in a blog post, but it’s good documentation of one of the things I’ve spent (not nearly enough) time on: playing pick-up hoops at lunchtime.

328/366: Catch ‘Em All

The kids with bird-type Pokemon.

After a few weeks, I gave in to SteelyKid’s pleas, and installed Pokemon GO on my phone, which means I now have to wrestle it out of her hands every time we leave the house, but on the bright side, it makes for some amusing photos.

It also leads to weird wandering in hopes of encountering new critters, which is why we were up on the Amtrak platform (after dinner at the Irish pub across the parking lot) to get photo 326 above.

329/366: PaintedKid

SteelyKid, with face-paint by Kate.

SteelyKid did the rainbows you saw above, but Kate has become a real face-painting master, as you can see here.

330/366: TigerDude

The Pip is a scary tiger.

We’ve also had several birthday parties in there, including one that provided the Little Dude with this tiger mask.

331/366: Bat Kid

This isn’t terrifying at all.

The same party provided SteelyKid a chance to show off her climbing and hanging skills.

332/366: Door Leash

The Pip built a remote door closer, and is very pleased with himself.

At some point, The Pip got the idea to string together some dog leashes and attach them to the handle of his door, letting him pull it closed from halfway down the stairs. This led to an amazingly long time spent “trapping” Kate in his room.

333/366: Behave!

Kate cautioning The Pip not to do whatever mischievous thing he’s planning.

This might be my favorite photo of the whole lot, for the combination of facial expressions.

334/366: Lollipop Swap

SteelyKid in negotiations with her BFF about who gets which lollipop flavors.

This past week was the end of the summer day camp that SteelyKid goes to, which saw the kids receive a remarkable amount of candy. Which was then redistributed through a complex series of negotiations over who liked which flavors best.

335/366: Flying Robot Army

SteelyKid setting up her drone for a test flight.

Finally, here’s SteelyKid with the most awesome of her birthday presents, a radio-controlled quad-copter with a camera, from her Aunt Erin in California. Because who doesn’t want their own personal drone? Historians may well note this as a pivotal step toward SteelyKid’s eventual takeover of the world with the help of an army of flying robots.

——

So, that’s a great big bunch of photos, all right. That still leaves me around 20 behind the photo-a-day pace, but I intend to make an effort to catch back up before the end of the month. I’m going to start carrying the camera with me more regularly, and see what I can come up with from that.

### Richard Easther — Look West

Last Friday, work kept me late at the office. It was a clear and cloudless night and the stars were out as I biked home in the dusk. And as my pedals turned, the night sky wheeled more slowly above me.

My route starts with a mild climb along a ridge and turns westward. And as I headed for home I saw a light in the sky so bright I briefly wondered if it was an approaching plane. But it was the planet Venus, the evening star, following the setting sun through the western sky. And a little way above Venus was another stunning light, the planet Jupiter.

But there was a third, somewhat fainter, point of light alongside Jupiter and Venus, the planet Mercury.  These three worlds are currently clustered in a patch of sky not much bigger than your outstretched hand. As the two brightest lights in the night sky, Venus and Jupiter are usually the first objects people will point to and ask "Is that one a planet?" On the other hand, Mercury is usually the hardest naked-eye planet to find in the sky – so, with Jupiter and Venus to guide you, right now is a great time to spot our Solar System's innermost world.

You'll need a cloud-free night and an unobstructed view of the western horizon (Southern Hemisphere viewers have an advantage over their northern counterparts). The sky-maps below will get you oriented.

August 21st

The Western Sky from Auckland, 6:30pm

August 25th

The Western Sky from Auckland, 6:30pm

August 28th

The Western Sky from Auckland, 6:30pm - closest approach between Venus and Jupiter.

Stellarium is a free software tool for visualising the sky that works on Windows, Mac and Linux machines, while the Star Walk app is a great tool for discovering the night sky – seeing it in action convinced me I needed an iPhone. Mars and Saturn are also visible in the night sky right now, lying relatively close to one another and well above the horizon.

The title image shows Mercury, imaged by the Messenger spacecraft.

## August 19, 2016

### n-Category Café — Compact Closed Bicategories

I’m happy to announce that this paper has been published:

Abstract. A compact closed bicategory is a symmetric monoidal bicategory where every object is equipped with a weak dual. The unit and counit satisfy the usual ‘zig-zag’ identities of a compact closed category only up to natural isomorphism, and the isomorphism is subject to a coherence law. We give several examples of compact closed bicategories, then review previous work. In particular, Day and Street defined compact closed bicategories indirectly via Gray monoids and then appealed to a coherence theorem to extend the concept to bicategories; we restate the definition directly.

We prove that given a 2-category $C$ with finite products and weak pullbacks, the bicategory of objects of $C$, spans, and isomorphism classes of maps of spans is compact closed. As corollaries, the bicategory of spans of sets and certain bicategories of ‘resistor networks’ are compact closed.
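For a concrete feel for the simplest case, spans of finite sets: a span from $A$ to $B$ is a set $S$ with maps $S \to A$ and $S \to B$, and two spans compose by pullback. The following minimal Python sketch implements that composition, representing finite sets as lists and maps as dicts. The function names are hypothetical, and the sketch covers only composition of spans, not the maps of spans or the compact closed structure the paper is about.

```python
from collections.abc import Hashable
from typing import Dict, List, Tuple

# A span A <- S -> B of finite sets: the apex S as a list, plus the two legs as dicts.
Span = Tuple[List[Hashable], Dict[Hashable, Hashable], Dict[Hashable, Hashable]]


def compose_spans(first: Span, second: Span) -> Span:
    """Compose A <- S -> B with B <- T -> C by pulling back over B.

    The composite apex is S x_B T = {(s, t) : right_S(s) == left_T(t)},
    with legs (s, t) -> left_S(s) in A and (s, t) -> right_T(t) in C.
    """
    s_apex, s_left, s_right = first
    t_apex, t_left, t_right = second
    apex = [(s, t) for s in s_apex for t in t_apex if s_right[s] == t_left[t]]
    left = {pair: s_left[pair[0]] for pair in apex}
    right = {pair: t_right[pair[1]] for pair in apex}
    return apex, left, right


def identity_span(objects: List[Hashable]) -> Span:
    """The identity span A <- A -> A, both legs being the identity map."""
    return list(objects), {a: a for a in objects}, {a: a for a in objects}


if __name__ == "__main__":
    # A = {a1, a2}, B = {b}, C = {c1, c2, c3}; all of S and T lie over the single point b.
    S = (["s1", "s2"], {"s1": "a1", "s2": "a2"}, {"s1": "b", "s2": "b"})
    T = (["t1", "t2", "t3"], {"t1": "b", "t2": "b", "t3": "b"},
         {"t1": "c1", "t2": "c2", "t3": "c3"})
    apex, left, right = compose_spans(S, T)
    print(len(apex))  # 6: every pair (s, t) agrees over b
```

Composing three spans in the two possible orders gives apexes whose elements look like ((s, t), u) and (s, (t, u)) respectively: canonically in bijection but not literally equal, which is one way to see why spans naturally form a bicategory rather than a category.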

This paper is dear to my heart because it forms part of Mike Stay’s thesis, for which I served as co-advisor. And it’s especially so because his proof that objects, spans, and maps-of-spans in a suitable 2-category form a compact symmetric monoidal bicategory turned out to be much harder than either of us was prepared for!

A problem worthy of attack
Proves its worth by fighting back.

In a compact closed category every object comes with morphisms called the ‘cap’ and ‘cup’, obeying the ‘zig-zag identities’. For example, in the category where morphisms are 2d cobordisms, the zig-zag identities say this:
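Algebraically, for an object $A$ with dual $A^*$, unit $\eta: I \to A \otimes A^*$ and counit $\varepsilon: A^* \otimes A \to I$ (one common convention; others place the dual on the other side), the zig-zag identities read as follows, with associators and unitors suppressed:

$(1_A \otimes \varepsilon) \circ (\eta \otimes 1_A) = 1_A, \qquad (\varepsilon \otimes 1_{A^*}) \circ (1_{A^*} \otimes \eta) = 1_{A^*}$

In the cobordism picture, each equation says that a snake-shaped tube can be pulled straight.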

But in a compact closed bicategory the zig-zag identities hold only up to 2-morphisms, which in turn must obey equations of their own: the ‘swallowtail identities’. As the name hints, these are connected to the swallowtail singularity, which is part of René Thom’s classification of catastrophes. This in turn is part of a deep and not yet fully worked out connection between singularity theory and coherence laws for ‘$n$-categories with duals’.

But never mind that: my point is that proving the swallowtail identities for a bicategory of spans in a 2-category turned out to be much harder than expected. Luckily Mike rose to the challenge, as you’ll see in this paper!

This paper is also gaining a bit of popularity for its beautiful depictions of the coherence laws for a symmetric monoidal bicategory. And symmetric monoidal bicategories are starting to acquire interesting applications.

The most developed of these are in mathematical physics — for example, 3d topological quantum field theory! To understand 3d TQFTs, we need to understand the symmetric monoidal bicategory where objects are collections of circles, morphisms are 2d cobordisms, and 2-morphisms are 3d cobordisms-between-cobordisms. The whole business of ‘modular tensor categories’ is immensely clarified by this approach. And that’s what this series of papers, still underway, is all about:

Mike Stay, on the other hand, is working on applications to computer science. That’s always been his focus — indeed, his Ph.D. was not in math but computer science. You can get a little taste here:

But there’s a lot more coming soon from him and Greg Meredith.

As for me, I’ve been working on applied math lately, like bicategories where the morphisms are electrical circuits, or Markov processes, or chemical reaction networks. These are, in fact, also compact closed symmetric monoidal bicategories, and my student Kenny Courser is exploring that aspect.

Basically, whenever you have diagrams that you can stick together to form new diagrams, and processes that turn one diagram into another, there’s a good chance you’re dealing with a symmetric monoidal bicategory! And if you’re also allowed to ‘bend wires’ in your diagrams to turn inputs into outputs and vice versa, it’s probably compact closed. So these are fundamental structures — and it’s great that Mike’s paper on them is finally published.

### Backreaction — Away Note

I'll be in Stockholm next week for a program on Black Holes and Emergent Spacetime, so please be prepared for some service interruptions.

## August 18, 2016

### Clifford Johnson — Stranger Stuff…

Ok all you Stranger Things fans. You were expecting a physicist to say a few things about the show weren't you? Over at Screen Junkies, they've launched the first episode of a focus on TV Science (a companion to the Movie Science series you already know about)... and with the incomparable host Hal Rudnick, I talked about Stranger Things. There are spoilers. Enjoy.

[...]

The post Stranger Stuff… appeared first on Asymptotia.

### Particlebites — What is “Model Building”?

One thing that makes physics, and especially particle physics, unique in the sciences is the split between theory and experiment. The role of experimentalists is clear: they build and conduct experiments, take data and analyze it using mathematical, statistical, and numerical techniques to separate signal from background. In short, they seem to do all of the real science!

So what is it that theorists do, besides sipping espresso and scribbling on chalk boards? In this post we describe one type of theoretical work called model building. This usually falls under the umbrella of phenomenology, which in physics refers to making connections between mathematically defined theories (or models) of nature and actual experimental observations of nature.

One common scenario is that one experiment observes something unusual: an anomaly. Two things immediately happen:

1. Other experiments find ways to cross-check to see if they can confirm the anomaly.
2. Theorists start to figure out the broader implications if the anomaly is real.

#1 is the key step in the scientific method, but in this post we’ll illuminate what #2 actually entails. The scenario looks a little like this:

An unusual experimental result (anomaly) is observed. One thing we would like to know is whether it is consistent with other experimental observations, but these other observations may not be simply related to the anomaly.

Theorists, who have spent plenty of time mulling over the open questions in physics, are ready to apply their favorite models of new physics to see if they fit. These are the models that they know lead to elegant mathematical results, like grand unification or a solution to the Hierarchy problem. Sometimes theorists are more utilitarian, and start with “do it all” Swiss army knife theories called effective theories (or simplified models) and see if they can explain the anomaly in the context of existing constraints.

Here’s what usually happens:

Usually the nicest models of new physics don’t fit! In the explicit example, the minimal supersymmetric Standard Model doesn’t include a good candidate to explain the 750 GeV diphoton bump.

Indeed, usually one needs to get creative and modify the nice-and-elegant theory to make sure it can explain the anomaly while avoiding other experimental constraints. This makes the theory a little less elegant, but sometimes nature isn’t elegant.

Candidate theory extended with a module (in this case, an additional particle). This additional module is “bolted on” to the theory to make it fit the experimental observations.

Now we’re feeling pretty good about ourselves. It can take quite a bit of work to hack the well-motivated original theory in a way that both explains the anomaly and avoids all other known experimental observations. A good theory can do a couple of other things:

1. It points the way to future experiments that can test it.
2. It can use the additional structure to explain other anomalies.

The picture for #2 is as follows:

A good hack to a theory can explain multiple anomalies. Sometimes that makes the hack a little more cumbersome. Physicists often develop their own sense of ‘taste’ for when a module is elegant enough.

Even at this stage, there can be a lot of really neat physics to be learned. Model-builders can develop a reputation for particularly clever, minimal, or inspired modules. If a module is really successful, then people will start to think about it as part of a pre-packaged deal:

A really successful hack may eventually be thought of as its own variant of the original theory.

Model-smithing is a craft that blends together a lot of the fun of understanding how physics works—which bits of common wisdom can be bent or broken to accommodate an unexpected experimental result? Is it possible to find a simpler theory that can explain more observations? Are the observations pointing to an even deeper guiding principle?

Of course—we should also say that sometimes, while theorists are having fun developing their favorite models, other experimentalists have gone on to refute the original anomaly.

Sometimes anomalies go away and the models built to explain them don’t hold together.

But here’s the mark of a really, really good model: even if the anomaly goes away and the particular model falls out of favor, a good model will have taught other physicists something really neat about what can be done within a given theoretical framework. Physicists get a feel for the kinds of modules that are out in the market (like an app store) and they develop a library of tricks to attack future anomalies. And if one is really fortunate, these insights can point the way to even bigger connections between physical principles.

I cannot help but end this post with one of my favorite physics jokes, courtesy of T. Tait:

A theorist and an experimentalist are having coffee. The theorist is really excited, she tells the experimentalist, “I’ve got it—it’s a model that’s elegant, explains everything, and it’s completely predictive.”

The experimentalist listens to her colleague’s idea and realizes how to test those predictions. She writes several grant applications, hires a team of postdocs and graduate students, trains them, and builds the new experiment. After years of design, labor, and testing, the machine is ready to take data. They run for several months, and the experimentalist pores over the results.

The experimentalist knocks on the theorist’s door the next day and says, “I’m sorry—the experiment doesn’t find what you were predicting. The theory is dead.”

The theorist frowns a bit: “What a shame. Did you know I spent three whole weeks of my life writing that paper?”

### Tommaso Dorigo — The Daily Physics Problem - 13, 14

While I do not believe that this series of posts can be really useful to my younger colleagues, who will in a month have to participate in a tough selection for INFN researchers in Rome, I think there is some value in continuing what I started last month.
After all, as physicists we are problem solvers, and some exercise is good for all of us. Plus, the laypersons who occasionally visit this blog may actually enjoy fiddling with the questions. For them, though, I thought it would be useful to also see the answers to the questions, or at least _some_ answer.

## August 17, 2016

### Clifford Johnson — New Style…

Style change. For a story-within-a-story in the book, I'm changing styles, going to a looser, more cartoony style, which sort of fits tonally with the subject matter in the story. The other day on the subway I designed the characters in that style, and I share them with you here. It's lots of fun to draw in this looser [...]

The post New Style… appeared first on Asymptotia.

## August 16, 2016

### Sean Carroll — You Should Love (or at least respect) the Schrödinger Equation

Over at the twitter dot com website, there has been a briefly-trending topic #fav7films, discussing your favorite seven films. Part of the purpose of being on twitter is to one-up the competition, so I instead listed my #fav7equations. Slightly cleaned up, the equations I chose as my seven favorites are:

1. ${\bf F} = m{\bf a}$
2. $\partial L/\partial {\bf x} = \partial_t ({\partial L}/{\partial {\dot {\bf x}}})$
3. ${\mathrm d}*F = J$
4. $S = k \log W$
5. $ds^2 = -{\mathrm d}t^2 + {\mathrm d}{\bf x}^2$
6. $G_{ab} = 8\pi G T_{ab}$
7. $\hat{H}|\psi\rangle = i\partial_t |\psi\rangle$

In order: Newton’s Second Law of motion, the Euler-Lagrange equation, Maxwell’s equations in terms of differential forms, Boltzmann’s definition of entropy, the metric for Minkowski spacetime (special relativity), Einstein’s equation for spacetime curvature (general relativity), and the Schrödinger equation of quantum mechanics. Feel free to Google them for more info, even if equations aren’t your thing. They represent a series of extraordinary insights in the development of physics, from the 1600s to the present day.

Of course people chimed in with their own favorites, which is all in the spirit of the thing. But one misconception came up that is probably worth correcting: people don’t appreciate how important and all-encompassing the Schrödinger equation is.

I blame society. Or, more accurately, I blame how we teach quantum mechanics. Not that the standard treatment of the Schrödinger equation is fundamentally wrong (as other aspects of how we teach QM are), but that it’s incomplete. And sometimes people get brief introductions to things like the Dirac equation or the Klein-Gordon equation, and come away with the impression that they are somehow relativistic replacements for the Schrödinger equation, which they certainly are not. Dirac et al. may have originally wondered whether they were, but these days we certainly know better.

As I remarked in my post about emergent space, we human beings tend to do quantum mechanics by starting with some classical model, and then “quantizing” it. Nature doesn’t work that way, but we’re not as smart as Nature is. By a “classical model” we mean something that obeys the basic Newtonian paradigm: there is some kind of generalized “position” variable, and also a corresponding “momentum” variable (how fast the position variable is changing), which together obey some deterministic equations of motion that can be solved once we are given initial data. Those equations can be derived from a function called the Hamiltonian, which is basically the energy of the system as a function of positions and momenta; the results are Hamilton’s equations, which are essentially a slick version of Newton’s original ${\bf F} = m{\bf a}$.
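For the simplest case, a single particle of mass $m$ moving in a potential $V(x)$ (an illustrative choice), Hamilton's equations and their reduction to Newton's law read:

$\displaystyle{H(x,p) = \frac{p^2}{2m} + V(x), \qquad \dot{x} = \frac{\partial H}{\partial p} = \frac{p}{m}, \qquad \dot{p} = -\frac{\partial H}{\partial x} = -V'(x) \quad\Longrightarrow\quad m\ddot{x} = -V'(x)}$

The last equation is ${\bf F} = m{\bf a}$ in one dimension, with force $F = -V'(x)$.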

There are various ways of taking such a setup and “quantizing” it, but one way is to take the position variable and consider all possible (normalized, complex-valued) functions of that variable. So instead of, for example, a single position coordinate x and its momentum p, quantum mechanics deals with wave functions ψ(x). That’s the thing that you square to get the probability of observing the system to be at the position x. (We can also transform the wave function to “momentum space,” and calculate the probabilities of observing the system to be at momentum p.) Just as positions and momenta obey Hamilton’s equations, the wave function obeys the Schrödinger equation,

$\hat{H}|\psi\rangle = i\partial_t |\psi\rangle$.

Indeed, the $\hat{H}$ that appears in the Schrödinger equation is just the quantum version of the Hamiltonian.
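The probability rule described above can be sketched on a grid. The following assumes a Gaussian wave packet in a box of length L sampled at N points, units with ħ = 1 (so momentum is the wavenumber k), and a plain unitary discrete Fourier transform standing in for the transformation to momentum space; all names and numbers are illustrative. Squaring the magnitudes gives position and momentum probabilities that each sum to one, and the momentum distribution peaks at the packet's mean momentum.

```python
import cmath
import math

# Illustrative grid: N points on a box of length L, positions x_j = -L/2 + j * L / N.
N, L = 128, 20.0
DX = L / N
K0_INDEX = 10  # the packet's mean momentum is k0 = 2 * pi * K0_INDEX / L


def gaussian_packet():
    """Normalized discrete wave function: a Gaussian times a plane wave e^{i k0 x}."""
    k0 = 2 * math.pi * K0_INDEX / L
    xs = [-L / 2 + j * DX for j in range(N)]
    psi = [math.exp(-x * x / 2) * cmath.exp(1j * k0 * x) for x in xs]
    norm = math.sqrt(sum(abs(a) ** 2 for a in psi))
    return [a / norm for a in psi]  # now sum_j |psi_j|^2 = 1


def to_momentum_space(psi):
    """Unitary discrete Fourier transform; entry m corresponds to k = 2 pi m / L (m < N/2)."""
    n = len(psi)
    return [
        sum(psi[j] * cmath.exp(-2j * math.pi * j * m / n) for j in range(n)) / math.sqrt(n)
        for m in range(n)
    ]


if __name__ == "__main__":
    psi = gaussian_packet()
    phi = to_momentum_space(psi)
    p_position = [abs(a) ** 2 for a in psi]   # probability of each grid position
    p_momentum = [abs(a) ** 2 for a in phi]   # probability of each momentum
    print(sum(p_position), sum(p_momentum))   # both 1 up to rounding
    print(max(range(N), key=lambda m: p_momentum[m]))  # peaks at K0_INDEX
```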

The problem is that, when we are first taught about the Schrödinger equation, it is usually in the context of a specific, very simple model: a single non-relativistic particle moving in a potential. In other words, we choose a particular kind of wave function, and a particular Hamiltonian. The corresponding version of the Schrödinger equation is

$\displaystyle{\left[-\frac{1}{2m}\frac{\partial^2}{\partial x^2} + V(x)\right]|\psi\rangle = i\partial_t |\psi\rangle}$, in units where $\hbar = 1$ and $m$ is the particle's mass.

If you don’t dig much deeper into the essence of quantum mechanics, you could come away with the impression that this is “the” Schrödinger equation, rather than just “the non-relativistic Schrödinger equation for a single particle.” Which would be a shame.
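To make "a particular Hamiltonian" concrete, here is a sketch that applies the single-particle operator $-\frac{1}{2m}\frac{\partial^2}{\partial x^2} + V(x)$ to trial wave functions on a grid, with a finite-difference second derivative. It assumes units with $\hbar = 1$, $m = 1$, and the harmonic potential $V(x) = x^2/2$ (illustrative choices). The Gaussian $e^{-x^2/2}$ is the known ground state of that potential, with energy $1/2$, so the ratio $\hat{H}\psi/\psi$ should come out constant for it and not for other shapes.

```python
import math

# Illustrative assumptions: hbar = 1, mass m = 1, and V(x) = x^2 / 2.
MASS = 1.0


def potential(x):
    return 0.5 * x * x


def apply_hamiltonian(xs, psi, dx):
    """Apply H = -(1/(2m)) d^2/dx^2 + V(x) at the interior grid points.

    The second derivative uses the three-point central difference, so the
    result has two fewer entries than psi (the endpoints are dropped).
    """
    return [
        -(psi[k + 1] - 2 * psi[k] + psi[k - 1]) / (2 * MASS * dx**2)
        + potential(xs[k]) * psi[k]
        for k in range(1, len(psi) - 1)
    ]


def local_energies(xs, psi, dx):
    """Return (H psi)(x) / psi(x) at interior points; constant iff psi is an eigenstate."""
    h_psi = apply_hamiltonian(xs, psi, dx)
    return [h / p for h, p in zip(h_psi, psi[1:-1])]


if __name__ == "__main__":
    dx = 0.01
    xs = [-3 + k * dx for k in range(601)]
    ground = [math.exp(-x * x / 2) for x in xs]   # expected eigenvalue 1/2
    squeezed = [math.exp(-x * x) for x in xs]     # not an eigenstate
    e_ground = local_energies(xs, ground, dx)
    e_squeezed = local_energies(xs, squeezed, dx)
    print(min(e_ground), max(e_ground))       # both close to 0.5
    print(min(e_squeezed), max(e_squeezed))   # spread out: not an eigenstate
```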

What happens if we go beyond the world of non-relativistic quantum mechanics? Is the poor little Schrödinger equation still up to the task? Sure! All you need is the right set of wave functions and the right Hamiltonian. Every quantum system obeys a version of the Schrödinger equation; it’s completely general. In particular, there’s no problem talking about relativistic systems or field theories — just don’t use the non-relativistic version of the equation, obviously.

What about the Klein-Gordon and Dirac equations? These were, indeed, originally developed as “relativistic versions of the non-relativistic Schrödinger equation,” but that’s not what they ended up being useful for. (The story is told that Schrödinger himself invented the Klein-Gordon equation even before his non-relativistic version, but discarded it because it didn’t do the job for describing the hydrogen atom. As my old professor Sidney Coleman put it, “Schrödinger was no dummy. He knew about relativity.”)

The Klein-Gordon and Dirac equations are actually not quantum at all — they are classical field equations, just like Maxwell’s equations are for electromagnetism and Einstein’s equation is for the metric tensor of gravity. They aren’t usually taught that way, in part because (unlike E&M and gravity) there aren’t any macroscopic classical fields in Nature that obey those equations. The KG equation governs relativistic scalar fields like the Higgs boson, while the Dirac equation governs spinor fields (spin-1/2 fermions) like the electron and neutrinos and quarks. In Nature, spinor fields are a little subtle, because they are anticommuting Grassmann variables rather than ordinary functions. But make no mistake; the Dirac equation fits perfectly comfortably into the standard Newtonian physical paradigm.
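To make the point that the Klein-Gordon equation is an ordinary classical field equation concrete: in one space dimension and units with $c = 1$ it reads $\partial_t^2\phi = \partial_x^2\phi - m^2\phi$, and given initial data $\phi$ and $\partial_t\phi$ it can be stepped forward deterministically, just like Newton's equations. The sketch below does this on a periodic lattice with a leapfrog integrator; the lattice, step sizes and names are illustrative assumptions, and nothing quantum is involved.

```python
import math


def laplacian(phi, dx):
    """Periodic three-point second difference; phi[j - 1] wraps around at j = 0."""
    n = len(phi)
    return [(phi[(j + 1) % n] - 2 * phi[j] + phi[j - 1]) / dx**2 for j in range(n)]


def evolve_klein_gordon(phi, pi, mass, dx, dt, steps):
    """Leapfrog (velocity Verlet) integration of phi_tt = phi_xx - mass**2 * phi.

    phi is the field and pi = d(phi)/dt its conjugate momentum, both given as
    lists of floats on a periodic lattice with spacing dx. Returns (phi, pi)
    after `steps` time steps of size dt.
    """
    def acceleration(field):
        return [lap - mass**2 * v for lap, v in zip(laplacian(field, dx), field)]

    phi, pi = list(phi), list(pi)
    acc = acceleration(phi)
    for _ in range(steps):
        pi = [p + 0.5 * dt * a for p, a in zip(pi, acc)]   # half kick
        phi = [f + dt * p for f, p in zip(phi, pi)]         # drift
        acc = acceleration(phi)
        pi = [p + 0.5 * dt * a for p, a in zip(pi, acc)]   # half kick
    return phi, pi


if __name__ == "__main__":
    n, mass, dt = 64, 2.0, 0.001
    dx = 2 * math.pi / n
    xs = [j * dx for j in range(n)]
    # Initial data: a single standing wave cos(x) at rest.
    phi, _ = evolve_klein_gordon([math.cos(x) for x in xs], [0.0] * n, mass, dx, dt, 1000)
    # On this lattice the mode oscillates at omega^2 = (2 - 2 cos dx) / dx^2 + mass^2.
    omega = math.sqrt((2 - 2 * math.cos(dx)) / dx**2 + mass**2)
    print(max(abs(f - math.cos(omega) * math.cos(x)) for f, x in zip(phi, xs)))
```

The printed number is the largest deviation from the exact lattice solution $\cos(\omega t)\cos x$ at $t = 1$, which should be small.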

For fields like this, the role of “position” that for a single particle was played by the variable x is now played by an entire configuration of the field throughout space. For a scalar Klein-Gordon field, for example, that might be the values of the field φ(x) at every spatial location x. But otherwise the same story goes through as before. We construct a wave function by attaching a complex number to every possible value of the position variable; to emphasize that it’s a function of functions, we sometimes call it a “wave functional” and write it as a capital letter,

$\Psi[\phi(x)]$.

The absolute-value-squared of this wave functional tells you the probability that you will observe the field to have the value φ(x) at each point x in space. The functional obeys — you guessed it — a version of the Schrödinger equation, with the Hamiltonian being that of a relativistic scalar field. There are likewise versions of the Schrödinger equation for the electromagnetic field, for Dirac fields, for the whole Core Theory, and what have you.
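For a free Klein-Gordon field of mass $m$, for example, the standard form of this functional Schrödinger equation reads as follows, in units with $\hbar = c = 1$, representing the momentum conjugate to $\phi(x)$ by $-i\,\delta/\delta\phi(x)$, and glossing over the regularization the second functional derivative needs in practice:

$\displaystyle{\int d^3x \left[-\frac{1}{2}\frac{\delta^2}{\delta\phi(x)^2} + \frac{1}{2}\left(\nabla\phi(x)\right)^2 + \frac{1}{2}m^2\phi(x)^2\right]\Psi[\phi] = i\,\partial_t \Psi[\phi]}$

The last two terms are the gradient and mass energy of the classical field; the first is the quantum version of its kinetic energy $\frac{1}{2}\dot\phi^2$.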

So the Schrödinger equation is not simply a relic of the early days of quantum mechanics, when we didn’t know how to deal with much more than non-relativistic particles orbiting atomic nuclei. It is the foundational equation of quantum dynamics, and applies to every quantum system there is. (There are other ways of doing quantum mechanics, of course, like the Heisenberg picture and the path-integral formulation, but they’re all basically equivalent.) You tell me what the quantum state of your system is, and what its Hamiltonian is, and I will plug into the Schrödinger equation to see how that state will evolve with time. And as far as we know, quantum mechanics is how the universe works. Which makes the Schrödinger equation arguably the most important equation in all of physics.
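In that spirit, here is a minimal sketch of "give me the state and the Hamiltonian": a two-state toy system with Hamiltonian $\hat{H} = \omega\,\sigma_x$ (an illustrative choice, in units with $\hbar = 1$; the function names are hypothetical). Because $\sigma_x^2 = 1$, the solution $|\psi(t)\rangle = e^{-i\hat{H}t}|\psi(0)\rangle$ has the closed form $\cos(\omega t)|\psi(0)\rangle - i\sin(\omega t)\,\sigma_x|\psi(0)\rangle$. The code evolves a state that way and checks $\hat{H}|\psi\rangle = i\partial_t|\psi\rangle$ numerically with a finite-difference time derivative.

```python
import math

# Hypothetical toy model: two states, H = OMEGA * sigma_x, in units with hbar = 1.
OMEGA = 1.3


def apply_hamiltonian(psi):
    """Return H|psi> for H = OMEGA * sigma_x (sigma_x swaps the two components)."""
    a, b = psi
    return (OMEGA * b, OMEGA * a)


def evolve(psi0, t):
    """Exact solution |psi(t)> = exp(-i H t) |psi(0)>.

    Because sigma_x squared is the identity,
    exp(-i OMEGA t sigma_x) = cos(OMEGA t) * 1 - i sin(OMEGA t) * sigma_x.
    """
    a, b = psi0
    c, s = math.cos(OMEGA * t), math.sin(OMEGA * t)
    return (c * a - 1j * s * b, c * b - 1j * s * a)


def schrodinger_residual(psi0, t, dt=1e-5):
    """Largest |(H psi)_k - i d(psi_k)/dt| at time t, via a central difference in t."""
    later, earlier = evolve(psi0, t + dt), evolve(psi0, t - dt)
    dpsi_dt = [(p - q) / (2 * dt) for p, q in zip(later, earlier)]
    h_psi = apply_hamiltonian(evolve(psi0, t))
    return max(abs(h - 1j * d) for h, d in zip(h_psi, dpsi_dt))


if __name__ == "__main__":
    psi0 = (1.0 + 0j, 0j)  # start entirely in the first basis state
    psi_t = evolve(psi0, 0.7)
    print(sum(abs(z) ** 2 for z in psi_t))   # total probability stays 1
    print(schrodinger_residual(psi0, 0.7))   # tiny: the equation is satisfied
```

Starting from the first basis state, the probability sloshes completely into the second one by $t = \pi/(2\omega)$, while the total probability stays at one.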

While we’re at it, people complained that the cosmological constant Λ didn’t appear in Einstein’s equation (6). Of course it does — it’s part of the energy-momentum tensor on the right-hand side. Again, Einstein didn’t necessarily think of it that way, but these days we know better. The whole thing that is great about physics is that we keep learning things; we don’t need to remain stuck with the first ideas that were handed down by the great minds of the past.
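Explicitly, in the mostly-plus signature of the metric in (5), a cosmological constant term on the left-hand side can be moved to the right and read as the energy-momentum tensor of a constant vacuum energy density $\rho_\Lambda$:

$\displaystyle{G_{ab} + \Lambda g_{ab} = 8\pi G\, T_{ab} \quad\Longleftrightarrow\quad G_{ab} = 8\pi G\left(T_{ab} + T^{(\Lambda)}_{ab}\right), \qquad T^{(\Lambda)}_{ab} = -\rho_\Lambda\, g_{ab}, \qquad \rho_\Lambda = \frac{\Lambda}{8\pi G}}$

With $g_{00} = -1$, this vacuum contribution has energy density $T^{(\Lambda)}_{00} = \rho_\Lambda$ and pressure $p = -\rho_\Lambda$.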

### n-Category Café — Two Miracles of Algebraic Geometry

In real analysis you get just what you pay for. If you want a function to be seven times differentiable you have to say so, and there’s no reason to think it’ll be eight times differentiable.

But in complex analysis, a function that’s differentiable is infinitely differentiable, and its Taylor series converges, at least locally. Often this lets you extrapolate the value of a function at some faraway location from its value in a tiny region! For example, if you know its value on some circle, you can figure out its value inside. It’s like a fantasy world.
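That last claim is the Cauchy integral formula, $f(z_0) = \frac{1}{2\pi i}\oint \frac{f(z)}{z - z_0}\,dz$, valid for $f$ holomorphic on and inside the circle. Here is a small numerical sketch of it using only Python's standard library (the function name and sample count are illustrative): it reconstructs $f$ at an interior point purely from samples taken on a circle.

```python
import cmath
import math


def cauchy_interior_value(f, z0, center=0j, radius=1.0, samples=64):
    """Estimate f(z0) for a point z0 inside the circle |z - center| = radius.

    Implements the Cauchy integral formula
        f(z0) = (1 / (2 pi i)) * contour integral of f(z) / (z - z0) dz,
    discretized with the trapezoidal rule in the angle theta, which converges
    very quickly for smooth periodic integrands like this one.
    """
    total = 0j
    step = 2 * math.pi / samples
    for k in range(samples):
        z = center + radius * cmath.exp(1j * k * step)
        dz = 1j * (z - center) * step  # dz = i r e^{i theta} d theta
        total += f(z) / (z - z0) * dz
    return total / (2j * math.pi)


if __name__ == "__main__":
    z0 = 0.3 + 0.2j
    estimate = cauchy_interior_value(cmath.exp, z0)
    print(abs(estimate - cmath.exp(z0)))  # tiny: only boundary values were used
```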

Algebraic geometry has similar miraculous properties. I recently learned about two.

Suppose I told you:

1. Every group is abelian.
2. Every function between groups that preserves the identity is a homomorphism.

You’d rightly say I’m nuts. But all this is happening in the category of sets. Suppose we go to the category of connected projective algebraic varieties. Then a miracle occurs, and the analogous facts are true:

1. Every connected projective algebraic group is abelian. These are called abelian varieties.
2. If $A$ and $B$ are abelian varieties and $f : A \to B$ is a map of varieties with $f(1) = 1$, then $f$ is a homomorphism.

The connectedness is crucial here. So, as Qiaochu Yuan pointed out in our discussion of these issues on MathOverflow, the magic is not all about algebraic geometry: you can see signs of it in topology. As a topological group, an abelian variety is just a torus. Every continuous basepoint-preserving map between tori is homotopic to a homomorphism. But the rigidity of algebraic geometry takes us further, letting us replace ‘homotopic’ by ‘equal’.

This gives some interesting things. From now on, when I say ‘variety’ I’ll mean ‘connected projective complex algebraic variety’. Let $Var_*$ be the category of varieties equipped with a basepoint, and basepoint-preserving maps. Let $AbVar$ be the category of abelian varieties, and maps that preserve the group operation. There’s a forgetful functor

$U: AbVar \to Var_*$

sending any abelian variety to its underlying pointed variety. $U$ is obviously faithful, but Miracle 2 says that it’s a full functor.

Taken together, these mean that $U$ is only forgetting a property, not a structure. So, shockingly, being abelian is a mere property of a variety.

Less miraculously, the functor $U$ has a left adjoint! I’ll call this

$Alb: Var_* \to AbVar$

because it sends any variety $X$ with basepoint to something called its Albanese variety.

In case you don’t thrill to adjoint functors, let me say what this means in ‘plain English’ — or at least what some mathematicians might consider plain English.

Given any variety $X$ with a chosen basepoint, there’s an abelian variety $Alb(X)$ that deserves to be called the ‘free abelian variety on $X$’. Why? Because it has the following universal property: there’s a basepoint-preserving map called the Albanese map

$i_X \colon X \to Alb(X)$

such that any basepoint-preserving map $f: X \to A$ where $A$ happens to be abelian factors uniquely as $i_X$ followed by a map

$\overline{f} \colon Alb(X) \to A$

that is also a group homomorphism. That is:

$f = \overline{f} \circ i_X$

Okay, enough ‘plain English’. Back to category theory.

The forgetful functor and its left adjoint

$U: AbVar \to Var_* , \qquad Alb: Var_* \to AbVar$

compose to give a monad

$T = U \circ Alb : Var_* \to Var_*$

The unit of this monad is the Albanese map. Moreover $U$ is monadic, meaning that abelian varieties are just algebras of the monad $T$.

All this is very nice, because it means the category theorist in me now understands the point of Albanese varieties. At a formal level, the Albanese variety of a pointed variety is a lot like the free abelian group on a pointed set!
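
For comparison, here is a minimal sketch of the set-level analogue: the free abelian group on a pointed set $(X, *)$, modelled as formal integer combinations of the non-basepoint elements, with the target group $\mathbb{Z}/12$ chosen purely for illustration. All names are hypothetical. It shows the same universal property, $f = \overline{f} \circ i_X$, with $\overline{f}$ a homomorphism.

```python
BASEPOINT = "*"

def unit(x):
    """Analogue of the Albanese map i_X: a point becomes a free generator,
    and the basepoint becomes the identity (the empty formal sum)."""
    return {} if x == BASEPOINT else {x: 1}

def add(u, v):
    """Group law on formal integer combinations {generator: coefficient}."""
    w = dict(u)
    for g, n in v.items():
        w[g] = w.get(g, 0) + n
    return {g: n for g, n in w.items() if n != 0}

def extend(f, modulus):
    """Given a basepoint-preserving map f: X -> Z/modulus (as a dict), return the
    homomorphism fbar with fbar(unit(x)) == f(x). It is unique because the
    elements unit(x) generate the whole group."""
    if f[BASEPOINT] % modulus != 0:
        raise ValueError("f must send the basepoint to the identity")

    def fbar(u):
        return sum(n * f[g] for g, n in u.items()) % modulus

    return fbar

X = [BASEPOINT, "a", "b"]
f = {BASEPOINT: 0, "a": 5, "b": 7}  # hypothetical basepoint-preserving map into Z/12
fbar = extend(f, 12)
print([fbar(unit(x)) for x in X])  # matches [f[x] % 12 for x in X]
```

Unlike the Albanese case, this monad is not idempotent: the multiplication $T^2 \to T$ evaluates formal sums of group elements and is not injective, since for instance the formal sums $[2a]$ and $[a] + [a]$ both go to $2a$.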

But then comes a fact connected to Miracle 2: a way in which the Albanese variety is not like the free abelian group! $T$ is an idempotent monad:

$T^2 \cong T$

Since the right adjoint $U$ is only forgetting a property, the left adjoint $Alb$ is only ‘forcing that property to hold’, and forcing it to hold again doesn’t do anything more for you!

In other words: the Albanese variety of the Albanese variety is just the Albanese variety.

(I am leaving some forgetful functors unspoken in this snappy statement: I really mean “the underlying pointed variety of the Albanese variety of the underlying pointed variety of $X$ is isomorphic to the Albanese variety of $X$”. But forgetful functors often go unspoken in ordinary mathematical English: they’re not just forgetful, they’re forgotten.)

Four puzzles:

Puzzle 1. Where does Miracle 1 fit into this story?

Puzzle 2. Where does the Picard variety fit into this story? (There’s a kind of duality for abelian varieties, whose categorical significance I haven’t figured out, and the dual of the Albanese variety of $X$ is called the Picard variety of $X$.)

Puzzle 3. Back to complex analysis. Suppose that instead of working with connected projective algebraic varieties we used connected compact complex manifolds. Would we still get a version of Miracles 1 and 2?

Puzzle 4. How should we pronounce ‘Albanese’?

(I don’t think it rhymes with ‘Viennese’. I believe Giacomo Albanese was one of those ‘Italian algebraic geometers’ who always get scolded for their lack of rigor. If he’d just said it was a bloody monad…)

### Doug Natelson — Updated - Short items - new physics or the lack thereof, planets and scale, and professional interactions

Before the start of the new semester takes over, some interesting, fun, and useful items:
Update: This is awesome. Watch it.
• The lack of any obvious exotic physics at the LHC has some people (prematurely, I suspect) throwing around phrases like "nightmare scenario" and "desert" - shorthand for the possibility that any major beyond-standard-model particles may be many orders of magnitude above present accelerator energies. For interesting discussions of this, see here, here, here, and here.
• On the upside, a recent new result has been published that may hint at something weird. Because protons are built from quarks (and gluons and all sorts of fluctuating ephemeral stuff like pions), their positive charge has some spatial extent, on the order of $10^{-15}$ m in radius. High-precision optical spectroscopy of hydrogen-like atoms provides a way to look at this, because the 1s orbital of the electron in hydrogen actually overlaps with the proton a fair bit. Muons are supposed to be just like electrons in many ways, but about 200 times more massive - as a result, a bound muon's 1s orbital is much smaller, overlaps more with the proton, and is more sensitive to the proton's charge distribution. The weird thing is, the muonic hydrogen measurements yield a different size for the proton than the electronic hydrogen ones. The new measurements are on muonic deuterium, and they, too, point to a surprisingly smaller size than the ordinary (electronic) measurements give. Natalie Wolchover's piece in Quanta gives a great discussion of all this, and is a bit less hyperbolic than the piece in ars technica.
• Rumors abound that the European Southern Observatory is going to announce the discovery of an earthlike planet orbiting in the putative habitable zone around Proxima Centauri, the nearest star to the sun.  However, those rumors all go back to an anonymously sourced article in Der Spiegel.  I'm not holding my breath, but it sure would be cool.
• If you want a great sense of scale regarding how far it is even to some place as close as Proxima Centauri, check out this page, If the Moon were One Pixel.
• For new college students:  How to email your professor without being annoying.
• Hopefully in our discipline, despite the dire pronouncements in the top bullet point, we are not yet at the point of having to offer the physics analog of this psych course.
• The US Department of Energy helpfully put out this official response to the Netflix series Stranger Things, in which (spoilers!) a fictitious DOE national lab is up to no good.  Just in case you thought the DOE really was in the business of ripping holes to alternate dimensions and creating telekinetic children.
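
The muonic-hydrogen bullet above rests on a scaling argument: the Bohr radius of a hydrogen-like atom is inversely proportional to the reduced mass of the orbiting particle and the nucleus. A back-of-the-envelope sketch, using CODATA 2018 values (the constant and function names are just local labels):

```python
BOHR_RADIUS_M = 5.29177210903e-11       # a0 for an infinitely heavy nucleus, CODATA 2018
MUON_TO_ELECTRON_MASS = 206.7682830     # m_mu / m_e, CODATA 2018
PROTON_TO_ELECTRON_MASS = 1836.15267343 # m_p / m_e, CODATA 2018

def reduced_bohr_radius(orbiter_mass, nucleus_mass):
    """Bohr radius with the reduced-mass correction; masses in units of m_e."""
    reduced = orbiter_mass * nucleus_mass / (orbiter_mass + nucleus_mass)
    return BOHR_RADIUS_M / reduced

a_e = reduced_bohr_radius(1.0, PROTON_TO_ELECTRON_MASS)
a_mu = reduced_bohr_radius(MUON_TO_ELECTRON_MASS, PROTON_TO_ELECTRON_MASS)
ratio = a_e / a_mu
# A 1s state has probability density 1/(pi a^3) at the origin, so the overlap
# with a nucleus much smaller than the orbit scales roughly as ratio**3.
print(f"a_e = {a_e:.4e} m, a_mu = {a_mu:.4e} m, ratio = {ratio:.1f}, ratio^3 = {ratio**3:.2e}")
```

The ratio comes out near 186 rather than 207 because of the reduced-mass correction, and the muon's overlap with the proton is larger by a factor of several million.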

## August 15, 2016

### Particlebites — High Energy Physics: What Is It Really Good For?

Article: Forecasting the Socio-Economic Impact of the Large Hadron Collider: a Cost-Benefit Analysis to 2025 and Beyond
Authors: Massimo Florio, Stefano Forte, Emanuela Sirtori
Reference: arXiv:1603.00886v1 [physics.soc-ph]

Imagine this. You’re at a party talking to a non-physicist about your research.

If this scenario already has you cringing, imagine you’re actually feeling pretty encouraged this time. Your everyday analogy for the Higgs mechanism landed flawlessly and you’re even getting some interested questions in return. Right when you’re feeling like Neil deGrasse Tyson himself, your flow grinds to a halt and you have to stammer an awkward answer to the question every particle physicist has nightmares about.

“Why are we spending so much money to discover these fundamental particles? Don’t they seem sort of… useless?”

Well, fair question. While we physicists simply get by with a passion for the field, a team of Italian economists actually did the legwork on this one. And they came up with a really encouraging answer.

The paper being summarized here performed a cost-benefit analysis of the LHC from 1993 to 2025, in order to estimate its eventual impact on the world at large. Not only does that include benefit to future scientific endeavors, but to industry and even the general public as well. To do this, they called upon some classic non-physics notions, so let’s start with a quick economics primer.

• A cost-benefit analysis (CBA) is a common thing to do before launching a large-scale investment project. The LHC collaboration is a particularly tough thing to analyze; it is massive, international, complicated, and has a life span of several decades.
• In general, basic research is notoriously difficult to justify to funding agencies, since there are no immediate applications. (A similar problem is encountered with environmental CBAs, so there are some overlapping ideas between the two.) The value taxpayers place on something they fund without getting any direct use of the end product is referred to as a non-use value.
• When predicting the future gets fuzzy, economists define something called a quasi-option value. For the LHC, this includes aspects of timing and resource allocation (for example, what potential quality-of-life benefits come from discovering supersymmetry, and how bad would it have been if we pushed these off another 100 years?)
• One can also make a general umbrella term for the benefit of pure knowledge, called an existence value. This involves a sort of social optimization; basically what taxpayers are willing to pay to get more knowledge.

The actual equation used to represent the different costs and benefits at play here is:

$NPV = PVB_u - PVC_u + (QOV_0 + EXV_0)$

Let’s break this down by terms.

PVCu is the present value of the capital and operating costs associated with getting the project off the ground and keeping it running.

PVBu is the present value of the economic benefits. Here is where we have to break down even further, into who is benefitting and what they get out of it:

1. Scientists, obviously. They get to publish new research and keep having jobs. Same goes for students and post-docs.
2. Technological industry. Not only do they get wrapped up in the supply chain of building these machines, but basic research can quickly turn into very profitable new ideas for private companies.
3. Everyone else. Because it’s fun to tour the facilities or go to public lectures. Plus CERN even has an Instagram now.

Just to give you an idea of how much overlap there really is between all these sources of benefit,  Figure 1 shows the monetary amount of goods procured from industry for the LHC. Figure 2 shows the number of ROOT software downloads, which, if you are at all familiar with ROOT, may surprise you (yes, it really is very useful outside of HEP!)

Figure 1: Amount of money (thousands of Euros) spent on industry for the LHC. pCp is past procurement, tHp1 is the total high tech procurement, and tHp2 is the high tech procurement for orders > 50 kCHF.

The rightmost term encompasses the non-use value, which is the sum of the quasi-option value QOV0 and the existence value EXV0. If it sounded hard to measure a quasi-option value, it really is. In fact, the authors of this paper simply set it to 0, as a worst-case value.

The other values come from in-depth interviews of over 1500 people, including all different types of physicists and industry representatives, as well as previous research papers. This data is then funneled into a computable matrix model, with a cell for each cost/benefit variable, for each year in the LHC lifetime. One can then create a conditional probability distribution function for the net present value (NPV) using Monte Carlo simulations to deal with the stochastic variables.
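
The Monte Carlo step can be sketched as follows. This is not the authors' model: the yearly cash flows, the 3% discount rate, and the Gaussian uncertainty on each component are hypothetical placeholders, chosen only to show how sampling the stochastic inputs turns a single NPV into a distribution. As in the paper, the quasi-option value is set to 0.

```python
import random
import statistics

def present_value(flows, rate):
    """Discount yearly flows (year 0 first) to year 0: sum of c_t / (1 + rate)**t."""
    return sum(c / (1 + rate) ** t for t, c in enumerate(flows))

def simulate_npv(costs, benefits, existence_value, rate, draws=10_000, seed=0):
    """Monte Carlo samples of NPV = PVB - PVC + (QOV + EXV), with QOV set to 0.

    Uncertainty is modelled, purely for illustration, by one Gaussian
    multiplier per draw on costs and benefits, plus a Gaussian existence value.
    """
    rng = random.Random(seed)
    pvc, pvb = present_value(costs, rate), present_value(benefits, rate)
    samples = []
    for _ in range(draws):
        cost_factor = max(0.0, rng.gauss(1.0, 0.1))
        benefit_factor = max(0.0, rng.gauss(1.0, 0.3))
        exv = max(0.0, rng.gauss(existence_value, 0.3 * existence_value))
        qov = 0.0  # worst-case choice, as in the paper
        samples.append(benefit_factor * pvb - cost_factor * pvc + (qov + exv))
    return samples

# Hypothetical yearly flows in millions of euros, NOT the paper's inputs.
costs = [800] * 5 + [300] * 15
benefits = [100] * 5 + [450] * 15
npvs = simulate_npv(costs, benefits, existence_value=1000, rate=0.03)
print("expected NPV:", statistics.fmean(npvs))
print("P(NPV > 0):", sum(v > 0 for v in npvs) / len(npvs))
```

A histogram of `npvs` plays the role of the PDF in the paper's figure; the real analysis samples many more variables, year by year.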

The end PDF is shown in Figure 3, with an expected NPV of 2.9 billion euros! This also shows an expected benefit/cost ratio of 1.2; a project is generally considered justifiable if this ratio is greater than 1. If this all seems terribly exciting (it is), it never hurts to contact your Congressman and tell them just how much you love physics. It may not seem like much, but it will help ensure that the scientific community continues to get projects on the level of the LHC, even during tough federal budget times.

Figure 3: Net present value PDF (left) and cumulative distribution (right).

Here’s hoping this article helped you avoid at least one common source of awkwardness at a party. Unfortunately we can’t help you field concerns about the LHC destroying the world. You’re on your own with that one.

1. Another supercollider that didn’t get so lucky: The SSC story
2. More on cost-benefit analysis

### Backreaction — The Philosophy of Modern Cosmology (srsly)

Image: Model of inflation (source: umich.edu)
I wrote my recent post on the “Unbearable Lightness of Philosophy” to introduce a paper summary, but it got somewhat out of hand. I don’t want to withhold the actual body of my summary though. The paper in question is

Before we start I have to warn you that the paper speaks a lot about realism and underdetermination, and I couldn’t figure out what exactly the authors mean by these words. Sure, I looked them up, but that didn’t help because there doesn’t seem to be an agreement on what the words mean. It’s philosophy after all.

Personally, I subscribe to a philosophy I’d like to call agnostic instrumentalism, which means I think science is useful and I don’t care what else you want to say about it – anything from realism to solipsism to Carroll’s “poetic naturalism” is fine by me. In newspeak, I’m a whateverist – now go away and let me science.

The authors of the paper, in contrast, position themselves as follows:
“We will first state our allegiance to scientific realism… We take scientific realism to be the doctrine that most of the statements of the mature scientific theories that we accept are true, or approximately true, whether the statement is about observable or unobservable states of affairs.”
But rather than explaining what this means, the authors next admit that this definition contains “vague words,” and apologize that they “will leave this general defense to more competent philosophers.” Interesting approach. A physics paper in this style would say: “This is a research article about General Relativity which has something to do with curvature of space and all that. This is just vague words, but we’ll leave a general defense to more competent physicists.”

In any case, it turns out that it doesn’t matter much for the rest of the paper exactly what realism means to the authors – it’s a great paper also for an instrumentalist because it’s long enough so that, rolled up, it’s good to slap flies. The focus on scientific realism seems somewhat superfluous, but I notice that the paper is to appear in “The Routledge Handbook of Scientific Realism” which might explain it.

It also didn’t become clear to me what the authors mean by underdetermination. Vaguely speaking, they seem to mean that a theory is underdetermined if it contains elements unnecessary to explain existing data (which is also what Wikipedia offers by way of definition). But the question what’s necessary to explain data isn’t a simple yes-or-no question – it’s a question that needs a quantitative analysis.

In theory development we always have a tension between simplicity (fewer assumptions) and precision (better fit) because more parameters normally allow for better fits. Hence we use statistical measures to find out in which case a better fit justifies a more complicated model. I don’t know how one can claim that a model is “underdetermined” without such quantitative analysis.
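
The post doesn't name a particular statistical measure; one common choice is the Bayesian information criterion, $\mathrm{BIC} = n \ln(\mathrm{RSS}/n) + k \ln n$ for $n$ data points, $k$ fitted parameters, and Gaussian errors. A minimal sketch on hypothetical data generated from a straight line, showing that an extra parameter must improve the fit enough to pay its $\ln n$ penalty (all names and data are illustrative):

```python
import math

def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients (constant term first) via the normal
    equations; adequate for tiny, well-conditioned toy problems like this one."""
    m = degree + 1
    # Augmented normal equations [A^T A | A^T y] for the Vandermonde matrix A.
    aug = [[sum(x ** (i + j) for x in xs) for j in range(m)]
           + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(aug[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (aug[i][m] - tail) / aug[i][i]
    return coeffs

def rss(xs, ys, coeffs):
    """Residual sum of squares of the fitted polynomial."""
    return sum((y - sum(c * x ** i for i, c in enumerate(coeffs))) ** 2
               for x, y in zip(xs, ys))

def bic(residual_ss, n, k):
    """Bayesian information criterion for Gaussian errors, up to an additive constant."""
    return n * math.log(residual_ss / n) + k * math.log(n)

xs = list(range(7))
# Hypothetical scatter, chosen orthogonal to 1, x and x^2 so that a quadratic
# term cannot lower the residuals at all relative to a straight line.
noise = [-0.3, 0.3, 0.3, 0.0, -0.3, -0.3, 0.3]
ys = [2 * x + 1 + e for x, e in zip(xs, noise)]

scores = {}
for degree in (0, 1, 2):
    coeffs = polyfit(xs, ys, degree)
    scores[degree] = bic(rss(xs, ys, coeffs), len(xs), degree + 1)
best_degree = min(scores, key=scores.get)
print(scores, "best degree:", best_degree)
```

Here the constant model fits badly and the quadratic fits no better than the line, so the criterion picks the line. Other criteria (AIC, cross-validation) use different penalties and can disagree in marginal cases.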

The authors of the paper for the most part avoid the need to quantify underdetermination by using sociological markers, i.e., they treat models as underdetermined if cosmologists haven’t yet agreed on the model in question. I guess that’s the best they could have done, but it’s not a basis on which one can discuss what will remain underdetermined. The authors for example seem to implicitly believe that evidence for a theory at high energies can only come from processes at such high energies, but that isn’t so – one can also use high precision measurements at low energies (at least in principle). In the end it comes down, again, to quantifying which model is the best fit.

With this advance warning, let me tell you about the three main philosophical issues that the authors discuss.

1. Underdetermination of topology.

Einstein’s field equations are local differential equations which describe how energy-densities curve space-time. This means these equations describe how space changes from one place to the next and from one moment to the next, but they do not fix the overall connectivity – the topology – of space-time*.

A sheet of paper is a simple example. It’s flat and it has no holes. If you roll it up and make a cylinder, the paper is still flat, but now it has a hole. You could find out about this without reference to the embedding space by drawing a circle onto the cylinder and around its perimeter, so that it can’t be contracted to zero length while staying on the cylinder’s surface. This could never happen on a flat sheet. And yet, if you look at any one point of the cylinder and its surrounding, it is indistinguishable from a flat sheet. The flat sheet and the cylinder are locally identical – but they are globally different.
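
The circle-drawing test can be made quantitative with a winding number. A minimal sketch, assuming the cylinder is parametrised by an angle $\theta$ around it and a height $z$ (the function name is an illustrative choice): the loop around the perimeter has winding number 1, while a small loop drawn on one patch has winding number 0. On the unrolled sheet, where $\theta$ is an ordinary coordinate rather than a periodic one, the same sum telescopes to 0 for every closed loop. Since the winding number cannot change under continuous deformation, a loop with nonzero winding cannot be shrunk to a point.

```python
import math

def winding_number(loop):
    """Net number of times a closed loop on a cylinder wraps around it.

    Points are (theta, z) pairs, with theta an angle around the cylinder. Each
    step's change in theta is wrapped into [-pi, pi), which assumes the loop is
    sampled finely enough that no step covers half the circumference.
    """
    total = 0.0
    for (t0, _), (t1, _) in zip(loop, loop[1:] + loop[:1]):
        total += (t1 - t0 + math.pi) % (2 * math.pi) - math.pi
    return round(total / (2 * math.pi))

n = 100
# A loop running once around the cylinder's perimeter at height z = 0.
around = [(2 * math.pi * k / n, 0.0) for k in range(n)]
# A small loop on one patch of the cylinder, which looks just like a flat sheet.
local = [(1.0 + 0.1 * math.cos(2 * math.pi * k / n), 0.1 * math.sin(2 * math.pi * k / n))
         for k in range(n)]
print(winding_number(around), winding_number(local))
```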

General Relativity thus can’t tell you the topology of space-time. But physicists don’t normally worry much about this because you can parameterize the differences between topologies, compute observables, and then compare the results to data. In that respect, topology is no different from any other assumption of a cosmological model. Cosmologists can look, and have looked, for evidence of non-trivial space-time connectivity in the CMB data, but they haven’t found anything that would indicate our universe wraps around itself. At least so far.

In the paper, the authors point out an argument raised by someone else (Manchak) which claims that different topologies can’t be distinguished almost everywhere. I haven’t read the paper in question, but this claim is almost certainly correct. The reason is that while topology is a global property, you can change it on arbitrarily small scales. All you have to do is punch a hole into that sheet of paper, and whoops, it’s got a new topology. Or if you want something without boundaries, then identify two points with each other. Indeed you could sprinkle space-time with arbitrarily many tiny wormholes and in that way create the most abstruse topological properties (and, most likely, lots of causal paradoxa).

The topology of the universe is hence, like the topology of the human body, a matter of resolution. On distances visible to the eye you can count the holes in the human body on the fingers of your hand. On shorter distances though you’re all pores and ion channels, and on subatomic distances you’re pretty much just holes. So, asking what’s the topology of a physical surface only makes sense when one specifies at which distance scale one is probing this (possibly higher-dimensional) surface.

I thus don’t think any physicist will be surprised by the philosophers’ finding that cosmology severely underdetermines global topology. What the paper fails to discuss though is the scale-dependence of that conclusion. Hence, I would like to know: Is it still true that the topology will remain underdetermined on cosmological scales? And to what extent, and under which circumstances, can the short-distance topology have long-distance consequences, as suggested, e.g., by the ER=EPR idea? What effect would this have on the separation of scales in effective field theory?

2. Underdetermination of models of inflation.

The currently most widely accepted model for the universe assumes the existence of a scalar field – the “inflaton” – and a potential for this field – the “inflation potential” – in which the field moves towards a minimum. While the field is getting there, space is exponentially stretched. At the end of inflation, the field’s energy is dumped into the production of particles of the standard model and dark matter.

This mechanism was invented to solve various finetuning problems that cosmology otherwise has, notably that the universe seems to be almost flat (the “flatness problem”), that the cosmic microwave background has the almost-same temperature in all directions except for tiny fluctuations (the “horizon problem”), and that we haven’t seen any funky things like magnetic monopoles or domain walls that tend to be plentiful at the energy scale of grand unification (the “monopole problem”).

Trouble is, there’s loads of inflation potentials that one can cook up, and most of them can’t be distinguished with current data. Moreover, one can invent more than one inflation field, which adds to the variety of models. So, clearly, the inflation models are severely underdetermined.

I’m not really sure why this overabundance of potentials is interesting for philosophers. This isn’t so much philosophy as sociology – that the models are underdetermined is why physicists get them published, and if there was enough data to extract a potential that would be the end of their fun. Whether there will ever be enough data to tell them apart, only time will tell. Some potentials have already been ruled out with incoming data, so I am hopeful.

The questions that I wish philosophers would take on are different ones. To begin with, I’d like to know which of the problems that inflation supposedly solves are actual problems. It only makes sense to complain about finetuning if one has a probability distribution. In this, the finetuning problem in cosmology is distinctly different from the finetuning problems in the standard model, because in cosmology one can plausibly argue there is a probability distribution – it’s that of fluctuations of the quantum fields which seed the initial conditions.

So, I believe that the horizon problem is a well-defined problem, assuming quantum theory remains valid close to the Planck scale. I’m not so sure, however, about the flatness problem and the monopole problem. I don’t see what’s wrong with just assuming the initial value for the curvature is tiny (finetuned), and I don’t know why I should care about monopoles given that we don’t know grand unification is more than a fantasy.

Then, of course, the current data indicates that the inflation potential too must be finetuned which, as Steinhardt has aptly complained, means that inflation doesn’t really solve the problem it was meant to solve. But to make that statement one would have to compare the severity of finetuning, and how does one do that? Can one even make sense of this question? Where are the philosophers if one needs them?

Finally, I have a more general conceptual problem that falls into the category of underdetermination, namely, to what extent the achievements of inflation are actually independent of each other. Assume, for example, you have a theory that solves the horizon problem. Under which circumstances does it also solve the flatness problem and give the right tilt for the spectral index? I suspect that the assumptions for this do not require the full mechanism of inflation with potential and all, and almost certainly not a very specific type of potential. Hence I would like to know what’s the minimal theory that explains the observations, and which assumptions are really necessary.

3. Underdetermination in the multiverse.

Many models for inflation create not only one universe, but infinitely many of them, a whole “multiverse”. In the other universes, fundamental constants – or maybe even the laws of nature themselves – can be different. How do you make predictions in a multiverse? You can’t, really. But you can make statements about probabilities, about how likely it is that we find ourselves in this universe with these particles and not any other.

To make statements about the probability of the occurrence of certain universes in the multiverse one needs a probability distribution or a measure (in the space of all multiverses or their parameters respectively). Such a measure should also take into account anthropic considerations, since there are some universes which are almost certainly inhospitable for life, for example because they don’t allow the formation of large structures.

In their paper, the authors point out that the combination of a universe ensemble and a measure is underdetermined by observations we can make in our universe. It’s underdetermined in the same way that, if I give you a bag of marbles and say the most likely pick is red, you can’t tell what’s in the bag.

I think physicists are well aware of this ambiguity, but unfortunately the philosophers don’t address why physicists ignore it. Physicists ignore it because they believe that one day they can deduce the theory that gives rise to the multiverse and the measure on it. To make their point, the philosophers would have had to demonstrate that this deduction is impossible. I think it is, but I’d rather leave the case to philosophers.

For the agnostic instrumentalist like me a different question is more interesting, which is whether one stands to gain anything from taking a “shut-up-and-calculate” attitude to the multiverse, even if one distinctly dislikes it. Quantum mechanics too uses unobservable entities, and that formalism – however much you detest it – works very well. It really adds something new, regardless of whether or not you believe the wave-function is “real” in some sense. As far as the multiverse is concerned, I am not sure about this. So why bother with it?

Consider the best-case multiverse outcome: Physicists will eventually find a measure on some multiverse according to which the parameters we have measured are the most likely ones. Hurray. Now forget about the interpretation and think of this calculation as a black box: You put in math on one side and out comes a set of “best” parameters on the other side. You could always reformulate such a calculation as an optimization problem which allows one to calculate the correct parameters. So, independent of the thorny question of what’s real, what do I gain from thinking about measures on the multiverse rather than just looking for an optimization procedure straight away?

Yes, there are cases – like bubble collisions in eternal inflation – that would serve as independent confirmation for the existence of another universe. But no evidence for that has been found. So for me the question remains: under which circumstances is doing calculations in the multiverse an advantage rather than unnecessary mathematical baggage?

I think this paper makes a good example for the difference between philosophers’ and physicists’ interests which I wrote about in my previous post. It was a good (if somewhat long) read and it gave me something to think about, though I will need some time to recover from all the -isms.

* Note added: The word connectivity in this sentence is a loose stand-in for those who do not know the technical term “topology.” It does not refer to the technical term “connectivity.”

### Tommaso Dorigo — The 2016 Perseids, And The Atmosphere As A Detector

As a long-time meteor observer, I never lose an occasion to watch the peak of good showers. The problem is that such occasions have become less frequent in recent times, due to a busier agenda.
In the past few days, however, I was at CERN and could afford going out to observe the night sky, so it made sense to spend at least a couple of hours to check on the peak activity of the Perseids, which this year was predicted to be stronger than usual.

## August 14, 2016

### Clifford Johnson — A Known Fact…

“The fact that certain bodies, after being rubbed, appear to attract other bodies, was known to the ancients.” Thus begins, rather awesomely, the preface to Maxwell’s massively important “Treatise on Electricity and Magnetism” (1873). -cvj

The post A Known Fact… appeared first on Asymptotia.

## August 13, 2016

### Jordan Ellenberg — I got a message for you

“I got a message for you, if I could only remember. I got a message for you, but you’re gonna have to come and get it.” Kardyhm Kelly gave me a tape of Zopilote Machine in 1995 and I played nothing else for a month. “Sinaloan Milk Snake Song” especially. Nobody but the Mountain Goats ever made do-it-yourself music like this; nobody else ever made it seem so believable that the things it occurred to you to say or sing while you were playing your guitar in your bedroom at home might actually be pop songs. The breakdown at the end of this!

“I’ve got a heavy coat, it’s filled with rocks and sand, and if I lose it I’ll be coming back one day (I got a message for you).”  I spent a lot of 1993 thinking about the chord progression in the verse of this song.  How does it sound so straight-ahead but also so weird?  Also the “la la la”s (“Sinaloan Milk Snake Song” has these too.)

“Roll me in the greenery, point me at the scenery.  Exploit me in the deanery.  I got a message for you.”

The first of these I ever heard.  Douglas Wolk used to send mixtapes to Elizabeth Wilmer at Math Olympiad training.  This was on one of them.  1987 probably. I hadn’t even started listening to WHFS yet, I had no idea who Robyn Hitchcock was.  It was on those tapes I first heard the Ramones, Marshall Crenshaw, the Mentors (OK, we were in high school, cut us some slack.)

(Update:  Douglas denies ever putting the Mentors on a mixtape, and now that I really think about it, I believe Eric Wepsic was to blame for bringing the Mentors into my life.)

Why is this line so potent?  Why is the message never explicitly presented?  It’s enough — it’s better — that the message only be alluded to, never spoken, never delivered.

### Geraint F. Lewis — For the love of Spherical Harmonics

I hate starting every blog post with an apology that I have been busy, but I have. Teaching Electromagnetism to our first-year class, teaching computational physics using MATLAB, and wrangling six smart, talented students take up a lot of time.

But I continue to try and learn a new thing every day! And so here's a short summary of what I've been doing recently.

It's no secret that I love maths. I'm not skilled enough to be a mathematician, but I am an avid user. One of the things I love about maths is its shock value. What, I hear you say, shock? Yes, shock.

I remember when I discovered that trigonometric functions can be written as infinite series, and then found that you can calculate these series numerically on a computer by adding the terms together, getting more and more accurate as you add higher terms.
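
That experiment takes only a few lines. A minimal sketch for $\sin x = x - x^3/3! + x^5/5! - \cdots$, building each term from the previous one instead of computing factorials (the function name is an illustrative choice):

```python
import math

def sin_series(x, terms):
    """Partial sum of the Taylor series for sin(x), keeping `terms` terms."""
    total = 0.0
    term = x  # first term: x^1 / 1!
    for k in range(terms):
        total += term
        # The next term is the current one times -x^2 / ((2k+2)(2k+3)).
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total

for n in (1, 2, 3, 5, 8):
    print(n, sin_series(1.0, n), math.sin(1.0))
```

Each extra term improves the answer, and for moderate $x$ a handful of terms already agree with `math.sin` to many digits.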

And then there is Fourier Series! The fact that you can add these trigonometric functions together, appropriately weighted, to make other functions, functions that look nothing like sines and cosines. And again, calculating these on a computer.
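
The square wave is a classic test case. A square wave equal to $+1$ on $(0, \pi)$ and $-1$ on $(-\pi, 0)$ has the Fourier series $\frac{4}{\pi}\sum_{k\ \mathrm{odd}} \frac{\sin kx}{k}$. A minimal sketch that sums the first few odd harmonics (the function name is an illustrative choice):

```python
import math

def square_wave_partial(x, harmonics):
    """(4/pi) * sum of sin(k x)/k over the first `harmonics` odd k = 1, 3, 5, ..."""
    return 4 / math.pi * sum(math.sin(k * x) / k for k in range(1, 2 * harmonics, 2))

for h in (1, 3, 10, 100):
    print(h, square_wave_partial(math.pi / 2, h))
```

Away from the jumps the partial sums converge to $\pm 1$; right next to a jump they overshoot by about 9% of the jump however many terms you keep (the Gibbs phenomenon).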

This is my favourite, the fact that you can add waves together to make a square wave.

But we can go one step higher. We can think of waves on a sphere. These are special waves called Spherical Harmonics.

Those familiar with Schrödinger's equation know that these appear in the solution for the hydrogen atom, describing the wave function and telling us about the properties of the electron.

But spherical harmonics on a sphere are like the sines and cosines above, and we can describe any function over a sphere by summing up the appropriately weighted harmonics. What function you might be thinking? How about the heights of the land and the depths of the oceans over the surface of the Earth?

This cool website has done this, and provides the coefficients that you need to describe the surface of the Earth in terms of spherical harmonics. The coefficients are complex numbers, as they describe not only how much of a harmonic you need to add, but also how much you need to rotate it.
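
The claim that a complex coefficient encodes a rotation can be checked directly: each harmonic $Y_l^m$ depends on longitude only through $e^{im\phi}$, so multiplying its coefficient by a phase $e^{-im\alpha}$ is the same as rotating the pattern by $\alpha$ about the polar axis. A minimal sketch using an unnormalised $l = 2$, $m = 1$ harmonic written out by hand (not healpy, and not the website's data; names are illustrative):

```python
import cmath
import math

def y21_shape(theta, phi):
    """Angular shape of the l=2, m=1 harmonic: sin(theta) cos(theta) e^{i phi}.
    Normalisation and sign convention are dropped; they do not affect rotation."""
    return math.sin(theta) * math.cos(theta) * cmath.exp(1j * phi)

def contribution(coeff, theta, phi):
    """Real-valued contribution of one m > 0 term, up to a constant factor.

    For a real function the (l, m) and (l, -m) terms pair up into 2 Re(a_lm Y_lm).
    """
    return (coeff * y21_shape(theta, phi)).real

amplitude, m, alpha = 0.7, 1, math.radians(40)
# Same amplitude, but with phase -m*alpha: the pattern is rotated east by alpha.
coeff = amplitude * cmath.exp(-1j * m * alpha)
theta, phi = 1.0, 2.0
print(contribution(coeff, theta, phi), contribution(amplitude, theta, phi - alpha))
```

So the modulus of each coefficient sets how much of a harmonic to add, and its phase sets how far to turn it in longitude.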

What you are seeing is the surface of the Earth. At the start, we have only the zeroth "mode", which is just a constant value across the surface. Then we add the first mode, which is a "dipole", which is negative on one side of the Earth and positive on the other, but appropriately rotated. And then we keep adding higher and higher modes, which adds more and more detail. And I think it looks very cool!

Why are you doing this, I hear you cry. Why, because to make this work, I had to beef up my knowledge of Python and POV-Ray, learn how to fully wrangle healpy to deal with functions on a sphere, do a bit of scripting and a bit of ffmpeg, and remember just what spherical harmonics are. And as I've written about before, I think it is important for a researcher to grow these skills.

When will I need these skills? Dunno, but they're now in my bag of tricks and ready to use.

## August 12, 2016

### Backreaction — The Unbearable Lightness of Philosophy

Philosophy isn’t useful for practicing physicists. On that, I am with Steven Weinberg and Lawrence Krauss who have expressed similar opinions. But I think it’s an unfortunate situation because physicists – especially those who work on the foundations of physics – could use help from philosophers.

Massimo Pigliucci, a professor of philosophy at CUNY-City College, has ingeniously addressed physicists’ complaints about the uselessness of philosophy by declaring that “the business of philosophy is not to advance science.” Philosophy, hence, isn’t just useless, but it’s useless on purpose. I applaud. At least that means it has a purpose.

But I shouldn’t let Massimo Pigliucci speak for his whole discipline.

I’ve been told that, as far as physics is concerned, there are presently three good philosophers roaming Earth: David Albert, Jeremy Butterfield, and Tim Maudlin. It won’t surprise you to hear that I have some issues to pick with each of these gentlemen, but mostly they seem reasonable indeed. I would even like to nominate a fourth Good Philosopher, Steven Weinstein from UoW, with whom even I haven’t yet managed to disagree.

The good Maudlin, for example, had an excellent essay last year on PBS NOVA, in which he argued that “Physics needs Philosophy.” I really liked his argument until he wrote that “Philosophers obsess over subtle ambiguities of language,” which pretty much sums up all that physicists hate about philosophy.

If you want to know “what follows from what,” as Maudlin writes, you have to convert language into mathematics and thereby remove the ambiguities. Unfortunately, philosophers never seem to take that step, hence physicists’ complaints that it’s just words. Or, as Arthur Koestler put it, “the systematic abuse of a terminology specially invented for that purpose.”

Maybe, I admit, it shouldn’t be the philosophers’ job to spell out how to remove the ambiguities in language. Maybe that should already be the job of physicists. But regardless of whom you want to assign the task of reaching across the line, presently little crosses it. Few practicing physicists today care what philosophers do or think.

And as someone who has tried to write about topics at the intersection of both fields, I can report that this disciplinary segregation has by now become institutionalized: The physics journals won’t publish on the topic because it’s too much philosophy, and the philosophy journals won’t publish because it’s too much physics.

In a recent piece on Aeon, Pigliucci elaborates on the demarcation problem, how to tell science from pseudoscience. He seems to think this problem is what underlies some physicists’ worries about string theory and the multiverse, worries that were the topic of a workshop that both he and I attended last year.

But he got it wrong. While I know lots of physicists critical of string theory for one reason or the other, none of them would go so far as to declare it pseudoscience. No, the demarcation problem that physicists worry about isn’t that between science and pseudoscience. It’s that between science and philosophy. It is not without irony that Pigliucci in his essay conflates the two fields. Or maybe the purpose of his essay was an attempt to revive the “string wars,” in which case, wake me when it’s over.

To me, the part of philosophy that is relevant to physics is what I’d like to call “pre-science” – sharpening questions sufficiently so that they can eventually be addressed by scientific means. Maudlin in his above mentioned essay expressed a very similar point of view.

Philosophers in that area are necessarily ahead of scientists. But they also never get the credit for actually answering a question, because for that they’ll first have to hand it over to scientists. Like a psychologist, then, the philosopher of physics succeeds by eventually making themselves superfluous. It seems a thankless job. There’s a reason I preferred studying physics instead.

Many of the “bad philosophers” are those who aren’t quick enough to notice that a question they are thinking about has been taken over by scientists. That this failure to notice can evidently persist, in some cases, for decades is another institutionalized problem that originates in the lack of communication between the two fields.

Hence, I wish there were more philosophers willing to make it their business to advance science and to communicate across the boundaries. Maybe physicists would complain less that philosophy is useless if it wasn’t useless.

### Chad Orzel — Physics Blogging Round-Up: Camera Tricks, College Advice, Hot Fans, and Lots of Quantum

Several weeks of silence here, for a bunch of reasons that mostly boil down to “being crazy busy.” I’ve got a bunch of physics posts over at Forbes during that interval, though:

The Camera Trick That Justifies The Giant Death Star: I busted out camera lenses and the kids’ toys to show how you might make the Death Star appear as huge as on the Rogue One poster.

How Quantum Physics Could Protect You Against Embarrassing Email Hacks: Using the DNC email leak as an excuse to talk about quantum cryptography.

Four Things You Should Expect To Get Out Of College: Some advice for students starting their college careers about what will really matter for long-term success (the time scale of a career, not just a first job).

How Quantum Sudoku Demonstrate Entanglement: One of the things contributing to “crazy busy” was the second round of the Schrodinger Sessions workshop, at which I heard a clever analogy for entanglement from Howard Wiseman by way of Alan Migdall, and turned it into a blog post.

Is Your Fan Actually Heating The Air?: We talk about temperature as measuring how fast atoms in a gas are moving. Does that mean that a fan setting air into motion is actually increasing the air temperature?

Three Tricks Physicists Use To Observe Quantum Behavior: Another post prompted by the Schrodinger Sessions, this one a big-picture look at the general approaches physicists take to doing experimental demonstrations of quantum phenomena.

So, you know, that’s a bunch of stuff, all right.

## August 09, 2016

### Resonaances — Game of Thrones: 750 GeV edition

The 750 GeV diphoton resonance has made a big impact on theoretical particle physics. The number of papers on the topic is already legendary, and they keep coming at the rate of order 10 per week. Given that the Backović model is falsified, there's no longer a theoretical upper limit.  Does this mean we are not dealing with the classical ambulance chasing scenario? The answer may be known in the next days.

So who's leading this race?  What kind of question is that, you may shout, of course it's Strumia! And you would be wrong, independently of the metric.  For this contest, I will consider two different metrics: the King Beyond the Wall that counts the number of papers on the topic, and the Iron Throne that counts how many times these papers have been cited.

In the first category, the contest is much more fierce than one might expect: it takes 8 papers to be the leader, and 7 papers may not be enough to even get on the podium! Among the 3 authors with 7 papers, the final classification is decided not by trial by combat but by the citation count. The result is (drums):

Citations, tja... The social dynamics of our community encourage referencing all previous work on the topic, rather than just the relevant ones, which in this particular case triggered a period of inflation. One day soon citation numbers will mean as much as authorship in experimental particle physics. But for now the size of the h-factor is still an important measure of virility for theorists. If the citation count rather than the number of papers is the main criterion, the iron throne is taken by a Targaryen contender (trumpets):

This explains why the resonance is usually denoted by the letter S.

Update 09.08.2016. Now that the 750 GeV excess is officially dead, one can give the final classification. The race for the iron throne was tight till the end, but there could only be one winner:

As you can see, in this race the long-term strategy and persistence proved to be more important than pulling off a few early victories. In the other category there have also been changes in the final stretch: the winner added 3 papers in the period between the unofficial and official announcements of the demise of the 750 GeV resonance. The final standings are:

Congratulations to all the winners. To all the rest, I wish you more luck and persistence in the next edition, provided it takes place.

### Clifford Johnson — All Aboard…!

The other day, quite recently, I clicked "place your order" on... a toy New York MTA bus. I can't pretend it was for the youngster of the house, it was for me. No, it is not a mid-life crisis (heh... I'm sure others might differ on this point), and I will happily declare that it is not out of nostalgia for my time in the city, especially back in the 90s.

It's for the book. I've an entire story set on a bus in Manhattan and I neglected to location scout a bus when I was last there. I figured I could work from tourist photos and so forth. Turns out that you don't get many good tourist photos of MTA bus interiors, and not the angles I want. Then I discovered various online bus-loving subcultures that go through all the details of every model of NYC bus, with endless shots of the buses in different parts of the city... but still not many good interiors and no good overheads and so forth. (See Transittalk, for example - I now know way more about buses in New York than I ever thought I'd want to know.) Then I accidentally had an Amazon link show up in my [...]

The post All Aboard…! appeared first on Asymptotia.

### Jordan Ellenberg — “On l-torsion in class groups of number fields” (with L. Pierce, M.M. Wood)

New paper up with Lillian Pierce and Melanie Matchett Wood!

Here’s the deal.  We know a number field K of discriminant D_K has class group of size bounded above by roughly D_K^{1/2}.  On the other hand, if we fix a prime l, the l-torsion in the class group ought to be a lot smaller.  Conjectures of Cohen-Lenstra type predict that the average size of the l-torsion in the class group of D_K, as K ranges over a “reasonable family” of algebraic number fields, should be constant.  Very seldom do we actually know anything like this; we just have sporadic special cases, like the Davenport-Heilbronn theorem, which tells us that the 3-torsion in the class group of a random quadratic field is indeed constant on average.

But even though we don’t know what’s true on average, why shouldn’t we go ahead and speculate on what’s true universally?  It’s too much to ask that Cl(K)[l] literally be bounded as K varies (at least if you believe even the most modest version of Cohen-Lenstra, which predicts that any value of dim Cl(D_K)[l] appears for a positive proportion of quadratic fields K) but people do think it’s small:

Conjecture:  |Cl(K)[l]| < D_K^ε.

Even beating the trivial bound

|Cl(K)[l]| < |Cl(K)| < D_K^{1/2 + ε}

is not easy.  Lillian was the first to do it for 3-torsion in quadratic fields.  Later, Helfgott-Venkatesh and Venkatesh and I sharpened those bounds.  I hear from Frank Thorne that he, Bhargava, Shankar, Tsimerman and Zhao have a nontrivial bound on 2-torsion for the class group of number fields of any degree.
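
To get a feel for the trivial bound, here is a toy Python computation for the simplest case: imaginary quadratic fields. For a discriminant d < 0, the class number h(d) equals the number of reduced primitive positive-definite binary quadratic forms of discriminant d. For a fundamental discriminant, this is the class number of Q(√d). The sketch has clear limits:

- It is not from the paper.
- It counts the whole class group, not its l-torsion.
- The function name is illustrative.

In this setting, Dirichlet's class number formula gives h(d) = O(|d|^{1/2} log|d|), which is the kind of bound the l-torsion results improve on.

```python
import math


def class_number(d):
    """Number of reduced primitive positive-definite binary quadratic forms
    a x^2 + b x y + c y^2 with b^2 - 4ac = d, for a discriminant d < 0.
    For a fundamental discriminant d this is the class number of Q(sqrt(d))."""
    if d >= 0 or d % 4 not in (0, 1):
        raise ValueError("d must be a negative integer congruent to 0 or 1 mod 4")
    h = 0
    a = 1
    while 3 * a * a <= -d:              # reduced forms satisfy 3 a^2 <= |d|
        for b in range(-a + 1, a + 1):  # -a < b <= a
            if (b * b - d) % (4 * a):
                continue                # c would not be an integer
            c = (b * b - d) // (4 * a)
            if c < a or (c == a and b < 0):
                continue                # not reduced
            if math.gcd(math.gcd(a, b), c) == 1:
                h += 1                  # count primitive forms only
        a += 1
    return h


for d in (-23, -47, -163, -10007, -1000003):
    h = class_number(d)
    print(d, h, h / math.sqrt(-d))
```
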

In the new paper with Pierce and Wood, we prove nontrivial bounds for the average size of the l-torsion in the class group of K, where l is any integer, and K is a random number field of degree at most 5.  These bounds match the conditional bounds Akshay and I get on GRH.  The point, briefly, is this.  To make our argument work, Akshay and I needed GRH in order to guarantee the existence of a lot of small rational primes which split in K.  (In a few cases, like 3-torsion of quadratic fields, we used a “Scholz reflection trick” to get around this necessity.)  At the time, there was no way to guarantee small split primes unconditionally, even on average.  But thanks to the developments of the last decade, we now know a lot more about how to count number fields of small degree, even if we want to do something delicate like keep track of local conditions.  So, for instance, not only can one count quartic fields of discriminant < X, we can count fields which have specified decomposition at any specified finite set of rational primes.  This turns out to be enough — as long as you are super-careful with error terms! — to  allow us to show, unconditionally, that most number fields of discriminant < D have enough small split primes to make the bound on l-torsion go.  Hopefully, the care we took here to get counts with explicit error terms for number fields subject to local conditions will be useful for other applications too.

## August 08, 2016

### John Preskill — Greg Kuperberg’s calculus problem

“How good are you at calculus?”

This was the opening sentence of Greg Kuperberg’s Facebook status on July 4th, 2016.

“I have a joint paper (on isoperimetric inequalities in differential geometry) in which we need to know that

$(\sin\theta)^3 xy + ((\cos\theta)^3 -3\cos\theta +2) (x+y) - (\sin\theta)^3-6\sin\theta -6\theta + 6\pi - 6\arctan(x) +2x/(1+x^2) -6\arctan(y) +2y/(1+y^2)$

is non-negative for x and y non-negative and $\theta$ between $0$ and $\pi$. Also, the minimum only occurs for $x=y=1/\tan(\theta/2)$.”
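
In the spirit of the paper's remark that the lemma is "evident from contour plots", here is a quick Python sanity check. It is a grid scan, not a proof. It evaluates the expression on a grid with $\theta \in (0,\pi)$ and $0 \le x, y \le 10$, and also on the claimed minimum locus $x = y = \cot(\theta/2)$. The grid size, the cutoff at 10 and the function name are arbitrary choices made for this sketch.

```python
import math


def kuperberg_F(theta, x, y):
    """The expression quoted from Lemma 7.1 (the function name is ours)."""
    s, c = math.sin(theta), math.cos(theta)
    return (s**3 * x * y + (c**3 - 3 * c + 2) * (x + y)
            - s**3 - 6 * s - 6 * theta + 6 * math.pi
            - 6 * math.atan(x) + 2 * x / (1 + x * x)
            - 6 * math.atan(y) + 2 * y / (1 + y * y))


# Grid scan (a sanity check, not a proof): theta in (0, pi), 0 <= x, y <= 10.
N = 40
worst = min(
    kuperberg_F(math.pi * (i + 0.5) / N, 10 * j / N, 10 * k / N)
    for i in range(N) for j in range(N + 1) for k in range(N + 1)
)
print("smallest value on the grid:", worst)

# On the claimed minimum locus x = y = cot(theta/2) the expression vanishes
# (up to floating-point rounding).
for theta in (0.5, 1.0, 2.0, 3.0):
    cot_half = 1 / math.tan(theta / 2)
    print(theta, kuperberg_F(theta, cot_half, cot_half))
```
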

Let’s take a moment to appreciate the complexity of the mathematical statement above. It is a non-linear inequality in three variables, mixing trigonometry with algebra and throwing in some arc-tangents for good measure. Greg continued:

“We proved it, but only with the aid of symbolic algebra to factor an algebraic variety into irreducible components. The human part of our proof is also not really a cake walk.

A simpler proof would be way cool.”

I was hooked. The cubic terms looked a little intimidating, but if I converted x and y into $\tan(\theta_x)$ and $\tan(\theta_y)$, respectively, as one of the comments on Facebook promptly suggested, I could at least get rid of the annoying arc-tangents and then calculus and trigonometry would take me the rest of the way. Greg replied to my initial comment outlining a quick route to the proof: “Let me just caution that we found the problem unyielding.” Hmm… Then, Greg revealed that the paper containing the original proof was over three years old (had he been thinking about this since then? that’s what true love must be like.) In the paper, titled “The Cartan-Hadamard Conjecture and The Little Prince”, the above inequality makes its appearance as Lemma 7.1 on page 45 (of 63). To quote the paper: “Although the lemma is evident from contour plots, the authors found it surprisingly tricky to prove rigorously.”

As I filled pages of calculations and memorized every trigonometric identity known to man, I realized that Greg was right: the problem was highly intractable. The quick solution that was supposed to take me two to three days turned into two weeks of hell, until I decided to drop the original approach and stick to doing calculus with the known unknowns, x and y. The next week led me to a set of three non-linear equations mixing trigonometric functions with fourth powers of x and y, at which point I thought of giving up. I knew what I needed to do to finish the proof, but it looked freaking insane. Still, like the masochist that I am, I continued calculating away until my brain was mush. And then, yesterday, during a moment of clarity, I decided to go back to one of the three equations and rewrite it in a different way. That is when I noticed the error. I had solved for $\cos\theta$ in terms of x and y, but I had made a mistake that had cost me 10 days of intense work with no end in sight. Once I found the mistake, the whole proof came together within about an hour. At that moment, I felt a mix of happiness (duh), but also sadness, as if someone I had grown fond of no longer had a reason to spend time with me and, at the same time, I had run out of made-up reasons to hang out with them. But, yeah, I mostly felt happiness.

Greg Kuperberg pondering about the universe of mathematics.

But, back to the problem. The past four weeks thinking about it have oscillated between phases of “this is the most fun I’ve had in years!” and “this is Greg’s way of telling me I should drop math and become a go-go dancer”. Now that the ordeal is over, I can confidently say that the problem is anything but “dull” (which is how Greg felt others on MathOverflow would perceive it, so he never posted it there). In fact, if I ever have to teach Calculus, I will subject my students to the step-by-step proof of this problem. OK, here is the proof. This one is for you, Greg. Thanks for being such a great role model. Sorry I didn’t get to tell you until now. And you are right not to offer a “bounty” for the solution. The journey (more like, a trip to Mordor and back) was all the money.

The proof: The first thing to note (and if I had read Greg’s paper earlier than today, I would have known as much weeks ago) is that the following equality holds (which can be verified quickly by differentiating both sides):

$4 x - 6\arctan(x) +2x/(1+x^2) = 4 \int_0^x \frac{s^4}{(1+s^2)^2} ds$.
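
Besides differentiating both sides, the identity can be spot-checked numerically. The sketch below computes the right-hand side with composite Simpson's rule. The step count and the function names are arbitrary choices for illustration.

```python
import math


def closed_form(x):
    """Left-hand side: 4x - 6 arctan(x) + 2x / (1 + x^2)."""
    return 4 * x - 6 * math.atan(x) + 2 * x / (1 + x * x)


def integrand(s):
    return s**4 / (1 + s * s) ** 2


def integral_form(x, n=2000):
    """Right-hand side, 4 * integral_0^x s^4/(1+s^2)^2 ds, by composite
    Simpson's rule with n (even) subintervals."""
    h = x / n
    total = integrand(0.0) + integrand(x)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return 4 * total * h / 3


for x in (0.5, 1.0, 3.0, 10.0):
    print(x, closed_form(x), integral_form(x))
```
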

Using the above equality (and the equivalent one for y), we get:

$F(\theta,x,y) = (\sin\theta)^3 xy + ((\cos\theta)^3 -3\cos\theta -2) (x+y) - (\sin\theta)^3-6\sin\theta -6\theta + 6\pi + 4 \int_0^x \frac{s^4}{(1+s^2)^2} ds+4 \int_0^y \frac{s^4}{(1+s^2)^2} ds.$

Now comes the fun part. We differentiate with respect to $\theta$, x and y, and set the derivatives to zero to find all the maxima and minima of $F(\theta,x,y)$ (though we are only interested in the global minimum, which is supposed to be at $x=y=\cot(\theta/2)$). Some high-school level calculus yields:

$\partial_\theta F(\theta,x,y) = 0 \implies \sin^2(\theta) (\cos(\theta) xy + \sin(\theta)(x+y)) = 2 (1+\cos(\theta))+\sin^2(\theta)\cos(\theta).$
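
If you carry the constant through, the derivative itself is 3 times the difference of the two sides above:

$\partial_\theta F = 3\left[\sin^2\theta\,(\cos\theta\, xy + \sin\theta\,(x+y)) - 2(1+\cos\theta) - \sin^2\theta\cos\theta\right]$

A central finite difference on the original form of $F$ is a cheap way to catch sign slips like the one described above. The Python sketch below does this. The step size, the sample points and the function names are arbitrary choices.

```python
import math


def F(theta, x, y):
    """Original (arctan) form of the expression; equal to the integral form."""
    s, c = math.sin(theta), math.cos(theta)
    return (s**3 * x * y + (c**3 - 3 * c + 2) * (x + y)
            - s**3 - 6 * s - 6 * theta + 6 * math.pi
            - 6 * math.atan(x) + 2 * x / (1 + x * x)
            - 6 * math.atan(y) + 2 * y / (1 + y * y))


def dF_dtheta(theta, x, y):
    """3 * (left side - right side) of the stationarity condition in theta."""
    s, c = math.sin(theta), math.cos(theta)
    return 3 * (s**2 * (c * x * y + s * (x + y)) - 2 * (1 + c) - s**2 * c)


def central_difference(theta, x, y, h=1e-6):
    """Numerical d/dtheta of F by a symmetric difference quotient."""
    return (F(theta + h, x, y) - F(theta - h, x, y)) / (2 * h)


for theta, x, y in [(0.7, 0.3, 2.0), (1.5, 1.0, 1.0), (2.8, 4.0, 0.5)]:
    print(theta, dF_dtheta(theta, x, y), central_difference(theta, x, y))
```
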

At this point, the most well-known trigonometric identity of all time, $\sin^2(\theta)+\cos^2(\theta)=1$, can be used to show that the right-hand-side can be re-written as:

$2(1+\cos(\theta))+\sin^2(\theta)\cos(\theta) = \sin^2(\theta) (\cos\theta \tan^{-2}(\theta/2) + 2\sin\theta \tan^{-1}(\theta/2)),$

where I used (my now favorite) trigonometric identity: $\tan^{-1}(\theta/2) = (1+\cos\theta)/\sin(\theta)$ (note to the reader: $\tan^{-1}(\theta) = \cot(\theta)$). Putting it all together, we now have the very suggestive condition:

$\sin^2(\theta) (\cos(\theta) (xy-\tan^{-2}(\theta/2)) + \sin(\theta)(x+y-2\tan^{-1}(\theta/2))) = 0,$

noting that, despite appearances, $\theta = 0$ is not a solution (as can be checked from the original form of this equality, unless $x$ and $y$ are infinite, in which case the expression is clearly non-negative, as we show towards the end of this post). This leaves us with $\theta = \pi$ and

$\cos(\theta) (\tan^{-2}(\theta/2)-xy) = \sin(\theta)(x+y-2\tan^{-1}(\theta/2)),$

as candidates for where the minimum may be. A quick check shows that:

$F(\pi,x,y) = 4 \int_0^x \frac{s^4}{(1+s^2)^2} ds+4 \int_0^y \frac{s^4}{(1+s^2)^2} ds \ge 0,$

since x and y are non-negative. The following obvious substitution becomes our greatest ally for the rest of the proof:

$x= \alpha \tan^{-1}(\theta/2), \, y = \beta \tan^{-1}(\theta/2).$

Substituting the above in the remaining condition for $\partial_\theta F(\theta,x,y) = 0$, and using again that $\tan^{-1}(\theta/2) = (1+\cos\theta)/\sin\theta$, we get:

$\cos\theta (1-\alpha\beta) = (1-\cos\theta) ((\alpha-1) + (\beta-1)),$

which can be further simplified to (if you are paying attention to minus signs and don’t waste a week on a wild-goose chase like I did):

$\cos\theta = \frac{1}{1-\beta}+\frac{1}{1-\alpha}$.
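
A dropped minus sign is what cost a week here, so this simplification is worth checking in exact rational arithmetic. For arbitrary $\alpha, \beta \ne 1$, the value $\cos\theta = \frac{1}{1-\beta}+\frac{1}{1-\alpha}$ should make the previous equation hold identically. The sketch below uses Python's `fractions` module. The sample points and function names are arbitrary. Only pairs giving $|\cos\theta| \le 1$ correspond to actual angles; for example, $\alpha = 2$, $\beta = 1/3$ gives $\cos\theta = 1/2$.

```python
from fractions import Fraction


def cos_theta_candidate(alpha, beta):
    """The simplified solution cos(theta) = 1/(1-beta) + 1/(1-alpha), alpha, beta != 1."""
    return 1 / (1 - beta) + 1 / (1 - alpha)


def residual(alpha, beta, c):
    """c (1 - alpha beta) - (1 - c)((alpha - 1) + (beta - 1)); zero iff the equation holds."""
    return c * (1 - alpha * beta) - (1 - c) * ((alpha - 1) + (beta - 1))


samples = [(Fraction(3, 7), Fraction(5, 2)), (Fraction(2), Fraction(1, 3)),
           (Fraction(9, 4), Fraction(11, 5))]
for alpha, beta in samples:
    c = cos_theta_candidate(alpha, beta)
    print(alpha, beta, c, residual(alpha, beta, c))
```
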

As Greg loves to say, we are finally cooking with gas. Note that the expression is symmetric in $\alpha$ and $\beta$, which should be obvious from the symmetry of $F(\theta,x,y)$ in x and y. That observation will come in handy when we take derivatives with respect to x and y now. Factoring $(\cos\theta)^3 -3\cos\theta -2 = - (1+\cos\theta)^2(2-\cos\theta)$, we get:

$\partial_x F(\theta,x,y) = 0 \implies \sin^3(\theta) y + 4\frac{x^4}{(1+x^2)^2} = (1+\cos\theta)^2 + \sin^2\theta (1+\cos\theta).$

Substituting x and y with $\alpha \tan^{-1}(\theta/2), \beta \tan^{-1}(\theta/2)$, respectively and using the identities $\tan^{-1}(\theta/2) = (1+\cos\theta)/\sin\theta$ and $\tan^{-2}(\theta/2) = (1+\cos\theta)/(1-\cos\theta),$ the above expression simplifies significantly to the following expression:

$4\alpha^4 =\left((\alpha^2-1)\cos\theta+\alpha^2+1\right)^2 \left(1 + (1-\beta)(1-\cos\theta)\right).$

Using $\cos\theta = \frac{1}{1-\beta}+\frac{1}{1-\alpha}$, which we derived earlier by looking at the extrema of $F(\theta,x,y)$ with respect to $\theta$, and noting that the global minimum would have to be an extremum with respect to all three variables, we get:

$4\alpha^4 (1-\beta) = \alpha (\alpha-1) (1+\alpha + \alpha(1-\beta))^2,$

where we used $1 + (1-\beta)(1-\cos\theta) = \alpha (1-\beta) (\alpha-1)^{-1}$ and

$(\alpha^2-1)\cos\theta+\alpha^2+1 = (\alpha+1)((\alpha-1)\cos\theta+1)+\alpha(\alpha-1) = (\alpha-1)(1-\beta)^{-1} (2\alpha + 1-\alpha\beta).$
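
The same exact-arithmetic trick verifies this step. Given $\cos\theta = \frac{1}{1-\beta}+\frac{1}{1-\alpha}$, the sketch below checks three things:

- the identity for $1 + (1-\beta)(1-\cos\theta)$,
- the factorization of $(\alpha^2-1)\cos\theta+\alpha^2+1$,
- that $(1-\beta)$ times the right-hand side of the $\partial_x F$ condition equals the right-hand side of the displayed equation.

It is a sketch; the function name and sample points are hypothetical.

```python
from fractions import Fraction


def check_partial_x_step(alpha, beta):
    """Exact check of the identities used after the x-derivative (alpha, beta != 1)."""
    c = 1 / (1 - beta) + 1 / (1 - alpha)  # cos(theta) from the theta-condition
    factor = 1 + (1 - beta) * (1 - c)
    d = (alpha**2 - 1) * c + alpha**2 + 1
    ok_factor = factor == alpha * (1 - beta) / (alpha - 1)
    ok_d = d == (alpha - 1) / (1 - beta) * (2 * alpha + 1 - alpha * beta)
    # The x-condition reads 4 alpha^4 = d^2 * factor; multiplied by (1 - beta), its
    # right side must match alpha (alpha - 1)(1 + alpha + alpha (1 - beta))^2.
    ok_eq = (d**2 * factor * (1 - beta)
             == alpha * (alpha - 1) * (1 + alpha + alpha * (1 - beta)) ** 2)
    return ok_factor and ok_d and ok_eq


samples = [(Fraction(3, 7), Fraction(5, 2)), (Fraction(2), Fraction(1, 3)),
           (Fraction(9, 4), Fraction(11, 5))]
print(all(check_partial_x_step(a, b) for a, b in samples))
```
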

We may assume, without loss of generality, that $x \ge y$, so that $\alpha \ge \beta \ge 0$. If $\alpha = 0$, then $\alpha = \beta = 0$, which leads to the contradiction $\cos\theta = 2$, unless the other condition, $\theta = \pi$, holds, which leads to $F(\pi,0,0) = 0$. Dividing through by $\alpha$, re-writing $4\alpha^3(1-\beta) = 4\alpha(1+\alpha)(\alpha-1)(1-\beta) + 4\alpha(1-\beta)$, and using $(u+v)^2 - 4uv = (u-v)^2$ with $u = 1+\alpha$, $v = \alpha(1-\beta)$, yields:

$4\alpha (1-\beta) = (\alpha-1) (1+\alpha - \alpha(1-\beta))^2 = (\alpha-1)(1+\alpha\beta)^2,$

which can be further modified to:

$4\alpha +(1-\alpha\beta)^2 = \alpha (1+\alpha\beta)^2,$

and, similarly for $\beta$ (due to symmetry):

$4\beta +(1-\alpha\beta)^2 = \beta (1+\alpha\beta)^2.$

Subtracting the two equations from each other, we get:

$4(\alpha-\beta) = (\alpha-\beta)(1+\alpha\beta)^2$,

which implies that $\alpha = \beta$ and/or $\alpha\beta =1$. The first leads to $4\alpha (1-\alpha) = (\alpha-1)(1+\alpha^2)^2,$ which immediately implies $\alpha = 1 = \beta$ (since the left and right sides of the equality have opposite signs otherwise). The second one implies that either $\alpha+\beta =2$, or $\cos\theta =1$, which follows from the earlier equation $\cos\theta (1-\alpha\beta) = (1-\cos\theta) ((\alpha-1) + (\beta-1))$. If $\alpha+\beta =2$ and $\alpha\beta = 1$, then $(\sqrt{\alpha}-\sqrt{\beta})^2 = \alpha+\beta-2\sqrt{\alpha\beta} = 0$, so $\alpha=\beta=1$ is the only solution. If, on the other hand, $\cos\theta = 1$, then looking at the original form of $F(\theta,x,y)$, we see that $F(0,x,y) = 6\pi - 6\arctan(x) +2x/(1+x^2) -6\arctan(y) +2y/(1+y^2) \ge 0$, since $x,y \ge 0 \implies \arctan(x)+\arctan(y) \le \pi$.

And that concludes the proof, since the only cases for which all three conditions are met lead to $\alpha = \beta = 1$ and, hence, $x=y=\tan^{-1}(\theta/2)$. The minimum of $F(\theta, x,y)$ at these values is always zero. That’s right, all this work to end up with “nothing”. But, at least, the last four weeks have been anything but dull.

Update: Greg offered Lemma 7.4 from the same paper as another challenge (the sines, cosines and tangents are now transformed into hyperbolic trigonometric functions, with a few other changes, mostly in signs, thrown in there). This is a more hardcore-looking inequality, but the proof turns out to follow the steps of Lemma 7.1 almost identically. In particular, all the conditions for extrema are exactly the same, with the only difference being that cosine becomes hyperbolic cosine. It is an awesome exercise in calculus to check this for yourself. Do it. Unless you have something better to do.

### Doug Natelson — Why is desalination difficult? Thermodynamics.

There are millions of people around the world without access to drinkable fresh water.  At the same time, the world's oceans contain more than 1.3 billion cubic kilometers of salt water.  Seems like all we have to do is get the salt out of the water, and we're all set.   Unfortunately, thermodynamics makes this tough.  Imagine that you have a tank full of sea water and a magical filter that lets water through but blocks the dissolved salt ions.    You could drag the filter across the tank - this would concentrate the salt in one side of the tank and leave behind fresh water.  However, this takes work.  You can think about the dissolved ions as a dilute gas, and when you're dragging the membrane across the tank, you're compressing that gas.  An osmotic pressure would resist your pushing of the membrane.  Osmotic effects are behind why red blood cells burst in distilled water and why slugs die when coated with salt.  They're also the subject of a great Arthur C. Clarke short story.

In the language of thermodynamics, desalination requires you to increase the chemical potential of the dissolved ions you're removing from the would-be fresh water, by putting them in a more concentrated state.   This sets limits on how energetically expensive it is to desalinate water - see here, slide 12.   The simplest scheme to implement, distillation by boiling and recondensation, requires coming up with the latent heat of the water and is energetically inefficient.  With real-life approximations of the filter I mentioned, you can drive the process, called reverse osmosis, and do better.  Still, the take-away message is, it takes energy to perform desalination for very similar physics reasons that it takes energy to compress a gas.
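
Here is a back-of-the-envelope version of that limit as a Python sketch. It rests on loudly stated assumptions:

- Seawater is treated as an ideal dilute solution of 0.6 mol/L NaCl, fully dissociated, at 25 °C.
- The van 't Hoff osmotic pressure then applies: Π = i c R T.
- Only a tiny fraction of the water is recovered. In that limit, the minimum work per cubic meter of fresh water is Π times that volume.

These are not the numbers from the linked slides, and the variable names are illustrative.

```python
R = 8.314       # J/(mol K), gas constant
T = 298.15      # K, ASSUMED temperature (25 C)
c_nacl = 600.0  # mol/m^3 (0.6 mol/L), ASSUMED stand-in for seawater: NaCl only
i_vant_hoff = 2 # Na+ and Cl- per formula unit, ASSUMING full dissociation

# van 't Hoff law for an ideal dilute solution: Pi = i c R T (in pascals)
osmotic_pressure = i_vant_hoff * c_nacl * R * T

# Reversible minimum work to push 1 m^3 of fresh water through the membrane,
# in the limit where so little is recovered that the brine barely concentrates.
w_min_joules_per_m3 = osmotic_pressure * 1.0
w_min_kwh_per_m3 = w_min_joules_per_m3 / 3.6e6

print(f"osmotic pressure ~ {osmotic_pressure / 1e5:.0f} bar")
print(f"minimum work ~ {w_min_kwh_per_m3:.2f} kWh per m^3 of fresh water")
```

Under these assumptions, the estimate comes out at roughly 30 bar and under one kilowatt-hour per cubic meter. Recovering a larger fraction of the water concentrates the brine and raises the minimum. Run in reverse, the same Π bounds the work available per cubic meter of river water mixing into a large volume of seawater.
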

Interestingly, you can go the other way.  You know that you can get useful work out of gas reservoirs at two different pressures.  You can imagine using the difference in chemical potential between salt water and fresh water to drive an engine or produce electricity.  In that sense, every time a freshwater stream or river empties into the ocean and the salinity gradient smooths itself by mixing of its own accord, we are wasting possible usable energy.  This was pointed out here, and there is now an extensive wikipedia entry on osmotic power.

## August 05, 2016

### Matt Strassler — The 2016 Data Kills The Two-Photon Bump

Results for the bump seen in December have been updated, and indeed, with the new 2016 data — four times as much as was obtained in 2015 — neither ATLAS nor CMS [the two general purpose detectors at the Large Hadron Collider] sees an excess where the bump appeared in 2015. Not even a hint, as we already learned inadvertently from CMS yesterday.

All indications so far are that the bump was a garden-variety statistical fluke, probably (my personal guess! there’s no evidence!) enhanced slightly by minor imperfections in the 2015 measurements. Should we be surprised? No. If you look back at the history of the 1970s and 1980s, or at the recent past, you’ll see that it’s quite common for hints — even strong hints — of new phenomena to disappear with more data. This is especially true for hints based on small amounts of data (and there were not many two photon events in the bump — just a couple of dozen).  There’s a reason why particle physicists have very high standards for statistical significance before they believe they’ve seen something real.  (Many other fields, notably medical research, have much lower standards.  Think about that for a while.)  History has useful lessons, if you’re willing to learn them.

Back in December 2011, a lot of physicists were persuaded that the data shown by ATLAS and CMS was convincing evidence that the Higgs particle had been discovered. It turned out the data was indeed showing the first hint of the Higgs. But their confidence in what the data was telling them at the time — what was called “firm evidence” by some — was dead wrong. I took a lot of flak for viewing that evidence as a 50-50 proposition (70-30 by March 2012, after more evidence was presented). Yet the December 2015 (March 2016) evidence for the bump at 750 GeV was comparable to what we had in December 2011 for the Higgs. Where’d it go?  Clearly such a level of evidence is not so firm as people claimed. I, at least, would not have been surprised if that original Higgs hint had vanished, just as I am not surprised now… though disappointed of course.

Was this all much ado about nothing? I don’t think so. There’s a reason to have fire drills, to run live-fire exercises, to test out emergency management procedures. A lot of new ideas, both in terms of new theories of nature and new approaches to making experimental measurements, were generated by thinking about this bump in the night. The hope for a quick 2016 discovery may be gone, but what we learned will stick around, and make us better at what we do.


### Resonaances — After the hangover

The loss of the 750 GeV diphoton resonance is a big blow to the particle physics community. We are currently going through the 5 stages of grief, everyone at their own pace, as can be seen e.g. in this comments section. Nevertheless, it may already be a good moment to revisit the story one last time, so as  to understand what went wrong.

In recent years, physics beyond the Standard Model has seen 2 other flops of comparable impact: the faster-than-light neutrinos in OPERA, and the CMB tensor fluctuations in BICEP. Much like the diphoton signal, both of the above triggered a binge of theoretical explanations, followed by a massive hangover. There was one big difference, however: the OPERA and BICEP signals were due to embarrassing errors on the experiments' side. This doesn't seem to be the case for the diphoton bump at the LHC. Some may wonder whether the Standard Model background may have been slightly underestimated, or whether one experiment may have been biased by the result of the other... But, most likely, the 750 GeV bump was just due to a random fluctuation of the background at this particular energy. Regrettably, the resulting mess cannot be blamed on experimentalists, who were in fact downplaying the anomaly in their official communications. This time it's the theorists who have some explaining to do.

Why did theorists write 500 papers about a statistical fluctuation? One reason is that it didn't look like one at first sight. Back in December 2015, the local significance of the diphoton bump in ATLAS run-2 data was 3.9 sigma, which means the probability of such a fluctuation was 1 in 10000. Combining available run-1 and run-2 diphoton data in ATLAS and CMS, the local significance was increased to 4.4 sigma. All in all, it was a very unusual excess, a 1-in-100000 occurrence! Of course, this number should be interpreted with care. The point is that the LHC experiments perform a gazillion different measurements, thus they are bound to observe seemingly unlikely outcomes in a small fraction of them. This can be partly taken into account by calculating the global significance, which is the probability of finding a background fluctuation of the observed size anywhere in the diphoton spectrum. The global significance of the 750 GeV bump quoted by ATLAS was only about two sigma, a fact strongly emphasized by the collaboration. However, that number can be misleading too. One problem with the global significance is that, unlike the local one, it cannot be easily combined in the presence of separate measurements of the same observable. For the diphoton final state we have ATLAS and CMS measurements in run-1 and run-2, thus 4 independent datasets, and their robust concordance was crucial in creating the excitement. Note also that what is really relevant here is the probability of a fluctuation of a given size in any of the LHC measurements, and that is not captured by the global significance. For these reasons, I find it more transparent to work with the local significance, remembering that it should not be interpreted as the probability that the Standard Model is incorrect. By these standards, a 4.4 sigma fluctuation in a combined ATLAS and CMS dataset is still a very significant effect which deserves special attention.
What we learned the hard way is that such large fluctuations do happen at the LHC...   This lesson will certainly be taken into account next time we encounter a significant anomaly.

Another reason why the 750 GeV bump was exciting is that the measurement is rather straightforward.  Indeed, at the LHC we often see anomalies in complicated final states or poorly controlled differential distributions, and we treat those with much skepticism.  But a resonance in the diphoton spectrum is almost the simplest and cleanest observable that one can imagine (only a dilepton or 4-lepton resonance would be cleaner). We already successfully discovered one particle this way - that's how the Higgs boson first showed up in 2011. Thus, we have good reasons to believe that the collaborations control this measurement very well.

Finally, the diphoton bump was so attractive because theoretical explanations were plausible.  It was trivial to write down a model fitting the data, there was no need to stretch or fine-tune the parameters, and it was quite natural that the particle first showed up as a diphoton resonance and not in other final states. This is in stark contrast to other recent anomalies, which typically require a great deal of gymnastics to fit into a consistent picture.   The only thing to give you pause was the tension with the LHC run-1 diphoton data, but even that became mild after the Moriond update this year.

So we got a huge signal of a new particle in a clean channel with plausible theoretical models to explain it...  that was really bad luck.  My conclusion may not be shared by everyone, but I don't think that the theory community committed major missteps in this case.  Given that for 30 years we have been looking for a clue about the fundamental theory beyond the Standard Model, our reaction was not disproportionate once a seemingly reliable one had arrived.  Excitement is an inherent part of physics research. And so is disappointment, apparently.

There remains a question whether we really needed 500 papers...   Well, of course not: many of  them fill an important gap.  Yet many are an interesting read, and I personally learned a lot of exciting physics from them.  Actually, I suspect that the fraction of useless papers among the 500 is lower than for regular daily topics.  On a more sociological side, these papers exacerbate the problem with our citation culture (mass-grave references), which undermines the citation count as a means to evaluate the research impact.  But that is a wider issue which I don't know how to address at the moment.

Time to move on. The ICHEP conference is coming next week, with loads of brand new results based on up to 16 inverse femtobarns of 13 TeV LHC data.  Although the rumor is that there is no new exciting  anomaly at this point, it will be interesting to see how much room is left for new physics. The hope lingers on, at least until the end of this year.

In the comments section you're welcome to lash out at the entire BSM community - we made a wrong call so we deserve it. Please, however, avoid personal attacks (unless on me). Alternatively, you can also give us a hug :)

## August 04, 2016

### Matt Strassler — A Flash in the Pan Flickers Out

Back in the California Gold Rush, many people panning for gold saw a yellow glint at the bottom of their pans, and thought themselves lucky.  But more often than not, it was pyrite — iron sulfide — fool’s gold…

Back in December 2015, a bunch of particle physicists saw a bump on a plot.  The plot showed the numbers of events with two photons (particles of light) as a function of the “invariant mass” of the photon pair.  (To be precise, they saw a big bump on one ATLAS plot, and a bunch of small bumps in similar plots by CMS and ATLAS [the two general purpose experiments at the Large Hadron Collider].)  What was that bump?  Was it a sign of a new particle?

A similar bump was the first sign of the Higgs boson, though that was far from clear at the time.  What about this bump?

As I wrote in December,

“Well, to be honest, probably it’s just that: a bump on a plot. But just in case it’s not…”

and I went on to describe what it might be if the bump were more than just a statistical fluke.  A lot of us — theoretical particle physicists like me — had a lot of fun, and learned a lot of physics, by considering what that bump might mean if it were a sign of something real.  (In fact I’ll be giving a talk here at CERN next week entitled “Lessons from a Flash in the Pan,” describing what I learned, or remembered, along the way.)

But updated results from CMS, based on a large amount of new data taken in 2016, have been seen.   (Perhaps these have leaked out early; they were supposed to be presented tomorrow along with those from ATLAS.)  They apparently show that where the bump was before, they now see nothing.  In fact there’s a small dip in the data there.

So — it seems that what we saw in those December plots was a fluke.  It happens.  I’m certainly disappointed, but hardly surprised.  Funny things happen with small amounts of data.

At the ICHEP 2016 conference, which started today, official presentation of the updated ATLAS and CMS two-photon results will come on Friday, but I think we all know the score.  So instead our focus will be on  the many other results (dozens and dozens, I hear) that the experiments will be showing us for the first time.  Already we had a small blizzard of them today.  I’m excited to see what they have to show us … the Standard Model, and naturalness, remain on trial.


## August 03, 2016

### Mark Chu-Carroll — Noodles with Dried Shrimp and Scallion Oil

During July, my kids go away to camp. So my wife and I have the opportunity to try new restaurants without having to drag the munchkins around. This year, we tried out a new Chinese place in Manhattan, called Hao noodle house. One of the dishes we had was a simple noodle dish: noodles lightly dressed with soy sauce and scallion oil, and then topped with a scattering of scallion and dried shrimp.

Dried shrimp are, in my opinion, a very undervalued and underused ingredient. They’re very traditional in a lot of real Chinese cooking, and they give things a really nice taste. They’ve also got an interesting, pleasant chewy texture. So when there was a dried shrimp dish on the menu, I wanted it. (The restaurant also had dan dan noodles, which are a favorite of my wife, but she was kind, and let me indulge.)

The dish was absolutely phenomenal. So naturally I wanted to figure out how to make it at home. I finally got around to doing it tonight, and I got really lucky: everything worked out perfectly, and it turned out almost exactly like the restaurant. My wife picked the noodles at the Chinese grocery that looked closest, and they were exactly right. I guessed at the ingredients from the flavors, and somehow managed to get them spot on, on the first try.

That kind of thing almost never happens! It always takes a few tries to nail down a recipe. But this one just turned out the first try!

So what’s the dish like? It’s very Chinese, and very different from what most Americans would expect. If you’ve had mazemen ramen before, I’d say that’s the closest thing to it. It’s a light, warm, lightly dressed noodle dish. The sauce is very strong if you taste it on its own, but when it’s dressed onto hot noodles, it mellows quite a bit. The dried shrimp are briny and shrimpy, but not overly fishy. All I can say is, try it!

There are two parts to the sauce: a soy mixture, and a scallion oil. The scallion oil should be made a day in advance, and allowed to stand overnight. So we’ll start with that.

• one large bunch scallions
• 1 1/2 cups canola oil
• two slices crushed ginger
• generous pinch salt
1. Coarsely chop the scallions – whites and greens.
2. Put the scallions, ginger, and salt into a food processor, and pulse until they’re well chopped.
3. Add the oil, and let the processor run on high for about a minute. You should end up with a thick pasty pale green goo.
4. Put it in the refrigerator, and let it sit overnight.
5. The next day, push through a sieve, to separate the oil from the scallion pulp. Discard the scallions. You should be left with an amazing smelling translucent green oil.

Next, the noodles and sauce.

• Noodles. We used a kind of noodle called guan miao noodle. If you can’t find that, then white/wheat soba or ramen would be a good substitute.
• 1/2 cup soy sauce
• 2 tablespoons sugar
• 1 cup chicken stock
• 2 slices ginger
• one clove garlic
• 2 tablespoons dried shrimp
1. Cover the dried shrimp with cold water in a bowl, and let sit for 1/2 hour.
2. Put the dried shrimp, soy sauce, sugar, chicken stock, ginger, and garlic into a saucepan, and simmer on low heat for five minutes. Then remove the garlic and ginger.
3. For each portion, take about 2 tablespoons of the soy, and two tablespoons of the scallion oil, and whisk together to form something like a vinaigrette.
4. Cook the noodles according to the package. (For the guan miao noodles, that meant boiling them in unsalted water for 3 minutes.)
5. Toss with the soy/oil mixture.
6. Serve the dressed noodles into bowls, and put a few of the simmered dried shrimp on top.
7. Drizzle another teaspoon each of the scallion oil and soy sauce over each serving.
8. Scatter a few fresh scallions on top.

And eat!

### Scott Aaronson — My biology paper in Science (really)

Think I’m pranking you, right?

You can see the paper right here (“Synthetic recombinase-based state machines in living cells,” by Nathaniel Roquet, Ava P. Soleimany, Alyssa C. Ferris, Scott Aaronson, and Timothy K. Lu).  [Update (Aug. 3): The previous link takes you to a paywall, but you can now access the full text of our paper here.  See also the Supplementary Material here.]  You can also read the MIT News article (“Scientists program cells to remember and respond to series of stimuli”).  In any case, my little part of the paper will be fully explained in this post.

A little over a year ago, two MIT synthetic biologists—Timothy Lu and his PhD student Nate Roquet—came to my office saying they had a problem they wanted help with.  Why me? I wondered.  Didn’t they realize I was a quantum complexity theorist, who so hated picking apart owl pellets and memorizing the names of cell parts in junior-high Life Science, that he’d avoided taking a single biology course ever since?  (Not counting computational biology, taught in a CS department by Richard Karp.)

Nevertheless, I listened to my biologist guests—which turned out to be an excellent decision.

Tim and Nate told me about a DNA system with surprisingly clear rules, which led them to a strange but elegant combinatorial problem.  In this post, first I need to spend some time to tell you the rules; then I can tell you the problem, and lastly its solution.  There are no mathematical prerequisites for this post, and certainly no biology prerequisites: everything will be completely elementary, like learning a card game.  Pen and paper might be helpful, though.

As we all learn in kindergarten, DNA is a finite string over the 4-symbol alphabet {A,C,G,T}.  We’ll find it more useful, though, to think in terms of entire chunks of DNA bases, which we’ll label arbitrarily with letters like X, Y, and Z.  For example, we might have X=ACT, Y=TAG, and Z=GATTACA.

We can also invert one of these chunks, which means writing it backwards while also swapping the A’s with T’s and the G’s with C’s.  We’ll denote this operation by * (the technical name in biology is “reverse-complement”).  For example:

X*=AGT, Y*=CTA, Z*=TGTAATC.

Note that (X*)*=X.

We can then combine our chunks and their inverses into a longer DNA string, like so:

ZYX*Y* = GATTACA TAG AGT CTA.
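The * operation is easy to state in code. Here is a minimal Python sketch (the function name `reverse_complement` is just illustrative) that reproduces the X*, Y*, Z* and ZYX*Y* examples above:

```python
# str.maketrans builds a table that swaps A<->T and C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(chunk: str) -> str:
    """The * operation: complement every base, then read the chunk backwards."""
    return chunk.translate(COMPLEMENT)[::-1]

X, Y, Z = "ACT", "TAG", "GATTACA"
print(reverse_complement(X), reverse_complement(Y), reverse_complement(Z))  # AGT CTA TGTAATC
assert reverse_complement(reverse_complement(X)) == X                      # (X*)* = X

# ZYX*Y*, with spaces between chunks as in the text
print(" ".join([Z, Y, reverse_complement(X), reverse_complement(Y)]))       # GATTACA TAG AGT CTA
```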

From now on, we’ll work exclusively with the chunks, and forget completely about the underlying A’s, C’s, G’s, and T’s.

Now, there are also certain special chunks of DNA bases, called recognition sites, which tell the little machines that read the DNA when they should start doing something and when they should stop.  Recognition sites come in pairs, so we’ll label them using various parenthesis symbols like ( ), [ ], { }.  To convert a parenthesis into its partner, you invert it: thus ( = )*, [ = ]*, { = }*, etc.  Crucially, the parentheses in a DNA string don’t need to “face the right ways” relative to each other, and they also don’t need to nest properly.  Thus, both of the following are valid DNA strings:

X ( Y [ Z [ U ) V

X { Y ] Z { U [ V

Let’s refer to X, Y, Z, etc.—the chunks that aren’t recognition sites—as letter-chunks.  Then it will be convenient to make the following simplifying assumptions:

1. Our DNA string consists of an alternating sequence of recognition sites and letter-chunks, beginning and ending with letter-chunks.  (If this weren’t true, then we could just glom together adjacent recognition sites and adjacent letter-chunks, and/or add new dummy chunks, until it was true.)
2. Every letter-chunk that appears in the DNA string appears exactly once (either inverted or not), while every recognition site that appears, appears exactly twice.  Thus, if there are n distinct recognition sites, there are 2n+1 distinct letter-chunks.
3. Our DNA string can be decomposed into its constituent chunks uniquely—i.e., it’s always possible to tell which chunk we’re dealing with, and when one chunk stops and the next one starts.  In particular, the chunks and their reverse-complements are all distinct strings.

The little machines that read the DNA string are called recombinases.  There’s one kind of recombinase for each kind of recognition site: a (-recombinase, a [-recombinase, and so on.  When, let’s say, we let a (-recombinase loose on our DNA string, it searches for (‘s and )’s and ignores everything else.  Here’s what it does:

• If there are no (‘s or )’s in the string, or only one of them, it does nothing.
• If there are two (‘s facing the same way—like ( ( or ) )—it deletes everything in between them, including the (‘s themselves.
• If there are two (‘s facing opposite ways—like ( ) or ) (—it deletes the (‘s, and inverts everything in between them.

Let’s see some examples.  When we apply [-recombinase to the string

A ( B [ C [ D ) E,

we get

A ( B D ) E.

When we apply (-recombinase to the same string, we get

A D* ] C* ] B* E.

When we apply both recombinases (in either order), we get

A D* B* E.

Another example: when we apply {-recombinase to

A { B ] C { D [ E,

we get

A D [ E.

When we apply [-recombinase to the same string, we get

A { B D* } C* E.

When we apply both recombinases—ah, but here the order matters!  If we apply { first and then [, we get

A D [ E,

since the [-recombinase now encounters only a single [, and has nothing to do.  On the other hand, if we apply [ first and then {, we get

A D B* C* E.

Notice that inverting a substring can change the relative orientation of two recognition sites—e.g., it can change { { into { } or vice versa.  It can thereby change what happens (inversion or deletion) when some future recombinase is applied.

One final rule: after we’re done applying recombinases, we remove the remaining recognition sites like so much scaffolding, leaving only the letter-chunks.  Thus, the final output

A D [ E

becomes simply A D E, and so on.  Notice also that, if we happen to delete one recognition site of a given type while leaving its partner, the remaining site will necessarily just bounce around inertly before getting deleted at the end—so we might as well “put it out of its misery,” and delete it right away.
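Since the rules are completely mechanical, they can be written as a short simulator. The Python sketch below is my own encoding, not code from the paper: a DNA string is a space-separated list of chunks, a letter-chunk is a name with an optional trailing `*`, and each recombinase is named by either of its two parenthesis symbols. It assumes each type of recognition site appears at most twice, as in the simplifying assumptions above, and it reproduces the worked examples, including the fact that the order of `{` and `[` matters.

```python
from typing import List

# Each recognition site and its partner under * (inversion).
PARTNER = {"(": ")", ")": "(", "[": "]", "]": "[",
           "{": "}", "}": "{", "<": ">", ">": "<"}

def invert(token: str) -> str:
    """Apply * to a single chunk."""
    if token in PARTNER:                     # a site turns into its partner
        return PARTNER[token]
    return token[:-1] if token.endswith("*") else token + "*"

def apply_recombinase(tokens: List[str], site: str) -> List[str]:
    """Apply the recombinase for `site`'s type ("[" and "]" both name the [-recombinase)."""
    hits = [i for i, t in enumerate(tokens) if t in (site, PARTNER[site])]
    if len(hits) < 2:                        # no site, or a lone leftover one: do nothing
        return list(tokens)
    if len(hits) > 2:
        raise ValueError("this sketch assumes each site type appears at most twice")
    i, j = hits
    if tokens[i] == tokens[j]:               # same direction, ( ( or ) ): delete i..j inclusive
        return tokens[:i] + tokens[j + 1:]
    # opposite directions, ( ) or ) (: drop both sites, reverse and invert what lies between
    middle = [invert(t) for t in reversed(tokens[i + 1:j])]
    return tokens[:i] + middle + tokens[j + 1:]

def run(dna: str, order: str, strip: bool = True) -> str:
    """Apply recombinases left to right as listed in `order`; optionally strip leftover sites."""
    tokens = dna.split()
    for site in order:
        tokens = apply_recombinase(tokens, site)
    if strip:
        tokens = [t for t in tokens if t not in PARTNER]
    return " ".join(tokens)

print(run("A ( B [ C [ D ) E", "(", strip=False))  # A D* ] C* ] B* E
print(run("A { B ] C { D [ E", "{["))               # A D E
print(run("A { B ] C { D [ E", "[{"))               # A D B* C* E
```

A lone leftover site is harmless in this sketch: `apply_recombinase` ignores a type with fewer than two sites, and `run` strips it at the end, which matches the "bounce around inertly" remark.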

My coauthors have actually implemented all of this in a wet lab, which is what most of the Science paper is about (my part is mostly in a technical appendix).  They think of what they’re doing as building a “biological state machine,” which could have applications (for example) to programming cells for medical purposes.

But without further ado, let me tell you the math question they gave me.  For reasons that they can explain better than I can, my coauthors were interested in the information storage capacity of their biological state machine.  That is, they wanted to know the answer to the following:

Suppose we have a fixed initial DNA string, with n pairs of recognition sites and 2n+1 letter-chunks; and we also have a recombinase for each type of recognition site.  Then by choosing which recombinases to apply, as well as which order to apply them in, how many different DNA strings can we generate as output?

It’s easy to construct an example where the answer is as large as 2^n.  Thus, if we consider a starting string like

A ( B ) C [ D ] E { F } G < H > I,

we can clearly make 2^4=16 different output strings by choosing which subset of recombinases to apply and which not.  For example, applying [, {, and < (in any order) yields

A B C D* E F* G H* I.

There are also cases where the number of distinct outputs is less than 2^n.  For example,

A ( B [ C [ D ( E

can produce only 3 outputs—A B C D E, A B D E, and A E—rather than 4.

What Tim and Nate wanted to know was: can the number of distinct outputs ever be greater than 2^n?

Intuitively, it seems like the answer “has to be” yes.  After all, we already saw that the order in which recombinases are applied can matter enormously.  And given n recombinases, the number of possible permutations of them is n!, not 2^n.  (Furthermore, if we remember that any subset of the recombinases can be applied in any order, the number of possibilities is even a bit greater—about e·n!.)
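The count of possible sequences is easy to check numerically. In this sketch the helper name `ordered_subsets` is mine; it sums n!/(n-k)! over the number k of recombinases used, which for n ≥ 1 equals floor(e·n!) exactly (a standard identity), and prints it next to 2^n for comparison.

```python
from math import e, factorial, floor, perm

def ordered_subsets(n: int) -> int:
    """Recombinase sequences that use each of n recombinases at most once."""
    # perm(n, k) = n!/(n-k)! counts the ordered choices of k distinct recombinases.
    return sum(perm(n, k) for k in range(n + 1))

print("n   2^n   n!   sequences   floor(e*n!)")
for n in range(1, 9):
    print(n, 2 ** n, factorial(n), ordered_subsets(n), floor(e * factorial(n)))
```

Already at n=4 there are 65 possible sequences but, per the theorem below, at most 16 distinct outputs.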

Despite this, when my coauthors played around with examples, they found that the number of distinct output strings never exceeded 2^n. In other words, the number of output strings behaved as if the order didn’t matter, even though it does.  The problem they gave me was either to explain this pattern or to find a counterexample.

I found that the pattern holds:

Theorem: Given an initial DNA string with n pairs of recognition sites, we can generate at most 2^n distinct output strings by choosing which recombinases to apply and in which order.

Let a recombinase sequence be an ordered list of recombinases, each occurring at most once: for example, ([{ means to apply (-recombinase, then [-recombinase, then {-recombinase.

The proof of the theorem hinges on one main definition.  Given a recombinase sequence that acts on a given DNA string, let’s call the sequence irreducible if every recombinase in the sequence actually finds two recognition sites (and hence, inverts or deletes a nonempty substring) when it’s applied.  Let’s call the sequence reducible otherwise.  For example, given

A { B ] C { D [ E,

the sequence [{ is irreducible, but {[ is reducible, since the [-recombinase does nothing.

Clearly, for every reducible sequence, there’s a shorter sequence that produces the same output string: just omit the recombinases that don’t do anything!  (On the other hand, I leave it as an exercise to show that the converse is false.  That is, even if a sequence is irreducible, there might be a shorter sequence that produces the same output string.)

Key Lemma: Given an initial DNA string, and given a subset of k recombinases, every irreducible sequence composed of all k of those recombinases produces the same output string.

Assuming the Key Lemma, let’s see why the theorem follows.  Given an initial DNA string, suppose you want to specify one of its possible output strings.  I claim you can do this using only n bits of information.  For you just need to specify which subset of the n recombinases you want to apply, in some irreducible order.  Since every irreducible sequence of those recombinases leads to the same output, you don’t need to specify an order on the subset.  Furthermore, for each possible output string S, there must be some irreducible sequence that leads to S—given a reducible sequence for S, just keep deleting irrelevant recombinases until no more are left—and therefore some subset of recombinases you could pick that uniquely determines S.  OK, but if you can specify each S uniquely using n bits, then there are at most 2^n possible S’s.
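Neither the Key Lemma nor the theorem can be proved by testing, but both are easy to spot-check. The Python sketch below is my own brute force, not the method of the paper: it tries every ordered subset of recombinases on small random DNA strings, asserts that the number of distinct outputs never exceeds 2^n, and asserts that all irreducible sequences built from the same subset agree. The generator `random_dna` and its chunk names `c0`, `c1`, ... are hypothetical test scaffolding. The enumeration grows like e·n!, so it is only practical for a handful of site pairs.

```python
import random
from itertools import permutations
from typing import Dict, FrozenSet, List, Set, Tuple

PARTNER = {"(": ")", ")": "(", "[": "]", "]": "[",
           "{": "}", "}": "{", "<": ">", ">": "<"}
OPENERS = "([{<"

def invert(token: str) -> str:
    if token in PARTNER:
        return PARTNER[token]
    return token[:-1] if token.endswith("*") else token + "*"

def apply(tokens: List[str], kind: str) -> Tuple[List[str], bool]:
    """Apply one recombinase; the flag says whether it found two sites to act on."""
    hits = [i for i, t in enumerate(tokens) if t in (kind, PARTNER[kind])]
    if len(hits) < 2:
        return tokens, False
    i, j = hits
    if tokens[i] == tokens[j]:                                  # same direction: delete
        return tokens[:i] + tokens[j + 1:], True
    middle = [invert(t) for t in reversed(tokens[i + 1:j])]     # opposite: invert
    return tokens[:i] + middle + tokens[j + 1:], True

def explore(dna: str) -> Tuple[Set[str], Dict[FrozenSet[str], Set[str]]]:
    """Try every ordered subset of recombinases. Return all outputs, plus the
    outputs of irreducible sequences grouped by which recombinases they use."""
    start = dna.split()
    kinds = sorted({t if t in OPENERS else PARTNER[t] for t in start if t in PARTNER})
    outputs: Set[str] = set()
    irreducible: Dict[FrozenSet[str], Set[str]] = {}
    for k in range(len(kinds) + 1):
        for order in permutations(kinds, k):
            tokens, every_step_acted = start, True
            for kind in order:
                tokens, acted = apply(tokens, kind)
                every_step_acted = every_step_acted and acted
            out = " ".join(t for t in tokens if t not in PARTNER)
            outputs.add(out)
            if every_step_acted:
                irreducible.setdefault(frozenset(order), set()).add(out)
    return outputs, irreducible

def random_dna(n: int, rng: random.Random) -> str:
    """Hypothetical test generator: n site pairs in random order and orientation,
    separated by 2n+1 letter-chunks named c0, c1, ..."""
    sites = [s for kind in OPENERS[:n] for s in (kind, kind)]
    rng.shuffle(sites)
    sites = [s if rng.random() < 0.5 else PARTNER[s] for s in sites]
    parts = ["c0"]
    for i, s in enumerate(sites, start=1):
        parts += [s, f"c{i}"]
    return " ".join(parts)

rng = random.Random(0)
for trial in range(300):
    n = rng.randint(1, 4)
    dna = random_dna(n, rng)
    outputs, irreducible = explore(dna)
    assert len(outputs) <= 2 ** n, dna                          # the theorem
    assert all(len(v) == 1 for v in irreducible.values()), dna  # the Key Lemma
print("no counterexample found")
```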

Proof of Key Lemma.  Given an initial DNA string, let’s assume for simplicity that we’re going to apply all n of the recombinases, in some irreducible order.  We claim that the final output string doesn’t depend at all on which irreducible order we pick.

If we can prove this claim, then the lemma follows, since given a proper subset of the recombinases, say of size k<n, we can simply glom together everything between one relevant recognition site and the next one, treating them as 2k+1 giant letter-chunks, and then repeat the argument.

Now to prove the claim.  Given two letter-chunks—say A and B—let’s call them soulmates if either A and B or A* and B* will necessarily end up next to each other, whenever all n recombinases are applied in some irreducible order, and whenever A or B appears at all in the output string.  Also, let’s call them anti-soulmates if either A and B* or A* and B will necessarily end up next to each other if either appears at all.

To illustrate, given the initial DNA sequence,

A [ B ( C ] D ( E,

you can check that A and C are anti-soulmates.  Why?  Because if we apply all the recombinases in an irreducible sequence, then at some point, the [-recombinase needs to get applied, and it needs to find both [ recognition sites.  And one of these recognition sites will still be next to A, and the other will still be next to C (for what could have pried them apart?  nothing).  And when that happens, no matter where C has traveled in the interim, C* must get brought next to A.  If the [-recombinase does an inversion, the transformation will look like

A [ … C ] → A C* …,

while if it does a deletion, the transformation will look like

A [ … [ C* → A C*

Note that C’s [ recognition site will be to its left, if and only if C has been flipped to C*.  In this particular example, A never moves, but if it did, we could repeat the analysis for A and its [ recognition site.  The conclusion would be the same: no matter what inversions or deletions we do first, we’ll maintain the invariant that A and C* (or A* and C) will immediately jump next to each other, as soon as the [-recombinase is applied.  And once they’re next to each other, nothing will ever separate them.

Similarly, you can check that C and D are soulmates, connected by the ( recognition sites; D and B are anti-soulmates, connected by the [ sites; and B and E are soulmates, connected by the ( sites.

More generally, let’s consider an arbitrary DNA sequence, with n pairs of recognition sites.  Then we can define a graph, called the soulmate graph, where the 2n+1 letter-chunks are the vertices, and where X and Y are connected by (say) a blue edge if they’re soulmates, and by a red edge if they’re anti-soulmates.

When we construct this graph, we find that every vertex has exactly 2 neighbors, one for each recognition site that borders it—save the first and last vertices, which border only one recognition site each and so have only one neighbor each.  But these facts immediately determine the structure of the graph.  Namely, it must consist of a simple path, starting at the first letter-chunk and ending at the last one, together with possibly a disjoint union of cycles.

But we know that the first and last letter-chunks can never move anywhere.  For that reason, a path of soulmates and anti-soulmates, starting at the first letter-chunk and ending at the last one, uniquely determines the final output string, when the n recombinases are applied in any irreducible order.  We just follow it along, switching between inverted and non-inverted letter-chunks whenever we encounter a red edge.  The cycles contain the letter-chunks that necessarily get deleted along the way to that unique output string.  This completes the proof of the lemma, and hence the theorem.
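The proof also suggests an algorithm: build the soulmate graph straight from the initial string, without applying any recombinase, and walk it from the first letter-chunk to the last. The Python sketch below does this under the simplifying assumptions above (every site type appears exactly twice). The linking rule is my own reading of the argument given for the A [ B ( C ] D ( E example, checked only against the examples in this post: if the two sites of a type point the same way, the chunk left of the first site joins the chunk right of the second (and the two chunks between them join each other); if they point opposite ways, the chunks to the left of both sites join, as do the chunks to the right of both. Linking a right end to a left end is a blue (soulmate) edge; linking two like ends is a red (anti-soulmate) edge. The walk returns the predicted output string together with the chunks stranded on cycles.

```python
from typing import Dict, List, Tuple

OPENER = {"(": "(", ")": "(", "[": "[", "]": "[",
          "{": "{", "}": "{", "<": "<", ">": "<"}
End = Tuple[int, str]   # (index of a letter-chunk, "L" for its left end or "R" for its right end)

def star(name: str) -> str:
    return name[:-1] if name.endswith("*") else name + "*"

def soulmate_path(dna: str) -> Tuple[str, List[str]]:
    """Follow the soulmate graph from the first letter-chunk to the last.
    Returns (predicted output string, letter-chunks left on cycles, i.e. deleted)."""
    tokens = dna.split()
    chunks, sites = tokens[0::2], tokens[1::2]    # site i sits between chunks i and i+1
    positions: Dict[str, List[int]] = {}
    for i, s in enumerate(sites):
        positions.setdefault(OPENER[s], []).append(i)

    link: Dict[End, End] = {}
    for a, b in positions.values():               # assumes every site type appears exactly twice
        a_left, a_right = (a, "R"), (a + 1, "L")   # chunk ends touching the earlier site
        b_left, b_right = (b, "R"), (b + 1, "L")   # chunk ends touching the later site
        if sites[a] == sites[b]:                  # same direction: deletion-type pair
            pairs = [(a_left, b_right), (a_right, b_left)]
        else:                                     # opposite directions: inversion-type pair
            pairs = [(a_left, b_left), (a_right, b_right)]
        for x, y in pairs:
            link[x], link[y] = y, x

    output: List[str] = []
    visited = set()
    chunk, entered = 0, "L"                       # the first chunk never moves or flips
    while True:
        visited.add(chunk)
        forward = entered == "L"                  # entering through the right end means inverted
        output.append(chunks[chunk] if forward else star(chunks[chunk]))
        if chunk == len(chunks) - 1:
            break
        chunk, entered = link[(chunk, "R" if forward else "L")]
    deleted = [c for i, c in enumerate(chunks) if i not in visited]
    return " ".join(output), deleted

print(soulmate_path("A [ B ( C ] D ( E"))   # ('A C* D* B E', [])
print(soulmate_path("A ( B [ C [ D ) E"))   # ('A D* B* E', ['C'])
```

For A ( B [ C ( D [ E this walk yields A D C B E, the "phantom output" discussed below, so the path is a prediction of what an irreducible order would produce, not a guarantee that such an order exists.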

There are other results in the paper, like a generalization to the case where there can be k pairs of recognition sites of each type, rather than only one. In that case, we can prove that the number of distinct output strings is at most 2^(kn), and that it can be as large as ~(2k/3e)^n. We don’t know the truth between those two bounds.

Why is this interesting?  As I said, my coauthors had their own reasons to care, involving the number of bits one can store using a certain kind of DNA state machine.  I got interested for a different reason: because this is a case where biology threw up a bunch of rules that look like a random mess—the parentheses don’t even need to nest correctly?  inversion can also change the semantics of the recognition sites?  evolution never thought about what happens if you delete one recognition site while leaving the other one?—and yet, on analysis, all the rules work in perfect harmony to produce a certain outcome.  Change a single one of them, and the “at most 2^n distinct DNA sequences” theorem would be false.  Mind you, I’m still not sure what biological purpose it serves for the rules to work in harmony this way, but they do.

But the point goes further.  While working on this problem, I’d repeatedly encounter an aspect of the mathematical model that seemed weird and inexplicable—only to have Tim and Nate explain that the aspect made sense once you brought in additional facts from biology, facts not in the model they gave me.  As an example, we saw that in the soulmate graph, the deleted substrings appear as cycles.  But surely excised DNA fragments don’t literally form loops?  Why yes, apparently, they do.  As a second example, consider the DNA string

A ( B [ C ( D [ E.

When we construct the soulmate graph for this string, we get the path

A–D–C–B–E.

Yet there’s no actual recombinase sequence that leads to A D C B E as an output string!  Thus, we see that it’s possible to have a “phantom output,” which the soulmate graph suggests should be reachable but that isn’t actually reachable.  According to my coauthors, that’s because the “phantom outputs” are reachable, once you know that in real biology (as opposed to the mathematical model), excised DNA fragments can also reintegrate back into the long DNA string.

Many of my favorite open problems about this model concern algorithms and complexity. For example: given as input an initial DNA string, does there exist an irreducible order in which the recombinases can be applied? Is the “utopian string”—the string suggested by the soulmate graph—actually reachable? If it is reachable, then what’s the shortest sequence of recombinases that reaches it? Are these problems solvable in polynomial time? Are they NP-hard? More broadly, if we consider all the subsets of recombinases that can be applied in an irreducible order, or all the irreducible orders themselves, what combinatorial conditions do they satisfy?  I don’t know—if you’d like to take a stab, feel free to share what you find in the comments!
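For very small inputs, the first few questions can at least be answered by exhaustive search. The Python sketch below is exponential-time brute force, so it says nothing about polynomial-time algorithms or NP-hardness; the function names are illustrative. `irreducible_full_orders` lists every order of all the recombinases in which each one finds two sites, and `shortest_sequence_to` returns a shortest recombinase sequence whose output equals a given target string, or `None` if there is none. On the phantom example A ( B [ C ( D [ E, it confirms that no irreducible order exists and that A D C B E is unreachable within the model.

```python
from itertools import permutations
from typing import List, Optional, Tuple

PARTNER = {"(": ")", ")": "(", "[": "]", "]": "[",
           "{": "}", "}": "{", "<": ">", ">": "<"}
CLOSERS = ")]}>"

def invert(token: str) -> str:
    if token in PARTNER:
        return PARTNER[token]
    return token[:-1] if token.endswith("*") else token + "*"

def apply(tokens: List[str], kind: str) -> Tuple[List[str], bool]:
    """Apply one recombinase; the flag says whether it found two sites to act on."""
    hits = [i for i, t in enumerate(tokens) if t in (kind, PARTNER[kind])]
    if len(hits) < 2:
        return tokens, False
    i, j = hits
    if tokens[i] == tokens[j]:                                  # same direction: delete
        return tokens[:i] + tokens[j + 1:], True
    middle = [invert(t) for t in reversed(tokens[i + 1:j])]     # opposite: invert
    return tokens[:i] + middle + tokens[j + 1:], True

def kinds_of(tokens: List[str]) -> List[str]:
    """One opening symbol per recombinase type present in the string."""
    return sorted({PARTNER[t] if t in CLOSERS else t for t in tokens if t in PARTNER})

def irreducible_full_orders(dna: str) -> List[Tuple[str, ...]]:
    start = dna.split()
    found: List[Tuple[str, ...]] = []
    for order in permutations(kinds_of(start)):
        tokens, ok = start, True
        for kind in order:
            tokens, ok = apply(tokens, kind)
            if not ok:                     # this recombinase found fewer than two sites
                break
        if ok:
            found.append(order)
    return found

def shortest_sequence_to(dna: str, target: str) -> Optional[Tuple[str, ...]]:
    start = dna.split()
    kinds = kinds_of(start)
    for k in range(len(kinds) + 1):        # shorter sequences are tried first
        for order in permutations(kinds, k):
            tokens = start
            for kind in order:
                tokens, _ = apply(tokens, kind)
            if " ".join(t for t in tokens if t not in PARTNER) == target:
                return order
    return None

print(irreducible_full_orders("A ( B [ C ( D [ E"))              # []
print(shortest_sequence_to("A ( B [ C ( D [ E", "A D C B E"))    # None
print(shortest_sequence_to("A [ B ( C ] D ( E", "A C* D* B E"))  # ('[', '(')
```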

What I do know is this: I’m fortunate that, before they publish your first biology paper, the editors at Science don’t call up your 7th-grade Life Science teacher to ask how you did in the owl pellet unit.