Planet Musings

July 29, 2016

Resonaances: After the hangover

The loss of the 750 GeV diphoton resonance is a big blow to the particle physics community. We are currently going through the 5 stages of grief, everyone at their own pace, as can be seen e.g. in this comments section. Nevertheless, it may already be a good moment to revisit the story one last time, so as  to understand what went wrong.

In recent years, physics beyond the Standard Model has seen 2 other flops of comparable impact: the faster-than-light neutrinos in OPERA, and the CMB tensor fluctuations in BICEP. Much like the diphoton signal, both of the above triggered a binge of theoretical explanations, followed by a massive hangover. There was one big difference, however: the OPERA and BICEP signals were due to embarrassing errors on the experiments' side. This doesn't seem to be the case for the diphoton bump at the LHC. Some may wonder whether the Standard Model background may have been slightly underestimated, or whether one experiment may have been biased by the result of the other... But, most likely, the 750 GeV bump was just due to a random fluctuation of the background at this particular energy. Sadly, the resulting mess cannot be blamed on experimentalists (who were in fact downplaying the anomaly in their official communications). Clearly, this time it's the theorists who have some explaining to do.

Why did theorists write 500 papers about a statistical fluctuation? One reason is that it didn't look like one at first sight. Back in December 2015, the local significance of the diphoton bump in ATLAS run-2 data was 3.9 sigma, which means the probability of such a fluctuation was about 1 in 10000. Combining available run-1 and run-2 diphoton data in ATLAS and CMS, the local significance increased to 4.4 sigma. All in all, it was a very unusual excess, a 1-in-100000 occurrence! Of course, this number should be interpreted with care. The point is that the LHC experiments perform a gazillion different measurements, thus they are bound to observe seemingly unlikely outcomes in a small fraction of them. This can be partly taken into account by calculating the global significance, which is the probability of finding a background fluctuation of the observed size anywhere in the diphoton spectrum. The global significance of the 750 GeV bump quoted by ATLAS was only about two sigma, a fact strongly emphasized by the collaboration. However, that number can be misleading too. One problem with the global significance is that, unlike the local one, it cannot be easily combined in the presence of separate measurements of the same observable. For the diphoton final state we have ATLAS and CMS measurements in run-1 and run-2, thus 4 independent datasets, and their robust concordance was crucial in creating the excitement. Note also that what is really relevant here is the probability of a fluctuation of a given size in any of the LHC measurements, and that is not captured by the global significance. For these reasons, I find it more transparent to work with the local significance, remembering that it should not be interpreted as the probability that the Standard Model is incorrect. By these standards, a 4.4 sigma fluctuation in a combined ATLAS and CMS dataset is still a very significant effect which deserves special attention. What we learned the hard way is that such large fluctuations do happen at the LHC... This lesson will certainly be taken into account next time we encounter a significant anomaly.
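For readers who want to check the sigma-to-probability conversions, here is a minimal sketch assuming a Gaussian test statistic. The round numbers quoted above (1 in 10000 and 1 in 100000) are closest to the two-sided tail; a one-sided convention gives roughly half of that. The function `global_p` and its `n_trials` argument are hypothetical illustrations of a crude look-elsewhere correction with independent trials, not the procedure the collaborations actually use.

```python
import math


def one_sided_p(z: float) -> float:
    """Probability that a standard normal variable exceeds +z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def two_sided_p(z: float) -> float:
    """Probability that a standard normal variable deviates by more than z either way."""
    return math.erfc(z / math.sqrt(2.0))


def global_p(p_local: float, n_trials: float) -> float:
    """Crude look-elsewhere estimate (an assumption, not the experiments' method):
    chance that at least one of n_trials independent measurements fluctuates
    with probability p_local. Written with log1p/expm1 to stay accurate for tiny p."""
    return -math.expm1(n_trials * math.log1p(-p_local))


if __name__ == "__main__":
    for z in (3.9, 4.4):
        print(f"{z} sigma: one-sided 1 in {1 / one_sided_p(z):,.0f}, "
              f"two-sided 1 in {1 / two_sided_p(z):,.0f}")
```

With a trials factor of order 1000, even a 1-in-10000 local fluctuation becomes a roughly 10% effect globally, which is the sense in which such excesses "do happen".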

Another reason why the 750 GeV bump was exciting is that the measurement is rather straightforward.  Indeed, at the LHC we often see anomalies in complicated final states or poorly controlled differential distributions, and we treat those with much skepticism.  But a resonance in the diphoton spectrum is almost the simplest and cleanest observable that one can imagine (only a dilepton or 4-lepton resonance would be cleaner). We already successfully discovered one particle this way - that's how the Higgs boson first showed up in 2011. Thus, we have good reasons to believe that the collaborations control this measurement very well.

Finally, the diphoton bump was so attractive because theoretical explanations were plausible. It was trivial to write down a model fitting the data, there was no need to stretch or fine-tune the parameters, and it was quite natural that the particle first showed up as a diphoton resonance and not in other final states. This is in stark contrast to other recent anomalies which typically require a great deal of gymnastics to fit into a consistent picture. The only thing to give you pause was the tension with the LHC run-1 diphoton data, but even that became mild after the Moriond update this year.

So we got a huge signal of a new particle in a clean channel with plausible theoretical models to explain it... that was really bad luck. My conclusion may be risky but I don't think that the theory community committed major missteps in this case. Given that for 30 years we have been looking for a clue about the fundamental theory beyond the Standard Model, our reaction was not disproportionate. Excitement is a vital part of physics research. And so is disappointment, unfortunately.

There remains a question whether we really needed 500 papers...   Well, of course not: many of  them fill an important gap.  Yet many are an interesting read, and I personally learned a lot of exciting physics from them.  Actually, I suspect that the fraction of useless papers among the 500 is lower than for regular daily topics.  On a more sociological side, these papers exacerbate the problem with our citation culture (mass-grave references), which undermines the citation count as a means to evaluate the research impact.  But that is a wider issue  which I don't know how to address at the moment.

Time to move on. The ICHEP conference is coming next week, with loads of brand new results based on more than 10 inverse femtobarns of 13 TeV LHC data. Although the rumor is that there is no new exciting anomaly at this point, it will be interesting to see how much room is left for new physics. The hope lingers on, at least until the end of this year.

July 28, 2016

Particlebites: Can we measure black hole kicks using gravitational waves?

Article: Black hole kicks as new gravitational wave observables
Authors: Davide Gerosa, Christopher J. Moore
Reference: arXiv:1606.04226, Phys. Rev. Lett. 117, 011101 (2016)

On September 14, 2015, something really huge happened in physics: the first direct detection of gravitational waves. But measuring a single gravitational wave was never the goal (though freaking cool in and of itself, of course!). So what is the purpose of gravitational wave astronomy?

The idea is that gravitational waves can be used as another tool to learn more about our Universe and its components. Until the discovery of gravitational waves, observations in astrophysics and astronomy were limited to observations with telescopes and thus to electromagnetic radiation. Now a new era has started: the era of gravitational wave astronomy. And when the space-based eLISA observatory comes online, it will begin an era of gravitational wave cosmology. So what is it that we can learn about our universe from gravitational waves?

First of all, the first detection aka GW150914 was already super interesting:

  1. It was the first observation of a binary black hole system (with unexpected masses!).
  2. It put some strong constraints on the allowed deviations from Einstein’s theory of general relativity.

What is next? We hope to detect a neutron star orbiting a black hole or another neutron star.  This will allow us to learn more about the equation of state of neutron stars and thus their composition. But the authors in this paper suggest another exciting prospect: observing so-called black hole kicks using gravitational wave astronomy.

So, what is a black hole kick? When two black holes rotate around each other, they emit gravitational waves. In this process, they lose energy and therefore they get closer and closer together before finally merging to form a single black hole. However, generically the radiation is not the same in all directions and thus there is also a net emission of linear momentum. By conservation of momentum, when the black holes merge, the final remnant experiences a recoil in the opposite direction. Previous numerical studies have shown that non-spinning black holes ‘only’ have kicks of ∼ 170 km per second, but you can also have “superkicks” as high as ∼5000 km per second! These speeds can exceed the escape velocity of even the most massive galaxies and may thus eject black holes from their hosts. These dramatic events have some electromagnetic signatures, but also leave an imprint in the gravitational waveform that we detect.


Fig. 1: This graph shows two black holes rotating around each other (without any black hole kick) and finally merging during the final part of the inspiral phase followed by the very short merger and ringdown phase. The wave below is the gravitational waveform. [Figure from 1602.03837]

The idea is rather simple: as the system experiences a kick, its gravitational wave is Doppler shifted. This Doppler shift affects the frequency f in the way you would expect:


Doppler shift from black hole kick.

with v the kick velocity and n the unit vector in the direction from the observer to the black hole system (and c the speed of light). The black hole dynamics is entirely captured by the dimensionless number G f M/c^3 with M the mass of the binary (and G Newton’s constant). So you can also model this shift in frequency by using the unkicked frequency f_no kick and absorbing the Doppler shift into the mass. This is very convenient because it means that you can use all the current knowledge and results for the gravitational waveforms and just change the mass. Now the tricky part is that the velocity changes over time and this needs to be modelled more carefully.
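As a hedged sketch of the relations described here: to first order in v/c, and with n pointing from the observer to the source as defined above, the standard Doppler formula would read as follows. The label M_kick for the effective, rescaled mass is introduced only for this illustration.

```latex
f_{\text{kick}} \simeq f_{\text{no kick}} \left(1 - \frac{\mathbf{v}\cdot\mathbf{n}}{c}\right),
\qquad
\frac{G\, f_{\text{kick}}\, M_{\text{kick}}}{c^{3}} = \frac{G\, f_{\text{no kick}}\, M}{c^{3}}
\;\Longrightarrow\;
M_{\text{kick}} \simeq M \left(1 + \frac{\mathbf{v}\cdot\mathbf{n}}{c}\right).
```

A source receding from us has v·n > 0, so its frequency is lowered (redshifted) and it looks like a slightly heavier binary.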

A crude model would be to say that during the inspiral of the black holes (which is the long phase during which the two black holes rotate around each other – see figure 1), the emitted linear momentum is too small and the mass is unaffected by emission of linear momentum. During the final stages the black holes merge and the final remnant emits a gravitational wave with decreasing amplitude, which is called the ringdown phase. During this latter phase the velocity kick is important and one can relate the mass during inspiral M_i with the mass during the ringdown phase M_r simply by


Mass during ringdown related to mass during inspiral.

The results of doing this for a black hole kick moving away from (or towards) us are shown in fig. 2: the wave gets redshifted (or blueshifted).
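To get a feel for the size of the effect, here is a small sketch that evaluates the crude model numerically. It assumes the first-order relation M_r = M_i (1 + v cos θ / c), with cos θ = +1 for a remnant moving directly away from us; the function name and arguments are made up for this illustration.

```python
C_KM_S = 299_792.458  # speed of light in km/s


def ringdown_mass(inspiral_mass: float, kick_km_s: float, cos_angle: float = 1.0) -> float:
    """Effective ringdown mass in the crude model, to first order in v/c.

    cos_angle is the cosine of the angle between the kick and the direction
    from the observer to the source: +1 means receding, -1 means approaching.
    """
    return inspiral_mass * (1.0 + kick_km_s * cos_angle / C_KM_S)


if __name__ == "__main__":
    for kick in (170.0, 5000.0):  # the two kick speeds quoted in the text
        shift = ringdown_mass(1.0, kick) - 1.0
        print(f"{kick:6.0f} km/s -> fractional mass shift {shift:.2e}")
```

Even a superkick changes the apparent mass by less than two percent, which is why a high signal-to-noise ratio matters so much.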

Fig. 2: If a black hole binary radiates isotropically, it does not experience any kick and the gravitational wave has the black waveform. However, if it experiences a kick along the line of sight, the waveform can get redshifted (when the system moves away from us) as shown on the left, or blueshifted (when the system moves toward us) as shown on the right. The top and lower panels correspond to the two independent polarizations of the gravitational wave. [Figure from 1606.04226]

This model is refined in various ways and the results show that it is unlikely that kicks will be measured by LIGO, as LIGO is optimized for detecting black holes with relatively low masses and black hole systems with low masses have velocity kicks that are too low to be detected. However, the prospects for eLISA are better for two reasons: (1) eLISA is designed to measure supermassive black hole binaries with masses in the range of 10^5 to 10^10 solar masses, which can have much larger kicks (and thus are more easily detectable) and (2) the signal-to-noise ratio for eLISA is much higher, giving better data. This study estimates about 6 detectable kicks per year. Thus, black hole (super)kicks might be detected in the next decade using gravitational wave astronomy. The future is bright 🙂


July 27, 2016

David Hogg: fiber number issues

[I was on vacation for a few days.]

Just before I left, Melissa Ness discovered that instrumental fiber number is a good predictor of whether or not two stars will get similar abundances in APOGEE, either with The Cannon or with the standard pipeline! This is perhaps not a surprise: The different fibers have different line-spread functions, and sit on different parts of the detector. We discussed how to mitigate this, and looked at the dependence of the issues on fiber number and line-spread function FWHM separately.

For the nth time, I re-wrote my abstract (that is, the scope of a possible paper) on what you could learn about a star's intrinsic properties from a Gaia-like parallax measurement. I think the focus perhaps should be the subjectivity of it: What you can learn depends on what you know and believe.

Hans-Walter Rix decided that my talk at the end of this week should be on the graphical model as a tool for data analysis. I hope he is right!

David Hogg: chemical equilibria and bimodality; etc.

DFM and I worked through issues remaining in our MCMC Data Analysis Recipes paper, which I would like to post on the arXiv this month (or next!). We also worked through some remaining issues in his long-period transiting exoplanet paper, in which he discovers and estimates the population of very long-period planets in the Kepler data.

David Weinberg (OSU) gave a nice talk about how stellar populations come to chemical equilibria, making use of nucleosynthetic models. He looked at how star-formation events might appear in the metallicity distribution. He also showed the beautiful data on the alpha-abundance bimodality in the APOGEE data, but in the end did not give a confident explanation of that bimodality, which really is intriguing.

I also had a substantial chat with Matthias Samland about his project to constrain the directly emitted infrared spectrum of an exoplanet using multiple data sources. He has the usual issues of inconsistent data calibration, correlated noise in extracted spectra, and the simultaneous fitting of photometry and spectroscopy. It looks like he will have lots of good conclusions, though: The spectra are highly informative.

John Baez: Topological Crystals (Part 2)


We’re building crystals, like diamonds, purely from topology. Last time I said how: you take a graph X and embed its maximal abelian cover into the vector space H_1(X,\mathbb{R}). Now let me say a bit more about the maximal abelian cover. It’s not nearly as famous as the universal cover, but it’s very nice.

First I’ll whiz through the basic idea, and then I’ll give the details.

The basic idea

By ‘space’ let me mean a connected topological space that’s locally nice. The basic idea is that if X is some space, its universal cover \widetilde{X} is a covering space of X that covers all other covering spaces of X. The maximal abelian cover \overline{X} has a similar universal property—but it’s abelian, and it covers all abelian connected covers. A cover is abelian if its group of deck transformations is abelian.

The cool part is that universal covers are to homotopy theory as maximal abelian covers are to homology theory.

What do I mean by that? For starters, points in \widetilde{X} are just homotopy classes of paths in X starting at some chosen basepoint. And the points in \overline{X} are just ‘homology classes’ of paths starting at the basepoint.

But people don’t talk so much about ‘homology classes’ of paths. So what do I mean by that? Here a bit of category theory comes in handy. Homotopy classes of paths in X are morphisms in the fundamental groupoid of X. Homology classes of paths are morphisms in the abelianized version of the fundamental groupoid!

But wait a minute — what does that mean? Well, we can abelianize any groupoid by imposing the relations

f g = g f

whenever it makes sense to do so. It makes sense to do so when you can compose the morphisms f : x \to y and g : x' \to y' in either order, and the resulting morphisms f g and g f have the same source and the same target. And if you work out what that means, you’ll see it means

x = y = x' = y'

But now let me say it all much more slowly, for people who want a more relaxed treatment.

The details

There are lots of slightly different things called ‘graphs’ in mathematics, but in topological crystallography it’s convenient to work with one that you’ve probably never seen before. This kind of graph has two copies of each edge, one pointing in each direction.

So, we’ll say a graph X = (E,V,s,t,i) has a set V of vertices, a set E of edges, maps s,t : E \to V assigning to each edge its source and target, and a map i : E \to E sending each edge to its inverse, obeying

s(i(e)) = t(e), \quad t(i(e)) = s(e) , \qquad i(i(e)) = e


i(e) \ne e

for all e \in E.

That inequality at the end will make category theorists gag: definitions should say what’s true, not what’s not true. But category theorists should be able to see what’s really going on here, so I leave that as a puzzle.

For ordinary folks, let me repeat the definition using more words. If s(e) = v and t(e) = w we write e : v \to w, and draw e as an interval with an arrow on it pointing from v to w. We write i(e) as e^{-1}, and draw e^{-1} as the same interval as e, but with its arrow reversed. The equations obeyed by i say that taking the inverse of e : v \to w gives an edge e^{-1} : w \to v and that (e^{-1})^{-1} = e. No edge can be its own inverse.
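The definition can be made concrete with a short sketch. The encoding below is only one possible choice, assumed for illustration: each edge is a pair (name, +1) or (name, -1), so the inverse simply flips the sign. The class `Graph` and the helpers `make_graph` and `is_valid` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Graph:
    vertices: set
    source: dict   # edge -> vertex, the map s
    target: dict   # edge -> vertex, the map t
    inverse: dict  # edge -> edge, the map i


def make_graph(undirected_edges: dict) -> Graph:
    """Build a graph with two directed copies of each edge.

    undirected_edges maps an edge name to a pair (v, w) of endpoints.
    """
    source, target, inverse = {}, {}, {}
    for name, (v, w) in undirected_edges.items():
        forward, backward = (name, +1), (name, -1)
        source[forward], target[forward] = v, w
        source[backward], target[backward] = w, v
        inverse[forward], inverse[backward] = backward, forward
    vertices = set(source.values()) | set(target.values())
    return Graph(vertices, source, target, inverse)


def is_valid(g: Graph) -> bool:
    """Check s(i(e)) = t(e), t(i(e)) = s(e), i(i(e)) = e and i(e) != e for every edge."""
    return all(
        g.source[g.inverse[e]] == g.target[e]
        and g.target[g.inverse[e]] == g.source[e]
        and g.inverse[g.inverse[e]] == e
        and g.inverse[e] != e
        for e in g.source
    )


if __name__ == "__main__":
    figure_eight = make_graph({"a": ("v", "v"), "b": ("v", "v")})
    print(len(figure_eight.source), is_valid(figure_eight))
```

Note that even a loop from v to itself comes as two distinct edges, one for each direction of travel, so the rule that no edge is its own inverse is respected.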

A map of graphs, say f : X \to X', is a pair of functions, one sending vertices to vertices and one sending edges to edges, that preserve the source, target and inverse maps. By abuse of notation we call both of these functions f.

I started out talking about topology; now I’m treating graphs very combinatorially, but we can bring the topology back in. From a graph X we can build a topological space |X| called its geometric realization. We do this by taking one point for each vertex and gluing on one copy of [0,1] for each edge e : v \to w, gluing the point 0 to v and the point 1 to w, and then identifying the interval for each edge e with the interval for its inverse by means of the map t \mapsto 1 - t.

Any map of graphs gives rise to a continuous map between their geometric realizations, and we say a map of graphs is a cover if this continuous map is a covering map. For simplicity we denote the fundamental group of |X| by \pi_1(X), and similarly for other topological invariants of |X|. However, sometimes I’ll need to distinguish between a graph X and its geometric realization |X|.

Any connected graph X has a universal cover, meaning a connected cover

p : \widetilde{X} \to X

that covers every other connected cover. The geometric realization of \widetilde{X} is connected and simply connected. The fundamental group \pi_1(X) acts as deck transformations of \widetilde{X}, meaning invertible maps g : \widetilde{X} \to \widetilde{X} such that p \circ g = p. We can take the quotient of \widetilde{X} by the action of any subgroup G \subseteq \pi_1(X) and get a cover q : \widetilde{X}/G \to X.

In particular, if we take G to be the commutator subgroup of \pi_1(X), we call the graph \widetilde{X}/G the maximal abelian cover of the graph X, and denote it by \overline{X}. We obtain a cover

q : \overline{X} \to X

whose group of deck transformations is the abelianization of \pi_1(X). This is just the first homology group H_1(X,\mathbb{Z}). In particular, if the space corresponding to X has n holes, this is the free abelian group on n generators.

I want a concrete description of the maximal abelian cover! I’ll build it starting with the universal cover, but first we need some preliminaries on paths in graphs.

Given vertices x,y in X, define a path from x to y to be a word of edges \gamma = e_1 \cdots e_\ell with e_i : v_{i-1} \to v_i for some vertices v_0, \dots, v_\ell with v_0 = x and v_\ell = y. We allow the word to be empty if and only if x = y; this gives the trivial path from x to itself.

Given a path \gamma from x to y we write \gamma : x \to y, and we write the trivial path from x to itself as 1_x : x \to x. We define the composite of paths \gamma : x \to y and \delta : y \to z via concatenation of words, obtaining a path we call \gamma \delta : x \to z. We call a path from a vertex x to itself a loop based at x.

We say two paths from x to y are homotopic if one can be obtained from the other by repeatedly introducing or deleting subwords of the form e_i e_{i+1} where e_{i+1} = e_i^{-1}. If [\gamma] is a homotopy class of paths from x to y, we write [\gamma] : x \to y. We can compose homotopy classes [\gamma] : x \to y and [\delta] : y \to z by setting [\gamma] [\delta] = [\gamma \delta].
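Here is a small sketch of this relation, assuming edges are encoded as pairs (name, sign) so that the inverse of an edge flips the sign; that encoding and the function names are choices made for this example. It relies on the standard fact that cancelling adjacent inverse pairs until none remain gives a unique reduced word, so two paths are homotopic exactly when their reduced words agree.

```python
def inv(edge):
    """Inverse of an edge encoded as (name, sign)."""
    name, sign = edge
    return (name, -sign)


def reduce_path(path):
    """Cancel subwords e e^{-1} until none remain, using a stack."""
    reduced = []
    for edge in path:
        if reduced and reduced[-1] == inv(edge):
            reduced.pop()          # the new edge backtracks along the previous one
        else:
            reduced.append(edge)
    return tuple(reduced)


def homotopic(path_1, path_2) -> bool:
    """Two paths are homotopic iff they have the same reduced word."""
    return reduce_path(path_1) == reduce_path(path_2)


if __name__ == "__main__":
    a, b = ("a", +1), ("b", +1)
    print(reduce_path((a, b, inv(b), inv(a), b)))
    print(homotopic((a, b), (b, a)))
```

The sketch does not check that consecutive edges actually match up at vertices; it assumes the input words are genuine paths.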

If X is a connected graph, we can describe the universal cover \widetilde{X} as follows. Fix a vertex x_0 of X, which we call the basepoint. The vertices of \widetilde{X} are defined to be the homotopy classes of paths [\gamma] : x_0 \to x where x is arbitrary. The edges in \widetilde{X} from the vertex [\gamma] : x_0 \to x to the vertex [\delta] : x_0 \to y are defined to be the edges e \in E with [\gamma e] = [\delta]. In fact, there is always at most one such edge. There is an obvious map of graphs

p : \widetilde{X} \to X

sending each vertex [\gamma] : x_0 \to x of \widetilde{X} to the vertex x of X. This map is a cover.

Now we are ready to construct the maximal abelian cover \overline{X}. For this, we impose a further equivalence relation on paths, which is designed to make composition commutative whenever possible. However, we need to be careful. If \gamma : x \to y and \delta : x' \to y' , the composites \gamma \delta and \delta \gamma are both well-defined if and only if x' = y and y' = x. In this case, \gamma \delta and \delta \gamma share the same starting point and share the same ending point if and only if x = x' and y = y'. If all four of these equations hold, both \gamma and \delta are loops based at x. So, we shall impose the relation \gamma \delta = \delta \gamma only in this case.

We say two paths are homologous if one can be obtained from the other by:

• repeatedly introducing or deleting subwords e_i e_{i+1} where e_{i+1} = e_i^{-1}, and/or

• repeatedly replacing subwords of the form

e_i \cdots e_j e_{j+1} \cdots e_k

by those of the form

e_{j+1} \cdots e_k e_i \cdots e_j

where e_i \cdots e_j and e_{j+1} \cdots e_k are loops based at the same vertex.

My use of the term ‘homologous’ is a bit nonstandard here!

We denote the homology class of a path \gamma by [[ \gamma ]]. Note that if two paths \gamma : x \to y, \delta : x' \to y' are homologous then x = x' and y = y'. Thus, the starting and ending points of a homology class of paths are well-defined, and given any path \gamma : x \to y we write [[ \gamma ]] : x \to y . The composite of homology classes is also well-defined if we set

[[ \gamma ]] [[ \delta ]] = [[ \gamma \delta ]]
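One way to get a handle on homology classes in practice is sketched below. Both moves in the definition leave unchanged the net number of times each edge is traversed, counted with sign, so equal counts are a necessary condition for two paths with the same endpoints to be homologous. For graphs I believe this condition is also sufficient, but treat that as an assumption of this sketch. Edges are again encoded as (name, sign) pairs, and `chain` is a made-up name.

```python
from collections import Counter


def chain(path) -> dict:
    """Net signed number of traversals of each edge along a path."""
    counts = Counter()
    for name, sign in path:
        counts[name] += sign
    # Drop edges whose traversals cancel, so equal chains compare equal.
    return {name: n for name, n in counts.items() if n != 0}


if __name__ == "__main__":
    a, b = ("a", +1), ("b", +1)
    # Two loops at the same vertex: a b and b a differ as homotopy classes,
    # but their chains agree.
    print(chain((a, b)) == chain((b, a)))
```

Under the assumption above, a homology class of paths from the basepoint is then recorded by its endpoint together with this dictionary of counts.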

We construct the maximal abelian cover of a connected graph X just as we constructed its universal cover, but using homology classes rather than homotopy classes of paths. And now I’ll introduce some jargon that should make you start thinking about crystals!

Fix a basepoint x_0 for X. The vertices of \overline{X}, or atoms, are defined to be the homology classes of paths [[\gamma]] : x_0 \to x where x is arbitrary. Any edge of \overline{X}, or bond, goes from some atom [[ \gamma]] : x_0 \to x to some atom [[ \delta ]] : x_0 \to y. The bonds from [[ \gamma]] to [[ \delta ]] are defined to be the edges e \in E with [[ \gamma e ]] = [[ \delta ]]. There is at most one bond between any two atoms. Again we have a covering map

q : \overline{X} \to X .
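To see a crystal emerge, here is a rough sketch that enumerates the atoms within a given number of bonds of the basepoint atom. It assumes that an atom can be recorded as a pair (vertex, signed edge counts), which presumes that homology classes of paths in a graph are determined by those counts. The function name and the input format, a dictionary from edge names to (source, target) pairs with one orientation per edge, are choices made for this example.

```python
from collections import deque


def abelian_cover_ball(edges: dict, basepoint, radius: int) -> set:
    """Atoms of the maximal abelian cover within `radius` bonds of the basepoint atom."""
    # Each named edge can be walked forwards (+1) or backwards (-1).
    steps = []
    for name, (v, w) in edges.items():
        steps.append((v, w, name, +1))
        steps.append((w, v, name, -1))

    start = (basepoint, frozenset())
    distance = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        vertex, signed_counts = atom
        if distance[atom] == radius:
            continue
        for v, w, name, sign in steps:
            if v != vertex:
                continue
            counts = dict(signed_counts)
            counts[name] = counts.get(name, 0) + sign
            if counts[name] == 0:
                del counts[name]   # keep a canonical form for comparison
            neighbour = (w, frozenset(counts.items()))
            if neighbour not in distance:
                distance[neighbour] = distance[atom] + 1
                queue.append(neighbour)
    return set(distance)


if __name__ == "__main__":
    figure_eight = {"a": ("v", "v"), "b": ("v", "v")}
    print(len(abelian_cover_ball(figure_eight, "v", 2)))  # prints 13
```

For the figure eight, one vertex with two loops, the atoms within two bonds are the 13 points (m, n) of the square lattice with |m| + |n| at most 2: the maximal abelian cover is the familiar square grid.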

The homotopy classes of loops based at x_0 form a group, with composition as the group operation. This is the fundamental group \pi_1(X) of the graph X. It is isomorphic to the fundamental group of the space associated to X. By our construction of the universal cover, \pi_1(X) is also the set of vertices of \widetilde{X} that are mapped to x_0 by p. Furthermore, any element [\gamma] \in \pi_1(X) defines a deck transformation of \widetilde{X} that sends each vertex [\delta] : x_0 \to x to the vertex [\gamma] [\delta] : x_0 \to x.

Similarly, the homology classes of loops based at x_0 form a group with composition as the group operation. Since the additional relation used to define homology classes is precisely that needed to make composition of homology classes of loops commutative, this group is the abelianization of \pi_1(X). It is therefore isomorphic to the first homology group H_1(X,\mathbb{Z}) of the geometric realization of X.

By our construction of the maximal abelian cover, H_1(X,\mathbb{Z}) is also the set of vertices of \overline{X} that are mapped to x_0 by q. Furthermore, any element [[\gamma]] \in H_1(X,\mathbb{Z}) defines a deck transformation of \overline{X} that sends each vertex [[\delta]] : x_0 \to x to the vertex [[\gamma]] [[\delta]] : x_0 \to x.

So, it all works out! The fundamental group \pi_1(X) acts as deck transformations of the universal cover, while the first homology group H_1(X,\mathbb{Z}) acts as deck transformations of the maximal abelian cover.

Puzzle for experts: what does this remind you of in Galois theory?

We’ll get back to crystals next time.

n-Category Café: Topological Crystals (Part 2)


We’re building crystals, like diamonds, purely from topology. Last time I said how: you take a graph X and embed its maximal abelian cover into the vector space H_1(X,\mathbb{R}).

Now let me back up and say a bit more about the maximal abelian cover. It’s not nearly as famous as the universal cover, but it’s very nice.

The basic idea

By ‘space’ let me mean a connected topological space that’s locally nice. The basic idea is that if X is some space, its universal cover \widetilde{X} is a covering space of X that covers all other covering spaces of X. The maximal abelian cover \overline{X} has a similar universal property, but it’s abelian, and it covers all abelian connected covers. A cover is abelian if its group of deck transformations is abelian.

The cool part is that universal covers are to homotopy theory as maximal abelian covers are to homology theory.

What do I mean by that? For starters, points in \widetilde{X} are just homotopy classes of paths in X starting at some chosen basepoint. And the points in \overline{X} are just ‘homology classes’ of paths starting at the basepoint.

But people don’t talk so much about ‘homology classes’ of paths. So what do I mean by that? Here a bit of category theory comes in handy. Homotopy classes of paths in X are morphisms in the fundamental groupoid of X. Homology classes of paths are morphisms in the abelianized fundamental groupoid!

But wait a minute — what does that mean? Well, we can abelianize any groupoid by imposing the relations

f g = g f

whenever it makes sense to do so. It makes sense to do so when you can compose the morphisms f : x \to y and g : x' \to y' in either order, and the resulting morphisms f g and g f have the same source and the same target. And if you work out what that means, you’ll see it means x = y = x' = y'.

But now let me say it all much more slowly, for people who want a more relaxed treatment. After all, this is a nice little bit of topology that could be in an elementary course!

The details

There are lots of slightly different things called ‘graphs’ in mathematics, but in topological crystallography it’s convenient to work with one that you’ve probably never seen before. This kind of graph has two copies of each edge, one pointing in each direction.

So, we’ll say a graph X = (E,V,s,t,i) has a set V of vertices, a set E of edges, maps s,t : E \to V assigning to each edge its source and target, and a map i : E \to E sending each edge to its inverse, obeying

s(i(e)) = t(e), \quad t(i(e)) = s(e) , \qquad i(i(e)) = e

i(e) \ne e

for all e \in E.

That inequality at the end will make category theorists gag: definitions should say what’s true, not what’s not true. But category theorists should be able to see what’s really going on here, so I leave that as a puzzle.

For ordinary folks, let me repeat the definition using more words. If s(e) = v and t(e) = w we write e : v \to w, and draw e as an interval with an arrow on it pointing from v to w. We write i(e) as e^{-1}, and draw e^{-1} as the same interval as e, but with its arrow reversed. The equations obeyed by i say that taking the inverse of e : v \to w gives an edge e^{-1} : w \to v and that (e^{-1})^{-1} = e. No edge can be its own inverse.

A map of graphs, say f : X \to X', is a pair of functions, one sending vertices to vertices and one sending edges to edges, that preserve the source, target and inverse maps. By abuse of notation we call both of these functions f.

I started out talking about topology; now I’m treating graphs very combinatorially, but we can bring the topology back in.

From a graph X we can build a topological space |X| called its geometric realization. We do this by taking one point for each vertex and gluing on one copy of [0,1] for each edge e : v \to w, gluing the point 0 to v and the point 1 to w, and then identifying the interval for each edge e with the interval for its inverse by means of the map t \mapsto 1 - t. Any map of graphs gives rise to a continuous map between their geometric realizations, and we say a map of graphs is a cover if this continuous map is a covering map. For simplicity we denote the fundamental group of |X| by \pi_1(X), and similarly for other topological invariants of |X|. However, sometimes I’ll need to distinguish between a graph X and its geometric realization |X|.

Any connected graph X has a universal cover, meaning a connected cover

p : \widetilde{X} \to X

that covers every other connected cover. The geometric realization of \widetilde{X} is connected and simply connected. The fundamental group \pi_1(X) acts as deck transformations of \widetilde{X}, meaning invertible maps g : \widetilde{X} \to \widetilde{X} such that p \circ g = p. We can take the quotient of \widetilde{X} by the action of any subgroup G \subseteq \pi_1(X) and get a cover q : \widetilde{X}/G \to X.

In particular, if we take G to be the commutator subgroup of \pi_1(X), we call the graph \widetilde{X}/G the maximal abelian cover of the graph X, and denote it by \overline{X}. We obtain a cover

q : \overline{X} \to X

whose group of deck transformations is the abelianization of \pi_1(X). This is just the first homology group H_1(X,\mathbb{Z}). In particular, if the space corresponding to X has n holes, this is a free abelian group on n generators.

I want a concrete description of the maximal abelian cover! I’ll build it starting with the universal cover, but first we need some preliminaries on paths in graphs.

Given vertices x,y in X, define a path from x to y to be a word of edges \gamma = e_1 \cdots e_\ell with e_i : v_{i-1} \to v_i for some vertices v_0, \dots, v_\ell with v_0 = x and v_\ell = y. We allow the word to be empty if and only if x = y; this gives the trivial path from x to itself. Given a path \gamma from x to y we write \gamma : x \to y, and we write the trivial path from x to itself as 1_x : x \to x. We define the composite of paths \gamma : x \to y and \delta : y \to z via concatenation of words, obtaining a path we call \gamma \delta : x \to z. We call a path from a vertex x to itself a loop based at x.

We say two paths from xx to yy are homotopic if one can be obtained from the other by repeatedly introducing or deleting subwords of the form e ie i+1e_i e_{i+1} where e i+1=e i 1e_{i+1} = e_i^{-1}. If [γ][\gamma] is a homotopy class of paths from xx to yy, we write [γ]:xy [\gamma] : x \to y. We can compose homotopy classes [γ]:xy [\gamma] : x \to y and [δ]:yz[\delta] : y \to z by setting [γ][δ]=[γδ] [\gamma] [\delta] = [\gamma \delta].
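To make the homotopy relation concrete, here is a small sketch that cancels adjacent pairs e e^{-1} until none are left. Encoding an edge as a (name, sign) pair is an assumption made just for this illustration. Two paths with the same endpoints are homotopic exactly when they reduce to the same word, so comparing reduced words decides homotopy.

```python
def inverse(edge):
    """Return the inverse of an oriented edge, encoded as (name, +1 or -1)."""
    name, sign = edge
    return (name, -sign)


def reduce_path(path):
    """Cancel adjacent pairs e, e^{-1} until none remain, using a stack."""
    stack = []
    for edge in path:
        if stack and stack[-1] == inverse(edge):
            stack.pop()          # e followed by e^{-1}: delete the subword
        else:
            stack.append(edge)
    return stack


def homotopic(path1, path2):
    """Paths with the same endpoints are homotopic iff their reduced words agree."""
    return reduce_path(path1) == reduce_path(path2)


a, b = ("a", 1), ("b", 1)
gamma = [a, b, inverse(b), inverse(a), a]
print(reduce_path(gamma))  # [('a', 1)]
```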

If X is a connected graph, we can describe the universal cover \widetilde{X} as follows. Fix a vertex x_0 of X, which we call the basepoint. The vertices of \widetilde{X} are defined to be the homotopy classes of paths [\gamma] : x_0 \to x where x is arbitrary. The edges in \widetilde{X} from the vertex [\gamma] : x_0 \to x to the vertex [\delta] : x_0 \to y are defined to be the edges e \in E with [\gamma e] = [\delta]. In fact, there is always at most one such edge. There is an obvious map of graphs

p : \widetilde{X} \to X

sending each vertex [\gamma] : x_0 \to x of \widetilde{X} to the vertex x of X. This map is a cover.

Now we are ready to construct the maximal abelian cover \overline{X}. For this, we impose a further equivalence relation on paths, which is designed to make composition commutative whenever possible. However, we need to be careful. If \gamma : x \to y and \delta : x' \to y', the composites \gamma \delta and \delta \gamma are both well-defined if and only if x' = y and y' = x. In this case, \gamma \delta and \delta \gamma share the same starting point and share the same ending point if and only if x = x' and y = y'. If all four of these equations hold, both \gamma and \delta are loops based at x. So, we shall impose the relation \gamma \delta = \delta \gamma only in this case.

We say two paths are homologous if one can be obtained from the other by:

  • repeatedly introducing or deleting subwords e_i e_{i+1} where e_{i+1} = e_i^{-1}, and/or

  • repeatedly replacing subwords of the form e_i \cdots e_j e_{j+1} \cdots e_k by those of the form e_{j+1} \cdots e_k e_i \cdots e_j, where e_i \cdots e_j and e_{j+1} \cdots e_k are loops based at the same vertex.

My use of the term ‘homologous’ is a bit nonstandard here!

We denote the homology class of a path \gamma by [[\gamma]]. Note that if two paths \gamma : x \to y, \delta : x' \to y' are homologous then x = x' and y = y'. Thus, the starting and ending points of a homology class of paths are well-defined, and given any path \gamma : x \to y we write [[\gamma]] : x \to y. The composite of homology classes is also well-defined if we set [[\gamma]] [[\delta]] = [[\gamma \delta]].
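One way to get a feel for this relation is to count, for each edge, how many times a path crosses it forwards minus backwards. Both moves above leave that count unchanged, so it is an invariant of the homology class. The sketch below only checks this invariance on a hypothetical graph with two vertices v, w and three edges a, b, c from v to w; the (name, sign) encoding is again my own assumption.

```python
from collections import Counter


def chain(path):
    """Net number of times each edge is traversed, with zero entries dropped."""
    counts = Counter()
    for name, sign in path:
        counts[name] += sign
    return {name: n for name, n in counts.items() if n != 0}


# Hypothetical graph: edges a, b, c all run from v to w.
a, b, c = ("a", 1), ("b", 1), ("c", 1)
a_inv, b_inv, c_inv = ("a", -1), ("b", -1), ("c", -1)

alpha = [a, b_inv]   # a loop based at v
beta = [c, a_inv]    # another loop based at v
tail = [b]           # a path from v to w

# Swapping two loops based at the same vertex does not change the count ...
print(chain(alpha + beta + tail) == chain(beta + alpha + tail))   # True
# ... and neither does inserting a subword e e^{-1}.
print(chain(alpha + [c, c_inv] + tail) == chain(alpha + tail))    # True
print(chain(alpha + beta + tail))                                 # {'c': 1}
```

In this example the long path is in fact homologous to the single edge c: swap the two loops, then cancel a^{-1} a and b^{-1} b.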

We construct the maximal abelian cover of a connected graph X just as we constructed its universal cover, but using homology classes rather than homotopy classes of paths. And now I’ll introduce some jargon that should make you start thinking about crystals!

Fix a basepoint x_0 for X. The vertices of \overline{X}, or atoms, are defined to be the homology classes of paths [[\gamma]] : x_0 \to x where x is arbitrary. Any edge of \overline{X}, or bond, goes from some atom [[\gamma]] : x_0 \to x to some atom [[\delta]] : x_0 \to y. The bonds from [[\gamma]] to [[\delta]] are defined to be the edges e \in E with [[\gamma e]] = [[\delta]]. There is at most one bond between any two atoms. Again we have a covering map

q : \overline{X} \to X

The homotopy classes of loops based at x_0 form a group, with composition as the group operation. This is the fundamental group \pi_1(X) of the graph X. (It depends on the basepoint x_0, but I’ll leave that out of the notation just to scandalize my colleagues. It’s so easy to live dangerously when you’re an academic!)

Now, this fundamental group is isomorphic to the usual fundamental group of the space associated to X. By our construction of the universal cover, \pi_1(X) is also the set of vertices of \widetilde{X} that are mapped to x_0 by p. Furthermore, any element [\gamma] \in \pi_1(X) defines a deck transformation of \widetilde{X} that sends each vertex [\delta] : x_0 \to x to the vertex [\gamma] [\delta] : x_0 \to x.

Similarly, the homology classes of loops based at x_0 form a group with composition as the group operation. Since the additional relation used to define homology classes is precisely that needed to make composition of homology classes of loops commutative, this group is the abelianization of \pi_1(X). It is therefore isomorphic to the first homology group H_1(X,\mathbb{Z}) of the geometric realization of X!

By our construction of the maximal abelian cover, H_1(X,\mathbb{Z}) is also the set of vertices of \overline{X} that are mapped to x_0 by q. Furthermore, any element [[\gamma]] \in H_1(X,\mathbb{Z}) defines a deck transformation of \overline{X} that sends each vertex [[\delta]] : x_0 \to x to the vertex [[\gamma]] [[\delta]] : x_0 \to x.

So, it all works out! The fundamental group \pi_1(X) acts as deck transformations of the universal cover, while the first homology group H_1(X,\mathbb{Z}) acts as deck transformations of the maximal abelian cover!

Puzzle for experts: what does this remind you of in Galois theory?

We’ll get back to crystals next time.

Matt Strassler The Summer View at CERN

For the first time in some years, I’m spending two and a half weeks at CERN (the lab that hosts the Large Hadron Collider [LHC]). Most of my recent visits have been short or virtual, but this time* there’s a theory workshop that has collected together a number of theoretical particle physicists, and it’s a good opportunity for all of us to catch up with the latest creative ideas in the subject.   It’s also an opportunity to catch a glimpse of the furtive immensity of Mont Blanc, a hulking bump on the southern horizon, although only if (as is rarely the case) nature offers clear and beautiful weather.

More importantly, new results on the data collected so far in 2016 at the LHC are coming very soon!  They will be presented at the ICHEP conference that will be held in Chicago starting August 3rd. And there’s something we’ll be watching closely.

You may remember that in a post last December I wrote:

  “Everybody wants to know. That bump seen on the ATLAS and CMS two-photon plots!  What… IS… it…?”

Why the excitement? A bump of this type can be a signal of a new particle (as was the case for the Higgs particle itself.) And since a new particle that would produce a bump of this size was both completely unexpected and completely plausible, there was hope that we were seeing a hint of something new and important.

However, as I wrote in the same post,

  “Well, to be honest, probably it’s just that: a bump on a plot. But just in case it’s not…”

and I went on to discuss briefly what it might mean if it wasn’t just a statistical fluke. But speculation may be about to end: finally, we’re about to find out if it was indeed just a fluke — or a sign of something real.

Since December the amount of 13 TeV collision data available at ATLAS and CMS (the two general purpose experiments at the LHC) has roughly quadrupled, which means that typical bumps and wiggles on their 2015-2016 plots have decreased in relative size by about a factor of two (= square root of four). If the December bump is just randomness, it should also decrease in relative size. If it’s real, it should remain roughly the same relative size, but appear more prominent relative to the random bumps and wiggles around it.
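To see where the factor of two comes from, here is a back-of-the-envelope sketch in Python. It assumes simple counting (Poisson) statistics, where a bin expected to hold N events typically wobbles by about the square root of N. The event counts are invented purely for illustration and are not ATLAS or CMS numbers.

```python
import math


def relative_fluctuation(expected_events):
    """Typical relative size of a random wiggle: sqrt(N) / N."""
    return math.sqrt(expected_events) / expected_events


def significance(signal, background):
    """Crude estimate of how far a real excess stands above the random wiggles."""
    return signal / math.sqrt(background)


# Hypothetical bin contents, chosen only to illustrate the scaling.
old_background, old_signal = 100.0, 20.0
new_background, new_signal = 4 * old_background, 4 * old_signal  # four times the data

shrink = relative_fluctuation(old_background) / relative_fluctuation(new_background)
growth = significance(new_signal, new_background) / significance(old_signal, old_background)
print(shrink, growth)  # 2.0 2.0
```

With four times the data, random wiggles halve in relative size, while a real excess keeps its relative size and so stands out about twice as clearly.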

Now, there’s a caution to be added here. The December ATLAS bump was so large and fat compared to what was seen at CMS that (since reality has to appear the same at both experiments, once enough data has been collected) it was pretty obvious that even if there were a real bump there, at ATLAS it was probably in combination with a statistical fluke that made it look larger and fatter than its true nature. [Something similar happened with the Higgs; the initial bump that ATLAS saw was twice as big as expected, which is why it showed up so early, but it has gradually shrunk as more data has been collected and it is now close to its expected size.  In retrospect, that tells us that ATLAS’s original signal was indeed combined with a statistical fluke that made it appear larger than it really is.] What that means is that even if the December bumps were real, we would expect the ATLAS bump to shrink in size (but not statistical significance) and we would expect the CMS bump to remain of similar size (but grow in statistical significance). Remember, though, that “expectation” is not certainty, because at every stage statistical flukes (up or down) are possible.

In about a week we’ll find out where things currently stand. But the mood, as I read it here in the hallways and cafeteria, is not one of excitement. Moreover, the fact that the update to the results is (at the moment) unobtrusively scheduled for a parallel session of the ICHEP conference next Friday, afternoon time at CERN, suggests we’re not going  to see convincing evidence of anything exciting. If so, then the remaining question will be whether the reverse is true: whether the data will show convincing evidence that the December bump was definitely a fluke.

Flukes are guaranteed; with limited amounts of data, they can’t be avoided.  Discoveries, on the other hand, require skill, insight, and luck: you must ask a good question, address it with the best available methods, and be fortunate enough that (as is rarely the case) nature offers a clear and interesting answer.


*I am grateful for the CERN theory group’s financial support during this visit.


n-Category Café Topological Crystals (Part 1)


Over on Azimuth I posted an article about crystals:

• John Baez, Diamonds and triamonds, Azimuth, 11 April 2016.

In the comments on that post, a bunch of us worked on some puzzles connected to ‘topological crystallography’—a subject that blends graph theory, topology and mathematical crystallography. You can learn more about that subject here:

• Tosio Sunada, Crystals that nature might miss creating, Notices of the AMS 55 (2008), 208–215.

I got so interested that I wrote this paper about it, with massive help from Greg Egan:

• John Baez, Topological crystals.

I’ll explain the basic ideas in a series of posts here.

First, a few personal words.

I feel a bit guilty putting so much work into this paper when I should be developing network theory to the point where it does our planet some good. I seem to need a certain amount of beautiful pure math to stay sane. But this project did at least teach me a lot about the topology of graphs.

For those not in the know, applying homology theory to graphs might sound fancy and interesting. For people who have studied a reasonable amount of topology, it probably sounds easy and boring. The first homology of a graph of genus g is a free abelian group on g generators: it’s a complete invariant of connected graphs up to homotopy equivalence. Case closed!

But there’s actually more to it, because studying graphs up to homotopy equivalence kills most of the fun. When we’re studying networks in real life we need a more refined outlook on graphs. So some aspects of this project might pay off, someday, in ways that have nothing to do with crystallography. But right now I’ll just talk about it as a fun self-contained set of puzzles.

I’ll start by quickly sketching how to construct topological crystals, and illustrate it with the example of graphene, a 2-dimensional form of carbon:

I’ll precisely state our biggest result, which says when the construction gives a crystal where the atoms don’t bump into each other and the bonds between atoms don’t cross each other. Later I may come back and add detail, but for now you can find details in our paper.

Constructing topological crystals

The ‘maximal abelian cover’ of a graph plays a key role in Sunada’s work on topological crystallography. Just as the universal cover of a connected graph X has the fundamental group \pi_1(X) as its group of deck transformations, the maximal abelian cover, denoted \overline{X}, has the abelianization of \pi_1(X) as its group of deck transformations. It thus covers every other connected cover of X whose group of deck transformations is abelian. Since the abelianization of \pi_1(X) is the first homology group H_1(X,\mathbb{Z}), there is a close connection between the maximal abelian cover and homology theory.

In our paper, Greg and I prove that for a large class of graphs, the maximal abelian cover can naturally be embedded in the vector space H_1(X,\mathbb{R}). We call this embedded copy of \overline{X} a ‘topological crystal’. The symmetries of the original graph can be lifted to symmetries of its topological crystal, but the topological crystal also has an n-dimensional lattice of translational symmetries. In 2- and 3-dimensional examples, the topological crystal can serve as the blueprint for an actual crystal, with atoms at the vertices and bonds along the edges.

The general construction of topological crystals was developed by Kotani and Sunada, and later by Eon. Sunada uses ‘topological crystal’ for an even more general concept, but we only need a special case.

Here’s how it works. We start with a graph X. This has a space C_0(X,\mathbb{R}) of 0-chains, which are formal linear combinations of vertices, and a space C_1(X,\mathbb{R}) of 1-chains, which are formal linear combinations of edges. There is a boundary operator

\partial \colon C_1(X,\mathbb{R}) \to C_0(X,\mathbb{R})

This is the linear operator sending any edge to the difference of its two endpoints. The kernel of this operator is called the space of 1-cycles, Z_1(X,\mathbb{R}). There is an inner product on the space of 1-chains such that edges form an orthonormal basis. This determines an orthogonal projection

\pi \colon C_1(X,\mathbb{R}) \to Z_1(X,\mathbb{R})

For a graph, Z_1(X,\mathbb{R}) is isomorphic to the first homology group H_1(X,\mathbb{R}). So, to obtain the topological crystal of X, we need only embed its maximal abelian cover \overline{X} in Z_1(X,\mathbb{R}). We do this by embedding \overline{X} in C_1(X,\mathbb{R}) and then projecting it down via \pi.

To accomplish this, we need to fix a basepoint for X. Each path \gamma in X starting at this basepoint determines a 1-chain c_\gamma. These 1-chains correspond to the vertices of \overline{X}. The graph \overline{X} has an edge from c_\gamma to c_{\gamma'} whenever the path \gamma' is obtained by adding an extra edge to \gamma. This edge is a straight line segment from the point c_\gamma to the point c_{\gamma'}.

The hard part is checking that the projection \pi maps this copy of \overline{X} into Z_1(X,\mathbb{R}) in a one-to-one manner. In Theorems 6 and 7 of our paper we prove that this happens precisely when the graph X has no ‘bridges’: that is, edges whose removal would disconnect X.
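The ‘no bridges’ condition is easy to test by brute force. The sketch below follows the definition directly: remove each edge in turn and check whether the graph stays connected. It assumes the input graph is connected and is given as a list of (tail, head) pairs, one per edge/inverse pair; the function names are mine, and faster linear-time algorithms exist.

```python
def is_connected(vertices, edges):
    """Depth-first search from the first vertex; True if every vertex is reached."""
    vertices = list(vertices)
    if not vertices:
        return True
    adjacency = {v: [] for v in vertices}
    for tail, head in edges:
        adjacency[tail].append(head)
        adjacency[head].append(tail)
    seen = {vertices[0]}
    stack = [vertices[0]]
    while stack:
        v = stack.pop()
        for w in adjacency[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vertices)


def bridges(vertices, edges):
    """Indices of edges whose removal disconnects the (connected) graph."""
    return [i for i in range(len(edges))
            if not is_connected(vertices, edges[:i] + edges[i + 1:])]


# Hypothetical examples: three parallel edges have no bridge,
# while a pendant edge attached to a double edge is one.
print(bridges(["v", "w"], [("v", "w")] * 3))                            # []
print(bridges(["u", "v", "w"], [("u", "v"), ("v", "w"), ("v", "w")]))   # [0]
```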

Kotani and Sunada noted that this condition is necessary. That’s actually pretty easy to see. The challenge was to show that it’s sufficient! For this, our main technical tool is Lemma 5, which for any path \gamma decomposes the 1-chain c_\gamma into manageable pieces.

We call the resulting copy of \overline{X} embedded in Z_1(X,\mathbb{R}) a topological crystal.

Let’s see how it works in an example!

Take X to be this graph:

Since X has 3 edges, the space of 1-chains is 3-dimensional. Since X has 2 holes, the space of 1-cycles is a 2-dimensional plane in this 3-dimensional space. If we take paths \gamma in X starting at the red vertex, form the 1-chains c_\gamma, and project them down to this plane, we obtain the following picture:

Here the 1-chains c_\gamma are the white and red dots. These are the vertices of \overline{X}, while the line segments between them are the edges of \overline{X}. Projecting these vertices and edges onto the plane of 1-cycles, we obtain the topological crystal for X. The blue dots come from projecting the white dots onto the plane of 1-cycles, while the red dots already lie on this plane. The resulting topological crystal provides the pattern for graphene:

That’s all there is to the basic idea! But there’s a lot more to say about the mathematics it leads to, and a lot of fun examples to look at: diamonds, triamonds, hyperquartz and more.

John Baez Topological Crystals (Part 1)


A while back, we started talking about crystals:

• John Baez, Diamonds and triamonds, Azimuth, 11 April 2016.

In the comments on that post, a bunch of us worked on some puzzles connected to ‘topological crystallography’—a subject that blends graph theory, topology and mathematical crystallography. You can learn more about that subject here:

• Tosio Sunada, Crystals that nature might miss creating, Notices of the AMS 55 (2008), 208–215.

I got so interested that I wrote this paper about it, with massive help from Greg Egan:

• John Baez, Topological crystals.

I’ll explain the basic ideas in a series of posts here.

First, a few personal words.

I feel a bit guilty putting so much work into this paper when I should be developing network theory to the point where it does our planet some good. I seem to need a certain amount of beautiful pure math to stay sane. But this project did at least teach me a lot about the topology of graphs.

For those not in the know, applying homology theory to graphs might sound fancy and interesting. For people who have studied a reasonable amount of topology, it probably sounds easy and boring. The first homology of a graph of genus g is a free abelian group on g generators: it’s a complete invariant of connected graphs up to homotopy equivalence. Case closed!

But there’s actually more to it, because studying graphs up to homotopy equivalence kills most of the fun. When we’re studying networks in real life we need a more refined outlook on graphs. So some aspects of this project might pay off, someday, in ways that have nothing to do with crystallography. But right now I’ll just talk about it as a fun self-contained set of puzzles.

I’ll start by quickly sketching how to construct topological crystals, and illustrate it with the example of graphene, a 2-dimensional form of carbon:

I’ll precisely state our biggest result, which says when this construction gives a crystal where the atoms don’t bump into each other and the bonds between atoms don’t cross each other. Later I may come back and add detail, but for now you can find details in our paper.

Constructing topological crystals

The ‘maximal abelian cover’ of a graph plays a key role in Sunada’s work on topological crystallography. Just as the universal cover of a connected graph X has the fundamental group \pi_1(X) as its group of deck transformations, the maximal abelian cover, denoted \overline{X}, has the abelianization of \pi_1(X) as its group of deck transformations. It thus covers every other connected cover of X whose group of deck transformations is abelian. Since the abelianization of \pi_1(X) is the first homology group H_1(X,\mathbb{Z}), there is a close connection between the maximal abelian cover and homology theory.

In our paper, Greg and I prove that for a large class of graphs, the maximal abelian cover can naturally be embedded in the vector space H_1(X,\mathbb{R}). We call this embedded copy of \overline{X} a ‘topological crystal’. The symmetries of the original graph can be lifted to symmetries of its topological crystal, but the topological crystal also has an n-dimensional lattice of translational symmetries. In 2- and 3-dimensional examples, the topological crystal can serve as the blueprint for an actual crystal, with atoms at the vertices and bonds along the edges.

The general construction of topological crystals was developed by Kotani and Sunada, and later by Eon. Sunada uses ‘topological crystal’ for an even more general concept, but we only need a special case.

Here’s how it works. We start with a graph X. This has a space C_0(X,\mathbb{R}) of 0-chains, which are formal linear combinations of vertices, and a space C_1(X,\mathbb{R}) of 1-chains, which are formal linear combinations of edges. There is a boundary operator

\partial \colon C_1(X,\mathbb{R}) \to C_0(X,\mathbb{R})

This is the linear operator sending any edge to the difference of its two endpoints. The kernel of this operator is called the space of 1-cycles, Z_1(X,\mathbb{R}). There is an inner product on the space of 1-chains such that edges form an orthonormal basis. This determines an orthogonal projection

\pi \colon C_1(X,\mathbb{R}) \to Z_1(X,\mathbb{R})

For a graph, Z_1(X,\mathbb{R}) is isomorphic to the first homology group H_1(X,\mathbb{R}). So, to obtain the topological crystal of X, we need only embed its maximal abelian cover \overline{X} in Z_1(X,\mathbb{R}). We do this by embedding \overline{X} in C_1(X,\mathbb{R}) and then projecting it down via \pi.

To accomplish this, we need to fix a basepoint for X. Each path \gamma in X starting at this basepoint determines a 1-chain c_\gamma. These 1-chains correspond to the vertices of \overline{X}. The graph \overline{X} has an edge from c_\gamma to c_{\gamma'} whenever the path \gamma' is obtained by adding an extra edge to \gamma. This edge is a straight line segment from the point c_\gamma to the point c_{\gamma'}.
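Here is a minimal sketch of the two steps just described: turning a path into its 1-chain c_\gamma and projecting it onto the space of 1-cycles. It uses exact fractions and Gram-Schmidt on the rows of the boundary matrix, since the 1-cycles are exactly the 1-chains orthogonal to those rows. The encoding of paths as (name, sign) pairs, the function names, and the test graph (two vertices v, w joined by three edges a, b, c, a guess at a graph with 3 edges and 2 holes) are all assumptions made for illustration, not taken from the paper.

```python
from fractions import Fraction


def chain_vector(path, edge_names):
    """The 1-chain c_gamma: net traversal count of each edge, as a vector."""
    index = {name: i for i, name in enumerate(edge_names)}
    vec = [Fraction(0)] * len(edge_names)
    for name, sign in path:
        vec[index[name]] += sign
    return vec


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def project_to_cycles(chain, vertices, edges):
    """Orthogonal projection of a 1-chain onto the kernel of the boundary operator.

    `edges` maps each edge name to (tail, head); its order fixes the coordinates.
    """
    names = list(edges)
    # Row for vertex v of the boundary matrix: +1 at edges ending at v, -1 at edges leaving v.
    rows = [[Fraction((edges[n][1] == v) - (edges[n][0] == v)) for n in names]
            for v in vertices]
    basis = []  # orthogonal basis of the row space
    for row in rows:
        for u in basis:
            coeff = dot(row, u) / dot(u, u)
            row = [r - coeff * x for r, x in zip(row, u)]
        if any(row):
            basis.append(row)
    result = list(chain)
    for u in basis:  # remove the component along each basis vector
        coeff = dot(result, u) / dot(u, u)
        result = [r - coeff * x for r, x in zip(result, u)]
    return result


theta_vertices = ["v", "w"]
theta_edges = {"a": ("v", "w"), "b": ("v", "w"), "c": ("v", "w")}

one_step = chain_vector([("a", 1)], list(theta_edges))          # path v -> w
loop = chain_vector([("a", 1), ("b", -1)], list(theta_edges))   # loop at v

print(project_to_cycles(one_step, theta_vertices, theta_edges))
# [Fraction(2, 3), Fraction(-1, 3), Fraction(-1, 3)]
print(project_to_cycles(loop, theta_vertices, theta_edges) == loop)  # True
```

The loop is already a 1-cycle, so the projection leaves it alone, while the one-edge path gets pushed onto the plane where the three coordinates sum to zero.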

The hard part is checking that the projection \pi maps this copy of \overline{X} into Z_1(X,\mathbb{R}) in a one-to-one manner. In Theorems 6 and 7 of our paper we prove that this happens precisely when the graph X has no ‘bridges’: that is, edges whose removal would disconnect X.

Kotani and Sunada noted that this condition is necessary. That’s actually pretty easy to see. The challenge was to show that it’s sufficient! For this, our main technical tool is Lemma 5, which for any path \gamma decomposes the 1-chain c_\gamma into manageable pieces.

We call the resulting copy of \overline{X} embedded in Z_1(X,\mathbb{R}) a topological crystal.

Let’s see how it works in an example!

Take X to be this graph:

Since X has 3 edges, the space of 1-chains is 3-dimensional. Since X has 2 holes, the space of 1-cycles is a 2-dimensional plane in this 3-dimensional space. If we consider paths \gamma in X starting at the red vertex, form the 1-chains c_\gamma, and project them down to this plane, we obtain the following picture:

Here the 1-chains c_\gamma are the white and red dots. These are the vertices of \overline{X}, while the line segments between them are the edges of \overline{X}. Projecting these vertices and edges onto the plane of 1-cycles, we obtain the topological crystal for X. The blue dots come from projecting the white dots onto the plane of 1-cycles, while the red dots already lie on this plane. The resulting topological crystal provides the pattern for graphene:

That’s all there is to the basic idea! But there’s a lot more to say about it, and a lot of fun examples to look at: diamonds, triamonds, hyperquartz and more.

July 26, 2016

Clifford Johnson But Does That Really Happen…?


Sorry I've been quiet for a long stretch recently. I've been tied up with travel, physics research, numerous meetings of various sorts (from the standard bean-counting variety to the "here's three awesome science-y things to put into your movie/TVshow" variety*), and other things, like helping my garden survive this heatwave.

I've lost some time on the book, but I'm back on it for a while, and have [...]

The post But Does That Really Happen…? appeared first on Asymptotia.

Georg von Hippel Lattice 2016, Day Two

Hello again from Lattice 2016 at Southampton. Today's first plenary talk was the review of nuclear physics from the lattice given by Martin Savage. Doing nuclear physics from first principles in QCD is obviously very hard, but also necessary in order to truly understand nuclei in theoretical terms. Examples of needed theory predictions include the equation of state of dense nuclear matter, which is important for understanding neutron stars, and the nuclear matrix elements required to interpret future searches for neutrinoless double β decays in terms of fundamental quantities. The problems include the huge number of required quark-line contractions and the exponentially decaying signal-to-noise ratio, but there are theoretical advances that increasingly allow these to be brought under control. The main competing procedures are more or less direct applications of the Lüscher method to multi-baryon systems, and the HALQCD method of computing a nuclear potential from Bethe-Salpeter amplitudes and solving the Schrödinger equation for that potential. There has been a lot of progress in this field, and there are now first results for nuclear reaction rates.

Next, Mike Endres spoke about new simulation strategies for lattice QCD. One of the major problems in going to very fine lattice spacings is the well-known phenomenon of critical slowing-down, i.e. the divergence of the autocorrelation times with some negative power of the lattice spacing. This is particularly severe for the topological charge (a quantity that cannot change at all in the continuum limit), leading to the phenomenon of "topology freezing" in simulations at fine lattice spacings. To overcome this problem, changes in the boundary conditions have been proposed: open boundary conditions that allow topological charge to move into and out of the system, and non-orientable boundary conditions that destroy the notion of an integer topological charge. An alternative route lies in algorithmic modifications. One is metadynamics, where a potential bias is introduced to disfavour revisiting configurations, so as to forcibly sample across the potential wells of different topological sectors over time. Another is multiscale thermalization, where a Markov chain is first run at a coarse lattice spacing to obtain well-decorrelated configurations, and then each of those is subjected to a refining operation to obtain a (non-thermalized) gauge configuration at half the lattice spacing, each of which can then hopefully be thermalized by a short sequence of Monte Carlo update operations.

As another example of new algorithmic ideas, Shinji Takeda presented tensor networks, which are mathematical objects that assign a tensor to each site of a lattice, with lattice links denoting the contraction of tensor indices. An example is given by the rewriting of the partition function of the Ising model that is at the heart of the high-temperature expansion, where the sum over the spin variables is exchanged against a sum over link variables taking values of 0 or 1. One of the applications of tensor networks in field theory is that they allow for an implementation of the renormalization group based on performing a tensor decomposition along the lines of a singular value decomposition, which can be truncated, and contracting the resulting approximate tensor decomposition into new tensors living on a coarser grid. Iterating this procedure until only one lattice site remains allows the evaluation of partition functions without running into any sign problems and at only O(log V) effort.
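The rewriting of the Ising partition function mentioned above can be checked by brute force on a tiny lattice. The sketch below is not a tensor-network computation; it only verifies the link-variable form of the high-temperature expansion, assuming the convention Z = sum over spins of exp(β Σ s_i s_j) with unit coupling and no external field. The function names and the four-site ring are chosen for illustration.

```python
import itertools
import math


def z_spins(n_sites, links, beta):
    """Partition function as a direct sum over spin configurations."""
    total = 0.0
    for spins in itertools.product((-1, 1), repeat=n_sites):
        total += math.exp(beta * sum(spins[i] * spins[j] for i, j in links))
    return total


def z_links(n_sites, links, beta):
    """Same quantity as a sum over link variables n in {0, 1}.

    Uses exp(beta*s*s') = cosh(beta) * (1 + s*s'*tanh(beta)); summing out the
    spins keeps only configurations where every site touches an even number
    of occupied links, each site then contributing a factor 2.
    """
    t = math.tanh(beta)
    total = 0.0
    for occupied in itertools.product((0, 1), repeat=len(links)):
        degree = [0] * n_sites
        for n, (i, j) in zip(occupied, links):
            degree[i] += n
            degree[j] += n
        if all(d % 2 == 0 for d in degree):
            total += t ** sum(occupied)
    return math.cosh(beta) ** len(links) * 2 ** n_sites * total


ring = [(0, 1), (1, 2), (2, 3), (3, 0)]  # four sites on a ring
print(math.isclose(z_spins(4, ring, 0.3), z_links(4, ring, 0.3)))  # True
```

On the ring only the empty configuration and the full loop survive, which reproduces the familiar closed form (2 cosh β)^4 + (2 sinh β)^4.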

After the coffee break, Sara Collins gave the review talk on hadron structure. This is also a field in which a lot of progress has been made recently, with most of the sources of systematic error either under control (e.g. by performing simulations at or near the physical pion mass) or at least well understood (e.g. excited-state and finite-volume effects). The isovector axial charge g_A of the nucleon, which for a long time was a bit of an embarrassment to lattice practitioners, since it stubbornly refused to approach its experimental value, is now understood to be particularly severely affected by excited-state effects, and once these are well enough suppressed or properly accounted for, the situation looks quite promising. This lends much greater credibility to lattice predictions for the scalar and tensor nucleon charges, for which little or no experimental data exists. The electromagnetic form factors are also in much better shape than one or two years ago, with the electric Sachs form factor coming out close to experiment (but still with insufficient precision to resolve the conflict between the experimental electron-proton scattering and muonic hydrogen results), while now the magnetic Sachs form factor shows a trend to undershoot experiment. Going beyond isovector quantities (in which disconnected diagrams cancel), the progress in simulation techniques for disconnected diagrams has enabled the first computation of the purely disconnected strangeness form factors. The sigma term σ_πN comes out smaller on the lattice than it does in experiment, which still needs investigation, and the average momentum fraction <x> still needs to become the subject of an effort similar to the one the nucleon charges have received.

In keeping with the pattern of having large review talks immediately followed by a related topical talk, Huey-Wen Lin was next with a talk on the Bjorken-x dependence of the parton distribution functions (PDFs). While the PDFs are defined on the lightcone, which is not readily accessible on the lattice, a large-momentum effective theory formulation allows them to be obtained as the infinite-momentum limit of finite-momentum parton distribution amplitudes. First studies show interesting results, but renormalization still remains to be performed.

After lunch, there were parallel sessions, of which I attended the ones into which most of the (g-2) talks had been collected. These showed quite a rate of progress, in particular in the treatment of the disconnected contributions.

In the evening, the poster session took place.

Tommaso Dorigo The Daily Physics Problem - 5

As explained in the first installment of this series, these questions are a warm-up for my younger colleagues, who will in two months have to pass a tough exam to become INFN researchers.
A disclaimer follows:


Tommaso Dorigo The Daily Physics Problem - 4

As explained in the previous installment of this series, these questions are a warm-up for my younger colleagues, who will in two months have to pass a tough exam to become INFN researchers.
A disclaimer follows:


July 25, 2016

Georg von Hippel Lattice 2016, Day One

Hello from Southampton, where I am attending the Lattice 2016 conference.

I arrived yesterday safe and sound, but unfortunately too late to attend the welcome reception. Today started off early and quite well with a full English breakfast, however.

The conference programme was opened with a short address by the university's Vicepresident of Research, who made a point of pointing out that he like 93% of UK scientists had voted to remain in the EU - an interesting testimony to the political state of affairs, I think.

The first plenary talk of the conference was a memorial to the scientific legacy of Peter Hasenfratz, who died earlier this year, delivered by Urs Wenger. Peter Hasenfratz was one of the pioneers of lattice field theory, and hearing of his groundbreaking achievements is one of those increasingly rare occasions when I get to feel very young: when he organized the first lattice symposium in 1982, he sent out individual hand-written invitations, and the early lattice reviews he wrote were composed in a time where most results were obtained in the quenched approximation. But his achievements are still very much current, amongst other things in the form of fixed-point actions as a realization of the Ginsparg-Wilson relation, which gave rise to the booming interest in chiral fermions.

This was followed by the review of hadron spectroscopy by Chuan Liu. The contents of the spectroscopy talks have by now shifted away from the ground-state spectrum of stable hadrons, the calculation of which has become more of a benchmark task, and towards more complex issues, such as the proton-neutron mass difference (which requires the treatment of isospin breaking effects both from QED and from the difference in bare mass of the up and down quarks) or the spectrum of resonances (which requires a thorough study of the volume dependence of excited-state energy levels via the Lüscher formalism). The former is required as part of the physics answer to the ageless question of why anything exists at all, and the latter is called for in particular by the still pressing current question of the nature of the XYZ states.

Next came a talk by David Wilson on a more specific spectroscopy topic, namely resonances in coupled-channel scattering. Getting these right requires not only extensions of the Lüscher formalism, but also the extraction of very large numbers of energy levels via the generalized eigenvalue problem.

After the coffee break, Hartmut Wittig reviewed the lattice efforts at determining the hadronic contributions to the anomalous magnetic moment (g-2)μ of the muon from first principles. This is a very topical problem, as the next generation of muon experiments will reduce the experimental error by a factor of four or more, which will require a correspondingly large reduction in the theoretical uncertainties in order to interpret the experimental results. Getting to this level of accuracy requires getting the hadronic vacuum polarization contribution to sub-percent accuracy (which requires full control of both finite-volume and cut-off effects, and a reasonably accurate estimate for the disconnected contributions) and the hadronic light-by-light scattering contribution to an accuracy of better than 10% (which some way or another requires the calculation of a four-point function including a reasonable estimate for the disconnected contributions). There has been good progress towards both of these goals from a number of different collaborations, and the generally good overall agreement between results obtained using widely different formulations bodes well for the overall reliability of the lattice results, but there are still many obstacles to overcome.

The last plenary talk of the day was given by Sergei Dubovsky, who spoke about efforts to derive a theory of the QCD string. As with most stringy talks, I have to confess to being far too ignorant to give a good summary; what I took home is that there is some kind of string worldsheet theory with Goldstone bosons that can be used to describe the spectrum of large-Nc gauge theory, and that there are a number of theoretical surprises there.

Since the plenary programme is being streamed on the web, by the way, even those of you who cannot attend the conference can now do without my no doubt quite biased and very limited summaries and hear and see the talks for yourselves.

After lunch, parallel sessions took place. I found the sequence of talks by Stefan Sint, Alberto Ramos and Rainer Sommer about a precise determination of αs(MZ) using the Schrödinger functional and the gradient-flow coupling very interesting.

Scott AaronsonMy biology paper in Science (really)

Think I’m pranking you, right?

You can see the paper right here (“Synthetic recombinase-based state machines in living cells,” by Nathaniel Roquet, Ava P. Soleimany, Alyssa C. Ferris, Scott Aaronson, and Timothy K. Lu).  Unfortunately there’s a paywall, but I think we’ll be able to post our own version before long (will check).  In the meantime, you can read the MIT News article (“Scientists program cells to remember and respond to series of stimuli”).  In any case, my little part of the paper will be fully explained in this post.

A little over a year ago, two MIT synthetic biologists—Timothy Lu and his PhD student Nate Roquet—came to my office saying they had a problem they wanted help with.  Why me? I wondered.  Didn’t they realize I was a quantum complexity theorist, who so hated picking apart owl pellets and memorizing the names of cell parts in junior-high Life Science, that he avoided taking a single biology course since that time?  (Not counting computational biology, taught in a CS department by Richard Karp.)

Nevertheless, I listened to my biologist guests—which turned out to be an excellent decision.

Tim and Nate told me about a DNA system with surprisingly clear rules, which led them to a strange but elegant combinatorial problem.  In this post, first I need to spend some time to tell you the rules; then I can tell you the problem, and lastly its solution.  There are no mathematical prerequisites for this post, and certainly no biology prerequisites: everything will be completely elementary, like learning a card game.  Pen and paper might be helpful, though.

As we all learn in kindergarten, DNA is a finite string over the 4-symbol alphabet {A,C,G,T}.  We’ll find it more useful, though, to think in terms of entire chunks of DNA bases, which we’ll label arbitrarily with letters like X, Y, and Z.  For example, we might have X=ACT, Y=TAG, and Z=GATTACA.

We can also invert one of these chunks, which means writing it backwards while also swapping the A’s with T’s and the G’s with C’s.  We’ll denote this operation by * (the technical name in biology is “reverse-complement”).  For example:

X*=AGT, Y*=CTA, Z*=TGTAATC.

Note that (X*)*=X.
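
As a minimal sketch of the inversion rule (the names `COMPLEMENT` and `reverse_complement` are illustrative choices, not taken from the paper), the operation can be written in a few lines of Python:

```python
# Base-pairing table: A pairs with T, and G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(chunk: str) -> str:
    """Write the chunk backwards while swapping A<->T and G<->C."""
    return "".join(COMPLEMENT[base] for base in reversed(chunk))

# The three example chunks from the text.
for name, chunk in [("X", "ACT"), ("Y", "TAG"), ("Z", "GATTACA")]:
    print(f"{name}* = {reverse_complement(chunk)}")
```

Applying the function twice returns the original chunk, which is the statement (X*)*=X.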

We can then combine our chunks and their inverses into a longer DNA string, like so:


From now on, we’ll work exclusively with the chunks, and forget completely about the underlying A’s, C’s, G’s, and T’s.

Now, there are also certain special chunks of DNA bases, called recognition sites, which tell the little machines that read the DNA when they should start doing something and when they should stop.  Recognition sites come in pairs, so we’ll label them using various parenthesis symbols like ( ), [ ], { }.  To convert a parenthesis into its partner, you invert it: thus ( = )*, [ = ]*, { = }*, etc.  Crucially, the parentheses in a DNA string don’t need to “face the right ways” relative to each other, and they also don’t need to nest properly.  Thus, both of the following are valid DNA strings:

X ( Y [ Z [ U ) V

X { Y ] Z { U [ V

Let’s refer to X, Y, Z, etc.—the chunks that aren’t recognition sites—as letter-chunks.  Then it will be convenient to make the following simplifying assumptions:

  1. Our DNA string consists of an alternating sequence of recognition sites and letter-chunks, beginning and ending with letter-chunks.  (If this weren’t true, then we could just glom together adjacent recognition sites and adjacent letter-chunks, and/or add new dummy chunks, until it was true.)
  2. Every letter-chunk that appears in the DNA string appears exactly once (either inverted or not), while every recognition site that appears, appears exactly twice.  Thus, if there are n distinct recognition sites, there are 2n+1 distinct letter-chunks.
  3. Our DNA string can be decomposed into its constituent chunks uniquely—i.e., it’s always possible to tell which chunk we’re dealing with, and when one chunk stops and the next one starts.  In particular, the chunks and their reverse-complements are all distinct strings.

The little machines that read the DNA string are called recombinases.  There’s one kind of recombinase for each kind of recognition site: a (-recombinase, a [-recombinase, and so on.  When, let’s say, we let a (-recombinase loose on our DNA string, it searches for (‘s and )’s and ignores everything else.  Here’s what it does:

  • If there are no (‘s or )’s in the string, or only one of them, it does nothing.
  • If there are two (‘s facing the same way—like ( ( or ) )—it deletes everything in between them, including the (‘s themselves.
  • If there are two (‘s facing opposite ways—like ( ) or ) (—it deletes the (‘s, and inverts everything in between them.

Let’s see some examples.  When we apply [-recombinase to the string

A ( B [ C [ D ) E,

we get

A ( B D ) E.

When we apply (-recombinase to the same string, we get

A D* ] C* ] B* E.

When we apply both recombinases (in either order), we get

A D* B* E.

Another example: when we apply {-recombinase to

A { B ] C { D [ E,

we get

A D [ E.

When we apply [-recombinase to the same string, we get

A { B D* } C* E.

When we apply both recombinases—ah, but here the order matters!  If we apply { first and then [, we get

A D [ E,

since the [-recombinase now encounters only a single [, and has nothing to do.  On the other hand, if we apply [ first and then {, we get

A D B* C* E.

Notice that inverting a substring can change the relative orientation of two recognition sites—e.g., it can change { { into { } or vice versa.  It can thereby change what happens (inversion or deletion) when some future recombinase is applied.
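
The three rules can be sketched in Python as follows. This is only an illustration under the simplifying assumptions above: a DNA string is represented as a list of tokens, an inverted letter-chunk is written with a trailing `*`, and the names `PARTNER`, `flip` and `apply_recombinase` are hypothetical, not from the paper.

```python
# Each recognition site and its inverted partner.
PARTNER = {"(": ")", ")": "(", "[": "]", "]": "[",
           "{": "}", "}": "{", "<": ">", ">": "<"}

def flip(token):
    """Invert one token: swap a site for its partner, toggle '*' on a letter-chunk."""
    if token in PARTNER:
        return PARTNER[token]
    return token[:-1] if token.endswith("*") else token + "*"

def apply_recombinase(tokens, site):
    """Apply the recombinase for `site` (e.g. "(") to a list of tokens."""
    hits = [i for i, t in enumerate(tokens) if t in (site, PARTNER[site])]
    if len(hits) < 2:
        return list(tokens)                  # zero or one site: do nothing
    i, j = hits[0], hits[1]                  # assumes each site occurs at most twice
    if tokens[i] == tokens[j]:
        return tokens[:i] + tokens[j + 1:]   # same direction: delete
    middle = [flip(t) for t in reversed(tokens[i + 1:j])]
    return tokens[:i] + middle + tokens[j + 1:]  # opposite directions: invert

s = "A { B ] C { D [ E".split()
print(" ".join(apply_recombinase(s, "{")))   # deletion
print(" ".join(apply_recombinase(s, "[")))   # inversion
print(" ".join(apply_recombinase(apply_recombinase(s, "["), "{")))
```

Run on the second example, the three calls reproduce A D [ E, A { B D* } C* E and A D B* C* E respectively.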

One final rule: after we’re done applying recombinases, we remove the remaining recognition sites like so much scaffolding, leaving only the letter-chunks.  Thus, the final output

A D [ E

becomes simply A D E, and so on.  Notice also that, if we happen to delete one recognition site of a given type while leaving its partner, the remaining site will necessarily just bounce around inertly before getting deleted at the end—so we might as well “put it out of its misery,” and delete it right away.

My coauthors have actually implemented all of this in a wet lab, which is what most of the Science paper is about (my part is mostly in a technical appendix).  They think of what they’re doing as building a “biological state machine,” which could have applications (for example) to programming cells for medical purposes.

But without further ado, let me tell you the math question they gave me.  For reasons that they can explain better than I can, my coauthors were interested in the information storage capacity of their biological state machine.  That is, they wanted to know the answer to the following:

Suppose we have a fixed initial DNA string, with n pairs of recognition sites and 2n+1 letter-chunks; and we also have a recombinase for each type of recognition site.  Then by choosing which recombinases to apply, as well as which order to apply them in, how many different DNA strings can we generate as output?

It’s easy to construct an example where the answer is as large as 2^n.  Thus, if we consider a starting string like

A ( B ) C [ D ] E { F } G < H > I,

we can clearly make 2^4=16 different output strings by choosing which subset of recombinases to apply and which not.  For example, applying [, {, and < (in any order) yields

A B C D* E F* G H* I.

There are also cases where the number of distinct outputs is less than 2^n.  For example,

A ( B [ C [ D ( E

can produce only 3 outputs—A B C D E, A B D E, and A E—rather than 4.

What Tim and Nate wanted to know was: can the number of distinct outputs ever be greater than 2^n?

Intuitively, it seems like the answer “has to be” yes.  After all, we already saw that the order in which recombinases are applied can matter enormously.  And given n recombinases, the number of possible permutations of them is n!, not 2^n.  (Furthermore, if we remember that any subset of the recombinases can be applied in any order, the number of possibilities is even a bit greater: about e·n!.)
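
The "about e·n!" figure comes from summing, over every subset size k, the n!/(n-k)! ordered ways of picking k of the n recombinases. A quick numerical sketch (the function name `ordered_subsets` is just an illustrative choice):

```python
import math

def ordered_subsets(n: int) -> int:
    """Count sequences that use each of n recombinases at most once."""
    return sum(math.factorial(n) // math.factorial(n - k) for k in range(n + 1))

for n in range(1, 7):
    print(n, 2 ** n, math.factorial(n), ordered_subsets(n),
          round(math.e * math.factorial(n), 2))
```

For n=4 this gives 65 recombinase sequences, compared with only 2^4=16 subsets.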

Despite this, when my coauthors played around with examples, they found that the number of distinct output strings never exceeded 2^n. In other words, the number of output strings behaved as if the order didn’t matter, even though it does.  The problem they gave me was either to explain this pattern or to find a counterexample.

I found that the pattern holds:

Theorem: Given an initial DNA string with n pairs of recognition sites, we can generate at most 2^n distinct output strings by choosing which recombinases to apply and in which order.
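
For small n, the bound can be checked by brute force: try every ordered subset of recombinases and collect the distinct outputs. The sketch below is an illustration, not the paper's method; the token-list representation and all function names are assumptions made for the example.

```python
from itertools import permutations

PARTNER = {"(": ")", ")": "(", "[": "]", "]": "[",
           "{": "}", "}": "{", "<": ">", ">": "<"}

def flip(token):
    """Invert one token: swap a site for its partner, toggle '*' on a letter-chunk."""
    if token in PARTNER:
        return PARTNER[token]
    return token[:-1] if token.endswith("*") else token + "*"

def apply_recombinase(tokens, site):
    hits = [i for i, t in enumerate(tokens) if t in (site, PARTNER[site])]
    if len(hits) < 2:
        return list(tokens)                          # nothing to do
    i, j = hits[0], hits[1]
    if tokens[i] == tokens[j]:
        return tokens[:i] + tokens[j + 1:]           # deletion
    middle = [flip(t) for t in reversed(tokens[i + 1:j])]
    return tokens[:i] + middle + tokens[j + 1:]      # inversion

def distinct_outputs(start, sites):
    """All output strings reachable by applying any ordered subset of `sites`."""
    results = set()
    for k in range(len(sites) + 1):
        for sequence in permutations(sites, k):
            tokens = list(start)
            for site in sequence:
                tokens = apply_recombinase(tokens, site)
            # Final rule: strip the leftover recognition sites.
            results.add(" ".join(t for t in tokens if t not in PARTNER))
    return results

print(len(distinct_outputs("A ( B ) C [ D ] E { F } G < H > I".split(), "([{<")))
print(sorted(distinct_outputs("A ( B [ C [ D ( E".split(), "([")))
```

The first call finds 16 outputs and the second finds only 3, matching the two examples above. The search visits on the order of e·n! sequences, so it is only practical for a handful of recombinases.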

Let a recombinase sequence be an ordered list of recombinases, each occurring at most once: for example, ([{ means to apply (-recombinase, then [-recombinase, then {-recombinase.

The proof of the theorem hinges on one main definition.  Given a recombinase sequence that acts on a given DNA string, let’s call the sequence irreducible if every recombinase in the sequence actually finds two recognition sites (and hence, inverts or deletes a nonempty substring) when it’s applied.  Let’s call the sequence reducible otherwise.  For example, given

A { B ] C { D [ E,

the sequence [{ is irreducible, but {[ is reducible, since the [-recombinase does nothing.

Clearly, for every reducible sequence, there’s a shorter sequence that produces the same output string: just omit the recombinases that don’t do anything!  (On the other hand, I leave it as an exercise to show that the converse is false.  That is, even if a sequence is irreducible, there might be a shorter sequence that produces the same output string.)

Key Lemma: Given an initial DNA string, and given a subset of k recombinases, every irreducible sequence composed of all k of those recombinases produces the same output string.
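
Before the proof, here is a sketch of how one might test the Key Lemma numerically on small examples. It is a hedged illustration: the token-list representation and the helper names (`run`, `key_lemma_holds`) are assumptions made for this example, and passing the check on a few strings is of course not a proof.

```python
from itertools import combinations, permutations

PARTNER = {"(": ")", ")": "(", "[": "]", "]": "[",
           "{": "}", "}": "{", "<": ">", ">": "<"}

def flip(token):
    if token in PARTNER:
        return PARTNER[token]
    return token[:-1] if token.endswith("*") else token + "*"

def apply_recombinase(tokens, site):
    hits = [i for i, t in enumerate(tokens) if t in (site, PARTNER[site])]
    if len(hits) < 2:
        return list(tokens)
    i, j = hits[0], hits[1]
    if tokens[i] == tokens[j]:
        return tokens[:i] + tokens[j + 1:]
    return tokens[:i] + [flip(t) for t in reversed(tokens[i + 1:j])] + tokens[j + 1:]

def run(start, sequence):
    """Return (output string, whether the sequence was irreducible)."""
    tokens, irreducible = list(start), True
    for site in sequence:
        found = sum(t in (site, PARTNER[site]) for t in tokens)
        irreducible = irreducible and found == 2   # must find both sites
        tokens = apply_recombinase(tokens, site)
    return " ".join(t for t in tokens if t not in PARTNER), irreducible

def key_lemma_holds(start, sites):
    """Check that, for each subset, all irreducible orders give one output."""
    for k in range(len(sites) + 1):
        for subset in combinations(sites, k):
            outputs = {out for out, ok in
                       (run(start, seq) for seq in permutations(subset)) if ok}
            if len(outputs) > 1:
                return False
    return True

s = "A { B ] C { D [ E".split()
print(run(s, "[{"), run(s, "{["))
print(key_lemma_holds("A ( B [ C [ D ) E".split(), "(["))
```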

Assuming the Key Lemma, let’s see why the theorem follows.  Given an initial DNA string, suppose you want to specify one of its possible output strings.  I claim you can do this using only n bits of information.  For you just need to specify which subset of the n recombinases you want to apply, in some irreducible order.  Since every irreducible sequence of those recombinases leads to the same output, you don’t need to specify an order on the subset.  Furthermore, for each possible output string S, there must be some irreducible sequence that leads to S (given a reducible sequence for S, just keep deleting irrelevant recombinases until no more are left), and therefore some subset of recombinases you could pick that uniquely determines S.  OK, but if you can specify each S uniquely using n bits, then there are at most 2^n possible S’s.

Proof of Key Lemma.  Given an initial DNA string, let’s assume for simplicity that we’re going to apply all n of the recombinases, in some irreducible order.  We claim that the final output string doesn’t depend at all on which irreducible order we pick.

If we can prove this claim, then the lemma follows, since given a proper subset of the recombinases, say of size k<n, we can simply glom together everything between one relevant recognition site and the next one, treating them as 2k+1 giant letter-chunks, and then repeat the argument.

Now to prove the claim.  Given two letter-chunks—say A and B—let’s call them soulmates if either A and B or A* and B* will necessarily end up next to each other, whenever all n recombinases are applied in some irreducible order, and whenever A or B appears at all in the output string.  Also, let’s call them anti-soulmates if either A and B* or A* and B will necessarily end up next to each other if either appears at all.

To illustrate, given the initial DNA sequence,

A [ B ( C ] D ( E,

you can check that A and C are anti-soulmates.  Why?  Because if we apply all the recombinases in an irreducible sequence, then at some point, the [-recombinase needs to get applied, and it needs to find both [ recognition sites.  And one of these recognition sites will still be next to A, and the other will still be next to C (for what could have pried them apart?  nothing).  And when that happens, no matter where C has traveled in the interim, C* must get brought next to A.  If the [-recombinase does an inversion, the transformation will look like

A [ … C ] → A C* …,

while if it does a deletion, the transformation will look like

A [ … [ C* → A C*

Note that C’s [ recognition site will be to its left, if and only if C has been flipped to C*.  In this particular example, A never moves, but if it did, we could repeat the analysis for A and its [ recognition site.  The conclusion would be the same: no matter what inversions or deletions we do first, we’ll maintain the invariant that A and C* (or A* and C) will immediately jump next to each other, as soon as the [ recombinase is applied.  And once they’re next to each other, nothing will ever separate them.

Similarly, you can check that C and D are soulmates, connected by the ( recognition sites; D and B are anti-soulmates, connected by the [ sites; and B and E are soulmates, connected by the ( sites.

More generally, let’s consider an arbitrary DNA sequence, with n pairs of recognition sites.  Then we can define a graph, called the soulmate graph, where the 2n+1 letter-chunks are the vertices, and where X and Y are connected by (say) a blue edge if they’re soulmates, and by a red edge if they’re anti-soulmates.

When we construct this graph, we find that every vertex has exactly 2 neighbors, one for each recognition site that borders it—save the first and last vertices, which border only one recognition site each and so have only one neighbor each.  But these facts immediately determine the structure of the graph.  Namely, it must consist of a simple path, starting at the first letter-chunk and ending at the last one, together with possibly a disjoint union of cycles.

But we know that the first and last letter-chunks can never move anywhere.  For that reason, a path of soulmates and anti-soulmates, starting at the first letter-chunk and ending at the last one, uniquely determines the final output string, when the n recombinases are applied in any irreducible order.  We just follow it along, switching between inverted and non-inverted letter-chunks whenever we encounter a red edge.  The cycles contain the letter-chunks that necessarily get deleted along the way to that unique output string.  This completes the proof of the lemma, and hence the theorem.
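
The path-following argument can be sketched in code. This is one possible reading of the construction rather than the paper's own algorithm: each letter-chunk has a left side and a right side, a deletion (sites facing the same way) joins a right side to a left side (a blue edge), and an inversion (sites facing opposite ways) joins two right sides or two left sides (a red edge). The names `soulmate_path`, `link` and `toggle` are illustrative.

```python
PARTNER = {"(": ")", ")": "(", "[": "]", "]": "[",
           "{": "}", "}": "{", "<": ">", ">": "<"}

def toggle(chunk):
    """Add or remove the inversion star on a letter-chunk."""
    return chunk[:-1] if chunk.endswith("*") else chunk + "*"

def soulmate_path(tokens):
    """Follow the soulmate graph from the first letter-chunk to the last."""
    # Positions of the two occurrences of each recognition-site type.
    positions = {}
    for pos, tok in enumerate(tokens):
        if tok in PARTNER:
            positions.setdefault(min(tok, PARTNER[tok]), []).append(pos)

    # link[(chunk position, side)] = (neighbouring chunk position, its side)
    link = {}
    def join(a, b):
        link[a], link[b] = b, a

    for i, j in positions.values():          # assumes exactly two sites per type
        if tokens[i] == tokens[j]:            # deletion: blue edges
            join((i - 1, "R"), (j + 1, "L"))
            join((j - 1, "R"), (i + 1, "L"))
        else:                                 # inversion: red edges
            join((i - 1, "R"), (j - 1, "R"))
            join((i + 1, "L"), (j + 1, "L"))

    pos, exit_side, path = 0, "R", [tokens[0]]
    while (pos, exit_side) in link:
        pos, enter_side = link[(pos, exit_side)]
        inverted = enter_side == "R"          # entered from the right: flipped
        path.append(toggle(tokens[pos]) if inverted else tokens[pos])
        exit_side = "L" if inverted else "R"
    return path

print(" ".join(soulmate_path("A [ B ( C ] D ( E".split())))
```

For the example string this follows A, C*, D*, B, E, which agrees with the soulmate and anti-soulmate relations listed above. Letter-chunks that the walk never visits lie on cycles and are the ones that get deleted.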


There are other results in the paper, like a generalization to the case where there can be k pairs of recognition sites of each type, rather than only one. In that case, we can prove that the number of distinct output strings is at most 2^(kn), and that it can be as large as ~(2k/3e)^n. We don’t know the truth between those two bounds.

Why is this interesting?  As I said, my coauthors had their own reasons to care, involving the number of bits one can store using a certain kind of DNA state machine.  I got interested for a different reason: because this is a case where biology threw up a bunch of rules that look like a random mess (the parentheses don’t even need to nest correctly?  inversion can also change the semantics of the recognition sites?  evolution never thought about what happens if you delete one recognition site while leaving the other one?) and yet, on analysis, all the rules work in perfect harmony to produce a certain outcome.  Change a single one of them, and the “at most 2^n distinct DNA sequences” theorem would be false.  Mind you, I’m still not sure what biological purpose it serves for the rules to work in harmony this way, but they do.

But the point goes further.  While working on this problem, I’d repeatedly encounter an aspect of the mathematical model that seemed weird and inexplicable—only to have Tim and Nate explain that the aspect made sense once you brought in additional facts from biology, facts not in the model they gave me.  As an example, we saw that in the soulmate graph, the deleted substrings appear as cycles.  But surely excised DNA fragments don’t literally form loops?  Why yes, apparently, they do.  As a second example, consider the DNA string

A ( B [ C ( D [ E.

When we construct the soulmate graph for this string, we get the path

A - D - C - B - E.

Yet there’s no actual recombinase sequence that leads to A D C B E as an output string!  Thus, we see that it’s possible to have a “phantom output,” which the soulmate graph suggests should be reachable but that isn’t actually reachable.  According to my coauthors, that’s because the “phantom outputs” are reachable, once you know that in real biology (as opposed to the mathematical model), excised DNA fragments can also reintegrate back into the long DNA string.

Many of my favorite open problems about this model concern algorithms and complexity. For example: given as input an initial DNA string, does there exist an irreducible order in which the recombinases can be applied? Is the “utopian string”—the string suggested by the soulmate graph—actually reachable? If it is reachable, then what’s the shortest sequence of recombinases that reaches it? Are these problems solvable in polynomial time? Are they NP-hard? More broadly, if we consider all the subsets of recombinases that can be applied in an irreducible order, or all the irreducible orders themselves, what combinatorial conditions do they satisfy?  I don’t know—if you’d like to take a stab, feel free to share what you find in the comments!

What I do know is this: I’m fortunate that, before they publish your first biology paper, the editors at Science don’t call up your 7th-grade Life Science teacher to ask how you did in the owl pellet unit.

More in the comments:

  • Some notes on the generalization to k pairs of recognition sites of each type
  • My coauthor Nathaniel Roquet’s comments on the biology

Unrelated Announcement from My Friend Julia Wise (July 24): Do you like science and care about humanity’s positive trajectory? July 25 is the final day to apply for Effective Altruism Global 2016. From August 5-7 at UC Berkeley, a network of founders, academics, policy-makers, and more will gather to apply economic and scientific thinking to the world’s most important problems. Last year featured Elon Musk and the head of This year will be headlined by Cass Sunstein, the co-author of Nudge. If you apply with this link, the organizers will give you a free copy of Doing Good Better by Will MacAskill. Scholarships are available for those who can’t afford the cost.  Read more here.  Apply here.

Tommaso DorigoThe Daily Physics Problem - 3

As explained in the previous installment of this series, these questions are a warm-up for my younger colleagues, who will in two months have to pass a tough exam to become INFN researchers.

A disclaimer is useful here. Here it is:

read more

David Hoggget simpler; chemical diversity

Dan Foreman-Mackey opined that the main thing I was communicating with my ur-complex probabilistic graphical model for stars and Gaia is “it's complicated”. So I built a simplified pgm, crushing all the stellar physics into one node, with the intention of blowing up that node into its own model in a separate figure.

I had a great conversation at lunch with David Weinberg (OSU) about stellar chemical abundances, and in particular whether we could show that chemical diversity is larger than can be explained with any model in which supernovae ejecta get fully mixed into the ISM. He pointed out that while there does appear to be diversity among stars, in certain directions the diversity of abundance ratios is extremely tiny; in particular: Although the alpha/Fe distribution is bimodal, at any Fe/H, the two modes are both very narrow! That's super-interesting.

Particlebites750 GeV Bump Update

Article: Search for resonant production of high-mass photon pairs in proton-proton collisions at sqrt(s) = 8 and 13 TeV
Authors: CMS Collaboration
Reference: arXiv:1606.04093 (Submitted to Phys. Rev. Lett)

Following the discovery of the Higgs boson at the LHC in 2012, high-energy physicists asked the same question that they have asked for years: “what’s next?” This time, however, the answer to that question was not nearly as obvious as it had been in the past. When the top quark was discovered at Fermilab in 1995, the answer was clear “the Higgs is next.” And when the W and Z bosons were discovered at CERN in 1983, physicists were saying “the top quark is right around the corner.” However, because the Higgs is the last piece of the puzzle that is the Standard Model, there is no clear answer to the question “what’s next?” At the moment, the honest answer to this question is “we aren’t quite sure.”

The Higgs completes the Standard Model, which would be fantastic news were it not for the fact that there remain unambiguous indications of physics beyond the Standard Model. Among these is dark matter, which makes up roughly one-quarter of the energy content of the universe. Neutrino mass, the Hierarchy Problem, and the matter-antimatter asymmetry in the universe are among other favorite arguments in favor of new physics. The salient point is clear: the Standard Model, though newly-completed, is not a complete description of nature, so we must press on.

Background-only p-values for a new scalar particle in the CMS diphoton data. The dip at 750 GeV may be early evidence for a new particle.

Near the end of Run I of the LHC (2013) and the beginning of Run II (2015), the focus was on searches for new physics. While searches for supersymmetry and the direct production of dark matter drew a considerable deal of focus, towards the end of 2015, a small excess – or, as physicists commonly refer to them, a bump – began to materialize in decays to two photons seen by the CMS Collaboration. This observation was made all the more exciting by the fact that ATLAS observed an analogous bump in the same channel with roughly the same significance. The paper in question here, published June 2016, presents a combination of the 2012 (8 TeV) and 2015 (13 TeV) CMS data; it represents the most recent public CMS result on the so-called “di-photon resonance”. (See also Roberto’s recent ParticleBite.)

This analysis searches for events with two photons, a relatively clean signal. If there is a heavy particle which decays into two photons, then we expect to see an excess of events near the mass of this particle. In this case, CMS and ATLAS have observed an excess of events near 750 GeV in the di-photon channel. While some searches for new physics rely upon hard kinematic requirements or tailor their search to a certain signal model, the signal here is simple: look for an event with two photons and nothing else. However, because this is a model-independent search with loose selection requirements, great care must be taken to understand the background (events that mimic the signal) in order to observe an excess, should one exist. In this case, the background processes are direct production of two photons and events where one or more photon is actually a misidentified jet. For example, a neutral pion may be mistaken for a photon.

Part of the excitement from this excess is due to the fact that ATLAS and CMS both observed corresponding bumps in their datasets, a useful cross-check that the bump has a chance of being real. A bigger part of the excitement, however, lies in the physics implications of a new, heavy particle that decays into two photons. A particle decaying to two photons would likely be either spin-0 or spin-2 (in principle, it could be of spin-N where N is an integer and N ≥ 2). Models exist in which the aforementioned Higgs boson, h(125), is one of a family of Higgs particles, and these so-called “expanded Higgs sectors” predict heavy, spin-0 particles which would decay to two photons. Moreover, in models in which there are extra spatial dimensions, we would expect to find a spin-2 resonance – a graviton – decaying to two photons. Both of these scenarios would be extremely exciting, if realized by experiment, which contributed to the buzz surrounding this signal.

So, where do we stand today? After considering the data from 2015 (at 13 TeV center-of-mass energy) and 2012 (at 8 TeV center-of-mass energy) together, CMS reports an excess with a local significance of 3.4-sigma. However, the global significance – which takes into account the “look-elsewhere effect” and is the figure of merit here – is a mere 1.6-sigma. While the outlook is not extremely encouraging, more data is needed to definitively rule on the status of the di-photon resonance. CMS and ATLAS should have just that, more data, in time for the International Conference on High Energy Physics (ICHEP) 2016 in early August. At that point, we should have sufficient data to determine the fate of the di-photon excess. For now, the di-photon bump serves as a reminder of the unpredictability of new physics signatures, and it might suggest the need for more model-independent searches for new physics, especially as the LHC continues to chip away at the available supersymmetry phase space without any discoveries.
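
To give a feel for the two numbers quoted above, a significance in sigma can be converted to a tail probability. The sketch below assumes the one-sided Gaussian convention commonly used in particle physics; the function name `p_value` is illustrative and this is not the collaboration's actual statistical procedure.

```python
import math

def p_value(z: float) -> float:
    """One-sided Gaussian tail probability for a significance of z sigma."""
    return 0.5 * math.erfc(z / math.sqrt(2))

for label, z in [("local", 3.4), ("global", 1.6)]:
    print(f"{label}: {z} sigma -> p = {p_value(z):.2e}")
```

Under this convention, the local 3.4 sigma corresponds to a probability of roughly 3 in 10,000, while the global 1.6 sigma corresponds to about 5%, which is why the global figure is so much less impressive.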

BackreactionCan we please agree what we mean by “Big Bang”?

Can you answer the following question?

At the Big Bang the observable universe had the size of:
    A) A point (no size).
    B) A grapefruit.
    C) 168 meters.

The right answer would be “all of the above.” And that’s not because I can’t tell a point from a grapefruit, it’s because physicists can’t agree what they mean by Big Bang!

For someone in quantum gravity, the Big Bang is the initial singularity that occurs in General Relativity when the current expansion of the universe is extrapolated back to the beginning of time. At the Big Bang, then, the universe had size zero and an infinite energy density. Nobody believes this to be a physically meaningful event. We interpret it as a mathematical artifact which merely signals the breakdown of General Relativity.

If you ask a particle physicist, they’ll therefore sensibly put the Big Bang at the time where the density of matter was at the Planck scale – about 80 orders of magnitude higher than the density of a neutron star. That’s where General Relativity breaks down; it doesn’t make sense to extrapolate back farther than this. At this Big Bang, space and time were subject to significant quantum fluctuations and it’s questionable that even speaking of size makes sense, since that would require a well-defined notion of distance.
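
The "about 80 orders of magnitude" figure can be checked with a rough estimate. The sketch below assumes that "Planck scale" means the Planck density c^5/(ħG^2) and takes 5×10^17 kg/m^3 as a typical neutron-star density; both are assumptions made for this illustration, and the constants are rounded.

```python
import math

C = 2.99792458e8         # speed of light, m/s
HBAR = 1.054571817e-34   # reduced Planck constant, J s
G = 6.67430e-11          # gravitational constant, m^3 kg^-1 s^-2

# Planck density: the Planck mass divided by the Planck length cubed.
planck_density = C ** 5 / (HBAR * G ** 2)   # kg/m^3

# Assumed typical neutron-star density (order-of-magnitude value).
neutron_star_density = 5e17                 # kg/m^3

orders = math.log10(planck_density / neutron_star_density)
print(f"Planck density ~ {planck_density:.2e} kg/m^3, "
      f"{orders:.0f} orders of magnitude above a neutron star")
```

With these inputs the ratio comes out at about 79 orders of magnitude, consistent with the round figure in the text.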

Cosmologists tend to be even more conservative. The currently most widely used model for the evolution of the universe posits that briefly after the Planck epoch an exponential expansion, known as inflation, took place. At the end of inflation, so the assumption goes, the energy of the field which drives the exponential expansion is dumped into particles of the standard model. Cosmologists like to put the Big Bang at the end of inflation because inflation itself hasn’t been observationally confirmed. But they can’t agree how long inflation lasted, and so the estimates for the size of the universe range between a grapefruit and a football field.

Finally, if you ask someone in science communication, they’ll throw up their hands in despair and then explain that the Big Bang isn’t an event but a theory for the evolution of the universe. Wikipedia engages in the same obfuscation – if you look up “Big Bang” you get instead an explanation for “Big Bang theory,” leaving you to wonder what it’s a theory of.

I admit it’s not a problem that bugs physicists a lot because they don’t normally debate the meaning of words. They’ll write down whatever equations they use, and this prevents further verbal confusion. Of course the rest of the world should also work this way, by first writing down definitions before entering unnecessary arguments.

While I am waiting for mathematical enlightenment to catch on, I find this state of affairs terribly annoying. I recently had an argument on twitter about whether or not the LHC “recreates the Big Bang,” as the popular press likes to claim. It doesn’t. But it’s hard to make a point if no two references agree on what the Big Bang is to begin with, not to mention that it was neither big nor did it bang. If biologists adopted physicists’ standards, they’d refer to infants as blastocysts, and if you complained about it they’d explain both are phases of pregnancy theory.

I find this nomenclature unfortunate because it raises the impression we understand far less about the early universe than we do. If physicists can’t agree whether the universe at the Big Bang had the size of the White House or of a point, would you give them 5 billion dollars to slam things into each other? Maybe they’ll accidentally open a portal to a parallel universe where the US Presidential candidates are Donald Duck and Brigitta MacBridge.

Historically, the term “Big Bang” was coined by Fred Hoyle, a staunch believer in steady state cosmology. He used the phrase to make fun of Lemaitre, who, in 1927, had found a solution to Einstein’s field equations according to which the universe wasn’t eternally constant in time. Lemaitre showed, for the first time, that matter caused space to expand, which implied that the universe must have had an initial moment from which it started expanding. They didn’t then worry about exactly when the Big Bang would have been – back then they worried whether cosmology was science at all.

But we’re not in the 1940s any more, and precise science deserves precise terminology. Maybe we should rename the different stages of the universe into “Big Bang,” “Big Bing” and “Big Bong.” This idea has much potential by allowing further refinement to “Big Bång,” “Big Bîng” or “Big Böng.” I’m sure Hoyle would approve. Then he would laugh and quote Niels Bohr, “Never express yourself more clearly than you are able to think.”

You can count me in the Planck epoch camp.

John PreskillBringing the heat to Cal State LA

John Baez is a tough act to follow.

The mathematical physicist presented a colloquium at Cal State LA this May.1 The talk’s title: “My Favorite Number.” The advertisement image: A purple “24” superimposed atop two egg cartons.


The colloquium concerned string theory. String theorists attempt to reconcile Einstein’s general relativity with quantum mechanics. Relativity concerns the large and the fast, like the sun and light. Quantum mechanics concerns the small, like atoms. Relativity and quantum mechanics individually suggest that space-time consists of four dimensions: up-down, left-right, forward-backward, and time. String theory suggests that space-time has more than four dimensions. Counting dimensions leads theorists to John Baez’s favorite number.

His topic struck me as bold, simple, and deep. As an otherworldly window onto the pedestrian. John Baez became, when I saw the colloquium ad, a hero of mine.

And a tough act to follow.

I presented Cal State LA’s physics colloquium the week after John Baez. My title: “Quantum steampunk: Quantum information applied to thermodynamics.” Steampunk is a literary, artistic, and film genre. Stories take place during the 1800s—the Victorian era; the Industrial era; an age of soot, grime, innovation, and adventure. Into the 1800s, steampunkers transplant modern and beyond-modern technologies: automata, airships, time machines, etc. Example steampunk works include Will Smith’s 1999 film Wild Wild West. Steampunk weds the new with the old.

So does quantum information applied to thermodynamics. Thermodynamics budded off from the Industrial Revolution: The steam engine crowned industrial technology. Thinkers wondered how efficiently engines could run. Thinkers continue to wonder. But the steam engine no longer crowns technology; quantum physics (with other discoveries) does. Quantum information scientists study the roles of information, measurement, and correlations in heat, energy, entropy, and time. We wed the new with the old.


What image could encapsulate my talk? I couldn’t lean on egg cartons. I proposed a steampunk warrior—cravatted, begoggled, and spouting electricity. The proposal met with a polite cough of an email. Not all department members, Milan Mijic pointed out, had heard of steampunk.

Steampunk warrior

Milan is a Cal State LA professor and my erstwhile host. We toured the palm-speckled campus around colloquium time. What, he asked, can quantum information contribute to thermodynamics?

Heat offers an example. Imagine a classical (nonquantum) system of particles. The particles carry kinetic energy, or energy of motion: They jiggle. Particles that bump into each other can exchange energy. We call that energy heat. Heat vexes engineers, breaking transistors and lowering engines’ efficiencies.

Like heat, work consists of energy. Work has more “orderliness” than the heat transferred by random jiggles. Examples of work exertion include the compression of a gas: A piston forces the particles to move in one direction, in concert. Consider, as another example, driving electrons around a circuit with an electric field. The field forces the electrons to move in the same direction. Work and heat account for all the changes in a system’s energy. So states the First Law of Thermodynamics.

Suppose that the system is quantum. It doesn’t necessarily have a well-defined energy. But we can stick the system in an electric field, and the system can exchange motional-type energy with other systems. How should we define “work” and “heat”?

Quantum information offers insights, such as via entropies. Entropies quantify how “mixed” or “disordered” states are. Disorder grows as heat suffuses a system. Entropies help us extend the First Law to quantum theory.

First slide

So I explained during the colloquium. Rarely have I relished engaging with an audience as much as I relished engaging with Cal State LA’s. Attendees made eye contact, posed questions, commented after the talk, and wrote notes. A student in a corner appeared to be writing homework solutions. But a presenter couldn’t have asked for more from the rest. One exclamation arrested me like a coin in the cogs of a grandfather clock.

I’d peppered my slides with steampunk art: paintings, drawings, stills from movies. The peppering had staved off boredom as I’d created the talk. I hoped that the peppering would stave off my audience’s boredom. I apologized about the trimmings.

“No!” cried a woman near the front. “It’s lovely!”

I was about to discuss experiments by Jukka Pekola’s group. Pekola’s group probes quantum thermodynamics using electronic circuits. The group measures heat by counting the electrons that hop from one part of the circuit to another. Single-electron transistors track tunneling (quantum movements) of single particles.

Heat complicates engineering, calculations, and California living. Heat scrambles signals, breaks devices, and lowers efficiencies. Quantum heat can evade definition. Thermodynamicists grind their teeth over heat.

“No!” the woman near the front had cried. “It’s lovely!”

She was referring to steampunk art. But her exclamation applied to my subject. Heat has not only practical importance, but also fundamental: Heat influences every law of thermodynamics. Thermodynamic law underpins much of physics as 24 underpins much of string theory. Lovely, I thought, indeed.

Cal State LA offered a new view of my subfield, an otherworldly window onto the pedestrian. The more pedestrian an idea—the more often the idea surfaces, the more of our world the idea accounts for—the deeper the physics. Heat seems as pedestrian as a Pokémon Go player. But maybe, someday, I’ll present an idea as simple, bold, and deep as the number 24.


A window onto Cal State LA.

With gratitude to Milan Mijic, and to Cal State LA’s Department of Physics and Astronomy, for their hospitality.

1For nonacademics: A typical physics department hosts several presentations per week. A seminar relates research that the speaker has undertaken. The audience consists of department members who specialize in the speaker’s subfield. A department’s astrophysicists might host a Monday seminar; its quantum theorists, a Wednesday seminar; etc. One colloquium happens per week. Listeners gather from across the department. The speaker introduces a subfield, like the correction of errors made by quantum computers. Course lectures target students. Endowed lectures, often named after donors, target researchers.

Doug NatelsonDark matter, one more time.

There is strong circumstantial evidence that there is some kind of matter in the universe that interacts with ordinary matter via gravity, but is otherwise not readily detected - it is very hard to explain things like the rotation rates of galaxies, the motion of star clusters, and features of the large scale structure of the universe without dark matter.   (The most discussed alternative would be some modification to gravity, but given the success of general relativity at explaining many things including gravitational radiation, this seems less and less likely.)  A favorite candidate for dark matter would be some as-yet undiscovered particle or class of particles that would have to be electrically neutral (dark!) and would only interact very weakly if at all beyond the gravitational attraction.

There have been many experiments trying to detect these particles directly.  The usual assumption is that these particles are all around us, and very occasionally they will interact with the nuclei of ordinary matter via some residual, weak mechanism (say higher order corrections to ordinary standard model physics).  The signature would be energy getting dumped into a nucleus without necessarily producing a bunch of charged particles.   So, you need a detector that can discriminate between nuclear recoils and charged particles.  You want a lot of material, to up the rate of any interactions, and yet the detector has to be sensitive enough to see a single event, and you need pure enough material and surroundings that a real signal wouldn't get swamped by background radiation, including that from impurities.  The leading detection approaches these days use sodium iodide scintillators (DAMA), solid blocks of germanium or silicon (CDMS), and liquid xenon (XENON, LUX, PandaX - see here for some useful discussion and links).

I've been blogging long enough now to have seen rumors about dark matter detection come and go.  See here and here.  Now in the last week both LUX and PandaX have reported their latest results, and they have found nothing - no candidate events at all - after their recent experimental runs.  This is in contrast to DAMA, who have been seeing some sort of signal for years that seems to vary with the seasons.  See here for some discussion.  The lack of any detection at all is interesting.  There's always the possibility that whatever dark matter exists really does only interact with ordinary matter via gravity - perhaps all other interactions are somehow suppressed by some symmetry.  Between the lack of dark matter particle detection and the apparent lack of exotica at the LHC so far, there is a lot of head scratching going on....

July 24, 2016

Noncommutative GeometryA motivic product formula

The classical product formula for number fields is a fundamental tool in arithmetic. In 1993, Pierre Colmez published a truly inspired generalization of this to the case of Grothendieck's motives. In turn, this spring Urs Hartl and Rajneesh Kumar Singh put an equally inspired manuscript on the arXiv devoted to translating Colmez into the theory of Drinfeld modules and the like. Underneath the

Tommaso DorigoThe Daily Physics Problem - 2

As explained in the previous installment of this series, these questions are a warm-up for my younger colleagues, who will in two months have to pass a tough exam to become INFN researchers.
By the way, when I wrote the first question yesterday I thought I would not need to explain it in detail, but it just occurred to me that a disclaimer would be useful. Here it is:


July 23, 2016

ParticlebitesThe dawn of multi-messenger astronomy: using KamLAND to study gravitational wave events GW150914 and GW151226

Article: Search for electron antineutrinos associated with gravitational wave events GW150914 and GW151226 using KamLAND
Authors: KamLAND Collaboration
Reference: arXiv:1606.07155


After the chirp heard ‘round the world, the search is on for coincident astrophysical particle events to provide insight into the source and nature of the era-defining gravitational wave events detected by the LIGO Scientific Collaboration in late 2015.

By combining information from gravitational wave (GW) events with the detection of astrophysical neutrinos and electromagnetic signatures such as gamma-ray bursts, physicists and astronomers are poised to draw back the curtain on the dynamics of astrophysical phenomena, and we’re surely in for some surprises.

The first recorded gravitational wave event, GW150914, was likely a merger of two black holes which took place more than one billion light years from the Earth. The event’s name marks the day it was observed by the Advanced Laser Interferometer Gravitational-wave Observatory (LIGO), September 14th, 2015. LIGO detections are named “GW” for “gravitational wave,” followed by the observation date in YYMMDD format. The second event, GW151226 (December 26th, 2015) was likely another merger of two black holes, having 8 and 14 times the mass of the sun, taking place 1.4 billion light years away from Earth. A third gravitational wave event candidate, LVT151012, a possible black hole merger which occurred on October 12th, 2015, did not reach the same detection significance as the aforementioned events, but still has a >50% chance of astrophysical origin. LIGO candidates are named differently than detections. The names start with “LVT” for “LIGO-Virgo Trigger,” but are followed by the observation date in the same YYMMDD format. The different name indicates that the event was not significant enough to be called a gravitational wave.
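The naming convention described above can be unpacked mechanically. The following is a minimal sketch, not anything used by LIGO: the function name `parse_ligo_name` is made up for illustration, and it assumes that two-digit years belong to the 2000s.

```python
from datetime import date

def parse_ligo_name(name):
    """Split an event name into (kind, observation date).

    Hypothetical helper: "GW" marks a detection, "LVT" a candidate,
    and the six digits that follow are read as YYMMDD.
    """
    for prefix, kind in (("GW", "detection"), ("LVT", "candidate")):
        if name.startswith(prefix):
            digits = name[len(prefix):]
            if len(digits) != 6 or not digits.isdigit():
                raise ValueError(f"expected YYMMDD after {prefix!r}: {name!r}")
            # Assumption: two-digit years refer to the 2000s.
            year = 2000 + int(digits[0:2])
            month = int(digits[2:4])
            day = int(digits[4:6])
            return kind, date(year, month, day)
    raise ValueError(f"unrecognized event name: {name!r}")

for event in ("GW150914", "LVT151012", "GW151226"):
    kind, observed = parse_ligo_name(event)
    print(event, kind, observed.isoformat())
```

For the three events discussed here, the parsed dates are September 14th, October 12th and December 26th, 2015, matching the text.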


Two black holes spiral in towards one another and merge to emit a burst of gravitational waves that Advanced LIGO can detect. Source: APS Physics.

The following  computer simulation created by the multi-university SXS (Simulating eXtreme Spacetimes) project depicts what the collision of two black holes would look like  if we could get close enough to the merger. It was created by solving equations from Albert Einstein’s general theory of relativity using the LIGO data. (Source: LIGO Lab Caltech : MIT).

Observations from other scientific collaborations can search for particles associated with these gravitational waves. The combined information from the gravitational wave and particle detections could identify the origin of these gravitational wave events. For example, some violent astrophysical phenomena emit not only gravitational waves, but also high-energy neutrinos. Conversely, there is currently no known mechanism for the production of either neutrinos or electromagnetic waves in a black hole merger.

Black holes with rapidly accreting disks can be the origin of gamma-ray bursts and neutrino signals, but these disks are not expected to be present during mergers like the ones detected by LIGO. For this reason, it was surprising when the Fermi Gamma-ray Space Telescope reported a coincident gamma-ray burst occurring 0.4 seconds after the September GW event with a false alarm probability of 1 in 455. Although there is some debate in the community about whether or not this observation is to be believed, the observation motivates a multi-messenger analysis including the hunt for associated astrophysical neutrinos at all energies.

Could a neutrino experiment like KamLAND find low energy antineutrino events coincident with the GW events, even when higher energy searches by IceCube and ANTARES did not?


Schematic diagram of the KamLAND detector. Source:  hep-ex/0212021v1

KamLAND, the Kamioka Liquid scintillator Anti-Neutrino Detector, is located under Mt. Ikenoyama, Japan, buried beneath the equivalent of 2,700 meters of water. It consists of an 18 meter diameter stainless steel sphere, the inside of which is covered with photomultiplier tubes, surrounding an EVOH/nylon balloon enclosed by pure mineral oil. Inside the balloon resides 1 kton of highly purified liquid scintillator. Outside the stainless steel sphere is a cylindrical 3.2 kton water-Cherenkov detector that provides shielding and enables cosmic ray muon identification.

KamLAND is optimized to search for ~MeV neutrinos and antineutrinos. The detection of the gamma ray burst by the Fermi telescope suggests that the detected black hole merger might have retained its accretion disk, and the spectrum of accretion disk neutrinos around a single black hole is expected to peak around 10 MeV. KamLAND therefore searched for correlations between the LIGO GW events and ~10 MeV electron antineutrino events occurring within a 500 second window of the merger events. Researchers focused on the detection of electron antineutrinos through the inverse beta decay reaction.

No events were found within the target window of any gravitational wave event, and any adjacent event was consistent with background. KamLAND researchers used this information to determine a monochromatic fluence (time integrated flux) upper limit, as well as an upper limit on source luminosity for each gravitational wave event, which places a bound on the total energy released as low energy neutrinos during the merger events and candidate event. The lack of detected concurrent inverse beta decay events supports the conclusion that GW150914 was a black hole merger, and not another astrophysical event such as a core-collapse supernova.

More information would need to be obtained to explain the gamma ray burst observed by the Fermi telescope, and work to improve future measurements is ongoing. Large uncertainties in the origin region of gamma ray bursts observed by the Fermi telescope will be reduced, and the localization of GW events will be improved, most drastically so by the addition of a third LIGO detector (LIGO India).

As Advanced LIGO continues its operation, there will likely be many more chances for KamLAND and other neutrino experiments to search for coincidence neutrinos. Multi-messenger astronomy has only just begun to shed light on the nature of black holes, supernovae, mergers, and other exciting astrophysical phenomena — and the future looks bright.


Tommaso DorigoThe Daily Physics Problem - 1

Today I wish to start a series of posts that are supposed to help my younger colleagues who will, in two months from now, compete for a position as INFN research scientists. 
The INFN has opened 73 new positions and the selection includes two written exams besides an evaluation of titles and an oral colloquium. The rules also say that the candidates will have to pass the written exams with a score of at least 140/200 on each, in order to access the oral colloquium. Of course, no information is given on how the tests will be graded, so 140 over 200 does not really mean much at this point.


July 21, 2016

Doug NatelsonImpact factors and academic "moneyball"

For those who don't know the term:  Moneyball is the title of a book and a movie about the 2002 Oakland Athletics baseball team, a team with a payroll in the bottom 10% of major league baseball at the time.   They used a data-intensive, analytics-based strategy called sabermetrics to find "hidden value" and "market inefficiencies", to put together a very competitive team despite their very limited financial resources.   A recent (very fun if you're a baseball fan) book along the same lines is this one.  (It also has a wonderful discussion of confirmation bias!)

A couple of years ago there was a flurry of articles (like this one and the academic paper on which it was based) about whether a similar data-driven approach could be used in scientific academia - to predict success of individuals in research careers, perhaps to put together a better department or institute (a "roster") by getting a competitive edge at identifying likely successful researchers.

The central problems in trying to apply this philosophy to academia are the lack of really good metrics and the timescales involved in research careers.  Baseball is a paradise for people who love statistics.  The rules have been (largely) unchanged for over a hundred years; the seasons are very long (formerly 154 games, now 162), and in any game an everyday player can get multiple opportunities to show their offensive or defensive skills.   With modern tools it is possible to get quantitative information about every single pitched ball and batted ball.  As a result, the baseball stats community has come up with a huge number of quantitative metrics for evaluating performance in different aspects of the game, and they have a gigantic database against which to test their models.  They even have devised metrics to try and normalize out the effects of local environment (baseball park-neutral or adjusted stats).

Fig. 1, top panel, from this article. x-axis = # of citations. The mean of the distribution is strongly affected by the outliers.

In scientific research, there are very few metrics (publications; citation count; impact factor of the journals in which articles are published), and the total historical record available on which to base some evaluation of an early career researcher is practically the definition of what a baseball stats person would call "small sample size". An article in Nature this week highlights the flaws with impact factor as a metric. I've written before about this (here and here), pointing out that impact factor is a lousy statistic because it's dominated by outliers, and now I finally have a nice graph (fig. 1 in the article; top panel shown here) to illustrate this.
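To make the outlier point concrete, here is a toy calculation. The citation counts below are made up for illustration and are not the data from the Nature article; the sketch only shows how a couple of heavily cited papers can drag the mean (which is essentially what an impact factor is) far away from the typical paper.

```python
import statistics

# Hypothetical citation counts for 20 papers in one journal (illustrative only).
citations = [0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 8, 10, 150, 400]

mean_all = statistics.mean(citations)           # impact-factor-like average
median_all = statistics.median(citations)       # the "typical" paper
mean_trimmed = statistics.mean(citations[:-2])  # same list without the two outliers

below_mean = sum(1 for c in citations if c < mean_all)

print(f"mean = {mean_all:.1f}, median = {median_all}")
print(f"mean without the two outliers = {mean_trimmed:.1f}")
print(f"{below_mean} of {len(citations)} papers sit below the mean")
```

With these invented numbers the mean is about ten times the median, and 18 of the 20 papers are cited less often than the "average" paper.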

So, in academia, the tantalizing fact is that there is almost certainly a lot of "hidden value" out there missed by traditional evaluation approaches.  Just relying on pedigree (where did so-and-so get their doctorate?) and high impact publications (person A must be better than person B because person A published a paper as a postdoc in a high impact glossy journal) almost certainly misses some people who could be outstanding researchers.  However, the lack of good metrics, the small sample sizes, the long timescales associated with research, and enormous local environmental influence (it's just easier to do cutting-edge work at Harvard than at Northern Michigan), all mean that it's incredibly hard to come up with a way to find these people via some analytic approach.  

July 20, 2016

John PreskillThe physics of Trump?? Election renormalization.

Two things were high in my mind this last quarter: My course on advanced statistical mechanics and phase transitions, and the bizarre general elections that raged all around. It is no wonder then, that I would start to conflate the Ising model, Landau mean field, and renormalization group, with the election process, and just think of each and every one of us as a tiny magnet, that needs to say up or down – Trump or Cruz, Clinton or Sanders (a more appetizing choice, somehow), and .. you get the drift.

Elections and magnetic phase transitions are very much alike. The latter, I will argue, teaches us something very important about the former.

The physics of magnetic phase transitions is amazing. If I hadn’t thought this way, I wouldn’t be a condensed matter physicist. Models of magnets consider a bunch of spins – each one a small magnet – that talk only to their nearest neighbor, as happens in typical magnets. At the onset of magnetic order (the Curie temperature), when the symmetry of the spins becomes broken, it turns out that the spin correlation length diverges. Even though interaction length = lattice constant, we get correlation length = infinity.

To understand how ridiculous this is, you should understand what a correlation length is. The correlation tells you a simple thing. If you are a spin, trying to make it out in life, and trying to figure out where to point, your pals around you are certainly going to influence you. Their pals will influence them, and therefore you. The correlation length tells you how distant a spin can be, and still manage to nudge you to point up or down. In physics-speak, it is the reduced correlation length. It makes sense that somebody in your neighborhood, or your office, or even your town, will do something that will affect you – after all – you always interact with people that distant. But the analogy to the spins is that there is always a given circumstance where some random person in Incheon, South Korea, could influence your vote. A diverging correlation length is the Butterfly effect for real.

And yet, spins do this. At the critical temperature, just as the spins decide whether they want to point along the north pole or towards Venus, every nonsense of a fluctuation that one of them makes leagues away may galvanize things one way or another. Without ever even remotely directly talking to even their father’s brother’s nephew’s cousin’s former roommate! Every fluctuation, no matter where, factors into the symmetry breaking process.

A bit of physics, before I’m blamed for being crude in my interpretation. The correlation length at the Curie point, and almost all symmetry-breaking continuous transitions, diverges as some inverse power of the temperature difference to the critical point: \frac{1}{|T-T_c|^{\nu}}. The faster it diverges (the higher the power \nu), actually the more feeble the symmetry breaking is. Why is that? After I argued that this is an amazing phenomenon? Well, if 10^2 voices can shift you one way or another, each voice is worth something. If 10^{20} voices are able to push you around, I’m not really buying influence on you by bribing ten of these. Each voice is worth less. Why? The correlation length is also a measure of the uncertainty before the moment of truth – when the battle starts and we don’t know who wins. Big correlation length – any little element of the battlefield can change something, and many souls are involved and active. Small correlation length – the battle was already decided since one of the sides has a single bomb that will evaporate the world. Who knew that Dr. Strangelove could be a condensed matter physicist?

This lore of correlations led to one of the most breathtaking developments of 20th century physics. I’m a condensed matter guy, so it is natural that Ken Wilson, as well as Ben Widom, Michael Fisher, and Leo Kadanoff are my superheroes. They came up with an idea so simple yet profound – scaling. If you have a system (say, of spins) that you can’t figure out – maybe because it is fluctuating, and because it is interacting – regardless, all you need to do is to move away from it. Let averaging (aka, central limit theorem) do the job and suppress fluctuations. Let us just zoom out. If we change the scale by a factor of 2, so that all spins look more crowded, then the correlation length also looks half as big. The system looks less critical. It is as if we managed to move away from the critical temperature – either cooling towards T=0, or heating up towards T=\infty. Both limits are easy to solve. How do we make this into a framework? If the pre-zoom-out volume had 8 spins, we can average them into a representative single spin. This way you’ll end up with a system that looks pretty much like the one you had before – same spin density, same interaction, same physics – but at a different temperature, and further from the phase transition. It turns out you can do this, and you can figure out how much changed in the process. Together, this tells you how the correlation length depends on T-T_c. This is the renormalization group, aka, RG.
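The zoom-out step alone, replacing each block of 8 spins by one representative spin, can be sketched in a few lines. This is only an illustration of the coarse-graining, not the full RG calculation: it uses a majority rule in place of a literal average, breaks 4-versus-4 ties towards up (an arbitrary choice), and starts from a random, made-up configuration and not one drawn from a real magnet. The function names are hypothetical.

```python
import random

def block_spin(lattice):
    """Coarse-grain a cubic lattice of +1/-1 spins: one spin per 2x2x2 block."""
    n = len(lattice)
    if n % 2 != 0:
        raise ValueError("lattice side must be even")
    half = n // 2
    coarse = [[[0] * half for _ in range(half)] for _ in range(half)]
    for i in range(half):
        for j in range(half):
            for k in range(half):
                # Sum the 8 spins in the block with corner (2i, 2j, 2k).
                total = sum(
                    lattice[2 * i + a][2 * j + b][2 * k + c]
                    for a in (0, 1) for b in (0, 1) for c in (0, 1)
                )
                # Majority rule; ties (4 up, 4 down) go to +1 by assumption.
                coarse[i][j][k] = 1 if total >= 0 else -1
    return coarse

def magnetization(lattice):
    """Average spin over the whole lattice."""
    spins = [s for plane in lattice for row in plane for s in row]
    return sum(spins) / len(spins)

rng = random.Random(0)
n = 8
# Hypothetical starting point: each spin is up with probability 0.6.
fine = [[[1 if rng.random() < 0.6 else -1 for _ in range(n)]
         for _ in range(n)] for _ in range(n)]
coarse = block_spin(fine)

print("side length:", len(fine), "->", len(coarse))
print("magnetization:", magnetization(fine), "->", magnetization(coarse))
```

Each call halves the side of the lattice, so the 8x8x8 system becomes 4x4x4. Working out how the effective temperature shifts under this map is the hard part, and the sketch does not attempt it.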

Interestingly, this RG procedure informs us that criticality and symmetry breaking are more feeble the lower the dimension. There are no 1d permanent magnets, and magnetism in 2d is very frail. Why? Well, the more dimensions there are, the more nearest neighbors each spin has, and the more neighbors your neighbors have. Think about the 6-degrees of separation game. 3d is okay for magnets, as we know. It turns out, however, that in physical systems above 4 dimensions, critical phenomena are the same as those of a fully connected (infinite dimensional) network. The uncertainty stage is very small, and correlation lengths diverge slowly. Even at distance 1 there are enough people or spins to bend your will one way or another. Magnetization is just a question of time elapsed from the beginning of the experiment.

Spins, votes, what’s the difference? You won’t be surprised to find that the term renormalization has permeated every aspect of economics and social science as well. What is voting Republican vs Democrat if not a symmetry breaking? Well, it is not that bad yet – the parties are different. No real symmetry there, you would think. Unless you ask the ‘undecided voter’.

And if elections are affected by such correlated dynamics, what about revolutions? Here the analogy with phase transitions is so much more prevalent even in our language – resistance to a regime solidifies, crystallizes, and aligns – just like solids and magnets. When people are fed up with a regime, the crucial question is – if I would go to the streets, will I be joined by enough people to effect a change?

Revolutions, therefore, seem to rise out of strong fluctuations in the populace. If you wish, think of revolutions as domains where the frustration is so high, which give a political movement the inertia it needs.

Domains: that’s exactly what the correlation length is about. The correlation length is the size of correlated magnetic domains, i.e., groups of spins that point in the same direction. And now we remember that close to a phase transition, the correlation length diverges as some power of the distance to the transition: \frac{1}{|T-T_c|^{\nu}}. Take a magnet just above its Curie temperature. The closer we are to a phase transition, the larger the correlation length is, and the bigger are the fluctuating magnetized domains. The parameter \nu is the correlation-length critical exponent and something of a holy grail for practitioners of statistical mechanics. Everyone wants to calculate it for various phase transitions. It is not that easy. That’s partially why I have a job.

The correlation length aside, how many spins are involved in a domain? \left[1/|T-T_c|^d\right]^{\nu}. Actually, we know roughly what \nu is. For systems with dimension d>4, it is ½. For systems with a lower dimensionality it is roughly 2/d. (Comment for the experts: I’m really not kidding – this fits the Ising model for 2 and 3 dimensions, and it fits the xy model for 3d).

So the number of spins in a domain in systems below 4d is 1/|T-T_c|^2, independent of dimension. On the other hand, four d and up it is 1/|T-T_c|^{d/2}. Increasing rapidly with dimension, when we are close to the critical point.
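Spelled out, the counting runs as follows. The symbols \xi (correlation length) and N (number of spins in a domain) are introduced here just for bookkeeping, a domain is taken to be a d-dimensional region of linear size \xi measured in lattice spacings, and the values of \nu are the rough ones quoted above:

```latex
\begin{align*}
\xi &\sim \frac{1}{|T-T_c|^{\nu}},
&
N &\sim \xi^{d} \sim \frac{1}{|T-T_c|^{\nu d}}, \\[4pt]
d<4:\quad \nu &\approx \frac{2}{d}
&&\Rightarrow\quad N \sim \frac{1}{|T-T_c|^{2}}, \\[4pt]
d>4:\quad \nu &= \frac{1}{2}
&&\Rightarrow\quad N \sim \frac{1}{|T-T_c|^{d/2}}.
\end{align*}
```

The two cases agree at d=4, where both give 1/|T-T_c|^2.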

Back to voters. In a climate of undecided elections, analogous to a magnet near its Curie point, the spins are the voters, and domain walls are the crowds supporting this candidate or that policy; domain walls are what become large demonstrations on the Washington Mall. And you would think that the world we live in is clearly 2d – a surface of a 3d sphere (and yes – that includes Manhattan!). So a political domain size just diverges as a simple moderate 1/|T-T_c|^2 during times of contested elections.

Something happened, however, in the past two decades: the internet. The connectivity of the world has changed dramatically.

No more 2d. Now, our effective dimension is determined by our web-based social network. Facebook perhaps? Roughly speaking, the dimensionality of the Facebook network is the number of friends we have, divided by the number of mutual friends. I venture to say this averages at about 10, with about 150 friends in tow, out of which 15 are mutual. So our world, for election purposes, is 10-dimensional!

Let’s simulate what this means for our political system. Any event – a terrorist attack, or a recession, etc. – will cause a fluctuation that will involve a large group of people – a domain. Take a time when T-T_c is a healthy 0.1 for instance. In the good old 2d world this would involve 100 friends times 1/0.1^2\sim 10000 people. Now it would be more like 100\cdot 1/0.1^{10/2}\sim 10 million. So any small perturbation of conditions could make entire states turn one way or another.
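For anyone who wants to play with the numbers, the back-of-the-envelope estimate above fits in a few lines. This is just the rough scaling from this post turned into arithmetic: the function name `domain_size` is made up, the prefactor of 100 friends is the one used above, and nothing here is a real model of voters.

```python
def domain_size(friends, distance_from_critical, dimension):
    """People swept up in one fluctuation, using the post's rough scaling."""
    # Exponent of 1/|T - T_c|: 2 below four dimensions, d/2 from four up.
    exponent = 2 if dimension < 4 else dimension / 2
    return friends * (1 / distance_from_critical) ** exponent

# Effective dimension guessed as friends divided by mutual friends.
effective_dimension = 150 / 15

old_world = domain_size(100, 0.1, 2)
facebook_world = domain_size(100, 0.1, effective_dimension)

print(f"effective dimension: {effective_dimension:.0f}")
print(f"2d world:  about {old_world:,.0f} people")
print(f"10d world: about {facebook_world:,.0f} people")
```

Changing the distance from the critical point shows how quickly the two worlds part ways: at T-T_c = 0.5 the same function gives 400 people in 2d and 3,200 in 10d, while at 0.1 the gap is a factor of a thousand.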

When the response to slight shifts in prevailing conditions encompasses entire states, rather than entire neighborhoods, polarization follows. Overall, a state where each neighborhood has a slightly different opinion will be rather moderate – extreme opinions will only resonate locally. Single voices could only sway so many people. But nowadays, well – we’ve all seen Trump and the like on the march. Millions. It’s not even their fault – it’s physics!

Can we do anything about it? It’s up for debate. Maybe cancel the electoral college, to make the selecting unit larger than the typical size of a fluctuating domain. Maybe carry out a time averaged election: make an election year where each month there is a contest for the grand prize. Or maybe just move to Canada.

ParticlebitesProbing the Standard Model with muons: new results from MEG

Article: Search for the lepton flavor violating decay μ+ → e+γ with the full dataset of the MEG experiment
Authors: MEG Collaboration
Reference: arXiv:1605.05081

I work on the Muon g-2 experiment, which is housed inside a brand new building at Fermilab.  Next door, another experiment hall is under construction. It will be the home of the Mu2e experiment, which is slated to use Fermilab’s muon beam as soon as Muon g-2 wraps up in a few years. Mu2e will search for evidence of an extremely rare process — namely, the conversion of a muon to an electron in the vicinity of a nucleus. You can read more about muon-to-electron conversion in a previous post by Flip.

Today, though, I bring you news of a different muon experiment, located at the Paul Scherrer Institute in Switzerland. The MEG experiment was operational from 2008 to 2013, and the collaboration recently released its final result.

Context of the MEG experiment

Figure 1: Almost 100% of the time, a muon will decay into an electron and two neutrinos.

MEG (short for “mu to e gamma”) and Mu2e are part of the same family of experiments. They each focus on a particular example of charged lepton flavor violation (CLFV). Normally, a muon decays into an electron and two neutrinos. The neutrinos ensure that lepton flavor is conserved; the overall amounts of “muon-ness” and “electron-ness” do not change.

Figure 2 lists some possible CLFV muon processes. In each case, the muon transforms into an electron without producing any neutrinos — so lepton flavor is not conserved! These processes are allowed by the standard model, but with such minuscule probabilities that we couldn’t possibly measure them. If that were the end of the story, no one would bother doing experiments like MEG and Mu2e — but of course that’s not the end of the story. It turns out that many new physics models predict CLFV at levels that are within range of the next generation of experiments. If an experiment finds evidence for one of these CLFV processes, it will be a clear indication of beyond-the-standard-model physics.

Figure 2: Some examples of muon processes that do not conserve lepton flavor. Also listed are the current/upcoming experiments that aim to measure the probabilities of these never-before-observed processes.

Results from MEG

The goal of the MEG experiment was to do one of two things:

  1. Measure the branching ratio of the μ+ → e+γ decay, or
  2. Establish a new upper limit

Outcome #1 is only possible if the branching ratio is high enough to produce a clear signal. Otherwise, all the experimenters can do is say “the branching ratio must be smaller than such-and-such, because otherwise we would have seen a signal” (i.e., outcome #2).

MEG saw no evidence of μ+ → e+γ decays. Instead, they determined that the branching ratio is less than 4.2 × 10^-13 (90% confidence level). Roughly speaking, that means if you had a pair of magic goggles that let you peer directly into the subatomic world, you could stand around and watch 2 × 10^12 muons decay without seeing anything unusual. Because real experiments are messier and less direct than magic goggles, the MEG result is actually based on data from 7.5 × 10^14 muons.

Before MEG, the previous experiment to search for μ+ → e+γ was the MEGA experiment at Los Alamos; they collected data from 1993 to 1995, and published their final result in 1999. They found an upper limit for the branching ratio of 1.2 × 10^-11. Thus, MEG achieved a factor of 30 improvement in sensitivity over the previous result.
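The two round numbers quoted above (about 2 × 10^12 muons, and a factor of about 30) follow directly from the two upper limits. Here is a minimal Python sketch of that arithmetic; the variable names are just for illustration, and the inputs are the limits given in the text.

```python
meg_limit = 4.2e-13    # MEG upper limit on the branching ratio (90% confidence level)
mega_limit = 1.2e-11   # previous MEGA upper limit

# If the branching ratio sits right at the limit, one decay in this many is mu -> e gamma.
muons_per_signal_decay = 1 / meg_limit

# Improvement of the new limit over the old one.
improvement = mega_limit / meg_limit

print(f"muons per possible signal decay: {muons_per_signal_decay:.2e}")
print(f"improvement over MEGA: about {improvement:.0f}x")
```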

How the experiment works

Figure 3: The MEG signal consists of a back-to-back positron and gamma, each carrying half the rest energy of the parent muon.

A continuous beam of positive muons enters a large magnet and hits a thin plastic target. By interacting with the material, about 80% of the muons lose their kinetic energy and come to rest inside the target. Because the muons decay from rest, the MEG signal is simple. Energy and momentum must be conserved, so the positron and gamma emerge from the target in opposite directions, each with an energy of 52.83 MeV (half the rest energy of the muon).1  The experiment is specifically designed to catch and measure these events. It consists of three detectors: a drift chamber to measure the positron trajectory and momentum, a timing counter to measure the positron time, and a liquid xenon detector to measure the photon time, position, and energy. Data from all three detectors must be combined to get a complete picture of each muon decay, and determine whether it fits the profile of a MEG signal event.

Figure 4: Layout of the MEG experiment. Source: arXiv:1605.05081.

In principle, it sounds pretty simple: to search for MEG events, you look at each chunk of data and go through a checklist:

  • Is there a photon with the correct energy?
  • Is there a positron at the same time?
  • Did the photon and positron emerge from the target in opposite directions?
  • Does the positron have the correct energy?

Four yeses and you might be looking at a rare CLFV muon decay! However, the key word here is might. Unfortunately, it is possible for a normal muon decay to masquerade as a CLFV decay. For MEG, one source of background is “radiative muon decay,” in which a muon decays into a positron, two neutrinos and a photon; if the neutrinos happen to have very low energy, this will look exactly like a MEG event. In order to get a meaningful result, MEG scientists first had to account for all possible sources of background and figure out the expected number of background events for their data sample. In general, experimental particle physicists spend a great deal of time reducing and understanding backgrounds!

What’s next for MEG?

The MEG collaboration is planning an upgrade to their detector which will produce an order of magnitude improvement in sensitivity. MEG-II is expected to begin three years of data-taking late in 2017. Perhaps at the new level of sensitivity, a μ+ → e+γ signal will emerge from the background!


1 Because photons are massless and positrons are not, their energies are not quite identical, but it turns out that they both round to 52.83 MeV. You can work it out yourself if you’re skeptical (that’s what I did).
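For the skeptical, the footnote's calculation can be sketched in Python. For a muon decaying at rest into a positron and a photon, energy and momentum conservation give E_γ = (m_μ² − m_e²)/(2 m_μ) and E_e = (m_μ² + m_e²)/(2 m_μ) in units where c = 1. The masses below are rounded Particle Data Group values, not numbers taken from the MEG paper, and the variable names are illustrative.

```python
M_MU = 105.6584   # muon rest energy in MeV (rounded PDG value)
M_E = 0.5110      # positron rest energy in MeV (rounded PDG value)

# Two-body decay at rest: the photon and positron carry equal and opposite momentum.
e_gamma = (M_MU**2 - M_E**2) / (2 * M_MU)
e_positron = (M_MU**2 + M_E**2) / (2 * M_MU)

print(f"photon energy:   {e_gamma:.4f} MeV")
print(f"positron energy: {e_positron:.4f} MeV")
print(f"difference:      {1000 * (e_positron - e_gamma):.2f} keV")
```

The difference is of order a keV, which is why both energies round to the same value at two decimal places.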

Further Reading

  • Robert H. Bernstein and Peter S. Cooper, “Charged Lepton Flavor Violation: An Experimenter’s Guide.” (arXiv:1307.5787)
  • S. Mihara, J.P. Miller, P. Paradisi and G. Piredda, “Charged Lepton Flavor–Violation Experiments.” (DOI: 10.1146/annurev-nucl-102912-144530)
  • André de Gouvêa and Petr Vogel, “Lepton Flavor and Number Conservation, and Physics Beyond the Standard Model.” (arXiv:1303.4097)

Chad OrzelPhysics Blogging Round-Up: Roman Engineering, Water, and Baseball

It’s been a month since the last links dump of posts from Forbes, though, really, I took a couple of weeks off there, so it’s been less than that in terms of active blogging time. But I’ve put up a bunch of stuff in July, so here are some links:

The Physics Of Ancient Roman Architecture: First of a couple posts inspired by our trip to Rome, this one looking at the basic mechanics of the key structural element of Roman building, the arch.

What Ancient Roman Buildings Teach Us About Science And Engineering: Second post about Roman construction, in which looking into the question of how they designed their major structures leads to thinking about the artificiality of the distinction between “science” and “engineering.”

The Microscopic Physics Of Beautiful Fountains: Prompted by taking photos of a bunch of Roman fountains, a look at how microscopic forces create surface tension, which in turn makes most of the cool effects of splashing water.

Baseball Physics: Real Curves And Dead Balls: A brief sports interlude, prompted by a NIST video about baseball-related research by former director Lyman Briggs.

How To Stick Atoms And Molecules Together: A follow-on of sorts to the surface-tension-in-fountains post, looking at the origin of some of the microscopic forces that hold liquids together.

Pools And Beaches: The Fun Physics Of Water Waves: Rounding out the accidental water blogging theme, a look at the physics of water waves prompted by taking SteelyKid and The Pip first to Jones Beach and then to a wave pool at Six Flags Great Escape.

So, anyway, if you’re looking for some uplifting physics content to buoy your spirits during the political conventions, here’s a good reading list to start with.

July 19, 2016

Terence TaoNotes on the Bombieri asymptotic sieve

The twin prime conjecture, still unsolved, asserts that there are infinitely many primes {p} such that {p+2} is also prime. A more precise form of this conjecture is (a special case of) the Hardy-Littlewood prime tuples conjecture, which asserts that

\displaystyle \sum_{n \leq x} \Lambda(n) \Lambda(n+2) = (2\Pi_2+o(1)) x \ \ \ \ \ (1)


as {x \rightarrow \infty}, where {\Lambda} is the von Mangoldt function and {\Pi_2 = 0.6606\dots} is the twin prime constant

\displaystyle \prod_{p>2} (1 - \frac{1}{(p-1)^2}).

Because {\Lambda} is almost entirely supported on the primes, it is not difficult to see that (1) implies the twin prime conjecture.
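One can get a feel for the conjectured asymptotic (1) numerically. The following Python sketch (not part of the original notes; the cutoff {x = 10^5} and the helper name `sieve` are arbitrary choices) computes the left-hand side of (1) divided by {x}, and compares it with {2\Pi_2}, where the product defining {\Pi_2} is truncated to the primes found by the sieve.

```python
import math

def sieve(limit):
    """Return (primes, lam), where lam[n] is the von Mangoldt function of n."""
    lam = [0.0] * (limit + 1)
    is_composite = [False] * (limit + 1)
    primes = []
    for p in range(2, limit + 1):
        if is_composite[p]:
            continue
        primes.append(p)
        for m in range(p * p, limit + 1, p):
            is_composite[m] = True
        q = p
        while q <= limit:          # Lambda(p^j) = log p on prime powers
            lam[q] = math.log(p)
            q *= p
    return primes, lam

x = 10**5
primes, lam = sieve(x + 2)

# Left-hand side of (1), normalised by x.
ratio = sum(lam[n] * lam[n + 2] for n in range(1, x + 1)) / x

# Truncated Euler product for the twin prime constant.
twin_prime_constant = math.prod(1 - 1 / (p - 1) ** 2 for p in primes if p > 2)

print(f"sum / x = {ratio:.4f}")
print(f"2 Pi_2  = {2 * twin_prime_constant:.4f}")
```

Of course, agreement at a finite cutoff is only a sanity check on the heuristic and says nothing about the conjecture itself.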

One can give a heuristic justification of the asymptotic (1) (and hence the twin prime conjecture) via sieve theoretic methods. Recall that the von Mangoldt function can be decomposed as a Dirichlet convolution

\displaystyle \Lambda(n) = \sum_{d|n} \mu(d) \log \frac{n}{d}

where {\mu} is the Möbius function. Because of this, we can rewrite the left-hand side of (1) as

\displaystyle \sum_{d \leq x} \mu(d) \sum_{n \leq x: d|n} \log\frac{n}{d} \Lambda(n+2). \ \ \ \ \ (2)


To compute this double sum, it is thus natural to consider sums such as

\displaystyle \sum_{n \leq x: d|n} \log \frac{n}{d} \Lambda(n+2)

or (to simplify things by removing the logarithm)

\displaystyle \sum_{n \leq x: d|n} \Lambda(n+2).

The prime number theorem in arithmetic progressions suggests that one has an asymptotic of the form

\displaystyle \sum_{n \leq x: d|n} \Lambda(n+2) \approx \frac{g(d)}{d} x \ \ \ \ \ (3)


where {g} is the multiplicative function with {g(d)=0} for {d} even and

\displaystyle g(d) := \frac{d}{\phi(d)} = \prod_{p|d} (1-\frac{1}{p})^{-1}

for {d} odd. Summing by parts, one then expects

\displaystyle \sum_{n \leq x: d|n} \Lambda(n+2)\log \frac{n}{d}  \approx \frac{g(d)}{d} x \log \frac{x}{d}

and so we heuristically have

\displaystyle \sum_{n \leq x} \Lambda(n) \Lambda(n+2) \approx x \sum_{d \leq x} \frac{\mu(d) g(d)}{d} \log \frac{x}{d}.

The Dirichlet series

\displaystyle \sum_n \frac{\mu(n) g(n)}{n^s}

has an Euler product factorisation

\displaystyle \sum_n \frac{\mu(n) g(n)}{n^s} = \prod_p (1 - \frac{g(p)}{p^s})

for {\hbox{Re} s > 1}; comparing this with the Euler product factorisation

\displaystyle \zeta(s) = \prod_p (1 - \frac{1}{p^s})^{-1}

for the Riemann zeta function, and recalling that {\zeta} has a simple pole of residue {1} at {s=1}, we see that

\displaystyle \sum_n \frac{\mu(n) g(n)}{n^s} = \frac{1}{\zeta(s)} \prod_p \frac{1-g(p)/p^s}{1-1/p^s}

has a simple zero at {s=1} with first derivative

\displaystyle \prod_p \frac{1 - g(p)/p}{1-1/p} = 2 \Pi_2.

From this and standard multiplicative number theory manipulations, one can calculate the asymptotic

\displaystyle \sum_{d \leq x} \frac{\mu(d) g(d)}{d} \log \frac{x}{d} = 2 \Pi_2 + o(1)

which concludes the heuristic justification of (1).
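The last asymptotic can also be sanity-checked numerically. The Python sketch below (an illustration only; the cutoff {x = 10^5} and the helper names are arbitrary) builds {\mu} and the function {g} from (3) with a sieve and evaluates {\sum_{d \leq x} \frac{\mu(d) g(d)}{d} \log \frac{x}{d}} directly; the result should be compared with {2\Pi_2 = 1.3203\dots}. Convergence is slow, so only rough agreement should be expected.

```python
import math

def mobius_and_g(limit):
    """Return lists mu, g with g(d) = d/phi(d) for odd d and g(d) = 0 for even d."""
    mu = [1] * (limit + 1)
    g = [1.0] * (limit + 1)
    is_composite = [False] * (limit + 1)
    for p in range(2, limit + 1):
        if is_composite[p]:
            continue
        for m in range(p, limit + 1, p):
            if m > p:
                is_composite[m] = True
            mu[m] = -mu[m]                       # one sign flip per prime factor
            g[m] = 0.0 if p == 2 else g[m] * p / (p - 1)
        for m in range(p * p, limit + 1, p * p):
            mu[m] = 0                            # not squarefree
    return mu, g

def heuristic_main_term(x):
    """Evaluate sum_{d <= x} mu(d) g(d)/d * log(x/d)."""
    mu, g = mobius_and_g(x)
    return sum(mu[d] * g[d] / d * math.log(x / d) for d in range(1, x + 1))

value = heuristic_main_term(10**5)
print(f"sum = {value:.4f}, compare with 2 Pi_2 = 1.3203...")
```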

What prevents us from making the above heuristic argument rigorous, and thus proving (1) and the twin prime conjecture? Note that the variable {d} in (2) ranges to be as large as {x}. On the other hand, the prime number theorem in arithmetic progressions (3) is not expected to hold for {d} anywhere that large (for instance, the left-hand side of (3) vanishes as soon as {d} exceeds {x}). The best unconditional result known of the type (3) is the Siegel-Walfisz theorem, which allows {d} to be as large as {\log^{O(1)} x}. Even the powerful generalised Riemann hypothesis (GRH) only lets one prove an estimate of the form (3) for {d} up to about {x^{1/2-o(1)}}.

However, because of the averaging effect of the summation in {d} in (2), we don’t need the asymptotic (3) to be true for all {d} in a particular range; having it true for almost all {d} in that range would suffice. Here the situation is much better; the celebrated Bombieri-Vinogradov theorem (sometimes known as “GRH on the average”) implies, roughly speaking, that the approximation (3) is valid for almost all {d \leq x^{1/2-\varepsilon}} for any fixed {\varepsilon>0}. While this is not enough to control (2) or (1), the Bombieri-Vinogradov theorem can at least be used to control variants of (1) such as

\displaystyle \sum_{n \leq x} (\sum_{d|n} \lambda_d) \Lambda(n+2)

for various sieve weights {\lambda_d} whose associated divisor function {\sum_{d|n} \lambda_d} is supposed to approximate the von Mangoldt function {\Lambda}, although that theorem only lets one do this when the weights {\lambda_d} are supported on the range {d \leq x^{1/2-\varepsilon}}. This is still enough to obtain some partial results towards (1); for instance, by selecting weights according to the Selberg sieve, one can use the Bombieri-Vinogradov theorem to establish the upper bound

\displaystyle \sum_{n \leq x} \Lambda(n) \Lambda(n+2) \leq (4+o(1)) 2 \Pi_2 x, \ \ \ \ \ (4)


which is off from (1) by a factor of about {4}. See for instance this blog post for details.

It has been difficult to improve upon the Bombieri-Vinogradov theorem in its full generality, although there are various improvements to certain restricted versions of the Bombieri-Vinogradov theorem, for instance in the famous work of Zhang on bounded gaps between primes. Nevertheless, it is believed that the Elliott-Halberstam conjecture (EH) holds, which roughly speaking would mean that (3) now holds for almost all {d \leq x^{1-\varepsilon}} for any fixed {\varepsilon>0}. (Unfortunately, the {\varepsilon} factor cannot be removed, as investigated in a series of papers by Friedlander, Granville, and also Hildebrand and Maier.) This comes tantalisingly close to having enough distribution to control all of (1). Unfortunately, it still falls short. Using this conjecture in place of the Bombieri-Vinogradov theorem leads to various improvements to sieve theoretic bounds; for instance, the factor of {4+o(1)} in (4) can now be improved to {2+o(1)}.

In two papers from the 1970s (which can be found online here and here respectively, the latter starting on page 255 of the pdf), Bombieri developed what is now known as the Bombieri asymptotic sieve to clarify the situation more precisely. First, he showed that on the Elliott-Halberstam conjecture, while one still could not establish the asymptotic (1), one could prove the generalised asymptotic

\displaystyle \sum_{n \leq x} \Lambda_k(n) \Lambda(n+2) = (2\Pi_2+o(1)) k x \log^{k-1} x \ \ \ \ \ (5)


for all natural numbers {k \geq 2}, where the generalised von Mangoldt functions {\Lambda_k} are defined by the formula

\displaystyle \Lambda_k(n) := \sum_{d|n} \mu(d) \log^k \frac{n}{d}.

These functions behave like the von Mangoldt function, but are concentrated on {k}-almost primes (numbers with at most {k} prime factors) rather than primes. The right-hand side of (5) corresponds to what one would expect if one ran the same heuristics used to justify (1). Sadly, the {k=1} case of (5), which is just (1), is just barely excluded from Bombieri’s analysis.
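To make the definition of {\Lambda_k} concrete, here is a small Python sketch that evaluates it straight from the formula by summing over divisors. The function names are my own, and the brute-force divisor loop is only meant for small {n}. It illustrates the standard facts that {\Lambda_2(p) = \log^2 p}, that {\Lambda_2(pq) = 2 \log p \log q} for distinct primes {p,q}, and that {\Lambda_2} vanishes on numbers with three distinct prime factors.

```python
import math

def mobius(n):
    """Moebius function by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0        # square factor
            result = -result
        p += 1
    if n > 1:
        result = -result        # one remaining prime factor
    return result

def generalised_von_mangoldt(k, n):
    """Lambda_k(n) = sum over d | n of mu(d) * log(n/d)^k."""
    return sum(mobius(d) * math.log(n // d) ** k
               for d in range(1, n + 1) if n % d == 0)

print(generalised_von_mangoldt(1, 7), math.log(7))                   # Lambda(7) = log 7
print(generalised_von_mangoldt(2, 6), 2 * math.log(2) * math.log(3))  # product of two primes
print(generalised_von_mangoldt(2, 30))                                # three prime factors
```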

More generally, on the assumption of EH, the Bombieri asymptotic sieve provides the asymptotic

\displaystyle \sum_{n \leq x} \Lambda_{(k_1,\dots,k_r)}(n) \Lambda(n+2) \ \ \ \ \ (6)


\displaystyle = (2\Pi_2+o(1)) \frac{\prod_{i=1}^r k_i!}{(k_1+\dots+k_r-1)!} x \log^{k_1+\dots+k_r-1} x

for any fixed {r \geq 1} and any tuple {(k_1,\dots,k_r)} of natural numbers other than {(1,\dots,1)}, where

\displaystyle \Lambda_{(k_1,\dots,k_r)} := \Lambda_{k_1} * \dots * \Lambda_{k_r}

is a further generalisation of the von Mangoldt function (now concentrated on {k_1+\dots+k_r}-almost primes). By combining these asymptotics with some elementary identities involving the {\Lambda_{(k_1,\dots,k_r)}}, together with the Weierstrass approximation theorem, Bombieri was able to control a wide family of sums including (1), except for one undetermined scalar {\delta_x \in [0,2]}. Namely, he was able to show (again on EH) that for any fixed {r \geq 1} and any continuous function {g_r} on the simplex {\Delta_r := \{ (t_1,\dots,t_r) \in {\bf R}^r: t_1+\dots+t_r = 1; 0 \leq t_1 \leq \dots \leq t_r\}} that had suitable vanishing at the boundary, the sum

\displaystyle \sum_{n \leq x: n=p_1 \dots p_r} g_r( \frac{\log p_1}{\log n}, \dots, \frac{\log p_r}{\log n} ) \Lambda(n+2)

was equal to

\displaystyle (\delta_x+o(1)) \int_{\Delta_r} g_r \frac{x}{\log x} \ \ \ \ \ (7)


when {r} was odd and

\displaystyle (2-\delta_x+o(1)) \int_{\Delta_r} g_r \frac{x}{\log x} \ \ \ \ \ (8)


when {r} was even, where the integral on {\Delta_r} is with respect to the measure {\frac{dt_1 \dots dt_{r-1}}{t_1 \dots t_r}} (this is Dirac measure in the case {r=1}). In particular, we have

\displaystyle \sum_{n \leq x} \Lambda(n) \Lambda(n+2) = (\delta_x + o(1)) 2 \Pi_2 x

and the twin prime conjecture would be proved if one could show that {\delta_x} is bounded away from zero, while (1) is equivalent to the assertion that {\delta_x} is equal to {1+o(1)}. Unfortunately, no additional bound beyond the inequalities {0 \leq \delta_x \leq 2} provided by the Bombieri asymptotic sieve is known, even if one assumes all other major conjectures in number theory than the prime tuples conjecture and its variants (e.g. GRH, GEH, GUE, abc, Chowla, …).

To put it another way, the Bombieri asymptotic sieve is able (on EH) to compute asymptotics for sums

\displaystyle \sum_{n \leq x} f(n) \Lambda(n+2) \ \ \ \ \ (9)


without needing to know the unknown scalar {\delta_x}, when {f} is a function supported on almost primes of the form

\displaystyle f(p_1 \dots p_r) = g_r( \frac{\log p_1}{\log n}, \dots, \frac{\log p_r}{\log n} )

for {1 \leq r \leq r_*} and some fixed {r_*}, with {f} vanishing elsewhere and for some continuous (symmetric) functions {g_r: \Delta_r \rightarrow {\bf C}} obeying some vanishing at the boundary, so long as the parity condition

\displaystyle \sum_{r \hbox{ odd}} \int_{\Delta_r} g_r = \sum_{r \hbox{ even}} \int_{\Delta_r} g_r

is obeyed (informally: {f} gives the same weight to products of an odd number of primes as to products of an even number of primes, or to put it another way, {f} is asymptotically orthogonal to the Möbius function {\mu}). But when {f} violates the parity condition, the asymptotic involves the unknown {\delta_x}. This scalar {\delta_x} thus embodies the “parity problem” for the twin prime conjecture (discussed in these previous blog posts).

Because the obstruction to the parity problem is only one-dimensional (on EH), one can replace any parity-violating weight (such as {\Lambda}) with any other parity-violating weight and obtain a logically equivalent estimate. For instance, to prove the twin prime conjecture on EH, it would suffice to show that

\displaystyle \sum_{p_1 p_2 p_3 \leq x: p_1,p_2,p_3 \geq x^\alpha} \Lambda(p_1 p_2 p_3 + 2) \gg \frac{x}{\log x}

for some fixed {\alpha>0}, or equivalently that there are {\gg \frac{x}{\log^2 x}} solutions to the equation {p - p_1 p_2 p_3 = 2} in primes with {p \leq x} and {p_1,p_2,p_3 \geq x^\alpha}. (In some cases, this sort of reduction can also be made using other sieves than the Bombieri asymptotic sieve, as was observed by Ng.) As another example, the Bombieri asymptotic sieve can be used to show that the asymptotic (1) is equivalent to the asymptotic

\displaystyle \sum_{n \leq x} \mu(n) 1_R(n) \Lambda(n+2) = o( \frac{x}{\log x})

where {R} is the set of numbers that are rough in the sense that they have no prime factors less than {x^\alpha} for some fixed {\alpha>0} (the function {\mu 1_R} clearly correlates with {\mu} and so must violate the parity condition). One can replace {1_R} with similar sieve weights (e.g. a Selberg sieve) that concentrate on almost primes if desired.

As it turns out, if one is willing to strengthen the assumption of the Elliott-Halberstam (EH) conjecture to the assumption of the generalised Elliott-Halberstam (GEH) conjecture (as formulated for instance in Claim 2.6 of the Polymath8b paper), one can also swap the {\Lambda(n+2)} factor in the above asymptotics with other parity-violating weights and obtain a logically equivalent estimate, as the Bombieri asymptotic sieve also applies to weights such as {\mu 1_R} under the assumption of GEH. For instance, on GEH one can use two such applications of the Bombieri asymptotic sieve to show that the twin prime conjecture would follow if one could show that there are {\gg \frac{x}{\log^2 x}} solutions to the equation

\displaystyle p_1 p_2 - p_3 p_4 = 2

in primes with {p_1,p_2,p_3,p_4 \geq x^\alpha} and {p_1 p_2 \leq x}, for some {\alpha > 0}. Similarly, on GEH the asymptotic (1) is equivalent to the asymptotic

\displaystyle \sum_{n \leq x} \mu(n) 1_R(n) \mu(n+2) 1_R(n+2) = o( \frac{x}{\log^2 x})

for some fixed {\alpha>0}, and similarly with {1_R} replaced by other sieves. This form of the quantitative twin primes conjecture is appealingly similar to the (special case)

\displaystyle \sum_{n \leq x} \mu(n) \mu(n+2) = o(x)

of the Chowla conjecture, for which there has been some recent progress (discussed for instance in these recent posts). Informally, the Bombieri asymptotic sieve lets us (on GEH) view the twin prime conjecture as a sort of Chowla conjecture restricted to almost primes. Unfortunately, the recent progress on the Chowla conjecture relies heavily on the multiplicativity of {\mu} at small primes, which is completely destroyed by inserting a weight such as {1_R}, so this does not yet yield a viable path towards the twin prime conjecture even assuming GEH. Still, the similarity is striking, and one can hope that further ways to attack the Chowla conjecture may emerge that could impact the twin prime conjecture. (Alternatively, if one assumes a sufficiently optimistic version of the GEH, one could perhaps relax the notion of “almost prime” to the extent that one could start usefully using multiplicativity at smallish primes, though this seems rather wishful at present, particularly since the most optimistic versions of GEH are known to be false.)

The Bombieri asymptotic sieve is already well explained in the original two papers of Bombieri; there is also a slightly different treatment of the sieve by Friedlander and Iwaniec, as well as a simplified version in the book of Friedlander and Iwaniec (in which the distribution hypothesis is strengthened in order to shorten the arguments). I’ve decided though to write up my own notes on the sieve below the fold; this is primarily for my own benefit, but may be useful to some readers also. I largely follow the treatment of Bombieri, with the one idiosyncratic twist of replacing the usual “elementary” Selberg sieve with the “analytic” Selberg sieve used in particular in many of the breakthrough works in small gaps between primes; I prefer working with the latter due to its Fourier-analytic flavour.

— 1. Controlling generalised von Mangoldt sums —

To prove (5), we shall first generalise it, by replacing the sequence {\Lambda(n+2)} by a more general sequence {a_n} obeying the following axioms:

  • (i) (Non-negativity) One has {a_n \geq 0} for all {n}.
  • (ii) (Crude size bound) One has {a_n \ll \tau(n)^{O(1)} \log^{O(1)} n} for all {n}, where {\tau} is the divisor function.
  • (iii) (Size) We have {\sum_{n \leq x} a_n = (C+o(1)) x} for some constant {C>0}.
  • (iv) (Elliott-Halberstam type conjecture) For any {\varepsilon,A>0}, one has

    \displaystyle \sum_{d \leq x^{1-\varepsilon}} |\sum_{n \leq x: d|n} a_n - C x \frac{g(d)}{d}| \ll_{\varepsilon,A} x \log^{-A} x

    where {g} is a multiplicative function with {g(p^j) = 1 + O(1/p)} for all primes {p} and {j \geq 1}.

These axioms are a little bit stronger than what is actually needed to make the Bombieri asymptotic sieve work, but we will not attempt to work with the weakest possible axioms here.

We introduce the function

\displaystyle G(s) := \prod_p \frac{1-g(p)/p^s}{1-1/p^s}

which is analytic for {\hbox{Re}(s) > 0}; in particular it can be evaluated at {s=1} to yield

\displaystyle G(1) = \prod_p \frac{1-g(p)/p}{1-1/p}.

There are two model examples of data {a_n, C, g} to keep in mind. The first, discussed in the introduction, is when {a_n =\Lambda(n+2)}, in which case {C = 1} (by the prime number theorem) and {g} is as in the introduction, so that {G(1) = 2 \Pi_2}; one of course needs EH to justify axiom (iv) in this case. The other is when {a_n=1}, in which case {C=1} and {g(n)=1} for all {n}. We will later take advantage of the second example to avoid doing some (routine, but messy) main term computations.

The main result of this section is then

Theorem 1 Let {a_n, g, C, G} be as above. Let {\vec k = (k_1,\dots,k_r)} be a tuple of natural numbers (independent of {x}) that is not equal to {(1,\dots,1)}. Then one has the asymptotic

\displaystyle \sum_{n \leq x} \Lambda_{\vec k}(n) a_n = (G(1)+o(1)) \frac{\prod_{i=1}^r k_i!}{(|\vec k|-1)!} C x \log^{|\vec k|-1} x

as {x \rightarrow \infty}, where {|\vec k| := k_1 + \dots + k_r}.

Note that this recovers (5) (on EH) as a special case.

We now begin the proof of this theorem. Henceforth we allow implied constants in the {O()} or {\ll} notation to depend on {r, \vec k} and {g,G}.

It will be convenient to replace the range {n \leq x} by a shorter range by the following standard localisation trick. Let {B} be a large quantity depending on {r, \vec k} to be chosen later, and let {I} denote the interval {\{ n: x - x \log^{-B} x \leq n \leq x \}}. We will show the estimate

\displaystyle \sum_{n \in I} \Lambda_{\vec k}(n) a_n = (G(1)+o(1)) \frac{\prod_{i=1}^r k_i!}{(|\vec k|-1)!} C |I| \log^{|\vec k|-1} x \ \ \ \ \ (10)


from which the original claim follows by a routine summation argument. Observe from axiom (iv) and the triangle inequality that

\displaystyle \sum_{d \leq x^{1-\varepsilon}: \mu^2(d)=1} |\sum_{n \in I: d|n} a_n - C |I| \frac{g(d)}{d}| \ll_{\varepsilon,A} x \log^{-A} x

for any {\varepsilon,A > 0}.

Write {L} for the logarithm function {L(n) := \log n}, thus {\Lambda_k = \mu * L^k} for any {k}. Without loss of generality we may assume that {k_r > 1}; we then factor {\Lambda_{\vec k} = \mu_{\vec k} * L^{k_r}}, where

\displaystyle \mu_{\vec k} := \Lambda_{k_1} * \dots * \Lambda_{k_{r-1}} * \mu.

This function is just {\mu} when {r=1}. When {r>1} the function is more complicated, but we at least have the following crude bound:

Lemma 2 One has the pointwise bound {|\mu_{\vec k}| \leq L^{|\vec k|-k_r}}.

Proof: We induct on {r}. The case {r=1} is obvious, so suppose {r>1} and the claim has already been proven for {r-1}. Since {\mu_{\vec k} = \Lambda_{k_1} * \mu_{(k_2,\dots,k_r)}}, we see from the induction hypothesis, the non-negativity of {\Lambda_{k_1}}, and the triangle inequality that

\displaystyle |\mu_{\vec k}| \leq \Lambda_{k_1} * L^{|\vec k| - k_r - k_1} \leq L^{|\vec k| - k_r - k_1} (\Lambda_{k_1} * 1).

Since {\Lambda_{k_1}*1 = L^{k_1}} by Möbius inversion, the claim follows. \Box

We can write

\displaystyle \Lambda_{\vec k}(n) = \sum_{d|n} \mu_{\vec k}(d) \log^{k_r} \frac{n}{d}.

In the region {n \in I}, we have {\log^{k_r} \frac{n}{d} = \log^{k_r} \frac{x}{d} + O( \log^{-B+O(1)} x )}. Thus

\displaystyle \Lambda_{\vec k}(n) = \sum_{d|n} \mu_{\vec k}(d) \log^{k_r} \frac{x}{d} + O( \tau(n) \log^{-B+O(1)} x )

for {n \in I}. The contribution of the error term {O( \tau(n) \log^{-B+O(1)} x )} to (10) is easily seen to be negligible if {B} is large enough, so we may freely replace {\Lambda_{\vec k}(n)} with {\sum_{d|n} \mu_{\vec k}(d) \log^{k_r} \frac{x}{d}} with little difficulty.

If we insert this replacement directly into the left-hand side of (10) and rearrange, we get

\displaystyle \sum_{d \leq x} \mu_{\vec k}(d) \log^{k_r} \frac{x}{d} \sum_{n \in I: d|n} a_n.

We can’t quite control this using axiom (iv) because the range of {d} is a bit too big, as explained in the introduction. So let us introduce a truncated function

\displaystyle \Lambda_{\vec k,\varepsilon}(n) := \sum_{d|n} \mu_{\vec k}(d) \log^{k_r} \frac{x}{d} \eta_\varepsilon( \frac{\log d}{\log x} ) \ \ \ \ \ (11)


where {\varepsilon>0} is a small quantity to be chosen later, and {\eta_\varepsilon: {\bf R} \rightarrow [0,1]} is a smooth function that equals {1} on {(-\infty,1-4\varepsilon)} and equals {0} on {(1-3\varepsilon,+\infty)}. Suppose one could establish the following two estimates for any fixed {\varepsilon>0}:

\displaystyle \sum_{n \in I} \Lambda_{\vec k}(n) a_n = \sum_{n \in I} \Lambda_{\vec k,\varepsilon}(n) a_n + O( (\varepsilon+o(1)) C |I| \log^{|\vec k|-1} x ) \ \ \ \ \ (12)



\displaystyle \sum_{n \in I} \Lambda_{\vec k,\varepsilon}(n) a_n = C Q_{\varepsilon,x} G(1) + o( |I| \log^{|\vec k|-1} x ) \ \ \ \ \ (13)


where {Q_{\varepsilon,x}} is a quantity that depends on {\varepsilon, \eta_\varepsilon, \vec k, B, x} but not on {C, g,G}. Then on combining the two estimates we would have

\displaystyle \sum_{n \in I} \Lambda_{\vec k}(n) a_n = C Q_{\varepsilon,x} G(1) + (O(\varepsilon) + o(1)) C |I| \log^{|\vec k|-1} x. \ \ \ \ \ (14)


One could in principle compute {Q_{\varepsilon,x}} explicitly from the proof of (13), but one can avoid doing so by the following comparison trick. In the special case {a_n=1}, standard multiplicative number theory (noting that the Dirichlet series {\sum_n \frac{\Lambda_{\vec k}(n)}{n^s}} has a pole of order {|\vec k|} at {s=1}, with top Laurent coefficient {\prod_{j=1}^r k_j!}) gives the asymptotic

\displaystyle \sum_{n \in I} \Lambda_{\vec k}(n) a_n = (\frac{\prod_{i=1}^r k_i!}{(|\vec k|-1)!} + o(1)) |I| \log^{|\vec k|-1} x

which when compared with (14) for {a_n=1} (recalling that {G(1)=C=1} in this case) gives the formula

\displaystyle Q_{\varepsilon,x} = (\frac{\prod_{j=1}^r k_j!}{(|\vec k|-1)!} + O(\varepsilon)) |I| \log^{|\vec k|-1} x.

Inserting this back into (14) and recalling that {\varepsilon>0} can be made arbitrarily small, we obtain (10).

As it turns out, the estimate (13) is easy to establish, but the estimate (12) is not, roughly speaking because the typical number {n} in {I} has too many divisors {d} in the range {[x^{1-4\varepsilon},x]}, each of which gives a contribution to the error term. (In the book of Friedlander and Iwaniec, the estimate (12) is established anyway, but only after assuming a stronger version of (iv), roughly speaking in which {d} is allowed to be as large as {x \exp( -\log^{1/4} x)}.) To resolve this issue, we will insert a preliminary sieve {\nu_\varepsilon} that will remove most of the potential divisors {d} in the range {[x^{1-4\varepsilon},x]} (leaving only about {O(1)} such divisors on the average for typical {n}), making the analogue of (12) easier to prove (at the cost of making the analogue of (13) more difficult). Namely, if one can find a function {\nu_\varepsilon: {\bf N} \rightarrow {\bf R}} for which one has the estimates

\displaystyle \sum_{n \in I} \Lambda_{\vec k}(n) a_n = \sum_{n \in I} \Lambda_{\vec k}(n) \nu_\varepsilon(n) a_n + O( (\varepsilon+o(1)) C |I| \log^{|\vec k|-1} x ), \ \ \ \ \ (15)


\displaystyle \sum_{n \in I} \Lambda_{\vec k}(n) \nu_\varepsilon(n) a_n

\displaystyle = \sum_{n \in I} \Lambda_{\vec k,\varepsilon}(n) \nu_\varepsilon(n) a_n + O( (\varepsilon+o(1)) C |I| \log^{|\vec k|-1} x ) \ \ \ \ \ (16)



\displaystyle \sum_{n \in I} \Lambda_{\vec k,\varepsilon}(n) \nu_\varepsilon(n) a_n = C Q'_{\varepsilon,x} G(1) + o( |I| \log^{|\vec k|-1} x ) \ \ \ \ \ (17)


for some quantity {Q'_{\varepsilon,x}} that depends on {\varepsilon, \eta_\varepsilon, \vec k, B, x} but not on {C, g, G}, then by repeating the previous arguments we will again be able to establish (10).

The key estimate is (16). As we shall see, when comparing {\Lambda_{\vec k}(n) \nu_\varepsilon(n)} with {\Lambda_{\vec k,\varepsilon}(n) \nu_\varepsilon(n)}, the weight {\nu_\varepsilon} will cost us a factor of {1/\varepsilon}, but the {\log^{k_r} \frac{x}{d}} term in the definitions of {\Lambda_{\vec k}} and {\Lambda_{\vec k,\varepsilon}} will recover a factor of {\varepsilon^{k_r}}, which will give the desired bound since we are assuming {k_r > 1}.

One has some flexibility in how to select the weight {\nu_\varepsilon}: basically any standard sieve that uses divisors of size at most {x^{2\varepsilon}} to localise (at least approximately) to numbers that are rough in the sense that they have no (or at least very few) factors less than {x^\varepsilon}, will do. We will use the analytic Selberg sieve choice

\displaystyle \nu_\varepsilon(n) := (\sum_{d|n} \mu(d) \psi( \frac{\log d}{\varepsilon \log x} ))^2 \ \ \ \ \ (18)


where {\psi: {\bf R} \rightarrow [0,1]} is a smooth function supported on {[-1,1]} that equals {1} on {[-1/2,1/2]}.
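To get a feel for (18), here is a minimal Python sketch that evaluates {\nu_\varepsilon(n)} directly from the definition. The particular cutoff `psi` (built from the standard {e^{-1/t}} smooth step) is only one admissible choice and is an assumption of this sketch, as are all the helper names; nothing in the argument depends on them.

```python
import math
from itertools import combinations

def prime_factors(n):
    """Distinct prime factors of n, by trial division (fine for small n)."""
    primes, p = [], 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes

def smooth_step(t):
    """Smooth function equal to 0 for t <= 0 and to 1 for t >= 1."""
    if t <= 0:
        return 0.0
    if t >= 1:
        return 1.0
    a, b = math.exp(-1.0 / t), math.exp(-1.0 / (1.0 - t))
    return a / (a + b)

def psi(u):
    """One admissible cutoff: supported on [-1,1], equal to 1 on [-1/2,1/2]."""
    return smooth_step(2.0 * (1.0 - abs(u)))

def nu(n, x, eps):
    """Weight (18): square of sum_{d|n} mu(d) psi(log d / (eps log x))."""
    scale = eps * math.log(x)
    primes = prime_factors(n)
    total = 0.0
    # mu(d) vanishes unless d is squarefree, so run over subsets of the primes of n
    for r in range(len(primes) + 1):
        for subset in combinations(primes, r):
            d = math.prod(subset)
            total += (-1) ** r * psi(math.log(d) / scale)
    return total ** 2
```

With {x = 10^6} and {\varepsilon = 1/2} (so {x^\varepsilon = 1000}), the calls `nu(1009, 10**6, 0.5)` and `nu(1009 * 1013, 10**6, 0.5)` evaluate to 1, since all prime factors exceed {x^\varepsilon}, while `nu(2 * 1009, 10**6, 0.5)` evaluates to 0 because of the small prime factor 2.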

It remains to establish the bounds (15), (16), (17). To warm up and introduce the various methods needed, we begin with the standard bound

\displaystyle \sum_{n \in I} \nu_\varepsilon(n) a_n = \frac{C|I|}{\varepsilon \log x} ((\int_0^1 \psi'(u)^2\ du) G(1) + o(1)), \ \ \ \ \ (19)


where {\psi'} denotes the derivative of {\psi}. Note the loss of {1/\varepsilon} that had previously been pointed out. In the arguments that follow I will be a little brief with the details, as they are standard (see e.g. this previous post).

We now prove (19). The left-hand side can be expanded as

\displaystyle \sum_{d_1,d_2} \mu(d_1) \mu(d_2) \psi( \frac{\log d_1}{\varepsilon \log x} ) \psi( \frac{\log d_2}{\varepsilon \log x} ) \sum_{n \in I: [d_1,d_2]|n} a_n

where {[d_1,d_2]} denotes the least common multiple of {d_1} and {d_2}. From the support of {\psi} we see that the summand is only non-vanishing when {[d_1,d_2] \leq x^{2\varepsilon}}. We now use axiom (iv) and split the left-hand side into a main term

\displaystyle \sum_{d_1,d_2} \mu(d_1) \mu(d_2) \psi( \frac{\log d_1}{\varepsilon \log x} ) \psi( \frac{\log d_2}{\varepsilon \log x} ) \frac{g([d_1,d_2])}{[d_1,d_2]} C |I|

and an error term that is at most

\displaystyle O_\varepsilon( \sum_{d \leq x^{2\varepsilon}} \tau(d)^{O(1)} | \sum_{n \in I: d|n} a_n - \frac{g(d)}{d} C |I|| ). \ \ \ \ \ (20)


From axiom (ii) and elementary multiplicative number theory, we have the bound

\displaystyle \sum_{d \leq x} \tau(d)^{O(1)} | \sum_{n \in I: d|n} a_n - \frac{g(d)}{d} C |I|| \ll C |I| \log^{O(1)} x

so from axiom (iv) and Cauchy-Schwarz we see that the error term (20) is acceptable. Thus it will suffice to establish the bound

\displaystyle \sum_{d_1,d_2} \mu(d_1) \mu(d_2) \psi( \frac{\log d_1}{\varepsilon \log x} ) \psi( \frac{\log d_2}{\varepsilon \log x} ) \frac{g([d_1,d_2])}{[d_1,d_2]}

\displaystyle = \frac{1}{\varepsilon \log x} (\int_0^1 \psi'(u)^2\ du) G(1) + o(\frac{1}{\log x}). \ \ \ \ \ (21)


The summand here is almost, but not quite, multiplicative in {d_1,d_2}. To make it genuinely multiplicative, we perform a (shifted) Fourier expansion

\displaystyle \psi(u) = \int_{\bf R} e^{-(1+it)u} \Psi(t)\ dt \ \ \ \ \ (22)


for some rapidly decreasing function {\Psi} (essentially the Fourier transform of {e^u \psi(u)}). Thus

\displaystyle \psi( \frac{\log d}{\varepsilon \log x} ) = \int_{\bf R} \frac{1}{d^{\frac{1+it}{\varepsilon \log x}}} \Psi(t)\ dt,

and so the left-hand side of (21) can be rearranged using Fubini’s theorem as

\displaystyle \int_{\bf R} \int_{\bf R} E(\frac{1+it_1}{\varepsilon \log x},\frac{1+it_2}{\varepsilon \log x})\ \Psi(t_1) \Psi(t_2) dt_1 dt_2 \ \ \ \ \ (23)



where

\displaystyle E(s_1,s_2) := \sum_{d_1,d_2} \frac{\mu(d_1) \mu(d_2)}{d_1^{s_1}d_2^{s_2}} \frac{g([d_1,d_2])}{[d_1,d_2]}.

We can factorise {E(s_1,s_2)} as an Euler product:

\displaystyle E(s_1,s_2) = \prod_p (1 - \frac{g(p)}{p^{1+s_1}} - \frac{g(p)}{p^{1+s_2}} + \frac{g(p)}{p^{1+s_1+s_2}}).

Taking absolute values and using Mertens’ theorem leads to the crude bound

\displaystyle E(\frac{1+it_1}{\varepsilon \log x},\frac{1+it_2}{\varepsilon \log x}) \ll_\varepsilon \log^{O(1)} x

which when combined with the rapid decrease of {\Psi}, allows us to restrict the region of integration in (23) to the square {\{ |t_1|, |t_2| \leq \sqrt{\log x} \}} (say) with negligible error. Next, we use the Euler product

\displaystyle \zeta(s) = \prod_p (1-\frac{1}{p^s})^{-1}

for {\hbox{Re} s > 1} to factorise

\displaystyle E(s_1,s_2) = \frac{\zeta(1+s_1+s_2)}{\zeta(1+s_1) \zeta(1+s_2)} \prod_p E_p(s_1,s_2)


where

\displaystyle E_p(s_1,s_2) := \frac{(1 - \frac{g(p)}{p^{1+s_1}} - \frac{g(p)}{p^{1+s_2}} + \frac{g(p)}{p^{1+s_1+s_2}})(1 - \frac{1}{p^{1+s_1+s_2}})}{(1-\frac{1}{p^{1+s_1}})(1-\frac{1}{p^{1+s_2}})}.

For {s_1,s_2=o(1)} with nonnegative real part, one has

\displaystyle E_p(s_1,s_2) = 1 + O(1/p^2)

and so by the Weierstrass {M}-test, {\prod_p E_p(s_1,s_2)} is continuous at {s_1=s_2=0}. Since

\displaystyle \prod_p E_p(0,0) = G(1)

we thus have

\displaystyle \prod_p E_p(s_1,s_2) = G(1) + o(1)

Also, since {\zeta} has a pole of order {1} at {s=1} with residue {1}, we have

\displaystyle \frac{\zeta(1+s_1+s_2)}{\zeta(1+s_1) \zeta(1+s_2)} = (1+o(1)) \frac{s_1 s_2}{s_1+s_2}

and thus

\displaystyle E(s_1,s_2) = (G(1)+o(1)) \frac{s_1s_2}{s_1+s_2}.
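As a sanity check on the zeta function asymptotic just used, the following Python sketch compares {\frac{\zeta(1+s_1+s_2)}{\zeta(1+s_1)\zeta(1+s_2)}} with {\frac{s_1 s_2}{s_1+s_2}} for small complex {s_1,s_2}. The function `zeta` is a hypothetical helper based on a truncated Euler-Maclaurin formula (an assumption of this sketch, not part of the argument), and is only meant to be accurate enough for this comparison.

```python
import math

def zeta(s, N=50):
    """Approximate zeta(s) for Re s > 0, s != 1, by Euler-Maclaurin summation.

    Sums n^{-s} for n < N exactly and replaces the tail by the integral
    term, the half-endpoint term and the first Bernoulli correction.
    """
    head = sum(n ** (-s) for n in range(1, N))
    tail = N ** (1 - s) / (s - 1) + 0.5 * N ** (-s) + s * N ** (-s - 1) / 12
    return head + tail

def zeta_ratio(s1, s2):
    """The quotient zeta(1+s1+s2) / (zeta(1+s1) zeta(1+s2))."""
    return zeta(1 + s1 + s2) / (zeta(1 + s1) * zeta(1 + s2))

def pole_model(s1, s2):
    """Leading behaviour s1 s2 / (s1 + s2) predicted by the simple pole at s = 1."""
    return s1 * s2 / (s1 + s2)

if __name__ == "__main__":
    s1, s2 = 0.01 * (1 + 1j), 0.01 * (1 - 2j)
    print(zeta_ratio(s1, s2) / pole_model(s1, s2))  # close to 1
```

The quotient of the two quantities tends to {1} as {s_1,s_2 \rightarrow 0}, which is all that is used above.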

The quantity (23) can thus be written, up to errors of {o(\frac{1}{\log x})}, as

\displaystyle \frac{G(1)}{\varepsilon \log x} \int_{|t_1|, |t_2| \leq \sqrt{\log x}} \frac{(1+it_1)(1+it_2)}{1+it_1+1+it_2} \Psi(t_1) \Psi(t_2)\ dt_1 dt_2.

Using the rapid decrease of {\Psi}, we may remove the restriction on {t_1,t_2}, and it will now suffice to prove the identity

\displaystyle \int_{\bf R} \int_{\bf R} \frac{(1+it_1)(1+it_2)}{1+it_1+1+it_2} \Psi(t_1) \Psi(t_2)\ dt_1 dt_2 = \int_0^1 \psi'(u)^2\ du.

But on differentiating and then squaring (22) we have

\displaystyle \psi'(u)^2 = \int_{\bf R} \int_{\bf R} (1+it_1)(1+it_2) e^{-(1+it_1+1+it_2)u}\Psi(t_1) \Psi(t_2)\ dt_1 dt_2

and the claim follows by integrating in {u} from zero to infinity (noting that {\psi'} vanishes for {u>1}).

We have the following variant of (19):

Lemma 3 For any {d \leq x^{1-3\varepsilon}}, one has

\displaystyle \sum_{n \in I: d|n} \nu_\varepsilon(n) a_n \ll \frac{C|I|}{\varepsilon \log x} \frac{\prod_{p|d} O( \min( \frac{\log p}{\varepsilon \log x}, 1 )^2 )}{d} + R_d \ \ \ \ \ (24)


where the {R_d} are such that

\displaystyle \sum_{d \leq x^{1-3\varepsilon}} R_d \ll_A |I| \log^{-A} x \ \ \ \ \ (25)


for any {A>0}. We also have the variant

\displaystyle \sum_{n \in I: d|n} \nu_\varepsilon(n/d) a_n \ll \frac{C|I|}{\varepsilon \log x} \frac{\prod_{p|d} O(1)}{d} + R_d. \ \ \ \ \ (26)


If in addition {d} has no prime factors less than {x^\delta} for some fixed {\delta>0}, one has

\displaystyle \sum_{n \in I: d|n} \nu_\varepsilon(n) a_n

\displaystyle = \frac{1+o(1)}{d} \frac{C|I|}{\varepsilon \log x} (\int_0^1 \psi'(u)^2\ du) G(1) + O(R_d). \ \ \ \ \ (27)


Roughly speaking, the above estimates assert that {\nu_\varepsilon} is concentrated on those numbers {n} with no prime factors much less than {x^\varepsilon}, but factors {d} without such small prime divisors occur with about the same relative density as they do in the integers.

Proof: The left-hand side of (24) can be expanded as

\displaystyle \sum_{d_1,d_2} \mu(d_1) \mu(d_2) \psi( \frac{\log d_1}{\varepsilon \log x} ) \psi( \frac{\log d_2}{\varepsilon \log x} ) \sum_{n \in I: [d_1,d_2,d]|n} a_n.

If we define

\displaystyle R_d := \sum_{d' \leq x^{1-\varepsilon}: d|d'} \tau(d')^2 |\sum_{n \in I:d'|n} a_n - \frac{g(d')}{d'} C|I||

then the previous expression can be written as

\displaystyle \sum_{d_1,d_2} \mu(d_1) \mu(d_2) \psi( \frac{\log d_1}{\varepsilon \log x} ) \psi( \frac{\log d_2}{\varepsilon \log x} ) \frac{g([d_1,d_2,d])}{[d_1,d_2,d]} C|I| + O(R_d),

while one has

\displaystyle \sum_{d \leq x^{1-3\varepsilon}} R_d \leq \sum_{d' \leq x^{1-\varepsilon}} \tau(d')^3 |\sum_{n \in I:d'|n} a_n - \frac{g(d')}{d'} C|I||

which gives (25) from Axiom (iv). To prove (24), it now suffices to show that

\displaystyle \sum_{d_1,d_2} \mu(d_1) \mu(d_2) \psi( \frac{\log d_1}{\varepsilon \log x} ) \psi( \frac{\log d_2}{\varepsilon \log x} ) \frac{g([d_1,d_2,d])}{[d_1,d_2,d]}

\displaystyle \ll \frac{1}{\varepsilon \log x} \frac{\prod_{p|d} O( \min( \frac{\log p}{\varepsilon \log x}, 1 )^2 )}{d}. \ \ \ \ \ (28)


Arguing as before, the left-hand side is

\displaystyle \int_{\bf R} \int_{\bf R} E^{(d)}(\frac{1+it_1}{\varepsilon \log x},\frac{1+it_2}{\varepsilon \log x})\ \Psi(t_1) \Psi(t_2) dt_1 dt_2


where

\displaystyle E^{(d)}(s_1,s_2) := \sum_{d_1,d_2} \frac{\mu(d_1) \mu(d_2)}{d_1^{s_1}d_2^{s_2}} \frac{g([d_1,d_2,d])}{[d_1,d_2,d]}.

From Mertens’ theorem we have

\displaystyle E^{(d)}(s_1,s_2) \ll_\varepsilon \frac{\prod_{p|d} O(1)}{d} \log^{O(1)} x

when {\hbox{Re} s_1, \hbox{Re} s_2 = \frac{1}{\varepsilon \log x}}, so the contribution of the terms where {|t_1|, |t_2| \geq \sqrt{\log x}} can be absorbed into the {R_d} error (after increasing that error slightly). For the remaining contributions, we see that

\displaystyle E^{(d)}(s_1,s_2) = \frac{\zeta(1+s_1+s_2)}{\zeta(1+s_1) \zeta(1+s_2)} \prod_p E^{(d)}_p(s_1,s_2)

where {E^{(d)}_p(s_1,s_2) = E_p(s_1,s_2)} if {p} does not divide {d}, and

\displaystyle E^{(d)}_p(s_1,s_2) = \frac{g(p^j)}{p^j} \frac{(1 - \frac{1}{p^{s_1}}) (1 - \frac{1}{p^{s_2}}) (1 - \frac{1}{p^{1+s_1+s_2}})}{(1-\frac{1}{p^{1+s_1}})(1-\frac{1}{p^{1+s_2}})}

if {p} divides {d} {j} times for some {j \geq 1}. In the latter case, Taylor expansion gives the bounds

\displaystyle |E^{(d)}_p(\frac{1+it_1}{\varepsilon \log x},\frac{1+it_2}{\varepsilon \log x})| \lesssim (1+|t_1|+|t_2|)^{O(1)} \frac{\min( \frac{\log p}{\varepsilon \log x}, 1 )^2}{p}

and the claim (28) follows. When {p \geq x^\delta} and {|t_1|, |t_2| \leq \sqrt{\log x}} we have

\displaystyle E^{(d)}_p(\frac{1+it_1}{\varepsilon \log x},\frac{1+it_2}{\varepsilon \log x}) = \frac{1+o(1)}{p^j}

and (27) follows by repeating the previous calculations. Finally, (26) is proven similarly to (24) (using {d[d_1,d_2]} in place of {[d_1,d_2,d]}). \Box

Now we can prove (15), (16), (17). We begin with (15). Using the Leibniz rule {L(f*g) = (Lf)*g + f*(Lg)} applied to the identity {\mu = \mu * 1 * \mu} and using {\Lambda = \mu*L} and Möbius inversion (and the associativity and commutativity of Dirichlet convolution) we see that

\displaystyle L\mu = - \mu * \Lambda. \ \ \ \ \ (29)


Next, by applying the Leibniz rule to {\Lambda_k = \mu * L^k} for some {k \geq 1} and using (29) we see that

\displaystyle L \Lambda_k = L \mu * L^k + \mu * L^{k+1}

\displaystyle = - \mu * \Lambda * L^k + \Lambda_{k+1}

and hence we have the recursive identity

\displaystyle \Lambda_{k+1} = L \Lambda_k + \Lambda *\Lambda_k. \ \ \ \ \ (30)
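The identities (29) and (30) are easy to test numerically. The Python sketch below builds {\mu}, {\Lambda} and {\Lambda_k = \mu * L^k} by brute-force Dirichlet convolution up to a small cutoff and measures how far the two identities are from holding; the function names and the cutoff are assumptions made for illustration only.

```python
import math

def mobius_table(N):
    """Table of the Moebius function mu(n) for 0 <= n <= N (mu[0] is unused)."""
    mu = [1] * (N + 1)
    mu[0] = 0
    is_prime = [True] * (N + 1)
    for p in range(2, N + 1):
        if is_prime[p]:
            for m in range(p, N + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]          # one sign flip per prime factor
            for m in range(p * p, N + 1, p * p):
                mu[m] = 0               # not squarefree
    return mu

def dirichlet(f, g, N):
    """Dirichlet convolution (f*g)(n) = sum_{d|n} f(d) g(n/d) for n <= N."""
    h = [0.0] * (N + 1)
    for d in range(1, N + 1):
        if f[d] == 0:
            continue
        for m in range(1, N // d + 1):
            h[d * m] += f[d] * g[m]
    return h

def identities_max_error(N=300, kmax=3):
    """Largest violation of (29) and of (30) for k <= kmax, over n <= N."""
    mu = mobius_table(N)
    log = [0.0] + [math.log(n) for n in range(1, N + 1)]
    lam = dirichlet(mu, log, N)                      # Lambda = mu * L
    minus_mu_lam = dirichlet(mu, lam, N)
    # (29): L mu = - mu * Lambda
    err = max(abs(log[n] * mu[n] + minus_mu_lam[n]) for n in range(1, N + 1))
    for k in range(1, kmax + 1):
        lam_k = dirichlet(mu, [t ** k for t in log], N)
        lam_k1 = dirichlet(mu, [t ** (k + 1) for t in log], N)
        conv = dirichlet(lam, lam_k, N)
        # (30): Lambda_{k+1} = L Lambda_k + Lambda * Lambda_k
        err = max(err, max(abs(lam_k1[n] - log[n] * lam_k[n] - conv[n])
                           for n in range(1, N + 1)))
    return err
```

Calling `identities_max_error()` should return a number at the level of floating point rounding error, consistent with both identities holding exactly.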


In particular, from induction we see that {\Lambda_k} is supported on numbers with at most {k} distinct prime factors, and hence {\Lambda_{\vec k}} is supported on numbers with at most {|\vec k|} distinct prime factors. In particular, from (18) we see that {\nu_\varepsilon(n) = O(1)} on the support of {\Lambda_{\vec k}}. Thus it will suffice to show that

\displaystyle \sum_{n \in I: \nu_\varepsilon(n) \neq 1} \Lambda_{\vec k}(n) a_n \ll (\varepsilon+o(1)) C |I| \log^{|\vec k|-1} x.

If {\nu_\varepsilon(n) \neq 1} and {\Lambda_{\vec k}(n) \neq 0}, then {n} has at most {|\vec k|} distinct prime factors {p_1 < p_2 < \dots < p_r}, with {p_1 \leq x^\varepsilon}. If we factor {n = n_1 n_2}, where {n_1} is the contribution of those {p_i} with {p_i \leq x^{1/10|\vec k|}}, and {n_2} is the contribution of those {p_i} with {p_i > x^{1/10|\vec k|}}, then at least one of the following two statements holds:

  • (a) {n_1} (and hence {n}) is divisible by a square number of size at least {x^{1/10}}.
  • (b) {n_1 \leq x^{1/5}}.

The contribution of case (a) is easily seen to be acceptable by axiom (ii). For case (b), we observe from (30) and induction that

\displaystyle \Lambda_{\vec k}(n) \ll \log^{|\vec k|} x \prod_{j=1}^r \frac{\log p_j}{\log x}

and so it will suffice to show that

\displaystyle \sum_{n_1} (\prod_{p|n_1} \frac{\log p}{\log x}) \sum_{n \in I: n_1 | n} 1_R(n/n_1) a_n \ll (\varepsilon + o(1)) C |I| \log^{-1} x

where {n_1} ranges over numbers bounded by {x^{1/5}} with at most {|\vec k|} distinct prime factors, the smallest of which is at most {x^\varepsilon}, and {R} consists of those numbers with no prime factor less than or equal to {x^{1/10|\vec k|}}. Applying (26) (with {\varepsilon} replaced by {1/10|\vec k|}) gives the bound

\displaystyle \sum_{n \in I: n_1|n} 1_R(n/n_1) a_n \ll \frac{C|I|}{\log x} \frac{1}{n_1} + R_{n_1}

so by (25) it suffices to show that

\displaystyle \sum_{n_1} (\prod_{p|n_1} \frac{\log p}{\log x}) \frac{1}{n_1} \ll \varepsilon

subject to the same constraints on {n_1} as before. The contribution of those {n_1} with {r} distinct prime factors can be bounded by

\displaystyle O(\sum_{p_1 \leq x^\varepsilon} \frac{\log p_1}{p_1 \log x}) \times O(\sum_{p \leq x^{1/5}} \frac{\log p}{p\log x})^{r-1};

applying Mertens’ theorem and summing over {1 \leq r \leq |\vec k|}, one obtains the claim.
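To illustrate the use of Mertens’ theorem in the last step, the following Python sketch computes the first factor {\sum_{p \leq x^\varepsilon} \frac{\log p}{p \log x}} numerically; Mertens’ theorem predicts the value {\varepsilon + O(1/\log x)}. The helper names are hypothetical and the sieve is a plain sieve of Eratosthenes.

```python
import math

def primes_up_to(n):
    """List of primes <= n (n >= 2), via the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            # cross out p^2, p^2 + p, ... in one slice assignment
            sieve[p * p::p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]

def small_prime_mass(x, eps):
    """sum_{p <= x^eps} log p / (p log x); expected to be eps + O(1/log x)."""
    y = int(x ** eps)
    return sum(math.log(p) / p for p in primes_up_to(y)) / math.log(x)

if __name__ == "__main__":
    for x in (1e8, 1e10, 1e12):
        print(x, small_prime_mass(x, 0.5))
```

The printed values approach {\varepsilon = 1/2} only slowly, in line with the {O(1/\log x)} error term.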

Now we show (16). As discussed previously in this section, we can replace {\Lambda_{\vec k}(n)} by {\sum_{d|n} \mu_{\vec k}(d) \log^{k_r} \frac{x}{d}} with negligible error. Comparing this with (16) and (11), we see that it suffices to show that

\displaystyle \sum_{n \in I} \sum_{d|n} \mu_{\vec k}(d) \log^{k_r} \frac{x}{d} (1 - \eta_\varepsilon(\frac{\log d}{\log x})) \nu_\varepsilon(n) a_n \ll (\varepsilon+o(1)) C |I| \log^{|\vec k|-1} x.

From the support of {\eta_\varepsilon}, the summand on the left-hand side is only non-zero when {d \geq x^{1-4\varepsilon}}, which makes {\log^{k_r} \frac{x}{d} \ll \varepsilon^{k_r} \log^{k_r} x \leq \varepsilon^2 \log^{k_r} x}, where we use the crucial hypothesis {k_r > 1} to gain enough powers of {\varepsilon} to make the argument here work. Applying Lemma 2, we reduce to showing that

\displaystyle \sum_{n \in I} \sum_{d|n: d \geq x^{1-4\varepsilon}} \nu_\varepsilon(n) a_n \ll \frac{1+o(1)}{\varepsilon \log x} C |I|.

We can make the change of variables {d \mapsto n/d} to flip the sum

\displaystyle \sum_{d|n: d \geq x^{1-4\varepsilon}} 1 \leq \sum_{d|n: d \leq x^{4\varepsilon}} 1

and then swap the sums to reduce to showing that

\displaystyle \sum_{d \leq x^{4\varepsilon}} \sum_{n \in I: d|n} \nu_\varepsilon(n) a_n \ll \frac{1+o(1)}{\varepsilon \log x} C |I|.

By Lemma 3, it suffices to show that

\displaystyle \sum_{d \leq x^{4\varepsilon}} \frac{\prod_{p|d} O( \min( \frac{\log p}{\varepsilon \log x}, 1 )^2 )}{d} \ll 1.

To prove this, we use the Rankin trick, bounding the implied weight {1_{d \leq x^{4\varepsilon}}} by {O( \frac{1}{d^{1/\varepsilon \log x}} )}. We can then bound the left-hand side by the Euler product

\displaystyle \prod_p (1 + O( \frac{\min( \frac{\log p}{\varepsilon \log x}, 1 )^2}{p^{1+1/\varepsilon \log x}} ))

which can be bounded by

\displaystyle \exp( O( \sum_p \frac{\min( \frac{\log p}{\varepsilon \log x}, 1 )^2}{p^{1+1/\varepsilon \log x}} ) )

and the claim follows from Mertens’ theorem.
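The Rankin trick step can be checked on a small example. In the Python sketch below all the implied constants {O(\cdot)} are set equal to {1} (an assumption made purely for illustration), the truncated sum over {d \leq x^{4\varepsilon}} is computed directly, and it is compared with {e^4} times the Euler product over primes up to {x^{4\varepsilon}} with exponent {1 + 1/\varepsilon \log x}. The function name `rankin_check` is hypothetical.

```python
import math

def rankin_check(x, eps):
    """Return (lhs, rhs) for Rankin's trick with all implied constants set to 1.

    lhs = sum_{d <= D} prod_{p|d} w(p) / d,  with D = x^{4 eps},
    rhs = e^4 * prod_{p <= D} (1 + w(p) / (p^{1+sigma} - 1)),
    where w(p) = min(log p / (eps log x), 1)^2 and sigma = 1 / (eps log x).
    """
    scale = eps * math.log(x)
    sigma = 1.0 / scale
    D = int(x ** (4 * eps))

    def w(p):
        return min(math.log(p) / scale, 1.0) ** 2

    F = [1.0] * (D + 1)            # F[d] = prod_{p | d} w(p)
    is_composite = bytearray(D + 1)
    euler = 1.0
    for p in range(2, D + 1):
        if is_composite[p]:
            continue
        for m in range(p, D + 1, p):
            if m > p:
                is_composite[m] = 1
            F[m] *= w(p)
        # local factor 1 + w(p) (p^{-s} + p^{-2s} + ...) with s = 1 + sigma
        euler *= 1.0 + w(p) / (p ** (1.0 + sigma) - 1.0)
    lhs = sum(F[d] / d for d in range(1, D + 1))
    # for d <= D one has d^sigma <= e^4, i.e. 1 <= e^4 d^{-sigma}
    rhs = math.exp(4.0) * euler
    return lhs, rhs
```

For instance `rankin_check(1e8, 0.125)` returns a pair with `lhs <= rhs`. This is only a numerical illustration of the inequality; the uniform bound {\ll 1} still requires Mertens’ theorem as in the text.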

Finally, we show (17). By (11), the left-hand side expands as

\displaystyle \sum_{d \leq x^{1-3\varepsilon}} \mu_{\vec k}(d) \log^{k_r} \frac{x}{d} \eta_\varepsilon(\frac{\log d}{\log x}) \sum_{n \in I: d|n} \nu_\varepsilon(n) a_n.

We let {\delta>0} be a small constant to be chosen later. We divide the outer sum into two ranges, depending on whether {d} only has prime factors greater than {x^\delta} or not. In the former case, we can apply (27) to write this contribution as

\displaystyle \sum_{d \leq x^{1-3\varepsilon}} \mu_{\vec k}(d) \log^{k_r} \frac{x}{d} \eta_\varepsilon(\frac{\log d}{\log x}) \frac{1+o(1)}{d} \frac{C|I|}{\varepsilon \log x} (\int_0^1 \psi'(u)^2\ du) G(1)

plus a negligible error, where the {d} is implicitly restricted to numbers with all prime factors greater than {x^\delta}. The main term is messy, but it is of the required form {C Q'_{\varepsilon,x} G(1)} up to an acceptable error, so there is no need to compute it any further. It remains to consider those {d} that have at least one prime factor less than {x^\delta}. Here we use (24) instead of (27) as well as Lemma 2 to dominate this contribution by

\displaystyle \sum_{d \leq x^{1-3\varepsilon}} O( \log^{|\vec k|} x \frac{C|I|}{\varepsilon \log x} \frac{\prod_{p|d} O( \min( \frac{\log p}{\varepsilon \log x}, 1 )^2 )}{d} )

up to negligible errors, where {d} is now restricted to have at least one prime factor less than {x^\delta}. This forces at least one of the factors {\min( \frac{\log p}{\varepsilon \log x}, 1 )} to be at most {O_\varepsilon(\delta)}. A routine application of Rankin’s trick shows that

\displaystyle \sum_{d \leq x^{1-3\varepsilon}} \frac{\prod_{p|d} O( \min( \frac{\log p}{\varepsilon \log x}, 1 ) )}{d} \ll_\varepsilon 1

and so the total contribution of this case is {O_\varepsilon((\delta+o(1)) |I| \log^{|\vec k|-1} x)}. Since {\delta>0} can be made arbitrarily small, (17) follows.

— 2. Weierstrass approximation —

Having proved Theorem 1, we now take linear combinations of this theorem, combined with the Weierstrass approximation theorem, to give the asymptotics (7), (8) described in the introduction.

Let {a_n}, {g}, {C}, {G} be as in that theorem. It will be convenient to normalise the weights {\Lambda_{\vec k}} by {L^{1-|\vec k|}} to make their mean value comparable to {1}. From Theorem 1 and summation by parts we have

\displaystyle \sum_{n \leq x} L^{1-|\vec k|} \Lambda_{\vec k}(n) a_n = (G(1)+o(1)) \frac{\prod_{i=1}^r k_i!}{(|\vec k|-1)!} C x \ \ \ \ \ (31)


whenever {\vec k} does not consist entirely of ones.

We now take a closer look at what happens when {\vec k} does consist entirely of ones. Let {1^r} denote the {r}-tuple {(1,\dots,1)}. Convolving the {k=1} case of (30) with {r-1} copies of {\Lambda} for some {r \geq 1} and using the Leibniz rule, we see that

\displaystyle \Lambda_{(1^{r-1}, 2)} = \frac{1}{r} L \Lambda_{1^r} + \Lambda_{1^{r+1}}

and hence

\displaystyle L^{-r} \Lambda_{1^{r+1}} = L^{-r} \Lambda_{(1^{r-1},2)} - \frac{1}{r} L^{1-r} \Lambda_{1^r}.

Multiplying by {a_n} and summing over {n \leq x}, and using (31) to control the {\Lambda_{(1^{r-1},2)}} term, one has

\displaystyle \sum_{n \leq x} L^{-r} \Lambda_{1^{r+1}}(n) a_n = (G(1)+o(1)) \frac{2}{r!} C x - \frac{1}{r} \sum_{n \leq x} L^{1-r} \Lambda_{1^{r}}(n) a_n.

If we define {\delta_x} (up to an error of {o(1)}) by the formula

\displaystyle \sum_{n \leq x} \Lambda(n) a_n = (\delta_x G(1) + o(1)) C x

then an induction shows that

\displaystyle \sum_{n \leq x} L^{1-r} \Lambda_{1^r}(n) a_n = \frac{1}{(r-1)!} (\delta_x G(1) + o(1)) C x

for odd {r}, and

\displaystyle \sum_{n \leq x} L^{1-r} \Lambda_{1^r}(n) a_n = \frac{1}{(r-1)!} ((2-\delta_x) G(1) + o(1)) C x

for even {r}. In particular, after adjusting {\delta_x} by {o(1)} if necessary, we have {0 \leq \delta_x \leq 2} since the left-hand sides are non-negative.

If we now define the comparison sequence {b_n := C G(1) (1 + (1-\delta_x) \mu(n))}, standard multiplicative number theory shows that the above estimates also hold when {a_n} is replaced by {b_n}; thus

\displaystyle \sum_{n \leq x} L^{1-r} \Lambda_{1^r}(n) a_n = \sum_{n \leq x} L^{1-r} \Lambda_{1^r}(n) b_n + o( x )

for both odd and even {r}. The bound (31) also holds for {b_n} when {\vec k} does not consist entirely of ones, and hence

\displaystyle \sum_{n \leq x} L^{1-|\vec k|} \Lambda_{\vec k}(n) a_n = \sum_{n \leq x} L^{1-|\vec k|} \Lambda_{\vec k}(n) b_n + o( x )

for any fixed {\vec k} (which may or may not consist entirely of ones).
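The {r=1} case of the claim about the comparison sequence {b_n} can be checked by hand or by machine: since {\mu(p)=-1} on primes and {\mu} vanishes on higher prime powers, {\sum_{n \leq x} \Lambda(n) b_n} should be close to {\delta_x C G(1) x}. Here is a small Python sketch of this check, with the normalisation {C = G(1) = 1} and a fixed value of {\delta} both taken as assumptions for illustration.

```python
import math

def weighted_prime_sum(x, delta):
    """sum_{n <= x} Lambda(n) (1 + (1 - delta) mu(n)), taking C = G(1) = 1."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    total = 0.0
    for p in range(2, x + 1):
        if not sieve[p]:
            continue
        sieve[p * p::p] = bytearray(len(range(p * p, x + 1, p)))
        logp = math.log(p)
        # n = p: Lambda = log p and mu = -1, so the weight is 1 - (1 - delta) = delta
        total += delta * logp
        # n = p^j with j >= 2: Lambda = log p and mu = 0, so the weight is 1
        q = p * p
        while q <= x:
            total += logp
            q *= p
    return total

if __name__ == "__main__":
    x = 10 ** 5
    for delta in (0.5, 1.0, 2.0):
        print(delta, weighted_prime_sum(x, delta) / x)
```

The printed ratios are close to {\delta}, with the discrepancy coming from the prime number theorem error term and from the prime powers (which contribute {O(\sqrt{x})}).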

Next, from induction (on {j_1+\dots+j_r}), the Leibniz rule, and (30), we see that for any {r \geq 1} and {j_1,\dots,j_r \geq 0}, {k_1,\dots,k_r}, the function

\displaystyle L^{1-j_1-\dots-j_r-|\vec k|} ((L^{j_1} \Lambda_{k_1}) * \dots * (L^{j_r} \Lambda_{k_r})) \ \ \ \ \ (32)


is a finite linear combination of functions of the form {L^{1-|\vec k'|} \Lambda_{\vec k'}} for tuples {\vec k'} that may possibly consist entirely of ones. We thus have

\displaystyle \sum_{n \leq x} f(n) a_n = \sum_{n \leq x}f(n) b_n + o( x )

whenever {f} is one of these functions (32). Specialising to the case {k_1=\dots=k_r=1}, we thus have

\displaystyle \sum_{n_1 \dots n_r \leq x} a_{n} \log^{1-r} n \prod_{i=1}^r (\log n_i/\log n)^{j_i} \Lambda(n_i)

\displaystyle = \sum_{n_1 \dots n_r \leq x} b_{n} \log^{1-r} n \prod_{i=1}^r (\log n_i/\log n)^{j_i} \Lambda(n_i) + o(x )

where {n := n_1 \dots n_r}. The contribution of those {n_i} that are powers of primes can be easily seen to be negligible, leading to

\displaystyle \sum_{p_1 \dots p_r \leq x} a_{n} \log n \prod_{i=1}^r (\log p_i/\log n)^{j_i+1}

\displaystyle = \sum_{p_1 \dots p_r \leq x} b_{n} \log n \prod_{i=1}^r (\log p_i/\log n)^{j_i+1} + o(x)

where now {n := p_1 \dots p_r}. The contribution of the case where two of the primes {p_i} agree can also be seen to be negligible, as can the error when replacing {\log n} with {\log x}, and then by symmetry

\displaystyle \sum_{p_1 \dots p_r \leq x: p_1 < \dots < p_r} a_{n} \prod_{i=1}^r (\log p_i/\log n)^{j_i+1}

\displaystyle = \sum_{p_1 \dots p_r \leq x: p_1 < \dots < p_r} b_{n} \prod_{i=1}^r (\log p_i/\log n)^{j_i+1} + o(x / \log x).

By linearity, this implies that

\displaystyle \sum_{p_1 \dots p_r \leq x: p_1 < \dots < p_r} a_{n} P( \log p_1/\log n, \dots, \log p_r/\log n)

\displaystyle = \sum_{p_1 \dots p_r \leq x: p_1 < \dots < p_r} b_{n} P( \log p_1/\log n, \dots, \log p_r/\log n) + o(x / \log x)

for any polynomial {P(t_1,\dots,t_r)} that vanishes on the coordinate hyperplanes {t_i=0}. The right-hand side can also be evaluated by Mertens’ theorem as

\displaystyle CG(1) \delta_x \int_{\Delta_r} P x + o(x)

when {r} is odd and

\displaystyle CG(1) (2-\delta_x) \int_{\Delta_r} P x + o(x)

when {r} is even. Using the Weierstrass approximation theorem, we then have

\displaystyle \sum_{p_1 \dots p_r \leq x: p_1 < \dots < p_r} a_{n} g_r( \log p_1/\log n, \dots, \log p_r/\log n)

\displaystyle = \sum_{p_1 \dots p_r \leq x: p_1 < \dots < p_r} b_{n} g_r( \log p_1/\log n, \dots, \log p_r/\log n) + o(x / \log x)

for any continuous function {g_r} that is compactly supported in the interior of {\Delta_r}. Computing the right-hand side using Mertens’ theorem as before, we obtain the claimed asymptotics (7), (8).

Remark 4 The Bombieri asymptotic sieve has to use the full power of EH (or GEH); there are constructions due to Ford that show that if one only has a distributional hypothesis up to {x^{1-c}} for some fixed constant {c>0}, then the asymptotics of sums such as (5), or more generally (9), are not determined by a single scalar parameter {\delta_x}, but can also vary in other ways. Thus the Bombieri asymptotic sieve really is asymptotic; in order to get {o(1)} type error terms one needs the level {1-\varepsilon} of distribution to be asymptotically equal to {1} as {x \rightarrow \infty}. Related to this, the quantitative decay of the {o(1)} error terms in the Bombieri asymptotic sieve is extremely poor; in particular, it depends on the dependence of the implied constant in axiom (iv) on the parameters {\varepsilon,A}, for which there is no consensus on what one should conjecturally expect.

Filed under: expository, math.NT Tagged: Bombieri sieve, parity problem, sieve theory, twin primes

ParticlebitesJets: From Energy Deposits to Physics Objects


  • Title: “Jet energy scale and resolution in the CMS experiment in pp collisions at 8 TeV”
  • Author: The CMS Collaboration
  • Reference: arXiv:hep-ex:1607.03663v1.pdf


As a collider physicist, I care a lot about jets. They are fascinating objects that cover the ATLAS and CMS detectors during LHC operation and make event displays look really cool (see Figure 1.) Unfortunately, as interesting as jets are, they’re also somewhat complicated and difficult to measure. A recent paper from the CMS Collaboration details exactly how we reconstruct, simulate, and calibrate these objects.


Figure 1: This event was collected in August 2015. The two high-pT jets have an invariant mass of 6.9 TeV and the leading and subleading jet have a pT of 1.3 and 1.2 TeV respectively. (Image credit: ATLAS public results)

For the uninitiated, a jet is the experimental signature of quarks or gluons that emerge from a high energy particle collision. Since these colored Standard Model particles cannot exist on their own due to confinement, they cluster or ‘hadronize’ as they move through a detector. The result is a spray of particles coming from the interaction point. This spray can contain mesons, charged and neutral hadrons, basically anything that is colorless as per the rules of QCD.

So what does this mess actually look like in a detector? ATLAS and CMS are designed to absorb most of a jet’s energy by the end of the calorimeters. If the jet has charged constituents, there will also be an associated signal in the tracker. It is then the job of the reconstruction algorithm to combine these various signals into a single object that makes sense. This paper discusses two different reconstructed jet types: calo jets and particle-flow (PF) jets. Calo jets are built only from energy deposits in the calorimeter; since the relative energy resolution of the calorimeter is poor at low energies, this method struggles most for low-momentum jets. PF jets, on the other hand, are reconstructed by linking energy clusters in the calorimeters with signals in the trackers to create a complete picture of the object at the individual particle level. PF jets generally enjoy better momentum and spatial resolutions, especially at low energies (see Figure 2).


Figure 2: Jet-energy resolution for calorimeter and particle-flow jets as a function of the jet transverse momentum. The improvement in resolution, of almost a factor of two at low transverse momentum, remains sizable even for jets with very high transverse momentum. (Image credit: CMS Collaboration)

Once reconstruction is done, we have a set of objects that we can now call jets. But we don’t want to keep all of them for real physics. Any given event will have a large number of pile up jets, which come from softer collisions between other protons in a bunch (in time), or leftover calorimeter signals from the previous bunch crossing (out of time). Being able to identify and subtract pile up considerably enhances our ability to calibrate the deposits that we know came from good physics objects. In this paper CMS reports a pile up reconstruction and identification efficiency of nearly 100% for hard scattering events, and they estimate that each jet energy is enhanced by about 10 GeV due to pileup alone.

Once the pile up is corrected, the overall jet energy correction (JEC) is determined via detector response simulation. The simulation is needed to model how the initial quarks and gluons fragment, and how the resulting particles shower in the calorimeters. This correction depends on jet momentum (since the calorimeter resolution does as well) and on jet pseudorapidity (different areas of the detector are made of different materials or have different total thickness). Figure 3 shows the overall correction factors for several different jet radius R values.
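To make the two-step logic concrete, here is a toy Python sketch: subtract a pileup offset, then scale by a response correction factor. This is only a cartoon of the idea, not the actual CMS procedure; the function name, the flat 10 GeV offset (borrowed from the rough estimate quoted above) and the correction factor of 1.1 are all illustrative assumptions.

```python
def correct_jet_pt(raw_pt, pileup_offset=10.0, response_correction=1.1):
    """Toy two-step jet correction; all numbers are illustrative, in GeV.

    Step 1: remove the average extra energy attributed to pileup.
    Step 2: rescale by a factor standing in for the simulated detector
            response (in reality this depends on jet pT and pseudorapidity).
    """
    pt_without_pileup = max(raw_pt - pileup_offset, 0.0)
    return pt_without_pileup * response_correction

if __name__ == "__main__":
    for raw in (30.0, 50.0, 200.0):
        print(raw, correct_jet_pt(raw))
```

In a real analysis both the offset and the correction factor would be looked up as functions of the jet kinematics rather than passed in as constants.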


Figure 3: Jet energy correction factors for a jet with pT = 30 GeV, as a function of eta (left). Note the spikes around 1.7 (TileGap3, very little absorber material) and 3 (beginning of endcaps.) Simulated jet energy response after JEC as a function of pT (right).

Finally, we turn to data as a final check on how well these calibrations went. An example of such a check is the tag and probe method with dijet events. Here, we take a good clean event with two back-to-back jets, and ask for one low eta jet for a ‘tag’ jet. The other ‘probe’ jet, at arbitrary eta, is then measured using the previously derived corrections. If the resulting pT is close to the pT of the tag jet, we know the calibration was solid (this also gives us info on how calibrations perform as a function of eta.) A similar method known as pT balancing can be done with a single jet back to back with an easily reconstructed object, such as a Z boson or a photon.
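As a rough illustration of the tag and probe idea, the Python sketch below takes a list of dijet events, each given as a (tag pT, probe pT) pair, and computes the average probe-to-tag ratio. The function name and the event values are made up for this example, and a real measurement would bin in eta and pT and treat resolution and radiation effects much more carefully.

```python
from statistics import mean

def relative_response(dijet_events):
    """Mean probe/tag pT ratio over (tag_pt, probe_pt) pairs, pT in GeV.

    A value close to 1 suggests that the probe jets are calibrated
    consistently with the central tag jets.
    """
    return mean(probe_pt / tag_pt for tag_pt, probe_pt in dijet_events)

if __name__ == "__main__":
    # hypothetical back-to-back dijet events: (tag pT, probe pT)
    events = [(100.0, 98.0), (200.0, 202.0), (50.0, 49.0)]
    print(relative_response(events))
```

Repeating this for probe jets in different eta ranges would give the response as a function of eta mentioned above.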

This is really a bare bones outline of how jet calibration is done. In real life, there are systematic uncertainties, jet flavor dependence, correlations; the list goes on. But the entire procedure works remarkably well given the complexity of the task. Ultimately CMS reports a jet energy uncertainty of 3% for most physics analysis jets, and as low as 0.32% for some jets—a new benchmark for hadron colliders!


Further Reading:

  1. “Jets: The Manifestation of Quarks and Gluons.” Of Particular Significance, Matt Strassler.
  2. “Commissioning of the Particle-flow Event Reconstruction with the first LHC collisions recorded in the CMS detector.” The CMS Collaboration, CMS PAS PFT-10-001.
  3. “Determination of jet energy calibrations and transverse momentum resolution in CMS.” The CMS Collaboration, 2011 JINST 6 P11002.

July 18, 2016

Sean CarrollSpace Emerging from Quantum Mechanics

The other day I was amused to find a quote from Einstein, in 1936, about how hard it would be to quantize gravity: “like an attempt to breathe in empty space.” Eight decades later, I think we can still agree that it’s hard.

So here is a possibility worth considering: rather than quantizing gravity, maybe we should try to gravitize quantum mechanics. Or, more accurately but less evocatively, “find gravity inside quantum mechanics.” Rather than starting with some essentially classical view of gravity and “quantizing” it, we might imagine starting with a quantum view of reality from the start, and find the ordinary three-dimensional space in which we live somehow emerging from quantum information. That’s the project that ChunJun (Charles) Cao, Spyridon (Spiros) Michalakis, and I take a few tentative steps toward in a new paper.

We human beings, even those who have been studying quantum mechanics for a long time, still think in terms of classical concepts. Positions, momenta, particles, fields, space itself. Quantum mechanics tells a different story. The quantum state of the universe is not a collection of things distributed through space, but something called a wave function. The wave function gives us a way of calculating the outcomes of measurements: whenever we measure an observable quantity like the position or momentum or spin of a particle, the wave function has a value for every possible outcome, and the probability of obtaining that outcome is given by the wave function squared. Indeed, that’s typically how we construct wave functions in practice. Start with some classical-sounding notion like “the position of a particle” or “the amplitude of a field,” and to each possible value we attach a complex number. That complex number, squared, gives us the probability of observing the system with that observed value.
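Here is what “attach a complex number to each outcome, then square it” looks like as a few lines of Python. This is just a sketch with made-up amplitudes for a system with three possible outcomes; the function name is my own, and “squared” here means the squared absolute value of the complex number, after normalizing so that the probabilities add up to one.

```python
import math

def born_probabilities(amplitudes):
    """Turn a list of complex amplitudes into outcome probabilities."""
    norm = math.sqrt(sum(abs(a) ** 2 for a in amplitudes))
    # normalize the wave function, then take |amplitude|^2 for each outcome
    return [abs(a / norm) ** 2 for a in amplitudes]

if __name__ == "__main__":
    # hypothetical amplitudes for three possible measurement outcomes
    wave_function = [1 + 1j, 1 - 1j, 2j]
    print(born_probabilities(wave_function))
```

Note that the first two amplitudes differ only by a phase, so they give the same probability.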

Mathematically, wave functions are elements of a mathematical structure called Hilbert space. That means they are vectors — we can add quantum states together (the origin of superpositions in quantum mechanics) and calculate the angle (“dot product”) between them. (We’re skipping over some technicalities here, especially regarding complex numbers — see e.g. The Theoretical Minimum for more.) The word “space” in “Hilbert space” doesn’t mean the good old three-dimensional space we walk through every day, or even the four-dimensional spacetime of relativity. It’s just math-speak for “a collection of things,” in this case “possible quantum states of the universe.”

Hilbert space is quite an abstract thing, which can seem at times pretty removed from the tangible phenomena of our everyday lives. This leads some people to wonder whether we need to supplement ordinary quantum mechanics by additional new variables, or alternatively to imagine that wave functions reflect our knowledge of the world, rather than being representations of reality. For purposes of this post I’ll take the straightforward view that quantum mechanics says that the real world is best described by a wave function, an element of Hilbert space, evolving through time. (Of course time could be emergent too … something for another day.)

Here’s the thing: we can construct a Hilbert space by starting with a classical idea like “all possible positions of a particle” and attaching a complex number to each value, obtaining a wave function. All the conceivable wave functions of that form constitute the Hilbert space we’re interested in. But we don’t have to do it that way. As Einstein might have said, God doesn’t do it that way. Once we make wave functions by quantizing some classical system, we have states that live in Hilbert space. At this point it essentially doesn’t matter where we came from; now we’re in Hilbert space and we’ve left our classical starting point behind. Indeed, it’s well-known that very different classical theories lead to the same theory when we quantize them, and likewise some quantum theories don’t have classical predecessors at all.

The real world simply is quantum-mechanical from the start; it’s not a quantization of some classical system. The universe is described by an element of Hilbert space. All of our usual classical notions should be derived from that, not the other way around. Even space itself. We think of the space through which we move as one of the most basic and irreducible constituents of the real world, but it might be better thought of as an approximate notion that emerges at large distances and low energies.

So here is the task we set for ourselves: start with a quantum state in Hilbert space. Not a random or generic state, admittedly; a particular kind of state. Divide Hilbert space up into pieces — technically, factors that we multiply together to make the whole space. Use quantum information — in particular, the amount of entanglement between different parts of the state, as measured by the mutual information — to define a “distance” between them. Parts that are highly entangled are considered to be nearby, while unentangled parts are far away. This gives us a graph, in which vertices are the different parts of Hilbert space, and the edges are weighted by the emergent distance between them.
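To make the recipe concrete, here is a minimal toy sketch in Python (NumPy assumed). It is not the construction from the paper: the three-qubit state, the function names, and in particular the map from mutual information to distance, d = -log(I / I_max), are assumptions chosen purely for illustration. It computes the mutual information I(A:B) = S(A) + S(B) - S(AB) between each pair of factors and uses it to weight the edges of a graph.

```python
import numpy as np

def partial_trace(rho, keep, n):
    """Reduced density matrix of the qubits in `keep`, out of n qubits."""
    keep = sorted(keep)
    traced = [q for q in range(n) if q not in keep]
    # One row index and one column index per qubit.
    t = rho.reshape([2] * (2 * n))
    # Order: kept rows, kept columns, traced rows, traced columns.
    order = keep + [q + n for q in keep] + traced + [q + n for q in traced]
    dk, dt = 2 ** len(keep), 2 ** len(traced)
    t = t.transpose(order).reshape(dk, dk, dt, dt)
    return np.einsum("abii->ab", t)

def entropy(rho):
    """Von Neumann entropy in bits."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))

def mutual_information(rho, a, b, n):
    s_a = entropy(partial_trace(rho, [a], n))
    s_b = entropy(partial_trace(rho, [b], n))
    s_ab = entropy(partial_trace(rho, [a, b], n))
    return s_a + s_b - s_ab

def distance(mi, mi_max=2.0):
    """Assumed illustrative map: more mutual information means closer."""
    return float("inf") if mi <= 1e-12 else float(-np.log(mi / mi_max))

# Qubits 0 and 1 are partially entangled; qubit 2 is unentangled with both.
theta = np.pi / 8
pair = np.zeros(4)
pair[0], pair[3] = np.cos(theta), np.sin(theta)
psi = np.kron(pair, np.array([1.0, 0.0]))
rho = np.outer(psi, psi.conj())
n = 3

# Weighted graph: vertices are the factors, edge weights are the distances.
weights = {
    (a, b): distance(mutual_information(rho, a, b, n))
    for a in range(n)
    for b in range(a + 1, n)
}
print(weights)
```

In this toy state the entangled pair ends up at a finite distance, while the unentangled qubit is infinitely far from both, which is the qualitative behavior described above.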


We can then ask two questions:

  1. When we zoom out, does the graph take on the geometry of a smooth, flat space with a fixed number of dimensions? (Answer: yes, when we put in the right kind of state to start with.)
  2. If we perturb the state a little bit, how does the emergent geometry change? (Answer: space curves in response to emergent mass/energy, in a way reminiscent of Einstein’s equation in general relativity.)

It’s that last bit that is most exciting, but also most speculative. The claim, in its most dramatic-sounding form, is that gravity (spacetime curvature caused by energy/momentum) isn’t hard to obtain in quantum mechanics — it’s automatic! Or at least, the most natural thing to expect. If geometry is defined by entanglement and quantum information, then perturbing the state (e.g. by adding energy) naturally changes that geometry. And if the model matches onto an emergent field theory at large distances, the most natural relationship between energy and curvature is given by Einstein’s equation. The optimistic view is that gravity just pops out effortlessly in the classical limit of an appropriate quantum system. But the devil is in the details, and there’s a long way to go before we can declare victory.

Here’s the abstract for our paper:

Space from Hilbert Space: Recovering Geometry from Bulk Entanglement
ChunJun Cao, Sean M. Carroll, Spyridon Michalakis

We examine how to construct a spatial manifold and its geometry from the entanglement structure of an abstract quantum state in Hilbert space. Given a decomposition of Hilbert space H into a tensor product of factors, we consider a class of “redundancy-constrained states” in H that generalize the area-law behavior for entanglement entropy usually found in condensed-matter systems with gapped local Hamiltonians. Using mutual information to define a distance measure on the graph, we employ classical multidimensional scaling to extract the best-fit spatial dimensionality of the emergent geometry. We then show that entanglement perturbations on such emergent geometries naturally give rise to local modifications of spatial curvature which obey a (spatial) analog of Einstein’s equation. The Hilbert space corresponding to a region of flat space is finite-dimensional and scales as the volume, though the entropy (and the maximum change thereof) scales like the area of the boundary. A version of the ER=EPR conjecture is recovered, in that perturbations that entangle distant parts of the emergent geometry generate a configuration that may be considered as a highly quantum wormhole.
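The “classical multidimensional scaling” step mentioned in the abstract can be illustrated with a toy example. The sketch below (Python with NumPy) is only an illustration under stated assumptions: the function names and the eigenvalue threshold are made up for this example, and the input is an ordinary grid of points rather than distances derived from entanglement. Given nothing but a matrix of pairwise distances, it double-centers the squared distances and counts the eigenvalues that are significantly positive; that count is the best-fit dimension.

```python
import numpy as np

def classical_mds_eigenvalues(dist):
    """Eigenvalues (descending) of the double-centered squared-distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    return np.sort(np.linalg.eigvalsh(gram))[::-1]

def estimated_dimension(dist, rel_tol=1e-6):
    """Number of eigenvalues above rel_tol times the largest one (assumed cutoff)."""
    w = classical_mds_eigenvalues(dist)
    return int(np.sum(w > rel_tol * w[0]))

# Toy input: a 5x5 grid of points, from which we keep only the distances.
xs, ys = np.meshgrid(np.arange(5.0), np.arange(5.0))
points = np.column_stack([xs.ravel(), ys.ravel()])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

print(estimated_dimension(dist))
```

For exactly Euclidean input like this grid, only two eigenvalues are nonzero, so the grid is recovered as two-dimensional. For distances that are only approximately Euclidean, the choice of cutoff matters and would need more care than this sketch gives it.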

Like almost any physics paper, we’re building on ideas that have come before. The idea that spacetime geometry is related to entanglement has become increasingly popular, although it’s mostly been explored in the holographic context of the AdS/CFT correspondence; here we’re working directly in the “bulk” region of space, not appealing to a faraway boundary. A related notion is the ER=EPR conjecture of Maldacena and Susskind, relating entanglement to wormholes. In some sense, we’re making this proposal a bit more specific, by giving a formula for distance as a function of entanglement. The relationship of geometry to energy comes from something called the Entanglement First Law, articulated by Faulkner et al., and used by Ted Jacobson in a version of entropic gravity. But as far as we know we’re the first to start directly from Hilbert space, rather than assuming classical variables, a boundary, or a background spacetime. (There’s an enormous amount of work that has been done in closely related areas, obviously, so I’d love to hear about anything in particular that we should know about.)

We’re quick to admit that what we’ve done here is extremely preliminary and conjectural. We don’t have a full theory of anything, and even what we do have involves a great deal of speculating and not yet enough rigorous calculating.

Most importantly, we’ve assumed that parts of Hilbert space that are highly entangled are also “nearby,” but we haven’t actually derived that fact. It’s certainly what should happen, according to our current understanding of quantum field theory. It might seem like entangled particles can be as far apart as you like, but the contribution of particles to the overall entanglement is almost completely negligible — it’s the quantum vacuum itself that carries almost all of the entanglement, and that’s how we derive our geometry.

But it remains to be seen whether this notion really matches what we think of as “distance.” To do that, it’s not sufficient to talk about space, we also need to talk about time, and how states evolve. That’s an obvious next step, but one we’ve just begun to think about. It raises a variety of intimidating questions. What is the appropriate Hamiltonian that actually generates time evolution? Is time fundamental and continuous, or emergent and discrete? Can we derive an emergent theory that includes not only curved space and time, but other quantum fields? Will those fields satisfy the relativistic condition of being invariant under Lorentz transformations? Will gravity, in particular, have propagating degrees of freedom corresponding to spin-2 gravitons? (And only one kind of graviton, coupled universally to energy-momentum?) Full employment for the immediate future.

Perhaps the most interesting and provocative feature of what we’ve done is that we start from an assumption that the degrees of freedom corresponding to any particular region of space are described by a finite-dimensional Hilbert space. In some sense this is natural, as it follows from the Bekenstein bound (on the total entropy that can fit in a region) or the holographic principle (which limits degrees of freedom by the area of the boundary of their region). But on the other hand, it’s completely contrary to what we’re used to thinking about from quantum field theory, which generally assumes that the number of degrees of freedom in any region of space is infinitely big, corresponding to an infinite-dimensional Hilbert space. (By itself that’s not so worrisome; a single simple harmonic oscillator is described by an infinite-dimensional Hilbert space, just because its energy can be arbitrarily large.) People like Jacobson and Seth Lloyd have argued, on pretty general grounds, that any theory with gravity will locally be described by finite-dimensional Hilbert spaces.

That’s a big deal, if true, and I don’t think we physicists have really absorbed the consequences of the idea as yet. Field theory is embedded in how we think about the world; all of the notorious infinities of particle physics that we work so hard to renormalize away owe their existence to the fact that there are an infinite number of degrees of freedom. A finite-dimensional Hilbert space describes a very different world indeed. In many ways, it’s a much simpler world — one that should be easier to understand. We shall see.

Part of me thinks that a picture along these lines — geometry emerging from quantum information, obeying a version of Einstein’s equation in the classical limit — pretty much has to be true, if you believe (1) regions of space have a finite number of degrees of freedom, and (2) the world is described by a wave function in Hilbert space. Those are fairly reasonable postulates, all by themselves, but of course there could be any number of twists and turns to get where we want to go, if indeed it’s possible. Personally I think the prospects are exciting, and I’m eager to see where these ideas lead us.

John BaezFrigatebirds


Frigatebirds are amazing!

They have the largest ratio of wing area to body weight of any bird. This lets them fly very long distances while only rarely flapping their wings. They often stay in the air for weeks at a time. And one being tracked by satellite in the Indian Ocean stayed aloft for two months.

Surprisingly for sea birds, they don’t go into the water. Their feathers aren’t waterproof. They are true creatures of the air. They snatch fish from the ocean surface using their long, hooked bills—and they often eat flying fish! They clean themselves in flight by flying low and wetting themselves at the water’s surface before preening themselves.

They live a long time: often over 35 years.

But here’s the cool new discovery:

Since the frigatebird spends most of its life at sea, its habits outside of when it breeds on land aren’t well-known—until researchers started tracking them around the Indian Ocean. What the researchers discovered is that the birds’ flying ability almost defies belief.

Ornithologist Henri Weimerskirch put satellite tags on a couple of dozen frigatebirds, as well as instruments that measured body functions such as heart rate. When the data started to come in, he could hardly believe how high the birds flew.

“First, we found, ‘Whoa, 1,500 meters. Wow. Excellent, fantastique,’ ” says Weimerskirch, who is with the National Center for Scientific Research in Paris. “And after 2,000, after 3,000, after 4,000 meters — OK, at this altitude they are in freezing conditions, especially surprising for a tropical bird.”

Four thousand meters is more than 12,000 feet, or as high as parts of the Rocky Mountains. “There is no other bird flying so high relative to the sea surface,” he says.

Weimerskirch says that kind of flying should take a huge amount of energy. But the instruments monitoring the birds’ heartbeats showed that the birds weren’t even working up a sweat. (They wouldn’t, actually, since birds don’t sweat, but their heart rate wasn’t going up.)

How did they do it? By flying into a cloud.

“It’s the only bird that is known to intentionally enter into a cloud,” Weimerskirch says. And not just any cloud—a fluffy, white cumulus cloud. Over the ocean, these clouds tend to form in places where warm air rises from the sea surface. The birds hitch a ride on the updraft, all the way up to the top of the cloud.


“Absolutely incredible,” says Curtis Deutsch, an oceanographer at the University of Washington. “They’re doing it right through these cumulus clouds. You know, if you’ve ever been on an airplane, flying through turbulence, you know it can be a little bit nerve-wracking.”

One of the tagged birds soared 40 miles without a wing-flap. Several covered more than 300 miles a day on average, and flew continuously for weeks.

• Christopher Joyce, Nonstop flight: how the frigatebird can soar for weeks without stopping, All Things Considered, National Public Radio, 30 June 2016.

Frigatebirds aren’t admirable in every way. They’re kleptoparasites—now there’s a word you don’t hear every day! That’s a name for animals that steal food:

Frigatebirds will rob other seabirds such as boobies, particularly the red-footed booby, tropicbirds, shearwaters, petrels, terns, gulls and even ospreys of their catch, using their speed and maneuverability to outrun and harass their victims until they regurgitate their stomach contents. They may either assail their targets after they have caught their food or circle high over seabird colonies waiting for parent birds to return laden with food.

• Frigatebird, Wikipedia.

BackreactionCan black holes tunnel to white holes?

Tl;dr: Yes, but it’s unlikely.

If black holes attract your attention, white holes might blow your mind.

A white hole is a time-reversed black hole, an anti-collapse. While a black hole contains a region from which nothing can escape, a white hole contains a region into which nothing can fall. Since the time-reversal of a solution of General Relativity is another solution, we know that white holes exist mathematically. But are they real?

Black holes were originally believed to merely be of mathematical interest, solutions that exist but cannot come into being in the natural world. As physicists understood more about General Relativity, however, the exact opposite turned out to be the case: It is hard to avoid black holes. They generically form from matter that collapses under its own gravitational pull. Today it is widely accepted that the black hole solutions of General Relativity describe to high accuracy astrophysical objects which we observe in the real universe.

The simplest black hole solutions in General Relativity are the Schwarzschild-solutions, or their generalizations to rotating and electrically charged black holes. These solutions however are not physically realistic because they are entirely time-independent, which means such black holes must have existed forever. Schwarzschild black holes, since they are time-reversal invariant, also necessarily come together with a white hole. Realistic black holes, on the contrary, which are formed from collapsing matter, do not have to be paired with white holes.

(Aside: Karl Schwarzschild was German. Schwarz means black, Schild means shield. Probably a family crest. It’s got nothing to do with children.)

But there are many things we don’t understand about black holes, most prominently how they handle information of the matter that falls in. Solving the black hole information loss problem requires that information finds a way out of the black hole, and this could be done for example by flipping a black hole over to a white hole. In this case the collapse would not complete, and instead the black hole would burst, releasing all that it had previously swallowed.

It’s an intriguing and simple option. This black-to-white-hole transition has been discussed in the literature for some time, recently by Rovelli and Vidotto in the Planck star idea. It’s also the subject of last week’s paper by Barcelo and Carballo-Rubio.

Is this a plausible solution to the black hole information loss problem?

It is certainly possible to join part of the black hole solution with part of the white hole solution. But doing this brings some problems.

The first problem is that at the junction the matter must get a kick that transfers it from one state into the other. This kick cannot be achieved by any known physics – we know this from the singularity theorems. There isn’t anything in known physics that can prevent a black hole from collapsing entirely once the horizon is formed. Whatever makes this kick hence needs to violate one of the energy conditions; it must be new physics.

Something like this could happen in a region with quantum gravitational effects. But this region is normally confined to deep inside the black hole. A transition to a white hole could therefore happen, but only if the black hole is very small, for example because it has evaporated for a long time.

But this isn’t the only problem.

Before we think about the stability of black holes, let us think about a simpler question. Why doesn’t dough unmix into eggs and flour and sugar neatly separated? Because that would require an entropy decrease. The unmixing can happen, but it’s exceedingly unlikely, hence we never see it.

A black hole too has entropy. It has indeed enormous entropy. It saturates the possible entropy that can be contained within a closed surface. If matter collapses to a black hole, that’s a very likely process to happen. Consequently, if you time-reverse this collapse, you get an exceedingly unlikely process. This solution exists, but it’s not going to happen unless the black hole is extremely tiny, close to the Planck scale.
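To put a rough number on “enormous”: the Bekenstein-Hawking entropy of a Schwarzschild black hole is S = k_B A c^3 / (4 G ħ), where A is the horizon area. The snippet below is just a back-of-the-envelope sketch in Python; the constants are rounded, the function names are made up, and the solar-mass example is chosen only for illustration.

```python
import math

# Rounded SI values.
G = 6.674e-11       # gravitational constant, m^3 kg^-1 s^-2
HBAR = 1.0546e-34   # reduced Planck constant, J s
C = 2.998e8         # speed of light, m/s
M_SUN = 1.989e30    # solar mass, kg

def horizon_area(mass):
    """Horizon area of a Schwarzschild black hole of the given mass (m^2)."""
    r_s = 2.0 * G * mass / C ** 2
    return 4.0 * math.pi * r_s ** 2

def entropy_over_k_b(mass):
    """Bekenstein-Hawking entropy in units of Boltzmann's constant."""
    return horizon_area(mass) * C ** 3 / (4.0 * G * HBAR)

print(f"{entropy_over_k_b(M_SUN):.2e}")
```

For one solar mass this comes out at about 10^77 in units of k_B. Since the entropy scales with the square of the mass, it drops quickly for small black holes, which is why the time-reversed process only stops being absurdly unlikely near the Planck scale.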

It is possible that the white hole which a black hole supposedly turns into is not the exact time-reverse, but instead another solution that further increases entropy. But in that case I don’t know where this solution comes from. And even so I would suspect that the kick required at the junction must be extremely fine-tuned. And either way, it’s not a problem I’ve seen addressed in the literature. (If anybody knows a reference, please let me know.)

In a paper written for the 2016 Awards for Essays on Gravitation, Haggard and Rovelli make an argument in favor of their idea, but instead they just highlight the problem with it. They claim that small quantum fluctuations around the semi-classical limit which is General Relativity can add up over time, eventually resulting in large deviations. Yes, this can happen. But the probability that this happens is tiny, otherwise the semi-classical limit wouldn’t be the semi-classical limit.

The most likely thing to happen instead is that quantum fluctuations average out to give back the semi-classical limit. Hence, no white-hole transition. For the black-to-white-hole transition one would need quantum fluctuations to conspire together in just the right way. That’s possible. But it’s exceedingly unlikely.

In the other recent paper the authors find a surprisingly large transition rate for black to white holes. But they use a highly symmetrized configuration with very few degrees of freedom. This must vastly overestimate the probability for transition. It’s an interesting mathematical example, but it has very little to do with real black holes out there.

In summary: That black holes transition to white holes and in this way release information is an appealing idea because of its simplicity. But I remain unconvinced because I am missing a good argument demonstrating that such a process is likely to happen.

July 17, 2016

Scott AaronsonThe Complexity of Quantum States and Transformations: From Quantum Money to Black Holes

On February 21-25, I taught a weeklong mini-course at the Bellairs Research Institute in Barbados, where I tried to tell an integrated story about everything from quantum proof and advice complexity classes to quantum money to AdS/CFT and the firewall problem—all through the unifying lens of quantum circuit complexity.  After a long effort—on the part of me, the scribes, the guest lecturers, and the organizers—the 111-page lecture notes are finally available, right here.

Here’s the summary:

This mini-course will introduce participants to an exciting frontier for quantum computing theory: namely, questions involving the computational complexity of preparing a certain quantum state or applying a certain unitary transformation. Traditionally, such questions were considered in the context of the Nonabelian Hidden Subgroup Problem and quantum interactive proof systems, but they are much broader than that. One important application is the problem of “public-key quantum money” – that is, quantum states that can be authenticated by anyone, but only created or copied by a central bank – as well as related problems such as copy-protected quantum software. A second, very recent application involves the black-hole information paradox, where physicists realized that for certain conceptual puzzles in quantum gravity, they needed to know whether certain states and operations had exponential quantum circuit complexity. These two applications (quantum money and quantum gravity) even turn out to have connections to each other! A recurring theme of the course will be the quest to relate these novel problems to more traditional computational problems, so that one can say, for example, “this quantum money is hard to counterfeit if that cryptosystem is secure,” or “this state is hard to prepare if PSPACE is not in PP/poly.” Numerous open problems and research directions will be suggested, many requiring only minimal quantum background. Some previous exposure to quantum computing and information will be assumed, but a brief review will be provided.

If you still haven’t decided whether to tackle this thing: it’s basically a quantum complexity theory textbook (well, a textbook for certain themes within quantum complexity theory) that I’ve written and put on the Internet for free.  It has explanations of lots of published results both old and new, but also some results of mine (e.g., about private-key quantum money, firewalls, and AdS/CFT) that I shamefully haven’t yet written up as papers, and that therefore aren’t currently available anywhere else.  If you’re interested in certain specific topics—for example, only quantum money, or only firewalls—you should be able to skip around in the notes without too much difficulty.

Thanks so much to Denis Therien for organizing the mini-course, Anil Ada for managing the scribe notes effort, my PhD students Adam Bouland and Luke Schaeffer for their special guest lecture (the last one), and finally, the course attendees for their constant questions and interruptions, and (of course) for scribing.

And in case you were wondering: yes, I’ll do absolutely anything for science, even if it means teaching a weeklong course in Barbados!  Lest you consider this a pure island boondoggle, please know that I spent probably 12-14 hours per day either lecturing (in two 3-hour installments) or preparing for the lectures, with little sleep and just occasional dips in the ocean.

And now I’m headed to the Perimeter Institute for their It from Qubit summer school, not at all unrelated to my Barbados lectures.  This time, though, it’s thankfully other people’s turns to lecture…

David HoggGaia DR-1

Today Coryn Bailer-Jones (MPIA) gave a great talk at MPIA about the upcoming first data release from the Gaia mission. It will be called “Gaia DR-1” and it will contain a primary release of the TGAS sample of about 2 million stars with proper motions and parallaxes, and a secondary release of about a billion stars with just positions. He walked us through why the DR-1 data will be very limited relative to later releases. But (as my loyal reader knows) I am stoked! The talk was clear and complete, and gives us all lots to think about.

In the morning, Hans-Walter Rix and I met with Karin Lind (MPIA), Maria Bergemann (MPIA), Sven Buder (MPIA), Morgan Fouesneau (MPIA), and Joachim Bestenlehner (MPIA) to talk about how we combine Gaia parallaxes with optical and infrared spectroscopy to determine stellar parameters. The meeting was inspired by inconsistencies between how we expected people to use the spectral and astrometric data. We learned in the meeting that there are simple ways forward to improve both the spectral analysis and things that will be done with Gaia. I showed the crowd my probabilistic graphical model (directed acyclic graph) and we used it to guide the discussion. Almost as useful as showing the PGM for the full problem was showing the PGMs for what is being done now. As my loyal reader knows, I think PGMs are great for communicating about problem structure, and planning code.

David HoggABC is hard; The Cannon with missing labels

I spent some time discussing ABC today with Joe Hennawi (MPIA) and Fred Davies (MPIA), with some help from Dan Foreman-Mackey. The context is the transmission of the IGM (the forest) at very high redshift. We discussed the distance metric to use when you are comparing two distributions, and I suggested the K-S statistic. I suggested this not because I love it, but because there is experience in the literature with it. For ABC to work (I think) all you need is that the distance metric go to zero if and only if the data statistics equal the simulation statistics, and that the metric be convex (which perhaps is implied in the word “distance metric”; I'm not sure about that). That said, the ease with which you can ABC sample depends strongly on the choice (and details within that choice). There is a lot of art to the ABC method. We don't expect the sampling in the Hennawi–Davies problem to be easy.
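For concreteness, here is a toy sketch of rejection ABC with a K-S distance, in Python with NumPy. Everything in it is an assumption made for illustration (a Gaussian simulator with unknown mean, a flat prior, the tolerance `eps`, the function names); it is not the IGM transmission problem and not anyone's actual code.

```python
import numpy as np

def ks_distance(x, y):
    """Two-sample K-S statistic: largest gap between the empirical CDFs."""
    x, y = np.sort(x), np.sort(y)
    grid = np.concatenate([x, y])
    cdf_x = np.searchsorted(x, grid, side="right") / x.size
    cdf_y = np.searchsorted(y, grid, side="right") / y.size
    return float(np.max(np.abs(cdf_x - cdf_y)))

def abc_rejection(data, n_draws, eps, rng):
    """Keep prior draws whose simulated data lie within eps of the real data."""
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(-5.0, 5.0)                    # draw from the flat prior
        sim = rng.normal(mu, 1.0, size=data.size)      # run the toy simulator
        if ks_distance(data, sim) < eps:
            accepted.append(mu)
    return np.array(accepted)

rng = np.random.default_rng(0)
data = rng.normal(1.0, 1.0, size=200)                  # fake data, true mean 1.0
posterior = abc_rejection(data, n_draws=5000, eps=0.15, rng=rng)
print(posterior.size, posterior.mean())
```

Even in this one-parameter toy, most prior draws get thrown away, and shrinking `eps` to sharpen the approximation throws away more. That is one face of the "art" in choosing the distance and its details. In practice one could swap in `scipy.stats.ks_2samp` for the hand-rolled statistic.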

As part of the above discussion, Foreman-Mackey pointed out that when you do an MCMC sampling, you can be hurt by unimportant nuisance parameters. That is, if you add 100 random numbers to your inference as additional parameters, each of which has no implications for the likelihood at all, your MCMC still may slow way down, because you still have to accept/reject the prior! Crazy, but true, I think.
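A quick way to see the effect is a toy random-walk Metropolis sampler, sketched below in Python with NumPy. The setup is assumed for illustration only: one parameter with a Gaussian likelihood, some number of nuisance parameters that appear only through unit-Gaussian priors, and a fixed proposal step per coordinate.

```python
import numpy as np

def log_posterior(x):
    theta, nuisance = x[0], x[1:]
    log_like = -0.5 * theta ** 2               # depends on theta only
    log_prior = -0.5 * np.sum(nuisance ** 2)   # unit Gaussian prior per nuisance
    return log_like + log_prior

def acceptance_fraction(n_nuisance, n_steps=5000, step=0.5, seed=0):
    """Fraction of accepted proposals for a fixed per-coordinate step size."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=1 + n_nuisance)        # start at a typical point
    lp = log_posterior(x)
    accepted = 0
    for _ in range(n_steps):
        proposal = x + step * rng.normal(size=x.size)
        lp_new = log_posterior(proposal)
        if np.log(rng.uniform()) < lp_new - lp:
            x, lp = proposal, lp_new
            accepted += 1
    return accepted / n_steps

rates = {k: acceptance_fraction(k) for k in (0, 10, 100)}
print(rates)
```

With the step size held fixed, the acceptance fraction falls as nuisance parameters are added, even though the likelihood never sees them. You can recover the acceptance rate by shrinking the step, but then the parameter you care about moves more slowly, so either way the sampling gets more expensive.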

In other news, Christina Eilers (MPIA) showed today that she can simultaneously optimize the internal parameters of The Cannon and the labels of training-set objects with missing labels! The context is labeling dwarf stars in the SEGUE data, using labels from Gaia-ESO. This is potentially a big step for data-driven spectral inference, because right now we are restricted (very severely) to training sets with complete labels.

July 16, 2016

Dave Bacon5 Years!

Five years ago I (it’s me Dave Bacon former supposed pseudo-professor and one time quantum pontiff) jumped off the academic ship, swam to shore, and put on a new set of clothes as a software developer for Google. Can it really have been five years? Well I should probably update this blog so my mom knows what I’ve been up to.

  • I helped build and launch Google Domains. From quantum physics professor to builder of a domain name registrar. I bet you wouldn’t have predicted that one! Along the way I was lucky to be surrounded by a team of software engineers who were gracious enough to tell me when I was doing silly things, and show me the craft that is modern software development. I may now, in fact, be a real software developer. Though this just means that I know how much I still need to master.
  • We built a cabin! Well, we worked with wonderful architects and builders to construct “New Caelifera” over in the Methow Valley (about 4 hours east of Seattle).
    I have to say that this was one of the funnest things I’ve done in my life. Who knew a dumpy software engineer could also be an aesthete? Even cooler, the end result is an awesome weekend place that you have to drive through a National Park to get to. I’ve been super spoiled.
  • Lost: my sister, Catherine Bacon, and my dog, the Test Dog. Life is precious and we should cherish it!
  • Gained: a new puppy, Imma Dog Bacon. Imma dog? You’re a dog! Imma Dog!
  • Hobbies. arXiv:1605.03266. The difference between being a hobby scientist and a professional scientist is that when you’re a professional it’s “Fail. Fail. Fail. Fail. Fail. Fail. Fail. Fail. Fail. Success!” and when you’re a hobbyist it’s “Fffffffaaaaaiiiiiillllll. Fffffffaaaaaiiiiiillllll. Fffffffaaaaaiiiiiillllll. Fffffffaaaaaiiiiiillllll. Fffffffaaaaaiiiiiillllll. Fffffffaaaaaiiiiiillllll. Fffffffaaaaaiiiiiillllll. Fffffffaaaaaiiiiiillllll. Success?” Yes I’m that guy that reads your quantum computing papers at night after work for fun.

So maybe I’ll write another blog post in five years? Or maybe I should resurrect the Pontiff. I saw the Optimizer the other day, and he suggested that since it’s hard for me to blog about quantum computing stuff what with Google involved as it is, I could blog about stuff from the past. But I’m more of a promethean than a pastoralist. It didn’t occur to me until later that there is an alternative solution, one that is particularly appealing to a quantum dude like myself: maybe I should start blogging about an alternative universe? I’ve always liked Tlön, Uqbar, Orbis Tertius.

July 15, 2016

Chad Orzel306-313/366: Strong Island

A delayed photo dump this week, because I was solo-parenting last week while Kate was traveling for work, and then I took the sillyheads down to Long Island to visit my grandmother while Kate was at Readercon. Recovering from all that took a lot of time, plus there was a bunch of computer wrangling in there. But, on the bright side, you get several cute-kid photos in this set…

And here’s a bonus image, which captures something of the flavor of travel with the sillyheads:

Selfie in the car with the kids engrossed in their tablets.

(That was stopped at a light, not actively driving, so don’t waste everyone’s time leaving chiding comments about the dangers of using phones while driving.)

306/366: Beach

Panorama of Jones Beach State Park.

One of the main reasons I wanted to get down to Long Island with the kids was the chance to visit one of my favorite spots, Jones Beach State Park. Robert Moses was a sonofabitch in a lot of ways, but setting the prime beach area on the Atlantic side of Long Island aside for a public park redeems a lot.

This is a panorama stitched together from about five shots, and the panorama-stitching program I have really struggled with these– I have some wider shots, but it completely choked on those, I think because of the lack of vertical landmarks. This one worked out reasonably well, though I had to crop out some weirdness at the edges. You can see a larger version on Google Photos.

307/366: Waves

The Pip and SteelyKid taunt the waves from high ground.

I love Jones Beach because I enjoy body-surfing in the waves there. The day we went wasn’t great for that– it was kind of grey and cloudy, and there was a lot of wind-driven chop that made it hard to find good waves for riding, but I had a great time in the water. The kids were a little more dubious about the ocean, but had a great time running up to and then away from the breaking waves. There was a lot of “Ha, ha, you can’t get us!” from the top of this little sand ridge, and shrieking when a few waves topped it.

308/366: Tunnel

The tunnel in the sand that the kids spent a long time constructing.

It wouldn’t be a trip to the beach without building a sand castle. In this case, they didn’t build up but dug down, making a long tunnel that they’d pour water into from a bucket. The goal was to partially fill “ponds” at both ends with water, and there was much cheering when it worked.

309/366: Rats With Wings

Gull taking flight on the boardwalk at Jones Beach.

It wouldn’t be a photo dump post here without at least one bird picture. And the beach is, of course, overrun with seagulls. So here are several shots of a gull taking flight, composited together in GIMP, because it was that kind of morning a few days ago.

310/366: Footprints

Random artsy shot of seagull tracks in the sand.

Here’s your random artsy shot for this set: the complex pattern of footprints left by gulls swarming all over the sand.

311/366: Hot Lava

The kids playing "Lava Walk" at my grandmother's house.

After a morning at the beach, we spent a bunch of time at my grandmother’s, where the kids amused themselves playing “Lava Walk.” The floor was held to be hot lava, so you weren’t allowed to touch the rug, but had to walk on various books and papers. SteelyKid managed to get all the way across the room and back this way.

312/366: Climber

The kids making their way up the awesome climber at the Long Island Children's Museum.

Before heading home on Sunday, we went to the Long Island Children’s Museum, which is very convenient to my grandmother’s house. We’d never been there before, but it turns out to be pretty awesome, especially the giant climber they have right inside the entrance, with several levels of carpet-covered wooden platforms making a sort of three-dimensional maze. The Pip was just tall enough to go in, and it was the biggest hit of the whole visit.

313/366: Optics

The kids in front of an array of mirrors at the Long Island Children's Museum.

The rest of the LICM includes a bunch of very well-done science-museum exhibits, and as an AMO physics person I’m obliged to include at least one photo from the optics section. This was a semicircle of mirrors at slight angles to each other, and the kids greatly enjoyed mugging in front of it.

All in all, a very successful (but exhausting) road trip. The LICM in particular was a great discovery, because it’s a perfect morning activity to burn off some energy before lunch and the long car ride home. We’ll definitely be going back there in the future.

July 14, 2016

Chad OrzelPolitical Query: Who Should I Give Money To?

A question for the more politically plugged-in folks out there: If I want to donate money this election cycle, who should I be looking at giving it to?

OK, that probably needs some unpacking, but given Internet attention spans, I wanted to get the basic question right up front before a passing Pokemon draws people away…

So, the last couple of presidential elections, I donated to the Obama campaign, but it doesn’t really look like Hillary Clinton desperately needs my money. And I’ve given to the Democratic party more generally (DSCC and DCCC) in the past, but given their track record in recent legislative elections, I’m not convinced that’s money well spent– that is, I’m not sure they’re optimally allocating their resources.

So, I’d sort of like to give my regular political donation (which is of order a few hundred bucks) directly to a candidate for whom it would do some good. There are way more races out there than I can really keep track of, though, thus this appeal.

So: who’s a candidate for the House or Senate either running as a Democrat or who will caucus with the Democrats who’s in a race that’s potentially winnable who would benefit from additional cash? Bonus points for opposing a particularly objectionable incumbent, but I don’t want to just protest vote, I’d like to give money to somebody who has a chance of actually winning.

Note: I’m not committing to blindly donating money to any random candidate people name in comments here, I’m just soliciting names for possible donations. I’ll check out the policy views and the status of the race for anybody whose name comes up, and if there’s somebody particularly appealing, I’ll throw a little money their way. I’d prefer to directly contribute to a candidate rather than some progressive umbrella group just on bang-for-buck grounds, so please don’t give me sales pitches for activist organizations. And if you want to berate me for being excessively liberal, or insufficiently liberal, well, I can’t really stop you from leaving a comment, but I’ll mock you quietly.

July 13, 2016

John BaezOperads for “Systems of Systems”

“Systems of systems” is a fashionable buzzword for complicated systems that are themselves made of complicated systems, often of disparate sorts. They’re important in modern engineering, and it takes some thought to keep them from being unmanageable. Biology and ecology are full of systems of systems.

David Spivak has been working a lot on operads as a tool for describing systems of systems. Here’s a nice programmatic talk advocating this approach:

• David Spivak, Operads as a potential foundation for
systems of systems

This was a talk he gave at the Generalized Network Structures and Dynamics Workshop at the Mathematical Biosciences Institute at Ohio State University this spring.

You won’t learn what operads are from this talk—for that, try this:

• Wikipedia, Operad.

But if you know a bit about operads, it may help give you an idea of their flexibility as a formalism for describing ways of sticking together components to form bigger systems!

I’ll probably talk about this kind of thing more pretty soon. So far I’ve been using category theory to study networked systems like electrical circuits, Markov processes and chemical reaction networks. The same ideas handle all these different kinds of systems in a unified way. But I want to push toward biology. Here we need more sophisticated ideas. My philosophy is that while biology seems “messy” to physicists, living systems actually operate at higher levels of abstraction, which call for new mathematics.

July 12, 2016

BackreactionPulsars could probe black hole horizons

The first antenna of MeerKAT,
a SKA precursor in South Africa.
[Image Source.]

It’s hard to see black holes – after all, their defining feature is that they swallow light. But it’s also hard to discourage scientists from trying to shed light on mysteries. In a recent paper, a group of researchers from Long Island University and Virginia Tech have proposed a new way to probe the near-horizon region of black holes and, potentially, quantum gravitational effects.

    Shining Light on Quantum Gravity with Pulsar-Black Hole Binaries
    John Estes, Michael Kavic, Matthew Lippert, John H. Simonetti
    arXiv:1607.00018 [hep-th]

The idea is simple and yet promising: Search for a binary system in which a pulsar and a black hole orbit around each other, then analyze the pulsar signal for unusual fluctuations.

A pulsar is a rapidly rotating neutron star that emits a focused beam of electromagnetic radiation. This beam goes into the direction of the poles of the magnetic field, and is normally not aligned with the neutron star’s axis of rotation. The beam therefore spins with a regular period like a lighthouse beacon. If Earth is located within the beam’s reach, our telescopes receive a pulse every time the beam points into our direction.

Pulsar timing can be extremely precise. We know some pulsars that have been flashing for decades every couple of milliseconds to a precision of a few microseconds. This high regularity allows astrophysicists to search for signals which might affect the timing. Fluctuations of space-time itself, for example, would increase the pulsar-timing uncertainty, a method that has been used to derive constraints on the stochastic gravitational wave background. And if a pulsar is in a binary system with a black hole, the pulsar’s signal might scrape by the black hole and thus encode information about the horizon which we can catch on Earth.

No such pulsar-black hole binaries are known to date. But upcoming experiments like eLISA and the Square Kilometer Array (SKA) will almost certainly detect new pulsars. In their paper, the authors estimate that SKA might observe up to 100 new pulsar-black hole binaries, and they put the probability that a newly discovered system would have a suitable orientation at roughly one in a hundred. If they are right, the SKA would have a good chance to find a promising binary.
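The “good chance” here is simple arithmetic. As a rough sketch, assuming the upper estimate of 100 binaries and treating each system’s orientation as an independent one-in-a-hundred draw (both simplifications of mine, not claims taken from the paper), the probability of finding at least one suitable binary works out like this:

```python
def prob_at_least_one(n_systems: int, p_suitable: float) -> float:
    """Probability that at least one of n independent systems is suitable."""
    return 1.0 - (1.0 - p_suitable) ** n_systems


# Assumed inputs: the "up to 100" binaries and the "one in a hundred"
# orientation odds quoted above; independence is an extra assumption.
chance = prob_at_least_one(100, 0.01)
print(f"{chance:.2f}")  # prints 0.63
```

So under these assumptions the odds are a bit under two in three, and they drop quickly if the SKA finds fewer binaries than the upper estimate.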

Much of the paper is dedicated to arguing that the timing accuracy of such a binary pulsar could carry information about quantum gravitational effects. This is not impossible but speculative. Quantum gravitational effects are normally expected to be strong towards the black hole singularity, i.e. well inside the black hole and hidden from observation. Naïve dimensional estimates reveal that quantum gravity should be unobservably small in the horizon area.

However, this argument has recently been questioned in the aftermath of the firewall controversy surrounding black holes, because one solution to the black hole firewall paradox is that quantum gravitational effects can stretch over much longer distances than the dimensional estimates lead one to expect. Steve Giddings has long been a proponent of such long-distance fluctuations, and scenarios like black hole fuzzballs, or Dvali’s Bose-Einstein Computers also lead to horizon-scale deviations from general relativity. It is hence something that one should definitely look for.

Previous proposals to test the near-horizon geometry were based on measurements of gravitational waves from merger events or the black hole shadow, each of which could reveal deviations from general relativity. However, so far these were quite general ideas lacking quantitative estimates. To my knowledge, this paper is the first to demonstrate that it’s technologically feasible.

Michael Kavic, one of the authors of this paper, will attend our September conference on “Experimental Search for Quantum Gravity.” We’re still planning to live-stream the talks, so stay tuned and you’ll get a chance to listen in.

July 10, 2016

Clifford JohnsonKill Your Darlings…

(Apparently I spent a lot of time cross-hatching, back in 2010-2012? More on this below.)

I've changed locations, have several physics research tasks to work on, and so my usual work flow is not going to be appropriate for the next couple of weeks, so I thought I'd work on a different aspect of the book project. I'm well into the "one full page per day for the rest of the year to stay on target" part of the calendar and there's good news and bad news. On the good news side, I've refined my workflow a lot, and devised new ways of achieving various technical tasks too numerous (and probably boring) to mention, and so I've actually got [...]

The post Kill Your Darlings… appeared first on Asymptotia.

John BaezLarge Countable Ordinals (Part 3)

Last time we saw why it’s devilishly hard to give names to large countable ordinals.

An obvious strategy is to make up a function f from ordinals to ordinals that grows really fast, so that f(x) is a lot bigger than the ordinal x indexing it. This is indeed a good idea. But something funny tends to happen! Eventually x catches up with f(x). In other words, you eventually hit a solution of

x = f(x)

This is called a fixed point of f. At this point, there’s no way to use f(x) as a name for x unless you already have a name for x. So, your scheme fizzles out!

For example, we started by looking at powers of \omega, the smallest infinite ordinal. But eventually we ran into ordinals x that obey

x = \omega^x

There’s an obvious work-around: we make up a new name for ordinals x that obey

x = \omega^x

We call them epsilon numbers. In our usual nerdy way we start counting at zero, so we call the smallest solution of this equation \epsilon_0, and the next one \epsilon_1, and so on.

But eventually we run into ordinals x that are fixed points of the function \epsilon_x, meaning that

x = \epsilon_x

There’s an obvious work-around: we make up a new name for ordinals x that obey

x = \epsilon_x

But by now you can guess that this problem will keep happening, so we’d better get systematic about making up new names! We should let

\phi_0(\alpha) = \omega^\alpha

and let \phi_{n+1}(\alpha) be the \alphath fixed point of \phi_n.

Oswald Veblen, a mathematician at Princeton, came up with this idea around 1908, based on some thoughts of G. H. Hardy:

• Oswald Veblen, Continuous increasing functions of finite and transfinite ordinals, Trans. Amer. Math. Soc. 9 (1908), 280–292.

He figured out how to define \phi_\gamma(\alpha) even when the index \gamma is infinite.

Last time we saw how to name a lot of countable ordinals using this idea: in fact, all ordinals less than the ‘Feferman–Schütte ordinal’. This time I want to go further, still using Veblen’s work.

First, however, I feel an urge to explain things a bit more precisely.

Veblen’s fixed point theorem

There are three kinds of ordinals. The first is a successor ordinal, which is one more than some other ordinal. So, we say \alpha is a successor ordinal if

\alpha = \beta + 1

for some \beta. The second is 0, which is not a successor ordinal. And the third is a limit ordinal, which is neither 0 nor a successor ordinal. The smallest example is

\omega = \{0, 1, 2, 3, \dots \}

Every limit ordinal is the ‘limit’ of ordinals less than it. What does that mean, exactly? Remember, each ordinal is a set: the set of all smaller ordinals. We can define the limit of a set of ordinals to be the union of that set. Alternatively, it’s the smallest ordinal that’s greater than or equal to every ordinal in that set.

Now for Veblen’s key idea:

Veblen’s Fixed Point Theorem. Suppose a function f from ordinals to ordinals is:

strictly increasing: if x < y then f(x) < f(y)


continuous: if x is a limit ordinal, f(x) is the limit of the ordinals f(\alpha) where \alpha < x.

Then f must have a fixed point.

Why? For starters, we always have this fact:

x \le f(x)

After all, if this weren’t true, there’d be a smallest x with the property that f(x) < x, since every nonempty set of ordinals has a smallest element. But since f is strictly increasing,

f(f(x)) < f(x)

so f(x) would be an even smaller ordinal with this property. Contradiction!

Using this fact repeatedly, we get

0 \le f(0) \le f(f(0)) \le \cdots

Let \alpha be the limit of the ordinals

0, f(0), f(f(0)), \dots

Then by continuity, f(\alpha) is the limit of the sequence

f(0), f(f(0)), f(f(f(0))),\dots

So f(\alpha) equals \alpha. Voilà! A fixed point!

This construction gives the smallest fixed point of f. There are infinitely many more, since we can start not with 0 but with \alpha+1 and repeat the same argument, etc. Indeed if we try to list these fixed points, we find there is one for each ordinal.
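If you like to see such things run, here is a minimal sketch of the iteration 0, f(0), f(f(0)), \dots for f(x) = \omega^x. The encoding is my own assumption, chosen for brevity: an ordinal below \epsilon_0 is a tuple of exponents in non-increasing order, read as a sum of powers of \omega, so the empty tuple is 0 and Python's built-in tuple comparison happens to agree with the ordering of ordinals. The names omega_power and iterate are made up for this illustration.

```python
ZERO = ()  # the empty sum of powers of omega


def omega_power(x):
    """Return omega^x: a one-term sum whose only exponent is x."""
    return (x,)


def iterate(f, start, steps):
    """Return [start, f(start), f(f(start)), ...] with steps applications."""
    seq = [start]
    for _ in range(steps):
        seq.append(f(seq[-1]))
    return seq


# 0, 1, omega, omega^omega, ... : each entry is a taller tower of omegas.
tower = iterate(omega_power, ZERO, 5)

# Strictly increasing, so x < f(x) at every stage we can write down.
print(all(a < b for a, b in zip(tower, tower[1:])))  # prints True
```

Every entry is still smaller than its image under f, which is the point: in this encoding the fixed point \epsilon_0 is never reached in finitely many steps, only as the limit of the whole sequence.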

So, we can make up a new function that lists these fixed points. Just to be cute, people call this the derivative of f, so that f'(\alpha) is the \alphath fixed point of f. Beware: while the derivative of a polynomial grows more slowly than the original polynomial, the derivative of a continuous increasing function f from ordinals to ordinals generally grows more quickly than f. It doesn’t really act like a derivative; people just call it that.

Veblen proved another nice theorem:

Theorem. If f is a continuous and strictly increasing function from ordinals to ordinals, so is f'.

So, we can take the derivative repeatedly! This is the key to the Veblen hierarchy.

If you want to read more about this, it helps to know that a function from ordinals to ordinals that’s continuous and strictly increasing is called normal. ‘Normal’ is an adjective that mathematicians use when they haven’t had enough coffee in the morning and aren’t feeling creative—it means a thousand different things. In this case, a better term would be ‘differentiable’.

Armed with that buzzword, you can try this:

• Wikipedia, Fixed-point lemma for normal functions.

Okay, enough theory. On to larger ordinals!

The Feferman–Schütte barrier

First let’s summarize how far we got last time, and why we got stuck. We inductively defined the \alphath ordinal of the \gammath kind by:

\phi_0(\alpha) = \omega^\alpha


\phi_{\gamma+1}(\alpha) = \phi'_\gamma(\alpha)

meaning that \phi_{\gamma+1}(\alpha) is the \alphath fixed point of \phi_\gamma.

This handles the cases where \gamma is zero or a successor ordinal. When \gamma is a limit ordinal we let \phi_{\gamma}(\alpha) be the \alphath ordinal that’s a fixed point of all the functions \phi_\beta for \beta < \gamma.

Last time I explained how these functions \phi_\gamma give a nice notation for ordinals less than the Feferman–Schütte ordinal, which is also called \Gamma_0. This ordinal is the smallest solution of

x = \phi_x(0)

So it’s a fixed point, but of a new kind, because now the x appears as a subscript of the \phi function.

We can get our hands on the Feferman–Schütte ordinal by taking the limit of the ordinals

\phi_0(0), \; \phi_{\phi_0(0)}(0) , \; \phi_{\phi_{\phi_0(0)}(0)}(0), \dots

(If you’re wondering why we use the number 0 here, instead of some other ordinal, I believe the answer is: it doesn’t really matter, we would get the same result if we used any ordinal less than the Feferman–Schütte ordinal.)

The ‘Feferman–Schütte barrier’ is the combination of these two facts:

• On the one hand, every ordinal \beta less than \Gamma_0 can be written as a finite sum of guys \phi_\gamma(\alpha) where \alpha and \gamma are even smaller than \beta. Using this fact repeatedly, we can get a finite expression for any ordinal less than the Feferman–Schütte ordinal in terms of the \phi function, addition, and the ordinal 0.

• On the other hand, if \alpha and \gamma are less than \Gamma_0 then \phi_\gamma(\alpha) is less than \Gamma_0. So we can’t use the \phi function to name the Feferman–Schütte ordinal in terms of smaller ordinals.

But now let’s break the Feferman–Schütte barrier and reach some bigger countable ordinals!

The Γ function

The function \phi_x(0) is strictly increasing and continuous as a function of x. So, using Veblen’s theorems, we can define \Gamma_\alpha to be the \alphath solution of

x = \phi_x(0)

We can then define a bunch of enormous countable ordinals:

\Gamma_0, \Gamma_1, \Gamma_2, \dots

and still bigger ones:

\Gamma_\omega, \; \Gamma_{\omega^2}, \; \Gamma_{\omega^3} , \dots

and even bigger ones:

\Gamma_{\omega^\omega}, \; \Gamma_{\omega^{\omega^\omega}}, \; \Gamma_{\omega^{\omega^{\omega^\omega}}}, \dots

and even bigger ones:

\Gamma_{\epsilon_0}, \Gamma_{\epsilon_1}, \Gamma_{\epsilon_2}, \dots

But since \epsilon_\alpha is just \phi_1(\alpha), we can reach much bigger countable ordinals with the help of the \phi function:

\Gamma_{\phi_2(0)}, \; \Gamma_{\phi_3(0)}, \; \Gamma_{\phi_4(0)}, \dots

and we can do vastly better using the \Gamma function itself:

\Gamma_{\Gamma_0}, \Gamma_{\Gamma_{\Gamma_0}}, \Gamma_{\Gamma_{\Gamma_{\Gamma_0}}} , \dots

The limit of all these is the smallest solution of

x = \Gamma_x

As usual, this ordinal is still countable, but there’s no way to express it in terms of the \Gamma function and smaller ordinals. So we are stuck again.

In short: we got past the Feferman–Schütte barrier by introducing a name for the \alphath solution of x = \phi_x(0). We called it \Gamma_\alpha. This made us happy for about two minutes…

…. but then we ran into another barrier of the same kind.

So what we really need is a more general notation: one that gets us over not just this particular bump in the road, but all bumps of this kind! We don’t want to keep randomly choosing goofy new letters like \Gamma. We need something systematic.

The multi-variable Veblen hierarchy

We were actually doing pretty well with the \phi function. It was nice and systematic. It just wasn’t powerful enough. But if you’re trying to keep track of how far you’re driving on a really long trip, you want an odometer with more digits. So, let’s try that.

In other words, let’s generalize the \phi function to allow more subscripts. Let’s rename \Gamma_\alpha and call it \phi_{1,0}(\alpha). The fact that we’re using two subscripts says that we’re going beyond the old \phi functions with just one subscript. The subscripts 1 and 0 should remind you of what happens when you drive more than 9 miles: if your odometer has two digits, it’ll say you’re on mile 10.

Now we proceed as before: we make up new functions, each of which enumerates the fixed points of the previous one:

\phi_{1,1} = \phi'_{1,0}
\phi_{1,2} = \phi'_{1,1}
\phi_{1,3} = \phi'_{1,2}

and so on. In general, we let

\phi_{1,\gamma+1} = \phi'_{1,\gamma}

and when \gamma is a limit ordinal, we let

\displaystyle{ \phi_{1,\gamma}(\alpha) = \lim_{\beta \to \gamma} \phi_{1,\beta}(\alpha) }

Are you confused?

How could you possibly be confused???

Okay, maybe an example will help. In the last section, our notation fizzled out when we took the limit of these ordinals:

\Gamma_{\Gamma_0}, \Gamma_{\Gamma_{\Gamma_0}}, \Gamma_{\Gamma_{\Gamma_{\Gamma_0}}} , \dots

The limit of these is the smallest solution of x = \Gamma_x. But now we’re writing \Gamma_x = \phi_{1,0}(x), so this limit is the smallest fixed point of \phi_{1,0}. So, it’s \phi_{1,1}(0).

We can now ride happily into the sunset, defining \phi_{1,\gamma}(\alpha) for all ordinals \alpha, \gamma. Of course, this will never give us a notation for ordinals with

x = \phi_{1,x}(0)

But we don’t let that stop us! This is where the new extra subscript really comes in handy. We now define \phi_{2,0}(\alpha) to be the \alphath solution of

x = \phi_{1,x}(0)

Then we drive on as before. We let

\phi_{2,\gamma+1} = \phi'_{2,\gamma}

and when \gamma is a limit ordinal, we say

\displaystyle{ \phi_{2,\gamma}(\alpha) = \lim_{\beta \to \gamma} \phi_{2,\beta}(\alpha) }

I hope you get the idea. Keep doing this!

We can inductively define \phi_{\beta,\gamma}(\alpha) for all \alpha, \beta and \gamma. Of course, these functions will never give a notation for solutions of

x = \phi_{x,0}(0)

To describe these, we need a function with one more subscript! So let \phi_{1,0,0}(\alpha) be the \alphath solution of

x = \phi_{x,0}(0)

We can then proceed on and on and on, adding extra subscripts as needed.

This is called the multi-variable Veblen hierarchy.


To help you understand the multi-variable Veblen hierarchy, I’ll use it to describe lots of ordinals. Some are old friends. Starting with finite ones, we have:

\phi_0(0) = 1

\phi_0(0) + \phi_0(0) = 2

and so on, so we don’t need separate names for natural numbers… but I’ll use them just to save space.

\phi_0(1) = \omega

\phi_0(2) = \omega^2

and so on, so we don’t need separate names for \omega and its powers, but I’ll use them just to save space.

\phi_0(\omega) = \omega^\omega

\phi_0(\omega^\omega) = \omega^{\omega^\omega}

\phi_1(0) = \epsilon_0

\phi_1(1) = \epsilon_1

\displaystyle{ \phi_1(\phi_1(0)) = \epsilon_{\epsilon_0} }

\phi_2(0) = \zeta_0

\phi_2(1) = \zeta_1

where I should remind you that \zeta_\alpha is a name for the \alphath solution of x = \epsilon_x.

\phi_{1,0}(0) = \Gamma_0

\phi_{1,0}(1) = \Gamma_1

\displaystyle{ \phi_{1,0}(\phi_{1,0}(0)) = \Gamma_{\Gamma_0} }

\phi_{1,1}(0) is the limit of \Gamma_{\Gamma_0}, \Gamma_{\Gamma_{\Gamma_0}}, \Gamma_{\Gamma_{\Gamma_{\Gamma_0}}} , \dots

\phi_{1,0,0}(0) is called the Ackermann ordinal.

Apparently Wilhelm Ackermann, the logician who invented a very fast-growing function called Ackermann’s function, had a system for naming ordinals that fizzled out at this ordinal.

The small Veblen ordinal

There are obviously lots more ordinals that can be described using the multi-variable Veblen hierarchy, but I don’t have anything interesting to say about them. And you’re probably more interested in this question: what’s next?

The limit of these ordinals

\phi_1(0), \; \phi_{1,0}(0), \; \phi_{1,0,0}(0), \dots

is called the small Veblen ordinal. Yet again, it’s a countable ordinal. It’s the smallest ordinal that cannot be named in terms of smaller ordinals using the multi-variable Veblen hierarchy…. at least, not the version I described. And here’s a nice fact:

Theorem. Every ordinal \beta less than the small Veblen ordinal can be written as a finite expression in terms of the multi-variable \phi function, addition, and 0.

For example,

\Gamma_0 + \epsilon_{\epsilon_0} + \omega^\omega + 2

is equal to

\displaystyle{  \phi_{\phi_0(0),0}(0) + \phi_{\phi_0(0)}(\phi_{\phi_0(0)}(0)) +  \phi_0(\phi_0(\phi_0(0))) + \phi_0(0) + \phi_0(0)  }

On the one hand, this notation is quite tiresome to read. On the other hand, it’s amazing that it gets us so far!

Furthermore, if you stare at expressions like the above one for a while, and think about them abstractly, they should start looking like trees. So you should find it easy to believe that ordinals less than the small Veblen ordinal correspond to trees, perhaps labelled in some way.
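To make the tree picture concrete, here is a small sketch that stores such an expression as a nested data structure. The encoding is hypothetical and mine, not Veblen's or Jervell's: a term is a tuple of nodes read as their sum, the empty tuple is 0, and each node is a \phi with a tuple of subscripts and one argument.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Phi:
    """One node: phi with a tuple of subscripts applied to an argument."""
    subscripts: Tuple["Term", ...]
    arg: "Term"


# A Term is a tuple of Phi nodes read as their sum; the empty tuple is 0.
Term = Tuple[Phi, ...]
ZERO: Term = ()


def render(term: Term) -> str:
    """Write a term back out in LaTeX-style notation."""
    if not term:
        return "0"
    return " + ".join(
        "\\phi_{%s}(%s)"
        % (",".join(render(s) for s in node.subscripts), render(node.arg))
        for node in term
    )


def depth(term: Term) -> int:
    """Height of the tree: 0 for the ordinal 0, one more per level of nesting."""
    if not term:
        return 0
    return 1 + max(
        max(depth(child) for child in (*node.subscripts, node.arg))
        for node in term
    )


ONE: Term = (Phi((ZERO,), ZERO),)          # phi_0(0) = 1
GAMMA_0: Term = (Phi((ONE, ZERO), ZERO),)  # phi_{1,0}(0), with 1 written as phi_0(0)

print(render(GAMMA_0 + ONE + ONE))  # Gamma_0 + 2
```

This only records the shape of an expression; it does not compare or normalize ordinals, which is where the real work lies.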

Indeed, this paper describes a correspondence of this sort:

• Herman Ruge Jervell, Finite trees as ordinals, in New Computational Paradigms, Lecture Notes in Computer Science 3526, Springer, Berlin, 2005, pp. 211–220.

However, I don’t think his idea is quite the same as what you’d come up with by staring at expressions like

\displaystyle{  \phi_{\phi_0(0),0}(0) + \phi_{\phi_0(0)}(\phi_{\phi_0(0)}(0)) +  \phi_0(\phi_0(\phi_0(0))) + \phi_0(0) + \phi_0(0)  }

Beyond the small Veblen ordinal

We’re not quite done yet. The modifier ‘small’ in the term ‘small Veblen ordinal’ should make you suspect that there’s more in Veblen’s paper. And indeed there is!

Veblen actually extended his multi-variable function \phi_{\gamma_1, \dots, \gamma_n}(\alpha) to the case where there are infinitely many variables. He requires that all but finitely many of these variables equal zero, to keep things under control. Using this, one can set up a notation for even bigger countable ordinals! This notation works for all ordinals less than the large Veblen ordinal.

We don’t need to stop here. The large Veblen ordinal is just the first of a new series of even larger countable ordinals!

These can again be defined as fixed points. Yes: it’s déjà vu all over again. But around here, people usually switch to a new method for naming these fixed points, called ‘ordinal collapsing functions’. One interesting thing about this notation is that it makes use of uncountable ordinals. The first uncountable ordinal is called \Omega, and it dwarfs all those we’ve seen here.

We can use the ordinal collapsing function \psi to name many of our favorite countable ordinals, and more:

\psi(\Omega) is \zeta_0, the smallest solution of x = \epsilon_x.

\psi(\Omega^\Omega) is \Gamma_0, the Feferman–Schütte ordinal.

\psi(\Omega^{\Omega^2}) is the Ackermann ordinal.

\psi(\Omega^{\Omega^\omega}) is the small Veblen ordinal.

\psi(\Omega^{\Omega^\Omega}) is the large Veblen ordinal.

\psi(\epsilon_{\Omega+1}) is called the Bachmann–Howard ordinal. This is the limit of the ordinals

\psi(\Omega), \psi(\Omega^\Omega), \psi(\Omega^{\Omega^\Omega}), \dots

I won’t explain this now. Maybe later! But not tonight. As Bilbo Baggins said:

The Road goes ever on and on
Out from the door where it began.
Now far ahead the Road has gone,
Let others follow it who can!
Let them a journey new begin,
But I at last with weary feet
Will turn towards the lighted inn,
My evening-rest and sleep to meet.

For more

But perhaps you’re impatient and want to begin a new journey now!

The people who study notations for very large countable ordinals tend to work on proof theory, because these ordinals have nice applications to that branch of logic. For example, Peano arithmetic is powerful enough to work with ordinals up to but not including \epsilon_0, so we call \epsilon_0 the proof-theoretic ordinal of Peano arithmetic. Stronger axiom systems have bigger proof-theoretic ordinals.

Unfortunately this makes it a bit hard to learn about large countable ordinals without learning, or at least bumping into, a lot of proof theory. And this subject, while interesting in principle, is quite tough. So it’s hard to find a readable introduction to large countable ordinals.

The bibliography of the Wikipedia article on large countable ordinals gives this half-hearted recommendation:

Wolfram Pohlers, Proof theory, Springer 1989 ISBN 0-387-51842-8 (for Veblen hierarchy and some impredicative ordinals). This is probably the most readable book on large countable ordinals (which is not saying much).

Unfortunately, Pohlers does not seem to give a detailed account of ordinal collapsing functions. If you want to read something fun that goes further than my posts so far, try this:

• Hilbert Levitz, Transfinite ordinals and their notations: for the uninitiated.

(Anyone whose first name is Hilbert must be born to do logic!)

This is both systematic and clear:

• Wikipedia, Ordinal collapsing functions.

And if you want to explore countable ordinals using a computer program, try this:

• Paul Budnik, Ordinal calculator and research tool.

Among other things, this calculator can add, multiply and exponentiate ordinals described using the multi-variable Veblen hierarchy—even the version with infinitely many variables!

July 08, 2016

Secret Blogging SeminarBounds for sum free sets in prime power cyclic groups — three ways

A few weeks ago, I e-mailed Will Sawin excitedly to tell him that I could extend the new bounds on three-term arithmetic progression free subsets of (\mathbb{Z}/p \mathbb{Z})^n to (\mathbb{Z}/p^r \mathbb{Z})^n. Will politely told me that I was the third team to get there — he and Eric Naslund already had the result, as did Fedor Petrov. But I think there might be some expository benefit in writing up the three arguments, to see how they are all really the same trick underneath.

Here is the result we are proving: Let q=p^r be a prime power and let C_q be the cyclic group of order q. Let A \subset C_q^n be a set which does not contain any three term arithmetic progression, except for the trivial progressions (a,a,a). Then

\displaystyle{|A| \leq 3 | \{(m^1, m^2, \ldots, m^n) \in [0,q-1]^n : \sum m^i \leq n(q-1)/3 \}|.}

The exciting thing about this bound is that it is exponentially better than the obvious bound of q^n. Until recently, all people could prove was bounds like q^n/n^c, and this is still the case if q is not a prime power.
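For readers who want to see numbers, here is a short sketch that evaluates the right-hand side exactly by counting tuples with a dynamic program over the running sum. The function names count_low_sum_tuples and cap_bound are made up for this illustration, and the condition \sum m^i \leq n(q-1)/3 is tested in the integer form 3 \sum m^i \leq n(q-1) to avoid fractions.

```python
def count_low_sum_tuples(q: int, n: int) -> int:
    """Count tuples in [0, q-1]^n whose entries sum to at most n(q-1)/3."""
    counts = [1]  # counts[s] = number of tuples built so far with sum s
    for _ in range(n):
        new = [0] * (len(counts) + q - 1)
        for s, c in enumerate(counts):
            for m in range(q):
                new[s + m] += c
        counts = new
    return sum(c for s, c in enumerate(counts) if 3 * s <= n * (q - 1))


def cap_bound(q: int, n: int) -> int:
    """The bound 3 * #{m : sum of entries <= n(q-1)/3} from the statement above."""
    return 3 * count_low_sum_tuples(q, n)


for n in (3, 60):
    print(n, cap_bound(3, n) < 3 ** n)
```

For small n the bound is weaker than the trivial q^n (for q = 3 and n = 3 it is 30 against 27), so the improvement only shows up once n is reasonably large.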

All of our bounds extend to the colored version: Let (\bar{a}_i, \bar{b}_i, \bar{c}_i) be a list of N triples in ((C_q)^n)^3 such that \bar{a}_i-2 \bar{b}_i+\bar{c}_i=0, but \bar{a}_i-2\bar{b}_j+\bar{c}_k \neq 0 if (i,j,k) are not all equal. Then the same bound applies to N. To see that this is a special case of the previous problem, take \bar{a}_i = \bar{b}_i = \bar{c}_i. Once the problem is cast this way, if q is odd, one might as well define (a_i, b_i, c_i) = (\bar{a}_i, -2 \bar{b}_i, \bar{c}_i), so our hypotheses are that a_i+b_i+c_i = 0 but a_i+b_j+c_k \neq 0 if (i,j,k) are not all equal. We will prove our bounds in this setting.

My argument — Witt vectors

I must admit, this is the least slick of the three arguments. The reader who wants to cut to the slick versions may want to scroll down to the other sections.

We will put an abelian group structure \oplus on the set \mathbb{F}_p^r which is isomorphic to C_q, using formulas found by Witt. I give an example first: Define an addition on \mathbb{F}_3^2 by

\displaystyle{(x_0,x_1) \oplus (y_0, y_1) = (x_0+y_0, x_1+y_1 - x_0^2 y_0 - x_0 y_0^2)}

The reader may enjoy verifying that this is an associative addition, and makes \mathbb{F}_3^2 into a group isomorphic to C_9. For example, (1,0) \oplus (1,0) \oplus (1,0) = (2,1) \oplus (1,0) = (0,1) and (0,1) \oplus (0,1) \oplus (0,1) = (0,0).
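For the reader who would rather let a machine do the verifying, here is a brute-force sketch of that check. The names `oplus` and `order` are chosen only for this illustration.

```python
from itertools import product

P = 3


def oplus(x, y):
    """The addition on F_3^2 given by the displayed formula."""
    x0, x1 = x
    y0, y1 = y
    return ((x0 + y0) % P, (x1 + y1 - x0 * x0 * y0 - x0 * y0 * y0) % P)


def order(x):
    """Order of x under oplus; the identity is (0, 0)."""
    k, acc = 1, x
    while acc != (0, 0):
        acc = oplus(acc, x)
        k += 1
    return k


elements = list(product(range(P), repeat=2))

# Associativity and commutativity, checked over all triples and pairs.
assert all(oplus(oplus(a, b), c) == oplus(a, oplus(b, c))
           for a, b, c in product(elements, repeat=3))
assert all(oplus(a, b) == oplus(b, a) for a, b in product(elements, repeat=2))

# An abelian group of order 9 with an element of order 9 is cyclic.
assert order((1, 0)) == 9
assert order((0, 1)) == 3
```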

In general, Witt found formulas

\displaystyle{(x_0,x_1, \ldots x_{r-1}) \oplus (y_0, y_1, \ldots, y_{r-1}) = (S_0(x,y), S_1(x,y), \ldots, S_{r-1}(x,y))}

such that \mathbb{F}_p^r becomes an abelian group isomorphic to C_q. If we define x_e and y_e to have degree p^e, then S_e is homogeneous of degree p^e. (Of course, Witt did much more: See Wikipedia or Rabinoff.)

Similarly, write the sum of three elements as

\displaystyle{(x_0,x_1, \ldots x_{r-1}) \oplus (y_0, y_1, \ldots, y_{r-1}) \oplus (z_0, \ldots, z_{r-1}) = (S_0(x,y,z), S_1(x,y,z), \ldots, S_{r-1}(x,y,z))}.

and set

\displaystyle{\bar{F}(x,y,z) = \prod_{i=0}^{r-1} \left( 1-S_i(x,y,z)^{p-1} \right)}.

For example, when q=9, we have

\displaystyle{\bar{F} = \left( 1-(x_0+y_0+z_0)^2 \right) \left( 1-(x_1+y_1+z_1 - x_0^2 y_0 - x_0 y_0^2 - x_0^2 z_0 - x_0 z_0^2 - y_0^2 z_0 - y_0 z_0^2 + x_0 y_0 z_0)^2 \right)}.

So \bar{F}(x,y,z)=0 if and only if (x_0, \ldots, x_{r-1}) \oplus (y_0, \ldots, y_{r-1}) \oplus (z_0, \ldots, z_{r-1}) \neq 0 in C_q.

We now work with 3rn variables, x_e^f, y_e^f and z_e^f, where 0 \leq e \leq r-1 and 1 \leq f \leq n. Consider the polynomial

\displaystyle{ \bar{G} = \prod_{f=1}^n \bar{F}(x^f, y^f, z^f) }.

Here each \bar{F} is a polynomial in 3r variables.

So \bar{G} is a polynomial on (\mathbb{F}_p^r)^{3n}. We identify this domain with C_q^{3n}. Then \bar{G}(x,y,z)=0 if and only if x+y+z \neq 0 in the group C_q^n.

We define the degree of a monomial in the x_e^f, y_e^f and z_e^f by setting \deg(x_e^f)=\deg(y_e^f)=\deg(z_e^f)=p^e. In this section, “degree” always has this meaning, not the standard one. The degree of S_e is p^e; the degree of \bar{F} is (p-1) + p(p-1) + p^2(p-1) + \cdots + p^{r-1}(p-1) = p^r-1 = q-1 and the degree of \bar{G} is (q-1)n.

From each monomial in \bar{G}, extract whichever of \prod (x_e^f)^{i_e^f}, \prod (y_e^f)^{j_e^f} or \prod (z_e^f)^{k_e^f} has lowest degree. We see that we can write

\displaystyle{ \bar{G}(x,y,z) = \sum A_s(x) P_s(y,z) + \sum B_s(y) Q_s(x,z) + \sum C_s(z) R_s(x,y)}

where A_s, B_s and C_s are monomials of degree \leq (q-1)n/3.

The now-standard argument (I like Terry Tao’s exposition) shows that N is bounded by three times the number of monomials \prod (x_e^f)^{m_e^f} of degree \leq (q-1)n/3. One needs to check that the argument works when the “degree” of a variable need not be 1, but this is straightforward.

Except we have a problem! There are too many monomials. To solve this issue, let F be the polynomial obtained from \bar{F} by replacing every power u^{\bar{k}} of a single variable u by u^k where k \equiv \bar{k} \bmod p-1 with k=0 if \bar{k}=0 and 1 \leq k \leq p-1 if \bar{k}>0. So F coincides with \bar{F} as a function on (\mathbb{F}_p^r)^3, but F uses smaller monomials. For example, the reader who multiplies out the expression for \bar{F} when q=9 will find a term 2 x_0^6 y_0 z_0. In F, this is replaced by 2 x_0^2 y_0 z_0. The polynomial F does not have the nice factorization of \bar{F}, but it is much smaller. For example, when q=9, \bar{F} has 137 nonzero monomials and F has 65. Replacing \bar{F} by F can only lower degree, so \deg(F) \leq q-1. Now rerun the argument with G(x,y,z) := \prod_{f=1}^n F(x^f,y^f,z^f). Our new bound is three times the number of monomials \prod (x_e^f)^{m_e^f} of degree \leq (q-1)n/3, with the additional condition that all exponents m_e^f are \leq p-1.
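The reduction step can be tried out by machine when q=9. The following is a rough sketch using a hand-rolled dictionary representation of polynomials over \mathbb{F}_3; the helper names (`poly`, `mul`, `sub`, `evaluate`, `reduce_exponents`) are inventions for this illustration. It checks that the reduced polynomial agrees with \bar{F} as a function, and prints the two monomial counts so they can be compared with the numbers quoted above.

```python
from itertools import product

P = 3
NVARS = 6  # variable order: x0, y0, z0, x1, y1, z1


def poly(*terms):
    """Build a polynomial {exponent tuple: coefficient mod P} from (coeff, exps) pairs."""
    f = {}
    for c, exps in terms:
        f[exps] = (f.get(exps, 0) + c) % P
    return {m: c for m, c in f.items() if c}


def mul(f, g):
    h = {}
    for (m1, c1), (m2, c2) in product(f.items(), g.items()):
        m = tuple(a + b for a, b in zip(m1, m2))
        h[m] = (h.get(m, 0) + c1 * c2) % P
    return {m: c for m, c in h.items() if c}


def sub(f, g):
    h = dict(f)
    for m, c in g.items():
        h[m] = (h.get(m, 0) - c) % P
    return {m: c for m, c in h.items() if c}


def evaluate(f, point):
    total = 0
    for mono, c in f.items():
        term = c
        for v, e in zip(point, mono):
            term *= pow(v, e, P)
        total += term
    return total % P


def reduce_exponents(f):
    """Replace each exponent k > 0 by the value in [1, P-1] congruent to k mod P-1."""
    h = {}
    for mono, c in f.items():
        m = tuple(0 if e == 0 else (e - 1) % (P - 1) + 1 for e in mono)
        h[m] = (h.get(m, 0) + c) % P
    return {m: c for m, c in h.items() if c}


ONE = poly((1, (0, 0, 0, 0, 0, 0)))
S0 = poly((1, (1, 0, 0, 0, 0, 0)), (1, (0, 1, 0, 0, 0, 0)), (1, (0, 0, 1, 0, 0, 0)))
S1 = poly(
    (1, (0, 0, 0, 1, 0, 0)), (1, (0, 0, 0, 0, 1, 0)), (1, (0, 0, 0, 0, 0, 1)),
    (-1, (2, 1, 0, 0, 0, 0)), (-1, (1, 2, 0, 0, 0, 0)),
    (-1, (2, 0, 1, 0, 0, 0)), (-1, (1, 0, 2, 0, 0, 0)),
    (-1, (0, 2, 1, 0, 0, 0)), (-1, (0, 1, 2, 0, 0, 0)),
    (1, (1, 1, 1, 0, 0, 0)),
)

# F_bar = (1 - S0^2)(1 - S1^2), then reduce exponents to get F.
F_bar = mul(sub(ONE, mul(S0, S0)), sub(ONE, mul(S1, S1)))
F = reduce_exponents(F_bar)

points = list(product(range(P), repeat=NVARS))
assert all(evaluate(F_bar, pt) == evaluate(F, pt) for pt in points)

if __name__ == "__main__":
    print("monomials in F_bar:", len(F_bar))
    print("monomials in F:    ", len(F))
```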

Now, the monomial \prod_e x_e^{m_e} has degree \sum_e p^e m_e. Identify [0,p-1]^r with [0,q-1] by sending (m_0, m_1, \ldots, m_{r-1}) to \sum m_e p^e. We can thus think of [0,p-1]^{rn} as [0,q-1]^n. We get the bound N \leq 3 | \{(m^1, m^2, \ldots, m^n) \in [0,q-1]^n : \sum m^i \leq n(q-1)/3 \}|, just as in the prime case.

Naslund-Sawin — binomial coefficients

Let’s be much slicker. Here is how Naslund and Sawin do it (original here).

Notice that, by Lucas’s theorem, the function x \mapsto \binom{x}{m} is a well defined function C_q \to \mathbb{F}_p when 0 \leq m \leq q-1. Moreover, using Lucas again,

\displaystyle{\binom{x-1}{q-1}=\begin{cases} 1 & x \equiv 0 \bmod q \\ 0 & \mbox{otherwise} \end{cases}}.
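Both consequences of Lucas’s theorem are easy to test numerically. The sketch below is only a sanity check over a finite range, not a proof, and `check_lucas_facts` is a name made up for this illustration.

```python
from math import comb


def check_lucas_facts(p, r, periods=3):
    """Check, for x in 1..periods*q, the two facts used above (q = p^r)."""
    q = p ** r
    xs = range(1, periods * q + 1)
    # (1) x -> C(x, m) mod p depends only on x mod q when 0 <= m <= q-1.
    periodic = all(comb(x, m) % p == comb(x + q, m) % p
                   for m in range(q) for x in xs)
    # (2) C(x-1, q-1) mod p is the indicator function of x = 0 mod q.
    indicator = all(comb(x - 1, q - 1) % p == (1 if x % q == 0 else 0)
                    for x in xs)
    return periodic, indicator


if __name__ == "__main__":
    for p, r in [(2, 3), (3, 2), (5, 1)]:
        print(p, r, check_lucas_facts(p, r))
```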

Define a function F: C_q^3 \to \mathbb{F}_p by

\displaystyle{F(x,y,z) = \binom{x+y+z-1}{q-1} = \sum_{i+j+k = q-1} \binom{x-1}{i} \binom{y}{j} \binom{z}{k}}

\displaystyle{=\sum_{i+j+k \leq  q-1} (-1)^{q-1-i-j-k} \binom{x}{i} \binom{y}{j} \binom{z}{k}}.

Here we have expanded by Vandermonde’s identity and used \binom{x-1}{m} = \sum_{i \leq m} (-1)^{m-i} \binom{x}{i}.
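The expansion is an identity between integers, before any reduction mod p, so it can be spot-checked directly. Here is a small sketch; the names `lhs` and `rhs` are just labels for the two sides, and x is taken to be at least 1 so that every binomial coefficient has non-negative arguments.

```python
from math import comb


def lhs(x, y, z, q):
    return comb(x + y + z - 1, q - 1)


def rhs(x, y, z, q):
    """Signed sum over all i + j + k <= q - 1."""
    total = 0
    for i in range(q):
        for j in range(q - i):
            for k in range(q - i - j):
                sign = (-1) ** (q - 1 - i - j - k)
                total += sign * comb(x, i) * comb(y, j) * comb(z, k)
    return total


if __name__ == "__main__":
    ok = all(lhs(x, y, z, q) == rhs(x, y, z, q)
             for q in (3, 4)
             for x in range(1, 7) for y in range(6) for z in range(6))
    print("identity holds on the tested range:", ok)
```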

Define a function G: C_q^{3n} \to \mathbb{F}_p by G(x,y,z) = \prod_{f=1}^n F(x^f,y^f,z^f) just as before. So G(x,y,z)=0 if and only if x+y+z \neq 0 in the abelian group C_q^n. Expanding G gives a sum of terms of the form \prod_f \binom{x^f}{i^f} \binom{y^f}{j^f} \binom{z^f}{k^f}. Considering such a term to have “degree” \sum_f (i^f+j^f+k^f), we see that G has degree \leq (q-1)n.

As in the standard proof, factor out whichever of \prod_f \binom{x^f}{i^f}, \prod_f \binom{y^f}{j^f} or \prod_f \binom{z^f}{k^f}, has least degree. We obtain

\displaystyle{ G(x,y,z) = \sum A_s(x) P_s(y,z) + \sum B_s(y) Q_s(x,z) + \sum C_s(z) R_s(x,y)}

where A_s, B_s and C_s are products of binomial coefficients and, taking \deg \binom{w}{m}=m, we have \deg A_s, \deg B_s and \deg C_s \leq (q-1)n/3.

We derive the bound N \leq 3 | \{(m^1, m^2, \ldots, m^n) \in [0,q-1]^n : \sum m^f \leq n(q-1)/3 \}|, exactly as before.

Group rings — Petrov’s argument

I have taken the most liberties in rewriting this argument, to emphasize the similarity with the other arguments. The reader can see the original here.

Let \Gamma = C_q^n. Let \mathbb{F}_p^{\Gamma} be the ring of functions \Gamma \to \mathbb{F}_p with pointwise operations, and let \mathbb{F}_p[\Gamma] be the group ring of \Gamma. We think of \mathbb{F}_p[\Gamma] acting on \mathbb{F}_p^{\Gamma} by \left( \left( \sum a_{\sigma} \sigma \right) f \right)(w) = \sum a_{\sigma} f(w+\sigma).

Let \tau_1, \tau_2, …, \tau_n be generators for \Gamma = C_q^n. Let Y_{\leq d} \subset \mathbb{F}_p^{\Gamma} be the set of functions annihilated by the operators \prod_f (\tau_f-1)^{m_f} where \sum m_f = d+1. For example, Y_{\leq 1} consists of the functions f: \Gamma \to \mathbb{F}_p which obey f(w+\tau_i+\tau_j) - f(w+\tau_i)-f(w+\tau_j)+f(w)=0 for any w, i and j. We think of Y_{\leq d} as polynomials of degree \leq d, and the dimension of Y_{\leq d} is the number of monomials in n variables of total degree \leq d where each variable has degree \leq q-1.
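In the case n=1 this notion of degree is easy to experiment with: \tau-1 is just the forward difference operator on functions C_q \to \mathbb{F}_p. The sketch below (with made-up names `delta` and `degree`) checks, for q=9, that x \mapsto \binom{x}{m} has degree exactly m in this sense, and that the indicator function of the identity has degree q-1.

```python
from math import comb


def delta(f, p):
    """Apply (tau - 1) to f, a list of values mod p indexed by C_q = Z/q."""
    q = len(f)
    return [(f[(w + 1) % q] - f[w]) % p for w in range(q)]


def degree(f, p):
    """Largest d with (tau - 1)^d f != 0, or -1 if f is identically zero."""
    d = -1
    while any(f):
        f = delta(f, p)
        d += 1
    return d


p, r = 3, 2
q = p ** r

for m in range(q):
    binom_m = [comb(x, m) % p for x in range(q)]
    assert degree(binom_m, p) == m

H = [1] + [0] * (q - 1)   # indicator function of the identity of C_q
assert degree(H, p) == q - 1
```

The loop in `degree` always terminates, because (\tau-1)^q = \tau^q - 1 = 0 in \mathbb{F}_p[C_q].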

Define H: \Gamma \to \mathbb{F}_p by H(0)=1 and H(w)=0 otherwise. Define G: \Gamma^3 \to \mathbb{F}_p by G(x,y,z) = H(x+y+z).

We write \alpha_i, \beta_i and \gamma_i for the generators of the three factors in \Gamma \times \Gamma \times \Gamma.
Then we have

\displaystyle{ \left( \prod (\alpha_f-1)^{i_f} \prod (\beta_f-1)^{j_f} \prod (\gamma_f-1)^{k_f} G\right) (x,y,z) = }

\displaystyle{ \left(  \prod (\tau_f-1)^{i_f+j_f+k_f}  H\right) (x+y+z)}.

So, if \sum_f (i_f+j_f+k_f) > (q-1)n, then i_f+j_f+k_f \geq q for some f, and since (\tau_f-1)^q = \tau_f^q-1 = 0 in \mathbb{F}_p[\Gamma], we get \prod (\alpha_f-1)^{i_f} \prod (\beta_f-1)^{j_f} \prod (\gamma_f-1)^{k_f} G=0 as a function on \Gamma^3.

On the other hand, we can expand G(x,y,z) = \sum A_s(x) B_s(y) C_s(z) for A_s, B_s and C_s in \mathbb{F}_p^{\Gamma}. We see that, if \sum_f (i_f+j_f+k_f) > (q-1)n, then

\displaystyle{\sum_s \left( \prod (\alpha_f-1)^{i_f} A_s \right) (x) \left( \prod (\beta_f-1)^{j_f} B_s \right) (y) \left( \prod (\gamma_f-1)^{k_f} C_s \right) (z) =0}.

We make the familiar deduction: We can write G in the form

\displaystyle{ G(x,y,z) = \sum A_s(x) P_s(y,z) + \sum B_s(y) Q_s(x,z) + \sum C_s(z) R_s(x,y)}

where A_s, B_s and C_s run over a basis for Y_{\leq (q-1)n/3}.

Once more, we obtain the bound N \leq 3 \dim Y_{\leq (q-1)n/3}.

Petrov’s method has an advantage not seen in the other proofs: It generalizes well to the case that \Gamma is non-abelian. For any finite group \Gamma, let I be a one-sided ideal in \mathbb{F}_p[\Gamma] obeying I^3=0. In our case, this is the ideal generated by \prod_f (\tau_f-1)^{m_f} with \sum m_f = (q-1)n/3+1. Then we obtain a bound N \leq 3 \dim \mathbb{F}_p[\Gamma]/I for sum free sets in \Gamma.

What’s going on?

I find Petrov’s proof immensely clarifying, because it explains why the arguments all give the same bound. We are all working with functions C_q \to \mathbb{F}_p. I write them as polynomials in r variables x_e, Naslund and Sawin use binomial coefficients \binom{x}{m}. The formulas to translate between our variables are a mess: For example, my x_1 is their \tfrac{1}{p} (x-x^p) \bmod p. However, we both agree on what it means to be a polynomial of degree \leq d: It means to be annihilated by (\tau-1)^{d+1}.

In all three proofs, we take the indicator function of the identity and pull it back to \Gamma^3 along the addition map. The first two proofs use explicit identities to see that the result has degree \leq (q-1)n. The third proof points out this is an abstract property of functions pulled back along addition of groups, and has nothing to do with how we write the functions as explicit formulas.

I sometimes think that mathematical progress consists of first finding a dozen proofs of a result, then realizing there is only one proof. My mental image is settling a wilderness — first there are many trails through the dark woods, but later there is an open field where we can run wherever we like. But can we get anywhere beyond the current bounds with this understanding? All I can say is not yet…

Matt StrasslerSpinoffs from Fundamental Science

I find that some people just don’t believe scientists when we point out that fundamental research has spin-off benefits for modern society.  The assumption often seems to be that it’s just a bunch of egghead esoteric researchers trying to justify their existence.  It’s a real problem when those scoffing at our evidence are congresspeople of the United States and their staffers, or other members of governmental funding agencies around the world.

So I thought I’d point out an example, reported on Bloomberg News.  It’s a good illustration of how these things often work out, and it is very rare indeed that they are discussed in the press.

Gravitational waves are usually incredibly tiny effects [typically squeezing the radius of our planet by less than the width of an atomic nucleus] that can be made only with monster black holes and neutron stars.   There’s not much hope of using them in technology.  So what good could an experiment to discover them, such as LIGO, possibly be for the rest of the world?

Well, Shell Oil seems to have found some value in it.   It’s not in the gravitational waves themselves, of course; instead, it is in the technology that has to be developed to detect something so delicate.

Score another one for investment in fundamental scientific research.


July 06, 2016

Doug NatelsonKeeping your (samples) cool is not always easy.

Very often in condensed matter physics we like to do experiments on materials or devices in a cold environment.  As has been appreciated for more than a century, cooling materials down often makes them easier to understand, because at low temperatures there is not enough thermal energy bopping around to drive complicated processes.  There are fewer lattice vibrations.  Electrons settle down more into their lowest available states.  The spread in available electron energies is proportional to \(k_{\mathrm{B}}T\), so any electronic measurement as a function of energy gets sharper-looking at low temperatures.

Sometimes, though, you have to dump energy into the system to do the study you care about.  If you want to measure electronic conduction, you have to apply some voltage \(V\) across your sample to drive a current \(I\), and that \(I \times V\) power shows up as heat.  In our case, we have done work over the last few years trying to do simultaneous electronic measurements and optical spectroscopy on metal junctions containing one or a few molecules (see here).   What we are striving toward is doing inelastic electron tunneling spectroscopy (IETS - see here) at the same time as molecular-scale Raman spectroscopy (see here for example).   The tricky bit is that IETS works best at really low temperatures (say 4.2 K), where the electronic energy spread is small (hundreds of microvolts), but the optical spectroscopy works best when the structure is illuminated by a couple of mW of laser power focused into a ~ 1.5 micron diameter spot.
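The energy-spread figure is quick arithmetic to reproduce. Here is a minimal sketch that converts a temperature to the equivalent thermal voltage k_B T / e; the function name is made up for this illustration, and the only input beyond the post is the standard value of the Boltzmann constant.

```python
K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K (CODATA 2018 value)


def thermal_energy_microvolts(temperature_k):
    """k_B * T / e, expressed in microvolts."""
    return K_B_EV_PER_K * temperature_k * 1e6


if __name__ == "__main__":
    for t in (4.2, 100.0, 300.0):
        print(f"T = {t:6.1f} K  ->  k_B T / e = {thermal_energy_microvolts(t):9.1f} microvolts")
```

At 4.2 K this gives roughly 360 microvolts, consistent with the "hundreds of microvolts" quoted above; at 100 K it is already several millivolts.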

It turns out that the amount of heating you get when you illuminate a thin metal wire (which can be detected in various ways; for example, we can use the temperature-dependent electrical resistance of the wire itself as a thermometer) isn't too bad when the sample starts out at, say, 100 K.  If the sample/substrate starts out at about 5 K, however, even modest incident laser power directly on the sample can heat the metal wire by tens of Kelvin, as we show in a new paper.  How the local temperature changes with incident laser intensity is rather complicated, and we find that we can model this well if the main roadblock at low temperatures is the acoustic mismatch thermal boundary resistance.  This is a neat effect discussed in detail here.  Vibrational heat transfer between the metal and the underlying insulating substrate is hampered (like \(1/T^3\) at low temperatures) by the fact that the speed of sound is very different between the metal and the insulator.   There are a bunch of other complicated issues (this and this, for example) that can also hinder heat flow in nanostructures, but the acoustic mismatch appears to be the dominant one in our case.   The bottom line:  staying cool in the spotlight is hard.  We are working away on some ideas on mitigating this issue.  Fun stuff.

(Note:  I'm doing some travel, so posting will slow down for a bit.)

Scott AaronsonITCS’2017: Special Guest Post by Christos Papadimitriou

The wait is over.

Yes, that’s correct: the Call for Papers for the 2017 Innovations in Theoretical Computer Science (ITCS) conference, to be held in Berkeley this coming January 9-11, is finally up.  I attended ITCS’2015 in Rehovot, Israel and had a blast, and will attend ITCS’2017 if logistics permit.

But that’s not all: in a Shtetl-Optimized exclusive, the legendary Christos Papadimitriou, coauthor of the acclaimed Logicomix and ITCS’2017 program chair, has written us a guest post about what makes ITCS special and why you should come.  Enjoy!  –SA

ITCS:  A hidden treasure of TCS

by Christos Papadimitriou

Conferences, for me, are a bit like demonstrations: they were fun in the 1970s.  There was the Hershey STOC, of course, and that great FOCS in Providence, plus a memorable database theory gathering in Calabria.  Ah, children, you should have been there…

So, even though I was a loyal supporter of the ITCS idea from the beginning – the “I”, you recall, stands for innovation –, I managed to miss essentially all of them – except for those that left me no excuse.  For example, this year the program committee was unreasonably kind to my submissions, and so this January I was in Boston to attend.

I want to tell you about ITCS 2016, because it was a gas.

First, I saw all the talks.  I cannot recall this ever happening to me before.  I reconnected with fields of old, learned a ton, and got a few cool new ideas.

In fact, I believe that there was no talk with fewer than 60 people in the audience – and that’s about 70% of the attendees.  In most talks it was closer to 90%.  When was the last conference where you saw that?

And what is the secret of this enhanced audience attention?  One explanation is that a smaller conference means a smaller auditorium.  Listening to the talk no longer feels like watching a concert in a stadium, or an event on TV; it’s more like a story related by a friend.  Another gimmick that works well is that, at ITCS, session chairs start the session with a 10-minute “rant,” providing context and their own view of the papers in the session.

Our field got a fresh breath of cohesion at ITCS 2016: cryptographers listened to game theorists in the presence of folks who do data structures for a living, or circuit complexity – for a moment there, the seventies were back.

Ah, those papers, their cleverness and diversity and freshness!  Here is a sample of a few with a brief comment for each (take a look at the conference website for the papers and the presentations).

  • What is keeping quantum computers from conquering all of NP? It is the problem with destructive measurements, right?  Think again, say Aaronson, Bouland and Fitzsimons.  In their paper (pdf, slides) they consider several deviations from current restrictions, including non-destructive measurements, and the space ‘just above’ BQP turns out to be a fascinating and complex place.
  • Roei Tell (pdf, slides) asks another unexpected question: when is an object far from being far from having a property? On the way to an answer he discovers a rich and productive duality theory of property testing, as well as a very precise and sophisticated framework in which to explore it.
  • If you want to represent the permanent of a matrix as the determinant of another matrix of linear forms in the entries, how large must this second matrix be? – an old question by Les Valiant. The innovation by Landsberg and Ressayre (pdf, slides) is that they make fantastic progress in this important problem through geometric complexity: If certain natural symmetries are to be satisfied, the answer is exponential!

(A parenthesis:  The last two papers make the following important point clear: Innovation in ITCS is not meant to be the antithesis of mathematical sophistication.  Deep math and methodological innovation are key ingredients of the ITCS culture.)

  • When shall we find an explicit function requiring more than 3n gates? In their brave exploration of new territory for circuit complexity, Golovnev and Kulikov (pdf, slides) find one possible answer: “as soon as we have explicit dispersers for quadratic varieties.”
  • The student paper award went to Aviad Rubinstein for his work (pdf) on auctioning multiple items – the hardest nut in algorithmic mechanism design. He gives a PTAS for optimizing over a large – and widely used – class of “partitioning” heuristics.

Even though there were no lively discussions at the lobby during the sessions – too many folks attending, see? – the interaction was intense and enjoyable during the extra long breaks and the social events.

Plus we had the Graduating Bits night, when the youngest among us get 5 minutes to tell.  I would have traveled to Cambridge just for that!

All said, ITCS 2016 was a gem of a meeting.  If you skipped it, you really missed a good one.

But there is no reason to miss ITCS 2017. Let me tell you a few things about it:

  • It will be in Berkeley, January 9-11, 2017, the week before the Barcelona SODA.
  • It will take place at the Simons Institute just a few days before the boot camps on Pseudorandomness and Learning.
  • I volunteered to be program chair, and the steering committee has decided to try a few innovations in the submission process:
      • Submission deadline is mid-September, so you have a few more weeks to collect your most innovative thoughts. Notification before the STOC deadline.
      • Authors will post a copy of their paper, and will submit to the committee a statement about it, say 1000 words max. Think of it as your chance to write a favorable referee report for your own paper!  Tell the committee why you think it is interesting and innovative.  If you feel this is self-evident, just tell us that.
      • The committee members will be the judges of the overall worth and innovative nature of the paper. Sub-reviewers are optional, and their opinion is not communicated to the rest of the committee.
      • The committee may invite speakers to present specific recent interesting work. Submitted papers especially liked by the committee may be elevated to “invited.”
  • Plus Graduating Bits, chair rants, social program, not to mention the Simons Institute auditorium and Berkeley in January.

You should come!

ResonaancesCMS: Higgs to mu tau is going away

One interesting anomaly in the LHC run-1 was a hint of Higgs boson decays to a muon and a tau lepton. Such a process is forbidden in the Standard Model by the conservation of muon and tau lepton numbers. Neutrino masses violate individual lepton numbers, but their effect is far too small to affect the Higgs decays in practice. On the other hand, new particles do not have to respect global symmetries of the Standard Model, and they could induce lepton flavor violating Higgs decays at an observable level. Surprisingly, CMS found a small excess in the Higgs to tau mu search in their 8 TeV data, with the measured branching fraction Br(h→τμ)=(0.84±0.37)%. The analogous measurement in ATLAS is 1 sigma above the background-only hypothesis, Br(h→τμ)=(0.53±0.51)%. Together this merely corresponds to a 2.5 sigma excess, so it's not too exciting in itself. However, taken together with the B-meson anomalies in LHCb, it has raised hopes for lepton flavor violating new physics just around the corner. For this reason, the CMS excess inspired a few dozen theory papers, with Z' bosons, leptoquarks, and additional Higgs doublets pointed out as possible culprits.

Alas, the wind is changing. CMS made a search for h→τμ in their small stash of 13 TeV data collected in 2015. This time they were hit by a negative background fluctuation, and they found Br(h→τμ)=(-0.76±0.81)%. The accuracy of the new measurement is worse than that in run-1, but nevertheless it lowers the combined significance of the excess below 2 sigma. Statistically speaking, the situation hasn't changed much,  but psychologically this is very discouraging. A true signal is expected to grow when more data is added, and when it's the other way around it's usually a sign that we are dealing with a statistical fluctuation...
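The quoted significances can be roughly reproduced with a naive inverse-variance weighted average. This is only a back-of-the-envelope sketch, assuming Gaussian and uncorrelated uncertainties, which is not how the experiments actually combine their results; the function names are made up for the illustration.

```python
from math import sqrt


def combine(measurements):
    """Inverse-variance weighted mean and its uncertainty for (value, sigma) pairs."""
    weights = [1.0 / s ** 2 for _, s in measurements]
    mean = sum(w * v for w, (v, _) in zip(weights, measurements)) / sum(weights)
    sigma = 1.0 / sqrt(sum(weights))
    return mean, sigma


def significance(measurements):
    """Distance of the combined value from zero, in units of its uncertainty."""
    mean, sigma = combine(measurements)
    return mean / sigma


RUN1 = [(0.84, 0.37), (0.53, 0.51)]   # CMS and ATLAS at 8 TeV, branching fraction in percent
RUN2_CMS = (-0.76, 0.81)              # CMS at 13 TeV (2015 data), in percent

if __name__ == "__main__":
    print(f"run-1 only:        {significance(RUN1):.2f} sigma")
    print(f"adding 13 TeV CMS: {significance(RUN1 + [RUN2_CMS]):.2f} sigma")
```

With these inputs the run-1 combination comes out at about 2.4 sigma, and adding the 13 TeV measurement pulls it just under 2 sigma, in line with the numbers in the text.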

So, if you have a cool model explaining the h→τμ  excess be sure to post it on arXiv before more run-2 data is analyzed ;)

July 05, 2016

Mark Chu-CarrollCantor Crankery is Boring

Sometimes, I think that I’m being punished.

I’ve written about Cantor crankery so many times. In fact, it’s one of the largest categories in this blog’s index! I’m pretty sure that I’ve covered pretty much every anti-Cantor argument out there. And yet, not a week goes by when another idiot doesn’t pester me with their “new” refutation of Cantor. The “new” argument is always, without exception, a variation on one of the same old boring ones.

But I haven’t written any Cantor stuff in quite a while, and yet another one landed in my mailbox this morning. So, what the heck. Let’s go down the rabbit-hole once again.

We’ll start with a quick refresher.

The argument that the cranks hate is called Cantor’s diagonalization. Cantor’s diagonalization is an argument that according to the axioms of set theory, the cardinality (size) of the set of real numbers is strictly larger than the cardinality of the set of natural numbers.

The argument is based on the set theoretic definition of cardinality. In set theory, two sets are the same size if and only if there exists a one-to-one correspondence between the two sets. If, no matter how you map set A into set B so that every element of A goes to a unique element of B, there are always leftover elements of B that no element of A maps to, then the cardinality of B is larger than the cardinality of A.

When you’re talking about finite sets, this is really easy to follow. If A is the set {1, 2, 3}, and B is the set {4, 5, 6, 7}, then it’s pretty obvious that there’s no one-to-one mapping from A to B: there are more elements in B than there are in A. You can easily show this by enumerating every possible mapping of elements of A onto elements of B, and then showing that in every one, there’s an element of B that isn’t mapped to by an element of A.
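For sets this small, the enumeration can literally be carried out. Here is a quick sketch that lists all 64 mappings from {1, 2, 3} to {4, 5, 6, 7} and records which elements of the target each one misses; the variable names are chosen only for this illustration.

```python
from itertools import product

A = [1, 2, 3]
B = [4, 5, 6, 7]

# Every function from A to B, represented as a dict.
mappings = [dict(zip(A, images)) for images in product(B, repeat=len(A))]

# For each mapping, the elements of B that nothing in A maps to.
missed = [set(B) - set(m.values()) for m in mappings]

if __name__ == "__main__":
    print("number of mappings:", len(mappings))
    print("every mapping misses something in B:", all(missed))
```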

With infinite sets, it gets complicated. Intuitively, you’d think that the set of even natural numbers is smaller than the set of all natural numbers: after all, the set of evens is a strict subset of the naturals. But your intuition is wrong: there’s a very easy one to one mapping from the naturals to the evens: {n → 2n }. So the set of even natural numbers is the same size as the set of all natural numbers.

To show that one infinite set has a larger cardinality than another infinite set, you need to do something slightly tricky. You need to show that no matter what mapping you choose between the two sets, some elements of the larger one will be left out.

In the classic Cantor argument, what he does is show you how, given any purported mapping between the natural numbers and the real numbers, to find a real number which is not included in the mapping. So no matter what mapping you choose, Cantor will show you how to find real numbers that aren’t in the mapping. That means that every possible mapping between the naturals and the reals will omit members of the reals – which means that the set of real numbers has a larger cardinality than the set of naturals.
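The construction is mechanical enough to sketch in code. The version below works on a finite list of digit strings, which of course only illustrates the first few steps of the real argument; the function name and the choice of replacement digits are assumptions made for this illustration.

```python
def diagonal_escape(listing):
    """Return digits of a number that differs from listing[n] in its n-th digit.

    `listing[n]` is a string holding the decimal digits (after the point)
    of the n-th real in a purported enumeration; it must have more than n digits.
    """
    out = []
    for n, digits in enumerate(listing):
        d = digits[n]
        # Using only 5 and 6 avoids the 0.0999... = 0.1000... ambiguity.
        out.append("6" if d == "5" else "5")
    return "".join(out)


if __name__ == "__main__":
    listing = ["14159265", "71828182", "41421356", "73205080"]
    escaped = diagonal_escape(listing)
    print("0." + escaped, "differs from entry n at digit n for every n")
```

Whatever list you feed in, the output disagrees with the n-th entry in the n-th digit, so it cannot appear anywhere in the list.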

Cantor’s argument has stood since it was first presented in 1891, despite the best efforts of people to refute it. It is an uncomfortable argument. It violates our intuitions in a deep way. Infinity is infinity. There’s nothing bigger than infinity. What does it even mean to be bigger than infinity? That’s a non sequitur, isn’t it?

What it means to be bigger than infinity is exactly what I described above. It means that if you have two infinitely large sets of objects, and there’s no possible way to map from one to the other without missing elements, then one is bigger than the other.

There are legitimate ways to dispute Cantor. The simplest one is to reject set theory. The diagonalization is an implication of the basic axioms of set theory. If you reject set theory as a basis, and start from some other foundational axioms, you can construct a version of mathematics where Cantor’s proof doesn’t work. But if you do that, you lose a lot of other things.

You can also argue that “cardinality” is a bad abstraction. That is, that the definition of cardinality as size is meaningless. Again, you lose a lot of other things.

If you accept the axioms of set theory, and you don’t dispute the definition of cardinality, then you’re pretty much stuck.

Ok, background out of the way. Let’s look at today’s crackpot. (I’ve reformatted his text somewhat; he sent this to me as plain-text email, which looks awful in my wordpress display theme, so I’ve rendered it into formatted HTML. Any errors introduced are, of course, my fault, and I’ll correct them if and when they’re pointed out to me.)

We have been told that it is not possible to put the natural numbers into a one to one with the real numbers. Well, this is not true. And the argument, to show this, is so simple that I am absolutely surprised that this argument does not appear on the internet.

We accept that the set of real numbers is unlistable, so to place them into a one to one with the natural numbers we will need to make the natural numbers unlistable as well. We can do this by mirroring them to the real numbers.

Given any real number (between 0 and 1) it is possible to extract a natural number of any length that you want from that real number.

Ex: From π-3 we can extract the natural numbers 1, 14, 141, 1415, 14159 etc…

We can form a set that associates the extracted number with the real number that it was extracted from.

Ex: 1 → 0.14159265…

Then we can take another real number (in any arbitrary order) and extract a natural number from it that is not in our set.

Ex: 1 → 0.14159266… since 1 is already in our set we must extract the next natural number 14.

Since 14 is not in our set we can add the pair 14 → 0.14159266… to our set.

We can do the same thing with some other real number 0.14159267… since 1 and 14 is already in our set we will need to extract a 3 digit natural number, 141, and place it in our set. And so on.

So our set would look something like this…

A) 1 → 0.14159265…
B) 14 → 0.14159266…
C) 141 → 0.14159267…
D) 1410 → 0.141
E) 14101 → 0.141013456789…
F) 5 → 0.567895…
G) 55 → 0.5567891…
H) 555 → 0.555067891…

Since the real numbers are infinite in length (some terminate in an infinite string of zero’s) then we can always extract a natural number that is not in the set of pairs since all the natural numbers in the set of pairs are finite in length. Even if we mutate the diagonal of the real numbers, we will get a real number not on the list of real numbers, but we can still find a natural number, that is not on the list as well, to correspond with that real number.

Therefore it is not possible for the set of real numbers to have a larger cardinality than the set of natural numbers!

This is a somewhat clever variation on a standard argument.

Over and over and over again, we see arguments based on finite prefixes of real numbers. The problem with them is that they’re based on finite prefixes. The thing about the set of all finite prefixes of the real numbers is that there’s an obvious correspondence between the natural numbers and the finite prefixes, but that still doesn’t mean that every real number is in the list.

In this argument, every finite prefix of π corresponds to a natural number. But π itself does not. In fact, every real number that actually requires an infinite number of digits has no corresponding natural number.

This piece of it is, essentially, the same thing as John Gabriel’s crankery.

But there’s a subtler and deeper problem. This “refutation” of Cantor contains the conclusion as an implicit premise. That is, it’s actually using the assumption that there’s a one-to-one mapping between the naturals and the reals to prove the conclusion that there’s a one-to-one mapping between the naturals and the reals.

If you look at his procedure for generating the mapping, it requires an enumeration of the real numbers. You need to take successive reals, and for each one in the sequence, you produce a mapping from a natural number to that real. If you can’t enumerate the real numbers as a list, the procedure doesn’t work.

If you can produce a sequence of the real numbers, then you don’t need this procedure: you’ve already got your mapping. 0 to the first real, 1 to the second real, 2 to the third real, 3 to the fourth real, and so on.
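Writing the procedure down as code makes the hidden premise hard to miss. The sketch below is one reading of the emailed procedure, working on finite digit strings; the function name is made up. The thing to notice is the argument: the function consumes a list of reals, one after another, which is exactly the enumeration it is supposed to produce.

```python
def assign_prefixes(reals):
    """Pair each digit string with its shortest prefix (read as a natural
    number) that has not been used by an earlier entry of the list."""
    used = set()
    pairs = []
    for digits in reals:                      # <- requires the reals as a sequence
        for length in range(1, len(digits) + 1):
            n = int(digits[:length])
            if n not in used:
                used.add(n)
                pairs.append((n, digits))
                break
        else:
            raise ValueError("digit string too short to find an unused prefix")
    return pairs


if __name__ == "__main__":
    reals = ["14159265", "14159266", "14159267"]
    for n, digits in assign_prefixes(reals):
        print(f"{n} -> 0.{digits}...")
    # The list index already gives a mapping, with no prefixes needed:
    for index, digits in enumerate(reals):
        print(f"{index} -> 0.{digits}...")
```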

So, once again: sorry Charlie: your argument doesn’t work. There’s no Fields medal for you today.

One final note. Like so many other bits of bad math, this is a classic example of what happens when you try to do math with prose. There’s a reason that mathematicians have developed formal notations, formal language, detailed logical inference, and meticulous citation. It’s because in prose, it’s easy to be sloppy. You can accidentally introduce an implicit premise without meaning to. If you need to carefully state every premise, and cite evidence of its truth, it’s a whole lot harder to make this kind of mistake.

That’s the real problem with this whole argument. It’s built on the fact that the premise “you can enumerate the real numbers” is built in. If you wrote it in careful formal mathematics, you wouldn’t be able to get away with that.

Chad OrzelWhy Physicists Disparage Philosophers, In Three Paragraphs

Periodically, some scientific celebrity from the physical sciences– Neil deGrasse Tyson or Stephen Hawking, say– will say something dismissive about philosophy, and kick off a big rush of articles about how dumb their remarks are, how important philosophy is, and so on. Given that this happens on a regular basis, you might wonder why it is that prominent physicists keep saying snide things about philosophy. But never fear, the New York Times is here to help, with an op-ed by James Blachowicz, an emeritus philosopher from Loyola, grandly titled There Is No Scientific Method.

It’s actually not that bad as essays about science from philosophers go, until the very end. Blachowicz’s point is that the process of scientific discovery has more in common with other disciplines than generally appreciated. He kicks this off with a reference to a long-ago talk about the writing of poetry, but most of the essay is devoted to a comparison between a Socratic sort of argument defining courage and Kepler’s discovery that Mars has an elliptical orbit.

The specific examples he uses are kind of dry and abstract, but I’m mostly on board with his argument. I would pretty much have to be, having written a whole book arguing that scientific thinking plays an essential role in everyday life. But then we come to the final three paragraphs:

If scientific method is only one form of a general method employed in all human inquiry, how is it that the results of science are more reliable than what is provided by these other forms? I think the answer is that science deals with highly quantified variables and that it is the precision of its results that supplies this reliability. But make no mistake: Quantified precision is not to be confused with a superior method of thinking.

I am not a practicing scientist. So who am I to criticize scientists’ understanding of their method?

I would turn this question around. Scientific method is not itself an object of study for scientists, but it is an object of study for philosophers of science. It is not scientists who are trained specifically to provide analyses of scientific method.

I have any number of problems with this, starting with the fact that it feels like he realized he was up against his word limit for the column, and just stopped, like an undergrad whose term paper crossed onto the tenth page in Word. We get a thorough exploration of blind alleys toward a definition of courage, then dismiss all of science with “highly quantified variables”, and off to the faculty club for brandy.

But I think this horrible shruggie of an ending also serves to illustrate what drives (many) physicists (and other scientists) nuts about philosophers. That is, he makes a pretty decent argument by analogy that scientific thinking and philosophical thinking are more similar than not, but then fails to do… anything, really. As a scientist reading along, this positively screams for a “Therefore…” followed by some sort of action item. You’ve made an argument that the scientific method is “only one form of a general method employed in all human inquiry,” great. I’m with you on that. And now, what do we do with that information?

I find this incredibly frustrating, in no small part because (as noted above) I wrote a whole book making a similar argument. And I had a pretty clear take-away message in mind when I did that, namely that the universality of the scientific method shows that science is not, in fact, incomprehensible to non-scientists. We can all think like scientists, and knowing that should give us courage to use those reasoning skills to our advantage.

Now, obviously, I’m a scientist, and thus inclined to favor a conclusion encouraging non-scientists to take some lessons from the study of “highly quantified variables” and apply them to their own activities. But I’d also be happy with a conclusion that ran in the opposite direction– that the commonality of methods should lead scientists to show more respect for their colleagues in other fields of study. That’s also a reasonable argument to make.

But instead, it’s just ¯\_(ツ)_/¯. And while this is an especially abrupt ending, it’s just an extreme example of a pretty general phenomenon when dealing with philosophers and other scholars in “the humanities.” As a scientist, I often find myself nodding along with the steps of the process to work something out, only to be left waiting for some sort of concrete conclusion about what comes next. There’s a comprehensive failure to build on prior results, or even suggest how someone else might build on them in the future, and as a physicist I find this maddening.

(Of course, the absolute worst part of this tendency is that it carries over into faculty governance. As a result, we have interminable meetings in which people ask questions but aren’t interested in hearing answers, or identify problems but decline to make any policy recommendations toward solving them…)

And there’s an extra bonus scoop of “maddening” when, as in this case, the great big ¯\_(ツ)_/¯ is followed by an assertion that study of these Important Questions is a matter that must be left to philosophers. Because, clearly, scientists simply aren’t equipped to follow through this kind of analysis and then… not do anything with what they’ve learned thereby.

So, if you wonder why it is that scientists (particularly physicists) tend to roll their eyes and sigh heavily when the subject of philosophy comes up, I think this is an excellent case study. And it’s something to take into account the next time somebody sits down to write (or edit) yet another essay on Why Philosophy Matters To Science. If you want physicists to take philosophy more seriously, you need an “and therefore…” at the end. Or they’re likely to come up with their own more colorful but less helpful suggestions of what you can do with your research.

BackreactionWhy the LHC is such a disappointment: A delusion by name “naturalness”

Naturalness, according to physicists.

Before the LHC turned on, theoretical physicists had high hopes the collisions would reveal new physics besides the Higgs. The chances of that happening get smaller by the day. The possibility still exists, but the absence of new physics so far has already taught us an important lesson: Nature isn’t natural. At least not according to theoretical physicists.

The reason that many in the community expected new physics at the LHC was the criterion of naturalness. Naturalness, in general, is the requirement that a theory should not contain dimensionless numbers that are either very large or very small. If that is so, then theorists will complain the numbers are “finetuned” and regard the theory as contrived and hand-made, not to say ugly.

Technical naturalness (originally proposed by ‘t Hooft) is a formalized version of naturalness which is applied in the context of effective field theories in particular. Since you can convert any number much larger than one into a number much smaller than one by taking its inverse, it’s sufficient to consider small numbers in the following. A theory is technically natural if all suspiciously small numbers are protected by a symmetry. The standard model is technically natural, except for the mass of the Higgs.

The Higgs is the only (fundamental) scalar we know and, unlike all the other particles, its mass receives quantum corrections of the order of the cutoff of the theory. The cutoff is assumed to be close by the Planck energy – that means the estimated mass is 15 orders of magnitude larger than the observed mass. This too-large mass of the Higgs could be remedied simply by subtracting a similarly large term. This term however would have to be delicately chosen so that it almost, but not exactly, cancels the huge Planck-scale contribution. It would hence require finetuning.

In the framework of effective field theories, a theory that is not natural is one that requires a lot of finetuning at high energies to get the theory at low energies to work out correctly. The degree of finetuning can be, and has been, quantified in various measures of naturalness. Finetuning is thought of as unacceptable because the theory at high energy is presumed to be more fundamental. The physics we find at low energies, so the argument goes, should not be highly sensitive to the choice we make for that more fundamental theory.

Until a few years ago, most high energy particle theorists therefore would have told you that the apparent need to finetune the Higgs mass means that new physics must appear near the energy scale where the Higgs is produced. The new physics, for example supersymmetry, would avoid the finetuning.

There’s a standard tale they have about the use of naturalness arguments, which goes somewhat like this:

1) The electron mass isn’t natural in classical electrodynamics, and if one wants to avoid finetuning this means new physics has to appear at around 70 MeV. Indeed, new physics appears even earlier in the form of the positron, rendering the electron mass technically natural.

2) The difference between the masses of the neutral and charged pion is not natural because it’s suspiciously small. To prevent fine-tuning one estimates new physics must appear around 700 MeV, and indeed it shows up in the form of the rho meson.

3) The lack of flavor changing neutral currents in the standard model means that a parameter which could a priori have been anything must be very small. To avoid fine-tuning, the existence of the charm quark is required. And indeed, the charm quark shows up in the estimated energy range.

Of these three examples only the last one was an actual prediction (Glashow, Iliopoulos, and Maiani, 1970). To my knowledge this is the only prediction that technical naturalness has ever given rise to – the other two examples are post-dictions.

Not exactly a great score card.

But well, given that the standard model – in hindsight – obeys this principle, it seems reasonable enough to extrapolate it to the Higgs mass. Or does it? Seeing that the cosmological constant, the only other known example where the Planck mass comes in, isn’t natural either, I am not very convinced.

A much larger problem with naturalness is that it’s a circular argument and thus a merely aesthetic criterion. Or, if you prefer, a philosophic criterion. You cannot make a statement about the likeliness of an occurrence without a probability distribution. And that distribution already necessitates a choice.

In the currently used naturalness arguments, the probability distribution is assumed to be uniform (or at least approximately uniform) in a range that can be normalized to one by dividing through suitable powers of the cutoff. Any other type of distribution, say, one that is sharply peaked around small values, would require the introduction of such a small value in the distribution already. But such a small value justifies itself by the probability distribution just like a number close to one justifies itself by its probability distribution.

Naturalness, hence, becomes a chicken-and-egg problem: Put in the number one, get out the number one. Put in 0.00004, get out 0.00004. The only way to break that circle is to just postulate that some number is somehow better than all other numbers.

The number one is indeed a special number in that it’s the unit element of the multiplication group. One can try to exploit this to come up with a mechanism that prefers a uniform distribution with an approximate width of one by introducing a probability distribution on the space of probability distributions, leading to a recursion relation. But that just leaves one to explain why that mechanism.

Another way to see that this can’t solve the problem is that any such mechanism will depend on the basis in the space of functions. E.g., you could try to single out a probability distribution by asking that it’s the same as its Fourier-transformation. But the Fourier-transformation is just one of infinitely many basis transformations in the space of functions. So again, why exactly this one?

Or you could try to introduce a probability distribution on the space of transformations among bases of probability distributions, and so on. Indeed I’ve played around with this for some while. But in the end you are always left with an ambiguity, either you have to choose the distribution, or the basis, or the transformation. It’s just pushing around the bump under the carpet.

The basic reason there’s no solution to this conundrum is that you’d need another theory for the probability distribution, and that theory per assumption isn’t part of the theory for which you want the distribution. (It’s similar to the issue with the meta-law for time-varying fundamental constants, in case you’re familiar with this argument.)

In any case, whether you buy my conclusion or not, it should give you pause that high energy theorists don’t ever address the question of where the probability distribution comes from. Suppose there indeed was a UV-complete theory of everything that predicted all the parameters in the standard model. Why then would you expect the parameters to be stochastically distributed to begin with?

This lacking probability distribution, however, isn’t my main issue with naturalness. Let’s just postulate that the distribution is uniform and admit it’s an aesthetic criterion, alrighty then. My main issue with naturalness is that it’s a fundamentally nonsensical criterion.

Any theory that we can conceive of which describes nature correctly must necessarily contain hand-picked assumptions which we have chosen “just” to fit observations. If that wasn’t so, all we’d have left to pick assumptions would be mathematical consistency, and we’d end up in Tegmark’s mathematical universe. In the mathematical universe then, we’d no longer have to choose a consistent theory, ok. But we’d instead have to figure out where we are, and that’s the same question in green.

All our theories contain lots of assumptions like Hilbert-spaces and Lie-algebras and Hausdorff measures and so on. For none of these is there any explanation other than “it works.” In the space of all possible mathematics, the selection of this particular math is infinitely fine-tuned already – and it has to be, for otherwise we’d be lost again in Tegmark space.

The mere idea that we can justify the choice of assumptions for our theories in any other way than requiring them to reproduce observations is logical mush. The existing naturalness arguments single out a particular type of assumption – parameters that take on numerical values – but what’s worse about this hand-selected assumption than any other hand-selected assumption?

This is not to say that naturalness is always a useless criterion. It can be applied in cases where one knows the probability distribution, for example for the typical distances between stars or the typical quantum fluctuation in the early universe, etc. I also suspect that it is possible to find an argument for the naturalness of the standard model that does not require postulating a probability distribution, but I am not aware of one.

It’s somewhat of a mystery to me why naturalness has become so popular in theoretical high energy physics. I’m happy to see it go out of the window now. Keep your eyes open in the next couple of years and you’ll witness that turning point in the history of science when theoretical physicists stopped dictating to nature what’s supposedly natural.

July 04, 2016

Secret Blogging SeminarThe growth rate for three-colored sum-free sets

There seems to be a rule that all progress on the cap set problem should be announced on blogs, so let me continue the tradition. Robert Kleinberg, Will Sawin and I have found the rate of growth of three-colored sum-free subsets of (\mathbb{Z}/q \mathbb{Z})^n, as n \to \infty. We just don’t know what it is that we’ve found!

The preprint is here.

Let me first explain the problem. Let H be an abelian group. A subset A of H is said to be free of three term arithmetic progressions if there are no solutions to a-2b+c=0 with a, b, c \in A, other than the trivial solutions (a,a,a). I’ll write C_q for the cyclic group of order q. Ellenberg and Gijswijt, building on work by Croot, Lev and Pach, have recently shown that such an A in C_3^n can have size at most 3 \cdot \# \{ (a_1, a_2, \ldots, a_n) \in \{0,1,2 \}^n : \sum a_i \leq 2n/3 \}, which is \approx 2.755^n. This was the first upper bound better than 3^n/n^c, and has set off a storm of activity on related questions.
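
The 2.755 figure can be checked numerically. The following is a minimal sketch, not code from the preprint: the function names are illustrative, and it simply counts the tuples in the bound by repeated polynomial multiplication and then takes an n-th root.

```python
from math import exp, log

def count_low_sum_tuples(n, q=3):
    """Count tuples in {0,...,q-1}^n whose coordinates sum to at most (q-1)*n/3."""
    # coeffs[s] = number of tuples of the current length with coordinate sum s
    coeffs = [1]
    for _ in range(n):
        new = [0] * (len(coeffs) + q - 1)
        for s, c in enumerate(coeffs):
            for a in range(q):
                new[s + a] += c
        coeffs = new
    # Compare 3*s <= (q-1)*n in integers to avoid floating-point division.
    return sum(c for s, c in enumerate(coeffs) if 3 * s <= (q - 1) * n)

def growth_base(n, q=3):
    """n-th root of 3 * count, computed via logarithms so large n does not overflow."""
    return exp(log(3 * count_low_sum_tuples(n, q)) / n)

if __name__ == "__main__":
    for n in (30, 100, 300):
        print(n, round(growth_base(n), 4))
```

For moderate n the printed base sits slightly below 2.755 because of subexponential factors, and it creeps upward as n grows.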

Robert Kleinberg pointed out the argument extends just as well to bound colored-sets without arithmetic progressions. A colored set is a collection of triples (a_i, b_i, c_i) in H^3, and we say that it is free of arithmetic progressions if we have a_i - 2 b_j + c_k = 0 if and only if i=j=k. So, if a_i = b_i = c_i, then this is the same as a set free of three term arithmetic progressions, but the colored version allows us the freedom to set the three coordinates separately.

Moreover, once a, b and c are treated separately, if \# H is odd, we may as well replace b_i by -2b_i and just require that a_i+b_j+c_k=0 if and only if i=j=k. This is the three-colored sum-free set problem. Three-colored sum-free sets are easier to construct than three-term arithmetic-progression free sets, but the Croot-Lev-Pach/Ellenberg-Gijswijt bounds apply to them as well*.

Our result is a matching of upper and lower bounds: There is a constant \gamma(q) such that

(1) We can construct three-colored sum-free subsets of C_q^n of size \exp(\gamma n-o(n)) and

(2) For q a prime power, we can show that three-colored sum-free subsets of C_q^n have size at most \exp(\gamma n).

So, what is \gamma? We suspect it is the same number as in Ellenberg-Gijswijt, but we don’t know!

When q is prime, Ellenberg and Gijswijt establish the bound 3 \cdot \# \{ (a_1, a_2, \ldots, a_n) \in \{0,1,\ldots,q-1 \}^n : \sum a_i \leq (q-1)n/3 \}. Petrov, and independently Naslund and Sawin (in preparation), have extended this argument to prime powers.

In the set \{ (a_1, a_2, \ldots, a_n) \in \{0,1,\ldots,q-1 \}^n : \sum a_i \leq (q-1)n/3 \}, almost all the n-tuples have a particular mix of components. For example, when q=3, almost all n-tuples have roughly 0.514 n zeroes, 0.305 n ones and 0.181 n twos. The number of such n-tuples is roughly \exp(n \eta(0.514, 0.305, 0.181)), where \eta(p_0, p_1, \ldots, p_k) is the entropy - \sum_j p_j \log p_j.

In general, let (p_0, p_1, \ldots, p_{q-1}) be the probability distribution on \{ 0,1,\ldots, q-1 \} which maximizes entropy, subject to the constraint that the expected value \sum j p_j is (q-1)/3. Then almost all n-tuples in \{ (a_1, a_2, \ldots, a_n) \in \{0,1,\ldots,q-1 \}^n : \sum a_i \leq (q-1)n/3 \} have roughly p_j n copies of j, and the number of such n-tuples grows like \exp(n \cdot \eta(p_0, \ldots, p_{q-1})). I’ll call (p_0, \ldots, p_{q-1}) the EG-distribution.
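
Here is a small sketch of how one might compute the EG-distribution numerically. It assumes the standard Lagrange-multiplier fact that the entropy maximizer under a mean constraint has the form p_j proportional to x^j, and finds x by bisection; the function names are illustrative, not from the preprint.

```python
from math import exp, log

def eg_distribution(q, tol=1e-12):
    """Max-entropy distribution on {0,...,q-1} with mean (q-1)/3 (assumes q >= 2).

    The maximizer has p_j proportional to x**j for some 0 < x < 1, and the
    mean is increasing in x, so bisection on x suffices.
    """
    target = (q - 1) / 3.0

    def dist(x):
        weights = [x ** j for j in range(q)]
        total = sum(weights)
        return [w / total for w in weights]

    def mean(x):
        return sum(j * p for j, p in enumerate(dist(x)))

    lo, hi = 0.0, 1.0  # mean is 0 at x = 0 and (q-1)/2 at x = 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean(mid) < target:
            lo = mid
        else:
            hi = mid
    return dist((lo + hi) / 2)

def entropy(p):
    """Shannon entropy -sum p_j log p_j, in nats."""
    return -sum(pj * log(pj) for pj in p if pj > 0)

if __name__ == "__main__":
    p = eg_distribution(3)
    print([round(pj, 3) for pj in p], round(exp(entropy(p)), 3))
```

For q=3 this reproduces the mix (0.514, 0.305, 0.181) quoted above, and exponentiating the entropy gives the growth base of about 2.755.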

So Robert and I set out to construct three-colored sum-free sets of size \exp(n \cdot \eta(p_0, \ldots, p_{q-1})). What we were actually able to do was to construct such sets whenever there was an S_3-symmetric probability distribution on T:= \{ (a,b,c) \in \mathbb{Z}_{\geq 0}^3 : a+b+c = q-1 \} such that p_j was the marginal probability that the first coordinate of (a,b,c) was j, and the same for the b and c coordinates. For example, in the q=3 case, if we pick (2,0,0), (0,2,0) and (0,0,2) with probability 0.181 and (1,1,0), (1,0,1) and (0,1,1) with probability 0.152, then the resulting distribution on each of the three coordinates is the EG-distribution (0.514, 0.305, 0.181), and we can realize the growth rate of the EG bound for q=3.
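
The q=3 example is easy to verify by hand, but a short sketch makes the bookkeeping explicit. The helper names below are illustrative; the probabilities are the rounded values from the text, so they sum to 0.999 rather than exactly 1, and the marginals agree with the EG-distribution only to about three decimals.

```python
def marginal(dist, coord, q):
    """Marginal distribution of one coordinate of a distribution on triples."""
    marg = [0.0] * q
    for triple, prob in dist.items():
        marg[triple[coord]] += prob
    return marg

q = 3
# The six triples (a, b, c) of non-negative integers with a + b + c = q - 1 = 2.
dist = {
    (2, 0, 0): 0.181, (0, 2, 0): 0.181, (0, 0, 2): 0.181,
    (1, 1, 0): 0.152, (1, 0, 1): 0.152, (0, 1, 1): 0.152,
}

# Every triple must lie in T, i.e. its coordinates sum to q - 1.
assert all(sum(t) == q - 1 for t in dist)

marginals = [marginal(dist, coord, q) for coord in range(3)]

if __name__ == "__main__":
    for coord, marg in enumerate(marginals):
        print(coord, [round(m, 3) for m in marg])
```

All three coordinates give the same marginal, as S_3-symmetry requires, and it matches (0.514, 0.305, 0.181) up to the rounding of the inputs.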

Will pointed out to us that, if such a probability distribution on T does not exist, then we can lower the upper bound! So, here is our result:

Consider all S_3-symmetric probability distributions on T. Let (\psi_0, \psi_1,\ldots, \psi_{q-1}) be the corresponding marginal distribution, with \psi_j the probability that the first coordinate of (a,b,c) will be j. Let \gamma be the largest value of \eta(\psi_0, \ldots, \psi_{q-1}) for such a \psi. Then

(1) There are three-colored sum-free subsets of C_q^n of size \exp(\gamma n - o(n)) and

(2) If q is a prime power, such sets have size at most \exp(\gamma n).

Any marginal of an S_3-symmetric distribution on T has expected value (q-1)/3, so our upper bound is at least as strong as the Ellenberg-Gijswijt/Petrov-Naslund-Sawin bound. We suspect they are the same: That their optimal probability distribution is such a marginal. But we don’t know!

Here are a few remarks:

(1) The restriction to S_3-symmetric distributions is a notational convenience. Really, all we need is that all three marginals are equal to each other. But we might as well impose S_3-symmetry because, if all the marginals of a distribution are equal, we can just take the average over all S_3 permutations of that distribution.

(2) Our lower bound does not need q to be a prime power. I’d love to know whether the upper bound can also remove that restriction.

(3) If the largest entropy of a marginal comes from a distribution \pi_{abc} on T with all \pi_{abc}>0, then the marginal distribution is the EG distribution. The problem is about the distributions at the boundary; it seems hard to show that it is always beneficial to perturb inwards.

(4) For q \geq 7, there is more than one distribution on T with the required marginal. One canonical choice would be the one which has largest entropy given that marginal. If the optimal solution has all \pi_{abc}>0, then one can show that it factors as \pi_{abc} = \beta(a) \beta(b) \beta(c) for some function \beta:\{0,1,\ldots,q-1\} \to \mathbb{R}_{>0}.

* One exception is that Fedor Petrov has lowered the bound for AP free sets in C_3^n to 2 \cdot \# \{ (a_1, a_2, \ldots, a_n) \in \{0,1,2 \}^n : \sum a_i \leq 2n/3 \}+1, whereas the bound for sum-free is still 3 \cdot \# \{ (a_1, a_2, \ldots, a_n) \in \{0,1,2 \}^n : \sum a_i \leq 2n/3 \}. But, as you will see, I am chasing much rougher bounds here.

Jordan EllenbergVariations on three-term arithmetic progressions

Here are three functions.  Let N be an integer, and consider:

  •  G_1(N), the size of the largest subset S of 1..N containing no 3-term arithmetic progression;
  •  G_2(N), the largest M such that there exist subsets S,T of 1..N with |S| = |T| = M such that the equation s_i + t_i = s_j + t_k has no solutions with (j,k) not equal to (i,i).  (This is what’s called a tri-colored sum-free set.)
  •  G_3(N), the largest M such that the following is true: given subsets S,T of 1..N, there always exist subsets S’ of S and T’ of T with |S’| + |T’| = M and S'+T \cup S+T' = S+T.

You can see that G_1(N) <= G_2(N) <= G_3(N).  Why?  Because if S has no 3-term arithmetic progression, we can take S = T and s_i = t_i, and get a tri-colored sum-free set.  Now suppose you have a tri-colored sum-free set (S,T) of size M; if S’ and T’ are subsets of S and T respectively, and S'+T \cup S+T' = S+T, then for every pair (s_i,t_i), you must have either s_i in S’ or t_i in T’; thus |S’| + |T’| is at least M.
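
For very small N, G_1(N) can be computed by brute force, which is a handy way to get a feel for the definition. This is a minimal sketch with illustrative function names; the search is exponential in N, so it is only meant for N up to about 20.

```python
from itertools import combinations

def has_3ap(s):
    """True if s contains a < b < c with a + c = 2b."""
    members = set(s)
    # For distinct a < c, the midpoint (if an integer) lies strictly between them.
    return any((a + c) % 2 == 0 and (a + c) // 2 in members
               for a, c in combinations(sorted(members), 2))

def g1(n):
    """Size of the largest 3-AP-free subset of {1, ..., n}, by exhaustive search."""
    for size in range(n, 0, -1):
        if any(not has_3ap(c) for c in combinations(range(1, n + 1), size)):
            return size
    return 0

if __name__ == "__main__":
    print([g1(n) for n in range(1, 11)])
```

For instance {1, 2, 4, 5} shows G_1(5) is at least 4, and {1, 2, 4, 8, 9} shows G_1(9) is at least 5; the search confirms that neither can be improved.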

When the interval 1..N is replaced by the group F_q^n, the Croot-Lev-Pach-Ellenberg-Gijswijt argument shows that G_1(F_q^n) is bounded above by the number of monomials of degree at most (q-1)n/3; call this quantity M(F_q^n).  In fact, G_3(F_q^n) is bounded above by M(F_q^n), too (see the note linked from this post) and the argument is only a  modest extension of the proof for G_1.  For all we know, G_1(F_q^n) might be much smaller, but Kleinberg has recently shown that G_2(F_2^n) (whence also G_3(F_2^n)) is equal to M(F_2^n) up to subexponential factors, and work in progress by Kleinberg and Speyer has shown this for several more q and seems likely to show that the bound is tight in general.  On the other hand, I have no idea whether to think G_1(F_q^n) is actually equal to M(F_q^n); i.e. is the bound proven by me and Dion sharp?

The behavior of G_1(N) is, of course, very much studied; we know by Behrend (recently sharpened by Elkin) that G_1(N) is at least N/exp(c sqrt(log N)).  Roth proved that G_1(N) = o(N), and the best bounds, due to Tom Sanders, show that G_1(N) is O(N(log log N)^5 / log N).  (Update:  Oops, no!  Thomas Bloom has an upper bound even a little better than Sanders, change that 5 to a 4.)

What about G_2(N) and G_3(N)?  I’m not sure how much people have thought about these problems.  But if, for instance, you could show (for example, by explicit constructions) that G_3(N) was closer to Sanders than to Behrend/Elkin, it would close off certain strategies for pushing the bound on G_1(N) downward. (Update:  Jacob Fox tells me that you can get an upper bound for G_2(N) of order N/2^{c log* N} from his graph removal paper, applied to the multicolored case.)

Do we think that G_2(N) and G_3(N) are basically equal, as is now known to be the case for F_q^n?

July 03, 2016

Chad Orzel300-305/366: Peregrination

A while back, I went down to Vroman’s Nose in Middleburgh to go for a hike, and found a sign saying that peregrine falcons are known to nest on the cliffs. Since the peregrine falcon is SteelyKid’s absolute favorite bird, and the subject of her school research project, this seemed like a good location for a family hike, so I took the kids down there yesterday. And even though that was really only one day out of a whole week, I got a bunch of really good photos from it, so it will supply all the material for this week’s photo dump.

300/366: Panorama

Panorama of farmland from the top of Vroman’s Nose.

The view from the top of Vroman’s Nose is pretty amazing, but a little hard to capture with the lenses I have. However, you can stitch together several of these to make a pretty cool panorama…

I took shots for multiple panoramic views, and uploaded the whole batch to a Google Photos album in hopes that they would magically decide to do the stitching for me. The Google gods are fickle, though, so I ended up having to download software to do it myself. This is one of those largely inscrutable written-by-UNIX-geeks packages where I’m really not sure what the hell it’s doing, but it did a nice job with this one (three shots with the 24mm lens), and one even bigger (10 shots at the minimum zoom of the telephoto, 55mm); both of them are in the Google Photos album if you want a higher-resolution look.

301/366: Nervewracking

SteelyKid and The Pip wave from a precarious position.

We had a little drama on the way up, but the kids eventually cheered up. Which was good, on the one hand, but on the bad side led to a lot of this kind of thing: the kids getting way out ahead of me, along the edge of a sheer cliff a couple hundred feet high. This was only a little terrifying, especially since I was solo-parenting this trip (Kate has a stomach bug, and stayed home in bed).

302/366: Expounding

The Pip holds forth atop Vroman’s Nose.

To be fair, SteelyKid showed a good degree of caution about the cliff edges. Her brother, on the other hand, was prone to grand sweeping gestures as he talked about how high up we were, and how wide the valley was, and so on. At least I managed to get him to stop skipping and jumping while we were near the cliffs…

303/366: Sibs

SteelyKid and The Pip looking over Schoharie County from atop Vroman’s Nose.

Scary as it occasionally was, it was worth it for super cute moments like this. Which I wasn’t sure we’d actually get, as The Pip ran into SteelyKid with his walking stick on the way up the trail, giving her a small cut behind her knee that became a huge issue for a while. About fifteen minutes before I took this, I thought she’d be more likely to throw him off the cliff. Though that would’ve involved walking, which she was loudly insisting she couldn’t do. While stomping off along the cliff edge faster than I could herd The Pip after her…

Fortunately, the whole thing was redeemed by:

304/366: Flight

Peregrine falcon in flight over Vroman’s Nose.

An actual peregrine falcon, “living free and in the wild,” as The Pip informed all the many hikers we passed on our way back down the path to the car. SteelyKid was the first to spot it, flying around and squawking agitatedly at us; I suspect we were directly over the ledge where its nest was located. It made a couple passes at or below the cliff level, then settled on a dead tree branch for a bit before deciding we weren’t actually a threat, and soaring off overhead, then diving behind a rock outcropping.

And, of course, the period of posing let me get this:

305/366: Falcon

Peregrine falcon at Vroman’s Nose.

This is actually a small crop from a much larger photo, because even at max zoom on my telephoto lens (250mm), it wasn’t a huge fraction of the frame. The 6000×4000 sensor really paid off big time, though, because even a small piece of a big frame has plenty of resolution to look awesome.

(Thus, I do not need to blow several hundred dollars on an even-bigger zoom lens. I repeat, I do not need an even-bigger zoom lens.)

(I’m also reasonably certain that this really is a peregrine falcon. But if we misidentified some other type of raptor, for the love of God, don’t tell SteelyKid…)

And despite her grievous injury, the sight of her very favorite bird living free and in the wild cheered SteelyKid up enough that she just about skipped back down the mountain, chattering happily. So, I really couldn’t’ve asked for a better result from a morning hike.

July 02, 2016

Scott Aaronson“Did Einstein Kill Schrödinger’s Cat? A Quantum State of Mind”

No, I didn’t invent that title.  And no, I don’t know of any interesting sense in which “Einstein killed Schrödinger’s cat,” though arguably there are senses in which Schrödinger’s cat killed Einstein.

The above was, however, the title given to a fun panel discussion that Daniel Harlow, Brian Swingle, and I participated in on Wednesday evening, at the spectacular facility of the New York Academy of Sciences on the 40th floor of 7 World Trade Center in lower Manhattan.  The moderator was George Musser of Scientific American.  About 200 people showed up, some of whom we got to meet at the reception afterward.

(The link will take you to streaming video of the event, though you’ll need to scroll to 6:30 or so for the thing to start.)

The subject of the panel was the surprising recent connections between quantum information and quantum gravity, something that Daniel, Brian, and I all talked about different aspects of.  I admitted at the outset that, not only was I not a real expert on the topic (as Daniel and Brian are), I wasn’t even a physicist, just a computer science humor mercenary or whatever the hell I am.  I then proceeded, ironically, to explain the Harlow-Hayden argument for the computational hardness of creating a firewall, despite Harlow sitting right next to me (he chose to focus on something else).  I was planning also to discuss Lenny Susskind’s conjecture relating the circuit complexity of quantum states to the AdS/CFT correspondence, but I ran out of time.

Thanks so much to my fellow participants, to George for moderating, and especially to Jennifer Costley, Crystal Ocampo, and everyone else at NYAS for organizing the event.

July 01, 2016

Doug NatelsonThe critical material nearly everyone overlooks

Condensed matter physics is tough to popularize, and yet aspects of it are absolutely ubiquitous in modern technologies.  For example:  Nearly every flat panel display, from the one on your phone to your computer monitor to your large television, takes advantage of an underappreciated triumph of materials development, a transparent conducting layer.  Usually, when a material is a good conductor of electricity, it tends to be (when more than tens of nm thick) reflective and opaque.   Remember, light is an electromagnetic wave.  If the electric field from the light can make the mobile charge in the material move, and if that charge can keep up with the rapid oscillations (10^14 Hz and faster!) of the electric field, then the light tends to be reflected rather than transmitted.  This is why polished aluminum or silver can be used as a mirror.

The dominant technology for transparent conductors is indium tin oxide (ITO), which manages to thread between two constraints.  It's a highly doped semiconductor.  The undoped indium oxide material has a band gap of 3 eV, meaning that violet light with a shorter wavelength than about 350 nm will have enough energy to be absorbed, by kicking electrons out of the filled valence band and into the conduction band.  Longer wavelength light (most of the visible spectrum) doesn't have enough energy to make those transitions, and thus the material is transparent for those colors.   ITO has had enough tin added to make the resulting material fairly conducting at low frequencies (say those relevant for electronics, but much lower than the frequency of visible light).  However, because of the way charge moves in ITO (see here or here for a nice article), it does not act reflective at visible frequencies.   This material is one huge enabling technology for displays!  I remember being told that the upper limit on LCD display size was, at one point, limited by the electrical conductivity of the ITO, and that we'd never have flat screens bigger than about a meter diagonal.  Clearly that problem was resolved.

Indium isn't cheap.  There are many people interested in making cheaper (yet still reasonably transparent) conducting layers.  Possibilities include graphene (though even at monolayer thickness it does absorb about 2% in the visible) and percolative networks of metal nanowires (or nanotubes).    Unfortunately, because of the physics described above, it would appear that transparent aluminum  (in the sense of having true bulk metal-like properties but optical transparency in the visible) must remain in the realm of science fiction.

June 29, 2016

Clifford JohnsonGauge Theories are Cool

That is all.


('fraid you'll have to wait for the finished book to learn why those shapes are relevant to the title...)

-cvj

The post Gauge Theories are Cool appeared first on Asymptotia.

Terence TaoFinite time blowup for Lagrangian modifications of the three-dimensional Euler equation

I’ve just posted to the arXiv my paper “Finite time blowup for Lagrangian modifications of the three-dimensional Euler equation”. This paper is loosely in the spirit of other recent papers of mine in which I explore how close one can get to supercritical PDE of physical interest (such as the Euler and Navier-Stokes equations), while still being able to rigorously demonstrate finite time blowup for at least some choices of initial data. Here, the PDE we are trying to get close to is the incompressible inviscid Euler equations

\displaystyle \partial_t u + (u \cdot \nabla) u = - \nabla p

\displaystyle \nabla \cdot u = 0

in three spatial dimensions, where {u} is the velocity vector field and {p} is the pressure field. In vorticity form, and viewing the vorticity {\omega} as a {2}-form (rather than a vector), we can rewrite this system using the language of differential geometry as

\displaystyle \partial_t \omega + {\mathcal L}_u \omega = 0

\displaystyle u = \delta \tilde \eta^{-1} \Delta^{-1} \omega

where {{\mathcal L}_u} is the Lie derivative along {u}, {\delta} is the codifferential (the adjoint of the differential {d}, or equivalently the negative of the divergence operator) that sends {k+1}-vector fields to {k}-vector fields, {\Delta} is the Hodge Laplacian, and {\tilde \eta} is the identification of {k}-vector fields with {k}-forms induced by the Euclidean metric {\eta}. The equation {u = \delta \tilde \eta^{-1} \Delta^{-1} \omega} can be viewed as the Biot-Savart law recovering velocity from vorticity, expressed in the language of differential geometry.

One can then generalise this system by replacing the operator {\tilde \eta^{-1} \Delta^{-1}} by a more general operator {A} from {2}-forms to {2}-vector fields, giving rise to what I call the generalised Euler equations

\displaystyle \partial_t \omega + {\mathcal L}_u \omega = 0

\displaystyle u = \delta A \omega.

For example, the surface quasi-geostrophic (SQG) equations can be written in this form, as discussed in this previous post. One can view {A \omega} (up to Hodge duality) as a vector potential for the velocity {u}, so it is natural to refer to {A} as a vector potential operator.

The generalised Euler equations carry much of the same geometric structure as the true Euler equations. For instance, the transport equation {\partial_t \omega + {\mathcal L}_u \omega = 0} is equivalent to the Kelvin circulation theorem, which in three dimensions also implies the transport of vortex streamlines and the conservation of helicity. If {A} is self-adjoint and positive definite, then the famous Euler-Poincaré interpretation of the true Euler equations as geodesic flow on an infinite dimensional Riemannian manifold of volume preserving diffeomorphisms (as discussed in this previous post) extends to the generalised Euler equations (with the operator {A} determining the new Riemannian metric to place on this manifold). In particular, the generalised Euler equations have a Lagrangian formulation, and so by Noether’s theorem we expect any continuous symmetry of the Lagrangian to lead to conserved quantities. Indeed, we have a conserved Hamiltonian {\frac{1}{2} \int \langle \omega, A \omega \rangle}, and any spatial symmetry of {A} leads to a conserved impulse (e.g. translation invariance leads to a conserved momentum, and rotation invariance leads to a conserved angular momentum). If {A} behaves like a pseudodifferential operator of order {-2} (as is the case with the true vector potential operator {\tilde \eta^{-1} \Delta^{-1}}), then it turns out that one can use energy methods to recover the same sort of classical local existence theory as for the true Euler equations (up to and including the famous Beale-Kato-Majda criterion for blowup).
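
To see concretely where self-adjointness enters, here is a formal sketch of why the Hamiltonian is conserved. It assumes that {A} is self-adjoint and independent of time, that {\omega} is closed ({d\omega = 0}, as is the case when {\omega} is the exterior derivative of a covelocity), and that everything decays fast enough to integrate by parts; {\iota_u} denotes contraction with {u}, a notation not used above.

```latex
\begin{align*}
\frac{d}{dt}\,\frac{1}{2}\int \langle \omega, A\omega \rangle
  &= \int \langle \partial_t \omega, A\omega \rangle
  && \text{($A$ self-adjoint and time-independent)} \\
  &= -\int \langle \mathcal{L}_u \omega, A\omega \rangle
  && \text{(transport equation)} \\
  &= -\int \langle d(\iota_u \omega), A\omega \rangle
  && \text{(Cartan's formula, using $d\omega = 0$)} \\
  &= -\int \langle \iota_u \omega, \delta A\omega \rangle
  && \text{($\delta$ is the adjoint of $d$)} \\
  &= -\int \langle \iota_u \omega, u \rangle
   = -\int \omega(u,u) = 0
  && \text{($u = \delta A\omega$, antisymmetry of $\omega$)}
\end{align*}
```

The first step is the only place self-adjointness is used, which is consistent with the absence of a conserved energy when {A} fails to be self-adjoint.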

The true Euler equations are suspected of admitting smooth localised solutions which blow up in finite time; there is now substantial numerical evidence for this blowup, but it has not been proven rigorously. The main purpose of this paper is to show that such finite time blowup can at least be established for certain generalised Euler equations that are somewhat close to the true Euler equations. This is similar in spirit to my previous paper on finite time blowup on averaged Navier-Stokes equations, with the main new feature here being that the modified equation continues to have a Lagrangian structure and a vorticity formulation, which was not the case with the averaged Navier-Stokes equation. On the other hand, the arguments here are not able to handle the presence of viscosity (basically because they rely crucially on the Kelvin circulation theorem, which is not available in the viscous case).

In fact, three different blowup constructions are presented (for three different choices of vector potential operator {A}). The first is a variant of one discussed previously on this blog, in which a “neck pinch” singularity for a vortex tube is created by using a non-self-adjoint vector potential operator: the velocity at the neck of the vortex tube is determined by the circulation of the vorticity somewhat further away from that neck, which, when combined with conservation of circulation, is enough to guarantee finite time blowup. This is a relatively easy construction of finite time blowup, and has the advantage of being rather stable (any initial data flowing through a narrow tube with a large positive circulation will blow up in finite time). On the other hand, it is not so surprising in the non-self-adjoint case that finite time blowup can occur, as there is no conserved energy.

The second blowup construction is based on a connection between the two-dimensional SQG equation and the three-dimensional generalised Euler equations, discussed in this previous post. Namely, any solution to the former can be lifted to a “two and a half-dimensional” solution to the latter, in which the velocity and vorticity are translation-invariant in the vertical direction (but the velocity is still allowed to contain vertical components, so the flow is not completely horizontal). The same embedding also works to lift solutions to generalised SQG equations in two dimensions to solutions to generalised Euler equations in three dimensions. Conveniently, even if the vector potential operator for the generalised SQG equation fails to be self-adjoint, one can ensure that the three-dimensional vector potential operator is self-adjoint. Using this trick, together with a two-dimensional version of the first blowup construction, one can then construct a generalised Euler equation in three dimensions with a vector potential that is both self-adjoint and positive definite, and still admits solutions that blow up in finite time, though the blowup is now a vortex sheet creasing on a line, rather than a vortex tube pinching at a point.

This eliminates the main defect of the first blowup construction, but introduces two others. Firstly, the blowup is less stable, as it relies crucially on the initial data being translation-invariant in the vertical direction. Secondly, the solution is not spatially localised in the vertical direction (though it can be viewed as a compactly supported solution on the manifold {{\bf R}^2 \times {\bf R}/{\bf Z}}, rather than {{\bf R}^3}). The third and final blowup construction of the paper addresses the final defect, by replacing vertical translation symmetry with axial rotation symmetry around the vertical axis (basically, replacing Cartesian coordinates with cylindrical coordinates). It turns out that there is a more complicated way to embed two-dimensional generalised SQG equations into three-dimensional generalised Euler equations in which the solutions to the latter are now axially symmetric (but are allowed to “swirl” in the sense that the velocity field can have a non-zero angular component), while still keeping the vector potential operator self-adjoint and positive definite; the blowup is now that of a vortex ring creasing on a circle.

As with the previous papers in this series, these blowup constructions do not directly imply finite time blowup for the true Euler equations, but they do at least provide a barrier to establishing global regularity for these latter equations, in that one is forced to use some property of the true Euler equations that is not shared by these generalisations. They also suggest some possible blowup mechanisms for the true Euler equations (although unfortunately these mechanisms do not seem compatible with the addition of viscosity, so they do not seem to suggest a viable Navier-Stokes blowup mechanism).


June 27, 2016

Steinn SigurðssonIceland Football Demographics

In case anyone hasn’t noticed, Iceland is playing England in the Euro 2016 Cup today, round of 16.

This is the first time Iceland has been in a major football tournament, the first time, obviously, they have progressed to the second stage, and as I write this they are unbeaten in tournament play.

Iceland has a population of just over 330,000.
It is about half the size of Wyoming, both in area and population.

For perspective, the men’s football team is drawn approximately from the 21-37 year old demographic, which has about 65,000 people in it, or about 33,000 males.
The football squad has about 20 players, so the chance of an Icelandic man in that age range being on the national football team at any given time is about 1/1,500.
Your prior odds of making the national team growing up are of order 0.1%, given turnover in the squad etc.
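
The back-of-the-envelope arithmetic, as a minimal sketch. The pool and squad sizes are the rough figures quoted above; the turnover factor is a purely hypothetical number of mine, standing in for "turnover in the squad etc."

```python
# Rough figures quoted in the text.
ELIGIBLE_MALES = 33_000   # men aged roughly 21-37
SQUAD_SIZE = 20           # players in the national squad

# Hypothetical assumption: over a playing career, about twice as many
# distinct players pass through the squad as are in it at any one time.
TURNOVER_FACTOR = 2.0


def one_in(pool, slots):
    """Return N such that the chance of holding one of the slots is 1 in N."""
    return pool / slots


snapshot_odds = one_in(ELIGIBLE_MALES, SQUAD_SIZE)
career_chance = TURNOVER_FACTOR * SQUAD_SIZE / ELIGIBLE_MALES

print(f"at any given time: about 1 in {snapshot_odds:,.0f}")
print(f"over a career (assumed turnover): about {career_chance:.2%}")
```

The snapshot figure lands close to the round 1/1,500 above, and with the assumed turnover the career chance is of order 0.1%.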

Now think about the fact that there are multiple national squads in different sports and activities and Iceland tries to represent fully at the international level…

A typical male in his early 20s tends to have a social network of about 100-200 people. These are relatives, friends, classmates and teammates.
There will be some overlap among the national team members, but as a good approximation, 10% of the population of Iceland are part of the national team’s social network.
That is, 1/10th of the country are the friends, relatives, lovers or co-workers of the football team.

Not coincidentally, about 1/10th of the population of Iceland has gone to France for the Euros, and a significant fraction of them are in the stands at any given time.
Enough were there that it significantly reduced the turnout for the Presidential Election this week when Iceland advanced out of the groups to the elimination rounds!

Iceland is unbeaten; England has never lost to Iceland.

Maybe it will come to penalties…

Game On.

PS: well, that was a jolly Good Game.