Planet Musings

May 27, 2016

David Hogg: new discoveries in the K2 data!

At group meeting, Dun Wang showed new discoveries in the brand-new K2 Campaign 9 data. The K2 team released the data the moment it came off the s/c, in fact before they even ran their own data-reformatting code. This was no problem for us, because K2 god Geert Barentsen (Ames) has (clean-roomed? and) open-sourced some of the core code. Wang ran our image-prediction / image-differencing version of our CPM code to predict the data and identify sources with strange excursions. He found ones that looked like possible microlensing events (known and unknown) and published them to the K2C9 group. I asked him to re-run CPM with other parameters to make it less (and more) aggressive and thereby address the (probable) over-fitting. The next step will be to incorporate a microlensing model and fit the K2 systematics (the point of the CPM code) and the microlensing parameters simultaneously. We discussed next steps.

Later in the day I started actually writing the celestial mechanics code for my new eclipsing-binary team (Price-Whelan, Ness, and Foreman-Mackey). It is probably all wrong, but it is core technology, so it needs to get instrumented with tests.

Clifford Johnson: Of Spies and Spacetime

Stephanie DeMarco interviewed me a few weeks ago for an article she was writing about the science in the TV show Agent Carter (season two). As you know, I did a lot of work for them on the science, some of which I've mentioned here, and we spoke about some of that and a lot of interesting other things besides. Well, her article appeared in Signal to Noise magazine, a publication all about communicating science, and it's really a nice piece. You can read it here. (The excellent title I used for this post is from her article.)


It is a pity that the show has not been renewed for a third season (I'm trying not [...]

The post Of Spies and Spacetime appeared first on Asymptotia.

Resonaances: CMS: Higgs to mu tau is going away

One interesting anomaly in the LHC run-1 was a hint of Higgs boson decays to a muon and a tau lepton. Such a process is forbidden in the Standard Model by the conservation of muon and tau lepton numbers. Neutrino masses violate individual lepton numbers, but their effect is far too small to affect the Higgs decays in practice. On the other hand, new particles do not have to respect global symmetries of the Standard Model, and they could induce lepton flavor violating Higgs decays at an observable level. Surprisingly, CMS found a small excess in the Higgs to tau mu search in their 8 TeV data, with the measured branching fraction Br(h→τμ)=(0.84±0.37)%.  The analogous measurement in ATLAS is 1 sigma above the background-only hypothesis, Br(h→τμ)=(0.53±0.51)%. Together this merely corresponds to a 2.5 sigma excess, so it's not too exciting in itself. However, taken together with the B-meson anomalies in LHCb, it has raised hopes for lepton flavor violating new physics just around the corner.  For this reason, the CMS excess inspired a few dozen theory papers, with Z' bosons, leptoquarks, and additional Higgs doublets pointed out as possible culprits.

Alas, the wind is changing. CMS made a search for h→τμ in their small stash of 13 TeV data collected in 2015. This time they were hit by a negative background fluctuation, and they found Br(h→τμ)=(-0.76±0.81)%. The accuracy of the new measurement is worse than that in run-1, but nevertheless it lowers the combined significance of the excess below 2 sigma. Statistically speaking, the situation hasn't changed much,  but psychologically this is very discouraging. A true signal is expected to grow when more data is added, and when it's the other way around it's usually a sign that we are dealing with a statistical fluctuation...

So, if you have a cool model explaining the h→τμ  excess be sure to post it on arXiv before more run-2 data is analyzed ;)

May 26, 2016

David Hogg: celestial mechanics

As part of my project with Adrian Price-Whelan (and also Melissa Ness and Dan Foreman-Mackey), I spent my research time today figuring out time derivatives of Kepler's equations. These are needed so that we can simultaneously fit the eclipsing-binary light curve and the radial velocities revealed in the double-lined spectrum. This was actual pen-on-paper calculus! It's been a while, though as I took these derivatives I was reminded that I have taken them many times in the past.
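Celestial-mechanics code of this kind starts from Kepler's equation. Below is a minimal sketch of the basic pieces (the function names and parameterization are illustrative, not the team's actual code): a Newton solver for the eccentric anomaly {E - e \sin E = M}, and the resulting radial-velocity curve of one binary component from standard orbital elements.

```python
import math

def solve_kepler(M, e, tol=1e-12):
    """Solve Kepler's equation E - e*sin(E) = M for the eccentric anomaly E
    by Newton's method."""
    E = M if e < 0.8 else math.pi  # standard starting guess
    for _ in range(100):
        f = E - e * math.sin(E) - M
        if abs(f) < tol:
            break
        E -= f / (1.0 - e * math.cos(E))
    return E

def radial_velocity(t, P, e, K, omega, t_peri):
    """Radial velocity of one component at time t, given period P,
    eccentricity e, semi-amplitude K, argument of pericenter omega,
    and time of pericenter passage t_peri."""
    M = 2.0 * math.pi * (t - t_peri) / P        # mean anomaly
    E = solve_kepler(M % (2.0 * math.pi), e)    # eccentric anomaly
    # true anomaly from the eccentric anomaly
    nu = 2.0 * math.atan2(math.sqrt(1 + e) * math.sin(E / 2),
                          math.sqrt(1 - e) * math.cos(E / 2))
    return K * (math.cos(nu + omega) + e * math.cos(omega))
```

The time derivatives mentioned above come from differentiating the same chain: {dE/dt = n / (1 - e \cos E)} with {n = 2\pi/P}, then {d\nu/dE}, and so on through the velocity expression.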

In the afternoon I had a great conversation with Duane Lee (Vanderbilt) about chemical tagging and nucleosynthesis. He is close to being able to fit our data in the Milky Way halo with a mixture of dwarf-galaxy stellar populations. That would be awesome! We talked about low-hanging fruit with our APOGEE chemical abundance data.

n-Category Café: Good News

Various bits of good news concerning my former students Alissa Crans, Derek Wise and Jeffrey Morton.

Alissa Crans did her thesis on Lie 2-Algebras back in 2004. She got hired by Loyola Marymount University, got tenure there in 2011… and a couple of weeks ago she got promoted to full professor! Hurrah!

Derek Wise did his thesis on Topological Gauge Theory, Cartan Geometry, and Gravity in 2007. After a stint at U. C. Davis he went to Erlangen in 2010. When I was in Erlangen in the spring of 2014 he was working with Catherine Meusburger on gauge theory with Hopf algebras replacing groups, and a while back they came out with a great paper on that: Hopf algebra gauge theory on a ribbon graph. But the good news is this: last fall, he got a tenure-track job at Concordia University St Paul!

Jeffrey Morton did his thesis on Extended TQFT’s and Quantum Gravity in 2007. After postdocs at the University of Western Ontario, the Instituto Superior Técnico, Universität Hamburg, Mount Allison University and a visiting assistant professorship at Toledo University, he has gotten a tenure-track job at SUNY Buffalo State! I guess he’ll start there in the fall.

They’re older and wiser now, but here’s what they looked like once:

From left to right it’s Derek Wise, Jeffrey Morton and Alissa Crans… and then two more students of mine: Toby Bartels and Miguel Carrión Álvarez.

David Hogg: a new project: eclipsing binaries

I have a dream! If we could get enough long-period eclipsing binaries with multi-epoch spectroscopy, we could go a long way towards building a truly data-driven model of stellar spectra. It would be truly data-driven, because we would use the gravitational model of the eclipsing binary to get stellar masses and radii, and thus give good label (truth) inputs to a model like The Cannon for the stellar spectra. (Yes, if you have an eclipsing binary and spectroscopy for radial velocities, you get everything.) And then we could get densities, masses, and radii of stars for the interpretation of transit and radial-velocity results on exoplanets, without relying on stellar models. There are lots of other things to do too, like build population models for binary stars, and exploit the stellar models for Milky Way science. And so on.

Today, because of a meeting cancellation, both Adrian Price-Whelan and I got the full day off from responsibilities, so we decided to use it very irresponsibly. We searched the (very incomplete and under-studied) Kepler eclipsing binary list for binaries with long periods, deep eclipse depths, and APOGEE spectroscopy. It turns out there are lots! We started with the system KIC 9246715, which is a red-giant pair.

In the APOGEE spectrum, the pair of velocities (double line) is clearly visible, and it clearly changes from epoch to epoch. We found the velocities at each epoch first by auto-correlation and then by modeling the spectrum as a sum of two single stars. A project is born!

Backreaction: How can we test quantum gravity?

If you have good eyes, the smallest objects you can make out are about a tenth of a millimeter, roughly the width of a human hair. Add technology, and the smallest structures we have measured so far are approximately 10⁻¹⁹ m; that's the wavelength of the protons collided at the LHC. It has taken us about 400 years from the invention of the microscope to the construction of the LHC – 400 years to cross 15 orders of magnitude.

Quantum effects of gravity are estimated to become relevant on distance scales of approximately 10⁻³⁵ m, known as the Planck length. That’s another 16 orders of magnitude to go. It makes you wonder whether it’s possible at all, or whether all the effort to find a quantum theory of gravity is just idle speculation.
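The numbers above follow directly from the fundamental constants: the Planck length is {l_P = \sqrt{\hbar G / c^3}}. A quick check (using CODATA values):

```python
import math

hbar = 1.054571817e-34   # reduced Planck constant, J s
G = 6.67430e-11          # Newton's constant, m^3 kg^-1 s^-2
c = 2.99792458e8         # speed of light, m/s

# Planck length: the scale where quantum gravity is expected to matter
l_planck = math.sqrt(hbar * G / c**3)   # ~1.6e-35 m

# orders of magnitude between the LHC scale (~1e-19 m) and the Planck length
gap = math.log10(1e-19 / l_planck)      # ~16
```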

I am optimistic. The history of science is full of people who thought things impossible that have meanwhile been done: measuring the deflection of light by the sun, heavier-than-air flying machines, detecting gravitational waves. Hence, I don’t think it’s impossible to experimentally test quantum gravity. Maybe it will take some decades, or maybe it will take some centuries – but if only we keep pushing, one day we will measure quantum gravitational effects. Not by directly crossing those 16 orders of magnitude, I believe, but instead by indirect detections at lower energies.

From nothing comes nothing though. If we don’t think about what quantum gravitational effects might look like and where they might show up, we’ll certainly never find them. But fueling my optimism is the steadily increasing interest in the phenomenology of quantum gravity, the research area dedicated to studying how to best find evidence for quantum gravitational effects.

Since there isn’t any one agreed-upon theory for quantum gravity, existing efforts to find observable phenomena focus on finding ways to test general features of the theory, properties that have been found in several different approaches to quantum gravity. Quantum fluctuations of space-time, for example, or the presence of a “minimal length” that would impose a fundamental resolution limit. Such effects can be quantified in mathematical models, which can then be used to estimate the strength of the effects and thus to find out which experiments are most promising.

Testing quantum gravity has long been thought to be out of reach of experiments, based on estimates showing that it would take a collider the size of the Milky Way to accelerate protons enough to produce a measurable amount of gravitons (the quanta of the gravitational field), or a detector the size of the planet Jupiter to measure a graviton produced elsewhere. Not impossible, but clearly not something that will happen in my lifetime.

One testable consequence of quantum gravity might be, for example, the violation of the symmetry of special and general relativity, known as Lorentz invariance. Interestingly, it turns out that violations of Lorentz invariance are not necessarily small even if they are created at distances too short to be measurable. Instead, these symmetry violations seep into many particle reactions at accessible energies, and those have been tested to extremely high accuracy. No evidence for violations of Lorentz invariance has been found. This might not sound like much, but knowing that this symmetry has to be respected by quantum gravity is an extremely useful guide in the development of the theory.

Other testable consequences might be found in the weak-field limit of quantum gravity. In the early universe, quantum fluctuations of space-time would have led to temperature fluctuations of matter. And these temperature fluctuations are still observable today in the Cosmic Microwave Background (CMB). The imprint of such “primordial gravitational waves” on the CMB has not yet been measured (LIGO is not sensitive to them), but they are not so far off measurement precision.

A lot of experiments are currently searching for this signal, including BICEP and Planck. This raises the question of whether it is possible to infer from the primordial gravitational waves that gravity must have been quantized in the early universe. Answering this question is one of the most active areas in quantum gravity phenomenology at present.

Also testing the weak-field limit of quantum gravity are attempts to bring objects into quantum superpositions that are much heavier than elementary particles. This makes the gravitational field stronger and potentially offers the chance to probe its quantum behavior. The heaviest objects that have so far been brought into superposition weigh about a nanogram, which is still several orders of magnitude too light to measure the gravitational field. But a group in Vienna recently proposed an experimental scheme that would make it possible to measure the gravitational field more precisely than ever before. We are slowly closing in on the quantum gravitational range.

Such arguments however merely concern the direct detection of gravitons, and that isn’t the only manifestation of quantum gravitational effects. There are various other observable consequences that quantum gravity could give rise to, some of which have already been looked for, and others that we plan to look for. So far, we have only negative results. But even negative results are valuable because they tell us what properties the sought-for theory cannot have.

[From arXiv:1602.07539, for details, see here]

The weak-field limit would prove that gravity really is quantized and finally deliver the much-needed experimental evidence, confirming that we’re not just doing philosophy. However, for most of us in the field the strong-gravity limit is more interesting. By the strong-gravity limit I mean Planckian curvature, which (not counting those galaxy-sized colliders) can only be found close to the centers of black holes and towards the big bang.

(Note that in astrophysics, “strong gravity” is sometimes used to mean something different, referring to large deviations from Newtonian gravity which can be found, eg, around the horizon of black holes. In comparison to the Planckian curvature required for strong quantum gravitational effects, this is still exceedingly weak.)

Strong quantum gravitational effects could also have left an imprint in the cosmic microwave background, notably in the type of correlations that can be found in the fluctuations. There are various models of string cosmology and loop quantum cosmology that have explored the observational consequences, and proposed experiments like EUCLID and PRISM might find first hints. Also the upcoming experiments to test the 21-cm hydrogen absorption could harbor information about quantum gravity.

A somewhat more speculative idea is based on a recent finding according to which the gravitational collapse of matter might not always form a black hole, but could escape the formation of a horizon. If that is so, then the remaining object would give us an open view of a region with quantum gravitational effects. It isn’t yet clear exactly what signals we would have to look for to find such an object, but this is a promising research direction because it could give us direct access to strong space-time curvature.

There are many other ideas out there. A large class of models for example deals with the possibility that quantum gravitational effects endow space-time with the properties of a medium. This can lead to the dispersion of light (colors running apart), birefringence (polarizations running apart), decoherence (preventing interference), or an opacity of otherwise empty space. More speculative ideas include Craig Hogan’s quest for holographic noise, Bekenstein’s table-top experiment that searches for Planck-length discreteness, or searches for evidence of a minimal length in tritium decay. Some general properties that have recently been found and that we yet have to find good experimental tests for are geometric phase transitions in the early universe, or dimensional reduction.

Without doubt, there is much that remains to be done. But we’re on the way.

[This post previously appeared on Starts With A Bang.]

Geraint F. Lewis: On The Relativity of Redshifts: Does Space Really “Expand”?

I've written an article 'On the Relativity of Redshifts: Does Space Really "Expand"?' which has appeared in Australian Physics (2016, 53(3), 95-100). For some reason, the arXiv has put the article on hold, so you can download it here.

I like it :)

Clifford Johnson: Gestures


I'm trying to make the characters somewhat expressive, since you, the book's reader, will be spending a lot of time with them. This means constructing lots of hands doing things. Lots of hands. Hands take time, but are actually rather fun to construct from scratch. I start mine as two or three planes hinged together, and then go from there, subdividing until I'm done.

-cvj

The post Gestures appeared first on Asymptotia.

May 25, 2016

David Hogg: extreme precision radial velocities

I continued working on my document about release of data and code. Twitter (tm) continues awesome.

Research highlight of the day was a long discussion with Megan Bedell (Chicago) about the consistency of exoplanet-host radial-velocity measurements order-by-order in a many-order high-resolution echelle spectrograph. The measurements are made by cross-correlation with a binary (ugh) template, and some orders are consistently high and some are consistently low, and we very much hope there are other more subtle regularities to exploit. Why are there these discrepancies? Probably because the model is inflexible and wrong. Unfortunately we don't have access to it directly (yet) so we have to live with the cross-correlation functions. We discussed simple methods to discover regularities in the order-by-order offsets and results, and sent Bedell off with a long to-do list.
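To make the cross-correlation idea concrete, here is a toy sketch (entirely illustrative; not Bedell's pipeline, and the "template" is a fake spectrum of a few Gaussian absorption lines): on a log-wavelength grid a Doppler shift is a uniform translation, so the radial velocity can be read off the peak of the cross-correlation function.

```python
import numpy as np

c_kms = 299792.458  # speed of light, km/s

# log-wavelength grid: a Doppler shift is a uniform shift in ln(lambda)
n_pix = 4000
dlnlam = 1e-5  # ~3 km/s per pixel
lnlam = np.log(5000.0) + dlnlam * np.arange(n_pix)

def template(lnlam):
    # toy stellar template: continuum with a few Gaussian absorption lines
    flux = np.ones_like(lnlam)
    for l0 in (np.log(5005.0), np.log(5012.0), np.log(5020.0)):
        flux -= 0.5 * np.exp(-0.5 * ((lnlam - l0) / 3e-5) ** 2)
    return flux

v_true = 12.0  # km/s
observed = template(lnlam - v_true / c_kms)  # Doppler-shifted "observation"

# cross-correlate against velocity-shifted templates and take the peak
v_grid = np.linspace(-50, 50, 2001)
ccf = [np.sum((template(lnlam - v / c_kms) - 1) * (observed - 1))
       for v in v_grid]
v_best = v_grid[np.argmax(ccf)]  # should recover ~12 km/s
```

In the real problem the CCF is computed per echelle order, and the order-by-order scatter in `v_best` is exactly the kind of systematic offset discussed above.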

I ended the day with a long conversation with Kat Deck (Caltech). Among other things, we discussed what we would do with our lives if exoplanet research evolves into nothing other than atmosphere transmission spectroscopy and modeling. Of course neither of us considers this outcome likely!

David Hogg: writing by tweeting

I spent the day working on my document about releasing data and code. I tweeted (tm) some of the ideas in the paper and started responding to the storm of replies. The twitters are excellent for getting ideas from the community!

Robert Helling: Holographic operator ordering?

Believe it or not, at the end of this week I will speak at a workshop on algebraic and constructive quantum field theory. And (I don't know which of these two facts is more surprising) I will advocate holography.

More specifically, I will argue that it seems that holography can be a successful approach to formulate effective low energy theories (similar to other methods like perturbation theory of weakly coupled quasi-particles or minimal models). And I will present this as a challenge to the community at the workshop to show that the correlators computed with holographic methods indeed encode a QFT (according to your favorite set of rules, e.g. Wightman or Osterwalder-Schrader). My [kudos to an anonymous reader for pointing out a typo] guess would be that this has a non-zero chance of being a possible approach to the construction of (new) models in that sense or alternatively to show that the axioms are violated (which would be even more interesting for holography).

In any case, I am currently preparing my slides (I will not be able to post those as I have stolen far too many pictures from the interwebs including the holographic doctor from Star Trek Voyager) and came up with the following question:

In a QFT, the order of insertions in a correlator matters (unless we fix an ordering like time ordering). How is that represented on the bulk side?

Does anybody have any insight about this?

May 24, 2016

Sean Carroll: The Big Picture: The Talk

I’m giving the last lecture on my mini-tour for The Big Picture tonight at the Natural History Museum here in Los Angeles. If you can’t make it, here’s a decent substitute: video of the talk I gave last week at Google headquarters in Mountain View.

I don’t think I’ve quite worked out all the kinks in this talk, but you get the general idea. My biggest regret was that I didn’t have the time to trace the flow of free energy from the Sun to photosynthesis to ATP to muscle contractions. It’s a great demonstration of how biological organisms are maintained through the creation of entropy.

Doug Natelson: Research blogging: Magnetism in layered materials

Following on from graphene, there has been enormous interest in other layered materials for the last few years, such as transition metal dichalcogenides (TMDs) like MoS2.   Depending on the constituents and particular structure, these materials can be semiconductors, superconductors, charge density wave compounds, etc., and can have properties that vary strongly as the number of layers in the material is reduced toward one.  You can expand the palette further by substitutionally doping different elements into the chalcogenide layers, or you can intercalate other atoms between the layers.  There are a huge number of possible compounds and variations.  (Fun note:  TMDs have been studied intensely before.  See here for a review from almost 50 years ago!  And magnetism in intercalated TMDs was examined by people like Stuart Parkin and Richard Friend almost 40 years ago.   The resurgence now is due to a combination of improved growth and characterization techniques, interest in low-dimensionality materials, and theoretical appreciation for the richness of possible states in these systems.)

Recently, collaborating with my colleague Jun Lou, we had some fun examining a related material, V5S8, which you can also think of as (V0.25)VS2.  There are vanadium disulfide layers, and intercalated between them are additional vanadium atoms in an ordered pattern.  The bulk version of this material was found in the 1970s to be an antiferromagnet - below the Neel temperature TN ~ 32 K, the spins of the unpaired electrons on the intercalated vanadium atoms spontaneously order into the arrangement shown in the upper panel at right.   If an external magnetic field bigger than about 4 T is applied perpendicular to the planes of the material, the spins flop over into the arrangement shown in the bottom panel - this is called a spin flop transition. 

Prof. Lou's group has figured out how to grow V5S8 by chemical vapor deposition, so that we were able to make measurements on single crystals of a variety of thicknesses, down to about 10 nm.  We found a couple of cool things, as reported here.   

First, we found a previously unreported phase transition, first-order in the thicker crystals, as a function of externally applied magnetic field.   The signature of this is hysteresis in the electrical resistance of the material as a function of the magnetic field, H.  Just below TN, the hysteresis appears near zero magnetic field.  As T is lowered, the magnetic field where the hysteresis takes place increases dramatically - in a thick crystal, it can go from essentially 0 T to 9 T when the temperature is lowered by only three Kelvin!  Indeed, that's probably one reason why the transition was missed by previous investigators:  if you take data at only select temperatures, you could easily miss the whole thing.   This kind of transition is called metamagnetic, and we think that large applied fields kill the antiferromagnetism (AFM), driving the material into a paramagnetic (PM) state.  We suggest the phase diagram shown in the table-of-contents figure here.  The transition extrapolates to a finite value of H at zero temperature, which implies that it ends up as a quantum phase transition.

Second, we found that there are systematic changes in the magnetic properties as a function of the thickness of the crystals.  In thinner crystals, the antiferromagnetism appears to be weaker, with TN falling.  Moreover, the hysteresis in the field-driven transition vanishes in thinner crystals, suggesting that the metamagnetic transition goes from first-order to second order in the thin limit.   

This work was a lot of fun.  As far as I know, it's the first example of a systematic study of magnetic properties in one of these layered materials as a function of material thickness.  I think we've just scratched the surface in terms of what could be possible in terms of magnetism in this layered material platform. 

Terence Tao: A symmetric formulation of the Croot-Lev-Pach-Ellenberg-Gijswijt capset bound

A capset in the vector space {{\bf F}_3^n} over the finite field {{\bf F}_3} of three elements is a subset {A} of {{\bf F}_3^n} that does not contain any lines {\{ x,x+r,x+2r\}}, where {x,r \in {\bf F}_3^n} and {r \neq 0}. A basic problem in additive combinatorics (discussed in one of the very first posts on this blog) is to obtain good upper and lower bounds for the maximal size of a capset in {{\bf F}_3^n}.

Trivially, one has {|A| \leq 3^n}. Using Fourier methods (and the density increment argument of Roth), the bound of {|A| \leq O( 3^n / n )} was obtained by Meshulam, and improved only as late as 2012 to {O( 3^n /n^{1+c})} for some absolute constant {c>0} by Bateman and Katz. But in a very recent breakthrough, Ellenberg (and independently Gijswijt) obtained the exponentially superior bound {|A| \leq O( 2.756^n )}, using a version of the polynomial method recently introduced by Croot, Lev, and Pach. (In the converse direction, a construction of Edel gives capsets as large as {(2.2174)^n}.) Given the success of the polynomial method in superficially similar problems such as the finite field Kakeya problem (discussed in this previous post), it was natural to wonder whether this method could be applicable to the cap set problem (see for instance this MathOverflow comment of mine on this from 2010), but it took a surprisingly long time before Croot, Lev, and Pach were able to identify the precise variant of the polynomial method that would actually work here.

The proof of the capset bound is very short (Ellenberg’s and Gijswijt’s preprints are both 3 pages long, and Croot-Lev-Pach is 6 pages), but I thought I would present a slight reformulation of the argument which treats the three points on a line in {{\bf F}_3} symmetrically (as opposed to treating the third point differently from the first two, as is done in the Ellenberg and Gijswijt papers; Croot-Lev-Pach also treat the middle point of a three-term arithmetic progression differently from the two endpoints, although this is a very natural thing to do in their context of {({\bf Z}/4{\bf Z})^n}). The basic starting point is this: if {A} is a capset, then one has the identity

\displaystyle \delta_{0^n}( x+y+z ) = \sum_{a \in A} \delta_a(x) \delta_a(y) \delta_a(z) \ \ \ \ \ (1)


for all {(x,y,z) \in A^3}, where {\delta_a(x) := 1_{a=x}} is the Kronecker delta function, which we view as taking values in {{\bf F}_3}. Indeed, (1) reflects the fact that the equation {x+y+z=0} has solutions precisely when {x,y,z} are either all equal, or form a line, and the latter is ruled out precisely when {A} is a capset.
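Identity (1) can be checked by brute force in small dimension. The sketch below (helper names are ours, not from the papers) greedily builds a capset in {{\bf F}_3^2} and verifies (1), as an identity in {{\bf F}_3}, over all triples in {A^3}:

```python
import itertools

n = 2
F3n = list(itertools.product(range(3), repeat=n))

def is_capset(A):
    # a capset contains no line {x, x+r, x+2r} with r != 0
    S = set(A)
    for x in A:
        for r in F3n:
            if any(r):
                line = {tuple((x[i] + j * r[i]) % 3 for i in range(n))
                        for j in range(3)}
                if line <= S:
                    return False
    return True

# greedily grow a capset in F_3^2 (the maximum size there is 4)
A = []
for v in F3n:
    if is_capset(A + [v]):
        A.append(v)

# identity (1): delta_{0^n}(x+y+z) = sum_a delta_a(x) delta_a(y) delta_a(z)
# in F_3, for all (x, y, z) in A^3
for x, y, z in itertools.product(A, repeat=3):
    lhs = 1 if all((xi + yi + zi) % 3 == 0
                   for xi, yi, zi in zip(x, y, z)) else 0
    rhs = sum(1 for a in A if a == x == y == z) % 3
    assert lhs == rhs
```

The right-hand side is non-zero only when {x=y=z}, which is exactly the content of the capset condition: the only solutions of {x+y+z=0} inside {A^3} are the diagonal ones.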

To exploit (1), we will show that the left-hand side of (1) is “low rank” in some sense, while the right-hand side is “high rank”. Recall that a function {F: A \times A \rightarrow {\bf F}} taking values in a field {{\bf F}} is of rank one if it is non-zero and of the form {(x,y) \mapsto f(x) g(y)} for some {f,g: A \rightarrow {\bf F}}, and that the rank of a general function {F: A \times A \rightarrow {\bf F}} is the least number of rank one functions needed to express {F} as a linear combination. More generally, if {k \geq 2}, we define the rank of a function {F: A^k \rightarrow {\bf F}} to be the least number of “rank one” functions of the form

\displaystyle (x_1,\dots,x_k) \mapsto f(x_i) g(x_1,\dots,x_{i-1},x_{i+1},\dots,x_k)

for some {i=1,\dots,k} and some functions {f: A \rightarrow {\bf F}}, {g: A^{k-1} \rightarrow {\bf F}}, that are needed to generate {F} as a linear combination. For instance, when {k=3}, the rank one functions take the form {(x,y,z) \mapsto f(x) g(y,z)}, {(x,y,z) \mapsto f(y) g(x,z)}, {(x,y,z) \mapsto f(z) g(x,y)}, and linear combinations of {r} such rank one functions will give a function of rank at most {r}.

It is a standard fact in linear algebra that the rank of a diagonal matrix is equal to the number of non-zero entries. This phenomenon extends to higher dimensions:

Lemma 1 (Rank of diagonal hypermatrices) Let {k \geq 2}, let {A} be a finite set, let {{\bf F}} be a field, and for each {a \in A}, let {c_a \in {\bf F}} be a coefficient. Then the rank of the function

\displaystyle (x_1,\dots,x_k) \mapsto \sum_{a \in A} c_a \delta_a(x_1) \dots \delta_a(x_k) \ \ \ \ \ (2)


is equal to the number of non-zero coefficients {c_a}.

Proof: We induct on {k}. As mentioned above, the case {k=2} follows from standard linear algebra, so suppose now that {k>2} and the claim has already been proven for {k-1}.

It is clear that the function (2) has rank at most equal to the number of non-zero {c_a} (since the summands on the right-hand side are rank one functions), so it suffices to establish the lower bound. By deleting from {A} those elements {a \in A} with {c_a=0} (which cannot increase the rank), we may assume without loss of generality that all the {c_a} are non-zero. Now suppose for contradiction that (2) has rank at most {|A|-1}, then we obtain a representation

\displaystyle \sum_{a \in A} c_a \delta_a(x_1) \dots \delta_a(x_k) = \sum_{i=1}^k \sum_{\alpha \in I_i} f_{i,\alpha}(x_i) g_{i,\alpha}( x_1,\dots,x_{i-1},x_{i+1},\dots,x_k) \ \ \ \ \ (3)


for some sets {I_1,\dots,I_k} of cardinalities adding up to at most {|A|-1}, and some functions {f_{i,\alpha}: A \rightarrow {\bf F}} and {g_{i,\alpha}: A^{k-1} \rightarrow {\bf F}}.

Consider the space of functions {h: A \rightarrow {\bf F}} that are orthogonal to all the {f_{k,\alpha}}, {\alpha \in I_k} in the sense that

\displaystyle \sum_{x \in A} f_{k,\alpha}(x) h(x) = 0

for all {\alpha \in I_k}. This space is a vector space whose dimension {d} is at least {|A| - |I_k|}. A basis of this space generates a {d \times |A|} coordinate matrix of full rank, which implies that there is at least one non-singular {d \times d} minor. This implies that there exists a function {h: A \rightarrow {\bf F}} in this space which is non-zero on some subset {A'} of {A} of cardinality at least {|A|-|I_k|}.

If we multiply (3) by {h(x_k)} and sum in {x_k}, we conclude that

\displaystyle \sum_{a \in A} c_a h(a) \delta_a(x_1) \dots \delta_a(x_{k-1}) = \sum_{i=1}^{k-1} \sum_{\alpha \in I_i} f_{i,\alpha}(x_i) \tilde g_{i,\alpha}( x_1,\dots,x_{i-1},x_{i+1},\dots,x_{k-1})

where

\displaystyle \tilde g_{i,\alpha}(x_1,\dots,x_{i-1},x_{i+1},\dots,x_{k-1}) := \sum_{x_k \in A} g_{i,\alpha}(x_1,\dots,x_{i-1},x_{i+1},\dots,x_k) h(x_k).

The right-hand side has rank at most {|A|-1-|I_k|}, since the summands are rank one functions. On the other hand, from induction hypothesis the left-hand side has rank at least {|A|-|I_k|}, giving the required contradiction. \Box
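The {k=2} base case used in this induction can be sanity-checked numerically, over the reals rather than {{\bf F}_3} (a quick illustration, not part of the argument): the rank of a diagonal matrix is the number of its non-zero entries.

```python
import numpy as np

c = np.array([2.0, 0.0, 5.0, 1.0, 0.0])
D = np.diag(c)  # the k = 2 case of (2): F(x, y) = c_x * delta_{x=y}

# rank of a diagonal matrix = number of non-zero diagonal entries
assert np.linalg.matrix_rank(D) == np.count_nonzero(c)  # both are 3
```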

On the other hand, we have the following (symmetrised version of a) beautifully simple observation of Croot, Lev, and Pach:

Lemma 2 On {({\bf F}_3^n)^3}, the rank of the function {(x,y,z) \mapsto \delta_{0^n}(x+y+z)} is at most {3N}, where

\displaystyle N := \sum_{a,b,c \geq 0: a+b+c=n, b+2c \leq 2n/3} \frac{n!}{a!b!c!}.

Proof: Using the identity {\delta(x) = 1 - x^2} for {x \in {\bf F}_3}, we have

\displaystyle \delta_{0^n}(x+y+z) = \prod_{i=1}^n (1 - (x_i+y_i+z_i)^2).

The right-hand side is clearly a polynomial of degree {2n} in {x,y,z}, which is then a linear combination of monomials

\displaystyle x_1^{i_1} \dots x_n^{i_n} y_1^{j_1} \dots y_n^{j_n} z_1^{k_1} \dots z_n^{k_n}

with {i_1,\dots,i_n,j_1,\dots,j_n,k_1,\dots,k_n \in \{0,1,2\}} with

\displaystyle i_1 + \dots + i_n + j_1 + \dots + j_n + k_1 + \dots + k_n \leq 2n.

In particular, from the pigeonhole principle, at least one of {i_1 + \dots + i_n, j_1 + \dots + j_n, k_1 + \dots + k_n} is at most {2n/3}.

Consider the contribution of the monomials for which {i_1 + \dots + i_n \leq 2n/3}. We can regroup this contribution as

\displaystyle \sum_\alpha f_\alpha(x) g_\alpha(y,z)

where {\alpha} ranges over those {(i_1,\dots,i_n) \in \{0,1,2\}^n} with {i_1 + \dots + i_n \leq 2n/3}, {f_\alpha} is the monomial

\displaystyle f_\alpha(x_1,\dots,x_n) := x_1^{i_1} \dots x_n^{i_n}

and {g_\alpha: {\bf F}_3^n \times {\bf F}_3^n \rightarrow {\bf F}_3} is some explicitly computable function whose exact form will not be of relevance to our argument. The number of such {\alpha} is equal to {N}, so this contribution has rank at most {N}. The remaining contributions arising from the cases {j_1 + \dots + j_n \leq 2n/3} and {k_1 + \dots + k_n \leq 2n/3} similarly have rank at most {N} (grouping the monomials so that each monomial is only counted once), so the claim follows. \Box
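As a quick sanity check on the identity {\delta(x) = 1 - x^2} underlying the proof, the polynomial {\prod_{i=1}^n (1 - (x_i+y_i+z_i)^2)} can be compared directly against the Kronecker delta of {x+y+z} for small {n}. A short Python sketch (the function name is mine, not from the argument):

```python
from itertools import product

def delta_poly(x, y, z):
    """Evaluate prod_i (1 - (x_i + y_i + z_i)^2) mod 3 on vectors over F_3."""
    out = 1
    for xi, yi, zi in zip(x, y, z):
        out = out * (1 - (xi + yi + zi) ** 2) % 3
    return out

# exhaustive check against the Kronecker delta of x + y + z = 0 in F_3^n, n = 2
n = 2
vecs = list(product(range(3), repeat=n))
for x, y, z in product(vecs, repeat=3):
    is_zero = all((xi + yi + zi) % 3 == 0 for xi, yi, zi in zip(x, y, z))
    assert delta_poly(x, y, z) == (1 if is_zero else 0)
print("identity verified for n =", n)
```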

Upon restricting from {({\bf F}_3^n)^3} to {A^3}, the rank of {(x,y,z) \mapsto \delta_{0^n}(x+y+z)} is still at most {3N}. The two lemmas then combine to give the Ellenberg-Gijswijt bound

\displaystyle |A| \leq 3N.
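The quantity {N} is straightforward to tabulate for small {n}; a short Python sketch (the function name is mine) shows that the bound {3N} first beats the trivial bound {3^n} at {n=4}:

```python
from math import factorial

def N(n):
    """Sum of n!/(a!b!c!) over a+b+c = n with b + 2c <= 2n/3, as in the text."""
    total = 0
    for b in range(n + 1):
        for c in range(n + 1 - b):
            a = n - b - c
            if 3 * (b + 2 * c) <= 2 * n:  # integer form of b + 2c <= 2n/3
                total += factorial(n) // (factorial(a) * factorial(b) * factorial(c))
    return total

print([3 * N(n) for n in range(1, 7)])  # [3, 9, 30, 45, 153, 504]; below 3^n from n = 4 on
print((3 * N(60)) ** (1 / 60))          # n-th root of the bound, already well below 3
```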

All that remains is to compute the asymptotic behaviour of {N}. This can be done using the general tool of Cramér’s theorem, but can also be derived from Stirling’s formula (discussed in this previous post). Indeed, if {a = (\alpha+o(1)) n}, {b = (\beta+o(1)) n}, {c = (\gamma+o(1)) n} for some {\alpha,\beta,\gamma \geq 0} summing to {1}, Stirling’s formula gives

\displaystyle \frac{n!}{a!b!c!} = \exp( n (h(\alpha,\beta,\gamma) + o(1)) )

where {h} is the entropy function

\displaystyle h(\alpha,\beta,\gamma) = \alpha \log \frac{1}{\alpha} + \beta \log \frac{1}{\beta} + \gamma \log \frac{1}{\gamma}.

We then have

\displaystyle N = \exp( n (X + o(1)) )

where {X} is the maximum entropy {h(\alpha,\beta,\gamma)} subject to the constraints

\displaystyle \alpha,\beta,\gamma \geq 0; \alpha+\beta+\gamma=1; \beta+2\gamma \leq 2/3.

A routine Lagrange multiplier computation shows that the maximum occurs when

\displaystyle \alpha = \frac{32}{3(15 + \sqrt{33})}

\displaystyle \beta = \frac{4(\sqrt{33}-1)}{3(15+\sqrt{33})}

\displaystyle \gamma = \frac{(\sqrt{33}-1)^2}{6(15+\sqrt{33})}

and {h(\alpha,\beta,\gamma)} is approximately {1.013455}, giving rise to the claimed bound of {O( 2.756^n )}.
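These closed forms are easy to check numerically; a few lines of Python confirm that the claimed maximiser satisfies both constraints and gives the stated entropy value:

```python
from math import sqrt, log, exp

s = sqrt(33)
alpha = 32 / (3 * (15 + s))
beta = 4 * (s - 1) / (3 * (15 + s))
gamma = (s - 1) ** 2 / (6 * (15 + s))

# the claimed maximiser lies on the constraint set
assert abs(alpha + beta + gamma - 1) < 1e-12
assert abs(beta + 2 * gamma - 2 / 3) < 1e-12

h = -sum(p * log(p) for p in (alpha, beta, gamma))
print(h, exp(h))  # approximately 1.013455 and 2.755
```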

Remark 3 As noted in the Ellenberg and Gijswijt papers, the above argument extends readily to other fields than {{\bf F}_3} to control the maximal size of a subset of {{\bf F}^n} that has no non-trivial solutions to the equation {ax+by+cz=0}, where {a,b,c \in {\bf F}} are non-zero constants that sum to zero. Of course one replaces the function {(x,y,z) \mapsto \delta_{0^n}(x+y+z)} in Lemma 2 by {(x,y,z) \mapsto \delta(ax+by+cz)} in this case.

Remark 4 This symmetrised formulation suggests that one possible way to improve slightly on the numerical quantity {2.756} is to find a more efficient way to decompose {\delta_{0^n}(x+y+z)} into rank one functions; however, I was not able to do so (though such improvements are reminiscent of the Strassen type algorithms for fast matrix multiplication).

Remark 5 It is tempting to see if this method can get non-trivial upper bounds for sets {A} with no length {4} progressions, in (say) {{\bf F}_5^n}. One can run the above arguments, replacing the function

\displaystyle (x,y,z) \mapsto \delta_{0^n}(x+y+z)

with the function

\displaystyle (x,y,z,w) \mapsto \delta_{0^n}(x-2y+z) \delta_{0^n}(y-2z+w);

this leads to the bound {|A| \leq 4N} where

\displaystyle N := \sum_{a,b,c,d,e \geq 0: a+b+c+d+e=n, b+2c+3d+4e \leq 2n} \frac{n!}{a!b!c!d!e!}.

Unfortunately, {N} is asymptotic to {\frac{1}{2} 5^n} and so this bound is in fact slightly worse than the trivial bound {|A| \leq 5^n}! However, there is a slim chance that there is a more efficient way to decompose {\delta(x-2y+z) \delta(y-2z+w)} into rank one functions that would give a non-trivial bound on {A}. I experimented with a few possible such decompositions but unfortunately without success.

Remark 6 Return now to the capset problem. Since Lemma 1 is valid for any field {{\bf F}}, one could perhaps hope to get better bounds by viewing the Kronecker delta function {\delta} as taking values in another field than {{\bf F}_3}, such as the complex numbers {{\bf C}}. However, as soon as one works in a field of characteristic other than {3}, one can adjoin a cube root {\omega} of unity, and one now has the Fourier decomposition

\displaystyle \delta_{0^n}(x+y+z) = \frac{1}{3^n} \sum_{\xi \in {\bf F}_3^n} \omega^{\xi \cdot x} \omega^{\xi \cdot y} \omega^{\xi \cdot z}.

Moving to the Fourier basis, we conclude from Lemma 1 that the function {(x,y,z) \mapsto \delta_{0^n}(x+y+z)} on {({\bf F}_3^n)^3} now has rank exactly {3^n}, and so one cannot improve upon the trivial bound of {|A| \leq 3^n} by this method using fields of characteristic other than three as the range field. So it seems one has to stick with {{\bf F}_3} (or the algebraic completion thereof).

Thanks to Jordan Ellenberg and Ben Green for helpful discussions.

Filed under: expository, math.CO, math.RA Tagged: cap sets, Dion Gijswijt, Ernie Croot, Jordan Ellenberg, Peter Pach, polynomial method, Seva Lev

May 23, 2016

John BaezThe Busy Beaver Game

This month, a bunch of ‘logic hackers’ have been seeking to determine the precise boundary between the knowable and the unknowable. The challenge has been around for a long time. But only now have people taken it up with the kind of world-wide teamwork that the internet enables.

A Turing machine is a simple model of a computer. Imagine a machine that has some finite number of states, say N states. It’s attached to a tape, an infinitely long tape with lots of squares, with either a 0 or 1 written on each square. At each step the machine reads the number where it is. Then, based on its state and what it reads, it either halts, or it writes a number, changes to a new state, and moves either left or right.

The tape starts out with only 0’s on it. The machine starts in a particular ‘start’ state. It halts if it winds up in a special ‘halt’ state.

The Busy Beaver Game is to find the Turing machine with N states that runs as long as possible and then halts.

The number BB(N) is the number of steps that the winning machine takes before it halts.

In 1961, Tibor Radó introduced the Busy Beaver Game and proved that the sequence BB(N) is uncomputable. It grows faster than any computable function!

A few values of BB(N) can be computed, but there’s no way to figure out all of them.

As we increase N, the number of Turing machines we need to check increases faster than exponentially: it’s

\displaystyle{ (4N+4)^{2N} }

since each of the 2N state–symbol combinations gets a transition specifying one of 2 symbols to write, one of 2 directions to move, and one of N+1 next states (counting the halt state).
Of course, many could be ruled out as potential winners by simple arguments. But the real problem is this: it becomes ever more complicated to determine which Turing machines with N states never halt, and which merely take a huge time to halt.

Indeed, no matter what axiom system you use for math, as long as it has finitely many axioms and is consistent, you can never use it to correctly determine BB(N) for more than some finite number of cases.

So what do people know about BB(N)?

For starters, BB(0) = 0. At this point I should admit that people don’t count the halt state as one of our N states. This is just a convention. So, when we consider BB(0), we’re considering machines that only have a halt state. They instantly halt.

Next, BB(1) = 1.

Next, BB(2) = 6.
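These tiny cases can be checked by direct simulation. Here is a minimal Turing machine simulator in Python, run on the well-known 2-state champion (a sketch; the halting transition counts as a step):

```python
def run_turing(table, max_steps=10**6):
    """Run a Turing machine on an all-0 tape from state 'A'.

    table maps (state, symbol) -> (write, move, next_state); next_state 'H' halts.
    Returns (number of steps taken, number of 1s left on the tape).
    """
    tape, pos, state, steps = {}, 0, 'A', 0
    while state != 'H':
        if steps >= max_steps:
            raise RuntimeError("no halt within step budget")
        write, move, state = table[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += 1 if move == 'R' else -1
        steps += 1
    return steps, sum(tape.values())

# The 2-state busy beaver champion: halts after 6 steps, leaving four 1s.
bb2 = {
    ('A', 0): (1, 'R', 'B'), ('A', 1): (1, 'L', 'B'),
    ('B', 0): (1, 'L', 'A'), ('B', 1): (1, 'R', 'H'),
}
print(run_turing(bb2))  # (6, 4)
```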

Next, BB(3) = 21. This was proved in 1965 by Tibor Radó and Shen Lin.

Next, BB(4) = 107. This was proved in 1983 by Allan Brady.

Next, BB(5). Nobody knows what BB(5) equals!

The current 5-state busy beaver champion was discovered by Heiner Marxen and Jürgen Buntrock in 1989. It takes 47,176,870 steps before it halts. So, we know

BB(5) ≥ 47,176,870.

People have looked at all the other 5-state Turing machines to see if any does better. But there are 43 machines that do very complicated things that nobody understands. It’s believed they never halt, but nobody has been able to prove this yet.

We may have hit the wall of ignorance here… but we don’t know.

That’s the spooky thing: the precise boundary between the knowable and the unknowable is unknown. It may even be unknowable… but I’m not sure we know that.

Next, BB(6). In 1996, Marxen and Buntrock showed it’s at least 8,690,333,381,690,951. In June 2010, Pavel Kropitz proved that

\displaystyle{ \mathrm{BB}(6) \ge 7.412 \cdot 10^{36,534} }

You may wonder how he proved this. Simple! He found a 6-state machine that runs for roughly

\displaystyle{ 7.412 \cdot 10^{36,534} }

steps and then halts!

Of course, I’m just kidding when I say this was simple. The machine is easy enough to describe, but proving it takes exactly this long to run takes real work! You can read about such proofs here:

• Pascal Michel, The Busy Beaver Competition: a historical survey.

I don’t understand them very well. All I can say at this point is that many of the record-holding machines known so far are similar to the famous Collatz conjecture. The idea there is that you can start with any positive integer and keep doing two things:

• if it’s even, divide it by 2;

• if it’s odd, triple it and add 1.

The conjecture is that this process will always eventually reach the number 1. Here’s a graph of how many steps it takes, as a function of the number you start with:

Nice pattern! But this image shows how it works for numbers up to 10 million, and you’ll see it doesn’t usually take very long for them to reach 1. Usually less than 600 steps is enough!
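The Collatz iteration itself is only a couple of lines; a quick Python sketch counts the steps:

```python
def collatz_steps(n):
    """Count iterations of the even/odd rule until n reaches 1."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

print(collatz_steps(27))                                # 111 -- a famously slow starter
print(max(collatz_steps(n) for n in range(1, 10_000)))  # well under 600
```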

So, to get a Turing machine that takes a long time to halt, you have to take this kind of behavior and make it much more long and drawn-out. Conversely, to analyze one of the potential winners of the Busy Beaver Game, people must take that long and drawn-out behavior and figure out a way to predict much more quickly when it will halt.

Next, BB(7). In 2014, someone who goes by the name Wythagoras showed that

\displaystyle{ \textrm{BB}(7) > 10^{10^{10^{10^{10^7}}}} }

It’s fun to prove lower bounds on BB(N). For example, in 1964 Milton Green constructed a sequence of Turing machines that implies

\textrm{BB}(2N) \ge 3 \uparrow^{N-2} 3

Here I’m using Knuth’s up-arrow notation, which is a recursively defined generalization of exponentiation, so for example

\textrm{BB}(10) \ge 3 \uparrow^{3} 3 = 3 \uparrow^2 3^{3^3} = 3^{3^{3^{3^{\cdot^{\cdot^\cdot}}}}}

where there are 3^{3^3} threes in that tower.
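The recursion behind the up-arrow notation is short enough to write down, though it is only computable for the very smallest arguments. A Python sketch:

```python
def up(a, n, b):
    """Knuth's up-arrow a ↑^n b, with ↑^1 being ordinary exponentiation."""
    if n == 1:
        return a ** b
    if b == 1:
        return a
    # a ↑^n b = a ↑^(n-1) (a ↑^n (b-1))
    return up(a, n - 1, up(a, n, b - 1))

print(up(3, 2, 3) == 3 ** 27)  # 3↑↑3 is a tower of three 3s
print(up(2, 3, 3))             # 2↑↑↑3 = 2↑↑4 = 65536
```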

But it’s also fun to seek the smallest N for which we can prove BB(N) is unknowable! And that’s what people are making lots of progress on right now.

Sometime in April 2016, Adam Yedidia and Scott Aaronson showed that BB(7910) cannot be determined using the widely accepted axioms for math called ZFC: that is, Zermelo–Fraenkel set theory together with the axiom of choice. It’s a great story, and you can read it here:

• Scott Aaronson, The 8000th Busy Beaver number eludes ZF set theory: new paper by Adam Yedidia and me, Shtetl-Optimized, 3 May 2016.

• Adam Yedidia and Scott Aaronson, A relatively small Turing machine whose behavior is independent of set theory, 13 May 2016.

Briefly, Yedidia created a new programming language, called Laconic, which lets you write programs that compile down to small Turing machines. They took an arithmetic statement created by Harvey Friedman that’s equivalent to the consistency of the usual axioms of ZFC together with a large cardinal axiom called the ‘stationary Ramsey property’, or SRP. And they created a Turing machine with 7910 states that seeks a proof of this arithmetic statement using the axioms of ZFC.

Since ZFC can’t prove its own consistency, much less its consistency when supplemented with SRP, their machine will only halt if ZFC+SRP is inconsistent.

Since most set theorists believe ZFC+SRP is consistent, this machine probably doesn’t halt. But we can’t prove this using ZFC.

In short: if the usual axioms of set theory are consistent, we can never use them to determine the value of BB(7910).

The basic idea is nothing new: what’s new is the explicit and rather low value of the number 7910. Poetically speaking, we know the unknowable starts here… if not sooner.

However, this discovery set off a wave of improvements! On the Metamath newsgroup, Mario Carneiro and others started ‘logic hacking’, looking for smaller and smaller Turing machines that would only halt if ZF—that is, Zermelo–Fraenkel set theory, without the axiom of choice—is inconsistent.

By just May 15th, Stefan O’Rear seems to have brought the number down to 1919. He found a Turing machine with just 1919 states that searches for an inconsistency in the ZF axioms. Interestingly, this turned out to work better than using Harvey Friedman’s clever trick.

Thus, if O’Rear’s work is correct, we can only determine BB(1919) if we can determine whether ZF set theory is consistent. However, we cannot do this using ZF set theory—unless we find an inconsistency in ZF set theory.

For details, see:

• Stefan O’Rear, A Turing machine Metamath verifier, 15 May 2016.

I haven’t checked his work, but it’s available on GitHub.

What’s the point of all this? At present, it’s mainly just a game. However, it should have some interesting implications. It should, for example, help us better locate the ‘complexity barrier’.

I explained that idea here:

• John Baez, The complexity barrier, Azimuth, 28 October 2011.

Briefly, while there’s no limit on how much information a string of bits—or any finite structure—can have, there’s a limit on how much information we can prove it has!

This amount of information is pretty low, perhaps a few kilobytes. And I believe the new work on logic hacking can be used to estimate it more accurately!

John PreskillQuantum braiding: It’s all in (and on) your head.

Morning sunlight illuminated John Preskill’s lecture notes. The notes concern Caltech’s quantum-computation course, Ph 219. I’m TAing (the teaching assistant for) Ph 219. I previewed lecture material one sun-kissed Sunday.

Pasadena sunlight spilled through my window. So did the howling of a dog that’s deepened my appreciation for Billy Collins’s poem “Another reason why I don’t keep a gun in the house.” My desk space warmed up, and I unbuttoned my jacket. I underlined a phrase, braided my hair so my neck could cool, and flipped a page.

I flipped back. The phrase concerned a mathematical statement called “the Yang-Baxter relation.” A sunbeam had winked on in my mind: The Yang-Baxter relation described my hair.

The Yang-Baxter relation belongs to a branch of math called “topology.” Topology resembles geometry in its focus on shapes. Topologists study spheres, doughnuts, knots, and braids.

Topology describes some quantum physics. Scientists are harnessing this physics to build quantum computers. Alexei Kitaev largely dreamed up the harness. Alexei, a Caltech professor, is teaching Ph 219 this spring.1 His computational scheme works like this.

We can encode information in radio signals, in letters printed on a page, in the pursing of one’s lips as one passes a howling dog’s owner, and in quantum particles. Imagine three particles on a tabletop.

Peas 1

Consider pushing the particles around like peas on a dinner plate. You could push peas 1 and 2 until they swapped places. The swap represents a computation, in Alexei’s scheme.2

The diagram below shows how the peas move. Imagine slicing the figure into horizontal strips. Each strip would show one instant in time. Letting time run amounts to following the diagram from bottom to top.

Peas 2

Arrows copied from John Preskill’s lecture notes. Peas added by the author.

Imagine swapping peas 1 and 3.

Peas 3

Humor me with one more swap, an interchange of 2 and 3.

Peas 4

Congratulations! You’ve modeled a significant quantum computation. You’ve also braided particles.

2 braids

The author models a quantum computation.

Let’s recap: You began with peas 1, 2, and 3. You swapped 1 with 2, then 1 with 3, and then 2 with 3. The peas end up in the opposite of their starting order: 3, 2, 1.

You could, instead, morph 1-2-3 into 3-2-1 via a different sequence of swaps. That sequence, or braid, appears below.

Peas 5

Congratulations! You’ve begun proving the Yang-Baxter relation. You’ve shown that  each braid turns 1-2-3 into 3-2-1.

The relation states also that 1-2-3 is topologically equivalent to 3-2-1: Imagine standing atop pea 2 during the 1-2-3 braiding. You’d see peas 1 and 3 circle around you counterclockwise. You’d see the same circling if you stood atop pea 2 during the 3-2-1 braiding.
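Stripped of the over/under-crossing information, the two braids are the two sides of the braid-group relation s1 s2 s1 = s2 s1 s2, where s1 crosses the first two strands and s2 the last two. A few lines of Python (function names mine) check that both composites send 1-2-3 to 3-2-1:

```python
def s1(seq):
    """Cross the strands in positions 1 and 2."""
    a, b, c = seq
    return (b, a, c)

def s2(seq):
    """Cross the strands in positions 2 and 3."""
    a, b, c = seq
    return (a, c, b)

start = (1, 2, 3)
left = s1(s2(s1(start)))   # apply s1, then s2, then s1
right = s2(s1(s2(start)))  # the other braid: s2, then s1, then s2
print(left, right)  # (3, 2, 1) (3, 2, 1)
```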

That Sunday morning, I looked at John’s swap diagrams. I looked at the hair draped over my left shoulder. I looked at John’s swap diagrams.

“Yang-Baxter relation” might sound, to nonspecialists, like a mouthful of tweed. It might sound like a sneeze in a musty library. But an eight-year-old could grasp half the relation. When I braid my hair, I pass my left hand over the back of my neck. Then, I pass my right hand over. But I could have passed the right hand first, then the left. The braid would have ended the same way. The braidings would look identical to a beetle hiding atop what had begun as the middle hunk of hair.


The Yang-Baxter relation.

I tried to keep reading John’s lecture notes, but the analogy mushroomed. Imagine spinning one pea atop the table.

Pea 6

A 360° rotation returns the pea to its initial orientation. You can’t distinguish the pea’s final state from its first. But a quantum particle’s state can change during a 360° rotation. Physicists illustrate such rotations with corkscrews.



A quantum corkscrew (“twisted worldribbon,” in technical jargon)

Like the corkscrews formed as I twirled my hair around a finger. I hadn’t realized that I was fidgeting till I found John’s analysis.


I gave up on his lecture notes as the analogy sprouted legs.

I’ve never mastered the fishtail braid. What computation might it represent? What about the French braid? You begin French-braiding by selecting a clump of hair. You add strands to the clump while braiding. The addition brings to mind particles created (and annihilated) during a topological quantum computation.

Ancient Greek statues wear elaborate hairstyles, replete with braids and twists.  Could you decode a Greek hairdo? Might it represent the first 18 digits in pi? How long an algorithm could you run on Rapunzel’s hair?

Call me one bobby pin short of a bun. But shouldn’t a scientist find inspiration in every fiber of nature? The sunlight spilling through a window illuminates no less than the hair spilling over a shoulder. What grows on a quantum physicist’s head informs what grows in it.


1Alexei and John trade off on teaching Ph 219. Alexei recommends the notes that John wrote while teaching in previous years.

2When your mother ordered you to quit playing with your food, you could have objected, “I’m modeling computations!”

May 22, 2016

Clifford JohnsonA New Era

Many years ago, even before the ground was broken on phase one of the Expo line and arguments were continuing about whether it would ever happen, I started saying that I was looking forward to the days when I could put my pen down, step out of my office, get on the train a minute away, and take it all the way to the beach and finish my computation there. Well, Friday, the first such day arrived. Phase two of the Expo line is now complete and has opened to the public, with newly finished stations from Culver City through Santa Monica. It joins the already running (since April 2012) Expo phase one, which I've been using every day to get to campus after changing from the Red line (connecting downtown).

On Friday I happened to accidentally catch the first Expo Line train heading all the way out to Santa Monica! (I mean the first one for the plebs - there had been a celebratory one earlier with the mayor and so forth, I was told). I was not planning to do so and was just doing my routine trip to campus, thinking I'd try the new leg out later (as I did when phase one opened - see here). But there was a cheer when the train pulled up at Metro/7th downtown and the voice over the overhead speakers [...] Click to continue reading this post

The post A New Era appeared first on Asymptotia.

Chad OrzelBeyoncé and LIGO: Stochastic Awareness of Science Is Probably Okay

I’ve had this piece by Rick Borchelt on “science literacy” and this one by Paige Brown Jarreau on “echo chambers” open in tabs for… months. I keep them around because I have thoughts on the general subject, but I keep not writing them up because I suspect that what I want to say won’t be read much, and I find it frustrating to put a lot of work into a blog post only to be greeted by crickets chirping.

But, now I find myself in a position where I sort of need to have a more thought-out version of the general argument. So I’m going to do a kind of slapdash blog post working this out as I type, and hopefully end up where I need to be, whether or not anyone else pays any attention.

So. The general thrust of both Borchelt and Jarreau’s pieces is pretty similar: a lot of work in “science communication” seems to be misdirected or ineffective. The audience for science blogs and web sites and the rest is drawn from the same limited pool of people who actively seek that stuff out. Most of the rest of the public isn’t looking for information about science, and thus, they’re not getting it. Which is generally a cue for much hand-wringing among the science-communication crowd over how we’re failing, and need to Do Better.

But over the last few years, I’ve started to wonder whether that’s really as big a problem as all the deeply concerned blog posts I’ve read seem to think. And the reason for that is Beyoncé.

It’s not anything that Beyoncé herself did, just the fact that I’m aware of her. I don’t own any of her music, and I’m not sure I’ve ever listened to a complete song of hers, as occasional snippets have been enough to confirm that it’s really not my thing. Nevertheless, I know of her, and have a generally positive impression, because news about her manages to impinge on my awareness in a variety of indirect ways– performing at the Super Bowl, bits of gossip on the pop-music station I listen to when SteelyKid’s in the car (they don’t regularly play her stuff, but they talk about her a bunch), or various science-y people on Twitter gushing about her dropping a new album, etc.

Beyoncé is just the most positive example of a general category of people I don’t have any particular reason to care about who I am nonetheless vaguely informed about. I think of this general phenomenon as “stochastic awareness of pop culture.” I don’t have any systematic knowledge of Beyoncé or the various Kardashians, but I know who they are and a bit about them because that information randomly shows up in front of me. Which is more or less inevitable, because there are a lot of people out there who care very deeply about the activities of these individuals, and pump an enormous amount of effort into generating stories about them. And the end result is that even though her music is not my thing, I have a hazy sense of her place in the pop-culture firmament, and a generally positive impression.

And hand-wringing blog posts aside, I think science communication could do a lot worse than operating on this same basic model. That is, we generate a lot of content about science that is primarily consumed by people who already care about the subject, in the same way that legions of reporters generate endless stories and thinkpieces about Beyoncé and other celebrities. And some fraction of that content will, from time to time, randomly end up impinging on the awareness of people who aren’t actively seeking information about science, leaving them with the same kind of stochastic awareness of science news that I have about celebrity culture.

Most of the time, this involves big, splashy stories– LIGO detecting gravitational waves, or the Pluto fly-by, and that kind of thing. In the pop-culture analogy, these are basically like Beyoncé performing at the Super Bowl. But there’s also a lot of connections that are essentially random. My favorite personal example of this is when my Forbes blog post about friction, inspired by a silly episode where I didn’t lose my phone off the roof of my car, wound up as a story in the Daily Mail. On a typical day, I’m pretty sure the overlap between my blog readership and the readership of the Daily Mail is negligible, but for essentially random reasons, this story ended up being put in front of a lot of people who wouldn’t actively seek it out.

From a communications and policy perspective, the hope is that when these stories land in front of people, they spark a “Hey, that’s cool…” sort of reaction. Ideally, this might prompt people to learn a bit more about the specific random topic, by reading other articles, or striking up conversations at work, etc. I can’t say how effective this is for random phone-on-the-car stories, but it works for big news events– whenever NASA holds a press conference, I can expect a few questions about it at Starbucks the next day, from the regulars who know I’m a scientist. And hopefully those leave a generally positive impression about science as something that’s pretty cool and thus worth a bit of money.

I would argue that the implication of the Borchelt and Jarreau posts I linked at the start of this post is that this is essentially what we’re already doing. That is, the work people who write about science (or make videos, etc.) are doing mostly ends up in front of an audience who already care about that subject. In the same way that most of what celebrity-culture reporters write about Beyoncé ends up in front of people who already care deeply about pop music. But I think those posts, and a lot of other writing about this, sort of underplay the effect of the occasions when, for random reasons, science news ends up in front of pop-music fans.

This isn’t an argument against doing science communication, though I’m sure it will be taken that way by some. It’s not even an argument against trying to find better and more effective ways to communicate science. On the contrary, I think both of those things are essential– the more science content we put out there, the better the chance that something breaks through, and if we can figure out and put into practice techniques for making stories land more effectively, we can hopefully boost the impact of those stories that do break out. I think we’ve seen some real progress on the latter, actually– NASA comes in for some ribbing about the number of press conferences they hold, but their PR people are genuinely good at crafting their messages, and the LHC has done a brilliant job of getting public attention for really abstract stuff.

So I don’t mean this as a “Science Communication: You’re Doing It Wrong” article. Instead, it’s an “Everybody take a deep breath and try to calm down” article. Yes, we’re mostly talking to ourselves, but “mostly” isn’t “only,” and I’m not so sure that stochastic public awareness of science is a major crisis that we need to wring our hands over endlessly.


(The obvious counterargument to my position takes the form “Yes, but science news is Important, while celebrity gossip is just trash.” Which is true to a point, but this is yet another area where science is not in any way unique. Pretty much any field of study that has even a slight connection to public policy has the same issues of stochastic public awareness of their subject, which is more Important than whatever Beyoncé is up to these days. And it’s been that way for decades (at minimum; Borchelt offers some documentation in the case of science) without the world coming to an end. So, you know, maybe it’s not that big a problem.)

(I also find that I’m becoming less comfortable with declarations that this issue or that is What Really Matters, because there’s a kind of fundamental elitism to the whole business that rubs me the wrong way. At the end of the day, what matters to the general public is what they say matters to them, and if that doesn’t align with the elite consensus, it’s on us to convince them otherwise. This is maybe a sign that I’ve been in academia too long, and the secondhand smoke of “postmodernism” is rotting my brain, but whatever…)

May 21, 2016

Chad OrzelPhysics Blogging Round-Up: Books, Entanglement, Optics, Many-Worlds, Two Cultures, and Clocks

A whole bunch of physics posts over at Forbes so far this month:

Recent Physics Books: Gravitational Waves and Brief Lessons: Short reviews of Janna Levin’s Black Hole Blues and Carlo Rovelli’s Seven Brief Lessons on Physics.

The Real Reasons Quantum Entanglement Doesn’t Allow Faster-Than-Light Communication: Expanding on and correcting some stuff I didn’t like about Ethan “Starts With A Bang” Siegel’s take on entanglement as a communications tool.

Why Does The Rising Sun Change Color?: I watched a bunch of sunrises on the cruise, which led to me scribbling equations, and then a blog post.

How Quantum Entanglement Can Help You Understand Many-Worlds: Entanglement is weird, but serves as a nice concrete system for talking about how decoherence hides quantum behavior without requiring “collapse” into a single well-defined state.

How Even Research That Sounds Silly Has Value: Kind of a weird thinkpiece, because that’s the sort of mood I was in.

Why Do Physicists Want A “Nuclear Clock”?: A story about a key step toward making a clock based on a nuclear isomer transition in thorium-229 caught my eye, so I tried to explain why that’s cool.

Historians and Astronomers Share These Scientific Methods In Common: A neat project based on digitizing ads about fugitive slaves reminded me of some of the things astronomers do, so I wrote about the parallel.

As usual, quite a range of stuff. I’ll be at DAMOP all next week, which may or may not allow time for on-site blogging, but will almost certainly give me some topics to write on when I get back.

n-Category Café Castles in the Air

The most recent issue of the Notices includes a review by Slava Gerovitch of a book by Amir Alexander called Infinitesimal: How a Dangerous Mathematical Theory Shaped the Modern World. As the reviewer presents it, one of the main points of the book is that science was advanced the most by the people who studied and worked with infinitesimals despite their apparent formal inconsistency. The following quote is from the end of the review:

If… maintaining the appearance of infallibility becomes more important than exploration of new ideas, mathematics loses its creative spirit and turns into a storage of theorems. Innovation often grows out of outlandish ideas, but to make them acceptable one needs a different cultural image of mathematics — not a perfectly polished pyramid of knowledge, but a freely growing tree with tangled branches.

The reviewer makes parallels to more recent situations such as quantum field theory and string theory, where the formal mathematical justification may be lacking but the physical theory is meaningful, fruitful, and made correct predictions, even for pure mathematics. However, I couldn’t help thinking of recent examples entirely within pure mathematics as well, and particularly in some fields of interest around here.

Here are a few; feel free to suggest others in the comments (or to take issue with mine).

  • Informal arguments in higher category theory. For example, Lurie’s original paper On infinity topoi lacked a rigorous formal foundation, but contained many important insights. Because quasicategories had already been invented, he was able to make the ideas rigorous in reasonably short order; but I think it’s fair to say the price is a minefield of technical lemmas. Nowadays one finds people wanting to say “we work with (\infty,1)-categories model-independently” to avoid all the technicalities, but it’s unclear whether this quite makes sense. (Although I have some hope now that a formal language closer to the informal one may come out of the Riehl-Verity theory of \infty-cosmoi.)

  • String diagrams for monoidal categories. Joyal and Street’s original paper “The geometry of tensor calculus” carefully defined string diagrams as topological graphs and proved that any labeled string diagram could be interpreted in a monoidal category. But since then, string diagrams have proven so useful that many people have invented variants of them that apply to many different kinds of monoidal categories, and in many (perhaps most) cases they proceed to use them without a similar justifying theorem. Kate and I proved the justifying theorem for our string diagrams for bicategories with shadows, but we didn’t even try it with our string diagrams for monoidal fibrations.

  • Combining higher categories with string diagrams, we have the recent “graphical proof assistant” Globular, which formally works with a certain kind of semistrict $n$-category for $n\le 4$. It’s known that semistrict 3-categories (Gray-categories) suffice to model all weak 3-categories, but no such theorem is yet known for 4-categories. So officially, doing a proof about 4-categories in Globular tells you nothing more than that it’s true about semistrict 4-categories, and I suspect that few naturally-occurring 4-categories are naturally semistrict. However, such an argument clearly has meaning and applicability much more generally.

  • And, of course, there is homotopy type theory. Plenty of it is completely rigorous, of course (and even formally verified in a computer), but I’m thinking particularly of its conjectural higher-categorical semantics. Pretty much everyone agrees that HoTT should be an internal language for $(\infty,1)$-topoi, but with present technology this depends on an initiality theorem for models of type theories in general that is universally believed to be true but is very fiddly to prove correctly and has only been written down carefully in one special case. Moreover, even granting the initiality theorem there are various slight mismatches between the formal theories in current use and what we can construct in higher toposes to model them, e.g. the universes are not strict enough and the HITs are too big. Nevertheless, this relationship has been very fruitful to both sides of the subject already (the type theory and the category theory).

The title of this post is a reference to a classic remark by Thoreau:

“If you have built castles in the air, your work need not be lost; that is where they should be. Now put the foundations under them.”

May 20, 2016

Clifford JohnsonGut Feeling…

gut_feeling_sampleStill slowly getting back up to speed (literally) on page production. I've made some major tweaks in my desktop workflow (I mostly move back and forth between Photoshop and Illustrator at this stage), and finally have started keeping track of my colours in a more efficient way (using global process colours, etc), which will be useful if I have to do big colour changes later on. My workflow improvement also now includes [...] Click to continue reading this post

The post Gut Feeling… appeared first on Asymptotia.

Jordan EllenbergBounds for cap sets

Briefly:  it seems to me that the idea of the Croot-Lev-Pach paper I posted about yesterday can indeed be used to give a new bound on the size of subsets of F_3^n with no three-term arithmetic progression! Such a set has size at most (2.756)^n. (There’s actually a closed form for the constant, I think, but I haven’t written it down yet.)

Here’s the preprint. It’s very short. I’ll post this to the arXiv in a day or two, assuming I (or you) don’t find anything wrong with it, so comment if you have comments!

Update:  Busy few days of administrative stuff and travel, sorry for not having updated the preprint yet, will try to finish it today.  One note, already observed below in the comments:  you get a similar bound for subsets of (F_q)^n free of solutions to (ax+by+cz=0) for any (a,b,c) with a+b+c=0; the cap set case is q=3, (a,b,c) = (1,1,1).

Update 2:  Dion Gijswijt and I will be submitting this result as a joint paper, which will amalgamate the presentations of our essentially identical arguments.  Dion carried out his work independently of mine at around the same time, and the idea should be credited to both of us.
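For the numerically inclined: the constant in the bound can be recovered with a few lines of code. In the now-standard form of the Ellenberg-Gijswijt argument it arises as the minimum of $(1+t+t^2)/t^{2/3}$ over $0<t<1$; that formula is imported here as an assumption about the final paper, not a claim about this preprint's exact presentation.

```python
# Recover the cap-set constant numerically: minimize (1 + t + t^2) / t^(2/3)
# on (0, 1) by a simple grid search. The formula is assumed from the
# published Ellenberg-Gijswijt argument.
def capset_constant(steps=200_000):
    best = float("inf")
    for i in range(1, steps):
        t = i / steps
        best = min(best, (1 + t + t * t) / t ** (2 / 3))
    return best

print(round(capset_constant(), 3))  # ~2.755, matching the post's (2.756)^n
```

The minimum sits at $t = (\sqrt{33}-1)/8 \approx 0.593$, which is where the closed form for the constant comes from.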


May 19, 2016

Sean CarrollGive the People What They Want

And what they want, apparently, is 470-page treatises on the scientific and philosophical underpinnings of naturalism. To appear soon in the Newspaper of Record:


Happy also to see great science books like Lab Girl and Seven Brief Lessons on Physics make the NYT best-seller list. See? Science isn’t so scary at all.

Clifford JohnsonCompetition Results!

design_film_competition_logo_mediumThis year's USC Science Film Competition saw another crop of films with a great variety of approaches, with live action and animation, comedy, drama and documentary, and all sorts of hybrids of those forms. Thanks to all who took part. As for the results, and seeing the films (do take a look!) I'll repeat here the post I did over on the competition's blog:


We had a lot of fun at the screening and showcase last Thursday. The films looked great on the Imax screen. Many thanks to Matt Scott for working hard to make sure it all looked great, and also to him and the Large Format Cinema Club for co-hosting the event! Once again, thanks to the Burg Foundation for supporting the competition financially with prize money, grants for helping with the filmmaking, and funds for refreshments and logistics.

The results are as follows: [...] Click to continue reading this post

The post Competition Results! appeared first on Asymptotia.

ResonaancesA new boson at 750 GeV?

ATLAS and CMS presented today a summary of the first LHC results obtained from proton collisions with 13 TeV center-of-mass energy. The most exciting news was of course the 3.6 sigma bump at 750 GeV in the ATLAS diphoton spectrum, roughly coinciding with a 2.6 sigma excess in CMS. When there's an experimental hint of new physics signal there is always this set of questions we must ask:

0. WTF ?
0. Do we understand the background?
1. What is the statistical significance of  the signal?
2. Is the signal consistent with other data sets?
3. Is there a theoretical framework to describe it?
4. Does it fit in a bigger scheme of new physics?

Let us go through these questions one by one.

The background.  There are several boring ways to make photon pairs at the LHC, but they are expected to produce a spectrum smoothly decreasing with the invariant mass of the pair. This expectation was borne out in run-1, where the 125 GeV Higgs resonance could be clearly seen on top of a nicely smooth background, with no breaks or big wiggles. So it is unlikely that some Standard Model process (other than a statistical fluctuation) could produce a bump such as the one seen by ATLAS.

The stats.  The local significance is 3.6 sigma in ATLAS and 2.6 sigma in CMS.  Naively combining the two, we get a more than 4 sigma excess. It is a very large effect, but we have already seen similarly large fluctuations at the LHC that vanished into thin air (remember the 145 GeV Higgs?). Next year's LHC data will be crucial to confirm or exclude the signal.  In the meantime, we have a perfect right to be excited.

The consistency. For this discussion, the most important piece of information is the diphoton data collected in run-1 at 8 TeV center-of-mass energy.  Both ATLAS and CMS have a small 1 sigma excess around 750 GeV in the run-1 data, but there is no clear bump there.  If a new 750 GeV particle is produced in gluon-gluon collisions, then the gain in the signal cross section at 13 TeV compared to 8 TeV is roughly a factor of 5.  On the other hand, ATLAS collected 6 times more data at 8 TeV (20 fb-1 vs 3.2 fb-1). This means that the number of signal events produced in ATLAS at 13 TeV should be about 75% of those at 8 TeV, and the ratio is even worse for CMS (who used only 2.6 fb-1).  However, the background may grow less quickly than the signal, so the power of the 13 TeV and 8 TeV data is comparable.  All in all, there is some tension between the run-1 and run-2 data sets; however, a mild downward fluctuation of the signal at 8 TeV and/or a mild upward fluctuation at 13 TeV is enough to explain it.  One can also try to explain the lack of signal in run-1 by the fact that the 750 GeV particle is a decay product of a heavier resonance (in which case the cross-section gain can be much larger). More careful study with next year's data will be needed to test this possibility.
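The event-count comparison above is back-of-envelope arithmetic; here is a sketch using the post's own rough inputs (the factor-5 cross-section gain and the quoted luminosities are the post's estimates, not official numbers).

```python
# Expected ratio of 750 GeV signal events, ATLAS run-2 vs run-1,
# for gluon-fusion production. All inputs are the post's rough figures.
xsec_gain = 5.0               # sigma(13 TeV) / sigma(8 TeV), gluon fusion
lumi_8, lumi_13 = 20.0, 3.2   # integrated luminosities in fb^-1
ratio = xsec_gain * lumi_13 / lumi_8
print(ratio)  # 0.8: run-2 should see ~80% of the run-1 signal yield
```

The ~0.8 here is in the same ballpark as the post's "about 75%", which presumably folds in further detector-level factors.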

The model.  This is the easiest part :)  A resonance produced in gluon-gluon collisions and decaying to 2 photons?  We've seen that already... that's how the Higgs boson was first spotted.  So all we need to do is to borrow from the Standard Model. The simplest toy model for the resonance would be a new singlet scalar with mass of 750 GeV coupled to new heavy vector-like quarks that carry color and electric charges. Then quantum effects will produce, in analogy to what happens for the Higgs boson, an effective coupling of the new scalar to gluons and photons:

By a judicious choice of the effective couplings (which depend on masses, charges, and couplings of the vector-like quarks) one can easily fit the diphoton excess observed by ATLAS and CMS. This is shown as the green region in the plot.
If the vector-like quark is a T', that is to say, it has the same color and electric charge as the Standard Model top quark, then the effective couplings must lie along the blue line. The exclusion limits from the run-1 data (mesh) cut through the best-fit region, but do not disfavor the model completely. Variations of this minimal toy model will appear in a hundred papers this week.

The big picture.  Here the sky is the limit. The situation is completely different from 3 years ago, when there was one strongly preferred (and ultimately true) interpretation of the 125 GeV diphoton and 4-lepton signals as the Higgs boson of the Standard Model. By contrast, scalars coupled to new quarks appear in countless models of new physics. We may be seeing the radial Higgs partner predicted by little Higgs or twin Higgs models, or the dilaton arising due to spontaneous conformal symmetry breaking, or a composite state bound by new strong interactions.  It could be a part of the extended Higgs sector in many different contexts, e.g. the heavy scalar or pseudo-scalar in two Higgs doublet models.  For more spaced-out possibilities, it could be the KK graviton of the Randall-Sundrum model, or it could fit some popular supersymmetric models such as the NMSSM. All these scenarios face some challenges.  One is to explain why the branching ratio into two photons is large enough to be observed, and why the 750 GeV scalar is not seen in other decay channels, e.g. in decays to W boson pairs, which should be the dominant mode for a Higgs-like scalar.  However, these challenges are nothing that an average theorist could not resolve by tomorrow morning.  Most likely, this particle would be just a small part of a larger structure, possibly having something to do with electroweak symmetry breaking and the hierarchy problem of the Standard Model.  If the signal is a real thing, then it may be the beginning of a new golden era in particle physics....

ResonaancesHiggs force awakens

The Higgs boson couples to particles that constitute matter around us, such as electrons, protons, and neutrons. Its virtual quanta are constantly being exchanged between these particles.  In other words, it gives rise to a force - the Higgs force. I'm surprised that this PR-cool aspect is not explored in our outreach efforts. Higgs bosons mediate the Higgs force in the same fashion as gravitons, gluons, photons, and W and Z bosons mediate the gravitational, strong, electromagnetic, and weak forces. Just like gravity, the Higgs force is always attractive and its strength is proportional, in the first approximation, to a particle's mass. It is a force in the common sense; for example, if we bombarded a detector long enough with a beam of particles interacting only via the Higgs force, they would eventually knock off atoms in the detector.

There is of course a reason why the Higgs force is less discussed: it has never been detected directly. Indeed, in the absence of midi-chlorians it is extremely weak. First, it shares with the weak interactions the feature of being short-ranged: since the mediator is massive, the interaction strength is exponentially suppressed at distances larger than an attometer (10^-18 m), about 0.1% of the diameter of a proton. Moreover, for ordinary matter the weak force is more important, because of the tiny Higgs couplings to light quarks and electrons. For example, for the proton the Higgs force is a thousand times weaker than the weak force, and for the electron it is a hundred thousand times weaker. Finally, there are no known particles interacting only via the Higgs force and gravity (though dark matter in some hypothetical models has this property), so in practice the Higgs force is always a tiny correction to more powerful forces that shape the structure of atoms and nuclei. This is again in contrast to the weak force, which is particularly relevant for neutrinos, which are immune to strong and electromagnetic forces.
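The attometer range quoted above is just the reduced Compton wavelength of the 125 GeV Higgs, lambda = hbar*c / (m c^2), which cuts off the Yukawa potential e^(-r/lambda)/r; a quick numerical check:

```python
# Range of a Yukawa force mediated by a massive boson:
# lambda = hbar*c / (m c^2), evaluated for the 125 GeV Higgs.
hbar_c = 197.327       # MeV * fm (standard conversion constant)
m_higgs = 125_000.0    # MeV
lam_fm = hbar_c / m_higgs   # range in femtometers
lam_m = lam_fm * 1e-15      # range in meters
print(f"{lam_m:.1e} m")  # ~1.6e-18 m, about 0.1% of a proton's diameter
```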

Nevertheless, this new paper argues that the situation is not hopeless, and that the current experimental sensitivity is good enough to start probing the Higgs force. The authors propose to do it by means of atomic spectroscopy. Frequency measurements of atomic transitions have reached the stunning accuracy of order 10^-18. The Higgs force creates a Yukawa-type potential between the nucleus and the orbiting electrons, which leads to a shift of the atomic levels. The effect is tiny; in particular, it is always smaller than the analogous shift due to the weak force. This is a serious problem, because calculations of the leading effects may not be accurate enough to extract the subleading Higgs contribution. Fortunately, there may be tricks to reduce the uncertainties. One is to compare the isotope shifts of transition frequencies for several isotope pairs. The theory says that the leading atomic interactions should give rise to a universal linear relation (the so-called King's relation) between isotope shifts for different transitions. The Higgs and weak interactions should lead to a violation of King's relation. Given the many uncertainties plaguing calculations of atomic levels, it may still be difficult to ever claim a detection of the Higgs force. More realistically, one can try to set limits on the Higgs couplings to light fermions which will be better than the current collider limits.
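King's relation is easy to illustrate: if each transition's isotope shift is a mass-shift term plus a field-shift term, then the "modified" shifts (divided by the mass factor) of two transitions are automatically collinear across isotope pairs. A toy numerical check, with all coefficients invented for illustration rather than taken from real atomic data:

```python
# King's linearity in a toy model. For transition i and isotope pair a,
# the leading-order shift is d_nu[i][a] = K[i]*mu[a] + F[i]*dr2[a]
# (mass shift + field shift). Dividing by mu[a] puts the two transitions
# on a straight line, whatever the charge radii dr2 are.
K = [3.0, -1.2]          # mass-shift coefficients (arbitrary units)
F = [0.8, 2.5]           # field-shift coefficients (arbitrary units)
mu = [1.0, 0.7, 0.55]    # inverse-mass factors for three isotope pairs
dr2 = [0.3, 1.1, -0.4]   # changes in mean-square charge radius

modified = [[(K[i] * m + F[i] * d) / m for m, d in zip(mu, dr2)]
            for i in range(2)]
x, y = modified
s01 = (y[1] - y[0]) / (x[1] - x[0])  # slope from pairs 0 and 1
s12 = (y[2] - y[1]) / (x[2] - x[1])  # slope from pairs 1 and 2
print(abs(s01 - s12) < 1e-9)  # True: the three points are collinear
```

A Higgs or weak contribution with a different isotope dependence would pull the points off this line, which is the proposed signature.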

Atomic spectroscopy is way above my head, so I cannot judge if the proposal is realistic. There are a few practical issues to resolve before the Higgs force is mastered into a lightsaber. However, it is possible that a new front to study the Higgs boson will be opened in the near future. These studies will provide information about the Higgs couplings to light Standard Model fermions, which is complementary to the information obtained from collider searches.

Resonaances750 ways to leave your lover

A new paper last week straightens out the story of the diphoton background in ATLAS. Some confusion was created because theorists misinterpreted the procedures described in the ATLAS conference note, which could lead to a different estimate of the significance of the 750 GeV excess. However, once the correct phenomenological and statistical approach is adopted, the significance quoted by ATLAS can be reproduced, up to small differences due to incomplete information available in public documents. Anyway, now that this is all behind, we can safely continue being excited at least until summer.  Today I want to discuss different interpretations of the diphoton bump observed by ATLAS. I will take a purely phenomenological point of view, leaving for the next time  the question of a bigger picture that the resonance may fit into.

Phenomenologically, the most straightforward interpretation is the so-called everyone's model: a 750 GeV singlet scalar particle produced in gluon fusion and decaying to photons via loops of new vector-like quarks. This simple construction perfectly explains all publicly available data, and can be easily embedded in more sophisticated models. Nevertheless, many more possibilities were pointed out in the 750 papers so far, and here I review a few that I find most interesting.

Spin Zero or More?  
For a particle decaying to two photons there are not that many possibilities: the resonance has to be a boson and, by the Landau-Yang theorem, it cannot have spin 1. This leaves on the table spin 0, 2, or higher. Spin-2 is an interesting hypothesis, as this kind of excitation is predicted in popular models like the Randall-Sundrum one. Higher-than-two spins are disfavored theoretically. When more data is collected, the spin of the 750 GeV resonance can be tested by looking at the angular distribution of the photons. The rumor is that the data so far somewhat favor spin-2 over spin-0, although the statistics are certainly insufficient for any serious conclusions.  Concerning the parity, it is practically impossible to determine it by studying the diphoton final state, and both the scalar and the pseudoscalar option are equally viable at present. Discrimination may be possible in the future, but only if multi-body decay modes of the resonance are discovered. If the true final state is more complicated than two photons (see below), then the 750 GeV resonance may have any spin, including spin-1 and spin-1/2.

Narrow or Wide? 
The total width is the inverse of the particle's lifetime (in our funny units). From the experimental point of view, a width larger than the detector's energy resolution will show up as a smearing of the resonance due to the uncertainty principle. Currently, the ATLAS run-2 data prefer a width 10 times larger than the experimental resolution (which is about 5 GeV in this energy ballpark), although the preference is not very strong in the statistical sense. On the other hand, from the theoretical point of view, it is much easier to construct models where the 750 GeV resonance is a narrow particle. Therefore, confirmation of the large width would have profound consequences, as it would significantly narrow down the scope of viable models.  The most exciting interpretation would then be that the resonance is a portal to a dark sector containing new light particles very weakly coupled to ordinary matter.
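The width-lifetime conversion in "our funny units" is tau = hbar/Gamma; a quick sketch for the detector resolution and for the 10-times-wider value the ATLAS data prefer:

```python
# Width <-> lifetime conversion: tau = hbar / Gamma.
hbar = 6.582e-25  # GeV * s
for gamma in (5.0, 50.0):  # detector resolution vs the ~10x-wider preference
    print(f"Gamma = {gamma:4} GeV  ->  tau = {hbar / gamma:.1e} s")
```

A 50 GeV width corresponds to a lifetime of order 10^-26 s, i.e. the particle decays essentially at the interaction point either way; the width matters experimentally only through the line shape.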

How many resonances?  
One resonance is enough, but a family of resonances tightly packed around 750 GeV may also explain the data. As a bonus, this could explain the seemingly large width without opening new dangerous decay channels. It is quite natural for particles to come in multiplets with similar masses: our pion is an example, where the small mass splitting between π± and π0 arises due to electromagnetic quantum corrections. For Higgs-like multiplets the small splitting may naturally arise after electroweak symmetry breaking, and the familiar 2-Higgs-doublet model offers a simple realization. If the mass splitting of the multiplet is larger than the experimental resolution, this possibility can be tested by precisely measuring the profile of the resonance and searching for a departure from the Breit-Wigner shape. On the other side of the spectrum is the idea that there is no resonance at all at 750 GeV, but rather at another mass, and the bump at 750 GeV appears due to some kinematical accident.
Who made it? 
The most plausible production process is definitely gluon-gluon fusion. Production in collisions of light quarks and antiquarks is also theoretically sound; however, it leads to a more acute tension between run-2 and run-1 data. Indeed, even for gluon fusion, the production cross section of a 750 GeV resonance in 13 TeV proton collisions is only 5 times larger than at 8 TeV. Given the larger amount of data collected in run-1, we would expect a similar excess there, contrary to observations. For a resonance produced from u-ubar or d-dbar the analogous ratio is only 2.5 (see the table), leading to much more tension. The ratio climbs back to 5 if the initial state contains the heavier quarks: strange, charm, or bottom (which can also be found sometimes inside a proton); however, I haven't yet seen a neat model that makes use of that. Another possibility is to produce the resonance via photon-photon collisions. This way one could cook up a truly minimal and very predictive model where, among all the Standard Model particles, the resonance couples only to photons. However, in this case the ratio between the 13 and 8 TeV cross sections is very unfavorable, merely a factor of 2, and the run-1 vs run-2 tension comes back with more force. More options open up when associated production (e.g. with t-tbar, or in vector boson fusion) is considered. The problem with these ideas is that, according to what was revealed during the talk last December, there aren't any additional energetic particles in the diphoton events. Similar problems face models where the 750 GeV resonance appears as a decay product of a heavier resonance, although in this case some clever engineering or fine-tuning may help to hide the additional particles from experimentalists' eyes.
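The per-channel tension above is simple luminosity arithmetic; here is a sketch using the post's rough 13-to-8 TeV cross-section gains (all inputs are the post's estimates, not computed parton luminosities):

```python
# Expected run-2 / run-1 signal-event ratio for ATLAS by initial state,
# using the post's rough cross-section gains sigma(13 TeV)/sigma(8 TeV).
gain = {"gluon-gluon": 5.0, "u/d quark-antiquark": 2.5, "photon-photon": 2.0}
lumi_8, lumi_13 = 20.0, 3.2   # integrated luminosities in fb^-1
for channel, g in gain.items():
    print(f"{channel:22s} {g * lumi_13 / lumi_8:.2f}")
# The smaller the ratio, the more events run-1 should have seen relative
# to run-2, and the worse the tension with the absence of a run-1 bump.
```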

Two-body or more?
While a simple two-body decay of the resonance into two photons is a perfectly plausible explanation of all existing data, a number of interesting alternatives have been suggested. For example, the decay could be 3-body, with another soft visible or invisible particle accompanying the two photons. If the masses of all particles involved are chosen appropriately, the invariant mass spectrum of the diphoton remains sharply peaked. At the same time, a broadening of the diphoton energy due to the 3-body kinematics may explain why the resonance appears wide in ATLAS. Another possibility is a cascade decay into 4 photons. If the intermediate particles are very light, then the pairs of photons from their decays are highly collimated and may look like a single photon in the detector.
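How light must the intermediate particles be for their photon pairs to merge in the detector? A standard boosted-decay estimate gives a typical opening angle of roughly 2 m_a / E_a; a sketch with invented masses for illustration:

```python
# Typical opening angle of the photon pair from a boosted a -> 2 photons,
# theta ~ 2 * m_a / E_a for m_a << E_a. Masses below are illustrative only.
E_a = 375.0  # GeV: each 'a' carries half the 750 GeV resonance energy
for m_a in (10.0, 1.0, 0.1):  # GeV, hypothetical light-particle masses
    print(f"m_a = {m_a:4} GeV  ->  opening angle ~ {2 * m_a / E_a:.1e} rad")
```

For sub-GeV masses the angle drops below the milliradian scale, comfortably smaller than typical electromagnetic-calorimeter granularity, so the pair is reconstructed as a single photon.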
 ♬ The problem is all inside your head ♬ and the possibilities are endless. The situation is completely different from the one during the discovery of the Higgs boson, when one strongly favored hypothesis was tested against more exotic ideas. Of course, the first and foremost question is whether the excess is really new physics, or just a nasty statistical fluctuation. But if it is confirmed, the next crucial task for experimentalists will be to establish the nature of the resonance and get model builders on the right track. ♬ The answer is easy if you take it logically ♬

All ideas discussed above appeared in recent articles by various authors addressing the 750 GeV excess. If I were to include all references the post would be just one giant hyperlink, so you need to browse the literature yourself to find the original references.

ResonaancesApril Fools' 16: Was LIGO a hack?

This post is an April Fools' joke. LIGO's gravitational waves are for real. At least I hope so ;) 

We have recently had a few scientific embarrassments, where a big discovery announced with great fanfare was subsequently overturned by new evidence.  We still remember OPERA's faster-than-light neutrinos, which turned out to be a loose cable, or BICEP's gravitational waves from inflation, which turned out to be galactic dust emission... It seems that another such embarrassment is coming our way: the recent LIGO discovery of gravitational waves emitted in a black hole merger may share a similar fate. There are reasons to believe that the experiment was hacked, and the signal was injected by a prankster.

From the beginning, one reason to be skeptical about LIGO's discovery was that the signal seemed too beautiful to be true. Indeed, the experimental curve looked as if taken out of a textbook on general relativity, with a clearly visible chirp signal from the inspiral phase, followed by a ringdown signal when the merged black hole relaxes to the Kerr state. The reason may be that it *is* taken out of a textbook.  This is at least what is strongly suggested by recent developments.

On EvilZone, a well-known hackers' forum, a hacker using the nickname Madhatter was boasting that it was possible to tamper with scientific instruments, including the LHC, the Fermi satellite, and the LIGO interferometer.  When challenged, he or she uploaded a piece of code that allows one to access LIGO computers. Apparently, the hacker took advantage of the same backdoor that allows selected members of the LIGO team to inject a fake signal in order to test the analysis chain.  This was brought to the attention of the collaboration members, who decided to test the code. To everyone's bewilderment, the effect was to reproduce exactly the same signal in the LIGO apparatus as the one observed in September last year!

Even though the traces of a hack cannot be discovered, there is little doubt now that there was foul play involved. It is not clear what the hacker's motive was: was it just a prank, or maybe an elaborate plan to discredit the scientists? What is even more worrying is that the same thing could happen in other experiments. The rumor is that the ATLAS and CMS collaborations are already checking whether the 750 GeV diphoton resonance signal could also have been injected by a hacker.

Scott AaronsonMy Quora session

Here it is.  Enjoy!  (But sorry, no new questions right now.)

Chad OrzelDivision of Labor Is a Good Thing for Science and Skepticism

Noted grouchy person John Horgan has found a new way to get people mad at him on the Internet, via a speech-turned-blog-post taking organized Skeptic groups to task for mostly going after “soft targets”. This has generated lots of angry blog posts in response, and a far greater number of people sighing heavily and saying “There Horgan goes again…”

If you want to read only one counter to Horgan’s piece to get caught up, you could do a lot worse than reading Daniel Loxton’s calm and measured response. Loxton correctly notes that Horgan’s comments are nothing especially unique, just a variant of an argument that you find everywhere:

I’ve spent much of my career confronting the common argument that skeptics should not perform the service skeptics do best, but instead tackle other subjects we may not be qualified to address. It’s a head scratcher, honestly. “You have specialized expertise in X, but I think X is trivial. Why don’t you specialize in Y, because I think Y is important?” Nobody ever says this to Shakespeare scholars or doctors or plumbers. (“Dear ‘fire fighters,’ fight fires less and solve more murders”?) Seemingly everyone says it to skeptics.

There are only two minor points where I disagree with Loxton. One is the claim that this is primarily deployed only against skeptics, because the general tactic is everywhere. I get occasional comments and emails of the form “Why are you wasting time writing about arcane quantum physics when climate change is so much more important?” The endless arguments defending “the humanities” in academia are another version of the same basic thing– “Why should students study English lit when computer coding is so much more important?” And there’s even a sense in which much of the Democratic primary campaign has been dominated by this sort of thing– the arguments between Bernie Sanders supporters and Black Lives Matter activists, for example, basically boil down to each side thinking that the other is too focused on an issue that is not as important as their own primary concern.

So, skeptics have a lot of company in fending off “Your issue is trivial, you should spend more time on what I find most important.”

The other tiny disagreement I have is that I would slightly expand the qualifications justifying a decision to work on X rather than Y. That is, I don’t think it’s just a matter of specialized knowledge, but also a question of temperament. I don’t spend a whole lot of time battling quantum kookery– a rich source of targets both hard and soft– not because I lack specialized knowledge, but because I don’t have the right sort of personality to be good at it.

It’s not that I’m not bothered by charlatans trying to profit from misrepresentations of physics– on the contrary, I’m a little too bothered by it. I do occasionally write about this sort of thing, but it’s very difficult for me to do it without becoming snide. It’s sort of cathartic to vent about on occasion, but mostly not particularly productive– when I go back to stuff that I write in that mode, I generally don’t like the way I sound.

And it’s absolutely not in any way sustainable for me. One of the most notable things about the skeptical fight is that it’s never-ending. No debunking of Bigfoot, or Ancient Aliens, or quantum crackpottery is ever definitive– the folks on the other side always come back for more. There are two ways to deal with this: you either draw from a bottomless well of righteous indignation, a la Orac, or have a similarly deep reservoir of patience, as Loxton seems to.

I can’t really do either of those. I can be patient long enough to give a reasonably gracious reply to the nutty questions I get after public lectures, but that’s exhausted pretty quickly. And while I can get angry about this stuff at times, I can’t keep it up long enough to sustain me through the fifteenth round of the same stupid shit. I burn out, and that leads nowhere good.

Don’t get me wrong– I’m not saying this to disparage Loxton or Orac or any of the other folks out there fighting the good fight. What they do is good and valuable, and I’m glad they’re doing it. I’m also glad that I don’t have to do it, because I just don’t have the temperament.

But in the end, that’s the fundamental problem with Horgan’s provocation, and the similar arguments deployed by advocates of every Cause Y confronted with people who work on Issue X. It’s not necessarily the case that someone who does good work on X will be well suited to help with Y. There’s specialized knowledge involved in any of these issues, but also questions of personality and inclination. I’d do a lousy job of fighting kooks even within my field of expertise, let alone some other kind of “more important” political activism, because I don’t have the personality for it.

At bottom, this is just the classic problem of specialization and division of labor in economics. Different people are good at different things, and making people do things they’re not suited to will get you sub-optimal results. The best course is to have everyone work on the things they’re good at: Orac does rage, Loxton does patience, I do “Hey, isn’t quantum physics cool?” And Horgan pokes anthills with sticks.

This can be really hard to remember, especially when you’re passionately attached to a particular thing. God knows, I do my share of grumbling about the overemphasis on particle physics and lack of attention for atomic and condensed-matter physics. But it’s important to try to maintain perspective and recognize that just because you think Y is the most important thing in the world doesn’t mean that the world would be improved by making people who are good at X work on Y instead.

BackreactionThe Holy Grail of Crackpot Filtering: How the arXiv decides what’s science – and what’s not.

Where do we draw the boundary between science and pseudoscience? It’s a question philosophers have debated for as long as there’s been science – and last time I looked they hadn’t made much progress. When you ask a sociologist, their answer is normally a variant of: Science is what scientists do. So what do scientists do?

You might have heard that scientists use what’s called the scientific method, a virtuous cycle of generating and testing hypotheses which supposedly separates the good ideas from the bad ones. But that’s only part of the story because it doesn’t tell you where the hypotheses come from to begin with.

Science doesn’t operate with randomly generated hypotheses for the same reason natural selection doesn’t work with randomly generated genetic codes: it would be highly inefficient and any attempt to optimize the outcome would be doomed to fail. What we do instead is heavily filter hypotheses, and then consider only those which are small mutations of ideas that have previously worked. Scientists like to be surprised, but not too much.

Indeed, if you look at the scientific enterprise today, almost all of its institutionalized procedures are methods not for testing hypotheses, but for filtering hypotheses: Degrees, peer reviews, scientific guidelines, reproduction studies, measures for statistical significance, and community quality standards. Even the use of personal recommendations works to that end. In theoretical physics in particular the prevailing quality standard is that theories need to be formulated in mathematical terms. All these are requirements which have evolved over the last two centuries – and they have proved to work very well. It’s only smart to use them.

But the business of hypotheses filtering is a tricky one and it doesn’t proceed by written rules. It is a method that has developed through social demarcation, and as such it has its pitfalls. Humans are prone to social biases and every once in a while an idea gets dismissed not because it’s bad, but because it lacks community support. And there is no telling how often this happens because these are the stories we never get to hear.

It isn’t news that scientists lock shoulders to defend their territory and use technical terms like fraternities use secret handshakes. It thus shouldn’t come as a surprise that an electronic archive which caters to the scientific community would develop software to emulate the community’s filters. And that, in a nutshell, is what the arXiv is doing.

In an interesting recent paper, Luis Reyes-Galindo had a look at the arXiv moderators and their reliance on automated filters:

In the attempt to develop an algorithm that would sort papers into arXiv categories automatically, thereby helping arXiv moderators decide when a submission needs to be reclassified, it turned out that papers which scientists would mark down as “crackpottery” showed up as not classifiable, or stood out by language significantly different from that in the published literature. According to Paul Ginsparg, who developed the arXiv more than 20 years ago:
“The first thing I noticed was that every once in a while the classifier would spit something out as ‘I don't know what category this is’ and you’d look at it and it would be what we’re calling this fringe stuff. That quite surprised me. How can this classifier that was tuned to figure out category be seemingly detecting quality?

“[Outliers] also show up in the stop word distribution, even if the stop words are just catching the style and not the content! They’re writing in a style which is deviating, in a way. [...]

“What it’s saying is that people who go through a certain training and who read these articles and who write these articles learn to write in a very specific language. This language, this mode of writing and the frequency with which they use terms and in conjunctions and all of the rest is very characteristic to people who have a certain training. The people from outside that community are just not emulating that. They don’t come from the same training and so this thing shows up in ways you wouldn’t necessarily guess. They’re combining two willy-nilly subjects from different fields and so that gets spit out.”
It doesn’t surprise me much – you can see this happening in comment sections all over the place: The “insiders” can immediately tell who is an “outsider.” Often it doesn’t take more than a sentence or two, an odd expression, a term used in the wrong context, a phrase that nobody in the field would ever use. It is only consequential that with smart software you can tell insiders from outsiders even more efficiently than humans. According to Ginsparg:
“We've actually had submissions to arXiv that are not spotted by the moderators but are spotted by the automated programme [...] All I was trying to do is build a simple text classifier and inadvertently I built what I call The Holy Grail of Crackpot Filtering.”
Trying to speak in the code of a group you haven’t been part of for at least some time is pretty much impossible, much like it’s impossible to fake the accent of a city you haven’t lived in for a while. Such in-group and out-group demarcation is the subject of much study in sociology, not specifically the sociology of science, but generally. Scientists are human and of course in-group and out-group behavior also shapes their profession, even though they like to deny it as if they were superhuman think-machines.
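The mechanics of such a stylistic filter are simple enough to sketch. The toy classifier below is entirely my own construction, not the actual arXiv code; the stop-word list, the tiny “insider” corpus, and the similarity threshold are all invented for illustration. It compares the stop-word frequency profile of a new text against the centroid of the insider corpus and flags texts whose style is too far away.

```python
# Toy sketch of stylistic outlier detection via stop-word frequencies.
# NOT the arXiv classifier; corpus, word list and threshold are invented.
import math
from collections import Counter

STOP_WORDS = ["the", "of", "a", "we", "that", "is", "to", "and", "in", "for"]

def stopword_profile(text):
    """Relative frequency of each stop word, as a fixed-length vector."""
    words = text.lower().split()
    counts = Counter(w for w in words if w in STOP_WORDS)
    total = max(sum(counts.values()), 1)
    return [counts[w] / total for w in STOP_WORDS]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(profiles):
    n = len(profiles)
    return [sum(p[i] for p in profiles) / n for i in range(len(profiles[0]))]

insiders = [
    "we show that the spectrum of the operator is bounded and that the "
    "estimate holds for a dense subspace of the domain",
    "the proof of the lemma is a consequence of the fact that the norm "
    "is equivalent to the supremum over a compact set",
    "we prove that the solution to the equation is unique and that it "
    "depends continuously on the initial data in the weak topology",
]
reference = centroid([stopword_profile(t) for t in insiders])

def looks_like_outsider(text, threshold=0.8):
    """Flag a text whose stop-word profile is far from the insider centroid."""
    return cosine(stopword_profile(text), reference) < threshold
```

The point of using only stop words is exactly the one Ginsparg makes: the filter never looks at content, only at the invisible scaffolding of the writing, and that alone separates the trained style from everything else.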

What is interesting about this paper is that, for the first time, it openly discusses how the process of filtering happens. It’s software that literally encodes the hidden rules that physicists use to sort out cranks. From what I can tell, the arXiv filters work reasonably well, otherwise there would be much complaint in the community. Indeed, the vast majority of researchers in the field are quite satisfied with what the arXiv is doing, meaning the arXiv filters match their own judgement.

There are exceptions of course. I have heard some stories of people who were working on new approaches that fell between two stools and were flagged as potential crackpottery. The cases that I know of could eventually be resolved, but that might tell you more about the people I know than about the way such issues typically end.

Personally, I have never had a problem with the arXiv moderation. I had a paper reclassified from gen-ph to gr-qc once by a well-meaning moderator, which is how I learned that gen-ph is the dump for borderline crackpottery. (How would I have known? I don’t read gen-ph. I was just assuming someone reads it.)

I don’t so much have an issue with what gets filtered on the arXiv; what bothers me much more is what does not get filtered and hence, implicitly, gets approval by the community. I am very sympathetic to the concerns of John The-End-Of-Science Horgan that scientists don’t do enough to clean their own doorsteps. There is no “invisible hand” that corrects scientists if they go astray. We have to do this ourselves. In-group behavior can greatly misdirect science because, given sufficiently many people, even fruitless research can become self-supportive. No filter that is derived from the community’s own judgement will do anything about this.

It’s about time that scientists start paying attention to social behavior in their community. It can, and sometimes does, affect objective judgement. Ignoring or flagging what doesn’t fit into pre-existing categories is one such social problem that can stand in the way of progress.

In a 2013 paper published in Science, a group of researchers quantified the likelihood of combinations of topics in citation lists and studied the cross-correlation with the probability of the paper becoming a “hit” (meaning in the upper 5th percentile of citation scores). They found that having previously unlikely combinations in the quoted literature is positively correlated with the later impact of a paper. They also note that the fraction of papers with such ‘unconventional’ combinations has decreased from 3.54% in the 1980s to 2.67% in the 1990s, “indicating a persistent and prominent tendency for high conventionality.”

Conventional science isn’t bad science. But we also need unconventional science, and we should be careful to not assign the label “crackpottery” too quickly. If science is what scientists do, scientists should pay some attention to the science of what they do.

n-Category Café The HoTT Effect

Martin-Löf type theory has been around for years, as have category theory, topos theory and homotopy theory. Bundle them all together within the package of homotopy type theory, and philosophy suddenly takes a lot more interest.

If you’re looking for places to go to hear about this new interest, you are spoilt for choice:

For an event which delves back also to pre-HoTT days, try my

CFA: Foundations of Mathematical Structuralism

12-14 October 2016, Munich Center for Mathematical Philosophy, LMU Munich

In the course of the last century, different general frameworks for the foundations of mathematics have been investigated. The orthodox approach to foundations interprets mathematics in the universe of sets. More recently, however, there have been other developments that call into question the whole method of set theory as a foundational discipline. Category-theoretic methods that focus on structural relationships and structure-preserving mappings between mathematical objects, rather than on the objects themselves, have been in play since the early 1960s. But in the last few years they have found clarification and expression through the development of homotopy type theory. This represents a fascinating development in the philosophy of mathematics, where category-theoretic structural methods are combined with type theory to produce a foundation that accounts for the structural aspects of mathematical practice. We are now at a point where the notion of mathematical structure can be elucidated more clearly and its role in the foundations of mathematics can be explored more fruitfully.

The main objective of the conference is to reevaluate the different perspectives on mathematical structuralism in the foundations of mathematics and in mathematical practice. To do this, the conference will explore the following research questions: Does mathematical structuralism offer a philosophically viable foundation for modern mathematics? What role do key notions such as structural abstraction, invariance, dependence, or structural identity play in the different theories of structuralism? To what degree does mathematical structuralism as a philosophical position describe actual mathematical practice? Does category theory or homotopy type theory provide a fully structural account for mathematics?

Confirmed Speakers:

  • Prof. Steve Awodey (Carnegie Mellon University)
  • Dr. Jessica Carter (University of Southern Denmark)
  • Prof. Gerhard Heinzmann (Université de Lorraine)
  • Prof. Geoffrey Hellman (University of Minnesota)
  • Prof. James Ladyman (University of Bristol)
  • Prof. Elaine Landry (UC Davis)
  • Prof. Hannes Leitgeb (LMU Munich)
  • Dr. Mary Leng (University of York)
  • Prof. Øystein Linnebo (University of Oslo)
  • Prof. Erich Reck (UC Riverside)

Call for Abstracts:

We invite the submission of abstracts on topics related to mathematical structuralism for presentation at the conference. Abstracts should include a title, a brief abstract (up to 100 words), and a full abstract (up to 1000 words), blinded for peer review. Authors should send their abstracts (in pdf format), together with their name, institutional affiliation and current position to We will select up to five submissions for presentation at the conference. The conference language is English.

Dates and Deadlines:

  • Submission deadline: 30 June, 2016
  • Notification of acceptance: 31 July, 2016
  • Registration deadline: 1 October, 2016
  • Conference: 12 - 14 October, 2016

Tim GowersReflections on the recent solution of the cap-set problem I

Sometimes blog posts about recent breakthroughs can be useful because they convey the main ideas of a proof without getting bogged down in the technical details. But the recent solution of the cap-set problem by Jordan Ellenberg, and independently and fractionally later by Dion Gijswijt, both making crucial use of an amazing lemma of Croot, Lev and Pach that was made public a week or so before, does not really invite that kind of post, since the papers are so short, and the ideas so transparent, that it’s hard to know how a blog post can explain them more clearly.

But as I’ve got a history with this problem, including posting about it on this blog in the past, I feel I can’t just not react. So in this post and a subsequent one (or ones) I want to do three things. The first is just to try to describe my own personal reaction to these events. The second is more mathematically interesting. As regular readers of this blog will know, I have a strong interest in the question of where mathematical ideas come from, and a strong conviction that they always result from a fairly systematic process — and that the opposite impression, that some ideas are incredible bolts from the blue that require “genius” or “sudden inspiration” to find, is an illusion that results from the way mathematicians present their proofs after they have discovered them.

From time to time an argument comes along that appears to present a stiff challenge to my view. The solution to the cap-set problem is a very good example: it’s easy to understand the proof, but the argument has a magic quality that leaves one wondering how on earth anybody thought of it. I’m referring particularly to the Croot-Lev-Pach lemma here. I don’t pretend to have a complete account of how the idea might have been discovered (if any of Ernie, Seva or Peter, or indeed anybody else, want to comment about this here, that would be extremely welcome), but I have some remarks.

The third thing I’d like to do reflects another interest of mine, which is avoiding duplication of effort. I’ve spent a little time thinking about whether there is a cheap way of getting a Behrend-type bound for Roth’s theorem out of these ideas (and I’m not the only one). Although I wasn’t expecting the answer to be yes, I think there is some value in publicizing some of the dead ends I’ve come across. Maybe it will save others from exploring them, or maybe, just maybe, it will stimulate somebody to find a way past the barriers that seem to be there.

Personal reflections

There’s not actually all that much to say here. I just wanted to comment on a phenomenon that’s part of mathematical life: the feeling of ambivalence one has when a favourite problem is solved by someone else. The existence of such a feeling is hardly a surprise, but slightly more interesting are the conditions that make it more or less painful. For me, an extreme example where it was not at all painful was Wiles’s solution of Fermat’s Last Theorem. I was in completely the wrong area of mathematics to have a hope of solving that problem, so although I had been fascinated by it since boyhood, I could nevertheless celebrate in an uncomplicated way the fact that it had been solved in my lifetime, something that I hadn’t expected.

Towards the other end of the spectrum for me personally was Tom Sanders’s quasipolynomial version of the Bogolyubov-Ruzsa lemma (which was closely related to his bound for Roth’s theorem). That was a problem I had worked on very hard, and some of the ideas I had had were, it turned out, somewhat in the right direction. But Tom got things to work, with the help of further ideas that I had definitely not had, and by the time he solved the problem I had gone for several years without seriously working on it. So on balance, my excitement at the solution was a lot greater than the disappointment that that particular dream had died.

The cap-set problem was another of my favourite problems, and one I intended to return to. But here I feel oddly un-disappointed. The main reason is that I know that if I had started work on it again, I would have continued to try to push the Fourier methods that have been so thoroughly displaced by the Croot-Lev-Pach lemma, and would probably have got nowhere. So the discovery of this proof has saved me from wasting a lot of time at some point in the future. It’s also an incredible bonus that the proof is so short and easy to understand. I could almost feel my brain expanding as I read Jordan Ellenberg’s preprint and realized that here was a major new technique to add to the toolbox. Of course, the polynomial method is not new, but somehow this application of it, at least for me, feels like one where I can make some headway with understanding why it works, rather than just gasping in admiration at each new application and wondering how on earth anyone thought of it.

Why polynomials?

That brings me neatly on to the next theme of this post. From now on I shall assume familiarity with the argument as presented by Jordan Ellenberg, but here is a very brief recap.

The key to it is the lemma of Croot, Lev and Pach (very slightly modified), which states that if A\subset\mathbb F_3^n and P is a polynomial of degree d in variables x_1,\dots,x_n such that P(x+y)=0 for every pair of distinct elements x,y\in A, then P(2x) is non-zero for at most 2m_{d/2} values of x\in A, where m_{d/2} is the dimension of the space of polynomials in x_1,\dots,x_n of degree at most d/2.

Why does this help? Well, the monomials we consider are of the form x_1^{a_1}\dots x_n^{a_n} where each a_i\in\{0,1,2\}. The expected degree of a random such monomial is n, and for large n the degree is strongly concentrated about its mean. In particular, if we choose d=4n/3, then the probability that a random monomial has degree greater than d is exponentially small, and the probability that a random monomial has degree less than d/2 is also exponentially small.
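The degree-concentration claim is easy to verify concretely. The sketch below (n = 30 is an arbitrary illustrative choice of mine) counts the cube-free monomials of each total degree by a small convolution, and confirms that degree at most 2n/3 captures only a small fraction of the 3^n monomials while degree at most 4n/3 captures almost all of them.

```python
# Count monomials x_1^{a_1}...x_n^{a_n} with each a_i in {0,1,2} and total
# degree <= d, i.e. the dimension m_d of the corresponding polynomial space.
# Illustrates the concentration of degree around its mean n (here n = 30).

def degree_counts(n):
    """counts[k] = number of cube-free monomials in n variables of degree k."""
    counts = [1]
    for _ in range(n):
        new = [0] * (len(counts) + 2)
        for k, c in enumerate(counts):
            for step in (0, 1, 2):      # exponent of the newly added variable
                new[k + step] += c
        counts = new
    return counts

def m(n, d):
    """Dimension of the space of polynomials of degree at most d."""
    return sum(degree_counts(n)[: d + 1])

n = 30
total = 3 ** n                       # dimension of the space of all functions
low = m(n, (2 * n) // 3)             # degree <= d/2, with d = 4n/3
high = m(n, (4 * n) // 3)            # degree <= d

print(low / total, high / total)     # a tiny fraction vs. almost everything
```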

Therefore, the dimension of the space of polynomials of degree at most d (for this d) is at least 3^n(1-c^n), while the dimension of the space of polynomials of degree at most d/2 is at most 3^nc^n. Here c is some constant less than 1. It follows that if A is a set of density greater than c^n we can find a polynomial of degree d that vanishes everywhere on (2.A)^c and doesn’t vanish on 2.A. Furthermore, if A has density a bit bigger than this — say 6c^n, we can find a polynomial of degree d that vanishes on (2.A)^c and is non-zero at more than 2m_{d/2} points of 2.A. Therefore, by the lemma, it cannot vanish on all x+y with x,y distinct elements of A, which implies that there exist distinct x,y\in A such that x+y=2z for some z\in A.

Now let us think about the Croot-Lev-Pach lemma. It is proved by a linear algebra argument: we define a map \phi:\mathbb F_3^n\to V, where V is a certain vector space over \mathbb F_3 of dimension k=2m_{d/2}, and we also define a bilinear form \langle .,.\rangle on V, with the property that P(x+y)=\langle\phi(x),\phi(y)\rangle for every x,y\in\mathbb F_3^n. Then the conditions on P translate into the condition that \langle\phi(x),\phi(y)\rangle=0 for all distinct x,y\in A. But if P is non-zero at more than k points in 2.A, that gives us x_1,\dots,x_{k+1} such that \langle \phi(x_i),\phi(x_j)\rangle=0 if and only if i\ne j, which implies that \phi(x_1),\dots,\phi(x_{k+1}) are linearly independent, which they can’t be as they all live in the k-dimensional space V.
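The rank bound implicit in this argument can be checked by brute force for tiny parameters. In the sketch below the choices are mine: n = 3 variables and d = 2, so that m_{d/2} = m_1 = 4. It builds the 27-by-27 matrix F(x,y) = P(x+y) over F_3 for several random polynomials P of degree at most d and confirms that the rank never exceeds 2m_{d/2} = 8.

```python
# Check the rank bound rank F <= 2*m_{d/2} for F(x,y) = P(x+y) over F_3,
# with n = 3 variables and deg P <= d = 2 (so m_{d/2} = m_1 = 4, bound 8).
import itertools
import random

n, d, p = 3, 2, 3
points = list(itertools.product(range(p), repeat=n))  # all 27 points of F_3^3

# Monomials with exponents in {0,1,2} and total degree <= d.
monomials = [e for e in itertools.product(range(p), repeat=n) if sum(e) <= d]

def evaluate(coeffs, x):
    """Evaluate sum_e coeffs[e] * x^e at the point x, mod 3."""
    total = 0
    for e, c in coeffs.items():
        term = c
        for xi, ei in zip(x, e):
            term = term * pow(xi, ei, p)
        total += term
    return total % p

def rank_mod_p(rows):
    """Rank of a matrix over F_p by Gaussian elimination."""
    rows = [r[:] for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] % p), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], p - 2, p)          # inverse mod p
        rows[rank] = [v * inv % p for v in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col] % p:
                f = rows[i][col]
                rows[i] = [(v - f * w) % p for v, w in zip(rows[i], rows[rank])]
        rank += 1
    return rank

random.seed(0)
m_half = sum(1 for e in monomials if sum(e) <= d // 2)   # m_{d/2} = 4
for _ in range(5):
    P = {e: random.randrange(p) for e in monomials}
    F = [[evaluate(P, tuple((a + b) % p for a, b in zip(x, y))) for y in points]
         for x in points]
    assert rank_mod_p(F) <= 2 * m_half                   # the CLP rank bound
```

Since the rank of F equals the dimension of the span of the translates of P, this is the same bound that the discussion below derives directly.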

The crucial thing that makes this lemma useful is that we have a huge space of functions — of almost full dimension — each of which can be represented this way with a very small k.

The question I want to think about is the following. Suppose somebody had realized that they could bound the size of an AP-free set by finding an almost full-dimensional space of functions, each of which had a representation of the form f(x+y)=\langle\phi(x),\phi(y)\rangle, where \phi took values in a low-dimensional vector space V. How might they have come to realize that polynomials could do the job? Answering this question doesn’t solve the mystery of how the proof was discovered, since the above realization seems hard to come by: until you’ve seen it, the idea that almost all functions could be represented very efficiently like that seems somewhat implausible. But at least it’s a start.

Let’s turn the question round. Suppose we know that f has the property that f(x+y)=\langle\phi(x),\phi(y)\rangle for every x,y, with \phi taking values in a k-dimensional space. That is telling us that if we think of f as a matrix — that is, we write F(x,y) for f(x+y) — then that matrix has rank k. So we can ask the following question: given a matrix F(x,y) that happens to be of the special form f(x+y) (where the indexing variables x,y live in \mathbb F_3^n), under what circumstances can it possibly have low rank? That is, what about f makes F have low rank?

We can get some purchase on this question by thinking about how F operates as a linear map on functions defined on \mathbb F_3^n. Indeed, we have that if u is a function defined on \mathbb F_3^n (I’m being a bit vague for the moment about where u takes its values, though the eventual answer will be \mathbb F_3), then we have the formula Fu(x)=\sum_yf(x+y)u(y). Now F has rank k if and only if the functions of the form Fu form a k-dimensional subspace. Note that if u is the function \delta_z, we have that Fu(x)=f(x+z). Since every u is a linear combination of delta functions, we are requiring that the translates of f should span a subspace of dimension k. Of course, we’d settle for a lower dimension, so it’s perhaps more natural to say at most k. I won’t actually write that, but it should be understood that it’s what I basically mean.

What kinds of functions have the nice property that their translates span a low-dimensional subspace? And can we find a huge space of such functions?

The answer that occurs most naturally to me is that characters have this property: if \chi is a character, then every translate of \chi is a multiple of \chi, since \chi(x+z)=\chi(z)\chi(x). So if f is a linear combination of k characters, then its translates span a k-dimensional space. (So now, just to be explicit about it, my functions are taking values in \mathbb C.)

Moreover, the converse is true. What we are asking for is equivalent to asking for the convolutions of f with other functions to live in a k-dimensional subspace. If we take Fourier transforms, we now want the pointwise products of \hat f with other functions to live in a k-dimensional subspace. Well, that’s exactly saying that \hat f takes k non-zero values. Transforming back, that gives us that f needs to be a linear combination of k characters.
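This equivalence is easy to confirm numerically. The sketch below uses my own illustrative parameters (n = 3, and f built from k = 5 distinct characters with coefficients bounded away from zero) and checks that the matrix F(x,y) = f(x+y) has rank exactly k.

```python
# Over C, a combination of k characters of F_3^n gives a matrix
# F(x,y) = f(x+y) of rank exactly k: each character contributes a rank-one
# piece chi(x)chi(y), and distinct characters are orthogonal.
import itertools
import numpy as np

n, k = 3, 5
points = list(itertools.product(range(3), repeat=n))
omega = np.exp(2j * np.pi / 3)

def character(a):
    """The character x -> omega^{<a,x>} of F_3^n, as a vector over all points."""
    return np.array([omega ** (sum(ai * xi for ai, xi in zip(a, x)) % 3)
                     for x in points])

rng = np.random.default_rng(0)
freqs = [points[i] for i in rng.choice(len(points), size=k, replace=False)]
coeffs = rng.uniform(1, 2, size=k)          # nonzero coefficients

def f(z):
    return sum(c * omega ** (sum(ai * zi for ai, zi in zip(a, z)) % 3)
               for c, a in zip(coeffs, freqs))

F = np.array([[f(tuple((xi + yi) % 3 for xi, yi in zip(x, y)))
               for y in points] for x in points])
print(np.linalg.matrix_rank(F))             # equals k
```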

But that’s a bit of a disaster. If we want an r-dimensional space of functions such that each one is a linear combination of at most k characters, we cannot do better than to take r=3k/2. The proof is the same as one of the arguments in Ellenberg’s preprint: in an r-dimensional space there must be at least r active coordinates, and then a random element of the space is on average non-zero on at least 2r/3 of those.

So we have failed in our quest to make r exponentially close to 3^n and k exponentially close to zero.

But before we give up, shouldn’t we at least consider backtracking and trying again with a different field of scalars? The complex numbers didn’t work out for us, but there is one other choice that stands out as natural, namely \mathbb F_3.

So now we ask a question that’s exactly analogous to the question we asked earlier: what kinds of functions have the property that they and their translates generate a subspace of dimension k?

Let’s see whether the characters idea works here. Are there functions \phi:\mathbb F_3^n\to\mathbb F_3 with the property that \phi(x+y)=\phi(x)\phi(y)? No there aren’t, or at least not any interesting ones, since that would give us that \phi(0)=\phi(x+x+x)=\phi(x)^3 for every x, which implies that \phi is constant (and because \phi(x)^2=\phi(2x), that constant has to be 0 or 1).
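For n = 1 the claim can be checked exhaustively: among all 27 functions from F_3 to itself, only the constants 0 and 1 survive.

```python
# Exhaustive check for n = 1: the only functions phi: F_3 -> F_3 with
# phi(x+y) = phi(x)*phi(y) are the constants 0 and 1.
import itertools

multiplicative = [
    phi for phi in itertools.product(range(3), repeat=3)  # (phi(0), phi(1), phi(2))
    if all(phi[(x + y) % 3] == (phi[x] * phi[y]) % 3
           for x in range(3) for y in range(3))
]
print(multiplicative)   # [(0, 0, 0), (1, 1, 1)]
```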

OK, let’s ask a slightly different question. Is there some fairly small space of functions from \mathbb F_3^n to \mathbb F_3 that is closed under taking translates? That is, we would like that if f belongs to the space, then for each y the function x\mapsto f(x+y) also belongs to the space.

One obvious space of functions with this property is linear maps. There aren’t that many of these — just an n-dimensional space of them (or (n+1)-dimensional if we interpret “linear” in the polynomial sense rather than the vector-space sense) — sitting inside the 3^n-dimensional space of all functions from \mathbb F_3^n to \mathbb F_3.

It’s not much of a stretch to get from here to noticing that polynomials of degree at most d form another such space. For example, we might think, “What’s the simplest function I can think of that isn’t linear?” and we might then go for something like x\mapsto x_1^2. That and its translates generate the space of all quadratic polynomials that depend on x_1 only. Then we’d start to spot that there are several spaces of functions that are closed under translation. Given any monomial, it and its translates generate the space generated by all smaller monomials. So for example the monomial x_1^2x_2 and its translates generate the space of polynomials of the form ax_1^2x_2+bx_1x_2+cx_2+dx_1^2+ex_1+f. So any down-set of monomials defines a subspace that is closed under translation.

I think, but have not carefully checked, that these are in fact the only subspaces that are closed under translation. Let me try to explain why. Given any function f from \mathbb F_3^n to \mathbb F_3, it must be given by a polynomial P made out of cube-free monomials. That’s simply because the dimension of the space of such polynomials is 3^n. And I think that if you take any polynomial, then the subspace that it and its translates generate is generated by all the monomials that are less than a monomial that occurs in P with a non-zero coefficient.

Actually no, that’s false. If I take the polynomial x_1+\dots+x_n, then every translate of it is of the form x_1+\dots+x_n+C. So without thinking a bit more, I don’t have a characterization of the spaces of functions that are closed under translation. But we can at least say that polynomials give us a rich supply of them.

Is the rank miracle a miracle?

I’m starting this section a day after writing the sections above, and after a good night’s sleep I have clarified in my mind something I sort of knew already, as it’s essential to the whole argument, which is that the conjectures that briefly flitted across my mind two paragraphs ago and that turned out to be false absolutely had to be false. Their falsity is pretty much the whole point of what is going on. So let me come to that now.

Let me call a subspace closed if it is closed under translation. (Just to be completely explicit about this, by “translation” I am referring to operations of the form T_a, which take a function f to the function x\mapsto f(x-a).) Note that the sum of two closed subspaces is closed. Therefore, if we want to find out what closed subspaces are like, we could do a lot worse than thinking about the closed subspaces generated by a single function, which it now seems good to think of as a polynomial.

Unfortunately, it’s not easy to illustrate what I’m about to say with a simple example, because simple examples tend to be too small for the phenomenon to manifest itself. So let us argue in full generality. Let P be a polynomial of degree at most d. We would like to understand the rank of the matrix F(x,y)=P(x+y), which is equal to the dimension of the closed subspace generated by P, or equivalently the subspace generated by all functions of the form P(x+a).

At first sight it looks as though this subspace could contain pretty well all linear combinations of monomials that are dominated by monomials that occur with non-zero coefficients in P. For example, consider the 2-variable polynomial x^2y. In this case we are trying to work out the dimension of the space spanned by the polynomials


(x+a)^2(y+b)=x^2y+bx^2+2axy+2abx+a^2y+a^2b.

These live in the space spanned by six monomials, so we’d like to know whether the vectors of the form (1,b,2a,2ab,a^2,a^2b) span the whole of \mathbb F_3^6 or just some proper subspace. Setting a=0 we see that we can generate the standard basis vectors e_1 and e_2. Setting b=0 it’s not hard to see that we can also get e_3 and e_5. And setting b=1 we see that we can get the fourth and sixth coordinates to be any pair we like. So these do indeed span the full space. Thus, in this particular case one of my false conjectures from earlier happens to be true.
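A quick brute-force confirmation of this example (the elimination helper is my own, and uses the fact that every nonzero element of F_3 is its own inverse):

```python
# Coefficient vectors of the translates (x+a)^2 (y+b) of x^2 y, in the basis
# (x^2 y, x^2, x y, x, y, 1): the nine vectors (1, b, 2a, 2ab, a^2, a^2 b)
# span all of F_3^6.
import itertools

def rank_mod_3(rows):
    """Rank over F_3 by Gaussian elimination."""
    rows = [[v % 3 for v in r] for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = rows[rank][col]              # 1 or 2, each its own inverse mod 3
        rows[rank] = [v * inv % 3 for v in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                f = rows[i][col]
                rows[i] = [(v - f * w) % 3 for v, w in zip(rows[i], rows[rank])]
        rank += 1
    return rank

vectors = [[1, b, 2 * a, 2 * a * b, a * a, a * a * b]
           for a, b in itertools.product(range(3), repeat=2)]
print(rank_mod_3(vectors))   # prints 6: the translates span the whole space
```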

Let’s see why it is false in general. The argument is basically repeating the proof of the Croot-Lev-Pach lemma, but using that proof to prove an equivalent statement (a bound for the rank of the closed subspace generated by P) rather than the precise statement they proved. (I’m not claiming that this is a radically different way of looking at things, but I find it slightly friendlier.)

Let P be a polynomial. One thing that’s pretty clear, and I think this is why I got slightly confused yesterday, is that for every monomial m_1 that’s dominated by a monomial m_2 that occurs non-trivially in P we can find some linear combination of translates P(x+a) such that m_1 occurs with a non-zero coefficient. So if we want to prove that these translates generate a low-dimensional space, we need to show that there are some heavy-duty linear dependences amongst these coefficients. And there are! Here’s how the proof goes. Suppose that P has degree at most d. Then we won’t worry at all about the coefficients of the monomials of degree at most d/2: sure, these generate a subspace of dimension m_{d/2} (that’s the definition of m_{d/2}, by the way), but unless d is very close to 2n, that’s going to be very small.

But what about the coefficients of the monomials of degree greater than d/2? This is where the linear dependences come in. Let x_1^{r_1}\dots x_n^{r_n} be such a monomial. What can we say about its coefficient in the polynomial P(x+a)? Well, if we expand out P(x+a) and write it as a linear combination of monomials, then the coefficient of x_1^{r_1}\dots x_n^{r_n} will work out as a gigantic polynomial in a_1,\dots,a_n. However, and this is the key point, this “gigantic” polynomial will have degree at most d/2. That is, for each such monomial x_1^{r_1}\dots x_n^{r_n}, we have a polynomial Q_{r_1,\dots,r_n} of degree at most d/2 such that Q_{r_1,\dots,r_n}(a) gives the coefficient of x_1^{r_1}\dots x_n^{r_n} in the polynomial P(x+a). But these polynomials all live in the m_{d/2}-dimensional space of polynomials of degree at most d/2, so we can find a spanning subset of them of size at most m_{d/2}. In other words, we can pick out at most m_{d/2} of the polynomials Q_{r_1,\dots,r_n}, and all the rest are linear combinations of those ones. This is the huge linear dependence we wanted, and it shows that the projection of the closed subspace generated by P to the monomials of degree greater than d/2 has dimension at most m_{d/2}.

So in total we get that P and its translates span a space of dimension at most 2m_{d/2}, which for suitable d is much much smaller than m_d. This is what I am referring to when I talk about a “rank miracle”.

Note that we could have phrased the entire discussion in terms of the rank of P(x+y). That is, we could have started with the thought that if f is a function defined on \mathbb F_3^n such that f(x+y)=0 whenever x,y are distinct elements of A, and f(2x)\ne 0 for at least m points x\in A, then the matrix F(x,y)=f(x+y) would have rank at least m, which is the same as saying that f and its translates span a space of dimension at least m. So then we would be on the lookout for a high-dimensional space of functions such that for each function f in the class, f and its translates span a much lower-dimensional space. That is what the polynomials give us, and we don’t have to mention a funny non-linear function \phi from \mathbb F_3^n to a vector space V.

I still haven’t answered the question of whether the rank miracle is a miracle. I actually don’t have a very good answer to this. In the abstract, it is a big surprise that there is a space of functions of dimension that’s exponentially close to the maximal dimension 3^n such that for every single function f in that space, the rank of the matrix F(x,y)=f(x+y) is exponentially small. (Here “exponentially small/close” means as a fraction of 3^n.) And yet, once one has seen the proof, it begins to feel like a fairly familiar concentration of measure argument: it isn’t a surprise that the polynomials of degree at most 4n/3 form a space of almost full dimension and the polynomials of degree at most 2n/3 form a space of tiny dimension. And it’s not completely surprising (again with hindsight) that because in the expansion of P(x+y) you can’t use more than half the degree for both x and y, there might be some way of arguing that the translates of P live in a subspace of dimension closer to m_{d/2}.

This post has got rather long, so this seems like a good place to cut it off. To be continued.

May 17, 2016

Terence TaoEquivalence of the logarithmically averaged Chowla and Sarnak conjectures

I’ve just uploaded to the arXiv my paper “Equivalence of the logarithmically averaged Chowla and Sarnak conjectures“, submitted to the Festschrift “Number Theory – Diophantine problems, uniform distribution and applications” in honour of Robert F. Tichy. This paper is a spinoff of my previous paper establishing a logarithmically averaged version of the Chowla (and Elliott) conjectures in the two-point case. In that paper, the estimate

\displaystyle  \sum_{n \leq x} \frac{\lambda(n) \lambda(n+h)}{n} = o( \log x )

as {x \rightarrow \infty} was demonstrated, where {h} was any positive integer and {\lambda} denoted the Liouville function. The proof proceeded using a method I call the “entropy decrement argument”, which ultimately reduced matters to establishing a bound of the form

\displaystyle  \sum_{n \leq x} \frac{|\sum_{h \leq H} \lambda(n+h) e( \alpha h)|}{n} = o( H \log x )

whenever {H} was a slowly growing function of {x}. This was in turn established in a previous paper of Matomaki, Radziwill, and myself, using the recent breakthrough of Matomaki and Radziwill.
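
Purely as a numerical illustration (my own sketch, not part of the paper; the sieve and the cutoff 10^5 are my choices), one can tabulate the Liouville function and watch the logarithmically averaged two-point sum stay far below {\log x}:

```python
from math import log

def liouville(N):
    # lam[n] = (-1)^Omega(n), via a smallest-prime-factor sieve
    spf = list(range(N + 1))
    for p in range(2, int(N ** 0.5) + 1):
        if spf[p] == p:  # p is prime
            for j in range(p * p, N + 1, p):
                if spf[j] == j:
                    spf[j] = p
    lam = [1] * (N + 1)
    for n in range(2, N + 1):
        lam[n] = -lam[n // spf[n]]
    return lam

x = 10 ** 5
lam = liouville(x + 1)
S = sum(lam[n] * lam[n + 1] / n for n in range(1, x + 1))
print(S, log(x))  # the sum stays well below log x
```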

It is natural to ask to what extent the arguments can be adapted to attack the higher-point cases of the logarithmically averaged Chowla conjecture (ignoring for this post the more general Elliott conjecture for other bounded multiplicative functions than the Liouville function). That is to say, one would like to prove that

\displaystyle  \sum_{n \leq x} \frac{\lambda(n+h_1) \dots \lambda(n+h_k)}{n} = o( \log x )

as {x \rightarrow \infty} for any fixed distinct integers {h_1,\dots,h_k}. As it turns out (and as is detailed in the current paper), the entropy decrement argument extends to this setting (after using some known facts about linear equations in primes), and allows one to reduce the above estimate to an estimate of the form

\displaystyle  \sum_{n \leq x} \frac{1}{n} \| \lambda \|_{U^d[n, n+H]} = o( \log x )

for {H} a slowly growing function of {x} and some fixed {d} (in fact we can take {d=k-1} for {k \geq 3}), where {U^d} is the (normalised) local Gowers uniformity norm. (In the case {k=3}, {d=2}, this becomes the Fourier-uniformity conjecture discussed in this previous post.) If one then applies the (now proven) inverse conjecture for the Gowers norms, this estimate is in turn equivalent to the more complicated-looking assertion

\displaystyle  \sum_{n \leq x} \frac{1}{n} \sup |\sum_{h \leq H} \lambda(n+h) F( g^h x )| = o( \log x ) \ \ \ \ \ (1)

where the supremum is over all possible choices of nilsequences {h \mapsto F(g^h x)} of controlled step and complexity (see the paper for definitions of these terms).

The main novelty in the paper (elaborating upon a previous comment I had made on this blog) is to observe that this latter estimate in turn follows from the logarithmically averaged form of Sarnak’s conjecture (discussed in this previous post), namely that

\displaystyle  \sum_{n \leq x} \frac{1}{n} \lambda(n) F( T^n x )= o( \log x )

whenever {n \mapsto F(T^n x)} is a zero entropy (i.e. deterministic) sequence. Morally speaking, this follows from the well-known fact that nilsequences have zero entropy, but the presence of the supremum in (1) means that we need a little bit more; roughly speaking, we need the class of nilsequences of a given step and complexity to have “uniformly zero entropy” in some sense.

On the other hand, it was already known (see previous post) that the Chowla conjecture implied the Sarnak conjecture, and similarly for the logarithmically averaged form of the two conjectures. Putting all these implications together, we obtain the pleasant fact that the logarithmically averaged Sarnak and Chowla conjectures are equivalent, which is the main result of the current paper. There have been a large number of special cases of the Sarnak conjecture worked out (when the deterministic sequence involved came from a special dynamical system), so these results can now also be viewed as partial progress towards the Chowla conjecture (at least with logarithmic averaging). However, my feeling is that the full resolution of these conjectures will not come from these sorts of special cases; instead, conjectures like the Fourier-uniformity conjecture in this previous post look more promising to attack.

It would also be nice to get rid of the pesky logarithmic averaging, but this seems to be an inherent requirement of the entropy decrement argument method, so one would probably have to find a way to avoid that argument if one were to remove the log averaging.

Filed under: math.NT, paper Tagged: Chowla conjecture, entropy decrement argument, Sarnak conjecture

Chad OrzelImaginary Syllabus: Science of Sports and Games

It’s one of those days where none of the stuff I probably ought to be writing seems even slightly appealing, so instead I’m going to do something frivolous and morale-boosting, namely think out loud about an imaginary course. Despite being on sabbatical, I do still check my work email, and have caught the edges of a new round of arguments about whether we’re providing enough “Gen Ed” science courses pitched at non-majors. The hardest part of this is always meeting the “science with lab” component, because those courses are pretty resource-intensive, and we have a limited set of them that we run through. Which has occasionally led me to speculate about what we could do to offer another science-with-lab course for non-majors.

The one semi-serious idea that’s stuck comes from a joking tweet I made some time back, when I said I was going to invent a “physics of sports” class to justify making the college pay for me to go to the Sloan Sports Conference. Which, like a lot of good jokes, sort of stuck in the back of my head, and every now and then I run across something that I mentally add to the imaginary syllabus. Which sort of looks like this, in roughly the order that they’d go in:

Topic 1: Kinematics: Position, velocity, and acceleration, and the relation between them. Basics of video analysis using Tracker to look at things like parabolic motion of thrown objects, acceleration of sprinters, etc.

Labs: Make and analyze video of simple motions. Look at existing video to check physics, a la Rhett’s Angry Birds stuff.
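
As a taste of what the Topic 1 lab analysis looks like downstream of Tracker, here’s a sketch (my own toy data, not actual course material) that runs central differences over sampled projectile positions to recover velocity and acceleration:

```python
FPS = 30.0
dt = 1.0 / FPS
g = 9.8  # m/s^2

# toy "video" data: height of a ball tossed upward at 5 m/s, one sample per frame
t = [i * dt for i in range(30)]
y = [5.0 * ti - 0.5 * g * ti ** 2 for ti in t]

# central differences, as students would apply to Tracker position data
v = [(y[i + 1] - y[i - 1]) / (2 * dt) for i in range(1, len(y) - 1)]
a = [(v[i + 1] - v[i - 1]) / (2 * dt) for i in range(1, len(v) - 1)]

print(a[0])  # recovers roughly -9.8 m/s^2
```

With real video the positions would be noisy and the second difference noisier still, which is itself a nice lesson about measurement uncertainty.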

Topic 2: Momentum: Conservation of momentum in collisions of simple objects– billiard balls, etc.

Labs: Video analysis of simple mostly-elastic collisions.

Topic 3: Energy: Get into it with deviations from ideal elastic collisions– energy loss in bouncing balls, etc. Potential and kinetic energy, energy loss to thermal motion, etc.

Labs: Video analysis of motion with energy loss: bouncing balls slowing to a stop, projectiles with significant air resistance, etc.

Topic 4: Rotational Motion/ Angular Momentum: Start with energy in bouncing footballs, etc. Look at conservation of angular momentum in ice skaters, gymnasts, etc.

Labs: Video analysis of motion of spinning things.

Topic 5: Basic Probability and Statistics: Different types of averages, their pros and cons. Some discussion of uncertainties.

Labs: Flipping coins, rolling dice, etc.
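
For the dice part of that lab, even before any simulation, a brute-force enumeration (a sketch of the kind of in-class exercise I have in mind) shows why 7 is special for two dice:

```python
from collections import Counter

# enumerate all 36 equally likely outcomes of two six-sided dice
sums = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))

mean = sum(s * c for s, c in sums.items()) / 36
mode = max(sums, key=sums.get)
print(mean, mode, sums[7])  # mean 7.0, mode 7, which occurs 6 ways out of 36
```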

Topic 6: More Advanced Statistics: Why do all these people on ESPN go on about “advanced metrics” all the time?

Labs: “Hot hand” experiment?

Topic 7: Final Project: some sort of student-chosen investigation to end the term. Either a pro sports incident that can be analyzed on video, or making a video of something that can be analyzed.

Typed out like that, this seems like a good deal of fun, but also a shitload of work. There are a whole host of reasons why this probably won’t happen, of course, but it has served its basic mood-elevating purpose for the morning, so hooray for that. And if you have suggestions of topics to add, or grant funds you’d be willing to direct my way to actually do this, well, you know where the comments are…

May 16, 2016

Scott AaronsonThe 8000th Busy Beaver number eludes ZF set theory: new paper by Adam Yedidia and me

I’ve supervised a lot of great student projects in my nine years at MIT, but my inner nerdy teenager has never been as personally delighted by a project as it is right now.  Today, I’m proud to announce that Adam Yedidia, a PhD student at MIT (but an MEng student when he did most of this work), has explicitly constructed a one-tape, two-symbol Turing machine with 7,918 states, whose behavior (when run on a blank tape) can never be proven from the usual axioms of set theory, under reasonable consistency hypotheses.  Adam has also constructed a 4,888-state Turing machine that halts iff there’s a counterexample to Goldbach’s Conjecture, and a 5,372-state machine that halts iff there’s a counterexample to the Riemann Hypothesis.  In all three cases, this is the first time we’ve had a reasonable explicit upper bound on how many states you need in a Turing machine before you can see the behavior in question.

Here’s our research paper, on which Adam generously included me as a coauthor, even though he did the heavy lifting.  Also, here’s a github repository where you can download all the code Adam used to generate these Turing machines, and even use it to build your own small Turing machines that encode interesting mathematical statements.  Finally, here’s a YouTube video where Adam walks you through how to use his tools.

A more precise statement of our main result is this: we give a 7,918-state Turing machine, called Z (and actually explicitly listed in our paper!), such that:

  1. Z runs forever, assuming the consistency of a large-cardinal theory called SRP (Stationary Ramsey Property), but
  2. Z can’t be proved to run forever in ZFC (Zermelo-Fraenkel set theory with the Axiom of Choice, the usual foundation for mathematics), assuming that ZFC is consistent.

A bit of background: it follows, as an immediate consequence of Gödel’s Incompleteness Theorem, that there’s some computer program, of some length, that eludes the power of ordinary mathematics to prove what it does, when it’s run with an unlimited amount of memory.  So for example, such a program could simply enumerate all the possible consequences of the ZFC axioms, one after another, and halt if it ever found a contradiction (e.g., a proof of 1+1=3).  Assuming ZFC is consistent, this program must run forever.  But again assuming ZFC is consistent, ZFC can’t prove that the program runs forever, since if it did, then it would prove its own consistency, thereby violating the Second Incompleteness Theorem!

Alas, this argument still leaves us in the dark about where, in space of computer programs, the “Gödelian gremlin” rears its undecidable head.  A program that searches for an inconsistency in ZFC is a fairly complicated animal: it needs to encode not only the ZFC axiom schema, but also the language and inference rules of first-order logic.  Such a program might be thousands of lines long if written in a standard programming language like C, or millions of instructions if compiled down to a bare-bones machine code.  You’d certainly never run across such a program by chance—not even if you had a computer the size of the observable universe, trying one random program after another for billions of years in a “primordial soup”!

So the question stands—a question that strikes me as obviously important, even though as far as I know, only one or two people ever asked the question before us; see here for example.  Namely: do the axioms of set theory suffice to analyze the behavior of every computer program that’s at most, let’s say, 50 machine instructions long?  Or are there super-short programs that already exhibit “Gödelian behavior”?

Theoretical computer scientists might object that this is “merely a question of constants.”  Well yes, OK, but the origin of life in our universe—a not entirely unrelated puzzle—is also “merely a question of constants”!  In more detail, we know that it’s possible with our laws of physics to build a self-replicating machine: say, DNA or RNA and their associated paraphernalia.  We also know that tiny molecules like H2O and CO2 are not self-replicating.  But we don’t know how small the smallest self-replicating molecule can be—and that’s an issue that influences whether we should expect to find ourselves alone in the universe or find it teeming with life.

Some people might also object that what we’re asking about has already been studied, in the half-century quest to design the smallest universal Turing machine (the subject of Stephen Wolfram’s $25,000 prize in 2007, to which I responded with my own $25.00 prize).  But I see that as fundamentally different, for the following reason.  A universal Turing machine—that is, a machine that simulates any other machine that’s described to it on its input tape—has the privilege of offloading almost all of its complexity onto the description format for the input machine.  So indeed, that’s exactly what all known tiny universal machines do!  But a program that checks (say) Goldbach’s Conjecture, or the Riemann Hypothesis, or the consistency of set theory, on an initially blank tape, has no such liberty.  For such machines, the number of states really does seem like an intrinsic measure of complexity, because the complexity can’t be shoehorned anywhere else.

One can also phrase what we’re asking in terms of the infamous Busy Beaver function.  Recall that BB(n), or the nth Busy Beaver number, is defined to be the maximum number of steps that any n-state Turing machine takes when run on an initially blank tape, assuming that the machine eventually halts. The Busy Beaver function was the centerpiece of my 1998 essay Who Can Name the Bigger Number?, which might still attract more readers than anything else I’ve written since. As I stressed there, if you’re in a biggest-number-naming contest, and you write “BB(10000),” you’ll destroy any opponent—however otherwise mathematically literate they are—who’s innocent of computability theory.  For BB(n) grows faster than any computable sequence of integers: indeed, if it didn’t, then one could use that fact to solve the halting problem, contradicting Turing’s theorem.

But the BB function has a second amazing property: namely, it’s a perfectly well-defined integer function, and yet once you fix the axioms of mathematics, only finitely many values of the function can ever be proved, even in principle.  To see why, consider again a Turing machine M that halts if and only if there’s a contradiction in ZF set theory.  Clearly such a machine could be built, with some finite number of states k.  But then ZF set theory can’t possibly determine the value of BB(k) (or BB(k+1), BB(k+2), etc.), unless ZF is inconsistent!  For to do so, ZF would need to prove that M ran forever, and therefore prove its own consistency, and therefore be inconsistent by Gödel’s Theorem.

OK, but we can now ask a quantitative question: how many values of the BB function is it possible for us to know?  Where exactly is the precipice at which this function “departs the realm of mortals and enters the realm of God”: is it closer to n=10 or to n=10,000,000?  In practice, four values of BB have been determined so far:

  • BB(1)=1
  • BB(2)=6
  • BB(3)=21 (Lin and Rado 1965)
  • BB(4)=107 (Brady 1975)
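
These values come from exhaustively simulating all small machines; just to make the definitions concrete, here’s a minimal simulator (my own sketch, not from the paper) running the standard 2-state, 2-symbol champion, which attains BB(2)=6:

```python
def run_tm(prog, max_steps=10 ** 6):
    """Simulate a one-tape, two-symbol Turing machine on a blank tape.
    prog maps (state, symbol) -> (write, move, next_state); state 'H' halts.
    Returns (steps, ones_on_tape), or None if max_steps is exceeded."""
    tape, head, state, steps = {}, 0, 'A', 0
    while state != 'H':
        if steps >= max_steps:
            return None
        sym = tape.get(head, 0)
        write, move, state = prog[(state, sym)]
        tape[head] = write
        head += 1 if move == 'R' else -1
        steps += 1
    return steps, sum(tape.values())

# the 2-state, 2-symbol champion: halts after 6 steps, leaving 4 ones
bb2 = {('A', 0): (1, 'R', 'B'), ('A', 1): (1, 'L', 'B'),
       ('B', 0): (1, 'L', 'A'), ('B', 1): (1, 'R', 'H')}
print(run_tm(bb2))  # (6, 4)
```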

We also know some lower bounds:

  • BB(5) ≥ 47,176,870 (Marxen and Buntrock 1990)
  • BB(6) > 7.4 × 10^36,534 (Kropitz 2010)

See Heiner Marxen’s page or the Googology Wiki (which somehow I only learned about today) for more information.

Some Busy Beaver enthusiasts have opined that even BB(6) will never be known exactly.  On the other hand, the abstract argument from before tells us only that, if we confine ourselves to (say) ZF set theory, then there’s some k—possibly in the tens of millions or higher—such that the values of BB(k), BB(k+1), BB(k+2), and so on can never be proven.  So again: is the number of knowable values of the BB function more like 10, or more like a million?

This is the question that Adam and I (but mostly Adam) have finally addressed.

It’s hopeless to design a Turing machine by hand for all but the simplest tasks, so as a first step, Adam created a new programming language, called Laconic, specifically for writing programs that compile down to small Turing machines.  Laconic programs actually compile to an intermediary language called TMD (Turing Machine Descriptor), and from there to Turing machines.

Even then, we estimate that a direct attempt to write a Laconic program that searched for a contradiction in ZFC would lead to a Turing machine with millions of states.  There were three ideas needed to get the state count down to something reasonable.

The first was to take advantage of the work of Harvey Friedman, who’s one of the one or two people I mentioned earlier who’s written about these problems before.  In particular, Friedman has been laboring since the 1960s to find “natural” arithmetical statements that are provably independent of ZFC or other strong set theories.  (See this AMS Notices piece by Martin Davis for a discussion of Friedman’s progress as of 2006.)  Not only does Friedman’s quest continue, but some of his most important progress has come only within the last year.  His statements—typically involving objects called “order-invariant graphs”—strike me as alien, and as far removed from anything I’d personally have independent reasons to think about (but is that just a sign of my limited perspective?).  Be that as it may, Friedman’s statements still seem a lot easier to encode as short computer programs than the full apparatus of first-order logic and set theory!  So that’s what we started with; our work wouldn’t have been possible without Friedman (who we consulted by email throughout the project).

The second idea was something we called “on-tape processing.”  Basically, instead of compiling directly from Laconic down to Turing machine, Adam wrote an interpreter in Turing machine (which took about 4000 states—a single, fixed cost), and then had the final Turing machine first write a higher-level program onto its tape and then interpret that program.  Instead of the compilation process producing a huge multiplicative overhead in the number of Turing machine states (and a repetitive machine), this approach gives us only an additive overhead.  We found that this one idea decreased the number of states by roughly an order of magnitude.

The third idea was first suggested in 2002 by Ben-Amram and Petersen (and refined for us by Luke Schaeffer); we call it “introspective encoding.”  When we write the program to be interpreted onto the Turing machine tape, the naïve approach would use one Turing machine state per bit.  But that’s clearly wasteful, since in an n-state Turing machine, every state contains ~log(n) bits of information (because of the other states it needs to point to).  A better approach tries to exploit as many of those bits as it can; doing that gave us up to a factor-of-5 additional savings in the number of states.

For Goldbach’s Conjecture and the Riemann Hypothesis, we paid the same 4000-state overhead for the interpreter, but then the program to be interpreted was simpler, giving a smaller overall machine.  Incidentally, it’s not intuitively obvious that the Riemann Hypothesis is equivalent to the statement that some particular computer program runs forever, but it is—that follows, for example, from work by Lagarias and by Davis, Matijasevich, and Robinson (we used the latter; an earlier version of this post incorrectly stated that we used the Lagarias result).

To preempt the inevitable question in the comments section: yes, we did run these Turing machines for a while, and no, none of them had halted after a day or so.  But before you interpret that as evidence in favor of Goldbach, Riemann, and the consistency of ZFC, you should probably know that a Turing machine to test whether all perfect squares are less than 5, produced using Laconic, needed to run for more than an hour before it found the first counterexample (namely, 3^2=9) and halted.  Laconic Turing machines are optimized only for the number of states, not for speed, to put it mildly.

Of course, three orders of magnitude still remain between the largest value of n (namely, 4) for which BB(n) is known to be knowable in ZFC-based mathematics, and the smallest value of n (namely, 7,918) for which BB(n) is known to be unknowable.  I’m optimistic that further improvements are possible to the machine Z—whether that means simplifications to Friedman’s statement, a redesigned interpreter (possibly using lambda calculus?), or a “multi-stage rocket model” where a bare-bones interpreter would be used to unpack a second, richer interpreter which would be used to unpack a third, etc., until you got to the actual program you cared about.  But I’d be shocked if anyone in my lifetime determined the value of BB(10), for example, or proved the value independent of set theory.  Even after the Singularity happens, I imagine that our robot overlords would find the determination of BB(10) quite a challenge.

In an early Shtetl-Optimized post, I described theoretical computer science as “quantitative epistemology.”  Constructing small Turing machines whose behavior eludes set theory is not conventional theoretical computer science by any stretch of the imagination: it’s closer in practice to programming languages or computer architecture, or even the recreational practice known as code-golfing.  On the other hand, I’ve never been involved with any other project that was so clearly, explicitly about pinning down the quantitative boundary between the knowable and the unknowable.

Comments on our paper are welcome.

Addendum: Some people might wonder “why Turing machines,” as opposed to a more reasonable programming language like C or Python.  Well, first of all, we needed a language that could address an unlimited amount of memory.  Also, the BB function is traditionally defined in terms of Turing machines.  But the most important issue is that we wanted there to be no suspicion whatsoever that our choice of programming language was artificially helping to make our machine small.  And hopefully everyone can agree that one-tape, two-symbol Turing machines aren’t designed for anyone’s convenience!

n-Category Café E8 as the Symmetries of a PDE

My friend Dennis The recently gave a new description of the Lie algebra of \mathrm{E}_8 (as well as all the other complex simple Lie algebras, except \mathfrak{sl}(2,\mathbb{C})) as the symmetries of a system of partial differential equations. Even better, when he writes down his PDE explicitly, the exceptional Jordan algebra makes an appearance, as we will see.

This is a story with deep roots: it goes back to two very different models for the Lie algebra of \mathrm{G}_2, one due to Cartan and one due to Engel, which were published back-to-back in 1893. Dennis figured out how these two results are connected, and then generalized the whole story to nearly every simple Lie algebra, including \mathrm{E}_8.

Let’s begin with that model of \mathrm{G}_2 due to Cartan: the Lie algebra \mathfrak{g}_2 is formed by the infinitesimal symmetries of the system of PDE u_{x x} = \frac{1}{3} (u_{y y})^3, \quad u_{x y} = \frac{1}{2} (u_{y y})^2. What does it mean to be an infinitesimal symmetry of a PDE? To understand this, we need to see how PDE can be realized geometrically, using jet bundles.

A jet bundle over \mathbb{C}^2 is a bundle whose sections are given by holomorphic functions u \colon \mathbb{C}^2 \to \mathbb{C} and their partials, up to some order. Since we have a 2nd order PDE, we need the 2nd jet bundle: \begin{matrix} J^2(\mathbb{C}^2, \mathbb{C}) \\ \downarrow \\ \mathbb{C}^2 \end{matrix} This is actually the trivial bundle whose total space is \mathbb{C}^8, but we label the coordinates suggestively: J^2(\mathbb{C}^2, \mathbb{C}) = \left\{ (x,y,u,u_x,u_y, u_{x x}, u_{x y}, u_{y y}) \in \mathbb{C}^8 \right\}. The bundle projection just picks out (x,y).

For the moment, u_x, u_y and so on are just the names of some extra coordinates and have nothing to do with derivatives. To relate them, we choose some distinguished 1-forms on J^2, called the contact 1-forms, spanned by holomorphic combinations of \begin{array}{rcl} \theta_1 & = & d u - u_x d x - u_y d y, \\ \theta_2 & = & d u_x - u_{x x} d x - u_{x y} d y, \\ \theta_3 & = & d u_y - u_{x y} d x - u_{y y} d y. \end{array} These are chosen so that, if our suggestively named variables really were partials, these 1-forms would vanish.

For any holomorphic function u \colon \mathbb{C}^2 \to \mathbb{C} we get a section j^2 u of J^2, called the prolongation of u. It simply takes those variables that we named after the partial derivatives seriously, and gives us the actual partial derivatives of u in those slots: (j^2 u)(x,y) = (x, y, u(x,y), u_x(x,y), u_y(x,y), u_{x x}(x,y), u_{x y}(x,y), u_{y y}(x,y)). Conversely, an arbitrary section s of J^2 is the prolongation of some u if and only if it annihilates the contact 1-forms. Since contact 1-forms are spanned by \theta_1, \theta_2 and \theta_3, it suffices that: s^\ast \theta_1 = 0, \quad s^\ast \theta_2 = 0, \quad s^\ast \theta_3 = 0. Such sections are called holonomic. This correspondence between prolongations and holonomic sections is the key to thinking about jet bundles.

Our PDE u_{x x} = \frac{1}{3} (u_{y y})^3, \quad u_{x y} = \frac{1}{2} (u_{y y})^2 carves out a submanifold S of J^2. Solutions correspond to local holonomic sections that land in S. In general, PDE give us submanifolds of jet spaces.

The external symmetries of our PDE are those diffeomorphisms of J^2 that send contact 1-forms to contact 1-forms and send S to itself. The infinitesimal external symmetries are vector fields that preserve S and the contact 1-forms. There are also things called internal symmetries, but I won’t need them here.

So now we’re ready for:

Amazing theorem 1. The infinitesimal external symmetries of our PDE form the Lie algebra \mathfrak{g}_2.

Like I said above, Dennis takes this amazing theorem of Cartan and connects it to an amazing theorem of Engel, and then generalizes the whole story to nearly all simple complex Lie algebras. Here’s Engel’s amazing theorem:

Amazing theorem 2. \mathfrak{g}_2 is the Lie algebra of infinitesimal contact transformations on a 5-dim contact manifold preserving a field of twisted cubic varieties.

This theorem lies at the heart of the story, so let me explain what it’s saying. First, it requires us to become acquainted with contact geometry, the odd-dimensional cousin of symplectic geometry. A contact manifold M is a (2n+1)-dimensional manifold with a contact distribution C on it. This is a smoothly-varying family of 2n-dimensional subspaces C_m of each tangent space T_m M, satisfying a certain nondegeneracy condition.

In Engel’s theorem, M is 5-dimensional, so each C_m is 4-dimensional. We can projectivize each C_m to get a 3-dimensional projective space \mathbb{P}(C_m) over each point. Our field of twisted cubic varieties is a curve in each of these projective spaces, the image of a cubic map: \mathbb{C}\mathbb{P}^1 \to \mathbb{P}(C_m). This gives us a curve \mathcal{V}_m in each \mathbb{P}(C_m), and taken together this is our field of twisted cubic varieties, \mathcal{V}. Engel gave explicit formulas for a contact structure on \mathbb{C}^5 with a twisted cubic field \mathcal{V} whose symmetries are \mathfrak{g}_2, and you can find these formulas in Dennis’s paper.

How are these two theorems related? The secret is to go back to thinking about jet spaces, except this time, we’ll start with the 1st jet space: J^1(\mathbb{C}^2, \mathbb{C}) = \left\{ (x, y, u, u_x, u_y) \in \mathbb{C}^5 \right\}. This comes equipped with a space of contact 1-forms, spanned by a single 1-form: \theta = d u - u_x d x - u_y d y. And now we see where contact 1-forms get their name: this contact 1-form defines a contact structure on J^1, given by C = \mathrm{ker}(\theta).

Many of you may know Darboux’s theorem in symplectic geometry, which says that any two symplectic manifolds of the same dimension look the same locally. In contact geometry, the analogue of Darboux’s theorem holds, and goes by the name of Pfaff’s theorem. By Pfaff’s theorem, there’s an open set in J^1 which is contactomorphic to an open set in \mathbb{C}^5 with Engel’s contact structure. And we can use this map to transfer our twisted cubic field \mathcal{V} to J^1, or at least an open subset of it. This gives us a twisted cubic field on J^1, one that continues to have \mathfrak{g}_2 symmetry.

We are getting tantalizingly close to a PDE now. We have a jet space J^1, with some structure on it. We just lack a submanifold of that jet space. Our twisted cubic field \mathcal{V} gives us a curve in each \mathbb{P}(C_m), not in J^1 itself.

To these ingredients, add a bit of magic. Dennis found a natural construction that takes our twisted cubic field \mathcal{V} and gives us a submanifold of a space that, at least locally, looks like J^2(\mathbb{C}^2, \mathbb{C}), and hence describes a PDE. This PDE is the \mathrm{G}_2 PDE.

It works like this. Our contact 1-form \theta endows each C_m with a symplectic structure, d\theta_m. Starting with our contact structure, C, this symplectic structure is only defined up to rescaling, because C determines \theta only up to rescaling. Nonetheless, it makes sense to look for subspaces of C_m that are Lagrangian: subspaces of maximal dimension on which d\theta_m vanishes. The space of all Lagrangian subspaces of C_m is called the Lagrangian-Grassmannian, \mathrm{LG}(C_m), and we can form a bundle \begin{matrix} \mathrm{LG}(J^1) \\ \downarrow \\ J^1 \\ \end{matrix} whose fiber over each point is \mathrm{LG}(C_m). It turns out \mathrm{LG}(J^1) is locally the same as J^2(\mathbb{C}^2, \mathbb{C}), complete with the latter’s contact 1-forms.

Dennis’s construction takes \mathcal{V} and gives us a submanifold of \mathrm{LG}(J^1), as follows. Remember, each \mathcal{V}_m is a curve in \mathbb{P}(C_m). The tangent space to a point p \in \mathcal{V}_m is thus a line in the projective space \mathbb{P}(C_m), and this corresponds to a 2-dimensional subspace of the 4-dimensional contact space C_m. This subspace turns out to be Lagrangian! Thus, points p of \mathcal{V}_m give us points of \mathrm{LG}(C_m), and letting m and p vary, we get a submanifold of \mathrm{LG}(J^1). Locally, this is our PDE.

Dennis then generalizes this story to all simple Lie algebras besides \mathfrak{sl}(2,\mathbb{C}). For simple Lie groups other than those in the A and C series, there is a homogeneous space with a natural contact structure that has a field of twisted varieties living on it, called the field of “sub-adjoint varieties”. The same construction that worked for \mathrm{G}_2 now gives PDE for these. The A and C cases take more care.

Better yet, Dennis builds on work of Landsberg and Manivel to get explicit descriptions of all these PDE in terms of cubic forms on Jordan algebras! Landsberg and Manivel describe the field of sub-adjoint varieties using these cubic forms. For \mathrm{G}_2, the Jordan algebra in question is the complex numbers \mathbb{C} with the cubic form \mathfrak{C}(t) = \frac{t^3}{3}.

Given any Jordan algebra W with a cubic form \mathfrak{C} on it, first polarize \mathfrak{C}: \mathfrak{C}(t) = \mathfrak{C}_{abc} t^a t^b t^c, and then cook up a PDE for a function u \colon \mathbb{C} \oplus W \to \mathbb{C} as follows: u_{00} = \mathfrak{C}_{abc} t^a t^b t^c, \quad u_{0a} = \frac{3}{2} \mathfrak{C}_{a b c} t^b t^c, \quad u_{a b} = 3 \mathfrak{C}_{a b c} t^c, where t \in W, and I’ve used the indices a, b, and c for coordinates in W, and 0 for the coordinate in \mathbb{C}. For \mathrm{G}_2, this gives us the PDE u_{00} = \frac{t^3}{3}, \quad u_{01} = \frac{t^2}{2}, \quad u_{11} = t, which is clearly equivalent to the PDE we wrote down earlier. Note that this PDE is determined entirely by the cubic form \mathfrak{C}; the product on our Jordan algebra plays no role.
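
As a quick numerical sanity check on the indices (my own, not from Dennis’s paper): for the one-dimensional Jordan algebra W = \mathbb{C}, the only polarized coefficient is \mathfrak{C}_{111} = 1/3, and the general formulas do reduce to Cartan’s \mathrm{G}_2 PDE:

```python
# W = C with cubic form C(t) = t^3/3, so the single symmetric coefficient
# C_111 equals 1/3 (then C_abc t^a t^b t^c = t^3/3)
C111 = 1.0 / 3.0

for t in (0.5, 1.0, 2.0, -1.5):
    u00 = C111 * t ** 3        # u_00 = C_abc t^a t^b t^c
    u01 = 1.5 * C111 * t ** 2  # u_0a = (3/2) C_abc t^b t^c
    u11 = 3.0 * C111 * t       # u_ab = 3 C_abc t^c
    # these match u_00 = t^3/3, u_01 = t^2/2, u_11 = t, and hence satisfy
    # Cartan's equations u_xx = (u_yy)^3 / 3 and u_xy = (u_yy)^2 / 2
    assert abs(u00 - u11 ** 3 / 3.0) < 1e-12
    assert abs(u01 - u11 ** 2 / 2.0) < 1e-12
```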

Now we’re ready for Dennis’s amazing theorem.

Amazing theorem 3. Let W = \mathbb{C} \otimes \mathfrak{h}_3(\mathbb{O}), the exceptional Jordan algebra, and let \mathfrak{C} be the cubic form on W given by the determinant. Then the following PDE on \mathbb{C} \oplus W

u_{00} = \mathfrak{C}_{abc} t^a t^b t^c, \quad u_{0a} = \frac{3}{2} \mathfrak{C}_{abc} t^b t^c, \quad u_{ab} = 3 \mathfrak{C}_{abc} t^c,

has external symmetry algebra \mathfrak{e}_8.


Thanks to Dennis The for explaining his work to me, and for his comments on drafts of this post.

Scott AaronsonThree announcements

(-2) Today’s Bonus Announcement: Stefan O’Rear says that his Turing machine to search for contradictions in ZFC is now down to 1919 states.  If verified, this is an important milestone: our upper bound on the number of Busy Beaver values that are knowable in standard mathematics is now less than the number of years since the birth of Christ (indeed, even since the generally-accepted dates for the writing of the Gospels).

Stefan also says that his Not-Quite-Laconic system has yielded a 1008-state Turing machine to search for counterexamples to the Riemann Hypothesis, improving on our 5372 states.

(-1) Another Bonus Announcement: Great news, everyone!  Using a modified version of Adam Yedidia’s Laconic language (which he calls NQL, for Not Quite Laconic), Stefan O’Rear has now constructed a 5349-state Turing machine that directly searches for contradictions in ZFC (or rather in Metamath, which is known to be equivalent to ZFC), and whose behavior is therefore unprovable in ZFC, assuming ZFC is consistent.  This, of course, improves on my and Adam’s state count by 2561 states—but it also fixes the technical issue with needing to assume a large cardinal axiom (SRP) in order to prove that the TM runs forever.  Stefan promises further state reductions in the near future.

In other news, Adam has now verified the 43-state Turing machine by Jared S that halts iff there’s a counterexample to Goldbach’s Conjecture.  The 27-state machine by code golf addict is still being verified.

(0) Bonus Announcement: I’ve had half a dozen “Ask Me Anything” sessions on this blog, but today I’m trying something different: a Q&A session on Quora.  The way it works is that you vote for your favorite questions; then on Tuesday, I’ll start with the top-voted questions and keep going down the list until I get tired.  Fire away!  (And thanks to Shreyes Seshasai at Quora for suggesting this.)

(1) When you announce a new result, the worst that can happen is that the result turns out to be wrong, trivial, or already known.  The best that can happen is that the result quickly becomes obsolete, as other people race to improve it.  With my and Adam Yedidia’s work on small Turing machines that elude set theory, we seem to be heading for that best case.  Stefan O’Rear wrote a not-quite-Laconic program that just searches directly for contradictions in a system equivalent to ZFC.  If we could get his program to compile, it would likely yield a Turing machine with somewhere around 6,000-7,000 states whose behavior was independent of ZFC, and would also fix the technical problem with my and Adam’s machine Z, where one needed to assume a large-cardinal axiom called SRP to prove that Z runs forever.  While it would require a redesign from the ground up, a 1,000-state machine whose behavior eludes ZFC also seems potentially within reach using Stefan’s ideas.  Meanwhile, our 4,888-state machine for Goldbach’s conjecture seems to have been completely blown out of the water: first, a commenter named Jared S says he’s directly built a 73-state machine for Goldbach (now down to 43 states); second, a commenter named “code golf addict” claims to have improved on that with a mere 31 states (now down to 27 states).  These machines are now publicly posted, but still await detailed verification.

(2) My good friend Jonah Sinick cofounded Signal Data Science, a data-science summer school that will be running for the second time this summer.  They operate on an extremely interesting model, which I’m guessing might spread more widely: tuition is free, but you pay 10% of your first year’s salary after finding a job in the tech sector.  He asked me to advertise them, so—here!

(3) I was sad to read the news that Uber and Lyft will be suspending all service in Austin, because the city passed an ordinance requiring their drivers to get fingerprint background checks, and imposing other regulations that Uber and Lyft argue are incompatible with their model of part-time drivers.  The companies, of course, are also trying to send a clear message to other cities about what will happen if they don’t get the regulatory environment they want.  To me, the truth of the matter is that Uber/Lyft are like the web, Google, or smartphones: clear, once-per-decade quality-of-life advances that you try once, and then no longer understand how you survived without.  So if Austin wants to maintain a reputation as a serious, modern city, it has no choice but to figure out some way to bring these companies back to the negotiating table.  On the other hand, I’d also say to Uber and Lyft that, even if they needed to raise fares to taxi levels to comply with the new regulations, I expect they’d still do a brisk business!

For me, the “value proposition” of Uber has almost nothing to do with the lower fares, even though they’re lower.  For me, it’s simply about being able to get from one place to another without needing to drive and park, and also without needing desperately to explain where you are, over and over, to a taxi dispatcher who sounds angry that you called and who doesn’t understand you because of a combination of language barriers and poor cellphone reception and your own inability to articulate your location.  And then wondering when and if your taxi will ever show up, because the dispatcher couldn’t promise a specific time, or hung up on you before you could ask them.  And then embarking on a second struggle, to explain to the driver where you’re going, or at least convince them to follow the Google Maps directions.  And then dealing with the fact that the driver has no change, you only have twenties and fifties, and their little machine that prints receipts is out of paper so you can’t submit your trip for reimbursement either.

So yes, I really hope Uber, Lyft, and the city of Austin manage to sort this out before Dana and I move there!  On the other hand, I should say that there’s another part of the new ordinance—namely, requiring Uber and Lyft cars to be labeled—that strikes me as an unalloyed good.  For if there’s one way in which Uber is less convenient than taxis, it’s that you can never figure out which car is your Uber, among all the cars stopping or slowing down near you that look vaguely like the one in the app.

May 15, 2016

Chad Orzel251-256/366: The Week in Photos

Getting caught up to today, a bunch of pictures from the past week (250/366 was the photo of Emmy’s memorial shrub):

251/366: Wagons Ho!

SteelyKid and The Pip taking turns pulling each other in their wagon.

Last weekend we had gorgeous spring weather, so the kids were playing outside a bunch. They ended up getting out the wooden wagon SteelyKid got for her birthday some years back, and giving each other rides. This inevitably resulted in me being asked to tow them around the neighborhood, but they did return the favor…

SteelyKid and The Pip pulling me on a wagon.

(Photo by Kate.) They’re very strong.

252/366: M-V-P! M-V-P!

SteelyKid's MVP of the Week poster.

Just a cell-phone snap of a display in the hall at the JCC, so not a great picture. It does, however, commemorate two things: 1) SteelyKid being honored for being a good kid at her after-school program, and 2) the bug-eyed face she’s doing when asked to smile for a picture these days. This is entirely deliberate on her part, for the record– she knows it’s a goofy face, and delights in making it whenever anybody tries to take a picture.

253/366: Lead Off

SteelyKid on first base, ready to run.

SteelyKid is playing rec softball this spring, which gives me an excuse to take a lot of telephoto pictures, because baseball variants take a loooooong time. This is probably my favorite from the batch of photos I got at her very first game on Tuesday.

254/366: At Bat

SteelyKid about to get a hit.

Of course, I can’t post a softball picture without showing her at bat, where she’s the Ted Williams of the second-grade set– the longest at-bat she’s had in two games has gone three pitches.

255/366: Climb High, Climb Far

The Pip can climb very high.

The park where the softball games are played has this big Platonic solid with a web of ropes inside for kids to climb on. When we arrived on Tuesday, The Pip promptly climbed to the little platform in the center, then demanded in a slightly whiny voice “Dad, why can’t I climb higher than this?” I was a little frazzled trying to identify the correct field for SteelyKid’s game and herd her in that direction, so I distractedly replied “What? You don’t have to stop there. Climb as high as you like. Go nuts.”

So, yeah, that got a little scary. Notice Kate in the lower right, for scale.

256/366: Sakura

Two pictures of the flowers on the ornamental cherry tree in our yard.

I realize that in a lot of the country, peak flowering-tree season passed a month or more ago, but remember, we’re in the Northeast. So we just had pretty cherry blossoms this week on the small tree in our front yard.

Geraint F. LewisHow Far Can We Go? A long way, but not that far!

Obligatory "sorry it's been a long time since I posted" comment. Life, grants, students, etc. All of the usual excuses! But I plan (i.e. hope) to do more writing here in the future.

But what's the reason for today's post? Namely, this video posted on YouTube.
The conclusion is that humans are destined to explore the Local Group of galaxies, and that is it. And this video has received a bit of circulation on the inter-webs, promoted by a few sciency people.

The problem, however, is that it is wrong. The basic idea is that the accelerating expansion due to the presence of dark energy means that the separation of objects will get faster and faster, and so it will be a little like chasing after a bus: the distance between the two of you will continue to get bigger and bigger. This part is correct, and in the very distant future, there will be extremely isolated bunches of galaxies whose own gravitational pull overcomes the cosmic expansion. But the rest, namely just how much we can explore, is wrong.

Why? Because they seem to have forgotten something key. Once we are out there traveling in the "expanding universe", the expansion works to our advantage, increasing the distance not only between us and where we want to get to, but also between us and home. We effectively "ride" the expansion.

So, how far could we get? Well, time to call (again - sorry) on Tamara Davis's excellent cosmological work, in particular this paper on misconceptions about the Big Bang. I've spoken about this paper many times (and do read it, it is quite excellent) but for this post, what we need to look at is the "conformal" picture of our universe. I don't have time to go into the details here, but the key thing is that you manipulate space and time so light rays travel at 45 degrees in the picture. Here's our universe.

The entire (infinite!) history of the universe is in this picture, mapped onto "conformal time". We're in the middle on the line marked now. If we extend our past light cone into the future, we can see the volume of the universe accessible to us, given the continued accelerating expansion. We can see that this encompasses objects that are currently not far from 20 billion light years away from us. This means that light rays fired out today will get this far, much, much larger than the Local Group of galaxies.
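For the curious, here is a rough numeric sketch of that reachable distance (my own calculation with assumed fiducial flat-ΛCDM parameters, not numbers taken from the post):

```python
import math

# Assumed parameters: H0 = 70 km/s/Mpc, Omega_m = 0.3, Omega_L = 0.7.
# Comoving distance reachable by a light ray emitted today:
#   D = c * Integral_1^infinity da / (a^2 * H(a))
H0 = 70.0               # km/s/Mpc
c = 299792.458          # km/s
MPC_TO_GLY = 3.2616e-3  # 1 Mpc is about 3.2616e6 light years
hubble_dist_gly = (c / H0) * MPC_TO_GLY

def integrand(a):
    H_over_H0 = math.sqrt(0.3 / a**3 + 0.7)
    return 1.0 / (a**2 * H_over_H0)

# midpoint rule on a in [1, 1000]; the integrand decays like a^-2
n, a_max = 100000, 1000.0
h = (a_max - 1.0) / n
I = sum(integrand(1.0 + (i + 0.5) * h) for i in range(n)) * h

D = hubble_dist_gly * I
print(D)  # of order 16 Gly for these assumed parameters
```

This is the same ballpark as the roughly 20 billion light years quoted above; the exact figure shifts with the chosen cosmological parameters.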

But ha! you scoff, that's a light ray. Puny humans in rockets have no chance!

Again, wrong, as you need to care about relativity again. How do I know? I wrote a paper about this with two smart students, Juliana Kwan (who is now at the University of Pennsylvania) and Berian James (now at Square). The point is that if you accelerate off into the universe, even at a nice gentle acceleration similar to what we experience here on Earth, you still get to explore much of the universe accessible to light rays.

Here's our paper. The key point is not just how far you want to get, but whether or not you want to get home again. I am more than happy to acknowledge Jeremy Heyl's earlier work that inspired ours.

One tiny last point is the question of whether our (or maybe not our) descendants will realise that there is dark energy in the universe. Locked away in Milkomeda (how I hate that name), the view of the dark universe in the future might lead you to conclude that there is no more to the universe than ourselves, and it would appear static and unchanging; but anything thrown "out there", such as rocket ships (as per above) or high velocity stars, would still reveal the presence of dark energy.

There's plenty of universe we could potentially explore!

BackreactionDear Dr B: If photons have a mass, would this mean special relativity is no longer valid?

Einstein and Lorentz.
[Image: Wikipedia]
“[If photons have a restmass] would that mean the whole business of the special theory of relativity being derived from the idea that light has to go at a particular velocity in order for it to exist/Maxwell’s identification of e/m waves as light because they would have to go at the appropriate velocity is no longer valid?”

(This question came up in the discussion of a recent proposal according to which photons with a tiny restmass might cause an effect similar to the cosmological constant.)

Dear Brian,

The short answer to your question is “No.” If photons had a restmass, special relativity would still be as valid as it’s always been.

The longer answer is that the invariance of the speed of light features prominently in the popular explanations of special relativity for historic reasons, not for technical reasons. Einstein was led to special relativity by contemplating what it would be like to travel with light, and then tried to find a way to accommodate an observer’s motion with the invariance of the speed of light. But the derivation of special relativity is much more general than that, and it is unnecessary to postulate that the speed of light is invariant.

Special relativity is really just physics in Minkowski space, that is, the 4-dimensional space-time you obtain after promoting time from a parameter to a coordinate. Einstein wanted the laws of physics to be the same for all inertial observers in Minkowski-space, i.e. observers moving at constant velocity. If you translate this requirement into mathematics, you are led to ask for the symmetry transformations in Minkowski-space. These transformations form a group – the Poincaré-group – from which you can read off all the odd things you have heard of: time dilation, length contraction, relativistic mass, and so on.

The Poincaré-group itself has two subgroups. One contains just translations in space and time. This tells you that if you have an infinitely extended and unchanging space then it doesn’t matter where or when you do your experiment; the outcome will be the same. The remaining part of the Poincaré-group is the Lorentz-group. The Lorentz-group contains rotations – this tells you it doesn’t matter in which direction you turn, the laws of nature will still be the same. Besides the rotations, the Lorentz-group contains boosts, which are basically rotations between space and time. Invariance under boosts tells you that it doesn’t matter at which velocity you move, the laws of nature will remain the same. It’s the boosts where all the special relativistic fun goes on.

Deriving the Lorentz-group, if you know how to do it, is a three-liner, and I assure you it has absolutely nothing to do with rocket ships and lasers and so on. It is merely based on the requirement that the metric of Minkowski-space has to remain invariant. Carry through with the math and you’ll find that the boosts depend on a free constant with the dimension of a speed. You can further show that this constant is the speed of massless particles.
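In coordinates, that short derivation amounts to demanding that a boost matrix Λ satisfy Λᵀ η Λ = η for the Minkowski metric η. A small symbolic check of this (my own sketch, with sympy assumed available) for the familiar boost along one axis:

```python
import sympy as sp

v, c = sp.symbols('v c', positive=True)
gamma = 1 / sp.sqrt(1 - v**2 / c**2)

# Minkowski metric diag(-1, 1, 1, 1) in coordinates (ct, x, y, z),
# and a boost along x with velocity v (beta = v/c)
eta = sp.diag(-1, 1, 1, 1)
beta = v / c
L = sp.Matrix([
    [gamma,         -gamma * beta, 0, 0],
    [-gamma * beta,  gamma,        0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
])

# invariance of the metric: L^T * eta * L == eta
residual = (L.T * eta * L - eta).applyfunc(sp.simplify)
assert residual == sp.zeros(4, 4)
```

Note that only the combination β = v/c enters; the free constant c is whatever speed appears in the boost, which the text identifies with the speed of massless particles.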

Hence, if photons are massless, then the constant in the Lorentz-transformation is the speed of light. If photons are not massless, then the constant in the Lorentz-transformation is still there, but not identical to the speed of light. We already know however that these constants must be identical to very good precision, which is the same as saying the mass of photons must be very small.
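To get a feel for how small "very small" is, here is an illustrative estimate (my own numbers, not from the post): with a photon restmass bound of order mc² ~ 10⁻¹⁸ eV, the speed deficit of an ordinary optical photon follows from the massive dispersion relation E² = (pc)² + (mc²)²:

```python
import math

# Assumed illustrative bound: m c^2 ~ 1e-18 eV; optical photon with E = 1 eV.
# Massive dispersion E^2 = (pc)^2 + (m c^2)^2 gives v/c = pc/E, so
#   1 - v/c = 1 - sqrt(1 - (m c^2 / E)^2) ~ (m c^2 / E)^2 / 2.
mc2 = 1e-18  # eV, assumed bound
E = 1.0      # eV
deficit = (mc2 / E)**2 / 2
print(deficit)  # ~5e-37: the two constants are identical for all practical purposes
```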

Giving a mass to photons is unappealing not because it violates special relativity – it doesn’t – but because it violates gauge-invariance, the most cherished principle underlying the standard model. But that’s a different story and shall be told another time.

Thanks for an interesting question!

May 14, 2016

Terence TaoNotes on the Nash embedding theorem

Throughout this post we shall always work in the smooth category, thus all manifolds, maps, coordinate charts, and functions are assumed to be smooth unless explicitly stated otherwise.

A (real) manifold {M} can be defined in at least two ways. On one hand, one can define the manifold extrinsically, as a subset of some standard space such as a Euclidean space {{\bf R}^d}. On the other hand, one can define the manifold intrinsically, as a topological space equipped with an atlas of coordinate charts. The fundamental embedding theorems show that, under reasonable assumptions, the intrinsic and extrinsic approaches give the same classes of manifolds (up to isomorphism in various categories). For instance, we have the following (special case of the) Whitney embedding theorem:

Theorem 1 (Whitney embedding theorem) Let {M} be a compact manifold. Then there exists an embedding {u: M \rightarrow {\bf R}^d} from {M} to a Euclidean space {{\bf R}^d}.

In fact, if {M} is {n}-dimensional, one can take {d} to equal {2n}, which is often best possible (easy examples include the circle {{\bf R}/{\bf Z}} which embeds into {{\bf R}^2} but not {{\bf R}^1}, or the Klein bottle that embeds into {{\bf R}^4} but not {{\bf R}^3}). One can also relax the compactness hypothesis on {M} to second countability, but we will not pursue this extension here. We give a “cheap” proof of this theorem below the fold which allows one to take {d} equal to {2n+1}.

A significant strengthening of the Whitney embedding theorem is (a special case of) the Nash embedding theorem:

Theorem 2 (Nash embedding theorem) Let {(M,g)} be a compact Riemannian manifold. Then there exists an isometric embedding {u: M \rightarrow {\bf R}^d} from {M} to a Euclidean space {{\bf R}^d}.

In order to obtain the isometric embedding, the dimension {d} has to be a bit larger than what is needed for the Whitney embedding theorem; in this article of Gunther the bound

\displaystyle  d = \max(  n(n+5)/2, n(n+3)/2 + 5) \ \ \ \ \ (1)

is attained, which I believe is still the record for large {n}. (In the converse direction, one cannot do better than {d = \frac{n(n+1)}{2}}, basically because this is the number of degrees of freedom in the Riemannian metric {g}.) Nash’s original proof of the theorem used what is now known as the Nash-Moser inverse function theorem, but a subsequent simplification of Gunther allowed one to proceed using just the ordinary inverse function theorem (in Banach spaces).

I recently had the need to invoke the Nash embedding theorem to establish a blowup result for a nonlinear wave equation, which motivated me to go through the proof of the theorem more carefully. Below the fold I give a proof of the theorem that does not attempt to give an optimal value of {d}, but which hopefully isolates the main ideas of the argument (as simplified by Gunther). One advantage of not optimising in {d} is that it allows one to freely exploit the very useful tool of pairing together two maps {u_1: M \rightarrow {\bf R}^{d_1}}, {u_2: M \rightarrow {\bf R}^{d_2}} to form a combined map {(u_1,u_2): M \rightarrow {\bf R}^{d_1+d_2}} that can be closer to an embedding or an isometric embedding than the original maps {u_1,u_2}. This lets one perform a “divide and conquer” strategy in which one first starts with the simpler problem of constructing some “partial” embeddings of {M} and then pairs them together to form a “better” embedding.

In preparing these notes, I found the articles of Deane Yang and of Siyuan Lu to be helpful.

— 1. The Whitney embedding theorem —

To prove the Whitney embedding theorem, we first prove a weaker version in which the embedding is replaced by an immersion:

Theorem 3 (Weak Whitney embedding theorem) Let {M} be a compact manifold. Then there exists an immersion {u: M \rightarrow {\bf R}^d} from {M} to a Euclidean space {{\bf R}^d}.

Proof: Our objective is to construct a map {u: M \rightarrow {\bf R}^d} such that the derivatives {\partial_\alpha u(x)} are linearly independent in {{\bf R}^d} for each {x \in M}. For any given point {x_0 \in M}, we have a coordinate chart {\phi_{x_0}: U_{x_0} \rightarrow {\bf R}^n} from some neighbourhood {U_{x_0}} of {x_0} to {{\bf R}^n}. If we set {u_{x_0}: M \rightarrow {\bf R}^n} to be {\phi_{x_0}} multiplied by a suitable cutoff function supported near {x_0}, we see that {u_{x_0}} is an immersion in a neighbourhood of {x_0}. Pairing together finitely many of the {u_{x_0}} and using compactness, we obtain the claim. \Box

Now we upgrade the immersion {u} from the above theorem to an embedding by further use of pairing. First observe that as {M} is smooth and compact, an embedding is nothing more than an immersion that is injective. Let {u: M \rightarrow {\bf R}^d} be an immersion. Let {\Sigma \subset M \times M} be the set of pairs {(x_1,x_2)} of distinct points {(x_1,x_2)\in M} such that {u(x_1)=u(x_2)}; note that this set is compact since {u} is an immersion (and so there is no failure of injectivity when {(x_1,x_2)} is near the diagonal). If {\Sigma} is empty then {u} is injective and we are done. If {\Sigma} contains a point {(x_1,x_2)}, then by pairing {u} with some scalar function {\eta: M \rightarrow {\bf R}} that separates {x_1} and {x_2}, we can replace {u} by another immersion (in one higher dimension {{\bf R}^{d+1}}) such that a neighbourhood of {x_1} and a neighbourhood of {x_2} get mapped to disjoint sets, thus effectively removing an open neighbourhood of {(x_1,x_2)} from {\Sigma}. Repeating these procedures finitely many times, using the compactness of {\Sigma}, we end up with an immersion which is injective, giving the Whitney embedding theorem.

At present, the embedding {u: M \rightarrow {\bf R}^d} of an {n}-dimensional compact manifold {M} could be extremely high dimensional. However, if {d > 2n+1}, then it is possible to project {u} from {{\bf R}^d} to {{\bf R}^{d-1}} by the random projection trick (discussed in this previous post). Indeed, if one picks a random element {\omega \in S^{d-1}} of the unit sphere, and then lets {T: {\bf R}^d \rightarrow \omega^\perp} be the (random) orthogonal projection to the hyperplane {\omega^\perp} orthogonal to {\omega}, then it is geometrically obvious that {T \circ u: M \rightarrow \omega^\perp} will remain an embedding unless {\omega} either is of the form {\pm \frac{u(x)-u(y)}{\|u(x)-u(y)\|}} for some distinct {x,y \in M}, or lies in the tangent plane to {u(M)} at {u(x)} for some {x \in M}. But the set of all such excluded {\omega} is of dimension at most {2n} (using, for instance, the Hausdorff notion of dimension), and so for {d > 2n+1} almost every {\omega} in {S^{d-1}} will avoid this set. Thus one can use these projections to cut the dimension {d} down by one for {d>2n+1}; iterating this observation we can end up with the final value of {d=2n+1} for the Whitney embedding theorem.

Remark 4 The Whitney embedding theorem for {d=2n} is more difficult to prove. Using the random projection trick, one can arrive at an immersion {u: M \rightarrow {\bf R}^{2n}} which is injective except at a finite number of “double points” where {u(M)} meets itself transversally (think of projecting a knot in {{\bf R}^3} randomly down to {{\bf R}^2}). One then needs to “push” the double points out of existence using a device known as the “Whitney trick”.

— 2. Reduction to a local isometric embedding theorem —

We now begin the proof of the Nash embedding theorem. In this section we make a series of reductions that reduce the “global” problem of isometric embedding a compact manifold to a “local” problem of turning a near-isometric embedding of a torus into a true isometric embedding.

We first make a convenient (though not absolutely necessary) reduction: in order to prove Theorem 2, it suffices to do so in the case when {M} is a torus {({\bf R}/{\bf Z})^n} (equipped with some metric {g} which is not necessarily flat). Indeed, if {M} is not a torus, we can use the Whitney embedding theorem to embed {M} (non-isometrically) into some Euclidean space {{\bf R}^m}, which by rescaling and then quotienting out by {{\bf Z}^m} lets one assume without loss of generality that {M} is some submanifold of a torus {({\bf R}/{\bf Z})^m} equipped with some metric {g}. One can then use a smooth version of the Tietze extension theorem to extend the metric {g} smoothly from {M} to all of {({\bf R}/{\bf Z})^m}; this extended metric {\tilde g} will remain positive definite in some neighbourhood of {M}, so by using a suitable (smooth) partition of unity and taking a convex combination of {\tilde g} with the flat metric on {({\bf R}/{\bf Z})^m}, one can find another extension {g'} of {g} to {({\bf R}/{\bf Z})^m} that remains positive definite (and symmetric) on all of {({\bf R}/{\bf Z})^m}, giving rise to a Riemannian torus {(({\bf R}/{\bf Z})^{m}, g')}. Any isometric embedding of this torus into {{\bf R}^d} will induce an isometric embedding of the original manifold {M}, completing the reduction.

The main advantage of this reduction to the torus case is that it gives us a global system of (periodic) coordinates on {M}, so that we no longer need to work with local coordinate charts. Also, one can easily use Fourier analysis on the torus to verify the ellipticity properties of the Laplacian that we will need later in the proof. These are however fairly minor conveniences, and it would not be difficult to continue the argument below without having first reduced to the torus case.

Henceforth our manifold {M} is assumed to be the torus {({\bf R}/{\bf Z})^n} equipped with a Riemannian metric {g = g_{\alpha \beta}}, where the indices {\alpha,\beta} run from {1} to {n}. Our task is to find an injective map {u: ({\bf R}/{\bf Z})^n \rightarrow {\bf R}^d} which is isometric in the sense that it obeys the system of partial differential equations

\displaystyle  \partial_\alpha u \cdot \partial_\beta u = g_{\alpha \beta}

for {\alpha,\beta=1,\dots,n}, where {\cdot} denotes the usual dot product on {{\bf R}^d}. Let us write this equation as

\displaystyle  Q(u) = g

where {Q(u)} is the symmetric tensor

\displaystyle  Q(u)_{\alpha \beta} := \partial_\alpha u \cdot \partial_\beta u.

The operator {Q} is a nonlinear differential operator, but it behaves very well with respect to pairing:

\displaystyle  Q( (u_1,u_2) ) = Q(u_1) + Q(u_2). \ \ \ \ \ (2)
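Both the definition of {Q} and the pairing identity (2) are easy to check symbolically in a toy case. A sketch of my own (sympy assumed), taking {n=1} so that {Q(u)} has a single component:

```python
import sympy as sp

th = sp.symbols('theta')

def Q(u):
    """The single component of Q(u) for a map of the circle (n = 1)."""
    return sp.simplify(sum(sp.diff(comp, th)**2 for comp in u))

# the standard isometric embedding of the flat circle R/Z into R^2
u1 = [sp.cos(2*sp.pi*th) / (2*sp.pi), sp.sin(2*sp.pi*th) / (2*sp.pi)]
assert Q(u1) == 1  # Q(u1) equals the flat metric g = 1

# the pairing identity (2): Q((u1, u2)) = Q(u1) + Q(u2)
u2 = [th**2, sp.exp(th)]
assert sp.simplify(Q(u1 + u2) - (Q(u1) + Q(u2))) == 0
```

The identity (2) is just the statement that the dot product on {{\bf R}^{d_1+d_2}} splits as the sum of the dot products on the two factors.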

We can use (2) to obtain a number of very useful reductions (at the cost of worsening the eventual value of {d}, which as stated in the introduction we will not be attempting to optimise). First we claim that we can drop the injectivity requirement on {u}, that is to say it suffices to show that every Riemannian metric {g} on {({\bf R}/{\bf Z})^n} is of the form {g = Q(u)} for some map {u: ({\bf R}/{\bf Z})^n \rightarrow {\bf R}^d} into some Euclidean space {{\bf R}^d}. Indeed, suppose that this were the case. Let {u_1: ({\bf R}/{\bf Z})^n \rightarrow {\bf R}^d} be any (not necessarily isometric) embedding (the existence of which is guaranteed by the Whitney embedding theorem; alternatively, one can use the usual exponential map {\theta \mapsto (\cos \theta, \sin \theta)} to embed {{\bf R}/{\bf Z}} into {{\bf R}^2}). For {\varepsilon>0} small enough, the map {\varepsilon u_1} is short in the sense that {Q(\varepsilon u_1) < g} pointwise in the sense of symmetric tensors (or equivalently, the map {\varepsilon u_1} is a contraction from {(M,g)} to {{\bf R}^d}). For such an {\varepsilon}, we can write {g = Q(\varepsilon u_1) + g'} for some Riemannian metric {g'}. If we then write {g' = Q(u')} for some (not necessarily injective) map {u': ({\bf R}/{\bf Z})^n \rightarrow {\bf R}^{d'}}, then from (2) we see that {g = Q( (\varepsilon u_1, u') )}; since {(\varepsilon u_1, u')} inherits its injectivity from the component map {u_1}, this gives the desired isometric embedding.

Call a metric {g} on {({\bf R}/{\bf Z})^n} good if it is of the form {Q(u)} for some map {u: ({\bf R}/{\bf Z})^n \rightarrow {\bf R}^d} into a Euclidean space {{\bf R}^d}. Our task is now to show that every metric is good; the relation (2) tells us that the sum of any two good metrics is good.

In order to make the local theory work later, it will be convenient to introduce the following notion: a map {u: ({\bf R}/{\bf Z})^n \rightarrow {\bf R}^d} is said to be free if, for every point {x \in ({\bf R}/{\bf Z})^n}, the {n} vectors {\partial_\alpha u(x)}, {\alpha=1,\dots,n} and the {\frac{n(n+1)}{2}} vectors {\partial_\alpha \partial_\beta u(x)}, {1 \leq \alpha \leq \beta \leq n} are all linearly independent; equivalently, given a further map {v: ({\bf R}/{\bf Z})^n \rightarrow {\bf R}^d}, there are no dependencies whatsoever between the {n + \frac{n(n+1)}{2}} scalar functions {\partial_\alpha u \cdot v}, {\alpha=1,\dots,n} and {\partial_{\alpha \beta} u \cdot v}, {1 \leq \alpha \leq \beta \leq n}. Clearly, a free map into {{\bf R}^d} is only possible for {d \geq n + \frac{n(n+1)}{2}}, and this explains the bulk of the formula (1) of the best known value of {d}.

For any natural number {m}, the “Veronese embedding” {\iota: {\bf R}^m \rightarrow {\bf R}^{m + \frac{m(m+1)}{2}}} defined by

\displaystyle  \iota(x_1,\dots,x_m) := ( (x_\alpha)_{1 \leq \alpha \leq m}, (x_\alpha x_\beta)_{1 \leq \alpha \leq \beta \leq m} )

can easily be verified to be free. From this, one can construct a free map {u: ({\bf R}/{\bf Z})^n \rightarrow {\bf R}^{m + \frac{m(m+1)}{2}}} by starting with an arbitrary immersion {v: ({\bf R}/{\bf Z})^n \rightarrow {\bf R}^m} and composing it with the Veronese embedding (the fact that the composition is free will follow after several applications of the chain rule).
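For concreteness, here is a small symbolic verification (my own sketch, sympy assumed) that the Veronese map is free in the case {m=2}, where freeness asks for {2 + 3 = 5} independent derivative vectors in {{\bf R}^5}:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')

# Veronese map R^2 -> R^5: the coordinates followed by their pairwise products
iota = sp.Matrix([x1, x2, x1**2, x1*x2, x2**2])

firsts = [sp.diff(iota, v) for v in (x1, x2)]
seconds = [sp.diff(iota, v, w) for (v, w) in [(x1, x1), (x1, x2), (x2, x2)]]

# 5 derivative vectors in R^5; freeness means independence at every point
M = sp.Matrix.hstack(*(firsts + seconds))
d = sp.simplify(M.det())
assert d == 4  # a nonzero constant, so the Veronese map is free everywhere
```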

Given a Riemannian metric {g}, one can find a free map {u} which is short in the sense that {Q(u) < g}, by taking an arbitrary free map and scaling it down by some small scaling factor {\varepsilon>0}. This gives us a decomposition

\displaystyle  g = Q(u) + g'

for some Riemannian metric {g'}.

The metric {Q(u)} is clearly good, so by (2) it would suffice to show that {g'} is good. What is easy to show is that {g'} is approximately good:

Proposition 5 Let {g'} be a Riemannian metric on {({\bf R}/{\bf Z})^n}. Then there exists a smooth symmetric tensor {h} on {({\bf R}/{\bf Z})^n} with the property that {g' + \varepsilon^2 h} is good for every {\varepsilon>0}.

Proof: Roughly speaking, the idea here is to use “tightly wound spirals” to capture various “rank one” components of the metric {g'}, the point being that if a map {u: ({\bf R}/{\bf Z})^n \rightarrow {\bf R}^d} “oscillates” at some high frequency {\xi \in {\bf R}^n} with some “amplitude” {A}, then {Q(u)} is approximately equal to the rank one tensor {A \xi_\alpha \xi_\beta}. The argument here is related to the technique of convex integration, which among other things leads to one way to establish the {h}-principle of Gromov.

By the spectral theorem, every positive definite tensor {g_{\alpha \beta}} can be written as a positive linear combination of symmetric rank one tensors {v_\alpha v_\beta} for some vector {v \in {\bf R}^n}. By adding some additional rank one tensors if necessary, one can make this decomposition stable, in the sense that any nearby tensor {g_{\alpha \beta}} is also a positive linear combination of the {v_\alpha v_\beta}. One can think of {v_\alpha} as the gradient {\partial_\alpha \psi} of some linear function {\psi: {\bf R}^n \rightarrow {\bf R}}. Using compactness and a smooth partition of unity, one can then arrive at a decomposition

\displaystyle  g'_{\alpha \beta} = \sum_{i=1}^m (\eta^{(i)})^2 \partial_\alpha \psi^{(i)} \partial_\beta \psi^{(i)}

for some finite {m}, some smooth scalar functions {\eta^{(i)}, \psi^{(i)}: ({\bf R}/{\bf Z})^n \rightarrow {\bf R}} (one can take {\psi^{(i)}} to be linear functions on small coordinate charts, and {\eta^{(i)}} to basically be cutoffs to these charts).

For any {\varepsilon>0} and {i=1,\dots,m}, consider the “spiral” map {u_\varepsilon^{(i)}: ({\bf R}/{\bf Z})^n \rightarrow {\bf R}^2} defined by

\displaystyle  u_\varepsilon^{(i)} := \varepsilon \eta^{(i)} ( \cos( \frac{\psi^{(i)}}{\varepsilon} ), \sin( \frac{\psi^{(i)}}{\varepsilon} ) )

Direct computation shows that

\displaystyle  Q(u_\varepsilon^{(i)})_{\alpha \beta} = (\eta^{(i)})^2 \partial_\alpha \psi^{(i)} \partial_\beta \psi^{(i)}

\displaystyle  + \varepsilon^2 \partial_\alpha \eta^{(i)} \partial_\beta \eta^{(i)}

and the claim follows by summing in {i} (using (2)) and taking {h_{\alpha \beta} := \sum_{i=1}^m \partial_\alpha \eta^{(i)} \partial_\beta \eta^{(i)}}. \Box

The claim then reduces to the following local (perturbative) statement, that shows that the property of being good is stable around a free map:

Theorem 6 (Local embedding) Let {u: ({\bf R}/{\bf Z})^n \rightarrow {\bf R}^d} be a free map. Then {Q(u)+h} is good for all symmetric tensors {h} sufficiently close to zero in the {C^\infty} topology.

Indeed, assuming Theorem 6, and with {h} as in Proposition 5, we have {Q(u) - \varepsilon^2 h} good for {\varepsilon} small enough. By (2) and Proposition 5, we then have {g = (Q(u) - \varepsilon^2 h) + (g' + \varepsilon^2 h)} good, as required.

The remaining task is to prove Theorem 6. This is a problem in perturbative PDE, to which we now turn.

— 3. Proof of local embedding —

We are given a free map {u: ({\bf R}/{\bf Z})^n \rightarrow {\bf R}^d} and a small tensor {h}. It will suffice to find a perturbation {u+v: ({\bf R}/{\bf Z})^n \rightarrow {\bf R}^d} of {u} that solves the PDE

\displaystyle  Q(u+v) = Q(u) + h.

We can expand the left-hand side and cancel off {Q(u)} to write this as

\displaystyle  L(v)= h - Q(v) \ \ \ \ \ (3)

where the symmetric tensor-valued first-order linear operator {L} is defined (in terms of the fixed free map {u}) as

\displaystyle  L(v)_{\alpha \beta} := \partial_\alpha u \cdot \partial_\beta v + \partial_\beta u \cdot \partial_\alpha v.

To exploit the free nature of {u}, we would like to write the operator {L} in terms of the inner products {\partial_\alpha u \cdot v} and {\partial_{\alpha \beta} u \cdot v}. After some rearranging using the product rule, we arrive at the representation

\displaystyle  L(v)_{\alpha \beta} = \partial_\beta ( \partial_\alpha u \cdot v) + \partial_\alpha( \partial_\beta u \cdot v ) - 2 \partial_{\alpha \beta} u \cdot v.

Among other things, this allows for a way to right-invert the underdetermined linear operator {L}. As {u} is free, we can use Cramer’s rule to find smooth maps {w_{\alpha \beta}: ({\bf R}/{\bf Z})^n \rightarrow {\bf R}^d} for {\alpha,\beta=1,\dots,n} (with {w_{\alpha \beta} = w_{\beta \alpha}}) that are dual to {\partial_{\alpha \beta} u} in the sense that

\displaystyle  \partial_\alpha u \cdot w_{\alpha' \beta'} = 0

\displaystyle  \partial_{\alpha \beta} u \cdot w_{\alpha' \beta'} = \delta_{\alpha \alpha'} \delta_{\beta \beta'} + \delta_{\alpha \beta'} \delta_{\beta \alpha'}

where {\delta} denotes the Kronecker delta. If one then defines the linear zeroth-order operator {M} from symmetric tensors {f = f_{\alpha \beta}} to maps {Mf: ({\bf R}/{\bf Z})^n \rightarrow {\bf R}^d} by the formula

\displaystyle  Mf := -\frac{1}{4} \sum_{1 \leq \alpha,\beta \leq n} f_{\alpha \beta} w_{\alpha \beta}

then direct computation shows that {LMf = f} for any sufficiently regular {f}. As a consequence of this, one could try to use the ansatz {v = Mf} and transform the equation (3) to the fixed point equation

\displaystyle  f = h - Q(Mf). \ \ \ \ \ (4)

One can hope to solve this equation by standard perturbative techniques, such as the inverse function theorem or the contraction mapping theorem, hopefully exploiting the smallness of {h} to obtain the required contraction. Unfortunately we run into a fundamental loss of derivatives problem, in that the quadratic differential operator {Q} loses a degree of regularity, and this loss is not recovered by the operator {M} (which has no smoothing properties).

We know of two ways around this difficulty. The original argument of Nash used what is now known as the Nash-Moser iteration scheme to overcome the loss of derivatives by replacing the simple iterative scheme used in the contraction mapping theorem with a much more rapidly convergent scheme that generalises Newton’s method; see this previous blog post for a similar idea. The other way out, due to Gunther, is to observe that {Q(v)} can be factored as

\displaystyle  Q(v) = L Q_0(v) \ \ \ \ \ (5)

where {Q_0} is a zeroth order quadratic operator, so that (3) can be written instead as

\displaystyle  L( v - Q_0(v) ) = h,

and using the right-inverse {M}, it now suffices to solve the equation

\displaystyle  v = Mh + Q_0(v) \ \ \ \ \ (6)

(compare with (4)), which can be done perturbatively if {Q_0} is indeed zeroth order (e.g. if it is bounded on Hölder spaces such as {C^{2,\alpha}}).
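As a toy illustration of this last perturbative step, one can run Picard iteration on a scalar model of (6); the numbers standing in for {Mh} and {Q_0} below are made up purely for illustration, and the point is just that a bounded quadratic nonlinearity with small data gives a contraction:

```python
import math

# Toy scalar model of the fixed point equation v = Mh + Q0(v):
# Mh is a small number and Q0 a bounded quadratic map (both invented
# for illustration). Picard iteration contracts to the small root of
# 0.5*v**2 - v + Mh = 0.
Mh = 0.1
Q0 = lambda v: 0.5 * v ** 2

v = 0.0
for _ in range(60):
    v = Mh + Q0(v)

exact = 1.0 - math.sqrt(1.0 - 2.0 * Mh)  # small root, about 0.10557
assert abs(v - exact) < 1e-12
```

With large data (say Mh = 1) the same iteration diverges, which is the scalar shadow of why smallness of {h} is needed.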

It remains to achieve the desired factoring (5). We can bilinearise {Q(v)} as {\frac{1}{2} B(v,v)}, where

\displaystyle  B(v,w)_{\alpha \beta} := \partial_\alpha v \cdot \partial_\beta w + \partial_\beta v \cdot \partial_\alpha w.

The basic point is that when {v} is much higher frequency than {w}, then

\displaystyle  B(v,w)_{\alpha \beta} \approx \partial_\alpha ( v \cdot \partial_\beta w ) + \partial_\beta ( v \cdot \partial_\alpha w ) \ \ \ \ \ (7)

which can be approximated by {L} applied to some quantity relating to the vector field {v \cdot \partial_\beta w}; similarly if {w} is much higher frequency than {v}. One can formalise these notions of “much higher frequency” using the machinery of paraproducts, but one can proceed in a slightly more elementary fashion by using the Laplacian operator {\Delta = \sum_{\alpha=1}^n \partial_{\alpha \alpha}} and its (modified) inverse operator {(1-\Delta)^{-1}} (which is easily defined on the torus using the Fourier transform, and has good smoothing properties) as a substitute for the paraproduct calculus. We begin by writing

\displaystyle  Q(v) = \frac{1}{2} B(v,v) = \frac{1}{2} (1 - \Delta)^{-1} ( B(v,v) - \Delta B(v,v) ).

The dangerous term here is {\Delta B(v,v)}. Using the product rule and symmetry, we can write

\displaystyle  \Delta B(v,v) = 2 B(\Delta v, v) + 2 \sum_{\gamma=1}^n B(\partial_\gamma v, \partial_\gamma v).

The second term will be “lower order” in that it only involves second derivatives of {v}, rather than third derivatives. As for the higher order term {B(\Delta v, v)}, the main contribution will come from the terms where {\Delta v} is higher frequency than {v} (since the Laplacian accentuates high frequencies and dampens low frequencies, as can be seen by inspecting the Fourier symbol of the Laplacian). As such, we can profitably use the approximation (7) here. Indeed, from the product rule we have

\displaystyle  B(\Delta v, v)_{\alpha \beta} = \partial_\alpha ( \Delta v \cdot \partial_\beta v) + \partial_\beta (\Delta v \cdot \partial_\alpha v) - 2 \Delta v \cdot \partial_{\alpha \beta} v.

Putting all this together, we obtain the decomposition

\displaystyle  Q(v)_{\alpha \beta} = \partial_\alpha Q_\beta(v) + \partial_\beta Q_\alpha(v) + Q'_{\alpha \beta}(v)


\displaystyle  Q_\alpha(v) := - (1-\Delta)^{-1} ( \Delta v \cdot \partial_\alpha v )


\displaystyle  Q'_{\alpha \beta}(v) := (1-\Delta)^{-1} ( \frac{1}{2} B(v,v)_{\alpha \beta} - \sum_{\gamma=1}^n B(\partial_\gamma v, \partial_\gamma v)_{\alpha \beta} + 2 \Delta v \cdot \partial_{\alpha \beta} v ).

If we then use Cramer’s rule to create smooth functions {w_\alpha} dual to the {\partial_\alpha u} in the sense that

\displaystyle  \partial_\alpha u \cdot w_{\alpha'} = \delta_{\alpha \alpha'}

\displaystyle  \partial_{\alpha \beta} u \cdot w_{\alpha'} = 0

then we obtain the desired factorisation (5) with

\displaystyle  Q_0(v) := \sum_{\alpha=1}^n Q_\alpha(v) w_\alpha - \frac{1}{4} \sum_{1 \leq \alpha,\beta \leq n} Q'_{\alpha \beta}(v) w_{\alpha \beta}.

Note that {Q_0(v)} is the smoothing operator {(1-\Delta)^{-1}} applied to quadratic expressions of up to two derivatives of {v}. As such, one can use elliptic (Schauder) estimates to show that {Q_0} is Lipschitz continuous in the Hölder spaces {C^{2,\alpha}(({\bf R}/{\bf Z})^n)} for {0 < \alpha < 1} (with the Lipschitz constant being small when {v} has small norm); this together with the contraction mapping theorem in the Banach space {C^{2,\alpha}} is already enough to solve the equation (6) in this space if {h} is small enough. This is not quite enough because we also need {v} to be smooth; but it is possible (using Schauder estimates and product Hölder estimates) to establish bounds of the form

\displaystyle  \| Q_0(v) \|_{C^{k,\alpha}} \lesssim \| v \|_{C^{k,\alpha}} \|v\|_{C^{2,\alpha}} + O_k( \|v\|_{C^{k-1,\alpha}}^2 )

for any {k \geq 2} (with implied constants depending on {\alpha,u} but independent of {k}), which can be used (for {h} small enough) to show that the solution {v} constructed by the contraction mapping principle lies in {C^{k,\alpha}} for any {k \geq 2} (by showing that the iterates used in the construction remain bounded in these norms), and is thus smooth.
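As an aside, the modified inverse Laplacian {(1-\Delta)^{-1}} used throughout the proof is completely concrete on the torus: one divides the {k}-th Fourier coefficient by {1 + (2\pi k)^2}, which is also where the two-derivative smoothing is visible. A quick numerical sketch (Python/numpy; not part of the post):

```python
import numpy as np

# (1 - Delta)^{-1} on the torus R/Z, realised with the FFT:
# divide the k-th Fourier coefficient by 1 + (2*pi*k)^2.
N = 256
x = np.arange(N) / N
f = np.exp(np.cos(2 * np.pi * x))             # a smooth periodic function
k = np.fft.fftfreq(N, d=1.0 / N)              # integer frequencies
symbol = 1.0 + (2 * np.pi * k) ** 2           # Fourier symbol of 1 - Delta
u = np.fft.ifft(np.fft.fft(f) / symbol).real  # u = (1 - Delta)^{-1} f

# Sanity check: applying 1 - Delta spectrally recovers f.
recovered = np.fft.ifft(symbol * np.fft.fft(u)).real
assert np.allclose(recovered, f)
```

The division by the symbol damps the mode at frequency {k} by a factor comparable to {k^{-2}}, which is exactly the gain of two derivatives exploited above.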

Filed under: expository, math.AP, math.DG, math.MG Tagged: Nash embedding theorem, Riemannian geometry, Whitney embedding theorem

Terence Tao: Visualising random variables

When teaching mathematics, the traditional method of lecturing in front of a blackboard is still hard to improve upon, despite all the advances in modern technology.  However, there are some nice things one can do in an electronic medium, such as this blog.  Here, I would like to experiment with the ability to animate images, which I think can convey some mathematical concepts in ways that cannot be easily replicated by traditional static text and images. Given that many readers may find these animations annoying, I am placing the rest of the post below the fold.

Suppose we are in the classical (Kolmogorov) framework of probability theory, in which one has a probability space (\Omega, {\mathcal F}, {\mathbb P}) representing all possible states \omega \in \Omega.  One can make a distinction between deterministic quantities x that do not depend on the state \omega, and random variables (or stochastic variables) X = X(\omega) that do depend (in some measurable fashion) on the state \omega.  (As discussed in this previous post, it is often helpful to adopt a perspective that suppresses the sample space \Omega as much as possible, but we will not do so for the current discussion.)

One can visualise the distinction as follows.  If I pick a deterministic integer x between 1 and 6, say 3, then this fixes the value of x for the rest of the discussion:


However, if I pick a random integer X uniformly from \{1,\dots,6\} (e.g. by rolling a fair die), one can think of X as a quantity that keeps changing as one flips from one state to the next:


Here, I have “faked” the randomness by looping together a finite number of images, each of which is depicting one of the possible values X could take.  As such, one may notice that the above image eventually repeats in an endless loop.  One could presumably write some more advanced code to render a more random-looking sequence of X‘s, but the above imperfect rendering should hopefully suffice for the sake of illustration.

Here is a (“faked” rendering of) a random variable Y that also takes values in \{1,\dots,6\}, but is non-uniformly distributed, being more biased towards smaller values than larger values:


For continuous random variables, taking values for instance in {\bf R}^2 with some distribution (e.g. uniform in a square, multivariate gaussian, etc.) one could display these random variables as a rapidly changing dot wandering over {\bf R}^2; if one lets some “afterimages” of previous dots linger for some time on the screen, one can begin to see the probability density function emerge in the animation.  This is unfortunately beyond my ability to quickly whip up as an image; but if someone with a bit more programming skill is willing to do so, I would be very happy to see the result :).
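A static (non-animated) version of this is not hard: accumulate samples into a two-dimensional histogram and the empirical density emerges, just as the afterimages would trace it out. A minimal numpy sketch, assuming for concreteness a standard bivariate Gaussian:

```python
import numpy as np

# Accumulate samples of a 2D Gaussian random variable; the normalized
# histogram (the "afterimages" left behind by the wandering dot)
# approximates the probability density function.
rng = np.random.default_rng(42)
samples = rng.multivariate_normal(mean=[0.0, 0.0],
                                  cov=[[1.0, 0.0], [0.0, 1.0]],
                                  size=100_000)
hist, xedges, yedges = np.histogram2d(samples[:, 0], samples[:, 1],
                                      bins=40, range=[[-4, 4], [-4, 4]],
                                      density=True)
# The empirical density should peak near the origin and vanish far away.
assert hist[20, 20] > hist[0, 0]
```

Rendering this as an animation (dots with decaying afterimages) is then a matter of replaying the samples one at a time.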

The operation of conditioning to an event corresponds to ignoring all states in the sample space outside of the event.  For instance, if one takes the previous random variable X, and conditions to the event X > 3, one gets the conditioned random variable


One can use the animation to help illustrate concepts such as independence or correlation.  If we revert to the unconditioned random variable


and let Z be an independently sampled uniform random variable from \{1,\dots,6\}, one can sum the variables together to create a new random variable X+Z, ranging in \{2,\dots,12\}:



(In principle, the above images should be synchronised, so that the value of X stays the same from line to line at any given point in time.  Unfortunately, due to internet lag, caching, and other web artefacts, you may experience an unpleasant delay between the two.  Closing the page, clearing your cache and returning to the page may help.)

If on the other hand one defines the random variable Z' to be Z' := 7-X, then Z' has the same distribution as Z (they are both uniformly distributed on \{1,\dots,6\}), but now there is a very strong correlation between X and Z', leading to completely different behaviour for X+Z':
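The contrast between Z and Z' is easy to see in simulation; a quick Python sketch (not part of the post):

```python
import random

# X and Z independent uniform on {1,...,6}: X+Z spreads over {2,...,12}.
# Z' = 7 - X has the same (uniform) distribution as Z, but is perfectly
# anti-correlated with X, so X + Z' is always exactly 7.
random.seed(1)
n = 100_000
X = [random.randint(1, 6) for _ in range(n)]
Z = [random.randint(1, 6) for _ in range(n)]
Zp = [7 - x for x in X]

# Z' is still uniform on {1,...,6} ...
assert set(Zp) == set(range(1, 7))
# ... but the sums behave completely differently:
assert set(x + zp for x, zp in zip(X, Zp)) == {7}
assert set(x + z for x, z in zip(X, Z)) == set(range(2, 13))
```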



Filed under: expository, math.PR Tagged: random variables

Jordan Ellenberg: Croot-Lev-Pach on AP-free sets in (Z/4Z)^n

As you know I love the affine cap problem:  how big can a subset of (Z/3Z)^n be that contains no three elements summing to 0 — or, in other words, that contains no 3-term arithmetic progression?  The best upper bounds, due to Bateman and Katz, are on order 3^n / n^(1+epsilon).  And I think it’s fair to say that all progress on this problem, since Meshulam’s initial results, has come from Fourier-analytic arguments.

So I’m charmed by this paper of Ernie Croot, Vsevolod Lev, and Peter Pach which proves a much stronger result for A = (Z/4Z)^n:  a subset with no 3-term arithmetic progression has size at most c^n for c strictly less than 4.  Better still (for an algebraic geometer) the argument has no harmonic analysis at all, but proceeds via the polynomial method!

This is surprising for two reasons.  First, it’s hard to make the polynomial method work well for rings, like Z/4Z, that aren’t fields; extending our knowledge about additive combinatorics to such settings is a long-standing interest of mine.  Second, the polynomial method over finite fields usually works in the “fixed dimension large field” regime; problems like affine cap, where the base ring is fixed and the dimension is growing, have so far been mostly untouched.

As for the first issue, here’s the deal.  This looks like a problem over Z/4Z but is really a problem over F_2, because the condition for being a 3-term AP

a – 2b + c = 0

has a 2 in it.  In other words:  the two outer terms have to lie in the same coset of 2A, and the middle term is only determined up to 2A.

 So CLP recast the problem as follows.  Let S be a large subset of A with no 3-term AP.   Let V be 2A, which is an n-dimensional vector space over F_2.  For each v in V, there’s a coset of V consisting of the solutions to 2a = v, and we can let S_v be the intersection of S with this coset.

We want to make this a problem about V, not about A.  So write T_v for a translate of S_v by some element of the coset, so T_v now sits in V.  Which element?  Doesn’t matter!

We can now write the “no 3-term AP” condition strictly in terms of these subsets of V.  Write (T_v – T_v)^* for the set of differences between distinct elements of T_v.  Write U for the set of v in V such that T_v is nonempty.  Then the union over all v in U of

(T_v – T_v)^* + v

is disjoint from U.

I leave it as an exercise to check the equivalence.

Now we have a combinatorial question about vector spaces over F_2; we want to show that, under the condition above, the sum of |T_v| over all v in U can’t be too large.

This is where the polynomial method comes in!  CLP show that (over any field, not just F_2), a polynomial of low degree vanishing on (T_v – T_v)^* has to vanish at 0 as well; this is Lemma 1 in their paper.  So write down a polynomial P vanishing on V – U; by dimension considerations we can choose one which doesn’t vanish on all of V.  (This uses the fact that the squarefree monomials of degree up to d are linearly independent functions on F_2^n.)  If U is big, we can choose P to have lowish degree.
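The parenthetical fact is easy to check by machine for small n: the evaluation matrix M[S,T] = [S ⊆ T] of squarefree monomials at points of F_2^n is unitriangular with respect to inclusion, hence invertible. A quick verification sketch in Python (not from the paper):

```python
from itertools import product

def gf2_rank(vectors):
    """Rank over GF(2); each vector is encoded as an int bitmask."""
    pivots = {}  # highest set bit -> basis vector with that pivot
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]
    return len(pivots)

n = 4
points = list(product([0, 1], repeat=n))  # all 2^n points of F_2^n
rows = []
for S in range(1 << n):  # S encodes the squarefree monomial prod_{i in S} x_i
    bits = 0
    for j, pt in enumerate(points):
        if all(pt[i] == 1 for i in range(n) if (S >> i) & 1):
            bits |= 1 << j  # the monomial evaluates to 1 at this point
    rows.append(bits)

# The 2^n squarefree monomials are linearly independent as functions on F_2^n.
assert gf2_rank(rows) == 1 << n
```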

Since P vanishes on V-U, P has to vanish on (T_v – T_v)^* + v for all v.  Since P has low degree, it has to vanish on v too, for all v.  But then P vanishes everywhere, contrary to our assumption.

The magic of the paper is in Lemma 1, in my view, which is where you really see the polynomial method applied in this unusual fixed-field-large-dimension regime.  Let me say a vague word about how it works.  (The actual proof is less than a page, by the way, so I’m not hiding much!)  Let P be your polynomial and d its degree.  You send your vector space into a subvariety of a much larger vector space W via the degree-d Veronese embedding F_d. In fact you do this twice, writing

V x V -> W x W.

Now if P is your polynomial of degree-d, you can think of P(v_1 – v_2) as a bilinear form <,> on W x W.  Suppose S is a subset of V such that P(s_1 – s_2) vanishes for all distinct s_1, s_2 in S.   That means

<F_d(s_1), F_d(s_2)> = 0

for all distinct s_1, s_2 in S.  On the other hand,

<F_d(s_1), F_d(s_1)>

doesn’t depend on s_1; it just takes the value P(0).  So if P(0) is not equal to 0, you have |S| vectors of nonzero norm which are mutually orthogonal under this bilinear form, and so there can be at most dim W of these, and that’s the bound on |S| you need.

This is very slick and I hope the idea is more generally applicable!




May 13, 2016

Doug Natelson: Interacting Quantum Systems Driven Out of Equilibrium - day 2

Continuing into day 2 of our workshop:

  • Bryce Gadway of the University of Illinois spoke about using cold atoms in an optical lattice to simulate topological and disordered systems.  His group has implemented an optical lattice constructed not by interference of retroreflected lasers, but by interference between a laser and counterpropagating beams frequency shifted by precise, controlled amounts.  As a non-atomic physics person I'm a bit fuzzy on the details, but the point is that this allows his group to put in place precise control of the on-site potential of each lattice site and to dial in designer phase shifts associated with tunneling between adjacent sites, on demand.  This means it is possible to study transport problems (like Bloch oscillations) as well as introducing designer, time-varying, site-specific disorder if desired.  
  • I spoke about my group's work on heating and dissipation in atomic- and molecular-scale junctions driven out of equilibrium (and into a steady state) by electronic bias.  I framed the discussion in terms of how hard it is to obtain truly local information about vibrational and electronic distributions in such driven systems.  On the vibration side, if you're interested, I suggest looking here, here, and here, with a recent related result here.  On the electronic front, I talked about published (here and here) and some unpublished data looking at electronic shot noise at high biases in atomic-scale metal junctions.  
  • Eugene Demler from Harvard (my grad school classmate) gave a nice talk that addressed nonequilibrium aspects of both cold atoms and electrons.   For example, he and collaborators have developed some theoretical machinery for looking at a cold atom version of the orthogonality catastrophe - what happens if you suddenly "turn on" interactions between a cold Fermi gas and a single impurity, and watch the dynamics.  These same theoretical techniques can be applied to solid state systems as well.  (This is just a subset of what was presented.)
  • Ryo Shimano from Tokyo University gave a very pretty talk about optical manipulation and driving of the Higgs mode inside superconductors.  You can hit a superconductor with THz radiation as a pump, and then probe at some delay with additional THz radiation.  If the pump is at the right frequency (energy half the superconducting gap, in the s-wave case), you can excite collective sloshing of the condensate (see here and scroll down to the first example).  As you might imagine, things get more rich and complicated with more exotic superconductors (multiband or unconventional).
  • Emil Yuzbashyan from Rutgers presented a look at the fundamental issues involved in non-thermal steady states of ensembles of quantum particles at long times after a quench (a sudden change in some parameter).  As I wrote in the first-day discussion, the interesting question here is when does the system evolve seemingly coherently (i.e., the particles slosh around in recurring patterns, just as a Newton's cradle ticks back and forth), and when does the system instead tend toward a long-time state that looks like a randomized, thermalized condition?   To see how this relates to classical mechanics, see these articles (here and here) that I need to find time to read.  
  • Lastly, my colleague Matt Foster from Rice spoke about quenched BCS superfluids, topology, spectral probes, and gapless (topological) superconductivity under intense THz pumping.  This was a neat pedagogical talk about this work.  It touches some of the same issues as the Shimano talk above.  One aspect that I found interesting to consider:  You can have a system where a quench drives some collective oscillations, and those collective oscillations act as a Floquet perturbation, changing the effective band structure and giving rise to nonlinearities that continue the oscillations.  Wild stuff - here are the slides.  
We then had a lunch that segued smoothly into a relaxed poster session for students and postdocs.  Overall, it was a great workshop and a good chance to get people from diverse areas of CM and AMO physics together over common interests in nonequilibrium response of quantum systems.  Thanks to everyone who was able to come, to our staff organizer who made everything actually come together, and of course to our sponsors (NSF DMR, ICAM-I2CAM, the Gordon and Betty Moore Foundation, Lakeshore, Advantest, and Coherent)!

Jordan Ellenberg: Bourgain, Gamburd, Sarnak on Markoff triples

Such a great colloquium last week by Peter Sarnak, this year’s Hilldale Lecturer, on his paper with Bourgain and Gamburd.  My only complaint is that he promised to talk about the mapping class group and then barely did!  So I thought I’d jot down what their work has to do with mapping class groups and spaces of branched covers.

Let E be a genus 1 Riemann surface — that is, a torus — and O a point of E.  Then pi_1(E-O) is just a free group on two generators, whose commutator is (the conjugacy class of) a little loop around the puncture.  If G is a group, a G-cover of E branched only at O is thus a map from pi_1(E-O) to G, which is to say a pair (a,b) of elements of G.  Well, such a pair considered up to conjugacy, since we didn’t specify a basepoint for our pi_1.  And actually, we might as well just think about the surjective maps, which is to say the connected G-covers.

Let’s focus on the case G = SL_2(Z).  And more specifically on those maps where the puncture class is sent to a matrix of trace -2.  Here’s an example:  we can take

a_0 = \left[ \begin{array}{cc} 1 & 1 \\ 1 & 2 \end{array} \right]

b_0 = \left[ \begin{array}{cc} 2 & 1 \\ 1 & 1 \end{array} \right]

You can check that in this case the puncture class has trace -2; that is, it is the negative of a unipotent matrix.  Actually, I gotta be honest, these matrices don’t generate SL_2(Z); they generate a finite-index subgroup H of SL_2(Z), its commutator subgroup.

Write S for the set of all conjugacy classes of pairs (a,b) of matrices which generate H and have commutator with trace -2.  It turns out that this set is the set of integral points of an affine surface called the Markoff surface:  namely, if we take x = Tr(a)/3, y = Tr(b)/3, and z = Tr(ab)/3, then the three traces obey the relation

x^2 + y^2 + z^2 = 3xyz

and indeed every solution to this equation corresponds to an element of S.

So the integral points on the Markoff surface are acted on by an infinite discrete group.  Which if you just look at the equation seems like kind of a ridiculous miracle.  But in the setting of H-covers is very natural.  Because there’s a natural group acting on S: namely, the mapping class group Γ of type (1,1).  This group’s whole purpose in life is to act on the fundamental group of a once-punctured torus!  (For readers unfamiliar with mapping class groups, I highly recommend Benson Farb and Dan Margalit’s wonderful textbook.)   So you start with a surjection from pi_1(E-O) to H, you compose with the action of  Γ, and you get a new homomorphism.  The action of  Γ on pi_1(E-O) is only outer, but that’s OK, because we’re only keeping track of conjugacy classes of homomorphisms from pi_1(E-O) to H.

So Γ acts on S; and now the lovely theorem is that this action is transitive.

I don’t want to make this mapping class group business sound more abstract than it is.  Γ isn’t a mystery group; it acts on H_1(E-O), a free abelian group of rank 2, which gives a map from Γ to SL_2(Z), which turns out to be an isomorphism.  What’s more, the action of Γ on pairs (a,b) is completely explicit; the standard unipotent generators of SL_2(Z) map to the moves

(a,b) -> (ab,b)

(a,b) -> (a,ab)

(Sanity check:  each of these transformations preserves the conjugacy class of the commutator of a and b.)

Sarnak, being a number theorist, is interested in strong approximation: are the integral solutions of the Markoff equation dense in the adelic solutions?   In particular, if I have a solution to the Markoff equation over F_p — which is to say, a pair (a,b) in SL_2(F_p) with the right commutator — can I lift it to a solution over Z?

Suppose I have a pair (\bar{a},\bar{b}) in SL_2(F_p) which lifts to a pair (a,b).  We know (a,b) = g(a_0,b_0) for some g in Γ.  Thus (\bar{a},\bar{b}) = g(\bar{a}_0,\bar{b}_0).  In other words, if strong approximation is true, Γ acts transitively on the set S_p of Markoff solutions mod p.  And this is precisely what Bourgain, Gamburd, and Sarnak conjecture.  (In fact, they conjecture more:  that the Cayley-Schreier graph of this action is an expander, which is kind of a quantitative version of an action being transitive.)  One reason to believe this:  if we replace F_p with C, we replace S with the SL_2(C) character variety of pi_1(E-O), and Goldman showed long ago that the action of mapping class groups on complex character varieties of fundamental groups was ergodic; it mixes everything around very nicely.

Again, I emphasize that this is on its face a question of pure combinatorial group theory.  You want to know if you can get from any pair of elements in SL_2(F_p) with negative-unipotent commutator to any other via the two moves above.  You can set this up on your computer and check that it holds for lots and lots of p (they did.)  But it’s not clear how to prove this transitivity for all p!
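That computer check is easy to reproduce in trace coordinates: the moves above induce the standard Vieta involutions x -> 3yz – x (in each coordinate, together with permutations) on the Markoff surface, and transitivity amounts to the nonzero solutions mod p forming a single orbit under these moves. A brute-force sketch for a few small primes (not from the paper; using the Vieta-plus-permutation generators is an assumption of this sketch, though it is the standard presentation of the action):

```python
from itertools import permutations

def markoff_connected(p):
    """Check that the Vieta moves x -> 3yz - x (in each coordinate),
    together with coordinate permutations, act transitively on the
    nonzero solutions of x^2 + y^2 + z^2 = 3xyz mod p."""
    sols = {(x, y, z)
            for x in range(p) for y in range(p) for z in range(p)
            if (x * x + y * y + z * z - 3 * x * y * z) % p == 0
            and (x, y, z) != (0, 0, 0)}
    start = (1, 1, 1)  # always a solution, since 3 = 3*1*1*1
    seen, stack = {start}, [start]
    while stack:
        x, y, z = stack.pop()
        for t in (((3 * y * z - x) % p, y, z),
                  (x, (3 * x * z - y) % p, z),
                  (x, y, (3 * x * y - z) % p)):
            for s in permutations(t):
                if s not in seen:
                    seen.add(s)
                    stack.append(s)
    return seen == sols

for p in (3, 5, 7):
    assert markoff_connected(p)
```

(Since the Vieta moves are involutions, closing the orbit under "permutation after Vieta move" also closes it under the permutations themselves.)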

They’re not quite there yet.  But what they can prove is that the action of Γ on S_p has a very big orbit, and has no very small orbits.

Now that G is the finite group SL_2(F_p), we’re in my favorite situation, that of Hurwitz spaces.  The mapping class group Γ is best seen as the fundamental group of the moduli stack M_{1,1} of elliptic curves.  So an action of Γ on the finite set S_p is just a cover H_p of M_{1,1}.  It is nothing but the Hurwitz space parametrizing maps (f: X -> E) where E is an elliptic curve and f an SL_2(F_p)-cover branched only at the origin.  What Bourgain, Gamburd, and Sarnak conjecture is that H_p is connected.

If you like, this is a moduli space of curves with nonabelian level structure as in deJong and Pikaart.  Or, if you prefer (and if H_p is actually connected) it is a noncongruence modular curve corresponding to the stabilizer of an element of S_p in Γ = SL_2(Z).  This stabilizer is in general going to be a noncongruence subgroup, except it is a congruence subgroup in the more general sense of Thurston.

This seems like an interesting family of algebraic curves!  What, if anything, can we say about it?

May 12, 2016

Sean Carroll: Big Picture Part Six: Caring

One of a series of quick posts on the six sections of my book The Big Picture: Cosmos, Understanding, Essence, Complexity, Thinking, Caring.

Chapters in Part Six, Caring:

  • 45. Three Billion Heartbeats
  • 46. What Is and What Ought to Be
  • 47. Rules and Consequences
  • 48. Constructing Goodness
  • 49. Listening to the World
  • 50. Existential Therapy

In this final section of the book, we take a step back to look at the journey we’ve taken, and ask what it implies for how we should think about our lives. I intentionally kept it short, because I don’t think poetic naturalism has much prescriptive advice to give along these lines. Resisting the temptation to give out a list of “Ten Naturalist Commandments,” I instead offer a list of “Ten Considerations,” things we can keep in mind while we decide for ourselves how we want to live.

A good poetic naturalist should resist the temptation to hand out commandments. “Give someone a fish,” the saying goes, “and you feed them for a day. Teach them to fish, and you feed them for a lifetime.” When it comes to how to lead our lives, poetic naturalism has no fish to give us. It doesn’t even really teach us how to fish. It’s more like poetic naturalism helps us figure out that there are things called “fish,” and perhaps investigate the various possible ways to go about catching them, if that were something we were inclined to do. It’s up to us what strategy we want to take, and what to do with our fish once we’ve caught them.

There are nevertheless some things worth saying, because there are a lot of untrue beliefs to which we all tend to cling from time to time. Many (most?) naturalists have trouble letting go of the existence of objective moral truths, even if they claim to accept the idea that the natural world is all that exists. But you can’t derive ought from is, so an honest naturalist will admit that our ethical principles are constructed rather than derived from nature. (In particular, I borrow the idea of “Humean constructivism” from philosopher Sharon Street.) Fortunately, we’re not blank slates, or computers happily idling away; we have aspirations, desires, preferences, and cares. More than enough raw material to construct workable notions of right and wrong, no less valuable for being ultimately subjective.

Of course there are also incorrect beliefs on the religious or non-naturalist side of the ledger, from the existence of divinely-approved ways of being to the promise of judgment and eternal reward for good behavior. Naturalists accept that life is going to come to an end — this life is not a dress rehearsal for something greater, it’s the only performance we get to give. The average person can expect a lifespan of about three billion heartbeats. That’s a goodly number, but far from limitless. We should make the most of each of our heartbeats.


The finitude of life doesn’t imply that it’s meaningless, any more than obeying the laws of physics implies that we can’t find purpose and joy within the natural world. The absence of a God to tell us why we’re here and hand down rules about what is and is not okay doesn’t leave us adrift — it puts the responsibility for constructing meaningful lives back where it always was, in our own hands.

Here’s a story one could imagine telling about the nature of the world. The universe is a miracle. It was created by God as a unique act of love. The splendor of the cosmos, spanning billions of years and countless stars, culminated in the appearance of human beings here on Earth — conscious, aware creatures, unions of soul and body, capable of appreciating and returning God’s love. Our mortal lives are part of a larger span of existence, in which we will continue to participate after our deaths.

It’s an attractive story. You can see why someone would believe it, and work to reconcile it with what science has taught us about the nature of reality. But the evidence points elsewhere.

Here’s a different story. The universe is not a miracle. It simply is, unguided and unsustained, manifesting the patterns of nature with scrupulous regularity. Over billions of years it has evolved naturally, from a state of low entropy toward increasing complexity, and it will eventually wind down to a featureless equilibrium condition. We are the miracle, we human beings. Not a break-the-laws-of-physics kind of miracle; a miracle in that it is wondrous and amazing how such complex, aware, creative, caring creatures could have arisen in perfect accordance with those laws. Our lives are finite, unpredictable, and immeasurably precious. Our emergence has brought meaning and mattering into the world.

That’s a pretty darn good story, too. Demanding in its own way, it may not give us everything we want, but it fits comfortably with everything science has taught us about nature. It bequeaths to us the responsibility and opportunity to make life into what we would have it be.

I do hope people enjoy the book. As I said earlier, I don’t presume to be offering many final answers here. I do think that the basic precepts of naturalism provide a framework for thinking about the world that, given our current state of knowledge, is overwhelmingly likely to be true. But the hard work of understanding the details of how that world works, and how we should shape our lives within it, is something we humans as a species have really only just begun to tackle in a serious way. May our journey of discovery be enlivened by frequent surprises!

Sean CarrollBig Picture Part Five: Thinking

One of a series of quick posts on the six sections of my book The Big Picture: Cosmos, Understanding, Essence, Complexity, Thinking, Caring.

Chapters in Part Five, Thinking:

  • 37. Crawling Into Consciousness
  • 38. The Babbling Brain
  • 39. What Thinks?
  • 40. The Hard Problem
  • 41. Zombies and Stories
  • 42. Are Photons Conscious?
  • 43. What Acts on What?
  • 44. Freedom to Choose

Even many people who willingly describe themselves as naturalists — who agree that there is only the natural world, obeying laws of physics — are brought up short by the nature of consciousness, or the mind-body problem. David Chalmers famously distinguished between the “Easy Problems” of consciousness, which include functional and operational questions like “How does seeing an object relate to our mental image of that object?”, and the “Hard Problem.” The Hard Problem is the nature of qualia, the subjective experiences associated with conscious events. “Seeing red” is part of the Easy Problem, “experiencing the redness of red” is part of the Hard Problem. No matter how well we might someday understand the connectivity of neurons or the laws of physics governing the particles and forces of which our brains are made, how can collections of such cells or particles ever be said to have an experience of “what it is like” to feel something?

These questions have been debated to death, and I don’t have anything especially novel to contribute to discussions of how the brain works. What I can do is suggest that (1) the emergence of concepts like “thinking” and “experiencing” and “consciousness” as useful ways of talking about macroscopic collections of matter should be no more surprising than the emergence of concepts like “temperature” and “pressure”; and (2) our understanding of those underlying laws of physics is so incredibly solid and well-established that there should be an enormous presumption against modifying them in some important way just to account for a phenomenon (consciousness) which is admittedly one of the most subtle and complex things we’ve ever encountered in the world.

My suspicion is that the Hard Problem won’t be “solved,” it will just gradually fade away as we understand more and more about how the brain actually does work. I love this image of the magnetic fields generated in my brain as neurons squirt out charged particles, evidence of thoughts careening around my gray matter. (Taken by an MEG machine in David Poeppel’s lab at NYU.) It’s not evidence of anything surprising — not even the most devoted mind-body dualist is reluctant to admit that things happen in the brain while you are thinking — but it’s a vivid illustration of how closely our mental processes are associated with the particles and forces of elementary physics.


The divide between those who doubt that physical concepts can account for subjective experience and those who think they can is difficult to bridge precisely because of the word “subjective” — there are no external, measurable quantities we can point to that might help resolve the issue. In the book I highlight this gap by imagining a dialogue between someone who believes in the existence of distinct mental properties (M) and a poetic naturalist (P) who thinks that such properties are a way of talking about physical reality:

M: I grant you that, when I am feeling some particular sensation, it is inevitably accompanied by some particular thing happening in my brain — a “neural correlate of consciousness.” What I deny is that one of my subjective experiences simply is such an occurrence in my brain. There’s more to it than that. I also have a feeling of what it is like to have that experience.

P: What I’m suggesting is that the statement “I have a feeling…” is simply a way of talking about those signals appearing in your brain. There is one way of talking that speaks a vocabulary of neurons and synapses and so forth, and another way that speaks of people and their experiences. And there is a map between these ways: when the neurons do a certain thing, the person feels a certain way. And that’s all there is.

M: Except that it’s manifestly not all there is! Because if it were, I wouldn’t have any conscious experiences at all. Atoms don’t have experiences. You can give a functional explanation of what’s going on, which will correctly account for how I actually behave, but such an explanation will always leave out the subjective aspect.

P: Why? I’m not “leaving out” the subjective aspect, I’m suggesting that all of this talk of our inner experiences is a very useful way of bundling up the collective behavior of a complex collection of atoms. Individual atoms don’t have experiences, but macroscopic agglomerations of them might very well, without invoking any additional ingredients.

M: No they won’t. No matter how many non-feeling atoms you pile together, they will never start having experiences.

P: Yes they will.

M: No they won’t.

P: Yes they will.

I imagine that close analogues of this conversation have happened countless times, and are likely to continue for a while into the future.

Doug NatelsonInteracting Quantum Systems Driven Out of Equilibrium - day 1 (updated - complete)

Our workshop was fun and interesting.   There are multiple ways to drive physical systems out of equilibrium - you can take some system and push on it with some force, for example.  In the case of a condensed matter system (whether solid state or trapped cold atoms), you can apply a bias - some difference in population (or chemical potential or pressure) that drives the system, either by adding kinetic energy to it or encouraging the flow of matter and/or charge.  You can apply a temperature difference across the system, driving some average flow of energy through the system's degrees of freedom.  You can shine light on the system, adding energy and momentum either at a steady rate or in a sudden pulse.  One favorite piece of vocabulary these days is a quench - suddenly (compared with relaxation rates of the system) changing some condition like the potential energy of the particles, and then watching the response of the system's degrees of freedom.  Does the system "thermalize"?  That is, do the microscopic pieces of the system interact with each other and redistribute energy so that there seems to be some effective temperature?  Or does the system fail to thermalize, and instead slosh around in some non-thermal configuration for a long time?  There are many open issues.

We had 13 talks on the first day, and I don't want to write exhaustive summaries of all of them.  We will eventually be posting pdf files of the relevant slides.  That being said, I will give a super-brief description of each, and link to a relevant paper or two so that you can see what was discussed.

  • Nadya Mason from UIUC spoke about her group's work on engineered superconducting/normal metal structures in magnetic fields.  These devices allow studies of current-driven motion of trapped magnetic flux.  In some sense this is an old, established problem, but traditional models actually do a poor job of reproducing the experimental data.  The experiments are here, and it looks like it's important to include some "delayed friction" to understand vortex motion.
  • Jonathan Bird from Buffalo spoke about his group's studies of quantum point contacts in semiconductors, where it's long been known how to measure electronic conduction down to the limit of discrete quantum channels, where the devices act like waveguides for the electrons.   His group has developed some high speed techniques for making sub-ns electronic measurements, and what really gets interesting is when systems are driven hard, so that the electronic bias is the largest energy scale in the problem - you have to worry quite a bit about exciting phonons and what they do.  A key result is the apparent formation of a specific, somewhat heating-immune transport mode when such a point contact is driven really hard.
  • David Goldhaber-Gordon from Stanford spoke about his group's recent experiments looking at quantum dots, some building on work looking at the so-called two-channel Kondo effect.  An unpaired electron is placed in the position of trying to couple to two (carefully tuned to be) independent baths of electrons.  Some of the not-yet-published results look at interesting scaling as one tunes through the accessible regimes, and involved some stunningly pretty device fabrication done at the Weizmann Institute.  Other experiments looked at the apparent emergence of symmetry in systems comprising two quantum dots.
  • Tilman Esslinger of ETH presented his group's great work on using cold atoms to look at systems rather analogous to the ones Prof. Bird had mentioned.  They can create blobs of fermionic cold atom fluids of unequal populations, and link them by a carefully controlled constriction, and then they can image transport.  If they squeeze the contact to be effectively one dimensional, they can see quantized conductance of atoms (just as solid state folks can do with charge in a quantum point contact).  They can use atomic physics methods to dial around the interactions between the particles, and can then look at how this affects dissipation in the out of equilibrium situation.  Gorgeous stuff.
  • Takashi Oka of the Max Planck Institutes in Dresden talked about Floquet theory and using lasers to control the topology of the band structure of materials.  There was a lot to this talk, and it's not easy to summarize.  In Floquet theory, you apply a periodic driving potential to a quantum system.  Just like a spatially periodic potential energy picks out certain spatial periodicities and gives you a compact way of looking at band structure, temporal periodicity creates what you could call replicas of the band structure but shifted in energy by multiples of \( \hbar \omega\), where \(\omega\) is the driving frequency.  If you do this right, the driven system can have topological edge states.  You can also use periodic driving to reorient the magnetization of materials as if you had a whopping huge effective magnetic field.
  • Andrew Millis of Columbia University has worked on many relevant topics, and in this case chose to speak about theory he and collaborators have done regarding a recent experiment looking at vanadium dioxide.  That material has a structural phase transition at 65 °C that separates a low-temperature, monoclinic, insulating state from a high-temperature, tetragonal, metallic state.  In the experiment, optical excitation puts the material into a metallic state without actually leaving the monoclinic crystal structure.  The theory suggests that this is a correlation effect - scoop electrons out of the lower Hubbard band and drop them into the upper band, and interorbital interaction effects can stabilize a new, metastable electronic structure that's a metal.
  • Alessandra Lanzara of Berkeley gave a really nice talk about her group's work on time-resolved angle-resolved photoemission.  You hit a material of interest with an ultrafast, time-resolved pump pulse of near-infrared light (1.5 eV photons), and then at some known delay you smack the system with a 6 eV probe pulse at a particular polarization and orientation, and measure the energy and momentum distribution of the electrons that get kicked out.  This lets you measure the transient electronic structure.  They've been able to use this approach to study the dynamics of quasiparticles in cuprate superconductors, how Cooper pairs respond to such pumping, etc.
  • N. Peter Armitage at Johns Hopkins articulated nicely three reasons to "go nonequilibrium":  to learn about elementary excitations of an equilibrium phase; to access "phases" not possible in equilibrium material configurations; and to look for new "phases" that have no equilibrium analog.  He then gave a fun talk about using optical spectroscopy techniques to look at many-body relaxations (older paper here) in the Coulomb glass phase of lightly doped semiconductors - when there are strongly interacting, localized electrons in a disordered configuration so that screening is poor.  Interestingly, these systems relax more slowly when the carrier densities get higher, in physics related to the orthogonality catastrophe.
  • My faculty colleague Jun Kono from Rice spoke about so-called Dicke phenomena (such as superradiance, superfluorescence) in semiconductors.  These effects are great examples of nonequilibrium physics, when a driven system (say a semiconductor in a magnetic field illuminated by THz radiation that spans the energy scale of the cyclotron resonance, \(\omega_{\mathrm{c}} = e B/m^{*}\)) spontaneously develops coherence among the many electron-hole excitations in the system.  You can put such a system in a clever kind of 1d optical cavity, and approach the "strong coupling" regime so that the energetic coupling between the charge carriers and the photons in the cavity is comparable to the cyclotron energy.
  • Christof Weitenberg from Hamburg then spoke about exciting results in simulating condensed matter systems using cold atoms in optical lattices.  One piece of physics that's very in vogue right now because of the rise of topology and various 2d materials is Berry curvature.  It's hard to explain this in brief - if you look at how the energy bands of a material as a function of crystal momentum \(E(\mathbf{k})\) are curved, the wavefunction of a particle traversing some closed trajectory in \(\mathbf{k}\)-space can pick up a phase factor related to that curvature.  In Weitenberg's experiments, cleverly arranged laser beams can create designer lattices.  Shaking the lasers periodically as a function of time can lead to the same Floquet physics discussed above, changing the effective band structure for atoms confined in those lattices, and through cool imaging techniques the experimentalists can reconstruct the Berry curvature that they have designed into that effective band structure.
  • Another colleague Kaden Hazzard from Rice gave a nice theoretical talk about different nonequilibrium collective phenomena in ultracold atomic matter.  One aspect involved dilute molecules with electric dipoles (KRb) trapped in an optical lattice.  Because of their dipole moments, the molecules interact with each other over long ranges (dipole-dipole interactions scale like \(1/r^{3}\)), and their relaxation after getting dinged is governed by many-body interaction effects.  Another system is trapped Rydberg atoms, where dipolar interactions scale like the principal quantum number to the eleventh power (!).
  • Andrea Cavalleri from the Max Planck in Hamburg (and also spending time at Oxford) spoke about his group's very high profile work that I've already described here.  The central question here is really can driving a quantum material stabilize collective states like superconductivity that have coherence, correlations, and remarkable physical properties that would be absent without the drive.  Both Cavalleri and Oka made reference to this video, which shows how driving a classical pendulum can render the inverted position of the pendulum stable.  The experiments themselves are truly remarkable.
  • In the last talk of Day 1, Sarang Gopalakrishnan of Caltech gave a theory talk again examining the response of driven many-body quantum systems, focusing particularly on the issue of many-body localization.  That is, when do the quantum dynamics of a many-body system lead to a real breakdown of quantum ergodicity, so that the degrees of freedom get "stuck", having large variability of local observables (instead of things being smoothed out and looking thermally smeared) and comparatively weak entanglement (which grows more slowly with system size than in the effectively thermal case).  He pointed out experimental challenges, that experiments probe dynamics rather than quantum eigenstates and that everything really is coupled (however weakly) to some thermal "bath", but argued that these issues aren't fatal to the interesting physics.

n-Category Café The Works of Charles Ehresmann

Charles Ehresmann’s complete works are now available for free here:

There are 630 pages on algebraic topology and differential geometry; 800 pages on local structures and ordered categories, and their applications to topology; 900 pages on structured categories and quotients, and internal categories and fibrations; and 850 pages on sketches and completions, and sketches and monoidal closed structures.

That’s 3180 pages!

On top of this, more issues of the journal he founded, Cahiers de Topologie et Géométrie Différentielle Catégoriques, will become freely available online.

Andrée Ehresmann announced this magnificent gift to the world on the category theory mailing list, writing:

We are pleased to announce that the issues of the Cahiers de Topologie et Géométrie Différentielle Catégoriques, from Volume L (2009) to LV (2014) included, are now freely downloadable from the internet site of the Cahiers:

through the hyperlink to Recent Volumes.

In the future the issues of the Cahiers will become freely available on the site of the Cahiers two years after their paper publication. We recall that papers published up to Volume XLIX are accessible on the NUMDAM site.

Moreover, the 7 volumes of Charles Ehresmann: Oeuvres complètes et commentées (edited by A. Ehresmann from 1980-83 as Supplements to the Cahiers) are now also freely downloadable from the site

These 2 sites are included in the site of Andrée Ehresmann

and they can also be accessed through hyperlinks on its first page.


Andrée Ehresmann, Marino Gran and René Guitart,

Chief-Editors of the Cahiers

Jordan EllenbergSongs: Ramrao Parsatwar, EEK, Dick Diver

A couple of songs I want to remember I like.

“Jaltarang” is an old record by Ramrao Parsatwar.  The jaltarang is a percussion instrument consisting of bowls partially filled with water.  This comes to me via radiooooo, which I heartily recommend.

Something about this instrumental reminds me of EEK’s amazing “Trinity” — I don’t know what to say about this one except that it comes from Egypt and it sounds kind of like video game music, but from a video game too fun to humanly play.

But as you know I mostly listen to square Anglophone guitar-based indie so here’s a gem in that vein, “Waste the Alphabet,” by Dick Diver, like the EEK track a 2015 release.  Vocals a little like John Vanderslice, bass that sounds like Athens, GA in 1982, sort of New Zealandy guitar (this is actually from Australia.)  Hits all my spots.


May 11, 2016

Sean CarrollBig Picture Part Four: Complexity

One of a series of quick posts on the six sections of my book The Big Picture: Cosmos, Understanding, Essence, Complexity, Thinking, Caring.

Chapters in Part Four, Complexity:

  • 28. The Universe in a Cup of Coffee
  • 29. Light and Life
  • 30. Funneling Energy
  • 31. Spontaneous Organization
  • 32. The Origin and Purpose of Life
  • 33. Evolution’s Bootstraps
  • 34. Searching Through the Landscape
  • 35. Emergent Purpose
  • 36. Are We the Point?

One of the most annoying arguments a scientist can hear is that “evolution (or the origin of life) violates the Second Law of Thermodynamics.” The idea is basically that the Second Law says things become more disorganized over time, but the appearance of life represents increased organization, so what do you have to say about that, Dr. Smarty-Pants?

This is a very bad argument, since the Second Law only says that entropy increases in closed systems, not open ones. (Otherwise refrigerators would be impossible, since the entropy of a can of Diet Coke goes down when you cool it.) The Earth’s biosphere is obviously an open system — we get low-entropy photons from the Sun, and radiate high-entropy photons back to the universe — so there is manifestly no contradiction between the Second Law and the appearance of complex structures.

As right and true as that response is, it doesn’t quite address the question of why complex structures actually do come into being. Sure, they can come into being without violating the Second Law, but that doesn’t quite explain why they actually do. In Complexity, the fourth part of The Big Picture, I talk about why it’s very natural for such a thing to happen. This covers the evolution of complexity in general, as well as specific questions about the origin of life and Darwinian natural selection. When it comes to abiogenesis, there’s a lot we don’t know, but good reason to be optimistic about near-term progress.

In 2000, Gretchen Früh-Green, on a ship in the mid-Atlantic Ocean as part of an expedition led by marine geologist Deborah Kelley, stumbled across a collection of ghostly white towers in the video feed from a robotic camera near the ocean floor deep below. Fortunately they had with them a submersible vessel named Alvin, and Kelley set out to explore the structure up close. Further investigation showed that it was just the kind of alkaline vent formation that the geochemist Michael Russell had anticipated. Two thousand miles east of South Carolina, not far from the Mid-Atlantic Ridge, the Lost City hydrothermal vent field is at least 30,000 years old, and may be just the first known example of a very common type of geological formation. There’s a lot we don’t know about the ocean floor.

Lost City

The chemistry in vents like those at Lost City is rich, and driven by the sort of gradients that could reasonably prefigure life’s metabolic pathways. Reactions familiar from laboratory experiments have been able to produce a number of amino acids, sugars, and other compounds that are needed to ultimately assemble RNA. In the minds of the metabolism-first contingent, the power source provided by disequilibria must come first; the chemistry leading to life will eventually piggyback upon it.

Albert Szent-Györgyi, a Hungarian physiologist who won the Nobel Prize in 1937 for the discovery of Vitamin C, once offered the opinion that “Life is nothing but an electron looking for a place to rest.” That’s a good summary of the metabolism-first view. There is free energy locked up in certain chemical configurations, and life is one way it can be released. One compelling aspect of the picture is that it’s not simply working backwards from “we know there’s life, how did it start?” Instead, it’s suggesting that life is the solution to a problem: “we have some free energy, how do we liberate it?”

Planetary scientists have speculated that hydrothermal vents similar to Lost City might be abundant on Jupiter’s moon Europa or Saturn’s moon Enceladus. Future exploration of the Solar System might be able to put this picture to a different kind of test.

A tricky part of this discussion is figuring out when it’s okay to say that a certain naturally-evolved organism or characteristic has a “purpose.” Evolution itself has no purpose, but according to poetic naturalism it’s perfectly okay to ascribe purposes to specific things or processes, as long as that kind of description actually provides a useful way of talking about the higher-level emergent behavior.

Jordan EllenbergLMFDB!

Very happy to see that the L-functions and Modular Forms Database is now live!

When I was a kid we looked up our elliptic curves in Cremona’s tables, on paper.  Then William Stein created the Modular Forms Database (you can still go there but it doesn’t really work) and suddenly you could look at the q-expansions of cusp forms in whatever weight and level you wanted, up to the limits of what William had computed.

The LMFDB is a sort of massively souped up version of Cremona and Stein, put together by a team of dozens and dozens of number theorists, including too many friends of mine to name individually.  And it’s a lot more than what the title suggests:  the incredibly useful Jones-Roberts database of local fields is built in; there’s a database of genus 2 curves over Q with small conductor; it even has Maass forms!  I’ve been playing with it all night.  It’s like an adventure playground for number theorists.

One of my first trips through Stein’s database came when I was a postdoc and was thinking about Ljunggren’s equation y^2 + 1 = 2x^4.  This equation has a large solution, (13,239), which has to do with the classical identity

\pi/4 = 4\arctan(1/5) - \arctan(1/239).

It turns out, as I explain in an old survey paper, that the existence of such a large solution is “explained” by the presence of a certain weight-2 cuspform in level 1024 whose mod-5 Galois representation is reducible.

With the LMFDB, you can easily wander around looking for more such examples!  For instance, you can very easily ask the database for non-CM elliptic curves whose mod-7 Galois representation is nonsurjective.  Among those, you can find this handsome curve of conductor 1296, which has very large height relative to its conductor.  Applying the usual Frey curve trick you can turn this curve into the Diophantine oddity

3*48383^2 - 1915^3 = 2^13.

Huh — I wonder whether people ever thought about this Diophantine problem, when can the difference between a cube and three times a square be a power of 2?  Of course they did!  I just Googled

48383 Diophantine

and found this paper of Stanley Rabinowitz from 1978, which finds all solutions to that problem, of which this one is the largest.

Now whether you can massage this into an arctan identity, that I don’t know!
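All three identities quoted in this post are easy to sanity-check in a few lines of Python:

```python
from math import atan, isclose, pi

# Ljunggren's equation y^2 + 1 = 2x^4 and its large solution (x, y) = (13, 239)
assert 239**2 + 1 == 2 * 13**4

# the classical identity that the large solution is tied to
assert isclose(pi / 4, 4 * atan(1 / 5) - atan(1 / 239))

# the Diophantine oddity obtained from the Frey curve trick
assert 3 * 48383**2 - 1915**3 == 2**13
```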



May 10, 2016

Tim GowersThe L-functions and modular forms database

With each passing decade, mathematics grows substantially. As it grows, mathematicians are forced to become more specialized — in the sense of knowing a smaller fraction of the whole — and the time and effort needed to get to the frontier of what is known, and perhaps to contribute to it, increases. One might think that this process will eventually mean that nobody is prepared to make the effort any more, but fortunately there are forces that work in the opposite direction. With the help of the internet, it is now far easier to find things out, and this makes research a whole lot easier in important ways.

It has long been a conviction of mine that the effort-reducing forces we have seen so far are just the beginning. One way in which the internet might be harnessed more fully is in the creation of amazing new databases, something I once asked a Mathoverflow question about. I recently had cause (while working on a research project with a student of mine, Jason Long) to use Sloane’s database in a serious way. That is, a sequence of numbers came out of some calculations we did, we found it in the OEIS, that gave us a formula, and we could prove that the formula was right. The great thing about the OEIS was that it solved an NP-ish problem for us: once the formula was given to us, it wasn’t that hard to prove that it was correct for our sequence, but finding it in the first place would have been extremely hard without the OEIS.

I’m saying all this just to explain why I rejoice that a major new database was launched today. It’s not in my area, so I won’t be using it, but I am nevertheless very excited that it exists. It is called the L-functions and modular forms database. The thinking behind the site is that lots of number theorists have privately done lots of difficult calculations concerning L-functions, modular forms, and related objects. Presumably up to now there has been a great deal of duplication, because by no means all these calculations make it into papers, and even if they do it may be hard to find the right paper. But now there is a big database of these objects, with a large amount of information about each one, as well as a great big graph of connections between them. I will be very curious to know whether it speeds up research in number theory: I hope it will become a completely standard tool in the area and inspire people in other areas to create databases of their own.

Terence TaoL-functions and modular forms database now out of beta

Over the last few years, a large group of mathematicians have been developing an online database to systematically collect the known facts, numerical data, and algorithms concerning some of the most central types of objects in modern number theory, namely the L-functions associated to various number fields, curves, and modular forms, as well as further data about these modular forms.  This of course includes the most famous examples of L-functions and modular forms respectively, namely the Riemann zeta function \zeta(s) and the discriminant modular form \Delta(q), but there are countless other examples of both. The connections between these classes of objects lie at the heart of the Langlands programme.

As of today, the “L-functions and modular forms database” is now out of beta, and open to the public; at present the database is mostly geared towards specialists in computational number theory, but will hopefully develop into a more broadly useful resource over time.  An article by John Cremona summarising the purpose of the database can be found here.

(Thanks to Andrew Sutherland and Kiran Kedlaya for the information.)


John BaezShelves and the Infinite

Infinity is a very strange concept. Like alien spores floating down from the sky, large infinities can come down and contaminate the study of questions about ordinary finite numbers! Here’s an example.

A shelf is a set with a binary operation \rhd that distributes over itself:

a \rhd (b \rhd c) = (a \rhd b) \rhd (a \rhd c)

There are lots of examples, the simplest being any group, where we define

g \rhd h = g h g^{-1}
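The group example can be checked mechanically. Here is a small sketch of my own (not from the original post) verifying the self-distributive law for conjugation in the group of permutations of three elements:

```python
from itertools import permutations

def compose(p, q):
    """(p ∘ q)(i) = p(q(i)); permutations stored as tuples."""
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    """Inverse permutation: inv[p[i]] = i."""
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

def shelf(g, h):
    """The conjugation shelf operation: g ▷ h = g h g⁻¹."""
    return compose(compose(g, h), inverse(g))

perms = list(permutations(range(3)))
# Check a ▷ (b ▷ c) = (a ▷ b) ▷ (a ▷ c) for all 6³ = 216 triples.
assert all(
    shelf(a, shelf(b, c)) == shelf(shelf(a, b), shelf(a, c))
    for a in perms for b in perms for c in perms
)
print("self-distributivity holds for all", len(perms) ** 3, "triples")
```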

They have a nice connection to knot theory: the self-distributive law is exactly what is needed for the operation to be compatible with the third Reidemeister move, in which one strand slides past a crossing of two others.

My former student Alissa Crans, who invented the term ‘shelf’, has written a lot about them, starting here:

• Alissa Crans, Lie 2-Algebras, Chapter 3.1: Shelves, Racks, Spindles and Quandles, Ph.D. thesis, U.C. Riverside, 2004.

I could tell you a long and entertaining story about this, including the tale of how shelves got their name. But instead I want to talk about something far more peculiar, which I understand much less well. There’s a strange connection between shelves, extremely large infinities, and extremely large finite numbers! It was first noticed by a logician named Richard Laver in the late 1980s, and it’s been developed further by Randall Dougherty.

It goes like this. For each n, there’s a unique shelf structure on the numbers \{1,2, \dots ,2^n\} such that

a \rhd 1 = a + 1 \bmod 2^n

So, in this shelf we have

1 \rhd 1 = 2

2 \rhd 1 = 3

and so on, until we get to

2^n \rhd 1 = 1

However, we can now calculate

1 \rhd 1

1 \rhd 2

1 \rhd 3

and so on. You should try it yourself for a simple example! You’ll need to use the self-distributive law. It’s quite an experience.

You’ll get a list of 2^n numbers, but this list will not contain all the numbers \{1, 2, \dots, 2^n\}. Instead, it will repeat with some period P(n).
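This calculation is easy to automate. Here is a rough sketch of my own (not from the post) that builds the shelf operation from the two rules — a ▷ 1 = a + 1 (mod 2^n), and the consequence of self-distributivity a ▷ (b+1) = (a ▷ b) ▷ (a ▷ 1) — and reads off the period P(n) of the first row:

```python
import sys

sys.setrecursionlimit(100_000)

def laver_op(n):
    """Return the shelf operation on {1, ..., 2**n} determined by
    a ▷ 1 = a + 1 (mod 2**n) together with the self-distributive law,
    which forces a ▷ (b + 1) = (a ▷ b) ▷ (a ▷ 1)."""
    N = 2 ** n
    memo = {}

    def op(a, b):
        if (a, b) not in memo:
            if b == 1:
                memo[(a, b)] = a % N + 1  # a + 1, with 2**n wrapping around to 1
            else:
                memo[(a, b)] = op(op(a, b - 1), a % N + 1)
        return memo[(a, b)]

    return op

def P(n):
    """Period of the first row: 2**n first appears at position P(n)."""
    op = laver_op(n)
    row = [op(1, b) for b in range(1, 2 ** n + 1)]
    return row.index(2 ** n) + 1

print([P(n) for n in range(6)])  # → [1, 1, 2, 4, 4, 8]
```

For n = 2, for example, the first row comes out as 2, 4, 2, 4, which repeats with period P(2) = 2.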

Here is where things get weird. The numbers P(n) form this sequence:

1, 1, 2, 4, 4, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, … and the 16s continue, hundreds upon hundreds of them, for as far as anyone has ever computed …

It may not look like it, but the numbers in this sequence approach infinity… at least, if we assume an extra axiom, one that goes beyond the usual axioms of set theory, but so far seems consistent!

This axiom asserts the existence of an absurdly large cardinal, called an I3 rank-into-rank cardinal.

I’ll say more about this kind of cardinal later. But, this is not the only case where a ‘large cardinal axiom’ has consequences for down-to-earth math, like the behavior of some sequence that you can define using simple rules.

On the other hand, Randall Dougherty has proved a lower bound on how far you have to go out in this sequence to reach the number 32.

And, it’s an incomprehensibly large number!

The third Ackermann function A_3(n) is roughly 2 to the nth power. The fourth Ackermann function A_4(n) is roughly 2 raised to itself n times:

2 \uparrow\uparrow n = 2^{2^{\cdot^{\cdot^{\cdot^{2}}}}} \quad (n \mbox{ twos})

And so on: each Ackermann function is defined by iterating the previous one.

Dougherty showed that for the sequence P(n) to reach 32, you have to go at least

n = A(9,A(8,A(8,255)))

This is an insanely large number!

I should emphasize that if we use just the ordinary axioms of set theory, the ZFC axioms, nobody has proved that the sequence P(n) ever reaches 32. Neither is it known that this is unprovable if we only use ZFC.

So, what we’ve got here is a very slowly growing sequence… which is easy to define but grows so slowly that (so far) mathematicians need new axioms of set theory to prove it goes to infinity, or even reaches 32.

I should admit that my definition of the Ackermann function is rough. In reality it’s defined like this:

A(m, n) = \begin{cases} n+1 & \mbox{if } m = 0 \\ A(m-1, 1) & \mbox{if } m > 0 \mbox{ and } n = 0 \\ A(m-1, A(m, n-1)) & \mbox{if } m > 0 \mbox{ and } n > 0. \end{cases}
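A direct memoized implementation of this definition (a quick sketch of my own, not from the post) confirms the closed forms for small m, with the number 3 sneaking in:

```python
from functools import lru_cache
import sys

sys.setrecursionlimit(100_000)

@lru_cache(maxsize=None)
def A(m, n):
    """The two-argument Ackermann function, exactly as defined above."""
    if m == 0:
        return n + 1
    if n == 0:
        return A(m - 1, 1)
    return A(m - 1, A(m, n - 1))

# Closed forms for small m, each with a 3 sneaking in:
for n in range(5):
    assert A(1, n) == 2 + (n + 3) - 3
    assert A(2, n) == 2 * (n + 3) - 3
    assert A(3, n) == 2 ** (n + 3) - 3
print(A(3, 3))  # → 61
```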

And if you work this out, you’ll find it’s a bit annoying. Somehow the number 3 sneaks in:

A(1,n) = 2 + (n+3) - 3

A(2,n) = 2 \cdot (n+3) - 3

A(3,n) = 2^{n+3} - 3

A(4,n) = 2\uparrow\uparrow(n+3) - 3

where a \uparrow\uparrow b means a raised to itself b times,

A(5,n) = 2 \uparrow\uparrow\uparrow(n+3) - 3

where a \uparrow\uparrow\uparrow b means a \uparrow\uparrow (a \uparrow\uparrow (a \uparrow\uparrow \cdots )) with the number a repeated b times, and so on.

However, these irritating 3’s scarcely matter, since Dougherty’s number is so large… and I believe he could have gotten an even larger lower bound if he wanted.

Perhaps I’ll wrap up by saying very roughly what an I3 rank-into-rank cardinal is.

In set theory the universe of all sets is built up in stages. These stages are called the von Neumann hierarchy. The lowest stage has nothing in it:

V_0 = \emptyset

Each successive stage is defined like this:

V_{\lambda + 1} = P(V_\lambda)

where P(S) is the power set of S, that is, the set of all subsets of S. For ‘limit ordinals’, that is, ordinals that aren’t of the form \lambda + 1, we define

\displaystyle{ V_\lambda = \bigcup_{\alpha < \lambda} V_\alpha }
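Concretely, the first few finite stages look like this, with sizes growing as iterated powers of 2:

```latex
V_0 = \emptyset, \qquad
V_1 = \{\emptyset\}, \qquad
V_2 = \{\emptyset, \{\emptyset\}\}, \qquad
V_3 = \{\emptyset, \{\emptyset\}, \{\{\emptyset\}\}, \{\emptyset,\{\emptyset\}\}\},
\qquad |V_{n+1}| = 2^{|V_n|}.
```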

An I3 rank-into-rank cardinal is an ordinal \lambda such that V_\lambda admits a nontrivial elementary embedding into itself.

Very roughly, this means the infinity \lambda is so huge that the collection of sets that can be built by this stage can be mapped, in a one-to-one but not onto way, onto a smaller collection that’s indistinguishable from the original one when it comes to the validity of anything you can say about sets!

More precisely, a nontrivial elementary embedding of V_\lambda into itself is a one-to-one but not onto function

f: V_\lambda \to V_\lambda

that preserves and reflects the validity of all statements in the language of set theory. That is: for any sentence \phi(a_1, \dots, a_n) in the language of set theory, this statement holds for sets a_1, \dots, a_n \in V_\lambda if and only if \phi(f(a_1), \dots, f(a_n)) holds.

I don’t know why, but an I3 rank-into-rank cardinal, if it’s even consistent to assume one exists, is known to be extraordinarily big. What I mean by this is that it automatically has a lot of other properties known to characterize large cardinals. It’s inaccessible (which is big) and ineffable (which is bigger), and measurable (which is bigger), and huge (which is even bigger), and so on.

How in the world is this related to shelves?

The point is that if

f, g : V_\lambda \to V_\lambda

are elementary embeddings, we can apply f to any set in V_\lambda. But in set theory, functions are sets too: sets of ordered pairs. So, g is a set. It’s not an element of V_\lambda, but all its subsets g \cap V_\alpha are, where \alpha < \lambda. So, we can define

f \rhd g = \bigcup_{\alpha < \lambda} f (g \cap V_\alpha)

Laver showed that this operation distributes over itself:

f \rhd (g \rhd h) = (f \rhd g) \rhd (f \rhd h)

And, he showed that if we take one elementary embedding and let it generate a shelf by this operation, we get the free shelf on one generator!

The shelf I started out describing, the numbers \{1, \dots, 2^n \} with

a \rhd 1 = a + 1 \bmod 2^n

also has one generator, namely the number 1. So, it’s a quotient of the free shelf on one generator by one relation, namely the above equation.

That’s about all I understand. I don’t understand how the existence of a nontrivial elementary embedding of V_\lambda into itself implies that the function P(n) goes to infinity, and I don’t understand Randall Dougherty’s lower bound on how far you need to go to reach P(n) = 32. For more, read these:

• Richard Laver, The left distributive law and the freeness of an algebra of elementary embeddings, Adv. Math. 91 (1992), 209–231.

• Richard Laver, On the algebra of elementary embeddings of a rank into itself, Adv. Math. 110 (1995), 334–346.

• Randall Dougherty and Thomas Jech, Finite left distributive algebras and embedding algebras, Adv. Math. 130 (1997), 201–241.

• Randall Dougherty, Critical points in an algebra of elementary embeddings, Ann. Pure Appl. Logic 65 (1993), 211–241.

• Randall Dougherty, Critical points in an algebra of elementary embeddings, II.

May 09, 2016

BackreactionBook review: “The Big Picture” by Sean Carroll

The Big Picture: On the Origins of Life, Meaning, and the Universe Itself
Sean Carroll
Dutton (May 10, 2016)

Among the scientific disciplines, physics is unique: Concerned with the most fundamental entities, its laws must be respected in all other areas of science. While there are many emergent laws which are interesting in their own right – from neurobiology to sociology – there is no doubt they all have to be compatible with energy conservation. And the second law of thermodynamics. And quantum mechanics. And the standard model better be consistent with whatever you think are the neurological processes that make you “you.” There’s no avoiding physics.

In his new book, The Big Picture, Sean explains just why you can’t ignore physics when you talk about extrasensory perception, consciousness, god, afterlife, free will, or morals. In the first part, Sean lays out what, to our best current knowledge, the fundamental laws of nature are, and what their relevance is for all other emergent laws. In the later parts he then goes through the consequences that follow from this.

On the way from quantum field theory to morals, he covers what science has to say about complexity, the arrow of time, and the origin of life. (If you attended the 2011 FQXi conference, parts will sound very familiar.) Then, towards the end of the book, he derives advice from his physics-based philosophy – which he calls “poetic naturalism” – for finding “meaning” in life and finding a “good” way to organize our living together (scare quotes because these words might not mean what you think they mean). His arguments rely heavily on Bayesian reasoning, so you better be prepared to update your belief system while reading.

The Big Picture is, above everything, a courageous book – and an overdue one. I have had many arguments about exactly the issues that Sean addresses in his book – from “qualia” to “downwards causation” – but I neither have the patience nor the interest to talk people out of their cherished delusions. I’m an atheist primarily because I think religion would be wasting my time, time that I’d rather spend on something more insightful. Trying to convince people that their beliefs are inconsistent would also be wasting my time, hence I don’t. But if I did, I almost certainly wouldn’t be able to remain as infallibly polite as Sean.

So, I am super happy about this book. Because now, whenever someone brings up Mary The Confused Color Scientist who can’t tell sensory perception from knowledge about that perception, I’ll just – politely – tell them to read Sean’s book. The best thing I learned from The Big Picture is that apparently Frank Jackson, the philosopher who came up with The Color Scientist, eventually conceded that the argument was wrong. The world of philosophy indeed sometimes moves! Time then, to stop talking about qualia.

I really wish I had found something to disagree with in Sean’s book, but the only quibble I have (you won’t be surprised to hear) is that I think what Sean-The-Compatibilist calls “free will” doesn’t deserve being called “free will.” Using the adjective “free” strongly suggests an independence from the underlying microscopic laws, and hence a case of “strong emergence” – which is an idea that should go into the same bin as qualia. I also agree with Sean however that fighting about the use of words is moot.

(The other thing I’m happy about is that, leaving aside the standard model and general relativity, Sean’s book has almost zero overlap with the book I’m writing. *wipes_sweat_off_forehead*. Could you all please stop writing books until I’m done, it makes me nervous.)

In any case, it shouldn’t come as a surprise that I agree so wholeheartedly with Sean because I think everybody who open-mindedly looks at the evidence – i.e. all we currently know about the laws of nature – must come to the same conclusions. The main obstacle in conveying this message is that most people without training in particle physics don’t understand effective field theory, and consequently don’t see what this implies for the emergence of higher level laws. Sean does a great job overcoming this obstacle.

I wish I could make myself believe that after the publication of Sean’s book I’ll never again have to endure someone insisting there must be something about their experience that can’t be described by a handful of elementary particles. But I’m not very good at making myself believe in exceedingly unlikely scenarios, whether that’s the existence of an omniscient god or the ability of humans to agree on how unlikely this existence is. At the very least however, The Big Picture should make clear that physicists aren’t just arrogant when they say their work reveals insights that reach far beyond the boundaries of their discipline. Physics indeed has an exceptional status among the sciences.

[Disclaimer: Free review copy.]

May 08, 2016

Doug NatelsonUpdates coming - Interacting Quantum Systems Driven Out of Equilibrium

As I'd advertised, the Rice Center for Quantum Materials is hosting a two-day workshop on interacting quantum systems driven out of equilibrium.  This event brings together people from roughly three different perspectives:  people who worry about (solid state) systems driven out of equilibrium by electrical bias; people who worry about quantum systems driven out of equilibrium by light (often ultrafast and/or very intense); and people who leverage the amazing cleanliness and tunability of cold atom systems to examine driven quantum many-body systems.   I've been taking notes, and after the workshop wraps up today I'll post some highlights.

John BaezVehicle-to-Grid

One of the big problems with intermittent power sources like wind and solar is the difficulty of storing energy. But if we ever get a lot of electric vehicles, we’ll have a lot of batteries—and at any time, most of these vehicles are parked. So, they can be connected to the power grid.

This leads to the concept of vehicle-to-grid or V2G. In a V2G system, electric vehicles can connect to the grid, with electricity flowing from the grid to the vehicle or back. Cars can help solve the energy storage problem.

Here’s something I read about vehicle-to-grid systems in Sierra magazine:

At the University of Delaware, dozens of electric vehicles sit in a uniform row. They’re part of an experiment involving BMW, power-generating company NRG, and PJM—a regional organization that moves electricity around 13 states and the District of Columbia—that’s examining how electric vehicles can give energy back to the electricity grid.

It works like this: When the cars are idle (our vehicles typically sit 95 percent of the time), they’re plugged in and able to deliver the electricity in their batteries back to the grid. When energy demand is high, they return electricity to the grid; when demand is low, they absorb electricity. One car doesn’t offer much, but 30 of them is another story—worth about 300 kilowatts of power. Utilities will pay for this service, called “load leveling,” because it means that they don’t have to turn on backup power plants, which are usually coal or natural gas burners. And the EV owners get regular checks—approximately $2.50 a day, or about $900 a year.

It’s working well, according to Willett Kempton, a longtime V2G guru and University of Delaware professor who heads the school’s Center for Carbon-Free Power Integration: “In three years hooked up to the grid, the revenue was better than we thought. The project, which is ongoing, shows that V2G is viable. We can earn money from cars that are driven regularly.”

V2G still has some technical hurdles to overcome, but carmakers—and utilities, too—want it to happen. In a 2014 report, Edison Electric Institute, the power industry’s main trade group, called on utilities to promote EVs [electric vehicles], describing EV adoption as a “quadruple win” that would sustain electricity demand, improve customer relations, support environmental goals, and reduce utilities’ operating costs.

Utilities appear to be listening. In Virginia and North Carolina, Dominion Resources is running a pilot project to identify ways to encourage EV drivers to only charge during off-peak demand. In California, San Diego Gas & Electric will be spending $45 million on a vehicle-to-grid integration system. At least 25 utilities in 14 states are offering customers some kind of EV incentive. And it’s not just utilities—the Department of Defense is conducting V2G pilot programs at four military bases.

Paula DuPont-Kidd, a spokesperson for PJM, says V2G is especially useful for what’s called “frequency regulation service”—keeping electricity transmissions at a steady 60 cycles per second. “V2G has proven its ability to be a resource to the grid when power is aggregated,” she says. “We know it’s possible. It just hasn’t happened yet.”

I wonder how much, exactly, this system would help.
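To get a rough sense, here is a back-of-envelope calculation of my own, using only the figures quoted above plus an assumed 1-gigawatt reference power plant (the plant size is my assumption, not from the article):

```python
# Figures from the article: 30 parked EVs ≈ 300 kW, paying ≈ $2.50/day.
cars = 30
fleet_kw = 300
kw_per_car = fleet_kw / cars        # 10 kW of grid capacity per plugged-in car

pay_per_day = 2.50
pay_per_year = pay_per_day * 365    # ≈ $900/year, as the article says

# Assumed for scale (not from the article): a 1 GW power plant.
plant_kw = 1_000_000
cars_to_match_plant = plant_kw / kw_per_car

print(kw_per_car, round(pay_per_year), int(cars_to_match_plant))  # → 10.0 912 100000
```

So on these numbers, matching one large power plant would take on the order of a hundred thousand plugged-in vehicles — a lot, but small compared to the size of national car fleets.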

My quote comes from here:

• Jim Motavalli, Siri, will connected vehicles be greener?, Sierra, May–June 2016.

Motavalli also discusses vehicle-to-vehicle connectivity and vehicle-to-building systems. The latter could let your vehicle power your house during a blackout—which seems of limited use to me, but maybe I don’t get the point.

In general, it seems good for everything I own to be able to talk to everything else. There will be security concerns. But as we move toward ‘ecotechnology’, our gadgets should become less obtrusive, less hungry for raw power, more communicative, and more intelligent.