Planet Musings

April 21, 2015

David Hogg: Phil Marshall

Phil Marshall showed up today to give the astrophysics seminar. He also attended the CampHogg group meeting. In his seminar, he talked about finding and exploiting strong gravitational lenses in large sky surveys to make precise (and, importantly, accurate) inferences about the expansion history (or redshift—distance relation). He showed that when you are concerned that you might be affected by severe systematics, the best approach is to make your model much more flexible but then learn the relationships among the new nuisance parameters that make the much more flexible model nonetheless still informative. This requires hierarchical inference, which both Marshall and I have been pushing on the community for some years now.

In group meeting, he had the group members talk about the things they are most excited about. Among other things, this got Angus talking about periodograms with much better noise models under the hood and it got Foreman-Mackey talking about linear algebra tricks that might change our lives. Huppenkothen blew Marshall away with her example light-curves from GRS 1915. Marshall himself said he was excited about building a full three-dimensional model of all the mass density inside the Hubble volume, using both weak lensing and large-scale structure simultaneously. He has some ideas about baby steps that might make first projects tractable in the short run.

April 20, 2015

David Hogg: recap

The only real research I did today was a recap of projects with Foreman-Mackey as he prepares to complete his thesis. There are a lot of projects open, and there is some decision-making about what ought to be highest priority.

Chad Orzel: My Valuable Extra Phone

Back when we went to London for Worldcon (and then I went to Sweden for a workshop), I bought a smartphone in Heathrow thinking I could sell it back when I left. That turned out not to work the way we thought, but it’s served me well ever since as an e-reader. It can’t connect to the local cell network, but I can download stuff via wi-fi, and it’s small enough to hold in one hand, and back-lit, which makes it nice for reading in bed and on planes.

The lack of a cell connection, though, means it’s just running on its onboard clock, so has gotten out of synch with my US phone. But even that is impressively good, and as I explain at Forbes, would’ve made me rich in 18th-century London.

So, you know, if that sounds interesting, go over there and read all about it…

Clifford Johnson: Festivities (I)

Love this picture posted by USC's Facebook page*. (I really hope that we did not go over the heads of our - very patient** - audience during the Festival of Books panel...) -cvj *They don't give a photo credit, so I'm pointing you back to the posting here until I work it out.

Resonaances: 2014 Mad Hat awards

New Year is traditionally the time of recaps and best-ofs. This blog is focused on particle physics beyond the standard model, where compiling such lists is challenging given the dearth of discoveries, or even plausible signals, pointing to new physics. Therefore I thought I could honor those who struggle to promote our discipline by finding new signals against all odds, and sometimes against all logic. Every year from now on, the Mad Hat will be awarded to the researchers who make the most outlandish claim of a particle-physics-related discovery, on the condition that it gets enough public attention.

The 2014 Mad Hat award unanimously goes to Andy Read, Steve Sembay, Jenny Carter, Emile Schyns, and, posthumously, George Fraser, for the paper Potential solar axion signatures in X-ray observations with the XMM-Newton observatory. Although the original arXiv paper sadly went unnoticed, this remarkable work was publicized several months later by a Royal Astronomical Society press release and by an article in the Guardian.

The crucial point in this kind of endeavor is to choose an observable that is noisy enough to easily accommodate a new physics signal. In this particular case the observable is the x-ray emission from Earth's magnetosphere, which could include a component from axion dark matter emitted from the Sun and converting to photons. A naive axion hunter might expect the conversion signal to be observed by looking at the Sun (that is, the photon inherits the momentum of the incoming axion), something that XMM cannot do due to technical constraints. The authors thoroughly address this point in a sentence in the Introduction, concluding that it would be nice if the x-rays could scatter afterwards at the right angle. The signal searched for is then an annual modulation of the x-ray emission, since the magnetic field strength in XMM's field of view is on average larger in summer than in winter. A seasonal dependence of the x-ray flux is indeed observed, for which axion dark matter is clearly the most plausible explanation.

Congratulations to all involved. Nominations for the 2015 Mad Hat award are open as of today ;) Happy New Year everyone!

Resonaances: Weekend Plot: Fermi and more dwarfs

This weekend's plot comes from the recent paper of the Fermi collaboration:

It shows the limits on the cross section of dark matter annihilation into tau lepton pairs. The limits are obtained from gamma-ray observations of 15 dwarf galaxies over 6 years. Dwarf galaxies are satellites of the Milky Way made mostly of dark matter, with few stars in them, which makes them a clean environment in which to search for dark matter signals. This study is particularly interesting because it is sensitive to dark matter models that could explain the gamma-ray excess detected from the center of the Milky Way. Similar limits for annihilation into b-quarks have already been shown before at conferences; in that case, the region favored by the Galactic center excess seems entirely excluded. Annihilation of 10 GeV dark matter into tau leptons could also explain the excess, and, as can be seen in the plot, in this case there is also large tension with the dwarf limits, although astrophysical uncertainties help to keep hopes alive.

Gamma-ray observations by Fermi will continue for another few years, and the limits will get stronger. But a faster way to increase the statistics may be to find more observation targets. Numerical simulations with vanilla WIMP dark matter predict a few hundred dwarfs around the Milky Way. Interestingly, a discovery of several new dwarf candidates was reported last week. This is an important development, as the total number of known dwarf galaxies now exceeds the number of dwarf characters in Peter Jackson movies. One of the candidates, known provisionally as DES J0335.6-5403 or Reticulum-2, has a large J-factor (the larger the better, much like the h-index). In fact, some gamma-ray excess around 1-10 GeV is observed from this source, and one paper last week even quantified its significance as ~4 astrosigma (or ~3 astrosigma in an alternative, more conservative analysis). However, in the Fermi analysis using the more recent Pass-8 photon reconstruction, the significance quoted is only 1.5 sigma. Moreover, the dark matter annihilation cross section required to fit the excess is excluded by an order of magnitude by the combined dwarf limits. Therefore, for the moment, the excess should not be taken seriously.

Resonaances: LHCb: B-meson anomaly persists

Today LHCb released a new analysis of the angular distribution in the B0 → K*0(892) (→K+π-) μ+ μ- decays. In this 4-body decay process, the angles between the directions of flight of all the different particles can be measured as a function of the invariant mass q^2 of the di-muon pair. The results are summarized in terms of several form factors with imaginative names like P5', FL, etc. The interest in this particular decay comes from the fact that 2 years ago LHCb reported a large deviation from the standard model prediction in one q^2 region of one form factor, called P5'. That measurement was based on 1 inverse femtobarn of data; today it was updated to the full 3 fb-1 of run-1 data. The news is that the anomaly persists in the q^2 region 4-8 GeV, see the plot. The measurement moved a bit toward the standard model, but the statistical errors have shrunk as well. All in all, the significance of the anomaly is quoted as 3.7 sigma, the same as in the previous LHCb analysis. New physics that effectively induces new contributions to the 4-fermion operator (\bar b_L \gamma_\rho s_L) (\bar \mu \gamma_\rho \mu) can significantly improve agreement with the data, see the blue line in the plot. The preference for new physics remains high, at the 4 sigma level, when this measurement is combined with other B-meson observables.
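
For calibration, an "n sigma" significance corresponds to a one-sided Gaussian tail probability, which takes one line of plain Python to compute (this snippet is our illustration, not part of the LHCb analysis):

```python
import math

def sigma_to_pvalue(n_sigma):
    """One-sided Gaussian tail probability corresponding to an 'n sigma' significance."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))

p_anomaly = sigma_to_pvalue(3.7)    # roughly 1e-4: suggestive, but not decisive
p_discovery = sigma_to_pvalue(5.0)  # roughly 3e-7: the conventional discovery threshold
```

The gap between those two numbers is why a 3.7 sigma deviation gets attention without being called a discovery.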

So how excited should we be? One thing we learned today is that the anomaly is unlikely to be a statistical fluctuation. However, the observable is not of the clean kind, as the measured angular distributions are susceptible to poorly known QCD effects. The significance depends a lot on what is assumed about these uncertainties, and experts wage ferocious battles about the numbers. See for example this paper, where larger uncertainties are advocated, in which case the significance becomes negligible. Therefore, the deviation from the standard model is not yet convincing. Other observables may tip the scale. Only if a consistent pattern of deviations emerges in several B-physics observables can we trumpet victory.

Plots borrowed from David Straub's talk in Moriond; see also the talk of Joaquim Matias with similar conclusions. David has a post with more details about the process and uncertainties. For a more popular write-up, see this article on Quanta Magazine. 

Doug Natelson: Anecdotes from grad school and beyond

I've been thinking about what a more general audience likes to read in terms of science writing, beyond descriptions of cool science. Interesting personalities definitely have appeal. Sure, he was a Nobel Laureate, but my guess is that much of Feynman's popularity originates from the fact that he really was a "curious character" and a great story-teller. I'm not remotely in the same league, but in my scientific career, going back to grad school, I've been ridiculously fortunate to have had the chance to meet and interact with many interesting people. Some of the stories might give a better slice-of-life feel for graduate science education and a scientific career than you'd get from The Big Bang Theory. I'm going to start trying to write up some of these anecdotes - my apologies to friends who have heard some of these before....

Matt Strassler: Completed Final Section of Article on Dark Matter and LHC

As promised, I’ve completed the third section, as well as a short addendum to the second section, of my article on how experimenters at the Large Hadron Collider [LHC] can try to discover dark matter particles.   The article is here; if you’ve already read what I wrote as of last Wednesday, you can pick up where you left off by clicking here.

Meanwhile, in the last week there were several dark-matter related stories that hit the press.

There has been a map made by the Dark Energy Survey of dark matter’s location across a swathe of the universe, based on the assumption that weak gravitational-lensing signals (bending of light by gravity) that cannot be explained by observed stars and dust are due to dark matter.  This will be useful down the line as we test simulations of the universe such as the one I referred you to on Wednesday.

There’s been a claim that dark matter interacts with itself, which got a lot of billing from the BBC; however, one should be extremely cautious with this one, and the BBC editor should have put the word “perhaps” in the headline! It’s certainly possible that dark matter interacts with itself much more strongly than it interacts with ordinary matter, and many scientists (including myself) have considered this possibility over the years.  However, the claim reported by the BBC is considered somewhat dubious even by the authors of the study, because the little group of four galaxies they are studying is complicated and has to be modeled carefully.  The effect they observed may well be due to ordinary astrophysical effects, and in any case it is less than 3 Standard Deviations away from zero, which makes it more a hint than evidence.  We will need many more examples, or a far more compelling one, before anyone gets too excited about this.

Finally, the AMS experiment (whose early results I reported on here; you can find their September update here) has released some new results, but not yet in papers, so there’s limited information.  The most important result is the one whose details will apparently take longest to come out: this is the discovery (see the figure below) that the ratio of anti-protons to protons in cosmic rays at energies above 100 GeV is not decreasing as was expected. (Note this is a real discovery by AMS alone — in contrast to the excess positron-to-electron ratio at similar energies, which was discovered by PAMELA and confirmed by AMS.)  The only problem is that they’ve made the discovery seem very exciting and dramatic by comparing their work to expectations from a model that is out of date and that no one seems to believe.  This model (the brown swathe in the Figure below) tries to predict how high-energy anti-protons are produced (“secondary production”) from even higher energy protons in cosmic rays.  Newer versions of this model apparently lie significantly higher than the brown curve. Moreover, some scientists also claim that the uncertainty band (the width of the brown curve) on these types of models is wider than shown in the Figure.  At best, the modeling needs a lot more study before we can say that this discovery is really in stark conflict with expectations.  So stay tuned, but again, this is not yet something in which one can have confidence.  The experts will be busy.


Figure 1. Antiproton to proton ratio (red data points, with uncertainties given by vertical bars) as measured by AMS. AMS claims that the measured ratio cannot be explained by existing models of secondary production, but the model shown (brown swathe, with uncertainties given by the width of the swathe) is an old one; newer ones lie closer to the data. Also, the uncertainties in the models are probably larger than shown. Whether this is a true discrepancy with expectations is now a matter of healthy debate among the experts.


Terence Tao: Embedding the SQG equation in a modified Euler equation

The Euler equations for three-dimensional incompressible inviscid fluid flow are

\displaystyle  \partial_t u + (u \cdot \nabla) u = - \nabla p \ \ \ \ \ (1)

\displaystyle \nabla \cdot u = 0

where {u: {\bf R} \times {\bf R}^3 \rightarrow {\bf R}^3} is the velocity field, and {p: {\bf R} \times {\bf R}^3 \rightarrow {\bf R}} is the pressure field. For the purposes of this post, we will ignore all issues of decay or regularity of the fields in question, assuming that they are as smooth and rapidly decreasing as needed to justify all the formal calculations here; in particular, we will apply inverse operators such as {(-\Delta)^{-1}} or {|\nabla|^{-1} := (-\Delta)^{-1/2}} formally, assuming that these inverses are well defined on the functions they are applied to.

Meanwhile, the surface quasi-geostrophic (SQG) equation is given by

\displaystyle  \partial_t \theta + (u \cdot \nabla) \theta = 0 \ \ \ \ \ (2)

\displaystyle  u = ( -\partial_y |\nabla|^{-1}, \partial_x |\nabla|^{-1} ) \theta \ \ \ \ \ (3)

where {\theta: {\bf R} \times {\bf R}^2 \rightarrow {\bf R}} is the active scalar, and {u: {\bf R} \times {\bf R}^2 \rightarrow {\bf R}^2} is the velocity field. The SQG equations are often used as a toy model for the 3D Euler equations, as they share many of the same features (e.g. vortex stretching); see this paper of Constantin, Majda, and Tabak for more discussion (or this previous blog post).
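
The velocity law (3) is just a Fourier multiplier applied to {\theta}, which makes it easy to experiment with numerically. Here is a minimal NumPy sketch (the grid, resolution, and function names are ours, not from the post) that computes {u} from {\theta} on a doubly 2π-periodic grid:

```python
import numpy as np

def sqg_velocity(theta):
    """Compute u = (-d_y |grad|^{-1} theta, d_x |grad|^{-1} theta)
    on a 2*pi-periodic grid via the FFT."""
    n = theta.shape[0]
    k = np.fft.fftfreq(n, d=1.0 / n)           # integer wavenumbers
    kx, ky = np.meshgrid(k, k, indexing="ij")  # axis 0 = x, axis 1 = y
    kmag = np.sqrt(kx**2 + ky**2)
    kmag[0, 0] = 1.0                           # avoid dividing by zero at the mean mode
    psi_hat = np.fft.fft2(theta) / kmag        # |grad|^{-1} theta in Fourier space
    psi_hat[0, 0] = 0.0                        # the mean mode carries no velocity
    ux = np.real(np.fft.ifft2(-1j * ky * psi_hat))  # -d_y of the stream function
    uy = np.real(np.fft.ifft2( 1j * kx * psi_hat))  #  d_x of the stream function
    return ux, uy
```

Since {u} is built from a stream function, {\partial_x u_x + \partial_y u_y = 0} holds automatically, matching the divergence-free requirement.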

I recently found a more direct way to connect the two equations. We first recall that the Euler equations can be placed in vorticity-stream form by focusing on the vorticity {\omega := \nabla \times u}. Indeed, taking the curl of (1), we obtain the vorticity equation

\displaystyle  \partial_t \omega + (u \cdot \nabla) \omega = (\omega \cdot \nabla) u \ \ \ \ \ (4)

while the velocity {u} can be recovered from the vorticity via the Biot-Savart law

\displaystyle  u = (-\Delta)^{-1} \nabla \times \omega. \ \ \ \ \ (5)
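
Indeed, since the vorticity is automatically divergence-free, one can check that (5) inverts the curl: using the identity {\nabla \times (\nabla \times \omega) = \nabla (\nabla \cdot \omega) - \Delta \omega},

\displaystyle  \nabla \times u = (-\Delta)^{-1} \nabla \times (\nabla \times \omega) = (-\Delta)^{-1} ( \nabla (\nabla \cdot \omega) - \Delta \omega ) = \omega,

while {\nabla \cdot u = 0} because the divergence of a curl vanishes.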

The system (4), (5) has some features in common with the system (2), (3); in (2) it is a scalar field {\theta} that is being transported by a divergence-free vector field {u}, which is a linear function of the scalar field as per (3), whereas in (4) it is a vector field {\omega} that is being transported (in the Lie derivative sense) by a divergence-free vector field {u}, which is a linear function of the vector field as per (5). However, the system (4), (5) is in three spatial dimensions whilst (2), (3) is in two, the dynamical field is a scalar field {\theta} for SQG and a vector field {\omega} for Euler, and the relationship between the velocity field and the dynamical field is given by a zeroth order Fourier multiplier in (3) and an operator of order {-1} in (5).

However, we can make the two equations more closely resemble each other as follows. We first consider the generalisation

\displaystyle  \partial_t \omega + (u \cdot \nabla) \omega = (\omega \cdot \nabla) u \ \ \ \ \ (6)

\displaystyle  u = T (-\Delta)^{-1} \nabla \times \omega \ \ \ \ \ (7)

where {T} is an invertible, self-adjoint, positive-definite zeroth order Fourier multiplier that maps divergence-free vector fields to divergence-free vector fields. The Euler equations then correspond to the case when {T} is the identity operator. As discussed in this previous blog post (which used {A} to denote the inverse of the operator denoted here as {T}), this generalised Euler system has many of the same features as the original Euler equation, such as a conserved Hamiltonian

\displaystyle  \frac{1}{2} \int_{{\bf R}^3} u \cdot T^{-1} u,

the Kelvin circulation theorem, and conservation of helicity

\displaystyle  \int_{{\bf R}^3} \omega \cdot T^{-1} u.

Also, if we require {\omega} to be divergence-free at time zero, it remains divergence-free at all later times.

Let us consider “two-and-a-half-dimensional” solutions to the system (6), (7), in which {u,\omega} do not depend on the vertical coordinate {z}, thus

\displaystyle  \omega(t,x,y,z) = \omega(t,x,y)


\displaystyle  u(t,x,y,z) = u(t,x,y)

but we allow the vertical components {u_z, \omega_z} to be non-zero. For this to be consistent, we also require {T} to commute with translations in the {z} direction. As all derivatives in the {z} direction now vanish, we can simplify (6) to

\displaystyle  D_t \omega = (\omega_x \partial_x + \omega_y \partial_y) u \ \ \ \ \ (8)

where {D_t} is the two-dimensional material derivative

\displaystyle  D_t := \partial_t + u_x \partial_x + u_y \partial_y.

Also, the divergence-free nature of {\omega,u} then becomes

\displaystyle  \partial_x \omega_x + \partial_y \omega_y = 0


\displaystyle  \partial_x u_x + \partial_y u_y = 0. \ \ \ \ \ (9)

In particular, we may (formally, at least) write

\displaystyle  (\omega_x, \omega_y) = (\partial_y \theta, -\partial_x \theta)

for some scalar field {\theta(t,x,y,z) = \theta(t,x,y)}, so that (7) becomes

\displaystyle  u = T ( (- \Delta)^{-1} \partial_y \omega_z, - (-\Delta)^{-1} \partial_x \omega_z, \theta ). \ \ \ \ \ (10)

The first two components of (8) become

\displaystyle  D_t \partial_y \theta = \partial_y \theta \partial_x u_x - \partial_x \theta \partial_y u_x

\displaystyle - D_t \partial_x \theta = \partial_y \theta \partial_x u_y - \partial_x \theta \partial_y u_y

which rearranges using (9) to

\displaystyle  \partial_y D_t \theta = \partial_x D_t \theta = 0.
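
To verify the first identity, commute {\partial_y} past {D_t}, substitute the first equation above, and use (9):

\displaystyle  \partial_y D_t \theta = D_t \partial_y \theta + (\partial_y u_x) \partial_x \theta + (\partial_y u_y) \partial_y \theta = \partial_y \theta ( \partial_x u_x + \partial_y u_y ) = 0,

and similarly for {\partial_x D_t \theta}.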

Formally, we may integrate this system to obtain the transport equation

\displaystyle  D_t \theta = 0. \ \ \ \ \ (11)

Finally, the last component of (8) is

\displaystyle  D_t \omega_z = \partial_y \theta \partial_x u_z - \partial_x \theta \partial_y u_z. \ \ \ \ \ (12)

At this point, we make the following choice for {T}:

\displaystyle  T ( U_x, U_y, \theta ) = \alpha (U_x, U_y, \theta) + (-\partial_y |\nabla|^{-1} \theta, \partial_x |\nabla|^{-1} \theta, 0) \ \ \ \ \ (13)

\displaystyle  + P( 0, 0, |\nabla|^{-1} (\partial_y U_x - \partial_x U_y) )

where {\alpha > 0} is a real constant and {Pu := (-\Delta)^{-1} (\nabla \times (\nabla \times u))} is the Leray projection onto divergence-free vector fields. One can verify that for large enough {\alpha}, {T} is a self-adjoint positive definite zeroth order Fourier multiplier from divergence-free vector fields to divergence-free vector fields. With this choice, we see from (10) that

\displaystyle  u_z = \alpha \theta - |\nabla|^{-1} \omega_z
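
To see this, note that the first two arguments of {T} in (10) are {U_x = (-\Delta)^{-1} \partial_y \omega_z} and {U_y = -(-\Delta)^{-1} \partial_x \omega_z}, so that

\displaystyle  |\nabla|^{-1} ( \partial_y U_x - \partial_x U_y ) = |\nabla|^{-1} (-\Delta)^{-1} ( \partial_y^2 + \partial_x^2 ) \omega_z = - |\nabla|^{-1} \omega_z,

and the Leray projection {P} in (13) acts trivially here, since a {z}-independent vertical field is already divergence-free; the vertical component of (13) thus contributes {\alpha \theta - |\nabla|^{-1} \omega_z}, as claimed.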

so that (12) simplifies to

\displaystyle  D_t \omega_z = - \partial_y \theta \partial_x |\nabla|^{-1} \omega_z + \partial_x \theta \partial_y |\nabla|^{-1} \omega_z.

This implies (formally at least) that if {\omega_z} vanishes at time zero, then it vanishes for all time. Setting {\omega_z=0}, we then have from (10) that

\displaystyle (u_x,u_y,u_z) = (-\partial_y |\nabla|^{-1} \theta, \partial_x |\nabla|^{-1} \theta, \alpha \theta )

and from (11) we then recover the SQG system (2), (3). To put it another way, if {\theta(t,x,y)} and {u(t,x,y)} solve the SQG system, then by setting

\displaystyle  \omega(t,x,y,z) := ( \partial_y \theta(t,x,y), -\partial_x \theta(t,x,y), 0 )

\displaystyle  \tilde u(t,x,y,z) := ( u_x(t,x,y), u_y(t,x,y), \alpha \theta(t,x,y) )

then {\omega,\tilde u} solve the modified Euler system (6), (7) with {T} given by (13).

We have {T^{-1} \tilde u = (0, 0, \theta)}, so the Hamiltonian {\frac{1}{2} \int_{{\bf R}^3} \tilde u \cdot T^{-1} \tilde u} for the modified Euler system in this case is formally a scalar multiple of the conserved quantity {\int_{{\bf R}^2} \theta^2}. The momentum {\int_{{\bf R}^3} x \cdot \tilde u} for the modified Euler system is formally a scalar multiple of the conserved quantity {\int_{{\bf R}^2} \theta}, while the vortex stream lines that are preserved by the modified Euler flow become the level sets of the active scalar that are preserved by the SQG flow. On the other hand, the helicity {\int_{{\bf R}^3} \omega \cdot T^{-1} \tilde u} vanishes, and other conserved quantities for SQG (such as the Hamiltonian {\int_{{\bf R}^2} \theta |\nabla|^{-1} \theta}) do not seem to correspond to conserved quantities of the modified Euler system. This is not terribly surprising; a low-dimensional flow may well have a richer family of conservation laws than the higher-dimensional system that it is embedded in.

Filed under: expository, math.AP Tagged: Euler equations, surface quasi-geostrophic equation

April 19, 2015

David Hogg: tracking the sky, spectroscopically

At group meeting today, Blanton spoke at length about telluric corrections and sky subtraction in the various spectroscopic surveys that make up SDSS-IV. His feeling is that the number of sky and telluric-standard fibers assigned in the surveys might not be optimal given the variability of the relevant systematics. He enlisted our help in analyzing that situation. In particular, what model complexity does the data support? And, given that model complexity, what is the best sampling of the focal plane with telluric standards (and sky fibers)? I agreed to write down some ideas for the SDSS-IV mailing lists.

Tommaso Dorigo: The Era Of The Atom

"The era of the atom" is a new book by Piero Martin and Alessandra Viola - for now the book is only printed in Italian (by Il Mulino), but I hope it will soon be translated in English. Piero Martin is a professor of Physics at the University of Padova and a member of the RFX collaboration, a big experiment which studies the confinement of plasma with the aim of constructing a fusion reactor - a real, working one like ITER; and Alessandra Viola is a well-known journalist and writer of science popularization. 


Backreaction: A wonderful 100th anniversary gift for Einstein

This year, Einstein’s theory of General Relativity celebrates its 100th anniversary. 2015 is also the “Year of Light,” and fittingly so, because the first and most famous confirmation of General Relativity was the deflection of light by the Sun.

As light carries energy and is thus subject to gravitational attraction, a ray of light passing by a massive body should be slightly bent towards it. This is so both in Newton’s theory of gravity and in Einstein’s, but Einstein’s predicted deflection is a factor of two larger than Newton’s. Because of this effect, the positions of stars seem to shift slightly when they stand close to the Sun, but the shift is absolutely tiny: the deflection of light from a star close to the rim of the Sun is just about a thousandth of the Sun's diameter, and the deflection drops rapidly the farther the star’s position is from the rim.
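
The size of the effect is easy to check: General Relativity predicts a deflection angle of 4GM/(c²b) for a ray passing a mass M at impact parameter b, twice the Newtonian value. A quick back-of-the-envelope computation for a ray grazing the solar rim (constants filled in by us):

```python
import math

G = 6.674e-11     # gravitational constant, m^3 kg^-1 s^-2
c = 2.998e8       # speed of light, m/s
M_sun = 1.989e30  # solar mass, kg
R_sun = 6.957e8   # solar radius, m

# GR deflection for a ray grazing the solar rim; Newton's prediction is half this.
alpha_rad = 4 * G * M_sun / (c**2 * R_sun)
alpha_arcsec = math.degrees(alpha_rad) * 3600
```

That works out to about 1.75 arcseconds, roughly a thousandth of the Sun's apparent diameter of about 1900 arcseconds, matching the estimate above.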

In the year 1915 one couldn’t observe stars in such close vicinity to the Sun because, if the Sun does one thing, it’s shining really brightly, which is generally bad if you want to observe something small and comparatively dim next to it. The German astronomer Johann Georg von Soldner had calculated the deflection in Newton’s theory already in 1801. His paper wasn’t published until 1804, and then with a very defensive final paragraph that explained:
“Uebrigens glaube ich nicht nöthig zu haben, mich zu entschuldigen, daß ich gegenwärtige Abhandlung bekannt mache; da doch das Resultat dahin geht, daß alle Perturbationen unmerklich sind. Denn es muß uns fast eben so viel daran gelegen seyn, zu wissen, was nach der Theorie vorhanden ist, aber auf die Praxis keinen merklichen Einfluß hat; als uns dasjenige interessirt, was in Rücksicht auf Praxis wirklichen Einfluß hat. Unsere Einsichten werden durch beyde gleichviel erweitert.”

[“Incidentally I do not think it should be necessary for me to apologize that I publish this article even though the result indicates that the deviation is unobservably small. We must pay as much attention to knowing what theoretically exists but has no influence in practice, as we are interested in that what really affects practice. Our insights are equally increased by both.” - translation SH]
A century passed and physicists now had somewhat more confidence in their technology, but still they had to patiently wait for a total eclipse of the Sun during which they were hoping to observe the predicted deflection of light.

In 1919, finally, British astronomer and relativity aficionado Arthur Stanley Eddington organized two expeditions to observe a solar eclipse with a zone of totality roughly along the equator. He himself travelled to Principe, an island in the Atlantic ocean, while a second team observed the event from Sobral in Brazil. The results of these observations were publicly announced in November 1919 at a meeting in London that made Einstein a scientific star overnight: the measured deflection of light did fit the Einstein value, while it was much less compatible with the Newtonian bending.

As history has it, Eddington’s original data actually wasn’t good enough to make that claim with certainty. His measurements had huge error bars due to bad weather, and he also might have cherry-picked his data because he liked Einstein’s theory a little too much. Shame on him. Be that as it may, dozens of subsequent measurements proved his premature announcement correct. Einstein was right, Newton was wrong.

By the 1990s, one didn’t have to wait for solar eclipses any more. Data from radio sources, such as distant pulsars, measured by very long baseline interferometry (VLBI) could now be analyzed for the effect of light deflection. In VLBI, one measures the time delay by which wavefronts from radio sources arrive at distant detectors that might be distributed all over the globe. The long baseline together with a very exact timing of the signal’s arrival allows one to then pinpoint very precisely where the object is located – or seems to be located. In 1991, Robertson, Carter & Dillinger confirmed to high accuracy the light deflection predicted by General Relativity by analyzing data from VLBI accumulated over 10 years.

But crunching data is one thing, seeing it is another, and so I wanted to share with you today a plot I came across by coincidence, in a paper from February by two researchers in Australia.

They have analyzed the VLBI data from some selected radio sources over a period of 10 years. In the image below, you can see how the apparent position of the blazar (1606+106) moves around over the course of the year. Each dot is one measurement point; the “real” position is at the center of the circle that can be inferred, at the point marked zero on the axes.

Figure 2 from arXiv:1502.07395

How is that for an effect that was two centuries ago thought to be unobservable?

April 18, 2015

Chad Orzel: The Real Access Problem with the Hugos

There has been a lot of stuff written in response to the Hugo award nomination mess, most of it stupid. Some of it is stupid to such an impressive degree that it actually makes me feel sympathetic toward people who I know are wrong about everything.

One of the few exceptions is the long essay by Eric Flint. This comes as a mild surprise, as I’ve always mentally lumped him in with the folks whose incessant political wrangling was a blight on Usenet’s rec.arts.sf.written back in the day; now I can’t remember if he was actually one of the annoying idiots, or if I’ve mistakenly put him in with them because I associate them with Baen…

Anyway, Flint’s post is very good, and gets to the thing I think is the real problem here. A lot of the anti-Puppy writers take the basically correct position that since the Puppy slate was put in by a tiny fraction of the eligible nominators, the solution is just to get more people to nominate. To this end, there are measures like Mary Robinette Kowal’s offer to buy supporting memberships for random people. Which is a lovely gesture, and I applaud it, but I think it kind of misses the point.

I don’t think the real barrier to Hugo nomination is financial– after all, there were thousands of people who already had memberships but didn’t nominate at all, and many more (like me) who sent in nomination ballots with a lot of the categories blank. What’s stopping those people isn’t lack of money, but lack of information, because of the factors Flint identifies:

The first objective factor is about as simple as gets. The field is simply too damn BIG, nowadays. […]

Forty or fifty years ago—even thirty years ago, to a degree—it was quite possible for any single reader to keep on top of the entire field. You wouldn’t read every F&SF story, of course. But you could maintain a good general knowledge of the field as a whole and be at least familiar with every significant author.

Today, that’s simply impossible. Leaving aside short fiction, of which there’s still a fair amount being produced, you’d have to be able to read at least two novels a day to keep up with what’s being published—and that’s just in the United States. In reality, nobody can do it, so what happens is that over the past few decades the field has essentially splintered, from a critical standpoint.

This problem of not being able to keep on top of things is especially acute for the Hugos, because as Flint points out, they’re designed around how the field was fifty-odd years ago:

Both the Hugo and the Nebula give out four literary awards. (I’m not including here the more recent dramatic awards, just the purely literary categories.) Those awards are given for best short story, best novelette, best novella, and best novel. In other words, three out of four awards—75% of the total—are given for short fiction.

Forty or fifty years ago, that made perfect sense. It was an accurate reflection of the reality of the field for working authors. F&SF in those days was primarily a short form genre, whether you measured that in terms of income generated or number of readers.

But that is no longer true. Today, F&SF is overwhelmingly a novel market. Short fiction doesn’t generate more than 1% or 2% of all income for writers. And even measured in terms of readership, short fiction doesn’t account for more than 5% of the market.

Taken together, you have the reason why so few people nominate, and why many of those who do send in ballots that are 75% blank. The market is big and diffuse, and three-quarters of the categories are for stuff that just isn’t widely read. You can see this in this year’s stats: over 1,800 people sent in ballots with a Best Novel nomination, but just over 1,000 nominated anything for Best Novelette. And that’s with the Puppy voters inflating the totals.

It’s not just a matter of getting more people the right to vote– the set of people who nominated a novel but didn’t nominate any short fiction is about three times the plausible size of the Puppy bloc. What you need is a way to get those people the information they need to fill out those categories they’re leaving blank.

Now, in an ideal world, those people would all read a lot of short fiction and make up their own minds about which stories are their favorites, and the voting will take care of itself. In that same ideal world, I have a pony. A magical nanotech pony that eats carbon dioxide and craps out diamonds.

It’s not enough to buy memberships for more people: you can already more than fix the problem with just the set of people who already nominate, let alone the people who were eligible to nominate (there were something close to 10,000 of those, I think, from last year’s Worldcon attendance). What you need is a way to help those people nominate in categories of works that just aren’t that widely read.

And that’s a really tough problem to crack, if you’re implacably opposed to counter-slates. There are a few relatively neutral recommendation lists, but they’re not much use. The Locus recommended reading list usually does a decent job identifying high-quality works, but it’s mostly useless to a low-information potential voter– in the most under-nominated category, novelette, they recommend more than 50 stories. Only a fraction of those are readily available on-line, with most of them in a bunch of anthologies that voters would need to buy or get from the library. And the Locus list comes out in February, leaving barely more than a month to read enough of that to make an informed decision.

(The situation isn’t a whole lot better in the other short fiction categories; the list of recommended novellas is at least short, but about half are published as stand-alone books, where you could get about a third of the novelette nominees out of the same handful of anthologies. The short story list is about the same size as the novelette list, but at least those are, by definition, shorter. Another option would be the various “Year’s Best” anthologies, but those have similar size and timing issues, though they would reduce the expense somewhat…)

I’m not sure how you fix that without slates, or something that will look too much like a slate to satisfy a good chunk of the anti-Puppy crowd. Tweaking the nomination rules isn’t really a fix, but it’s probably the best you could do– of the options, I’d probably go for keeping the nominations per ballot at five and expanding the number of finalists to ten. Yeah, that doesn’t leave a lot of time to read the ten finalists before the voting closes, but it’s still better than trying to plow through the Locus list. And even before the Puppy nonsense started, it was a rare year when the Hugo ballot didn’t feature a few books and stories I stopped halfway through.

That’s a slow change, though, because of the way WSFS works, and the best it can do is limit the impact of slates (so, for example, it wouldn’t prevent everyone’s least favorite walking prion disease from oozing his way onto the Best Editor ballot). If you’re going to do all that Business Meeting work to change the rules, you might be better off changing the categories instead, to better reflect the modern market.

Clifford JohnsonFestival Panel

Don't forget that this weekend is the fantastic LA Times Festival of Books! See my earlier post. Actually, I'll be on a panel at 3:00pm in Wallis Annenberg Hall entitled "Grasping the Ineffable: On Science and Health", with Pat Levitt and Elyn Saks, chaired by the science writer KC Cole. I've no idea where the conversation is going to go, but I hope it'll be fun and interesting! (See the whole schedule here.) Maybe see you there! -cvj

Dave BaconQIP 2015 Return of the Live-blogging, Day 1

Jan 14 update at the end.

The three Pontiffs are reunited at QIP 2015 and, having forgotten how painful liveblogging was in the past, are doing it again. This time we will aim for some slightly more selective comments.

In an ideal world the QIP PC would have written these sorts of summaries and posted them on scirate, but instead they are posted on easychair where most of you can’t access them. Sorry about this! We will argue at the business meeting for a more open refereeing process.

The first plenary talk was:

Ran Raz (Weizmann Institute)
How to Delegate Computations: The Power of No-Signaling Proofs

Why is the set of no-signalling distributions worth looking at? (That is, the set of conditional probability distributions p(a,b|x,y) that have well-defined marginals p(a|x) and p(b|y).) One way to think about it is as a relaxation of the set of “quantum” distributions, meaning the input-output distributions that are compatible with entangled states. The no-signalling polytope is defined by a polynomial number of linear constraints, and so is the sort of relaxation that is amenable to linear programming, whereas we don’t even know whether the quantum value of a game is computable. But is the no-signalling condition ever interesting in itself?
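To make the linear-programming point concrete, here is a minimal sketch (mine, not from the talk; it assumes scipy and numpy are available) that maximizes the CHSH value over the no-signalling polytope for binary inputs and outputs, recovering the well-known PR-box value of 4:

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# Index p(a,b|x,y) as a flat 16-vector.
def idx(a, b, x, y):
    return ((a * 2 + b) * 2 + x) * 2 + y

A_eq, b_eq = [], []

# Normalization: sum_{a,b} p(a,b|x,y) = 1 for each input pair (x,y).
for x, y in itertools.product(range(2), repeat=2):
    row = np.zeros(16)
    for a, b in itertools.product(range(2), repeat=2):
        row[idx(a, b, x, y)] = 1
    A_eq.append(row)
    b_eq.append(1)

# No-signalling: Alice's marginal p(a|x) must not depend on y (and symmetrically for Bob).
for a, x in itertools.product(range(2), repeat=2):
    row = np.zeros(16)
    for b in range(2):
        row[idx(a, b, x, 0)] += 1
        row[idx(a, b, x, 1)] -= 1
    A_eq.append(row)
    b_eq.append(0)
for b, y in itertools.product(range(2), repeat=2):
    row = np.zeros(16)
    for a in range(2):
        row[idx(a, b, 0, y)] += 1
        row[idx(a, b, 1, y)] -= 1
    A_eq.append(row)
    b_eq.append(0)

# CHSH objective: S = sum_{x,y,a,b} (-1)^{a+b+xy} p(a,b|x,y); linprog minimizes, so negate.
c = np.zeros(16)
for a, b, x, y in itertools.product(range(2), repeat=4):
    c[idx(a, b, x, y)] = -((-1) ** (a + b + x * y))

res = linprog(c, A_eq=np.array(A_eq), b_eq=b_eq, bounds=[(0, 1)] * 16)
print(-res.fun)  # maximal CHSH value over the no-signalling polytope: 4
```

For comparison, the quantum (Tsirelson) bound for the same game is 2√2 ≈ 2.83, and no linear program of this form computes it — which is exactly the contrast drawn above.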

Raz and his coauthors (Yael Kalai and Ron Rothblum) prove a major result (which we’ll get to below) about the computational power of multi-prover proof systems where the provers have access to arbitrary non-signalling distributions. But they began by trying to prove an apparently unrelated classical crypto result. In general, multiple provers are stronger than one prover. Classically we have MIP=NEXP and IP=PSPACE, and in fact that MIP protocol just requires one round, whereas k rounds with a single prover is (roughly) within the k’th level of the polynomial hierarchy (i.e. even below PSPACE). So simulating many provers with one prover seems in general crazy.

But suppose instead the provers are computationally limited. Suppose they are strong enough for the problem to be interesting (i.e. they are much stronger than the verifier, so it is worthwhile for the verifier to delegate some nontrivial computation to them) but too weak to break some FHE (fully homomorphic encryption) scheme. This requires computational assumptions, but nothing too outlandish. Then the situation might be very different. If the verifier sends its queries using FHE, then one prover might simulate many provers without compromising security. This was the intuition of a paper from 2000, which Raz and coauthors are finally able to prove. The catch is that even though the single prover can’t break the FHE, it can let its simulated provers play according to a no-signalling distribution. (Or at least this possibility cannot be ruled out.) So proving the security of 1-prover delegated computation requires not only the computational assumptions used for FHE, but also a multi-prover proof system that is secure against no-signalling distributions.

Via this route, Raz and coauthors found themselves in QIP territory. When they started, it was known that MIPns[poly provers] is contained in EXP.

This work nails down the complexity of the many-prover setting, showing that EXP is contained in MIPns[poly provers], so that in fact the classes are equal.

It is a nice open question whether the same is true for a constant number of provers, say 3. By comparison, three entangled provers or two classical provers are strong enough to contain NEXP.

One beautiful consequence is that optimizing a linear function over the no-signalling polytope is roughly a P-complete problem. Previously it was known that linear programming was P-complete, meaning that it was unlikely to be solvable in, say, log space. But this work shows that this is true even if the constraints are fixed once and for all, and only the objective function is varied. (And we allow error.) This is established in a recent followup paper [ECCC TR14-170] by two of the same authors.

Francois Le Gall.
Improved Quantum Algorithm for Triangle Finding via Combinatorial Arguments
abstract arXiv:1407.0085

A technical tour-de-force that we will not do justice to here. One intriguing barrier-breaking aspect of the work is that all previous algorithms for triangle finding worked equally well for the standard unweighted case as well as a weighted variant in which each edge is labeled by a number and the goal is to find a set of edges (a,b), (b,c), (c,a) whose weights add up to a particular target. Indeed this algorithm has a query complexity for the unweighted case that is known to be impossible for the weighted version. A related point is that this shows the limitations of the otherwise versatile non-adaptive learning-graph method.

Ryan O’Donnell and John Wright
Quantum Spectrum Testing
abstract arXiv:1501.05028

A classic problem: given \rho^{\otimes n} for \rho an unknown d-dimensional state, estimate some property of \rho. One problem where the answer is still shockingly unknown is to estimate \hat\rho in a way that achieves \mathbb{E} \|\rho-\hat \rho\|_1 \leq\epsilon.
Results from compressed sensing show that n = \tilde\Theta(d^2r^2) for single-copy two-outcome measurements of rank-r states with constant error, but if we allow block measurements then maybe we can do better. Perhaps O(d^2/\epsilon) is possible using the Local Asymptotic Normality results of Guta and Kahn [0804.3876], as Hayashi has told me, but the details are – if we are feeling generous – still implicit. I hope that he, or somebody, works them out. (18 Jan update: thanks Ashley for fixing a bug in an earlier version of this.)

The current talk focuses instead on properties of the spectrum, e.g. how many copies are needed to distinguish a maximally mixed state of rank r from one of rank r+c? The symmetry of the problem (invariant under both permutations and rotations of the form U^{\otimes n}) means that we can WLOG consider “weak Schur sampling” meaning that we measure which S_n \times U_d irrep our state lies in, and output some function of this result. This irrep is described by an integer partition which, when normalized, is a sort of mangled estimate of the spectrum. It remains only to analyze the accuracy of this estimator in various ways. In many of the interesting cases we can say something nontrivial even if n= o(d^2). This involves some delicate calculations using a lot of symmetric polynomials. Some of these first steps (including many of the canonical ones worked out much earlier by people like Werner) are in my paper quant-ph/0609110 with Childs and Wocjan. But the current work goes far far beyond our old paper and introduces many new tools.

Han-Hsuan Lin and Cedric Yen-Yu Lin. Upper bounds on quantum query complexity inspired by the Elitzur-Vaidman bomb tester
abstract arXiv:1410.0932

This talk considers a new model of query complexity inspired by the Elitzur-Vaidman bomb tester. The bomb tester is a classic demonstration of quantum weirdness: you have a collection of bombs with detonators so sensitive that even a single photon impacting one will set it off. Some of these bombs are live and some are duds, and you’d like to know which is which. Classically, you don’t stand a chance, but quantum mechanically you can send a photon through a beamsplitter and place the bomb in one arm of a Mach-Zehnder interferometer. A dud leaves the interference intact, so only one detector ever clicks; a live bomb acts as a which-path measurement and destroys the interference, so if the other (“dark”) detector clicks you have certified a live bomb without setting it off. With a 50/50 beamsplitter this succeeds with probability 1/4, while the bomb explodes with probability 1/2. There are various tricks related to the quantum Zeno effect that let you push the success probability well beyond this.
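The Zeno trick is easy to quantify: split the interrogation into n rounds, each rotating the photon amplitude by angle π/2n toward the bomb arm, and the probability of certifying a live bomb without an explosion tends to 1 as n grows. A toy calculation of my own, not from the talk:

```python
import math

def zeno_bomb_test(n_rounds):
    """Probability of certifying a live bomb with no explosion, using
    n_rounds small rotations of angle theta = pi / (2 * n_rounds).
    Each round the bomb 'measures' the arm: with probability sin^2(theta)
    it explodes; otherwise the state collapses back to the safe arm."""
    theta = math.pi / (2 * n_rounds)
    return math.cos(theta) ** (2 * n_rounds)

print(zeno_bomb_test(10))    # ~0.78
print(zeno_bomb_test(1000))  # ~0.998
```

The success probability behaves like 1 - pi^2/(4 n) for large n, which is the kind of scaling the bomb query complexity model formalizes.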

The authors define a model of query complexity where one risks explosion for some events, and they showed that the quantum query complexity is related to the bomb query complexity by B(f) = \Theta(Q(f)^2). There were several other interesting results in this talk, but we ran out of steam as it was the last talk before lunch.

Kirsten Eisentraeger, Sean Hallgren, Alexei Kitaev and Fang Song
A quantum algorithm for computing the unit group of an arbitrary degree number field
STOC 2014

One unfortunate weakness of this work: The authors, although apparently knowledgeable about Galois theory, don’t seem to know about this link.

The unit group is a fundamental object in algebraic number theory. It comes up frequently in applications as well, and is used for fully homomorphic encryption, code obfuscation, and many other things.

My [Steve] personal way of understanding the unit group of a number field is that it is a sort of gauge group with respect to the factoring problem. The units in a ring are those numbers with multiplicative inverses. In the ring of integers, where the units are just \pm1 , we can factor composite numbers into 6 = 3 \times 2 = (-3)\times (-2). Both of these are equally valid factorizations; they are equivalent modulo units. In more complicated settings where unique factorization fails, we have factorization into prime ideals, and the group of units can in general become infinite (though always discrete).
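To see an infinite unit group concretely, consider the ring Z[√2], where the units are exactly the elements of norm ±1 and the fundamental unit 1+√2 generates infinitely many of them. A quick sketch of my own (not from the talk), using exact integer arithmetic:

```python
# Elements of Z[sqrt(2)] represented as pairs (a, b) meaning a + b*sqrt(2).
# Units are the elements of norm a^2 - 2*b^2 = +-1.

def mul(u, v):
    """(a + b sqrt2)(c + d sqrt2) = (ac + 2bd) + (ad + bc) sqrt2."""
    a, b = u
    c, d = v
    return (a * c + 2 * b * d, a * d + b * c)

def norm(u):
    a, b = u
    return a * a - 2 * b * b

fundamental = (1, 1)  # 1 + sqrt(2), norm -1
power = (1, 0)        # the unit 1
for k in range(1, 8):
    power = mul(power, fundamental)
    print(k, power, norm(power))  # norm alternates -1, +1: all powers are units
```

Since the norm is multiplicative, norm((1+√2)^k) = (−1)^k, so every power is a unit, and the powers are all distinct — a discrete but infinite unit group, in contrast to the {±1} of the ordinary integers.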

The main result of this talk is a quantum algorithm for finding the unit group of a number field of arbitrary degree. One of the technical problems that they had to solve to get this result was to solve the hidden subgroup problem on a continuous group, namely \mathbb{R}^n.

The speaker also announced some work in progress: a quantum algorithm for the principal ideal problem and the class group problem in arbitrary degree number fields [Biasse Song ‘14]. It sounds like not all the details of this are finished yet.

Dominic Berry, Andrew Childs and Robin Kothari
Hamiltonian simulation with nearly optimal dependence on all parameters
abstract 1501.01715

Hamiltonian simulation is not only the original killer app of quantum computers, but also a key subroutine in a large and growing number of problems. I remember thinking it was pretty slick that higher-order Trotter-Suzuki could achieve a run-time of \|H\|t\text{poly}(s)(\|H\|t/\epsilon)^{o(1)} where t is the time we simulate the Hamiltonian for and s is the sparsity. I also remember believing that the known optimality theorems for Trotter-Suzuki (sorry I can’t find the reference, but it involves decomposing e^{t(A+B)} for the free Lie algebra generated by A,B) meant that this was essentially optimal.
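As a toy illustration of the Trotter idea (first-order here, rather than the higher-order variants mentioned above; my own sketch, assuming numpy and scipy), the error of (e^{-itA/n} e^{-itB/n})^n against e^{-it(A+B)} decays like 1/n for non-commuting A, B:

```python
import numpy as np
from scipy.linalg import expm

# Toy non-commuting Hamiltonian H = A + B (Pauli x and Pauli z).
A = np.array([[0, 1], [1, 0]], dtype=complex)
B = np.array([[1, 0], [0, -1]], dtype=complex)
t = 1.0

exact = expm(-1j * t * (A + B))
errs = {}
for n in [1, 10, 100]:
    step = expm(-1j * t * A / n) @ expm(-1j * t * B / n)
    errs[n] = np.linalg.norm(np.linalg.matrix_power(step, n) - exact, 2)
    print(n, errs[n])  # spectral-norm error shrinks roughly like 1/n
```

Higher-order Trotter-Suzuki formulas improve the 1/n to 1/n^{2k}, which is where the (\|H\|t/\epsilon)^{o(1)} scaling quoted above comes from.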

Fortunately, Berry, Childs and Kothari (and in other work, Cleve) weren’t so pessimistic, and have blasted past this implicit barrier. This work synthesizes everything that comes before to achieve a run-time of \tau \text{poly}\log(\tau/\epsilon), where \tau = \|H\|_{\max}st and \|H\|_{\max} := \max_{i,j} |H_{i,j}| (which can be related to the earlier bounds via \|H\| \leq d \|H\|_{\max}).

One quote I liked: “but this is just a generating function for Bessel functions!” Miraculously, Dominic makes that sound encouraging. The lesson I suppose is to find an important problem (like Hamiltonian simulation) and to approach it with courage.

Salman Beigi and Amin Gohari
Wiring of No-Signaling Boxes Expands the Hypercontractivity Ribbon
abstract arXiv:1409.3665

If you have some salt water with salt concentration 0.1% and some more with concentration 0.2%, then anything in the range [0.1, 0.2] is possible, but no amount of mixing will give you even a single drop with concentration 0.05% or 0.3%, even if you start with oceans at the initial concentrations. Similarly if Alice and Bob share an unlimited number of locally unbiased random bits with correlation \eta they cannot produce even a single bit with correlation \eta' > \eta if they don’t communicate. This was famously proved by Reingold, Vadhan and Wigderson.

This talk does the same thing for no-signaling boxes. Let’s just think about noisy PR boxes to make this concrete. The exciting thing about this work is that it doesn’t just prove a no-distillation theorem but it defines an innovative new framework for doing so. The desired result feels like something from information theory, in that there is a monotonicity argument, but it needs to use quantities that do not increase with tensor product.

Here is one such quantity. Define the classical correlation measure \rho(A,B) = \max \text{Cov}(f,g) where f:A\mapsto \mathbb{R}, g:B\mapsto \mathbb{R} and each have variance 1. Properties:

  • 0 \leq \rho(A,B) \leq 1
  • \rho(A,B) =0 iff p_{AB} = p_A \cdot p_B
  • \rho(A^n, B^n) = \rho(A,B)
  • for any no-signaling box, \rho(A,B) \leq \max(\rho(A,B|X,Y), \rho(X,Y))

Together this shows that any wiring of boxes cannot increase this quantity.
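The quantity \rho defined above is the classical (Hirschfeld-Gebelein-Renyi) maximal correlation, and for finite alphabets it can be computed as the second-largest singular value of the normalized joint distribution matrix. A small sketch of my own (not from the talk), assuming numpy:

```python
import numpy as np

def maximal_correlation(p):
    """Maximal correlation of a joint pmf p[a,b]: the second-largest
    singular value of Q[a,b] = p[a,b] / sqrt(p_A[a] * p_B[b])."""
    pa = p.sum(axis=1)
    pb = p.sum(axis=0)
    Q = p / np.sqrt(np.outer(pa, pb))
    s = np.linalg.svd(Q, compute_uv=False)
    return s[1]  # s[0] == 1 always, corresponding to constant functions

# Two uniform bits with correlation eta: p(a,b) = (1 + (-1)^{a+b} eta) / 4.
eta = 0.3
p = np.array([[(1 + eta) / 4, (1 - eta) / 4],
              [(1 - eta) / 4, (1 + eta) / 4]])
print(maximal_correlation(p))  # 0.3
```

For correlated uniform bits this returns exactly \eta, and the tensorization property \rho(A^n, B^n) = \rho(A,B) listed above is what makes it a useful monotone.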

The proof of this involves a more sophisticated correlation measure that is not just a single number but is a region called the hypercontractivity ribbon (originally due to [Ahlswede, Gacs ‘76]). This is defined to be the set of (\lambda_1, \lambda_2) such that for any f,g we have
\mathbb{E}[f_A g_B] \leq \|f_A\|_{\frac{1}{\lambda_1}} \|g_B\|_{\frac{1}{\lambda_2}}
A remarkable result of [Nair ‘14] is that this is equivalent to the condition that
I(U;AB) \geq \lambda_1 I(U;A) + \lambda_2 I(U;B)
for any extension of the distribution on AB to one on ABU.

Some properties:

  • The ribbon is [0,1]\times [0,1] iff A,B are independent.
  • It is stable under tensor power.
  • Monotonicity: local operations on A, B enlarge the ribbon R.

For boxes define R(A,B|X,Y) = \cap_{x,y} R(A,B|x,y). The main theorem is then that rewiring never shrinks the hypercontractivity ribbon. And as a result, PR box noise cannot be reduced.

These techniques are beautiful and seem as though they should have further application.

Masahito Hayashi
Estimation of group action with energy constraint
abstract arXiv:1209.3463

Your humble bloggers were at this point also facing an energy constraint which limited our ability to estimate what happened. The setting is that you pick a state, nature applies a unitary (specifically from a group representation) and then you pick a measurement and try to minimize the expected error in estimating the group element corresponding to what nature did. The upshot is that entanglement seems to give a quadratic improvement in metrology. Noise (generally) destroys this. This talk showed that a natural energy constraint on the input also destroys this. One interesting question from Andreas Winter was about what happens when energy constraints are applied also to the measurement, along the lines of 1211.2101 by Navascues and Popescu.

Jan 14 update: forgot one! Sorry Ashley.

Ashley Montanaro
Quantum pattern matching fast on average

Continuing the theme of producing shocking and sometimes superpolynomial speedups to average-case problems, Ashley shows that finding a random pattern of length m in a random text of length n can be done in quantum time \tilde O(\sqrt{n/m}\exp(\sqrt{\log m})). Here “random” means something subtle. The text is uniformly random and the pattern is either uniformly random (in the “no” case) or is a random substring of the text (in the “yes” case). There is also a higher-dimensional generalization of the result.

One exciting thing about this is that it is a fairly natural application of Kuperberg’s algorithm for the dihedral-group HSP; in fact the first such application, although Kuperberg’s original paper does mention a much less natural such variant. (correction: not really the first – see Andrew’s comment below.)

It is interesting to think about this result in the context of the general question about quantum speedups for promise problems. It has long been known that query complexity cannot be improved by more than a polynomial (perhaps quadratic) factor for total functions. The dramatic speedups for things like the HSP, welded trees and even more contrived problems must then use the fact that they work for partial functions, and indeed even “structured” functions. Pattern matching is of course a total function, but not one that will ever be hard on average over a distribution with, say, i.i.d. inputs. Unless the pattern is somehow planted in the text, most distributions simply fail to match with overwhelming probability. It is funny that for i.i.d. bit strings this stops being true when m = O(\log n), which is almost exactly when Ashley’s speedup becomes merely quadratic. So pattern matching is a total function whose hard distributions all look “partial” in some way, at least when quantum speedups are possible. This is somewhat vague, and it may be that some paper out there expresses the idea more clearly.

Part of the strength of this paper is then finding a problem where the promise is so natural. It gives me new hope for the future relevance of things like the HSP.

April 17, 2015

ResonaancesAntiprotons from AMS

This week the AMS collaboration released the long expected measurement of the cosmic ray antiproton spectrum.  Antiprotons are produced in our galaxy in collisions of high-energy cosmic rays with interstellar matter, the so-called secondary production.  Annihilation of dark matter could add more antiprotons on top of that background, which would modify the shape of the spectrum with respect to the prediction from the secondary production. Unlike for cosmic ray positrons, in this case there should be no significant primary production in astrophysical sources such as pulsars or supernovae. Thanks to this, antiprotons could in principle be a smoking gun of dark matter annihilation, or at least a powerful tool to constrain models of WIMP dark matter.

The new data from the AMS-02 detector extend the previous measurements from PAMELA up to 450 GeV and significantly reduce experimental errors at high energies. Now, if you look at the promotional material, you may get an impression that a clear signal of dark matter has been observed. However, experts unanimously agree that the brown smudge in the plot above is just shit, rather than a range of predictions from the secondary production. At this point, there are certainly no serious hints of a dark matter contribution to the antiproton flux. A quantitative analysis of this issue appeared in a paper today. Predicting the antiproton spectrum is subject to large experimental uncertainties about the flux of cosmic ray protons and about the nuclear cross sections, as well as theoretical uncertainties inherent in models of cosmic ray propagation. The data and the predictions are compared in this Jamaican band plot. Apparently, the new AMS-02 data are situated near the upper end of the predicted range.

Thus, there is currently no hint of dark matter detection. However, the new data are extremely useful for constraining models of dark matter. New constraints on the annihilation cross section of dark matter are shown in the plot to the right. The most stringent limits apply to annihilation into b-quarks or into W bosons, which yield many antiprotons after decay and hadronization. The thermal production cross section - theoretically preferred in a large class of WIMP dark matter models - is, in the case of b-quarks, excluded for dark matter masses below 150 GeV. These results provide further constraints on models addressing the hooperon excess in the gamma ray emission from the galactic center.

More experimental input will allow us to tune the models of cosmic ray propagation to better predict the background. That, in turn, should lead to  more stringent limits on dark matter. Who knows... maybe a hint for dark matter annihilation will emerge one day from this data; although, given the uncertainties,  it's unlikely to ever be a smoking gun.

Thanks to Marco for comments and plots. 

Clifford JohnsonSouthern California Strings Seminar

There's an SCSS today, at USC! (Should have mentioned it earlier, but I've been snowed under... I hope that the appropriate research groups have been contacted and so forth.) The schedule can be found here along with maps. -cvj

Dave BaconQIP 2015 Talks Available

Talks from QIP 2015 are now available on this YouTube channel. Great to see! I’m still amazed by the wondrous technology that allows me to watch talks given on the other side of the world, at my own leisure, on such wonderful quantum esoterica.

Here also are links to the Pontifical live-blogging and the QIP schedule of talks.

Tommaso DorigoGuess the Plot

I used to post on this blog very abstruse graphs from time to time, asking readers to guess what they represented. I don't know why I stopped it - it is fun. So here is a very colourful graph for you today. You are asked to guess what it represents. 

I am reluctant to provide any hints, as I do not want to stifle your imagination. But if you really want to try and guess something close to the truth, this graph represents a slice of a multi-dimensional space, and the information in the lines and in the coloured map is not directly related. Have a shot in the comments thread! (One further hint: you stand no chance of figuring this out).

Chad OrzelA Note to the University of Pretoria

So, the mysterious strings of digits that I wrote about the other day seem to be part of a class assignment from the University of Pretoria in South Africa. Students are being asked to go read and comment on blogs, and the random digits are individual student identifiers.

This makes sense given the form and content of the other comments, and makes me sorry we live in a world where my first guess regarding this was that it was some sort of con. But, you know, I was led there by gigabytes of email from ersatz Nigerian widows.

I would, however, like to send one message to whoever it is at the University of Pretoria who’s assigning students to do blog comments, which is this: It would be a good idea, in future, to send a courtesy message to the authors of the blogs you’re sending students to, so we know what’s going on. And it would be a good idea, in the future, for the students leaving these comments to clearly identify that this is what they’re doing. That way, I won’t spend weeks deleting people’s homework.

It’s not clear how much follow-up there is to any of these comments, so I’m making this a top-level blog post in hopes of getting more attention. If you’re coming here to leave a comment for school, please state that clearly, and also pass along the message to the professor who assigned this. Next time, give me some advance warning, please.


The "publon" is "the elementary quantum of scientific research which justifies publication" and it's also a website that might be interesting for you if you're an active researcher. Publons helps you collect records of your peer review activities. On this website, you can set up an account and then add your reviews to your profile page.

You can decide whether you want to actually add the text of your reviews, or not, and to which level you want your reviews to be public. By default, only the journal for which you reviewed and the month during which the review was completed will be shown. So you need not be paranoid that people will know all the expletives you typed in reply to that idiot last year!

You don't even have to add the text of your review at all, you just have to provide a manuscript number. Your review activity is then checked against the records of the publisher, or so is my understanding.

Since I'm always interested in new community services, I set up an account there some months ago. It goes really quickly and is totally painless. You can then enter your review activities on the website or - super conveniently - you just forward the "Thank You" note from the publisher to some email address. The record then automatically appears on your profile within a day or two. I forwarded a bunch of "Thank You" emails from the last months, and now my profile page looks as follows:

The folks behind the website almost all have a background in academia and probably know it's pointless trying to make money from researchers. One expects of course that at some point they will try to monetize their site, but at least so far I have received zero spam, upgrade offers, or the dreaded newsletters that nobody wants to read.

In short, the site is doing exactly what it promises to do. I find the profile page really useful and will probably forward my other "Thank You" notes (to the extent that I can dig them up), and then put the link to that page in my CV and on my homepage.

Terence TaoNewton iteration and the Siegel linearisation theorem

An extremely large portion of mathematics is concerned with locating solutions to equations such as

\displaystyle  f(x) = 0

or
\displaystyle  \Phi(x) = x \ \ \ \ \ (1)

for {x} in some suitable domain space (either finite-dimensional or infinite-dimensional), and various maps {f} or {\Phi}. To solve the fixed point equation (1), the simplest general method available is the fixed point iteration method: one starts with an initial approximate solution {x_0} to (1), so that {\Phi(x_0) \approx x_0}, and then recursively constructs the sequence {x_1, x_2, x_3, \dots} by {x_n := \Phi(x_{n-1})}. If {\Phi} behaves enough like a “contraction”, and the domain is complete, then one can expect the {x_n} to converge to a limit {x}, which should then be a solution to (1). For instance, if {\Phi: X \rightarrow X} is a map from a metric space {X = (X,d)} to itself, which is a contraction in the sense that

\displaystyle  d( \Phi(x), \Phi(y) ) \leq (1-\eta) d(x,y)

for all {x,y \in X} and some {\eta>0}, then with {x_n} as above we have

\displaystyle  d( x_{n+1}, x_n ) \leq (1-\eta) d(x_n, x_{n-1} )

for any {n}, and so the distances {d(x_n, x_{n-1} )} between successive elements of the sequence decay at at least a geometric rate. This leads to the contraction mapping theorem, which has many important consequences, such as the inverse function theorem and the Picard existence theorem.
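As a concrete instance of the scheme above: cos is a contraction on [0,1] (its derivative is bounded away from ±1 there), so fixed point iteration converges to the unique solution of cos(x) = x. A short sketch of my own for illustration:

```python
import math

def fixed_point(phi, x0, tol=1e-12, max_iter=1000):
    """Fixed point iteration x_{n+1} = phi(x_n) for a contraction phi."""
    x = x0
    for _ in range(max_iter):
        x_next = phi(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("did not converge")

# cos is a contraction near its fixed point, so the iteration converges
# to the unique solution of cos(x) = x (the "Dottie number").
root = fixed_point(math.cos, 0.5)
print(root)  # ~0.739085
```

The successive distances |x_{n+1} - x_n| shrink by roughly the factor |sin(root)| ≈ 0.67 each step, exactly the geometric decay described above.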

A slightly more complicated instance of this strategy arises when trying to linearise a complex map {f: U \rightarrow {\bf C}} defined in a neighbourhood {U} of a fixed point. For simplicity we normalise the fixed point to be the origin, thus {0 \in U} and {f(0)=0}. When studying the complex dynamics {f^2 = f \circ f}, {f^3 = f \circ f \circ f}, {\dots} of such a map, it can be useful to try to conjugate {f} to another function {g = \psi^{-1} \circ f \circ \psi}, where {\psi} is a holomorphic function defined and invertible near {0} with {\psi(0)=0}, since the dynamics of {g} will be conjugate to that of {f}. Note that if {f(0)=0} and {f'(0)=\lambda}, then from the chain rule any conjugate {g} of {f} will also have {g(0)=0} and {g'(0)=\lambda}. Thus, the “simplest” function one can hope to conjugate {f} to is the linear function {z \mapsto \lambda z}. Let us say that {f} is linearisable (around {0}) if it is conjugate to {z \mapsto \lambda z} in some neighbourhood of {0}. Equivalently, {f} is linearisable if there is a solution to the Schröder equation

\displaystyle  f( \psi(z) ) = \psi(\lambda z) \ \ \ \ \ (2)

for some {\psi: U' \rightarrow {\bf C}} defined and invertible in a neighbourhood {U'} of {0} with {\psi(0)=0}, and all {z} sufficiently close to {0}. (The Schröder equation is normalised somewhat differently in the literature, but this form is equivalent to the usual form, at least when {\lambda} is non-zero.) Note that if {\psi} solves the above equation, then so does {z \mapsto \psi(cz)} for any non-zero {c}, so we may normalise {\psi'(0)=1} in addition to {\psi(0)=0}, which also ensures local invertibility from the inverse function theorem. (Note from winding number considerations that {\psi} cannot be invertible near zero if {\psi'(0)} vanishes.)

We have the following basic result of Koenigs:

Theorem 1 (Koenigs’ linearisation theorem) Let {f: U \rightarrow {\bf C}} be a holomorphic function defined near {0} with {f(0)=0} and {f'(0)=\lambda}. If {0 < |\lambda| < 1} (attracting case) or {1 < |\lambda| < \infty} (repelling case), then {f} is linearisable near zero.

Proof: Observe that if {f, \psi, \lambda} solve (2), then {f^{-1}, \psi, \lambda^{-1}} solve (2) also (in a sufficiently small neighbourhood of zero), as one sees by applying {f^{-1}} to both sides of (2) and replacing {z} by {\lambda^{-1} z}. Thus we may reduce to the attractive case {0 < |\lambda| < 1}.

Let {r>0} be a sufficiently small radius, and let {X} denote the space of holomorphic functions {\psi: B(0,r) \rightarrow {\bf C}} on the complex disk {B(0,r) := \{z \in {\bf C}: |z| < r \}} with {\psi(0)=0} and {\psi'(0)=1}. We can view the Schröder equation (2) as a fixed point equation

\displaystyle  \psi = \Phi(\psi)

where {\Phi: X' \rightarrow X} is the partially defined function on {X} that maps a function {\psi: B(0,r) \rightarrow {\bf C}} to the function {\Phi(\psi): B(0,r) \rightarrow {\bf C}} defined by

\displaystyle  \Phi(\psi)(z) := f^{-1}( \psi( \lambda z ) ),

assuming that {f^{-1}} is well-defined on the image {\psi(B(0,|\lambda| r))} (this is why {\Phi} is only partially defined).

We can solve this equation by the fixed point iteration method, if {r} is small enough. Namely, we start with {\psi_0: B(0,r) \rightarrow {\bf C}} being the identity map, and set {\psi_1 := \Phi(\psi_0), \psi_2 := \Phi(\psi_1)}, etc. We equip {X} with the uniform metric {d( \psi, \tilde \psi ) := \sup_{z \in B(0,r)} |\psi(z) - \tilde \psi(z)|}. Observe that if {d( \psi, \psi_0 ), d(\tilde \psi, \psi_0) \leq r}, and {r} is small enough, then {\psi, \tilde \psi} take values in {B(0,2r)}, and {\Phi(\psi), \Phi(\tilde \psi)} are well-defined and lie in {X}. Also, since {f^{-1}} is smooth and has derivative {\lambda^{-1}} at {0}, we have

\displaystyle  |f^{-1}(z) - f^{-1}(w)| \leq (1+\varepsilon) |\lambda|^{-1} |z-w|

if {z, w \in B(0,r)}, {\varepsilon>0} and {r} is sufficiently small depending on {\varepsilon}. This is not yet enough to establish the required contraction (thanks to Mario Bonk for pointing this out); but observe that since {\psi(0) = \tilde \psi(0) = 0} and {\psi'(0) = \tilde \psi'(0) = 1}, the difference {\psi - \tilde \psi} vanishes to second order at the origin, so the function {\frac{\psi(z)-\tilde \psi(z)}{z^2}} is holomorphic on {B(0,r)} and bounded by {d(\psi,\tilde \psi)/r^2} on the boundary of this ball (or slightly within this boundary). By the maximum principle we then see that

\displaystyle  |\frac{\psi(z)-\tilde \psi(z)}{z^2}| \leq \frac{1}{r^2} d(\psi,\tilde \psi)

on all of {B(0,r)}, and in particular

\displaystyle  |\psi(z)-\tilde \psi(z)| \leq |\lambda|^2 d(\psi,\tilde \psi)

on {B(0,|\lambda| r)}. Putting all this together, we see that

\displaystyle  d( \Phi(\psi), \Phi(\tilde \psi)) \leq (1+\varepsilon) |\lambda| d(\psi, \tilde \psi);

since {|\lambda|<1}, we thus obtain a contraction on the ball {\{ \psi \in X: d(\psi,\psi_0) \leq r \}} if {\varepsilon} is small enough (and {r} sufficiently small depending on {\varepsilon}). From this (and the completeness of {X}, which follows from Morera’s theorem) we see that the iteration {\psi_n} converges (exponentially fast) to a limit {\psi \in X} which is a fixed point of {\Phi}, and thus solves Schröder’s equation, as required. \Box
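One can sanity-check Koenigs’ theorem numerically. Rather than running the fixed point iteration from the proof, the sketch below uses the classical Koenigs limit {\varphi(z) = \lim_{n \to \infty} f^n(z)/\lambda^n}, which constructs the inverse {\varphi = \psi^{-1}} of the linearising map and satisfies {\varphi(f(z)) = \lambda \varphi(z)}; the choice of {f} is purely illustrative:

```python
lam = 0.5                        # attracting multiplier, 0 < |lam| < 1
def f(z):
    return lam * z + z**2        # f(0) = 0, f'(0) = lam

def koenigs_phi(z, n=48):
    """Classical Koenigs limit phi(z) = lim f^n(z) / lam^n, which
    conjugates f to multiplication by lam: phi(f(z)) = lam * phi(z)."""
    for _ in range(n):
        z = f(z)
    return z / lam**n

z = 0.1
# Verify the conjugacy phi(f(z)) = lam * phi(z) near the fixed point:
err = abs(koenigs_phi(f(z)) - lam * koenigs_phi(z))
```

Here `err` comes out at roundoff scale, consistent with {f} being linearisable with multiplier {\lambda = 1/2}.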

Koenigs’ linearisation theorem leaves open the indifferent case when {|\lambda|=1}. In the rationally indifferent case when {\lambda^n=1} for some natural number {n}, there is an obvious obstruction to linearisability, namely that one must have {f^n = \mathrm{id}} (in particular, linearisation is not possible in this case when {f} is a non-trivial rational function). An obstruction is also present in some irrationally indifferent cases (where {|\lambda|=1} but {\lambda^n \neq 1} for any natural number {n}), if {\lambda} is sufficiently close to various roots of unity; the first result of this form is due to Cremer, and the optimal result of this type for quadratic maps was established by Yoccoz. In the other direction, we have the following result of Siegel:

Theorem 2 (Siegel’s linearisation theorem) Let {f: U \rightarrow {\bf C}} be a holomorphic function defined near {0} with {f(0)=0} and {f'(0)=\lambda}. If {|\lambda|=1} and one has the Diophantine condition {\frac{1}{|\lambda^n-1|} \leq C n^C} for all natural numbers {n} and some constant {C>0}, then {f} is linearisable at {0}.

The Diophantine condition can be relaxed to a more general condition involving the rational exponents of the phase {\theta} of {\lambda = e^{2\pi i \theta}}; this was worked out by Brjuno, with the condition matching the one later obtained by Yoccoz. Amusingly, while the set of Diophantine numbers (and hence the set of linearisable {\lambda}) has full measure on the unit circle, the set of non-linearisable {\lambda} is generic (the complement of countably many nowhere dense sets) due to the above-mentioned work of Cremer, leading to a striking disparity between the measure-theoretic and category notions of “largeness”.

Siegel’s theorem does not seem to be provable using a fixed point iteration method. However, it can be established by modifying another basic method to solve equations, namely Newton’s method. Let us first review how this method works to solve the equation {f(x)=0} for some smooth function {f: I \rightarrow {\bf R}} defined on an interval {I}. We suppose we have some initial approximant {x_0 \in I} to this equation, with {f(x_0)} small but not necessarily zero. To make the analysis more quantitative, let us suppose that the interval {[x_0-r_0,x_0+r_0]} lies in {I} for some {r_0>0}, and we have the estimates

\displaystyle  |f(x_0)| \leq \delta_0 r_0

\displaystyle  |f'(x)| \geq \eta_0

\displaystyle  |f''(x)| \leq \frac{1}{\eta_0 r_0}

for some {\delta_0 > 0} and {0 < \eta_0 < 1/2} and all {x \in [x_0-r_0,x_0+r_0]} (the factors of {r_0} are present to make {\delta_0,\eta_0} “dimensionless”).

Lemma 3 Under the above hypotheses, we can find {x_1} with {|x_1 - x_0| \leq \eta_0 r_0} such that

\displaystyle  |f(x_1)| \ll \delta_0^2 \eta_0^{-O(1)} r_0.

In particular, setting {r_1 := (1-\eta_0) r_0}, {\eta_1 := \eta_0/2}, and {\delta_1 = O(\delta_0^2 \eta_0^{-O(1)})}, we have {[x_1-r_1,x_1+r_1] \subset [x_0-r_0,x_0+r_0] \subset I}, and

\displaystyle  |f(x_1)| \leq \delta_1 r_1

\displaystyle  |f'(x)| \geq \eta_1

\displaystyle  |f''(x)| \leq \frac{1}{\eta_1 r_1}

for all {x \in [x_1-r_1,x_1+r_1]}.

The crucial point here is that the new error {\delta_1} is roughly the square of the previous error {\delta_0}. This leads to extremely fast (double-exponential) improvement in the error upon iteration, which is more than enough to absorb the exponential losses coming from the {\eta_0^{-O(1)}} factor.
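The mechanism is easy to see by iterating the recursion numerically; in the sketch below the constants {C} and {K} are hypothetical stand-ins for the implied constant and the unspecified {O(1)} exponent:

```python
# C and K are hypothetical stand-ins for the implied constant and the
# O(1) exponent in delta_n = O(delta_{n-1}^2 * eta_{n-1}^{-O(1)}).
C, K = 10.0, 4
eta, delta = 0.5, 1e-6           # eta_0 and a sufficiently small delta_0
deltas = [delta]
for _ in range(6):
    delta = C * delta**2 * eta**(-K)   # squaring beats the eta^{-K} losses
    eta /= 2                           # eta_n = eta_{n-1} / 2
    deltas.append(delta)
# deltas decays double-exponentially: 1e-06, 1.6e-10, 6.6e-17, 1.8e-28, ...
```

The exponentially growing factor {\eta_{n-1}^{-K}} is overwhelmed by the squaring of {\delta_{n-1}}, provided {\delta_0} starts small enough, which is exactly the point made above.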

Proof: If {\delta_0 > c \eta_0^{C}} for some absolute constants {C,c>0} then we may simply take {x_1=x_0}, so we may assume that {\delta_0 \leq c \eta_0^{C}} for some small {c>0} and large {C>0}. Using the Newton approximation {f(x_0+h) \approx f(x_0) + h f'(x_0)} we are led to the choice

\displaystyle  x_1 := x_0 - \frac{f(x_0)}{f'(x_0)}

for {x_1}. From the hypotheses on {f} and the smallness hypothesis on {\delta_0} we certainly have {|x_1-x_0| \leq \eta_0 r_0}. From Taylor’s theorem with remainder we have

\displaystyle  f(x_1) = f(x_0) - \frac{f(x_0)}{f'(x_0)} f'(x_0) + O( \frac{1}{\eta_0 r_0} |\frac{f(x_0)}{f'(x_0)}|^2 )

\displaystyle  = O( \frac{1}{\eta_0 r_0} (\frac{\delta_0 r_0}{\eta_0})^2 )

and the claim follows. \Box

We can iterate this procedure; starting with {x_0,\eta_0,r_0,\delta_0} as above, we obtain a sequence of nested intervals {[x_n-r_n,x_n+r_n]} with {|f(x_n)| \leq \delta_n r_n}, and with {\eta_n,r_n,\delta_n,x_n} evolving by the recursive equations and estimates

\displaystyle  \eta_n = \eta_{n-1} / 2

\displaystyle  r_n = (1 - \eta_{n-1}) r_{n-1}

\displaystyle  \delta_n = O( \delta_{n-1}^2 \eta_{n-1}^{-O(1)} )

\displaystyle  |x_n - x_{n-1}| \leq \eta_{n-1} r_{n-1}.

If {\delta_0} is sufficiently small depending on {\eta_0}, we see that {\delta_n} converges rapidly to zero (indeed, we can inductively obtain a bound of the form {\delta_n \leq \eta_0^{C (2^n + n)}} for some large absolute constant {C} if {\delta_0} is small enough), and {x_n} converges to a limit {x \in I} which then solves the equation {f(x)=0} by the continuity of {f}.
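Concretely, the quadratic gain in accuracy is simple to observe; here is a short Python sketch of the Newton iteration applied to the illustrative equation {x^2 - 2 = 0} (not tied to the quantitative hypotheses above):

```python
def newton_step(f, fprime, x):
    # x_1 := x_0 - f(x_0)/f'(x_0), the choice made in Lemma 3
    return x - f(x) / fprime(x)

f = lambda x: x**2 - 2
fprime = lambda x: 2 * x
x, errs = 1.5, []
for _ in range(5):
    x = newton_step(f, fprime, x)
    errs.append(abs(f(x)))
# errs shrinks roughly quadratically: ~7e-3, ~6e-6, ~5e-12, ...
```

Each error is comparable to the square of the previous one, matching the {\delta_n = O(\delta_{n-1}^2 \eta_{n-1}^{-O(1)})} behaviour in the lemma.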

As I recently learned from Zhiqiang Li, a similar scheme works to prove Siegel’s theorem, as can be found for instance in this text of Carleson and Gamelin. The key is the following analogue of Lemma 3.

Lemma 4 Let {\lambda} be a complex number with {|\lambda|=1} and {\frac{1}{|\lambda^n-1|} \ll n^{O(1)}} for all natural numbers {n}. Let {r_0>0}, and let {f_0: B(0,r_0) \rightarrow {\bf C}} be a holomorphic function with {f_0(0)=0}, {f'_0(0)=\lambda}, and

\displaystyle  |f_0(z) - \lambda z| \leq \delta_0 r_0 \ \ \ \ \ (3)

for all {z \in B(0,r_0)} and some {\delta_0>0}. Let {0 < \eta_0 \leq 1/2}, and set {r_1 := (1-\eta_0) r_0}. Then there exists an injective holomorphic function {\psi_1: B(0, r_1) \rightarrow B(0, r_0)} and a holomorphic function {f_1: B(0,r_1) \rightarrow {\bf C}} such that

\displaystyle  f_0( \psi_1(z) ) = \psi_1(f_1(z)) \ \ \ \ \ (4)

for all {z \in B(0,r_1)}, and such that

\displaystyle  |\psi_1(z) - z| \ll \delta_0 \eta_0^{-O(1)} r_1


\displaystyle  |f_1(z) - \lambda z| \leq \delta_1 r_1

for all {z \in B(0,r_1)} and some {\delta_1 = O(\delta_0^2 \eta_0^{-O(1)})}.

Proof: By scaling we may normalise {r_0=1}. If {\delta_0 > c \eta_0^C} for some constants {c,C>0}, then we can simply take {\psi_1} to be the identity and {f_1=f_0}, so we may assume that {\delta_0 \leq c \eta_0^C} for some small {c>0} and large {C>0}.

To motivate the choice of {\psi_1}, we write {f_0(z) = \lambda z + \hat f_0(z)} and {\psi_1(z) = z + \hat \psi_1(z)}, with {\hat f_0} and {\hat \psi_1} viewed as small. We would like to have {f_0(\psi_1(z)) \approx \psi_1(\lambda z)}, which expands as

\displaystyle  \lambda z + \lambda \hat \psi_1(z) + \hat f_0( z + \hat \psi_1(z) ) \approx \lambda z + \hat \psi_1(\lambda z).

As {\hat f_0} and {\hat \psi_1} are both small, we can heuristically approximate {\hat f_0(z + \hat \psi_1(z) ) \approx \hat f_0(z)} up to quadratic errors (compare with the Newton approximation {f(x_0+h) \approx f(x_0) + h f'(x_0)}), and arrive at the equation

\displaystyle  \hat \psi_1(\lambda z) - \lambda \hat \psi_1(z) = \hat f_0(z). \ \ \ \ \ (5)

This equation can be solved by Taylor series; the function {\hat f_0} vanishes to second order at the origin and thus has a Taylor expansion

\displaystyle  \hat f_0(z) = \sum_{n=2}^\infty a_n z^n

and then {\hat \psi_1} has a Taylor expansion

\displaystyle  \hat \psi_1(z) = \sum_{n=2}^\infty \frac{a_n}{\lambda^n - \lambda} z^n.

We take this as our definition of {\hat \psi_1}, define {\psi_1(z) := z + \hat \psi_1(z)}, and then define {f_1} implicitly via (4).
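The term-by-term construction can be checked directly: dividing each Taylor coefficient by the small divisor {\lambda^n - \lambda} produces a function satisfying (5) as a formal identity. The Python sketch below does this with a hypothetical choice of coefficients {a_n} and a badly approximable rotation number:

```python
import cmath

theta = (5**0.5 - 1) / 2                  # a badly approximable rotation number
lam = cmath.exp(2j * cmath.pi * theta)    # |lam| = 1 and lam^n != 1 for all n

# Hypothetical Taylor coefficients a_n of f0_hat (vanishing to second order):
a = {n: 0.01 / n**2 for n in range(2, 30)}
# Solve the linearised equation term by term: b_n = a_n / (lam^n - lam)
b = {n: a[n] / (lam**n - lam) for n in a}

def series(coeffs, z):
    return sum(c * z**n for n, c in coeffs.items())

# Check psi1_hat(lam z) - lam psi1_hat(z) = f0_hat(z) at a sample point:
z = 0.3 + 0.1j
lhs = series(b, lam * z) - lam * series(b, z)
rhs = series(a, z)
```

The identity holds up to roundoff; the Diophantine condition is what keeps the divisors {\lambda^n - \lambda} from being so small that the coefficients {b_n} spoil convergence.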

Let us now justify that this choice works. By (3) and the generalised Cauchy integral formula, we have {|a_n| \leq \delta_0} for all {n}; by the Diophantine assumption on {\lambda}, we thus have {|\frac{a_n}{\lambda^n - \lambda}| \ll \delta_0 n^{O(1)}}. In particular, {\hat \psi_1} converges on {B(0,1)}, and on the disk {B(0, (1-\eta_0/4))} (say) we have the bounds

\displaystyle  |\hat \psi_1(z)|, |\hat \psi'_1(z)| \ll \delta_0 \sum_{n=2}^\infty n^{O(1)} (1-\eta_0/4)^n \ll \eta_0^{-O(1)} \delta_0. \ \ \ \ \ (6)

In particular, as {\delta_0} is so small, we see that {\psi_1} maps {B(0, (1-\eta_0/4))} injectively to {B(0,1)} and {B(0,1-\eta_0)} to {B(0,1-3\eta_0/4)}, and the inverse {\psi_1^{-1}} maps {B(0, (1-\eta_0/2))} to {B(0, (1-\eta_0/4))}. From (3) we see that {f_0} maps {B(0,1-3\eta_0/4)} to {B(0,1-\eta_0/2)}, and so if we set {f_1: B(0,1-\eta_0) \rightarrow B(0,1-\eta_0/4)} to be the function {f_1 := \psi_1^{-1} \circ f_0 \circ \psi_1}, then {f_1} is a holomorphic function obeying (4). Expanding (4) in terms of {\hat f_0} and {\hat \psi_1} as before, and also writing {f_1(z) = \lambda z + \hat f_1(z)}, we have

\displaystyle  \lambda z + \lambda \hat \psi_1(z) + \hat f_0( z + \hat \psi_1(z) ) = \lambda z + \hat f_1(z) + \hat \psi_1(\lambda z + \hat f_1(z))

for {z \in B(0, 1-\eta_0)}, which by (5) simplifies to

\displaystyle  \hat f_1(z) = \hat f_0( z + \hat \psi_1(z) ) - \hat f_0(z) + \hat \psi_1(\lambda z) - \hat \psi_1(\lambda z + \hat f_1(z)).

From (6), the fundamental theorem of calculus, and the smallness of {\delta_0} we have

\displaystyle  |\hat \psi_1(\lambda z) - \hat \psi_1(\lambda z + \hat f_1(z))| \leq \frac{1}{2} |\hat f_1(z)|

and thus

\displaystyle  |\hat f_1(z)| \leq 2 |\hat f_0( z + \hat \psi_1(z) ) - \hat f_0(z)|.

From (3) and the Cauchy integral formula we have {\hat f'_0(z) = O( \delta_0 \eta_0^{-O(1)})} on (say) {B(0,1-\eta_0/4)}, and so from (6) and the fundamental theorem of calculus we conclude that

\displaystyle  |\hat f_1(z)| \ll \delta_0^2 \eta_0^{-O(1)}

on {B(0,1-\eta_0)}, and the claim follows. \Box

If we set {\eta_0 := 1/2}, {f_0 := f}, and {\delta_0>0} to be sufficiently small, then (since {f(z)-\lambda z} vanishes to second order at the origin), the hypotheses of this lemma will be obeyed for some sufficiently small {r_0}. Iterating the lemma (and halving {\eta_0} repeatedly), we can then find sequences {\eta_n, \delta_n, r_n > 0}, injective holomorphic functions {\psi_n: B(0,r_n) \rightarrow B(0,r_{n-1})} and holomorphic functions {f_n: B(0,r_n) \rightarrow {\bf C}} such that one has the recursive identities and estimates

\displaystyle  \eta_n = \eta_{n-1} / 2

\displaystyle  r_n = (1 - \eta_{n-1}) r_{n-1}

\displaystyle  \delta_n = O( \delta_{n-1}^2 \eta_{n-1}^{-O(1)} )

\displaystyle  |\psi_n(z) - z| \ll \delta_{n-1} \eta_{n-1}^{-O(1)} r_n

\displaystyle  |f_n(z) - \lambda z| \leq \delta_n r_n

\displaystyle  f_{n-1}( \psi_n(z) ) = \psi_n(f_n(z))

for all {n \geq 1} and {z \in B(0,r_n)}. By construction, {r_n} decreases to a positive radius {r_\infty} that is a constant multiple of {r_0}, while (for {\delta_0} small enough) {\delta_n} converges double-exponentially to zero, so in particular {f_n(z)} converges uniformly to {\lambda z} on {B(0,r_\infty)}. Also, since {\psi_n} is close enough to the identity, the compositions {\Psi_n := \psi_1 \circ \dots \circ \psi_n} are uniformly convergent on {B(0,r_\infty/2)} with {\Psi_n(0)=0} and {\Psi'_n(0)=1}. From this we have

\displaystyle  f( \Psi_n(z) ) = \Psi_n(f_n(z))

on {B(0,r_\infty/4)}, and on taking limits using Morera’s theorem we obtain a holomorphic function {\Psi} defined near {0} with {\Psi(0)=0}, {\Psi'(0)=1}, and

\displaystyle  f( \Psi(z) ) = \Psi(\lambda z),

obtaining the required linearisation.

Remark 5 The idea of using a Newton-type method to obtain error terms that decay double-exponentially, and can therefore absorb exponential losses in the iteration, also occurs in KAM theory and in Nash-Moser iteration, presumably due to Siegel’s influence on Moser. (I discuss Nash-Moser iteration in this note that I wrote back in 2006.)

Filed under: expository, math.CV, math.DS Tagged: complex dynamics, Newton's method, Siegel's linearization theorem

Chad OrzelSteelyKid, Galactic Engineer

“Hey, Daddy, did you know that in five or six million years the Sun is going to explode.”

“It’s five or six billion years, with a ‘b.’”

“Right, in five or six billion years, the Sun’s going to explode.”

“Well, a star like our Sun won’t really explode. It’ll swell up really big, probably swallow the Earth, and then kind of… go out.”

“Right, and then it would be dark all the time. So we’d need to build a really big lamp.”

“Well, in five or six billion years, maybe we’d just build a new star.”

“How would we do that?”

“Well, you know, you just get a really big bunch of hydrogen together.”

“Oh, right, and gravity pulls it in and it heats up and then makes a star.”


“Yeah. So, that’s why we have scientists. Scientists who are, like, working on how to make stars, and make them really big. So we can make a new star when the Sun goes out.”

“In five or six billion years, sure.”

“That’s a good idea. That’s a better idea than building a big lamp. Because a big lamp would need a really big light bulb, and light bulbs burn out. A star is better than a lamp.”


“So, if the Sun goes out, what will happen to all the people?”

“What people?”

“You know… people. I mean, I know I won’t still be alive. And you definitely won’t still be alive. But what will happen to, you know, the other people?”

“Well, before the Sun goes out, it will swell up really big. We’re not exactly sure how big, but probably about the size of the Earth’s orbit, so the Sun will burn up the Earth before it goes out.”

“But what will happen to the other planets?”

“Well, the Sun will get about big enough to swallow the Earth, but I don’t think it’ll get Mars. And Jupiter and Saturn and Uranus and Neptune will stay just where they are.”


“But, you know, by that time, there might be people living on Jupiter, and they’d be fine.”

“How could you live on Jupiter? It’s a gas giant. They’d just fall right into it.”

“Sure, but you could build floating cities, or something.”

“Oh, right. But you’d need to send the builders first. To make the city. And also some girls.”


“Girls are the ones who have the babies, right? So you would need to send some girls to Jupiter with the builders, to have babies so there would be more people.”

“Yeah, but some of the builders would be girls, right? I mean, girls like to build stuff, right?”

“Well, obviously. I build stuff, and I want to be a builder who, like, uses science to build things. And make them better and stuff.”

“That’s a good plan, honey. You do that.”

“Yeah, and then when the Sun goes out, we can make a new star, for the people living on Jupiter.”

“Sure. Of course, by that time, people might be living on other planets around other stars.”

“Oh, right. So maybe we could just, you know, grab one of those. Or maybe one will just come along and take the place of the Sun after it goes out.”


“You know, because space has a way of doing things.”

“Yes it does, honey, yes it does.”

April 16, 2015

Clifford JohnsonBeyond the Battling Babes

The recent Babe War (Food Babe vs Science Babe) that probably touched your inbox or news feed is a great opportunity to think about a broader issue: the changing faces of science communication. I spoke about this with LA Times science writer Eryn Brown who wrote an excellent article about it that appears today. (Picture (Mark Boster/Los Angeles Times) and headline are from the article's online version.) (By the way, due to space issues, a lot of what we spoke about did not make it to the article (at least not in the form of quotes), including: [...] Click to continue reading this post

Doug NatelsonSeveral items - SpaceX, dark matter, Dyson spheres, Bell Labs, and some condensed matter articles

There are a number of interesting physicsy science stories out there right now:

  • SpaceX came very very close to successfully landing and recovering the first stage of their Falcon 9 rocket yesterday.  It goes almost without saying that they are doing this because they want to reuse the booster and want to avoid ruining the engines by having them end up in salt water.  I've seen a number of well-intentioned people online ask, why don't they just use a parachute, or set up a big net to catch it if it falls sideways, etc.  To answer the first question:  The booster is designed to be mechanically happy in compression, when the weight of the rocket is pushing down on the lower parts as it sits on the pad, and when the acceleration due to the engines is pushing it along its long axis.  Adding structure to make the booster strong in tension as well (as when it gets yanked on from above by parachute drogue lines) would be a major redesign and would add mass (that takes away from payload).  For the second question:  The nearly empty booster is basically a thin-walled metal tube.  If it's supported unevenly from the side, it will buckle under accelerations (like hitting a net).  Good luck to them!
  • It would appear that there is observational evidence that dark matter might interact with itself through forces that are not just gravitational.  That would be very interesting indeed.  Many "simple" ideas about dark matter (say photinos) are not charged, so real dark-dark interactions beyond gravity could limit the candidates to consider.  I'm sure there will be papers on the high energy part of the arXiv within days claiming that string theory predicts exactly this, regardless of what "this" is.
  • A Penn State group did a study based on WISE data, and concluded after surveying 100000 distant galaxies that there are only about 50 that seem to emit "too much" in the infrared relative to expectations.   Why look for this?  Well, if there were galaxy-spanning civilizations capable of stellar-scale engineering projects, and if they decided to use that capability to build Dyson spheres to try to capture more than 10% of the star-radiated power in the galaxy, and if those civilizations liked temperature ranges near ours, then you would expect to see an excess of infrared.  So.  Seems like galaxy-spanning civilizations that like to do massive building of Dyson spheres and similar structures are very rare.  I can't say that I'm surprised, but I am glad that creative people are doing searches like this.
  • Alcatel-Lucent, including Bell Labs, is being purchased by Nokia.  If anyone knows what this means for Bell Labs research at the combined company, please feel free to post below.  
  • One interesting article I noticed in Nature Physics (sorry for the paywall) shows remarkably nice, clean fractional quantum Hall effect (FQHE) physics in ZnMgO/ZnO heterostructures.  The FQHE tends to be "fragile" - the 2d electron system has to be in a material environment so clean and perfect that not only can an electron make many cyclotron orbits before it scatters off any impurities or defects, but that kind of disorder has to be weak compared to some finicky electron-electron interactions that are at milliKelvin scales.   The new data shows FQHE signatures at "filling fractions" (ratios of magnetic field to electron density) that correspond to some comparatively exotic collective states.  Neat.
  • There is a special issue of Physica C coming out in honor of the remarkable (and very nice guy) Ted Geballe, a pioneer in superconductivity research.  I really don't like Elsevier as a publisher, so I am not going to link to their journal.  However, I will link to the arXiv versions of all the articles I've found from that issue:  "What Tc Tells", "Unconventional superconductivity in electron-doped layered metal nitride halides", "Superconductivity of magnesium diboride", "Superconducting doped topological materials", "Hole-doped cuprate high temperature superconductors", "Superconductivity in the elements, alloys, and simple compounds", "Epilogue:  Superconducting materials, past, present, and future", and "Superconducting materials classes:  Introduction and overview".  Good stuff by some of the big names in the field.

Clifford JohnsonNaan

Oh Naan, I have the measure of thee... (Well, more or less. Ought to drop the level in the oven down a touch so that the broiler browns them less quickly.) (Click for larger view.) Very tasty! (The Murphy's stout was not a part of the recipe...) -cvj Click to continue reading this post

Matt StrasslerScience Festival About to Start in Cambridge, MA

It’s a busy time here in Cambridge, Massachusetts, as the US’s oldest urban Science Festival opens tomorrow for its 2015 edition.  It has been 100 years since Einstein wrote his equations for gravity, known as his Theory of General Relativity, and so this year a significant part of the festival involves Celebrating Einstein.  The festival kicks off tomorrow with a panel discussion of Einstein and his legacy near Harvard University — and I hope some of you can go!   Here are more details:


First Parish in Cambridge, 1446 Massachusetts Avenue, Harvard Square, Cambridge
Friday, April 17; 7:30pm-9:30pm

Officially kicking off the Cambridge Science Festival, four influential physicists will sit down to discuss how Einstein’s work shaped the world we live in today and where his influence will continue to push the frontiers of science in the future!

Our esteemed panelists include:
Lisa Randall | Professor of Physics, Harvard University
Priyamvada Natarajan | Professor of Astronomy & Physics, Yale University
Clifford Will | Professor of Physics, University of Florida
Peter Galison | Professor of History of Science, Harvard University
David Kaiser | Professor of the History of Science, MIT

Cost: $10 per person, $5 per student, Tickets available now at

Filed under: History of Science, Public Outreach Tagged: Einstein, PublicOutreach, relativity

April 15, 2015

Chad OrzelMy Quantum Alarm Clock

One of the things I struggle with a bit when it comes to writing about cool modern physics is how much to play up the weirdness. On the one hand, people just can’t get enough of “spooky action at a distance,” but on the other hand, talking too much about that sort of thing makes quantum physics seem like a completely bizarre theory with no applications.

Which is unfortunate, because quantum physics is essential for all manner of everyday technology. For example, as I try to explain in a new post at Forbes, quantum physics is essential to the cheap alarm clock that wakes me up in the morning.

So, you know, go over there and check it out. Also, you can see a carefully curated selection from the giant pile of books that usually lives on my nightstand with the alarm clock…

David Hogg#astrohackny, week N+1

The day started at #astrohackny with Foreman-Mackey and me arguing about convolutions of Gaussians. The question is: Consider a model (probability of the data given parameters) with two (linear) parameters of importance and 150 (linear) nuisance parameters. There is a very weak Gaussian prior on the nuisance parameters. How to write down the marginalized likelihood such that you only have to do a 2x2 least squares, not a 152x152 least squares? I had a very strong intuition about the answer but no solid argument. Very late at night I demonstrated that my intuition is correct, by the method of experimental coding. Not very satisfying, but my abilities to complete squares with high-dimensional linear operators are not strong!
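For what it's worth, the general kind of marginalization being described can be sketched as follows: with a linear model and a Gaussian prior on the nuisance parameters, the nuisances can be folded analytically into the noise covariance, after which estimating the two interesting parameters needs only a 2x2 solve. The setup below (dimensions, design matrices, prior width) is entirely hypothetical and is not Hogg's actual model; for brevity it also solves against the full N x N marginal covariance directly rather than exploiting its low-rank structure (e.g. via the Woodbury identity), which is where the real computational savings would come from:

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, Q = 300, 2, 150            # data points, interesting params, nuisances
A = rng.normal(size=(N, P))      # design matrix for the 2 interesting params
B = rng.normal(size=(N, Q))      # design matrix for the 150 nuisances
sigma, tau = 1.0, 0.1            # noise level; weak Gaussian prior width

b_true = np.array([1.0, -2.0])
y = A @ b_true + B @ rng.normal(scale=tau, size=Q) \
    + rng.normal(scale=sigma, size=N)

# Marginalizing the Gaussian nuisances analytically folds them into the
# noise covariance: y ~ N(A b, sigma^2 I + tau^2 B B^T).
C = sigma**2 * np.eye(N) + tau**2 * (B @ B.T)
Cinv_A = np.linalg.solve(C, A)
Cinv_y = np.linalg.solve(C, y)
b_hat = np.linalg.solve(A.T @ Cinv_A, A.T @ Cinv_y)   # just a 2x2 system
```

The point of the sketch is only that the 150 nuisance directions never appear explicitly in the final least squares: they live entirely inside the modified covariance.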

Taisiya Kopytova (MPIA) is visiting NYU for a couple of months, to work on characterizing directly imaged extra-solar planets. We discussed the simultaneous fitting of photometry and spectroscopy, one of my favorite subjects! I, of course, recommended modeling the calibration (or, equivalently, continuum-normalization) issues simultaneously with the parameter estimation. We also discussed interpolation (of the model grid) and MCMC sampling and the likelihood function.

At Pizza Lunch at Columbia, Chiara Mingarelli (Caltech) talked about the Pulsar Timing Array and its project to detect the stochastic background of gravitational waves. The beautiful thing about the experiment is that it detects the motion of the Earth relative to the pulsars, not the individual motions of the pulsars, and it does so using time correlations in timing residuals as a function of angle between the pulsars. The assumption is that the ball of pulsars is far larger than the relevant wavelengths, and that different pulsars are causally unconnected in time. Interesting to think about the "multiple hypotheses" aspects of this with finite data.

David Hoggvary all the exposure times!

Ruth Angus showed up for a few days, and we talked out the first steps to make an argument for taking time-series data with variable exposure times. We all know that non-uniform spacing of data helps with frequency recovery in time series; our new intuition is that non-uniform exposure time will help as well, especially for very high frequencies (short periods). We are setting up tests now with Kepler data, with an eye to challenging the TESS mission to bite a big, scary bullet.

After complaining for the millionth time about PCA (and my loyal reader—who turns out to be Todd Small at The Climate Corporation—knows I love to hate on the PCA), Foreman-Mackey and I finally decided to fire up the robust PCA or PCP method from compressed sensing (not the badly-re-named "robust PCA" in the astronomy literature). The fundamental paper is Candès et al; the method has no free parameters, and the paper includes ridiculously simple pseudo-code. It looks like it absolutely rocks, and obviates all masking or interpolation of missing or bad data!

At lunch, Gabriele Veneziano (Paris, NYU) spoke about graviton–graviton interactions and causality constraints. Question that came up in the talk: If a particle suffers a negative time delay (like the opposite of a gravitational time delay), can you necessarily therefore build a time machine? That's something to dine out on.

Matt StrasslerMore on Dark Matter and the Large Hadron Collider

As promised in my last post, I’ve now written the answer to the second of the three questions I posed about how the Large Hadron Collider [LHC] can search for dark matter.  You can read the answers to the first two questions here. The first question was about how scientists can possibly look for something that passes through a detector without leaving any trace!  The second question is how scientists can tell the difference between ordinary production of neutrinos — which also leave no trace — and production of something else. [The answer to the third question — how one could determine this “something else” really is what makes up dark matter — will be added to the article later this week.]

In the meantime, after Monday’s post, I got a number of interesting questions about dark matter, why most experts are confident it exists, etc.  There are many reasons to be confident; it’s not just one argument, but a set of interlocking arguments.  One of the most powerful comes from simulations of the universe’s history.  These simulations

  • start with what we think we know about the early universe from the cosmic microwave background [CMB], including the amount of ordinary and dark matter inferred from the CMB (assuming Einstein’s gravity theory is right), and also including the degree of non-uniformity of the local temperature and density;
  • and use equations for known physics, including Einstein’s gravity, the behavior of gas and dust when compressed and heated, the effects of various forms of electromagnetic radiation on matter, etc.

The output of these simulations is a prediction for the universe today — and indeed, it roughly has the properties of the one we inhabit.

Here’s a video from the Illustris collaboration, which has done the most detailed simulation of the universe so far.  Note the age of the universe listed at the bottom as the video proceeds.  On the left side of the video you see dark matter.  It quickly clumps under the force of gravity, forming a wispy, filamentary structure with dense knots, which then becomes rather stable; moderately dense regions are blue, highly dense regions are pink.  On the right side is shown gas.  You see that after the dark matter structure begins to form, that structure attracts gas, also through gravity, which then forms galaxies (blue knots) around the dense knots of dark matter.  The galaxies then form black holes with energetic disks and jets, and stars, many of which explode.   These much more complicated astrophysical effects blow clouds of heated gas (red) into intergalactic space.

Meanwhile, the distribution of galaxies in the real universe, as measured by astronomers, is illustrated in this video from the Sloan Digital Sky Survey.   You can see by eye that the galaxies in our universe show a filamentary structure, with big nearly-empty spaces, and loose strings of galaxies ending in big clusters.  That’s consistent with what is seen in the Illustris simulation.

Now if you’d like to drop the dark matter idea, the question you have to ask is this: could the simulations still give a universe similar to ours if you took dark matter out and instead modified Einstein’s gravity somehow?  [Usually this type of change goes under the name of MOND.]

In the simulation, gravity causes the dark matter, which is “cold” (cosmo-speak for “made from objects traveling much slower than light speed”), to form filamentary structures that then serve as the seeds for gas to clump and form galaxies.  So if you want to take the dark matter out, and instead change gravity to explain other features that are normally explained by dark matter, you have a challenge.   You are in danger of not creating the filamentary structure seen in our universe.  Somehow your change in the equations for gravity has to cause the gas to form galaxies along filaments, and do so in the time allotted.  Otherwise it won’t lead to the type of universe that we actually live in.

Challenging, yes.  Challenging is not the same as impossible. But everyone should understand that the arguments in favor of dark matter are by no means limited to the questions of how stars move in galaxies and how galaxies move in galaxy clusters.  Any implementation of MOND has to explain a lot of other things that, in most experts’ eyes, are efficiently taken care of by cold dark matter.


Terence Tao254A, Supplement 5: The linear sieve and Chen’s theorem (optional)

We continue the discussion of sieve theory from Notes 4, but now specialise to the case of the linear sieve in which the sieve dimension {\kappa} is equal to {1}, which is one of the best understood sieving situations, and one of the rare cases in which the precise limits of the sieve method are known. A bit more specifically, let {z, D \geq 1} be quantities with {z = D^{1/s}} for some fixed {s>1}, and let {g} be a multiplicative function with

\displaystyle  g(p) = \frac{1}{p} + O(\frac{1}{p^2}) \ \ \ \ \ (1)


\displaystyle  0 \leq g(p) \leq 1-c \ \ \ \ \ (2)

for all primes {p} and some fixed {c>0} (we allow all constants below to depend on {c}). Let {P(z) := \prod_{p<z} p}, and for each prime {p < z}, let {E_p} be a set of integers, with {E_d := \bigcap_{p|d} E_p} for {d|P(z)}. We consider finitely supported sequences {(a_n)_{n \in {\bf Z}}} of non-negative reals for which we have bounds of the form

\displaystyle  \sum_{n \in E_d} a_n = g(d) X + r_d \ \ \ \ \ (3)

for all square-free {d \leq D} and some {X>0}, and some remainder terms {r_d}. One is then interested in upper and lower bounds on the quantity

\displaystyle  \sum_{n\not \in\bigcup_{p <z} E_p} a_n.

The fundamental lemma of sieve theory (Corollary 19 of Notes 4) gives us the bound

\displaystyle  \sum_{n\not \in\bigcup_{p <z} E_p} a_n = (1 + O(e^{-s})) X V(z) + O( \sum_{d \leq D: \mu^2(d)=1} |r_d| ) \ \ \ \ \ (4)

where {V(z)} is the quantity

\displaystyle  V(z) := \prod_{p<z} (1-g(p)). \ \ \ \ \ (5)

This bound is strong when {s} is large, but is not as useful for smaller values of {s}. We now give a sharp bound in this regime. We introduce the functions {F, f: (0,+\infty) \rightarrow {\bf R}^+} by

\displaystyle  F(s) := 2e^\gamma ( \frac{1_{s>1}}{s} \ \ \ \ \ (6)

\displaystyle  + \sum_{j \geq 3, \hbox{ odd}} \frac{1}{j!} \int_{[1,+\infty)^{j-1}} 1_{t_1+\dots+t_{j-1}\leq s-1} \frac{dt_1 \dots dt_{j-1}}{t_1 \dots t_j} )


\displaystyle  f(s) := 2e^\gamma \sum_{j \geq 2, \hbox{ even}} \frac{1}{j!} \int_{[1,+\infty)^{j-1}} 1_{t_1+\dots+t_{j-1}\leq s-1} \frac{dt_1 \dots dt_{j-1}}{t_1 \dots t_j} \ \ \ \ \ (7)

where we adopt the convention {t_j := s - t_1 - \dots - t_{j-1}}. Note that for each {s} one has only finitely many non-zero summands in (6), (7). These functions are closely related to the Buchstab function {\omega} from Exercise 28 of Supplement 4; indeed from comparing the definitions one has

\displaystyle  F(s) + f(s) = 2 e^\gamma \omega(s)

for all {s>0}.
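This relation can be confirmed numerically on the range where all three functions have elementary closed forms, namely {2 < s \leq 3}: there one has the standard closed form {\omega(u) = (1+\log(u-1))/u}, while {F, f} are given by (10), (11) below. A minimal Python sketch of this check (the constant `GAMMA` is the Euler–Mascheroni constant):

```python
import math

GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def omega_buchstab(u):
    """Buchstab function on 1 <= u <= 3, via its standard closed forms."""
    if 1 <= u <= 2:
        return 1 / u
    if 2 < u <= 3:
        return (1 + math.log(u - 1)) / u
    raise ValueError("closed form only coded for 1 <= u <= 3")

def F(s):
    return 2 * math.exp(GAMMA) / s                      # (10), for 1 < s <= 3

def f(s):
    return 2 * math.exp(GAMMA) / s * math.log(s - 1)    # (11), for 2 <= s <= 4

# F(s) + f(s) = 2 e^gamma omega(s) on the common range 2 < s <= 3
for s in [2.2, 2.5, 2.8, 3.0]:
    assert abs(F(s) + f(s) - 2 * math.exp(GAMMA) * omega_buchstab(s)) < 1e-12
```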

Exercise 1 (Alternate definition of {F, f}) Show that {F(s)} is continuously differentiable except at {s=1}, and {f(s)} is continuously differentiable except at {s=2} where it is continuous, obeying the delay-differential equations

\displaystyle  \frac{d}{ds}( s F(s) ) = f(s-1) \ \ \ \ \ (8)

for {s > 1} and

\displaystyle  \frac{d}{ds}( s f(s) ) = F(s-1) \ \ \ \ \ (9)

for {s>2}, with the initial conditions

\displaystyle  F(s) = \frac{2e^\gamma}{s} 1_{s>1}

for {s \leq 3} and

\displaystyle  f(s) = 0

for {s \leq 2}. Show that these properties of {F, f} determine {F, f} completely.

For future reference, we record the following explicit values of {F, f}:

\displaystyle  F(s) = \frac{2e^\gamma}{s} \ \ \ \ \ (10)

for {1 < s \leq 3}, and

\displaystyle  f(s) = \frac{2e^\gamma}{s} \log(s-1) \ \ \ \ \ (11)

for {2 \leq s \leq 4}.
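One can sanity-check Exercise 1 against the explicit values (10), (11) on the ranges where both are available; the following Python sketch verifies the delay-differential equations (8), (9) by central finite differences (the step size h is an arbitrary choice):

```python
import math

GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def F(s):
    return 2 * math.exp(GAMMA) / s                      # (10), for 1 < s <= 3

def f(s):
    return 2 * math.exp(GAMMA) / s * math.log(s - 1)    # (11), for 2 <= s <= 4

h = 1e-6  # finite-difference step

# (9): d/ds (s f(s)) = F(s-1) for 2 < s < 4
for s in [2.5, 3.0, 3.5]:
    lhs = ((s + h) * f(s + h) - (s - h) * f(s - h)) / (2 * h)
    assert abs(lhs - F(s - 1)) < 1e-6

# (8): d/ds (s F(s)) = f(s-1) = 0 for 1 < s < 3 (s F(s) is constant there)
for s in [1.5, 2.0, 2.5]:
    lhs = ((s + h) * F(s + h) - (s - h) * F(s - h)) / (2 * h)
    assert abs(lhs) < 1e-6

# the bounds f(s) <= 1 <= F(s) at the endpoints of these ranges
assert f(4) < 1 < F(3)
```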

We will show

Theorem 2 (Linear sieve) Let the notation and hypotheses be as above, with {s > 1}. Then, for any {\varepsilon > 0}, one has the upper bound

\displaystyle  \sum_{n\not \in\bigcup_{p <z} E_p} a_n \leq (F(s) + O(\varepsilon)) X V(z) + O( \sum_{d \leq D: \mu^2(d)=1} |r_d| ) \ \ \ \ \ (12)

and the lower bound

\displaystyle  \sum_{n\not \in\bigcup_{p <z} E_p} a_n \geq (f(s) - O(\varepsilon)) X V(z) + O( \sum_{d \leq D: \mu^2(d)=1} |r_d| ) \ \ \ \ \ (13)

if {D} is sufficiently large depending on {\varepsilon, s, c}. Furthermore, this claim is sharp in the sense that the quantity {F(s)} cannot be replaced by any smaller quantity, and similarly {f(s)} cannot be replaced by any larger quantity.

Comparing the linear sieve with the fundamental lemma (and also testing using the sequence {a_n = 1_{1 \leq n \leq N}} for some extremely large {N}), we conclude that we necessarily have the asymptotics

\displaystyle  1 - O(e^{-s}) \leq f(s) \leq 1 \leq F(s) \leq 1 + O( e^{-s} )

for all {s \geq 1}; this can also be proven directly from the definitions of {F, f}, or from Exercise 1, though doing so is somewhat challenging; see e.g. Chapter 11 of Friedlander-Iwaniec for details.

Exercise 3 Establish the integral identities

\displaystyle  F(s) = 1 + \frac{1}{s} \int_s^\infty (1 - f(t-1))\ dt


\displaystyle  f(s) = 1 + \frac{1}{s} \int_s^\infty (1 - F(t-1))\ dt

for {s \geq 2}. Argue heuristically that these identities are consistent with the bounds in Theorem 2 and the Buchstab identity (Equation (16) from Notes 4).

Exercise 4 Use the Selberg sieve (Theorem 30 from Notes 4) to obtain a slightly weaker version of (12) in the range {1 < s < 3} in which the error term {|r_d|} is worsened to {\tau_3(d) |r_d|}, but the main term is unchanged.

We will prove Theorem 2 below the fold. The optimality of {F, f} is closely related to the parity problem obstruction discussed in Section 5 of Notes 4; a naive application of the parity arguments there only gives the weak bounds {F(s) \geq \frac{2 e^\gamma}{s}} and {f(s)=0} for {s \leq 2}, but this can be sharpened by a more careful counting of various sums involving the Liouville function {\lambda}.

As an application of the linear sieve (specialised to the ranges in (10), (11)), we will establish a famous theorem of Chen, giving (in some sense) the closest approach to the twin prime conjecture that one can hope to achieve by sieve-theoretic methods:

Theorem 5 (Chen’s theorem) There are infinitely many primes {p} such that {p+2} is the product of at most two primes.

The same argument gives the version of Chen’s theorem for the even Goldbach conjecture, namely that for all sufficiently large even {N}, there exists a prime {p} between {2} and {N} such that {N-p} is the product of at most two primes.

The discussion in these notes loosely follows that of Friedlander-Iwaniec (who study sieving problems in more general dimension than {\kappa=1}).

— 1. Optimality —

We first establish that the quantities {F(s), f(s)} appearing in Theorem 2 cannot be improved. We use the parity argument of Selberg, based on weight sequences {a_n} related to the Liouville function.

We argue for the optimality of {F(s)}; the argument for {f(s)} is similar and is left as an exercise. Suppose that there is {s \geq 1} for which the claim in Theorem 2 is not optimal, thus there exists {\delta>0} such that

\displaystyle  \sum_{n\not \in\bigcup_{p <z} E_p} a_n \leq (F(s) - \delta) X V(z) + O( \sum_{d \leq D: \mu^2(d)=1} |r_d| ) \ \ \ \ \ (14)

for {z, D, g, E_p, a_n, X, V(z), r_d} as in that theorem, with {z} sufficiently large.

We will contradict this claim by considering a special case. Let {z} be a large parameter going to infinity, and set {D := z^s}. We set {g(d) := 1/d}; then by Mertens’ theorem we have {V(z) = \frac{e^{-\gamma}+o(1)}{\log z}}. We set {E_p} to be the residue class {0\ (p)}, thus (3) becomes

\displaystyle  \sum_{n: d|n} a_n = g(d) X + r_d \ \ \ \ \ (15)

and (14) becomes

\displaystyle  \sum_{n: (n,P(z)) = 1} a_n \leq \frac{F(s) - \delta + o(1)}{e^\gamma} \frac{X}{\log z} + O( \sum_{d \leq D: \mu^2(d)=1} |r_d| ) \ \ \ \ \ (16)

where {P(z) := \prod_{p<z} p}.
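The Mertens asymptotic for {V(z)} used above can be spot-checked numerically. The following Python sketch (with the cutoff z = 10^5 chosen arbitrarily) compares the product {\prod_{p<z}(1-1/p)} against {e^{-\gamma}/\log z}:

```python
import math

GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def primes_below(n):
    """Sieve of Eratosthenes: list of primes p < n."""
    sieve = bytearray([1]) * n
    sieve[:2] = b'\x00\x00'
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(range(p * p, n, p)))
    return [p for p in range(n) if sieve[p]]

z = 10 ** 5  # arbitrary demo cutoff
V = 1.0
for p in primes_below(z):
    V *= 1 - 1 / p

mertens = math.exp(-GAMMA) / math.log(z)
# Mertens' third theorem: the ratio tends to 1 as z -> infinity
assert abs(V / mertens - 1) < 1e-2
```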

Now let {\varepsilon > 0} be a small fixed quantity to be chosen later, set {X := D^{1+\varepsilon}}, and let {a_n} be the sequence

\displaystyle  a_n := (1 - \lambda(n)) 1_{1 \leq n \leq X}.

This is clearly finitely supported and non-negative. For any {d}, we have

\displaystyle  \sum_{n: d|n} a_n = \sum_{n \leq X/d} 1 - \lambda(d) \sum_{n \leq X/d} \lambda(n)

from the multiplicativity of {\lambda}. If {d \leq D}, then {X/d \geq D^\varepsilon}, and then by the prime number theorem for the Liouville function (Exercise 41 from Notes 2, combined with Exercise 18 from Supplement 4) we have

\displaystyle  \sum_{n \leq X/d} \lambda(n) \ll_\varepsilon \frac{X}{d} \log^{-10} D

(say), and hence the remainder term {r_d} in (15) is of size

\displaystyle  |r_d| \ll_\varepsilon \frac{X}{d} \log^{-10} D. \ \ \ \ \ (17)

As such, the error term {O( \sum_{d \leq D: \mu^2(d)=1} |r_d| )} in (16) may be absorbed into the {o(1)} term, and so

\displaystyle  \sum_{n \leq X: (n,P(z))=1} (1-\lambda(n)) \leq \frac{F(s) - \delta + o(1)}{e^\gamma} \frac{X}{\log z}. \ \ \ \ \ (18)

Now we count the left-hand side. Observe that {1-\lambda(n)} is supported on those numbers {n = p_1 \dots p_r} that are the product of an odd number of primes {p_1 \geq \dots \geq p_r} (possibly with repetition), in which case {1-\lambda(n)=2}. To be coprime to {P(z)}, all these primes must be at least {z}; since we are restricting {n \leq X = z^{(1+\varepsilon)s}}, we thus must have {r \leq (1+\varepsilon) s}. The left-hand side of (18) may thus be written as

\displaystyle  2 \sum_{r \leq (1+\varepsilon) s, \hbox{ odd}} \sum_{p_1 \geq \dots \geq p_r \geq z: p_1 \dots p_r \leq X} 1. \ \ \ \ \ (19)

This expression may be computed using the prime number theorem:

Exercise 6 Show that the expression (19) is equal to {\frac{F( s(1+\varepsilon))+o(1)}{e^\gamma} \frac{X}{\log z}}.

Since {F} is continuous for {s>1}, we obtain a contradiction if {\varepsilon} is sufficiently small.

Exercise 7 Verify the optimality of {f(s)} in Theorem 2. (Hint: replace {1-\lambda(n)} by {1+\lambda(n)} in the above arguments.)
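The weight {1-\lambda(n)} used throughout this section is simple to experiment with; the following Python sketch (naive trial-division factorisation, for illustration only) confirms that it takes only the values {0} and {2}, the latter exactly on integers with an odd number of prime factors counted with multiplicity:

```python
def liouville(n):
    """Liouville function lambda(n) = (-1)^Omega(n), with Omega(n) the
    number of prime factors of n counted with multiplicity."""
    omega = 0
    d = 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            omega += 1
        d += 1
    if n > 1:
        omega += 1
    return -1 if omega % 2 else 1

# 1 - lambda(n) takes only the values 0 and 2
for n in range(1, 2000):
    assert (1 - liouville(n)) in (0, 2)

assert 1 - liouville(8) == 2    # 8 = 2^3, Omega = 3 (odd)
assert 1 - liouville(36) == 0   # 36 = 2^2 * 3^2, Omega = 4 (even)
```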

— 2. The linear sieve —

We now prove the forward direction of Theorem 2. Again, we focus on the upper bound (12), as the lower bound case is similar.

Fix {s>1}. Morally speaking, the most natural sieve to use here is the (upper bound) beta sieve from Notes 4, with the optimal value of {\beta}, which for the linear sieve turns out to be {\beta=2}. Recall that this sieve is defined as the sum

\displaystyle  \sum_{d \in {\mathcal D}_+} \mu(d) 1_{E_d}

where {{\mathcal D}_+} is the set of divisors {d = p_1 \dots p_m} of {P(z)} with {z > p_1 > \dots > p_m}, such that

\displaystyle  p_1 \dots p_{r-1} p_r^3 \leq D

for all odd {1 \leq r \leq m}. From Proposition 14 of Notes 4 this is indeed an upper bound sieve; indeed we have

\displaystyle  \sum_{d \in {\mathcal D}_+} \mu(d) 1_{E_d}(n) = 1_{n \not \in \bigcup_{p < z} E_p} \ \ \ \ \ (20)

\displaystyle  + \sum_{r \hbox{ odd}} \sum_{d \in {\mathcal E}_r} 1_{n \in E_d} 1_{n \not \in \bigcup_{p < p_*(d)} E_p}

where {{\mathcal E}_r} is the set of divisors {d = p_1 \dots p_r} of {P(z)} with {z > p_1 > \dots > p_r = p_*(d)}, such that

\displaystyle  p_1 \dots p_{r-1} p_r^3 > D \ \ \ \ \ (21)


\displaystyle  p_1 \dots p_{r'-1} p_{r'}^3 \leq D \ \ \ \ \ (22)

for all odd {1 \leq r' < r}. Now for the key heuristic point: if {n \approx D} lies in the support of {1-\lambda}, then the sum in (20) mostly vanishes. Indeed, if {n \leq D} is such that {n \in E_d} and {n \not \in \bigcup_{p < p_*(d)} E_p} for some {d = p_1 \dots p_r \in {\mathcal E}_r} and odd {r}, then one has {n = p_1 \dots p_r q} for some {q} that is not divisible by any prime less than {p_r}. On the other hand, from (21), (22) one has

\displaystyle  q \approx \frac{D}{p_1 \dots p_r} < p_r^2


\displaystyle  q \approx \frac{D}{p_1 \dots p_r} \geq 1

which (morally) implies from the sieve of Eratosthenes that {q} is prime, thus {\lambda(n) = (-1)^{r+1} = +1} and so {n} is not in the support of {1-\lambda}. As such, we expect the upper bound sieve { \sum_{d \in {\mathcal D}_+} \mu(d) 1_{E_d}} to be extremely efficient on the support of {1-\lambda}, which when combined with the analysis of the previous section suggests that this sieve should produce the desired upper bound (12).
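The upper bound sieve property of {\sum_{d \in {\mathcal D}_+} \mu(d) 1_{E_d}} can be tested directly for small parameters. The sketch below specialises to {E_p = \{n : p | n\}}, builds {{\mathcal D}_+} for small, arbitrarily chosen z and D, and verifies the pointwise inequality {\sum_{d \in {\mathcal D}_+} \mu(d) 1_{d|n} \geq 1_{(n,P(z))=1}}:

```python
from math import gcd

def small_primes(z):
    """Primes p < z by trial division (fine for tiny demo parameters)."""
    return [p for p in range(2, z) if all(p % q for q in range(2, p))]

def beta_sieve_divisors(z, D):
    """The set D_+ for beta = 2: products d = p_1...p_m of primes
    z > p_1 > ... > p_m with p_1...p_{r-1} p_r^3 <= D for all odd r <= m.
    Returns pairs (d, m) so that mu(d) = (-1)^m."""
    ps = sorted(small_primes(z), reverse=True)
    out = []

    def extend(start, prod, m):
        out.append((prod, m))
        for i in range(start, len(ps)):
            p = ps[i]
            # the beta-sieve truncation is only imposed at odd depths r = m + 1
            if (m + 1) % 2 == 1 and prod * p ** 3 > D:
                continue
            extend(i + 1, prod * p, m + 1)

    extend(0, 1, 0)
    return out

z, D = 12, 1000  # small, arbitrary demo parameters
Pz = 1
for p in small_primes(z):
    Pz *= p

divs = beta_sieve_divisors(z, D)
for n in range(1, 5000):
    sieve_val = sum((-1) ** m for d, m in divs if n % d == 0)
    unsifted = 1 if gcd(n, Pz) == 1 else 0
    # upper bound sieve property: sum_{d in D_+} mu(d) 1_{d|n} >= 1_{(n,P(z))=1}
    assert sieve_val >= unsifted
```

Note that with these parameters the prime 11 is excluded from {{\mathcal D}_+} already at depth one (since 11^3 > D), so the truncation is genuinely active in this toy example.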

One can indeed use this sieve to establish the required bound (12); see Chapter 11 of Friedlander-Iwaniec for details. However, for various technical reasons it will be convenient to modify this sieve slightly, by increasing the {\beta} parameter to be slightly greater than {2}, and also by using the fundamental lemma to perform a preliminary sifting on the small primes.

We turn to the details. To prove (12) we may of course assume that {\varepsilon>0} is suitably small. It will be convenient to worsen the error {O(\varepsilon)} in (12) a little to {O( \varepsilon \log \frac{1}{\varepsilon} )}, since one can of course remove the logarithm by reducing {\varepsilon} appropriately.

Set {D_0 := z^{\varepsilon^2}} and {z_0 := z^{\varepsilon^3}}. By the fundamental lemma of sieve theory (Lemma 17 of Notes 4), one can find combinatorial upper and lower bound sieve coefficients {(\lambda^{\pm,0}_d)_{d \leq D_0}} at sifting level {z_0} supported on divisors of {P(z_0)}, such that

\displaystyle  \sum_{d \leq D_0: d|P(z_0)} \lambda^{\pm,0}_d g(d) = V( z_0 ) ( 1 + O( e^{-1/\varepsilon} ) ). \ \ \ \ \ (23)

Thus we have {\lambda^{\pm,0}_d \in \{-1,0,1\}} and

\displaystyle  \sum_{d \leq D_0: d|P(z_0)} \lambda^{-,0}_d 1_{E_d}(n) \leq 1_{n \not \in \bigcup_{p<z_0} E_p} \leq \sum_{d \leq D_0: d|P(z_0)} \lambda^{+,0}_d 1_{E_d}(n) \ \ \ \ \ (24)

for all {n}.

We will use the upper bound sieve {\sum_{d \leq D_0:d|P(z_0)} \lambda^{+,0}_d 1_{E_d}(n)} as a preliminary sieve to remove the {E_p} for {p<z_0}; the lower bound sieve {\lambda^{-,0}_d} plays only a minor supporting role, mainly to control the difference between the upper bound sieve and the indicator {1_{n \not \in \bigcup_{p<z_0} E_p}} that it majorises.

Next, we set {P(z_0,z) := P(z)/P(z_0) = \prod_{z_0 \leq p < z} p}, and let {\lambda^+_d} be the upper bound beta sieve with parameter {\beta = 2+\varepsilon} on the primes dividing {P(z_0,z)} up to level of distribution {D / D_0^2}. In other words, {{\mathcal D}_+} consists of those divisors {d = p_1 \dots p_r} of {P(z_0,z)} with {p_1 > \dots > p_r} such that

\displaystyle  p_1 \dots p_{m-1} p_m^{3+\varepsilon} \leq D / D_0^2

for all odd {1 \leq m \leq r}; in particular, {d \leq D/D_0^2} for all {d \in {\mathcal D}_+}. By Proposition 14 of Notes 4, this is indeed an upper bound sieve for the primes dividing {P(z_0,z)}:

\displaystyle  \sum_{d \in {\mathcal D}_+} \mu(d) 1_{E_d} \geq 1_{n \not \in \bigcup_{z_0 \leq p < z} E_p}.

Multiplying this with the second inequality in (24) (this is the method of composition of sieves), we obtain an upper bound sieve for the primes up to {z}:

\displaystyle  \sum_{d_0 \leq D_0: d_0|P(z_0)} \sum_{d \in {\mathcal D}_+} \lambda^{+,0}_{d_0} \mu(d) 1_{E_{d_0d}}(n) \geq 1_{n \not \in \bigcup_{p < z} E_p}.

Multiplying this by {a_n} and summing in {n}, we conclude that

\displaystyle  \sum_{n \not \in \bigcup_{p<z} E_p} a_n \leq \sum_{d_0 \leq D_0: d_0|P(z_0)} \sum_{d \in {\mathcal D}_+} \lambda^{+,0}_{d_0} \mu(d) \sum_{n \in E_{d_0 d}} a_n.

Note that each product {d_0 d} appears at most once in the above sum, and all such products are squarefree and at most {D}. Applying (3), we thus have

\displaystyle  \sum_{n \not \in \bigcup_{p<z} E_p} a_n \leq \sum_{d_0 \leq D_0: d_0|P(z_0)} \sum_{d \in {\mathcal D}_+} \lambda^{+,0}_{d_0} \mu(d) g(d_0) g(d) X

\displaystyle  + \sum_{d \leq D: \mu^2(d)=1} |r_d|.

Thus to prove (12), it suffices to show that

\displaystyle  \sum_{d_0 \leq D_0: d_0|P(z_0)} \sum_{d \in {\mathcal D}_+} \lambda^{+,0}_{d_0} \mu(d) g(d_0) g(d) = (F(s) + O(\varepsilon\log \frac{1}{\varepsilon})) V(z)

for {z} sufficiently large depending on {\varepsilon}. Factoring out the {d_0} summation using (23), it thus suffices to show that

\displaystyle  \sum_{d \in {\mathcal D}_+} \mu(d) g(d) = (F(s) + O(\varepsilon\log \frac{1}{\varepsilon})) V(z) / V(z_0).

Now we eliminate the role of {g}. From (1), (5) and Mertens’ theorem we have

\displaystyle  V(z) / V(z_0) = (1 + O(\varepsilon)) \frac{\log z_0}{\log z}

for {z} large enough. Also, if {d \in {\mathcal D}_+}, then {d \leq D/D_0^2} and all prime factors of {d} are at least {z_0 = D^{\varepsilon^3/s}}. Thus {d} has at most {O_{s,\varepsilon}(1)} prime factors, each of which are at least {z_0}. From (1) we then have

\displaystyle  g(d) = \frac{1}{d} + O_{s,\varepsilon}( \frac{1}{z_0} \frac{1}{d} ).

The contribution of the error term {O_{s,\varepsilon}( \frac{1}{z_0} \frac{1}{d} )} is then easily seen to be {O( \varepsilon \frac{\log z_0}{\log z} )} for {z} large enough, and so we reduce to showing that

\displaystyle  \sum_{d \in {\mathcal D}_+} \frac{\mu(d)}{d} = (F(s) + O(\varepsilon\log \frac{1}{\varepsilon})) \frac{\log z_0}{\log z}. \ \ \ \ \ (25)

One can proceed here by evaluating the left-hand side directly. However, we will proceed instead by using the weight function {1-\lambda} from the previous section. More precisely, we will evaluate the expression

\displaystyle  \sum_{n \leq D} (1-\lambda(n)) \sum_{d_0 \leq D_0: d_0|P(z_0)} \sum_{d \in {\mathcal D}_+} \lambda^{+,0}_{d_0} \mu(d) 1_{d_0d|n} \ \ \ \ \ (26)

in two different ways, where {\lambda^{+,0}_{d_0}} is as before (but with the role of {g(d)} now replaced by the function {1/d}). Firstly, since {d_0 d \leq D/D_0}, we see from the argument used to establish (17) that

\displaystyle  \sum_{n \leq D: d_0 d|n} (1-\lambda(n)) = \frac{D}{d_0 d} + O( \frac{D}{d_0 d} \log^{-10} D )

(say). Since each product {d_0 d} appears at most once, we can thus write (26) as

\displaystyle  \sum_{d_0 \leq D_0: d_0|P(z_0)} \sum_{d \in {\mathcal D}_+} \lambda^{+,0}_{d_0} \mu(d) \frac{D}{d_0 d} + O( D \log^{-9} D )

which, upon factoring out the {d_0} sum using (23) and Mertens’ theorem, becomes

\displaystyle  (1 + O(\varepsilon)) \frac{D}{e^\gamma \log z_0} \sum_{d \in {\mathcal D}_+} \frac{\mu(d)}{d} + O( D \log^{-9} D ).

Thus to verify (25), it will suffice to show that (26) is of the form

\displaystyle  (F(s) + O(\varepsilon \log \frac{1}{\varepsilon})) \frac{D}{e^\gamma \log z}

for {z} sufficiently large.

To do this, we abbreviate (26) as

\displaystyle  \sum_{n \leq D} (1-\lambda(n)) \nu^{+,0}(n) \sum_{d \in {\mathcal D}_+} \mu(d) 1_{d|n} \ \ \ \ \ (27)

where {\nu^{\pm,0}} are the sieves

\displaystyle  \nu^{\pm,0}(n) := \sum_{d_0 \leq D_0: d_0|P(z_0)} \lambda^{\pm,0}_{d_0} 1_{d_0|n}.

By Proposition 14 of Notes 4, we can expand {\sum_{d \in {\mathcal D}_+} \mu(d) 1_{d|n}} as

\displaystyle  1_{(n,P(z_0,z))=1} + \sum_{r \hbox{ odd}} \sum_{d \in {\mathcal E}_r} 1_{d|n} 1_{(n,P(z_0,p_*(d)))=1} \ \ \ \ \ (28)

where for any {r \geq 1}, {{\mathcal E}_r} is the collection of all divisors {d = p_1 \dots p_r} of {P(z_0,z)} with {p_1 > \dots > p_r = p_*(d)} such that

\displaystyle  p_1 \dots p_{r-1} p_r^{3+\varepsilon} > D/D_0^2 \ \ \ \ \ (29)


\displaystyle  p_1 \dots p_{r'-1} p_{r'}^{3+\varepsilon} \leq D/D_0^2 \ \ \ \ \ (30)

for all {1 \leq r' < r} with the same parity as {r}. For technical reasons, we will also impose the additional inequality

\displaystyle  p_1 \dots p_{r'-1} p_{r'}^{2+\varepsilon} \leq D/D_0^2 \ \ \ \ \ (31)

for all {1 \leq r' < r} with the opposite parity as {r}; this follows from (30) when {r' > 1}, and is an additional constraint only when {r'=1} and {r} is even; since in the above identity {r} is odd, this additional constraint is harmless. For similar reasons we impose the inequality

\displaystyle  p_1 \dots p_r^{1+\varepsilon} \leq D/D_0^2 \ \ \ \ \ (32)

which follows from (30) or (31) except when {r=1}, but then this inequality is automatic from our hypothesis {s>1}, which implies {z^{1+\varepsilon} \leq D/D_0^2} if {\varepsilon} is chosen small enough.

Inserting the identity (28), we can write (26) as

\displaystyle  A + \sum_{r \hbox{ odd}} B_r


\displaystyle  A :=\sum_{n \leq D} (1-\lambda(n)) \nu^{+,0}(n) 1_{(n,P(z_0,z))=1}


\displaystyle  B_r := \sum_{d \in {\mathcal E}_r} \sum_{n \leq D} (1-\lambda(n)) \nu^{+,0}(n) 1_{d|n} 1_{(n,P(z_0,p_*(d)))=1}.

We first estimate {A}. By (24), we can write {A} as the sum of

\displaystyle  \sum_{n \leq D} (1-\lambda(n)) 1_{(n,P(z))=1} \ \ \ \ \ (33)

plus an error of size at most

\displaystyle  O( |\sum_{n \leq D} (\sum_{d_0 \leq D_0: d_0|P(z_0)} \lambda^{+,0}_{d_0} 1_{d_0|n} - \sum_{d_0 \leq D_0: d_0|P(z_0)} \lambda^{-,0}_{d_0} 1_{d_0|n} )| )

(where we bound {(1-\lambda(n)) 1_{(n,P(z_0,z))=1}} by {O(1)}). The error may be rearranged as

\displaystyle  O( |\sum_{d_0 \leq D_0: d_0|P(z_0)} (\lambda^{+,0}_{d_0} - \lambda^{-,0}_{d_0}) (\frac{D}{d_0}+O(1))| ),

which by (23) is of size {O( e^{-1/\varepsilon} \frac{D}{\log z_0} ) = O( \varepsilon \frac{D}{\log z} )} for {\varepsilon} small enough. As for the main term (33), we see from Exercise 6 (and the arguments preceding that exercise) that this term is equal to {\frac{F(s)+O(\varepsilon)}{e^\gamma} \frac{D}{\log z}} for {z} sufficiently large. Thus, to obtain the desired approximation for (26), it will suffice to show that

\displaystyle  \sum_{r \hbox{ odd}} B_r \ll \varepsilon \log \frac{1}{\varepsilon} \frac{D}{\log z}.

Next, we establish an exponential decay estimate on the {B_r}:

Lemma 8 For {z} sufficiently large depending on {\varepsilon}, we have

\displaystyle  B_r \ll \varepsilon^{-O(1)} \alpha^r e^{-s} \frac{D}{\log z}

for all {r \geq 1} and some absolute constant {0 < \alpha < 1}.

Proof: (Sketch) Note that if {d = p_1 \dots p_r} is in {{\mathcal E}_r}, then {d \leq D/D_0^2} and all prime factors of {d} are at least {z_0 = z^{\varepsilon^3}}, thus we may assume without loss of generality that {r \leq s/\varepsilon^3}.

We bound

\displaystyle  B_r \ll \sum_{d \in {\mathcal E}_r} \sum_{n \leq D/d} \nu^{+,0}(n) 1_{(n,P(z_0,p_*(d)))=1}.

Note that if {d = p_1 \dots p_r} lies in {{\mathcal E}_r}, then

\displaystyle  D_0^2 p_r^\varepsilon \leq \frac{D}{d} < D_0^2 p_r^{2+\varepsilon}

thanks to (29), (32). From this and the fundamental lemma of sieve theory we see (Exercise!) that

\displaystyle  \sum_{n \leq D/d} \nu^{+,0}(n) 1_{(n,P(z_0,p_r))=1} \ll \varepsilon^{-O(1)} \frac{D}{d \log p_*(d)}

and so it will suffice to show that

\displaystyle  \sum_{d \in {\mathcal E}_r} \frac{1}{d} \frac{\log z}{\log p_*(d)} \ll \alpha^r e^{-s}. \ \ \ \ \ (34)

By the prime number theorem, the left-hand side is bounded (Exercise!) by {f_r(s-2\varepsilon^3) + o(1)} as {z \rightarrow \infty}, where

\displaystyle  f_r(s) := \int_{J_r(s)} \frac{dt_1 \dots dt_r}{t_1 \dots t_{r-1} t_r^2}

and {J_r(s) \subset (0,+\infty)^r} is the set of points {(t_1,\dots,t_r)} with {1 \geq t_1 \geq \dots \geq t_r > 0},

\displaystyle  t_1 + \dots + t_{r-1} + (3+\varepsilon) t_r \geq s,

and such that

\displaystyle  t_1 + \dots + t_{r'-1} + (3+\varepsilon)t_{r'} \leq s

for all {1 \leq r' < r} with the same parity as {r}, and

\displaystyle  t_1 + \dots + t_{r'-1} + (2+\varepsilon)t_{r'} \leq s

for all {1 \leq r' \leq r}. It thus suffices to prove the bound

\displaystyle  f_r(s) \leq C \alpha^r e^{-s} \ \ \ \ \ (35)

for all {s>0} and some absolute constant {C>0}.

We use an argument from the book of Harman. Observe that {f_r(s)} vanishes for {s > r+2}, which makes the claim (35) routine for {r=1} (Exercise!) for {C} sufficiently large. We will now inductively prove (35) for all odd {r}. From the change of variables {u = \frac{s}{t_1}}, we obtain the identity

\displaystyle  f_r(s) = \frac{1}{s} \int_{\max(s, b_r)}^\infty f_{r-1}(t-1)\ dt \ \ \ \ \ (36)

where {b_r := 3+\varepsilon} when {r} is odd and {b_r := 2+\varepsilon} when {r} is even (Exercise!). In particular, if {r \geq 3} is odd and (35) was already proven for {r-2}, then

\displaystyle  f_r(s) = \frac{1}{s} \int_{\max(s,3+\varepsilon)}^\infty \frac{1}{t-1} \int_{t-1}^\infty f_{r-2}(u-1)\ du dt

\displaystyle  \leq \frac{1}{s} \int_{\max(s,3+\varepsilon)}^\infty \frac{1}{t-1} \int_{t-1}^\infty C \alpha^{r-2} e^{1-u}\ du dt

\displaystyle  \leq C \alpha^{r} e^{-s} \times \alpha^{-2} \times \frac{e^{s+2}}{s} \int_{\max(s,3+\varepsilon)}^\infty \frac{e^{-t}}{t-1}\ dt.

One can check (Exercise!) that the quantity {\frac{e^{s+2}}{s} \int_{\max(s,3+\varepsilon)}^\infty \frac{e^{-t}}{t-1}\ dt} is maximised at {s=3+\varepsilon}, where its value is less than {1} (in fact it is {0.94\dots}) if {\varepsilon} is small enough. As such, we obtain (35) if {\alpha} is sufficiently close to {1}.

Finally, (35) for even {r} follows from the odd {r} case (with a slightly larger choice of {C}) by one final application of (36). \Box

Exercise 9 Fill in the steps marked (Exercise!) in the above proof.

In view of this lemma, the total contribution of {B_r} with {r > C \log \frac{1}{\varepsilon}} for some sufficiently large {C} is acceptable. Thus it suffices to show that

\displaystyle  B_r \ll \varepsilon \frac{D}{\log z}

whenever {r = O( \log\frac{1}{\varepsilon} )} is odd.

By (24), we can write {B_r} as

\displaystyle  \sum_{d \in {\mathcal E}_r} \sum_{n \leq D} (1-\lambda(n)) 1_{d|n} 1_{(n,P(p_*(d)))=1}

plus an error of size

\displaystyle  O( \sum_{d \in {\mathcal E}_r} \sum_{n \leq D} (\sum_{d_0 \leq D_0: d_0|P(z_0)} \lambda^{+,0}_{d_0} 1_{dd_0|n} - \sum_{d_0 \leq D_0: d_0|P(z_0)} \lambda^{-,0}_{d_0} 1_{dd_0|n} ) ).

Arguing as in the treatment of the {A} term, we see from (23) that the error term is bounded by

\displaystyle  \ll O( \sum_{d \in {\mathcal E}_r} \sum_{d_0 \leq D_0: d_0|P(z_0)} (\lambda^{+,0}_{d_0} - \lambda^{-,0}_{d_0}) \frac{D}{dd_0} ) + \sum_{d \in {\mathcal E}_r} \sum_{d_0 \leq D_0: d_0|P(z_0)} 1

\displaystyle  \ll \sum_{d \in {\mathcal E}_r} e^{-1/\varepsilon} \frac{D}{d \log z_0} + \frac{D}{D_0^2} D_0

\displaystyle  \ll e^{-1/\varepsilon} \frac{D}{\log z_0} (\sum_{z_0 \leq p \leq z} \frac{1}{p})^r + \frac{D}{D_0}

\displaystyle  \ll e^{-1/\varepsilon} \frac{D}{\log z_0} O( \log \frac{\log z}{\log z_0} )^r + \frac{D}{D_0}

\displaystyle  \ll \varepsilon \frac{D}{\log z}

as desired for {z} large enough, since {r = O(\log \frac{1}{\varepsilon})} and {\frac{\log z}{\log z_0} = O( \varepsilon^{-3} )}. Thus it suffices to show that

\displaystyle  \sum_{d \in {\mathcal E}_r} \sum_{n \leq D} (1-\lambda(n)) 1_{d|n} 1_{(n,P(p_*(d)))=1} \ll \varepsilon \frac{D}{\log z}. \ \ \ \ \ (37)

If {d = p_1 \dots p_r} and {n} appear in the above sum, then we have {n = p_1 \dots p_r q} where {q \leq \frac{D}{d}} has no prime factor less than {p_r}, has an even number of prime factors, and obeys the bounds

\displaystyle  q \leq D_0^2 p_r^{2+\varepsilon}

thanks to (29). Note that (29) also gives {p_r \geq (D/D_0^2)^{1/(r+2)}}, and thus (since {r = O( \log \frac{1}{\varepsilon} )} and {D_0 = z^{\varepsilon^3}}) we see that {q < p_r^3} if {\varepsilon} is small enough and {z} is large enough. This forces {q} to either equal {1}, or be the product of two primes between {p_r} and {D_0^2 p_r^{1+\varepsilon}}. The contribution of the {q=1} case is bounded by {|{\mathcal E}_r| \leq D/D_0^2}, which is acceptable. As for the contribution of those {q} that are the product of two primes, the prime number theorem shows that there are at most

\displaystyle  \sum_{p_r \leq p \leq D_0^2 p_r^{1+\varepsilon}} O( \frac{D}{d p \log p_r} ) = O( \varepsilon \frac{D}{d \log p_r} )

values of {q} that can contribute to the sum, and so this contribution to {B_r} is at most

\displaystyle  O( \varepsilon \frac{D}{\log z} \sum_{d \in {\mathcal E}_r} \frac{1}{d} \frac{\log z}{\log p_*(d)} ),

but by (34) the sum here is {O(1)} for {z} large enough, and the claim follows. This completes the proof of (12).

Exercise 10 Establish the lower bound (13) in Theorem 2. (Note that one can assume without loss of generality that {s>2}, which will now be needed to ensure (31) when {r'=1}.)

— 3. Chen’s theorem —

We now prove Chen’s theorem for twin primes, loosely following the treatment in Chapter 25 of Friedlander-Iwaniec. We will in fact show the slightly stronger statement that

\displaystyle  \sum_{x/2 \leq n \leq x-2} \Lambda(n) 1_{{\mathcal P}_2}(n+2) 1_{(n+2,P(z))=1} \gg \frac{x}{\log x}

for sufficiently large {x}, where {{\mathcal P}_2} is the set of all numbers that are products of at most two primes, and {z := x^{1/8}}. Indeed, after removing the (negligible) contribution of those {n} that are powers of primes, this estimate would imply that there are infinitely many primes {p} such that {p+2} is the product of at most two primes, each of which is at least {p^{1/8}}.

Chen’s argument begins with the following simple lower bound sieve for {1_{{\mathcal P}_2}}:

Lemma 11 If {x^{2/3} < n \leq x}, then

\displaystyle  1_{{\mathcal P}_2}(n) \geq 1 - \frac{1}{2} \sum_{p \leq x^{1/3}} 1_{p|n} - \frac{1}{2} \sum_{p_1 \leq x^{1/3} < p_2 \leq p_3} 1_{n=p_1p_2p_3}.

Proof: If {n} has no prime factors less than or equal to {x^{1/3}}, then {n \in {\mathcal P}_2} and the claim follows. If {n} has two or more prime factors less than or equal to {x^{1/3}}, then {1 - \frac{1}{2} \sum_{p \leq x^{1/3}} 1_{p|n} \leq 0} and the claim follows. Finally, if {n} has exactly one prime factor less than or equal to {x^{1/3}}, then (as {n > x^{2/3}}) it must be of the form {n=p_1p_2p_3} for some {p_1 \leq x^{1/3} < p_2 \leq p_3}, and the claim again follows. \Box
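Lemma 11 lends itself to a brute-force check. The sketch below (with x = 3000 chosen arbitrarily, and restricted for simplicity to squarefree {n}, for which the case analysis in the proof applies directly) verifies the pointwise inequality:

```python
def prime_factors(n):
    """Prime factors of n with multiplicity, in increasing order."""
    fs = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            fs.append(d)
            n //= d
        d += 1
    if n > 1:
        fs.append(n)
    return fs

def chen_weight(n, x):
    """Right-hand side of the lower bound sieve in Lemma 11."""
    fs = prime_factors(n)
    y = int(x ** (1 / 3))
    small = len({p for p in fs if p <= y})
    # indicator of n = p1*p2*p3 with p1 <= x^(1/3) < p2 <= p3
    triple = 1 if len(fs) == 3 and fs[0] <= y < fs[1] <= fs[2] else 0
    return 1 - small / 2 - triple / 2

x = 3000  # arbitrary demo value
for n in range(int(x ** (2 / 3)) + 1, x + 1):
    fs = prime_factors(n)
    if len(set(fs)) != len(fs):
        continue  # restrict to squarefree n, as discussed above
    in_P2 = 1 if len(fs) <= 2 else 0
    assert in_P2 >= chen_weight(n, x)
```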

In view of this sieve (trivially managing the contribution of {n \leq x^{2/3}}, and using the restriction of {n+2} to be coprime to {P(z)}), it suffices to show that

\displaystyle  A_1 - \frac{1}{2} \sum_{z \leq p \leq x^{1/3}} A_{2,p} - \frac{1}{2} A_3 \gg \frac{x}{\log x} \ \ \ \ \ (38)

for sufficiently large {x}, where

\displaystyle  A_1 := \sum_{x/2 \leq n \leq x-2} \Lambda(n) 1_{(n+2,P(z))=1},

\displaystyle  A_{2,p} := \sum_{x/2 \leq n \leq x-2: p|n+2} \Lambda(n) 1_{(n+2,P(z))=1},


\displaystyle  A_3 := \sum_{x/2 \leq n \leq x-2} \Lambda(n) \sum_{z \leq p_1 \leq x^{1/3} < p_2 \leq p_3} 1_{n+2=p_1p_2p_3}.

We thus seek sufficiently good lower bounds on {A_1} and sufficiently good upper bounds on {A_{2,p}} and {A_3}. As it turns out, the linear sieve, combined with the Bombieri-Vinogradov theorem, will give bounds on {A_1, A_{2,p}, A_3} with numerical constants that are sufficient for this purpose.

We begin with {A_1}. We use the lower bound linear sieve, with {E_p} equal to the residue class {-2\ (p)} for all {p < z}, so that {E_d} is the residue class {-2\ (d)}. We approximate

\displaystyle  \sum_{x/2 \leq n \leq x-2: n \in E_d} \Lambda(n) = g(d) \frac{x}{2} + r_d

where {g} is the multiplicative function with {g(2) :=0} and {g(p) := \frac{1}{p-1}} for {p>2}. From the Bombieri-Vinogradov theorem (Theorem 17 of Notes 3) we have

\displaystyle  \sum_{d \leq D} |r_d| \ll x \log^{-10} x \ \ \ \ \ (39)

(say) if {D := x^{1/2 - \varepsilon} = z^{4-8\varepsilon}} for some small fixed {\varepsilon > 0}. Applying the lower bound linear sieve (13), we conclude that

\displaystyle  A_1 \geq (f(4-8\varepsilon) - O(\varepsilon)) \frac{x}{2} V(z) + O( x \log^{-10} x )


where

\displaystyle  V(z) = \prod_{p < z} (1-g(p)).

We can compute an asymptotic for {V}:

Exercise 12 Show that

\displaystyle  V(z) = (2 \Pi_2 + o(1)) \frac{1}{e^\gamma \log z}

as {z \rightarrow \infty}, where {\Pi_2 = \prod_{p>2} (1-\frac{1}{(p-1)^2})} is the twin prime constant.
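
As a numerical illustration (not needed for the argument), one can compare {V(z)} against the asymptotic of Exercise 12; the cutoffs in this Python sketch are arbitrary choices of mine:

```python
import math

def primes_upto(n):
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]

ps = primes_upto(10 ** 6)

# twin prime constant via a partial product; the discarded factors are 1 - O(1/p^2)
Pi2 = math.prod(1 - 1 / (p - 1) ** 2 for p in ps if p > 2)

gamma = 0.5772156649015329   # Euler-Mascheroni constant
z = 10 ** 5
# V(z) = prod_{p < z} (1 - g(p)), with g(2) = 0 and g(p) = 1/(p-1)
V = math.prod(1 - 1 / (p - 1) for p in ps if 2 < p < z)

# ratio should approach 1 as z grows
ratio = V * math.exp(gamma) * math.log(z) / (2 * Pi2)
print(round(Pi2, 4), round(ratio, 2))
```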

From (11) we have {f(4) = \frac{e^\gamma}{2} \log 3}. Sending {\varepsilon} slowly to zero, we conclude that

\displaystyle  A_1 \geq (\log 3-o(1)) \Pi_2 \frac{x}{2\log z}. \ \ \ \ \ (40)

Now we turn to {A_{2,p}}. Here we use the upper bound linear sieve. Let {E_d} be as before. For any {d} dividing {P(z)} and {z \leq p < x^{1/3}}, we have

\displaystyle  \sum_{x/2 \leq n \leq x-2: n \in E_d} \Lambda(n) 1_{p|n+2} = g(d) g(p) \frac{x}{2} + r_{pd}

where {g} and {r_d} are as previously. We apply the upper bound linear sieve (12) with level of distribution {D/p}, to conclude that

\displaystyle  A_{2,p} \leq (F( \frac{\log D/p}{\log z} ) + O(\varepsilon)) g(p) \frac{x}{2} V(z)

\displaystyle  + O( \sum_{d \leq D/p} |r_{pd}| ).

We sum over {p}. Since {pd} is at most {D}, and each number less than or equal to {D} has at most {O( \log x )} prime factors, we have

\displaystyle  \sum_{z \leq p < x^{1/3}} A_{2,p} \leq \sum_{z \leq p< x^{1/3}} (F( \frac{\log D/p}{\log z} )

\displaystyle + O(\varepsilon)) g(p) \frac{x}{2} V(z) + O( \log x \sum_{d \leq D} |r_d| ).

The error term is {O( x \log^{-9} x )} thanks to (39). Since {g(p) = \frac{1+o(1)}{p}}, we thus have

\displaystyle  \sum_{z \leq p < x^{1/3}} A_{2,p} \leq (\sum_{z \leq p< x^{1/3}} \frac{F( \frac{\log D/p}{\log z} )}{p} + O(\varepsilon)) \frac{\Pi_2 x}{e^\gamma \log z}

for sufficiently large {x}, thanks to Exercise 12. We can compute the sum using Exercise 37 of Notes 1, to obtain

\displaystyle  \sum_{z \leq p< x^{1/3}} \frac{F( \frac{\log D/p}{\log z} )}{p} = \int_1^{8/3} F( 4-8\varepsilon-t ) \frac{dt}{t} + o(1),

which by (10) and sending {\varepsilon \rightarrow 0} slowly gives

\displaystyle  \sum_{z \leq p < x^{1/3}} A_{2,p} \leq (\int_1^{8/3} \frac{1}{4-t} \frac{dt}{t} + o(1)) \frac{2 \Pi_2 x}{\log z}.

A routine computation shows that

\displaystyle  \int_1^{8/3} \frac{1}{4-t} \frac{dt}{t} = \frac{\log 6}{4}

and so

\displaystyle  \sum_{z \leq p < x^{1/3}} A_{2,p} \leq (\log 6+o(1)) \Pi_2 \frac{x}{2\log z}. \ \ \ \ \ (41)
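
The "routine computation" is partial fractions: {\frac{1}{t(4-t)} = \frac{1}{4}(\frac{1}{t} + \frac{1}{4-t})}, so the integral is {\frac{1}{4} \log \frac{t}{4-t} \big|_1^{8/3} = \frac{1}{4}(\log 2 + \log 3) = \frac{\log 6}{4}}. A throwaway numerical sketch confirming this:

```python
import math

def integrand(t):
    return 1.0 / (t * (4.0 - t))

# composite Simpson's rule on [1, 8/3] with an (even) ad hoc number of panels
a, b, n = 1.0, 8.0 / 3.0, 2000
h = (b - a) / n
s = integrand(a) + integrand(b)
for k in range(1, n):
    s += (4 if k % 2 else 2) * integrand(a + k * h)
approx = s * h / 3

print(abs(approx - math.log(6) / 4) < 1e-10)  # True
```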

Finally, we consider {A_3}, which is estimated by “switching” the sieve to sift out small divisors of {n}, rather than small divisors of {n+2}. Removing those {n} with {n \leq \sqrt{x}}, as well as those {n} that are powers of primes, and then shifting {n} by {2}, we have

\displaystyle  A_3 \leq \log x \sum_{n} 1_{(n-2,P(\sqrt{x}))=1} a_n + O_\varepsilon( x^{1/2+\varepsilon} )

where {a_n} is the finitely supported non-negative sequence

\displaystyle  a_n := 1_{x/2+2 \leq n \leq x} \sum_{z \leq p_1 \leq x^{1/3} < p_2 \leq p_3} 1_{n=p_1p_2p_3}. \ \ \ \ \ (42)

Here we are sifting out the residue classes {E'_p := 2\ (p)}, so that {E'_d := 2\ (d)}.

The sequence {a_n} has good distribution up to level {D}:

Proposition 13 One has

\displaystyle  \sum_{n \in E'_d} a_n = g(d) \sum_n a_n + r_d

where {g(d)} is as before, and

\displaystyle  \sum_{d \leq D: \mu^2(d)=1} |r_d| \ll x \log^{-10} x

(say), with {D} as before.

Proof: Observe that the quantity {p_3} in (42) is bounded above by {x^{2/3}} if the summand is to be non-zero. We now use a finer-than-dyadic decomposition trick similar to that used in the proof of the Bombieri-Vinogradov theorem in Notes 3 to approximate {a_n} as a combination of Dirichlet convolutions. Namely, we set {\lambda := 1 + \log^{-20} x}, and partition {[x^{1/3},x^{2/3}]} (plus possibly a little portion to the right of {x^{2/3}}) into {A=O( \log^{21} x )} consecutive intervals {I_1,\dots,I_A} each of the form {[N, \lambda N]} for some {x^{1/3} \leq N \leq x^{2/3}}. We similarly split {[z,x^{1/3}]} (plus possibly a tiny portion of {[z/2,z]}) into {B=O( \log^{21} x)} intervals {J_1,\dots, J_B} each of the form {[M, \lambda M]} for some {1 \ll M \leq x^{1/3}}. We can thus split {a_n} as

\displaystyle  a_n = \sum_{1 \leq b \leq B} \sum_{1 \leq a \leq a' \leq A} \sum_{z \leq p_1 \leq x^{1/3} < p_2 \leq p_3: p_1 \in J_b, p_2 \in I_a, p_3 \in I_{a'}} 1_{n=p_1p_2p_3}1_{x/2+2 \leq n \leq x} .

Observe that for each {a,a'} there are only {O(1)} choices of {b} for which the summand can be non-zero. As such, the contribution of the diagonal case {a=a'} can be easily seen to be absorbed into the {r_d} error, as can those cases where the product set {\{ uvw: u \in J_b, v \in I_a, w \in I_{a'} \}} is not contained completely in {[x/2+2,x]}. If we let {\Omega} be the set of triplets {(b,a,a')} obeying these properties, we can thus approximate {a_n} by {\sum_{(b,a,a') \not \in \Omega} a_n^{(b,a,a')}}, where {a_n^{(b,a,a')}} is the Dirichlet convolution

\displaystyle  a_n^{(b,a,a')} := 1_{{\mathcal P} \cap J_b} * 1_{{\mathcal P} \cap I_a} * 1_{{\mathcal P} \cap I_{a'}}.

From the general Bombieri-Vinogradov theorem (Theorem 16 of Notes 3) and the Siegel-Walfisz theorem (Exercise 64 of Notes 2) we see that

\displaystyle  \sum_{n \in E'_d} a_n^{(b,a,a')} = g(d) X^{(b,a,a')} + r_d^{(b,a,a')}


where

\displaystyle  \sum_{d \leq D: \mu^2(d)=1} |r_d^{(b,a,a')}| \ll x \log^{-10} x

(say) and

\displaystyle  X^{(b,a,a')} = \sum_n a_n^{(b,a,a')}.

This gives the claim with {\sum_n a_n} replaced by the quantity {\sum_{(b,a,a') \not \in \Omega} X^{(b,a,a')}}; but by undoing the previous decomposition we see that this quantity is equal to {\sum_n a_n} up to an error of {O( x \log^{-11} x)} (say), and the claim follows.

Applying the upper bound sieve (12) (with sifting level {D^{1/(1+\varepsilon)}}), we thus have

\displaystyle  \sum_{n} 1_{(n-2,P(\sqrt{x}))=1} a_n \leq (F(1+\varepsilon)+O(\varepsilon)) V( D^{1/(1+\varepsilon)} ) \sum_n a_n + O( x \log^{-10} x )

and hence by (10) and Exercise 12

\displaystyle  \sum_{n} 1_{(n-2,P(\sqrt{x}))=1} a_n \leq (4+O(\varepsilon)) \frac{2\Pi_2}{\log x} \sum_n a_n + O( x \log^{-10} x )

for {x} sufficiently large.

Note that

\displaystyle  \sum_n a_n = \sum_{z \leq p_1 \leq x^{1/3} < p_2 < \frac{x}{p_1 p_2}} \sum_{\max( p_2, \frac{x/2+2}{p_1 p_2} ) < p_3 < \frac{x}{p_1 p_2}} 1.

From the prime number theorem and Exercise 37 of Notes 1, we thus have

\displaystyle  \sum_n a_n \leq (1+o(1)) \frac{x}{2\log x} \int_{1/8 \leq t_1 \leq 1/3 < t_2 < 1-t_1-t_2} \frac{dt_1 dt_2}{t_1 t_2 (1-t_1-t_2)}.

(In fact one also has a matching lower bound, but we will not need it here.) We thus conclude that

\displaystyle  A_3 \leq (C+o(1)) \Pi_2 \frac{x}{2\log z}


where

\displaystyle  C := \int_{1/8 \leq t_1 \leq 1/3 < t_2 < 1-t_1-t_2} \frac{dt_1 dt_2}{t_1 t_2 (1-t_1-t_2)}

\displaystyle  = 0.363 \dots

The left-hand side of (38) is then at least

\displaystyle  (2 \log 3 - \log 6 - C) \Pi_2 \frac{x}{4 \log z}.

One can calculate that {2 \log 3 - \log 6 - C > 0}, and the claim follows.
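
Indeed, {2 \log 3 - \log 6 = \log \frac{3}{2} \approx 0.405}, so the margin over {C \approx 0.363} is about {0.04}. This final computation can be sketched numerically as follows (the midpoint rule and grid size are ad hoc choices of mine):

```python
import math

def C_approx(m=600):
    """Midpoint rule for C over the region 1/8 <= t1 <= 1/3, 1/3 < t2 < (1 - t1)/2."""
    total = 0.0
    h1 = (1 / 3 - 1 / 8) / m
    for i in range(m):
        t1 = 1 / 8 + (i + 0.5) * h1
        lo, hi = 1 / 3, (1 - t1) / 2     # t2 < 1 - t1 - t2  iff  t2 < (1 - t1)/2
        h2 = (hi - lo) / m
        for j in range(m):
            t2 = lo + (j + 0.5) * h2
            total += h1 * h2 / (t1 * t2 * (1 - t1 - t2))
    return total

C = C_approx()
print(2 * math.log(3) - math.log(6) - C > 0)  # True
```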

Exercise 14 Establish Chen’s theorem for the even Goldbach conjecture.

Remark 15 If one is willing to use stronger distributional claims on the primes than is provided by the Bombieri-Vinogradov theorem, then one can use a simpler sieve than Chen’s sieve to establish Chen’s theorem, but the required distributional theorem will then either be conjectural or more difficult to establish than the Bombieri-Vinogradov theorem. See Chapter 25 of Friedlander-Iwaniec for further discussion.


April 14, 2015

Terence Tao254A, Notes 8: The Hardy-Littlewood circle method and Vinogradov’s theorem

We have seen in previous notes that the operation of forming a Dirichlet series

\displaystyle  {\mathcal D} f(s) := \sum_n \frac{f(n)}{n^s}

or twisted Dirichlet series

\displaystyle  {\mathcal D} (\chi f)(s) := \sum_n \frac{f(n) \chi(n)}{n^s}

is an incredibly useful tool for questions in multiplicative number theory. Such series can be viewed as a multiplicative Fourier transform, since the functions {n \mapsto \frac{1}{n^s}} and {n \mapsto \frac{\chi(n)}{n^s}} are multiplicative characters.

Similarly, it turns out that the operation of forming an additive Fourier series

\displaystyle  \hat f(\theta) := \sum_n f(n) e(-n \theta),

where {\theta} lies on the (additive) unit circle {{\bf R}/{\bf Z}} and {e(\theta) := e^{2\pi i \theta}} is the standard additive character, is an incredibly useful tool for additive number theory, particularly when studying additive problems involving three or more variables taking values in sets such as the primes; the deployment of this tool is generally known as the Hardy-Littlewood circle method. (In the analytic number theory literature, the minus sign in the phase {e(-n\theta)} is traditionally omitted, and what is denoted by {\hat f(\theta)} here would be referred to instead by {S_f(-\theta)}, {S(f;-\theta)} or just {S(-\theta)}.) We list some of the most classical problems in this area:

  • (Even Goldbach conjecture) Is it true that every even natural number {N} greater than two can be expressed as the sum {p_1+p_2} of two primes?
  • (Odd Goldbach conjecture) Is it true that every odd natural number {N} greater than five can be expressed as the sum {p_1+p_2+p_3} of three primes?
  • (Waring problem) For each natural number {k}, what is the least natural number {g(k)} such that every natural number {N} can be expressed as the sum of {g(k)} or fewer {k^{th}} powers?
  • (Asymptotic Waring problem) For each natural number {k}, what is the least natural number {G(k)} such that every sufficiently large natural number {N} can be expressed as the sum of {G(k)} or fewer {k^{th}} powers?
  • (Partition function problem) For any natural number {N}, let {p(N)} denote the number of representations of {N} of the form {N = n_1 + \dots + n_k} where {k} and {n_1 \geq \dots \geq n_k} are natural numbers. What is the asymptotic behaviour of {p(N)} as {N \rightarrow \infty}?

The Waring problem and its asymptotic version will not be discussed further here, save to note that the Vinogradov mean value theorem (Theorem 13 from Notes 5) and its variants are particularly useful for getting good bounds on {G(k)}; see for instance the ICM article of Wooley for recent progress on these problems. Similarly, the partition function problem was the original motivation of Hardy and Littlewood in introducing the circle method, but we will not discuss it further here; see e.g. Chapter 20 of Iwaniec-Kowalski for a treatment.

Instead, we will focus our attention on the odd Goldbach conjecture as our model problem. (The even Goldbach conjecture, which involves only two variables instead of three, is unfortunately not amenable to a circle method approach for a variety of reasons, unless the statement is replaced with something weaker, such as an averaged statement; see this previous blog post for further discussion. On the other hand, the methods here can obtain weaker versions of the even Goldbach conjecture, such as showing that “almost all” even numbers are the sum of two primes; see Exercise 34 below.) In particular, we will establish the following celebrated theorem of Vinogradov:

Theorem 1 (Vinogradov’s theorem) Every sufficiently large odd number {N} is expressible as the sum of three primes.

Recently, the restriction that {N} be sufficiently large was replaced by Helfgott with {N > 5}, thus establishing the odd Goldbach conjecture in full. This argument followed the same basic approach as Vinogradov (based on the circle method), but with various estimates replaced by “log-free” versions (analogous to the log-free zero-density theorems in Notes 7), combined with careful numerical optimisation of constants and also some numerical work on the even Goldbach problem and on the generalised Riemann hypothesis. We refer the reader to Helfgott’s text for details.

We will in fact show the more precise statement:

Theorem 2 (Quantitative Vinogradov theorem) Let {N \geq 2} be a natural number. Then

\displaystyle  \sum_{a,b,c: a+b+c=N} \Lambda(a) \Lambda(b) \Lambda(c) = G_3(N) \frac{N^2}{2} + O_A( N^2 \log^{-A} N )

for any {A>0}, where

\displaystyle  G_3(N) = \prod_{p|N} (1-\frac{1}{(p-1)^2}) \times \prod_{p \not | N} (1 + \frac{1}{(p-1)^3}). \ \ \ \ \ (1)

The implied constants are ineffective.

We dropped the hypothesis that {N} is odd in Theorem 2, but note that {G_3(N)} vanishes when {N} is even. For odd {N}, we have

\displaystyle  1 \ll G_3(N) \ll 1.
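
As an aside, the singular series is easy to evaluate approximately by truncating the product over {p \nmid N} (the discarded factors are {1 + O(1/p^3)}, so the truncation error is tiny); an illustrative Python sketch with an ad hoc prime cutoff:

```python
def primes_upto(n):
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]

PS = primes_upto(10 ** 4)

def G3(N):
    """Truncated singular series G_3(N); the omitted factors are 1 + O(1/p^3)."""
    out = 1.0
    for p in PS:
        if N % p == 0:
            out *= 1 - 1 / (p - 1) ** 2
        else:
            out *= 1 + 1 / (p - 1) ** 3
    return out

print(G3(100))   # 0.0, since for even N the factor at p = 2 vanishes
print(G3(99) > 0, G3(10 ** 4 + 1) > 0)
```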

Exercise 3 Show that Theorem 2 implies Theorem 1.

Unfortunately, due to the ineffectivity of the constants in Theorem 2 (a consequence of the reliance on the Siegel-Walfisz theorem in the proof of that theorem), one cannot quantify explicitly what “sufficiently large” means in Theorem 1 directly from Theorem 2. However, there is a modification of this theorem which gives effective bounds; see Exercise 32 below.

Exercise 4 Obtain a heuristic derivation of the main term {G_3(N) \frac{N^2}{2}} using the modified Cramér model (Section 1 of Supplement 4).

To prove Theorem 2, we consider the more general problem of estimating sums of the form

\displaystyle  \sum_{a,b,c \in {\bf Z}: a+b+c=N} f(a) g(b) h(c)

for various integers {N} and functions {f,g,h: {\bf Z} \rightarrow {\bf C}}, which we will take to be finitely supported to avoid issues of convergence.

Suppose that {f,g,h} are supported on {\{1,\dots,N\}}; for simplicity, let us first assume the pointwise bound {|f(n)|, |g(n)|, |h(n)| \ll 1} for all {n}. (This simple case will not cover the case in Theorem 2, when {f,g,h} are truncated versions of the von Mangoldt function {\Lambda}, but will serve as a warmup to that case.) Then we have the trivial upper bound

\displaystyle  \sum_{a,b,c \in {\bf Z}: a+b+c=N} f(a) g(b) h(c) \ll N^2. \ \ \ \ \ (2)

A basic observation is that this upper bound is attainable if {f,g,h} all “pretend” to behave like the same additive character {n \mapsto e(\theta n)} for some {\theta \in {\bf R}/{\bf Z}}. For instance, if {f(n)=g(n)=h(n) = e(\theta n) 1_{n \leq N}}, then we have {f(a)g(b)h(c) = e(\theta N)} when {a+b+c=N}, and then it is not difficult to show that

\displaystyle  \sum_{a,b,c \in {\bf Z}: a+b+c=N} f(a) g(b) h(c) = (\frac{1}{2}+o(1)) e(\theta N) N^2

as {N \rightarrow \infty}.

The key to the success of the circle method lies in the converse of the above statement: the only way that the trivial upper bound (2) comes close to being sharp is when {f,g,h} all correlate with the same character {n \mapsto e(\theta n)}, or in other words {\hat f(\theta), \hat g(\theta), \hat h(\theta)} are simultaneously large. This converse is largely captured by the following two identities:

Exercise 5 Let {f,g,h: {\bf Z} \rightarrow {\bf C}} be finitely supported functions. Then for any natural number {N}, show that

\displaystyle  \sum_{a,b,c: a+b+c=N} f(a) g(b) h(c) = \int_{{\bf R}/{\bf Z}} \hat f(\theta) \hat g(\theta) \hat h(\theta) e(\theta N)\ d\theta \ \ \ \ \ (3)


and

\displaystyle  \sum_n |f(n)|^2 = \int_{{\bf R}/{\bf Z}} |\hat f(\theta)|^2\ d\theta.
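
Both identities can be verified exactly by machine if one discretises the integral over {{\bf R}/{\bf Z}} at the {M}-th roots of unity for {M} larger than the degrees of the trigonometric polynomials involved. A small sketch with arbitrary test data (the support bound and modulus are ad hoc choices of mine):

```python
import cmath, random

random.seed(1)
S, M, N = 20, 128, 30   # supports in [0, S]; M > 3S makes the quadrature exact
f = [random.randint(-3, 3) for _ in range(S + 1)]
g = [random.randint(-3, 3) for _ in range(S + 1)]
h = [random.randint(-3, 3) for _ in range(S + 1)]

def hat(seq, theta):
    """\hat f(theta) = sum_n f(n) e(-n theta), with e(x) = exp(2 pi i x)."""
    return sum(c * cmath.exp(-2j * cmath.pi * k * theta) for k, c in enumerate(seq))

direct = sum(f[a] * g[b] * h[c]
             for a in range(S + 1) for b in range(S + 1) for c in range(S + 1)
             if a + b + c == N)
quad = sum(hat(f, j / M) * hat(g, j / M) * hat(h, j / M)
           * cmath.exp(2j * cmath.pi * j * N / M) for j in range(M)) / M
parseval = sum(abs(hat(f, j / M)) ** 2 for j in range(M)) / M

print(abs(quad - direct) < 1e-6, abs(parseval - sum(c * c for c in f)) < 1e-6)
# True True
```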

The traditional approach to using the circle method to compute sums such as {\sum_{a,b,c: a+b+c=N} f(a) g(b) h(c)} proceeds by invoking (3) to express this sum as an integral over the unit circle, then dividing the unit circle into “major arcs” where {\hat f(\theta), \hat g(\theta),\hat h(\theta)} are large but computable with high precision, and “minor arcs” where one has estimates to ensure that {\hat f(\theta), \hat g(\theta),\hat h(\theta)} are small in both {L^\infty} and {L^2} senses. For functions {f,g,h} of number-theoretic significance, such as truncated von Mangoldt functions, the “major arcs” typically consist of those {\theta} that are close to a rational number {\frac{a}{q}} with {q} not too large, and the “minor arcs” consist of the remaining portions of the circle. One then obtains lower bounds on the contributions of the major arcs, and upper bounds on the contribution of the minor arcs, in order to get good lower bounds on {\sum_{a,b,c: a+b+c=N} f(a) g(b) h(c)}.

This traditional approach is covered in many places, such as this text of Vaughan. We will emphasise in this set of notes a slightly different perspective on the circle method, coming from recent developments in additive combinatorics; this approach does not quite give the sharpest quantitative estimates, but it allows for easier generalisation to more combinatorial contexts, for instance when replacing the primes by dense subsets of the primes, or replacing the equation {a+b+c=N} with some other equation or system of equations.

From Exercise 5 and Hölder’s inequality, we immediately obtain

Corollary 6 Let {f,g,h: {\bf Z} \rightarrow {\bf C}} be finitely supported functions. Then for any natural number {N}, we have

\displaystyle  |\sum_{a,b,c: a+b+c=N} f(a) g(b) h(c)| \leq (\sum_n |f(n)|^2)^{1/2} (\sum_n |g(n)|^2)^{1/2}

\displaystyle  \times \sup_\theta |\sum_n h(n) e(n\theta)|.

Similarly for permutations of the {f,g,h}.

In the case when {f,g,h} are supported on {[1,N]} and bounded by {O(1)}, this corollary tells us that we have {\sum_{a,b,c: a+b+c=N} f(a) g(b) h(c)} is {o(N^2)} whenever one has {\sum_n h(n) e(n\theta) = o(N)} uniformly in {\theta}, and similarly for permutations of {f,g,h}. From this and the triangle inequality, we obtain the following conclusion: if {f} is supported on {[1,N]} and bounded by {O(1)}, and {f} is Fourier-approximated by another function {g} supported on {[1,N]} and bounded by {O(1)} in the sense that

\displaystyle  \sum_n f(n) e(n\theta) = \sum_n g(n) e(n\theta) + o(N)

uniformly in {\theta}, then we have

\displaystyle  \sum_{a,b,c: a+b+c=N} f(a) f(b) f(c) = \sum_{a,b,c: a+b+c=N} g(a) g(b) g(c) + o(N^2). \ \ \ \ \ (4)

Thus, one possible strategy for estimating the sum {\sum_{a,b,c: a+b+c=N} f(a) f(b) f(c)} is to effectively replace (or “model”) {f} by a simpler function {g} which Fourier-approximates {f}, in the sense that the exponential sums {\sum_n f(n) e(n\theta), \sum_n g(n) e(n\theta)} agree up to error {o(N)}. For instance:

Exercise 7 Let {N} be a natural number, and let {A} be a random subset of {\{1,\dots,N\}}, chosen so that each {n \in \{1,\dots,N\}} has an independent probability of {1/2} of lying in {A}. Show that with probability {1-o(1)}, the function {f := 1_A} is Fourier-approximated by the function {g := \frac{1}{2} 1_{[1,N]}}, in the sense that {\sum_n f(n) e(n\theta) = \sum_n g(n) e(n\theta) + o(N)} uniformly in {\theta}.

In the case when {f} is something like the truncated von Mangoldt function {\Lambda(n) 1_{n \leq N}}, the quantity {\sum_n |f(n)|^2} is of size {O( N \log N)} rather than {O( N )}. This costs us a logarithmic factor in the above analysis; however, we can still conclude that we have the approximation (4) whenever {g} is another sequence with {\sum_n |g(n)|^2 \ll N \log N} such that one has the improved Fourier approximation

\displaystyle  \sum_n f(n) e(n\theta) = \sum_n g(n) e(n\theta) + o(\frac{N}{\log N}) \ \ \ \ \ (5)

uniformly in {\theta}. (Later on we will obtain a “log-free” version of this implication in which one does not need to gain a factor of {\frac{1}{\log N}} in the error term.)

This suggests a strategy for proving Vinogradov’s theorem: find an approximant {g} to some suitable truncation {f} of the von Mangoldt function (e.g. {f(n) = \Lambda(n) 1_{n \leq N}} or {f(n) = \Lambda(n) \eta(n/N)}) which obeys the Fourier approximation property (5), and such that the expression {\sum_{a+b+c=N} g(a) g(b) g(c)} is easily computable. It turns out that there are a number of good options for such an approximant {g}. One of the quickest ways to obtain such an approximation (which is used in Chapter 19 of Iwaniec and Kowalski) is to start with the standard identity {\Lambda = -\mu L * 1}, that is to say

\displaystyle  \Lambda(n) = - \sum_{d|n} \mu(d) \log d,

and obtain an approximation by truncating {d} to be less than some threshold {R} (which, in practice, would be a small power of {N}):

\displaystyle  \Lambda(n) \approx - \sum_{d \leq R: d|n} \mu(d) \log d. \ \ \ \ \ (6)

Thus, for instance, if {f(n) = \Lambda(n) 1_{n \leq N}}, the approximant {g} would be taken to be

\displaystyle  g(n) := - \sum_{d \leq R: d|n} \mu(d) \log d 1_{n \leq N}.
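
One can see the nature of this truncation numerically: for {R \geq n} the sum recovers {\Lambda(n)} exactly (this is just the identity {\Lambda = -\mu L * 1}), while for {R < n} the approximant vanishes at primes exceeding {R}, so (6) is to be understood in the Fourier/averaged sense rather than pointwise. An illustrative sketch (all helper functions are ad hoc):

```python
import math

def mobius(n):
    """Mobius function by trial factorisation."""
    m, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0      # square factor
            m = -m
        d += 1
    return -m if n > 1 else m

def von_mangoldt(n):
    """Lambda(n) = log p if n = p^k, else 0."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            while n % d == 0:
                n //= d
            return math.log(d) if n == 1 else 0.0
        d += 1
    return math.log(n) if n > 1 else 0.0

def approximant(n, R):
    """g(n) = -sum_{d <= R, d | n} mu(d) log d, as in (6)."""
    return -sum(mobius(d) * math.log(d) for d in range(2, R + 1) if n % d == 0)

# with R >= n there is no truncation and the classical identity is exact
ok = all(abs(von_mangoldt(n) - approximant(n, n)) < 1e-9 for n in range(1, 300))
print(ok, approximant(97, 10))  # True 0  (the truncated sum misses the prime 97)
```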

One could also use the slightly smoother approximation

\displaystyle  \Lambda(n) \approx \sum_{d \leq R: d|n} \mu(d) \log \frac{R}{d} \ \ \ \ \ (7)

in which case we would take

\displaystyle  g(n) := \sum_{d \leq R: d|n} \mu(d) \log \frac{R}{d} 1_{n \leq N}.

The function {g} is somewhat similar to the continuous Selberg sieve weights studied in Notes 4, with the main difference being that we did not square the divisor sum as we will not need to take {g} to be non-negative. As long as {R} is not too large, one can use some sieve-like computations to compute expressions like {\sum_{a+b+c=N} g(a)g(b)g(c)} quite accurately. The approximation (5) can be justified by using a nice estimate of Davenport that exemplifies the Möbius pseudorandomness heuristic from Supplement 4:

Theorem 8 (Davenport’s estimate) For any {A>0} and {x \geq 2}, we have

\displaystyle  \sum_{n \leq x} \mu(n) e(\theta n) \ll_A x \log^{-A} x

uniformly for all {\theta \in {\bf R}/{\bf Z}}. The implied constants are ineffective.

This estimate will be proven by splitting into two cases. In the “major arc” case when {\theta} is close to a rational {a/q} with {q} small (of size {O(\log^{O(1)} x)} or so), this estimate will be a consequence of the Siegel-Walfisz theorem (from Notes 2); it is the application of this theorem that is responsible for the ineffective constants. In the remaining “minor arc” case, one proceeds by using a combinatorial identity (such as Vaughan’s identity) to express the sum {\sum_{n \leq x} \mu(n) e(\theta n)} in terms of bilinear sums of the form {\sum_n \sum_m a_n b_m e(\theta nm)}, and use the Cauchy-Schwarz inequality and the minor arc nature of {\theta} to obtain a gain in this case. This will all be done below the fold. We will also use (a rigorous version of) the approximation (6) (or (7)) to establish Vinogradov’s theorem.

A somewhat different looking approximation for the von Mangoldt function that also turns out to be quite useful is

\displaystyle  \Lambda(n) \approx \sum_{q \leq Q} \sum_{a \in ({\bf Z}/q{\bf Z})^\times} \frac{\mu(q)}{\phi(q)} e( \frac{an}{q} ) \ \ \ \ \ (8)

for some {Q} that is not too large compared to {N}. The methods used to establish Theorem 8 can also establish a Fourier approximation that makes (8) precise, and which can yield an alternate proof of Vinogradov’s theorem; this will be done below the fold.

The approximation (8) can be written in a way that makes it more similar to (7):

Exercise 9 Show that the right-hand side of (8) can be rewritten as

\displaystyle  \sum_{d \leq Q: d|n} \mu(d) \rho_d


where

\displaystyle  \rho_d := \frac{d}{\phi(d)} \sum_{m \leq Q/d: (m,d)=1} \frac{\mu^2(m)}{\phi(m)}.

Then, show the inequalities

\displaystyle  \sum_{m \leq Q/d} \frac{\mu^2(m)}{\phi(m)} \leq \rho_d \leq \sum_{m \leq Q} \frac{\mu^2(m)}{\phi(m)}

and conclude that

\displaystyle  \log \frac{Q}{d} - O(1) \leq \rho_d \leq \log Q + O(1).

(Hint: for the latter estimate, use Theorem 27 of Notes 1.)
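
The first identity of Exercise 9 is exact (it follows by expanding the Ramanujan sum {\sum_{a \in ({\bf Z}/q{\bf Z})^\times} e(\frac{an}{q}) = \sum_{d | (q,n)} d \mu(q/d)} and rearranging), and can be confirmed by machine; a small sketch with an ad hoc cutoff {Q}:

```python
import cmath, math

def mobius(n):
    m, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            m = -m
        d += 1
    return -m if n > 1 else m

def phi(q):
    """Euler totient, by brute force."""
    return sum(1 for a in range(1, q + 1) if math.gcd(a, q) == 1)

Q = 12

def ramanujan_side(n):
    # right-hand side of (8): sum_{q <= Q} (mu(q)/phi(q)) sum_{a in (Z/q)^*} e(an/q)
    total = 0 + 0j
    for q in range(1, Q + 1):
        c = sum(cmath.exp(2j * cmath.pi * a * n / q)
                for a in range(1, q + 1) if math.gcd(a, q) == 1)
        total += mobius(q) / phi(q) * c
    return total.real

def rho(d):
    return d / phi(d) * sum(mobius(m) ** 2 / phi(m)
                            for m in range(1, Q // d + 1) if math.gcd(m, d) == 1)

def divisor_side(n):
    return sum(mobius(d) * rho(d) for d in range(1, Q + 1) if n % d == 0)

ok = all(abs(ramanujan_side(n) - divisor_side(n)) < 1e-9 for n in range(1, 60))
print(ok)  # True
```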

The coefficients {\rho_d} in the above exercise are quite similar to optimised Selberg sieve coefficients (see Section 2 of Notes 4).

Another approximation to {\Lambda}, related to the modified Cramér random model (see Model 10 of Supplement 4) is

\displaystyle  \Lambda(n) \approx \frac{W}{\phi(W)} 1_{(n,W)=1} \ \ \ \ \ (9)

where {W := \prod_{p \leq w} p} and {w} is a slowly growing function of {N} (e.g. {w = \log\log N}); a closely related approximation is

\displaystyle  \frac{\phi(W)}{W} \Lambda(Wn+b) \approx 1 \ \ \ \ \ (10)

for {W,w} as above and {1 \leq b \leq W} coprime to {W}. These approximations (closely related to a device known as the “{W}-trick”) are not as quantitatively accurate as the previous approximations, but can still suffice to establish Vinogradov’s theorem, and also to count many other linear patterns in the primes or subsets of the primes (particularly if one injects some additional tools from additive combinatorics, and specifically the inverse conjecture for the Gowers uniformity norms); see this paper of Ben Green and myself for more discussion (and this more recent paper of Shao for an analysis of this approach in the context of Vinogradov-type theorems). The following exercise expresses the approximation (9) in a form similar to the previous approximation (8):

Exercise 10 With {W} as above, show that

\displaystyle  \frac{W}{\phi(W)} 1_{(n,W)=1} = \sum_{q|W} \sum_{a \in ({\bf Z}/q{\bf Z})^\times} \frac{\mu(q)}{\phi(q)} e( \frac{an}{q} )

for all natural numbers {n}.
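
This identity is a finite computation for any fixed {W} and is easily machine-checked; an illustrative sketch with {w = 5}, so that {W = 30} (helper functions ad hoc):

```python
import cmath, math

def mobius(n):
    m, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            m = -m
        d += 1
    return -m if n > 1 else m

def phi(q):
    return sum(1 for a in range(1, q + 1) if math.gcd(a, q) == 1)

W = 2 * 3 * 5   # W = prod_{p <= w} p with w = 5

def rhs(n):
    # sum over q | W of (mu(q)/phi(q)) times the Ramanujan sum c_q(n)
    total = 0 + 0j
    for q in range(1, W + 1):
        if W % q:
            continue
        c = sum(cmath.exp(2j * cmath.pi * a * n / q)
                for a in range(1, q + 1) if math.gcd(a, q) == 1)
        total += mobius(q) / phi(q) * c
    return total.real

lhs = lambda n: W / phi(W) if math.gcd(n, W) == 1 else 0.0
ok = all(abs(rhs(n) - lhs(n)) < 1e-9 for n in range(1, 61))
print(ok)  # True
```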

— 1. Exponential sums over primes: the minor arc case —

We begin by developing some simple rigorous instances of the following general heuristic principle (cf. Section 3 of Supplement 4):

Principle 11 (Equidistribution principle) An exponential sum (such as a linear sum {\sum_n e(f(n))} or a bilinear sum {\sum_n \sum_m a_n b_m e(f(n,m))}) involving a “structured” phase function {f} should exhibit some non-trivial cancellation, unless there is an “obvious” algebraic reason why such cancellation may not occur (e.g. {f} is approximately periodic with small period, or {f(n,m)} approximately decouples into a sum {g(n)+h(m)}).

There are some quite sophisticated versions of this principle in the literature, such as Ratner’s theorems on equidistribution of unipotent flows, discussed in this previous blog post. There are yet further precise instances of this principle which are conjectured to be true, but for which this remains unproven (e.g. regarding incomplete Weil sums in finite fields). Here, though, we will focus only on the simplest manifestations of this principle, in which {f} is a linear or bilinear phase. Rigorous versions of this special case of the above principle will be very useful in estimating exponential sums such as

\displaystyle  \sum_{n \leq x} \Lambda(n) e(\theta n)


and

\displaystyle  \sum_{n \leq x} \mu(n) e(\theta n)

in “minor arc” situations in which {\theta} is not too close to a rational number {a/q} of small denominator. The remaining “major arc” case when {\theta} is close to such a rational number {a/q} has to be handled by the complementary methods of multiplicative number theory, which we turn to later in this section.

For pedagogical reasons we shall develop versions of this principle that are in contrapositive form, starting with a hypothesis that a significant bias in an exponential sum is present, and deducing algebraic structure as a consequence. This leads to estimates that are not fully optimal from a quantitative viewpoint, but I believe they give a good qualitative illustration of the phenomena being exploited here.

We begin with the simplest instance of Principle 11, namely regarding unweighted linear sums of linear phases:

Lemma 12 Let {I \subset {\bf Z}} be an interval of length at most {N} for some {N \geq 1}, let {\theta \in{\bf R}/{\bf Z}}, and let {\delta > 0}.

  • (i) If

    \displaystyle  |\sum_{n \in I} e(\theta n)| \geq \delta N,

    then {\|\theta\|_{{\bf R}/{\bf Z}} \ll \frac{1}{\delta N}}, where {\|\theta\|_{{\bf R}/{\bf Z}}} denotes the distance from (any representative of) {\theta} to the nearest integer.

  • (ii) More generally, if

    \displaystyle  |\sum_{n \in I} f(n) e(\theta n)| > \delta N \sup_{n \in I} |f(n)|

    for some monotone function {f: I \rightarrow {\bf R}}, then {\|\theta\|_{{\bf R}/{\bf Z}} \ll \frac{1}{\delta N}}.

Proof: From the geometric series formula we have

\displaystyle  |\sum_{n \in I} e(\theta n)| \leq \frac{2}{|e(\theta)-1|} \ll \frac{1}{\|\theta\|_{{\bf R}/{\bf Z}}}

and the claim (i) follows. To prove (ii), we write {I = \{a,\dots,b\}} and observe from summation by parts that

\displaystyle  \sum_{n \in I} f(n) e(\theta n) = f(a) \sum_{n \in I} e(\theta n) + \sum_{m = a+1}^b (f(m)-f(m-1)) \sum_{n=m}^b e(\theta n)

while from monotonicity we have

\displaystyle  |f(a)|+ \sum_{m = a+1}^b |f(m)-f(m-1)| \leq 2 \sup_{n \in I} |f(n)|

and the claim then follows from (i) and the pigeonhole principle. \Box
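
Quantitatively, the geometric series formula together with the elementary inequality {|e(\theta)-1| = 2|\sin(\pi\theta)| \geq 4 \|\theta\|_{{\bf R}/{\bf Z}}} gives the explicit bound {|\sum_{n \in I} e(\theta n)| \leq \frac{1}{2\|\theta\|_{{\bf R}/{\bf Z}}}}, which the following throwaway sketch stress-tests on random inputs:

```python
import cmath, random

random.seed(7)
ok = True
for _ in range(200):
    N = random.randint(1, 500)
    theta = random.random()       # generic theta, so ||theta|| > 0
    s = abs(sum(cmath.exp(2j * cmath.pi * theta * n) for n in range(N)))
    dist = min(theta, 1 - theta)  # ||theta||_{R/Z}
    if s > 1 / (2 * dist) + 1e-9:
        ok = False
print(ok)  # True
```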

Now we move to bilinear sums. We first need an elementary lemma:

Lemma 13 (Vinogradov lemma) Let {I \subset {\bf Z}} be an interval of length at most {N} for some {N \geq 1}, and let {\theta \in{\bf R}/{\bf Z}} be such that {\|n\theta\|_{{\bf R}/{\bf Z}} \leq \varepsilon} for at least {\delta N} values of {n \in I}, for some {0 < \varepsilon, \delta < 1}. Then either

\displaystyle  N < \frac{2}{\delta}


or

\displaystyle  \varepsilon > 10^{-2} \delta

or else there is a natural number {q \leq 2/\delta} such that

\displaystyle  \| q \theta \|_{{\bf R}/{\bf Z}} \ll \frac{\varepsilon}{\delta N}.

One can obtain somewhat sharper estimates here by using the classical theory of continued fractions and Bohr sets, as in this previous blog post, but we will not need these refinements here.

Proof: We may assume that {N \geq \frac{2}{\delta}} and {\varepsilon \leq 10^{-2} \delta}, since we are done otherwise. Then there are at least two {n \in I} with {\|n \theta \|_{{\bf R}/{\bf Z}} \leq \varepsilon}, and by the pigeonhole principle we can find {n_1 < n_2} in {I} with {\|n_1 \theta \|_{{\bf R}/{\bf Z}}, \|n_2 \theta \|_{{\bf R}/{\bf Z}} \leq \varepsilon} and {n_2-n_1 \leq \frac{2}{\delta}}. By the triangle inequality, we conclude that there exists at least one natural number {q \leq \frac{2}{\delta}} for which

\displaystyle  \| q \theta \|_{{\bf R}/{\bf Z}} \leq 2\varepsilon.

We take {q} to be minimal amongst all such natural numbers, then we see that there exists {a} coprime to {q} and {|\kappa| \leq 2\varepsilon} such that

\displaystyle  \theta = \frac{a}{q} + \frac{\kappa}{q}. \ \ \ \ \ (11)

If {\kappa=0} then we are done, so suppose that {\kappa \neq 0}. Suppose that {n < m} are elements of {I} such that {\|n\theta \|_{{\bf R}/{\bf Z}}, \|m\theta \|_{{\bf R}/{\bf Z}} \leq \varepsilon} and {m-n \leq \frac{1}{10 |\kappa|}}. Writing {m-n = qk + r} for some {0 \leq r < q}, we have

\displaystyle  \| (m-n) \theta \|_{{\bf R}/{\bf Z}} = \| \frac{ra}{q} + (m-n) \frac{\kappa}{q} \|_{{\bf R}/{\bf Z}} \leq 2\varepsilon.

By hypothesis, {(m-n) \frac{\kappa}{q} \leq \frac{1}{10 q}}; note that as {q \leq 2/\delta} and {\varepsilon \leq 10^{-2} \delta} we also have {\varepsilon \leq \frac{1}{10q}}. This implies that {\| \frac{ra}{q} \|_{{\bf R}/{\bf Z}} < \frac{1}{q}} and thus {r=0}. We then have

\displaystyle  |k \kappa| \leq 2 \varepsilon.

We conclude that for fixed {n \in I} with {\|n\theta \|_{{\bf R}/{\bf Z}} \leq \varepsilon}, there are at most {\frac{2\varepsilon}{|\kappa|}} elements {m} of {[n, n + \frac{1}{10 |\kappa|}]} such that {\|m\theta \|_{{\bf R}/{\bf Z}} \leq \varepsilon}. Iterating this with a greedy algorithm, we see that the number of {n \in I} with {\|n\theta \|_{{\bf R}/{\bf Z}} \leq \varepsilon} is at most {(\frac{N}{1/10|\kappa|} + 1) 2\varepsilon/|\kappa|}; since {\varepsilon < 10^{-2} \delta}, this implies that

\displaystyle  \delta N \ll 2 \varepsilon / |\kappa|

and the claim follows. \Box

Now we can control bilinear sums of the form

\displaystyle  \sum_{n \in I} \alpha*\beta(n) e(\theta n) = \sum_{m,n: mn\in I} \alpha(m) \beta(n) e(\theta nm).

Theorem 14 (Bilinear sum estimate) Let {M, N \geq 1}, let {I \subset {\bf Z}} be an interval, and let {\alpha: {\bf N} \rightarrow {\bf C}}, {\beta: {\bf N} \rightarrow {\bf C}} be sequences supported on {[1,M]} and {[1,N]} respectively. Let {\delta > 0} and {\theta \in {\bf R}/{\bf Z}}.

  • (i) (Type I estimate) If {\beta} is real-valued and monotone and

    \displaystyle  |\sum_{m,n: mn \in I} \alpha(m) \beta(n) e( \theta nm )| > \delta MN \sup_m |\alpha(m)| \sup_n |\beta(n)|

    then either {N \ll \delta^{-2}}, or there exists {q \ll 1/\delta} such that {\|q\theta\|_{{\bf R}/{\bf Z}} \ll \frac{1}{\delta^2 NM}}.

  • (ii) (Type II estimate) If

    \displaystyle  |\sum_{m,n: mn \in I} \alpha(m) \beta(n) e( \theta nm )| > \delta (MN)^{1/2}

    \displaystyle  \times (\sum_m |\alpha(m)|^2)^{1/2} (\sum_n |\beta(n)|^2)^{1/2}

    then either {\min(M,N) \ll \delta^{-4}}, or there exists {q \ll 1/\delta^2} such that {\|q\theta\|_{{\bf R}/{\bf Z}} \ll \frac{1}{\delta^4 NM}}.

The hypotheses of (i) and (ii) should be compared with the trivial bounds

\displaystyle  |\sum_{m,n: mn \in I} \alpha(m) \beta(n) e( \theta nm )| \leq MN \sup_m |\alpha(m)| \sup_n |\beta(n)|

and
\displaystyle  |\sum_{m,n: mn \in I} \alpha(m) \beta(n) e( \theta nm )| \leq (MN)^{1/2} (\sum_m |\alpha(m)|^2)^{1/2} (\sum_n |\beta(n)|^2)^{1/2}

arising from the triangle inequality and the Cauchy-Schwarz inequality.

Proof: We begin with (i). By the triangle inequality, we have

\displaystyle  \sum_{m \leq M} |\sum_{n: mn \in I} \beta(n) e( \theta nm )| > \delta MN \sup_n |\beta(n)|.

The summand in {m} is bounded by {N \sup_n |\beta(n)|}. We conclude that

\displaystyle  |\sum_{n: mn \in I} \beta(n) e( \theta nm )| > \frac{\delta}{2} N \sup_n |\beta(n)|

for at least {\frac{\delta}{2} M} choices of {m \leq M} (this is easiest to see by arguing by contradiction). Applying Lemma 12(ii), we conclude that

\displaystyle  \| \theta m \|_{{\bf R}/{\bf Z}} \ll \frac{1}{\delta N} \ \ \ \ \ (12)

for at least {\frac{\delta}{2} M} choices of {m \leq M}. Applying Lemma 13, we conclude that one of {M \ll 1/\delta}, {\frac{1}{\delta N} \gg \delta}, or there exists a natural number {q \ll 1/\delta} such that {\|q\theta \|_{{\bf R}/{\bf Z}} \ll \frac{1}{\delta^2 NM}}. This gives (i) except when {M \ll 1/\delta}. In this case, we return to (12), which holds for at least one natural number {m \leq M \ll 1/\delta}, and set {q := m}.

Now we prove (ii). By the triangle inequality, we have

\displaystyle  \sum_m |\alpha(m)| |\sum_{n: mn \in I} \beta(n) e( \theta nm )|

\displaystyle  > \delta (MN)^{1/2} (\sum_m |\alpha(m)|^2)^{1/2} (\sum_n |\beta(n)|^2)^{1/2}

and hence by the Cauchy-Schwarz inequality

\displaystyle  \sum_{m\leq M} |\sum_{n: mn \in I} \beta(n) e( \theta nm )|^2 > \delta^2 MN \sum_n |\beta(n)|^2.

The left-hand side expands as

\displaystyle  \sum_{n,n' \leq N} \beta(n) \overline{\beta(n')} \sum_{m \leq M: mn, mn' \in I} e(\theta(n-n') m);

from the triangle inequality, the estimate {\beta(n) \overline{\beta(n')} \ll |\beta(n)|^2 + |\beta(n')|^2} and symmetry we conclude that

\displaystyle  \sum_{n \leq N} |\sum_{m \leq M: mn, mn' \in I} e(\theta(n-n') m)| \gg \delta^2 M N

for at least one choice of {n' \leq N}. Fix this {n'}. Since {|\sum_{m \leq M: mn, mn' \in I} e(\theta(n-n') m)| \leq M}, we thus have

\displaystyle  |\sum_{m \leq M: mn, mn' \in I} e(\theta(n-n') m)| \gg \delta^2 M

for {\gg \delta^2 N} choices of {n \leq N}. Applying Lemma 12(i), we conclude that

\displaystyle  \| \theta(n-n') \|_{{\bf R}/{\bf Z}} \ll \frac{1}{\delta^2 M }

for {\gg \delta^2 N} choices of {n \leq N}. Applying Lemma 13, we obtain the claim. \Box

The following exercise demonstrates the sharpness of the above theorem, at least with regards to the bound on {q}.

Exercise 15 Let {\frac{a}{q}} be a rational number with {(a,q)=1}, let {\theta \in {\bf R}/{\bf Z}}, and let {M, N} be multiples of {q}.

  • (i) If {\|\theta - \frac{a}{q}\|_{{\bf R}/{\bf Z}} \leq \frac{c}{qNM}} for a sufficiently small absolute constant {c>0}, show that {|\sum_{n \leq N} \sum_{m \leq M} e( \theta nm ) | \gg \frac{1}{q} NM}.
  • (ii) If {a} is even, and {\|\theta - \frac{a}{q}\|_{{\bf R}/{\bf Z}} \leq \frac{c}{\sqrt{q}NM}} for a sufficiently small absolute constant {c>0}, show that {|\sum_{n \leq N} \sum_{m \leq M} e( a n^2/2q) e(-a m^2/2q) e( \theta nm ) | \gg \frac{1}{\sqrt{q}} NM}.
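Part (i) is easy to observe on the computer: deep inside the major arc around {a/q}, only the {n} divisible by {q} contribute to the double sum, and those contribute with no cancellation. A minimal Python sketch (the parameters {q=5}, {a=2}, {N=M=50} are illustrative):

```python
import cmath

def e(x):
    """e(x) = exp(2 pi i x), reducing x mod 1 first for float accuracy."""
    return cmath.exp(2j * cmath.pi * (x % 1.0))

q, a = 5, 2
N = M = 50                 # multiples of q, as in the exercise
theta = a / q + 1e-9       # well inside the c/(q*N*M) major arc

S = sum(e(theta * n * m) for n in range(1, N + 1) for m in range(1, M + 1))

# For q | n the inner sum over m is essentially M, while the other n give
# complete (vanishing) sums over full periods, so |S| ~ (N/q) * M = N*M/q.
assert abs(abs(S) - N * M / q) < 1.0
print(abs(S))
```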

Exercise 16 (Quantitative Weyl exponential sum estimates) Let {P(n) = \alpha_d n^d + \dots + \alpha_1 n + \alpha_0} be a polynomial with coefficients {\alpha_0,\dots,\alpha_d \in {\bf R}/{\bf Z}} for some {d \geq 1}, let {N \geq 1}, and let {\delta > 0}.

  • (i) Suppose that {|\sum_{n \in I} e(P(n))| \geq \delta N} for some interval {I} of length at most {N}. Show that there exists a natural number {q \ll_d \delta^{-O_d(1)}} such that {\| q\alpha_d \|_{{\bf R}/{\bf Z}} \ll_d \delta^{-O_d(1)} / N^d}. (Hint: induct on {d} and use the van der Corput inequality (Proposition 7 of Notes 5).)
  • (ii) Suppose that {|\sum_{n \in I} e(P(n))| \geq \delta N} for some interval {I} contained in {[1,N]}. Show that there exists a natural number {q \ll_d \delta^{-O_d(1)}} such that {\| q\alpha_j \|_{{\bf R}/{\bf Z}} \ll_d \delta^{-O_d(1)} / N^j} for all {0 \leq j \leq d} (note this claim is trivial for {j=0}). (Hint: use downwards induction on {j}, adjusting {q} as one goes along, and split up {I} into somewhat short arithmetic progressions of various spacings {q} in order to turn the top degree components of {P(n)} into essentially constant phases.)
  • (iii) Use these bounds to give an alternate proof of Exercise 8 of Notes 5.
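The dichotomy in these exercises is visible numerically already for the quadratic phase {P(n) = \alpha n^2}: near a rational with small denominator the sum has size comparable to {N}, while for a badly approximable {\alpha} one sees square-root cancellation. A Python sketch (all parameters illustrative):

```python
import cmath, math

def e(x):
    """e(x) = exp(2 pi i x), reducing x mod 1 first for float accuracy."""
    return cmath.exp(2j * cmath.pi * (x % 1.0))

def weyl_sum(alpha, N):
    """|sum_{n<=N} e(alpha n^2)| for the quadratic phase P(n) = alpha n^2."""
    return abs(sum(e(alpha * n * n) for n in range(1, N + 1)))

N = 1000
# Major arc: for alpha = 1/4, e(n^2/4) equals 1 for even n and i for odd n,
# so there is no cancellation and |S| = N/sqrt(2), of size N.
S_major = weyl_sum(0.25, N)
assert abs(S_major - N / math.sqrt(2)) < 1e-6

# Minor arc: for the badly approximable alpha = sqrt(2), the sum exhibits
# square-root cancellation and is far smaller than N.
S_minor = weyl_sum(math.sqrt(2), N)
assert S_minor < N / 5
print(S_major, S_minor)
```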

Exercise 17 (Quantitative multidimensional Weyl exponential sum estimates) Let {P(n_1,\dots,n_k) = \sum_{i_1+\dots+i_k \leq d} \alpha_{i_1,\dots,i_k} n_1^{i_1} \dots n_k^{i_k}} be a polynomial in {k} variables with coefficients {\alpha_{i_1,\dots,i_k} \in {\bf R}/{\bf Z}} for some {d \geq 1}. Let {N_1,\dots,N_k \geq 1} and {\delta > 0}. Suppose that

\displaystyle  |\sum_{n_1 \leq N_1} \dots \sum_{n_k \leq N_k} e( P(n_1,\dots,n_k) )| \geq \delta N_1 \dots N_k.

Show that there exists a natural number {q \ll_{d,k} \delta^{-O_{d,k}(1)}} such that {\|q \alpha_{i_1,\dots,i_k} \|_{{\bf R}/{\bf Z}} \ll \frac{\delta^{-O_{d,k}(1)}}{N_1^{i_1} \dots N_k^{i_k}}} for all {i_1,\dots,i_k}.

Recall that in the proof of the Bombieri-Vinogradov theorem (see Notes 3), sums such as {\sum_n \Lambda(n) \chi(n)} or {\sum_n \mu(n) \chi(n)} were handled by using combinatorial identities such as Vaughan’s identity to split {\Lambda} or {\mu} into combinations of Type I or Type II convolutions. The same strategy can be applied here:

Proposition 18 (Minor arc exponential sums are small) Let {x \geq 2}, {\delta > 0}, and {\theta \in {\bf R}/{\bf Z}}, and let {I} be an interval in {[1,x]}. Suppose that either

\displaystyle  |\sum_{n\in I} \Lambda(n) e(\theta n)| \geq \delta x \ \ \ \ \ (13)

or
\displaystyle  |\sum_{n\in I} \mu(n) e(\theta n)| \geq \delta x. \ \ \ \ \ (14)

Then either {x \ll \delta^{-O(1)}}, or there exists a natural number {q \ll \delta^{-2} \log^{O(1)} x} such that {\| q \theta \|_{{\bf R}/{\bf Z}} \ll \delta^{-5} \frac{\log^{O(1)} x}{x}}.

The exponent in the bound {x \ll \delta^{-O(1)}} can be made explicit (and fairly small) if desired, but this exponent is not of critical importance in applications. The losses of {\log x} in this proposition are undesirable (though affordable, for the purposes of proving results such as Vinogradov’s theorem); these losses have been reduced over the years, and finally eliminated entirely in the recent work of Helfgott.

Proof: We will prove this under the hypothesis (13); the argument for (14) is similar and is left as an exercise. By removing the portion of {I} in {[0, \delta x/2]}, and shrinking {\delta} slightly, we may assume without loss of generality that {I \subset [\delta x, x]}.

We recall the Vaughan identity

\displaystyle  \Lambda = \Lambda_{\leq V} + \mu_{\leq U} * L - \mu_{\leq U} * \Lambda_{\leq V} * 1 + \mu_{>U} * \Lambda_{>V} * 1

valid for any {U,V > 1}; see Lemma 18 of Notes 3. We select {U=V=x^{1/3}}. By the triangle inequality, one of the assertions

\displaystyle  |\sum_{n\in I} \Lambda_{\leq V}(n) e(\theta n)| \gg \delta x \ \ \ \ \ (15)

\displaystyle  |\sum_{n\in I} \mu_{\leq U} * L(n) e(\theta n)| \gg \delta x \ \ \ \ \ (16)

\displaystyle  |\sum_{n\in I} \mu_{\leq U} * \Lambda_{\leq V} * 1(n) e(\theta n)| \gg \delta x \ \ \ \ \ (17)

\displaystyle  |\sum_{n\in I} \mu_{>U} * \Lambda_{>V} * 1(n) e(\theta n)| \gg \delta x \ \ \ \ \ (18)

must hold. If (15) holds, then {V \geq \delta x}, from which we easily conclude that {x \ll \delta^{-O(1)}}. Now suppose that (16) holds. By dyadic decomposition, we then have

\displaystyle  |\sum_{n\in I} \alpha * \beta(n) e(\theta n)| \gg \frac{\delta x}{\log^{O(1)} x} \ \ \ \ \ (19)

where {\alpha, \beta} are restrictions of {\mu_{\leq U}}, {L} to dyadic intervals {[M,2M]}, {[N,2N]} respectively. Note that the location of {I} then forces {\delta x \ll MN \ll x}, and the support of {\mu_{\leq U}} forces {M \ll U}, so that {N \gg x/U = x^{2/3}}. Applying Theorem 14(i), we conclude that either {N \ll \delta^{-2} \log^{O(1)} x} (and hence {x \ll \delta^{-O(1)}}), or else there is {q \ll \delta^{-2} \log^{O(1)} x} such that {\| q \theta \|_{{\bf R}/{\bf Z}} \ll \delta^{-3} \frac{\log^{O(1)} x}{x}}, and we are done.

Similarly, if (17) holds, we again apply dyadic decomposition to arrive at (19), where {\alpha,\beta} are now restrictions of {\mu_{\leq U} * \Lambda_{\leq V}} and {1} to {[M,2M]} and {[N,2N]}. As before, we have {\delta x \ll MN \ll x}, and now {M \ll UV} and so {N \gg x^{1/3}}. Note from the identity {\sum_{d|n} \Lambda(d)=\log n} that {\alpha} is bounded pointwise by {\log x}. Repeating the previous argument then gives one of the required conclusions.

Finally, we consider the “Type II” scenario in which (18) holds. We again dyadically decompose and arrive at (19), where {\alpha} and {\beta} are now the restrictions of {\mu_{>U}} and {\Lambda_{>V}*1} (say) to {[M,2M]} and {[N,2N]}, so that {\delta x \ll MN \ll x}, {M \gg U}, and {N \gg V}. As before we can bound {\beta} pointwise by {\log x}. Applying Theorem 14(ii), we conclude that either {\min(M,N) \ll \delta^{-4} \log^{O(1)} x}, or else there exists {q \ll \delta^{-2} \log^{O(1)} x} such that {\|q\theta\|_{{\bf R}/{\bf Z}} \ll \delta^{-5} \frac{\log^{O(1)} x}{x}}, and we again obtain one of the desired conclusions. \Box
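The Vaughan identity driving this proof can be checked numerically, term by term. The following Python sketch verifies the four-term decomposition pointwise for small {n}, with the toy truncation {U = V = 5} rather than {x^{1/3}} (the helper functions are ad hoc, not from the text):

```python
import math

def factorize(n):
    """Prime factorization of n as a dict {p: exponent}."""
    f, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            f[p] = f.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f

def Lambda(n):
    """von Mangoldt function: log p if n = p^k, else 0."""
    f = factorize(n)
    return math.log(min(f)) if len(f) == 1 else 0.0

def mu(n):
    """Moebius function."""
    f = factorize(n)
    return 0 if any(e > 1 for e in f.values()) else (-1) ** len(f)

def conv(*fs):
    """Dirichlet convolution of several arithmetic functions."""
    def h(n):
        if len(fs) == 1:
            return fs[0](n)
        rest = conv(*fs[1:])
        return sum(fs[0](d) * rest(n // d) for d in range(1, n + 1) if n % d == 0)
    return h

U = V = 5.0
lam_le = lambda n: Lambda(n) if n <= V else 0.0
lam_gt = lambda n: Lambda(n) if n > V else 0.0
mu_le = lambda n: mu(n) if n <= U else 0
mu_gt = lambda n: mu(n) if n > U else 0
L = lambda n: math.log(n)
one = lambda n: 1

# Vaughan: Lambda = Lambda_{<=V} + mu_{<=U}*L
#                 - mu_{<=U}*Lambda_{<=V}*1 + mu_{>U}*Lambda_{>V}*1
rhs = lambda n: (lam_le(n) + conv(mu_le, L)(n)
                 - conv(mu_le, lam_le, one)(n) + conv(mu_gt, lam_gt, one)(n))
err = max(abs(Lambda(n) - rhs(n)) for n in range(2, 201))
assert err < 1e-9
print(err)
```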

Exercise 19 Finish the proof of Proposition 18 by treating the case when (14) occurs.

Exercise 20 Establish a version of Proposition 18 in which (13) or (14) are replaced with

\displaystyle  |\sum_{p \leq x} e( \theta p )| \gg \delta \frac{x}{\log x}.

— 2. Exponential sums over primes: the major arc case —

Proposition 18 guarantees that exponential sums such as {\sum_{n \leq x} \Lambda(n) e(\theta n)} are much smaller than {x}, unless {x} is itself small or {\theta} is close to a rational number {a/q} of small denominator. We now analyse this latter case. In contrast with the minor arc analysis, the implied constants will usually be ineffective.

The situation is simplest in the case of the Möbius function:

Proposition 21 Let {I} be an interval in {[1,x]} for some {x \geq 2}, and let {\theta \in {\bf R}/{\bf Z}}. Then for any {A>0} and natural number {q}, we have

\displaystyle  \sum_{n \in I} \mu(n) e(\theta n) \ll_A (q + x \| q \theta \|_{{\bf R}/{\bf Z}}) x \log^{-A} x.

The implied constants are ineffective.

Proof: By splitting into residue classes modulo {q}, it suffices to show that

\displaystyle  \sum_{n \in I: n = a\ (q)} \mu(n) e(\theta n) \ll_A (1 + \frac{x}{q} \| q \theta \|_{{\bf R}/{\bf Z}}) x \log^{-A} x

for all {a = 0,\dots,q-1}. Writing {n = qm+a}, and removing a factor of {e(a\theta)}, it suffices to show that

\displaystyle  \sum_{m \in {\bf Z}: qm+a \in I} \mu(qm+a) e(\xi m) \ll_A (1 + \frac{x}{q} \| q \theta \|_{{\bf R}/{\bf Z}}) x \log^{-A} x

where {\xi \in {\bf R}} is the representative of {q\theta} that is closest to the origin, so that {|\xi| = \|q\theta\|_{{\bf R}/{\bf Z}}}.

For all {m} in the above sum, one has {0 \leq m \leq x/q}. From the fundamental theorem of calculus, one has

\displaystyle  e(\xi m) = 1 + 2\pi i \xi \int_0^{x/q} e(\xi t) 1_{m \geq t}\ dt \ \ \ \ \ (20)

and so by the triangle inequality it suffices to show that

\displaystyle  \sum_{m \in {\bf Z}: qm+a \in I; m \geq t} \mu(qm+a) \ll_A x \log^{-A} x

for all {0 \leq t \leq x/q}. But this follows from the Siegel-Walfisz theorem for the Möbius function (Exercise 66 of Notes 2). \Box

Arguing as in the proof of Lemma 12(ii), we also obtain the corollary

\displaystyle  \sum_{n \in I} \mu(n) e(\theta n) f(n) \ll_A (q + x \| q \theta \|_{{\bf R}/{\bf Z}}) x \log^{-A} x \sup_{n \in I} |f(n)| \ \ \ \ \ (21)

for any monotone function {f: I \rightarrow {\bf R}}, with ineffective constants.

Davenport’s theorem (Theorem 8) is now immediate from applying Proposition 18 with {\delta := \log^{-A} x}, followed by Proposition 21 (with {A} replaced by a larger constant) to deal with the major arc case.

Now we turn to the analogous situation for the von Mangoldt function {\Lambda}. Here we expect to have a non-trivial main term in the major arc case: for instance, the prime number theorem tells us that {\sum_{n \in I} \Lambda(n) e(\theta n)} should be approximately the length of {I} when {\theta=0}. There are several ways to describe the behaviour of {\Lambda}. One way is to use the approximation

\displaystyle  \Lambda^\sharp_R(n) := - \sum_{d \leq R: d|n} \mu(d) \log d

discussed in the introduction:

Proposition 22 Let {x \geq 2}, let {\varepsilon > 0}, and let {R} be such that {R \geq x^\varepsilon}. Then for any interval {I} in {[1,x]} and any {\theta \in {\bf R}/{\bf Z}}, one has

\displaystyle  \sum_{n \in I} \Lambda(n) e(\theta n) - \sum_{n \in I} \Lambda^\sharp_R(n) e(\theta n) \ll_{A,\varepsilon} x \log^{-A} x \ \ \ \ \ (22)

for all {A>0}. The implied constants are ineffective.

Proof: As discussed in the introduction, we have

\displaystyle  \Lambda(n) - \Lambda^\sharp_R(n) = - \sum_{d>R: d|n} \mu(d) \log d

so the left-hand side of (22) can be rearranged as

\displaystyle  - \sum_m \sum_{d>R: dm \in I} \mu(d) \log d e( m \theta d).

Since {I \subset [1,x]}, the inner sum vanishes unless {m \leq x/R}. From Theorem 8 and summation by parts (or (21) and Proposition 21), we have

\displaystyle  \sum_{d>R: dm \in I} \mu(d) \log d e( m \theta d) \ll_A \frac{x}{m} \log^{-A}(x/m) \times \log x;

since {m \leq x/R}, we have {\log^{-A}(x/m) \ll_{A,\varepsilon} \log^{-A} x}, and the claim now follows from summing in {m} (and increasing {A} appropriately). \Box

Exercise 23 Show that Proposition 22 continues to hold if {\Lambda^\sharp_R} is replaced by the function

\displaystyle  \Lambda_R(n) := \sum_{d \leq R: d|n} \mu(d) \log \frac{R}{d}

or more generally by

\displaystyle  \Lambda_{R,\eta}(n) := \log R \sum_{d \leq R: d|n} \mu(d) \eta( \frac{\log d}{\log R} )

where {\eta: {\bf R} \rightarrow [0,1]} is a bounded function such that {\eta(t) = 1-t} for {0 \leq t \leq 1/2} and {\eta(t)=0} for {t>1}. (Hint: use a suitable linear combination of the identities {\Lambda = - \mu L * 1} and {\Lambda = \mu * L}.)
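One way to see why these truncated divisor sums are reasonable models for {\Lambda}: for {2 \leq n \leq R} the truncation is inactive, every divisor of {n} is {\leq R}, and both {\Lambda^\sharp_R(n)} and {\Lambda_R(n)} collapse to the Möbius-inversion identity {\Lambda(n) = -\sum_{d|n} \mu(d) \log d}. The following Python sketch (standalone helper functions, not from the text) checks this exact agreement:

```python
import math

def mu(n):
    """Moebius function via trial division."""
    res, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # n is not squarefree
            res = -res
        p += 1
    return -res if n > 1 else res

def Lambda(n):
    """von Mangoldt: log p if n is a prime power p^k, else 0."""
    for p in range(2, n + 1):
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return 0.0

def Lambda_sharp(n, R):
    return -sum(mu(d) * math.log(d) for d in range(1, R + 1) if n % d == 0)

def Lambda_R(n, R):
    return sum(mu(d) * math.log(R / d) for d in range(1, R + 1) if n % d == 0)

R = 60
# For 2 <= n <= R the truncation d <= R keeps every divisor of n, so both
# approximants agree with Lambda(n) exactly (using sum_{d|n} mu(d) = 1_{n=1}).
for n in range(2, R + 1):
    assert abs(Lambda_sharp(n, R) - Lambda(n)) < 1e-9
    assert abs(Lambda_R(n, R) - Lambda(n)) < 1e-9
print("exact agreement for 2 <= n <= R")
```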

Alternatively, we can try to replicate the proof of Proposition 21 directly, keeping track of the main terms that are now present in the Siegel-Walfisz theorem. This gives a quite explicit approximation for {\sum_{n \in I} \Lambda(n) e(\theta n)} in major arc cases:

Proposition 24 Let {I} be an interval in {[1,x]} for some {x \geq 2}, and let {\theta \in {\bf R}/{\bf Z}} be of the form {\theta = \frac{a}{q} + \frac{\xi}{q}}, where {q \geq 1} is a natural number, {a \in ({\bf Z}/q{\bf Z})^\times}, and {\xi \in {\bf R}}. Then for any {A>0}, we have

\displaystyle  \sum_{n \in I} \Lambda(n) e(\theta n) - \frac{\mu(q)}{\phi(q)} \int_I e(\frac{\xi t}{q})\ dt \ll_A (q + x |\xi|) x \log^{-A} x.

The implied constants are ineffective.

Proof: We may assume that {|\xi| \leq \frac{\log^{A} x}{x}} and {q \leq \log^A x}, as the claim follows from the triangle inequality and the prime number theorem otherwise. For similar reasons we can also assume that {x} is sufficiently large depending on {A}.

As in the proof of Proposition 21, we decompose into residue classes mod {q} to write

\displaystyle  \sum_{n \in I} \Lambda(n) e(\theta n) = \sum_{b \in {\bf Z}/q{\bf Z}} \sum_{n \in I: n = b\ (q)} \Lambda(n) e(\theta n).

If {b} is not coprime to {q}, then one easily verifies that

\displaystyle  \sum_{n \in I: n = b\ (q)} \Lambda(n) e(\theta n) \ll \log^2 x

and the contribution of these cases is thus acceptable. Thus, up to negligible errors, we may restrict {b} to be coprime to {q}. Writing {n = qm+b}, we thus may replace {\sum_{n \in I} \Lambda(n) e(\theta n)} by

\displaystyle  \sum_{b \in ({\bf Z}/q{\bf Z})^\times} e(\theta b) \sum_{m: qm+b \in I} \Lambda(qm+b) e(\xi m).

Applying (20), we can write

\displaystyle  \sum_{m: qm+b \in I} \Lambda(qm+b) e(\xi m) = \sum_{m: qm+b \in I} \Lambda(qm+b)

\displaystyle + 2\pi i \xi \int_0^{x/q} e(\xi t) \sum_{m: qm+b \in I; m \geq t} \Lambda(qm+b).

Applying the Siegel-Walfisz theorem (Exercise 64 of Notes 2), we can replace {\Lambda} here by {\frac{q}{\phi(q)}}, up to an acceptable error. Applying (20) again, we have now replaced {\sum_{n \in I} \Lambda(n) e(\theta n)} by

\displaystyle  \frac{q}{\phi(q)} \sum_{b \in ({\bf Z}/q{\bf Z})^\times} e(\theta b) \sum_{m: qm+b \in I} e(\xi m),

which we can rewrite as

\displaystyle  \frac{q}{\phi(q)} \sum_{n \in I: (n,q)=1} e(\theta n).

From Möbius inversion one has

\displaystyle  1_{(n,q)=1} = \sum_{d|q} \mu(d) 1_{d|n},

so we can rewrite the previous expression as

\displaystyle  \frac{q}{\phi(q)} \sum_{d|q} \mu(d) \sum_{m: dm \in I} e(d\theta m).

For {d|q} with {d < q}, we see from the hypotheses that {\|d\theta\|_{{\bf R}/{\bf Z}} \gg 1/q}, and so {\sum_{m: dm \in I} e(d\theta m) \ll q} by Lemma 12(i). The contribution of all {d<q} is then {\ll \frac{q}{\phi(q)} \tau(q) q}, which is acceptable since {q \leq \log^A x}. So, up to acceptable errors, we may replace {\sum_{n \in I} \Lambda(n) e(\theta n)} by {\frac{q}{\phi(q)} \mu(q) \sum_{m: qm \in I} e(q\theta m)}. We can write {e(q\theta m)=e(\xi m)}, and the claim now follows from Exercise 11 of Notes 1 and a change of variables. \Box
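The main term of Proposition 24 is easy to observe numerically: at {\theta = \frac{a}{q}} exactly (so {\xi = 0}), the proposition predicts {\sum_{n \leq x} \Lambda(n) e(an/q) \approx \frac{\mu(q)}{\phi(q)} x}. A Python sketch with the illustrative choices {q = 3}, {a=1} (where {\mu(3)/\phi(3) = -1/2}) and {x = 10^4}; the sieve helper is ours:

```python
import cmath, math

def von_mangoldt_up_to(x):
    """Lambda(n) for 1 <= n <= x via a sieve over prime powers."""
    lam = [0.0] * (x + 1)
    sieve = [True] * (x + 1)
    for p in range(2, x + 1):
        if sieve[p]:
            for m in range(2 * p, x + 1, p):
                sieve[m] = False
            pk = p
            while pk <= x:
                lam[pk] = math.log(p)
                pk *= p
    return lam

x, q, a = 10000, 3, 1
lam = von_mangoldt_up_to(x)
S = sum(lam[n] * cmath.exp(2j * cmath.pi * a * n / q) for n in range(1, x + 1))

# Prediction: mu(3)/phi(3) * x = -x/2 = -5000, up to lower-order fluctuations.
print(S)
assert abs(S - (-x / 2)) < 0.05 * x
```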

Exercise 25 Assuming the generalised Riemann hypothesis, obtain the significantly stronger estimate

\displaystyle  \sum_{n \in I} \Lambda(n) e(\theta n) - \frac{\mu(q)}{\phi(q)} \int_I e(\frac{\xi t}{q})\ dt \ll (q + x |\xi|)^{O(1)} x^{1/2} \log^2 x

with effective constants. (Hint: use Exercise 48 of Notes 2, and adjust the arguments used to prove Proposition 24 accordingly.)

Exercise 26 If {q \leq \log^A x} and there is a real primitive Dirichlet character {\chi} of modulus {q} whose {L}-function {L(\cdot,\chi)} has an exceptional zero {\beta} with {1 - \frac{c}{\log q} \leq \beta < 1} for a sufficiently small {c}, establish the variant

\displaystyle  \sum_{n \in I} \Lambda(n) e(\theta n) - \frac{\mu(q)}{\phi(q)} \int_I e(\frac{\xi t}{q})\ dt + \chi(a) \frac{\tau(\chi)}{\phi(q)} \int_I t^{\beta-1} e(\frac{\xi t}{q})\ dt

\displaystyle \ll_A (q + x |\xi|) x \log^{-A} x

of Proposition 24, with the implied constants now effective and with the Gauss sum {\tau(\chi)} defined in equation (11) of Supplement 2. If there is no such primitive Dirichlet character, show that the above estimate continues to hold with the exceptional term {\overline{\chi(a)} \frac{\tau(\chi)}{\phi(q)} \int_I t^{\beta-1} e(\frac{\xi t}{q})\ dt} deleted.

Informally, the above exercise suggests that one should add an additional correction term {- \frac{q}{\phi(q)} \chi(n) n^{\beta-1}} to the model for {\Lambda} when there is an exceptional zero.

We can now formalise the approximation (8):

Exercise 27 Let {A > 0}, and suppose that {B} is sufficiently large depending on {A}. Let {x \geq 2}, and let {Q} be a quantity with {\log^B x \leq Q \leq x^{1/2} \log^{-B} x}. Let {\nu_Q: {\bf N} \rightarrow {\bf R}} be the function

\displaystyle  \nu_Q(n) := \sum_{q \leq Q} \sum_{a \in ({\bf Z}/q{\bf Z})^\times} \frac{\mu(q)}{\phi(q)} e( \frac{an}{q} ).

Show that for any interval {I \subset [1,x]}, we have

\displaystyle  \sum_{n \in I} \Lambda(n) e(\theta n) - \sum_{n \in I} \nu_Q(n) e(\theta n) \ll_A x \log^{-A} x

for all {\theta \in {\bf R}/{\bf Z}}, with ineffective constants.
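The inner sum over {a} in the definition of {\nu_Q} is a Ramanujan sum {c_q(n) = \sum_{a \in ({\bf Z}/q{\bf Z})^\times} e(an/q)}, which has the classical closed form {c_q(n) = \mu(q/(q,n)) \phi(q)/\phi(q/(q,n))}; this makes {\nu_Q} cheap to evaluate. A Python sketch checking the closed form against the definition and then assembling {\nu_Q} (helper functions are ad hoc, not from the text):

```python
import cmath, math

def phi(n):
    return sum(1 for a in range(1, n + 1) if math.gcd(a, n) == 1)

def mu(n):
    res, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            res = -res
        p += 1
    return -res if n > 1 else res

def ramanujan_brute(q, n):
    """c_q(n) summed directly over the reduced residues a mod q."""
    return sum(cmath.exp(2j * cmath.pi * a * n / q)
               for a in range(1, q + 1) if math.gcd(a, q) == 1).real

def ramanujan(q, n):
    """Closed form c_q(n) = mu(q/g) phi(q)/phi(q/g), with g = gcd(n, q)."""
    g = math.gcd(n, q)
    return mu(q // g) * phi(q) / phi(q // g)

for q in range(1, 31):
    for n in range(1, 31):
        assert abs(ramanujan_brute(q, n) - ramanujan(q, n)) < 1e-9

def nu(n, Q):
    """The major-arc model nu_Q(n) from Exercise 27."""
    return sum(mu(q) / phi(q) * ramanujan(q, n) for q in range(1, Q + 1))

print(nu(97, 30))
```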

We also can formalise the approximations (9), (10):

Exercise 28 Let {x \geq 2}, and let {w > 1} be such that {w \leq \frac{1}{4} \log x}. Write {W := \prod_{p \leq w} p}.

  • (i) Show that

    \displaystyle  \sum_{n \in I} \Lambda(n) e(\theta n) - \sum_{n \in I} \frac{W}{\phi(W)} 1_{(n,W)=1} e(\theta n) \ll \frac{x}{w}

    for all {\theta \in {\bf R}/{\bf Z}} and {I \subset [1,x]}, with an ineffective constant.

  • (ii) Suppose now that {x \geq 100} and {w = O(\log\log x)}. Let {1 \leq b \leq W} be coprime to {W}, and let {\Lambda_{b,W}(n) := \frac{\phi(W)}{W} \Lambda(Wn+b)}. Show that

    \displaystyle  \sum_{n \in I} \Lambda_{b,W}(n) e(\theta n) - \sum_{n \in I} e(\theta n) \ll \frac{x}{w}

    for all {\theta \in {\bf R}/{\bf Z}} and {I \subset [1,x]}.

Proposition 24 suggests that the exponential sum {\sum_{n \in I} \Lambda(n) e(\theta n)} should be of size about {x / \phi(q)} when {\theta} is close to {a/q} and {q} is fairly small, and {x} is large. However, the arguments in Proposition 18 only give an upper bound of {O( x / \sqrt{q} )} instead (ignoring logarithmic factors). There is a good reason for this discrepancy, though. The proof of Proposition 24 relied on the Siegel-Walfisz theorem, which in turn relied on Siegel’s theorem. As discussed in Notes 2, the bounds arising from this theorem are ineffective – we do not have any control on how the implied constant in the estimate in Proposition 24 depends on {A}. In contrast, the upper bounds in Proposition 18 are completely effective. Furthermore, these bounds are close to sharp in the hypothetical scenario of a Landau-Siegel zero:

Exercise 29 Let {c} be a sufficiently small (effective) absolute constant. Suppose there is a non-principal character {\chi} of conductor {q} with an exceptional zero {\beta \geq 1 - \frac{c}{\log q}}. Let {x \geq 2} be such that {q \leq \exp(c \sqrt{\log x} )} and {\beta \geq 1 - \frac{c}{\log x}}. Show that

\displaystyle  |\sum_{n \leq x} \Lambda(n) e( \frac{a}{q} n )| \gg \frac{x \sqrt{q}}{\phi(q)} \geq \frac{x}{\sqrt{q}}

for every {a \in ({\bf Z}/q{\bf Z})^\times}.

This exercise indicates that apart from the factors of {\log x}, any substantial improvements to Proposition 18 will first require some progress on the notorious Landau-Siegel zero problem. It also indicates that if a Landau-Siegel zero is present, then one way to proceed is to simply incorporate the effect of that zero into the estimates (so that the computations for major arc exponential sums would acquire an additional main term coming from the exceptional zero), and try to establish results like Vinogradov’s theorem separately in this case (similar to how things were handled for Linnik’s theorem, see Notes 7), by using something like Exercise 26 in place of Proposition 24.

— 3. Vinogradov’s theorem —

We now have a number of routes to establishing Theorem 2. Let {N} be a large number. We wish to compute the expression

\displaystyle  \sum_{a,b,c \in {\bf N}: a+b+c=N} \Lambda(a) \Lambda(b) \Lambda(c)

or equivalently

\displaystyle  \sum_{a,b,c \in {\bf Z}: a+b+c=N} f(a) f(b) f(c)

where {f(n) := \Lambda(n) 1_{1 \leq n \leq N}}.

Now we replace {f} by a more tractable approximation {g}. There are a number of choices for {g} that were presented in the previous section. For sake of illustration, let us select the choice

\displaystyle  g(n) := \log R \sum_{d \leq R: d|n} \mu(d) \eta( \frac{\log d}{\log R} ) 1_{1 \leq n\leq N} \ \ \ \ \ (23)

where {R := N^{1/10}} (say) and where {\eta: {\bf R} \rightarrow [0,1]} is a fixed smooth function supported on {[-1,1]} such that {\eta(t) = 1-t} for {0 \leq t \leq 1/2}. From Exercise 23 we have

\displaystyle  \sup_{\theta \in {\bf R}/{\bf Z}} | \sum_n (f(n)-g(n)) e(n \theta) | \ll_{A,\eta} N \log^{-A} N

(with ineffective constants) for any {A>0}. Also, by bounding {g} by the divisor function on {[1,N]} we have the bounds

\displaystyle  \sum_n |f(n)|^2, \sum_n |g(n)|^2 \ll N \log^{O(1)} N

so from several applications of Corollary 6 (splitting {f} as the sum of {f-g} and {g}) we have

\displaystyle  \sum_{a,b,c \in {\bf Z}: a+b+c=N} f(a) f(b) f(c) =

\displaystyle  \sum_{a,b,c \in {\bf Z}: a+b+c=N} g(a) g(b) g(c) + O_A( N^2 \log^{-A} N )

for any {A>0} (again with ineffective constants).

Now we compute {\sum_{a,b,c \in {\bf Z}: a+b+c=N} g(a) g(b) g(c)}. Using (23), we may rearrange this expression as

\displaystyle  \log^3 R \sum_{d_1,d_2,d_3} (\prod_{i=1}^3 \mu(d_i) \eta(\frac{\log d_i}{\log R})) \sum_{a,b,c \in {\bf N}: d_1|a, d_2|b, d_3|c, a+b+c=N} 1.

The inner sum can be estimated by covering the {(a,b)} parameter space by squares of sidelength {[d_1, d_2, d_3]} (the least common multiple of {d_1,d_2,d_3}) as

\displaystyle  \sum_{a,b,c \in {\bf N}: d_1|a, d_2|b, d_3|c, a+b+c=N} 1 = \rho(d_1,d_2,d_3) \frac{N^2}{2} + O( N d_1 d_2 d_3 )

where {\rho(d_1,d_2,d_3)} is the proportion of residue classes in the plane {\{ (a,b,c) \in ({\bf Z}/[d_1,d_2,d_3]{\bf Z})^3: a+b+c=N\ ([d_1,d_2,d_3])\}} with {d_1|a, d_2|b, d_3|c}. Since {d_1,d_2,d_3 \leq R = N^{1/10}}, the contribution of the error term is certainly acceptable, so

\displaystyle  \sum_{a,b,c \in {\bf Z}: a+b+c=N} f(a) f(b) f(c)

\displaystyle  = \frac{N^2}{2} \log^3 R \sum_{d_1,d_2,d_3} (\prod_{i=1}^3 \mu(d_i) \eta(\frac{\log d_i}{\log R})) \rho(d_1,d_2,d_3) + O_A( N^2 \log^{-A} N).

Thus to prove Theorem 2, it suffices to establish the asymptotic

\displaystyle  \log^3 R \sum_{d_1,d_2,d_3} (\prod_{i=1}^3 \mu(d_i) \eta(\frac{\log d_i}{\log R})) \rho(d_1,d_2,d_3) = G_3(N) + O_A( \log^{-A} N ). \ \ \ \ \ (24)

From the Chinese remainder theorem we see that {\rho} is multiplicative in the sense that {\rho(d_1 d'_1,d_2 d'_2, d_3 d'_3) = \rho(d_1,d_2,d_3) \rho(d'_1,d'_2,d'_3)} when {d_1d_2d_3} is coprime to {d'_1d'_2d'_3}, so to evaluate this quantity for squarefree {d_1,d_2,d_3} it suffices to do so when {d_1,d_2,d_3 \in \{1,p\}} for a single prime {p}. This is easily done:

Exercise 30 Show that

\displaystyle  \rho(d_1,d_2,d_3) = \frac{1}{d_1 d_2 d_3}

except when {d_1=d_2=d_3=p}, in which case one has

\displaystyle  \rho(p,p,p) = \frac{1_{p|N}}{p^2}.
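Exercise 30 can be checked by brute force over residue classes modulo {[d_1,d_2,d_3]}. A Python sketch (the function name rho is ours):

```python
from math import lcm

def rho(d1, d2, d3, N):
    """Proportion of triples (a, b, c) mod L = [d1,d2,d3] with a+b+c = N (mod L)
    satisfying d1|a, d2|b, d3|c; there are L^2 such triples in total."""
    L = lcm(d1, d2, d3)
    count = sum(1 for a in range(L) for b in range(L)
                if a % d1 == 0 and b % d2 == 0 and (N - a - b) % d3 == 0)
    return count / L**2

# Generic case: rho = 1/(d1*d2*d3) unless all three equal the same prime.
assert abs(rho(2, 3, 5, 17) - 1 / 30) < 1e-12
assert abs(rho(3, 3, 1, 17) - 1 / 9) < 1e-12
# Degenerate case d1 = d2 = d3 = p: rho = 1/p^2 if p | N, and 0 otherwise.
assert abs(rho(3, 3, 3, 6) - 1 / 9) < 1e-12
assert rho(3, 3, 3, 7) == 0.0
print("Exercise 30 verified on examples")
```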

The left-hand side of (24) is an expression similar to that studied in Section 2 of Notes 4, and can be estimated in a similar fashion. Namely, we can perform a Fourier expansion

\displaystyle  e^u \eta( u ) = \int_{\bf R} F(t) e^{-itu}\ dt \ \ \ \ \ (25)

for some smooth, rapidly decreasing {F: {\bf R} \rightarrow {\bf C}}. This lets us write the left-hand side of (24) as

\displaystyle  \log^3 R \int_{\bf R} \int_{\bf R} \int_{\bf R} \sum_{d_1,d_2,d_3} \frac{\mu(d_1) \mu(d_2) \mu(d_3) \rho(d_1,d_2,d_3)}{d_1^{(1+it_1)/\log R} d_2^{(1+it_2)/\log R} d_3^{(1+it_3)/\log R}}

\displaystyle  F(t_1) F(t_2) F(t_3)\ dt_1 dt_2 dt_3.

We can factor this as

\displaystyle  \log^3 R \int_{\bf R} \int_{\bf R} \int_{\bf R} \prod_p E_p(\frac{1+it_1}{\log R},\frac{1+it_2}{\log R},\frac{1+it_3}{\log R}) \ \ \ \ \ (26)

\displaystyle  F(t_1) F(t_2) F(t_3)\ dt_1 dt_2 dt_3

where (by Exercise 30)

\displaystyle  E_p(s_1,s_2,s_3) = 1 - \frac{1}{p^{1+s_1}} - \frac{1}{p^{1+s_2}} - \frac{1}{p^{1+s_3}}

\displaystyle  + \frac{1}{p^{2+s_1+s_2}} + \frac{1}{p^{2+s_1+s_3}} + \frac{1}{p^{2+s_2+s_3}}

\displaystyle  - \frac{1_{p|N}}{p^{2+s_1+s_2+s_3}}.

From Mertens’ theorem we see that

\displaystyle  \prod_p E_p(\frac{1+it_1}{\log R},\frac{1+it_2}{\log R},\frac{1+it_3}{\log R}) \ll \prod_p (1 + O(\frac{1}{p^{1+1/\log R}}) )

\displaystyle  \ll \log^{O(1)} R

so from the rapid decrease of {F} we may restrict {t_1,t_2,t_3} to be bounded in magnitude by {\sqrt{\log R}} accepting a negligible error of {O_A(\log^{-A} R)}. Using

\displaystyle  \prod_p (1 - \frac{1}{p^s}) = \frac{1}{\zeta(s)}

for {\hbox{Re}(s)>1}, we can write

\displaystyle  \prod_p E_p(s_1,s_2,s_3) = \frac{1}{\zeta(1+s_1) \zeta(1+s_2) \zeta(1+s_3)}

\displaystyle  \prod_p E_p(s_1,s_2,s_3) \prod_{j=1}^3 (1 - \frac{1}{p^{1+s_j}})^{-1}.

By Taylor expansion, we have

\displaystyle  E_p(s_1,s_2,s_3) \prod_{j=1}^3 (1 - \frac{1}{p^{1+s_j}})^{-1} = 1 + O( \frac{1}{p^{3/2}} )

(say) uniformly for {|s_1|, |s_2|, |s_3| \leq 1/10}, and so the logarithm of the product {\prod_p E_p(s_1,s_2,s_3) \prod_{j=1}^3 (1 - \frac{1}{p^{1+s_j}})^{-1}} is a bounded holomorphic function in this region. From Taylor expansion we thus have

\displaystyle  \prod_p E_p(s_1,s_2,s_3) \prod_{j=1}^3 (1 - \frac{1}{p^{1+s_j}})^{-1} = (1+P(s_1,s_2,s_3)+O_A(\log^{-A} R))

\displaystyle  \times \prod_p E_p(0,0,0) / (1 - \frac{1}{p})^3

when {s_1,s_2,s_3 = O( 1/\sqrt{\log R} )}, where {P} is some polynomial (depending on {A}) with vanishing constant term. From (1) we see that

\displaystyle  \prod_p E_p(0,0,0) / (1 - \frac{1}{p})^3 = G_3(N).

Similarly we have

\displaystyle  \zeta(1+s) = \frac{1 + Q(s) + O_A( \log^{-A} R)}{s}

for {s = O(1/\sqrt{\log R})}, where {Q} is another polynomial depending on {A} with vanishing constant term. We can thus write (26) (up to errors of {O_A(\log^{-A} R)}) as

\displaystyle  G_3(N) \int_{|t_1|, |t_2|, |t_3| \leq \sqrt{\log R}} (1+S(\frac{1+it_1}{\log R},\frac{1+it_2}{\log R},\frac{1+it_3}{\log R}))

\displaystyle  (1+it_1) (1+it_2) (1+it_3) F(t_1) F(t_2) F(t_3)\ dt_1 dt_2 dt_3

where {S} is a polynomial depending on {A} with vanishing constant term. By the rapid decrease of {F} we may then remove the constraints on {t_1,t_2,t_3}, and reduce (24) to showing that

\displaystyle  \int_{\bf R} \int_{\bf R} \int_{\bf R} (1+S(\frac{1+it_1}{\log R},\frac{1+it_2}{\log R},\frac{1+it_3}{\log R}))

\displaystyle  (1+it_1) (1+it_2) (1+it_3) F(t_1) F(t_2) F(t_3)\ dt_1 dt_2 dt_3 = 1

which, on expanding {S} and using Fubini's theorem, reduces to showing that

\displaystyle  \int_{\bf R} (1+it)^k F(t)\ dt = 1_{k=1}

for {k=1,2,\dots}. But from multiplying (25) by {e^{-u}} and then differentiating {k} times at {u=0}, we see that

\displaystyle  \eta^{(k)}(0) = (-1)^k \int_{\bf R} (1+it)^k F(t)\ dt,

and the claim follows since {\eta(u) = 1-u} for {0 \leq u \leq 1/2}. This proves Theorem 2 and hence Theorem 1.

One can of course use other approximations to {\Lambda} to establish Vinogradov’s theorem. The following exercise gives one such route:

Exercise 31 Use Exercise 27 to obtain the asymptotic

\displaystyle  \sum_{a,b,c: a+b+c=N} \Lambda(a) \Lambda(b) \Lambda(c)

\displaystyle = \frac{N^2}{2} \sum_{q \leq Q} \sum_{a \in ({\bf Z}/q{\bf Z})^\times} \frac{\mu(q)}{\phi(q)^3} e( aN/q) + O_A( N^2 \log^{-A} N )

for any {A>0} with ineffective constants. Then show that

\displaystyle  \sum_q \sum_{a \in ({\bf Z}/q{\bf Z})^\times} \frac{\mu(q)}{\phi(q)^3} e( aN/q) = G_3(N)

and give an alternate derivation of Theorem 2.
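The series in the last display converges quickly (its terms decay like {1/\phi(q)^3}), and the identity can be checked numerically against the Euler product form {G_3(N) = \prod_{p|N} (1 - \frac{1}{(p-1)^2}) \prod_{p \nmid N} (1 + \frac{1}{(p-1)^3})}, using the Ramanujan sum evaluation {\sum_{a \in ({\bf Z}/q{\bf Z})^\times} e(aN/q) = \mu(q/g)\phi(q)/\phi(q/g)} with {g = (N,q)}. A Python sketch (the truncation points 400 are illustrative):

```python
import math

def phi(n):
    return sum(1 for a in range(1, n + 1) if math.gcd(a, n) == 1)

def mu(n):
    res, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            res = -res
        p += 1
    return -res if n > 1 else res

def c(q, n):
    """Ramanujan sum c_q(n) = mu(q/g) phi(q)/phi(q/g), g = gcd(n, q)."""
    g = math.gcd(n, q)
    return mu(q // g) * phi(q) / phi(q // g)

def primes_up_to(P):
    return [p for p in range(2, P + 1)
            if all(p % d for d in range(2, int(p**0.5) + 1))]

N = 21  # an odd N, so the p = 2 factor does not vanish
# Truncation of sum_q mu(q)/phi(q)^3 * sum_a e(aN/q) = sum_q mu(q) c_q(N)/phi(q)^3.
S = sum(mu(q) * c(q, N) / phi(q)**3 for q in range(1, 401))

# Truncated Euler product form of G_3(N).
G3 = 1.0
for p in primes_up_to(400):
    G3 *= 1 - 1 / (p - 1)**2 if N % p == 0 else 1 + 1 / (p - 1)**3

assert abs(S - G3) < 1e-3
print(S, G3)
```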

Exercise 32 By using Exercise 26 in place of Exercise 27, obtain the asymptotic

\displaystyle  \sum_{a,b,c: a+b+c=N} \Lambda(a) \Lambda(b) \Lambda(c)

\displaystyle = (G_3(N) + O( \frac{q^2}{\phi(q)^3}) ) \frac{N^2}{2} + O_A( N^2 \log^{-A} N )

for any {A>0} with effective constants, if there is a real primitive Dirichlet character {\chi} of modulus {q \leq \log^B N} and an exceptional zero {\beta} with {1 - \frac{c}{\log q} \leq \beta < 1} for some sufficiently small {c} and some sufficiently large {B} depending on {A}; here the {O( \frac{q^2}{\phi(q)^3})} term (which bounds the exceptional contribution {\chi(N) \frac{\tau(\chi)^4}{\phi(q)^3}}) is deleted if no such exceptional character exists. Use this to establish Theorem 1 with an effective bound on how large {N} has to be.

Exercise 33 Let {N \geq 2}. Show that

\displaystyle  \sum_{a,r: a+2r \leq N} \Lambda(a) \Lambda(a+r) \Lambda(a+2r) = {\mathfrak S} \frac{N^2}{4} + O_A( N^2 \log^{-A} N )

for any {A > 0}, where

\displaystyle  {\mathfrak S} := \frac{3}{2} \prod_{p>3} (1-\frac{2}{p}) (1-\frac{1}{p})^{-2}.

Conclude that the number of length three arithmetic progressions {a,a+r,a+2r} contained in the primes up to {N} is {{\mathfrak S} \frac{N^2}{4\log^3 N} + O_A( N^2 \log^{-A} N )} for any {A > 0}. (This result is due to van der Corput.)

Exercise 34 (Even Goldbach conjecture for most {N}) Let {N \geq 2}, and let {g} be as in (23).

  • (i) Show that {\Lambda(f,f,h) = \Lambda(g,g,h) + O_A( N^2 \log^{-A} N )} for any {A > 0} and any function {h: \{1,\dots,N\} \rightarrow {\bf R}} bounded in magnitude by {1}.
  • (ii) For any {M \leq N}, show that

    \displaystyle  \sum_{a,b: a+b=M} g(a) g(b) = G_2(M) M + O_A( N \log^{-A} N )

    for any {A > 0}, where

    \displaystyle  G_2(M) := \prod_{p|M} \frac{p^2-p}{(p-1)^2} \times \prod_{p \not | M} \frac{p^2-2p}{(p-1)^2}.

  • (iii) Show that for any {A > 0}, one has

    \displaystyle  \sum_{a,b: a+b=M} f(a) f(b) = G_2(M) M + O_A( N \log^{-A} N )

    for all but at most {O_A( N \log^{-A} N )} of the numbers {M \in \{1,\dots,N\}}.

  • (iv) Show that all but at most {O_A( N \log^{-A} N )} of the even numbers in {\{1,\dots,N\}} are expressible as the sum of two primes.
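The main term in part (ii) above can also be sanity-checked numerically against the Λ-weighted count for a single even number (again my own sketch, not part of the exercises; the choice M = 10000 and the truncation of the Euler product are arbitrary):

```python
import math

def mangoldt(N):
    """L[n] = von Mangoldt Lambda(n) for 0 <= n <= N."""
    L = [0.0] * (N + 1)
    is_prime = [True] * (N + 1)
    for p in range(2, N + 1):
        if is_prime[p]:
            for m in range(2 * p, N + 1, p):
                is_prime[m] = False
            q = p
            while q <= N:
                L[q] = math.log(p)
                q *= p
    return L

def goldbach_weighted(M):
    """Sum of Lambda(a) Lambda(b) over a + b = M with a, b >= 1."""
    L = mangoldt(M)
    return sum(L[a] * L[M - a] for a in range(1, M))

def G2(M, cutoff=10**4):
    """Truncated Euler product for G_2(M) as defined in part (ii)."""
    is_prime = [True] * (cutoff + 1)
    val = 1.0
    for p in range(2, cutoff + 1):
        if is_prime[p]:
            for m in range(2 * p, cutoff + 1, p):
                is_prime[m] = False
            if M % p == 0:
                val *= (p * p - p) / (p - 1) ** 2
            else:
                val *= (p * p - 2 * p) / (p - 1) ** 2
    return val

M = 10000
ratio = goldbach_weighted(M) / (G2(M) * M)
```

Note that the factor at p = 2 vanishes when M is odd, as it should: odd numbers are (essentially) not sums of two primes.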

Filed under: 254A - analytic prime number theory, math.NT Tagged: circle method, exponential sums, Vinogradov's theorem

Jordan EllenbergHNTBW covers: Italy and Brazil

The first foreign editions of How Not To Be Wrong are coming out!  Italy is first, this week:



And after that, Brazil in June:


John BaezResource Convertibility (Part 3)

guest post by Tobias Fritz

In Part 1 and Part 2, we learnt about ordered commutative monoids and how they formalize theories of resource convertibility and combinability. In this post, I would like to say a bit about the applications that have been explored so far. First, the study of resource theories has become a popular subject in quantum information theory, and many of the ideas in my paper actually originate there, so I hope that the toolbox of ordered commutative monoids will turn out to be useful in that field; I’ll list some references at the end. But here I would like to talk about an example application that is much easier to understand, but no less difficult to analyze: graph theory and the resource theory of zero-error communication.

A graph consists of a bunch of nodes connected by a bunch of edges, for example like this:

This particular graph is the pentagon graph or 5-cycle. To give it some resource-theoretic interpretation, think of it as the distinguishability graph of a communication channel, where the nodes are the symbols that can be sent across the channel, and two symbols share an edge if and only if they can be unambiguously decoded. For example, the pentagon graph roughly corresponds to the distinguishability graph of my handwriting, when restricted to five letters only:

So my ‘w’ is distinguishable from my ‘u’, but it may be confused for my ‘m’. In order to communicate unambiguously, it looks like I should restrict myself to using only two of those letters in writing, since any third one may be mistaken for one of the other two. But alternatively, I could use a block code to create context around each letter which allows for perfect disambiguation. This is what happens in practice: I write in natural language, where an entire word is usually not ambiguous.

One can now also consider graph homomorphisms, which are maps like this:

The numbers on the nodes indicate where each node on the left gets mapped to. Formally, a graph homomorphism is a function taking nodes to nodes such that adjacent nodes get mapped to adjacent nodes. If a homomorphism G\to H exists between graphs G and H, then we also write H\geq G; in terms of communication channels, we can interpret this as saying that H simulates G, since the homomorphism provides a map between the symbols which preserves distinguishability. A ‘code’ for a communication channel is then just a homomorphism from the complete graph in which all nodes share an edge to the graph which describes the channel. With this ordering structure, the collection of all finite graphs forms an ordered set. This ordered set has an intricate structure which is intimately related to some big open problems in graph theory.
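A brute-force homomorphism check makes this ordering concrete for tiny graphs (my own illustration, not from the post; the search is exponential, and deciding hom-existence is NP-hard in general, so this only works for toy examples):

```python
from itertools import product

def has_hom(G, H):
    """Is there a graph homomorphism G -> H? Graphs are (num_vertices, edge list)."""
    (nG, eG), (nH, eH) = G, H
    adjH = {frozenset(e) for e in eH}
    # try every map f: V(G) -> V(H); it must send every edge of G to an edge of H
    # (mapping two adjacent nodes to the same node fails, since loops are not edges)
    for f in product(range(nH), repeat=nG):
        if all(frozenset((f[u], f[v])) in adjH for (u, v) in eG):
            return True
    return False

C5 = (5, [(i, (i + 1) % 5) for i in range(5)])   # the pentagon
K3 = (3, [(0, 1), (0, 2), (1, 2)])               # the triangle

# C5 -> K3 exists (any proper 3-coloring is one), so K3 >= C5:
# the triangle simulates the pentagon.
# K3 -> C5 does not exist, since the pentagon is triangle-free.
```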

We can also combine two communication channels to form a compound one. Going back to the handwriting example, we can consider the new channel in which the symbols are pairs of letters. Two such pairs are distinguishable if and only if either the first letters of each pair are distinguishable or the second letters are,

(a,b) \sim (a',b') \:\Leftrightarrow\: a\sim a' \:\lor\: b\sim b'

When generalized to arbitrary graphs, this yields the definition of disjunctive product of graphs. It is not hard to show that this equips the ordered set of graphs with a binary operation compatible with the ordering, so that we obtain an ordered commutative monoid denoted Grph. It mathematically formalizes the resource theory of zero-error communication.
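The disjunctive product is easy to implement directly from the rule above (my own sketch; graphs are again pairs of a vertex count and an edge list). As a quick check, the disjunctive product of two single edges K_2 is the complete graph K_4, since any two distinct pairs differ in some coordinate that forms an edge:

```python
from itertools import product

def disjunctive(G, H):
    """Disjunctive product: vertices are pairs; (a,b) ~ (a',b') iff a~a' in G or b~b' in H."""
    (nG, eG), (nH, eH) = G, H
    adjG = {frozenset(e) for e in eG}
    adjH = {frozenset(e) for e in eH}
    nodes = list(product(range(nG), range(nH)))
    idx = {v: i for i, v in enumerate(nodes)}
    edges = []
    for i, (a, b) in enumerate(nodes):
        for (a2, b2) in nodes[i + 1:]:
            if frozenset((a, a2)) in adjG or frozenset((b, b2)) in adjH:
                edges.append((idx[(a, b)], idx[(a2, b2)]))
    return (nG * nH, edges)

K2 = (2, [(0, 1)])
C5 = (5, [(i, (i + 1) % 5) for i in range(5)])
```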

Using the toolbox of ordered commutative monoids combined with some concrete computations on graphs, one can show that Grph is not cancellative: if K_{11} is the complete graph on 11 nodes, then 3C_5\not\geq K_{11}, but there exists a graph G such that

3 C_5 + G \geq K_{11} + G

The graph G turns out to have 136 nodes. This result seems to be new. But if you happen to have seen something like this before, please let me know!

Last time, we also talked about rates of conversion. In Grph, it turns out that some of these correspond to famous graph invariants! For example, the rate of conversion from a graph G to the single-edge graph K_2 is the Shannon capacity \Theta(\overline{G}), where \overline{G} is the complement graph. This is no surprise, since \Theta was originally defined by Shannon with precisely this rate in mind, although he did not use the language of ordered commutative monoids. In any case, the Shannon capacity \Theta(\overline{G}) is a graph invariant notorious for its complexity: it is not known whether there exists an algorithm to compute it! But an application of the Rate Theorem from Part 2 gives us a formula for the Shannon capacity:

\Theta(\overline{G}) = \inf_f f(G)

where f ranges over all graph invariants which are monotone under graph homomorphisms, multiplicative under disjunctive product, and normalized such that f(K_2) = 2. Unfortunately, this formula still does not produce an algorithm for computing \Theta. But it nonconstructively proves the existence of many new graph invariants f which approximate the Shannon capacity from above.

Although my story ends here, I also feel that the whole project has barely started. There are lots of directions to explore! For example, it would be great to fit Shannon’s noisy channel coding theorem into this framework, but this has turned out to be technically challenging. If you happen to be familiar with rate-distortion theory and you want to help out, please get in touch!


Here is a haphazard selection of references on resource theories in quantum information theory and related fields:

• Igor Devetak, Aram Harrow and Andreas Winter, A resource framework for quantum Shannon theory.

• Gilad Gour, Markus P. Müller, Varun Narasimhachar, Robert W. Spekkens and Nicole Yunger Halpern, The resource theory of informational nonequilibrium in thermodynamics.

• Fernando G.S.L. Brandão, Michał Horodecki, Nelly Huei Ying Ng, Jonathan Oppenheim and Stephanie Wehner, The second laws of quantum thermodynamics.

• Iman Marvian and Robert W. Spekkens, The theory of manipulations of pure state asymmetry: basic tools and equivalence classes of states under symmetric operations.

• Elliott H. Lieb and Jakob Yngvason, The physics and mathematics of the second law of thermodynamics.

Tommaso DorigoFun With A Dobson

It is galaxy season in the northern hemisphere, with Ursa Major at the zenith during the night and the Virgo cluster as high as it gets. And if you have ever put your eye on the eyepiece of a large telescope aimed at a far galaxy, you will agree it is quite an experience: you get to see light that traveled for tens or even hundreds of millions of years before reaching your pupil, crossing sizable portions of the universe to make a quite improbable rendez-vous with your photoreceptors. 

read more


April 13, 2015

John PreskillPaul Dirac and poetry

In science one tries to tell people, in such a way as to be understood by everyone, something that no one ever knew before. But in the case of poetry, it’s the exact opposite!

      – Paul Dirac

Paul Dirac

I tacked Dirac’s quote onto the bulletin board above my desk, the summer before senior year of high school. I’d picked quotes by T.S. Eliot and Einstein, Catullus and Hatshepsut.* In a closet, I’d found amber-, peach-, and scarlet-colored paper. I’d printed the quotes and arranged them, starting senior year with inspiration that looked like a sunrise.

Not that I knew who Paul Dirac was. Nor did I evaluate his opinion. But I’d enrolled in Advanced Placement Physics C and taken the helm of my school’s literary magazine. The confluence of two passions of mine—science and literature—in Dirac’s quote tickled me.

A fiery lecturer began to alleviate my ignorance in college. Dirac, I learned, had co-invented quantum theory. The “Dee-rac Equa-shun,” my lecturer trilled in her Italian accent, describes relativistic quantum systems—tiny particles associated with high speeds. I developed a taste for spin, a quantum phenomenon encoded in Dirac’s equation. Spin serves quantum-information scientists as two-by-fours serve carpenters: Experimentalists have tried to build quantum computers from particles that have spins. Theorists keep the idea of electron spins in a mental car trunk, to tote out when illustrating abstract ideas with examples.

The next year, I learned that Dirac had predicted the existence of antimatter. Three years later, I learned to represent antimatter mathematically. I memorized the Dirac Equation, forgot it, and re-learned it.

One summer in grad school, visiting my parents, I glanced at my bulletin board.

The sun rises beyond a window across the room from the board. Had the light faded the papers’ colors? If so, I couldn’t tell.

In science one tries to tell people, in such a way as to be understood by everyone, something that no one ever knew before. But in the case of poetry, it’s the exact opposite!

Do poets try to obscure ideas everyone understands? Some poets express ideas that people intuit but feel unable, lack the attention, or don’t realize one should, articulate. Reading and hearing poetry helps me grasp the ideas. Some poets express ideas in forms that others haven’t imagined.

Did Dirac not represent physics in a form that others hadn’t imagined?

Dirac Eqn

The Dirac Equation

Would you have imagined that form? I didn’t imagine it until learning it. Do scientists not express ideas—about gravity, time, energy, and matter—that people feel unable, lack the attention, or don’t realize we should, articulate?

The U.S. and Canada have designated April as National Poetry Month. A hub for cousins of poets, Quantum Frontiers salutes. Carry a poem in your pocket this month. Or carry a copy of the Dirac Equation. Or tack either on a bulletin board; I doubt whether their colors will fade.

*“Now my heart turns this way and that, as I think what the people will say. Those who see my monuments in years to come, and who shall speak of what I have done.” I expect to build no such monuments. But here’s to trying.

Doug NatelsonThe Leidenfrost Effect, or how I didn't burn myself in the kitchen

The transfer of heat, the energy content of materials tied to the disorganized motion of their constituents, is big business.  A typical car engine is cooled by conducting heat to a flowing mixture of water and glycol, and that mixture is cooled by transferring that heat to gas molecules that get blown past a radiator by a fan.  Without this transfer of heat, your engine would overheat and fail.  Likewise, the processor in your desktop computer generates about 100 W of thermal power, and that's carried away by either a fancy heat-sink with air blown across it by a fan, or through a liquid cooling system if you have a really fancy gaming machine.

Heat transfer is described quantitatively by a couple of different parameters.  The simplest one to think about is the thermal conductivity \(\kappa_{T}\).  If you have a hunk of material with cross-sectional area \(A\) and length \(L\), and the temperature difference between the hot side and the cold side is \(\Delta T\), the thermal conductivity (units of W/m-K in SI) tells you the rate (\(\dot{q}\), units of Watts) at which thermal energy is transferred across the material:  \( \dot{q} = \kappa_{T} A \Delta T/L\).
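As a concrete instance of that formula (the numbers below are illustrative choices of my own, not from the post):

```python
def heat_flow_watts(kappa, area_m2, delta_T_kelvin, length_m):
    """Steady-state conduction: q = kappa * A * dT / L."""
    return kappa * area_m2 * delta_T_kelvin / length_m

# A 1 cm thick aluminum slab (kappa ~ 205 W/m-K), 0.01 m^2 cross-section,
# with a 50 K temperature difference across it:
q = heat_flow_watts(205.0, 0.01, 50.0, 0.01)  # 10250 W
```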

Where things can get tricky is that \(\kappa_{T}\) isn't necessarily just some material-specific number - the transport of heat can depend on lots of details.  For example, you could have heat being transferred from the bottom of a hot pot into water that's boiling.  Some of the energy from the solid is going into the kinetic energy of the liquid water molecules; some of that energy is going into popping molecules from the liquid and into the gas phase.  The motion of the liquid and the vapor is complicated, and made all the more so because \(\kappa_{T}\) for the liquid is \(>> \kappa_{T}\) for the vapor.  (There is a generalized quantity, the heat transfer coefficient, that is defined similarly to \(\kappa_{T}\) but is meant to encompass all this sort of mess.)  If you think about \(\dot{q}\) as the variable you control (for example, by cranking up the knob on your gas burner), you can have different regimes, as shown in the graph to the right (from this nice wikipedia entry).  

At the highest heat flux, the water right next to the pan flashes into a layer of vapor, and because that vapor is a relatively poor thermal conductor, the liquid water remains relatively cool (that is, because \(\kappa_{T}\) is low, \(\Delta T\) is comparatively large for a fixed \(\dot{q}\)).  This regime is called film boiling, and you have seen it if you've ever watched a droplet of water skitter over a hot pan, or watched a blob of liquid nitrogen skate across a lab floor.  The fact that the liquid stays comparatively cool is called the Leidenfrost Effect.  This thermally insulating property of the vapor layer can be very dramatic, as shown in this Mythbusters video, where they show that having wet hands allows you to momentarily dip your hand in molten lead (!) without being injured. Note that this demo was most famously performed by Prof. Jearl Walker, author of the Flying Circus of Physics, former Amateur Scientist columnist for SciAm, and inheritor of the mantle of Halliday and Resnick.  The Leidenfrost Effect is also the reason that I did not actually burn my (wet) hand on the handle of a hot roasting pan last weekend.

This heat transfer example is actually a particular instance of a more general phenomenon.  When some property of a material (here \(\kappa_{T}\)) is dramatically dependent on the phase of that material (here liquid vs vapor), and that property can help determine dynamically which phase the material is in, you can get very rich behavior, including oscillations.  This can be seen in boiling liquids, as well as electronic systems with a phase change (pdf example with a metal-insulator transition, link to a review of examples with superconductor-normal metal transitions ).  

BackreactionPhotonic Booms: How images can move faster than light, and what they can tell us.

If you sweep a laser pointer across the moon, will the spot move faster than the speed of light? Every physics major encounters this question at some point, and the answer is yes, it will. If you sweep the laser pointer in an arc, the velocity of the spot increases with the distance to the surface you point at. From Earth, you only have to rotate the laser through its arc within a few seconds for the spot to move faster than the speed of light on the moon!
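The arithmetic is quick to check (a back-of-envelope sketch with assumed numbers of my own: mean Earth–moon distance of about 3.84×10^8 m, the moon treated as a flat target, and a sweep of half a turn in two seconds):

```python
import math

c = 299_792_458.0        # speed of light, m/s
d_moon = 3.844e8         # mean Earth-moon distance, m (assumed value)

def spot_speed(omega_rad_per_s, distance_m):
    """Transverse speed of the spot on a distant flat target: v = omega * d."""
    return omega_rad_per_s * distance_m

omega = math.pi / 2.0    # half a turn (pi radians) swept in 2 seconds
v = spot_speed(omega, d_moon)
# v comes out around 6e8 m/s, roughly twice the speed of light
```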

This simplified explanation would be all there is to say were the moon a disk, but the moon isn’t a disk and this makes the situation more interesting. The speed of the spot also increases as the surface you aim at becomes more nearly parallel to the beam’s direction. And so the spot’s speed increases without bound as it reaches the edge of the visible part of the moon.

That’s the theory. In practice of course your average laser pointer isn’t strong enough to still be visible on the moon.

This faster-than-light motion is not in conflict with special relativity because the continuous movement of the spot is an illusion. What actually moves are the photons in the laser beam, and they move at the always same speed of light. But different photons illuminate different parts of the surface in a pattern synchronized by the photons’ collective origin, which appears like a continuous movement that can happen at arbitrary speed. It isn’t possible in this way to exchange information faster than the speed of light because information can only be sent from the source to the surface, not between the illuminated parts on the surface.

So much for the movement of the spot on the surface. Trick question: If you sweep a laser pointer across the moon, what will you see? Note the subtle difference – now you have to take into account the travel time of the signal.

Let us assume for the sake of simplicity that you and the moon are not moving relative to each other, and you sweep from left to right. Let us also assume that the moon reflects diffusely into all directions, so you will see the spot regardless of where you are. This isn’t quite right but good enough for our purposes.

Now, if you were to measure the speed of the spot on the surface of the moon it would appear on the left moving faster than the speed of light initially, then slowing down as it approaches the place on the moon’s surface that is orthogonal to the beam, then speed up again. But that’s not what you would see on Earth. That’s because the very left and very right edges are also farther away and so the light takes longer to reach us. You would instead see a pair of spots appear close by the left edge and then separate, one of them disappearing at the left edge, the other moving across the moon to disappear on the other edge. The point where the spot pair seems to appear is the position where the velocity of the spot on the surface drops from above the speed of light to below.

This pair creation of spots happens for the same reason you hear a sonic boom when a plane passes by faster than the speed of sound. That’s because the signal (the sound or the light) is slower than what is causing the signal (the plane or the laser hitting the surface of the moon). The spot pair creation is thus the signal of a “photonic boom,” a catchy phrase coined by Robert Nemiroff, Professor of astrophysics at Michigan Technological University, and one of the two people behind the Astronomy Picture of the Day that clogs our facebook feeds every morning.

The most surprising thing about this spot pair creation is that nobody ever thought through this until December 2014, when Nemiroff put out a paper in which he laid out the math of the photonic booms. The above considerations for a perfectly spherical surface can be put in more general terms, taking into account also relative motion between the source and the reflecting surface. The upshot is that the spot pair creation events carry information about the structure of the surface that they are reflected on.

But, you might wonder, who cares about spots on the moon? To begin with, if you were to measure the structure of any object, say an asteroid, by aiming at it with laser beams and recording the reflections, then you would have to take into account this effect. Maybe more interestingly, these spot pair creations probably occur in astrophysical situations. Nemiroff in his paper for example mentions the binary pulsar 3U 0900-40, whose x-rays may be scattering off the surface of its companion, a signal that one will misinterpret without knowing about photonic booms.

The above considerations don’t only apply to illuminated spots but also to shadows. Shadows can be cast, for example, by opaque clouds on reflection nebulae, resulting in changes of brightness that may appear to move faster than the speed of light. There are many nebulae that show changes in brightness thought to be due to such effects, for example Hubble’s Variable Nebula (NGC 2261). Again, one cannot properly analyze these situations without taking into account the spot pair creation effect.

In his January paper, Nemiroff hints at an upcoming paper “in preparation” with a colleague, so I think we will hear more about the photonic booms in the near future.

In 2015, Special Relativity is 110 years old, but it still holds surprises for us.

This post first appeared on Starts with A Bang with the title "Photonic Booms".

Matt StrasslerDark Matter: How Could the Large Hadron Collider Discover It?

Dark Matter. Its existence is still not 100% certain, but if it exists, it is exceedingly dark, both in the usual sense — it doesn’t emit light or reflect light or scatter light — and in a more general sense — it doesn’t interact much, in any way, with ordinary stuff, like tables or floors or planets or  humans. So not only is it invisible (air is too, after all, so that’s not so remarkable), it’s actually extremely difficult to detect, even with the best scientific instruments. How difficult? We don’t even know, but certainly more difficult than neutrinos, the most elusive of the known particles. The only way we’ve been able to detect dark matter so far is through the pull it exerts via gravity, which is big only because there’s so much dark matter out there, and because it has slow but inexorable and remarkable effects on things that we can see, such as stars, interstellar gas, and even light itself.

About a week ago, the mainstream press was reporting, inaccurately, that the leading aim of the Large Hadron Collider [LHC], after its two-year upgrade, is to discover dark matter. [By the way, on Friday the LHC operators made the first beams with energy-per-proton of 6.5 TeV, a new record and a major milestone in the LHC’s restart.]  There are many problems with such a statement, as I commented in my last post, but let’s leave all that aside today… because it is true that the LHC can look for dark matter.   How?

When people suggest that the LHC can discover dark matter, they are implicitly assuming

  • that dark matter exists (very likely, but perhaps still with some loopholes),
  • that dark matter is made from particles (which isn’t established yet) and
  • that dark matter particles can be commonly produced by the LHC’s proton-proton collisions (which need not be the case).

You can question these assumptions, but let’s accept them for now.  The question for today is this: since dark matter barely interacts with ordinary matter, how can scientists at an LHC experiment like ATLAS or CMS, which is made from ordinary matter of course, have any hope of figuring out that they’ve made dark matter particles?  What would have to happen before we could see a BBC or New York Times headline that reads, “Large Hadron Collider Scientists Claim Discovery of Dark Matter”?

Well, to address this issue, I’m writing an article in three stages. Each stage answers one of the following questions:

  1. How can scientists working at ATLAS or CMS be confident that an LHC proton-proton collision has produced an undetected particle — whether this be simply a neutrino or something unfamiliar?
  2. How can ATLAS or CMS scientists tell whether they are making something new and Nobel-Prizeworthy, such as dark matter particles, as opposed to making neutrinos, which they do every day, many times a second?
  3. How can we be sure, if ATLAS or CMS discovers they are making undetected particles through a new and unknown process, that they are actually making dark matter particles?

My answer to the first question is finished; you can read it now if you like.  The second and third answers will be posted later during the week.

But if you’re impatient, here are highly compressed versions of the answers, in a form which is accurate, but admittedly not very clear or precise.

  1. Dark matter particles, like neutrinos, would not be observed directly. Instead their presence would be indirectly inferred, by observing the behavior of other particles that are produced alongside them.
  2. It is impossible to directly distinguish dark matter particles from neutrinos or from any other new, equally undetectable particle. But the equations used to describe the known elementary particles (the “Standard Model”) predict how often neutrinos are produced at the LHC. If the number of neutrino-like objects is larger than predicted, that will mean something new is being produced.
  3. To confirm that dark matter is made from LHC’s new undetectable particles will require many steps and possibly many decades. Detailed study of LHC data can allow properties of the new particles to be inferred. Then, if other types of experiments (e.g. LUX or COGENT or Fermi) detect dark matter itself, they can check whether it shares the same properties as LHC’s new particles. Only then can we know if LHC discovered dark matter.

I realize these brief answers are cryptic at best, so if you want to learn more, please check out my new article.

Filed under: Dark Matter, LHC Background Info, LHC News, Particle Physics Tagged: atlas, cms, DarkMatter, LHC

John BaezResource Convertibility (Part 1)

guest post by Tobias Fritz

Hi! I am Tobias Fritz, a mathematician at the Perimeter Institute for Theoretical Physics in Waterloo, Canada. I like to work on all sorts of mathematical structures which pop up in probability theory, information theory, and other sorts of applied math. Today I would like to tell you about my latest paper:

The mathematical structure of theories of resource convertibility, I.

It should be of interest to Azimuth readers as it forms part of what John likes to call ‘green mathematics’. So let’s get started!

Resources and their management are an essential part of our everyday life. We deal with the management of time or money pretty much every day. We also consume natural resources in order to afford food and amenities for (some of) the 7 billion people on our planet. Many of the objects that we deal with in science and engineering can be considered as resources. For example, a communication channel is a resource for sending information from one party to another. But for now, let’s stick with a toy example: timber and nails constitute a resource for making a table. In mathematical notation, this looks like so:

\mathrm{timber} + \mathrm{nails} \geq \mathrm{table}

We interpret this inequality as saying that “given timber and nails, we can make a table”. I like to write it as an inequality like this, which I think of as stating that having timber and nails is at least as good as having a table, because the timber and nails can always be turned into a table whenever one needs a table.

To be more precise, we should also take into account that making the table requires some tools. These tools do not get consumed in the process, so we also get them back out:

\text{timber} + \text{nails} + \text{saw} + \text{hammer} \geq \text{table} + \text{hammer} + \text{saw}

Notice that this kind of equation is analogous to a chemical reaction equation like this:

2 \mathrm{H}_2 + \mathrm{O}_2 \geq 2 \mathrm{H}_2\mathrm{O}

So given two hydrogen molecules and an oxygen molecule, we can let them react so as to form two molecules of water. In chemistry, this kind of equation would usually be written with an arrow ‘\rightarrow’ instead of an ordering symbol ‘\geq’, but here we interpret the equation slightly differently. As with the timber and nails above, the inequality says that if we have two hydrogen molecules and an oxygen molecule, then we can let them react to form two molecules of water, but we don’t have to. In this sense, having two hydrogen molecules and an oxygen molecule is at least as good as having two molecules of water.

So what’s going on here, mathematically? In all of the above equations, we have a bunch of stuff on each side and an inequality ‘\geq’ in between. The stuff on each side consists of a bunch of objects tacked together via ‘+’ . With respect to these two pieces of structure, the collection of all our resource objects forms an ordered commutative monoid:

Definition: An ordered commutative monoid is a set A equipped with a binary relation \geq, a binary operation +, and a distinguished element 0 such that the following hold:

• + and 0 equip A with the structure of a commutative monoid;

• \geq equips A with the structure of a partially ordered set;

• addition is monotone: if x\geq y, then also x + z \geq y + z.

Here, the third axiom is the most important, since it tells us how the additive structure interacts with the ordering structure.

Ordered commutative monoids are the mathematical formalization of resource convertibility and combinability as follows. The elements x,y\in A are the resource objects, corresponding to the ‘collections of stuff’ in our earlier examples, such as x = \text{timber} + \text{nails} or y = 2 \text{H}_2 + \text{O}_2. Then the addition operation simply joins up collections of stuff into bigger collections of stuff. The ordering relation \geq is what formalizes resource convertibility, as in the examples above. The third axiom states that if we can convert x into y, then we can also convert x together with z into y together with z for any z, for example by doing nothing to z.

A mathematically minded reader might object that requiring A to form a partially ordered set under \geq is too strong a requirement, since it requires two resource objects to be equal as soon as they are mutually interconvertible: x \geq y and y \geq x implies x = y. However, I think that this is not an essential restriction, because we can regard this implication as the definition of equality: ‘x = y’ is just a shorthand notation for ‘x\geq y and y\geq x’ which formalizes the perfect interconvertibility of resource objects.

We could now go back to the original examples and try to model carpentry and chemistry in terms of ordered commutative monoids. But as a mathematician, I needed to start out with something mathematically precise and rigorous as a testing ground for the formalism. This helps ensure that the mathematics is sensible and useful before diving into real-world applications. So, the main example in my paper is the ordered commutative monoid of graphs, which has a resource-theoretic interpretation in terms of zero-error information theory. As graph theory is a difficult and traditional subject, this application constitutes the perfect training camp for the mathematics of ordered commutative monoids. I will get to this in Part 3.

In Part 2, I will say something about what one can do with ordered commutative monoids. In the meantime, I’d be curious to know what you think about what I’ve said so far!

Resource convertibility: part 2.

John BaezResource Convertibility (Part 2)

guest post by Tobias Fritz

In Part 1, I introduced ordered commutative monoids as a mathematical formalization of resources and their convertibility. Today I’m going to say something about what to do with this formalization. Let’s start with a quick recap!

Definition: An ordered commutative monoid is a set A equipped with a binary relation \geq, a binary operation +, and a distinguished element 0 such that the following hold:

• + and 0 equip A with the structure of a commutative monoid;

• \geq equips A with the structure of a partially ordered set;

• addition is monotone: if x\geq y, then also x + z \geq y + z.

Recall also that we think of the x,y\in A as resource objects such that x+y represents the object consisting of x and y together, and x\geq y means that the resource object x can be converted into y.

When confronted with an abstract definition like this, many people ask: so what is it useful for? The answer to this is twofold: first, it provides a language which we can use to guide our thoughts in any application context. Second, the definition itself is just the very start: we can now also prove theorems about ordered commutative monoids, which can be instantiated in any particular application context. So the theory of ordered commutative monoids will provide a useful toolbox for talking about concrete resource theories and studying them. In the remainder of this post, I’d like to say a bit about what this toolbox contains. For more, you’ll have to read the paper!

To start, let’s consider catalysis as one of the resource-theoretic phenomena neatly captured by ordered commutative monoids. Catalysis is the phenomenon that certain conversions become possible only due to the presence of a catalyst, which is an additional resource object which does not get consumed in the process of the conversion. For example, we have

\text{timber + nails}\not\geq \text{table},

\text{timber + nails + saw + hammer} \geq \text{table + saw + hammer}

because making a table from timber and nails requires a saw and a hammer as tools. So in this example, ‘saw + hammer’ is a catalyst for the conversion of ‘timber + nails’ into ‘table’. In mathematical language, catalysis occurs precisely when the ordered commutative monoid is not cancellative, which means that x + z\geq y + z sometimes holds even though x\geq y does not. So, the notion of catalysis perfectly matches up with a very natural and familiar notion from algebra.
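
This non-cancellative behaviour is easy to simulate. Below is a small Python sketch (added for illustration; the rule set, with its single ‘timber + nails + saw + hammer’ conversion, is invented) in which resource objects are represented as multisets of goods and a conversion fires only when all of its inputs, catalysts included, are present:

```python
from collections import Counter

# Resource objects are multisets (Counters) of goods.  The single
# conversion rule below -- timber + nails + saw + hammer >=
# table + saw + hammer -- is made up for illustration.
RULES = [
    (Counter(timber=1, nails=1, saw=1, hammer=1),   # inputs
     Counter(table=1, saw=1, hammer=1)),            # outputs
]

def converts(x, y):
    """One-step convertibility x >= y under RULES (plus reflexivity)."""
    if x == y:
        return True
    for inputs, outputs in RULES:
        if all(x[k] >= v for k, v in inputs.items()):
            # whatever is not consumed is carried along unchanged
            if x - inputs + outputs == y:
                return True
    return False

timber_nails = Counter(timber=1, nails=1)
tools = Counter(saw=1, hammer=1)
table = Counter(table=1)

assert not converts(timber_nails, table)              # x >= y fails...
assert converts(timber_nails + tools, table + tools)  # ...but x+z >= y+z holds
```

The first assertion finds no conversion because the tools are missing; the second succeeds because the saw and hammer are supplied and then returned, which is exactly the catalysis pattern described above.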

One can continue along these lines and study those ordered commutative monoids which are cancellative. It turns out that every ordered commutative monoid can be made cancellative in a universal way; in the resource-theoretic interpretation, this boils down to replacing the convertibility relation by catalytic convertibility, in which x is declared to be convertible into y as soon as there exists a catalyst which achieves this conversion. Making an ordered commutative monoid cancellative like this is a kind of ‘regularization’: it leads to a mathematically more well-behaved structure. As it turns out, there are several additional steps of regularization that can be performed, and all of these are both mathematically natural and have an appealing resource-theoretic interpretation. These regularizations successively take us from the world of ordered commutative monoids to the realm of linear algebra and functional analysis, where powerful theorems are available. For now, let me not go into the details, but only try to summarize one of the consequences of this development. This requires a bit of preparation.

In many situations, it is not just of interest to convert a single copy of some resource object x into a single copy of some y; instead, one may be interested in converting many copies of x into many copies of y all together, and thereby maximizing (or minimizing) the ratio of the resulting number of y’s compared to the number of x’s that get consumed. This ratio is measured by the maximal rate:

\displaystyle{ R_{\mathrm{max}}(x\to y) = \sup \left\{ \frac{m}{n} \:|\: nx \geq my \right\} }

Here, m and n are natural numbers, and nx stands for the n-fold sum x+\cdots+x, and similarly for my. So this maximal rate quantifies how many y’s we can get out of one copy of x, when working in a ‘mass production’ setting. There is also a notion of regularized rate, which has a slightly more complicated definition that I don’t want to spell out here, but is similar in spirit. The toolbox of ordered commutative monoids now provides the following result:

Rate Theorem: If x\geq 0 and y\geq 0 in an ordered commutative monoid A which satisfies a mild technical assumption, then the maximal regularized rate from x to y can be computed like this:

\displaystyle{ R_{\mathrm{max}}^{\mathrm{reg}}(x\to y) = \inf_f \frac{f(x)}{f(y)} }

where f ranges over all functionals on A with f(y)\neq 0.

Wait a minute, what’s a ‘functional’? It’s defined to be a map f:A\to\mathbb{R} which is monotone,

x\geq y \:\Rightarrow\: f(x)\geq f(y)

and additive,

f(x+y) = f(x) + f(y)

In economic terms, we can think of a functional as a consistent assignment of prices to all resource objects. If x is at least as useful as y, then the price of x should be at least as high as the price of y; and the price of two objects together should be the sum of their individual prices. So the f in the rate formula above ranges over all ‘markets’ on which resource objects can be ‘traded’ at consistent prices. The term ‘functional’ is supposed to hint at a relation to functional analysis. In fact, the proof of the theorem crucially relies on the Hahn–Banach Theorem.
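
Here is how the pricing picture plays out in a toy example (a sketch invented for illustration, not taken from the paper). If nx \geq my, then monotonicity and additivity give n f(x) \geq m f(y), i.e. m/n \leq f(x)/f(y): every consistent price system upper-bounds the maximal rate. In the one-rule resource theory below, the bound is attained:

```python
from fractions import Fraction

# Toy one-rule resource theory (invented): 5*wood >= 1*table generates
# the order, so n*wood >= m*table holds exactly when n >= 5*m.
def convertible(n_wood, m_tables):
    return n_wood >= 5 * m_tables

def max_rate(conv, bound=60):
    # brute-force sup{ m/n : n*x >= m*y } over small m and n
    best = Fraction(0)
    for n in range(1, bound):
        for m in range(1, bound):
            if conv(n, m):
                best = max(best, Fraction(m, n))
    return best

# The price functional f(wood) = 1, f(table) = 5 respects the rule
# (5 * f(wood) >= f(table)), and its ratio f(wood)/f(table) = 1/5
# coincides with the maximal rate.
assert max_rate(convertible) == Fraction(1, 5)
```

Of course, in general the infimum over all functionals need not be attained by such a simple price assignment; this toy case is chosen so that it is.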

The mild technical assumption mentioned in the Rate Theorem is that the ordered commutative monoid needs to have a generating pair. This turns out to hold in the applications that I have considered so far, and I hope that it will turn out to hold in most others as well. For the full gory details, see the paper.

So this provides some idea of what kinds of gadgets one can find in the toolbox of ordered commutative monoids. Next time, I’ll show some applications to graph theory and zero-error communication and say a bit about where this project might be going next.

April 12, 2015

n-Category Café The Structure of A

I attended a workshop last week down in Bristol organised by James Ladyman and Stuart Presnell, as part of their Homotopy Type Theory project.

Urs was there, showing everyone his magical conjuring trick where the world emerges out of the opposition between \emptyset and \ast in Modern Physics formalized in Modal Homotopy Type Theory.

Jamie Vicary spoke on the Categorified Heisenberg Algebra. (See also John’s page.) After the talk, interesting connections were discussed with dependent linear type theory and tangent (infinity, 1)-toposes. It seems that André Joyal and colleagues are working on the latter. This should link up with Urs’s Quantization via Linear homotopy types at some stage.

As for me, I was speaking on the subject of my chapter for the book that Mike’s Introduction to Synthetic Mathematics and John’s Concepts of Sameness will appear in. It’s on reviving the philosophy of geometry through the (synthetic) approach of cohesion.

In the talk I mentioned the outcome of some further thinking about how to treat the phrase ‘the structure of A’ for a mathematical entity. It occurred to me to combine what I wrote in that discussion we once had on The covariance of coloured balls with the analysis of ‘the’ from The King of France thread. After the event I thought I’d write out a note explaining this point of view, and it can be found here. Thanks to Mike and Urs for suggestions and comments.

The long and the short of it is that there’s no great need for the word ‘structure’ when using homotopy type theory. If anyone has any thoughts, I’d like to hear them.

April 11, 2015

Terence TaoThe ergodic theorem and Gowers-Host-Kra seminorms without separability or amenability

The von Neumann ergodic theorem (the Hilbert space version of the mean ergodic theorem) asserts that if {U: H \rightarrow H} is a unitary operator on a Hilbert space {H}, and {v \in H} is a vector in that Hilbert space, then one has

\displaystyle \lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N U^n v = \pi_{H^U} v

in the strong topology, where {H^U := \{ w \in H: Uw = w \}} is the {U}-invariant subspace of {H}, and {\pi_{H^U}} is the orthogonal projection to {H^U}. (See e.g. these previous lecture notes for a proof.) The same proof extends to more general amenable groups: if {G} is a countable amenable group acting on a Hilbert space {H} by unitary transformations {T^g: H \rightarrow H} for {g \in G}, and {v \in H} is a vector in that Hilbert space, then one has

\displaystyle \lim_{N \rightarrow \infty} \mathop{\bf E}_{g \in \Phi_N} T^g v = \pi_{H^G} v \ \ \ \ \ (1)


for any Folner sequence {\Phi_N} of {G}, where {H^G := \{ w \in H: T^g w = w \hbox{ for all }g \in G \}} is the {G}-invariant subspace, and {\mathop{\bf E}_{a \in A} f(a) := \frac{1}{|A|} \sum_{a \in A} f(a)} is the average of {f} on {A}. Thus one can interpret {\pi_{H^G} v} as a certain average of elements of the orbit {Gv := \{ T^g v: g \in G \}} of {v}.
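
As a quick numerical sanity check of the Hilbert space statement (a toy computation added here, with an arbitrarily chosen operator), take {U} to be the diagonal unitary \mathrm{diag}(1, e^{i\theta}) on {\mathbb{C}^2}: the invariant subspace is spanned by {(1,0)}, and the ergodic averages of {v = (1,1)} indeed approach the orthogonal projection {(1,0)}:

```python
import cmath

# Toy check (operator chosen arbitrarily): U = diag(1, e^{i*theta})
# acting on C^2.  Its invariant subspace is spanned by (1, 0), so the
# averages (1/N) * sum_{n=1}^{N} U^n v of v = (1, 1) should approach
# the orthogonal projection (1, 0).
theta = 1.0
u = cmath.exp(1j * theta)
N = 10_000

avg0 = sum(1.0 for _ in range(N)) / N            # U^n fixes the first coordinate
avg1 = sum(u ** n for n in range(1, N + 1)) / N  # averaged geometric series

assert abs(avg0 - 1.0) < 1e-12   # first coordinate of the projection
assert abs(avg1) < 1e-3          # second coordinate tends to 0
```

The second average is a partial geometric sum, so it decays like 2/(N\,|e^{i\theta}-1|), consistent with the strong convergence asserted by the theorem.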

In a previous blog post, I noted a variant of this ergodic theorem (due to Alaoglu and Birkhoff) that holds even when the group {G} is not amenable (or not discrete), using a more abstract notion of averaging:

Theorem 1 (Abstract ergodic theorem) Let {G} be an arbitrary group acting unitarily on a Hilbert space {H}, and let {v} be a vector in {H}. Then {\pi_{H^G} v} is the element in the closed convex hull of {Gv := \{ T^g v: g \in G \}} of minimal norm, and is also the unique element of {H^G} in this closed convex hull.

I recently stumbled upon a different way to think about this theorem, in the additive case {G = (G,+)} when {G} is abelian, which has a closer resemblance to the classical mean ergodic theorem. Given an arbitrary additive group {G = (G,+)} (not necessarily discrete, or countable), let {{\mathcal F}} denote the collection of finite non-empty multisets in {G} – that is to say, unordered collections {\{a_1,\dots,a_n\}} of elements {a_1,\dots,a_n} of {G}, not necessarily distinct, for some positive integer {n}. Given two multisets {A = \{a_1,\dots,a_n\}}, {B = \{b_1,\dots,b_m\}} in {{\mathcal F}}, we can form the sum set {A + B := \{ a_i + b_j: 1 \leq i \leq n, 1 \leq j \leq m \}}. Note that the sum set {A+B} can contain multiplicity even when {A, B} do not; for instance, {\{ 1,2\} + \{1,2\} = \{2,3,3,4\}}. Given a multiset {A = \{a_1,\dots,a_n\}} in {{\mathcal F}}, and a function {f: G \rightarrow H} from {G} to a vector space {H}, we define the average {\mathop{\bf E}_{a \in A} f(a)} as

\displaystyle \mathop{\bf E}_{a \in A} f(a) = \frac{1}{n} \sum_{j=1}^n f(a_j).

Note that the multiplicity function of the set {A} affects the average; for instance, we have {\mathop{\bf E}_{a \in \{1,2\}} a = \frac{3}{2}}, but {\mathop{\bf E}_{a \in \{1,2,2\}} a = \frac{5}{3}}.
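
Both computations are easy to mechanize; the following Python sketch (added for illustration) represents multisets as Counters and reproduces the sum set and the two averages above:

```python
from collections import Counter
from fractions import Fraction

def sumset(A, B):
    # sum set of two multisets, keeping multiplicity
    out = Counter()
    for a, ka in A.items():
        for b, kb in B.items():
            out[a + b] += ka * kb
    return out

def average(A):
    # multiplicity-weighted average E_{a in A} a
    size = sum(A.values())
    return Fraction(sum(a * k for a, k in A.items()), size)

# {1,2} + {1,2} = {2,3,3,4}, with the repeated 3 retained
assert sumset(Counter([1, 2]), Counter([1, 2])) == Counter([2, 3, 3, 4])
# the multiplicity function affects the average
assert average(Counter([1, 2])) == Fraction(3, 2)
assert average(Counter([1, 2, 2])) == Fraction(5, 3)
```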

We can define a directed set on {{\mathcal F}} as follows: given two multisets {A,B \in {\mathcal F}}, we write {A \geq B} if we have {A = B+C} for some {C \in {\mathcal F}}. Thus for instance we have {\{ 1, 2, 2, 3\} \geq \{1,2\}}. It is easy to verify that this relation is transitive and reflexive, and is directed because any two elements {A,B} of {{\mathcal F}} have a common upper bound, namely {A+B}. (This is where we need {G} to be abelian.) The notion of convergence along a net now allows us to define the notion of convergence along {{\mathcal F}}; given a family {x_A} of points in a topological space {X} indexed by elements {A} of {{\mathcal F}}, and a point {x} in {X}, we say that {x_A} converges to {x} along {{\mathcal F}} if, for every open neighbourhood {U} of {x} in {X}, one has {x_A \in U} for sufficiently large {A}, that is to say there exists {B \in {\mathcal F}} such that {x_A \in U} for all {A \geq B}. If the topological space {X} is Hausdorff, then the limit {x} is unique (if it exists), and we then write

\displaystyle x = \lim_{A \rightarrow G} x_A.

When {x_A} takes values in the reals, one can also define the limit superior or limit inferior along such nets in the obvious fashion.

We can then give an alternate formulation of the abstract ergodic theorem in the abelian case:

Theorem 2 (Abelian abstract ergodic theorem) Let {G = (G,+)} be an arbitrary additive group acting unitarily on a Hilbert space {H}, and let {v} be a vector in {H}. Then we have

\displaystyle \pi_{H^G} v = \lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} T^a v

in the strong topology of {H}.

Proof: Suppose that {A \geq B}, so that {A=B+C} for some {C \in {\mathcal F}}, then

\displaystyle \mathop{\bf E}_{a \in A} T^a v = \mathop{\bf E}_{c \in C} T^c ( \mathop{\bf E}_{b \in B} T^b v )

so by unitarity and the triangle inequality we have

\displaystyle \| \mathop{\bf E}_{a \in A} T^a v \|_H \leq \| \mathop{\bf E}_{b \in B} T^b v \|_H,

thus {\| \mathop{\bf E}_{a \in A} T^a v \|_H^2} is monotone non-increasing in {A}. Since this quantity is bounded between {0} and {\|v\|_H^2}, we conclude that the limit {\lim_{A \rightarrow G} \| \mathop{\bf E}_{a \in A} T^a v \|_H^2} exists. Thus, for any {\varepsilon > 0}, we have for sufficiently large {A} that

\displaystyle \| \mathop{\bf E}_{b \in B} T^b v \|_H^2 \geq \| \mathop{\bf E}_{a \in A} T^a v \|_H^2 - \varepsilon

for all {B \geq A}. In particular, for any {g \in G}, we have

\displaystyle \| \mathop{\bf E}_{b \in A + \{0,g\}} T^b v \|_H^2 \geq \| \mathop{\bf E}_{a \in A} T^a v \|_H^2 - \varepsilon.

We can write

\displaystyle \mathop{\bf E}_{b \in A + \{0,g\}} T^b v = \frac{1}{2} \mathop{\bf E}_{a \in A} T^a v + \frac{1}{2} T^g \mathop{\bf E}_{a \in A} T^a v

and so from the parallelogram law and unitarity we have

\displaystyle \| \mathop{\bf E}_{a \in A} T^a v - T^g \mathop{\bf E}_{a \in A} T^a v \|_H^2 \leq 4 \varepsilon

for all {g \in G}, and hence by the triangle inequality (averaging {g} over a finite multiset {C})

\displaystyle \| \mathop{\bf E}_{a \in A} T^a v - \mathop{\bf E}_{b \in A+C} T^b v \|_H^2 \leq 4 \varepsilon

for any {C \in {\mathcal F}}. This shows that {\mathop{\bf E}_{a \in A} T^a v} is a Cauchy net in {H} (in the strong topology), and hence (by the completeness of {H}) tends to a limit. Shifting {A} by a group element {g}, we have

\displaystyle \lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} T^a v = \lim_{A \rightarrow G} \mathop{\bf E}_{a \in A + \{g\}} T^a v = T^g \lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} T^a v

and hence {\lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} T^a v} is invariant under shifts, and thus lies in {H^G}. On the other hand, for any {w \in H^G} and {A \in {\mathcal F}}, we have

\displaystyle \langle \mathop{\bf E}_{a \in A} T^a v, w \rangle_H = \mathop{\bf E}_{a \in A} \langle v, T^{-a} w \rangle_H = \langle v, w \rangle_H

and thus on taking strong limits

\displaystyle \langle \lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} T^a v, w \rangle_H = \langle v, w \rangle_H

and so {v - \lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} T^a v} is orthogonal to {H^G}. Combining these two facts we see that {\lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} T^a v} is equal to {\pi_{H^G} v} as claimed. \Box

To relate this result to the classical ergodic theorem, we observe

Lemma 3 Let {G} be a countable additive group, with a F{\o}lner sequence {\Phi_n}, and let {f_g} be a bounded sequence in a normed vector space indexed by {G}. If {\lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} f_a} exists, then {\lim_{n \rightarrow \infty} \mathop{\bf E}_{a \in \Phi_n} f_a} exists, and the two limits are equal.

Proof: From the F{\o}lner property, we see that for any {A} and any {\varepsilon>0}, the averages {\mathop{\bf E}_{a \in \Phi_n} f_a} and {\mathop{\bf E}_{a \in A+\Phi_n} f_a} differ by at most {\varepsilon} in norm if {n} is sufficiently large depending on {A}, {\varepsilon} (and the {f_a}). On the other hand, by the existence of the limit {\lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} f_a}, the averages {\mathop{\bf E}_{a \in A} f_a} and {\mathop{\bf E}_{a \in A + \Phi_n} f_a} differ by at most {\varepsilon} in norm if {A} is sufficiently large depending on {\varepsilon} (regardless of how large {n} is). The claim follows. \Box

It turns out that this approach can also be used as an alternate way to construct the Gowers-Host-Kra seminorms in ergodic theory, which has the feature that it does not explicitly require any amenability on the group {G} (or separability on the underlying measure space), though, as pointed out to me in comments, even uncountable abelian groups are amenable in the sense of possessing an invariant mean, even if they do not have a F{\o}lner sequence.

Given an arbitrary additive group {G}, define a {G}-system {({\mathrm X}, T)} to be a probability space {{\mathrm X} = (X, {\mathcal X}, \mu)} (not necessarily separable or standard Borel), together with a collection {T^g: X \rightarrow X} of invertible, measure-preserving maps, such that {T^0} is the identity and {T^g T^h = T^{g+h}} (modulo null sets) for all {g,h \in G}. This then gives isomorphisms {T^g: L^p({\mathrm X}) \rightarrow L^p({\mathrm X})} for {1 \leq p \leq \infty} by setting {T^g f(x) := f(T^{-g} x)}. From the above abstract ergodic theorem, we see that

\displaystyle {\mathbf E}( f | {\mathcal X}^G ) = \lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} T^a f

in the strong topology of {L^2({\mathrm X})} for any {f \in L^2({\mathrm X})}, where {{\mathcal X}^G} is the collection of measurable sets {E} that are essentially {G}-invariant in the sense that {T^g E = E} modulo null sets for all {g \in G}, and {{\mathbf E}(f|{\mathcal X}^G)} is the conditional expectation of {f} with respect to {{\mathcal X}^G}.

In a similar spirit, we have

Theorem 4 (Convergence of Gowers-Host-Kra seminorms) Let {({\mathrm X},T)} be a {G}-system for some additive group {G}. Let {d} be a natural number, and for every {\omega \in\{0,1\}^d}, let {f_\omega \in L^{2^d}({\mathrm X})}, which for simplicity we take to be real-valued. Then the expression

\displaystyle \langle (f_\omega)_{\omega \in \{0,1\}^d} \rangle_{U^d({\mathrm X})} := \lim_{A_1,\dots,A_d \rightarrow G}

\displaystyle \mathop{\bf E}_{h_1 \in A_1-A_1,\dots,h_d \in A_d-A_d} \int_X \prod_{\omega \in \{0,1\}^d} T^{\omega_1 h_1 + \dots + \omega_d h_d} f_\omega\ d\mu

converges, where we write {\omega = (\omega_1,\dots,\omega_d)}, and we are using the product direct set on {{\mathcal F}^d} to define the convergence {A_1,\dots,A_d \rightarrow G}. In particular, for {f \in L^{2^d}({\mathrm X})}, the limit

\displaystyle \| f \|_{U^d({\mathrm X})}^{2^d} = \lim_{A_1,\dots,A_d \rightarrow G}

\displaystyle \mathop{\bf E}_{h_1 \in A_1-A_1,\dots,h_d \in A_d-A_d} \int_X \prod_{\omega \in \{0,1\}^d} T^{\omega_1 h_1 + \dots + \omega_d h_d} f\ d\mu

exists.

We prove this theorem below the fold. It implies a number of other known descriptions of the Gowers-Host-Kra seminorms {\|f\|_{U^d({\mathrm X})}}, for instance that

\displaystyle \| f \|_{U^d({\mathrm X})}^{2^d} = \lim_{A \rightarrow G} \mathop{\bf E}_{h \in A-A} \| f T^h f \|_{U^{d-1}({\mathrm X})}^{2^{d-1}}

for {d > 1}, while from the ergodic theorem we have

\displaystyle \| f \|_{U^1({\mathrm X})} = \| {\mathbf E}( f | {\mathcal X}^G ) \|_{L^2({\mathrm X})}.

This definition also manifestly demonstrates the cube symmetries of the Host-Kra measures {\mu^{[d]}} on {X^{\{0,1\}^d}}, defined via duality by requiring that

\displaystyle \langle (f_\omega)_{\omega \in \{0,1\}^d} \rangle_{U^d({\mathrm X})} = \int_{X^{\{0,1\}^d}} \bigotimes_{\omega \in \{0,1\}^d} f_\omega\ d\mu^{[d]}.

In a subsequent blog post I hope to present a more detailed study of the {U^2} norm and its relationship with eigenfunctions and the Kronecker factor, without assuming any amenability on {G} or any separability or topological structure on {{\mathrm X}}.
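
Before the proof, it may help to see the definition in a setting where all the limits are trivial. On a finite cyclic group {\mathbb{Z}/N} with uniform measure and the shift action, the net limits collapse to a single average and {\| f \|_{U^d}} reduces to the familiar combinatorial Gowers norm. The following brute-force Python sketch (an illustration added here, practical only for tiny {N} and {d}) checks two standard facts: constants have norm {1} in every {U^d}, and the character {x \mapsto (-1)^x} on {\mathbb{Z}/2} has full {U^2} norm but vanishing {U^1} norm:

```python
from itertools import product

def gowers_norm(f, N, d):
    # ||f||_{U^d} on Z/N:  ( E_{x, h_1..h_d} prod_{omega in {0,1}^d}
    #                        f(x + omega . h) )^(1 / 2^d),  f real-valued
    total = 0.0
    for x, *hs in product(range(N), repeat=d + 1):
        p = 1.0
        for omega in product((0, 1), repeat=d):
            shift = sum(w * h for w, h in zip(omega, hs))
            p *= f((x + shift) % N)
        total += p
    total = max(total, 0.0)   # guard against tiny negative rounding
    return (total / N ** (d + 1)) ** (1.0 / 2 ** d)

one = lambda x: 1.0
chi = lambda x: (-1.0) ** x   # the nontrivial character on Z/2

assert abs(gowers_norm(one, 5, 2) - 1.0) < 1e-9  # constants: norm 1
assert abs(gowers_norm(chi, 2, 2) - 1.0) < 1e-9  # characters: full U^2 norm
assert abs(gowers_norm(chi, 2, 1)) < 1e-9        # but mean zero: U^1 norm 0
```

The last two assertions match the formulas above: {\|f\|_{U^1}} is the norm of the conditional expectation (here the mean), while {\|f\|_{U^2}} detects correlation with characters.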

— 1. Proof of theorem —

If {\vec f := (f_\omega)_{\omega \in \{0,1\}^d}} is a tuple of functions {f_\omega \in L^{2^d}({\mathrm X})} and {0 \leq d' \leq d}, we say that {\vec f} is {d'}-symmetric if we have {f_\omega = f_{\omega'}} whenever {\omega = (\omega_1,\dots,\omega_d)} and {\omega' = (\omega'_1,\dots,\omega'_d)} agree in the first {d'} components (that is, {\omega_i = \omega'_i} for {i=1,\dots,d'}). We will prove Theorem 4 by downward induction on {d'}, with the {d'=0} case establishing the full theorem.

Thus, assume that {0 \leq d' \leq d} and that the claim has already been proven for larger values of {d'} (this hypothesis is vacuous for {d'=d}). Write

\displaystyle F( A_1,\dots, A_d ) := \mathop{\bf E}_{h_1 \in A_1-A_1,\dots,h_d \in A_d-A_d} \int_X \prod_{\omega \in \{0,1\}^d} T^{\omega_1 h_1 + \dots + \omega_d h_d} f_\omega\ d\mu

We will show that for any {\varepsilon > 0}, and for sufficiently large {A_1,\dots,A_d} (in the net {{\mathcal F}}), the quantity {F(A_1,\dots,A_d)} can only increase by at most {\varepsilon} when one increases any of the {A_i}, {1 \leq i \leq d}, that is to say that

\displaystyle F(A_1,\dots,A_{i-1}, A'_i, A_{i+1},\dots, A_d) \leq F(A_1,\dots,A_d) + \varepsilon

whenever {1 \leq i \leq d} and {A'_i \geq A_i}. This implies that the limit superior of {F(A_1,\dots,A_d)} exceeds the limit inferior by at most {d\varepsilon}, and on sending {\varepsilon \rightarrow 0} we will obtain Theorem 4.

There are two cases, depending on whether {i \leq d'} or {d' < i \leq d}. We begin with the first case {i \leq d'}. By relabeling we may take {i=1}, so that {d' \geq 1}. As {\vec f} is {d'}-symmetric, we can write

\displaystyle F(A_1,\dots,A_d) = \mathop{\bf E}_{h_2 \in A_2-A_2,\dots,h_d \in A_d-A_d} \| \mathop{\bf E}_{a \in A_1} T^{a} f_{h_2,\dots,h_d}\|_{L^2({\mathrm X})}^2

where

\displaystyle f_{h_2,\dots,h_d} := \prod_{\omega_2,\dots,\omega_d \in \{0,1\}} T^{\omega_2 h_2 + \dots + \omega_d h_d} f_{(0,\omega_2,\dots,\omega_d)}.

By the triangle inequality argument used to prove Theorem 2 we thus see that

\displaystyle F(A_1 + B, A_2,\dots,A_d) \leq F(A_1,\dots,A_d),

for any {B \in {\mathcal F}}, and so {F} certainly cannot increase by {\varepsilon} when one increases {A_1}.

Now we turn to the case when {d' < i \leq d}. By relabeling we may take {i=d}, so that {d' < d}. We can write

\displaystyle F(A_1,\dots,A_d) = \mathop{\bf E}_{h_1 \in A_1-A_1,\dots,h_{d-1} \in A_{d-1}-A_{d-1}}

\displaystyle \left\langle \mathop{\bf E}_{a \in A_d} T^{a} f^0_{h_1,\dots,h_{d-1}}, \mathop{\bf E}_{a \in A_d} T^{a} f^1_{h_1,\dots,h_{d-1}} \right\rangle_{L^2({\mathrm X})}

where

\displaystyle f^{\omega_d}_{h_1,\dots,h_{d-1}} := \prod_{\omega_1,\dots,\omega_{d-1} \in \{0,1\}} T^{\omega_1 h_1 + \dots + \omega_{d-1} h_{d-1}} f_{(\omega_1,\omega_2,\dots,\omega_d)}.

On the other hand, the quantity

\displaystyle \mathop{\bf E}_{h_1 \in A_1-A_1,\dots,h_{d-1} \in A_{d-1}-A_{d-1}} \| \mathop{\bf E}_{a \in A_d} T^{a} f^0_{h_1,\dots,h_{d-1}} \|_{L^2({\mathrm X})}^2

is the same as {F(A_1,\dots,A_d)}, but with {f_{(\omega_1,\dots,\omega_{d-1},1)}} replaced by {f_{(\omega_1,\dots,\omega_{d-1},0)}}. After rearrangement, this is a {(d'+1)}-symmetric inner product, and so by the induction hypothesis the limit

\displaystyle \lim_{A_1,\dots,A_d \rightarrow G} \mathop{\bf E}_{h_1 \in A_1-A_1,\dots,h_{d-1} \in A_{d-1}-A_{d-1}} \| \mathop{\bf E}_{a \in A_d} T^{a} f^0_{h_1,\dots,h_{d-1}} \|_{L^2({\mathrm X})}^2

exists. In particular, for {A_1,\dots,A_d} large enough, we have

\displaystyle \mathop{\bf E}_{h_1 \in A_1-A_1,\dots,h_{d-1} \in A_{d-1}-A_{d-1}} \| \mathop{\bf E}_{a \in A_d + \{0,g\}} T^{a} f^0_{h_1,\dots,h_{d-1}} \|_{L^2({\mathrm X})}^2

\displaystyle \geq \mathop{\bf E}_{h_1 \in A_1-A_1,\dots,h_{d-1} \in A_{d-1}-A_{d-1}} \| \mathop{\bf E}_{a \in A_d} T^{a} f^0_{h_1,\dots,h_{d-1}} \|_{L^2({\mathrm X})}^2 - \varepsilon

for all {g \in G}, which by the parallelogram law as in the proof of Theorem 2 shows that

\displaystyle \mathop{\bf E}_{h_1 \in A_1-A_1,\dots,h_{d-1} \in A_{d-1}-A_{d-1}}

\displaystyle \| \mathop{\bf E}_{a \in A_d + \{g\}} T^{a} f^0_{h_1,\dots,h_{d-1}} - \mathop{\bf E}_{a \in A_d} T^{a} f^0_{h_1,\dots,h_{d-1}} \|_{L^2({\mathrm X})}^2 \leq 4 \varepsilon

and hence by averaging

\displaystyle \mathop{\bf E}_{h_1 \in A_1-A_1,\dots,h_{d-1} \in A_{d-1}-A_{d-1}}

\displaystyle \| \mathop{\bf E}_{a \in A'_d} T^{a} f^0_{h_1,\dots,h_{d-1}} - \mathop{\bf E}_{a \in A_d} T^{a} f^0_{h_1,\dots,h_{d-1}} \|_{L^2({\mathrm X})}^2 \leq 4 \varepsilon

whenever {A'_d \geq A_d}. Similarly with {f^0_{h_1,\dots,h_{d-1}}} replaced by {f^1_{h_1,\dots,h_{d-1}}}. From Cauchy-Schwarz we then have

\displaystyle |F(A_1,\dots,A_{d-1},A'_d) - F(A_1,\dots,A_d)| \leq C \varepsilon^{1/2}

for some {C} independent of {\varepsilon} (depending on the {L^{2^d}} norms of the {f_\omega}), and the claim follows after redefining {\varepsilon}.

Filed under: expository, math.CA, math.DS Tagged: ergodic theory, Gowers uniformity norms

April 10, 2015

John BaezInformation and Entropy in Biological Systems (Part 3)

I think you can watch live streaming video of our workshop on Information and Entropy in Biological Systems, which runs Wednesday April 8th to Friday April 10th. Later, videos will be made available in a permanent location.

To watch the workshop live, go here. Go down to where it says

Investigative Workshop: Information and Entropy in Biological Systems

Then click where it says live link. There’s nothing there now, but I’m hoping there will be when the show starts!

Below you can see the schedule of talks and a list of participants. The hours are in Eastern Daylight Time: add 4 hours to get Greenwich Mean Time. The talks start at 10 am EDT, which is 2 pm GMT.


There will be 1½ hours of talks in the morning and 1½ hours in the afternoon for each of the 3 days, Wednesday April 8th to Friday April 10th. The rest of the time will be for discussions on different topics. We’ll break up into groups, based on what people want to discuss.

Each invited speaker will give a 30-minute talk summarizing the key ideas in some area, not their latest research so much as what everyone should know to start interesting conversations. After that, 15 minutes for questions and/or coffee.

Here’s the schedule. You can already see slides or other material for the talks with links!

Wednesday April 8

• 9:45-10:00 — the usual introductory fussing around.
• 10:00-10:30 — John Baez, Information and entropy in biological systems.
• 10:30-11:00 — questions, coffee.
• 11:00-11:30 — Chris Lee, Empirical information, potential information and disinformation.
• 11:30-11:45 — questions.

• 11:45-1:30 — lunch, conversations.

• 1:30-2:00 — John Harte, Maximum entropy as a foundation for theory building in ecology.
• 2:00-2:15 — questions, coffee.
• 2:15-2:45 — Annette Ostling, The neutral theory of biodiversity and other competitors to the principle of maximum entropy.
• 2:45-3:00 — questions, coffee.
• 3:00-5:30 — break up into groups for discussions.

• 5:30 — reception.

Thursday April 9

• 10:00-10:30 — David Wolpert, The Landauer limit and thermodynamics of biological organisms.
• 10:30-11:00 — questions, coffee.
• 11:00-11:30 — Susanne Still, Efficient computation and data modeling.
• 11:30-11:45 — questions.

• 11:45-1:30 — group photo, lunch, conversations.

• 1:30-2:00 — Matina Donaldson-Matasci, The fitness value of information in an uncertain environment.
• 2:00-2:15 — questions, coffee.
• 2:15-2:45 — Roderick Dewar, Maximum entropy and maximum entropy production in biological systems: survival of the likeliest?
• 2:45-3:00 — questions, coffee.
• 3:00-6:00 — break up into groups for discussions.

Friday April 10

• 10:00-10:30 — Marc Harper, Information transport and evolutionary dynamics.
• 10:30-11:00 — questions, coffee.
• 11:00-11:30 — Tobias Fritz, Characterizations of Shannon and Rényi entropy.
• 11:30-11:45 — questions.

• 11:45-1:30 — lunch, conversations.

• 1:30-2:00 — Christina Cobbold, Biodiversity measures and the role of species similarity.
• 2:00-2:15 — questions, coffee.
• 2:15-2:45 — Tom Leinster, Maximizing biological diversity.
• 2:45-3:00 — questions, coffee.
• 3:00-6:00 — break up into groups for discussions.


Here are the confirmed participants. This list may change a little bit:

• John Baez – mathematical physicist.

• Romain Brasselet – postdoc in cognitive neuroscience knowledgeable about information-theoretic methods and methods of estimating entropy from samples of probability distributions.

• Katharina Brinck – grad student at Centre for Complexity Science at Imperial College; did masters at John Harte’s lab, where she extended his Maximum Entropy Theory of Ecology (METE) to trophic food webs, to study how entropy maximization on the macro scale together with MEP on the scale of individuals drive the structural development of model ecosystems.

• Christina Cobbold – mathematical biologist, has studied the role of species similarity in measuring biodiversity.

• Troy Day – mathematical biologist, works with population dynamics, host-parasite dynamics, etc.; influential and could help move population dynamics to a more information-theoretic foundation.

• Roderick Dewar – physicist who studies the principle of maximal entropy production.

• Barrett Deris – MIT postdoc studying the factors that influence evolvability of drug resistance in bacteria.

• Charlotte de Vries – a biology master’s student who studied particle physics to the master’s level at Oxford and the Perimeter Institute. Interested in information theory.

• Matina Donaldson-Matasci – a biologist who studies information, uncertainty and collective behavior.

• Chris Ellison – a postdoc who worked with James Crutchfield on “information-theoretic measures of structure and memory in stationary, stochastic systems – primarily, finite state hidden Markov models”. He coauthored “Intersection Information based on Common Randomness”. The idea: “The introduction of the partial information decomposition generated a flurry of proposals for defining an intersection information that quantifies how much of “the same information” two or more random variables specify about a target random variable. As of yet, none is wholly satisfactory.” Works on mutual information between organisms and environment (along with David Krakauer and Jessica Flack), and also entropy rates.

• Cameron Freer – MIT postdoc in Brain and Cognitive Sciences working on maximum entropy production principles, algorithmic entropy etc.

• Tobias Fritz – a physicist who has worked on “resource theories” and characterizations of Shannon and Rényi entropy.

• Dashiell Fryer – works with Marc Harper on information geometry and evolutionary game theory.

• Michael Gilchrist – an evolutionary biologist studying how errors and costs of protein translation affect the codon usage observed within a genome. Works at NIMBioS.

• Manoj Gopalkrishnan – an expert on chemical reaction networks who understands entropy-like Lyapunov functions for these systems.

• Marc Harper – works on evolutionary game theory using ideas from information theory, information geometry, etc.

• John Harte – an ecologist who uses the maximum entropy method to predict the structure of ecosystems.

• Ellen Hines – studies habitat modeling and mapping for marine endangered species and ecosystems, sea level change scenarios, documenting of human use and values. Her lab has used MaxEnt methods.

• Elizabeth Hobson – behavior ecology postdoc developing methods to quantify social complexity in animals. Works at NIMBioS.

• John Jungk – works on graph theory and biology.

• Chris Lee – in bioinformatics and genomics; applies information theory to experiment design and evolutionary biology.

• Maria Leites – works on dynamics, bifurcations and applications of coupled systems of non-linear ordinary differential equations with applications to ecology, epidemiology, and transcriptional regulatory networks. Interested in information theory.

• Tom Leinster – a mathematician who applies category theory to study various concepts of ‘magnitude’, including biodiversity and entropy.

• Timothy Lezon – a systems biologist in the Drug Discovery Institute at Pitt, who has used entropy to characterize phenotypic heterogeneity in populations of cultured cells.

• Maria Ortiz Mancera – statistician working at CONABIO, the National Commission for Knowledge and Use of Biodiversity, in Mexico.

• Yajun Mei – statistician who uses Kullback-Leibler divergence and studies how to efficiently compute entropy for two-state hidden Markov models.

• Robert Molzon – mathematical economist who has studied deterministic approximation of stochastic evolutionary dynamics.

• David Murrugarra – works on discrete models in mathematical biology; interested in learning about information theory.

• Annette Ostling – studies community ecology, focusing on the influence of interspecific competition on community structure, and what insights patterns of community structure might provide about the mechanisms by which competing species coexist.

• Connie Phong – grad student at Chicago’s Institute for Genomics and Systems Biology, working on how “certain biochemical network motifs are more attuned than others at maintaining strong input to output relationships under fluctuating conditions.”

• Petr Plechak – works on information-theoretic tools for estimating and minimizing errors in coarse-graining stochastic systems. Wrote “Information-theoretic tools for parametrized coarse-graining of non-equilibrium extended systems”.

• Blake Pollard – physics grad student working with John Baez on various generalizations of Shannon and Renyi entropy, and how these entropies change with time in Markov processes and open Markov processes.

• Timothee Poisot – works on species interaction networks; developed a “new suite of tools for probabilistic interaction networks”.

• Richard Reeve – works on biodiversity studies and the spread of antibiotic resistance. Ran a program on entropy-based biodiversity measures at a mathematics institute in Barcelona.

• Rob Shaw – works on entropy and information in biotic and pre-biotic systems.

• Matteo Smerlak – postdoc working on nonequilibrium thermodynamics and its applications to biology, especially population biology and cell replication.

• Susanne Still – a computer scientist who studies the role of thermodynamics and information theory in prediction.

• Alexander Wissner-Gross – Institute Fellow at the Harvard University Institute for Applied Computational Science and Research Affiliate at the MIT Media Laboratory, interested in lots of things.

• David Wolpert – works at the Santa Fe Institute on i) information theory and game theory, ii) the second law of thermodynamics and dynamics of complexity, iii) multi-information source optimization, iv) the mathematical underpinnings of reality, v) evolution of organizations.

• Matthew Zefferman – works on evolutionary game theory, institutional economics and models of gene-culture co-evolution. No work on information, but a postdoc at NIMBioS.

John BaezKinetic Networks: From Topology to Design

Here’s an interesting conference for those of you who like networks and biology:

Kinetic networks: from topology to design, Santa Fe Institute, 17–19 September, 2015. Organized by Yoav Kallus, Pablo Damasceno, and Sidney Redner.

Proteins, self-assembled materials, virus capsids, and self-replicating biomolecules go through a variety of states on the way to or in the process of serving their function. The network of possible states and possible transitions between states plays a central role in determining whether they do so reliably. The goal of this workshop is to bring together researchers who study the kinetic networks of a variety of self-assembling, self-replicating, and programmable systems to exchange ideas about, methods for, and insights into the construction of kinetic networks from first principles or simulation data, the analysis of behavior resulting from kinetic network structure, and the algorithmic or heuristic design of kinetic networks with desirable properties.

Jordan Ellenberg

From the NYRB:

In 2009, police in Long Branch, New Jersey, were alerted to the presence of an “eccentric-looking old man” wandering around a residential neighborhood in the rain and peering into the windows of a house marked with a “for sale” sign. When the police arrived, the man introduced himself as Bob Dylan. He had no identification; the officer, Kristie Buble, then twenty-four, suspected he was an escaped mental patient. It “never crossed my mind,” she said, “that this could really be him.”

The funny part is, it was actually Jonah Lehrer!

Doug Natelsonsubmerged due to grant deadline

Fear not, a new post is coming soon, but for now I'm trying to finish off a proposal.

Georg von HippelWorkshop "Fundamental Parameters from Lattice QCD" at MITP (upcoming deadline)

Recent years have seen a significant increase in the overall accuracy of lattice QCD calculations of various hadronic observables. Results for quark and hadron masses, decay constants, form factors, the strong coupling constant and many other quantities are becoming increasingly important for testing the validity of the Standard Model. Prominent examples include calculations of Standard Model parameters, such as quark masses and the strong coupling constant, as well as the determination of CKM matrix elements, which is based on a variety of input quantities from experiment and theory. In order to make lattice QCD calculations more accessible to the entire particle physics community, several initiatives and working groups have sprung up, which collect the available lattice results and produce global averages.

The scientific programme "Fundamental Parameters from Lattice QCD" at the Mainz Institute of Theoretical Physics (MITP) is designed to bring together lattice practitioners with members of the phenomenological and experimental communities who are using lattice estimates as input for phenomenological studies. In addition to sharing the expertise among several communities, the aim of the programme is to identify key quantities which allow for tests of the CKM paradigm with greater accuracy and to discuss the procedures in order to arrive at more reliable global estimates.

The deadline for registration is Wednesday, 15 April 2015. Please register at this link.

April 09, 2015

Sean CarrollA Personal Narrative

I was very pleased to learn that I’m among this year’s recipients of a Guggenheim Fellowship. The Fellowships are mid-career awards, meant “to further the development of scholars and artists by assisting them to engage in research in any field of knowledge and creation in any of the arts, under the freest possible conditions and irrespective of race, color, or creed.” This year 173 Fellowships were awarded, chosen from 3,100 applications. About half of the winners are in the creative arts, and the majority of those remaining are in the humanities and social sciences, leaving eighteen slots for natural scientists. Only two physicists were chosen, so it’s up to Philip Phillips and me to uphold the honor of our discipline.

The Guggenheim application includes a “Career Narrative” as well as a separate research proposal. I don’t like to share my research proposals around, mostly because I’m a theoretical physicist and what I actually end up doing rarely bears much resemblance to what I had previously planned to do. But I thought I could post my career narrative, if only on the chance that it might be useful to future fellowship applicants (or young students embarking on their own research careers). Be warned that it’s more personal than most things I write on the blog here, not to mention that it’s beastly long. Also, keep in mind that the purpose of the document was to convince people to give me money — as such, it falls pretty heavily on the side of grandiosity and self-justification. Be assured that in real life I remain meek and humble.

Sean M. Carroll: Career Narrative

Reading over applications for graduate school in theoretical physics, one cannot help but be struck by a certain common theme: everyone wants to discover the fundamental laws of nature, quantize gravity, and find a unified theory of everything. That was certainly what interested me, ever since I first became enamored with physics when I was about ten years old. It’s an ambitious goal, worthy of pursuing, and I’ve been fortunate enough to contribute to the quest in my own small way over the course of my research career, especially in gravitational physics and cosmology.

But when a goal is this far-reaching, it’s important to keep in mind different routes to the ultimate end. In recent years I have become increasingly convinced that there is important progress to be made by focusing on emergence: how the deepest levels of reality are connected to the many higher levels of behavior we observe. How do spacetime and classical reality arise from an underlying quantum description? What is complexity, and how does it evolve over time, and how is that evolution driven by the increase of entropy? What do we mean when we talk about “causes” and “purposes” if the underlying laws are perfectly reversible? What role does information play in the structure of reality? All of these questions are thoroughly interdisciplinary in nature, and can be addressed with a wide variety of different techniques. I strongly believe that the time is right for groundbreaking work in this area, and a Guggenheim fellowship would help me develop the relevant expertise and start stimulating new collaborations.

University, Villanova and Harvard: 1984-1993

There is no question I am a physicist. The topics that first sparked my interest in science – the Big Bang, black holes, elementary particles – are the ones that I think about today, and they lie squarely within the purview of physics. So it is somewhat curious that I have no degrees in physics. For a variety of reasons (including questionable guidance), both my undergraduate degree from Villanova and my Ph.D. from Harvard are in astronomy and astrophysics. I would like to say that this was a clever choice based on a desire for interdisciplinary engagement, but it was more of an accident of history (and a seeming insistence on doing things the hard way). Villanova offered me a full-tuition academic scholarship (rare at the time), and I financed my graduate education through fellowships from NASA and the National Science Foundation.

Nevertheless, my education was extremely rewarding. As an undergraduate at a very small but research-oriented department, I got a start in doing real science at an early age, taking photometric data on variable stars and building models based on their light curves [Carroll, Guinan, McCook and Donahue, 1991]. In graduate school I was surrounded by incredible resources in the Cambridge area, and made an effort to take advantage of them. My advisor, George Field, was a well-established theoretical astrophysicist, specializing in magnetohydrodynamics and the interstellar medium. He wasn’t an expert in the area that I wanted to study, the particle physics/cosmology connection, but he was curious about it. So we essentially learned things together, writing papers on alternatives to general relativity, the origin of intergalactic magnetic fields, and inflationary cosmology, including one of the first studies of a non-Lorentz-invariant modification of electromagnetism [Carroll, Field, and Jackiw 1990]. George also encouraged me to work with others, and I collaborated with fellow graduate students on topics in mathematical physics and topological defects, as well as with Edward Farhi and Alan Guth from MIT on closed timelike curves (what people on the street call “time machines”) in general relativity [Carroll, Farhi, and Guth 1992].

Setting a pattern that would continue to be followed down the line, I didn’t limit my studies to physics alone. In particular, my time at Villanova ignited an interest in philosophy that remains strong to this day. I received a B.A. degree in “General Honors” as well as my B.S. in Astronomy and Astrophysics, and also picked up a philosophy minor. At Harvard, I sat in on courses with John Rawls, Robert Nozick, and Barbara Johnson. While science was my first love and remains my primary passion, the philosophical desire to dig deep and ask fundamental questions continues to resonate strongly with me, and I’m convinced that familiarity with modern philosophy of science can be invaluable to physicists trying to tackle questions at the foundations of the discipline.

Postdoctoral, MIT and ITP: 1993-1999

For my first postdoctoral fellowship, in 1993 I moved just a bit down the road, from Harvard to MIT; three years later I would fly across the country to the prestigious Institute for Theoretical Physics at UC Santa Barbara. At both places I continued to do research in a somewhat scattershot fashion, working on a potpourri of topics in gravitation and field theory, usually in collaboration with other physicists my age rather than with the senior professors. I had great fun, writing papers on supergravity (the supersymmetric version of general relativity), topological defects, perturbations of the cosmic microwave background radiation, two-dimensional quantum gravity, interacting dark matter, and tests of the large-scale isotropy of the universe.

Although I was slow to catch on, the academic ground was shifting beneath me. The late 80’s and early 90’s, when I was a graduate student, were a sluggish time in particle physics and cosmology. There were few new experimental results; the string theory revolution, which generated so much excitement in the early 80’s, had not lived up to its initial promise; and astronomers continued to grapple with the difficulties in measuring properties of the universe with any precision. In such an environment, my disjointed research style was enough to get by. But as I was graduating with my Ph.D., things were changing. In 1992, results from the COBE satellite showed us for the first time the tiny temperature variations in the cosmic background radiation, representing primordial density fluctuations that gradually grew into galaxies and large-scale structure. In 1994-95, a series of theoretical breakthroughs launched the second superstring revolution. Suddenly, it was no longer good enough just to be considered smart and do random interesting things. Theoretical cosmologists dived into work on the microwave background, or at least models of inflation that made predictions for it; field theorists and string theorists were concentrating on dualities, D-branes, and the other shiny new toys that the latest revolution had brought them. In 1993 I was a hot property on the postdoctoral job market, with multiple offers from the very best places; by 1996 those offers had largely dried up, and I was very fortunate to be offered a position at a place as good as ITP.

Of course, nobody actually told me this in so many words, and it took me a while to figure it out. It’s a valuable lesson that I still take to heart – it’s not good enough to do work on things you think are interesting, you have to make real contributions that others recognize as interesting, as well. I don’t see this as merely a cynical strategy for academic career success. As enjoyable and stimulating as it may be to bounce from topic to topic, the chances of making a true and lasting contribution are larger for people who focus on an area with sufficient intensity to master it in all of its nuance.

What I needed was a topic that I personally found fascinating enough to investigate in real detail, and which the rest of the community recognized as being of central importance. Happily, the universe obligingly provided just the thing. In 1998, two teams of astronomers, one led by Saul Perlmutter and the other by Brian Schmidt and Adam Riess, announced an amazing result: our universe is not only expanding, it’s accelerating. Although in retrospect there were clues that this might have been the case, it took most of the community by complete surprise, and certainly stands as the most important discovery that has happened during my own career. Perlmutter, Schmidt, and Riess shared the Nobel Prize in 2011.

Like many other physicists, my imagination was immediately captured by the question of why the universe is accelerating. Through no planning of my own, I was perfectly placed to dive into the problem. Schmidt and Riess had both been fellow graduate students of mine while I was at Harvard (Brian was my officemate), and I had consulted with Perlmutter’s group early on in their investigations, so I was very familiar with the observations of Type Ia supernovae on which the discovery was based. The most obvious explanation for universal acceleration is that empty space itself carries a fixed energy density, what Einstein had labeled the “cosmological constant”; I happened to be a co-author, with Bill Press and Ed Turner, on a 1992 review article on the subject that had become a standard reference in the field [Carroll, Press, and Turner 1992], and which hundreds of scientists were now hurriedly re-reading. In 1997 Greg Anderson and I had proposed a model in which dark-matter particles would interact with an ambient field, growing in mass as the universe expands [Anderson and Carroll 1997]; this kind of model naturally leads to cosmic acceleration, and was an early idea for what is now known as “dark energy” (as well as for the more intriguing possibility that there may be a variety of interactions within a rich “dark sector”).

With that serendipitous preparation, I was able to throw myself into the questions of dark energy and the acceleration of the universe. After the discovery was announced, models were quickly proposed in which the dark energy was a dynamically-evolving field, rather than a constant energy density. I realized that most such models were subject to severe experimental constraints, because they would lead to new long-range forces and cause particle-physics parameters to slowly vary with time. I wrote a paper [Carroll 1998] pointing out these features, as well as suggesting symmetries that could help avoid them. I also collaborated with the Schmidt/Riess group on a pioneering paper [Garnavich et al. 1998] that placed limits on the rate at which the density of dark energy could change as the universe expands. With this expertise and these papers, I was suddenly a hot property on the job market once again; in 1999 I accepted a junior-faculty position at the University of Chicago.

University of Chicago: 1999-2006

While I was a postdoc, for the most part my intellectual energies were devoted completely to research. As a new faculty member, I had the responsibility and opportunity to expand my reach in a variety of ways. I had always loved teaching, and took to it with gusto, pioneering new courses (undergraduate general relativity, graduate cosmology), and winning a “Spherical Cow” teaching award from the physics graduate students. I developed my lecture notes for a graduate course in general relativity into a textbook, Spacetime and Geometry, which is now used widely in universities around the world. I helped organize a major international conference (Cosmo-02), served on a number of national committees (including the roadmap team for NASA’s Beyond Einstein program), and was a founding member and leader of the theory group at Chicago’s Kavli Institute for Cosmological Physics. I was successful at bringing in money, including fellowships from the Sloan and Packard Foundations. I made connections with professors in other departments, and started to work with Project Exploration, an outreach nonprofit led by Gabrielle Lyon and Chicago paleontologist Paul Sereno. With Classics professor Shadi Bartsch, I taught an undergraduate humanities course on the history of atheism. I became involved in the local theatre community, helping advise companies that were performing plays with scientific themes (Arcadia, Proof, Humble Boy). And in 2004 I took up blogging at my site Preposterous Universe, a fun and stimulating pastime that I continue to this day.

Research, of course, was still central, and I continued to concentrate on the challenge posed by the accelerating universe, especially in a series of papers with Mark Trodden (then at Syracuse, now at U. Penn.) and other collaborators. Among the more speculative ideas that had been proposed was “phantom energy,” a form of dark energy whose density actually increases as the universe expands. In one paper [Carroll, Hoffman, and Trodden 2003] we showed that such theories tended to be catastrophically unstable, and in another [Carroll, De Felice, and Trodden 2004] we showed that more complex models could nevertheless trick observers into concluding that the dark energy was phantom-like.

Our most influential work proposed a simple idea: that there isn’t any dark energy at all, but rather that general relativity breaks down on cosmological scales, where new dynamics can kick in [Carroll, Duvvuri, Trodden, and Turner 2004]. This became an extremely popular scenario within the theoretical cosmology community, launching a great deal of work devoted to investigating these “f(R) theories.” (The name refers to the fact that the dynamical equations are based on an arbitrary function of R, a quantity that measures the curvature of spacetime.) This work included papers by our group looking at long-term cosmological evolution in such models [Carroll et al. 2004], and studying the formation of structure in theories designed to be compatible with observational constraints on modified gravity [Carroll, Sawicki, Silvestri, and Trodden 2006].

Being of restless temperament, I couldn’t confine myself to only thinking about dark energy and modified gravity. I published on a number of topics at the interface of cosmology, field theory, and gravitation: observational constraints on alternative cosmologies, large extra dimensions of spacetime, supersymmetric topological defects, violations of fundamental symmetries, the origin of the matter/antimatter asymmetry, the connection between cosmology and the arrow of time. I found the last of these especially intriguing. To physicists, all of the manifold ways in which the past is different from the future (we age toward the future, we can remember the past, we can make choices toward the future) ultimately come back to the celebrated Second Law of Thermodynamics: in closed systems, entropy tends to increase over time. Back in the 19th century, Ludwig Boltzmann and others explained why entropy increases toward the future; what remains as a problem is why the entropy was ever so low in the past. That’s a question for cosmology, and presents a significant challenge to current models of the early universe. With graduate student Jennifer Chen, I proposed a novel scenario in which the Big Bang is not the beginning of the universe, but simply one event among many; in the larger multiverse, entropy increases without bound both toward the distant future and also in the very distant past [Carroll and Chen 2004, 2005]. Our picture was speculative, to say the least, but it serves as a paradigmatic example of attempts to find a purely dynamical basis for the Second Law, and continues to attract attention from both physicists and philosophers.
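The statistical reading of entropy that underlies this discussion is Boltzmann’s (a standard formula, included here only for orientation):

```latex
S = k_B \ln W ,
```

where $W$ counts the number of microstates compatible with a given macrostate. Entropy increase toward the future is then statistically natural – there are simply far more high-entropy configurations than low-entropy ones – and the real puzzle is why $W$ was so extraordinarily small for the early universe.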

In May, 2005, I was informed that I had been denied tenure. This came as a complete shock, in part because I had been given no warning that any trouble was brewing. I will never know precisely what was said at the relevant faculty meetings, and the explanations I received from different colleagues were notable mostly for the lack of any consistent narrative. But one thing that came through clearly was that my interest in doing things other than research had counted substantially against me. I was told that I came across as “more interested in writing textbooks,” and that perhaps I would be happier at a university that placed a “greater emphasis on pedagogy.”

An experience like that cannot help but inspire some self-examination, and I thought hard about what my next steps should be. I recognized that, if I wanted to continue in academia, my best chance of being considered successful would be to focus my energies as intently as possible in a single area of research, and cut down non-research activities to a minimum.

After a great deal of contemplation, I decided that such a strategy was exactly what I didn’t want to do. I would remain true to my own intellectual passions, and let the chips fall where they may.

Caltech and Beyond: 2006-

After the Chicago decision I was again very fortunate, when the physics department at Caltech quickly offered me a position as a research faculty member. It was a great opportunity, offering both a topflight research environment and an extraordinary amount of personal freedom. I took the job with two goals in mind: to expand my outreach and non-academic efforts even further, and to do innovative interdisciplinary research that would represent a true and lasting contribution.

To be brutally honest, since I arrived here in 2006 I have been much more successful at the former than at the latter (although I feel this is beginning to change). I’ve written two popular-level books: From Eternity to Here, on cosmology and the arrow of time, and The Particle at the End of the Universe, on the search for the Higgs boson at the Large Hadron Collider. Both were well-received, with Particle winning the Winton Prize from the Royal Society, the world’s most prestigious award for popular science books. I have produced two lecture courses for The Teaching Company, given countless public talks, and appeared on numerous TV programs, up to and including The Colbert Report. Living in Los Angeles, I’ve had the pleasure of serving as a science consultant on various films and TV shows, working with people such as Ron Howard, Kenneth Branagh, and Ridley Scott. My talk from TEDxCaltech, “Distant time and the hint of a multiverse,” recently passed a million total views. I helped organize a major interdisciplinary conference on the nature of time, as well as a much smaller workshop on philosophical naturalism that attracted some of the best people in the field (such as Steven Weinberg, Daniel Dennett, and Richard Dawkins). I was elected a Fellow of the American Physical Society and won the Gemant Award from the American Institute of Physics.

More substantively, I’ve developed my longstanding interest in philosophy in productive directions. Some of the physics questions that I find most interesting, such as the arrow of time or the measurement problem in quantum mechanics, are ones where philosophers have made a significant impact, and I have begun interacting and collaborating with several of the best in the business. In recent years the subject called “philosophy of cosmology” has become a new and exciting field, and I’ve had the pleasure of being at the center of many activities in the area; a conference next month has set aside a discussion session to examine the implications of the approach to the arrow of time that Jennifer Chen and I put forward a decade ago. My first major work in philosophy of science, a paper with graduate student Charles Sebens on how to derive the Born Rule in the many-worlds approach to quantum mechanics, was recently accepted into one of the leading journals in the field [Sebens and Carroll 2014]. I’ve also published invited articles on the implications of modern cosmology for religion, and participated in a number of popular debates on naturalism vs. theism.

At the same time, my research efforts have been productive but somewhat meandering. As usual, I have worked on a variety of interesting topics, including the use of effective field theory to understand the growth of large-scale structure, the dynamics of Lorentz-violating “aether” fields, how new forces can interact with dark matter, black hole entropy, novel approaches to dark-matter abundance, cosmological implications of a decaying Higgs field, and the role of rare fluctuations in the long-term evolution of the universe. Some of my work over these years includes papers of which I am quite proud; these include investigations of dynamical compactification of dimensions of space [Carroll, Johnson, and Randall 2009], possible preferred directions in the universe [Ackerman, Carroll, and Wise 2007; Erickcek, Kamionkowski, and Carroll 2008a, b], the prospect of a force similar to electromagnetism interacting with dark matter [Ackerman et al. 2008], and quantitative investigations of fine-tuning of cosmological evolution [Carroll and Tam 2010; Remmen and Carroll 2013, 2014; Carroll 2014]. Almost none of this work has been on my previous specialty, dark energy and the accelerating universe. After having put a great amount of effort into thinking about this (undoubtedly important) problem, I have become pessimistic about the prospect for an imminent theoretical breakthrough, at least until we have a better understanding of the basic principles of quantum gravity. This helps explain the disjointed nature of my research over the past few years, but has also driven home to me the need to find a new direction and tackle it with determination.

Very recently I’ve found such a focus, and in some sense I have finally started to do the research I was born to do. It has resulted from a confluence of my interests in cosmology, quantum mechanics, and philosophy, along with a curiosity about complexity theory that I have long nurtured but never really acted upon. This is the turn toward “emergence” that I mentioned at the beginning of this narrative, and elaborate on in my research plan. I go into greater detail there, but the basic point is that we need to construct a more reliable framework in which to connect the very foundations of physics – quantum mechanics, field theory, spacetime – to a multitude of higher-level phenomena, from statistical mechanics to organized structures. A substantial amount of work has already been put into such issues, but a number of very basic questions remain unanswered.

This represents an evolution of my research focus rather than a sudden break with my earlier work; many topics in cosmology and quantum gravity are intimately tied to issues of emergence, and I’ve already begun investigating some of these questions in different ways. One prominent theme is the emergence of the classical world out of an underlying quantum description. My papers with Sebens on the many-worlds approach are complementary to a recent paper I wrote with two graduate students on the nature of quantum fluctuations [Boddy, Carroll, and Pollack 2014]. There, we argued that configurations don’t actually “fluctuate into existence” in stationary quantum states, since there is no process of decoherence; this has important implications for cosmology in both the early and late universe. In another paper [Aaronson, Carroll, and Ouellette 2014], my collaborators and I investigated the relationship between entropy (which always increases in closed systems) and complexity (which first increases, then decreases as the system approaches equilibrium). Since the very notion of complexity does not have a universally-agreed-upon definition, any progress we can make in understanding its basic features is potentially very important.

I am optimistic that this new research direction will continue to expand and flourish, and that there is a substantial possibility of making important breakthroughs in the field. (My papers on the Born Rule and quantum fluctuations have already attracted considerable attention from influential physicists and philosophers – they don’t always agree with our unconventional conclusions, but I choose to believe that it’s just a matter of time.) I am diving into these new waters headfirst, including taking online courses (complexity theory from Santa Fe, programming and computer science from MIT) that will help me add skills that weren’t part of my education as a cosmologist. A Guggenheim Fellowship will be invaluable in aiding me in this effort.

My ten-year-old self was right: there is nothing more exciting than trying to figure out how nature works at a deep level. Having hit upon a promising new way of doing it, I can’t wait to see where it goes.

April 08, 2015

BackreactionNo, the black hole information loss problem has not been solved. Even if PRL thinks so.

This morning I got several requests for comments on this paper which apparently was published in PRL
    Radiation from a collapsing object is manifestly unitary
    Anshul Saini, Dejan Stojkovic
    Phys.Rev.Lett. 114 (2015) 11, 111301
    arXiv:1503.01487 [gr-qc]
The authors claim they find “that the process of gravitational collapse and subsequent evaporation is manifestly unitary as seen by an asymptotic observer.”

What do they do to arrive at this groundbreaking result that solves the black hole information loss problem in 4 PRL-approved pages? The authors calculate the particle production due to the time-dependent background of the gravitational field of a collapsing mass-shell. Using the mass-shell is a standard approximation. It is strictly speaking unnecessary, but it vastly simplifies the calculation and is often used. They use the functional Schrödinger formalism (see, e.g., section II of this paper for a brief summary), which is somewhat unusual, but its use shouldn’t make a difference for the outcome. They find the time evolution of the particle production is unitary.

In the picture they use, they do not explicitly employ Bogoliubov transformations, but I am sure one could reformulate their time evolution in terms of the normally used Bogoliubov coefficients, since both pictures have to be unitarily equivalent. There is an oddity in their calculation, which is that in their field expansion they don’t seem to have anti-particles, or else I am misreading their notation, but this might not matter much as long as one keeps track of all branch cuts.

Due to the unusual picture that they use, one unfortunately cannot directly compare their intermediate results with the standard calculation. In the most commonly used Schrödinger picture, the operators are time-independent. In the picture used in the paper, part of the time-dependence is pushed into the operators. Therefore I don’t know how to interpret these quantities, and the paper offers no explanation of what observables they might correspond to. I haven’t actually checked the steps of the calculation, but it all looks quite plausible in both method and functional dependence.

What’s new about this? Nothing really. The process of particle production in time-dependent background fields is unitary. The particles produced in the collapse process do form a pure state. They have to because it’s a Hamiltonian evolution. The reason for the black hole information loss is not that the particle production isn’t unitary – Bogoliubov transformations are by construction unitary – but that the outside observer in the end doesn’t get to see the full state. He only sees the part of the particles which manage to escape. The trouble is that these particles are entangled with the particles that are behind the horizon and eventually hit the singularity.

It is this eventual destruction of half of the state at the singularity that ultimately leads to a loss of information. That’s why remnants or baby-universes in a sense solve the information loss problem simply by preventing the destruction at the singularity, since the singularity is assumed to not be there. For many people this is a somewhat unsatisfactory solution because the outside observer still doesn’t have access to the information. However, since the whole state still exists in a remnant scenario, the time evolution remains unitary and no inconsistency with quantum mechanics ever arises. The new paper is not a remnant scenario; I am telling you this to explain that what causes the non-unitarity is not the particle production itself, but that the produced particles are entangled across the horizon, and part of them later become inaccessible, thereby leaving the outside observer with a mixed state (read: “information loss”).

The authors in the paper never trace out the part behind the horizon, so it’s not surprising they get a pure state. They just haven’t done the whole calculation. They write (p. 3) “Original Hawking radiation density matrix contains only the diagonal elements while the cross-terms are absent.” The original density matrix of the (full!) Hawking radiation does contain off-diagonal terms; it’s a fully entangled state. It becomes a diagonal, mixed matrix only after throwing out the particles behind the horizon. One cannot directly compare the two matrices though, because in the paper they use a different basis than one normally does.
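The point that a pure entangled state becomes a mixed, diagonal state only once one partner is traced out can be checked in a few lines. Here is a minimal numpy sketch, with a two-qubit Bell state standing in for an entangled pair of modes; this is a toy illustration, not the calculation in the paper.

```python
import numpy as np

# Bell state (|00> + |11>)/sqrt(2): a toy stand-in for an entangled pair
# of modes, one escaping to infinity and one falling behind the horizon.
psi = np.zeros(4)
psi[0] = psi[3] = 1 / np.sqrt(2)

# Density matrix of the full pure state: it has off-diagonal terms
# (rho_full[0, 3] = 0.5), and its purity Tr(rho^2) equals 1.
rho_full = np.outer(psi, psi)

# "Lose" the infalling partner: partial trace over the second qubit.
rho_out = np.einsum('abcb->ac', rho_full.reshape(2, 2, 2, 2))

print(rho_out)                       # diag(0.5, 0.5): off-diagonal terms gone
print(np.trace(rho_out @ rho_out))   # purity ~ 0.5 < 1: a mixed state
```

The full state evolves unitarily and stays pure; only the reduced state seen by the outside observer, obtained by the partial trace, is mixed. That last step is exactly what the paper omits.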

So, in summary, they redid a textbook calculation by a different method and claimed they got a different result. That should be a warning sign. This is a problem more than 30 years old; thousands of papers have been written about it. What are the odds that all these calculations have simply been wrong? Another warning sign is that they never explain just why they manage to solve the problem. They try to explain that their calculation has something in common with other calculations (about entanglement in the outgoing radiation only), but I cannot see any connection, and they don’t explain it either.

The funny thing about the paper is that I think the calculation, to the extent that they do it, is actually correct. But then the authors omit the last step, which means they do not, as stated in the quote above, calculate what the asymptotic observer sees. The conclusion that this solves the black hole information problem is then a classical non-sequitur.

April 07, 2015

n-Category Café Information and Entropy in Biological Systems

I’m helping run a workshop on Information and Entropy in Biological Systems at NIMBioS, the National Institute of Mathematical and Biological Synthesis, which is in Knoxville Tennessee.

I think you’ll be able to watch live streaming video of this workshop while it’s taking place from Wednesday April 8th to Friday April 10th. Later, videos will be made available in a permanent location.

To watch the workshop live, go here. Go down to where it says

Investigative Workshop: Information and Entropy in Biological Systems

Then click where it says live link. There’s nothing there now, but I’m hoping there will be when the show starts!

Below you can see the schedule of talks and a list of participants. The hours are in Eastern Daylight Time: add 4 hours to get Greenwich Mean Time. The talks start at 10 am EDT, which is 2 pm GMT.


There will be 1½ hours of talks in the morning and 1½ hours in the afternoon for each of the 3 days, Wednesday April 8th to Friday April 10th. The rest of the time will be for discussions on different topics. We’ll break up into groups, based on what people want to discuss.

Each invited speaker will give a 30-minute talk summarizing the key ideas in some area, not their latest research so much as what everyone should know to start interesting conversations. After that, 15 minutes for questions and/or coffee.

Here’s the schedule. Where there are links, you can already see slides or other material for the talks!

Wednesday April 8

• 9:45-10:00 — the usual introductory fussing around.

• 10:00-10:30 — John Baez, Information and entropy in biological systems.

• 10:30-11:00 — questions, coffee.

• 11:00-11:30 — Chris Lee, Empirical information, potential information and disinformation.

• 11:30-11:45 — questions.

• 11:45-1:30 — lunch, conversations.

• 1:30-2:00 — John Harte, Maximum entropy as a foundation for theory building in ecology.

• 2:00-2:15 — questions, coffee.

• 2:15-2:45 — Annette Ostling, The neutral theory of biodiversity and other competitors to the principle of maximum entropy.

• 2:45-3:00 — questions, coffee.

• 3:00-5:30 — break up into groups for discussions.

• 5:30 — reception.

Thursday April 9

• 10:00-10:30 — David Wolpert, The Landauer limit and thermodynamics of biological organisms.

• 10:30-11:00 — questions, coffee.

• 11:00-11:30 — Susanne Still, Efficient computation and data modeling.

• 11:30-11:45 — questions.

• 11:45-1:30 — lunch, conversations.

• 1:30-2:00 — Matina Donaldson-Matasci, The fitness value of information in an uncertain environment.

• 2:00-2:15 — questions, coffee.

• 2:15-2:45 — Roderick Dewar, Maximum entropy and maximum entropy production in biological systems: survival of the likeliest?

• 2:45-3:00 — questions, coffee.

• 3:00-6:00 — break up into groups for discussions.

Friday April 10

• 10:00-10:30 — Marc Harper, Information transport and evolutionary dynamics.

• 10:30-11:00 — questions, coffee.

• 11:00-11:30 — Tobias Fritz, Characterizations of Shannon and Rényi entropy.

• 11:30-11:45 — questions.

• 11:45-1:30 — lunch, conversations.

• 1:30-2:00 — Christina Cobbold, Biodiversity measures and the role of species similarity.

• 2:00-2:15 — questions, coffee.

• 2:15-2:45 — Tom Leinster, Maximizing biological diversity.

• 2:45-3:00 — questions, coffee.

• 3:00-6:00 — break up into groups for discussions.


Here are the confirmed participants, just so you can get a sense of who is involved:

• John Baez - mathematical physicist.

• Romain Brasselet - postdoc in cognitive neuroscience knowledgeable about information-theoretic methods and methods of estimating entropy from samples of probability distributions.

• Katharina Brinck - grad student at Centre for Complexity Science at Imperial College; did masters at John Harte’s lab, where she extended his Maximum Entropy Theory of Ecology (METE) to trophic food webs, to study how entropy maximization on the macro scale together with MEP on the scale of individuals drive the structural development of model ecosystems.

• Christina Cobbold - mathematical biologist, has studied the role of species similarity in measuring biodiversity.

• Troy Day - mathematical biologist, works with population dynamics, host-parasite dynamics, etc.; influential and could help move population dynamics to a more information-theoretic foundation.

• Roderick Dewar - physicist who studies the principle of maximal entropy production.

• Barrett Deris - MIT postdoc studying the factors that influence the evolvability of drug resistance in bacteria.

• Charlotte de Vries - a biology master’s student who studied particle physics to the master’s level at Oxford and the Perimeter Institute. Interested in information theory.

• Matina Donaldson-Matasci - a biologist who studies information, uncertainty and collective behavior.

• Chris Ellison - a postdoc who worked with James Crutchfield on “information-theoretic measures of structure and memory in stationary, stochastic systems - primarily, finite state hidden Markov models”. He coauthored Intersection information based on common randomness. The idea: “The introduction of the partial information decomposition generated a flurry of proposals for defining an intersection information that quantifies how much of “the same information” two or more random variables specify about a target random variable. As of yet, none is wholly satisfactory.” Works on mutual information between organisms and environment (along with David Krakauer and Jessica Flack), and also entropy rates.

• Cameron Freer - MIT postdoc in Brain and Cognitive Sciences working on maximum entropy production principles, algorithmic entropy etc.

• Tobias Fritz - a physicist who has worked on “resource theories” and on characterizations of Shannon and Rényi entropy.

• Dashiell Fryer - works with Marc Harper on information geometry and evolutionary game theory.

• Michael Gilchrist - an evolutionary biologist studying how errors and costs of protein translation affect the codon usage observed within a genome. Works at NIMBioS.

• Manoj Gopalkrishnan - an expert on chemical reaction networks who understands entropy-like Lyapunov functions for these systems.

• Marc Harper - works on evolutionary game theory using ideas from information theory, information geometry, etc.

• John Harte - an ecologist who uses the maximum entropy method to predict the structure of ecosystems.

• Ellen Hines - studies habitat modeling and mapping for marine endangered species and ecosystems, sea level change scenarios, documenting of human use and values. Her lab has used MaxEnt methods.

• Elizabeth Hobson - behavior ecology postdoc developing methods to quantify social complexity in animals. Works at NIMBioS.

• John Jungk - works on graph theory and biology.

• Chris Lee - in bioinformatics and genomics; applies information theory to experiment design and evolutionary biology.

• Maria Leites - works on dynamics, bifurcations and applications of coupled systems of non-linear ordinary differential equations with applications to ecology, epidemiology, and transcriptional regulatory networks. Interested in information theory.

• Tom Leinster - a mathematician who applies category theory to study various concepts of ‘magnitude’, including biodiversity and entropy.

• Timothy Lezon - a systems biologist in the Drug Discovery Institute at Pitt, who has used entropy to characterize phenotypic heterogeneity in populations of cultured cells.

• Maria Ortiz Mancera - statistician working at CONABIO, the National Commission for Knowledge and Use of Biodiversity, in Mexico.

• Yajun Mei - statistician who works with Kullback-Leibler divergence and efficient computation of entropy for two-state hidden Markov models.

• Robert Molzon - mathematical economist who has studied deterministic approximation of stochastic evolutionary dynamics.

• David Murrugarra - works on discrete models in mathematical biology; interested in learning about information theory.

• Annette Ostling - studies community ecology, focusing on the influence of interspecific competition on community structure, and what insights patterns of community structure might provide about the mechanisms by which competing species coexist.

• Connie Phong - grad student at Chicago’s Institute of Genomics and Systems Biology, working on how “certain biochemical network motifs are more attuned than others at maintaining strong input to output relationships under fluctuating conditions.”

• Petr Plechak - works on information-theoretic tools for estimating and minimizing errors in coarse-graining stochastic systems. Wrote “Information-theoretic tools for parametrized coarse-graining of non-equilibrium extended systems”.

• Blake Pollard - physics grad student working with John Baez on various generalizations of Shannon and Rényi entropy, and how these entropies change with time in Markov processes and open Markov processes.

• Timothee Poisot - works on species interaction networks; developed a “new suite of tools for probabilistic interaction networks”.

• Richard Reeve - works on biodiversity studies and the spread of antibiotic resistance. Ran a program on entropy-based biodiversity measures at a mathematics institute in Barcelona.

• Rob Shaw - works on entropy and information in biotic and pre-biotic systems.

• Matteo Smerlak - postdoc working on nonequilibrium thermodynamics and its applications to biology, especially population biology and cell replication.

• Susanne Still - a computer scientist who studies the role of thermodynamics and information theory in prediction.

• Alexander Wissner-Gross - Institute Fellow at the Harvard University Institute for Applied Computational Science and Research Affiliate at the MIT Media Laboratory, interested in lots of things.

• David Wolpert - works at the Santa Fe Institute on i) information theory and game theory, ii) the second law of thermodynamics and dynamics of complexity, iii) multi-information source optimization, iv) the mathematical underpinnings of reality, v) evolution of organizations.

• Matthew Zefferman - works on evolutionary game theory, institutional economics and models of gene-culture co-evolution. No work on information, but a postdoc at NIMBioS.

April 06, 2015

n-Category Café Five Quickies

I’m leaving tomorrow for an “investigative workshop” on Information and Entropy in Biological Systems, co-organized by our own John Baez, in Knoxville, Tennessee. I’m excited! And I’m hoping to learn a lot.

A quick linkdump before I go:

  • Reflexive completion   Tom Avery and I are writing a paper on Isbell conjugacy and reflexive completion (also called Isbell or MacNeille completion). I gave a talk on it last week at the British Mathematical Colloquium in Cambridge, which is somewhat similar to the Joint Meetings in the US.

    The concepts of Isbell conjugacy and reflexive completion are at the same very basic, primitive categorical level as the Yoneda lemma. They rely on nothing more than the notions of category, functor and natural transformation. But they’re immeasurably less well-known than the Yoneda lemma.

    As for Tom, you may remember him from the Kan Extension Seminar. He’s doing a PhD with me here in Edinburgh.

  • Mathematicians accepting and resisting overtures from GCHQ   The British Mathematical Colloquium was part-sponsored by the UK surveillance agency GCHQ, which I’ve written about many times before. They paid for one of the plenary talks and some of the costs of student attendance. (They didn’t pay for me.)

    I began my talk by expressing both my thanks to the organizers and my disappointment in them for allowing an extremist organization like GCHQ to buy itself a presence at the conference. As I understand it, the deal was that in return for the money it paid, GCHQ got a recruiting platform for the Heilbronn Institute (their academic brand). For instance, the conference hosted a Heilbronn recruitment session, advertised in the blurb that every delegate received.

    And I said that although one might think it irregular for me to be taking a few moments of my seminar to talk about this, consider the fact that GCHQ bought itself three whole days in which to create the impression that working for an agency of mass surveillance is a normal, decent thing for a mathematician to do.

    There were several encouraging signs, though:

    • An important person within the organizational structure of the British Mathematical Colloquium (an annual event) exhibited clear awareness that GCHQ involvement in the BMC was controversial. This is a start, though it requires people to keep pointing out that committees, as well as individuals, are making a political choice when they cooperate with the intelligence agencies.
    • The conference website advertised that GCHQ would have a stand/booth at the conference, but in the end they didn’t — perhaps knowing that they’d have been asking for trouble.
    • I heard that a certain prominent British mathematician had refused requests to review from both GCHQ and the NSA.
    • And as usual, just about every mathematician I spoke to was opposed to GCHQ mass surveillance.
  • The Euler characteristic of an algebra   I recently gave a couple of talks entitled “The Euler characteristic of an algebra”: one for category theorists and one for algebraists. This represents joint work with Joe Chuang and Alastair King, which I hope we’ll have written up soon. I wrote a couple of posts warming up to this result, though I never got round to the result itself.

  • Review of Nick Gurski’s higher categories book   I wrote a review of Nick Gurski’s book Coherence in Three-Dimensional Category Theory. The review will appear in the Bulletin of the London Mathematical Society.

  • Are lectures the best way to teach?   Nick also has an opinion piece in The Guardian (joint with his colleague Sam Marsh), answering the question: are lectures the best way to teach students?

Matt StrasslerThe LHC restarts — in a manner of speaking —

As many of you will have already read, the Large Hadron Collider [LHC], located at the CERN laboratory in Geneva, Switzerland, has “restarted”. Well, a restart of such a machine, after two years of upgrades, is not a simple matter, and perhaps we should say that the LHC has “begun to restart”. The process of bringing the machine up to speed begins with one weak beam of protons at a time — with no collisions, and with energy per proton at less than 15% of where the beams were back in 2012. That’s all that has happened so far.

If that all checks out, then the LHC operators will start trying to accelerate a beam to higher energy — eventually to record energy, 40% more than in 2012, when the LHC last was operating.  This is the real test of the upgrade; the thousands of magnets all have to work perfectly. If that all checks out, then two beams will be put in at the same time, one going clockwise and the other counterclockwise. Only then, if that all works, will the beams be made to collide — and the first few collisions of protons will result. After that, the number of collisions per second will increase, gradually. If everything continues to work, we could see the number of collisions become large enough — approaching 1 billion per second — to be scientifically interesting within a couple of months. I would not expect important scientific results before late summer, at the earliest.

This isn’t to say that the current milestone isn’t important. There could easily have been (and there almost were) magnet problems that could have delayed this event by a couple of months. But delays could also occur over the coming weeks… so let’s not expect too much in 2015. Still, the good news is that once the machine gets rolling, be it in May, June, July or beyond, we have three to four years of data ahead of us, which will offer us many new opportunities for discoveries, anticipated and otherwise.

One thing I find interesting and odd is that many of the news articles reported that finding dark matter is the main goal of the newly upgraded LHC. If this is truly the case, then I, and most theoretical physicists I know, didn’t get the memo. After all,

  • dark matter could easily be of a form that the LHC cannot produce, (for example, axions, or particles that interact only gravitationally, or non-particle-like objects)
  • and even if the LHC finds signs of something that behaves like dark matter (i.e. something that, like neutrinos, cannot be directly detected by LHC’s experiments), it will be impossible for the LHC to prove that it actually is dark matter.  Proof will require input from other experiments, and could take decades to obtain.

What’s my own understanding of LHC’s current purpose? Well, based on 25 years of particle physics research and ten years working almost full time on LHC physics, I would say (and I do say, in my public talks) that the coming several-year run of the LHC is for the purpose of

  1. studying the newly discovered Higgs particle in great detail, checking its properties very carefully against the predictions of the “Standard Model” (the equations that describe the known apparently-elementary particles and forces)  to see whether our current understanding of the Higgs field is complete and correct, and
  2. trying to find particles or other phenomena that might resolve the naturalness puzzle of the Standard Model, a puzzle which makes many particle physicists suspicious that we are missing an important part of the story, and
  3. seeking either dark matter particles or particles that may be shown someday to be “associated” with dark matter.

Finding dark matter itself is a worthy goal, but the LHC may simply not be the right machine for the job, and certainly can’t do the job alone.

Why the discrepancy between these two views of LHC’s purpose? One possibility is that since everybody has heard of dark matter, the goal of finding it is easier for scientists to explain to journalists, even though it’s not central.  And in turn, it is easier for journalists to explain this goal to readers who don’t care to know the real situation.  By the time the story goes to press, all the modifiers and nuances uttered by the scientists are gone, and all that remains is “LHC looking for dark matter”.  Well, stay tuned to this blog, and you’ll get a much more accurate story.

Fortunately a much more balanced story did appear in the BBC, due to Pallab Ghosh…, though as usual in Europe, with rather too much supersymmetry and not enough of other approaches to the naturalness problem.   Ghosh also does mention what I described in the italicized part of point 3 above — the possibility of what he calls the “wonderfully evocatively named `dark sector’ ”.  [Mr. Ghosh: back in 2006, well before these ideas were popular, Kathryn Zurek and I named this a “hidden valley”, potentially relevant either for dark matter or the naturalness problem. We like to think this is a much more evocative name.]  A dark sector/hidden valley would involve several types of particles that interact with one another, but interact hardly at all with anything that we and our surroundings are made from.  Typically, one of these types of particles could make up dark matter, but the others would be unsuitable for making dark matter.  So why are these others important?  Because if they are produced at the LHC, they may decay in a fashion that is easy to observe — easier than dark matter itself, which simply exits the LHC experiments without a trace, and can only be inferred from something recoiling against it.   In other words, if such a dark sector [or more generally, a hidden valley of any type] exists, the best targets for LHC’s experiments (and other experiments, such as APEX or SHiP) are often not the stable particles that could form dark matter but their unstable friends and associates.

But this will all be irrelevant if the collider doesn’t work, so… first things first.  Let’s all wish the accelerator physicists success as they gradually bring the newly powerful LHC back into full operation, at a record energy per collision and eventually a record collision rate.

Filed under: Dark Matter, LHC News, Particle Physics Tagged: DarkMatter, Higgs, LHC, particle physics

April 05, 2015

Jordan EllenbergWhy would anyone want to become a security analyst or portfolio manager?

In today’s Wall Street Journal, Jason Zweig frets about the popularity of index funds:

If investors keep turning their money over to machines that have no opinion about which stocks or bonds are better than others, why would anyone want to become a security analyst or portfolio manager? Who will set the prices of investments? What will stop all stocks and bonds from going up and down together? Who will have the judgment and courage to step in and buy during a crash or to sell during a mania?

First of all, it hardly seems like the entire stock market is liable to become one big Vanguard fund:  as Zweig says later in the piece, “indexing accounts for 11.5% of the total value of the U.S. stock market.”  Big institutional actors have special needs which give them reason to actively manage their funds.  And an institution like Wisconsin’s pension fund, which manages about $100b, isn’t giving away 2% of its money per year to a manager, the way you or I would.  (This document says we spent $52.5 million in external management fees in 2013; percentagewise, that’s less than I give Vanguard for my index.  Update:  I screwed this up, as a commenter points out.  Our external management fees increased by $52.5m.  They present this as a substantial percentage of the total but I can’t find the actual amount of the fee.)
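To see why the 2% matters so much to a retail investor, here is a back-of-the-envelope compounding comparison. The numbers are my own illustrative assumptions (a 7% gross annual return, a 0.1% index fee versus a 2% active fee over 30 years), not figures from Zweig's piece or the Wisconsin fund.

```python
# Hypothetical: same 7% gross annual return, compounded for 30 years,
# with a 0.1% index-fund fee versus a 2% active-management fee.
years, gross = 30, 1.07

index_wealth = (gross - 0.001) ** years    # final wealth per dollar invested
active_wealth = (gross - 0.02) ** years

# Fraction of final wealth the fee gap eats up:
print(round(1 - active_wealth / index_wealth, 2))  # → 0.42
```

Under these assumptions the seemingly small annual fee gap compounds into roughly 40% of final wealth, which is why a large fund negotiating fees well below the retail 2% is such a different proposition.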

But second:  am I supposed to be upset if it becomes less attractive to become a portfolio manager?  One out of six Harvard seniors goes into finance.  Is that a good use of human capital?

(By the way, here’s a startling stat from that Harvard survey:  “None of the women going into finance said they would earn $90,000 or more, compared to 29 percent of men in finance.”  Is that because men are overpaid, or because we lie about our salaries the same way we lie about sex?)




April 04, 2015

Scott AaronsonQuantum Machine Learning Algorithms: Read the Fine Print

So, I’ve written a 4-page essay of that title, which examines the recent spate of quantum algorithms for clustering, classification, support vector machines, and other “Big Data” problems that grew out of a 2008 breakthrough on solving linear systems by Harrow, Hassidim, and Lloyd, as well as the challenges in applying these algorithms to get genuine exponential speedups over the best classical algorithms.  An edited version of the essay will be published as a Commentary in Nature Physics.  Thanks so much to Iulia Georgescu at Nature for suggesting that I write this.

Update (April 4, 2015): The piece has now been published.

Geraint F. LewisMusings on an academic career - Part 2

A long rainy Easter weekend in Sydney. And, as promised, here are some additional musings on an academic career. I thought I would tackle a big one and present the question that all ECRs and wannabe-academics should be asking themselves from day one, and it's a question that all academics should ask themselves periodically (where the period of periodically can be as short as 5 minutes). Namely, "Do I really want an academic career?"

Now, I am sure that some of you reading this, especially the more junior researchers among you, will be thinking "Well, duh! Ain't that obvious?" But, in fact, I think this goes to the heart of many of the touted problems with regard to academia, and it's a problem of our own making, and I mean all of us.

But before I start, the usual caveats apply. While this year marks two decades since I got my PhD and so I have a long history with academia, and while I am a professor at a large, prestigious university, I have limited experience of the entire world, and what I write here is a reflection of what I have seen in this time. Furthermore, a lot of what is below has accumulated over the years, and I did not get to where I am through the execution of some well developed plan; I got here through sweat, stress and lucky breaks. Of course, my experience is limited to science, physics and astronomy. It could be very different for the historians and economists out there.

So, buyer beware, although, honestly, I wish I had realised a lot of this a long time ago.

The Romantic Academic
I am pretty sure that if I did a straw-poll of researchers on why they are in this game, the answer would be very similar. When we start off as undergraduates we get a taste of research projects, thinking that we are unlocking the mysteries of the universe (without realising that we are doing research projects with training wheels attached). Research is fun, it's exciting, it's stressful and, when it works, it can be fulfilling. I love doing research. I love thinking about all sorts of different things, trying new methods, spending the afternoon with someone at the whiteboard scribbling and erasing. Hey, it may not cure cancer, but I will understand the chemical composition of clouds of gas ten billion light years away!

I don't know about everyone else reading this, but once I was bitten by the research bug, I could not let it go. I have a hard time thinking about anything else (although I do not only research astronomy and physics - but that's for another story). I can't imagine a day where I don't learn something new. The thought of a "job" out there banging widgets, working in finance, or running a company, just strikes us as boring (although often we are making the case from ignorance, as we really don't know what these jobs consist of). Clearly, we want a career that still allows us to continue down this research road, and looking around we see the Drs and Professors of academia who supervise and employ us, and it is obvious that we need to follow the same trajectory.

However, everything is not as it seems, but more on that in a moment.

A Life in Research
But there is a way to have a long and fruitful career in research, a career where you can do what you want, when you want, attend the conferences you want, with nobody to answer to but yourself. Such a career is the dream of virtually every academic I have ever met, and it is possible. Want to know the secret?

Well, skip the PhD and spend your twenties making your fortune. Get a few million in the bank by the time you are thirty and then live off your investments. Effectively retire into research and become a "gentleman scientist" (and they were virtually all men) of a bygone age.

You might be spluttering on your corn flakes at this point and be thinking that I have gone mad. But think about it.

Why do you do a PhD? To learn, of course, but you don't need to do this in the context of a degree do you? You could learn the same stuff in your living room with access to the internet and a boxset of "House of Cards" in the background. Maybe you are after the title, but what is that really for? Well, it's the next step towards an academic career, but has a journal ever asked you if you have a PhD before considering your paper? To legitimise yourself as a researcher? The Dr in front of your name means little if you don't have publications to back it all up.

So, really, why do it? If you are going to fund yourself, why do you need it? If you really want one, do one after you have made your fortune, but I don't think it is really necessary.

Now you are probably thinking that you can't do that. You don't understand finances and investing and all that stuff; it all sounds very complicated. But you are supposed to be smart, and you should realise that there are many people out there who make their fortunes without a PhD in astrophysics or nano-photonics or whatever. What is stopping you is that you haven't learnt how it works (but, in the end, it is just more research). Yes, there is a risk that you won't make it, but risk is a topic we'll come back to later.

But after ten years of graft, you should be set up to do what you like for the rest of your life. Impossible? Not really. It does happen.
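To put some rough numbers on "live off your investments", here is a minimal back-of-the-envelope sketch in Python. Every figure in it is my own made-up assumption for illustration (a nest egg of a couple of million, a rule-of-thumb 4% annual draw, a 3% real return), not financial advice and not a claim about what any particular career would yield.

```python
def annual_income(nest_egg, withdrawal_rate=0.04):
    """Income from a fixed-percentage draw on the nest egg."""
    return nest_egg * withdrawal_rate

def years_the_pot_lasts(nest_egg, spend_per_year, real_return=0.03, max_years=100):
    """Years until a fixed annual spend exhausts a pot growing at real_return.

    Returns max_years if the pot survives the whole horizon.
    """
    pot = nest_egg
    for year in range(1, max_years + 1):
        pot = pot * (1 + real_return) - spend_per_year
        if pot <= 0:
            return year
    return max_years

# A two-million nest egg at a 4% draw gives 80,000 a year...
print(annual_income(2_000_000))
# ...and, at a modest 3% real return, that spending rate slowly
# erodes the pot rather than exhausting it quickly.
print(years_the_pot_lasts(2_000_000, 80_000))
```

The point of the sketch is only that the arithmetic is not mysterious: whether the pot lasts comes down to the gap between the draw rate and the real return, which is exactly the kind of thing "just more research" can teach you.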

Academic SuperStar
OK, so you don't want to make your fortune; you want to continue into academia and have the next best thing: doing research as an academic. Well, to be able to devote yourself to research, and only research, you need either to get yourself a fantastic fellowship from a grant agency (and acknowledge that these only last a limited amount of time) or to get into a completely research-focused department.

Such positions can be relatively cushy, with funding for your salary, for travel, for research costs and for people. You don't have complete autonomy, as you will have had to write a proposal that was assessed and that you will have to follow, and there will be lovely middle-management people whose role it is to watch you spending your funds (or at least to ensure you are spending them on what you were supposed to), but it is not bad.

And, as you can guess, such positions are extremely competitive, and you had better have all of the things on your CV that people are expecting: lots of papers, lots of citations, prizes, and good connections, with the right people saying the right things about you. In short, you had better be pretty smart and on the ball, especially in terms of career management. I'll choose my words carefully here, but we all know that some are better at gathering those career-boosting bits and pieces than others. Either way, it takes a lot of management on top of everything else.

Of course, as well as being very competitive, such positions are also relatively rare, and even if you have all those bits and pieces you might not get one. You might have to become an everyday academic.

Everyday Academia
So this brings us to people like me, everyday academics. And if you look around the world, on the web and in the news, we appear to be a quite whiney lot, with lots of complaints about workload and the lack of time. The life of a modern everyday academic is anything but hours of musing about the mysteries of the Universe; time is consumed by administration and teaching (two things that have hard, finite deadlines that cannot be missed), plus all of the roles that we have not been trained in, including financial and people management. The reward for research success, such as attracting more grants and students, is typically more work.

And, if we go back to the start, the reason that we got into this game was research, but time for research often becomes vanishingly small once one gets the coveted permanent position. It is funny that I am productive in terms of output and grant success, but it is only because I have a group of students and postdocs to work with (and, in fact, working with these people remains the highlight of my everyday academia).

Not only that, but I realise that I am lucky to have got here at all, as many able researchers leave the field as the opportunities become rarer and rarer, and the competition becomes fiercer. I actually find it funny that people who are so risk-averse that they would not seriously consider alternative careers, or making their own fortune to support themselves, continue blindly down one of the riskiest pathways of all, namely that of trying to secure a permanent position at a good university.

Wrapping it all up
I've written a lot here, but for the students and ECRs I would like you to think about whether the academic career (and it is most likely going to be that of an everyday academic if you stay in the field) is really what you want. If not, then it is never too early to start managing and directing your career to give yourself the best chance of getting what you want.

In closing, I often hear that those who leave at the various stages towards becoming an everyday academic have somehow failed, but in reality I wonder if the real failure is us successes finding ourselves locked into careers that squeeze the prospect of doing hands-on research out of the day.

Why don't I put my money where my mouth is and walk so I can spend my copious leisure time researching what I want? Maybe I will, maybe I will.