Planet Musings

September 04, 2015

John Baez: A Compositional Framework for Markov Processes


This summer my students Brendan Fong and Blake Pollard visited me at the Centre for Quantum Technologies, and we figured out how to understand open continuous-time Markov chains! I think this is a nice step towards understanding the math of living systems.

Admittedly, it’s just a small first step. But I’m excited by this step, since Blake and I have been trying to get this stuff to work for a couple years, and it finally fell into place. And we think we know what to do next.

Here’s our paper:

• John C. Baez, Brendan Fong and Blake S. Pollard, A compositional framework for open Markov processes.

And here’s the basic idea…

Open detailed balanced Markov processes

A continuous-time Markov chain is a way to specify the dynamics of a population which is spread across some finite set of states. Population can flow between the states. The larger the population of a state, the more rapidly population flows out of the state. Because of this property, under certain conditions the populations of the states tend toward an equilibrium where at any state the inflow of population is balanced by its outflow.
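To make the "flow is proportional to population" rule concrete, here is a minimal sketch that integrates a two-state chain with forward Euler steps. The rate constants, the starting populations, the step size and the function name `simulate` are all assumptions made for illustration; nothing here is taken from the paper.

```python
def simulate(rates, p, dt=0.01, steps=2000):
    """Forward-Euler integration of dp_i/dt = inflow_i - outflow_i.

    rates[(i, j)] is the rate constant for flow from state i to state j,
    and the flow along that edge is rates[(i, j)] * p[i].
    """
    p = list(p)
    for _ in range(steps):
        dp = [0.0] * len(p)
        for (i, j), r in rates.items():
            flow = r * p[i]   # larger population means faster outflow
            dp[i] -= flow
            dp[j] += flow
        p = [x + dt * d for x, d in zip(p, dp)]
    return p


# Hypothetical two-state chain: 0 -> 1 at rate 2, 1 -> 0 at rate 1.
rates = {(0, 1): 2.0, (1, 0): 1.0}
p = simulate(rates, [3.0, 0.0])
print(p)  # close to [1.0, 2.0], where 2.0 * p[0] balances 1.0 * p[1]
```

The total population stays at 3 throughout, and the populations settle where the inflow to each state matches its outflow.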

In applications to statistical mechanics, we are often interested in equilibria such that for any two states connected by an edge, say i and j, the flow from i to j equals the flow from j to i. A continuous-time Markov chain with a chosen equilibrium having this property is called ‘detailed balanced’.

I’m getting tired of saying ‘continuous-time Markov chain’, so from now on I’ll just say ‘Markov process’, just because it’s shorter. Okay? That will let me say the next sentence without running out of breath:

Our paper is about open detailed balanced Markov processes.

Here’s an example:

The detailed balanced Markov process itself consists of a finite set of states together with a finite set of edges between them, with each state i labelled by an equilibrium population q_i > 0, and each edge e labelled by a rate constant r_e > 0.

These populations and rate constants are required to obey an equation called the ‘detailed balance condition’. This equation means that in equilibrium, the flow from i to j equals the flow from j to i. Do you see how it works in this example?
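Since the picture is not reproduced here, the following sketch checks the detailed balance condition on a made-up three-state example. The populations, the rate constants and the function names are hypothetical; the only thing taken from the text is the rule that the equilibrium flow along an edge is its rate constant times the population of its source state.

```python
# Hypothetical data: equilibrium populations q_i and edges (source, target, rate constant).
q = {"a": 1.0, "b": 2.0, "c": 4.0}
edges = [
    ("a", "b", 2.0), ("b", "a", 1.0),
    ("b", "c", 2.0), ("c", "b", 1.0),
]


def equilibrium_flows(q, edges):
    """Total equilibrium flow for each ordered pair (i, j): sum of r_e * q_i over edges i -> j."""
    flows = {}
    for source, target, rate in edges:
        flows[(source, target)] = flows.get((source, target), 0.0) + rate * q[source]
    return flows


def is_detailed_balanced(q, edges, tol=1e-12):
    """True when the flow from i to j equals the flow from j to i for every pair."""
    flows = equilibrium_flows(q, edges)
    pairs = set(flows) | {(j, i) for (i, j) in flows}
    return all(
        abs(flows.get((i, j), 0.0) - flows.get((j, i), 0.0)) <= tol
        for (i, j) in pairs
    )


print(is_detailed_balanced(q, edges))                       # True
print(is_detailed_balanced(q, edges + [("a", "c", 1.0)]))   # False: nothing flows back from c to a
```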

To get an ‘open’ detailed balanced Markov process, some states are designated as inputs or outputs. In general each state may be specified as both an input and an output, or as inputs and outputs multiple times. See how that’s happening in this example? It may seem weird, but it makes things work better.

People usually say Markov processes are all about how probabilities flow from one state to another. But we work with un-normalized probabilities, which we call ‘populations’, rather than probabilities that must sum to 1. The reason is that in an open Markov process, probability is not conserved: it can flow in or out at the inputs and outputs. We allow it to flow both in and out at both the input states and the output states.

Our most fundamental result is that there’s a category \mathrm{DetBalMark} where a morphism is an open detailed balanced Markov process. We think of it as a morphism from its inputs to its outputs.

We compose morphisms in \mathrm{DetBalMark} by identifying the output states of one open detailed balanced Markov process with the input states of another. The populations of identified states must match. For example, we may compose this morphism N:

with the previously shown morphism M to get this morphism M \circ N:

(In this example, one state was specified as an output twice, which means we need to identify two input states of another Markov process with it. Get it?)
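The gluing step can be sketched in code. This is only an illustration under assumptions: the dictionary layout (`q`, `edges`, `inputs`, `outputs`), the function name `compose` and the example processes are hypothetical, and the paper builds composition from decorated cospans rather than from anything like this.

```python
def compose(first, second):
    """Glue `first` to `second`: the k-th output of `first` is identified
    with the k-th input of `second`. In the notation above this is second ∘ first."""
    if len(first["outputs"]) != len(second["inputs"]):
        raise ValueError("number of outputs must equal number of inputs")

    # Tag states with 1 or 2 so the two state sets start out disjoint.
    parent = {(1, s): (1, s) for s in first["q"]}
    parent.update({(2, s): (2, s) for s in second["q"]})

    def find(x):
        # Follow parent links to the representative of x's merged class.
        while parent[x] != x:
            x = parent[x]
        return x

    for out_state, in_state in zip(first["outputs"], second["inputs"]):
        # Exact float comparison is good enough for a sketch.
        if first["q"][out_state] != second["q"][in_state]:
            raise ValueError("populations of identified states must match")
        parent[find((2, in_state))] = find((1, out_state))

    q = {find((1, s)): pop for s, pop in first["q"].items()}
    q.update({find((2, s)): pop for s, pop in second["q"].items()})
    edges = [(find((1, a)), find((1, b)), r) for a, b, r in first["edges"]]
    edges += [(find((2, a)), find((2, b)), r) for a, b, r in second["edges"]]
    return {
        "q": q,
        "edges": edges,
        "inputs": [find((1, s)) for s in first["inputs"]],
        "outputs": [find((2, s)) for s in second["outputs"]],
    }


# Hypothetical processes: N lists state "b" as an output twice,
# so two input states of M ("u" and "v") get identified with it.
N = {"q": {"a": 1.0, "b": 2.0},
     "edges": [("a", "b", 2.0), ("b", "a", 1.0)],
     "inputs": ["a"], "outputs": ["b", "b"]}
M = {"q": {"u": 2.0, "v": 2.0, "w": 1.0},
     "edges": [("u", "w", 1.0), ("w", "u", 2.0), ("v", "w", 1.0), ("w", "v", 2.0)],
     "inputs": ["u", "v"], "outputs": ["w"]}

MN = compose(N, M)
print(sorted(MN["q"].items()))  # three states remain: a, b (merged with u and v), w
```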

And here’s our second most fundamental result: the category \mathrm{DetBalMark} is actually a dagger compact category. This lets us do other stuff with open Markov processes. An important one is ‘tensoring’, which lets us take two open Markov processes like M and N above and set them side by side, giving M \otimes N:

The so-called compactness is also important. This means we can take some inputs of an open Markov process and turn them into outputs, or vice versa. For example, using the compactness of \mathrm{DetBalMark} we can get this open Markov process from M:

In fact all the categories in our paper are dagger compact categories, and all our functors preserve this structure. Dagger compact categories are a well-known framework for describing systems with inputs and outputs, so this is good.

The analogy to electrical circuits

In a detailed balanced Markov process, population can flow along edges. In the detailed balanced equilibrium, without any flow of population from outside, the flow from state i to state j will be matched by the flow back from j to i. The populations need to take specific values for this to occur.

In an electrical circuit made of linear resistors, charge can flow along wires. In equilibrium, without any driving voltage from outside, the current along each wire will be zero. The potentials will be equal at every node.

This sets up an analogy between detailed balanced continuous-time Markov chains and electrical circuits made of linear resistors! I love analogy charts, so this makes me very happy:

Circuits       Detailed balanced Markov processes
potential      population
current        flow
conductance    rate constant
power          dissipation

This analogy is already well known. Schnakenberg used it in his book Thermodynamic Network Analysis of Biological Systems. So, our main goal is to formalize and exploit it. This analogy extends from systems in equilibrium to the more interesting case of nonequilibrium steady states, which are the main topic of our paper.

Earlier, Brendan and I introduced a way to ‘black box’ a circuit and define the relation it determines between potential-current pairs at the input and output terminals. This relation describes the circuit’s external behavior as seen by an observer who can only perform measurements at the terminals.

An important fact is that black boxing is ‘compositional’: if one builds a circuit from smaller pieces, the external behavior of the whole circuit can be determined from the external behaviors of the pieces. For category theorists, this means that black boxing is a functor!

Our new paper with Blake develops a similar ‘black box functor’ for detailed balanced Markov processes, and relates it to the earlier one for circuits.

When you black box a detailed balanced Markov process, you get the relation between population–flow pairs at the terminals. (By the ‘flow at a terminal’, we more precisely mean the net population outflow.) This relation holds not only in equilibrium, but also in any nonequilibrium steady state. Thus, black boxing an open detailed balanced Markov process gives its steady state dynamics as seen by an observer who can only measure populations and flows at the terminals.
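As a toy illustration of what black boxing computes, take a chain of three states in which the two end states are terminals and the middle state is hidden. The rate constants below are made up, and the sign convention is an assumption of this sketch: the flow at a terminal is counted as population per unit time passing from that terminal into the rest of the network, which in a steady state is what has to be supplied from outside.

```python
# H[i][j] is the rate constant for the edge from state j to state i (0.0 if absent).
# Hypothetical values, detailed balanced for the populations (1, 2, 4).
H = [[0.0, 1.0, 0.0],
     [2.0, 0.0, 1.0],
     [0.0, 2.0, 0.0]]
INTERIOR = 1        # the only non-terminal state in this toy example
TERMINALS = (0, 2)


def steady_state(p0, p2):
    """Choose the interior population so that its inflow equals its outflow."""
    p = [p0, 0.0, p2]
    inflow = sum(H[INTERIOR][j] * p[j] for j in TERMINALS)
    outflow_rate = sum(H[j][INTERIOR] for j in TERMINALS)
    p[INTERIOR] = inflow / outflow_rate
    return p


def net_flow_into_network(p, i):
    """Flow leaving state i along edges minus flow arriving at i along edges."""
    return sum(H[j][i] * p[i] - H[i][j] * p[j] for j in range(len(p)) if j != i)


def black_box(p0, p2):
    """Population-flow pairs at the terminals for the given terminal populations."""
    p = steady_state(p0, p2)
    return {i: (p[i], net_flow_into_network(p, i)) for i in TERMINALS}


print(black_box(1.0, 4.0))  # equilibrium populations: both terminal flows are 0.0
print(black_box(3.0, 0.0))  # nonequilibrium steady state: flow 4.0 in at state 0, -4.0 at state 2
```

The observer only ever sees the pairs returned by `black_box`; the interior population is solved for and then forgotten.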

The principle of minimum dissipation

At least since the work of Prigogine, it’s been widely accepted that a large class of systems minimize entropy production in a nonequilibrium steady state. But people still fight about the precise boundary of this class of systems, and even the meaning of this ‘principle of minimum entropy production’.

For detailed balanced open Markov processes, we show that a quantity we call the ‘dissipation’ is minimized in any steady state. This is a quadratic function of the populations and flows, analogous to the power dissipation of a circuit made of resistors. We make no claim that this quadratic function actually deserves to be called ‘entropy production’. Indeed, Schnakenberg has convincingly argued that they are only approximately equal.

But still, the ‘dissipation’ function is very natural and useful—and Prigogine’s so-called ‘entropy production’ is also a quadratic function.
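Here is a small numerical check of the minimum principle on a three-state chain. The exact form of the dissipation is not given in this post, so the functional below is an assumption: it is the resistor power formula with potentials p_i / q_i and conductances H_{i j} q_j, including an overall factor of 1/2 that does not affect where the minimum sits. The populations and rates are hypothetical.

```python
# Hypothetical detailed balanced chain: q are equilibrium populations,
# H[i][j] is the rate constant for the edge from state j to state i.
q = [1.0, 2.0, 4.0]
H = [[0.0, 1.0, 0.0],
     [2.0, 0.0, 1.0],
     [0.0, 2.0, 0.0]]


def dissipation(p):
    """Assumed quadratic form: half the sum over directed edges j -> i of
    H[i][j] * q[j] * (p[j]/q[j] - p[i]/q[i]) ** 2."""
    n = len(p)
    return 0.5 * sum(
        H[i][j] * q[j] * (p[j] / q[j] - p[i] / q[i]) ** 2
        for i in range(n) for j in range(n) if i != j
    )


def steady_interior(p0, p2):
    """Population of the middle state at which its inflow equals its outflow."""
    return (H[1][0] * p0 + H[1][2] * p2) / (H[0][1] + H[2][1])


# Hold the terminal populations fixed and scan the interior population.
p0, p2 = 3.0, 0.0
best = steady_interior(p0, p2)
candidates = [best + 0.01 * k for k in range(-100, 101)]
minimizer = min(candidates, key=lambda p1: dissipation([p0, p1, p2]))
print(best, minimizer)  # the scan picks out the steady-state value
```

With the terminal populations held fixed, the scan finds its minimum exactly at the steady-state interior population, which is the content of the principle in this tiny case.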

Black boxing

I’ve already mentioned the category \mathrm{DetBalMark}, where a morphism is an open detailed balanced Markov process. But our paper needs two more categories to tell its story! There’s the category of circuits, and the category of linear relations.

A morphism in the category \mathrm{Circ} is an open electrical circuit made of resistors: that is, a graph with each edge labelled by a ‘conductance’ c_e > 0, together with specified input and output nodes:

A morphism in the category \mathrm{LinRel} is a linear relation L : U \leadsto V between finite-dimensional real vector spaces U and V. This is nothing but a linear subspace L \subseteq U \oplus V. Just as relations generalize functions, linear relations generalize linear functions!

In our previous paper, Brendan and I introduced these two categories and a functor between them, the ‘black box functor’:

\blacksquare : \mathrm{Circ} \to \mathrm{LinRel}

The idea is that any circuit determines a linear relation between the potentials and net current flows at the inputs and outputs. This relation describes the behavior of a circuit of resistors as seen from outside.
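For a feel of what this relation looks like, here is a sketch for two resistors in series, with the outer nodes as terminals and the middle node hidden. The conductances and the names are hypothetical, and the current at a terminal is counted as current entering the circuit there, which is a sign convention chosen for this sketch.

```python
# Undirected wires with hypothetical conductances (reciprocals of resistances).
conductance = {(0, 1): 2.0, (1, 2): 4.0}
TERMINALS = (0, 2)
INTERIOR = 1


def c(i, j):
    """Conductance of the wire between nodes i and j, or 0.0 if there is none."""
    return conductance.get((i, j), conductance.get((j, i), 0.0))


def black_box_circuit(phi0, phi2):
    """Potential-current pairs at the terminals for given terminal potentials."""
    phi = {0: phi0, 2: phi2}
    # Kirchhoff's current law at the hidden node: currents into it sum to zero.
    phi[INTERIOR] = (
        sum(c(INTERIOR, j) * phi[j] for j in TERMINALS)
        / sum(c(INTERIOR, j) for j in TERMINALS)
    )
    # Ohm's law on each wire gives the current entering the circuit at terminal i.
    current = {
        i: sum(c(i, j) * (phi[i] - phi[j]) for j in phi if j != i)
        for i in TERMINALS
    }
    return {i: (phi[i], current[i]) for i in TERMINALS}


print(black_box_circuit(1.0, 1.0))  # equal potentials: no current at either terminal
print(black_box_circuit(3.0, 0.0))  # current 4.0 enters at node 0 and leaves at node 2
```

Seen from outside, the pair of resistors behaves like a single wire of conductance 4/3, and the hidden node's potential never appears in the result.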

Our new paper introduces a black box functor for detailed balanced Markov processes:

\square : \mathrm{DetBalMark} \to \mathrm{LinRel}

We draw this functor as a white box merely to distinguish it from the other black box functor. The functor \square maps any detailed balanced Markov process to the linear relation obeyed by populations and flows at the inputs and outputs in a steady state. In short, it describes the steady state behavior of the Markov process ‘as seen from outside’.

How do we manage to black box detailed balanced Markov processes? We do it using the analogy with circuits!

The analogy becomes a functor

Every analogy wants to be a functor. So, we make the analogy between detailed balanced Markov processes and circuits precise by turning it into a functor:

K : \mathrm{DetBalMark} \to \mathrm{Circ}

This functor converts any open detailed balanced Markov process into an open electrical circuit made of resistors. This circuit is carefully chosen to reflect the steady-state behavior of the Markov process. Its underlying graph is the same as that of the Markov process. So, the ‘states’ of the Markov process are the same as the ‘nodes’ of the circuit.

Both the equilibrium populations at states of the Markov process and the rate constants labelling edges of the Markov process are used to compute the conductances of edges of this circuit. In the simple case where the Markov process has exactly one edge from any state i to any state j, the rule is this:

C_{i j} = H_{i j} q_j


where:

q_j is the equilibrium population of the jth state of the Markov process,

H_{i j} is the rate constant for the edge from the jth state to the ith state of the Markov process, and

C_{i j} is the conductance (that is, the reciprocal of the resistance) of the wire from the jth node to the ith node of the resulting circuit.

The detailed balance condition for Markov processes says precisely that the matrix C_{i j} is symmetric! This is just right for an electrical circuit made of resistors, since it means that the resistance of the wire from node i to node j equals the resistance of the same wire in the reverse direction, from node j to node i.
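The rule is easy to try out. In this sketch the matrix H and the two population vectors are made-up examples, and the function names are hypothetical; the formula C_{i j} = H_{i j} q_j is the one stated above.

```python
def conductances(H, q):
    """C[i][j] = H[i][j] * q[j], where H[i][j] is the rate constant from state j to state i."""
    n = len(q)
    return [[H[i][j] * q[j] if i != j else 0.0 for j in range(n)] for i in range(n)]


def is_symmetric(C, tol=1e-12):
    n = len(C)
    return all(abs(C[i][j] - C[j][i]) <= tol for i in range(n) for j in range(i))


# Hypothetical rate constants for a chain of three states.
H = [[0.0, 1.0, 0.0],
     [2.0, 0.0, 1.0],
     [0.0, 2.0, 0.0]]
q_balanced = [1.0, 2.0, 4.0]     # satisfies detailed balance for H
q_unbalanced = [1.0, 1.0, 1.0]   # does not

C = conductances(H, q_balanced)
print(C)                                             # [[0.0, 2.0, 0.0], [2.0, 0.0, 4.0], [0.0, 4.0, 0.0]]
print(is_symmetric(C))                               # True
print(is_symmetric(conductances(H, q_unbalanced)))   # False
```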

A triangle of functors

If you paid careful attention, you’ll have noticed that I’ve described a triangle of functors:

And if you know anything about how category theorists think, you’ll be wondering if this diagram commutes.

In fact, this triangle of functors does not commute! However, a general lesson of category theory is that we should only expect diagrams of functors to commute up to natural isomorphism, and this is what happens here:

The natural transformation \alpha ‘corrects’ the black box functor for resistors to give the one for detailed balanced Markov processes.

The functors \square and \blacksquare \circ K are actually equal on objects. An object in \mathrm{DetBalMark} is a finite set X with each element i \in X labelled by a positive population q_i. Both functors map this object to the vector space \mathbb{R}^X \oplus \mathbb{R}^X. For the functor \square, we think of this as a space of population-flow pairs. For the functor \blacksquare \circ K, we think of it as a space of potential-current pairs. The natural transformation \alpha then gives a linear relation

\alpha_{X,q} : \mathbb{R}^X \oplus \mathbb{R}^X \leadsto \mathbb{R}^X \oplus \mathbb{R}^X

in fact an isomorphism of vector spaces, which converts potential-current pairs into population-flow pairs in a manner that depends on the q_i. I’ll skip the formula; it’s in the paper.

But here’s the key point. The naturality of \alpha actually allows us to reduce the problem of computing the functor \square to the problem of computing \blacksquare. Suppose

M: (X,q) \to (Y,r)

is any morphism in \mathrm{DetBalMark}. The object (X,q) is some finite set X labelled by populations q, and (Y,r) is some finite set Y labelled by populations r. Then the naturality of \alpha means that this square commutes:

Since \alpha_{X,q} and \alpha_{Y,r} are isomorphisms, we can solve for the functor \square as follows:

\square(M) = \alpha_{Y,r} \circ \blacksquare K(M) \circ \alpha_{X,q}^{-1}

This equation has a clear intuitive meaning! It says that to compute the behavior of a detailed balanced Markov process, namely \square(M), we convert it into a circuit made of resistors and compute the behavior of that, namely \blacksquare K(M). This is not equal to the behavior of the Markov process, but we can compute that behavior by converting the input populations and flows into potentials and currents, feeding them into our circuit, and then converting the outputs back into populations and flows.
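This recipe can be tested numerically on a tiny example. The post skips the formula for \alpha, so the version used here, p_i = q_i φ_i with flows set equal to currents, is a guess chosen to fit the conductance rule C_{i j} = H_{i j} q_j and the sign conventions of this sketch; it is not quoted from the paper. The chain, its rates and all function names are hypothetical.

```python
# Hypothetical detailed balanced chain with terminals 0 and 2 and hidden state 1.
q = [1.0, 2.0, 4.0]
H = [[0.0, 1.0, 0.0],   # H[i][j]: rate constant for the edge from state j to state i
     [2.0, 0.0, 1.0],
     [0.0, 2.0, 0.0]]
# Circuit produced by the conductance rule C[i][j] = H[i][j] * q[j].
C = [[H[i][j] * q[j] for j in range(3)] for i in range(3)]


def markov_flows(p0, p2):
    """Steady-state flows from each terminal into the network, computed directly."""
    p1 = (H[1][0] * p0 + H[1][2] * p2) / (H[0][1] + H[2][1])
    return (H[1][0] * p0 - H[0][1] * p1, H[1][2] * p2 - H[2][1] * p1)


def circuit_currents(phi0, phi2):
    """Currents entering the resistor circuit at each terminal."""
    phi1 = (C[1][0] * phi0 + C[1][2] * phi2) / (C[1][0] + C[1][2])
    return (C[0][1] * (phi0 - phi1), C[2][1] * (phi2 - phi1))


def markov_flows_via_circuit(p0, p2):
    """Convert populations to potentials, run the circuit, read currents as flows."""
    phi0, phi2 = p0 / q[0], p2 / q[2]      # assumed inverse of alpha on populations
    return circuit_currents(phi0, phi2)    # assumed: alpha leaves currents unchanged


for p0, p2 in [(1.0, 4.0), (3.0, 0.0), (0.5, 7.0)]:
    print(markov_flows(p0, p2), markov_flows_via_circuit(p0, p2))  # the two columns agree
```

Under these assumptions the direct computation and the detour through the circuit give the same flows, which is what the equation for \square(M) promises.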

What we really do

So that’s a sketch of what we do, and I hope you ask questions if it’s not clear. But I also hope you read our paper! Here’s what we actually do in there. After an introduction and summary of results:

• Section 3 defines open Markov processes and the open master equation.

• Section 4 introduces detailed balance for open Markov processes.

• Section 5 recalls the principle of minimum power for open circuits made of linear resistors, and explains how to black box them.

• Section 6 introduces the principle of minimum dissipation for open detailed balanced Markov processes, and describes how to black box these.

• Section 7 states the analogy between circuits and detailed balanced Markov processes in a formal way.

• Section 8 describes how to compose open Markov processes, making them into the morphisms of a category.

• Section 9 does the same for detailed balanced Markov processes.

• Section 10 describes the ‘black box functor’ that sends any open detailed balanced Markov process to the linear relation describing its external behavior, and recalls the similar functor for circuits.

• Section 11 makes the analogy between open detailed balanced Markov processes and open circuits even more formal, by making it into a functor. We prove that together with the two black box functors, this forms a triangle that commutes up to natural isomorphism.

• Section 12 is about geometric aspects of this theory. We show that the linear relations in the image of these black box functors are Lagrangian relations between symplectic vector spaces. We also show that the master equation can be seen as a gradient flow equation.

• Section 13 is a summary of what we have learned.

Finally, Appendix A is a quick tutorial on decorated cospans. This is a key mathematical tool in our work, developed by Brendan in an earlier paper.

Georg von Hippel: Fundamental Parameters from Lattice QCD, Day Four

Today's first speaker was Andreas Jüttner, who reviewed the extraction of the light-quark CKM matrix elements V_ud and V_us from lattice simulations. Since leptonic and semileptonic decay widths of kaons and pions are very well measured, the matrix element |V_us| and the ratio |V_us|/|V_ud| can be precisely determined if the form factor f_+(0) and the ratio of decay constants f_K/f_π are precisely predicted from the lattice. To reach the desired level of precision, the isospin breaking effects from the difference of the up and down quark masses and from electromagnetic interactions will need to be included (they are currently treated in chiral perturbation theory, which may not apply very well in the SU(3) case). Given the required level of precision, full control of all systematics is very important, and the problem of how to properly estimate the associated errors arises, to which different collaborations are offering very different answers. To make the lattice results optimally usable for CKMfitter & Co., one should ideally provide all of the lattice inputs to the CKMfitter fit separately (and not just some combination that presents a particularly small error), as well as their correlations (as far as possible).

Unfortunately, I had to miss the second talk of the morning, by Xavier García i Tormo on the extraction of α_s from the static-quark potential, because our Sonderforschungsbereich (SFB/CRC) is currently up for review for a second funding period, and the local organizers had to be available for questioning by panel members.

Later in the afternoon, I returned to the workshop and joined a very interesting discussion on the topic of averaging in the presence of theoretical uncertainties. The large number of possible choices to be made in that context implies that the somewhat subjective nature of systematic error estimates survives into the averages, rather than being dissolved into a consensus of some sort.

Georg von Hippel: Fundamental Parameters from Lattice QCD, Day Three

Today, our first speaker was Jérôme Charles, who presented new ideas about how to treat data with theoretical uncertainties. The best place to read about this is probably his talk, but I will try to summarize what I understood. The framework is a firmly frequentist approach to statistics, which answers the basic question of how likely the observed data are if a given null hypothesis is true. In such a context, one can consider a theoretical uncertainty as a fixed bias δ of the estimator under consideration (such as a lattice simulation) which survives the limit of infinite statistics. One can then test the null hypothesis that the true value of the observable in question is μ by constructing a test statistic for the estimator being distributed normally with mean μ+δ and standard deviation σ (the statistical error quoted for the result). The p-value of μ then depends on δ, but not on the quoted systematic error Δ. Since the true value of δ is not known, one has to perform a scan over some region Ω, for example the interval Ω_n=[-nΔ;nΔ], and take the supremum over this range of δ. One possible extension is to choose Ω adaptively in that a larger range of values needs to be scanned (i.e. a larger true systematic error in comparison to the quoted systematic error is allowed for) for lower p-values; interestingly enough, the resulting curves of p-values are numerically close to what is obtained from a naive Gaussian approach treating the systematic error as a (pseudo-)random variable. For multiple systematic errors, a multidimensional Ω has to be chosen in some way; the most natural choices of a hypercube or a hyperball correspond to adding the errors linearly or in quadrature, respectively. The linear (hypercube) scheme stands out as the only one that guarantees that the systematic error of an average is no smaller than the smallest systematic error of an individual result.
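The scan over δ for a single result can be sketched as follows. This is a simplified reading of the summary above, not the speaker's actual procedure: it assumes a two-sided Gaussian p-value, a fixed interval [-nΔ; nΔ] and a brute-force grid, and the function names and numbers are hypothetical.

```python
import math


def p_value(x, mu, sigma, delta=0.0):
    """Two-sided p-value for observing x when the estimator is normal
    with mean mu + delta and standard deviation sigma."""
    z = abs(x - mu - delta) / sigma
    return math.erfc(z / math.sqrt(2.0))


def p_value_scan(x, mu, sigma, Delta, n=1.0, grid=2001):
    """Supremum of the p-value over delta in [-n*Delta, n*Delta], by grid scan."""
    lo, hi = -n * Delta, n * Delta
    deltas = [lo + (hi - lo) * k / (grid - 1) for k in range(grid)]
    return max(p_value(x, mu, sigma, d) for d in deltas)


# Hypothetical lattice result x with statistical error sigma and quoted systematic Delta.
x, sigma, Delta = 1.0, 0.5, 0.3
print(p_value(x, 0.0, sigma))              # p-value of mu = 0 ignoring any bias
print(p_value_scan(x, 0.0, sigma, Delta))  # larger: the bias may absorb part of the deviation
print(p_value_scan(x, 0.9, sigma, Delta))  # mu within n*Delta of x: the scan reaches 1.0
```

The supremum is reached at the δ that brings μ+δ closest to the observed value, so every μ within nΔ of the result gets a p-value of 1.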

The second speaker was Patrick Fritzsch, who gave a nice review of recent lattice determinations of semileptonic heavy-light decays, both the more commonly studied B decays to πℓν and Kℓν, and the decays of the Λb that have recently been investigated by Meinel et al. with the help of LHCb.

In the afternoon, both the CKMfitter collaboration and the FLAG group held meetings.

Chad Orzel: 003/366: The Mountains, the Mountains

Last weekend, while the kids were at my parents’, Kate and I decided to go over to Williamstown and look at some art. We originally intended to go to the Clark Art Institute, but it was mobbed, so we drove on to MassMoCA instead.

I told several different people about that, all of whom said “Oh, did you go to the Van Gogh show?” Which made me want to see the Van Gogh show, and since I’m on sabbatical and not teaching, I drove over there and actually went to the Clark this time. (Which was still mobbed, but I got there early enough to get in…)

The Van Gogh exhibit was, in fact, very impressive. They also had the famous “Whistler’s Mother” picture on display in the separate gallery at the top of Stone Hill, so I walked up there and saw that. (It’s huge– I sort of assumed it was a small painting, but no, it’s like five feet on a side…)

Of course, it’d be a little cheesy to post a photo of a painting (though I did take several; not in the Van Gogh and Whistler exhibits, but in the main collection), so instead here’s a view from above of one of my very favorite places on Earth:

View from the top of Stone Hill, behind the Clark Art Institute in Williamstown.


The Clark is the complex of buildings and parking lots at the base of the hill, and behind it in the center of the photo is Williams College, my alma mater. Here’s a zoomed-in version as proof:


Williams College, magnified from the wide shot above.

You can clearly see Thompson Chapel, Griffin Hall, and the Congregational church. This would be clearer if I’d had a different lens on the camera, but I wanted the wide-angle for inside the museums, and wasn’t about to go all the way back to the bottom of the hill to get a stronger zoom lens just for the one picture. Given that this is about 1/25th the area of the original photo, though, I think that’s pretty damn good.

And the wide shot gets more of the Berkshires; it was a hot and hazy day, but you can still tell that Williamstown is really extremely pretty. Which is, after all, the point of taking this photo in the first place…

September 03, 2015

n-Category Café: Rainer Vogt

I was sad to learn that Rainer Vogt died last month. He is probably best-known to Café readers for his work on homotopy-algebraic structures, especially his seminal 1973 book with Michael Boardman, Homotopy Invariant Algebraic Structures on Topological Spaces.

It was Boardman and Vogt who first studied simplicial sets satisfying what they called the “restricted Kan condition”, later renamed quasi-categories by André Joyal, and today (thanks especially to Jacob Lurie) the most deeply-explored incarnation of the notion of (\infty, 1)-category. Their 1973 book also asked and answered fundamental questions like this:

Given a topological group X and a homotopy equivalence between X and another space Y, what structure does Y acquire?

Clearly Y is some kind of “group up to homotopy”, but the details take some working out, and Boardman and Vogt did just that.

Martin Markl wrote a nice tribute to Vogt, which I reproduce with permission here:

Martin Markl writes:   The first time I encountered the name Rainer Vogt was during my PhD study in Praha, when among randomly chosen books I was reading appeared a Russian translation of Boardman and Vogt’s “Homotopy invariant algebraic structures on topological spaces.” I did not expect that this kind of structures would turn to be the central theme of my professional career. I did not know who Rainer Vogt was either, I only realized that, along with Rainer Maria Rilke, he was the only person christened “Rainer” I knew.

I met Rainer in person several years later, in 1998, when he delivered plenary talks at the 18th Winter School “Geometry and Physics” in Srni, a remote Bohemian village of Sumava Forest. It struck me how he physically resembled my grandfather from the mother side. He obviously knew about my humble work on operads, and invited me to participate in the “Workshop on Operads” in Osnabrück, in June of the same year. I have been visiting Rainer regularly since.

Rainer was not only an excellent mathematician, but also a devoted amateur choir singer. Once I visited him shortly before Christmas. He brought me directly from the train station to a church in a neighboring village, where he sung in Bach’s Weihnachtsoratorium. I sat in the first row next to the priest who made frequent comments to me, not realizing that I do not understand a word. Another day he brought me to the house of his music teacher, where he rehearsed with some other people. I vividly remember his performing, in German, an aria from Smetana’s Bartered Bride, a kitschy comic opera which is considered a Czech national gem.

I learned about his serious illness during my stay at the Max-Planck-Institut für Mathematik in Bonn in Winter 2014. Together with Michael Batanin and Clemens Berger, I visited him in a hospital in Osnabrück, and then once again shortly after his return home. He told me that listening to the record of Handel’s “Theodora” I brought to the hospital helped him greatly.

He was full of optimism, willing to fight his fate. I have been indirectly making inquiries about his health since, and was always assured that he was doing well. I believed I would meet him again in Bonn in January 2016. I was deeply shattered when I learned that Rainer died a couple of weeks ago.

Martin Markl, Praha, 26th of August 2015

A happy episode in my own mathematical life was an invitation from Rainer in 2000 to another workshop in operads, in his home university of Osnabrück, when I was a postdoc. That was one of the first mathematical invitations I received to anything anywhere, so it has a special place in my heart, and I have very pleasant memories of that week. In my conversations with him both then and by email in the years following, Rainer never spoke down to me or made me feel that he was a senior person and I was a young whippersnapper — something that I probably took for granted then, but now appreciate as a major virtue.

I don’t have the expertise to do full justice to Rainer Vogt’s work, but please feel free to add to what I’ve written in the comments.

Clifford Johnson: Naked!

Snapshot of my doodles done while talking on the phone to a BBC producer about perhaps appearing in a film discussing anti-gravity*. I explained lots of things, including why General Relativity does not seem to like negative mass, partly because they have something called a "naked singularity", which is a big problem. Only after the call ended did I recall that in 1999 I co-authored a paper in which we discovered [...]

The post Naked! appeared first on Asymptotia.

Tommaso Dorigo: A 3 TeV Dielectron Event By CMS!

The first really exciting thing from Run 2 at the CERN Large Hadron Collider (at least for me) has finally appeared. A 2.9 TeV dielectron event was recorded by CMS on August 22. At this mass a new Z' boson is not excluded by Run 1 searches... And in the whole Run 1 data the highest-mass dielectron event collected by CMS was only 1.8 TeV. So by raising the centre-of-mass energy by 60% we collect a 60%-higher-mass event, but with 0.5% of the collisions. It is nice to think that the event might really be the first hint of a new resonance!


Backreaction: More about Hawking and Perry’s new proposal to solve the black hole information loss problem

Malcolm Perry’s lecture that summarizes the idea he has been working on with Stephen Hawking is now on YouTube:

The first 17 minutes or so are a very well done, but also very basic introduction to the black hole information loss problem. If you’re familiar with that, you can skip to 17:25. If you know the BMS group and only want to hear about the conserved charges on the horizon, skip to 45:00. Go to 56:10 for the summary.

Last week, there furthermore was a paper on the arxiv with a very similar argument: BMS invariance and the membrane paradigm, by Robert Penna from MIT, though this paper doesn’t directly address the information loss problem. One is led to suspect that the author was working on the topic for a while, then heard about the idea put forward by Hawking and Perry and made sure to finish and upload his paper to the arxiv immediately... Towards the end of the paper the author also expresses concern, as I did earlier, that these degrees of freedom cannot possibly contain all the relevant information: “This may be relevant for the information problem, as it forces the outgoing Hawking radiation to carry the same energy and momentum at every angle as the infalling state. This is usually not enough information to fully characterize an S-matrix state...”

The third person involved in this work is Andrew Strominger, who has been very reserved on the whole media hype. Eryn Brown reports for
“Contacted via telephone Tuesday evening, Strominger said he felt confident that the information loss paradox was not irreconcilable. But he didn't think everything was settled just yet.

He had heard Hawking say there would be a paper by the end of September. It had been the first he'd learned of it, he laughed, though he said the group did have a draft.”
(Did Hawking actually say that? Can someone point me to a source?)

Meanwhile I’ve pushed this idea back and forth in my head and, lacking further information about what they hope to achieve with this approach, have tentatively come to the conclusion that it can’t solve the problem. The reason is the following.

The endstate of black hole collapse is known to be simple and characterized only by three “hairs” – the mass, charge, and angular momentum of the black hole. This means that all higher multipole moments – deviations of the initial mass configuration from perfect spherical symmetry – have to be radiated off during collapse. If you disregard actual emission of matter, it will be radiated off in gravitons. The angular momentum related to these multipole moments has to be conserved, and there has to be an energy flux related to the emission. In my reading the BMS group and its conserved charges tell you exactly that: the multipole moments can’t vanish, they have to go to infinity. Alternatively you can interpret this as the black hole not actually being hair-less, if you count all that’s happened in the dynamical evolution.

Having said that, I didn’t know the BMS group before, but I never doubted this to be the case, and I don’t think anybody doubted this. But this isn’t the problem. The problem that occurs during collapse exists already classically, and is not in the multipole moments – which we know can’t get lost – it’s in the density profile in the radial direction. Take the simplest example: one shell of mass M, or two concentric shells of half the mass. The outside metric is identical. The inside metric vanishes behind the horizon. Where does the information about this distribution go if you let the shells collapse?

Now as Malcolm said in his lecture, you can make a power-series expansion of the (Bondi) mass around the asymptotic value, and I’m pretty sure it contains the missing information about the density profile (which however already misses information of the quantum state). But this isn’t information you can measure locally, since you need an infinite amount of derivatives or infinite space to make your measurement respectively. And besides this, it’s not particularly insightful: If you have a metric that is analytic with an infinite convergence radius, you can expand it around any point and get back the metric in the whole space-time, including the radial profile. You don’t need any BMS group or conserved charges for that. (The example with the two shells is not analytical, it’s also pathological for various reasons.)

As an aside, that the real problem with the missing information in black hole collapse is in the radial direction and not in the angular direction is the main reason I never believed the strong interpretation of the Bekenstein-Hawking entropy. It seems to indicate that the entropy, which scales with the surface area, counts the states that are accessible from the outside and not all the states that black holes form from.

Feedback on this line of thought is welcome.

In summary, the Hawking-Perry argument makes perfect sense and it neatly fits together with all I know about black holes. But I don’t see how it gets one closer to solving the problem.

September 02, 2015

Chad Orzel002/366: Boids

I spent a while this morning typing on my laptop on the deck, and brought the new camera out with me for occasional procrastination. The shady spot at that hour has a nice view of the bird feeder, and I snapped a few shots of these guys feeding (using a telephoto lens):

Two birds on our backyard feeder; not sure of the species.

(I cropped and scaled this, and did the auto-level color correction in GIMP.)

The one on the left is a house sparrow, I believe, and we have dozens of them around. I don’t think I’ve ever seen the one on the right before, though, and have no idea what species it is. I could probably look it up, but posting the photo and waiting for someone to identify it in the comments is probably just as fast…

(Sadly, the hummingbird I saw feeding on one of the Rose of Sharon bushes a few days ago didn’t reappear. Maybe next time.)

David Hoggbad start

Today was the first day of my sabbatical. I got no research done at all!

Chad OrzelOn Advising Students to Fail

Slate’s been doing a series about college classes everyone should take, and one of the most heavily promoted of these has been a piece by Dan Check urging students to take something they’re terrible at. This is built around an amusing anecdote about an acting class he took back in the day, but as much as this appeals to my liberal-arts-school background, I think it has some serious problems, which are pretty clearly displayed early on:

At four-year colleges, there is enough time and space to do something much more interesting: take at least one course in a subject in which you are untalented, and about which you are not passionate. There is a lot to be learned by taking seriously that which you have no business taking seriously, and college is one of the last times you will be able to pursue something that you’re truly awful at without serious consequence. So, rather than recommending a specific course, I recommend a type of course: the type that exposes you to other people’s talents, rather than your own, and that which allows you to really bask in the feeling of utter, hopeless ineptitude.

While this is very much in the “cleverly counterintuitive” mode that Slate has raised to an art (and better than the default “everybody should take a class in this thing that I personally work on”), it’s also making a lot of presumptions. First and foremost, it’s addressed only to four-year college students, who as Matt “Dean Dad” Reed is quick to remind people, are only a subset of the total college population.

I think this is probably restricted even more than that, though, because that “without serious consequence” is doing a lot of heavy lifting. Yes, if you’re a good student from a well-off family, your college GPA probably isn’t terribly consequential in terms of your future employment prospects. You can afford to take a C- in a drama course if you’re pulling A’s in your political science major, and it will provide an amusing anecdote to share when you interview for law school/ an internship at a major New York media company.

But as much as academics, particularly in a liberal arts context, like to decry grade-grubbing and careerism among students in general, I suspect that good grades and “marketable” skills really do matter for students from less privileged backgrounds. All those paper-resume studies showing the effects of implicit biases sort of highlight this– if changing the name on an identical resume significantly reduces the chance of an imaginary student getting an offer, well, if you’re a real student on the bad side of that effect, you probably want to do everything you can to make sure your resume isn’t identical, but better. The C- in drama that becomes a funny anecdote to liven up Dan’s interview might become a “lack of focus” that keeps Denise from getting called back.

And, of course, there’s the economic issue. If you’re paying a comprehensive fee at an elite college, then, yeah, you’ve got a few “extra” classes you might as well fill with something amusing. If you’re paying course by course at a regional university, a class in something you’re terrible at might represent a significant additional cost.

(It probably goes without saying that this is also terrible advice for students who are already mediocre. If you’re running a C+/B- in your major, your time would be much better spent getting good at something first before you go looking for personal growth through hopeless ineptitude.)

So, like a lot of advice-to-students that prioritizes personal growth, this makes me a little twitchy. It’s pitched as general advice, but it’s really advice for a rather limited demographic– good students from well-off backgrounds. Which, admittedly, is the primary demographic for elite media outlets like Slate (or at least the demographic that their readers want to see themselves as), but it’d be nice to have a little more acknowledgement of that.

(Of course, as a college classmate noted in comments on Facebook, this advice is arguably less awful than “Do what you love, and the money will follow.” Which it’s trivial to demonstrate is a false statement…)

Again, I’m all in favor of broadening horizons and trying new things, and occasionally regret not taking a slightly wider range of stuff back when I was a student. And God knows, there are a lot of students out there who could benefit enormously from loosening up a bit and taking a class they know they’ll bomb. But the idea that this is “without serious consequence” for everyone is presuming an awful lot, in a way that I don’t think is particularly helpful for anyone.


(This is a revise-and-extend of stuff I said on Twitter yesterday.)

n-Category Café How Do You Handle Your Email?

The full-frontal assault of semester begins in Edinburgh in twelve days’ time, and I have been thinking hard about coping strategies. Perhaps the most incessantly attention-grabbing part of that assault is email.

Reader: how do you manage your email? You log on and you have, say, forty new mails. If your mail is like my mail, then that forty is an unholy mixture of teaching-related and admin-related mails, with a relatively small amount of spam and perhaps one or two interesting mails on actual research (at grave peril of being pushed aside by the rest).

So, you’re sitting there with your inbox in front of you. What, precisely, do you do next?

People have all sorts of different styles of approaching their inbox. Some try to deal with each new mail before they’ve read the other new ones. Some read or scan all of them first, before replying to anything. Some do them in batches according to subject. Some use flags. What do you do?

And then, there’s the question of what you do with emails that are read but require action. Do you use your inbox as a reminder of things to do? Do you try to keep your inbox empty? Do you use other folders as “to do” lists?

And for old mails that you’ve dealt with but want to keep, how do you store them? One huge folder? Hundreds, with one for each correspondent? Somewhere in between?

There are all sorts of guides out there telling you how to manage your email effectively, but I’ve never seen one tailored to academics. So I’m curious to know: how do you, dear reader, handle your email?

Doug NatelsonNano and the oil industry

I went to an interesting lunchtime talk today by Sergio Kapusta, former chief scientist of Shell.  He gave a nice overview of the oil/gas industry and where nanoscience and nanotechnology fit in.   Clearly one of the main issues of interest is assessing (and eventually recovering) oil and gas trapped in porous rock, where the hydrocarbons can be trapped due to capillarity and the connectivity of the pores and cracks may be unknown.  Nanoparticles can be made with various chemical functionalizations (for example, dangling ligands known to be cleaved if the particle temperature exceeds some threshold) and then injected into a well; the particles can then be sought at another nearby well.  The particles act as "reporters".  The physics and chemistry of getting hydrocarbons out of these environments is all about the solid/liquid interface at the nanoscale.  More active sensor technologies for the aggressive, nasty down-hole environment are always of interest, too.

When asked about R&D spending in the oil industry, he pointed out something rather interesting:  R&D is actually cheap compared to the huge capital investments made by the major companies.  That means that it's relatively stable even in boom/bust cycles because it's only a minor perturbation on the flow of capital.  

Interesting numbers:  Total capital in hardware in the field for the petrochemical industry is on the order of $2T, built up over several decades.  Typical oil consumption worldwide is around 90M barrels equivalent per day (!).   If the supply ranges from 87-93M barrels per day, the price swings from $120 to $40/barrel, respectively.  Pretty wild.

September 01, 2015

Chad Orzel001/366: New Camera, Photo Blogging, Regal Dog

Today, I officially stopped being department chair, and started my sabbatical leave. I also acquired a new toy:

My new camera, taken with the old camera.

My old DSLR camera, a Canon Rebel XSi that I got mumble years ago, has been very good for over 20,000 pictures, but a few things about it were getting kind of flaky– it’s been bad at reading light levels for a while now, meaning I’m constantly having to monkey with the ISO setting manually, then forgetting to change it back when I move to a brighter location and taking a bunch of pictures where everything is all blown out. It also stopped talking to the external flash unit, after I had gotten used to the indirect flash option (though it looks like that might be a problem with the external flash unit, as it won’t talk to the new camera, either…). I could probably get those issues fixed, but it would probably cost about the same as a new camera body, with new cool features, so…

So I got a new camera. (And, as it turned out, I can now spend my vast accumulation of credit-card reward points (that I never remember to do anything with) on Amazon, so I didn’t spend any real money on this at all…) This is a Canon Rebel T6i, a slight upgrade from the XSi (not a full professional-grade DSLR), but the advance of technology over the last mumble years means it has a bunch of fun new features– bigger sensor, faster speeds, video capability. So it’ll be a nice toy.

As a bonus, I’m also acquiring a casual sabbatical project– for a while now, I’ve watched other people do the photo-a-day thing, and thought that might be fun. While I was saddled with administrative hassles of being chair, that wasn’t really going to work, but while I’m on sabbatical, I can devote a little time every day to taking photos.

So, I’ll be trying to take and post one picture a day for the next year (Or at least GIMPing photos taken on a previous day to get them down to a size I can post here…)– which happens to be a leap year, so 366 days, woo-hoo– and we’ll see how that works out. For the officially official first photo of the set, we’ll use one of the test pictures I took of Emmy:

Emmy, the Queen of Niskayuna, regally surveying her back yard.

I cropped and scaled this, but didn’t do any color tweaking. Emmy’s too jaded about photography to actually turn her head and look at the camera, but this pose has a certain dignity…

(For comparison, here’s the first posted photo from the previous camera.)

Clifford JohnsonPBS Shoot Fun

More adventures in communicating the awesomeness of physics! Yesterday I spent a gruelling seven hours in the sun talking about the development of various ideas in physics over the centuries for a new show (to air next year) on PBS. Interestingly, we did all of this at a spot that, in less dry times, would have been underwater. It was up at Lake Piru, which, due to the drought, is far below capacity. You can see this by going to Google Maps, looking at the representation of its shape on the map, and then clicking the satellite picture overlay to see the much changed (and reduced) shape in recent times.

There's an impressive dam at one end of the lake/reservoir, and I will admit that I did not resist the temptation to pull over, look at a nice view of it from the road on the way home, and say out loud "daaayuum". An offering to the god Pun, you see.


Turns out that there's a wide range of wildlife, large and small, trudging around on the [...]

The post PBS Shoot Fun appeared first on Asymptotia.

BackreactionLoops and Strings and Stuff

If you tie your laces, loops and strings might seem like parts of the same equation, but when it comes to quantum gravity they don’t have much in common. String Theory and Loop Quantum Gravity, both attempts to consistently combine Einstein’s General Relativity with quantum theory, rest on entirely different premises.

String theory posits that everything, including the quanta of the gravitational field, is made up of vibrating strings which are characterized by nothing but their tension. Loop Quantum Gravity is a way to quantize gravity while staying as close as possible to the quantization methods which have been successful with the other interactions.

The mathematical realization of the two theories is completely different too. The former builds on dynamically interacting strings which give rise to higher dimensional membranes, and leads to a remarkably complex theoretical construct that might or might not actually describe the quantum properties of space and time in our universe. The latter divides up space-time into spacelike slices, and then further chops up the slices into discrete chunks to which quantum properties are assigned. This might or might not describe the quantum properties of space and time in our universe...

String theory and Loop Quantum Gravity also differ in their ambition. While String Theory is meant to be a theory for gravity and all the other interactions – a “Theory of Everything” – Loop Quantum Gravity merely aims at finding a quantum version of gravity, leaving aside the quantum properties of matter.

Needless to say, each side claims their approach is the better one. String theorists argue that taking into account all we know about the other interactions provides additional guidance. Researchers in Loop Quantum Gravity emphasize their modest and minimalist approach that carries on the formerly used quantization methods in the most conservative way possible.

They’ve been arguing for 3 decades now, but maybe there’s an end in sight.

In a little-noted paper out last year, Jorge Pullin and Rodolfo Gambini argue that taking into account the interaction of matter on a loop-quantized space-time forces one to use a type of interaction that is very similar to that also found in effective models of string interactions.

The reason is Lorentz-invariance, the symmetry of special relativity. The problem with the quantization in Loop Quantum Gravity comes from the difficulty of making anything discrete Lorentz-invariant and thus compatible with special relativity and, ultimately, general relativity. The splitting of space-time into slices is not a priori a problem as long as you don’t introduce any particular length scale on the resulting slices. Once you do that, you’re stuck with a particular slicing, thus ruining Lorentz-invariance. And if you fix the size of a loop, or the length of a link in a network, that’s exactly what happens.

There have been twenty years of debate over whether the fate of Lorentz-invariance in Loop Quantum Gravity is really problematic, because it isn’t so clear just exactly how it would make itself noticeable in observations as long as you are dealing with the gravitational sector only. But once you start putting matter on the now quantized space, you have something to calculate.

Pullin and Gambini – both from the field of LQG it must be mentioned! – argue that the Lorentz-invariance violation inevitably creeps into the matter sector if one uses local quantum field theory on the loop quantized space. But that violation of Lorentz-invariance in the matter sector would be in conflict with experiment, so that can’t be correct. Instead they suggest that this problem can be circumvented by using an interaction that is non-local in a particular way, which serves to suppress unwanted contributions that spoil Lorentz-invariance. This non-locality is similar to the non-locality that one finds in low-energy string scattering, where the non-locality is a consequence of the extension of the strings. They write:

“It should be noted that this is the first instance in which loop quantum gravity imposes restrictions on the matter content of the theory. Up to now loop quantum gravity, in contrast to supergravity or string theory, did not appear to impose any restrictions on matter. Here we are seeing that in order to be consistent with Lorentz invariance at small energies, limitations on the types of interactions that can be considered arise.”

In a nutshell it means that they’re acknowledging they have a problem and that the only way to solve it is to inch closer to string theory.

But let me extrapolate their paper, if you allow. It doesn’t stop at the matter sector of course, because if one doesn’t assume a fixed background like they do in the paper one should also have gravitons and these need to have an interaction too. This interaction will suffer from the same problem, unless you cure it by the same means. Consequently, you will in the end have to modify the quantization procedure for gravity itself. And while I’m at it anyway, I think a good way to remedy the problem would be to not force the loops to have a fixed length, but to make them dynamical and give them a tension...

I’ll stop here because I know just enough of both string theory and loop quantum gravity to realize that technically this doesn’t make a lot of sense (among many other things because you don’t quantize loops, they are the quantization), and I have no idea how to make this formally correct. All I want to say is that after thirty years maybe something is finally starting to happen.

Should this come as a surprise?

It shouldn’t if you’ve read my review on Minimal Length Scale Scenarios for Quantum Gravity. As I argued in this review, there aren't many ways you can consistently introduce a minimal length scale into quantum field theory as a low-energy effective approximation. And pretty much the only way you can consistently do it is using particular types of non-local Lagrangians (infinite series, no truncation!) that introduce exponential suppression factors. If you have a theory in which a minimal length appears in any other way, for example by means of deformations of the Poincaré algebra (once argued to arise in Loop Quantum Gravity, now ailing on life-support), you get yourself into deep shit (been there, done that, still got the smell in my nose).

Does that mean that the string is the thing? No, because this doesn’t actually tell you anything specific about the UV completion, except that it must have a well-behaved type of non-local interaction that Loop Quantum Gravity doesn’t seem to bring, or at least it isn’t presently understood how it would. Either way, I find this an interesting development.

The great benefit of writing a blog is that I’m not required to contact “researchers not involved in the study” and ask for an “outside opinion.” It’s also entirely superfluous because I can just tell you myself that the String Theorist said “well, it’s about time” and the Loop Quantum Gravity person said “that’s very controversial and actually there is also this paper and that approach which says something different.” Good thing you have me to be plainly unapologetically annoying ;) My pleasure.

Georg von HippelFundamental Parameters from Lattice QCD, Day Two

This morning, we started with a talk by Taku Izubuchi, who reviewed the lattice efforts relating to the hadronic contributions to the anomalous magnetic moment (g-2) of the muon. While the QED and electroweak contributions to (g-2) are known to great precision, most of the theoretical uncertainty presently comes from the hadronic (i.e. QCD) contributions, of which there are two that are relevant at the present level of precision: the contribution from the hadronic vacuum polarization, which can be inserted into the leading-order QED correction, and the contribution from hadronic light-by-light scattering, which can be inserted between the incoming external photon and the muon line. There are a number of established methods for computing the hadronic vacuum polarization, both phenomenologically using a dispersion relation and the experimental R-ratio, and in lattice field theory by computing the correlator of two vector currents (which can, and needs to, be refined in various ways in order to achieve competitive levels of precision). No such well-established methods exist yet for the light-by-light scattering, which is so far mostly described using models. There are, however, now efforts from a number of different sides to tackle this contribution; Taku mainly presented the approach by the RBC/UKQCD collaboration, which uses stochastic sampling of the internal photon propagators to explicitly compute the diagrams contributing to (g-2). Another approach would be to calculate the four-point amplitude explicitly (which has recently been done for the first time by the Mainz group) and to decompose this into form factors, which can then be integrated to yield the light-by-light scattering contribution to (g-2).

The second talk of the day was given by Petros Dimopoulos, who reviewed lattice determinations of D and B leptonic decays and mixing. For the charm quark, cut-off effects appear to be reasonably well-controlled with present-day lattice spacings and actions, and the most precise lattice results for the D and Ds decay constants claim sub-percent accuracy. For the b quark, effective field theories or extrapolation methods have to be used, which introduces a source of hard-to-assess theoretical uncertainty, but the results obtained from the different approaches generally agree very well amongst themselves. Interestingly, there does not seem to be any noticeable dependence on the number of dynamical flavours in the heavy-quark flavour observables, as Nf=2 and Nf=2+1+1 results agree very well to within the quoted precisions.

In the afternoon, the CKMfitter collaboration split off to hold their own meeting, and the lattice participants met for a few one-on-one or small-group discussions of some topics of interest.

John BaezThe Inverse Cube Force Law

Here you see three planets. The blue planet is orbiting the Sun in a realistic way: it’s going around an ellipse.

The other two are moving in and out just like the blue planet, so they all stay on the same circle. But they’re moving around this circle at different rates! The green planet is moving faster than the blue one: it completes 3 orbits each time the blue planet goes around once. The red planet isn’t going around at all: it only moves in and out.

What’s going on here?

In 1687, Isaac Newton published his Principia Mathematica. This book is famous, but in Propositions 43–45 of Book I he did something that people didn’t talk about much—until recently. He figured out what extra force, besides gravity, would make a planet move like one of these weird other planets. It turns out an extra force obeying an inverse cube law will do the job!

Let me make this more precise. We’re only interested in ‘central forces’ here. A central force is one that only pushes a particle towards or away from some chosen point, and only depends on the particle’s distance from that point. In Newton’s theory, gravity is a central force obeying an inverse square law:

F(r) = - \displaystyle{ \frac{a}{r^2} }

for some constant a. But he considered adding an extra central force obeying an inverse cube law:

F(r) = - \displaystyle{ \frac{a}{r^2} + \frac{b}{r^3} }

He showed that if you do this, for any motion of a particle in the force of gravity you can find a motion of a particle in gravity plus this extra force, where the distance r(t) is the same, but the angle \theta(t) is not.

In fact Newton did more. He showed that if we start with any central force, adding an inverse cube force has this effect.

There’s a very long page about this on Wikipedia:

Newton’s theorem of revolving orbits, Wikipedia.

I haven’t fully understood all of this, but it instantly makes me think of three other things I know about the inverse cube force law, which are probably related. So maybe you can help me figure out the relationship.

The first, and simplest, is this. Suppose we have a particle in a central force. It will move in a plane, so we can use polar coordinates r, \theta to describe its position. We can describe the force away from the origin as a function F(r). Then the radial part of the particle’s motion obeys this equation:

\displaystyle{ m \ddot r = F(r) + \frac{L^2}{mr^3} }

where L is the magnitude of the particle’s angular momentum.

So, angular momentum acts to provide a ‘fictitious force’ pushing the particle out, which one might call the centrifugal force. And this force obeys an inverse cube force law!

Furthermore, thanks to the formula above, it’s pretty obvious that if you change L but also add a precisely compensating inverse cube force, the value of \ddot r will be unchanged! So, we can set things up so that the particle’s radial motion will be unchanged. But its angular motion will be different, since it has a different angular momentum. This explains Newton’s observation.
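Here’s a quick numerical sanity check of this, as a minimal sketch rather than anything taken from Newton or the Wikipedia page. It assumes units where m = 1 and a = 1, a hand-rolled RK4 integrator, and made-up helper names (simulate, compare). One particle feels only gravity and has angular momentum L. The other has angular momentum kL and feels the extra force b/r^3 with b = L^2(1 - k^2)/m, which is exactly what the radial equation above demands if \ddot r is to stay the same. The motion is integrated in Cartesian coordinates, so the agreement of r(t) is not built in by hand.

```python
import math

def simulate(a, b, L, m=1.0, r0=1.0, dt=1e-3, steps=5000):
    """Integrate planar motion under the central force F(r) = -a/r^2 + b/r^3.

    The particle starts at (r0, 0) with zero radial velocity and angular
    momentum L. Returns (rs, thetas): the radius and the unwrapped polar
    angle at each RK4 step.
    """
    def deriv(state):
        x, y, vx, vy = state
        r = math.hypot(x, y)
        f_over_m = (-a / r**2 + b / r**3) / m  # signed radial acceleration
        return (vx, vy, f_over_m * x / r, f_over_m * y / r)

    def shifted(state, k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))

    state = (r0, 0.0, 0.0, L / (m * r0))
    theta = 0.0
    rs, thetas = [r0], [theta]
    for _ in range(steps):
        k1 = deriv(state)
        k2 = deriv(shifted(state, k1, dt / 2))
        k3 = deriv(shifted(state, k2, dt / 2))
        k4 = deriv(shifted(state, k3, dt))
        new = tuple(s + dt / 6 * (p + 2 * q + 2 * u + w)
                    for s, p, q, u, w in zip(state, k1, k2, k3, k4))
        # Angle swept during this step, from the cross and dot products
        # of the old and new position vectors.
        cross = state[0] * new[1] - state[1] * new[0]
        dot = state[0] * new[0] + state[1] * new[1]
        theta += math.atan2(cross, dot)
        state = new
        rs.append(math.hypot(state[0], state[1]))
        thetas.append(theta)
    return rs, thetas

def compare(k, a=1.0, L=1.2, m=1.0):
    """Largest radial mismatch and the final angles for gravity alone versus
    gravity plus the compensating inverse cube force with L replaced by k*L."""
    b = L**2 * (1 - k**2) / m
    rs1, th1 = simulate(a, 0.0, L, m)
    rs2, th2 = simulate(a, b, k * L, m)
    mismatch = max(abs(r1 - r2) for r1, r2 in zip(rs1, rs2))
    return mismatch, th1[-1], th2[-1]

for k in (3, 0):
    mismatch, theta_gravity_only, theta_modified = compare(k)
    print(k, mismatch, theta_gravity_only, theta_modified)
```

Taking k = 3 mimics the green planet, whose angle should come out three times the blue planet’s, and k = 0 mimics the red planet, which only moves in and out. In both cases the radial mismatch should be down at the level of the integrator’s error.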

It’s often handy to write a central force in terms of a potential:

F(r) = -V'(r)

Then we can make up an extra potential responsible for the centrifugal force, and combine it with the actual potential V into a so-called effective potential:

\displaystyle{ U(r) = V(r) + \frac{L^2}{2mr^2} }

The particle’s radial motion then obeys a simple equation:

m \ddot{r} = - U'(r)

For a particle in gravity, where the force obeys an inverse square law and V is proportional to -1/r, the effective potential might look like this:

This is the graph of

\displaystyle{ U(r) = -\frac{4}{r} + \frac{1}{r^2} }

If you’re used to particles rolling around in potentials, you can easily see that a particle with not too much energy will move back and forth, never making it to r = 0 or r = \infty. This corresponds to an elliptical orbit. Give it more energy and the particle can escape to infinity, but it will never hit the origin. The repulsive ‘centrifugal force’ always overwhelms the attraction of gravity near the origin, at least if the angular momentum is nonzero.
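To make ‘not too much energy’ concrete, here’s a small sketch for the specific potential U(r) = -4/r + 1/r^2 graphed above. The function names are made up for illustration. Setting U'(r) = 0 puts the bottom of the well at r = 1/2, where U = -4, and solving U(r) = E, which is the quadratic E r^2 + 4r - 1 = 0, gives the two turning points of a bound orbit with -4 < E < 0.

```python
import math

def U(r):
    """The effective potential graphed above."""
    return -4.0 / r + 1.0 / r**2

def turning_points(E):
    """Inner and outer radii where U(r) = E, for a bound orbit (-4 < E < 0)."""
    if not (-4.0 < E < 0.0):
        raise ValueError("bound orbits need -4 < E < 0")
    # U(r) = E  is equivalent to  E r^2 + 4 r - 1 = 0
    root = math.sqrt(16.0 + 4.0 * E)
    r_inner = (-4.0 + root) / (2.0 * E)
    r_outer = (-4.0 - root) / (2.0 * E)
    return r_inner, r_outer

print(turning_points(-3.0))  # approximately (1/3, 1)
```

So with E = -3 the particle bounces back and forth between r = 1/3 and r = 1, never reaching the origin or infinity.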

On the other hand, suppose we have a particle moving in an attractive inverse cube force! Then the potential is proportional to 1/r^2, so the effective potential is

\displaystyle{ U(r) = \frac{c}{r^2} + \frac{L^2}{2mr^2} }

where c is negative for an attractive force. If this attractive force is big enough, namely

\displaystyle{ c < -\frac{L^2}{2m} }

then this force can exceed the centrifugal force, and the particle can fall in to r = 0. If we keep track of the angular coordinate \theta, we can see what’s really going on. The particle is spiraling in to its doom, hitting the origin in a finite amount of time!
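Here’s a quick sketch of why the fall takes only a finite time, under two simplifying assumptions that aren’t part of the discussion above: write the effective potential as U(r) = -\kappa/r^2 with \kappa > 0, and take the total energy of the radial motion to be exactly zero. Conservation of energy then says

\displaystyle{ \frac{m}{2} \dot r^2 - \frac{\kappa}{r^2} = 0 }

so for an infalling particle

\displaystyle{ \dot r = -\frac{1}{r} \sqrt{\frac{2\kappa}{m}} }

and thus

\displaystyle{ \frac{d}{dt} r^2 = 2 r \dot r = -2 \sqrt{\frac{2\kappa}{m}} }

Starting from r = r_0 at t = 0, we get

\displaystyle{ r(t)^2 = r_0^2 - 2 \sqrt{\frac{2\kappa}{m}} \, t }

which hits zero at the finite time

\displaystyle{ T = \frac{r_0^2}{2} \sqrt{\frac{m}{2\kappa}} }

Meanwhile \dot \theta = L/(mr^2), and the integral of 1/r(t)^2 diverges logarithmically as t approaches T. So, if L is nonzero, the particle winds around the origin infinitely many times on its way in.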

This should remind you of a black hole, and indeed something similar happens there, but even more drastic:

Schwarzschild geodesics: effective radial potential energy, Wikipedia.

For a nonrotating uncharged black hole, the effective potential has three terms. Like Newtonian gravity it has an attractive -1/r term and a repulsive 1/r^2 term. But it also has an attractive -1/r^3 term! In other words, it’s as if on top of Newtonian gravity, we had another attractive force obeying an inverse fourth power law! This overwhelms the others at short distances, so if you get too close to a black hole, you spiral in to your doom.

For example, a black hole can have an effective potential like this:

But back to inverse cube force laws! I know two more things about them. A while back I discussed how a particle in an inverse square force can be reinterpreted as a harmonic oscillator:

Planets in the fourth dimension, Azimuth.

There are many ways to think about this, and apparently the idea in some form goes all the way back to Newton! It involves a sneaky way to take a particle in a potential

\displaystyle{ V(r) \propto r^{-1} }

and think of it as moving around in the complex plane. Then if you square its position—thought of as a complex number—and cleverly reparametrize time, you get a particle moving in a potential

\displaystyle{ V(r) \propto r^2 }

This amazing trick can be generalized! A particle in a potential

\displaystyle{ V(r) \propto r^p }

can be transformed to a particle in a potential

\displaystyle{ V(r) \propto r^q }

whenever

(p+2)(q+2) = 4

A good description is here:

• Rachel W. Hall and Krešimir Josić, Planetary motion and the duality of force laws, SIAM Review 42 (2000), 115–124.

This trick transforms particles in r^p potentials with p ranging between -2 and +\infty to r^q potentials with q ranging between +\infty and -2. It’s like a see-saw: when p is small, q is big, and vice versa.

But you’ll notice this trick doesn’t actually work at p = -2, the case that corresponds to the inverse cube force law. The problem is that p + 2 = 0 in this case, so we can’t find q with (p+2)(q+2) = 4.
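To see the see-saw in action, here’s a tiny sketch that solves (p+2)(q+2) = 4 for q using exact fractions. The function name dual_exponent is made up for illustration, and it returns None in the one case where no partner exists.

```python
from fractions import Fraction

def dual_exponent(p):
    """Return q with (p + 2)(q + 2) = 4, or None when p = -2."""
    p = Fraction(p)
    if p == -2:
        return None  # no partner: this is the inverse cube force law
    return Fraction(4) / (p + 2) - 2

for p in (-1, 2, 1, -2):
    print(p, dual_exponent(p))
```

This pairs p = -1 (the inverse square force law) with q = 2 (the harmonic oscillator), and applying the map twice brings you back to where you started.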

So, the inverse cube force is special in three ways: it’s the one that you can add on to any force to get solutions with the same radial motion but different angular motion, it’s the one that naturally describes the ‘centrifugal force’, and it’s the one that doesn’t have a partner! We’ve seen how the first two ways are secretly the same. I don’t know about the third, but I’m hopeful.

Quantum aspects

Finally, here’s a fourth way in which the inverse cube law is special. This shows up most visibly in quantum mechanics… and this is what got me interested in this business in the first place.

You see, I’m writing a paper called ‘Struggles with the continuum’, which discusses problems in analysis that arise when you try to make some of our favorite theories of physics make sense. The inverse square force law poses interesting problems of this sort, which I plan to discuss. But I started wanting to compare the inverse cube force law, just so people can see things that go wrong in this case, and not take our successes with the inverse square law for granted.

Unfortunately a huge digression on the inverse cube force law would be out of place in that paper. So, I’m offloading some of that material to here.

In quantum mechanics, a particle moving in an inverse cube force law has a Hamiltonian like this:

H = -\nabla^2 + c r^{-2}

The first term describes the kinetic energy, while the second describes the potential energy. I’m setting \hbar = 1 and 2m = 1 to remove some clutter that doesn’t really affect the key issues.

To see how strange this Hamiltonian is, let me compare an easier case. If p < 2, the Hamiltonian

H = -\nabla^2 + c r^{-p}

is essentially self-adjoint on C_0^\infty(\mathbb{R}^3 - \{0\}), which is the space of compactly supported smooth functions on 3d Euclidean space minus the origin. What this means is that first of all, H is defined on this domain: it maps functions in this domain to functions in L^2(\mathbb{R}^3). But more importantly, it means we can uniquely extend H from this domain to a self-adjoint operator on some larger domain. In quantum physics, we want our Hamiltonians to be self-adjoint. So, this fact is good.

Proving this fact is fairly hard! It uses something called the Kato–Lax–Milgram–Nelson theorem together with this beautiful inequality:

\displaystyle{ \int_{\mathbb{R}^3} \frac{1}{4r^2} |\psi(x)|^2 \,d^3 x \le \int_{\mathbb{R}^3} |\nabla \psi(x)|^2 \,d^3 x }

for any \psi\in C_0^\infty(\mathbb{R}^3).

If you think hard, you can see this inequality is actually a fact about the quantum mechanics of the inverse cube law! It says that if c \ge -1/4, the energy of a quantum particle in the potential c r^{-2} is bounded below. And in a sense, this inequality is optimal: if c < -1/4, the energy is not bounded below. This is the quantum version of how a classical particle can spiral in to its doom in an attractive inverse cube law, if it doesn’t have enough angular momentum. But it’s subtly and mysteriously different.

You may wonder how this inequality is used to prove good things about potentials that are ‘less singular’ than the c r^{-2} potential: that is, potentials c r^{-p} with p < 2. For that, you have to use some tricks that I don’t want to explain here. I also don’t want to prove this inequality, or explain why it’s optimal! You can find most of this in some old course notes of mine:

• John Baez, Quantum Theory and Analysis, 1989.

See especially section 15.

But it’s pretty easy to see how this inequality implies things about the expected energy of a quantum particle in the potential c r^{-2}. So let’s do that.

In this potential, the expected energy of a state \psi is:

\displaystyle{  \langle \psi, H \psi \rangle =   \int_{\mathbb{R}^3} \overline\psi(x)\, (-\nabla^2 + c r^{-2})\psi(x) \, d^3 x }

Doing an integration by parts, this gives:

\displaystyle{  \langle \psi, H \psi \rangle = \int_{\mathbb{R}^3} |\nabla \psi(x)|^2 + cr^{-2} |\psi(x)|^2 \,d^3 x }

The inequality I showed you says precisely that when c = -1/4, this is greater than or equal to zero. So, the expected energy is actually nonnegative in this case! And making c greater than -1/4 only makes the expected energy bigger.
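Here is a minimal numerical sketch of that statement for one trial state. It assumes a spherically symmetric Gaussian, \psi(r) = e^{-r^2/2}, which is not compactly supported but decays fast enough for both integrals to converge. The helper names `radial_integral` and `energy_terms` are made up for this illustration.

```python
import math

def radial_integral(f, r_max=12.0, n=20_000):
    """Midpoint-rule approximation of the integral of f(r) over (0, r_max)."""
    h = r_max / n
    return h * sum(f((i + 0.5) * h) for i in range(n))

def energy_terms(psi, dpsi):
    """Return (kinetic, hardy) for a spherically symmetric psi(r) on R^3.

    kinetic = integral of |grad psi|^2 d^3x
    hardy   = integral of |psi|^2 / (4 r^2) d^3x
    For a radial function the volume element is 4 pi r^2 dr.
    """
    kinetic = radial_integral(lambda r: 4 * math.pi * r**2 * dpsi(r) ** 2)
    hardy = radial_integral(lambda r: 4 * math.pi * r**2 * psi(r) ** 2 / (4 * r**2))
    return kinetic, hardy

# Trial state (an assumption of this sketch): unnormalized Gaussian.
psi = lambda r: math.exp(-r * r / 2)
dpsi = lambda r: -r * math.exp(-r * r / 2)

kinetic, hardy = energy_terms(psi, dpsi)

# With c = -1/4 the expected energy is kinetic - hardy, which should be >= 0.
energy_at_critical_c = kinetic - hardy
print(kinetic, hardy, energy_at_critical_c)
```

For this particular Gaussian the two integrals can be done by hand: the kinetic term is 3\pi^{3/2}/2 and the other term is \pi^{3/2}/2, so the inequality holds with room to spare.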

Note that in classical mechanics, the energy of a particle in this potential ceases to be bounded below as soon as c < 0. Quantum mechanics is different because of the uncertainty principle! To get a lot of negative potential energy, the particle’s wavefunction must be squished near the origin, but that gives it kinetic energy.

It turns out that the Hamiltonian for a quantum particle in an inverse cube force law has exquisitely subtle and tricky behavior. Many people have written about it, running into ‘paradoxes’ when they weren’t careful enough. Only rather recently have things been straightened out.

For starters, the Hamiltonian for this kind of particle

H = -\nabla^2 + c r^{-2}

has different behaviors depending on c. Obviously the force is repulsive when c > 0 and attractive when c < 0, but that’s not the only thing that matters! Here’s a summary:

c \ge 3/4. In this case H is essentially self-adjoint on C_0^\infty(\mathbb{R}^3 - \{0\}). So, it admits a unique self-adjoint extension and there’s no ambiguity about this case.

c < 3/4. In this case H is not essentially self-adjoint on C_0^\infty(\mathbb{R}^3 - \{0\}). In fact, it admits more than one self-adjoint extension! This means that we need extra input from physics to choose the Hamiltonian in this case. It turns out that we need to say what happens when the particle hits the singularity at r = 0. This is a long and fascinating story that I just learned yesterday.

c \ge -1/4. In this case the expected energy \langle \psi, H \psi \rangle is bounded below for \psi \in C_0^\infty(\mathbb{R}^3 - \{0\}). It turns out that whenever we have a Hamiltonian that is bounded below, even if there is not a unique self-adjoint extension, there exists a canonical ‘best choice’ of self-adjoint extension, called the Friedrichs extension. I explain this in my course notes.

c < -1/4. In this case the expected energy is not bounded below, so we don’t have the Friedrichs extension to help us choose which self-adjoint extension is ‘best’.

To go all the way down this rabbit hole, I recommend these two papers:

• Sarang Gopalakrishnan, Self-Adjointness and the Renormalization of Singular Potentials, B.A. Thesis, Amherst College.

• D. M. Gitman, I. V. Tyutin and B. L. Voronov, Self-adjoint extensions and spectral analysis in the Calogero problem, Jour. Phys. A 43 (2010), 145205.

The first is good for a broad overview of problems associated to singular potentials such as the inverse cube force law; there is attention to mathematical rigor, but the focus is on physical insight. The second is good if you want, as I wanted, to really get to the bottom of the inverse cube force law in quantum mechanics. Both have lots of references.

Also, both point out a crucial fact I haven’t mentioned yet: in quantum mechanics the inverse cube force law is special because, naively at least, it has a kind of symmetry under rescaling! You can see this from the formula

H = -\nabla^2 + cr^{-2}

by noting that both the Laplacian and r^{-2} have units of length^{-2}. So, they both transform in the same way under rescaling: if you take any smooth function \psi, apply H and then expand the result by a factor of k, you get k^2 times what you get if you do those operations in the other order.

In particular, this means that if you have a smooth eigenfunction of H with eigenvalue \lambda, you will also have one with eigenvalue k^2 \lambda for any k > 0. And if your original eigenfunction was normalizable, so will be the new one!
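The scaling argument can be spelled out in a few lines. The dilation operator D_k is notation introduced only for this sketch; it does not appear in the papers cited above.

```latex
% Dilation by a factor k > 0 (notation assumed for this sketch):
(D_k \psi)(x) = \psi(x/k)

% Both terms of H = -\nabla^2 + c r^{-2} pick up the same factor k^{-2}:
-\nabla^2 (D_k \psi) = k^{-2} D_k(-\nabla^2 \psi), \qquad
r^{-2} (D_k \psi) = k^{-2} D_k(r^{-2} \psi)

% Hence
H D_k \psi = k^{-2} D_k H \psi

% and an eigenfunction is mapped to an eigenfunction:
H \psi = \lambda \psi \;\implies\; H (D_k \psi) = k^{-2} \lambda \, (D_k \psi)

% Normalizability is preserved, since in three dimensions
\| D_k \psi \|^2 = k^3 \| \psi \|^2
```

Since k ranges over all positive numbers, so does k^{-2}, which is why every positive multiple of \lambda shows up as an eigenvalue.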

With some calculation you can show that when c \le -1/4, the Hamiltonian H has a smooth normalizable eigenfunction with a negative eigenvalue. In fact it’s spherically symmetric, so finding it is not so terribly hard. But this instantly implies that H has smooth normalizable eigenfunctions with any negative eigenvalue.

This implies various things, some terrifying. First of all, it means that H is not bounded below, at least not on the space of smooth normalizable functions. A similar but more delicate scaling argument shows that it’s also not bounded below on C_0^\infty(\mathbb{R}^3 - \{0\}), as I claimed earlier.

This is scary but not terrifying: it simply means that when c \le -1/4, the potential is too strongly negative for the Hamiltonian to be bounded below.

The terrifying part is this: we’re getting uncountably many normalizable eigenfunctions, all with different eigenvalues, one for each choice of k. A self-adjoint operator on a countable-dimensional Hilbert space like L^2(\mathbb{R}^3) can’t have uncountably many normalizable eigenvectors with different eigenvalues, since then they’d all be orthogonal to each other, and that’s too many orthogonal vectors to fit in a Hilbert space of countable dimension!

This sounds like a paradox, but it’s not. These functions are not all orthogonal, and they’re not all eigenfunctions of a self-adjoint operator. You see, the operator H is not self-adjoint on the domain we’ve chosen, the space of all smooth functions in L^2(\mathbb{R}^3). We can carefully choose a domain to get a self-adjoint operator… but it turns out there are many ways to do it.

Intriguingly, in most cases this choice breaks the naive dilation symmetry. So, we’re getting what physicists call an ‘anomaly’: a symmetry of a classical system that fails to give a symmetry of the corresponding quantum system.

Of course, if you’ve made it this far, you probably want to understand what the different choices of Hamiltonian for a particle in an inverse cube force law actually mean, physically. The idea seems to be that they say how the particle changes phase when it hits the singularity at r = 0 and bounces back out.

(Why does it bounce back out? Well, if it didn’t, time evolution would not be unitary, so it would not be described by a self-adjoint Hamiltonian! We could try to describe the physics of a quantum particle that does not come back out when it hits the singularity, and I believe people have tried, but this requires a different set of mathematical tools.)

For a detailed analysis of this, it seems one should take Schrödinger’s equation and do a separation of variables into the angular part and the radial part:

\psi(r,\theta,\phi) = \Psi(r) \Phi(\theta,\phi)

For each choice of \ell = 0,1,2,\dots one gets a space of spherical harmonics that one can use for the angular part \Phi. The interesting part is the radial part, \Psi. Here it is helpful to make a change of variables

u(r) = r \Psi(r)

At least naively, Schrödinger’s equation for the particle in the cr^{-2} potential then becomes

\displaystyle{ \frac{d}{dt} u = -iH u }

where

\displaystyle{ H = -\frac{d^2}{dr^2} + \frac{c + \ell(\ell+1)}{r^2} }

Beware: I keep calling all sorts of different but related Hamiltonians H, and this one is for the radial part of the dynamics of a quantum particle in an inverse cube force. As we’ve seen before in the classical case, the centrifugal force and the inverse cube force join forces in an ‘effective potential’

\displaystyle{ U(r) = kr^{-2} }

where

k = c + \ell(\ell+1)

So, we have reduced the problem to that of a particle on the open half-line (0,\infty) moving in the potential kr^{-2}. The Hamiltonian for this problem:

\displaystyle{ H = -\frac{d^2}{dr^2} + \frac{k}{r^2} }

is called the Calogero Hamiltonian. Needless to say, it has fascinating and somewhat scary properties, since to make it into a bona fide self-adjoint operator, we must make some choice about what happens when the particle hits r = 0. The formula above does not really specify the Hamiltonian.
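As a rough illustration of how the angular momentum shifts the coupling, the sketch below computes k = c + \ell(\ell+1) for each \ell and compares it with the two thresholds from the summary above. It assumes that the thresholds 3/4 and -1/4, stated there for c, apply to k in each angular momentum channel (the \ell = 0 channel then reproduces the conditions on c). The function names are hypothetical.

```python
def effective_coupling(c, ell):
    """k = c + ell(ell + 1): strength of the k / r^2 term in the radial problem."""
    return c + ell * (ell + 1)

def classify(k):
    """Compare k with the two thresholds for -d^2/dr^2 + k/r^2 on (0, infinity).

    Assumption of this sketch: 3/4 and -1/4, quoted in the text for c,
    are applied to the effective coupling k in every channel.
    """
    return {
        "k": k,
        "essentially_self_adjoint": k >= 0.75,
        "bounded_below": k >= -0.25,
    }

def channels(c, max_ell=3):
    """Classification of each angular momentum channel ell = 0..max_ell."""
    return {ell: classify(effective_coupling(c, ell)) for ell in range(max_ell + 1)}

# Example: a moderately attractive potential, c = -1.
for ell, info in channels(-1.0).items():
    print(ell, info)
```

In this example only the \ell = 0 channel is in trouble: with c = -1 it has k = -1, below both thresholds, while \ell = 1 already gives k = 1. Under the same assumption, even c = 0 has k = 0 < 3/4 in the \ell = 0 channel.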

This is more or less where Gitman, Tyutin and Voronov begin their analysis, after a long and pleasant review of the problem. They describe all the possible choices of self-adjoint operator that are allowed. The answer depends on the values of k, but very crudely, the choice says something like how the phase of your particle changes when it bounces off the singularity. Most choices break the dilation invariance of the problem. But intriguingly, some choices retain invariance under a discrete subgroup of dilations!

So, the rabbit hole of the inverse cube force law goes quite deep, and I expect I haven’t quite gotten to the bottom yet. The problem may seem pathological, verging on pointless. But the math is fascinating, and it’s a great testing-ground for ideas in quantum mechanics: very manageable compared to deeper subjects like quantum field theory, which are riddled with their own pathologies. Finally, the connection between the inverse cube force law and centrifugal force makes me think it’s not a mere curiosity.


The animation was made by ‘WillowW’ and placed on Wikicommons. It’s one of a number that appear in this Wikipedia article:

Newton’s theorem of revolving orbits, Wikipedia.

I made the graphs using the free online Desmos graphing calculator.

Tommaso DorigoBel's Temple In Palmyra Is No More

Images of the systematic destruction of archaeological sites and art pieces in Syria are no news any more, but I was especially saddened to see before/after aerial pictures of Palmyra's site today, which demonstrate how the beautiful temple of Bel has been completely destroyed by explosives. A picture of the temple is shown below.

August 31, 2015

Secret Blogging Seminarvan Ekeren, Möller, Scheithauer on holomorphic orbifolds

There aren’t many blog posts about vertex operator algebras, so I thought I’d help fill this gap by mentioning a substantial advance by Jethro van Ekeren, Sven Möller, and Nils Scheithauer that appeared on the ArXiv last month. The most important feature is that this paper resolves several folklore conjectures that have been around since near the beginning of vertex operator algebra theory. This was good for me, since I was able to use some of these results to prove the Generalized Moonshine Conjecture much more quickly than I had expected. I won’t say much about moonshine here, as I think it deserves its own post.

I briefly discussed vertex operator algebras in my earlier post on generalized moonshine. While an ordinary commutative ring has a multiplication structure A \otimes A \to A, a vertex operator algebra (or VOA) has a “meromorphic” version V \otimes V \to V((z)), and there is an integer grading on the underlying vector space that is compatible with the powers of z in a straightforward way.

I won’t say much about VOAs in general, but rather, I will consider those that satisfy some of the following nice properties:
Rational: Any V-module is a direct sum of irreducibles.
Holomorphic: Any V-module is a direct sum of copies of V.
C_2 cofinite: This is a rather technical-sounding condition that ends up being equivalent to a lot of natural representation-theoretic finiteness properties, like “every representation is a direct sum of generalized eigenspaces for the energy operator L(0)”.
It is conjectured that every rational VOA is C_2 cofinite.

As usual, when we have a collection of nice objects, we may want to classify them, or at least find ways of building new ones and discovering invariants and constraints.

Some basic invariants are the central charge c (a complex number), and the character of a module M, given by the graded dimension T_M(\tau) = Tr(q^{L(0)-c/24}|M), where the grading is given by the energy operator L(0), and we view the power series as a function on the complex upper half plane using q = \exp(2\pi i \tau). One of the first general results for “nice” VOAs is Zhu’s 1996 proof that if V is rational and C_2 cofinite, then the characters of irreducible V-modules form a vector-valued modular form for a finite dimensional representation of SL_2(\mathbb{Z}). Furthermore, he showed that in this case, the central charge c is a rational number, and if V is holomorphic, then c is a nonnegative integer divisible by 8.

Dong and Mason classified the holomorphic C_2 cofinite VOAs of central charge 8 and 16 – there is one isomorphism class for central charge 8, and 2 isomorphism classes for central charge 16. All three are given by a lattice VOA construction. In general, if you are given an even unimodular positive definite lattice (which only exists in dimension divisible by 8), you get a holomorphic C_2 cofinite VOA from it, so the central charge 8 object comes from the E_8 lattice, and the central charge 16 objects come from the E_8 \times E_8 and D_{16}^+ lattices. Central charge 24 is at a sweet spot of difficulty, where Schellekens did a long calculation in 1993 and conjectured the existence of 71 isomorphism types. Central charge 32 is more or less impossible, since lattices alone give over 10^9 types.
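To make the graded dimension concrete, here is a small sketch for the central charge 8 example. It relies on two standard facts that are not spelled out in this post, so treat them as assumptions: the character of a lattice VOA is the theta function of the lattice divided by \eta(\tau)^{rank}, and the theta function of the E_8 lattice is the Eisenstein series E_4. The function names are purely illustrative.

```python
def sigma3(n):
    """Sum of the cubes of the positive divisors of n."""
    return sum(d**3 for d in range(1, n + 1) if n % d == 0)

def eisenstein_e4(order):
    """Coefficients of E_4 = 1 + 240 * sum_{n>=1} sigma_3(n) q^n up to q^order."""
    return [1] + [240 * sigma3(n) for n in range(1, order + 1)]

def inverse_euler_product(power, order):
    """Coefficients of prod_{n>=1} (1 - q^n)^(-power), truncated at q^order."""
    coeffs = [1] + [0] * order
    for n in range(1, order + 1):
        for _ in range(power):
            # Multiply in place by 1 / (1 - q^n): c[m] += c[m - n].
            for m in range(n, order + 1):
                coeffs[m] += coeffs[m - n]
    return coeffs

def multiply(a, b, order):
    """Product of two truncated power series, kept up to q^order."""
    out = [0] * (order + 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j <= order:
                out[i + j] += x * y
    return out

def e8_character(order):
    """Coefficients a_n in the expansion q^(-1/3) * sum_n a_n q^n."""
    return multiply(eisenstein_e4(order), inverse_euler_product(8, order), order)

print(e8_character(3))  # [1, 248, 4124, 34752]
```

The overall factor q^{-1/3} is the q^{-c/24} in the definition of the character, with c = 8. The second coefficient, 248, is the dimension of the L(0)-eigenspace with eigenvalue 1, which matches the dimension of the Lie algebra E_8.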

For central charge 24, because the L(0) eigenspace V_1 with eigenvalue 1 is naturally a Lie algebra, the proposed isomorphism types are labeled by finite dimensional Lie algebras. Schellekens’s list is basically

1. The monster VOA, with V_1 = 0.
2. The Leech lattice VOA, with V_1 commutative of dimension 24.
3. 69 extensions of rational Kac-Moody VOAs by suitable modules (here the Lie algebras are products of simple Lie algebras and in particular noncommutative).

As far as existence is concerned, 23 of the 69 come from lattices, known as the Niemeier lattices. An additional 14 come from Z/2 orbifolds of lattices. Another 18 come from a “framed VOA” construction, given by adjoining modules to a tensor product of Ising models according to some codes (Lam, Shimakura, and Yamauchi are the main names here). The remaining 12 are more difficult, and after this recent paper, there are 2 that have not been constructed. There are only a few cases where uniqueness is known, such as the Leech lattice VOA. The V_1 = 0 case is wide open, and perhaps the worst for uniqueness, since there isn’t any Lie algebra structure to work with.

One of the results of van Ekeren, Möller, and Scheithauer was a reconstruction of Schellekens’s list, i.e., eliminating other choices of Lie algebras from possibility. This was desirable, since the original paper was quite sketchy in places and didn’t have proofs. A second result was a collection of new examples, in particular nearly filling out this list of 69. They did this by solving an old problem, namely the construction of holomorphic orbifolds. The idea is the following: Given a holomorphic C_2 cofinite VOA V, and a finite order automorphism g, take the fixed-point subalgebra V^g, and take a direct sum with some V^g-modules not in V to get something new. In fact, the desired V^g-modules were more or less known – there is a notion of g-twisted V-module V(g), and one takes the submodules of all V(g^i) fixed by a suitable lift of g. To show that this even makes sense requires substantial development of the theory.

First, the existence and uniqueness of irreducible g-twisted V-modules V(g) was a nonconstructive theorem of Dong, Li, and Mason in 2000. Then, to get a multiplication operation on the component V^g-modules, one first shows that irreducible V^g-modules have a nice tensor structure (in particular, are simple currents), so that the space of suitable multiplication maps is highly constrained. This requires recent major theorems of Miyamoto (V^g is rational and C_2 cofinite – 2013), and Huang (if V is rational and C_2 cofinite, then Rep(V) is a modular tensor category and the Verlinde formula holds – 2008). By some clever applications of the Verlinde formula, van Ekeren, Möller, and Scheithauer showed that once we have simple currents with suitable L(0)-eigenvalues, the homological obstruction to a well-behaved multiplication vanishes, and one gets a holomorphic VOA.

The intermediate results that I found most useful for my own purposes were:
1. assembly of an abelian intertwining algebra (a generalization of VOA where the commutativity of multiplication is allowed some monodromy) from all irreducible V^g-modules.
2. the explicit description of the SL_2(\mathbb{Z}) action on the characters of irreducible V^g-modules. This also solves a conjecture of Dong, Li, and Mason concerning the graded dimension of twisted modules.

In particular, if g has order n, then the simple currents are arranged into a central extension 0 \to \mathbb{Z}/n\mathbb{Z} \to A \to \mathbb{Z}/n\mathbb{Z} \to 0, where the kernel is given by an action of g, and the image is the twisting on modules. The group A is also equipped with a canonical \mathbb{Q}/\mathbb{Z}-valued quadratic form. One obtains an A-graded abelian intertwining algebra with monodromy determined by the quadratic form (up to a certain coboundary), and the SL_2(\mathbb{Z}) action is by the corresponding Weil representation (up to the c/24 correction).

n-Category Café Wrangling generators for subobjects

Guest post by John Wiltshire-Gordon

My new paper arXiv:1508.04107 contains a definition that may be of interest to category theorists. Emily Riehl has graciously offered me this chance to explain.

In algebra, if we have a firm grip on some object X, we probably have generators for X. Later, if we have some quotient X/\sim, the same set of generators will work. The trouble comes when we have a subobject Y \subseteq X, which (given typical bad luck) probably misses every one of our generators. We need theorems to find generators for subobjects.

Category theory offers a clean definition of generation: if C is some category of algebraic objects and F \dashv U is a free-forgetful adjunction with U : C \longrightarrow \mathrm{Set}, then it makes sense to say that a subset S \subseteq U X generates X if the adjunct arrow F S \rightarrow X is epic.

Certainly R-modules fit into this setup nicely, and groups, commutative rings, etc. What about simplicial sets? It makes sense to say that some simplicial set X is “generated” by its 1-simplices, for example: this is saying that X is 1-skeletal. But simplicial sets come with many sorts of generator… Ah, and they also come with many forgetful functors, given by evaluation at the various objects of \Delta^{op}.

Let’s assume we’re in a context where there are many forgetful functors, and many corresponding notions of generation. In fact, for concreteness, let’s think about cosimplicial vector spaces over the rational numbers. A cosimplicial vector space is a functor \Delta \longrightarrow \mathrm{Vect}, and so for each d \in \Delta we have a functor U_d : \mathrm{Vect}^{\Delta} \longrightarrow \mathrm{Set} with U_d V = V d and left adjoint F_d. We will say that a vector v \in V d sits in degree d, and generally think of V as a vector space graded by the objects of \Delta.

Definition A cosimplicial vector space V is generated in degree d \in \Delta if the component at V of the counit F_d U_d V \longrightarrow V is epic. Similarly, V is generated in degrees \{d_i \} if \oplus_i F_{d_i} U_{d_i} V \longrightarrow V is epic.

Example Let V = F_d \{ \ast \} be the free cosimplicial vector space on a single vector in degree d. Certainly V is generated in degree d. It’s less obvious that V admits a unique nontrivial subobject W \hookrightarrow V. Let’s try to find generators for W. It turns out that W d = 0, so no generators there. Since W \neq 0, there must be generators somewhere… but where?

Theorem (Wrangling generators for cosimplicial abelian groups): If V is a cosimplicial abelian group generated in degrees \{ d_i \}, then any subobject W \hookrightarrow V is generated in degrees \{d_i + 1 \}.

Ok, so now we know exactly where to look for generators for subobjects: exactly one degree higher than our generators for the ambient object. The generators have been successfully wrangled.

The preorder on degrees of generation \leq_d

Time to formalize. Let U_d, U_x, U_y: C \longrightarrow \mathrm{Set} be three forgetful functors, and let F_d, F_x, F_y be their left adjoints. When the labels d, x, y appear unattached to U or F, they represent formal “degrees of generation,” even though C need not be a functor category. In this broader setting, we say V \in C is generated in (formal) degree \star if the component of the counit F_{\star} U_{\star} V \longrightarrow V is epic. By the unit-counit identities, if V is generated in degree \star, the whole set U_{\star} V serves as a generating set.

Definition Say x \leq_d y if for all V \in C generated in degree d, every subobject W \hookrightarrow V generated in degree x is also generated in degree y.

Practically speaking, if x \leq_d y, then generators in degree x can always be replaced by generators in degree y provided that the ambient object is generated in degree d.

Suppose that we have a complete understanding of the preorder \leq_d, and we’re trying to generate subobjects inside some object generated in degree d. Then every time x \leq_d y, we may replace generators in degree x with their span in degree y. In other words, the generators S \subseteq U_x V are equivalent to generators \mathrm{Im}(U_y F_x S \longrightarrow U_y V) \subseteq U_y V. Arguing in this fashion, we may wrangle all generators upward in the preorder \leq_d. If \leq_d has a finite system of elements m_1, m_2, \ldots, m_k capable of bounding any other element from above, then all generators may be replaced by generators in degrees m_1, m_2, \ldots, m_k. This is the ideal wrangling situation, and lets us restrict our search for generators to this finite set of degrees.

In the case of cosimplicial vector spaces, d + 1 is a maximum for the preorder \leq_d with d \in \Delta. So any subobject of a cosimplicial vector space generated in degree d is generated in degree d + 1. (It is also true that, for example, d + 2 is a maximum for the preorder \leq_d. In fact, we have d + 1 \leq_d d + 2 \leq_d d + 1. That’s why it’s important that \leq_d is a preorder, and not a true partial order.)

Connection to the preprint arXiv:1508.04107

In the generality presented above, where a formal degree of generation is a free-forgetful adjunction to \mathrm{Set}, I do not know much about the preorder \leq_d. The paper linked above is concerned with the case C = (\mathrm{Mod}_R)^{\mathcal{D}} of functor categories of \mathcal{D}-shaped diagrams of R-modules. In this case I can say a lot.

In Definition 1.1, I give a computational description of the preorder \leq_d. This description makes it clear that if \mathcal{D} has finite hom-sets, then you could program a computer to tell you whenever x \leq_d y.

In Section 2.2, I give many different categories \mathcal{D} for which explicit upper bounds are known for the preorders \leq_d. (In the paper, an explicit system of upper bounds for every preorder is called a homological modulus.)

Connection to the field of Representation Stability

If you’re interested in more context for this work, I highly recommend two of Emily Riehl’s posts from February of last year on Representation Stability, a subject begun by Tom Church and Benson Farb. With Jordan Ellenberg, they explained how certain stability patterns can be considered consequences of structure theory for the category of \mathrm{FI}-modules (\mathrm{Vect}_{\mathbb{Q}})^{\mathrm{FI}} where \mathrm{FI} is the category of finite sets with injections. In the category of \mathrm{FI}-modules, the preorders \leq_n have no finite system of upper bounds. In contrast, for \mathrm{Fin}-modules, every preorder has a maximum! (Here \mathrm{Fin} is the usual category of finite sets). So having all finite set maps instead of just the injections gives much better control on generators for subobjects. As an application, Jordan and I use this extra control to obtain new results about configuration spaces of points on a manifold. You can read about it on his blog.

For more on the recent progress of representation stability, you can also check out the bibliography of my paper or take a look at exciting new results by CEF, as well as Rohit Nagpal, Andy Putman, Steven Sam, and Andrew Snowden, and Jenny Wilson.

Georg von HippelFundamental Parameters from Lattice QCD, Day One

Greetings from Mainz, where I have the pleasure of covering a meeting for you without having to travel from my usual surroundings (I clocked up more miles this year already than can be good for my environmental conscience).

Our Scientific Programme (which is the bigger of the two formats of meetings that the Mainz Institute of Theoretical Physics (MITP) hosts, the smaller being Topical Workshops) started off today with two keynote talks summarizing the status and expectations of the FLAG (Flavour Lattice Averaging Group, presented by Tassos Vladikas) and CKMfitter (presented by Sébastien Descotes-Genon) collaborations. Both groups are in some way in the business of performing weighted averages of flavour physics quantities, but of course their backgrounds, rationale and methods are quite different in many regards. I will not attempt to give a line-by-line summary of the talks or the afternoon discussion session here, but instead just summarize a few points that caused lively discussions or seemed important in some other way.

By now, computational resources have reached the point where we can achieve such statistics that the total error on many lattice determinations of precision quantities is completely dominated by systematics (and indeed different groups would differ at the several-σ level if one were to consider only their statistical errors). This may sound good in a way (because it is what you'd expect in the limit of infinite statistics), but it is also very problematic, because the estimation of systematic errors is in the end really more of an art than a science, having a crucial subjective component at its heart. This means not only that systematic errors quoted by different groups may not be readily comparable, but also that it becomes important how to treat systematic errors (which may also be correlated, if e.g. two groups use the same one-loop renormalization constants) when averaging different results. How to do this is again subject to subjective choices to some extent. FLAG imposes cuts on quantities relating to the most important sources of systematic error (lattice spacings, pion mass, spatial volume) to select acceptable ensembles, then adds the statistical and systematic errors in quadrature, before performing a weighted average and computing the overall error taking correlations between different results into account using Schmelling's procedure. CKMfitter, on the other hand, adds all systematic errors linearly, and uses the Rfit procedure to perform a maximum likelihood fit. Either choice is equally permissible, but they are not directly compatible (so CKMfitter can't use FLAG averages as such).
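To illustrate why the two conventions are not interchangeable, here is a toy sketch of the arithmetic only. It is not the FLAG or CKMfitter code: the weighted average uses plain inverse-variance weights and ignores correlations (so Schmelling's procedure is not implemented), the Rfit likelihood is not modelled at all, and the input numbers are invented for the example.

```python
import math

def total_error_quadrature(stat, systs):
    """Statistical and systematic errors combined in quadrature."""
    return math.sqrt(stat**2 + sum(s**2 for s in systs))

def systematic_error_linear(systs):
    """Systematic errors added linearly (kept separate from the statistical one)."""
    return sum(systs)

def weighted_average(values, errors):
    """Inverse-variance weighted mean and its error, assuming NO correlations."""
    weights = [1.0 / e**2 for e in errors]
    mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return mean, math.sqrt(1.0 / sum(weights))

# Invented inputs: (central value, statistical error, list of systematic errors).
results = [
    (1.00, 0.03, [0.04]),
    (1.10, 0.04, [0.02, 0.02]),
]

values = [v for v, _, _ in results]
errors = [total_error_quadrature(stat, systs) for _, stat, systs in results]
print(weighted_average(values, errors))

# The same systematics give different totals under the two conventions:
print(total_error_quadrature(0.0, [0.02, 0.02]), systematic_error_linear([0.02, 0.02]))
```

Two systematic errors of 0.02 each combine to about 0.028 in quadrature but to 0.04 linearly, which is one simple way to see that an average built with one convention cannot be fed directly into a fit that assumes the other.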

Another point raised was that it is important for lattice collaborations computing mixing parameters to not just provide products like f_B\sqrt{B_B}, but also f_B and B_B separately (as well as information about the correlation between these quantities) in order to help make the global CKM fits easier.

John PreskillThe enigma of Robert Hooke

In 1675, Robert Hooke published the “true mathematical and mechanical form” for the shape of an ideal arch.  However, Hooke wrote the theory as an anagram,


Its solution was never published in his lifetime.  What was the secret hiding in this series of letters?


An excerpt from Hooke’s manuscript “A description of helioscopes, and some other instruments”.

The arch is one of the fundamental building blocks in architecture.  Used in bridges, cathedrals, doorways, etc., arches provide an aesthetic quality to the structures they dwell within.  Their key utility comes from their ability to support weight above an empty space, by distributing the load onto the abutments at their feet.  A dome functions much like an arch, except a dome takes on a three-dimensional shape whereas an arch is two-dimensional.  Paradoxically, while being the backbone of many edifices, arches and domes themselves are extremely delicate: a single misplaced component along the curve, or an improper shape in the design, would spell doom for the entire structure.

The Romans employed the rounded arch/dome (in the shape of a semicircle/hemisphere) in their bridges and pantheons.  Gothic architecture favored the pointed arch and the ribbed vault in its designs.  However, neither of these arch forms was adequate for the progressively grander structures and more ambitious cathedrals sought in the 17th century.  Following the great fire of London in 1666, a massive rebuilding effort was under way.  Among the new public buildings, the most prominent was to be St. Paul’s Cathedral with its signature dome.  A modern theory of arches was sorely needed: what is the perfect shape for an arch/dome?

Christopher Wren, the chief architect of St. Paul’s Cathedral, consulted Hooke on the dome’s design.  To quote from the cathedral’s website [1]:

The two half-sections [of the dome] in the study employ a formula devised by Robert Hooke in about 1671 for calculating the curve of a parabolic dome and reducing its thickness.  Hooke had explored this curve the three-dimensional equivalent of the ‘hanging chain’, or catenary arch: the shape of a weighted chain which, when inverted, produces the ideal profile for a self-supporting arch.  He thought that such a curve derived from the equation y = x^3.

A figure from Wren’s design of St. Paul’s Cathedral. (Courtesy of the British Museum)

How did Hooke come upon the shape for the dome?  It wasn’t until after Hooke’s death that his executor provided the unencrypted solution to the anagram [2]:

Ut pendet continuum flexile, sic stabit contiguum rigidum inversum

which translates to

As hangs a flexible cable so, inverted, stand the touching pieces of an arch.

In other words, the ideal shape of an arch is exactly that of a freely hanging rope, only upside down.  Hooke understood that the building materials could withstand only compression forces and not tensile forces, in direct contrast to a rope that could resist tension but would buckle under compression.  The mathematics describing the arch and the cable are in fact identical, save for a minus sign.  Consequently, you could perform a real-time simulation of an arch using a piece of string!
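
To make the "minus sign" concrete, here is a minimal sketch. It assumes the textbook case of a uniform chain, whose shape is the catenary y = a cosh(x/a); the scale parameter a and the half-span are illustrative values chosen here, not numbers from Hooke or Wren.

```python
import math

def chain_height(x, a):
    """Hanging uniform chain: y = a*cosh(x/a), lowest point at x = 0."""
    return a * math.cosh(x / a)

def arch_height(x, a, half_span):
    """The same curve flipped upside down, measured from the springing
    line so that the arch touches the ground at x = +/- half_span."""
    return chain_height(half_span, a) - chain_height(x, a)

# Illustrative values only (assumed, not historical data).
a, half_span = 2.0, 3.0
xs = [-3.0, -1.5, 0.0, 1.5, 3.0]
profile = [arch_height(x, a, half_span) for x in xs]
```

The arch profile is the chain profile with its sign reversed, plus a constant, which is all the inversion amounts to under these assumptions.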

Bonus:  Hooke published the anagram in his book describing helioscopes, simply to “fill up the vacancy of the ensuing page” [3].  On that very page, among other claims, Hooke also wrote the anagram “ceiiinosssttuu” in regard to “the true theory of elasticity”.  Can you solve this riddle?

[2] Written in Latin, the ‘u’ and ‘v’ are the same letter.
[3] In truth, Hooke was likely trying to avoid being scooped by his contemporaries, notably Isaac Newton.

This article was inspired by my visit to the Huntington Library.  I would like to thank Catherine Wehrey for the illustrations and help with the research.

ResonaancesWeekend plot: SUSY limits rehashed

Lake Tahoe is famous for preserving dead bodies in good condition over many years; therefore it is a natural place to organize the SUSY conference. As a tribute to this event, here is a plot from a recent ATLAS meta-analysis:
It shows the constraints on the gluino and the lightest neutralino masses in the pMSSM. Usually, the most transparent way to present experimental limits on supersymmetry is by using simplified models. This consists of picking two or more particles out of the MSSM zoo, and assuming that they are the only ones playing a role in the analyzed process. For example, a popular simplified model has a gluino and a stable neutralino interacting via an effective quark-gluino-antiquark-neutralino coupling. In this model, gluino pairs are produced at the LHC through their couplings to ordinary gluons, and then each promptly decays to 2 quarks and a neutralino via the effective couplings. This shows up in a detector as 4 or more jets and the missing energy carried off by the neutralinos. Within this simplified model, one can thus interpret the LHC multi-jets + missing energy data as constraints on 2 parameters: the gluino mass and the lightest neutralino mass. One result of this analysis is that, for a massless neutralino, the gluino mass is constrained to be bigger than about 1.4 TeV (see the white line in the plot).

A non-trivial question is what happens to these limits if one starts to fiddle with the remaining one hundred parameters of the MSSM. ATLAS tackles this question in the framework of the pMSSM, which is a version of the MSSM where all flavor and CP violating parameters are set to zero. In the resulting 19-dimensional parameter space, ATLAS picks a large number of points that reproduce the correct Higgs mass and are consistent with various precision measurements. Then they check what fraction of the points with a given m_gluino and m_neutralino survives the constraints from all ATLAS supersymmetry searches so far. Of course, the results will depend on how the parameter space is sampled, but nevertheless we can get a feeling for how robust the limits obtained in simplified models are. It is interesting that the gluino mass limits turn out to be quite robust. From the plot one can see that, for a light neutralino, it is difficult to live with m_gluino < 1.4 TeV, and that there are no surviving points with m_gluino < 1.1 TeV. Similar conclusions do not hold for all simplified models; e.g., the limits on squark masses in simplified models can be very much relaxed by going to the larger parameter space of the pMSSM. Another thing worth noticing is that the blind spot near the m_gluino=m_neutralino diagonal is not really there: it is covered by ATLAS monojet searches.
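
The bookkeeping behind such a plot can be sketched in a few lines. This is only a toy illustration of "fraction of sampled points surviving per mass bin"; the point format, the bin width, and the `is_excluded` rule below are all hypothetical stand-ins, not the ATLAS procedure or its data.

```python
from collections import defaultdict

def survival_fraction(points, is_excluded, bin_width=200.0):
    """Group sampled model points into (m_gluino, m_neutralino) bins and
    return, for each bin, the fraction of points that is NOT excluded."""
    total = defaultdict(int)
    alive = defaultdict(int)
    for p in points:
        key = (int(p["m_gluino"] // bin_width),
               int(p["m_neutralino"] // bin_width))
        total[key] += 1
        if not is_excluded(p):
            alive[key] += 1
    return {key: alive[key] / total[key] for key in total}

# Hypothetical points (masses in GeV). "hidden" stands in for the other
# pMSSM parameters that can let a point evade a search.
toy_points = [
    {"m_gluino": 1000.0, "m_neutralino": 50.0, "hidden": False},
    {"m_gluino": 1050.0, "m_neutralino": 60.0, "hidden": True},
    {"m_gluino": 1500.0, "m_neutralino": 50.0, "hidden": False},
]

def toy_excluded(p):
    # Made-up rule: light gluinos are excluded unless the point "hides".
    return p["m_gluino"] < 1400.0 and not p["hidden"]

fractions = survival_fraction(toy_points, toy_excluded)
```

As the post says, the resulting fractions depend on how the parameter space was sampled, so they are a robustness check and not a probability.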

The LHC run-2 is going slowly, so we still have some time to play with the run-1 data. See the ATLAS paper for many more plots. New stronger limits on supersymmetry are not expected before next summer.

Tommaso DorigoHighlights From ICNFP 2015

The fourth edition of the International Conference on New Frontiers in Physics ended yesterday evening, and it is time for a summary. However, this year I must say that I am not in a good position to give an overview of the most interesting physics discussions that took place here, as I was involved in the organization of events for the conference and could only attend a relatively small fraction of the presentations.
ICNFP offers a broad view of the forefront topics of many areas of physics, with the main topics being nuclear and particle physics, yet with astrophysics and theoretical developments in quantum mechanics and related subjects also playing a major role.

August 30, 2015

Clifford JohnsonFresh Cycle

I've been a bit quiet here the last week or so, you may have noticed. I wish I could say it was because I've been scribbling some amazing new physics in my notebooks, or drawing several new pages for the book, or producing some other simply measurable output, but I cannot. Instead, I can only report that it was the beginning of a new semester (and entire new academic year!) this week just gone, and this - and all the associated preparations and so forth - coincided with several other things including working on several drafts of a grant renewal proposal.

The best news of all is that my new group of students for my class (graduate electromagnetism, second part) seems like a really good and fun group, and I am looking forward to working with them. We've had two lectures already and they seem engaged, and eager to take part in the way I like my classes to run - interactively and investigatively. I'm looking forward to working with them over the semester.

Other things I've been chipping away at in preparation for the next couple of months include launching the USC science film competition (its fourth year - I skipped last year because of family leave), moving my work office (for the first time in the 12 years I've been here), giving some lectures at an international school, organizing a symposium in celebration of the centenary of Einstein's General Relativity, and a number of shoots for some TV shows that might be of [...]

The post Fresh Cycle appeared first on Asymptotia.

Jordan EllenbergAnthony Trollope’s preternatural power

Simon Winchester in today’s New York Times Book Review:

Traveling in China back in the early 1990s, I was waiting for my westbound train to take on water at a lonely halt in the Taklamakan Desert when a young Chinese woman tapped me on the shoulder, asked if I spoke English and, further, if I knew anything of Anthony Trollope. I was quite taken aback. Trollope here? A million miles from anywhere? I mumbled an incredulous, “Yes, I know a bit” — whereupon, in a brisk and businesslike manner, she declared that the train would remain at the oasis for the next, let me see, 27 minutes, and in that time would I kindly answer as many of her questions as possible about plot and character development in “The Eustace Diamonds”?

Ever since that encounter, I’ve been fully convinced of China’s perpetual and preternatural power to astonish, amaze and delight.

It doesn’t actually seem that preternatural to me that a young, presumably educated woman read a novel and liked it.  What he should have been convinced of is Anthony Trollope’s perpetual and preternatural power to astonish, amaze and delight people separated from him by vast spans of culture and time.  “The Eustace Diamonds” is ace.  Probably “He Knew He Was Right” or “Can You Forgive Her?” (my own first Trollope) are better places to start.  Free Gutenbergs of both here.  Was any other Victorian novelist great enough to have the Pet Shop Boys name a song after one of their books?  No.  None other was so great.

Jordan EllenbergPersonal time

Predictions about self-driving cars:

The average American could shift some of the 5.5 hours of television watched per day into the car, and end up with vastly more personal time once freed from the need to pay attention to the road.

Wouldn’t that person just watch another hour of TV and end up with the same amount of personal time?

Jordan EllenbergThe Coin Game, II

Good answers to the last question! I think I perhaps put my thumb on the scale too much by naming a variable p.

Let me try another version in the form of a dialogue.

ME: Hey, in that other room somebody flipped a fair coin. What would you say is the probability that it fell heads?

YOU: I would say it is 1/2.

ME: Now I’m going to give you some more information about the coin. A confederate of mine made a prediction about whether the coin would fall heads or tails and he was correct. Now what would you say is the probability that it fell heads?

YOU: Now I have no idea, because I have no information about the propensity of your confederate to predict heads.

(Update: What if what you knew about the coin in advance was that it fell heads 99.99% of the time? Would you still be at ease saying you end up with no knowledge at all about the probability that the coin fell heads?) This is in fact what Joyce thinks you should say. White disagrees. But I think they both agree that it feels weird to say this, whether or not it’s correct.

Why would it not feel weird? I think Qiaochu’s comment in the previous thread gives a clue. He writes:

Re: the update, no, I don’t think that’s strange. You gave me some weird information and I conditioned on it. Conditioning on things changes my subjective probabilities, and conditioning on weird things changes my subjective probabilities in weird ways.

In other words, he takes it for granted that what you are supposed to do is condition on new information. Which is obviously what you should do in any context where you’re dealing with mathematical probability satisfying the usual axioms. Are we in such a context here? I certainly don’t mean “you have no information about Coin 2” to mean “Coin 2 falls heads with probability p where p is drawn from the uniform distribution (or Jeffreys, or any other specified distribution, thanks Ben W.) on [0,1]” — if I meant that, there could be no controversy!
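
To see why there would be no controversy in that case, here is a small sketch that simply conditions on agreement once a prior on p is written down. The discrete grid standing in for the uniform distribution on [0,1] is an assumption made here so the arithmetic stays exact.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def coin1_heads_given_agreement(prior):
    """prior maps each candidate value of p to its weight (weights sum to 1).
    Returns Pr(Coin 1 heads | the two coins agreed)."""
    both_heads = sum(w * HALF * p for p, w in prior.items())
    agree = sum(w * (HALF * p + HALF * (1 - p)) for p, w in prior.items())
    return both_heads / agree

# A grid 0, 1/10, ..., 1 with equal weights, standing in for "uniform on [0,1]".
grid = [Fraction(k, 10) for k in range(11)]
uniform_prior = {p: Fraction(1, 11) for p in grid}

answer = coin1_heads_given_agreement(uniform_prior)
```

Agreement has probability 1/2 whatever p is, so it says nothing about p, and the answer is just the prior mean of p. Any prior symmetric about 1/2 gives 1/2 back, which is why the puzzle only bites when "no information" is not read as a specific prior.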

I think as mathematicians we are very used to thinking that probability as we know it is what we mean when we talk about uncertainty. Or, to the extent we think we’re talking about something other than probability, we are wrong to think so. Lots of philosophers take this view. I’m not sure it’s wrong. But I’m also not sure it’s right. And whether it’s wrong or right, I think it’s kind of weird.

Chad OrzelOn the Need for “Short Story Club”

So, the Hugo awards were handed out a little while ago, with half of the prose fiction categories going to “No Award” and the other half to works I voted below “No Award.” Whee. I’m not really interested in rehashing the controversy, though I will note that Abigail Nussbaum’s take is probably the one I most agree with.

With the release of the nominating stats, a number of people released “what might’ve been” ballots, stripping out the slate nominees– Tobias Buckell’s was the first I saw, so I’ll link that. I saw a lot of people exclaiming over how awesome that would’ve been, and found myself with some time to kill, so I went and read the short stories from that list (all of which are freely available online).

And, you know, they’re… fine. Really, the main effect this had for me was to reconfirm that short fiction is a low-return investment for me. I wouldn’t object to any of these winning an award, but none of them jumped out at me as brilliant “Oh my God, this must win!” stuff.

(Aside: I spent a while thinking about why it is that short fiction at least seems to have a lower rate of return for me than novels. I think it’s mostly that current tastes interact with the length limit in a way that works really badly for me. The stuff that’s getting celebrated these days tends toward the “literalized metaphor” side of things– the speculative elements tend not to be points of science, but supernatural reflections of the emotional state of the characters. The failure mode of that is “crashingly obvious,” particularly when constrained to keep it under 7,500 words, and I really hate that. My reaction tends to be “Yes, I see what you did there. You’re very clever. Here’s a shiny gold star,” and a story starting in that hole needs to be really good just to ascend to the heights of “Meh.” Novels provide a little more room to work, and it’s easier to hide the clever metaphors, so they’re less likely to bug me in that particular way.)

This then leads to the fundamental problem I have with Hugo nominating, namely that the “just nominate what you love!” method really doesn’t work for me when three-quarters of the awards go to low-return categories. Left to my own devices, I’m just not going to read much short fiction, certainly not enough to make sensible nominations. Which means I’m going to be one of those folks who nominates a bunch of novels and maybe a couple of movies, and leaves the rest blank. Which plays into the hands of the slate voters.

The one year recently when I actually read enough short fiction to make halfway sensible nominations was when Niall Harrison put together a “Short Story Club” of bloggers who all read a particular story and reviewed it online (you can find my reviews here). Working from a limited selection of stories by somebody with a really good grasp of the state of the field was a big help, and the obligation to say something about them on the blog was enough to motivate me to read them. (And, no, “keep garbage off the Hugo ballot” is not by itself enough motivation, especially in the absence of quality curation.)

Niall has since moved to a role where it wouldn’t be appropriate for him to do that kind of thing, but I’d really love to see someone else take that up: picking a set of plausibly Hugo-worthy stories, setting a schedule, and collecting links to reviews. Even if it doesn’t lead to finding stuff that I actually love and want to nominate, it would be interesting to read about what other people see in these stories.

I don’t know that anybody reading this has the free time or standing in the SF community to do this kind of thing, but I know I would find it really valuable. So I’ll throw it out there and hope for the best.

David Hoggend of the summer

Today was my last full day at MPIA for 2015. Ness and I worked on her age paper, and made the list of final items that need to be completed before the paper can be submitted, first to the SDSS-IV Collaboration, and then to the journal. I also figured out a bunch of baby steps I can do on my own paper with the age catalog.

August 29, 2015

Jordan EllenbergThe coin game

Here is a puzzling example due to Roger White.

There are two coins.  Coin 1 you know is fair.  Coin 2 you know nothing about; it falls heads with some probability p, but you have no information about what p is.

Both coins are flipped by an experimenter in another room, who tells you that the two coins agreed (i.e. both were heads or both tails.)

What do you now know about Pr(Coin 1 landed heads) and Pr(Coin 2 landed heads)?

(Note:  as is usual in analytic philosophy, whether or not this is puzzling is itself somewhat controversial, but I think it’s puzzling!)

Update: Lots of people seem to not find this at all puzzling, so let me add this. If your answer is “I know nothing about the probability that coin 1 landed heads, it’s some unknown quantity p that agrees with the unknown parameter governing coin 2,” you should ask yourself: is it strange that someone flipped a fair coin in another room and you don’t know what the probability is that it landed heads?
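
For readers who want to check the conditioning step behind that answer, here is a minimal sketch that treats p as a known number and applies the usual rule Pr(A | B) = Pr(A and B) / Pr(B). The function name is made up for illustration.

```python
from fractions import Fraction

def heads_given_agreement(p):
    """Coin 1 is fair, Coin 2 lands heads with probability p, and the flips
    are independent. Returns Pr(Coin 1 heads | the coins agreed)."""
    half = Fraction(1, 2)
    both_heads = half * p
    both_tails = half * (1 - p)
    agree = both_heads + both_tails  # equals 1/2 for every p
    # Given agreement, "Coin 1 heads" and "Coin 2 heads" are the same event.
    return both_heads / agree

example = heads_given_agreement(Fraction(1, 3))
```

The function returns p itself for every input, which is the formal content of "it’s some unknown quantity p". Whether that is the right thing to say when p is unknown is exactly what the puzzle is about.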

Relevant readings: section 3.1 of the Stanford Encyclopedia of Philosophy article on imprecise probabilities and Joyce’s paper on imprecise credences, pp.13-14.

August 28, 2015

David Hoggreionization

At MPIA Galaxy Coffee, K. G. Lee (MPIA) and Jose Oñorbe (MPIA) gave talks about the intergalactic medium. Lee spoke about reconstruction of the density field, and Oñorbe spoke about reionization. The conversations continued into lunch, where I spoke with the research group of Joe Hennawi (MPIA) about various problems in inferring things about the intergalactic medium and quasar spectra in situations where (a) it is easy to simulate the data but (b) there is no explicit likelihood function. I advocated likelihood-free inference or ABC (as it is often called), plus adaptive sampling. We also discussed model selection, and I advocated cross-validation.
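
For anyone who has not seen ABC, here is a minimal rejection-sampling sketch of the idea: draw parameters from the prior, simulate data, and keep the draws whose simulated summary lands close to the observed one. Everything specific below (the Gaussian toy simulator, the flat prior, the tolerance, the observed value) is an assumption for illustration, and it does not include the adaptive sampling mentioned above.

```python
import random

def simulate_summary(theta, rng, n=20):
    """Toy simulator: the mean of n unit-variance Gaussian draws about theta."""
    return sum(rng.gauss(theta, 1.0) for _ in range(n)) / n

def abc_rejection(observed, n_draws=20000, eps=0.2, seed=0):
    """Plain ABC rejection: keep prior draws whose simulated summary
    falls within eps of the observed summary."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # flat prior, assumed for the toy
        if abs(simulate_summary(theta, rng) - observed) < eps:
            accepted.append(theta)
    return accepted

samples = abc_rejection(observed=2.0)  # hypothetical observed summary
posterior_mean = sum(samples) / len(samples)
```

No likelihood is ever evaluated; the simulator does all the work. The price is the tolerance eps and the choice of summary statistic, both of which have to be justified in a real problem.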

In the afternoon, Ness and I continued code review and made decisions for final runs of The Cannon for our red-giant masses and ages paper.

BackreactionEmbrace your 5th dimension.

I found this awesome photo at
What does it mean to live in a holographic universe?

“We live in a hologram,” the physicists say, but what do they mean? Is there a flat-world-me living on the walls of the room? Or am I the projection of a mysterious five-dimensional being and beyond my own comprehension? And if everything inside my head can be described by what’s on its boundary, then how many dimensions do I really live in? If these are questions that keep you up at night, I have the answers.

1. Why do some physicists think our universe may be a hologram?

It all started with the search for a unified theory.

Unification has been enormously useful for our understanding of natural law: Apples fall according to the same laws that keep planets on their orbits. The manifold appearances of matter as gases, liquids and solids can be described as different arrangements of molecules. The huge variety of molecules themselves can be understood as various compositions of atoms. These unifying principles were discovered long ago. Today, when physicists speak of unification, they refer specifically to a common origin of different interactions. The electric and magnetic interactions, for example, turned out to be two different aspects of the same electromagnetic interaction. The electromagnetic interaction, or its quantum version respectively, has further been unified with the weak nuclear interaction. Nobody has succeeded yet in unifying all presently known interactions, the electromagnetic with the strong and weak nuclear ones, plus gravity.

String theory was conceived as a theory of the strong nuclear interaction, but it soon became apparent that quantum chromodynamics, the theory of quarks and gluons, did a better job at this. But string theory gained a second wind after physicists discovered it may serve to explain all the known interactions including gravity, and so could be a unified theory of everything, the holy grail of physics.

It turned out to be difficult, however, to recover specifically the Standard Model interactions from string theory. And so the story goes that in recent years the quest for unification has slowly been replaced with a quest for dualities that demonstrate that all the different types of string theories are actually different aspects of the same theory, which is yet to be fully understood.

A duality in the most general sense is a relation that identifies two theories. You can understand a duality as a special type of unification: In a normal unification, you merge two theories into a larger theory that contains the former two in a suitable limit. If you relate two theories by a duality, you show that the theories are the same; they just appear different, depending on how you look at them.

One of the most interesting developments in high energy physics during the last decades is the finding of dualities between theories in a different number of space-time dimensions. One of the theories is a gravitational theory in the higher-dimensional space, often called “the bulk”. The other is a gauge-theory much like the ones in the standard model, and it lives on the boundary of the bulk space-time. This relation is often referred to as the gauge-gravity correspondence, and it is a limit of a more general duality in string theory.

To be careful, this correspondence hasn’t, strictly speaking, been proved. But there are several examples where it has been so thoroughly studied that there is very little doubt it will be proved at some point.

These dualities are said to be “holographic” because they tell us that everything allowed to happen in the bulk space-time of the gravitational theory is encoded on the boundary of that space. And because there are fewer bits of information on the surface of a volume than in the volume itself, fewer things can happen in the volume than you’d have expected. It might seem as if particles inside a box are all independent from each other, but they must actually be correlated. It’s like you were observing a large room with kids running and jumping but suddenly you’d notice that every time one of them jumps, for a mysterious reason ten others must jump at exactly the same time.

2. Why is it interesting that our universe might be a hologram?

This limitation on the amount of independence between particles due to holography would only become noticeable at densities too high for us to test directly. The reason this type of duality is interesting nevertheless is that physics is mostly the art of skillful approximation, and using dualities is a new skill.

You have probably seen these Feynman diagrams that sketch particle scattering processes? Each of these diagrams makes a contribution to an interaction process. The more loops there are in a diagram, the smaller the contributions are. And so what physicists do is add up the largest contributions first, then the smaller ones, and even smaller ones, until they’ve reached the desired precision. It’s called “perturbation theory” and only works if the contributions really get smaller the more interactions take place. If that is so, the theory is said to be “weakly coupled” and all is well. If it ain’t so, the theory is said to be “strongly coupled” and you’d never be done summing all the relevant contributions. If a theory is strongly coupled, then the standard methods of particle physicists fail.
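
A cartoon of this in code, under a loudly stated assumption: the "n-th order contribution" is taken to be simply g to the power n for a coupling g. Real perturbation series are far more complicated than this, so the sketch only illustrates why adding up terms works for small g and never finishes for large g.

```python
def partial_sums(g, orders):
    """Running totals of the toy series 1 + g + g^2 + ... (assumed form)."""
    total = 0.0
    sums = []
    for n in range(orders):
        total += g ** n
        sums.append(total)
    return sums

weak = partial_sums(0.1, 20)    # contributions shrink at every order
strong = partial_sums(2.0, 20)  # contributions grow at every order
```

For g = 0.1 the running total settles down to 1/(1 - g) after a few terms, so one can stop once the desired precision is reached. For g = 2 each new term is bigger than everything before it combined, and no truncation is trustworthy.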

The strong nuclear force, for example, has the peculiar property of “asymptotic freedom,” meaning it becomes weaker at high energies. But at low energies, it is very strong. Consequently nuclear matter at low energies is poorly understood, as is, for example, the behavior of the quark gluon plasma, or the reason why single quarks do not travel freely but are always “confined” to larger composite states. Another interesting case that falls in this category is that of “strange” metals, which include high-temperature superconductors, another holy grail of physicists. The gauge-gravity duality helps in dealing with these systems because when the one theory is strongly coupled and difficult to treat, the dual theory is weakly coupled and easy to treat. So the duality essentially serves to convert a difficult calculation to a simple one.

3. Where are we in the holographic universe?

Since the theory on the boundary and the theory in the bulk are related by the duality they can be used to describe the same physics. So on a fundamental level the distinction doesn’t make sense – they are two different ways to describe the same thing. It’s just that sometimes one of them is easier to use, sometimes the other.

One can give meaning to the question, though, if one looks at particular systems, as for example the quark gluon plasma or a black hole, and asks for the number of dimensions that particles experience. This specification of particles is what makes the question meaningful because identifying particles isn’t always possible.

The theory for the quark gluon plasma is placed on the boundary because it would be described by the strongly coupled theory. So if you consider it to be part of your laboratory then you have located the lab, with yourself in it, on the boundary. However, the notion of ‘dimensions’ that we experience is tied to the freedom of particles to move around. This can be made more rigorous in the definition of ‘spectral dimension’ which measures, roughly speaking, in how many directions a particle can get lost. The very fact that makes a system strongly coupled though means that one can’t properly define single particles that travel freely. So while you can move around in the laboratory’s three spatial dimensions, the quark gluon plasma first has to be translated to the higher dimensional theory to even speak about individual particles moving. In that sense, part of the laboratory has become higher dimensional, indeed.

If you look at an astrophysical black hole however, then the situation is reversed. We know that particles in its vicinity are weakly coupled and experience three spatial dimensions. If you wanted to apply the duality in this case then we would be situated in the bulk and there would be lower-dimensional projections of us and the black hole on the boundary, constraining our freedom to move around, but in such a subtle way that we don’t notice. However, the bulk space-times that are relevant in the gauge-gravity duality are so-called Anti-de-Sitter spaces, and these always have a negative cosmological constant. The universe we inhabit however has to our best current knowledge a positive cosmological constant. So it is not clear that there actually is a dual system that can describe the black holes in our universe.

Many researchers are presently working on expanding the gauge-gravity duality to include spaces with a positive cosmological constant or none at all, but at least so far it isn’t clear that this works. So for now we do not know whether there exist projections of us in a lower-dimensional space-time.

4. How well does this duality work?

The applications of the gauge-gravity duality fall roughly into three large areas, plus a diversity of technical developments driving the general understanding of the theory. The three areas are the quark gluon plasma, strange metals, and black hole evaporation. In the former two cases our universe is on the boundary, in the latter we are in the bulk.

The studies of black hole evaporation are examinations of mathematical consistency conducted to unravel just exactly how information may escape a black hole, or what happens at the singularity. In this area there are presently more answers than there are questions. The applications of the duality to the quark gluon plasma initially caused a lot of excitement, but recently some skepticism has spread. It seems that the plasma is not as strongly coupled as originally thought, and using the duality is not as straightforward as hoped. The applications to strange metals and other classes of materials are making rapid progress as both analytical and numerical methods are being developed. The behavior for several observables has been qualitatively reproduced, but it is at present not very clear exactly which systems are the best to use. The space of models is still too big, which leaves too much freedom to allow useful predictions. In summary, as the scientists say, “more work is needed”.

5. Does this have something to do with Stephen Hawking's recent proposal for how to solve the black hole information loss problem?

That’s what he says, yes. Essentially he is claiming that our universe has holographic properties even though it has a positive cosmological constant, and that the horizon of a black hole also serves as a surface that contains all the information of what happens in the full space-time. This would mean in particular that the horizon of a black hole keeps track of what fell into the black hole, and so nothing is really forever lost.

This by itself isn’t a new idea. What is new in this work with Malcolm Perry and Andrew Strominger is that they claim to have a way to store and release the information, in a dynamical situation. Details of how this is supposed to work, however, are so far not clear. By and large the scientific community has reacted with much skepticism, not to mention annoyance over the announcement of an immature idea.

[This post previously appeared at Starts with a Bang.]

Secret Blogging SeminarAnnouncing ALGECOM Fall 2015

The University of Michigan at Ann Arbor is proud to be hosting
ALGECOM, the twice annual midwestern conference on algebra, geometry
and combinatorics on Saturday, October 24. We will feature four
speakers, namely,

Jonah Blasiak (Drexel University)
Laura Escobar (University of Illinois at Urbana-Champaign)
Joel Kamnitzer (University of Toronto)
Tri Lai (IMA and University of Minnesota)

as well as a poster session. If you would like to submit a poster, please e-mail (David Speyer) with a quick summary of your work by September 15.

A block of rooms has been reserved at the (Lamp Post Inn) under the name of ALGECOM.

This conference is supported by a conference grant from the NSF. Limited funds are available for graduate student travel to the conference. Please contact (David Speyer) to request support, and include a note from your adviser.

More information will be added to our website as it becomes available.

We hope to see you there!

August 27, 2015

Terence TaoHeath-Brown’s theorem on prime twins and Siegel zeroes

The twin prime conjecture is one of the oldest unsolved problems in analytic number theory. There are several reasons why this conjecture remains out of reach of current techniques, but the most important obstacle is the parity problem which prevents purely sieve-theoretic methods (or many other popular methods in analytic number theory, such as the circle method) from detecting pairs of prime twins in a way that can distinguish them from other twins of almost primes. The parity problem is discussed in these previous blog posts; this obstruction is ultimately powered by the Möbius pseudorandomness principle that asserts that the Möbius function {\mu} is asymptotically orthogonal to all “structured” functions (and in particular, to the weight functions constructed from sieve theory methods).

However, there is an intriguing “alternate universe” in which the Möbius function is strongly correlated with some structured functions, and specifically with some Dirichlet characters, leading to the existence of the infamous “Siegel zero“. In this scenario, the parity problem obstruction disappears, and it becomes possible, in principle, to attack problems such as the twin prime conjecture. In particular, we have the following result of Heath-Brown:

Theorem 1 At least one of the following two statements is true:

  • (Twin prime conjecture) There are infinitely many primes {p} such that {p+2} is also prime.
  • (No Siegel zeroes) There exists a constant {c>0} such that for every real Dirichlet character {\chi} of conductor {q > 1}, the associated Dirichlet {L}-function {s \mapsto L(s,\chi)} has no zeroes in the interval {[1-\frac{c}{\log q}, 1]}.

Informally, this result asserts that if one had an infinite sequence of Siegel zeroes, one could use this to generate infinitely many twin primes. See this survey of Friedlander and Iwaniec for more on this “illusory” or “ghostly” parallel universe in analytic number theory that should not actually exist, but is surprisingly self-consistent and has to date proven impossible to banish from the realm of possibility.

The strategy of Heath-Brown’s proof is fairly straightforward to describe. The usual starting point is to try to lower bound

\displaystyle  \sum_{x \leq n \leq 2x} \Lambda(n) \Lambda(n+2) \ \ \ \ \ (1)

for some large value of {x}, where {\Lambda} is the von Mangoldt function. Actually, in this post we will work with the slight variant

\displaystyle  \sum_{x \leq n \leq 2x} \Lambda_2(n(n+2)) \nu(n(n+2))

where

\displaystyle  \Lambda_2(n) = (\mu * L^2)(n) = \sum_{d|n} \mu(d) \log^2 \frac{n}{d}

is the second von Mangoldt function, and {*} denotes Dirichlet convolution, and {\nu} is an (unsquared) Selberg sieve that damps out small prime factors. This sum also detects twin primes, but will lead to slightly simpler computations. For technical reasons we will also smooth out the interval {x \leq n \leq 2x} and remove very small primes from {n}, but we will skip over these steps for the purpose of this informal discussion. (In Heath-Brown’s original paper, the Selberg sieve {\nu} is essentially replaced by the more combinatorial restriction {1_{(n(n+2),q^{1/C}\#)=1}} for some large {C}, where {q^{1/C}\#} is the primorial of {q^{1/C}}, but I found the computations to be slightly easier if one works with a Selberg sieve, particularly if the sieve is not squared to make it nonnegative.)
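As a small numerical illustration (not part of the original argument), the following Python sketch evaluates {\Lambda_2} directly from the definition above. The helper names `divisors`, `mobius` and `lambda2` are hypothetical, and trial division is only meant for small inputs. It shows that {\Lambda_2(n(n+2))} is non-zero when {n} and {n+2} are both primes or prime powers, and vanishes once {n(n+2)} has three distinct prime factors.

```python
from math import log

def divisors(n):
    """All positive divisors of n, by trial division (fine for small n)."""
    return [d for d in range(1, n + 1) if n % d == 0]

def mobius(n):
    """Moebius function via trial factorisation."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0      # n had a square factor
            result = -result
        p += 1
    return -result if n > 1 else result

def lambda2(n):
    """Second von Mangoldt function (mu * L^2)(n) = sum_{d|n} mu(d) log^2(n/d)."""
    return sum(mobius(d) * log(n // d) ** 2 for d in divisors(n))

twin = lambda2(11 * 13)          # n = 11: both n and n+2 are prime
prime_power = lambda2(7 * 9)     # n = 7: n+2 = 9 = 3^2 is only a prime power
three_primes = lambda2(13 * 15)  # n = 13: n+2 = 15 = 3 * 5, three distinct primes

print(twin, 2 * log(11) * log(13))
print(prime_power, three_primes)
```

The prime power case is one reason the sieve weight {\nu} and the removal of very small primes are useful: they suppress contributions such as {n=7} that are not genuine twin primes.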

If there is a Siegel zero {L(\beta,\chi)=0} with {\beta} close to {1} and {\chi} a Dirichlet character of conductor {q}, then multiplicative number theory methods can be used to show that the Möbius function {\mu} “pretends” to be like the character {\chi} in the sense that {\mu(p) \approx \chi(p)} for “most” primes {p} near {q} (e.g. in the range {q^\varepsilon \leq p \leq q^C} for some small {\varepsilon>0} and large {C>0}). Traditionally, one uses complex-analytic methods to demonstrate this, but one can also use elementary multiplicative number theory methods to establish these results (qualitatively at least), as will be shown below the fold.

The fact that {\mu} pretends to be like {\chi} can be used to construct a tractable approximation (after inserting the sieve weight {\nu}) in the range {[x,2x]} (where {x = q^C} for some large {C}) for the second von Mangoldt function {\Lambda_2}, namely the function

\displaystyle  \tilde \Lambda_2(n) := (\chi * L^2)(n) = \sum_{d|n} \chi(d) \log^2 \frac{n}{d}.

Roughly speaking, we think of the periodic function {\chi} and the slowly varying function {\log^2} as being of about the same “complexity” as the constant function {1}, so that {\tilde \Lambda_2} is roughly of the same “complexity” as the divisor function

\displaystyle  \tau(n) := (1*1)(n) = \sum_{d|n} 1,

which is considerably simpler to obtain asymptotics for than the von Mangoldt function as the Möbius function is no longer present. (For instance, note from the Dirichlet hyperbola method that one can estimate {\sum_{x \leq n \leq 2x} \tau(n)} to accuracy {O(\sqrt{x})} with little difficulty, whereas to obtain a comparable level of accuracy for {\sum_{x \leq n \leq 2x} \Lambda(n)} or {\sum_{x \leq n \leq 2x} \Lambda_2(n)} is essentially the Riemann hypothesis.)
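To make the remark about the Dirichlet hyperbola method concrete, here is a minimal Python sketch, assuming the classical main term {x \log x + (2\gamma-1)x} for {\sum_{n \leq x} \tau(n)}. The function names are hypothetical. The sum over {x \leq n \leq 2x} is then a difference of two such partial sums.

```python
from math import isqrt, log

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def divisor_sum_hyperbola(x):
    """sum_{n<=x} tau(n) = 2 * sum_{d<=sqrt(x)} floor(x/d) - floor(sqrt(x))^2."""
    s = isqrt(x)
    return 2 * sum(x // d for d in range(1, s + 1)) - s * s

def divisor_sum_naive(x):
    """Same sum, counting all pairs (d, m) with d * m <= x; O(x) work."""
    return sum(x // d for d in range(1, x + 1))

x = 10**6
exact = divisor_sum_hyperbola(x)
main_term = x * log(x) + (2 * EULER_GAMMA - 1) * x
error = exact - main_term

print(exact, round(main_term), round(error))
```

The hyperbola form needs only about {\sqrt{x}} operations, and the discrepancy from the main term stays within a bounded multiple of {\sqrt{x}}.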

One expects {\tilde \Lambda_2(n)} to be a good approximant to {\Lambda_2(n)} if {n} is of size {O(x)} and has no prime factors less than {q^{1/C}} for some large constant {C}. The Selberg sieve {\nu} will be mostly supported on numbers with no prime factor less than {q^{1/C}}. As such, one can hope to approximate (1) by the expression

\displaystyle  \sum_{x \leq n \leq 2x} \tilde \Lambda_2(n(n+2)) \nu(n(n+2)); \ \ \ \ \ (2)

as it turns out, the error between this expression and (1) is easily controlled by sieve-theoretic techniques. Let us ignore the Selberg sieve for now and focus on the slightly simpler sum

\displaystyle  \sum_{x \leq n \leq 2x} \tilde \Lambda_2(n(n+2)).

As discussed above, this sum should be thought of as a slightly more complicated version of the sum

\displaystyle \sum_{x \leq n \leq 2x} \tau(n(n+2)). \ \ \ \ \ (3)

Accordingly, let us look (somewhat informally) at the task of estimating the model sum (3). One can think of this problem as basically that of counting solutions to the equation {ab+2=cd} with {a,b,c,d} in various ranges; this is clearly related to understanding the equidistribution of the hyperbola {\{ (a,b) \in ({\bf Z}/d{\bf Z})^2: ab + 2 = 0 \hbox{ mod } d \}} in {({\bf Z}/d{\bf Z})^2}. Taking Fourier transforms, the latter problem is closely related to estimation of the Kloosterman sums

\displaystyle  \sum_{m \in ({\bf Z}/r{\bf Z})^\times} e( \frac{a_1 m + a_2 \overline{m}}{r} )

where {\overline{m}} denotes the inverse of {m} in {({\bf Z}/r{\bf Z})^\times}. One can then use the Weil bound

\displaystyle  \sum_{m \in ({\bf Z}/r{\bf Z})^\times} e( \frac{am+b\overline{m}}{r} ) \ll r^{1/2 + o(1)} (a,b,r)^{1/2} \ \ \ \ \ (4)

where {(a,b,r)} is the greatest common divisor of {a,b,r} (with the convention that this is equal to {r} if {a,b} vanish), and the {o(1)} decays to zero as {r \rightarrow \infty}. The Weil bound yields good enough control on error terms to estimate (3), and as it turns out the same method also works to estimate (2) (provided that {x=q^C} with {C} large enough).

Actually one does not need the full strength of the Weil bound here; any power savings over the trivial bound of {r} will do. In particular, it will suffice to use the weaker, but easier to prove, bounds of Kloosterman:

Lemma 2 (Kloosterman bound) One has

\displaystyle  \sum_{m \in ({\bf Z}/r{\bf Z})^\times} e( \frac{am+b\overline{m}}{r} ) \ll r^{3/4 + o(1)} (a,b,r)^{1/4} \ \ \ \ \ (5)

whenever {r \geq 1} and {a,b} are coprime to {r}, where the {o(1)} is with respect to the limit {r \rightarrow \infty} (and is uniform in {a,b}).

Proof: Observe from change of variables that the Kloosterman sum {\sum_{m \in ({\bf Z}/r{\bf Z})^\times} e( \frac{am+b\overline{m}}{r} )} is unchanged if one replaces {(a,b)} with {(\lambda a, \lambda^{-1} b)} for {\lambda \in ({\bf Z}/r{\bf Z})^\times}. For fixed {a,b}, the number of such pairs {(\lambda a, \lambda^{-1} b)} is at least {r^{1-o(1)} / (a,b,r)}, thanks to the divisor bound. Thus it will suffice to establish the fourth moment bound

\displaystyle  \sum_{a,b \in {\bf Z}/r{\bf Z}} |\sum_{m \in ({\bf Z}/r{\bf Z})^\times} e\left( \frac{am+b\overline{m}}{r} \right)|^4 \ll r^{4+o(1)}.

The left-hand side can be rearranged as

\displaystyle  \sum_{m_1,m_2,m_3,m_4 \in ({\bf Z}/r{\bf Z})^\times} \sum_{a,b \in {\bf Z}/r{\bf Z}}

\displaystyle  e\left( \frac{a(m_1+m_2-m_3-m_4) + b(\overline{m_1}+\overline{m_2}-\overline{m_3}-\overline{m_4})}{r} \right)

which by Fourier summation is equal to

\displaystyle  r^2 \# \{ (m_1,m_2,m_3,m_4) \in (({\bf Z}/r{\bf Z})^\times)^4:

\displaystyle  m_1+m_2-m_3-m_4 = \frac{1}{m_1} + \frac{1}{m_2} - \frac{1}{m_3} - \frac{1}{m_4} = 0 \hbox{ mod } r \}.

Observe from the quadratic formula and the divisor bound that each pair {(x,y)\in ({\bf Z}/r{\bf Z})^2} has at most {O(r^{o(1)})} solutions {(m_1,m_2)} to the system of equations {m_1+m_2=x; \frac{1}{m_1} + \frac{1}{m_2} = y}. Hence the number of quadruples {(m_1,m_2,m_3,m_4)} of the desired form is at most {r^{2+o(1)}}, and the claim follows. \Box
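The fourth moment identity used in this proof is exact, so it can be checked numerically for small moduli. The following Python sketch is an illustration only; the function names are hypothetical, and it relies on `pow(m, -1, r)` for modular inverses, which requires Python 3.8 or later. It also compares one Kloosterman sum at a prime modulus against the classical Weil bound {2\sqrt{p}}.

```python
import cmath
from collections import Counter
from math import gcd, sqrt

def kloosterman(a, b, r):
    """Sum over units m mod r of e((a*m + b*m^{-1}) / r)."""
    total = 0j
    for m in range(1, r + 1):
        if gcd(m, r) == 1:
            m_inv = pow(m, -1, r)  # modular inverse (Python 3.8+)
            total += cmath.exp(2j * cmath.pi * ((a * m + b * m_inv) % r) / r)
    return total

def fourth_moment(r):
    """Sum over all (a, b) mod r of |Kloosterman sum|^4."""
    return sum(abs(kloosterman(a, b, r)) ** 4 for a in range(r) for b in range(r))

def count_quadruples(r):
    """Number of unit quadruples with m1+m2 = m3+m4 and 1/m1+1/m2 = 1/m3+1/m4 mod r."""
    units = [m for m in range(1, r + 1) if gcd(m, r) == 1]
    inv = {m: pow(m, -1, r) for m in units}
    # Group pairs (m1, m2) by the values (x, y) = (m1 + m2, 1/m1 + 1/m2).
    classes = Counter(((m1 + m2) % r, (inv[m1] + inv[m2]) % r)
                      for m1 in units for m2 in units)
    # A quadruple is a choice of two pairs in the same class.
    return sum(c * c for c in classes.values())

for r in (7, 12):
    print(r, fourth_moment(r), r * r * count_quadruples(r))

p = 101
print(abs(kloosterman(1, 1, p)), 2 * sqrt(p))
```

Grouping pairs by {(x,y)} mirrors the last step of the proof: the quadruple count is the sum of the squares of the class sizes, and each class is small.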

We will also need another easy case of the Weil bound to handle some other portions of (2):

Lemma 3 (Easy Weil bound) Let {\chi} be a primitive real Dirichlet character of conductor {q}, and let {a,b,c,d \in{\bf Z}/q{\bf Z}}. Then

\displaystyle  \sum_{n \in {\bf Z}/q{\bf Z}} \chi(an+b) \chi(cn+d) \ll q^{o(1)} (ad-bc, q).

Proof: As {q} is the conductor of a primitive real Dirichlet character, {q} is equal to {2^j} times a squarefree odd number for some {j \leq 3}. By the Chinese remainder theorem, it thus suffices to establish the claim when {q} is an odd prime. We may assume that {ad-bc} is not divisible by this prime {q}, as the claim is trivial otherwise. If {a} vanishes then {c} does not vanish, and the claim follows from the mean zero nature of {\chi}; similarly if {c} vanishes. Hence we may assume that {a,c} do not vanish, and then we can normalise them to equal {1}. By completing the square it now suffices to show that

\displaystyle  \sum_{n \in {\bf Z}/p{\bf Z}} \chi( n^2 - b ) \ll 1

whenever {b \neq 0 \hbox{ mod } p}. As {\chi} is {+1} on the quadratic residues and {-1} on the non-residues, it now suffices to show that

\displaystyle  \# \{ (m,n) \in ({\bf Z}/p{\bf Z})^2: n^2 - b = m^2 \} = p + O(1).

But by making the change of variables {(x,y) = (n+m,n-m)}, the left-hand side becomes {\# \{ (x,y) \in ({\bf Z}/p{\bf Z})^2: xy=b\}}, and the claim follows. \Box
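For a prime modulus the quantities in this proof can be computed exactly. The sketch below is only a sanity check, with hypothetical function names and with the Legendre symbol standing in for {\chi}. It confirms for {p=13} that the hyperbola {xy=b} has {p-1} points, that the character sum over {n^2-b} equals {-1}, and that a sample sum of the type in Lemma 3 is bounded by {1} in magnitude.

```python
def legendre(n, p):
    """Legendre symbol (n/p) for an odd prime p, via Euler's criterion."""
    n %= p
    if n == 0:
        return 0
    return 1 if pow(n, (p - 1) // 2, p) == 1 else -1

def quadratic_char_sum(b, p):
    """sum over n mod p of chi(n^2 - b)."""
    return sum(legendre(n * n - b, p) for n in range(p))

def hyperbola_count(b, p):
    """Number of (x, y) mod p with x * y = b."""
    return sum(1 for x in range(p) for y in range(p) if (x * y - b) % p == 0)

def linear_pair_sum(a, b, c, d, p):
    """sum over n mod p of chi(a*n + b) * chi(c*n + d)."""
    return sum(legendre(a * n + b, p) * legendre(c * n + d, p) for n in range(p))

p = 13
for b in range(1, p):
    print(b, quadratic_char_sum(b, p), hyperbola_count(b, p))

# Here ad - bc = 1*4 - 2*3 = -2 is not divisible by 13.
print(linear_pair_sum(1, 2, 3, 4, p))
```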

While the basic strategy of Heath-Brown’s argument is relatively straightforward, implementing it requires a large amount of computation to control both main terms and error terms. I experimented for a while with rearranging the argument to try to reduce the amount of computation; I did not fully succeed in arriving at a satisfactorily minimal amount of superfluous calculation, but I was able to at least reduce this amount a bit, mostly by replacing a combinatorial sieve with a Selberg-type sieve (which was not needed to be positive, so I dispensed with the squaring aspect of the Selberg sieve to simplify the calculations a little further; also for minor reasons it was convenient to retain a tiny portion of the combinatorial sieve to eliminate extremely small primes). Also some modest reductions in complexity can be obtained by using the second von Mangoldt function {\Lambda_2(n(n+2))} in place of {\Lambda(n) \Lambda(n+2)}. These exercises were primarily for my own benefit, but I am placing them here in case they are of interest to some other readers.

— 1. Consequences of a Siegel zero —

It is convenient to phrase Heath-Brown’s theorem in the following equivalent form:

Theorem 4 Suppose one has a sequence {\chi_{\bf n}} of real Dirichlet characters of conductor {q_{\bf n}} going to infinity, and a sequence of real zeroes {L(\beta_{\bf n},\chi_{\bf n}) = 0} with {\beta_{\bf n} = 1 - o( \frac{1}{\log q_{\bf n}})} as {{\bf n} \rightarrow \infty}. Then there are infinitely many prime twins.

Henceforth, we omit the dependence on {{\bf n}} from all of our quantities (unless they are explicitly declared to be “fixed”), and the asymptotic notation {o(1)}, {O(1)}, {\ll}, etc. will always be understood to be with respect to the {{\bf n}} parameter, e.g. {X \ll Y} means that {X \leq CY} for some fixed {C}. (In the language of this previous blog post, we are thus implicitly using “cheap nonstandard analysis”, although we will not explicitly use nonstandard analysis notation (other than the asymptotic notation mentioned above) further in this post.) With this convention, we now have a single (but not fixed) Dirichlet character {\chi} of some conductor {q} with a Siegel zero

\displaystyle  \beta = 1 - o(\frac{1}{\log q}). \ \ \ \ \ (6)

It will also be convenient to use the crude bound

\displaystyle  1 - \beta \gg q^{-O(1)} \ \ \ \ \ (7)

which can be proven by elementary means (see e.g. Exercise 57 of this post), although one can use Siegel’s theorem to obtain the better bound {1 - \beta \gg q^{-o(1)}}. Standard arguments (see also Lemma 59 of this blog post) then give

\displaystyle  L(1,\chi) \gg q^{-O(1)} \ \ \ \ \ (8)

We now use this Siegel zero to show that {\mu} pretends to be like {\chi} for primes that are comparable (in log-scale) to {q}:

Lemma 5 For any fixed {0 < \varepsilon < C}, we have

\displaystyle  \sum_{q^\varepsilon \leq p \leq q^C: \chi(p) \neq -1} \frac{1}{p} = o(1).

For more precise estimates on the {o(1)} error, see the paper of Heath-Brown (particularly Lemma 3).

Proof: It suffices to show, for sufficiently large fixed {C>0}, that

\displaystyle  \sum_{q^{C/k} < p \leq q^{2C/k}: \chi(p) \neq -1} \frac{1}{p} = o(1)

for each fixed natural number {k}.

We begin by considering the sum

\displaystyle  \sum_{n \leq x} \frac{1*\chi(n)}{n^\beta} \ \ \ \ \ (9)

for some large {x} (which we will eventually take to be a power of {q}); we will exploit the fact that this sum is very stable for {x} comparable to {q} in log-scale. By the Dirichlet hyperbola method, we can write this as

\displaystyle  \sum_{d \leq \sqrt{x}} \frac{1}{d^\beta} \sum_{m \leq x/d} \frac{\chi(m)}{m^\beta} + \sum_{m < \sqrt{x}} \frac{\chi(m)}{m^\beta} \sum_{\sqrt{x} < d \leq x/m} \frac{1}{d^\beta}

Since {L(\beta,\chi) = 0}, one can show through summation by parts (see Lemma 71 of this previous post) that

\displaystyle  \sum_{m \leq y} \frac{\chi(m)}{m^\beta} \ll \frac{q}{y^\beta}

for any {y \geq 1}, while from the integral test (see Lemma 2 of this previous post) we have

\displaystyle  \sum_{\sqrt{x} < d \leq x/m} \frac{1}{d^\beta} = \frac{(x/m)^{1-\beta}-\sqrt{x}^{1-\beta}}{1-\beta} + O( \frac{1}{\sqrt{x}^\beta}).

We can thus estimate (9) as

\displaystyle  \sum_{d \leq \sqrt{x}} O( \frac{q}{x^\beta} ) + \frac{x^{1-\beta}}{1-\beta} \sum_{m < \sqrt{x}} \frac{\chi(m)}{m} - \frac{x^{(1-\beta)/2}}{1-\beta} \sum_{m < \sqrt{x}} \frac{\chi(m)}{m^\beta}

\displaystyle + \sum_{m < \sqrt{x}} O( \frac{1}{m^\beta \sqrt{x}^\beta} ).

From summation by parts we again have

\displaystyle  \sum_{m < \sqrt{x}} \frac{\chi(m)}{m} = L(1,\chi) + O( \frac{q}{\sqrt{x}})

and we have the crude bound

\displaystyle  \sum_{m < \sqrt{x}} \frac{1}{m^\beta} \ll \frac{x^{1-\beta}}{1-\beta}

so by using (7) and {x^{1-\beta} = x^{o(1)}} we arrive at

\displaystyle  \sum_{n \leq x} \frac{1*\chi(n)}{n^\beta} = \frac{x^{1-\beta}}{1-\beta} L(1,\chi) + O( q^{O(1)} x^{-1/2+o(1)} ).

for any {x > 1}, where the {O(1)} exponent does not depend on {C}. In particular, if {q^C \leq x \leq q^{3C}} and {C} is large enough, then by (6), (7), (8) we have

\displaystyle \sum_{n \leq x} \frac{1*\chi(n)}{n^\beta} = \frac{1+o(1)}{1-\beta} L(1,\chi).

Setting {x=q^C} and {x=q^{3C}} and subtracting, we conclude that

\displaystyle  \sum_{q^C < n \leq q^{3C}} \frac{1*\chi(n)}{n^\beta} = o( \sum_{n \leq q^{C}} \frac{1*\chi(n)}{n^\beta} ). \ \ \ \ \ (10)

On the other hand, observe that {1*\chi} is always non-negative, and that {1*\chi(p_1 \dots p_k n) \geq 1*\chi(n)} whenever {n \leq q^C} and {q^{C/k} < p_1,\dots,p_k \leq q^{2C/k}}, with {p_1,\dots,p_k} primes with {\chi(p_1),\dots,\chi(p_k) \neq -1}. Since any number {N} with {q^C < N \leq q^{3C}} has at most {O(1)} representations of the form {N = p_1 \dots p_k n} with {n \leq q^C} and {q^{C/k} < p_1,\dots,p_k \leq q^{2C/k}}, and no {N} outside of the range {q^C < N \leq q^{3C}} has such a representation, we thus see that

\displaystyle  \sum_{q^C < N \leq q^{3C}} \frac{1*\chi(N)}{N^\beta} \gg (\sum_{n \leq q^C} \frac{1*\chi(n)}{n^\beta}) (\sum_{q^{C/k} < p \leq q^{2C/k}: \chi(p) \neq -1} \frac{1}{p^\beta})^k.

Comparing this with (10), we conclude that

\displaystyle  \sum_{q^{C/k} < p \leq q^{2C/k}: \chi(p) \neq -1} \frac{1}{p^\beta} = o(1);

since {1/p \leq 1/p^\beta}, the claim follows. \Box
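The two elementary properties of {1*\chi} used in this proof (non-negativity, and monotonicity under multiplication by primes {p} with {\chi(p) \neq -1}) hold for any real character, so they can be checked on a toy example. The Python sketch below uses the Legendre symbol modulo {7} as a stand-in for {\chi}; this is purely an assumption for illustration, as that character has no Siegel zero, and the function names are hypothetical.

```python
def chi(n, p=7):
    """Legendre symbol mod 7, a primitive real character of conductor 7."""
    n %= p
    if n == 0:
        return 0
    return 1 if pow(n, (p - 1) // 2, p) == 1 else -1

def one_conv_chi(n):
    """(1 * chi)(n) = sum of chi(d) over the divisors d of n."""
    return sum(chi(d) for d in range(1, n + 1) if n % d == 0)

values = [one_conv_chi(n) for n in range(1, 301)]
print(min(values))  # non-negativity on this range

# Primes with chi(ell) != -1: 2 and 11 are quadratic residues mod 7, and chi(7) = 0.
good_primes = (2, 7, 11)
monotone = all(one_conv_chi(ell * n) >= one_conv_chi(n)
               for ell in good_primes for n in range(1, 101))
print(monotone)

# The hypothesis matters: chi(3) = -1 and (1 * chi)(3) < (1 * chi)(1).
print(one_conv_chi(3), one_conv_chi(1))
```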

— 2. Main argument —

We let {w} be a large absolute constant ({w=100} will do) and set {W := \prod_{p \leq w} p} to be the primorial of {w}. Set {x := q^C} for some large fixed {C} (large compared to {w} or {W}). Let {\psi: {\bf R} \rightarrow {\bf R}} be a smooth non-negative function supported on {[-1/2,1/2]} and equal to {1} at {0}. Set

\displaystyle  f(n) := \psi( \frac{\log n}{\log q} )

and

\displaystyle  \psi_x(n) := \psi( \log^C x (\frac{n}{x}-1) ).

Thus {f(n)} is a smooth cutoff to the region {n \leq \sqrt{q}}, and {\psi_x(n)} is a smooth cutoff to the region {n = (1+O(\log^{-C} x)) x}. It will suffice to establish the lower bound

\displaystyle  \sum_{n:(n(n+2),W)=1} \Lambda_2(n(n+2)) (\mu f * 1)(n(n+2)) \psi_x(n)

\displaystyle  \gg (1-o(1)) x \log^{1-C} x,

because the non-twin primes {n} contribute at most {O(x^{1/2+o(1)})} to the left-hand side. The weight {(\mu f*1)(n(n+2))} is an unsquared Selberg sieve designed to damp out those {n} for which {n} or {n+2} have somewhat small prime factors; we did not square this weight as is customary with the Selberg sieve in order to simplify the calculations slightly (the fact that the weight can be negative sometimes will not be a serious concern for us).
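To get a feel for the unsquared weight {\mu f * 1}, here is a minimal Python sketch. The specific bump function `psi` below is an assumption chosen only because it is smooth, non-negative, supported on {[-1/2,1/2]} and equal to {1} at {0}; the text does not fix a particular {\psi}. The value {q = 10^6} and the function names are likewise hypothetical.

```python
from math import exp, log

def psi(u):
    """Hypothetical bump: smooth, non-negative, supported on [-1/2, 1/2], psi(0) = 1."""
    if abs(u) >= 0.5:
        return 0.0
    return exp(1.0 - 1.0 / (1.0 - 4.0 * u * u))

def mobius(n):
    """Moebius function via trial factorisation."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def sieve_weight(n, q):
    """(mu f * 1)(n) = sum_{d|n} mu(d) f(d) with f(d) = psi(log d / log q)."""
    return sum(mobius(d) * psi(log(d) / log(q))
               for d in range(1, n + 1) if n % d == 0)

q = 10**6                               # sqrt(q) = 1000
rough = sieve_weight(1009 * 1013, q)    # both prime factors exceed sqrt(q)
damped = sieve_weight(2, q)             # a small prime factor: weight close to 0
signed = sieve_weight(6, q)             # the unsquared weight can be negative

print(rough, damped, signed)
```

Only divisors {d \leq \sqrt{q}} contribute, so the weight equals {1} exactly when {n} has no prime factor up to {\sqrt{q}}.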

We split {1*\chi} as

\displaystyle  1*\chi(n) = 1_{n=1} + g(n). \ \ \ \ \ (11)

Thus {g} is non-negative, and supported on those products {p_1 \dots p_k} of primes with {k \geq 1} and {\chi(p_1),\dots,\chi(p_k) \neq -1}. Convolving (11) by {\Lambda_2} and using the identity {\Lambda_2*1=L^2}, we have

\displaystyle  \tilde \Lambda_2 = \Lambda_2 + \Lambda_2 * g

where {\tilde \Lambda_2 := \chi * L^2}. (The quantities {\Lambda_2, g, \tilde \Lambda_2} are all non-negative, but we will not take advantage of these facts here.) It thus suffices to establish the two bounds

\displaystyle  \sum_{n:(n(n+2),W)=1} \tilde \Lambda_2(n (n+2)) (\mu f * 1)(n(n+2)) \psi_x(n) \ \ \ \ \ (12)

\displaystyle  \gg (1-o(1)) x \log^{1-C} x

and

\displaystyle  \sum_{n:(n(n+2),W)=1} (\Lambda_2*g)(n (n+2)) (\mu f * 1)(n(n+2)) \psi_x(n) \ \ \ \ \ (13)

\displaystyle  = o(x \log^{1-C} x);

the intuition here is that Lemma 5 is showing that {g} is “sparse” and so the contribution of {\Lambda_2 * g} should be relatively small.
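The decomposition {\tilde \Lambda_2 = \Lambda_2 + \Lambda_2 * g} is a purely formal consequence of (11) and {\Lambda_2 * 1 = L^2}, so it holds for any character and can be verified numerically. The Python sketch below is only a check under the assumption that {\chi} is the Legendre symbol modulo {7} (a toy choice with no Siegel zero); all function names are hypothetical.

```python
from math import log

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def mobius(n):
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def chi(n, p=7):
    """Legendre symbol mod 7 (toy stand-in for the character in the text)."""
    n %= p
    if n == 0:
        return 0
    return 1 if pow(n, (p - 1) // 2, p) == 1 else -1

def lambda2(n):
    """(mu * L^2)(n)."""
    return sum(mobius(d) * log(n // d) ** 2 for d in divisors(n))

def lambda2_tilde(n):
    """(chi * L^2)(n)."""
    return sum(chi(d) * log(n // d) ** 2 for d in divisors(n))

def g(n):
    """g = 1 * chi minus the indicator of n = 1, as in (11)."""
    return sum(chi(d) for d in divisors(n)) - (1 if n == 1 else 0)

def lambda2_conv_g(n):
    """(Lambda_2 * g)(n)."""
    return sum(lambda2(d) * g(n // d) for d in divisors(n))

worst = max(abs(lambda2_tilde(n) - lambda2(n) - lambda2_conv_g(n))
            for n in range(1, 121))
print(worst)  # rounding error only
```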

We begin with (13). Let {\varepsilon > 0} be a small fixed quantity to be chosen later. Observe that if {(\Lambda_2*g)(n(n+2))} is non-zero, then {n(n+2)} must have a factor on which {g} is non-zero, which implies that {n(n+2)} is either divisible by a prime {p} with {\chi(p) \neq -1}, or by the square of a prime. If the former case occurs, then either {n} or {n+2} is divisible by {p}; since {n,n+2 \leq 2x}, this implies that either {n(n+2)} is divisible by a prime {p} with {x^\varepsilon \leq p \leq 2x^{1-\varepsilon}}, or that {n(n+2)} is divisible by a prime less than {x^\varepsilon}. To summarise, at least one of the following three statements must hold:

  • {n(n+2)} is divisible by a prime {w < p < x^\varepsilon}.
  • {n(n+2)} is divisible by the square {p^2} of a prime {p \geq x^\varepsilon}.
  • {n(n+2)} is divisible by a prime {x^\varepsilon \leq p \leq 2x^{1-\varepsilon}} with {\chi(p) \neq -1}.

It thus suffices to establish the estimates

\displaystyle  \sum_{w < p < x^\varepsilon} \sum_{n: p|n(n+2),(n(n+2),W)=1} (\Lambda_2*g)(n (n+2)) |(\mu f * 1)(n(n+2))| \ \ \ \ \ (14)

\displaystyle \psi_x(n) \ll_C \varepsilon x \log^{1-C} x,

\displaystyle  \sum_{p \geq x^\varepsilon} \sum_{n: p^2|n(n+2)} (\Lambda_2*g)(n (n+2)) |(\mu f * 1)(n(n+2))| \ \ \ \ \ (15)

\displaystyle  \psi_x(n) = o(x \log^{1-C} x),

and

\displaystyle  \sum_{x^\varepsilon \leq p \leq 2x^{1-\varepsilon}: \chi(p) \neq -1} \sum_{n: p|n(n+2)} (\Lambda_2*g)(n (n+2)) |(\mu f * 1)(n(n+2))| \ \ \ \ \ (16)

\displaystyle  \psi_x(n) = o(x \log^{1-C} x),

as the claim then follows by summing and sending {\varepsilon} slowly to zero.

We begin with (15). Observe that if {p^2} divides {n(n+2)} then either {p^2} divides {n} or {p^2} divides {(n+2)}. In particular the number of {n \leq 2x} with {p^2 | n(n+2)} is {O( \frac{x}{p^2} )}. The summand {(\Lambda_2*g)(n (n+2)) |(\mu f * 1)(n(n+2))| \psi_x(n)} is {O(x^{o(1)})} by the divisor bound, so the left-hand side of (15) is bounded by

\displaystyle  \ll \sum_{p \geq x^\varepsilon} \frac{x}{p^2} x^{o(1)} \ll x^{1-\varepsilon+o(1)}

and the claim follows.

Next we turn to (14). We can very crudely bound

\displaystyle  \Lambda_2*g(n(n+2)) \ll \tau(n(n+2))^{O(1)} \log^2 x, \ \ \ \ \ (17)

so it suffices to show that

\displaystyle  \sum_{w < p < x^\varepsilon} \sum_{n: p|n(n+2); (n(n+2),W)=1} \tau(n (n+2))^{O(1)} |(\mu f * 1)(n(n+2))| \psi_x(n)

\displaystyle  \ll_C \varepsilon x \log^{-C-1} x.

By Mertens’ theorem, it suffices to show that

\displaystyle  \sum_{n: p|n(n+2); (n(n+2),W)=1} \tau(n (n+2))^{O(1)} |(\mu f * 1)(n(n+2))| \psi_x(n) \ \ \ \ \ (18)

\displaystyle  \ll_C \frac{\log p}{p \log q} x \log^{-C-1} x

for all {w < p < x^\varepsilon}.

We use a modification of the argument used to prove Proposition 4.2 of this Polymath8b paper. By Fourier inversion, we may write

\displaystyle  e^u \psi(u) = \int_{\bf R} \Psi(t) e^{-itu}\ dt

for some rapidly decreasing function {\Psi}, so that

\displaystyle  f(d) = \int_{\bf R} \frac{1}{d^{(1+it)/\log q}} \Psi(t)\ dt,

and hence

\displaystyle  \mu f * 1(n) = \int_{\bf R} \sum_{d|n} \frac{\mu(d)}{d^{(1+it)/\log q}} \Psi(t)\ dt

\displaystyle  = \int_{\bf R} \prod_{p'|n} (1 - \frac{1}{(p')^{(1+it)/\log q}})\ \Psi(t)\ dt

and hence by the triangle inequality

\displaystyle  (\mu f*1)(n(n+2)) \ll_A \int_{\bf R} \prod_{p'|n(n+2)} O( \min( 1, (1+|t|) \frac{\log p'}{\log q}) )\ \frac{dt}{(1+|t|)^A}

for any fixed {A>0}. Since {\prod_{p'|n(n+2)} O(1) \ll \tau(n(n+2))^{O(1)}}, we can thus (after substituting {\sigma := 1+|t|}) bound the left-hand side of (18) by

\displaystyle  \ll_A \int_1^\infty \sum_{n: p|n(n+2); (n(n+2),W)=1} \tau(n(n+2))^{O(1)} \prod_{p'|n(n+2)} \min( 1, \sigma \frac{\log p'}{\log q})\ \frac{d\sigma}{\sigma^A}

and so it will suffice to show the bound

\displaystyle  \sum_{n: p|n(n+2); (n(n+2),W)=1} \tau(n(n+2))^{O(1)} \prod_{p'|n(n+2)} \min( 1, \sigma \frac{\log p'}{\log q}) \ \ \ \ \ (19)

\displaystyle  \ll_C \sigma^{O_C(1)} \frac{\log p}{p \log q} x \log^{-C-1} x

for any {\sigma \geq 1} and {p \leq 2x^{1-\varepsilon}}.
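The step above that rewrites {\mu f * 1} as an integral of Euler products rests on the identity {\sum_{d|n} \mu(d) d^{-s} = \prod_{p'|n} (1 - (p')^{-s})} for complex {s}. A short Python sketch checking this identity numerically is given below; the function names and the sample value of {s} (of the shape {(1+it)/\log q} with hypothetical {q=1000}, {t=2.5}) are illustrative assumptions.

```python
import cmath
from math import log

def mobius(n):
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def prime_divisors(n):
    """Distinct prime divisors of n."""
    primes, p = [], 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes

def divisor_side(n, s):
    """sum over d | n of mu(d) * d^{-s}."""
    return sum(mobius(d) * cmath.exp(-s * log(d))
               for d in range(1, n + 1) if n % d == 0)

def product_side(n, s):
    """product over primes p | n of (1 - p^{-s})."""
    out = 1 + 0j
    for p in prime_divisors(n):
        out *= 1 - cmath.exp(-s * log(p))
    return out

s = (1 + 2.5j) / log(1000)
for n in (30, 84, 1001):
    print(n, divisor_side(n, s), product_side(n, s))
```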

We factor {n(n+2) = p_1 \dots p_r} where {w < p_1 \leq \dots \leq p_r} are primes, and then write {n(n+2) = dm} where {d = p_1 \dots p_i} and {i} is the largest index for which {p_1 \dots p_i < x^{1/10}}. Clearly {0 \leq i < r} and {d < x^{1/10}} with {(d,W)=1}, and the least prime factor {p(m) = p_{i+1}} of {m} is such that

\displaystyle  p_{i+1} \geq (p_1 \dots p_{i+1})^{1/(i+1)} \geq x^{\frac{1}{10(i+1)}};

we have {n(n+2) \ll x^2} on the support of {\psi_x(n)}, and so

\displaystyle  r \ll 1+i

and thus {\tau(n(n+2)) \ll \exp( O(i) )}. Clearly we have

\displaystyle  \prod_{p'|n(n+2)} \min( 1, \sigma \frac{\log p'}{\log q}) \leq \prod_{p'|[p,d]} \min( 1, \sigma \frac{\log p'}{\log q}).

We write {i = \Omega(d)}, where {\Omega(d)} denotes the number of prime factors of {d} counting multiplicity. We can thus bound the left-hand side of (19) by

\displaystyle  \ll \sum_{d < x^{1/10}; (d,W)=1} \exp( O( \Omega(d) ) ) \prod_{p'|[p,d]} \min( 1, \sigma \frac{\log p'}{\log q})

\displaystyle  \sum_{n: [p,d]|n(n+2); p(\frac{n(n+2)}{d}) \geq x^{\frac{1}{10(\Omega(d)+1)}}} \psi_x(n).

We may replace the {\psi_x(n)} weight with a restriction of {n} to the interval {[x - O(x \log^{-C} x), x + O(x \log^{-C} x)]}. The constraint {p(\frac{n(n+2)}{d}) \geq x^{\frac{1}{10(\Omega(d)+1)}}} removes two residue classes modulo every odd prime less than {x^{\frac{1}{10(\Omega(d)+1)}}}, while the constraint {[p,d]|n(n+2)} restricts {n} to {O( \exp( O(\Omega(d))))} residue classes modulo {[p,d]}. Standard sieve theory then gives

\displaystyle  \sum_{n: [p,d]|n(n+2); p(\frac{n(n+2)}{d}) \geq x^{\frac{1}{10(\Omega(d)+1)}}} \psi_x(n) \ll \exp( O(\Omega(d))) \frac{x}{[p,d]} \log^{-C-1} x

and so we are reduced to showing that

\displaystyle  \sum_{d < x^{1/10}; (d,W)=1} \frac{\exp( O( \Omega(d) ) )}{[p,d]} \prod_{p'|[p,d]} \min( 1, \sigma \frac{\log p'}{\log q}) \ll \sigma^{O(1)} \frac{\log p}{p \log q}.

Factoring {d = p_1 \dots p_i}, we can bound the left-hand side by

\displaystyle  \ll \frac{\min(1, \sigma \frac{\log p}{\log q})}{p} \prod_{w < p' < x^{1/10}: p' \neq p} (1 + \sum_{j=1}^\infty \frac{\exp(O(j))}{(p')^j} \min( 1, \sigma \frac{\log p'}{\log q}) )

which (for {w} large enough) is bounded by

\displaystyle  \ll \sigma \frac{\log p}{p \log q} \prod_{w < p' < x^{1/10}} (1 + O( \frac{1}{p'} \min( 1, \sigma \frac{\log p'}{\log q}) ) )

\displaystyle  \ll \sigma \frac{\log p}{p \log q} \exp( O( \sum_{p' < x^{1/10}} \frac{1}{p'} \min( 1, \sigma \frac{\log p'}{\log q}) ) )

which by Mertens’ theorem is bounded by

\displaystyle  \ll \sigma \frac{\log p}{p \log q} \exp( O_C( \log \sigma ) )

and the claim follows.

For future reference we observe that the above arguments also establish the bound

\displaystyle  \sum_{n: (n(n+2),W)=1} \tau(n (n+2))^{O(1)} |(\mu f * 1)(n(n+2))| \psi_x(n) \ \ \ \ \ (20)

\displaystyle  \ll_C \frac{1}{\log q} x \log^{-C-1} x

and (if one replaces {x^{1/10}} with {x^{\varepsilon/10}})

\displaystyle  \sum_{n: p|n(n+2); (n(n+2),W)=1} \tau(n (n+2))^{O(1)} |(\mu f * 1)(n(n+2))| \psi_x(n) \ \ \ \ \ (21)

\displaystyle  \ll_{C,\varepsilon} \frac{\log p}{p \log q} x \log^{-C-1} x

for all {p \leq 2x^{1-\varepsilon}}.

Finally, we turn to (16). Using (17) again, it suffices to show that

\displaystyle  \sum_{x^\varepsilon \leq p \leq 2x^{1-\varepsilon}: \chi(p) \neq -1} \sum_{n: p|n(n+2)} \tau(n(n+2))^{O(1)} |(\mu f * 1)(n(n+2))| \psi_x(n)

\displaystyle  = o(x \log^{-C-1} x).

The claim then follows from (21) and Lemma 5.

It remains to prove (12), which we write as

\displaystyle  \sum_{n: (n(n+2),W)=1} \left(\sum_{d|n(n+2)} \chi(d) \log^2 \frac{n(n+2)}{d}\right) (\mu f * 1)(n(n+2)) \psi_x(n)

\displaystyle \gg (1-o(1)) x \log^{1-C} x.

On the support of {\psi_x(n)}, we can write

\displaystyle  \log^2 \frac{n(n+2)}{d} = \log^2 \frac{x^2}{d} + O( \log^{-C+O(1)} x).

The contribution of the error term can be bounded by

\displaystyle  O( \log^{-C+O(1)} x \sum_n \tau(n(n+2))^{O(1)} |(\mu f * 1)(n(n+2))| \psi_x(n) );

applying (20), this is bounded by {O( x \log^{-2C+O(1)} x)} which is acceptable for {C} large enough. Thus it suffices to show that

\displaystyle  \sum_{n: (n(n+2),W)=1} (\sum_{d|n(n+2)} \chi(d) \log^2 \frac{x^2}{d}) (\mu f * 1)(n(n+2)) \psi_x(n)

\displaystyle  \gg (1-o(1)) x \log^{1-C} x

which we write as

\displaystyle  \sum_{n:(n(n+2),W)=1} \left(\sum_{d|n(n+2)} \chi(d) G( \frac{\log d}{\log x} )\right) (\mu f * 1)(n(n+2)) \psi_x(n)

\displaystyle  \gg (1-o(1)) x \log^{-C-1} x

where {G(u) := (2-u)^2}. We split {G = G_< + G_\sim + G_>} where {G_<}, {G_\sim}, {G_>} are smooth truncations of {G} to the intervals {(-\infty,0.99)}, {(0.98, 1.02)}, and {(1.01, +\infty)} respectively. It will suffice to establish the bounds

\displaystyle  \sum_{n:(n(n+2),W)=1} \left(\sum_{d|n(n+2)} \chi(d) G_<( \frac{\log d}{\log x} )\right) (\mu f * 1)(n(n+2)) \psi_x(n) \ \ \ \ \ (22)

\displaystyle  \gg (1-o(1)) x \log^{-C-1} x

\displaystyle  \sum_{n:(n(n+2),W)=1} \left(\sum_{d|n(n+2)} \chi(d) G_\sim( \frac{\log d}{\log x} )\right) (\mu f * 1)(n(n+2)) \psi_x(n) \ \ \ \ \ (23)

\displaystyle  = o(x \log^{-C-1} x)

\displaystyle  \sum_{n: (n(n+2),W)=1} \left(\sum_{d|n(n+2)} \chi(d) G_>( \frac{\log d}{\log x} )\right) (\mu f * 1)(n(n+2)) \psi_x(n) \ \ \ \ \ (24)

\displaystyle  = o(x \log^{-C-1} x)

We begin with (24), which is a relatively easy consequence of the cancellation properties of {\chi}. We may rewrite the left-hand side as

\displaystyle  \sum_e \mu(e) f(e) \sum_m \sum_{n: (n(n+2),W)=1; [e,m] | n(n+2)} \chi\left(\frac{n(n+2)}{m}\right)

\displaystyle G_>\left( \frac{\log \frac{n(n+2)}{m}}{\log x} \right) \psi_x(n).

The summand vanishes unless {m \ll x^{0.99}}, {e \leq q}, and {[e,m]/m} is coprime to {q}, so that {[e,m] \ll q x^{0.99}}. For fixed {e,m}, the constraints {(n(n+2),W)=1}, {[e,m]|n(n+2)} restrict {n} to {O_w(x^{o(1)})} residue classes of the form {a \hbox{ mod } W[e,m]}, with {[e,m]|a(a+2)}, in particular {m_1 | a} and {m_2 | a+2} for some {m_1,m_2} with {m = m_1 m_2}. Let us fix {e,m,a,m_1,m_2} and consider the sum

\displaystyle  \sum_{n: n = a \hbox{ mod } W[e,m]} \chi(\frac{n(n+2)}{m}) G_>( \frac{\log n(n+2)/m}{\log x} ) \psi_x(n).

Writing {n = k W[e,m] + a}, this becomes

\displaystyle  \sum_k \chi( \frac{W[e,m]}{m_1} k + \frac{a}{m_1} ) \chi( \frac{W[e,m]}{m_2} k + \frac{a+2}{m_2} )

\displaystyle G_>( \frac{\log (kW[e,m]+a)(kW[e,m]+a+2)/m}{\log x} ) \psi_x(kW[e,m]+a).

From Lemma 3, we have

\displaystyle  \sum_{k \in {\bf Z}/q{\bf Z}} \chi( \frac{W[e,m]}{m_1} k + \frac{a}{m_1} ) \chi( \frac{W[e,m]}{m_2} k + \frac{a+2}{m_2} ) \ll x^{o(1)} (2 \frac{W[e,m]}{m},q)

\displaystyle  \ll x^{o(1)}

since {[e,m]/m} is coprime to {q}. From summation by parts we thus have

\displaystyle  \sum_k \chi( \frac{W[e,m]}{m_1} k + \frac{a}{m_1} ) \chi( \frac{W[e,m]}{m_2} k + \frac{a+2}{m_2} )

\displaystyle G_>( \frac{\log (kW[e,m]+a)(kW[e,m]+a+2)/m}{\log x} ) \psi_x(n)

\displaystyle \ll q^{-1} x^{o(1)} \frac{x}{[e,m]}

(noting that {[e,m] \ll q^{-1} x} if {C} is large enough) and so we can bound the left-hand side of (24) in magnitude by

\displaystyle  q^{-1} x^{1+o(1)} \sum_{e \leq q} \sum_{m \ll x^{0.99}} \frac{1}{[e,m]} \ll q^{-1} x^{1+o(1)} \sum_{d \ll q x^{0.99}} \frac{x^{o(1)}}{d}

\displaystyle  \ll q^{-1} x^{1+o(1)}

and (24) follows.

Now we prove (23), which is where we need nontrivial bounds on Kloosterman sums. Expanding out {\mu f * 1} and using the triangle inequality, it suffices (for {C} large enough) to show that

\displaystyle \sum_{n: (n(n+2),W)=1; r|n(n+2)} (\sum_{d|n(n+2)} \chi(d) G_\sim( \frac{\log d}{\log x} )) \psi_x(n) \ll q^{O(1)} x^{0.99+o(1)}

for all {r < q^{1/2}}. By Fourier expansion of the {r|n(n+2)} and {(n(n+2),W)=1} constraints (retaining only the restriction that {n} is odd), it suffices to show that

\displaystyle \sum_{n \hbox{ odd}} (\sum_{d|n(n+2)} \chi(d) G_\sim( \frac{\log d}{\log x} )) \psi_x(n) e( kn/Wr )\ll q^{O(1)} x^{0.99+o(1)}

for every {k \in {\bf Z}/Wr{\bf Z}}.

Fix {r,k}. If {d|n(n+2)} for an odd {n}, then we can uniquely factor {d = d_1 d_2} such that {d_1|n}, {d_2|n+2}, and {(d_1,d_2)=1}. It thus suffices to show that

\displaystyle  \sum_{d_1,d_2: (d_1,d_2)=1} \chi(d_1) \chi(d_2) G_\sim( \frac{\log d_1 + \log d_2}{\log x} ) \ \ \ \ \ (25)

\displaystyle  \sum_{n \hbox{ odd}: d_1|n; d_2|n+2}\psi_x(n) e( kn/Wr)

\displaystyle \ll q^{O(1)} x^{0.99+o(1)}.

Actually, we may delete the condition {(d_1,d_2)=1} since this is implied by the constraints {d_1|n, d_2|n+2} and {n} odd.
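The unique factorisation {d = d_1 d_2} for odd {n} comes down to the fact that {n} and {n+2} are then coprime. A minimal Python sketch of this, with hypothetical function names, is given below; it recovers {d_1} as {\gcd(d,n)} and checks the resulting multiplicativity {\tau(n(n+2)) = \tau(n) \tau(n+2)}.

```python
from math import gcd, isqrt

def divisors(n):
    """All positive divisors of n, pairing d with n // d."""
    out = set()
    for d in range(1, isqrt(n) + 1):
        if n % d == 0:
            out.add(d)
            out.add(n // d)
    return sorted(out)

def split_divisor(d, n):
    """For odd n and d | n(n+2): the unique (d1, d2) with d = d1*d2, d1 | n, d2 | n+2."""
    d1 = gcd(d, n)
    return d1, d // d1

def check(n):
    """Verify the factorisation for every divisor of n(n+2)."""
    for d in divisors(n * (n + 2)):
        d1, d2 = split_divisor(d, n)
        if not (d1 * d2 == d and n % d1 == 0 and (n + 2) % d2 == 0
                and gcd(d1, d2) == 1):
            return False
    return True

all_ok = all(check(n) for n in range(1, 200, 2))
tau_multiplicative = all(
    len(divisors(n * (n + 2))) == len(divisors(n)) * len(divisors(n + 2))
    for n in range(1, 200, 2))
print(all_ok, tau_multiplicative)
print(split_divisor(45, 15))  # 15 * 17 = 255 is divisible by... see note below
```

In the last line {45} does not divide {15 \cdot 17}, so the output there is not a valid factorisation; `split_divisor` assumes its hypothesis {d | n(n+2)} and does not check it.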

We first dispose of the case when {d_1} is large in the sense that {d_1 \geq x^{0.51}}. Making the change of variables {d_3 = n/d_1}, we may rewrite the left-hand side as

\displaystyle  \sum_{d_3,d_2} \chi(d_2) \sum_{n \hbox{ odd}: d_3|n; d_2|n+2} \chi( \frac{n}{d_3} ) G_\sim( \frac{\log n - \log d_3 + \log d_2}{\log x} )

\displaystyle  \psi_x(n) e(kn/Wr).

We can assume {d_2} is coprime to {q} and {d_2,d_3} odd with {d_3} coprime to {d_2} and {d_2,d_3 \ll x^{0.49}}, as the contribution of all other cases vanishes. The constraints that {n} is odd and {d_3|n, d_2|n+2} then restrict {n} to a single residue class modulo {2 d_2 d_3}, with {n/d_3} restricted to a single residue class modulo {2 d_2}. We split this into {Wr} residue classes modulo {2Wd_2 r} to make the {e(kn/Wr)} phase constant on each residue class. The modulus {2Wd_2 r} is not divisible by {q}, since {d_2} is coprime to {q} and {r \leq \sqrt{q}}. As such, {\chi(\frac{n}{d_3})} has mean zero on every consecutive {q} elements in each residue class modulo {2Wd_2 r} under consideration, and from summation by parts we then have

\displaystyle  \sum_{n \hbox{odd}: d_3|n; d_2|n+2} \chi( \frac{n}{d_3} ) G_\sim( \frac{\log n - \log d_3 + \log d_2}{\log x} ) \psi_x(n) e(kn/Wr)

\displaystyle  \ll q Wr

and hence the contribution of the {d_1 \geq x^{0.51}} case to (25) is

\displaystyle  \ll \sum_{d_3,d_2 \ll x^{0.49}} q Wr \ll q^{O(1)} x^{0.98}

which is acceptable.

It remains to control the contribution of the {d_1 < x^{0.51}} case to (25). By the triangle inequality, it suffices to show that

\displaystyle  \sum_{d_2} \chi(d_2) G_\sim( \frac{\log d_1 + \log d_2}{\log x} ) \sum_{n \hbox{odd}: d_1|n; d_2|n+2}\psi_x(n) e(kn/Wr)

\displaystyle \ll q^{O(1)} x^{0.99+o(1)} / d_1

for all {d_1 < x^{0.51}} coprime to {q}. We can of course restrict {d_1,d_2} to be coprime to each other and to {W}. Writing {n+2 = d_2 (2m+1)}, the constraint {d_1|n} is equivalent to

\displaystyle  m = \overline{d_2} - \overline{2} \hbox{ mod } d_1

and so we can rewrite the left-hand side as

\displaystyle  \sum_{d_2: (d_1,d_2)=1} \chi(d_2) G_\sim( \frac{\log d_1 + \log d_2}{\log x} )

\displaystyle \sum_{m = \overline{d_2} - \overline{2} \hbox{ mod } d_1} \psi_x(d_2 (2m+1)-2) e(k(d_2(2m+1))/Wr).
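The congruence {m = \overline{d_2} - \overline{2} \hbox{ mod } d_1} can be checked directly. The Python sketch below is an illustration with hypothetical names and sample values; it assumes {d_1} is odd and coprime to {d_2}, so that both inverses exist, and uses `pow(a, -1, d1)` (Python 3.8 or later) for modular inverses.

```python
from math import gcd

def m_residue(d1, d2):
    """Residue of m mod d1 forced by d1 | n, where n + 2 = d2 * (2m + 1)."""
    return (pow(d2, -1, d1) - pow(2, -1, d1)) % d1

def agrees(d1, d2, m_range=300):
    """Check that d1 | n holds exactly for m in the predicted residue class."""
    target = m_residue(d1, d2)
    for m in range(m_range):
        n = d2 * (2 * m + 1) - 2
        if (n % d1 == 0) != (m % d1 == target):
            return False
    return True

# Sample pairs with d1 odd and gcd(d1, d2) = 1.
pairs = [(15, 7), (9, 5), (35, 11), (1, 3)]
results = [agrees(d1, d2) for d1, d2 in pairs if gcd(d1, d2) == 1 and d1 % 2 == 1]
print(results)
```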

By Fourier expansion, we can write {\chi(d_2)} as a linear combination of {e( l d_2 / q)} with bounded coefficients and {(l,q)=1}, so it suffices to show that

\displaystyle  \sum_{d_2: (d_1,d_2)=1} e(ld_2/q) G_\sim( \frac{\log d_1 + \log d_2}{\log x} )

\displaystyle \sum_{m = \overline{d_2} - \overline{2} \hbox{ mod } d_1} \psi_x(d_2 (2m+1)-2) e( 2kd_2 m / Wr )

\displaystyle \ll q^{O(1)} x^{0.99+o(1)} / d_1.

Next, by Fourier expansion of the constraint {m = \overline{d_2} - \overline{2} \hbox{ mod } d_1}, we write the left-hand side as

\displaystyle  \frac{1}{d_1} \sum_{h \in {\bf Z}/d_1 {\bf Z}} \sum_{d_2: (d_1,d_2)=1} e(ld_2/q) G_\sim( \frac{\log d_1 + \log d_2}{\log x} ) e( h(\overline{d_2} - \overline{2})/d_1)

\displaystyle  \sum_m e(-hm/d_1) \psi_x(d_2 (2m+1)-2) e( 2kd_2 m / Wr ).
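The Fourier expansion of the congruence constraint used here is just orthogonality of additive characters modulo {d_1}: the average of {e(h(a-m)/d_1)} over {h \in {\bf Z}/d_1{\bf Z}} is {1} when {m = a \hbox{ mod } d_1} and {0} otherwise. A minimal numerical sketch in Python follows; the function names and the sample values of {a} and {d_1} are illustrative assumptions, and the identity is only checked up to floating-point error.

```python
import cmath


def e(theta: float) -> complex:
    """The standard additive character e(theta) = exp(2*pi*i*theta)."""
    return cmath.exp(2j * cmath.pi * theta)


def congruence_indicator(m: int, a: int, d: int) -> complex:
    """(1/d) * sum over h mod d of e(h*a/d) * e(-h*m/d).

    Up to rounding, this is 1 if m = a mod d and 0 otherwise.
    """
    return sum(e(h * a / d) * e(-h * m / d) for h in range(d)) / d


# Illustrative check with d = 7 and a = 3.
for m in range(10):
    print(m, round(abs(congruence_indicator(m, 3, 7)), 6))
```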

From Poisson summation and the smoothness of {\psi}, we see that the inner sum is {O(x^{-100})} unless

\displaystyle  \| \frac{h}{d_1} - \frac{c}{Wr} \| \ll \frac{x^{0.01}d_2}{x} \ \ \ \ \ (26)

for some integer {c}, where {\|\theta\|} denotes the distance from {\theta} to the nearest integer. The contribution of the {h} which do not satisfy this relation is easily seen to be acceptable. From the support of {G_\sim} we see in particular that there are only {O( r x^{0.03} )} remaining choices for {h}. Thus it suffices by the triangle inequality to show that

\displaystyle  \sum_{d_2: (d_1,d_2)=1} e(ld_2/q) G_\sim( \frac{\log d_1 + \log d_2}{\log x} ) e( h\overline{d_2}/d_1)

\displaystyle  \sum_m e(-hm/d_1) \psi_x(d_2 (2m+1)-2) e( 2kd_2 m / Wr )

\displaystyle \ll q^{O(1)} x^{0.95}

for each {h \in {\bf Z}/d_1{\bf Z}} of the form (26).

We rearrange the left-hand side as

\displaystyle  \sum_m e(-hm/d_1) \sum_{d_2: (d_1,d_2)=1} e(ld_2/q) G_\sim( \frac{\log d_1 + \log d_2}{\log x} ) e( h\overline{d_2}/d_1)

\displaystyle  e( 2kd_2 m / Wr ) \psi_x(d_2 (2m+1)-2).

Suppose first that {h/d_1} is of the form {c/Wr} for some integer {c}. Then the phase {d_2 \mapsto e(ld_2/q) e( h\overline{d_2}/d_1) e( 2kd_2 m / Wr )} is periodic with period {Wqr} and has mean zero here (since {Wr<q}). From this, we can estimate the inner sum by {O( Wqr )}; since {m} is restricted to be of size {O( x / d_2 ) = O( x^{0.02} d_1 ) = O( x^{0.53} )}, this contribution is certainly acceptable. Thus we may assume that {h/d_1} is not of the form {c/Wr}. A similar argument works when {d_1 \leq x^{0.4}} (say), so we may assume that {d_1 \geq x^{0.4}}, so that {d_2 \ll x^{0.6}}.

By (26), this forces the denominator of {h/d_1} in lowest form to be {\gg \frac{x}{x^{0.01} d_2 Wr} \gg q^{-O(1)} x^{0.39}}. By Lemma 2, we thus have

\displaystyle  \sum_{d_2 \in ({\bf Z}/d_1{\bf Z})^\times} e( a d_2/d_1) e( h\overline{d_2}/d_1) \ll x^{o(1)} d_2 ( q^{-O(1)} x^{0.39} )^{-1/4}

\displaystyle  \ll q^{O(1)} x^{-0.09} d_2

for any {a}, so from Poisson summation we have

\displaystyle  \sum_{d_2: (d_1,d_2)=1} e(ld_2/q) G_\sim( \frac{\log d_1 + \log d_2}{\log x} ) e( h\overline{d_2}/d_1) e( 2kd_2 m / Wr )

\displaystyle  \psi_x(d_2 (2m+1)-2) \ll q^{O(1)} x^{-0.07} \frac{x}{d_1};

since {m} is constrained to be {O( x^{0.02} d_1 )}, the claim follows.

Finally, we prove (22), which is a routine sieve-theoretic calculation. We rewrite the left-hand side as

\displaystyle  \sum_{d,e} \chi(d) G_<(\frac{\log d}{\log x}) \mu(e) f(e) 1_{n: (n(n+2),W)=1; [d,e] | n(n+2)} \psi_x(n).

The summand vanishes unless {d,e} are coprime to {W} with {d \ll x^{0.52}} and {e \leq \sqrt{q}}. From Poisson summation one then has

\displaystyle  1_{n: (n(n+2),W)=1; [d,e] | n(n+2)} \psi_x(n) = \frac{1}{2} (\prod_{2 < p \leq w} (1-\frac{2}{p})) (\int_{\bf R} \psi) \frac{x \log^{-C} x}{[d,e]}

\displaystyle + O( x^{-100} ).

The error term is certainly negligible, so it suffices to show that

\displaystyle  (\prod_{2 < p \leq w} (1-\frac{2}{p})) \sum_{d,e: (de,W)=1} \chi(d) G_<(\frac{\log d}{\log x}) \mu(e) \psi(\frac{\log e}{\log q}) \frac{1}{[d,e]}

\displaystyle  \gg (1-o(1)) \log^{-1} x.

We can control the left-hand side by Fourier analysis. Writing

\displaystyle  e^u G_<(u) = \int_{\bf R} g(t) e^{-itu}\ dt

and

\displaystyle  e^u \psi(u) = \int_{\bf R} \Psi(t) e^{-itu}\ dt

for some rapidly decreasing functions {g,\Psi}, the left-hand side may be expressed as

\displaystyle  (\prod_{2 < p \leq w} (1-\frac{2}{p})) \int_{\bf R} \int_{\bf R} \sum_{d,e: (de,W)=1} \frac{\chi(d) \mu(e)}{[d,e] d^{\frac{1+it_1}{\log x}} e^{\frac{1+it_2}{\log q}}}\ g(t_1) \Psi(t_2)\ dt_1 dt_2

which factors as

\displaystyle  \int_{\bf R} \int_{\bf R} \prod_p E_p( \frac{1+it_1}{\log x}, \frac{1+it_2}{\log q} ) \ g(t_1) \Psi(t_2)\ dt_1 dt_2 \ \ \ \ \ (27)

where

\displaystyle  E_2(s_1,s_2) := 1

\displaystyle  E_p(s_1,s_2) := 1 - \frac{2}{p}

for {p \leq w}, and

\displaystyle  E_p(s_1,s_2) := 1 - \frac{1}{p^{1+s_2}} + \sum_{j=1}^\infty \frac{\chi(p)^j}{p^{j(1+s_1)}} (1 - \frac{1}{p^{s_2}})

for {p>w}. From Mertens’ theorem we have the crude bound

\displaystyle  \prod_p E_p( \frac{1+it_1}{\log x}, \frac{1+it_2}{\log q} ) \ll \log^{O(1)} x

which by the rapid decrease of {g,\Psi} allows one to restrict to the range {|t_1|, |t_2| \leq \sqrt{\log q}} with an error of {o(\log^{-1} x)}. In particular, we now have {s_1,s_2 = o(1)}.

Recalling that

\displaystyle  \zeta(s) = \prod_p (1 - \frac{1}{p^s})^{-1}

for {\hbox{Re}(s)>1}, we can factor

\displaystyle  \prod_p E_p( s_1, s_2 ) = \frac{\zeta(1 + s_1+s_2)}{\zeta(1+s_1) \zeta(1+s_2)} \prod_p E'_p(s_1,s_2) E''_p(s_1,s_2)

where

\displaystyle  E''_p(s_1,s_2) := 1 + 1_{p>3} (\frac{1+\chi(p)}{p^{1+s_1}} - \frac{1+\chi(p)}{p^{1+s_1+s_2}})

(the restriction {p>3} being to prevent {E''_p(s_1,s_2)} vanishing for {p=2,3} and {s_1,s_2} small) and one has

\displaystyle  E'_p(s_1,s_2) = 1 + O( \frac{1}{p^{2+o(1)}} )

for {s_1,s_2 = o(1)}, and

\displaystyle  E'_2(0,0) = 2

and

\displaystyle  E'_p(0,0) = 1

for odd {p}. In particular, from the Cauchy integral formula we see that

\displaystyle  \prod_p E'_p(s_1,s_2) = 2 + o(1)

for {s_1,s_2 = o(1)}. Since we also have {\zeta(1+s) = \frac{1+o(1)}{s}} in this region, we thus can write (27) as

\displaystyle  \int_{|t_1|, |t_2| \leq \sqrt{\log q}} (1+o(1)) \prod_p E''_p(s_1,s_2) \frac{(1+it_2) (1+it_1)}{(1+it_2 + \frac{1+it_1}{C}) \log x}

\displaystyle  g(t_1) \Psi(t_2) dt_1 dt_2 + o(\log^{-1} x)

and our task is now to show that

\displaystyle  \int_{|t_1|, |t_2| \leq \sqrt{\log q}} (1+o(1)) \prod_p E''_p(s_1,s_2) \frac{(1+it_2) (1+it_1)}{(1+it_2 + \frac{1+it_1}{C})}

\displaystyle  g(t_1) \Psi(t_2) dt_1 dt_2 \gg 1 - o(1).

We have

\displaystyle  \log E''_p(s_1,s_2) = O(1/p)

when {s_1,s_2 = O(\frac{1}{\log p})} (even when {s_1,s_2} have negative real part); since {\log E''_p(0,0)=0}, we conclude from the Cauchy integral formula that

\displaystyle  \log E''_p(s_1,s_2) \ll (|s_1|+|s_2|) \frac{\log p}{p}

when {\log p \ll \frac{1}{|s_1|+|s_2|}}. For the remaining primes {p}, we have

\displaystyle  \log E''_p(s_1,s_2) \ll \frac{1+\chi(p)}{p^{1+1/\log x}}

when {s_1 = \frac{1+it_1}{\log x}} and {s_2 := \frac{1+it_2}{\log q}}. Summing in {p} using Lemma 5 to handle those {p} between {q} and {x}, and Mertens’ theorem and the trivial bound {1+\chi(p)=O(1)} for all other {p}, we conclude that

\displaystyle  \sum_p \log E''_p(s_1,s_2) \ll \log( 2+|s_1|+|s_2| )

and thus

\displaystyle  \prod_p E''_p(s_1,s_2) \ll (2 + |s_1| + |s_2| )^{O(1)}.

From this and the rapid decrease of {g,\Psi}, we may restrict the range of {t_1,t_2} even further to {|t_1|, |t_2| \leq \omega(q)} for any {\omega(q)} that goes to infinity arbitrarily slowly with {q}. For sufficiently slow {\omega}, the above estimates on {\log E''_p(s_1,s_2)} and Lemma 5 (now used to handle those {p} between {q^\varepsilon} and {x^{1/\varepsilon}} for some {\varepsilon} going sufficiently slowly to zero) give

\displaystyle  \sum_p \log E''_p(s_1,s_2) = o(1)

and so we are reduced to establishing that

\displaystyle  \int_{|t_1|, |t_2| \leq \omega(q)} (1+o(1)) \frac{(1+it_2) (1+it_1)}{(1+it_2 + \frac{1+it_1}{C})}\ g(t_1) \Psi(t_2) dt_1 dt_2 \gg 1 - o(1).

We may once again use the rapid decrease of {g,\Psi} to remove the {o(1)} prefactor as well as the restrictions {|t_1|, |t_2| \leq \omega(q)}, and reduce to showing that

\displaystyle  \int_{\bf R} \int_{\bf R} \frac{(1+it_2) (1+it_1)}{(1+it_2 + \frac{1+it_1}{C})}\ g(t_1) \Psi(t_2) dt_1 dt_2 \gg 1 - o(1).

For {C} large enough, it will suffice to show that

\displaystyle  \int_{\bf R} \int_{\bf R} (1+it_1)\ g(t_1) \Psi(t_2) dt_1 dt_2 \gg 1

with the implied constant independent of {C}. But the left-hand side evaluates to {-G'_<(0) \psi(0) = 4 \psi(0)}, and the claim follows.

Filed under: expository, math.NT Tagged: prime numbers, Roger Heath-Brown, Siegel zero, twin primes

David Hogg

dust structures, code audit

At Milky Way group meeting, Eddie Schlafly (MPIA) showed beautiful results (combining PanSTARRS, APOGEE, 2MASS, and WISE data) on the dust extinction law in the Milky Way. He can see that some of the nearby dust structures have anomalous RV values (dust extinction law shapes). Some of these are previously unknown features; they only appear when you have maps of density and RV at the same time. Maybe he gets to name these new structures!

Late in the day, Ness and I audited her code that infers red-giant masses from APOGEE spectra. We found some issues with sigmas and variances and inverse variances. It gets challenging! One consideration is that you don't ever want to have infinities, so you want to use inverse variances (which become zero when data are missing). But on the other hand, you want to avoid singular or near-singular matrices (which happen when you have lots of vanishing inverse variances). So we settled on a consistent large value for sigma (and a correspondingly small value for the inverse variance) that addresses both concerns for our problem.
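A minimal sketch of this kind of convention in plain Python follows. It is not the code from the audit: the function names and the value of `BIG_SIGMA` are assumptions chosen for illustration, and a sensible value depends on the units and dynamic range of the data.

```python
import math

BIG_SIGMA = 1.0e4  # hypothetical "missing data" sigma, in the same units as the data


def to_inverse_variances(sigmas, big_sigma=BIG_SIGMA):
    """Convert sigmas to inverse variances that are never zero and never infinite.

    Missing entries (None, NaN, inf, or non-positive sigma) get the small
    floor 1 / big_sigma**2; sigmas larger than big_sigma are clipped to it.
    """
    floor = 1.0 / big_sigma ** 2
    ivars = []
    for s in sigmas:
        missing = s is None or not math.isfinite(s) or s <= 0.0
        ivars.append(floor if missing else max(1.0 / s ** 2, floor))
    return ivars


def weighted_mean(values, ivars):
    """Inverse-variance weighted mean and its variance.

    Missing values should be replaced by any finite filler (e.g. 0.0);
    their tiny weight makes the filler essentially irrelevant.
    """
    total = sum(ivars)
    mean = sum(w * v for v, w in zip(values, ivars)) / total
    return mean, 1.0 / total


values = [1.0, 1.2, 0.0]   # third entry is a filler for a missing pixel
sigmas = [0.1, 0.1, None]
print(weighted_mean(values, to_inverse_variances(sigmas)))
```

Because every weight is strictly positive, sums of weights (and the matrices built from them) stay invertible even when every datum in some block is missing, while the missing data contribute essentially nothing to the answer.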

David Hogg

visualization, delta functions, interpretability

Late in the day, Rix, Ness, and I showed Ben Weiner (Arizona) the figures we have made for our paper on inferring red-giant masses and ages from APOGEE spectroscopy. He helped us think about changes we might make to the figures to bolster and clarify the arguments.

I spent some of the day manipulating delta functions and mixtures of delta functions for my attempt to infer the star-formation history of the Milky Way. I learned (for the Nth time) that it is better to manipulate Gaussians than delta functions; delta functions are way too freaky! And, once again, thinking about things dimensionally (that is, in terms of units) is extremely valuable.

In the morning, Rix and I wrote to Andy Casey (Cambridge) regarding a proposal he made to use The Cannon and things we know about weak (that is, linearly responding) spectral lines to create a more interpretable or physically motivated version of our data-driven model, and maybe get detailed chemical abundances. Some of his ideas overlap what we are doing with Yuan-Sen Ting (Harvard). Unfortunately, there is no real way to benefit enormously from the flexibility of the data-driven model without also losing interpretability. The problem is that the training set can have arbitrary issues within it, and these become part of the model; if you can understand the training set so well that you can rule out these issues, then you don't need a data-driven model!

Doug Natelson

Short term-ism and industrial research

I have written multiple times (here and here, for example) about my concern that the structure of financial incentives and corporate governance has basically killed much of the American corporate research enterprise.  Simply put:  corporate officers are very heavily rewarded based on very short-term metrics (stock price, year-over-year change in rate of growth of profit).  When faced with whether to invest company resources in risky long-term research that may not pay off for years if ever, most companies opt out of that investment.  Companies that do make long-term investments in research are generally quasi-monopolies.  The definition of "research" has increasingly crept toward what used to be called "development"; the definition of "long term" has edged toward "one year horizon for a product"; and physical sciences and engineering research has massively eroded in favor of much less expensive (in infrastructure, at least) work on software and algorithms.

I'm not alone in making these observations - Norm Augustine, former CEO of Lockheed Martin, basically says the same thing, for example.  Hillary Clinton has lately started talking about this issue.

Now, writing in The New Yorker this week, James Surowiecki claims that "short termism" is a myth.  Apparently companies love R&D and have been investing in it more heavily.  I think he's just incorrect, in part because I don't think he really appreciates the difference between research and development, and in part because I don't think he appreciates the sliding definitions of "research", "long term" and the difference between software development and physical sciences and engineering.  I'm not the only one who thinks his article has issues - see this article at Forbes.

No one disputes the long list of physical research enterprises that have been eliminated, gutted, strongly reduced, or refocused onto much shorter term projects.  A brief list includes IBM, Xerox, Bell Labs, Motorola, General Electric, Ford, General Motors, RCA, NEC, HP Labs, Seagate, 3M, Dupont, and others.  Even Microsoft has been cutting back.  No one disputes that corporate officers have often left these organizations with fat benefits packages after making long-term, irreversible reductions in research capacity (I'm looking at you, Carly Fiorina).   Perhaps "short termism" is too simple an explanation, but claiming that all is well in the world of industrial research just rings false.

Tommaso Dorigo

Thou Shalt Have One Higgs - $100 Bet Won!

One of the important things in life is to have a job you enjoy and which is a motivation for waking up in the morning. I can say I am lucky enough to be in that situation. Besides providing me with endless entertainment through the large dataset I enjoy analyzing, and the constant challenge to find new ways and ideas to extract more information from data, my job also gives me the opportunity to gamble - and win money, occasionally.


Scott Aaronson

Common Knowledge and Aumann’s Agreement Theorem

The following is the prepared version of a talk that I gave at SPARC: a high-school summer program about applied rationality held in Berkeley, CA for the past two weeks.  I had a wonderful time in Berkeley, meeting new friends and old, but I’m now leaving to visit the CQT in Singapore, and then to attend the AQIS conference in Seoul.

Common Knowledge and Aumann’s Agreement Theorem

August 14, 2015

Thank you so much for inviting me here!  I honestly don’t know whether it’s possible to teach applied rationality, the way this camp is trying to do.  What I know is that, if it is possible, then the people running SPARC are some of the awesomest people on earth to figure out how.  I’m incredibly proud that Chelsea Voss and Paul Christiano are both former students of mine, and I’m amazed by the program they and the others have put together here.  I hope you’re all having fun—or maximizing your utility functions, or whatever.

My research is mostly about quantum computing, and more broadly, computation and physics.  But I was asked to talk about something you can actually use in your lives, so I want to tell a different story, involving common knowledge.

I’ll start with the “Muddy Children Puzzle,” which is one of the greatest logic puzzles ever invented.  How many of you have seen this one?

OK, so the way it goes is, there are a hundred children playing in the mud.  Naturally, they all have muddy foreheads.  At some point their teacher comes along and says to them, as they all sit around in a circle: “stand up if you know your forehead is muddy.”  No one stands up.  For how could they know?  Each kid can see all the other 99 kids’ foreheads, so knows that they’re muddy, but can’t see his or her own forehead.  (We’ll assume that there are no mirrors or camera phones nearby, and also that this is mud that you don’t feel when it’s on your forehead.)

So the teacher tries again.  “Knowing that no one stood up the last time, now stand up if you know your forehead is muddy.”  Still no one stands up.  Why would they?  No matter how many times the teacher repeats the request, still no one stands up.

Then the teacher tries something new.  “Look, I hereby announce that at least one of you has a muddy forehead.”  After that announcement, the teacher again says, “stand up if you know your forehead is muddy”—and again no one stands up.  And again and again; it continues 99 times.  But then the hundredth time, all the children suddenly stand up.

(There’s a variant of the puzzle involving blue-eyed islanders who all suddenly commit suicide on the hundredth day, when they all learn that their eyes are blue—but as a blue-eyed person myself, that’s always struck me as needlessly macabre.)

What’s going on here?  Somehow, the teacher’s announcing to the children that at least one of them had a muddy forehead set something dramatic in motion, which would eventually make them all stand up—but how could that announcement possibly have made any difference?  After all, each child already knew that at least 99 children had muddy foreheads!

Like with many puzzles, the way to get intuition is to change the numbers.  So suppose there were two children with muddy foreheads, and the teacher announced to them that at least one had a muddy forehead, and then asked both of them whether their own forehead was muddy.  Neither would know.  But each child could reason as follows: “if my forehead weren’t muddy, then the other child would’ve seen that, and would also have known that at least one of us has a muddy forehead.  Therefore she would’ve known, when asked, that her own forehead was muddy.  Since she didn’t know, that means my forehead is muddy.”  So then both children know their foreheads are muddy, when the teacher asks a second time.

Now, this argument can be generalized to any (finite) number of children.  The crucial concept here is common knowledge.  We call a fact “common knowledge” if, not only does everyone know it, but everyone knows everyone knows it, and everyone knows everyone knows everyone knows it, and so on.  It’s true that in the beginning, each child knew that all the other children had muddy foreheads, but it wasn’t common knowledge that even one of them had a muddy forehead.  For example, if your forehead and mine are both muddy, then I know that at least one of us has a muddy forehead, and you know that too, but you don’t know that I know it (for what if your forehead were clean?), and I don’t know that you know it (for what if my forehead were clean?).

What the teacher’s announcement did, was to make it common knowledge that at least one child has a muddy forehead (since not only did everyone hear the announcement, but everyone witnessed everyone else hearing it, etc.).  And once you understand that point, it’s easy to argue by induction: after the teacher asks and no child stands up (and everyone sees that no one stood up), it becomes common knowledge that at least two children have muddy foreheads (since if only one child had had a muddy forehead, that child would’ve known it and stood up).  Next it becomes common knowledge that at least three children have muddy foreheads, and so on, until after a hundred rounds it’s common knowledge that everyone’s forehead is muddy, so everyone stands up.
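Here's a brute-force sketch in Python for watching the induction run. It's a toy model under stated assumptions, not anything canonical: worlds are tuples of 0s and 1s (1 means muddy), the teacher's announcement publicly deletes the all-clean world, and each round in which nobody stands publicly deletes every world in which somebody would have stood. The function name and its parameters are made up for illustration, and since it enumerates all 2^n worlds, you'll want far fewer than a hundred children.

```python
from itertools import product


def rounds_until_stand(n_children, actual, announce=True, max_rounds=20):
    """Return (round, who_stands) for the first round anyone stands, or None."""
    worlds = set(product([0, 1], repeat=n_children))
    if announce:
        worlds.discard((0,) * n_children)  # "at least one of you is muddy"
    if actual not in worlds:
        raise ValueError("the actual world contradicts the announcement")

    def knows_muddy(i, w, live):
        # Child i sees every forehead but their own, so they consider both
        # settings of their own forehead, restricted to publicly live worlds.
        candidates = [w[:i] + (b,) + w[i + 1:] for b in (0, 1)]
        possible = [c for c in candidates if c in live]
        return all(c[i] == 1 for c in possible)

    for rnd in range(1, max_rounds + 1):
        standing = [i for i in range(n_children) if knows_muddy(i, actual, worlds)]
        if standing:
            return rnd, standing
        # Nobody stood, and everyone saw that: drop every world in which
        # at least one child would have known and stood up.
        worlds = {w for w in worlds
                  if not any(knows_muddy(i, w, worlds) for i in range(n_children))}
    return None


print(rounds_until_stand(3, (1, 1, 1)))                  # with the announcement
print(rounds_until_stand(3, (1, 1, 1), announce=False))  # without it
```

With three muddy children and the announcement, everyone stands on round 3. Without the announcement, the set of live worlds never shrinks, nobody ever stands, and the function gives up and returns None.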

The moral is that the mere act of saying something publicly can change the world—even if everything you said was already obvious to every last one of your listeners.  For it’s possible that, until your announcement, not everyone knew that everyone knew the thing, or knew everyone knew everyone knew it, etc., and that could have prevented them from acting.

This idea turns out to have huge real-life consequences, to situations way beyond children with muddy foreheads.  I mean, it also applies to children with dots on their foreheads, or “kick me” signs on their backs…

But seriously, let me give you an example I stole from Steven Pinker, from his wonderful book The Stuff of Thought.  Two people of indeterminate gender—let’s not make any assumptions here—go on a date.  Afterward, one of them says to the other: “Would you like to come up to my apartment to see my etchings?”  The other says, “Sure, I’d love to see them.”

This is such a cliché that we might not even notice the deep paradox here.  It’s like with life itself: people knew for thousands of years that every bird has the right kind of beak for its environment, but not until Darwin and Wallace could anyone articulate why (and only a few people before them even recognized there was a question there that called for a non-circular answer).

In our case, the puzzle is this: both people on the date know perfectly well that the reason they’re going up to the apartment has nothing to do with etchings.  They probably even both know the other knows that.  But if that’s the case, then why don’t they just blurt it out: “would you like to come up for some intercourse?”  (Or “fluid transfer,” as the John Nash character put it in the Beautiful Mind movie?)

So here’s Pinker’s answer.  Yes, both people know why they’re going to the apartment, but they also want to avoid their knowledge becoming common knowledge.  They want plausible deniability.  There are several possible reasons: to preserve the romantic fantasy of being “swept off one’s feet.”  To provide a face-saving way to back out later, should one of them change their mind: since nothing was ever openly said, there’s no agreement to abrogate.  In fact, even if only one of the people (say A) might care about such things, if the other person (say B) thinks there’s any chance A cares, B will also have an interest in avoiding common knowledge, for A’s sake.

Put differently, the issue is that, as soon as you say X out loud, the other person doesn’t merely learn X: they learn that you know X, that you know that they know that you know X, that you want them to know you know X, and an infinity of other things that might upset the delicate epistemic balance.  Contrast that with the situation where X is left unstated: yeah, both people are pretty sure that “etchings” are just a pretext, and can even plausibly guess that the other person knows they’re pretty sure about it.  But once you start getting to 3, 4, 5, levels of indirection—who knows?  Maybe you do just want to show me some etchings.

Philosophers like to discuss Sherlock Holmes and Professor Moriarty meeting in a train station, and Moriarty declaring, “I knew you’d be here,” and Holmes replying, “well, I knew that you knew I’d be here,” and Moriarty saying, “I knew you knew I knew you’d be here,” etc.  But real humans tend to be unable to reason reliably past three or four levels in the knowledge hierarchy.  (Related to that, you might have heard of the game where everyone guesses a number between 0 and 100, and the winner is whoever’s number is the closest to 2/3 of the average of all the numbers.  If this game is played by perfectly rational people, who know they’re all perfectly rational, and know they know, etc., then they must all guess 0—exercise for you to see why.  Yet experiments show that, if you actually want to win this game against average people, you should guess about 20.  People seem to start with 50 or so, iterate the operation of multiplying by 2/3 a few times, and then stop.)
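To put numbers on the "start at 50 and multiply by 2/3 a few times" story, here's a tiny Python sketch. The starting guess of 50 is taken from the description above; the function name and the idea of indexing guesses by a reasoning depth k are illustrative assumptions, not a model fitted to any experiment.

```python
def level_k_guess(k: int, start: float = 50.0, factor: float = 2 / 3) -> float:
    """Guess of a player who best-responds k times to a naive guess of `start`."""
    return start * factor ** k


for k in range(6):
    print(k, round(level_k_guess(k), 1))
```

Two or three rounds of "but everyone else will think of that too" take you from 50 down to roughly 22 and then 15, which brackets the empirical advice to guess about 20. Only in the limit of unboundedly many rounds does the guess shrink to 0.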

Incidentally, do you know what I would’ve given for someone to have explained this stuff to me back in high school?  I think that a large fraction of the infamous social difficulties that nerds have, is simply down to nerds spending so much time in domains (like math and science) where the point is to struggle with every last neuron to make everything common knowledge, to make all truths as clear and explicit as possible.  Whereas in social contexts, very often you’re managing a delicate epistemic balance where you need certain things to be known, but not known to be known, and so forth—where you need to prevent common knowledge from arising, at least temporarily.  “Normal” people have an intuitive feel for this; it doesn’t need to be explained to them.  For nerds, by contrast, explaining it—in terms of the muddy children puzzle and so forth—might be exactly what’s needed.  Once they’re told the rules of a game, nerds can try playing it too!  They might even turn out to be good at it.

OK, now for a darker example of common knowledge in action.  If you read accounts of Nazi Germany, or the USSR, or North Korea or other despotic regimes today, you can easily be overwhelmed by this sense of, “so why didn’t all the sane people just rise up and overthrow the totalitarian monsters?  Surely there were more sane people than crazy, evil ones.  And probably the sane people even knew, from experience, that many of their neighbors were sane—so why this cowardice?”  Once again, it could be argued that common knowledge is the key.  Even if everyone knows the emperor is naked; indeed, even if everyone knows everyone knows he’s naked, still, if it’s not common knowledge, then anyone who says the emperor’s naked is knowingly assuming a massive personal risk.  That’s why, in the story, it took a child to shift the equilibrium.  Likewise, even if you know that 90% of the populace will join your democratic revolt provided they themselves know 90% will join it, if you can’t make your revolt’s popularity common knowledge, everyone will be stuck second-guessing each other, worried that if they revolt they’ll be an easily-crushed minority.  And because of that very worry, they’ll be correct!

(My favorite Soviet joke involves a man standing in the Moscow train station, handing out leaflets to everyone who passes by.  Eventually, of course, the KGB arrests him—but they discover to their surprise that the leaflets are just blank pieces of paper.  “What’s the meaning of this?” they demand.  “What is there to write?” replies the man.  “It’s so obvious!”  Note that this is precisely a situation where the man is trying to make common knowledge something he assumes his “readers” already know.)

The kicker is that, to prevent something from becoming common knowledge, all you need to do is censor the common-knowledge-producing mechanisms: the press, the Internet, public meetings.  This nicely explains why despots throughout history have been so obsessed with controlling the press, and also explains how it’s possible for 10% of a population to murder and enslave the other 90% (as has happened again and again in our species’ sorry history), even though the 90% could easily overwhelm the 10% by acting in concert.  Finally, it explains why believers in the Enlightenment project tend to be such fanatical absolutists about free speech—why they refuse to “balance” it against cultural sensitivity or social harmony or any other value, as so many well-meaning people urge these days.

OK, but let me try to tell you something surprising about common knowledge.  Here at SPARC, you’ve learned all about Bayes’ rule—how, if you like, you can treat “probabilities” as just made-up numbers in your head, which are required to obey the probability calculus, and then there’s a very definite rule for how to update those numbers when you gain new information.  And indeed, how an agent that wanders around constantly updating these numbers in its head, and taking whichever action maximizes its expected utility (as calculated using the numbers), is probably the leading modern conception of what it means to be “rational.”

Now imagine that you’ve got two agents, call them Alice and Bob, with common knowledge of each other’s honesty and rationality, and with the same prior probability distribution over some set of possible states of the world.  But now imagine they go out and live their lives, and have totally different experiences that lead to their learning different things, and having different posterior distributions.  But then they meet again, and they realize that their opinions about some topic (say, Hillary’s chances of winning the election) are common knowledge: they both know each other’s opinion, and they both know that they both know, and so on.  Then a striking 1976 result called Aumann’s Theorem states that their opinions must be equal.  Or, as it’s summarized: “rational agents with common priors can never agree to disagree about anything.”

Actually, before going further, let’s prove Aumann’s Theorem—since it’s one of those things that sounds like a mistake when you first hear it, and then becomes a triviality once you see the 3-line proof.  (Albeit, a “triviality” that won Aumann a Nobel in economics.)  The key idea is that knowledge induces a partition on the set of possible states of the world.  Huh?  OK, imagine someone is either an old man, an old woman, a young man, or a young woman.  You and I agree in giving each of these a 25% prior probability.  Now imagine that you find out whether they’re a man or a woman, and I find out whether they’re young or old.  This can be illustrated as follows:


The diagram tells us, for example, that if the ground truth is “old woman,” then your knowledge is described by the set {old woman, young woman}, while my knowledge is described by the set {old woman, old man}.  And this different information leads us to different beliefs: for example, if someone asks for the probability that the person is a woman, you’ll say 100% but I’ll say 50%.  OK, but what does it mean for information to be common knowledge?  It means that I know that you know that I know that you know, and so on.  Which means that, if you want to find out what’s common knowledge between us, you need to take the least common coarsening of our knowledge partitions.  I.e., if the ground truth is some given world w, then what do I consider it possible that you consider it possible that I consider possible that … etc.?  Iterate this growth process until it stops, by “zigzagging” between our knowledge partitions, and you get the set S of worlds such that, if we’re in world w, then what’s common knowledge between us is that the world belongs to S.  Repeat for all w’s, and you get the least common coarsening of our partitions.  In the above example, the least common coarsening is trivial, with all four worlds ending up in the same set S, but there are nontrivial examples as well:


Now, if Alice’s expectation of a random variable X is common knowledge between her and Bob, that means that everywhere in S, her expectation must be constant … and hence must equal whatever the expectation is, over all the worlds in S!  Likewise, if Bob’s expectation is common knowledge with Alice, then everywhere in S, it must equal the expectation of X over S.  But that means that Alice’s and Bob’s expectations are the same.
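The "zigzagging" construction and the punchline of the proof can both be checked on small examples. Here's a minimal Python sketch, assuming a uniform prior over worlds: the helper names, the six-world example, and the choice of event are all hypothetical, picked only for illustration. The first function computes the least common coarsening by repeatedly merging cells that overlap; the second computes a posterior probability given a knowledge set.

```python
from fractions import Fraction


def common_coarsening(partition_a, partition_b):
    """Finest partition that both inputs refine (merge all overlapping cells)."""
    cells = [set(c) for c in partition_a]
    for block in partition_b:
        block = set(block)
        touching = [c for c in cells if c & block]
        merged = set().union(block, *touching)
        cells = [c for c in cells if not (c & block)] + [merged]
    return sorted(sorted(c) for c in cells)


def posterior(event, cell):
    """P(event | cell) under a uniform prior over worlds."""
    cell = set(cell)
    return Fraction(len(set(event) & cell), len(cell))


# The example from the talk: you learn the sex, I learn the age.
you = [{"old woman", "young woman"}, {"old man", "young man"}]
me = [{"old woman", "old man"}, {"young woman", "young man"}]
print(common_coarsening(you, me))     # one cell containing all four worlds

# A hypothetical nontrivial example on worlds 1..6.
alice = [{1, 2}, {3, 4}, {5, 6}]
bob = [{1}, {2, 3}, {4}, {5, 6}]
print(common_coarsening(alice, bob))  # two cells: {1, 2, 3, 4} and {5, 6}

# Aumann in miniature: the event "old woman or young man".
event = {"old woman", "young man"}
print([posterior(event, c) for c in you], [posterior(event, c) for c in me])
```

In the last example each of us assigns the event probability 1/2 no matter which cell we're in, so our opinions are common knowledge, and sure enough they're equal (and equal to the probability over the whole set S). Notice that pooling our raw information would settle the question completely, yet common knowledge of the opinions alone already forces agreement.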

There are lots of related results.  For example, rational agents with common priors, and common knowledge of each other’s rationality, should never engage in speculative trade (e.g., buying and selling stocks, assuming that they don’t need cash, they’re not earning a commission, etc.).  Why?  Basically because, if I try to sell you a stock for (say) $50, then you should reason that the very fact that I’m offering it means I must have information you don’t that it’s worth less than $50, so then you update accordingly and you don’t want it either.

Or here’s another one: suppose again that we’re Bayesians with common priors, and we’re having a conversation, where I tell you my opinion (say, of the probability Hillary will win the election).  Not any of the reasons or evidence on which the opinion is based—just the opinion itself.  Then you, being Bayesian, update your probabilities to account for what my opinion is.  Then you tell me your opinion (which might have changed after learning mine), I update on that, I tell you my new opinion, then you tell me your new opinion, and so on.  You might think this could go on forever!  But, no, Geanakoplos and Polemarchakis observed that, as long as there are only finitely many possible states of the world in our shared prior, this process must converge after finitely many steps with you and me having the same opinion (and moreover, with it being common knowledge that we have that opinion).  Why?  Because as long as our opinions differ, your telling me your opinion or me telling you mine must induce a nontrivial refinement of one of our knowledge partitions, like so:


I.e., if you learn something new, then at least one of your knowledge sets must get split along the different possible values of the thing you learned.  But since there are only finitely many underlying states, there can only be finitely many such splittings (note that, since Bayesians never forget anything, knowledge sets that are split will never again rejoin).
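The back-and-forth can be simulated directly. Here's a minimal Python sketch under stated assumptions: a uniform prior over nine worlds, two partitions and an event invented for illustration, and agents who announce only their current posterior. The function names are hypothetical. Each announcement lets the listener split their knowledge sets according to what the speaker would have said in each world, and the loop stops once a full exchange produces no further splitting.

```python
from fractions import Fraction


def posterior(event, cell):
    """P(event | cell) under a uniform prior over worlds."""
    return Fraction(len(event & cell), len(cell))


def cell_of(partition, world):
    """The knowledge set in `partition` that contains `world`."""
    return next(c for c in partition if world in c)


def refine(listener, speaker, event):
    """Split each listener cell by the posterior the speaker would announce."""
    refined = []
    for cell in listener:
        by_value = {}
        for w in cell:
            q = posterior(event, cell_of(speaker, w))
            by_value.setdefault(q, set()).add(w)
        refined.extend(by_value.values())
    return refined


def dialogue(part_a, part_b, event, true_world, max_rounds=50):
    """Alternate announcements (A first); return the list of announced posteriors."""
    parts = [[set(c) for c in part_a], [set(c) for c in part_b]]
    transcript, quiet = [], 0
    for rnd in range(max_rounds):
        speaker, listener = rnd % 2, 1 - rnd % 2
        transcript.append(posterior(event, cell_of(parts[speaker], true_world)))
        new_cells = refine(parts[listener], parts[speaker], event)
        quiet = quiet + 1 if len(new_cells) == len(parts[listener]) else 0
        parts[listener] = new_cells
        if quiet == 2:  # neither side learned anything from the last exchange
            break
    return transcript


# Toy example: worlds 1..9, event {3, 4}, true world 1.
A = [{1, 2, 3}, {4, 5, 6}, {7, 8, 9}]
B = [{1, 2, 3, 4}, {5, 6, 7, 8}, {9}]
print(dialogue(A, B, {3, 4}, true_world=1))
```

In this toy example the announcements go 1/3, 1/2, 1/3, 1/3, 1/3: the two agents genuinely disagree at first, each announcement splits somebody's knowledge sets, and after finitely many splits they settle on the same number.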

And something else: suppose your friend tells you a liberal opinion, then you take it into account, but reply with a more conservative opinion.  The friend takes your opinion into account, and replies with a revised opinion.  Question: is your friend’s new opinion likelier to be more liberal than yours, or more conservative?

Obviously, more liberal!  Yes, maybe your friend now sees some of your points and vice versa, maybe you’ve now drawn a bit closer (ideally!), but you’re not going to suddenly switch sides because of one conversation.

Yet, if you and your friend are Bayesians with common priors, one can prove that that’s not what should happen at all.  Indeed, your expectation of your own future opinion should equal your current opinion, and your expectation of your friend’s next opinion should also equal your current opinion—meaning that you shouldn’t be able to predict in which direction your opinion will change next, nor in which direction your friend will next disagree with you.  Why not?  Formally, because all these expectations are just different ways of calculating an expectation over the same set, namely your current knowledge set (i.e., the set of states of the world that you currently consider possible)!  More intuitively, we could say: if you could predict that, all else equal, the next thing you heard would probably shift your opinion in a liberal direction, then as a Bayesian you should already shift your opinion in a liberal direction right now.  (This is related to what’s called the “martingale property”: sure, a random variable X could evolve in many ways in the future, but the average of all those ways must be its current expectation E[X], by the very definition of E[X]…)
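For anyone who wants the one-line calculation behind this, here is a sketch. The notation is introduced here for illustration and is not from the talk: K is your current knowledge set, E is the event you are estimating, and K_1, ..., K_m are the knowledge sets you might end up in after the next message, which together partition K.

```latex
% K: current knowledge set; E: event being estimated (assumed notation).
% K_1, ..., K_m: possible knowledge sets after the next message, a partition of K.
\begin{align*}
\mathbb{E}\bigl[\Pr[E \mid K_{\text{next}}] \,\big|\, K\bigr]
  &= \sum_{i=1}^{m} \Pr[K_i \mid K]\,\Pr[E \mid K_i] \\
  &= \sum_{i=1}^{m} \frac{\Pr[K_i]}{\Pr[K]}\cdot\frac{\Pr[E \cap K_i]}{\Pr[K_i]} \\
  &= \frac{\Pr[E \cap K]}{\Pr[K]} \;=\; \Pr[E \mid K].
\end{align*}
```

So the average of your possible next opinions, weighted by how likely each is given what you know now, is exactly your current opinion.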

So, putting all these results together, we get a clear picture of what rational disagreements should look like: they should follow unbiased random walks, until sooner or later they terminate in common knowledge of complete agreement.  We now face a bit of a puzzle, in that hardly any disagreements in the history of the world have ever looked like that.  So what gives?

There are a few ways out:

(1) You could say that the “failed prediction” of Aumann’s Theorem is no surprise, since virtually all human beings are irrational cretins, or liars (or at least, it’s not common knowledge that they aren’t). Except for you, of course: you’re perfectly rational and honest.  And if you ever met anyone else as rational and honest as you, maybe you and they could have an Aumannian conversation.  But since such a person probably doesn’t exist, you’re totally justified to stand your ground, discount all opinions that differ from yours, etc.  Notice that, even if you genuinely believed that was all there was to it, Aumann’s Theorem would still have an aspirational significance for you: you would still have to say this is the ideal that all rationalists should strive toward when they disagree.  And that would already conflict with a lot of standard rationalist wisdom.  For example, we all know that arguments from authority carry little weight: what should sway you is not the mere fact of some other person stating their opinion, but the actual arguments and evidence that they’re able to bring.  Except that as we’ve seen, for Bayesians with common priors this isn’t true at all!  Instead, merely hearing your friend’s opinion serves as a powerful summary of what your friend knows.  And if you learn that your rational friend disagrees with you, then even without knowing why, you should take that as seriously as if you discovered a contradiction in your own thought processes.  This is related to an even broader point: there’s a normative rule of rationality that you should judge ideas only on their merits—yet if you’re a Bayesian, of course you’re going to take into account where the ideas come from, and how many other people hold them!  
Likewise, if you’re a Bayesian police officer or a Bayesian airport screener or a Bayesian job interviewer, of course you’re going to profile people by their superficial characteristics, however unfair that might be to individuals—so all those studies proving that people evaluate the same resume differently if you change the name at the top are no great surprise.  It seems to me that the tension between these two different views of rationality, the normative and the Bayesian, generates a lot of the most intractable debates of the modern world.

(2) Or—and this is an obvious one—you could reject the assumption of common priors. After all, isn’t a major selling point of Bayesianism supposed to be its subjective aspect, the fact that you pick “whichever prior feels right for you,” and are constrained only in how to update that prior?  If Alice’s and Bob’s priors can be different, then all the reasoning I went through earlier collapses.  So rejecting common priors might seem appealing.  But there’s a paper by Tyler Cowen and Robin Hanson called “Are Disagreements Honest?”—one of the most worldview-destabilizing papers I’ve ever read—that calls that strategy into question.  What it says, basically, is this: if you’re really a thoroughgoing Bayesian rationalist, then your prior ought to allow for the possibility that you are the other person.  Or to put it another way: “you being born as you,” rather than as someone else, should be treated as just one more contingent fact that you observe and then conditionalize on!  And likewise, the other person should condition on the observation that they’re them and not you.  In this way, absolutely everything that makes you different from someone else can be understood as “differing information,” so we’re right back to the situation covered by Aumann’s Theorem.  Imagine, if you like, that we all started out behind some Rawlsian veil of ignorance, as pure reasoning minds that had yet to be assigned specific bodies.  In that original state, there was nothing to differentiate any of us from any other—anything that did would just be information to condition on—so we all should’ve had the same prior.  That might sound fanciful, but in some sense all it’s saying is: what licenses you to privilege an observation just because it’s your eyes that made it, or a thought just because it happened to occur in your head?  
Like, if you’re objectively smarter or more observant than everyone else around you, fine, but to whatever extent you agree that you aren’t, your opinion gets no special epistemic protection just because it’s yours.

(3) If you’re uncomfortable with this tendency of Bayesian reasoning to refuse to be confined anywhere, to want to expand to cosmic or metaphysical scope (“I need to condition on having been born as me and not someone else”)—well then, you could reject the entire framework of Bayesianism, as your notion of rationality. Lest I be cast out from this camp as a heretic, I hasten to say: I include this option only for the sake of completeness!

(4) When I first learned about this stuff 12 years ago, it seemed obvious to me that a lot of it could be dismissed as irrelevant to the real world for reasons of complexity. I.e., sure, it might apply to ideal reasoners with unlimited time and computational power, but as soon as you impose realistic constraints, this whole Aumannian house of cards should collapse.  As an example, if Alice and Bob have common priors, then sure they’ll agree about everything if they effectively share all their information with each other!  But in practice, we don’t have time to “mind-meld,” swapping our entire life experiences with anyone we meet.  So one could conjecture that agreement, in general, requires a lot of communication.  So then I sat down and tried to prove that as a theorem.  And you know what I found?  That my intuition here wasn’t even close to correct!

In more detail, I proved the following theorem.  Suppose Alice and Bob are Bayesians with shared priors, and suppose they’re arguing about (say) the probability of some future event—or more generally, about any random variable X bounded in [0,1].  So, they have a conversation where Alice first announces her expectation of X, then Bob announces his new expectation, and so on.  The theorem says that Alice’s and Bob’s estimates of X will necessarily agree to within ±ε, with probability at least 1-δ over their shared prior, after they’ve exchanged only O(1/(δε²)) messages.  Note that this bound is completely independent of how much knowledge they have; it depends only on the accuracy with which they want to agree!  Furthermore, the same bound holds even if Alice and Bob only send a few discrete bits about their real-valued expectations with each message, rather than the expectations themselves.

The proof involves the idea that Alice and Bob’s estimates of X, call them X_A and X_B respectively, follow “unbiased random walks” (or more formally, are martingales).  Very roughly, if |X_A-X_B|≥ε with high probability over Alice and Bob’s shared prior, then that fact implies that the next message has a high probability (again, over the shared prior) of causing either X_A or X_B to jump up or down by about ε.  But X_A and X_B, being estimates of X, are bounded between 0 and 1.  So a random walk with a step size of ε can only continue for about 1/ε² steps before it hits one of the “absorbing barriers.”

The way to formalize this is to look at the variances, Var[X_A] and Var[X_B], with respect to the shared prior.  Because Alice and Bob’s partitions keep getting refined, the variances are monotonically non-decreasing.  They start out 0 and can never exceed 1 (in fact they can never exceed 1/4, but let’s not worry about constants).  Now, the key lemma is that, if Pr[|X_A-X_B|≥ε]≥δ, then Var[X_B] must increase by at least δε² if Alice sends X_A to Bob, and Var[X_A] must increase by at least δε² if Bob sends X_B to Alice.  You can see my paper for the proof, or just work it out for yourself.  At any rate, the lemma implies that, after O(1/(δε²)) rounds of communication, there must be at least a temporary break in the disagreement; there must be some round where Alice and Bob approximately agree with high probability.
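Spelled out, the counting step is short. This is a sketch that takes the key lemma as given, writes X_A and X_B for Alice's and Bob's estimates, and uses the loose bound of 1 on each variance. The symbol T is introduced here for illustration: it is the number of messages sent in a row while the disagreement condition Pr[|X_A - X_B| ≥ ε] ≥ δ still holds.

```latex
% T: number of consecutive messages sent while
%    \Pr[|X_A - X_B| \ge \varepsilon] \ge \delta   (T is assumed notation).
% By the key lemma, each such message raises one of the two variances by at
% least \delta\varepsilon^2; by monotonicity, neither variance ever decreases.
\begin{align*}
T\,\delta\varepsilon^{2}
  \;\le\; \mathrm{Var}[X_A] + \mathrm{Var}[X_B]
  \;\le\; 2
\quad\Longrightarrow\quad
T \;\le\; \frac{2}{\delta\varepsilon^{2}}
  \;=\; O\!\left(\frac{1}{\delta\varepsilon^{2}}\right).
\end{align*}
```

Using the sharper bound of 1/4 on each variance only improves the constant.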

There are lots of other results in my paper, including an upper bound on the number of calls that Alice and Bob need to make to a “sampling oracle” to carry out this sort of protocol approximately, assuming they’re not perfect Bayesians but agents with bounded computational power.  But let me step back and address the broader question: what should we make of all this?  How should we live with the gargantuan chasm between the prediction of Bayesian rationality for how we should disagree, and the actual facts of how we do disagree?

We could simply declare that human beings are not well-modeled as Bayesians with common priors—that we’ve failed in giving a descriptive account of human behavior—and leave it at that.   OK, but that would still leave the question: does this stuff have normative value?  Should it affect how we behave, if we want to consider ourselves honest and rational?  I would argue, possibly yes.

Yes, you should constantly ask yourself the question: “would I still be defending this opinion, if I had been born as someone else?”  (Though you might say this insight predates Aumann by quite a bit, going back at least to Spinoza.)

Yes, if someone you respect as honest and rational disagrees with you, you should take it as seriously as if the disagreement were between two different aspects of yourself.

Finally, yes, we can try to judge epistemic communities by how closely they approach the Aumannian ideal.  In math and science, in my experience, it’s common to see two people furiously arguing with each other at a blackboard.  Come back five minutes later, and they’re arguing even more furiously, but now their positions have switched.  As we’ve seen, that’s precisely what the math says a rational conversation should look like.  In social and political discussions, though, usually the very best you’ll see is that two people start out diametrically opposed, but eventually one of them says “fine, I’ll grant you this,” and the other says “fine, I’ll grant you that.”  We might say, that’s certainly better than the common alternative, of the two people walking away even more polarized than before!  Yet the math tells us that even the first case—even the two people gradually getting closer in their views—is nothing at all like a rational exchange, which would involve the two participants repeatedly leapfrogging each other, completely changing their opinion about the question under discussion (and then changing back, and back again) every time they learned something new.  The first case, you might say, is more like haggling—more like “I’ll grant you that X is true if you grant me that Y is true”—than like our ideal friendly mathematicians arguing at the blackboard, whose acceptance of new truths is never slow or grudging, never conditional on the other person first agreeing with them about something else.

Armed with this understanding, we could try to rank fields by how hard it is to have an Aumannian conversation in them.  At the bottom—the easiest!—is math (or, let’s say, chess, or debugging a program, or fact-heavy fields like lexicography or geography).  Crucially, here I only mean the parts of these subjects with agreed-on rules and definite answers: once the conversation turns to whose theorems are deeper, or whose fault the bug was, things can get arbitrarily non-Aumannian.  Then there’s the type of science that involves messy correlational studies (I just mean, talking about what’s a risk factor for what, not the political implications).  Then there’s politics and aesthetics, with the most radioactive topics like Israel/Palestine higher up.  And then, at the very peak, there’s gender and social justice debates, where everyone brings their formative experiences along, and absolutely no one is a disinterested truth-seeker, and possibly no Aumannian conversation has ever been had in the history of the world.

I would urge that even at the very top, it’s still incumbent on all of us to try to make the Aumannian move, of “what would I think about this issue if I were someone else and not me?  If I were a man, a woman, black, white, gay, straight, a nerd, a jock?  How much of my thinking about this represents pure Spinozist reason, which could be ported to any rational mind, and how much of it would get lost in translation?”

Anyway, I’m sure some people would argue that, in the end, the whole framework of Bayesian agents, common priors, common knowledge, etc. can be chucked from this discussion like so much scaffolding, and the moral lessons I want to draw boil down to trite advice (“try to see the other person’s point of view”) that you all knew already.  Then again, even if you all knew all this, maybe you didn’t know that you all knew it!  So I hope you gained some new information from this talk in any case.  Thanks.

Update: Coincidentally, there’s a moving NYT piece by Oliver Sacks, which (among other things) recounts his experiences with his cousin, the Aumann of Aumann’s theorem.

Another Update: If I ever did attempt an Aumannian conversation with someone, the other Scott A. would be a candidate! Here he is in 2011 making several of the same points I did above, using the same examples (I thank him for pointing me to his post).

August 26, 2015

Scott AaronsonD-Wave Open Thread

A bunch of people have asked me to comment on D-Wave’s release of its 1000-qubit processor, and a paper by a group including Cathy McGeoch saying that the machine is 1 or 2 orders of magnitude faster (in annealing time, not wall-clock time) than simulated annealing running on a single-core classical computer.  It’s even been suggested that the “Scott-signal” has been shining brightly for a week above Quantham City, but that Scott-man has been too lazy and out-of-shape even to change into his tights.

Scientifically, it’s not clear if much has changed.  D-Wave now has a chip with twice as many qubits as the last one.  That chip continues to be pretty effective at finding its own low-energy states: indeed, depending on various details of definition, the machine can even find its own low-energy states “faster” than some implementation of simulated annealing running on a single-core chip.  Of course, it’s entirely possible that Matthias Troyer or Sergei Isakov or Troels Ronnow or someone like that will be able to find a better implementation of simulated annealing that closes even the modest gap—as happened the last time—but I’ll give the authors the benefit of the doubt that they put good-faith effort into optimizing the classical code.

More importantly, I’d say it remains unclear whether any of the machine’s performance on the instances tested here can be attributed to quantum tunneling effects.  In fact, the paper explicitly states (see page 3) that it’s not going to consider such questions, and I think the authors would agree that you could very well see results like theirs, even if what was going on was fundamentally classical annealing.  Also, of course, it’s still true that, if you wanted to solve a practical optimization problem, you’d first need to encode it into the Chimera graph, and that reduction entails a loss that could hand a decisive advantage to simulated annealing, even without the need to go to multiple cores.  (This is what I’ve described elsewhere as essentially all of these performance comparisons taking place on “the D-Wave machine’s home turf”: that is, on binary constraint satisfaction problems that have precisely the topology of D-Wave’s Chimera graph.)

But, I dunno, I’m just not feeling the urge to analyze this in more detail.  Part of the reason is that I think the press might be getting less hyper-excitable these days, thereby reducing the need for a Chief D-Wave Skeptic.  By this point, there may have been enough D-Wave announcements that papers realize they no longer need to cover each one like an extraterrestrial landing.  And there are more hats in the ring now, with John Martinis at Google seeking to build superconducting quantum annealing machines but with ~10,000x longer coherence times than D-Wave’s, and with IBM Research and some others also trying to scale superconducting QC.  The realization has set in, I think, that both D-Wave and the others are in this for the long haul, with D-Wave currently having lots of qubits, but with very short coherence times and unclear prospects for any quantum speedup, and Martinis and some others having qubits of far higher quality, but not yet able to couple enough of them.

The other issue is that, on my flight from Seoul back to Newark, I watched two recent kids’ movies that were almost defiant in their simple, unironic, 1950s-style messages of hope and optimism.  One was Disney’s new live-action Cinderella; the other was Brad Bird’s Tomorrowland.  And seeing these back-to-back filled me with such positivity and good will that, at least for these few hours, it’s hard to summon my usual crusty self.  I say, let’s invent the future together, and build flying cars and jetpacks in our garages!  Let a thousand creative ideas bloom for how to tackle climate change and the other crises facing civilization!  (Admittedly, mass-market flying cars and jetpacks are probably not a step forward on climate change … but, see, there’s that negativity coming back.)  And let another thousand ideas bloom for how to build scalable quantum computers—sure, including D-Wave’s!  Have courage and be kind!

So yeah, if readers would like to discuss the recent D-Wave paper further (especially those who know something about it), they’re more than welcome to do so in the comments section.  But I’ve been away from Dana and Lily for two weeks, and will endeavor to spend time with them rather than obsessively reloading the comments (let’s see if I succeed).

As a small token of my goodwill, I enclose two photos from my last visit to a D-Wave machine, which occurred when I met with some grad students in Waterloo this past spring.  As you can see, I even personally certified that the machine was operating as expected.  But more than that: surpassing all reasonable expectations for quantum AI, this model could actually converse intelligently, through a protruding head resembling that of IQC grad student Sarah Kaiser.


BackreactionHawking proposes new idea for how information might escape from black holes

So I’m at this black hole conference in Stockholm, and at his public lecture yesterday evening, Stephen Hawking announced that he has figured out how information escapes from black holes, and he will tell us today at the conference at 11am.
As your blogger on location I feel a certain duty to leak information ;)

Extrapolating from the previous paper and some rumors, it’s something with AdS/CFT and work with Andrew Strominger, so likely to have some strings attached.

30 minutes to 11, and the press has arrived. They're clustering behind my back, so they're going to watch me type away, fun.

10 minutes to 11, some more information emerges. There's a third person involved in this work: besides Andrew Strominger, also Malcolm Perry, who is sitting in the row in front of me. They started their collaboration at a workshop in Herefordshire at Easter 2015.

10 past 11. The Awaited is late. We're told it will be another 10 minutes.

11 past 11. Here he comes.

He says that he has solved a problem that has bothered people for 40 years, and so on. He now understands that information is stored on the black hole horizon in the form of "supertranslations," which were introduced in the mid 1960s by Bondi and Metzner. This makes much sense because Strominger has been onto this recently. It occurred to Hawking in April, when listening to a talk by Strominger, that black hole horizons also have supertranslations. The supertranslations are caused by the ingoing particles.

That's it. Time for questions. Rovelli asking: Do supertranslations change the quantum state?

Just for the record, I don't know anything about supertranslations, so don't ask.

It's taking a long time for Hawking to compose a reply. People start mumbling. Everybody trying to guess what he meant. I can see that you can use supertranslations to store information, but don't understand how the information from the initial matter gets moved into other degrees of freedom. The only way I can see how this works is that the information was there twice to begin with.

Oh, we're now seeing Hawking's desktop projected by beamer. He is patching together a reply to Rovelli. Everybody seems confused.

Malcolm Perry mumbling he'll give a talk this afternoon and explain everything. Good.

Hawking is saying (typing) that the supertranslations are a hologram of the ingoing particles.

It's painful to watch actually, seeing that I'm easily typing two paragraphs in the time he needs for one word :(

Yes, I figure he is saying the information was there twice to begin with. It's stored on the horizon in the form of supertranslations, which can make a tiny delay for the emission of Hawking particles. Which presumably can encode information in the radiation.

Paul Davies asking if the argument goes through for de Sitter space or only asymptotically flat space. Hawking saying it applies to black holes in any background.

Somebody else asks if quantum fluctuations of the background will be relevant. 't Hooft answering with yes, but they have no microphone, I can't understand them very well.

I'm being told there will be an arxiv paper some time end of September probably.

Ok, so Hawking is saying in reply to Rovelli that it's an effect caused by the classical gravitational field. Now I am confused because the gravitational field doesn't uniquely encode quantum states. It's something I myself have tried to use before. The gravitational field of the ingoing particles does always affect the outgoing radiation, in principle. The effect is exceedingly weak of course, but it's there. If the classical gravitational field of the ingoing particles could encode all the information about the ingoing radiation then this alone would do away with the information loss problem. But it doesn't work. You can have two bosons of energy E on top of each other and arrange it so they have the same classical gravitational field as one of twice this energy.

Rovelli nodding to my question (I think he meant the same thing). 't Hooft saying in reply that not all field configurations would be allowed. Somebody else saying there are no states that cannot be distinguished by their metric. This doesn't make sense to me because then the information was always present twice, already classically and then what would one need the supertranslations for?

Ok, so, end of discussion session, lunch break. We'll all await Malcolm Perry's talk this afternoon.

Update: After Malcolm Perry's talk, some more details have emerged. Yes, it is a purely classical picture, at least for now. The BMS group essentially provides classical black hole hair in the form of an infinite amount of charges. Of course you don't really want an infinite amount, you want a finite amount that fits the Bekenstein-Hawking entropy. One would expect that this necessitates a quantized version (at least geometrically quantized, or with a finite phase-space volume). But there isn't one so far.

Neither is there, at this point, a clear picture for how the information gets into the outgoing radiation. I am somewhat concerned actually that once one looks at the quantum picture, the BMS charges at infinity will be entangled with charges falling into the black hole, thus essentially reinventing the black hole information problem.

Finally, to add some context to 't Hooft's remark, Perry said that since this doesn't work for all types of charges, not all models for particle content would be allowed, as for example information about baryon number couldn't be saved this way. He also said that you wouldn't have this problem in string theory, but I didn't really understand why.

Another Update: Here is a summary from Jacob Aron at New Scientist.

Another Update: A video of Hawking's talk is now available here.

Yet another update: Malcolm Perry will give a second, longer, lecture on the topic tomorrow morning, which will be recorded and be made available on the Nordita website.

Jordan EllenbergVampire post gets Brooksed

A while ago I read a great paper by the philosopher L. A. Paul and wrote this post about it, asking:  is the experience of becoming a vampire analogous in important ways to the experience of becoming a parent?  When deciding whether to become a vampire, is it relevant what human you thinks about being a vampire, or only what future vampire you would think about being a vampire?

Paul liked the example and was kind enough to include (her much deeper and more fully worked-out version of) it in her book, Transformative Experience.

And now David Brooks, the official public philosopher de nos jours, has devoted a whole column to Paul’s book!  And he leads with the vampires!

Let’s say you had the chance to become a vampire. With one magical bite you would gain immortality, superhuman strength and a life of glamorous intensity. Your friends who have undergone the transformation say the experience is incredible. They drink animal blood, not human blood, and say everything about their new existence provides them with fun, companionship and meaning.

Would you do it? Would you consent to receive the life-altering bite, even knowing that once changed you could never go back?

The difficulty of the choice is that you’d have to use your human self and preferences to try to guess whether you’d enjoy having a vampire self and preferences. Becoming a vampire is transformational. You would literally become a different self. How can you possibly know what it would feel like to be this different version of you or whether you would like it?

Brooks punts on the actually difficult questions raised by Paul’s book, counseling you to cast aside contemplation of your various selves’ preferences and do as objective moral standards demand.  But Paul makes it clear (p.19) that “in the circumstances I am considering… there are no moral or religious rules that determine just which act you should choose.”

Note well, buried in the last paragraph:

When we’re shopping for something, we act as autonomous creatures who are looking for the product that will produce the most pleasure or utility. But choosing to have a child or selecting a spouse, faith or life course is not like that.

Choosing children, spouses, and vocations are discussed elsewhere in the piece, but choosing a religion is not.  And yet there it is in the summation.  The column is yet more evidence for my claim that David Brooks will shortly announce — let’s say within a year — that he’s converting to Christianity.  Controversial predictions!  And vampires!  All part of the Quomodocumque brand.


August 25, 2015

Doug NatelsonNews items: Feynman, superconductors, faculty shuffle

A few brief news items - our first week of classes this term is a busy time.

  • Here is a video of Richard Feynman, explaining why he can't readily explain permanent magnets to the interviewer.   This gets right to the heart of why explaining science in a popular, accessible way can be very difficult.  Sure, he could come up with really stretched and tortured analogies, but truly getting at the deeper science behind the permanent magnets and their interactions would require laying a ton of groundwork, way more than what an average person would want to hear.
  • Here is a freely available news article from Nature about superconductivity in H2S at very high pressures.   I was going to write at some length about this but haven't found the time.  The short version:  There have been predictions for a long time that hydrogen, at very high pressures like in the interior of Jupiter, should be metallic and possibly a relatively high temperature superconductor.  There are later predictions that hydrogen-rich alloys and compounds could also superconduct at pretty high temperatures.  Now it seems that hydrogen sulfide does just this.  Crank up the pressure to 1.5 million atmospheres, and that stinky gas becomes what seems to be a relatively conventional (!) superconductor, with a transition temperature close to 200 K.  The temperature is comparatively high because of a combination of an effectively high speed of sound (the material gets pretty stiff at those pressures), a large density of electrons available to participate, and a strong coupling between the electrons and those vibrations (so that the vibrations can provide an effective attractive interaction between the electrons that leads to pairing).    The important thing about this work is that it shows that there is no obvious reason why superconductivity at or near room temperature should be ruled out.
  • Congratulations to Prof. Laura Greene, incoming APS president, who has been named the new chief scientist of the National High Magnetic Field Lab.  
  • Likewise, congratulations to Prof. Meigan Aronson, who has been named Texas A&M University's new Dean of Science.  

August 24, 2015

Terence TaoA wave equation approach to automorphic forms in analytic number theory

The Poincaré upper half-plane {{\mathbf H} := \{ z: \hbox{Im}(z) > 0 \}} (with a boundary consisting of the real line {{\bf R}} together with the point at infinity {\infty}) carries an action of the projective special linear group

\displaystyle  \hbox{PSL}_2({\bf R}) := \{ \begin{pmatrix} a & b \\ c & d \end{pmatrix}: a,b,c,d \in {\bf R}: ad-bc = 1 \} / \{\pm 1\}

via fractional linear transformations:

\displaystyle  \begin{pmatrix} a & b \\ c & d \end{pmatrix} z := \frac{az+b}{cz+d}. \ \ \ \ \ (1)

Here and in the rest of the post we will abuse notation by identifying elements {\begin{pmatrix} a & b \\ c & d \end{pmatrix}} of the special linear group {\hbox{SL}_2({\bf R})} with their equivalence class {\{ \pm \begin{pmatrix} a & b \\ c & d \end{pmatrix} \}} in {\hbox{PSL}_2({\bf R})}; this will occasionally create or remove a factor of two in our formulae, but otherwise has very little effect, though one has to check that various definitions and expressions (such as (1)) are unaffected if one replaces a matrix {\begin{pmatrix} a & b \\ c & d \end{pmatrix}} by its negation {\begin{pmatrix} -a & -b \\ -c & -d \end{pmatrix}}. In particular, we recommend that the reader ignore the signs {\pm} that appear from time to time in the discussion below.

As the action of {\hbox{PSL}_2({\bf R})} on {{\mathbf H}} is transitive, and any given point in {{\mathbf H}} (e.g. {i}) has a stabiliser isomorphic to the projective rotation group {\hbox{PSO}_2({\bf R})}, we can view the Poincaré upper half-plane {{\mathbf H}} as a homogeneous space for {\hbox{PSL}_2({\bf R})}, and more specifically the quotient space of {\hbox{PSL}_2({\bf R})} by a maximal compact subgroup {\hbox{PSO}_2({\bf R})}. In fact, we can make the half-plane a symmetric space for {\hbox{PSL}_2({\bf R})}, by endowing {{\mathbf H}} with the Riemannian metric

\displaystyle  dg^2 := \frac{dx^2 + dy^2}{y^2}

(using Cartesian coordinates {z=x+iy}), which is invariant with respect to the {\hbox{PSL}_2({\bf R})} action. Like any other Riemannian metric, the metric on {{\mathbf H}} generates a number of other important geometric objects on {{\mathbf H}}, such as the distance function {d(z,w)} which can be computed to be given by the formula

\displaystyle  2(\cosh(d(z_1,z_2))-1) = \frac{|z_1-z_2|^2}{\hbox{Im}(z_1) \hbox{Im}(z_2)}, \ \ \ \ \ (2)

the volume measure {\mu = \mu_{\mathbf H}}, which can be computed to be

\displaystyle  d\mu = \frac{dx dy}{y^2},

and the Laplace-Beltrami operator, which can be computed to be {\Delta = y^2 (\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2})} (here we use the negative definite sign convention for {\Delta}). As the metric {dg} was {\hbox{PSL}_2({\bf R})}-invariant, all of these quantities arising from the metric are similarly {\hbox{PSL}_2({\bf R})}-invariant in the appropriate sense.
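The invariance of the distance function is easy to test numerically. The following is a minimal sketch, not part of the argument: it solves (2) for {d(z_1,z_2)} and checks that the distance is unchanged when both points are moved by the same fractional linear transformation (1). The helper names `mobius`, `hyp_dist` and `random_sl2` are hypothetical, and the random matrices are only a convenience for sampling {\hbox{SL}_2({\bf R})}.

```python
import math
import random

def mobius(a, b, c, d, z):
    """Apply the fractional linear transformation (1) to z."""
    return (a * z + b) / (c * z + d)

def hyp_dist(z1, z2):
    """Hyperbolic distance on the upper half-plane, solved from formula (2)."""
    ratio = abs(z1 - z2) ** 2 / (z1.imag * z2.imag)
    return math.acosh(1.0 + ratio / 2.0)

def random_sl2(rng):
    """A random real matrix with ad - bc = 1 (hypothetical sampling helper)."""
    a = rng.uniform(0.5, 2.0)
    b = rng.uniform(-2.0, 2.0)
    c = rng.uniform(-2.0, 2.0)
    d = (1.0 + b * c) / a  # forces ad - bc = 1
    return a, b, c, d

rng = random.Random(0)
max_err = 0.0
for _ in range(1000):
    g = random_sl2(rng)
    z = complex(rng.uniform(-3, 3), rng.uniform(0.1, 3))
    w = complex(rng.uniform(-3, 3), rng.uniform(0.1, 3))
    err = abs(hyp_dist(mobius(*g, z), mobius(*g, w)) - hyp_dist(z, w))
    max_err = max(max_err, err)

print("largest deviation from invariance:", max_err)
```

The deviation should be at the level of floating point rounding error.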

The Gauss curvature of the Poincaré half-plane can be computed to be the constant {-1}, thus {{\mathbf H}} is a model for two-dimensional hyperbolic geometry, in much the same way that the unit sphere {S^2} in {{\bf R}^3} is a model for two-dimensional spherical geometry (or {{\bf R}^2} is a model for two-dimensional Euclidean geometry). (Indeed, {{\mathbf H}} is isomorphic (via projection to a null hyperplane) to the upper unit hyperboloid {\{ (x,t) \in {\bf R}^{2+1}: t = \sqrt{1+|x|^2}\}} in the Minkowski spacetime {{\bf R}^{2+1}}, which is the direct analogue of the unit sphere in Euclidean spacetime {{\bf R}^3} or the plane {{\bf R}^2} in Galilean spacetime {{\bf R}^2 \times {\bf R}}.)

One can inject arithmetic into this geometric structure by passing from the Lie group {\hbox{PSL}_2({\bf R})} to the full modular group

\displaystyle  \hbox{PSL}_2({\bf Z}) := \{ \begin{pmatrix} a & b \\ c & d \end{pmatrix}: a,b,c,d \in {\bf Z}: ad-bc = 1 \} / \{\pm 1\}

or congruence subgroups such as

\displaystyle  \Gamma_0(q) := \{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \hbox{PSL}_2({\bf Z}): c = 0\ (q) \} / \{ \pm 1 \} \ \ \ \ \ (3)

for natural number {q}, or to the discrete stabiliser {\Gamma_\infty} of the point at infinity:

\displaystyle  \Gamma_\infty := \{ \pm \begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix}: b \in {\bf Z} \} / \{\pm 1\}. \ \ \ \ \ (4)

These are discrete subgroups of {\hbox{PSL}_2({\bf R})}, nested by the subgroup inclusions

\displaystyle  \Gamma_\infty \leq \Gamma_0(q) \leq \Gamma_0(1)=\hbox{PSL}_2({\bf Z}) \leq \hbox{PSL}_2({\bf R}).

There are many further discrete subgroups of {\hbox{PSL}_2({\bf R})} (known collectively as Fuchsian groups) that one could consider, but we will focus attention on these three groups in this post.

Any discrete subgroup {\Gamma} of {\hbox{PSL}_2({\bf R})} generates a quotient space {\Gamma \backslash {\mathbf H}}, which in general will be a non-compact two-dimensional orbifold. One can understand such a quotient space by working with a fundamental domain {\hbox{Fund}( \Gamma \backslash {\mathbf H})} – a set consisting of a single representative of each of the orbits {\Gamma z} of {\Gamma} in {{\mathbf H}}. This fundamental domain is by no means uniquely defined, but if the fundamental domain is chosen with some reasonable amount of regularity, one can view {\Gamma \backslash {\mathbf H}} as the fundamental domain with the boundaries glued together in an appropriate sense. Among other things, fundamental domains can be used to induce a volume measure {\mu = \mu_{\Gamma \backslash {\mathbf H}}} on {\Gamma \backslash {\mathbf H}} from the volume measure {\mu = \mu_{\mathbf H}} on {{\mathbf H}} (restricted to a fundamental domain). By abuse of notation we will refer to both measures simply as {\mu} when there is no chance of confusion.

For instance, a fundamental domain for {\Gamma_\infty \backslash {\mathbf H}} is given (up to null sets) by the strip {\{ z \in {\mathbf H}: |\hbox{Re}(z)| < \frac{1}{2} \}}, with {\Gamma_\infty \backslash {\mathbf H}} identifiable with the cylinder formed by gluing together the two sides of the strip. A fundamental domain for {\hbox{PSL}_2({\bf Z}) \backslash {\mathbf H}} is famously given (again up to null sets) by an upper portion {\{ z \in {\mathbf H}: |\hbox{Re}(z)| < \frac{1}{2}; |z| > 1 \}}, with the left and right sides again glued to each other, and the left and right halves of the circular boundary glued to itself. A fundamental domain for {\Gamma_0(q) \backslash {\mathbf H}} can be formed by gluing together

\displaystyle  [\hbox{PSL}_2({\bf Z}) : \Gamma_0(q)] = q \prod_{p|q} (1 + \frac{1}{p}) = q^{1+o(1)}

copies of a fundamental domain for {\hbox{PSL}_2({\bf Z}) \backslash {\mathbf H}} in a rather complicated but interesting fashion.
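The index formula can be sanity-checked by brute force, using the standard fact (not proved in this post) that the cosets of {\Gamma_0(q)} in {\hbox{PSL}_2({\bf Z})} are parameterised by the bottom row {(c:d)} viewed as a point of the projective line over {{\bf Z}/q{\bf Z}}. The sketch below compares the product formula with a direct count of such points; the function names are hypothetical.

```python
from math import gcd

def index_formula(q):
    """q * prod_{p | q} (1 + 1/p), computed exactly in integers."""
    result, n, p = q, q, 2
    while p * p <= n:
        if n % p == 0:
            result = result // p * (p + 1)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:  # one remaining prime factor
        result = result // n * (n + 1)
    return result

def projective_line_size(q):
    """Count pairs (c, d) mod q with gcd(c, d, q) = 1, up to scaling by units."""
    units = [u for u in range(q) if gcd(u, q) == 1]
    seen = set()
    count = 0
    for c in range(q):
        for d in range(q):
            if gcd(gcd(c, d), q) != 1 or (c, d) in seen:
                continue
            count += 1
            for u in units:  # mark the whole orbit as visited
                seen.add((u * c % q, u * d % q))
    return count

for q in (1, 2, 4, 6, 12, 30):
    print(q, index_formula(q), projective_line_size(q))
```

The two columns should agree for every {q}.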

While fundamental domains can be a convenient choice of coordinates to work with for some computations (as well as for drawing appropriate pictures), it is geometrically more natural to avoid working explicitly on such domains, and instead work directly on the quotient spaces {\Gamma \backslash {\mathbf H}}. In order to analyse functions {f: \Gamma \backslash {\mathbf H} \rightarrow {\bf C}} on such orbifolds, it is convenient to lift such functions back up to {{\mathbf H}} and identify them with functions {f: {\mathbf H} \rightarrow {\bf C}} which are {\Gamma}-automorphic in the sense that {f( \gamma z ) = f(z)} for all {z \in {\mathbf H}} and {\gamma \in \Gamma}. Such functions will be referred to as {\Gamma}-automorphic forms, or automorphic forms for short (we always implicitly assume all such functions to be measurable). (Strictly speaking, these are the automorphic forms with trivial factor of automorphy; one can certainly consider other factors of automorphy, particularly when working with holomorphic modular forms, which correspond to sections of a more non-trivial line bundle over {\Gamma \backslash {\mathbf H}} than the trivial bundle {(\Gamma \backslash {\mathbf H}) \times {\bf C}} that is implicitly present when analysing scalar functions {f: {\mathbf H} \rightarrow {\bf C}}. However, we will not discuss this (important) more general situation here.)

An important way to create a {\Gamma}-automorphic form is to start with a non-automorphic function {f: {\mathbf H} \rightarrow {\bf C}} obeying suitable decay conditions (e.g. bounded with compact support will suffice) and form the Poincaré series {P_\Gamma[f]: {\mathbf H} \rightarrow {\bf C}} defined by

\displaystyle  P_{\Gamma}[f](z) = \sum_{\gamma \in \Gamma} f(\gamma z),

which is clearly {\Gamma}-automorphic. (One could equivalently write {f(\gamma^{-1} z)} in place of {f(\gamma z)} here; there are good arguments for both conventions, but I have ultimately decided to use the {f(\gamma z)} convention, which makes explicit computations a little neater at the cost of making the group actions work in the opposite order.) Thus we naturally see sums over {\Gamma} associated with {\Gamma}-automorphic forms. A little more generally, given a subgroup {\Gamma_\infty} of {\Gamma} and a {\Gamma_\infty}-automorphic function {f: {\mathbf H} \rightarrow {\bf C}} of suitable decay, we can form a relative Poincaré series {P_{\Gamma_\infty \backslash \Gamma}[f]: {\mathbf H} \rightarrow {\bf C}} by

\displaystyle  P_{\Gamma_\infty \backslash \Gamma}[f](z) = \sum_{\gamma \in \hbox{Fund}(\Gamma_\infty \backslash \Gamma)} f(\gamma z)

where {\hbox{Fund}(\Gamma_\infty \backslash \Gamma)} is any fundamental domain for {\Gamma_\infty \backslash \Gamma}, that is to say a subset of {\Gamma} consisting of exactly one representative for each right coset of {\Gamma_\infty}. As {f} is {\Gamma_\infty}-automorphic, we see (if {f} has suitable decay) that {P_{\Gamma_\infty \backslash \Gamma}[f]} does not depend on the precise choice of fundamental domain, and is {\Gamma}-automorphic. These operations are all compatible with each other, for instance {P_\Gamma = P_{\Gamma_\infty \backslash \Gamma} \circ P_{\Gamma_\infty}}. A key example of Poincaré series are the Eisenstein series, although there are of course many other Poincaré series one can consider by varying the test function {f}.

For future reference we record the basic but fundamental unfolding identities

\displaystyle  \int_{\Gamma \backslash {\mathbf H}} P_\Gamma[f] g\ d\mu_{\Gamma \backslash {\mathbf H}} = \int_{\mathbf H} f g\ d\mu_{\mathbf H} \ \ \ \ \ (5)

for any function {f: {\mathbf H} \rightarrow {\bf C}} with sufficient decay, and any {\Gamma}-automorphic function {g} of reasonable growth (e.g. {f} bounded and compact support, and {g} bounded, will suffice). Note that {g} is viewed as a function on {\Gamma \backslash {\mathbf H}} on the left-hand side, and as a {\Gamma}-automorphic function on {{\mathbf H}} on the right-hand side. More generally, one has

\displaystyle  \int_{\Gamma \backslash {\mathbf H}} P_{\Gamma_\infty \backslash \Gamma}[f] g\ d\mu_{\Gamma \backslash {\mathbf H}} = \int_{\Gamma_\infty \backslash {\mathbf H}} f g\ d\mu_{\Gamma_\infty \backslash {\mathbf H}} \ \ \ \ \ (6)

whenever {\Gamma_\infty \leq \Gamma} are discrete subgroups of {\hbox{PSL}_2({\bf R})}, {f} is a {\Gamma_\infty}-automorphic function with sufficient decay on {\Gamma_\infty \backslash {\mathbf H}}, and {g} is a {\Gamma}-automorphic (and thus also {\Gamma_\infty}-automorphic) function of reasonable growth. These identities will allow us to move fairly freely between the three domains {{\mathbf H}}, {\Gamma_\infty \backslash {\mathbf H}}, and {\Gamma \backslash {\mathbf H}} in our analysis.

When computing various statistics of a Poincaré series {P_\Gamma[f]}, such as its values {P_\Gamma[f](z)} at special points {z}, or the {L^2} quantity {\int_{\Gamma \backslash {\mathbf H}} |P_\Gamma[f]|^2\ d\mu}, expressions of interest to analytic number theory naturally emerge. We list three basic examples of this below, discussed somewhat informally in order to highlight the main ideas rather than the technical details.

The first example we will give concerns the problem of estimating the sum

\displaystyle  \sum_{n \leq x} \tau(n) \tau(n+1), \ \ \ \ \ (7)

where {\tau(n) := \sum_{d|n} 1} is the divisor function. This can be rewritten (by factoring {n=bc} and {n+1=ad}) as

\displaystyle  \sum_{ a,b,c,d \in {\bf N}: ad-bc = 1} 1_{bc \leq x} \ \ \ \ \ (8)

which is basically a sum over the full modular group {\hbox{PSL}_2({\bf Z})}. At this point we will “cheat” a little by moving to the related, but different, sum

\displaystyle  \sum_{a,b,c,d \in {\bf Z}: ad-bc = 1} 1_{a^2+b^2+c^2+d^2 \leq x}. \ \ \ \ \ (9)

This sum is not exactly the same as (8), but will be a little easier to handle, and it is plausible that the methods used to handle this sum can be modified to handle (8). Observe from (2) and some calculation that the distance between {i} and {\begin{pmatrix} a & b \\ c & d \end{pmatrix} i = \frac{ai+b}{ci+d}} is given by the formula

\displaystyle  2(\cosh(d(i,\begin{pmatrix} a & b \\ c & d \end{pmatrix} i))-1) = a^2+b^2+c^2+d^2 - 2

and so one can express the above sum as

\displaystyle  2 \sum_{\gamma \in \hbox{PSL}_2({\bf Z})} 1_{d(i,\gamma i) \leq \hbox{cosh}^{-1}(x/2)}

(the factor of {2} coming from the quotient by {\{\pm 1\}} in the projective special linear group); one can express this as {P_\Gamma[f](i)}, where {\Gamma = \hbox{PSL}_2({\bf Z})} and {f} is the indicator function of the ball {B(i, \hbox{cosh}^{-1}(x/2))}. Thus we see that expressions such as (7) are related to evaluations of Poincaré series. (In practice, it is much better to use smoothed out versions of indicator functions in order to obtain good control on sums such as (7) or (9), but we gloss over this technical detail here.)
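Both elementary identities used in this first example can be checked by brute force. The sketch below compares the divisor correlation (7) with the determinant count (8) for a small value of {x}, and evaluates the two sides of the distance identity at the point {i} for a given integer matrix. All function names are hypothetical, and the divisor function is computed naively, which is only sensible for small arguments.

```python
def tau(n):
    """Naive divisor function."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)

def divisor_correlation(x):
    """The sum (7)."""
    return sum(tau(n) * tau(n + 1) for n in range(1, x + 1))

def determinant_count(x):
    """The sum (8): natural numbers a, b, c, d with ad - bc = 1 and bc <= x."""
    count = 0
    for b in range(1, x + 1):
        for c in range(1, x // b + 1):
            target = b * c + 1  # we need a * d = bc + 1
            for a in range(1, target + 1):
                if target % a == 0:
                    count += 1  # d = target // a is then determined
    return count

def norm_identity_gap(a, b, c, d):
    """Difference of the two sides of 2(cosh d(i, gamma i) - 1) = a^2+b^2+c^2+d^2 - 2."""
    w = (a * 1j + b) / (c * 1j + d)
    lhs = abs(w - 1j) ** 2 / w.imag  # equals 2(cosh d(i, gamma i) - 1) by (2)
    return lhs - (a * a + b * b + c * c + d * d - 2)

print(divisor_correlation(30), determinant_count(30))
print(norm_identity_gap(2, 1, 1, 1))
```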

The second example concerns the relative

\displaystyle  \sum_{n \leq x} \tau(n^2+1) \ \ \ \ \ (10)

of the sum (7). Note from multiplicativity that (7) can be written as {\sum_{n \leq x} \tau(n^2+n)}, which is superficially very similar to (10), but with the key difference that the polynomial {n^2+1} is irreducible over the integers.

As with (7), we may expand (10) as

\displaystyle  \sum_{A,B,C \in {\bf N}: B^2 - AC = -1} 1_{B \leq x}.

At first glance this does not look like a sum over a modular group, but one can manipulate this expression into such a form in one of two (closely related) ways. First, observe that any factorisation {B + i = (a-bi) (c+di)} of {B+i} into Gaussian integers {a-bi, c+di} gives rise (upon taking norms) to an identity of the form {B^2 - AC = -1}, where {A = a^2+b^2} and {C = c^2+d^2}. Conversely, by using the unique factorisation of the Gaussian integers, every identity of the form {B^2-AC=-1} gives rise to a factorisation of the form {B+i = (a-bi) (c+di)}, essentially uniquely up to units. Now note that {(a-bi)(c+di)} is of the form {B+i} if and only if {ad-bc=1}, in which case {B = ac+bd}. Thus we can essentially write the above sum as something like

\displaystyle  \sum_{a,b,c,d: ad-bc = 1} 1_{|ac+bd| \leq x} \ \ \ \ \ (11)

and the modular group {\hbox{PSL}_2({\bf Z})} is now manifest. An equivalent way to see these manipulations is as follows. A triple {A,B,C} of natural numbers with {B^2-AC=-1} gives rise to a positive quadratic form {Ax^2+2Bxy+Cy^2} of normalised discriminant {B^2-AC} equal to {-1} with integer coefficients (it is natural here to allow {B} to take integer values rather than just natural number values by essentially doubling the sum). The group {\hbox{PSL}_2({\bf Z})} acts on the space of such quadratic forms in a natural fashion (by composing the quadratic form with the inverse {\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}} of an element {\begin{pmatrix} a & b \\ c & d \end{pmatrix}} of {\hbox{SL}_2({\bf Z})}). Because the discriminant {-1} has class number one (this fact is equivalent to the unique factorisation of the Gaussian integers, as discussed in this previous post), every form {Ax^2 + 2Bxy + Cy^2} in this space is equivalent (under the action of some element of {\hbox{PSL}_2({\bf Z})}) to the standard quadratic form {x^2+y^2}. In other words, one has

\displaystyle  Ax^2 + 2Bxy + Cy^2 = (dx-by)^2 + (-cx+ay)^2

which (up to a harmless sign) is exactly the representation {B = ac+bd}, {A = c^2+d^2}, {C = a^2+b^2} introduced earlier, and leads to the same reformulation of the sum (10) in terms of expressions like (11). Similar considerations also apply if the quadratic polynomial {n^2+1} is replaced by another quadratic, although one has to account for the fact that the class number may now exceed one (so that unique factorisation in the associated quadratic ring of integers breaks down), and in the positive discriminant case the fact that the group of units might be infinite presents another significant technical problem.

Note that {\begin{pmatrix} a & b \\ c & d \end{pmatrix} i = \frac{ai+b}{ci+d}} has real part {\frac{ac+bd}{c^2+d^2}} and imaginary part {\frac{1}{c^2+d^2}}. Thus (11) is (up to a factor of two) the Poincaré series {P_\Gamma[f](i)} as in the preceding example, except that {f} is now the indicator of the sector {\{ z: |\hbox{Re} z| \leq x |\hbox{Im} z| \}}.
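The phrase "essentially uniquely up to units" in the passage from (10) to (11) can be made concrete for small {B}. Under my reading, each factorisation {B^2+1 = AC} corresponds to exactly four integer quadruples {(a,b,c,d)} with {ad-bc=1} and {ac+bd=B}, one for each Gaussian unit. The sketch below tests this by exhaustive search; the function names are hypothetical and the search is only feasible for small {B}.

```python
from math import isqrt

def tau(n):
    """Naive divisor function."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)

def representations(B):
    """Count integer (a, b, c, d) with ad - bc = 1 and ac + bd = B."""
    # (a^2 + b^2)(c^2 + d^2) = B^2 + 1, so every entry is at most sqrt(B^2 + 1).
    R = isqrt(B * B + 1)
    count = 0
    for a in range(-R, R + 1):
        for b in range(-R, R + 1):
            for c in range(-R, R + 1):
                for d in range(-R, R + 1):
                    if a * d - b * c == 1 and a * c + b * d == B:
                        count += 1
    return count

for B in range(0, 5):
    print(B, representations(B), 4 * tau(B * B + 1))
```

The last two columns should coincide, which is the sense in which (11) reproduces (10) up to a bounded multiplicity.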

Sums involving subgroups of the full modular group, such as {\Gamma_0(q)}, often arise when imposing congruence conditions on sums such as (10), for instance when trying to estimate the expression {\sum_{n \leq x: q|n} \tau(n^2+1)} when {q} and {x} are large. As before, one then soon arrives at the problem of evaluating a Poincaré series at one or more special points, where the series is now over {\Gamma_0(q)} rather than {\hbox{PSL}_2({\bf Z})}.

The third and final example concerns averages of Kloosterman sums

\displaystyle  S(m,n;c) := \sum_{x \in ({\bf Z}/c{\bf Z})^\times} e( \frac{mx + n\overline{x}}{c} ) \ \ \ \ \ (12)

where {e(\theta) := e^{2\pi i\theta}} and {\overline{x}} is the inverse of {x} in the multiplicative group {({\bf Z}/c{\bf Z})^\times}. It turns out that the {L^2} norms of Poincaré series {P_\Gamma[f]} or {P_{\Gamma_\infty \backslash \Gamma}[f]} are closely tied to such averages. Consider for instance the quantity

\displaystyle  \int_{\Gamma_0(q) \backslash {\mathbf H}} |P_{\Gamma_\infty \backslash \Gamma_0(q)}[f]|^2\ d\mu_{\Gamma_0(q) \backslash {\mathbf H}} \ \ \ \ \ (13)

where {q} is a natural number and {f} is a {\Gamma_\infty}-automorphic form that is of the form

\displaystyle  f(x+iy) = F(my) e(m x)

for some integer {m} and some test function {F: (0,+\infty) \rightarrow {\bf C}}, which for the sake of discussion we will take to be smooth and compactly supported. Using the unfolding formula (6), we may rewrite (13) as

\displaystyle  \int_{\Gamma_\infty \backslash {\mathbf H}} \overline{f} P_{\Gamma_\infty \backslash \Gamma_0(q)}[f]\ d\mu_{\Gamma_\infty \backslash {\mathbf H}}.

To compute this, we use the double coset decomposition

\displaystyle  \Gamma_0(q) = \Gamma_\infty \cup \bigcup_{c \in {\mathbf N}: q|c} \bigcup_{1 \leq d \leq c: (d,c)=1} \Gamma_\infty \begin{pmatrix} a & b \\ c & d \end{pmatrix} \Gamma_\infty,

where for each {c,d}, {a,b} are arbitrarily chosen integers such that {ad-bc=1}. To see this decomposition, observe that every element {\begin{pmatrix} a & b \\ c & d \end{pmatrix}} in {\Gamma_0(q)} outside of {\Gamma_\infty} can be assumed to have {c>0} by applying a sign {\pm}, and then using the row and column operations coming from left and right multiplication by {\Gamma_\infty} (that is, shifting the top row by an integer multiple of the bottom row, and shifting the right column by an integer multiple of the left column) one can place {d} in the interval {[1,c]} and {(a,b)} to be any specified integer pair with {ad-bc=1}. From this we see that

\displaystyle  P_{\Gamma_\infty \backslash \Gamma_0(q)}[f] = f + \sum_{c \in {\mathbf N}: q|c} \sum_{1 \leq d \leq c: (d,c)=1} P_{\Gamma_\infty}[ f( \begin{pmatrix} a & b \\ c & d \end{pmatrix} \cdot ) ]

and so from further use of the unfolding formula (5) we may expand (13) as

\displaystyle  \int_{\Gamma_\infty \backslash {\mathbf H}} |f|^2\ d\mu_{\Gamma_\infty \backslash {\mathbf H}}

\displaystyle  + \sum_{c \in {\mathbf N}: q|c} \sum_{1 \leq d \leq c: (d,c)=1} \int_{\mathbf H} \overline{f}(z) f( \begin{pmatrix} a & b \\ c & d \end{pmatrix} z)\ d\mu_{\mathbf H}.

The first integral is just {m \int_0^\infty |F(y)|^2 \frac{dy}{y^2}}. The second expression is more interesting. We have

\displaystyle  \begin{pmatrix} a & b \\ c & d \end{pmatrix} z = \frac{az+b}{cz+d} = \frac{a}{c} - \frac{1}{c(cz+d)}

\displaystyle  = \frac{a}{c} - \frac{cx+d}{c((cx+d)^2+c^2y^2)} + \frac{iy}{(cx+d)^2 + c^2y^2}

so we can write

\displaystyle  \int_{\mathbf H} \overline{f}(z) f( \begin{pmatrix} a & b \\ c & d \end{pmatrix} z)\ d\mu_{\mathbf H}

\displaystyle  = \int_0^\infty \int_{\bf R} \overline{F}(my) F(\frac{my}{(cx+d)^2 + c^2y^2}) e( -mx + \frac{ma}{c} - m \frac{cx+d}{c((cx+d)^2+c^2y^2)} )

\displaystyle \frac{dx dy}{y^2}

which on shifting {x} by {d/c} simplifies a little to

\displaystyle  e( \frac{ma}{c} + \frac{md}{c} ) \int_0^\infty \int_{\bf R} \bar{F}(my) F(\frac{my}{c^2(x^2 + y^2)}) e(- mx - m \frac{x}{c^2(x^2+y^2)} )

\displaystyle  \frac{dx dy}{y^2}

and then on scaling {x,y} by {m} simplifies a little further to

\displaystyle  e( \frac{ma}{c} + \frac{md}{c} ) \int_0^\infty \int_{\bf R} \bar{F}(y) F(\frac{m^2}{c^2} \frac{y}{x^2 + y^2}) e(- x - \frac{m^2}{c^2} \frac{x}{x^2+y^2} )\ \frac{dx dy}{y^2}.

Note that as {ad-bc=1}, we have {a = \overline{d}} modulo {c}. Comparing the above calculations with (12), we can thus write (13) as

\displaystyle  m (\int_0^\infty |F(y)|^2 \frac{dy}{y^2} + \sum_{q|c} \frac{S(m,m;c)}{c} V(\frac{m}{c})) \ \ \ \ \ (14)

where

\displaystyle  V(u) := \frac{1}{u} \int_0^\infty \int_{\bf R} \bar{F}(y) F(u^2 \frac{y}{x^2 + y^2}) e(- x - u^2 \frac{x}{x^2+y^2} )\ \frac{dx dy}{y^2}

is a certain integral involving {F} and a parameter {u}, but which does not depend explicitly on parameters such as {m,c,d}. Thus we have indeed expressed the {L^2} expression (13) in terms of Kloosterman sums. It is possible to invert this analysis and express various weighted sums of Kloosterman sums in terms of {L^2} expressions (possibly involving inner products instead of norms) of Poincaré series, but we will not do so here; see Chapter 16 of Iwaniec and Kowalski for further details.
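The Kloosterman sums (12) are easy to compute directly for small moduli, which gives a feel for their size. The sketch below evaluates {S(m,n;c)} by summing over the units modulo {c}, and compares it with the explicit form {\tau(c) (m,n,c)^{1/2} c^{1/2}} of the Weil bound. That explicit form is the usual textbook statement and is an assumption here, since this post only uses the bound in the shape {c^{1/2+o(1)}}. The function names are hypothetical, and the modular inverse via `pow(x, -1, c)` needs Python 3.8 or later.

```python
import cmath
from math import gcd, sqrt

def kloosterman(m, n, c):
    """The Kloosterman sum S(m, n; c) from (12), by direct summation."""
    total = 0j
    for x in range(1, c + 1):
        if gcd(x, c) == 1:
            x_inv = pow(x, -1, c) if c > 1 else 0  # inverse of x modulo c
            total += cmath.exp(2j * cmath.pi * (m * x + n * x_inv) / c)
    return total

def tau(n):
    """Naive divisor function."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)

def weil_bound(m, n, c):
    """tau(c) * gcd(m, n, c)^(1/2) * c^(1/2) (assumed explicit form of the Weil bound)."""
    return tau(c) * sqrt(gcd(gcd(m, n), c)) * sqrt(c)

for c in (3, 5, 7, 12, 35):
    s = kloosterman(1, 1, c)
    print(c, round(s.real, 6), round(weil_bound(1, 1, c), 6))
```

The sums {S(m,m;c)} are real, because the substitution {x \mapsto -x} conjugates each term, so only the real part is printed.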

Traditionally, automorphic forms have been analysed using the spectral theory of the Laplace-Beltrami operator {-\Delta} on spaces such as {\Gamma\backslash {\mathbf H}} or {\Gamma_\infty \backslash {\mathbf H}}, so that a Poincaré series such as {P_\Gamma[f]} might be expanded out using inner products of {P_\Gamma[f]} (or, by the unfolding identities, {f}) with various generalised eigenfunctions of {-\Delta} (such as cuspidal eigenforms, or Eisenstein series). With this approach, special functions, and specifically the modified Bessel functions {K_{it}} of the second kind, play a prominent role, basically because the {\Gamma_\infty}-automorphic functions

\displaystyle  x+iy \mapsto y^{1/2} K_{it}(2\pi |m| y) e(mx)

for {t \in {\bf R}} and {m \in {\bf Z}} non-zero are generalised eigenfunctions of {-\Delta} (with eigenvalue {\frac{1}{4}+t^2}), and are almost square-integrable on {\Gamma_\infty \backslash {\mathbf H}} (the {L^2} norm diverges only logarithmically at one end {y \rightarrow 0^+} of the cylinder {\Gamma_\infty \backslash {\mathbf H}}, while decaying exponentially fast at the other end {y \rightarrow +\infty}).

However, as discussed in this previous post, the spectral theory of an essentially self-adjoint operator such as {-\Delta} is basically equivalent to the theory of various solution operators associated to partial differential equations involving that operator, such as the Helmholtz equation {(-\Delta + k^2) u = f}, the heat equation {\partial_t u = \Delta u}, the Schrödinger equation {i\partial_t u + \Delta u = 0}, or the wave equation {\partial_{tt} u = \Delta u}. Thus, one can hope to rephrase many arguments that involve spectral data of {-\Delta} into arguments that instead involve resolvents {(-\Delta + k^2)^{-1}}, heat kernels {e^{t\Delta}}, Schrödinger propagators {e^{it\Delta}}, or wave propagators {e^{\pm it\sqrt{-\Delta}}}, or involve the PDE more directly (e.g. applying integration by parts and energy methods to solutions of such PDE). This is certainly done to some extent in the existing literature; resolvents and heat kernels, for instance, are often utilised. In this post, I would like to explore the possibility of reformulating spectral arguments instead using the inhomogeneous wave equation

\displaystyle  \partial_{tt} u - \Delta u = F.

Actually it will be a bit more convenient to normalise the Laplacian by {\frac{1}{4}}, and look instead at the automorphic wave equation

\displaystyle  \partial_{tt} u + (-\Delta - \frac{1}{4}) u = F. \ \ \ \ \ (15)

This equation somewhat resembles a “Klein-Gordon” type equation, except that the mass is imaginary! This would lead to pathological behaviour were it not for the negative curvature, which in principle creates a spectral gap of {\frac{1}{4}} that cancels out this factor.

The point is that the wave equation approach gives access to some nice PDE techniques, such as energy methods, Sobolev inequalities and finite speed of propagation, which are somewhat submerged in the spectral framework. The wave equation also interacts well with Poincaré series; if for instance {u} and {F} are {\Gamma_\infty}-automorphic solutions to (15) obeying suitable decay conditions, then their Poincaré series {P_{\Gamma_\infty \backslash \Gamma}[u]} and {P_{\Gamma_\infty \backslash \Gamma}[F]} will be {\Gamma}-automorphic solutions to the same equation (15), basically because the Laplace-Beltrami operator commutes with translations. Because of these facts, it is possible to replicate several standard spectral theory arguments in the wave equation framework, without having to deal directly with things like the asymptotics of modified Bessel functions. The wave equation approach to automorphic theory was introduced by Faddeev and Pavlov (using the Lax-Phillips scattering theory), and developed further by Lax and Phillips, to recover many spectral facts about the Laplacian on modular curves, such as the Weyl law and the Selberg trace formula. Here, I will illustrate this by deriving three basic applications of automorphic methods in a wave equation framework, namely

  • Using the Weil bound on Kloosterman sums to derive Selberg’s 3/16 theorem on the least non-trivial eigenvalue for {-\Delta} on {\Gamma_0(q) \backslash {\mathbf H}} (discussed previously here);
  • Conversely, showing that Selberg’s eigenvalue conjecture (improving Selberg’s {3/16} bound to the optimal {1/4}) implies an optimal bound on (smoothed) sums of Kloosterman sums; and
  • Using the same bound to obtain pointwise bounds on Poincaré series similar to the ones discussed above. (Actually, the argument here does not use the wave equation, instead it just uses the Sobolev inequality.)

This post originated from an attempt to finally learn this part of analytic number theory properly, and to see if I could use a PDE-based perspective to understand it better. Ultimately, this is not that dramatic a departure from the standard approach to this subject, but I found it useful to think of things in this fashion, probably due to my existing background in PDE.

I thank Bill Duke and Ben Green for helpful discussions. My primary reference for this theory was Chapters 15, 16, and 21 of Iwaniec and Kowalski.

— 1. Selberg’s {3/16} theorem —

We begin with a proof of the following celebrated result of Selberg:

Theorem 1 Let {q \geq 1} be a natural number. Then every eigenvalue of {-\Delta} on {L^2(\Gamma_0(q) \backslash {\mathbf H})_0} (the mean zero functions on {\Gamma_0(q) \backslash {\mathbf H}}) is at least {3/16}.

One can show that {-\Delta} has only pure point spectrum below {1/4} on {L^2(\Gamma_0(q) \backslash {\mathbf H})_0} (see this previous blog post for more discussion). Thus, this theorem shows that the spectrum of {-\Delta} on {L^2(\Gamma_0(q) \backslash {\mathbf H})_0} is contained in {[3/16,+\infty)}.

We now prove this theorem. Suppose this were not the case; then we have a non-zero eigenfunction {\phi} of {-\Delta} in {L^2(\Gamma_0(q) \backslash {\mathbf H})_0} with eigenvalue {\frac{1}{4} - \delta^2} for some {\delta > \frac{1}{4}}; we may assume {\phi} to be real-valued, and by elliptic regularity it is smooth (on {{\mathbf H}}). If it is constant in the horizontal variable, thus {\phi(x+iy) = \phi(y)}, then by the {\Gamma_0(q)}-automorphic nature of {\phi} it is easy to see that {\phi} is globally constant, contradicting the fact that it is mean zero but not identically zero. Thus it is not identically constant in the horizontal variable. By Fourier analysis on the cylinder {\Gamma_\infty \backslash {\mathbf H}}, one can then find a {\Gamma_\infty}-automorphic function {f_0} of the form {f_0(x+iy) = F_0(my) e(mx)} for some non-zero integer {m} which has a non-zero inner product with {\phi} on {\Gamma_\infty \backslash {\mathbf H}}, where {F_0: (0,+\infty) \rightarrow {\bf C}} is a smooth compactly supported function.

Now we evolve {f_0} by the wave equation

\displaystyle  \partial_{tt} f - \Delta f - \frac{1}{4} f = 0 \ \ \ \ \ (16)

to obtain a smooth function {f: {\bf R} \times {\mathbf H} \rightarrow {\bf C}} such that {f(0,x) = f_0(x)} and {\partial_t f(0,x) = 0}; the existence (and uniqueness) of such a solution to this initial value problem can be established by standard wave equation methods (e.g. parametrices and energy estimates), or by the spectral theory of the Laplacian. (One can also solve for {f} explicitly in terms of modified Bessel functions, but we will not need to do so here, which is one of the main points of using the wave equation method.) Since the initial data {f_0} obeyed the translation symmetry {f_0(z + x) = f_0(z) e(mx)} for all {z \in {\mathbf H}} and {x \in {\bf R}}, we see (from the uniqueness theory and translation invariance of the wave equation) that {f} also obeys this symmetry; in particular {f(t)} is {\Gamma_\infty}-automorphic for all times {t}. By finite speed of propagation, {f(t)} remains compactly supported in {\Gamma_\infty \backslash {\mathbf H}} for all time {t}, in fact for positive time it will lie in the strip {\{ z: e^{-t} \ll \hbox{Im} z \ll e^t \}}, where we allow the implied constants to depend on the initial data {f_0}.

Taking the inner product of {f} with the eigenfunction {\phi} on {\Gamma_\infty \backslash {\mathbf H}}, differentiating under the integral sign, and integrating by parts, we see that

\displaystyle  \partial_{tt} \langle f, \phi \rangle_{L^2(\Gamma_\infty \backslash {\mathbf H})} - \delta^2 \langle f, \phi \rangle_{L^2(\Gamma_\infty \backslash {\mathbf H})} = 0.

Since {\langle f, \phi \rangle_{\Gamma_\infty \backslash {\mathbf H}}} is initially non-zero with zero velocity, we conclude from solving the ODE that {\langle f(t), \phi \rangle_{\Gamma_\infty \backslash {\mathbf H}}} is a non-zero multiple of {\cosh(\delta t)}. In particular, it grows like {e^{\delta t}} as {t \rightarrow +\infty}. Using the unfolding identity (6) to write

\displaystyle  \langle f(t), \phi \rangle_{L^2(\Gamma_\infty \backslash {\mathbf H})} = \langle P_{\Gamma_\infty \backslash \Gamma_0(q)}[f(t)], \phi \rangle_{L^2(\Gamma_0(q) \backslash {\mathbf H})}

and then using the Cauchy-Schwarz inequality, we conclude that

\displaystyle  \int_{\Gamma_0(q) \backslash {\mathbf H}} | P_{\Gamma_\infty \backslash \Gamma_0(q)}[f(t)] |^2\ d\mu_{\Gamma_0(q) \backslash {\mathbf H}} \gg e^{2\delta t} \ \ \ \ \ (17)

as {t \rightarrow +\infty}, where we allow implied constants to depend on {f, q, \phi}.
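The exponential growth behind (17) comes entirely from the scalar ODE {g'' = \delta^2 g} satisfied by the inner product {g(t) = \langle f(t), \phi \rangle}, with initial data {g(0) \neq 0} and {g'(0) = 0}. As a small illustration, the sketch below integrates this ODE numerically and compares the result with {\cosh(\delta t)}. The normalisation {g(0)=1}, the choice of a classical Runge-Kutta scheme and the function name `evolve` are all assumptions made for the example.

```python
import math

def evolve(delta, t_end, steps=10000):
    """Integrate g'' = delta^2 g with g(0) = 1, g'(0) = 0 by classical RK4."""
    h = t_end / steps
    k = delta * delta
    g, v = 1.0, 0.0  # v stands for g'
    for _ in range(steps):
        # first-order system: g' = v, v' = k * g
        k1g, k1v = v, k * g
        k2g, k2v = v + 0.5 * h * k1v, k * (g + 0.5 * h * k1g)
        k3g, k3v = v + 0.5 * h * k2v, k * (g + 0.5 * h * k2g)
        k4g, k4v = v + h * k3v, k * (g + h * k3g)
        g += h * (k1g + 2 * k2g + 2 * k3g + k4g) / 6
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
    return g

delta = 0.3  # any delta > 1/4 is in the range excluded by Theorem 1
for t in (1.0, 5.0, 10.0, 20.0):
    g = evolve(delta, t)
    print(t, g, math.cosh(delta * t), g / math.exp(delta * t))
```

The last column tends to {1/2}, which is the statement that {g(t)} grows like {e^{\delta t}}.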

We complement this lower bound with a slightly crude upper bound in which we are willing to lose some powers of {t}. We have already seen that {f(t)} is supported in the strip {\{ z: e^{-t} \ll \hbox{Im} z \ll e^t \}}. Compactly supported solutions to (16) on the cylinder {\Gamma_\infty \backslash {\mathbf H}} conserve the energy

\displaystyle  \frac{1}{2} \int_{\Gamma_\infty \backslash {\mathbf H}} ( |\partial_t f|^2 + y^2(|\partial_x f|^2 + |\partial_y f|^2) - \frac{1}{4} |f|^2 ) \frac{ dx dy}{y^2}.

In particular, this quantity is {O(1)} for all time (recall we allow implied constants to depend on {f,q,\phi}). From Hardy’s inequality, the quantity

\displaystyle  \int_{\Gamma_\infty \backslash {\mathbf H}} |\partial_y f|^2 - \frac{1}{4y^2} |f|^2\ dx dy

is non-negative. Discarding this term and using {\partial_x f = 2\pi i m f}, and using the fact that {m} is non-zero, we arrive at the bounds

\displaystyle  \int_{\Gamma_\infty \backslash {\mathbf H}} |f|^2\ dx dy \ll 1

and

\displaystyle  \int_{\Gamma_\infty \backslash {\mathbf H}} |\partial_t f|^2\ \frac{dx dy}{y^2} \ll 1.

(We allow implied constants to depend on {m,f,q,\phi}, but not on {t}.) From the fundamental theorem of calculus and Minkowski’s inequality in {L^2}, the latter inequality implies that

\displaystyle  \int_{\Gamma_\infty \backslash {\mathbf H}} |f|^2\ \frac{dx dy}{y^2} \ll t^2

for {t \geq 1}, which on combining with the former inequality gives

\displaystyle  \int_{\Gamma_\infty \backslash {\mathbf H}} |f|^2 (1 + y^{-2})\ dx dy \ll t^2.

The function {\Delta f} also obeys the wave equation (16), so a similar argument gives

\displaystyle  \int_{\Gamma_\infty \backslash {\mathbf H}} |\Delta f|^2 (1+y^{-2})\ dx dy \ll t^2.

Applying a Sobolev inequality on unit squares (for {y \geq 1}) or on squares of length comparable to {y} (for {y < 1}) we conclude the pointwise estimates

\displaystyle  |f(t,x+iy)| \ll t

for {y \geq 1} and

\displaystyle  |f(t,x+iy)| \ll t y^{1/2}

for {y < 1}. In particular, if we write {f(t,x+iy) = F(t,my) e(mx)}, we have the somewhat crude estimates

\displaystyle  |F(t,y)| \ll t^{O(1)} y^{1/2}

for all {y > 0} and {t \geq 1}. (One can do better than this, particularly for large {y}, but this bound will suffice for us.)

By repeating the analysis of (13) at the start of this post, we see that the quantity

\displaystyle  \int_{\Gamma_0(q) \backslash {\mathbf H}} | P_{\Gamma_\infty \backslash \Gamma_0(q)}[f(t)] |^2\ d\mu_{\Gamma_0(q) \backslash {\mathbf H}} \ \ \ \ \ (18)

can be expressed as

\displaystyle  m (\int_0^\infty |F(t,y)|^2 \frac{dy}{y^2} + \sum_{q|c} \frac{S(m,m;c)}{c} V(t,\frac{m}{c}))

where

\displaystyle  V(t,u) := \frac{1}{u} \int_0^\infty \int_{\bf R} F(t,y) \bar{F}(t,u^2 \frac{y}{x^2 + y^2}) e(- x - u^2 \frac{x}{x^2+y^2} )\ \frac{dx dy}{y^2}.

Since {F(t,y)} is supported on {\{ e^{-t} \ll y \ll e^t \}} and is bounded by {O( t^{O(1)} y^{1/2} )}, the integral {\int_0^\infty |F(t,y)|^2 \frac{dy}{y^2}} is {O(t^{O(1)})} for {t \geq 1}. We also see that {V(t,u)} vanishes unless {u \gg e^{-t}} (otherwise {y} and {u^2 \frac{y}{x^2+y^2} \leq \frac{u^2}{y}} cannot simultaneously be {\gg e^{-t}}), and for such values of {u}, we have the triangle inequality bound

\displaystyle  V(t,u) \ll \frac{t^{O(1)}}{u} \int_{e^{-t} \ll y \ll e^t} \int_{x \ll e^t u} y^{1/2} (u^2 \frac{y}{x^2+y^2})^{1/2}\ \frac{dx dy}{y^2}.

Evaluating the {x} integral and then the {y} integral, we arrive at

\displaystyle  V(t,u) \ll t^{O(1)} \log(2+u)

and so we can bound (18) (ignoring any potential cancellation in {c}) by

\displaystyle  \ll t^{O(1)} ( 1 + \sum_{c \ll e^t} \frac{|S(m,m;c)|}{c} ).

Now we use the Weil bound for Kloosterman sums, which gives

\displaystyle  |S(m,m;c)| \ll c^{1/2 + o(1)}

(see e.g. this previous post for a discussion of this bound) to arrive at the bound

\displaystyle  \int_{\Gamma_0(q) \backslash {\mathbf H}} | P_{\Gamma_\infty \backslash \Gamma_0(q)}[f(t)] |^2\ d\mu_{\Gamma_0(q) \backslash {\mathbf H}} \ll t^{O(1)} e^{t/2}

as {t \rightarrow \infty}. Comparing this with (17) we obtain a contradiction as {t \rightarrow \infty} since we have {\delta > \frac{1}{4}}, and the claim follows.
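To get a feel for the Weil bound used above, here is a minimal numerical sketch. It assumes the usual definition {S(m,n;c) = \sum_{x \in ({\bf Z}/c{\bf Z})^\times} e(\frac{mx + n\bar{x}}{c})} and compares it against the standard explicit form {\tau(c) (m,n,c)^{1/2} c^{1/2}} of the Weil bound; the function names are purely illustrative.

```python
import cmath
from math import gcd, sqrt


def kloosterman(m: int, n: int, c: int) -> float:
    """Return S(m, n; c), summing e((m*x + n*x_inv)/c) over units x mod c."""
    if c == 1:
        return 1.0  # only the trivial residue class contributes
    total = 0j
    for x in range(1, c):
        if gcd(x, c) == 1:
            x_inv = pow(x, -1, c)  # modular inverse (Python 3.8+)
            total += cmath.exp(2j * cmath.pi * (m * x + n * x_inv) / c)
    # The sum is real: replacing x by -x conjugates each term.
    return total.real


def num_divisors(c: int) -> int:
    """Divisor function tau(c), by trial division."""
    return sum(1 for d in range(1, c + 1) if c % d == 0)


def weil_bound(m: int, n: int, c: int) -> float:
    """Explicit Weil bound tau(c) * gcd(m, n, c)^(1/2) * c^(1/2)."""
    return num_divisors(c) * sqrt(gcd(gcd(m, n), c)) * sqrt(c)


if __name__ == "__main__":
    for c in (5, 12, 35, 97):
        print(c, kloosterman(1, 1, c), weil_bound(1, 1, c))
```

The brute-force loop costs {O(c)} operations per sum, which is fine for experimentation but not for large moduli.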

Remark 2 It was conjectured by Linnik that

\displaystyle  \sum_{c \leq x} \frac{S(m,n;c)}{c} \ll x^{o(1)}

as {x \rightarrow \infty} for any fixed {m,n}; this, when combined with a more refined analysis of the above type, implies the Selberg eigenvalue conjecture that all eigenvalues of {-\Delta} on {L^2(\Gamma_0(q) \backslash {\mathbf H})_0} are at least {1/4}.
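One can experiment numerically with the sums in Linnik's conjecture. The following sketch (with hypothetical function names, and the same brute-force definition of the Kloosterman sum as an assumption) computes the partial sum {\sum_{c \leq x} \frac{S(m,n;c)}{c}} alongside the version with absolute values, so that the amount of cancellation in {c} can be inspected. It proves nothing, of course.

```python
import cmath
from math import gcd


def kloosterman(m: int, n: int, c: int) -> float:
    """Brute-force S(m, n; c) over units x mod c."""
    if c == 1:
        return 1.0
    total = 0j
    for x in range(1, c):
        if gcd(x, c) == 1:
            total += cmath.exp(2j * cmath.pi * (m * x + n * pow(x, -1, c)) / c)
    return total.real


def linnik_sums(m: int, n: int, x: int) -> tuple:
    """Return (signed, absolute) partial sums of S(m, n; c)/c for 1 <= c <= x."""
    signed = 0.0
    absolute = 0.0
    for c in range(1, x + 1):
        term = kloosterman(m, n, c) / c
        signed += term          # keeps the cancellation between moduli
        absolute += abs(term)   # discards it, as in the crude argument above
    return signed, absolute


if __name__ == "__main__":
    for x in (10, 100, 400):
        print(x, linnik_sums(1, 1, x))
```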

— 2. Consequences of Selberg’s conjecture —

In the previous section we saw how bounds on Kloosterman sums gave rise to lower bounds on eigenvalues of the Laplacian. It turns out that this implication is reversible. The simplest case (at least from the perspective of wave equation methods) is when Selberg’s eigenvalue conjecture is true, so that the Laplacian on {L^2(\Gamma_0(q) \backslash {\mathbf H})_0} has spectrum in {[\frac{1}{4},\infty)}. Equivalently, one has the inequality

\displaystyle  \langle -\Delta f, f \rangle_{L^2(\Gamma_0(q) \backslash {\mathbf H})_0} \geq \frac{1}{4} \langle f, f \rangle_{L^2(\Gamma_0(q) \backslash {\mathbf H})_0}

for all {f \in L^2(\Gamma_0(q) \backslash {\mathbf H})_0} (interpreting derivatives in a distributional sense if necessary). Integrating by parts, this shows that

\displaystyle  \int_{\Gamma_0(q) \backslash {\mathbf H}} |\nabla f|^2 - \frac{1}{4} |f|^2\ d\mu \geq 0 \ \ \ \ \ (19)

for all {f \in L^2(\Gamma_0(q) \backslash {\mathbf H})_0}, where the gradient {\nabla f} and its magnitude {|\nabla f|} are computed using the Riemannian metric in {\Gamma_0(q) \backslash {\mathbf H}}.

Now suppose one has a smooth, compactly supported in space solution {f: {\bf R} \times \Gamma_0(q) \backslash {\mathbf H} \rightarrow {\bf R}} to the inhomogeneous wave equation

\displaystyle  \partial_{tt} f - \Delta f - \frac{1}{4} f = g

for some forcing term {g: {\bf R} \times \Gamma_0(q) \backslash {\mathbf H} \rightarrow {\bf R}} which is also smooth and compactly supported in space. We assume that {f(t)} has mean zero for all {t}. Introducing the energy

\displaystyle  E[f(t)] := \frac{1}{2}\int_{\Gamma_0(q) \backslash {\mathbf H}} |\partial_t f|^2 + |\nabla f|^2 - \frac{1}{4} |f|^2\ d\mu,

which is non-negative thanks to (19), and integrating by parts, we obtain the energy identity

\displaystyle  \partial_t E[f(t)] = \int_{\Gamma_0(q) \backslash {\mathbf H}} (\partial_t f) g\ d\mu

and hence by Cauchy-Schwarz

\displaystyle  \partial_t E[f(t)] \ll E[f(t)]^{1/2} (\int_{\Gamma_0(q) \backslash {\mathbf H}} |g|^2\ d\mu)^{1/2}

and hence

\displaystyle  \partial_t E[f(t)]^{1/2} \ll (\int_{\Gamma_0(q) \backslash {\mathbf H}} |g|^2\ d\mu)^{1/2}

(in a distributional sense at least), giving rise to the energy inequality

\displaystyle  E[f(t)]^{1/2} \ll E[f(0)]^{1/2} + \int_0^t (\int_{\Gamma_0(q) \backslash {\mathbf H}} |g(t')|^2\ d\mu)^{1/2}\ dt'.
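The passage from the energy identity to the energy inequality can be spelled out as follows. This is only a sketch: the shorthand {\|g(t)\|} for the {L^2(\Gamma_0(q) \backslash {\mathbf H})} norm is introduced here for convenience, and the third line is to be read where {E[f(t)] > 0} (or in the distributional sense mentioned above).

```latex
% Shorthand (an assumption of this sketch): \|g(t)\| := (\int_{\Gamma_0(q) \backslash {\mathbf H}} |g(t)|^2\ d\mu)^{1/2}.
\begin{align*}
\tfrac{1}{2} \|\partial_t f(t)\|^2 &\leq E[f(t)]
  && \text{(the remaining terms are non-negative by (19))} \\
|\partial_t E[f(t)]| &\leq \|\partial_t f(t)\| \, \|g(t)\| \leq (2 E[f(t)])^{1/2} \|g(t)\|
  && \text{(Cauchy-Schwarz)} \\
\partial_t \big( E[f(t)]^{1/2} \big) &= \frac{\partial_t E[f(t)]}{2 E[f(t)]^{1/2}} \leq 2^{-1/2} \|g(t)\| \\
E[f(t)]^{1/2} &\leq E[f(0)]^{1/2} + 2^{-1/2} \int_0^t \|g(t')\|\ dt'
  && \text{(integrating in time)}
\end{align*}
```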

We can lift this inequality to the cylinder {\Gamma_\infty \backslash {\mathbf H}}, concluding that for any smooth, compactly supported in space solution {f:{\bf R} \times (\Gamma_\infty \backslash {\mathbf H}) \rightarrow {\bf R}} to the inhomogeneous equation

\displaystyle  \partial_{tt} f - \Delta f - \frac{1}{4} f = g \ \ \ \ \ (20)

for some forcing term {g: {\bf R} \times \Gamma_\infty \backslash {\mathbf H} \rightarrow {\bf R}} which is also smooth and compactly supported in space, with {f} mean zero for all time, we have the energy inequality

\displaystyle  E[P_{\Gamma_\infty \backslash \Gamma_0(q)}[f(t)]]^{1/2} \ll E[P_{\Gamma_\infty \backslash \Gamma_0(q)}[f(0)]]^{1/2}

\displaystyle + \int_0^t (\int_{\Gamma_0(q) \backslash {\mathbf H}} |P_{\Gamma_\infty \backslash \Gamma_0(q)}[g(t')]|^2\ d\mu)^{1/2}\ dt'.

One can use this inequality to analyse the {L^2} norm of Poincaré series by testing on various functions {f} (and working out {g} using (20)). Suppose for instance that {m} is a fixed natural number, and {\psi: {\bf R} \rightarrow {\bf R}} is a smooth compactly supported function. We consider the traveling wave {f:{\bf R} \times \Gamma_\infty \backslash {\mathbf H} \rightarrow {\bf R}} given by the formula

\displaystyle  f(t,x+iy) := y^{1/2} \Psi( - \log y ) \Psi( t + \log y ) e(mx)

where {\Psi(u) = \int_{-\infty}^u \psi(v)\ dv} is the primitive of {\psi}; the point is that this is an approximate solution to the homogeneous wave equation, particularly at small values of {y}. Clearly {f(t)} is compactly supported with mean zero for {t \geq 0}, in the region {\{ x+iy: e^{-t} \ll y \ll 1 \}} (we allow implied constants to depend on {\psi,m} but not on {q}). In the region {y \sim 1}, {f} and its first derivatives are {O(1)}, giving a contribution of {O(1)} to the energy {E[P_{\Gamma_\infty \backslash \Gamma_0(q)}[f(t)]]} (note that the shifts of the region {\{ 0 \leq x \leq 1; y \sim 1\}} by {\hbox{PSL}_2({\bf Z})} have bounded overlap). In particular we have

\displaystyle  E[P_{\Gamma_\infty \backslash \Gamma_0(q)}[f(0)]] \ll 1

and thus by the energy inequality (using only the {|\partial_t f|^2} portion of the energy)

\displaystyle  \int_{\Gamma_0(q) \backslash {\mathbf H}} |P_{\Gamma_\infty \backslash \Gamma_0(q)}[\partial_t f]|^2 \ll 1 + \int_0^t (\int_{\Gamma_0(q) \backslash {\mathbf H}} |P_{\Gamma_\infty \backslash \Gamma_0(q)}[g(t')]|^2\ d\mu)^{1/2}\ dt'

for {t \geq 0}, where

\displaystyle  g := \partial_{tt} f - \Delta f - \frac{1}{4} f.

Clearly {g} is supported on the region {\{ x+iy: e^{-t} \ll y \ll 1 \}}. For {y = O(1)}, one can compute that {g=O(1)}, giving a contribution of {O(t)} to the right-hand side. When {y} is much less than {1} but much larger than {e^{-t}}, we have {f(x+iy) = y^{1/2} e(mx)}, which after some calculation yields {g(x+iy) = 4\pi^2 m^2 y^{5/2} e(mx)}. As this decays so quickly as {y \rightarrow 0}, one can compute (using for instance the expansion (14) of (13) and crude estimates, ignoring all cancellation) that this contributes a total of {O(t)} to the right-hand side also. Finally one has to deal with the region {y \sim e^{-t}}, but {y} is much less than {1}. Here, {\partial_t f} is equal to {y^{1/2} \psi( t + \log y ) e(mx)}, and {f} is equal to {y^{1/2} \Psi(t + \log y) e(mx)}, which after some computation makes {g} equal to {4\pi^2 m^2 y^{5/2} \Psi(t + \log y) e(mx)}. Again, one can compute the contribution of this term to the energy inequality to be {O(t)}. We conclude that

\displaystyle  \int_{\Gamma_0(q) \backslash {\mathbf H}} |P_{\Gamma_\infty \backslash \Gamma_0(q)}[y^{1/2} \psi(t+\log y) e(mx)]|^2 \ll 1+t.
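The formulas for {g} quoted in the previous paragraph come from a direct computation, which can be sketched as follows. Here we assume the convention {\Delta = y^2 (\partial_{xx} + \partial_{yy})} and {e(x) = e^{2\pi i x}}, and write {h} for a generic smooth profile evaluated at {t + \log y} (so {h = \Psi} near {y \sim e^{-t}}, and {h \equiv 1} in the intermediate region); the symbol {h} is notation introduced only for this sketch.

```latex
% Sketch for f = y^{1/2} h(t + \log y) e(mx); h, h', h'' are all evaluated at t + \log y.
\begin{align*}
\partial_{tt} f &= y^{1/2} h'' \, e(mx), \\
y^2 \partial_{xx} f &= -4\pi^2 m^2 y^{5/2} h \, e(mx), \\
\partial_y f &= y^{-1/2} \big( \tfrac{1}{2} h + h' \big) e(mx), \\
y^2 \partial_{yy} f &= y^{1/2} \big( h'' - \tfrac{1}{4} h \big) e(mx), \\
g = \partial_{tt} f - \Delta f - \tfrac{1}{4} f &= 4\pi^2 m^2 y^{5/2} h \, e(mx).
\end{align*}
```

The {h''} terms and the {\frac{1}{4} h} terms cancel exactly, which is the sense in which the traveling wave is an approximate solution: only the rapidly decaying {y^{5/2}} term survives.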

Applying the expansion (14) of (13), we conclude that

\displaystyle  \sum_{q|c} \frac{S(m,m;c)}{c} V(\frac{m}{c}) \ll 1+t \ \ \ \ \ (21)

where

\displaystyle  V(u) := \int_0^\infty \int_{\bf R} \frac{y}{\sqrt{x^2+y^2}} \psi(t + \log y) \psi(t + \log u^2 \frac{y}{x^2 + y^2})

\displaystyle e(- x - u^2 \frac{x}{x^2+y^2} )\ \frac{dx dy}{y^2}

The expression {V(u)} is only non-zero when {u \gg e^{-t}}, and the integrand is only non-zero when {y \sim e^{-t}} and {x \sim u}, which makes the phase {-x - u^2 \frac{x}{x^2+y^2}} of size {O(u)}. For {u} much smaller than {1}, the phase is thus largely irrelevant and the quantity {V(u)} is roughly comparable to {1} for {u \gg e^{-t}}. As such, the bound (21) can be viewed as a smoothed out version of the estimate

\displaystyle  \sum_{q|c: c \ll e^t} \frac{S(m,m;c)}{c} \ll 1+t

which is basically Linnik’s conjecture, mentioned in Remark 2. One can make this connection between Selberg’s eigenvalue conjecture and Linnik’s conjecture more precise: see Section 16.6 of Iwaniec and Kowalski, which goes through modified Bessel functions rather than through wave equation methods.

— 3. Pointwise bounds on Poincaré series —

The formula (14) for (13) allows one to compute {L^2} norms of Poincaré series. By using Sobolev embedding, one can then obtain pointwise control on such Poincaré series, as long as one stays away from the cusps. For instance, suppose we are interested in evaluating a Poincaré series {P_{\Gamma_\infty \backslash \Gamma_0(q)}[f]} at a point of the form {z = \gamma i} for some {\gamma \in \hbox{PSL}_2({\bf Z})}. From the Sobolev inequality we have

\displaystyle  |f(i)|^2 \ll \int_{B(i,1)} |f|^2 + |\Delta f|^2\ d\mu_{\mathbf H}

for any smooth function {f}, and thus by translation

\displaystyle  |f(\gamma i)|^2 \ll \int_{B(\gamma i,1)} |f|^2 + |\Delta f|^2\ d\mu_{\mathbf H}.

The ball {B(i,1)} meets only boundedly many translates of the standard fundamental domain of {\hbox{PSL}_2({\bf Z}) \backslash {\mathbf H}}, and hence {B(\gamma i,1)} does too. Since {\Gamma_0(q)} is a subgroup of {\hbox{PSL}_2({\bf Z})}, we conclude that {B(\gamma i,1)} meets only boundedly many translates of a fundamental domain for {\Gamma_0(q) \backslash {\mathbf H}}. In particular, we obtain the Sobolev inequality

\displaystyle  |f(\gamma i)|^2 \ll \int_{\Gamma_0(q) \backslash {\mathbf H}} |f|^2 + |\Delta f|^2\ d\mu_{\Gamma_0(q) \backslash \mathbf H} \ \ \ \ \ (22)

for any smooth {\Gamma_0(q)}-automorphic function {f}. This estimate is unfortunately a little inefficient when {q} is large, since the ball {B(\gamma i,1)} has area comparable to one, whereas the quotient space {\Gamma_0(q) \backslash {\mathbf H}} has area roughly comparable to {q}, so that one is conceding quite a bit by replacing the ball by the quotient space. Nevertheless this estimate is still useful enough to give some good results. We illustrate this by proving the estimate

\displaystyle  \sum_{n: q|n} \sum_{\nu \in {\bf Z}/n{\bf Z}: \nu^2 + 1 = 0 \hbox{ mod } n} e( \frac{m \nu}{n} ) g( \frac{n}{X} ) \ll X^{1/2+o(1)} + q^{-1/2} X^{3/4+o(1)} \ \ \ \ \ (23)

for {1 \leq q,m \leq X} with {m} coprime to {q}, where {g: {\bf R} \rightarrow {\bf R}} is a fixed smooth function supported in, say, {[1,2]} (and implied constants are allowed to depend on {g}), and the asymptotic notation is with regard to the limit {X \rightarrow \infty}. This type of estimate (appearing for instance, in a stronger form, in this paper of Duke, Friedlander, and Iwaniec; see also Proposition 21.10 of Iwaniec and Kowalski) establishes some equidistribution of the square roots {\{\nu \in {\bf Z}/n{\bf Z}: \nu^2 + 1 = 0 \}} as {n} varies (while staying comparable to {X}). For comparison, crude estimates (ignoring the cancellation in the phase {e(\frac{m \nu}{n})}) give a bound of {O( X^{1+o(1)} / q )}, so the bound (23) is non-trivial whenever {q} is significantly smaller than {X^{1/2}}. Estimates such as (23) are also useful for getting good error terms in the asymptotics for the expression (10), as was first done by Hooley.

One can write (23) in terms of Poincaré series much as was done for (10). Using the fact that the discriminant {-1} has class number one as before, we see that for every positive {n} and {\nu \in {\bf Z}/n{\bf Z}} with {\nu^2 + 1 = 0 \hbox{ mod } n}, we can find an element {\gamma = \begin{pmatrix} a & b \\ c & d \end{pmatrix}} of {\hbox{PSL}_2({\bf Z})} such that {\gamma i = \frac{ai+b}{ci+d}} has imaginary part {\frac{1}{n}} and real part {\frac{\nu}{n}} modulo one (thus, {n = c^2+d^2} and {\nu = ac + bd \hbox{ mod } n}); this element {\gamma} is unique up to left translation by {\Gamma_\infty}. We can thus write the left-hand side of (23) as

\displaystyle  \sum_{\gamma \in \hbox{Fund}( \Gamma_\infty \backslash \hbox{PSL}_2({\bf Z})): q|c^2+d^2} F( \gamma i )

where

\displaystyle  F(x+iy) = e( mx ) g( \frac{1}{Xy} )

and {c,d} are the bottom two entries of the matrix {\gamma} (determined up to sign). The condition {q | c^2+d^2} implies (since {c,d} must be coprime) that {c,d} are coprime to {q} with {c/d = \delta \hbox{ mod } q} for some {\delta} with {\delta^2+1=0 \hbox{ mod } q}; conversely, if {c,d} obey such a condition then {q|c^2+d^2}. The number of such {\delta} is at most {X^{o(1)}}. Thus it suffices to show that

\displaystyle  \sum_{\gamma \in \hbox{Fund}( \Gamma_\infty \backslash \hbox{PSL}_2({\bf Z})): c/d = \delta \hbox{ mod } q} F( \gamma i ) \ll X^{1/2+o(1)} + q^{-1/2} X^{3/4+o(1)}

for each such {\delta}.
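The dictionary between square roots {\nu} of {-1} modulo {n} and matrices {\gamma} with {n = c^2+d^2}, {\nu = ac+bd \hbox{ mod } n} used above can be checked by brute force for small {n}. The sketch below does this; all function names are hypothetical. The key identity is {(ac+bd)^2 + (ad-bc)^2 = (a^2+b^2)(c^2+d^2)}, which forces {\nu^2 + 1 = 0 \hbox{ mod } n} once {ad-bc=1}.

```python
from math import gcd, isqrt


def ext_gcd(a: int, b: int) -> tuple:
    """Return (g, s, t) with s*a + t*b == g, where g is +gcd or -gcd."""
    if b == 0:
        return (a, 1, 0)
    g, s, t = ext_gcd(b, a % b)
    return (g, t, s - (a // b) * t)


def unimodular_top_row(c: int, d: int) -> tuple:
    """Find integers (a, b) with a*d - b*c == 1, assuming gcd(c, d) == 1."""
    g, s, t = ext_gcd(d, -c)  # s*d - t*c == g with g in {1, -1}
    return (s * g, t * g)     # multiplying by g gives a*d - b*c == g*g == 1


def roots_from_matrices(n: int) -> set:
    """Collect nu = a*c + b*d mod n over coprime (c, d) with c^2 + d^2 == n."""
    roots = set()
    r = isqrt(n)
    for c in range(-r, r + 1):
        d = isqrt(n - c * c)
        if c * c + d * d != n or gcd(c, d) != 1:
            continue  # (c, d) is not a primitive representation of n
        a, b = unimodular_top_row(c, d)
        roots.add((a * c + b * d) % n)
    return roots


def roots_brute_force(n: int) -> set:
    """All nu mod n with nu^2 + 1 == 0 mod n."""
    return {v for v in range(n) if (v * v + 1) % n == 0}


if __name__ == "__main__":
    for n in (5, 10, 13, 25, 65):
        print(n, sorted(roots_from_matrices(n)), sorted(roots_brute_force(n)))
```

Only {d \geq 0} is enumerated since {\gamma} and {-\gamma} agree in {\hbox{PSL}_2({\bf Z})}; the value of {\nu} does not depend on the choice of top row {(a,b)}, since changing it shifts {\nu} by a multiple of {n}.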

The constraint {c/d = \delta \hbox{ mod } q} constrains {\gamma} to a single right coset of {\Gamma_0(q)}. Thus the left-hand side can be written as

\displaystyle  \sum_{\gamma' \in \hbox{Fund}( \Gamma_\infty \backslash \Gamma_0(q))} F( \gamma' \gamma i )

which is just {P_{\Gamma_\infty \backslash \Gamma_0(q)}[F](\gamma i)}. Applying (22) (and interchanging the Poincaré series and the Laplacian), it thus suffices to show that

\displaystyle  \int_{\Gamma_0(q) \backslash {\mathbf H}} |P_{\Gamma_\infty \backslash \Gamma_0(q)}[F]|^2\ d\mu_{\Gamma_0(q) \backslash {\mathbf H}} \ll X^{1+o(1)} + q^{-1} X^{3/2+o(1)} \ \ \ \ \ (24)

and

\displaystyle  \int_{\Gamma_0(q) \backslash {\mathbf H}} |P_{\Gamma_\infty \backslash \Gamma_0(q)}[\Delta F]|^2\ d\mu_{\Gamma_0(q) \backslash {\mathbf H}} \ll X^{1+o(1)} + q^{-1} X^{3/2+o(1)}. \ \ \ \ \ (25)

We can compute

\displaystyle  \Delta F(x+iy) = e(mx) \tilde g( \frac{1}{Xy} )

where

\displaystyle  \tilde g(u) = 2 u g'(u) + u^2 g''(u) - \frac{4\pi^2 m^2}{X^2} u^{-2} g(u).

By hypothesis, the coefficient {\frac{4\pi^2 m^2}{X^2}} is bounded, and so {\tilde g} has all derivatives bounded while remaining supported in {[1,2]}. Because of this, the arguments used to establish (24) can be adapted without difficulty to establish (25).
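The formula for {\tilde g} is a pointwise identity and can be sanity-checked with finite differences. The sketch below assumes the convention {\Delta = y^2(\partial_{xx} + \partial_{yy})}, uses a Gaussian as a stand-in for {g} (it is not compactly supported, which does not matter for a pointwise check), and picks illustrative values of {m} and {X}; none of these choices come from the argument above.

```python
import cmath
import math

M = 1      # illustrative frequency m
X = 2.0    # illustrative scale X


def e(x: float) -> complex:
    """The standard character e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * math.pi * x)


def g(u: float) -> float:
    """Stand-in test function (a Gaussian bump centred at 1.5)."""
    return math.exp(-(u - 1.5) ** 2)


def g1(u: float) -> float:
    """First derivative of g."""
    return -2 * (u - 1.5) * g(u)


def g2(u: float) -> float:
    """Second derivative of g."""
    return (4 * (u - 1.5) ** 2 - 2) * g(u)


def F(x: float, y: float) -> complex:
    return e(M * x) * g(1 / (X * y))


def g_tilde(u: float) -> float:
    """The claimed profile: 2u g'(u) + u^2 g''(u) - (4 pi^2 m^2 / X^2) u^{-2} g(u)."""
    return 2 * u * g1(u) + u * u * g2(u) - (4 * math.pi ** 2 * M ** 2 / X ** 2) * g(u) / u ** 2


def laplacian_F(x: float, y: float, h: float = 1e-4) -> complex:
    """Hyperbolic Laplacian y^2 (F_xx + F_yy) via central differences."""
    fxx = (F(x + h, y) - 2 * F(x, y) + F(x - h, y)) / h ** 2
    fyy = (F(x, y + h) - 2 * F(x, y) + F(x, y - h)) / h ** 2
    return y * y * (fxx + fyy)


if __name__ == "__main__":
    x0, y0 = 0.3, 0.35
    print(laplacian_F(x0, y0), e(M * x0) * g_tilde(1 / (X * y0)))
```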

Using the expansion (14) of (13), we can write the left-hand side of (24) as

\displaystyle  m (\int_0^\infty |\tilde g( \frac{m}{Xy})|^2 \frac{dy}{y^2} + \sum_{q|c} \frac{S(m,m;c)}{c} V(\frac{m}{c}))

where

\displaystyle  V(u) := \frac{1}{u} \int_0^\infty \int_{\bf R} \tilde g( \frac{m}{Xy} ) \tilde g(\frac{m (x^2+y^2)}{X u^2 y}) e(- x - u^2 \frac{x}{x^2+y^2} )\ \frac{dx dy}{y^2}.

The first term can be computed to give a contribution of {O(X)}, so it suffices to show that

\displaystyle  \sum_{q|c} \frac{S(m,m;c)}{c} V(\frac{m}{c}) \ll m^{-1} q^{-1} X^{3/2+o(1)}. \ \ \ \ \ (26)

The quantity {V(u)} vanishes unless {u \gg m/X}. In that case, the integrand vanishes unless {y \sim m/X} and {x = O(u)}, so by the triangle inequality we have {V(u) \ll X/m}. So the left-hand side of (26) is bounded by

\displaystyle  \ll \frac{X}{m} \sum_{q|c: c \ll X} \frac{|S(m,m;c)|}{c}.

By the Weil bound for Kloosterman sums, we have {|S(m,m;c)| \ll X^{o(1)} (m,c)^{1/2} c^{1/2}}, so on factoring out {q} from {c} we can bound the previous expression by

\displaystyle  \ll \frac{X^{1+o(1)}}{m} q^{-1/2} \sum_{c: c \ll X/q} (m,c)^{1/2} c^{-1/2}

\displaystyle  \ll \frac{X^{1+o(1)}}{m} q^{-1/2} \sum_{d|m} \sum_{c: c \ll X/q; d|c} d^{1/2} c^{-1/2}

\displaystyle  \ll \frac{X^{1+o(1)}}{m} q^{-1/2} \sum_{d|m} d^{1/2} (\frac{X}{qd})^{1/2}

\displaystyle  \ll \frac{X^{1+o(1)}}{m} q^{-1/2} X^{o(1)} \frac{X^{1/2}}{q^{1/2}}

and the claim follows.
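The middle steps of the chain above amount to the elementary bound {\sum_{c \leq Y} (m,c)^{1/2} c^{-1/2} \leq 2 \tau(m) Y^{1/2}}, obtained by splitting according to the divisors {d} of {m} and using {\sum_{k \leq K} k^{-1/2} \leq 2 K^{1/2}}; the divisor bound {\tau(m) \ll X^{o(1)}} then gives the stated estimate with {Y = X/q}. A small numerical sketch of this inequality (function names are illustrative only):

```python
from math import gcd, sqrt


def gcd_weighted_sum(m: int, Y: int) -> float:
    """Compute the sum of gcd(m, c)^(1/2) * c^(-1/2) over 1 <= c <= Y."""
    return sum(sqrt(gcd(m, c)) / sqrt(c) for c in range(1, Y + 1))


def divisor_split_bound(m: int, Y: int) -> float:
    """Upper bound 2 * tau(m) * sqrt(Y) from splitting over divisors d | m."""
    tau = sum(1 for d in range(1, m + 1) if m % d == 0)
    return 2 * tau * sqrt(Y)


if __name__ == "__main__":
    for m in (1, 6, 12, 30):
        for Y in (10, 100, 1000):
            print(m, Y, gcd_weighted_sum(m, Y), divisor_split_bound(m, Y))
```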

Remark 3 By using improvements to Selberg’s 3/16 theorem (such as the result of Kim and Sarnak improving this fraction to {\frac{975}{4096}}) one can improve the second term in the right-hand side of (23) slightly.

Filed under: expository, math.AP, math.NT, math.SP Tagged: automorphic forms, Poincare series, wave equation

BackreactionMe, Elsewhere

A few links:

Tommaso DorigoNew Frontiers In Physics: The 2015 Conference In Kolimbari

Nowadays Physics is a very big chunk of science, and although in our University courses we try to give our students a basic knowledge of all of it, it has become increasingly clear that it is very hard to keep up to date with the developments in such diverse sub-fields as quantum optics, material science, particle physics, astrophysics, quantum field theory, statistical physics, thermodynamics, etcetera.

Simply put, there is not enough time within the average lifetime of a human being to read and learn about everything that is being studied in dozens of different disciplines that form what one may generically call "Physics".


Clifford JohnsonBeetlemania…

Fig beetles.


(Slightly blurred due to it being windy and a telephoto shot with a light handheld point-and-shoot...)

-cvj

The post Beetlemania… appeared first on Asymptotia.

August 23, 2015

Clifford JohnsonRed and Round…


Some more good results from the garden, after I thought that the whole crop was again going to be prematurely doomed, like last year. I tried to photograph the other thing about this year's gardening narrative that I intend to tell you about, but with poor results, but I'll say more shortly. In the meantime, for the record here are some Carmello tomatoes and some of a type of Russian Black [...]

The post Red and Round… appeared first on Asymptotia.

August 21, 2015

Doug NatelsonAnecdote 5: Becoming an experimentalist, and the Force-o-Matic

As an undergrad, I was a mechanical engineering major doing an engineering physics program from the engineering side.  When I was a sophomore, my lab partner in the engineering fluid mechanics course, Brian, was doing the same program, but from the physics side.  Rather than doing a pre-made lab, we chose to take the opportunity to do an experiment of our own devising.   We had a great plan.  We wanted to compare the drag forces on different shapes of boat hulls.  The course professor got us permission to go to a nearby research campus, where we would be able to take our homemade models and run them in their open water flow channel (like an infinity pool for engineering experiments) for about three hours one afternoon.  

The idea was simple:  The flowing water would tend to push the boat hull downstream due to drag.  We would attach a string to the hull, run the string over a pulley, and hang known masses on the end of the string, until the weight of the masses (transmitted via the string) pulled upstream to balance out the drag force - that way, when we had the right amount of weight on there, the boat hull would sit motionless in the flow channel.  By plotting the weight vs. the flow velocity, we'd be able to infer the dependence of the drag force on the flow speed, and we could compare different hull designs. 

Like many great ideas, this was wonderful right up until we actually tried to implement it in practice.  Because we were sophomores and didn't really have a good feel for the numbers, we hadn't estimated anything and tacitly assumed that our approach would work.  Instead, the drag forces on our beautiful homemade wood hulls were much smaller than we'd envisioned, so much so that just the horizontal component of the force from the sagging string itself was enough to hold the boats in place.  With only a couple of hours at our disposal, we had to face the fact that our whole measurement scheme was not going to work.

What did we do?  With improvisation that would have made MacGyver proud, we used a protractor, chewing gum, and the spring from a broken ballpoint pen to create a much "softer" force measurement apparatus, dubbed the Force-o-Matic.  With the gum, we anchored one end of the stretched spring to the "origin" point of the protractor, with the other end attached to a pointer made out of the pen cap, oriented to point vertically relative to the water surface.  With fine thread instead of the heavier string, we connected the boat hull to the tip of the pointer, so that tension in the thread laterally deflected the extended spring by some angle.  We could then later calibrate the force required to produce a certain angular deflection.  We got usable data, an A on the project, and a real introduction, vividly memorable 25 years later, to real experimental work.

Scott Aaronson6-photon BosonSampling

The news is more-or-less what the title says!

In Science, a group led by Anthony Laing at Bristol has now reported BosonSampling with 6 photons, beating their own previous record of 5 photons, as well as the earlier record of 4 photons achieved a few years ago by the Walmsley group at Oxford (as well as the 3-photon experiments done by groups around the world).  I only learned the big news from a commenter on this blog, after the paper was already published (protip: if you’ve pushed forward the BosonSampling frontier, feel free to shoot me an email about it).

As several people explain in the comments, the main advance in the paper is arguably not increasing the number of photons, but rather the fact that the device is completely reconfigurable: you can try hundreds of different unitary transformations with the same chip.  In addition, the 3-photon results have an unprecedentedly high fidelity (about 95%).

The 6-photon results are, of course, consistent with quantum mechanics: the transition amplitudes are indeed given by permanents of 6×6 complex matrices.  Key sentence:

After collecting 15 sixfold coincidence events, a confidence of P = 0.998 was determined that these are drawn from a quantum (not classical) distribution.

No one said scaling BosonSampling would be easy: I’m guessing that it took weeks of data-gathering to get those 15 coincidence events.  Scaling up further will probably require improvements to the sources.

There’s also a caveat: their initial state consisted of 2 modes with 3 photons each, as opposed to what we really want, which is 6 modes with 1 photon each.  (Likewise, in the Walmsley group’s 4-photon experiments, the initial state consisted of 2 modes with 2 photons each.)  If the number of modes stayed 2 forever, then the output distributions would remain easy to sample with a classical computer no matter how many photons we had, since we’d then get permanents of matrices with only 2 distinct rows.  So “scaling up” needs to mean increasing not only the number of photons, but also the number of sources.
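For readers who want to see what computing such an amplitude involves, here is a minimal sketch of the matrix permanent, done two ways: directly from the definition, and via Ryser's inclusion-exclusion formula. The function names are mine, and this is of course not the experimenters' code. For a 6×6 matrix the definition sums 720 products, while Ryser's formula sums over the 63 nonempty subsets of columns.

```python
from itertools import combinations, permutations
from math import prod


def permanent_naive(a):
    """Permanent from the definition: sum over all permutations of row-to-column products."""
    n = len(a)
    return sum(prod(a[i][s[i]] for i in range(n)) for s in permutations(range(n)))


def permanent_ryser(a):
    """Ryser's formula: (-1)^n * sum over nonempty column subsets S of
    (-1)^|S| * product over rows of the row sum restricted to S."""
    n = len(a)
    total = 0
    for k in range(1, n + 1):
        for cols in combinations(range(n), k):
            total += (-1) ** k * prod(sum(row[j] for j in cols) for row in a)
    return (-1) ** n * total


if __name__ == "__main__":
    # An arbitrary 3x3 complex matrix, just to compare the two methods.
    a = [[1 + 1j, 2, 0], [0.5, 1j, 1], [3, 1, 2 - 1j]]
    print(permanent_naive(a), permanent_ryser(a))
```

Both routines take time exponential in the matrix size, which is harmless at 6×6 and is the whole point at larger sizes.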

Nevertheless, this is an obvious step forward, and it came sooner than I expected.  Huge congratulations to the authors on their accomplishment!

But you might ask: given that 6×6 permanents are still pretty easy for a classical computer (the more so when the matrices have only 2 distinct rows), why should anyone care?  Well, the new result has major implications for what I’ve always regarded as the central goal of quantum computing research, much more important than breaking RSA or Grover search or even quantum simulation: namely, getting Gil Kalai to admit he was wrong.  Gil is on record, repeatedly, on this blog as well as his (see for example here), as saying that he doesn’t think BosonSampling will ever be possible even with 7 or 8 photons.  I don’t know whether the 6-photon result is giving him second thoughts (or sixth thoughts?) about that prediction.

August 20, 2015

John PreskillQuantum Information meets Quantum Matter

“Quantum Information meets Quantum Matter”, it sounds like the beginning of a perfect romance story. It is probably not the kind that makes an Oscar movie, but it does get many physicists excited, physicists including Bei, Duanlu, Xiaogang and me. Actually we find the story so compelling that we decided to write a book about it, and it all started one day in 2011 when Bei popped the question ‘Do you want to write a book about it?’ during one of our conversations.

This idea quickly sparked enthusiasm among the rest of us, who have all been working in this interdisciplinary area and are witness to its rising power. In fact Xiao-Gang has had the same idea of book writing for some time. So now here we are, four years later, posting the first version of the book on arXiv last week.  (arXiv link)

The book is a condensed matter book on the topic of strongly interacting many-body systems, with a special focus on the emergence of topological order. This is an exciting topic, with new developments every day. We are not trying to cover the whole picture, but rather to present just one perspective – the quantum information perspective – of the story. Quantum information ideas, like entanglement, quantum circuits, and quantum codes, are becoming ever more popular nowadays in condensed matter study and have led to many important developments. On the other hand, they are not usually taught in condensed matter courses or covered by condensed matter books. Therefore, we feel that writing a book may help bridge the gap.

We keep the writing self-contained, requiring minimal background in quantum information and condensed matter. The first part introduces concepts in quantum information that are going to be useful in the later study of condensed matter systems. (It is by no means a well-rounded introduction to quantum information and should not be read in that way.) The second part moves on to explaining one major topic of condensed matter theory, the local Hamiltonians and their ground states, and contains an introduction to the most basic concepts in condensed matter theory like locality, gap, universality, etc. The third part then focuses on the emergence of topological order, first presenting a historical and intuitive picture of topological order and then building a more systematic approach based on entanglement and quantum circuits. With this framework established, the fourth part studies some interesting topological phases in 1D and 2D, with the help of the tensor network formalism. Finally part V concludes with the outlook of where this miraculous encounter of quantum information and condensed matter would take us – the unification between information and matter.

We hope that, with such a structure, the book is accessible to both condensed matter students / researchers interested in this quantum information approach and also quantum information people who are interested in condensed matter topics. And of course, the book is also limited by the perspective we are taking. Compared to a standard condensed matter book, we are missing even the most elementary ingredient – the free fermion. Therefore, this book is not to be read as a standard textbook on condensed matter theory. On the other hand, by presenting a new approach, we hope to bring the readers to the frontiers of current research.

The most important thing I want to say here is: this arXiv version is NOT the final version. We posted it so that we can gather feedback from our colleagues. Therefore, it is not yet ready for junior students to read in order to learn the subject. On the other hand, if you are a researcher in a related field, please send us criticism, comments, suggestions, or whatever comes to your mind. We will be very grateful for that! (One thing we already learned (thanks Burak!) is that we forgot to put in all the references on conditional mutual information. That will be corrected in a later version, together with everything else.) The final version will be published by Springer as part of their “Quantum Information Science and Technology” series.

I guess it is quite obvious that me writing on the blog of the Institute for Quantum Information and Matter (IQIM) about this book titled “Quantum Information meets Quantum Matter” (QIQM) is not a simple coincidence. The romance story between the two emerged in the past decade or so and has been growing at a rate much beyond expectations. Our book is merely an attempt to record some aspects of the beginning. Let’s see where it will take us.

August 19, 2015

John PreskillBeware global search and replace!

I’m old enough to remember when cutting and pasting were really done with scissors and glue (or Scotch tape). When I was a graduate student in the late 1970s, few physicists typed their own papers, and if they did they left gaps in the text, to be filled in later with handwritten equations. The gold standard of technical typing was the IBM Correcting Selectric II typewriter. Among its innovations was the correction ribbon, which allowed one to remove a typo with the touch of a key. But it was especially important for scientists that the Selectric could type mathematical characters, including Greek letters.

IBM Selectric typeballs


It wasn’t easy. Many different typeballs were available, to support various fonts and special characters. Typing a displayed equation or in-line equation usually involved swapping back and forth between typeballs to access all the needed symbols. Most physics research groups had staff who knew how to use the IBM Selectric and spent much of their time typing manuscripts.

Though the IBM Selectric was used by many groups, typewriters have unique personalities, as forensic scientists know. I had a friend who claimed he had learned to recognize telltale differences among documents produced by various IBM Selectric machines. That way, whenever he received a referee report, he could identify its place of origin.

Manuscripts did not evolve through 23 typeset versions in those days, as one of my recent papers did. Editing was arduous and frustrating, particularly for a lowly graduate student like me, who needed to beg Blanche to set aside what she was doing for Steve Weinberg and devote a moment or two to working on my paper.

It was tremendously liberating when I learned to use TeX in 1990 and started typing my own papers. (Not LaTeX in those days, but Plain TeX embellished by a macro for formatting.) That was a technological advance that definitely improved my productivity. An earlier generation had felt the same way about the Xerox machine.

But as I was reminded a few days ago, while technological advances can be empowering, they can also be dangerous when used recklessly. I was editing a very long document, and decided to make a change. I had repeatedly used $x$ to denote an n-bit string, and thought it better to use $\vec x$ instead. I was walking through the paper with the replace button, changing each $x$ to $\vec x$ where the change seemed warranted. But I slipped once, and hit the “Replace All” button instead of “Replace.” My computer curtly informed me that it had made the replacement 1011 times. Oops …

This was a revocable error. There must have been a way to undo it (though it was not immediately obvious how). Or I could have closed the file without saving, losing some recent edits but limiting the damage.

But it was late at night and I was tired. I panicked, immediately saving and LaTeXing the file. It was a mess.

Okay, no problem, all I had to do was replace every \vec x with x and everything would be fine. Except that in the original replacement I had neglected to specify “Match Case.” In 264 places $X$ had become $\vec x$, and the new replacement did not restore the capitalization. It took hours to restore every $X$ by hand, and there are probably a few more that I haven’t noticed yet.

Which brings me to the cautionary tale of one of my former graduate students, Robert Navin. Rob’s thesis had two main topics, scattering off vortices and scattering off monopoles. On the night before the thesis due date, Rob made a horrifying discovery. The crux of his analysis of scattering off vortices concerned the singularity structure of a certain analytic function, and the chapter about vortices made many references to the poles of this function. What Rob realized at this late stage is that these singularities are actually branch points, not poles!

What to do? It’s late and you’re tired and your thesis is due in a few hours. Aha! Global search and replace! Rob replaced every occurrence of “pole” in his thesis by “branch point.” Problem solved.

Except … Rob had momentarily forgotten about that chapter on monopoles. Which, when I read the thesis, had been transformed into a chapter on monobranch points. His committee accepted the thesis, but requested some changes …

Rob Navin no longer does physics, but has been very successful in finance. I’m sure he’s more careful now.
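
Both mishaps come down to a replacement that matched more than intended. A minimal sketch of a safer replace in Python follows; the helper name `replace_word` and the sample sentences are made up for illustration. It assumes that matching whole words only (with an optional plural "s") and staying case-sensitive is the behaviour one wants.

```python
import re

def replace_word(text, old, new):
    """Replace `old` only where it stands alone as a word (optionally plural).

    The \\b anchors stop the pattern from matching inside longer words,
    and the search is case-sensitive because no IGNORECASE flag is given.
    """
    pattern = re.compile(r"\b" + re.escape(old) + r"(s?)\b")
    # A function as the replacement avoids backslash surprises in `new`.
    return pattern.sub(lambda m: new + m.group(1), text)

thesis = "The poles of f matter; monopoles scatter too, unlike a pole."
print(replace_word(thesis, "pole", "branch point"))
# The branch points of f matter; monopoles scatter too, unlike a branch point.

# Plain str.replace is case-sensitive, so $X$ survives a change to $x$.
print("$x$ and $X$".replace("$x$", r"$\vec x$"))
# $\vec x$ and $X$
```

Neither call is a substitute for reviewing the result, but both keep "monopoles" and "$X$" out of harm's way.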

August 18, 2015

Scott AaronsonJacob Bekenstein (1947-2015)

Today I learned the sad news that Jacob Bekenstein, one of the great theoretical physicists of our time, passed away at the too-early age of 68.

Everyone knows what a big deal it was when Stephen Hawking discovered in 1975 that black holes radiate.  Bekenstein was the guy who, as a grad student in Princeton in the early 1970s, was already raving about black holes having nonzero entropy and temperature, and satisfying the Second Law of Thermodynamics—something just about everyone, including Hawking, considered nuts at the time.  It was, as I understand it, Hawking’s failed attempt to prove Bekenstein wrong that led to Hawking’s discovery of the Hawking radiation, and thence to the modern picture of black holes.

In the decades since, Bekenstein continued to prove ingenious physical inequalities, often using thought experiments involving black holes.  The most famous of these, the Bekenstein bound, says that the number of bits that can be stored in any bounded physical system is finite, and is upper-bounded by ~2.6×10^43 MR, where M is the system’s mass in kilograms and R is its radius in meters.  (This bound is saturated by black holes, and only by black holes, which therefore emerge as the most compact possible storage medium, though probably not the best for retrieval!)  Bekenstein’s lectures were models of clarity and rigor: at conferences full of audacious speculations, he stood out to my non-expert eyes as someone who was simply trying to follow chains of logic from accepted physical principles, however mind-bogglingly far those chains led, but no further.
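
As a quick check of the coefficient quoted above, here is a small sketch. It assumes the standard form of the bound, I ≤ 2πRE/(ħc ln 2) with E = Mc², which is not spelled out in the post; the function name is illustrative.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J*s
C = 299792458.0         # speed of light, m/s

def bekenstein_bound_bits(mass_kg, radius_m):
    """Upper bound on stored bits, assuming I <= 2*pi*R*E / (hbar*c*ln 2), E = M c^2."""
    return 2 * math.pi * C * mass_kg * radius_m / (HBAR * math.log(2))

# Coefficient for M = 1 kg, R = 1 m; should come out near 2.6e43.
print(f"{bekenstein_bound_bits(1.0, 1.0):.2e}")
```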

I first met Bekenstein in 2003, when I was a grad student spending a semester at Hebrew University in Jerusalem.  I was struck by the kindness he showed a 21-year-old nobody, who wasn’t even a physics student, coming to bother him.  Not only did he listen patiently to my blather about applying computational complexity to physics, he said that of course physics should ultimately aim to understand everything as the output of some computer program, that he too was thinking in terms of computation when he studied black-hole entropy.  I remember pondering the fact that the greatest reductionist I’d ever met was wearing a yarmulke—and then scolding myself for wasting precious brain-cycles on such a trivial thought when there was science to discuss.  I met Bekenstein maybe four or five more times on visits to Israel, most recently a year and a half ago, when we shared walks to and from the hotel at a firewall workshop at the Weizmann Institute.  He was unfailingly warm, modest, and generous—totally devoid of the egotism that I’ve heard can occasionally afflict people of his stature.  Now, much like with the qubits hitting the event horizon, the information that comprised Jacob Bekenstein might seem to be gone, but it remains woven into the cosmos.

ResonaancesWeekend Plot: Inflation'15

The Planck collaboration is releasing new publications based on their full dataset, including CMB temperature and large-scale polarization data.  The updated values of the crucial cosmological parameters were already made public in December last year; however, one important new element is the combination of these results with the joint Planck/Bicep constraints on the CMB B-mode polarization.  The consequences for models of inflation are summarized in this plot:

It shows the constraints on the spectral index n_s and the tensor-to-scalar ratio r of the CMB fluctuations, compared to predictions of various single-field models of inflation.  The limits on n_s changed slightly compared to the previous release, but the more important progress is along the y-axis. After including the joint Planck/Bicep analysis (referred to as BKP in the plot), the combined limit on the tensor-to-scalar ratio becomes r < 0.08.  Just as importantly, the new limit is much more robust; for example, allowing for a scale dependence of the spectral index relaxes the bound only slightly, to r < 0.10.

The new results have a large impact on certain classes of models. The model with the quadratic inflaton potential, arguably the simplest model of inflation, is now strongly disfavored. Natural inflation, where the inflaton is a pseudo-Goldstone boson with a cosine potential, is in trouble. More generally, the data now favor a concave shape of the inflaton potential during the observable period of inflation; that is to say, it looks more like a hilltop than a half-pipe. A strong player emerging from this competition is R^2 inflation which, ironically, is the first model of inflation ever written.  That model is equivalent to an exponential shape of the inflaton potential, V = c[1-exp(-a φ/M_PL)]^2, with a = sqrt(2/3) in the exponent. A wider range of the exponent a can also fit the data, as long as a is not too small. If your favorite theory predicts an exponential potential of this form, it may be a good time to work on it. However, one should not forget that other shapes of the potential are still allowed, for example a similar exponential potential without the square, V ~ 1-exp(-a φ/M_PL), a linear potential V ~ φ, or more generally any power-law potential V ~ φ^n with the power n ≲ 1. At this point, the data do not significantly favor one or the other. The next waves of CMB polarization experiments should clarify the picture. In particular, R^2 inflation predicts 0.003 < r < 0.005, which should be testable in the not-so-distant future.
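
The quoted range for r can be reproduced with a short sketch. It assumes the usual leading-order slow-roll results for R^2 inflation, n_s ≈ 1 - 2/N and r ≈ 12/N^2, and a window of N = 50 to 60 e-folds; neither the formulas nor the window is stated in the post.

```python
def r2_inflation_predictions(n_efolds):
    """Leading-order large-N predictions (n_s, r) for R^2 inflation.

    Assumes n_s ~ 1 - 2/N and r ~ 12/N^2; corrections of higher order
    in 1/N are neglected.
    """
    n_s = 1.0 - 2.0 / n_efolds
    r = 12.0 / n_efolds**2
    return n_s, r

for n_efolds in (50, 60):
    n_s, r = r2_inflation_predictions(n_efolds)
    print(f"N = {n_efolds}: n_s = {n_s:.4f}, r = {r:.4f}")
# N = 50: n_s = 0.9600, r = 0.0048
# N = 60: n_s = 0.9667, r = 0.0033
```

Both values of r fall inside 0.003 < r < 0.005, far below the current limit r < 0.08.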

Planck's inflation paper is here.

ResonaancesHow long until it's interesting?

Last night, for the first time, the LHC collided particles at the center-of-mass energy of 13 TeV. Routine collisions should follow early in June. The plan is to collect 5-10 inverse femtobarns (fb^-1) of data before winter comes, adding to the 25 fb^-1 from Run-1. It's high time to dust off your Madgraph and tool up for what may be the most exciting time in particle physics in this century. But when exactly should we start getting excited? When should we start friending LHC experimentalists on facebook? When is the time to look over their shoulders for a glimpse of gluinos popping out of the detectors? One simple way to estimate the answer is to calculate the luminosity at which the number of particles produced at 13 TeV will exceed that produced during the whole of Run-1. This depends on the ratio of the production cross sections at 13 and 8 TeV, which is of course strongly dependent on the particle's mass and production mechanism. Moreover, the LHC discovery potential will also depend on how the background processes change, and on a host of other experimental issues.  Nevertheless, let us forget for a moment about the fine print, and calculate the ratio of 13 and 8 TeV cross sections for a few particles popular among the general public. This will give us a rough estimate of the threshold luminosity when things should get interesting.

  • Higgs boson: Ratio≈2.3; Luminosity≈10 fb^-1.
    Higgs physics will not be terribly exciting this year, with only a modest improvement of the couplings measurements expected. 
  • tth: Ratio≈4; Luminosity≈6 fb^-1.
    Nevertheless, for certain processes involving the Higgs boson the improvement may be a bit faster. In particular, the theoretically very important process of Higgs production in association with top quarks (tth) was on the verge of being detected in Run-1. If we're lucky, this year's data may tip the scale and provide evidence for a non-zero top Yukawa coupling. 
  • 300 GeV Higgs partner: Ratio≈2.7; Luminosity≈9 fb^-1.
    Not much hope for new scalars in the Higgs family this year.  
  • 800 GeV stops: Ratio≈10; Luminosity≈2 fb^-1.
    800 GeV is close to the current lower limit on the mass of a scalar top partner decaying to a top quark and a massless neutralino. In this case, one should remember that backgrounds also increase at 13 TeV, so the progress will be a bit slower than what the above number suggests. Nevertheless, this year we will certainly explore new parameter space and make the naturalness problem even more severe. Similar conclusions hold for a fermionic top partner. 
  • 3 TeV Z' boson: Ratio≈18; Luminosity≈1.2 fb^-1.
    Getting interesting! Limits on Z' bosons decaying to leptons will be improved very soon; moreover, in this case background is not an issue.  
  • 1.4 TeV gluino: Ratio≈30; Luminosity≈0.7 fb^-1.
    If all goes well, better limits on gluinos can be delivered by the end of the summer! 

In summary, the progress will be very fast for new heavy particles. In particular, for gluon-initiated production of TeV-scale particles, the very first inverse femtobarn may already bring us into new territory. For lighter particles the progress will be slower, especially when backgrounds are difficult.  On the other hand, precision physics, such as the Higgs couplings measurements, is unlikely to be in the spotlight this year.
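
The estimate in the list above amounts to dividing the Run-1 luminosity by the cross-section ratio. A minimal sketch, using the ratios quoted in the list and assuming the 25 fb^-1 Run-1 figure from the text as the reference (the function and variable names are illustrative):

```python
RUN1_LUMINOSITY = 25.0  # fb^-1, the Run-1 total quoted in the text

# Ratios sigma(13 TeV) / sigma(8 TeV) as quoted in the list above
CROSS_SECTION_RATIOS = {
    "Higgs boson": 2.3,
    "tth": 4.0,
    "300 GeV Higgs partner": 2.7,
    "800 GeV stops": 10.0,
    "3 TeV Z' boson": 18.0,
    "1.4 TeV gluino": 30.0,
}

def threshold_luminosity(ratio, run1_luminosity=RUN1_LUMINOSITY):
    """Luminosity at 13 TeV yielding as many events as all of Run-1."""
    return run1_luminosity / ratio

for name, ratio in CROSS_SECTION_RATIOS.items():
    print(f"{name}: {threshold_luminosity(ratio):.1f} fb^-1")
```

With 25 fb^-1 as the reference the results come out slightly above the quoted thresholds; the quoted numbers correspond to a reference of roughly 20 to 25 fb^-1, so they should be read as rounded estimates.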

ResonaancesWeekend Plot: Higgs mass and SUSY

This weekend's plot shows the region in the stop mass and mixing space of the MSSM that reproduces the measured Higgs boson mass of 125 GeV:

Unlike in the Standard Model, in the minimal supersymmetric extension of the Standard Model (MSSM) the Higgs boson mass is not a free parameter; it can be calculated given all masses and couplings of the supersymmetric particles. At the lowest order, it is equal to the Z boson mass, 91 GeV (for large enough tanβ). To reconcile the predicted and the observed Higgs mass, one needs to invoke large loop corrections due to supersymmetry breaking. These are dominated by the contribution of the top quark and its two scalar partners (stops), which couple most strongly of all particles to the Higgs. As can be seen in the plot above, the stop mass preferred by the Higgs mass measurement is around 10 TeV. With a little bit of conspiracy, if the mixing between the two stops is just right, this can be lowered to about 2 TeV. In any case, this means that, as long as the MSSM is the correct theory, there is little chance to discover the stops at the LHC.

This conclusion may be surprising because previous calculations were painting a more optimistic picture. The results above are derived with the new SUSYHD code, which utilizes effective field theory techniques to compute the Higgs mass in the presence of  heavy supersymmetric particles. Other frequently used codes, such as FeynHiggs or Suspect, obtain a significantly larger Higgs mass for the same supersymmetric spectrum, especially near the maximal mixing point. The difference can be clearly seen in the plot to the right (called the boobs plot by some experts). Although there is a  debate about the size of the error as estimated by SUSYHD, other effective theory calculations report the same central values.

ResonaancesWeekend plot: dark photon update

Here is a late weekend plot with new limits on the dark photon parameter space:

The dark photon is a hypothetical massive spin-1 boson mixing with the ordinary photon. The minimal model is fully characterized by just 2 parameters: the mass m_A' and the mixing angle ε. This scenario is probed by several different experiments using completely different techniques.  It is interesting to observe how quickly the experimental constraints have been improving in recent years. The latest update appeared a month ago thanks to the NA48 collaboration. NA48/2 was an experiment at CERN a decade ago devoted to studying CP violation in kaons. Kaons can decay to neutral pions, and the latter can be recycled into a nice probe of dark photons.  Most often, π0 decays to two photons. If the dark photon is lighter than 135 MeV, one of the photons can mix into an on-shell dark photon, which in turn can decay into an electron and a positron. Therefore, NA48 analyzed the π0 → γ e+ e- decays in their dataset. Such pion decays also occur in the Standard Model, with an off-shell photon instead of a dark photon in the intermediate state.  However, the presence of the dark photon would produce a peak in the invariant mass spectrum of the e+ e- pair on top of the smooth Standard Model background. Failure to see a significant peak allows one to set limits on the dark photon parameter space; see the dripping blood region in the plot.

So, another cute experiment bites into the dark photon parameter space.  After this update, one can robustly conclude that the mixing angle in the minimal model has to be less than 0.001 as long as the dark photon is lighter than 10 GeV. This is by itself not very revealing, because there is no theoretically preferred value of ε or m_A'.  However, one interesting consequence of the NA48 result is that it closes the window where the minimal model can explain the 3σ excess in the muon anomalous magnetic moment.

n-Category Café A Wrinkle in the Mathematical Universe

Of all the permutation groups, only $S_6$ has an outer automorphism. This puts a kind of ‘wrinkle’ in the fabric of mathematics, which would be nice to explore using category theory.

For starters, let $Bij_n$ be the groupoid of $n$-element sets and bijections between these. Only for $n = 6$ is there an equivalence from this groupoid to itself that isn’t naturally isomorphic to the identity!

This is just another way to say that only $S_6$ has an outer automorphism.

And here’s another way to play with this idea:

Given any category $X$, let $Aut(X)$ be the category where objects are equivalences $f : X \to X$ and morphisms are natural isomorphisms between these. This is like a group, since composition gives a functor

$$\circ : Aut(X) \times Aut(X) \to Aut(X)$$

which acts like the multiplication in a group. It’s like the symmetry group of $X$. But it’s not a group: it’s a ‘2-group’, or categorical group. It’s called the automorphism 2-group of $X$.

By calling it a 2-group, I mean that $Aut(X)$ is a monoidal category where all objects have weak inverses with respect to the tensor product, and all morphisms are invertible. Any pointed space has a fundamental 2-group, and this sets up a correspondence between 2-groups and connected pointed homotopy 2-types. So, topologists can have some fun with 2-groups!

Now consider $Bij_n$, the groupoid of $n$-element sets and bijections between them. Up to equivalence, we can describe $Aut(Bij_n)$ as follows. The objects are just automorphisms of $S_n$, while a morphism from an automorphism $f : S_n \to S_n$ to an automorphism $f' : S_n \to S_n$ is an element $g \in S_n$ that conjugates one automorphism to give the other:

$$f'(h) = g f(h) g^{-1} \qquad \forall h \in S_n$$

So, if all automorphisms of $S_n$ are inner, all objects of $Aut(Bij_n)$ are isomorphic to the unit object, and thus to each other.

Puzzle 1. For $n \ne 6$, all automorphisms of $S_n$ are inner. What are the connected pointed homotopy 2-types corresponding to $Aut(Bij_n)$ in these cases?

Puzzle 2. The permutation group $S_6$ has an outer automorphism of order 2, and indeed $Out(S_6) = \mathbb{Z}_2$. What is the connected pointed homotopy 2-type corresponding to $Aut(Bij_6)$?

Puzzle 3. Let $Bij$ be the groupoid where objects are finite sets and morphisms are bijections. $Bij$ is the coproduct of all the groupoids $Bij_n$ where $n \ge 0$:

$$Bij = \sum_{n = 0}^\infty Bij_n$$

Give a concrete description of the 2-group $Aut(Bij)$, up to equivalence. What is the corresponding pointed connected homotopy 2-type?

You can get a bit of intuition for the outer automorphism of $S_6$ using something called the Tutte–Coxeter graph.

Let $S = \{1,2,3,4,5,6\}$. Of course the symmetric group $S_6$ acts on $S$, but James Sylvester found a different action of $S_6$ on a 6-element set, which in turn gives an outer automorphism of $S_6$.

To do this, he made the following definitions:

• A duad is a 2-element subset of $S$. Note that there are ${6 \choose 2} = 15$ duads.

• A syntheme is a set of 3 duads forming a partition of $S$. There are also 15 synthemes.

• A synthematic total is a set of 5 synthemes partitioning the set of 15 duads. There are 6 synthematic totals.

Any permutation of $S$ gives a permutation of the set $T$ of synthematic totals, so we obtain an action of $S_6$ on $T$. Choosing any bijection between $S$ and $T$, this in turn gives an action of $S_6$ on $S$, and thus a homomorphism from $S_6$ to itself. Sylvester showed that this is an outer automorphism!
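
Sylvester's construction is small enough to check by brute force. The sketch below enumerates duads, synthemes and synthematic totals, then looks at how a transposition of $S$ acts on $T$. The function names are illustrative, and the ordering of the totals (which plays the role of the chosen bijection between $S$ and $T$) is simply whatever the enumeration produces.

```python
from itertools import combinations, permutations

S = range(1, 7)
duads = [frozenset(c) for c in combinations(S, 2)]
# A syntheme: 3 duads whose union is all of S (so they are disjoint).
synthemes = [frozenset(t) for t in combinations(duads, 3)
             if len(frozenset().union(*t)) == 6]
# A synthematic total: 5 synthemes whose union is all 15 duads.
totals = [frozenset(t) for t in combinations(synthemes, 5)
          if len(frozenset().union(*t)) == 15]
print(len(duads), len(synthemes), len(totals))  # 15 15 6

def act(perm, x):
    """Apply a permutation of S (a dict) to an element or nested set."""
    if isinstance(x, frozenset):
        return frozenset(act(perm, y) for y in x)
    return perm[x]

total_index = {t: i for i, t in enumerate(totals)}

def induced_permutation(perm):
    """The permutation of the 6 totals induced by a permutation of S."""
    return tuple(total_index[act(perm, t)] for t in totals)

# The transposition (1 2) fixes four points of S ...
swap_12 = {1: 2, 2: 1, 3: 3, 4: 4, 5: 5, 6: 6}
image = induced_permutation(swap_12)
fixed_totals = sum(1 for i, j in enumerate(image) if i == j)
print(fixed_totals)  # 0: ... but moves every synthematic total

# Distinct permutations of S induce distinct permutations of T.
images = {induced_permutation(dict(zip(S, p))) for p in permutations(S)}
print(len(images))  # 720
```

An inner automorphism preserves cycle type, so it would send a transposition to a permutation with four fixed points. Here the image has none, which is the computational shadow of the automorphism being outer.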

There’s a way to draw this situation. It’s a bit tricky, but Greg Egan has kindly done it:

Here we see 15 small red blobs: these are the duads. We also see 15 larger blue blobs: these are the synthemes. We draw an edge from a duad to a syntheme whenever that duad lies in that syntheme. The result is a graph called the Tutte–Coxeter graph, with 30 vertices and 45 edges.
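
The vertex and edge counts can be confirmed with a few lines, building the graph directly from the definitions of duads and synthemes (a sketch; variable names are illustrative):

```python
from collections import Counter
from itertools import combinations

S = range(1, 7)
duads = [frozenset(c) for c in combinations(S, 2)]
synthemes = [frozenset(t) for t in combinations(duads, 3)
             if len(frozenset().union(*t)) == 6]

# One edge for each (duad, syntheme) pair with the duad in the syntheme.
edges = [(d, s) for s in synthemes for d in s]

degree = Counter()
for d, s in edges:
    degree[d] += 1
    degree[s] += 1

print(len(degree), len(edges))  # 30 45
print(set(degree.values()))     # {3}: every vertex has exactly 3 neighbours
```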

The 6 concentric rings around the picture are the 6 synthematic totals. A band of color appears in one of these rings near some syntheme if that syntheme is part of that synthematic total.

If we draw the Tutte–Coxeter graph without all the decorations, it looks like this:

The red vertices come from duads, the blue ones from synthemes. The outer automorphism of $S_6$ gives a symmetry of the Tutte–Coxeter graph that switches the red and blue vertices!

The inner automorphisms, which correspond to elements of $S_6$, also give symmetries: for each element of $S_6$, the Tutte–Coxeter graph has a symmetry that permutes the numbers in the picture. These symmetries map red vertices to red ones and blue vertices to blue ones.

The group $\mathrm{Aut}(S_6)$ has

$$2 \times 6! = 1440$$

elements, coming from the $6!$ inner automorphisms of $S_6$ and the outer automorphism of order 2. In fact, $\mathrm{Aut}(S_6)$ is the whole symmetry group of the Tutte–Coxeter graph.

For more on the Tutte–Coxeter graph, see my post on the AMS-hosted blog Visual Insight: