More adventures in communicating the awesomeness of physics! Yesterday I spent a gruelling seven hours in the sun talking about the development of various ideas in physics over the centuries for a new show (to air next year) on PBS. Interestingly, we did all of this at a spot that, in less dry times, would have been underwater. It was up at Lake Piru, which, due to the drought, is far below capacity. You can see this by going to Google Maps, looking at the representation of the lake's shape on the map, and then switching to the satellite overlay to see its much changed (and reduced) shape in recent times.

There's an impressive dam at one end of the lake/reservoir, and I will admit that I did not resist the temptation to pull over, look at a nice view of it from the road on the way home, and say out loud "daaayuum". An offering to the god Pun, you see.

Turns out that there's a wide range of wildlife, large and small, trudging around on the [...]

The post PBS Shoot Fun appeared first on Asymptotia.

Today, I officially stopped being department chair, and started my sabbatical leave. I also acquired a new toy:

My old DSLR camera, a Canon Rebel XSi that I got *mumble* years ago, has been very good for over 20,000 pictures, but a few things about it were getting kind of flaky – it’s been bad at reading light levels for a while now, meaning I’m constantly having to monkey with the ISO setting manually, then forgetting to change it back when I move to a brighter location and taking a bunch of pictures where everything is all blown out. It also stopped talking to the external flash unit, after I had gotten used to the indirect flash option (though it looks like that might be a problem with the external flash unit, as it won’t talk to the new camera, either…). I could probably get those issues fixed, but it would probably cost about the same as a new camera body, with new cool features, so…

So I got a new camera. (And, as it turned out, I can now spend my vast accumulation of credit-card reward points (that I never remember to do anything with) on Amazon, so I didn’t spend any real money on this at all…) This is a Canon Rebel T6i, a slight upgrade from the XSi (not a full professional-grade DSLR), but the advance of technology over the last *mumble* years means it has a bunch of fun new features– bigger sensor, faster speeds, video capability. So it’ll be a nice toy.

As a bonus, I’m also acquiring a casual sabbatical project– for a while now, I’ve watched other people do the photo-a-day thing, and thought that might be fun. While I was saddled with administrative hassles of being chair, that wasn’t really going to work, but while I’m on sabbatical, I can devote a little time every day to taking photos.

So, I’ll be trying to take and post one picture a day for the next year (Or at least GIMPing photos taken on a previous day to get them down to a size I can post here…)– which happens to be a leap year, so 366 days, woo-hoo– and we’ll see how that works out. For the officially official first photo of the set, we’ll use one of the test pictures I took of Emmy:

I cropped and scaled this, but didn’t do any color tweaking. Emmy’s too jaded about photography to actually turn her head and look at the camera, but this pose has a certain dignity…

*(For comparison, here’s the first posted photo from the previous camera.)*

If you tie your laces, loops and strings might seem like parts of the same equation, but when it comes to quantum gravity they don’t have much in common. String Theory and Loop Quantum Gravity, both attempts to consistently combine Einstein’s General Relativity with quantum theory, rest on entirely different premises.

String theory posits that everything, including the quanta of the gravitational field, is made up of vibrating strings, characterized by nothing but their tension. Loop Quantum Gravity is a way to quantize gravity while staying as close as possible to the quantization methods that have been successful with the other interactions. The mathematical realization of the two theories is completely different, too. The former builds on dynamically interacting strings which give rise to higher-dimensional membranes, and leads to a remarkably complex theoretical construct that might or might not actually describe the quantum properties of space and time in our universe. The latter divides space-time into spacelike slices, and then further chops up the slices into discrete chunks to which quantum properties are assigned. This, too, might or might not describe the quantum properties of space and time in our universe...

String theory and Loop Quantum Gravity also differ in their ambition. While String Theory is meant to be a theory for gravity and all the other interactions – a “Theory of Everything” – Loop Quantum Gravity merely aims at finding a quantum version of gravity, leaving aside the quantum properties of matter.

Needless to say, each side claims their approach is the better one. String theorists argue that taking into account all we know about the other interactions provides additional guidance. Researchers in Loop Quantum Gravity emphasize their modest and minimalist approach, which carries over the previously successful quantization methods in the most conservative way possible.

They’ve been arguing for three decades now, but maybe there’s an end in sight.

In a little-noted paper out last year, Jorge Pullin and Rodolfo Gambini argue that taking into account the interaction of matter on a loop-quantized space-time forces one to use a type of interaction that is very similar to that also found in effective models of string interactions.

The reason is Lorentz-invariance, the symmetry of special relativity. The problem with the quantization in Loop Quantum Gravity comes from the difficulty of making anything discrete Lorentz-invariant and thus compatible with special relativity and, ultimately, general relativity. The splitting of space-time into slices is not a priori a problem as long as you don’t introduce any particular length scale on the resulting slices. Once you do that, you’re stuck with a particular slicing, thus ruining Lorentz-invariance. And if you fix the size of a loop, or the length of a link in a network, that’s exactly what happens.

There have been twenty years of debate about whether or not the fate of Lorentz-invariance in Loop Quantum Gravity is really problematic, because it isn’t clear exactly how it would make itself noticeable in observations as long as you are dealing with the gravitational sector only. But once you start putting matter on the now-quantized space, you have something to calculate.

Pullin and Gambini – both from the field of LQG it must be mentioned! – argue that the Lorentz-invariance violation inevitably creeps into the matter sector if one uses local quantum field theory on the loop quantized space. But that violation of Lorentz-invariance in the matter sector would be in conflict with experiment, so that can’t be correct. Instead they suggest that this problem can be circumvented by using an interaction that is non-local in a particular way, which serves to suppress unwanted contributions that spoil Lorentz-invariance. This non-locality is similar to the non-locality that one finds in low-energy string scattering, where the non-locality is a consequence of the extension of the strings. They write:

“It should be noted that this is the first instance in which loop quantum gravity imposes restrictions on the matter content of the theory. Up to now loop quantum gravity, in contrast to supergravity or string theory, did not appear to impose any restrictions on matter. Here we are seeing that in order to be consistent with Lorentz invariance at small energies, limitations on the types of interactions that can be considered arise.”

In a nutshell it means that they’re acknowledging they have a problem and that the only way to solve it is to inch closer to string theory.

But let me extrapolate their paper, if you allow. It doesn’t stop at the matter sector, of course: if one doesn’t assume a fixed background, as they do in the paper, one should also have gravitons, and these need to have an interaction too. This interaction will suffer from the same problem, unless you cure it by the same means. Consequently, you will in the end have to modify the quantization procedure for gravity itself. And while I’m at it anyway, I think a good way to remedy the problem would be to not force the loops to have a fixed length, but to make them dynamical and give them a tension...

I’ll stop here because I know just enough of both string theory and loop quantum gravity to realize that technically this doesn’t make a lot of sense (among many other things because you don’t quantize loops, they are the quantization), and I have no idea how to make this formally correct. All I want to say is that after thirty years maybe something is finally starting to happen.

Should this come as a surprise?

It shouldn’t if you’ve read my review on

Does that mean that the string is the thing? No, because this doesn’t actually tell you anything specific about the UV completion, except that it must have a well-behaved type of non-local interaction that Loop Quantum Gravity doesn’t seem to bring, or at least it isn’t presently understood how it would. Either way, I find this an interesting development.

The great benefit of writing a blog is that I’m not required to contact “researchers not involved in the study” and ask for an “outside opinion.” It’s also entirely superfluous because I can just tell you myself that the String Theorist said “well, it’s about time” and the Loop Quantum Gravity person said “that’s very controversial and actually there is also this paper and that approach which says something different.” Good thing you have me to be plainly unapologetically annoying ;) My pleasure.

This morning, we started with a talk by Taku Izubuchi, who reviewed the lattice efforts relating to the hadronic contributions to the anomalous magnetic moment (g-2) of the muon. While the QED and electroweak contributions to (g-2) are known to great precision, most of the theoretical uncertainty presently comes from the hadronic (i.e. QCD) contributions, of which two are relevant at the present level of precision: the contribution from the hadronic vacuum polarization, which can be inserted into the leading-order QED correction, and the contribution from hadronic light-by-light scattering, which can be inserted between the incoming external photon and the muon line. There are a number of established methods for computing the hadronic vacuum polarization, both phenomenologically, using a dispersion relation and the experimental R-ratio, and in lattice field theory, by computing the correlator of two vector currents (which can, and needs to, be refined in various ways in order to achieve competitive levels of precision). No such well-established methods exist yet for light-by-light scattering, which is so far mostly described using models. There are, however, now efforts from a number of different sides to tackle this contribution; Taku mainly presented the approach of the RBC/UKQCD collaboration, which uses stochastic sampling of the internal photon propagators to explicitly compute the diagrams contributing to (g-2). Another approach would be to calculate the four-point amplitude explicitly (which has recently been done for the first time by the Mainz group) and to decompose it into form factors, which can then be integrated to yield the light-by-light scattering contribution to (g-2).

The second talk of the day was given by Petros Dimopoulos, who reviewed lattice determinations of D and B leptonic decays and mixing. For the charm quark, cut-off effects appear to be reasonably well-controlled with present-day lattice spacings and actions, and the most precise lattice results for the D and D_{s} decay constants claim sub-percent accuracy. For the b quark, effective field theories or extrapolation methods have to be used, which introduces a source of hard-to-assess theoretical uncertainty, but the results obtained from the different approaches generally agree very well amongst themselves. Interestingly, there does not seem to be any noticeable dependence on the number of dynamical flavours in the heavy-quark flavour observables, as N_{f}=2 and N_{f}=2+1+1 results agree very well to within the quoted precisions.

In the afternoon, the CKMfitter collaboration split off to hold their own meeting, and the lattice participants met for a few one-on-one or small-group discussions of some topics of interest.

Here you see three planets. The blue planet is orbiting the Sun in a realistic way: it’s going around an ellipse.

The other two are moving *in and out* just like the blue planet, so they all stay on the same circle. But they’re moving around this circle at different rates! The green planet is moving faster than the blue one: it completes 3 orbits each time the blue planet goes around once. The red planet isn’t going around at all: it only moves in and out.

What’s going on here?

In 1687, Isaac Newton published his *Principia Mathematica*. This book is famous, but in Propositions 43–45 of Book I he did something that people didn’t talk about much—until recently. He figured out what extra force, besides gravity, would make a planet move like one of these weird other planets. It turns out an extra force obeying an inverse cube law will do the job!

Let me make this more precise. We’re only interested in ‘central forces’ here. A **central force** is one that only pushes a particle towards or away from some chosen point, and only depends on the particle’s distance from that point. In Newton’s theory, gravity is a central force obeying an inverse square law:

$$F(r) = -\frac{a}{r^2}$$

for some constant $a$. But he considered adding an extra central force obeying an *inverse cube* law:

$$F(r) = -\frac{a}{r^2} + \frac{b}{r^3}$$

He showed that if you do this, for any motion of a particle in the force of gravity you can find a motion of a particle in gravity plus this extra force, where the distance is the same, but the angle is not.

In fact Newton did more. He showed that if we start with *any* central force, adding an inverse cube force has this effect.
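Newton’s theorem is easy to check numerically. The sketch below is my own illustration, not Newton’s construction: in units where the mass and the inverse-square strength are 1 (an assumption), it integrates the radial equation for an ordinary Kepler orbit and for a “green planet” with three times the angular momentum plus a precisely compensating inverse cube force, and confirms that the radial motions coincide while the angle advances three times as fast.

```python
import numpy as np
from scipy.integrate import solve_ivp

k = 1.0      # strength of the inverse square force (assumed units)
L1 = 1.0     # angular momentum of the original "blue" planet
s = 3.0      # angular speed-up factor for the "green" planet
L2 = s * L1

# Radial equation (mass m = 1):  r'' = -k/r^2 + L^2/r^3 + extra/r^3.
# Choosing extra = L1^2 - L2^2 makes the second planet's radial
# equation identical to the first one's, as in Newton's theorem.
def rhs(t, y, L, extra):
    r, rdot, theta = y
    rddot = -k / r**2 + L**2 / r**3 + extra / r**3
    return [rdot, rddot, L / r**2]

y0 = [1.5, 0.0, 0.0]   # start at the far point of an elliptical orbit
t_eval = np.linspace(0.0, 20.0, 2001)

sol1 = solve_ivp(rhs, (0, 20), y0, args=(L1, 0.0),
                 t_eval=t_eval, rtol=1e-10, atol=1e-12)
sol2 = solve_ivp(rhs, (0, 20), y0, args=(L2, L1**2 - L2**2),
                 t_eval=t_eval, rtol=1e-10, atol=1e-12)

r_diff = np.max(np.abs(sol1.y[0] - sol2.y[0]))  # radial motions agree
angle_ratio = sol2.y[2][-1] / sol1.y[2][-1]     # angles differ by factor s
print(r_diff, angle_ratio)
```

The second planet’s extra inverse cube force exactly cancels the change in the centrifugal term, so `r_diff` is at the level of integration error and `angle_ratio` comes out as 3.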

There’s a very long page about this on Wikipedia:

• Newton’s theorem of revolving orbits, Wikipedia.

I haven’t fully understood all of this, but it instantly makes me think of three other things I know about the inverse cube force law, which are probably related. So maybe you can help me figure out the relationship.

The first, and simplest, is this. Suppose we have a particle in a central force. It will move in a plane, so we can use polar coordinates $r, \theta$ to describe its position. We can describe the force away from the origin as a function $F(r)$. Then the radial part of the particle’s motion obeys this equation:

$$m \ddot{r} = F(r) + \frac{L^2}{mr^3}$$

where $L$ is the magnitude of the particle’s angular momentum.

So, angular momentum acts to provide a ‘fictitious force’ pushing the particle out, which one might call the centrifugal force. And this force obeys an inverse cube force law!

Furthermore, thanks to the formula above, it’s pretty obvious that if you change $L$ but also add a precisely compensating inverse cube force, the value of $F(r) + \frac{L^2}{mr^3}$ will be unchanged! So, we can set things up so that the particle’s radial motion will be unchanged. But its angular motion will be different, since it has a different angular momentum. This explains Newton’s observation.

It’s often handy to write a central force in terms of a potential:

$$F(r) = -V'(r)$$

Then we can make up an extra potential responsible for the centrifugal force, and combine it with the actual potential $V(r)$ into a so-called **effective potential**:

$$U(r) = V(r) + \frac{L^2}{2mr^2}$$

The particle’s radial motion then obeys a simple equation:

$$m\ddot{r} = -U'(r)$$

For a particle in gravity, where the force obeys an inverse square law and $V(r)$ is proportional to $-1/r$, the effective potential might look like this:

This is the graph of an effective potential of the form $U(r) = -\frac{k}{r} + \frac{L^2}{2mr^2}$ for typical positive values of the constants.

If you’re used to particles rolling around in potentials, you can easily see that a particle with not too much energy will move back and forth, never making it to $r = 0$ or $r = \infty$. This corresponds to an elliptical orbit. Give it more energy and the particle can escape to infinity, but it will never hit the origin. The repulsive ‘centrifugal force’ always overwhelms the attraction of gravity near the origin, at least if the angular momentum is nonzero.

On the other hand, suppose we have a particle moving in an attractive inverse cube force! Then the potential is proportional to $1/r^2$, so the effective potential is

$$U(r) = \frac{c}{r^2} + \frac{L^2}{2mr^2}$$

where $c$ is negative for an attractive force. If this attractive force is big enough, namely

$$c < -\frac{L^2}{2m}$$

then it can exceed the centrifugal force, and the particle can fall in to $r = 0$. If we keep track of the angular coordinate $\theta$, we can see what’s really going on. The particle is *spiraling in to its doom*, hitting the origin in a finite amount of time!
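Here is a small numerical illustration of the spiral-in, of my own devising (the values $m = L = 1$ and $c = -1$ are assumptions chosen so the attraction wins): the radius hits (essentially) zero in finite time.

```python
import numpy as np
from scipy.integrate import solve_ivp

L = 1.0
c = -1.0   # attractive inverse cube force: potential V(r) = c/r^2
# Effective potential U(r) = (c + L^2/2)/r^2 with m = 1;
# here c + L^2/2 = -0.5 < 0, so the particle can fall in.

def rhs(t, y):
    r, rdot = y
    # r'' = -U'(r) = 2*(c + L**2/2)/r**3
    return [rdot, 2.0 * (c + L**2 / 2) / r**3]

# Stop the integration when r gets very close to the origin.
hit = lambda t, y: y[0] - 1e-3
hit.terminal, hit.direction = True, -1

sol = solve_ivp(rhs, (0, 100), [1.0, 0.0], events=hit, rtol=1e-9)
print(sol.t_events[0])   # the particle reaches the origin in finite time
```

Starting at rest at $r = 1$, conservation of energy gives $\dot r^2 = 1/r^2 - 1$ for these parameters, so the fall time works out to about $t = 1$, which is what the integrator reports.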

This should remind you of a black hole, and indeed something similar happens there, but even more drastic:

• Schwarzschild geodesics: effective radial potential energy, Wikipedia.

For a nonrotating uncharged black hole, the effective potential has three terms. Like Newtonian gravity it has an attractive $-1/r$ term and a repulsive $1/r^2$ term. But it also has an attractive $-1/r^3$ term! In other words, it’s as if on top of Newtonian gravity, we had another attractive force obeying an inverse *fourth power* law! This overwhelms the others at short distances, so if you get too close to a black hole, you spiral in to your doom.
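For concreteness, the standard textbook form of that effective potential (my addition; here $c$ is the speed of light, not the coupling constant used elsewhere in this post, and $M$ is the black hole mass) is:

```latex
U(r) = -\frac{GMm}{r} + \frac{L^2}{2mr^2} - \frac{GM L^2}{c^2 m r^3}
```

The last term is the general-relativistic correction: it scales as $-1/r^3$, corresponding to an attractive inverse fourth power force that dominates at small $r$.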

For example, a black hole can have an effective potential like this:

But back to inverse cube force laws! I know two more things about them. A while back I discussed how a particle in an inverse square force can be reinterpreted as a harmonic oscillator:

• Planets in the fourth dimension, *Azimuth*.

There are many ways to think about this, and apparently the idea in some form goes all the way back to Newton! It involves a sneaky way to take a particle in a potential

$$V(r) = -\frac{k}{r}$$

and think of it as moving around in the complex plane. Then if you *square* its position—thought of as a complex number—and cleverly *reparametrize time*, you get a particle moving in a potential

$$V(r) = k r^2$$

This amazing trick can be generalized! A particle in a potential

$$V(r) \propto r^p$$

can be transformed to a particle in a potential

$$V(r) \propto r^q$$

if

$$(p+2)(q+2) = 4$$

A good description is here:

• Rachel W. Hall and Krešimir Josić, Planetary motion and the duality of force laws, *SIAM Review* **42** (2000), 115–124.

This trick transforms particles in potentials with $p$ ranging between $-2$ and $+\infty$ to potentials with $q$ ranging between $+\infty$ and $-2$. It’s like a see-saw: when $p$ is small, $q$ is big, and vice versa.

But you’ll notice this trick doesn’t actually work *at* $p = -2$, the case that corresponds to the inverse cube force law. The problem is that $p + 2 = 0$ in this case, so we can’t find $q$ with $(p+2)(q+2) = 4$.
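The see-saw, and its breakdown at the inverse cube case, fits in a few lines of code. This is my own sketch; `dual_exponent` is a made-up helper that just solves $(p+2)(q+2) = 4$ for $q$:

```python
from fractions import Fraction

def dual_exponent(p):
    """Dual power q with (p + 2)(q + 2) = 4; undefined at p = -2."""
    p = Fraction(p)
    if p == -2:
        raise ValueError("p = -2 (the inverse cube force) has no dual partner")
    return 4 / (p + 2) - 2

print(dual_exponent(-1))   # Kepler potential r^-1 -> harmonic oscillator r^2
print(dual_exponent(2))    # and back again: r^2 -> r^-1
```

Plugging in $p = -1$ (Kepler) returns $q = 2$ (harmonic oscillator) and vice versa, while $p = -2$ raises an error, mirroring the missing partner.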

So, the inverse cube force is special in three ways: it’s the one that you can add on to any force to get solutions with the same radial motion but different angular motion, it’s the one that naturally describes the ‘centrifugal force’, and it’s the one that doesn’t have a partner! We’ve seen how the first two ways are secretly the same. I don’t know about the third, but I’m hopeful.

Finally, here’s a fourth way in which the inverse cube law is special. This shows up most visibly in quantum mechanics… and this is what got me interested in this business in the first place.

You see, I’m writing a paper called ‘Struggles with the continuum’, which discusses problems in analysis that arise when you try to make some of our favorite theories of physics make sense. The inverse square force law poses interesting problems of this sort, which I plan to discuss. But I started wanting to compare the inverse cube force law, just so people can see things that go wrong in this case, and not take our successes with the inverse square law for granted.

Unfortunately a huge digression on the inverse cube force law would be out of place in that paper. So, I’m offloading some of that material to here.

In quantum mechanics, a particle moving in an inverse cube force law has a Hamiltonian like this:

$$H = -\nabla^2 + \frac{c}{r^2}$$

The first term describes the kinetic energy, while the second describes the potential energy. I’m setting $\hbar = 1$ and $2m = 1$ to remove some clutter that doesn’t really affect the key issues.

To see how strange this Hamiltonian is, let me compare an easier case. If $p < 2$, the Hamiltonian

$$H = -\nabla^2 + \frac{c}{r^p}$$

is essentially self-adjoint on $C_0^\infty(\mathbb{R}^3 \setminus \{0\})$, which is the space of compactly supported smooth functions on 3d Euclidean space minus the origin. What this means is that, first of all, $H$ is defined on this domain: it maps functions in this domain to functions in $L^2(\mathbb{R}^3)$. But more importantly, it means we can uniquely extend $H$ from this domain to a self-adjoint operator on some larger domain. In quantum physics, we want our Hamiltonians to be self-adjoint. So, this fact is good.

Proving this fact is fairly hard! It uses something called the Kato–Lax–Milgram–Nelson theorem together with this beautiful inequality:

$$\int_{\mathbb{R}^3} \frac{|\psi(x)|^2}{4r^2} \, d^3x \;\le\; \int_{\mathbb{R}^3} |\nabla \psi(x)|^2 \, d^3x$$

for any $\psi \in C_0^\infty(\mathbb{R}^3)$.

If you think hard, you can see this inequality is actually a fact about the quantum mechanics of the inverse cube law! It says that if $c \ge -\frac{1}{4}$, the energy of a quantum particle in the potential $\frac{c}{r^2}$ is bounded below. And in a sense, this inequality is optimal: if $c < -\frac{1}{4}$, the energy is *not* bounded below. This is the quantum version of how a classical particle can spiral in to its doom in an attractive inverse cube law, if it doesn’t have enough angular momentum. But it’s subtly and mysteriously different.
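You can at least sanity-check the inequality numerically. This is my own illustration (a Gaussian trial function is an assumption, as is reducing the 3d integrals to radial ones for a spherically symmetric $\psi$):

```python
import numpy as np
from scipy.integrate import quad

# Trial wavefunction psi(r) = exp(-r^2/2), spherically symmetric.
psi  = lambda r: np.exp(-r**2 / 2)
dpsi = lambda r: -r * np.exp(-r**2 / 2)

# For radial functions, the 3d integral is 4*pi * integral of f(r) r^2 dr.
hardy_lhs = 4 * np.pi * quad(lambda r: psi(r)**2 / (4 * r**2) * r**2,
                             0, np.inf)[0]
grad_term = 4 * np.pi * quad(lambda r: dpsi(r)**2 * r**2, 0, np.inf)[0]

print(hardy_lhs, grad_term)   # the gradient term dominates, as promised
```

For this particular Gaussian the two sides can be done in closed form, $\pi^{3/2}/2$ versus $3\pi^{3/2}/2$, so the gradient term wins by exactly a factor of 3; of course one trial function proves nothing, but it shows what the inequality is saying.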

You may wonder how this inequality is used to prove good things about potentials that are ‘less singular’ than the $1/r^2$ potential: that is, potentials with $p < 2$. For that, you have to use some tricks that I don’t want to explain here. I also don’t want to prove this inequality, or explain why it’s optimal! You can find most of this in some old course notes of mine:

• John Baez, *Quantum Theory and Analysis*, 1989.

See especially section 15.

But it’s pretty easy to see how this inequality implies things about the expected energy of a quantum particle in the potential $\frac{c}{r^2}$. So let’s do that.

In this potential, the expected energy of a state $\psi$ is:

$$\langle \psi, H \psi \rangle = \int_{\mathbb{R}^3} \overline{\psi(x)} \left( -\nabla^2 \psi(x) + \frac{c}{r^2}\,\psi(x) \right) d^3x$$

Doing an integration by parts, this gives:

$$\langle \psi, H \psi \rangle = \int_{\mathbb{R}^3} |\nabla \psi(x)|^2 + \frac{c\,|\psi(x)|^2}{r^2} \, d^3x$$

The inequality I showed you says precisely that when $c = -\frac{1}{4}$, this is greater than or equal to zero. So, the expected energy is actually *nonnegative* in this case! And making $c$ greater than $-\frac{1}{4}$ only makes the expected energy bigger.

Note that in classical mechanics, the energy of a particle in this potential ceases to be bounded below as soon as $c < 0$. Quantum mechanics is different because of the uncertainty principle! To get a lot of negative potential energy, the particle’s wavefunction must be squished near the origin, but that gives it kinetic energy.

It turns out that the Hamiltonian for a quantum particle in an inverse cube force law has exquisitely subtle and tricky behavior. Many people have written about it, running into ‘paradoxes’ when they weren’t careful enough. Only rather recently have things been straightened out.

For starters, the Hamiltonian for this kind of particle

$$H = -\nabla^2 + \frac{c}{r^2}$$

has different behaviors depending on $c$. Obviously the force is attractive when $c < 0$ and repulsive when $c > 0$, but that’s not the only thing that matters! Here’s a summary:


• $c \ge \frac{3}{4}$: In this case $H$ is essentially self-adjoint on $C_0^\infty(\mathbb{R}^3 \setminus \{0\})$. So, it admits a unique self-adjoint extension and there’s no ambiguity about this case.

• $c < \frac{3}{4}$: In this case $H$ is *not* essentially self-adjoint on $C_0^\infty(\mathbb{R}^3 \setminus \{0\})$. In fact, it admits more than one self-adjoint extension! This means that we need *extra input from physics* to choose the Hamiltonian in this case. It turns out that we need to say what happens when the particle hits the singularity at $r = 0$. This is a long and fascinating story that I just learned yesterday.

• $c \ge -\frac{1}{4}$: In this case the expected energy is bounded below for $\psi \in C_0^\infty(\mathbb{R}^3 \setminus \{0\})$. It turns out that whenever we have a Hamiltonian that is bounded below, even if there is not a *unique* self-adjoint extension, there exists a canonical ‘best choice’ of self-adjoint extension, called the Friedrichs extension. I explain this in my course notes.

• $c < -\frac{1}{4}$: In this case the expected energy is not bounded below, so we don’t have the Friedrichs extension to help us choose which self-adjoint extension is ‘best’.

To go all the way down this rabbit hole, I recommend these two papers:

• Sarang Gopalakrishnan, *Self-Adjointness and the Renormalization of Singular Potentials*, B.A. Thesis, Amherst College.

• D. M. Gitman, I. V. Tyutin and B. L. Voronov, Self-adjoint extensions and spectral analysis in the Calogero problem, *Jour. Phys. A* **43** (2010), 145205.

The first is good for a broad overview of problems associated to singular potentials such as the inverse cube force law; while there is attention to mathematical rigor, the focus is on physical insight. The second is good if you want—as I wanted—to really get to the bottom of the inverse cube force law in quantum mechanics. Both have lots of references.

Also, both point out a crucial fact I haven’t mentioned yet: in quantum mechanics the inverse cube force law is special because, *naively at least*, it has a kind of symmetry under rescaling! You can see this from the formula

$$H = -\nabla^2 + \frac{c}{r^2}$$

by noting that both the Laplacian and $\frac{c}{r^2}$ have units of length^{-2}. So, they both transform in the same way under rescaling: if you take any smooth function $\psi$, apply $H$ and then expand the result by a factor of 2, you get 4 times what you get if you do those operations in the other order.

In particular, this means that if you have a smooth eigenfunction of $H$ with eigenvalue $\lambda$, you will also have one with eigenvalue $\lambda/\alpha^2$ for any $\alpha > 0$: just expand the eigenfunction by a factor of $\alpha$. And if your original eigenfunction was normalizable, so will be the new one!
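The scaling covariance is easy to verify symbolically. This sketch is mine; as an assumption it uses the half-line radial operator $-\frac{d^2}{dr^2} + \frac{c}{r^2}$ as a stand-in for the full 3d Hamiltonian (both terms scale as length^{-2}, so the argument is the same), and a concrete Gaussian test function:

```python
import sympy as sp

r, c = sp.symbols('r c', positive=True)
a = sp.Integer(2)        # expand by a factor of 2
psi = sp.exp(-r**2)      # any smooth test function would do

def H(f):
    # Radial stand-in for H = -Laplacian + c/r^2; scales the same way.
    return -sp.diff(f, r, 2) + c / r**2 * f

lhs = H(psi.subs(r, r / a))           # expand first, then apply H
rhs = H(psi).subs(r, r / a) / a**2    # apply H, expand, divide by 4
assert sp.simplify(lhs - rhs) == 0    # the two orders agree up to a factor of 4
```

Replacing the eigenvalue equation $H\psi = \lambda\psi$ by its rescaled version immediately gives the eigenvalue $\lambda/\alpha^2$ claimed above.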

With some calculation you can show that when $c < -\frac{1}{4}$, the Hamiltonian has a smooth normalizable eigenfunction with a negative eigenvalue. In fact it’s spherically symmetric, so finding it is not so terribly hard. But this instantly implies that $H$ has smooth normalizable eigenfunctions with *any* negative eigenvalue.

This implies various things, some terrifying. First of all, it means that $H$ is not bounded below, at least not on the space of smooth normalizable functions. A similar but more delicate scaling argument shows that it’s also not bounded below on $C_0^\infty(\mathbb{R}^3 \setminus \{0\})$, as I claimed earlier.

This is scary but not terrifying: it simply means that when $c < -\frac{1}{4}$, the potential is too strongly negative for the Hamiltonian to be bounded below.

The terrifying part is this: we’re getting uncountably many normalizable eigenfunctions, all with different eigenvalues, one for each choice of $\alpha$. A self-adjoint operator on a countable-dimensional Hilbert space like $L^2(\mathbb{R}^3)$ can’t have uncountably many normalizable eigenvectors with different eigenvalues, since then they’d all be orthogonal to each other, and that’s too many orthogonal vectors to fit in a Hilbert space of countable dimension!

This sounds like a paradox, but it’s not. These functions are not all orthogonal, and they’re not all eigenfunctions of a self-adjoint operator. You see, the operator $H$ is not self-adjoint on the domain we’ve chosen, the space of all smooth functions in $L^2(\mathbb{R}^3)$. We can carefully choose a domain to get a self-adjoint operator… but it turns out there are many ways to do it.

Intriguingly, in most cases this choice breaks the naive dilation symmetry. So, we’re getting what physicists call an ‘anomaly’: a symmetry of a classical system that fails to give a symmetry of the corresponding quantum system.

Of course, if you’ve made it this far, you probably want to understand what the different choices of Hamiltonian for a particle in an inverse cube force law *actually mean, physically.* The idea seems to be that they say how the particle changes phase when it hits the singularity at $r = 0$ and bounces back out.

(Why does it bounce back out? Well, if it didn’t, time evolution would not be unitary, so it would not be described by a self-adjoint Hamiltonian! We could try to describe the physics of a quantum particle that *does not* come back out when it hits the singularity, and I believe people have tried, but this requires a different set of mathematical tools.)

For a detailed analysis of this, it seems one should take Schrödinger’s equation and do a separation of variables into the angular part and the radial part:

$$\psi(x) = \Psi(r)\,\Phi(\theta,\varphi)$$

For each choice of $\ell = 0, 1, 2, \dots$, one gets a space of spherical harmonics that one can use for the angular part $\Phi$. The interesting part is the radial part, $\Psi$. Here it is helpful to make a change of variables

$$u(r) = r\,\Psi(r)$$

At least naively, Schrödinger’s equation for the particle in the potential $\frac{c}{r^2}$ then becomes

$$i \frac{\partial u}{\partial t} = H u$$

where

$$H = -\frac{d^2}{dr^2} + \frac{k}{r^2}$$

Beware: I keep calling all sorts of different but related Hamiltonians $H$, and this one is for the *radial part* of the dynamics of a quantum particle in an inverse cube force. As we’ve seen before in the classical case, the centrifugal force and the inverse cube force join forces in an ‘effective potential’

$$U(r) = \frac{k}{r^2}$$

where

$$k = c + \ell(\ell+1)$$

So, we have reduced the problem to that of a particle on the open half-line $(0, \infty)$ moving in the potential $U$. The Hamiltonian for this problem:

$$H = -\frac{d^2}{dr^2} + \frac{k}{r^2}$$

is called the **Calogero Hamiltonian**. Needless to say, it has fascinating and somewhat scary properties, since to make it into a bona fide self-adjoint operator, we must make some choice about what happens when the particle hits $r = 0$. The formula above does not really specify the Hamiltonian.
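For completeness, here is the standard reduction behind that claim (my reconstruction, using the conventions $\hbar = 1$, $2m = 1$ from earlier): writing $\psi = \Psi(r)\,\Phi(\theta,\varphi)$ with $\Phi$ a spherical harmonic of angular momentum $\ell$, the radial eigenvalue equation turns into the Calogero form under $u = r\Psi$:

```latex
-\frac{1}{r^2}\frac{d}{dr}\!\left(r^2 \frac{d\Psi}{dr}\right)
  + \frac{c + \ell(\ell+1)}{r^2}\,\Psi = E\,\Psi
\quad\xrightarrow{\;u \,=\, r\Psi\;}\quad
-\frac{d^2 u}{dr^2} + \frac{k}{r^2}\,u = E\,u,
\qquad k = c + \ell(\ell+1)
```

The substitution works because $\frac{1}{r^2}\frac{d}{dr}\big(r^2 \Psi'\big) = \frac{u''}{r}$, so multiplying the radial equation through by $r$ leaves only the flat second derivative plus the combined $k/r^2$ potential.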

This is more or less where Gitman, Tyutin and Voronov *begin* their analysis, after a long and pleasant review of the problem. They describe all the possible choices of self-adjoint operator that are allowed. The answer depends on the value of $k$, but very crudely, the choice says something like how the phase of your particle changes when it bounces off the singularity. Most choices break the dilation invariance of the problem. But intriguingly, some choices retain invariance under a *discrete subgroup* of dilations!

So, the rabbit hole of the inverse cube force law goes quite deep, and I expect I haven’t quite gotten to the bottom yet. The problem may seem pathological, verging on pointless. But the math is fascinating, and it’s a great testing-ground for ideas in quantum mechanics—very manageable compared to deeper subjects like quantum field theory, which are riddled with their own pathologies. Finally, the connection between the inverse cube force law and centrifugal force makes me think it’s not a mere curiosity.

The animation was made by ‘WillowW’ and placed on Wikicommons. It’s one of a number that appear in this Wikipedia article:

• Newton’s theorem of revolving orbits, Wikipedia.

I made the graphs using the free online Desmos graphing calculator.

Images of the systematic destruction of archaeological sites and art pieces in Syria are no longer news, but I was especially saddened today to see before/after aerial pictures of the Palmyra site, which demonstrate how the beautiful Temple of Bel has been completely destroyed by explosives. A picture of the temple is shown below.

There aren’t many blog posts about vertex operator algebras, so I thought I’d help fill this gap by mentioning a substantial advance by Jethro van Ekeren, Sven Möller, and Nils Scheithauer that appeared on the ArXiv last month. The most important feature is that this paper resolves several folklore conjectures that have been around since near the beginning of vertex operator algebra theory. This was good for me, since I was able to use some of these results to prove the Generalized Moonshine Conjecture much more quickly than I had expected. I won’t say much about moonshine here, as I think it deserves its own post.

I briefly discussed vertex operator algebras in my earlier post on generalized moonshine. While an ordinary commutative ring has a multiplication map $V \otimes V \to V$, a vertex operator algebra (or VOA) has a “meromorphic” version $V \otimes V \to V((z))$, and there is an integer grading on the underlying vector space that is compatible with the powers of $z$ in a straightforward way.

I won’t say much about VOAs in general, but rather, I will consider those that satisfy some of the following nice properties:

Rational: Any V-module is a direct sum of irreducibles.

Holomorphic: Any V-module is a direct sum of copies of V.

$C_2$-cofinite: This is a rather technical-sounding condition that ends up being equivalent to a lot of natural representation-theoretic finiteness properties, like “every representation is a direct sum of generalized eigenspaces for the energy operator L(0)”.

It is conjectured that every rational VOA is $C_2$-cofinite.

As usual, when we have a collection of nice objects, we may want to classify them, or at least find ways of building new ones and discovering invariants and constraints.

Some basic invariants are the central charge $c$ (a complex number), and the character of a module $M$, given by the graded dimension $\chi_M(\tau) = \operatorname{Tr}_M q^{L(0) - c/24}$, where the grading is given by the energy operator L(0), and we view the power series as a function on the complex upper half-plane using $q = e^{2\pi i \tau}$. One of the first general results for “nice” VOAs is Zhu’s 1996 proof that if V is rational and $C_2$-cofinite, then the characters of irreducible V-modules form a vector-valued modular form for a finite dimensional representation of $SL_2(\mathbb{Z})$. Furthermore, he showed that in this case, the central charge c is a rational number, and if V is holomorphic, then c is a nonnegative integer divisible by 8.

Dong and Mason classified the holomorphic $C_2$-cofinite VOAs of central charge 8 and 16 – there is one isomorphism class for central charge 8, and 2 isomorphism classes for central charge 16. All three are given by a lattice VOA construction. In general, if you are given an even unimodular positive definite lattice (which only exists in dimension divisible by 8), you get a holomorphic $C_2$-cofinite VOA from it, so the central charge 8 object comes from the $E_8$ lattice, and the central charge 16 objects come from the $E_8 \oplus E_8$ and $D_{16}^+$ lattices. Central charge 24 is at a sweet spot of difficulty, where Schellekens did a long calculation in 1993 and conjectured the existence of 71 isomorphism types. Central charge 32 is more or less impossible, since lattices alone give over $10^7$ types.

For central charge 24, because the L(0) eigenspace with eigenvalue 1 is naturally a Lie algebra, the proposed isomorphism types are labeled by finite dimensional Lie algebras. Schellekens’s list is basically

1. The monster VOA, with $V_1 = 0$.

2. The Leech lattice VOA, with $V_1$ commutative of dimension 24.

3. 69 extensions of rational Kac-Moody VOAs by suitable modules (here the Lie algebras are products of simple Lie algebras and in particular noncommutative).

As far as existence is concerned, 23 of the 69 come from lattices, known as the Niemeier lattices. An additional 14 come from Z/2 orbifolds of lattices. Another 18 come from a “framed VOA” construction, given by adjoining modules to a tensor product of Ising models according to some codes (Lam, Shimakura, and Yamauchi are the main names here). The remaining 12 are more difficult, and after this recent paper, there are 2 that have not been constructed. There are only a few cases where uniqueness is known, such as the Leech lattice VOA. The $V_1 = 0$ case is wide open, and perhaps the worst for uniqueness, since there isn’t any Lie algebra structure to work with.

One of the results of van Ekeren, Möller, and Scheithauer was a reconstruction of Schellekens’s list, i.e., eliminating other choices of Lie algebras from possibility. This was desirable, since the original paper was quite sketchy in places and didn’t have proofs. A second result was a collection of new examples, in particular nearly filling out this list of 69. They did this by solving an old problem, namely the construction of holomorphic orbifolds. The idea is the following: Given a holomorphic $C_2$-cofinite VOA V, and a finite order automorphism g, take the fixed-point subalgebra $V^g$, and take a direct sum with some $V^g$-modules not in V to get something new. In fact, the desired $V^g$-modules were more or less known – there is a notion of g-twisted V-module V(g), and one takes the submodules of the twisted modules fixed by a suitable lift of g. To show that this even makes sense requires substantial development of the theory.

First, the existence and uniqueness of irreducible g-twisted V-modules V(g) was a nonconstructive theorem of Dong, Li, and Mason in 2000. Then, to get a multiplication operation on the component $V^g$-modules, one first shows that the relevant irreducible $V^g$-modules have a nice tensor structure (in particular, are simple currents), so that the space of suitable multiplication maps is highly constrained. This requires recent major theorems of Miyamoto ($V^g$ is rational and $C_2$-cofinite – 2013), and Huang (if V is rational and $C_2$-cofinite, then Rep(V) is a modular tensor category and the Verlinde formula holds – 2008). By some clever applications of the Verlinde formula, van Ekeren, Möller, and Scheithauer showed that once we have simple currents with suitable L(0)-eigenvalues, the homological obstruction to a well-behaved multiplication vanishes, and one gets a holomorphic VOA.

The intermediate results that I found most useful for my own purposes were:

1. assembly of an abelian intertwining algebra (a generalization of VOA where the commutativity of multiplication is allowed some monodromy) from all irreducible $V^g$-modules.

2. the explicit description of the $SL_2(\mathbb{Z})$ action on the characters of irreducible $V^g$-modules. This also solves a conjecture of Dong, Li, and Mason concerning the graded dimension of twisted modules.

In particular, if g has order n, then the simple currents are arranged into a central extension A of $\mathbb{Z}/n$ by $\mathbb{Z}/n$, where the kernel is given by an action of g, and the image is the twisting on modules. The group A is also equipped with a canonical $\mathbb{Q}/\mathbb{Z}$-valued quadratic form. One obtains an A-graded abelian intertwining algebra with monodromy determined by the quadratic form (up to a certain coboundary), and the $SL_2(\mathbb{Z})$ action is by the corresponding Weil representation (up to the c/24 correction).

*Guest post by John Wiltshire-Gordon*

My new paper arXiv:1508.04107 contains a definition that may be of interest to category theorists. Emily Riehl has graciously offered me this chance to explain.

In algebra, if we have a firm grip on some object $X$, we probably have generators for $X$. Later, if we have some quotient $X / \sim$, the same set of generators will work. The trouble comes when we have a subobject $Y \subseteq X$, which (given typical bad luck) probably misses every one of our generators. We need theorems to find generators for subobjects.

Category theory offers a clean definition of generation: if $C$ is some category of algebraic objects and $F \dashv U$ is a free-forgetful adjunction with $U : C \longrightarrow \mathrm{Set}$, then it makes sense to say that a subset $S \subseteq U X$ generates $X$ if the adjunct arrow $F S \rightarrow X$ is epic.
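To make the definition concrete in the simplest case $C = \mathrm{Vect}_{\mathbb{Q}}$: a subset $S$ generates a finite-dimensional vector space exactly when the adjunct map $F S \to X$ is surjective, i.e. when $S$ spans. Here is a small sketch (the helper names are my own, not from the paper) that tests this by Gaussian elimination over $\mathbb{Q}$:

```python
from fractions import Fraction

def row_reduce(rows):
    # Gaussian elimination over Q; returns the rank
    rows = [list(map(Fraction, r)) for r in rows]
    ncols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        pr = rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col] != 0:
                f = rows[i][col] / pr[col]
                rows[i] = [a - f * b for a, b in zip(rows[i], pr)]
        rank += 1
    return rank

def generates(S, dim):
    # S generates X = Q^dim iff the adjunct map F S -> X is epic,
    # i.e. iff S spans X
    return bool(S) and row_reduce(S) == dim

assert generates([[1, 0], [1, 1]], 2)      # spans Q^2
assert not generates([[1, 1], [2, 2]], 2)  # only spans a line
```

For vector spaces this recovers the familiar notion; the interest of the paper is what happens when epimorphisms are checked degreewise in a functor category.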

Certainly $R$-modules fit into this setup nicely, and groups, commutative rings, etc. What about simplicial sets? It makes sense to say that some simplicial set $X$ is “generated” by its 1-simplices, for example: this is saying that $X$ is 1-skeletal. But simplicial sets come with many sorts of generator…Ah, and they also come with many forgetful functors, given by evaluation at the various objects of $\Delta^{op}$.

Let’s assume we’re in a context where there are many forgetful functors, and many corresponding notions of generation. In fact, for concreteness, let’s think about cosimplicial vector spaces over the rational numbers. A cosimplicial vector space is a functor $\Delta \longrightarrow \mathrm{Vect}$, and so for each $d \in \Delta$ we have a functor $U_d : \mathrm{Vect}^{\Delta} \longrightarrow \mathrm{Set}$ with $U_d V = V d$ and left adjoint $F_d$. We will say that a vector $v \in V d$ sits **in degree** $d$, and generally think of $V$ as a vector space graded by the objects of $\Delta$.

**Definition** A cosimplicial vector space $V$ is **generated in degree** $d \in \Delta$ if the component at $V$ of the counit $F_d U_d V \longrightarrow V$ is epic. Similarly, $V$ is **generated in degrees** $\{d_i \}$ if $\oplus_i F_{d_i} U_{d_i} V \longrightarrow V$ is epic.

**Example** Let $V = F_d \{ \ast \}$ be the free cosimplicial vector space on a single vector in degree $d$. Certainly $V$ is generated in degree $d$. It’s less obvious that $V$ admits a unique nontrivial subobject $W \hookrightarrow V$. Let’s try to find generators for $W$. It turns out that $W d = 0$, so no generators there. Since $W \neq 0$, there must be generators somewhere… but where?

**Theorem** (Wrangling generators for cosimplicial abelian groups): If $V$ is a cosimplicial abelian group generated in degrees $\{ d_i \}$, then any subobject $W \hookrightarrow V$ is generated in degrees $\{d_i + 1 \}$.

Ok, so now we know exactly where to look for generators for subobjects: exactly one degree higher than our generators for the ambient object. The generators have been successfully wrangled.

Time to formalize. Let $U_d, U_x, U_y: C \longrightarrow \mathrm{Set}$ be three forgetful functors, and let $F_d, F_x, F_y$ be their left adjoints. When the labels $d, x, y$ appear unattached to $U$ or $F$, they represent formal “degrees of generation,” even though $C$ need not be a functor category. In this broader setting, we say $V \in C$ is generated in (formal) degree $\star$ if the component of the counit $F_{\star} U_{\star} V \longrightarrow V$ is epic. By the unit-counit identities, if $V$ is generated in degree $\star$ , the whole set $U_{\star} V$ serves as a generating set.

**Definition** Say $x \leq_d y$ if for all $V \in C$ generated in degree $d$, every subobject $W \hookrightarrow V$ generated in degree $x$ is also generated in degree $y$.

Practically speaking, if $x \leq_d y$, then generators in degree $x$ can always be replaced by generators in degree $y$ provided that the ambient object is generated in degree $d$.

Suppose that we have a complete understanding of the preorder $\leq_d$ , and we’re trying to generate subobjects inside some object generated in degree $d$. Then every time $x \leq_d y$, we may replace generators in degree $x$ with their span in degree $y$. In other words, the generators $S \subseteq U_x V$ are equivalent to generators $\mathrm{Im}(U_y F_x S \longrightarrow U_y V) \subseteq U_y V$. Arguing in this fashion, we may wrangle all generators upward in the preorder $\leq_d$. If $\leq_d$ has a finite system of elements $m_1, m_2, \ldots, m_k$ capable of bounding any other element from above, then all generators may be replaced by generators in degrees $m_1, m_2, \ldots, m_k$. This is the ideal wrangling situation, and lets us restrict our search for generators to this finite set of degrees.

In the case of cosimplicial vector spaces, $d + 1$ is a maximum for the preorder $\leq_d$ with $d \in \Delta$. So any subobject of a cosimplicial vector space generated in degree $d$ is generated in degree $d + 1$. (It is also true that, for example, $d + 2$ is a maximum for the preorder $\leq_d$. In fact, we have $d + 1 \leq_d d+2 \leq_d d + 1$. That’s why it’s important that $\leq_d$ is a preorder, and not a true partial order.)

In the generality presented above, where a formal degree of generation is a free-forgetful adjunction to $\mathrm{Set}$, I do not know much about the preorder $\leq_d$. The paper linked above is concerned with the case $C = (\mathrm{Mod}_R)^{\mathcal{D}}$ of functor categories of $\mathcal{D}$-shaped diagrams of $R$-modules. In this case I can say a lot.

In Definition 1.1, I give a computational description of the preorder $\leq_d$. This description makes it clear that if $\mathcal{D}$ has finite hom-sets, then you could program a computer to tell you whenever $x \leq_d y$.

In Section 2.2, I give many different categories $\mathcal{D}$ for which explicit upper bounds are known for the preorders $\leq_d$. (In the paper, an explicit system of upper bounds for every preorder is called a homological modulus.)

If you’re interested in more context for this work, I highly recommend two of Emily Riehl’s posts from February of last year on Representation Stability, a subject begun by Tom Church and Benson Farb. With Jordan Ellenberg, they explained how certain stability patterns can be considered consequences of structure theory for the category of $\mathrm{FI}$-modules $(\mathrm{Vect}_{\mathbb{Q}})^{\mathrm{FI}}$ where $\mathrm{FI}$ is the category of finite sets with injections. In the category of $\mathrm{FI}$-modules, the preorders $\leq_n$ have no finite system of upper bounds. In contrast, for $\mathrm{Fin}$-modules, every preorder has a maximum! (Here $\mathrm{Fin}$ is the usual category of finite sets). So having all finite set maps instead of just the injections gives much better control on generators for subobjects. As an application, Jordan and I use this extra control to obtain new results about configuration spaces of points on a manifold. You can read about it on his blog.

For more on the recent progress of representation stability, you can also check out the bibliography of my paper or take a look at exciting new results by CEF, as well as Rohit Nagpal, Andy Putman, Steven Sam, and Andrew Snowden, and Jenny Wilson.

Greetings from Mainz, where I have the pleasure of covering a meeting for you without having to travel from my usual surroundings (I have clocked up more miles this year already than can be good for my environmental conscience).

Our Scientific Programme (which is the bigger of the two formats of meetings that the Mainz Institute of Theoretical Physics (MITP) hosts, the smaller being Topical Workshops) started off today with two keynote talks summarizing the status and expectations of the FLAG (Flavour Lattice Averaging Group, presented by Tassos Vladikas) and CKMfitter (presented by Sébastien Descotes-Genon) collaborations. Both groups are in some way in the business of performing weighted averages of flavour physics quantities, but of course their backgrounds, rationale and methods are quite different in many regards. I will not attempt to give a line-by-line summary of the talks or the afternoon discussion session here, but instead just summarize a few points that caused lively discussions or seemed important in some other way.

By now, computational resources have reached the point where we can achieve such statistics that the total error on many lattice determinations of precision quantities is completely dominated by systematics (and indeed different groups would differ at the several-σ level if one were to consider only their statistical errors). This may sound good in a way (because it is what you'd expect in the limit of infinite statistics), but it is also very problematic, because the estimation of systematic errors is in the end really more of an art than a science, having a crucial subjective component at its heart. This means not only that systematic errors quoted by different groups may not be readily comparable, but also that it becomes important how to treat systematic errors (which may also be correlated, if e.g. two groups use the same one-loop renormalization constants) when averaging different results. How to do this is again subject to subjective choices to some extent. FLAG imposes cuts on quantities relating to the most important sources of systematic error (lattice spacings, pion mass, spatial volume) to select acceptable ensembles, then adds the statistical and systematic errors in quadrature, before performing a weighted average and computing the overall error taking correlations between different results into account using Schmelling's procedure. CKMfitter, on the other hand, adds all systematic errors linearly, and uses the Rfit procedure to perform a maximum likelihood fit. Either choice is equally permissible, but they are not directly compatible (so CKMfitter can't use FLAG averages as such).
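As a toy illustration of why the two conventions are incompatible (this is not FLAG's or CKMfitter's actual code — Schmelling's correlation treatment and Rfit are far more involved, and the numbers below are invented), compare combining each result's errors in quadrature versus linearly before a naive inverse-variance average:

```python
import math

def combine(stat, syst, linear=False):
    # per-result total error: quadrature (FLAG-style) or linear (CKMfitter-style)
    return stat + syst if linear else math.sqrt(stat**2 + syst**2)

def weighted_average(values, errors):
    # naive inverse-variance weighted average, ignoring correlations
    weights = [1.0 / e**2 for e in errors]
    wsum = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / wsum
    return mean, 1.0 / math.sqrt(wsum)

# two made-up lattice results for the same quantity: value, stat, syst
values = [0.212, 0.208]
stat = [0.002, 0.003]
syst = [0.004, 0.002]

for linear in (False, True):
    totals = [combine(s, y, linear) for s, y in zip(stat, syst)]
    print("linear" if linear else "quadrature", weighted_average(values, totals))
```

The linear convention produces larger total errors per result, and hence different weights, so the two averages (and especially their quoted uncertainties) differ even on identical inputs.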

Another point raised was that it is important for lattice collaborations computing mixing parameters to provide not just products like *f*_{B}√*B*_{B}, but also *f*_{B} and *B*_{B} separately (as well as information about the correlation between these quantities), in order to make the global CKM fits easier.


In 1675, Robert Hooke published the “true mathematical and mechanical form” for the shape of an ideal arch. However, Hooke wrote the theory as an anagram,

abcccddeeeeefggiiiiiiiillmmmmnnnnnooprrsssttttttuuuuuuuux.

Its solution was never published in his lifetime. What was the secret hiding in this series of letters?

The arch is one of the fundamental building blocks in architecture. Used in bridges, cathedrals, doorways, etc., arches provide an aesthetic quality to the structures they dwell within. Their key utility comes from their ability to support weight above an empty space by distributing the load onto the abutments at their feet. A dome functions much like an arch, except that a dome takes on a three-dimensional shape whereas an arch is two-dimensional. Paradoxically, while being the backbone of many edifices, arches and domes are themselves extremely delicate: a single misplaced component along the curve, or an improper shape in the design, would spell doom for the entire structure.

The Romans employed the rounded arch/dome, in the shape of a semicircle/hemisphere, in their bridges and pantheons. Gothic architecture favored the pointed arch and the ribbed vault. However, neither of these arch forms was adequate for the progressively grander structures and more ambitious cathedrals sought in the 17th century. Following the Great Fire of London in 1666, a massive rebuilding effort was under way. Among the new public buildings, the most prominent was to be St. Paul’s Cathedral with its signature dome. A modern theory of arches was sorely needed: what is the perfect shape for an arch/dome?

Christopher Wren, the chief architect of St. Paul’s Cathedral, consulted Hooke on the dome’s design. To quote from the cathedral’s website [1]:

The two half-sections [of the dome] in the study employ a formula devised by Robert Hooke in about 1671 for calculating the curve of a parabolic dome and reducing its thickness. Hooke had explored this curve as the three-dimensional equivalent of the ‘hanging chain’, or catenary arch: the shape of a weighted chain which, when inverted, produces the ideal profile for a self-supporting arch. He thought that such a curve derived from the equation y = x^{3}.

How did Hooke come upon the shape for the dome? It wasn’t until after Hooke’s death that his executor provided the unencrypted solution to the anagram [2]:

Ut pendet continuum flexile, sic stabit contiguum rigidum inversum

which translates to

As hangs a flexible cable so, inverted, stand the touching pieces of an arch.

In other words, the ideal shape of an arch is exactly that of a freely hanging rope, only upside down. Hooke understood that the building materials could withstand only compression forces and not tensile forces, in direct contrast to a rope that could resist tension but would buckle under compression. The mathematics describing the arch and the cable are in fact identical, save for a minus sign. Consequently, you could perform a real-time simulation of an arch using a piece of string!
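A quick numerical sketch of that string experiment: the hanging chain is a catenary, y = a·cosh(x/a), and given a desired half-span and sag we can solve for the parameter a by bisection (the function names and tolerances here are my own illustration, not anything from Hooke):

```python
import math

def catenary(x, a):
    # height of an ideal hanging chain; flipped upside down, this is
    # the ideal self-supporting arch profile
    return a * math.cosh(x / a)

def chain_parameter(half_span, sag):
    # find a with a*(cosh(half_span/a) - 1) == sag by bisection;
    # the drop is a strictly decreasing function of a
    lo, hi = half_span / 700.0, 1e6 * half_span
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        drop = mid * (math.cosh(half_span / mid) - 1.0)
        if drop > sag:
            lo = mid  # sags too much: the true a is larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

a = chain_parameter(10.0, 3.0)
print(f"a = {a:.4f}")
# inverted arch profile: y(x) = catenary(half_span, a) - catenary(x, a)
```

Note that the true curve is a catenary, not the y = x³ that Hooke guessed for the dome, and not a parabola either (a parabola is what you get if the load is uniform per horizontal distance, as in a suspension-bridge deck, rather than per arc length of chain).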

Bonus: Hooke published the anagram in his book describing helioscopes, simply to “fill up the vacancy of the ensuing page” [3]. On that very page, among other claims, Hooke also wrote the anagram “*ceiiinosssttuu*” regarding “the true theory of elasticity”. Can you solve this riddle?

[1] https://www.stpauls.co.uk/history-collections/the-collections/architectural-archive/wren-office-drawings/5-designs-for-the-dome-c16871708

[2] Written in Latin, the ‘u’ and ‘v’ are the same letter.

[3] In truth, Hooke was likely trying to avoid being scooped by his contemporaries, notably Isaac Newton.

This article was inspired by my visit to the Huntington Library. I would like to thank Catherine Wehrey for the illustrations and help with the research.

Lake Tahoe is famous for preserving dead bodies in good condition over many years, which makes it a natural place to organize the SUSY conference. As a tribute to this event, here is a plot from a recent ATLAS meta-analysis:

It shows the constraints on the gluino and the lightest neutralino masses in the pMSSM. Usually, the most transparent way to present experimental limits on supersymmetry is by using *simplified models*. This consists in picking two or more particles out of the MSSM zoo and assuming that they are the only ones playing a role in the analyzed process. For example, a popular simplified model has a gluino and a stable neutralino interacting via an effective quark-gluino-antiquark-neutralino coupling. In this model, gluino pairs are produced at the LHC through their couplings to ordinary gluons, and then each promptly decays to 2 quarks and a neutralino via the effective coupling. This shows up in a detector as 4 or more jets and the missing energy carried off by the neutralinos. Within this simplified model, one can thus interpret the LHC multi-jets + missing energy data as constraints on 2 parameters: the gluino mass and the lightest neutralino mass. One result of this analysis is that, for a massless neutralino, the gluino mass is constrained to be bigger than about 1.4 TeV, see the white line in the plot.

A non-trivial question is what happens to these limits if one starts to fiddle with the remaining one hundred parameters of the MSSM. ATLAS tackles this question in the framework of the pMSSM, which is a version of the MSSM where all flavor and CP violating parameters are set to zero. In the resulting 19-dimensional parameter space, ATLAS picks a large number of points that reproduce the correct Higgs mass and are consistent with various precision measurements. Then they check what fraction of the points with a given m_gluino and m_neutralino survives the constraints from all ATLAS supersymmetry searches so far. Of course, the results will depend on how the parameter space is sampled, but nevertheless we can get a feeling for how robust the limits obtained in simplified models are. It is interesting that the gluino mass limits turn out to be quite robust. From the plot one can see that, for a light neutralino, it is difficult to live with m_gluino < 1.4 TeV, and that there are no surviving points with m_gluino < 1.1 TeV. Similar conclusions do not hold for all simplified models, e.g., the limits on squark masses in simplified models can be very much relaxed by going to the larger parameter space of the pMSSM. Another thing worth noticing is that the blind spot near the m_gluino=m_neutralino diagonal is not really there: it is covered by ATLAS monojet searches.
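To make the survival-fraction logic concrete, here is a deliberately crude toy (all masses, cuts, and the `excluded` function are invented for illustration and bear no relation to the real ATLAS analysis): sample model points, drop those excluded by any "search", and report the fraction that survives.

```python
import random

random.seed(1)

# toy pMSSM-like points: (m_gluino, m_neutralino) in GeV, neutralino lighter
points = []
while len(points) < 10000:
    mg = random.uniform(200.0, 3000.0)
    mn = random.uniform(0.0, 3000.0)
    if mn < mg:
        points.append((mg, mn))

def excluded(mg, mn):
    # invented stand-in for the union of searches: multijet+MET loses
    # power for compressed spectra; a monojet search covers the diagonal
    multijet = mg < 1400.0 and (mg - mn) > 200.0
    monojet = mg < 700.0 and (mg - mn) <= 200.0
    return multijet or monojet

surviving = [p for p in points if not excluded(*p)]
frac = len(surviving) / len(points)
print(f"overall toy survival fraction: {frac:.2f}")
```

In the real analysis the survival fraction is computed bin by bin in the (m_gluino, m_neutralino) plane, which is what the plot shows; the toy only conveys the bookkeeping.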

The LHC run-2 is going slow, so we still have some time to play with the run-1 data. See the ATLAS paper for many more plots. New stronger limits on supersymmetry are not expected before next summer.


The fourth edition of the International Conference on New Frontiers in Physics ended yesterday evening, and it is time for a summary. However, this year I must say that I am not in a good position to give an overview of the most interesting physics discussions that took place here, as I was involved in organizing events for the conference and could only attend a relatively small fraction of the presentations.

ICNFP offers a broad view on the forefront topics of many areas of physics, with the main topics being nuclear and particle physics, yet with astrophysics and theoretical developments in quantum mechanics and related subjects also playing a major role.


I've been a bit quiet here the last week or so, you may have noticed. I wish I could say it was because I've been scribbling some amazing new physics in my notebooks, or drawing several new pages for the book, or producing some other simply measurable output, but I cannot. Instead, I can only report that it was the beginning of a new semester (and entire new academic year!) this week just gone, and this - and all the associated preparations and so forth - coincided with several other things including working on several drafts of a grant renewal proposal.

The best news of all is that my new group of students for my class (graduate electromagnetism, second part) seems like a really good and fun group. We've had two lectures already and they seem engaged, and eager to take part in the way I like my classes to run - interactively and investigatively. I'm looking forward to working with them over the semester.

Other things I've been chipping away at in preparation for the next couple of months include launching the USC science film competition (its fourth year - I skipped last year because of family leave), moving my work office (for the first time in the 12 years I've been here), giving some lectures at an international school, organizing a symposium in celebration of the centenary of Einstein's General Relativity, and a number of shoots for some TV shows that might be of [...]

The post Fresh Cycle appeared first on Asymptotia.

Simon Winchester in today’s New York Times Book Review:

Traveling in China back in the early 1990s, I was waiting for my westbound train to take on water at a lonely halt in the Taklamakan Desert when a young Chinese woman tapped me on the shoulder, asked if I spoke English and, further, if I knew anything of Anthony Trollope. I was quite taken aback. Trollope here? A million miles from anywhere? I mumbled an incredulous, “Yes, I know a bit” — whereupon, in a brisk and businesslike manner, she declared that the train would remain at the oasis for the next, let me see, 27 minutes, and in that time would I kindly answer as many of her questions as possible about plot and character development in “The Eustace Diamonds”?

Ever since that encounter, I’ve been fully convinced of China’s perpetual and preternatural power to astonish, amaze and delight.

It doesn’t actually seem that preternatural to me that a young, presumably educated woman read a novel and liked it. What he should have been convinced of is *Anthony Trollope’s* perpetual and preternatural power to astonish, amaze and delight people separated from him by vast spans of culture and time. “The Eustace Diamonds” is ace. Probably “He Knew He Was Right” or “Can You Forgive Her?” (my own first Trollope) are better places to start. Free Gutenbergs of both here. Was any other Victorian novelist great enough to have the Pet Shop Boys name a song after one of their books? No. None other was so great.


Predictions about self-driving cars:

The average American could shift some of the 5.5 hours of television watched per day into the car, and end up with vastly more personal time once freed from the need to pay attention to the road.

Wouldn’t that person just watch another hour of TV and end up with the same amount of personal time?


Good answers to the last question! I think I perhaps put my thumb on the scale too much by naming a variable p.

Let me try another version in the form of a dialogue.

ME: Hey in that other room somebody flipped a fair coin. What would you say is the probability that it fell heads?

YOU: I would say it is 1/2.

ME: Now I’m going to give you some more information about the coin. A confederate of mine made a prediction about whether the coin would fall heads or tails and he was correct. Now what would you say is the probability that it fell heads?

YOU: Now I have no idea, because I have no information about the propensity of your confederate to predict heads.
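The dialogue’s punchline can be checked by brute-force enumeration. Here is a minimal sketch in Python, where q is an assumed parameter for the confederate’s propensity to predict heads (q is my own label, not part of the original dialogue):

```python
from fractions import Fraction

def prob_heads_given_correct(q):
    """Exact P(coin fell heads | confederate's prediction was correct),
    for a fair coin and a confederate who predicts heads with propensity q."""
    p_heads_and_correct = Fraction(1, 2) * q        # coin heads, predicted heads
    p_tails_and_correct = Fraction(1, 2) * (1 - q)  # coin tails, predicted tails
    p_correct = p_heads_and_correct + p_tails_and_correct  # always 1/2
    return p_heads_and_correct / p_correct

# Conditioning on "he was correct" hands you the confederate's unknown
# propensity: the posterior is exactly q, whatever q is.
for q in [Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)]:
    assert prob_heads_given_correct(q) == q
```

So “I have no idea” is literally the right answer: the updated probability equals the confederate’s propensity to predict heads, about which you know nothing.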

(**Update**: What if what you knew about the coin in advance was that it fell heads 99.99% of the time? Would you still be at ease saying you end up with no knowledge at all about the probability that the coin fell heads?) This is in fact what Joyce thinks you should say. White disagrees. But I think they both agree that it *feels weird* to say this, whether or not it’s correct.

Why would it not feel weird? I think Qiaochu’s comment in the previous thread gives a clue. He writes:

Re: the update, no, I don’t think that’s strange. You gave me some weird information and I conditioned on it. Conditioning on things changes my subjective probabilities, and conditioning on weird things changes my subjective probabilities in weird ways.

In other words, he takes it for granted that what you are supposed to do is condition on new information. Which is obviously what you should do in any context where you’re dealing with mathematical probability satisfying the usual axioms. Are we in such a context here? I certainly don’t mean “you have no information about Coin 2” to mean “Coin 2 falls heads with probability p where p is drawn from the uniform distribution (or Jeffreys, or any other specified distribution, thanks Ben W.) on [0,1]” — if I meant that, there could be no controversy!

I think as mathematicians we are very used to thinking that probability as we know it is what we mean when we talk about uncertainty. Or, to the extent we think we’re talking about something other than probability, we are wrong to think so. Lots of philosophers take this view. I’m not sure it’s wrong. But I’m also not sure it’s right. And whether it’s wrong or right, I think it’s kind of weird.

So, the Hugo awards were handed out a little while ago, with half of the prose fiction categories going to “No Award” and the other half to works I voted below “No Award.” Whee. I’m not really interested in rehashing the controversy, though I will note that Abigail Nussbaum’s take is probably the one I most agree with.

With the release of the nominating stats, a number of people released “what might’ve been” ballots, stripping out the slate nominees– Tobias Buckell’s was the first I saw, so I’ll link that. I saw a lot of people exclaiming over how awesome that would’ve been, and found myself with some time to kill, so I went and read the short stories from that list (all of which are freely available online).

And, you know, they’re… fine. Really, the main effect this had for me was to reconfirm that short fiction is a low-return investment for me. I wouldn’t object to any of these winning an award, but none of them jumped out at me as brilliant “Oh my God, this *must* win!” stuff.

(Aside: I spent a while thinking about why it is that short fiction at least seems to have a lower rate of return for me than novels. I think it’s mostly that current tastes interact with the length limit in a way that works really badly for me. The stuff that’s getting celebrated these days tends toward the “literalized metaphor” side of things– the speculative elements tend not to be points of science, but supernatural reflections of the emotional state of the characters. The failure mode of that is “crashingly obvious,” particularly when constrained to keep it under 7,500 words, and I really hate that. My reaction tends to be “Yes, I see what you did there. You’re very clever. Here’s a shiny gold star,” and a story starting in that hole needs to be really good just to ascend to the heights of “Meh.” Novels provide a little more room to work, and it’s easier to hide the clever metaphors, so they’re less likely to bug me in that particular way.)

This then leads to the fundamental problem I have with Hugo nominating, namely that the “just nominate what you love!” method really doesn’t work for me when three-quarters of the awards go to low-return categories. Left to my own devices, I’m just not going to read much short fiction, certainly not enough to make sensible nominations. Which means I’m going to be one of those folks who nominates a bunch of novels and maybe a couple of movies, and leaves the rest blank. Which plays into the hands of the slate voters.

The one year recently when I actually read enough short fiction to make halfway sensible nominations was when Niall Harrison put together a “Short Story Club” of bloggers who all read a particular story and reviewed it online (you can find my reviews here). Working from a limited selection of stories by somebody with a really good grasp of the state of the field was a big help, and the obligation to say something about them on the blog was enough to motivate me to read them. (And, no, “keep garbage off the Hugo ballot” is not by itself enough motivation, especially in the absence of quality curation.)

Niall has since moved to a role where it wouldn’t be appropriate for him to do that kind of thing, but I’d really love to see someone else take that up: picking a set of plausibly Hugo-worthy stories, setting a schedule, and collecting links to reviews. Even if it doesn’t lead to finding stuff that I actually love and want to nominate, it would be interesting to read about what other people see in these stories.

I don’t know that anybody reading this has the free time or standing in the SF community to do this kind of thing, but I know I would find it really valuable. So I’ll throw it out there and hope for the best.

Today was my last full day at MPIA for 2015. Ness and I worked on her age paper, and made the list of final items that need to be completed before the paper can be submitted, first to the *SDSS-IV* Collaboration, and then to the journal. I also figured out a bunch of baby steps I can do on my own paper with the age catalog.


Here is a puzzling example due to Roger White.

There are two coins. Coin 1 you know is fair. Coin 2 you know nothing about; it falls heads with some probability p, but you have no information about what p is.

Both coins are flipped by an experimenter in another room, who tells you that the two coins agreed (i.e. both were heads or both tails.)

What do you now know about Pr(Coin 1 landed heads) and Pr(Coin 2 landed heads)?

(Note: as is usual in analytic philosophy, whether or not this is puzzling is *itself* somewhat controversial, but I think it’s puzzling!)

**Update**: Lots of people seem to not find this at all puzzling, so let me add this. If your answer is “I know nothing about the probability that coin 1 landed heads, it’s some unknown quantity p that agrees with the unknown parameter governing coin 2,” you should ask yourself: is it strange that someone flipped a fair coin in another room and you don’t know what the probability is that it landed heads?
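White’s setup is easy to simulate. A small Monte Carlo sketch (Python; the trial count and seed are arbitrary choices of mine) shows that conditioning on agreement transfers Coin 2’s unknown bias p onto Coin 1:

```python
import random

def prob_coin1_heads_given_agree(p, trials=200_000, seed=0):
    """Estimate P(Coin 1 fell heads | the two coins agreed), where
    Coin 1 is fair and Coin 2 falls heads with probability p."""
    rng = random.Random(seed)
    agree = heads_and_agree = 0
    for _ in range(trials):
        c1 = rng.random() < 0.5  # fair coin
        c2 = rng.random() < p    # coin with unknown bias p
        if c1 == c2:
            agree += 1
            heads_and_agree += c1
    return heads_and_agree / agree

# Analytically: P(agree) = p/2 + (1-p)/2 = 1/2 regardless of p, but
# P(Coin 1 = heads | agree) = (p/2) / (1/2) = p exactly.
for p in (0.1, 0.5, 0.9):
    assert abs(prob_coin1_heads_given_agree(p) - p) < 0.01
```

That is the puzzle in a nutshell: after conditioning, your credence that a *fair* coin landed heads is the mystery coin’s unknown p.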

Relevant readings: section 3.1 of the Stanford Encyclopedia of Philosophy article on imprecise probabilities and Joyce’s paper on imprecise credences, pp.13-14.

I forgot to do this last week, because I was busy preparing for SteelyPalooza on Saturday, but here are links to my recent physics posts over at Forbes:

— What ‘Ant-Man’ Gets Wrong About The Real Quantum Realm: On the way home from the Schrödinger Sessions, I had some time to kill so I stopped to watch a summer blockbuster. The movie was enjoyable enough, thanks to charming performances from the key players, but the premise is dippy even for a comic-book movie. It does, however, provide a hook to talk about quantum physics, so…

— Great Books For Non-Physicists Who Want To Understand Quantum Physics: I did a bit of name-and-title-dropping at the Schrödinger Sessions, and a few of the writers asked if I had a list of books I would recommend (other than, you know, *How to Teach [Quantum] Physics to your Dog*). I didn’t have one already put together, so I made a new post listing a dozen good books to read.

— How Quantum Randomness Saves Relativity: Inspired in part by the many discussions of entanglement at the Schrödinger Sessions, a discussion of why you can’t actually use entangled particles to send messages faster than light. “Spooky action at a distance” is impossible because of “God playing dice,” a cute bit of historical irony.

— What Has Quantum Mechanics Ever Done For Us? I know you get more and angrier comments on political posts, but for sheer “WTF?” weirdness in the comment section, nothing beats quantum physics. This is a short explanation of the quantum underpinnings of major modern technologies, in response to a crank who left a bunch of angry comments on a G+ link to the quantum randomness article.

Not a huge number of posts for two weeks of blogging, but I’m very happy with them. And the quantum randomness one in particular is a nice counter to some myths about science communication– over 20,000 people have clicked through to read an article that builds up to a citation of the no-cloning theorem. I’m pretty proud of that.

SteelyKid starts second grade next week, and her summer project was to read Julius, the Baby of the World and make a poster with baby pictures of herself. This, of course, led to looking at a lot of old photos of SteelyKid, including many of the Baby Blogging shots I took back in the day with Appa for scale.

And now, of course, both kids are *way* bigger than Appa, so they wanted some up-to-date scale photos. Which, of course, I had to share with the Internet. So, behold, the attack of the giant children:

(The side-eye from The Pip is not because he’s skeptical about the picture-taking, but because he’d rather be watching the TV, out of frame to the right…)

And that’s how things look in what would be week 368 of Baby Blogging had we kept that up…

I seem to be settling into a groove of doing about two posts a week at Forbes, which isn’t quite enough to justify a weekly wrap-up, but works well bi-weekly. (I’m pretty sure that’s the one that means “every two weeks” not “twice a week,” but I always struggle with that one…) Over the last couple of weeks, I’ve hit a wide range of stuff:

— Planning To Study Science In College? Here’s Some Advice: Pretty much what it says on the label. I saw a bunch of “advice to new students” posts, and said “Oh, I should do one of those…” so I did.

— The Physics of Star Trek: Teleportation Versus Transporters: Somebody pointed out that Gene Roddenberry’s birthday was last week, and Alex Knapp at Forbes is a big Trekkie, so he asked the science folks if we could write about Star Trek science. I had been thinking of writing about teleportation anyway, so this was an obvious choice.

— How Quantum Symmetry Makes Solid Matter Possible: At the Schrödinger Sessions a few weeks back, Trey Porto of JQI gave a really nice explanation of quantum statistics that made me say “I’m totally going to steal that.” In the course of poking at ideas for a new book proposal, I ran across some mathematical physics papers showing that you need Pauli exclusion to explain the stability of solid matter, so I combined those here.

— New Experiment Closes Quantum Loopholes, Confirms Spookiness: A new arXiv preprint presents the first “loophole-free” test of Bell’s inequality, which is something people have been working on for decades now. So I wrote up an explanation of what it means and how it works.

So, that’s pretty much the full range of stuff I might write about over there: Two explainers, one with a pop-culture hook, one news story, and a thing about science education. Something about their system makes umpteen copies of the “photo gallery” for the old “Six Things Everyone Should Know About Quantum Physics” show up on my author page, making me look more insanely prolific than I really am, but that’s a decent two weeks worth of stuff…

At MPIA Galaxy Coffee, K. G. Lee (MPIA) and Jose Oñorbe (MPIA) gave talks about the intergalactic medium. Lee spoke about reconstruction of the density field, and Oñorbe spoke about reionization. The conversations continued into lunch, where I spoke with the research group of Joe Hennawi (MPIA) about various problems in inferring things about the intergalactic medium and quasar spectra in situations where *(a)* it is easy to simulate the data but *(b)* there is no explicit likelihood function. I advocated likelihood-free inference or ABC (as it is often called), plus adaptive sampling. We also discussed model selection, and I advocated cross-validation.
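For readers unfamiliar with likelihood-free inference: the idea is to replace likelihood evaluations with forward simulations. Here is a minimal rejection-ABC sketch in Python, with a toy Gaussian simulator standing in for an expensive intergalactic-medium simulation; the simulator, flat prior, summary statistic, and tolerance are all invented for illustration:

```python
import random

def simulate(theta, n=50, rng=random):
    """Toy forward model: n noisy measurements centered on theta.
    Stands in for a simulation that is easy to run but has no
    explicit likelihood function."""
    return [theta + rng.gauss(0.0, 1.0) for _ in range(n)]

def summary(data):
    """Summary statistic: the sample mean."""
    return sum(data) / len(data)

def abc_rejection(observed, n_draws=20_000, tol=0.1, seed=1):
    """Rejection ABC: draw theta from the prior, simulate data, and
    keep the draw when its summary lands within tol of the observed
    summary. The kept draws approximate the posterior on theta."""
    rng = random.Random(seed)
    target = summary(observed)
    kept = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # flat prior (assumed)
        if abs(summary(simulate(theta, rng=rng)) - target) < tol:
            kept.append(theta)
    return kept

observed = simulate(1.5, rng=random.Random(42))
posterior_draws = abc_rejection(observed)
```

Adaptive sampling, as advocated above, would then concentrate new simulations where draws are being accepted rather than sampling the prior blindly.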

In the afternoon, Ness and I continued code review and made decisions for final runs of *The Cannon* for our red-giant masses and ages paper.

I found this awesome photo at entdeckungen.net

“We live in a hologram,” the physicists say, but what do they mean? Is there a flat-world-me living on the walls of the room? Or am I the projection of a mysterious five-dimensional being and beyond my own comprehension? And if everything inside my head can be described by what’s on its boundary, then how many dimensions do I really live in? If these are questions that keep you up at night, I have the answers.

It all started with the search for a unified theory.

Unification has been enormously useful for our understanding of natural law: Apples fall according to the same laws that keep planets on their orbits. The manifold appearances of matter as gases, liquids and solids can be described as different arrangements of molecules. The huge variety of molecules themselves can be understood as various compositions of atoms. These unifying principles were discovered long ago. Today, physicists use “unification” to refer specifically to a common origin of different interactions. The electric and magnetic interactions, for example, turned out to be two different aspects of the same electromagnetic interaction. The electromagnetic interaction, or rather its quantum version, has further been unified with the weak nuclear interaction. Nobody has yet succeeded in unifying all presently known interactions: the electromagnetic with the strong and weak nuclear ones, plus gravity.

String theory was conceived as a theory of the strong nuclear interaction, but it soon became apparent that quantum chromodynamics, the theory of quarks and gluons, did a better job at this. But string theory gained a second wind after physicists discovered it may serve to explain all the known interactions including gravity, and so could be a unified theory of everything, the holy grail of physics.

It turned out to be difficult, however, to get specifically the Standard Model interactions back from string theory. And so the story goes that in recent years the quest for unification has slowly been replaced with a quest for dualities that demonstrate that all the different types of string theories are actually different aspects of the same theory, which is yet to be fully understood.

A duality in the most general sense is a relation that identifies two theories. You can understand a duality as a special type of unification: In a normal unification, you merge two theories into a larger theory that contains the former two in a suitable limit. If you relate two theories by a duality, you show that the theories are the same; they just appear different, depending on how you look at them.

One of the most interesting developments in high energy physics in recent decades is the discovery of dualities between theories in a different number of space-time dimensions. One of the theories is a gravitational theory in the higher-dimensional space, often called “the bulk”. The other is a gauge theory much like the ones in the Standard Model, and it lives on the boundary of the bulk space-time. This relation is often referred to as the gauge-gravity correspondence, and it is a limit of a more general duality in string theory.

To be careful: strictly speaking, this correspondence hasn’t been proved. But there are several examples that have been so thoroughly studied that there is very little doubt it will be proved at some point.

These dualities are said to be “holographic” because they tell us that everything allowed to happen in the bulk space-time of the gravitational theory is encoded on the boundary of that space. And because there are fewer bits of information on the surface of a volume than in the volume itself, fewer things can happen in the volume than you’d have expected. It might seem as if particles inside a box are all independent of each other, but they must actually be correlated. It’s as if you were watching a large room full of kids running and jumping, and suddenly noticed that every time one of them jumps, ten others must, for some mysterious reason, jump at exactly the same time.

This limitation on the amount of independence between particles due to holography would only become noticeable at densities too high for us to test directly. The reason this type of duality is interesting nevertheless is that physics is mostly the art of skillful approximation, and using dualities is a new skill.

You have probably seen those Feynman diagrams that sketch particle scattering processes. Each of these diagrams makes a contribution to an interaction process; the more loops there are in a diagram, the smaller the contribution. And so what physicists do is add up the largest contributions first, then the smaller ones, and even smaller ones, until they’ve reached the desired precision. This is called “perturbation theory” and only works if the contributions really do get smaller the more interactions take place. If that is so, the theory is said to be “weakly coupled” and all is well. If it ain’t so, the theory is said to be “strongly coupled” and you’d never be done summing all the relevant contributions. If a theory is strongly coupled, the standard methods of particle physics fail.
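The weak/strong distinction can be caricatured in a few lines of code. Assume, purely for illustration, that the L-loop contribution is just g**L for a coupling g (a geometric toy series, not a real field theory):

```python
def partial_sums(g, orders=12):
    """Partial sums of a toy perturbative series whose "L-loop"
    contribution is simply g**L (a plain geometric series)."""
    total, sums = 0.0, []
    for loops in range(orders):
        total += g ** loops
        sums.append(total)
    return sums

weak = partial_sums(0.1)   # contributions shrink: sums settle near 1/(1 - g)
strong = partial_sums(2.0) # contributions grow: the sums never settle

assert abs(weak[-1] - weak[-2]) < 1e-10     # weakly coupled: converged
assert strong[-1] - strong[-2] > strong[1]  # strongly coupled: still moving
```

For g < 1 each extra order refines the answer; for g > 1 each extra order changes it by more than everything that came before, which is why strongly coupled theories defeat this method.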

The strong nuclear force, for example, has the peculiar property of “asymptotic freedom,” meaning it becomes weaker at high energies. But at low energies, it is very strong. Consequently, nuclear matter at low energies is badly understood; consider, for example, the behavior of the quark gluon plasma, or the reason why single quarks do not travel freely but are always “confined” in larger composite states. Another interesting case that falls in this category is that of “strange” metals, which include high-temperature superconductors, another holy grail of physics. The gauge-gravity duality helps deal with these systems because when one theory is strongly coupled and difficult to treat, the dual theory is weakly coupled and easy to treat. So the duality essentially serves to convert a difficult calculation into a simple one.

Since the theory on the boundary and the theory in the bulk are related by the duality, they can be used to describe the same physics. So on a fundamental level the distinction doesn’t make sense: they are two different ways to describe the same thing. It’s just that sometimes one of them is easier to use, sometimes the other.

One can give meaning to the question, though, if you look at particular systems, such as the quark gluon plasma or a black hole, and ask for the number of dimensions that particles experience. This specification in terms of particles is what makes the question meaningful, because identifying particles isn’t always possible.

The theory for the quark gluon plasma is placed on the boundary because it would be described by the strongly coupled theory. So if you consider it to be part of your laboratory, then you have located the lab, with yourself in it, on the boundary. However, the notion of ‘dimensions’ that we experience is tied to the freedom of particles to move around. This can be made more rigorous with the definition of ‘spectral dimension’, which measures, roughly speaking, in how many directions a particle can get lost. But the very fact that makes a system strongly coupled means that one can’t properly define single particles that travel freely. So while you can move around in the laboratory’s three spatial dimensions, the quark gluon plasma first has to be translated to the higher-dimensional theory to even speak about individual particles moving. In that sense, part of the laboratory has indeed become higher dimensional.

If you look at an astrophysical black hole, however, the situation is reversed. We know that particles in its vicinity are weakly coupled and experience three spatial dimensions. If you wanted to apply the duality in this case, then we would be situated in the bulk, and there would be lower-dimensional projections of us and the black hole on the boundary, constraining our freedom to move around, but in such a subtle way that we don’t notice. However, the bulk space-times that are relevant in the gauge-gravity duality are so-called anti-de Sitter spaces, and these always have a negative cosmological constant. The universe we inhabit, by contrast, has, to the best of our current knowledge, a positive cosmological constant. So it is not clear that there actually is a dual system that can describe the black holes in our universe.

Many researchers are presently working on expanding the gauge-gravity duality to include spaces with a positive cosmological constant or none at all, but at least so far it isn’t clear that this works. So for now we do not know whether there exist projections of us in a lower-dimensional space-time.

The applications of the gauge-gravity duality fall roughly into three large areas, plus a diversity of technical developments driving the general understanding of the theory. The three areas are the quark gluon plasma, strange metals, and black hole evaporation. In the former two cases our universe is on the boundary, in the latter we are in the bulk.

The studies of black hole evaporation are examinations of mathematical consistency conducted to unravel just exactly how information may escape a black hole, or what happens at the singularity. In this area there are presently more questions than there are answers. The applications of the duality to the quark gluon plasma initially caused a lot of excitement, but recently some skepticism has spread. It seems that the plasma is not as strongly coupled as originally thought, and using the duality is not as straightforward as hoped. The applications to strange metals and other classes of materials are making rapid progress as both analytical and numerical methods are being developed. The behavior of several observables has been qualitatively reproduced, but it is at present not very clear exactly which systems are best to use. The space of models is still too big, leaving too much wiggle room to extract useful predictions. In summary, as the scientists say, “more work is needed”.

That’s what he says, yes. Essentially he is claiming that our universe has holographic properties even though it has a positive cosmological constant, and that the horizon of a black hole also serves as a surface that contains all the information of what happens in the full space-time. This would mean in particular that the horizon of a black hole keeps track of what fell into the black hole, and so nothing is really forever lost.

This by itself isn’t a new idea. What is new in this work with Malcolm Perry and Andrew Strominger is that they claim to have a way to store and release the information, in a dynamical situation. Details of how this is supposed to work, however, are so far not clear. By and large the scientific community has reacted with much skepticism, not to mention annoyance over the announcement of an immature idea.


The University of Michigan at Ann Arbor is proud to be hosting ALGECOM, the twice-annual midwestern conference on algebra, geometry and combinatorics, on Saturday, October 24. We will feature four speakers, namely:

Jonah Blasiak (Drexel University)

Laura Escobar (University of Illinois at Urbana-Champaign)

Joel Kamnitzer (University of Toronto)

Tri Lai (IMA and University of Minnesota)

as well as a poster session. If you would like to submit a poster, please e-mail (David Speyer) with a quick summary of your work by September 15.

A block of rooms has been reserved at the (Lamp Post Inn) under the name of ALGECOM.

This conference is supported by a conference grant from the NSF. Limited funds are available for graduate student travel to the conference. Please contact (David Speyer) to request support, and include a note from your adviser.

More information will be added to our website as it becomes available.

We hope to see you there!


The twin prime conjecture is one of the oldest unsolved problems in analytic number theory. There are several reasons why this conjecture remains out of reach of current techniques, but the most important obstacle is the parity problem which prevents purely sieve-theoretic methods (or many other popular methods in analytic number theory, such as the circle method) from detecting pairs of prime twins in a way that can distinguish them from other twins of almost primes. The parity problem is discussed in these previous blog posts; this obstruction is ultimately powered by the *Möbius pseudorandomness principle* that asserts that the Möbius function is asymptotically orthogonal to all “structured” functions (and in particular, to the weight functions constructed from sieve theory methods).

However, there is an intriguing “alternate universe” in which the Möbius function *is* strongly correlated with some structured functions, and specifically with some Dirichlet characters, leading to the existence of the infamous “Siegel zero”. In this scenario, the parity problem obstruction disappears, and it becomes possible, *in principle*, to attack problems such as the twin prime conjecture. In particular, we have the following result of Heath-Brown:

Theorem 1 At least one of the following two statements is true:

- (Twin prime conjecture) There are infinitely many primes $p$ such that $p+2$ is also prime.
- (No Siegel zeroes) There exists an absolute constant $c > 0$ such that for every real Dirichlet character $\chi$ of conductor $q > 1$, the associated Dirichlet $L$-function $L(s, \chi)$ has no zeroes in the interval $[1 - \frac{c}{\log q}, 1]$.

Informally, this result asserts that if one had an infinite sequence of Siegel zeroes, one could use this to generate infinitely many twin primes. See this survey of Friedlander and Iwaniec for more on this “illusory” or “ghostly” parallel universe in analytic number theory, which should not actually exist, but is surprisingly self-consistent and has so far proven impossible to banish from the realm of possibility.

The strategy of Heath-Brown’s proof is fairly straightforward to describe. The usual starting point is to try to lower bound

$$\sum_{x \leq n \leq 2x} \Lambda(n) \Lambda(n+2)$$

for some large value of $x$, where $\Lambda$ is the von Mangoldt function. Actually, in this post we will work with the slight variant

$$\sum_{x \leq n \leq 2x} \Lambda_2(n(n+2))\, \nu(n(n+2)) \qquad (1)$$

where

$$\Lambda_2 := \mu * \log^2$$

is the second von Mangoldt function, and $*$ denotes Dirichlet convolution, and $\nu$ is an (unsquared) Selberg sieve that damps out small prime factors. This sum also detects twin primes, but will lead to slightly simpler computations. For technical reasons we will also smooth out the interval $x \leq n \leq 2x$ and remove very small primes from the sums, but we will skip over these steps for the purpose of this informal discussion. (In Heath-Brown’s original paper, the Selberg sieve is essentially replaced by the more combinatorial restriction $(n(n+2), W) = 1$ for some large $W$, where $W$ is the primorial of a parameter $w$, but I found the computations to be slightly easier if one works with a Selberg sieve, particularly if the sieve is not squared to make it nonnegative.)

If there is a Siegel zero $L(\beta, \chi) = 0$, with $\beta$ close to $1$ and $\chi$ a real Dirichlet character of conductor $q$, then multiplicative number theory methods can be used to show that the Möbius function $\mu$ “pretends” to be like the character $\chi$ in the sense that $\mu(p) = \chi(p)$ for “most” primes $p$ near $q$ (e.g. in the range $[q^\varepsilon, q^C]$ for some small $\varepsilon$ and large $C$). Traditionally, one uses complex-analytic methods to demonstrate this, but one can also use elementary multiplicative number theory methods to establish these results (qualitatively at least), as will be shown below the fold.

The fact that pretends to be like can be used to construct a tractable approximation (after inserting the sieve weight ) in the range (where for some large ) for the second von Mangoldt function , namely the function

Roughly speaking, we think of the periodic function and the slowly varying function as being of about the same “complexity” as the constant function , so that is roughly of the same “complexity” as the divisor function

which is considerably simpler to obtain asymptotics for than the von Mangoldt function, as the Möbius function is no longer present. (For instance, note from the Dirichlet hyperbola method that one can estimate $\sum_{n \leq x} \tau(n)$ to accuracy $O(\sqrt{x})$ with little difficulty, whereas to obtain a comparable level of accuracy for $\sum_{n \leq x} \Lambda(n)$ or $\sum_{n \leq x} \Lambda_2(n)$ is essentially the Riemann hypothesis.)
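The Dirichlet hyperbola method mentioned here is simple enough to demonstrate directly. A short Python sketch that computes the divisor sum in $O(\sqrt{x})$ steps by counting lattice points under the hyperbola $dm \leq x$, checked against a brute-force count:

```python
from math import isqrt

def divisor_sum_hyperbola(x):
    """Sum of tau(n) for n <= x via the hyperbola method: count lattice
    points (d, m) with d*m <= x, exploiting the symmetry about d = m."""
    r = isqrt(x)
    return 2 * sum(x // d for d in range(1, r + 1)) - r * r

def divisor_sum_naive(x):
    """Brute force over all divisors d."""
    total = 0
    for d in range(1, x + 1):
        total += x // d  # d divides exactly floor(x/d) numbers up to x
    return total

assert divisor_sum_hyperbola(10) == 27  # tau(1..10) = 1,2,2,3,2,4,2,4,3,4
assert divisor_sum_hyperbola(10_000) == divisor_sum_naive(10_000)
```

This is the “little difficulty” referred to above: only about $\sqrt{x}$ terms are ever summed.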

One expects to be a good approximant to if is of size and has no prime factors less than for some large constant . The Selberg sieve will be mostly supported on numbers with no prime factor less than . As such, one can hope to approximate (1) by the expression

as it turns out, the error between this expression and (1) is easily controlled by sieve-theoretic techniques. Let us ignore the Selberg sieve for now and focus on the slightly simpler sum

As discussed above, this sum should be thought of as a slightly more complicated version of the sum

Accordingly, let us look (somewhat informally) at the task of estimating the model sum (3). One can think of this problem as basically that of counting solutions to the equation with in various ranges; this is clearly related to understanding the equidistribution of the hyperbola in . Taking Fourier transforms, the latter problem is closely related to estimation of the Kloosterman sums

where denotes the inverse of in . One can then use the Weil bound

where is the greatest common divisor of (with the convention that this is equal to if vanish), and the decays to zero as . The Weil bound yields good enough control on error terms to estimate (3), and, as it turns out, the same method also works to estimate (2) (provided that with large enough).

Actually one does not need the full strength of the Weil bound here; any power savings over the trivial bound of will do. In particular, it will suffice to use the weaker, but easier to prove, bounds of Kloosterman:

Lemma 2 (Kloosterman bound) One has whenever and are coprime to , where the is with respect to the limit (and is uniform in ).

*Proof:* Observe from a change of variables that the Kloosterman sum is unchanged if one replaces with for . For fixed , the number of such pairs is at least , thanks to the divisor bound. Thus it will suffice to establish the fourth moment bound

The left-hand side can be rearranged as

which by Fourier summation is equal to

Observe from the quadratic formula and the divisor bound that each pair has at most solutions to the system of equations . Hence the number of quadruples of the desired form is , and the claim follows.
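The Kloosterman sum and the Weil bound discussed above are standard, and can be checked numerically; a small Python sketch (my own illustration, not part of the argument) computes S(a,b;c) = Σ_{x ∈ (Z/cZ)^*} e((ax + b x⁻¹)/c) directly and compares it with the Weil bound d(c) gcd(a,b,c)^{1/2} c^{1/2}:

```python
import cmath
import math

def kloosterman(a, b, c):
    """Kloosterman sum S(a,b;c): sum over units x mod c of e((a*x + b*x^{-1})/c)."""
    total = 0.0 + 0.0j
    for x in range(1, c):
        if math.gcd(x, c) == 1:
            xinv = pow(x, -1, c)  # modular inverse (Python 3.8+)
            total += cmath.exp(2j * cmath.pi * (a * x + b * xinv) / c)
    return total

def num_divisors(n):
    """d(n), computed naively."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)

# Weil bound: |S(a,b;c)| <= d(c) * gcd(a,b,c)^{1/2} * c^{1/2}.
c = 101  # prime modulus, so d(c) = 2 and the bound is 2*sqrt(c)
for a, b in [(1, 1), (2, 7), (5, 3)]:
    s = abs(kloosterman(a, b, c))
    bound = num_divisors(c) * math.sqrt(math.gcd(a, math.gcd(b, c))) * math.sqrt(c)
    print(f"|S({a},{b};{c})| = {s:.3f} <= {bound:.3f}")
```

The sums come out real (pairing x with −x conjugates each term), and for prime modulus the square-root cancellation over the roughly c terms is clearly visible.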

We will also need another easy case of the Weil bound to handle some other portions of (2):

Lemma 3 (Easy Weil bound) Let be a primitive real Dirichlet character of conductor , and let . Then

*Proof:* As is the conductor of a primitive real Dirichlet character, is equal to times a squarefree odd number for some . By the Chinese remainder theorem, it thus suffices to establish the claim when is an odd prime. We may assume that is not divisible by this prime , as the claim is trivial otherwise. If vanishes then does not vanish, and the claim follows from the mean zero nature of ; similarly if vanishes. Hence we may assume that do not vanish, and then we can normalise them to equal . By completing the square it now suffices to show that

whenever . As is on the quadratic residues and on the non-residues, it now suffices to show that

But by making the change of variables , the left-hand side becomes , and the claim follows.
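The facts used in this proof are closely related to the classical evaluation of quadratic Gauss sums: for the Legendre symbol χ mod an odd prime p and p ∤ a, one has |Σ_{x mod p} χ(x) e(ax/p)| = √p. A quick numerical check (my own sketch, not part of the argument):

```python
import cmath
import math

def legendre(x, p):
    """Legendre symbol (x|p) for an odd prime p, via Euler's criterion."""
    if x % p == 0:
        return 0
    r = pow(x, (p - 1) // 2, p)
    return 1 if r == 1 else -1

def gauss_sum(a, p):
    """Quadratic Gauss sum: sum over x mod p of (x|p) e(a x / p)."""
    return sum(legendre(x, p) * cmath.exp(2j * cmath.pi * a * x / p)
               for x in range(p))

# For p not dividing a, the magnitude is exactly sqrt(p).
p = 103
for a in (1, 2, 5):
    print(abs(gauss_sum(a, p)), math.sqrt(p))
```

Replacing a by a unit multiple only rotates the sum by χ of that unit, so the magnitude √p is independent of a, consistent with the square-root cancellation the lemma provides.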

While the basic strategy of Heath-Brown’s argument is relatively straightforward, implementing it requires a large amount of computation to control both main terms and error terms. I experimented for a while with rearranging the argument to try to reduce the amount of computation; I did not fully succeed in arriving at a satisfactorily minimal amount of superfluous calculation, but I was able to at least reduce this amount a bit, mostly by replacing a combinatorial sieve with a Selberg-type sieve (which did not need to be positive, so I dispensed with the squaring aspect of the Selberg sieve to simplify the calculations a little further; also for minor reasons it was convenient to retain a tiny portion of the combinatorial sieve to eliminate extremely small primes). Also some modest reductions in complexity can be obtained by using the second von Mangoldt function in place of . These exercises were primarily for my own benefit, but I am placing them here in case they are of interest to some other readers.

**— 1. Consequences of a Siegel zero —**

It is convenient to phrase Heath-Brown’s theorem in the following equivalent form:

Theorem 4 Suppose one has a sequence of real Dirichlet characters of conductor going to infinity, and a sequence of real zeroes with as . Then there are infinitely many prime twins.

Henceforth, we omit the dependence on from all of our quantities (unless they are explicitly declared to be “fixed”), and the asymptotic notation , , , etc. will always be understood to be with respect to the parameter, e.g. means that for some fixed . (In the language of this previous blog post, we are thus implicitly using “cheap nonstandard analysis”, although we will not explicitly use nonstandard analysis notation (other than the asymptotic notation mentioned above) further in this post.) With this convention, we now have a single (but not fixed) Dirichlet character of some conductor with a Siegel zero

It will also be convenient to use the crude bound

which can be proven by elementary means (see e.g. Exercise 57 of this post), although one can use Siegel’s theorem to obtain the better bound . Standard arguments (see also Lemma 59 of this blog post) then give

We now use this Siegel zero to show that pretends to be like for primes that are comparable (in log-scale) to :

Lemma 5 For any fixed , we have

For more precise estimates on the error, see the paper of Heath-Brown (particularly Lemma 3).

*Proof:* It suffices to show, for sufficiently large fixed , that

for each fixed natural number .

We begin by considering the sum

Since , one can show through summation by parts (see Lemma 71 of this previous post) that

for any , while from the integral test (see Lemma 2 of this previous post) we have

We can thus estimate (9) as

From summation by parts we again have

and we have the crude bound

so by using (7) and we arrive at

for any , where the exponent does not depend on . In particular, if and is large enough, then by (6), (7), (8) we have

Setting and and subtracting, we conclude that

Comparing this with (10), we conclude that

since , the claim follows.

**— 2. Main argument —**

We let be a large absolute constant ( will do) and set to be the primorial of . Set for some large fixed (large compared to or ). Let be a smooth non-negative function supported on and equal to at . Set

and

Thus is a smooth cutoff to the region , and is a smooth cutoff to the region . It will suffice to establish the lower bound

because the non-twin primes contribute at most to the left-hand side. The weight is an unsquared Selberg sieve designed to damp out those for which or have somewhat small prime factors; we did not square this weight as is customary with the Selberg sieve in order to simplify the calculations slightly (the fact that the weight can be negative sometimes will not be a serious concern for us).

Thus is non-negative, and supported on those products of primes with and . Convolving (11) by and using the identity , we have

where . (The quantities are all non-negative, but we will not take advantage of these facts here.) It thus suffices to establish the two bounds

The intuition here is that Lemma 5 is showing that is “sparse” and so the contribution of should be relatively small.

We begin with (13). Let be a small fixed quantity to be chosen later. Observe that if is non-zero, then must have a factor on which is non-zero, which implies that is either divisible by a prime with , or by the square of a prime. If the former case occurs, then either or is divisible by ; since , this implies that either is divisible by a prime with , or that is divisible by a prime less than . To summarise, at least one of the following three statements must hold:

- is divisible by a prime .
- is divisible by the square of a prime .
- is divisible by a prime with .

It thus suffices to establish the estimates

as the claim then follows by summing and sending slowly to zero.

We begin with (15). Observe that if divides then either divides or divides . In particular the number of with is . The summand is by the divisor bound, so the left-hand side of (15) is bounded by

and the claim follows.

Next we turn to (14). We can very crudely bound

By Mertens’ theorem, it suffices to show that

for all .
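Mertens’ theorem, invoked here and at several later points, asserts in one form that Σ_{p ≤ x} 1/p = log log x + M + o(1), where M ≈ 0.26150 is the Mertens constant. A quick numerical sanity check (my own sketch):

```python
import math

def primes_upto(n):
    """Sieve of Eratosthenes; returns all primes up to n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(sieve[i * i::i]))
    return [i for i in range(2, n + 1) if sieve[i]]

x = 10**6
ps = primes_upto(x)
# Mertens: sum_{p <= x} 1/p = log log x + M + o(1), M the Mertens constant.
s = sum(1.0 / p for p in ps)
print(s, math.log(math.log(x)) + 0.26149721284764278)
```

Even at x = 10^6 the two numbers agree to several decimal places, reflecting how rapidly the error term in Mertens’ theorem decays.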

We use a modification of the argument used to prove Proposition 4.2 of this Polymath8b paper. By Fourier inversion, we may write

for some rapidly decreasing function , so that

and hence

and hence by the triangle inequality

for any fixed . Since , we can thus (after substituting ) bound the left-hand side of (18) by

and so it will suffice to show the bound

for any and .

We factor where are primes, and then write where and is the largest index for which . Clearly and with , and the least prime factor of is such that

we have on the support of , and so

and thus . Clearly we have

We write , where denotes the number of prime factors of counting multiplicity. We can thus bound the left-hand side of (19) by

We may replace the weight with a restriction of to the interval . The constraint removes two residue classes modulo every odd prime less than , while the constraint restricts to residue classes modulo . Standard sieve theory then gives

and so we are reduced to showing that

Factoring , we can bound the left-hand side by

which (for large enough) is bounded by

which by Mertens’ theorem is bounded by

and the claim follows.

For future reference we observe that the above arguments also establish the bound

for all .

Finally, we turn to (16). Using (17) again, it suffices to show that

The claim then follows from (21) and Lemma 5.

It remains to prove (12), which we write as

On the support of , we can write

The contribution of the error term can be bounded by

applying (20), this is bounded by which is acceptable for large enough. Thus it suffices to show that

which we write as

where . We split where , , are smooth truncations of to the intervals , , and respectively. It will suffice to establish the bounds

We begin with (24), which is a relatively easy consequence of the cancellation properties of . We may rewrite the left-hand side as

The summand vanishes unless , , and is coprime to , so that . For fixed , the constraints , restrict to residue classes of the form , with , in particular and for some with . Let us fix and consider the sum

Writing , this becomes

From Lemma 3, we have

since is coprime to . From summation by parts we thus have

(noting that if is large enough) and so we can bound the left-hand side of (24) in magnitude by

and (24) follows.

Now we prove (23), which is where we need nontrivial bounds on Kloosterman sums. Expanding out and using the triangle inequality, it suffices (for large enough) to show that

for all . By Fourier expansion of the and constraints (retaining only the restriction that is odd), it suffices to show that

for every .

Fix . If for an odd , then we can uniquely factor such that , , and . It thus suffices to show that

Actually, we may delete the condition since this is implied by the constraints and odd.

We first dispose of the case when is large in the sense that . Making the change of variables , we may rewrite the left-hand side as

We can assume is coprime to and odd with coprime to and , as the contribution of all other cases vanishes. The constraints that is odd and then restrict to a single residue class modulo , with restricted to a single residue class modulo . We split this into residue classes modulo to make the phase constant on each residue class. The modulus is not divisible by , since is coprime to and . As such, has mean zero on every consecutive elements in each residue class modulo under consideration, and from summation by parts we then have

and hence the contribution of the case to (25) is

which is acceptable.

It remains to control the contribution of the case to (25). By the triangle inequality, it suffices to show that

for all coprime to . We can of course restrict to be coprime to each other and to . Writing , the constraint is equivalent to

and so we can rewrite the left-hand side as

By Fourier expansion, we can write as a linear combination of with bounded coefficients and , so it suffices to show that

Next, by Fourier expansion of the constraint , we write the left-hand side as

From Poisson summation and the smoothness of , we see that the inner sum is unless

We rearrange the left-hand side as

Suppose first that is of the form for some integer . Then the phase is periodic with period and has mean zero here (since ). From this, we can estimate the inner sum by ; since is restricted to be of size , this contribution is certainly acceptable. Thus we may assume that is not of the form . A similar argument works when (say), so we may assume that , so that .

By (26), this forces the denominator of in lowest form to be . By Lemma 2, we thus have

for any , so from Poisson summation we have

since is constrained to be , the claim follows.

Finally, we prove (22), which is a routine sieve-theoretic calculation. We rewrite the left-hand side as

The summand vanishes unless are coprime to with and . From Poisson summation one then has

The error term is certainly negligible, so it suffices to show that

We can control the left-hand side by Fourier analysis. Writing

and

for some rapidly decreasing functions , the left-hand side may be expressed as

for , and

for . From Mertens’ theorem we have the crude bound

which by the rapid decrease of allows one to restrict to the range with an error of . In particular, we now have .

Recalling that

for , we can factor

where

(the restriction being to prevent vanishing for and small) and one has

for , and

and

for odd . In particular, from the Cauchy integral formula we see that

for . Since we also have in this region, we thus can write (27) as

and our task is now to show that

We have

when (even when have negative real part); since , we conclude from the Cauchy integral formula that

when . For the remaining primes , we have

when and . Summing in using Lemma 5 to handle those between and , and Mertens’ theorem and the trivial bound for all other , we conclude that

and thus

From this and the rapid decrease of , we may restrict the range of even further to for any that goes to infinity arbitrarily slowly with . For sufficiently slow , the above estimates on and Lemma 5 (now used to handle those between and for some going sufficiently slowly to zero) give

and so we are reduced to establishing that

We may once again use the rapid decrease of to remove the prefactor as well as the restrictions , and reduce to showing that

For large enough, it will suffice to show that

with the implied constant independent of . But the left-hand side evaluates to , and the claim follows.

Filed under: expository, math.NT Tagged: prime numbers, Roger Heath-Brown, Siegel zero, twin primes

At Milky Way group meeting, Eddie Schlafly (MPIA) showed beautiful results (combining *PanSTARRS*, *APOGEE*, *2MASS*, and *WISE* data) on the dust extinction law in the Milky Way. He can see that some of the nearby dust structures have anomalous RV values (dust extinction law shapes). Some of these are previously unknown features; they only appear when you have maps of density and RV at the same time. Maybe he gets to name these new structures!

Late in the day, Ness and I audited her code that infers red-giant masses from *APOGEE* spectra. We found some issues with sigmas and variances and inverse variances. It gets challenging! One consideration is that you don't ever want to have infinities, so you want to use inverse variances (which become zero when data are missing). But on the other hand, you want to avoid singular or near-singular matrices (which happen when you have lots of vanishing inverse variances). So we settled on a consistent large value for sigma (and correspondingly small value for the inverse variance) that satisfies both issues for our problem.
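The compromise described above can be sketched as follows (a toy illustration with invented numbers, not the actual analysis code):

```python
import numpy as np

# Toy spectrum: flux measurements with some missing pixels (flagged as NaN).
flux = np.array([1.0, 1.2, np.nan, 0.9, np.nan])
sigma = np.array([0.05, 0.04, np.nan, 0.06, np.nan])

# Pure inverse variances give missing data ivar = 0 (no infinities), but
# matrices built from many zero ivars can become singular. Compromise:
# assign missing pixels a large but finite sigma, i.e. a small but nonzero
# inverse variance, so downstream matrices stay well-conditioned.
BIG_SIGMA = 1.0e3
sigma_fixed = np.where(np.isnan(sigma), BIG_SIGMA, sigma)
flux_fixed = np.where(np.isnan(flux), 0.0, flux)
ivar = 1.0 / sigma_fixed**2

# Inverse-variance-weighted mean: missing pixels contribute essentially nothing.
mean = np.sum(ivar * flux_fixed) / np.sum(ivar)
print(mean)
```

The weighted mean is then numerically indistinguishable from the mean over the good pixels alone, while every pixel carries a finite, nonzero inverse variance.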

Late in the day, Rix, Ness, and I showed Ben Weiner (Arizona) the figures we have made for our paper on inferring red-giant masses and ages from *APOGEE* spectroscopy. He helped us think about changes we might make to the figures to bolster and make more clear the arguments.

I spent some of the day manipulating delta functions and mixtures of delta functions for my attempt to infer the star-formation history of the Milky Way. I learned (for the *N*th time) that it is better to manipulate Gaussians than delta functions; delta functions are way too freaky! And, once again, thinking about things dimensionally (that is, in terms of units) is extremely valuable.

In the morning, Rix and I wrote to Andy Casey (Cambridge) regarding a proposal he made to use *The Cannon* and things we know about weak (that is, linearly responding) spectral lines to create a more interpretable or physically motivated version of our data-driven model, and maybe get detailed chemical abundances. Some of his ideas overlap what we are doing with Yuan-Sen Ting (Harvard). Unfortunately, there is no real way to benefit enormously from the flexibility of the data-driven model without also losing interpretability. The problem is that the training set can have arbitrary issues within it, and these become part of the model; if you can understand the training set so well that you can rule out these issues, then you don't need a data-driven model!

I have written multiple times (here and here, for example) about my concern that the structure of financial incentives and corporate governance have basically killed much of the American corporate research enterprise. Simply put: corporate officers are very heavily rewarded based on very short term metrics (stock price, year-over-year change in rate of growth of profit). When faced with whether to invest company resources in risky long-term research that may not pay off for years if ever, most companies opt out of that investment. Companies that do make long-term investments in research are generally quasi-monopolies. The definition of "research" has increasingly crept toward what used to be called "development"; the definition of "long term" has edged toward "one year horizon for a product"; and physical sciences and engineering research has massively eroded in favor of much less expensive (in infrastructure, at least) work on software and algorithms.

I'm not alone in making these observations - Norm Augustine, former CEO of Lockheed Martin, basically says the same thing, for example. Hillary Clinton has lately started talking about this issue.

Now, writing in The New Yorker this week, James Surowiecki claims that "short termism" is a myth. Apparently companies love R&D and have been investing in it more heavily. I think he's just incorrect, in part because I don't think he really appreciates the difference between research and development, and in part because I don't think he appreciates the sliding definitions of "research", "long term" and the difference between software development and physical sciences and engineering. I'm not the only one who thinks his article has issues - see this article at Forbes.

No one disputes the long list of physical research enterprises that have been eliminated, gutted, strongly reduced, or refocused onto much shorter term projects. A brief list includes IBM, Xerox, Bell Labs, Motorola, General Electric, Ford, General Motors, RCA, NEC, HP Labs, Seagate, 3M, Dupont, and others. Even Microsoft has been cutting back. No one disputes that corporate officers have often left these organizations with fat benefits packages after making long-term, irreversible reductions in research capacity (I'm looking at you, Carly Fiorina). Perhaps "short termism" is too simple an explanation, but claiming that all is well in the world of industrial research just rings false.
One of the important things in life is to have a job you enjoy and which is a motivation for waking up in the morning. I can say I am lucky enough to be in that situation. Besides providing me with endless entertainment through the large dataset I enjoy analyzing, and the constant challenge to find new ways and ideas to extract more information from data, my job also gives me the opportunity to gamble - and win money, occasionally.

The following is the prepared version of a talk that I gave at SPARC: a high-school summer program about applied rationality held in Berkeley, CA for the past two weeks. I had a wonderful time in Berkeley, meeting new friends and old, but I’m now leaving to visit the CQT in Singapore, and then to attend the AQIS conference in Seoul.

**Common Knowledge and Aumann’s Agreement Theorem**

*August 14, 2015*

Thank you so much for inviting me here! I honestly don’t know whether it’s possible to teach applied rationality, the way this camp is trying to do. What I know is that, if it *is* possible, then the people running SPARC are some of the awesomest people on earth to figure out how. I’m incredibly proud that Chelsea Voss and Paul Christiano are both former students of mine, and I’m amazed by the program they and the others have put together here. I hope you’re all having fun—or maximizing your utility functions, or whatever.

My research is mostly about quantum computing, and more broadly, computation and physics. But I was asked to talk about something you can actually use in your lives, so I want to tell a different story, involving common knowledge.

I’ll start with the “Muddy Children Puzzle,” which is one of the greatest logic puzzles ever invented. How many of you have seen this one?

OK, so the way it goes is, there are a hundred children playing in the mud. Naturally, they all have muddy foreheads. At some point their teacher comes along and says to them, as they all sit around in a circle: “stand up if you know your forehead is muddy.” No one stands up. For how could they know? Each kid can see all the *other* 99 kids’ foreheads, so knows that they’re muddy, but can’t see his or her own forehead. (We’ll assume that there are no mirrors or camera phones nearby, and also that this is mud that you don’t feel when it’s on your forehead.)

So the teacher tries again. “*Knowing* that no one stood up the last time, *now* stand up if you know your forehead is muddy.” Still no one stands up. Why would they? No matter how many times the teacher repeats the request, still no one stands up.

Then the teacher tries something new. “Look, I hereby announce that *at least one of you* has a muddy forehead.” After that announcement, the teacher again says, “stand up if you know your forehead is muddy”—and again no one stands up. And again and again; it continues 99 times. But then the hundredth time, all the children suddenly stand up.

(There’s a variant of the puzzle involving blue-eyed islanders who all suddenly commit suicide on the hundredth day, when they all learn that their eyes are blue—but as a blue-eyed person myself, that’s always struck me as needlessly macabre.)

What’s going on here? Somehow, the teacher’s announcing to the children that *at least one of them had a muddy forehead* set something dramatic in motion, which would eventually make them all stand up—but how could that announcement possibly have made any difference? After all, each child already knew that at least *99* children had muddy foreheads!

Like with many puzzles, the way to get intuition is to change the numbers. So suppose there were *two* children with muddy foreheads, and the teacher announced to them that at least one had a muddy forehead, and then asked both of them whether their own forehead was muddy. Neither would know. But each child could reason as follows: “if my forehead *weren’t* muddy, then the other child would’ve seen that, and would also have known that at least one of us has a muddy forehead. Therefore she would’ve known, when asked, that her own forehead was muddy. Since she didn’t know, that means my forehead *is* muddy.” So then both children know their foreheads are muddy, when the teacher asks a second time.

Now, this argument can be generalized to *any* (finite) number of children. The crucial concept here is common knowledge. We call a fact “common knowledge” if, not only does everyone know it, but everyone knows everyone knows it, and everyone knows everyone knows everyone knows it, and so on. It’s true that in the beginning, each child knew that all the other children had muddy foreheads, but it wasn’t common knowledge that even *one* of them had a muddy forehead. For example, if your forehead and mine are both muddy, then I know that at least one of us has a muddy forehead, and you know that too, but you don’t know that I know it (for what if your forehead were clean?), and I don’t know that you know it (for what if my forehead were clean?).

What the teacher’s announcement did, was to *make it* common knowledge that at least one child has a muddy forehead (since not only did everyone hear the announcement, but everyone witnessed everyone else hearing it, etc.). And once you understand that point, it’s easy to argue by induction: after the teacher asks and no child stands up (and everyone sees that no one stood up), it becomes common knowledge that at least *two* children have muddy foreheads (since if only one child had had a muddy forehead, that child would’ve known it and stood up). Next it becomes common knowledge that at least *three* children have muddy foreheads, and so on, until after a hundred rounds it’s common knowledge that everyone’s forehead is muddy, so everyone stands up.
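The inductive argument can be checked by direct simulation; a small Python sketch (the standing rule below encodes the induction, rather than the children’s full epistemic reasoning):

```python
def muddy_children(muddy):
    """Simulate the puzzle after the teacher's public announcement that at
    least one forehead is muddy.  `muddy` is a list of booleans.  By the
    induction in the text, a muddy child who sees m muddy foreheads learns
    their own status in round m+1.  Returns the round in which each muddy
    child stands up (None for clean children)."""
    n = len(muddy)
    stood = [None] * n
    round_no = 0
    while any(muddy[i] and stood[i] is None for i in range(n)):
        round_no += 1
        for i in range(n):
            if muddy[i] and stood[i] is None:
                sees = sum(muddy[j] for j in range(n) if j != i)
                # If the `sees` muddy children I can see would all have stood
                # by round `sees`, and none did, my own forehead is muddy.
                if sees == round_no - 1:
                    stood[i] = round_no
    return stood

# 100 children, all muddy: everyone stands simultaneously in round 100.
print(set(muddy_children([True] * 100)))  # prints {100}
```

With two muddy children the simulation has both stand in round 2, matching the two-child reasoning above.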

The moral is that *the mere act of saying something publicly can change the world—even if everything you said was already obvious to every last one of your listeners.* For it’s possible that, until your announcement, not everyone knew that everyone knew the thing, or knew everyone knew everyone knew it, etc., and that could have prevented them from acting.

This idea turns out to have huge real-life consequences, to situations way beyond children with muddy foreheads. I mean, it also applies to children with dots on their foreheads, or “kick me” signs on their backs…

But seriously, let me give you an example I stole from Steven Pinker, from his wonderful book *The Stuff of Thought*. Two people of indeterminate gender—let’s not make any assumptions here—go on a date. Afterward, one of them says to the other: “Would you like to come up to my apartment to see my etchings?” The other says, “Sure, I’d love to see them.”

This is such a cliché that we might not even notice the deep paradox here. It’s like with life itself: people knew for thousands of years that every bird has the right kind of beak for its environment, but not until Darwin and Wallace could anyone articulate why (and only a few people before them *even recognized there was a question there* that called for a non-circular answer).

In our case, the puzzle is this: both people on the date know perfectly well that the reason they’re going up to the apartment has nothing to do with etchings. They probably even both know the other knows that. But if that’s the case, then why don’t they just blurt it out: “would you like to come up for some intercourse?” (Or “fluid transfer,” as the John Nash character put it in the *Beautiful Mind* movie?)

So here’s Pinker’s answer. Yes, both people know why they’re going to the apartment, but they also want to avoid their knowledge becoming *common* knowledge. They want plausible deniability. There are several possible reasons: to preserve the romantic fantasy of being “swept off one’s feet.” To provide a face-saving way to back out later, should one of them change their mind: since nothing was ever openly said, there’s no agreement to abrogate. In fact, even if only one of the people (say A) might care about such things, if the other person (say B) thinks there’s any *chance* A cares, B will also have an interest in avoiding common knowledge, for A’s sake.

Put differently, the issue is that, as soon as you say X out loud, the other person doesn’t merely learn X: they learn that you know X, that you know that they know that you know X, that you *want* them to know you know X, and an infinity of other things that might upset the delicate epistemic balance. Contrast that with the situation where X is left unstated: yeah, both people are pretty sure that “etchings” are just a pretext, and can even plausibly guess that the other person knows they’re pretty sure about it. But once you start getting to 3, 4, 5, levels of indirection—who knows? Maybe you *do* just want to show me some etchings.

Philosophers like to discuss Sherlock Holmes and Professor Moriarty meeting in a train station, and Moriarty declaring, “I knew you’d be here,” and Holmes replying, “well, I knew that you knew I’d be here,” and Moriarty saying, “I knew you knew I knew I’d be here,” etc. But real humans tend to be unable to reason reliably past three or four levels in the knowledge hierarchy. (Related to that, you might have heard of the game where everyone guesses a number between 0 and 100, and the winner is whoever’s number is the closest to 2/3 of the average of all the numbers. If this game is played by perfectly rational people, who know they’re all perfectly rational, and know they know, etc., then they must all guess 0—exercise for you to see why. Yet experiments show that, if you *actually* want to win this game against average people, you should guess about 20. People seem to start with 50 or so, iterate the operation of multiplying by 2/3 a few times, *and then stop*.)
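The iterate-a-few-times-and-stop behavior in the 2/3-of-the-average game is easy to tabulate; a tiny sketch:

```python
def iterate_two_thirds(start=50.0, rounds=20):
    """Iterate the 'guess 2/3 of the average' reasoning from a starting guess.
    Returns the list of guesses after each level of reasoning."""
    guesses = [start]
    for _ in range(rounds):
        guesses.append(guesses[-1] * 2 / 3)
    return guesses

g = iterate_two_thirds()
# Perfectly rational players with common knowledge of rationality iterate
# all the way down to 0; real players stop after about two steps, near
# 50 * (2/3)^2, i.e. roughly 22.
print(g[2], g[-1])
```

Twenty iterations already drive the guess below 0.02, while two iterations land at about 22, close to the empirically winning guess.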

Incidentally, do you know what I would’ve given for someone to have explained this stuff to me back in high school? I think that a large fraction of the infamous social difficulties that nerds have, is simply down to nerds spending so much time in domains (like math and science) where the point is to struggle with every last neuron to make *everything* common knowledge, to make all truths as clear and explicit as possible. Whereas in social contexts, very often you’re managing a delicate epistemic balance where you need certain things to be known, but not *known* to be known, and so forth—where you need to *prevent* common knowledge from arising, at least temporarily. “Normal” people have an intuitive feel for this; it doesn’t need to be explained to them. For nerds, by contrast, explaining it—in terms of the muddy children puzzle and so forth—might be exactly what’s needed. Once they’re told the rules of a game, nerds can try playing it too! They might even turn out to be good at it.

OK, now for a darker example of common knowledge in action. If you read accounts of Nazi Germany, or the USSR, or North Korea or other despotic regimes today, you can easily be overwhelmed by this sense of, “so why didn’t all the sane people just rise up and overthrow the totalitarian monsters? *Surely* there were more sane people than crazy, evil ones. And probably the sane people even knew, from experience, that many of their neighbors were sane—so why this cowardice?” Once again, it could be argued that common knowledge is the key. Even if everyone knows the emperor is naked; indeed, even if everyone knows everyone knows he’s naked, still, if it’s not *common knowledge*, then anyone who says the emperor’s naked is knowingly assuming a massive personal risk. That’s why, in the story, it took a child to shift the equilibrium. Likewise, even if you know that 90% of the populace will join your democratic revolt *provided they themselves know 90% will join it*, if you can’t make your revolt’s popularity common knowledge, everyone will be stuck second-guessing each other, worried that if they revolt they’ll be an easily-crushed minority. And because of that very worry, they’ll be correct!

(My favorite Soviet joke involves a man standing in the Moscow train station, handing out leaflets to everyone who passes by. Eventually, of course, the KGB arrests him—but they discover to their surprise that the leaflets are just blank pieces of paper. “What’s the meaning of this?” they demand. “What is there to write?” replies the man. “It’s so obvious!” Note that this is *precisely* a situation where the man is trying to make common knowledge something he assumes his “readers” already know.)

The kicker is that, to prevent something from becoming common knowledge, all you need to do is *censor the common-knowledge-producing mechanisms*: the press, the Internet, public meetings. This nicely explains why despots throughout history have been so obsessed with controlling the press, and also explains how it’s possible for 10% of a population to murder and enslave the other 90% (as has happened again and again in our species’ sorry history), even though the 90% could easily overwhelm the 10% by acting in concert. Finally, it explains why believers in the Enlightenment project tend to be such fanatical absolutists about free speech—why they refuse to “balance” it against cultural sensitivity or social harmony or any other value, as so many well-meaning people urge these days.

OK, but let me try to tell you something *surprising* about common knowledge. Here at SPARC, you’ve learned all about Bayes’ rule—how, if you like, you can treat “probabilities” as just made-up numbers in your head, which are required to obey the probability calculus, and then there’s a very definite rule for how to update those numbers when you gain new information. And indeed, how an agent that wanders around constantly updating these numbers in its head, and taking whichever action maximizes its expected utility (as calculated using the numbers), is probably *the* leading modern conception of what it means to be “rational.”

Now imagine that you’ve got two agents, call them Alice and Bob, with common knowledge of each other’s honesty and rationality, and with the *same* prior probability distribution over some set of possible states of the world. But now imagine they go out and live their lives, and have totally different experiences that lead to their learning different things, and having different posterior distributions. But then they meet again, and they realize that their opinions about some topic (say, Hillary’s chances of winning the election) are *common knowledge*: they both know each other’s opinion, and they both know that they both know, and so on. Then a striking 1976 result called *Aumann’s Theorem* states that their opinions must be equal. Or, as it’s summarized: “rational agents with common priors can never agree to disagree about anything.”

Actually, before going further, let’s prove Aumann’s Theorem—since it’s one of those things that sounds like a mistake when you first hear it, and then becomes a triviality once you see the 3-line proof. (Albeit, a “triviality” that won Aumann a Nobel in economics.) The key idea is that *knowledge induces a partition on the set of possible states of the world*. Huh? OK, imagine someone is either an old man, an old woman, a young man, or a young woman. You and I agree in giving each of these a 25% prior probability. Now imagine that you find out whether they’re a man or a woman, and I find out whether they’re young or old. This can be illustrated as follows:

The diagram tells us, for example, that *if* the ground truth is “old woman,” then your knowledge is described by the set {old woman, young woman}, while my knowledge is described by the set {old woman, old man}. And this different information leads us to different beliefs: for example, if someone asks for the probability that the person is a woman, you’ll say 100% but I’ll say 50%. OK, but what does it mean for information to be *common knowledge*? It means that I know that you know that I know that you know, and so on. Which means that, if you want to find out what’s common knowledge between us, you need to take the *least common coarsening* of our knowledge partitions. I.e., if the ground truth is some given world w, then what do I consider it possible that you consider it possible that I consider possible that … etc.? Iterate this growth process until it stops, by “zigzagging” between our knowledge partitions, and you get the set S of worlds such that, if we’re in world w, then *what’s common knowledge between us is that the world belongs to S*. Repeat for all w’s, and you get the least common coarsening of our partitions. In the above example, the least common coarsening is trivial, with all four worlds ending up in the same set S, but there are nontrivial examples as well:
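The “zigzag” construction can be made concrete in a few lines of code. Here’s a sketch of my own (the function name and the partition representation are my assumptions, not from the talk): merging the blocks of both partitions with a union-find structure connects exactly the worlds reachable by zigzagging, which yields the least common coarsening.

```python
from itertools import chain

def least_common_coarsening(p1, p2):
    """Least common coarsening of two partitions of the same world set.

    Two worlds end up in the same block iff they're linked by a chain of
    blocks alternating between the two partitions -- the "zigzag" above.
    """
    worlds = set(chain.from_iterable(p1))
    parent = {w: w for w in worlds}

    def find(w):                      # union-find with path halving
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    # Merging the members of every block, from both partitions,
    # computes the connected components of the zigzag graph.
    for block in list(p1) + list(p2):
        members = list(block)
        for w in members[1:]:
            parent[find(w)] = find(members[0])

    groups = {}
    for w in worlds:
        groups.setdefault(find(w), set()).add(w)
    return sorted(map(sorted, groups.values()))

# The age/gender example from the text: you learn gender, I learn age.
yours = [{"old woman", "young woman"}, {"old man", "young man"}]
mine  = [{"old woman", "old man"}, {"young woman", "young man"}]
print(least_common_coarsening(yours, mine))
```

As the text says, in this example the coarsening is trivial: all four worlds land in one block, so nothing beyond the prior is common knowledge between us.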

Now, if Alice’s expectation of a random variable X is common knowledge between her and Bob, that means that everywhere in S, her expectation must be constant … and hence must equal whatever the expectation *is*, over all the worlds in S! Likewise, if Bob’s expectation is common knowledge with Alice, then everywhere in S, it must equal the expectation of X over S. But that means that Alice’s and Bob’s expectations are the same.

There are lots of related results. For example, rational agents with common priors, and common knowledge of each other’s rationality, should never engage in speculative trade (e.g., buying and selling stocks, assuming that they don’t need cash, they’re not earning a commission, etc.). Why? Basically because, if I try to sell you a stock for (say) $50, then you should reason that the very fact that I’m offering it means I *must* have information you don’t that it’s worth less than $50, so then you update accordingly and you don’t want it either.

Or here’s another one: suppose again that we’re Bayesians with common priors, and we’re having a conversation, where I tell you my opinion (say, of the probability Hillary will win the election). Not any of the reasons or evidence on which the opinion is based—just the opinion itself. Then you, being Bayesian, update your probabilities to account for what my opinion is. Then you tell me *your* opinion (which might have changed after learning mine), I update on that, I tell you my *new* opinion, then you tell me your new opinion, and so on. You might think this could go on forever! But, no, Geanakoplos and Polemarchakis observed that, as long as there are only finitely many possible states of the world in our shared prior, this process must converge after finitely many steps with you and me having the same opinion (and moreover, with it being *common knowledge* that we have that opinion). Why? Because as long as our opinions differ, your telling me your opinion or me telling you mine must induce a nontrivial *refinement* of one of our knowledge partitions, like so:

I.e., if you learn something new, then at least one of your knowledge sets must get split along the different possible values of the thing you learned. But since there are only finitely many underlying states, there can only be finitely many such splittings (note that, since Bayesians never forget anything, knowledge sets that are split will never again rejoin).
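That back-and-forth is easy to simulate. Below is a minimal sketch of my own (the function names and the tiny four-world example are illustrations, not from the Geanakoplos–Polemarchakis paper): each announcement splits the listener’s blocks along the different announced values, and the loop halts once the two opinions coincide at the true world.

```python
from fractions import Fraction

def expectation(X, prior, block):
    """Conditional expectation of X given that the world lies in block."""
    total = sum(prior[w] for w in block)
    return sum(prior[w] * X[w] for w in block) / total

def announce(partition, X, prior):
    """Map each world to the opinion the partition's owner announces there."""
    opinion = {}
    for block in partition:
        e = expectation(X, prior, block)
        for w in block:
            opinion[w] = e
    return opinion

def refine(partition, opinion):
    """Split each block along the announced values -- the 'nontrivial
    refinement' described in the text."""
    refined = []
    for block in partition:
        groups = {}
        for w in block:
            groups.setdefault(opinion[w], set()).add(w)
        refined.extend(groups.values())
    return refined

def converse(pA, pB, X, prior, w):
    """Alternate announcements until the opinions agree at true world w."""
    while True:
        vA = announce(pA, X, prior)
        pB = refine(pB, vA)            # Bob conditions on Alice's opinion
        vB = announce(pB, X, prior)
        if vB[w] == vA[w]:
            return vA[w]
        pA = refine(pA, vB)            # Alice conditions on Bob's reply

# Four equally likely worlds; X is the indicator of the event in question.
prior = {w: Fraction(1, 4) for w in range(4)}
X = {0: 0, 1: 1, 2: 0, 3: 1}
alice = [{0, 1}, {2, 3}]   # Alice learned whether the world is in {0, 1}
bob   = [{0, 2}, {1, 3}]   # Bob learned something orthogonal
print(converse(alice, bob, X, prior, w=0))   # converges to 0
```

Since there are only finitely many worlds, `refine` can only split blocks finitely many times, so the loop must terminate—which is exactly the argument in the text.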

And something else: suppose your friend tells you a liberal opinion, then you take it into account, but reply with a more conservative opinion. The friend takes *your* opinion into account, and replies with a revised opinion. Question: is your friend’s new opinion likelier to be more liberal than yours, or more conservative?

Obviously, more liberal! Yes, maybe your friend now sees some of your points and vice versa, maybe you’ve now drawn a bit closer (ideally!), but you’re not going to suddenly switch sides because of one conversation.

Yet, if you and your friend are Bayesians with common priors, one can prove that *that’s not what should happen at all*. Indeed, your expectation of your own future opinion should equal your current opinion, and your expectation of your friend’s next opinion should also equal your current opinion—meaning that you shouldn’t be able to predict in which direction your opinion will change next, nor in which direction your friend will next disagree with you. Why not? Formally, because all these expectations are just different ways of calculating an expectation over the *same set*, namely your current knowledge set (i.e., the set of states of the world that you currently consider possible)! More intuitively, we could say: if you could predict that, all else equal, the next thing you heard would probably shift your opinion in a liberal direction, then as a Bayesian *you should already shift your opinion in a liberal direction right now*. (This is related to what’s called the “martingale property”: sure, a random variable X could evolve in many ways in the future, but the average of all those ways must be its current expectation E[X], by the very definition of E[X]…)
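The claim that your expectation of your own future opinion equals your current opinion is just the law of total expectation, applied over the coming split of your knowledge set. A toy numerical check (the worlds, prior, and split are made up for illustration):

```python
from fractions import Fraction

prior = {w: Fraction(1, 4) for w in range(4)}      # uniform over four worlds
X = {0: Fraction(0), 1: Fraction(1), 2: Fraction(1), 3: Fraction(1)}

def expect(block):
    """Conditional expectation of X given the world lies in block."""
    total = sum(prior[w] for w in block)
    return sum(prior[w] * X[w] for w in block) / total

# Your current knowledge set S is about to be split into S1 and S2
# (say, by whatever your friend tells you next).
S, S1, S2 = {0, 1, 2, 3}, {0, 1}, {2, 3}

# Probability, given S, that the split lands you in S1:
p1 = sum(prior[w] for w in S1) / sum(prior[w] for w in S)

# Expected future opinion = weighted average of the refined opinions,
# which equals the current opinion (law of total expectation).
future = p1 * expect(S1) + (1 - p1) * expect(S2)
print(future == expect(S))   # True
```

So whatever you expect to learn next, the average of your possible next opinions is your current opinion—the martingale property from the text.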

So, putting all these results together, we get a clear picture of what rational disagreements should look like: they should follow unbiased random walks, until sooner or later they terminate in common knowledge of complete agreement. We now face a bit of a puzzle, in that *hardly any disagreements in the history of the world have ever looked like that*. So what gives?

There are a few ways out:

(1) You could say that the “failed prediction” of Aumann’s Theorem is no surprise, since virtually all human beings are irrational cretins, or liars (or at least, it’s not common knowledge that they aren’t). Except for you, of course: *you’re* perfectly rational and honest. And if you ever met anyone else as rational and honest as you, maybe you and they could have an Aumannian conversation. But since such a person probably doesn’t exist, you’re totally justified to stand your ground, discount all opinions that differ from yours, etc. Notice that, even if you genuinely believed that was all there was to it, Aumann’s Theorem would still have an *aspirational* significance for you: you would still have to say this is the ideal that all rationalists should strive toward when they disagree. And that would already conflict with a lot of standard rationalist wisdom. For example, we all know that arguments from authority carry little weight: what should sway you is not the mere fact of some other person *stating* their opinion, but the actual arguments and evidence that they’re able to bring. Except that as we’ve seen, for Bayesians with common priors this isn’t true at all! Instead, merely hearing your friend’s opinion serves as a powerful summary of what your friend knows. And if you learn that your rational friend disagrees with you, then even without knowing *why*, you should take that as seriously as if you discovered a contradiction in your own thought processes. This is related to an even broader point: there’s a normative rule of rationality that you should judge ideas only on their merits—yet if you’re a Bayesian, *of course* you’re going to take into account where the ideas come from, and how many other people hold them! 
Likewise, if you’re a Bayesian police officer or a Bayesian airport screener or a Bayesian job interviewer, *of course* you’re going to profile people by their superficial characteristics, however unfair that might be to individuals—so all those studies proving that people evaluate the same resume differently if you change the name at the top are no great surprise. It seems to me that the tension between these two different views of rationality, the normative and the Bayesian, generates a lot of the most intractable debates of the modern world.

(2) Or—and this is an obvious one—you could reject the assumption of common priors. After all, isn’t a major selling point of Bayesianism supposed to be its *subjective* aspect, the fact that you pick “whichever prior feels right for you,” and are constrained only in how to update that prior? If Alice’s and Bob’s priors can be different, then all the reasoning I went through earlier collapses. So rejecting common priors might seem appealing. But there’s a paper by Tyler Cowen and Robin Hanson called “Are Disagreements Honest?”—one of the most worldview-destabilizing papers I’ve ever read—that calls that strategy into question. What it says, basically, is this: if you’re really a thoroughgoing Bayesian rationalist, then your prior ought to allow for the possibility that you *are* the other person. Or to put it another way: “you being born as you,” rather than as someone else, should be treated as just one more contingent fact that you observe and then conditionalize on! And likewise, the other person should condition on the observation that they’re them and not you. In this way, *absolutely everything* that makes you different from someone else can be understood as “differing information,” so we’re right back to the situation covered by Aumann’s Theorem. Imagine, if you like, that we all started out behind some Rawlsian veil of ignorance, as pure reasoning minds that had yet to be assigned specific bodies. In that original state, there was nothing to differentiate any of us from any other—anything that did would just be information to condition on—so we all should’ve had the same prior. That might sound fanciful, but in some sense all it’s saying is: what licenses you to privilege an observation just because it’s *your* eyes that made it, or a thought just because it happened to occur in *your* head? 
Like, if you’re objectively smarter or more observant than everyone else around you, fine, but to whatever extent you agree that you aren’t, *your* opinion gets no special epistemic protection just because it’s yours.

(3) If you’re uncomfortable with this tendency of Bayesian reasoning to refuse to be confined anywhere, to want to expand to cosmic or metaphysical scope (“I need to condition on having been born as *me* and not someone else”)—well then, you could reject the entire framework of Bayesianism, as your notion of rationality. Lest I be cast out from this camp as a heretic, I hasten to say: I include this option only for the sake of completeness!

(4) When I first learned about this stuff 12 years ago, it seemed obvious to me that a lot of it could be dismissed as irrelevant to the real world for reasons of *complexity*. I.e., sure, it might apply to ideal reasoners with unlimited time and computational power, but as soon as you impose realistic constraints, this whole Aumannian house of cards should collapse. As an example, if Alice and Bob have common priors, then sure they’ll agree about everything if they effectively share all their information with each other! But in practice, we don’t have time to “mind-meld,” swapping our entire life experiences with anyone we meet. So one could conjecture that agreement, in general, requires a lot of communication. So then I sat down and tried to prove that as a theorem. And you know what I found? That my intuition here wasn’t even *close* to correct!

In more detail, I proved the following theorem. Suppose Alice and Bob are Bayesians with shared priors, and suppose they’re arguing about (say) the probability of some future event—or more generally, about any random variable X bounded in [0,1]. So, they have a conversation where Alice first announces her expectation of X, then Bob announces his new expectation, and so on. The theorem says that Alice’s and Bob’s estimates of X will necessarily agree to within ±ε, with probability at least 1-δ over their shared prior, after they’ve exchanged only O(1/(δε^{2})) messages. Note that this bound is completely independent of how much knowledge they have; it depends only on the accuracy with which they want to agree! Furthermore, the same bound holds even if Alice and Bob only send a few discrete bits about their real-valued expectations with each message, rather than the expectations themselves.

The proof involves the idea that Alice and Bob’s estimates of X, call them X_{A} and X_{B} respectively, follow “unbiased random walks” (or more formally, are martingales). Very roughly, if |X_{A}-X_{B}|≥ε with high probability over Alice and Bob’s shared prior, then that fact implies that the next message has a high probability (again, over the shared prior) of causing either X_{A} or X_{B} to jump up or down by about ε. But X_{A} and X_{B}, being estimates of X, are bounded between 0 and 1. So a random walk with a step size of ε can only continue for about 1/ε^{2} steps before it hits one of the “absorbing barriers.”

The way to formalize this is to look at the variances, Var[X_{A}] and Var[X_{B}], with respect to the shared prior. Because Alice and Bob’s partitions keep getting refined, the variances are monotonically non-decreasing. They start out at 0 and can never exceed 1 (in fact they can never exceed 1/4, but let’s not worry about constants). Now, the key lemma is that, if Pr[|X_{A}-X_{B}|≥ε]≥δ, then Var[X_{B}] must increase by at least δε^{2} if Alice sends X_{A} to Bob, and Var[X_{A}] must increase by at least δε^{2} if Bob sends X_{B} to Alice. You can see my paper for the proof, or just work it out for yourself. At any rate, the lemma implies that, after O(1/(δε^{2})) rounds of communication, there must be at least a temporary break in the disagreement; there must be some round where Alice and Bob approximately agree with high probability.
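The 1/ε² scaling is easy to see numerically. A small sketch of my own (the function name and parameters are assumptions for illustration): measure the mean exit time of a symmetric ±1 walk on {0, …, n} started in the middle, where n plays the role of 1/ε and the endpoints are the “absorbing barriers.”

```python
import random

def mean_hitting_time(n, trials=4000, seed=1):
    """Average number of +/-1 steps for a symmetric walk started at n//2
    to reach 0 or n.  In units of eps = 1/n, this is a walk of estimates
    bounded in [0, 1]; the expected exit time is (n/2)^2 = 1/(4 eps^2)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        k, steps = n // 2, 0
        while 0 < k < n:
            k += 1 if rng.random() < 0.5 else -1
            steps += 1
        total += steps
    return total / trials

# Halving the step size (doubling n) roughly quadruples the exit time:
print(mean_hitting_time(10), mean_hitting_time(20))   # about 25 and 100
```

So a walk with step size ε really does persist for only about 1/ε² steps before hitting a barrier, matching the heuristic in the text.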

There are lots of other results in my paper, including an upper bound on the number of calls that Alice and Bob need to make to a “sampling oracle” to carry out this sort of protocol approximately, assuming they’re not perfect Bayesians but agents with bounded computational power. But let me step back and address the broader question: what should we make of all this? How should we live with the gargantuan chasm between the prediction of Bayesian rationality for how we should disagree, and the actual facts of how we *do* disagree?

We could simply declare that *human beings are not well-modeled as Bayesians with common priors*—that we’ve failed in giving a descriptive account of human behavior—and leave it at that. OK, but that would still leave the question: does this stuff have *normative* value? Should it affect how we behave, if we want to consider ourselves honest and rational? I would argue, possibly yes.

Yes, you should constantly ask yourself the question: “would I still be defending this opinion, if I had been born as someone else?” (Though you might say this insight predates Aumann by quite a bit, going back at least to Spinoza.)

Yes, if someone you respect as honest and rational disagrees with you, you should take it as seriously as if the disagreement were between two different aspects of yourself.

Finally, yes, we can try to judge epistemic communities by how closely they approach the Aumannian ideal. In math and science, in my experience, it’s common to see two people furiously arguing with each other at a blackboard. Come back five minutes later, and they’re arguing even more furiously, but now their positions have switched. As we’ve seen, that’s *precisely* what the math says a rational conversation should look like. In social and political discussions, though, usually the very best you’ll see is that two people start out diametrically opposed, but eventually one of them says “fine, I’ll grant you this,” and the other says “fine, I’ll grant you that.” We might say, that’s certainly better than the common alternative, of the two people walking away even more polarized than before! Yet the math tells us that even the first case—even the two people gradually getting closer in their views—is *nothing at all* like a rational exchange, which would involve the two participants repeatedly leapfrogging each other, completely changing their opinion about the question under discussion (and then changing back, and back again) every time they learned something new. The first case, you might say, is more like *haggling*—more like “I’ll grant you that X is true if you grant me that Y is true”—than like our ideal friendly mathematicians arguing at the blackboard, whose acceptance of new truths is never slow or grudging, never conditional on the other person first agreeing with them about something else.

Armed with this understanding, we could try to rank fields by how hard it is to have an Aumannian conversation in them. At the bottom—the easiest!—is math (or, let’s say, chess, or debugging a program, or fact-heavy fields like lexicography or geography). Crucially, here I only mean the *parts* of these subjects with agreed-on rules and definite answers: once the conversation turns to whose theorems are deeper, or whose fault the bug was, things can get arbitrarily non-Aumannian. Then there’s the type of science that involves messy correlational studies (I just mean, talking about what’s a risk factor for what, not the political implications). Then there’s politics and aesthetics, with the most radioactive topics like Israel/Palestine higher up. And then, at the very peak, there’s gender and social justice debates, where *everyone* brings their formative experiences along, and absolutely no one is a disinterested truth-seeker, and possibly no Aumannian conversation has ever been had in the history of the world.

I would urge that even at the very top, it’s still incumbent on all of us to *try* to make the Aumannian move, of “what would I think about this issue if I were someone else and not me? If I were a man, a woman, black, white, gay, straight, a nerd, a jock? How much of my thinking about this represents pure Spinozist reason, which could be ported to any rational mind, and how much of it would get lost in translation?”

Anyway, I’m sure some people would argue that, in the end, the whole framework of Bayesian agents, common priors, common knowledge, etc. can be chucked from this discussion like so much scaffolding, and the moral lessons I want to draw boil down to trite advice (“try to see the other person’s point of view”) that you all knew already. Then again, even if you all knew all this, maybe you didn’t know that you all knew it! So I hope you gained some new information from this talk in any case. Thanks.

**Update:** Coincidentally, there’s a moving NYT piece by Oliver Sacks, which (among other things) recounts his experiences with his cousin, the Aumann of Aumann’s theorem.

**Another Update:** If I ever *did* attempt an Aumannian conversation with someone, the other Scott A. would be a candidate! Here he is in 2011 making several of the same points I did above, using the same examples (I thank him for pointing me to his post).

A bunch of people have asked me to comment on D-Wave’s release of its 1000-qubit processor, and a paper by a group including Cathy McGeoch saying that the machine is 1 or 2 orders of magnitude faster (in annealing time, not wall-clock time) than simulated annealing running on a single-core classical computer. It’s even been suggested that the “Scott-signal” has been shining brightly for a week above Quantham City, but that Scott-man has been too lazy and out-of-shape even to change into his tights.

Scientifically, it’s not clear if much has changed. D-Wave now has a chip with twice as many qubits as the last one. That chip continues to be pretty effective at finding its own low-energy states: indeed, depending on various details of definition, the machine can even find its own low-energy states “faster” than some implementation of simulated annealing running on a single-core chip. Of course, it’s entirely possible that Matthias Troyer or Sergei Isakov or Troels Ronnow or someone like that will be able to find a better implementation of simulated annealing that closes even the modest gap—as happened the last time—but I’ll give the authors the benefit of the doubt that they put good-faith effort into optimizing the classical code.

More importantly, I’d say it remains unclear whether *any* of the machine’s performance on the instances tested here can be attributed to quantum tunneling effects. In fact, the paper explicitly states (see page 3) that it’s not going to consider such questions, and I think the authors would agree that you could very well see results like theirs, even if what was going on was fundamentally classical annealing. Also, of course, it’s still true that, if you wanted to solve a *practical* optimization problem, you’d first need to encode it into the Chimera graph, and that reduction entails a loss that could hand a decisive advantage to simulated annealing, even without the need to go to multiple cores. (This is what I’ve described elsewhere as essentially all of these performance comparisons taking place on “the D-Wave machine’s home turf”: that is, on binary constraint satisfaction problems that have precisely the topology of D-Wave’s Chimera graph.)

But, I dunno, I’m just not feeling the urge to analyze this in more detail. Part of the reason is that I *think* the press might be getting less hyper-excitable these days, thereby reducing the need for a Chief D-Wave Skeptic. By this point, there may have been enough D-Wave announcements that papers realize they no longer need to cover each one like an extraterrestrial landing. And there are more hats in the ring now, with John Martinis at Google seeking to build superconducting quantum annealing machines but with ~10,000x longer coherence times than D-Wave’s, and with IBM Research and some others also trying to scale superconducting QC. The realization has set in, I think, that both D-Wave and the others are in this for the long haul, with D-Wave currently having lots of qubits, but with very short coherence times and unclear prospects for any quantum speedup, and Martinis and some others having qubits of far higher quality, but not yet able to couple enough of them.

The other issue is that, on my flight from Seoul back to Newark, I watched two recent kids’ movies that were almost *defiant* in their simple, unironic, 1950s-style messages of hope and optimism. One was Disney’s new live-action *Cinderella*; the other was Brad Bird’s *Tomorrowland*. And seeing these back-to-back filled me with such positivity and good will that, at least for these few hours, it’s hard to summon my usual crusty self. I say, let’s invent the future together, and build flying cars and jetpacks in our garages! Let a thousand creative ideas bloom for how to tackle climate change and the other crises facing civilization! (Admittedly, mass-market flying cars and jetpacks are probably *not* a step forward on climate change … but, see, there’s that negativity coming back.) And let *another* thousand ideas bloom for how to build scalable quantum computers—sure, including D-Wave’s! Have courage and be kind!

So yeah, if readers would like to discuss the recent D-Wave paper further (especially those who know something about it), they’re more than welcome to do so in the comments section. But I’ve been away from Dana and Lily for two weeks, and will endeavor to spend time with them rather than obsessively reloading the comments (let’s see if I succeed).

As a small token of my goodwill, I enclose two photos from my last visit to a D-Wave machine, which occurred when I met with some grad students in Waterloo this past spring. As you can see, I even personally certified that the machine was operating as expected. But more than that: surpassing all reasonable expectations for quantum AI, this model could actually converse intelligently, through a protruding head resembling that of IQC grad student Sarah Kaiser.

So I’m at this black hole conference in Stockholm, and at his public lecture yesterday evening, Stephen Hawking announced that he has figured out how information escapes from black holes, and he will tell us today at the conference at 11am.

As your blogger on location I feel a certain duty to leak information ;)

Extrapolating from the previous paper and some rumors, it’s something with AdS/CFT and work with Andrew Strominger, so likely to have some strings attached.

30 minutes to 11, and the press has arrived. They're clustering at my back, so they're going to watch me type away, fun.

10 minutes to 11, some more information emerges. There's a third person involved in this work: besides Andrew Strominger, also Malcolm Perry, who is sitting in the row in front of me. They started their collaboration at a workshop in Herefordshire at Easter 2015.

10 past 11. The Awaited is late. We're told it will be another 10 minutes.

11 past 11. Here he comes.

He says that he has solved a problem that has bothered people for 40 years, and so on. He now understands that information is stored on the black hole horizon in the form of "supertranslations," which were introduced in the mid 1960s by Bondi and Metzner. This makes much sense because Strominger has been onto this recently. It occurred to Hawking in April, when listening to a talk by Strominger, that black hole horizons also have supertranslations. The supertranslations are caused by the ingoing particles.

That's it. Time for questions. Rovelli asking: Do supertranslations change the quantum state?

Just for the record, I don't know anything about supertranslations, so don't ask.

It's taking a long time for Hawking to compose a reply. People start mumbling. Everybody trying to guess what he meant. I can see that you can use supertranslations to store information, but don't understand how the information from the initial matter gets moved into other degrees of freedom. The only way I can see how this works is that the information was there twice to begin with.

Oh, we're now seeing Hawking's desktop projected by beamer. He is patching together a reply to Rovelli. Everybody seems confused.

Malcolm Perry mumbling he'll give a talk this afternoon and explain everything. Good.

Hawking is saying (typing) that the supertranslations are a hologram of the ingoing particles.

It's painful to watch actually, seeing that I'm easily typing two paragraphs in the time he needs for one word :(

Yes, I figure he is saying the information was there twice to begin with. It's stored on the horizon in the form of supertranslations, which can cause a tiny delay in the emission of Hawking particles. Which presumably can encode information in the radiation.

Paul Davies asking if the argument goes through for de Sitter space or only asymptotically flat space. Hawking saying it applies to black holes in any background.

Somebody else asks if quantum fluctuations of the background will be relevant. 't Hooft answering with yes, but he has no microphone and I can't understand him very well.

I'm being told there will be an arxiv paper some time end of September probably.

Ok, so Hawking is saying in reply to Rovelli that it's an effect caused by the classical gravitational field. Now I am confused, because the gravitational field doesn't uniquely encode quantum states. It's something I myself have tried to use before. The gravitational field of the ingoing particles does always affect the outgoing radiation, in principle. The effect is exceedingly weak of course, but it's there. If the classical gravitational field of the ingoing particles could encode all the information about the ingoing radiation, then this alone would do away with the information loss problem. But it doesn't work: you can have two bosons of energy E on top of each other and arrange it so they have the same classical gravitational field as one boson of twice this energy.

Rovelli nodding to my question (I think he meant the same thing). 't Hooft saying in reply that not all field configurations would be allowed. Somebody else saying there are no states that cannot be distinguished by their metric. This doesn't make sense to me, because then the information was always present twice, already classically—and then what would one need the supertranslations for?

Ok, so, end of discussion session, lunch break. We'll all await Malcolm Perry's talk this afternoon.

**Update**: After Malcolm Perry's talk, some more details have emerged. Yes, it is a purely classical picture, at least for now. The BMS group essentially provides classical black hole hair in the form of an infinite number of charges. Of course you don't really want an infinite number; you want a finite number that fits the Bekenstein-Hawking entropy. One would expect that this necessitates a quantized version (at least geometrically quantized, or with a finite phase-space volume). But there isn't one so far.

Neither is there, at this point, a clear picture for how the information gets into the outgoing radiation. I am somewhat concerned actually that once one looks at the quantum picture, the BMS charges at infinity will be entangled with charges falling into the black hole, thus essentially reinventing the black hole information problem.

Finally, to add some context to 't Hooft's remark, Perry said that since this doesn't work for all types of charges, not all models for particle content would be allowed, as for example information about baryon number couldn't be saved this way. He also said that you wouldn't have this problem in string theory, but I didn't really understand why.

**Another Update**: Here is a summary from Jacob Aron at New Scientist.

**Another Update**: A video of Hawking's talk is now available here.

**Yet another update**: Malcolm Perry will give a second, longer lecture on the topic tomorrow morning, which will be recorded and made available on the Nordita website.

As your blogger on location, I feel a certain duty to leak information ;)

Extrapolating from the previous paper and some rumors, it’s something with AdS/CFT and work with Andrew Strominger, so likely to have some strings attached.

30 minutes to 11, and the press has arrived. They're clustering at my back, so they're going to watch me type away. Fun.

10 minutes to 11, some more information emerges. There's a third person involved in this work: besides Andrew Strominger, also Malcolm Perry, who is sitting in the row in front of me. They started their collaboration at a workshop in Herefordshire at Easter 2015.

10 past 11. The Awaited is late. We're told it will be another 10 minutes.

11 past 11. Here he comes.

He says that he has solved a problem that has bothered people for 40 years, and so on. He now understands that information is stored on the black hole horizon in form of "supertranslations," which were introduced in the early 1960s by Bondi, Metzner, and Sachs. This makes much sense because Strominger has been onto this recently. It occurred to Hawking in April, when listening to a talk by Strominger, that black hole horizons also have supertranslations. The supertranslations are caused by the ingoing particles.

That's it. Time for questions. Rovelli asking: Do supertranslations change the quantum state?

Just for the record, I don't know anything about supertranslations, so don't ask.

It's taking a long time for Hawking to compose a reply. People start mumbling. Everybody trying to guess what he meant. I can see that you can use supertranslations to store information, but don't understand how the information from the initial matter gets moved into other degrees of freedom. The only way I can see how this works is that the information was there twice to begin with.

Oh, we're now seeing Hawking's desktop projected onto the screen. He is patching together a reply to Rovelli. Everybody seems confused.

Malcolm Perry mumbling that he'll give a talk this afternoon and explain everything. Good.

Hawking is saying (typing) that the supertranslations are a hologram of the ingoing particles.

It's painful to watch actually, seeing that I'm easily typing two paragraphs in the time he needs for one word :(

Yes, I figure he is saying the information was there twice to begin with. It's stored on the horizon in form of supertranslations, which can make a tiny delay for the emission of Hawking particles. Which presumably can encode information in the radiation.

Paul Davies asking if the argument goes through for de Sitter space or only asymptotically flat space. Hawking saying it applies to black holes in any background.



A while ago I read a great paper by the philosopher L. A. Paul and wrote this post about it, asking: is the experience of becoming a vampire analogous in important ways to the experience of becoming a parent? When deciding whether to become a vampire, is it relevant what human you thinks about being a vampire, or only what future vampire you would think about being a vampire?

Paul liked the example and was kind enough to include (her much deeper and more fully worked-out version of) it in her book, *Transformative Experience.*

And now David Brooks, the official public philosopher de nos jours, has devoted a whole column to Paul’s book! And he leads with the vampires!

Let’s say you had the chance to become a vampire. With one magical bite you would gain immortality, superhuman strength and a life of glamorous intensity. Your friends who have undergone the transformation say the experience is incredible. They drink animal blood, not human blood, and say everything about their new existence provides them with fun, companionship and meaning.

Would you do it? Would you consent to receive the life-altering bite, even knowing that once changed you could never go back?

The difficulty of the choice is that you’d have to use your human self and preferences to try to guess whether you’d enjoy having a vampire self and preferences. Becoming a vampire is transformational. You would literally become a different self. How can you possibly know what it would feel like to be this different version of you or whether you would like it?

Brooks punts on the actually difficult questions raised by Paul’s book, counseling you to cast aside contemplation of your various selves’ preferences and do as objective moral standards demand. But Paul makes it clear (p.19) that “in the circumstances I am considering… there are no moral or religious rules that determine just which act you should choose.”

Note well, buried in the last paragraph:

When we’re shopping for something, we act as autonomous creatures who are looking for the product that will produce the most pleasure or utility. But choosing to have a child or selecting a spouse, faith or life course is not like that.

Choosing children, spouses, and vocations are discussed elsewhere in the piece, but choosing a religion is not. And yet there it is in the summation. The column is yet more evidence for my claim that David Brooks will shortly announce — let’s say within a year — that he’s converting to Christianity. Controversial predictions! And vampires! All part of the Quomodocumque brand.

At breakfast I told Morgan Fouesneau (MPIA) my desiderata for a set of `matplotlib` color maps: I want a map that indicates intensity (dark to bright, say), a map that indicates a sequential value (mass or metallicity or age, say), a map that indicates residuals *away from zero* that de-emphasizes the near-zero values, and a map that is the same but that emphasizes the near-zero values. I want the diverging maps to never hit pure white or pure black (indeed none of these maps should) because we always want to distinguish values from “no data”. And I want them to be good for people with standard color-blindness. But here's the hard part: I want all four of these colormaps to be drawn from the same general palette, so that a scientific paper that uses them will have a consistent visual feel.
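Those desiderata translate naturally into code. Here is a minimal sketch of how one might build such a family in `matplotlib`; the palette, the map names, and the construction are my own invention for illustration, not anything Fouesneau and Hogg actually settled on:

```python
# A sketch of the four-colormap desiderata: all maps drawn from one shared
# palette, none of them reaching pure black or pure white at the endpoints.
# (Palette and names are invented for illustration.)
from matplotlib.colors import LinearSegmentedColormap

dark, mid, bright = "#1a2a4a", "#7a7a7a", "#f2e8c9"   # shared palette anchors
blue, orange = "#3a6ea5", "#d17a3a"

cmaps = {
    # intensity: dark to bright, monotonic in lightness
    "intensity": LinearSegmentedColormap.from_list("intensity", [dark, bright]),
    # sequential value (mass, metallicity, age): a single hue sweep
    "sequential": LinearSegmentedColormap.from_list("sequential", [blue, mid, orange]),
    # residuals, de-emphasizing near zero: muted grey at the center
    "diverge_soft": LinearSegmentedColormap.from_list("diverge_soft", [blue, mid, orange]),
    # residuals, emphasizing near zero: a steep bright band at the center
    "diverge_sharp": LinearSegmentedColormap.from_list(
        "diverge_sharp",
        [(0.0, blue), (0.45, mid), (0.5, bright), (0.55, mid), (1.0, orange)]),
}

# check the "never pure black, never pure white" requirement at the endpoints
for name, cmap in cmaps.items():
    r, g, b, _ = cmap(0.0)
    assert max(r, g, b) > 0.0, name
    r, g, b, _ = cmap(1.0)
    assert max(r, g, b) < 1.0, name
```

The color-blindness and perceptual-uniformity requirements would need an actual check in a perceptual color space (e.g. CAM02-UCS), which this sketch does not attempt.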

Before lunch, Ness and I met with Marie Martig (MPIA) and Fouesneau to go through our stellar age results. Martig and Fouesneau are writing up a method to use carbon and nitrogen features to infer red-giant ages and masses. Ness and I are writing up our use of *The Cannon* to get red-giant ages and masses. It turns out that *The Cannon* (being a brain-dead data-driven model) has also chosen, internally, to use carbon and nitrogen indicators. This is a great endorsement of the Martig and Fouesneau method and project. Because they are using their prior beliefs about stellar spectroscopy better than we are, they ought to get more accurate results, but we haven't compared in detail yet.

Late in the day, Foreman-Mackey and I discussed *K2* and *Kepler* projects. We discussed at length the relationship between stellar multiplicity and binary and ternary (and so on) population inference. Has all this been done for stars, just like we are doing it for exoplanets? We also discussed candidate (transiting candidate) vetting, and the fact that you can't remove the non-astrophysics (systematics-induced) false positives unless you have a model for all the things that can happen.

A few brief news items - our first week of classes this term is a busy time.

- Here is a video of Richard Feynman, explaining why he can't readily explain permanent magnets to the interviewer. This gets right to the heart of why explaining science in a popular, accessible way can be very difficult. Sure, he could come up with really stretched and tortured analogies, but truly getting at the deeper science behind the permanent magnets and their interactions would require laying a ton of groundwork, way more than what an average person would want to hear.
- Here is a freely available news article from Nature about superconductivity in H₂S at very high pressures. I was going to write at some length about this but haven't found the time. The short version: There have been predictions for a long time that hydrogen, at very high pressures like in the interior of Jupiter, should be metallic and possibly a relatively high temperature superconductor. There are later predictions that hydrogen-rich alloys and compounds could also superconduct at pretty high temperatures. Now it seems that hydrogen sulfide does just this. Crank up the pressure to 1.5 million atmospheres, and that stinky gas becomes what seems to be a relatively conventional (!) superconductor, with a transition temperature close to 200 K. The temperature is comparatively high because of a combination of an effectively high speed of sound (the material gets pretty stiff at those pressures), a large density of electrons available to participate, and a strong coupling between the electrons and those vibrations (so that the vibrations can provide an effective attractive interaction between the electrons that leads to pairing). The important thing about this work is that it shows that there is no obvious reason why superconductivity at or near room temperature should be ruled out.
- Congratulations to Prof. Laura Greene, incoming APS president, who has been named the new chief scientist of the National High Magnetic Field Lab.
- Likewise, congratulations to Prof. Meigan Aronson, who has been named Texas A&M University's new Dean of Science.


The Poincaré upper half-plane $\mathbf{H} := \{ z \in \mathbf{C} : \mathrm{Im}(z) > 0 \}$ (with a boundary consisting of the real line $\mathbf{R}$ together with the point at infinity $\infty$) carries an action of the projective special linear group $PSL_2(\mathbf{R})$

via fractional linear transformations:

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \cdot z := \frac{az+b}{cz+d}. \qquad (1)$$

Here and in the rest of the post we will abuse notation by identifying elements of the special linear group $SL_2(\mathbf{R})$ with their equivalence class in $PSL_2(\mathbf{R})$; this will occasionally create or remove a factor of two in our formulae, but otherwise has very little effect, though one has to check that various definitions and expressions (such as (1)) are unaffected if one replaces a matrix by its negation. In particular, we recommend that the reader ignore the $\pm$ signs that appear from time to time in the discussion below.

As the action of $PSL_2(\mathbf{R})$ on $\mathbf{H}$ is transitive, and any given point in $\mathbf{H}$ (e.g. the point $i$) has a stabiliser isomorphic to the projective rotation group $PSO_2(\mathbf{R})$, we can view the Poincaré upper half-plane as a homogeneous space for $PSL_2(\mathbf{R})$, and more specifically the quotient space of $PSL_2(\mathbf{R})$ by a maximal compact subgroup $PSO_2(\mathbf{R})$. In fact, we can make the half-plane a symmetric space for $PSL_2(\mathbf{R})$, by endowing $\mathbf{H}$ with the Riemannian metric

$$ds^2 = \frac{dx^2 + dy^2}{y^2}$$

(using Cartesian coordinates $z = x + iy$), which is invariant with respect to the $PSL_2(\mathbf{R})$ action. Like any other Riemannian metric, the metric on $\mathbf{H}$ generates a number of other important geometric objects on $\mathbf{H}$, such as the distance function $d(z,w)$, which can be computed to be given by the formula

$$\cosh d(z,w) = 1 + \frac{|z-w|^2}{2\,\mathrm{Im}(z)\,\mathrm{Im}(w)}, \qquad (2)$$

the volume measure $\mu$, which can be computed to be

$$d\mu = \frac{dx\, dy}{y^2},$$

and the Laplace-Beltrami operator, which can be computed to be $\Delta = y^2 ( \partial_x^2 + \partial_y^2 )$ (here we use the negative definite sign convention for $\Delta$). As the metric was $PSL_2(\mathbf{R})$-invariant, all of these quantities arising from the metric are similarly $PSL_2(\mathbf{R})$-invariant in the appropriate sense.

The Gauss curvature of the Poincaré half-plane can be computed to be the constant $-1$, thus $\mathbf{H}$ is a model for two-dimensional hyperbolic geometry, in much the same way that the unit sphere $S^2$ in $\mathbf{R}^3$ is a model for two-dimensional spherical geometry (or $\mathbf{R}^2$ is a model for two-dimensional Euclidean geometry). (Indeed, $\mathbf{H}$ is isomorphic (via projection to a null hyperplane) to the upper unit hyperboloid in the Minkowski spacetime $\mathbf{R}^{1+2}$, which is the direct analogue of the unit sphere in Euclidean space or the plane in Galilean spacetime.)
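The half-plane geometry above is easy to sanity-check numerically. The following sketch (my own, with an arbitrarily chosen matrix) verifies that the distance $\cosh d(z,w) = 1 + |z-w|^2 / (2\,\mathrm{Im}(z)\,\mathrm{Im}(w))$ is invariant under fractional linear transformations, and also checks the closed form for the distance between $i$ and $\gamma i$ used later when counting matrices:

```python
# Numerical sanity checks for the hyperbolic metric on the upper half-plane.
import math

def dist(z, w):
    """Hyperbolic distance on the upper half-plane."""
    return math.acosh(1 + abs(z - w)**2 / (2 * z.imag * w.imag))

def mobius(a, b, c, d, z):
    """Action of the matrix (a b; c d) in SL_2(R) by fractional linear transformation."""
    return (a * z + b) / (c * z + d)

# invariance of the distance under an SL_2(R) element (det = 1)
z, w = 0.3 + 1.7j, -1.2 + 0.4j
a, b, c, d = 2.0, 3.0, 1.0, 2.0
gz, gw = mobius(a, b, c, d, z), mobius(a, b, c, d, w)
assert abs(dist(z, w) - dist(gz, gw)) < 1e-9

# for gamma in SL_2(Z): cosh d(i, gamma i) = (a^2 + b^2 + c^2 + d^2) / 2
gi = mobius(2, 3, 1, 2, 1j)
assert abs(math.cosh(dist(1j, gi)) - (4 + 9 + 1 + 4) / 2) < 1e-9
```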

One can inject arithmetic into this geometric structure by passing from the Lie group $PSL_2(\mathbf{R})$ to the full modular group

$$PSL_2(\mathbf{Z}) := \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} : a,b,c,d \in \mathbf{Z},\ ad - bc = 1 \right\} / \{\pm 1\},$$

or congruence subgroups such as

$$\Gamma_0(q) := \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in PSL_2(\mathbf{Z}) : c \equiv 0 \pmod{q} \right\}$$

for natural number $q$, or to the discrete stabiliser of the point at infinity:

$$\Gamma_\infty := \left\{ \begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix} : b \in \mathbf{Z} \right\}.$$

These are discrete subgroups of $PSL_2(\mathbf{R})$, nested by the subgroup inclusions

$$\Gamma_\infty \leq \Gamma_0(q) \leq PSL_2(\mathbf{Z}) \leq PSL_2(\mathbf{R}).$$

There are many further discrete subgroups of $PSL_2(\mathbf{R})$ (known collectively as Fuchsian groups) that one could consider, but we will focus attention on these three groups in this post.

Any discrete subgroup $\Gamma$ of $PSL_2(\mathbf{R})$ generates a quotient space $\Gamma \backslash \mathbf{H}$, which in general will be a non-compact two-dimensional orbifold. One can understand such a quotient space by working with a fundamental domain – a set consisting of a single representative of each of the orbits of $\Gamma$ in $\mathbf{H}$. This fundamental domain is by no means uniquely defined, but if the fundamental domain is chosen with some reasonable amount of regularity, one can view $\Gamma \backslash \mathbf{H}$ as the fundamental domain with the boundaries glued together in an appropriate sense. Among other things, fundamental domains can be used to induce a volume measure on $\Gamma \backslash \mathbf{H}$ from the volume measure $\mu$ on $\mathbf{H}$ (restricted to a fundamental domain). By abuse of notation we will refer to both measures simply as $\mu$ when there is no chance of confusion.

For instance, a fundamental domain for $\Gamma_\infty$ is given (up to null sets) by the strip $\{ z \in \mathbf{H} : |\mathrm{Re}(z)| \leq 1/2 \}$, with $\Gamma_\infty \backslash \mathbf{H}$ identifiable with the cylinder formed by gluing together the two sides of the strip. A fundamental domain for $PSL_2(\mathbf{Z})$ is famously given (again up to null sets) by an upper portion $\{ z \in \mathbf{H} : |\mathrm{Re}(z)| \leq 1/2, |z| \geq 1 \}$ of that strip, with the left and right sides again glued to each other, and the left and right halves of the circular boundary glued to itself. A fundamental domain for $\Gamma_0(q)$ can be formed by gluing together

$$[PSL_2(\mathbf{Z}) : \Gamma_0(q)] = q \prod_{p | q} \left( 1 + \frac{1}{p} \right)$$

copies of a fundamental domain for $PSL_2(\mathbf{Z})$ in a rather complicated but interesting fashion.
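The standard fundamental domain for $PSL_2(\mathbf{Z})$ comes with a classical reduction algorithm: alternately translate (using $\Gamma_\infty$) to bring the real part into $[-1/2, 1/2]$, and invert via $z \mapsto -1/z$ whenever $|z| < 1$. A minimal sketch:

```python
# Reduce a point of the upper half-plane to the standard fundamental domain
# { |Re z| <= 1/2, |z| >= 1 } of PSL_2(Z) by alternately translating and
# inverting. Termination holds because each inversion strictly increases Im z.
def reduce_to_fundamental_domain(z):
    while True:
        z = complex(z.real - round(z.real), z.imag)  # translate: |Re z| <= 1/2
        if abs(z) >= 1:
            return z
        z = -1 / z                                   # invert: z -> -1/z

z0 = reduce_to_fundamental_domain(0.37 + 0.02j)
assert abs(z0.real) <= 0.5 + 1e-12 and abs(z0) >= 1 - 1e-12
```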

While fundamental domains can be a convenient choice of coordinates to work with for some computations (as well as for drawing appropriate pictures), it is geometrically more natural to avoid working explicitly on such domains, and instead work directly on the quotient spaces $\Gamma \backslash \mathbf{H}$. In order to analyse functions on such orbifolds, it is convenient to lift such functions back up to $\mathbf{H}$ and identify them with functions $f: \mathbf{H} \to \mathbf{C}$ which are *$\Gamma$-automorphic* in the sense that $f(\gamma z) = f(z)$ for all $z \in \mathbf{H}$ and $\gamma \in \Gamma$. Such functions will be referred to as $\Gamma$-automorphic forms, or *automorphic forms* for short (we always implicitly assume all such functions to be measurable). (Strictly speaking, these are the automorphic forms with trivial factor of automorphy; one can certainly consider other factors of automorphy, particularly when working with holomorphic modular forms, which corresponds to sections of a more non-trivial line bundle over $\Gamma \backslash \mathbf{H}$ than the trivial bundle that is implicitly present when analysing scalar functions $f: \mathbf{H} \to \mathbf{C}$. However, we will not discuss this (important) more general situation here.)
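The gluing count above can be checked by brute force: the index of $\Gamma_0(q)$ in $PSL_2(\mathbf{Z})$ equals the number of points of the projective line over $\mathbf{Z}/q\mathbf{Z}$, which a direct (unoptimized) enumeration can compare against the product formula:

```python
# Compare a brute-force count of P^1(Z/q) against the product formula
# [PSL_2(Z) : Gamma_0(q)] = q * prod_{p | q} (1 + 1/p).
from math import gcd
from fractions import Fraction

def index_by_cosets(q):
    """Count P^1(Z/q): pairs (c : d) with gcd(c, d, q) = 1, identified
    under simultaneous multiplication by a unit of Z/q."""
    points = set()
    for c in range(q):
        for d in range(q):
            if gcd(gcd(c, d), q) != 1:
                continue
            # canonical representative: lexicographically smallest unit multiple
            orbit = min((c * u % q, d * u % q)
                        for u in range(1, q) if gcd(u, q) == 1)
            points.add(orbit)
    return len(points)

def index_by_formula(q):
    idx = Fraction(q)
    primes = {p for p in range(2, q + 1)
              if q % p == 0 and all(p % r for r in range(2, p))}
    for p in primes:
        idx *= 1 + Fraction(1, p)
    return int(idx)

assert all(index_by_cosets(q) == index_by_formula(q) for q in range(2, 30))
```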

An important way to create a $\Gamma$-automorphic form is to start with a non-automorphic function $\phi: \mathbf{H} \to \mathbf{C}$ obeying suitable decay conditions (e.g. bounded with compact support will suffice) and form the Poincaré series $P_\Gamma[\phi]$ defined by

$$P_\Gamma[\phi](z) := \sum_{\gamma \in \Gamma} \phi(\gamma z),$$

which is clearly $\Gamma$-automorphic. (One could equivalently write $\phi(\gamma^{-1} z)$ in place of $\phi(\gamma z)$ here; there are good arguments for both conventions, but I have ultimately decided to use the $\phi(\gamma z)$ convention, which makes explicit computations a little neater at the cost of making the group actions work in the opposite order.) Thus we naturally see sums over $\Gamma$ associated with $\Gamma$-automorphic forms. A little more generally, given a subgroup $\Gamma'$ of $\Gamma$ and a $\Gamma'$-automorphic function $f$ of suitable decay, we can form a relative Poincaré series $P_{\Gamma' \backslash \Gamma}[f]$ by

$$P_{\Gamma' \backslash \Gamma}[f](z) := \sum_{\gamma \in \mathcal{F}} f(\gamma z),$$

where $\mathcal{F}$ is any fundamental domain for $\Gamma' \backslash \Gamma$, that is to say a subset of $\Gamma$ consisting of exactly one representative for each right coset of $\Gamma'$. As $f$ is $\Gamma'$-automorphic, we see (if $f$ has suitable decay) that $P_{\Gamma' \backslash \Gamma}[f]$ does not depend on the precise choice of fundamental domain, and is $\Gamma$-automorphic. These operations are all compatible with each other, for instance $P_\Gamma[\phi] = P_{\Gamma' \backslash \Gamma}[ P_{\Gamma'}[\phi] ]$. A key example of Poincaré series are the Eisenstein series, although there are of course many other Poincaré series one can consider by varying the test function $\phi$.
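Poincaré series are easy to explore numerically. As an illustration (my own sketch, truncating the sum by brute force), the real-analytic Eisenstein series, i.e. the relative Poincaré series over $\Gamma_\infty \backslash PSL_2(\mathbf{Z})$ with $\phi(z) = \mathrm{Im}(z)^s$, can be written as a sum over coprime pairs $(c,d)$, and its $PSL_2(\mathbf{Z})$-automorphy checked at a sample point:

```python
# Truncated real-analytic Eisenstein series
#   E(z, s) = (1/2) * sum over coprime (c, d) != (0, 0) of y^s / |c z + d|^(2s),
# checked for approximate invariance under the generators T: z -> z + 1
# and S: z -> -1/z of PSL_2(Z). The truncation error is tiny for s = 2.
from math import gcd

def eisenstein(z, s, N=80):
    total = 0.0
    for c in range(-N, N + 1):
        for d in range(-N, N + 1):
            if (c, d) != (0, 0) and gcd(abs(c), abs(d)) == 1:
                total += z.imag**s / abs(c * z + d)**(2 * s)
    return total / 2

z, s = 0.28 + 1.1j, 2
e = eisenstein(z, s)
assert abs(eisenstein(z + 1, s) - e) < 1e-4 * e   # invariance under T
assert abs(eisenstein(-1 / z, s) - e) < 1e-4 * e  # invariance under S
```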

For future reference we record the basic but fundamental *unfolding identities*

$$\int_{\Gamma \backslash \mathbf{H}} P_\Gamma[\phi]\, \overline{f}\ d\mu = \int_{\mathbf{H}} \phi\, \overline{f}\ d\mu$$

for any function $\phi: \mathbf{H} \to \mathbf{C}$ with sufficient decay, and any $\Gamma$-automorphic function $f$ of reasonable growth (e.g. $\phi$ bounded with compact support, and $f$ bounded, will suffice). Note that $f$ is viewed as a function on $\Gamma \backslash \mathbf{H}$ on the left-hand side, and as a $\Gamma$-automorphic function on $\mathbf{H}$ on the right-hand side. More generally, one has

$$\int_{\Gamma \backslash \mathbf{H}} P_{\Gamma' \backslash \Gamma}[f']\, \overline{f}\ d\mu = \int_{\Gamma' \backslash \mathbf{H}} f'\, \overline{f}\ d\mu$$

whenever $\Gamma'$ is a subgroup of $\Gamma$ and $f'$ is a $\Gamma'$-automorphic function of suitable decay.

When computing various statistics of a Poincaré series $P_\Gamma[\phi]$, such as its values $P_\Gamma[\phi](z)$ at special points $z$, or the quantity $\int_{\Gamma \backslash \mathbf{H}} |P_\Gamma[\phi]|^2\ d\mu$, expressions of interest to analytic number theory naturally emerge. We list three basic examples of this below, discussed somewhat informally in order to highlight the main ideas rather than the technical details.

The first example we will give concerns the problem of estimating the sum

where is the divisor function. This can be rewritten (by factoring and ) as

which is basically a sum over the full modular group . At this point we will “cheat” a little by moving to the related, but different, sum

This sum is not exactly the same as (8), but will be a little easier to handle, and it is plausible that the methods used to handle this sum can be modified to handle (8). Observe from (2) and some calculation that the distance between $\gamma i$ and $i$, for $\gamma = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, is given by the formula

$$\cosh d(\gamma i, i) = \frac{a^2 + b^2 + c^2 + d^2}{2},$$

and so one can express the above sum as

(the factor of coming from the quotient by in the projective special linear group); one can express this as , where and is the indicator function of the ball . Thus we see that expressions such as (7) are related to evaluations of Poincaré series. (In practice, it is much better to use smoothed out versions of indicator functions in order to obtain good control on sums such as (7) or (9), but we gloss over this technical detail here.)

The second example concerns the relative

of the sum (7). Note from multiplicativity that (7) can be written as , which is superficially very similar to (10), but with the key difference that the polynomial is irreducible over the integers.

As with (7), we may expand (10) as

At first glance this does not look like a sum over a modular group, but one can manipulate this expression into such a form in one of two (closely related) ways. First, observe that any factorisation of into Gaussian integers gives rise (upon taking norms) to an identity of the form , where and . Conversely, by using the unique factorisation of the Gaussian integers, every identity of the form gives rise to a factorisation of the form , essentially uniquely up to units. Now note that is of the form if and only if , in which case . Thus we can essentially write the above sum as something like

and the modular group is now manifest. An equivalent way to see these manipulations is as follows. A triple of natural numbers with gives rise to a positive quadratic form of normalised discriminant equal to with integer coefficients (it is natural here to allow to take integer values rather than just natural number values by essentially doubling the sum). The group acts on the space of such quadratic forms in a natural fashion (by composing the quadratic form with the inverse of an element of ). Because the discriminant has class number one (this fact is equivalent to the unique factorisation of the Gaussian integers, as discussed in this previous post), every form in this space is equivalent (under the action of some element of ) with the standard quadratic form . In other words, one has

which (up to a harmless sign) is exactly the representation , , introduced earlier, and leads to the same reformulation of the sum (10) in terms of expressions like (11). Similar considerations also apply if the quadratic polynomial is replaced by another quadratic, although one has to account for the fact that the class number may now exceed one (so that unique factorisation in the associated quadratic ring of integers breaks down), and in the positive discriminant case the fact that the group of units might be infinite presents another significant technical problem.

Note that has real part and imaginary part . Thus (11) is (up to a factor of two) the Poincaré series as in the preceding example, except that is now the indicator of the sector .

Sums involving subgroups of the full modular group, such as , often arise when imposing congruence conditions on sums such as (10), for instance when trying to estimate the expression when and are large. As before, one then soon arrives at the problem of evaluating a Poincaré series at one or more special points, where the series is now over rather than .

The third and final example concerns averages of Kloosterman sums

$$S(m,n;c) := \sum_{x \in (\mathbf{Z}/c\mathbf{Z})^\times} e\left( \frac{mx + n\overline{x}}{c} \right),$$

where $e(\theta) := e^{2\pi i \theta}$ and $\overline{x}$ is the inverse of $x$ in the multiplicative group $(\mathbf{Z}/c\mathbf{Z})^\times$. It turns out that the norms of Poincaré series $P_\Gamma[\phi]$ or $P_{\Gamma' \backslash \Gamma}[f]$ are closely tied to such averages. Consider for instance the quantity

where is a natural number and is a -automorphic form that is of the form

for some integer and some test function , which for sake of discussion we will take to be smooth and compactly supported. Using the unfolding formula (6), we may rewrite (13) as

To compute this, we use the double coset decomposition

where for each , are arbitrarily chosen integers such that . To see this decomposition, observe that every element in outside of can be assumed to have by applying a sign , and then using the row and column operations coming from left and right multiplication by (that is, shifting the top row by an integer multiple of the bottom row, and shifting the right column by an integer multiple of the left column) one can place in the interval and to be any specified integer pair with . From this we see that

and so from further use of the unfolding formula (5) we may expand (13) as

The first integral is just . The second expression is more interesting. We have

so we can write

as

which on shifting by simplifies a little to

and then on scaling by simplifies a little further to

Note that as , we have modulo . Comparing the above calculations with (12), we can thus write (13) as

is a certain integral involving and a parameter , but which does not depend explicitly on parameters such as . Thus we have indeed expressed the expression (13) in terms of Kloosterman sums. It is possible to invert this analysis and express various weighted sums of Kloosterman sums in terms of expressions (possibly involving inner products instead of norms) of Poincaré series, but we will not do so here; see Chapter 16 of Iwaniec and Kowalski for further details.

Traditionally, automorphic forms have been analysed using the spectral theory of the Laplace-Beltrami operator on spaces such as $\Gamma_0(q) \backslash \mathbf{H}$ or $PSL_2(\mathbf{Z}) \backslash \mathbf{H}$, so that a Poincaré series such as $P_\Gamma[\phi]$ might be expanded out using inner products of $P_\Gamma[\phi]$ (or, by the unfolding identities, $\phi$) with various generalised eigenfunctions of $-\Delta$ (such as cuspidal eigenforms, or Eisenstein series). With this approach, special functions, and specifically the modified Bessel functions $K_{iu}$ of the second kind, play a prominent role, basically because the $\Gamma_\infty$-automorphic functions

$$\sqrt{y}\, K_{iu}(2\pi |m| y)\, e^{2\pi i m x}$$

for $u \in \mathbf{R}$ and non-zero integer $m$ are generalised eigenfunctions of $-\Delta$ (with eigenvalue $\frac{1}{4} + u^2$), and are almost square-integrable on $\Gamma_\infty \backslash \mathbf{H}$ (the $L^2$ norm diverges only logarithmically at one end $y \to 0$ of the cylinder $\Gamma_\infty \backslash \mathbf{H}$, while decaying exponentially fast at the other end $y \to \infty$).

However, as discussed in this previous post, the spectral theory of an essentially self-adjoint operator such as $-\Delta$ is basically equivalent to the theory of various solution operators associated to partial differential equations involving that operator, such as the Helmholtz equation $(-\Delta - z) u = f$, the heat equation $\partial_t u = \Delta u$, the Schrödinger equation $i \partial_t u = \Delta u$, or the wave equation $\partial_{tt} u = \Delta u$. Thus, one can hope to rephrase many arguments that involve spectral data of $-\Delta$ into arguments that instead involve resolvents $(-\Delta - z)^{-1}$, heat kernels $e^{t\Delta}$, Schrödinger propagators $e^{it\Delta}$, or wave propagators, or involve the PDE more directly (e.g. applying integration by parts and energy methods to solutions of such PDE). This is certainly done to some extent in the existing literature; resolvents and heat kernels, for instance, are often utilised. In this post, I would like to explore the possibility of reformulating spectral arguments instead using the inhomogeneous wave equation

$$\partial_{tt} u = \Delta u + F.$$

Actually it will be a bit more convenient to normalise the Laplacian by $\frac{1}{4}$, and look instead at the *automorphic wave equation*

$$\partial_{tt} u = \Delta u + \frac{1}{4} u. \qquad (15)$$

This equation somewhat resembles a “Klein-Gordon” type equation, except that the mass is imaginary! This would lead to pathological behaviour were it not for the negative curvature, which in principle creates a spectral gap of $\frac{1}{4}$ that cancels out this factor.

The point is that the wave equation approach gives access to some nice PDE techniques, such as energy methods, Sobolev inequalities and finite speed of propagation, which are somewhat submerged in the spectral framework. The wave equation also interacts well with Poincaré series; if for instance and are -automorphic solutions to (15) obeying suitable decay conditions, then their Poincaré series and will be -automorphic solutions to the same equation (15), basically because the Laplace-Beltrami operator commutes with translations. Because of these facts, it is possible to replicate several standard spectral theory arguments in the wave equation framework, without having to deal directly with things like the asymptotics of modified Bessel functions. The wave equation approach to automorphic theory was introduced by Faddeev and Pavlov (using the Lax-Phillips scattering theory), and developed further by Lax and Phillips, to recover many spectral facts about the Laplacian on modular curves, such as the Weyl law and the Selberg trace formula. Here, I will illustrate this by deriving three basic applications of automorphic methods in a wave equation framework, namely

- Using the Weil bound on Kloosterman sums to derive Selberg’s 3/16 theorem on the least non-trivial eigenvalue for $-\Delta$ on $\Gamma_0(q) \backslash \mathbf{H}$ (discussed previously here);
- Conversely, showing that Selberg’s eigenvalue conjecture (improving Selberg’s bound of $\frac{3}{16}$ to the optimal $\frac{1}{4}$) implies an optimal bound on (smoothed) sums of Kloosterman sums; and
- Using the same bound to obtain pointwise bounds on Poincaré series similar to the ones discussed above. (Actually, the argument here does not use the wave equation, instead it just uses the Sobolev inequality.)

This post originated from an attempt to finally learn this part of analytic number theory properly, and to see if I could use a PDE-based perspective to understand it better. Ultimately, this is not that dramatic a departure from the standard approach to this subject, but I found it useful to think of things in this fashion, probably due to my existing background in PDE.

I thank Bill Duke and Ben Green for helpful discussions. My primary reference for this theory was Chapters 15, 16, and 21 of Iwaniec and Kowalski.

** — 1. Selberg’s theorem — **

We begin with a proof of the following celebrated result of Selberg:

**Theorem 1** Let $q$ be a natural number. Then every eigenvalue of $-\Delta$ on $L^2(\Gamma_0(q) \backslash \mathbf{H})_0$ (the mean zero functions on $\Gamma_0(q) \backslash \mathbf{H}$) is at least $\frac{3}{16}$.

One can show that $-\Delta$ has only pure point spectrum below $\frac{1}{4}$ on $L^2(\Gamma_0(q) \backslash \mathbf{H})_0$ (see this previous blog post for more discussion). Thus, this theorem shows that the spectrum of $-\Delta$ on $L^2(\Gamma_0(q) \backslash \mathbf{H})_0$ is contained in $[\frac{3}{16}, +\infty)$.

We now prove this theorem. Suppose this were not the case, then we have a non-zero eigenfunction of in with eigenvalue for some ; we may assume to be real-valued, and by elliptic regularity it is smooth (on ). If it is constant in the horizontal variable, thus , then by the -automorphic nature of it is easy to see that is globally constant, contradicting the fact that it is mean zero but not identically zero. Thus it is not identically constant in the horizontal variable. By Fourier analysis on the cylinder , one can then find a -automorphic function of the form for some non-zero integer which has a non-zero inner product with on , where is a smooth compactly supported function.

Now we evolve by the wave equation

Taking the inner product of with the eigenfunction on , differentiating under the integral sign, and integrating by parts, we see that

Since is initially non-zero with zero velocity, we conclude from solving the ODE that is a non-zero multiple of . In particular, it grows like as . Using the unfolding identity (6) to write

and then using the Cauchy-Schwarz inequality, we conclude that

as , where we allow implied constants to depend on .

We complement this lower bound with a slightly crude upper bound in which we are willing to lose some powers of . We have already seen that is supported in the strip . Compactly supported solutions to (16) on the cylinder conserve the energy

In particular, this quantity is for all time (recall we allow implied constants to depend on ). From Hardy’s inequality, the quantity

is non-negative. Discarding this term and using , and using the fact that is non-zero, we arrive at the bounds

and

(We allow implied constants to depend on , but not on .) From the fundamental theorem of calculus and Minkowski’s inequality in , the latter inequality implies that

for , which on combining with the former inequality gives

The function also obeys the wave equation (16), so a similar argument gives

Applying a Sobolev inequality on unit squares (for ) or on squares of length comparable to (for ) we conclude the pointwise estimates

for and

for . In particular, writing , we have the somewhat crude estimates

for all and . (One can do better than this, particularly for large , but this bound will suffice for us.)

By repeating the analysis of (13) at the start of this post, we see that the quantity

where

Since is supported on and is bounded by , the integral is for . We also see that vanishes unless (otherwise and cannot simultaneously be ), and for such values of , we have the triangle inequality bound

Evaluating the integral and then the integral, we arrive at

and so we can bound (18) (ignoring any potential cancellation in ) by

Now we use the Weil bound for Kloosterman sums, which gives

(see e.g. this previous post for a discussion of this bound) to arrive at the bound

as . Comparing this with (17) we obtain a contradiction as since we have , and the claim follows.
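For reference, the Weil bound invoked above reads, in standard notation, with $\tau$ the divisor function and $(m,n,c)$ the greatest common divisor:

```latex
S(m,n;c) \;=\; \sum_{\substack{x \,(\mathrm{mod}\ c) \\ (x,c)=1}} e\!\left(\frac{m x + n \bar{x}}{c}\right),
\qquad
|S(m,n;c)| \;\le\; \tau(c)\,(m,n,c)^{1/2}\, c^{1/2},
```

where $\bar{x}$ denotes the inverse of $x$ modulo $c$ and $e(\theta) = e^{2\pi i \theta}$.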

Remark 2. It was conjectured by Linnik that as for any fixed ; this, when combined with a more refined analysis of the above type, implies the Selberg eigenvalue conjecture that all eigenvalues of on are at least .

** — 2. Consequences of Selberg’s conjecture — **

In the previous section we saw how bounds on Kloosterman sums gave rise to lower bounds on eigenvalues of the Laplacian. It turns out that this implication is reversible. The simplest case (at least from the perspective of wave equation methods) is when Selberg’s eigenvalue conjecture is true, so that the Laplacian on has spectrum in . Equivalently, one has the inequality

for all (interpreting derivatives in a distributional sense if necessary). Integrating by parts, this shows that

for all , where the gradient and its magnitude are computed using the Riemannian metric in .

Now suppose one has a smooth, compactly supported in space solution to the inhomogeneous wave equation

for some forcing term which is also smooth and compactly supported in space. We assume that has mean zero for all . Introducing the energy

which is non-negative thanks to (19) and integrating by parts, we obtain the energy identity

and hence by Cauchy-Schwarz

and hence

(in a distributional sense at least), giving rise to the energy inequality

We can lift this inequality to the cylinder , concluding that for any smooth, compactly supported in space solution to the inhomogeneous equation

One can use this inequality to analyse the norm of Poincaré series by testing on various functions (and working out using (20)). Suppose for instance that is a fixed natural number, and is a smooth compactly supported function. We consider the traveling wave given by the formula

where is the primitive of ; the point is that this is an approximate solution to the homogeneous wave equation, particularly at small values of . Clearly is compactly supported with mean zero for , in the region (we allow implied constants to depend on but not on ). In the region , and its first derivatives are , giving a contribution of to the energy (note that the shifts of the region by have bounded overlap). In particular we have

and thus by the energy inequality (using only the portion of the energy)

for , where

Clearly is supported on the region . For , one can compute that , giving a contribution of to the right-hand side. When is much less than but much larger than , we have , which after some calculation yields . As this decays so quickly as , one can compute (using for instance the expansion (14) of (13) and crude estimates, ignoring all cancellation) that this contributes a total of to the right-hand side also. Finally one has to deal with the region , but is much less than . Here, is equal to , and is equal to , which after some computation makes equal to . Again, one can compute the contribution of this term to the energy inequality to be . We conclude that

Applying the expansion (14) of (13), we conclude that

The expression is only non-zero when , and the integrand is only non-zero when and , which makes the phase of size . For much smaller than , the phase is thus largely irrelevant and the quantity is roughly comparable to for . As such, the bound (21) can be viewed as a smoothed out version of the estimate

which is basically Linnik’s conjecture, mentioned in Remark 2. One can make this connection between Selberg’s eigenvalue conjecture and Linnik’s conjecture more precise: see Section 16.6 of Iwaniec and Kowalski, which goes through modified Bessel functions rather than through wave equation methods.

** — 3. Pointwise bounds on Poincaré series — **

The formula (14) for (13) allows one to compute norms of Poincaré series. By using Sobolev embedding, one can then obtain pointwise control on such Poincaré series, as long as one stays away from the cusps. For instance, suppose we are interested in evaluating a Poincaré series at a point of the form for some . From the Sobolev inequality we have

for any smooth function , and thus by translation

The ball meets only boundedly many translates of the standard fundamental domain of , and hence does too. Since is a subgroup of , we conclude that meets only boundedly many translates of a fundamental domain for . In particular, we obtain the Sobolev inequality

for any smooth -automorphic function . This estimate is unfortunately a little inefficient when is large, since the ball has area comparable to one, whereas the quotient space has area roughly comparable to , so that one is conceding quite a bit by replacing the ball by the quotient space. Nevertheless this estimate is still useful enough to give some good results. We illustrate this by proving the estimate

for with coprime to , where is a fixed smooth function supported in, say, (and implied constants are allowed to depend on ), and the asymptotic notation is with regard to the limit . This type of estimate (appearing, for instance in a stronger form, in this paper of Duke, Friedlander, and Iwaniec; see also Proposition 21.10 of Iwaniec and Kowalski) establishes some equidistribution of the square roots as varies (while staying comparable to ). For comparison, crude estimates (ignoring the cancellation in the phase ) give a bound of , so the bound (23) is non-trivial whenever is significantly smaller than . Estimates such as (23) are also useful for getting good error terms in the asymptotics for the expression (10), as was first done by Hooley.

One can write (23) in terms of Poincaré series much as was done for (10). Using the fact that the discriminant has class number one as before, we see that for every positive and with , we can find an element of such that has imaginary part and real part modulo one (thus, and ); this element is unique up to left translation by . We can thus write the left-hand side of (23) as

where

and are the bottom two entries of the matrix (determined up to sign). The condition implies (since must be coprime) that are coprime to with for some with ; conversely, if obey such a condition then . The number of such is at most . Thus it suffices to show that

for each such .

The constraint constrains to a single right coset of . Thus the left-hand side can be written as

which is just . Applying (22) (and interchanging the Poincaré series and the Laplacian), it thus suffices to show that

where

By hypothesis, the coefficient is bounded, and so has all derivatives bounded while remaining supported in . Because of this, the arguments used to establish (24) can be adapted without difficulty to establish (25).

Using the expansion (14) of (13), we can write the left-hand side of (24) as

where

The first term can be computed to give a contribution of , so it suffices to show that

The quantity vanishes unless . In that case, the integrand vanishes unless and , so by the triangle inequality we have . The left-hand side of (26) is thus bounded by

By the Weil bound for Kloosterman sums, we have , so on factoring out from we can bound the previous expression by

and the claim follows.

Remark 3By using improvements to Selberg’s 3/16 theorem (such as the result of Kim and Sarnak improving this fraction to ) one can improve the second term in the right-hand side of (23) slightly.

Filed under: expository, math.AP, math.NT, math.SP Tagged: automorphic forms, Poincare series, wave equation

A few links:

- Scientific American has a special feature to celebrate the 100th anniversary of General Relativity, and I wrote a contribution about Thought Experiments.
- Believe it or not, but together with the awesome Naomi Lubick I wrote this week's feature article for New Scientist. It is about the research by Glenn Starkman et al summarized in this paper.
- I was interviewed by a NYC based radio station for a program called "Equal Time for Free Thought". You can listen to the recording of the full interview here. (They cut out a lot of additional explanations I gave, I suppose it was too technical.)
- I am presently attending this conference about Hawking and his radiation, which I'm sure will be a very interesting event.

Nowadays Physics is a very big chunk of science, and although in our University courses we try to give our students a basic knowledge of all of it, it has become increasingly clear that it is very hard to keep up to date with the developments in such diverse sub-fields as quantum optics, material science, particle physics, astrophysics, quantum field theory, statistical physics, thermodynamics, etcetera.

Simply put, there is not enough time within the average lifetime of a human being to read and learn about everything that is being studied in the dozens of different disciplines that form what one may generically call "Physics".


Fig beetles.

(Slightly blurred due to it being windy and a telephoto shot with a light handheld point-and-shoot...)

-cvj Click to continue reading this post

The post Beetlemania… appeared first on Asymptotia.

Some more good results from the garden, after I thought that the whole crop was again going to be prematurely doomed, like last year. I tried to photograph the other thing about this year's gardening narrative that I intend to tell you about, but with poor results, but I'll say more shortly. In the meantime, for the record here are some Carmello tomatoes and some of a type of Russian Black [...] Click to continue reading this post

The post Red and Round… appeared first on Asymptotia.

Diagram depicting pion exchange between a proton and a neutron. Image source: Wikipedia.

When the discovery of the Higgs boson was confirmed by CERN, physicists cheered and the world cheered with them. Finally they knew what gave mass to matter, or so the headlines said. Except that most of the mass carried by matter doesn’t come courtesy of the Higgs. Rather, it’s a short-lived particle called the pion that generates it.

The pion is the most prevalent meson, composed of a quark and an anti-quark, and the reason you missed the headlines announcing its discovery is that it took place already in 1947. But the mechanism by which pions give rise to mass still holds some mysteries; notably, nobody really knows at which temperature it happens. In a recent PRL, researchers at RHIC in Brookhaven have taken a big step towards filling in this blank.

STAR Collaboration

Phys. Rev. Lett. 114, 252302 (2015)

arXiv:1504.02175 [nucl-ex]

In contrast to the Higgs, which gives masses to elementary particles, the pion is responsible for generating most of the mass of the composite particles found in atomic nuclei – protons and neutrons, collectively called nucleons. If we only added up the masses of the elementary particles – the up and down quarks – that they are made of, we would get a badly wrong answer. Instead, much of the mass is in a background condensate, mathematically analogous to the Higgs field (vev).

The pion is the Goldstone boson of the “chiral symmetry” of the standard model, a symmetry that relates left-handed with right-handed particles. It is also one of the best examples for technical naturalness. The pion’s mass is suspiciously small, smaller than one would naively expect, and therefore technical naturalness tells us that we ought to find an additional symmetry when the mass is entirely zero. And indeed it is the chiral symmetry that is recovered when the pions’ masses all vanish. The pions aren’t exactly massless because chiral symmetry isn’t an exact symmetry; after all, the Higgs does create masses for the quarks, even if they are only small ones.
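A standard way to make "suspiciously small" quantitative is the Gell-Mann–Oakes–Renner relation, which ties the squared pion mass to the quark masses and the quark condensate (conventions for the decay constant $f_\pi$ vary by factors of $\sqrt{2}$):

```latex
m_\pi^2 \, f_\pi^2 \;=\; -\,(m_u + m_d)\,\langle \bar{q} q \rangle \;+\; \mathcal{O}(m_q^2),
```

so the pion mass squared vanishes linearly as the quark masses are sent to zero, recovering the exact chiral symmetry.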

Mathematically all this is well understood, but the devil is in the details. The breaking of chiral symmetry happens at an energy where the strong nuclear force is strong indeed. This is in contrast to the breaking of electro-weak symmetry that the Higgs participates in, which happens at much higher energies. The peculiar nature of the strong force has it that the interaction is “asymptotically free”, meaning it gets weaker at higher energies. When it’s weak, it is well understood. But at low energies, such as close to chiral symmetry breaking, little can be calculated from first principles. Instead, one works on the level of effective models, such as those based on pions and nucleons rather than quarks and gluons.

We know that quarks cannot float around freely but that they are always bound together to multiples that mostly neutralize the “color charge” that makes quarks attract each other. This requirement of quarks to form bound states at low energies is known as “confinement” and exactly how it comes about is one of the big open questions in theoretical physics. Particle physicists deal with their inability to calculate it by using tables and various models for how the quarks find and bind each other.

The breaking of chiral symmetry which gives mass to nucleons is believed to take place at a temperature close to the transition in which quarks stop being confined. This deconfinement transition has been the subject of much interest and led to some stunning insights about the properties of the plasma that quarks form when no longer confined. In particular this plasma turned out to have much lower viscosity than originally believed, and the transition turned out to be much smoother than expected. Nature is always good for a surprise. But the chiral phase transition hasn’t attracted much attention, at least so far, though maybe this is about to change now.

These properties of nuclear matter cannot be studied in collisions of single highly energetic particles, like proton-proton collisions at the LHC. Instead, one needs to bring together as many quarks as possible, and for this reason one collides heavy nuclei, for example gold or lead nuclei. RHIC at Brookhaven is one of the places where these studies are done. The GSI in Darmstadt, Germany, another one. And the LHC also has a heavy ion program, another run of which is expected to take place later this year.

But how does one find out whether chiral symmetry is restored together with the deconfinement transition? It’s a tough question that I recall being discussed already when I was an undergraduate student. The idea that emerged over long debates was to make use of the coupling of chiral matter to magnetic fields.

The heavy ions that collide move at almost the speed of light and they are electrically charged. Since moving charges create magnetic fields, this generically causes very strong magnetic fields in the collision region. The charged pions that are produced in large amounts when the nuclei collide couple to the magnetic field. And their coupling depends on whether or not they have masses, ie it depends on whether chiral symmetry was restored or not. And so the idea is that one measures the distribution of charged pions that come out of the collision of the heavy nuclei, and from that one infers whether chiral symmetry was restored and, ideally, what was the transition temperature and the type of phase transition.

So much for the theory. In practice of course it isn’t that easy to find out exactly what gave rise to the measured distribution. And so the recent results have to be taken with a grain of salt: even the title of the paper carefully speaks of a “possible observation” rather than declaring it has been observed. It will certainly take more study to make sure they are really seeing chiral symmetry restoration and not something else. In any case though, I find this an interesting development because it demonstrates that the method works, and I think this will be a fruitful research direction about which we will hear more in the future.

For me chirality has always been the most puzzling aspect of the standard model. It’s just so uncalled for. That molecules come in left-handed and right-handed variants, and biology on Earth settled mostly on the left-handed ones, can be put down as a historical accident – one that might not even have repeated on other planets. But fundamentally, on the level of elementary particles, where would such a distinction come from?

The pion, I think, deserves a little bit more attention.

As an undergrad, I was a mechanical engineering major doing an engineering physics program from the engineering side. When I was a sophomore, my lab partner in the engineering fluid mechanics course, Brian, was doing the same program, but from the physics side. Rather than doing a pre-made lab, we chose to take the opportunity to do an experiment of our own devising. We had a great plan. We wanted to compare the drag forces on different shapes of boat hulls. The course professor got us permission to go to a nearby research campus, where we would be able to take our homemade models and run them in their open water flow channel (like an infinity pool for engineering experiments) for about three hours one afternoon.

The idea was simple: The flowing water would tend to push the boat hull downstream due to drag. We would attach a string to the hull, run the string over a pulley, and hang known masses on the end of the string, until the weight of the masses (transmitted via the string) pulled upstream to balance out the drag force - that way, when we had the right amount of weight on there, the boat hull would sit motionless in the flow channel. By plotting the weight vs. the flow velocity, we'd be able to infer the dependence of the drag force on the flow speed, and we could compare different hull designs.

Like many great ideas, this was wonderful right up until we actually tried to implement it in practice. Because we were sophomores and didn't really have a good feel for the numbers, we hadn't estimated anything and tacitly assumed that our approach would work. Instead, the drag forces on our beautiful homemade wood hulls were much smaller than we'd envisioned, so much so that just the horizontal component of the force from the sagging string itself was enough to hold the boats in place. With only a couple of hours at our disposal, we had to face the fact that our whole measurement scheme was not going to work.

What did we do? With improvisation that would have made MacGyver proud, we used a protractor, chewing gum, and the spring from a broken ballpoint pen to create a much "softer" force measurement apparatus, dubbed the **Force-o-Matic**. With the gum, we anchored one end of the stretched spring to the "origin" point of the protractor, with the other end attached to a pointer made out of the pen cap, oriented to point vertically relative to the water surface. With fine thread instead of the heavier string, we connected the boat hull to the tip of the pointer, so that tension in the thread *laterally* deflected the extended spring by some angle. We could then later calibrate the force required to produce a certain angular deflection. We got usable data, an A on the project, and a real introduction, vividly memorable 25 years later, to **real** experimental work.
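The data analysis the original plan called for – plotting force against flow speed to infer the drag law – amounts to a log-log fit. A minimal sketch, using entirely hypothetical (v, F) data chosen here to follow quadratic drag exactly:

```python
import math

# Hypothetical (flow speed m/s, drag force N) pairs, constructed as F = 3.2 * v**2.
data = [(0.5, 0.8), (1.0, 3.2), (1.5, 7.2), (2.0, 12.8)]

# If F is proportional to v**n, then log F = n * log v + const, so an ordinary
# least-squares slope in log-log coordinates recovers the exponent n.
logs = [(math.log(v), math.log(F)) for v, F in data]
n_pts = len(logs)
mean_x = sum(x for x, _ in logs) / n_pts
mean_y = sum(y for _, y in logs) / n_pts
slope = sum((x - mean_x) * (y - mean_y) for x, y in logs) / sum(
    (x - mean_x) ** 2 for x, _ in logs
)
print(round(slope, 3))  # ~2.0 for quadratic drag
```

With real, noisy Force-o-Matic readings the fitted slope would scatter around the true exponent rather than land on it exactly.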


The news is more-or-less what the title says!

In *Science*, a group led by Anthony Laing at Bristol has now reported BosonSampling with 6 photons, beating their own previous record of 5 photons, as well as the earlier record of 4 photons achieved a few years ago by the Walmsley group at Oxford (as well as the 3-photon experiments done by groups around the world). I only learned the big news from a commenter on this blog, after the paper was already published (protip: if you’ve pushed forward the BosonSampling frontier, feel free to shoot me an email about it).

As several people explain in the comments, the *main* advance in the paper is arguably not increasing the number of photons, but rather the fact that the device is completely reconfigurable: you can try hundreds of different unitary transformations with the same chip. In addition, the 3-photon results have an unprecedentedly high fidelity (about 95%).

The 6-photon results are, of course, consistent with quantum mechanics: the transition amplitudes are indeed given by permanents of 6×6 complex matrices. Key sentence:

After collecting 15 sixfold coincidence events, a confidence of P = 0.998 was determined that these are drawn from a quantum (not classical) distribution.

No one said scaling BosonSampling would be easy: I’m guessing that it took weeks of data-gathering to get those 15 coincidence events. Scaling up further will probably require improvements to the sources.

There’s also a caveat: their initial state consisted of 2 modes with 3 photons each, as opposed to what we *really* want, which is 6 modes with 1 photon each. (Likewise, in the Walmsley group’s 4-photon experiments, the initial state consisted of 2 modes with 2 photons each.) If the number of modes stayed 2 forever, then the output distributions would remain easy to sample with a classical computer no matter how many photons we had, since we’d then get permanents of matrices with only 2 distinct rows. So “scaling up” needs to mean increasing not only the number of photons, but also the number of sources.

Nevertheless, this is an obvious step forward, and it came sooner than I expected. Huge congratulations to the authors on their accomplishment!

But you might ask: given that 6×6 permanents are still pretty easy for a classical computer (the more so when the matrices have only 2 distinct rows), why should anyone care? Well, the new result has major implications for what I’ve always regarded as the central goal of quantum computing research, much more important than breaking RSA or Grover search or even quantum simulation: namely, *getting Gil Kalai to admit he was wrong*. Gil is on record, repeatedly, on this blog as well as his own (see for example here), as saying that he doesn’t think BosonSampling will ever be possible even with 7 or 8 photons. I don’t know whether the 6-photon result is giving him second thoughts (or sixth thoughts?) about that prediction.

“Quantum Information meets Quantum Matter”, it sounds like the beginning of a perfect romance story. It is probably not the kind that makes an Oscar movie, but it does get many physicists excited, physicists including Bei, Duanlu, Xiaogang and me. … Continue reading

“Quantum Information meets Quantum Matter”, it sounds like the beginning of a perfect romance story. It is probably not the kind that makes an Oscar movie, but it does get many physicists excited, physicists including Bei, Duanlu, Xiaogang and me. Actually we find the story so compelling that we decided to write a book about it, and it all started one day in 2011 when Bei popped the question ‘Do you want to write a book about it?’ during one of our conversations.

This idea quickly sparked enthusiasm among the rest of us, who have all been working in this interdisciplinary area and are witness to its rising power. In fact Xiao-Gang has had the same idea of book writing for some time. So now here we are, four years later, posting the first version of the book on arXiv last week. (arXiv link)

The book is a condensed matter book on the topic of strongly interacting many-body systems, with a special focus on the emergence of topological order. This is an exciting topic, with new developments every day. We are not trying to cover the whole picture, but rather to present just one perspective – the quantum information perspective – of the story. Quantum information ideas like entanglement, quantum circuits, and quantum codes are becoming ever more popular nowadays in condensed matter study and have led to many important developments. On the other hand, they are not usually taught in condensed matter courses or covered by condensed matter books. Therefore, we feel that writing a book may help bridge the gap.

We keep the writing self-contained, requiring minimal background in quantum information and condensed matter. The first part introduces concepts in quantum information that are going to be useful in the later study of condensed matter systems. (It is by no means a well-rounded introduction to quantum information and should not be read in that way.) The second part moves on to one major topic of condensed matter theory, local Hamiltonians and their ground states, and contains an introduction to the most basic concepts in condensed matter theory, like locality, gap, and universality. The third part then focuses on the emergence of topological order, first presenting a historical and intuitive picture and then building a more systematic approach based on entanglement and quantum circuits. With this framework established, the fourth part studies some interesting topological phases in 1D and 2D, with the help of the tensor network formalism. Finally, part V concludes with an outlook on where this miraculous encounter of quantum information and condensed matter might take us – the unification between information and matter.

We hope that, with such a structure, the book is accessible to both condensed matter students / researchers interested in this quantum information approach and also quantum information people who are interested in condensed matter topics. And of course, the book is also limited by the perspective we are taking. Compared to a standard condensed matter book, we are missing even the most elementary ingredient – the free fermion. Therefore, this book is not to be read as a standard textbook on condensed matter theory. On the other hand, by presenting a new approach, we hope to bring the readers to the frontiers of current research.

The most important thing I want to say here is: this arXiv version is *NOT* the final version. We posted it so that we can gather feedback from our colleagues. Therefore, it is not yet ready for junior students to read in order to learn the subject. On the other hand, if you are a researcher in a related field, please send us criticism, comments, suggestions, or whatever comes to your mind. We will be very grateful for that! (One thing we already learned (thanks Burak!) is that we forgot to put in all the references on conditional mutual information. That will be corrected in a later version, together with everything else.) The final version will be published by Springer as part of their “Quantum Information Science and Technology” series.

I guess it is quite obvious that me writing on the blog of the Institute for Quantum Information and Matter (IQIM) about this book titled “Quantum Information meets Quantum Matter” (QIQM) is not a simple coincidence. The romance story between the two emerged in the past decade or so and has been growing at a rate much beyond expectations. Our book is merely an attempt to record some aspects of the beginning. Let’s see where it will take us.

I’m old enough to remember when cutting and pasting were really done with scissors and glue (or Scotch tape). When I was a graduate student in the late 1970s, few physicists typed their own papers, and if they did they … Continue reading

I’m old enough to remember when cutting and pasting were really done with scissors and glue (or Scotch tape). When I was a graduate student in the late 1970s, few physicists typed their own papers, and if they did they left gaps in the text, to be filled in later with handwritten equations. The gold standard of technical typing was the IBM Correcting Selectric II typewriter. Among its innovations was the correction ribbon, which allowed one to remove a typo with the touch of a key. But it was especially important for scientists that the Selectric could type mathematical characters, including Greek letters.

It wasn’t easy. Many different typeballs were available, to support various fonts and special characters. Typing a displayed equation or in-line equation usually involved swapping back and forth between typeballs to access all the needed symbols. Most physics research groups had staff who knew how to use the IBM Selectric and spent much of their time typing manuscripts.

Though the IBM Selectric was used by many groups, typewriters have unique personalities, as forensic scientists know. I had a friend who claimed he had learned to recognize telltale differences among documents produced by various IBM Selectric machines. That way, whenever he received a referee report, he could identify its place of origin.

Manuscripts did not evolve through 23 typeset versions in those days, as one of my recent papers did. Editing was arduous and frustrating, particularly for a lowly graduate student like me, who needed to beg Blanche to set aside what she was doing for Steve Weinberg and devote a moment or two to working on my paper.

It was tremendously liberating when I learned to use TeX in 1990 and started typing my own papers. (Not LaTeX in those days, but Plain TeX embellished by a macro for formatting.) That was a technological advance that definitely improved my productivity. An earlier generation had felt the same way about the Xerox machine.

But as I was reminded a few days ago, while technological advances can be empowering, they can also be dangerous when used recklessly. I was editing a very long document, and decided to make a change. I had repeatedly used $x$ to denote an n-bit string, and thought it better to use $\vec x$ instead. I was walking through the paper with the replace button, changing each $x$ to $\vec x$ where the change seemed warranted. But I slipped once, and hit the “Replace All” button instead of “Replace.” My computer curtly informed me that it had made the replacement 1011 times. Oops …

This was a revocable error. There must have been a way to undo it (though it was not immediately obvious how). Or I could have closed the file without saving, losing some recent edits but limiting the damage.

But it was late at night and I was tired. I panicked, immediately saving and LaTeXing the file. It was a mess.

Okay, no problem, all I had to do was replace every \vec x with x and everything would be fine. Except that in the original replacement I had neglected to specify “Match Case.” In 264 places $X$ had become $\vec x$, and the new replacement did not restore the capitalization. It took hours to restore every $X$ by hand, and there are probably a few more that I haven’t noticed yet.
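In a scriptable setting the same fix is less dangerous, since regex substitution is case-sensitive by default and can be previewed before committing. A minimal sketch in Python, on invented sample text (not the actual document):

```python
import re

# Hypothetical fragment of the LaTeX source:
text = r"Let $x$ be an n-bit string, and let $X$ be a random variable."

# re.sub is case-sensitive by default, so $X$ is left untouched:
fixed = re.sub(r"\$x\$", r"$\\vec x$", text)
print(fixed)
```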

Which brings me to the cautionary tale of one of my former graduate students, Robert Navin. Rob’s thesis had two main topics, scattering off vortices and scattering off monopoles. On the night before the thesis due date, Rob made a horrifying discovery. The crux of his analysis of scattering off vortices concerned the singularity structure of a certain analytic function, and the chapter about vortices made many references to the poles of this function. What Rob realized at this late stage is that these singularities are actually branch points, not poles!

What to do? It’s late and you’re tired and your thesis is due in a few hours. Aha! Global search and replace! Rob replaced every occurrence of “pole” in his thesis by “branch point.” Problem solved.

Except … Rob had momentarily forgotten about that chapter on monopoles. Which, when I read the thesis, had been transformed into a chapter on monobranch points. His committee accepted the thesis, but requested some changes …
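Rob’s fix needed only a word boundary. A sketch in Python, on made-up text standing in for the thesis:

```python
import re

text = "The poles of this function matter; so do magnetic monopoles."

# A naive global replace mangles "monopoles" too:
naive = text.replace("pole", "branch point")

# Anchoring on word boundaries touches only the standalone word:
safe = re.sub(r"\bpole(s?)\b", r"branch point\1", text)

print(naive)  # ... so do magnetic monobranch points.
print(safe)   # ... so do magnetic monopoles.
```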

Rob Navin no longer does physics, but has been very successful in finance. I’m sure he’s more careful now.

Been a while since I shared a snippet from the graphic book in progress. And this time the dialogue is not redacted! A few remarks: [...] Click to continue reading this post

The post Mid-Conversation… appeared first on Asymptotia.

Today I learned the sad news that Jacob Bekenstein, one of the great theoretical physicists of our time, passed away at the too-early age of 68.

Everyone knows what a big deal it was when Stephen Hawking discovered in 1975 that black holes radiate. Bekenstein was the guy who, as a grad student in Princeton in the early 1970s, was already raving about black holes having nonzero entropy and temperature, and satisfying the Second Law of Thermodynamics—something just about everyone, including Hawking, considered nuts at the time. It was, as I understand it, Hawking’s failed attempt to prove Bekenstein wrong that led to Hawking’s discovery of the Hawking radiation, and thence to the modern picture of black holes.

In the decades since, Bekenstein continued to prove ingenious physical inequalities, often using thought experiments involving black holes. The most famous of these, the Bekenstein bound, says that the number of bits that can be stored in any bounded physical system is finite, and is upper-bounded by ~2.6×10^{43} MR, where M is the system’s mass in kilograms and R is its radius in meters. (This bound is saturated by black holes, and only by black holes, which therefore emerge as the most compact possible storage medium—though probably not the best for retrieval!) Bekenstein’s lectures were models of clarity and rigor: at conferences full of audacious speculations, he stood out to my non-expert eyes as someone who was simply trying to follow chains of logic from accepted physical principles, however mind-bogglingly far those chains led, but no further.
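The prefactor ~2.6×10^{43} comes from writing the bound in SI units as I ≤ 2πMRc/(ħ ln 2). A quick numerical sketch (the function name is mine, not from any source):

```python
import math

hbar = 1.054571817e-34  # reduced Planck constant, J*s
c = 2.99792458e8        # speed of light, m/s

def bekenstein_bits(mass_kg, radius_m):
    """Upper bound on storable bits: I <= 2*pi*M*R*c / (hbar * ln 2)."""
    return 2 * math.pi * mass_kg * radius_m * c / (hbar * math.log(2))

# For a 1 kg system of radius 1 m this reproduces the ~2.6e43 figure:
print(f"{bekenstein_bits(1.0, 1.0):.2e} bits")
```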

I first met Bekenstein in 2003, when I was a grad student spending a semester at Hebrew University in Jerusalem. I was struck by the kindness he showed a 21-year-old nobody, who wasn’t even a *physics* student, coming to bother him. Not only did he listen patiently to my blather about applying computational complexity to physics, he said that *of course* physics should ultimately aim to understand everything as the output of some computer program, that he too was thinking in terms of computation when he studied black-hole entropy. I remember pondering the fact that the greatest reductionist I’d ever met was wearing a yarmulke—and then scolding myself for wasting precious brain-cycles on such a trivial thought when there was science to discuss*.* I met Bekenstein maybe four or five more times on visits to Israel, most recently a year and a half ago, when we shared walks to and from the hotel at a firewall workshop at the Weizmann Institute. He was unfailingly warm, modest, and generous—totally devoid of the egotism that I’ve *heard* can occasionally afflict people of his stature. Now, much like with the qubits hitting the event horizon, the information that comprised Jacob Bekenstein might seem to be gone, but it remains woven into the cosmos.

The Planck collaboration is releasing new publications based on their full dataset, including CMB temperature and large-scale polarization data. The updated values of the crucial cosmological parameters were already made public in December last year; however, one important new element is the combination of these results with the joint Planck/Bicep constraints on the CMB B-mode polarization. The consequences for models of inflation are summarized in this plot:

It shows the constraints on the spectral index ns and the tensor-to-scalar ratio r of the CMB fluctuations, compared to predictions of various single-field models of inflation. The limits on ns changed slightly compared to the previous release, but the more important progress is along the y-axis. After including the joint Planck/Bicep analysis (referred to in the plot as BKP), the combined limit on the tensor-to-scalar ratio becomes r < 0.08. Equally important, the new limit is much more robust; for example, allowing for a scale dependence of the spectral index relaxes the bound only slightly, to r < 0.10.

The new results have a large impact on certain classes of models. The model with the quadratic inflaton potential, arguably the simplest model of inflation, is now strongly disfavored. Natural inflation, where the inflaton is a pseudo-Goldstone boson with a cosine potential, is in trouble. More generally, the data now favor a concave shape of the inflaton potential during the observable period of inflation; that is to say, it looks more like a hilltop than a half-pipe. A strong player emerging from this competition is R^2 inflation which, ironically, is the first model of inflation ever written down. That model is equivalent to an exponential shape of the inflaton potential, V = c[1 - exp(-a φ/M_Pl)]^2, with a = sqrt(2/3) in the exponent. A wider range of the exponent a can also fit the data, as long as a is not too small. If your favorite theory predicts an exponential potential of this form, it may be a good time to work on it. However, one should not forget that other shapes of the potential are still allowed, for example a similar exponential potential without the square, V ~ 1 - exp(-a φ/M_Pl), a linear potential V ~ φ, or more generally any power-law potential V ~ φ^n with the power n ≲ 1. At this point, the data do not significantly favor one over the other. The next wave of CMB polarization experiments should clarify the picture. In particular, R^2 inflation predicts 0.003 < r < 0.005, which should be testable in the not-so-distant future.
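The quoted range 0.003 < r < 0.005 follows from the standard leading-order slow-roll formulas for R^2 inflation, ns ≈ 1 - 2/N and r ≈ 12/N^2, evaluated over the conventional 50 to 60 e-folds (the e-fold range is the usual assumption, not something fixed by the data). A quick check:

```python
def starobinsky(N):
    """Leading-order slow-roll predictions of R^2 (Starobinsky) inflation
    for N e-folds: n_s ~ 1 - 2/N, r ~ 12/N**2."""
    return 1 - 2 / N, 12 / N**2

for N in (50, 60):
    ns, r = starobinsky(N)
    print(f"N = {N}: ns = {ns:.3f}, r = {r:.4f}")
```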

Planck's inflation paper is here.


Last night, for the first time, the LHC collided particles at the center-of-mass energy of 13 TeV. Routine collisions should follow early in June. The plan is to collect 5-10 inverse femtobarns (fb-1) of data before winter comes, adding to the 25 fb-1 from Run-1. It's high time to dust off your MadGraph and tool up for what may be the most exciting time in particle physics in this century. But when exactly should we start getting excited? When should we start friending LHC experimentalists on facebook? When is the time to look over their shoulders for a glimpse of gluinos popping out of the detectors? One simple way to estimate the answer is to calculate the luminosity at which the number of particles produced at 13 TeV exceeds the number produced during the whole Run-1. This depends on the ratio of the production cross sections at 13 and 8 TeV, which is of course strongly dependent on the particle's mass and production mechanism. Moreover, the LHC discovery potential will also depend on how the background processes change, and on a host of other experimental issues. Nevertheless, let us forget for a moment about the fine print and calculate the *ratio* of 13 and 8 TeV cross sections for a few particles popular among the general public. This will give us a rough estimate of the threshold *luminosity* when things should get interesting.

**Higgs boson:** Ratio≈2.3; Luminosity≈10 fb-1.

Higgs physics will not be terribly exciting this year, with only a modest improvement of the couplings measurements expected.

**tth:** Ratio≈4; Luminosity≈6 fb-1.

Nevertheless, for certain processes involving the Higgs boson the improvement may be a bit faster. In particular, the theoretically very important process of Higgs production in association with top quarks (tth) was on the verge of being detected in Run-1. If we're lucky, this year's data may tip the scale and provide evidence for a non-zero top Yukawa coupling.

**300 GeV Higgs partner:** Ratio≈2.7; Luminosity≈9 fb-1.

Not much hope for new scalars in the Higgs family this year.

**800 GeV stops:** Ratio≈10; Luminosity≈2 fb-1.

800 GeV is close to the current lower limit on the mass of a scalar top partner decaying to a top quark and a massless neutralino. In this case, one should remember that backgrounds also increase at 13 TeV, so the progress will be a bit slower than what the above number suggests. Nevertheless, this year we will certainly explore new parameter space and make the naturalness problem even more severe. Similar conclusions hold for a fermionic top partner.

**3 TeV Z' boson:** Ratio≈18; Luminosity≈1.2 fb-1.

Getting interesting! Limits on Z' bosons decaying to leptons will be improved very soon; moreover, in this case background is not an issue.

**1.4 TeV gluino:** Ratio≈30; Luminosity≈0.7 fb-1.

If all goes well, better limits on gluinos can be delivered by the end of the summer!
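The threshold estimates above follow from a one-line calculation: the 13 TeV luminosity at which the event count matches Run-1 is just the Run-1 luminosity divided by the cross-section ratio. A sketch, assuming equal signal efficiencies at the two energies (the same rough approximation made in the text); the outputs roughly reproduce the quoted numbers:

```python
L8 = 25.0  # Run-1 integrated luminosity, fb^-1

def threshold_luminosity(ratio):
    """13 TeV luminosity at which N_13 = N_8, given ratio = sigma13/sigma8."""
    return L8 / ratio

for name, ratio in [("Higgs", 2.3), ("tth", 4), ("300 GeV Higgs partner", 2.7),
                    ("800 GeV stops", 10), ("3 TeV Z'", 18),
                    ("1.4 TeV gluino", 30)]:
    print(f"{name}: {threshold_luminosity(ratio):.1f} fb^-1")
```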

In summary, the progress will be very fast for new heavy particles. In particular, for gluon-initiated production of TeV-scale particles already the first inverse femtobarn may bring us into a new territory. For lighter particles the progress will be slower, especially when backgrounds are difficult. On the other hand, precision physics, such as the Higgs couplings measurements, is unlikely to be in the spotlight this year.

This weekend's plot shows the region in the stop mass and mixing space of the MSSM that reproduces the measured Higgs boson mass of 125 GeV:

Unlike in the Standard Model, in the minimal supersymmetric extension of the Standard Model (MSSM) the Higgs boson mass is not a free parameter; it can be calculated given all masses and couplings of the supersymmetric particles. At the lowest order, it is equal to the Z boson mass of 91 GeV (for large enough tanβ). To reconcile the predicted and the observed Higgs mass, one needs to invoke large loop corrections due to supersymmetry breaking. These are dominated by the contribution of the top quark and its 2 scalar partners (stops), which couple most strongly of all particles to the Higgs. As can be seen in the plot above, the stop mass preferred by the Higgs mass measurement is around 10 TeV. With a little bit of conspiracy, if the mixing between the two stops is just right, this can be lowered to about 2 TeV. In any case, this means that, as long as the MSSM is the correct theory, there is little chance to discover the stops at the LHC.
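The lowest-order statement can be made concrete: the tree-level MSSM prediction is m_h = m_Z |cos 2β|, which approaches the Z mass for large tanβ. A quick sketch of this standard textbook formula (nothing to do with the SUSYHD code itself):

```python
import math

mZ = 91.19  # Z boson mass, GeV

def higgs_mass_tree(tan_beta):
    """Tree-level MSSM Higgs mass: m_h = mZ * |cos(2*beta)|."""
    beta = math.atan(tan_beta)
    return mZ * abs(math.cos(2 * beta))

# The bound saturates for large tan(beta) and vanishes at tan(beta) = 1:
print(f"{higgs_mass_tree(50):.1f} GeV")
print(f"{higgs_mass_tree(1):.1f} GeV")
```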

This conclusion may be surprising because previous calculations were painting a more optimistic picture. The results above are derived with the new SUSYHD code, which utilizes effective field theory techniques to compute the Higgs mass in the presence of heavy supersymmetric particles. Other frequently used codes, such as FeynHiggs or Suspect, obtain a significantly larger Higgs mass for the same supersymmetric spectrum, especially near the maximal mixing point. The difference can be clearly seen in the plot to the right (called the boobs plot by some experts). Although there is a debate about the size of the error as estimated by SUSYHD, other effective theory calculations report the same central values.


Here is a late weekend plot with new limits on the dark photon parameter space:

The dark photon is a hypothetical massive spin-1 boson mixing with the ordinary photon. The minimal model is fully characterized by just 2 parameters: the mass mA' and the mixing angle ε. This scenario is probed by several different experiments using completely different techniques. It is interesting to observe how quickly the experimental constraints have been improving in recent years. The latest update appeared a month ago thanks to the NA48 collaboration. NA48/2 was an experiment at CERN a decade ago devoted to studying CP violation in kaons. Kaons can decay to neutral pions, and the latter can be recycled into a nice probe of dark photons. Most often, π0 decays to two photons. If the dark photon is lighter than 135 MeV, one of the photons can mix into an on-shell dark photon, which in turn can decay into an electron and a positron. Therefore, NA48 analyzed the π0 → γ e+ e- decays in their dataset. Such pion decays occur also in the Standard Model, with an off-shell photon instead of a dark photon in the intermediate state. However, the presence of the dark photon would produce a peak in the invariant mass spectrum of the e+ e- pair on top of the smooth Standard Model background. Failure to see a significant peak allows one to set limits on the dark photon parameter space; see the dripping-blood region in the plot.

So, another cute experiment bites into the dark photon parameter space. After this update, one can robustly conclude that the mixing angle in the minimal model has to be less than 0.001 as long as the dark photon is lighter than 10 GeV. This is by itself not very revealing, because there is no theoretically preferred value of ε or mA'. However, one interesting consequence of the NA48 result is that it closes the window where the minimal model could explain the 3σ excess in the muon anomalous magnetic moment.


Twenty years have passed since the first observation of the top quark, the last of the six quarks that constitute the matter of which atomic nuclei are made. And in these twenty years particle physics has made some quite serious leaps forward: the discovery that neutrinos oscillate and have mass (albeit a tiny one), and the discovery of the Higgs boson, are the two most important ones to cite. Yet the top quark remains a very interesting object to study at particle colliders.

Of all the permutation groups, only $S_6$ has an outer automorphism. This puts a kind of ‘wrinkle’ in the fabric of mathematics, which would be nice to explore using category theory.

For starters, let $Bij_n$ be the groupoid of $n$-element sets and bijections between these. Only for $n = 6$ is there an equivalence from this groupoid to itself that isn’t naturally isomorphic to the identity!

This is just another way to say that only $S_6$ has an outer automorphism.

And here’s another way to play with this idea:

Given any category $X$, let $Aut(X)$ be the category where objects are equivalences $f : X \to X$ and morphisms are natural isomorphisms between these. This is like a group, since composition gives a functor

$\circ : Aut(X) \times Aut(X) \to Aut(X)$

which acts like the multiplication in a group. It’s like the symmetry group of $X$. But it’s not a group: it’s a ‘2-group’, or categorical group. It’s called the automorphism 2-group of $X$.

By calling it a 2-group, I mean that $Aut(X)$ is a monoidal category where all objects have weak inverses with respect to the tensor product, and all morphisms are invertible. Any pointed space has a fundamental 2-group, and this sets up a correspondence between 2-groups and connected pointed homotopy 2-types. So, topologists can have some fun with 2-groups!

Now consider $Bij_n$, the groupoid of $n$-element sets and bijections between them. Up to equivalence, we can describe $Aut(Bij_n)$ as follows. The objects are just automorphisms of $S_n$, while a morphism from an automorphism $f: S_n \to S_n$ to an automorphism $f' : S_n \to S_n$ is an element $g \in S_n$ that conjugates one automorphism to give the other:

$f'(h) = g f(h) g^{-1} \qquad \forall h \in S_n$

So, if all automorphisms of $S_n$ are inner, all objects of $Aut(Bij_n)$ are isomorphic to the unit object, and thus to each other.

**Puzzle 1.** For $n \ne 6$, all automorphisms of $S_n$ are inner. What are the connected pointed homotopy 2-types corresponding to $Aut(Bij_n)$ in these cases?

**Puzzle 2.** The permutation group $S_6$ has an outer automorphism of order 2, and indeed $Out(S_6) = \mathbb{Z}_2.$ What is the connected pointed homotopy 2-type corresponding to $Aut(Bij_6)$?

**Puzzle 3.** Let $Bij$ be the groupoid where objects are finite sets and morphisms are bijections. $Bij$ is the coproduct of all the groupoids $Bij_n$ where $n \ge 0$:

$Bij = \sum_{n = 0}^\infty Bij_n$

Give a concrete description of the 2-group $Aut(Bij)$, up to equivalence. What is the corresponding pointed connected homotopy 2-type?

You can get a bit of intuition for the outer automorphism of $S_6$ using something called the Tutte–Coxeter graph.

Let $S = \{1,2,3,4,5,6\}$. Of course the symmetric group $S_6$ acts on $S$, but James Sylvester found a different action of $S_6$ on a 6-element set, which in turn gives an outer automorphism of $S_6$.

To do this, he made the following definitions:

• A **duad** is a 2-element subset of $S$. Note that there are ${6 \choose 2} = 15$ duads.

• A **syntheme** is a set of 3 duads forming a partition of $S$. There are also 15 synthemes.

• A **synthematic total** is a set of 5 synthemes partitioning the set of 15 duads. There are 6 synthematic totals.
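These counts are small enough to verify by brute force. A sketch in Python (my own check, not from the original post): enumerate the 2-element subsets, the partitions of $S$ into 3 duads, and the sets of 5 pairwise-disjoint synthemes covering all 15 duads.

```python
from itertools import combinations

S = range(1, 7)

# Duads: 2-element subsets of S.
duads = [frozenset(d) for d in combinations(S, 2)]

# Synthemes: sets of 3 duads partitioning S (perfect matchings on 6 points).
synthemes = [frozenset(m) for m in combinations(duads, 3)
             if len(frozenset().union(*m)) == 6]

# Synthematic totals: sets of 5 synthemes partitioning the 15 duads
# (5 x 3 = 15, so covering all duads forces disjointness).
totals = [t for t in combinations(synthemes, 5)
          if len(frozenset().union(*t)) == 15]

print(len(duads), len(synthemes), len(totals))  # 15 15 6
```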

Any permutation of $S$ gives a permutation of the set $T$ of synthematic totals, so we obtain an action of $S_6$ on $T$. Choosing any bijection between $S$ and $T$, this in turn gives an action of $S_6$ on $S$, and thus a homomorphism from $S_6$ to itself. Sylvester showed that this is an outer automorphism!

There’s a way to draw this situation. It’s a bit tricky, but Greg Egan has kindly done it:

Here we see 15 small red blobs: these are the duads. We also see 15 larger blue blobs: these are the synthemes. We draw an edge from a duad to a syntheme whenever that duad lies in that syntheme. The result is a graph called the **Tutte–Coxeter graph**, with 30 vertices and 45 edges.

The 6 concentric rings around the picture are the 6 synthematic totals. A band of color appears in one of these rings near some syntheme if that syntheme is part of that synthematic total.

If we draw the Tutte–Coxeter graph without all the decorations, it looks like this:

The red vertices come from duads, the blue ones from synthemes. The outer automorphism of $S_6$ gives a symmetry of the Tutte–Coxeter graph that switches the red and blue vertices!

The inner automorphisms, which correspond to elements of $S_6$, also give symmetries: for each element of $S_6$, the Tutte–Coxeter graph has a symmetry that permutes the numbers in the picture. These symmetries map red vertices to red ones and blue vertices to blue ones.

The group $\mathrm{Aut}(S_6)$ has

$2 \times 6! = 1440$

elements, coming from the $6!$ inner automorphisms of $S_6$ and the outer automorphism of order 2. In fact, $\mathrm{Aut}(S_6)$ is the whole symmetry group of the Tutte–Coxeter graph.

For more on the Tutte–Coxeter graph, see my post on the AMS-hosted blog *Visual Insight*:

There has been a lot of interest online recently about the "drought balls" that the state of California is using to limit unwanted photochemistry and evaporation in its reservoirs. These are hollow balls, each about 10 cm in diameter, made from a polymer mixed with carbon black. When dumped by the zillions into reservoirs, they don't just help conserve water: they spontaneously become a teaching tool about condensed matter physics.

As you can see from the figure, the balls spontaneously assemble into "crystalline" domains. The balls are spherically symmetric, and they experience a few interactions: They are buoyant, so they float on the water surface; they are rigid objects, so they have what a physicist would call "hard-core, short-ranged repulsive interactions" and what a chemist would call "steric hindrance"; a regular person would say that you can't make two balls occupy the same place. Because they float and distort the water surface, they also experience some amount of an effective attractive interaction. They get agitated by the rippling of the water, but not too much.

Throw all those ingredients together, and amazing things happen: The balls pack together in a very tight spatial arrangement. The balls are spherically symmetric, and there's nothing about the surface of the water that picks out a particular direction. Nonetheless, the balls "spontaneously break rotational symmetry in the plane" and pick out a directionality to their arrangement. There's nothing about the surface of the water that picks out a particular spatial scale or "origin", but the balls "spontaneously break continuous translational symmetry", picking out special evenly-spaced lattice sites. Physicists would say they preserve discrete rotational and translational symmetries.

The balls in different regions of the surface were basically isolated to begin with, so they broke those symmetries differently, leading to a "polycrystalline" arrangement, with "grain boundaries". As the water jostles the system, there is a competition between the tendency to order and the ability to rearrange, and the grains rearrange over time. This arrangement of balls has rigidity and supports collective motions (basically the analog of sound) within the layer that are meaningless when talking about the individual balls. We can even spot some density of "point defects", where a ball is missing, or an "extra" ball is sitting on top.

What this tells us is that there are certain universal, emergent properties of what we think of as solids that really do not depend on the underlying microscopic details. This is a pretty deep idea - that there are collective organizing principles that give emergent universal behaviors, even from very simple and generic microscopic rules. Knowing that the balls are made deep down from quarks and leptons does not tell you anything about these properties.

Some butterflies have shiny, vividly colored wings. From different angles you see different colors. This effect is called iridescence. How does it work?

It turns out these butterfly wings are made of very fancy materials! Light bounces around inside these materials in a tricky way. Sunlight of different colors winds up reflecting off these materials in different directions.

We’re starting to understand the materials and make similar substances in the lab. They’re called photonic crystals. They have amazing properties.

Here at the Centre for Quantum Technologies we have people studying exotic materials of many kinds. Next door, there’s a lab completely devoted to studying graphene: crystal sheets of carbon in which electrons can move as if they were massless particles! Graphene has a lot of potential for building new technologies—that’s why Singapore is pumping money into researching it.

Some physicists at MIT just showed that one of the materials in butterfly wings might act like a 3d form of graphene. In graphene, electrons can only move easily in 2 directions. In this new material, electrons could move in all 3 directions, acting as if they had no mass.

The pictures here show the microscopic structure of two materials found in butterfly wings:

The picture at left actually shows a sculpture made by the mathematical artist Bathsheba Grossman. But it’s a piece of a gyroid: a surface with a very complicated shape, which repeats forever in 3 directions. It’s called a minimal surface because you can’t shrink its area by tweaking it just a little. It divides space into two regions.

The gyroid was discovered in 1970 by a mathematician, Alan Schoen. It’s a triply periodic minimal surface: one that repeats itself in 3 different directions in space, like a crystal.

Schoen was working for NASA, and his idea was to use the gyroid for building ultra-light, super-strong structures. But that didn’t happen. Research doesn’t move in predictable directions.

In 1983, people discovered that in some mixtures of oil and water, the oil naturally forms a gyroid. The sheets of oil try to minimize their area, so it’s not surprising that they form a minimal surface. Something else makes this surface be a gyroid—I’m not sure what.

Butterfly wings are made of a hard material called chitin. Around 2008, people discovered that the chitin in some iridescent butterfly wings is made in a gyroid pattern! The spacing in this pattern is very small, about one wavelength of visible light. This makes light move through this material in a complicated way, which depends on the light’s color and the direction it’s moving.
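
The “spacing comparable to a wavelength” point can be made roughly quantitative with the one-dimensional Bragg condition, λ = 2d sin θ. This is only a crude stand-in for the full 3d photonic band structure of a gyroid, and the 300 nm spacing below is just an illustrative number, but it shows why the reflected color depends on viewing angle:

```python
import math

def bragg_wavelength(d_nm, theta_deg, order=1):
    """First-order Bragg condition: lambda = 2 d sin(theta) / n,
    with theta measured from the reflecting planes. A crude 1d
    stand-in for the real 3d photonic-crystal band structure."""
    return 2 * d_nm * math.sin(math.radians(theta_deg)) / order

# A spacing of ~300 nm (an illustrative figure, roughly the scale in
# these wings) reflects different visible colors at different angles:
assert abs(bragg_wavelength(300, 90) - 600.0) < 1e-9  # normal incidence
assert bragg_wavelength(300, 45) < bragg_wavelength(300, 90)
```

So tilting the wing shortens the reflected wavelength: the essence of iridescence.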

So: *butterflies have naturally evolved a photonic crystal based on a gyroid!*

The universe is awesome, but it’s not magic. A mathematical pattern is beautiful if it’s a simple solution to at least one simple problem. This is why beautiful patterns naturally bring themselves into existence: they’re the simplest ways for certain things to happen. Darwinian evolution helps out: it scans through trillions of possibilities and finds solutions to problems. So, we should *expect* life to be packed with mathematically beautiful patterns… and it is.

The picture at right above shows a ‘double gyroid’. Here it is again:

This is actually two interlocking surfaces, shown in red and blue. You can get them by writing the gyroid as a level surface (in the standard trigonometric approximation):

$$\sin x \cos y + \sin y \cos z + \sin z \cos x = 0$$

and taking the two nearby surfaces

$$\sin x \cos y + \sin y \cos z + \sin z \cos x = \pm t$$

for some small value of $t$.
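
This level-surface description is easy to play with numerically. The sketch below (function and variable names are mine) uses the common trigonometric approximation to the gyroid and checks that it really does repeat in 3 directions:

```python
import math

def g(x, y, z):
    """Common trigonometric approximation to the gyroid level function.
    The gyroid is the level set g = 0; the double gyroid is the pair of
    nearby level sets g = +t and g = -t for some small t > 0."""
    return (math.sin(x) * math.cos(y)
            + math.sin(y) * math.cos(z)
            + math.sin(z) * math.cos(x))

t = 0.2
assert abs(g(0.0, 0.0, 0.0)) < t  # the origin lies between the two sheets

# Triple periodicity: g repeats with period 2*pi in each direction.
p = 2 * math.pi
x, y, z = 0.3, 1.1, 2.5
assert abs(g(x + p, y, z) - g(x, y, z)) < 1e-12
assert abs(g(x, y + p, z) - g(x, y, z)) < 1e-12
assert abs(g(x, y, z + p) - g(x, y, z)) < 1e-12
```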

It turns out that while they’re still growing, some butterflies have a double gyroid pattern in their wings. This turns into a single gyroid when they grow up!

The new research at MIT studied how an electron would move through a double gyroid pattern. They calculated its dispersion relation: how the speed of the electron would depend on its energy and the direction it’s moving.

An ordinary particle moves faster if it has more energy. But a massless particle, like a photon, moves at the same speed no matter what energy it has. The MIT team showed that an electron in a double gyroid pattern moves at a speed that doesn’t depend much on its energy. So, in some ways this electron acts like a massless particle.
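
The massive/massless contrast is easy to see from the standard relativistic dispersion relation E = √(p²c² + m²c⁴), whose group velocity is v = dE/dp = pc²/E. A quick sketch (names are mine; this is the free-particle analogue, not the MIT band-structure calculation):

```python
import math

C = 1.0  # work in units where the speed of light c = 1

def group_velocity(p, m):
    """Group velocity v = dE/dp for the relativistic dispersion
    E = sqrt(p^2 c^2 + m^2 c^4), i.e. v = p c^2 / E."""
    energy = math.sqrt((p * C) ** 2 + (m * C ** 2) ** 2)
    return p * C ** 2 / energy

# A massive particle speeds up as its momentum (hence energy) grows,
# approaching but never reaching c...
assert group_velocity(1.0, m=1.0) < group_velocity(10.0, m=1.0) < C

# ...but a massless particle moves at c no matter what its energy is.
for p in (0.5, 1.0, 10.0):
    assert abs(group_velocity(p, m=0.0) - C) < 1e-12
```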

But it’s quite different than a photon. It’s actually more like a neutrino! You see, unlike photons, electrons and neutrinos are spin-1/2 particles. Neutrinos are almost massless. A massless spin-1/2 particle can have a built-in handedness, spinning in only one direction around its axis of motion. Such a particle is called a Weyl spinor. The MIT team showed that an electron moving through a double gyroid acts approximately like a Weyl spinor!

How does this work? Well, the key fact is that the double gyroid has a built-in handedness, or chirality. It comes in a left-handed and right-handed form. You can see the handedness quite clearly in Grossman’s sculpture of the ordinary gyroid:

Beware: nobody has actually made electrons act like Weyl spinors in the lab yet. The MIT team just found a way that should work. Someday someone will actually make it happen, probably in less than a decade. And later, someone will do amazing things with this ability. I don’t know what. Maybe the butterflies know!

For a good introduction to the physics of gyroids, see:

• James A. Dolan, Bodo D. Wilts, Silvia Vignolini, Jeremy J. Baumberg, Ullrich Steiner and Timothy D. Wilkinson, Optical properties of gyroid structured materials: from photonic crystals to metamaterials, *Advanced Optical Materials* **3** (2015), 12–32.

For some of the history and math of gyroids, see Alan Schoen’s webpage:

• Alan Schoen, Triply-periodic minimal surfaces.

For more on gyroids in butterfly wings, see:

• K. Michielsen and D. G. Stavenga, Gyroid cuticular structures in butterfly wing scales: biological photonic crystals.

• Vinodkumar Saranathan *et al*, Structure, function, and self-assembly of single network gyroid (I4_{1}32) photonic crystals in butterfly wing scales, *PNAS* **107** (2010), 11676–11681.

The paper by Michielsen and Stavenga is free online! They say the famous ‘blue Morpho’ butterfly shown in the picture at the top of this article does *not* use a gyroid; it uses a “two-dimensional photonic crystal slab consisting of arrays of rectangles formed by lamellae and microribs.” But they find gyroids in four other species: *Callophrys rubi*, *Cyanophrys remus*, *Parides sesostris* and *Teinopalpus imperialis*. They compare transmission electron microscope pictures of slices of their iridescent patches with computer-generated slices of gyroids. The comparison looks pretty good to me:

For the evolution of iridescence, see:

• Melissa G. Meadows *et al*, Iridescence: views from many angles, *J. Roy. Soc. Interface* **6** (2009).

For the new research at MIT, see:

• Ling Lu, Liang Fu, John D. Joannopoulos and Marin Soljačić, Weyl points and line nodes in gapless gyroid photonic crystals.

• Ling Lu, Zhiyu Wang, Dexin Ye, Lixin Ran, Liang Fu, John D. Joannopoulos and Marin Soljačić, Experimental observation of Weyl points, *Science* **349** (2015), 622–624.

Again, the first is free online. There’s a lot of great math lurking inside, most of which is too mind-blowing to explain quickly. Let me just paraphrase the start of the paper, so at least experts can get the idea:

Two-dimensional (2d) electrons and photons at the energies and frequencies of Dirac points exhibit extraordinary features. As the best example, almost all the remarkable properties of graphene are tied to the massless Dirac fermions at its Fermi level. Topologically, Dirac cones are not only the critical points for 2d phase transitions but also the unique surface manifestation of a topologically gapped 3d bulk. In a similar way, it is expected that if a material could be found that exhibits a 3d linear dispersion relation, it would also display a wide range of interesting physics phenomena. The associated 3D linear point degeneracies are called “Weyl points”. In the past year, there have been a few studies of Weyl fermions in electronics. The associated Fermi-arc surface states, quantum Hall effect, novel transport properties and a realization of the Adler–Bell–Jackiw anomaly are also expected. However, no observation of Weyl points has been reported. Here, we present a theoretical discovery and detailed numerical investigation of frequency-isolated Weyl points in perturbed double-gyroid photonic crystals along with their complete phase diagrams and their topologically protected surface states.

Also a bit for the mathematicians:

Weyl points are topologically stable objects in the 3d Brillouin zone: they act as monopoles of Berry flux in momentum space, and hence are intimately related to the topological invariant known as the Chern number. The Chern number can be defined for a single bulk band or a set of bands, where the Chern numbers of the individual bands are summed, on any closed 2d surface in the 3d Brillouin zone. The difference of the Chern numbers defined on two surfaces, of all bands below the Weyl point frequencies, equals the sum of the chiralities of the Weyl points enclosed in between the two surfaces.

This is a mix of topology and physics jargon that may be hard for pure mathematicians to understand, but I’ll be glad to translate if there’s interest.

For starters, a ‘monopole of Berry flux in momentum space’ is a poetic way of talking about a twisted complex line bundle over the space of allowed energy-momenta of the electron in the double gyroid. We get a twist at every ‘Weyl point’, meaning a point where the dispersion relations look locally like those of a Weyl spinor when its energy-momentum is near zero. Near such a point, the dispersion relations are a Fourier-transformed version of the Weyl equation.
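
To put a few symbols to this, here is the standard textbook sketch (my notation, not the paper’s): near a Weyl point of chirality $\chi = \pm 1$, the effective two-band Hamiltonian and its dispersion are

```latex
% Effective Hamiltonian near a Weyl point of chirality \chi = \pm 1,
% with v an effective velocity and \boldsymbol{\sigma} the Pauli matrices:
H(\mathbf{k}) = \chi\, \hbar v\, \boldsymbol{\sigma} \cdot \mathbf{k},
\qquad
E_{\pm}(\mathbf{k}) = \pm\, \hbar v\, |\mathbf{k}| .
% This is the (Fourier-transformed) Weyl equation: the dispersion is
% linear in all 3 directions. The Berry curvature of the lower band is
% that of a monopole of charge \chi sitting at \mathbf{k} = 0, which is
% why Weyl points act as sources and sinks of Berry flux.
```

The chirality $\chi$ is exactly the handedness mentioned above, and summing it over the Weyl points between two surfaces in the Brillouin zone gives the Chern-number difference quoted from the paper.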

When I was a first-year grad student, I started working in my adviser's lab, learning how to do experiments at extremely low temperatures. This involved working quite a bit with liquid helium, which boils at atmospheric pressure at only 4.2 degrees above absolute zero, and is stored in big, vacuum-jacketed thermos bottles called dewars (named after James Dewar). We had to transfer liquid helium from storage dewars into our experimental systems, and very often we were interested in knowing how much helium was left in the bottom of a storage dewar.

The easiest way to do this was to use a "thumper" - a skinny (maybe 1/8" diameter) thin-walled stainless steel tube, a few feet long, open at the bottom, and silver-soldered to a larger (say 1" diameter) brass cylinder at the top, with the cylinder closed off by a stretched piece of latex glove. When the bottom of the tube was inserted into the dewar (like a dipstick) and lowered into the cold gas, the rubber membrane at the top of the thumper would spontaneously start to pulse (hence the name). The frequency of the thumping would go from a couple of beats per second when the bottom was immersed in liquid helium to more of a buzz when the bottom was raised into vapor. You can measure the depth of the liquid left in the dewar this way, and look up the relevant volume of liquid on a sticker chart on the side of the dewar.

The "thumping" pulses are called Taconis oscillations. They are an example of "thermoacoustic" oscillations. The physics involved is actually pretty neat, and I'll explain it at the end of this post, but that's not really the point of this story. I found this thumping business to be really weird, and I wanted to know how it worked, so I walked across the hall from the lab and knocked on my adviser's door, hoping to ask him for a reference. He was clearly busy (being department chair at the time didn't help), and when I asked him "How do Taconis oscillations happen?" he said, after a brief pause, "Well, they're driven by the temperature difference between the hot and cold ends of the tube, and they're a complicated nonlinear phenomenon," in a tone that I thought was dismissive. Doug O. loves explaining things, so I figured either he was trying to get rid of me, or (much less likely) he didn't really know.

I decided I really wanted to know. I went to the physics library upstairs in Varian Hall and started looking through books and chasing journal articles. Remember, this was back in the wee early dawn of the web, so there was no such thing as google or wikipedia. Anyway, I somehow found this paper and its sequels. In there are a collection of coupled partial differential equations looking at the pressure and density of the fluid, the flow of heat along the tube, the temperature everywhere, etc., and guess what: They are complicated, nonlinear, and have oscillating solutions. Damn. Doug O. wasn't blowing me off - he was completely right (and knew that a more involved explanation would have been a huge mess). I quickly got used to this situation.

**Epilogue**: So, what *is* going on in Taconis oscillations, really? Well, suppose you assume that there is gas rushing into the open end of the tube and moving upward toward the closed end. That gas is getting compressed, so it would tend to get warmer. Moreover, if the temperature gradient along the tube is steep enough, the upper walls of the tube can be warmer than the incoming gas, which then warms further by taking heat from the tube walls. Now that the pressure of the gas has built up near the closed end, there is a pressure gradient that pushes the gas back down the tube. The now warmed gas cools as it expands, but again if the tube walls have a steep temperature gradient, the gas can dump heat into the tube walls nearer the bottom. This is discussed in more detail here. Turns out that you have basically an engine, driven by the flow of heat from the top to the bottom, that cyclically drives gas pulses. The pulse amplitude ratchets up until the dissipation in the whole system equals the work done per cycle on the gas. More interesting than that: Like some engines, you can run this one *backwards*. If you drive pressure pulses properly, you can use the gas to pump heat from the cold side to the hot side - this is the basis for the thermoacoustic refrigerator.
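
The "basically an engine" point can be caricatured with an ideal-gas Otto-style cycle: adiabatic compression, heat pickup at the warm end, adiabatic expansion, heat rejection at the cold end. This is my own deliberately crude toy model, not the real coupled-PDE thermoacoustics, and the numbers are purely illustrative:

```python
# Toy ideal-gas cycle mimicking one Taconis-style gas pulse:
# adiabatic compression -> absorb heat from the warm tube wall ->
# adiabatic expansion -> reject heat at the cold end.
GAMMA = 5.0 / 3.0   # monatomic gas (helium)
CV = 1.5            # molar heat capacity at constant volume, in units of R

def net_work(t_cold, t_hot, compression_ratio):
    """Net work per cycle (per mole, in units of R*kelvin)."""
    r = compression_ratio
    t1 = t_cold                      # start cold and uncompressed
    t2 = t1 * r ** (GAMMA - 1.0)     # adiabatic compression warms the gas
    t3 = t_hot                       # constant-volume heating by the warm wall
    t4 = t3 / r ** (GAMMA - 1.0)     # adiabatic expansion cools it again
    q_in = CV * (t3 - t2)            # heat absorbed at the warm end
    q_out = CV * (t4 - t1)           # heat rejected at the cold end
    return q_in - q_out

# With a steep temperature gradient the cycle does positive net work,
# which is what ratchets the pulse amplitude up...
assert net_work(t_cold=4.2, t_hot=300.0, compression_ratio=1.5) > 0
# ...and with no effective gradient there is no engine.
assert net_work(t_cold=4.2, t_hot=4.2 * 1.5 ** (GAMMA - 1.0),
                compression_ratio=1.5) <= 1e-12
```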

Entropy increases. Closed systems become increasingly disordered over time. So says the Second Law of Thermodynamics, one of my favorite notions in all of physics.

At least, entropy *usually* increases. If we define entropy by first defining “macrostates” — collections of individual states of the system that are macroscopically indistinguishable from each other — and then taking the logarithm of the number of microstates per macrostate, as portrayed in this blog’s header image, then we don’t expect entropy to *always* increase. According to Boltzmann, the increase of entropy is just really, really probable, since higher-entropy macrostates are much, much bigger than lower-entropy ones. But if we wait long enough — really long, much longer than the age of the universe — a macroscopic system will spontaneously fluctuate into a lower-entropy state. Cream and coffee will unmix, eggs will unbreak, maybe whole universes will come into being. But because the timescales are so long, this is just a matter of intellectual curiosity, not experimental science.

That’s what I was taught, anyway. But since I left grad school, physicists (and chemists, and biologists) have become increasingly interested in ultra-tiny systems, with only a few moving parts. Nanomachines, or the molecular components inside living cells. In systems like that, the occasional downward fluctuation in entropy is not only possible, it’s going to happen relatively frequently — with crucial consequences for how the real world works.
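
You can watch this happen in a toy simulation (my own illustration, not from the post): let N particles hop at random between the two halves of a box, and track the Boltzmann entropy S = log(number of microstates) of the macrostate "n particles on the left." For small N, downward fluctuations are routine:

```python
import math
import random

def boltzmann_entropy(n, n_left):
    """S = log(number of microstates) for the macrostate
    'n_left of the n particles are in the left half of the box'."""
    return math.log(math.comb(n, n_left))

def count_downward_steps(n, steps, seed=0):
    """Each step, one randomly chosen particle switches sides
    (the Ehrenfest urn model); count how often entropy decreases."""
    rng = random.Random(seed)
    n_left = n // 2
    down = 0
    s = boltzmann_entropy(n, n_left)
    for _ in range(steps):
        if rng.random() < n_left / n:   # picked a left-side particle
            n_left -= 1
        else:                           # picked a right-side particle
            n_left += 1
        s_new = boltzmann_entropy(n, n_left)
        if s_new < s:
            down += 1
        s = s_new
    return down

# For a tiny system, downward fluctuations are frequent, not cosmically rare.
assert count_downward_steps(n=10, steps=1000) > 0
```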

Accordingly, the last fifteen years or so has seen something of a revolution in non-equilibrium statistical mechanics — the study of statistical systems far from their happy resting states. Two of the most important results are the Crooks Fluctuation Theorem (by Gavin Crooks), which relates the probability of a process forward in time to the probability of its time-reverse, and the Jarzynski Equality (by Christopher Jarzynski), which relates the change in free energy between two states to the average amount of work done on a journey between them. (Professional statistical mechanics are so used to dealing with **in**equalities that when they finally do have an honest equation, they call it an “equality.”) There is a sense in which these relations underlie the good old Second Law; the Jarzynski equality can be derived from the Crooks Fluctuation Theorem, and the Second Law can be derived from the Jarzynski Equality. (Though the three relations were discovered in reverse chronological order from how they are used to derive each other.)

Still, there is a mystery lurking in how we think about entropy and the Second Law — a puzzle that, like many such puzzles, I never really thought about until we came up with a solution. Boltzmann’s definition of entropy (logarithm of number of microstates in a macrostate) is very conceptually clear, and good enough to be engraved on his tombstone. But it’s not the only definition of entropy, and it’s not even the one that people use most often.

Rather than referring to macrostates, we can think of entropy as characterizing something more subjective: our knowledge of the state of the system. That is, we might not know the exact position **x** and momentum **p** of every particle, only a probability distribution ρ(**x**, **p**) over the possible states. The entropy of that distribution is

$$S = -\int \rho \log \rho \, d\mathbf{x}\, d\mathbf{p}.$$

That is, we take the probability distribution ρ, multiply it by its own logarithm, and integrate the result over all the possible states of the system, to get (minus) the entropy. A formula like this was introduced by Boltzmann himself, but these days is often associated with Josiah Willard Gibbs, unless you are into information theory, where it’s credited to Claude Shannon. Don’t worry if the symbols are totally opaque; the point is that low entropy means we know a lot about the specific state a system is in, and high entropy means we don’t know much at all.
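
In a discrete toy version (the integral becomes a sum over states), this is a one-liner to compute. The sketch is mine, just to make the "peaked is low entropy, spread out is high entropy" point concrete:

```python
import math

def shannon_entropy(dist):
    """Discrete Gibbs/Shannon entropy: S = -sum_i p_i log(p_i).
    Terms with p_i = 0 contribute nothing (the p log p limit is 0)."""
    return -sum(p * math.log(p) for p in dist if p > 0)

# Sharply peaked: we already know the state, so the entropy is zero.
peaked = [1.0, 0.0, 0.0, 0.0]
# Spread out: we know nothing, so the entropy is maximal (log of the
# number of states).
uniform = [0.25, 0.25, 0.25, 0.25]

assert shannon_entropy(peaked) == 0.0
assert abs(shannon_entropy(uniform) - math.log(4)) < 1e-12
```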

In appropriate circumstances, the Boltzmann and Gibbs formulations of entropy and the Second Law are closely related to each other. But there’s a crucial difference: in a perfectly isolated system, the Boltzmann entropy tends to increase, but the Gibbs entropy stays exactly constant. In an open system — allowed to interact with the environment — the Gibbs entropy will go up, but it will *only* go up. It will never fluctuate down. (Entropy can decrease through heat loss, if you put your system in a refrigerator or something, but you know what I mean.) The Gibbs entropy is about our knowledge of the system, and as the system is randomly buffeted by its environment we know less and less about its specific state. So what, from the Gibbs point of view, can we possibly mean by “entropy rarely, but occasionally, will fluctuate downward”?
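
The "it will *only* go up" claim can be demonstrated in a toy model (mine, not from the paper): represent the environmental buffeting as a doubly stochastic transition, i.e. probability leaks symmetrically between neighboring states. It's a standard theorem that such maps never decrease the Shannon entropy:

```python
import math

def entropy(dist):
    return -sum(p * math.log(p) for p in dist if p > 0)

def buffet(dist, eps=0.1):
    """One step of random 'buffeting': each state leaks a fraction eps
    of its probability to each neighbor (on a ring). The transition
    matrix is doubly stochastic, so entropy cannot decrease."""
    n = len(dist)
    out = [0.0] * n
    for i, p in enumerate(dist):
        out[i] += (1 - 2 * eps) * p
        out[(i - 1) % n] += eps * p
        out[(i + 1) % n] += eps * p
    return out

rho = [0.7, 0.1, 0.1, 0.1]
for _ in range(20):
    new = buffet(rho)
    assert entropy(new) >= entropy(rho) - 1e-12  # Gibbs entropy only goes up
    rho = new
```

As the buffeting continues, ρ relaxes toward the uniform distribution: we know less and less about which state the system is in.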

I won’t hold you in suspense. Since the Gibbs/Shannon entropy is a feature of our knowledge of the system, the way it can fluctuate downward is for us to *look* at the system and notice that it is in a relatively unlikely state — thereby gaining knowledge.

But this operation of “looking at the system” doesn’t have a ready implementation in how we usually formulate statistical mechanics. Until now! My collaborators Tony Bartolotta, Stefan Leichenauer, Jason Pollack, and I have written a paper formulating statistical mechanics with explicit knowledge updating via measurement outcomes. (Some extra figures, animations, and codes are available at this web page.)

The Bayesian Second Law of Thermodynamics

Anthony Bartolotta, Sean M. Carroll, Stefan Leichenauer, and Jason Pollack

We derive a generalization of the Second Law of Thermodynamics that uses Bayesian updates to explicitly incorporate the effects of a measurement of a system at some point in its evolution. By allowing an experimenter’s knowledge to be updated by the measurement process, this formulation resolves a tension between the fact that the entropy of a statistical system can sometimes fluctuate downward and the information-theoretic idea that knowledge of a stochastically-evolving system degrades over time. The Bayesian Second Law can be written as ΔH(ρ_{m}, ρ) + ⟨Q⟩_{F|m} ≥ 0, where ΔH(ρ_{m}, ρ) is the change in the cross entropy between the original phase-space probability distribution ρ and the measurement-updated distribution ρ_{m}, and ⟨Q⟩_{F|m} is the expectation value of a generalized heat flow out of the system. We also derive refined versions of the Second Law that bound the entropy increase from below by a non-negative number, as well as Bayesian versions of the Jarzynski equality. We demonstrate the formalism using simple analytical and numerical examples.

The crucial word “Bayesian” here refers to Bayes’s Theorem, a central result in probability theory. Bayes’s theorem tells us how to update the probability we assign to any given idea, after we’ve received relevant new information. In the case of statistical mechanics, we start with some probability distribution for the system, then let it evolve (by being influenced by the outside world, or simply by interacting with a heat bath). Then we make some measurement — but a realistic measurement, which tells us something about the system but not everything. So we can use Bayes’s Theorem to update our knowledge and get a new probability distribution.
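
The update step itself is simple enough to sketch. Below, the three "macrostates" and the 80%-reliable detector are made-up numbers of mine, purely for illustration of Bayes's Theorem, p(state | outcome) ∝ p(outcome | state) p(state):

```python
def bayes_update(prior, likelihood):
    """Bayes's Theorem: posterior_i ∝ likelihood_i * prior_i,
    normalized so the posterior sums to 1."""
    unnorm = [l * p for l, p in zip(likelihood, prior)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Three coarse states of the system, initially equally likely.
prior = [1 / 3, 1 / 3, 1 / 3]
# A realistic (imperfect) measurement: the detector reports 'state 0'
# but is only right 80% of the time (illustrative numbers).
likelihood = [0.8, 0.1, 0.1]

posterior = bayes_update(prior, likelihood)
assert abs(posterior[0] - 0.8) < 1e-12   # our knowledge has sharpened
assert abs(sum(posterior) - 1.0) < 1e-12
```

The sharpened posterior is exactly the "looking at the system" step: the measurement can push our distribution toward a lower-entropy one.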

So far, all perfectly standard. We go a bit farther by also updating the *initial* distribution that we started with — our knowledge of the measurement outcome influences what we think we know about the system at the beginning of the experiment. Then we derive the Bayesian Second Law of Thermodynamics, which relates the original (un-updated) distribution at initial and final times to the updated distribution at initial and final times.

That relationship makes use of the cross entropy between two distributions, which you actually don’t see that often in information theory. Think of how much you would expect to learn by being told the specific state of a system, when all you originally knew was some probability distribution. If that distribution were sharply peaked around some value, you don’t expect to learn very much — you basically already know what state the system is in. But if it’s spread out, you expect to learn a bit more. Indeed, we can think of the Gibbs/Shannon entropy *S*(ρ) as “the average amount we expect to learn by being told the exact state of the system, given that it is described by a probability distribution ρ.”

By contrast, the cross-entropy *H*(ρ, ω) is a function of two distributions: the “assumed” distribution ω, and the “true” distribution ρ. Now we’re imagining that there are two sources of uncertainty: one because the actual distribution has a nonzero entropy, and another because we’re not even using the right distribution! The cross entropy between those two distributions is “the average amount we expect to learn by being told the exact state of the system, given that we think it is described by a probability distribution ω but it is actually described by a probability distribution ρ.” And the Bayesian Second Law (BSL) tells us that this lack of knowledge — the amount we would learn on average by being told the exact state of the system, given that we were using the un-updated distribution — is always larger at the end of the experiment than at the beginning (up to corrections because the system may be emitting heat). So the BSL gives us a nice information-theoretic way of incorporating the act of “looking at the system” into the formalism of statistical mechanics.
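
In the discrete toy setting, the cross entropy is H(ρ, ω) = -Σ ρ log ω, and Gibbs’ inequality guarantees it is never smaller than the entropy S(ρ): using the wrong distribution can only increase your expected surprise. A quick sketch (distributions are illustrative):

```python
import math

def entropy(rho):
    return -sum(p * math.log(p) for p in rho if p > 0)

def cross_entropy(rho, omega):
    """H(rho, omega) = -sum_i rho_i log(omega_i): the average surprise
    when we *assume* the distribution omega but the *true* one is rho."""
    return -sum(p * math.log(q) for p, q in zip(rho, omega) if p > 0)

rho = [0.5, 0.25, 0.25]     # the 'true' distribution
omega = [0.25, 0.5, 0.25]   # our mistaken 'assumed' distribution

# Gibbs' inequality: H(rho, omega) >= S(rho), with equality exactly
# when omega = rho. The gap is the relative entropy (KL divergence).
assert cross_entropy(rho, omega) >= entropy(rho)
assert abs(cross_entropy(rho, rho) - entropy(rho)) < 1e-12
```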

I’m very happy with how this paper turned out, and as usual my hard-working collaborators deserve most of the credit. Of course, none of us actually does statistical mechanics for a living — we’re all particle/field theorists who have wandered off the reservation. What inspired our wandering was actually this article by Natalie Wolchover in *Quanta* magazine, about work by Jeremy England at MIT. I had read the *Quanta* article, and Stefan had seen a discussion of it on Reddit, so we got to talking about it at lunch. We thought there was more we could do along these lines, and here we are.

It will be interesting to see what we can do with the BSL, now that we have it. As mentioned, occasional fluctuations downward in entropy happen all the time in small systems, and are especially important in biophysics, perhaps even for the origin of life. While we have phrased the BSL in terms of a measurement carried out by an observer, it’s certainly not necessary to have an actual person there doing the observing. All of our equations hold perfectly well if we simply ask “what happens, given that the system ends up in a certain kind of probability distribution.” That final conditioning might be “a bacteria has replicated,” or “an RNA molecule has assembled itself.” It’s an exciting connection between fundamental principles of physics and the messy reality of our fluctuating world.