Gravitational waves are already the science story of the week, but if the rumours hold up they will be one of the science stories of the century. We'll know soon enough, as there will be a press conference in Washington DC at 10:30am (local time) on Thursday. And this revolution will be broadcast; you can catch a livestream on YouTube.

The rumour doing the rounds is that the LIGO team will announce the detection of gravitational waves emitted during the merger of two black holes. Here's a quick explainer as we head into the (we hope!) big day...

**What Are Gravitational Waves?** Gravitational waves are waves that travel through the fabric of space, just as ripples move across the surface of a pond.

**Waves In Space?** Yep. By detecting gravitational waves we are watching space bend and stretch.

Gravitational fields are encoded in the curvature of space. (Image: CC BY-SA 3.0)

**Really? Waves in Space?** Yep, really. Once upon a time, physicists thought space was rigid and unchanging. However, in 1915 Einstein's General Theory of Relativity told us that gravitational forces are communicated via the curvature of space. This is often described via the "rubber sheet model"; curved space is analogous to a rubber sheet warped by massive objects that sit upon it. But what really matters for gravitational waves is not just that space (or, more properly, spacetime) can curve, but that its curvature can change. As stars and planets move the curvature of space must adapt itself to their new positions. If the curvature didn't change the universe would be a very strange place as a moving object would leave its gravitational field behind, a little like Peter Pan losing his shadow. Mathematically, the ability of space to bend and stretch means that waves can move through it, and this led Einstein to predict that gravitational waves could exist.

**Why Is This So Exciting?** Science waited 100 years for this; who wouldn't be excited? For physicists, LIGO is testing a key prediction of General Relativity, which is one of the most fundamental theories we have. On top of that, if LIGO sees gravitational waves emitted by a pair of black holes as they collide and merge we will have ringside seats to some of the most remarkable events in the universe. And a detection by LIGO will mark the culmination of decades of work by a cast of thousands who have built what is probably the world's most sensitive scientific instrument.

**How Does LIGO Work?** LIGO has two giant L-shaped detectors; one in Washington State and the other in Louisiana, on the other side of the United States. Each detector is 4 kilometres on a side. Gravitational waves always stretch space in one direction while squeezing it in another, so a passing gravitational wave expands one side of the "L" while shrinking the other. Powerful lasers then pick up the resulting change in the lengths of the arms. The stretching and squeezing is tiny – each arm may grow and shrink by only a quadrillionth of a millimetre, far less than the diameter of a single atom. By having two detectors LIGO rules out spurious signals from local vibrations, traffic or tiny earthquakes; LIGO also pools its data with two smaller European experiments, GEO and VIRGO.

Spacetime near two orbiting black holes. (Image: Swinburne University)

**How Are Gravitational Waves Made?** Two black holes (or any pair of orbiting objects) stir up space as they circle one another, creating gravitational waves. If the black holes are far apart the gravitational waves are unimaginably small. But gravitational waves carry away energy, and that energy has to come from somewhere – so the orbit slowly shrinks. But a smaller orbit is a faster orbit, increasing the output of gravitational radiation, and the orbit shrinks faster and faster. This is the *inspiral*, and it can take hundreds of millions or even billions of years. But eventually the two black holes are orbiting one another at a decent fraction of the speed of light, churning space like an out-of-control cosmic egg-beater. This phase lasts seconds but produces a huge burst of gravitational waves: this is the signal that LIGO detects. The black holes then plunge towards a *merger*, followed by the *ringdown* as the new black hole settles into a stable shape.
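The runaway character of the inspiral can be seen from the classic quadrupole-formula lifetime (Peters 1964), which scales as the fourth power of the separation. Here is a rough sketch for equal 30-solar-mass black holes on circular orbits; the masses and separations are illustrative choices, not LIGO numbers:

```python
# Rough inspiral timescale from the quadrupole formula (Peters 1964),
# for two equal-mass black holes on a circular orbit.
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
c = 2.998e8        # speed of light, m/s
M_SUN = 1.989e30   # solar mass, kg
YEAR = 3.156e7     # seconds per year

def merger_time(m1, m2, a):
    """Time to coalescence from separation a (circular orbit), in seconds."""
    return (5 / 256) * c**5 * a**4 / (G**3 * m1 * m2 * (m1 + m2))

m = 30 * M_SUN
for a_km in (1e7, 1e4, 1e3):   # separations in km
    t = merger_time(m, m, a_km * 1e3)
    print(f"a = {a_km:>8.0f} km  ->  t = {t / YEAR:.3g} years")
```

At a ten-million-km separation the merger takes of order a hundred million years; at a thousand km it is a fraction of a second – exactly the slow-then-sudden behaviour described above.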

**Didn't Everyone Get All Excited About Gravitational Waves A Couple of Years Ago?** We did, and it was a false alarm. Several things are different this time, though. That claim was made by BICEP2, a telescope that looks at the microwave background, fossil light from the Big Bang. BICEP2 did not observe gravitational waves directly, as LIGO does. The signal BICEP2 saw turned out to be associated with dust in our own galaxy; this was quickly realized as astrophysicists checked and re-checked the results. (I blogged about the latest news from BICEP2; it is producing lovely data and starting to test a number of different theories about the Big Bang.) Moreover, the LIGO team has built a reputation for caution – going so far as to do "signal injections", where the analysis teams are unknowingly fed synthetic data to test their ability to extract real gravitational waves from the experimental noise. Finally, the rumour is that their results have been through peer review, and will have stood up to scrutiny from independent scientists.

**What Next?** Physicists will use LIGO to make stringent tests of General Relativity: do its predictions match the behavior of spacetime seen during black hole mergers? And for astronomers it will be like growing a new set of eyes: LIGO is an entirely new kind of telescope that lets us explore the universe with gravitational waves. Watch this space.


Scarcely a hundred years after Einstein revealed the equations for his theory of gravity (“General Relativity”) on November 25th, 1915, the world today awaits an announcement from the LIGO experiment, where the G in LIGO stands for Gravity. *(The full acronym stands for “Laser Interferometer Gravitational Wave Observatory.”)* As you’ve surely heard, the widely reported rumors are that at some point in the last few months, LIGO, recently upgraded to its “Advanced” version, finally observed gravitational waves — ripples in the fabric of space (more accurately, of space-time). These waves, which can make the length of LIGO shorter and longer by an incredibly tiny amount, seem to have come from the violent merger of two black holes, each with a mass *[rest-mass!]* dozens of times larger than the Sun. Their coalescence occurred long long ago (billions of years) in a galaxy far far away (a good fraction of the distance across the visible part of the universe), but the ripples from the event arrived at Earth just weeks ago. For a brief moment, it is rumored, they shook LIGO hard enough to be convincingly observed.

For today’s purposes, let me assume the rumors are *true*, and let me assume also that the result to be announced is actually *correct*. We’ll learn today whether the first assumption is right, but the second assumption may not be certain for some months (remember OPERA’s [*NOT*] faster-than-light neutrinos and BICEP2’s [*PROBABLY NOT*] gravitational waves from inflation). We must always keep in mind that any extraordinary scientific result has to be scrutinized and confirmed by experts before scientists will believe it! Discovery is difficult, and a large fraction of such claims — **large** — fail the test of time.

**What the Big News Isn’t**

There will be so much press and so many blog articles about this subject that I’m just going to point out a few things that I suspect most articles will miss, especially those in the press.

Most importantly, if LIGO has indeed directly discovered gravitational waves, that's exciting of course. **But it's by no means the most important story here.**

That’s because gravitational waves were **already** observed indirectly, quite some time ago, in a system of two neutron stars orbiting each other. This pair of neutron stars, discovered by Joe Taylor and his graduate student Russell Hulse, is interesting because one of the neutron stars is a pulsar, an object whose rotation and strong magnetic field combine to make it a natural lighthouse, or more accurately a radiohouse, sending out pulses of radio waves that can be detected at great distances. The time between pulses shifts very slightly as the pulsar moves toward and away from Earth, so the pulsar’s motion around its companion can be carefully monitored. Its orbital period has slowly changed over the decades, and the changes are perfectly consistent with what one would expect if the system were losing energy, emitting it in the form of unseen gravitational waves at just the rate predicted by Einstein’s theory (as shown in this graph.) For their discovery, Hulse and Taylor received the 1993 Nobel Prize. By now, there are other examples of similar pairs of neutron stars, also showing the same type of energy loss in detailed accord with Einstein’s equations.

*A bit more subtle (so you can skip this paragraph if you want), but also more general, is that some kind of gravitational waves are inevitable… inevitable, after you accept Einstein’s earlier (1905) equations of special relativity, in which he suggested that the speed of light is a sort of universal speed limit on everything, imposed by the structure of space-time. Sound waves, for instance, exist because the speed of sound is finite; if it were infinite, a vibrating guitar string would make the whole atmosphere wiggle back and forth in sync with the guitar string. Similarly, since effects of gravity must travel at a finite speed, the gravitational effects of orbiting objects must create waves. The only question is the specific properties those waves might have.*

No one, therefore, should be surprised that gravitational waves exist, or that they travel at the universal speed limit, just like electromagnetic waves (including visible light, radio waves, etc.) No one should even be surprised that the waves LIGO is (perhaps) detecting have properties predicted by Einstein’s specific equations for gravity; if they were different in a dramatic way, the Hulse-Taylor neutron stars would have behaved differently than expected.

Furthermore, no one should be surprised if waves from a black hole merger have been observed by the Advanced LIGO experiment. This experiment was designed *from the beginning,* decades ago, so that it could hardly fail to discover gravitational waves from the coalescence of two black holes, two neutron stars, or one of each. We know these mergers happen, and the experts were very confident that Advanced LIGO could find them. The really serious questions were: (a) would Advanced LIGO work as advertised? (b) if it worked, how soon would it make its first discovery? and (c) would the discovery agree in detail with expectations from Einstein’s equations?

**Big News In Scientific Technology**

So the first big story is that Advanced LIGO **WORKS**! This experiment represents one of the greatest technological achievements in human history. Congratulations are due to the designers, builders, and operators of this experiment — and to the National Science Foundation of the United States, which is LIGO’s largest funding source. U.S. taxpayers, who on average each contributed a few cents per year over the past two-plus decades, can be proud. And because of the new engineering and technology that were required to make Advanced LIGO functional, I suspect that, over the long run, taxpayers will get a positive financial return on their investment. That’s in addition of course to a vast scientific return.

Advanced LIGO is not even in its final form; further improvements are in the works. Currently, Advanced LIGO consists of two detectors located 2000 miles (3000 kilometers) apart. Each detector consists of two “arms” a few miles (kilometers) long, oriented at right angles, and the lengths of the arms are continuously compared. This is done using exceptionally stable lasers reflecting off exceptionally perfect mirrors, and requiring use of sophisticated tricks for mitigating all sorts of normal vibrations and even effects of quantum “jitter” from the Heisenberg uncertainty principle. With these tools, Advanced LIGO can detect when passing gravitational waves change the lengths of LIGO’s arms by … incredibly … less than one part in a billion trillion (1,000,000,000,000,000,000,000). That’s an astoundingly tiny distance: a thousand times smaller than the radius of a proton. *(A proton itself is a hundred thousand times smaller, in radius, than an atom. Indeed, LIGO is measuring a distance as small as can be probed by the Large Hadron Collider — albeit with a very very tiny energy, in contrast to the collider.)* By any measure, the gravitational experimenters have done something absolutely extraordinary.
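To get a feel for the quoted sensitivity: the strain h is the fractional length change, so the absolute arm-length change is just h times the 4-km arm. A quick sanity check (note that the "thousand times smaller than a proton" figure corresponds to strains somewhat below one part in 10²¹):

```python
# How small is the arm-length change LIGO measures?
# The strain h is the fractional length change: dL = h * L.
L_arm = 4000.0            # LIGO arm length in metres
r_proton = 0.84e-15       # proton charge radius in metres

for h in (1e-21, 1e-22):  # strains at and below "one part in 10^21"
    dL = h * L_arm
    print(f"h = {h:g}: dL = {dL:.1e} m "
          f"(~{r_proton / dL:,.0f} times smaller than a proton)")
```

Either way, the displacement being resolved is orders of magnitude below the size of a single proton.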

**Big News In Gravity**

The second big story: from the gravitational waves that LIGO has perhaps seen, we would learn that the merger of two black holes occurs, to a large extent, as Einstein’s theory predicts. The success of this prediction for what the pattern of gravitational waves should be is a far more powerful test of Einstein’s equations than the mere existence of the gravitational waves!

Imagine, if you can… Two city-sized black holes, each with a mass [rest-mass!] tens of times greater than the Sun, and separated by a few tens of miles (tens of kilometers), orbit each other. They circle faster and faster, as often, in their last few seconds, as 100 times per second. They move at a speed that approaches the universal speed limit. This extreme motion creates an ever larger and increasingly rapid vibration in space-time, generating large space-time waves that rush outward into space. Finally the two black holes spiral toward each other, meet, and join together to make a single black hole, larger than the first two and spinning at an incredible rate. It takes a short moment to settle down to its final form, emitting still more gravitational waves.

During this whole process, the total amount of energy emitted in the vibrations of space-time is a few times larger than you’d get if you could take the entire Sun and (magically) extract all of the energy stored in its rest-mass (E=mc²). This is an immense amount of energy, significantly more than emitted in a typical supernova. Indeed, LIGO’s black hole merger may perhaps be the most titanic event ever detected by humans!
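A one-line check of that energy claim, taking "a few times" the Sun's rest-mass energy to be three solar masses; the supernova figure used for comparison is an order-of-magnitude assumption:

```python
# Energy radiated in gravitational waves: roughly three solar
# rest-masses converted via E = m c^2.
c = 2.998e8          # m/s
M_SUN = 1.989e30     # kg
E_gw = 3 * M_SUN * c**2
E_supernova = 1e46   # typical supernova energy, joules (order of magnitude)
print(f"E_gw ~ {E_gw:.1e} J, about {E_gw / E_supernova:.0f}x a typical supernova")
```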

This violent dance of darkness involves very strong and complicated warping of space and time. In fact, it wasn’t until 2005 or so that the full calculation of the process, including the actual moment of coalescence, was possible, using highly advanced mathematical techniques and powerful supercomputers!

By contrast, the resulting ripples we get to observe, billions of years later, are much more tame. Traveling far across the cosmos, they have spread out and weakened. Today they create extremely small and rather simple wiggles in space and time. You can learn how to calculate their properties in an advanced university textbook on Einstein’s gravity equations. Not for the faint of heart, but certainly no supercomputers required.

So gravitational waves are the (relatively) easy part. It’s the prediction of the merger’s properties that was the really big challenge, and its success would represent a remarkable achievement by gravitational theorists. And it would provide powerful new tests of whether Einstein’s equations are in any way incomplete in their description of gravity, black holes, space and time.

**Big News in Astronomy**

The third big story: If today’s rumor is indeed of a real discovery, we are witnessing the birth of an entirely new field of science: gravitational-wave astronomy. This type of astronomy is complementary to the many other methods we have of “looking” at the universe. What’s great about gravitational wave astronomy is that although dramatic events can occur in the universe without leaving a signal visible to the eye, and even without creating any electromagnetic waves at all, nothing violent can happen in the universe without making waves in space-time. Every object creates gravity, through the curvature of space-time, and every object feels gravity too. *You can try to hide in the shadows, but there’s no hiding from gravity.*

Advanced LIGO may have been rather lucky to observe a two-black-hole merger so early in its life. But we can be optimistic that the early discovery means that black hole mergers will be observed as often as several times a year even with the current version of Advanced LIGO, which will be further improved over the next few years. This in turn would imply that gravitational wave astronomy will soon be a very rich subject, with lots and lots of interesting data to come, even within 2016. We will look back on today as just the beginning.

Although the rumored discovery is of something expected — experts were pretty certain that mergers of black holes of this size happen on a fairly regular basis — gravitational wave astronomy might soon show us something completely unanticipated. Perhaps it will teach us surprising facts about the numbers or properties of black holes, neutron stars, or other massive objects. Perhaps it will help us solve some existing mysteries, such as those of gamma-ray bursts. Or perhaps it will reveal currently unsuspected cataclysmic events that may have occurred somewhere in our universe’s past.

**Prizes On Order?**

So it’s really not the gravitational waves themselves that we should celebrate, although I suspect that’s what the press will focus on. Scientists already knew that these waves exist, just as they were aware of the existence of atoms, neutrinos, and top quarks long before these objects were directly observed. The historic aspects of today’s announcement would be in the successful operation of Advanced LIGO, in its new way of “seeing” the universe that allows us to observe two black holes becoming one, and in the ability of Einstein’s gravitational equations to predict the complexities of such an astronomical convulsion.

Of course all of this is under the assumptions that the rumors are true, and also that LIGO’s results are confirmed by further observations. Let’s hope that any claims of discovery survive the careful and proper scrutiny to which they will now be subjected. If so, then prizes of the highest level are clearly in store, and will be doled out to quite a few people, experimenters for designing and building LIGO and theorists for predicting what black-hole mergers would look like. As always, though, the only prize that really matters is given by Nature… and the many scientists and engineers who have contributed to Advanced LIGO may have already won.

—

*Enjoy the press conference this morning. I, ironically, will be in the most inaccessible of places: over the Atlantic Ocean. I was invited to speak at a workshop on Large Hadron Collider physics this week, and I’ll just be flying home. I suppose I can wait 12 hours to find out the news… it’s been 44 years since LIGO was proposed…*

Filed under: Astronomy, Gravitational Waves Tagged: astronomy, black holes, Gravitational Waves, LIGO


As the rumor noise level has increased over the last few weeks, and LIGO has a press conference scheduled for tomorrow morning, everyone in the gravity community is expecting that LIGO will announce the first detection of gravitational waves.

A roundup of rumors can be found here and here and here and here.

Preprints with postdictions that sound like predictions can be found here, for example. I've been told that the cat has been out of the bag for a while, and people with inside information have been posting papers to the arXiv in advance of the LIGO announcement.

Obviously this is very exciting, and hopefully tomorrow's announcement will usher in a new era of gravitational astronomy.


Ok, I promised to explain the staircase I put up on Monday. I noticed something rather nice recently, and reported it (actually, two things) in a recent paper, here. It concerns those things I called "Holographic Heat Engines" which I introduced in a paper two years ago, and which I described in some detail in a previous post. You can go to that post in order to learn the details - there's no point repeating it all again - but in short the context is an extension of gravitational thermodynamics where the cosmological constant is dynamical, therefore supplying a meaning to the pressure and the volume variables (p,V) that are normally missing in black hole thermodynamics... Once you have those, it seems obvious that you can start considering processes that do mechanical work (from the pdV term in the first law) and within a short while the idea of heat engines in which the black hole is the working substance comes along. Positive pressure corresponds to negative cosmological constant and so the term "holographic heat engines" is explained. (At least to those who know about holographic dualities.)

So you have a (p,V) plane, some heat flows, and an equation of state determined by the species of (asymptotically AdS) black hole you are working with. It's like discovering a whole new family of fluids for which I know the equation of state (often exactly) and now I get to work out the properties of the heat engines I can define with them. That's what this is.
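To make the cycle bookkeeping concrete without the black-hole equation of state, here is a toy rectangular cycle in the (p,V) plane with an ordinary monatomic ideal gas as the working substance – purely illustrative, not the holographic engine of the paper:

```python
# Toy heat-engine bookkeeping on the (p, V) plane: a rectangular cycle
# with a monatomic ideal gas as working substance. This is NOT the
# black-hole equation of state from the paper -- just an illustration
# of how an equation of state plus a cycle yields an efficiency.
p1, p2 = 2.0, 1.0          # high / low pressure (arbitrary units)
V1, V2 = 1.0, 3.0          # small / large volume

W = (p1 - p2) * (V2 - V1)             # net work = area enclosed by the cycle
Q_isochore = 1.5 * V1 * (p1 - p2)     # heat in while warming at V1 (Cv = 3/2 nR)
Q_isobar = 2.5 * p1 * (V2 - V1)       # heat in while expanding at p1 (Cp = 5/2 nR)
eta = W / (Q_isochore + Q_isobar)
print(f"work = {W}, efficiency = {eta:.3f}")
```

Swapping in a black-hole equation of state changes the heat terms but not the structure of the calculation, which is what makes these engines tractable.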

Now, I suspect that this whole business is an answer waiting for a question. I can't tell you what the question is. One place to look might be in the space of field theories that have such black holes as their holographic dual, but I'm the first to admit that [...]

The post News from the Front, XII: Simplicity appeared first on Asymptotia.

The twitters are ablaze with rumors about the announcement from the *LIGO* project scheduled for tomorrow. We discussed this in group meeting today, with no embargo-breaking by anyone. That is, on purely physical, engineering, sociological, and psychological grounds we made predictions for the press release tomorrow. Here are my best predictions: First, I predict that the total signal-to-noise of any detected black-hole inspiral signal they announce will be greater than 15 in the total data set. That is, I predict that (say) the likelihood function for the overall, scalar signal amplitude will have a half-width that is less than one-fifteenth of its mode. Second, I predict that the uncertainty on the sum of the two masses (that is, the total mass of the inspiral system, if any is announced) will be dominated by the (large, many hundreds of km/s) uncertainty in the peculiar velocity of the system (in the context that the system lives inside the cosmological world model). Awesome predictions? Perhaps not, but *you heard them here first!*
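The first prediction can be illustrated with a toy matched-filter experiment: for a known template in Gaussian noise, the amplitude estimate has an absolute uncertainty equal to the noise level, so its fractional uncertainty is roughly 1/SNR. A minimal sketch on synthetic data, with all parameters chosen for illustration:

```python
# Toy matched filter: estimate the amplitude of a known template in
# Gaussian noise, and check that the scatter of the estimate is the
# noise level sigma, i.e. a fractional uncertainty of ~1/SNR.
import math
import random

random.seed(42)
N = 1000
template = [math.sin(2 * math.pi * 5 * i / N) for i in range(N)]
norm = math.sqrt(sum(x * x for x in template))
template = [x / norm for x in template]       # unit-norm template

A_true, sigma = 15.0, 1.0                     # SNR = A_true / sigma = 15
estimates = []
for _ in range(2000):
    data = [A_true * t + random.gauss(0, sigma) for t in template]
    estimates.append(sum(d * t for d, t in zip(data, template)))  # filter output

mean = sum(estimates) / len(estimates)
std = math.sqrt(sum((a - mean)**2 for a in estimates) / len(estimates))
print(f"amplitude = {mean:.2f} +/- {std:.2f}  (fractional: {std / mean:.3f} ~ 1/SNR)")
```

With SNR = 15, the amplitude comes out known to about 7%, which is the "half-width under one-fifteenth of the mode" statement in different clothes.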

*[Note to the world: This is *not* an announcement: I know nothing! This is just a pair of predictions from an outsider.]*

We discussed the things that could be learned from any detection of a single black-hole inspiral signal, about star formation, black-hole formation, and galaxies. I think that if the masses of the detected black holes are large, then there are probably interesting things to say about stars or supernovae or star formation.

Today was the first meeting of #AstroHackNY, where we discuss data analysis and parallel work up at Columbia on Tuesday mornings. We discussed what we want to get out of the series, and started a discussion of why we do linear fitting the way we do, and what are the underlying assumptions.

Prior to that, I talked with Hans-Walter Rix about interpolation and gridding of spectral models. We disagree a bit on the point of all this, but we are trying to minimize the number of stellar model evaluations we need to do to get precise, many-element abundances with a very expensive physical model of stars. We also discussed the point that we probably have to cancel the Heidelberg #GaiaSprint, because of today's announcement from the *Gaia* Project.

Last September, upgrades of the gravitational wave interferometer LIGO were completed. The experiment – now named Advanced LIGO – searches for gravitational waves emitted in the merger of two black holes. Such a merger signal should fall straight into Advanced LIGO's reach.

It was thus expected that the upgraded experiment would either see something immediately, or that we'd gotten something terribly wrong. And indeed, rumors about a positive detection started to appear almost immediately after the upgrade. But it wasn't until this week that the LIGO collaboration announced several press conferences in the USA and Europe, scheduled for tomorrow, Thursday Feb 11, at 3:30pm GMT. So something big is going to hit the headlines tomorrow, and here are the essentials that you need to know.

Gravitational waves are periodic distortions of space-time. They alter distance ratios for orthogonal directions. An interferometer works by using lasers to measure and compare orthogonal distances very precisely, thus it picks up even the tiniest space-time deformations.

Moving masses produce gravitational waves much like moving charges create electromagnetic waves. The most relevant differences between the two cases are:

- Electromagnetic waves travel *in* space-time, whereas gravitational waves are a disturbance of space-time itself.
- Electromagnetic waves have spin 1, gravitational waves have spin 2. The spin counts how much you have to rotate the wave for it to come back onto itself: for the electromagnetic field that's one full rotation, for the gravitational field only half a rotation.
- The dominant electromagnetic emission comes from the dipole moment (normally used, for example, for transmitter antennae), but gravitational waves have no dipole contribution (a consequence of momentum conservation); it's instead the quadrupole emission that is leading.

Since all matter gravitates, the motion of matter generically creates gravitational waves. Every time you move, you create gravitational waves, lots of them. These are, however, so weak that they are impossible to measure.

The gravitational waves that LIGO is looking for come from the most violent events in the universe that we know of: black hole mergers. In these events, space-time gets distorted dramatically as the two black holes join into one, leading to significant emission of gravitational waves. This combined system later settles with a characteristic "ringdown" into a new stable state.

Yes, this also means that these gravitational waves go right through you and distort you oh-so-slightly on their way.

The wavelengths of gravitational waves emitted in such merger events are typically of the same order as the dimensions of the system. That is, for black holes with masses between 10 and 100 times the solar mass, wavelengths are typically a hundred to a thousand km – right in the range to which LIGO is most sensitive.
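The wavelength-frequency relation is just λ = c/f, so a few lines suffice to map frequencies in and around LIGO's sensitive band to wavelengths; the frequencies below are illustrative sample points:

```python
# Wavelength of a gravitational wave: lambda = c / f.
c = 2.998e8  # speed of light, m/s
for f in (100.0, 1000.0, 3000.0):   # Hz, spanning the band of interest
    print(f"f = {f:>6.0f} Hz  ->  wavelength = {c / f / 1000:.0f} km")
```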

If you want to score extra points when discussing the headlines we expect tomorrow, learn how to pronounce Fabry–Pérot. This is a method for bouncing light signals back and forth in the interferometer arms several times before making the measurements, which effectively increases the arm length. This is why LIGO is sensitive in a wavelength regime far longer than its actual arm length of about 2-4 km. And don't call them gravity waves. A gravity wave is an atmospheric phenomenon.
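The effective-arm-length trick is simple multiplication: if the light makes N round trips before the measurement, the arm acts roughly N times longer. The bounce count below is an assumed, illustrative figure, not Advanced LIGO's exact value:

```python
# Fabry-Perot cavities bounce the light many times, multiplying the
# effective path length. The bounce count here is an illustrative
# assumption, not the exact figure for Advanced LIGO.
L_arm = 4.0        # physical arm length, km
n_bounces = 280    # assumed effective number of round trips
print(f"effective arm length ~ {L_arm * n_bounces:.0f} km")
```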

Gravitational waves were predicted a hundred years ago as one of the consequences of Einstein’s theory of General Relativity. Their existence has since been indirectly confirmed because gravitational wave emission leads to energy loss, which has the consequence that two stars which orbit around a common center speed up over the course of time. This has been observed and was awarded the Nobel Prize for physics in 1993. If LIGO has detected the sought-after signal, it would not be the first detection, but the first *direct* detection.

Interestingly, even though it was long known that black hole mergers would emit gravitational waves, it wasn’t until computing power had increased sufficiently that precise predictions became possible. So it’s not like experiment is all that far behind theory on that one. General Relativity, though often praised for its beauty, does leave you with one nasty set of equations that in most cases cannot be solved analytically and computer simulations become necessary.

The existence of gravitational waves is not doubted by anyone in the physics community, or at least not by anybody I have met. This is for good reasons: On the experimental side there is the indirect evidence, and on the theoretical side there is the difficulty of making any theory of gravity work that does not have gravitational waves. But the direct detection of gravitational waves would be tremendously exciting because it opens our eyes to an entirely new view on the universe.

Hundreds of millions of years ago, primitive life forms crawled out of the water on planet Earth and opened their eyes to see, for the first time, the light of the stars. Detecting gravitational waves is a momentous event just like this – it’s the first time we can receive signals that were previously entirely hidden from us, revealing an entirely new layer of reality.

So bookmark the webcast page and mark your calendar for tomorrow, 3:30pm GMT – it might enter the history books.

Estimated gravitational wave spectrum. [Image Source]

It was thus expected that the upgraded experiment either sees something immediately, or we’ve gotten something terribly wrong. And indeed, rumors about a positive detection started to appear almost immediately after the upgrade. But it wasn’t until this week that the LIGO collaboration announced several press-conferences in the USA and Europe, scheduled for tomorrow, Thursday Feb 11, at 3:30pm GMT. So something big is going to hit the headlines tomorrow, and here are the essentials that you need to know.

Gravitational waves are periodic distortions of space-time. They alter distance ratios for orthogonal directions. An interferometer works by using lasers to measure and compare orthogonal distances very precisely, thus it picks up even the tiniest space-time deformations.

Moving masses produce gravitational waves much like moving charges create electromagnetic waves. The most relevant differences between the two cases are

- Electromagnetic waves travel
*in*space-time, whereas gravitational waves are a disturbance of space-time itself. - Electromagnetic waves have spin 1, gravitational waves have spin two. The spin counts how much you have to rotate the wave for it to come back onto itself. For the electromagnetic fields that’s one full rotation, for the gravitational field it’s only half a rotation.
- The dominant electromagnetic emission comes from the dipole moment (normally used eg for transmitter antennae), but gravitational waves have no dipole moment (a consequence of momentum conservation). It’s instead the quadrupole emission that is leading.

[Image Credit: David Abergel] |

Since all matter gravitates, the motion of matter generically creates gravitational waves. Every time you move, you create gravitational waves, lots of them. These are, however, so weak that they are impossible to measure.

The gravitational waves that LIGO is looking for come from the most violent events in the universe that we know of: black hole mergers. In these events, space-time gets distorted dramatically as the two black holes join to one, leading to significant emission of gravitational waves. This combined system later settles with a characteristic “ringdown” into a new stable state.

Yes, this also means that these gravitational waves go right through you and distort you oh-so-slightly on their way.

The wave-lengths of gravitational waves emitted in such merger events are typically of the same order as the dimension of the system. That is, for black holes with masses between 10 and 100 times the solar mass, wavelengths are typically a hundred to a thousand km – right in the range that LIGO is most sensitive.

If you want to score extra points when discussing the headlines we expect tomorrow, learn how to pronounce Fabry–Pérot. This is a method for bouncing light signals back and forth in the interferometer arms several times before making the measurement, which effectively increases the arm length. This is why LIGO is sensitive in a wavelength regime far longer than its actual arm length of about 2-4 km. And don’t call them gravity waves. A gravity wave is a cloud phenomenon.
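
The effect of the bouncing is easy to estimate; note that the bounce count below is an illustrative round number, not LIGO's actual cavity parameter:

```python
# Sketch: a Fabry-Perot cavity lets light traverse each arm many
# times, multiplying the effective path length. The bounce count is
# an illustrative round number, not LIGO's actual cavity parameter.
arm_length_km = 4.0
bounces = 300                # assumed number of traversals

effective_length_km = arm_length_km * bounces
print(f"effective arm length: about {effective_length_km:.0f} km")
```

With a few hundred traversals, a 4 km arm acts like one more than a thousand km long, comparable to the wavelengths estimated above.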

Gravitational waves were predicted a hundred years ago as one of the consequences of Einstein’s theory of General Relativity. Their existence has since been indirectly confirmed, because gravitational wave emission leads to energy loss, which has the consequence that two stars orbiting a common center speed up over the course of time. This has been observed, and the observation was awarded the Nobel Prize in Physics in 1993. If LIGO has detected the sought-after signal, it would not be the first detection of gravitational waves, but the first direct one.

Interestingly, even though it was long known that black hole mergers would emit gravitational waves, it wasn’t until computing power had increased sufficiently that precise predictions became possible. So it’s not as if experiment is all that far behind theory on this one. General Relativity, though often praised for its beauty, leaves you with a nasty set of equations that in most cases cannot be solved analytically, so computer simulations become necessary.

The existence of gravitational waves is not doubted by anyone in the physics community, or at least not by anybody I have met. This is for good reasons: On the experimental side there is the indirect evidence, and on the theoretical side there is the difficulty of making any theory of gravity work that does not have gravitational waves. But the direct detection of gravitational waves would be tremendously exciting because it opens our eyes to an entirely new view on the universe.

Hundreds of millions of years ago, a primitive form of life crawled out of the water on planet Earth and opened its eyes to see, for the first time, the light of the stars. Detecting gravitational waves is a momentous event just like this: it’s the first time we can receive signals that were previously entirely hidden from us, revealing an entirely new layer of reality.

So bookmark the webcast page and mark your calendar for tomorrow, 3:30 pm GMT – it might enter the history books.

I did bring my good camera with me to Newport News, and took it on the tour of Jefferson Lab yesterday, but despite the existence of DSLR pics, you’re getting a cell-phone snap for the photo of the day:

That’s the audience about 10-15 minutes before my talk last night, so it was a good turnout. And they laughed in the right places, and asked some really good questions. I also got asked to appear in a selfie with a bunch of students from a local school, so they could prove they were there to get extra credit for their science class…

The talk went well, though we had some technical difficulties. The shiny new video projection system in the auditorium has an audio input jack on the VGA cable for the laptop input, but it’s about four inches long, and the headphone jack for my laptop is on the far side of the keyboard from the VGA out. Whoops.

We ended up taping a lapel mike to the desk right under the laptop speaker, which was mostly fine, except for a couple of occasions where we got earsplitting feedback. Technology, man. What can you do? Video will be posted to the JLab web site at some point in the future, after they edit it and get closed captions done (I’m very sorry for whoever has to do that…)

I was also pleasantly surprised that a couple of Williams classmates showed up to the talk (they live in the area), so I went out with them afterwards for a couple of beers, to catch up. All in all, a good day.

Now, I just need to get through most of a day of airports and Regional Jets to get home to Niskayuna.

**When I reached IBM’s Watson research center, I’d barely seen Aaron in three weeks. Aaron is an experimentalist pursuing a physics PhD at Caltech. I eat dinner with him and other friends, most Fridays. The group would gather on a sidewalk in the November dusk, those three weeks. Light would spill from a lamppost, and we’d tuck our hands into our pockets against the chill. Aaron’s wife would shake her head.**

“The fridge is running,” she’d explain.

Aaron cools down mechanical devices to near absolute zero. Absolute zero is the lowest temperature possible,^{1} lower than outer space’s temperature. Cold magnifies certain quantum behaviors. Researchers observe those behaviors in small systems, such as nanoscale devices (devices about 10^{-9} meters long). Aaron studies few-centimeter-long devices. Offsetting the devices’ size with cold might coax them into exhibiting quantum behaviors.

The cooling sounds as effortless as teaching a cat to play fetch. Aaron lowers his fridge’s temperature in steps. Each step involves checking for leaks: A mix of two fluids—two types of helium—cools the fridge. One type of helium costs about $800 per liter. Lose too much helium, and you’ve lost your shot at graduating. Each leak requires Aaron to warm the fridge, then re-cool it. He hauled helium and pampered the fridge for ten days, before the temperature reached 10 millikelvin (0.01 kelvin above absolute zero). He then worked like…well, like a grad student to check for quantum behaviors.

Aaron came to mind at IBM.

“How long does cooling your fridge take?” I asked Nick Bronn.

Nick works at Watson, IBM’s research center in Yorktown Heights, New York. Watson has sweeping architecture frosted with glass and stone. The building reminded me of Fred Astaire: decades-old, yet classy. I found Nick outside the cafeteria, nursing a coffee. He had sandy hair, more piercings than I, and a mandate to build a quantum computer.

“Might I look around your lab?” I asked.

“Definitely!” Nick fished out an ID badge; grabbed his coffee cup; and whisked me down a wide, window-paneled hall.

Different researchers, across the world, are building quantum computers from different materials. IBMers use superconductors: tiny superconducting circuits. They function only at low temperatures, so IBM has seven closet-sized fridges. Different teams use different fridges to tackle different challenges to computing.

Nick found a fridge that wasn’t running. He climbed half-inside, pointed at metallic wires and canisters, and explained how they work. I wondered how his cooling process compared to Aaron’s.

“You push a button.” Nick shrugged. “The fridge cools in two days.”

IBM, I learned, has *dry fridges*. Aaron uses a *wet fridge*. Dry and wet fridges operate differently, though both require helium. Aaron’s wet fridge vibrates less, jiggling his experiment less. Jiggling relates to transferring heat. Heat suppresses the quantum behaviors Aaron hopes to observe.

Heat and warmth manifest in many ways, in physics. Count Rumford, an 18th-century American-Brit, conjectured the relationship between heat and jiggling. He noticed that drilling holes into cannons immersed in water boils the water. The drill bits rotated–moved in circles–transferring energy of movement to the cannons, which heated up. Heat enraptures me because it relates to entropy, a measure of disorderliness and ignorance. The flow of heat helps explain why time flows in just one direction.

A physicist friend of mine writes papers, he says, when catalyzed by “blinding rage.” He reads a paper by someone else, whose misunderstandings anger him. His wrath boils over into a research project.

Warmth manifests as the welcoming of a visitor into one’s lab. Nick didn’t know me from Fred Astaire, but he gave me the benefit of the doubt. He let me pepper him with questions and invited more questions.

Warmth manifests as a 500-word disquisition on fridges. I asked Aaron, via email, about how his cooling compares to IBM’s. I expected two sentences and a link to Wikipedia, since Aaron works 12-hour shifts. But he took pity on his theorist friend. He also warmed to his subject. Can’t you sense the zeal in “Helium is the only substance in the world that will naturally isotopically separate (neat!)”? No knowledge of isotopic separation required.

Many quantum scientists like it cold. But understanding, curiosity, and teamwork fire us up. Anyone under the sway of those elements of science likes it hot.

*With thanks to Aaron and Nick. Thanks also to John Smolin and IBM Watson’s quantum-computing-theory team for their hospitality.*

^{1}In many situations. Some systems, like small magnets, can access negative temperatures.

As I go on some travel, here are some news items that looked interesting to me:

- Rumors are really heating up that LIGO has spotted gravity waves. The details are similar to some things I'd heard, for what that's worth, though that may just mean that everyone is hearing the same rumors. **update**: Press conference coming (though they may just say that the expt is running well....)
- The starship *Enterprise* is undergoing a refit.
- This paper reports a photocatalytic approach involving asymmetric, oblong, core-shell semiconductor nanoparticles, plus a single Pt nanoparticle catalyst, that (under the right solution conditions) can give essentially *100% efficient* hydrogen reduction - every photon goes toward producing hydrogen gas. If the insights here can be combined with improved solution stability of appropriate nanoparticles, maybe there are ways forward for highly efficient water splitting or photo production of liquid fuels.
- Quantum materials are like obscenity - hard to define, but you know it when you see it.

As I go through my daily routine, I find myself sort of out of phase with a lot of the Internet. My peak online hours are from about six to ten in the morning, Eastern US time. That’s when I get up, have breakfast, and then go to Starbucks to write for a few hours.

This means that most of the other people awake and active on my social media feeds are in Europe or Australia. And my standard writing time ends right around the time things start to heat up in the US. I do continue to have access to the Internet through the afternoon, of course, but unless I have a deadline coming up, I’m often doing stuff that doesn’t involve sitting in front of a computer (and if I do have a deadline coming up, I shut down social media to concentrate on work). And evenings are terrible– I spend a lot of weeknights running SteelyKid to various activities, and even when I’m not doing that, our dinner and bedtime routines don’t leave me much space to participate. By nine or ten pm, I’m completely wiped out.

As a result, I find Twitter a deeply frustrating medium. Twitter is mostly about conversation, but its deliberately ephemeral nature means that you can really only converse effectively with other people who are online and active at the same time you are. And the peak activity times for Twitter conversations are at times when I’m not regularly available because of the way my work and family schedules are arranged. In those peak hours, I’m only checking in intermittently– a few times an hour, usually– and as a result, I miss tons of stuff.

I started thinking about this the other day, when there was a big kerfuffle over Twitter’s plan to introduce an “algorithmic” timeline that would depart from the current strictly-chronological display to highlight some posts from the past. This predictably led to wailing and gnashing of teeth among Twitter power users (and it’s since been walked back a little), who declared that it would be the end of Twitter as we know it. Personally, though, I think it might be a good thing, which led to this lengthy tweetstorm, which you’ll notice was posted at 8am on a Saturday, because that’s when I have time to be on Twitter…

The standard line is that any deviation from strictly chronological Twitter will hopelessly break things in one of a variety of ways, but this is largely predicated on the assumption that the algorithm will be the stupidest and most obnoxious thing you could dream up. But, really, it’s not that hard to do a better job than most of the people outraged about the idea seem to think.

Take, for example, Facebook. Facebook famously switched to an algorithmic timeline a while back, and most of the anti-algorithm arguments feature dark mutterings about how this will make Twitter just like Facebook. To an intermittent social-media user like me, though, Facebook is in many ways *better* than Twitter. I have slightly more Facebook friends than people I follow on Twitter (about 750 vs just under 600), but Facebook does a better job of highlighting stuff I want to see. I regularly find tweets from Rhett Allain because he has his feed mirrored to Facebook, and the Facebook algorithm knows I like his stuff and makes sure I see it. On Twitter, in the middle of the day, his tweets get lost in a vast flood of stuff that I don’t get to check very often. At the same time, if I’m actively on Facebook for a relatively long time, the feed I see is pretty much chronological.

The other insinuation is that under an algorithmic scheme only stuff from famous tweeters will get shown, or paid ads. But again, I’m not convinced, because Twitter already has an algorithmic feature, the “While You Were Away” box that pops up when you go several hours without checking in. That was roundly condemned when it was introduced for basically the same reasons, but again, I find that it does a good job of highlighting stuff I wouldn’t see otherwise. And it’s not just getting me massively-retweeted stuff from clickbait outlets. One of the people who pops up most frequently in my “While You Were Away” tab is a guy with under 400 followers, because I like a good deal of his stuff, and the algorithm knows that. I find that feature one of the most useful things Twitter has done recently, and would be happy to have it show up more regularly. And given that they do *that* well, I’m not especially worried about what would happen with a wider use of algorithms.

Of course, the fundamental issue isn’t anything about practical implementation, but rather that the current power users *like* Twitter as it is, because it works well for them. Which, you know, good for them, but it should be noted that this is fundamentally pretty exclusionary. That is, the way Twitter is set up right now works really well for a particular set of people, who have the sort of jobs and family arrangements such that they’re online and actively engaged at the same time as their friends. It’s big among journalists, for example, because their whole business is about being connected, and science Twitter is dominated by folks in fields whose research mostly has them sitting in front of a computer already. If you’re not lucky enough to be in that particular demographic stratum, though, the current experience of Twitter is much less attractive.

I’ve heard Twitter described as a virtual cocktail party before, and it’s a decent metaphor– lots of people hanging around, engaged in conversation and witty banter. I would note, though, that the usual analogy doesn’t go far enough. For an intermittent user like myself, Twitter is like a really cool cocktail party *that I’m not invited to*. It’s a bit like the party is spilling out of bar into the lobby of my hotel– I catch snatches of cool conversations as I make my way to the elevator, but I miss most of it because I have other stuff to do. Every now and then, I get a chance to hang out in the bar for a bit, and that’s great, but mostly I’m getting second-hand reports and that’s just not the same.

And it should be noted that I am, in fact, relatively fortunate as such things go. I do have a few hours in the morning where I’m able to participate, and I sometimes get the chance to do more. In the cocktail party metaphor, I’m at least staying in the same hotel with most of the partygoers. The folks in other hotels don’t get even that much, which is why so many people continue to not see the point of Twitter.

The kinds of changes Twitter is talking about making could, if implemented well, make the medium more accessible for those who are currently shut out. It won’t completely open things up– it’s always going to be a conversational medium, and conversation will always require time for engagement– but good algorithms could make it easier for people who aren’t already part of the conversation to see why those who are find it useful and enjoyable.

Of course, as it is, there’s very much a “cool kids” dynamic to Twitter, and a lot of the reaction is best understood in that light. The experience of Twitter that the current power users enjoy is a relatively exclusive one, and Twitter is choosing to pursue broadening access to the service over enhancing the experience of those who already use it heavily. Nothing I’ve heard described is going to shut out anybody who’s already in, though– at most, they’re going to be inconvenienced to a small fraction of the degree that non-power-users are already inconvenienced.

Most of the Bad Things people trot out as results of algorithmic timelines are things that *I already put up with* as an intermittent Twitter user. Bits of conversation will appear out of context? If you only check in a few times an hour, you already get that (and because otherwise very smart people can’t figure out how to properly thread conversations, there’s often no good way to reconstruct what’s going on, but that’s another rant). You might miss things posted by your friends? That happens now, given the huge flood of stuff that comes in at peak hours– as mentioned above, I have to rely on Facebook’s algorithms to rescue a lot of stuff that gets lost in the noise on Twitter. Your stuff might just vanish without the right people seeing it? That already happens to those of us who are out-of-phase with peak Twitter activity.

All of these negative features are annoyances that people who *aren’t* on the inside already have to put up with. And given sensibly designed algorithms– which my experience with Facebook and “While You Were Away” suggests are entirely possible– these can be minimized. Done right, they have the potential to make Twitter more attractive and enjoyable for a lot of people who don’t currently get anything out of it.

I’m in Newport News, VA, to give a talk tonight at Jefferson Lab, and they’re putting me up at the on-site Residence Facility. The rooms there are apparently sponsored by institutions that use the facility, with big signs on all the doors. Here’s mine:

So, I guess my stay is in some sense subsidized by the University of Manitoba. It’s a perfectly adequate hotel room, so, thanks, Manitoba.

As I am a Sooper Geeenyus, I forgot to pack the dress pants I usually wear when giving talks. Sigh. Happily, this is a public lecture, so jeans-and-sport-coat is a perfectly acceptable outfit. But I still feel like a dope…

In a day limited by health issues, I had a useful conversation with Leslie Greengard and Alex Barnett (SCDA, Dartmouth) about star-shades for nulling starlight in future exoplanet missions. They had ideas about how the electromagnetic field might be calculated, and issues with what might be being done with current calculations of this. These calculations are hard, because the star-shades under discussion for deployment at L1 are many times 10^{7} wavelengths in diameter, and millions of diameters away from the telescope!

I also talked to Boris Leistedt about galaxy and quasar cosmology using imaging (and a tiny bit of spectroscopy), in which the three-dimensional mapping is performed with photometric redshifts, or more precisely models of the source spectral energy distributions that are modeled simultaneously with the density field and so on. We are working on a first paper with recommendations for *LSST*. The idea is that a small amount of spectroscopy and an enormous amount of imaging ought to be sufficient to build a model that returns a redshift and spectral energy distribution for every source.

These stairs probably do not conform to any building code, but I like them anyway, and so they will appear in a paper I'll submit to the arXiv soon.

They're part of a nifty algorithm I thought of on Friday that I like rather a lot.

More later.

-cvj Click to continue reading this post

The post Staring at Stairs… appeared first on Asymptotia.

“Understanding quantum gravity” is on every physicist’s short list of Big Issues we would all like to know more about. If there’s been any lesson from last half-century of serious work on this problem, it’s that the answer is likely to be something more subtle than just “take classical general relativity and quantize it.” Quantum gravity doesn’t seem to be an ordinary quantum field theory.

In that context, it makes sense to take many different approaches and see what shakes out. Alongside old stand-bys such as string theory and loop quantum gravity, there are less head-on approaches that try to understand how quantum gravity can really be so weird, without proposing a specific and complete model of what it might be.

Grant Remmen, a graduate student here at Caltech, has been working with me recently on one such approach, dubbed *entropic gravity*. We just submitted a paper entitled “What Is the Entropy in Entropic Gravity?” Grant was kind enough to write up this guest blog post to explain what we’re talking about.

Meanwhile, if you’re near Pasadena, Grant and his brother Cole have written a musical, *Boldly Go!*, which will be performed at Caltech in a few weeks. You won’t want to miss it!

One of the most exciting developments in theoretical physics in the past few years is the growing understanding of the connections between gravity, thermodynamics, and quantum entanglement. Famously, a complete quantum mechanical theory of gravitation is difficult to construct. However, one of the aspects that we are now coming to understand about quantum gravity is that in the final theory, gravitation and even spacetime itself will be closely related to, and maybe even emergent from, the mysterious quantum mechanical property known as entanglement.

This all started several decades ago, when Hawking and others realized that black holes behave with many of the same aspects as garden-variety thermodynamic systems, including temperature, entropy, etc. Most importantly, the black hole’s entropy is equal to its area divided by four times Newton’s constant. Attempts to understand the origin of black hole entropy, along with key developments in string theory, led to the formulation of the holographic principle – see, for example, the celebrated AdS/CFT correspondence – in which quantum gravitational physics in some spacetime is found to be completely described by some special non-gravitational physics on the boundary of the spacetime. In a nutshell, one gets a gravitational universe as a “hologram” of a non-gravitational universe.
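
In full units, that entropy-area relation is the standard Bekenstein-Hawking formula:

```latex
S_{\mathrm{BH}} \;=\; \frac{k_B\, c^3 A}{4\,\hbar\, G}
```

which reduces to $S = A/4G$ in the natural units ($k_B = c = \hbar = 1$) typically used in this literature.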

If gravity can emerge from, or be equivalent to, a set of physical laws without gravity, then something special about that non-gravitational physics has to make it happen. Physicists have now found that that special something is quantum entanglement: the special correlations among quantum mechanical particles that defy classical description. As a result, physicists are very interested in how to get the dynamics describing how spacetime is shaped and moves – Einstein’s equation of general relativity – from various properties of entanglement. In particular, it’s been suggested that the equations of gravity can be shown to come from some notion of entropy. As our universe is quantum mechanical, we should think about the entanglement entropy, a measure of the degree of correlation of quantum subsystems, which for thermal states matches the familiar thermodynamic notion of entropy.

The general idea is as follows: Inspired by black hole thermodynamics, suppose that there’s some more general notion, in which you choose some region of spacetime, compute its area, and find that when its area changes this is associated with a change in entropy. (I’ve been vague here as to what is meant by a “change” in the area and what system we’re computing the area of – this will be clarified soon!) Next, you somehow relate the entropy to an energy (e.g., using thermodynamic relations). Finally, you write the change in area in terms of a change in the spacetime curvature, using differential geometry. Putting all the pieces together, you get a relation between an energy and the curvature of spacetime, which if everything goes well, gives you nothing more or less than Einstein’s equation! This program can be broadly described as entropic gravity and the idea has appeared in numerous forms. With the plethora of entropic gravity theories out there, we realized that there was a need to investigate what categories they fall into and whether their assumptions are justified – this is what we’ve done in our recent work.
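
Schematically, the chain of steps just described runs (a cartoon of the logic with all factors and signs suppressed, not a derivation):

```latex
\delta S \;\propto\; \delta A
\quad\xrightarrow{\ \delta E \,=\, T\,\delta S\ }\quad
\delta E \;\leftrightarrow\; \text{curvature}
\quad\Longrightarrow\quad
R_{\mu\nu} - \tfrac{1}{2}R\,g_{\mu\nu} \;=\; 8\pi G\, T_{\mu\nu}
```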

In particular, there are two types of theories in which gravity is related to (entanglement) entropy, which we’ve called *holographic gravity* and *thermodynamic gravity* in our paper. The difference between the two is in what system you’re considering, how you define the area, and what you mean by a change in that area.

In holographic gravity, you consider a region and define the area as that of its boundary, then consider various alternate configurations and histories of the matter in that region to see how the area would be different. Recent work in AdS/CFT, in which Einstein’s equation at linear order is equivalent to something called the “entanglement first law”, falls into the holographic gravity category. This idea has been extended to apply outside of AdS/CFT by Jacobson (2015). Crucially, Jacobson’s idea is to apply holographic mathematical technology to arbitrary quantum field theories in the bulk of spacetime (rather than specializing to conformal field theories – special physical models – on the boundary as in AdS/CFT) and thereby derive Einstein’s equation. However, in this work, Jacobson needed to make various assumptions about the entanglement structure of quantum field theories. In our paper, we showed how to justify many of those assumptions, applying recent results derived in quantum field theory (for experts, the form of the modular Hamiltonian and vacuum-subtracted entanglement entropy on null surfaces for general quantum field theories). Thus, we are able to show that the holographic gravity approach actually seems to work!

On the other hand, thermodynamic gravity is of a different character. Though it appears in various forms in the literature, we focus on the famous work of Jacobson (1995). In thermodynamic gravity, you don’t consider changing the entire spacetime configuration. Instead, you imagine a bundle of light rays – a *lightsheet* – in a particular dynamical spacetime background. As the light rays travel along – as you move down the lightsheet – the rays can be focused by curvature of the spacetime. Now, if the bundle of light rays started with a particular cross-sectional area, you’ll find a different area later on. In thermodynamic gravity, this is the change in area that goes into the derivation of Einstein’s equation. Next, one assumes that this change in area is equivalent to an entropy – in the usual black hole way with a factor of 1/(4 times Newton’s constant) – and that this entropy can be interpreted thermodynamically in terms of an energy flow through the lightsheet. The entropy vanishes from the derivation and the Einstein equation almost immediately appears as a thermodynamic equation of state. What we realized, however, is that what the entropy is actually the entropy *of* was ambiguous in thermodynamic gravity. Surprisingly, we found that there doesn’t seem to be a consistent definition of the entropy in thermodynamic gravity – applying quantum field theory results for the energy and entanglement entropy, we found that thermodynamic gravity could not simultaneously reproduce the correct constant in the Einstein equation and in the entropy/area relation for black holes.

So when all is said and done, we’ve found that holographic gravity, but not thermodynamic gravity, is on the right track. To answer our own question in the title of the paper, we found – in admittedly somewhat technical language – that the vacuum-subtracted von Neumann entropy evaluated on the null boundary of small causal diamonds gives a consistent formulation of holographic gravity. The future looks exciting for finding the connections between gravity and entanglement!

Since our recent trip to Vermont, SteelyKid has been obsessed with building blanket forts. These have mostly been in the living room, leading to a bit of angst at the end of the day when we need the blankets back. So i did a little reorganizing in the basement, and dug some sheets out of storage, allowing the construction of a longer-duration fort.

This covers the basement comprehensively enough that it’s probably a little hard to appreciate how much stuff is inside. The critical thing, though, is that SteelyKid is satisfied with the result:

Dun Wang, Steven Mohammed (Columbia), David Schiminovich and I met to discuss *GALEX*. Wang has absolutely beautiful images of the *GALEX* flat, and he can possibly separate the flat appropriate for stars from the flat appropriate for background photons. We realized we might need some robust estimation to deal with transient reflections from bright stars.

Matthew Penny (OSU) showed up and distracted us onto *K2* matters; Penny is involved in our efforts to deliver photometry from the crowded fields of *K2 Campaign 9* in the Milky Way bulge. Wang showed his *CPM*-based prediction of the crowded field in *K2C0* test data, where he has an absolutely beautiful time-domain image model. This is like difference imaging, except that the prediction is made not from a master image, but from the time-domain behavior of other (spatially separated) pixels. The variable stars and asteroids stick out dramatically. So I think we are close to having a plan.
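
The idea of predicting one pixel's light curve from other, spatially separated pixels can be sketched as a plain least-squares regression on synthetic data. This is a toy version only; the real CPM adds regularization, train/test separation, and careful selection of predictor pixels:

```python
# Toy sketch of a causal-pixel-model-style prediction: model one
# pixel's time series as a linear combination of other (spatially
# separated) pixels' time series, then inspect the residuals.
# Synthetic data; the real CPM is considerably more careful.
import numpy as np

rng = np.random.default_rng(0)
n_times, n_predictors = 500, 20

# Shared systematic trend (e.g. pointing drift) seen by all pixels,
# each with its own amplitude, plus per-pixel noise.
systematic = np.sin(np.linspace(0.0, 20.0, n_times))
predictors = (systematic[:, None] * rng.uniform(0.5, 2.0, n_predictors)
              + 0.05 * rng.standard_normal((n_times, n_predictors)))
target = 3.0 * systematic + 0.05 * rng.standard_normal(n_times)

# Least-squares fit: predict the target pixel from the other pixels.
coeffs, *_ = np.linalg.lstsq(predictors, target, rcond=None)
prediction = predictors @ coeffs
residual = target - prediction

print(f"RMS before: {target.std():.3f}, RMS after: {residual.std():.3f}")
```

Because the systematic trend is shared across pixels while an astrophysical signal is not, subtracting the prediction removes the systematics and leaves variables and transients standing out in the residuals.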

A day of mainly writing: Alex Malz nearly finished a NASA graduate fellowship proposal; I put comments on pages from Dun Wang's CPM paper; and I closed issues open on my MCMC tutorial. I had a long discussion with Tony Butler-Yeoman (Wellington) and Marcus Frean (Wellington) about our *Oddity* method for detecting anomalies (like astronomical sources) in imaging. They asked me two very good questions about writing for astronomers: How do you demonstrate to astronomers that this is a useful method that they want to try—with a few good examples or a large-scale statistical test? And how do you write a methods paper in astrophysics?

On the latter, I advised our new methods-paper template, which is this: Introduction, then a full statement of all of the assumptions underlying the method. Then a demonstration that the method is best or very good under those assumptions (using fake data or analytical arguments). Then a demonstration that the method is okay on real data. Then a discussion, in which the assumptions are addressed, one by one: This permits discussion of advantages, disadvantages, limitations, and places where improvements are possible. The key idea of all this is that a good method should be *the best possible method under some set of identifiable assumptions.* I don't think that's too much to ask of a method (and yet it is not true of most of the things the data-analysis community does these days).

I’ve been remiss in my self-promotional duties, but I’m giving a public lecture tomorrow night in Newport News, VA, as part of the Jefferson Lab Science Series. This will be my traditional “What Every Dog Should Know About Quantum Physics” talk, with the sad addition of a slide honoring the late, great Queen of Niskayuna (visible as the “featured image” with this post). This isn’t the first dog-physics talk I’ve given since her death in December, but the previous one was the relativity talk, which has less Emmy-specific content. This one includes one of the video clips I made around a dog dialogue from the book:

That’s going to be a little hard to watch…

The talk was also written up in the local paper down there, which is always nice. And I’m looking forward to getting a tour of the JLab facilities. Assuming, of course, that the airline actually gets me there, which is no sure thing. Though my change of planes in Charlotte might be less of a problem than anticipated– I was fully expecting the Panthers to win big in the Super Bowl last night, and have their team plane come in at the same time as my flight…

Anyway, if you’re in that part of the world and free for the evening, stop by and hear some dog physics.

The first “official” post of this Polymath project has passed 100 comments, so I think it is time to write a second post. Again I will try to extract some of the useful information from the comments (but not all, and my choice of what to include should not be taken as some kind of judgment). A good way of organizing this post seems to be to list a few more methods of construction of interesting union-closed systems that have come up since the last post — where “interesting” ideally means that the system is a counterexample to a conjecture that is not obviously false.

If $\mathcal{A}$ is a union-closed family on a ground set $X$, and $Y \subseteq X$, then we can take the family $\{A \cap Y : A \in \mathcal{A}\}$. The map $A \mapsto A \cap Y$ is a homomorphism (in the sense that $(A \cup B) \cap Y = (A \cap Y) \cup (B \cap Y)$), so it makes sense to regard $\{A \cap Y : A \in \mathcal{A}\}$ as a quotient of $\mathcal{A}$.

If instead we take an equivalence relation on $X$, we can define a set-system to be the set of all unions of equivalence classes that belong to $\mathcal{A}$.

Thus, subsets of $X$ give quotient families and quotient sets of $X$ give subfamilies.

Possibly the most obvious product construction of two families $\mathcal{A}$ and $\mathcal{B}$ is to make their ground sets disjoint and then to take $\{A \cup B : A \in \mathcal{A}, B \in \mathcal{B}\}$. (This is the special case with disjoint ground sets of the construction that Tom Eccles discussed earlier.)

Note that we could define this product slightly differently by saying that it consists of all pairs $(A, B)$ with the “union” operation $(A, B) \vee (A', B') = (A \cup A', B \cup B')$. This gives an algebraic system called a join semilattice, and it is isomorphic in an obvious sense to the product family with ordinary unions. Looked at this way, it is not so obvious how one should define abundances, because the set of pairs does not have a ground set. Of course, we can define them via the isomorphism, but it would be nice to do so more intrinsically.
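These constructions are easy to play with on a computer. A minimal Python sketch (the helper names are mine, not standard notation) checks union-closure and illustrates the quotient and product constructions on toy families:

```python
def is_union_closed(family):
    """The union of any two member sets must again be a member."""
    return all(a | b in family for a in family for b in family)

def restrict(family, Y):
    """'Quotient' by a subset Y of the ground set: the map A -> A intersect Y."""
    return {a & Y for a in family}

def product_family(fam1, fam2):
    """Product on disjoint ground sets: all unions A | B."""
    return {a | b for a in fam1 for b in fam2}

A = {frozenset(), frozenset({1}), frozenset({1, 2})}
B = {frozenset({'x'}), frozenset({'x', 'y'})}

assert is_union_closed(A) and is_union_closed(B)
assert is_union_closed(restrict(A, frozenset({2})))   # quotients stay union-closed
assert is_union_closed(product_family(A, B))          # so do products
```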

Tobias Fritz, in this comment, defines a more general “fibre bundle” construction as follows. Let $\mathcal{B}$ be a union-closed family of sets (the “base” of the system). For each $B \in \mathcal{B}$ let $\mathcal{A}_B$ be a union-closed family (one of the “fibres”), and let the elements of the bundle consist of pairs $(B, A)$ with $A \in \mathcal{A}_B$. We would like to define a join operation on these pairs by

$(B_1, A_1) \vee (B_2, A_2) = (B_1 \cup B_2, A)$

for a suitable $A$. For that we need a bit more structure, in the form of homomorphisms $\phi_{B,B'} : \mathcal{A}_B \to \mathcal{A}_{B'}$ whenever $B \subseteq B'$. These should satisfy the obvious composition rule $\phi_{B',B''} \circ \phi_{B,B'} = \phi_{B,B''}$.

With that structure in place, we can take $A$ to be $\phi_{B_1, B_1 \cup B_2}(A_1) \cup \phi_{B_2, B_1 \cup B_2}(A_2)$, and we have something like a union-closed system. To turn it into a union-closed system one needs to find a concrete realization of this “join semilattice” as a set system with the union operation. This can be done in certain cases (see the comment thread linked to above) and quite possibly in all cases.

First, here is a simple construction that shows that Conjecture 6 from the previous post is false. That conjecture states that if you choose a random non-empty $A \in \mathcal{A}$ and then a random $x \in A$, then the average abundance of $x$ is at least 1/2. It never seemed likely to be true, but it survived for a surprisingly long time, before the following example was discovered in a comment thread that starts here.

Let be a large integer and let be disjoint sets of size and . (Many details here are unimportant — for example, all that actually matters is that the sizes of the sets should increase fairly rapidly.) Now take the set system

.

To see that this is a counterexample, let us pick our random element of a random set, and then condition on the five possibilities for what that set is. I’ll do a couple of the calculations and then just state the rest. If , then its abundance is 2/3. If it is in , then its abundance is 1/2. If it is in , then the probability that it is in is , which is very small, so its abundance is very close to 1/2 (since with high probability the only three sets it belongs to are , and ). In this kind of way we get that for large enough we can make the average abundance as close as we like to

.

One thing I would like to do — or would like someone to do — is come up with a refinement of this conjecture that isn’t so obviously false. What this example demonstrates is that, because of duplication, for the conjecture to have been true, the following apparently much stronger statement would have had to be true: for each non-empty $A \in \mathcal{A}$, consider the minimum abundance of any element of $A$; then the average of this minimum over the non-empty sets $A \in \mathcal{A}$ is at least 1/2.
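To make the quantities concrete, here is a small Python sketch (the family and helper names are mine) that computes abundances and the average minimum abundance for a toy union-closed family:

```python
def abundance(family, x):
    """Fraction of the sets in the family that contain x."""
    return sum(x in s for s in family) / len(family)

def min_abundance(family, s):
    """Minimum abundance over the elements of a non-empty set s."""
    return min(abundance(family, x) for x in s)

# A small union-closed family on ground set {1, 2, 3}.
F = [frozenset({1}), frozenset({1, 2}), frozenset({1, 2, 3})]

assert abundance(F, 1) == 1.0   # FUNC wants some element with abundance >= 1/2

avg_min = sum(min_abundance(F, s) for s in F) / len(F)
print(avg_min)   # (1 + 2/3 + 1/3) / 3 = 2/3
```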

How can we convert the average over the elements of $A$ into the minimum over the elements of $A$? The answer is simple: take the original set system and write the elements of the ground set in decreasing order of abundance. Now duplicate the first element (that is, the element with greatest abundance) once, the second element $K$ times, the third $K^2$ times, and so on. For very large $K$, the effect of this is that if we choose a random element of a set $A$ (after the duplications have taken place) then it will have minimal abundance in $A$.

So it seems that duplication of elements kills off this averaging argument too, but in a slightly subtler way. Could we somehow iterate this thought? For example, could we choose a random $x$ by first picking a random non-empty $A \in \mathcal{A}$, then a random $B \in \mathcal{A}$ such that $B \subseteq A$, and finally a random element $x \in B$? And could we go further — e.g., picking a random chain of the form $A \supseteq B \supseteq C$, etc., and stopping when we reach a set whose points cannot be separated further?

Tobias Fritz came up with a nice strengthening that again turned out (again as expected) to be false. The thought was that it might be nice to find a “bijective” proof of FUNC. Defining $\mathcal{A}_x$ to be $\{A \in \mathcal{A} : x \in A\}$ and $\mathcal{A}_{\bar{x}}$ to be $\{A \in \mathcal{A} : x \notin A\}$, we would prove FUNC for $\mathcal{A}$ if, for some $x$, we could find an injection from $\mathcal{A}_{\bar{x}}$ to $\mathcal{A}_x$.

For such an argument to qualify as a proper bijective proof, it is not enough merely to establish the existence of an injection — that follows from FUNC on mere grounds of cardinality. Rather, one should define it in a nice way somehow. That makes it natural to think about what properties such an injection might have, and a particularly natural requirement that one might think about is that it should preserve unions.
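For very small families one can brute-force the search for union-preserving injections. A Python sketch (function names mine; feasible only for tiny examples, since it literally tries every injection):

```python
from itertools import permutations

def union_preserving_injections(domain, codomain):
    """Brute force: keep injections f with f(A | B) == f(A) | f(B).

    Assumes the domain is union-closed, so A | B is always a valid key."""
    domain = sorted(domain, key=lambda s: sorted(s))  # fix an order on the domain
    found = []
    for image in permutations(codomain, len(domain)):
        f = dict(zip(domain, image))
        if all(f[a | b] == f[a] | f[b] for a in domain for b in domain):
            found.append(f)
    return found

# The power set of {1, 2} is union-closed; split it around x = 1.
F = {frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2})}
x = 1
Ax = [s for s in F if x in s]         # sets containing x
Axbar = [s for s in F if x not in s]  # sets avoiding x (also union-closed)

maps = union_preserving_injections(Axbar, Ax)
print(len(maps))  # exactly one union-preserving injection in this tiny case
```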

It turns out that there are set systems for which there does not exist any $x$ with a union-preserving injection from $\mathcal{A}_{\bar{x}}$ to $\mathcal{A}_x$. After several failed attempts, I found the following example. Take a not too small pair of positive integers — it looks as though 3 and 5 work. Then take a Steiner $(3,5)$-system $\mathcal{S}$ — that is, a collection of sets of size 5 such that each set of size 3 is contained in exactly one set from $\mathcal{S}$. (Work of Peter Keevash guarantees that such a set system exists, though this case was known before his amazing result.)

The counterexample is generated by all complements of sets in , though it is more convenient just to take and prove that there is no intersection-preserving injection from to . To establish this, one first proves that any such injection would have to take sets of size to sets of size , which is basically because you need room for all the subsets of size of a set to map to distinct subsets of the image of . Once that is established, it is fairly straightforward to show that there just isn’t room to do things. The argument can be found in the comment linked to above, and the thread below it.

Thomas Bloom came up with a simpler example, which is interesting for other reasons too. His example is generated by the sets , all -subsets of , and the 6 sets , , , , , . I asked him where this set system had come from, and the answer turned out to be very interesting. He had got it by staring at an example of Renaud and Sarvate of a union-closed set system with exactly one minimal-sized set, which has size 3, such that that minimal set contains no element of abundance at least 1/2. Thomas worked out how the Renaud–Sarvate example had been pieced together, and used similar ideas to produce his example. Tobias Fritz then went on to show that Thomas’s construction was a special case of his fibre-bundle construction.

This post is by no means a comprehensive account of all the potentially interesting ideas from the last post. For example, Gil Kalai has an interesting slant on the conjecture that I think should be pursued further, and there are a number of interesting questions that were asked in the previous comment thread that I have not repeated here, mainly because the post has taken a long time to write and I think it is time to post it.

As I've said, DFT proves that the electron density as a function of position contains basically *all* the information about the ground state (very cool and very non-obvious). DFT has become of enormous practical use because one can use simple *noninteracting* electronic states plus the right functional (which unfortunately we can't write down in simple, easy-to-compute closed form, but we can choose various approximations) to find (a very good approximation to) the true, interacting density.

So, what's the problem, beyond the obvious issues of computing efficiency and the fact that we don't know how to write down an exact form for the exchange-correlation part of the functional (basically where all the bodies are buried)?

Well, the noninteracting states that people like to use, the so-called Kohn-Sham orbitals, are seductive. It's easy to think of them as if they are "real", meaning that it's very tempting to start using them to think about *excited* states and where the electrons "really" live in those states, even though technically there is no *a priori* reason that they should be valid except as a tool to find the ground state density. This is discussed a bit in the comments here. This isn't a completely crazy idea, in the sense that the Kohn-Sham states *usually* have the right symmetries and in molecules tend to agree well with chemistry ideas about where reactions tend to occur, etc. However, there are no guarantees.

There are many approaches to do better (e.g., some statements that can be made about the lowest unoccupied orbital that let you determine not just the ground state energy but get a quantitative estimate of the gap to the lowest electronic excited state, and that has enabled very good computations of energy gaps in molecules and solids; time-dependent DFT, which looks at the general time-dependent electron density). However, you have to be very careful. Perhaps commenters will have some insights here.

The bottom line: DFT is intellectually deep, a boon to many practical calculations when implemented correctly, and so good at many things that the temptation is to treat it like a black box (especially as there are more and more simple-to-use commercial implementations) and assume it's good at everything. It remains an impressive achievement with huge scientific impact, and unless there are major advances in other computational approaches, DFT and its relatives are likely the best bet for achieving the long-desired ability to do "materials by design".

The Laser Interferometer Gravitational-Wave Observatory or LIGO is designed to detect gravitational waves—ripples of curvature in spacetime moving at the speed of light. It’s recently been upgraded, and it will either find gravitational waves soon or something really strange is going on.

Rumors are swirling that LIGO has seen gravitational waves produced by two black holes, of 29 and 36 solar masses, spiralling towards each other—and then colliding to form a single 62-solar-mass black hole!

You’ll notice that 29 + 36 is more than 62. So, it’s possible that *three solar masses were turned into energy, mostly in the form of gravitational waves!*
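A quick back-of-envelope check of what three solar masses of energy amounts to, using $E = mc^2$ with rounded constants:

```python
# Back-of-envelope: three solar masses converted into gravitational-wave energy.
M_SUN = 1.989e30   # kg, nominal solar mass
C = 2.998e8        # m/s, speed of light

energy_joules = 3 * M_SUN * C**2   # E = mc^2
print(f"{energy_joules:.2e} J")    # ~5.4e47 J
```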

According to these rumors, the statistical significance of the signal is supposedly very high: better than 5 sigma! That means there’s at most a 0.000057% probability this event is a random fluke – assuming nobody made a mistake.
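That percentage is just the two-sided tail probability of a 5-sigma fluctuation of a standard Gaussian, which is easy to check:

```python
import math

# Two-sided tail probability of a 5-sigma fluctuation of a standard Gaussian:
# P(|Z| > 5) = erfc(5 / sqrt(2)).
p_fluke = math.erfc(5 / math.sqrt(2))
print(f"{100 * p_fluke:.6f}%")   # 0.000057%
```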

If these rumors are correct, we should soon see an official announcement. If the discovery holds up, someone will win a Nobel prize.

The discovery of gravitational waves is completely unsurprising, since they’re predicted by general relativity, a theory that’s passed many tests already. But it would open up a new window to the universe – and we’re likely to see interesting new things, once gravitational wave astronomy becomes a thing.

Here’s the tweet that launched the latest round of rumors:

For background on this story, try this:

• Tale of a doomed galaxy, *Azimuth*, 8 November 2015.

The first four sections of that long post discuss gravitational waves created by black hole collisions—but the last section is about LIGO and an earlier round of rumors, so I’ll quote it here!

LIGO stands for Laser Interferometer Gravitational Wave Observatory. The idea is simple. You shine a laser beam down two very long tubes and let it bounce back and forth between mirrors at the ends. You use this to compare the lengths of these tubes. When a gravitational wave comes by, it stretches space in one direction and squashes it in another direction. So, we can detect it.

Sounds easy, eh? Not when you run the numbers! We’re trying to see gravitational waves that stretch space just a tiny bit: about one part in 10^{23}. At LIGO, the tubes are 4 kilometers long. So, we need to see their length change by an absurdly small amount: *one-thousandth the diameter of a proton!*

It’s amazing to me that people can even contemplate doing this, much less succeed. They use lots of tricks:

• They bounce the light back and forth many times, effectively increasing the length of the tubes to 1800 kilometers.

• There’s no air in the tubes—just a very good vacuum.

• They hang the mirrors on quartz fibers, making each mirror part of a pendulum with very little friction. This means it vibrates very *well* at one particular frequency, and very *badly* at frequencies far from that. This damps out the shaking of the ground, which is a real problem.

• This pendulum is hung on another pendulum.

• That pendulum is hung on a third pendulum.

• That pendulum is hung on a fourth pendulum.

• The whole chain of pendulums is sitting on a device that detects vibrations and moves in a way to counteract them, sort of like noise-cancelling headphones.

• There are 2 of these facilities, one in Livingston, Louisiana and another in Hanford, Washington. Only if *both* detect a gravitational wave do we get excited.

I visited the LIGO facility in Louisiana in 2006. It was really cool! Back then, the sensitivity was good enough to see collisions of black holes and neutron stars up to 50 million light years away.

Here I’m not talking about the supermassive black holes that live in the centers of galaxies. I’m talking about the much more common black holes and neutron stars that form when stars go supernova. Sometimes a *pair* of stars orbiting each other will both blow up, and form *two* black holes—or two neutron stars, or a black hole and neutron star. And eventually these will spiral into each other and emit lots of gravitational waves right before they collide.

50 million light years is big enough that LIGO could see about half the galaxies in the Virgo Cluster. Unfortunately, with that many galaxies, we only expect to see one neutron star collision every 50 years or so.

They never saw anything. So they kept improving the machines, and now we’ve got Advanced LIGO! This should now be able to see collisions up to 225 million light years away… and after a while, three times further.

They turned it on September 18th. Soon we should see more than one gravitational wave burst each year.
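The arithmetic behind that expectation: the reachable volume grows as the cube of the range, so going from 50 to 225 million light years multiplies the naive event rate accordingly (a quick check, using the round numbers quoted in the text):

```python
# Naive scaling: the event rate grows with the surveyed volume, i.e. as range^3.
initial_range_mly = 50           # initial LIGO reach, in million light years
advanced_range_mly = 225         # Advanced LIGO reach
initial_rate_per_year = 1 / 50   # ~1 neutron-star merger per 50 years at 50 Mly

volume_factor = (advanced_range_mly / initial_range_mly) ** 3
advanced_rate_per_year = initial_rate_per_year * volume_factor
print(volume_factor, advanced_rate_per_year)   # ~91x the volume, ~1.8 events/year
```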

In fact, there’s a rumor that they’ve already seen one! But they’re still testing the device, and there’s a team whose job is to inject *fake signals*, just to see if they’re detected. Davide Castelvecchi writes:

LIGO is almost unique among physics experiments in practising ‘blind injection’. A team of three collaboration members has the ability to simulate a detection by using actuators to move the mirrors. “Only they know if, and when, a certain type of signal has been injected,” says Laura Cadonati, a physicist at the Georgia Institute of Technology in Atlanta who leads the Advanced LIGO’s data-analysis team.

Two such exercises took place during earlier science runs of LIGO, one in 2007 and one in 2010. Harry Collins, a sociologist of science at Cardiff University, UK, was there to document them (and has written books about it). He says that the exercises can be valuable for rehearsing the analysis techniques that will be needed when a real event occurs. But the practice can also be a drain on the team’s energies. “Analysing one of these events can be enormously time consuming,” he says. “At some point, it damages their home life.”

The original blind-injection exercises took 18 months and 6 months respectively. The first one was discarded, but in the second case, the collaboration wrote a paper and held a vote to decide whether they would make an announcement. Only then did the blind-injection team ‘open the envelope’ and reveal that the events had been staged.

Aargh! The disappointment would be crushing.

But with luck, Advanced LIGO will soon detect real gravitational waves. And I hope life here in the Milky Way thrives for a long time – so that when the gravitational waves from the doomed galaxy PG 1302-102 reach us, hundreds of thousands of years in the future, we can study them in exquisite detail.

For Castelvecchi’s whole story, see:

• Davide Castelvecchi, Has giant LIGO experiment seen gravitational waves?, *Nature*, 30 September 2015.

For pictures of my visit to LIGO, see:

• John Baez, This week’s finds in mathematical physics (week 241), 20 November 2006.

For how Advanced LIGO works, see:

• The LIGO Scientific Collaboration, Advanced LIGO, 17 November 2014.

In my previous post, I linked to seven *Closer to Truth* videos of me spouting about free will, Gödel’s Theorem, black holes, etc. etc. I also mentioned that there was a segment of me talking about why the universe exists that for some reason they didn’t put up. Commenter mjgeddes wrote, “Would have liked to hear your views on the existence of the universe question,” so I answered in another comment.

But then I thought about it some more, and it seemed inappropriate to me that my considered statement about why the universe exists should only be available as part of a *comment thread* on my blog. At the very least, I thought, such a thing ought to be a top-level post.

So, without further ado:

My view is that, if we want to make mental peace with the “Why does the universe exist?” question, the key thing we need to do is forget about the universe for a while, and just focus on the meaning of the word “why.” I.e., when we ask a why-question, what kind of answer are we looking for, what kind of answer would make us happy?

Notice, in particular, that there are hundreds of other why-questions, not nearly as prestigious as the universe one, yet that seem just as vertiginously unanswerable. E.g., why is 5 a prime number? Why does “cat” have 3 letters?

Now, the best account of “why”—and of explanation and causality—that I know about is the *interventionist* account, as developed for example in Judea Pearl’s work. In that account, to ask “Why is X true?” is simply to ask: “What could we have changed in order to make X false?” I.e., in the causal network of reality, what are the levers that turn X on or off?

This question can sometimes make sense even in pure math. For example: “Why is this theorem true?” “It’s true only because we’re working over the complex numbers. The analogous statement about real numbers is false.” A perfectly good interventionist answer.

On the other hand, in the case of “Why is 5 prime?,” all the levers you could pull to make 5 composite involve significantly more advanced machinery than is needed to pose the question in the first place. E.g., “5 is prime because we’re working over the ring of integers. Over other rings, like Z[√5], it admits nontrivial factorizations.” Not really an explanation that would satisfy a four-year-old (or me, for that matter).

And then we come to the question of why anything exists. For an interventionist, this translates into: what causal lever could have been pulled in order to make nothing exist? Well, whatever lever it was, presumably the lever itself was *something*—and so you see the problem right there.

Admittedly, suppose there were a giant red button, somewhere within the universe, that when pushed would cause the entire universe (including the button itself) to blink out of existence. In that case, we could say: the reason why the universe *continues* to exist is that no one has pushed the button yet. But even then, that still wouldn’t explain why the universe *had* existed.

Over at Marvel, I chatted with actor Reggie Austin (Dr. Jason Wilkes on Agent Carter) some more about the physics I helped embed in the show this season. It was fun. (See an earlier chat here.) This was about Zero Matter itself (which will also be a precursor to things seen in the movie Dr. Strange later this year)... It was one of the first things the writers asked me about when I first met them, and we brainstormed about things like what it should be called (the name "dark force" comes later in Marvel history), and how a scientist who encountered it would contain it. This got me thinking about things like perfect fluids, plasma physics, exotic phases of materials, magnetic fields, and the like (sadly the interview skips a lot of what I said about those)... and to the writers' and show-runners' enormous credit, lots of these concepts were allowed to appear in the show in various ways, including (versions of) two containment designs that I sketched out. Anyway, have a look in the embed below.

Oh! The name. We did not settle on a name after the first meeting, but one of [...]

The post On Zero Matter appeared first on Asymptotia.

It’s been a long time since you’ve seen an installment of the information geometry series on this blog! Before I took a long break, I was explaining relative entropy and how it changes in evolutionary games. Much of what I said is summarized and carried further here:

• John Baez and Blake Pollard, Relative entropy in biological systems. (Blog article here.)

But now Blake has a new paper, and I want to talk about that:

• Blake Pollard, Open Markov processes: a compositional perspective on non-equilibrium steady states in biology, to appear in *Open Systems and Information Dynamics*.

I’ll focus on just one aspect: the principle of minimum entropy production. This is an exciting yet controversial principle in non-equilibrium thermodynamics. Blake examines it in a situation where we can tell exactly what’s happening.

Life exists away from equilibrium. Left isolated, systems will tend toward thermodynamic equilibrium. However, biology is about **open systems**: physical systems that exchange matter or energy with their surroundings. Open systems can be maintained away from equilibrium by this exchange. This leads to the idea of a **non-equilibrium steady state**—a state of an open system that doesn’t change, but is not in equilibrium.

A simple example is a pan of water sitting on a stove. Heat passes from the flame to the water and then to the air above. If the flame is very low, the water doesn’t boil and nothing moves. So, we have a steady state, at least approximately. But this is not an equilibrium, because there is a constant flow of energy through the water.

Of course in reality the water will be slowly evaporating, so we don’t really have a steady state. As always, models are approximations. If the water is evaporating slowly enough, it can be useful to approximate the situation with a non-equilibrium steady state.

There is much more to biology than steady states. However, to dip our toe into the chilly waters of non-equilibrium thermodynamics, it is nice to start with steady states. And already here there are puzzles left to solve.

Ilya Prigogine won the Nobel prize for his work on non-equilibrium thermodynamics. One reason is that he had an interesting idea about steady states. He claimed that under certain conditions, a non-equilibrium steady state will *minimize entropy production!*

There has been a lot of work trying to make the ‘principle of minimum entropy production’ precise and turn it into a theorem. In this book:

• G. Lebon and D. Jou, *Understanding Non-equilibrium Thermodynamics*, Springer, Berlin, 2008.

the authors give an argument for the principle of minimum entropy production based on four conditions:

• **time-independent boundary conditions**: the surroundings of the system don’t change with time.

• **linear phenomenological laws**: the laws governing the macroscopic behavior of the system are linear.

• **constant phenomenological coefficients**: the laws governing the macroscopic behavior of the system don’t change with time.

• **symmetry of the phenomenological coefficients**: since they are linear, the laws governing the macroscopic behavior of the system can be described by a linear operator, and we demand that in a suitable basis the matrix for this operator is symmetric.

The last condition is obviously the subtlest one; it’s sometimes called **Onsager reciprocity**, and people have spent a lot of time trying to derive it from other conditions.

However, Blake goes in a different direction. He considers a concrete class of open systems, a very large class called ‘open Markov processes’. These systems obey the first three conditions listed above, and the ‘detailed balanced’ open Markov processes also obey the last one. But Blake shows that minimum entropy production holds only approximately—with the approximation being good for steady states that are *near equilibrium!*

However, he shows that another minimum principle holds exactly, even for steady states that are far from equilibrium. He calls this the ‘principle of minimum dissipation’.

We actually discussed the principle of minimum dissipation in an earlier paper:

• John Baez, Brendan Fong and Blake Pollard, A compositional framework for Markov processes. (Blog article here.)

But one advantage of Blake’s new paper is that it presents the results with a minimum of category theory. Of course I love category theory, and I think it’s the right way to formalize open systems, but it can be intimidating.

Another good thing about Blake’s new paper is that it explicitly compares the principle of minimum entropy production to the principle of minimum dissipation. He shows they agree in a certain limit—namely, the limit where the system is close to equilibrium.

Let me explain this. I won’t include the nice example from biology that Blake discusses: a very simple model of membrane transport. For that, read his paper! I’ll just give the general results.

An **open Markov process** consists of a finite set $V$ of **states**, a subset $B \subseteq V$ of **boundary states**, and an **infinitesimal stochastic** operator $H : \mathbb{R}^V \to \mathbb{R}^V$, meaning a linear operator with

$H_{ij} \geq 0$ for all $i \neq j$

and

$\sum_{i \in V} H_{ij} = 0$ for all $j \in V$.

I’ll explain these two conditions in a minute.

For each $i \in V$ we introduce a **population** $p_i(t) \in [0,\infty)$. We call the resulting function $p : V \to [0,\infty)$ the **population distribution**. Populations evolve in time according to the **open master equation**:

$\displaystyle \frac{dp_i}{dt} = \sum_{j \in V} H_{ij} p_j$ for all $i \in V \setminus B$,

$p_i(t) = b_i(t)$ for all $i \in B$.

So, the populations obey a linear differential equation at states that are not in the boundary, but they are specified ‘by the user’ to be chosen functions at the boundary states.

The off-diagonal entries $H_{ij}$, $i \neq j$, are the rates at which population hops from the $j$th to the $i$th state. This lets us understand the definition of an infinitesimal stochastic operator. The first condition:

$H_{ij} \geq 0$ for all $i \neq j$

says that the rate for population to transition from one state to another is non-negative. The second:

$\sum_{i \in V} H_{ij} = 0$ for all $j \in V$

says that population is conserved, at least if there are no boundary states. Population can flow in or out at boundary states, since the master equation doesn’t hold there.
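Here is a toy numerical sketch of an open Markov process (the rate matrix is invented for illustration): we Euler-integrate the open master equation at the interior states while holding the boundary population fixed, and the system settles into a non-equilibrium steady state.

```python
import numpy as np

# A toy 3-state open Markov process; state 0 is the single boundary state.
# H is infinitesimal stochastic: off-diagonal entries >= 0, columns sum to 0.
H = np.array([[-1.0,  0.5,  0.2],
              [ 1.0, -0.5,  0.3],
              [ 0.0,  0.0, -0.5]])
assert np.allclose(H.sum(axis=0), 0.0)

interior = [1, 2]
p = np.array([1.0, 0.0, 0.0])   # initial population distribution
b = 1.0                          # boundary population, chosen "by the user"

# Euler-integrate: dp_i/dt = sum_j H_ij p_j at interior states,
# while the boundary population is simply imposed.
dt = 0.01
for _ in range(10_000):
    dp = H @ p
    p[interior] += dt * dp[interior]
    p[0] = b

print(p)   # settles into a steady state maintained by the boundary
```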

A **steady state** is a solution of the open master equation that does not change with time. A steady state for a closed Markov process is typically called an **equilibrium**. So, an equilibrium obeys the master equation at all states, while for a steady state this may not be true at the boundary states. Again, the reason is that population can flow in or out at the boundary.

We say an equilibrium $q : V \to [0,\infty)$ of a Markov process is **detailed balanced** if the rate at which population flows from the $i$th state to the $j$th state is equal to the rate at which it flows from the $j$th state to the $i$th:

$H_{ji} q_i = H_{ij} q_j$ for all $i, j \in V$.

Suppose we’ve got an open Markov process that has a detailed balanced equilibrium $q$. Then a non-equilibrium steady state $p$ will minimize a function called the ‘dissipation’, subject to constraints on its boundary populations. There’s a nice formula for the dissipation in terms of $p$ and $q$.

**Definition.** Given an open Markov process with detailed balanced equilibrium $q$, we define the **dissipation** for a population distribution $p$ to be

$\displaystyle D(p) = \frac{1}{2} \sum_{i,j \in V} H_{ij} q_j \left( \frac{p_i}{q_i} - \frac{p_j}{q_j} \right)^2.$

This formula is a bit tricky, but you’ll notice it’s quadratic in $p$ and it vanishes when $p = q$. So, it’s pretty nice.

Using this concept we can formulate a principle of minimum dissipation, and prove that non-equilibrium steady states obey this principle:

**Definition.** We say a population distribution $p$ obeys the **principle of minimum dissipation** with boundary population $b$ if $p$ minimizes $D(p)$ subject to the constraint that $p_i = b_i$ for all $i \in B$.

**Theorem 1.** A population distribution $p$ is a steady state with $p_i = b_i$ for all boundary states $i \in B$ if and only if $p$ obeys the principle of minimum dissipation with boundary population $b$.

**Proof**. This follows from Theorem 28 in A compositional framework for Markov processes.
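We can also check Theorem 1 numerically on a toy example (all rates and populations invented): build a detailed balanced rate matrix from symmetric ‘conductances’, fix two boundary populations, and verify that the steady-state interior population is exactly the minimizer of the dissipation.

```python
import numpy as np

q = np.array([1.0, 2.0, 1.0])          # detailed balanced equilibrium
# Symmetric "conductances" s_ij = H_ij q_j guarantee detailed balance.
s = np.array([[0.0, 1.0,  0.5 ],
              [1.0, 0.0,  0.25],
              [0.5, 0.25, 0.0 ]])
H = s / q                              # off-diagonal rates H_ij = s_ij / q_j
np.fill_diagonal(H, -H.sum(axis=0))    # make columns sum to zero
assert np.allclose(H @ q, 0.0)         # q really is an equilibrium

def dissipation(p):
    u = p / q
    return 0.5 * np.sum(s * np.subtract.outer(u, u) ** 2)

# Boundary states 0 and 2 are held at b0 = 1, b2 = 2; state 1 is interior.
b0, b2 = 1.0, 2.0
# Steady state at the interior: H_10 b0 + H_11 p1 + H_12 b2 = 0.
p1_steady = (H[1, 0] * b0 + H[1, 2] * b2) / -H[1, 1]

# Scan p1 and find the minimizer of the dissipation with the boundary fixed.
grid = np.linspace(0, 5, 5001)
values = [dissipation(np.array([b0, p1, b2])) for p1 in grid]
p1_min = grid[int(np.argmin(values))]
print(p1_steady, p1_min)   # the two agree, as Theorem 1 predicts
```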

How does dissipation compare with entropy production? To answer this, first we must ask: what really is entropy production? And: how does the equilibrium state show up in the concept of entropy production?

The **relative entropy** of two population distributions $p, q$ is given by

$\displaystyle I(p, q) = \sum_{i \in V} p_i \ln\left( \frac{p_i}{q_i} \right).$

It is well known that for a closed Markov process with $q$ as a detailed balanced equilibrium, the relative entropy is monotonically *decreasing* with time. This is due to an annoying sign convention in the definition of relative entropy: while entropy is typically increasing, relative entropy typically decreases. We could fix this by putting a minus sign in the above formula or giving this quantity some other name. A lot of people call it the **Kullback–Leibler divergence**, but I have taken to calling it **relative information**. For more, see:
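This monotonicity is easy to see numerically. A toy sketch (rates invented), evolving a closed Markov process with a detailed balanced equilibrium and checking that the relative entropy never increases:

```python
import numpy as np

# Closed Markov process (no boundary states) with detailed balanced
# equilibrium q: choose symmetric "conductances" s_ij = H_ij q_j.
q = np.array([1.0, 2.0, 1.0])
s = np.array([[0.0, 1.0,  0.5 ],
              [1.0, 0.0,  0.25],
              [0.5, 0.25, 0.0 ]])
H = s / q                              # off-diagonal rates H_ij = s_ij / q_j
np.fill_diagonal(H, -H.sum(axis=0))    # columns sum to zero
assert np.allclose(H @ q, 0.0)         # q is an equilibrium

def relative_entropy(p, q):
    return float(np.sum(p * np.log(p / q)))

p = np.array([3.0, 0.5, 0.5])   # same total population as q, far from it
dt, steps = 0.001, 5000
entropies = []
for _ in range(steps):
    entropies.append(relative_entropy(p, q))
    p = p + dt * (H @ p)        # closed master equation, Euler step

decreasing = all(a >= b - 1e-12 for a, b in zip(entropies, entropies[1:]))
print(decreasing, entropies[0], entropies[-1])
```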

• John Baez and Blake Pollard, Relative entropy in biological systems. (Blog article here.)

We say ‘relative entropy’ in the title, but then we explain why ‘relative information’ is a better name, and use that. More importantly, we explain why $I(p, q)$ has the physical meaning of *free energy*. Free energy tends to decrease, so everything is okay. For details, see Section 4.

Blake has a nice formula for how fast $I(p(t), q)$ decreases:

**Theorem 2.** Consider an open Markov process with $V$ as its set of states and $B$ as the set of boundary states. Suppose $p(t)$ obeys the open master equation and $q$ is a detailed balanced equilibrium. For any boundary state $i \in B$, let

$$\frac{Dp_i}{Dt} = \frac{dp_i}{dt} - \sum_j H_{ij} p_j$$

measure how much $p_i$ fails to obey the master equation. Then we have

$$\frac{d}{dt} I(p(t), q) = \sum_{i,j} H_{ij} p_j \left( \ln\frac{p_i}{q_i} - \frac{p_i q_j}{p_j q_i} \right) + \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt}$$

Moreover, the first term is less than or equal to zero.

**Proof.** For a self-contained proof, see Information geometry (part 16), which is coming up soon. It will be a special case of the theorems there. █

Blake compares this result to previous work by Schnakenberg:

• J. Schnakenberg, Network theory of microscopic and macroscopic behavior of master equation systems, *Rev. Mod. Phys.* **48** (1976), 571–585.

The negative of Blake’s first term is this:

$$-\sum_{i,j} H_{ij} p_j \left( \ln\frac{p_i}{q_i} - \frac{p_i q_j}{p_j q_i} \right)$$

Under certain circumstances, this equals what Schnakenberg calls the **entropy production**. But a better name for this quantity might be **free energy loss**, since for a closed Markov process that’s exactly what it is! In this case there are no boundary states, so the theorem above says this quantity is the rate at which relative entropy—or in other words, free energy—decreases.

For an open Markov process, things are more complicated. The theorem above shows that free energy can also flow in or out at the boundary, thanks to the second term in the formula.

Anyway, the sensible thing is to compare a principle of ‘minimum free energy loss’ to the principle of minimum dissipation. The principle of minimum dissipation is true. How about the principle of minimum free energy loss? It turns out to be approximately true near equilibrium.

For this, consider the situation in which $p$ is near to the equilibrium distribution $q$ in the sense that

$$\frac{p_i}{q_i} = 1 + \epsilon_i$$

for some small numbers $\epsilon_i$. We collect these numbers in a vector called $\epsilon$.

**Theorem 3.** Consider an open Markov process with $V$ as its set of states and $B$ as the set of boundary states. Suppose $q$ is a detailed balanced equilibrium and let $p$ be arbitrary. Then

$$K(p) = D(p) + O(\epsilon^3)$$

where

$$K(p) = -\sum_{i,j} H_{ij} p_j \left( \ln\frac{p_i}{q_i} - \frac{p_i q_j}{p_j q_i} \right)$$

is the free energy loss, $D(p)$ is the dissipation, $\epsilon$ is defined as above, and by $O(\epsilon^3)$ we mean a sum of terms of order $\epsilon^3$.

**Proof**. First take the free energy loss:

$$K(p) = -\sum_{i,j} H_{ij} \left( p_j \ln\frac{p_i}{q_i} - \frac{p_i q_j}{q_i} \right)$$

Expanding the logarithm to first order in $\epsilon$ and writing $p_i = q_i (1 + \epsilon_i)$, we get

$$K(p) = -\sum_{i,j} H_{ij} q_j \Bigl( (1 + \epsilon_j)\,\epsilon_i - (1 + \epsilon_i) \Bigr) + O(\epsilon^3)$$

(since $q$ is an equilibrium, the second-order term of the logarithm only contributes at order $\epsilon^3$). Since $H$ is infinitesimal stochastic, $\sum_i H_{ij} = 0$, so the constant term in the parentheses vanishes when summed, leaving

$$K(p) = -\sum_{i,j} H_{ij} q_j \Bigl( \epsilon_i + \epsilon_i \epsilon_j - \epsilon_i \Bigr) + O(\epsilon^3)$$

or

$$K(p) = -\sum_{i,j} H_{ij} q_j \, \epsilon_i \epsilon_j + O(\epsilon^3)$$

Next, take the dissipation

$$D(p) = \frac{1}{2} \sum_{i,j} H_{ij} q_j \left( \frac{p_j}{q_j} - \frac{p_i}{q_i} \right)^2 = \frac{1}{2} \sum_{i,j} H_{ij} q_j \, (\epsilon_j - \epsilon_i)^2$$

and expand the square, getting

$$D(p) = \frac{1}{2} \sum_{i,j} H_{ij} q_j \left( \epsilon_j^2 - 2 \epsilon_i \epsilon_j + \epsilon_i^2 \right)$$

Since $H$ is infinitesimal stochastic, $\sum_i H_{ij} = 0$. The first term is just this times a function of $j$, summed over $j$, so it vanishes, leaving

$$D(p) = \frac{1}{2} \sum_{i,j} H_{ij} q_j \left( -2 \epsilon_i \epsilon_j + \epsilon_i^2 \right)$$

Since $q$ is an equilibrium, $\sum_j H_{ij} q_j = 0$. The last term above is this times a function of $i$, summed over $i$, so it vanishes, leaving

$$D(p) = -\sum_{i,j} H_{ij} q_j \, \epsilon_i \epsilon_j$$

This matches what we got for $K(p)$, up to terms of order $\epsilon^3$. █

In short: detailed balanced open Markov processes are governed by the principle of minimum dissipation, not minimum entropy production. *Minimum dissipation agrees with minimum entropy production only near equilibrium.*
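Here is a numerical illustration of that last claim (my own sketch, using the dissipation formula from the definition above and the free energy loss written as the negative of the ‘internal’ term): perturbing a detailed balanced equilibrium $q$ by $t\epsilon$, the ratio of free energy loss to dissipation tends to 1 as $t \to 0$.

```python
import numpy as np

# Numerical illustration (my own sketch): near the detailed balanced
# equilibrium q, the free energy loss K(p) and the dissipation D(p)
# agree to second order in the perturbation.
q = np.array([1.0, 2.0, 3.0])
H = np.array([[-2.0,  1.0,  0.0],
              [ 2.0, -4.0,  2.0],
              [ 0.0,  3.0, -2.0]])   # detailed balanced with respect to q

def dissipation(p):
    u = p / q
    return 0.5 * np.sum(H * q * (u[None, :] - u[:, None]) ** 2)

def free_energy_loss(p):
    # K(p) = -sum_ij H_ij ( p_j ln(p_i/q_i) - q_j p_i / q_i )
    L = np.log(p / q)
    return -np.sum(H * p[None, :] * L[:, None]
                   - H * q[None, :] * (p / q)[:, None])

eps = np.array([0.3, -0.2, 0.1])
for t in (0.1, 0.01, 0.001):
    p = q * (1 + t * eps)
    K, D = free_energy_loss(p), dissipation(p)
    print(t, K / D)   # the ratio tends to 1 as t -> 0
```

Away from equilibrium (large $t$) the two quantities visibly disagree, which is the whole point of the comparison.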


*joint with Blake Pollard*

Lately we’ve been thinking about open Markov processes. These are random processes where something can hop randomly from one state to another (that’s the ‘Markov process’ part) but also enter or leave the system (that’s the ‘open’ part).

The ultimate goal is to understand the nonequilibrium thermodynamics of open systems—systems where energy and maybe matter flows in and out. If we could understand this well enough, we could understand in detail how *life* works. That’s a difficult job! But one has to start somewhere, and this is one place to start.

We have a few papers on this subject:

• Blake Pollard, A Second Law for open Markov processes. (Blog article here.)

• John Baez, Brendan Fong and Blake Pollard, A compositional framework for Markov processes. (Blog article here.)

• Blake Pollard, Open Markov processes: A compositional perspective on non-equilibrium steady states in biology. (Blog article here.)

However, right now we just want to show you three closely connected results about how relative entropy changes in open Markov processes.

An **open Markov process** consists of a finite set $V$ of **states**, a subset $B \subseteq V$ of **boundary states**, and an **infinitesimal stochastic** operator $H : \mathbb{R}^V \to \mathbb{R}^V$, meaning a linear operator with

$$H_{ij} \ge 0 \quad \textrm{for all } i \ne j$$

and

$$\sum_i H_{ij} = 0 \quad \textrm{for all } j$$

For each state $i \in V$ we introduce a **population** $p_i \in [0, \infty)$. We call the resulting function $p : V \to [0, \infty)$ the **population distribution**.

Populations evolve in time according to the **open master equation**:

$$\frac{dp_i}{dt} = \sum_j H_{ij} p_j \quad \textrm{for } i \in V \setminus B$$

$$p_i(t) = b_i(t) \quad \textrm{for } i \in B$$

So, the populations obey a linear differential equation at states that are not in the boundary, but they are specified ‘by the user’ to be chosen functions $b_i(t)$ at the boundary states. The off-diagonal entry $H_{ij}$ for $i \ne j$ describes the rate at which population transitions from the $j$th to the $i$th state.

A **closed Markov process**, or continuous-time discrete-state Markov chain, is an open Markov process whose boundary is empty. For a closed Markov process, the open master equation becomes the usual **master equation**:

$$\frac{dp}{dt} = H p$$

In a closed Markov process the total population is conserved:

$$\frac{d}{dt} \sum_i p_i = \sum_{i,j} H_{ij} p_j = 0$$
This lets us normalize the initial total population to 1 and have it stay equal to 1. If we do this, we can talk about *probabilities* instead of populations. In an open Markov process, population can flow in and out at the boundary states.
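As a quick illustration (a toy example of mine, not from the papers), here is the master equation integrated with Euler steps; the columns of $H$ summing to zero is exactly what keeps the total population constant:

```python
import numpy as np

# Toy closed Markov process (my own example): two states, and the
# columns of H sum to zero, which is what conserves total population
# under dp/dt = H p.
H = np.array([[-1.0,  2.0],
              [ 1.0, -2.0]])
p = np.array([0.9, 0.1])             # total population 1
dt = 0.001
for _ in range(10_000):              # Euler steps out to t = 10
    p = p + dt * (H @ p)
print(p, p.sum())                    # p tends to the equilibrium (2/3, 1/3); the sum stays 1
```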

For any pair of distinct states $i, j$, $H_{ij} p_j$ is the flow of population from $j$ to $i$. The **net flux** of population from the $j$th state to the $i$th state is the flow from $j$ to $i$ minus the flow from $i$ to $j$:

$$J_{ij} = H_{ij} p_j - H_{ji} p_i$$
A **steady state** is a solution of the open master equation that does not change with time. A steady state for a closed Markov process is typically called an **equilibrium**. So, an equilibrium obeys the master equation at all states, while for a steady state this may not be true at the boundary states. The idea is that population can flow in or out at the boundary states.

We say an equilibrium $q$ of a Markov process is **detailed balanced** if all the net fluxes vanish:

$$J_{ij} = 0 \quad \textrm{for all } i, j$$

or in other words:

$$H_{ij} q_j = H_{ji} q_i \quad \textrm{for all } i, j$$

Given two population distributions $p, q$ we can define the **relative entropy**

$$I(p, q) = \sum_i p_i \ln\left( \frac{p_i}{q_i} \right)$$

When $q$ is a detailed balanced equilibrium solution of the master equation, the relative entropy $I(p, q)$ can be seen as the ‘free energy’ of $p$. For a precise statement, see Section 4 of Relative entropy in biological systems.

The Second Law of Thermodynamics implies that the free energy of a closed system tends to decrease with time, so for *closed* Markov processes we expect $I(p(t), q)$ to be nonincreasing. And this is true! But for *open* Markov processes, free energy can flow in from outside. This is just one of several nice results about how relative entropy changes with time.
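Here is that monotonicity in a toy simulation (my own sketch): evolving $p$ under the master equation of a small closed process with detailed balanced equilibrium $q$, the relative entropy $I(p(t), q)$ never increases.

```python
import numpy as np

# Toy check (my own sketch): for a closed process with detailed
# balanced equilibrium q, I(p(t), q) is nonincreasing along the flow.
H = np.array([[-1.0,  2.0],
              [ 1.0, -2.0]])
q = np.array([2/3, 1/3])             # H q = 0, and H_ij q_j = H_ji q_i

def relative_entropy(p):
    return float(np.sum(p * np.log(p / q)))

p, dt = np.array([0.9, 0.1]), 0.001
history = [relative_entropy(p)]
for _ in range(5_000):
    p = p + dt * (H @ p)
    history.append(relative_entropy(p))
assert all(b <= a + 1e-12 for a, b in zip(history, history[1:]))
print(history[0], history[-1])       # decreases toward 0
```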

**Theorem 1.** Consider an open Markov process with $V$ as its set of states and $B$ as the set of boundary states. Suppose $p(t)$ and $q(t)$ obey the open master equation, and let the quantities

$$\frac{Dp_i}{Dt} = \frac{dp_i}{dt} - \sum_j H_{ij} p_j, \qquad \frac{Dq_i}{Dt} = \frac{dq_i}{dt} - \sum_j H_{ij} q_j$$

measure how much the time derivatives of $p_i$ and $q_i$ fail to obey the master equation. Then we have

$$\frac{d}{dt} I(p(t), q(t)) = \sum_{i,j} H_{ij} p_j \left( \ln\frac{p_i}{q_i} - \frac{p_i q_j}{p_j q_i} \right) + \sum_{i \in B} \left( \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} + \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt} \right)$$
This result separates the change in relative entropy into two parts: an ‘internal’ part and a ‘boundary’ part.

It turns out the ‘internal’ part is always less than or equal to zero. So, from Theorem 1 we can deduce a version of the Second Law of Thermodynamics for open Markov processes:

**Theorem 2.** Given the conditions of Theorem 1, we have

$$\frac{d}{dt} I(p(t), q(t)) \le \sum_{i \in B} \left( \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} + \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt} \right)$$
Intuitively, this says that free energy can only increase if it comes in from the boundary!
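Here is a crude numerical check of this inequality (my own toy example, using the notation above, with boundary populations simply held constant): at each Euler step we compare a finite-difference estimate of $dI/dt$ against the boundary term.

```python
import numpy as np

# Toy check of the Second Law inequality (my own example; boundary
# populations are simply held constant). States {0, 1, 2}, B = {0}.
H = np.array([[-2.0,  1.0,  0.0],
              [ 2.0, -4.0,  2.0],
              [ 0.0,  3.0, -2.0]])   # infinitesimal stochastic

def open_step(r, dt):
    rdot = H @ r
    rdot[0] = 0.0                    # boundary population held fixed
    return r + dt * rdot

def I(p, q):                         # relative entropy
    return float(np.sum(p * np.log(p / q)))

p = np.array([1.0, 0.5, 0.2])
q = np.array([2.0, 1.0, 1.0])
dt = 1e-4
for _ in range(2000):
    # Boundary term: dI/dp_0 * Dp_0/Dt + dI/dq_0 * Dq_0/Dt, where
    # Dp_0/Dt = dp_0/dt - (H p)_0 = -(H p)_0 since p_0 is constant.
    flow = (np.log(p[0] / q[0]) + 1) * (-(H @ p)[0]) \
         + (-p[0] / q[0]) * (-(H @ q)[0])
    p_next, q_next = open_step(p, dt), open_step(q, dt)
    dIdt = (I(p_next, q_next) - I(p, q)) / dt
    assert dIdt <= flow + 1e-6       # dI/dt bounded by the boundary flow
    p, q = p_next, q_next
print("inequality held at every step")
```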

There is another nice result that holds when is an equilibrium solution of the master equation. This idea seems to go back to Schnakenberg:

**Theorem 3.** Given the conditions of Theorem 1, suppose also that $q$ is an equilibrium solution of the master equation. Then we have

$$\frac{d}{dt} I(p(t), q) = -\frac{1}{2} \sum_{i,j} J_{ij} A_{ij} + \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt}$$

where

$$J_{ij} = H_{ij} p_j - H_{ji} p_i$$

is the **net flux** from $j$ to $i$, while

$$A_{ij} = \ln\left( \frac{p_j q_i}{p_i q_j} \right)$$

is the conjugate **thermodynamic force**.

The flux $J_{ij}$ has a nice meaning: it’s the net flow of population from $j$ to $i$. The thermodynamic force is a bit subtler, but this theorem reveals its meaning: $A_{ij}$ says how much the population *wants* to flow from $j$ to $i$.

More precisely, up to that factor of $\frac{1}{2}$, the thermodynamic force $A_{ij}$ says how much free energy loss is caused by net flux from $j$ to $i$. There’s a nice analogy here to water losing potential energy as it flows downhill due to the force of gravity.

**Proof of Theorem 1.** We begin by taking the time derivative of the relative information:

$$\frac{d}{dt} I(p(t), q(t)) = \sum_{i \in V} \left( \frac{\partial I}{\partial p_i} \frac{dp_i}{dt} + \frac{\partial I}{\partial q_i} \frac{dq_i}{dt} \right)$$

We can separate this into a sum over states $i \in V \setminus B$, for which the time derivatives of $p_i$ and $q_i$ are given by the master equation, and boundary states $i \in B$, for which they are not:

$$\frac{d}{dt} I(p(t), q(t)) = \sum_{i \in V \setminus B, \, j} \left( \frac{\partial I}{\partial p_i} H_{ij} p_j + \frac{\partial I}{\partial q_i} H_{ij} q_j \right) + \sum_{i \in B} \left( \frac{\partial I}{\partial p_i} \frac{dp_i}{dt} + \frac{\partial I}{\partial q_i} \frac{dq_i}{dt} \right)$$

For boundary states we have

$$\frac{dp_i}{dt} = \frac{Dp_i}{Dt} + \sum_j H_{ij} p_j$$

and similarly for the time derivative of $q_i$. We thus obtain

$$\frac{d}{dt} I(p(t), q(t)) = \sum_{i,j} \left( \frac{\partial I}{\partial p_i} H_{ij} p_j + \frac{\partial I}{\partial q_i} H_{ij} q_j \right) + \sum_{i \in B} \left( \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} + \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt} \right)$$

To evaluate the first sum, recall that

$$I(p, q) = \sum_i p_i \ln\left( \frac{p_i}{q_i} \right)$$

so

$$\frac{\partial I}{\partial p_i} = \ln\left( \frac{p_i}{q_i} \right) + 1, \qquad \frac{\partial I}{\partial q_i} = -\frac{p_i}{q_i}$$

Thus, we have

$$\frac{d}{dt} I(p(t), q(t)) = \sum_{i,j} \left( \left( \ln\frac{p_i}{q_i} + 1 \right) H_{ij} p_j - \frac{p_i}{q_i} H_{ij} q_j \right) + \sum_{i \in B} \left( \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} + \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt} \right)$$

We can rewrite this as

$$\frac{d}{dt} I(p(t), q(t)) = \sum_{i,j} H_{ij} p_j \left( \ln\frac{p_i}{q_i} + 1 - \frac{p_i q_j}{p_j q_i} \right) + \sum_{i \in B} \left( \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} + \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt} \right)$$

Since $H$ is infinitesimal stochastic we have $\sum_i H_{ij} = 0$, so the term involving the $1$ drops out, and we are left with

$$\frac{d}{dt} I(p(t), q(t)) = \sum_{i,j} H_{ij} p_j \left( \ln\frac{p_i}{q_i} - \frac{p_i q_j}{p_j q_i} \right) + \sum_{i \in B} \left( \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} + \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt} \right)$$

as desired. █

**Proof of Theorem 2.** Thanks to Theorem 1, to prove

$$\frac{d}{dt} I(p(t), q(t)) \le \sum_{i \in B} \left( \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} + \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt} \right)$$

it suffices to show that

$$\sum_{i,j} H_{ij} p_j \left( \ln\frac{p_i}{q_i} - \frac{p_i q_j}{p_j q_i} \right) \le 0$$

or equivalently (recalling the proof of Theorem 1):

$$\sum_{i,j} H_{ij} p_j \left( \ln\frac{p_i}{q_i} + 1 - \frac{p_i q_j}{p_j q_i} \right) \le 0$$

The last two terms on the left hand side cancel when $i = j$. Thus, if we break the sum into an $i \ne j$ part and an $i = j$ part, the left side becomes

$$\sum_{i \ne j} H_{ij} p_j \left( \ln\frac{p_i}{q_i} + 1 - \frac{p_i q_j}{p_j q_i} \right) + \sum_j H_{jj} p_j \ln\frac{p_j}{q_j}$$

Next we can use the infinitesimal stochastic property of $H$ to write $H_{jj}$ as the sum of $-H_{ij}$ over $i$ not equal to $j$, obtaining

$$\sum_{i \ne j} H_{ij} p_j \left( \ln\frac{p_i q_j}{p_j q_i} + 1 - \frac{p_i q_j}{p_j q_i} \right)$$

Since $\ln s + 1 - s \le 0$ when $s > 0$, $H_{ij} \ge 0$ when $i \ne j$, and $p_j \ge 0$ for all $j$, we conclude that this quantity is $\le 0$. █

**Proof of Theorem 3.** Now suppose also that $q$ is an equilibrium solution of the master equation. Then $Dq_i/Dt = 0$ for all states $i$, so by Theorem 1 we need to show

$$\sum_{i,j} H_{ij} p_j \left( \ln\frac{p_i}{q_i} - \frac{p_i q_j}{p_j q_i} \right) = -\frac{1}{2} \sum_{i,j} J_{ij} A_{ij}$$

We also have $\sum_j H_{ij} q_j = 0$, so the second

term in the sum at left vanishes, and it suffices to show

$$\sum_{i,j} H_{ij} p_j \ln\frac{p_i}{q_i} = -\frac{1}{2} \sum_{i,j} J_{ij} A_{ij}$$

By definition we have

$$-\frac{1}{2} \sum_{i,j} J_{ij} A_{ij} = -\frac{1}{2} \sum_{i,j} \left( H_{ij} p_j - H_{ji} p_i \right) \ln\left( \frac{p_j q_i}{p_i q_j} \right)$$

This in turn equals

$$-\frac{1}{2} \sum_{i,j} H_{ij} p_j \ln\left( \frac{p_j q_i}{p_i q_j} \right) + \frac{1}{2} \sum_{i,j} H_{ji} p_i \ln\left( \frac{p_j q_i}{p_i q_j} \right)$$

and we can switch the dummy indices $i, j$ in the second sum, obtaining

$$-\frac{1}{2} \sum_{i,j} H_{ij} p_j \ln\left( \frac{p_j q_i}{p_i q_j} \right) + \frac{1}{2} \sum_{i,j} H_{ij} p_j \ln\left( \frac{p_i q_j}{p_j q_i} \right)$$

or simply

$$\sum_{i,j} H_{ij} p_j \ln\left( \frac{p_i q_j}{p_j q_i} \right)$$

But this is

$$\sum_{i,j} H_{ij} p_j \ln\frac{q_j}{p_j} + \sum_{i,j} H_{ij} p_j \ln\frac{p_i}{q_i}$$

and the first term vanishes because $H$ is infinitesimal stochastic: $\sum_i H_{ij} = 0$. We thus have

$$-\frac{1}{2} \sum_{i,j} J_{ij} A_{ij} = \sum_{i,j} H_{ij} p_j \ln\frac{p_i}{q_i}$$

as desired. █


Category theory reduces a large chunk of math to the clever manipulation of arrows. One of the fun things about this is that you can often take a familiar mathematical construction, think of it category-theoretically, and just *turn around all the arrows* to get something new and interesting!

In math we love functions. If we have a function

$$f : X \to Y$$

we can formally turn around the arrow to think of $f$ as something going from $Y$ back to $X$. But this something is usually not a function: it’s called a ‘cofunction’. A **cofunction** from $X$ to $Y$ is simply a function from $Y$ to $X$.

Cofunctions are somewhat interesting, but they’re really just functions viewed through a looking glass, so they don’t give much new—at least, not by themselves.

The game gets more interesting if we think of functions and cofunctions as special sorts of relations. A **relation** from $X$ to $Y$ is a subset

$$R \subseteq X \times Y$$

It’s a **function** when for each $x \in X$ there’s a unique $y \in Y$ with $(x, y) \in R$. It’s a **cofunction** when for each $y \in Y$ there’s a unique $x \in X$ with $(x, y) \in R$.

Just as we can compose functions, we can compose relations. Relations have certain advantages over functions: for example, we can ‘turn around’ any relation from $X$ to $Y$ and get a relation from $Y$ to $X$.

If we turn around a function we get a cofunction, and vice versa. But we can also do other fun things: for example, since both functions and cofunctions are relations, we can compose a function and a cofunction and get a relation.

Of course, relations also have certain *disadvantages* compared to functions. But it’s utterly clear by now that the category $\mathrm{FinRel}$, where the objects are finite sets and the morphisms are relations, is very important.
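Relations, functions, and cofunctions are easy to play with concretely. Here is a tiny illustrative sketch of my own, representing a relation as a Python set of pairs:

```python
# Relations as sets of pairs (an illustrative sketch of my own).
def compose(R, S):
    """Composite of R ⊆ X×Y with S ⊆ Y×Z: a relation ⊆ X×Z."""
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}

def converse(R):
    """Turn the relation around."""
    return {(y, x) for (x, y) in R}

f = {(1, 'a'), (2, 'a'), (3, 'b')}   # a function from {1,2,3} to {'a','b'}
g = converse(f)                      # a cofunction, usually not a function
# Composing a function with its converse gives a genuine relation:
# all pairs (x, x2) with f(x) == f(x2).
print(compose(f, g))
```

Here `compose(f, g)` contains both (1, 2) and (2, 1), so it is neither a function nor a cofunction: composing a function with a cofunction really does take us outside both classes.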

So far, so good. But what happens if we take the definition of ‘relation’ and turn all the arrows around?

There are actually several things I could mean by this question, some more interesting than others. But one of them gives a very interesting new concept: the concept of ‘corelation’. And two of my students have just written a very nice paper on corelations:

• Brandon Coya and Brendan Fong, Corelations are the prop for extraspecial commutative Frobenius monoids.

Here’s why this paper is important for network theory: corelations between finite sets are exactly what we need to describe electrical circuits made of ideal conductive wires! A corelation from a finite set $X$ to a finite set $Y$ can be drawn this way:

I have drawn more wires than strictly necessary: I’ve drawn a wire between two points whenever I want current to be able to flow between them. But there’s a reason I did this: a corelation from $X$ to $Y$ simply tells us when current can flow from one point in either of these sets to any other point in these sets.

Of course circuits made solely of conductive wires are not very exciting for electrical engineers. But in an earlier paper, Brendan introduced corelations as an important stepping-stone toward more general circuits:

• John Baez and Brendan Fong, A compositional framework for passive linear circuits. (Blog article here.)

The key point is simply that you use conductive wires to connect resistors, inductors, capacitors, batteries and the like and build interesting circuits—so if you don’t fully understand the math of conductive wires, you’re limited in your ability to understand circuits in general!

In their new paper, Brendan teamed up with Brandon Coya, and they figured out all the rules obeyed by the category where the objects are finite sets and the morphisms are corelations. I’ll explain these rules later.

This sort of analysis had previously been done for $\mathrm{FinRel}$, and it turns out there’s a beautiful analogy between the two cases! Here is a chart displaying the analogy:

| Spans | Cospans |
|---|---|
| extra bicommutative bimonoids | special commutative Frobenius monoids |
| Relations | Corelations |
| extraspecial bicommutative bimonoids | extraspecial commutative Frobenius monoids |

I’m sure this will be cryptic to the nonmathematicians reading this, and even many mathematicians—but the paper explains what’s going on here.

I’ll actually say what an ‘extraspecial commutative Frobenius monoid’ is later in this post. This is a terse way of listing all the rules obeyed by corelations between finite sets—and thus, all the rules obeyed by conductive wires.

But first, let’s talk about something simpler.

Just as we can define functions as relations of a special sort, we can also define relations in terms of functions. A relation from $X$ to $Y$ is a subset

$$R \subseteq X \times Y$$

but we can think of this as an equivalence class of one-to-one functions

$$i : S \to X \times Y$$

Why an equivalence class? The image of $i$ is our desired subset of $X \times Y$. The set $S$ here could be replaced by any isomorphic set; its only role is to provide ‘names’ for the elements of $X \times Y$ that are in the image of $i$.

Now we have a relation described as an arrow, or really an equivalence class of arrows. Next, let’s turn the arrow around!

There are different things I might mean by that, but we want to do it cleverly. When we turn arrows around, the concept of product (for example, cartesian product of sets) turns into the concept of sum (for example, disjoint union of sets). Similarly, the concept of monomorphism (such as a one-to-one function) turns into the concept of epimorphism (such as an onto function). If you don’t believe me, click on the links!

So, we should define a **corelation** from a set $X$ to a set $Y$ to be an equivalence class of onto functions

$$f : X + Y \to S$$

Why an equivalence class? The set $S$ here could be replaced by any isomorphic set; its only role is to provide ‘names’ for the sets of elements of $X + Y$ that get mapped to the same thing via $f$.

In simpler terms, a corelation from a set $X$ to a set $Y$ is just a partition of the disjoint union $X + Y$. So, it looks like this:

If we like, we can then draw a line connecting any two points that lie in the same part of the partition:

These lines determine the corelation, so we can also draw a corelation this way:

This is why corelations describe circuits made solely of wires!
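Concretely, one can represent a corelation as a partition and compose corelations by gluing the two partitions along the middle set. Here is a small illustrative sketch of my own (the `('in', x)` / `('out', y)` tagging convention is invented here), using a tiny union-find:

```python
from collections import defaultdict

# A corelation from X to Y as a partition of X ⊔ Y (a sketch of my
# own; the ('in', x) / ('out', y) tagging convention is invented here).
# Composing corelations glues the partitions along the middle set with
# a tiny union-find, then forgets the middle elements.
def compose(P, Q):
    parent = {}
    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a
    # P's outputs and Q's inputs are the same middle set: retag them.
    blocks = [{('mid', e[1]) if e[0] == 'out' else e for e in b} for b in P]
    blocks += [{('mid', e[1]) if e[0] == 'in' else e for e in b} for b in Q]
    for b in blocks:                        # glue each block together
        first = next(iter(b))
        for e in b:
            parent[find(e)] = find(first)
    groups = defaultdict(set)               # then forget the middle
    for b in blocks:
        for e in b:
            if e[0] != 'mid':
                groups[find(e)].add(e)
    return {frozenset(g) for g in groups.values()}

# X = {1, 2}, Y = {1}, Z = {1, 2}:
P = {frozenset({('in', 1), ('out', 1)}), frozenset({('in', 2)})}   # X -> Y
Q = {frozenset({('in', 1), ('out', 1), ('out', 2)})}               # Y -> Z
print(compose(P, Q))
```

Notice that a middle block touching no input or output simply disappears in the composite; that is the ‘extra’ law in action, a dangling wire thrown away with no effect.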

The main result in Brandon and Brendan’s paper is that $\mathrm{FinCorel}$, the category where the objects are finite sets and the morphisms are corelations, is equivalent to the PROP for extraspecial commutative Frobenius monoids. That’s a terse way of describing the laws governing $\mathrm{FinCorel}$.

Let me just show you the most important laws. In each of these laws I’ll draw two circuits made of wires, and write an equals sign asserting that they give the same corelation from a set $X$ to a set $Y$. The inputs $X$ of each circuit are on top, and the outputs $Y$ are at the bottom. I’ll draw 3-way junctions as little triangles, but don’t worry about that. When we compose two corelations we may get a wire left in mid-air, not connected to the inputs or outputs. We draw the end of the wire as a little circle.

There are some laws called the ‘commutative monoid’ laws:

and an upside-down version called the ‘cocommutative comonoid’ laws:

Then we have ‘Frobenius laws’:

and finally we have the ‘special’ and ‘extra’ laws:

All other laws can be derived from these in some systematic ways.

Commutative Frobenius monoids obey the commutative monoid laws, the cocommutative comonoid laws and the Frobenius laws. They play a fundamental role in 2d topological quantum field theory. Special Frobenius monoids are also well-known. But the ‘extra’ law, which says that a little piece of wire not connected to anything can be thrown away with no effect, is less well studied. Jason Erbele and I gave it this name in our work on control theory:

• John Baez and Jason Erbele, Categories in control. (Blog article here.)

David Ellerman has spent a lot of time studying what would happen to mathematics if we turned around a lot of arrows in a certain systematic way. In particular, just as the concept of relation would be replaced by the concept of corelation, the concept of subset would be replaced by the concept of partition. You can see how it fits together: just as a relation from $X$ to $Y$ is a subset of $X \times Y$, a corelation from $X$ to $Y$ is a partition of $X + Y$.

There’s a lattice of subsets of a set:

In logic these subsets correspond to propositions, and the lattice operations are the logical operations ‘and’ and ‘or’. But there’s also a lattice of partitions of a set:

In Ellerman’s vision, this lattice of partitions gives a new kind of logic. You can read about it here:

• David Ellerman, Introduction to partition logic, *Logic Journal of the Interest Group in Pure and Applied Logic* **22** (2014), 94–125.

As mentioned, the main result in Brandon and Brendan’s paper is that $\mathrm{FinCorel}$ is equivalent to the PROP for extraspecial commutative Frobenius monoids. After they proved this, they noticed that the result has also been stated in other language and proved in other ways by two other authors:

• Fabio Zanasi, *Interacting Hopf Algebras—the Theory of Linear Systems*, PhD thesis, École Normale Supérieure de Lyon, 2015.

• K. Došen and Z. Petrić, Syntax for split preorders, *Annals of Pure and Applied Logic* **164** (2013), 443–481.

Unsurprisingly, I prefer Brendan and Brandon’s approach to deriving the result. But it’s nice to see different perspectives!

Yes, I was in battle again. A persistent skunk wants to take up residence in the crawl space. I got rid of it last week, having found one place it broke in. This involved a lot of crawling around on my belly armed with a headlamp (not pictured - this is an old picture) and curses. I've done this before... It left. Then yesterday I found a new place it had broken in through and the battle was rejoined. Interestingly, this time it decided to hide after some of the back and forth; I lost track of it for a good while and was about to give up and hope it would feel unsafe with all the lights I'd put on down there (and/or encourage it further to leave by deploying nuclear weapons to match the ones it comes armed with*).

In preparation for this I left open the large access hatch and sprinkled a layer [...]

The post Suited Up! appeared first on Asymptotia.


Ever since I became an environmentalist, the potential destruction wrought by aggressively expanding civilizations has been haunting my thoughts. Not just here and now, where it’s easy to see, but in the future.

In October 2006, I wrote this in my online diary:

A long time ago on this diary, I mentioned my friend Bruce Smith’s nightmare scenario. In the quest for ever faster growth, corporations evolve toward ever faster exploitation of natural resources. The Earth is not enough. So, ultimately, they send out self-replicating von Neumann probes that eat up solar systems as they go, turning the planets into more probes. Different brands of probes will compete among each other, evolving toward ever faster expansion. Eventually, the winners will form a wave expanding outwards at nearly the speed of light—demolishing everything behind them, leaving only wreckage.

The scary part is that even if we don’t let this happen, some other civilization might.

The last point is the key one. Even if something is unlikely, in a sufficiently large universe it will happen, as long as it’s possible. And then it will perpetuate itself, as long as it’s evolutionarily fit. Our universe seems pretty darn big. So, even if a given strategy is hard to find, if it’s a winning strategy it will get played somewhere.

So, even in this nightmare scenario of “spheres of von Neumann probes expanding at near lightspeed”, we don’t need to worry about a bleak future for the universe as a whole—any more than we need to worry that viruses will completely kill off all higher life forms. Some fraction of civilizations will probably develop defenses in time to repel the onslaught of these expanding spheres.

It’s not something I stay awake worrying about, but it’s a depressingly plausible possibility. As you can see, I was trying to reassure myself that everything would be okay, or at least acceptable, in the long run.

Even earlier, S. Jay Olson and I wrote a paper together on the limitations quantum gravity places on accurately measuring distances. If you try to measure a distance too accurately, you’ll need to concentrate so much energy in such a small space that you’ll create a black hole!

That was in 2002. Later I lost touch with him. But now I’m happy to discover that he’s doing interesting work on quantum gravity and quantum information processing! He is now at Boise State University in Idaho, his home state.

But here’s the cool part: he’s also studying aggressively expanding civilizations.

What will happen if some civilizations start aggressively expanding through the Universe at a reasonable fraction of the speed of light? We don’t have to assume most of them do. Indeed, there can’t be too many, or they’d already be here! More precisely, the density of such civilizations must be low at the present time. The number of them could be infinite, since space is apparently infinite. But none have reached us. We may eventually become such a civilization, but we’re not one yet.

Each such civilization will form a growing ‘bubble’: an expanding sphere of influence. And occasionally, these bubbles will collide!
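Just to build intuition, here is a deliberately crude one-dimensional toy of my own (not Olson’s model): civilizations appear at random times and places on a line and expand at a fixed speed, and we estimate how much of the line is saturated by a given time.

```python
import random

# A deliberately crude 1-d toy (mine, not Olson's model): civilizations
# appear at random times and places on a line of length L and expand
# at speed v; estimate the fraction of the line saturated by time T.
random.seed(42)
L, v, T, n_events = 100.0, 0.5, 60.0, 12
events = [(random.uniform(0, L), random.uniform(0, T)) for _ in range(n_events)]

sample = [i * 0.05 for i in range(2000)]      # probe points along the line
covered = sum(
    any(abs(x - xi) <= v * (T - ti) for xi, ti in events)
    for x in sample
)
print(covered / len(sample))                  # saturated fraction at time T
```

In higher dimensions and on cosmological scales the geometry is exactly that of nucleating, growing, and colliding bubbles, which is why the phase-transition mathematics mentioned below applies.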

Here are some pictures from a simulation he did:

As he notes, the math of these bubbles has already been studied by researchers interested in inflationary cosmology, like Alan Guth. These folks have considered the possibility that in the very early Universe, most of space was filled with a ‘false vacuum’: a state of matter that resembles the actual vacuum, but has higher energy density.

A false vacuum could turn into the true vacuum, liberating energy in the form of particle-antiparticle pairs. However, it might not do this instantly! It might be ‘metastable’, like ball number 1 in this picture:

It might need a nudge to ‘roll over the hill’ (metaphorically) and down into the lower-energy state corresponding to the true vacuum, shown as ball number 3. Or, thanks to quantum mechanics, it might ‘tunnel’ through this hill.

The balls and the hill are just an analogy. What I mean is that the false vacuum might need to go through a stage of having even higher energy density before it could turn into the true vacuum. Random fluctuations, either quantum-mechanical or thermal, could make this happen. Such a random fluctuation could happen in one location, forming a ‘bubble’ of true vacuum that—under certain conditions—would rapidly expand.

It’s actually not very different from bubbles of steam forming in superheated water!

But here’s the really interesting thing Jay Olson noted in his first paper on this subject: research on bubbles in inflationary cosmology could actually be relevant to aggressively expanding civilizations!

Why? Just as a bubble of expanding true vacuum has different pressure than the false vacuum surrounding it, the same might be true for an aggressively expanding civilization. If they are serious about expanding rapidly, they may convert a lot of matter into radiation to power their expansion. And while energy is conserved in this process, the *pressure* of radiation in space is a lot bigger than the pressure of matter, which is almost zero.

General relativity says that energy density slows the expansion of the Universe. But also—and this is probably less well-known among nonphysicists—it says that *pressure* has a similar effect. Also, as the Universe expands, the energy density and pressure of radiation drop at a different rate than the energy density of matter.

So, the expansion of the Universe itself, on a very large scale, could be affected by aggressively expanding civilizations!

The fun part is that Jay Olson actually studies this in a quantitative way, making some guesses about the numbers involved. Of course there’s a huge amount of uncertainty in all matters concerning aggressively expanding high-tech civilizations, so he actually considers a wide range of possible numbers. But if we assume a civilization turns a large fraction of matter into radiation, the effects could be significant!

The effect of the extra pressure due to radiation would be to temporarily slow the expansion of the Universe. But the expansion would not be stopped. The radiation will gradually thin out. So eventually, dark energy—which has negative pressure, and does not thin out as the Universe expands—will win. Then the Universe will expand exponentially, as it is already beginning to do now.

(Here I am ignoring speculative theories where dark energy has properties that change dramatically over time.)

Here are his papers on this subject. The abstracts sketch his results, but you have to look at the papers to see how nice they are. He’s thought quite carefully about these things.

• S. Jay Olson, Homogeneous cosmology with aggressively expanding civilizations, *Classical and Quantum Gravity* **32** (2015) 215025.

**Abstract.** In the context of a homogeneous universe, we note that the appearance of aggressively expanding advanced life is geometrically similar to the process of nucleation and bubble growth in a first-order cosmological phase transition. We exploit this similarity to describe the dynamics of life saturating the universe on a cosmic scale, adapting the phase transition model to incorporate probability distributions of expansion and resource consumption strategies. Through a series of numerical solutions spanning several orders of magnitude in the input assumption parameters, the resulting cosmological model is used to address basic questions related to the intergalactic spreading of life, dealing with issues such as timescales, observability, competition between strategies, and first-mover advantage. Finally, we examine physical effects on the universe itself, such as reheating and the backreaction on the evolution of the scale factor, if such life is able to control and convert a significant fraction of the available pressureless matter into radiation. We conclude that the existence of life, if certain advanced technologies are practical, could have a significant influence on the future large-scale evolution of the universe.

• S. Jay Olson, Estimates for the number of visible galaxy-spanning civilizations and the cosmological expansion of life.

**Abstract.** If advanced civilizations appear in the universe with a desire to expand, the entire universe can become saturated with life on a short timescale, even if such expanders appear but rarely. Our presence in an untouched Milky Way thus constrains the appearance rate of galaxy-spanning Kardashev type III (K3) civilizations, if it is assumed that some fraction of K3 civilizations will continue their expansion at intergalactic distances. We use this constraint to estimate the appearance rate of K3 civilizations for 81 cosmological scenarios by specifying the extent to which humanity could be a statistical outlier. We find that in nearly all plausible scenarios, the distance to the nearest visible K3 is cosmological. In searches where the observable range is limited, we also find that the most likely detections tend to be expanding civilizations who have entered the observable range from farther away. An observation of K3 clusters is thus more likely than isolated K3 galaxies.

• S. Jay Olson, On the visible size and geometry of aggressively expanding civilizations at cosmological distances.

**Abstract.** If a subset of advanced civilizations in the universe choose to rapidly expand into unoccupied space, these civilizations would have the opportunity to grow to a cosmological scale over the course of billions of years. If such life also makes observable changes to the galaxies they inhabit, then it is possible that vast domains of life-saturated galaxies could be visible from the Earth. Here, we describe the shape and angular size of these domains as viewed from the Earth, and calculate median visible sizes for a variety of scenarios. We also calculate the total fraction of the sky that should be covered by at least one domain. In each of the 27 scenarios we examine, the median angular size of the nearest domain is within an order of magnitude of a percent of the whole celestial sphere. Observing such a domain would likely require an analysis of galaxies on the order of a giga-lightyear from the Earth.

Here are the main assumptions in his first paper:

1. At early times (relative to the appearance of life), the universe is described by the standard cosmology – a benchmark Friedmann-Robertson-Walker (FRW) solution.

2. The limits of technology will allow for self-reproducing spacecraft, sustained relativistic travel over cosmological distances, and an efficient process to convert baryonic matter into radiation.

3. Control of resources in the universe will tend to be dominated by civilizations that adopt a strategy of aggressive expansion (defined as a frontier which expands at a large fraction of the speed of the individual spacecraft involved), rather than those expanding diffusively due to the conventional pressures of population dynamics.

4. The appearance of aggressively expanding life in the universe is a spatially random event and occurs at some specified, model-dependent rate.

5. Aggressive expanders will tend to expand in all directions unless constrained by the presence of other civilizations, will attempt to gain control of as much matter as is locally available for their use, and once established in a region of space, will consume mass as an energy source (converting it to radiation) at some specified, model-dependent rate.

The cosmological constant is the worst-ever prediction of quantum field theory, infamously off by 120 orders of magnitude. And as if that wasn’t embarrassing enough, this gives rise to, not one, but three problems: Why is the measured cosmological constant neither 1) huge nor 2) zero, and 3) Why didn’t this occur to us a billion years earlier? With that, you’d think that physicists have their hands full getting zeroes arranged correctly. But Niayesh Afshordi and Elliot Nelson just added to our worries.
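The "120 orders of magnitude" quip is easy to reproduce on the back of an envelope, taking the naive quantum field theory estimate to be the Planck scale to the fourth power and the observed dark-energy scale to be a couple of milli-electronvolts (both standard rough figures, not numbers from the paper discussed below):

```python
import math

m_planck = 1.22e28          # Planck energy scale in eV
rho_vac_qft = m_planck**4   # naive QFT vacuum energy density, eV^4
rho_lambda = (2.3e-3)**4    # observed dark-energy scale ~2.3 meV, eV^4

orders = math.log10(rho_vac_qft / rho_lambda)
print(round(orders))        # -> 123: the infamous "~120 orders of magnitude"
```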

In a paper that made it third place of this year’s Buchalter Cosmology Prize, Afshordi and Nelson pointed out that the cosmological constant, if it arises from the vacuum energy of matter fields, should be subject to quantum fluctuations. And these fluctuations around the average are still large even if you have managed to get the constant itself to be small. The cosmological constant, thus, is not actually constant. And since matter curves space-time, the matter fluctuations lead to space-time fluctuations – which can screw with our cosmological models. Afshordi and Nelson dubbed it the “Cosmological non-Constant Problem.”

But there is more to their argument than just adding to our problems because Afshordi and Nelson quantified what it takes to avoid a conflict with observation. They calculate the effect of stress-energy fluctuations on the space-time background, and then analyze what consequences this would have for the gravitational interaction. They introduce as a free parameter an energy scale up to which the fluctuations abound, and then contrast the corrections from this with observations, like for example the CMB power spectrum or the peculiar velocities of galaxy clusters. From these measurements they derive bounds on the scale at which the fluctuations must cease, and thus, where some new physics must come into play.

They find that the scale beyond which we should already have seen the effect of the vacuum fluctuations is about 35 TeV. If their argument is right, this means something must happen either to matter or to gravity before reaching this energy scale; the option the authors advocate in their paper is that physics becomes strongly coupled below this scale (thus invalidating the extrapolation to larger energies, removing the problem).

Unfortunately, the LHC will not be able to reach all the way up to 35 TeV. But a next larger collider – and we all hope there will be one! – almost certainly would be able to test the full range. As Niayesh put it: “It’s not a problem yet” – but it will be a problem if there is no new physics before getting all the way up to 35 TeV.

I find this an interesting new twist on the cosmological constant problem(s). Something about this argument irks me, but I can’t quite put a finger on it. If I have an insight, you’ll hear from me again. Just generally I would caution you to not take the exact numerical value too seriously because in this kind of estimate there are usually various places where factors of order one might come in.

I'm back from my trip. Here are some things that prevented me from more substantial blogging:

- I wrote an article for Aeon, "The superfluid Universe," which just appeared. For a somewhat more technical summary, see this earlier blogpost.
- I did a Q&A with John The-End-of-Science Horgan, which was fun. I disagree with him on many things, but I admire his writing. He is infallibly skeptical and unashamedly opinionated -- qualities I find lacking in much of today's science writing, including, sometimes, my own.
- I spoke with Davide Castelvecchi about Stephen Hawking's recent attempt to solve the black hole information loss problem, which I previously wrote about here.
- And I had some words to spare for Zeeya Merali, probably more words than she wanted, on the issue with the arXiv moderation, which we discussed here.
- Finally, I had the opportunity to give some input for this video on the Physics Girl YouTube channel:

I previously explained in this blogpost that Hawking radiation is not produced at the black hole horizon, a correction to the commonly used popular science explanation that caught much more attention than I anticipated.

There are of course still some things in the above video I'd like to complain about. To begin with, anti-particles don't normally have negative energy (no they don't). And the vacuum is the same for two observers who are moving relative to each other with constant velocity - it's the acceleration that makes the difference between the vacua. In any case, I applaud the Physics Girl team for taking on what is admittedly a rather technical and difficult topic. If anyone can come up with a better illustration for Hawking-radiation than Hawking's own idea with the pairs that are being ripped apart (which is far too localized to fit well with the math), please leave a suggestion in the comments.

So, DFT contains a deep truth: Somehow just the electronic density as a function of position within a system in its lowest energy state contains, latent within it, basically *all* of the information about that ground state. This is the case even though you usually think that you should need to know the actual complex electronic wavefunction \(\Psi(\mathbf{r})\), and the density (\(\Psi^{*}\Psi\)) seems to throw away a bunch of information.

Moreover, thanks to Kohn and Sham, there is actually a procedure that lets you calculate things using a formalism where you can ignore electron-electron interactions and, in principle, get arbitrarily close to the *real* (including interaction corrections) density. In practice, life is not so easy. We don't actually know how to write down a readily computable form of the complete Kohn-Sham functional. Some people have very clever ideas about trying to finesse this, but it's hard, especially since the true functional is actually *nonlocal* - it somehow depends on correlations between the density (and its spatial derivatives) at different positions. In our seating chart analogy, we know that there's a procedure for finding the true optimal seating even without worrying about the interactions between people, but we don't know how to write it down nicely. The correct procedure involves looking at whether each seat is empty or full, whether its neighboring seats are occupied, and even potentially the coincident occupation of groups of seats - this is what I mean by *nonlocal*.

We could try a simplifying *local* approximation, where we only care about whether a given chair is empty or full. (If you try to approximate using a functional that depends only on the local density, you are doing LDA (the local density approximation)). We could try to be a bit more sophisticated, and worry about whether a chair is occupied and how much the occupancy varies in different directions. (If you try to incorporate the local density and its gradient, you are doing GGA (the generalized gradient approximation)). There are other, more complicated procedures that add in additional nonlocal bits - if done properly, this is rigorous. The real art in this business is understanding which approximations are best in which regimes, and how to compute things efficiently.
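As a cartoon of what "minimize a functional of the density" means in practice, here is a toy 1D calculation in the Thomas-Fermi spirit: a purely local (LDA-like) energy functional with a made-up kinetic coefficient and a harmonic external potential, minimized by solving its Euler-Lagrange condition for the density. Real DFT codes are vastly more sophisticated; this only illustrates the structure:

```python
import numpy as np

# Toy 1D "local density approximation": minimize the energy functional
#   E[n] = integral of [ c_k * n(x)^(5/3) + V(x) * n(x) ] dx,  with ∫ n dx = N
# (a Thomas-Fermi-style local kinetic term plus an external potential).
# Setting the functional derivative equal to a constant chemical potential mu,
#   (5/3) c_k n^(2/3) + V(x) = mu  =>  n(x) = [3 (mu - V(x)) / (5 c_k)]^(3/2)
# wherever mu > V, and n = 0 elsewhere. We find mu by bisection so that the
# density integrates to N. All parameters are illustrative, not a real system.

x = np.linspace(-5.0, 5.0, 2001)
dx = x[1] - x[0]
V = 0.5 * x**2              # external harmonic potential
c_k, N = 1.0, 1.0           # toy kinetic coefficient, particle number

def density(mu):
    return np.clip(3.0 * (mu - V) / (5.0 * c_k), 0.0, None) ** 1.5

lo, hi = float(V.min()), float(V.max())
for _ in range(60):         # bisection on the chemical potential
    mu = 0.5 * (lo + hi)
    if density(mu).sum() * dx < N:
        lo = mu
    else:
        hi = mu

n = density(mu)
E = (c_k * n**(5/3) + V * n).sum() * dx
print(mu, E)
```

The minimizing density piles up where the potential is low and vanishes outside a finite region, which is the qualitative behavior a local functional can capture; what it misses is exactly the nonlocal correlation structure discussed above.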

So how good can this be? An example is shown in the figure (from a summer school talk by my friend Leeor Kronik). The yellow points indicate (on both axes) the experimental values of the ionization energies for the various organic molecules shown. The other symbols show different calculated ionization energies plotted vs. the experimental values. A particular mathematical procedure with a clear theoretical justification (read the talk for details) that mixes in long-range and short-range contributions gives the points labeled with asterisks, which show very good agreement with the experiments.

Next time: The conclusion, with pitfalls, perils, and general abuses of DFT.


In January 2014, I attended an FQXi conference on Vieques island in Puerto Rico. While there, Robert Lawrence Kuhn interviewed me for his TV program Closer to Truth, which deals with science and religion and philosophy and you get the idea. Alas, my interview was at the very end of the conference, and we lost track of the time—so unbeknownst to me, a plane full of theorists was literally sitting on the runway waiting for me to finish philosophizing! This was the second time Kuhn interviewed me for his show; the first time was on a cruise ship near Norway in 2011. (Thankless hero that I am, there’s nowhere I won’t travel for the sake of truth.)

Anyway, after a two-year wait, the videos from Puerto Rico are **finally available online**. While my vignettes cover what, for most readers of this blog, will be very basic stuff, I’m *sort of* happy with how they turned out: I still stutter and rock back and forth, *but not as much as usual*. For your viewing convenience, here are the new videos:

- The black hole information paradox, firewalls, and Harlow-Hayden argument (6 minutes)
- Physics and free will (8 minutes 24 seconds)
- Which entities are conscious? (6 minutes 3 seconds)
- Quantum mechanics, the predictability of nature, the Bell inequality, and Einstein-certified randomness (5 minutes 12 seconds)
- What’s the value of philosophy, and can it make progress? (3 minutes 42 seconds)
- Newcomb’s Paradox (4 minutes 13 seconds)
- Gödel’s Theorem and the definiteness of mathematical truth (8 minutes 20 seconds)

I had one other vignette, about why the universe exists, but they seem to have cut that one. Alas, if I knew why the universe existed in January 2014, I can’t remember any more.

One embarrassing goof: I referred to the inventor of Newcomb’s Paradox as “Simon Newcomb.” Actually it was William Newcomb: a distant relative of Simon Newcomb, the 19^{th}-century astronomer who measured the speed of light.

At their website, you can also see my older 2011 videos, and videos from others who *might* be known to readers of this blog, like Marvin Minsky, Roger Penrose, Rebecca Newberger Goldstein, David Chalmers, Sean Carroll, Max Tegmark, David Deutsch, Raphael Bousso, Freeman Dyson, Nick Bostrom, Ray Kurzweil, Rodney Brooks, Stephen Wolfram, Greg Chaitin, Garrett Lisi, Seth Lloyd, Lenny Susskind, Lee Smolin, Steven Weinberg, Wojciech Zurek, Fotini Markopoulou, Juan Maldacena, Don Page, and David Albert. (No, I haven’t yet watched most of these, but now that I linked to them, maybe I will!)

Thanks very much to Robert Lawrence Kuhn and Closer to Truth (and my previous self, I guess?) for providing *Shtetl-Optimized* content so I don’t have to.

**Update:** Andrew Critch of CFAR asked me to post the following announcement.

We’re seeking a full time salesperson for the Center for Applied Rationality in Berkeley, California. We’ve streamlined operations to handle large volume in workshop admissions, and now we need that volume to pour in. Your role would be to fill our workshops, events, and alumni community with people. Last year we had 167 total new alumni. This year we want 120 per month. Click here to find out more.


Today’s Memorial Library find: the magazine *Advertising and Selling*. The September 1912 edition features “How Furniture Could Be Better Advertised,” by Arnold Joerns, of E.J. Thiele and Co.

Joerns complains that in 1911, the average American spent $81.22 on food, $26.02 on clothes, $19.23 on intoxicants, $9.08 on tobacco, and only $6.19 on furniture. “Do you think furniture should be on the bottom of this list?” he asks, implicitly shaking his head. “Wouldn’t you — dealer or manufacturer — rather see it nearer the top, — say at least ahead of tobacco and intoxicants?”

Good news for furniture lovers: by 2012, US spending on “household furnishings and equipment” was at $1,506 per household, almost a quarter as much as we spent on food. (To be fair, it looks like this includes computers, lawnmowers, and many other non-furniture items.) Meanwhile, spending on alcohol is only $438. That’s pretty interesting: in 1911, liquor expenditures were a quarter of food expenditures; now it’s less than a tenth. Looks like a 1911 dollar is roughly $25 in 2012 dollars, so the real dollars spent on alcohol aren’t that different, but we spend a lot more now on food and on furniture.
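The comparisons above check out against the quoted figures (the 2012 food number is inferred from the "almost a quarter" remark, so it's approximate):

```python
# Figures quoted in the text above
food_1911, alcohol_1911 = 81.22, 19.23
furnishings_2012, alcohol_2012 = 1506.0, 438.0
inflation = 25.0   # "a 1911 dollar is roughly $25 in 2012 dollars"

print(alcohol_1911 / food_1911)     # ~0.24: about a quarter of food in 1911
food_2012 = furnishings_2012 * 4    # from "almost a quarter as much as food"
print(alcohol_2012 / food_2012)     # ~0.07: less than a tenth today
print(alcohol_1911 * inflation)     # ~$481 in 2012 dollars, close to $438
```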

Anyway, this piece takes a splendidly nuts turn at the end, as Joerns works up a head of steam about the moral peril of discount furniture:

I do not doubt but that fewer domestic troubles would exist if people were educated to a greater understanding of the furniture sentiment. Our young people would find more pleasure in an evening at home — if we made that home more worth while and a source of personal pride; then, perhaps, they would cease joy-riding, card-playing, or drinking and smoking in environments unhealthful to their minds and bodies.

It would even seem reasonable to assume, that if the public mind were educated to appreciate more the sentiment in furniture and its relation to the Ideal Home, we would have fewer divorces. Home would mean more to the boys and girls of today and the men and women of tomorrow. Obviously, if the public is permitted to lose more and more its appreciation of home sentiment, the divorce evil will grow, year by year.

Joerns proposes that the higher sort of furniture manufacturers boost their brand by advertising it, not as furniture, but as “meuble.” This seems never to have caught on.

This just in. Marvel has posted a video of a chat I did with Agent Carter's Reggie Austin (Dr. Jason Wilkes) about some of the science I dreamed up to underpin some of the things in the show. In particular, we talk about his intangibility and how it connects to other properties of the Zero Matter that we'd already established in earlier episodes. You can see it embedded below.

The post It Came from Elsewhere… appeared first on Asymptotia.


I’ve been meaning to return to fluids for some time now, in order to build upon my construction two years ago of a solution to an averaged Navier-Stokes equation that exhibited finite time blowup. (I recently spoke on this work in the recent conference in Princeton in honour of Sergiu Klainerman; my slides for that talk are here.)

One of the biggest deficiencies with my previous result is the fact that the averaged Navier-Stokes equation does not enjoy any good equation for the vorticity \(\omega = \nabla \times u\), in contrast to the true Navier-Stokes equations which, when written in vorticity-stream formulation, become

\[ \partial_t \omega + (u \cdot \nabla) \omega = (\omega \cdot \nabla) u + \nu \Delta \omega \]
\[ u = (-\Delta)^{-1} \nabla \times \omega. \]

(Throughout this post we will be working in three spatial dimensions \({\bf R}^3\).) So one of my main near-term goals in this area is to exhibit an equation resembling Navier-Stokes as much as possible which enjoys a vorticity equation, and for which there is finite time blowup.

Heuristically, this task should be easier for the Euler equations (i.e. the zero viscosity case of Navier-Stokes) than the viscous Navier-Stokes equation, as one expects the viscosity to only make it easier for the solution to stay regular. Indeed, morally speaking, the assertion that finite time blowup solutions of Navier-Stokes exist should be roughly equivalent to the assertion that finite time blowup solutions of Euler exist which are “Type I” in the sense that all Navier-Stokes-critical and Navier-Stokes-subcritical norms of this solution go to infinity (which, as explained in the above slides, heuristically means that the effects of viscosity are negligible when compared against the nonlinear components of the equation). In vorticity-stream formulation, the Euler equations can be written as

\[ \partial_t \omega + (u \cdot \nabla) \omega = (\omega \cdot \nabla) u \]
\[ u = (-\Delta)^{-1} \nabla \times \omega. \]
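For readers who have not seen it before, the vorticity-stream form comes from taking the curl of the familiar velocity form of the Euler equations; a standard sketch:

```latex
\begin{align*}
  &\text{Start from incompressible Euler: }
    \partial_t u + (u\cdot\nabla)u = -\nabla p, \qquad
    \nabla\cdot u = 0, \qquad \omega := \nabla\times u. \\
  &\text{Taking the curl kills the pressure gradient, and the identity} \\
  &\qquad \nabla\times\big((u\cdot\nabla)u\big)
    = (u\cdot\nabla)\omega - (\omega\cdot\nabla)u
    \qquad (\text{using } \nabla\cdot u = \nabla\cdot\omega = 0) \\
  &\text{leaves the vorticity equation }
    \partial_t \omega + (u\cdot\nabla)\omega = (\omega\cdot\nabla)u, \\
  &\text{with } u = (-\Delta)^{-1}\,\nabla\times\omega
    \text{ recovered from the Biot--Savart law.}
\end{align*}
```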

As discussed in this previous blog post, a natural generalisation of this system of equations is the system

\[ \partial_t \omega + (u \cdot \nabla) \omega = (\omega \cdot \nabla) u \]
\[ u = T \omega \qquad (1) \]

where \(T\) is a linear operator on divergence-free vector fields that is “zeroth order” in some sense; ideally it should also be invertible, self-adjoint, and positive definite (in order to have a Hamiltonian \(\frac{1}{2}\int \omega \cdot T\omega\) that is comparable to the kinetic energy \(\frac{1}{2}\int |u|^2\)). (In the previous blog post, it was observed that the surface quasi-geostrophic (SQG) equation could be embedded in a system of the form (1).) The system (1) has many features in common with the Euler equations; for instance vortex lines are transported by the velocity field \(u\), and Kelvin’s circulation theorem is still valid.

So far, I have not been able to fully achieve this goal. However, I have the following partial result, stated somewhat informally:

Theorem 1. There is a “zeroth order” linear operator \(T\) (which, unfortunately, is not invertible, self-adjoint, or positive definite) for which the system (1) exhibits smooth solutions that blow up in finite time.

The operator \(T\) constructed is not quite a zeroth-order pseudodifferential operator; it is instead merely in the “forbidden” symbol class \(S^0_{1,1}\), and more precisely it takes the form (2) of a sum built out of rescalings \(\eta_j\) of some compactly supported divergence-free \(\eta\) of mean zero. This operator is still bounded on all \(L^p\) spaces, and so is arguably still a zeroth order operator, though not as convincingly as I would like. Another, less significant, issue with the result is that the solution constructed does not have good spatial decay properties, but this is mostly for convenience and it is likely that the construction can be localised to give solutions that have reasonable decay in space. But the biggest drawback of this theorem is the fact that \(T\) is not invertible, self-adjoint, or positive definite, so in particular there is no non-negative Hamiltonian for this equation. It may be that some modification of the arguments below can fix these issues, but I have so far been unable to do so. Still, the construction does show that the circulation theorem is insufficient by itself to prevent blowup.

We sketch the proof of the above theorem as follows. We use the barrier method, introducing the time-varying hyperboloid domains

for (expressed in cylindrical coordinates ). We will select initial data to be for some non-negative even bump function supported on , normalised so that

in particular is divergence-free supported in , with vortex lines connecting to . Suppose for contradiction that we have a smooth solution to (1) with this initial data; to simplify the discussion we assume that the solution behaves well at spatial infinity (this can be justified with the choice (2) of vorticity-stream operator, but we will not do so here). Since the domains disconnect from at time , there must exist a time which is the first time where the support of touches the boundary of , with supported in .

From (1) we see that the support of is transported by the velocity field . Thus, at the point of contact of the support of with the boundary of , the inward component of the velocity field cannot exceed the inward velocity of . We will construct the functions so that this is not the case, leading to the desired contradiction. (Geometrically, what is going on here is that the operator is pinching the flow to pass through the narrow cylinder , leading to a singularity by time at the latest.)

First we observe from conservation of circulation, and from the fact that is supported in , that the integrals

are constant in both space and time for . From the choice of initial data we thus have

for all and all . On the other hand, if is of the form (2) with for some bump function that only has -components, then is divergence-free with mean zero, and

where . We choose to be supported in the slab for some large constant , and to equal a function depending only on on the cylinder , normalised so that . If , then passes through this cylinder, and we conclude that

Inserting this into (2), (1) we conclude that

for some coefficients . We will not be able to control these coefficients , but fortunately we only need to understand on the boundary , for which . So, if happens to be supported on an annulus , then vanishes on if is large enough. We then have

on the boundary of .

Let be a function of the form

where is a bump function supported on that equals on . We can perform a dyadic decomposition where

where is a bump function supported on with . If we then set

then one can check that for a function that is divergence-free and mean zero, and supported on the annulus , and

so on (where ) we have

One can manually check that the inward velocity of this vector on exceeds the inward velocity of if is large enough, and the claim follows.

Remark 2. The type of blowup suggested by this construction, where a unit amount of circulation is squeezed into a narrow cylinder, is of “Type II” with respect to the Navier-Stokes scaling, because Navier-Stokes-critical norms such as \(L^3\) (or at least the weak norm \(L^{3,\infty}\)) look like they stay bounded during this squeezing procedure (the velocity field is of size about in cylinders of radius and length about ). So even if the various issues with \(T\) are repaired, it does not seem likely that this construction can be directly adapted to obtain a corresponding blowup for a Navier-Stokes type equation. To get a “Type I” blowup that is consistent with Kelvin’s circulation theorem, it seems that one needs to coil the vortex lines around a loop multiple times in order to get increased circulation in a small space. This seems possible to pull off to me – there don’t appear to be any unavoidable obstructions coming from topology, scaling, or conservation laws – but would require a more complicated construction than the one given above.


In previous posts, I've tried to introduce the idea that there can be "holistic" approaches to solving physics problems, and I've attempted to give a lay explanation of what a functional is (short version: a functional is a function of a function - it chews on a whole function and spits out a number.). Now I want to talk about density functional theory, an incredibly valuable and useful scientific advance ("easily the most heavily cited concept in the physical sciences"), yet one that is basically invisible to the general public.

Let me try an analogy. You're trying to arrange the seating for a big banquet, and there are a bunch of constraints: Alice wants very much to be close to the kitchen. Bob also wants to be close to the kitchen. However, Alice and Bob both want to be as far from all other people as possible. Etc. Chairs can't be on top of each other, but you still need to accommodate the full guest list. In the end you are going to care about the answers to certain questions: How hard would it be to push two chairs closer to each other? If one person left, how much would all the chairs need to be rearranged to keep everyone maximally comfortable? You could imagine solving this problem by brute force - write down all the constraints and try satisfying them one person at a time, though every person you add might mean rearranging all the previously seated people. You could also imagine solving this by some trial-and-error method, where you guess an initial arrangement, and make adjustments to check and see if you've improved how well you satisfy everyone. However, it doesn't look like there's any clear, immediate strategy for figuring this out and answering the relevant questions.

The analogy of DFT here would be three statements. First, you'd probably be pretty surprised if I told you that if I gave you the final seating positions of the people in the room, that would completely specify and nail down the answer to any of those questions up there that you could ask about the room.^{1} Second, there is a math procedure (a functional that depends on the positions of all of the people in the room that can be minimized) to find that unique seating chart.^{2} Third, even more amazingly, there is some mock-up of the situation where we don't have to worry about the people-people interactions directly, yet (minimizing a functional of the positions of the non-interacting people) would still give us the full seating chart, and therefore let us answer all the questions.^{3}

For a more physicsy example: Suppose you want to figure out the electronic properties of some system. In something like hydrogen gas, H_{2}, maybe we want to know where the electrons are, how far apart the atoms like to sit, and how much energy it takes to kick out an electron - these are important things to know if you are a chemist and want to understand chemical reactions, for example. Conceptually, this is easy: In principle we know the mathematical rules that describe electrons, so we should be able to write down the relevant equations, solve them (perhaps with a computer if we can't find nice analytical solutions), and we're done. In this case, the equation of interest is the time-independent form of the Schroedinger equation. There are two electrons in there, one coming from each hydrogen atom. One tricky wrinkle is that the two electrons don't just feel an attraction to the protons, but they also repel each other - that makes this an "interacting electron" problem. A second tricky wrinkle is that the electrons are fermions. If we imagine swapping (the quantum numbers associated with) two electrons, we have to pick up a minus sign in the math representation of their quantum state. We do know how to solve this problem (two interacting electrons plus two much heavier protons) numerically to a high degree of accuracy. Doing this kind of direct solution gets prohibitively difficult, however, as the number of electrons increases.
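As a minimal illustration of "write down the Schroedinger equation and solve it numerically," here is a single particle in a 1D harmonic well, discretized by finite differences and diagonalized. This sidesteps the interacting two-electron complications that make H_2 and bigger systems hard, so it is only the basic machinery, with units chosen so the exact eigenvalues are 0.5, 1.5, 2.5, ...:

```python
import numpy as np

# Time-independent Schroedinger equation on a grid, hbar = m = omega = 1:
#   [-1/2 d^2/dx^2 + 1/2 x^2] psi = E psi
x = np.linspace(-8.0, 8.0, 801)
dx = x[1] - x[0]

# Kinetic energy -1/2 d^2/dx^2 via the three-point finite-difference stencil
H = (np.diag(np.full(801, 1.0 / dx**2))
     + np.diag(np.full(800, -0.5 / dx**2), 1)
     + np.diag(np.full(800, -0.5 / dx**2), -1))
H += np.diag(0.5 * x**2)      # harmonic potential on the diagonal

energies = np.linalg.eigvalsh(H)
print(energies[:3])           # close to the exact values 0.5, 1.5, 2.5
```

Direct approaches like this scale badly: two interacting electrons already require a grid over both coordinates, and the matrix size explodes exponentially with particle number, which is exactly why DFT's reformulation in terms of the density is so valuable.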

So what do we do? DFT tells us:

^{1}If you actually knew the total electron density as a function of position, \(n(\mathbf{r})\), that would completely determine the properties of the electronic ground state. This is the first Hohenberg-Kohn theorem.

^{2}There is a unique functional \(E[n(\mathbf{r})]\) for a given system that, when minimized, will give you the correct density \(n(\mathbf{r})\). This is the second Hohenberg-Kohn theorem.

^{3}You can set up a system where, with the right functional, you can solve a problem involving *noninteracting* electrons that will give you the true density \(n(\mathbf{r})\). That's the Kohn-Sham approach, which has actually made this kind of problem solving practical.

The observations by Kohn and Hohenberg are very deep. Somehow *just the electronic density* encodes a whole lot more information than you might think, especially if you've had homework experience trying to solve many-body quantum mechanics problems. The electronic density somehow contains *complete* information about all the properties of the lowest energy many-electron state. (In quantum language, knowing the density everywhere in principle specifies the expectation value of *any* operator you could apply to the ground state.)

The advance by Kohn and Sham is truly great - it describes an actual procedure that you can carry out to really calculate those ground state properties. The Kohn-Sham approach and its refinements have created the modern field of "quantum chemistry".

More soon....


2016 reading project is to have more than half my reading be books in translation. So far this has translated into reading Ferrante after Ferrante. Not really feeling equal to the task of writing about these books, which color everything else around them while you read. *The struggle to be the protagonist of your own story.* Gatsby is a snapshot of it, Ferrante is a movie of it.


“The whole not only becomes more than but very different from the sum of its parts.”

– P. W. Anderson

It was a brainstorming meeting. We went from referencing a melodramatic viral Thai video to P. W. Anderson’s famous paper, “More is Different” to a vote-splitting internally produced Caltech video to the idea of extracting an edit out of existing video we already had from prior IQIM-Parveen Shah Productions’ shoots. And just like that, stepping from one idea to the next, hopping along, I pitched a theorist vs experimentalist “interrogation”. They must have liked it because the opinion in the room hushed for a few seconds. It seemed plausibly exciting and dangerously perhaps…fresh. But what I witnessed in the room was exactly the idea of collaboration and the “storm” of the brain. This wasn’t a conclusion we could likely have arrived at if all of us were sitting alone. There was a sense of the dust settling around the chaotic storm of the collective brain(s). John Preskill, Crystal Dilworth, Spiros Michalakis and I finally agreed on a plan of action going forward. And mind you, this of course was very far from the first meeting or email we had had about the video.

Capitalizing on the instant excitement, the emails started going around. Who would be the partners in crime? A combination of personality, representation, willingness and profile was balanced to decide the participants. We reached out to Gil Refael, David Hsieh, Nai-Chang Yeh, Xie Chen and Oskar Painter. They all said “yes”! It seemed deceptively easy. And then it came. Once the idea of the interrogation was unleashed, pitting them against one another, or should I say “with” one another, brought about a bit of anxiety and confusion at first. “Wait, we’re supposed to fight on camera?” “But, our fields don’t match necessarily.” “No, no, it just doesn’t make sense.” I was prepared for the paranoia. It was natural, and a bit less than what we got back when we pitched the high-fashion shoot for geeks video. I was taking them out of their comfort zone. I allayed the fears. I told them it was not going to be *that* controversial.

But I had to prep it to a certain level of “conflict” or “drama” so that what we got on camera was at least some remnant of the initial emotional intention. The questions, the “tone”, had to be set. Then we realized that it wasn’t just the meeting of the professorial brains but also the other researchers of the Institute that needed to be represented. And so we also added some postdocs and graduate students: Johannes Pollanen (now an Assistant Professor at Michigan State University), Chandni Usha and Shaun Maguire. The idea of a nebulous conversation about theory vs practical, or theory *and* practical, seemed like a literal experiment of the very idea of the formation of the IQIM: putting the best brains in the field in a sandbox and shaking them around to see the entangled interactions produced. It seemed too perfect.

The resulting video might not have produced the exact linear narrative I desired…but it was indeed “more than the sum of its parts”. It showed the excitement, the constant interaction, the curious conversations and the anxiety of being at the forefront, the cutting edge, where one is sometimes limited by another and sometimes enabled by the other, but most importantly, constantly growing and evolving. IQIM to me signifies that community.

Being accepted and integrated as a filmmaker itself is a virtue of that forced and encouraged collaboration and interaction.

And so we began. Before we filmed, I spent time with each duo, discussing and requesting a narrative and answers to some proposed questions.

Cinematographer Anthony C. Kuhnz and I were excited to shoot in Keith Schwab’s spacious lab, which had produced the memorable shot of Emma Wollman for our initial promo video. Its space and background were exactly what we needed for the blurred backgrounds of this “brain space” we were hoping to create.

The lighting was certainly inspired by an interrogation scene but dampened for the dream state. We wanted to bring people into a *behind the scenes* discussion of some of the most brilliant minds in quantum physics, seeing the issues and challenges that face them; the exciting possibilities that they predict. The handheld camera and the dynamic pans from one to another were also meant to communicate that transitional and collaborative “ball toss” energy. Once you feel that tangible creativity, then we go into the depths of what IQIM really is, how it creates its community within and without, the latter by focusing on the outreach and the educational efforts to spread the magic and sparkle of Physics.

I’m proud of this video. When I watch it, whether or not I understand everything being said, I do certainly want to be engaged with IQIM and that is the hope for others who watch it.

Nature is subtle and so is the effect of this video…as in, we hope that…we gotcha!

*guest post by Tim Silverman*

“Everything is simpler mod $p$.”

That is the philosophy of the Mod People; and of all $p$, the simplest is 2. Washed in a bath of mod 2, that exotic object, the $\mathrm{E}_8$ lattice, dissolves into a modest orthogonal space, its Weyl group into an orthogonal group, its “large” $\mathrm{E}_8$ sublattices into some particularly nice subspaces, and the very Leech lattice itself shrinks into a few arrangements of points and lines that would not disgrace the pages of Euclid’s *Elements*. And when we have sufficiently examined these few bones that have fallen out of their matrix, we can lift them back up to Euclidean space in the most naive manner imaginable, and the full Leech springs out in all its glory like instant mashed potato.

What is this about? In earlier posts in this series, JB and Greg Egan have been calculating and exploring a lot of beautiful Euclidean geometry involving $\mathrm{E}_8$ and the Leech lattice. Lately, a lot of Fano planes have been popping up in the constructions. Examining these, I thought I caught some glimpses of a more extensive $\mathbb{F}_2$ geometry; I made a little progress in the comments, but then got completely lost. But there *is* indeed an extensive $\mathbb{F}_2$ world in here, parallel to the Euclidean one. I have finally found the key to it in the following fact:

**Large $\mathrm{E}_8$ lattices mod $2$ are just maximal flats in a $7$-dimensional quadric over $\mathbb{F}_2$.**

I’ll spend the first half of the post explaining what that means, and the second half showing how everything else flows from it. We unfortunately bypass (or simply assume in passing) most of the pretty Euclidean geometry; but in exchange we get a smaller, simpler picture which makes a lot of calculations easier, and the $\mathbb{F}_2$ world seems to lift very cleanly to the Euclidean world, though I haven’t actually proved this or explained why — maybe I shall leave that as an exercise for you, dear readers.

N.B. Just a quick note on scaling conventions before we start. There are two scaling conventions we could use. In one, a ‘shrunken’ $\mathrm{E}_8$ made of integral octonions, with shortest vectors of length $1$, contains ‘standard’ sized $\mathrm{E}_8$ lattices with vectors of minimal length $\sqrt{2}$, and Wilson’s Leech lattice construction comes out the right size. The other is $\sqrt{2}$ times larger: a ‘standard’ $\mathrm{E}_8$ lattice contains “large” $\mathrm{E}_8$ lattices of minimal length $2$, but Wilson’s Leech lattice construction gives something $\sqrt{2}$ times too big. I’ve chosen the latter convention because I find it less confusing: reducing the standard $\mathrm{E}_8$ mod $2$ is a well-known thing that people do, and all the Euclidean dot products come out as integers. But it’s as well to bear this in mind when relating this post to the earlier ones.

I’ll work with projective spaces over $\mathbb{F}_q$ and try not to suddenly start jumping back and forth between projective spaces and the underlying vector spaces as is my wont, at least not unless it really makes things clearer.

So we have an $n$-dimensional projective space over $\mathbb{F}_q$. We’ll denote this by $\mathrm{PG}(n,q)$.

The full symmetry group of $\mathrm{PG}(n,q)$ is $\mathrm{GL}_{n+1}(q)$, and from that we get subgroups and quotients $\mathrm{SL}_{n+1}(q)$ (with unit determinant), $\mathrm{PGL}_{n+1}(q)$ (quotient by the centre) and $\mathrm{PSL}_{n+1}(q)$ (both). Over $\mathbb{F}_2$, the determinant is always $1$ (since that’s the only non-zero scalar) and the centre is trivial, so these groups are all the same.

In projective spaces over $\mathbb{F}_2$, there are $3$ points on every line, so we can ‘add’ any two points and get the third point on the line through them. (This is just a projection of the underlying vector space addition.)
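As a tiny concrete check (a sketch of my own, assuming we encode the points of $\mathrm{PG}(2,2)$ as the integers $1$ to $7$, i.e. the nonzero vectors of $\mathbb{F}_2^3$ written in binary), this ‘addition’ is just XOR, and closing pairs of points under it produces exactly the $7$ lines of the Fano plane:

```python
from itertools import combinations

# points of PG(2,2): nonzero vectors of F_2^3, encoded as the integers 1..7
points = range(1, 8)

# the line through distinct points a and b also contains their 'sum' a XOR b
lines = {frozenset((a, b, a ^ b)) for a, b in combinations(points, 2)}

assert len(lines) == 7                   # the 7 lines of the Fano plane
assert all(len(L) == 3 for L in lines)   # 3 points on every line
```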

In *odd* characteristic, we get two other families of Lie type by preserving two types of non-degenerate bilinear form: symmetric and skew-symmetric, corresponding to orthogonal and symplectic structures respectively. (Non-degenerate Hermitian forms, defined over $\mathbb{F}_{q^2}$, also exist and behave similarly.)

Denote the form by $B(x,y)$. Points $x$ for which $B(x, x)=0$ are **isotropic**. For a symplectic structure all points are isotropic. A form $B$ such that $B(x,x)=0$ for all $x$ is called **alternating**, and in odd characteristic, but not characteristic $2$, skew-symmetric and alternating forms are the same thing.

A line spanned by two isotropic points, $x$ and $y$, such that $B(x,y)=1$ is a **hyperbolic line**. Any space with a non-degenerate bilinear (or Hermitian) form can be decomposed as the orthogonal sum of hyperbolic lines (i.e. as a vector space, decomposed as an orthogonal sum of hyperbolic planes), possibly together with an **anisotropic** space containing no isotropic points at all. There are no non-empty symplectic anisotropic spaces, so all symplectic spaces are odd-dimensional (projectively — the corresponding vector spaces are even-dimensional).

There *are* anisotropic orthogonal points and lines (over any finite field including in even characteristic), but all the orthogonal spaces we consider here will be a sum of hyperbolic lines — we say they are of **plus** type. (The odd-dimensional projective spaces with a residual anisotropic line are of **minus** type.)

A **quadratic form** $Q(x)$ is defined by the conditions

i) $Q(x+y)=Q(x)+Q(y)+B(x,y)$, where $B$ is a symmetric bilinear form.

ii) $Q(\lambda x)=\lambda^2Q(x)$ for any scalar $\lambda$.

There are some non-degeneracy conditions I won’t go into.

Obviously, a quadratic form implies a particular symmetric bilinear form, by $B(x,y)=Q(x+y)-Q(x)-Q(y)$. In odd characteristic, we can go the other way: $Q(x)=\frac{1}{2}B(x,x)$.

We denote the group preserving an orthogonal structure of plus type on an $n$-dimensional projective space over $\mathbb{F}_q$ by $\mathrm{GO}_{n+1}^+(q)$, by analogy with $\mathrm{GL}_{n+1}(q)$. Similarly we have $\mathrm{SO}_{n+1}^+(q)$, $\mathrm{PGO}_{n+1}^+(q)$ and $\mathrm{PSO}_{n+1}^+(q)$. However, whereas $\mathrm{PSL}_n(q)$ is simple apart from $2$ exceptions, we usually have an index $2$ subgroup of $\mathrm{SO}_{n+1}^+(q)$, called $\Omega_{n+1}^+(q)$, and a corresponding index $2$ subgroup of $\mathrm{PSO}_{n+1}^+(q)$, called $\mathrm{P}\Omega_{n+1}^+(q)$, and it is the latter that is simple. (There is an infinite family of exceptions, where $\mathrm{PSO}_{n+1}^+(q)$ is simple.)

Symplectic structures are easier — the determinant is automatically $1$, so we just have $\mathrm{Sp}_{n+1}(q)$ and $\mathrm{PSp}_{n+1}(q)$, with the latter being simple except for $3$ exceptions.

Just as a point with $B(x,x)=0$ is an isotropic point, so any subspace with $B$ identically $0$ on it is an isotropic subspace.

And just as the linear groups act on incidence geometries given by the (‘classical’) projective spaces, so the symplectic and orthogonal act on **polar** spaces, whose points, lines, planes, etc, are just the isotropic points, isotropic lines, isotropic planes, etc given by the bilinear (or Hermitian) form. We denote an orthogonal polar space of plus type on an $n$-dimensional projective space over $\mathbb{F}_q$ by $\mathrm{Q}_n^+(q)$.

In *characteristic $2$*, a lot of this goes wrong, but in a way that can be fixed and mostly turns out the same.

1) Symmetric and skew-symmetric forms are the same thing! There are still distinct orthogonal and symplectic structures and groups, but we can’t use this as the distinction.

2) Alternating and skew-symmetric forms are not the same thing! Alternating forms are all skew-symmetric (aka symmetric) but not vice versa. A symplectic structure is given by an *alternating* form — and of course this definition works in odd characteristic too.

3) Symmetric bilinear forms are no longer in bijection with quadratic forms: every quadratic form gives a unique symmetric (aka skew-symmetric, and indeed alternating) bilinear form, but an alternating form is compatible with multiple quadratic forms. We use non-degenerate quadratic forms to define orthogonal structures, rather than symmetric bilinear forms — which of course works in odd characteristic too. (Note also from the above that in characteristic $2$ an orthogonal structure has an associated symplectic structure, which it shares with other orthogonal structures.)
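Point 3 is easy to see in miniature. The sketch below (my own toy example on $\mathbb{F}_2^2$, not from the post) shows two genuinely different quadratic forms, differing by a ‘square’ $x_0^2 = x_0$, that polarise to the same alternating bilinear form:

```python
from itertools import product

V = list(product((0, 1), repeat=2))  # the vectors of F_2^2

def polarise(Q):
    # the associated bilinear form: B(x,y) = Q(x+y) + Q(x) + Q(y), mod 2
    return {(x, y): Q((x[0] ^ y[0], x[1] ^ y[1])) ^ Q(x) ^ Q(y)
            for x in V for y in V}

Q1 = lambda x: x[0] & x[1]            # a quadratic form on F_2^2
Q2 = lambda x: (x[0] & x[1]) ^ x[0]   # differs by the square x_0^2 = x_0

assert polarise(Q1) == polarise(Q2)    # same alternating bilinear form
assert any(Q1(v) != Q2(v) for v in V)  # but the quadratic forms differ
```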

We now have both isotropic subspaces on which the bilinear form is identically $0$, and **singular** subspaces on which the quadratic form is identically $0$, with the latter being a subset of the former. It is the singular spaces which go to make up the polar space for the orthogonal structure.

To cover both cases, we’ll refer to these isotropic/singular projective spaces inside the polar spaces as **flats**.

Everything else is still the same — decomposition into hyperbolic lines and an anisotropic space, plus and minus types, $\Omega_{n+1}^+(q)$ inside $\mathrm{SO}_{n+1}^+(q)$, polar spaces, etc.

Over $\mathbb{F}_2$, we have that $\mathrm{GO}_{n+1}^+(q)$, $\mathrm{SO}_{n+1}^+(q)$, $\mathrm{PGO}_{n+1}^+(q)$ and $\mathrm{PSO}_{n+1}^+(q)$ are all the same group, as are $\Omega_{n+1}^+(q)$ and $\mathrm{P}\Omega_{n+1}^+(q)$.

The *vector space* dimension of the maximal flats in a polar space is the polar **rank** of the space, one of its most important invariants — it’s the number of hyperbolic lines in its orthogonal decomposition.

$\mathrm{Q}_{2m-1}^+(q)$ has rank $m$. The maximal flats fall into two classes. In odd characteristic, the classes are preserved by $\mathrm{SO}_{2m}^+(q)$ but interchanged by the elements of $\mathrm{GO}_{2m}^+(q)$ with determinant $-1$. In even characteristic, the classes are preserved by $\Omega_{2m}^+(q)$, but interchanged by elements of $\mathrm{GO}_{2m}^+(q)$.

Finally, I’ll refer to the value of the quadratic form at a point, $Q(x)$, as the **norm** of $x$, even though in Euclidean space we’d call it “half the norm-squared”.

Here are some useful facts about $\mathrm{Q}_{2m-1}^+(q)$:

1a. The number of points is $\displaystyle\frac{\left(q^m-1\right)\left(q^{m-1}+1\right)}{q-1}$.

1b. The number of maximal flats is $\prod_{i=0}^{m-1}\left(1+q^i\right)$.

1c. Two maximal flats of different types must intersect in a flat of odd codimension; two maximal flats of the same type must intersect in a flat of even codimension.
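Facts 1a and 1b are easy to check numerically for the two quadrics used later in the post; here is a minimal sketch (the function names are mine):

```python
from math import prod

def num_points(m, q):
    # fact 1a: number of points of Q_{2m-1}^+(q)
    return (q**m - 1) * (q**(m - 1) + 1) // (q - 1)

def num_max_flats(m, q):
    # fact 1b: number of maximal flats of Q_{2m-1}^+(q)
    return prod(1 + q**i for i in range(m))

assert (num_points(3, 2), num_max_flats(3, 2)) == (35, 30)    # Q_5^+(2)
assert (num_points(4, 2), num_max_flats(4, 2)) == (135, 270)  # Q_7^+(2)
```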

Here are two more general facts.

1d. Pick a projective space $\Pi$ of dimension $n$. Pick a point $p$ in it. The space whose points are lines through $p$, whose lines are planes through $p$, etc, with incidence inherited from $\Pi$, is a projective space of dimension $n-1$.

1e. Pick a polar space $\Sigma$ of rank $m$. Pick a point $p$ in it. The space whose points are lines (i.e. $1$-flats) through $p$, whose lines are planes (i.e. $2$-flats) through $p$, etc, with incidence inherited from $\Sigma$, is a polar space of the same type, of rank $m-1$.

The bivectors of a $4$-dimensional vector space constitute a $6$-dimensional vector space. Apart from the zero bivector, these fall into two types: degenerate ones which can be decomposed as the wedge product of two vectors and therefore correspond to planes (or, projectively, lines); and non-degenerate ones, which, by wedging with vectors on each side, give rise to symplectic forms. Wedging two bivectors gives an element of the $1$-dimensional space of $4$-vectors, and, picking a basis, the single component of this wedge product gives a non-degenerate symmetric bilinear form on the $6$-dimensional vector space of bivectors, and hence, in odd characteristic, an orthogonal space, which turns out to be of plus type. It also turns out that this can be carried over to characteristic $2$ as well, and gives a correspondence between $\mathrm{PG}(3,q)$ and $\mathrm{Q}_5^+(q)$, and isomorphisms between their symmetry groups. It is precisely the degenerate bivectors that are the ones of norm $0$, and we get the following correspondence:

$\array{\arrayopts{\collayout{left}\collines{dashed}\rowlines{solid dashed}\frame{solid}} \mathbf{\mathrm{Q}_5^+(q)}&\mathbf{\mathrm{PG}(3,q)}\\ \text{point}&\text{line}\\ \text{orthogonal points}&\text{intersecting lines}\\ \text{line}&\text{plane pencil}\\ \text{plane}_1&\text{point}\\ \text{plane}_2&\text{plane} }$

Here, “plane pencil” is all the lines that both go through a particular point and lie in a particular plane: effectively a point on a plane. The two types of plane in $\mathrm{Q}_5^+(q)$ are two families of maximal flats, and they correspond, in $\mathrm{PG}(3,q)$, to “all the lines through a particular point” and “all the lines in a particular plane”.

From fact 1c above, in $\mathrm{Q}_5^+(q)$ we have that two maximal flats of different type must either intersect in a line or not intersect at all, corresponding to the fact in $\mathrm{PG}(3,q)$ that a point either lies on a plane or doesn’t; while two maximal flats of the same type must intersect in a point, corresponding to the facts in $\mathrm{PG}(3,q)$ that any two points lie on a line, and any two planes intersect in a line.

In $\mathrm{Q}_7^+(q)$, you may observe from facts 1a and 1b that the following three things are equal in number: points; maximal flats of one type; maximal flats of the other type. This is because these three things are cycled by the triality symmetry.

Over $\mathbb{F}_2$, we have the following things:

2a. $\mathrm{PG}(3,2)$ has $15$ planes, each containing $7$ points and $7$ lines. It has (dually) $15$ points, each contained in $7$ lines and $7$ planes. It has $35$ lines, each containing $3$ points and contained in $3$ planes.

2b. $\mathrm{Q}_5^+(2)$ has $35$ points, corresponding to the $35$ lines of $\mathrm{PG}(3,2)$, and $30$ planes, corresponding to the $15$ points and $15$ planes of $\mathrm{PG}(3, 2)$. There’s lots and lots of other interesting stuff, but we will ignore it.

2c. $\mathrm{Q}_7^+(2)$ has $135$ points and $270$ $3$-spaces, i.e. two families of maximal flats containing $135$ elements each. A projective $7$-space has $255$ points, so if we give it an orthogonal structure of plus type, it will have $255-135=120$ points of norm $1$.
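The point counts in 2b and 2c can be verified by brute force over $\mathbb{F}_2$. Here is a sketch, where I assume a plus-type quadratic form that pairs the coordinates into hyperbolic planes (any plus-type form gives the same counts):

```python
from itertools import product

def Q(x):
    # plus-type quadratic form: pair up coordinates into hyperbolic planes,
    # Q(x) = x0 x1 + x2 x3 + ... (mod 2)
    return sum(x[i] & x[i + 1] for i in range(0, len(x), 2)) % 2

def counts(n):
    pts = [v for v in product((0, 1), repeat=n) if any(v)]
    return sum(Q(v) == 0 for v in pts), sum(Q(v) == 1 for v in pts)

assert counts(6) == (35, 28)    # Q_5^+(2): 35 singular points
assert counts(8) == (135, 120)  # Q_7^+(2): 135 singular, 120 of norm 1
```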

Now we move on to the second part.

We’ll coordinatise the $\mathrm{E}_8$ lattice so that the coordinates of its points are of the following types:

a) All integer, summing to an even number

b) All integer+$\frac{1}{2}$, summing to an odd number.

Then the roots are of the following types:

a) All permutations of $\left(\pm1,\pm1,0,0,0,0,0,0\right)$

b) All points like $\left(\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2}, \pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2}\right)$ with an *odd* number of minus signs.

We now quotient $\mathrm{E}_8$ by $2\mathrm{E}_8$. The elements of the quotient can be represented by the following:

a) All coordinates are $1$ or $0$, an even number of each.

b) All coordinates are $\pm\frac{1}{2}$ with either $1$ or $3$ minus signs.

c) Take an element of type b and put a star after it. The meaning of this is: you can take any coordinate $\frac{1}{2}$ and replace it with $-\frac{3}{2}$, or take any coordinate $-\frac{1}{2}$ and replace it with $\frac{3}{2}$, to get an $\mathrm{E}_8$ lattice element representing this element of $\mathrm{E}_8/2\mathrm{E}_8$.

This is an $8$-dimensional vector space over $\mathbb{F}_2$.

Now we put the following quadratic form on this space: $Q(x)$ is half the Euclidean norm-squared, mod $2$. This gives rise to the following bilinear form: the Euclidean dot product mod $2$. This turns out to be a perfectly good non-degenerate quadratic form of plus type over $\mathbb{F}_2$.

There are $120$ elements of norm $1$, and these correspond to roots of $\mathrm{E}_8$ , with $2$ roots per element (related by switching the sign of all coordinates).

a) Elements of shape $\left(1,1,0,0,0,0,0,0\right)$ are already roots in this form.

b) Elements of shape $\left(0,0,1,1,1,1,1,1\right)$ correspond to the roots obtained by taking the complement (replacing all $1$s by $0$ and vice versa) and then changing the sign of one of the $1$s.

c) Elements in which all coordinates are $\pm\frac{1}{2}$ with either $1$ or $3$ minus signs are already roots, and by switching all the signs we get the half-integer roots with $5$ or $7$ minus signs.

There are $135$ non-zero elements of norm $0$, and these all correspond to lattice points in shell $2$, with $16$ lattice points per element of the vector space.

a) There are $70$ elements of shape $\left(1,1,1,1,0,0,0,0\right)$. We get $8$ lattice points by changing an even number of signs (including $0$). We get another $8$ lattice points by taking the complement and then changing an odd number of signs.

b) There is $1$ element of shape $\left(1,1,1,1,1,1,1,1\right)$. This corresponds to the $16$ lattice points of shape $\left(\pm2,0,0,0,0,0,0,0\right)$.

c) There are $64$ elements like $\left(\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac {1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2}\right)^*$, with $1$ or $3$ minus signs. We get $8$ actual lattice points by replacing $\pm\frac{1}{2}$ by $\mp\frac{3}{2}$ in one coordinate, and another $8$ by changing the signs of all coordinates.

This accounts for all $16\cdot135=2160$ points in shell $2$.
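As a sanity check, the shell sizes can be recovered by brute-force enumeration in the coordinates given above. The sketch below is mine; it stores doubled coordinates so that everything stays an integer, and counts $240$ roots and $2160$ second-shell points:

```python
from itertools import product

# E8 points in DOUBLED coordinates: all-integer points have even entries and
# doubled coordinate sum = 0 mod 4 (even actual sum); half-integer points have
# odd entries and doubled coordinate sum = 2 mod 4 (odd actual sum).
def e8_shell(norm_sq):
    target = 4 * norm_sq  # |2x|^2 = 4 |x|^2
    pts = []
    for v in product((-4, -2, 0, 2, 4), repeat=8):  # all-integer points
        if sum(v) % 4 == 0 and sum(c * c for c in v) == target:
            pts.append(v)
    for v in product((-3, -1, 1, 3), repeat=8):     # half-integer points
        if sum(v) % 4 == 2 and sum(c * c for c in v) == target:
            pts.append(v)
    return pts

assert len(e8_shell(2)) == 240   # the roots (first shell)
assert len(e8_shell(4)) == 2160  # the second shell
```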

**Isotropic:**

$\array{\arrayopts{\collayout{left}\rowlines{solid}\collines{solid}\frame{solid}} \mathbf{shape}&\mathbf{number}\\ \left(1,1,1,1,1,1,1,1\right)&1\\ \left(1,1,1,1,0,0,0,0\right)&70\\ \left(\pm\tfrac{1}{2},\pm\tfrac{1}{2},\pm\tfrac{1}{2},\pm\tfrac{1}{2},\pm\tfrac{ 1}{2},\pm\tfrac{1}{2},\pm\tfrac{1}{2},\pm\tfrac{1}{2}\right)^*&64\\ \mathbf{total}&\mathbf{135} }$

**Anisotropic:**

$\array{\arrayopts{\collayout{left}\rowlines{solid}\collines{solid}\frame{solid}} \mathbf{shape}&\mathbf{number}\\ \left(1,1,1,1,1,1,0,0\right)&28\\ \left(1,1,0,0,0,0,0,0\right)&28\\ \left(\pm\tfrac{1}{2},\pm\tfrac{1}{2},\pm\tfrac{1}{2},\pm\tfrac{1}{2},\pm\tfrac{ 1}{2},\pm\tfrac{1}{2},\pm\tfrac{1}{2},\pm\tfrac{1}{2}\right)&64\\ \mathbf{total}&\mathbf{120} }$
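Both tables can be checked directly: enumerate the $255$ nonzero coset representatives described above and evaluate $Q$. A sketch (mine, again in doubled coordinates, so $\left(1,1,0,\dots\right)$ is stored as $(2,2,0,\dots)$ and $\pm\frac{1}{2}$ as $\pm1$):

```python
from itertools import combinations

reps = []  # nonzero coset representatives of E8/2E8, doubled coordinates

# a) all-integer coordinates 0/1 with an even, nonzero number of 1s
for k in (2, 4, 6, 8):
    for pos in combinations(range(8), k):
        reps.append(tuple(2 if i in pos else 0 for i in range(8)))

# b) all coordinates +-1/2 with 1 or 3 minus signs
half_reps = []
for k in (1, 3):
    for pos in combinations(range(8), k):
        half_reps.append(tuple(-1 if i in pos else 1 for i in range(8)))
reps += half_reps

# c) starred versions: replace one +1/2 by -3/2 to land back in the lattice
for v in half_reps:
    i = v.index(1)
    reps.append(v[:i] + (-3,) + v[i + 1:])

assert len(reps) == 255  # all nonzero elements of E8/2E8

def Q(v):
    # half the Euclidean norm-squared, mod 2 (coordinates are doubled,
    # so the Euclidean norm-squared is sum(c^2)/4)
    return (sum(c * c for c in v) // 8) % 2

assert sum(Q(v) == 0 for v in reps) == 135  # isotropic: 1 + 70 + 64
assert sum(Q(v) == 1 for v in reps) == 120  # anisotropic: 28 + 28 + 64
```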

Since the quadratic form in $\mathbb{F}_2$ comes from the quadratic form in Euclidean space, it is preserved by the Weyl group $W(\mathrm{E}_8)$. In fact the homomorphism $W(\mathrm{E}_8)\rightarrow \mathrm{GO}_8^+(2)$ is onto, although (contrary to what I said in an earlier comment) it is a double cover — the element of $W(\mathrm{E}_8)$ that reverses the sign of all coordinates is a (in fact, the) non-trivial element of the kernel.

Pick a Fano plane structure on a set of seven points.

Here is a large $\mathrm{E}_8$ containing $\left(2,0,0,0,0,0,0,0\right)$:

(where $1\le i,j,k,p,q,r,s\le7$)

$\pm2e_i$

$\pm e_0\pm e_i\pm e_j\pm e_k$ where $i$, $j$, $k$ lie on a line in the Fano plane

$\pm e_p\pm e_q\pm e_r\pm e_s$ where $p$, $q$, $r$ , $s$ lie off a line in the Fano plane.

Reduced mod $2$, these come to

i) $\left(1,1,1,1,1,1,1,1\right)$

ii) $e_0+e_i+e_j+e_k$ where $i$, $j$, $k$ lie on a line in the Fano plane. E.g. $\left(1,1,1,0,1,0,0,0\right)$.

iii) $e_p+e_q+e_r+e_s$ where $p$, $q$, $r$, $s$ lie off a line in the Fano plane. E.g. $\left(0,0,0,1,0,1,1,1\right)$.

Each of these corresponds to $16$ of the large $\mathrm{E}_8$ roots.

Some notes on these points:

1) They’re all isotropic, since they have a multiple of $4$ non-zero entries.

2) They’re mutually orthogonal.

a) Elements of types ii and iii are all orthogonal to $\left(1,1,1,1,1,1,1,1\right)$ because they have an even number of ones (like all all-integer elements).

b) Two elements of type ii overlap in two places: $e_0$ and the point of the Fano plane that they share.

c) If an element $x$ of type ii and an element $y$ of type iii are mutual complements, obviously they have no overlap. Otherwise, the complement of $y$ is an element of type ii, so $x$ overlaps with it in exactly two places; hence $x$ overlaps with $y$ itself in the other two non-zero places of $x$.

d) From $c$, given two elements of type iii, one will overlap with the complement of the other in two places, hence (by the argument of c) will overlap with the other element itself in two places.

3) Adjoining the zero vector, they give a set closed under addition.

The rule for addition of all-integer elements is reasonably straightforward: if they are orthogonal, then treat the $1$s and $0$s as bits and add mod $2$. If they aren’t orthogonal, then do the same, then take the complement of the answer.

a) Adding $\left(1,1,1,1,1,1,1,1\right)$ to any of the others just gives the complement, which is a member of the set.

b) Adding two elements of type ii, we set to $0$ the $e_0$ component and the component corresponding to the point of intersection in the Fano plane, leaving the $4$ components where they don’t overlap, which are just the complement of the third line of the Fano plane through their point of intersection, and is hence a member of the set.

c) Each element of type iii is the sum of the element of type i and an element of type ii, hence is covered implicitly by cases a and b.

4) There are $15$ elements of the set.

a) There is $\left(1,1,1,1,1,1,1,1\right)$.

b) There are $7$ corresponding to lines of the Fano plane.

c) There are $7$ corresponding to the complements of lines of the Fano plane.

From the above, these $15$ elements form a maximal flat of $\mathrm{Q}_7^+(2)$. (That is, $15$ points projectively, forming a projective $3$-space in a projective $7$-space.)
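Here is a sketch verifying all of the above at once for one concrete Fano plane structure (the labelling of points $1,\dots,7$ with lines $\{i,j,i\oplus j\}$ is my choice): the $15$ vectors are singular, mutually orthogonal, and closed under addition, so together with $0$ they form a $4$-dimensional totally singular subspace, i.e. a maximal flat:

```python
# a Fano plane on points 1..7: the lines are the triples {i, j, i XOR j}
fano_lines = [{1,2,3}, {1,4,5}, {1,6,7}, {2,4,6}, {2,5,7}, {3,4,7}, {3,5,6}]

def vec(ones):  # 0/1 vector in F_2^8 with 1s at the given indices
    return tuple(1 if i in ones else 0 for i in range(8))

flat = {vec(range(8))}                                   # type i
flat |= {vec({0} | L) for L in fano_lines}               # type ii
flat |= {vec(set(range(1, 8)) - L) for L in fano_lines}  # type iii
assert len(flat) == 15

def Q(x):  # half the number of 1s, mod 2 (all-integer representatives)
    return (sum(x) // 2) % 2
def dot(x, y):
    return sum(a & b for a, b in zip(x, y)) % 2
def add(x, y):
    return tuple(a ^ b for a, b in zip(x, y))

assert all(Q(x) == 0 for x in flat)                      # all singular
assert all(dot(x, y) == 0 for x in flat for y in flat)   # mutually orthogonal
assert all(x == y or add(x, y) in flat
           for x in flat for y in flat)                  # closed under addition
```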

That a large $\mathrm{E}_8$ lattice projects to a flat is straightforward:

First, as a lattice it’s closed under addition over $\mathbb{Z}$, so should project to a subspace over $\mathbb{F}_2$.

Second, since the cosine of the angle between two roots of $\mathrm{E}_8$ is always a multiple of $\frac{1}{2}$, and the points in the second shell have Euclidean length $2$, the dot product of two large $\mathrm{E}_8$ roots must always be an even integer. Also, the large $\mathrm{E}_8$ roots project to norm $0$ points. So all points of the large $\mathrm{E}_8$ should project to norm $0$ points.

It’s not instantly obvious to me that large $\mathrm{E}_8$ should project to a *maximal* flat, but it clearly does.

So I’ll assume each $\mathrm{E}_8$ corresponds to a maximal flat, and generally that everything that I’m going to talk about over $\mathbb{F}_2$ lifts faithfully to Euclidean space, which seems plausible (and works)! But I haven’t proved it. Anyway, assuming this, a bunch of stuff follows.

We immediately know there are $270$ large $\mathrm{E}_8$ lattices, because there are $270$ maximal flats in $\mathrm{Q}_7^+(2)$, either from the formula $\prod_{i=0}^{m-1}\left(1+q^i\right)$, or immediately from triality and the fact that there are $135$ points in $\mathrm{Q}_7^+(2)$.

We can now bring to bear some more general theory. How many large $\mathrm{E}_8$ root-sets share a point? Let us project this down and instead ask, How many maximal flats share a given point?

Recall fact 1e:

1e. Pick a polar space $\Sigma$ of rank $m$. Pick a point $p$ in it. The space whose points are lines (i.e. $1$-flats) through $p$, whose lines are planes (i.e. $2$-flats) through $p$, etc, with incidence inherited from $\Sigma$, form a polar space of the same type, of rank $m-1$.

So pick a point $p$ in $\mathrm{Q}_7^+(2)$. The space of all flats containing $p$ is isomorphic to $\mathrm{Q}_5^+(2)$. The maximal flats containing $p$ in $\mathrm{Q}_7^+(2)$ correspond to all maximal flats of $\mathrm{Q}_5^+(2)$, of which there are $30$. So there are $30$ maximal flats of $\mathrm{Q}_7^+(2)$ containing $p$, and hence $30$ large $\mathrm{E}_8$ lattices containing a given point.

We see this if we fix $\left(1,1,1,1,1,1,1,1\right)$, and the maximal flats correspond to the $30$ ways of putting a Fano plane structure on $7$ points. Via the Klein correspondence, I guess this is a way to show that the $30$ Fano plane structures correspond to the points and planes of $\mathrm{PG}(3,2)$.
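The count of $30$ Fano plane structures on a fixed $7$-point set is itself easy to verify, by letting $S_7$ act on one structure and collecting the distinct images (consistent with $|S_7|/|\mathrm{PGL}_3(2)| = 5040/168 = 30$):

```python
from itertools import permutations

# one Fano plane on {1,...,7}: lines {i, j, i XOR j}
base = [frozenset(s) for s in
        ({1,2,3}, {1,4,5}, {1,6,7}, {2,4,6}, {2,5,7}, {3,4,7}, {3,5,6})]

structures = set()
for p in permutations(range(1, 8)):
    relabel = dict(zip(range(1, 8), p))
    structures.add(frozenset(frozenset(relabel[x] for x in L) for L in base))

assert len(structures) == 30  # 30 Fano plane structures on 7 labelled points
```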

Now assume that large $\mathrm{E}_8$ lattices with non-intersecting sets of roots correspond to non-intersecting maximal flats. The intersections of maximal flats obey rule 1c:

1c. Two maximal flats of different types must intersect in a flat of odd codimension; two maximal flats of the same type must intersect in a flat of even codimension.

So two $3$-flats of opposite type must intersect in a plane or a point; if they are of the same type, they must intersect in a line or not at all (the empty set having dimension $-1$).

We want to count the dimension $-1$ intersections, but it’s easier to count the dimension $1$ intersections and subtract from the total.

So, given a $3$-flat, how many other $3$-flats intersect it in a line?

Pick a point $p$ in $\mathrm{Q}_7^+(2)$. The $3$-flats sharing that point correspond to the planes of $\mathrm{Q}_5^+(2)$. Then the set of $3$-flats sharing just a line through $p$ with our given $3$-flat correspond to the set of planes of $\mathrm{Q}_5^+(2)$ sharing a single point with a given plane. By what was said above, this is *all* the other planes of the same type (there’s no other dimension these intersections can have). There are $14$ of these ($15$ planes minus the given one).

So, given a point in the $3$-flat, there are $14$ other $3$-flats sharing a line (and no more) which passes through the point. There are $15$ points in the $3$-flat, but on the other hand there are $3$ points in a line, giving $\frac{14\cdot15}{3}=70$ $3$-spaces sharing a line (and no more) with a given $3$-flat.

But there are a total of $135$ $3$-flats of a given type. If $1$ of them is a given $3$-flat, and $70$ of them intersect that $3$-flat in a line, then $135-1-70=64$ don’t intersect the $3$-flat at all. So there should be $64$ large $\mathrm{E}_8$ lattices whose roots don’t meet the roots of a given large $\mathrm{E}_8$ lattice.

We can also look at the intersections of large $\mathrm{E}_8$ root systems with large $\mathrm{E}_8$ root systems of opposite type. What about the intersections of two $3$-flats in a plane? If we focus just on planes passing through a particular point, this corresponds, in $\mathrm{Q}_5^+(2)$, to planes intersecting in a line. There are $7$ planes intersecting a given plane in a line (from the Klein correspondence — they correspond to the seven points in a plane or the seven planes containing a point of $\mathrm{PG}(3,2)$). So there are $7$ $3$-flats of $\mathrm{Q}_7^+(2)$ which intersect a given $3$-flat in a plane containing a given point. There are $15$ points to choose from, but $7$ points in a plane, meaning that there are $\frac{7\cdot15}{7}=15$ $3$-flats intersecting a given $3$-flat in a plane. A plane has $7$ points, so translating that to $\mathrm{E}_8$ lattices should give $7\cdot16=112$ shared roots.

That leaves $135-15=120$ $3$-flats intersecting a given $3$-flat in a single point, corresponding to $16$ shared roots.

$\array{\arrayopts{\collayout{left}\collines{solid}\rowlines{solid}\frame{solid}} \mathbf{\text{intersection dim.}}&\mathbf{\text{number}}&\mathbf{\text{same type}}\\ 2&15&No\\ 1&70&Yes\\ 0&120&No\\ -1&64&Yes }$
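The whole table (and the flat count of $270$) can be verified by brute force. The sketch below is my own: it fixes a plus-type quadratic form on $\mathbb{F}_2^8$ (pairing coordinates into hyperbolic planes; any plus-type form gives the same answer), grows the totally singular subspaces one dimension at a time, and then tallies how a fixed maximal flat meets all $270$ of them:

```python
from itertools import product
from collections import Counter

def Q(x):  # plus-type quadratic form: four hyperbolic pairs
    return (x[0] & x[1]) ^ (x[2] & x[3]) ^ (x[4] & x[5]) ^ (x[6] & x[7])

def B(x, y):  # associated bilinear form: B(x,y) = Q(x+y) + Q(x) + Q(y)
    z = tuple(a ^ b for a, b in zip(x, y))
    return Q(z) ^ Q(x) ^ Q(y)

zero = (0,) * 8
singular = [v for v in product((0, 1), repeat=8) if v != zero and Q(v) == 0]
assert len(singular) == 135

def closure(S, v):  # the subspace spanned by the subspace S and the vector v
    return frozenset(S | {tuple(a ^ b for a, b in zip(s, v)) for s in S})

# grow totally singular subspaces dimension by dimension
flats = {frozenset({zero, v}) for v in singular}
for _ in range(3):
    flats = {closure(F, v)
             for F in flats
             for v in singular
             if v not in F and all(B(v, w) == 0 for w in F)}
assert len(flats) == 270  # the maximal flats (projective 3-flats)

# intersection pattern of one fixed maximal flat with all 270
F0 = next(iter(flats))
dims = Counter((len(F0 & F)).bit_length() - 1 for F in flats)
assert dims == Counter({4: 1, 3: 15, 2: 70, 1: 120, 0: 64})
```

Here the tallies are vector-space dimensions; $4,3,2,1,0$ correspond to the projective dimensions $3,2,1,0,-1$ in the table.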

A couple of points here related to triality. Under triality, one type of maximal flat gets sent to the other type, and the other type gets sent to singular points ($0$-flats). The incidence relation of “intersecting in a plane” gets sent to ordinary incidence of a point with a flat. So the fact that there are $15$ maximal flats that intersect a given maximal flat in a plane is a reflection of the fact that there are $15$ points in a maximal flat (or, dually, $15$ maximal flats of a given type containing a given point).

The intersection of two maximal flats of the same type translates into a relation between two singular points. Just from the numbers, we’d expect “intersection in a line” to translate into “orthogonal to”, and “disjoint” to translate into “not orthogonal to”.

In that case, a pair of maximal flats intersecting in a (flat) line translates to $2$ mutually orthogonal flat points — whose span is a flat line. Which makes sense, because under triality, $1$-flats transform to $1$-flats, reflecting the fact that the central point of the $D_4$ diagram (representing lines) is sent to itself under triality.

Likewise, two disjoint maximal flats translate to a pair of non-orthogonal singular points, defining a hyperbolic line.

Fixing a hyperbolic line (pointwise) obviously reduces the rank of the polar space by $1$, picking out a $\mathrm{GO}_6^+(2)$ subgroup of $\mathrm{GO}_8^+(2)$. By the Klein correspondence, $\mathrm{GO}_6^+(2)$ is isomorphic to $\mathrm{PSL}_4(2)$, which is just the automorphism group of $\mathrm{PG}(3, 2)$ — i.e., here, the automorphism group of a maximal flat. So the joint stabiliser of two disjoint maximal flats is just automorphisms of one of them, which forces corresponding automorphisms of the other. This group is also isomorphic to the symmetric group $S_8$, giving all permutations of the coordinates (of the $\mathrm{E}_8$ lattice).
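Since the identifications here lean on order coincidences, a quick arithmetic check may help: $|\mathrm{GL}_4(2)|=(2^4-1)(2^4-2)(2^4-4)(2^4-8)=20160$, which equals $|A_8|=8!/2$ (the standard isomorphism $\mathrm{PSL}_4(2)\cong A_8$; $|S_8|=40320$ is twice that). A small Python sketch:

```python
from math import factorial

def gl_order(n, q):
    # |GL_n(q)| = (q^n - 1)(q^n - q) ... (q^n - q^(n-1))
    order = 1
    for i in range(n):
        order *= q**n - q**i
    return order

# Over F_2 the centre of GL_4 is trivial, so |PSL_4(2)| = |GL_4(2)|
print(gl_order(4, 2))        # 20160
print(factorial(8) // 2)     # |A_8| = 20160, matching PSL_4(2) ≅ A_8
```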

(My guess would be that the actions of $\mathrm{GL}_4(2)$ on the two maximal flats would be related by an outer automorphism of $\mathrm{GL}_4(2)$, in which the action on the points of one flat would match an action on the planes of the other, and vice versa, preserving the orthogonality relations coming from the symplectic structure implied by the orthogonal structure — i.e. the alternating form implied by the quadratic form.)

We see this “non-orthogonal singular points” $\leftrightarrow$ “disjoint maximal flats” echoed when we look at nearest neighbours.

Nearest neighbours in the second shell of the $\mathrm{E}_8$ lattice are separated from each other by an angle of $\cos^{-1}\frac{3}{4}$, so have a mutual dot product of $3$, hence are non-orthogonal over $\mathbb{F}_2$.

Let us choose a fixed point $\left(2,0,0,0,0,0,0,0\right)$ in the second shell of $\mathrm{E}_8$. Its chosen representative in our version of $\mathrm{PG}(7,2)$ is $\left(1,1,1,1,1,1,1,1\right)$, which has the convenient property that it is orthogonal to the all-integer points, and non-orthogonal to the half-integer points. The half-integer points in the second shell are just those that we write as $\left(\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2}, \pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2}\right)^\star$ in our notation, where the $\star$ means that we should replace any $\frac{1}{2}$ by $-\frac{3}{2}$ or replace any $-\frac{1}{2}$ by $\frac{3}{2}$ to get a corresponding element in the second shell of the $\mathrm{E}_8$ lattice, and where we require $1$ or $3$ minus signs in the notation; each such element corresponds to two points in the lattice with opposite signs in all coordinates.

Now, since each reduced isotropic point represents $16$ points of the second shell, merely saying that two reduced points have dot product of $1$ is not enough to pin down actual nearest neighbours.

But very conveniently, the sets of $16$ are formed in parallel ways for the particular setup we have chosen. Namely, lifting $\left(1,1,1,1,1,1,1,1\right)$ to a second-shell element, we can choose to put the $\pm2$ in each of the $8$ coordinates, with positive or negative sign, and lifting an element of the form $\left(\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2}, \pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2}\right)^*$ to a second-shell element, we can choose to put the $\pm\frac{3}{2}$ in each of the $8$ coordinates, with positive or negative sign.

So we can line up our conventions, and choose, e.g., specifically $\left(+2,0,0, 0,0,0,0,0\right)$, and choose neighbours of the form $\left(+\frac{3}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2}, \pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2}\right)$, with an even number of minus signs.

This tells us we have $64$ nearest neighbours, corresponding to the $64$ isotropic points of half-integer form. Let us call this set of points $T$.
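These counts are easy to verify by brute force. The sketch below uses the standard "even coordinate sum" model of $\mathrm{E}_8$ (so the sign conventions for the half-integer vectors may differ from the star notation above), enumerating the second shell and then the neighbours of $\left(2,0,\dots,0\right)$ at dot product $3$:

```python
from itertools import product

# E8 in the even-coordinate-sum convention: all coordinates integers, or all
# half-odd-integers, with the coordinate sum in 2Z.
def second_shell():
    shell = []
    for coords in ([-2, -1, 0, 1, 2], [-1.5, -0.5, 0.5, 1.5]):
        for v in product(coords, repeat=8):
            if sum(c * c for c in v) == 4 and sum(v) % 2 == 0:
                shell.append(v)
    return shell

shell = second_shell()
print(len(shell))  # 2160 vectors of squared norm 4

p = (2, 0, 0, 0, 0, 0, 0, 0)
neighbours = [v for v in shell if sum(a * b for a, b in zip(p, v)) == 3]
print(len(neighbours))  # 64 nearest neighbours, all of half-integer type
```

All 64 neighbours indeed have first coordinate $\frac{3}{2}$ and the remaining coordinates $\pm\frac{1}{2}$.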

Now pick one of those $64$ isotropic points, call it $p$. It lies, as we showed earlier, in $30$ maximal flats, corresponding to the $30$ plane flats of $\mathrm{Q}_5^+(2)$, and we would like to understand the intersections of these flats with $T$: that is, those nearest neighbours which belong to each large $\mathrm{E}_8$ lattice.

In any maximal flat, i.e. any $3$-flat, containing $p$, there will be $7$ lines passing through $p$, each with $2$ other points on it, totalling $14$ points which, together with $p$ itself, form the $15$ points of a copy of $\mathrm{PG}(3,2)$.

Now, the sum of two all-integer points is an all-integer point, the sum of two half-integer points is also an all-integer point, and the sum of a half-integer point and an all-integer point is a half-integer point. So of the two other points on each of those lines (which sum to the half-integer point $p$), one will be half-integer and one all-integer. So there will be $7$ half-integer points in addition to $p$ itself; i.e. the maximal flat will meet $T$ in $8$ points; hence the corresponding large $\mathrm{E}_8$ lattice will contain $8$ of the nearest neighbours of $\left(2,0,0,0,0,0,0,0\right)$.

Also, because the sum of two half-integer points is not a half-integer point, no $3$ of those $8$ points will lie on a line.

But the only way that you can get $8$ points in a $3$-space such that no $3$ of them lie on a line of the space is if they are the $8$ points that do not lie on a plane of the space. Hence the other $7$ points — the ones lying in the all-integer subspace — must form a Fano plane.
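This fact about $\mathrm{PG}(3,2)$ is small enough to check exhaustively. A Python sketch, representing points as nonzero vectors in $\mathbb{F}_2^4$, with three points collinear exactly when they sum to zero:

```python
from itertools import combinations, product

points = [p for p in product([0, 1], repeat=4) if any(p)]  # 15 points of PG(3,2)

def third(a, b):
    # the third point on the line through a and b is a + b over F_2
    return tuple((u + v) % 2 for u, v in zip(a, b))

def is_cap(s):
    # no three points of s are collinear
    return all(third(a, b) not in s for a, b in combinations(s, 2))

# complements of planes: for each nonzero h, the 8 points p with h·p = 1
plane_complements = {
    frozenset(p for p in points if sum(u * v for u, v in zip(h, p)) % 2 == 1)
    for h in points
}

caps8 = {frozenset(s) for s in combinations(points, 8) if is_cap(frozenset(s))}
print(caps8 == plane_complements)  # True: 8-point caps are exactly plane complements
```

The search over all $\binom{15}{8}=6435$ candidate sets confirms that the only $8$-point sets with no $3$ collinear are the $15$ plane complements.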

So we have the following: inside the projective $7$-space of lattice elements mod $2$, we have the projective $6$-space of all-integer elements, and inside there we have the $5$-space of all-integer elements orthogonal to $p$, and inside there we have a polar space isomorphic to $\mathrm{Q}_5^+(2)$, and in there we have $30$ planes. And adding $p$ to each element of one of those planes gives the $7$ elements which accompany $p$ in the intersection of the isotropic half-integer points with the corresponding $3$-flat, which lift to the nearest neighbours of $\left(2,0,0,0,0,0,0,0\right)$ lying in the corresponding large $\mathrm{E}_8$ lattice.

A new paper last week straightens out the story of the diphoton background in ATLAS. Some confusion was created because theorists misinterpreted the procedures described in the ATLAS conference note, which could lead to a different estimate of the significance of the 750 GeV excess. However, once the correct phenomenological and statistical approach is adopted, the significance quoted by ATLAS can be reproduced, up to small differences due to incomplete information available in public documents. Anyway, now that this is all behind us, we can safely continue being excited at least until summer. Today I want to discuss different interpretations of the diphoton bump observed by ATLAS. I will take a purely phenomenological point of view, leaving for next time the question of a bigger picture that the resonance may fit into.

Phenomenologically, the most straightforward interpretation is the so-called *everyone's model*: a 750 GeV singlet scalar particle produced in gluon fusion and decaying to photons via loops of new vector-like quarks. This simple construction perfectly explains all publicly available data, and can be easily embedded in more sophisticated models. Nevertheless, many more possibilities were pointed out in the 750 papers so far, and here I review a few that I find most interesting.

**Spin Zero or More? **

For a particle decaying to two photons there are not that many possibilities: the resonance has to be a boson and, according to the Landau–Yang theorem, it cannot have spin 1. This leaves spin 0, spin 2, or higher on the table. Spin-2 is an interesting hypothesis, as this kind of excitation is predicted in popular models like the Randall–Sundrum one. Higher-than-two spins are disfavored theoretically. When more data is collected, the spin of the 750 GeV resonance can be tested by looking at the angular distribution of the photons. The rumor is that the data so far somewhat favor spin-2 over spin-0, although the statistics are certainly insufficient for any serious conclusions. Concerning the parity, it is practically impossible to determine it by studying the diphoton final state alone, and both the scalar and the pseudoscalar options are equally viable at present. Discrimination may be possible in the future, but only if multi-body decay modes of the resonance are discovered. If the true final state is more complicated than two photons (see below), then the 750 GeV resonance may have any spin, including spin-1 and spin-1/2.

**Narrow or Wide? **

The total width is the inverse of the particle's lifetime (in our funny units). From the experimental point of view, a width larger than the detector's energy resolution shows up as a smearing of the resonance due to the uncertainty principle. Currently, the ATLAS run-2 data prefer a width about 10 times larger than the experimental resolution (which is about 5 GeV in this energy ballpark), although the preference is not very strong in the statistical sense. On the other hand, from the theoretical point of view, it is much easier to construct models where the 750 GeV resonance is a narrow particle. Therefore, confirmation of the large width would have profound consequences, as it would significantly narrow down the scope of viable models. The most exciting interpretation would then be that the resonance is a portal to a dark sector containing new light particles very weakly coupled to ordinary matter.
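In natural units the relation is $\tau=\hbar/\Gamma$. As a rough numerical illustration (the 45 GeV figure below is just the "10 times the resolution" ballpark mentioned above, not a measured value):

```python
HBAR_GEV_S = 6.582119569e-25   # ħ in GeV·s

def lifetime_seconds(width_gev):
    # total width Γ and lifetime τ are related by τ = ħ / Γ
    return HBAR_GEV_S / width_gev

print(lifetime_seconds(45.0))  # ≈ 1.5e-26 s for a hypothetical Γ = 45 GeV
```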

**How many resonances? **

One resonance is enough, but a family of resonances tightly packed around 750 GeV may also explain the data. As a bonus, this could explain the seemingly large width without opening new dangerous decay channels. It is quite natural for particles to come in multiplets with similar masses: the pion is an example, where the small mass splitting between π± and π0 arises due to electromagnetic quantum corrections. For Higgs-like multiplets the small splitting may naturally arise after electroweak symmetry breaking, and the familiar 2-Higgs-doublet model offers a simple realization. If the mass splitting of the multiplet is larger than the experimental resolution, this possibility can be tested by precisely measuring the profile of the resonance and searching for a departure from the Breit-Wigner shape. On the other side of the spectrum is the idea that there is no resonance at all at 750 GeV, but rather at another mass, and the bump at 750 GeV appears due to some kinematical accident.
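The qualitative point, that a packed multiplet can mimic one wide bump while deviating from a single Breit-Wigner shape, can be illustrated with a quick sketch; all masses and widths below are made-up round numbers, not fitted values:

```python
import numpy as np

def breit_wigner(m, m0, gamma):
    # non-relativistic Breit-Wigner lineshape, arbitrary normalisation
    return (gamma / 2) / ((m - m0) ** 2 + (gamma / 2) ** 2)

def fwhm(x, y):
    # crude full width at half maximum measured across the whole profile
    above = x[y >= y.max() / 2]
    return above.max() - above.min()

x = np.linspace(700.0, 800.0, 4001)
single = breit_wigner(x, 750.0, 5.0)                              # one narrow peak
pair = breit_wigner(x, 745.0, 5.0) + breit_wigner(x, 755.0, 5.0)  # packed doublet

print(fwhm(x, single))  # ≈ 5 GeV
print(fwhm(x, pair))    # substantially wider: the doublet fakes a broad resonance
```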

**Who made it? **

The most plausible production process is definitely gluon-gluon fusion. Production in collisions of light quarks and antiquarks is also theoretically sound; however, it leads to a more acute tension between run-2 and run-1 data. Indeed, even for gluon fusion, the production cross section of a 750 GeV resonance in 13 TeV proton collisions is only 5 times larger than at 8 TeV. Given the larger amount of data collected in run-1, we would expect a similar excess there, contrary to observations. For a resonance produced from u-ubar or d-dbar the analogous ratio is only 2.5 (see the table), leading to much more tension. The ratio climbs back to 5 if the initial state contains the heavier quarks: strange, charm, or bottom (which can also be found sometimes inside a proton); however, I haven't yet seen a neat model that makes use of that. Another possibility is to produce the resonance via photon-photon collisions. This way one could cook up a truly minimal and very predictive model where, of all the Standard Model particles, the resonance couples only to photons. However, in this case, the ratio between the 13 and 8 TeV cross sections is very unfavorable, merely a factor of 2, and the run-1 vs run-2 tension comes back with more force. More options open up when associated production (e.g. with t-tbar, or in vector boson fusion) is considered. The problem with these ideas is that, according to what was revealed during the talk last December, there aren't any additional energetic particles in the diphoton events. Similar problems face models where the 750 GeV resonance appears as a decay product of a heavier resonance, although in this case some clever engineering or fine-tuning may help to hide the additional particles from experimentalists' eyes.

**Two-body or more?**

While a simple two-body decay of the resonance into two photons is a perfectly plausible explanation of all existing data, a number of interesting alternatives have been suggested. For example, the decay could be 3-body, with another soft visible or invisible particle accompanying two photons. If the masses of all particles involved are chosen appropriately, the invariant mass spectrum of the diphoton pair remains sharply peaked. At the same time, a broadening of the diphoton invariant mass due to the 3-body kinematics may explain why the resonance appears wide in ATLAS. Another possibility is a cascade decay into 4 photons. If the intermediate particles are very light, then the pairs of photons from their decay are very collimated and may look like a single photon in the detector.

♬ *The problem is all inside your head* ♬ and the possibilities are endless. The situation is completely different than during the process of discovering the Higgs boson, where one strongly favored hypothesis was tested against more exotic ideas. Of course, the first and foremost question is whether the excess is really new physics, or just a nasty statistical fluctuation. But if that is confirmed, the next crucial task for experimentalists will be to establish the nature of the resonance and get model builders on the right track. ♬ *The answer is easy if you take it logically* ♬

All ideas discussed above appeared in recent articles by various authors addressing the 750 GeV excess. If I were to include all references the post would be just one giant hyperlink, so you need to browse the literature yourself to find the original references.


After my tentative Polymath proposal, there definitely seems to be enough momentum to start a discussion “officially”, so let’s see where it goes. I’ve thought about the question of whether to call it Polymath11 (the first unclaimed number) or Polymath12 (regarding the polynomial-identities project as Polymath11). In the end I’ve gone for Polymath11, since the polynomial-identities project was listed on the Polymath blog as a proposal, and I think the right way of looking at things is that the problem got solved before the proposal became a fully-fledged project. But I still think that that project should be counted as a Polymathematical success story: it shows the potential benefits of opening up a problem for consideration by anybody who might be interested.

Something I like to think about with Polymath projects is the following question: if we end up *not* solving the problem, then what can we hope to achieve? The Erdős discrepancy problem project is a good example here. An obvious answer is that we can hope that enough people have been stimulated in enough ways that the probability of somebody solving the problem in the not too distant future increases (for example because we have identified more clearly the gap in our understanding). But I was thinking of something a little more concrete than that: I would like at the very least for this project to leave behind it an online resource that will be essential reading for anybody who wants to attack the problem in future. The blog comments themselves may achieve this to some extent, but it is not practical to wade through hundreds of comments in search of ideas that may or may not be useful. With past projects, we have developed Wiki pages where we have tried to organize the ideas we have had into a more browsable form. One thing we didn’t do with EDP, which in retrospect I think we should have, is have an official “closing” of the project marked by the writing of a formal article that included what we judged to be the main ideas we had had, with complete proofs when we had them. An advantage of doing that is that if somebody later solves the problem, it is more convenient to be able to refer to an article (or preprint) than to a combination of blog comments and Wiki pages.

With an eye to this, I thought I would make FUNC1 a data-gathering exercise of the following slightly unusual kind. For somebody working on the problem in the future, it would be very useful, I would have thought, to have a list of natural strengthenings of the conjecture, together with a list of “troublesome” examples. One could then produce a table with strengthenings down the side and examples along the top, with a tick in the table entry if the example disproves the strengthening, a cross if it doesn’t, and a question mark if we don’t yet know whether it does.

A first step towards drawing up such a table is of course to come up with a good supply of strengthenings and examples, and that is what I want to do in this post. I am mainly selecting them from the comments on the previous post. I shall present the strengthenings as statements rather than questions, so they are not necessarily true.

Let $w$ be a function from the power set of a finite set $X$ to the non-negative reals. Suppose that the weights satisfy the condition $w(A\cup B)\geq\max\{w(A),w(B)\}$ for every $A$ and $B$ and that at least one non-empty set has positive weight. Then there exists $x\in X$ such that the sum of the weights of the sets containing $x$ is at least half the sum of all the weights.

~~Note that if all weights take values 0 or 1, then this becomes the original conjecture. It is possible that the above statement *follows* from the original conjecture, but we do not know this (though it may be known).~~

This is not a good question after all, as the deleted statement above is false. When $w$ is 01-valued, the statement reduces to saying that for every up-set there is an element in at least half the sets, which is trivial: all the elements are in at least half the sets. Thanks to Tobias Fritz for pointing this out.

Let $w$ be a function from the power set of a finite set $X$ to the non-negative reals. Suppose that the weights satisfy the condition $w(A\cup B)\geq\min\{w(A),w(B)\}$ for every $A$ and $B$ and that at least one non-empty set has positive weight. Then there exists $x\in X$ such that the sum of the weights of the sets containing $x$ is at least half the sum of all the weights.

Again, if all weights take values 0 or 1, then the collection of sets of weight 1 is union closed and we obtain the original conjecture. It was suggested in this comment that one might perhaps be able to attack this strengthening using tropical geometry, since the operations it uses are addition and taking the minimum.
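The reduction in the 01-valued case can be checked mechanically. Assuming the weight condition is $w(A\cup B)\geq\min\{w(A),w(B)\}$ (my reading, consistent with the tropical-geometry remark about addition and minimum), a 01-valued weighting satisfies it exactly when the weight-1 sets form a union-closed family. A sketch over the power set of $\{1,2\}$:

```python
from itertools import product

sets = [frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2})]

def min_condition(w):
    # w(A ∪ B) >= min(w(A), w(B)) for all A, B
    return all(w[A | B] >= min(w[A], w[B]) for A in sets for B in sets)

def ones_union_closed(w):
    ones = [A for A in sets if w[A] == 1]
    return all(a | b in ones for a in ones for b in ones)

# check the equivalence over all sixteen 0/1 weightings
results = [min_condition(dict(zip(sets, bits))) == ones_union_closed(dict(zip(sets, bits)))
           for bits in product([0, 1], repeat=4)]
print(all(results))  # True
```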

Tom Eccles suggests (in this comment) a generalization that concerns two set systems rather than one. Given set systems $\mathcal{A}$ and $\mathcal{B}$, write $\mathcal{A}\vee\mathcal{B}$ for the union set $\{A\cup B:A\in\mathcal{A},\ B\in\mathcal{B}\}$. A family $\mathcal{A}$ is union closed if and only if $\mathcal{A}\vee\mathcal{A}=\mathcal{A}$. What can we say if $\mathcal{A}$ and $\mathcal{B}$ are set systems with $\mathcal{A}\vee\mathcal{B}$ small? There are various conjectures one can make, of which one of the cleanest is the following: if $\mathcal{A}$ and $\mathcal{B}$ are of size $m$ and $\mathcal{A}\vee\mathcal{B}$ is of size at most $m$, then there exists $x$ such that $|\mathcal{A}_x|+|\mathcal{B}_x|\geq m$, where $\mathcal{A}_x$ denotes the set of sets in $\mathcal{A}$ that contain $x$. This obviously implies FUNC.

Simple examples show that $\mathcal{A}\vee\mathcal{B}$ can be much smaller than either $\mathcal{A}$ or $\mathcal{B}$ — for instance, it can consist of just one set. But in those examples there always seems to be an element contained in many more sets. So it would be interesting to find a good conjecture by choosing an appropriate function $f$ to insert into the following statement: if $|\mathcal{A}|=m$, $|\mathcal{B}|=n$, and $|\mathcal{A}\vee\mathcal{B}|\leq k$, then there exists $x$ such that $|\mathcal{A}_x|+|\mathcal{B}_x|\geq f(m,n,k)$.

Let $\mathcal{A}$ be a union-closed family of subsets of a finite set $X$. Then the average size of the sets in $\mathcal{A}$ is at least $|X|/2$.

This is false, as the example shows for any .

Let $\mathcal{A}$ be a union-closed family of subsets of a finite set $X$ and suppose that $\mathcal{A}$ *separates points*, meaning that if $x\neq y$, then at least one set in $\mathcal{A}$ contains exactly one of $x$ and $y$. (Equivalently, the sets $\mathcal{A}_x$ are all distinct.) Then the average size of the sets in $\mathcal{A}$ is at least $|X|/2$.

This again is false: see Example 2 below.

In this comment I had a rather amusing (and typically Polymathematical) experience of formulating a conjecture that I thought was obviously false in order to think about how it might be refined, and then discovering that I couldn’t disprove it (despite temporarily thinking I had a counterexample). So here it is.

As I have just noted (and also commented in the first post), very simple examples show that if we define the “abundance” of an element $x$ to be $|\mathcal{A}_x|/|\mathcal{A}|$, then the average abundance does not have to be at least $1/2$. However, that still leaves open the possibility that some kind of naturally defined *weighted* average might do the job. Since we want to define the weighting in terms of $\mathcal{A}$ and to favour elements that are contained in lots of sets, a rather crude idea is to pick a random non-empty set $A\in\mathcal{A}$ and then a random element $x\in A$, and make that the probability distribution on the ground set that we use for calculating the average abundance.

A short calculation reveals that the average abundance with this probability distribution is equal to the *average overlap density*, which we define to be

$$\operatorname{avg}_{A}\operatorname{avg}_{B}\frac{|A\cap B|}{|B|},$$

where the averages are over $A\in\mathcal{A}$ and non-empty $B\in\mathcal{A}$. So one is led to the following conjecture, which implies FUNC: if $\mathcal{A}$ is a union-closed family of sets, at least one of which is non-empty, then its average overlap density is at least 1/2.
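Under my reading of the definition (average over $A$ in the family and non-empty $B$ in the family of $|A\cap B|/|B|$), the quantity is easy to compute exactly with rational arithmetic:

```python
from fractions import Fraction
from itertools import combinations

def average_overlap_density(family):
    # average over A in the family and non-empty B in the family of |A ∩ B| / |B|
    nonempty = [B for B in family if B]
    total = sum(Fraction(len(A & B), len(B)) for A in family for B in nonempty)
    return total / (len(family) * len(nonempty))

def power_set(xs):
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

print(average_overlap_density(power_set([1, 2, 3])))  # 1/2 exactly
example3 = [frozenset(), frozenset({1}), frozenset(range(1, 6))]
print(average_overlap_density(example3))              # 8/15, still above 1/2
```

For the power set every element has abundance exactly $1/2$, so the weighted average is exactly $1/2$; the second family has small average abundance but average overlap density $8/15>1/2$.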

A not wholly pleasant feature of this conjecture is that the average overlap density is very far from being isomorphism invariant. (That is, if you duplicate elements of the ground set, the average overlap density changes.) Initially, I thought this would make it easy to find counterexamples, but that seems not to be the case. It also means that one can give some thought to how to put a measure on the ground set that makes the average overlap density as small as possible. Perhaps if the conjecture is true, this “worst case” would be easier to analyse. (It’s not actually clear that there is a worst case — it may be that one wants to use a measure that gives measure zero to some non-empty set in $\mathcal{A}$, at which point the definition of average overlap density breaks down. So one might have to look at the “near worst” case.)

This conjecture comes from a comment by Igor Balla. Let $\mathcal{A}$ be a union-closed family and let $x$ be an element of the ground set. Define a new family by replacing each $A\in\mathcal{A}$ by $A\cup\{x\}$ if $A\cup\{x\}\notin\mathcal{A}$ and leaving it alone if $A\cup\{x\}\in\mathcal{A}$. Repeat this process for every $x$ and the result is an *up-set* $\mathcal{B}$, that is, a set-system such that $B\in\mathcal{B}$ and $B\subseteq C$ implies that $C\in\mathcal{B}$.

Note that each time we perform the “add $x$ if you can” operation, we are applying a bijection to the current set system, so we can compose all these bijections to obtain a bijection $\phi$ from $\mathcal{A}$ to $\mathcal{B}$.
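Here is a sketch of the compression as I read it: replace $A$ by $A\cup\{x\}$ whenever $A\cup\{x\}$ is not already in the family, sweeping over the ground set until nothing changes. The example family below is my own.

```python
def compress_to_upset(family, ground):
    fam = {frozenset(A) for A in family}
    changed = True
    while changed:                      # one sweep usually suffices; iterate to be safe
        changed = False
        for x in sorted(ground):
            new = set()
            for A in fam:
                B = A | {x}
                new.add(B if B not in fam else A)   # "add x if you can"
            if new != fam:
                fam, changed = new, True
    return fam

def is_up_set(fam, ground):
    # closed under adding any single element, hence under taking supersets
    return all(A | {y} in fam for A in fam for y in ground)

family = [set(), {1}, {1, 3}, {1, 2, 3}]            # a small union-closed family
up = compress_to_upset(family, {1, 2, 3})
print(sorted(sorted(A) for A in up))  # [[1, 2], [1, 2, 3], [1, 3], [2, 3]]
print(len(up) == len(family), is_up_set(up, {1, 2, 3}))  # True True
```

Each single-element step is injective (a collision would force the image set to have been in the family already), so the family's size is preserved, matching the bijection $\phi$ described above.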

Suppose now that are distinct sets. It can be shown that there is no set such that and . In other words, is never a subset of .

Now the fact that $\mathcal{B}$ is an up-set means that each element is in at least half the sets (since if $B\in\mathcal{B}$ and $x\notin B$ then $B\cup\{x\}\in\mathcal{B}$, giving an injection from the sets not containing $x$ to the sets containing $x$). Moreover, it seems hard for too many sets $A\in\mathcal{A}$ to be “far” from their images $\phi(A)$, since then there is a strong danger that we will be able to find a pair of sets $A$ and $B$ with the property ruled out above.

This leads to the conjecture that Balla makes. He is not at all confident that it is true, but has checked that there are no small counterexamples.

**Conjecture.** Let $\mathcal{A}$ be a set system such that there exist an up-set $\mathcal{B}$ and a bijection $\phi:\mathcal{A}\to\mathcal{B}$ with the following properties.

- For each $A\in\mathcal{A}$, $A\subseteq\phi(A)$.
- For no distinct do we have .

Then there is an element that belongs to at least half the sets in $\mathcal{A}$.

The following comment by Gil Kalai is worth quoting: “Years ago I remember that Jeff Kahn said that he bet he will find a counterexample to every meaningful strengthening of Frankl’s conjecture. And indeed he shot down many of those and a few I proposed, including weighted versions. I have to look in my old emails to see if this one too.” So it seems that even to find a conjecture that genuinely strengthens FUNC without being obviously false (at least to Jeff Kahn) would be some sort of achievement. (Apparently the final conjecture above passes the Jeff-Kahn test in the following weak sense: he believes it to be false but has not managed to find a counterexample.)

If $X$ is a finite set and $\mathcal{A}$ is the power set of $X$, then every element of $X$ has abundance 1/2. (Remark 1: I am using the word “abundance” for the *proportion* of sets in $\mathcal{A}$ that contain the element in question. Remark 2: for what it’s worth, the above statement is meaningful and true even if $X$ is empty.)

Obviously this is not a counterexample to FUNC, but it was in fact a counterexample to an over-optimistic conjecture I very briefly made and then abandoned while writing it into a comment.

This example was mentioned by Alec Edgington. Let $X$ be a finite set and let $x$ be an element that does not belong to $X$. Now let $\mathcal{A}$ consist of $\emptyset$ together with all sets of the form $\{x\}\cup A$ such that $A\subseteq X$.

If $|X|=n$, then $x$ has abundance $2^n/(2^n+1)$, while each $y\in X$ has abundance $2^{n-1}/(2^n+1)$. Therefore, only one point has abundance that is not less than 1/2.
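A quick check of this example as reconstructed here ($\mathcal{A}=\{\emptyset\}\cup\{\{x\}\cup A:A\subseteq X\}$, taking $n=4$ and letting $0$ play the role of the outside element $x$):

```python
from fractions import Fraction
from itertools import combinations

X = [1, 2, 3, 4]
x = 0  # the element outside X
family = [frozenset()] + [frozenset({x}) | frozenset(c)
                          for r in range(len(X) + 1) for c in combinations(X, r)]

def abundance(y, fam):
    return Fraction(sum(y in A for A in fam), len(fam))

print(all(A | B in family for A in family for B in family))  # True: union closed
print(abundance(x, family))  # 16/17
print(abundance(1, family))  # 8/17, below 1/2
```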

A slightly different example, also used by Alec Edgington, is to take all subsets of $X$ together with the set $X\cup\{x\}$. If $|X|=n$, then the abundance of any element of $X$ is $(2^{n-1}+1)/(2^n+1)$ while the abundance of $x$ is $1/(2^n+1)$. Therefore, the average abundance is

$$\frac{1}{n+1}\left(n\cdot\frac{2^{n-1}+1}{2^n+1}+\frac{1}{2^n+1}\right).$$

When $n$ is large, the amount by which $(2^{n-1}+1)/(2^n+1)$ exceeds 1/2 is exponentially small, from which it follows easily that this average is less than 1/2. In fact, it starts to be less than 1/2 when $n=2$ (which is the case Alec mentioned). This shows that Conjecture 5 above (that the average abundance must be at least 1/2 if the system separates points) is false.

Let $n$ be a positive integer and take the set system that consists of the sets $\emptyset$, $\{1\}$ and $\{1,2,\dots,n\}$. This is a simple example (or rather class of examples) of a set system for which although there is certainly an element with abundance at least 1/2 (the element $1$ has abundance 2/3), the *average* abundance is close to 1/3. Very simple variants of this example can give average abundances that are arbitrarily small — just take a few small sets and one absolutely huge set.
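In code, taking $n=10$ for the family $\{\emptyset,\{1\},\{1,\dots,n\}\}$ as read here:

```python
from fractions import Fraction

n = 10
family = [frozenset(), frozenset({1}), frozenset(range(1, n + 1))]

def abundance(x, fam):
    return Fraction(sum(x in A for A in fam), len(fam))

print(abundance(1, family))  # 2/3: element 1 is fine
average = sum(abundance(x, family) for x in range(1, n + 1)) / n
print(average)               # 11/30, already close to 1/3
```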

I will not explain these in detail, but just point you to an interesting comment by Uwe Stroinski that suggests a number-theoretic way of constructing union-closed families.

I will continue with methods of building union-closed families out of other union-closed families.

I’ll define this process formally first. Let X be a set of size n and let 𝒜 be a collection of subsets of X. Now let (B_x)_{x ∈ X} be a collection of disjoint non-empty sets and define 𝒜′ to be the collection of all sets of the form ⋃_{x ∈ A} B_x for some A ∈ 𝒜. If 𝒜 is union closed, then so is 𝒜′.

One can think of 𝒜′ as “duplicating” the element x of X |B_x| times. A simple example of this process is to take the set system {∅, {1}, {1, 2}} and let B_1 = {1} and B_2 = {2, 3, …, n}. This gives the set system 3 above.
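The duplication process is a one-liner to sketch in code (all names here are my own choices): each ground-set element is replaced by a block of fresh elements, and a member set maps to the union of the blocks of its elements. Union-closedness is preserved because the block map turns unions into unions:

```python
from itertools import combinations

def is_union_closed(family):
    fam = set(map(frozenset, family))
    return all((a | b) in fam for a, b in combinations(fam, 2))

def duplicate(family, blocks):
    """Replace each ground-set element x by the non-empty block blocks[x];
    the blocks are assumed pairwise disjoint."""
    return [frozenset().union(*(frozenset(blocks[x]) for x in s)) for s in family]

family = [{1}, {2}, {1, 2}]            # a small union-closed family
blocks = {1: {"a"}, 2: {"b", "c"}}     # "duplicate" the element 2 twice
dup = duplicate(family, blocks)
assert is_union_closed(family) and is_union_closed(dup)
```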

Let us say that if for some suitable set-valued function . And let us say that two set systems are *isomorphic* if they are in the same equivalence class of the symmetric-transitive closure of the relation . Equivalently, they are isomorphic if we can find and such that .

The effect of duplication is basically that we can convert the uniform measure on the ground set into any other probability measure (at least to an arbitrary approximation). What I mean by that is that the uniform measure on the ground set of , which is of course , gives you a probability of of landing in , so has the same effect as assigning that probability to and sticking with the set system . (So the precise statement is that we can get any probability measure where all the probabilities are rational.)

If one is looking for an averaging argument, then it would seem that a nice property that such an argument might have is (as I have already commented above) that the average should be with respect to a probability measure on that is constructed from in an isomorphism-invariant way.

It is common in the literature to outlaw duplication by insisting that separates points. However, it may be genuinely useful to consider different measures on the ground set.

Tom Eccles, in his off-diagonal conjecture, considered the set system, which he denoted by 𝒜 ∨ ℬ, that is defined to be {A ∪ B : A ∈ 𝒜, B ∈ ℬ}. This might more properly be denoted 𝒜 ∪ ℬ, by analogy with the notation for sumsets, but obviously one can’t write it like that because that notation already stands for something else (the union of the two set systems), so I’ll stick with Tom’s notation.

It’s trivial to see that if 𝒜 and ℬ are union closed, then so is 𝒜 ∨ ℬ. Moreover, sometimes it does quite natural things: for instance, if S and T are any two sets, then P(S) ∨ P(T) = P(S ∪ T), where P is the power-set operation.
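Tom’s operation is also a one-liner to experiment with. Here is a sketch (names my own) verifying the power-set identity just mentioned, which I read as P(S) ∨ P(T) = P(S ∪ T):

```python
from itertools import chain, combinations

def powerset(S):
    """All subsets of S, as a set of frozensets."""
    S = list(S)
    return {frozenset(c)
            for c in chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))}

def join(A, B):
    """Eccles' operation: every union of one member of A with one member of B."""
    return {a | b for a in map(frozenset, A) for b in map(frozenset, B)}

assert join(powerset({1, 2}), powerset({2, 3})) == powerset({1, 2, 3})
```

The identity holds because any subset c of S ∪ T splits as (c ∩ S) ∪ (c ∩ T).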

Another remark is that if the ground sets of 𝒜 and ℬ are disjoint, and x belongs to the ground set of 𝒜, then the abundance of x in 𝒜 ∨ ℬ is equal to the abundance of x in 𝒜.

I got this from a comment by Thomas Bloom. Let and be disjoint finite sets and let and be two union-closed families living inside and , respectively, and assume that and . We then build a new family as follows. Let be some function from to . Then take all sets of one of the following four forms:

- sets with ;
- sets with ;
- sets with ;
- sets with .

It can be checked quite easily (there are six cases to consider, all straightforward) that the resulting family is union closed.

Thomas Bloom remarks that if 𝒜 consists of all subsets of X and ℬ consists of all subsets of Y, then (for a suitable choice of the function in the construction) the result is a union-closed family that contains no set of size less than 3, and also contains a set of size 3 with no element of abundance greater than or equal to 1/2. This is interesting because a simple argument shows that if a union-closed family contains a set with two elements, then at least one of those two elements has abundance at least 1/2.

Thus, this construction method can be used to create interesting union-closed families out of boring ones.

Thomas discusses what happens to abundances when you do this construction, and the rough answer is that elements of become less abundant but elements of become quite a lot more abundant. So one can’t just perform this construction a few times and end up with a counterexample to FUNC. However, as Thomas also says, there is plenty of scope for modifying this basic idea, and maybe good things could flow from that.

I feel as though there is much more I could say, but this post has got quite long, and has taken me quite a long time to write, so I think it is better if I just post it. If there are things I wish I had mentioned, I’ll put them in comments and possibly repeat them in my next post.

I’ll close by remarking that I have created a wiki page. At the time of writing it has almost nothing on it but I hope that will change before too long.

The arXiv is the physicists' marketplace of ideas. In high energy physics and adjacent fields, almost all papers are submitted to the arXiv prior to journal submission. Developed by Paul Ginsparg in the early 1990s, this open-access preprint repository has served the physics community for more than 20 years, and has since expanded to adjacent fields like mathematics, economics, and biology. It fulfills an extremely important function by helping us to exchange ideas quickly and efficiently.

Over the years the originally free signup became more restricted. If you sign up for the arXiv now, you need to be "endorsed" by several people who are already signed up. It also became necessary to screen submissions to keep the quality level up. In hindsight, this isn't surprising: more people means more trouble. And sometimes, of course, things go wrong.

I have heard various stories about arXiv moderation gone wrong. Mostly these are from students, and mostly it affects those who work in small research areas, or those whose name is Garrett Lisi.

A few days ago, a story appeared online which quickly spread. Nicolas Gisin, an established professor of physics who works on quantum cryptography (among other things), relates the story of two of his students who ventured into territory unfamiliar to him: black hole physics. They wrote a paper that appeared to him likely wrong but reasonable. It got rejected by the arXiv. The paper later got published by PLA (a respected journal which, however, does not focus on general relativity). More worrisome still, the students' next paper also got rejected by the arXiv, making it appear as if they were now blacklisted.

Now the paper that caused the offense is, haha, not on the arXiv, but I tracked it down. So let me just say that I think it's indeed wrong and it shouldn't have gotten published in a journal. They are basically trying to include the backreaction of the outgoing Hawking radiation on the black hole. It's a thorny problem (the very problem this blog was named after) and the treatment in the paper doesn't make sense.

Hawking radiation is not produced at the black hole horizon. No, it is not. And tracking back the flux from infinity to the horizon is therefore not correct. Besides this, the equation for the mass loss that they use is a late-time approximation in a collapse situation. One can't use this approximation for a metric without collapse, and it certainly shouldn't be used down to the Planck mass. If you have a collapse scenario, then to get the backreaction right you would have to calculate the emission rate prior to horizon formation, time-dependently, and integrate over this.
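For context, the late-time mass-loss approximation referred to here has, schematically, the following standard form (this display is my own addition for illustration, not taken from the paper under discussion; α is a dimensionless constant that depends on the emitted particle species):

```latex
\frac{dM}{dt} \simeq -\,\frac{\alpha\,\hbar c^4}{G^2 M^2}
\qquad\Longrightarrow\qquad
\tau_{\rm evap} \sim \frac{G^2 M_0^3}{3\,\alpha\,\hbar c^4}
```

Because this law is derived for a quasi-static black hole at late times, integrating it all the way down to the Planck mass pushes it outside its regime of validity, which is the point being made above.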

Ok, so the paper is wrong. But should it have been rejected by the arXiv? I don't think so. The arXiv moderation can't and shouldn't replace peer review; it should just be a basic quality check, and the paper looks like a reasonable research project.

I asked a colleague who I know works as an arXiv moderator for comment. (S)he wants to stay anonymous but offers the following explanation:

I had not heard of the complaints/blog article, thanks for passing that information on...

The version of the article I saw was extremely naive and was very confused regarding coordinates and horizons in GR... I thought it was not “referee-able quality” — at least not in any competently run GR journal... (The hep-th moderator independently raised concerns...)

(S)he is correct of course. We haven't seen the paper that was originally submitted. It was very likely in considerably worse shape than the published version. Indeed, Gisin writes in his post that the paper was significantly revised during peer review. Taking this into account, the decision seems understandable to me. While it is now published at Physics Letters A, it is perhaps worth noting that the editorial board of Physics Letters A does *not* include anyone specializing in GR.

The main problem I have with this episode is not that a paper got rejected which maybe shouldn't have been rejected -- because shit happens. Humans make mistakes, and let us be clear that the arXiv, underfunded as it is, relies on volunteers for the moderation. No, the main problem I have is the lack of transparency.

The arXiv is an essential resource for the physics community. We all put trust in a group of mostly anonymous moderators who do a rather thankless and yet vital job. I don't think the origin of the problem is with these people. I am sure they do the best they can. No, I think the origin of the problem is the lack of financial resources which must affect the possibility to employ administrative staff to oversee the operations. You get what you pay for.

I hope that this episode will be a wake-up call to the community to put their financial support behind the arXiv, and to the arXiv to use this support to put into place a more transparent and better organized moderation procedure.

Note added: It was mentioned to me that the problem with the paper might be more elementary, in that they're using the wrong coordinates to begin with - it hadn't even occurred to me to check this. To tell you the truth, I am not really interested in figuring out exactly why the paper is wrong; it's beside the point. I just hope that whoever reviewed the paper for PLA now goes and sits in the corner for an hour with a paper bag over their head.

Greetings from the western end of my trip, which brought me out to Maui, visiting Garrett at the Pacific Science Institute, PSI. Since launching it roughly a year ago, Garrett and his girlfriend/partner Crystal have hosted about 60 traveling scientists, "from all areas except chemistry," I was told.

I got bitten by mosquitoes and picked at by a set of adorable chickens (named after the six quarks), but managed to convince everybody that I really didn't feel like swimming, or diving, or jumping off things at great height. I know I'm dull. I did watch some sea turtles though, and I also got a new T-shirt with the PSI logo, which you can admire in the photo to the right (taken in front of a painting by Crystal).

I'm not an island person, don't like mountains, and I can't stand humidity, so for me it's somewhat of a mystery what people think is so great about Hawaii. But leaving aside my preference for German forests, it's as pleasant a place as can be.

You won't be surprised to hear that Garrett is still working on his E8 unification and says things are progressing well, if slowly. Aloha.

When I was a kid people thought it would be a long time before computers could adequately translate natural language text, or play Go against a human being, because you’d need some kind of AI to do those things, and AI seemed really hard.

Now we know that you can get pretty decent translation and Go without anything like AI. But AI still seems really hard.

However, Apple CEO Tim Cook pointed out that 60% of people who owned an iPhone before the launch of the iPhone 6 haven’t upgraded to the most recent models, which implies that there is still room to grow, Reuters notes.

Doesn’t it rather imply that a) people are no longer on contracts incentivizing an upgrade every two years; and b) Apple hasn’t figured out a way to make a new phone that’s different enough from the iPhone 5 to make people want to switch?

This makes my public outreach efforts look lame by comparison. Well done!

The Institute for Pure and Applied Mathematics (IPAM) here at UCLA is seeking applications for its new director in 2017 or 2018, to replace Russ Caflisch, who is nearing the end of his five-year term as IPAM director. The previous directors of IPAM (Tony Chan, Mark Green, and Russ Caflisch) were also from the mathematics department here at UCLA, but the position is open to all qualified applicants with extensive scientific and administrative experience in mathematics, computer science, or statistics. Applications will be reviewed beginning June 1, 2016 (though the application process will remain open through Dec 1, 2016).


Given any finite collection of elements (f_i)_{i∈I} in some Banach space X, the triangle inequality tells us that

‖∑_{i∈I} f_i‖_X ≤ ∑_{i∈I} ‖f_i‖_X.

However, when the f_i all “oscillate in different ways”, one expects to improve substantially upon the triangle inequality. For instance, if X is a Hilbert space and the f_i are mutually orthogonal, we have the Pythagorean theorem

‖∑_{i∈I} f_i‖_X = (∑_{i∈I} ‖f_i‖_X^2)^{1/2}.

For sake of comparison, from the triangle inequality and Cauchy-Schwarz one has the general inequality

‖∑_{i∈I} f_i‖_X ≤ |I|^{1/2} (∑_{i∈I} ‖f_i‖_X^2)^{1/2}

for any finite collection (f_i)_{i∈I} in any Banach space X, where |I| denotes the cardinality of I. Thus orthogonality in a Hilbert space yields “square root cancellation”, saving a factor of |I|^{1/2} or so over the trivial bound coming from the triangle inequality.
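The gap between the two bounds is easy to see numerically. A small sketch (an illustration of my own, not from the post): for N orthonormal vectors, the norm of the sum matches the Pythagorean value √N, far below the triangle-inequality bound N:

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

N = 100
# N mutually orthogonal unit vectors: the standard basis of R^N
basis = [[1.0 if j == i else 0.0 for j in range(N)] for i in range(N)]
s = [sum(col) for col in zip(*basis)]                     # the vector sum_i f_i

total = norm(s)                                           # ||sum_i f_i||
triangle = sum(norm(f) for f in basis)                    # triangle bound: N
pythagoras = math.sqrt(sum(norm(f) ** 2 for f in basis))  # Pythagoras: sqrt(N)
assert abs(total - pythagoras) < 1e-9                     # full square-root saving
```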

More generally, let us somewhat informally say that a collection (f_i)_{i∈I} exhibits *decoupling in L^p* if one has the Pythagorean-like inequality

‖∑_{i∈I} f_i‖_{L^p} ≪ |I|^ε (∑_{i∈I} ‖f_i‖_{L^p}^2)^{1/2}

for any ε > 0, thus one obtains almost the full square root cancellation in the L^p norm. The theory of *almost orthogonality* can then be viewed as the theory of decoupling in Hilbert spaces such as L^2. In L^p spaces for p < 2 one usually does not expect this sort of decoupling; for instance, if the f_i are disjointly supported one has

‖∑_{i∈I} f_i‖_{L^p} = (∑_{i∈I} ‖f_i‖_{L^p}^p)^{1/p},

and the right-hand side can be much larger than (∑_{i∈I} ‖f_i‖_{L^p}^2)^{1/2} when p < 2. At the opposite extreme, one usually does not expect to get decoupling in L^∞, since one could conceivably align the f_i to all attain a maximum magnitude at the same location with the same phase, at which point the triangle inequality in L^∞ becomes sharp.

However, in some cases one can get decoupling for certain p > 2. For instance, suppose we are in L^4, and that the f_i are *bi-orthogonal* in the sense that the products f_i f_j for i < j are pairwise orthogonal in L^2. Then we have

‖∑_{i∈I} f_i‖_{L^4}^2 = ‖∑_{i,j∈I} f_i f_j‖_{L^2} ≪ (∑_{i,j∈I} ‖f_i f_j‖_{L^2}^2)^{1/2} ≤ (∑_{i,j∈I} ‖f_i‖_{L^4}^2 ‖f_j‖_{L^4}^2)^{1/2} = ∑_{i∈I} ‖f_i‖_{L^4}^2,

giving decoupling in L^4. (Similarly if each of the f_i is orthogonal to all but O(1) of the other f_j.) A similar argument also gives decoupling in L^6 when one has tri-orthogonality (with the triple products f_i f_j f_k mostly orthogonal to each other), and so forth. As a slight variant, Khintchine’s inequality also indicates that decoupling should occur in L^p for any fixed 2 ≤ p < ∞ if one multiplies each of the f_i by an independent random sign ε_i ∈ {−1, +1}.

In recent years, Bourgain and Demeter have been establishing *decoupling theorems* in L^p spaces for various key exponents of p, in the “restriction theory” setting in which the f_i are Fourier transforms of measures supported on different portions of a given surface or curve; this builds upon the earlier decoupling theorems of Wolff. In a recent paper with Guth, they established the following decoupling theorem for the curve γ ⊂ R^k parameterised by the polynomial map

γ(t) := (t, t^2, …, t^k).

For any ball in , let denote the weight

which should be viewed as a smoothed out version of the indicator function of . In particular, the space can be viewed as a smoothed out version of the space . For future reference we observe a fundamental self-similarity of the curve : any arc in this curve, with a compact interval, is affinely equivalent to the standard arc .

Theorem 1 (Decoupling theorem) Let k ≥ 1. Subdivide the unit interval [0,1] into N equal subintervals I of length 1/N, and for each such I, let f_I be the Fourier transform of a finite Borel measure on the arc γ(I), where γ(t) := (t, t^2, …, t^k). Then the f_I exhibit decoupling in L^{k(k+1)} for any ball B of radius N^k.

Orthogonality gives the p = 2 case of this theorem. The bi-orthogonality type arguments sketched earlier only give decoupling in L^p for a much smaller range of exponents; the point here is that we can now get a much larger value of p. The k = 2 case of this theorem was previously established by Bourgain and Demeter (who obtained in fact an analogous theorem for any curved hypersurface). The exponent k(k+1) (and the radius N^k) is best possible, as can be seen by the following basic example. If

where is a bump function adapted to , then standard Fourier-analytic computations show that will be comparable to on a rectangular box of dimensions (and thus volume ) centred at the origin, and exhibit decay away from this box, with comparable to

On the other hand, is comparable to on a ball of radius comparable to centred at the origin, so is , which is just barely consistent with decoupling. This calculation shows that decoupling will fail if is replaced by any larger exponent, and also if the radius of the ball is reduced to be significantly smaller than .

This theorem has the following consequence of importance in analytic number theory:

Corollary 2 (Vinogradov main conjecture) Let be integers, and let . Then

*Proof:* By the Hölder inequality (and the trivial bound of for the exponential sum), it suffices to treat the critical case , that is to say to show that

We can rescale this as

As the integrand is periodic along the lattice , this is equivalent to

The left-hand side may be bounded by , where and . Since

the claim now follows from the decoupling theorem and a brief calculation.

Using the Plancherel formula, one may equivalently (when is an integer) write the Vinogradov main conjecture in terms of solutions to the system of equations

but we will not use this formulation here.
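For concreteness, the system in question has (in the standard form, which I am supplying here since the display above was lost) unknowns x_1, …, x_s, y_1, …, y_s ∈ {1, …, N} satisfying ∑_i x_i^j = ∑_i y_i^j for j = 1, …, k. The solution count can be brute-forced for tiny parameters:

```python
from collections import Counter
from itertools import product

def vinogradov_count(s, k, N):
    """Number of solutions to sum_i x_i^j = sum_i y_i^j for j = 1..k,
    with 1 <= x_i, y_i <= N (brute force; feasible only for tiny s, N)."""
    counts = Counter(
        tuple(sum(x ** j for x in xs) for j in range(1, k + 1))
        for xs in product(range(1, N + 1), repeat=s)
    )
    # group tuples by their k power sums; a group of size c contributes c^2 pairs
    return sum(c * c for c in counts.values())
```

For instance, `vinogradov_count(2, 1, 2)` counts the quadruples with x_1 + x_2 = y_1 + y_2 and returns 6; the diagonal solutions y = x alone always contribute at least N^s.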

A history of the Vinogradov main conjecture may be found in this survey of Wooley; prior to the Bourgain-Demeter-Guth theorem, the conjecture was solved completely for , or for and either below or above , with the bulk of recent progress coming from the *efficient congruencing* technique of Wooley. It has numerous applications to exponential sums, Waring’s problem, and the zeta function; to give just one application, the main conjecture implies the predicted asymptotic for the number of ways to express a large number as the sum of fifth powers (the previous best result required fifth powers). The Bourgain-Demeter-Guth approach to the Vinogradov main conjecture, based on decoupling, is ostensibly very different from the efficient congruencing technique, which relies heavily on the arithmetic structure of the problem, but it appears (as I have been told from second-hand sources) that the two methods are actually closely related, with the former being a sort of “Archimedean” version of the latter (with the intervals in the decoupling theorem being analogous to congruence classes in the efficient congruencing method); hopefully there will be some future work making this connection more precise. One advantage of the decoupling approach is that it generalises to non-arithmetic settings in which the set that is drawn from is replaced by some other similarly separated set of real numbers. (A random thought – could this allow the Vinogradov-Korobov bounds on the zeta function to extend to Beurling zeta functions?)

Below the fold we sketch the Bourgain-Demeter-Guth argument proving Theorem 1.

I thank Jean Bourgain and Andrew Granville for helpful discussions.

** — 1. Initial reductions — **

The claim will proceed by an induction on dimension, thus we assume henceforth that (the case being immediate from the Pythagorean theorem) and that Theorem 1 has already been proven for smaller values of . This has the following nice consequence:

Proposition 3 (Lower dimensional decoupling) Let the notation be as in Theorem 1. Suppose also that , and that Theorem 1 has already been proven for all smaller values of . Then for any , the exhibits decoupling in for any ball of radius .

*Proof:* (Sketch) We slice the ball into -dimensional slices parallel to the first coordinate directions. On each slice, the can be interpreted as functions on whose Fourier transform lie on the curve , where . Applying Theorem 1 with replaced by , and then integrating over all slices using Fubini’s theorem and Minkowski’s inequality (to interchange the norm and the square function), we obtain the claim.

The first step, needed for technical inductive purposes, is to work at an exponent slightly below . More precisely, given any and , let denote the assertion that

whenever , , and are as in Theorem 1. Theorem 1 is then clearly equivalent to the claim holding for all . This turns out to be equivalent to the following variant:

Proposition 4 Let , and assume Theorem 1 has been established for all smaller values of . If is sufficiently close to , then holds for all .

The reason for this is that the functions and all have Fourier transform supported on a ball of radius , and so there is a Bernstein-type inequality that lets one replace the norm of either function by the norm, losing a power of that goes to zero as goes to . (See Corollary 6.2 and Lemma 8.2 of the Bourgain-Demeter-Guth paper for more details of this.)

Using the trivial bound (1) we see that holds for large (e.g. ). To reduce , it suffices to prove the following inductive claim.

Proposition 5 (Inductive claim) Let , and assume Theorem 1 has been established for all smaller values of . If is sufficiently close to , and holds for some , then holds for some .

Since the set of for which holds is clearly a closed half-infinite interval, Proposition 5 implies Proposition 4 and hence Theorem 1.

Henceforth we fix as in Proposition 5. We fix and use to denote any quantity that goes to zero as , keeping fixed. Then the hypothesis reads

The next step is to reduce matters to a “multilinear” version of the above estimate, in order to exploit a multilinear Kakeya estimate at a later stage of the argument. Let be a large integer depending only on (actually Bourgain, Demeter, and Guth choose ). It turns out that it will suffice to prove the multilinear version

whenever are families of disjoint subintervals on of length that are separated from each other by a distance of , and where denotes the geometric mean

We have the following nice equivalence (essentially due to Bourgain and Guth, building upon an earlier “bilinear equivalence” result of Vargas, Vega, and myself, and discussed in this previous blog post):

Proposition 6 (Multilinear equivalence) For any , the estimates (2) and (3) are equivalent.

*Proof:* The derivation of (3) from (2) is immediate from Hölder’s inequality. To obtain the converse implication, let denote the best constant in (2), thus is the smallest constant such that

The idea is to prove an inequality of the form

for any fixed integer (with the implied constant in the notation independent of ); by choosing large enough one can then prove by an inductive argument.

We partition the intervals in (2) into classes of consecutive intervals, so that can be expressed as where . Observe that for any , one either has

for some (i.e. one of the dominates the sum), or else one has

for some with the transversality condition . This leads to the pointwise inequality

Bounding the supremum by and then taking norms and using (3), we conclude that

On the other hand, applying an affine rescaling to (4) one sees that

and the claim follows. (A more detailed version of this argument may be found in Theorem 4.1 of this paper of Bourgain and Demeter.)

It thus suffices to show (3).

The next step is to set up some intermediate scales between and , in order to run an “induction on scales” argument. For any scale , any exponent , and any function , let denote the local average

where denotes the volume of (one could also use the equivalent quantity here if desired). For any exponents , , and (independent of ), let denote the least exponent for which one has the local decoupling inequality

for as in (3), where the -length intervals in have been covered by a family of finitely overlapping intervals of length , and . It is then not difficult to see that the estimate (3) is equivalent to the inequality

(basically because when , there is essentially only one for each , and is basically ; also, the averaging is essentially the identity when since all the and here have Fourier support on a ball of radius ). To put it another way, our task is now to show that

On the other hand, one can establish the following inequalities concerning the quantities , arranged roughly in increasing order of difficulty to prove.

Proposition 7 (Inequalities on ) Throughout this proposition it is understood that , , and .

- (i) (Hölder) The quantity is convex in , and monotone nondecreasing in .
- (ii) (Minkowski) If , then is monotone non-decreasing in .
- (iii) (Stability) One has . (In fact, is Lipschitz in uniformly in , but we will not need this.)
- (iv) (Rescaled decoupling hypothesis) If and , then one has .
- (v) (Lower dimensional decoupling) If and , then .
- (vi) (Multilinear Kakeya) If and , then .

We sketch the proof of the various parts of this proposition in later sections. For now, let us show how these properties imply the claim (6). In the paper of Bourgain, Demeter, and Guth, the above properties were iterated along a certain “tree” of parameters , relying on (v) to increase the parameter (which measures the amount of decoupling) and (vi) to “inflate” or increase the parameter (which measures the spatial scale at which decoupling has been obtained), and (i) to reconcile the different choices of appearing in (v) and (vi), with the remaining properties (ii), (iii), (iv) used to control various “boundary terms” arising from this tree iteration. Here, we will present an essentially equivalent “Bellman function” formulation of the argument which replaces this iteration by a carefully (but rather unmotivatedly) chosen inductive claim. More precisely, let be a small quantity (depending only on and ) to be chosen later. For any , let denote the claim that for every , and for all sufficiently small , one has the inequality

From Proposition 7 (i), (ii), (iv), we see that holds for some small . We will shortly establish the implication

for some independent of ; this implies upon iteration that holds for arbitrarily large values of . Applying (9) with for a sufficiently large and a sufficiently small , and combining with Proposition 7(iii), we obtain the claim (6).

We now prove the implication (10). Thus we assume (7) holds for , sufficiently small , and obeying (8), and also (9) for and we wish to improve this to

for the same range of and for sufficiently small , and also

By Proposition 7(i) it suffices to show this for the extreme values of , thus we wish to show that

We begin with (13). The case of this estimate is

But since , we see that if is small enough, so the right-hand side of (16) is greater than and the claim follows from Proposition 7(iv) (with a little bit of room to spare). Now we look at the cases of (13). By Proposition 7(vi), we have

For close to , lies between and , so from (7) one has

Since , one has

for small enough depending on , and (13) follows (if is small enough depending on but not on ).

The same argument applied with gives

Since , we thus have

if are sufficiently small depending on (but not on ). This, together with Proposition 7(i), gives (15).

Finally, we establish (14). From Proposition 7(v) (with replaced by ) we have

In the case, this gives

and the claim (14) follows from (15) in this case. Now suppose . Since is close to , lies between and , and so we may apply (7) to conclude that

and hence (after simplifying)

which gives (14) for small enough (depending on , but not on ).

** — 2. Rescaled decoupling — **

The claims (i), (ii), (iii) of Proposition 7 are routine applications of the Hölder and Minkowski inequalities (and also the Bernstein inequality, in the case of (iii)); we will focus on the more interesting claims (iv), (v), (vi).

Here we establish (iv). The main geometric point exploited here is that any segment of the curve is affinely equivalent to itself, with the key factor of in the bound coming from this affine rescaling.

Using the definition (5) of , we see that we need to show that

for balls of radius . By Hölder’s inequality, it suffices to show that

for each . By Minkowski’s inequality (and the fact that ), the left-hand side is at most

so it suffices to show that

for each . From Fubini’s theorem one has

so we reduce to showing that

But this follows by applying an affine rescaling to map to , and then using the hypothesis with replaced by . (The ball gets distorted into an ellipsoid, but one can check that this ellipsoid can be covered efficiently by finitely overlapping balls of radius , and so one can close the argument using the triangle inequality.)

** — 3. Lower dimensional decoupling — **

Now we establish (v). Here, the geometric point is the one implicitly used in Proposition 3, namely that the -dimensional curve projects down to the -dimensional curve for any .

Let be as in Proposition 7(v). From (5), it suffices to show that

for balls of radius . It will suffice to show the pointwise estimate

for any , or equivalently that

where . Clearly this will follow if we have

for each . Covering the intervals in by those in , it suffices to show that

for each . But this follows from Proposition 3.

** — 4. Multidimensional Kakeya — **

Finally, we establish (vi), which is the most substantial component of Proposition 7, and the only component which truly takes advantage of the reduction to the multilinear setting. Let and be such that . From (5), it suffices to show that

for balls of radius . By averaging, it suffices to establish the bound

for balls of radius . If we write , the right-hand side simplifies to

so it suffices to show that

At this point it is convenient to perform a dyadic pigeonholing (giving up a factor of ) to normalise, for each , all of the quantities to be of comparable size, after reducing the sets to some appropriate subset . (The contribution of those for which this quantity is less than, say, of the maximal value, can be safely discarded by trivial estimates.) By homogeneity we may then normalise

for all surviving , so the estimate now becomes

Since is close to , is less than , so we can estimate

and so it suffices to show that

or, on raising to the power ,

Localising to balls of radius , it suffices to show that

The arc is contained in a box of dimensions roughly , so by the uncertainty principle is essentially constant along boxes of dimensions (this can be made precise by standard methods, see e.g. the discussion in the proof of Theorem 5.6 of Bourgain-Demeter-Guth, or my general discussion on the uncertainty principle in this previous blog post). This implies that , when restricted to , is essentially constant on “plates”, defined as the intersection of with slabs that have dimensions of length and the remaining dimensions infinite (and thus restricted to be of length about after restriction to ). Furthermore, as varies (and is constrained to be in ), the orientation of these slabs varies in a suitably “transverse” fashion (the precise definition of this is a little technical, but can be verified for ; see the BDG paper for details). After rescaling, the claim then follows from the following proposition:

Proposition 8 (Multilinear Kakeya). For , let be a collection of “plates” that have dimensions of length , and dimensions that are infinite, and for each let be a non-negative number. Assume that the families of plates obey a suitable transversality condition. Then for any ball of radius .

The exponent here is natural, as can be seen by considering the example where each consists of about parallel disjoint plates passing through , with for all such plates.

For (where the plates now become tubes), this result was first obtained by Bennett, Carbery, and myself using heat kernel methods, with a rather different proof (also capturing the endpoint case) later given using algebraic topological methods by Guth (as discussed in this previous post). More recently, a very short and elementary proof of this theorem was given by Guth, which was initially given for but extends to general . The scheme of the proof can be described as follows.

- When all the plates in each family are parallel, the claim follows from the Loomis-Whitney inequality (when ) or a more general Brascamp-Lieb inequality of Bennett, Carbery, Christ, and myself (for general ). These inequalities can be proven by repeated applications of the Hölder inequality and Fubini’s theorem.
- Perturbing this, we can obtain the proposition with a loss of for any and , provided that the plates in each are within of being parallel, and is sufficiently small depending on and . (For the case of general , this requires some uniformity in the result of Bennett, Carbery, Christ, and myself, which can be obtained by hand in the specific case of interest here, but was recently established in general by Bennett, Bez, Flock, and Lee.)
- A standard “induction on scales” argument shows that if the proposition is true at scale with some loss , then it is also true at scale with loss . Iterating this, we see that we can obtain the proposition with a loss of uniformly for *all* , provided that the plates are within of being parallel and is sufficiently small depending now only on (and not on ).
- A finite partition of unity then suffices to remove the restriction of the plates being within of each other, and then sending to zero we obtain the claim.

The proof of the decoupling theorem (and thus the Vinogradov main conjecture) is now complete.

Remark 9The above arguments extend to give decoupling for the curve in for every . As it turns out (Bourgain, private communication), a variant of the argument also handles the range , and the range can be covered from an induction on dimension (using the argument used to establish Proposition 3).

Filed under: expository, math.CA, math.NT Tagged: Ciprian Demeter, decoupling, induction on scales, Jean Bourgain, Larry Guth, multilinear Kakeya conjecture, restriction theorems, Vinogradov main conjecture

Yesterday brought the sad news that Marvin Minsky passed away at age 88. I never met Minsky (I wish I had); I just had one email exchange with him back in 2002, about Stephen Wolfram’s book. But Minsky was my academic great-grandfather (through Manuel Blum and Umesh Vazirani), and he influenced me in many other ways. For example, in his and Papert’s 1969 book Perceptrons—notorious for “killing neural net research for a decade,” because of its mis- or over-interpreted theorems about the representational limitations of single-layer neural nets—the way Minsky and Papert *proved* those theorems was by translating questions about computation into questions about the existence or nonexistence of low-degree polynomials with various properties, and then answering the latter questions using **MATH**. Their “polynomial method” is now a mainstay of quantum algorithms research (having been brought to the subject by Beals et al.), and in particular, has been a mainstay of my own career. Hardly Minsky’s best-known contribution to human knowledge, but that even such a relatively minor part of his oeuvre could have legs half a century later is a testament to his impact.

I’m sure readers will have other thoughts to share about Minsky, so please do so in the comments section. Personal reminiscences are especially welcome.

There was a movie, in the old days, Journey to the Far Side of the Sun (also known as Doppelgänger), which (spoiler alert) posits that there is a mirror version of the Earth hidden on the other side of the Sun, sharing its orbit with our Earth. The idea is that this planet would always be hidden behind the Sun, and so we would not know it was there.

This idea comes up a lot, over and over again. In fact, it came up again last week on twitter. But there's a problem. It assumes the Earth is on a circular orbit.

I won't go into the details here, but one of the greatest insights in astronomy was the discovery of Kepler's laws of planetary motion, telling us that planets move on elliptical orbits. With this came the realisation that planets can't move at uniform speeds, but travel quickly when closer to the Sun, while slowing down as their orbits carry them to larger distances.

There has been a lot of work examining orbits in the Solar System, and you can simply locate the position of a planet along its orbit. So it is similarly simple to consider two planets sharing the same orbit but starting at different locations, one at the closest approach to the Sun, one at the farthest.
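For concreteness, locating a planet along its elliptical orbit boils down to solving Kepler's equation M = E − e·sin E for the eccentric anomaly. Here is a minimal sketch (my own helper, not the post's actual code; units of AU and years, perihelion at t = 0):

```python
import math

def planet_position(t, e, a=1.0, period=1.0):
    """(x, y) position in AU of a planet at time t (years) on an ellipse
    of semi-major axis a and eccentricity e, with the Sun at the focus.
    Solves Kepler's equation M = E - e*sin(E) by Newton iteration."""
    M = 2 * math.pi * t / period          # mean anomaly
    E = M                                  # initial guess
    for _ in range(50):                    # Newton's method
        E -= (E - e * math.sin(E) - M) / (1 - e * math.cos(E))
    x = a * (math.cos(E) - e)              # position relative to the Sun
    y = a * math.sqrt(1 - e * e) * math.sin(E)
    return x, y
```

At t = 0 this returns the perihelion point (a(1 − e), 0), and at half a period the aphelion (−a(1 + e), 0), so the two planets in the post start at `planet_position(t, e)` and `planet_position(t + 0.5, e)`.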

Let's start with a simple circular orbit with two planets. Everything here is scaled to the Earth's orbit, and the circles in the figures coming up are not to scale. But here's an instance in the orbit.

It should be obvious that at all points in the orbit, the planets remain exactly on opposite sides of the Sun, and so would not be visible to each other.

So, here's a way of conveying this. The x-axis is the point in the orbit (in Earth Years) while the y-axis is the distance a light ray between the two planets passes from the centre of the Sun (blue line). The red line is the radius of the Sun (in Astronomical Units).

The blue line, as expected, is at zero. The planets remain hidden from each other.
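That blue curve is just the distance from the Sun's centre to the chord joining the two planets, which is a few lines of geometry (a sketch of my own, assuming positions are given as (x, y) pairs in AU with the Sun at the origin):

```python
import math

def sightline_distance(p1, p2):
    """Closest approach of the segment p1-p2 to the origin (the Sun)."""
    (x1, y1), (x2, y2) = p1, p2
    dx, dy = x2 - x1, y2 - y1
    # parameter of the closest point on the infinite line, clamped to the segment
    t = -(x1 * dx + y1 * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(x1 + t * dx, y1 + t * dy)
```

For two planets exactly opposite each other on a circular orbit, e.g. (1, 0) and (−1, 0), the chord passes straight through the origin and the distance is zero, which is the flat blue line above.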

Let's take a more eccentric orbit, with an eccentricity of 0.1. Here is the orbit

This doesn't look too different to the circular case above. The red circle in there is the location of the closest approach of each line of sight to the centre of the Sun, which is no longer a point. Let's take a look at the separation plot as before. Again, the red is the radius of the Sun.

Wow! For small segments of the planets' orbits, they are hidden from one another, but for most of the orbit the light between the planets passes at large distances from the Sun. Now, it might be tricky to see each other directly due to the glare of the Sun, but opportunities such as eclipses would mean the planets should be visible to one another.

But an eccentricity of 0.1 is much larger than that of the Earth, whose orbit is much closer to a circle, with an eccentricity of 0.0167086. Here's the orbit plot again.

So, the paths between the planets pass closer to the centre of the Sun, with a smaller separation than for the more eccentric orbit. What about the separation plot?

Excellent! As we saw before, for a large part of the orbits, the light paths between the planets pass outside the Sun! If the Earth did have a twin in the same orbit, it would be visible (modulo the glare of the Sun) for most of the year! We would have seen our Doppelgänger planet!

Now, you might complain that maybe the other Earth is on the same elliptical orbit but flipped so we are both at closest approach at the same time, always being exactly on the other side of the Sun from one another. Maybe, but orbital mechanics are a little more complex than that, especially with a planet like Jupiter in the Solar System. Its tugs would be different on the Earth and its (evil?) twin, and so the orbits would subtly differ over time.

It is pretty hard to hide a planet in the inner Solar System!



As I mentioned, I’m reading Ph.D. admission files. Each file is read by two committee members and thus each file has two numerical scores.

How to put all this information together into a preliminary ranking?

The traditional way is to assign to each applicant their mean score. But there’s a problem: different raters have different scales. My 7 might be your 5.

You could just normalize the scores by subtracting that rater’s overall mean. But that’s problematic too. What if one rater actually happens to have looked at stronger files? Or even if not: what if the relation between rater A’s scale and rater B’s scale isn’t linear? Maybe, for instance, rater A gives everyone she doesn’t think should get in a 0, while rater B uses a range of low scores to express the same opinion, depending on just how unsuitable the candidate seems.

Here’s what I did last year. If (r,a,a’) is a triple where r is a rater and a and a’ are two applicants, such that r rated a higher than a’, you can think of that as a judgment that a is more admittable than a’. And you can put all those judgments from all the raters in a big bag, and then see if you can find a ranking of the applicants (or, if you like, a real-valued function f on the applicants) such that, for every judgment a > a’, we have f(a) > f(a’).

Of course, this might not be possible — two raters might disagree! Or there might be more complicated incompatibilities generated by multiple raters. Still, you can ask: what if I tried to minimize the number of “mistakes”, i.e. the number of judgments in your bag that your choice of ranking contradicts?

Well, you can ask that, but you may not get an answer, because that’s a highly non-convex minimization problem, and is as far as we know completely intractable.

But here’s a way out, or at least a way part of the way out — we can use a *convex relaxation*. Set it up this way. Let V be the space of real-valued functions on applicants. For each judgment j, let mistake_j(f) be the step function

mistake_j(f) = 1 if f(a) < f(a’) + 1

mistake_j(f) = 0 if f(a) >= f(a’) + 1

Then “minimize total number of mistakes” is the problem of minimizing

M = sum_j mistake_j(f)

over V. And M is terribly nonconvex. If you try to gradient-descend (e.g. start with a random ranking and then switch two adjacent applicants whenever doing so reduces the total number of mistakes) you are likely to get caught in a local minimum that’s far from optimal. (Or at least that *can* happen; whether this typically actually happens in practice, I haven’t checked!)

So here’s the move: replace mistake_j(f) with a function that’s “close enough,” but is convex. It acts as a sort of tractable proxy for the optimization you’re actually after. The customary choice here is the *hinge loss*:

hinge_j(f) = max(0, f(a’) - f(a) + 1).

Then H := sum_j hinge_j(f) is a convex function of f, which you can easily minimize in Matlab or python. If you can actually find an f with H(f) = 0, you’ve found a ranking which agrees with every judgment in your bag. Usually you can’t, but that’s OK! You’ve very quickly found a function f which does a decent job aggregating the committee scores, and which you can use as your starting point.
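Here is a minimal sketch of that minimization (the judgment list and names are hypothetical, and plain subgradient descent stands in for a proper convex solver; the hinge is written in its convex max(0, ·) form):

```python
# hypothetical judgments: (a, a2) means some rater scored a above a2;
# the last pair conflicts with the third, so H = 0 is unattainable here
judgments = [("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
             ("carol", "bob")]

applicants = sorted({x for j in judgments for x in j})
f = {a: 0.0 for a in applicants}

# subgradient descent on H(f) = sum_j max(0, f(a2) - f(a) + 1)
lr = 0.1
for _ in range(2000):
    for a, a2 in judgments:
        if f[a] < f[a2] + 1:      # hinge term active: push the pair apart
            f[a] += lr
            f[a2] -= lr

ranking = sorted(applicants, key=f.get, reverse=True)
print(ranking)  # alice, who wins every comparison, comes out on top
```

The conflicting bob/carol judgments can never both be satisfied, so those two scores just oscillate near each other, while alice cleanly separates from both.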

Now here’s a paper by Nihal Shah and Martin Wainwright that commenter Dustin Mixon linked to in my last ranking post. It suggests doing something much simpler: using a *linear* function as a proxy for mistake_j. What this amounts to is: score each applicant by the number of times they were placed above another applicant. Should I be doing this instead? My first instinct is no. It looks like Shah and Wainwright assume that each pair of applicants is equally likely to be compared; I think I don’t want to assume that, and I think (but correct me if I’m wrong!) the optimality they get may not be robust to that?
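The linear-proxy suggestion amounts to something you can compute in a couple of lines (a sketch with a hypothetical judgment list; a real run would use the actual score triples):

```python
from collections import Counter

# each pair (a, a2) records that some rater scored a above a2 (toy data)
judgments = [("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
             ("carol", "bob")]

wins = Counter(a for a, _ in judgments)   # times placed above another applicant
applicants = {x for j in judgments for x in j}
order = sorted(applicants, key=lambda a: wins[a], reverse=True)
print(order)  # alice first; bob and carol tie on one "win" each
```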

Anyway, all thoughts on this question — or suggestions as to something *totally different* I could be doing — welcome, of course.

Just a week ago I hailed the new king, and already there was an assassination attempt. A new paper claims that the statistical significance of the 750 GeV diphoton excess is merely 2 sigma local. The story is being widely discussed in the corridors and comment sections because we all like to watch things die... The assassins used this plot:

The Standard Model prediction for the diphoton background at the LHC is difficult to calculate from first principles. Therefore, the ATLAS collaboration assumes a theoretically motivated functional form for this background as a function of the diphoton invariant mass. The ansatz contains a number of free parameters, which are then fitted using the data in the entire analyzed range of invariant masses. This procedure leads to the prediction represented by the dashed line in the plot (but see later). The new paper assumes a slightly more complicated functional form with more free parameters, such that the slope of the background is allowed to change. The authors argue that their more general ansatz provides a better fit to the entire diphoton spectrum, and moreover predicts a larger background for the large invariant masses. As a result, the significance of the 750 GeV excess decreases to an insignificant value of 2 sigma.

There are several problems with this claim. First, I'm confused why the blue line is described as the ATLAS fit, since it is clearly different than the background curve in the money-plot provided by ATLAS (Fig. 1 in ATLAS-CONF-2015-081). The true ATLAS background is above the blue line, and much closer to the black line in the peak region (*edit: it seems now that the background curve plotted by ATLAS corresponds to a1=0 and one more free parameter for an overall normalization, while the paper assumes fixed normalization*). Second, I cannot reproduce the significance quoted in the paper. Taking the two ATLAS bins around 750 GeV, I find 3.2 sigma excess using the true ATLAS background, and 2.6 sigma using the black line (*edit: this is because my estimate is too simplistic, and the paper also takes into account the uncertainty on the background curve*). Third, the postulated change of slope is difficult to justify theoretically. It would mean there is a new background component kicking in at ~500 GeV, but this does not seem to be the case in this analysis.

Finally, the problem with the black line is that it grossly overshoots the high mass tail, which is visible even to the naked eye. To be more quantitative, in the range 790-1590 GeV there are 17 diphoton events observed by ATLAS, the true ATLAS background predicts 19 events, and the black line predicts 33 events. Therefore, the background shape proposed in the paper is inconsistent with the tail at the 3 sigma level! While the alternative background choice decreases the significance at the 750 GeV peak, it simply moves (and amplifies) the tension to another place.
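That tail tension can be sanity-checked with a one-sided Poisson tail probability (a back-of-envelope sketch of my own, not the ATLAS statistical treatment; the event counts are the ones quoted above):

```python
import math

def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), summed directly."""
    term, total = math.exp(-mu), 0.0
    for i in range(k + 1):
        total += term
        term *= mu / (i + 1)
    return total

def z_score(p):
    """Gaussian-equivalent significance of a one-sided p-value."""
    lo, hi = 0.0, 10.0
    for _ in range(200):  # bisect P(Z > z) = p
        mid = (lo + hi) / 2
        if 0.5 * math.erfc(mid / math.sqrt(2)) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# 17 events observed vs 33 predicted by the paper's background shape
print(f"deficit vs the paper's fit: {z_score(poisson_cdf(17, 33.0)):.1f} sigma")
# vs the nominal ATLAS background of 19 events, the same 17 is unremarkable
print(f"deficit vs nominal ATLAS:   {z_score(poisson_cdf(17, 19.0)):.1f} sigma")
```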

So, I think the plot is foiled and the claim does not stand scrutiny. The 750 GeV peak may well be just a statistical fluctuation that will go away when more data is collected, but it's unlikely to be a stupid error on the part of ATLAS. The king will live at least until summer.


The Higgs boson couples to particles that constitute matter around us, such as electrons, protons, and neutrons. Its virtual quanta are constantly being exchanged between these particles. In other words, it gives rise to a force - the *Higgs force*. I'm surprised that this PR-cool aspect is not explored in our outreach efforts. Higgs bosons mediate the Higgs force in the same fashion as gravitons, gluons, photons, and W and Z bosons mediate the gravitational, strong, electromagnetic, and weak forces. Just like gravity, the Higgs force is always attractive and its strength is proportional, in the first approximation, to the particle's mass. It is a force in the common sense; for example, if we bombarded a detector long enough with a beam of particles interacting only via the Higgs force, they would eventually knock off atoms in the detector.

There is of course a reason why the Higgs force is less discussed: it has never been detected directly. Indeed, in the absence of midi-chlorians it is extremely weak. First, it shares with the weak interactions the feature of being short-ranged: since the mediator is massive, the interaction strength is exponentially suppressed at distances larger than an attometer (10^-18 m), about 0.1% of the diameter of a proton. Moreover, for ordinary matter the weak force is more important, because of the tiny Higgs couplings to light quarks and electrons. For example, for the proton the Higgs force is a thousand times weaker than the weak force, and for the electron it is a hundred thousand times weaker. Finally, there are no known particles interacting *only* via the Higgs force and gravity (though dark matter in some hypothetical models has this property), so in practice the Higgs force is always a tiny correction to more powerful forces that shape the structure of atoms and nuclei. This is again in contrast to the weak force, which is particularly relevant for neutrinos, which are immune to the strong and electromagnetic forces.
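The quoted range is just the Compton wavelength of the mediator, λ = ħc/(m_H c²); here is a back-of-envelope check of my own (inputs: ħc ≈ 197.327 MeV·fm and the measured Higgs mass of about 125 GeV):

```python
import math

hbar_c = 197.327e6 * 1e-15      # eV * m  (197.327 MeV * fm)
m_higgs = 125e9                 # eV (measured Higgs mass, ~125 GeV)
lam = hbar_c / m_higgs          # range of the Yukawa potential, in metres
print(f"range of the Higgs force: {lam:.2e} m")   # roughly an attometre

# e^(-r/lam) suppression at one proton radius (~0.84 fm): astronomically small
r = 0.84e-15
print(f"suppression at a proton radius: e^-{r / lam:.0f}")
```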

Nevertheless, this new paper argues that the situation is not hopeless, and that the current experimental sensitivity is good enough to start probing the Higgs force. The authors propose to do it by means of atom spectroscopy. Frequency measurements of atomic transitions have reached the stunning accuracy of order 10^-18. The Higgs force creates a Yukawa-type potential between the nucleus and the orbiting electrons, which leads to a shift of the atomic levels. The effect is tiny, in particular it is always smaller than the analogous shift due to the weak force. This is a serious problem, because calculations of the leading effects may not be accurate enough to extract the subleading Higgs contribution. Fortunately, there may be tricks to reduce the uncertainties. One is to measure the isotope shift of transition frequencies for several isotope pairs. The theory says that the leading atomic interactions should give rise to a universal linear relation (the so-called King's relation) between isotope shifts for different transitions. The Higgs and weak interactions should lead to a violation of King's relation. Given many uncertainties plaguing calculations of atomic levels, it may still be difficult to ever claim a detection of the Higgs force. More realistically, one can try to set limits on the Higgs couplings to light fermions which will be better than the current collider limits.

Atomic spectroscopy is way above my head, so I cannot judge if the proposal is realistic. There are a few practical issues to resolve before the Higgs force is mastered into a lightsaber. However, it is possible that a new front to study the Higgs boson will be opened in the near future. These studies will provide information about the Higgs couplings to light Standard Model fermions, which is complementary to the information obtained from collider searches.



Here is a fun problem, with a great story and a surprising answer.

According to the Talmud, in order for the Sanhedrin to sentence a man to death, the majority of them must agree to it. However

R. Kahana said: If the Sanhedrin unanimously find [the accused] guilty, he is acquitted. (Babylonian Talmud, Tractate Sanhedrin, Folio 17a)

Scott Alexander has a devious mind and considers how he would respond to this rule as a criminal:

[F]irst I’d invite a bunch of trustworthy people over as eyewitnesses, then I’d cover all available surfaces of the crime scene with fingerprints and bodily fluids, and finally I’d make sure to videotape myself doing the deed and publish the video on YouTube.

So, suppose you were on a panel of n judges, all of whom had seen overwhelming evidence of the accused’s guilt, and wanted to make sure that a majority of you would vote to convict, but not all of you. And suppose you cannot communicate. With what probability p would you vote to convict?

Test your intuition by guessing an answer now, then click below:

My gut instincts were that (1) we should choose $p$ really close to $1$, probably approaching $1$ as $n \to \infty$, and (2) there is no way this question would have a precise round answer. As you will see, I was quite wrong.

Tumblr user lambdaphagy is smarter than I was and wrote a program. Here are his or her results:

As you can see, it appears that $p$ is not approaching $1$, or even coming close to it, but is somewhere near $0.8$. Can we explain this?
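The experiment is easy to reproduce without simulation, since the failure probability can be computed exactly from the binomial distribution. Here is a small sketch (my own code, not lambdaphagy's) that scans a grid of conviction probabilities for two panel sizes:

```python
from math import comb

def failure_prob(p, n):
    """Chance the scheme fails for a panel of n judges, each voting to
    convict independently with probability p: either all n convict
    (unanimity, hence acquittal) or at most n//2 do (no majority)."""
    unanimity = p ** n
    no_majority = sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
                      for k in range(n // 2 + 1))
    return unanimity + no_majority

# Scan conviction probabilities above 1/2 and keep the best one.
grid = [i / 1000 for i in range(501, 1000)]
best = {n: min(grid, key=lambda p: failure_prob(p, n)) for n in (23, 201)}
print(best)
```

For moderate panel sizes the optimum sits a little below $0.8$ and creeps toward it as $n$ grows, consistent with the plot.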

We want to avoid two events: unanimity, and a majority vote to acquit. The probability of unanimity is $p^n$.

The probability of a majority vote to acquit is $\sum_{k < n/2} \binom{n}{k} p^k (1-p)^{n-k}$. Assuming that $p > 1/2$, and it certainly should be, almost all of the contribution to that sum will come from terms where $k \approx n/2$. In that case, $\binom{n}{k} \approx 2^n$ and $p^k (1-p)^{n-k} \approx (p(1-p))^{n/2}$. And we'll roughly care about $O(1)$ such terms. So the odds of acquittal are roughly $(4p(1-p))^{n/2}$.

So we roughly want $\max\left( p^n, (4p(1-p))^{n/2} \right)$ to be as small as possible. For $n$ large, one of the two terms will be much larger than the other, so it is the same to ask that $\max\left( p^2, 4p(1-p) \right)$ be as small as possible.

Here is a plot of $\max\left( p^2, 4p(1-p) \right)$:

Ignore the part with $p$ below $1/2$; that's clearly wrong and our approximation that the sum is dominated by $k \approx n/2$ won't be good there. Over the range $1/2 \leq p \leq 1$, the minimum is where $p^2 = 4p(1-p)$.

Let's do some algebra: $p^2 = 4p(1-p)$, $p = 4(1-p)$ (since $p = 0$ is clearly wrong), $5p = 4$, $p = 4/5$. Holy cow, $0.8$ is actually right!

First of all, actually do some computations.

Secondly, I was wrongly thinking that failing by acquittal would be much more important than failing by unanimity. I think I was misled because one of them occurs for many values of $k$ and the other only occurs for one value. I should have realized two things: (1) the bell curve is tightly peaked, so it is really only the $k$ very close to $n/2$ which matter, and (2) exponentials are far more powerful than the ratio between $1$ and $n/2$ anyway.

Finally, for the skeptics, here is an actual proof. Assuming $1/2 < p < 1$, we have

$\sum_{k < n/2} \binom{n}{k} p^k (1-p)^{n-k} \leq (p(1-p))^{n/2} \sum_{k < n/2} \binom{n}{k} \leq (4p(1-p))^{n/2}.$

The main step is to replace each $p^k (1-p)^{n-k}$ by the largest it can be, namely $(p(1-p))^{n/2}$.

But also,

$\sum_{k < n/2} \binom{n}{k} p^k (1-p)^{n-k} \geq \binom{n}{\lfloor n/2 \rfloor} p^{\lfloor n/2 \rfloor} (1-p)^{\lceil n/2 \rceil} \geq \frac{2^n}{n+1}\, p^{\lfloor n/2 \rfloor} (1-p)^{\lceil n/2 \rceil}.$

Here we have lower bounded the sum by one of its terms, and then used the easy bound $\binom{n}{\lfloor n/2 \rfloor} \geq 2^n/(n+1)$, since it is the largest of the $n+1$ entries in a row of Pascal's triangle which sums to $2^n$.

So the odds of failure are bounded between

$\max\left( p^n, \tfrac{1}{n+1} (4p(1-p))^{n/2} \right)$

and $p^n + (4p(1-p))^{n/2}$. We further use the convenient trick of replacing a sum with a $\max$, up to bounded error, to get that the odds of failure are bounded between $\tfrac{1}{n+1} \max\left( p^n, (4p(1-p))^{n/2} \right)$ and $2 \max\left( p^n, (4p(1-p))^{n/2} \right)$.

Now, let $q$ be a probability greater than $1/2$ other than $4/5$. We claim that choosing conviction probability $4/5$ will be better than $q$ for $n$ large. Indeed, the $q$-strategy will fail with odds at least $\tfrac{1}{n+1} \max\left( q^n, (4q(1-q))^{n/2} \right)$, and the $4/5$-strategy will fail with odds at most $2 (4/5)^n$. Since $q \neq 4/5$, one of the two exponentials in the first case is larger than $(4/5)^n$, and the $q$-strategy is more likely to fail, as claimed.

Of course, for a Sanhedrin of $23$ members, $(4/5)^{23} \approx 0.006$, so our upper bound predicts only a one percent probability of failure. More accurate computation gives an even smaller probability. So the whole conversation deals with the overly detailed analysis of an unlikely consequence of a bizarre hypothetical event. Fortunately, this is not a problem in the study of the Talmud!

**Non-Lily-Related Updates (Jan. 22)**

Uri Bram posted a cute little article about whether he was justified, as a child, to tell his parents that he wouldn’t clean up his room because doing so would only increase the universe’s entropy and thereby hasten its demise. The article quotes me, Sean Carroll, and others about that important question.

On Wednesday I gave a TCS+ online seminar about “The Largest Possible Quantum Speedups.” If you’re interested, you can watch the YouTube video here.

(I promised a while ago that I’d upload some examples of Lily’s MOMA-worthy modern artworks. So, here are two!)

**A few quotable quotes:**

Daddy, when you were little, you were a girl like me!

I’m feeling a bit juicy [thirsty for juice].

Saba and Safta live in Israel. They’re mommy’s friends! [Actually they’re mommy’s parents.]

Me: You’re getting bigger every day!

Lily: But I’m also getting smaller every day!

Me: Then Goldilocks tasted the third bowl, which was Baby Bear’s, and it was *just right!* So she *ate it all up*. Then Goldilocks went…

Lily: No, then Goldilocks ate some cherries in the kitchen before she went to the bedroom. And blueberries.

Me: Fine, so she ate cherries and blueberries. Then she went to the bedroom, and she saw that there were three beds…

Lily: No, four beds!

Me: Fine, four beds. So she laid in the first bed, but she said, “this bed is too hard.”

Lily: No, it was too comfortable!

Me: Too comfortable? Is she some kind of monk?

Me [pointing to a taxidermed black bear in a museum]: What’s that?

Lily: A bear!

Me: Is it Winnie the Pooh?

Lily: No, it’s a different kind of bear.

Me [pointing to a tan bear in the next case]: So what about that one? Is *that* Winnie?

Lily: Yes! That’s Winnie the Pooh!

[Looking at it more closely] No, it’s a different kind of Winnie.

Lily: Why is it dark outside?

Me: Because it’s night time.

Lily: Why is it night time?

Me: Because the sun went to the other side of the world.

Lily: It went to China!

Me: Yes! It did in fact go to China.

Lily: Why did the sun go to China?

Me: Well, more accurately, it only *seemed* to go there, because the world that we’re on is spinning.

Lily: Why is the world spinning?

Me: Because of the conservation of angular momentum.

Lily: Why is the … consibation of amomomo?

Me: I suppose because of Noether’s Theorem, and the fact that our laws of physics are symmetric under spatial rotations.

Lily: Why is…

Me: That’s enough for today Lily!

Although it was from only a couple of people, I had an enthusiastic response to a very tentative suggestion that it might be rewarding to see whether a polymath project could say anything useful about Frankl’s union-closed conjecture. A potentially worrying aspect of the idea is that the problem is extremely elementary to state, does not seem to yield to any standard techniques, and is rather notorious. But, as one of the commenters said, that is not necessarily an argument against trying it. A notable feature of the polymath experiment has been that it throws up surprises, so while I wouldn’t expect a polymath project to solve Frankl’s union-closed conjecture, I also know that I need to be rather cautious about my expectations — which in this case is an argument in favour of giving it a try.

A less serious problem is what acronym one would use for the project. For the density Hales-Jewett problem we went for DHJ, and for the Erdős discrepancy problem we used EDP. That general approach runs into difficulties with Frankl’s union-closed conjecture, so I suggest FUNC. This post, if the project were to go ahead, could be FUNC0; in general I like the idea that we would be engaged in a funky line of research.

The problem, for anyone who doesn't know, is this. Suppose you have a family $\mathcal{A}$ that consists of $n$ distinct subsets of a finite set $X$. Suppose also that it is *union closed*, meaning that if $A, B \in \mathcal{A}$, then $A \cup B \in \mathcal{A}$ as well. Must there be an element of $X$ that belongs to at least $n/2$ of the sets? This seems like the sort of question that ought to have an easy answer one way or the other, but it has turned out to be surprisingly difficult.
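For anyone who wants to experiment with small cases, a brute-force checker is a few lines of code. This is my own throwaway sketch, not anything from the literature: it tests whether a family is union closed and whether some element lies in at least half of the sets.

```python
from itertools import combinations

def is_union_closed(family):
    """True if the union of any two members is again a member."""
    fam = {frozenset(s) for s in family}
    return all(a | b in fam for a, b in combinations(fam, 2))

def frankl_ok(family):
    """True if some ground-set element lies in at least half of the sets,
    i.e. the conclusion of the union-closed conjecture holds."""
    fam = [frozenset(s) for s in family]
    ground = set().union(*fam)
    best = max(sum(1 for s in fam if x in s) for x in ground)
    return 2 * best >= len(fam)

family = [{1}, {1, 2}, {2, 3}, {1, 2, 3}]
print(is_union_closed(family), frankl_ok(family))  # True True
```

Exhausting over all union-closed families on a small ground set is a pleasant exercise, though of course it proves nothing about the general conjecture.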

If you are potentially interested, then one good thing to do by way of preparation is look at this survey article by Henning Bruhn and Oliver Schaudt. It is very nicely written and seems to be a pretty comprehensive account of the current state of knowledge about the problem. It includes some quite interesting reformulations (interesting because you don’t just look at them and see that they are trivially equivalent to the original problem).

For the remainder of this post, I want to discuss a couple of failures. The first is a natural idea for generalizing the problem to make it easier that completely fails, at least initially, but can perhaps be rescued, and the second is a failed attempt to produce a counterexample. I’ll present these just in case one or other of them stimulates a useful idea in somebody else.

An immediate reaction of any probabilistic combinatorialist is likely to be to wonder whether in order to prove that there *exists* a point in at least half the sets it might be easier to show that in fact an *average* point belongs to half the sets.

Unfortunately, it is very easy to see that that is false: consider, for example, the three sets $\emptyset$, $\{1\}$ and $\{1,2,\dots,13\}$. The average (over $x \in \{1,\dots,13\}$) of the number of sets containing a random element is $14/13$, but there are three sets.

However, this example doesn't feel like a genuine counterexample somehow, because the set system is just a dressed up version of $\emptyset, \{1\}, \{1,2\}$: we replace the singleton $\{2\}$ by the set $\{2,3,\dots,13\}$ and that's it. So for this set system it seems more natural to consider a *weighted* average, or equivalently to take not the uniform distribution on $\{1,\dots,13\}$, but some other distribution that reflects more naturally the properties of the set system at hand. For example, we could give a probability 1/2 to the element 1 and 1/24 to each of the remaining 12 elements of the set. If we do that, then the average number of sets containing a random element will be the same as it is for the example $\emptyset, \{1\}, \{1,2\}$ with the uniform distribution (not that the uniform distribution is obviously the most natural distribution for that example).

This suggests a very slightly more sophisticated version of the averaging-argument idea: does there *exist* a probability distribution on the elements of the ground set such that the expected number of sets containing a random element (drawn according to that probability distribution) is at least half the number of sets?

With this question we have in a sense the opposite problem. Instead of the answer being a trivial no, it is a trivial yes — if, that is, the union-closed conjecture holds. That’s because if the conjecture holds, then some $x$ belongs to at least half the sets, so we can assign probability 1 to that $x$ and probability zero to all the other elements.

Of course, this still doesn’t feel like a complete demolition of the approach. It just means that for it not to be a trivial reformulation we will have to put *conditions* on the probability distribution. There are two ways I can imagine getting the approach to work. The first is to insist on some property that the distribution is required to have that means that its existence does *not* follow easily from the conjecture. That is, the idea would be to prove a stronger statement. It seems paradoxical, but as any experienced mathematician knows, it can sometimes be easier to prove a stronger statement, because there is less room for manoeuvre. In extreme cases, once a statement has been suitably strengthened, you have so little choice about what to do that the proof becomes almost trivial.

A second idea is that there might be a nice way of defining the probability distribution in terms of the set system. This would be a situation rather like the one I discussed in my previous post, on entropy and Sidorenko’s conjecture. There, the basic idea was to prove that a set $A$ had cardinality at least $m$ by proving that there is a probability distribution on $A$ with entropy at least $\log_2 m$. At first, this seems like an unhelpful idea, because if $|A| \geq m$ then the uniform distribution on $A$ will trivially do the job. But it turns out that there is a different distribution for which it is easier to *prove* that it does the job, even though it usually has lower entropy than the uniform distribution. Perhaps with the union-closed conjecture something like this works too: obviously the best distribution is supported on the set of elements that are contained in a maximal number of sets from the set system, but perhaps one can construct a different distribution out of the set system that gives a smaller average in general but about which it is easier to prove things.

I have no doubt that thoughts of the above kind have occurred to a high percentage of people who have thought about the union-closed conjecture, and can probably be found in the literature as well, but it would be odd not to mention them in this post.

To finish this section, here is a wild guess at a distribution that does the job. Like almost all wild guesses, its chances of being correct are very close to zero, but it gives the flavour of the kind of thing one might hope for.

Given a finite set $X$ and a collection $\mathcal{A}$ of subsets of $X$, we can pick a random set $A$ (uniformly from $\mathcal{A}$) and look at the events $x \in A$ for each $x \in X$. In general, these events are correlated.

Now let us define a matrix $M$ by $M_{xy} = \mathbb{P}[x \in A \text{ and } y \in A] - \mathbb{P}[x \in A]\,\mathbb{P}[y \in A]$. We could now try to find a probability distribution $(\mu_x)$ on $X$ that minimizes the sum $\sum_{x,y} \mu_x \mu_y M_{xy}$. That is, in a certain sense we would be trying to make the events as uncorrelated as possible on average. (There may be much better ways of measuring this — I’m just writing down the first thing that comes into my head that I can’t immediately see is stupid.)

What does this give in the case of the three sets $\emptyset$, $\{1\}$ and $\{1,2,\dots,13\}$? We have that $M_{xy} = 2/9$ if $x = y = 1$ or if $x \geq 2$ and $y \geq 2$. If $x = 1$ and $y \geq 2$, then $M_{xy} = 1/9$, since if $1 \in A$, then $A$ is one of the two sets $\{1\}$ and $\{1,2,\dots,13\}$, with equal probability.

So to minimize the sum we should choose $\mu$ so as to maximize the probability that $x = 1$ and $y \geq 2$. If $\mu_1 = p$, then this probability is $p(1-p)$, which is maximized when $p = 1/2$, so in fact we get the distribution mentioned earlier. In particular, for this distribution the average number of sets containing a random point is $3/2$, which is precisely half the total number of sets. (I find this slightly worrying, since for a successful proof of this kind I would expect equality to be achieved only in the case that you have disjoint sets and you take all their unions, including the empty set. But since this definition of a probability distribution isn’t supposed to be a serious candidate for a proof of the whole conjecture, I’m not too worried about being worried.)
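This kind of minimization is easy to check by computer. The sketch below is my own; the three-set family and the restriction to distributions that put weight $p$ on element 1 and share the rest evenly are choices made for illustration (they are the natural symmetry class for this family), and exact rational arithmetic avoids any floating-point ambiguity.

```python
from fractions import Fraction

# A small union-closed family; the uniform random set A is drawn from it.
family = [frozenset(), frozenset({1}), frozenset(range(1, 14))]
X = sorted(set().union(*family))
n, m = len(X), len(family)

def P(event):
    """Probability of an event under a uniform random choice of A."""
    return Fraction(sum(1 for A in family if event(A)), m)

# Correlation matrix M_xy = P[x in A and y in A] - P[x in A] P[y in A].
M = {(x, y): P(lambda A: x in A and y in A)
            - P(lambda A: x in A) * P(lambda A: y in A)
     for x in X for y in X}

def objective(p):
    """sum_{x,y} mu_x mu_y M_xy for mu = (p on element 1, rest split evenly)."""
    mu = {x: (p if x == 1 else (1 - p) / (n - 1)) for x in X}
    return sum(mu[x] * mu[y] * M[(x, y)] for x in X for y in X)

best = min((Fraction(i, 10) for i in range(11)), key=objective)
print(best)  # 1/2
```

On this grid the minimizer is exactly $p = 1/2$, matching the hand computation.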

Just to throw in another thought, perhaps some entropy-based distribution would be good. I wondered, for example, about defining a probability distribution as follows. Given any probability distribution on the elements, we obtain weights on the sets by taking $w_A$ to be the probability that a random element (chosen from the distribution) belongs to $A$. We can then form a probability distribution on $\mathcal{A}$ by taking the probabilities to be proportional to the weights $w_A$. Finally, we can choose a distribution on the elements to maximize the entropy of the distribution on $\mathcal{A}$.

If we try that with the example above, and if $p$ is the probability assigned to the element 1, then the three weights are $0$, $p$ and $1$, so the probabilities we will assign will be $0$, $p/(1+p)$ and $1/(1+p)$. The entropy of this distribution will be maximized when the two non-zero probabilities are equal, which gives us $p = 1$, so in this case we will pick out the element 1. It isn’t completely obvious that that is a bad thing to do for this particular example — indeed, we will do it whenever there is an element that is contained in all the non-empty sets from $\mathcal{A}$. Again, there is virtually no chance that this rather artificial construction will work, but perhaps after a lot of thought and several modifications and refinements, something like it could be got to work.

I find the non-example I’m about to present interesting because I don’t have a good conceptual understanding of why it fails — it’s just that the numbers aren’t kind to me. But I think there *is* a proper understanding to be had. Can anyone give me a simple argument that no construction that is anything like what I tried can possibly work? (I haven’t even checked properly whether the known positive results about the problem ruled out my attempt before I even started.)

The idea was as follows. Let $k$ and $p$ be parameters to be chosen later, and let $\mathcal{A}$ be a random set system obtained by choosing each subset of $\{1,2,\dots,n\}$ of size $k$ with probability $p$, the choices being independent. We then take as our attempted counterexample the set $\mathcal{B}$ of all unions of sets in $\mathcal{A}$.

Why might one entertain even for a second the thought that this could be a counterexample? Well, if we choose $k$ to be rather close to $n/2$, but just slightly less, then a typical pair of sets of size $k$ have a union of size close to $3n/4$, and more generally a typical union of sets of size $k$ has size at least this. There are vastly fewer sets of size greater than $3n/4$ than there are of size $k$, so we could perhaps dare to hope that almost all the sets in the set system are the ones of size $k$, so the average size is close to $k$, which is less than $n/2$. And since the sets are spread around, the elements are likely to be contained in roughly the same number of sets each, so this gives a counterexample.

Of course, the problem here is that although a typical union is large, there are many atypical unions, so we need to get rid of them somehow — or at least the vast majority of them. This is where choosing a random subset comes in. The hope is that if we choose a fairly sparse random subset, then all the unions will be large rather than merely almost all.

However, this introduces a new problem, which is that if we have passed to a *sparse* random subset, then it is no longer clear that the size of that subset is bigger than the number of possible unions. So it becomes a question of balance: can we choose $p$ small enough for the unions of those sets to be typical, but still large enough for the sets of size $k$ to dominate the set system? We’re also free to choose $k$ of course.

I usually find when I’m in a situation like this, where I’m hoping for a miracle, that a miracle doesn’t occur, and that indeed seems to be the case here. Let me explain my back-of-envelope calculation.

I’ll write $\mathcal{B}$ for the set of unions of sets in $\mathcal{A}$. Let us now take $m > k$ and give an upper bound for the expected number of sets in $\mathcal{B}$ of size $m$. So fix a set $B$ of size $m$ and let us give a bound for the probability that $B \in \mathcal{B}$. We know that $B$ must contain at least two sets in $\mathcal{A}$. But the number of pairs of sets of size $k$ contained in $B$ is at most $\binom{m}{k}^2$ and each such pair has a probability $p^2$ of being a pair of sets in $\mathcal{A}$, so the probability that $B \in \mathcal{B}$ is at most $\binom{m}{k}^2 p^2$. Therefore, the expected number of sets in $\mathcal{B}$ of size $m$ is at most $\binom{n}{m} \binom{m}{k}^2 p^2$.

As for the expected number of sets in $\mathcal{A}$, it is $\binom{n}{k} p$, so if we want the example to work, we would very much like it to be the case that when $k < m \leq n$, we have the inequality

$\binom{n}{k} p \geq \binom{n}{m} \binom{m}{k}^2 p^2.$

We can weaken this requirement by observing that the expected number of sets in $\mathcal{B}$ of size $m$ is also trivially at most $\binom{n}{m}$, so it is enough to go for

$\binom{n}{k} p \geq \binom{n}{m} \min\left( 1, \binom{m}{k}^2 p^2 \right).$

If the left-hand side is not just greater than the right-hand side, but greater by an exponential factor for each $m$, then we should be in good shape: the average size of a set in $\mathcal{B}$ will be not much greater than $k$ and we’ll be done.

If $m$ is not much bigger than $k$, then things look quite promising. In this case, $\binom{n}{m}$ will be comparable in size to $\binom{n}{k}$, but $\binom{m}{k}$ will be quite small — it equals $\binom{m}{m-k}$, and $m - k$ is small. A crude estimate says that we’ll be OK provided that $p$ is significantly smaller than $\binom{m}{k}^{-2}$. And that looks OK, since $\binom{m}{k}^2$ is a lot smaller than $\binom{n}{k}$, so we aren’t being made to choose a ridiculously small value of $p$.

If on the other hand $m$ is quite a lot larger than $k$, then $\binom{n}{m}$ is much much smaller than $\binom{n}{k}$, so we’re in great shape as long as we haven’t chosen $p$ so tiny that $\binom{n}{k} p$ is also much much smaller than $\binom{n}{m}$.

So what goes wrong? Well, the problem is that the first argument requires smaller and smaller values of $p$ as $m$ gets further and further away from $k$, and the result seems to be that by the time the second regime takes over, $p$ has become too small for the trivial argument to work.

Let me try to be a bit more precise about this. The point at which $\binom{n}{m}$ becomes smaller than $\binom{n}{m} \binom{m}{k}^2 p^2$ is of course the point at which $\binom{m}{k} p = 1$. For that value of $m$, we require $\binom{n}{m} \leq \binom{n}{k} p$, so we need $\binom{n}{m} \binom{m}{k} \leq \binom{n}{k}$. However, an easy calculation reveals that

$\binom{n}{m} \binom{m}{k} = \binom{n}{k} \binom{n-k}{m-k} \geq \binom{n}{k}$

(or observe that if you multiply both expressions out, then both are equal to the multinomial coefficient that counts the number of ways of writing an $n$-element set as $A \cup B \cup C$ with $|A| = k$, $|B| = m-k$ and $|C| = n-m$). So unfortunately we find that however we choose the value of $p$ there is a value of $m$ such that the number of sets in $\mathcal{B}$ of size $m$ is greater than the number of sets in $\mathcal{A}$. (I should remark that the estimate for the number of sets in $\mathcal{B}$ of size $m$ can be improved, but this does not make enough of a difference to rescue the argument.)

So unfortunately it turns out that the middle of the range is worse than the two ends, and indeed worse by enough to kill off the idea. However, it seemed to me to be good to make at least some attempt to find a counterexample in order to understand the problem better.
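The shape of this failure can be checked numerically at a fixed small $n$. The sketch below is my own toy check, with the formulas being my reading of the bounds above; it compares (in logs, to avoid overflow) the expected number of size-$k$ sets in the random family against the bound $\binom{n}{m}\min\bigl(1,\binom{m}{k}^2 p^2\bigr)$ on the expected number of size-$m$ unions, for one sample choice of $p$.

```python
from math import comb, log

n, k = 60, 28                     # toy sizes; k a bit below n/2
log_p = -0.5 * log(comb(n, k))    # one sample choice of p, stored as a log

def log_expected_B_m(m):
    """Log of the bound binom(n,m) * min(1, binom(m,k)^2 p^2) on the
    expected number of size-m unions."""
    return log(comb(n, m)) + min(0.0, 2 * log(comb(m, k)) + 2 * log_p)

log_expected_A = log(comb(n, k)) + log_p   # log of E|A| = binom(n,k) p

worst = max(range(k + 1, n + 1), key=log_expected_B_m)
print(worst, log_expected_B_m(worst) > log_expected_A)
```

Consistent with the discussion, the bound is comfortably small at both ends of the range of $m$ but overwhelms $\mathbb{E}|\mathcal{A}|$ somewhere in the middle.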

From here there are two obvious ways to go. One is to try to modify the above idea to give it a better chance of working. The other, which I have already mentioned, is to try to generalize the failure: that is, to explain why that example, and many others like it, had no hope of working. Alternatively, somebody could propose a completely different line of enquiry.

I’ll stop there. Experience with Polymath projects so far seems to suggest that, as with individual projects, it is hard to predict how long they will continue before there is a general feeling of being stuck. So I’m thinking of this as a slightly tentative suggestion, and if it provokes a sufficiently healthy conversation and interesting new (or at least new to me) ideas, then I’ll write another post and launch a project more formally. In particular, only at that point will I call it Polymath11 (or should that be Polymath12? — I don’t know whether the almost instantly successful polynomial-identities project got round to assigning itself a number). Also, for various reasons I don’t want to get properly going on a Polymath project for at least a week, though I realize I may not be in complete control of what happens in response to this post.

Just before I finish, let me remark that Polymath10, attempting to prove Erdős’s sunflower conjecture, is still continuing on Gil Kalai’s blog. What’s more, I think it is still at a stage where a newcomer could catch up with what is going on — it might take a couple of hours to find and digest a few of the more important comments. But Gil and I agree that there may well be room to have more than one Polymath project going at the same time, since a common pattern is for the group of participants to shrink down to a smallish number of “enthusiasts”, and there are enough mathematicians to form many such groups.

And a quick reminder, as maybe some people reading this will be new to the concept of Polymath projects. The aim is to try to make the problem-solving process easier in various ways. One is to have an open discussion, in the form of blog posts and comments, so that anybody can participate, and with luck a process of self-selection will take place that results in a team of enthusiastic people with a good mixture of skills and knowledge. Another is to encourage people to express ideas that may well be half-baked or even wrong, or even completely *obviously* wrong. (It’s surprising how often a completely obviously wrong idea can stimulate a different idea that turns out to be very useful. Naturally, expressing such an idea can be embarrassing, but it shouldn’t be, as it is an important part of what we do when we think about problems privately.) Another is to provide a mechanism where people can get very quick feedback on their ideas — this too can be extremely stimulating and speed up the process of thought considerably. If you like the problem but don’t feel like pursuing either of the approaches I’ve outlined above, that’s of course fine — your ideas are still welcome and may well be more fruitful than those ones, which are there just to get the discussion started.

A week has passed since the LHC jamboree, but the excitement about the 750 GeV diphoton excess has not abated. So far, the scenario from 2011 repeats itself. A significant but not definitive signal is spotted in the early data set by the ATLAS and CMS experiments. This announcement is wrapped in multiple layers of caution and skepticism by experimentalists, but is universally embraced by theorists. What is unprecedented is the scale of the theorists' response, which took the form of a hep-ph tsunami. I still need time to digest this feast, and pick out the interesting bits amid the general citation fishing. So today I won't write about the specific models in which the 750 GeV particle could fit: I promise a post on that after the New Year (anyway, the short story is that, *oh my god, it could be just anybody*). Instead, I want to write about one point that was elucidated by the early papers, namely that the diphoton resonance signal is unlikely to be on its own, and there should be accompanying signals in other channels. In the best case scenario, confirmation of the diphoton signal may come by analyzing the existing data in other channels collected this year or in run-1.

First of all, there should be a **dijet** signal. Since the new particle is almost certainly produced via gluon collisions, it must be able to decay to gluons as well by time-reversing the production process. This would show up at the LHC as a pair of energetic jets with the invariant mass of 750 GeV. Moreover, in the simplest models the 750 GeV particle decays to gluons *most of the time*. The precise dijet rate is very model-dependent, and in some models it is too small to ever be observed, but typical scenarios predict order 1-10 picobarn dijet cross-sections. This would mean that thousands of such events have been produced in the LHC run-1 and this year in run-2. The plot on the right shows one example of a parameter space (green) overlaid with contours of dijet cross section (red lines) and limits from dijet resonance searches in run-1 with 8 TeV proton collisions (red area). Dijet resonance searches are routine at the LHC, however experimenters usually focus on the high-energy end of the spectrum, far above 1 TeV invariant mass. In fact, the 750 GeV region is not covered at all by the recent LHC searches at 13 TeV proton collision energy.

The next important conclusion is that there should be matching signals in other **diboson** channels at the 750 GeV invariant mass. For the 125 GeV Higgs boson, the signal was originally discovered in both the γγ and the ZZ final states, while in the WW channel the signal is currently similarly strong. If the 750 GeV particle were anything like the Higgs, the resonance should actually show up first in the ZZ and WW final states (due to the large coupling to longitudinal polarizations of vector bosons which is a characteristic feature of Higgs-like particles). From the non-observation of anything interesting in run-1 one can conclude that there must be little Higgsiness in the 750 GeV particle, less than 10%. Nevertheless, even if the particle has nothing to do with the Higgs (for example, if it's a pseudo-scalar), it should still decay to diboson final states once in a while. This is because a neutral scalar cannot couple directly to photons, and the coupling has to arise at the quantum level through some other new electrically charged particles, see the diagram above. The latter couple not only to photons but also to Z bosons, and sometimes to W bosons too. While the details of the branching fractions are highly model-dependent, diboson signals with rates comparable to the diphoton one are generically predicted. In this respect, the decays of the 750 GeV particle to one photon and one Z boson emerge as a new interesting battleground. For the 125 GeV Higgs boson, decays to Zγ have not been observed yet, but in the heavier mass range the sensitivity is apparently better. ATLAS made a search for high-mass Zγ resonances in the run-1 data, and their limits already put non-trivial constraints on some models explaining the 750 GeV excess. Amusingly, the ATLAS Zγ search has a 1 sigma excess at 730 GeV... CMS has no search in this mass range at all, and both experiments are yet to analyze the run-2 data in this channel. So, in principle, it is quite possible that we learn something interesting even before the new round of collisions starts at the LHC.

Another generic prediction is that there should be **vector-like quarks** or other new colored particles just around the corner. As mentioned above, such particles are necessary to generate an effective coupling of the 750 GeV particle to photons and gluons. In order for those couplings to be large enough to explain the observed signal, at least one of the new states should have mass below ~1.5 TeV. Limits on vector-like quarks depend on what they decay to, but the typical sensitivity in run-1 is around 800 GeV. In run-2, CMS already presented a search for a charge 5/3 quark decaying to a top quark and a W boson, and they were able to improve the run-1 limits on the new quark's mass from 800 GeV up to 950 GeV. Limits on other types of new quarks should follow shortly.

On a bit more speculative side, ATLAS claims that the best fit to the data is obtained if the 750 GeV resonance is wider than the experimental resolution. While the statistical significance of this statement is not very high, it would have profound consequences if confirmed. Large width is possible only if the 750 GeV particle decays to other final states than photons and gluons. An exciting possibility is that the large width is due to decays to a new hidden sector with new light particles very weakly or not at all coupled to the Standard Model. If these particles do not leave any trace in the detector then the signal is the same **monojet** signature as that of dark matter: an energetic jet emitted before the collision without matching activity on the other side of the detector. In fact, dark matter searches in run-1 practically exclude the possibility that the large width can be accounted for uniquely by invisible decays (see comments #2 and #13 below). However, if the new particles in the hidden sector couple weakly to the known particles, they can decay back to our sector, possibly after some delay, leading to complicated exotic signals in the detector. This is the so-called **hidden valley** scenario that my fellow blogger has been promoting for some time. If the 750 GeV particle is confirmed to have a large width, the motivation for this kind of new physics will become very strong. Many of the possible signals that one can imagine in this context are yet to be searched for.

Dijets, dibosons, monojets, vector-like quarks, hidden valley... experimentalists will have their hands full this winter. A negative result in any of these searches would not strongly disfavor the diphoton signal, but would provide important clues for model building. A positive signal would let all hell break loose, assuming it hasn't yet. So, we are waiting eagerly for further results from the LHC, which should show up around the time of the Moriond conference in March. Watch out for rumors on blogs and Twitter ;)


Last summer my students Brendan Fong and Blake Pollard visited me at the Centre for Quantum Technologies, and we figured out how to understand *open* continuous-time Markov chains! I think this is a nice step towards understanding the math of living systems.

Admittedly, it’s just a small first step. But I’m excited by this step, since Blake and I have been trying to get this stuff to work for a couple years, and it finally fell into place. And we think we know what to do next. Here’s our paper:

- John Baez, Brendan Fong and Blake S. Pollard, A compositional framework for open Markov processes.

And here’s the basic idea….

A continuous-time Markov chain is a way to specify the dynamics of a population which is spread across some finite set of states. Population can flow between the states. The larger the population of a state, the more rapidly population flows out of the state. Because of this property, under certain conditions the populations of the states tend toward an equilibrium where at any state the inflow of population is balanced by its outflow.
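To make this concrete, here is a minimal sketch of my own (the states and rates are invented for illustration): the populations obey the master equation $dp_i/dt = \sum_j H_{ij} p_j,$ where $H_{ij}$ for $i \ne j$ is the rate constant from state $j$ to state $i,$ and the diagonal entries are chosen so each column sums to zero. Integrating it numerically shows the populations settling into equilibrium while the total population stays fixed:

```python
# Minimal sketch: a closed 3-state continuous-time Markov chain.
# H[i][j] is the rate of flow from state j to state i; columns sum to zero,
# so the master equation dp_i/dt = sum_j H[i][j] * p[j] conserves population.

def step(H, p, dt):
    """One explicit-Euler step of the master equation."""
    n = len(p)
    return [p[i] + dt * sum(H[i][j] * p[j] for j in range(n))
            for i in range(n)]

# Made-up rate constants, keyed as (to, from): rate.
rates = {(0, 1): 2.0, (1, 0): 1.0, (1, 2): 1.0, (2, 1): 3.0}

n = 3
H = [[0.0] * n for _ in range(n)]
for (i, j), r in rates.items():
    H[i][j] += r      # inflow to state i from state j
    H[j][j] -= r      # matching outflow from state j

p = [1.0, 0.0, 0.0]   # all population starts in state 0
for _ in range(20000):
    p = step(H, p, 0.001)

print([round(x, 3) for x in p])   # -> [0.333, 0.167, 0.5], the equilibrium
print(round(sum(p), 6))           # -> 1.0: total population is conserved
```

Note the two properties from the text: outflow from a state is proportional to its population, and the populations tend toward an equilibrium where inflow balances outflow at every state.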

In applications to statistical mechanics, we are often interested in equilibria such that for any two states connected by an edge, say $i$ and $j,$ the flow from $i$ to $j$ equals the flow from $j$ to $i.$ A continuous-time Markov chain with a chosen equilibrium having this property is called ‘detailed balanced’.

I’m getting tired of saying ‘continuous-time Markov chain’, so from now on I’ll just say ‘Markov process’, just because it’s shorter. Okay? That will let me say the next sentence without running out of breath:

Our paper is about *open* detailed balanced Markov processes.

Here’s an example:

The detailed balanced Markov process itself consists of a finite set of states together with a finite set of edges between them, with each state $i$ labelled by an equilibrium population $q_i >0,$ and each edge $e$ labelled by a rate constant $r_e > 0.$

These populations and rate constants are required to obey an equation called the ‘detailed balance condition’. This equation means that in equilibrium, the flow from $i$ to $j$ equals the flow from $j$ to $i.$ Do you see how it works in this example?
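Concretely (again a toy example of my own): writing $H_{ij}$ for the rate constant of the edge from state $j$ to state $i,$ the equilibrium flow from $j$ to $i$ is $H_{ij} q_j,$ so detailed balance says $H_{ij} q_j = H_{ji} q_i$ for every edge. A quick check in code:

```python
# Toy check of the detailed balance condition (illustrative numbers).
# rate[(i, j)] is the rate constant for the edge from state j to state i,
# so the equilibrium flow from j to i is rate[(i, j)] * q[j].

q = [2.0, 4.0, 1.0]                    # chosen equilibrium populations
rate = {(0, 1): 1.0, (1, 0): 2.0,      # edges between states 0 and 1
        (1, 2): 2.0, (2, 1): 0.5}      # edges between states 1 and 2

def balanced(rate, q):
    """True if every edge's flow is matched by the reverse flow."""
    return all(abs(r * q[j] - rate[(j, i)] * q[i]) < 1e-12
               for (i, j), r in rate.items())

print(balanced(rate, q))               # True: each flow is matched
print(balanced(rate, [1.0, 1.0, 1.0])) # False: wrong populations break it
```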

To get an ‘open’ detailed balanced Markov process, some states are designated as inputs or outputs. In general a state may be both an input and an output, and may be designated as an input or output more than once. See how that’s happening in this example? It may seem weird, but it makes things work better.

People usually say Markov processes are all about how *probabilities* flow from one state to another. But we work with un-normalized probabilities, which we call ‘populations’, rather than probabilities that must sum to 1. The reason is that in an *open* Markov process, probability is not conserved: it can flow in or out at the inputs and outputs. We allow it to flow both in and out at both the input states and the output states.

Our most fundamental result is that there’s a category ${DetBalMark}$ where a morphism is an open detailed balanced Markov process. We think of it as a morphism from its inputs to its outputs.

We compose morphisms in ${DetBalMark}$ by identifying the output states of one open detailed balanced Markov process with the input states of another. The populations of identified states must match. For example, we may compose this morphism $N$:

with the previously shown morphism $M$ to get this morphism $M \circ N$:

And here’s our second most fundamental result: the category ${DetBalMark}$ is actually a dagger compact category. This lets us do other stuff with open Markov processes. An important one is ‘tensoring’, which lets us take two open Markov processes like $M$ and $N$ above and set them side by side, giving $M \otimes N$:

The compactness is also important. This means we can take some inputs of an open Markov process and turn them into outputs, or vice versa. For example, using the compactness of ${DetBalMark}$ we can get this open Markov process from $M$:

In fact all the categories in our paper are dagger compact categories, and all our functors preserve this structure. Dagger compact categories are a well-known framework for describing systems with inputs and outputs, so this is good.

In a detailed balanced Markov process, population can flow along edges. In the detailed balanced equilibrium, without any flow of population from outside, the flow along each edge from state $i$ to state $j$ is matched by the flow back from $j$ to $i.$ The populations need to take specific values for this to occur.

In an electrical circuit made of linear resistors, charge can flow along wires. In equilibrium, without any driving voltage from outside, the current along each wire will be zero. The potentials will be equal at every node.

This sets up an analogy between detailed balanced continuous-time Markov chains and electrical circuits made of linear resistors! I love analogy charts, so this makes me very happy:

| Circuits | Detailed balanced Markov processes |
|---|---|
| potential | population |
| current | flow |
| conductance | rate constant |
| power | dissipation |

This analogy is already well known. Schnakenberg used it in his book *Thermodynamic Network Analysis of Biological Systems*. So, our main goal is to formalize and exploit it. This analogy extends from systems in equilibrium to the more interesting case of nonequilibrium steady states, which are the main topic of our paper.

Earlier, Brendan and I introduced a way to ‘black box’ a circuit and define the relation it determines between potential-current pairs at the input and output terminals. This relation describes the circuit’s external behavior as seen by an observer who can only perform measurements at the terminals.

An important fact is that black boxing is ‘compositional’: if one builds a circuit from smaller pieces, the external behavior of the whole circuit can be determined from the external behaviors of the pieces. For category theorists, this means that black boxing is a functor!

Our new paper with Blake develops a similar ‘black box functor’ for detailed balanced Markov processes, and relates it to the earlier one for circuits.

When you black box a detailed balanced Markov process, you get the relation between population–flow pairs at the terminals. (By the ‘flow at a terminal’, we more precisely mean the net population outflow.) This relation holds not only in equilibrium, but also in any nonequilibrium steady state. Thus, *black boxing an open detailed balanced Markov process gives its steady state dynamics as seen by an observer who can only measure populations and flows at the terminals*.

At least since the work of Prigogine, it’s been widely accepted that a large class of systems minimize entropy production in a nonequilibrium steady state. But people still fight about the precise boundary of this class of systems, and even the meaning of this ‘principle of minimum entropy production’.

For detailed balanced open Markov processes, we show that a quantity we call the ‘dissipation’ is minimized in any steady state. This is a quadratic function of the populations and flows, analogous to the power dissipation of a circuit made of resistors. We make no claim that this quadratic function actually deserves to be called ‘entropy production’. Indeed, Schnakenberg has convincingly argued that they are only approximately equal.

But still, the ‘dissipation’ function is very natural and useful—and Prigogine’s so-called ‘entropy production’ is also a quadratic function.
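A toy numerical illustration (hedged: the paper gives the exact dissipation functional, which I’m skipping here; below I use the circuit-style quadratic form $\sum_{\mathrm{edges}} C_{ij}(p_i/q_i - p_j/q_j)^2,$ my own reading of the analogy, up to an overall constant): holding the boundary populations of a 3-state chain fixed and minimizing this quantity over the interior population recovers the steady state of the master equation there.

```python
# Toy check (illustrative numbers; the dissipation formula here is my own
# circuit-style stand-in, up to a constant): minimizing dissipation over
# the interior state of a 3-state chain, with boundary populations fixed,
# recovers the steady state of the master equation.
#
# Rates: 0->1 at 2.0, 1->0 at 1.0, 2->1 at 2.0, 1->2 at 0.5; these are
# detailed balanced with respect to q below, giving conductances C.

q = [2.0, 4.0, 1.0]                # equilibrium populations
C = {(0, 1): 4.0, (1, 2): 2.0}     # C_ij = H_ij * q_j (symmetric)

def dissipation(p):
    """Quadratic stand-in for dissipation: sum_e C_e * (p_i/q_i - p_j/q_j)^2."""
    return sum(c * (p[i] / q[i] - p[j] / q[j]) ** 2 for (i, j), c in C.items())

p0, p2 = 3.0, 0.5                  # boundary populations, held fixed
# Grid-scan for the interior population p1 that minimizes the dissipation.
best = min((dissipation([p0, x, p2]), x)
           for x in [k * 0.001 for k in range(10000)])
p1 = best[1]

# Steady state at the interior node: inflow 2*p0 + 2*p2 = outflow (1.0 + 0.5)*p1.
print(round(p1, 2) == round((2 * p0 + 2 * p2) / 1.5, 2))   # True
```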

I’ve already mentioned the category ${DetBalMark},$ where a morphism is an open detailed balanced Markov process. But our paper needs two more categories to tell its story! There’s the category of circuits, and the category of linear relations.

A morphism in the category ${Circ}$ is an open electrical circuit made of resistors: that is, a graph with each edge labelled by a ‘conductance’ $c_e > 0,$ together with specified input and output nodes:

A morphism in the category ${LinRel}$ is a linear relation $L : U \rightsquigarrow V$ between finite-dimensional real vector spaces $U$ and $V.$ This is nothing but a linear subspace $L \subseteq U \oplus V.$ Just as relations generalize functions, linear relations generalize linear functions!

In our previous paper, Brendan and I introduced these two categories and a functor between them, the ‘black box functor’:

$\blacksquare \colon {Circ} \to {LinRel}$

The idea is that any circuit determines a linear relation between the potentials and net current flows at the inputs and outputs. This relation describes the behavior of a circuit of resistors as seen from outside.

Our new paper introduces a black box functor for detailed balanced Markov processes:

$\square \colon {DetBalMark} \to {LinRel}$

We draw this functor as a white box merely to distinguish it from the other black box functor. The functor $\square$ maps any detailed balanced Markov process to the linear relation obeyed by populations and flows at the inputs and outputs in a steady state. In short, it describes the steady state behavior of the Markov process ‘as seen from outside’.

How do we manage to black box detailed balanced Markov processes? We do it using the analogy with circuits!

Every analogy wants to be a functor. So, we make the analogy between detailed balanced Markov processes and circuits precise by turning it into a functor:

$K : {DetBalMark} \to {Circ}$

This functor converts any open detailed balanced Markov process into an open electrical circuit made of resistors. This circuit is carefully chosen to reflect the steady-state behavior of the Markov process. Its underlying graph is the same as that of the Markov process. So, the ‘states’ of the Markov process are the same as the ‘nodes’ of the circuit.

Both the equilibrium populations at states of the Markov process and the rate constants labelling edges of the Markov process are used to compute the conductances of edges of this circuit. In the simple case where the Markov process has exactly one edge from any state $i$ to any state $j,$ the rule is this:

$C_{i j} = H_{i j} q_j$

where:

- $q_j$ is the equilibrium population of the $j$th state of the Markov process,
- $H_{i j}$ is the rate constant for the edge from the $j$th state to the $i$th state of the Markov process, and
- $C_{i j}$ is the conductance (that is, the reciprocal of the resistance) of the wire from the $j$th node to the $i$th node of the resulting circuit.

The detailed balance condition for Markov processes says precisely that the matrix $C_{i j}$ is symmetric! This is just right for an electrical circuit made of resistors, since it means that the resistance of the wire from node $i$ to node $j$ equals the resistance of the same wire in the reverse direction, from node $j$ to node $i.$
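Here is that rule in code (a toy example, with numbers chosen to satisfy detailed balance): build $C_{ij} = H_{ij} q_j$ and check that the resulting conductance matrix is symmetric.

```python
# Sketch: build the circuit's conductance matrix C[i][j] = H[i][j] * q[j]
# from a detailed balanced Markov process (illustrative numbers).

q = [2.0, 4.0, 1.0]                       # equilibrium populations
H = {(0, 1): 1.0, (1, 0): 2.0,            # H[(i, j)]: rate from state j to i,
     (1, 2): 2.0, (2, 1): 0.5}            # chosen to satisfy detailed balance

n = len(q)
C = [[H.get((i, j), 0.0) * q[j] for j in range(n)] for i in range(n)]

symmetric = all(abs(C[i][j] - C[j][i]) < 1e-12
                for i in range(n) for j in range(n))
print(symmetric)   # True: detailed balance makes C symmetric
```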

If you paid careful attention, you’ll have noticed that I’ve described a triangle of functors:

And if you’ve got the tao of category theory flowing in your veins, you’ll be wondering if this diagram commutes.

In fact, this triangle of functors does not commute! However, a general lesson of category theory is that we should only expect diagrams of functors to commute *up to natural isomorphism*, and this is what happens here:

The natural transformation $\alpha$ ‘corrects’ the black box functor for resistors to give the one for detailed balanced Markov processes.

The functors $\square$ and $\blacksquare \circ K$ are actually equal on objects. An object in ${DetBalMark}$ is a finite set $X$ with each element $i \in X$ labelled by a positive population $q_i.$ Both functors map this object to the vector space $\mathbb{R}^X \oplus \mathbb{R}^X.$ For the functor $\square,$ we think of this as a space of population-flow pairs. For the functor $\blacksquare \circ K,$ we think of it as a space of potential-current pairs. The natural transformation $\alpha$ then gives a linear relation

$\alpha_{X,q} : \mathbb{R}^X \oplus \mathbb{R}^X \rightsquigarrow \mathbb{R}^X \oplus \mathbb{R}^X$

in fact an isomorphism of vector spaces, which converts potential-current pairs into population-flow pairs in a manner that depends on the $q_i.$ I’ll skip the formula; it’s in the paper.

But here’s the key point. The naturality of $\alpha$ actually allows us to reduce the problem of computing the functor $\square$ to the problem of computing $\blacksquare.$ Suppose

$M \colon (X,q) \to (Y,r)$

is any morphism in ${DetBalMark}.$ The object $(X,q)$ is some finite set $X$ labelled by populations $q,$ and $(Y,r)$ is some finite set $Y$ labelled by populations $r.$ Then the naturality of $\alpha$ means that this square commutes:

Since $\alpha_{X,q}$ and $\alpha_{Y,r}$ are isomorphisms, we can solve for the functor $\square$ as follows:

$\square(M) = \alpha_{Y,r} \circ \blacksquare K(M) \circ \alpha_{X,q}^{-1}$

This equation has a clear intuitive meaning! It says that to compute the behavior of a detailed balanced Markov process, namely $\square(M),$ we convert it into a circuit made of resistors and compute the behavior of that, namely $\blacksquare K(M).$ This is not *equal* to the behavior of the Markov process, but we can compute that behavior by converting the input populations and flows into potentials and currents, feeding them into our circuit, and then converting the outputs back into populations and flows.
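A generic linear-algebra sketch of this recipe (all matrices below are invented for illustration; the actual $\alpha$ is in the paper, and real black-boxed behaviors are relations, not necessarily maps): if the circuit’s behavior happens to be the graph of a linear map $B,$ and $A_X,$ $A_Y$ are the changes of variables at the inputs and outputs, then the Markov behavior is the graph of $A_Y B A_X^{-1}.$

```python
# Generic sketch of "conjugate by isomorphisms" (matrices invented for
# illustration; the paper's actual alpha is skipped in this post).
# If a circuit's black-boxed behavior is the graph of a linear map B in
# potential/current variables, the corresponding Markov behavior in
# population/flow variables is A_Y B A_X^{-1}, computed here for 2x2.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

B   = [[1.0, 2.0], [0.0, 1.0]]    # made-up circuit behavior
A_X = [[2.0, 0.0], [0.0, 1.0]]    # made-up change of variables at the inputs
A_Y = [[1.0, 0.0], [0.0, 3.0]]    # made-up change of variables at the outputs

square_M = matmul(A_Y, matmul(B, inverse2(A_X)))
print(square_M)   # the behavior in the new variables
```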

So that’s a sketch of what we did, and I hope you ask questions if it’s not clear. But I also hope you read our paper! Here’s what we actually do in there. After an introduction and summary of results:

- Section 3 defines open Markov processes and the open master equation.
- Section 4 introduces detailed balance for open Markov processes.
- Section 5 recalls the principle of minimum power for open circuits made of linear resistors, and explains how to black box them.
- Section 6 introduces the principle of minimum dissipation for open detailed balanced Markov processes, and describes how to black box these.
- Section 7 states the analogy between circuits and detailed balanced Markov processes in a formal way.
- Section 8 describes how to compose open Markov processes, making them into the morphisms of a category.
- Section 9 does the same for detailed balanced Markov processes.
- Section 10 describes the ‘black box functor’ that sends any open detailed balanced Markov process to the linear relation describing its external behavior, and recalls the similar functor for circuits.
- Section 11 makes the analogy between open detailed balanced Markov processes and open circuits even more formal, by making it into a functor. We prove that together with the two black box functors, this forms a triangle that commutes up to natural isomorphism.
- Section 12 is about geometric aspects of this theory. We show that linear relations in the image of these black box functors are Lagrangian relations between symplectic vector spaces. We also show that the master equation can be seen as a gradient flow equation.
- Section 13 is a summary of what we have learned.

Finally, Appendix A is a quick tutorial on decorated cospans. This is a key mathematical tool in our work, developed by Brendan in an earlier paper.

Benoit Fresse has finished a big two-volume book on operads, which you can now see on his website:

- Benoit Fresse,
*Homotopy of Operads and Grothendieck-Teichmüller Groups*.

He writes:

The first aim of this book project is to give an overall reference, starting from scratch, on the application of methods of algebraic topology to operads. To be more specific, one of our main objectives is the development of a rational homotopy theory for operads. Most definitions, notably fundamental concepts of operad and homotopy theory, are carefully reviewed in order to make our account accessible to a broad readership, which should include graduate students, as well as researchers coming from the various fields of mathematics related to our main topics.

The second purpose of the book is to explain, from a homotopical viewpoint, a deep relationship between operads and Grothendieck-Teichmüller groups. This connection, which has been foreseen by M. Kontsevich (from researches on the deformation quantization process in mathematical physics), gives a new approach to understanding internal symmetries of structures occurring in various constructions of algebra and topology. In the book, we set up the background required by an in-depth study of this subject, and we make precise the interpretation of the Grothendieck-Teichmüller group in terms of the homotopy of operads. The book is actually organized for this ultimate objective, which readers can take either as a main motivation or as a leading example to learn about general theories.

The first volume is over 500 pages:

Contents: Introduction to the general theory of operads. Introduction to $E_n$-operads. Relationship between $E_2$-operads and (braided) monoidal categories. Applications of Hopf algebras to the Malcev completion of groups, of groupoids, and of operads in groupoids. Operadic definition of the Grothendieck-Teichmüller groups and of the set of Drinfeld’s associators. Appendices on free operads, trees and the cotriple resolution of operads.

The second volume is over 700 pages:

Contents: Introduction to general methods of the theory of model categories. The homotopy theory of modules, algebras, and the rational homotopy of spaces. The (rational) homotopy of operads. Applications of the rational homotopy theory to $E_n$-operads. Homotopy spectral sequences and the computation of homotopy automorphism spaces of operads. Applications to $E_2$-operads and the homotopy interpretation of the Grothendieck-Teichmüller group. Appendix on cofree cooperads and the Koszul duality of operads.