Planet Musings

February 13, 2016

Tim GowersFUNC3 — further strengthenings and variants

In the last post I concentrated on examples, so in this one I’ll concentrate on conjectures related to FUNC, though I may say a little about examples at the end, since a discussion has recently started about how we might go about trying to find a counterexample to FUNC.

A proposal for a rather complicated averaging argument

After the failure of the average-overlap-density conjecture, I came up with a more refined conjecture along similar lines that has one or two nice properties and has not yet been shown to be false.

The basic aim is the same: to take a union-closed family \mathcal A and use it to construct a probability measure on the ground set in such a way that the average abundance with respect to that measure is at least 1/2. With the failed conjecture the method was very basic: pick a random non-empty set A\in\mathcal A and then a random element x\in A.

The trouble with picking random elements is that it gives rise to a distribution that does not behave well when you duplicate elements. (What you would want is that the probability is shared out amongst the duplicates, but in actual fact if you duplicate an element lots of times it gives an advantage to the set of duplicates that the original element did not have.) This is not just an aesthetic concern: it was at the heart of the downfall of the conjecture. What one really wants, and this is a point that Tobias Fritz has been emphasizing, is to avoid talking about the ground set altogether, something one can do by formulating the conjecture in terms of lattices, though I’m not sure what I’m about to describe does make sense for lattices.

Let \mathcal A be a union-closed set system with ground set X. Define a chain to be a collection B_1\supset\dots\supset B_k of subsets of X with the following properties.

  1. The inclusions are strict.
  2. Each B_i is an intersection of sets in \mathcal A.
  3. B_k is non-empty, but for every A\in\mathcal A, either A\cap B_k=\emptyset or B_k\subset A.

The idea is to choose a random chain and then a random element of B_k. That last step is harmless because the elements of B_k are indistinguishable from the point of view of \mathcal A (they are all contained in the same sets). So this construction behaves itself when you duplicate elements.

What exactly is a random chain? What I suggested before was to run an algorithm like this. You start with B_1=X. Having got to B_i, let \mathcal A_i consist of all sets A\in\mathcal A such that A\cap B_i is neither empty nor B_i, pick a random set A\in\mathcal A_i, and let B_{i+1}=B_i\cap A. (The process stops when \mathcal A_i is empty, which is exactly property 3 above.) But that is not the only possibility. Another would be to define a chain to be maximal if for every i there is no set A\in\mathcal A such that A\cap B_{i-1} lies strictly between B_i and B_{i-1}, and then to pick a maximal chain uniformly at random.
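To make the first rule concrete, here is a minimal Python sketch. It is not from the post, and the names `union_closure`, `chain_measure` and `average_abundance` are made up for illustration. It assumes that B_1=X, that A is chosen uniformly from \mathcal A_i, that the process stops when \mathcal A_i is empty, and that the final probability is shared equally among the elements of B_k. The measure is computed exactly by recursing over every branch, so this is only practical for very small families.

```python
from fractions import Fraction
from itertools import combinations


def union_closure(generators):
    """Return the union-closed family generated by the given sets."""
    family = {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(family), 2):
            if (a | b) not in family:
                family.add(a | b)
                changed = True
    return family


def chain_measure(family, ground):
    """Exact probability of each ground-set element under the random-chain rule."""
    weights = {x: Fraction(0) for x in ground}

    def descend(b, prob):
        # Sets A for which A ∩ B is neither empty nor all of B.
        splitters = [a for a in family if a & b and (a & b) != b]
        if not splitters:
            # B is the end of the chain: share prob among its elements.
            for x in b:
                weights[x] += prob / len(b)
            return
        for a in splitters:
            descend(b & a, prob / len(splitters))

    descend(frozenset(ground), Fraction(1))
    return weights


def abundance(family, x):
    """Fraction of sets in the family that contain x."""
    return Fraction(sum(1 for a in family if x in a), len(family))


def average_abundance(family, ground):
    """Average abundance with respect to the random-chain measure."""
    weights = chain_measure(family, ground)
    return sum(weights[x] * abundance(family, x) for x in ground)


# Tiny example: the family {1}, {2}, {1,2} on the ground set {1, 2}.
example = union_closure([{1}, {2}])
print(average_abundance(example, {1, 2}))  # 2/3
```

The conjecture would say that the printed value is at least 1/2 for every union-closed family; a sketch like this can only test small cases, not prove anything.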

At the moment I think that the first idea is more natural and therefore more likely to work. (But “more likely” does not imply “likely”.) The fact that it seems hard to disprove is not a good reason for optimism, since the definition is sufficiently complicated that it is hard to analyse. Perhaps there is a simple example for which the conjecture fails by miles, but for which it is very hard to prove that it fails by miles (other than by checking it on a computer if the example is small enough).

Another possible idea is this. Start a random walk at X. The walk takes place on the set of subsets of X that are non-empty intersections of sets in \mathcal A. Call this set system I(\mathcal A). Then join B to B' in I(\mathcal A) if B is a proper subset of B' and there is no B''\in I(\mathcal A) that lies properly between B and B'. To be clear, I’m defining an undirected graph here, so if B is joined to B', then B' is joined to B.

Now we do a random walk on this graph by picking a random neighbour at each stage, and we take its stationary distribution. One could then condition this distribution on the set you are at being a minimal element of I(\mathcal A). This gives a distribution on the minimal elements, and then the claim would be that on average a minimal element is contained in at least half the sets in \mathcal A.
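Here is a rough Python sketch of this random-walk proposal, under some assumptions that are mine rather than the post's. It treats X itself as a member of I(\mathcal A), so that the graph is connected. It also uses the standard fact that the stationary distribution of a simple random walk on a connected undirected graph is proportional to the degree. The function names are hypothetical.

```python
from fractions import Fraction


def intersection_closure(family, ground):
    """Non-empty intersections of sets in the family, plus the ground set."""
    closed = {frozenset(ground)} | {frozenset(a) for a in family if a}
    changed = True
    while changed:
        changed = False
        for a in list(closed):
            for b in list(closed):
                c = a & b
                if c and c not in closed:
                    closed.add(c)
                    changed = True
    return closed


def cover_degrees(closed):
    """Degree of each set in the undirected graph of covering pairs."""
    degree = {b: 0 for b in closed}
    for lo in closed:
        for hi in closed:
            # lo is joined to hi if nothing in the system lies strictly between.
            if lo < hi and not any(lo < mid < hi for mid in closed):
                degree[lo] += 1
                degree[hi] += 1
    return degree


def walk_average_abundance(family, ground):
    """Average abundance of a minimal element under the conditioned walk."""
    family = [frozenset(a) for a in family]
    closed = intersection_closure(family, ground)
    degree = cover_degrees(closed)
    minimal = [b for b in closed if not any(c < b for c in closed)]

    def abundance(b):
        return Fraction(sum(1 for a in family if b <= a), len(family))

    total = sum(degree[b] for b in minimal)
    if total == 0:
        # Degenerate case: the ground set is the only node in the graph.
        return abundance(frozenset(ground))
    # Stationary weight is proportional to degree; condition on minimal sets.
    return sum(Fraction(degree[b], total) * abundance(b) for b in minimal)


print(walk_average_abundance([{1}, {2}, {1, 2}], {1, 2}))  # 2/3
```

In the tiny example the minimal elements are {1} and {2}, each of degree 1, and each is contained in two of the three sets.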

I’ll finish this section with the obvious question.

Question. Does an averaging argument with a probability distribution like one of these have the slightest chance of working? If so, how would one go about proving it?

Describing union-closed families using Horn clauses

Tobias Fritz has shared with us a very nice observation that gives another way of looking at union-closed families. It is sufficiently natural that I feel there is a good chance that it will be genuinely helpful, and not just a slightly different perspective on all the same statements.

Let X be a finite set, let x\in X and let B\subset X be a non-empty subset of X. Write (x,B) as shorthand for the condition

x\in A\implies A\cap B\ne\emptyset.

If B=\{b_1,\dots,b_k\}, then we can write this as a Horn clause

x\in A\implies b_1\in A\vee\dots\vee b_k\in A.

If (x_1,B_1),\dots,(x_r,B_r) is a collection of conditions of this kind, then we can define a set system \mathcal A to consist of all sets A that satisfy all of them. That is, for each i, if x_i\in A, then A\cap B_i\ne\emptyset.

It is very easy to check that any set system \mathcal A defined this way is union closed and contains the empty set. Conversely, given a union-closed family \mathcal A that includes the empty set, let C be a subset of X that does not belong to \mathcal A. If for every x\in C we can find A_x\in\mathcal A such that x\in A_x\subset C, then we have a contradiction, since the union of these A_x belongs to \mathcal A and is equal to C. So there must be some x such that for every A\in\mathcal A, if x\in A, then A\cap(X\setminus C)\ne\emptyset. That is, there is a condition (x,X\setminus C) that is satisfied by every A\in\mathcal A and is not satisfied by C. Taking all such conditions, we have a collection of conditions that gives rise to precisely the set system \mathcal A.
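Both directions of this argument are easy to try out on small ground sets. The following Python sketch is illustrative only, and its function names are invented here. It builds the family from a list of conditions (x,B) by brute force over all subsets, and it recovers one condition (x,X\setminus C) for each set C outside a given family, which is enough to exclude every such C.

```python
from itertools import combinations


def subsets(ground):
    """All subsets of the ground set, as frozensets, smallest first."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def family_from_conditions(ground, conditions):
    """All A such that, for each (x, B), x in A implies A meets B."""
    return {a for a in subsets(ground)
            if all(x not in a or a & frozenset(b) for x, b in conditions)}


def is_union_closed(family):
    return all((a | b) in family for a in family for b in family)


def conditions_from_family(ground, family):
    """For each C not in the family, find a condition (x, X \\ C) it violates.

    Assumes the family is union closed and contains the empty set.
    """
    ground = frozenset(ground)
    conditions = []
    for c in subsets(ground):
        if c in family:
            continue
        complement = ground - c
        for x in sorted(c):
            # The condition must hold for every set that is in the family.
            if all(x not in a or a & complement for a in family):
                conditions.append((x, complement))
                break
    return conditions


ground = {1, 2, 3}
family = family_from_conditions(ground, [(1, {2, 3})])
print(len(family), is_union_closed(family))  # 7 True
print(conditions_from_family(ground, family))
```

Here the single condition (1,\{2,3\}) rules out only the set \{1\}, and the converse construction recovers that same condition.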

As Thomas says, this is strongly reminiscent of describing a convex body not as a set of points but as an intersection of half spaces. Since that dual approach is often extremely useful, it seems very much worth bearing in mind when thinking about FUNC. At the very least, it gives us a concise way of describing some union-closed families that would be complicated to define in a more element-listing way: Tobias used it to describe one of Thomas Bloom’s examples quite concisely, for instance.

Generalizing the idea

Suppose we have a Horn-clause description of a union-closed family \mathcal A. For each x\in X, it gives us a collection of conditions that A must satisfy whenever x\in A, each of the form x_1\in A\vee\dots\vee x_k\in A. Putting all these together gives us a single condition in conjunctive normal form. This single condition is a monotone property of A, and any monotone property can arise in this way. So if we want, we can forget about Horn clauses and instead think of an arbitrary union-closed family as being defined as follows. For each x\in X, there is some monotone property P_x, and then \mathcal A consists of all sets A such that for every x\in A, the property P_x(A) holds.

To illustrate this with an example (not one that has any chance of being a counterexample to FUNC, just an example of the kind of thing one can do), we could take X=\mathbb{Z}_p (the integers mod a prime p) and take P_x to be the property “contains a subset of the form \{a,a+x,\dots,a+(r-1)x\}”. Note that this is a very concise definition, but the resulting criterion for a set A to belong to \mathcal A is not simple at all. (If you think it is, then can you exhibit for me a non-empty set A of density less than 1/2 that satisfies the condition when r=10, or prove that no such set exists? Update: I’ve now realized that this question has a fairly easy answer, given in a comment below. But describing the sets that satisfy the condition would not be simple.)
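For very small p one can simply enumerate this family. The sketch below is a brute-force illustration with invented function names, and it reads "contains a subset of the form \{a,a+x,\dots,a+(r-1)x\}" as "all of a, a+x, ..., a+(r-1)x lie in A for some a". It is exponential in p, so it says nothing useful about the case r=10.

```python
from itertools import combinations


def has_progression(a_set, x, r, p):
    """True if a_set contains a, a+x, ..., a+(r-1)x (mod p) for some a."""
    return any(all((a + j * x) % p in a_set for j in range(r))
               for a in range(p))


def progression_family(p, r):
    """All subsets A of Z_p such that P_x(A) holds for every x in A."""
    family = []
    for size in range(p + 1):
        for c in combinations(range(p), size):
            a_set = frozenset(c)
            if all(has_progression(a_set, x, r, p) for x in a_set):
                family.append(a_set)
    return family


family = progression_family(5, 2)
members = set(family)
# Union-closedness follows from monotonicity of each P_x; check it anyway.
print(all((a | b) in members for a in family for b in family))  # True
```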

Natural questions that arise

This way of looking at union-closed families also generates many special cases of FUNC that could be interesting to tackle. For example, we can take the ground set X to be some structure (above, I took a cyclic group, but one could also take, for instance, the complete graph on a set V of vertices) and restrict attention to properties P_x that are natural within that structure (where “natural” could mean something like invariant under symmetries of the structure that fix x).

Another special case that is very natural to think about is where each property P_x is a single disjunction — that is, the Horn-clause formulation in the special case where each x is on the left of exactly one Horn clause. Is FUNC true in this case? Or might this case be a good place to search for a counterexample? At the time of writing, I have no intuition at all about this question, so even heuristic arguments would be interesting.

A question of Gil Kalai

As discussed in the last post, we already know that an optimistic conjecture of Tobias Fritz, that there is always some x and a union-preserving injection from \mathcal A_{\overline x} to \mathcal A_x, is false. Gil Kalai proposed a conjecture in a similar spirit: that there is always an injection from \mathcal A_{\overline x} to \mathcal A_x such that each set in \mathcal A_{\overline x} is a subset of its image. So far, nobody (or at least nobody here) has disproved this. I tried to check whether the counterexamples to Tobias’s conjecture worked here too, and I’m fairly sure the complement-of-Steiner-system approach doesn’t work.
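Checking this conjecture for a given family and a given x is a bipartite matching problem: sets not containing x on one side, sets containing x on the other, with an edge whenever the first is a subset of the second. The Python sketch below uses a simple augmenting-path matching. The function name is invented, and this is only one way to run the check, not a reconstruction of how anyone in the discussion actually did it.

```python
def has_superset_injection(family, x):
    """True if the sets avoiding x inject into the sets containing x,
    with every set mapped to a superset of itself."""
    family = {frozenset(a) for a in family}
    without_x = [a for a in family if x not in a]
    with_x = [a for a in family if x in a]
    match = {}  # index in with_x -> index in without_x currently using it

    def try_assign(i, seen):
        # Look for an augmenting path starting from without_x[i].
        for j, target in enumerate(with_x):
            if without_x[i] <= target and j not in seen:
                seen.add(j)
                if j not in match or try_assign(match[j], seen):
                    match[j] = i
                    return True
        return False

    return all(try_assign(i, set()) for i in range(len(without_x)))


family = [set(), {2}, {1, 2}]
print(has_superset_injection(family, 1))  # False: two sets, one target
print(has_superset_injection(family, 2))  # True
```

The conjecture asks only that some x works, so one would loop over all elements of the ground set.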

While the general belief seems to be (at least if we believe Jeff Kahn) that such strengthenings are false, it would be very good to confirm this. Of course it would be even better to prove the strengthening …

Update: Alec Edgington has now found a counterexample.

A question of Tom Eccles

In this comment Tom Eccles asked a question motivated by thinking about what an inductive proof of FUNC could possibly look like. The question ought to be simpler than FUNC, and asks the following. Does there exist a union-closed family \mathcal A and an element x\in X with the following three properties?

  1. x has abundance less than 1/2.
  2. No element has abundance greater than or equal to 1/2 in both \mathcal A_x and \mathcal A_{\overline x}.
  3. Both \mathcal A_x and \mathcal A_{\overline x} contain at least one non-empty set.
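The three properties are easy to test mechanically for a candidate family and element. The following Python sketch uses hypothetical names and assumes that abundance within \mathcal A_x and \mathcal A_{\overline x} is measured relative to the size of each subfamily.

```python
from fractions import Fraction


def abundance(family, y):
    """Fraction of sets in the family that contain y."""
    return Fraction(sum(1 for a in family if y in a), len(family))


def eccles_properties(family, ground, x):
    """Return (property 1, property 2, property 3) for the element x."""
    family = [frozenset(a) for a in family]
    with_x = [a for a in family if x in a]
    without_x = [a for a in family if x not in a]
    half = Fraction(1, 2)

    p1 = abundance(family, x) < half
    if with_x and without_x:
        p2 = not any(abundance(with_x, y) >= half
                     and abundance(without_x, y) >= half
                     for y in ground)
    else:
        p2 = False  # abundance is undefined in an empty subfamily
    # A frozenset is truthy exactly when it is non-empty.
    p3 = any(with_x) and any(without_x)
    return p1, p2, p3


print(eccles_properties([set(), {2}, {1, 2}], {1, 2}, 1))
# (True, False, True): element 2 is abundant in both subfamilies
```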

It would be very nice to have such an example, because it would make an excellent test case for proposed inductive approaches.

There’s probably plenty more I could extract from the comment thread in the last post, but I think it’s time to post this, since the number of comments has exceeded 100.

While I’m saying that, let me add a general remark that if anyone thinks that a direction of discussion is being wrongly neglected, then please feel free to highlight it, even (or perhaps especially) if it is a direction that you yourself introduced. These posts are based on what happens to have caught my attention, but should not be interpreted as a careful judgment of what is interesting and what is not. I hope that everything I include is interesting, but the converse is completely false.

Chad Orzel162/366: The Love Fishtank

SteelyKid was sent home on Wednesday with strep throat, and so needed to be home Thursday as well. She was very disappointed to be missing school, as her class was preparing for a Valentine’s Day party on Friday. I picked up a bunch of work for her, including a heart-shaped paper pouch to hold the cards the kids would exchange.

You wouldn’t’ve known she was officially sick on Thursday – her energy level was basically at normal. She powered through a bunch of homework, and then set to decorating the pouch with hearts and happy stick figures in a rainbow of colors. And also a fishtank:

Detail from SteelyKid’s heart-shaped pouch for Valentine’s Day cards.

I’m not sure why love has a fishtank, but I’m glad to see she’s not confined by conventional Valentine iconography…

More seriously, her teacher also had the kids write a note to each of their classmates saying something nice about them. SteelyKid had finished all but one of these before she was sent home, so I mostly don’t know what she wrote, but the booklet of messages to her from other kids is ridiculously cute and charming. She’s apparently widely regarded as funny and “wise” (a sort of odd word choice, which must’ve come from something they read as a class), and about a third of them mention the “Zoom Day” when she donned sparring gear and let all her classmates punch her in the chest. As a physicist, I was also happy to see that one classmate wrote “I hope you can teach me your math tricks”…

It was a really sweet class project, and must’ve taken really careful management by the teacher. But it came out great, so I continue to be very impressed with her school.

John BaezThe Quagga

The quagga was a subspecies of zebra found only in South Africa’s Western Cape region. After the Dutch invaded, they hunted the quagga to extinction. While some were taken to zoos in Europe, breeding programs failed. The last wild quagga died in 1878, and the very last quagga died in an Amsterdam zoo in 1883.

Only one was ever photographed—the mare shown above, in London. Only 23 stuffed and mounted quagga specimens exist. There was one more, but it was destroyed in Königsberg, Germany, during World War II. There is also a mounted head and neck, a foot, 7 complete skeletons, and samples of various tissues.

The quagga was the first extinct animal to have its DNA analyzed. It used to be thought that the quagga was a distinct species from the zebra. After some argument, a genetic study published in 2005 convinced most people that the quagga is a subspecies of the zebra. It showed that the quagga diverged from the other zebra subspecies only between 120,000 and 290,000 years ago, during the Pleistocene.

In 1987, a natural historian named Reinhold Rau started the Quagga Project. His goal was to breed zebras into quaggas by selecting for quagga-like traits, most notably the lack of stripes on the back half of the body.

The founding population consisted of 19 zebras from Namibia and South Africa, chosen because they had reduced striping on the rear body and legs. The first foal was born in 1988.

By now, members of the Quagga Project believe they have recreated the quagga. Here they are:

Rau-quagga (zebra subspecies)

The new quaggas are called ‘rau–quaggas’ to distinguish them from the original ones. Do they look the same as the originals? It’s hard for me to decide. Old paintings show quite a bit of variability:

This is an 1804 illustration by Samuel Daniell, which served as the basis of a claimed subspecies of quagga, Equus quagga danielli. Perhaps they just have variable coloring.

Why try to resurrect the quagga? Rau is no longer alive, but Eric Harley, a retired professor of chemical pathology at the University of Cape Town, had this to say:

It’s an attempt to try and repair ecological damage that was done a long time ago in some sort of small way. It is also to try and get a representation back of a charismatic animal that used to live in South Africa.

We don’t do genetic engineering, we aren’t cloning, we aren’t doing any particularly clever sort of embryo transfers—it is a very simple project of selective breeding. If it had been a different species the whole project would have been unjustifiable.

The current Quagga Project chairman, Mike Gregor, has this to say:

I think there is controversy with all programmes like this. There is no way that all scientists are going to agree that this is the right way to go. We are a bunch of enthusiastic people trying to do something to replace something that we messed up many years ago.

What we’re not doing is selecting some fancy funny colour variety of zebra, as is taking place in other areas, where funny mutations have taken place with strange colouring which may look amusing but is rather frowned upon in conservation circles.

What we are trying to do is get sufficient animals—ideally get a herd of up to 50 full-blown rau-quaggas in one locality, breeding together, and then we would have a herd we could say at the very least represents the original quagga.

We obviously want to keep them separate from other populations of plains zebra otherwise we simply mix them up again and lose the characteristic appearance.

The quotes are from here:

• Lawrence Bartlett, South Africa revives ‘extinct’ zebra subspecies,, 12 February 2016.

This project is an example of ‘resurrection biology’, or ‘de-extinction’:

• Wikipedia, De-extinction.

Needless to say, it’s a controversial idea.

February 12, 2016

Scott AaronsonThe universe has a high (but not infinite) Sleep Number

As everyone knows, this was a momentous week in the history of science.  And I don’t need to tell you why: the STOC and CCC accepted paper lists finally came out.

Haha, kidding!  I meant, we learned this week that gravitational waves were directly detected for the first time, a hundred years after Einstein first predicted them (he then reneged on the prediction, then reinstated it, then reneged again, then reinstated it a second time—see Daniel Kennefick’s article for some of the fascinating story).

By now, we all know some of the basic parameters here: a merger of two black holes, ~1.3 billion light-years away, weighing ~36 and ~29 solar masses respectively, which (when they merged) gave off 3 solar masses’ worth of energy in the form of gravitational waves, in those brief 0.2 seconds radiating more watts of power than all the stars in the observable universe combined.  By the time the waves reached earth, they were only stretching and compressing space by 1 part in 4×10^{21}, thus changing the lengths of the 4-kilometer arms of LIGO by 10^{-18} meters (1/1000 the diameter of a proton).  But this was detected, in possibly the highest-precision measurement ever made.

As I read the historic news, there was one question that kept gnawing at me: how close would you need to have been to the merging black holes before you could, you know, feel the distortion of space?  I made a guess, assuming the strength of gravitational waves fell off with distance as 1/r^2.  Then I checked Wikipedia and learned that the strength falls off only as 1/r, which completely changes the situation, and implies that the answer to my question is: you’d need to be very close.  Even if you were only as far from the black-hole cataclysm as the earth is from the sun, I get that you’d be stretched and squished by a mere ~50 nanometers (this interview with Jennifer Ouellette and Amber Stuver says 165 nanometers, but as a theoretical computer scientist, I try not to sweat factors of 3).  Even if you were 3000 miles from the black holes (New-York/LA distance) I get that the gravitational waves would only stretch and squish you by around a millimeter.  Would you feel that?  Not sure.  At 300 miles, it would be maybe a centimeter, though presumably the linearized approximation is breaking down by that point.  (See also this Physics StackExchange answer, which reaches similar conclusions, though again off from mine by factors of 3 or 4.)  Now, the black holes themselves were orbiting about 200 miles from each other before they merged.  So, the distance at which you could safely feel their gravitational waves isn’t too far from the distance at which they’d rip you to shreds and swallow you!
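For anyone who wants to redo the arithmetic, here is a back-of-the-envelope Python sketch of the 1/r scaling. The strain at earth and the source distance are the figures quoted above. The 2-meter body length is my assumption, the function names are invented, and the linear scaling is only a rough guide at the smallest distances.

```python
LIGHT_YEAR_M = 9.4607e15      # meters per light-year
AU_M = 1.495978707e11         # earth-sun distance in meters
MILE_M = 1609.344             # meters per mile

SOURCE_DISTANCE_M = 1.3e9 * LIGHT_YEAR_M  # ~1.3 billion light-years
STRAIN_AT_EARTH = 1 / 4e21                # 1 part in 4×10^21
BODY_LENGTH_M = 2.0                       # assumed size of the observer


def strain_at(distance_m):
    """Strain at a given distance from the source, assuming 1/r falloff."""
    return STRAIN_AT_EARTH * SOURCE_DISTANCE_M / distance_m


def stretch_m(distance_m, length_m=BODY_LENGTH_M):
    """Change in length of an object of the given size, in meters."""
    return strain_at(distance_m) * length_m


print(stretch_m(SOURCE_DISTANCE_M, 4000.0))  # LIGO arm: about 1e-18 m
print(stretch_m(AU_M))                       # tens of nanometers
print(stretch_m(3000 * MILE_M))              # about a millimeter
print(stretch_m(300 * MILE_M))               # about a centimeter
```

With these inputs the numbers land within the same factor of a few as the estimates in the text.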

In summary, to stretch and squeeze spacetime by just a few hundred nanometers per meter, along the surface of a sphere whose radius equals our orbit around the sun, requires more watts of power than all the stars in the observable universe give off as starlight.  People often say that the message of general relativity is that matter bends spacetime “as if it were a mattress.”  But they should add that the reason it took so long for humans to notice this, is that it’s a really friggin’ firm mattress, one that you need to bounce up and down on unbelievably hard before it quivers, and would probably never want to sleep on.

As if I needed to say it, this post is an invitation for experts to correct whatever I got wrong.  Public humiliation, I’ve found, is a very fast and effective way to learn an unfamiliar field.

Chad Orzel161/366: Mobile Office

Some time back, I posted a photo of my usual spot at the Starbucks in Niskayuna. When I was in Newport News earlier this week, of course, I had to find a different space from which to rant about Twitter. Here’s that spot:

My “office” when I was in Virginia.

As you can see, the principal difference between the two is that I didn’t bring my stainless-steel travel mug with me on the trip. The store in Newport News is laid out almost exactly the same way as my regular one in Niskayuna.

I’m probably more amused by this than any of my readers will be, but then, this was the only remotely photo-worthy thing in a long annoying day of air travel on Wednesday. So.

ResonaancesLIGO: what's in it for us?

I mean us theoretical particle physicists. With this constraint, the short answer is not much. Of course, every human being must experience shock and awe when picturing the phenomenon observed by LIGO. Two black holes spiraling into each other and merging in a cataclysmic event which releases energy equivalent to 3 solar masses within a fraction of a second... Besides, one can only admire the ingenuity that allows us to detect here on Earth a disturbance of the gravitational field created in a galaxy 1.3 billion light years away. In more practical terms, the LIGO announcement starts the era of gravitational wave astronomy and thus opens a new window on the universe. In particular, LIGO's discovery is the first ever observation of a black hole binary, and we should soon learn more about the ubiquity of astrophysical systems containing one or more black holes. Furthermore, it is possible that we will discover completely new objects whose existence we don't even suspect. Still, all of the above is what I fondly call dirty astrophysics on this blog, and it does not touch upon any fundamental issues. What are the prospects for learning something new about those?

In the long run, I think we can be cautiously optimistic. While we haven't learned anything unexpected from today's LIGO announcement, progress in gravitational wave astronomy should eventually teach us something about fundamental physics. First of all, advances in astronomy, inevitably brought by this new experimental technique, will allow us to better measure the basic parameters of the universe. This in turn will give us information about aspects of fundamental physics that can affect the entire universe, such as dark energy. Moreover, by observing phenomena that occur in strong gravitational fields and whose signals propagate over large distances, we can place constraints on modifications of Einstein gravity such as the graviton mass (on the downside, often there is no consistent alternative theory that can be constrained).

Closer to our hearts, one potential source of gravitational waves is a strongly first-order phase transition. Such an event may have occurred as the early universe was cooling down. Below a certain critical temperature a symmetric phase of the high-energy theory may no longer be energetically preferred, and the universe enters a new phase where the symmetry is broken. If the transition is violent (strongly first-order in the physics jargon), bubbles of the new phase emerge, expand, and collide, until they fill the entire visible universe. Such a dramatic event produces gravitational waves with an amplitude that may be observable by future experiments. Two examples of phase transitions we suspect to have occurred are the QCD phase transition around T=100 MeV, and the electroweak phase transition around T=100 GeV. The Standard Model predicts that neither is first order; however, new physics beyond the Standard Model may change that conclusion. Many examples of new physics that would modify the electroweak phase transition have been proposed, for example models with additional Higgs scalars, or with warped extra dimensions. Moreover, the phase transition could be related to symmetry breaking in a hidden sector that is very weakly or not at all coupled (except via gravity) to ordinary matter. Therefore, by observing or putting limits on phase transitions in the early universe we will obtain complementary information about the fundamental theory at high energies.

Gravitational waves from phase transitions are typically predicted to peak at frequencies much lower than those probed by LIGO (35 to 250 Hz). The next generation of gravitational telescopes will be better equipped to detect such a signal thanks to a much larger arm-length (see figure borrowed from here). This concerns especially the eLISA space interferometer which will probe millihertz frequencies. Even lower frequencies can be probed by pulsar timing arrays, which search for signals of gravitational waves using stable pulsars as an antenna. The worry is that the interesting signal may be obscured by astrophysical backgrounds, such as (oh horror) gravitational wave emission from white dwarf binaries. Another interesting goal for future experiments is to detect gravitational waves from inflation (almost discovered 2 years ago via another method by the BICEP collaboration). However, given the constraints from the CMB observations, the inflation signal may well be too weak even for the future giant space interferometers like DECIGO or BBO.

To summarize, the importance of the LIGO discovery for the field  of particle physics is mostly the boost it gives to  further experimental efforts in this direction.  Hopefully, the eLISA project will now take off, and other ideas will emerge. Once gravitational wave experiments become sensitive to sub-Hertz frequencies, they will start probing the parameter space of interesting theories beyond the Standard Model.  

Thanks YouTube! It's the first time I've seen a webcast of a highly anticipated event running smoothly in spite of 100,000 viewers. This can be contrasted with Physical Review Letters, which struggled to make one damn pdf file accessible ;)

David Hoggfollowing up GW150914; new disrupting cluster

The day started with Dun Wang, Steven Mohammed, David Schiminovich, and I meeting to discuss GALEX projects. Of course instead we brain-stormed projects we could do around the LIGO discovery of gravitational radiation. So many ideas! Rates, counterparts, and re-analysis of the raw data emerged as early leaders in the brain-storming session.

Adrian Price-Whelan crashed the party and showed me evidence he has of a disrupting globular cluster. Not many are known! So then we dropped everything and spent the day getting membership probabilities for stars in the field. The astrophysical innovation is that Price-Whelan found this candidate on theoretical grounds: What Milky Way clusters are most likely to be disrupting? The methodological innovation is that we figured out a way to do membership likelihoods without an isochrone model: We are completely data-driven! We fired a huge job into the NSF supercomputer Stampede. Holy crap, that computer is huge.

Clifford JohnsonNews from the Front, XII: Simplicity

Ok, I promised to explain the staircase I put up on Monday. I noticed something rather nice recently, and reported it (actually, two things) in a recent paper, here. It concerns those things I called "Holographic Heat Engines" which I introduced in a paper two years ago, and which I described in some detail in a previous post. You can go to that post in order to learn the details - there's no point repeating it all again - but in short the context is an extension of gravitational thermodynamics where the cosmological constant is dynamical, therefore supplying a meaning to the pressure and the volume variables (p,V) that are normally missing in black hole thermodynamics... Once you have those, it seems obvious that you can start considering processes that do mechanical work (from the pdV term in the first law) and within a short while the idea of heat engines in which the black hole is the working substance comes along. Positive pressure corresponds to negative cosmological constant and so the term "holographic heat engines" is explained. (At least to those who know about holographic dualities.)

So you have a (p,V) plane, some heat flows, and an equation of state determined by the species of (asymptotically AdS) black hole you are working with. It's like discovering a whole new family of fluids for which I know the equation of state (often exactly) and now I get to work out the properties of the heat engines I can define with them. That's what this is.

Now, I suspect that this whole business is an answer waiting for a question. I can't tell you what the question is. One place to look might be in the space of field theories that have such black holes as their holographic dual, but I'm the first to admit that [...]

The post News from the Front, XII: Simplicity appeared first on Asymptotia.

Steinn SigurðssonLIGO: Useful Things

LIGO and allies have also provided a bunch of fun useful stuff:

Have We Detected Gravitational Waves Yet?

Stretch and Squash

Black Hole Hunter


Gravitational Waves 101 – Markus Pössel’s excellent visualizations.

The Data

The Papers

SXS – Simulating eXtreme Spacetimes visualizations

The Chirp – courtesy of Georgia Tech GR group

Steinn SigurðssonLIGO: Listening to the Universe with Gravitational Waves

In 2011 Daniel Holz gave a Heinz R. Pagels Public Lecture at the Aspen Center for Physics on the topic of Gravitational Waves.

The talk is one of the better explanations of what this is all about, with a bonus introduction!

Steinn SigurðssonLIGO explains

February 11th was a good day.

I spent the day at the “Dynamics and accretion at the Galactic Center” Conference at the Aspen Center for Physics, where about 75 physicists have spent the week talking about black holes and stuff.
This morning we watched the LIGO press conference, frantically deciphered the papers, and had a LIGO Science Collaboration member give us a very good rundown of what the situation is.

Over the last few months, LIGO has been making some very good outreach material to explain what is what:

LIGO: A Passion for Understanding a film by Kai Staats

LIGO: Generations

LIGO: PhD Comics!

David Hogggravitational waves; Dr Moyses

The day started at Columbia, where many hundreds of people showed up to listen to the announcement from the LIGO project. As expected, they announced the detection of a gravitational inspiral, merger, and ringdown from a pair of 30-ish solar-mass black holes. Incredible. The signal is so clear, you can just see it directly in the data stream. There was lots of great discussion after the press conference, led by Imre Bartos (Columbia), who did a great job of fielding questions. I asked about the large masses (larger than naively expected), and about the cosmological-constraint implications. David Schiminovich asked about the event rate, which looks high (especially because we all believe they have more inspirals in the data stream). Adrian Price-Whelan asked about the Earth-bound noise sources. And so on. It was a great party, and it is a great accomplishment for a very impressive experiment. And there will be much more, we all very much hope.

The crowd at Columbia for the LIGO press release.

In the afternoon, I had the pleasure of serving on the committee of Henrique Moyses (NYU), who successfully defended a PhD on microscopic particles subject to non-conservative forces (and a lot of thermal noise). He has beautiful theoretical explanations for non-trivial experimental results on particles that are thermophoretic (are subject to forces caused by temperature gradients). Interestingly, the thermophoretic mechanisms are not well understood, but that didn't stop Moyses from developing a good predictive theory. Moyses made interesting comments on biological systems; it appears that driven, microscopic, fluctuating systems collectively work together to make our bodies move and work. That's incredible, and shows just how important this kind of work is.

February 11, 2016

John PreskillLIGO: Playing the long game, and winning big!

Wow. What a day! And what a story!

Kip Thorne in 1972, around the time MTW was completed.


It is hard for me to believe, but I have been on the Caltech faculty for nearly a third of a century. And when I arrived in 1983, interferometric detection of gravitational waves was already a hot topic of discussion here. At Kip Thorne’s urging, Ron Drever had been recruited to Caltech and was building the 40-meter prototype interferometer (which is still operating as a testbed for future detection technologies). Kip and his colleagues, spurred by Vladimir Braginsky’s insights, had for several years been actively studying the fundamental limits of quantum measurement precision, and how these might impact the search for gravitational waves.

I decided to bone up a bit on the subject, so naturally I pulled down from my shelf the “telephone book” (Misner, Thorne, and Wheeler’s mammoth Gravitation) and browsed Chapter 37 (Detection of Gravitational Waves), for which Kip had been the lead author. The chapter brimmed over with enthusiasm for the subject, but to my surprise interferometers were hardly mentioned. Instead the emphasis was on mechanical bar detectors. These had been pioneered by Joseph Weber, whose efforts in the 1960s had first aroused Kip’s interest in detecting gravitational waves, and by Braginsky.

I sought Kip out for an explanation, and with characteristic clarity and patience he told how his views had evolved. He had realized in the 1970s that a strain sensitivity of order 10^{-21} would be needed for a good chance at detection, and after many discussions with colleagues like Drever, Braginsky, and Rai Weiss, he had decided that kind of sensitivity would not be achievable with foreseeable technology using bars.


Ron Drever, who built Caltech’s 40-meter prototype interferometer in the 1980s.

We talked about what would be needed — a kilometer scale detector capable of sensing displacements of 10^{-18} meters. I laughed. As he had many times by then, Kip told why this goal was not completely crazy, if there is enough light in an interferometer, which bounces back and forth many times as a waveform passes. Immediately after the discussion ended I went to my desk and did some crude calculations. The numbers kind of worked, but I shook my head, unconvinced. This was going to be a huge undertaking. Success seemed unlikely. Poor Kip!

I’ve never been involved in LIGO, but Kip and I remained friends, and every now and then he would give me the inside scoop on the latest developments (most memorably while walking the streets of London for hours on a beautiful spring evening in 1991). From afar I followed the forced partnership between Caltech and MIT that was forged in the 1980s, and the painful transition from a small project under the leadership of Drever-Thorne-Weiss (great scientists but lacking much needed management expertise) to a large collaboration under a succession of strong leaders, all based at Caltech.


Vladimir Braginsky, who realized that quantum effects limit the sensitivity of  gravitational wave detectors.

During 1994-95, I co-chaired a committee formulating a long-range plan for Caltech physics, and we spent more time talking about LIGO than any other issue. Part of our concern was whether a small institution like Caltech could absorb such a large project, which was growing explosively and straining Institute resources. And we also worried about whether LIGO would ultimately succeed. But our biggest worry of all was different — could Caltech remain at the forefront of gravitational wave research so that if and when LIGO hit paydirt we would reap the scientific benefits?

A lot has changed since then. After searching for years we made two crucial new faculty appointments: theorist Yanbei Chen (2007), who provided seminal ideas for improving sensitivity, and experimentalist Rana Adhikari (2006), a magician at the black art of making an interferometer really work. Alan Weinstein transitioned from high energy physics to become a leader of LIGO data analysis. We established a world-class numerical relativity group, now led by Mark Scheel. Staff scientists like Stan Whitcomb also had an essential role, as did longtime Project Manager Gary Sanders. LIGO Directors Robbie Vogt, Barry Barish, Jay Marx, and now Dave Reitze have provided effective and much needed leadership.


Rai Weiss, around the time he conceived LIGO in an amazing 1972 paper.

My closest connection to LIGO arose during the 1998-99 academic year, when Kip asked me to participate in a “QND reading group” he organized. (QND stands for Quantum Non-Demolition, Braginsky’s term for measurements that surpass the naïve quantum limits on measurement precision.) At that time we envisioned that Advanced LIGO would turn on in 2008, yet there were still many questions about how it would achieve the sensitivity required to ensure detection. I took part enthusiastically, and learned a lot, but never contributed any ideas of enduring value. The discussions that year did have positive outcomes, however; leading for example to a seminal paper by Kimble, Levin, Matsko, Thorne, and Vyatchanin on improving precision through squeezing of light. By the end of the year I had gained a much better appreciation of the strength of the LIGO team, and had accepted that Advanced LIGO might actually work!

I once asked Vladimir Braginsky why he spent years working on bar detectors for gravitational waves, while at the same time realizing that fundamental limits on quantum measurement would make successful detection very unlikely. Why wasn’t he trying to build an interferometer already in the 1970s? Braginsky loved to be asked questions like this, and his answer was a long story, told with many dramatic flourishes. The short answer is that he viewed interferometric detection of gravitational waves as too ambitious. A bar detector was something he could build in his lab, while an interferometer of the appropriate scale would be a long-term project involving a much larger, technically diverse team.


Joe Weber, whose audacious belief that gravitational waves are detectable on earth inspired Kip Thorne and many others.

Kip’s chapter in MTW ends with section 37.10 (“Looking toward the future”) which concludes with this juicy quote (written almost 45 years ago):

“The technical difficulties to be surmounted in constructing such detectors are enormous. But physicists are ingenious; and with the impetus provided by Joseph Weber’s pioneering work, and with the support of a broad lay public sincerely interested in pioneering in science, all obstacles will surely be overcome.”

That’s what we call vision, folks. You might also call it cockeyed optimism, but without optimism great things would never happen.

Optimism alone is not enough. For something like the detection of gravitational waves, we needed technical ingenuity, wise leadership, lots and lots of persistence, the will to overcome adversity, and ultimately the efforts of hundreds of hard working, talented scientists and engineers. Not to mention the courage displayed by the National Science Foundation in supporting such a risky project for decades.

I have never been prouder than I am today to be part of the Caltech family.

Alexey Petrov“Ladies and gentlemen, we have detected gravitational waves.”

The title says it all. Today, the Laser Interferometer Gravitational-Wave Observatory (or simply LIGO) collaboration announced the detection of gravitational waves coming from the merger of two black holes located somewhere in the Southern sky, in the direction of the Magellanic Clouds. In the presentation, organized by the National Science Foundation, David Reitze (Caltech), Gabriela González (Louisiana State), Rainer Weiss (MIT), and Kip Thorne (Caltech) announced to a room full of reporters, and to thousands of scientists worldwide via video feeds, that they had seen a gravitational wave event. Their paper, along with a nice explanation of the result, can be seen here.


The data that they have is rather remarkable. The event, which occurred on 14 September 2015, was seen by both sites of the experiment (Livingston and Hanford), as can be seen in the picture taken from their presentation. It likely happened over a billion years ago (1.3 billion light years away) and is consistent with the merger of two black holes, of 29 and 36 solar masses. The resulting larger black hole’s mass is about 62 solar masses, which means that about 3 solar masses of energy (29+36-62=3) was radiated in the form of gravitational waves. This is a huge amount of energy! The shape of the signal is exactly what one should expect from the merging of two black holes, with 5.1 sigma significance.
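To put the "3 solar masses of energy" in everyday units, here is a minimal sketch of the bookkeeping, E = (m1 + m2 - m_final) c^2. The solar mass and speed of light are standard constants that are not quoted in the post, and the function name is purely illustrative.

```python
# Rough energy bookkeeping for the merger, using the masses quoted in the post.
M_SUN_KG = 1.989e30    # solar mass in kg (standard value, assumed here)
C_M_PER_S = 2.998e8    # speed of light in m/s (standard value, assumed here)


def radiated_energy_joules(m1_msun, m2_msun, m_final_msun):
    """Energy carried off by gravitational waves: (m1 + m2 - m_final) * c^2."""
    delta_m_kg = (m1_msun + m2_msun - m_final_msun) * M_SUN_KG
    return delta_m_kg * C_M_PER_S ** 2


energy = radiated_energy_joules(29, 36, 62)
print(f"Radiated energy: {energy:.2e} J")
```

With these inputs the result is a little over 5 x 10^47 joules. The quoted masses carry sizeable uncertainties, so this is an order-of-magnitude figure only.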

It is interesting to note that the information presented today totally confirms the rumors that have been floating around for a couple of months. Physicists like to spread rumors, as it seems.

Since gravitational waves are quadrupolar, the most straightforward way to see them is to measure the relative stretching of the detector's two arms, which are perpendicular to each other (see another picture from the MIT LIGO site, showing gravitational waves from black holes falling onto each other and then merging). The LIGO device is a marvel of engineering: one needs to detect a signal that is very small, approximately the size of a nucleus on the length scale of the experiment. This is done with the help of interferometry, where laser beams bounce through the arms of the experiment and are then compared to each other. The small change of phase of the beams can be related to the change of the relative distance traveled by each beam. This difference is induced by the passing gravitational wave, which contracts one of the arms and extends the other. How noise that can mimic a gravitational wave signal is eliminated should be the subject of another blog post.

This is really a remarkable result, even though it has been widely expected since Hulse and Taylor's (indirect) discovery with the binary pulsar in 1974! It seems that now we have another way to study the Universe.

Dirac Sea ShoreWhoop!

That is the sound of the gravitational waves hitting the LIGO detector. A chirp.

That is also the sound of the celebratory hurrahs from the gravity community. We finally have experimental (observational) confirmation of a prediction made by Einstein’s theory of general relativity 100 years ago.

The quest to hear gravitational waves was started about 50 years ago by Weber, and it is only now that enough sensitivity is available in the detectors to be able to hear the ripples of spacetime as they pass through the earth.

The particular event in question turned the equivalent of 3 solar masses into gravitational waves in a few seconds. This is much brighter in power than the brightest supernova. Remember that when a supernova collapses, the light emitted from it gets trapped in the shells of ejected matter, and the rise of the signal and afterglow is extended to months. This was brighter in power than the combined output of all the stars in the visible universe! The event of Sep. 14, 2015 recorded the merger of two black holes of intermediate masses (about 30 solar masses each) about 1.3 billion light years away.

The official press release is here, the PRL paper is here.

The New York Times has a nice movie and article to mark this momentous scientific breakthrough.

Congratulations to the LIGO team.


Usual caveat: Now we wait for confirmation, yadda yadda.







Clifford JohnsonWhat Fantastic News!

This is an amazing day for humanity! Notice I said humanity, not science, not physics - humanity. The LIGO experiment has announced the direct detection of gravitational waves (actual ripples in spacetime itself!!), opening a whole new window with which to see and understand the universe. This is equivalent to Galileo first pointing a telescope at the sky and beginning to see things like the moons of Jupiter and the phases of Venus for the first time. Look how much we learned following from that... so we've a lot to look forward to. It is 100 years since gravitational waves were predicted, and we've now seen them directly for the first time!

Actually, more has been discovered in this announcement: the signal came from the merger of two large (stellar) black holes, and so this is also the first direct confirmation of such black holes' existence! (We've known about them [...]


BackreactionEverything you need to know about gravitational waves

Last year in September, upgrades of the gravitational wave interferometer LIGO were completed. The experiment – now named Advanced LIGO – searches for gravitational waves emitted in the merger of two black holes. Such a merger signal should fall straight into Advanced LIGO’s reach.

Estimated gravitational wave spectrum. [Image Source]

It was thus expected that the upgraded experiment either sees something immediately, or we’ve gotten something terribly wrong. And indeed, rumors about a positive detection started to appear almost immediately after the upgrade. But it wasn’t until this week that the LIGO collaboration announced several press-conferences in the USA and Europe, scheduled for tomorrow, Thursday Feb 11, at 3:30pm GMT. So something big is going to hit the headlines tomorrow, and here are the essentials that you need to know.

Gravitational waves are periodic distortions of space-time. They alter distance ratios for orthogonal directions. An interferometer works by using lasers to measure and compare orthogonal distances very precisely, thus it picks up even the tiniest space-time deformations.

Moving masses produce gravitational waves much like moving charges create electromagnetic waves. The most relevant differences between the two cases are
  1. Electromagnetic waves travel in space-time, whereas gravitational waves are a disturbance of space-time itself.
  2. Electromagnetic waves have spin 1, gravitational waves have spin 2. The spin counts how much you have to rotate the wave for it to come back onto itself. For the electromagnetic fields that’s one full rotation, for the gravitational field it’s only half a rotation. [Image Credit: David Abergel]
  3. The dominant electromagnetic emission comes from the dipole moment (normally used e.g. for transmitter antennae), but gravitational waves have no dipole moment (a consequence of momentum conservation). It’s instead the quadrupole emission that is leading.
If you keep these differences in mind, you can understand gravitational waves in much the same way as electromagnetic waves. They can exist at any wavelength. They move at the speed of light. How many there are at a given wavelength depends on how many processes there are to produce them. The known processes give rise to the distribution in the graphic above. A gravitational wave detector is basically an antenna tuned in to a particularly promising frequency.
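The statement about spin and rotations can be checked with a toy model. The sketch below assumes an idealised wave pattern whose amplitude varies as cos(s·φ) with the angle φ around the direction of travel, where s is the spin; this is an illustration of the symmetry only, not a model of a real field, and the function names are made up for the example.

```python
import math


def pattern(angle_rad, spin):
    """Amplitude of an idealised spin-s pattern at a given angle around the propagation axis."""
    return math.cos(spin * angle_rad)


def smallest_symmetry_rotation_deg(spin):
    """Smallest rotation that maps the idealised pattern onto itself."""
    return 360.0 / spin


for spin, label in ((1, "electromagnetic"), (2, "gravitational")):
    rotation = smallest_symmetry_rotation_deg(spin)
    before = pattern(0.3, spin)
    after = pattern(0.3 + math.radians(rotation), spin)
    print(f"{label}: spin {spin}, repeats after {rotation:.0f} degrees "
          f"({before:.6f} vs {after:.6f})")
```

For spin 1 the pattern repeats after a full turn, while for spin 2 it already repeats after half a turn, matching the description in the list above.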

Since all matter gravitates, the motion of matter generically creates gravitational waves. Every time you move, you create gravitational waves, lots of them. These are, however, so weak that they are impossible to measure.

The gravitational waves that LIGO is looking for come from the most violent events in the universe that we know of: black hole mergers. In these events, space-time gets distorted dramatically as the two black holes join to one, leading to significant emission of gravitational waves. This combined system later settles with a characteristic “ringdown” into a new stable state.

Yes, this also means that these gravitational waves go right through you and distort you oh-so-slightly on their way.

The wave-lengths of gravitational waves emitted in such merger events are typically of the same order as the dimension of the system. That is, for black holes with masses between 10 and 100 times the solar mass, wavelengths are typically a hundred to a thousand km – right in the range that LIGO is most sensitive.
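As a rough check of the scales involved, the sketch below compares the horizon size of a black hole (the Schwarzschild radius, 2GM/c^2) with the wavelength of a wave at a given frequency (c/f). The constants and the sample frequency of 300 Hz are assumptions chosen for illustration and do not come from the post.

```python
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2 (standard value, assumed here)
C = 2.998e8        # speed of light, m/s
M_SUN = 1.989e30   # solar mass, kg


def schwarzschild_radius_km(mass_msun):
    """Horizon radius 2GM/c^2 of a non-rotating black hole, in kilometres."""
    return 2 * G * mass_msun * M_SUN / C ** 2 / 1000.0


def wavelength_km(frequency_hz):
    """Wavelength of a wave travelling at the speed of light, in kilometres."""
    return C / frequency_hz / 1000.0


for mass in (10, 100):
    print(f"{mass} solar masses: horizon radius ~ {schwarzschild_radius_km(mass):.0f} km")
print(f"300 Hz wave: wavelength ~ {wavelength_km(300):.0f} km")
```

The horizon radii come out at tens to hundreds of kilometres, and a wave at a few hundred hertz has a wavelength of about a thousand kilometres, so the two scales agree to within an order of magnitude.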

If you want to score extra points when discussing the headlines we expect tomorrow, learn how to pronounce Fabry–Pérot. This is a method for bouncing back light-signals in interferometer arms several times before making the measurements, which effectively increases the arm length. This is why LIGO is sensitive in a wavelength regime far longer than its actual arm length of about 2-4 km. And don’t call them gravity waves. A gravity wave is a cloud phenomenon.

Gravitational waves were predicted a hundred years ago as one of the consequences of Einstein’s theory of General Relativity. Their existence has since been indirectly confirmed because gravitational wave emission leads to energy loss, which has the consequence that two stars which orbit around a common center speed up over the course of time. This has been observed and was awarded the Nobel Prize for physics in 1993. If LIGO has detected the sought-after signal, it would not be the first detection, but the first direct detection.

Interestingly, even though it was long known that black hole mergers would emit gravitational waves, it wasn’t until computing power had increased sufficiently that precise predictions became possible. So it’s not like experiment is all that far behind theory on that one. General Relativity, though often praised for its beauty, does leave you with one nasty set of equations that in most cases cannot be solved analytically and computer simulations become necessary.

The existence of gravitational waves is not doubted by anyone in the physics community, or at least not by anybody I have met. This is for good reasons: On the experimental side there is the indirect evidence, and on the theoretical side there is the difficulty of making any theory of gravity work that does not have gravitational waves. But the direct detection of gravitational waves would be tremendously exciting because it opens our eyes to an entirely new view on the universe.

Hundreds of millions of years ago, a primitive form of life crawled out of the water on planet Earth and opened their eyes to see, for the first time, the light of the stars. Detecting gravitational waves is a momentous event just like this – it’s the first time we can receive signals that were previously entirely hidden from us, revealing an entirely new layer of reality.

So bookmark the webcast page and mark your calendar for tomorrow at 3:30pm GMT – it might enter the history books.

Update Feb 11: The rumors were all true. They have a 5.1 σ signal of a binary black hole merger. The paper is published in PRL, here is the abstract.

Sean CarrollGravitational Waves at Last

ONCE upon a time, there lived a man who was fascinated by the phenomenon of gravity. In his mind he imagined experiments in rocket ships and elevators, eventually concluding that gravity isn’t a conventional “force” at all — it’s a manifestation of the curvature of spacetime. He threw himself into the study of differential geometry, the abstruse mathematics of arbitrarily curved manifolds. At the end of his investigations he had a new way of thinking about space and time, culminating in a marvelous equation that quantified how gravity responds to matter and energy in the universe.

Not being one to rest on his laurels, this man worked out a number of consequences of his new theory. One was that changes in gravity didn’t spread instantly throughout the universe; they traveled at the speed of light, in the form of gravitational waves. In later years he would change his mind about this prediction, only to later change it back. Eventually more and more scientists became convinced that this prediction was valid, and worth testing. They launched a spectacularly ambitious program to build a technological marvel of an observatory that would be sensitive to the faint traces left by a passing gravitational wave. Eventually, a century after the prediction was made — a press conference was called.

Chances are that everyone reading this blog post has heard that LIGO, the Laser Interferometer Gravitational-Wave Observatory, officially announced the first direct detection of gravitational waves. Two black holes, caught in a close orbit, gradually lost energy and spiraled toward each other as they emitted gravitational waves, which zipped through space at the speed of light before eventually being detected by our observatories here on Earth. Plenty of other places will give you details on this specific discovery, or tutorials on the nature of gravitational waves, including in user-friendly comic/video form.

What I want to do here is to make sure, in case there was any danger, that nobody loses sight of the extraordinary magnitude of what has been accomplished here. We’ve become a bit blasé about such things: physics makes a prediction, it comes true, yay. But we shouldn’t take it for granted; successes like this reveal something profound about the core nature of reality.

Some guy scribbles down some symbols in an esoteric mixture of Latin, Greek, and mathematical notation. Scribbles originating in his tiny, squishy human brain. (Here are what some of those scribbles look like, in my own incredibly sloppy handwriting.) Other people (notably Rainer Weiss, Ronald Drever, and Kip Thorne), on the basis of taking those scribbles extremely seriously, launch a plan to spend hundreds of millions of dollars over the course of decades. They concoct an audacious scheme to shoot laser beams at mirrors to look for modulated displacements of less than a millionth of a billionth of a centimeter, smaller than the diameter of an atomic nucleus. Meanwhile other people looked at the sky and tried to figure out what kind of signals they might be able to see, for example from the death spiral of black holes a billion light-years away. You know, black holes: universal regions of death where, again according to elaborate theoretical calculations, the curvature of spacetime has become so pronounced that anything entering can never possibly escape. And still other people built the lasers and the mirrors and the kilometers-long evacuated tubes and the interferometers and the electronics and the hydraulic actuators and so much more, all because they believed in those equations. And then they ran LIGO (and other related observatories) for several years, then took it apart and upgraded to Advanced LIGO, finally reaching a sensitivity where you would expect to see real gravitational waves if all that fancy theorizing was on the right track.

And there they were. On the frikkin’ money.


Our universe is mind-bogglingly vast, complex, and subtle. It is also fantastically, indisputably knowable.


I got a hard time a few years ago for predicting that we would detect gravitational waves within five years. And indeed, the track record of such predictions has been somewhat spotty. Outside Kip Thorne’s office you can find this record of a lost bet — after he predicted that we would see them before 1988. (!)


But this time around I was pretty confident. The existence of overly-optimistic predictions in the past doesn’t invalidate the much-better predictions we can make with vastly updated knowledge. Advanced LIGO represents the first time when we would have been more surprised not to see gravitational waves than to have seen them. And I believed in those equations.

I don’t want to be complacent about it, however. The fact that Einstein’s prediction has turned out to be right is an enormously strong testimony to the power of science in general, and physics in particular, to describe our natural world. Einstein didn’t know about black holes; he didn’t even know about lasers, although it was his work that laid the theoretical foundations for both ideas. He was working at a level of abstraction that reached as far as he could (at the time) to the fundamental basis of things, how our universe works at the deepest of levels. And his theoretical insights were sufficiently powerful and predictive that we could be confident in testing them a century later. This seemingly effortless insight that physics gives us into the behavior of the universe far away and under utterly unfamiliar conditions should never cease to be a source of wonder.

We’re nowhere near done yet, of course. We have never observed the universe in gravitational waves before, so we can’t tell for sure what we will see, but plausible estimates predict between one-half and several hundred events per year. Hopefully, the success of LIGO will invigorate interest in other ways of looking for gravitational waves, including at very different wavelengths. Here’s a plot focusing on three regimes: LIGO and its cousins on the right, the proposed space-based observatory LISA in the middle, and pulsar-timing arrays (using neutron stars throughout the galaxy as a giant gravitational-wave detector) on the left. Colorful boxes are predicted sources; solid lines are the sensitivities of different experiments. Gravitational-wave astrophysics has just begun; asking us what we will find is like walking up to Galileo and asking him what else you could discover with telescopes other than moons around Jupiter.


For me, the decade of the 2010’s opened with five big targets in particle physics/gravitation/cosmology:

  1. Discover the Higgs boson.
  2. Directly detect gravitational waves.
  3. Directly observe dark matter.
  4. Find evidence of inflation (e.g. tensor modes) in the CMB.
  5. Discover a particle not in the Standard Model.

The decade is about half over, and we’ve done two of them! Keep up the good work, observers and experimentalists, and the 2010’s will go down as a truly historic decade in physics.

Richard EastherMaking Waves

Gravitational waves are already the science story of the week but if the rumours hold up they will be one of the science stories of the century. We'll know soon enough, as there will be a press conference in Washington DC at 10:30am (local time) on Thursday. And this revolution will be broadcast; you can catch a livestream on Youtube.

The rumour doing the rounds is that the LIGO team will announce the detection of gravitational waves emitted during the merger of two black holes. Here's a quick explainer as we head into the (we hope!) big day...

What Are Gravitational Waves? Gravitational waves are waves that travel through the fabric of space, just as ripples move across the surface of a pond.

Waves In Space?  Yep. By detecting gravitational waves we are watching space bend and stretch.


Gravitational fields are encoded in the curvature of space CC BY-SA 3.0

Really? Waves in Space? Yep, Really. Once upon a time, physicists thought space was rigid and unchanging. However, in 1915 Einstein's General Theory of Relativity told us that gravitational forces are communicated via the curvature of space. This is often described via the "rubber sheet model"; curved space is analogous to a rubber sheet warped by massive objects that sit upon it. But what really matters for gravitational waves is not just that space (or, more properly, spacetime) can curve, but that its curvature can change. As stars and planets move the curvature of space must adapt itself to their new positions. If the curvature didn't change the universe would be a very strange place as a moving object would leave its gravitational field behind, a little like Peter Pan losing his shadow. Mathematically, the ability of space to bend and stretch means that waves can move through it, and this led Einstein to predict that gravitational waves could exist.

Why Is This So Exciting?  Science waited 100 years for this; who wouldn't be excited? For physicists, LIGO is testing a key prediction of General Relativity, which is one of the most fundamental theories we have. On top of that, if LIGO sees gravitational waves emitted by a pair of black holes as they collide and merge we will have ringside seats to some of the most remarkable events in the universe. And a detection by LIGO will mark the culmination of decades of work by a cast of thousands who have built what is probably the world's most sensitive scientific instrument.

How Does LIGO Work? LIGO has two giant L-shaped detectors; one in Washington State and the other in Louisiana, on the other side of the United States. Each detector is 4 kilometres on a side. Gravitational waves always stretch space in one direction while squeezing it in another, so a passing gravitational wave expands one side of the "L" while shrinking the other. Powerful lasers then pick up the resulting change in the lengths of the arms. The stretching and squeezing is tiny – each arm may grow and shrink by only a quadrillionth of a millimetre, far less than the diameter of a single atom. By having two detectors LIGO rules out spurious signals from local vibrations, traffic or tiny earthquakes; LIGO also pools its data with two smaller European experiments, GEO and VIRGO.
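The numbers in this answer can be turned into the dimensionless "strain" that detector sensitivities are usually quoted in, defined as the change in arm length divided by the arm length. A minimal sketch, using only the 4 km arm and the quadrillionth-of-a-millimetre figure given above (the function and variable names are illustrative):

```python
def strain(delta_length_m, arm_length_m):
    """Dimensionless strain: fractional change in the length of one arm."""
    return delta_length_m / arm_length_m


ARM_LENGTH_M = 4_000.0      # each arm is 4 kilometres long
DELTA_L_M = 1e-15 * 1e-3    # a quadrillionth of a millimetre, in metres

h = strain(DELTA_L_M, ARM_LENGTH_M)
print(f"Arm length change: {DELTA_L_M:.1e} m")
print(f"Strain: {h:.1e}")
```

A length change of 10^-18 metres over 4 kilometres corresponds to a strain of a few times 10^-22, which is the scale of effect the lasers have to resolve.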


Spacetime near two orbiting black holes. Image: Swinburne University

How Are Gravitational Waves Made? Two black holes (or any pair of orbiting objects) stir up space as they circle one another, creating gravitational waves. If the black holes are far apart the gravitational waves are unimaginably small. But gravitational waves carry away energy, and that energy has to come from somewhere – so the orbit slowly shrinks. But a smaller orbit is a faster orbit, increasing the output of gravitational radiation, and the orbit shrinks faster and faster. This is the inspiral, and can take hundreds of millions or even billions of years. But eventually the two black holes are orbiting one another at a decent fraction of the speed of light, churning space like an out-of-control cosmic egg-beater. This phase lasts seconds but produces a huge burst of gravitational waves: this is the signal that LIGO detects. The black holes then plunge towards a merger, followed by the ringdown as the new black hole settles into a stable shape.
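The runaway described here can be made quantitative with the standard leading-order (Newtonian quadrupole) estimate of the time left before merger for a binary emitting gravitational waves at frequency f. This formula is not given in the post and is only approximate near the merger itself; the masses used below (29 and 36 solar masses) are those reported for the rumoured event and are assumptions for the purpose of the sketch.

```python
import math

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2 (standard value, assumed here)
C = 2.998e8        # speed of light, m/s
M_SUN = 1.989e30   # solar mass, kg


def chirp_mass_kg(m1_msun, m2_msun):
    """Chirp mass (m1*m2)^(3/5) / (m1+m2)^(1/5), the combination that sets the inspiral rate."""
    m1, m2 = m1_msun * M_SUN, m2_msun * M_SUN
    return (m1 * m2) ** 0.6 / (m1 + m2) ** 0.2


def time_to_merger_s(gw_frequency_hz, m1_msun, m2_msun):
    """Leading-order time remaining until merger at a given gravitational-wave frequency."""
    t_chirp = G * chirp_mass_kg(m1_msun, m2_msun) / C ** 3
    return (5.0 / 256.0) * t_chirp ** (-5.0 / 3.0) * (math.pi * gw_frequency_hz) ** (-8.0 / 3.0)


for f in (10, 35, 100, 250):
    print(f"{f:4d} Hz: about {time_to_merger_s(f, 29, 36):.4f} s to merger")
```

The remaining time drops steeply as the frequency rises, which is the "faster and faster" behaviour in the text: in this approximation the final sweep through the band where LIGO is sensitive takes seconds or less.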

Didn't Everyone Get All Excited About Gravitational Waves A Couple of Years Ago? We did, and it was a false alarm. Several things are different this time, though. That claim was made by BICEP2, a telescope that looks at the microwave background, fossil light from the Big Bang. BICEP2 did not observe gravitational waves directly, as LIGO does. The signal BICEP2 saw turned out to be associated with dust in our own galaxy; this was quickly realized as astrophysicists checked and re-checked the results. (I blogged about the latest news from BICEP2; it is producing lovely data and starting to test a number of different theories about the Big Bang.) Moreover, the LIGO team has built a reputation for caution – going so far as to do "signal injections", where the analysis teams are unknowingly fed synthetic data to test their ability to extract real gravitational waves from the experimental noise. Finally, the rumour is that their results have been through peer review, and will have stood up to scrutiny from independent scientists.

What Next? Physicists will use LIGO to make stringent tests of General Relativity: do its predictions match the behavior of spacetime seen during black hole mergers? And for astronomers it will be like growing a new set of eyes: LIGO is an entirely new kind of telescope that lets us explore the universe with gravitational waves. Watch this space.

Matt StrasslerAdvance Thoughts on LIGO

Scarcely a hundred years after Einstein revealed the equations for his theory of gravity (“General Relativity”) on November 25th, 1915, the world today awaits an announcement from the LIGO experiment, where the G in LIGO stands for Gravity. (The full acronym stands for “Laser Interferometer Gravitational Wave Observatory.”) As you’ve surely heard, the widely reported rumors are that at some point in the last few months, LIGO, recently upgraded to its “Advanced” version, finally observed gravitational waves — ripples in the fabric of space (more accurately, of space-time). These waves, which can make the length of LIGO shorter and longer by an incredibly tiny amount, seem to have come from the violent merger of two black holes, each with a mass [rest-mass!] dozens of times larger than the Sun. Their coalescence occurred long long ago (billions of years) in a galaxy far far away (a good fraction of the distance across the visible part of the universe), but the ripples from the event arrived at Earth just weeks ago. For a brief moment, it is rumored, they shook LIGO hard enough to be convincingly observed.

For today’s purposes, let me assume the rumors are true, and let me assume also that the result to be announced is actually correct. We’ll learn today whether the first assumption is right, but the second assumption may not be certain for some months (remember OPERA’s [NOT] faster-than-light neutrinos  and BICEP2’s [PROBABLY NOT] gravitational waves from inflation). We must always keep in mind that any extraordinary scientific result has to be scrutinized and confirmed by experts before scientists will believe it! Discovery is difficult, and a large fraction of such claims — large — fail the test of time.

What the Big News Isn’t

There will be so much press and so many blog articles about this subject that I’m just going to point out a few things that I suspect most articles will miss, especially those in the press.

Most importantly, if LIGO has indeed directly discovered gravitational waves, that’s exciting of course. But it’s by no means the most important story here.

That’s because gravitational waves were already observed indirectly, quite some time ago, in a system of two neutron stars orbiting each other. This pair of neutron stars, discovered by Joe Taylor and his graduate student Russell Hulse, is interesting because one of the neutron stars is a pulsar, an object whose rotation and strong magnetic field combine to make it a natural lighthouse, or more accurately a radiohouse, sending out pulses of radio waves that can be detected at great distances. The time between pulses shifts very slightly as the pulsar moves toward and away from Earth, so the pulsar’s motion around its companion can be carefully monitored. Its orbital period has slowly changed over the decades, and the changes are perfectly consistent with what one would expect if the system were losing energy, emitting it in the form of unseen gravitational waves at just the rate predicted by Einstein’s theory (as shown in this graph.) For their discovery, Hulse and Taylor received the 1993 Nobel Prize. By now, there are other examples of similar pairs of neutron stars, also showing the same type of energy loss in detailed accord with Einstein’s equations.

A bit more subtle (so you can skip this paragraph if you want), but also more general, is that some kind of gravitational waves are inevitable… inevitable, after you accept Einstein’s earlier (1905) equations of special relativity, in which he suggested that the speed of light is a sort of universal speed limit on everything, imposed by the structure of space-time.  Sound waves, for instance, exist because the speed of sound is finite; if it were infinite, a vibrating guitar string would make the whole atmosphere wiggle back and forth in sync with the guitar string.  Similarly, since effects of gravity must travel at a finite speed, the gravitational effects of orbiting objects must create waves. The only question is the specific properties those waves might have.

No one, therefore, should be surprised that gravitational waves exist, or that they travel at the universal speed limit, just like electromagnetic waves (including visible light, radio waves, etc.) No one should even be surprised that the waves LIGO is (perhaps) detecting have properties predicted by Einstein’s specific equations for gravity; if they were different in a dramatic way, the Hulse-Taylor neutron stars would have behaved differently than expected.

Furthermore, no one should be surprised if waves from a black hole merger have been observed by the Advanced LIGO experiment. This experiment was designed from the beginning, decades ago, so that it could hardly fail to discover gravitational waves from the coalescence of two black holes, two neutron stars, or one of each. We know these mergers happen, and the experts were very confident that Advanced LIGO could find them. The really serious questions were: (a) would Advanced LIGO work as advertised? (b) if it worked, how soon would it make its first discovery? and (c) would the discovery agree in detail with expectations from Einstein’s equations?

Big News In Scientific Technology

So the first big story is that Advanced LIGO WORKS! This experiment represents one of the greatest technological achievements in human history. Congratulations are due to the designers, builders, and operators of this experiment — and to the National Science Foundation of the United States, which is LIGO’s largest funding source. U.S. taxpayers, who on average each contributed a few cents per year over the past two-plus decades, can be proud. And because of the new engineering and technology that were required to make Advanced LIGO functional, I suspect that, over the long run, taxpayers will get a positive financial return on their investment. That’s in addition of course to a vast scientific return.

Advanced LIGO is not even in its final form; further improvements are in the works. Currently, Advanced LIGO consists of two detectors located 2000 miles (3000 kilometers) apart. Each detector consists of two “arms” a few miles (kilometers) long, oriented at right angles, and the lengths of the arms are continuously compared.  This is done using exceptionally stable lasers reflecting off exceptionally perfect mirrors, and requiring use of sophisticated tricks for mitigating all sorts of normal vibrations and even effects of quantum “jitter” from the Heisenberg uncertainty principle. With these tools, Advanced LIGO can detect when passing gravitational waves change the lengths of LIGO’s arms by … incredibly … less than one part in a billion trillion (1,000,000,000,000,000,000,000). That’s an astoundingly tiny distance: a thousand times smaller than the radius of a proton. (A proton itself is a hundred thousand times smaller, in radius, than an atom. Indeed, LIGO is measuring a distance as small as can be probed by the Large Hadron Collider — albeit with a very very tiny energy, in contrast to the collider.) By any measure, the gravitational experimenters have done something absolutely extraordinary.
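
The arithmetic behind that comparison is short enough to sketch. The 4 km arm length and the 0.84 femtometer proton radius are assumed round numbers, not figures from the text above, which says only "a few miles (kilometers)"; the strain of one part in a billion trillion is the figure quoted.

```python
# Back-of-envelope check of the arm-length change, under assumed inputs.

ARM_LENGTH_M = 4.0e3         # assumed arm length; the text says "a few kilometers"
STRAIN = 1.0e-21             # one part in a billion trillion, as quoted in the text
PROTON_RADIUS_M = 0.84e-15   # assumed approximate proton radius


def length_change(arm_length_m, strain):
    """Absolute change in arm length for a given fractional change (strain)."""
    return arm_length_m * strain


delta_l = length_change(ARM_LENGTH_M, STRAIN)
ratio = PROTON_RADIUS_M / delta_l

print(f"arm length change: {delta_l:.1e} m")
print(f"proton radius is about {ratio:.0f} times larger")
```

With these assumed inputs the change comes out a few hundred times smaller than a proton radius. A strain several times below one part in a billion trillion, which the "less than" in the text allows, brings it to the thousandfold figure.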

Big News In Gravity

The second big story: from the gravitational waves that LIGO has perhaps seen, we would learn that the merger of two black holes occurs, to a large extent, as Einstein’s theory predicts. The success of this prediction for what the pattern of gravitational waves should be is a far more powerful test of Einstein’s equations than the mere existence of the gravitational waves!

Imagine, if you can… Two city-sized black holes, each with a mass [rest-mass!] tens of times greater than the Sun, and separated by a few tens of miles (tens of kilometers), orbit each other. They circle faster and faster, as often, in their last few seconds, as 100 times per second. They move at a speed that approaches the universal speed limit. This extreme motion creates an ever larger and increasingly rapid vibration in space-time, generating large space-time waves that rush outward into space. Finally the two black holes spiral toward each other, meet, and join together to make a single black hole, larger than the first two and spinning at an incredible rate.  It takes a short moment to settle down to its final form, emitting still more gravitational waves.

During this whole process, the total amount of energy emitted in the vibrations of space-time is a few times larger than you’d get if you could take the entire Sun and (magically) extract all of the energy stored in its rest-mass (E=mc²). This is an immense amount of energy, significantly more than emitted in a typical supernova. Indeed, LIGO’s black hole merger may perhaps be the most titanic event ever detected by humans!
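
For scale, here is a minimal sketch of that E=mc² estimate. Taking "a few" to mean three solar masses is an assumption made here for illustration, and the supernova figure is an assumed order-of-magnitude value for the light and kinetic energy of a typical explosion, not a number from the text.

```python
# Order-of-magnitude estimate of the energy radiated in gravitational waves.

C = 2.998e8        # speed of light, m/s
M_SUN = 1.989e30   # solar mass, kg

FEW_SOLAR_MASSES = 3           # assumption: "a few" taken as three
SUPERNOVA_ENERGY_J = 1.0e44    # assumed typical light + kinetic energy, joules


def rest_mass_energy_joules(mass_kg):
    """Energy equivalent of a rest mass, E = m c^2."""
    return mass_kg * C**2


emitted = rest_mass_energy_joules(FEW_SOLAR_MASSES * M_SUN)

print(f"one solar rest mass: {rest_mass_energy_joules(M_SUN):.2e} J")
print(f"assumed emitted energy: {emitted:.2e} J")
print(f"ratio to assumed supernova energy: {emitted / SUPERNOVA_ENERGY_J:.0f}")
```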

This violent dance of darkness involves very strong and complicated warping of space and time. In fact, it wasn’t until 2005 or so that the full calculation of the process, including the actual moment of coalescence, was possible, using highly advanced mathematical techniques and powerful supercomputers!

By contrast, the resulting ripples we get to observe, billions of years later, are much more tame. Traveling far across the cosmos, they have spread out and weakened. Today they create extremely small and rather simple wiggles in space and time. You can learn how to calculate their properties in an advanced university textbook on Einstein’s gravity equations. Not for the faint of heart, but certainly no supercomputers required.

So gravitational waves are the (relatively) easy part. It’s the prediction of the merger’s properties that was the really big challenge, and its success would represent a remarkable achievement by gravitational theorists. And it would provide powerful new tests of whether Einstein’s equations are in any way incomplete in their description of gravity, black holes, space and time.

Big News in Astronomy

The third big story: If today’s rumor is indeed of a real discovery, we are witnessing the birth of an entirely new field of science: gravitational-wave astronomy. This type of astronomy is complementary to the many other methods we have of “looking” at the universe. What’s great about gravitational wave astronomy is that although dramatic events can occur in the universe without leaving a signal visible to the eye, and even without creating any electromagnetic waves at all, nothing violent can happen in the universe without making waves in space-time. Every object creates gravity, through the curvature of space-time, and every object feels gravity too. You can try to hide in the shadows, but there’s no hiding from gravity.

Advanced LIGO may have been rather lucky to observe a two-black-hole merger so early in its life. But we can be optimistic that the early discovery means that black hole mergers will be observed as often as several times a year even with the current version of Advanced LIGO, which will be further improved over the next few years. This in turn would imply that gravitational wave astronomy will soon be a very rich subject, with lots and lots of interesting data to come, even within 2016. We will look back on today as just the beginning.

Although the rumored discovery is of something expected — experts were pretty certain that mergers of black holes of this size happen on a fairly regular basis — gravitational wave astronomy might soon show us something completely unanticipated. Perhaps it will teach us surprising facts about the numbers or properties of black holes, neutron stars, or other massive objects. Perhaps it will help us solve some existing mysteries, such as those of gamma-ray bursts. Or perhaps it will reveal currently unsuspected cataclysmic events that may have occurred somewhere in our universe’s past.

Prizes On Order?

So it’s really not the gravitational waves themselves that we should celebrate, although I suspect that’s what the press will focus on. Scientists already knew that these waves exist, just as they were aware of the existence of atoms, neutrinos, and top quarks long before these objects were directly observed. The historic aspects of today’s announcement would be in the successful operation of Advanced LIGO, in its new way of “seeing” the universe that allows us to observe two black holes becoming one, and in the ability of Einstein’s gravitational equations to predict the complexities of such an astronomical convulsion.

Of course all of this is under the assumptions that the rumors are true, and also that LIGO’s results are confirmed by further observations. Let’s hope that any claims of discovery survive the careful and proper scrutiny to which they will now be subjected. If so, then prizes of the highest level are clearly in store, and will be doled out to quite a few people, experimenters for designing and building LIGO and theorists for predicting what black-hole mergers would look like. As always, though, the only prize that really matters is given by Nature… and the many scientists and engineers who have contributed to Advanced LIGO may have already won.

Enjoy the press conference this morning. I, ironically, will be in the most inaccessible of places: over the Atlantic Ocean.  I was invited to speak at a workshop on Large Hadron Collider physics this week, and I’ll just be flying home. I suppose I can wait 12 hours to find out the news… it’s been 44 years since LIGO was proposed…


February 10, 2016

Dirac Sea ShoreGravitational waves announcement from LIGO expected

As the rumor noise level has increased over the last few weeks, and LIGO has a press conference scheduled for tomorrow morning, everyone in the gravity community is expecting that LIGO will announce the first detection of gravitational waves.


A roundup of rumors can be found here and here and here and here.

Preprints with postdictions that sound like predictions can be found here for example. I’ve been told that the cat has been out of the bag for a while, and people with inside information have been posting papers to the arxiv in advance of the LIGO announcement.

Obviously this is very exciting and hopefully the announcement tomorrow will usher in a new era of gravitational astronomy.



David Hoggpredictions for the LIGO event

The twitters are ablaze with rumors about the announcement from the LIGO project scheduled for tomorrow. We discussed this in group meeting today, with no embargo-breaking by anyone. That is, on purely physical, engineering, sociological, and psychological grounds we made predictions for the press release tomorrow. Here are my best predictions: First, I predict that the total signal-to-noise of any detected black-hole inspiral signal they announce will be greater than 15 in the total data set. That is, I predict that (say) the likelihood function for the overall, scalar signal amplitude will have a half-width that is less than one-fifteenth of its mode. Second, I predict that the uncertainty on the sum of the two masses (that is, the total mass of the inspiral system, if any is announced) will be dominated by the (large, many hundreds of km/s) uncertainty in the peculiar velocity of the system (in the context that the system lives inside the cosmological world model). Awesome predictions? Perhaps not, but you heard them here first!

[Note to the world: This is not an announcement: I know nothing! This is just a pair of predictions from an outsider.]
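
To make the first prediction concrete, here is a toy sketch of what "total signal-to-noise" means when it is read as the best-fit amplitude divided by the half-width of its likelihood. Everything in it is an assumption for illustration: the chirp-like template, the white Gaussian noise of known variance, and the function names are hypothetical and have nothing to do with the actual LIGO analysis.

```python
# Toy amplitude likelihood: data = A * template + Gaussian noise.
# The likelihood for A is then Gaussian, with a mode and a half-width (sigma).
import math
import random


def amplitude_fit(data, template, noise_sigma):
    """Return (mode, half_width) of the Gaussian likelihood for the amplitude A."""
    tt = sum(t * t for t in template)
    td = sum(t * d for t, d in zip(template, data))
    mode = td / tt                             # maximum-likelihood amplitude
    half_width = noise_sigma / math.sqrt(tt)   # one-sigma width of the likelihood
    return mode, half_width


def make_toy_data(true_amplitude, noise_sigma, n=2000, seed=42):
    """Hypothetical chirp-like template plus white Gaussian noise."""
    rng = random.Random(seed)
    template = [
        math.sin(2 * math.pi * (5 + 20 * i / n) * i / n) for i in range(n)
    ]
    data = [true_amplitude * t + rng.gauss(0.0, noise_sigma) for t in template]
    return data, template


if __name__ == "__main__":
    data, template = make_toy_data(true_amplitude=1.0, noise_sigma=1.0)
    mode, half_width = amplitude_fit(data, template, noise_sigma=1.0)
    print(f"amplitude mode: {mode:.3f}, half-width: {half_width:.3f}")
    print(f"total signal-to-noise: {mode / half_width:.1f}")
```

In this toy the signal is no bigger than the noise in any single sample, yet the total signal-to-noise over the whole data set is much larger, because the half-width shrinks as more of the template is matched.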

We discussed the things that could be learned from any detection of a single black-hole inspiral signal, about star formation, black-hole formation, and galaxies. I think that if the masses of the detected black holes are large, then there are probably interesting things to say about stars or supernovae or star formation.

David Hoggwhy we do linear fitting; how to avoid computation

Today was the first meeting of #AstroHackNY, where we discuss data analysis and parallel work up at Columbia on Tuesday mornings. We discussed what we want to get out of the series, and started a discussion of why we do linear fitting the way we do, and what are the underlying assumptions.

Prior to that, I talked with Hans-Walter Rix about interpolation and gridding of spectral models. We disagree a bit on the point of all this, but we are trying to minimize the number of stellar model evaluations we need to do to get precise, many-element abundances with a very expensive physical model of stars. We also discussed the point that we probably have to cancel the Heidelberg #GaiaSprint, because of today's announcement from the Gaia Project.

Chad Orzel160/366: Nice Crowd

I did bring my good camera with me to Newport News, and took it on the tour of Jefferson Lab yesterday, but despite the existence of DSLR pics, you’re getting a cell-phone snap for the photo of the day:


That’s the audience about 10-15 minutes before my talk last night, so it was a good turnout. And they laughed in the right places, and asked some really good questions. I also got asked to appear in a selfie with a bunch of students from a local school, so they could prove they were there to get extra credit for their science class…

The talk went well, though we had some technical difficulties. The shiny new video projection system in the auditorium has an audio input jack on the VGA cable for the laptop input, but it’s about four inches long, and the headphone jack for my laptop is on the far side of the keyboard from the VGA out. Whoops.

We ended up taping a lapel mike to the desk right under the laptop speaker, which was mostly fine, except for a couple of occasions where we got earsplitting feedback. Technology, man. What can you do? Video will be posted to the JLab web site at some point in the future, after they edit it and get closed captions done (I’m very sorry for whoever has to do that…)

I was also pleasantly surprised that a couple of Williams classmates showed up to the talk (they live in the area), so I went out with them afterwards for a couple of beers, to catch up. All in all, a good day.

Now, I just need to get through most of a day of airports and Regional Jets to get home to Niskayuna.

John PreskillSome like it cold.

When I reached IBM’s Watson research center, I’d barely seen Aaron in three weeks. Aaron is an experimentalist pursuing a physics PhD at Caltech. I eat dinner with him and other friends, most Fridays. The group would gather on a sidewalk in the November dusk, those three weeks. Light would spill from a lamppost, and we’d tuck our hands into our pockets against the chill. Aaron’s wife would shake her head.

“The fridge is running,” she’d explain.

Aaron cools down mechanical devices to near absolute zero. Absolute zero is the lowest temperature possible,¹ lower than outer space’s temperature. Cold magnifies certain quantum behaviors. Researchers observe those behaviors in small systems, such as nanoscale devices (devices about 10⁻⁹ meters long). Aaron studies few-centimeter-long devices. Offsetting the devices’ size with cold might coax them into exhibiting quantum behaviors.

The cooling sounds as effortless as teaching a cat to play fetch. Aaron lowers his fridge’s temperature in steps. Each step involves checking for leaks: A mix of two fluids—two types of helium—cools the fridge. One type of helium costs about $800 per liter. Lose too much helium, and you’ve lost your shot at graduating. Each leak requires Aaron to warm the fridge, then re-cool it. He hauled helium and pampered the fridge for ten days, before the temperature reached 10 milliKelvins (0.01 units above absolute zero). He then worked like…well, like a grad student to check for quantum behaviors.

Aaron came to mind at IBM.

“How long does cooling your fridge take?” I asked Nick Bronn.

Nick works at Watson, IBM’s research center in Yorktown Heights, New York. Watson has sweeping architecture frosted with glass and stone. The building reminded me of Fred Astaire: decades-old, yet classy. I found Nick outside the cafeteria, nursing a coffee. He had sandy hair, more piercings than I, and a mandate to build a quantum computer.


IBM Watson

“Might I look around your lab?” I asked.

“Definitely!” Nick fished out an ID badge; grabbed his coffee cup; and whisked me down a wide, window-paneled hall.

Different researchers, across the world, are building quantum computers from different materials. IBMers use superconductors. Superconductors are tiny circuits. They function at low temperatures, so IBM has seven closet-sized fridges. Different teams use different fridges to tackle different challenges to computing.

Nick found a fridge that wasn’t running. He climbed half-inside, pointed at metallic wires and canisters, and explained how they work. I wondered how his cooling process compared to Aaron’s.

“You push a button.” Nick shrugged. “The fridge cools in two days.”

IBM, I learned, has dry fridges. Aaron uses a wet fridge. Dry and wet fridges operate differently, though both require helium. Aaron’s wet fridge vibrates less, jiggling his experiment less. Jiggling relates to transferring heat. Heat suppresses the quantum behaviors Aaron hopes to observe.

Heat and warmth manifest in many ways, in physics. Count Rumford, an 18th-century American-Brit, conjectured the relationship between heat and jiggling. He noticed that drilling holes into cannons immersed in water boils the water. The drill bits rotated–moved in circles–transferring energy of movement to the cannons, which heated up. Heat enraptures me because it relates to entropy, a measure of disorderliness and ignorance. The flow of heat helps explain why time flows in just one direction.

A physicist friend of mine writes papers, he says, when catalyzed by “blinding rage.” He reads a paper by someone else, whose misunderstandings anger him. His wrath boils over into a research project.

Warmth manifests as the welcoming of a visitor into one’s lab. Nick didn’t know me from Fred Astaire, but he gave me the benefit of the doubt. He let me pepper him with questions and invited more questions.

Warmth manifests as a 500-word disquisition on fridges. I asked Aaron, via email, about how his cooling compares to IBM’s. I expected two sentences and a link to Wikipedia, since Aaron works 12-hour shifts. But he took pity on his theorist friend. He also warmed to his subject. Can’t you sense the zeal in “Helium is the only substance in the world that will naturally isotopically separate (neat!)”? No knowledge of isotopic separation required.

Many quantum scientists like it cold. But understanding, curiosity, and teamwork fire us up. Anyone under the sway of those elements of science likes it hot.

With thanks to Aaron and Nick. Thanks also to John Smolin and IBM Watson’s quantum-computing-theory team for their hospitality.

¹ In many situations. Some systems, like small magnets, can access negative temperatures.

February 09, 2016

Doug NatelsonBrief news items

As I go on some travel, here are some news items that looked interesting to me:

  • Rumors are really heating up that LIGO has spotted gravitational waves. The details are similar to some things I'd heard, for what that's worth, though that may just mean that everyone is hearing the same rumors. update: Press conference coming (though they may just say that the expt is running well....)
  • The starship Enterprise is undergoing a refit.
  • This paper reports a photocatalytic approach involving asymmetric, oblong, core-shell semiconductor nanoparticles, plus a single Pt nanoparticle catalyst, that (under the right solution conditions) can give essentially 100% efficient hydrogen reduction - every photon goes toward producing hydrogen gas.   If the insights here can be combined with improved solution stability of appropriate nanoparticles, maybe there are ways forward for highly efficient water splitting or photo production of liquid fuels.
  • Quantum materials are like obscenity - hard to define, but you know it when you see it.

Chad OrzelTwitter Is a Cocktail Party That I’m Not Invited To

As I go through my daily routine, I find myself sort of out of phase with a lot of the Internet. My peak online hours are from about six to ten in the morning, Eastern US time. That’s when I get up, have breakfast, and then go to Starbucks to write for a few hours.

This means that most of the other people awake and active on my social media feeds are in Europe or Australia. And my standard writing time ends right around the time things start to heat up in the US. I do continue to have access to the Internet through the afternoon, of course, but unless I have a deadline coming up, I’m often doing stuff that doesn’t involve sitting in front of a computer (and if I do have a deadline coming up, I shut down social media to concentrate on work). And evenings are terrible– I spend a lot of weeknights running SteelyKid to various activities, and even when I’m not doing that, our dinner and bedtime routines don’t leave me much space to participate. By nine or ten pm, I’m completely wiped out.

As a result, I find Twitter a deeply frustrating medium. Twitter is mostly about conversation, but its deliberately ephemeral nature means that you can really only converse effectively with other people who are online and active at the same time you are. And the peak activity times for Twitter conversations are at times when I’m not regularly available because of the way my work and family schedules are arranged. In those peak hours, I’m only checking in intermittently– a few times an hour, usually– and as a result, I miss tons of stuff.

I started thinking about this the other day, when there was a big kerfuffle over Twitter’s plan to introduce an “algorithmic” timeline that would depart from the current strictly-chronological display to highlight some posts from the past. This predictably led to wailing and gnashing of teeth among Twitter power users (and it’s since been walked back a little), who declared that it would be the end of Twitter as we know it. Personally, though, I think it might be a good thing, which led to this lengthy tweetstorm, which you’ll notice was posted at 8am on a Saturday, because that’s when I have time to be on Twitter…

The standard line is that any deviation from strictly chronological Twitter will hopelessly break things in one of a variety of ways, but this is largely predicated on the assumption that the algorithm will be the stupidest and most obnoxious thing you could dream up. But, really, it’s not that hard to do a better job than most of the people outraged about the idea seem to think.

Take, for example, Facebook. Facebook famously switched to an algorithmic timeline a while back, and most of the anti-algorithm arguments feature dark mutterings about how this will make Twitter just like Facebook. To an intermittent social-media user like me, though, Facebook is in many ways better than Twitter. I have slightly more Facebook friends than people I follow on Twitter (about 750 vs just under 600), but Facebook does a better job of highlighting stuff I want to see. I regularly find tweets from Rhett Allain because he has his feed mirrored to Facebook, and the Facebook algorithm knows I like his stuff and makes sure I see it. On Twitter, in the middle of the day, his tweets get lost in a vast flood of stuff that I don’t get to check very often. At the same time, if I’m actively on Facebook for a relatively long time, the feed I see is pretty much chronological.

The other insinuation is that under an algorithmic scheme only stuff from famous tweeters will get shown, or paid ads. But again, I’m not convinced, because Twitter already has an algorithmic feature, the “While You Were Away” box that pops up when you go several hours without checking in. That was roundly condemned when it was introduced for basically the same reasons, but again, I find that it does a good job of highlighting stuff I wouldn’t see otherwise. And it’s not just getting me massively-retweeted stuff from clickbait outlets. One of the people who pops up most frequently in my “While You Were Away” tab is a guy with under 400 followers, because I like a good deal of his stuff, and the algorithm knows that. I find that feature one of the most useful things Twitter has done recently, and would be happy to have it show up more regularly. And given that they do that well, I’m not especially worried about what would happen with a wider use of algorithms.

Of course, the fundamental issue isn’t anything about practical implementation, but rather that the current power users like Twitter as it is, because it works well for them. Which, you know, good for them, but it should be noted that this is fundamentally pretty exclusionary. That is, the way Twitter is set up right now works really well for a particular set of people, who have the sort of jobs and family arrangements such that they’re online and actively engaged at the same time as their friends. It’s big among journalists, for example, because their whole business is about being connected, and science Twitter is dominated by folks in fields whose research mostly has them sitting in front of a computer already. If you’re not lucky enough to be in that particular demographic stratum, though, the current experience of Twitter is much less attractive.

I’ve heard Twitter described as a virtual cocktail party before, and it’s a decent metaphor– lots of people hanging around, engaged in conversation and witty banter. I would note, though, that the usual analogy doesn’t go far enough. For an intermittent user like myself, Twitter is like a really cool cocktail party that I’m not invited to. It’s a bit like the party is spilling out of the bar into the lobby of my hotel– I catch snatches of cool conversations as I make my way to the elevator, but I miss most of it because I have other stuff to do. Every now and then, I get a chance to hang out in the bar for a bit, and that’s great, but mostly I’m getting second-hand reports and that’s just not the same.

And it should be noted that I am, in fact, relatively fortunate as such things go. I do have a few hours in the morning where I’m able to participate, and I sometimes get the chance to do more. In the cocktail party metaphor, I’m at least staying in the same hotel with most of the partygoers. The folks in other hotels don’t get even that much, which is why so many people continue to not see the point of Twitter.

The kinds of changes Twitter is talking about making could, if implemented well, make the medium more accessible for those who are currently shut out. It won’t completely open things up– it’s always going to be a conversational medium, and conversation will always require time for engagement– but good algorithms could make it easier for people who aren’t already part of the conversation to see why those who are find it useful and enjoyable.

Of course, as it is, there’s very much a “cool kids” dynamic to Twitter, and a lot of the reaction is best understood in that light. The experience of Twitter that the current power users enjoy is a relatively exclusive one, and Twitter is choosing to pursue broadening access to the service over enhancing the experience of those who already use it heavily. Nothing I’ve heard described is going to shut out anybody who’s already in, though– at most, they’re going to be inconvenienced to a small fraction of the degree that non-power-users are already inconvenienced.

Most of the Bad Things people trot out as results of algorithmic timelines are things that I already put up with as an intermittent Twitter user. Bits of conversation will appear out of context? If you only check in a few times an hour, you already get that (and because otherwise very smart people can’t figure out how to properly thread conversations, there’s often no good way to reconstruct what’s going on, but that’s another rant). You might miss things posted by your friends? That happens now, given the huge flood of stuff that comes in at peak hours– as mentioned above, I have to rely on Facebook’s algorithms to rescue a lot of stuff that gets lost in the noise on Twitter. Your stuff might just vanish without the right people seeing it? That already happens to those of us who are out-of-phase with peak Twitter activity.

All of these negative features are annoyances that people who aren’t on the inside already have to put up with. And given sensibly designed algorithms– which my experience with Facebook and “While You Were Away” suggests are entirely possible– these can be minimized. Done right, they have the potential to make Twitter more attractive and enjoyable for a lot of people who don’t currently get anything out of it.

Chad Orzel159/366: Thanks, Manitoba

I’m in Newport News, VA, to give a talk tonight at Jefferson Lab, and they’re putting me up at the on-site Residence Facility. The rooms here are apparently sponsored by, or associated with, institutions that use the facility, with big signs on all the doors. Here’s mine:

Door to my room at JLab’s Residence Facility.

So, I guess my stay is in some sense subsidized by the University of Manitoba. It’s a perfectly adequate hotel room, so, thanks, Manitoba.

As I am a Sooper Geeenyus, I forgot to pack the dress pants I usually wear when giving talks. Sigh. Happily, this is a public lecture, so jeans-and-sport-coat is a perfectly acceptable outfit. But I still feel like a dope…

David Hoggstar shades, data-driven redshifts

In a day limited by health issues, I had a useful conversation with Leslie Greengard and Alex Barnett (SCDA, Dartmouth) about star-shades for nulling starlight in future exoplanet missions. They had ideas about how the electromagnetic field might be calculated, and issues with what might be being done with current calculations of this. These calculations are hard, because the star-shades under discussion for deployment at L1 are many times 10^7 wavelengths in diameter, and millions of diameters away from the telescope!

I also talked to Boris Leistedt about galaxy and quasar cosmology using imaging (and a tiny bit of spectroscopy), in which the three-dimensional mapping is performed with photometric redshifts, or more precisely models of the source spectral energy distributions that are modeled simultaneously with the density field and so on. We are working on a first paper with recommendations for LSST. The idea is that a small amount of spectroscopy and an enormous amount of imaging ought to be sufficient to build a model that returns a redshift and spectral energy distribution for every source.

Clifford JohnsonStaring at Stairs…

These stairs probably do not conform to any building code, but I like them anyway, and so they will appear in a paper I'll submit to the arxiv soon.

They're part of a nifty algorithm I thought of on Friday that I like rather a lot.

More later.

-cvj

The post Staring at Stairs… appeared first on Asymptotia.

February 08, 2016

Sean CarrollGuest Post: Grant Remmen on Entropic Gravity

“Understanding quantum gravity” is on every physicist’s short list of Big Issues we would all like to know more about. If there’s been any lesson from the last half-century of serious work on this problem, it’s that the answer is likely to be something more subtle than just “take classical general relativity and quantize it.” Quantum gravity doesn’t seem to be an ordinary quantum field theory.

In that context, it makes sense to take many different approaches and see what shakes out. Alongside old stand-bys such as string theory and loop quantum gravity, there are less head-on approaches that try to understand how quantum gravity can really be so weird, without proposing a specific and complete model of what it might be.

Grant Remmen, a graduate student here at Caltech, has been working with me recently on one such approach, dubbed entropic gravity. We just submitted a paper entitled “What Is the Entropy in Entropic Gravity?” Grant was kind enough to write up this guest blog post to explain what we’re talking about.

Meanwhile, if you’re near Pasadena, Grant and his brother Cole have written a musical, Boldly Go!, which will be performed at Caltech in a few weeks. You won’t want to miss it!

One of the most exciting developments in theoretical physics in the past few years is the growing understanding of the connections between gravity, thermodynamics, and quantum entanglement. Famously, a complete quantum mechanical theory of gravitation is difficult to construct. However, one of the aspects that we are now coming to understand about quantum gravity is that in the final theory, gravitation and even spacetime itself will be closely related to, and maybe even emergent from, the mysterious quantum mechanical property known as entanglement.

This all started several decades ago, when Hawking and others realized that black holes behave with many of the same aspects as garden-variety thermodynamic systems, including temperature, entropy, etc. Most importantly, the black hole’s entropy is equal to its area [divided by (4 times Newton’s constant)]. Attempts to understand the origin of black hole entropy, along with key developments in string theory, led to the formulation of the holographic principle – see, for example, the celebrated AdS/CFT correspondence – in which quantum gravitational physics in some spacetime is found to be completely described by some special non-gravitational physics on the boundary of the spacetime. In a nutshell, one gets a gravitational universe as a “hologram” of a non-gravitational universe.
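For readers who want the formula behind "area divided by (4 times Newton's constant)", here is the standard Bekenstein-Hawking relation, written first in the natural units the post implicitly uses and then with the constants restored. The symbol names (S_BH for the entropy, A for the horizon area) are notational choices made here, not taken from the post.

```latex
S_{\mathrm{BH}} = \frac{A}{4G} \quad (\hbar = c = k_B = 1),
\qquad
S_{\mathrm{BH}} = \frac{k_B\, c^3 A}{4 G \hbar} \quad \text{(constants restored)}.
```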

If gravity can emerge from, or be equivalent to, a set of physical laws without gravity, then something special about that non-gravitational physics has to make it happen. Physicists have now found that that special something is quantum entanglement: the special correlations among quantum mechanical particles that defy classical description. As a result, physicists are very interested in how to get the dynamics describing how spacetime is shaped and moves – Einstein’s equation of general relativity – from various properties of entanglement. In particular, it’s been suggested that the equations of gravity can be shown to come from some notion of entropy. As our universe is quantum mechanical, we should think about the entanglement entropy, a measure of the degree of correlation of quantum subsystems, which for thermal states matches the familiar thermodynamic notion of entropy.

The general idea is as follows: Inspired by black hole thermodynamics, suppose that there’s some more general notion, in which you choose some region of spacetime, compute its area, and find that when its area changes this is associated with a change in entropy. (I’ve been vague here as to what is meant by a “change” in the area and what system we’re computing the area of – this will be clarified soon!) Next, you somehow relate the entropy to an energy (e.g., using thermodynamic relations). Finally, you write the change in area in terms of a change in the spacetime curvature, using differential geometry. Putting all the pieces together, you get a relation between an energy and the curvature of spacetime, which if everything goes well, gives you nothing more or less than Einstein’s equation! This program can be broadly described as entropic gravity and the idea has appeared in numerous forms. With the plethora of entropic gravity theories out there, we realized that there was a need to investigate what categories they fall into and whether their assumptions are justified – this is what we’ve done in our recent work.
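The chain of steps just described can be written schematically as follows. This is only a sketch of the logic in natural units, with the cosmological constant omitted. The symbols δA, δS, δQ and T are labels chosen here for the changes in area, entropy and energy and for the temperature, and the sketch does not attempt to say which system the entropy belongs to, which is exactly the question the paper addresses.

```latex
\delta S = \frac{\delta A}{4G}
\quad\text{(entropy tied to area)},
\qquad
\delta Q = T\,\delta S
\quad\text{(entropy tied to energy)},
\qquad
\delta A \;\longleftrightarrow\; \text{spacetime curvature}
\quad\text{(differential geometry)}
\;\;\Longrightarrow\;\;
R_{\mu\nu} - \tfrac{1}{2} R\, g_{\mu\nu} = 8\pi G\, T_{\mu\nu}.
```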

In particular, there are two types of theories in which gravity is related to (entanglement) entropy, which we’ve called holographic gravity and thermodynamic gravity in our paper. The difference between the two is in what system you’re considering, how you define the area, and what you mean by a change in that area.

In holographic gravity, you consider a region and define the area as that of its boundary, then consider various alternate configurations and histories of the matter in that region to see how the area would be different. Recent work in AdS/CFT, in which Einstein’s equation at linear order is equivalent to something called the “entanglement first law”, falls into the holographic gravity category. This idea has been extended to apply outside of AdS/CFT by Jacobson (2015). Crucially, Jacobson’s idea is to apply holographic mathematical technology to arbitrary quantum field theories in the bulk of spacetime (rather than specializing to conformal field theories – special physical models – on the boundary as in AdS/CFT) and thereby derive Einstein’s equation. However, in this work, Jacobson needed to make various assumptions about the entanglement structure of quantum field theories. In our paper, we showed how to justify many of those assumptions, applying recent results derived in quantum field theory (for experts, the form of the modular Hamiltonian and vacuum-subtracted entanglement entropy on null surfaces for general quantum field theories). Thus, we are able to show that the holographic gravity approach actually seems to work!

On the other hand, thermodynamic gravity is of a different character. Though it appears in various forms in the literature, we focus on the famous work of Jacobson (1995). In thermodynamic gravity, you don’t consider changing the entire spacetime configuration. Instead, you imagine a bundle of light rays – a lightsheet – in a particular dynamical spacetime background. As the light rays travel along – as you move down the lightsheet – the rays can be focused by curvature of the spacetime. Now, if the bundle of light rays started with a particular cross-sectional area, you’ll find a different area later on. In thermodynamic gravity, this is the change in area that goes into the derivation of Einstein’s equation. Next, one assumes that this change in area is equivalent to an entropy – in the usual black hole way with a factor of 1/(4 times Newton’s constant) – and that this entropy can be interpreted thermodynamically in terms of an energy flow through the lightsheet. The entropy vanishes from the derivation and the Einstein equation almost immediately appears as a thermodynamic equation of state. What we realized, however, is that what the entropy is actually the entropy of was ambiguous in thermodynamic gravity. Surprisingly, we found that there doesn’t seem to be a consistent definition of the entropy in thermodynamic gravity – applying quantum field theory results for the energy and entanglement entropy, we found that thermodynamic gravity could not simultaneously reproduce the correct constant in the Einstein equation and in the entropy/area relation for black holes.

So when all is said and done, we’ve found that holographic gravity, but not thermodynamic gravity, is on the right track. To answer our own question in the title of the paper, we found – in admittedly somewhat technical language – that the vacuum-subtracted von Neumann entropy evaluated on the null boundary of small causal diamonds gives a consistent formulation of holographic gravity. The future looks exciting for finding the connections between gravity and entanglement!

Tim GowersFUNC2 — more examples

The first “official” post of this Polymath project has passed 100 comments, so I think it is time to write a second post. Again I will try to extract some of the useful information from the comments (but not all, and my choice of what to include should not be taken as some kind of judgment). A good way of organizing this post seems to be to list a few more methods of construction of interesting union-closed systems that have come up since the last post, where “interesting” ideally means that the system is a counterexample to a conjecture that is not obviously false.

Standard “algebraic” constructions


If \mathcal A is a union-closed family on a ground set X, and Y\subset X, then we can take the family \mathcal A_Y=\{A\cap Y:A\in\mathcal{A}\}. The map \phi:A\to A\cap Y is a homomorphism (in the sense that \phi(A\cup B)=\phi(A)\cup\phi(B)), so it makes sense to regard \mathcal A_Y as a quotient of \mathcal A.


If instead we take an equivalence relation R on X, we can define a set-system \mathcal A( R) to be the set of all unions of equivalence classes that belong to \mathcal{A}.

Thus, subsets of X give quotient families and quotient sets of X give subfamilies.


Possibly the most obvious product construction of two families \mathcal A and \mathcal B is to make their ground sets disjoint and then to take \{A\cup B:A\in\mathcal A,B\in\mathcal B\}. (This is the special case with disjoint ground sets of the construction \mathcal A+\mathcal B that Tom Eccles discussed earlier.)

Note that we could define this product slightly differently by saying that it consists of all pairs (A,B)\in\mathcal A\times\mathcal B with the “union” operation (A,B)\sqcup(A',B')=(A\cup A',B\cup B'). This gives an algebraic system called a join semilattice, and it is isomorphic in an obvious sense to \mathcal A+\mathcal B with ordinary unions. Looked at this way, it is not so obvious how one should define abundances, because (\mathcal A\times\mathcal B,\sqcup) does not have a ground set. Of course, we can define them via the isomorphism to \mathcal A+\mathcal B but it would be nice to do so more intrinsically.
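As a rough illustration, the three constructions above (quotient by a subset, subfamily from an equivalence relation, and product on disjoint ground sets) can be tried out on a small family in Python. The function names `restrict`, `saturated_subfamily` and `disjoint_product`, and the example family itself, are assumptions made for this sketch and do not come from the comment threads.

```python
def is_union_closed(family):
    """Return True if the union of any two members is again a member."""
    fam = set(family)
    return all((a | b) in fam for a in fam for b in fam)


def restrict(family, Y):
    """Quotient family A_Y = {A & Y : A in family}."""
    Y = frozenset(Y)
    return {A & Y for A in family}


def saturated_subfamily(family, classes):
    """Members of the family that are unions of the given equivalence classes."""
    classes = [frozenset(c) for c in classes]
    # A set is a union of classes iff each class is inside it or disjoint from it.
    return {A for A in family if all(c <= A or not (c & A) for c in classes)}


def disjoint_product(fam_a, fam_b):
    """All unions A | B; assumes the two ground sets are disjoint."""
    return {A | B for A in fam_a for B in fam_b}


# A small union-closed example family on the ground set {1, 2, 3}.
family = {frozenset(s) for s in [(), (1,), (2,), (1, 2), (1, 3), (1, 2, 3)]}

quotient = restrict(family, {2, 3})
sub = saturated_subfamily(family, [{1}, {2, 3}])
prod = disjoint_product(family, {frozenset(), frozenset({"x"})})

print(is_union_closed(quotient), is_union_closed(sub), is_union_closed(prod))
```

In this example the quotient collapses six sets to four and the subfamily keeps three of them, which matches the remark that subsets of X give quotient families and quotient sets of X give subfamilies.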

More “twisted” products

Tobias Fritz, in this comment, defines a more general “fibre bundle” construction as follows. Let \mathcal K be a union-closed family of sets (the “base” of the system). For each K\in\mathcal K let \mathcal A_K be a union-closed family (one of the “fibres”), and let the elements of \mathcal A consist of pairs (K,A) with A\in\mathcal A_K. We would like to define a join operation \sqcup on \mathcal A by
(K,A)\sqcup(L,B)=(K\cup L,C)
for a suitable C\in\mathcal A_{K\cup L}. For that we need a bit more structure, in the form of homomorphisms \phi_{K,L}:\mathcal A_K\to\mathcal A_L whenever K\subset L. These should satisfy the obvious composition rule \phi_{L,M}\phi_{K,L}=\phi_{K,M}.

With that structure in place, we can take C to be \phi_{K,K\cup L}(A)\cup\phi_{L,K\cup L}(B), and we have something like a union-closed system. To turn it into a union-closed system one needs to find a concrete realization of this “join semilattice” as a set system with the union operation. This can be done in certain cases (see the comment thread linked to above) and quite possibly in all cases.
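To make the join operation concrete, here is a toy instance in Python. Everything specific in it is an assumption chosen for illustration: the base is the power set of {1, 2}, every fibre is also the power set of {1, 2}, and the transition map `phi(K, L)` sends A to A together with the elements of L not in K. This map preserves unions and satisfies the composition rule, but it is not claimed to be one of the examples from the comment thread.

```python
from itertools import combinations

GROUND = frozenset({1, 2})


def subsets(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


BASE = subsets(GROUND)    # the base family: all subsets of {1, 2}
FIBRE = subsets(GROUND)   # every fibre is the same power set (toy choice)


def phi(K, L):
    """Hypothetical transition map for K inside L: A goes to A plus (L minus K)."""
    assert K <= L
    return lambda A: A | (L - K)


def join(x, y):
    """(K, A) join (L, B) = (K | L, phi_{K,K|L}(A) | phi_{L,K|L}(B))."""
    (K, A), (L, B) = x, y
    M = K | L
    return (M, phi(K, M)(A) | phi(L, M)(B))


ELEMENTS = [(K, A) for K in BASE for A in FIBRE]

# The join should be commutative, idempotent and associative.
commutative = all(join(x, y) == join(y, x) for x in ELEMENTS for y in ELEMENTS)
idempotent = all(join(x, x) == x for x in ELEMENTS)
associative = all(join(join(x, y), z) == join(x, join(y, z))
                  for x in ELEMENTS for y in ELEMENTS for z in ELEMENTS)
print(commutative, idempotent, associative)
```

The three checks confirm only that this toy structure is a join semilattice. Realizing it as a set system with ordinary unions is the separate step discussed above.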

More specific constructions

Giving more weight to less abundant elements

First, here is a simple construction that shows that Conjecture 6 from the previous post is false. That conjecture states that if you choose a random non-empty A\in\mathcal A and then a random x\in A, then the average abundance of x is at least 1/2. It never seemed likely to be true, but it survived for a surprisingly long time, before the following example was discovered in a comment thread that starts here.

Let m be a large integer and let A,B,C be disjoint sets of size 1, m and m^2. (Many details here are unimportant — for example, all that actually matters is that the sizes of the sets should increase fairly rapidly.) Now take the set system

\{\emptyset, A, B, A\cup B, A\cup C, A\cup B\cup C\}.

To see that this is a counterexample, let us pick our random element x of a random set, and then condition on the five possibilities for what that set is. I’ll do a couple of the calculations and then just state the rest. If the set is A, then the abundance of x is 2/3. If it is B, then the abundance is 1/2. If it is A\cup B, then the probability that x is in A is (m+1)^{-1}, which is very small, so its abundance is very close to 1/2 (since with high probability the only three sets it belongs to are B, A\cup B, and A\cup B\cup C). In this kind of way we get that for large enough m we can make the average abundance as close as we like to

\frac 15(2/3 + 1/2 + 1/2 + 1/3 + 1/3)=7/15<1/2.
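The calculation above can be checked exactly for any particular m with a short Python sketch. The function name `average_abundance` and the trick of tracking the three blocks A, B, C by their sizes (rather than listing every element) are choices made here for convenience.

```python
from fractions import Fraction


def average_abundance(m):
    """Average abundance of a random element of a random non-empty set
    in the family {0, A, B, A+B, A+C, A+B+C} with |A|=1, |B|=m, |C|=m^2."""
    sizes = {"A": 1, "B": m, "C": m * m}
    family = [frozenset(s) for s in ["", "A", "B", "AB", "AC", "ABC"]]

    # Every element of a block has the same abundance: the fraction of
    # sets in the family (including the empty set) that contain the block.
    abundance = {b: Fraction(sum(b in S for S in family), len(family))
                 for b in sizes}

    nonempty = [S for S in family if S]
    total = Fraction(0)
    for S in nonempty:
        size = sum(sizes[b] for b in S)
        # Expected abundance of a uniformly random element of S.
        total += sum(sizes[b] * abundance[b] for b in S) / size
    return total / len(nonempty)


for m in (1, 10, 100, 1000):
    value = average_abundance(m)
    print(m, float(value), value < Fraction(1, 2))
```

For m = 1 the average is 11/20, which is above 1/2, so the rapid growth of the set sizes really is what pushes the average down towards 7/15.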

One thing I would like to do (or would like someone to do) is come up with a refinement of this conjecture that isn’t so obviously false. What this example demonstrates is that, because elements can be duplicated, for the conjecture to have been true the following apparently much stronger statement would have had to be true. For each non-empty A\in\mathcal{A}, let m(A) be the minimum abundance of any element of A. Then the average of m(A) over \mathcal A is at least 1/2.

How can we convert the average over A into the minimum over A? The answer is simple: take the original set system \mathcal A and write the elements of the ground set in decreasing order of abundance. Now duplicate the first element (that is, the element with greatest abundance) once, the second element m times, the third m^2 times, and so on. For very large m, the effect of this is that if we choose a random element of A (after the duplications have taken place) then it will have minimal abundance in A.
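Here is a small Python sketch of that duplication argument for an arbitrary finite family. The function names are hypothetical, and the multiplicities 1, m, m^2, ... are applied as weights rather than by literally copying elements, which gives the same averages because duplicates of an element lie in exactly the same sets.

```python
from fractions import Fraction


def abundances(family):
    """Abundance of each ground-set element: fraction of sets containing it."""
    family = list(family)
    ground = set().union(*family)
    return {x: Fraction(sum(x in A for A in family), len(family))
            for x in ground}


def average_min_abundance(family):
    """Average over non-empty A of the minimum abundance m(A)."""
    ab = abundances(family)
    nonempty = [A for A in family if A]
    return sum(min(ab[x] for x in A) for A in nonempty) / len(nonempty)


def average_after_duplication(family, m):
    """Average abundance of a random element of a random non-empty set,
    after the i-th most abundant element is given multiplicity m**i."""
    ab = abundances(family)
    order = sorted(ab, key=lambda x: ab[x], reverse=True)
    copies = {x: m ** i for i, x in enumerate(order)}
    nonempty = [A for A in family if A]
    total = Fraction(0)
    for A in nonempty:
        weight = sum(copies[x] for x in A)
        total += sum(copies[x] * ab[x] for x in A) / weight
    return total / len(nonempty)


# The skeleton of the counterexample above, with one element per block.
family = [frozenset(s) for s in ["", "a", "b", "ab", "ac", "abc"]]
print(average_min_abundance(family))
print(float(average_after_duplication(family, 1000)))
```

As m grows, the second number approaches the first, which is the sense in which the average over A turns into the minimum over A.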

So it seems that duplication of elements kills off this averaging argument too, but in a slightly subtler way. Could we somehow iterate this thought? For example, could we choose a random x by first picking a random non-empty A\in\mathcal A, then a random B\in\mathcal A such that A\cap B\ne\emptyset, and finally a random element x\in A\cap B? And could we go further — e.g., picking a random chain of the form A_1, A_1\cap A_2, A_1\cap A_2\cap A_3, etc., and stopping when we reach a set whose points cannot be separated further?

Complements of designs

Tobias Fritz came up with a nice strengthening that also turned out (again as expected) to be false. The thought was that it might be nice to find a “bijective” proof of FUNC. Defining \mathcal A_x to be \{A\in\mathcal A:x\in A\} and \mathcal{A}_{\overline x} to be \mathcal A\setminus\mathcal A_x, we would prove FUNC for \mathcal A if we could find, for some x, an injection from \mathcal A_{\overline x} to \mathcal A_x.

For such an argument to qualify as a proper bijective proof, it is not enough merely to establish the existence of an injection — that follows from FUNC on mere grounds of cardinality. Rather, one should define it in a nice way somehow. That makes it natural to think about what properties such an injection might have, and a particularly natural requirement that one might think about is that it should preserve unions.

It turns out that there are set systems \mathcal A for which there does not exist any x with a union-preserving injection from \mathcal A_{\overline x} to \mathcal A_x. After several failed attempts, I found the following example. Take a not too small pair of positive integers r>s — it looks as though r=5, s=3 works. Then take a Steiner (r,s)-system — that is, a collection \mathcal K of sets of size 5 such that each set of size 3 is contained in exactly one set from \mathcal K. (Work of Peter Keevash guarantees that such a set system exists, though this case was known before his amazing result.)

The counterexample is generated by all complements of sets in \mathcal K, though it is more convenient just to take \mathcal K and prove that there is no intersection-preserving injection from \mathcal K_x to \mathcal K_{\overline x}. To establish this, one first proves that any such injection would have to take sets of size r to sets of size r, which is basically because you need room for all the subsets of size s of a set K to map to distinct subsets of the image of K. Once that is established, it is fairly straightforward to show that there just isn’t room to do things. The argument can be found in the comment linked to above, and the thread below it.

An example of Thomas Bloom

Thomas Bloom came up with a simpler example, which is interesting for other reasons too. His example is generated by the sets \{1,2,3,4,5,6\}, all 2-subsets of \{7,8,9,10\}, and the 6 sets \{1,7,8\}, \{2,7,9\}, \{3,7,10\}, \{4,8,9\}, \{5,8,10\}, \{6,9,10\}. I asked him where this set system had come from, and the answer turned out to be very interesting. He had got it by staring at an example of Renaud and Sarvate of a union-closed set system with exactly one minimal-sized set, which has size 3, such that that minimal set contains no element of abundance at least 1/2. Thomas worked out how the Renaud-Sarvate example had been pieced together, and used similar ideas to produce his example. Tobias Fritz then went on to show that Thomas’s construction was a special case of his fibre-bundle construction.


This post is by no means a comprehensive account of all the potentially interesting ideas from the last post. For example, Gil Kalai has an interesting slant on the conjecture that I think should be pursued further, and there are a number of interesting questions that were asked in the previous comment thread that I have not repeated here, mainly because the post has taken a long time to write and I think it is time to post it.

Doug NatelsonWhat is density functional theory? part 3 - pitfalls and perils

As I've said, DFT proves that the electron density as a function of position contains basically all the information about the ground state (very cool and very non-obvious).  DFT has become of enormous practical use because one can use simple noninteracting electronic states plus the right functional (which unfortunately we can't write down in simple, easy-to-compute closed form, but we can choose various approximations) to find (a very good approximation to) the true, interacting density.

So, what's the problem, beyond the obvious issues of computing efficiency and the fact that we don't know how to write down an exact form for the exchange-correlation part of the functional (basically where all the bodies are buried)?  

Well, the noninteracting states that people like to use, the so-called Kohn-Sham orbitals, are seductive.  It's easy to think of them as if they are "real", meaning that it's very tempting to start using them to think about excited states and where the electrons "really" live in those states, even though technically there is no a priori reason that they should be valid except as a tool to find the ground state density.  This is discussed a bit in the comments here.  This isn't a completely crazy idea, in the sense that the Kohn-Sham states usually have the right symmetries and in molecules tend to agree well with chemistry ideas about where reactions tend to occur, etc.  However, there are no guarantees.

There are many approaches to do better (e.g., some statements that can be made about the lowest unoccupied orbital that let you determine not just the ground state energy but get a quantitative estimate of the gap to the lowest electronic excited state, and that has enabled very good computations of energy gaps in molecules and solids; time-dependent DFT, which looks at the general time-dependent electron density).  However, you have to be very careful.  Perhaps commenters will have some insights here.  

The bottom line:  DFT is intellectually deep, a boon to many practical calculations when implemented correctly, and so good at many things that the temptation is to treat it like a black box (especially as there are more and more simple-to-use commercial implementations) and assume it's good at everything.  It remains an impressive achievement with huge scientific impact, and unless there are major advances in other computational approaches, DFT and its relatives are likely the best bet for achieving the long-desired ability to do "materials by design".  

February 07, 2016

John BaezRumors of Gravitational Waves

The Laser Interferometer Gravitational-Wave Observatory or LIGO is designed to detect gravitational waves: ripples of curvature in spacetime moving at the speed of light. It’s recently been upgraded, and it will either find gravitational waves soon or something really strange is going on.

Rumors are swirling that LIGO has seen gravitational waves produced by two black holes, of 29 and 36 solar masses, spiralling towards each other—and then colliding to form a single 62-solar-mass black hole!

You’ll notice that 29 + 36 is more than 62. So, it’s possible that three solar masses were turned into energy, mostly in the form of gravitational waves!

According to these rumors, the statistical significance of the signal is supposedly very high: better than 5 sigma! That means there’s at most a 0.000057% probability this event is a random fluke – assuming nobody made a mistake.
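The two numbers in the rumor are easy to sanity-check. The sketch below converts the missing three solar masses into energy with E = mc^2, and computes the two-sided Gaussian tail probability for 5 sigma. The constants are approximate values assumed here, and treating the significance as a simple two-sided Gaussian tail is also an assumption about how the quoted percentage was obtained.

```python
import math

SOLAR_MASS_KG = 1.989e30   # approximate solar mass (assumed value)
SPEED_OF_LIGHT = 2.998e8   # metres per second (assumed approximate value)


def radiated_energy_joules(m1, m2, m_final):
    """Energy equivalent of the mass deficit, with masses in solar masses."""
    deficit = m1 + m2 - m_final
    return deficit * SOLAR_MASS_KG * SPEED_OF_LIGHT ** 2


def two_sided_p_value(n_sigma):
    """Probability that a Gaussian variable lies more than n_sigma from its mean."""
    return math.erfc(n_sigma / math.sqrt(2))


energy = radiated_energy_joules(29, 36, 62)
p = two_sided_p_value(5)
print(f"energy radiated: {energy:.2e} J")
print(f"5 sigma two-sided p-value: {p:.2e} = {100 * p:.6f}%")
```

The p-value comes out near 5.7 × 10^-7, which is the 0.000057% quoted above.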

If these rumors are correct, we should soon see an official announcement. If the discovery holds up, someone will win a Nobel prize.

The discovery of gravitational waves is completely unsurprising, since they’re predicted by general relativity, a theory that’s passed many tests already. But it would open up a new window to the universe – and we’re likely to see interesting new things, once gravitational wave astronomy becomes a thing.

Here’s the tweet that launched the latest round of rumors:


For background on this story, try this:

Tale of a doomed galaxy, Azimuth, 8 November 2015.

The first four sections of that long post discuss gravitational waves created by black hole collisions—but the last section is about LIGO and an earlier round of rumors, so I’ll quote it here!

LIGO stands for Laser Interferometer Gravitational Wave Observatory. The idea is simple. You shine a laser beam down two very long tubes and let it bounce back and forth between mirrors at the ends. You use this to compare the lengths of these tubes. When a gravitational wave comes by, it stretches space in one direction and squashes it in another direction. So, we can detect it.

Sounds easy, eh? Not when you run the numbers! We’re trying to see gravitational waves that stretch space just a tiny bit: about one part in 10^23. At LIGO, the tubes are 4 kilometers long. So, we need to see their length change by an absurdly small amount: one-thousandth the diameter of a proton!

It’s amazing to me that people can even contemplate doing this, much less succeed. They use lots of tricks:

• They bounce the light back and forth many times, effectively increasing the length of the tubes to 1800 kilometers.

• There’s no air in the tubes—just a very good vacuum.

• They hang the mirrors on quartz fibers, making each mirror part of a pendulum with very little friction. This means it vibrates very well at one particular frequency, and very badly at frequencies far from that. This damps out the shaking of the ground, which is a real problem.

• This pendulum is hung on another pendulum.

• That pendulum is hung on a third pendulum.

• That pendulum is hung on a fourth pendulum.

• The whole chain of pendulums is sitting on a device that detects vibrations and moves in a way to counteract them, sort of like noise-cancelling headphones.

• There are 2 of these facilities, one in Livingston, Louisiana and another in Hanford, Washington. Only if both detect a gravitational wave do we get excited.

I visited the LIGO facility in Louisiana in 2006. It was really cool! Back then, the sensitivity was good enough to see collisions of black holes and neutron stars up to 50 million light years away.

Here I’m not talking about the supermassive black holes that live in the centers of galaxies. I’m talking about the much more common black holes and neutron stars that form when stars go supernova. Sometimes a pair of stars orbiting each other will both blow up, and form two black holes—or two neutron stars, or a black hole and neutron star. And eventually these will spiral into each other and emit lots of gravitational waves right before they collide.

50 million light years is big enough that LIGO could see about half the galaxies in the Virgo Cluster. Unfortunately, with that many galaxies, we only expect to see one neutron star collision every 50 years or so.

They never saw anything. So they kept improving the machines, and now we’ve got Advanced LIGO! This should now be able to see collisions up to 225 million light years away… and after a while, three times further.

They turned it on September 18th. Soon we should see more than one gravitational wave burst each year.

In fact, there’s a rumor that they’ve already seen one! But they’re still testing the device, and there’s a team whose job is to inject fake signals, just to see if they’re detected. Davide Castelvecchi writes:

LIGO is almost unique among physics experiments in practising ‘blind injection’. A team of three collaboration members has the ability to simulate a detection by using actuators to move the mirrors. “Only they know if, and when, a certain type of signal has been injected,” says Laura Cadonati, a physicist at the Georgia Institute of Technology in Atlanta who leads the Advanced LIGO’s data-analysis team.

Two such exercises took place during earlier science runs of LIGO, one in 2007 and one in 2010. Harry Collins, a sociologist of science at Cardiff University, UK, was there to document them (and has written books about it). He says that the exercises can be valuable for rehearsing the analysis techniques that will be needed when a real event occurs. But the practice can also be a drain on the team’s energies. “Analysing one of these events can be enormously time consuming,” he says. “At some point, it damages their home life.”

The original blind-injection exercises took 18 months and 6 months respectively. The first one was discarded, but in the second case, the collaboration wrote a paper and held a vote to decide whether they would make an announcement. Only then did the blind-injection team ‘open the envelope’ and reveal that the events had been staged.

Aargh! The disappointment would be crushing.

But with luck, Advanced LIGO will soon detect real gravitational waves. And I hope life here in the Milky Way thrives for a long time – so that when the gravitational waves from the doomed galaxy PG 1302-102 reach us, hundreds of thousands of years in the future, we can study them in exquisite detail.

For Castelvecchi’s whole story, see:

• Davide Castelvecchi, Has giant LIGO experiment seen gravitational waves?, Nature, 30 September 2015.

For pictures of my visit to LIGO, see:

• John Baez, This week’s finds in mathematical physics (week 241), 20 November 2006.

For how Advanced LIGO works, see:

• The LIGO Scientific Collaboration, Advanced LIGO, 17 November 2014.

Scott Aaronson“Why does the universe exist?” … finally answered (or dissolved) in this blog post!

In my previous post, I linked to seven Closer to Truth videos of me spouting about free will, Gödel’s Theorem, black holes, etc. etc.  I also mentioned that there was a segment of me talking about why the universe exists that for some reason they didn’t put up.  Commenter mjgeddes wrote, “Would have liked to hear your views on the existence of the universe question,” so I answered in another comment.

But then I thought about it some more, and it seemed inappropriate to me that my considered statement about why the universe exists should only be available as part of a comment thread on my blog.  At the very least, I thought, such a thing ought to be a top-level post.

So, without further ado:

My view is that, if we want to make mental peace with the “Why does the universe exist?” question, the key thing we need to do is forget about the universe for a while, and just focus on the meaning of the word “why.”  I.e., when we ask a why-question, what kind of answer are we looking for, what kind of answer would make us happy?

Notice, in particular, that there are hundreds of other why-questions, not nearly as prestigious as the universe one, yet that seem just as vertiginously unanswerable.  E.g., why is 5 a prime number?  Why does “cat” have 3 letters?

Now, the best account of “why”—and of explanation and causality—that I know about is the interventionist account, as developed for example in Judea Pearl’s work.  In that account, to ask “Why is X true?” is simply to ask: “What could we have changed in order to make X false?”  I.e., in the causal network of reality, what are the levers that turn X on or off?

This question can sometimes make sense even in pure math.  For example: “Why is this theorem true?” “It’s true only because we’re working over the complex numbers.  The analogous statement about real numbers is false.”  A perfectly good interventionist answer.

On the other hand, in the case of “Why is 5 prime?,” all the levers you could pull to make 5 composite involve significantly more advanced machinery than is needed to pose the question in the first place.  E.g., “5 is prime because we’re working over the ring of integers.  Over other rings, like Z[√5], it admits nontrivial factorizations.”  Not really an explanation that would satisfy a four-year-old (or me, for that matter).

And then we come to the question of why anything exists.  For an interventionist, this translates into: what causal lever could have been pulled in order to make nothing exist?  Well, whatever lever it was, presumably the lever itself was something—and so you see the problem right there.

Admittedly, suppose there were a giant red button, somewhere within the universe, that when pushed would cause the entire universe (including the button itself) to blink out of existence. In that case, we could say: the reason why the universe continues to exist is that no one has pushed the button yet. But even then, that still wouldn’t explain why the universe had existed.

February 06, 2016

Clifford JohnsonOn Zero Matter

Over at Marvel, I chatted with actor Reggie Austin (Dr. Jason Wilkes on Agent Carter) some more about the physics I helped embed in the show this season. It was fun. (See an earlier chat here.) This was about Zero Matter itself (which will also be a precursor to things seen in the movie Dr. Strange later this year)... It was one of the first things the writers asked me about when I first met them, and we brainstormed about things like what it should be called (the name "dark force" comes later in Marvel history), and how a scientist who encountered it would contain it. This got me thinking about things like perfect fluids, plasma physics, exotic phases of materials, magnetic fields, and the like (sadly the interview skips a lot of what I said about those)... and to the writers' and show-runners' enormous credit, lots of these concepts were allowed to appear in the show in various ways, including (versions of) two containment designs that I sketched out. Anyway, have a look in the embed below.

Oh! The name. We did not settle on a name after the first meeting, but one of [...] Click to continue reading this post

The post On Zero Matter appeared first on Asymptotia.

John BaezInformation Geometry (Part 15)

It’s been a long time since you’ve seen an installment of the information geometry series on this blog! Before I took a long break, I was explaining relative entropy and how it changes in evolutionary games. Much of what I said is summarized and carried further here:

• John Baez and Blake Pollard, Relative entropy in biological systems. (Blog article here.)

But now Blake has a new paper, and I want to talk about that:

• Blake Pollard, Open Markov processes: a compositional perspective on non-equilibrium steady states in biology, to appear in Open Systems and Information Dynamics.

I’ll focus on just one aspect: the principle of minimum entropy production. This is an exciting yet controversial principle in non-equilibrium thermodynamics. Blake examines it in a situation where we can tell exactly what’s happening.

Non-equilibrium steady states

Life exists away from equilibrium. Left isolated, systems will tend toward thermodynamic equilibrium. However, biology is about open systems: physical systems that exchange matter or energy with their surroundings. Open systems can be maintained away from equilibrium by this exchange. This leads to the idea of a non-equilibrium steady state—a state of an open system that doesn’t change, but is not in equilibrium.

A simple example is a pan of water sitting on a stove. Heat passes from the flame to the water and then to the air above. If the flame is very low, the water doesn’t boil and nothing moves. So, we have a steady state, at least approximately. But this is not an equilibrium, because there is a constant flow of energy through the water.

Of course in reality the water will be slowly evaporating, so we don’t really have a steady state. As always, models are approximations. If the water is evaporating slowly enough, it can be useful to approximate the situation with a non-equilibrium steady state.

There is much more to biology than steady states. However, to dip our toe into the chilly waters of non-equilibrium thermodynamics, it is nice to start with steady states. And already here there are puzzles left to solve.

Minimum entropy production

Ilya Prigogine won the Nobel prize for his work on non-equilibrium thermodynamics. One reason is that he had an interesting idea about steady states. He claimed that under certain conditions, a non-equilibrium steady state will minimize entropy production!

There has been a lot of work trying to make the ‘principle of minimum entropy production’ precise and turn it into a theorem. In this book:

• G. Lebon and D. Jou, Understanding Non-equilibrium Thermodynamics, Springer, Berlin, 2008.

the authors give an argument for the principle of minimum entropy production based on four conditions:

time-independent boundary conditions: the surroundings of the system don’t change with time.

linear phenomenological laws: the laws governing the macroscopic behavior of the system are linear.

constant phenomenological coefficients: the laws governing the macroscopic behavior of the system don’t change with time.

symmetry of the phenomenological coefficients: since they are linear, the laws governing the macroscopic behavior of the system can be described by a linear operator, and we demand that in a suitable basis the matrix for this operator is symmetric: T_{ij} = T_{ji}.

The last condition is obviously the subtlest one; it’s sometimes called Onsager reciprocity, and people have spent a lot of time trying to derive it from other conditions.

However, Blake goes in a different direction. He considers a concrete class of open systems, a very large class called ‘open Markov processes’. These systems obey the first three conditions listed above, and the ‘detailed balanced’ open Markov processes also obey the last one. But Blake shows that minimum entropy production holds only approximately—with the approximation being good for steady states that are near equilibrium!

However, he shows that another minimum principle holds exactly, even for steady states that are far from equilibrium. He calls this the ‘principle of minimum dissipation’.

We actually discussed the principle of minimum dissipation in an earlier paper:

• John Baez, Brendan Fong and Blake Pollard, A compositional framework for Markov processes. (Blog article here.)

But one advantage of Blake’s new paper is that it presents the results with a minimum of category theory. Of course I love category theory, and I think it’s the right way to formalize open systems, but it can be intimidating.

Another good thing about Blake’s new paper is that it explicitly compares the principle of minimum entropy production to the principle of minimum dissipation. He shows they agree in a certain limit: namely, the limit where the system is close to equilibrium.

Let me explain this. I won’t include the nice example from biology that Blake discusses: a very simple model of membrane transport. For that, read his paper! I’ll just give the general results.

The principle of minimum dissipation

An open Markov process consists of a finite set X of states, a subset B \subseteq X of boundary states, and an infinitesimal stochastic operator H: \mathbb{R}^X \to \mathbb{R}^X, meaning a linear operator with

H_{ij} \geq 0 \ \  \text{for all} \ \ i \neq j

and

\sum_i H_{ij} = 0 \ \  \text{for all} \ \ j

I’ll explain these two conditions in a minute.

For each i \in X we introduce a population p_i  \in [0,\infty). We call the resulting function p : X \to [0,\infty) the population distribution. Populations evolve in time according to the open master equation:

\displaystyle{ \frac{dp_i}{dt} = \sum_j H_{ij}p_j} \ \  \text{for all} \ \ i \in X-B

p_i(t) = b_i(t) \ \  \text{for all} \ \ i \in B

So, the populations p_i obey a linear differential equation at states i that are not in the boundary, but they are specified ‘by the user’ to be chosen functions b_i at the boundary states.

The off-diagonal entries H_{ij}, \ i \neq j are the rates at which population hops from the jth to the ith state. This lets us understand the definition of an infinitesimal stochastic operator. The first condition:

H_{ij} \geq 0 \ \  \text{for all} \ \ i \neq j

says that the rate for population to transition from one state to another is non-negative. The second:

\sum_i H_{ij} = 0 \ \  \text{for all} \ \ j

says that population is conserved, at least if there are no boundary states. Population can flow in or out at boundary states, since the master equation doesn’t hold there.
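To make the two conditions and the open master equation concrete, here is a minimal Python sketch. Everything named in it is an illustrative assumption rather than something from Blake's paper: the helpers `is_infinitesimal_stochastic` and `open_master_step`, the 3-state matrix `H`, the constant boundary populations, and the use of a crude forward-Euler time step.

```python
def is_infinitesimal_stochastic(H, tol=1e-12):
    """Check H_ij >= 0 for i != j and that every column of H sums to zero."""
    n = len(H)
    off_diag_ok = all(H[i][j] >= 0 for i in range(n) for j in range(n) if i != j)
    columns_ok = all(abs(sum(H[i][j] for i in range(n))) <= tol for j in range(n))
    return off_diag_ok and columns_ok


def open_master_step(H, p, boundary, dt):
    """One forward-Euler step of the open master equation.

    Interior states follow dp_i/dt = sum_j H_ij p_j; boundary states are
    left untouched, so here they play the role of constant functions b_i.
    """
    n = len(p)
    new_p = list(p)
    for i in range(n):
        if i not in boundary:
            new_p[i] = p[i] + dt * sum(H[i][j] * p[j] for j in range(n))
    return new_p


# Hypothetical 3-state chain 0 - 1 - 2; H[i][j] is the rate of hopping from j to i.
H = [[-2.0, 1.0, 0.0],
     [2.0, -2.0, 2.0],
     [0.0, 1.0, -2.0]]
boundary = {0, 2}      # populations at these states are set 'by the user'

p = [1.0, 0.0, 3.0]    # b_0 = 1 and b_2 = 3, interior population starts at 0
for _ in range(2000):
    p = open_master_step(H, p, boundary, dt=0.01)

print(is_infinitesimal_stochastic(H), p)
```

In this toy example the interior population settles where 2 p_0 - 2 p_1 + 2 p_2 = 0, that is, near p_1 = 4, while population keeps flowing in and out through the two boundary states.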

A steady state is a solution of the open master equation that does not change with time. A steady state for a closed Markov process is typically called an equilibrium. So, an equilibrium obeys the master equation at all states, while for a steady state this may not be true at the boundary states. Again, the reason is that population can flow in or out at the boundary.

We say an equilibrium q : X \to [0,\infty) of a Markov process is detailed balanced if the rate at which population flows from the ith state to the jth state is equal to the rate at which it flows from the jth state to the ith:

H_{ji}q_i = H_{ij}q_j \ \  \text{for all} \ \ i,j \in X

Suppose we’ve got an open Markov process that has a detailed balanced equilibrium q. Then a non-equilibrium steady state p will minimize a function called the ‘dissipation’, subject to constraints on its boundary populations. There’s a nice formula for the dissipation in terms of p and q.

Definition. Given an open Markov process with detailed balanced equilibrium q we define the dissipation for a population distribution p to be

\displaystyle{ D(p) = \frac{1}{2}\sum_{i,j} H_{ij}q_j \left( \frac{p_j}{q_j} - \frac{p_i}{q_i} \right)^2 }

This formula is a bit tricky, but you’ll notice it’s quadratic in p and it vanishes when p = q. So, it’s pretty nice.

Using this concept we can formulate a principle of minimum dissipation, and prove that non-equilibrium steady states obey this principle:

Definition. We say a population distribution p: X \to \mathbb{R} obeys the principle of minimum dissipation with boundary population b: X \to \mathbb{R} if p minimizes D(p) subject to the constraint that

p_i = b_i \ \  \text{for all} \ \ i \in B.

Theorem 1. A population distribution p is a steady state with p_i = b_i for all boundary states i if and only if p obeys the principle of minimum dissipation with boundary population b.

Proof. This follows from Theorem 28 in A compositional framework for Markov processes.
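As a sanity check on Theorem 1, the following Python sketch evaluates the dissipation for a small example and compares the steady state against nearby population distributions with the same boundary values. The matrix `H`, the equilibrium `q`, the boundary values and the function name `dissipation` are assumptions chosen for illustration; a check on one example is of course no substitute for the proof.

```python
def dissipation(H, q, p):
    """D(p) = 1/2 * sum_{i,j} H_ij q_j (p_j/q_j - p_i/q_i)^2."""
    n = len(p)
    return 0.5 * sum(
        H[i][j] * q[j] * (p[j] / q[j] - p[i] / q[i]) ** 2
        for i in range(n) for j in range(n)
    )


# Hypothetical 3-state chain; H[i][j] is the rate of hopping from j to i.
H = [[-2.0, 1.0, 0.0],
     [2.0, -2.0, 2.0],
     [0.0, 1.0, -2.0]]
q = [1.0, 2.0, 1.0]    # detailed balanced: H[j][i] * q[i] == H[i][j] * q[j]

# Boundary states 0 and 2 with b_0 = 1, b_2 = 3; state 1 is the only interior state.
b0, b2 = 1.0, 3.0

# Steady state: the master equation holds at the interior state,
# sum_j H[1][j] p_j = 0, which we solve for p_1.
p1_steady = -(H[1][0] * b0 + H[1][2] * b2) / H[1][1]

d_steady = dissipation(H, q, [b0, p1_steady, b2])
d_nearby = [dissipation(H, q, [b0, p1_steady + shift, b2])
            for shift in (-0.5, -0.1, 0.1, 0.5)]

print(p1_steady, d_steady, d_nearby)
```

Every entry of `d_nearby` should exceed `d_steady`, as the principle of minimum dissipation predicts, and `dissipation(H, q, q)` is zero.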

Minimum entropy production versus minimum dissipation

How does dissipation compare with entropy production? To answer this, first we must ask: what really is entropy production? And: how does the equilibrium state q show up in the concept of entropy production?

The relative entropy of two population distributions p,q is given by

\displaystyle{ I(p,q) = \sum_i p_i \ln \left( \frac{p_i}{q_i} \right) }

It is well known that for a closed Markov process with q as a detailed balanced equilibrium, the relative entropy is monotonically decreasing with time. This is due to an annoying sign convention in the definition of relative entropy: while entropy is typically increasing, relative entropy typically decreases. We could fix this by putting a minus sign in the above formula or giving this quantity I(p,q) some other name. A lot of people call it the Kullback–Leibler divergence, but I have taken to calling it relative information. For more, see:

• John Baez and Blake Pollard, Relative entropy in biological systems. (Blog article here.)

We say ‘relative entropy’ in the title, but then we explain why ‘relative information’ is a better name, and use that. More importantly, we explain why I(p,q) has the physical meaning of free energy. Free energy tends to decrease, so everything is okay. For details, see Section 4.

Blake has a nice formula for how fast I(p,q) decreases:

Theorem 2. Consider an open Markov process with X as its set of states and B as the set of boundary states. Suppose p(t) obeys the open master equation and q is a detailed balanced equilibrium. For any boundary state i \in B, let

\displaystyle{ \frac{Dp_i}{Dt} = \frac{dp_i}{dt} - \sum_{j \in X} H_{ij}p_j }

measure how much p_i fails to obey the master equation. Then we have

\begin{array}{ccl}   \displaystyle{  \frac{d}{dt}  I(p(t),q) } &=& \displaystyle{ \sum_{i, j \in X} H_{ij} p_j  \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right)} \\ \\ && \; + \; \displaystyle{ \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} }  \end{array}

Moreover, the first term is less than or equal to zero.

Proof. For a self-contained proof, see Information geometry (part 16), which is coming up soon. It will be a special case of the theorems there.   █

Blake compares this result to previous work by Schnakenberg:

• J. Schnakenberg, Network theory of microscopic and macroscopic behavior of master equation systems, Rev. Mod. Phys. 48 (1976), 571–585.

The negative of Blake’s first term is this:

\displaystyle{ K(p) = - \sum_{i, j \in X} H_{ij} p_j  \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right) }

Under certain circumstances, this equals what Schnakenberg calls the entropy production. But a better name for this quantity might be free energy loss, since for a closed Markov process that’s exactly what it is! In this case there are no boundary states, so the theorem above says K(p) is the rate at which relative entropy—or in other words, free energy—decreases.

For an open Markov process, things are more complicated. The theorem above shows that free energy can also flow in or out at the boundary, thanks to the second term in the formula.

Anyway, the sensible thing is to compare a principle of ‘minimum free energy loss’ to the principle of minimum dissipation. The principle of minimum dissipation is true. How about the principle of minimum free energy loss? It turns out to be approximately true near equilibrium.

For this, consider the situation in which p is near to the equilibrium distribution q in the sense that

\displaystyle{ \frac{p_i}{q_i} = 1 + \epsilon_i }

for some small numbers \epsilon_i. We collect these numbers in a vector called \epsilon.

Theorem 3. Consider an open Markov process with X as its set of states and B as the set of boundary states. Suppose q is a detailed balanced equilibrium and let p be arbitrary. Then

K(p) = D(p) + O(\epsilon^2)

where K(p) is the free energy loss, D(p) is the dissipation, \epsilon_i is defined as above, and by O(\epsilon^2) we mean a sum of terms of order \epsilon_i^2.

Proof. First take the free energy loss:

\displaystyle{ K(p) = -\sum_{i, j \in X} H_{ij} p_j  \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right)}

Expanding the logarithm to first order in \epsilon, we get

\displaystyle{ K(p) =  -\sum_{i, j \in X} H_{ij} p_j  \left( \frac{p_i}{q_i} - 1 - \frac{p_i q_j}{p_j q_i} \right) + O(\epsilon^2) }

Since H is infinitesimal stochastic, \sum_i H_{ij} = 0, so the second term in the sum vanishes, leaving

\displaystyle{ K(p) =  -\sum_{i, j \in X} H_{ij} p_j  \left( \frac{p_i}{q_i} - \frac{p_i q_j}{p_j q_i} \right) \; + O(\epsilon^2) }

or

\displaystyle{ K(p) =  -\sum_{i, j \in X} \left( H_{ij} p_j  \frac{p_i}{q_i} - H_{ij} q_j \frac{p_i}{q_i} \right) \; + O(\epsilon^2) }

Since q is an equilibrium we have \sum_j H_{ij} q_j = 0, so now the last term in the sum vanishes, leaving

\displaystyle{ K(p) =  -\sum_{i, j \in X} H_{ij} \frac{p_i p_j}{q_i} \; + O(\epsilon^2) }

Next, take the dissipation

\displaystyle{ D(p) = \frac{1}{2}\sum_{i,j} H_{ij}q_j \left( \frac{p_j}{q_j} - \frac{p_i}{q_i} \right)^2 }

and expand the square, getting

\displaystyle{ D(p) = \frac{1}{2}\sum_{i,j} H_{ij}q_j \left( \frac{p_j^2}{q_j^2} - 2\frac{p_i p_j}{q_i q_j} +  \frac{p_i^2}{q_i^2} \right) }

Since H is infinitesimal stochastic, \sum_i H_{ij} = 0. The first term is just this times a function of j, summed over j, so it vanishes, leaving

\displaystyle{ D(p) = \frac{1}{2}\sum_{i,j} H_{ij}q_j \left(- 2\frac{p_i p_j}{q_i q_j} +  \frac{p_i^2}{q_i^2} \right) }

Since q is an equilibrium, \sum_j H_{ij} q_j = 0. The last term above is this times a function of i, summed over i, so it vanishes, leaving

\displaystyle{ D(p) = - \sum_{i,j} H_{ij}q_j  \frac{p_i p_j}{q_i q_j} = - \sum_{i,j} H_{ij} \frac{p_i p_j}{q_i}  }

This matches what we got for K(p), up to terms of order O(\epsilon^2).   █
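The agreement between free energy loss and dissipation near equilibrium can also be probed numerically. The sketch below is only an illustration under stated assumptions: the matrix `H`, the equilibrium `q`, the perturbation `direction` and the helper names are all hypothetical choices, not taken from Blake's paper. It sets p_i = q_i (1 + \epsilon_i) for a shrinking \epsilon and reports the relative gap |K(p) - D(p)| / D(p).

```python
import math


def free_energy_loss(H, q, p):
    """K(p) = -sum_{i,j} H_ij p_j ( ln(p_i/q_i) - p_i q_j / (p_j q_i) )."""
    n = len(p)
    return -sum(
        H[i][j] * p[j] * (math.log(p[i] / q[i]) - (p[i] * q[j]) / (p[j] * q[i]))
        for i in range(n) for j in range(n)
    )


def dissipation(H, q, p):
    """D(p) = 1/2 * sum_{i,j} H_ij q_j (p_j/q_j - p_i/q_i)^2."""
    n = len(p)
    return 0.5 * sum(
        H[i][j] * q[j] * (p[j] / q[j] - p[i] / q[i]) ** 2
        for i in range(n) for j in range(n)
    )


# Hypothetical detailed balanced example; H[i][j] is the rate of hopping from j to i.
H = [[-2.0, 1.0, 0.0],
     [2.0, -2.0, 2.0],
     [0.0, 1.0, -2.0]]
q = [1.0, 2.0, 1.0]
direction = [1.0, -0.5, 2.0]   # arbitrary shape of the perturbation epsilon


def relative_gap(scale):
    """Return |K(p) - D(p)| / D(p) for p_i = q_i * (1 + scale * direction_i)."""
    p = [q[i] * (1.0 + scale * direction[i]) for i in range(len(q))]
    K = free_energy_loss(H, q, p)
    D = dissipation(H, q, p)
    return abs(K - D) / D


for scale in (0.1, 0.01, 0.001):
    print(scale, relative_gap(scale))
```

If the two quantities agree near equilibrium, the printed relative gap should shrink as the scale of the perturbation goes to zero.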

In short: detailed balanced open Markov processes are governed by the principle of minimum dissipation, not minimum entropy production. Minimum dissipation agrees with minimum entropy production only near equilibrium.

John BaezInformation Geometry (Part 16)

joint with Blake Pollard

Lately we’ve been thinking about open Markov processes. These are random processes where something can hop randomly from one state to another (that’s the ‘Markov process’ part) but also enter or leave the system (that’s the ‘open’ part).

The ultimate goal is to understand the nonequilibrium thermodynamics of open systems—systems where energy and maybe matter flows in and out. If we could understand this well enough, we could understand in detail how life works. That’s a difficult job! But one has to start somewhere, and this is one place to start.

We have a few papers on this subject:

• Blake Pollard, A Second Law for open Markov processes. (Blog article here.)

• John Baez, Brendan Fong and Blake Pollard, A compositional framework for Markov processes. (Blog article here.)

• Blake Pollard, Open Markov processes: A compositional perspective on non-equilibrium steady states in biology. (Blog article here.)

However, right now we just want to show you three closely connected results about how relative entropy changes in open Markov processes.


An open Markov process consists of a finite set X of states, a subset B \subseteq X of boundary states, and an infinitesimal stochastic operator H: \mathbb{R}^X \to \mathbb{R}^X, meaning a linear operator with

H_{ij} \geq 0 \ \  \text{for all} \ \ i \neq j

and

\sum_i H_{ij} = 0 \ \  \text{for all} \ \ j

For each state i \in X we introduce a population p_i  \in [0,\infty). We call the resulting function p : X \to [0,\infty) the population distribution.

Populations evolve in time according to the open master equation:

\displaystyle{ \frac{dp_i}{dt} = \sum_j H_{ij}p_j} \ \  \text{for all} \ \  i \in X-B

p_i(t) = b_i(t) \ \  \text{for all} \ \  i \in B

So, the populations p_i obey a linear differential equation at states i that are not in the boundary, but they are specified ‘by the user’ to be chosen functions b_i at the boundary states. The off-diagonal entries H_{ij} for i \neq j describe the rates at which population transitions from the jth to the ith state.

A closed Markov process, or continuous-time discrete-state Markov chain, is an open Markov process whose boundary is empty. For a closed Markov process, the open master equation becomes the usual master equation:

\displaystyle{  \frac{dp}{dt} = Hp }

In a closed Markov process the total population is conserved:

\displaystyle{ \frac{d}{dt} \sum_{i \in X} p_i = \sum_{i,j} H_{ij}p_j = 0 }

This lets us normalize the initial total population to 1 and have it stay equal to 1. If we do this, we can talk about probabilities instead of populations. In an open Markov process, population can flow in and out at the boundary states.
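Conservation of total population in a closed Markov process is easy to see in a quick simulation. This is a minimal sketch under assumptions of our own: the 3-state matrix `H`, the initial distribution, the helper `master_step` and the forward-Euler time stepping are illustrative choices.

```python
def master_step(H, p, dt):
    """One forward-Euler step of the master equation dp/dt = H p."""
    n = len(p)
    return [p[i] + dt * sum(H[i][j] * p[j] for j in range(n)) for i in range(n)]


# Hypothetical closed 3-state process; H[i][j] is the rate of hopping from j to i.
H = [[-2.0, 1.0, 0.0],
     [2.0, -2.0, 2.0],
     [0.0, 1.0, -2.0]]

p = [1.0, 0.0, 0.0]    # total population normalized to 1
totals = []
for _ in range(2000):
    p = master_step(H, p, dt=0.01)
    totals.append(sum(p))

print(p, min(totals), max(totals))
```

Because every column of H sums to zero, each step leaves the total unchanged up to rounding, so the populations can be read as probabilities throughout. In this example they approach the equilibrium (1/4, 1/2, 1/4).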

For any pair of distinct states i,j, H_{ij}p_j is the flow of population from j to i. The net flux of population from the jth state to the ith state is the flow from j to i minus the flow from i to j:

J_{ij} = H_{ij}p_j - H_{ji}p_i

A steady state is a solution of the open master equation that does not change with time. A steady state for a closed Markov process is typically called an equilibrium. So, an equilibrium obeys the master equation at all states, while for a steady state this may not be true at the boundary states. The idea is that population can flow in or out at the boundary states.

We say an equilibrium p : X \to [0,\infty) of a Markov process is detailed balanced if all the net fluxes vanish:

J_{ij} = 0 \ \  \text{for all} \ \ i,j \in X

or in other words:

H_{ij}p_j = H_{ji}p_i \ \  \text{for all} \ \ i,j \in X

Given two population distributions p, q : X \to [0,\infty) we can define the relative entropy

\displaystyle{  I(p,q) = \sum_i p_i \ln \left( \frac{p_i}{q_i} \right)}

When q is a detailed balanced equilibrium solution of the master equation, the relative entropy can be seen as the ‘free energy’ of p. For a precise statement, see Section 4 of Relative entropy in biological systems.

The Second Law of Thermodynamics implies that the free energy of a closed system tends to decrease with time, so for closed Markov processes we expect I(p,q) to be nonincreasing. And this is true! But for open Markov processes, free energy can flow in from outside. This is just one of several nice results about how relative entropy changes with time.


Theorem 1. Consider an open Markov process with X as its set of states and B as the set of boundary states. Suppose p(t) and q(t) obey the open master equation, and let the quantities

\displaystyle{ \frac{Dp_i}{Dt} = \frac{dp_i}{dt} - \sum_{j \in X} H_{ij}p_j }

\displaystyle{  \frac{Dq_i}{Dt} = \frac{dq_i}{dt} - \sum_{j \in X} H_{ij}q_j }

measure how much the time derivatives of p_i and q_i fail to obey the master equation. Then we have

\begin{array}{ccl}   \displaystyle{  \frac{d}{dt}  I(p(t),q(t)) } &=& \displaystyle{ \sum_{i, j \in X} H_{ij} p_j  \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right)} \\ \\ && \; + \; \displaystyle{ \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} +  \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt} }  \end{array}

This result separates the change in relative entropy into two parts: an ‘internal’ part and a ‘boundary’ part.

It turns out the ‘internal’ part is always less than or equal to zero. So, from Theorem 1 we can deduce a version of the Second Law of Thermodynamics for open Markov processes:

Theorem 2. Given the conditions of Theorem 1, we have

\displaystyle{  \frac{d}{dt}  I(p(t),q(t)) \; \le \;  \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} +  \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt}  }

Intuitively, this says that free energy can only increase if it comes in from the boundary!
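The claim that the ‘internal’ part is never positive can be spot-checked numerically. In the sketch below, the matrix `H` (a hypothetical cyclic process that is not detailed balanced), the sample distributions and the function name `internal_term` are all assumptions made for illustration.

```python
import math


def internal_term(H, p, q):
    """sum_{i,j} H_ij p_j ( ln(p_i/q_i) - p_i q_j / (p_j q_i) )."""
    n = len(p)
    return sum(
        H[i][j] * p[j] * (math.log(p[i] / q[i]) - (p[i] * q[j]) / (p[j] * q[i]))
        for i in range(n) for j in range(n)
    )


# Hypothetical cyclic process 0 -> 1 -> 2 -> 0; H[i][j] is the rate of hopping from j to i.
H = [[-1.0, 0.0, 3.0],
     [1.0, -2.0, 0.0],
     [0.0, 2.0, -3.0]]

# A few arbitrary pairs of strictly positive population distributions.
samples = [
    ([0.2, 0.5, 0.3], [0.6, 0.1, 0.3]),
    ([1.0, 4.0, 2.5], [6.0, 3.0, 2.0]),
    ([0.9, 0.05, 0.05], [0.1, 0.1, 0.8]),
    ([0.3, 0.3, 0.4], [0.3, 0.3, 0.4]),   # p == q
]

values = [internal_term(H, p, q) for p, q in samples]
print(values)
```

Each value should be less than or equal to zero up to rounding, and the last one, where p equals q, should vanish.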

There is another nice result that holds when q is an equilibrium solution of the master equation. This idea seems to go back to Schnakenberg:

Theorem 3. Given the conditions of Theorem 1, suppose also that q is an equilibrium solution of the master equation. Then we have

\displaystyle{  \frac{d}{dt}  I(p(t),q) =  -\frac{1}{2} \sum_{i,j \in X} J_{ij} A_{ij} \; + \; \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} }

where

J_{ij} = H_{ij}p_j - H_{ji}p_i

is the net flux from j to i, while

\displaystyle{ A_{ij} = \ln \left(\frac{p_j q_i}{p_i q_j} \right) }

is the conjugate thermodynamic force.

The flux J_{ij} has a nice meaning: it’s the net flow of population from j to i. The thermodynamic force is a bit subtler, but this theorem reveals its meaning: it says how much the population wants to flow from j to i.

More precisely, up to that factor of 1/2, the thermodynamic force A_{ij} says how much free energy loss is caused by net flux from j to i. There’s a nice analogy here to water losing potential energy as it flows downhill due to the force of gravity.
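The identity behind Theorem 3 can be checked on a small example by computing both sides directly. This sketch rests on assumed data: the cyclic matrix `H`, its equilibrium `q` (which is not detailed balanced), the distribution `p` and the helper names `flux` and `force` are hypothetical.

```python
import math

# Hypothetical cyclic process 0 -> 1 -> 2 -> 0; H[i][j] is the rate of hopping from j to i.
H = [[-1.0, 0.0, 3.0],
     [1.0, -2.0, 0.0],
     [0.0, 2.0, -3.0]]
q = [6.0, 3.0, 2.0]    # equilibrium: sum_j H[i][j] * q[j] == 0 for every i
p = [1.0, 4.0, 2.5]    # an arbitrary positive population distribution
n = len(p)


def flux(i, j):
    """Net flux J_ij = H_ij p_j - H_ji p_i from state j to state i."""
    return H[i][j] * p[j] - H[j][i] * p[i]


def force(i, j):
    """Thermodynamic force A_ij = ln( p_j q_i / (p_i q_j) )."""
    return math.log((p[j] * q[i]) / (p[i] * q[j]))


# 'Internal' part of d/dt I(p, q) from Theorem 1.
internal = sum(
    H[i][j] * p[j] * (math.log(p[i] / q[i]) - (p[i] * q[j]) / (p[j] * q[i]))
    for i in range(n) for j in range(n)
)

# The same quantity written in terms of fluxes and forces.
flux_force = -0.5 * sum(flux(i, j) * force(i, j) for i in range(n) for j in range(n))

print(internal, flux_force)
```

The two printed numbers should agree up to rounding, and both should be non-positive, in line with Theorem 2.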


Proof of Theorem 1. We begin by taking the time derivative of the relative information:

\begin{array}{ccl} \displaystyle{ \frac{d}{dt}  I(p(t),q(t)) } &=&  \displaystyle{  \sum_{i \in X} \frac{\partial I}{\partial p_i} \frac{dp_i}{dt} +  \frac{\partial I}{\partial q_i} \frac{dq_i}{dt} } \end{array}

We can separate this into a sum over states i \in X - B, for which the time derivatives of p_i and q_i are given by the master equation, and boundary states i \in B, for which they are not:

\begin{array}{ccl} \displaystyle{ \frac{d}{dt}  I(p(t),q(t)) } &=&  \displaystyle{  \sum_{i \in X-B, \; j \in X} \frac{\partial I}{\partial p_i} H_{ij} p_j +                                               \frac{\partial I}{\partial q_i} H_{ij} q_j }\\  \\   && + \; \; \; \displaystyle{  \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{dp_i}{dt} +  \frac{\partial I}{\partial q_i} \frac{dq_i}{dt}}   \end{array}

For boundary states we have

\displaystyle{ \frac{dp_i}{dt} = \frac{Dp_i}{Dt} + \sum_{j \in X} H_{ij}p_j }

and similarly for the time derivative of q_i. We thus obtain

\begin{array}{ccl}  \displaystyle{ \frac{d}{dt}  I(p(t),q(t)) } &=&  \displaystyle{  \sum_{i,j \in X} \frac{\partial I}{\partial p_i} H_{ij} p_j + \frac{\partial I}{\partial q_i} H_{ij} q_j }\\  \\ && + \; \; \displaystyle{  \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} +  \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt}}   \end{array}

To evaluate the first sum, recall that

\displaystyle{   I(p,q) = \sum_{i \in X} p_i \ln (\frac{p_i}{q_i})}

so

\displaystyle{\frac{\partial I}{\partial p_i}} =\displaystyle{1 +  \ln (\frac{p_i}{q_i})} ,  \qquad \displaystyle{ \frac{\partial I}{\partial q_i}}=  \displaystyle{- \frac{p_i}{q_i}   }

Thus, we have

\displaystyle{ \sum_{i,j \in X}  \frac{\partial I}{\partial p_i} H_{ij} p_j + \frac{\partial I}{\partial q_i} H_{ij} q_j  =   \sum_{i,j\in X} (1 +  \ln (\frac{p_i}{q_i})) H_{ij} p_j - \frac{p_i}{q_i} H_{ij} q_j }

We can rewrite this as

\displaystyle{   \sum_{i,j \in X} H_{ij} p_j  \left( 1 + \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right) }

Since H is infinitesimal stochastic we have \sum_{i} H_{ij} = 0, so the first term drops out, and we are left with

\displaystyle{   \sum_{i,j \in X} H_{ij} p_j  \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right) }

as desired.   █

Proof of Theorem 2. Thanks to Theorem 1, to prove

\displaystyle{  \frac{d}{dt}  I(p(t),q(t)) \; \le \;  \sum_{i \in B} \frac{\partial I}{\partial p_i} \frac{Dp_i}{Dt} +  \frac{\partial I}{\partial q_i} \frac{Dq_i}{Dt}  }

it suffices to show that

\displaystyle{   \sum_{i,j \in X} H_{ij} p_j  \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right) \le 0  }

or equivalently (recalling the proof of Theorem 1):

\displaystyle{ \sum_{i,j} H_{ij} p_j  \left( \ln(\frac{p_i}{q_i}) + 1 - \frac{p_i q_j}{p_j q_i} \right) \le 0 }

The last two terms on the left hand side cancel when i = j. Thus, if we break the sum into an i \ne j part and an i = j part, the left side becomes

\displaystyle{   \sum_{i \ne j} H_{ij} p_j  \left( \ln(\frac{p_i}{q_i}) + 1 - \frac{p_i q_j}{p_j q_i} \right) \; + \; \sum_j H_{jj} p_j \ln(\frac{p_j}{q_j}) }

Next we can use the infinitesimal stochastic property of H to write H_{jj} as the sum of -H_{ij} over i not equal to j, obtaining

\displaystyle{ \sum_{i \ne j} H_{ij} p_j  \left( \ln(\frac{p_i}{q_i}) + 1 - \frac{p_i q_j}{p_j q_i} \right) - \sum_{i \ne j} H_{ij} p_j \ln(\frac{p_j}{q_j}) } =

\displaystyle{ \sum_{i \ne j} H_{ij} p_j  \left( \ln(\frac{p_iq_j}{p_j q_i}) + 1 - \frac{p_i q_j}{p_j q_i} \right) }

Since H_{ij} \ge 0 when i \ne j and \ln(s) + 1 - s \le 0 for all s > 0, we conclude that this quantity is \le 0.   █
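The inequality \ln(s) + 1 - s \le 0 used in the last step is elementary. As a reminder (this short justification is an addition, not part of the original argument), set f(s) = \ln(s) + 1 - s for s > 0. Then

\displaystyle{ f'(s) = \frac{1}{s} - 1 }

which is positive for s < 1 and negative for s > 1. So f attains its maximum at s = 1, where f(1) = 0, and therefore \ln(s) + 1 - s \le 0 for all s > 0, with equality only at s = 1.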

Proof of Theorem 3. Now suppose also that q is an equilibrium solution of the master equation. Then Dq_i/Dt = dq_i/dt = 0 for all states i, so by Theorem 1 we need to show

\displaystyle{ \sum_{i, j \in X} H_{ij} p_j  \left( \ln(\frac{p_i}{q_i}) - \frac{p_i q_j}{p_j q_i} \right)  \; = \;  -\frac{1}{2} \sum_{i,j \in X} J_{ij} A_{ij} }

We also have \sum_{j \in X} H_{ij} q_j = 0, so the second term in the sum at left vanishes, and it suffices to show

\displaystyle{  \sum_{i, j \in X} H_{ij} p_j  \ln(\frac{p_i}{q_i}) \; = \;  - \frac{1}{2} \sum_{i,j \in X} J_{ij} A_{ij} }

By definition we have

\displaystyle{  \frac{1}{2} \sum_{i,j} J_{ij} A_{ij}} =  \displaystyle{  \frac{1}{2} \sum_{i,j}  \left( H_{ij} p_j - H_{ji}p_i \right)   \ln \left( \frac{p_j q_i}{p_i q_j} \right) }

This in turn equals

\displaystyle{  \frac{1}{2} \sum_{i,j}  H_{ij}p_j    \ln \left( \frac{p_j q_i}{p_i q_j} \right) -   \frac{1}{2} \sum_{i,j}  H_{ji}p_i  \ln \left( \frac{p_j q_i}{p_i q_j} \right) }

and we can switch the dummy indices i,j in the second sum, obtaining

\displaystyle{  \frac{1}{2} \sum_{i,j}  H_{ij}p_j    \ln \left( \frac{p_j q_i}{p_i q_j} \right) -   \frac{1}{2} \sum_{i,j}  H_{ij}p_j    \ln \left( \frac{p_i q_j}{p_j q_i} \right) }

or simply

\displaystyle{ \sum_{i,j} H_{ij} p_j \ln \left( \frac{p_j q_i}{p_i q_j} \right) }

But this is

\displaystyle{  \sum_{i,j} H_{ij} p_j \left(\ln ( \frac{p_j}{q_j}) + \ln (\frac{q_i}{p_i}) \right) }

and the first term vanishes because H is infinitesimal stochastic: \sum_i H_{ij} = 0. We thus have

\displaystyle{  \frac{1}{2} \sum_{i,j} J_{ij} A_{ij}} = \sum_{i,j} H_{ij} p_j  \ln (\frac{q_i}{p_i} )

as desired.   █

John BaezCorelations in Network Theory

Category theory reduces a large chunk of math to the clever manipulation of arrows. One of the fun things about this is that you can often take a familiar mathematical construction, think of it category-theoretically, and just turn around all the arrows to get something new and interesting!

In math we love functions. If we have a function

f: X \to Y

we can formally turn around the arrow to think of f as something going back from Y to X. But this something is usually not a function: it’s called a ‘cofunction’. A cofunction from Y to X is simply a function from X to Y.

Cofunctions are somewhat interesting, but they’re really just functions viewed through a looking glass, so they don’t give much new—at least, not by themselves.

The game gets more interesting if we think of functions and cofunctions as special sorts of relations. A relation from X to Y is a subset

R \subseteq X \times Y

It’s a function when for each x \in X there’s a unique y \in Y with (x,y) \in R. It’s a cofunction when for each y \in Y there’s a unique x \in X with (x,y) \in R.

Just as we can compose functions, we can compose relations. Relations have certain advantages over functions: for example, we can ‘turn around’ any relation R from X to Y and get a relation R^\dagger from Y to X:

R^\dagger = \{(y,x) : \; (x,y) \in R \}

If we turn around a function we get a cofunction, and vice versa. But we can also do other fun things: for example, since both functions and cofunctions are relations, we can compose a function and a cofunction and get a relation.
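For finite sets, all of this is easy to play with on a computer. Here is a minimal Python sketch that models a relation as a set of pairs; the helper names (`compose`, `dagger`, `is_function`, `is_cofunction`) and the example sets are hypothetical, chosen only to mirror the definitions above.

```python
def compose(R, S):
    """Compose R from X to Y with S from Y to Z: all (x, z) linked through some y."""
    return {(x, z) for (x, y1) in R for (y2, z) in S if y1 == y2}


def dagger(R):
    """Turn a relation from X to Y around to get a relation from Y to X."""
    return {(y, x) for (x, y) in R}


def is_function(R, source):
    """True if each element of the source is related to exactly one element."""
    return all(sum(1 for (a, _) in R if a == x) == 1 for x in source)


def is_cofunction(R, target):
    """True if each element of the target is related to exactly one element."""
    return all(sum(1 for (_, b) in R if b == y) == 1 for y in target)


X = {1, 2, 3}
Y = {"a", "b"}

f = {(1, "a"), (2, "a"), (3, "b")}   # a function from X to Y
f_dagger = dagger(f)                  # a relation from Y to X

print(is_function(f, X))              # f is a function
print(is_cofunction(f_dagger, X))     # turning it around gives a cofunction
print(is_function(f_dagger, Y))       # ...which is not a function here
print(sorted(compose(f, f_dagger)))   # function followed by cofunction
```

In this example the composite of f with its turned-around version relates two elements of X exactly when f sends them to the same element of Y, so it is a relation that is neither a function nor a cofunction.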

Of course, relations also have certain disadvantages compared to functions. But it’s utterly clear by now that the category \mathrm{FinRel}, where the objects are finite sets and the morphisms are relations, is very important.

So far, so good. But what happens if we take the definition of ‘relation’ and turn all the arrows around?

There are actually several things I could mean by this question, some more interesting than others. But one of them gives a very interesting new concept: the concept of ‘corelation’. And two of my students have just written a very nice paper on corelations:

• Brandon Coya and Brendan Fong, Corelations are the prop for extraspecial commutative Frobenius monoids.

Here’s why this paper is important for network theory: corelations between finite sets are exactly what we need to describe electrical circuits made of ideal conductive wires! A corelation from a finite set X to a finite set Y can be drawn this way:

I have drawn more wires than strictly necessary: I’ve drawn a wire between two points whenever I want current to be able to flow between them. But there’s a reason I did this: a corelation from X to Y simply tells us when current can flow from one point in either of these sets to any other point in these sets.

Of course circuits made solely of conductive wires are not very exciting for electrical engineers. But in an earlier paper, Brendan introduced corelations as an important stepping-stone toward more general circuits:

• John Baez and Brendan Fong, A compositional framework for passive linear circuits. (Blog article here.)

The key point is simply that you use conductive wires to connect resistors, inductors, capacitors, batteries and the like and build interesting circuits—so if you don’t fully understand the math of conductive wires, you’re limited in your ability to understand circuits in general!

In their new paper, Brendan teamed up with Brandon Coya, and they figured out all the rules obeyed by the category \mathrm{FinCorel}, where the objects are finite sets and the morphisms are corelations. I’ll explain these rules later.

This sort of analysis had previously been done for \mathrm{FinRel}, and it turns out there’s a beautiful analogy between the two cases! Here is a chart displaying the analogy:

Spans | Cospans
extra bicommutative bimonoids | special commutative Frobenius monoids
Relations | Corelations
extraspecial bicommutative bimonoids | extraspecial commutative Frobenius monoids

I’m sure this will be cryptic to the nonmathematicians reading this, and even many mathematicians—but the paper explains what’s going on here.

I’ll actually say what an ‘extraspecial commutative Frobenius monoid’ is later in this post. This is a terse way of listing all the rules obeyed by corelations between finite sets—and thus, all the rules obeyed by conductive wires.

But first, let’s talk about something simpler.

What is a corelation?

Just as we can define functions as relations of a special sort, we can also define relations in terms of functions. A relation from X to Y is a subset

R \subseteq X \times Y

but we can think of this as an equivalence class of one-to-one functions

i: R \to X \times Y

Why an equivalence class? The image of i is our desired subset of X \times Y. The set R here could be replaced by any isomorphic set; its only role is to provide ‘names’ for the elements of X \times Y that are in the image of i.

Now we have a relation described as an arrow, or really an equivalence class of arrows. Next, let’s turn the arrow around!

There are different things I might mean by that, but we want to do it cleverly. When we turn arrows around, the concept of product (for example, cartesian product X \times Y of sets) turns into the concept of sum (for example, disjoint union X + Y of sets). Similarly, the concept of monomorphism (such as a one-to-one function) turns into the concept of epimorphism (such as an onto function). If you don’t believe me, click on the links!

So, we should define a corelation from a set X to a set Y to be an equivalence class of onto functions

p: X + Y \to C

Why an equivalence class? The set C here could be replaced by any isomorphic set; its only role is to provide ‘names’ for the sets of elements of X + Y that get mapped to the same thing via p.

In simpler terms, a corelation from a set X to a set Y is just a partition of the disjoint union X + Y. So, it looks like this:

If we like, we can then draw a line connecting any two points that lie in the same part of the partition:

These lines determine the corelation, so we can also draw a corelation this way:

This is why corelations describe circuits made solely of wires!
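As a rough illustration of the definition above, the following Python sketch builds the partition of X + Y determined by an onto function p, and checks that renaming the elements of C does not change the corelation. The tagging of points as `("X", x)` and `("Y", y)`, the helper names and the sample data are all assumptions made for this example.

```python
def disjoint_union(X, Y):
    """Model X + Y by tagging each element with the set it came from."""
    return {("X", x) for x in X} | {("Y", y) for y in Y}

def partition_of(p):
    """Given p as a dict from points of X + Y to names in C,
    return the partition of X + Y into the fibres of p."""
    blocks = {}
    for point, name in p.items():
        blocks.setdefault(name, set()).add(point)
    return frozenset(frozenset(block) for block in blocks.values())

X, Y = {1, 2, 3}, {1, 2}
points = disjoint_union(X, Y)

# Two onto functions that differ only in the names used for elements of C.
p = {("X", 1): "red", ("X", 2): "red", ("X", 3): "blue",
     ("Y", 1): "red", ("Y", 2): "green"}
q = {("X", 1): 0, ("X", 2): 0, ("X", 3): 1,
     ("Y", 1): 0, ("Y", 2): 2}

print(partition_of(p) == partition_of(q))
```

Both functions give the same three-part partition, which is the sense in which a corelation is an equivalence class of onto functions rather than a single one.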

The rules governing corelations

The main result in Brandon and Brendan’s paper is that \mathrm{FinCorel} is equivalent to the PROP for extraspecial commutative Frobenius monoids. That’s a terse way of stating the laws governing \mathrm{FinCorel}.

Let me just show you the most important laws. In each of these laws I’ll draw two circuits made of wires, and write an equals sign asserting that they give the same corelation from a set X to a set Y. The inputs X of each circuit are on top, and the outputs Y are at the bottom. I’ll draw 3-way junctions as little triangles, but don’t worry about that. When we compose two corelations we may get a wire left in mid-air, not connected to the inputs or outputs. We draw the end of the wire as a little circle.

There are some laws called the ‘commutative monoid’ laws:

and an upside-down version called the ‘cocommutative comonoid’ laws:

Then we have ‘Frobenius laws’:

and finally we have the ‘special’ and ‘extra’ laws:

All other laws can be derived from these in some systematic ways.

Commutative Frobenius monoids obey the commutative monoid laws, the cocommutative comonoid laws and the Frobenius laws. They play a fundamental role in 2d topological quantum field theory. Special Frobenius monoids are also well-known. But the ‘extra’ law, which says that a little piece of wire not connected to anything can be thrown away with no effect, is less well studied. Jason Erbele and I gave it this name in our work on control theory:

• John Baez and Jason Erbele, Categories in control. (Blog article here.)
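One way to see the ‘special’ and ‘extra’ laws at work is to compose corelations directly. The sketch below assumes a corelation is stored as a partition of tagged points, glues two partitions along the shared middle set with a small union-find, and then forgets the middle points. The function name `compose`, the tags `"X"`, `"Y"`, `"Z"` and the examples are illustrative assumptions, not the paper's construction.

```python
def compose(a, b):
    """Compose a (a partition of X + Y) with b (a partition of Y + Z).
    Points are tagged tuples such as ("X", 1), ("Y", 2), ("Z", 1)."""
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path halving
            u = parent[u]
        return u

    def union(u, v):
        parent[find(u)] = find(v)

    # Glue: points in the same block of a or of b become connected.
    for block in list(a) + list(b):
        members = list(block)
        find(members[0])
        for u in members[1:]:
            union(members[0], u)

    merged = {}
    for u in list(parent):
        merged.setdefault(find(u), set()).add(u)

    # Forget the middle set Y; a block lying entirely in Y disappears.
    result = set()
    for block in merged.values():
        outer = frozenset(u for u in block if u[0] != "Y")
        if outer:
            result.add(outer)
    return frozenset(result)

wire = frozenset({frozenset({("X", 1), ("Z", 1)})})

# 'Special' law: split one wire into two, then merge them again.
split = frozenset({frozenset({("X", 1), ("Y", 1), ("Y", 2)})})
merge = frozenset({frozenset({("Y", 1), ("Y", 2), ("Z", 1)})})
print(compose(split, merge) == wire)

# 'Extra' law: a piece of wire touching neither inputs nor outputs vanishes.
stub = frozenset({frozenset({("Y", 1)})})
print(compose(stub, stub) == frozenset())
```

The second example has empty X and Z, so the composite is the empty partition: the dangling piece of wire has no effect on the resulting corelation.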

For more

David Ellerman has spent a lot of time studying what would happen to mathematics if we turned around a lot of arrows in a certain systematic way. In particular, just as the concept of relation would be replaced by the concept of corelation, the concept of subset would be replaced by the concept of partition. You can see how it fits together: just as a relation from X to Y is a subset of X \times Y, a corelation from X to Y is a partition of X + Y.

There’s a lattice of subsets of a set:

In logic these subsets correspond to propositions, and the lattice operations are the logical operations ‘and’ and ‘or’. But there’s also a lattice of partitions of a set:

In Ellerman’s vision, this lattice of partitions gives a new kind of logic. You can read about it here:

• David Ellerman, Introduction to partition logic, Logic Journal of the Interest Group in Pure and Applied Logic 22 (2014), 94–125.

As mentioned, the main result in Brandon and Brendan’s paper is that \mathrm{FinCorel} is equivalent to the PROP for extraspecial commutative Frobenius monoids. After they proved this, they noticed that the result has also been stated in other language and proved in other ways by two other authors:

• Fabio Zanasi, Interacting Hopf Algebras—the Theory of Linear Systems, PhD thesis, École Normale Supérieure de Lyon, 2015.

• K. Došen and Z. Petrić, Syntax for split preorders, Annals of Pure and Applied Logic 164 (2013), 443–481.

Unsurprisingly, I prefer Brendan and Brandon’s approach to deriving the result. But it’s nice to see different perspectives!

Clifford JohnsonSuited Up!

Yes, I was in battle again. A persistent skunk that wants to take up residence in the crawl space. I got rid of it last week, having found one place it broke in. This involved a lot of crawling around on my belly armed with a headlamp (not pictured - this is an old picture) and curses. I've done this before... It left. Then yesterday I found a new place it had broken in through and the battle was rejoined. Interestingly, this time it decided to hide after some of the back and forth and I lost track of it for a good while and was about to give up and hope it will feel unsafe with all the lights I'd put on down there (and/or encourage it further to leave by deploying nuclear weapons to match the ones it comes armed with*).

In preparation for this I left open the large access hatch and sprinkled a layer [...]

The post Suited Up! appeared first on Asymptotia.

February 05, 2016

BackreactionMuch Ado around Nothing: The Cosmological non-Constant Problem

Tl;dr: Researchers put forward a theoretical argument that new physics must appear at energies much lower than commonly thought, barely beyond the reach of the LHC.

The cosmological constant is the worst-ever prediction of quantum field theory, infamously off by 120 orders of magnitude. And as if that wasn’t embarrassing enough, this gives rise to, not one, but three problems: Why is the measured cosmological constant neither 1) huge nor 2) zero, and 3) Why didn’t this occur to us a billion years earlier? With that, you’d think that physicists have their hands full getting zeroes arranged correctly. But Niayesh Afshordi and Elliot Nelson just added to our worries.

In a paper that took third place in this year’s Buchalter Cosmology Prize, Afshordi and Nelson pointed out that the cosmological constant, if it arises from the vacuum energy of matter fields, should be subject to quantum fluctuations. And these fluctuations around the average are still large even if you have managed to get the constant itself to be small.

The cosmological constant, thus, is not actually constant. And since matter curves space-time, the matter fluctuations lead to space-time fluctuations – which can screw with our cosmological models. Afshordi and Nelson dubbed it the “Cosmological non-Constant Problem.”

But there is more to their argument than just adding to our problems because Afshordi and Nelson quantified what it takes to avoid a conflict with observation. They calculate the effect of stress-energy fluctuations on the space-time background, and then analyze what consequences this would have for the gravitational interaction. They introduce as a free parameter an energy scale up to which the fluctuations abound, and then contrast the corrections from this with observations, like for example the CMB power spectrum or the peculiar velocities of galaxy clusters. From these measurements they derive bounds on the scale at which the fluctuations must cease, and thus, where some new physics must come into play.

They find that the scale beyond which we should already have seen the effect of the vacuum fluctuations is about 35 TeV. If their argument is right, this means something must happen either to matter or to gravity before reaching this energy scale; the option the authors advocate in their paper is that physics becomes strongly coupled below this scale (thus invalidating the extrapolation to larger energies, removing the problem).

Unfortunately, the LHC will not be able to reach all the way up to 35 TeV. But a next larger collider – and we all hope there will be one! – almost certainly would be able to test the full range. As Niayesh put it: “It’s not a problem yet” – but it will be a problem if there is no new physics before getting all the way up to 35 TeV.

I find this an interesting new twist on the cosmological constant problem(s). Something about this argument irks me, but I can’t quite put a finger on it. If I have an insight, you’ll hear from me again. Just generally I would caution you to not take the exact numerical value too seriously because in this kind of estimate there are usually various places where factors of order one might come in.

In summary, if Afshordi and Nelson are right, we’ve been missing something really essential about gravity.

BackreactionMe, Elsewhere

I'm back from my trip. Here are some things that prevented me from more substantial blogging:
  • I wrote an article for Aeon, "The superfluid Universe," which just appeared. For a somewhat more technical summary, see this earlier blogpost.
  • I did a Q&A with John The-End-of-Science Horgan, which was fun. I disagree with him on many things, but I admire his writing. He is infallibly skeptic and unashamedly opinionated -- qualities I find lacking in much of today's science writing, including, sometimes, my own.
  • I spoke with Davide Castelvecchi about Stephen Hawking's recent attempt to solve the black hole information loss problem, which I previously wrote about here.
  • And I had some words to spare for Zeeya Merali, probably more words than she wanted, on the issue with the arXiv moderation, which we discussed here.
  • Finally, I had the opportunity to give some input for this video on the PhysicsGirl's YouTube channel:

    I previously explained in this blogpost that Hawking radiation is not produced at the black hole horizon, a correction to the commonly used popular science explanation that caught much more attention than I anticipated.

    There are of course still some things in the above video I'd like to complain about. To begin with, anti-particles don't normally have negative energy (no they don't). And the vacuum is the same for two observers who are moving relative to each other with constant velocity - it's the acceleration that makes the difference between the vacua. In any case, I applaud the Physics Girl team for taking on what is admittedly a rather technical and difficult topic. If anyone can come up with a better illustration for Hawking-radiation than Hawking's own idea with the pairs that are being ripped apart (which is far too localized to fit well with the math), please leave a suggestion in the comments.

Doug NatelsonWhat is density functional theory? part 2 - approximations

So, DFT contains a deep truth:   Somehow just the electronic density as a function of position within a system in its lowest energy state contains, latent within it, basically all of the information about that ground state.  This is the case even though you usually think that you should need to know the actual complex electronic wavefunction \(\Psi(\mathbf{r})\), and the density (\(\Psi^{*}\Psi\)) seems to throw away a bunch of information.

Moreover, thanks to Kohn and Sham, there is actually a procedure that lets you calculate things using a formalism where you can ignore electron-electron interactions and, in principle, get arbitrarily close to the real (including interaction corrections) density.  In practice, life is not so easy.  We don't actually know how to write down a readily computable form of the complete Kohn-Sham functional.  Some people have very clever ideas about trying to finesse this, but it's hard, especially since the true functional is actually nonlocal - it somehow depends on correlations between the density (and its spatial derivatives) at different positions.   In our seating chart analogy, we know that there's a procedure for finding the true optimal seating even without worrying about the interactions between people, but we don't know how to write it down nicely.  The correct procedure involves looking at whether each seat is empty or full, whether its neighboring seats are occupied, and even potentially the coincident occupation of groups of seats - this is what I mean by nonlocal.

Fig. from here.
We could try a simplifying local approximation, where we only care about whether a given chair is empty or full.  (If you try to approximate using a functional that depends only on the local density, you are doing LDA (the local density approximation)).  We could try to be a bit more sophisticated, and worry about whether a chair is occupied and how much the occupancy varies in different directions.  (If you try to incorporate the local density and its gradient, you are doing GGA (the generalized gradient approximation)).   There are other, more complicated procedures that add in additional nonlocal bits - if done properly, this is rigorous.  The real art in this business is understanding which approximations are best in which regimes, and how to compute things efficiently.
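In symbols, the two approximations just described are usually written in the following schematic form. This is a sketch in standard textbook notation rather than anything specific to this post: the symbols \(n(\mathbf{r})\) for the electron density, \(\epsilon_{xc}\) for an energy per particle and \(f\) for a generic integrand are assumptions introduced here.

```latex
% Local density approximation: the integrand at r depends only on n(r).
E_{xc}^{\mathrm{LDA}}[n] = \int n(\mathbf{r})\, \epsilon_{xc}\big(n(\mathbf{r})\big)\, d^3 r

% Generalized gradient approximation: the integrand also sees the local gradient.
E_{xc}^{\mathrm{GGA}}[n] = \int f\big(n(\mathbf{r}), \nabla n(\mathbf{r})\big)\, d^3 r
```

In the seating analogy, the first form looks only at whether each chair is occupied, while the second also looks at how the occupancy changes from a chair to its immediate neighbours.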

So how good can this be?  An example is shown in the figure (from a summer school talk by my friend Leeor Kronik).  The yellow points indicate (on both axes) the experimental values of the ionization energies for the various organic molecules shown.  The other symbols show different calculated ionization energies plotted vs. the experimental values.  A particular mathematical procedure with a clear theoretical justification (read the talk for details) that mixes in long-range and short-range contributions gives the points labeled with asterisks, which show very good agreement with the experiments.

Next time:  The conclusion, with pitfalls, perils, and general abuses of DFT.

Scott AaronsonHere’s some video of me spouting about Deep Questions

In January 2014, I attended an FQXi conference on Vieques island in Puerto Rico.  While there, Robert Lawrence Kuhn interviewed me for his TV program Closer to Truth, which deals with science and religion and philosophy and you get the idea.  Alas, my interview was at the very end of the conference, and we lost track of the time—so unbeknownst to me, a plane full of theorists was literally sitting on the runway waiting for me to finish philosophizing!  This was the second time Kuhn interviewed me for his show; the first time was on a cruise ship near Norway in 2011.  (Thankless hero that I am, there’s nowhere I won’t travel for the sake of truth.)

Anyway, after a two-year wait, the videos from Puerto Rico are finally available online.  While my vignettes cover what, for most readers of this blog, will be very basic stuff, I’m sort of happy with how they turned out: I still stutter and rock back and forth, but not as much as usual.  For your viewing convenience, here are the new videos:

I had one other vignette, about why the universe exists, but they seem to have cut that one.  Alas, if I knew why the universe existed in January 2014, I can’t remember any more.

One embarrassing goof: I referred to the inventor of Newcomb’s Paradox as “Simon Newcomb.”  Actually it was William Newcomb: a distant relative of Simon Newcomb, the 19th-century astronomer who measured the speed of light.

At their website, you can also see my older 2011 videos, and videos from others who might be known to readers of this blog, like Marvin Minsky, Roger Penrose, Rebecca Newberger Goldstein, David Chalmers, Sean Carroll, Max Tegmark, David Deutsch, Raphael Bousso, Freeman Dyson, Nick Bostrom, Ray Kurzweil, Rodney Brooks, Stephen Wolfram, Greg Chaitin, Garrett Lisi, Seth Lloyd, Lenny Susskind, Lee Smolin, Steven Weinberg, Wojciech Zurek, Fotini Markopoulou, Juan Maldacena, Don Page, and David Albert.  (No, I haven’t yet watched most of these, but now that I linked to them, maybe I will!)

Thanks very much to Robert Lawrence Kuhn and Closer to Truth (and my previous self, I guess?) for providing Shtetl-Optimized content so I don’t have to.

Update: Andrew Critch of CFAR asked me to post the following announcement.

We’re seeking a full time salesperson for the Center for Applied Rationality in Berkeley, California. We’ve streamlined operations to handle large volume in workshop admissions, and now we need that volume to pour in. Your role would be to fill our workshops, events, and alumni community with people. Last year we had 167 total new alumni. This year we want 120 per month. Click here to find out more.

February 04, 2016

Jordan EllenbergThe furniture sentiment

Today’s Memorial Library find:  the magazine Advertising and Selling.  The September 1912 edition features “How Furniture Could Be Better Advertised,” by Arnold Joerns, of E.J. Thiele and Co.

Joerns complains that in 1911, the average American spent $81.22 on food, $26.02 on clothes, $19.23 on intoxicants, $9.08 on tobacco, and only $6.19 on furniture.  “Do you think furniture should be on the bottom of this list?” he asks, implicitly shaking his head.  “Wouldn’t you — dealer or manufacturer — rather see it nearer the top, — say at least ahead of tobacco and intoxicants?”

Good news for furniture lovers:  by 2012, US spending on “household furnishings and equipment” was at $1,506 per household, almost a quarter as much as we spent on food.  (To be fair, it looks like this includes computers, lawnmowers, and many other non-furniture items.)  Meanwhile, spending on alcohol is only $438.  That’s pretty interesting:  in 1911, liquor expenditures were a quarter of food expenditures; now it’s less than a tenth.  Looks like a 1911 dollar is roughly $25 in 2012 dollars, so the real dollars spent on alcohol aren’t that different, but we spend a lot more now on food and on furniture.
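For anyone who wants to redo the arithmetic, here is a small Python sketch using only the figures quoted above. The factor of 25 is the post's own rough conversion, and the variable names are made up for this example. Note that the 1911 figures are per person while the 2012 figures are per household, so the comparison is only indicative.

```python
# 1911 spending per average American, in 1911 dollars (from Joerns's article).
spend_1911 = {
    "food": 81.22,
    "clothes": 26.02,
    "intoxicants": 19.23,
    "tobacco": 9.08,
    "furniture": 6.19,
}

# 2012 spending per household, in 2012 dollars (figures quoted in the post).
furnishings_2012 = 1506
alcohol_2012 = 438

INFLATION = 25  # rough 1911 -> 2012 conversion used in the post

liquor_share_1911 = spend_1911["intoxicants"] / spend_1911["food"]
alcohol_1911_real = spend_1911["intoxicants"] * INFLATION
furniture_1911_real = spend_1911["furniture"] * INFLATION

print(f"1911 liquor / food: {liquor_share_1911:.3f}")
print(f"1911 intoxicants in 2012 dollars: {alcohol_1911_real:.2f} vs {alcohol_2012}")
print(f"1911 furniture in 2012 dollars: {furniture_1911_real:.2f} vs {furnishings_2012}")
```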

Anyway, this piece takes a splendidly nuts turn at the end, as Joerns works up a head of steam about the moral peril of discount furniture:

I do not doubt but that fewer domestic troubles would exist if people were educated to a greater understanding of the furniture sentiment.  Our young people would find more pleasure in an evening at home — if we made that home more worth while and a source of personal pride; then, perhaps, they would cease joy-riding, card-playing, or drinking and smoking in environments unhealthful to their minds and bodies.

It would even seem reasonable to assume, that if the public mind were educated to appreciate more the sentiment in furniture and its relation to the Ideal Home, we would have fewer divorces.  Home would mean more to the boys and girls of today and the men and women of tomorrow.  Obviously, if the public is permitted to lose more and more its appreciation of home sentiment, the divorce evil will grow, year by year.

Joerns proposes that the higher sort of furniture manufacturers boost their brand by advertising it, not as furniture, but as “meuble.” This seems never to have caught on.

February 03, 2016

Terence TaoFinite time blowup for an Euler-type equation in vorticity stream form

I’ve been meaning to return to fluids for some time now, in order to build upon my construction two years ago of a solution to an averaged Navier-Stokes equation that exhibited finite time blowup. (I spoke on this work at the recent conference in Princeton in honour of Sergiu Klainerman; my slides for that talk are here.)

One of the biggest deficiencies with my previous result is the fact that the averaged Navier-Stokes equation does not enjoy any good equation for the vorticity {\omega = \nabla \times u}, in contrast to the true Navier-Stokes equations which, when written in vorticity-stream formulation, become

\displaystyle \partial_t \omega + (u \cdot \nabla) \omega = (\omega \cdot \nabla) u + \nu \Delta \omega

\displaystyle u = (-\Delta)^{-1} (\nabla \times \omega).

(Throughout this post we will be working in three spatial dimensions {{\bf R}^3}.) So one of my main near-term goals in this area is to exhibit an equation resembling Navier-Stokes as much as possible which enjoys a vorticity equation, and for which there is finite time blowup.

Heuristically, this task should be easier for the Euler equations (i.e. the zero viscosity case {\nu=0} of Navier-Stokes) than the viscous Navier-Stokes equation, as one expects the viscosity to only make it easier for the solution to stay regular. Indeed, morally speaking, the assertion that finite time blowup solutions of Navier-Stokes exist should be roughly equivalent to the assertion that finite time blowup solutions of Euler exist which are “Type I” in the sense that all Navier-Stokes-critical and Navier-Stokes-subcritical norms of this solution go to infinity (which, as explained in the above slides, heuristically means that the effects of viscosity are negligible when compared against the nonlinear components of the equation). In vorticity-stream formulation, the Euler equations can be written as

\displaystyle \partial_t \omega + (u \cdot \nabla) \omega = (\omega \cdot \nabla) u

\displaystyle u = (-\Delta)^{-1} (\nabla \times \omega).

As discussed in this previous blog post, a natural generalisation of this system of equations is the system

\displaystyle \partial_t \omega + (u \cdot \nabla) \omega = (\omega \cdot \nabla) u \ \ \ \ \ (1)


\displaystyle u = T (-\Delta)^{-1} (\nabla \times \omega).

where {T} is a linear operator on divergence-free vector fields that is “zeroth order” in some sense; ideally it should also be invertible, self-adjoint, and positive definite (in order to have a Hamiltonian that is comparable to the kinetic energy {\frac{1}{2} \int_{{\bf R}^3} |u|^2}). (In the previous blog post, it was observed that the surface quasi-geostrophic (SQG) equation could be embedded in a system of the form (1).) The system (1) has many features in common with the Euler equations; for instance vortex lines are transported by the velocity field {u}, and Kelvin’s circulation theorem is still valid.

So far, I have not been able to fully achieve this goal. However, I have the following partial result, stated somewhat informally:

Theorem 1 There is a “zeroth order” linear operator {T} (which, unfortunately, is not invertible, self-adjoint, or positive definite) for which the system (1) exhibits smooth solutions that blowup in finite time.

The operator {T} constructed is not quite a zeroth-order pseudodifferential operator; it is instead merely in the “forbidden” symbol class {S^0_{1,1}}, and more precisely it takes the form

\displaystyle T v = \sum_{j \in {\bf Z}} 2^{3j} \langle v, \phi_j \rangle \psi_j \ \ \ \ \ (2)


for some compactly supported divergence-free {\phi,\psi} of mean zero with

\displaystyle \phi_j(x) := \phi(2^j x); \quad \psi_j(x) := \psi(2^j x)

being {L^2} rescalings of {\phi,\psi}. This operator is still bounded on all {L^p({\bf R}^3)} spaces {1 < p < \infty}, and so is arguably still a zeroth order operator, though not as convincingly as I would like. Another, less significant, issue with the result is that the solution constructed does not have good spatial decay properties, but this is mostly for convenience and it is likely that the construction can be localised to give solutions that have reasonable decay in space. But the biggest drawback of this theorem is the fact that {T} is not invertible, self-adjoint, or positive definite, so in particular there is no non-negative Hamiltonian for this equation. It may be that some modification of the arguments below can fix these issues, but I have so far been unable to do so. Still, the construction does show that the circulation theorem is insufficient by itself to prevent blowup.

We sketch the proof of the above theorem as follows. We use the barrier method, introducing the time-varying hyperboloid domains

\displaystyle \Omega(t) := \{ (r,\theta,z): r^2 \leq 1-t + z^2 \}

for {t>0} (expressed in cylindrical coordinates {(r,\theta,z)}). We will select initial data {\omega(0)} to be {\omega(0,r,\theta,z) = (0,0,\eta(r))} for some non-negative even bump function {\eta} supported on {[-1,1]}, normalised so that

\displaystyle \int\int \eta(r)\ r dr d\theta = 1;

in particular {\omega(0)} is divergence-free supported in {\Omega(0)}, with vortex lines connecting {z=-\infty} to {z=+\infty}. Suppose for contradiction that we have a smooth solution {\omega} to (1) with this initial data; to simplify the discussion we assume that the solution behaves well at spatial infinity (this can be justified with the choice (2) of vorticity-stream operator, but we will not do so here). Since the domains {\Omega(t)} disconnect {z=-\infty} from {z=+\infty} at time {t=1}, there must exist a time {0 < T_* < 1} which is the first time where the support of {\omega(T_*)} touches the boundary of {\Omega(T_*)}, with {\omega(t)} supported in {\Omega(t)} for all {0 \leq t \leq T_*}.

From (1) we see that the support of {\omega(t)} is transported by the velocity field {u(t)}. Thus, at the point of contact of the support of {\omega(T_*)} with the boundary of {\Omega(T_*)}, the inward component of the velocity field {u(T_*)} cannot exceed the inward velocity of {\Omega(T_*)}. We will construct the functions {\phi,\psi} so that this is not the case, leading to the desired contradiction. (Geometrically, what is going on here is that the operator {T} is pinching the flow to pass through the narrow cylinder {\{ z, r = O( \sqrt{1-t} )\}}, leading to a singularity by time {t=1} at the latest.)

First we observe from conservation of circulation, and from the fact that {\omega(t)} is supported in {\Omega(t)}, that the integrals

\displaystyle \int\int \omega_z(t,r,\theta,z) \ r dr d\theta

are constant in both space and time for {0 \leq t \leq T_*}. From the choice of initial data we thus have

\displaystyle \int\int \omega_z(t,r,\theta,z) \ r dr d\theta = 1

for all {t \leq T_*} and all {z}. On the other hand, if {T} is of the form (2) with {\phi = \nabla \times \eta} for some bump function {\eta = (0,0,\eta_z)} that only has {z}-components, then {\phi} is divergence-free with mean zero, and

\displaystyle \langle (-\Delta)^{-1} (\nabla \times \omega), \phi_j \rangle = 2^{-j} \langle (-\Delta)^{-1} (\nabla \times \omega), \nabla \times \eta_j \rangle

\displaystyle = 2^{-j} \langle \omega, \eta_j \rangle

\displaystyle = 2^{-j} \int\int\int \omega_z(t,r,\theta,z) \eta_z(2^j r, \theta, 2^j z)\ r dr d\theta dz,

where {\eta_j(x) := \eta(2^j x)}. We choose {\eta_z} to be supported in the slab {\{ C \leq z \leq 2C\}} for some large constant {C}, and to equal a function {f(z)} depending only on {z} on the cylinder {\{ C \leq z \leq 2C; r \leq 10C \}}, normalised so that {\int f(z)\ dz = 1}. If {C/2^j \geq (1-t)^{1/2}}, then {\Omega(t)} passes through this cylinder, and we conclude that

\displaystyle \langle (-\Delta)^{-1} (\nabla \times \omega), \phi_j \rangle = 2^{-j} \int f(2^j z)\ dz

\displaystyle = 2^{-2j}.
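For readers who want the intermediate steps, the computation above can be unpacked as follows. This is a sketch that reads the operator in the pairing as {(-\Delta)^{-1}}, as in (1), and assumes enough decay to integrate by parts; it uses only the definitions of {\phi_j}, {\eta_j} and {f} given in the text.

```latex
% Curl of a rescaled field picks up a factor 2^j, which gives phi_j = 2^{-j} curl(eta_j):
\nabla \times \eta_j (x) = 2^j (\nabla \times \eta)(2^j x) = 2^j \phi_j(x)

% For divergence-free v, curl curl v = grad(div v) - Laplacian v = -Laplacian v, so
\langle (-\Delta)^{-1} (\nabla \times \omega), \nabla \times \eta_j \rangle
  = \langle (-\Delta)^{-1} (\nabla \times \nabla \times \omega), \eta_j \rangle
  = \langle \omega, \eta_j \rangle

% On each horizontal slice the circulation integral equals 1, leaving
2^{-j} \int f(2^j z) \left( \int\int \omega_z \, r \, dr \, d\theta \right) dz
  = 2^{-j} \int f(2^j z)\, dz = 2^{-j} \cdot 2^{-j} = 2^{-2j}

% so the coefficient of psi_j in (2) is
2^{3j} \cdot 2^{-2j} = 2^{j}
```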

Inserting this into (2), (1) we conclude that

\displaystyle u = \sum_{j: C/2^j \geq (1-t)^{1/2}} 2^j \psi_j + \sum_{j: C/2^j < (1-t)^{1/2}} c_j(t) \psi_j

for some coefficients {c_j(t)}. We will not be able to control these coefficients {c_j(t)}, but fortunately we only need to understand {u} on the boundary {\partial \Omega(t)}, for which {r+|z| \gg (1-t)^{1/2}}. So, if {\psi} happens to be supported on an annulus {r+|z| \sim 1}, then the {\psi_j} in the second sum (those with {C/2^j < (1-t)^{1/2}}) vanish on {\partial \Omega(t)} if {C} is large enough. We then have

\displaystyle u = \sum_j 2^j \psi_j

on the boundary {\partial \Omega(t)}.

Let {\Phi(r,\theta,z)} be a function of the form

\displaystyle \Phi(r,\theta,z) = C z \varphi(z/r)

where {\varphi} is a bump function supported on {[-2,2]} that equals {1} on {[-1,1]}. We can perform a dyadic decomposition {\Phi = \sum_j \Psi_j} where

\displaystyle \Psi_j(r,\theta,z) = \Phi(r,\theta,z) a(2^j r)

where {a} is a bump function supported on {[1/2,2]} with {\sum_j a(2^j r) = 1}. If we then set

\displaystyle \psi_j = \frac{2^{-j}}{r} (-\partial_z \Psi_j, 0, \partial_r \Psi_j)

then one can check that {\psi_j(x) = \psi(2^j x)} for a function {\psi} that is divergence-free and mean zero, and supported on the annulus {r+|z| \sim 1}, and

\displaystyle \sum_j 2^j \psi_j = \frac{1}{r} (-\partial_z \Phi, 0, \partial_r \Phi)

so on {\partial \Omega(t)} (where {|z| \leq r}) we have

\displaystyle u = (-\frac{C}{r}, 0, 0 ).

One can manually check that the inward velocity of this vector on {\partial \Omega(t)} exceeds the inward velocity of {\Omega(t)} if {C} is large enough, and the claim follows.
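Here is one way the manual check might go, as a sketch in the cylindrical coordinates used above. The defining function {F} and the symbols {n} and {V_n} are notation introduced only for this computation.

```latex
% Describe the domain as a sublevel set:
F(t,r,z) := r^2 - z^2 - (1-t), \qquad \Omega(t) = \{ F \leq 0 \}

% Outward unit normal on the boundary, in (r, \theta, z) components:
n = \frac{(r, 0, -z)}{\sqrt{r^2+z^2}}

% Normal velocity of the moving boundary (negative means it moves inward):
V_n = -\frac{\partial_t F}{|\nabla F|} = -\frac{1}{2\sqrt{r^2+z^2}}

% Normal component of the fluid velocity u = (-C/r, 0, 0):
u \cdot n = -\frac{C}{\sqrt{r^2+z^2}}

% The fluid moves inward strictly faster than the boundary exactly when
u \cdot n < V_n \iff C > \tfrac{1}{2}
```

On {\partial \Omega(t)} we have {r^2 = 1-t+z^2 \geq z^2}, so the formula for {u} applies there. This kinematic comparison only needs {C > 1/2}; the earlier steps of the argument separately require {C} to be large.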

Remark 2 The type of blowup suggested by this construction, where a unit amount of circulation is squeezed into a narrow cylinder, is of “Type II” with respect to the Navier-Stokes scaling, because Navier-Stokes-critical norms such as {L^3({\bf R}^3)} (or at least {L^{3,\infty}({\bf R}^3)}) look like they stay bounded during this squeezing procedure (the velocity field is of size about {2^j} in cylinders of radius and length about {2^{-j}}). So even if the various issues with {T} are repaired, it does not seem likely that this construction can be directly adapted to obtain a corresponding blowup for a Navier-Stokes type equation. To get a “Type I” blowup that is consistent with Kelvin’s circulation theorem, it seems that one needs to coil the vortex lines around a loop multiple times in order to get increased circulation in a small space. This seems possible to pull off to me – there don’t appear to be any unavoidable obstructions coming from topology, scaling, or conservation laws – but would require a more complicated construction than the one given above.

Filed under: expository, math.AP Tagged: incompressible Euler equations

February 02, 2016

Doug NatelsonWhat is density functional theory? part 1.

In previous posts, I've tried to introduce the idea that there can be "holistic" approaches to solving physics problems, and I've attempted to give a lay explanation of what a functional is (short version: a functional is a function of a function - it chews on a whole function and spits out a number.).  Now I want to talk about density functional theory, an incredibly valuable and useful scientific advance ("easily the most heavily cited concept in the physical sciences"), yet one that is basically invisible to the general public.
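To make "a function of a function" concrete, here is a toy Python sketch. The particular functional (the integral of f squared over an interval, estimated with the trapezoid rule) and the name `functional` are arbitrary choices made for illustration, not anything from DFT itself.

```python
import math

def functional(f, a=0.0, b=1.0, steps=1000):
    """Chew on a whole function f and return one number:
    a trapezoid-rule estimate of the integral of f(x)**2 over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) ** 2 + f(b) ** 2)
    for k in range(1, steps):
        total += f(a + k * h) ** 2
    return total * h

# Different input functions give different output numbers.
print(functional(lambda x: x))            # close to 1/3
print(functional(math.sin, 0.0, math.pi)) # close to pi/2
```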

Let me try an analogy.  You're trying to arrange the seating for a big banquet, and there are a bunch of constraints:  Alice wants very much to be close to the kitchen.  Bob also wants to be close to the kitchen.  However, Alice and Bob both want to be as far from all other people as possible.  Etc. Chairs can't be on top of each other, but you still need to accommodate the full guest list.  In the end you are going to care about the answers to certain questions:  How hard would it be to push two chairs closer to each other? If one person left, how much would all the chairs need to be rearranged to keep everyone maximally comfortable?    You could imagine solving this problem by brute force - write down all the constraints and try satisfying them one person at a time, though every person you add might mean rearranging all the previously seated people.  You could also imagine solving this by some trial-and-error method, where you guess an initial arrangement, and make adjustments to check and see if you've improved how well you satisfy everyone.  However, it doesn't look like there's any clear, immediate strategy for figuring this out and answering the relevant questions.

The analogy of DFT here would be three statements.  First, you'd probably be pretty surprised if I told you that if I gave you the final seating positions of the people in the room, that would completely specify and nail down the answer to any of those questions up there that you could ask about the room.[1]  Second, there is a math procedure (a functional that depends on the positions of all of the people in the room that can be minimized) to find that unique seating chart.[2]  Third, even more amazingly, there is some mock-up of the situation where we don't have to worry about the people-people interactions directly, yet (minimizing a functional of the positions of the non-interacting people) would still give us the full seating chart, and therefore let us answer all the questions.[3]

For a more physicsy example:  Suppose you want to figure out the electronic properties of some system.  In something like hydrogen gas, H2, maybe we want to know where the electrons are, how far apart the atoms like to sit, and how much energy it takes to kick out an electron - these are important things to know if you are a chemist and want to understand chemical reactions, for example.  Conceptually, this is easy:  In principle we know the mathematical rules that describe electrons, so we should be able to write down the relevant equations, solve them (perhaps with a computer if we can't find nice analytical solutions), and we're done.  In this case, the equation of interest is the time-independent form of the Schroedinger equation.  There are two electrons in there, one coming from each hydrogen atom.  One tricky wrinkle is that the two electrons don't just feel an attraction to the protons, but they also repel each other - that makes this an "interacting electron" problem.  A second tricky wrinkle is that the electrons are fermions.  If we imagine swapping (the quantum numbers associated with) two electrons, we have to pick up a minus sign in the math representation of their quantum state.  We do know how to solve this problem (two interacting electrons plus two much heavier protons) numerically to a high degree of accuracy.  Doing this kind of direct solution gets prohibitively difficult, however, as the number of electrons increases.

So what do we do?  DFT tells us:
[1] If you actually knew the total electron density as a function of position, \(n(\mathbf{r})\), that would completely determine the properties of the electronic ground state.  This is the first Hohenberg-Kohn theorem.

[2] There is a unique functional \(E[n(\mathbf{r})]\) for a given system that, when minimized, will give you the correct density \(n(\mathbf{r})\).  This is the second Hohenberg-Kohn theorem.

[3] You can set up a system where, with the right functional, you can solve a problem involving noninteracting electrons that will give you the true density \(n(\mathbf{r})\).  That's the Kohn-Sham approach, which has actually made this kind of problem solving practical.

The observations by Kohn and Hohenberg are very deep.  Somehow just the electronic density encodes a whole lot more information than you might think, especially if you've had homework experience trying to solve many-body quantum mechanics problems.  The electronic density somehow contains complete information about all the properties of the lowest energy many-electron state.  (In quantum language, knowing the density everywhere in principle specifies the expectation value of any operator you could apply to the ground state.)

The advance by Kohn and Sham is truly great - it describes an actual procedure that you can carry out to really calculate those ground state properties.  The Kohn-Sham approach and its refinements have created the modern field of "quantum chemistry".

More soon....

Jordan EllenbergThe Story of a New Name

2016 reading project is to have more than half my reading be books in translation.  So far this has translated into reading Ferrante after Ferrante.  Not really feeling equal to the task of writing about these books, which color everything else around them while you read.  The struggle to be the protagonist of your own story.  Gatsby is a snapshot of it, Ferrante is a movie of it.

John PreskillMore than Its Parts

“The whole not only becomes more than but very different from the sum of its parts.”
– P. W. Anderson

It was a brainstorming meeting. We went from referencing a melodramatic viral Thai video to P. W. Anderson’s famous paper, “More is Different” to a vote-splitting internally produced Caltech video to the idea of extracting an edit out of existing video we already had from prior IQIM-Parveen Shah Productions’ shoots. And just like that, stepping from one idea to the next, hopping along, I pitched a theorist vs experimentalist “interrogation”. They must have liked it because the opinion in the room hushed for a few seconds. It seemed plausibly exciting and dangerously perhaps…fresh. But what I witnessed in the room was exactly the idea of collaboration and the “storm” of the brain. This wasn’t a conclusion we could likely have arrived at if all of us were sitting alone. There was a sense of the dust settling around the chaotic storm of the collective brain(s). John Preskill, Crystal Dilworth, Spiros Michalakis and I finally agreed on a plan of action going forward. And mind you, this of course was very far from the first meeting or email we had had about the video.

Capitalizing on the instant excitement, the emails started going around. Who will be the partners in crime? A combination of personality, representation, willingness and profile were balanced to decide the participants. We reached out to Gil Refael, David Hsieh, Nai-Chang Yeh, Xie Chen and Oskar Painter. They all said “yes”! It seemed deceptively easy. And alas it came. Once the idea of the interrogation was unleashed, pitting them against one another, or should I say “with” one another, brought about a bit of anxiety and confusion at first. “Wait, we’re supposed to fight on camera?” “But, our fields don’t match necessarily.” “No, no, it just doesn’t make sense.” I was prepared for the paranoia. It was natural and a bit less than what we got back when we pitched the high-fashion shoot for geeks video. I was taking them out of their comfort zone. It was natural. I abated the fears. I told them it was not going to be that controversial.

But I had to prep it to a certain level of “conflict” or “drama” so that what we got on camera, was at least some remnant of the initial emotional intention. The questions, the “tone” had to be set. Then we realized that it wasn’t just the meeting of the professorial brains but also the other researchers of the Institute that needed to be represented. And so, we also added some post docs and graduate students. Johannes Pollanen (now already an Assistant Professor at Michigan State University), Chandni Usha and Shaun Maguire. The idea of a nebulous conversation about theory vs practical or theory and practical seemed like a literal experiment of the very idea of the formation of the IQIM: putting the best brains in the field, in a sandbox, shaking them around to see the entangled interactions produced. It seemed too perfect.

The resulting video might not have produced the exact linear narrative I desired…but it was indeed “more than the sum of its parts”. It showed the excitement, the constant interaction, the curious conversations and the anxiety of being at the forefront, the cutting edge, where one is sometimes limited by another and sometimes enabled by the other, but most importantly, constantly growing and evolving. IQIM to me signifies that community.

Being accepted and integrated as a filmmaker itself is a virtue of that forced and encouraged collaboration and interaction.

And so we began. Before we filmed, I spent time with each duo, discussing and requesting a narrative and answers to some proposed questions.

Cinematographer, Anthony C. Kuhnz, and I were excited to shoot in Keith Schwab’s spacious lab, which had produced the memorable shot of Emma Wollman for our initial promo video. Its space and background were exactly what we needed for the blurred backgrounds of this “brain space” we were hoping to create.

The lighting was certainly inspired by an interrogation scene but dampened for the dream state. We wanted to bring people into a behind the scenes discussion of some of the most brilliant minds in quantum physics, seeing the issues and challenges that face them; the exciting possibilities that they predict. The handheld camera and the dynamic pans from one to another were also inspired to communicate that transitional and collaborative “ball toss” energy. Once you feel that tangible creativity, then we go into the depths of what IQIM really is, how it creates its community within and without, the latter by focusing on the outreach and the educational efforts to spread the magic and sparkle of Physics.

I’m proud of this video. When I watch it, whether or not I understand everything being said, I do certainly want to be engaged with IQIM and that is the hope for others who watch it.

Nature is subtle and so is the effect of this video…as in, we hope that…we gotcha!

n-Category Café Integral Octonions (Part 12)

guest post by Tim Silverman

“Everything is simpler mod $p$.”

That is the philosophy of the Mod People; and of all $p$, the simplest is 2. Washed in a bath of mod 2, that exotic object, the $\mathrm{E}_8$ lattice, dissolves into a modest orthogonal space, its Weyl group into an orthogonal group, its “large” $\mathrm{E}_8$ sublattices into some particularly nice subspaces, and the very Leech lattice itself shrinks into a few arrangements of points and lines that would not disgrace the pages of Euclid’s Elements. And when we have sufficiently examined these few bones that have fallen out of their matrix, we can lift them back up to Euclidean space in the most naive manner imaginable, and the full Leech springs out in all its glory like instant mashed potato.

What is this about? In earlier posts in this series, JB and Greg Egan have been calculating and exploring a lot of beautiful Euclidean geometry involving $\mathrm{E}_8$ and the Leech lattice. Lately, a lot of Fano planes have been popping up in the constructions. Examining these, I thought I caught some glimpses of a more extensive $\mathbb{F}_2$ geometry; I made a little progress in the comments, but then got completely lost. But there is indeed an extensive $\mathbb{F}_2$ world in here, parallel to the Euclidean one. I have finally found the key to it in the following fact:

Large $\mathrm{E}_8$ lattices mod $2$ are just maximal flats in a $7$-dimensional quadric over $\mathbb{F}_2$.

I’ll spend the first half of the post explaining what that means, and the second half showing how everything else flows from it. We unfortunately bypass (or simply assume in passing) most of the pretty Euclidean geometry; but in exchange we get a smaller, simpler picture which makes a lot of calculations easier, and the $\mathbb{F}_2$ world seems to lift very cleanly to the Euclidean world, though I haven’t actually proved this or explained why — maybe I shall leave that as an exercise for you, dear readers.

N.B. Just a quick note on scaling conventions before we start. There are two scaling conventions we could use. In one, a ‘shrunken’ $\mathrm{E}_8$ made of integral octonions, with shortest vectors of length $1$, contains ‘standard’ sized $\mathrm{E}_8$ lattices with vectors of minimal length $\sqrt{2}$, and Wilson’s Leech lattice construction comes out the right size. The other is $\sqrt{2}$ times larger: a ‘standard’ $\mathrm{E}_8$ lattice contains “large” $\mathrm{E}_8$ lattices of minimal length $2$, but Wilson’s Leech lattice construction gives something $\sqrt{2}$ times too big. I’ve chosen the latter convention because I find it less confusing: reducing the standard $\mathrm{E}_8$ mod $2$ is a well-known thing that people do, and all the Euclidean dot products come out as integers. But it’s as well to bear this in mind when relating this post to the earlier ones.

Projective and polar spaces

I’ll work with projective spaces over $\mathbb{F}_q$ and try not to suddenly start jumping back and forth between projective spaces and the underlying vector spaces as is my wont, at least not unless it really makes things clearer.

So we have an $n$-dimensional projective space over $\mathbb{F}_q$. We’ll denote this by $\mathrm{PG}(n,q)$.

The full symmetry group of $\mathrm{PG}(n,q)$ is $\mathrm{GL}_{n+1}(q)$, and from that we get subgroups and quotients $\mathrm{SL}_{n+1}(q)$ (with unit determinant), $\mathrm{PGL}_{n+1}(q)$ (quotient by the centre) and $\mathrm{PSL}_{n+1}(q)$ (both). Over $\mathbb{F}_2$, the determinant is always $1$ (since that’s the only non-zero scalar) and the centre is trivial, so these groups are all the same.

In projective spaces over $\mathbb{F}_2$, there are $3$ points on every line, so we can ‘add’ any two points and get the third point on the line through them. (This is just a projection of the underlying vector space addition.)

In odd characteristic, we get two other families of Lie type by preserving two types of non-degenerate bilinear form: symmetric and skew-symmetric, corresponding to orthogonal and symplectic structures respectively. (Non-degenerate Hermitian forms, defined over $\mathbb{F}_{q^2}$, also exist and behave similarly.)

Denote the form by $B(x,y)$. Points $x$ for which $B(x,x)=0$ are isotropic. For a symplectic structure all points are isotropic. A form $B$ such that $B(x,x)=0$ for all $x$ is called alternating, and in odd characteristic, but not characteristic $2$, skew-symmetric and alternating forms are the same thing.

A line spanned by two isotropic points, $x$ and $y$, such that $B(x,y)=1$ is a hyperbolic line. Any space with a non-degenerate bilinear (or Hermitian) form can be decomposed as the orthogonal sum of hyperbolic lines (i.e. as a vector space, decomposed as an orthogonal sum of hyperbolic planes), possibly together with an anisotropic space containing no isotropic points at all. There are no non-empty symplectic anisotropic spaces, so all symplectic spaces are odd-dimensional (projectively — the corresponding vector spaces are even-dimensional).

There are anisotropic orthogonal points and lines (over any finite field including in even characteristic), but all the orthogonal spaces we consider here will be a sum of hyperbolic lines — we say they are of plus type. (The odd-dimensional projective spaces with a residual anisotropic line are of minus type.)

A quadratic form $Q(x)$ is defined by the conditions

i) $Q(x+y)=Q(x)+Q(y)+B(x,y)$, where $B$ is a symmetric bilinear form.

ii) $Q(\lambda x)=\lambda^2Q(x)$ for any scalar $\lambda$.

There are some non-degeneracy conditions I won’t go into.

Obviously, a quadratic form implies a particular symmetric bilinear form, by $B(x,y)=Q(x+y)-Q(x)-Q(y)$. In odd characteristic, we can go the other way: $Q(x)=\frac{1}{2}B(x,x)$.

We denote the group preserving an orthogonal structure of plus type on an $n$-dimensional projective space over $\mathbb{F}_q$ by $\mathrm{GO}_{n+1}^+(q)$, by analogy with $\mathrm{GL}_{n+1}(q)$. Similarly we have $\mathrm{SO}_{n+1}^+(q)$, $\mathrm{PGO}_{n+1}^+(q)$ and $\mathrm{PSO}_{n+1}^+(q)$. However, whereas $\mathrm{PSL}_n(q)$ is simple apart from $2$ exceptions, we usually have an index $2$ subgroup of $\mathrm{SO}_{n+1}^+(q)$, called $\Omega_{n+1}^+(q)$, and a corresponding index $2$ subgroup of $\mathrm{PSO}_{n+1}^+(q)$, called $\mathrm{P}\Omega_{n+1}^+(q)$, and it is the latter that is simple. (There is an infinite family of exceptions, where $\mathrm{PSO}_{n+1}^+(q)$ is simple.)

Symplectic structures are easier — the determinant is automatically $1$, so we just have $\mathrm{Sp}_{n+1}(q)$ and $\mathrm{PSp}_{n+1}(q)$, with the latter being simple except for $3$ exceptions.

Just as a point with $B(x,x)=0$ is an isotropic point, so any subspace with $B$ identically $0$ on it is an isotropic subspace.

And just as the linear groups act on incidence geometries given by the (‘classical’) projective spaces, so the symplectic and orthogonal groups act on polar spaces, whose points, lines, planes, etc, are just the isotropic points, isotropic lines, isotropic planes, etc given by the bilinear (or Hermitian) form. We denote an orthogonal polar space of plus type on an $n$-dimensional projective space over $\mathbb{F}_q$ by $\mathrm{Q}_n^+(q)$.

In characteristic $2$, a lot of this goes wrong, but in a way that can be fixed and mostly turns out the same.

1) Symmetric and skew-symmetric forms are the same thing! There are still distinct orthogonal and symplectic structures and groups, but we can’t use this as the distinction.

2) Alternating and skew-symmetric forms are not the same thing! Alternating forms are all skew-symmetric (aka symmetric) but not vice versa. A symplectic structure is given by an alternating form — and of course this definition works in odd characteristic too.

3) Symmetric bilinear forms are no longer in bijection with quadratic forms: every quadratic form gives a unique symmetric (aka skew-symmetric, and indeed alternating) bilinear form, but an alternating form is compatible with multiple quadratic forms. We use non-degenerate quadratic forms to define orthogonal structures, rather than symmetric bilinear forms — which of course works in odd characteristic too. (Note also from the above that in characteristic $2$ an orthogonal structure has an associated symplectic structure, which it shares with other orthogonal structures.)

We now have both isotropic subspaces on which the bilinear form is identically $0$, and singular subspaces on which the quadratic form is identically $0$, with the latter being a subset of the former. It is the singular spaces which go to make up the polar space for the orthogonal structure.
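Point 3 and the isotropic/singular distinction can be seen in the smallest possible case. The following is a minimal sketch over $\mathbb{F}_2$ on a $2$-dimensional vector space; the two forms `q_plus` and `q_minus` are illustrative choices made here, not forms taken from the post. Both polarise to the same alternating bilinear form, so every non-zero point is isotropic for both, yet they have different singular points.

```python
from itertools import product

def q_plus(v):
    # Illustrative hyperbolic form x0*x1 over F_2
    x0, x1 = v
    return (x0 * x1) % 2

def q_minus(v):
    # Illustrative anisotropic form x0^2 + x0*x1 + x1^2 over F_2
    x0, x1 = v
    return (x0 * x0 + x0 * x1 + x1 * x1) % 2

def polar(q, x, y):
    # B(x,y) = Q(x+y) - Q(x) - Q(y), reduced mod 2
    s = tuple((a + b) % 2 for a, b in zip(x, y))
    return (q(s) - q(x) - q(y)) % 2

vectors = list(product((0, 1), repeat=2))
nonzero = [v for v in vectors if any(v)]

# Both quadratic forms give the same bilinear form ...
same_form = all(polar(q_plus, x, y) == polar(q_minus, x, y)
                for x in vectors for y in vectors)
# ... and that bilinear form is alternating, so every point is isotropic.
alternating = all(polar(q_plus, x, x) == 0 for x in vectors)

# The singular points (Q = 0) differ between the two forms.
singular_plus = [v for v in nonzero if q_plus(v) == 0]
singular_minus = [v for v in nonzero if q_minus(v) == 0]

print(same_form, alternating, singular_plus, singular_minus)
```

Here `q_plus` has two singular points and `q_minus` has none, which is the plus/minus distinction in miniature.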

To cover both cases, we’ll refer to these isotropic/singular projective spaces inside the polar spaces as flats.

Everything else is still the same — decomposition into hyperbolic lines and an anisotropic space, plus and minus types, $\Omega_{n+1}^+(q)$ inside $\mathrm{SO}_{n+1}^+(q)$, polar spaces, etc.

Over $\mathbb{F}_2$, we have that $\mathrm{GO}_{n+1}^+(q)$, $\mathrm{SO}_{n+1}^+(q)$, $\mathrm{PGO}_{n+1}^+(q)$ and $\mathrm{PSO}_{n+1}^+(q)$ are all the same group, as are $\Omega_{n+1}^+(q)$ and $\mathrm{P}\Omega_{n+1}^+(q)$.

The vector space dimension of the maximal flats in a polar space is the polar rank of the space, one of its most important invariants — it’s the number of hyperbolic lines in its orthogonal decomposition.

$\mathrm{Q}_{2m-1}^+(q)$ has rank $m$. The maximal flats fall into two classes. In odd characteristic, the classes are preserved by $\mathrm{SO}_{2m}^+(q)$ but interchanged by the elements of $\mathrm{GO}_{2m}^+(q)$ with determinant $-1$. In even characteristic, the classes are preserved by $\Omega_{2m}^+(q)$, but interchanged by elements of $\mathrm{GO}_{2m}^+(q)$.

Finally, I’ll refer to the value of the quadratic form at a point, $Q(x)$, as the norm of $x$, even though in Euclidean space we’d call it “half the norm-squared”.

Here are some useful facts about $\mathrm{Q}_{2m-1}^+(q)$:

1a. The number of points is $\displaystyle\frac{\left(q^m-1\right)\left(q^{m-1}+1\right)}{q-1}$.

1b. The number of maximal flats is $\prod_{i=0}^{m-1}\left(1+q^i\right)$.

1c. Two maximal flats of different types must intersect in a flat of odd codimension; two maximal flats of the same type must intersect in a flat of even codimension.
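Facts 1a and 1b are easy to evaluate numerically. The sketch below just transcribes the two formulas; the function names `num_points` and `num_maximal_flats` are made up for this illustration. For $m=4$ it also shows the coincidence that each of the two classes of maximal flats has as many members as there are points.

```python
def num_points(m, q):
    # Fact 1a: (q^m - 1)(q^(m-1) + 1) / (q - 1); the division is exact
    return (q**m - 1) * (q**(m - 1) + 1) // (q - 1)

def num_maximal_flats(m, q):
    # Fact 1b: product of (1 + q^i) for i = 0 .. m-1
    total = 1
    for i in range(m):
        total *= 1 + q**i
    return total

for m, q in [(3, 2), (4, 2), (4, 3)]:
    print(m, q, num_points(m, q), num_maximal_flats(m, q))
# prints:
# 3 2 35 30
# 4 2 135 270
# 4 3 1120 2240
```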

Here are two more general facts.

1d. Pick a projective space $\Pi$ of dimension $n$. Pick a point $p$ in it. The space whose points are lines through $p$, whose lines are planes through $p$, etc, with incidence inherited from $\Pi$, is a projective space of dimension $n-1$.

1e. Pick a polar space $\Sigma$ of rank $m$. Pick a point $p$ in it. The space whose points are lines (i.e. $1$-flats) through $p$, whose lines are planes (i.e. $2$-flats) through $p$, etc, with incidence inherited from $\Sigma$, is a polar space of the same type, of rank $m-1$.

The Klein correspondence at breakneck speed

The bivectors of a $4$-dimensional vector space constitute a $6$-dimensional vector space. Apart from the zero bivector, these fall into two types: degenerate ones which can be decomposed as the wedge product of two vectors and therefore correspond to planes (or, projectively, lines); and non-degenerate ones, which, by wedging with vectors on each side, give rise to symplectic forms. Wedging two bivectors gives an element of the $1$-dimensional space of $4$-vectors, and, picking a basis, the single component of this wedge product gives a non-degenerate symmetric bilinear form on the $6$-dimensional vector space of bivectors, and hence, in odd characteristic, an orthogonal space, which turns out to be of plus type. It also turns out that this can be carried over to characteristic $2$ as well, and gives a correspondence between $\mathrm{PG}(3,q)$ and $\mathrm{Q}_5^+(q)$, and isomorphisms between their symmetry groups. It is precisely the degenerate bivectors that are the ones of norm $0$, and we get the following correspondence:

$$\begin{array}{|l|l|}
\hline
\mathbf{\mathrm{Q}_5^+(q)} & \mathbf{\mathrm{PG}(3,q)}\\ \hline
\text{point} & \text{line}\\
\text{orthogonal points} & \text{intersecting lines}\\
\text{line} & \text{plane pencil}\\
\text{plane}_1 & \text{point}\\
\text{plane}_2 & \text{plane}\\ \hline
\end{array}$$

Here, “plane pencil” is all the lines that both go through a particular point and lie in a particular plane: effectively a point on a plane. The two types of plane in $\mathrm{Q}_5^+(q)$ are two families of maximal flats, and they correspond, in $\mathrm{PG}(3,q)$, to “all the lines through a particular point” and “all the lines in a particular plane”.

From fact 1c above, in $\mathrm{Q}_5^+(q)$ we have that two maximal flats of different types must either intersect in a line or not intersect at all, corresponding to the fact in $\mathrm{PG}(3,q)$ that a point and a plane are either incident or not; while two maximal flats of the same type must intersect in a point, corresponding to the fact in $\mathrm{PG}(3,q)$ that any two points lie in a line, and any two planes intersect in a line.

Triality zips past your window

In $\mathrm{Q}_7^+(q)$, you may observe from facts 1a and 1b that the following three things are equal in number: points; maximal flats of one type; maximal flats of the other type. This is because these three things are cycled by the triality symmetry.

Counting things over $\mathbb{F}_2$

Over $\mathbb{F}_2$, we have the following things:

2a. $\mathrm{PG}(3,2)$ has $15$ planes, each containing $7$ points and $7$ lines. It has (dually) $15$ points, each contained in $7$ lines and $7$ planes. It has $35$ lines, each containing $3$ points and contained in $3$ planes.

2b. $\mathrm{Q}_5^+(2)$ has $35$ points, corresponding to the $35$ lines of $\mathrm{PG}(3,2)$, and $30$ planes, corresponding to the $15$ points and $15$ planes of $\mathrm{PG}(3,2)$. There’s lots and lots of other interesting stuff, but we will ignore it.

2c. $\mathrm{Q}_7^+(2)$ has $135$ points and $270$ $3$-spaces, i.e. two families of maximal flats containing $135$ elements each. A projective $7$-space has $255$ points, so if we give it an orthogonal structure of plus type, it will have $255-135=120$ points of norm $1$.
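Some of the counts in 2a to 2c can be confirmed by brute force. This is a sketch under one assumption: the plus-type quadratic form is taken in the standard shape $x_0x_1+x_2x_3+\dots$ (a sum of hyperbolic pairs), which is a coordinate choice made here rather than anything fixed by the post. Lines of the projective space over $\mathbb{F}_2$ are built with the "add two points to get the third" rule, using XOR on bit patterns.

```python
from itertools import product

def hyperbolic_q(v):
    # Assumed plus-type form: x0*x1 + x2*x3 + ..., reduced mod 2
    return sum(v[i] * v[i + 1] for i in range(0, len(v), 2)) % 2

def count_by_norm(dim):
    # Over F_2 every non-zero vector is exactly one projective point
    counts = {0: 0, 1: 0}
    for v in product((0, 1), repeat=dim):
        if any(v):
            counts[hyperbolic_q(v)] += 1
    return counts

def count_lines(dim):
    # A line is {a, b, a + b}; points are the non-zero bit patterns
    points = range(1, 2**dim)
    return len({frozenset((a, b, a ^ b))
                for a in points for b in points if a != b})

print(count_lines(4))     # lines of PG(3,2)
print(count_by_norm(6))   # points of norm 0 and 1 in a projective 5-space
print(count_by_norm(8))   # points of norm 0 and 1 in a projective 7-space
```

The three printed results are `35`, `{0: 35, 1: 28}` and `{0: 135, 1: 120}`, in agreement with 2a, 2b and 2c.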

$\mathrm{E}_8$ mod $2$

Now we move on to the second part.

We’ll coordinatise the $\mathrm{E}_8$ lattice so that the coordinates of its points are of the following types:

a) All integer, summing to an even number

b) All integer$+\frac{1}{2}$, summing to an odd number.

Then the roots are of the following types:

a) All permutations of $\left(\pm1,\pm1,0,0,0,0,0,0\right)$

b) All points like $\left(\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2}\right)$ with an odd number of minus signs.

We now quotient $\mathrm{E}_8$ by $2\mathrm{E}_8$. The elements of the quotient can be represented by the following:

a) All coordinates are $1$ or $0$, an even number of each.

b) All coordinates are $\pm\frac{1}{2}$ with either $1$ or $3$ minus signs.

c) Take an element of type b and put a star after it. The meaning of this is: you can replace any coordinate $\frac{1}{2}$ with $-\frac{3}{2}$, or any coordinate $-\frac{1}{2}$ with $\frac{3}{2}$, to get an $\mathrm{E}_8$ lattice element representing this element of $\mathrm{E}_8/2\mathrm{E}_8$.

This is an $8$-dimensional vector space over $\mathbb{F}_2$.

Now we put the following quadratic form on this space: $Q(x)$ is half the Euclidean norm-squared, mod $2$. This gives rise to the following bilinear form: the Euclidean dot product mod $2$. This turns out to be a perfectly good non-degenerate quadratic form of plus type over $\mathbb{F}_2$.

There are $120$ elements of norm $1$, and these correspond to roots of $\mathrm{E}_8$, with $2$ roots per element (related by switching the sign of all coordinates).

a) Elements of shape $\left(1,1,0,0,0,0,0,0\right)$ are already roots in this form.

b) Elements of shape $\left(0,0,1,1,1,1,1,1\right)$ correspond to the roots obtained by taking the complement (replacing all $1$s by $0$ and vice versa) and then changing the sign of one of the $1$s.

c) Elements in which all coordinates are $\pm\frac{1}{2}$ with either $1$ or $3$ minus signs are already roots, and by switching all the signs we get the half-integer roots with $5$ or $7$ minus signs.

There are $135$ non-zero elements of norm $0$, and these all correspond to lattice points in shell $2$, with $16$ lattice points per element of the vector space.

a) There are $70$ elements of shape $\left(1,1,1,1,0,0,0,0\right)$. We get $8$ lattice points by changing an even number of signs (including $0$). We get another $8$ lattice points by taking the complement and then changing an odd number of signs.

b) There is $1$ element of shape $\left(1,1,1,1,1,1,1,1\right)$. This corresponds to the $16$ lattice points of shape $\left(\pm2,0,0,0,0,0,0,0\right)$.

c) There are $64$ elements like $\left(\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2}\right)^*$, with $1$ or $3$ minus signs. We get $8$ actual lattice points by replacing $\pm\frac{1}{2}$ by $\mp\frac{3}{2}$ in one coordinate, and another $8$ by changing the signs of all coordinates.

This accounts for all $16\cdot135=2160$ points in shell $2$.


$$\begin{array}{|l|l|}
\hline
\mathbf{shape}&\mathbf{number}\\ \hline
\left(1,1,1,1,1,1,1,1\right)&1\\ \hline
\left(1,1,1,1,0,0,0,0\right)&70\\ \hline
\left(\pm\tfrac{1}{2},\pm\tfrac{1}{2},\pm\tfrac{1}{2},\pm\tfrac{1}{2},\pm\tfrac{1}{2},\pm\tfrac{1}{2},\pm\tfrac{1}{2},\pm\tfrac{1}{2}\right)^*&64\\ \hline
\mathbf{total}&\mathbf{135}\\ \hline
\end{array}$$


$$\begin{array}{|l|l|}
\hline
\mathbf{shape}&\mathbf{number}\\ \hline
\left(1,1,1,1,1,1,0,0\right)&28\\ \hline
\left(1,1,0,0,0,0,0,0\right)&28\\ \hline
\left(\pm\tfrac{1}{2},\pm\tfrac{1}{2},\pm\tfrac{1}{2},\pm\tfrac{1}{2},\pm\tfrac{1}{2},\pm\tfrac{1}{2},\pm\tfrac{1}{2},\pm\tfrac{1}{2}\right)&64\\ \hline
\mathbf{total}&\mathbf{120}\\ \hline
\end{array}$$
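The totals $120$ and $135$ can be recovered directly from the lattice, without using the representatives above. The following sketch enumerates the roots (norm-squared $2$) and the second shell (norm-squared $4$) in the coordinates chosen earlier, then groups them into classes mod $2\mathrm{E}_8$ using only the definition: $x$ and $y$ are in the same class when $(x-y)/2$ lies in $\mathrm{E}_8$. To stay in integer arithmetic it stores doubled coordinates $2x$; that encoding and all function names are choices made for this illustration.

```python
from itertools import product

def in_e8(v):
    # v holds doubled coordinates 2x, so every entry is an integer
    if all(c % 2 == 0 for c in v):
        return sum(v) % 4 == 0   # x all-integer, coordinates sum to an even number
    if all(c % 2 == 1 for c in v):
        return sum(v) % 4 == 2   # x all half-integer, coordinates sum to an odd number
    return False

def shell(norm_sq):
    # E8 points of Euclidean norm-squared norm_sq (2 or 4), as doubled coordinates
    target = 4 * norm_sq         # doubling coordinates quadruples the norm-squared
    points = []
    for entries in ((-4, -2, 0, 2, 4), (-3, -1, 1, 3)):
        for v in product(entries, repeat=8):
            if sum(c * c for c in v) == target and in_e8(v):
                points.append(v)
    return points

def class_sizes(points):
    # Greedy grouping: p joins the class of r when (p - r)/2 lies in E8
    reps, sizes = [], []
    for p in points:
        for i, r in enumerate(reps):
            d = [a - b for a, b in zip(p, r)]
            if all(c % 2 == 0 for c in d) and in_e8([c // 2 for c in d]):
                sizes[i] += 1
                break
        else:
            reps.append(p)
            sizes.append(1)
    return sizes

roots = shell(2)
shell2 = shell(4)
root_classes = class_sizes(roots)
shell2_classes = class_sizes(shell2)
print(len(roots), len(root_classes), set(root_classes))
print(len(shell2), len(shell2_classes), set(shell2_classes))
```

The run takes a second or two in plain Python. It reports $240$ roots in $120$ classes of size $2$, and $2160$ shell-$2$ points in $135$ classes of size $16$.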

Since the quadratic form in $\mathbb{F}_2$ comes from the quadratic form in Euclidean space, it is preserved by the Weyl group $W(\mathrm{E}_8)$. In fact the homomorphism $W(\mathrm{E}_8)\rightarrow \mathrm{GO}_8^+(2)$ is onto, although (contrary to what I said in an earlier comment) it is a double cover — the element of $W(\mathrm{E}_8)$ that reverses the sign of all coordinates is a (in fact, the) non-trivial element of the kernel.

Large $\mathrm{E}_8$ lattices

Pick a Fano plane structure on a set of seven points.

Here is a large $\mathrm{E}_8$ containing $\left(2,0,0,0,0,0,0,0\right)$:

(where $1\le i,j,k,p,q,r,s\le7$)

$\pm2e_i$

$\pm e_0\pm e_i\pm e_j\pm e_k$ where $i$, $j$, $k$ lie on a line in the Fano plane

$\pm e_p\pm e_q\pm e_r\pm e_s$ where $p$, $q$, $r$, $s$ lie off a line in the Fano plane.

Reduced to $\mathrm{E}_8$ mod $2$, these come to

i) $\left(1,1,1,1,1,1,1,1\right)$

ii) $e_0+e_i+e_j+e_k$ where $i$, $j$, $k$ lie on a line in the Fano plane. E.g. $\left(1,1,1,0,1,0,0,0\right)$.

iii) $e_p+e_q+e_r+e_s$ where $p$, $q$, $r$, $s$ lie off a line in the Fano plane. E.g. $\left(0,0,0,1,0,1,1,1\right)$.

Each of these corresponds to $16$ elements of the large $\mathrm{E}_8$ roots.

Some notes on these points:

1) They’re all isotropic, since they have a multiple of $4$ non-zero entries.

2) They’re mutually orthogonal.

  a) Elements of types ii and iii are all orthogonal to $\left(1,1,1,1,1,1,1,1\right)$ because they have an even number of ones (like all all-integer elements).

  b) Two elements of type ii overlap in two places: $e_0$ and the point of the Fano plane that they share.

  c) If an element $x$ of type ii and an element $y$ of type iii are mutual complements, obviously they have no overlap. Otherwise, the complement of $y$ is an element of type ii, so $x$ overlaps with it in exactly two places; hence $x$ overlaps with $y$ itself in the other two non-zero places of $x$.

  d) From c, given two elements of type iii, one will overlap with the complement of the other in two places, hence (by the argument of c) will overlap with the other element itself in two places.

3) Adjoining the zero vector, they give a set closed under addition.

The rule for addition of all-integer elements is reasonably straightforward: if they are orthogonal, then treat the 1s and 0s as bits and add mod 2. If they aren’t orthogonal, then do the same, then take the complement of the answer.

  a) Adding \left(1,1,1,1,1,1,1,1\right) to any of the others just gives the complement, which is a member of the set.

  b) Adding two elements of type ii, we set to 0 the e_0 component and the component corresponding to the point of intersection in the Fano plane, leaving the 4 components where they don’t overlap. These are just the complement of the third line of the Fano plane through their point of intersection, so the sum is a member of the set.

  c) Each element of type iii is the sum of the element of type i and an element of type ii, hence is covered implicitly by cases a and b.

4) There are 15 elements of the set.

  a) There is \left(1,1,1,1,1,1,1,1\right).

  b) There are 7 corresponding to lines of the Fano plane.

  c) There are 7 corresponding to the complements of lines of the Fano plane.

From the above, these 15 elements form a maximal flat of \mathrm{Q}_7^+(2). (That is, 15 points projectively, forming a projective 3-space in a projective 7-space.)
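Notes 1 to 4 can be checked mechanically. The sketch below is only an illustration: it assumes a particular labelling of the Fano plane (lines are the cyclic shifts of {1, 2, 4} on the points 1 to 7, chosen so that the example \left(1,1,1,0,1,0,0,0\right) is a type ii element), which is my assumption and not something fixed above. It also uses plain bitwise addition mod 2, which by the addition rule is legitimate only because the elements turn out to be mutually orthogonal.

```python
from itertools import combinations

# Assumed labelling (not fixed by the text): the Fano lines are the cyclic
# shifts of {1, 2, 4} on the points 1..7.
FANO_LINES = [frozenset(((s + d - 1) % 7) + 1 for s in (1, 2, 4)) for d in range(7)]


def bits(support):
    """0/1 vector of length 8 with ones exactly at the coordinates in `support`."""
    return tuple(1 if i in support else 0 for i in range(8))


ALL_ONES = bits(range(8))                                            # type i
TYPE_II = [bits(line | {0}) for line in FANO_LINES]                  # e_0 plus a line
TYPE_III = [bits(set(range(1, 8)) - line) for line in FANO_LINES]    # off a line
POINTS = [ALL_ONES] + TYPE_II + TYPE_III
ZERO = (0,) * 8


def overlap(x, y):
    """Number of coordinates where both vectors are 1."""
    return sum(a & b for a, b in zip(x, y))


def add_mod_2(x, y):
    """Coordinatewise sum mod 2 (valid here because the points are orthogonal)."""
    return tuple(a ^ b for a, b in zip(x, y))


# Note 1: the number of non-zero entries is a multiple of 4.
isotropic = all(sum(p) % 4 == 0 for p in POINTS)
# Note 2: any two distinct elements overlap in an even number of places.
orthogonal = all(overlap(x, y) % 2 == 0 for x, y in combinations(POINTS, 2))
# Note 3: adjoining the zero vector gives a set closed under addition.
span = set(POINTS) | {ZERO}
closed = all(add_mod_2(x, y) in span for x in POINTS for y in POINTS)

print(len(set(POINTS)), isotropic, orthogonal, closed)
```

With this labelling the script reports 15 distinct elements and all three checks pass, matching note 4.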

That a large \mathrm{E}_8 lattice projects to a flat is straightforward:

First, as a lattice it’s closed under addition over \mathbb{Z}, so should project to a subspace over \mathbb{F}_2.

Second, since the cosine of the angle between two roots of \mathrm{E}_8 is always a multiple of \frac{1}{2}, and the points in the second shell have Euclidean length 2, the dot product of two large \mathrm{E}_8 roots must always be an even integer. Also, the large \mathrm{E}_8 roots project to norm 0 points. So all points of the large \mathrm{E}_8 should project to norm 0 points.

It’s not instantly obvious to me that large \mathrm{E}_8 should project to a maximal flat, but it clearly does.

So I’ll assume each \mathrm{E}_8 corresponds to a maximal flat, and generally that everything that I’m going to talk about over \mathbb{F}_2 lifts faithfully to Euclidean space, which seems plausible (and works)! But I haven’t proved it. Anyway, assuming this, a bunch of stuff follows.

Total number of large \mathrm{E}_8 lattices

We immediately know there are 270 large \mathrm{E}_8 lattices, because there are 270 maximal flats in \mathrm{Q}_7^+(2), either from the formula \prod_{i=0}^{m-1}\left(1+q^i\right), or immediately from triality and the fact that there are 135 points in \mathrm{Q}_7^+(2).
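Both routes to 270 are quick to evaluate. In the sketch below, `maximal_flat_count` is the product formula with rank m and field size q. `hyperbolic_point_count` is the standard point count (q^m-1)(q^{m-1}+1)/(q-1) for a hyperbolic quadric of rank m; that formula is not quoted above, so treat it as my addition. Both function names are made up for illustration.

```python
from math import prod


def maximal_flat_count(m, q):
    """Number of maximal flats of a rank-m hyperbolic quadric over F_q."""
    return prod(1 + q**i for i in range(m))


def hyperbolic_point_count(m, q):
    """Number of singular points of a rank-m hyperbolic quadric over F_q."""
    return (q**m - 1) * (q**(m - 1) + 1) // (q - 1)


# Rank 4, q = 2: (1+1)(1+2)(1+4)(1+8) = 270 maximal flats.
print(maximal_flat_count(4, 2))
# Triality route: 135 points, and each of the two types of maximal flat is
# in bijection with the points, so 2 * 135 = 270.
print(2 * hyperbolic_point_count(4, 2))
```

The same product formula with m=3 gives 2\cdot3\cdot5=30 maximal flats for the rank 3 quadric.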

Number of large \mathrm{E}_8 root systems sharing a given point

We can now bring to bear some more general theory. How many large \mathrm{E}_8 root-sets share a point? Let us project this down and instead ask: how many maximal flats share a given point?

Recall fact 1e:

1e. Pick a polar space \Sigma of rank m. Pick a point p in it. The space whose points are lines (i.e. 1-flats) through p, whose lines are planes (i.e. 2-flats) through p, etc, with incidence inherited from \Sigma, forms a polar space of the same type, of rank m-1.

So pick a point p in \mathrm{Q}_7^+(2). The space of all flats containing p is isomorphic to \mathrm{Q}_5^+(2). The maximal flats containing p in \mathrm{Q}_7^+(2) correspond to all maximal flats of \mathrm{Q}_5^+(2), of which there are 30. So there are 30 maximal flats of \mathrm{Q}_7^+(2) containing p, and hence 30 large \mathrm{E}_8 lattices containing a given point.

We see this if we fix \left(1,1,1,1,1,1,1,1\right), and the maximal flats correspond to the 30 ways of putting a Fano plane structure on 7 points. Via the Klein correspondence, I guess this is a way to show that the 30 Fano plane structures correspond to the points and planes of \mathrm{PG}(3,2).

Number of large \mathrm{E}_8 root systems disjoint from a given large \mathrm{E}_8 root system

Now assume that large \mathrm{E}_8 lattices with non-intersecting sets of roots correspond to non-intersecting maximal flats. The intersections of maximal flats obey rule 1c:

1c. Two maximal flats of different types must intersect in a flat of odd codimension; two maximal flats of the same type must intersect in a flat of even codimension.

So two 3-flats of opposite type must intersect in a plane or a point; if they are of the same type, they must intersect in a line or not at all (the empty set having dimension -1).

We want to count the dimension -1 intersections, but it’s easier to count the dimension 1 intersections and subtract from the total.

So, given a 3-flat, how many other 3-flats intersect it in a line?

Pick a point p in \mathrm{Q}_7^+(2). The 3-flats sharing that point correspond to the planes of \mathrm{Q}_5^+(2). Then the set of 3-flats sharing just a line through p with our given 3-flat corresponds to the set of planes of \mathrm{Q}_5^+(2) sharing a single point with a given plane. By what was said above, this is all the other planes of the same type (there’s no other dimension these intersections can have). There are 14 of these (15 planes minus the given one).

So, given a point in the 3-flat, there are 14 other 3-flats sharing a line (and no more) which passes through the point. There are 15 points in the 3-flat, but on the other hand there are 3 points in a line, giving \frac{14\cdot15}{3}=70 3-flats sharing a line (and no more) with a given 3-flat.

But there are a total of 135 3-flats of a given type. If 1 of them is a given 3-flat, and 70 of them intersect that 3-flat in a line, then 135-1-70=64 don’t intersect the 3-flat at all. So there should be 64 large \mathrm{E}_8 lattices whose roots don’t meet the roots of a given large \mathrm{E}_8 lattice.

Other numbers of intersecting root systems

We can also look at the intersections of large \mathrm{E}_8 root systems with large \mathrm{E}_8 root systems of opposite type. What about the intersections of two 3-flats in a plane? If we focus just on planes passing through a particular point, this corresponds, in \mathrm{Q}_5^+(2), to planes intersecting in a line. There are 7 planes intersecting a given plane in a line (from the Klein correspondence: they correspond to the seven points in a plane or the seven planes containing a point of \mathrm{PG}(3,2)). So there are 7 3-flats of \mathrm{Q}_7^+(2) which intersect a given 3-flat in a plane containing a given point. There are 15 points to choose from, but 7 points in a plane, meaning that there are \frac{7\cdot15}{7}=15 3-flats intersecting a given 3-flat in a plane. A plane has 7 points, so translating that to \mathrm{E}_8 lattices should give 7\cdot16=112 shared roots.

That leaves 135-15=120 3-flats intersecting a given 3-flat in a single point, corresponding to 16 shared roots.

\array{\arrayopts{\collayout{left}\collines{solid}\rowlines{solid}\frame{solid}} \mathbf{\text{intersection dim.}}&\mathbf{\text{number}}&\mathbf{\text{same type}}\\ 2&15&\text{No}\\ 1&70&\text{Yes}\\ 0&120&\text{No}\\ -1&64&\text{Yes} }
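The counts in this table can be confirmed by brute force over \mathbb{F}_2. The sketch below is a check under an assumption: it uses the hyperbolic quadratic form x_1x_2+x_3x_4+x_5x_6+x_7x_8 on 8-bit integers, which gives a polar space isomorphic to \mathrm{Q}_7^+(2) but is not the coordinatisation coming from \mathrm{E}_8 mod 2 used above. It builds all totally singular subspaces one dimension at a time and tallies how the maximal ones meet a fixed one.

```python
from collections import Counter


def q_form(x):
    """Assumed form x1*x2 + x3*x4 + x5*x6 + x7*x8 over F_2; x is an 8-bit integer."""
    return sum((x >> (2 * i)) & (x >> (2 * i + 1)) & 1 for i in range(4)) % 2


Q = [q_form(x) for x in range(256)]
SINGULAR = [x for x in range(1, 256) if Q[x] == 0]


def extend(flats):
    """All totally singular subspaces one dimension larger than those in `flats`.

    Each flat is a frozenset of its non-zero vectors.
    """
    bigger = set()
    for flat in flats:
        for v in SINGULAR:
            # For singular s and v, Q(s + v) equals the bilinear form B(s, v),
            # so Q[s ^ v] == 0 says that v is orthogonal to s.
            if v not in flat and all(Q[s ^ v] == 0 for s in flat):
                bigger.add(flat | {v} | {s ^ v for s in flat})
    return bigger


flats = {frozenset([p]) for p in SINGULAR}   # projective points
for _ in range(3):                           # points -> lines -> planes -> 3-flats
    flats = extend(flats)

maximal = sorted(flats, key=sorted)
base = maximal[0]

# A projective subspace with 0, 1, 3, 7 points has dimension -1, 0, 1, 2.
DIMENSION = {0: -1, 1: 0, 3: 1, 7: 2}
by_dimension = Counter(DIMENSION[len(base & f)] for f in maximal if f != base)

print(len(SINGULAR), len(maximal), dict(by_dimension))
```

This enumeration finds 135 singular points and 270 maximal flats, and the tally by intersection dimension agrees with the four rows of the table.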

A couple of points here related to triality. Under triality, one type of maximal flat gets sent to the other type, and the other type gets sent to singular points (0-flats). The incidence relation of “intersecting in a plane” gets sent to ordinary incidence of a point with a flat. So the fact that there are 15 maximal flats that intersect a given maximal flat in a plane is a reflection of the fact that there are 15 points in a maximal flat (or, dually, 15 maximal flats of a given type containing a given point).

The intersection of two maximal flats of the same type translates into a relation between two singular points. Just from the numbers, we’d expect “intersection in a line” to translate into “orthogonal to”, and “disjoint” to translate into “not orthogonal to”.

In that case, a pair of maximal flats intersecting in a (flat) line translates to 2 mutually orthogonal flat points, whose span is a flat line. This makes sense, because under triality, 1-flats transform to 1-flats, reflecting the fact that the central point of the D_4 diagram (representing lines) is sent to itself under triality.

In that case, two disjoint maximal flats translate to a pair of non-orthogonal singular points, defining a hyperbolic line.

Fixing a hyperbolic line (pointwise) obviously reduces the rank of the polar space by 1, picking out a \mathrm{GO}_6^+(2) subgroup of \mathrm{GO}_8^+(2). By the Klein correspondence, \mathrm{GO}_6^+(2) is isomorphic to \mathrm{PSL}_4(2), which is just the automorphism group of \mathrm{PG}(3, 2); i.e., here, the automorphism group of a maximal flat. So the joint stabiliser of two disjoint maximal flats is just automorphisms of one of them, which forces corresponding automorphisms of the other. This group is also isomorphic to the symmetric group S_8, giving all permutations of the coordinates (of the \mathrm{E}_8 lattice).

(My guess would be that the actions of \mathrm{GL}_4(2) on the two maximal flats would be related by an outer automorphism of \mathrm{GL}_4(2), in which the action on the points of one flat would match an action on the planes of the other, and vice versa, preserving the orthogonality relations coming from the symplectic structure implied by the orthogonal structure, i.e. the alternating form implied by the quadratic form.)

Nearest neighbours

We see this “non-orthogonal singular points” \leftrightarrow “disjoint maximal flats” echoed when we look at nearest neighbours.

Nearest neighbours in the second shell of the \mathrm{E}_8 lattice are separated from each other by an angle of \cos^{-1}\frac{3}{4}, so have a mutual dot product of 3, hence are non-orthogonal over \mathbb{F}_2.

Let us choose a fixed point \left(2,0,0,0,0,0,0,0\right) in the second shell of \mathrm{E}_8. This has as our chosen representative \left(1,1,1,1,1,1,1,1\right) in our version of \mathrm{PG}(7,2), which has the convenient property that it is orthogonal to the all-integer points, and non-orthogonal to the half-integer points. The half-integer points in the second shell are just those that we write as \left(\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2}\right)^\star in our notation, where the \star means that we should replace any \frac{1}{2} by -\frac{3}{2} or replace any -\frac{1}{2} by \frac{3}{2} to get a corresponding element in the second shell of the \mathrm{E}_8 lattice, and where we require 1 or 3 minus signs in the notation, to correspond to two points in the lattice with opposite signs in all coordinates.

Now, since each reduced isotropic point represents 16 points of the second shell, merely saying that two reduced points have dot product of 1 is not enough to pin down actual nearest neighbours.

But very conveniently, the sets of 16 are formed in parallel ways for the particular setup we have chosen. Namely, lifting \left(1,1,1,1,1,1,1,1\right) to a second-shell element, we can choose to put the \pm2 in each of the 8 coordinates, with positive or negative sign, and lifting an element of the form \left(\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2}\right)^\star to a second-shell element, we can choose to put the \pm\frac{3}{2} in each of the 8 coordinates, with positive or negative sign.

So we can line up our conventions, and choose, e.g., specifically \left(+2,0,0,0,0,0,0,0\right), and choose neighbours of the form \left(+\frac{3}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2},\pm\frac{1}{2}\right), with an even number of minus signs.

This tells us we have 64 nearest neighbours, corresponding to the 64 isotropic points of half-integer form. Let us call this set of points T.
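The count of 64 and the angle \cos^{-1}\frac{3}{4} are easy to confirm numerically. The sketch below simply takes the sign convention stated above (first coordinate +\frac{3}{2}, an even number of minus signs among the remaining seven) as given; it does not independently test membership in the \mathrm{E}_8 lattice.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)
CENTRE = (Fraction(2),) + (Fraction(0),) * 7

# Candidate neighbours: +3/2 in the first coordinate, then seven entries of
# +-1/2 with an even number of minus signs (the convention assumed above).
NEIGHBOURS = [
    (Fraction(3, 2),) + tuple(s * HALF for s in signs)
    for signs in product((1, -1), repeat=7)
    if signs.count(-1) % 2 == 0
]


def dot(x, y):
    """Euclidean dot product, kept exact with Fractions."""
    return sum(a * b for a, b in zip(x, y))


# Second-shell vectors have squared length 4, so the cosine of the angle
# between CENTRE and a neighbour is dot / (2 * 2).
cosines = {dot(CENTRE, v) / 4 for v in NEIGHBOURS}
print(len(NEIGHBOURS), cosines)
```

Every vector in the list has squared length \frac{9}{4}+\frac{7}{4}=4 and dot product 3 with the chosen point, so the only cosine that appears is \frac{3}{4}.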

Now pick one of those 64 isotropic points, call it p. It lies, as we showed earlier, in 30 maximal flats, corresponding to the 30 plane flats of \mathrm{Q}_5^+(2), and we would like to understand the intersections of these flats with T: that is, those nearest neighbours which belong to each large \mathrm{E}_8 lattice.

In any maximal flat, i.e. any 3-flat, containing p, there will be 7 lines passing through p, each with 2 other points on it, totalling 14 which, together with p itself, form the 15 points of a copy of \mathrm{PG}(3,2).

Now, the sum of two all-integer points is an all-integer point, but the sum of two half-integer points is also an all-integer point. So of the two other points on each of those lines, one will be half-integer and one all-integer. So there will be 7 half-integer points in addition to p itself; i.e. the maximal flat will meet T in 8 points; hence the corresponding large \mathrm{E}_8 lattice will contain 8 of the nearest neighbours of \left(2,0,0,0,0,0,0,0\right).

Also, because the sum of two half-integer points is not a half-integer point, no 3 of those 8 points will lie on a line.

But the only way that you can get 8 points in a 3-space such that no 3 of them lie on a line of the space is if they are the 8 points that do not lie on a plane of the space. Hence the other 7 points (the ones lying in the all-integer subspace) must form a Fano plane.
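The claim about 8 points with no 3 on a line can be checked exhaustively in \mathrm{PG}(3,2), since there are only 6435 candidate sets. In this sketch the points are the non-zero 4-bit integers and the third point on the line through a and b is a XOR b; the function names are mine.

```python
from itertools import combinations

POINTS = range(1, 16)  # non-zero vectors of F_2^4, written as 4-bit integers


def no_three_collinear(subset):
    """True if the subset contains no full line {a, b, a ^ b}."""
    members = set(subset)
    return all((a ^ b) not in members for a, b in combinations(subset, 2))


def is_plane(subset):
    """True if the subset plus the zero vector is a 3-dimensional subspace."""
    members = set(subset) | {0}
    return len(members) == 8 and all((a ^ b) in members for a in members for b in members)


caps = [c for c in combinations(POINTS, 8) if no_three_collinear(c)]
complements = [tuple(p for p in POINTS if p not in c) for c in caps]

print(len(caps), all(is_plane(c) for c in complements))
```

There are exactly 15 such sets, one for each plane of \mathrm{PG}(3,2), and in every case the remaining 7 points form a plane.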

So we have the following: inside the projective 7-space of lattice elements mod 2, we have the projective 6-space of all-integer elements, and inside there we have the 5-space of all-integer elements orthogonal to p, and inside there we have a polar space isomorphic to \mathrm{Q}_5^+(2), and in there we have 30 planes. And adding p to each element of one of those planes gives the 7 elements which accompany p in the intersection of the isotropic half-integer points with the corresponding 3-flat, which lift to the nearest neighbours of \left(2,0,0,0,0,0,0,0\right) lying in the corresponding large \mathrm{E}_8 lattice.

February 01, 2016

Resonaances750 ways to leave your lover

A new paper last week straightens out the story of the diphoton background in ATLAS. Some confusion was created because theorists misinterpreted the procedures described in the ATLAS conference note, which could lead to a different estimate of the significance of the 750 GeV excess. However, once the correct phenomenological and statistical approach is adopted, the significance quoted by ATLAS can be reproduced, up to small differences due to incomplete information available in public documents. Anyway, now that this is all behind us, we can safely continue being excited at least until summer. Today I want to discuss different interpretations of the diphoton bump observed by ATLAS. I will take a purely phenomenological point of view, leaving for next time the question of a bigger picture that the resonance may fit into.

Phenomenologically, the most straightforward interpretation is the so-called everyone's model: a 750 GeV singlet scalar particle produced in gluon fusion and decaying to photons via loops of new vector-like quarks. This simple construction perfectly explains all publicly available data, and can be easily embedded in more sophisticated models. Nevertheless, many more possibilities were pointed out in the 750 papers so far, and here I review a few that I find most interesting.

Spin Zero or More?  
For a particle decaying to two photons, there are not that many possibilities: the resonance has to be a boson and, according to young Landau's theorem, it cannot have spin 1. This leaves on the table spin 0, 2, or higher. Spin-2 is an interesting hypothesis, as this kind of excitation is predicted in popular models like the Randall-Sundrum one. Higher-than-two spins are disfavored theoretically. When more data is collected, the spin of the 750 GeV resonance can be tested by looking at the angular distribution of the photons. The rumor is that the data so far somewhat favor spin-2 over spin-0, although the statistics are certainly insufficient for any serious conclusions. Concerning the parity, it is practically impossible to determine it by studying the diphoton final state, and both the scalar and the pseudoscalar options are equally viable at present. Discrimination may be possible in the future, but only if multi-body decay modes of the resonance are discovered. If the true final state is more complicated than two photons (see below), then the 750 GeV resonance may have any spin, including spin-1 and spin-1/2.

Narrow or Wide? 
The total width is the inverse of the particle's lifetime (in our funny units). From the experimental point of view, a width larger than the detector's energy resolution will show up as a smearing of the resonance due to the uncertainty principle. Currently, the ATLAS run-2 data prefer a width 10 times larger than the experimental resolution (which is about 5 GeV in this energy ballpark), although the preference is not very strong in the statistical sense. On the other hand, from the theoretical point of view, it is much easier to construct models where the 750 GeV resonance is a narrow particle. Therefore, confirmation of the large width would have profound consequences, as it would significantly narrow down the scope of viable models. The most exciting interpretation would then be that the resonance is a portal to a dark sector containing new light particles very weakly coupled to ordinary matter.

How many resonances?  
One resonance is enough, but a family of resonances tightly packed around 750 GeV may also explain the data. As a bonus, this could explain the seemingly large width without opening new dangerous decay channels. It is quite natural for particles to come in multiplets with similar masses: our pion is an example where the small mass splitting between π± and π0 arises due to electromagnetic quantum corrections. For Higgs-like multiplets the small splitting may naturally arise after electroweak symmetry breaking, and the familiar 2-Higgs doublet model offers a simple realization. If the mass splitting of the multiplet is larger than the experimental resolution, this possibility can be tested by precisely measuring the profile of the resonance and searching for a departure from the Breit-Wigner shape. On the other side of the spectrum is the idea that there is no resonance at all at 750 GeV, but rather at another mass, and the bump at 750 GeV appears due to some kinematical accidents.

Who made it?
The most plausible production process is definitely gluon-gluon fusion. Production in collisions of light quarks and antiquarks is also theoretically sound; however, it leads to a more acute tension between run-2 and run-1 data. Indeed, even for gluon fusion, the production cross section of a 750 GeV resonance in 13 TeV proton collisions is only 5 times larger than at 8 TeV. Given the larger amount of data collected in run-1, we would expect a similar excess there, contrary to observations. For a resonance produced from u-ubar or d-dbar the analogous ratio is only 2.5 (see the table), leading to much more tension. The ratio climbs back to 5 if the initial state contains the heavier quarks: strange, charm, or bottom (which can also be found sometimes inside a proton); however, I haven't yet seen a neat model that makes use of that. Another possibility is to produce the resonance via photon-photon collisions. This way one could cook up a truly minimal and very predictive model where the resonance couples only to photons of all the Standard Model particles. However, in this case, the ratio between the 13 and 8 TeV cross sections is very unfavorable, merely a factor of 2, and the run-1 vs run-2 tension comes back with more force. More options open up when associated production (e.g. with t-tbar, or in vector boson fusion) is considered. The problem with these ideas is that, according to what was revealed during the talk last December, there aren't any additional energetic particles in the diphoton events. Similar problems face models where the 750 GeV resonance appears as a decay product of a heavier resonance, although in this case some clever engineering or fine-tuning may help to hide the additional particles from experimentalists' eyes.

Two-body or more?
While a simple two-body decay of the resonance into two photons is a perfectly plausible explanation of all existing data, a number of interesting alternatives have been suggested. For example, the decay could be 3-body, with another soft visible or invisible  particle accompanying two photons. If the masses of all particles involved are chosen appropriately, the invariant mass spectrum of the diphoton remains sharply peaked. At the same time, a broadening of the diphoton energy due to the 3-body kinematics may explain why the resonance appears wide in ATLAS. Another possibility is a cascade decay into 4 photons. If the  intermediate particles are very light, then the pairs of photons from their decay are very collimated and may look like a single photon in the detector.
 ♬ The problem is all inside your head   and the possibilities are endless. The situation is completely different than during the process of discovering the  Higgs boson, where one strongly favored hypothesis was tested against more exotic ideas. Of course, the first and foremost question is whether the excess is really new physics, or just a nasty statistical fluctuation. But if that is confirmed, the next crucial task for experimentalists will be to establish the nature of the resonance and get model builders on the right track.  The answer is easy if you take it logically ♬ 

All ideas discussed above appeared in recent articles by various authors addressing the 750 GeV excess. If I were to include all references the post would be just one giant hyperlink, so you need to browse the literature yourself to find the original references.

January 29, 2016

Tim GowersFUNC1 — strengthenings, variants, potential counterexamples

After my tentative Polymath proposal, there definitely seems to be enough momentum to start a discussion “officially”, so let’s see where it goes. I’ve thought about the question of whether to call it Polymath11 (the first unclaimed number) or Polymath12 (regarding the polynomial-identities project as Polymath11). In the end I’ve gone for Polymath11, since the polynomial-identities project was listed on the Polymath blog as a proposal, and I think the right way of looking at things is that the problem got solved before the proposal became a fully-fledged project. But I still think that that project should be counted as a Polymathematical success story: it shows the potential benefits of opening up a problem for consideration by anybody who might be interested.

Something I like to think about with Polymath projects is the following question: if we end up not solving the problem, then what can we hope to achieve? The Erdős discrepancy problem project is a good example here. An obvious answer is that we can hope that enough people have been stimulated in enough ways that the probability of somebody solving the problem in the not too distant future increases (for example because we have identified more clearly the gap in our understanding). But I was thinking of something a little more concrete than that: I would like at the very least for this project to leave behind it an online resource that will be essential reading for anybody who wants to attack the problem in future. The blog comments themselves may achieve this to some extent, but it is not practical to wade through hundreds of comments in search of ideas that may or may not be useful. With past projects, we have developed Wiki pages where we have tried to organize the ideas we have had into a more browsable form. One thing we didn’t do with EDP, which in retrospect I think we should have, is have an official “closing” of the project marked by the writing of a formal article that included what we judged to be the main ideas we had had, with complete proofs when we had them. An advantage of doing that is that if somebody later solves the problem, it is more convenient to be able to refer to an article (or preprint) than to a combination of blog comments and Wiki pages.

With an eye to this, I thought I would make FUNC1 a data-gathering exercise of the following slightly unusual kind. For somebody working on the problem in the future, it would be very useful, I would have thought, to have a list of natural strengthenings of the conjecture, together with a list of “troublesome” examples. One could then produce a table with strengthenings down the side and examples along the top, with a tick in the table entry if the example disproves the strengthening, a cross if it doesn’t, and a question mark if we don’t yet know whether it does.

A first step towards drawing up such a table is of course to come up with a good supply of strengthenings and examples, and that is what I want to do in this post. I am mainly selecting them from the comments on the previous post. I shall present the strengthenings as statements rather than questions, so they are not necessarily true.


1. A weighted version.

Let w be a function from the power set of a finite set X to the non-negative reals. Suppose that the weights satisfy the condition w(A\cup B)\geq\max\{w(A),w(B)\} for every A,B\subset X and that at least one non-empty set has positive weight. Then there exists x\in X such that the sum of the weights of the sets containing x is at least half the sum of all the weights.

Note that if all weights take values 0 or 1, then this becomes the original conjecture. It is possible that the above statement follows from the original conjecture, but we do not know this (though it may be known).

This is not a good question after all, as the deleted statement above is false. When w is 01-valued, the statement reduces to saying that for every up-set there is an element in at least half the sets, which is trivial: all the elements are in at least half the sets. Thanks to Tobias Fritz for pointing this out.

2. Another weighted version.

Let w be a function from the power set of a finite set X to the non-negative reals. Suppose that the weights satisfy the condition w(A\cup B)\geq\min\{w(A),w(B)\} for every A,B\subset X and that at least one non-empty set has positive weight. Then there exists x\in X such that the sum of the weights of the sets containing x is at least half the sum of all the weights.

Again, if all weights take values 0 or 1, then the collection of sets of weight 1 is union closed and we obtain the original conjecture. It was suggested in this comment that one might perhaps be able to attack this strengthening using tropical geometry, since the operations it uses are addition and taking the minimum.

3. An “off-diagonal” version.

Tom Eccles suggests (in this comment) a generalization that concerns two set systems rather than one. Given set systems \mathcal{A} and \mathcal{B}, write \mathcal{A}+\mathcal{B} for the union set \{A\cup B:A\in\mathcal{A},B\in\mathcal{B}\}. A family \mathcal{A} is union closed if and only if |\mathcal{A}+\mathcal{A}|\leq|\mathcal{A}|. What can we say if \mathcal{A} and \mathcal{B} are set systems with \mathcal{A}+\mathcal{B} small? There are various conjectures one can make, of which one of the cleanest is the following: if \mathcal{A} and \mathcal{B} are of size k and \mathcal{A}+\mathcal{B} is of size at most k, then there exists x such that |\mathcal{A}_x|+|\mathcal{B}_x|\geq k, where \mathcal{A}_x denotes the set of sets in \mathcal{A} that contain x. This obviously implies FUNC.
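The union set and the union-closed criterion are simple to experiment with. In the sketch below, families are Python sets of frozensets; `union_sum` and `is_union_closed` are names I have made up, and the small families are illustrative examples of my own choosing.

```python
def union_sum(family_a, family_b):
    """The union set A + B = {A ∪ B : A in family_a, B in family_b}."""
    return {a | b for a in family_a for b in family_b}


def is_union_closed(family):
    """Union closed iff |A + A| <= |A| (A + A always contains A itself)."""
    return len(union_sum(family, family)) <= len(family)


chain = {frozenset(), frozenset({1}), frozenset({1, 2})}
two_singletons = {frozenset({1}), frozenset({2})}

# Two families of size 2 on the ground set {1, 2, 3} whose union set is a
# single set, namely the whole ground set.
family_a = {frozenset({2, 3}), frozenset({1, 2, 3})}
family_b = {frozenset({1, 3}), frozenset({1, 2, 3})}

print(is_union_closed(chain), is_union_closed(two_singletons))
print(union_sum(family_a, family_b))
```

The last pair shows how \mathcal{A}+\mathcal{B} can collapse to one set while \mathcal{A} and \mathcal{B} each have two.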

Simple examples show that \mathcal{A}+\mathcal{B} can be much smaller than either \mathcal{A} or \mathcal{B} — for instance, it can consist of just one set. But in those examples there always seems to be an element contained in many more sets. So it would be interesting to find a good conjecture by choosing an appropriate function \phi to insert into the following statement: if |\mathcal{A}|=r, |\mathcal{B}|=s, and |\mathcal{A}+\mathcal{B}|\leq t, then there exists x such that |\mathcal{A}_x|+|\mathcal{B}_x|\geq\phi(r,s,t).

4. A first “averaged” version.

Let \mathcal{A} be a union-closed family of subsets of a finite set X. Then the average size of \mathcal{A}_x is at least \frac 12|\mathcal{A}|.

This is false, as the example \Bigl\{\emptyset,\{1\},\{1,2,\dots,m\}\Bigr\} shows for any m\geq 3.
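To see the failure concretely, here is a small check of the average of |\mathcal{A}_x|/|\mathcal{A}| for this family (a sketch; the function names are mine). Element 1 lies in 2 of the 3 sets and each of the other m-1 elements lies in 1 of the 3, so the average is \frac{m+1}{3m}, which drops below \frac12 as soon as m\geq 3.

```python
from fractions import Fraction


def average_abundance(family, ground):
    """Average over x in the ground set of |A_x| / |A|, as an exact fraction."""
    total = sum(
        (Fraction(sum(1 for member in family if x in member), len(family)) for x in ground),
        Fraction(0),
    )
    return total / len(ground)


def example_family(m):
    """The family {emptyset, {1}, {1, ..., m}}."""
    return [frozenset(), frozenset({1}), frozenset(range(1, m + 1))]


for m in range(2, 7):
    print(m, average_abundance(example_family(m), range(1, m + 1)))
```

For m=2 the average is exactly \frac12, and for m=3 it is \frac49.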

5. A second averaged version.

Let \mathcal{A} be a union-closed family of subsets of a finite set X and suppose that \mathcal{A} separates points, meaning that if x\ne y, then at least one set in \mathcal{A} contains exactly one of x and y. (Equivalently, the sets \mathcal{A}_x are all distinct.) Then the average size of \mathcal{A}_x is at least \frac 12|\mathcal{A}|.

This again is false: see Example 2 below.

6. A better “averaged” version.

In this comment I had a rather amusing (and typically Polymathematical) experience of formulating a conjecture that I thought was obviously false in order to think about how it might be refined, and then discovering that I couldn’t disprove it (despite temporarily thinking I had a counterexample). So here it is.

As I have just noted (and also commented in the first post), very simple examples show that if we define the “abundance” a(x) of an element x to be |\mathcal{A}_x|/|\mathcal{A}|, then the average abundance does not have to be at least 1/2. However, that still leaves open the possibility that some kind of naturally defined weighted average might do the job. Since we want to define the weighting in terms of \mathcal{A} and to favour elements that are contained in lots of sets, a rather crude idea is to pick a random non-empty set A\in\mathcal{A} and then a random element x\in A, and make that the probability distribution on X that we use for calculating the average abundance.

A short calculation reveals that the average abundance with this probability distribution is equal to the average overlap density, which we define to be
\mathbb{E}_{A\ne\emptyset}\mathbb{E}_B|A\cap B|/|A|,
where the averages are over \mathcal{A}. So one is led to the following conjecture, which implies FUNC: if \mathcal{A} is a union-closed family of sets, at least one of which is non-empty, then its average overlap density is at least 1/2.
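The short calculation can be mirrored in code: computing the weighted average abundance and the average overlap density separately and comparing them. This is a sketch with hypothetical function names, run on the small union-closed family \{\emptyset,\{1\},\{1,2,3\}\}, which I chose purely for illustration.

```python
from fractions import Fraction


def abundance(family, x):
    """Proportion of sets in the family that contain x."""
    return Fraction(sum(1 for member in family if x in member), len(family))


def sampling_weight(family, x):
    """Probability of x: pick a random non-empty A, then a random element of A."""
    nonempty = [member for member in family if member]
    total = sum((Fraction(1, len(a)) for a in nonempty if x in a), Fraction(0))
    return total / len(nonempty)


def weighted_average_abundance(family, ground):
    """Average abundance with respect to the sampling distribution on the ground set."""
    return sum((sampling_weight(family, x) * abundance(family, x) for x in ground), Fraction(0))


def average_overlap_density(family):
    """E over non-empty A and over all B of |A ∩ B| / |A|."""
    nonempty = [member for member in family if member]
    total = sum((Fraction(len(a & b), len(a)) for a in nonempty for b in family), Fraction(0))
    return total / (len(nonempty) * len(family))


family = [frozenset(), frozenset({1}), frozenset({1, 2, 3})]
print(weighted_average_abundance(family, [1, 2, 3]), average_overlap_density(family))
```

Both quantities come out as \frac59 for this family, consistent with the identity (and, for this one example, with the conjecture).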

A not wholly pleasant feature of this conjecture is that the average overlap density is very far from being isomorphism invariant. (That is, if you duplicate elements of X, the average overlap density changes.) Initially, I thought this would make it easy to find counterexamples, but that seems not to be the case. It also means that one can give some thought to how to put a measure on X that makes the average overlap density as small as possible. Perhaps if the conjecture is true, this “worst case” would be easier to analyse. (It’s not actually clear that there is a worst case — it may be that one wants to use a measure on X that gives measure zero to some non-empty set A, at which point the definition of average overlap density breaks down. So one might have to look at the “near worst” case.)

7. Compressing to an up-set.

This conjecture comes from a comment by Igor Balla. Let \mathcal{A} be a union-closed family and let x\in X. Define a new family \mathcal{A}_x by replacing each A\in\mathcal{A} by A\cup\{x\} if A\cup\{x\}\notin\mathcal{A} and leaving it alone if A\cup\{x\}\in\mathcal{A}. Repeat this process for every x\in X and the result is an up-set \mathcal{B}, that is, a set-system \mathcal{B} such that B_1\in\mathcal{B} and B_1\subset B_2 implies that B_2\in\mathcal{B}.

Note that each time we perform the “add x if you can” operation, we are applying a bijection to the current set system, so we can compose all these bijections to obtain a bijection \phi from \mathcal{A} to \mathcal{B}.

Suppose now that A,B\in\mathcal{A} are distinct sets. It can be shown that there is no set C such that A\subset C\subset\phi(A) and B\subset C\subset\phi(B). In other words, A\cup B is never a subset of \phi(A)\cap\phi(B).

Now the fact that \mathcal{B} is an up-set means that each element x is in at least half the sets (since if x\notin B then x\in B\cup\{x\}). Moreover, it seems hard for too many sets A in \mathcal{A} to be “far” from their images \phi(A), since then there is a strong danger that we will be able to find a pair of sets A and B with A\cup B\subset\phi(A)\cap\phi(B).

This leads to the conjecture that Balla makes. He is not at all confident that it is true, but has checked that there are no small counterexamples.

Conjecture. Let \mathcal{A} be a set system such that there exist an up-set \mathcal{B} and a bijection \phi:\mathcal{A}\to\mathcal{B} with the following properties.

  • For each A\in\mathcal{A}, A\subset\phi(A).
  • For no distinct A,B\in\mathcal{A} do we have A\cup B\subset\phi(A)\cap\phi(B).

Then there is an element x that belongs to at least half the sets in \mathcal{A}.
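
To make the compression concrete, here is a small Python sketch of the “add x if you can” process, together with checks of the up-set property and of the two conditions in the conjecture. The function names and the three-set example are assumptions made for illustration; membership is tested against the family as it stood before the current element was added.

```python
def compress_to_up_set(family, ground):
    """Apply the 'add x if you can' step once for each x in ground.

    Returns a dict phi mapping each original set to its final image."""
    phi = {A: A for A in family}
    for x in ground:
        current = set(phi.values())  # the family before adding x
        phi = {A: (img if (img | {x}) in current else img | {x})
               for A, img in phi.items()}
    return phi

def is_up_set(family, ground):
    """True if adding any single ground element to a member gives a member."""
    fam = set(family)
    return all((B | {x}) in fam for B in fam for x in ground)

def satisfies_balla_conditions(phi):
    """Check A <= phi(A), and that no distinct A, B have
    A | B contained in phi(A) & phi(B)."""
    sets = list(phi)
    if not all(A <= phi[A] for A in sets):
        return False
    return not any((A | B) <= (phi[A] & phi[B])
                   for i, A in enumerate(sets) for B in sets[i + 1:])

# Illustrative union-closed family on the ground set {1, 2}.
ground = [1, 2]
family = [frozenset(), frozenset({1}), frozenset({1, 2})]
phi = compress_to_up_set(family, ground)
print(phi, is_up_set(phi.values(), ground), satisfies_balla_conditions(phi))
```

In this example the empty set is sent to \{2\} and the other two sets are left alone, so the image is the up-set \{1\}, \{2\}, \{1,2\}.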

The following comment by Gil Kalai is worth quoting: “Years ago I remember that Jeff Kahn said that he bet he will find a counterexample to every meaningful strengthening of Frankl’s conjecture. And indeed he shot down many of those and a few I proposed, including weighted versions. I have to look in my old emails to see if this one too.” So it seems that even to find a conjecture that genuinely strengthens FUNC without being obviously false (at least to Jeff Kahn) would be some sort of achievement. (Apparently the final conjecture above passes the Jeff-Kahn test in the following weak sense: he believes it to be false but has not managed to find a counterexample.)

Examples and counterexamples

1. Power sets.

If X is a finite set and \mathcal{A} is the power set of X, then every element of X has abundance 1/2. (Remark 1: I am using the word “abundance” for the proportion of sets in \mathcal{A} that contain the element in question. Remark 2: for what it’s worth, the above statement is meaningful and true even if X is empty.)

Obviously this is not a counterexample to FUNC, but it was in fact a counterexample to an over-optimistic conjecture I very briefly made and then abandoned while writing it into a comment.

2. Almost power sets

This example was mentioned by Alec Edgington. Let X be a finite set and let z be an element that does not belong to X. Now let \mathcal{A} consist of \emptyset together with all sets of the form A\cup\{z\} such that A\subset X.

If |X|=n, then z has abundance 1-1/(2^n+1), while each x\in X has abundance 2^{n-1}/(2^n+1). Therefore, only one point has abundance that is not less than 1/2.

A slightly different example, also used by Alec Edgington, is to take all subsets of X together with the set X\cup\{z\}. If |X|=n, then the abundance of any element of X is (2^{n-1}+1)/(2^n+1) while the abundance of z is 1/(2^n+1). Therefore, the average abundance is
\displaystyle \frac n{n+1}\frac{2^{n-1}+1}{2^n+1}+\frac 1{n+1}\frac 1{2^n+1}.
When n is large, the amount by which (2^{n-1}+1)/(2^n+1) exceeds 1/2 is exponentially small, from which it follows easily that this average is less than 1/2. In fact, it starts to be less than 1/2 when n=2 (which is the case Alec mentioned). This shows that Conjecture 5 above (that the average abundance must be at least 1/2 if the system separates points) is false.
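
The abundances in this second example are easy to check by brute force. Here is a short Python sketch (the helper names, and the use of the string "z" for the extra element, are illustrative assumptions) that builds the family for small n and prints the exact average abundance.

```python
from fractions import Fraction
from itertools import combinations

def power_set(elements):
    elements = list(elements)
    return [frozenset(c) for r in range(len(elements) + 1)
            for c in combinations(elements, r)]

def abundance(family, x):
    """Proportion of sets in the family that contain x."""
    return Fraction(sum(1 for A in family if x in A), len(family))

def edgington_family(n):
    """All subsets of X = {1..n} together with the set X | {'z'}."""
    X = range(1, n + 1)
    return power_set(X) + [frozenset(X) | {"z"}]

def average_abundance(family):
    """Unweighted average of the abundances over the ground set."""
    ground = sorted(set().union(*family), key=str)
    return sum(abundance(family, x) for x in ground) / len(ground)

for n in range(1, 6):
    print(n, average_abundance(edgington_family(n)))
```

For n=1 the average is exactly 1/2, and for n=2 it is 7/15, in line with the remark that the average first drops below 1/2 at n=2.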

3. Empty set, singleton, big set.

Let m be a positive integer and take the set system that consists of the sets \emptyset, \{1\} and \{1,2,\dots,m\}. This is a simple example (or rather class of examples) of a set system for which although there is certainly an element with abundance at least 1/2 (the element 1 has abundance 2/3), the average abundance is close to 1/3. Very simple variants of this example can give average abundances that are arbitrarily small — just take a few small sets and one absolutely huge set.

4. Using strong divisibility sequences.

I will not explain these in detail, but just point you to an interesting comment by Uwe Stroinski that suggests a number-theoretic way of constructing union-closed families.

I will continue with methods of building union-closed families out of other union-closed families.

5. Duplicating elements.

I’ll define this process formally first. Let X be a set of size n and let \mathcal{A} be a collection of subsets of X. Now let \{E(x):x\in X\} be a collection of disjoint non-empty sets and define E(\mathcal{A}) to be the collection of all sets of the form \bigcup_{x\in A}E(x) for some A\in\mathcal{A}. If \mathcal{A} is union closed, then so is E(\mathcal{A}).

One can think of E(x) as “duplicating” the element x |E(x)| times. A simple example of this process is to take the set system \emptyset, \{1\}, \{1,2\} and let E(1)=\{1\} and E(2)=\{2,3,\dots,m\}. This gives the set system 3 above.

Let us say that \mathcal{A}\rightarrow\mathcal{B} if \mathcal{B}=E(\mathcal{A}) for some suitable set-valued function E. And let us say that two set systems are isomorphic if they are in the same equivalence class of the symmetric-transitive closure of the relation \rightarrow. Equivalently, they are isomorphic if we can find E_1 and E_2 such that E_1(\mathcal{A})=E_2(\mathcal{B}).

The effect of duplication is basically that we can convert the uniform measure on the ground set into any other probability measure (at least to an arbitrary approximation). What I mean by that is that the uniform measure on the ground set of E(\mathcal{A}), which is of course \bigcup_x E(x), gives you a probability of |E(x)|/\sum|E(x)| of landing in E(x), so has the same effect as assigning that probability to x and sticking with the set system \mathcal{A}. (So the precise statement is that we can get any probability measure where all the probabilities are rational.)
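
As an illustration, the following Python sketch applies the duplication map from the simple example above (with m=6, an arbitrary choice) and checks that the uniform average abundance over the new ground set equals the average over the old ground set weighted by |E(x)|. The function names are assumptions made for the sketch.

```python
from fractions import Fraction

def duplicate(family, E):
    """E(family): replace each A by the union of E[x] over x in A."""
    return [frozenset().union(*(E[x] for x in A)) for A in family]

def abundance(family, x):
    """Proportion of sets in the family that contain x."""
    return Fraction(sum(1 for A in family if x in A), len(family))

m = 6
family = [frozenset(), frozenset({1}), frozenset({1, 2})]
E = {1: frozenset({1}), 2: frozenset(range(2, m + 1))}
big = duplicate(family, E)

# Uniform average over the ground set of E(family) ...
ground = sorted(set().union(*big))
uniform_avg = sum(abundance(big, y) for y in ground) / len(ground)

# ... versus the average over the original ground set weighted by |E(x)|.
total = sum(len(E[x]) for x in E)
weighted_avg = sum(Fraction(len(E[x]), total) * abundance(family, x) for x in E)

print(big, uniform_avg, weighted_avg)
```

Both averages come out as 7/18 here, and the duplicated family is exactly \emptyset, \{1\}, \{1,\dots,6\}.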

If one is looking for an averaging argument, then it would seem that a nice property that such an argument might have is (as I have already commented above) that the average should be with respect to a probability measure on X that is constructed from \mathcal{A} in an isomorphism-invariant way.

It is common in the literature to outlaw duplication by insisting that \mathcal{A} separates points. However, it may be genuinely useful to consider different measures on the ground set.

6. Union-sets.

Tom Eccles, in his off-diagonal conjecture, considered the set system, which he denoted by \mathcal{A}+\mathcal{B}, that is defined to be \{A\cup B:A\in\mathcal{A},B\in\mathcal{B}\}. This might more properly be denoted \mathcal{A}\cup\mathcal{B}, by analogy with the notation A+B for sumsets, but obviously one can’t write it like that because that notation already stands for something else, so I’ll stick with Tom’s notation.

It’s trivial to see that if \mathcal{A} and \mathcal{B} are union closed, then so is \mathcal{A}+\mathcal{B}. Moreover, sometimes it does quite natural things: for instance, if X and Y are any two sets, then P(X)+P(Y)=P(X\cup Y), where P is the power-set operation.

Another remark is that if X and Y are disjoint, and \mathcal{A}\subset P(X) and \mathcal{B}\subset P(Y), then the abundance of x in \mathcal{A} is equal to the abundance of x in \mathcal{A}+\mathcal{B}.
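
Both remarks can be checked on small cases. The Python sketch below uses illustrative function names and example families that are assumptions of the sketch. It verifies P(X)+P(Y)=P(X\cup Y) for one overlapping pair and confirms that abundance is preserved for one pair of families on disjoint ground sets.

```python
from fractions import Fraction
from itertools import combinations

def power_set(elements):
    elements = list(elements)
    return {frozenset(c) for r in range(len(elements) + 1)
            for c in combinations(elements, r)}

def union_sets(fam_a, fam_b):
    """The family {A | B : A in fam_a, B in fam_b}."""
    return {A | B for A in fam_a for B in fam_b}

def is_union_closed(family):
    return all((A | B) in family for A in family for B in family)

def abundance(family, x):
    """Proportion of sets in the family that contain x."""
    return Fraction(sum(1 for A in family if x in A), len(family))

X, Y = {1, 2, 3}, {3, 4}
print(union_sets(power_set(X), power_set(Y)) == power_set(X | Y))

fam_a = {frozenset(), frozenset({1}), frozenset({1, 2})}  # lives inside {1, 2}
fam_b = {frozenset({8}), frozenset({8, 9})}               # lives inside {8, 9}
combined = union_sets(fam_a, fam_b)
print(is_union_closed(combined), abundance(fam_a, 1), abundance(combined, 1))
```

When the ground sets are disjoint, the map (A,B)\mapsto A\cup B is a bijection, which is why the abundance of an element of X does not change.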

7. A less obvious construction method.

I got this from a comment by Thomas Bloom. Let X and Y be disjoint finite sets and let \mathcal A and \mathcal B be two union-closed families living inside X and Y, respectively, and assume that X\in\mathcal A and Y\in\mathcal B. We then build a new family as follows. Let \phi be some function from X to \mathcal{B}. Then take all sets of one of the following four forms:

  • sets A\cup Y with A\in\mathcal{A};
  • sets X\cup B with B\in\mathcal{B};
  • sets (X\setminus\{x\})\cup\phi(x) with x\in X;
  • sets (X\setminus\{x\})\cup Y with x\in X.

It can be checked quite easily (there are six cases to consider, all straightforward) that the resulting family is union closed.
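
The construction is simple to experiment with. Here is a Python sketch that builds the family from two power sets and checks that it is union closed. The function names and the particular \phi (sending each x to a singleton of Y) are hypothetical choices made for illustration, and the sketch makes no attempt to test any statement about abundances.

```python
from itertools import combinations

def power_set(elements):
    elements = list(elements)
    return {frozenset(c) for r in range(len(elements) + 1)
            for c in combinations(elements, r)}

def bloom_family(fam_a, fam_b, X, Y, phi):
    """Build the four kinds of sets from union-closed fam_a (containing X)
    and fam_b (containing Y), given a map phi from X into fam_b."""
    X, Y = frozenset(X), frozenset(Y)
    result = {A | Y for A in fam_a}
    result |= {X | B for B in fam_b}
    result |= {(X - {x}) | phi[x] for x in X}
    result |= {(X - {x}) | Y for x in X}
    return result

def is_union_closed(family):
    return all((A | B) in family for A in family for B in family)

X, Y = {1, 2, 3}, {4, 5, 6}
# Hypothetical choice of phi: send each x to a singleton of Y.
phi = {1: frozenset({4}), 2: frozenset({5}), 3: frozenset({6})}
family = bloom_family(power_set(X), power_set(Y), X, Y, phi)
print(len(family), is_union_closed(family), min(len(S) for S in family))
```

With this \phi the family has 18 sets, the smallest of which have size 3. Other choices of \phi can be tried by editing the dictionary.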

Thomas Bloom remarks that if \mathcal A consists of all subsets of \{1,2,3\} and \mathcal B consists of all subsets of \{4,5,6\}, then (for suitable \phi) the result is a union-closed family that contains no set of size less than 3, and also contains a set of size 3 with no element of abundance greater than or equal to 1/2. This is interesting because a simple argument shows that if A is a set with two elements in a union-closed family then at least one of its elements has abundance at least 1/2.

Thus, this construction method can be used to create interesting union-closed families out of boring ones.

Thomas discusses what happens to abundances when you do this construction, and the rough answer is that elements of Y become less abundant but elements of X become quite a lot more abundant. So one can’t just perform this construction a few times and end up with a counterexample to FUNC. However, as Thomas also says, there is plenty of scope for modifying this basic idea, and maybe good things could flow from that.

I feel as though there is much more I could say, but this post has got quite long, and has taken me quite a long time to write, so I think it is better if I just post it. If there are things I wish I had mentioned, I’ll put them in comments and possibly repeat them in my next post.

I’ll close by remarking that I have created a wiki page. At the time of writing it has almost nothing on it but I hope that will change before too long.

BackreactionDoes the arXiv censor submissions?

The arXiv is the physicists' marketplace of ideas. In high energy physics and adjacent fields, almost all papers are submitted to the arXiv prior to journal submission. Developed by Paul Ginsparg in the early 1990s, this open-access pre-print repository has served the physics community for more than 20 years, and meanwhile extends also to adjacent fields like mathematics, economics, and biology. It fulfills an extremely important function by helping us to exchange ideas quickly and efficiently.

Over the years the originally free signup became more restricted. If you sign up for the arXiv now, you need to be "endorsed" by several people who are already signed up. It also became necessary to screen submissions to keep the quality level up. In hindsight, this isn't surprising: more people means more trouble. And sometimes, of course, things go wrong.

I have heard various stories about arXiv moderation gone wrong; these come mostly from students, and mostly affect those who work in small research areas or those whose name is Garrett Lisi.

A few days ago, a story appeared online which quickly spread. Nicolas Gisin, an established professor of physics who works on quantum cryptography (among other things), relates the story of two of his students who ventured into a territory unfamiliar to him, black hole physics. They wrote a paper that appeared to him likely wrong but reasonable. It got rejected by the arXiv. The paper later got published by PLA (a respected journal that however does not focus on general relativity). More worrisome still, the students' next paper also got rejected by the arXiv, making it appear as if they were now blacklisted.

Now the paper that caused the offense is, haha, not on the arXiv, but I tracked it down. So let me just say that I think it's indeed wrong and it shouldn't have gotten published in a journal. They are basically trying to include the backreaction of the outgoing Hawking-radiation on the black hole. It's a thorny problem (the very problem this blog was named after) and the treatment in the paper doesn't make sense.

Hawking radiation is not produced at the black hole horizon. No, it is not. And tracking back the flux from infinity to the horizon is therefore not correct. Besides this, the equation for the mass-loss that they use is a late-time approximation in a collapse situation. One can't use this approximation for a metric without collapse, and it certainly shouldn't be used down to the Planck mass. If you have a collapse-scenario, to get the backreaction right you would have to calculate the emission rate prior to horizon formation, time-dependently, and integrate over this.

Ok, so the paper is wrong. But should it have been rejected by the arXiv? I don't think so. The arXiv moderation can't and shouldn't replace peer review; it should just be a basic quality check, and the paper looks like a reasonable research project.

I asked a colleague who I know works as an arXiv moderator for comment. (S)he wants to stay anonymous but offers the following explanation:

I had not heard of the complaints/blog article, thanks for passing that information on...  
 The version of the article I saw was extremely naive and was very confused regarding coordinates and horizons in GR... I thought it was not “referee-able quality’’ — at least not in any competently run GR journal... (The hep-th moderator independently raised concerns...)  
 While it is now published at Physics Letters A, it is perhaps worth noting that the editorial board of Physics Letters A does *not* include anyone specializing in GR.
(S)he is correct of course. We haven't seen the paper that was originally submitted. It was very likely in considerably worse shape than the published version. Indeed, Gisin writes in his post that the paper was significantly revised during peer review. Taking this into account, the decision seems understandable to me.

The main problem I have with this episode is not that a paper got rejected which maybe shouldn't have been rejected -- because shit happens. Humans make mistakes, and let us be clear that the arXiv, underfunded as it is, relies on volunteers for the moderation. No, the main problem I have is the lack of transparency.

The arXiv is an essential resource for the physics community. We all put trust in a group of mostly anonymous moderators who do a rather thankless and yet vital job. I don't think the origin of the problem is with these people. I am sure they do the best they can. No, I think the origin of the problem is the lack of financial resources which must affect the possibility to employ administrative staff to oversee the operations. You get what you pay for.

I hope that this episode will be a wake-up call to the community to put their financial support behind the arXiv, and to the arXiv to use this support to put into place a more transparent and better organized moderation procedure.

Note added: It was mentioned to me that the problem with the paper might be more elementary in that they're using wrong coordinates to begin with - it hadn't even occurred to me to check this. To tell you the truth, I am not really interested in figuring out exactly why the paper is wrong, it's beside the point. I just hope that whoever reviewed the paper for PLA now goes and sits in the corner for an hour with a paper bag over their head.

January 28, 2016

BackreactionHello from Maui

Greetings from the west-end of my trip, which brought me out to Maui, visiting Garrett at the Pacific Science Institute, PSI. Launched roughly a year ago, Garrett and his girlfriend/partner Crystal have now hosted about 60 traveling scientists, "from all areas except chemistry" I was told.

I got bitten by mosquitoes and picked at by a set of adorable chickens (named after the six quarks), but managed to convince everybody that I really didn't feel like swimming, or diving, or jumping off things at great height. I know I'm dull. I did watch some sea turtles though and I also got a new T-shirt with the PSI-logo, which you can admire in the photo to the right (taken in front of a painting by Crystal).

I'm not an island-person, don't like mountains, and I can't stand humidity, so for me it's somewhat of a mystery what people think is so great about Hawaii. But leaving aside my preference for German forests, it's as pleasant a place as can be.

You won't be surprised to hear that Garrett is still working on his E8 unification and says things are progressing well, if slowly. Aloha.

Jordan EllenbergAI challop

When I was a kid people thought it would be a long time before computers could adequately translate natural language text, or play Go against a human being, because you’d need some kind of AI to do those things, and AI seemed really hard.

Now we know that you can get pretty decent translation and Go without anything like AI.  But AI still seems really hard.

January 27, 2016

Jordan EllenbergRoom to grow

However, Apple CEO Tim Cook pointed out that 60% of people who owned an iPhone before the launch of the iPhone 6 haven’t upgraded to the most recent models, which implies that there is still room to grow, Reuters notes.

Doesn’t it imply that a) people are no longer on contracts incentivizing biannual upgrade; and b) Apple hasn’t figured out a way to make a new phone that’s different enough from the iPhone 5 to make people want to switch?

Doug NatelsonCalTech wins the whole internet - public outreach for quantum.

This makes my public outreach efforts look lame by comparison.  Well done!

Terence TaoIPAM begins search for new director

The Institute for Pure and Applied Mathematics (IPAM) here at UCLA is seeking applications for its new director in 2017 or 2018, to replace Russ Caflisch, who is nearing the end of his five-year term as IPAM director.  The previous directors of IPAM (Tony Chan, Mark Green, and Russ Caflisch) were also from the mathematics department here at UCLA, but the position is open to all qualified applicants with extensive scientific and administrative experience in mathematics, computer science, or statistics.  Applications will be reviewed on June 1, 2016 (though the applications process will remain open through to Dec 1, 2016).


Terence TaoDecoupling and the Bourgain-Demeter-Guth proof of the Vinogradov main conjecture

Given any finite collection of elements {(f_i)_{i \in I}} in some Banach space {X}, the triangle inequality tells us that

\displaystyle \| \sum_{i \in I} f_i \|_X \leq \sum_{i \in I} \|f_i\|_X.

However, when the {f_i} all “oscillate in different ways”, one expects to improve substantially upon the triangle inequality. For instance, if {X} is a Hilbert space and the {f_i} are mutually orthogonal, we have the Pythagorean theorem

\displaystyle \| \sum_{i \in I} f_i \|_X = (\sum_{i \in I} \|f_i\|_X^2)^{1/2}.

For sake of comparison, from the triangle inequality and Cauchy-Schwarz one has the general inequality

\displaystyle \| \sum_{i \in I} f_i \|_X \leq (\# I)^{1/2} (\sum_{i \in I} \|f_i\|_X^2)^{1/2} \ \ \ \ \ (1)


for any finite collection {(f_i)_{i \in I}} in any Banach space {X}, where {\# I} denotes the cardinality of {I}. Thus orthogonality in a Hilbert space yields “square root cancellation”, saving a factor of {(\# I)^{1/2}} or so over the trivial bound coming from the triangle inequality.

More generally, let us somewhat informally say that a collection {(f_i)_{i \in I}} exhibits decoupling in {X} if one has the Pythagorean-like inequality

\displaystyle \| \sum_{i \in I} f_i \|_X \ll_\varepsilon (\# I)^\varepsilon (\sum_{i \in I} \|f_i\|_X^2)^{1/2}

for any {\varepsilon>0}, thus one obtains almost the full square root cancellation in the {X} norm. The theory of almost orthogonality can then be viewed as the theory of decoupling in Hilbert spaces such as {L^2({\bf R}^n)}. In {L^p} spaces for {p < 2} one usually does not expect this sort of decoupling; for instance, if the {f_i} are disjointly supported one has

\displaystyle \| \sum_{i \in I} f_i \|_{L^p} = (\sum_{i \in I} \|f_i\|_{L^p}^p)^{1/p}

and the right-hand side can be much larger than {(\sum_{i \in I} \|f_i\|_{L^p}^2)^{1/2}} when {p < 2}. At the opposite extreme, one usually does not expect to get decoupling in {L^\infty}, since one could conceivably align the {f_i} to all attain a maximum magnitude at the same location with the same phase, at which point the triangle inequality in {L^\infty} becomes sharp.
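
Both obstructions can be quantified in the simplest normalised case. As a worked example (under the added assumptions that {\# I = N}, that every {f_i} has norm {1}, and that in the {L^\infty} case the {f_i} are perfectly aligned as just described), one has for disjointly supported {f_i} in {L^p}

\displaystyle \| \sum_{i \in I} f_i \|_{L^p} = N^{1/p}, \qquad (\sum_{i \in I} \|f_i\|_{L^p}^2)^{1/2} = N^{1/2},

so the ratio {N^{1/p - 1/2}} is a positive power of {N} when {p < 2}, while for aligned {f_i} in {L^\infty}

\displaystyle \| \sum_{i \in I} f_i \|_{L^\infty} = N, \qquad (\sum_{i \in I} \|f_i\|_{L^\infty}^2)^{1/2} = N^{1/2}.

In neither case is the ratio bounded by {O_\varepsilon(N^\varepsilon)} for small {\varepsilon}.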

However, in some cases one can get decoupling for certain {2 < p < \infty}. For instance, suppose we are in {L^4}, and that {f_1,\dots,f_N} are bi-orthogonal in the sense that the products {f_i f_j} for {1 \leq i < j \leq N} are pairwise orthogonal in {L^2}. Then we have

\displaystyle \| \sum_{i = 1}^N f_i \|_{L^4}^2 = \| (\sum_{i=1}^N f_i)^2 \|_{L^2}

\displaystyle = \| \sum_{1 \leq i,j \leq N} f_i f_j \|_{L^2}

\displaystyle \ll (\sum_{1 \leq i,j \leq N} \|f_i f_j \|_{L^2}^2)^{1/2}

\displaystyle = \| (\sum_{1 \leq i,j \leq N} |f_i f_j|^2)^{1/2} \|_{L^2}

\displaystyle = \| \sum_{i=1}^N |f_i|^2 \|_{L^2}

\displaystyle \leq \sum_{i=1}^N \| |f_i|^2 \|_{L^2}

\displaystyle = \sum_{i=1}^N \|f_i\|_{L^4}^2

giving decoupling in {L^4}. (Similarly if each of the {f_i f_j} is orthogonal to all but {O_\varepsilon( N^\varepsilon )} of the other {f_{i'} f_{j'}}.) A similar argument also gives {L^6} decoupling when one has tri-orthogonality (with the {f_i f_j f_k} mostly orthogonal to each other), and so forth. As a slight variant, Khintchine’s inequality also indicates that decoupling should occur for any fixed {2 < p < \infty} if one multiplies each of the {f_i} by an independent random sign {\epsilon_i \in \{-1,+1\}}.

In recent years, Bourgain and Demeter have been establishing decoupling theorems in {L^p({\bf R}^n)} spaces for various key exponents of {2 < p < \infty}, in the “restriction theory” setting in which the {f_i} are Fourier transforms of measures supported on different portions of a given surface or curve; this builds upon the earlier decoupling theorems of Wolff. In a recent paper with Guth, they established the following decoupling theorem for the curve {\gamma({\bf R}) \subset {\bf R}^n} parameterised by the polynomial curve

\displaystyle \gamma: t \mapsto (t, t^2, \dots, t^n).

For any ball {B = B(x_0,r)} in {{\bf R}^n}, let {w_B: {\bf R}^n \rightarrow {\bf R}^+} denote the weight

\displaystyle w_B(x) := \frac{1}{(1 + \frac{|x-x_0|}{r})^{100n}},

which should be viewed as a smoothed out version of the indicator function {1_B} of {B}. In particular, the space {L^p(w_B) = L^p({\bf R}^n, w_B(x)\ dx)} can be viewed as a smoothed out version of the space {L^p(B)}. For future reference we observe a fundamental self-similarity of the curve {\gamma({\bf R})}: any arc {\gamma(I)} in this curve, with {I} a compact interval, is affinely equivalent to the standard arc {\gamma([0,1])}.

Theorem 1 (Decoupling theorem) Let {n \geq 1}. Subdivide the unit interval {[0,1]} into {N} equal subintervals {I_i} of length {1/N}, and for each such {I_i}, let {f_i: {\bf R}^n \rightarrow {\bf R}} be the Fourier transform

\displaystyle f_i(x) = \int_{\gamma(I_i)} e(x \cdot \xi)\ d\mu_i(\xi)

of a finite Borel measure {\mu_i} on the arc {\gamma(I_i)}, where {e(\theta) := e^{2\pi i \theta}}. Then the {f_i} exhibit decoupling in {L^{n(n+1)}(w_B)} for any ball {B} of radius {N^n}.

Orthogonality gives the {n=1} case of this theorem. The bi-orthogonality type arguments sketched earlier only give decoupling in {L^p} up to the range {2 \leq p \leq 2n}; the point here is that we can now get a much larger value of {p}. The {n=2} case of this theorem was previously established by Bourgain and Demeter (who obtained in fact an analogous theorem for any curved hypersurface). The exponent {n(n+1)} (and the radius {N^n}) is best possible, as can be seen by the following basic example. If

\displaystyle f_i(x) := \int_{I_i} e(x \cdot \gamma(\xi)) g_i(\xi)\ d\xi

where {g_i} is a bump function adapted to {I_i}, then standard Fourier-analytic computations show that {f_i} will be comparable to {1/N} on a rectangular box of dimensions {N \times N^2 \times \dots \times N^n} (and thus volume {N^{n(n+1)/2}}) centred at the origin, and exhibit decay away from this box, with {\|f_i\|_{L^{n(n+1)}(w_B)}} comparable to

\displaystyle 1/N \times (N^{n(n+1)/2})^{1/(n(n+1))} = 1/\sqrt{N}.

On the other hand, {\sum_{i=1}^N f_i} is comparable to {1} on a ball of radius comparable to {1} centred at the origin, so {\|\sum_{i=1}^N f_i\|_{L^{n(n+1)}(w_B)}} is {\gg 1}, which is just barely consistent with decoupling. This calculation shows that decoupling will fail if {n(n+1)} is replaced by any larger exponent, and also if the radius of the ball {B} is reduced to be significantly smaller than {N^n}.

This theorem has the following consequence of importance in analytic number theory:

Corollary 2 (Vinogradov main conjecture) Let {s, n, N \geq 1} be integers, and let {\varepsilon > 0}. Then

\displaystyle \int_{[0,1]^n} |\sum_{j=1}^N e( j x_1 + j^2 x_2 + \dots + j^n x_n)|^{2s}\ dx_1 \dots dx_n

\displaystyle \ll_{\varepsilon,s,n} N^{s+\varepsilon} + N^{2s - \frac{n(n+1)}{2}+\varepsilon}.

Proof: By the Hölder inequality (and the trivial bound of {N} for the exponential sum), it suffices to treat the critical case {s = n(n+1)/2}, that is to say to show that

\displaystyle \int_{[0,1]^n} |\sum_{j=1}^N e( j x_1 + j^2 x_2 + \dots + j^n x_n)|^{n(n+1)}\ dx_1 \dots dx_n \ll_{\varepsilon,n} N^{\frac{n(n+1)}{2}+\varepsilon}.

We can rescale this as

\displaystyle \int_{[0,N] \times [0,N^2] \times \dots \times [0,N^n]} |\sum_{j=1}^N e( x \cdot \gamma(j/N) )|^{n(n+1)}\ dx \ll_{\varepsilon,n} N^{n(n+1)+\varepsilon}.

As the integrand is periodic along the lattice {N{\bf Z} \times N^2 {\bf Z} \times \dots \times N^n {\bf Z}}, this is equivalent to

\displaystyle \int_{[0,N^n]^n} |\sum_{j=1}^N e( x \cdot \gamma(j/N) )|^{n(n+1)}\ dx \ll_{\varepsilon,n} N^{\frac{n(n+1)}{2}+n^2+\varepsilon}.

The left-hand side may be bounded by {\ll \| \sum_{j=1}^N f_j \|_{L^{n(n+1)}(w_B)}^{n(n+1)}}, where {B := B(0,N^n)} and {f_j(x) := e(x \cdot \gamma(j/N))}. Since

\displaystyle \| f_j \|_{L^{n(n+1)}(w_B)} \ll (N^{n^2})^{\frac{1}{n(n+1)}},

the claim now follows from the decoupling theorem and a brief calculation. \Box
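
For the record, here is one way the brief calculation can be spelled out (a sketch; the factor {N^\varepsilon} below absorbs the {n(n+1)}-th power of the decoupling loss after relabelling {\varepsilon}):

\displaystyle \| \sum_{j=1}^N f_j \|_{L^{n(n+1)}(w_B)}^{n(n+1)} \ll_{\varepsilon,n} N^{\varepsilon} (\sum_{j=1}^N \|f_j\|_{L^{n(n+1)}(w_B)}^2)^{\frac{n(n+1)}{2}}

\displaystyle \ll N^{\varepsilon} (N \cdot N^{\frac{2n^2}{n(n+1)}})^{\frac{n(n+1)}{2}} = N^{\frac{n(n+1)}{2}+n^2+\varepsilon},

which is the bound required for the integral over {[0,N^n]^n}.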

Using the Plancherel formula, one may equivalently (when {s} is an integer) write the Vinogradov main conjecture in terms of solutions {j_1,\dots,j_s,k_1,\dots,k_s \in \{1,\dots,N\}} to the system of equations

\displaystyle j_1^i + \dots + j_s^i = k_1^i + \dots + k_s^i \forall i=1,\dots,n,

but we will not use this formulation here.
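
For readers who want to experiment with this formulation anyway, here is a brute-force Python sketch that counts solutions of the system for small parameters. The function name and the choice of test parameters are assumptions made for illustration, and the approach is only feasible for very small {s} and {N}.

```python
from collections import Counter
from itertools import product

def count_solutions(s, n, N):
    """Number of (j_1..j_s, k_1..k_s) in {1..N}^{2s} with
    j_1^i + ... + j_s^i == k_1^i + ... + k_s^i for every i = 1..n."""
    power_sums = Counter(
        tuple(sum(j ** i for j in js) for i in range(1, n + 1))
        for js in product(range(1, N + 1), repeat=s)
    )
    # Each ordered pair of s-tuples sharing the same power sums is one solution.
    return sum(c * c for c in power_sums.values())

for N in range(1, 7):
    print(N, count_solutions(2, 2, N), N ** 2)
```

The diagonal solutions {j_m = k_m} already give at least {N^s} solutions, which is why the term {N^{s+\varepsilon}} in Corollary 2 cannot be improved beyond the {\varepsilon}.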

A history of the Vinogradov main conjecture may be found in this survey of Wooley; prior to the Bourgain-Demeter-Guth theorem, the conjecture was solved completely for {n \leq 3}, or for {n > 3} and {s} either below {n(n+1)/2 - n/3 + O(n^{2/3})} or above {n(n-1)}, with the bulk of recent progress coming from the efficient congruencing technique of Wooley. It has numerous applications to exponential sums, Waring’s problem, and the zeta function; to give just one application, the main conjecture implies the predicted asymptotic for the number of ways to express a large number as the sum of {23} fifth powers (the previous best result required {28} fifth powers). The Bourgain-Demeter-Guth approach to the Vinogradov main conjecture, based on decoupling, is ostensibly very different from the efficient congruencing technique, which relies heavily on the arithmetic structure of the problem, but it appears (as I have been told from second-hand sources) that the two methods are actually closely related, with the former being a sort of “Archimedean” version of the latter (with the intervals {I_i} in the decoupling theorem being analogous to congruence classes in the efficient congruencing method); hopefully there will be some future work making this connection more precise. One advantage of the decoupling approach is that it generalises to non-arithmetic settings in which the set {\{1,\dots,N\}} that {j} is drawn from is replaced by some other similarly separated set of real numbers. (A random thought – could this allow the Vinogradov-Korobov bounds on the zeta function to extend to Beurling zeta functions?)

Below the fold we sketch the Bourgain-Demeter-Guth argument proving Theorem 1.

I thank Jean Bourgain and Andrew Granville for helpful discussions.

— 1. Initial reductions —

The proof will proceed by induction on dimension; thus we assume henceforth that {n \geq 2} (the {n=1} case being immediate from the Pythagorean theorem) and that Theorem 1 has already been proven for smaller values of {n}. This has the following nice consequence:

Proposition 3 (Lower dimensional decoupling) Let the notation be as in Theorem 1. Suppose also that {n \geq 2}, and that Theorem 1 has already been proven for all smaller values of {n}. Then for any {1 \leq k < n}, the {f_i} exhibit decoupling in {L^{k(k+1)}(w_B)} for any ball {B} of radius {N^k}.

Proof: (Sketch) We slice the ball {B} into {k}-dimensional slices parallel to the first {k} coordinate directions. On each slice, the {f_i} can be interpreted as functions on {{\bf R}^k} whose Fourier transforms lie on the curve {\gamma_k(I) \subset {\bf R}^k}, where {\gamma_k(t) := (t,\dots,t^k)}. Applying Theorem 1 with {n} replaced by {k}, and then integrating over all slices using Fubini’s theorem and Minkowski’s inequality (to interchange the {L^{k(k+1)}} norm and the square function), we obtain the claim. \Box

The first step, needed for technical inductive purposes, is to work at an exponent slightly below {n(n+1)}. More precisely, given any {2 < p < \infty} and {\eta > 0}, let {P(p,\eta)} denote the assertion that

\displaystyle \| \sum_{i=1}^N f_i \|_{L^p(w_B)} \ll_{\varepsilon,p,n} N^{\eta+\varepsilon} (\sum_{i=1}^N \|f_i\|_{L^p(w_B)}^2)^{1/2}

whenever {\varepsilon>0}, {N \geq 1}, {B} and {f_1,\dots,f_N} are as in Theorem 1. Theorem 1 is then clearly equivalent to the claim {P(n(n+1),\eta)} holding for all {\eta>0}. This turns out to be equivalent to the following variant:

Proposition 4 Let {n \geq 2}, and assume Theorem 1 has been established for all smaller values of {n}. If {p < n(n+1)} is sufficiently close to {n(n+1)}, then {P(p,\eta)} holds for all {\eta > 0}.

The reason for this is that the functions {f_i} and {\sum_{i=1}^N f_i} all have Fourier transform supported on a ball of radius {O(1)}, and so there is a Bernstein-type inequality that lets one replace the {L^p(w_B)} norm of either function by the {L^{n(n+1)}(w_B)} norm, losing a power of {N} that goes to zero as {p} goes to {n(n+1)}. (See Corollary 6.2 and Lemma 8.2 of the Bourgain-Demeter-Guth paper for more details of this.)

Using the trivial bound (1) we see that {P(p,\eta)} holds for large {\eta} (e.g. {\eta \geq 1/2}). To reduce {\eta}, it suffices to prove the following inductive claim.

Proposition 5 (Inductive claim) Let {n \geq 2}, and assume Theorem 1 has been established for all smaller values of {n}. If {p < n(n+1)} is sufficiently close to {n(n+1)}, and {P(p,\eta)} holds for some {\eta > 0}, then {P(p,\eta')} holds for some {0 < \eta' < \eta}.

Since the set of {\eta \geq 0} for which {P(p,\eta)} holds is clearly a closed half-infinite interval, Proposition 5 implies Proposition 4 and hence Theorem 1.

Henceforth we fix {n,p,\eta} as in Proposition 5, and use {o(1)} to denote any quantity that goes to zero as {N \rightarrow \infty}, keeping {n,p,\eta} fixed. Then the {P(p,\eta)} hypothesis reads

\displaystyle \| \sum_{i=1}^N f_i \|_{L^p(w_B)} \ll N^{\eta+o(1)} (\sum_{i=1}^N \|f_i\|_{L^p(w_B)}^2)^{1/2}

and our task is to show that

\displaystyle \| \sum_{i=1}^N f_i \|_{L^p(w_B)} \ll N^{\eta'+o(1)} (\sum_{i=1}^N \|f_i\|_{L^p(w_B)}^2)^{1/2} \ \ \ \ \ (2)


for some {0 < \eta' < \eta}.

The next step is to reduce matters to a “multilinear” version of the above estimate, in order to exploit a multilinear Kakeya estimate at a later stage of the argument. Let {M} be a large integer depending only on {n} (actually Bourgain, Demeter, and Guth choose {M := n!}). It turns out that it will suffice to prove the multilinear version

\displaystyle \| \hbox{geom} |\sum_{i \in {\mathcal I}_j} f_i|\|_{L^p(w_B)} \ll N^{\eta'+o(1)} \hbox{geom} (\sum_{i \in {\mathcal I}_j} \|f_i\|_{L^p(w_B)}^2)^{1/2} \ \ \ \ \ (3)


whenever {{\mathcal I}_1,\dots,{\mathcal I}_M} are families of disjoint subintervals on {[0,1]} of length {1/N} that are separated from each other by a distance of {\gg 1}, and where {\hbox{geom} = \hbox{geom}_{j=1,\dots,M}} denotes the geometric mean

\displaystyle \hbox{geom} x_j := (x_1 \dots x_M)^{1/M}.
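As a quick sanity check on this notation, here is a minimal numerical sketch of the Hölder-type bound {\| \hbox{geom} |F_j| \|_{L^p} \leq \hbox{geom} \|F_j\|_{L^p}}, which is the kind of inequality that lets one pass from linear to multilinear estimates. The discrete setting (counting measure on finitely many points), the random data, and the helper names `geom`, `lp_norm` and `holder_sides` are illustrative assumptions and not part of the original argument.

```python
import math
import random

def geom(values):
    """Geometric mean (x_1 ... x_M)^(1/M) of non-negative numbers."""
    return math.prod(values) ** (1.0 / len(values))

def lp_norm(values, p):
    """Discrete L^p norm with respect to counting measure."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

def holder_sides(functions, p):
    """Return both sides of || geom_j |F_j| ||_p <= geom_j ||F_j||_p."""
    num_points = len(functions[0])
    pointwise = [geom([abs(F[x]) for F in functions]) for x in range(num_points)]
    return lp_norm(pointwise, p), geom([lp_norm(F, p) for F in functions])

random.seed(0)
M, num_points, p = 6, 50, 11.5   # illustrative sizes only
functions = [[random.uniform(-1, 1) for _ in range(num_points)] for _ in range(M)]
lhs, rhs = holder_sides(functions, p)
print(lhs, rhs, lhs <= rhs)
```

The inequality is Hölder's inequality applied to the {M} functions {|F_j|^{1/M}}, each measured in {L^{pM}}.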

We have the following nice equivalence (essentially due to Bourgain and Guth, building upon an earlier “bilinear equivalence” result of Vargas, Vega, and myself, and discussed in this previous blog post):

Proposition 6 (Multilinear equivalence) For any {\eta'>0}, the estimates (2) and (3) are equivalent.

Proof: The derivation of (3) from (2) is immediate from Hölder’s inequality. To obtain the converse implication, let {A(N)} denote the best constant in (2), thus {A(N)} is the smallest constant such that

\displaystyle \| \sum_{i=1}^N f_i \|_{L^p(w_B)} \leq A(N) (\sum_{i=1}^N \|f_i\|_{L^p(w_B)}^2)^{1/2}. \ \ \ \ \ (4)


The idea is to prove an inequality of the form

\displaystyle A(N) \ll A(N/K) + O_K( N^{\eta' + o(1)} )

for any fixed integer {K>2M} (with the implied constant in the {\ll} notation independent of {K}); by choosing {K} large enough one can then prove {A(N) \ll N^{\eta'+o(1)}} by an inductive argument.
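To see why such a recursive inequality forces {K} to be large, one can iterate a toy model of it. The sketch below treats the recursion as an equality {a_j = C a_{j-1} + C_K K^{j\eta'}} with {a_j} standing for {A(K^j)}; the constants `C` and `C_K` are hypothetical stand-ins for the implied constants (in the actual argument {C_K} depends on {K}, which is ignored here), and the function names are my own.

```python
import math

def best_constant_model(m, K, C, C_K, eta):
    """Iterate a_j = C*a_{j-1} + C_K*K**(j*eta) from a_0 = 1; a_j models A(K**j)."""
    a = 1.0
    for j in range(1, m + 1):
        a = C * a + C_K * float(K) ** (j * eta)
    return a

def effective_exponent(m, K, C, C_K, eta):
    """Return log A(N) / log N at N = K**m in the model above."""
    return math.log(best_constant_model(m, K, C, C_K, eta)) / (m * math.log(K))

C, C_K, eta = 4.0, 100.0, 0.3    # hypothetical constants, for illustration only
m = 40
for K in (10, 10**3, 10**6):
    ceiling = max(eta, math.log(C) / math.log(K))
    print(K, effective_exponent(m, K, C, C_K, eta), ceiling)
```

In this model the exponent of {N} is governed by {\max(\eta', \log C / \log K)}, so the target exponent {\eta'} is only reached once {K^{\eta'}} exceeds {C}.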

We partition the {N} intervals in (2) into {K} classes {{\mathcal I}_1,\dots, {\mathcal I}_K} of {\sim N/K} consecutive intervals, so that {\sum_{i=1}^N f_i} can be expressed as {\sum_{k=1}^K F_k} where {F_k := \sum_{i \in {\mathcal I}_k} f_i}. Observe that for any {x}, one either has

\displaystyle |F_k(x)| \gg \sum_{k=1}^K |F_k(x)|

for some {k=1,\dots,K} (i.e. one of the {|F_k(x)|} dominates the sum), or else one has

\displaystyle \sum_{k=1}^K |F_k(x)| \ll_K \hbox{geom} |F_{k_j}(x)|

for some {k_1 < \dots < k_M} with the transversality condition {k_{j+1} - k_j > 1}. This leads to the pointwise inequality

\displaystyle \sum_{k=1}^K |F_k(x)| \ll \sup_k |F_k(x)| + O_K( \sum_{k_1 < \dots < k_M, \hbox{ transverse}} \hbox{geom} |F_{k_j}(x)| ).

Bounding the supremum {\sup_k |F_k(x)|} by {(\sum_{k=1}^K |F_k(x)|^p)^{1/p}} and then taking {L^p} norms and using (3), we conclude that

\displaystyle \| \sum_{i=1}^N f_i\|_{L^p(w_B)} \ll (\sum_{k=1}^K \| \sum_{i \in {\mathcal I}_k} f_i \|_{L^p(w_B)}^p)^{1/p}

\displaystyle + O_K( N^{\eta'+o(1)} (\sum_{i=1}^N \|f_i\|_{L^p(w_B)}^2)^{1/2} ).

On the other hand, applying an affine rescaling to (4) one sees that

\displaystyle \| \sum_{i \in {\mathcal I}_k} f_i \|_{L^p(w_B)} \leq A(N/K) (\sum_{i \in {\mathcal I}_k} \|f_i\|_{L^p(w_B)}^2)^{1/2},

and the claim follows. (A more detailed version of this argument may be found in Theorem 4.1 of this paper of Bourgain and Demeter.) \Box

It thus suffices to show (3).

The next step is to set up some intermediate scales between {1} and {N}, in order to run an “induction on scales” argument. For any scale {r > 0}, any exponent {1 < t < \infty}, and any function {f \in L^t({\bf R}^n)}, let {\hbox{Avg}_{t,r} f \in L^p({\bf R}^n)} denote the local {L^t} average

\displaystyle \hbox{Avg}_{t,r} f(x) := ( \frac{1}{|B(x,r)|} \int_{{\bf R}^n} |f(y)|^t w_{B(x,r)}(y)\ dy)^{1/t}

where {|B(x,r)|} denotes the volume of {B(x,r)} (one could also use the equivalent quantity {\int_{{\bf R}^n} w_{B(x,r)}(y)\ dy} here if desired). For any exponents {2 \leq t < \infty}, {0 \leq q \leq 1}, and {q \leq s \leq n} (independent of {N}), let {a_t(q,s)} denote the least exponent for which one has the local decoupling inequality

\displaystyle \| \hbox{geom} (\sum_{I \in {\mathcal I}_{j,q}} |\hbox{Avg}_{t,N^s} f_{I}|^2)^{1/2}\|_{L^p(w_B)} \ \ \ \ \ (5)


\displaystyle \ll N^{a_t(q,s)+o(1)} \hbox{geom} (\sum_{i \in {\mathcal I}_j} \|f_i\|_{L^p(w_B)}^2)^{1/2}

for {f_i, B, {\mathcal I}_j} as in (3), where the {1/N}-length intervals in {{\mathcal I}_j} have been covered by a family {{\mathcal I}_{j,q}} of finitely overlapping intervals of length {1/N^q}, and {f_I := \sum_{i: I_i \subset I} f_i}. It is then not difficult to see that the estimate (3) is equivalent to the inequality

\displaystyle a_2(0,0) \leq \eta'

(basically because when {q=0}, there is essentially only one {I} for each {j}, and {f_I} is basically {\sum_{i \in {\mathcal I}_j} f_i}; also, the averaging {\hbox{Avg}_{t,N^s}} is essentially the identity when {s=0} since all the {f_i} and {f_I} here have Fourier support on a ball of radius {O(1)}). To put it another way, our task is now to show that

\displaystyle a_2(0,0) < \eta. \ \ \ \ \ (6)


On the other hand, one can establish the following inequalities concerning the quantities {a_t(q,s)}, arranged roughly in increasing order of difficulty to prove.

Proposition 7 (Inequalities on {a_t(q,s)}) Throughout this proposition it is understood that {2 \leq t < \infty}, {0 \leq q \leq 1}, and {q \leq s \leq n}.

  • (i) (Hölder) The quantity {a_t(q,s)} is convex in {1/t}, and monotone nondecreasing in {t}.
  • (ii) (Minkowski) If {t=p}, then {a_t(q,s)} is monotone non-decreasing in {s}.
  • (iii) (Stability) One has {a_2(q,s) = a_2(0,0) + O( q + s )}. (In fact, {a_t(q,s)} is Lipschitz in {q,s} uniformly in {t}, but we will not need this.)
  • (iv) (Rescaled decoupling hypothesis) If {t=p} and {s=n}, then one has {a_t(q,s) \leq (1-q) \eta}.
  • (v) (Lower dimensional decoupling) If {1 \leq k \leq n-1} and {q \leq \frac{s}{k}}, then {a_{k(k+1)}(q,s) \leq a_{k(k+1)}(\frac{s}{k},s)}.
  • (vi) (Multilinear Kakeya) If {1 \leq k \leq n-1} and {(k+1)q \leq n}, then {a_{kp/n}(q,kq) \leq a_{kp/n}(q,(k+1)q)}.

We sketch the proof of the various parts of this proposition in later sections. For now, let us show how these properties imply the claim (6). In the paper of Bourgain, Demeter, and Guth, the above properties were iterated along a certain “tree” of parameters {(t,q,s)}, relying on (v) to increase the {q} parameter (which measures the amount of decoupling) and (vi) to “inflate” or increase the {s} parameter (which measures the spatial scale at which decoupling has been obtained), and (i) to reconcile the different choices of {t} appearing in (v) and (vi), with the remaining properties (ii), (iii), (iv) used to control various “boundary terms” arising from this tree iteration. Here, we will present an essentially equivalent “Bellman function” formulation of the argument which replaces this iteration by a carefully (but rather unmotivatedly) chosen inductive claim. More precisely, let {\varepsilon > 0} be a small quantity (depending only on {p} and {n}) to be chosen later. For any {W>0}, let {Q(W) = Q_\eta(W)} denote the claim that for every {k=2,\dots,n}, and for all sufficiently small {u > 0}, one has the inequality

\displaystyle a_t(u, ku) \leq (1 - 2 Wu (n-k+1) \ \ \ \ \ (7)


\displaystyle + (1-\varepsilon) W u (k-1) k (n+1) (\frac{1}{(k-1)k} - \frac{1}{t} ) ) \eta

for all

\displaystyle (k-1) k \leq t \leq \frac{kp}{n}, \ \ \ \ \ (8)


and also

\displaystyle a_t(u,u) \leq (1 - Wu(n-1)) \eta \ \ \ \ \ (9)


for {2 \leq t \leq p/n}.

From Proposition 7 (i), (ii), (iv), we see that {Q(W)} holds for some small {W>0}. We will shortly establish the implication

\displaystyle Q(W) \implies Q( (1+\delta) W ) \ \ \ \ \ (10)


for some {\delta>0} independent of {W}; this implies upon iteration that {Q(W)} holds for arbitrarily large values of {W}. Applying (9) with {t=2} for a sufficiently large {W} and a sufficiently small {u}, and combining with Proposition 7(iii), we obtain the claim (6).
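The bookkeeping in this bootstrap can be made concrete with a small sketch. It assumes a hypothetical constant `C_stab` bounding the stability error in Proposition 7(iii) along {q = s = u}, so that {a_2(0,0) \leq a_2(u,u) + C_{stab} u}; the starting value `W0`, the gain `delta` and the function names are likewise illustrative choices rather than quantities from the paper.

```python
def iterations_needed(W0, delta, W_target):
    """Apply Q(W) => Q((1+delta) W) until W >= W_target; return (count, final W)."""
    count, W = 0, W0
    while W < W_target:
        W *= 1 + delta
        count += 1
    return count, W

def a2_origin_bound(eta, n, W, u, C_stab):
    """Bound (9) at t = 2, plus a stability error C_stab*u (hypothetical constant)."""
    return (1 - W * u * (n - 1)) * eta + C_stab * u

eta, n = 0.5, 3
W0, delta, C_stab = 0.01, 0.05, 10.0           # hypothetical values
W_target = 2 * C_stab / (eta * (n - 1))        # makes W*eta*(n-1) exceed C_stab
steps, W = iterations_needed(W0, delta, W_target)
u = 1e-3                                       # "sufficiently small" u, chosen after W
print(steps, W, a2_origin_bound(eta, n, W, u, C_stab) < eta)
```

The point is only that {W} grows geometrically under (10), so after finitely many steps the gain {W u (n-1) \eta} in (9) outweighs the {O(u)} stability error and the resulting bound on {a_2(0,0)} drops strictly below {\eta}.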

We now prove the implication (10). Thus we assume (7) holds for {2 \leq k \leq n}, sufficiently small {u > 0}, and {t} obeying (8), and also (9) for {2 \leq t \leq p/n} and we wish to improve this to

\displaystyle a_t(u, ku) \leq (1 - 2 (1+\delta)Wu (n-k+1) \ \ \ \ \ (11)


\displaystyle + (1-\varepsilon) (1+\delta) W u (k-1) k (n+1) (\frac{1}{(k-1)k} - \frac{1}{t} ) ) \eta

for the same range of {k, t} and for sufficiently small {u}, and also

\displaystyle a_t(u,u) \leq (1 - (1+\delta)Wu(n-1)) \eta \ \ \ \ \ (12)


for {2 \leq t \leq p/n}.

By Proposition 7(i) it suffices to show this for the extreme values of {t}, thus we wish to show that

\displaystyle a_{kp/n}(u, ku) \leq (1 - 2 (1+\delta)Wu (n-k+1) \ \ \ \ \ (13)


\displaystyle + (1-\varepsilon) (1+\delta) W u (k-1) k (n+1) (\frac{1}{(k-1)k} - \frac{n}{pk} ) ) \eta

for {k=2,\dots,n},

\displaystyle a_{(k-1)k}(u, ku) \leq (1 - 2 (1+\delta)Wu (n-k+1) ) \eta \ \ \ \ \ (14)


for {k=2,\dots,n}, and

\displaystyle a_2(u, u), a_{p/n}(u,u) \leq (1 - (1+\delta)Wu (n-1) ) \eta. \ \ \ \ \ (15)


We begin with (13). The {k=n} case of this estimate is

\displaystyle a_p(u,nu) \leq (1 - 2 (1+\delta) Wu \ \ \ \ \ (16)


\displaystyle + (1-\varepsilon) (1+\delta) W u (n-1) n (n+1) (\frac{1}{(n-1)n} - \frac{1}{p}) ) \eta.

But since {p < n(n+1)}, we see that {(1-\varepsilon) (n-1) n (n+1) (\frac{1}{(n-1)n} - \frac{1}{p}) > 2} if {\varepsilon} is small enough, so the right-hand side of (16) is greater than {\eta} and the claim follows from Proposition 7(iv) (with a little bit of room to spare). Now we look at the {k=2,\dots,n-1} cases of (13). By Proposition 7(vi), we have

\displaystyle a_{kp/n}(u, ku) \leq a_{kp/n}(u,(k+1)u).

For {p} close to {n(n+1)}, {kp/n} lies between {\max(2,k(k+1))} and {\frac{(k+1)p}{n}}, so from (7) one has

\displaystyle a_{kp/n}(u,(k+1)u) \leq (1 - 2 Wu (n-k)

\displaystyle + (1-\varepsilon) W u k (k+1) (n+1) (\frac{1}{k(k+1)} - \frac{n}{pk} ) ) \eta.

Since {p < n(n+1)}, one has

\displaystyle -2(n-k) + (1-\varepsilon) k(k+1) (n+1) (\frac{1}{k(k+1)} - \frac{n}{pk})

\displaystyle < -2(n-k+1) + (1-\varepsilon) (k-1) k (n+1) (\frac{1}{(k-1)k} - \frac{n}{pk} )

for {\varepsilon} small enough depending on {p}, and (13) follows (if {\delta} is small enough depending on {\varepsilon,p} but not on {W}).
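The strict inequality between the two coefficients of {Wu} used here can be checked in exact rational arithmetic. In the sketch below the function names and the sample values of {n}, {p}, {\varepsilon} are my own choices; the check also confirms that the gap between the two sides equals {2((1-\varepsilon) n(n+1)/p - 1)}, which is positive precisely when {\varepsilon < 1 - p/(n(n+1))}.

```python
from fractions import Fraction as Fr

def lhs_coeff(n, k, p, eps):
    """Coefficient of W*u supplied by (7) after applying Proposition 7(vi)."""
    return -2 * (n - k) + (1 - eps) * k * (k + 1) * (n + 1) * (Fr(1, k * (k + 1)) - Fr(n) / (p * k))

def rhs_coeff(n, k, p, eps):
    """Coefficient of W*u required by (13), with the factor 1+delta set aside."""
    return -2 * (n - k + 1) + (1 - eps) * (k - 1) * k * (n + 1) * (Fr(1, (k - 1) * k) - Fr(n) / (p * k))

n, p = 4, Fr(39, 2)                      # sample p slightly below n(n+1) = 20
threshold = 1 - p / (n * (n + 1))        # largest admissible epsilon (exclusive)
for k in range(2, n):
    for eps in (Fr(1, 100), Fr(1, 20)):
        gap = rhs_coeff(n, k, p, eps) - lhs_coeff(n, k, p, eps)
        assert gap == 2 * ((1 - eps) * n * (n + 1) / p - 1)
        print(k, eps, gap > 0, eps < threshold)
```

Since the gap does not depend on {k} or on {W}, the room available for the factor {1+\delta} is uniform in {W}, as the argument requires.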

The same argument applied with {k=1} gives

\displaystyle a_{p/n}(u,u) \leq (1 - 2Wu(n-1)

\displaystyle + (1-\varepsilon) 2W u (n+1) (\frac{1}{2} - \frac{n}{p} ) ) \eta.

Since {p < n(n+1)}, we thus have

\displaystyle a_{p/n}(u,u) \leq (1 - (1+\delta) Wu(n-1) ) \eta

if {\varepsilon,\delta} are sufficiently small depending on {p} (but not on {W}). This, together with Proposition 7(i), gives (15).

Finally, we establish (14). From Proposition 7(v) (with {k} replaced by {k-1}) we have

\displaystyle a_{(k-1)k}(u, ku) \leq a_{(k-1)k}( \frac{k}{k-1} u, ku ).

In the {k=2} case, this gives

\displaystyle a_{(k-1)k}(u,ku) \leq a_2( 2u, 2u )

and the claim (14) follows from (15) in this case. Now suppose {2 < k \leq n}. Since {p} is close to {n(n+1)}, {(k-1)k} lies between {(k-2)(k-1)} and {\frac{(k-1)p}{n}}, and so we may apply (7) to conclude that

\displaystyle a_{(k-1)k}( \frac{k}{k-1} u, ku ) \leq (1 - 2 W \frac{k}{k-1} u (n-k+2)

\displaystyle + (1-\varepsilon) W \frac{k}{k-1} u(k-1) (k-2) (n+1) (\frac{1}{(k-2)(k-1)} - \frac{1}{(k-1)k} ) ) \eta

and hence (after simplifying)

\displaystyle a_{(k-1)k}(u, ku) \leq (1 - 2W u (n-k+1) (1 + \frac{\varepsilon (n+1)}{(n-k+1)(k-1)} )) \eta,

which gives (14) for {\delta} small enough (depending on {\varepsilon,k,n}, but not on {W}).
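The simplification step above is pure algebra, and can be double-checked in exact arithmetic. The following sketch compares the coefficient of {Wu} before and after simplifying, over a small grid of {n}, {k} and {\varepsilon}; the function names and the grid are illustrative assumptions.

```python
from fractions import Fraction as Fr

def coeff_from_7(n, k, eps):
    """Coefficient of W*u from (7) with k-1 in place of k, t = (k-1)k, u -> k*u/(k-1)."""
    scale = Fr(k, k - 1)
    decoupling = -2 * scale * (n - k + 2)
    kakeya = ((1 - eps) * scale * (k - 1) * (k - 2) * (n + 1)
              * (Fr(1, (k - 2) * (k - 1)) - Fr(1, (k - 1) * k)))
    return decoupling + kakeya

def coeff_simplified(n, k, eps):
    """Coefficient of W*u in the simplified bound."""
    return -2 * (n - k + 1) * (1 + eps * (n + 1) / Fr((n - k + 1) * (k - 1)))

def largest_delta(n, k, eps):
    """Any delta up to this value makes the simplified bound imply (14)."""
    return eps * (n + 1) / Fr((n - k + 1) * (k - 1))

for n in range(3, 9):
    for k in range(3, n + 1):
        for eps in (Fr(0), Fr(1, 50), Fr(1, 3)):
            assert coeff_from_7(n, k, eps) == coeff_simplified(n, k, eps)
print(largest_delta(5, 3, Fr(1, 50)))
```

The admissible {\delta} depends on {\varepsilon, k, n} only, consistent with the requirement that it be independent of {W}.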

— 2. Rescaled decoupling —

The claims (i), (ii), (iii) of Proposition 7 are routine applications of the Hölder and Minkowski inequalities (and also the Bernstein inequality, in the case of (iii)); we will focus on the more interesting claims (iv), (v), (vi).

Here we establish (iv). The main geometric point exploited here is that any segment of the curve {\gamma([0,1])} is affinely equivalent to {\gamma([0,1])} itself, with the key factor of {1-q} in the bound {a_t(q,s) \leq (1-q) \eta} coming from this affine rescaling.

Using the definition (5) of {a_t(q,s)}, we see that we need to show that

\displaystyle \| \hbox{geom} (\sum_{I \in {\mathcal I}_{j,q}} |\hbox{Avg}_{p,N^n} f_{I}|^2)^{1/2}\|_{L^p(w_B)}

\displaystyle \ll N^{(1-q)\eta+o(1)} \hbox{geom} (\sum_{i \in {\mathcal I}_j} \|f_i\|_{L^p(w_B)}^2)^{1/2}

for balls {B} of radius {N^n}. By Hölder’s inequality, it suffices to show that

\displaystyle \| (\sum_{I \in {\mathcal I}_{j,q}} |\hbox{Avg}_{p,N^n} f_{I}|^2)^{1/2}\|_{L^p(w_B)} \ll N^{(1-q)\eta+o(1)} (\sum_{i \in {\mathcal I}_j} \|f_i\|_{L^p(w_B)}^2)^{1/2}

for each {j}. By Minkowski’s inequality (and the fact that {p>2}), the left-hand side is at most

\displaystyle ( \sum_{I \in {\mathcal I}_{j,q}} \| \hbox{Avg}_{p,N^n} f_{I}\|_{L^p(w_B)}^2)^{1/2}

so it suffices to show that

\displaystyle \| \hbox{Avg}_{p,N^n} f_{I} \|_{L^p(w_B)} \ll N^{(1-q)\eta+o(1)} (\sum_{i: I_i \subset I} \|f_i\|_{L^p(w_B)}^2)^{1/2}

for each {I \in {\mathcal I}_{j,q}}. From Fubini’s theorem one has

\displaystyle \| \hbox{Avg}_{p,N^n} f_{I} \|_{L^p(w_B)} \ll_{n,p} \| f_{I} \|_{L^p(w_B)}

so we reduce to showing that

\displaystyle \| f_{I} \|_{L^p(w_B)} \ll N^{(1-q)\eta+o(1)} (\sum_{i: I_i \subset I} \|f_i\|_{L^p(w_B)}^2)^{1/2}.

But this follows by applying an affine rescaling to map {\gamma(I)} to {\gamma([0,1])}, and then using the hypothesis {P(p,\eta)} with {N} replaced by {N^{1-q}}. (The ball {B} gets distorted into an ellipsoid, but one can check that this ellipsoid can be covered efficiently by finitely overlapping balls of radius {N^{n(1-q)}}, and so one can close the argument using the triangle inequality.)

— 3. Lower dimensional decoupling —

Now we establish (v). Here, the geometric point is the one implicitly used in Proposition 3, namely that the {n}-dimensional curve {\gamma([0,1])} projects down to the {k}-dimensional curve {\gamma_k([0,1])} for any {1 \leq k < n}.

Let {k,q,s} be as in Proposition 7(v). From (5), it suffices to show that

\displaystyle \| \hbox{geom} (\sum_{I \in {\mathcal I}_{j,q}} |\hbox{Avg}_{k(k+1),N^s} f_{I}|^2)^{1/2}\|_{L^p(w_B)}

\displaystyle \ll N^{o(1)} \| \hbox{geom} (\sum_{J \in {\mathcal I}_{j,s/k}} |\hbox{Avg}_{k(k+1),N^s} f_{J}|^2)^{1/2}\|_{L^p(w_B)}

for balls {B} of radius {N^n}. It will suffice to show the pointwise estimate

\displaystyle \hbox{geom} (\sum_{I \in {\mathcal I}_{j,q}} |\hbox{Avg}_{k(k+1),N^s} f_{I}(x_0)|^2)^{1/2}

\displaystyle \ll N^{o(1)} \hbox{geom} (\sum_{J \in {\mathcal I}_{j,s/k}} |\hbox{Avg}_{k(k+1),N^s} f_{J}(x_0)|^2)^{1/2}

for any {x_0 \in {\bf R}^n}, or equivalently that

\displaystyle \hbox{geom} (\sum_{I \in {\mathcal I}_{j,q}} \| f_I \|_{L^{k(k+1)}(w_{B'})}^2)^{1/2}

\displaystyle \ll N^{o(1)} \hbox{geom} (\sum_{J \in {\mathcal I}_{j,s/k}} \| f_J \|_{L^{k(k+1)}(w_{B'})}^2)^{1/2}

where {B' := B(x_0,N^s)}. Clearly this will follow if we have

\displaystyle (\sum_{I \in {\mathcal I}_{j,q}} \| f_I \|_{L^{k(k+1)}(w_{B'})}^2)^{1/2} \ll N^{o(1)} (\sum_{J \in {\mathcal I}_{j,s/k}} \| f_J \|_{L^{k(k+1)}(w_{B'})}^2)^{1/2}

for each {j}. Covering the intervals in {{\mathcal I}_{j,s/k}} by those in {{\mathcal I}_{j,q}}, it suffices to show that

\displaystyle \| f_I \|_{L^{k(k+1)}(w_{B'})} \ll N^{o(1)} (\sum_{J: J \subset I} \| f_J \|_{L^{k(k+1)}(w_{B'})}^2)^{1/2}

for each {I \in {\mathcal I}_{j,q}}. But this follows from Proposition 3.

— 4. Multilinear Kakeya —

Finally, we establish (vi), which is the most substantial component of Proposition 7, and the only component which truly takes advantage of the reduction to the multilinear setting. Let {1 \leq k \leq n-1} and {q} be such that {(k+1)q \leq n}. From (5), it suffices to show that

\displaystyle \| \hbox{geom} (\sum_{I \in {\mathcal I}_{j,q}} |\hbox{Avg}_{kp/n, N^{kq}} f_{I}|^2)^{1/2}\|_{L^p(w_B)}

\displaystyle \ll N^{o(1)} \| \hbox{geom} (\sum_{I \in {\mathcal I}_{j,q}} |\hbox{Avg}_{kp/n, N^{(k+1)q}} f_{I}|^2)^{1/2}\|_{L^p(w_B)}

for balls {B} of radius {N^n}. By averaging, it suffices to establish the bound

\displaystyle \| \hbox{geom} (\sum_{I \in {\mathcal I}_{j,q}} |\hbox{Avg}_{kp/n, N^{kq}} f_{I}|^2)^{1/2}\|_{L^p(w_{B'})}

\displaystyle \ll N^{o(1)} \| \hbox{geom} (\sum_{I \in {\mathcal I}_{j,q}} |\hbox{Avg}_{kp/n, N^{(k+1)q}} f_{I}|^2)^{1/2}\|_{L^p(w_{B'})}

for balls {B'} of radius {N^{(k+1)q}}. If we write {F_I := (\hbox{Avg}_{kp/n, N^{kq}} f_{I})^{kp/n}}, the right-hand side simplifies to

\displaystyle N^{o(1)} |B'|^{1/p} \hbox{geom} (\sum_{I \in {\mathcal I}_{j,q}} (\frac{1}{|B'|} \int F_I w_{B'})^{2n/kp})^{1/2}

so it suffices to show that

\displaystyle |B'|^{-1/p} \| \hbox{geom} (\sum_{I \in {\mathcal I}_{j,q}} F_I^{2n/kp})^{1/2}\|_{L^p(w_{B'})}

\displaystyle \ll N^{o(1)} \hbox{geom} (\sum_{I \in {\mathcal I}_{j,q}} (\frac{1}{|B'|} \int F_I w_{B'})^{2n/kp})^{1/2}.

At this point it is convenient to perform a dyadic pigeonholing (giving up a factor of {N^{o(1)}}) to normalise, for each {j}, all of the quantities {\frac{1}{|B'|} \int F_I w_{B'}} to be of comparable size, after reducing the sets {{\mathcal I}_{j,q}} to some appropriate subset {{\mathcal I}'_{j,q}}. (The contribution of those {I} for which this quantity is less than, say, {N^{-100n^2}} of the maximal value, can be safely discarded by trivial estimates.) By homogeneity we may then normalise

\displaystyle \frac{1}{|B'|} \int F_I w_{B'} \sim 1

for all surviving {I}, so the estimate now becomes

\displaystyle |B'|^{-1/p} \| \hbox{geom} (\sum_{I \in {\mathcal I}'_{j,q}} F_I^{2n/kp})^{1/2}\|_{L^p(w_{B'})} \ll N^{o(1)} \hbox{geom} (\# {\mathcal I}'_{j,q})^{1/2}.

Since {p} is close to {n(n+1)}, {2n/kp} is less than {1}, so we can estimate

\displaystyle (\sum_{I \in {\mathcal I}'_{j,q}} F_I^{2n/kp})^{1/2} \leq (\# {\mathcal I}'_{j,q})^{1/2 - n/kp} (\sum_{I \in {\mathcal I}'_{j,q}} F_I)^{n/kp}

and so it suffices to show that

\displaystyle |B'|^{-1/p} \| \hbox{geom} (\sum_{I \in {\mathcal I}'_{j,q}} F_I)^{n/kp}\|_{L^p(w_{B'})} \ll N^{o(1)} \hbox{geom} (\# {\mathcal I}'_{j,q})^{n/kp},

or, on raising to the power {pk/n},

\displaystyle |B'|^{-k/n} \| \hbox{geom} \sum_{I \in {\mathcal I}'_{j,q}} F_I\|_{L^{n/k}(w_{B'})} \ll N^{o(1)} \hbox{geom} (\# {\mathcal I}'_{j,q}).
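The Hölder step used above, namely {(\sum_I F_I^{2n/kp})^{1/2} \leq (\# {\mathcal I})^{1/2 - n/kp} (\sum_I F_I)^{n/kp}} when {2n/kp \leq 1}, is easy to test numerically. The sketch below does this for random non-negative data; the function name and the sample parameters are illustrative assumptions.

```python
import random

def counting_holder_sides(F, n, k, p):
    """Both sides of (sum F^a)^(1/2) <= count^(1/2 - a/2) * (sum F)^(a/2), a = 2n/(kp)."""
    a = 2 * n / (k * p)          # assumed to lie in (0, 1]
    count = len(F)
    lhs = sum(x ** a for x in F) ** 0.5
    rhs = count ** (0.5 - n / (k * p)) * sum(F) ** (n / (k * p))
    return lhs, rhs

random.seed(1)
n, k, p = 3, 1, 11.5             # sample values with p a little below n(n+1) = 12
F = [random.uniform(0.0, 5.0) for _ in range(40)]
print(counting_holder_sides(F, n, k, p))
print(counting_holder_sides([2.0] * 40, n, k, p))   # equal entries: the two sides agree
```

Equality holds when all the {F_I} are equal, which is why the preceding pigeonholing to comparable sizes loses essentially nothing at this step.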

Localising to balls {B'} of radius {N^{(k+1)q}}, it suffices to show that

\displaystyle |B'|^{-k/n} \| \hbox{geom} \sum_{I \in {\mathcal I}'_{j,q}} F_I\|_{L^{n/k}(B')} \ll N^{o(1)} \hbox{geom} \sum_{I \in {\mathcal I}'_{j,q}} |B'|^{-1} \| F_I \|_{L^1(B')}.

The arc {\gamma(I)} is contained in a box of dimensions roughly {N^{-q} \times N^{-2q} \times \dots \times N^{-nq}}, so by the uncertainty principle {f_I} is essentially constant along boxes of dimensions {N^q \times N^{2q} \times \dots \times N^{nq}} (this can be made precise by standard methods, see e.g. the discussion in the proof of Theorem 5.6 of Bourgain-Demeter-Guth, or my general discussion on the uncertainty principle in this previous blog post). This implies that {F_I}, when restricted to {B'}, is essentially constant on “plates”, defined as the intersection of {B'} with slabs that have {k} dimensions of length {N^{kq}} and the remaining {n-k} dimensions infinite (and thus restricted to be of length about {N^{(k+1)q}} after restriction to {B'}). Furthermore, as {j} varies (and {I} is constrained to be in {{\mathcal I}'_{j,q}}), the orientation of these slabs varies in a suitably “transverse” fashion (the precise definition of this is a little technical, but can be verified for {M=n!}; see the BDG paper for details). After rescaling, the claim then follows from the following proposition:

Proposition 8 (Multilinear Kakeya) For {j=1,\dots,M}, let {{\mathcal P}_j} be a collection of “plates” that have {k} dimensions of length {1}, and {n-k} dimensions that are infinite, and for each {P \in {\mathcal P}_j} let {c_P} be a non-negative number. Assume that the families of plates {{\mathcal P}_j} obey a suitable transversality condition. Then

\displaystyle \| \hbox{geom} \sum_{P \in {\mathcal P}_j} c_P 1_P \|_{L^{n/k}(B)} \ll N^{o(1)} \hbox{geom} \sum_{P \in {\mathcal P}_j} c_P

for any ball {B} of radius {N}.

The exponent {n/k} here is natural, as can be seen by considering the example where each {{\mathcal P}_j} consists of about {N^k} parallel disjoint plates passing through {B}, with {c_P=1} for all such plates.
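This example can be reproduced on a grid. The sketch below is a toy model under simplifying assumptions of my own: the ball {B} is replaced by the cube {[0,N)^n} cut into unit cells, each family consists of all {N^k} axis-parallel plates with {c_P = 1}, and transversality is modelled only by giving different families different sets of thin axes.

```python
import itertools
import math

def kakeya_sides(N, n, k, families):
    """families[j] lists the k axes along which the plates of family j have unit length.
    Returns both sides of the multilinear Kakeya inequality for this toy configuration."""
    M = len(families)
    r = n / k                                   # the Lebesgue exponent n/k
    integral = 0.0
    for cell in itertools.product(range(N), repeat=n):
        counts = []
        for thin_axes in families:
            plates = itertools.product(range(N), repeat=k)   # N**k plates, c_P = 1
            here = tuple(cell[a] for a in thin_axes)
            counts.append(sum(1 for plate in plates if plate == here))
        integral += math.prod(counts) ** (r / M)              # (geom of the sums)^r
    lhs = integral ** (1 / r)
    rhs = math.prod([N ** k] * M) ** (1 / M)                  # geom of sum of c_P
    return lhs, rhs

print(kakeya_sides(8, 2, 1, [(0,), (1,)]))
print(kakeya_sides(4, 3, 2, [(0, 1), (1, 2), (0, 2)]))
```

In this configuration every cell is covered by exactly one plate from each family, so the left-hand side is {|B|^{k/n} = N^k}, matching the right-hand side; any Lebesgue exponent other than {n/k} would break this balance.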

For {k = n-1} (where the plates now become tubes), this result was first obtained by Bennett, Carbery, and myself using heat kernel methods, with a rather different proof (also capturing the endpoint case) later given using algebraic topological methods by Guth (as discussed in this previous post). More recently, a very short and elementary proof of this theorem was given by Guth, which was initially given for {k=n-1} but extends to general {k}. The scheme of the proof can be described as follows.

  • When all the plates {P} in each family {{\mathcal P}_j} are parallel, the claim follows from the Loomis-Whitney inequality (when {k=n-1}) or a more general Brascamp-Lieb inequality of Bennett, Carbery, Christ, and myself (for general {k}). These inequalities can be proven by repeated applications of the Hölder inequality and Fubini’s theorem.
  • Perturbing this, we can obtain the proposition with a loss of {N^\varepsilon} for any {N>0} and {\varepsilon>0}, provided that the plates in each {{\mathcal P}_j} are within {\delta} of being parallel, and {\delta} is sufficiently small depending on {N} and {\varepsilon}. (For the case of general {k}, this requires some uniformity in the result of Bennett, Carbery, Christ, and myself, which can be obtained by hand in the specific case of interest here, but was recently established in general by Bennett, Bez, Flock, and Lee.)
  • A standard “induction on scales” argument shows that if the proposition is true at scale {N} with some loss {K(N)}, then it is also true at scale {N^2} with loss {O( K(N)^2 )}. Iterating this, we see that we can obtain the proposition with a loss of {O_\varepsilon(N^\varepsilon)} uniformly for all {N>0}, provided that the plates are within {\delta} of being parallel and {\delta} is sufficiently small depending now only on {\varepsilon} (and not on {N}).
  • A finite partition of unity then suffices to remove the restriction of the plates being within {\delta} of each other, and then sending {\varepsilon} to zero we obtain the claim.
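The induction on scales step in this scheme can be illustrated by iterating the worst case {K(N^2) = C K(N)^2} in logarithms. In the sketch below the constant `C`, the base scale `N0` and the function name are hypothetical; `N0` is chosen so that {\log C / \log N_0 = \varepsilon}, which is one way to see why the final choice of {\delta} can depend on {\varepsilon} alone.

```python
import math

def loss_exponent(N0, eps, C, squarings):
    """Start from loss K(N0) = N0**eps and iterate K(N^2) = C*K(N)^2 in logarithms.
    Returns log K(N) / log N at the scale N = N0**(2**squarings)."""
    log_K, log_N = eps * math.log(N0), math.log(N0)
    for _ in range(squarings):
        log_K = math.log(C) + 2 * log_K
        log_N *= 2
    return log_K / log_N

C, eps = 50.0, 0.01                      # hypothetical constant and target loss
N0 = math.exp(math.log(C) / eps)         # base scale with log C / log N0 = eps
for m in (0, 1, 5, 20):
    print(m, loss_exponent(N0, eps, C, m))
```

In this model the exponent after {m} squarings is {\varepsilon + (1 - 2^{-m}) \log C / \log N_0}, which stays below {2\varepsilon} at every scale {N_0^{2^m}}.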

The proof of the decoupling theorem (and thus the Vinogradov main conjecture) is now complete.

Remark 9 The above arguments extend to give decoupling for the curve {\gamma([0,1])} in {L^p} for every {n^2 \leq p \leq n(n+1)}. As it turns out (Bourgain, private communication), a variant of the argument also handles the range {n(n-1) \leq p \leq n^2}, and the range {2 \leq p \leq n(n-1)} can be covered from an induction on dimension (using the argument used to establish Proposition 3).

Filed under: expository, math.CA, math.NT Tagged: Ciprian Demeter, decoupling, induction on scales, Jean Bourgain, Larry Guth, multilinear Kakeya conjecture, restriction theorems, Vinogradov main conjecture

January 26, 2016

Scott AaronsonMarvin Minsky

Yesterday brought the sad news that Marvin Minsky passed away at age 88.  I never met Minsky (I wish I had); I just had one email exchange with him back in 2002, about Stephen Wolfram’s book.  But Minsky was my academic great-grandfather (through Manuel Blum and Umesh Vazirani), and he influenced me in many other ways.  For example, in his and Papert’s 1969 book Perceptrons—notorious for “killing neural net research for a decade,” because of its mis- or over-interpreted theorems about the representational limitations of single-layer neural nets—the way Minsky and Papert proved those theorems was by translating questions about computation into questions about the existence or nonexistence of low-degree polynomials with various properties, and then answering the latter questions using MATH.  Their “polynomial method” is now a mainstay of quantum algorithms research (having been brought to the subject by Beals et al.), and in particular, has been a mainstay of my own career.  Hardly Minsky’s best-known contribution to human knowledge, but that even such a relatively minor part of his oeuvre could have legs half a century later is a testament to his impact.

I’m sure readers will have other thoughts to share about Minsky, so please do so in the comments section.  Personal reminiscences are especially welcome.