# Planet Musings

## July 22, 2017

### Scott Aaronson — Is “information is physical” contentful?

“Information is physical.”

This slogan seems to have originated around 1991 with Rolf Landauer.  It’s ricocheted around quantum information for the entire time I’ve been in the field, incanted in funding agency reports and popular articles and at the beginnings and ends of talks.

But what the hell does it mean?

There are many things it’s taken to mean, in my experience, that don’t make a lot of sense when you think about them—or else they’re vacuously true, or purely a matter of perspective, or not faithful readings of the slogan’s words.

For example, some people seem to use the slogan to mean something more like its converse: “physics is informational.”  That is, the laws of physics are ultimately not about mass or energy or pressure, but about bits and computations on them.  As I’ve often said, my problem with that view is less its audacity than its timidity!  It’s like, what would the universe have to do in order not to be informational in this sense?  “Information” is just a name we give to whatever picks out one element from a set of possibilities, with the “amount” of information given by the log of the set’s cardinality (and with suitable generalizations to infinite sets, nonuniform probability distributions, yadda yadda).  So, as long as the laws of physics take the form of telling us that some observations or configurations of the world are possible and others are not, or of giving us probabilities for each configuration, no duh they’re about information!
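The log-of-cardinality definition can be made concrete in a few lines of Python (a minimal sketch; the function names are mine, chosen for illustration):

```python
import math

# "Information" as whatever picks one element from a set of possibilities:
# choosing among N equally likely options conveys log2(N) bits.
def info_bits(num_possibilities):
    return math.log2(num_possibilities)

print(info_bits(2))     # 1.0  (one fair coin flip)
print(info_bits(256))   # 8.0  (one byte)

# The nonuniform generalization is Shannon entropy,
# H = -sum p_i log2 p_i, which reduces to log2(N) when every p_i = 1/N.
def shannon_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_bits([0.5, 0.5]))   # 1.0
print(shannon_bits([0.9, 0.1]))   # ~0.469: a biased coin carries less
```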

Other people use “information is physical” to pour scorn on the idea that “information” could mean anything without some actual physical instantiation of the abstract 0’s and 1’s, such as voltage differences in a loop of wire.  Here I certainly agree with the tautology that in order to exist physically—that is, be embodied in the physical world—a piece of information (like a song, video, or computer program) does need to be embodied in the physical world.  But my inner Platonist slumps in his armchair when people go on to assert that, for example, it’s meaningless to discuss the first prime number larger than 10^(10^125), because according to post-1998 cosmology, one couldn’t fit its digits inside the observable universe.

If the cosmologists revise their models next week, will this prime suddenly burst into existence, with all the mathematical properties that one could’ve predicted for it on general grounds—only to fade back into the netherworld if the cosmologists revise their models again?  Why would anyone want to use language in such a tortured way?

Yes, brains, computers, yellow books, and so on that encode mathematical knowledge comprise only a tiny sliver of the physical world.  But it’s equally true that the physical world we observe comprises only a tiny sliver of mathematical possibility-space.

Still other people use “information is physical” simply to express their enthusiasm for the modern merger of physical and information sciences, as exemplified by quantum computing.  Far be it from me to temper that enthusiasm: rock on, dudes!

Yet others use “information is physical” to mean that the rules governing information processing and transmission in the physical world aren’t knowable a priori, but can only be learned from physics.  This is clearest in the case of quantum information, which has its own internal logic that generalizes the logic of classical information.  But in some sense, we didn’t need quantum mechanics to tell us this!  Of course the laws of physics have ultimate jurisdiction over whatever occurs in the physical world, information processing included.

My biggest beef, with all these unpackings of the “information is physical” slogan, is that none of them really engage with any of the deep truths that we’ve learned about physics.  That is, we could’ve had more-or-less the same debates about any of them, even in a hypothetical world where the laws of physics were completely different.

So then what should we mean by “information is physical”?  In the rest of this post, I’d like to propose an answer to that question.

We get closer to the meat of the slogan if we consider some actual physical phenomena, say in quantum mechanics.  The double-slit experiment will do fine.

Recall: you shoot photons, one by one, at a screen with two slits, then examine the probability distribution over where the photons end up on a second screen.  You ask: does that distribution contain alternating “light” and “dark” regions, the signature of interference between positive and negative amplitudes?  And the answer, predicted by the math and confirmed by experiment, is: yes, but only if the information about which slit the photon went through failed to get recorded anywhere else in the universe, other than the photon location itself.
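A toy calculation makes the “recorded anywhere else” criterion concrete. In the sketch below (a two-amplitude caricature, not a real optics simulation), amplitudes add before squaring when no which-path record exists, and probabilities add when one does:

```python
import cmath, math

def prob_no_record(a1, a2):
    # No which-path record anywhere in the universe:
    # amplitudes add first, THEN we take the squared magnitude.
    return abs(a1 + a2) ** 2

def prob_with_record(a1, a2):
    # Which-path information recorded somewhere (even "divinely"):
    # the two paths can no longer interfere, so probabilities add.
    return abs(a1) ** 2 + abs(a2) ** 2

# A point on the screen where the two paths arrive exactly out of phase:
a1 = cmath.exp(1j * 0.0) / math.sqrt(2)
a2 = cmath.exp(1j * math.pi) / math.sqrt(2)

print(prob_no_record(a1, a2))    # ~0.0: a dark fringe
print(prob_with_record(a1, a2))  # 1.0: the fringe is washed out
```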

Here a skeptic interjects: but that has to be wrong!  The criterion for where a physical particle lands on a physical screen can’t possibly depend on anything as airy as whether “information” got “recorded” or not.  For what counts as “information,” anyway?  As an extreme example: what if God, unbeknownst to us mortals, took divine note of which slit the photon went through?  Would that destroy the interference pattern?  If so, then every time we do the experiment, are we collecting data about the existence or nonexistence of an all-knowing God?

It seems to me that the answer is: insofar as the mind of God can be modeled as a tensor factor in Hilbert space, yes, we are.  And crucially, if quantum mechanics is universally true, then the mind of God would have to be such a tensor factor, in order for its state to play any role in the prediction of observed phenomena.

To say this another way: it’s obvious and unexceptionable that, by observing a physical system, you can often learn something about what information must be in it.  For example, you need never have heard of DNA to deduce that chickens must somehow contain information about making more chickens.  What’s much more surprising is that, in quantum mechanics, you can often deduce things about what information can’t be present, anywhere in the physical world—because if such information existed, even a billion light-years away, it would necessarily have a physical effect that you don’t see.

Another famous example here concerns identical particles.  You may have heard the slogan that “if you’ve seen one electron, you’ve seen them all”: that is, apart from position, momentum, and spin, every two electrons have exactly the same mass, same charge, same every other property, including even any properties yet to be discovered.  Again the skeptic interjects: but that has to be wrong.  Logically, you could only ever confirm that two electrons were different, by observing a difference in their behavior.  Even if the electrons had behaved identically for a billion years, you couldn’t rule out the possibility that they were actually different, for example because of tiny nametags (“Hi, I’m Emily the Electron!” “Hi, I’m Ernie!”) that had no effect on any experiment you’d thought to perform, but were visible to God.

You can probably guess where this is going.  Quantum mechanics says that, no, you can verify that two particles are perfectly identical by doing an experiment where you swap them and see what happens.  If the particles are identical in all respects, then you’ll see quantum interference between the swapped and un-swapped states.  If they aren’t, you won’t.  The kind of interference you’ll see is different for fermions (like electrons) than for bosons (like photons), but the basic principle is the same in both cases.  Once again, quantum mechanics lets you verify that a specific type of information—in this case, information that distinguishes one particle from another—was not present anywhere in the physical world, because if it were, it would’ve destroyed an interference effect that you in fact saw.
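One concrete version of this swap experiment is the Hong–Ou–Mandel setup: two particles entering a 50/50 beamsplitter. The amplitude bookkeeping is standard, though the variable names below are mine:

```python
import math

# 50/50 beamsplitter amplitudes t[(in, out)], input modes a,b -> outputs c,d.
s = 1 / math.sqrt(2)
t = {('a', 'c'): s, ('a', 'd'): s,
     ('b', 'c'): s, ('b', 'd'): -s}

# Two ways for the particles to exit in different ports (c and d):
direct  = t[('a', 'c')] * t[('b', 'd')]   # a -> c, b -> d
swapped = t[('a', 'd')] * t[('b', 'c')]   # a -> d, b -> c  (the swap)

# Identical particles: the swapped amplitude interferes with the direct
# one, with a sign fixed by the particles' statistics.
p_boson       = abs(direct + swapped) ** 2            # 0.0: bosons bunch
p_fermion     = abs(direct - swapped) ** 2            # 1.0: fermions antibunch
# Distinguishable particles (secret nametags): no interference term at all.
p_distinguish = abs(direct) ** 2 + abs(swapped) ** 2  # 0.5

print(p_boson, p_fermion, p_distinguish)
```

Seeing the coincidence rate land at 0 (or 1) rather than 1/2 is exactly the verification that no distinguishing information exists anywhere.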

This, I think, already provides a meatier sense in which “information is physical” than any of the senses discussed previously.

But we haven’t gotten to the filet mignon yet.  The late, great Jacob Bekenstein will forever be associated with the discovery that information, wherever and whenever it occurs in the physical world, takes up a minimum amount of space.  The most precise form of this statement, called the covariant entropy bound, was worked out in detail by Raphael Bousso.  Here I’ll be discussing a looser version of the bound, which holds in “non-pathological” cases, and which states that a bounded physical system can store at most A/(4 ln 2) bits of information, where A is the area in Planck units of any surface that encloses the system—so, about 10^69 bits per square meter.  (Actually it’s 10^69 qubits per square meter, but because of Holevo’s theorem, an upper bound on the number of qubits is also an upper bound on the number of classical bits that can be reliably stored in a system and then retrieved later.)
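The ~10^69 figure is easy to check numerically: express the enclosing area in Planck areas and divide by 4 ln 2 (a back-of-envelope sketch, constants rounded):

```python
import math

# Physical constants (SI, rounded):
G    = 6.674e-11   # gravitational constant, m^3 kg^-1 s^-2
hbar = 1.055e-34   # reduced Planck constant, J s
c    = 2.998e8     # speed of light, m/s

planck_length = math.sqrt(hbar * G / c**3)   # ~1.6e-35 m
planck_area   = planck_length ** 2           # ~2.6e-70 m^2

# Holographic bound: A / (4 ln 2) bits, with A in Planck areas.
bits_per_m2 = 1.0 / (4 * math.log(2) * planck_area)
print(f"{bits_per_m2:.2e}")   # ~1.4e+69 bits per square meter
```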

You might have heard of the famous way Nature enforces this bound.  Namely, if you tried to create a hard drive that stored more than 10^69 bits per square meter of surface area, the hard drive would necessarily collapse to a black hole.  And from that point on, the information storage capacity would scale “only” with the area of the black hole’s event horizon—a black hole itself being the densest possible hard drive allowed by physics.

Let’s hear once more from our skeptic.  “Nonsense!  Matter can take up space.  Energy can take up space.  But information?  Bah!  That’s just a category mistake.  For a proof, suppose God took one of your black holes, with a 1-square-meter event horizon, which already had its supposed maximum of ~10^69 bits of information.  And suppose She then created a bunch of new fundamental fields, which didn’t interact with gravity, electromagnetism, or any of the other fields that we know from observation, but which had the effect of encoding 10^300 new bits in the region of the black hole.  Presto!  An unlimited amount of additional information, exactly where Bekenstein said it couldn’t exist.”

We’d like to pinpoint what’s wrong with the skeptic’s argument—and do so in a self-contained, non-question-begging way, a way that doesn’t pull any rabbits out of hats, other than the general principles of relativity and quantum mechanics.  I was confused myself about how to do this, until a month ago, when Daniel Harlow helped set me straight (any remaining howlers in my exposition are 100% mine, not his).

I believe the logic goes like this:

1. Relativity—even just Galilean relativity—demands that, in flat space, the laws of physics must have the same form for all inertial observers (i.e., all observers who move through space at constant speed).
2. Anything in the physical world that varies in space—say, a field that encodes different bits of information at different locations—also varies in time, from the perspective of an observer who moves through the field at a constant speed.
3. Combining 1 and 2, we conclude that anything that can vary in space can also vary in time.  Or to say it better, there’s only one kind of varying: varying in spacetime.
4. More strongly, special relativity tells us that there’s a specific numerical conversion factor between units of space and units of time: namely the speed of light, c.  Loosely speaking, this means that if we know the rate at which a field varies across space, we can also calculate the rate at which it varies across time, and vice versa.
5. Anything that varies across time carries energy.  Why?  Because this is essentially the definition of energy in quantum mechanics!  Up to a constant multiple (namely, Planck’s constant), energy is the expected speed of rotation of the global phase of the wavefunction, when you apply your Hamiltonian.  If the global phase rotates at the slowest possible speed, then we take the energy to be zero, and say you’re in a vacuum state.  If it rotates at the next highest speed, we say you’re in a first excited state, and so on.  Indeed, assuming a time-independent Hamiltonian, the evolution of any quantum system can be fully described by simply decomposing the wavefunction into a superposition of energy eigenstates, then tracking the phase of each eigenstate’s amplitude as it loops around and around the unit circle.  No energy means no looping around means nothing ever changes.
6. Combining 3 and 5, any field that varies across space carries energy.
7. More strongly, combining 4 and 5, if we know how quickly a field varies across space, we can lower-bound how much energy it has to contain.
8. In general relativity, anything that carries energy couples to the gravitational field.  This means that anything that carries energy necessarily has an observable effect: if nothing else, its effect on the warping of spacetime.  (This is dramatically illustrated by dark matter, which is currently observable via its spacetime warping effect and nothing else.)
9. Combining 6 and 8, any field that varies across space couples to the gravitational field.
10. More strongly, combining 7 and 8, if we know how quickly a field varies across space, then we can lower-bound by how much it has to warp spacetime.  This is so because of another famous (and distinctive) feature of gravity: namely, the fact that it’s universally attractive, so all the warping contributions add up.
11. But in GR, spacetime can only be warped by so much before we create a black hole: this is the famous Schwarzschild bound.
12. Combining 10 and 11, the information contained in a physical field can only vary so quickly across space, before it causes spacetime to collapse to a black hole.

Summarizing where we’ve gotten, we could say: any information that’s spatially localized at all can only be localized so precisely.  In our world, the more densely you try to pack 1’s and 0’s, the more energy you need, therefore the more you warp spacetime, until all you’ve gotten for your trouble is a black hole.  Furthermore, if we rewrote the above conceptual argument in math—keeping track of all the G’s, c’s, h’s, and so on—we could derive a quantitative bound on how much information there can be in a bounded region of space.  And if we were careful enough, that bound would be precisely the holographic entropy bound, which says that the number of (qu)bits is at most A/(4 ln 2), where A is the area of a bounding surface in Planck units.

Let’s pause to point out some interesting features of this argument.

Firstly, we pretty much needed the whole kitchen sink of basic physical principles: special relativity (both the equivalence of inertial frames and the finiteness of the speed of light), quantum mechanics (in the form of the universal relation between energy and frequency), and finally general relativity and gravity.  All three of the fundamental constants G, c, and h made appearances, which is why all three show up in the detailed statement of the holographic bound.

But secondly, gravity only appeared from step 8 onwards.  Up till then, everything could be said solely in the language of quantum field theory: that is, quantum mechanics plus special relativity.  The result would be the so-called Bekenstein bound, which upper-bounds the number of bits in any spatial region by the product of the region’s radius and its energy content.  I learned that there’s an interesting history here: Bekenstein originally deduced this bound using ingenious thought experiments involving black holes.  Only later did people realize that the Bekenstein bound can be derived purely within QFT (see here and here for example)—in contrast to the holographic bound, which really is a statement about quantum gravity.  (An early hint of this was that, while the holographic bound involves Newton’s gravitational constant G, the Bekenstein bound doesn’t.)
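For reference, the two bounds can be written out explicitly; these are their standard forms, stated here in bits, with E the energy content, R the radius of the region, and A the bounding area:

```latex
% Bekenstein bound: radius times energy, and no G anywhere.
S \;\le\; \frac{2\pi E R}{\hbar c \ln 2} \quad \text{bits}

% Holographic bound: G enters through the Planck area l_P^2 = G\hbar/c^3.
S \;\le\; \frac{A}{4\, l_P^2 \ln 2} \;=\; \frac{A c^3}{4 G \hbar \ln 2} \quad \text{bits}
```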

Thirdly, speaking of QFT, some readers might be struck by the fact that at no point in our 12-step program did we ever seem to need QFT machinery.  Which is fortunate, because if we had needed it, I wouldn’t have been able to explain any of this!  But here I have to confess that I cheated slightly.  Recall step 4, which said that “if you know the rate at which a field varies across space, you can calculate the rate at which it varies across time.”  It turns out that, in order to give that sentence a definite meaning, one uses the fact that in QFT, space and time derivatives in the Hamiltonian need to be related by a factor of c, since otherwise the Hamiltonian wouldn’t be Lorentz-invariant.

Fourthly, eagle-eyed readers might notice a loophole in the argument.  Namely, we never upper-bounded how much information God could add to the world, via fields that are constant across all of spacetime.  For example, there’s nothing to stop Her from creating a new scalar field that takes the same value everywhere in the universe—with that value, in suitable units, encoding 10^50000 separate divine thoughts in its binary expansion.  But OK, being constant, such a field would interact with nothing and affect no observations—so Occam’s Razor itches to slice it off, by rewriting the laws of physics in a simpler form where that field is absent.  If you like, such a field would at most be a comment in the source code of the universe: it could be as long as the Great Programmer wanted it to be, but would have no observable effect on those of us living inside the program’s execution.

Of course, even before relativity and quantum mechanics, information had already been playing a surprisingly fleshy role in physics, through its appearance as entropy in 19th-century thermodynamics.  Which leads to another puzzle.  To a computer scientist, the concept of entropy, as the log of the number of microstates compatible with a given macrostate, seems clear enough, as does the intuition for why it should increase monotonically with time.  Or at least, to whatever extent we’re confused about these matters, we’re no more confused than the physicists are!

But then why should this information-theoretic concept be so closely connected to tangible quantities like temperature, and pressure, and energy?  From the mere assumption that a black hole has a nonzero entropy—that is, that it takes many bits to describe—how could Bekenstein and Hawking have possibly deduced that it also has a nonzero temperature?  Or: if you put your finger into a tub of hot water, does the heat that you feel somehow reflect how many bits are needed to describe the water’s microstate?

Once again our skeptic pipes up: “but surely God could stuff as many additional bits as She wanted into the microstate of the hot water—for example, in degrees of freedom that are still unknown to physics—without the new bits having any effect on the water’s temperature.”

But we should’ve learned by now to doubt this sort of argument.  There’s no general principle, in our universe, saying that you can hide as many bits as you want in a physical object, without those bits influencing the object’s observable properties.  On the contrary, in case after case, our laws of physics seem to be intolerant of “wallflower bits,” which hide in a corner without talking to anyone.  If a bit is there, the laws of physics want it to affect other nearby bits and be affected by them in turn.

In the case of thermodynamics, the assumption that does all the real work here is that of equidistribution.  That is, whatever degrees of freedom might be available to your thermal system, your gas in a box or whatever, we assume that they’re all already “as randomized as they could possibly be,” subject to a few observed properties like temperature and volume and pressure.  (At least, we assume that in classical thermodynamics.  Non-equilibrium thermodynamics is a whole different can of worms, worms that don’t stay in equilibrium.)  Crucially, we assume this despite the fact that we might not even know all the relevant degrees of freedom.

Why is this assumption justified?  “Because experiment bears it out,” the physics teacher explains—but we can do better.  The assumption is justified because, as long as the degrees of freedom that we’re talking about all interact with each other, they’ve already had plenty of time to equilibrate.  And conversely, if a degree of freedom doesn’t interact with the stuff we’re observing—or with anything that interacts with the stuff we’re observing, etc.—well then, who cares about it anyway?

But now, because the microscopic laws of physics have the fundamental property of reversibility—that is, they never destroy information—a new bit has to go somewhere, and it can’t overwrite degrees of freedom that are already fully randomized.  So this is essentially why, if you pump more bits of information into a tub of hot water, while keeping it at the same volume, physicists are confident that the new bits will have nowhere to go except into pushing up the energy—which, in this case, means describing a greater range of possible speeds for the water molecules.  So then the molecules move faster on average, and your finger feels the water get hotter.
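One way to put a number on “pushing up the energy”: by Landauer’s bound, which is the flip side of this reversibility argument, each extra bit stored in a system held at temperature T costs at least kT ln 2 of energy. A rough sketch, with rounded constants:

```python
import math

k = 1.381e-23   # Boltzmann constant, J/K
T = 330.0       # hot bathwater, roughly 57 C, in kelvin

# Minimum energy to add one bit at this temperature (Landauer's bound):
energy_per_bit = k * T * math.log(2)
print(f"{energy_per_bit:.2e} J per bit")   # ~3.2e-21 J

# Equivalently, each joule of heating accounts for at most this many bits:
print(f"{1.0 / energy_per_bit:.2e} bits per joule")
```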

In summary, our laws of physics are structured in such a way that even pure information often has “nowhere to hide”: if the bits are there at all in the abstract machinery of the world, then they’re forced to pipe up and have a measurable effect.  And this is not a tautology, but comes about only because of nontrivial facts about special and general relativity, quantum mechanics, quantum field theory, and thermodynamics.  And this is what I think people should mean when they say “information is physical.”

Anyway, if this was all obvious to you, I apologize for having wasted your time!  But in my defense, it was never explained to me quite this way, nor was it sorted out in my head until recently—even though it seems like one of the most basic and general things one can possibly say about physics.

Endnotes. Thanks again to Daniel Harlow, not only for explaining the logic of the holographic bound to me but for several suggestions that improved this post.

Some readers might suspect circularity in the arguments we’ve made: are we merely saying that “any information that has observable physical consequences, has observable physical consequences”?  No, it’s more than that.  In all the examples I discussed, the magic was that we inserted certain information into our abstract mathematical description of the world, taking no care to ensure that the information’s presence would have any observable consequences whatsoever.  But then the principles of quantum mechanics, quantum gravity, or thermodynamics forced the information to be detectable in very specific ways (namely, via the destruction of quantum interference, the warping of spacetime, or the generation of heat respectively).

## July 21, 2017

### Backreaction — Penrose claims LIGO noise is evidence for Cyclic Cosmology

Noise is the physicists’ biggest enemy. Unless you are a theorist whose pet idea masquerades as noise. Then you are best friends with noise. Like Roger Penrose.

> Correlated “noise” in LIGO gravitational wave signals: an implication of Conformal Cyclic Cosmology
> Roger Penrose
> arXiv:1707.04169 [gr-qc]

Roger Penrose made his name with the Penrose-Hawking theorems and twistor theory. He is also

## July 20, 2017

### John Preskill — The world of hackers and secrets

I’m Evgeny Mozgunov, and some of you may remember my earlier posts on Quantum Frontiers. I’ve recently graduated with a PhD after 6 years in the quantum information group at Caltech. As I’m navigating the job market in quantum physics, it was only a matter of time before I got dragged into a race between startups. Those who can promise impressive quantum speedups for practical tasks get a lot of money from venture capitalists. Maybe there’s something about my mind and getting paid: when I’m paid to do something, I suddenly start coming up with solutions that never occurred to me while I was wandering around as a student. And this time, I’ve noticed a possibility of impressing the public with quantum speedups that nobody has ever used before.

Three former members of John Preskill’s group, Gorjan Alagic, Stacey Jeffery and Stephen Jordan, have already proposed this idea (Circuit Obfuscation Using Braids, p.10), but none of the startups seems to have picked it up. You only need a small quantum computer. Imagine you are in the audience. I ask you to come up with a number. Don’t tell it out loud: instead, write it on a secret piece of paper, and take a little time to do a few mathematical operations based on the number. Then announce the result of those operations. Once you are done, people will automatically be split into two categories. Those with access to a small quantum computer (like the one at IBM) will be able to put on a magic hat (the computer…) and recover your number. But the rest of the audience will be left in awe, with no clue as to how this is even possible. There’s nothing they could do to guess your number based only on the result you announced, unless you’re willing to wait for a few days and they have access to the world’s supercomputing powers.

So far I’ve described the general setting of encryption – a cipher is announced, the key to the cipher is destroyed, and only those who can break the code can decipher.  For instance, if RSA encryption is used for the magic show above, indeed only people with a big quantum computer will be able to recover the secret number. To complete my story, I need to describe what the result that you announce (the cipher) looks like:

A sequence of instructions for a small quantum computer that is equivalent to a simple instruction for spitting out your number. However, the announced sequence of instructions is obfuscated, such that you can’t just read off the number from it.

You really need to feed the sequence into a quantum computer, and see what it outputs. Obfuscation is more general than encryption, but here we’re going to use it as a method of encryption.

Alagic et al. taught us how to do something called obfuscation by compiling for a quantum computer: much like when you compile a .c file in your CS class, you can’t really understand the .out file. Of course you can just execute the .out file, but not if it describes a quantum circuit, unless you have access to a quantum computer. The proposed classical compiler turns either a classical or a quantum algorithm into a hard-to-read quantum circuit that looks like braids. Unfortunately, any obfuscation by compiling scheme has the problem that whoever understands the compiler well enough will be able to actually read the .out file (or notice a pattern in braids reduced to a compact “normal” form), and guess your number without resorting to a quantum computer. Surprisingly, even though Alagic et al.’s scheme doesn’t claim any protection against this attack, it still satisfies one of the theoretical definitions of obfuscation: if two people write two different sets of instructions to perform the same operation, and then each obfuscate their own set of instructions by a restricted set of tricks, then it should be impossible to tell from the end result which program was obtained by whom.

Theoretical obfuscation can be illustrated by these video game Nier cosplayers: when they put on their wig and blindfold, they look like the same person. The character named 2B is an android, whose body is disposable, and whose mind is a set of instructions stored on a server. Other characters try to hack her mind as the story progresses.

Quantum scientists can have their own little world of hackers and secrets, organized in the following way: some researchers present their obfuscated code outputting a secret message, and other researchers become hackers trying to break it. Thanks to another result by Alagic et al, we know that hard-to-break obfuscated circuits secure against classical computers exist. But we don’t know what the obfuscator that reliably produces those worst-case instances looks like, so a bit of crowdsourcing to find it is in order. It’s a free-for-all, where all tools and tricks are allowed. In fact, even you can enter! All you need to know is a universal gate set H,T = R(π/4),CNOT and good old matrix multiplication. Come up with a product of these matrices that multiplies to a bunch of X‘s (X=HT⁴H), but such that only you know on which qubits the X are applied. This code will spit out your secret bitstring on an input of all 0’s. Publish it and wait until some hacker breaks it!
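For the curious, the identity X = HT⁴H, and a deliberately transparent toy version of such a circuit, can be checked with a few lines of NumPy. This only illustrates the gate algebra; it is not the obfuscation scheme itself:

```python
import numpy as np

# The gate set from the post: H, T = R(pi/4), and the Pauli X.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
T = np.array([[1, 0], [0, np.exp(1j * np.pi / 4)]])
X = np.array([[0, 1], [1, 0]])

# T^4 is the Pauli Z gate, and conjugating Z by H gives X:
HT4H = H @ np.linalg.matrix_power(T, 4) @ H
print(np.allclose(HT4H, X))   # True

# A toy "obfuscated" circuit hiding one bit: the extra H pairs cancel,
# so the whole product is just X in disguise.
circuit = H @ H @ HT4H @ H @ H
state = circuit @ np.array([1, 0])   # input |0>
print(np.abs(state) ** 2)            # ~[0, 1]: the hidden bit is 1
```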

Here’s mine: can anyone see what my secret bitstring is?

One can run it on a 5-qubit quantum computer in less than 1 ms. But if you try to multiply the corresponding 32×32 matrices on your laptop, it takes more than 1 ms. Quantum speedup right there. Of course I didn’t prove that there’s no better way of finding out my secret than multiplying matrices. In fact, had I used only even powers of the matrix T in the picture above, then there is a classical algorithm available in open source (Aaronson, Gottesman) that recovers the number without having to multiply large matrices.

I’m in luck: startups and venture capitalists never cared about theoretical proofs, it only has to work until it fails. I think they should give millions to me instead of D-wave. Seriously, there’s plenty of applications for practical obfuscation, besides magic shows. One can set up a social network where posts are gibberish except for those who have a quantum computer (that would be a good conspiracy theory some years from now). One can verify when a private company claims to sell a small quantum computer.

I’d like to end on a more general note: small quantum computers are already faster than classical hardware at multiplying certain kinds of matrices. This has already been proven for a restricted class of quantum computers and a task called boson sampling. If there’s a competition in matrix multiplication somewhere in the world, we can already win.

### Andrew Jaffe — Python Bug Hunting

This is a technical, nerdy post, mostly so I can find the information if I need it later, but possibly of interest to others using a Mac with the Python programming language, and also since I am looking for excuses to write more here. (See also updates below.)

It seems that there is a bug in the latest (mid-May 2017) release of Apple’s macOS Sierra 10.12.5 (ok, there are plenty of bugs, as there are in any sufficiently complex piece of software).

It first manifested itself (to me) as an error when I tried to load the jupyter notebook, a web-based graphical front end to Python (and other languages). When the command is run, it opens up a browser window. However, after updating macOS from 10.12.4 to 10.12.5, the browser didn’t open. Instead, I saw an error message:

    0:97: execution error: "http://localhost:8888/tree?token=<removed>" doesn't understand the "open location" message. (-1708)


A little googling found that other people had seen this error, too. I was able to figure out a workaround pretty quickly: this behaviour only happens when I want to use the “default” browser, which is set in the “General” tab of the “System Preferences” app on the Mac (I have it set to Apple’s own “Safari” browser, but you can use Firefox or Chrome or something else). Instead, there’s a text file you can edit to explicitly set the browser that you want jupyter to use, located at ~/.jupyter/jupyter_notebook_config.py, by including the line

    c.NotebookApp.browser = u'Safari'


(although an unrelated bug in Python means that you can’t currently use “Chrome” in this slot).

But it turns out this isn’t the real problem. I went and looked at the code in jupyter that is run here, and it uses a Python module called webbrowser. Even outside of jupyter, trying to use this module to open the default browser fails, with exactly the same error message (though I’m picking a simpler URL at http://python.org instead of the jupyter-related one above):

    >>> import webbrowser
    >>> br = webbrowser.get()
    >>> br.open("http://python.org")
    0:33: execution error: "http://python.org" doesn't understand the "open location" message. (-1708)
    False


So I reported this as an error in the Python bug-reporting system, and hoped that someone with more experience would look at it.

But it nagged at me, so I went and looked at the source code for the webbrowser module. There, it turns out that the programmers use a macOS command called “osascript” (which is a command-line interface to Apple’s macOS automation language “AppleScript”) to launch the browser, with a slightly different syntax for the default browser compared to explicitly picking one. Basically, the command is osascript -e 'open location "http://www.python.org/"'. And this fails with exactly the same error message. (The similar code osascript -e 'tell application "Safari" to open location "http://www.python.org/"' which picks a specific browser runs just fine, which is why explicitly setting “Safari” back in the jupyter file works.)

But there is another way to run the exact same AppleScript command. Open the Mac app called “Script Editor”, type open location "http://python.org" into the window, and press the “run” button. From the experience with “osascript”, I expected it to fail, but it didn’t: it ran just fine.

So the bug is very specific, and very obscure: it depends on exactly how the offending command is run, so appears to be a proper bug, and not some sort of security patch from Apple (and it certainly doesn’t appear in the 10.12.5 release notes). I have filed a bug report with Apple, but these are not publicly accessible, and are purported to be something of a black hole, with little feedback from the still-secretive Apple development team.

## July 19, 2017

### David Hogg — #GaiaSprint, day 1

Today was the first day of the 2017 Heidelberg Gaia Sprint. It was the first day of the meeting but nonetheless an impressive day of accomplishments. The day started with a pitch session in which each of the 47 participants was given one slide and 120 seconds to say who they are and what they want to do or learn at the Sprint. These pitch slides are here.

After the pitch, my projects launched well: Jessica Birky (UCSD) was able to get the new version of The Cannon created by Christina Eilers (MPIA) working and started to get some seemingly valuable spectral models out of the M-dwarf spectra in APOGEE. Lauren Anderson (Flatiron) set up and trained a data-driven (empirical) model for the extinction of red stars, based on the Gaia and 2MASS photometry.

Perhaps the most impressive accomplishment of the day is that Morgan Fouesneau (MPIA) and Hans-Walter Rix (MPIA) matched stars between Gaia TGAS and the new GPS1 catalog that puts proper motions onto all PanSTARRS stars. They find co-moving stars where the brighter is in TGAS and the fainter is in GPS1. These pairs are extremely numerous. Many are main-sequence pairs but many pair a main-sequence star in TGAS with a white dwarf in GPS1. These pairs identify white dwarfs but also potentially put cooling ages onto both stars in the pair. The white-dwarf sequence they find is beautiful. Exciting!

### n-Category Café — What is the comprehension construction?

Dominic Verity and I have just posted a paper on the arXiv entitled “The comprehension construction.” This post is meant to explain what we mean by the name.

The comprehension construction is somehow analogous to both the straightening and the unstraightening constructions introduced by Lurie in his development of the theory of quasi-categories. Most people use the term $\infty$-categories as a rough synonym for quasi-categories, but we reserve this term for something more general: the objects in any $\infty$-cosmos. There is an $\infty$-cosmos whose objects are quasi-categories and another whose objects are complete Segal spaces. But there are also more exotic $\infty$-cosmoi whose objects model $(\infty,n)$-categories or fibered $(\infty,1)$-categories, and our comprehension construction applies to any of these contexts.

The input to the comprehension construction is any cocartesian fibration between $\infty$-categories together with a third $\infty$-category $A$. The output is then a particular homotopy coherent diagram that we refer to as the comprehension functor. In the case $A=1$, the comprehension functor defines a “straightening” of the cocartesian fibration. In the case where the cocartesian fibration is the universal one over the quasi-category of small $\infty$-categories, the comprehension functor converts a homotopy coherent diagram of shape $A$ into its “unstraightening,” a cocartesian fibration over $A$.

The fact that the comprehension construction can be applied in any $\infty$-cosmos has an immediate benefit. The codomain projection functor associated to an $\infty$-category $A$ defines a cocartesian fibration in the slice $\infty$-cosmos over $A$, in which case the comprehension functor specializes to define the Yoneda embedding.

## Classical comprehension

The comprehension scheme in ZF set theory asserts that for any proposition $\phi$ involving a variable $x$ whose values range over some set $A$ there exists a subset

$\{ x \in A \mid \phi(x)\}$

comprising those elements for which the formula is satisfied. If the proposition $\phi$ is represented by its characteristic function $\chi_\phi \colon A \to 2$, then this subset is defined by the following pullback

$\begin{array}{ccc} \{ x \in A \mid \phi(x) \} & \longrightarrow & 1 \\ \downarrow & & \downarrow {\scriptstyle \top} \\ A & \underset{\chi_\phi}{\longrightarrow} & 2 \end{array}$

of the canonical monomorphism $\top \colon 1 \to 2$. For that reason, $2$ is often called the subobject classifier of the category $\text{Set}$ and the morphism $\top\colon 1 \to 2$ is regarded as being its generic subobject. On abstracting this point of view, we obtain the theory of elementary toposes.

## The Grothendieck construction as comprehension

What happens to the comprehension scheme when we pass from the 1-categorical context just discussed to the world of 2-categories?

A key early observation in this regard, due to Ross Street I believe, is that we might usefully regard the Grothendieck construction as an instance of a generalised form of comprehension for the category of categories. This analogy becomes clear when we observe that the category of elements of a functor $F \colon \mathcal{C} \to \text{Set}$ may be formed by taking the pullback:

$\begin{array}{ccc} \text{el}(F) & \longrightarrow & {}^{\ast/}\text{Set} \\ \downarrow & & \downarrow \\ \mathcal{C} & \underset{F}{\longrightarrow} & \text{Set} \end{array}$

Here the projection functor on the right, from the slice ${}^{\ast/}\text{Set}$ of the category of sets under the one point set, is a discrete cocartesian fibration. It follows, therefore, that this pullback is also a 2-pullback and that its left-hand vertical is a discrete cocartesian fibration.

Street’s point of view is (roughly) that in a 2-category $\mathcal{K}$ it is the (suitably defined) discrete cocartesian fibrations that play the role that the sub-objects inhabit in topos theory. Then the generic sub-object $\top\colon 1\to \Omega$ becomes a discrete cocartesian fibration $\top\colon S_\ast\to S$ in $\mathcal{K}$ with the property that pullback of $\top$ along 1-cells $a\colon A\to S$ provides us with equivalences between each hom-category $\text{Fun}_{\mathcal{K}}(A,S)$ and the category $\text{dCoCart}(\mathcal{K})_{/A}$ of discrete cocartesian fibrations over $A$ in $\mathcal{K}$.

This account, however, glosses over one important point; thus far we have only specified that each comparison functor $\text{Fun}_{\mathcal{K}}(A,S) \to \text{dCoCart}(\mathcal{K})_{/A}$ should act by pulling back $\top\colon S_{\ast}\to S$ along each 1-cell $a\colon A\to S$. We have said nothing about how, or whether, this action might extend in any reasonable way to 2-cells $\phi\colon a\Rightarrow b$ in $\text{Fun}_{\mathcal{K}}(A,S)$!

The key observation in that regard is that for any fixed “representably defined” cocartesian fibration $p\colon E\to B$ in a (finitely complete) 2-category $\mathcal{K}$, we may extend pullback to define a pseudo-functor $\text{Fun}_{\mathcal{K}}(A,B)\to\mathcal{K}/A$. This carries each 1-cell $a\colon A\to B$ to the pullback $p_a\colon E_a\to A$ of $p$ along $a$ and its action on a 2-cell $\phi\colon a\Rightarrow b$ is constructed in the manner depicted in the following diagram:

$\begin{array}{ccc} E_a & \longrightarrow & E \\ {\scriptstyle p_a} \downarrow & & \downarrow {\scriptstyle p} \\ A & \underset{a}{\longrightarrow} & B \end{array}$

Here we make use of the fact that $p\colon E\to B$ is a cocartesian fibration in order to lift the whiskered 2-cell $\phi p_a$ to a cocartesian 2-cell $\chi$. Its codomain 1-cell may then be factored through $E_b$, using the pullback property of the front square, to give a 1-cell $E_{\phi}\colon E_a\to E_b$ over $A$ as required. Standard (essential) uniqueness properties of cocartesian lifts may now be deployed to provide canonical isomorphisms $E_{\psi\cdot\phi}\cong E_{\psi}\circ E_{\phi}$ and $E_{\mathrm{id}_a}\cong\mathrm{id}_{E_a}$ and to prove that these satisfy the required coherence conditions.

It is this 2-categorical comprehension construction that motivates the key construction of our paper.

### Comprehension and 2-fibrations

In passing, we might quickly observe that the 2-categorical comprehension construction may be regarded as being but one aspect of the theory of 2-fibrations. Specifically, the totality of all cocartesian fibrations and cartesian functors between them in $\mathcal{K}$ is a 2-category whose codomain projection $\text{coCart}(\mathcal{K})\to\mathcal{K}$ is a cartesian 2-fibration; it is indeed the archetypal such gadget. Under this interpretation, the lifting construction used to define the pseudo-functor $\text{Fun}_{\mathcal{K}}(A,B) \to \mathcal{K}_{/A}$ is quite simply the typical cartesian 2-cell lifting property characteristic of a 2-fibration.

In an early draft of our paper, our narrative followed just this kind of route. There we showed that the totality of cocartesian fibrations in an $\infty$-cosmos could be assembled to give the total space of a kind of cartesian fibration of (weak) 2-complicial sets. In the end, however, we abandoned this presentation in favour of one that was more explicitly to the point for current purposes. Watch this space, however, because we are currently preparing a paper on the complicial version of this theory which will return to this point of view. For us this has become a key component of our work on the foundations of the complicial approach to $(\infty,\infty)$-category theory.

## An $\infty$-categorical comprehension construction

In an $\infty$-cosmos $\mathcal{K}$, by which we mean a category enriched over quasi-categories that admits a specified class of isofibrations and certain simplicially enriched limits, we may again define $p \colon E \twoheadrightarrow B$ to be a cocartesian fibration representably. That is to say, $p$ is a cocartesian fibration if it is an isofibration in the specified class and if $\text{Fun}_{\mathcal{K}}(X,p) \colon \text{Fun}_{\mathcal{K}}(X,E) \to \text{Fun}_{\mathcal{K}}(X,B)$ is a cocartesian fibration of quasi-categories for every $\infty$-category $X$. Then a direct “homotopy coherent” generalisation of the 2-categorical construction discussed above allows us to define an associated comprehension functor:

$c_{p,A} \colon \mathfrak{C}\text{Fun}_{\mathcal{K}}(A,B)\to \text{coCart}(\mathcal{K})_{/A}.$

The image lands in the maximal Kan complex enriched subcategory of the quasi-categorically enriched category of cocartesian fibrations and cartesian functors over $A$, so the comprehension functor transposes to define a map of quasi-categories

$c_{p,A} \colon \text{Fun}_{\mathcal{K}}(A,B) \to \mathbb{N}(\text{coCart}(\mathcal{K})_{/A})$

whose codomain is defined by applying the homotopy coherent nerve.

### Straightening as comprehension

The “straightening” of a cocartesian fibration into a homotopy coherent diagram is certainly one of the early highlights in Lurie’s account of quasi-category theory. Such functors are intrinsically tricky to construct, since that process embroils us in specifying an infinite hierarchy of homotopy coherent data.

We may deploy the $\infty$-categorical comprehension construction to provide an alternative approach to straightening. To that end we work in the $\infty$-cosmos of quasi-categories $\text{qCat}$ and let $A=1$, and observe that the comprehension functor $c_{p,1}\colon \mathfrak{C}B \to \text{qCat}$ is itself the straightening of $p$. Indeed, it is possible to use the constructions in our paper to extend this variant of straightening to give a functor of quasi-categories:

$\mathbb{N}(\text{coCart}_{/B}) \to \text{Fun}(B,Q)$

Here $Q$ is the (large) quasi-category constructed by taking the homotopy coherent nerve of (the maximal Kan complex enriched subcategory of) $\text{qCat}$. So the objects of $\text{Fun}(B,Q)$ correspond bijectively to “straight” simplicial functors $\mathfrak{C}B\to\text{qCat}$. We should confess, however, that we do not explicitly pursue the full construction of this straightening functor there.

### Unstraightening as comprehension

In the $\infty$-categorical context, the Grothendieck construction is christened unstraightening by Lurie. It is inverse to the straightening construction discussed above.

We may also realise unstraightening as comprehension. To that end we follow Ross Street’s lead by taking $Q_{\ast}$ to be a quasi-category of pointed quasi-categories and apply the comprehension construction to the “forget the point” projection $Q_{\ast}\to Q$. The comprehension functor thus derived

$c_{p,A} \colon \text{Fun}(A,Q) \to \mathbb{N}\left(\text{dCoCart}_{/A}\right)$

defines a quasi-categorical analogue of Lurie’s unstraightening construction. In an upcoming paper we use the quasi-categorical variant of Beck’s monadicity theorem to prove that this functor is an equivalence. We also extend this result to certain other $\infty$-cosmoi, such as the $\infty$-cosmos of (co)cartesian fibrations over a fixed quasi-category.

### Constructing the Yoneda embedding

Applying the comprehension construction to the cocartesian fibration $\text{cod} \colon A^2 \to A$ in the slice $\infty$-cosmos $\mathcal{K}_{/A}$, we obtain a map

$y \colon \text{Fun}_{\mathcal{K}}(1,A) \to \mathbb{N}(\text{Cart}(\mathcal{K})_{/A})$

that carries an element $a \colon 1 \to A$ to the groupoidal cartesian fibration $\text{dom} \colon A \downarrow a \to A$. This provides us with a particularly explicit model of the Yoneda embedding, whose action on hom-spaces is easily computed. In particular, this allows us to easily demonstrate that the Yoneda embedding is fully faithful and thus that every quasi-category is equivalent to the homotopy coherent nerve of some Kan complex enriched category.

## July 18, 2017

### Terence Tao — Maryam Mirzakhani

I am totally stunned to learn that Maryam Mirzakhani died yesterday, aged 40, after a severe recurrence of the cancer she had been fighting for several years.  I had planned to email her some wishes for a speedy recovery after learning about the relapse yesterday; I still can’t fully believe that she didn’t make it.

My first encounter with Maryam was in 2010, when I was giving some lectures at Stanford – one on Perelman’s proof of the Poincare conjecture, and another on random matrix theory.  I remember a young woman sitting in the front who asked perceptive questions at the end of both talks; it was only afterwards that I learned that it was Mirzakhani.  (I really wish I could remember exactly what the questions were, but I vaguely recall that she managed to put a nice dynamical systems interpretation on both of the topics of my talks.)

After she won the Fields medal in 2014 (as I posted about previously on this blog), we corresponded for a while.  The Fields medal is of course one of the highest honours one can receive in mathematics, and it clearly advances one’s career enormously; but it also comes with a huge initial burst of publicity, a marked increase in the number of responsibilities to the field one is requested to take on, and a strong expectation to serve as a public role model for mathematicians.  As the first female recipient of the medal, and also the first to come from Iran, Maryam was experiencing these pressures to a far greater extent than previous medallists, while also raising a small daughter and fighting off cancer.  I gave her what advice I could on these matters (mostly that it was acceptable – and in fact necessary – to say “no” to the vast majority of requests one receives).

Given all this, it is remarkable how productive she still was mathematically in the last few years.  Perhaps her greatest recent achievement has been her “magic wand” theorem with Alex Eskin, which is basically the analogue of the famous measure classification and orbit closure theorems of Marina Ratner, in the context of moduli spaces instead of unipotent flows on homogeneous spaces.  (I discussed Ratner’s theorems in this previous post.  By an unhappy coincidence, Ratner also passed away this month, aged 78.)  Ratner’s theorems are fundamentally important to any problem to which a homogeneous dynamical system can be associated (for instance, a special case of that theorem shows up in my work with Ben Green and Tamar Ziegler on the inverse conjecture for the Gowers norms, and on linear equations in primes), as it gives a good description of the equidistribution of any orbit of that system (if it is unipotently generated); and it seems the Eskin-Mirzakhani result will play a similar role in problems associated instead to moduli spaces.  The remarkable proof of this result – which now stands at over 200 pages, after three years of revision and updating – uses almost all of the latest techniques that had been developed for homogeneous dynamics, and ingeniously adapts them to the more difficult setting of moduli spaces, in a manner that had not been dreamed of being possible only a few years earlier.

Maryam was an amazing mathematician and also a wonderful and humble human being, who was at the peak of her powers.  Today was a huge loss for Maryam’s family and friends, as well as for mathematics.

[EDIT, Jul 16: New York times obituary here.]

[EDIT, Jul 18: New Yorker memorial here.]

Filed under: obituary Tagged: Maryam Mirzakhani

## July 17, 2017

### Scott Aaronson — Three things

I was shocked and horrified to learn of the loss of Maryam Mirzakhani at age 40, after a battle with cancer (see here or here).  Mirzakhani was a renowned mathematician at Stanford and the world’s first and so far only female Fields Medalist.  I never had the privilege of meeting her, but everything I’ve read about her fills me with admiration.  I wish to offer condolences to her friends and family, including her husband Jan Vondrák, also a professor at Stanford and a member of the CS theory community.

In other depressing news, discussion continues to rage on social media about “The Uninhabitable Earth,” the New York magazine article by David Wallace-Wells arguing that the dangers of climate change have been systematically understated even by climate scientists; that sea level rise is the least of the problems; and that if we stay the current course, much of the earth’s landmass has a good chance of being uninhabitable by the year 2100.  In an unusual turn of events, the Wallace-Wells piece has been getting slammed by climate scientists, including Michael Mann (see here and also this interview)—people who are usually in the news to refute the claims of deniers.

Some of the critics’ arguments seem cogent to me: for example, that Wallace-Wells misunderstood some satellite data, and more broadly, that the piece misleadingly presents its scenario as overwhelmingly probable by 2100 if we do nothing, rather than as “only” 10% likely or whatever—i.e., a mere Trump-becoming-president level of risk.  Other objections to the article impressed me less: for example, that doom-and-gloom is a bad way to motivate people about climate change; that the masses need a more optimistic takeaway.  That obviously has no bearing on the truth of what’s going to happen—but even if we did agree to entertain such arguments, well, it’s not as if mainstream messaging on climate change has been an unmitigated success.  What if everyone should be sweating-in-the-night terrified?

As far as I understand it, the question of the plausibility of Wallace-Wells’s catastrophe scenario mostly just comes down to a single scientific unknown: namely, will the melting permafrost belch huge amounts of methane into the atmosphere?  If it does, then “Armageddon” is probably a fair description of what awaits us in the next century, and if not, not.  Alas, our understanding of permafrost doesn’t seem especially reliable, and it strikes me that models of such feedbacks have a long history of erring on the side of conservatism (for example, researchers were astonished by how quickly glaciers and ice shelves fell apart).

So, while I wish the article was written with more caveats, I submit that runaway warming scenarios deserve more attention rather than less.  And we should be putting discussion of those scenarios in exactly the broader context that Wallace-Wells does: namely, that of the Permian-Triassic extinction event, the Fermi paradox, and the conditions for a technological civilization to survive past its infancy.

Certainly we spend much more time on risks to civilization (e.g., nuclear terrorism, bioengineered pandemics) that strike me as less probable than this one.  And certainly this tail, in the distribution of possible outcomes, deserves at least as much attention as its more popular opposite, the tail where climate change turns out not to be much of a problem at all.  For the grim truth about climate change is that history won’t end in 2100: only the projections do.  And the mere addition of 50 more years could easily suffice to turn a tail risk into a body risk.

Of course, that the worst will happen is a clear prediction of reverse Hollywoodism theory—besides being the “natural, default” prediction for a computer scientist used to worst-case analysis.  This is one prediction that I hope turns out to be as wrong as possible.

OK, now for something to cheer us all up.  Yesterday the group of Misha Lukin, at Harvard, put a paper on the arXiv reporting the creation of a 51-qubit quantum simulator using cold atoms.  The paper doesn’t directly address the question of quantum supremacy, or indeed of performance comparisons between the new device and classical simulations at all.  But this is clearly a big step forward, while the world waits for the fully-programmable 50-qubit superconducting QCs that have been promised by the groups at Google and IBM.

Indeed, this strikes me as the most exciting news in experimental quantum information since last month, when Jian-Wei Pan’s group in Shanghai reported the first transmission of entangled photons from a satellite to earth—thereby allowing violations of the Bell inequality over 1200 kilometers, teleportation of a qubit from earth to space, and other major firsts.  These are breakthroughs that we knew were in the works ever since the Chinese government launched the QUESS satellite devoted to quantum communications.  I should’ve blogged about them in June.  Then again, regular readers of Shtetl-Optimized, familiar as they already are with the universal reach of quantum mechanics and with the general state of quantum information technology, shouldn’t find anything here that fundamentally surprises them, should they?

### Noncommutative Geometry — Maryam Mirzakhani (May 3, 1977 - July 15, 2017)

Some twenty-five years ago, when I was told by one of her teachers that she was truly brilliant and exceptional, I could not have imagined that I would see this terribly sad day. A shining star has been turned off far too soon. Heartfelt condolences to her husband, family, and the mathematics community worldwide. از شمار دو چشم یک تن کم وز شمار خرد هزاران بیش (“From the count of two eyes, one person fewer; from the count of wisdom, thousands more.”)

### Tommaso Dorigo — Revenge Of The Slimeballs: When US Labs Competed For Leadership In HEP

The clip below, together with the following few which will be published every few days in the coming weeks, is extracted from the third chapter of my book "Anomaly! Collider Physics and the Quest for New Phenomena at Fermilab". It recounts the pioneering measurement of the Z mass by the CDF detector, and the competition with SLAC during the summer of 1989.

### Doug Natelson — A thermoelectric surprise in metals

Earlier this year I'd described what thermoelectricity is, and I'd also discussed recent work of ours where we used a laser as a scannable heat source, and were then able to see clearly that changing the size of a nanoscale metal structure can vary the material's thermoelectric properties, and make a thermocouple out of a single metal.

With this same measurement technique, we found a result that we thought was rather strange and surprising, which we have written up here.   Take a moderately long wire, say 120 nm wide and several microns long, made by patterning a 15 nm thick Au film.  Hook up basically a volt meter to the ends of the wire, and scan the laser spot along the length of the wire, recording the voltage as a function of the laser position.  If the wire is nice and homogeneous, you'd expect not to see too much until you get to the ends of the wire where it widens out into bigger contacts.  (There the size variation should make the skinny/wide junction act like a thermocouple.)   Instead, we see the result shown here in the figure (fig. 2 of the paper).  There is a great deal of spatial variability in the photothermoelectric voltage, like the wire is actually made up of a whole bunch of little thermocouples!

Note that your eye tends to pick out a spatial scale in panel (a) comparable to the 1 micron scale bar.  That's a bit misleading; the spot size of the laser in our system is about 1.8 microns, so this measurement approach would not pick up much smaller spatial scales of variation.

The metal wire is polycrystalline, and if you look at the electron microscope images in panels (c, d, e) you can make out a grain structure with lateral grain sizes of 15-20 nm.  Maybe the wire isn't all that homogeneous?  One standard way physicists look at the quality of metal films is to consider the electrical resistance of a square patch of film ($$R_{\square}$$, the "sheet resistance" or "resistance per square"), and compare that number with the "resistance quantum", $$R_{\mathrm{q}}\equiv h/2e^2$$, a combination of fundamental constants that sets a scale for resistance.  If you had two pieces of metal touching at a single atom, the resistance between them would be around the resistance quantum.  For our wire material, $$R_{\square}$$ is a little under 4 $$\Omega$$, so $$R_{\square} << R_{\mathrm{q}}$$, implying that the grains of our material are very well-connected - that it should act like a pretty homogeneous film.  This is why the variation shown in the figure is surprising.  Annealing the wires does change the voltage pattern as well as smoothing it out.  This is a pretty good indicator that the grain boundaries really are important here.  We hope to understand this better - it's always fun when a system thought to be well understood surprises you.

## July 16, 2017

### Matt Strassler — Ongoing Chance of Northern (or Southern) Lights

As forecast, the cloud of particles from Friday’s solar flare (the “coronal mass ejection”, or “CME”) arrived at our planet a few hours after my last post, early in the morning New York time. If you’d like to know how I knew that it had reached Earth, and how I know what’s going on now, scroll down to the end of this post and I’ll show you the data I was following, which is publicly available at all times.

So far the resulting auroras have stayed fairly far north, and so I haven’t seen any — though they were apparently seen last night in Washington and Wyoming, and presumably easily seen in Canada and Alaska. [Caution: sometimes when people say they’ve been “seen”, they don’t quite mean that; I often see lovely photos of aurora that were only visible to a medium-exposure camera shot, not to the naked eye.]  Or rather, I should say that the auroras have stayed fairly close to the Earth’s poles; they were also seen in New Zealand.

Russia and Europe have a good opportunity this evening. As for the U.S.? The storm in the Earth’s magnetic field is still going on, so tonight is still a definite possibility for northern states. Keep an eye out! Look for what is usually a white or green-hued glow, often in swathes or in stripes pointing up from the northern horizon, or even overhead if you’re lucky.  The stripes can move around quite rapidly.

Now, here’s how I knew all this.  I’m no expert on auroras; that’s not my scientific field at all.   But the U.S. Space Weather Prediction Center at the National Oceanic and Atmospheric Administration, which needs to monitor conditions in space in case they should threaten civilian and military satellites or even installations on the ground, provides a wonderful website with lots of relevant data.

The first image on the site provides the space weather overview; a screenshot from the present is shown below, with my annotations.  The upper graph indicates a blast of x-rays (a form of light not visible to the human eye), which is generated when the solar flare, the magnetically-driven explosion on the sun, first occurs.  Then the slower cloud of particles (protons, electrons, and other atomic nuclei, all of which have mass and therefore can’t travel at light’s speed) takes a couple of days to reach Earth.  Its arrival is shown by the sudden jump in the middle graph.  Finally, the lower graph measures how active the Earth’s magnetic field is.  The only problem with that plot is that it tends to be three hours out of date, so beware of that! A “Kp index” of 5 shows significant activity; 6 means that auroras are likely to be moving away from the poles, and 7 or 8 means that the chances in a place like the northern half of the United States are pretty good.  So far, 6 has been the maximum generated by the current flare, but things can fluctuate a little, so 6 or 7 might occur tonight.  Keep an eye on that lower plot; if it drops back down to 4, forget it, but if it’s up at 7, take a look for sure!

Also on the site is data from the ACE satellite.  This satellite sits 950 thousand miles [1.5 million kilometers] from Earth, between Earth and the Sun, which is 93 million miles [150 million kilometers] away.  At that vantage point, it gives us (and our other satellites) a little early warning, of up to an hour, before the cloud of slow particles from a solar flare arrives.  That provides enough lead-time to turn off critical equipment that might otherwise be damaged.  And you can see, in the plot below, how at a certain time in the last twenty-four hours the readings from the satellite, which had been tepid before, suddenly started fluctuating wildly.  That was the signal that the particle cloud from the flare had struck the satellite, and would arrive shortly at our location.

It’s a wonderful feature of the information revolution that you can get all this scientific data yourself, and not wait around hoping for a reporter or blogger to process it for you.  None of this was available when I was a child, and I missed many a sky show.  A big thank you to NOAA, and to the U.S. taxpayers who make their work possible.

Filed under: Astronomy Tagged: astronomy, auroras, space

### David Hogg — M-dwarf expertise

Jessica Birky (UCSD) and I met with Wolfgang Brandner (MPIA) and Derek Homeier (MPIA) to discuss M-dwarf spectra. Homeier has just finished a study of a few dozen M-dwarfs in APOGEE with the PHOENIX models. We are going to find out whether this set of stars will constitute an adequate training set for The Cannon. It is very weighted to a small temperature range, so it might not have enough coverage for us. The conversation was very wide-ranging, and we learned a huge amount in our meeting: whether rotation might affect us (or be detectable), whether binaries might be common in our sample, and whether we might be able to use photometry (or photometry plus astrometry) to get effective temperatures.

## July 15, 2017

### Matt Strassler — Lights in the Sky (maybe…)

The Sun is busy this summer. The upcoming eclipse on August 21 will turn day into deep twilight and transfix millions across the United States.  But before we get there, we may, if we’re lucky, see darkness transformed into color and light.

On Friday July 14th, a giant sunspot in our Sun’s upper regions, easily visible if you project the Sun’s image onto a wall, generated a powerful flare.  A solar flare is a sort of magnetically powered explosion; it produces powerful electromagnetic waves and often, as in this case, blows a large quantity of subatomic particles from the Sun’s corona. The latter is called a “coronal mass ejection.” It appears that the cloud of particles from Friday’s flare is large, and headed more or less straight for the Earth.

Light, visible and otherwise, is an electromagnetic wave, and so the electromagnetic waves generated in the flare — mostly ultraviolet light and X-rays — travel through space at the speed of light, arriving at the Earth in eight and a half minutes. They cause effects in the Earth’s upper atmosphere that can disrupt radio communications, or worse.  That’s another story.

But the cloud of subatomic particles from the coronal mass ejection travels a few hundred times slower than light, and it takes it about two or three days to reach the Earth.  The wait is on.
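The two travel times quoted above are easy to sanity-check. A quick sketch with round numbers (taking "a few hundred times slower" as 300x; both figures approximate):

```python
# Consistency check of the travel times in the text: light crosses the
# Sun-Earth distance in ~8.5 minutes; a particle cloud moving at ~c/300
# takes on the order of days.
c_km_s = 299792.458   # speed of light
sun_earth_km = 150e6  # Sun-Earth distance (approximate)

light_minutes = sun_earth_km / c_km_s / 60
cloud_days = sun_earth_km / (c_km_s / 300) / 86400

print(f"light: {light_minutes:.1f} min, cloud: {cloud_days:.1f} days")
```

At 500x slower than light the cloud takes closer to three days, bracketing the "two or three days" in the text.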

Bottom line: a huge number of high-energy subatomic particles may arrive in the next 24 to 48 hours. If and when they do, the electrically charged particles among them will be trapped in, and shepherded by, the Earth’s magnetic field, which will drive them spiraling into the atmosphere close to the Earth’s polar regions. And when they hit the atmosphere, they’ll strike atoms of nitrogen and oxygen, which in turn will glow. Aurora Borealis, Northern Lights.

So if you live in the upper northern hemisphere, including Europe, Canada and much of the United States, keep your eyes turned to the north (and to the south if you’re in Australia or southern South America) over the next couple of nights. Dark skies may be crucial; the glow may be very faint.

You can also keep abreast of the situation, as I will, using NOAA data, available for instance at

http://www.swpc.noaa.gov/communities/space-weather-enthusiasts

The plot on the upper left of that website, an example of which is reproduced below, shows three types of data. The top graph shows the amount of X-rays impacting the atmosphere; the big jump on the 14th is Friday’s flare. And if and when the Earth’s magnetic field goes nuts and auroras begin, the bottom plot will show the so-called “Kp Index” climbing to 5, 6, or hopefully 7 or 8. When the index gets that high, there’s a much greater chance of seeing auroras much further away from the poles than usual.

Keep an eye also on the data from the ACE satellite, lower down on the website; it’s placed to give Earth an early warning, so when its data gets busy, you’ll know the cloud of particles is not far away.

Wishing you all a great sky show!

Filed under: LHC News

### David Hogg — Bayes Cannon, asteroseismology, binaries

Today, at MPIA Milky Way Group Meeting, I presented my thinking about Stephen Feeney (Flatiron), Ana Bonaca (Harvard), and my project on doing asteroseismology without the Fourier Transform. I am so excited about the (remote, perhaps) possibility that Gaia might be able to measure delta-nu and nu-max for many stars! Possible #GaiaSprint project?

Before me, Kareem El-Badry (Berkeley) talked about how wrong your inferences about stars can be when you model the spectrum without considering binarity. This maps on to a lot of things I discuss with Tim Morton (Princeton) in the area of exoplanet science. Also Yuan-Sen Ting (ANU) spoke about using t-SNE to look for clustering of stars in chemical space.

I spent the early morning writing up a safe-for-methodologists (think: statisticians, mathematicians, and computer scientists) description of The Cannon's likelihood function, when the stellar labels themselves are poorly known (really the project of Christina Eilers here at MPIA). I did this because Jonathan Weare (Chicago) has proposed that he can probably sample the full posterior. I hope that is true! It would be a probabilistic tour de force.

### David Hogg — not ready for #GaiaSprint

Lauren Anderson (Flatiron) showed up at MPIA today to discuss #GaiaSprint projects and our next projects more generally. We discussed a possible project in which we try to use the TGAS data to infer the relationships between extinction and intrinsic color for red-giant stars, and then use those relationships in the billion-star catalog to predict parallaxes for DR2 (and also learn the dust map and the spatial distribution of stars in the Milky Way).

### Tommaso Dorigo — Muon G-2: The Anomaly That Could Change Physics, And A New Exciting Theoretical Development

Do you remember the infamous "g-2" measurement? The anomalous magnetic moment of the muon has been on the agenda of HEP physicists for over a decade, both as a puzzle and as a hope for good things to come.

Ever since the Brookhaven laboratory measured the quantity at a value over 3 standard deviations away from the equally precise theoretical prediction, the topic (could the discrepancy be due to new physics??) has been commonplace in dinner-table conversations among HEP physicists.

## July 14, 2017

### n-Category Café — Laws of Mathematics "Commendable"

Australia’s Prime Minister Malcolm Turnbull, today:

The laws of mathematics are very commendable, but the only law that applies in Australia is the law of Australia.

The context: Turnbull wants Australia to undermine encryption by compelling backdoors by law. The argument is that governments should have the right to read all their citizens’ communications.

Technologists have explained over and over again why this won’t work, but politicians like Turnbull know better. The recent, enormous, Petya and WannaCry malware attacks (hitting British hospitals, for instance) show what can happen when intelligence agencies such as the NSA treat vulnerabilities in software as opportunities to be exploited rather than problems to be fixed.

Thanks to David Roberts for sending me the link.

## July 13, 2017

### David Hogg — asteroseismology; toy model potentials; dwarfs vs giants

Stephen Feeney (Flatiron) sent me plots today that suggest that we can measure asteroseismic nu-max and delta-nu for a red-giant star without ever taking the Fourier Transform of the data. Right now, there are still many issues: This is still fake data, which is always cheating. The sampler (despite being nested and all) gets stuck in poor modes (and this problem is exceedingly multimodal). But when we inspect the sampling after the fact, the good answer beats the bad answers in likelihood by a huge ratio, which suggests that we might be able to do asteroseismology at pretty low signal-to-noise too. We need to move to real data (from Kepler).

Because of concern that (in our stellar-stream project) we aren't marginalizing out all our unknowns yet—and maybe that is making things look more informative than they are—Ana Bonaca (Harvard) started today on including the progenitor position in our Fisher-matrix (Cramér-Rao) analysis of all stellar streams. We also have concerns about the rigidity of the gravitational potential model (which is a toy model, in keeping with the traditions of the field!). We also discussed marginalizing out some kind of perturbation expansion around that toy model. This would permit us both to be more conservative, and also to criticize the precisions obtained with these toy models.

Jessica Birky (UCSD) looked at chi-square differences (in spectral space) between APOGEE spectra of low-temperature stars without good labels and two known M-type stars, one giant and one dwarf. This separated all the cool stars in APOGEE easily into two classes. Nice! We are sanity-checking the answers. We are still far, however, from having a good training set to fire into The Cannon.
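The dwarf/giant separation described above can be sketched in a few lines: compare each unlabeled spectrum to one known giant template and one known dwarf template via chi-squared, and assign the closer class. This is my own illustrative sketch with made-up arrays, not APOGEE data or the group's actual code:

```python
# Hypothetical sketch of template-based classification: pick whichever
# of two template spectra has the smaller chi-squared distance.
import numpy as np

def chi2(spec, err, template):
    """Chi-squared distance between a spectrum and a template."""
    return np.sum(((spec - template) / err) ** 2)

def classify(spec, err, giant_tpl, dwarf_tpl):
    """Assign the class of the closer template."""
    return ("giant" if chi2(spec, err, giant_tpl) < chi2(spec, err, dwarf_tpl)
            else "dwarf")

# Made-up templates and a noisy "dwarf" spectrum for illustration.
rng = np.random.default_rng(0)
giant_tpl = np.ones(100)
dwarf_tpl = np.ones(100) + 0.1 * np.sin(np.arange(100))
err = np.full(100, 0.02)
spec = dwarf_tpl + rng.normal(0, 0.02, 100)

print(classify(spec, err, giant_tpl, dwarf_tpl))  # dwarf
```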

### Jordan Ellenberg — Lo!

“A naked man in a city street — the track of a horse in volcanic mud — the mystery of reindeer’s ears — a huge, black form, like a whale, in the sky, and it drips red drops as if attacked by celestial swordfishes — an appalling cherub appears in the sea —

Confusions.”

### Scott Aaronson — Amsterdam art museums plagiarizing my blog?

This past week I had the pleasure of attending COLT (Conference on Learning Theory) 2017 in Amsterdam, and of giving an invited talk on “PAC-Learning and Reconstruction of Quantum States.”  You can see the PowerPoint slides here; videos were also made, but don’t seem to be available yet.

This was my first COLT, but almost certainly not the last.  I learned lots of cool new tidbits, from the expressive power of small-depth neural networks, to a modern theoretical computer science definition of “non-discriminatory” (namely, your learning algorithm’s output should be independent of protected categories like race, sex, etc. after conditioning on the truth you’re trying to predict), to the inapproximability of VC dimension (assuming the Exponential Time Hypothesis).  You can see the full schedule here.  Thanks so much to the PC chairs, Ohad Shamir and Satyen Kale, for inviting me and for putting on a great conference.

And one more thing: I’m not normally big on art museums, but Amsterdam turns out to have two in close proximity to each other—the Rijksmuseum and the Stedelijk—each containing something that Shtetl-Optimized readers might recognize.

Photo credits: Ronald de Wolf and Marijn Heule.

### Backreaction — Nature magazine publishes comment on quantum gravity phenomenology, demonstrates failure of editorial oversight

I have a headache and blame Nature magazine for it. For about 15 years, I have worked on quantum gravity phenomenology, which means I study ways to experimentally test the quantum properties of space and time. Since 2007, my research area has had its own conference series, “Experimental Search for Quantum Gravity,” which took place most recently in September 2016 in Frankfurt, Germany. Extrapolating

## July 11, 2017

### Terence Tao — On the universality of potential well dynamics

I’ve just uploaded to the arXiv my paper “On the universality of potential well dynamics“, submitted to Dynamics of PDE. This is a spinoff from my previous paper on blowup of nonlinear wave equations, inspired by some conversations with Sungjin Oh. Here we focus mainly on the zero-dimensional case of such equations, namely the potential well equation

$\displaystyle \partial_{tt} u = - (\nabla F)(u) \ \ \ \ \ (1)$

for a particle ${u: {\bf R} \rightarrow {\bf R}^m}$ trapped in a potential well with potential ${F: {\bf R}^m \rightarrow {\bf R}}$, with ${F(z) \rightarrow +\infty}$ as ${z \rightarrow \infty}$. This ODE always admits global solutions from arbitrary initial positions ${u(0)}$ and initial velocities ${\partial_t u(0)}$, thanks to conservation of the Hamiltonian ${\frac{1}{2} |\partial_t u|^2 + F(u)}$. As this Hamiltonian is coercive (in that its level sets are compact), solutions to this equation are always almost periodic. On the other hand, as can already be seen using the harmonic oscillator ${\partial_{tt} u = - k^2 u}$ (and direct sums of this system), this equation can generate periodic solutions, as well as quasiperiodic solutions.

All quasiperiodic motions are almost periodic. However, there are many examples of dynamical systems that admit solutions that are almost periodic but not quasiperiodic. So one can pose the question: are the dynamics of potential wells universal in the sense that they can capture all almost periodic solutions?

A precise question can be phrased as follows. Let ${M}$ be a compact manifold, and let ${X}$ be a smooth vector field on ${M}$; to avoid degeneracies, let us take ${X}$ to be non-singular in the sense that it is everywhere non-vanishing. Then the trajectories of the first-order ODE

$\displaystyle \partial_t u = X(u) \ \ \ \ \ (2)$

for ${u: {\bf R} \rightarrow M}$ are always global and almost periodic. Can we then find a (coercive) potential ${F: {\bf R}^m \rightarrow {\bf R}}$ for some ${m}$, as well as a smooth embedding ${\phi: M \rightarrow {\bf R}^m}$, such that every solution ${u}$ to (2) pushes forward under ${\phi}$ to a solution to (1)? (Actually, for technical reasons it is preferable to map into the phase space ${{\bf R}^m \times {\bf R}^m}$, rather than position space ${{\bf R}^m}$, but let us ignore this detail for this discussion.)

It turns out that the answer is no; there is a very specific obstruction. Given a pair ${(M,X)}$ as above, define a strongly adapted ${1}$-form to be a ${1}$-form ${\phi}$ on ${M}$ such that ${\phi(X)}$ is pointwise positive, and the Lie derivative ${{\mathcal L}_X \phi}$ is an exact ${1}$-form. We then have

Theorem 1 A smooth compact non-singular dynamics ${(M,X)}$ can be embedded smoothly in a potential well system if and only if it admits a strongly adapted ${1}$-form.

For the “only if” direction, the key point is that potential wells (viewed as a Hamiltonian flow on the phase space ${{\bf R}^m \times {\bf R}^m}$) admit a strongly adapted ${1}$-form, namely the canonical ${1}$-form ${p dq}$, whose Lie derivative is the derivative ${dL}$ of the Lagrangian ${L := \frac{1}{2} |\partial_t u|^2 - F(u)}$ and is thus exact. The converse “if” direction is mainly a consequence of the Nash embedding theorem, and follows the arguments used in my previous paper.

Interestingly, the same obstruction also works for potential wells in a more general Riemannian manifold than ${{\bf R}^m}$, or for nonlinear wave equations with a potential; combining the two, the obstruction is also present for wave maps with a potential.

It is then natural to ask whether this obstruction is non-trivial, in the sense that there are at least some examples of dynamics ${(M,X)}$ that do not support strongly adapted ${1}$-forms (and hence cannot be modeled smoothly by the dynamics of a potential well, nonlinear wave equation, or wave maps). I posed this question on MathOverflow, and Robert Bryant provided a very nice construction, showing that the vector field ${(\sin(2\pi x), \cos(2\pi x))}$ on the ${2}$-torus ${({\bf R}/{\bf Z})^2}$ had no strongly adapted ${1}$-forms, and hence the dynamics of this vector field cannot be smoothly reproduced by a potential well, nonlinear wave equation, or wave map:

On the other hand, the suspension of any diffeomorphism does support a strongly adapted ${1}$-form (the derivative ${dt}$ of the time coordinate), and using this and the previous theorem I was able to embed a universal Turing machine into a potential well. In particular, there are flows for an explicitly describable potential well whose trajectories have behavior that is undecidable using the usual ZFC axioms of set theory! So potential well dynamics are “effectively” universal, despite the presence of the aforementioned obstruction.

In my previous work on blowup for Navier-Stokes like equations, I speculated that if one could somehow replicate a universal Turing machine within the Euler equations, one could use this machine to create a “von Neumann machine” that replicated smaller versions of itself, which on iteration would lead to a finite time blowup. Now that such a mechanism is present in nonlinear wave equations, it is tempting to try to make this scheme work in that setting. Of course, in my previous paper I had already demonstrated finite time blowup, at least in a three-dimensional setting, but that was a relatively simple discretely self-similar blowup in which no computation occurred. This more complicated blowup scheme would be significantly more effort to set up, but would be proof-of-concept that the same scheme would in principle be possible for the Navier-Stokes equations, assuming somehow that one can embed a universal Turing machine into the Euler equations. (But I’m still hopelessly stuck on how to accomplish this latter task…)

## July 10, 2017

### n-Category CaféA Bicategory of Decorated Cospans

My students are trying to piece together a general theory of networks, inspired by many examples. A good general theory should clarify and unify these examples. What some people call network theory, I’d just call ‘applied graph invariant theory’: they come up with a way to calculate numbers from graphs, they calculate these numbers for graphs that show up in nature, and then they try to draw conclusions about this. That’s fine as far as it goes, but there’s a lot more to network theory!

There are many kinds of networks. You can usually create big networks of a given kind by sticking together smaller networks of this kind. The networks usually do something, and the behavior of the whole is usually determined by the behavior of the parts and how the parts are stuck together.

So, we should think of networks of a given kind as morphisms in a category, or more generally elements of an algebra of some operad, and define a map sending each such network to its behavior. Then we can study this map mathematically!

All these insights (and many more) are made precise in Fong’s theory of ‘decorated cospans’:

Kenny Courser is starting to look at the next thing: how one network can turn into another. For example, a network might change over time, or we might want to simplify a complicated network somehow. If a network is a morphism, a process where one network turns into another could be a ‘2-morphism’: that is, a morphism between morphisms. Just as categories have objects and morphisms, bicategories have objects, morphisms and 2-morphisms.

So, Kenny is looking at bicategories. As a first step, Kenny took Brendan’s setup and souped it up to define ‘decorated cospan bicategories’:

In this paper, he showed that these bicategories are often ‘symmetric monoidal’. This means that you can not only stick networks together end to end, you can also set them side by side or cross one over the other—and similarly for processes that turn one network into another! A symmetric monoidal bicategory is a somewhat fearsome structure, so Kenny used some clever machinery developed by Mike Shulman to get the job done:

I would love to talk about the details, but they’re a bit technical so I think I’d better talk about something more basic. Namely: what’s a decorated cospan category and what’s a decorated cospan bicategory?

First: what’s a decorated cospan? A cospan in some category $C$ is a diagram like this:

where the objects and morphisms are all in $C.$ For example, if $C$ is the category of sets, we’ve got two sets $X$ and $Y$ mapped to a set $\Gamma.$

In a ‘decorated’ cospan, the object $\Gamma$ is equipped or, as we like to say, ‘decorated’ with extra structure. For example:

Here the set $\Gamma$ consists of 3 points—but it’s decorated with a graph whose edges are labelled by numbers! You could use this to describe an electrical circuit made of resistors. The set $X$ would then be the set of ‘input terminals’, and $Y$ the set of ‘output terminals’.

In this example, and indeed in many others, there’s no serious difference between inputs and outputs. We could reflect the picture, switching the roles of $X$ and $Y,$ and the inputs would become outputs and vice versa. One reason for distinguishing them is that we can then attach the outputs of one circuit to the inputs of another and build a larger circuit. If we think of our circuit as a morphism from the input set $X$ to the output set $Y,$ this process of attaching circuits to form larger ones can be seen as composing morphisms in a category.

In other words, if we get the math set up right, we can compose a decorated cospan from $X$ to $Y$ and a decorated cospan from $Y$ to $Z$ and get a decorated cospan from $X$ to $Z.$ So with luck, we get a category with objects of $C$ as objects, and decorated cospans between these guys as morphisms!

For example, we can compose this:

and this:

to get this:

What did I mean by saying ‘with luck’? Well, there’s not really any luck involved, but we need some assumptions for all this to work. Before we even get to the decorations, we need to be able to compose cospans. We can do this whenever our cospans live in a category with pushouts. In category theory, a pushout is how we glue two things together.

So, suppose our category $C$ has pushouts. If we then have two cospans in $C,$ one from $X$ to $Y$ and one from $Y$ to $Z:$

we can take a pushout:

and get a cospan from $X$ to $Z:$

All this is fine and dandy, but there’s a slight catch: the pushout is only defined up to isomorphism, so we can’t expect this process of composing cospans to be associative: it will only be associative up to isomorphism.

What does that mean? What’s an isomorphism of cospans?

I’m glad you asked. A map of cospans is a diagram like this:

where the two triangles commute. You can see two cospans in this picture; the morphism $f$ provides the map from one to the other. If $f$ is an isomorphism, then this is an isomorphism of cospans.

To get around this problem, we can work with a category where the morphisms aren’t cospans, but isomorphism classes of cospans. That’s what Brendan did, and it’s fine for many purposes.

But back around 1972, when Bénabou was first inventing bicategories, he noticed that you could also create a bicategory with

• objects of $C$ as objects,
• spans in $C$ as morphisms, and
• maps of spans in $C$ as 2-morphisms.

Bicategories are perfectly happy for composition of 1-morphisms to be associative only up to isomorphism, so this solves the problem in a somewhat nicer way. (Taking equivalence classes of things when you don’t absolutely need to is regarded with some disdain in category theory, because it often means you’re throwing out information—and when you throw out information, you often regret it later.)

So, if you’re interested in decorated cospan categories, and you’re willing to work with bicategories, you should consider thinking about decorated cospan bicategories. And now, thanks to Kenny Courser’s work, you can!

He showed how the decorations work in the bicategorical approach: for example, he proved that whenever $C$ has finite colimits and

$F : (C,+) \to (\mathrm{Set}, \times)$

is a lax symmetric monoidal functor, you get a symmetric monoidal bicategory where a morphism is a cospan in $C:$

with the object $\Gamma$ decorated by an element of $F(\Gamma).$

Proving this took some virtuosic work in category theory. The key turns out to be this glorious diagram:

For the explanation, check out Proposition 4.5 in his paper.

I’ll talk more about applications of cospan bicategories when I blog about some other papers Kenny Courser and Daniel Cicala are writing.

### Tommaso Dorigo — 600 Attend To Outreach Event In Venice

On Saturday, July 8th, the "Sala Perla" of the Palazzo del Casinò was crowded by 600 attendees, who filled all seats and then some. The event, titled "Universo: tempo zero - breve storia dell'inizio", was organized in conjunction with the international EPS conference, which takes place until this Wednesday at Lido of Venice. It featured a discussion between the anchor, Silvia Rosa Brusin, and a few guests: Fabiola Gianotti, general director of CERN; Antonio Masiero, vice-president of INFN; and Mirko Pojer, responsible for operations of the LHC collider. The program was enriched by a few videos, and by readings by Sonia Bergamasco and jazz music by Umberto Petrin.

### Backreaction — Stephen Hawking’s 75th Birthday Conference: Impressions

I’m back from Cambridge, where I attended the conference “Gravity and Black Holes” in honor of Stephen Hawking’s 75th birthday. First things first, the image on the conference poster, website, banner, etc is not a psychedelic banana, but gravitational wave emission in a black hole merger. It’s a still from a numerical simulation done by a Cambridge group that you can watch in full on YouTube.

## July 08, 2017

### John Baez — A Bicategory of Decorated Cospans

My students are trying to piece together a general theory of networks, inspired by many examples. A good general theory should clarify and unify these examples. What some people call network theory, I’d just call ‘applied graph invariant theory’: they come up with a way to calculate numbers from graphs, they calculate these numbers for graphs that show up in nature, and then they try to draw conclusions about this. That’s fine as far as it goes, but there’s a lot more to network theory!

There are many kinds of networks. You can usually create big networks of a given kind by sticking together smaller networks of this kind. The networks usually do something, and the behavior of the whole is usually determined by the behavior of the parts and how the parts are stuck together.

So, we should think of networks of a given kind as morphisms in a category, or more generally elements of an algebra of some operad, and define a map sending each such network to its behavior. Then we can study this map mathematically!

All these insights (and many more) are made precise in Fong’s theory of ‘decorated cospans’:

• Brendan Fong, The Algebra of Open and Interconnected Systems, Ph.D. thesis, University of Oxford, 2016. (Blog article here.)

Kenny Courser is starting to look at the next thing: how one network can turn into another. For example, a network might change over time, or we might want to simplify a complicated network somehow. If a network is a morphism, a process where one network turns into another could be a ‘2-morphism’: that is, a morphism between morphisms. Just as categories have objects and morphisms, bicategories have objects, morphisms and 2-morphisms.

So, Kenny is looking at bicategories. As a first step, Kenny took Brendan’s setup and souped it up to define ‘decorated cospan bicategories’:

• Kenny Courser, Decorated cospan bicategories, to appear in Theory and Applications of Categories.

In this paper, he showed that these bicategories are often ‘symmetric monoidal’. This means that you can not only stick networks together end to end, you can also set them side by side or cross one over the other—and similarly for processes that turn one network into another! A symmetric monoidal bicategory is a somewhat fearsome structure, so Kenny used some clever machinery developed by Mike Shulman to get the job done:

• Mike Shulman, Constructing symmetric monoidal bicategories.

I would love to talk about the details, but they’re a bit technical so I think I’d better talk about something more basic. Namely: what’s a decorated cospan category and what’s a decorated cospan bicategory?

First: what’s a decorated cospan? A cospan in some category $C$ is a diagram like this:

where the objects and morphisms are all in $C.$ For example, if $C$ is the category of sets, we’ve got two sets $X$ and $Y$ mapped to a set $\Gamma.$

In a ‘decorated’ cospan, the object $\Gamma$ is equipped or, as we like to say, ‘decorated’ with extra structure. For example:

Here the set $\Gamma$ consists of 3 points—but it’s decorated with a graph whose edges are labelled by numbers! You could use this to describe an electrical circuit made of resistors. The set $X$ would then be the set of ‘input terminals’, and $Y$ the set of ‘output terminals’.

In this example, and indeed in many others, there’s no serious difference between inputs and outputs. We could reflect the picture, switching the roles of $X$ and $Y,$ and the inputs would become outputs and vice versa. One reason for distinguishing them is that we can then attach the outputs of one circuit to the inputs of another and build a larger circuit. If we think of our circuit as a morphism from the input set $X$ to the output set $Y,$ this process of attaching circuits to form larger ones can be seen as composing morphisms in a category.

In other words, if we get the math set up right, we can compose a decorated cospan from $X$ to $Y$ and a decorated cospan from $Y$ to $Z$ and get a decorated cospan from $X$ to $Z.$ So with luck, we get a category with objects of $C$ as objects, and decorated cospans between these guys as morphisms!

For example, we can compose this:

and this:

to get this:

What did I mean by saying ‘with luck’? Well, there’s not really any luck involved, but we need some assumptions for all this to work. Before we even get to the decorations, we need to be able to compose cospans. We can do this whenever our cospans live in a category with pushouts. In category theory, a pushout is how we glue two things together.

So, suppose our category $C$ has pushouts. If we then have two cospans in $C,$ one from $X$ to $Y$ and one from $Y$ to $Z:$

we can take a pushout:

and get a cospan from $X$ to $Z:$
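In the category of finite sets this pushout is concrete: take the disjoint union of the two apexes and glue $g_1(y)$ to $g_2(y)$ for each $y$ in $Y$. A small illustrative sketch of that gluing (my own, with made-up sets; the names `pushout`, `g1`, `g2` are not from Fong's or Courser's papers):

```python
# Illustrative sketch: compose two cospans of finite sets by computing
# the pushout of  gamma1 <- Y -> gamma2  with a tiny union-find.
def pushout(gamma1, gamma2, g1, g2, Y):
    """Quotient the disjoint union of gamma1 and gamma2 by g1(y) ~ g2(y).

    Elements are tagged (0, a) / (1, b) so the union is disjoint; the
    result maps each tagged element to a representative of its class.
    """
    elems = [(0, a) for a in gamma1] + [(1, b) for b in gamma2]
    parent = {e: e for e in elems}

    def find(e):
        while parent[e] != e:
            parent[e] = parent[parent[e]]  # path halving
            e = parent[e]
        return e

    for y in Y:                            # glue g1(y) ~ g2(y)
        parent[find((0, g1[y]))] = find((1, g2[y]))
    return {e: find(e) for e in elems}

# Example: apexes {a, b} and {p, q}, glued along Y = {y} with
# g1(y) = b and g2(y) = p; the composite apex has 3 elements.
q = pushout({"a", "b"}, {"p", "q"}, {"y": "b"}, {"y": "p"}, {"y"})
print(len(set(q.values())))  # 3
```

The legs of the composite cospan are then the maps from $X$ and $Z$ into their respective apexes, followed by this quotient map.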

All this is fine and dandy, but there’s a slight catch: the pushout is only defined up to isomorphism, so we can’t expect this process of composing cospans to be associative: it will only be associative up to isomorphism.

What does that mean? What’s an isomorphism of cospans?

I’m glad you asked. A map of cospans is a diagram like this:

where the two triangles commute. You can see two cospans in this picture; the morphism $f$ provides the map from one to the other. If $f$ is an isomorphism, then this is an isomorphism of cospans.

To get around this problem, we can work with a category where the morphisms aren’t cospans, but isomorphism classes of cospans. That’s what Brendan did, and it’s fine for many purposes.

But back around 1972, when Bénabou was first inventing bicategories, he noticed that you could also create a bicategory with

• objects of $C$ as objects,
• spans in $C$ as morphisms, and
• maps of spans in $C$ as 2-morphisms.

Bicategories are perfectly happy for composition of 1-morphisms to be associative only up to isomorphism, so this solves the problem in a somewhat nicer way. (Taking equivalence classes of things when you don’t absolutely need to is regarded with some disdain in category theory, because it often means you’re throwing out information—and when you throw out information, you often regret it later.)

So, if you’re interested in decorated cospan categories, and you’re willing to work with bicategories, you should consider thinking about decorated cospan bicategories. And now, thanks to Kenny Courser’s work, you can!

He showed how the decorations work in the bicategorical approach: for example, he proved that whenever $C$ has finite colimits and

$F : (C,+) \to (\mathrm{Set}, \times)$

is a lax symmetric monoidal functor, you get a symmetric monoidal bicategory where a morphism is a cospan in $C:$

with the object $\Gamma$ decorated by an element of $F(\Gamma).$

Proving this took some virtuosic work in category theory. The key turns out to be this glorious diagram:

For the explanation, check out Proposition 4.1 in his paper.

I’ll talk more about applications of cospan bicategories when I blog about some other papers Kenny Courser and Daniel Cicala are writing.

### Chad Orzel — Vacation Update

So, when last I posted an update on kid stuff, we were about to embark for a week in Mexico with family. As you would expect, I have a huge pile of pictures from this, but most of the cute-kid shots feature the kids with their cousins from Illinois, and I try to avoid posting photos of other people’s children.

We had a good time, for the most part. The highlight for the kids was probably a “Swim with Dolphins” excursion. SteelyKid did one of these on last year’s Disney cruise, but now The Pip is both old enough, and enjoys swimming. Technically, he wasn’t tall enough (by maybe an inch) to do the parts where the dolphin pulls or pushes you through the water, but the trainers were super nice, and one of them swam out with him to get him set up, so he got to go. And it was totally worth it for the grin when his boogie board ride started:

(Photos by the photographer at the Dolphin Discovery facility, purchased at somewhat exorbitant rates because there was nowhere to use our own camera from…)

We did get one adults-only trip in, an excursion to the Mayan ruins at Tulum, source of the artsy iguana photo in the “featured image” at the top, and also these more conventional pictures:

Our tour group at Tulum.

The big temple and beach at Tulum.

Tulum was pretty neat, though much less decorated than Chichen Itza (which I saw twenty-mumble years ago, on a different family vacation), reflecting its status as a trading port rather than a ceremonial center. The ruins are fascinating, if really hot, and the beach is spectacular. The excursion also included a bit of snorkeling in a cave, and in the Yal-Ku lagoon, neither of which I have photos of, because they weren’t in places I could use my camera.

Unfortunately, as good as that excursion was, it also led to the absolute worst part of the trip. The excursion included a (not terribly impressive) buffet lunch at Xpu-Ha, where we were seated on an outdoor deck. At the end of lunch, Kate scooted her chair back to re-apply bug spray to her legs, and the chair went over backwards, past the totally inadequate rope “railing,” taking her down to the beach below.

The drop was maybe a couple of meters, and when I got down to the beach a few seconds later, she was awake and moving around, so we helped her to a lounge chair and got her some water and a bag of ice for the lump on the back of her head.

“What happened?” she asked. I explained that she’d gone backwards off the edge of the deck, onto the beach. “That’s so embarrassing,” she said. About this time, the tour guide came by and asked if we needed to see a doctor.

“Give us a minute,” I said. And then Kate said “What happened?” I explained again that the chair had gone over backwards, and then asked “Do you remember me telling you this sixty seconds ago?”

“No,” she said. “What happened?” And I said “Yo! Tour guy! We’d like a doctor, please!”

So, Kate and I spent the rest of that day in the COSTAMED hospital in Playa del Carmen, which was not exactly an ideal vacation experience. They did a CT scan and some other tests, and confirmed there was no really serious damage, but it was probably three or four hours before Kate started reliably remembering anything about that day.

From a detached after-the-fact perspective, it was kind of fascinating to watch lights come back on– right after the fall, she just kept asking “What happened?,” then slowly added other questions (“Was it my fault?” first, then “Did the kids see?” then “Why weren’t the kids with us?” then “Would you tell me even if it was my fault?” (“Honey, after answering this 57 times, yes, I would.”), then “How many times have I asked you this?” (“Three hundred and twelve. And counting.”) and a few more…). In the moment, it was absolutely h*ckin terrifying. Head injuries have very little to recommend them.

In the end, she’s okay. They discharged us that night, after scheduling a follow-up with a neurologist the next day. That guy did a neck x-ray, and recommended a foam cervical collar for a few weeks, plus an appointment with an orthopedist back here in the US. By the next morning, Kate remembered everything about the day up to getting to the restaurant, but nothing between that and the last half-hour or so in the hospital. Which were the only worthwhile parts of that day, anyway, so it’s all good…

The staff at the hospital were very professional, calm, and kind in dealing with a couple of scared Americans who spoke basically no Spanish. And the tour company, Cancun Adventures, was also great about the whole deal– some of their people sat around the waiting room for several hours until we were discharged, then drove us back to the hotel, and they sent a car and driver the next morning for the follow-up appointment. The driver of that was invaluable in dealing with the hospital bureaucracy, and I believe they paid for everything. At least, I wasn’t asked to give anybody any money (though I happily would’ve…).

(As I said on Twitter and Facebook when I mentioned this the next day, any attempt to use this as a jumping-off point to discuss the politics of US health care will get you blocked so hard that it’ll crack the glass on your monitor. Don’t even think about it.)

So, that cast a bit of a pall over the rest of the vacation. Not too much, though– in fact, the dolphin swim was the day after the follow-up appointment, and on the last day in Mexico, we went to the Xel-Ha theme park for some snorkeling and other stuff. I was pleasantly surprised at how good the snorkeling was there, though a lot of that was timing– it got super crowded later in the day. No pictures, though, because we were in the water for most of it.

Anyway, that’s what we’ve been up to recently. And I’ll leave you with this photo of Kate and SteelyKid showing off the stylish accessories they picked up on our trip. And also expressing their opinion of me taking this picture…

Kate and SteelyKid modeling stylish accessories from our Mexican vacation.

### Terence Tao — What are some useful, but little-known, features of the tools used in professional mathematics?

A few days ago, I was talking with Ed Dunne, who is currently the Executive Editor of Mathematical Reviews (and in particular with its online incarnation at MathSciNet).  At the time, I was mentioning how laborious it was for me to create a BibTeX file for dozens of references by using MathSciNet to locate each reference separately, and to export each one to BibTeX format.  He then informed me that underneath every MathSciNet reference there was a little link to add the reference to a Clipboard, and then one could export the entire Clipboard at once to whatever format one wished.  In retrospect, this was a functionality of the site that had always been visible, but I had never bothered to explore it, and now I can populate a BibTeX file much more quickly.

This made me realise that perhaps there are many other useful features of popular mathematical tools out there that only a few users actually know about, so I wanted to create a blog post to encourage readers to post their own favorite tools, or features of tools, that are out there, often in plain sight, but not always widely known.  Here are a few that I was able to recall from my own workflow (though for some of them it took quite a while to consciously remember, since I have been so used to them for so long!):

1. TeX for Gmail.  A Chrome plugin that lets one write TeX symbols in emails sent through Gmail (by writing the LaTeX code and pressing a hotkey, usually F8).
2. Boomerang for Gmail.  Another Chrome plugin for Gmail, which does two main things.  Firstly, it can “boomerang” away an email from your inbox to return at some specified later date (e.g. one week from today).  I found this useful to declutter my inbox regarding mail that I needed to act on in the future, but was unable to deal with at present due to travel, or because I was waiting for some other piece of data to arrive first.   Secondly, it can send out email with some specified delay (e.g. by tomorrow morning), giving one time to cancel the email if necessary.  (Thanks to Julia Wolf for telling me about Boomerang!)
3. Which just reminds me, the Undo Send feature on Gmail has saved me from embarrassment a few times (but one has to set it up first; it delays one’s emails by a short period, such as 30 seconds, during which time it is possible to undo the email).
4. LaTeX rendering in Inkscape.  I used to use plain text to write mathematical formulae in my images, which always looked terrible.  It took me years to realise that Inkscape had the functionality to compile LaTeX within it.
5. Bookmarks in TeXnicCenter.  I probably only use a tiny fraction of the functionality that TeXnicCenter offers, but one little feature I quite like is the ability to bookmark a portion of the TeX file (e.g. the bibliography at the end, or the place one is currently editing) with one hot-key (Ctrl-F2) and then one can cycle quickly between one bookmarked location and another with some further hot-keys (F2 and shift-F2).
6. Actually, there are a number of Windows keyboard shortcuts that are worth experimenting with (and similarly for Mac or Linux systems of course).
7. Detexify has been the quickest way for me to locate the TeX code for a symbol that I couldn’t quite remember (or when hunting for a new symbol that would roughly be shaped like something I had in mind).
8. For writing LaTeX on my blog, I use Luca Trevisan’s LaTeX to WordPress Python script (together with a little batch file I wrote to actually run the python script).
9. Using the camera on my phone to record a blackboard computation or a slide (or the wifi password at a conference centre, or any other piece of information that is written or displayed really).  If the phone is set up properly this can be far quicker than writing it down with pen and paper.  (I guess this particular trick is now quite widely used, but I still see people surprised when someone else uses a phone instead of a pen to record things.)
10. Using my online calendar not only to record scheduled future appointments, but also to block out time to do specific tasks (e.g. reserve 2-3pm on Tuesday to read paper X, or do errand Y).  I have found I am able to get a much larger fraction of my “to do” list done on days in which I had previously blocked out such specific chunks of time, as opposed to days in which I had left several hours unscheduled (though sometimes those hours were also very useful for finding surprising new things to do that I had not anticipated).  (I learned of this little trick online somewhere, but I have long since lost the original reference.)

Anyway, I would very much like to hear what other little tools or features other readers have found useful in their work.

Filed under: non-technical, tricks Tagged: productivity, Software

## July 07, 2017

### Doug Natelson — Two books that look fun

Two books that look right up my alley:

• Storm in a Teacup by Helen Czerski.  Dr. Czerski is a researcher at University College London, putting her physics credentials to work studying bubbles in physical oceanography.  She also writes the occasional "everyday physics" column in the Wall Street Journal, and it's great stuff.
• Max the Demon vs. Entropy of Doom by Assa Auerbach and Richard Codor.   Prof. Auerbach is a serious condensed matter theorist at the Technion.  This one is a Kickstarter to produce a light-hearted graphic novel that is educational without being overly mathematical.  Looks fun.  Seems like the target audience would be similar to that for Spectra.

### Tommaso Dorigo — LHCb Unearths New Doubly-Charmed Hadron Where Marek Karliner And Jonathan Rosner Ordered It

[UPDATE: see at the bottom for some additional commentary following a post on the matter by our friend Lubos Motl in his blog, where he quotes this piece and disagrees on the interest of finding the Xi mass in perfect agreement with an a priori calculation.]

It is always nice to learn that a new hadron is discovered - this broadens our understanding of the extremely complicated fabric of Quantum Chromodynamics (QCD), the theory of strong interactions that govern nuclear matter and are responsible for its stability.

### Terence Tao — Correlations of the von Mangoldt and higher divisor functions I. Long shift ranges

Kaisa Matomaki, Maksym Radziwill, and I have uploaded to the arXiv our paper “Correlations of the von Mangoldt and higher divisor functions I. Long shift ranges“, submitted to Proceedings of the London Mathematical Society. This paper is concerned with the estimation of correlations such as

$\displaystyle \sum_{n \leq X} \Lambda(n) \Lambda(n+h) \ \ \ \ \ (1)$

for medium-sized ${h}$ and large ${X}$, where ${\Lambda}$ is the von Mangoldt function; we also consider variants of this sum in which one of the von Mangoldt functions is replaced with a (higher order) divisor function, but for sake of discussion let us focus just on the sum (1). Understanding this sum is very closely related to the problem of finding pairs of primes that differ by ${h}$; for instance, if one could establish a lower bound

$\displaystyle \sum_{n \leq X} \Lambda(n) \Lambda(n+2) \gg X$

then this would easily imply the twin prime conjecture.

The (first) Hardy-Littlewood conjecture asserts an asymptotic

$\displaystyle \sum_{n \leq X} \Lambda(n) \Lambda(n+h) = {\mathfrak S}(h) X + o(X) \ \ \ \ \ (2)$

as ${X \rightarrow \infty}$ for any fixed positive ${h}$, where the singular series ${{\mathfrak S}(h)}$ is an arithmetic factor arising from the irregularity of distribution of ${\Lambda}$ at small moduli, defined explicitly by

$\displaystyle {\mathfrak S}(h) := 2 \Pi_2 \prod_{p|h; p>2} \frac{p-1}{p-2}$

when ${h}$ is even, and ${{\mathfrak S}(h)=0}$ when ${h}$ is odd, where

$\displaystyle \Pi_2 := \prod_{p>2} (1-\frac{1}{(p-1)^2}) = 0.66016\dots$

is (half of) the twin prime constant. See for instance this previous blog post for a heuristic explanation of this conjecture. From the previous discussion we see that (2) for ${h=2}$ would imply the twin prime conjecture. Sieve theoretic methods are only able to provide an upper bound of the form ${ \sum_{n \leq X} \Lambda(n) \Lambda(n+h) \ll {\mathfrak S}(h) X}$.
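As a quick numerical illustration (a truncated-product sketch of my own, not from the paper), one can approximate ${\Pi_2}$ by cutting off its Euler product at a finite prime bound and then evaluate ${{\mathfrak S}(h)}$ directly from its definition:

```python
# Approximate the twin prime constant Pi_2 = prod_{p>2} (1 - 1/(p-1)^2)
# by truncating the product, then evaluate the singular series
# S(h) = 2 * Pi_2 * prod_{p | h, p > 2} (p-1)/(p-2) for even h, 0 for odd h.
def primes_up_to(limit):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p, is_prime in enumerate(sieve) if is_prime]

def twin_prime_constant(limit=10**5):
    """Truncation of Pi_2; converges quickly, ~0.66016 already at 10^5."""
    Pi2 = 1.0
    for p in primes_up_to(limit):
        if p > 2:
            Pi2 *= 1.0 - 1.0 / (p - 1) ** 2
    return Pi2

def singular_series(h, limit=10**5):
    """S(h): multiply 2*Pi_2 by (p-1)/(p-2) over odd prime divisors of h."""
    if h % 2:
        return 0.0
    s = 2.0 * twin_prime_constant(limit)
    m = h
    while m % 2 == 0:
        m //= 2
    p = 3
    while p * p <= m:          # trial-divide out the odd prime factors of h
        if m % p == 0:
            s *= (p - 1) / (p - 2)
            while m % p == 0:
                m //= p
        p += 2
    if m > 1:                  # leftover odd prime factor
        s *= (m - 1) / (m - 2)
    return s

print(singular_series(2))  # approximately 2 * Pi_2 ~ 1.32
print(singular_series(6))  # exactly twice S(2): the local factor at p = 3 is 2
```

The output illustrates the heuristic that prime pairs differing by 6 should be about twice as common as twin primes, since the factor ${(p-1)/(p-2)}$ at ${p=3}$ equals 2.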

Needless to say, apart from the trivial case of odd ${h}$, there are no values of ${h}$ for which the Hardy-Littlewood conjecture is known. However there are some results that say that this conjecture holds “on the average”: in particular, if ${H}$ is a quantity depending on ${X}$ that is somewhat large, there are results that show that (2) holds for most (i.e. for ${1-o(1)}$) of the ${h}$ between ${0}$ and ${H}$. Ideally one would like to get ${H}$ as small as possible, in particular one can view the full Hardy-Littlewood conjecture as the endpoint case when ${H}$ is bounded.

The first results in this direction were by van der Corput and by Lavrik, who established such a result with ${H = X}$ (with a subsequent refinement by Balog); Wolke lowered ${H}$ to ${X^{5/8+\varepsilon}}$, and Mikawa lowered ${H}$ further to ${X^{1/3+\varepsilon}}$. The main result of this paper is a further lowering of ${H}$ to ${X^{8/33+\varepsilon}}$. In fact (as in the preceding works) we get a better error term than ${o(X)}$, namely an error of the shape ${O_A( X \log^{-A} X)}$ for any ${A}$.

Our arguments initially proceed along standard lines. One can use the Hardy-Littlewood circle method to express the correlation in (2) as an integral involving exponential sums ${S(\alpha) := \sum_{n \leq X} \Lambda(n) e(\alpha n)}$. The contribution of “major arc” ${\alpha}$ is known by a standard computation to recover the main term ${{\mathfrak S}(h) X}$ plus acceptable errors, so it is a matter of controlling the “minor arcs”. After averaging in ${h}$ and using the Plancherel identity, one is basically faced with establishing a bound of the form

$\displaystyle \int_{\beta-1/H}^{\beta+1/H} |S(\alpha)|^2\ d\alpha \ll_A X \log^{-A} X$

for any “minor arc” ${\beta}$. If ${\beta}$ is somewhat close to a low height rational ${a/q}$ (specifically, if it is within ${X^{-1/6-\varepsilon}}$ of such a rational with ${q = O(\log^{O(1)} X)}$), then this type of estimate is roughly of comparable strength (by another application of Plancherel) to the best available prime number theorem in short intervals on the average, namely that the prime number theorem holds for most intervals of the form ${[x, x + x^{1/6+\varepsilon}]}$, and we can handle this case using standard mean value theorems for Dirichlet series. So we can restrict attention to the “strongly minor arc” case where ${\beta}$ is far from such rationals.

The next step (following some ideas we found in a paper of Zhan) is to rewrite this estimate not in terms of the exponential sums ${S(\alpha) := \sum_{n \leq X} \Lambda(n) e(\alpha n)}$, but rather in terms of the Dirichlet polynomial ${F(s) := \sum_{n \sim X} \frac{\Lambda(n)}{n^s}}$. After a certain amount of computation (including some oscillatory integral estimates arising from stationary phase), one is eventually reduced to the task of establishing an estimate of the form

$\displaystyle \int_{t \sim \lambda X} (\int_{t-\lambda H}^{t+\lambda H} |F(\frac{1}{2}+it')|\ dt')^2\ dt \ll_A \lambda^2 H^2 X \log^{-A} X$

for any ${X^{-1/6-\varepsilon} \ll \lambda \ll \log^{-B} X}$ (with ${B}$ sufficiently large depending on ${A}$).

The next step, which is again standard, is the use of the Heath-Brown identity (as discussed for instance in this previous blog post) to split up ${\Lambda}$ into a number of components that have a Dirichlet convolution structure. Because the exponent ${8/33}$ we are shooting for is less than ${1/4}$, we end up with five types of components that arise, which we call “Type ${d_1}$“, “Type ${d_2}$“, “Type ${d_3}$“, “Type ${d_4}$“, and “Type II”. The “Type II” sums are Dirichlet convolutions involving a factor supported on a range ${[X^\varepsilon, X^{-\varepsilon} H]}$ and are quite easy to deal with; the “Type ${d_j}$” terms are Dirichlet convolutions that resemble (non-degenerate portions of) the ${j^{th}}$ divisor function, formed from convolving together ${j}$ portions of ${1}$. The “Type ${d_1}$” and “Type ${d_2}$” terms can be estimated satisfactorily by standard moment estimates for Dirichlet polynomials; this already recovers the result of Mikawa (and our argument is in fact slightly more elementary in that no Kloosterman sum estimates are required). It is the treatment of the “Type ${d_3}$” and “Type ${d_4}$” sums that require some new analysis, with the Type ${d_3}$ terms turning out to be the most delicate. After using an existing moment estimate of Jutila for Dirichlet L-functions, matters reduce to obtaining a family of estimates, a typical one of which (relating to the more difficult Type ${d_3}$ sums) is of the form

$\displaystyle \int_{t - H}^{t+H} |M( \frac{1}{2} + it')|^2\ dt' \ll X^{\varepsilon^2} H \ \ \ \ \ (3)$

for “typical” ordinates ${t}$ of size ${X}$, where ${M}$ is the Dirichlet polynomial ${M(s) := \sum_{n \sim X^{1/3}} \frac{1}{n^s}}$ (a fragment of the Riemann zeta function). The precise definition of “typical” is a little technical (because of the complicated nature of Jutila’s estimate) and will not be detailed here. Such a claim would follow easily from the Lindelof hypothesis (which would imply that ${M(1/2 + it) \ll X^{o(1)}}$) but of course we would like to have an unconditional result.

Expanding out the square in (3) and exploiting the localisation of the integral in ${t'}$ to an interval of length ${2H}$, one is led after standard manipulations to bounding expressions roughly of the shape

$\displaystyle \frac{H}{X^{1/3}} \sum_{\ell \sim X^{1/3}/H} |\sum_{m \sim X^{1/3}} e( \frac{t}{2\pi} \log \frac{m+\ell}{m-\ell} )|.$

The phase ${\frac{t}{2\pi} \log \frac{m+\ell}{m-\ell}}$ can be Taylor expanded as the sum of a main term ${\frac{t \ell}{\pi m}}$ and a lower order term ${\frac{t \ell^3}{3\pi m^3}}$, plus negligible errors. If we could discard the lower order term then we would get quite a good bound using the exponential sum estimates of Robert and Sargos, which control averages of exponential sums with purely monomial phases, with the averaging allowing us to exploit the hypothesis that ${t}$ is “typical”. Figuring out how to get rid of this lower order term caused some inefficiency in our arguments; the best we could do (after much experimentation) was to use Fourier analysis to shorten the sums, estimate a one-parameter average exponential sum with a binomial phase by a two-parameter average with a monomial phase, and then use the van der Corput ${B}$ process followed by the estimates of Robert and Sargos. This rather complicated procedure works up to ${H = X^{8/33+\varepsilon}}$; it may be possible that some alternate way to proceed here could improve the exponent somewhat.

In a sequel to this paper, we will use a somewhat different method to reduce ${H}$ to a much smaller value of ${\log^{O(1)} X}$, but only if we replace the correlations ${\sum_{n \leq X} \Lambda(n) \Lambda(n+h)}$ by either ${\sum_{n \leq X} \Lambda(n) d_k(n+h)}$ or ${\sum_{n \leq X} d_k(n) d_l(n+h)}$, and also we now only save a ${o(1)}$ in the error term rather than ${O_A(\log^{-A} X)}$.

Filed under: math.NT, paper Tagged: circle method, exponential sums, Kaisa Matomaki, Maksym Radziwill, twin primes

## July 06, 2017

### Doug Natelson — Science and policy-making in the US

Over twenty years ago, Congress de-funded its Office of Technology Assessment, which was meant to be a non-partisan group (somewhat analogous to the Congressional Budget Office) that was to help inform congressional decision-making on matters related to technology and public policy.  The argument at the time of the de-funding was that it was duplicative - that there are other federal agencies (e.g., DOE, NSF, NIH, EPA, NOAA) and bodies (the National Academies) that are capable of providing information and guidance to Congress.   In addition, there are think-tanks like the Rand Corporation, IDA, and MITRE, though those groups need direction and a "customer" for their studies.   Throughout this period, the executive branch at least had the Office of Science and Technology Policy, headed by the Presidential Science Advisor, to help in formulating policy.  The level of influence of OSTP and the science advisor waxed and waned depending on the administration.   Science is certainly not the only component of technology-related policy, nor even the dominant one, but for the last forty years (OSTP's existence) and arguably going back to Vannevar Bush, there has been broad bipartisan agreement that science should at least factor into relevant decisions.

We are now in a new "waning" limit, where all of the key staff offices at OSTP are vacant, and there seems to be no plan or timeline to fill them.     The argument from the administration, articulated here, is that OSTP was redundant and that its existence is not required for science to have a voice in policy-making within the executive branch.   While that is technically true, in the sense that the White House can always call up anyone they want and ask for advice, removing science's official seat at the table feels like a big step.  As I've mentioned before, some things are hard to un-do.   Wiping out OSTP for at least the next 3.5 years would send a strong message, as does gutting the science boards of agencies.   There will be long-term effects, both in actual policy-making, and in continuity of knowledge and the pipeline of scientists and engineers interested in and willing to devote time to this kind of public service.   (Note that there is a claim from an unnamed source that there will be a new OSTP director, though there is no timeline.)

### John Baez — Entropy 2018

The editors of the journal Entropy are organizing this conference:

Entropy 2018 — From Physics to Information Sciences and Geometry, 14–16 May 2018, Auditorium Enric Casassas, Faculty of Chemistry, University of Barcelona, Barcelona, Spain.

They write:

One of the most frequently used scientific words is the word “entropy”. The reason is that it is related to two main scientific domains: physics and information theory. Its origin goes back to the start of physics (thermodynamics), but since Shannon, it has become related to information theory. This conference is an opportunity to bring researchers of these two communities together and create a synergy. The main topics and sessions of the conference cover:

• Physics: classical and quantum thermodynamics
• Statistical physics and Bayesian computation
• Geometrical science of information, topology and metrics
• Maximum entropy principle and inference
• Kullback and Bayes or information theory and Bayesian inference
• Entropy in action (applications)

Interdisciplinary contributions from both theoretical and applied perspectives are very welcome, including papers addressing conceptual and methodological developments, as well as new applications of entropy and information theory.

All accepted papers will be published in the proceedings of the conference. Authors of a selection of invited and contributed talks presented during the conference will be invited to submit an extended version of their paper for a special issue of the open access journal Entropy.

## July 05, 2017

### Scott Aaronson — ITCS’2018

My friend Anna Karlin, who chairs the ITCS program committee this year, asked me to post the following announcement, and I’m happy to oblige her.  I’ve enjoyed ITCS every time I’ve attended, and was even involved in the statement that led to ITCS’s creation, although I don’t take direct responsibility for the content of this ad. –SA

The ITCS 2018 Call For Papers is now available!

ITCS is a conference that stands apart from all others. For a decade now, it has been celebrating the vibrancy and unity of our field of Theoretical Computer Science. See this blog post for a detailed discussion of what makes ITCS so cool and the brief description of ITCS’17 at the end of this post.

ITCS seeks to promote research that carries a strong conceptual message  (e.g., introducing a new concept, model or understanding, opening a new line of inquiry within traditional or interdisciplinary areas, introducing new mathematical techniques and methodologies, or new applications of known techniques). ITCS welcomes both conceptual and technical contributions whose contents will advance and inspire the greater theory community.

This year, ITCS will be held at MIT in Cambridge, MA from January 11-14, 2018.

The submission deadline is September 8, 2017, with notification of decisions by October 30, 2017.

Authors should strive to make their papers accessible not only to experts in their subarea, but also to the theory community at large. The committee will place a premium on writing that conveys clearly and in the simplest possible way what the paper is accomplishing.

Ten-page versions of accepted papers will be published in an electronic proceedings of the conference. However, the alternative of publishing a one page abstract with a link to a full PDF will also be available (to accommodate subsequent publication in journals that would not consider results that have been published in preliminary form in a conference proceedings).

You can find all the details in the official Call For Papers.

On last year’s ITCS (by the PC Chair Christos Papadimitriou)

This past ITCS (2017) was by all accounts the most successful ever.  We had 170+ submissions and 61 papers, including 5 “invited papers”, and 90+ registrants, all new records.  There was a voluntary poster session for authors to get a chance to go into more detail, and the famous Graduating Bits event, where the younger ones get their 5 minutes to show off their accomplishments and personality.

The spirit of the conference was invigorating, heartwarming, and great fun.  I believe none of the twelve sessions had fewer than 70 attendees — no parallelism, of course — while the now famous last session was among the best attended and went one hour overtime due to the excitement of discussion (compare with the last large conference that you attended).

## July 03, 2017

### n-Category CaféThe Geometric McKay Correspondence (Part 2)

Last time I sketched how the $E_8$ Dynkin diagram arises from the icosahedron. This time I’ll fill in some details. I won’t fill in all the details, because I don’t know how! Working them out is the goal of this series, and I’d like to enlist your help.

Remember the basic idea. We start with the rotational symmetry group of the icosahedron and take its double cover, getting a 120-element group $\Gamma$ called the binary icosahedral group. Since this is naturally a subgroup of $\mathrm{SU}(2)$ it acts on $\mathbb{C}^2$, and we can form the quotient space

$S = \mathbb{C}^2/\Gamma$

This is a smooth manifold except at the origin — by which I mean the point coming from $0 \in \mathbb{C}^2$. Luckily we can ‘resolve’ this singularity! This means that we can find a smooth manifold $\widetilde{S}$ and a smooth map

$\pi \colon \widetilde{S} \to S$

that’s one-to-one and onto except at the origin. There may be various ways to do this, but there’s one best way, the ‘minimal’ resolution, and that’s what I’ll be talking about.

The origin is where all the fun happens. The map $\pi$ sends 8 spheres to the origin in $\mathbb{C}^2/\Gamma$, one for each dot in the $\mathrm{E}_8$ Dynkin diagram:

Two of these spheres intersect in a point if their dots are connected by an edge; otherwise they’re disjoint.

This is wonderful! So, the question is just how do we really see it? For starters, how do we get our hands on this manifold $\widetilde{S}$ and this map $\pi \colon \widetilde{S} \to S$?

For this we need some algebraic geometry. Indeed, the whole subject of ‘resolving singularities’ is part of algebraic geometry! However, since I still remember my ignorant youth, I want to avoid flinging around the vocabulary of this subject until we actually need it. So, experts will have to pardon my baby-talk. Nonexperts can repay me in cash, chocolate, bitcoins or beer.

What’s $\widetilde{S}$ like? First I’ll come out and tell you, and then I’ll start explaining what the heck I just said.

Theorem. $\widetilde{S}$ is the space of all $\Gamma$-invariant ideals $I \subseteq \mathbb{C}[x,y]$ such that $\mathbb{C}[x,y]/I$ is isomorphic, as a representation of $\Gamma$, to the regular representation of $\Gamma$.

If you want a proof, this is Corollary 12.8 in Kirillov’s Quiver Representations and Quiver Varieties. It’s on page 245, so you’ll need to start by reading lots of other stuff. It’s a great book! But it’s not completely self-contained: for example, right before Corollary 12.8 he brings in a crucial fact without proof: “it can be shown that in dimension 2, if a crepant resolution exists, it is minimal”.

I will not try to prove this theorem; instead I will start explaining what it means.

Suppose you have a bunch of points $p_1, \dots, p_n \in \mathbb{C}^2$. We can look at all the polynomials on $\mathbb{C}^2$ that vanish at these points. What is this collection of polynomials like?

Let’s use $x$ and $y$ as names for the standard coordinates on $\mathbb{C}^2$, so polynomials on $\mathbb{C}^2$ are just polynomials in these variables. Let’s call the ring of all such polynomials $\mathbb{C}[x,y]$. And let’s use $I$ to stand for the collection of such polynomials that vanish at our points $p_1, \dots, p_n$.

Here are two obvious facts about $I$:

A. If $f \in I$ and $g \in I$ then $f + g \in I$.

B. If $f \in I$ and $g \in \mathbb{C}[x,y]$ then $f g \in I$.

We summarize these by saying $I$ is an ideal, and this is why we called it $I$. (So clever!)

Here’s a slightly less obvious fact about $I$:

C. If the points $p_1, \dots, p_n$ are all distinct, then $\mathbb{C}[x,y]/I$ has dimension $n$.

The point is that the value of a function $f \in \mathbb{C}[x,y]$ at a point $p_i$ doesn’t change if we add an element of $I$ to $f$, so this value defines a linear functional on $\mathbb{C}[x,y]/I$ . Guys like this form a basis of linear functionals on $\mathbb{C}[x,y]/I$, so it’s $n$-dimensional.
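This argument can be checked by machine: the quotient $\mathbb{C}[x,y]/I$ is detected by evaluation at the points, so its dimension equals the rank of the matrix whose rows record the values of every monomial of degree at most $n-1$ at each point (degree $n-1$ is enough to interpolate $n$ points). Here is a pure-Python sketch with exact rational arithmetic; the helper names are my own.

```python
# Check fact C numerically: dim C[x,y]/I equals the rank of the
# monomial-evaluation matrix at the given points.
from fractions import Fraction

def matrix_rank(rows):
    """Rank via Gaussian elimination over the rationals (exact)."""
    rows = [list(map(Fraction, r)) for r in rows]
    rank, col, ncols = 0, 0, len(rows[0]) if rows else 0
    while rank < len(rows) and col < ncols:
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            col += 1
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            if rows[r][col]:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
        col += 1
    return rank

def quotient_dim(points):
    """dim C[x,y]/I, where I is the ideal of polynomials vanishing at the
    given points, computed as the rank of the evaluation matrix on all
    monomials x^i y^j of degree < n."""
    n = len(points)
    monos = [(i, d - i) for d in range(n) for i in range(d + 1)]
    rows = [[Fraction(px) ** i * Fraction(py) ** j for (i, j) in monos]
            for (px, py) in points]
    return matrix_rank(rows)

print(quotient_dim([(0, 0), (1, 0), (2, 3), (-1, 5)]))  # 4 distinct points -> 4
print(quotient_dim([(0, 0), (0, 0)]))  # a repeated point drops the rank -> 1
```

Note how a repeated point makes two rows coincide and the rank drop, which is exactly the degeneration that the Hilbert scheme, discussed next, is designed to handle gracefully.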

All this should make you interested in the set of ideals $I$ with $\mathrm{dim}(\mathbb{C}[x,y]/I) = n$. This set is called the Hilbert scheme $\mathrm{Hilb}^n(\mathbb{C}^2)$.

Why is it called a scheme? Well, Hilbert had a bunch of crazy schemes and this was one. Just kidding: actually Hilbert schemes were invented by Grothendieck in 1961. I don’t know why he named them after Hilbert. The kind of Hilbert scheme I’m using is a very basic one, more precisely called the ‘punctual’ Hilbert scheme.

The Hilbert scheme $\mathrm{Hilb}^n(\mathbb{C}^2)$ is a whole lot like the set of unordered $n$-tuples of distinct points in $\mathbb{C}^2$. Indeed, we’ve seen that every such $n$-tuple gives a point in the Hilbert scheme. But there are also other points in the Hilbert scheme! And this is where the fun starts!

Imagine $n$ particles moving in $\mathbb{C}^2$, with their motion described by polynomial functions of time. As long as these particles don’t collide, they define a curve in the Hilbert scheme. But it still works when they collide! When they collide, this curve will hit a point in the Hilbert scheme that doesn’t come from an unordered $n$-tuple of distinct points in $\mathbb{C}^2$. This point describes a ‘type of collision’.

More precisely: $n$-tuples of distinct points in $\mathbb{C}^2$ give an open dense set in the Hilbert scheme, but there are other points in the Hilbert scheme which can be reached as limits of those in this open dense set! The topology here is very subtle, so let’s look at an example.

Let’s look at the Hilbert scheme $\mathrm{Hilb}^2(\mathbb{C}^2)$. Given two distinct points $p_1, p_2 \in \mathbb{C}^2$, we get an ideal

$\{ f \in \mathbb{C}[x,y] \, : \; f(p_1) = f(p_2) = 0 \}$

This ideal is a point in our Hilbert scheme, since $\mathrm{dim}(\mathbb{C}[x,y]/I) = 2$.

But there are other points in our Hilbert scheme! For example, if we take any point $p \in \mathbb{C}^2$ and any vector $v \in \mathbb{C}^2$, there’s an ideal consisting of polynomials that vanish at $p$ and whose directional derivative in the $v$ direction also vanishes at $p$:

$I = \{ f \in \mathbb{C}[x,y] \, : \; f(p) = \lim_{t \to 0} \frac{f(p+t v) - f(p)}{t} = 0 \}$

It’s pretty easy to check that this is an ideal and that $\mathrm{dim}(\mathbb{C}[x,y]/I) = 2$. We can think of this ideal as describing ‘two particles in $\mathbb{C}^2$ that have collided at $p$ with relative velocity some multiple of $v$’.

For example you could have one particle sitting at $p$ while another particle smacks into it while moving with velocity $v$; as they collide the corresponding curve in the Hilbert scheme would hit $I$.

This would also work if the velocity were any multiple of $v$, since we also have

$I = \{ f \in \mathbb{C}[x,y] \, : \; f(p) = \lim_{t \to 0} \frac{f(p+ c t v) - f(p)}{t} = 0 \}$

for any constant $c \ne 0$. And note, this constant can be complex. I’m trying to appeal to your inner physicist, but we’re really doing algebraic geometry over the complex numbers, so we can do weird stuff like multiply velocities by complex numbers.

Or, both particles could be moving and collide at $p$ while their relative velocity was some complex multiple of $v$. As they collide, the corresponding point in the Hilbert scheme would still hit $I$.

But here’s the cool part: such ‘2-particle collisions with specified position and relative velocity’ give all the points in the Hilbert scheme $\mathrm{Hilb}^2(\mathbb{C}^2)$, except of course for those points coming from 2 particles with distinct positions.

What happens when we go to the next Hilbert scheme, $\mathrm{Hilb}^3(\mathbb{C}^2)$? This Hilbert scheme has an open dense set corresponding to triples of particles with distinct positions. It has other points coming from situations where two particles collide with some specified position and relative velocity while a third ‘bystander’ particle sits somewhere else. But it also has points coming from triple collisions. And these are more fancy! Not only velocities but accelerations play a role!

I could delve into this further, but for now I’ll just point you here:

• John Baez, The Hilbert scheme for 3 points on a surface, MathOverflow, June 7, 2017.

The main thing to keep in mind is this. As $n$ increases, there are more and more ways we can dream up ideals $I$ with $\mathrm{dim}(\mathbb{C}[x,y]/I) = n$. But all these ideals consist of functions that vanish at $n$ or fewer points and also obey other equations saying that various linear combinations of their first, second, and higher derivatives vanish. We can think of these ideals as ways for $n$ particles to collide, with conditions on their positions, velocities, accelerations, etc. The total number of conditions needs to be $n$.

Now let’s revisit that description of the wonderful space we’re seeking to understand, $\widetilde{S}$:

Theorem. $\widetilde{S}$ is the space of all $\Gamma$-invariant ideals $I \subseteq \mathbb{C}[x,y]$ such that $\mathbb{C}[x,y]/I$ is isomorphic, as a representation of $\Gamma$, to the regular representation of $\Gamma$.

Since $\Gamma$ has 120 elements, its regular representation — the obvious representation of this group on the space of complex functions on this group — is 120-dimensional. So, points in $\widetilde{S}$ are ideals $I$ with $\mathrm{dim}(\mathbb{C}[x,y]/I) = 120$. So, they’re points in the Hilbert scheme $\mathrm{Hilb}^{120}(\mathbb{C}^2)$.

But they’re not just any old points in this Hilbert scheme! The binary icosahedral group $\Gamma$ acts on $\mathbb{C}^2$ and thus anything associated with it. In particular, it acts on the HIlbert scheme $\mathrm{Hilb}^{120}(\mathbb{C}^2)$. A point in this Hilbert scheme can lie in $\widetilde{S}$ only if it’s invariant under the action of $\Gamma$. And given this, it’s in $\widetilde{S}$ if and only if $\mathbb{C}[x,y]/I$ is isomorphic to the regular representation of $\Gamma$.

Given all this, there’s an easy way to get your hands on a point $I \in \widetilde{S}$. Just take any nonzero element of $\mathbb{C}^2$ and act on it by $\Gamma$. You’ll get 120 distinct points in $\mathbb{C}^2$ — I promise. Do you see why? Then let $I$ be the set of polynomials that vanish on all these points.

In fact, we saw last time that your 120 points will be the vertices of a 600-cell centered at the origin of $\mathbb{C}^2$:

By this construction we get enough points to form an open dense subset of $\widetilde{S}$. These are the points that aren’t mapped to the origin by

$\pi \colon \widetilde{S} \to S$

Alas, it’s the other points in $\widetilde{S}$ that I’m really interested in. As I hope you see, these are certain ‘limits’ of 600-cells that have ‘shrunk to the origin’… or in other words, highly symmetrical ways for 120 points in $\mathbb{C}^2$ to collide at the origin, with some highly symmetrical conditions on their velocities, accelerations, etc.

That’s what I need to understand.

### John Baez — The Geometric McKay Correspondence (Part 2)

Last time I sketched how the $\mathrm{E}_8$ Dynkin diagram arises from the icosahedron. This time I’ll fill in some details. I won’t fill in all the details, because I don’t know how! Working them out is the goal of this series, and I’d like to enlist your help.

(In fact, I’m running this series of posts both here and at the n-Category Café. So far I’m getting many more comments over there. So, to keep the conversation in one place, I’ll disable comments here and urge you to comment over there.)

Remember the basic idea. We start with the rotational symmetry group of the icosahedron and take its double cover, getting a 120-element group $\Gamma$ called the binary icosahedral group. Since this is naturally a subgroup of $\mathrm{SU}(2)$ it acts on $\mathbb{C}^2,$ and we can form the quotient space

$S = \mathbb{C}^2/\Gamma$

This is a smooth manifold except at the origin—by which I mean the point coming from $0 \in \mathbb{C}^2.$ Luckily we can ‘resolve’ this singularity! This means we can find a smooth manifold $\widetilde{S}$ and a smooth map

$\pi \colon \widetilde{S} \to S$

that’s one-to-one and onto except at the origin. There may be various ways to do this, but there’s one best way, the ‘minimal’ resolution, and that’s what I’ll be talking about.

The origin is where all the fun happens. The map $\pi$ sends 8 spheres to the origin in $\mathbb{C}^2/\Gamma,$ one for each dot in the $\mathrm{E}_8$ Dynkin diagram:

Two of these spheres intersect in a point if their dots are connected by an edge; otherwise they’re disjoint.

This is wonderful! So, the question is just how do we really see it? For starters, how do we get our hands on this manifold $\widetilde{S}$ and this map $\pi \colon \widetilde{S} \to S?$

For this we need some algebraic geometry. Indeed, the whole subject of ‘resolving singularities’ is part of algebraic geometry! However, since I still remember my ignorant youth, I want to avoid flinging around the vocabulary of this subject until we actually need it. So, experts will have to pardon my baby-talk. Nonexperts can repay me in cash, chocolate, bitcoins or beer.

What’s $\widetilde{S}$ like? First I’ll come out and tell you, and then I’ll start explaining what the heck I just said.

Theorem. $\widetilde{S}$ is the space of all $\Gamma$-invariant ideals $I \subseteq \mathbb{C}[x,y]$ such that $\mathbb{C}[x,y]/I$ is isomorphic, as a representation of $\Gamma,$ to the regular representation of $\Gamma.$

If you want a proof, this is Corollary 12.8 in Kirillov’s Quiver Representations and Quiver Varieties. It’s on page 245, so you’ll need to start by reading lots of other stuff. It’s a great book! But it’s not completely self-contained: for example, right before Corollary 12.8 he brings in a crucial fact without proof: “it can be shown that in dimension 2, if a crepant resolution exists, it is minimal”.

I will not try to prove the theorem; instead I will start explaining what it means.

Suppose you have a bunch of points $p_1, \dots, p_n \in \mathbb{C}^2.$ We can look at all the polynomials on $\mathbb{C}^2$ that vanish at these points. What is this collection of polynomials like?

Let’s use $x$ and $y$ as names for the standard coordinates on $\mathbb{C}^2,$ so polynomials on $\mathbb{C}^2$ are just polynomials in these variables. Let’s call the ring of all such polynomials $\mathbb{C}[x,y].$ And let’s use $I$ to stand for the collection of such polynomials that vanish at our points $p_1, \dots, p_n.$

Here are two obvious facts about $I$:

A. If $f \in I$ and $g \in I$ then $f + g \in I.$

B. If $f \in I$ and $g \in \mathbb{C}[x,y]$ then $fg \in I.$

We summarize these by saying $I$ is an ideal, and this is why we called it $I.$ (So clever!)

Here’s a slightly less obvious fact about $I$:

C. If the points $p_1, \dots, p_n$ are all distinct, then $\mathbb{C}[x,y]/I$ has dimension $n$.

The point is that the value of a function $f \in \mathbb{C}[x,y]$ at a point $p_i$ doesn’t change if we add an element of $I$ to $f,$ so this value defines a linear functional on $\mathbb{C}[x,y]/I.$ Guys like this form a basis of linear functionals on $\mathbb{C}[x,y]/I,$ so it’s $n$-dimensional.
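
Fact C can be sanity-checked numerically. Here’s a sketch with NumPy; the sample points and the helper name `evaluation_rank` are my own choices for illustration. The evaluation functionals are independent exactly when the matrix of monomials evaluated at the points has full rank:

```python
# Sanity check of fact C (an illustration with points of my choosing):
# evaluating at n distinct points gives n independent linear functionals
# on C[x,y], so C[x,y]/I is n-dimensional.  We evaluate every monomial
# x^i y^j of total degree <= max_deg at the points and compute the rank.
import numpy as np

def evaluation_rank(points, max_deg=3):
    monomials = [(i, j) for i in range(max_deg + 1)
                 for j in range(max_deg + 1) if i + j <= max_deg]
    M = np.array([[px**i * py**j for (i, j) in monomials]
                  for (px, py) in points], dtype=complex)
    return np.linalg.matrix_rank(M)

pts = [(0, 0), (1, 0), (2, 1), (1j, 3)]   # four distinct points in C^2
print(evaluation_rank(pts))               # 4
```

For these points the rank is 4 because their first coordinates $0, 1, 2, i$ are all distinct, so the columns for $1, x, x^2, x^3$ already form an invertible Vandermonde matrix. In general, for $n$ distinct points the rank reaches $n$ once `max_deg` is at least $n - 1$.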

All this should make you interested in the set of ideals $I$ with $\mathrm{dim}(\mathbb{C}[x,y]/I) = n.$ This set is called the Hilbert scheme $\mathrm{Hilb}^n(\mathbb{C}^2).$

Why is it called a scheme? Well, Hilbert had a bunch of crazy schemes and this was one. Just kidding: actually Hilbert schemes were invented by Grothendieck in 1961. I don’t know why he named them after Hilbert. The kind of Hilbert scheme I’m using is a very basic one, more precisely called the ‘punctual’ Hilbert scheme.

The Hilbert scheme $\mathrm{Hilb}^n(\mathbb{C}^2)$ is a whole lot like the set of unordered $n$-tuples of distinct points in $\mathbb{C}^2.$ Indeed, we’ve seen that every such $n$-tuple gives a point in the Hilbert scheme. But there are also other points in the Hilbert scheme! And this is where the fun starts!

Imagine $n$ particles moving in $\mathbb{C}^2,$ with their motion described by polynomial functions of time. As long as these particles don’t collide, they define a curve in the Hilbert scheme. But it still works when they collide! When they collide, this curve will hit a point in the Hilbert scheme that doesn’t come from an unordered $n$-tuple of distinct points in $\mathbb{C}^2.$ This point describes a ‘type of collision’.

More precisely: $n$-tuples of distinct points in $\mathbb{C}^2$ give an open dense set in the Hilbert scheme, but there are other points in the Hilbert scheme which can be reached as limits of those in this open dense set! The topology here is very subtle, so let’s look at an example.

Let’s look at the Hilbert scheme $\mathrm{Hilb}^2(\mathbb{C}^2).$ Given two distinct points $p_1, p_2 \in \mathbb{C}^2,$ we get an ideal

$\{ f \in \mathbb{C}[x,y] \, : \; f(p_1) = f(p_2) = 0 \}$

This ideal is a point in our Hilbert scheme, since $\mathrm{dim}(\mathbb{C}[x,y]/I) = 2.$

But there are other points in our Hilbert scheme! For example, if we take any point $p \in \mathbb{C}^2$ and any vector $v \in \mathbb{C}^2,$ there’s an ideal consisting of polynomials that vanish at $p$ and whose directional derivative in the $v$ direction also vanishes at $p$:

$\displaystyle{ I = \{ f \in \mathbb{C}[x,y] \, : \; f(p) = \lim_{t \to 0} \frac{f(p+t v) - f(p)}{t} = 0 \} }$

It’s pretty easy to check that this is an ideal and that $\mathrm{dim}(\mathbb{C}[x,y]/I) = 2.$ We can think of this ideal as describing ‘two particles in $\mathbb{C}^2$ that have collided at $p$ with relative velocity some multiple of $v$’.
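
To make that ‘pretty easy to check’ concrete, here’s a SymPy sketch of my own, taking $p = (0,0)$ and $v = (1,0)$ for simplicity; in these coordinates one can check that $I = \langle y, x^2 \rangle$. The two functionals $f \mapsto f(p)$ and $f \mapsto \partial_x f(p)$ cut out $I$, and every polynomial agrees modulo $I$ with a combination of $1$ and $x$, so these two classes span the quotient:

```python
# A check for p = (0,0), v = (1,0) (my own sketch):
# I = { f : f(0,0) = 0 and df/dx(0,0) = 0 }, and the classes of 1 and x
# span C[x,y]/I, so the quotient is 2-dimensional.
from sympy import S, diff, symbols

x, y = symbols('x y')

def functionals(f):
    """The two linear conditions cutting out I."""
    f = S(f)
    at_p = {x: 0, y: 0}
    return (f.subs(at_p), diff(f, x).subs(at_p))

# the generators y and x**2 of I satisfy both conditions
assert functionals(y) == (0, 0)
assert functionals(x**2) == (0, 0)

# every monomial agrees, modulo I, with a combination of 1 and x
for m in [S(1), x, y, x**2, x*y, y**2]:
    c0, c1 = functionals(m)
    assert functionals(m - (c0 + c1 * x)) == (0, 0)   # remainder lies in I
```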

For example you could have one particle sitting at $p$ while another particle smacks into it while moving with velocity $v;$ as they collide the corresponding curve in the Hilbert scheme would hit $I.$
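
In coordinates, the collision just described looks like this (again a sketch of mine, with $p = (0,0)$ and $v = (1,0)$): the particles sit at $(0,0)$ and $(t,0)$, their vanishing ideal is generated by $y$ and $x(x-t)$, and as $t \to 0$ the generators tend to $y$ and $x^2$, which satisfy exactly the two conditions defining $I$:

```python
# Two particles at (0,0) and (t,0) colliding as t -> 0 (my own sketch,
# with p = (0,0) and v = (1,0)).
from sympy import symbols, diff, expand

x, y, t = symbols('x y t')

gens_t = [y, expand(x * (x - t))]        # ideal of the points (0,0) and (t,0)
for g in gens_t:
    assert g.subs({x: 0, y: 0}) == 0     # vanishes at (0, 0)
    assert g.subs({x: t, y: 0}) == 0     # vanishes at (t, 0)

gens_0 = [g.subs(t, 0) for g in gens_t]  # let the particles collide
assert gens_0 == [y, x**2]

for g in gens_0:
    assert g.subs({x: 0, y: 0}) == 0              # f(p) = 0
    assert diff(g, x).subs({x: 0, y: 0}) == 0     # derivative along v vanishes
```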

This would also work if the velocity were any multiple of $v,$ since we also have

$\displaystyle{ I = \{ f \in \mathbb{C}[x,y] \, : \; f(p) = \lim_{t \to 0} \frac{f(p+ c t v) - f(p)}{t} = 0 \} }$

for any constant $c \ne 0.$ And note, this constant can be complex. I’m trying to appeal to your inner physicist, but we’re really doing algebraic geometry over the complex numbers, so we can do weird stuff like multiply velocities by complex numbers.

Or, both particles could be moving and collide at $p$ while their relative velocity was some complex multiple of $v.$ As they collide, the corresponding point in the Hilbert scheme would still hit $I.$

But here’s the cool part: such ‘2-particle collisions with specified position and relative velocity’ give all the points in the Hilbert scheme $\mathrm{Hilb}^2(\mathbb{C}^2),$ except of course for those points coming from 2 particles with distinct positions.

What happens when we go to the next Hilbert scheme, $\mathrm{Hilb}^3(\mathbb{C}^2)?$ This Hilbert scheme has an open dense set corresponding to triples of particles with distinct positions. It has other points coming from situations where two particles collide with some specified position and relative velocity while a third ‘bystander’ particle sits somewhere else. But it also has points coming from triple collisions. And these are more fancy! Not only velocities but accelerations play a role!
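
Here’s the simplest triple collision worked out (my own example, not from any reference): let three collinear particles at $(0,0)$, $(t,0)$, $(2t,0)$ slide together. The limit ideal is $\langle y, x^3 \rangle$, cut out by three conditions at the origin: the value, the first derivative, and now also a second derivative, which is where accelerations enter. The quotient is 3-dimensional, spanned by the classes of $1, x, x^2$:

```python
# The simplest triple collision (my own illustration): three collinear
# particles at (0,0), (t,0), (2t,0) slide together as t -> 0.
from sympy import S, diff, expand, symbols

x, y, t = symbols('x y t')

# ideal of the three moving points, and its t -> 0 limit
gens_t = [y, expand(x * (x - t) * (x - 2*t))]
gens_0 = [g.subs(t, 0) for g in gens_t]
assert gens_0 == [y, x**3]

# the limit ideal is cut out by three conditions at the origin:
# value, first derivative, and second derivative along the x-axis
def conditions(f):
    f = S(f)
    at_p = {x: 0, y: 0}
    return (f.subs(at_p), diff(f, x).subs(at_p), diff(f, x, 2).subs(at_p))

assert all(conditions(g) == (0, 0, 0) for g in gens_0)

# so C[x,y]/I is 3-dimensional: every monomial agrees mod I with its
# Taylor polynomial c0 + c1*x + (c2/2)*x**2
for m in [S(1), x, y, x**2, x*y, y**2, x**3]:
    c0, c1, c2 = conditions(m)
    assert conditions(m - (c0 + c1*x + c2*x**2 / 2)) == (0, 0, 0)
```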

I could delve into this further, but for now I’ll just point you here:

• John Baez, The Hilbert scheme for 3 points on a surface, MathOverflow, June 7, 2017.

The main thing to keep in mind is this. As $n$ increases, there are more and more ways we can dream up ideals $I$ with $\mathrm{dim}(\mathbb{C}[x,y]/I) = n.$ But all these ideals consist of functions that vanish at $n$ or fewer points and also obey other equations saying that various linear combinations of their first, second, and higher derivatives vanish. We can think of these ideals as ways for $n$ particles to collide, with conditions on their positions, velocities, accelerations, etc. The total number of conditions needs to be $n$.

Now let’s revisit that description of the wonderful space we’re seeking to understand, $\widetilde{S}$:

Theorem. $\widetilde{S}$ is the space of all $\Gamma$-invariant ideals $I \subseteq \mathbb{C}[x,y]$ such that $\mathbb{C}[x,y]/I$ is isomorphic, as a representation of $\Gamma,$ to the regular representation of $\Gamma.$

Since $\Gamma$ has 120 elements, its regular representation—the obvious representation of this group on the space of complex functions on this group—is 120-dimensional. So, points in $\widetilde{S}$ are ideals $I$ with $\mathrm{dim}(\mathbb{C}[x,y]/I) = 120.$ So, they’re points in the Hilbert scheme $\mathrm{Hilb}^{120}(\mathbb{C}^2).$

But they’re not just any old points in this Hilbert scheme! The binary icosahedral group $\Gamma$ acts on $\mathbb{C}^2$ and thus anything associated with it. In particular, it acts on the Hilbert scheme $\mathrm{Hilb}^{120}(\mathbb{C}^2).$ A point in this Hilbert scheme can lie in $\widetilde{S}$ only if it’s invariant under the action of $\Gamma.$ And given this, it’s in $\widetilde{S}$ if and only if $\mathbb{C}[x,y]/I$ is isomorphic to the regular representation of $\Gamma.$

Given all this, there’s an easy way to get your hands on a point $I \in \widetilde{S}.$ Just take any nonzero element of $\mathbb{C}^2$ and act on it by $\Gamma.$ You’ll get 120 distinct points in $\mathbb{C}^2$ — I promise. Do you see why? Then let $I$ be the set of polynomials that vanish on all these points.
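
If you don’t see why, brute force settles it. Here’s a sketch of mine using the standard ‘unit icosian’ coordinates for $\Gamma$, which I haven’t spelled out in the post: the 8 unit quaternions $\pm 1, \pm i, \pm j, \pm k$, the 16 quaternions $\frac{1}{2}(\pm 1 \pm i \pm j \pm k)$, and the 96 quaternions whose coordinates are even permutations of $\frac{1}{2}(0, \pm 1, \pm\phi^{-1}, \pm\phi)$. We build these 120 quaternions, confirm they form a group, and confirm that the orbit of a nonzero point has 120 distinct elements:

```python
# Brute-force check (my own sketch): build the 120 unit icosians, verify
# closure under multiplication, and verify that acting on a nonzero
# quaternion gives 120 distinct points.
import itertools, math

PHI = (1 + math.sqrt(5)) / 2   # golden ratio

def qmul(p, q):
    # quaternion product, components (a, b, c, d) meaning a + bi + cj + dk
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def key(q):
    # round to 6 decimals: coarse enough to absorb floating-point error,
    # fine enough to separate the distinct icosian coordinates
    return tuple(round(c, 6) for c in q)

def binary_icosahedral():
    G = set()
    for i in range(4):                                  # 8: +-1, +-i, +-j, +-k
        for s in (1.0, -1.0):
            q = [0.0, 0.0, 0.0, 0.0]
            q[i] = s
            G.add(key(q))
    for signs in itertools.product((1, -1), repeat=4):  # 16: (+-1 +-i +-j +-k)/2
        G.add(key(tuple(s / 2 for s in signs)))
    base = (0.0, 1.0, 1 / PHI, PHI)                     # 96: even permutations
    evens = [p for p in itertools.permutations(range(4))
             if sum(p[i] > p[j] for i in range(4) for j in range(i + 1, 4)) % 2 == 0]
    for p in evens:
        for signs in itertools.product((1, -1), repeat=4):
            G.add(key(tuple(signs[i] * base[p[i]] / 2 for i in range(4))))
    return G

G = binary_icosahedral()
assert len(G) == 120                                    # |Gamma| = 120
assert all(key(qmul(g, h)) in G for g in G for h in G)  # closed under multiplication

x = (1.0, 0.5, 0.25, 0.0)                               # a nonzero quaternion
orbit = {key(qmul(g, x)) for g in G}
assert len(orbit) == 120                                # 120 distinct points
print(len(G), len(orbit))                               # 120 120
```

The distinctness of the orbit also shows why any nonzero $x$ works: if $g_1 \ne g_2$ then $g_1 x \ne g_2 x$, since $x$ is invertible in the quaternions.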

In fact, we saw last time that your 120 points will be the vertices of a 600-cell centered at the origin of $\mathbb{C}^2$:

By this construction we get enough points to form an open dense subset of $\widetilde{S}.$ These are the points that aren’t mapped to the origin by

$\pi \colon \widetilde{S} \to S$

Alas, it’s the other points in $\widetilde{S}$ that I’m really interested in. As I hope you see, these are certain ‘limits’ of 600-cells that have ‘shrunk to the origin’… or in other words, highly symmetrical ways for 120 points in $\mathbb{C}^2$ to collide at the origin, with some highly symmetrical conditions on their velocities, accelerations, etc.

That’s what I need to understand.

### n-Category Café — The Geometric McKay Correspondence (Part 1)

The ‘geometric McKay correspondence’, actually discovered by Patrick du Val in 1934, is a wonderful relation between the Platonic solids and the ADE Dynkin diagrams. In particular, it sets up a connection between two of my favorite things, the icosahedron:

and the $\mathrm{E}_8$ Dynkin diagram:

When I recently gave a talk on this topic, I realized I didn’t understand it as well as I’d like. Since then I’ve been making progress with the help of this book:

• Alexander Kirillov Jr., Quiver Representations and Quiver Varieties, AMS, Providence, Rhode Island, 2016.

I now think I glimpse a way forward to a very concrete and vivid understanding of the relation between the icosahedron and $\mathrm{E}_8$. It’s really just a matter of taking the ideas in this book and working them out concretely in this case. But it takes some thought, at least for me. I’d like to enlist your help.

The rotational symmetry group of the icosahedron is a subgroup of $\mathrm{SO}(3)$ with 60 elements, so its double cover up in $\mathrm{SU}(2)$ has 120 elements. This double cover is called the binary icosahedral group, but I’ll call it $\Gamma$ for short.

This group $\Gamma$ is the star of the show, the link between the icosahedron and $\mathrm{E}_8$. To visualize this group, it’s good to think of $\mathrm{SU}(2)$ as the unit quaternions. This lets us think of the elements of $\Gamma$ as 120 points in the unit sphere in 4 dimensions. They are in fact the vertices of a 4-dimensional regular polytope, which looks like this:

It’s called the 600-cell.

Since $\Gamma$ is a subgroup of $\mathrm{SU}(2)$ it acts on $\mathbb{C}^2$, and we can form the quotient space

$S = \mathbb{C}^2/\Gamma$

This is a smooth manifold except at the origin—that is, the point coming from $0 \in \mathbb{C}^2$. There’s a singularity at the origin, and this is where $\mathrm{E}_8$ is hiding! The reason is that there’s a smooth manifold $\widetilde{S}$ and a map

$\pi : \widetilde{S} \to S$

that’s one-to-one and onto except at the origin. It maps 8 spheres to the origin! There’s one of these spheres for each dot here:

Two of these spheres intersect in a point if their dots are connected by an edge; otherwise they’re disjoint.

The challenge is to find a nice concrete description of $\widetilde{S}$, the map $\pi : \widetilde{S} \to S$, and these 8 spheres.

But first it’s good to get a mental image of $S$. Each point in this space is a $\Gamma$ orbit in $\mathbb{C}^2$, meaning a set like this:

$\{g x : \; g \in \Gamma \}$

for some $x \in \mathbb{C}^2$. For $x = 0$ this set is a single point, and that’s what I’ve been calling the ‘origin’. In all other cases it’s 120 points, the vertices of a 600-cell in $\mathbb{C}^2$. This 600-cell is centered at the point $0 \in \mathbb{C}^2$, but it can be big or small, depending on the magnitude of $x$.

So, as we take a journey starting at the origin in $S$, we see a point explode into a 600-cell, which grows and perhaps also rotates as we go. The origin, the singularity in $S$, is a bit like the Big Bang.

Unfortunately not every 600-cell centered at the origin is of the form I’ve shown:

$\{g x : \; g \in \Gamma \}$

It’s easiest to see this by thinking of points in 4d space as quaternions rather than elements of $\mathbb{C}^2$. Then the points $g \in \Gamma$ are unit quaternions forming the vertices of a 600-cell, and multiplying $g$ on the right by $x$ dilates this 600-cell and also rotates it… but we don’t get arbitrary rotations this way. To get an arbitrarily rotated 600-cell we’d have to use both a left and right multiplication, and consider

$\{x g y : \; g \in \Gamma \}$

for a pair of quaternions $x, y$.

Luckily, there’s a simpler picture of the space $S$. It’s the space of all regular icosahedra centered at the origin in 3d space!

To see this, we start by switching to the quaternion description, which says

$S = \mathbb{H}/\Gamma$

Specifying a point $x \in \mathbb{H}$ amounts to specifying the magnitude $\|x\|$ together with $x/\|x\|$, which is a unit quaternion, or equivalently an element of $\mathrm{SU}(2)$. So, specifying a point in

$\{g x : \; g \in \Gamma \} \in \mathbb{H}/\Gamma$

amounts to specifying the magnitude $\|x\|$ together with a point in $\mathrm{SU}(2)/\Gamma$. But $\mathrm{SU}(2)$ modulo the binary icosahedral group $\Gamma$ is the same as $\mathrm{SO}(3)$ modulo the icosahedral group (the rotational symmetry group of an icosahedron). Furthermore, $\mathrm{SO}(3)$ modulo the icosahedral group is just the space of unit-sized icosahedra centered at the origin of $\mathbb{R}^3$.

So, specifying a point

$\{g x : \; g \in \Gamma \} \in \mathbb{H}/\Gamma$

amounts to specifying a nonnegative number $\|x\|$ together with a unit-sized icosahedron centered at the origin of $\mathbb{R}^3$. But this is the same as specifying an icosahedron of arbitrary size centered at the origin of $\mathbb{R}^3$. There’s just one subtlety: we allow the size of this icosahedron to be zero, but then the way it’s rotated no longer matters.

So, $S$ is the space of icosahedra centered at the origin, with the ‘icosahedron of zero size’ being a singularity in this space. When we pass to the smooth manifold $\widetilde{S}$, we replace this singularity with 8 spheres, intersecting in a pattern described by the $\mathrm{E}_8$ Dynkin diagram.

Points on these spheres are limiting cases of icosahedra centered at the origin. We can approach these points by letting an icosahedron centered at the origin shrink to zero size in a clever way, perhaps spinning about wildly as it does.

I don’t understand this last paragraph nearly as well as I’d like! I’m quite sure it’s true, and I know a lot of relevant information, but I don’t see it. There should be a vivid picture of how this works, not just an abstract argument. Next time I’ll start trying to assemble the material that I think needs to go into building this vivid picture.

### Tommaso Dorigo — EPS 2017: 1000 Physicists In Venice

The 2017 edition of the European Physical Society conference will take place in the Lido of Venice this week, from July 5th to 12th. For the first time in many years (30, as of now) a big international conference in HEP is being organized in Italy, a fact I found surprising at first. When I first learned it, the count stood at 26 years, and I was on a local organizing committee that tried to propose another conference in the same location. Although excellent, our proposal was ditched, and from that episode I learned I should not be too surprised by the hiatus.

## July 02, 2017

### Chad Orzel — Physics Blogging Round-Up: June

To make up for last month’s long delay in posting, I’ll knock out this month’s recap of Forbes blog posts really quickly. Also, I still have Vacation Brain, so writing anything really new isn’t in the cards…

What Should Non-Scientists Learn From Physics?: You probably won’t be surprised to hear that, in my opinion, it’s not a specific set of facts, but an attitude toward the world.

Softball Physics: How Far Can You Run While The Ball Is In The Air?: In which SteelyKid learning softball’s “tag up” rule the hard way leads to an interesting problem in physics.

How Long Would A Fidget Spinner Spin In Space?: If we’re going to have a bunch of the things in the house, I might as well get a blog post out of it…

How Laser Cooling Continues To Open Up New Possibilities For Physics: A delayed reaction to some talks at DAMOP about new research areas that are rooted in the development of laser cooling back in the 1980’s. Written while in Mexico on vacation.

The Physics Of Vacation: It’s All About Phase Transitions: Another post written in Mexico while on vacation, this one about being on vacation, specifically the way phase transitions of water have a huge impact on the experience.

Predictably enough, the post capitalizing on a recent fad is the runaway winner, traffic-wise. I was disappointed that the softball one didn’t get more traction, because I thought it was cute. Probably should’ve put “Baseball” in the title rather than “Softball,” since I mention both, and some baseball fans are louts. The two written on vacation went basically nowhere, traffic-wise; this is probably partly because I was on vacation and not able to actively social-media bomb them, partly a summer melt thing (traffic always dips in the summer) and partly a matter of topic selection. But those are the things I felt like writing about, and that’s the whole point, here…

## July 01, 2017

### Backreaction — To understand the foundations of physics, study numerology

Numbers speak. Once upon a time, we had problems in the foundations of physics. Then we solved them. That was 40 years ago. Today we spend most of our time discussing non-problems. Here is one of these non-problems. Did you know that the universe is spatially almost flat? There is a number in the cosmological concordance model called the “curvature parameter” that, according to

## June 30, 2017

### Backreaction — Away Note

I’ll be traveling the next two weeks. First to Cambridge to celebrate Stephen Hawking’s 75th birthday (which was in January), then to Trieste for a conference on “Probing the spacetime fabric: from concepts to phenomenology.”  Rant coming up later today, but after that please prepare for a slow time.

## June 29, 2017

### Doug Natelson — Condensed matter/nano resources for science writers and journalists

I've been thinking about and planning to put together some resources about condensed matter physics and nanoscience that would be helpful for science writers and journalists.  Part of the motivation here is rather similar to that of doing outreach work with teachers - you can get a multiplicative effect compared to working with individual students, since each teacher interacts with many students.  Along those lines, helping science writers, journalists, and editors might have an impact on a greater pool than just those who directly read my own (by necessity, limited) writing.  I've had good exchanges of emails with some practitioners about this, and that has been very helpful, but I'd like more input from my readers.

In answer to a few points that have come up in my email discussions:

• Why do this?  Because I'd like to see improved writing out there.  I'd like the science-interested public to understand that there is amazing, often deep physics around them all the time - that there are deep ideas at work in your iPhone or your morning cup of coffee, and that those are physics, too.  I know that high energy ("Building blocks of the universe!") and astro ("Origins of everything!  Alien worlds!  Black holes!") are very marketable.  I'd be happy to guide a little more of the bandwidth toward condensed matter/materials/real nano (not sci-fi) popularization.  I think the perception that high energy = all of physics goes a long way toward explaining why so many people (incl politicians) think that basic research is pie-in-the-sky-useless, and everything else is engineering that should be funded by companies.  I do think online magazines like Quanta and sites like Inside Science are great and headed in a direction I like.  I wish IFLS was more careful, but I admire their reach.
• What is the long-range audience and who are the stakeholders?  I'd like CMP and nano to reach a broad audience.  There are serious technically trained people (faculty, researchers, some policy makers) who already know a lot of what I'd write about, though some of them still enjoy reading prose that is well written.  I am thinking more about the educated lay-public - the people who watch Nova or Scientific American Frontiers or Mythbusters or Through The Wormhole (bleah) or Cosmos, or who read Popular Science or Discovery or Scientific American or National Geographic.  Those are people who want to know more about science, or at least aren't opposed to the idea.  I guess the stakeholders would be the part of the physics and engineering community that work on solid state and nano things, but don't have the time or inclination to do serious popular communication themselves.  I think that community is often disserved by (1) the popular portrayal that high energy = all of physics and crazy speculative stuff = actual tested science; (2) hype-saturated press releases that claim breakthroughs or feel the need to promise "1000x faster computers" when real, fundamental results are often downplayed; and (3) a focus in the field that only looks at applications rather than properly explaining the context of basic research.
• You know that journalists usually have to cover many topics and have very little time, right?  Yes.  I also know that just because I make something doesn't mean anyone would necessarily use it.  Hence, why I'm looking for input.   Maybe something like a CM/nano FAQ would be helpful.
• You know that long-form non-fiction writers love to do their own topical research, right?  Yes, and if there was something I could do to help those folks save time and avoid subject matter pitfalls, I'd feel like I'd accomplished something.
• You could do more writing yourself, or give regular tips/summaries to journalists and editors via twitter, your blog, etc.  That's true, and I plan to try to do more, but as I said at the top, the point is not for me to become a professional journalist (in the sense of providing breaking news tidbits) or writer, but to do what I can to help those people who have already chosen that vocation.
• You know there are already pros who worry about quality of science writing and journalism, right?  Yes, and they have some nice reading material.  For example, this and this from the Berkeley Science Review; this from the Guardian; this from the National Association of Science Writers.
So, writers and editors that might read this:  What would actually be helpful to you along these lines, if anything?  Some primer material on some topics more accessible and concise than Wikipedia?

### Georg von Hippel — Lattice 2017, Day Six

On the last day of the 2017 lattice conference, there were plenary sessions only. The first plenary session opened with a talk by Antonio Rago, who gave a "community review" of lattice QCD on new chips. New chips in the case of lattice QCD means mostly Intel's new Knights Landing architecture, to whose efficient use significant effort is devoted by the community. Different groups pursue very different approaches, from purely OpenMP-based C codes to mixed MPI/OpenMP-based codes maximizing the efficiency of the SIMD pieces using assembler code. The new NVIDIA Tesla Volta and Intel's OmniPath fabric also featured in the review.

The next speaker was Zohreh Davoudi, who reviewed lattice inputs for nuclear physics. While simulating heavier nuclei directly on the lattice is still infeasible, nuclear phenomenologists appear to be very excited about the first-principles lattice QCD simulations of multi-baryon systems now reaching maturity, because these can be used to tune and validate nuclear models and effective field theories, from which predictions for heavier nuclei can then be derived so as to be based ultimately on QCD. The biggest controversy in the multi-baryon sector at the moment is due to HALQCD's claim that the multi-baryon mass plateaux seen by everyone except HALQCD (who use their own method based on Bethe-Salpeter amplitudes) are probably fakes or "mirages", and that using the Lüscher method to determine multi-baryon binding would require totally unrealistic source-sink separations of over 10 fm. The volume independence of the bound-state energies determined from the allegedly fake plateaux, as contrasted to the volume dependence of the scattering-state energies so extracted, provides a fairly strong defence against this claim, however. There are also new methods to improve the signal-to-noise ratio for multi-baryon correlation functions, such as phase reweighting.

This was followed by a talk on the tetraquark candidate Zc(3900) by Yoichi Ikeda, who spent a large part of his talk on reiterating the HALQCD claim that the Lüscher method requires unrealistically large time separations. During the questions, William Detmold raised the important point that there would be no excited-state contamination at all if the interpolating operator created an eigenstate of the QCD Hamiltonian, and that for improved interpolating operators (such as generated by the variational method) one can get rather close to this situation, so that the HALQCD criticism seems hardly applicable. As for the Zc(3900), HALQCD find it to be not a resonance, but a kinematic cusp, although this conclusion is based on simulations at rather heavy pion masses (mπ > 400 MeV).

The final plenary session was devoted to the anomalous magnetic moment of the muon, which is perhaps the most pressing topic for the lattice community, since the new (g-2) experiment is now running, and theoretical predictions matching the improved experimental precision will be needed soon. The first speaker was Christoph Lehner, who presented RBC/UKQCD's efforts to determine the hadronic vacuum polarization contribution to aμ with high precision. The strategy for this consists of two main ingredients: one is to minimize the statistical and systematic errors of the lattice calculation by using a full-volume low-mode average via a multigrid Lanczos method, explicitly including the leading effects of strong isospin breaking and QED, and the contribution from disconnected diagrams, and the other is to combine lattice and phenomenology to take maximum advantage of their respective strengths. This is achieved by using the time-momentum representation with a continuum correlator reconstructed from the R-ratio, which turns out to be quite precise at large times, but more uncertain at shorter times, which is exactly the opposite of the situation for the lattice correlator. Using a window which continuously switches over from the lattice to the continuum at time separations around 1.2 fm then minimizes the overall error on aμ.

The last plenary talk was given by Gilberto Colangelo, who discussed the new dispersive approach to the hadronic light-by-light scattering contribution to aμ. Up to now the theory results for this small, but important, contribution have been based on models, which will always have an a priori unknown and irreducible systematic error, although lattice efforts are beginning to catch up. For a dispersive approach based on general principles such as analyticity and unitarity, the hadronic light-by-light tensor first needs to be Lorentz decomposed, which gives 138 tensor structures, of which 136 are independent; gauge invariance permits only 54 of these, of which 7 are distinct, with the rest related by crossing symmetry; care has to be taken to choose the tensor basis such that there are no kinematic singularities. A master formula in terms of 12 linear combinations of these components has been derived by Gilberto and collaborators, and using one- and two-pion intermediate states (and neglecting the rest) in a systematic fashion, they have been able to produce a model-independent theory result with small uncertainties based on experimental data for pion form factors and scattering amplitudes.

The closing remarks were delivered by Elvira Gamiz, who advised participants that the proceedings deadline of 18 October will be strict, because this year's proceedings will not be published in PoS, but in EPJ Web of Conferences, who operate a much stricter deadline policy. Many thanks to Elvira for organizing such a splendid lattice conference! (I can appreciate how much work that is, and I think you should have received far more applause.)

Huey-Wen Lin invited the community to East Lansing, Michigan, USA, for the Lattice 2018 conference, which will take place 22-28 July 2018 on the campus of Michigan State University.

The IAC announced that Lattice 2019 will take place in Wuhan, China.

And with that the conference ended. I stayed in Granada for a couple more days of sightseeing and relaxation, but the details thereof will be of legitimate interest only to a very small subset of my readership (whom I keep updated via different channels), and I therefore conclude my coverage and return the blog to its accustomed semi-hiatus state.

## June 27, 2017

### Jordan Ellenberg — When random people give money to random other people

A post on Decision Science about a problem of Uri Wilensky’s has been making the rounds:

Imagine a room full of 100 people with 100 dollars each. With every tick of the clock, every person with money gives a dollar to one randomly chosen other person. After some time progresses, how will the money be distributed?

People often expect the distribution to be close to uniform.  But this isn’t right; the simulations in the post show clearly that inequality of wealth rapidly appears and then persists (though each individual person bobs up and down from rich to poor).  What’s going on?  Why would this utterly fair and random process generate winners and losers?
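Here’s one way to see the effect for yourself: a minimal simulation of the game in Python (the tick count and seed are arbitrary choices of mine, not from the post):

```python
import random

def simulate(n_people=100, start=100, ticks=5000, seed=1):
    """Dollar game: each tick, every person who has money gives one
    dollar to a uniformly chosen *other* person, simultaneously."""
    rng = random.Random(seed)
    money = [start] * n_people
    for _ in range(ticks):
        givers = [i for i in range(n_people) if money[i] > 0]
        # choose all recipients first, then apply the transfers,
        # so the gifts within a tick are simultaneous
        recipients = []
        for i in givers:
            j = rng.randrange(n_people - 1)
            if j >= i:
                j += 1  # uniform over the other n_people - 1 people
            recipients.append(j)
        for i in givers:
            money[i] -= 1
        for j in recipients:
            money[j] += 1
    return money

wealth = simulate()
print(min(wealth), max(wealth))
```

With essentially any seed, the gap between richest and poorest quickly becomes large and stays large, even though the total is conserved at $10,000.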

Here’s one way to think about it.  The possible states of the system are the tuples of nonnegative integers $(m_1, \dots, m_{100})$ summing to 10,000; if you like, the lattice points inside a simplex.  (From now on, let’s write N for 100 because who cares if it’s 100?)

The process is a random walk on a graph G, whose vertices are these states and where two vertices are connected if you can get from one to the other by taking a dollar from one person and giving it to another.  We are asking:  when you run the random walk for a long time, where are you on this graph?  Well, we know what the stationary distribution for random walk on an undirected graph is; it gives each vertex a probability proportional to its degree.  On a regular graph, you get uniform distribution.
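That degree-proportionality is easy to verify exactly on a toy example; here is a quick Python check (the four-vertex path graph is my own choice of example), using exact rational arithmetic:

```python
from fractions import Fraction

# Random walk on the undirected path 0-1-2-3: interior vertices have
# degree 2, endpoints degree 1, so the stationary distribution should
# be proportional to (1, 2, 2, 1).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
deg = {v: len(ns) for v, ns in adj.items()}

# candidate stationary distribution: pi(v) = deg(v) / sum of degrees
total = sum(deg.values())
pi = {v: Fraction(d, total) for v, d in deg.items()}

# one step of the walk: pi'(v) = sum over neighbours u of pi(u)/deg(u)
pi_next = {v: sum(pi[u] / deg[u] for u in adj[v]) for v in adj}
print(pi_next == pi)  # True: pi is exactly invariant
```

On a regular graph all the degrees agree, so the same check gives the uniform distribution.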

Our state graph G isn’t regular, but it almost is; to see how close, let’s count.  The number of states is

$\binom{N^2 + N - 1}{N-1}$

and, of these, the ones with degree N are exactly those in which nobody’s out of money; after giving each person one dollar, the number of ways to distribute the remaining N^2 - N dollars is

$\binom{N^2 - 1}{N-1}$

and so the proportion of states in which nobody’s out of money is about

$\frac{(N^2 - 1)^N}{(N^2 + N - 1)^N} \sim (1-1/N)^N \sim 1/e$.

So, apart from those states where somebody’s broke, in the long run every possible state is equally likely; we are just as likely to see $9,901 in one person’s hands and everybody else with $1 as we are to see exact equidistribution again.
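The ratio of the two binomial counts above can be evaluated exactly; a quick Python check (the sample values of N are my own choices) of the fraction of states in which every player still has at least one dollar:

```python
import math

def nobody_broke_fraction(N):
    # states: nonnegative integer vectors of length N summing to N^2
    # (stars and bars); those with every entry >= 1 correspond to
    # distributing the remaining N^2 - N dollars freely
    total = math.comb(N**2 + N - 1, N - 1)
    all_positive = math.comb(N**2 - 1, N - 1)
    return all_positive / total

for N in (10, 100, 1000):
    print(N, nobody_broke_fraction(N))
# the fractions approach 1/e ≈ 0.3679 as N grows
```

Python’s exact integer arithmetic handles the huge binomial coefficients without any overflow.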

What is a random lattice point in this simplex like?  Good question!  An argument just like the one above shows that the probability that nobody goes below $c is on the order of e^{-c}, at least when c is small relative to N; in other words, it’s highly likely that somebody’s very nearly out of money.

If X is the maximal amount of money held by any player, what’s the distribution of X?  I didn’t immediately see how to figure this out.  You might consider the continuous version, where you pick a point at random from the real simplex $(x_1, \dots, x_N) \in \mathbf{R}^N: \sum x_i = N^2$.  Equivalently: break a stick at N-1 randomly chosen points; what is the length of the longest piece?  This is a well-studied problem; the mean size of the longest piece is about N log N.  So I guess I think maybe that’s the expected value of the net worth of the richest player?  But it’s not obvious to me whether you can safely approximate the finite problem by its continuous limit (which corresponds to the case where we keep the number of players at N but reduce the step size, so that each player can give another player a cent, or a picocent, or whatever).

What happens if you give each of the N players just one dollar?  Now the uniformity really breaks down, because it’s incredibly unlikely that nobody’s broke.  The probability distribution on the set of (m_1, \dots, m_N) summing to N assigns each vector a probability proportional to the size of its support (i.e. the number of m_i that are nonzero).  That must be a well-known distribution, right?  What does the corresponding distribution on partitions of N look like?

Update:  Kenny Easwaran points out that this is basically the same computation physicists do when they compute the Boltzmann distribution, which was new to me.

## June 25, 2017

### Georg von Hippel — Lattice 2017, Day Five

The programme for today took account of the late end of the conference dinner in the early hours of the day by moving the plenary sessions back by half an hour.
The first plenary talk of the day was given by Ben Svetitsky, who reviewed the status of BSM investigations using lattice field theory. An interesting point Ben raised was that these studies go not so much "beyond" the Standard Model (like SUSY, dark matter, or quantum gravity would), but "behind" or "beneath" it by seeking a deeper explanation of the seemingly unnaturally small Higgs mass, flavour hierarchies, and other unreasonable-looking features of the SM. The original technicolour theory is quite dead, being Higgsless, but "walking" technicolour models are an area of active investigation. These models have a β-function that comes close to zero at some large coupling, leading to an almost conformal behaviour near the corresponding IR almost-fixed point. In such almost conformal theories, a light scalar (i.e. the Higgs) could arise naturally as the pseudo-Nambu-Goldstone boson of the approximate dilatation symmetry of the theory. A range of different gauge groups, numbers of flavours, and fermion representations are being investigated, with the conformal or quasi-conformal status of some of these being apparently controversial. An alternative approach to Higgs compositeness has the Higgs appear as the exact Nambu-Goldstone boson of some spontaneous symmetry breaking which keeps SU(2)L⨯U(1) intact, with the Higgs potential being generated at the loop level by the coupling to the SM sector. Some models of this type are also being actively investigated.

The next plenary speaker was Stefano Forte, who reviewed the status and prospects of determining the strong coupling αs from sources other than the lattice. The PDG average for αs is a weighted average of six values, four of which are the pre-averages of the determinations from the lattice, from τ decays, from jet rates and shapes, and from parton distribution functions, and two of which are the determinations from the global electroweak fit and from top production at the LHC.
Each of these channels has its own systematic issues, and one problem can be that overaggressive error estimates give too much weight to the corresponding determination, leading to a statistically implausible scatter of results in some channels. It should be noted, however, that the lattice results are all quite compatible, with the most precise results by ALPHA and by HPQCD (which use different lattice formulations and completely different analysis methods) sitting right on top of each other.

This was followed by a presentation by Thomas Korzec of the determination of αs by the ALPHA collaboration. I cannot really attempt to do justice to this work in a blog post, so I encourage you to look at their paper. By making use of both the Schrödinger functional and the gradient flow coupling in finite volume, they are able to non-perturbatively run αs between hadronic and perturbative scales with high accuracy.

After the coffee break, Erhard Seiler reviewed the status of the complex Langevin method, which is one of the leading methods for simulating actions with a sign problem, e.g. at finite chemical potential or with a θ term. Unfortunately, it is known that the complex Langevin method can sometimes converge to wrong results, and this can be traced to the violation by the complexification of the conditions under which the (real) Langevin method is justified, of which the development of zeros in e^{-S} seems to be the most important case, giving rise to poles in the force which will violate ergodicity. There seems to be a lack of general theorems for situations like this, although the complex Langevin method has apparently been shown to be correct under certain difficult-to-check conditions. One of the best hopes for simulating with complex Langevin seems to be the dynamical stabilization proposed by Benjamin Jäger and collaborators.

This was followed by Paulo Bedaque discussing the prospects of solving the sign problem using the method of thimbles and related ideas.
As far as I understand, thimbles are permissible integration regions in complexified configuration space on which the imaginary part of the action is constant, and which can thus be integrated over without a sign problem. A holomorphic flow that is related both to the gradient flow and the Hamiltonian flow can be constructed so as to flow from the real integration region to the thimbles, and based on this it appears to have become possible to solve some toy models with a sign problem, even going so far as to perform real-time simulations in the Keldysh-Schwinger formalism in Euclidean space (if I understood correctly).

In the afternoon, there was a final round of parallel sessions, one of which was again dedicated to the anomalous magnetic moment of the muon, this time focusing on the very difficult hadronic light-by-light contribution, for which the Mainz group has some very encouraging first results.

## June 23, 2017

### Georg von Hippel — Lattice 2017, Days Three and Four

Wednesday was the customary short day, with parallel sessions in the morning, and time for excursions in the afternoon. I took the "Historic Granada" walking tour, which included visits to the Capilla Real and the very impressive Cathedral of Granada.

The first plenary session of today had a slightly unusual format in that it was a kind of panel discussion on the topic of axions and QCD topology at finite temperature. After a brief outline by Mikko Laine, the session chair, the session started off with a talk by Guy Moore on the role of axions in cosmology and the role of lattice simulations in this context. Axions arise in the Peccei-Quinn solution to the strong CP problem and are a potential dark matter candidate.
Guy presented some of his own real-time lattice simulations in classical field theory for axion fields, which exhibit the annihilation of cosmic-string-like vortex defects and associated axion production, and pointed out the need for accurate lattice QCD determinations of the topological susceptibility in the temperature range of 500-1200 MeV in order to fix the mass of the axion more precisely from the dark matter density (assuming that dark matter consists of axions).

The following talks were all fairly short. Claudio Bonati presented algorithmic developments for simulations of the topological properties of high-temperature QCD. The long autocorrelations of the topological charge at small lattice spacing are a problem. Metadynamics, which biases the Monte Carlo evolution in a non-Markovian manner so as to sample the configuration space more efficiently, appears to be of help.

Hidenori Fukaya reviewed the question of whether U(1)A remains anomalous at high temperature, which he claimed (both on theoretical grounds and based on numerical simulation results) it doesn't. I didn't quite understand this, since as far as I understand the axial anomaly, it is an operator identity, which will remain true even if both sides of the identity were to happen to vanish at high enough temperature, which is all that seemed to be shown; but this may just be my ignorance showing.

Tamas Kovacs showed recent results on the temperature-dependence of the topological susceptibility of QCD. By a careful choice of algorithms based on physical considerations, he could measure the topological susceptibility over a wide range of temperatures, showing that it becomes tiny at large temperature.

Then the speakers all sat on the stage as a panel and fielded questions from the audience.
Perhaps it might have been a good idea to somehow force the speakers to engage each other; as it was, the advantage of this format over simply giving each speaker a longer time for answering questions didn't immediately become apparent to me.

After the coffee break, things returned to the normal format. Boram Yoon gave a review of lattice determinations of the neutron electric dipole moment. Almost any BSM source of CP violation must show up as a contribution to the neutron EDM, which is therefore a very sensitive probe of new physics. The very strong experimental limits on any possible neutron EDM imply e.g. |θ| < 10^{-10} in QCD through lattice measurements of the effects of a θ term on the neutron EDM. Similarly, limits can be put on any quark EDMs or quark chromoelectric dipole moments. The corresponding lattice simulations have to deal with sign problems, and the usual techniques (Taylor expansions, simulations at complex θ) are employed to get past this, and seem to be working very well.

The next plenary speaker was Phiala Shanahan, who showed recent results regarding the gluon structure of hadrons and nuclei. This line of research is motivated by the prospect of an electron-ion collider that would be particularly sensitive to the gluon content of nuclei. For gluonic contributions to the momentum and spin decomposition of the nucleon, there are some fresh results from different groups. For the gluonic transversity, Phiala and her collaborators have performed first studies in the φ system. The gluonic radii of small nuclei have also been looked at, with no deviation from the single-nucleon case visible at the present level of accuracy.

The 2017 Kenneth Wilson Award was awarded to Raúl Briceño for his groundbreaking contributions to the study of resonances in lattice QCD.
Raúl has been deeply involved both in the theoretical developments behind extending the reach of the Lüscher formalism to more and more complicated situations, and in the numerical investigations of resonance properties rendered possible by those developments.

After the lunch break, there were once again parallel sessions, two of which were dedicated entirely to the topic of the hadronic vacuum polarization contribution to the anomalous magnetic moment of the muon, which has become one of the big topics in lattice QCD.

In the evening, the conference dinner took place. The food was excellent, and the Flamenco dancers who arrived at midnight (we are in Spain after all, where it seems dinner never starts before 9pm) were quite impressive.

### Terence Tao — Quantitative continuity estimates

Suppose ${F: X \rightarrow Y}$ is a continuous (but nonlinear) map from one normed vector space ${X}$ to another ${Y}$. The continuity means, roughly speaking, that if ${x_0, x \in X}$ are such that ${\|x-x_0\|_X}$ is small, then ${\|F(x)-F(x_0)\|_Y}$ is also small (though the precise notion of “smallness” may depend on ${x}$ or ${x_0}$, particularly if ${F}$ is not known to be uniformly continuous). If ${F}$ is known to be differentiable (in, say, the Fréchet sense), then we in fact have a linear bound of the form

$\displaystyle \|F(x)-F(x_0)\|_Y \leq C(x_0) \|x-x_0\|_X$

for some ${C(x_0)}$ depending on ${x_0}$, if ${\|x-x_0\|_X}$ is small enough; one can of course make ${C(x_0)}$ independent of ${x_0}$ (and drop the smallness condition) if ${F}$ is known instead to be Lipschitz continuous.

In many applications in analysis, one would like more explicit and quantitative bounds that estimate quantities like ${\|F(x)-F(x_0)\|_Y}$ in terms of quantities like ${\|x-x_0\|_X}$. There are a number of ways to do this. First of all, there is of course the trivial estimate arising from the triangle inequality:

$\displaystyle \|F(x)-F(x_0)\|_Y \leq \|F(x)\|_Y + \|F(x_0)\|_Y. \ \ \ \ \ (1)$

This estimate is usually not very good when ${x}$ and ${x_0}$ are close together. However, when ${x}$ and ${x_0}$ are far apart, this estimate can be more or less sharp. For instance, if the magnitude of ${F}$ varies so much from ${x_0}$ to ${x}$ that ${\|F(x)\|_Y}$ is more than (say) twice that of ${\|F(x_0)\|_Y}$, or vice versa, then (1) is sharp up to a multiplicative constant. Also, if ${F}$ is oscillatory in nature, and the distance between ${x}$ and ${x_0}$ exceeds the “wavelength” of the oscillation of ${F}$ at ${x_0}$ (or at ${x}$), then one also typically expects (1) to be close to sharp. Conversely, if ${F}$ does not vary much in magnitude from ${x_0}$ to ${x}$, and the distance between ${x}$ and ${x_0}$ is less than the wavelength of any oscillation present in ${F}$, one expects to be able to improve upon (1).

When ${F}$ is relatively simple in form, one can sometimes proceed simply by substituting ${x = x_0 + h}$. For instance, if ${F: R \rightarrow R}$ is the squaring function ${F(x) = x^2}$ in a commutative ring ${R}$, one has

$\displaystyle F(x_0+h) = (x_0+h)^2 = x_0^2 + 2x_0 h+ h^2$

and thus

$\displaystyle F(x_0+h) - F(x_0) = 2x_0 h + h^2$

or in terms of the original variables ${x, x_0}$ one has

$\displaystyle F(x) - F(x_0) = 2 x_0 (x-x_0) + (x-x_0)^2.$

If the ring ${R}$ is not commutative, one has to modify this to

$\displaystyle F(x) - F(x_0) = x_0 (x-x_0) + (x-x_0) x_0 + (x-x_0)^2.$

Thus, for instance, if ${A, B}$ are ${n \times n}$ matrices and ${\| \|_{op}}$ denotes the operator norm, one sees from the triangle inequality and the sub-multiplicativity ${\| AB\|_{op} \leq \| A \|_{op} \|B\|_{op}}$ of the operator norm that

$\displaystyle \| A^2 - B^2 \|_{op} \leq \| A - B \|_{op} ( 2 \|B\|_{op} + \|A - B \|_{op} ). \ \ \ \ \ (2)$

If ${F(x)}$ involves ${x}$ (or various components of ${x}$) in several places, one can sometimes get a good estimate by “swapping” ${x}$ with ${x_0}$ at each of the places in turn, using a telescoping series.
For instance, if we again use the squaring function ${F(x) = x^2 = x x}$ in a non-commutative ring, we have

$\displaystyle F(x) - F(x_0) = x x - x_0 x_0$

$\displaystyle = (x x - x_0 x) + (x_0 x - x_0 x_0)$

$\displaystyle = (x-x_0) x + x_0 (x-x_0)$

and hence, by the triangle inequality and sub-multiplicativity as before,

$\displaystyle \| A^2 - B^2 \|_{op} \leq \| A - B \|_{op} ( \| A\|_{op} + \|B\|_{op} ).$

More generally, for any natural number ${n}$, one has the identity

$\displaystyle x^n - x_0^n = (x-x_0) (x^{n-1} + x^{n-2} x_0 + \dots + x x_0^{n-2} + x_0^{n-1}) \ \ \ \ \ (3)$

in a commutative ring, while in a non-commutative ring one must modify this to

$\displaystyle x^n - x_0^n = \sum_{i=0}^{n-1} x_0^i (x-x_0) x^{n-1-i},$

and for matrices one has

$\displaystyle \| A^n - B^n \|_{op} \leq \| A-B\|_{op} ( \|A\|_{op}^{n-1} + \| A\|_{op}^{n-2} \| B\|_{op} + \dots + \|B\|_{op}^{n-1} ).$

Exercise 1 If ${U}$ and ${V}$ are unitary ${n \times n}$ matrices, show that the commutator ${[U,V] := U V U^{-1} V^{-1}}$ obeys the inequality

$\displaystyle \| [U,V] - I \|_{op} \leq 2 \| U - I \|_{op} \| V - I \|_{op}.$

(Hint: first control ${\| UV - VU \|_{op}}$.)

Now suppose (for simplicity) that ${F: {\bf R}^d \rightarrow {\bf R}^{d'}}$ is a map between Euclidean spaces. If ${F}$ is continuously differentiable, then one can use the fundamental theorem of calculus to write

$\displaystyle F(x) - F(x_0) = \int_0^1 \frac{d}{dt} F( \gamma(t) )\ dt$

where ${\gamma: [0,1] \rightarrow {\bf R}^d}$ is any continuously differentiable path from ${x_0}$ to ${x}$. For instance, if one uses the straight line path ${\gamma(t) := (1-t) x_0 + tx}$, one has

$\displaystyle F(x) - F(x_0) = \int_0^1 ((x-x_0) \cdot \nabla F)( (1-t) x_0 + t x )\ dt.$

In the one-dimensional case ${d=1}$, this simplifies to

$\displaystyle F(x) - F(x_0) = (x-x_0) \int_0^1 F'( (1-t) x_0 + t x )\ dt. \ \ \ \ \ (4)$

Among other things, this immediately implies the factor theorem for ${C^k}$ functions: if ${F}$ is a ${C^k({\bf R})}$ function for some ${k \geq 1}$ that vanishes at some point ${x_0}$, then ${F(x)}$ factors as the product of ${x-x_0}$ and some ${C^{k-1}}$ function ${G}$. Another basic consequence is that if ${\nabla F}$ is uniformly bounded in magnitude by some constant ${C}$, then ${F}$ is Lipschitz continuous with the same constant ${C}$.

Applying (4) to the power function ${x \mapsto x^n}$, we obtain the identity

$\displaystyle x^n - x_0^n = n (x-x_0) \int_0^1 ((1-t) x_0 + t x)^{n-1}\ dt \ \ \ \ \ (5)$

which can be compared with (3). Indeed, for ${x_0}$ and ${x}$ close to ${1}$, one can use logarithms and Taylor expansion to arrive at the approximation ${((1-t) x_0 + t x)^{n-1} \approx x_0^{(1-t) (n-1)} x^{t(n-1)}}$, so (3) behaves a little like a Riemann sum approximation to (5).

Exercise 2 For each ${i=1,\dots,n}$, let ${X^{(1)}_i}$ and ${X^{(0)}_i}$ be random variables taking values in a measurable space ${R_i}$, and let ${F: R_1 \times \dots \times R_n \rightarrow {\bf R}^m}$ be a bounded measurable function.
• (i) (Lindeberg exchange identity) Show that

$\displaystyle \mathop{\bf E} F(X^{(1)}_1,\dots,X^{(1)}_n) - \mathop{\bf E} F(X^{(0)}_1,\dots,X^{(0)}_n)$

$\displaystyle = \sum_{i=1}^n \mathop{\bf E} F( X^{(1)}_1,\dots, X^{(1)}_{i-1}, X^{(1)}_i, X^{(0)}_{i+1}, \dots, X^{(0)}_n)$

$\displaystyle - \mathop{\bf E} F( X^{(1)}_1,\dots, X^{(1)}_{i-1}, X^{(0)}_i, X^{(0)}_{i+1}, \dots, X^{(0)}_n).$

• (ii) (Knowles-Yin exchange identity) Show that

$\displaystyle \mathop{\bf E} F(X^{(1)}_1,\dots,X^{(1)}_n) - \mathop{\bf E} F(X^{(0)}_1,\dots,X^{(0)}_n)$

$\displaystyle = \int_0^1 \sum_{i=1}^n \mathop{\bf E} F( X^{(t)}_1,\dots, X^{(t)}_{i-1}, X^{(1)}_i, X^{(t)}_{i+1}, \dots, X^{(t)}_n)$

$\displaystyle - \mathop{\bf E} F( X^{(t)}_1,\dots, X^{(t)}_{i-1}, X^{(0)}_i, X^{(t)}_{i+1}, \dots, X^{(t)}_n)\ dt,$

where ${X^{(t)}_i = 1_{I_i \leq t} X^{(0)}_i + 1_{I_i > t} X^{(1)}_i}$ is a mixture of ${X^{(0)}_i}$ and ${X^{(1)}_i}$, with ${I_1,\dots,I_n}$ uniformly drawn from ${[0,1]}$ independently of each other and of the ${X^{(0)}_1,\dots,X^{(0)}_n, X^{(1)}_1,\dots,X^{(1)}_n}$.

• (iii) Discuss the relationship between the identities in parts (i), (ii) with the identities (3), (5).

(The identity in (i) is the starting point for the Lindeberg exchange method in probability theory, discussed for instance in this previous post. The identity in (ii) can also be used in the Lindeberg exchange method; the terms in the right-hand side are slightly more symmetric in the indices ${1,\dots,n}$, which can be a technical advantage in some applications; see this paper of Knowles and Yin for an instance of this.)
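The identity in (i) comes from a telescoping that already holds pointwise, before any expectations are taken; a small Python sketch (the test function and the input values are arbitrary choices of mine, not from the exercise):

```python
def F(*xs):
    # an arbitrary nonlinear test function of n real arguments
    s = sum(xs)
    return s * s + xs[0] * xs[-1]

x1 = [1.0, 2.0, 3.0, 4.0]   # plays the role of the X^(1) variables
x0 = [0.5, -1.0, 2.5, 0.0]  # plays the role of the X^(0) variables
n = len(x1)

# i-th term: swap the i-th argument from x0 to x1, holding the
# earlier arguments at x1 and the later ones at x0
terms = []
for i in range(n):
    hi = F(*(x1[:i + 1] + x0[i + 1:]))
    lo = F(*(x1[:i] + x0[i:]))
    terms.append(hi - lo)

lhs = F(*x1) - F(*x0)
print(abs(lhs - sum(terms)))  # zero up to rounding: the sum telescopes
```

Taking expectations of this pointwise identity, term by term, gives exactly the exchange identity of part (i).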
Exercise 3 If ${F: {\bf R}^d \rightarrow {\bf R}^{d'}}$ is continuously ${k}$ times differentiable, establish Taylor’s theorem with remainder

$\displaystyle F(x) = \sum_{j=0}^{k-1} \frac{1}{j!} (((x-x_0) \cdot \nabla)^j F)( x_0 )$

$\displaystyle + \int_0^1 \frac{(1-t)^{k-1}}{(k-1)!} (((x-x_0) \cdot \nabla)^k F)((1-t) x_0 + t x)\ dt.$

If ${\nabla^k F}$ is bounded, conclude that

$\displaystyle |F(x) - \sum_{j=0}^{k-1} \frac{1}{j!} (((x-x_0) \cdot \nabla)^j F)( x_0 )| \leq \frac{|x-x_0|^k}{k!} \sup_{y \in {\bf R}^d} |\nabla^k F(y)|.$

For real scalar functions ${F: {\bf R}^d \rightarrow {\bf R}}$, the average value of the continuous real-valued function ${(x - x_0) \cdot \nabla F((1-t) x_0 + t x)}$ must be attained at some point ${t}$ in the interval ${[0,1]}$. We thus conclude the mean-value theorem

$\displaystyle F(x) - F(x_0) = ((x - x_0) \cdot \nabla F)((1-t) x_0 + t x)$

for some ${t \in [0,1]}$ (that can depend on ${x}$, ${x_0}$, and ${F}$). This can for instance give a second proof of the fact that continuously differentiable functions ${F}$ with bounded derivative are Lipschitz continuous. However, it is worth stressing that the mean-value theorem is only available for real scalar functions; it is false for instance for complex scalar functions. A basic counterexample is given by the function ${e(x) := e^{2\pi i x}}$; there is no ${t \in [0,1]}$ for which ${e(1) - e(0) = e'(t)}$. On the other hand, as ${e'}$ has magnitude ${2\pi}$, we still know from (4) that ${e}$ is Lipschitz of constant ${2\pi}$, and when combined with (1) we obtain the basic bounds

$\displaystyle |e(x) - e(y)| \leq \min( 2, 2\pi |x-y| )$

which are already very useful for many applications.

Exercise 4 Let ${H_0, V}$ be ${n \times n}$ matrices, and let ${t}$ be a non-negative real.
• (i) Establish the Duhamel formula

$\displaystyle e^{t(H_0+V)} = e^{tH_0} + \int_0^t e^{(t-s) H_0} V e^{s (H_0+V)}\ ds$

$\displaystyle = e^{tH_0} + \int_0^t e^{(t-s) (H_0+V)} V e^{s H_0}\ ds$

where ${e^A}$ denotes the matrix exponential of ${A}$. (Hint: Differentiate ${e^{(t-s) H_0} e^{s (H_0+V)}}$ or ${e^{(t-s) (H_0+V)} e^{s H_0}}$ in ${s}$.)

• (ii) Establish the iterated Duhamel formula

$\displaystyle e^{t(H_0+V)} = e^{tH_0} + \sum_{j=1}^k \int_{0 \leq t_1 \leq \dots \leq t_j \leq t}$

$\displaystyle e^{(t-t_j) H_0} V e^{(t_j-t_{j-1}) H_0} V \dots e^{(t_2-t_1) H_0} V e^{t_1 H_0}\ dt_1 \dots dt_j$

$\displaystyle + \int_{0 \leq t_1 \leq \dots \leq t_{k+1} \leq t}$

$\displaystyle e^{(t-t_{k+1}) (H_0+V)} V e^{(t_{k+1}-t_k) H_0} V \dots e^{(t_2-t_1) H_0} V e^{t_1 H_0}\ dt_1 \dots dt_{k+1}$

for any ${k \geq 0}$.

• (iii) Establish the infinitely iterated Duhamel formula

$\displaystyle e^{t(H_0+V)} = e^{tH_0} + \sum_{j=1}^\infty \int_{0 \leq t_1 \leq \dots \leq t_j \leq t}$

$\displaystyle e^{(t-t_j) H_0} V e^{(t_j-t_{j-1}) H_0} V \dots e^{(t_2-t_1) H_0} V e^{t_1 H_0}\ dt_1 \dots dt_j.$

• (iv) If ${H(t)}$ is an ${n \times n}$ matrix depending in a continuously differentiable fashion on ${t}$, establish the variation formula

$\displaystyle \frac{d}{dt} e^{H(t)} = (F(\mathrm{ad}(H(t))) H'(t)) e^{H(t)}$

where ${\mathrm{ad}(H)}$ is the adjoint representation ${\mathrm{ad}(H)(V) = HV - VH}$ applied to ${H}$, and ${F}$ is the function

$\displaystyle F(z) := \int_0^1 e^{sz}\ ds$

(thus ${F(z) = \frac{e^z-1}{z}}$ for non-zero ${z}$), with ${F(\mathrm{ad}(H(t)))}$ defined using functional calculus.

We remark that further manipulation of (iv) of the above exercise using the fundamental theorem of calculus eventually leads to the Baker-Campbell-Hausdorff-Dynkin formula, as discussed in this previous blog post.
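The Duhamel formula in (i) can be checked numerically. The sketch below (in Python; the matrices, the truncated-Taylor matrix exponential, and the midpoint-rule quadrature are all my own illustrative choices, not from the exercise) compares the two sides:

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_scale(c, A):
    return [[c * a for a in row] for row in A]

def expm(A, terms=30):
    # truncated Taylor series; plenty accurate for the small matrices here
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [[float(i == j) for j in range(n)] for i in range(n)]
    for k in range(1, terms):
        term = mat_scale(1.0 / k, mat_mul(term, A))
        result = mat_add(result, term)
    return result

# Duhamel: e^{t(H0+V)} = e^{tH0} + \int_0^t e^{(t-s)H0} V e^{s(H0+V)} ds
H0 = [[0.0, 1.0], [0.0, 0.0]]
V = [[0.1, 0.0], [0.2, -0.1]]
t = 1.0
H = mat_add(H0, V)

lhs = expm(mat_scale(t, H))

# midpoint rule for the s-integral
m = 400
integral = [[0.0, 0.0], [0.0, 0.0]]
for k in range(m):
    s = (k + 0.5) * t / m
    piece = mat_mul(expm(mat_scale(t - s, H0)),
                    mat_mul(V, expm(mat_scale(s, H))))
    integral = mat_add(integral, mat_scale(t / m, piece))

rhs = mat_add(expm(mat_scale(t, H0)), integral)
err = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(err)  # small; limited only by the quadrature step
```

The residual error here comes entirely from the midpoint rule, and shrinks quadratically as `m` is increased.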
Exercise 5 Let ${A, B}$ be ${n \times n}$ matrices whose eigenvalues all have positive real part, and let ${Y}$ be an ${n \times n}$ matrix. Show that there is a unique solution ${X}$ to the Sylvester equation

$\displaystyle AX + X B = Y$

which is given by the formula

$\displaystyle X = \int_0^\infty e^{-tA} Y e^{-tB}\ dt.$

In the above examples we had applied the fundamental theorem of calculus along linear curves ${\gamma(t) = (1-t) x_0 + t x}$. However, it is sometimes better to use other curves. For instance, the circular arc ${\gamma(t) = \cos(\pi t/2) x_0 + \sin(\pi t/2) x}$ can be useful, particularly if ${x_0}$ and ${x}$ are “orthogonal” or “independent” in some sense; a good example of this is the proof by Maurey and Pisier of the gaussian concentration inequality, given in Theorem 8 of this previous blog post. In a similar vein, if one wishes to compare a scalar random variable ${X}$ of mean zero and variance one with a Gaussian random variable ${G}$ of mean zero and variance one, it can be useful to introduce the intermediate random variables ${\gamma(t) := (1-t)^{1/2} X + t^{1/2} G}$ (where ${X}$ and ${G}$ are independent); note that these variables have mean zero and variance one, and after coupling them together appropriately they evolve by the Ornstein-Uhlenbeck process, which has many useful properties. For instance, one can use these ideas to establish monotonicity formulae for entropy; see e.g. this paper of Courtade for an example of this and further references. More generally, one can exploit curves ${\gamma}$ that flow according to some geometrically natural ODE or PDE; several examples of this occur famously in Perelman’s proof of the Poincaré conjecture via Ricci flow, discussed for instance in this previous set of lecture notes.

In some cases, it is difficult to compute ${F(x)-F(x_0)}$ or the derivative ${\nabla F}$ directly, but one can instead proceed by implicit differentiation, or some variant thereof. Consider for instance the matrix inversion map ${F(A) := A^{-1}}$ (defined on the open dense subset of ${n \times n}$ matrices consisting of invertible matrices).
If one wants to compute ${F(B)-F(A)}$ for ${B}$ close to ${A}$, one can temporarily write ${F(B) - F(A) = E}$, thus

$\displaystyle B^{-1} - A^{-1} = E.$

Multiplying both sides on the left by ${B}$ to eliminate the ${B^{-1}}$ term, and on the right by ${A}$ to eliminate the ${A^{-1}}$ term, one obtains

$\displaystyle A - B = B E A$

and thus on reversing these steps we arrive at the basic identity

$\displaystyle B^{-1} - A^{-1} = B^{-1} (A - B) A^{-1}. \ \ \ \ \ (6)$

For instance, if ${H_0}$ and ${V}$ are ${n \times n}$ matrices, and we define the resolvents

$\displaystyle R_0(z) := (H_0 - z I)^{-1}; \quad R_V(z) := (H_0 + V - zI)^{-1}$

then we have the resolvent identity

$\displaystyle R_V(z) - R_0(z) = - R_V(z) V R_0(z) \ \ \ \ \ (7)$

as long as ${z}$ does not lie in the spectrum of ${H_0}$ or ${H_0+V}$ (for instance, if ${H_0}$, ${V}$ are self-adjoint then one can take ${z}$ to be any strictly complex number). One can iterate this identity to obtain

$\displaystyle R_V(z) = \sum_{j=0}^k (-R_0(z) V)^j R_0(z) + (-R_V(z) V) (-R_0(z) V)^k R_0(z)$

and hence, if ${V}$ is small enough that the series converges, the Neumann series

$\displaystyle R_V(z) = \sum_{j=0}^\infty (-R_0(z) V)^j R_0(z).$

Similarly, if ${A(t)}$ is a family of invertible matrices that depends in a continuously differentiable fashion on a time variable ${t}$, then by implicitly differentiating the identity

$\displaystyle A(t) A(t)^{-1} = I$

in ${t}$ using the product rule, we obtain

$\displaystyle (\frac{d}{dt} A(t)) A(t)^{-1} + A(t) \frac{d}{dt} A(t)^{-1} = 0$

and hence

$\displaystyle \frac{d}{dt} A(t)^{-1} = - A(t)^{-1} (\frac{d}{dt} A(t)) A(t)^{-1}$

(this identity may also be easily derived from (6)). One can then use the fundamental theorem of calculus to obtain variants of (6), for instance by using the curve ${\gamma(t) = (1-t) A + tB}$ we arrive at

$\displaystyle B^{-1} - A^{-1} = \int_0^1 ((1-t)A + tB)^{-1} (A-B) ((1-t)A + tB)^{-1}\ dt$

assuming that the curve stays entirely within the set of invertible matrices. While this identity may seem more complicated than (6), it is more symmetric, which conveys some advantages.
For instance, using this identity it is easy to see that if ${A, B}$ are positive definite with ${A>B}$ in the sense of positive definite matrices (that is, ${A-B}$ is positive definite), then ${B^{-1} > A^{-1}}$. (Try to prove this using (6) instead!)

Exercise 6 If ${A}$ is an invertible ${n \times n}$ matrix and ${u, v}$ are ${n \times 1}$ vectors, establish the Sherman-Morrison formula

$\displaystyle (A + t uv^T)^{-1} = A^{-1} - \frac{t}{1 + t v^T A^{-1} u} A^{-1} uv^T A^{-1}$

whenever ${t}$ is a scalar such that ${1 + t v^T A^{-1} u}$ is non-zero. (See also this previous blog post for more discussion of these sorts of identities.)

One can use the Cauchy integral formula to extend these identities to other functions of matrices. For instance, if ${F: {\bf C} \rightarrow {\bf C}}$ is an entire function, and ${\gamma}$ is a counterclockwise contour that goes around the spectrum of both ${H_0}$ and ${H_0+V}$, then we have

$\displaystyle F(H_0+V) = \frac{-1}{2\pi i} \int_\gamma F(z) R_V(z)\ dz$

and similarly

$\displaystyle F(H_0) = \frac{-1}{2\pi i} \int_\gamma F(z) R_0(z)\ dz$

and hence by (7) one has

$\displaystyle F(H_0+V) - F(H_0) = \frac{1}{2\pi i} \int_\gamma F(z) R_V(z) V R_0(z)\ dz;$

similarly, if ${H(t)}$ depends on ${t}$ in a continuously differentiable fashion, then

$\displaystyle \frac{d}{dt} F(H(t)) = \frac{1}{2\pi i} \int_\gamma F(z) (H(t) - zI)^{-1} H'(t) (H(t)-zI)^{-1}\ dz$

as long as ${\gamma}$ goes around the spectrum of ${H(t)}$.
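For what it's worth, the Sherman-Morrison formula of Exercise 6 is also easy to spot-check numerically (again my own sketch, not part of the post):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)   # invertible
u = rng.standard_normal((n, 1))
v = rng.standard_normal((n, 1))
t = 0.7

Ainv = np.linalg.inv(A)
denom = 1.0 + t * (v.T @ Ainv @ u).item()
assert abs(denom) > 1e-8          # the hypothesis of the formula

sherman_morrison = Ainv - (t / denom) * (Ainv @ u @ v.T @ Ainv)
direct = np.linalg.inv(A + t * (u @ v.T))
print(np.allclose(sherman_morrison, direct))  # True
```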
Exercise 7 If ${H(t)}$ is an ${n \times n}$ matrix depending continuously differentiably on ${t}$, and ${F: {\bf C} \rightarrow {\bf C}}$ is an entire function, establish the tracial chain rule

$\displaystyle \frac{d}{dt} \hbox{tr} F(H(t)) = \hbox{tr}(F'(H(t)) H'(t)).$

In a similar vein, given that the logarithm function is the antiderivative of the reciprocal, one can express the matrix logarithm ${\log A}$ of a positive definite matrix by the fundamental theorem of calculus identity

$\displaystyle \log A = \int_0^\infty (I + sI)^{-1} - (A + sI)^{-1}\ ds$

(with the constant term ${(I+sI)^{-1}}$ needed to prevent a logarithmic divergence in the integral). Differentiating, we see that if ${A(t)}$ is a family of positive definite matrices depending continuously differentiably on ${t}$, then

$\displaystyle \frac{d}{dt} \log A(t) = \int_0^\infty (A(t) + sI)^{-1} A'(t) (A(t)+sI)^{-1}\ ds.$

This can be used for instance to show that ${\log}$ is a monotone increasing function, in the sense that ${\log A> \log B}$ whenever ${A > B > 0}$ in the sense of positive definite matrices. One can of course integrate this formula to obtain some formulae for the difference ${\log A - \log B}$ of the logarithm of two positive definite matrices ${A,B}$.

To estimate the difference ${A^{1/2} - B^{1/2}}$ of the square roots of two positive definite matrices ${A,B}$ is trickier; there are multiple ways to proceed. One approach is to use contour integration as before (but one has to take some care to avoid branch cuts of the square root). Another is to express the square root in terms of exponentials via the formula

$\displaystyle A^{1/2} = \frac{1}{\Gamma(-1/2)} \int_0^\infty (e^{-tA} - I) t^{-1/2} \frac{dt}{t}$

where ${\Gamma}$ is the gamma function; this formula can be verified by first diagonalising ${A}$ to reduce to the scalar case and using the definition of the gamma function.
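That scalar reduction also makes the exponential formula easy to test numerically. The sketch below (my own check, not from the post) substitutes ${t = u^2}$ to tame the integrable singularity at ${t = 0}$, truncates the integral at ${u = U}$ with the analytic tail correction ${-2/U}$ (the exponential part of the tail is negligible when the spectrum is bounded below by 1), and checks that the result squares back to ${A}$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T / n + np.eye(n)          # positive definite, eigenvalues >= 1

w, Q = np.linalg.eigh(A)

# scalar integrals I(lam) = int_0^inf (e^{-t*lam} - 1) t^{-1/2} dt/t;
# after t = u^2 this becomes int_0^inf 2 (e^{-lam u^2} - 1) / u^2 du,
# whose integrand tends to -2*lam at the origin.
U = 50.0
u = np.linspace(1e-8, U, 200001)
h = u[1] - u[0]
vals = 2.0 * np.expm1(-np.outer(w, u**2)) / u**2
I = h * (vals.sum(axis=1) - 0.5 * (vals[:, 0] + vals[:, -1])) - 2.0 / U

gamma_minus_half = -2.0 * np.sqrt(np.pi)   # Gamma(-1/2)
sqrt_A = (Q * (I / gamma_minus_half)) @ Q.T

print(np.allclose(sqrt_A @ sqrt_A, A, atol=1e-4))  # True
```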
Then one has

$\displaystyle A^{1/2} - B^{1/2} = \frac{1}{\Gamma(-1/2)} \int_0^\infty (e^{-tA} - e^{-tB}) t^{-1/2} \frac{dt}{t}$

and one can use some of the previous identities to control ${e^{-tA} - e^{-tB}}$. This is pretty messy though. A third way to proceed is via implicit differentiation. If for instance ${A(t)}$ is a family of positive definite matrices depending continuously differentiably on ${t}$, we can differentiate the identity

$\displaystyle A(t)^{1/2} A(t)^{1/2} = A(t)$

to obtain

$\displaystyle A(t)^{1/2} \frac{d}{dt} A(t)^{1/2} + (\frac{d}{dt} A(t)^{1/2}) A(t)^{1/2} = \frac{d}{dt} A(t).$

This can for instance be solved using Exercise 5 to obtain

$\displaystyle \frac{d}{dt} A(t)^{1/2} = \int_0^\infty e^{-sA(t)^{1/2}} A'(t) e^{-sA(t)^{1/2}}\ ds$

and this can in turn be integrated to obtain a formula for ${A^{1/2} - B^{1/2}}$. This is again a rather messy formula, but it does at least demonstrate that the square root is a monotone increasing function on positive definite matrices: ${A > B > 0}$ implies ${A^{1/2} > B^{1/2} > 0}$.

Several of the above identities for matrices can be (carefully) extended to operators on Hilbert spaces provided that they are sufficiently well behaved (in particular, if they have a good functional calculus, and if various spectral hypotheses are obeyed). We will not attempt to do so here, however.

Filed under: expository, math.CA, math.RA Tagged: estimates, matrix identities, stability

## June 22, 2017

### Jordan Ellenberg — Driftless Father’s Day

This Father’s Day I found that, by some kind of unanticipated-gap-in-the-Red-Sea-level miracle, neither of my children had any events scheduled, so I gave myself a present and did something I’d been meaning to do for a year: take them to Dubuque. It’s not far from Madison. You drive southwest through the Driftless Zone, where the glaciers somehow looped around and missed a spot while they were grinding the rest of the Midwest flat.
At the exit to Platteville there was a sign for a “Mining Museum.” We had about six seconds to decide whether we all wanted to go to a mining museum but that was plenty of time because obviously we all totally wanted to go to a mining museum. And it was great! Almost the platonic ideal of a small-town museum. Our guide took us down into the old lead mine from the 1850s, now with electric lights and a lot of mannequins caught in the act of blasting holes in the rock. (One of the mannequins was black; our guide told us that there were African-American miners in southwestern Wisconsin, but not that some of them were enslaved.) This museum did a great job of conveying the working conditions of those miners: ankle-deep in water, darkness broken only by the candle wired to the front of their hat, the hammers on the rock so loud you couldn’t talk and had to communicate by hand signals. Riding up and down to the surface with one leg in the bucket and one leg out so more men could fit in one load, just hoping the bucket didn’t swing wrong and crush your leg against the rock wall. There’s nothing like an industrial museum to remind you that everything you buy in a store has hours of difficult, dangerous labor built into it. But it was also labor people traveled miles to get the chance to do! Only twenty miles further to the Mississippi, my daughter’s first time seeing the river, and across it Dubuque. Which has a pretty great Op-Art flag: Our main goal was the National Mississippi River Museum; slick where the Platteville museum was homespun, up-to-date where the Platteville museum was old-fashioned. The kids really liked both. I wanted fewer interactive screens, more actual weird river creatures. The museum is on the Riverwalk; Dubuque, like just about every city on a body of water, is reinventing its shoreline as a tourist hub. Every harbor a Harborplace.
OK, I snark, but it was a lovely walk; lots of handsome bridges in view, all different, an old-timey band playing in the gazebo, Illinois and Wisconsin and Iowa invisibly meeting across the water…. Only disappointment of the afternoon: the famous funicular railway was closed. Maybe they could have posted that on their website or something. But in a way it’s good they didn’t; if I’d known it was closed, I probably would have decided to put off the trip, and who knows if we’d ever have gone? On the way back we stopped in Dickeyville to get gas but missed the Dickeyville Grotto; would have stopped there for sure if I’d known about it. Dinner in Dodgeville at Culver’s, the Midwest’s superior version of In-N-Out, where I got my free Father’s Day turtle. I like cheese curds and brats as much as the next guy, but I gotta say, I think the turtle is my favorite of the many foods I’d never heard of before I moved to Wisconsin.

## June 21, 2017

### Scott Aaronson — Alex Halderman testifying before the Senate Intelligence Committee

This morning, my childhood best friend Alex Halderman testified before the US Senate about the proven ease of hacking electronic voting machines without leaving any record, the certainty that Russia has the technical capability to hack American elections, and the urgency of three commonsense (and cheap) countermeasures:

1. a paper trail for every vote cast in every state,
2. routine statistical sampling of the paper trail—enough to determine whether large-scale tampering occurred, and
3. cybersecurity audits to instill general best practices (such as firewalling election systems).

You can watch Alex on C-SPAN here—his testimony begins at 2:16:13, and is followed by the Q&A period. You can also read Alex’s prepared testimony here, as well as his accompanying Washington Post editorial (joint with Justin Talbot-Zorn).
Alex’s testimony—its civic, nonpartisan nature, right down to Alex’s flourish of approvingly quoting President Trump in support of paper ballots—reflects a moving optimism that, even in these dark times for democracy, Congress can be prodded into doing the right thing merely because it’s clearly, overwhelmingly in the national interest. I wish I could say I shared that optimism. Nevertheless, when called to testify, what can one do but act on the assumption that such optimism is justified? Here’s hoping that Alex’s urgent message is heard and acted on.

### John Baez — The Theory of Devices

I’m visiting the University of Genoa and talking to two category theorists: Marco Grandis and Giuseppe Rosolini. Grandis works on algebraic topology and higher categories, while Rosolini works on the categorical semantics of programming languages. Yesterday, Marco Grandis showed me a fascinating paper by his thesis advisor:

• Gabriele Darbo, Aspetti algebrico-categoriali della teoria dei dispositivi, Symposia Mathematica IV (1970), Istituto Nazionale di Alta Matematica, 303–336.

It’s closely connected to Brendan Fong’s thesis, but also different—and, of course, much older. According to Grandis, Darbo was the first person to work on category theory in Italy. He’s better known for other things, like ‘Darbo’s fixed point theorem’, but this piece of work is elegant, and, it seems to me, strangely ahead of its time. The paper’s title translates as ‘Algebraic-categorical aspects of the theory of devices’, and its main concept is that of a ‘universe of devices’: a collection of devices of some kind that can be hooked up using wires to form more devices of this kind. Nowadays we might study this concept using operads—but operads didn’t exist in 1970, and Darbo did quite fine without them. The key is the category $\mathrm{FinCorel},$ which has finite sets as objects and ‘corelations’ as morphisms. I explained corelations here: Corelations in network theory, 2 February 2016.
Briefly, a corelation from a finite set $X$ to a finite set $Y$ is a partition of the disjoint union of $X$ and $Y.$ We can get such a partition from a bunch of wires connecting points of $X$ and $Y.$ The idea is that two points lie in the same part of the partition iff they’re connected, directly or indirectly, by a path of wires. So, if we have some wires like this: they determine a corelation like this: There’s an obvious way to compose corelations, giving a category $\mathrm{FinCorel}.$ Gabriele Darbo doesn’t call them ‘corelations’: he calls them ‘trasduttori’. A literal translation might be ‘transducers’. But he’s definitely talking about corelations, and like Fong he thinks they are basic for studying ways to connect systems. Darbo wants a ‘universe of devices’ to assign to each finite set $X$ a set $D(X)$ of devices having $X$ as their set of ‘terminals’. Given a device in $D(X)$ and a corelation $f \colon X \to Y,$ thought of as a bunch of wires, he wants to be able to attach these wires to the terminals in $X$ and get a new device with $Y$ as its set of terminals. Thus he wants a map $D(f): D(X) \to D(Y).$ If you draw some pictures, you’ll see this should give a functor $D : \mathrm{FinCorel} \to \mathrm{Set}$ Moreover, if we have device with a set $X$ of terminals and a device with a set $Y$ of terminals, we should be able to set them side by side and get a device whose set of terminals form the set $X + Y$, meaning the disjoint union of $X$ and $Y.$ So, Darbo wants to have maps $\delta_{X,Y} : D(X) \times D(Y) \to D(X + Y)$ If you draw some more pictures you can convince yourself that $\delta$ should be a lax symmetric monoidal functor… if you’re one of the lucky few who knows what that means. If you’re not, you can look it up in many places, such as Section 1.2 here: • Brendan Fong, The Algebra of Open and Interconnected Systems, Ph.D. thesis, University of Oxford, 2016. (Blog article here.) 
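The composition rule described above (two terminals of ${X + Z}$ are identified iff they are connected through the glued partitions) is easy to make concrete in code. Here is a small Python sketch of my own, not from Darbo's paper, in which a corelation is stored as a partition of a tagged disjoint union and a union-find structure does the gluing along ${Y}$:

```python
# A corelation f : X -> Y is a list of blocks partitioning the tagged
# disjoint union of X and Y: elements of X appear as ('L', x) and
# elements of Y as ('R', y).
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, a):
        self.parent.setdefault(a, a)
        root = a
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[a] != root:            # path compression
            self.parent[a], a = root, self.parent[a]
        return root

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def compose(f, g, X, Y, Z):
    """Compose corelations f : X -> Y and g : Y -> Z."""
    uf = UnionFind()
    # record the blocks of each corelation (keeping the two copies of Y apart)
    for tag, corel in (('f', f), ('g', g)):
        for block in corel:
            first = next(iter(block))
            for e in block:
                uf.union((tag, e), (tag, first))
    for y in Y:                                  # glue f's copy of Y to g's
        uf.union(('f', ('R', y)), ('g', ('L', y)))
    blocks = {}                                  # restrict to X + Z
    for x in X:
        blocks.setdefault(uf.find(('f', ('L', x))), set()).add(('L', x))
    for z in Z:
        blocks.setdefault(uf.find(('g', ('R', z))), set()).add(('R', z))
    return sorted(map(frozenset, blocks.values()), key=sorted)

# wires {x0--y0} and {x1--y1}, composed with a wire joining y0, y1 and z0:
# x0, x1 and z0 all end up in a single connected block.
X, Y, Z = [0, 1], [0, 1], [0]
f = [{('L', 0), ('R', 0)}, {('L', 1), ('R', 1)}]
g = [{('L', 0), ('L', 1), ('R', 0)}]
print(compose(f, g, X, Y, Z))
```

Associativity and identities of this composition are exactly what make $\mathrm{FinCorel}$ a category.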
Darbo does not mention lax symmetric monoidal functors, perhaps because such concepts were first introduced by Mac Lane only in 1968. But as far as I can tell, Darbo’s definition is almost equivalent to this: Definition. A universe of devices is a lax symmetric monoidal functor $D \colon \mathrm{FinCorel} \to \mathrm{Set}.$ One difference is that Darbo wants there to be exactly one device with no terminals. Thus, he assumes $D(\emptyset)$ is a one-element set, say $1$, while the definition above would only demand the existence of a map $\delta \colon 1 \to D(\emptyset)$ obeying a couple of axioms. That gives a particular ‘favorite’ device with no terminals. I believe we get Darbo’s definition from the above one if we further assume $\delta$ is the identity map. This makes sense if we take the attitude that ‘a device is determined by its observable behavior’, but not otherwise. This attitude is called ‘black-boxing’. Darbo does various things in his paper, but the most exciting to me is his example connected to linear electrical circuits. He defines, for any pair of objects $V$ and $I$ in an abelian category $C,$ a particular universe of devices. He calls this the universe of linear devices having $V$ as the object of potentials and $I$ as the object of currents. If you don’t like abelian categories, think of $C$ as the category of finite-dimensional real vector spaces, and let $V = I = \mathbb{R}.$ Electric potential and electric current are described by real numbers so this makes sense. The basic idea will be familiar to Fong fans. In an electrical circuit made of purely conductive wires, when two wires merge into one we add the currents to get the current on the wire going out. When one wire splits into two we duplicate the potential to get the potentials on the wires going out. 
Working this out further, any corelation $f : X \to Y$ between finite sets determines two linear relations, one $f_* : I^X \rightharpoonup I^Y$ relating the currents on the wires coming in to the currents on the wires going out, and one $f^* : V^Y \rightharpoonup V^X$ relating the potentials on the wires going out to the potentials on the wires coming in. Here $I^X$ is the direct sum of $X$ copies of $I,$ and so on; the funky arrow indicates that we have a linear relation rather than a linear map. Note that $f_*$ goes forward while $f^*$ goes backward; this is mainly just conventional, since you can turn linear relations around, but we’ll see it’s sort of nice. If we let $\mathrm{Rel}(A,B)$ be the set of linear relations between two objects $A, B \in C,$ we can use the above technology to get a universe of devices where $D(X) = \mathrm{Rel}(V^X, I^X)$ In other words, a device of this kind is simply a linear relation between the potentials and currents at its terminals! How does $D$ get to be a functor $D : \mathrm{FinCorel} \to \mathrm{Set}$? That’s pretty easy. We’ve defined it on objects (that is, finite sets) by the above formula. So, suppose we have a morphism (that is, a corelation) $f \colon X \to Y.$ How do we define $D(f) : D(X) \to D(Y)?$ To answer this question, we need a function $D(f) : \mathrm{Rel}(V^X, I^X) \to \mathrm{Rel}(V^Y, I^Y)$ Luckily, we’ve got linear relations $f_* : I^X \rightharpoonup I^Y$ and $f^* : V^Y \rightharpoonup V^X$ So, given any linear relation $R \in \mathrm{Rel}(V^X, I^X),$ we just define $D(f)(R) = f_* \circ R \circ f^*$ Voilà! People who have read Fong’s thesis, or my paper with Blake Pollard on reaction networks: • John Baez and Blake Pollard, A compositional framework for reaction networks. will find many of Darbo’s ideas eerily similar. In particular, the formula $D(f)(R) = f_* \circ R \circ f^*$ appears in Lemma 16 of my paper with Blake, where we are defining a category of open dynamical systems.
We prove that $D$ is a lax symmetric monoidal functor, which is just what Darbo proved—though in a different context, since our $R$ is not linear like his, and for a different purpose, since he’s trying to show $D$ is a ‘universe of devices’, while we’re trying to construct the category of open dynamical systems as a ‘decorated cospan category’. In short: if this work of Darbo had become more widely known, the development of network theory could have been sped up by three decades! But there was less interest in a general theory of networks at the time, lax monoidal functors were brand new, operads unknown… and, sadly, few mathematicians read Italian. Darbo has other papers, and so do his students. We should read them and learn from them! Here are a few open-access ones:

• Franco Parodi, Costruzione di un universo di dispositivi non lineari su una coppia di gruppi abeliani, Rendiconti del Seminario Matematico della Università di Padova 58 (1977), 45–54.

• Franco Parodi, Categoria degli universi di dispositivi e categoria delle T-algebre, Rendiconti del Seminario Matematico della Università di Padova 62 (1980), 1–15.

• Stefano Testa, Su un universo di dispositivi monotoni, Rendiconti del Seminario Matematico della Università di Padova 65 (1981), 53–57.

At some point I will scan in G. Darbo’s paper and make it available here.

### Doug Natelson — About grants: What are "indirect costs"?

Before blogging further about science, I wanted to explain something about the way research grants work in the US. Consider this part of my series of posts intended to educate students (and perhaps the public) about careers in academic research. When you write a proposal to a would-be source of research funding, you have to include a budget. As anyone would expect, that budget will list direct costs - these are items that are clear research expenses.
Examples would include, say, $30K/yr for a graduate student's stipend, $7K for a piece of laboratory electronics essential to the work, and $2K/yr to support travel of the student and the principal investigator (PI) to conferences.  However, budgets also include indirect costs, sometimes called overhead.  The idea is that research involves certain costs that aren't easy to account for directly, like the electricity to run the lights and air conditioning in the lab, or the costs to keep the laboratory building maintained so that the research can get done, or the (meta)costs for the university to administer the grant.

So, how does the university figure out how much to tack on for indirect costs?  For US federal grants, the magic (ahem) is all hidden away in OMB Circular A21 (wiki about it, pdf of the actual doc).  Universities periodically go through an elaborate negotiation process with the federal government (see here for a description of this regarding MIT), and determine an indirect cost rate for that university.  The idea is you take a version of the direct costs ("modified total direct costs" - for example, a piece of equipment that costs more than $5K is considered a capital expense and not subject to indirect costs) and multiply by a negotiated factor (in the case of Rice right now, 56.5%) to arrive at the indirect costs.  The cost rates are lower for research done off campus (like at CERN), with the argument that this should be cheaper for the university.  (Effective indirect cost rates at US national labs tend to be much higher.)
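As a toy illustration of that arithmetic, using the example budget numbers from above and the Rice rate quoted (real MTDC rules exclude further categories, such as tuition, which this sketch ignores):

```python
RATE = 0.565              # negotiated indirect cost rate (Rice figure above)
EQUIP_THRESHOLD = 5_000   # equipment above this is a capital expense

# (item, cost in $, is_equipment) -- the example budget from the text
budget = [
    ("grad student stipend (1 yr)", 30_000, False),
    ("lab electronics",              7_000, True),
    ("conference travel (1 yr)",     2_000, False),
]

direct = sum(cost for _, cost, _ in budget)
# modified total direct costs: drop capital equipment over the threshold
mtdc = sum(cost for _, cost, equip in budget
           if not (equip and cost > EQUIP_THRESHOLD))
indirect = RATE * mtdc
total = direct + indirect

print(f"direct ${direct:,}, MTDC ${mtdc:,}, "
      f"indirect ${indirect:,.0f}, total ${total:,.0f}")
# direct $39,000, MTDC $32,000, indirect $18,080, total $57,080
```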

Foundations and industry negotiate different rates with universities.  Foundations usually limit their indirect cost payments, arguing that they just can't afford to pay at the federal level.  The Bill and Melinda Gates Foundation, for example, only allows (pdf) 10% for indirect costs.   The effective indirect rate for a university, averaged over the whole research portfolio, is always quite a bit lower than the nominal A21 negotiated rate.  Vice provosts/presidents/chancellors for research at major US universities would be happy to explain at length that indirect cost recovery doesn't come close to covering the actual costs associated with doing university-based research.

Indirect cost rates in the US are fraught with controversy, particularly now.  The current system is definitely complicated, and reasonable people can ask whether it makes sense (and adds administrative costs) to have every university negotiate its own rate with the feds.   It remains to be seen whether there are changes in the offing.

## June 20, 2017

### Georg von Hippel — Lattice 2017, Day Two

Welcome back to our blog coverage of the Lattice 2017 conference in Granada.

Today's first plenary session started with an experimental talk by Arantza Oyanguren of the LHCb collaboration on B decay anomalies at LHCb. LHCb have amassed a huge number of b-bbar pairs, which allow them to search for and study in some detail even the rarest of decay modes, and they are of course still collecting more integrated luminosity. Readers of this blog will likely recall the Bs → μ+μ- branching ratio result from LHCb, which agreed with the Standard Model prediction. In the meantime, there are many similar results for branching ratios that do not agree with Standard Model predictions at the 2-3σ level, e.g. the ratios of branching fractions like Br(B+→K+μ+μ-)/Br(B+→K+e+e-), in which lepton flavour universality appears to be violated. Global fits to data in these channels appear to favour the new physics hypothesis, but one should be cautious because of the "look-elsewhere" effect: when studying a very large number of channels, some will show an apparently significant deviation simply by statistical chance. On the other hand, it is very interesting that all the evidence indicating potential new physics (including the anomalous magnetic moment of the muon and the discrepancy between the muonic and electronic determinations of the proton electric charge radius) involve differences between processes involving muons and analogous processes involving electrons, an observation I'm sure model-builders have made a long time ago.

This was followed by a talk on flavour physics anomalies by Damir Bečirević. Expanding on the theoretical interpretation of the anomalies discussed in the previous talk, he explained how the data seem to indicate a violation of lepton flavour universality at the level where the Wilson coefficient C9 in the effective Hamiltonian is around zero for electrons, and around -1 for muons. Experimental data seem to favour the situation where C10=-C9, which can be accommodated in certain models with a Z' boson coupling preferentially to muons, or in certain special leptoquark models with corrections at the loop level only. Since I have little (or rather no) expertise in phenomenological model-building, I have no idea how likely these explanations are.

The next speaker was Xu Feng, who presented recent progress in kaon physics simulations on the lattice. The "standard" kaon quantities, such as the kaon decay constant or f+(0), are by now very well-determined from the lattice, with overall errors at the sub-percent level, but beyond that there are many important quantities, such as the CP-violating amplitudes in K → ππ decays, that are still poorly known and very challenging. RBC/UKQCD have been leading the attack on many of these observables, and have presented a possible solution to the ΔI=1/2 rule, which consists in non-perturbative effects making the amplitude A0 much larger relative to A2 than what would be expected from naive colour counting. Making further progress on long-distance contributions to the KL-KS mass difference or εK will require working at the physical pion mass and treating the charm quark with good control of discretization effects. For some processes, such as KL→π0ℓ+ℓ-, even knowing the sign of the coefficient would be desirable.

After the coffee break, Luigi Del Debbio talked about parton distributions in the LHC era. The LHC data reduce the error on the NNLO PDFs by around a factor of two in the intermediate-x region. Conversely, the theory errors coming from the PDFs are a significant part of the total error from the LHC on Higgs physics and BSM searches. In particular the small-x and large-x regions remain quite uncertain. On the lattice, PDFs can be determined via quasi-PDFs, in which the Wilson line inside the non-local bilinear is along a spatial direction rather than in a light-like direction. However, there are still theoretical issues to be settled in order to ensure that the renormalization and matching to the continuum really lead to the determination of continuum PDFs in the end.

Next was a talk about chiral perturbation theory results on the multi-hadron state contamination of nucleon observables by Oliver Bär. It is well known that until very recently, lattice calculations of the nucleon axial charge underestimated its value relative to experiment, and this has been widely attributed to excited-state effects. Now, Oliver has calculated the corrections from nucleon-pion states on the extraction of the axial charge in chiral perturbation theory, and has found that they actually should lead to an overestimation of the axial charge from the plateau method, at least for source-sink separations above 2 fm, where ChPT is applicable. Similarly, other nucleon charges should be overestimated by 5-10%. Of course, nobody is currently measuring in that distance regime, and so it is quite possible that higher-order corrections or effects not captured by ChPT overcompensate this and lead to an underestimation, which would however mean that there is some intermediate source-sink separation for which one gets the experimental result by accident, as it were.

The final plenary speaker of the morning was Chia-Cheng Chang, who discussed progress towards a precise lattice determination of the nucleon axial charge, presenting the results of the CalLAT collaboration obtained using what they refer to as the Feynman-Hellmann method, a novel way of implementing what is essentially the summation method through ideas based on the Feynman-Hellmann theorem (but which doesn't involve simulating with a modified action, as a straightforward application of the Feynman-Hellmann theorem would demand).

After the lunch break, there were parallel sessions, and in the evening, the poster session took place. A particularly interesting and entertaining contribution was a quiz about women's contributions to physics and computer science, the winner of which will receive a bottle of wine and a book.

### Georg von Hippel — Lattice 2017, Day One

Hello from Granada and welcome to our coverage of the 2017 lattice conference.

After welcome addresses by the conference chair, a representative of the government agency in charge of fundamental research, and the rector of the university, the conference started off in a somewhat sombre mood with a commemoration of Roberto Petronzio, a pioneer of lattice QCD, who passed away last year. Giorgio Parisi gave a memorial talk summarizing Roberto's many contributions to the development of the field, from his early work on perturbative QCD and the parton model, through his pioneering contributions to lattice QCD back in the days of small quenched lattices, to his recent work on partially twisted boundary conditions and on isospin breaking effects, which is very much at the forefront of the field at the moment, not to omit Roberto's role as director of the Italian INFN in politically turbulent times.

This was followed by a talk by Martin Lüscher on stochastic locality and master-field simulations of very large lattices. The idea of a master-field simulation is based on the observation of volume self-averaging, i.e. that the variance of volume-averaged quantities is much smaller on large lattices (intuitively, this would be because an infinitely-extended properly thermalized lattice configuration would have to contain any possible finite sub-configuration with a frequency corresponding to its weight in the path integral, and that thus a large enough typical lattice configuration is itself a sort of ensemble). A master field is then a huge (e.g. 256^4) lattice configuration, on which volume averages of quantities are computed, which have an expectation value equal to the QCD expectation value of the quantity in question, and a variance which can be estimated using a double volume sum that is doable using an FFT. To generate such huge lattices, algorithms with global accept-reject steps (like HMC) are unsuitable, because ΔH grows with the square root of the volume, but stochastic molecular dynamics (SMD) can be used, and it has been rigorously shown that for short-enough trajectory lengths SMD converges to a unique stationary state even without an accept-reject step.

After the coffee break, yet another novel simulation method was discussed by Ignacio Cirac, who presented techniques to perform quantum simulations of QED and QCD on a lattice. While quantum computers of the kind that would render RSA-based public-key cryptography irrelevant remain elusive at the moment, the idea of a quantum simulator (which is essentially an analogue quantum computer), which goes back to Richard Feynman, can already be realized in practice: optical lattices allow trapping atoms on lattice sites while fine-tuning their interactions so as to model the couplings of some other physical system, which can thus be simulated. The models that are typically simulated in this way are solid-state models such as the Hubbard model, but it is of course also possible to set up a quantum simulator for a lattice field theory that has been formulated in the Hamiltonian framework. In order to model a gauge theory, it is necessary to model the gauge symmetry by some atomic symmetry such as angular momentum conservation, and this has been done at least in theory for QED and QCD. The Schwinger model has been studied in some detail. The plaquette action for d>1+1 additionally requires a four-point interaction between the atoms modelling the link variables, which can be realized using additional auxiliary variables, and non-abelian gauge groups can be encoded using multiple species of bosonic atoms. A related theoretical tool that is still in its infancy, but shows significant promise, is the use of tensor networks.
This is based on the observation that for local Hamiltonians the entanglement between a region and its complement grows only as the surface of the region, not its volume, so only a small corner of the total Hilbert space is relevant; this allows one to write the coefficients of the wavefunction in a basis of local states as a contraction of tensors, from where classical algorithms that scale much better than the exponential growth in the number of variables that would naively be expected can be derived. Again, the method has been successfully applied to the Schwinger model, but higher dimensions are still challenging, because the scaling, while not exponential, still becomes very bad.

Staying with the topic of advanced simulation techniques, the next talk was Leonardo Giusti speaking about the block factorization of fermion determinants into local actions for multi-boson fields. By decomposing the lattice into three pieces, of which the middle one separates the other two by a distance Δ large enough to render e^(-MπΔ) small, and by applying a domain decomposition similar to the one used in Lüscher's DD-HMC algorithm to the Dirac operator, Leonardo and collaborators have been able to derive a multi-boson algorithm that allows one to perform multilevel integration with dynamical fermions. For hadronic observables, the quark propagator also needs to be factorized, which Leonardo et al. also have achieved, making a significant decrease in statistical error possible.

After the lunch break there were parallel sessions, in one of which I gave my own talk and another one of which I chaired, thus finishing all of my duties other than listening (and blogging) on day one.

In the evening, there was a reception followed by a special guided tour of the truly stunning Alhambra (which incidentally contains a great many colourful - and very tasteful - lattices in the form of ornamental patterns).

### Chad Orzel — Physics Blogging Round-Up: May

Much delayed, but this works out well because it’ll give you something to read while we’re away in Mexico on a family vacation. Here’s what I wrote for Forbes in the merry month of May:

In Science, Probability Is More Certain Than You Think: Some thoughts on the common mistake people make in saying that science only predicts probabilities of future outcomes.

A “Cosmic Controversy” Is Mostly A Distraction: A lament about the neglect of science we know to be true versus more speculative stuff.

Why Do We Invent Historical Roots For Modern Science?: Claims of ancient origins for current ideas in science often have more to do with modern concerns than historical reality.

What Things Should Every Physics Major Know?: A look at the very broad topics that are truly essential for an undergraduate physics degree.

Science Communication Is A Two-Way Street: The calmer version of a Twitter rant about how failures in science communication can’t be blamed only on scientists; the non-scientists who actively push us away also bear some responsibility.

Kind of a lot of noodle-y stuff this month, largely because of my day job. I was team-teaching our Integrated Math and Physics class with a colleague from Math, and the class met for a couple of hours a day, four days a week. It also used a book that I’d never used before, which means that even though the subject matter (introductory E&M) was familiar, it was essentially a new prep because all my notes needed to be converted to match the notation and language of the new book. That didn’t leave an enormous amount of mental energy for blogging.

Traffic-wise, the physics major post was a big hit, and most of the feedback I got was positive. Many of the others were a little too inside-baseball to get read all that widely, which is a Thing.

Anyway, that’s what I was blogging about not all that long ago.

### John Baez — The Geometric McKay Correspondence (Part 1)

The ‘geometric McKay correspondence’, actually discovered by Patrick du Val in 1934, is a wonderful relation between the Platonic solids and the ADE Dynkin diagrams. In particular, it sets up a connection between two of my favorite things, the icosahedron:

and the $\mathrm{E}_8$ Dynkin diagram:

When I recently gave a talk on this topic, I realized I didn’t understand it as well as I’d like. Since then I’ve been making progress with the help of this book:

• Alexander Kirillov Jr., Quiver Representations and Quiver Varieties, AMS, Providence, Rhode Island, 2016.

I now think I glimpse a way forward to a very concrete and vivid understanding of the relation between the icosahedron and E8. It’s really just a matter of taking the ideas in this book and working them out concretely in this case. But it takes some thought, at least for me. I’d like to enlist your help.

The rotational symmetry group of the icosahedron is a subgroup of $\mathrm{SO}(3)$ with 60 elements, so its double cover up in $\mathrm{SU}(2)$ has 120 elements. This double cover is called the binary icosahedral group, but I’ll call it $\Gamma$ for short.

This group $\Gamma$ is the star of the show, the link between the icosahedron and E8. To visualize this group, it’s good to think of $\mathrm{SU}(2)$ as the unit quaternions. This lets us think of the elements of $\Gamma$ as 120 points in the unit sphere in 4 dimensions. They are in fact the vertices of a 4-dimensional regular polytope, which looks like this:

It’s called the 600-cell.
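
These 120 unit quaternions can be written down explicitly: the 8 units $\pm 1, \pm i, \pm j, \pm k$; the 16 quaternions $\tfrac{1}{2}(\pm 1 \pm i \pm j \pm k)$; and the 96 even coordinate permutations of $\tfrac{1}{2}(\pm \varphi, \pm 1, \pm \varphi^{-1}, 0)$, where $\varphi$ is the golden ratio. Here's a small illustrative Python sketch (with a tolerance-based membership test) that builds this list and checks numerically that it is closed under quaternion multiplication, so it really is a group:

```python
import itertools
import random

phi = (1 + 5 ** 0.5) / 2  # the golden ratio

def qmul(a, b):
    # Hamilton product of quaternions written as (w, x, y, z)
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (w1*w2 - x1*x2 - y1*y2 - z1*z2,
            w1*x2 + x1*w2 + y1*z2 - z1*y2,
            w1*y2 - x1*z2 + y1*w2 + z1*x2,
            w1*z2 + x1*y2 - y1*x2 + z1*w2)

def even_permutations(t):
    # the 12 even permutations of a 4-tuple, found by counting inversions
    out = []
    for p in itertools.permutations(range(4)):
        inversions = sum(p[i] > p[j] for i in range(4) for j in range(i + 1, 4))
        if inversions % 2 == 0:
            out.append(tuple(t[k] for k in p))
    return out

group = set()
# 8 unit quaternions: +-1, +-i, +-j, +-k
for axis in range(4):
    for sign in (1.0, -1.0):
        q = [0.0, 0.0, 0.0, 0.0]
        q[axis] = sign
        group.add(tuple(q))
# 16 quaternions (+-1 +-i +-j +-k)/2
for signs in itertools.product((0.5, -0.5), repeat=4):
    group.add(signs)
# 96 even coordinate permutations of (+-phi, +-1, +-1/phi, 0)/2
for s0, s1, s2 in itertools.product((1, -1), repeat=3):
    base = (s0 * phi / 2, s1 * 0.5, s2 / (2 * phi), 0.0)
    group.update(even_permutations(base))

print(len(group))  # 120

# closure under multiplication, up to floating-point tolerance
elements = sorted(group)
random.seed(1)
for _ in range(300):
    a, b = random.choice(elements), random.choice(elements)
    prod = qmul(a, b)
    assert any(max(abs(x - y) for x, y in zip(prod, e)) < 1e-9 for e in elements)
```

These are exactly the vertices of the 600-cell, viewed as the icosian units.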

Since $\Gamma$ is a subgroup of $\mathrm{SU}(2)$ it acts on $\mathbb{C}^2,$ and we can form the quotient space

$S = \mathbb{C}^2/\Gamma$

This is a smooth manifold except at the origin—that is, the point coming from $0 \in \mathbb{C}^2.$ There’s a singularity at the origin, and this is where $\mathrm{E}_8$ is hiding! The reason is that there’s a smooth manifold $\widetilde{S}$ and a map

$\pi : \widetilde{S} \to S$

that’s one-to-one and onto except at the origin. It maps 8 spheres to the origin! There’s one of these spheres for each dot here:

Two of these spheres intersect in a point if their dots are connected by an edge; otherwise they’re disjoint.

The challenge is to find a nice concrete description of $\widetilde{S},$ the map $\pi : \widetilde{S} \to S,$ and these 8 spheres.

But first it’s good to get a mental image of $S.$ Each point in this space is a $\Gamma$ orbit in $\mathbb{C}^2,$ meaning a set like this:

$\{g x : \; g \in \Gamma \}$

for some $x \in \mathbb{C}^2.$ For $x = 0$ this set is a single point, and that’s what I’ve been calling the ‘origin’. In all other cases it’s 120 points, the vertices of a 600-cell in $\mathbb{C}^2.$ This 600-cell is centered at the point $0 \in \mathbb{C}^2,$ but it can be big or small, depending on the magnitude of $x.$

So, as we take a journey starting at the origin in $S,$ we see a point explode into a 600-cell, which grows and perhaps also rotates as we go. The origin, the singularity in $S,$ is a bit like the Big Bang.

Unfortunately not every 600-cell centered at the origin is of the form I’ve shown:

$\{g x : \; g \in \Gamma \}$

It’s easiest to see this by thinking of points in 4d space as quaternions rather than elements of $\mathbb{C}^2.$ Then the points $g \in \Gamma$ are unit quaternions forming the vertices of a 600-cell, and multiplying $g$ on the right by $x$ dilates this 600-cell and also rotates it… but we don’t get arbitrary rotations this way. To get an arbitrarily rotated 600-cell we’d have to use both a left and right multiplication, and consider

$\{x g y : \; g \in \Gamma \}$

for a pair of quaternions $x, y.$

Luckily, there’s a simpler picture of the space $S.$ It’s the space of all regular icosahedra centered at the origin in 3d space!

To see this, we start by switching to the quaternion description, which says

$S = \mathbb{H}/\Gamma$

Specifying a point $x \in \mathbb{H}$ amounts to specifying the magnitude $\|x\|$ together with $x/\|x\|,$ which is a unit quaternion, or equivalently an element of $\mathrm{SU}(2).$ So, specifying a point in

$\{g x : \; g \in \Gamma \} \in \mathbb{H}/\Gamma$

amounts to specifying the magnitude $\|x\|$ together with a point in $\mathrm{SU}(2)/\Gamma$. But $\mathrm{SU}(2)$ modulo the binary icosahedral group $\Gamma$ is the same as $\mathrm{SO}(3)$ modulo the icosahedral group (the rotational symmetry group of an icosahedron). Furthermore, $\mathrm{SO}(3)$ modulo the icosahedral group is just the space of unit-sized icosahedra centered at the origin of $\mathbb{R}^3.$

So, specifying a point

$\{g x : \; g \in \Gamma \} \in \mathbb{H}/\Gamma$

amounts to specifying a nonnegative number $\|x\|$ together with a unit-sized icosahedron centered at the origin of $\mathbb{R}^3.$ But this is the same as specifying an icosahedron of arbitrary size centered at the origin of $\mathbb{R}^3.$ There’s just one subtlety: we allow the size of this icosahedron to be zero, but then the way it’s rotated no longer matters.

So, $S$ is the space of icosahedra centered at the origin, with the ‘icosahedron of zero size’ being a singularity in this space. When we pass to the smooth manifold $\widetilde{S},$ we replace this singularity with 8 spheres, intersecting in a pattern described by the $\mathrm{E}_8$ Dynkin diagram.

Points on these spheres are limiting cases of icosahedra centered at the origin. We can approach these points by letting an icosahedron centered at the origin shrink to zero size in a clever way, perhaps spinning about wildly as it does.

I don’t understand this last paragraph nearly as well as I’d like! I’m quite sure it’s true, and I know a lot of relevant information, but I don’t see it. There should be a vivid picture of how this works, not just an abstract argument. Next time I’ll start trying to assemble the material that I think needs to go into building this vivid picture.

## June 19, 2017

### Chad Orzel — Kids Update, Programming Note

I’ve skipped a few weeks of cute-kid updates, largely because I was at DAMOP for a week, and then catching up on stuff I missed while I was at DAMOP for a week. The principal activity during this stretch has been SteelyKid’s softball, with a mad flurry of games at the end of the season to make up for all the rained-out games. This has been sort of stressful, but it also led to the greatest Google Photos animation ever, so…

Anyway, softball was fun, providing the opportunity for me to take no end of photos with my telephoto lens, some of which are pretty good. SteelyKid was way into running the bases, which ended up providing material for a blog post, so everybody wins. And I got a lot of photos like this one:

SteelyKid is pretty intense when she runs the bases.

Of course, while the intense running and dark helmet make her look a little intimidating in that, she’s still a cheerful eight-year-old, which means there’s really not a lot of killer instinct going on. For example, while she was playing first base, she high-fived every player on the other team who reached base safely:

Sportsmanlike!

Softball’s kind of a slow game, so boredom is always a danger when you have an eight-year-old’s attention span. She finds ways to pass the slower moments, though:

SteelyKid working on her fitness while her teammates bat.

(That’s not the greatest photo because the sun is setting more or less directly behind that dugout, but GIMP makes it tolerably clear…)

Speaking of short attention spans, The Pip has also come to a lot of the games, though he mostly just runs around and pays no attention to softball. Here he is stalking Kate, who’s watching SteelyKid play:

Stealthy Pip.

He’s like a ninja. In safety orange.

Much of the time, though, he’s pretty effective at keeping one or both of us from watching the game, frequently roping us into games of hide and seek or tag:

The Pip was It. Now Kate is.

(I also let him chase me around, but I’m the one who knows how to work the good camera. And more importantly, I’m the one who has editorial control over what pictures get posted here…)

And when all else fails, he can plop down on the ground and play in the grass:

“Dad, do I have grass in my hair?”

On the “programming note” side of things, I’m also aware that I haven’t posted the Forbes blog recap from May. I’m planning to type that up and post it probably Wednesday morning. Wednesday afternoon, we’re leaving on vacation for a while, going to Mexico with family, so you can expect a lot of photos of the kids doing tropical things in a few weeks…

### John Preskill — Time capsule at the Dibner Library

The first time I met Lilla Vekerdy, she was holding a book.

“A second edition of Galileo’s Sidereus nuncius. Here,” she added, thrusting the book into my hands. “Take it.”

So began my internship at the Smithsonian Institution’s Dibner Library for the History of Science and Technology.

Many people know the Smithsonian for its museums. The Smithsonian, they know, houses the ruby slippers worn by Dorothy in The Wizard of Oz. The Smithsonian houses planes constructed by Orville and Wilbur Wright, the dresses worn by First Ladies on presidential inauguration evenings, a space shuttle, and a Tyrannosaurus Rex skeleton. Smithsonian museums line the National Mall in Washington, D.C.—the United States’ front lawn—and march beyond.

Most people don’t know that the Smithsonian has 21 libraries.

Lilla heads the Smithsonian Libraries’ Special-Collections Department. She also directs a library tucked into a corner of the National Museum of American History. I interned at that library—the Dibner—in college. Images of Benjamin Franklin, of inventor Eli Whitney, and of astronomical instruments line the walls. The reading room contains styrofoam cushions on which scholars lay crumbling rare books. Lilla and the library’s technician, Morgan Aronson, find references for researchers, curate exhibitions, and introduce students to science history. They also care for the vault.

The vault. How I’d missed the vault.

A corner of the Dibner’s reading room and part of the vault

The vault contains manuscripts and books from the past ten centuries. We handle the items without gloves, which reduce our fingers’ sensitivities: Interpose gloves between yourself and a book, and you’ll raise your likelihood of ripping a page. A temperature of 65°F inhibits mold from growing. Red rot mars some leather bindings, though, and many crowns—tops of books’ spines—have collapsed. Aging carries hazards.

But what the ages have carried to the Dibner! We1 have a survey filled out by Einstein and a first edition of Newton’s Principia mathematica. We have Euclid’s Geometry in Latin, Arabic, and English, from between 1482 and 1847. We have a note, handwritten by quantum physicist Erwin Schrödinger, about why students shouldn’t fear exams.

I returned to the Dibner one day this spring. Lilla and I fetched out manuscripts and books related to quantum physics and thermodynamics. “Hermann Weyl” labeled one folder.

Weyl contributed to physics and mathematics during the early 1900s. I first encountered his name when studying particle physics. The Dibner, we discovered, owns a proof for part of his 1928 book Gruppentheorie und Quantenmechanik. Weyl appears to have corrected a typed proof by hand. He’d also handwritten spin matrices.

Electrons have a property called “spin.” Spin resembles a property of yours, your position relative to the Earth’s center. We represent your position with three numbers: your latitude, your longitude, and your distance above the Earth’s surface. We represent electron spin with three blocks of numbers, three $2 \times 2$ matrices. Today’s physicists write the matrices as2

$S_x = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \, , \quad S_y = \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix} \, , \quad \text{and} \quad S_z = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \, .$

We needn’t write these matrices. We could represent electron spin with different $2 \times 2$ matrices, so long as the matrices obey certain properties. But most physicists choose the above matrices, in my experience. We call our choice “a convention.”

Weyl chose a different convention:

$S_x = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \, , \quad S_y = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \, , \quad \text{and} \quad S_z = \begin{bmatrix} 0 & i \\ -i & 0 \end{bmatrix} \, .$

The difference surprised me. Perhaps it shouldn’t have: Conventions change. Approaches to quantum physics change. Weyl’s matrices differ from ours little: Permute our matrices and negate one matrix, and you recover Weyl’s.
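
A quick numpy check (purely illustrative) confirms that both triples satisfy the required properties, namely that each matrix squares to the identity and distinct matrices anticommute, and makes the "permute and negate one" relation explicit:

```python
import itertools
import numpy as np

# today's convention (the Pauli matrices, factors of hbar/2 omitted)
Sx = np.array([[0, 1], [1, 0]], dtype=complex)
Sy = np.array([[0, -1j], [1j, 0]])
Sz = np.array([[1, 0], [0, -1]], dtype=complex)

# Weyl's convention from the Dibner proof
Wx = np.array([[1, 0], [0, -1]], dtype=complex)
Wy = np.array([[0, 1], [1, 0]], dtype=complex)
Wz = np.array([[0, 1j], [-1j, 0]])

I2 = np.eye(2)
for triple in ((Sx, Sy, Sz), (Wx, Wy, Wz)):
    for a in triple:
        assert np.allclose(a @ a, I2)          # each squares to the identity
    for a, b in itertools.combinations(triple, 2):
        assert np.allclose(a @ b + b @ a, 0)   # distinct matrices anticommute

# permute today's matrices and negate one, and you recover Weyl's
assert np.allclose(Wx, Sz) and np.allclose(Wy, Sx) and np.allclose(Wz, -Sy)
```

So the two conventions really do differ only by a relabeling of axes and a single sign.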

But the electron-spin matrices play a role, in quantum physics, like the role played by T. Rex in paleontology exhibits: All quantum scientists recognize electron spin. We illustrate with electron spin in examples. Students memorize spin matrices in undergrad classes. Homework problems feature electron spin. Physicists have known of electron spin’s importance for decades. I didn’t expect such a bedrock to have changed its trappings.

How did scientists’ convention change? When did it? Why? Or did the convention not change—did Weyl’s contemporaries use today’s convention, and did Weyl stand out?

A partially typed, partially handwritten, proof of a book by Hermann Weyl.

I intended to end this article with these questions. I sent a draft to John Preskill, proposing to post soon. But he took up the questions like a knight taking up arms.

Wolfgang Pauli, John emailed, appears to have written the matrices first. (Physicists call the matrices “Pauli matrices.”) A 1927 paper by Pauli contains the notation used today. Paul Dirac copied the notation in a 1928 paper, acknowledging Pauli. Weyl’s book appeared the same year. The following year, Weyl used Pauli’s notation in a paper.

No document we know of, apart from the Dibner proof, contains the Dibner-proof notation. Did the notation change between the proof-writing and publication? Does the Dibner hold the only anomalous electron-spin matrices? What accounts for the anomaly?

If you know, feel free to share. If you visit DC, drop Lilla and Morgan a line. Bring a research project. Bring a class. Bring zeal for the past. You might find yourself holding a time capsule by Galileo.

Dibner librarian Lilla Vekerdy and a former intern

With thanks to Lilla and Morgan for their hospitality, time, curiosity, and expertise. With thanks to John for burrowing into the Pauli matrices’ history.

1I continue to count myself as part of the Dibner community. Part of me refuses to leave.

2I’ll omit factors of $\hbar / 2 \, .$

## June 18, 2017

### Sean Carroll — A Response to “On the time lags of the LIGO signals” (Guest Post)

This is a special guest post by Ian Harry, postdoctoral physicist at the Max Planck Institute for Gravitational Physics, Potsdam-Golm. You may have seen stories about a paper that recently appeared, which called into question whether the LIGO gravitational-wave observatory had actually detected signals from inspiralling black holes, as they had claimed. Ian’s post is an informal response to these claims, on behalf of the LIGO Scientific Collaboration. He argues that there are data-analysis issues that render the new paper, by James Creswell et al., incorrect. Happily, there are sufficient online tools that this is a question that interested parties can investigate for themselves. Here’s Ian:

On 13 Jun 2017 a paper appeared on the arXiv titled “On the time lags of the LIGO signals” by Creswell et al. This paper calls into question the 5-sigma detection claim of GW150914 and following detections. In this short response I will refute these claims.

Who am I? I am a member of the LIGO collaboration. I work on the analysis of LIGO data, and for 10 years have been developing searches for compact binary mergers. The conclusions I draw here have been checked by a number of colleagues within the LIGO and Virgo collaborations. We are also in touch with the authors of the article to raise these concerns directly, and plan to write a more formal short paper for submission to the arXiv explaining in more detail the issues I mention below. In the interest of timeliness, and in response to numerous requests from outside of the collaboration, I am sharing these notes in the hope that they will clarify the situation.

In this article I will go into some detail to try to refute the claims of Creswell et al. Let me start though by trying to give a brief overview. In Creswell et al. the authors take LIGO data made available through the LIGO Open Science Center from the Hanford and Livingston observatories and perform a simple Fourier analysis on that data. They find the noise to be correlated as a function of frequency. They also perform a time-domain analysis and claim that there are correlations between the noise in the two observatories, which are present after removing the GW150914 signal from the data. These results are used to cast doubt on the reliability of the GW150914 observation. There are a number of reasons why this conclusion is incorrect:

1. The frequency-domain correlations they are seeing arise from the way they do their FFT on the filtered data. We have managed to demonstrate the same effect with simulated Gaussian noise.
2. LIGO analyses use whitened data when searching for compact binary mergers such as GW150914. When repeating the analysis of Creswell et al. on whitened data these effects are completely absent.
3. Our 5-sigma significance comes from a procedure of repeatedly time-shifting the data, which is not invalidated if correlations of the type described in Creswell et al. are present.

Section II: The curious case of the Fourier phase correlations?

The main result (in my opinion) from section II of Creswell et al. is Figure 3, which shows that, when one takes the Fourier transform of the LIGO data containing GW150914, and plots the Fourier phases as a function of frequency, one can see a clear correlation (i.e. all the points line up, especially for the Hanford data). I was able to reproduce this with the LIGO Open Science Center data and a small ipython notebook. I make the ipython notebook available so that the reader can see this and some additional plots, and can reproduce the analysis.

For Gaussian noise we would expect the Fourier phases to be distributed randomly (between -pi and pi). Clearly in the plot shown above, and in Creswell et al., this is not the case. However, the authors overlooked one critical detail here. When you take a Fourier transform of a time series you are implicitly assuming that the data are cyclical (i.e. that the first point is adjacent to the last point). For colored Gaussian noise this assumption will lead to a discontinuity in the data at the two end points, because these data are not causally connected. This discontinuity can be responsible for misleading plots like the one above.
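
A minimal numpy illustration of this point (a toy, not the notebook's analysis): white Gaussian noise has scattered Fourier phases, while a signal whose periodic extension has a single discontinuity at the wrap-around point, here simply a linear ramp, has phases that line up exactly on a straight line:

```python
import numpy as np

n = 256
rng = np.random.default_rng(0)

# white Gaussian noise: phases scattered over (-pi, pi]
noise_phases = np.angle(np.fft.rfft(rng.standard_normal(n)))[1:]

# a linear ramp: its periodic extension jumps at the wrap-around point,
# and that single discontinuity forces the phases onto a straight line
ramp_phases = np.angle(np.fft.rfft(np.arange(n, dtype=float)))[1:]

# for the ramp, consecutive phase differences are all exactly pi/n
print(np.allclose(np.diff(ramp_phases), np.pi / n))   # True
# for the noise, the phase differences scatter widely
print(np.std(np.diff(noise_phases)) > 1.0)            # True
```

The ramp is of course an extreme case, but any sharp jump at the segment boundary pushes the phases toward the same kind of alignment.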

To try to demonstrate this I perform two tests. First I whiten the colored LIGO noise by measuring the power spectral density (see the LOSC example, which I use directly in my ipython notebook, for some background on colored noise and noise power spectral density), then dividing the data in the Fourier domain by the power spectral density, and finally converting back to the time domain. This process will corrupt some data at the edges so after whitening we only consider the middle half of the data. Then we can make the same plot:

And we can see that there are now no correlations visible in the data. For white Gaussian noise there is no correlation between adjacent points, so no discontinuity is introduced when treating the data as cyclical. I therefore assert that Figure 3 of Creswell et al. actually has no meaning when generated using anything other than whitened data.
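
The whitening step itself can be sketched in a few lines of numpy. This is a toy version with a crude Welch-style PSD estimate, not the LOSC code, and it divides by the amplitude spectral density (the square root of the PSD) so that the output spectrum comes out approximately flat:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1 << 14

# colored Gaussian noise: a simple AR(1) low-pass filter of white noise
white = rng.standard_normal(n)
colored = np.empty(n)
colored[0] = white[0]
for i in range(1, n):
    colored[i] = 0.9 * colored[i - 1] + white[i]

def whiten(data, seg=1024):
    # crude PSD estimate: average periodograms over non-overlapping segments
    chunks = data[: len(data) // seg * seg].reshape(-1, seg)
    psd = np.mean(np.abs(np.fft.rfft(chunks, axis=1)) ** 2, axis=0)
    # divide the full-length Fourier transform by the interpolated ASD
    asd = np.interp(np.fft.rfftfreq(len(data)), np.fft.rfftfreq(seg), np.sqrt(psd))
    return np.fft.irfft(np.fft.rfft(data) / asd, n=len(data))

def flatness(data):
    # ratio of mean power in the lower half of the band to the upper half
    p = np.abs(np.fft.rfft(data)) ** 2
    return p[1 : len(p) // 2].mean() / p[len(p) // 2 :].mean()

print(flatness(colored) > 10)                 # True: low-frequency dominated
print(0.5 < flatness(whiten(colored)) < 2.0)  # True: roughly flat afterwards
```

Real LIGO whitening is more careful (windowing, line handling, edge effects), but the principle is the same.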

I would also like to mention that measuring the noise power spectral density of LIGO data can be challenging when the data are non-stationary and include spectral lines (as Creswell et al. point out). Therefore it can be difficult to whiten data in many circumstances. For the Livingston data some of the spectral lines are still clearly present after whitening (using the methods described in the LOSC example), and then mild correlations are present in the resulting plot (see ipython notebook). This is not indicative of any type of non-Gaussianity, but demonstrates that measuring the noise power-spectral density of LIGO noise is difficult, and, especially for parameter inference, a lot of work has been spent on answering this question.

To further illustrate that features like those seen in Figure 3 of Creswell et al. can be seen in known Gaussian noise I perform an additional check (suggested by my colleague Vivien Raymond). I generate a 128 second stretch of white Gaussian noise (using numpy.random.normal) and invert the whitening procedure employed on the LIGO data above to produce 128 seconds of colored Gaussian noise. Now the data, previously random, are ‘colored’. Coloring the data in the manner I did makes the full data set cyclical (the last point is correlated with the first), so taking the Fourier transform of the complete data set, I see the expected random distribution of phases (again, see the ipython notebook). However, if I select 32s from the middle of this data, introducing a discontinuity as I mention above, I can produce the following plot:

In other words, I can produce an even more extremely correlated example than on the real data, with actual Gaussian noise.

Section III: The data is strongly correlated even after removing the signal

The second half of Creswell et al. explores correlations between the data taken from Hanford and Livingston around GW150914. For me, the main conclusion here is communicated in Figure 7, where Creswell et al. claim that even after removal of the GW150914 best-fit waveform there is still correlation in the data between the two observatories. This is a result I have not been able to reproduce. Nevertheless, if such a correlation were present it would suggest that we have not perfectly subtracted the real signal from the data, which would not invalidate any detection claim. There could be any number of reasons for this, for example the fact that our best-fit waveform will not exactly match what is in the data as we cannot measure all parameters with infinite precision. There might also be some deviations because the waveform models we used, while very good, are only approximations to the real signal (LIGO put out a paper quantifying this possibility). Such deviations might also be indicative of a subtle deviation from general relativity. These are of course things that LIGO is very interested in pursuing, and we have published a paper exploring potential deviations from general relativity (finding no evidence for that), which includes looking for a residual signal in the data after subtraction of the waveform (and again finding no evidence for that).

Finally, LIGO runs “unmodelled” searches, which do not search for specific signals, but instead look for any coherent non-Gaussian behaviour in the observatories. These searches actually were the first to find GW150914, and did so with remarkably consistent parameters to the modelled searches, something which we would not expect to be true if the modelled searches are “missing” some parts of the signal.

With that all said, I try to reproduce Figure 7. First I cross-correlate the Hanford and Livingston data, after whitening and band-passing, in a very narrow 0.02s window around GW150914. This produces the following:

There is a clear spike here at 7ms (which is GW150914), with some expected “ringing” behaviour around this point. This is a much less powerful method to extract the signal than matched filtering, but it is completely signal independent, and illustrates how loud GW150914 is. Creswell et al., however, do not discuss their normalization of this cross-correlation, or how likely a deviation like this is to occur from noise alone. Such a study would be needed before stating that this is significant; in this case we know this signal is significant from other, more powerful, tests of the data. Then I repeat this but after having removed the best-fit waveform from the data in both observatories (using only products made available in the LOSC example notebooks). This gives:

This shows nothing interesting at all.

Section IV: Why would such correlations not invalidate the LIGO detections?

Creswell et al. claim that correlations between the Hanford and Livingston data, which in their results appear to be maximized around the time delay reported for GW150914, raise questions about the integrity of the detection. They do not. The authors claim early on in their article that LIGO data analysis assumes that the data are Gaussian, independent and stationary. In fact, we know that LIGO data are neither Gaussian nor stationary, and if one reads through the technical paper accompanying the detection PRL, you can read about the many tests we run to try to distinguish between non-Gaussianities in our data and real signals. But in doing such tests, we raise an important question: “If you see something loud, how can you be sure it is not some chance instrumental artifact, which somehow was missed in the various tests that you do?” Because of this we have to be very careful when assessing the significance (in terms of sigmas—or the p-value, to use the correct term). We assess the significance using a process called time-shifts. We first look through all our data for loud events within the 10ms time-window corresponding to the light travel time between the two observatories. Then we look again. Except the second time we look, we shift ALL of the data from Livingston by 0.1s. This delay is much larger than the light travel time, so if we see any interesting “events” now they cannot be genuine astrophysical events, but must be some noise transient. We then repeat this process with a 0.2s delay, 0.3s delay and so on up to time delays on the order of weeks long. In this way we’ve conducted of order 10 million experiments. For the case of GW150914 the signal in the non-time-shifted data was louder than any event we saw in any of the time-shifted runs—all 10 million of them. In fact, it was still a lot louder than any event in the time-shifted runs as well. Therefore we can say that this is a 1-in-10-million event, without making any assumptions at all about our noise. Except one. The assumption is that the analysis with Livingston data shifted by e.g. 8s (or any of the other time shifts) is equivalent to the analysis with the Livingston data not shifted at all. Or, in other words, we assume that there is nothing special about the non-time-shifted analysis (other than that it might contain real signals!). As well as the technical papers, this is also described in the science summary that accompanied the GW150914 PRL.
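
The time-shift idea can be illustrated with an entirely made-up numpy toy: a short sinusoidal “blip” injected into two streams of white noise with a small relative delay stands in for a real signal in real detector noise. The coincident analysis yields a louder cross-correlation statistic than any of the time-shifted background analyses, which is what bounds the false-alarm probability:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 4096

# a common "blip" injected into both streams, 3 samples later in the second
t = np.arange(80)
blip = 10.0 * np.sin(2 * np.pi * t / 16) * np.hanning(80)
han = rng.standard_normal(n)
liv = rng.standard_normal(n)
han[2000:2080] += blip
liv[2003:2083] += blip

def stat(a, b, max_lag=10):
    # loudest cross-correlation over the physically allowed relative lags
    return max(np.dot(a, np.roll(b, lag)) for lag in range(-max_lag, max_lag + 1))

observed = stat(han, liv)

# background: repeat the analysis with unphysically large relative time shifts
background = [stat(han, np.roll(liv, shift)) for shift in range(100, 4000, 100)]

louder = sum(b >= observed for b in background)
print(louder)  # 0: no time-shifted analysis is as loud as the coincident one
print((louder + 1) / (len(background) + 1))  # conservative p-value bound
```

Note that the bound requires no model of the noise at all, only the assumption that the shifted analyses are statistically equivalent to the unshifted one; the real analyses simply run vastly more shifts.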

Nothing in the paper “On the time lags of the LIGO signals” suggests that the non-time-shifted analysis is special. The claimed correlations between the two detectors due to resonance and calibration lines in the data would be present also in the time-shifted analyses: the calibration lines are repetitive, so if they are correlated in the non-time-shifted analyses, they will also be correlated in the time-shifted analyses. I should also note that potential correlated noise sources were explored in another of the companion papers to the GW150914 PRL. Therefore, taking the results of this paper at face value, I see nothing that calls into question the “integrity” of the GW150914 detection.

Section V: Wrapping up

I have tried to reproduce the results quoted in “On the time lags of the LIGO signals”. I find that the claims of section 2 are due to an issue in how the data are Fourier transformed, and I have not been able to reproduce the correlations claimed in section 3. Even taking the results at face value, they would not affect the 5-sigma confidence associated with GW150914. Nevertheless I am in contact with the authors and we will try to understand these discrepancies.

For people interested in trying to explore LIGO data, check out the LIGO Open Science Center tutorials. As someone who was involved in the internal review of the LOSC data products it is rewarding to see these materials being used. It is true that these tutorials are intended as an introduction to LIGO data analysis, and do not accurately reflect many of the intricacies of these studies. For the interested reader a number of technical papers, for example this one, accompany the main PRL, and within this paper and its references you can find all the nitty-gritty about how our analyses work. Finally, the PyCBC analysis toolkit, which was used to obtain the 5-sigma confidence, and of which I am one of the primary developers, is available open-source on GitHub. There are instructions here, as well as a number of examples illustrating aspects of our data analysis methods.

This article was circulated in the LIGO-Virgo Public Outreach and Education mailing list before being made public, and I am grateful for comments and feedback from: Christopher Berry, Ofek Birnholtz, Alessandra Buonanno, Gregg Harry, Martin Hendry, Daniel Hoak, Daniel Holz, David Keitel, Andrew Lundgren, Harald Pfeiffer, Vivien Raymond, Jocelyn Read and David Shoemaker.

## June 16, 2017

### Robert Helling — I got this wrong

In yesterday's post, I totally screwed up when identifying the middle part of the spectrum as low frequency. It is not. Please ignore what I said, or better, take it as a warning about what happens when you don't double-check.

Apologies to everybody that I stirred up!