Planet Musings

September 24, 2017

Scott Aaronson: My Big Numbers talk at Festivaletteratura

Last weekend, I gave a talk on big numbers, as well as a Q&A about quantum computing, at Festivaletteratura: one of the main European literary festivals, held every year in beautiful and historic Mantua, Italy.  (For those who didn’t know, as I didn’t: this is the city where Virgil was born, and where Romeo gets banished in Romeo and Juliet.  Its layout hasn’t substantially changed since the Middle Ages.)

I don’t know how much big numbers or quantum computing have to do with literature, but I relished the challenge of explaining these things to an audience that was not merely “popular” but humanistically rather than scientifically inclined.  In this case, there was not only a math barrier, but also a language barrier, as the festival was mostly in Italian and only some of the attendees knew English, to varying degrees.  The quantum computing session was live-translated into Italian (the challenge faced by the translator in not mangling this material provided a lot of free humor), but the big numbers talk wasn’t.  What’s more, the talk was held outdoors, on the steps of a cathedral, with tons of background noise, including a bell that loudly chimed halfway through the talk.  So if my own words weren’t simple and clear, forget it.

Anyway, in the rest of this post, I’ll share a writeup of my big numbers talk.  The talk has substantial overlap with my “classic” Who Can Name The Bigger Number? essay from 1999.  While I don’t mean to supersede or displace that essay, the truth is that I think and write somewhat differently than I did as a teenager (whuda thunk?), and I wanted to give Scott2017 a crack at material that Scott1999 has been over already.  If nothing else, the new version is more up-to-date and less self-indulgent, and it includes points (for example, the relation between ordinal generalizations of the Busy Beaver function and the axioms of set theory) that I didn’t understand back in 1999.

For regular readers of this blog, I don’t know how much will be new here.  But if you’re one of those people who keeps introducing themselves at social events by saying “I really love your blog, Scott, even though I don’t understand anything that’s in it”—something that’s always a bit awkward for me, because, uh, thanks, I guess, but what am I supposed to say next?—then this lecture is for you.  I hope you’ll read it and understand it.

Thanks so much to Festivaletteratura organizer Matteo Polettini for inviting me, and to Fabrizio Illuminati for moderating the Q&A.  I had a wonderful time in Mantua, although I confess there’s something about being Italian that I don’t understand.  Namely: how do you derive any pleasure from international travel, if anywhere you go, the pizza, pasta, bread, cheese, ice cream, coffee, architecture, scenery, historical sights, and pretty much everything else all fall short of what you’re used to?

Big Numbers

by Scott Aaronson
Sept. 9, 2017

My four-year-old daughter sometimes comes to me and says something like: “daddy, I think I finally figured out what the biggest number is!  Is it a million million million million million million million million thousand thousand thousand hundred hundred hundred hundred twenty eighty ninety eighty thirty a million?”

So I reply, “I’m not even sure exactly what number you named—but whatever it is, why not that number plus one?”

“Oh yeah,” she says.  “So is that the biggest number?”

Of course there’s no biggest number, but it’s natural to wonder what are the biggest numbers we can name in a reasonable amount of time.  Can I have two volunteers from the audience—ideally, two kids who like math?

[Two kids eventually come up.  I draw a line down the middle of the blackboard, and place one kid on each side of it, each with a piece of chalk.]

So the game is, you each have ten seconds to write down the biggest number you can.  You can’t write anything like “the other person’s number plus 1,” and you also can’t write infinity—it has to be finite.  But other than that, you can write basically anything you want, as long as I’m able to understand exactly what number you’ve named.  [These instructions are translated into Italian for the kids.]

Are you ready?  On your mark, get set, GO!

[The kid on the left writes something like: 999999999

While the kid on the right writes something like: 11111111111111111

Looking at these, I comment:]

9 is bigger than 1, but 1 is a bit faster to write, and as you can see that makes the difference here!  OK, let’s give our volunteers a round of applause.

[I didn’t plant the kids, but if I had, I couldn’t have designed a better jumping-off point.]

I’ve been fascinated by how to name huge numbers since I was a kid myself.  When I was a teenager, I even wrote an essay on the subject, called Who Can Name the Bigger Number?  That essay might still get more views than any of the research I’ve done in all the years since!  I don’t know whether to be happy or sad about that.

I think the reason the essay remains so popular is that it shows up on Google whenever someone types something like “what is the biggest number?”  Some of you might know that Google itself was named after the huge number called a googol: 10^100, or 1 followed by a hundred zeroes.

Of course, a googol isn’t even close to the biggest number we can name.  For starters, there’s a googolplex, which is 1 followed by a googol zeroes.  Then there’s a googolplexplex, which is 1 followed by a googolplex zeroes, and a googolplexplexplex, and so on.  But one of the most basic lessons you’ll learn in this talk is that, when it comes to naming big numbers, whenever you find yourself just repeating the same operation over and over and over, it’s time to step back, and look for something new to do that transcends everything you were doing previously.  (Applications to everyday life left as exercises for the listener.)

One of the first people to think about systems for naming huge numbers was Archimedes, who was Greek but lived in what’s now Italy (specifically Syracuse, Sicily) in the 200s BC.  Archimedes wrote a sort of pop-science article—possibly history’s first pop-science article—called The Sand-Reckoner.  In this remarkable piece, which was addressed to the King of Syracuse, Archimedes sets out to calculate an upper bound on the number of grains of sand needed to fill the entire universe, or at least the universe as known in antiquity.  He thereby seeks to refute people who use “the number of sand grains” as a shorthand for uncountability and unknowability.

Of course, Archimedes was just guessing about the size of the universe, though he did use the best astronomy available in his time—namely, the work of Aristarchus, who anticipated Copernicus.  Besides estimates for the size of the universe and of a sand grain, the other thing Archimedes needed was a way to name arbitrarily large numbers.  Since he didn’t have Arabic numerals or scientific notation, his system was basically just to compose the word “myriad” (which means 10,000) into bigger and bigger chunks: a “myriad myriad” gets its own name, a “myriad myriad myriad” gets another, and so on.  Using this system, Archimedes estimated that ~10^63 sand grains would suffice to fill the universe.  Ancient Hindu mathematicians were able to name similarly large numbers using similar notations.  In some sense, the next really fundamental advances in naming big numbers wouldn’t occur until the 20th century.

We’ll come to those advances, but before we do, I’d like to discuss another question that motivated Archimedes’ essay: namely, what are the biggest numbers relevant to the physical world?

For starters, how many atoms are in a human body?  Anyone have a guess?  About 10^28.  (If you remember from high-school chemistry that a “mole” is 6×10^23, this is not hard to ballpark.)
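To see how the ballpark works, here’s a minimal back-of-the-envelope sketch.  The body mass and composition figures are round textbook numbers of my own choosing, not from the talk:

```python
# Ballpark the number of atoms in a human body.
# Assumptions (mine, for illustration): a 70 kg body composed of oxygen,
# carbon, hydrogen, and nitrogen in roughly these mass fractions.
AVOGADRO = 6.022e23          # atoms per mole

mass_g = 70_000              # 70 kg body, in grams
composition = {              # element: (mass fraction, atomic mass in g/mol)
    "O": (0.65, 16.0),
    "C": (0.18, 12.0),
    "H": (0.10, 1.0),
    "N": (0.03, 14.0),
}

moles = sum(mass_g * frac / amass for frac, amass in composition.values())
atoms = moles * AVOGADRO
print(f"{atoms:.1e}")        # roughly 7e27, i.e. ~10^28
```

Note that hydrogen, despite being only 10% of the mass, contributes most of the atom count, since it’s 16 times lighter per atom than oxygen.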

How many stars are in our galaxy?  Estimates vary, but let’s say a few hundred billion.

How many stars are in the entire observable universe?  Something like 10^23.

How many subatomic particles are in the observable universe?  No one knows for sure—for one thing, because we don’t know what the dark matter is made of—but 10^90 is a reasonable estimate.

Some of you might be wondering: but for all anyone knows, couldn’t the universe be infinite?  Couldn’t it have infinitely many stars and particles?  The answer to that is interesting: indeed, no one knows whether space goes on forever or curves back on itself, like the surface of the earth.  But because of the dark energy, discovered in 1998, it seems likely that even if space is infinite, we can only ever see a finite part of it.  The dark energy is a force that pushes the galaxies apart.  The further away they are from us, the faster they’re receding—with galaxies far enough away from us receding faster than light.

Right now, we can see the light from galaxies that are up to about 45 billion light-years away.  (Why 45 billion light-years, you ask, if the universe itself is “only” 13.8 billion years old?  Well, when the galaxies emitted the light, they were a lot closer to us than they are now!  The universe expanded in the meantime.)  If, as seems likely, the dark energy has the form of a cosmological constant, then there’s a somewhat further horizon, such that it’s not just that the galaxies beyond that can’t be seen by us right now—it’s that they can never be seen.

In practice, many big numbers come from the phenomenon of exponential growth.  Here’s a graph showing the three functions n, n^2, and 2^n:

The difference is, n and even n^2 grow in a more-or-less manageable way, but 2^n just shoots up off the screen.  The shooting-up has real-life consequences—indeed, more important consequences than just about any other mathematical fact one can think of.

The current human population is about 7.5 billion (when I was a kid, it was more like 5 billion).  Right now, the population is doubling about once every 64 years.  If it continues to double at that rate, and humans don’t colonize other worlds, then you can calculate that, less than 3000 years from now, the entire earth, all the way down to the core, will be made of human flesh.  I hope the people use deodorant!
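The “earth of flesh” arithmetic is easy to check.  The round numbers below (70 kg per person, 6×10^24 kg for the earth) are my own plug-ins, but any reasonable choices give the same conclusion:

```python
import math

# Sanity-check the claim: how many 64-year doublings until the total
# mass of humanity exceeds the mass of the earth?
# Assumed round numbers: 7.5e9 people at ~70 kg each, earth ~6e24 kg.
population = 7.5e9
person_kg = 70.0
earth_kg = 6.0e24

max_people = earth_kg / person_kg                 # ~8.6e22 people
doublings = math.log2(max_people / population)    # ~43.5 doublings
years = doublings * 64
print(round(years))                               # comfortably under 3000
```

The punchline of exponential growth: even a target as absurd as the mass of the planet is only about 43 doublings away.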

Nuclear chain reactions are a second example of exponential growth: one uranium or plutonium nucleus fissions and emits neutrons that cause, let’s say, two other nuclei to fission, which then cause four nuclei to fission, then 8, 16, 32, and so on, until boom, you’ve got your nuclear weapon (or your nuclear reactor, if you do something to slow the process down).  A third example is compound interest, as with your bank account, or for that matter an entire country’s GDP.  A fourth example is Moore’s Law, which is the thing that said that the number of components in a microprocessor doubled every 18 months (with other metrics, like memory, processing speed, etc., on similar exponential trajectories).  Here at Festivaletteratura, there’s a “Hack Space,” where you can see state-of-the-art Olivetti personal computers from around 1980: huge desk-sized machines with maybe 16K of usable RAM.  Moore’s Law is the thing that took us from those (and the even bigger, weaker computers before them) to the smartphone that’s in your pocket.

However, a general rule is that any time we encounter exponential growth in our observed universe, it can’t last for long.  It will stop, if not before, then when it runs out of whatever resource it needs to continue: for example, food or land in the case of people, or fuel in the case of a nuclear reaction.  OK, but what about Moore’s Law: what physical constraint will stop it?

By some definitions, Moore’s Law has already stopped: computers aren’t getting that much faster in terms of clock speed; they’re mostly just getting more and more parallel, with more and more cores on a chip.  And it’s easy to see why: the speed of light is finite, which means the speed of a computer will always be limited by the size of its components.  And transistors are now just 15 nanometers across; a couple orders of magnitude smaller and you’ll be dealing with individual atoms.  And unless we leap really far into science fiction, it’s hard to imagine building a transistor smaller than one atom across!

OK, but what if we do leap really far into science fiction?  Forget about engineering difficulties: is there any fundamental principle of physics that prevents us from making components smaller and smaller, and thereby making our computers faster and faster, without limit?

While no one has tested this directly, it appears from current physics that there is a fundamental limit to speed, and that it’s about 10^43 operations per second, or one operation per Planck time.  Likewise, it appears that there’s a fundamental limit to the density with which information can be stored, and that it’s about 10^69 bits per square meter, or one bit per Planck area. (Surprisingly, the latter limit scales only with the surface area of a region, not with its volume.)
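Both quoted rates follow, up to order-one factors, from the Planck units themselves.  A quick check, plugging in the standard CODATA values (the constants are my inputs, not the talk’s):

```python
# Recover the quoted limits from the Planck time and Planck length.
# Up to order-one factors that I'm ignoring here, "one operation per
# Planck time" and "one bit per Planck area" give the talk's numbers.
planck_time = 5.39e-44        # seconds (CODATA value, rounded)
planck_length = 1.616e-35     # meters

ops_per_second = 1 / planck_time        # ~1.9e43, matching the ~10^43 quoted
bits_per_m2 = 1 / planck_length**2      # ~3.8e69, matching the ~10^69 quoted
print(f"{ops_per_second:.1e} ops/s, {bits_per_m2:.1e} bits/m^2")
```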

What would happen if you tried to build a faster computer than that, or a denser hard drive?  The answer is: cycling through that many different states per second, or storing that many bits, would involve concentrating so much energy in so small a region, that the region would exceed what’s called its Schwarzschild radius.  If you don’t know what that means, it’s just a fancy way of saying that your computer would collapse to a black hole.  I’ve always liked that as Nature’s way of telling you not to do something!

Note that, on the modern view, a black hole itself is not only the densest possible object allowed by physics, but also the most efficient possible hard drive, storing ~10^69 bits per square meter of its event horizon—though the bits are not so easy to retrieve! It’s also, in a certain sense, the fastest possible computer, since it really does cycle through 10^43 states per second—though it might not be computing anything that anyone would care about.

We can also combine these fundamental limits on computer speed and storage capacity, with the limits that I mentioned earlier on the size of the observable universe, which come from the cosmological constant.  If we do so, we get an upper bound of ~10^122 on the number of bits that can ever be involved in any computation in our world, no matter how large: if we tried to do a bigger computation than that, the far parts of it would be receding away from us faster than the speed of light.  In some sense, this 10^122 is the most fundamental number that sets the scale of our universe: on the current conception of physics, everything you’ve ever seen or done, or will see or will do, can be represented by a sequence of at most 10^122 ones and zeroes.

Having said that, in math, computer science, and many other fields (including physics itself), many of us meet bigger numbers than 10^122 dozens of times before breakfast! How so? Mostly because we choose to ask, not about the number of things that are, but about the number of possible ways they could be—not about the size of ordinary 3-dimensional space, but the sizes of abstract spaces of possible configurations. And the latter are subject to exponential growth, continuing way beyond 10^122.

As an example, let’s ask: how many different novels could possibly be written (say, at most 400 pages long, with a normal-size font, yadda yadda)? Well, we could get a lower bound on the number just by walking around here at Festivaletteratura, but the number that could be written certainly far exceeds the number that have been written or ever will be. This was the subject of Jorge Luis Borges’ famous story The Library of Babel, which imagined an immense library containing every book that could possibly be written up to a certain length. Of course, the vast majority of the books are filled with meaningless nonsense, but among their number one can find all the great works of literature, books predicting the future of humanity in perfect detail, books predicting the future except with a single error, etc. etc. etc.

To get more quantitative, let’s simply ask: how many different ways are there to fill the first page of a novel?  Let’s go ahead and assume that the page is filled with intelligible (or at least grammatical) English text, rather than arbitrary sequences of symbols, at a standard font size and page size.  In that case, using standard estimates for the entropy (i.e., compressibility) of English, I estimated this morning that there are maybe ~10^700 possibilities.  So, forget about the rest of the novel: there are astronomically more possible first pages than could fit in the observable universe!
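An estimate of this flavor is a one-liner.  The inputs below are my own round numbers (not the talk’s actual morning calculation): a couple thousand characters per page, and roughly one bit of entropy per character of English, in the spirit of Shannon’s classic estimates:

```python
import math

# Order-of-magnitude count of distinct intelligible first pages.
# Assumptions (mine, for illustration): ~2300 characters on a page,
# ~1 bit of entropy per character of English text.
chars_per_page = 2300
bits_per_char = 1.0

# 2^(total bits) possibilities; convert the exponent to base 10.
distinct_pages_exponent = chars_per_page * bits_per_char * math.log10(2)
print(f"~10^{distinct_pages_exponent:.0f} possible first pages")
```

Changing the inputs within reason moves the exponent by a factor of two or so, but the conclusion—vastly more than 10^122—is robust.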

We could likewise ask: how many chess games could be played?  I’ve seen estimates from 10^40 up to 10^120, depending on whether we count only “sensible” games or also “absurd” ones (though in all cases, with a limit on the length of the game as might occur in a real competition). For Go, by contrast, which is played on a larger board (19×19 rather than 8×8) the estimates for the number of possible games seem to start at 10^800 and only increase from there. This difference in magnitudes has something to do with why Go is a “harder” game than chess, why computers were able to beat the world chess champion already in 1997, but the world Go champion not until last year.

Or we could ask: given a thousand cities, how many routes are there for a salesman that visit each of the cities exactly once? We write the answer as 1000!, pronounced “1000 factorial,” which just means 1000×999×998×…×2×1: there are 1000 choices for the first city, then 999 for the second city, 998 for the third, and so on.  This number is about 4×10^2567.  So again, more possible routes than atoms in the visible universe, yadda yadda.
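Python’s arbitrary-precision integers let us check the 4×10^2567 figure exactly rather than by estimate:

```python
import math

# Compute 1000! exactly and read off its size.
routes = math.factorial(1000)
digits = len(str(routes))
print(digits, str(routes)[:2])   # 2568 digits, leading digits "40" -> ~4x10^2567
```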

But suppose the salesman is interested only in the shortest route that visits each city, given the distance between every city and every other.  We could then ask: to find that shortest route, would a computer need to search exhaustively through all 1000! possibilities—or, maybe not all 1000!, maybe it could be a bit more clever than that, but at any rate, a number that grew exponentially with the number of cities n?  Or could there be an algorithm that zeroed in on the shortest route dramatically faster: say, using a number of steps that grew only linearly or quadratically with the number of cities?

This, modulo a few details, is one of the most famous unsolved problems in all of math and science.  You may have heard of it; it’s called P versus NP.  P (Polynomial-Time) is the class of problems that an ordinary digital computer can solve in a “reasonable” amount of time, where we define “reasonable” to mean, growing at most like the size of the problem (for example, the number of cities) raised to some fixed power.  NP (Nondeterministic Polynomial-Time) is the class for which a computer can at least recognize a solution in polynomial-time.  If P=NP, it would mean that for every combinatorial problem of this sort, for which a computer could recognize a valid solution—Sudoku puzzles, scheduling airline flights, fitting boxes into the trunk of a car, etc. etc.—there would be an algorithm that cut through the combinatorial explosion of possible solutions, and zeroed in on the best one.  If P≠NP, it would mean that at least some problems of this kind required astronomical time, regardless of how cleverly we programmed our computers.
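To make the combinatorial explosion concrete, here’s a brute-force Traveling Salesman solver (a sketch of the naive exhaustive method, not a serious algorithm): fixing the starting city, it examines all (n−1)! complete tours.  Fine for 8 cities; hopeless for 1000.

```python
import itertools
import math
import random

random.seed(0)
n = 8
# Hypothetical cities: random points in the unit square.
cities = [(random.random(), random.random()) for _ in range(n)]

def tour_length(order):
    # Total length of the closed tour: city 0 -> order... -> back to city 0.
    path = (0,) + order + (0,)
    return sum(math.dist(cities[a], cities[b]) for a, b in zip(path, path[1:]))

examined = 0
best = float("inf")
for perm in itertools.permutations(range(1, n)):   # all (n-1)! orderings
    examined += 1
    best = min(best, tour_length(perm))

print(examined, math.factorial(n - 1))   # both 5040
```

Adding one more city multiplies the running time by n; the P vs. NP question asks, in essence, whether anything fundamentally better than this kind of search is possible.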

Most of us believe that P≠NP—indeed, I like to say that if we were physicists, we would’ve simply declared P≠NP a “law of nature,” and given ourselves Nobel Prizes for the discovery of the law!  And if it turned out that P=NP, we’d just give ourselves more Nobel Prizes for the law’s overthrow.  But because we’re mathematicians and computer scientists, we call it a “conjecture.”

Another famous example of an NP problem is: I give you (say) a 2000-digit number, and I ask you to find its prime factors.  Multiplying two 1000-digit numbers is easy, at least for a computer, but factoring the product back into primes seems astronomically hard—at least, with our present-day computers running any known algorithm.  Why does anyone care?  Well, you might know that, any time you order something online—in fact, every time you see a little padlock icon in your web browser—your personal information, like (say) your credit card number, is being protected by a cryptographic code that depends on the belief that factoring huge numbers is hard, or a few closely-related beliefs.  If P=NP, then those beliefs would be false, and indeed all cryptography that depends on hard math problems would be breakable in “reasonable” amounts of time.

In the special case of factoring, though—and of the other number theory problems that underlie modern cryptography—it wouldn’t even take anything as shocking as P=NP for them to fall.  Actually, that provides a good segue into another case where exponentials, and numbers vastly larger than 10122, regularly arise in the real world: quantum mechanics.

Some of you might have heard that quantum mechanics is complicated or hard.  But I can let you in on a secret, which is that it’s incredibly simple once you take the physics out of it!  Indeed, I think of quantum mechanics as not exactly even “physics,” but more like an operating system that the rest of physics runs on as application programs.  It’s a certain generalization of the rules of probability.  In one sentence, the central thing quantum mechanics says is that, to fully describe a physical system, you have to assign a number called an “amplitude” to every possible configuration that the system could be found in.  These amplitudes are used to calculate the probabilities that the system will be found in one configuration or another if you look at it.  But the amplitudes aren’t themselves probabilities: rather than just going from 0 to 1, they can be positive or negative or even complex numbers.

For us, the key point is that, if we have a system with (say) a thousand interacting particles, then the rules of quantum mechanics say we need at least 2^1000 amplitudes to describe it—which is way more than we could write down on pieces of paper filling the entire observable universe!  In some sense, chemists and physicists knew about this immensity since 1926.  But they knew it mainly as a practical problem: if you’re trying to simulate quantum mechanics on a conventional computer, then as far as we know, the resources needed to do so increase exponentially with the number of particles being simulated.  Only in the 1980s did a few physicists, such as Richard Feynman and David Deutsch, suggest “turning the lemon into lemonade,” and building computers that themselves would exploit the exponential growth of amplitudes.  Supposing we built such a computer, what would it be good for?  At the time, the only obvious application was simulating quantum mechanics itself!  And that’s probably still the most important application today.
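The exponential cost of classical simulation is easy to make tangible.  Assuming we store one complex amplitude per configuration at 16 bytes each (two 64-bit floats—my choice of precision, for illustration):

```python
# Memory needed to store the state of n quantum bits classically,
# assuming one complex amplitude per configuration at 16 bytes each
# (two 64-bit floats -- an illustrative choice of precision).
def statevector_bytes(n_qubits: int) -> int:
    return 16 * 2**n_qubits

print(statevector_bytes(30) / 2**30)          # 16.0 -- 30 qubits already need 16 GiB
print(len(str(statevector_bytes(1000))))      # the byte count for 1000 qubits has 303 digits
```

Thirty particles fill a laptop’s memory; a thousand particles need a number of bytes that itself takes hundreds of digits to write down.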

In 1994, though, a guy named Peter Shor made a discovery that dramatically increased the level of interest in quantum computers.  That discovery was that a quantum computer, if built, could factor an n-digit number using a number of steps that grows only like about n^2, rather than exponentially with n.  The upshot is that, if and when practical quantum computers are built, they’ll be able to break almost all the cryptography that’s currently used to secure the Internet.

(Right now, only small quantum computers have been built; the record for using Shor’s algorithm is still to factor 21 into 3×7 with high statistical confidence!  But Google is planning within the next year or so to build a chip with 49 quantum bits, or qubits, and other groups around the world are pursuing parallel efforts.  Almost certainly, 49 qubits still won’t be enough to do anything useful, including codebreaking, but it might be enough to do something classically hard, in the sense of taking at least ~2^49, or 563 trillion, steps to simulate classically.)

I should stress, though, that for other NP problems—including breaking various other cryptographic codes, and solving the Traveling Salesman Problem, Sudoku, and the other combinatorial problems mentioned earlier—we don’t know any quantum algorithm analogous to Shor’s factoring algorithm.  For these problems, we generally think that a quantum computer could solve them in roughly the square root of the number of steps that would be needed classically, because of another famous quantum algorithm called Grover’s algorithm.  But getting an exponential quantum speedup for these problems would, at the least, require an additional breakthrough.  No one has proved that such a breakthrough in quantum algorithms is impossible: indeed, no one has proved that it’s impossible even for classical algorithms; that’s the P vs. NP question!  But most of us regard it as unlikely.

If we’re right, then the upshot is that quantum computers are not magic bullets: they might yield dramatic speedups for certain special problems (like factoring), but they won’t tame the curse of exponentiality, cut through to the optimal solution, every time we encounter a Library-of-Babel-like profusion of possibilities.  For (say) the Traveling Salesman Problem with a thousand cities, even a quantum computer—which is the most powerful kind of computer rooted in known laws of physics—might, for all we know, take longer than the age of the universe to find the shortest route.

The truth is, though, the biggest numbers that show up in math are way bigger than anything we’ve discussed until now: bigger than 10^122, or even

$$ 2^{10^{122}}, $$

which is a rough estimate for the number of quantum-mechanical amplitudes needed to describe our observable universe.

For starters, there’s Skewes’ number, which the mathematician G. H. Hardy once called “the largest number which has ever served any definite purpose in mathematics.”  Let π(x) be the number of prime numbers up to x: for example, π(10)=4, since we have 2, 3, 5, and 7.  Then there’s a certain estimate for π(x) called li(x).  It’s known that li(x) overestimates π(x) for an enormous range of x’s (up to trillions and beyond)—but then at some point, it crosses over and starts underestimating π(x) (then overestimates again, then underestimates, and so on).  Skewes’ number is an upper bound on the location of the first such crossover point.  In 1955, Skewes proved that the first crossover must happen before

$$ x = 10^{10^{10^{964}}}. $$

Note that this bound has since been substantially improved, to about 1.4×10^316.  But no matter: there are numbers vastly bigger even than Skewes’ original estimate, which have since shown up in Ramsey theory and other parts of logic and combinatorics to take Skewes’ number’s place.

Alas, I won’t have time here to delve into specific (beautiful) examples of such numbers, such as Graham’s number.  So in lieu of that, let me just tell you about the sorts of processes, going far beyond exponentiation, that tend to yield such numbers.

The starting point is to remember a sequence of operations we all learn about in elementary school, and then ask why the sequence suddenly and inexplicably stops.

As long as we’re only talking about positive integers, “multiplication” just means “repeated addition.”  For example, 5×3 means 5 added to itself 3 times, or 5+5+5.

Likewise, “exponentiation” just means “repeated multiplication.”  For example, 5^3 means 5×5×5.

But what’s repeated exponentiation?  For that we introduce a new operation, which we call tetration, and write like so: ^{3}5 means 5 raised to itself 3 times, or

$$ ^{3} 5 = 5^{5^5} = 5^{3125} \approx 1.9 \times 10^{2184}. $$

But we can keep going. Let x pentated to the y, or xPy, mean x tetrated to itself y times.  Let x sextated to the y, or xSy, mean x pentated to itself y times, and so on.

Then we can define the Ackermann function, invented by the mathematician Wilhelm Ackermann in 1928, which cuts across all these operations to get more rapid growth than we could with any one of them alone.  In terms of the operations above, we can give a slightly nonstandard, but perfectly serviceable, definition of the Ackermann function as follows:

A(1) is 1+1=2.

A(2) is 2×2=4.

A(3) is 3 to the 3rd power, or 3^3=27.

Not very impressive so far!  But wait…

A(4) is 4 tetrated to the 4, or

$$ ^{4}4 = 4^{4^{4^4}} = 4^{4^{256}} = BIG $$

A(5) is 5 pentated to the 5, which I won’t even try to simplify.  A(6) is 6 sextated to the 6.  And so on.
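The whole tower of operations can be collapsed into one recursive definition, which also gives the talk’s (slightly nonstandard) Ackermann function directly.  A minimal sketch, with `hyper` and `A` as my own illustrative names:

```python
# One recursive "hyperoperation": level 1 is addition, level 2 is
# multiplication, level 3 exponentiation, level 4 tetration, and so on.
# Each level applies the level below it y times.
def hyper(level: int, x: int, y: int) -> int:
    if level == 1:
        return x + y
    if y == 1:
        return x
    return hyper(level - 1, x, hyper(level, x, y - 1))

# The talk's Ackermann function: A(n) = n <level-n operation> n.
def A(n: int) -> int:
    return hyper(n, n, n)

print(A(1), A(2), A(3))   # 2 4 27 -- but don't even think about calling A(4)!
```

A(4) already equals 4^(4^256), a number with roughly 10^154 digits, so the recursion is strictly for the small cases.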

More than just a curiosity, the Ackermann function actually shows up sometimes in math and theoretical computer science.  For example, the inverse Ackermann function—a function α such that α(A(n))=n, which therefore grows as slowly as the Ackermann function grows quickly, and which is at most 4 for any n that would ever arise in the physical universe—sometimes appears in the running times of real-world algorithms.

In the meantime, though, the Ackermann function also has a more immediate application.  Next time you find yourself in a biggest-number contest, like the one with which we opened this talk, you can just write A(1000), or even A(A(1000)) (after specifying that A means the Ackermann function above).  You’ll win—period—unless your opponent has also heard of something Ackermann-like or beyond.

OK, but Ackermann is very far from the end of the story.  If we want to go incomprehensibly beyond it, the starting point is the so-called “Berry Paradox”, which was first described by Bertrand Russell, though he said he learned it from a librarian named Berry.  The Berry Paradox asks us to imagine leaping past exponentials, the Ackermann function, and every other particular system for naming huge numbers.  Instead, why not just go straight for a single gambit that seems to beat everything else:

The biggest number that can be specified using a hundred English words or fewer

Why is this called a paradox?  Well, do any of you see the problem here?

Right: if the above made sense, then we could just as well have written

Twice the biggest number that can be specified using a hundred English words or fewer

But we just specified that number—one that, by definition, takes more than a hundred words to specify—using far fewer than a hundred words!  Whoa.  What gives?

Most logicians would say the resolution of this paradox is simply that the concept of “specifying a number with English words” isn’t precisely defined, so phrases like the ones above don’t actually name definite numbers.  And how do we know that the concept isn’t precisely defined?  Why, because if it was, then it would lead to paradoxes like the Berry Paradox!

So if we want to escape the jaws of logical contradiction, then in this gambit, we ought to replace English by a clear, logical language: one that can be used to specify numbers in a completely unambiguous way.  Like … oh, I know!  Why not write:

The biggest number that can be specified using a computer program that’s at most 1000 bytes long

To make this work, there are just two issues we need to get out of the way.  First, what does it mean to “specify” a number using a computer program?  There are different things it could mean, but for concreteness, let’s say a computer program specifies a number N if, when you run it (with no input), the program runs for exactly N steps and then stops.  A program that runs forever doesn’t specify any number.

The second issue is, which programming language do we have in mind: BASIC? C? Python?  The answer is that it won’t much matter!  The Church-Turing Thesis, one of the foundational ideas of computer science, implies that every “reasonable” programming language can emulate every other one.  So the story here can be repeated with just about any programming language of your choice.  For concreteness, though, we’ll pick one of the first and simplest programming languages, namely “Turing machine”—the language invented by Alan Turing all the way back in 1936!

In the Turing machine language, we imagine a one-dimensional tape divided into squares, extending infinitely in both directions, and with all squares initially containing a “0.”  There’s also a tape head with n “internal states,” moving back and forth on the tape.  Each internal state contains an instruction, and the only allowed instructions are: write a “0” in the current square, write a “1” in the current square, move one square left on the tape, move one square right on the tape, jump to a different internal state, halt, and do any of the previous conditional on whether the current square contains a “0” or a “1.”

Using Turing machines, in 1962 the mathematician Tibor Radó invented the so-called Busy Beaver function, or BB(n), which allowed naming by far the largest numbers anyone had yet named.  BB(n) is defined as follows: consider all Turing machines with n internal states.  Some of those machines run forever, when started on an all-0 input tape.  Discard them.  Among the ones that eventually halt, there must be some machine that runs for a maximum number of steps before halting.  However many steps that is, that’s what we call BB(n), the nth Busy Beaver number.

The first few values of the Busy Beaver function have actually been calculated, so let’s see them.

BB(1) is 1.  For a 1-state Turing machine on an all-0 tape, the choices are limited: either you halt in the very first step, or else you run forever.

BB(2) is 6, which isn’t too hard to verify by trying things out with pen and paper.

BB(3) is 21: that determination was already a research paper.

BB(4) is 107 (another research paper).
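
These small values are within reach of brute force. Here is a minimal sketch (my own illustration, not from the talk) that enumerates every n-state, 2-symbol Turing machine, runs each on an all-0 tape, and returns the maximum halting time. I'm assuming the standard convention that the transition into the halt state itself counts as a step, which matches the values quoted above; the search is only feasible for n ≤ 2 or so, since max_steps must exceed the true BB(n) and non-halting machines must be run that long.

```python
from itertools import product

def run(machine, max_steps):
    """Run a machine on an all-0 tape; return its step count if it
    halts within max_steps, else None."""
    tape = {}          # sparse tape; unvisited squares hold 0
    pos, state, steps = 0, 0, 0
    while steps < max_steps:
        write, move, nxt = machine[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        steps += 1     # the halting transition itself counts as a step
        if nxt == 'H':
            return steps
        state = nxt
    return None

def busy_beaver(n, max_steps):
    """Brute-force BB(n) over all n-state, 2-symbol machines.
    max_steps must exceed the true BB(n) for the answer to be exact."""
    actions = list(product((0, 1), (-1, 1), list(range(n)) + ['H']))
    keys = list(product(range(n), (0, 1)))
    best = 0
    for choice in product(actions, repeat=len(keys)):
        steps = run(dict(zip(keys, choice)), max_steps)
        if steps is not None:
            best = max(best, steps)
    return best
```

With max_steps = 100, busy_beaver(1, …) returns 1 and busy_beaver(2, …) returns 6, matching the values above; already at n = 3 the enumeration, and the need to rule out non-halting runs, makes this naive approach strain.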

Much like with the Ackermann function, not very impressive yet!  But wait:

BB(5) is not yet known, but it’s known to be at least 47,176,870.

BB(6) is at least $7.4 \times 10^{36,534}$.

BB(7) is at least

$$ 10^{10^{10^{10^{18,000,000}}}}. $$

Clearly we’re dealing with a monster here, but can we understand just how terrifying of a monster?  Well, call a sequence f(1), f(2), … computable, if there’s some computer program that takes n as input, runs for a finite time, then halts with f(n) as its output.  To illustrate, $f(n)=n^2$, $f(n)=2^n$, and even the Ackermann function that we saw before are all computable.

But I claim that the Busy Beaver function grows faster than any computable function.  Since this talk should have at least some math in it, let’s see a proof of that claim.

Maybe the nicest way to see it is this: suppose, to the contrary, that there were a computable function f that grew at least as fast as the Busy Beaver function.  Then by using that f, we could take the Berry Paradox from before, and turn it into an actual contradiction in mathematics!  So for example, suppose the program to compute f were a thousand bytes long.  Then we could write another program, not much longer than a thousand bytes, to run for (say) 2×f(1000000) steps: that program would just need to include a subroutine for f, plus a little extra code to feed that subroutine the input 1000000, and then to run for 2×f(1000000) steps.  But by assumption, f(1000000) is at least the maximum number of steps that any program up to a million bytes long can run for—even though we just wrote a program, less than a million bytes long, that ran for more steps!  This gives us our contradiction.  The only possible conclusion is that the function f, and the program to compute it, couldn’t have existed in the first place.

(As an alternative, rather than arguing by contradiction, one could simply start with any computable function f, and then build programs that compute f(n) for various “hardwired” values of n, in order to show that BB(n) must grow at least as rapidly as f(n).  Or, for yet a third proof, one can argue that, if any upper bound on the BB function were computable, then one could use that to solve the halting problem, which Turing famously showed to be uncomputable in 1936.)

In some sense, it’s not so surprising that the BB function should grow uncomputably quickly—because if it were computable, then huge swathes of mathematical truth would be laid bare to us.  For example, suppose we wanted to know the truth or falsehood of the Goldbach Conjecture, which says that every even number 4 or greater can be written as a sum of two prime numbers.  Then we’d just need to write a program that checked each even number one by one, and halted if and only if it found one that wasn’t a sum of two primes.  Suppose that program corresponded to a Turing machine with N states.  Then by definition, if it halted at all, it would have to halt after at most BB(N) steps.  But that means that, if we knew BB(N)—or even any upper bound on BB(N)—then we could find out whether our program halts, by simply running it for the requisite number of steps and seeing.  In that way we’d learn the truth or falsehood of Goldbach’s Conjecture—and similarly for the Riemann Hypothesis, and every other famous unproved mathematical conjecture (there are a lot of them) that can be phrased in terms of a computer program never halting.
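
The Goldbach-checking program described above is easy to write down. Here is a sketch in Python rather than as a Turing machine; the cutoff argument is my own addition so that the demonstration terminates, whereas the version relevant to BB(N) would loop forever unless a counterexample exists.

```python
def is_prime(n):
    """Trial division; fine for a demonstration."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def has_goldbach_split(n):
    """True if the even number n is a sum of two primes."""
    return any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1))

def goldbach_search(bound=None):
    """Halts (returning n) iff some even n >= 4 violates Goldbach's
    Conjecture; 'bound' is an artificial cutoff so the demo terminates."""
    n = 4
    while bound is None or n <= bound:
        if not has_goldbach_split(n):
            return n   # counterexample found: the program halts
        n += 2
    return None        # no counterexample below the cutoff
```

Compiled down to a Turing machine with N states, the unbounded version halts if and only if Goldbach’s Conjecture fails, which is exactly what ties BB(N) to the conjecture.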

(Here, admittedly, I’m using “we could find” in an extremely theoretical sense.  Even if someone handed you an N-state Turing machine that ran for BB(N) steps, the number BB(N) would be so hyper-mega-astronomical that, in practice, you could probably never distinguish the machine from one that simply ran forever.  So the aforementioned “strategy” for proving Goldbach’s Conjecture, or the Riemann Hypothesis, would probably never yield fruit before the heat death of the universe, even though in principle it would reduce the task to a “mere finite calculation.”)

OK, you wanna know something else wild about the Busy Beaver function?  In 2015, my former student Adam Yedidia and I wrote a paper where we proved that BB(8000)—i.e., the 8000th Busy Beaver number—can’t be determined using the usual axioms for mathematics, which are called Zermelo-Fraenkel (ZF) set theory.  Nor can BB(8001) or any larger Busy Beaver number.

To be sure, BB(8000) has some definite value: there are finitely many 8000-state Turing machines, and each one either halts or runs forever, and among the ones that halt, there’s some maximum number of steps that any of them runs for.  What we showed is that math, if it limits itself to the currently-accepted axioms, can never prove the value of BB(8000), even in principle.

The way we did that was by explicitly constructing an 8000-state Turing machine, which (in effect) enumerates all the consequences of the ZF axioms one after the next, and halts if and only if it ever finds a contradiction—that is, a proof of 0=1.  Presumably set theory is actually consistent, and therefore our program runs forever.  But if you proved the program ran forever, you’d also be proving the consistency of set theory.  And has anyone heard of any obstacle to doing that?  Of course, Gödel’s Incompleteness Theorem!  Because of Gödel, if set theory is consistent (well, technically, also arithmetically sound), then it can’t prove our program either halts or runs forever.  But that means set theory can’t determine BB(8000) either—because if it could do that, then it could also determine the behavior of our program.

To be clear, it was long understood that there’s some computer program that halts if and only if set theory is inconsistent—and therefore, that the axioms of set theory can determine at most k values of the Busy Beaver function, for some positive integer k.  “All” Adam and I did was to prove the first explicit upper bound, k≤8000, which required a lot of optimizations and software engineering to get the number of states down to something reasonable (our initial estimate was more like k≤1,000,000).  Since then, Stefan O’Rear has improved our bound—most recently, he says, to k≤1000, meaning that, at least by the lights of ZF set theory, fewer than a thousand values of the BB function can ever be known.

Meanwhile, let me remind you that, at present, only four values of the function are known!  Could the value of BB(100) already be independent of set theory?  What about BB(10)?  BB(5)?  Just how early in the sequence do you leap off into Platonic hyperspace?  I don’t know the answer to that question but would love to.

Ah, you ask, but is there any number sequence that grows so fast, it blows even the Busy Beavers out of the water?  There is!

Imagine a magic box into which you could feed in any positive integer n, and it would instantly spit out BB(n), the nth Busy Beaver number.  Computer scientists call such a box an “oracle.”  Even though the BB function is uncomputable, it still makes mathematical sense to imagine a Turing machine that’s enhanced by the magical ability to access a BB oracle any time it wants: call this a “super Turing machine.”  Then let SBB(n), or the nth super Busy Beaver number, be the maximum number of steps that any n-state super Turing machine makes before halting, if given no input.

By simply repeating the reasoning for the ordinary BB function, one can show that, not only does SBB(n) grow faster than any computable function, it grows faster than any function computable by super Turing machines (for example, BB(n), BB(BB(n)), etc).

Let a super duper Turing machine be a Turing machine with access to an oracle for the super Busy Beaver numbers.  Then you can use super duper Turing machines to define a super duper Busy Beaver function, which you can use in turn to define super duper pooper Turing machines, and so on!

Let “level-1 BB” be the ordinary BB function, let “level-2 BB” be the super BB function, let “level-3 BB” be the super duper BB function, and so on.  Then clearly we can go to “level-k BB,” for any positive integer k.

But we need not stop even there!  We can then go to level-ω BB.  What’s ω?  Mathematicians would say it’s the “first infinite ordinal”—the ordinals being a system where you can pass from any set of numbers you can possibly name (even an infinite set), to the next number larger than all of them.  More concretely, the level-ω Busy Beaver function is simply the Busy Beaver function for Turing machines that are able, whenever they want, to call an oracle to compute the level-k Busy Beaver function, for any positive integer k of their choice.

But why stop there?  We can then go to level-(ω+1) BB, which is just the Busy Beaver function for Turing machines that are able to call the level-ω Busy Beaver function as an oracle.  And thence to level-(ω+2) BB, level-(ω+3) BB, etc., defined analogously.  But then we can transcend that entire sequence and go to level-2ω BB, which involves Turing machines that can call level-(ω+k) BB as an oracle for any positive integer k.  In the same way, we can pass to level-3ω BB, level-4ω BB, etc., until we transcend that entire sequence and pass to level-ω2 BB, which can call any of the previous ones as oracles.  Then we have level-ω3 BB, level-ω4 BB, etc., until we transcend that whole sequence with level-ωω BB.  But we’re still not done!  For why not pass to level

$$ \omega^{\omega^{\omega}} $$,


$$ \omega^{\omega^{\omega^{\omega}}} $$,

etc., until we reach level

$$ \left. \omega^{\omega^{\omega^{.^{.^{.}}}}}\right\} _{\omega\text{ times}} $$?

(This last ordinal is also called ε0.)  And mathematicians know how to keep going even to way, way bigger ordinals than ε0, which give rise to ever more rapidly-growing Busy Beaver sequences.  Ordinals achieve something that on its face seems paradoxical, which is to systematize the concept of transcendence.

So then just how far can you push this?  Alas, ultimately the answer depends on which axioms you assume for mathematics.  The issue is this: once you get to sufficiently enormous ordinals, you need some systematic way to specify them, say by using computer programs.  But then the question becomes which ordinals you can “prove to exist,” by giving a computer program together with a proof that the program does what it’s supposed to do.  The more powerful the axiom system, the bigger the ordinals you can prove to exist in this way—but every axiom system will run out of gas at some point, only to be transcended, in Gödelian fashion, by a yet more powerful system that can name yet larger ordinals.

So for example, if we use Peano arithmetic—invented by the Italian mathematician Giuseppe Peano—then Gentzen proved in the 1930s that we can name any ordinals below ε0, but not ε0 itself or anything beyond it.  If we use ZF set theory, then we can name vastly bigger ordinals, but once again we’ll eventually run out of steam.

(Technical remark: some people have claimed that we can transcend this entire process by passing from first-order to second-order logic.  But I fundamentally disagree, because with second-order logic, which number you’ve named could depend on the model of set theory, and therefore be impossible to pin down.  With the ordinal Busy Beaver numbers, by contrast, the number you’ve named might be breathtakingly hopeless ever to compute—but provided the notations have been fixed, and the ordinals you refer to actually exist, at least we know there is a unique positive integer that you’re talking about.)

Anyway, the upshot of all of this is that, if you try to hold a name-the-biggest-number contest between two actual professionals who are trying to win, it will (alas) degenerate into an argument about the axioms of set theory.  For the stronger the set theory you’re allowed to assume consistent, the bigger the ordinals you can name, therefore the faster-growing the BB functions you can define, therefore the bigger the actual numbers.

So, yes, in the end the biggest-number contest just becomes another Gödelian morass, but one can get surprisingly far before that happens.

In the meantime, our universe seems to limit us to at most $10^{122}$ choices that could ever be made, or experiences that could ever be had, by any one observer.  Or fewer, if you believe that you won’t live until the heat death of the universe in some post-Singularity computer cloud, but for at most about $10^2$ years.  Meanwhile, the survival of the human race might hinge on people’s ability to understand much smaller numbers than $10^{122}$: for example, a billion, a trillion, and other numbers that characterize the exponential growth of our civilization and the limits that we’re now running up against.

On a happier note, though, if our goal is to make math engaging to young people, or to build bridges between the quantitative and literary worlds, the way this festival is doing, it seems to me that it wouldn’t hurt to let people know about the vastness that’s out there.  Thanks for your attention.

September 23, 2017

John PreskillStanding back at Stanford

T-shirt 1

This T-shirt came to mind last September. I was standing in front of a large silver-colored table littered with wires, cylinders, and tubes. Greg Bentsen was pointing at components and explaining their functions. He works in Monika Schleier-Smith’s lab, as a PhD student, at Stanford.

Monika’s group manipulates rubidium atoms. A few thousand atoms sit in one of the cylinders. That cylinder contains another cylinder, an optical cavity, that contains the atoms. A mirror caps each of the cavity’s ends. Light in the cavity bounces off the mirrors.

Light bounces off your bathroom mirror similarly. But we can describe your bathroom’s light accurately with Maxwellian electrodynamics, a theory developed during the 1800s. We describe the cavity’s light with quantum electrodynamics (QED). Hence we call the lab’s set-up cavity QED.

The light interacts with the atoms, entangling with them. The entanglement imprints information about the atoms on the light. Suppose that light escaped from the cavity. Greg and friends could measure the light, then infer about the atoms’ quantum state.

A little light leaks through the mirrors, though most light bounces off. From leaked light, you can infer about the ensemble of atoms. You can’t infer about individual atoms. For example, consider an atom’s electrons. Each electron has a quantum property called a spin. We sometimes imagine the spin as an arrow that points upward or downward. Together, the electrons’ spins form the atom’s joint spin. You can tell, from leaked light, whether one atom’s spin points upward. But you can’t tell which atom’s spin points upward. You can’t see the atoms for the ensemble.

Monika’s team can. They’ve cut a hole in their cylinder. Light escapes the cavity through the hole. The light from the hole’s left-hand edge carries information about the leftmost atom, and so on. The team develops a photograph of the line of atoms. Imagine holding a photograph of a line of people. You can point to one person, and say, “Aha! She’s the xkcd fan.” Similarly, Greg and friends can point to one atom in their photograph and say, “Aha! That atom has an upward-pointing spin.” Monika’s team is developing single-site imaging.


Aha! She’s the xkcd fan.

Monika’s team plans to image atoms in such detail, they won’t need light to leak through the mirrors. Light leakage creates problems, including by entangling the atoms with the world outside the cavity. Suppose you had to diminish the amount of light that leaks from a rubidium cavity. How should you proceed?

Tell the mirrors,

T-shirt 2

You should lengthen the cavity. Why? Imagine a photon, a particle of light, in the cavity. It zooms down the cavity’s length, hits a mirror, bounces off, retreats up the cavity’s length, hits the other mirror, and bounces off. The photon repeats this process until a mirror hit fails to generate a bounce. The mirror transmits the photon to the exterior; the photon leaks out. How can you reduce leaks? By preventing photons from hitting mirrors so often, by forcing the photons to zoom longer, by lengthening the cavity, by shifting the mirrors outward.

So Greg hinted, beside that silver-colored table in Monika’s lab. The hint struck a chord: I recognized the impulse to

T-shirt 3

The impulse had led me to Stanford.

Weeks earlier, I’d written my first paper about quantum chaos and information scrambling. I’d sat and read and calculated and read and sat and emailed and written. I needed to stand up, leave my cavity, and image my work from other perspectives.

Stanford physicists had written quantum-chaos papers I admired. So I visited, presented about my work, and talked. Patrick Hayden introduced me to a result that might help me apply my result to another problem. His group helped me simplify a mathematical expression. Monika reflected that a measurement scheme I’d proposed sounded not unreasonable for cavity QED.

And Greg led me to recognize the principle behind my visit: Sometimes, you have to

T-shirt 4

to move forward.

With gratitude to Greg, Monika, Patrick, and the rest of Monika’s and Patrick’s groups for their time, consideration, explanations, and feedback. With thanks to Patrick and Stanford’s Institute for Theoretical Physics for their hospitality.

September 22, 2017

Doug NatelsonLab question - Newport NPC3SG

Anyone out there using a Newport NPC3SG controller to drive a piezo positioning stage, with computer communication successfully talking to the NPC3SG?  If so, please leave a comment so that we can get in touch, as I have questions.

n-Category Café Schröder Paths and Reverse Bessel Polynomials

I want to show you a combinatorial interpretation of the reverse Bessel polynomials which I learnt from Alan Sokal. The sequence of reverse Bessel polynomials begins as follows.

$$ \begin{aligned} \theta_0(R)&=1\\ \theta_1(R)&=R+1\\ \theta_2(R)&=R^2+3R+3\\ \theta_3(R)&= R^3 +6R^2+15R+15 \end{aligned} $$

To give you a flavour of the combinatorial interpretation we will prove, you can see that the second reverse Bessel polynomial can be read off the following set of ‘weighted Schröder paths’: multiply the weights together on each path and add up the resulting monomials.

Schroeder paths

In this post I’ll explain how to prove the general result, using a certain result about weighted Dyck paths that I’ll also prove. At the end I’ll leave some further questions for the budding enumerative combinatorialists amongst you.

These reverse Bessel polynomials have their origins in the theory of Bessel functions, but I encountered them in the theory of magnitude, where they are key to a formula for the magnitude of an odd-dimensional ball which I have just posted on the arXiv.

In that paper I use the combinatorial expression for these Bessel polynomials to prove facts about the magnitude.

Here, to simplify things slightly, I have used the standard reverse Bessel polynomials whereas in my paper I use a minor variant (see below).

I should add that a very similar expression can be given for the ordinary, unreversed Bessel polynomials; you just need a minor modification to the way the weights on the Schröder paths are defined. I will leave that as an exercise.

The reverse Bessel polynomials

The reverse Bessel polynomials have many properties. In particular they satisfy the recursion relation

$$ \theta_{i+1}(R)=R^2\theta_{i-1}(R) + (2i+1)\theta_{i}(R) $$

and $\theta_i(R)$ satisfies the differential equation

$$ R\theta_i^{\prime\prime}(R)-2(R+i)\theta_i^\prime(R)+2i\theta_i(R)=0. $$

There’s an explicit formula:

$$ \theta_i(R) = \sum_{t=0}^i \frac{(i+t)!}{(i-t)!\, t!\, 2^t}\,R^{i-t}. $$

I’m interested in them because they appear in my formula for the magnitude of odd dimensional balls. To be more precise, in my formula I use the associated Sheffer polynomials $(\chi_i(R))_{i=0}^\infty$; they are related by $\chi_i(R)=R\,\theta_{i-1}(R)$, so the coefficients are the same, but just moved around a bit. These polynomials have a similar but slightly more complicated combinatorial interpretation.

In my paper I prove that the magnitude of the $(2p+1)$-dimensional ball of radius $R$ has the following expression:

$$ \left|B^{2p+1}_R \right| = \frac{\det[\chi_{i+j+2}(R)]_{i,j=0}^p}{(2p+1)!\, R\,\det[\chi_{i+j}(R)]_{i,j=0}^p} $$

As each polynomial $\chi_i(R)$ has a path counting interpretation, one can use the rather beautiful Lindström-Gessel-Viennot Lemma to give a path counting interpretation to the determinants in the above formula and find some explicit expression. I will probably blog about this another time. (Fellow host Qiaochu has also blogged about the LGV Lemma.)

Weighted Dyck paths

Before getting on to Bessel polynomials and weighted Schröder paths, we need to look at counting weighted Dyck paths, which are simpler and more classical.

A Dyck path is a path in the lattice $\mathbb{Z}^2$ which starts at $(0,0)$, stays in the upper half plane, ends back on the $x$-axis at $(2i,0)$ and has steps going either diagonally right and up or right and down. The integer $2i$ is called the length of the path. Let $D_i$ be the set of length $2i$ Dyck paths.

For each Dyck path $\sigma$, we will weight each edge going right and down, from $(x,y)$ to $(x+1,y-1)$, by $y$; then we will take $w(\sigma)$, the weight of $\sigma$, to be the product of all the weights on its steps. Here are all five weighted Dyck paths of length six.

Dyck paths

Famously, the number of Dyck paths of length $2i$ is given by the $i$th Catalan number; here, however, we are interested in the number of paths weighted by the weighting(!). If we sum over the weights of each of the above diagrams we get $6+4+2+2+1=15$. Note that this is $5\times 3\times 1$. This is a pattern that holds in general.

Theorem A. (Françon and Viennot) The weighted count of length $2i$ Dyck paths is equal to the double factorial of $2i-1$:

$$ \begin{aligned} \sum_{\sigma\in D_{i}} w(\sigma)&= (2i-1)\cdot (2i-3)\cdot (2i-5)\cdot \cdots\cdot 3\cdot 1 \\ &\eqqcolon (2i-1)!!. \end{aligned} $$

The following is a nice combinatorial proof of this theorem that I found in a survey paper by Callan. (I was only previously aware of a high-tech proof involving continued fractions and a theorem of Gauss.)

The first thing to note is that the weight of a Dyck path is actually counting something. It is counting the ways of labelling each of the down steps in the diagram by a positive integer less than or equal to the height (i.e. the weight) of that step. We call such a labelling a height labelling. Note that we have no choice of weighting but we often have a choice of height labelling. Here’s a height labelled Dyck path.

height labelled Dyck path

So the weighted count of Dyck paths of length $2i$ is precisely the number of height labelled Dyck paths of length $2i$:

$$ \sum_{\sigma\in D_{i}} w(\sigma) = \#\{\text{height labelled paths of length }\,2i\} $$

We are going to consider marked Dyck paths, which just means we single out a specific vertex. A path of length $2i$ has $2i+1$ vertices. Thus

$$ \#\{\text{height labelled, MARKED paths of length }\,2i\} = (2i+1)\times\#\{\text{height labelled paths of length }\,2i\}. $$

Hence the theorem will follow by induction if we find a bijection

$$ \{\text{height labelled paths of length }\,2i\} \cong \{\text{height labelled, MARKED paths of length }\,2i-2\}. $$

Such a bijection can be constructed in the following way. Given a height labelled Dyck path, remove the left-hand step and the first step that has a label of one on it. On each down step between these two deleted steps decrease the label by one. Now join the two separated parts of the path together and mark the vertex at which they are joined. Here is an example of the process.

dyck bijection

Working backwards it is easy to describe the inverse map. And so the theorem is proved.
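
Theorem A is also easy to test by machine for small $i$. The following sketch (my own, not from the post) enumerates $\pm 1$ step sequences, keeps the Dyck paths, and weights each down step by the height it starts from:

```python
from itertools import product
from math import prod

def weighted_dyck_count(i):
    """Sum of w(sigma) over Dyck paths of length 2i, where a down step
    leaving height y carries weight y."""
    total = 0
    for steps in product((1, -1), repeat=2 * i):
        height, weight = 0, 1
        for s in steps:
            if s == -1:
                if height == 0:       # path would leave the upper half plane
                    weight = 0
                    break
                weight *= height      # down step from height y has weight y
            height += s
        if weight and height == 0:    # must end back on the x-axis
            total += weight
    return total

def odd_double_factorial(i):
    """(2i-1)!! = (2i-1)(2i-3)...3.1, with the empty product equal to 1."""
    return prod(range(1, 2 * i, 2))
```

For $i=3$ this gives $15 = 5\times 3\times 1$, matching the five paths pictured above, and the counts agree with $(2i-1)!!$ for every small $i$ one cares to check.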

Schröder paths and reverse Bessel polynomials

In order to give a path theoretic interpretation of reverse Bessel polynomials we will need to use Schröder paths. These are like Dyck paths except we allow a certain kind of flat step.

A Schröder path is a path in the lattice $\mathbb{Z}^2$ which starts at $(0,0)$, stays in the upper half plane, ends back on the $x$-axis at $(2i,0)$ and has steps going either diagonally right and up, diagonally right and down, or horizontally two units to the right. The integer $2i$ is called the length of the path. Let $S_i$ be the set of all length $2i$ Schröder paths.

For each Schröder path $\sigma$, we will weight each edge going right and down, from $(x,y)$ to $(x+1,y-1)$, by $y$, and we will weight each flat edge by the indeterminate $R$. Then we will take $w(\sigma)$, the weight of $\sigma$, to be the product of all the weights on its steps.

Here is the picture of all six length four weighted Schröder paths again.

Schroeder paths

You were asked at the top of this post to check that the sum of the weights equals the second reverse Bessel polynomial. Of course that result generalizes!

The following theorem was shown to me by Alan Sokal; he proved it using continued fraction methods, but these essentially amount to the combinatorial proof I’m about to give.

Theorem B. The weighted count of length $2i$ Schröder paths is equal to the $i$th reverse Bessel polynomial:

$$ \sum_{\sigma\in S_{i}} w(\sigma)= \theta_{i}(R). $$

The idea is to observe that you can remove the flat steps from a weighted Schröder path to obtain a weighted Dyck path. If a Schröder path has length $2i$ and $t$ upward steps then it has $t$ downward steps and $i-t$ flat steps, so it has a total of $i+t$ steps. This means that there are $\binom{i+t}{i-t}$ length $2i$ Schröder paths with the same underlying length $2t$ Dyck path (we just choose where to insert the flat steps). Let’s write $S^t_i$ for the set of Schröder paths of length $2i$ with $t$ upward steps. Then

$$ \begin{aligned} \sum_{\sigma\in S_{i}} w(\sigma) &= \sum_{t=0}^{i} \sum_{\sigma\in S^t_{i}} w(\sigma) = \sum_{t=0}^{i} \binom{i+t}{i-t}\sum_{\sigma'\in D_t} w(\sigma')\,R^{i-t}\\ &= \sum_{t=0}^{i} \binom{i+t}{i-t}(2t-1)!!\,R^{i-t}\\ &= \sum_{t=0}^{i} \frac{(i+t)!}{(i-t)!\,(2t)!}\cdot\frac{(2t)!}{2^t\, t!}\,R^{i-t}\\ &= \theta_{i}(R), \end{aligned} $$

where the last equality comes from the formula for $\theta_i(R)$ given at the beginning of the post.

Thus we have the required combinatorial interpretation of the reverse Bessel polynomials.
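
As with Theorem A, Theorem B can be checked by brute force for small $i$. This sketch (my illustration) walks over all Schröder paths, tracking the power of $R$ contributed by flat steps and the integer weight contributed by down steps, and returns the polynomial as a dictionary of coefficients:

```python
def schroder_polynomial(i):
    """Sum of weights over Schröder paths of length 2i, returned as
    {power of R: coefficient}; each flat step contributes a factor R,
    and a down step leaving height y contributes a factor y."""
    coeffs = {}

    def walk(x, height, weight, r_power):
        if x == 2 * i:
            if height == 0:   # path must end back on the x-axis
                coeffs[r_power] = coeffs.get(r_power, 0) + weight
            return
        walk(x + 1, height + 1, weight, r_power)                # up step
        if height > 0:
            walk(x + 1, height - 1, weight * height, r_power)   # down step
        if x + 2 <= 2 * i:
            walk(x + 2, height, weight, r_power + 1)            # flat step

    walk(0, 0, 1, 0)
    return coeffs
```

For $i=2$ this returns {2: 1, 1: 3, 0: 3}, i.e. $R^2+3R+3=\theta_2(R)$, matching the six pictured paths.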

Further questions

The first question that springs to mind for me is whether it is possible to give a bijective proof of Theorem B, similar in style, perhaps (or perhaps not), to the proof given of Theorem A, basically using the recursion relation

$$ \theta_{i+1}(R)=R^2\theta_{i-1}(R) + (2i+1)\theta_{i}(R) $$

rather than the explicit formula for them.

The second question would be whether the differential equation

$$ R\theta_i^{\prime\prime}(R)-2(R+i)\theta_i^\prime(R)+2i\theta_i(R)=0 $$

has some sort of combinatorial interpretation in terms of paths.

I’m interested to hear if anyone has any thoughts.

n-Category Café Lattice Paths and Continued Fractions II

Last time we proved Flajolet’s Fundamental Lemma about enumerating Dyck paths. This time I want to give some examples, in particular to relate this to what I wrote previously about Dyck paths, Schröder paths and what they have to do with reverse Bessel polynomials.

We’ll see that the generating function of the sequence of reverse Bessel polynomials $(\theta_i(R))_{i=0}^\infty$ has the following continued fraction expansion.

$$ \sum_{i=0}^\infty \theta_i(R) \,t^i = \frac{1}{1-Rt- \frac{t}{1-Rt - \frac{2t}{1-Rt- \frac{3t}{1-\dots}}}} $$

I’ll even give you a snippet of SageMath code so you can have a play around with this if you like.

Flajolet’s Fundamental Lemma

Let’s just recall from last time that if we take Motzkin paths weighted by $a_i$s, $b_i$s and $c_i$s as in this example,

weighted Motzkin path

then when we sum the weightings of all Motzkin paths together we have the following continued fraction expression.

$$ \sum_{\sigma\,\,\mathrm{Motzkin}} w_{a,b,c}(\sigma) = \frac{1}{1- c_{0} - \frac{a_{1} b_{1}}{1-c_{1} - \frac{a_{2} b_{2}}{1- c_2 - \frac{a_3 b_3}{1-\dots}}}} \in\mathbb{Z}[[a_i, b_i, c_i]] $$

Jacobi continued fractions and Motzkin paths

Flajolet’s Fundamental Lemma is very beautiful, but we want a power series going up in terms of path length. So let’s use another variable $t$ to keep track of path length. All three types of step in a Motzkin path have length one. We can set $a_i=\alpha_i t$, $b_i=\beta_i t$ and $c_i=\gamma_i t$. Then $\sum_{\sigma} w_{a,b,c}(\sigma)\in \mathbb{Z}[\alpha_i, \beta_i, \gamma_i][[t]]$, and the coefficient of $t^\ell$ will be the sum of the weights of Motzkin paths of length $\ell$. This coefficient will be a polynomial (rather than a power series) as there are only finitely many paths of a given length.

$$ \sum_{\ell=0}^\infty\left(\sum_{\sigma\,\,\text{Motzkin length}\,\,\ell} w_{\alpha,\beta,\gamma}(\sigma)\right)t^\ell = \frac{1}{1- \gamma_{0}t - \frac{\alpha_{1}\beta_1 t^2}{1-\gamma_{1}t - \frac{\alpha_{2}\beta_2 t^2}{1- \gamma_2 t - \frac{\alpha_3 \beta_3 t^2}{1-\dots}}}} $$

Such a continued fraction is called a Jacobi (or J-type) continued fraction. They crop up in the study of moments of orthogonal polynomials and also in birth-death processes.

For example, I believe that Euler proved the following Jacobi continued fraction expansion of the generating function of the factorials.

$$\sum_{\ell=0}^\infty \ell!\, t^\ell = \frac{1}{1- t - \frac{t^2}{1-3t - \frac{4t^2}{1- 5t - \frac{9t^2}{1-\dots}}}}$$

We can get the right-hand side by taking $\alpha_i=\beta_i=i$ and $\gamma_i=2i+1$. Here is a Motzkin path weighted in that way.

weighted Motzkin path

The equation above is telling us that if we weight Motzkin paths in that way, then the weighted count of Motzkin paths of length $\ell$ is $\ell!$, and that deserves an exclamation mark! (You’re invited to verify this for Motzkin paths of length 4.)
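If you’d rather not check the length-4 case by hand, here is a short plain-Python sketch (separate from the SageMath code in the appendix) that enumerates Motzkin paths directly and compares the weighted count with $\ell!$. The step-by-step weighting conventions in the code are my reading of $\alpha_i=\beta_i=i$, $\gamma_i=2i+1$.

```python
from math import factorial

def motzkin_paths(length, level=0):
    """Yield every Motzkin path of the given length (as a list of
    'U'/'D'/'F' steps) that starts and ends at level 0 and never
    dips below it."""
    if length == 0:
        if level == 0:
            yield []
        return
    if level > length:   # can't get back down to level 0 in time
        return
    for step, new_level in (('U', level + 1), ('D', level - 1), ('F', level)):
        if new_level < 0:
            continue
        for rest in motzkin_paths(length - 1, new_level):
            yield [step] + rest

def weight(path):
    """Weight with alpha_i = beta_i = i and gamma_i = 2i + 1: an up step
    arriving at level h contributes h, a down step leaving level h
    contributes h, and a flat step at level h contributes 2h + 1."""
    w, h = 1, 0
    for step in path:
        if step == 'U':
            h += 1
            w *= h
        elif step == 'D':
            w *= h
            h -= 1
        else:
            w *= 2 * h + 1
    return w

for ell in range(7):
    total = sum(weight(p) for p in motzkin_paths(ell))
    assert total == factorial(ell)   # weighted Motzkin paths of length ell count ell!
```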

I’ve put some SageMath code at the bottom of this post if you want to check the continued fraction equality numerically.

Stieltjes continued fractions and Dyck paths

A Dyck path is a Motzkin path with no flat steps. So if we weight the flat steps in Motzkin paths with $0$, then the weighted count just counts the weighted Dyck paths. This means setting $\gamma_i=0$.

Also, the weight $\alpha_i$ on an up step always appears with the weight $\beta_i$ on a corresponding down step (what goes up must come down!), so we can simplify things by putting the combined weight $\alpha_i\beta_i$ (which we’ll rename as $\alpha_i$) on the down step from level $i$, and a weight of $1$ on each up step. We can call this weighting $w_\alpha$.

Putting this together we get the following, where we’ve noted that there are no Dyck paths of odd length.

$$\sum_{n=0}^\infty\left(\sum_{\sigma\,\,\text{Dyck, length}\,2n} w_\alpha(\sigma)\right)t^{2n} = \frac{1}{1- \frac{\alpha_{1} t^2}{1- \frac{\alpha_{2} t^2}{1- \frac{\alpha_3 t^2}{1-\dots}}}}$$

This kind of continued fraction is called a Stieltjes (or S-type) continued fraction. Of course, we could replace $t^2$ by $t$ in the above, without any ill effect.
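As a quick sanity check on the S-fraction, note that taking every $\alpha_i=1$ just counts Dyck paths, so the coefficient of $t^{2n}$ should be the $n$th Catalan number. Here’s a rough pure-Python sketch (truncated power series over the rationals; the depth-8 cutoff is exact for the orders shown) that you can compare with the SageMath code in the appendix.

```python
from fractions import Fraction
from math import comb

N = 12  # truncate all power series at order t^N

def mul(a, b):
    """Product of two truncated power series given as coefficient lists."""
    c = [Fraction(0)] * N
    for i, ai in enumerate(a):
        if ai:
            for j in range(N - i):
                c[i + j] += ai * b[j]
    return c

def inv(a):
    """Multiplicative inverse of a truncated power series with a[0] != 0."""
    b = [Fraction(0)] * N
    b[0] = 1 / a[0]
    for n in range(1, N):
        b[n] = -sum(a[k] * b[n - k] for k in range(1, n + 1)) / a[0]
    return b

def s_fraction(alphas):
    """Truncated S-fraction 1/(1 - alphas[0] t^2/(1 - alphas[1] t^2/...)),
    cut off after len(alphas) levels; coefficients below t^(2*len) are exact,
    since deeper levels only affect higher-order terms."""
    f = [Fraction(0)] * N
    f[0] = Fraction(1)               # innermost level of the truncation
    for a in reversed(alphas):
        t2a = [Fraction(0)] * N
        t2a[2] = Fraction(a)         # the series a * t^2
        num = mul(t2a, f)
        d = [Fraction(1) - num[0]] + [-num[k] for k in range(1, N)]
        f = inv(d)                   # f = 1 / (1 - a t^2 f)
    return f

series = s_fraction([1] * 8)         # all alpha_i = 1: plain Dyck path counting
catalan = [comb(2 * n, n) // (n + 1) for n in range(6)]
assert [series[2 * n] for n in range(6)] == catalan   # 1, 1, 2, 5, 14, 42
```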

Previously we proved combinatorially that with the weighting $\alpha_i=i$ the weighted count of Dyck paths of length $2n$ was precisely $(2n-1)!!$. This means that we have proved the following continued fraction expansion of the generating function of the odd double factorials.

$$\sum_{n=0}^\infty (2n-1)!!\, t^{2n} = \frac{1}{1- \frac{t^2}{1- \frac{2t^2}{1- \frac{3t^2}{1-\dots}}}}$$

I believe this was originally proved by Gauss, but I have no idea how.
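While I can’t reproduce Gauss’s argument either, the combinatorial side is easy to check by machine. Here’s a small brute-force Python sketch confirming that Dyck paths with weight $\alpha_i=i$ on down steps have weighted count $(2n-1)!!$.

```python
def dyck_paths(length, level=0):
    """Yield every Dyck path of the given length as a list of 'U'/'D' steps."""
    if length == 0:
        if level == 0:
            yield []
        return
    if level > length:   # can't get back down to level 0 in time
        return
    for step, new_level in (('U', level + 1), ('D', level - 1)):
        if new_level < 0:
            continue
        for rest in dyck_paths(length - 1, new_level):
            yield [step] + rest

def weight(path):
    """Up steps weigh 1; a down step from level h weighs h (alpha_h = h)."""
    w, h = 1, 0
    for step in path:
        if step == 'U':
            h += 1
        else:
            w *= h
            h -= 1
    return w

def odd_double_factorial(n):
    """(2n - 1)!! = 1 * 3 * 5 * ... * (2n - 1), the empty product being 1."""
    result = 1
    for k in range(1, 2 * n, 2):
        result *= k
    return result

for n in range(6):
    total = sum(weight(p) for p in dyck_paths(2 * n))
    assert total == odd_double_factorial(n)
```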

Again there’s some SageMath code at the end for you to see this in action.

Thron continued fractions and Schröder paths

What I’m really interested in, you’ll remember, is reverse Bessel polynomials, and these are giving weighted counts of Schröder paths. Using continued fractions in this context is less standard than for Dyck paths and Motzkin paths as above, but it only requires a minor modification. I learnt about this from Alan Sokal.

The difference between Motzkin paths and Schröder paths is that the flat steps have length $2$ in Schröder paths. Remember that the power of $t$ was encoding the length, so we just have to assign $t^2$ to each flat step rather than $t$. So if we put $a_i=t$, $b_i=\alpha_i t$ and $c_i=\gamma_i t^2$ in Flajolet’s Fundamental Lemma then we get the following.

$$\sum_{n=0}^\infty\left(\sum_{\sigma\,\,\text{Schröder, length}\,2n} w_{\alpha,\gamma}(\sigma)\right)t^{2n} = \frac{1}{1- \gamma_{0}t^2 - \frac{\alpha_{1} t^2}{1-\gamma_{1}t^2 - \frac{\alpha_{2} t^2}{1- \gamma_2 t^2 - \frac{\alpha_3 t^2}{1-\dots}}}}$$

Here $w_{\alpha,\gamma}$ is the weighting where we put $\alpha_i$s on the down steps and $\gamma_i$s on the flat steps.

This kind of continued fraction is called a Thron (or T-type) continued fraction. Again, we could replace $t^2$ by $t$ in the above, without any ill effect.

We saw before that if we take the weighting $w_{rBp}$, with $\alpha_i:=i$ and $\gamma_i:=R$, such as in the following picture,

weighted Schröder path

then the weighted sum of Schröder paths of length $2n$ is precisely the $n$th reverse Bessel polynomial:

$$\theta_n(R) = \sum_{\sigma\,\,\text{Schröder, length}\,2n} w_{rBp}(\sigma).$$
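Here’s a brute-force Python sketch of that claim: it enumerates Schröder paths (flat steps counting as length 2), reads off the weighted sum as a polynomial in $R$, and compares with $\theta_n(R)$ computed from the standard reverse Bessel recurrence $\theta_n = (2n-1)\theta_{n-1} + R^2\theta_{n-2}$. The enumeration details are my own reconstruction of the weighting described above.

```python
def schroder_paths(length, level=0):
    """Yield every Schröder path of the given length as a list of steps:
    'U' and 'D' have length 1, flat steps 'F' have length 2."""
    if length == 0:
        if level == 0:
            yield []
        return
    if level > length:   # can't get back down to level 0 in time
        return
    for step, new_level, used in (('U', level + 1, 1),
                                  ('D', level - 1, 1),
                                  ('F', level, 2)):
        if new_level < 0 or used > length:
            continue
        for rest in schroder_paths(length - used, new_level):
            yield [step] + rest

def weighted_count(n):
    """Sum of w_rBp over Schröder paths of length 2n, returned as the
    list of coefficients of R^0, R^1, ..., R^n."""
    coeffs = [0] * (n + 1)
    for path in schroder_paths(2 * n):
        w, h, flats = 1, 0, 0
        for step in path:
            if step == 'U':
                h += 1
            elif step == 'D':
                w *= h        # alpha_h = h on the down step from level h
                h -= 1
            else:
                flats += 1    # each flat step contributes a factor of R
        coeffs[flats] += w
    return coeffs

def reverse_bessel(n):
    """Coefficient list of theta_n(R) (index k = coefficient of R^k),
    via theta_n = (2n - 1) theta_{n-1} + R^2 theta_{n-2}."""
    if n == 0:
        return [1]
    prev, cur = [1], [1, 1]           # theta_0 = 1, theta_1 = R + 1
    for m in range(2, n + 1):
        nxt = [0] * (m + 1)
        for k, c in enumerate(cur):
            nxt[k] += (2 * m - 1) * c
        for k, c in enumerate(prev):
            nxt[k + 2] += c
        prev, cur = cur, nxt
    return cur

for n in range(5):
    assert weighted_count(n) == reverse_bessel(n)
```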

Putting that together with the Thron continued fraction above we get the following Thron continued fraction expansion for the generating function of the reverse Bessel polynomials.

$$\sum_{n=0}^\infty \theta_n(R)\, t^n = \frac{1}{1-Rt- \frac{t}{1-Rt - \frac{2t}{1-Rt- \frac{3t}{1-\dots}}}}$$

This expression is given by Paul Barry, without any reference, in the formulas section of the corresponding entry in the On-Line Encyclopedia of Integer Sequences.

See the end of the post for some SageMath code to check this numerically.

In my recent magnitude paper I actually work backwards: I start with the continued fraction expansion as a given, and use Flajolet’s Fundamental Lemma to give the Schröder path interpretation of the reverse Bessel polynomials. Of course, I now know that I can bypass the use of continued fractions completely, and have a purely combinatorial proof of this interpretation. Regardless of that, however, the theory of lattice paths and continued fractions remains beautiful.

Appendix: Some SageMath code

It’s quite easy to play around with these continued fractions in SageMath, at least to some finite order. I thought I’d let you have some code to get you started…

Here’s some SageMath code for you to check the Jacobi continued fraction expansion of the generating function of the factorials.

# T = Z[t]
T.<t> = PolynomialRing(ZZ)
# We'll take the truncated continued fraction to be in the 
# ring of rational functions, P = Z(t)
P = Frac(T)

def j_ctd_frac(alphas, gammas):
    # base case: cut the continued fraction off here
    if alphas == [] or gammas == []:
        return 1
    return P(1/(1 - gammas[0]*t - alphas[0]*t^2 *
                j_ctd_frac(alphas[1:], gammas[1:])))

cf(t) = j_ctd_frac([1, 4, 9, 16, 25, 36], [1, 3, 5, 7, 9, 11]) 
print(cf(t).series(t, 10))

The above code can be used to define a Stieltjes continued fraction and check out the expansion of Gauss on the odd double factorials.

def s_ctd_frac(alphas):
    gammas = [0]*len(alphas)
    return j_ctd_frac(alphas, gammas)

cf(t) = s_ctd_frac([1, 2, 3, 4, 5, 6])
print(cf(t).series(t, 13))

Here’s the code for getting the reverse Bessel polynomials from a Thron continued fraction.

S.<R> = PolynomialRing(ZZ)
T.<t> = PowerSeriesRing(S)

def t_ctd_frac(alphas, gammas):
    # base case: cut the continued fraction off here
    if alphas == [] or gammas == []:
        return 1
    return (1/(1 - gammas[0]*t^2 - alphas[0]*t^2 *
               t_ctd_frac(alphas[1:], gammas[1:])))

print(T(t_ctd_frac([1, 2, 3, 4, 5, 6], [R, R, R, R, R, R])))

Backreaction: The Quantum Quartet

I made some drawings recently. For no particular purpose, really, other than to distract myself. And here is the joker:

September 21, 2017

John Baez: Applied Category Theory at UCR (Part 2)

I’m running a special session on applied category theory, and now the program is available:

Applied category theory, Fall Western Sectional Meeting of the AMS, 4-5 November 2017, U.C. Riverside.

This is going to be fun.

My former student Brendan Fong is now working with David Spivak at MIT, and they’re both coming. My collaborator John Foley at Metron is also coming: we’re working on the CASCADE project for designing networked systems.

Dmitry Vagner is coming from Duke: he wrote a paper with David and Eugene Lerman on operads and open dynamical systems. Christina Vasilakopoulou, who has worked with David and Patrick Schultz on dynamical systems, has just joined our group at UCR, so she will also be here. And the three of them have worked with Ryan Wisnesky on algebraic databases. Ryan will not be here, but his colleague Peter Gates will: together with David they have a startup called Categorical Informatics, which uses category theory to build sophisticated databases.

That’s not everyone—for example, most of my students will be speaking at this special session, and other people too—but that gives you a rough sense of some people involved. The conference is on a weekend, but John Foley and David Spivak and Brendan Fong and Dmitry Vagner are staying on for longer, so we’ll have some long conversations… and Brendan will explain decorated corelations in my Tuesday afternoon network theory seminar.

Here’s the program. Click on talk titles to see abstracts. For a multi-author talk, the person with the asterisk after their name is doing the talking. All the talks will be in Room 268 of the Highlander Union Building or ‘HUB’.

Saturday November 4, 2017, 9:00 a.m.-10:50 a.m.

9:00 a.m.
A higher-order temporal logic for dynamical systems.
David I. Spivak, MIT

10:00 a.m.
Algebras of open dynamical systems on the operad of wiring diagrams.
Dmitry Vagner*, Duke University
David I. Spivak, MIT
Eugene Lerman, University of Illinois at Urbana-Champaign

10:30 a.m.
Abstract dynamical systems.
Christina Vasilakopoulou*, University of California, Riverside
David Spivak, MIT
Patrick Schultz, MIT

Saturday November 4, 2017, 3:00 p.m.-5:50 p.m.

3:00 p.m.
Black boxes and decorated corelations.
Brendan Fong, MIT

4:00 p.m.
Compositional modelling of open reaction networks.
Blake S. Pollard*, University of California, Riverside
John C. Baez, University of California, Riverside

4:30 p.m.
A bicategory of coarse-grained Markov processes.
Kenny Courser, University of California, Riverside

5:00 p.m.
A bicategorical syntax for pure state qubit quantum mechanics.
Daniel M. Cicala, University of California, Riverside

5:30 p.m.
Open systems in classical mechanics.
Adam Yassine, University of California Riverside

Sunday November 5, 2017, 9:00 a.m.-10:50 a.m.

9:00 a.m.
Controllability and observability: diagrams and duality.
Jason Erbele, Victor Valley College

9:30 a.m.
Frobenius monoids, weak bimonoids, and corelations.
Brandon Coya, University of California, Riverside

10:00 a.m.
Compositional design and tasking of networks.
John D. Foley*, Metron, Inc.
John C. Baez, University of California, Riverside
Joseph Moeller, University of California, Riverside
Blake S. Pollard, University of California, Riverside

10:30 a.m.
Operads for modeling networks.
Joseph Moeller*, University of California, Riverside
John Foley, Metron Inc.
John C. Baez, University of California, Riverside
Blake S. Pollard, University of California, Riverside

Sunday November 5, 2017, 2:00 p.m.-4:50 p.m.

2:00 p.m.
Reeb graph smoothing via cosheaves.
Vin de Silva, Department of Mathematics, Pomona College

3:00 p.m.
Knowledge representation in bicategories of relations.
Evan Patterson*, Stanford University, Statistics Department

3:30 p.m.
The multiresolution analysis of flow graphs.
Steve Huntsman*, BAE Systems

4:00 p.m.
Categorical logic as a foundation for reasoning under uncertainty.
Ralph L. Wojtowicz*, Shepherd University

4:30 p.m.
Data modeling and integration using the open source tool Algebraic Query Language (AQL).
Peter Y. Gates*, Categorical Informatics
Ryan Wisnesky, Categorical Informatics

n-Category Café: Applied Category Theory 2018

We’re having a conference on applied category theory!

The plenary speakers will be:

  • Samson Abramsky (Oxford)
  • John Baez (UC Riverside)
  • Kathryn Hess (EPFL)
  • Mehrnoosh Sadrzadeh (Queen Mary)
  • David Spivak (MIT)

There will be a lot more to say as this progresses, but for now let me just quote from the conference website.

Applied Category Theory (ACT 2018) is a five-day workshop on applied category theory running from April 30 to May 4 at the Lorentz Center in Leiden, the Netherlands.

Towards an integrative science: in this workshop, we want to instigate a multi-disciplinary research program in which concepts, structures, and methods from one scientific discipline can be reused in another. The aim of the workshop is to (1) explore the use of category theory within and across different disciplines, (2) create a more cohesive and collaborative ACT community, especially among early-stage researchers, and (3) accelerate research by outlining common goals and open problems for the field.

While the workshop will host talks on a wide range of applications of category theory, there will be four special tracks on exciting new developments in the field:

  1. Dynamical systems and networks
  2. Systems biology
  3. Cognition and AI
  4. Causality

Accompanying the workshop will be an Adjoint Research School for early-career researchers. This will comprise a 16 week online seminar, followed by a 4 day research meeting at the Lorentz Center in the week prior to ACT 2018. Applications to the school will open prior to October 1, and are due November 1. Admissions will be notified by November 15.

The organizers

Bob Coecke (Oxford), Brendan Fong (MIT), Aleks Kissinger (Nijmegen), Martha Lewis (Amsterdam), and Joshua Tan (Oxford)

We welcome any feedback! Please send comments to this link.

About Applied Category Theory

Category theory is a branch of mathematics originally developed to transport ideas from one branch of mathematics to another, e.g. from topology to algebra. Applied category theory refers to efforts to transport the ideas of category theory from mathematics to other disciplines in science, engineering, and industry.

This site originated from discussions at the Computational Category Theory Workshop at NIST on Sept. 28-29, 2015. It serves to collect and disseminate research, resources, and tools for the development of applied category theory, and hosts a blog for those involved in its study.

The Proposal: Towards an Integrative Science

Category theory was developed in the 1940s to translate ideas from one field of mathematics, e.g. topology, to another field of mathematics, e.g. algebra. More recently, category theory has become an unexpectedly useful and economical tool for modeling a range of different disciplines, including programming language theory [10], quantum mechanics [2], systems biology [12], complex networks [5], database theory [7], and dynamical systems [14].

A category consists of a collection of objects together with a collection of maps between those objects, satisfying certain rules. Topologists and geometers use category theory to describe the passage from one mathematical structure to another, while category theorists are also interested in categories for their own sake. In computer science and physics, many types of categories (e.g. topoi or monoidal categories) are used to give a formal semantics of domain-specific phenomena (e.g. automata [3], or regular languages [11], or quantum protocols [2]). In the applied category theory community, a long-articulated vision understands categories as mathematical workspaces for the experimental sciences, similar to how they are used in topology and geometry [13]. This has proved true in certain fields, including computer science and mathematical physics, and we believe that these results can be extended in an exciting direction: we believe that category theory has the potential to bridge specific different fields, and moreover that developments in such fields (e.g. automata) can be transferred successfully into other fields (e.g. systems biology) through category theory. Already, for example, the categorical modeling of quantum processes has helped solve an important open problem in natural language processing [9].

In this workshop, we want to instigate a multi-disciplinary research program in which concepts, structures, and methods from one discipline can be reused in another. Tangibly and in the short-term, we will bring together people from different disciplines in order to write an expository survey paper that grounds the varied research in applied category theory and lays out the parameters of the research program.

In formulating this research program, we are motivated by recent successes where category theory was used to model a wide range of phenomena across many disciplines, e.g. open dynamical systems (including open Markov processes and open chemical reaction networks), entropy and relative entropy [6], and descriptions of computer hardware [8]. Several talks will address some of these new developments. But we are also motivated by an open problem in applied category theory, one which was observed at the most recent workshop in applied category theory (Dagstuhl, Germany, in 2015): “a weakness of semantics/CT is that the definitions play a key role. Having the right definitions makes the theorems trivial, which is the opposite of hard subjects where they have combinatorial proofs of theorems (and simple definitions). […] In general, the audience agrees that people see category theorists only as reconstructing the things they knew already, and that is a disadvantage, because we do not give them a good reason to care enough” [1, pg. 61].

In this workshop, we wish to articulate a natural response to the above: instead of treating the reconstruction as a weakness, we should treat the use of categorical concepts as a natural part of transferring and integrating knowledge across disciplines. The restructuring employed in applied category theory cuts through jargon, helping to elucidate common themes across disciplines. Indeed, the drive for a common language and comparison of similar structures in algebra and topology is what led to the development of category theory in the first place, and recent hints show that this approach is not only useful between mathematical disciplines, but between scientific ones as well. For example, the ‘Rosetta Stone’ of Baez and Stay demonstrates how symmetric monoidal closed categories capture the common structure between logic, computation, and physics [4].

[1] Samson Abramsky, John C. Baez, Fabio Gadducci, and Viktor Winschel. Categorical methods at the crossroads. Report from Dagstuhl Perspectives Workshop 14182, 2014.

[2] Samson Abramsky and Bob Coecke. A categorical semantics of quantum protocols. In Handbook of Quantum Logic and Quantum Structures. Elsevier, Amsterdam, 2009.

[3] Michael A. Arbib and Ernest G. Manes. A categorist’s view of automata and systems. In Ernest G. Manes, editor, Category Theory Applied to Computation and Control. Springer, Berlin, 2005.

[4] John C. Baez and Mike Stay. Physics, topology, logic and computation: a Rosetta stone. In Bob Coecke, editor, New Structures for Physics. Springer, Berlin, 2011.

[5] John C. Baez and Brendan Fong. A compositional framework for passive linear networks. arXiv e-prints, 2015.

[6] John C. Baez, Tobias Fritz, and Tom Leinster. A characterization of entropy in terms of information loss. Entropy, 13(11):1945-1957, 2011.

[7] Michael Fleming, Ryan Gunther, and Robert Rosebrugh. A database of categories. Journal of Symbolic Computing, 35(2):127-135, 2003.

[8] Dan R. Ghica and Achim Jung. Categorical semantics of digital circuits. In Ruzica Piskac and Muralidhar Talupur, editors, Proceedings of the 16th Conference on Formal Methods in Computer-Aided Design. Springer, Berlin, 2016.

[9] Dimitri Kartsaklis, Mehrnoosh Sadrzadeh, Stephen Pulman, and Bob Coecke. Reasoning about meaning in natural language with compact closed categories and Frobenius algebras. In Logic and Algebraic Structures in Quantum Computing and Information. Cambridge University Press, Cambridge, 2013.

[10] Eugenio Moggi. Notions of computation and monads. Information and Computation, 93(1):55-92, 1991.

[11] Nicholas Pippenger. Regular languages and Stone duality. Theory of Computing Systems 30(2):121-134, 1997.

[12] Robert Rosen. The representation of biological systems from the standpoint of the theory of categories. Bulletin of Mathematical Biophysics, 20(4):317-341, 1958.

[13] David I. Spivak. Category Theory for Scientists. MIT Press, Cambridge MA, 2014.

[14] David I. Spivak, Christina Vasilakopoulou, and Patrick Schultz. Dynamical systems and sheaves. arXiv e-prints, 2016.

John BaezApplied Category Theory 2018

There will be a conference on applied category theory!

Applied Category Theory (ACT 2018). School 23–27 April 2018 and conference 30 April–4 May 2018 at the Lorentz Center in Leiden, the Netherlands. Organized by Bob Coecke (Oxford), Brendan Fong (MIT), Aleks Kissinger (Nijmegen), Martha Lewis (Amsterdam), and Joshua Tan (Oxford).

The plenary speakers will be:

• Samson Abramsky (Oxford)
• John Baez (UC Riverside)
• Kathryn Hess (EPFL)
• Mehrnoosh Sadrzadeh (Queen Mary)
• David Spivak (MIT)

There will be a lot more to say as this progresses, but for now let me just quote from the conference website:

Applied Category Theory (ACT 2018) is a five-day workshop on applied category theory running from April 30 to May 4 at the Lorentz Center in Leiden, the Netherlands.

Towards an Integrative Science: in this workshop, we want to instigate a multi-disciplinary research program in which concepts, structures, and methods from one scientific discipline can be reused in another. The aim of the workshop is to (1) explore the use of category theory within and across different disciplines, (2) create a more cohesive and collaborative ACT community, especially among early-stage researchers, and (3) accelerate research by outlining common goals and open problems for the field.

While the workshop will host discussions on a wide range of applications of category theory, there will be four special tracks on exciting new developments in the field:

1. Dynamical systems and networks
2. Systems biology
3. Cognition and AI
4. Causality

Accompanying the workshop will be an Adjoint Research School for early-career researchers. This will comprise a 16 week online seminar, followed by a 4 day research meeting at the Lorentz Center in the week prior to ACT 2018. Applications to the school will open prior to October 1, and are due November 1. Admissions will be notified by November 15.

The organizers

Bob Coecke (Oxford), Brendan Fong (MIT), Aleks Kissinger (Nijmegen), Martha Lewis (Amsterdam), and Joshua Tan (Oxford)

We welcome any feedback! Please send comments to this link.

About Applied Category Theory

Category theory is a branch of mathematics originally developed to transport ideas from one branch of mathematics to another, e.g. from topology to algebra. Applied category theory refers to efforts to transport the ideas of category theory from mathematics to other disciplines in science, engineering, and industry.

This site originated from discussions at the Computational Category Theory Workshop at NIST on Sept. 28-29, 2015. It serves to collect and disseminate research, resources, and tools for the development of applied category theory, and hosts a blog for those involved in its study.

The proposal: Towards an Integrative Science

Category theory was developed in the 1940s to translate ideas from one field of mathematics, e.g. topology, to another field of mathematics, e.g. algebra. More recently, category theory has become an unexpectedly useful and economical tool for modeling a range of different disciplines, including programming language theory [10], quantum mechanics [2], systems biology [12], complex networks [5], database theory [7], and dynamical systems [14].

A category consists of a collection of objects together with a collection of maps between those objects, satisfying certain rules. Topologists and geometers use category theory to describe the passage from one mathematical structure to another, while category theorists are also interested in categories for their own sake. In computer science and physics, many types of categories (e.g. topoi or monoidal categories) are used to give a formal semantics of domain-specific phenomena (e.g. automata [3], or regular languages [11], or quantum protocols [2]). In the applied category theory community, a long-articulated vision understands categories as mathematical workspaces for the experimental sciences, similar to how they are used in topology and geometry [13]. This has proved true in certain fields, including computer science and mathematical physics, and we believe that these results can be extended in an exciting direction: we believe that category theory has the potential to bridge specific different fields, and moreover that developments in such fields (e.g. automata) can be transferred successfully into other fields (e.g. systems biology) through category theory. Already, for example, the categorical modeling of quantum processes has helped solve an important open problem in natural language processing [9].

In this workshop, we want to instigate a multi-disciplinary research program in which concepts, structures, and methods from one discipline can be reused in another. Tangibly and in the short-term, we will bring together people from different disciplines in order to write an expository survey paper that grounds the varied research in applied category theory and lays out the parameters of the research program.

In formulating this research program, we are motivated by recent successes where category theory was used to model a wide range of phenomena across many disciplines, e.g. open dynamical systems (including open Markov processes and open chemical reaction networks), entropy and relative entropy [6], and descriptions of computer hardware [8]. Several talks will address some of these new developments. But we are also motivated by an open problem in applied category theory, one which was observed at the most recent workshop in applied category theory (Dagstuhl, Germany, in 2015): “a weakness of semantics/CT is that the definitions play a key role. Having the right definitions makes the theorems trivial, which is the opposite of hard subjects where they have combinatorial proofs of theorems (and simple definitions). […] In general, the audience agrees that people see category theorists only as reconstructing the things they knew already, and that is a disadvantage, because we do not give them a good reason to care enough” [1, pg. 61].

In this workshop, we wish to articulate a natural response to the above: instead of treating the reconstruction as a weakness, we should treat the use of categorical concepts as a natural part of transferring and integrating knowledge across disciplines. The restructuring employed in applied category theory cuts through jargon, helping to elucidate common themes across disciplines. Indeed, the drive for a common language and comparison of similar structures in algebra and topology is what led to the development category theory in the first place, and recent hints show that this approach is not only useful between mathematical disciplines, but between scientific ones as well. For example, the ‘Rosetta Stone’ of Baez and Stay demonstrates how symmetric monoidal closed categories capture the common structure between logic, computation, and physics [4].

[1] Samson Abramsky, John C. Baez, Fabio Gadducci, and Viktor Winschel. Categorical methods at the crossroads. Report from Dagstuhl Perspectives Workshop 14182, 2014.

[2] Samson Abramsky and Bob Coecke. A categorical semantics of quantum protocols. In Handbook of Quantum Logic and Quantum Structures. Elsevier, Amsterdam, 2009.

[3] Michael A. Arbib and Ernest G. Manes. A categorist’s view of automata and systems. In Ernest G. Manes, editor, Category Theory Applied to Computation and Control. Springer, Berlin, 2005.

[4] John C. Baez. Physics, topology, logic and computation: a Rosetta stone. In Bob Coecke, editor, New Structures for Physics. Springer, Berlin, 2011.

[5] John C. Baez and Brendan Fong. A compositional framework for passive linear networks. arXiv e-prints, 2015.

[6] John C. Baez, Tobias Fritz, and Tom Leinster. A characterization of entropy in terms of information loss. Entropy, 13(11):1945–1957, 2011.

[7] Michael Fleming, Ryan Gunther, and Robert Rosebrugh. A database of categories. Journal of Symbolic Computation, 35(2):127–135, 2003.

[8] Dan R. Ghica and Achim Jung. Categorical semantics of digital circuits. In Ruzica Piskac and Muralidhar Talupur, editors, Proceedings of the 16th Conference on Formal Methods in Computer-Aided Design. Springer, Berlin, 2016.

[9] Dimitri Kartsaklis, Mehrnoosh Sadrzadeh, Stephen Pulman, and Bob Coecke. Reasoning about meaning in natural language with compact closed categories and Frobenius algebras. In Logic and Algebraic Structures in Quantum Computing and Information. Cambridge University Press, Cambridge, 2013.

[10] Eugenio Moggi. Notions of computation and monads. Information and Computation, 93(1):55–92, 1991.

[11] Nicholas Pippenger. Regular languages and Stone duality. Theory of Computing Systems, 30(2):121–134, 1997.

[12] Robert Rosen. The representation of biological systems from the standpoint of the theory of categories. Bulletin of Mathematical Biophysics, 20(4):317–341, 1958.

[13] David I. Spivak. Category Theory for Scientists. MIT Press, Cambridge MA, 2014.

[14] David I. Spivak, Christina Vasilakopoulou, and Patrick Schultz. Dynamical systems and sheaves. arXiv e-prints, 2016.

September 19, 2017

Tim GowersTwo infinities that are surprisingly equal

It has been in the news recently — or rather, the small corner of the news that is of particular interest to mathematicians — that Maryanthe Malliaris and Saharon Shelah had an unexpected breakthrough: they stumbled on a proof that two infinities, long conjectured and widely believed to be distinct, are in fact equal. Or rather, since both lie between the cardinality of the natural numbers and the cardinality of the reals, they were widely believed to be distinct in some models of set theory where the continuum hypothesis fails.

A couple of days ago, John Baez was sufficiently irritated by a Quanta article on this development that he wrote a post on Google Plus in which he did a much better job of explaining what was going on. As a result of reading that, and following and participating in the ensuing discussion, I have got interested in the problem. In particular, as a complete non-expert, I am struck that a problem that looks purely combinatorial (though infinitary) should, according to Quanta, have a solution that involves highly non-trivial arguments in proof theory and model theory. It makes me wonder, again as a complete non-expert so probably very naively, whether there is a simpler purely combinatorial argument that the set theorists missed because they believed too strongly that the two infinities were different.

I certainly haven’t found such an argument, but I thought it might be worth at least setting out the problem, in case it appeals to anyone, and giving a few preliminary thoughts about it. I’m not expecting much from this, but if there’s a small chance that it leads to a fruitful mathematical discussion, then it’s worth doing. As I said above, I am indebted to John Baez and to several commenters on his post for being able to write much of what I write in this post, as can easily be checked if you read that discussion as well.

A few definitions and a statement of the result

The problem concerns the structure you obtain when you take the power set of the natural numbers and quotient out by the relation “has a finite symmetric difference with”. That is, we regard two sets A and B as equivalent if you can turn A into B by removing finitely many elements and adding finitely many other elements.

It’s easy to check that this is an equivalence relation. We can also define a number of the usual set-theoretic operations. For example, writing [A] for the equivalence class of A, we can set [A]\cap[B] to be [A\cap B], [A]\cup [B] to be [A\cup B], [A]^c to be [A^c], etc. It is easy to check that these operations are well-defined.

What about the subset relation? That too has an obvious definition. We don’t want to say that [A]\subset[B] if A\subset B, since that is not well-defined. However, we can define A to be almost contained in B if the set A\setminus B is finite, and then say that [A]\subset[B] if A is almost contained in B. This is well-defined and it’s also easy to check that it is true if and only if [A]\cap[B]=[A], which is the sort of thing we’d like to happen if our finite-fuzz set theory is to resemble normal set theory as closely as possible.
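These definitions are concrete enough to play with in code. Below is a minimal Python sketch (my own illustration, not part of the post): subsets of \mathbb N are modelled as predicates, and almost-containment is checked heuristically over a finite window. The cutoffs are arbitrary demo values, and of course no finite check can genuinely decide a statement about infinite sets.

```python
# Heuristic sketch: model a subset of the naturals as a predicate and
# declare A almost contained in B if every witness of A \ B found in a
# large inspection window already lies below a fixed finite cutoff.
# The cutoff and bound are arbitrary demo values.

def almost_contained(A, B, cutoff=100, bound=10_000):
    """Heuristically check that A \\ B is finite."""
    return all(not (A(n) and not B(n)) for n in range(cutoff, bound))

evens = lambda n: n % 2 == 0
evens_plus_junk = lambda n: n % 2 == 0 or n in {3, 7, 11}  # finite symmetric difference

assert almost_contained(evens_plus_junk, evens)      # the junk is finite
assert almost_contained(evens, evens_plus_junk)      # genuine containment
assert not almost_contained(lambda n: True, evens)   # infinitely many odd witnesses
```

In this model, [evens] and [evens_plus_junk] are the same f-set: each is almost contained in the other.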

I will use a non-standard piece of terminology and refer to an equivalence class of sets as an f-set, the “f” standing for “finite” or “fuzzy” (though these fuzzy sets are not to be confused with the usual definition of fuzzy sets, which I don’t know and probably never will know). I’ll also say things like “is f-contained in” (which means the same as “is almost contained in” except that it refers to the f-sets rather than to representatives of their equivalence classes).

So far so good, but things start to get a bit less satisfactory when we consider infinite intersections and unions. How are we to define \bigcap_{n=1}^\infty[A_n], for example?

An obvious property we would like is that the intersection should be the largest f-set that is contained in all the [A_n]. However, simple examples show that there doesn’t have to be a largest f-set contained in all the [A_n]. Indeed, let A_1\supset A_2\supset\dots be an infinite sequence of subsets of \mathbb N such that A_n\setminus A_{n+1} is infinite for every n. Then a set A is almost contained in every A_n if and only if A\setminus A_n is finite for every n. Given any such set A, we can find for each n an element b_n of A_n\setminus A_{n+1} that is not contained in A (since A_n\setminus A_{n+1} is infinite but A\setminus A_{n+1} is finite). Then the set B=A\cup\{b_1,b_2,\dots\} is also almost contained in every A_n, and [A] is properly contained in [B] (in the obvious sense).

OK, we don’t seem to have a satisfactory definition of infinite intersections, but we could at least hope for a satisfactory definition of “has an empty intersection”. And indeed, there is an obvious one. Given a collection of f-sets [A_\gamma], we say that its intersection is empty if the only f-set that is f-contained in every [A_\gamma] is [\emptyset]. (Note that [\emptyset] is the equivalence class of the empty set, which consists of all finite subsets of \mathbb N.) In terms of the sets rather than their equivalence classes, this is saying that there is no infinite set that is almost contained in every A_\gamma.

An important concept that appears in many places in mathematics, but particularly in set theory, is the finite-intersection property. A collection \mathcal A of subsets of a set X is said to have this property if A_1\cap\dots\cap A_n is non-empty whenever A_1,\dots,A_n\in\mathcal A. This definition carries over to f-sets with no problem at all, since finite f-intersections were easy to define.

Let’s ask ourselves a little question here: can we find a collection of f-sets with the finite-intersection property but with an empty intersection? That is, no finite intersection is empty, but the intersection of all the f-sets is empty.

That should be pretty easy. For sets, there are very simple examples like A_n=\{n,n+1,\dots\} — finitely many of those have a non-empty intersection, but there is no set that’s contained in all of them.

Unfortunately, all those sets are the same if we turn them into f-sets. But there is an obvious way of adjusting the example: we just take sets A_1\supset A_2\supset\dots such that A_n\setminus A_{n+1} is infinite for each n and \bigcap_{n=1}^\infty A_n=\emptyset. That ought to do the job once we turn each A_n into its equivalence class [A_n].

Except that it doesn’t do the job. In fact, we’ve already observed that we can just pick a set B=\{b_1,b_2,\dots\} with b_n\in A_n\setminus A_{n+1} and then [B] will be a non-empty f-intersection of the A_n.

However, here’s an example that does work. We’ll take all f-sets [A] such that A has density 1. (This means that n^{-1}|A\cap\{1,2,\dots,n\}| tends to 1 as n tends to infinity.) Since the intersection of any two sets of density 1 has density 1 (a simple exercise), this collection of f-sets has the finite-intersection property. I claim that any f-set contained in all these f-sets must be [\emptyset].

Indeed, let B be an infinite set and (b_1,b_2,\dots) the enumeration of its elements in increasing order. We can pick a subsequence (c_1,c_2,\dots) such that c_n\geq 2^n for every n, and the corresponding subset C is an infinite subset of B with density zero. Therefore, \mathbb N\setminus C is a set of density 1 that does not almost contain B.
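The density-zero subsequence in that argument is easy to see numerically. The following sketch (mine, not from the post, with B taken to be all of \mathbb N for simplicity) keeps only elements that have at least doubled since the last kept one, so the n-th kept element is at least 2^{n-1}, and confirms that the resulting C has tiny density in a large window.

```python
# Sketch of the density-zero subsequence trick: from an (enumerated) set B,
# keep an element only once it reaches a threshold that doubles after each
# kept element, forcing geometric growth of the kept subsequence C.

def density_upto(S, N):
    """Proportion of {1, ..., N} lying in S."""
    return sum(1 for n in range(1, N + 1) if n in S) / N

B = range(1, 10 ** 6)        # stand-in for an infinite set, truncated for the demo
C = set()
threshold = 1
for b in B:
    if b >= threshold:       # n-th kept element is >= 2^(n-1)
        C.add(b)
        threshold = 2 * b

assert density_upto(C, 10 ** 5) < 0.001   # C has (empirically) negligible density
```

So \mathbb N\setminus C has density 1 yet fails to almost contain B, exactly as in the proof.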

The number of f-sets we took there in order to achieve an f-empty intersection was huge: the cardinality of the continuum. (That’s another easy exercise.) Did we really need that many? This innocent question leads straight to a definition that is needed in order to understand what Malliaris and Shelah did.

Definition. The cardinal p is the smallest cardinality of a collection F of f-sets such that F has the finite-intersection property but also F has an empty f-intersection.

It is simple to prove that this cardinal is uncountable, but it is also known not to be as big as the cardinality of the continuum (where again this means that there are models of set theory — necessarily ones where CH fails — for which it is strictly smaller). So it is a rather nice intermediate cardinal, which partially explains its interest to set theorists.

The cardinal p is one of the two infinities that Malliaris and Shelah proved were the same. The other one is closely related. Define a tower to be a collection of f-sets that does not contain [\emptyset] and is totally ordered by inclusion. Note that a tower T trivially satisfies the finite-intersection property: if [A_1],\dots,[A_n] belong to T, then the smallest of the f-sets [A_i] is the f-intersection and it isn’t f-empty. So let’s make another definition.

Definition. The cardinal t is the smallest cardinality of a tower T that has an empty f-intersection.

Since a tower has the finite-intersection property, we are asking for something strictly stronger than before, so strictly harder to obtain. It follows that t is at least as large as p.

And now we have the obvious question: is the inequality strict? As I have said, it was widely believed that it was, and a big surprise when Malliaris and Shelah proved that the two infinities were in fact equal.

What does this actually say? It says that if you can find a bunch F of f-sets with the finite-intersection property and an empty f-intersection, then you can find a totally ordered example T that has at most the cardinality of F.

Why is the problem hard?

I don’t have a sophisticated answer to this that would explain why it is hard to experts in set theory. I just want to think about why it might be hard to prove the statement using a naive approach.

An immediate indication that things might be difficult is that it isn’t terribly easy to give any example of a tower with an empty f-intersection, let alone one with small cardinality.

An indication of the problem we face was already present when I gave a failed attempt to construct a system of sets with the finite-intersection property and empty intersection. I took a nested sequence [A_1]\supset[A_2]\supset\dots such that the sets A_n had empty intersection, but that didn’t work because I could pick an element from each A_n\setminus A_{n+1} and put those together to make a non-empty f-intersection. (I’m using “f-intersection” to mean any f-set f-contained in all the given f-sets. In general, we can’t choose a largest one, so it’s far from unique. The usual terminology would be to say that if A is almost contained in every set from a collection of sets, then A is a pseudointersection of that collection. But I’m trying to express as much as possible in terms of f-sets.)

Anyone who is familiar with ordinal hierarchies will see that there is an obvious thing we could do here. We could start as above, and then when we find the annoying f-intersection we simply add it to the tower and call it [A_\omega]. And then inside [A_\omega] we can find another nested decreasing sequence of sets and call those [A_{\omega+1}], [A_{\omega+2}],\dots and so on. Those will also have a non-empty f-intersection, which we could call [A_{2\omega}], and so on.

Let’s use this idea to prove that there do exist towers with empty f-intersections. I shall build a collection of non-empty f-sets [A_\alpha] by transfinite induction. If I have already built [A_\alpha], I let [A_{\alpha+1}] be any non-empty f-set that is strictly f-contained in [A_\alpha]. That tells me how to build my sets at successor ordinals. If \alpha is a limit ordinal, then I’ll take A_\alpha to be a non-empty f-intersection of all the [A_\beta] with \beta<\alpha.

But how am I so sure that such an f-intersection exists? I’m not, but if it doesn’t exist, then I’m very happy, as that means that the f-sets [A_\beta] with \beta<\alpha form a tower with empty f-intersection.

Since all the f-sets in this tower are distinct, the process has to terminate at some point, and that implies that a tower with empty f-intersection must exist.

For a lot of ordinal constructions like this, one can show that the process terminates at the first uncountable ordinal, \omega_1. To set theorists, this has extremely small cardinality — by definition, the smallest one after the cardinality of the natural numbers. In some models of set theory, there will be a dizzying array of cardinals between this and the cardinality of the continuum.

In our case it is not too hard to prove that the process doesn’t terminate before we get to the first uncountable ordinal. Indeed, if \alpha is a countable limit ordinal, then we can take an increasing sequence of ordinals \alpha_n that tend to \alpha, pick an element b_n from A_{\alpha_n}\setminus A_{\alpha_{n+1}}, and define A_\alpha to be \{b_1,b_2,\dots\}.

However, there doesn’t seem to be any obvious argument to say that the f-sets [A_\alpha] with \alpha<\omega_1 have an empty f-intersection, even if we make some effort to keep our sets small (for example, by defining A_{\alpha+1} to consist of every other element of A_\alpha). In fact, we sort of know that there won’t be such an argument, because if there were, then it would show that there was a tower whose cardinality was that of the first uncountable ordinal. That would prove that t had this cardinality, and since p is uncountable (that is easy to check) we would immediately know that p and t were equal.

So that’s already an indication that something subtle is going on that you need to be a proper set theorist to understand properly.

But do we need to understand these funny cardinalities to solve the problem? We don’t need to know what they are — just to prove that they are the same. Perhaps that can still be done in a naive way.

So here’s a very naive idea. Let’s take a set F of f-sets with the finite intersection property and empty f-intersection, and let’s try to build a tower T with empty intersection using only sets from F. This would certainly be sufficient for showing that T has cardinality at most that of F, and if F has minimal cardinality it would show that p=t.

There’s almost no chance that this will work, but let’s at least see where it goes wrong, or runs into a brick wall.

At first things go swimmingly. Let [A]\in F. Then there must exist an f-set [A']\in F that does not f-contain [A], since otherwise [A] itself would be a non-empty f-intersection for F. But then [A]\cap [A'] is a proper f-subset of [A], and by the finite-intersection property it is not f-empty.

By iterating this argument, we can therefore obtain a nested sequence [A_1]\supset[A_2]\supset\dots of f-sets in F.

The next thing we’d like to do is create [A_\omega]. And this, unsurprisingly, is where the brick wall is. Consider, for example, the case where F consists of all sets of density 1. What if we stupidly chose A_n in such a way that \min A_n\geq 2^n for every n? Then our diagonal procedure — picking an element from each set A_n\setminus A_{n+1} — would yield a set of density zero. Of course, we could go for a different diagonal procedure. We would need to prove that for this particular F and any nested sequence we can always find an f-intersection that belongs to F. That’s equivalent to saying that for any nested sequence A_1\supset A_2\supset\dots of sets of density 1 we can find a set A such that A\setminus A_n is finite for every n and A has density 1.
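For what it’s worth, here is a computational sketch of that last statement (my own construction, not from the post): choose boundaries N_1 < N_2 < \dots and let A agree with A_k on the block [N_k, N_{k+1}). Above N_n the set A lies inside some A_k with k\geq n, hence inside A_n, so A\setminus A_n is finite; and if the blocks are long enough, A keeps density 1. The particular nested density-1 sets used below (naturals that are not perfect powers of low exponent) and the block boundaries are arbitrary choices for the demo.

```python
# Demo of the block-gluing construction of a density-1 pseudointersection.
# A_n = naturals that are not perfect k-th powers for any 2 <= k <= n+1;
# each A_n has density 1 and the sequence is nested.

def is_kth_power(m, k):
    r = round(m ** (1.0 / k))
    return any(c ** k == m for c in (r - 1, r, r + 1) if c >= 1)

def A(n, m):
    return not any(is_kth_power(m, k) for k in range(2, n + 2))

N = [10 ** k for k in range(1, 6)]               # block boundaries N_1 < N_2 < ...
pseudo = set()
for k in range(len(N) - 1):
    pseudo |= {m for m in range(N[k], N[k + 1]) if A(k + 1, m)}

# almost contained in A_3: every exception lies below N_3
exceptions = [m for m in pseudo if m >= N[2] and not A(3, m)]
assert exceptions == []

# and the glued set still has density close to 1 on the window we built
assert len(pseudo) / (N[-1] - N[0]) > 0.99
```

In the genuine proof the N_k must of course be chosen using the densities of the actual sets A_n, not fixed in advance.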

That’s a fairly simple (but not trivial) exercise I think, but when I tried to write a proof straight down I failed — it’s more like a pen-and-paper job until you get the construction right. But here’s the real question I’d like to know the answer to right at this moment. It splits into two questions actually.

Question 1. Let F be a collection of f-sets with the finite-intersection property and no non-empty f-intersection. Let [A_1]\supset[A_2]\supset\dots be a nested sequence of elements of F. Must this sequence have an f-intersection that belongs to F?

Question 2. If, as seems likely, the answer to Question 1 is no, must it at least be the case that there exists a nested sequence in F with an f-intersection that also belongs to F?

If the answer to Question 2 turned out to be yes, it would naturally lead to the following further question.

Question 3. If the answer to Question 2 is yes, then how far can we go with it? For example, must F contain a nested transfinite sequence of uncountable length?

Unfortunately, even a positive answer to Question 3 would not be enough for us, for reasons I’ve already given. It might be the case that we can indeed build nice big towers in F, but that the arguments stop working once we reach the first uncountable ordinal. Indeed, it might well be known that there are sets F with the finite-intersection property and no non-empty f-intersection that do not contain towers that are bigger than this. If that’s the case, it would give at least one serious reason for the problem being hard. It would tell us that we can’t prove the equality by just finding a suitable tower inside F: instead, we’d need to do something more indirect, constructing a tower T and some non-obvious injection from T to F. (It would be non-obvious because it would not preserve the subset relation.)

Another way the problem might be difficult is if F does contain a tower with no non-empty f-intersection, but we can’t extend an arbitrary tower in F to a tower with this property. Perhaps if we started off building our tower the wrong way, it would lead us down a path that had a dead end long before the tower was big enough, even though good paths and good towers did exist.

But these are just pure speculations on my part. I’m sure the answers to many of my questions are known. If so, I’ll be interested to hear about it, and to understand better why Malliaris and Shelah had to use big tools and a much less obvious argument than the kind of thing I was trying to do above.


I’m still writing on the book. After not much happened for almost a year, my publisher now rather suddenly asked for the final version of the manuscript. Until that’s done not much will be happening on this blog. We do seem to have settled on a title though: “Lost in Math: How Beauty Leads Physics Astray.” The title is my doing, the subtitle isn’t. I just hope it won’t lead too many readers

September 18, 2017

Doug NatelsonFaculty position at Rice - theoretical astro-particle/cosmology

Assistant Professor Position at Rice University in

Theoretical Astro-Particle Physics/Cosmology

The Department of Physics and Astronomy at Rice University in Houston, Texas, invites applications for a tenure-track faculty position (Assistant Professor level) in Theoretical Astro-Particle physics and/or Cosmology. The department seeks an outstanding individual whose research will complement and connect existing activities in Nuclear/Particle physics and Astrophysics groups at Rice University (see This is the second position in a Cosmic Frontier effort that may eventually grow to three members. The successful applicant will be expected to develop an independent and vigorous research program, and teach graduate and undergraduate courses. A PhD in Physics, Astrophysics or related field is required.

Applicants should send the following: (i) cover letter; (ii) curriculum vitae (including electronic links to 2 relevant publications); (iii) research statement (4 pages or less); (iv) teaching statement (2 pages or less); and (v) the names, professional affiliations, and email addresses of three references.  To apply, please visit:  Applications will be accepted until the position is filled, but only those received by Dec 15, 2017 will be assured full consideration. The appointment is expected to start in July 2018.  Further inquiries should be directed to the chair of the search committee, Prof. Paul Padley (

Rice University is an Equal Opportunity Employer with commitment to diversity at all levels, and considers for employment qualified applicants without regard to race, color, religion, age, sex, sexual orientation, gender identity, national or ethnic origin, genetic information, disability or protected veteran status.


n-Category Café Lattice Paths and Continued Fractions I

In my last post I talked about certain types of lattice paths with weightings on them, and formulas for the weighted count of such paths; in particular, I was interested in expressing the reverse Bessel polynomials as a certain weighted count of Schröder paths. I alluded to some connection with continued fractions, and it is this connection that I want to explain here and in my next post.

In this post I want to prove Flajolet’s Fundamental Lemma. Alan Sokal calls this Flajolet’s Master Theorem, but Viennot takes the stance that it deserves the high accolade of being described as a ‘Fundamental Lemma’, citing Aigner and Ziegler in Proofs from THE BOOK:

“The essence of mathematics is proving theorems – and so, that is what mathematicians do: They prove theorems. But to tell the truth, what they really want to prove, once in their lifetime, is a Lemma, like the one by Fatou in analysis, the Lemma of Gauss in number theory, or the Burnside-Frobenius Lemma in combinatorics.

“Now what makes a mathematical statement a true Lemma? First, it should be applicable to a wide variety of instances, even seemingly unrelated problems. Secondly, the statement should, once you have seen it, be completely obvious. The reaction of the reader might well be one of faint envy: Why haven’t I noticed this before? And thirdly, on an esthetic level, the Lemma – including its proof – should be beautiful!”

Interestingly, Aigner and Ziegler were building up to describing a result of Viennot’s – the Gessel-Lindström-Viennot Lemma – as a fundamental lemma! (I hope to talk about that lemma in a later post.)

Anyway, Flajolet’s Fundamental Lemma that I will describe and prove below is about expressing the weighted count of paths that look like

[Figure: a weighted Motzkin path]

as a continued fraction

\frac{1}{1- c_0 - \frac{a_1 b_1}{1-c_1 - \frac{a_2 b_2}{1- c_2 - \frac{a_3 b_3}{1-\dots}}}}

Next time I’ll give a few examples, including the connection with reverse Bessel polynomials.

Motzkin paths

We consider Motzkin paths, which are like the Dyck paths and Schröder paths we considered last time, but here the flat steps have length 1.

A Motzkin path, then, is a lattice path in \mathbb{N}^2 starting at (0,0), having steps in the directions (1,1), (1,-1) or (1,0). The path finishes at some (\ell, 0). Here is a Motzkin path.

[Figure: a Motzkin path]

(Actually at this point the length of each step is a bit of a red herring, but let’s not worry about that.)

We want to count weighted paths, so we’re going to have to weight them. We’ll do it in a universal way to start with. Let \{a_i\}_{i=1}^\infty, \{b_i\}_{i=1}^\infty and \{c_i\}_{i=0}^\infty be three sets of commuting indeterminates. Now weight each step in a path in the following way. Each step going up to level i will be given the weight a_i; each step going down from level i will be given the weight b_i; and each flat step at level i will be given weight c_i. Here’s the path from above with the weights marked on it.

[Figure: the Motzkin path above with weights marked]

The weight w_{a,b,c}(\sigma) of a path \sigma is just the product of the weights of each of its steps, so the weight of the above path is c_0 a_1^2 b_1^2 c_1 a_2 b_2.

If you try to start writing down the sum of the weightings of all Motzkhin paths you’ll get a power series that begins

1 + c_0 + a_1b_1 + c_0^2 + 2a_1b_1c_0 + a_1b_1c_1 + c_0^3 + \dots \in \mathbb{Z}[[a_i, b_i, c_i]]

Flajolet’s Fundamental Lemma will give us a formula for this power series.
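Before stating it, a quick sanity check of mine (not in the original post): setting every weight a_i = b_i = c_i = 1 turns the weighted sum, grouped by path length, into the sequence of Motzkin numbers 1, 1, 2, 4, 9, 21, …, which a brute-force recursion reproduces.

```python
from functools import lru_cache

# Count Motzkin paths of length n by recursing over the three step types:
# up (+1), flat (0), and down (-1, only when above height 0).
@lru_cache(maxsize=None)
def motzkin(n, h=0):
    """Number of paths of length n from height h down to height 0."""
    if n == 0:
        return 1 if h == 0 else 0
    total = motzkin(n - 1, h + 1) + motzkin(n - 1, h)   # up, flat
    if h > 0:
        total += motzkin(n - 1, h - 1)                  # down
    return total

assert [motzkin(n) for n in range(6)] == [1, 1, 2, 4, 9, 21]
```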

Flajolet’s Fundamental Lemma

In order to prove the result about the enumeration of weightings of all paths we will need to consider slightly more general paths that don’t just start on the x-axis. So define an (h,k)-path to be like a Motzkin path except that it starts at some point (0,h), for h\ge 0, does not go below the line y=h nor above the line y=k, and finishes at some point (\ell, h). Let P_h^k denote the set of all (h,k)-paths.

Here is a (2,4)-path with the weights marked on. Of course this is also, for instance, a (2,13)-path.

[Figure: a (2,4)-path with weights marked]

We want the weighted sum of all Motzkin paths, so in order to calculate that we will take p_h^k to be the sum of all weights of (h,k)-paths: p_h^k \coloneqq \sum_{\sigma\in P_h^k} w_{a,b,c}(\sigma) \in \mathbb{Z}[[a_i, b_i, c_i]]. There is a beautifully simple expression for p_h^k.

Observe first that any path in P_k^k is constrained to lie at level k, so must simply be a product of flat steps, which all have weight c_k; thus

p^k_k = 1 + c_k + c_k^2 + c_k^3 + \dots = \frac{1}{1-c_k}.

Given two paths \sigma_1, \sigma_2 \in P^k_h we can multiply them together simply by placing \sigma_2 after \sigma_1. The above pictured example is the product of three paths in P_2^4, the middle one being a flat path. Weighting is clearly preserved by this multiplication: w_{a,b,c}(\sigma_1\sigma_2) = w_{a,b,c}(\sigma_1)\,w_{a,b,c}(\sigma_2).

An indecomposable (h,k)-path is a path which only returns to level h at its finishing point, i.e., as the name suggests, it cannot be decomposed into a non-trivial product. It is clear that any path decomposes uniquely as a product of indecomposable paths. There are two types of non-trivial indecomposable (h,k)-paths: there is the single flat step; and there are the paths which are an up step, followed by a path in P_{h+1}^k, followed by a down step back to level h. We let I_h^k be the set of non-trivial indecomposable (h,k)-paths.

This all leads to the following argument to deduce an expression for the weighted count of all (h,k)(h,k)-paths.

\begin{aligned} p^k_h &= \sum_{\sigma\in P^k_h} w_{a,b,c}(\sigma)\\ &= \sum_{n=0}^\infty \sum_{\pi_1,\dots,\pi_n \in I^k_h} w_{a,b,c}(\pi_1\dots \pi_n)\\ &= \sum_{n=0}^\infty \sum_{\pi_1,\dots,\pi_n \in I^k_h} w_{a,b,c}(\pi_1)\dots w_{a,b,c}(\pi_n)\\ &= \frac{1}{1- \sum_{\pi\in I^k_h} w_{a,b,c}(\pi)} \\ &= \frac{1}{1- c_h - \sum_{\sigma\in P^k_{h+1}} a_{h+1}\, w_{a,b,c}(\sigma)\, b_{h+1}} \\ &= \frac{1}{1- c_h - a_{h+1} b_{h+1}\sum_{\sigma\in P^k_{h+1}} w_{a,b,c}(\sigma)} \\ &= \frac{1}{1- c_h - a_{h+1} b_{h+1}\, p_{h+1}^k} \end{aligned}

This is a lovely recursive expression for the weighted count p_h^k. Using the fact p^k_k = \frac{1}{1-c_k} that we gave above, we obtain the following.

Lemma. p_h^k = \frac{1}{1- c_{h} - \frac{a_{h+1} b_{h+1}}{1-c_{h+1} - \frac{a_{h+2} b_{h+2}}{\qquad \frac{\vdots}{1- c_{k-1}-\frac{a_k b_k}{1-c_k}}}}}

Now taking h=0 and letting k\to\infty we get the following continued fraction expansion for the weighted count of all Motzkin paths starting at level 0.

Flajolet’s Fundamental Lemma. \sum_{\sigma\,\,\mathrm{Motzkin}} w_{a,b,c}(\sigma) = \frac{1}{1- c_{0} - \frac{a_{1} b_{1}}{1-c_{1} - \frac{a_{2} b_{2}}{1- c_2 - \frac{a_3 b_3}{1-\dots}}}} \in \mathbb{Z}[[a_i, b_i, c_i]]

How lovely and simple is that?
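Lovely indeed, and also easy to check numerically (a quick verification of mine, not part of the post): with constant weights a_i = b_i = c_i = t for a small t, both sides reduce to the Motzkin generating function evaluated at t, and a truncated path sum agrees with a truncated continued fraction to machine precision.

```python
from functools import lru_cache

t = 0.1  # constant weights a_i = b_i = c_i = t, so a path of length n has weight t^n

@lru_cache(maxsize=None)
def paths(n, h=0):
    """Number of Motzkin paths of length n from height h down to 0."""
    if n == 0:
        return 1 if h == 0 else 0
    s = paths(n - 1, h + 1) + paths(n - 1, h)   # up, flat
    if h > 0:
        s += paths(n - 1, h - 1)                # down
    return s

# left side of the lemma: weighted sum over paths, truncated at length 40
lhs = sum(paths(n) * t ** n for n in range(40))

# right side: the continued fraction, truncated at depth 60
tail = 0.0
for _ in range(60):
    tail = t * t / (1 - t - tail)
rhs = 1 / (1 - t - tail)

assert abs(lhs - rhs) < 1e-12
```

Both truncations converge geometrically here, which is why 40 path lengths and 60 continued-fraction levels already agree far beyond the stated tolerance.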

Next time I’ll give some examples and applications, which include the Dyck paths and Schröder paths we looked at previously.

Doug NatelsonFaculty position at Rice - experimental condensed matter

Faculty Position in Experimental Condensed Matter Physics Rice University

The Department of Physics and Astronomy at Rice University in Houston, TX invites applications for a tenure-track faculty position in experimental condensed matter physics.  The department expects to make an appointment at the assistant professor level. This search seeks an outstanding individual whose research interest is in hard condensed matter systems, who will complement and extend existing experimental and theoretical activities in condensed matter physics on semiconductor and nanoscale structures, strongly correlated systems, topological matter, and related quantum materials (see A PhD in physics or related field is required. 

Applicants to this search should submit the following: (1) cover letter; (2) curriculum vitae; (3) research statement; (4) teaching statement; and (5) the names, professional affiliations, and email addresses of three references. For full details and to apply, please visit: Applications will be accepted until the position is filled. The review of applications will begin October 15 2017, but all those received by December 1 2017 will be assured full consideration. The appointment is expected to start in July 2018.  Further inquiries should be directed to the chair of the search committee, Prof. Emilia Morosan (  

Rice University is an Equal Opportunity Employer with commitment to diversity at all levels, and considers for employment qualified applicants without regard to race, color, religion, age, sex, sexual orientation, gender identity, national or ethnic origin, genetic information, disability or protected veteran status.

September 17, 2017

David Hoggregression

Our data-driven model for stars, The Cannon, is a regression. That is, it figures out how the labels generate the spectral pixels with a model for possible functional forms for that generation. I spent part of today building a Jupyter notebook to demonstrate that—when the assumptions underlying the regression are correct—the results of the regression are accurate (and precise). That is, the maximum-likelihood regression estimator is a good one. That isn't surprising; there are very general proofs; but it answers some questions (that my collaborators have) about cases where the labels (the regressors) are correlated in the training set.
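The claim in that last sentence is easy to illustrate. Here is a small NumPy sketch of my own (not Hogg's notebook, and with made-up names): ordinary least squares, which is the maximum-likelihood estimator under Gaussian noise, recovers the true coefficients even when the regressors ("labels") are strongly correlated in the training set.

```python
import numpy as np

# Illustrative sketch: OLS (the ML estimator for Gaussian noise) stays
# accurate when the regressors are strongly correlated, as long as the
# assumed generative model is correct. All names are demo inventions.

rng = np.random.default_rng(17)
n_stars, true_coeffs = 20_000, np.array([2.0, -1.0])

z = rng.normal(size=n_stars)
labels = np.column_stack([z, 0.9 * z + 0.1 * rng.normal(size=n_stars)])  # correlated labels

flux = labels @ true_coeffs + 0.1 * rng.normal(size=n_stars)  # one "spectral pixel"
fit, *_ = np.linalg.lstsq(labels, flux, rcond=None)

assert np.allclose(fit, true_coeffs, atol=0.05)  # estimates land near the truth
```

Correlation inflates the estimator's variance (the (X^T X)^{-1} factor), but with enough training data the maximum-likelihood fit remains accurate and unbiased.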

Terence TaoSzemeredi’s proof of Szemeredi’s theorem

Szemerédi’s theorem asserts that all subsets of the natural numbers of positive density contain arbitrarily long arithmetic progressions.  Roth’s theorem is the special case when one considers arithmetic progressions of length three.  Both theorems have many important proofs using tools from additive combinatorics, (higher order) Fourier analysis, (hyper) graph regularity theory, and ergodic theory.  However, the original proof by Endre Szemerédi, while extremely intricate, was purely combinatorial (and in particular “elementary”) and almost entirely self-contained, except for an invocation of the van der Waerden theorem.  It is also notable for introducing a prototype of what is now known as the Szemerédi regularity lemma.

Back in 2005, I rewrote Szemerédi’s original proof in order to understand it better; however, my rewrite ended up being about the same length as the original argument and was probably only usable to myself.  In 2012, after Szemerédi was awarded the Abel prize, I revisited this argument with the intention of writing up a more readable version of the proof, but ended up just presenting some ingredients of the argument in a blog post, rather than trying to rewrite the whole thing.  In that post, I suspected that the cleanest way to write up the argument would be through the language of nonstandard analysis (perhaps in an iterated hyperextension that could handle various hierarchies of infinitesimals), but was unable to actually achieve any substantial simplifications by passing to the nonstandard world.

A few weeks ago, I participated in a week-long workshop at the American Institute of Mathematics on “Nonstandard methods in combinatorial number theory”, and spent some time in a working group with Shabnam Akhtari, Irfan Alam, Renling Jin, Steven Leth, Karl Mahlburg, Paul Potgieter, and Henry Towsner to try to obtain a manageable nonstandard version of Szemerédi’s original proof.  We didn’t end up being able to do so – in fact there are now signs that perhaps nonstandard analysis is not the optimal framework in which to place this argument – but we did at least clarify the existing standard argument, to the point that I was able to go back to my original rewrite of the proof and present it in a more civilised form, which I am now uploading here as an unpublished preprint.   There are now a number of simplifications to the proof.  Firstly, one no longer needs the full strength of the regularity lemma; only the simpler “weak” regularity lemma of Frieze and Kannan is required.  Secondly, the proof has been “factored” into a number of stand-alone propositions of independent interest, in particular involving just (families of) one-dimensional arithmetic progressions rather than the complicated-looking multidimensional arithmetic progressions that occur so frequently in the original argument of Szemerédi.  Finally, the delicate manipulations of densities and epsilons via double counting arguments in Szemerédi’s original paper have been abstracted into a certain key property of families of arithmetic progressions that I call the “double counting property”.

The factoring mentioned above is particularly simple in the case of proving Roth’s theorem, which is now presented separately in the above writeup.  Roth’s theorem seeks to locate a length three progression {(P(1),P(2),P(3)) = (a, a+r, a+2r)} in which all three elements lie in a single set.  This will be deduced from an easier variant of the theorem in which one locates (a family of) length three progressions in which just the first two elements {P(1), P(2)} of the progression lie in a good set (and some other properties of the family are also required).  This is in turn derived from an even easier variant in which now just the first element of the progression is required to be in the good set.

More specifically, Roth’s theorem is now deduced from

Theorem 1.5.  Let {L} be a natural number, and let {S} be a set of integers of upper density at least {1-1/10L}.  Then, whenever {S} is partitioned into finitely many colour classes, there exists a colour class {A} and a family {(P_l(1),P_l(2),P_l(3))_{l=1}^L} of 3-term arithmetic progressions with the following properties:

  1. For each {l}, {P_l(1)} and {P_l(2)} lie in {A}.
  2. For each {l}, {P_l(3)} lies in {S}.
  3. The {P_l(3)} for {l=1,\dots,L} are in arithmetic progression.

The situation in this theorem is depicted by the following diagram, in which elements of A are in blue and elements of S are in grey:

Theorem 1.5 is deduced in turn from the following easier variant:

Theorem 1.6.  Let {L} be a natural number, and let {S} be a set of integers of upper density at least {1-1/10L}.  Then, whenever {S} is partitioned into finitely many colour classes, there exists a colour class {A} and a family {(P_l(1),P_l(2),P_l(3))_{l=1}^L} of 3-term arithmetic progressions with the following properties:

  1. For each {l}, {P_l(1)} lies in {A}.
  2. For each {l}, {P_l(2)} and {P_l(3)} lie in {S}.
  3. The {P_l(2)} for {l=1,\dots,L} are in arithmetic progression.

The situation here is described by the figure below.

Theorem 1.6 is easy to prove.  To derive Theorem 1.5 from Theorem 1.6, or to derive Roth’s theorem from Theorem 1.5, one uses double counting arguments, van der Waerden’s theorem, and the weak regularity lemma, largely as described in this previous blog post; see the writeup for the full details.  (I would, though, be interested in seeing a shorter proof of Theorem 1.5 that did not go through these arguments, and did not use the more powerful theorems of Roth or Szemerédi.)


Filed under: expository, math.CO Tagged: regularity lemma, Roth's theorem, Szemeredi's theorem

Tommaso Dorigo: Letters From Indochina, 1952-54: The Tragic Story Of An 18-Year-Old Enrolled In The Légion Étrangère

In 1952 my uncle Antonio, then 18 years old, left his family home in Venice, Italy, never to return, running away from the humiliation of a failure at school. With a friend he reached the border with France and crossed it during the night, chased by border patrols and wolves. Caught by the French police, Toni - that was the short name by which he was known to everybody - was offered a choice: be sent back to Italy, facing three months of jail, or enrol in the Foreign Legion. Afraid of the humiliation and the consequences, he tragically chose the latter.


Terence Tao: Inverting the Schur complement, and large-dimensional Gelfand-Tsetlin patterns

Suppose we have an {n \times n} matrix {M} that is expressed in block-matrix form as

\displaystyle  M = \begin{pmatrix} A & B \\ C & D \end{pmatrix}

where {A} is an {(n-k) \times (n-k)} matrix, {B} is an {(n-k) \times k} matrix, {C} is an {k \times (n-k)} matrix, and {D} is a {k \times k} matrix for some {1 < k < n}. If {A} is invertible, we can use the technique of Schur complementation to express the inverse of {M} (if it exists) in terms of the inverse of {A}, and the other components {B,C,D} of course. Indeed, to solve the equation

\displaystyle  M \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a \\ b \end{pmatrix},

where {x, a} are {(n-k) \times 1} column vectors and {y,b} are {k \times 1} column vectors, we can expand this out as a system

\displaystyle  Ax + By = a

\displaystyle  Cx + Dy = b.

Using the invertibility of {A}, we can write the first equation as

\displaystyle  x = A^{-1} a - A^{-1} B y \ \ \ \ \ (1)

and substituting this into the second equation yields

\displaystyle  (D - C A^{-1} B) y = b - C A^{-1} a

and thus (assuming that {D - CA^{-1} B} is invertible)

\displaystyle  y = - (D - CA^{-1} B)^{-1} CA^{-1} a + (D - CA^{-1} B)^{-1} b

and then inserting this back into (1) gives

\displaystyle  x = (A^{-1} + A^{-1} B (D - CA^{-1} B)^{-1} C A^{-1}) a - A^{-1} B (D - CA^{-1} B)^{-1} b.

Comparing this with

\displaystyle  \begin{pmatrix} x \\ y \end{pmatrix} = M^{-1} \begin{pmatrix} a \\ b \end{pmatrix},

we have managed to express the inverse of {M} as

\displaystyle  M^{-1} =

\displaystyle  \begin{pmatrix} A^{-1} + A^{-1} B (D - CA^{-1} B)^{-1} C A^{-1} & - A^{-1} B (D - CA^{-1} B)^{-1} \\ - (D - CA^{-1} B)^{-1} CA^{-1} & (D - CA^{-1} B)^{-1} \end{pmatrix}. \ \ \ \ \ (2)
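As a quick numerical sanity check (not part of the original post; dimensions are arbitrary), formula (2) can be verified against a direct matrix inverse:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 7, 3
M = rng.normal(size=(n, n))          # generic, hence the blocks are invertible
A, B = M[:n - k, :n - k], M[:n - k, n - k:]
C, D = M[n - k:, :n - k], M[n - k:, n - k:]

Ai = np.linalg.inv(A)
Si = np.linalg.inv(D - C @ Ai @ B)   # inverse of the Schur complement

# assemble formula (2) block by block
M_inv = np.block([
    [Ai + Ai @ B @ Si @ C @ Ai, -Ai @ B @ Si],
    [-Si @ C @ Ai, Si],
])
assert np.allclose(M_inv, np.linalg.inv(M))
```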

One can consider the inverse problem: given the inverse {M^{-1}} of {M}, does one have a nice formula for the inverse {A^{-1}} of the minor {A}? Trying to recover this directly from (2) looks somewhat messy. However, one can proceed as follows. Let {U} denote the {n \times k} matrix

\displaystyle  U := \begin{pmatrix} 0 \\ I_k \end{pmatrix}

(with {I_k} the {k \times k} identity matrix), and let {V} be its transpose:

\displaystyle  V := \begin{pmatrix} 0 & I_k \end{pmatrix}.

Then for any scalar {t} (which we identify with {t} times the identity matrix), one has

\displaystyle  M + UtV = \begin{pmatrix} A & B \\ C & D+t \end{pmatrix},

and hence by (2)

\displaystyle  (M+UtV)^{-1} =

\displaystyle \begin{pmatrix} A^{-1} + A^{-1} B (D + t - CA^{-1} B)^{-1} C A^{-1} & - A^{-1} B (D + t - CA^{-1} B)^{-1} \\ - (D + t - CA^{-1} B)^{-1} CA^{-1} & (D + t - CA^{-1} B)^{-1} \end{pmatrix},

noting that the inverses here will exist for {t} large enough. Taking limits as {t \rightarrow \infty}, we conclude that

\displaystyle  \lim_{t \rightarrow \infty} (M+UtV)^{-1} = \begin{pmatrix} A^{-1} & 0 \\ 0 & 0 \end{pmatrix}.

On the other hand, by the Woodbury matrix identity (discussed in this previous blog post), we have

\displaystyle  (M+UtV)^{-1} = M^{-1} - M^{-1} U (t^{-1} + V M^{-1} U)^{-1} V M^{-1}

and hence on taking limits and comparing with the preceding identity, one has

\displaystyle  \begin{pmatrix} A^{-1} & 0 \\ 0 & 0 \end{pmatrix} = M^{-1} - M^{-1} U (V M^{-1} U)^{-1} V M^{-1}.

This achieves the aim of expressing the inverse {A^{-1}} of the minor in terms of the inverse of the full matrix. Taking traces and rearranging, we conclude in particular that

\displaystyle  \mathrm{tr} A^{-1} = \mathrm{tr} M^{-1} - \mathrm{tr} (V M^{-2} U) (V M^{-1} U)^{-1}. \ \ \ \ \ (3)

In the {k=1} case, this can be simplified to

\displaystyle  \mathrm{tr} A^{-1} = \mathrm{tr} M^{-1} - \frac{e_n^T M^{-2} e_n}{e_n^T M^{-1} e_n} \ \ \ \ \ (4)

where {e_n} is the {n^{th}} basis column vector.
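Identity (4) is likewise easy to check numerically (a sketch, for a generic random matrix where the relevant inverses exist and {e_n^T M^{-1} e_n} is nonzero):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
M = rng.normal(size=(n, n))
Minv = np.linalg.inv(M)
e_n = np.zeros(n)
e_n[-1] = 1.0                                       # the n-th basis vector

lhs = np.trace(np.linalg.inv(M[:n - 1, :n - 1]))    # tr A^{-1}, A the top-left minor
rhs = np.trace(Minv) - (e_n @ Minv @ Minv @ e_n) / (e_n @ Minv @ e_n)
assert np.allclose(lhs, rhs)
```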

We can apply this identity to understand how the spectrum of an {n \times n} random matrix {M} relates to that of its top left {(n-1) \times (n-1)} minor {A}. Subtracting any complex multiple {z} of the identity from {M} (and hence from {A}), we can relate the Stieltjes transform {s_M(z) := \frac{1}{n} \mathrm{tr}(M-z)^{-1}} of {M} with the Stieltjes transform {s_A(z) := \frac{1}{n-1} \mathrm{tr}(A-z)^{-1}} of {A}:

\displaystyle  s_A(z) = \frac{n}{n-1} s_M(z) - \frac{1}{n-1} \frac{e_n^T (M-z)^{-2} e_n}{e_n^T (M-z)^{-1} e_n} \ \ \ \ \ (5)

At this point we begin to proceed informally. Assume for sake of argument that the random matrix {M} is Hermitian, with distribution that is invariant under conjugation by the unitary group {U(n)}; for instance, {M} could be drawn from the Gaussian Unitary Ensemble (GUE), or alternatively {M} could be of the form {M = U D U^*} for some real diagonal matrix {D} and {U} a unitary matrix drawn randomly from {U(n)} using Haar measure. To fix normalisations we will assume that the eigenvalues of {M} are typically of size {O(1)}. Then {A} is also Hermitian and {U(n)}-invariant. Furthermore, the law of {e_n^T (M-z)^{-1} e_n} will be the same as the law of {u^* (M-z)^{-1} u}, where {u} is now drawn uniformly from the unit sphere (independently of {M}). Diagonalising {M} into eigenvalues {\lambda_j} and eigenvectors {v_j}, we have

\displaystyle u^* (M-z)^{-1} u = \sum_{j=1}^n \frac{|u^* v_j|^2}{\lambda_j - z}.

One can think of {u} as a random (complex) Gaussian vector, divided by the magnitude of that vector (which, by the Chernoff inequality, will concentrate to {\sqrt{n}}). Thus the coefficients {u^* v_j} with respect to the orthonormal basis {v_1,\dots,v_n} can be thought of as independent (complex) Gaussian variables, divided by that magnitude. Using this and the Chernoff inequality again, we see (for {z} distance {\sim 1} away from the real axis at least) that one has the concentration of measure

\displaystyle  u^* (M-z)^{-1} u \approx \frac{1}{n} \sum_{j=1}^n \frac{1}{\lambda_j - z}

and thus

\displaystyle  e_n^T (M-z)^{-1} e_n \approx \frac{1}{n} \mathrm{tr} (M-z)^{-1} = s_M(z)

(that is to say, the diagonal entries of {(M-z)^{-1}} are roughly constant). Similarly we have

\displaystyle  e_n^T (M-z)^{-2} e_n \approx \frac{1}{n} \mathrm{tr} (M-z)^{-2} = \frac{d}{dz} s_M(z).

Inserting this into (5) and discarding terms of size {O(1/n^2)}, we thus conclude the approximate relationship

\displaystyle  s_A(z) \approx s_M(z) + \frac{1}{n} ( s_M(z) - s_M(z)^{-1} \frac{d}{dz} s_M(z) ).

This can be viewed as a difference equation for the Stieltjes transform of top left minors of {M}. Iterating this equation, and formally replacing the difference equation by a differential equation in the large {n} limit, we see that when {n} is large and {k \approx e^{-t} n} for some {t \geq 0}, one expects the top left {k \times k} minor {A_k} of {M} to have Stieltjes transform

\displaystyle  s_{A_k}(z) \approx s( t, z ) \ \ \ \ \ (6)

where {s(t,z)} solves the Burgers-type equation

\displaystyle  \partial_t s(t,z) = s(t,z) - s(t,z)^{-1} \frac{d}{dz} s(t,z) \ \ \ \ \ (7)

with initial data {s(0,z) = s_M(z)}.

Example 1 If {M} is a constant multiple {M = cI_n} of the identity, then {s_M(z) = \frac{1}{c-z}}. One checks that {s(t,z) = \frac{1}{c-z}} is a steady state solution to (7), which is unsurprising given that all minors of {M} are also {c} times the identity.

Example 2 If {M} is GUE normalised so that each entry has variance {\sigma^2/n}, then by the semi-circular law (see previous notes) one has {s_M(z) \approx \frac{-z + \sqrt{z^2-4\sigma^2}}{2\sigma^2} = -\frac{2}{z + \sqrt{z^2-4\sigma^2}}} (using an appropriate branch of the square root). One can then verify the self-similar solution

\displaystyle  s(t,z) = \frac{-z + \sqrt{z^2 - 4\sigma^2 e^{-t}}}{2\sigma^2 e^{-t}} = -\frac{2}{z + \sqrt{z^2 - 4\sigma^2 e^{-t}}}

to (7), which is consistent with the fact that a top {k \times k} minor of {M} also has the law of GUE, with each entry having variance {\sigma^2 / n \approx \sigma^2 e^{-t} / k} when {k \approx e^{-t} n}.
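One can also check by finite differences (a numerical sketch, with an arbitrary evaluation point chosen in the upper half-plane away from the branch cut) that the self-similar solution of Example 2 satisfies the Burgers-type equation (7):

```python
import numpy as np

sigma = 1.0

def s(t, z):
    # the self-similar solution of Example 2; the principal square root
    # is the correct branch near the test point below
    a = sigma**2 * np.exp(-t)
    return (-z + np.sqrt(z**2 - 4 * a)) / (2 * a)

t0, z0 = 0.3, 1.0 + 2.0j             # arbitrary point in the upper half-plane
h = 1e-6
ds_dt = (s(t0 + h, z0) - s(t0 - h, z0)) / (2 * h)
ds_dz = (s(t0, z0 + h) - s(t0, z0 - h)) / (2 * h)

# equation (7): d_t s = s - s^{-1} d_z s
residual = ds_dt - (s(t0, z0) - ds_dz / s(t0, z0))
assert abs(residual) < 1e-6
```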

One can justify the approximation (6) given a sufficiently good well-posedness theory for the equation (7). We will not do so here, but will note that (as with the classical inviscid Burgers equation) the equation can be solved exactly (formally, at least) by the method of characteristics. For any initial position {z_0}, we consider the characteristic flow {t \mapsto z(t)} formed by solving the ODE

\displaystyle  \frac{d}{dt} z(t) = s(t,z(t))^{-1} \ \ \ \ \ (8)

with initial data {z(0) = z_0}, ignoring for this discussion the problems of existence and uniqueness. Then from the chain rule, the equation (7) implies that

\displaystyle  \frac{d}{dt} s( t, z(t) ) = s(t,z(t))

and thus {s(t,z(t)) = e^t s(0,z_0)}. Inserting this back into (8) we see that

\displaystyle  z(t) = z_0 + s(0,z_0)^{-1} (1-e^{-t})

and thus (7) may be solved implicitly via the equation

\displaystyle  s(t, z_0 + s(0,z_0)^{-1} (1-e^{-t}) ) = e^t s(0, z_0) \ \ \ \ \ (9)

for all {t} and {z_0}.

Remark 3 In practice, the equation (9) may stop working when {z_0 + s(0,z_0)^{-1} (1-e^{-t})} crosses the real axis, as (7) does not necessarily hold in this region. It is a cute exercise (ultimately coming from the Cauchy-Schwarz inequality) to show that this crossing always happens, for instance if {z_0} has positive imaginary part then {z_0 + s(0,z_0)^{-1}} necessarily has negative or zero imaginary part.

Example 4 Suppose we have {s(0,z) = \frac{1}{c-z}} as in Example 1. Then (9) becomes

\displaystyle  s( t, z_0 + (c-z_0) (1-e^{-t}) ) = \frac{e^t}{c-z_0}

for any {t,z_0}, which after making the change of variables {z = z_0 + (c-z_0) (1-e^{-t}) = c - e^{-t} (c - z_0)} becomes

\displaystyle  s(t, z ) = \frac{1}{c-z}

as in Example 1.

Example 5 Suppose we have

\displaystyle  s(0,z) = \frac{-z + \sqrt{z^2-4\sigma^2}}{2\sigma^2} = -\frac{2}{z + \sqrt{z^2-4\sigma^2}}.

as in Example 2. Then (9) becomes

\displaystyle  s(t, z_0 - \frac{z_0 + \sqrt{z_0^2-4\sigma^2}}{2} (1-e^{-t}) ) = e^t \frac{-z_0 + \sqrt{z_0^2-4\sigma^2}}{2\sigma^2}.

If we write

\displaystyle  z := z_0 - \frac{z_0 + \sqrt{z_0^2-4\sigma^2}}{2} (1-e^{-t})

\displaystyle  = \frac{(1+e^{-t}) z_0 - (1-e^{-t}) \sqrt{z_0^2-4\sigma^2}}{2}

one can calculate that

\displaystyle  z^2 - 4 \sigma^2 e^{-t} = (\frac{(1-e^{-t}) z_0 - (1+e^{-t}) \sqrt{z_0^2-4\sigma^2}}{2})^2

and hence

\displaystyle  \frac{-z + \sqrt{z^2 - 4\sigma^2 e^{-t}}}{2\sigma^2 e^{-t}} = e^t \frac{-z_0 + \sqrt{z_0^2-4\sigma^2}}{2\sigma^2}

which gives

\displaystyle  s(t,z) = \frac{-z + \sqrt{z^2 - 4\sigma^2 e^{-t}}}{2\sigma^2 e^{-t}}. \ \ \ \ \ (10)

One can recover the spectral measure {\mu} from the Stieltjes transform {s(z)} as the weak limit of {x \mapsto \frac{1}{\pi} \mathrm{Im} s(x+i\varepsilon)} as {\varepsilon \rightarrow 0}; we write this informally as

\displaystyle  d\mu(x) = \frac{1}{\pi} \mathrm{Im} s(x+i0^+)\ dx.

In this informal notation, we have for instance that

\displaystyle  \delta_c(x) = \frac{1}{\pi} \mathrm{Im} \frac{1}{c-x-i0^+}\ dx

which can be interpreted as the fact that the Cauchy distributions {\frac{1}{\pi} \frac{\varepsilon}{(c-x)^2+\varepsilon^2}} converge weakly to the Dirac mass at {c} as {\varepsilon \rightarrow 0}. Similarly, the spectral measure associated to (10) is the semicircular measure {\frac{1}{2\pi \sigma^2 e^{-t}} (4 \sigma^2 e^{-t}-x^2)_+^{1/2}}.
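The recovery of the density from the Stieltjes transform can be illustrated numerically (a sketch with arbitrary parameters): evaluating {\frac{1}{\pi} \mathrm{Im}\, s(x+i\varepsilon)} for small {\varepsilon} using (10) reproduces the semicircular density.

```python
import numpy as np

sigma, t = 1.0, 0.5
a = sigma**2 * np.exp(-t)

def s(z):
    # Stieltjes transform (10) at fixed time t (principal branch,
    # evaluated just above the real axis)
    return (-z + np.sqrt(z**2 - 4 * a)) / (2 * a)

x, eps = 0.3, 1e-5                   # x chosen inside the bulk [-2 sqrt(a), 2 sqrt(a)]
density = np.imag(s(x + 1j * eps)) / np.pi
semicircle = np.sqrt(4 * a - x**2) / (2 * np.pi * a)
assert abs(density - semicircle) < 1e-3
```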

If we let {\mu_t} be the spectral measure associated to {s(t,\cdot)}, then the curve {e^{-t} \mapsto \mu_t} from {(0,1]} to the space of measures is the high-dimensional limit {n \rightarrow \infty} of a Gelfand-Tsetlin pattern (discussed in this previous post), if the pattern is randomly generated amongst all matrices {M} with spectrum asymptotic to {\mu_0} as {n \rightarrow \infty}. For instance, if {\mu_0 = \delta_c}, then the curve is {\alpha \mapsto \delta_c}, corresponding to a pattern that is entirely filled with {c}‘s. If instead {\mu_0 = \frac{1}{2\pi \sigma^2} (4\sigma^2-x^2)_+^{1/2}} is a semicircular distribution, then the pattern is

\displaystyle  \alpha \mapsto \frac{1}{2\pi \sigma^2 \alpha} (4\sigma^2 \alpha -x^2)_+^{1/2},

thus at height {\alpha} from the top, the pattern is semicircular on the interval {[-2\sigma \sqrt{\alpha}, 2\sigma \sqrt{\alpha}]}. The interlacing property of Gelfand-Tsetlin patterns translates to the claim that {\alpha \mu_\alpha(-\infty,\lambda)} (resp. {\alpha \mu_\alpha(\lambda,\infty)}) is non-decreasing (resp. non-increasing) in {\alpha} for any fixed {\lambda}. In principle one should be able to establish these monotonicity claims directly from the PDE (7) or from the implicit solution (9), but it was not clear to me how to do so.

An interesting example of such a limiting Gelfand-Tsetlin pattern occurs when {\mu_0 = \frac{1}{2} \delta_{-1} + \frac{1}{2} \delta_1}, which corresponds to {M} being {2P-I}, where {P} is an orthogonal projection to a random {n/2}-dimensional subspace of {{\bf C}^n}. Here we have

\displaystyle  s(0,z) = \frac{1}{2} \frac{1}{-1-z} + \frac{1}{2} \frac{1}{1-z} = \frac{z}{1-z^2}

and so (9) in this case becomes

\displaystyle  s(t, z_0 + \frac{1-z_0^2}{z_0} (1-e^{-t}) ) = \frac{e^t z_0}{1-z_0^2}

A tedious calculation then gives the solution

\displaystyle  s(t,z) = \frac{(2e^{-t}-1)z + \sqrt{z^2 - 4e^{-t}(1-e^{-t})}}{2e^{-t}(1-z^2)}. \ \ \ \ \ (11)

For {\alpha = e^{-t} > 1/2}, there are simple poles at {z=-1,+1}, and the associated measure is

\displaystyle  \mu_\alpha = \frac{2\alpha-1}{2\alpha} \delta_{-1} + \frac{2\alpha-1}{2\alpha} \delta_1 + \frac{1}{2\pi \alpha(1-x^2)} (4\alpha(1-\alpha)-x^2)_+^{1/2}\ dx.

This reflects the interlacing property, which forces {\frac{2\alpha-1}{2\alpha} \alpha n} of the {\alpha n} eigenvalues of the {\alpha n \times \alpha n} minor to be equal to {-1} (resp. {+1}). For {\alpha = e^{-t} \leq 1/2}, the poles disappear and one just has

\displaystyle  \mu_\alpha = \frac{1}{2\pi \alpha(1-x^2)} (4\alpha(1-\alpha)-x^2)_+^{1/2}\ dx.

For {\alpha=1/2}, one has an inverse semicircle distribution

\displaystyle  \mu_{1/2} = \frac{1}{\pi} (1-x^2)_+^{-1/2}.

There is presumably a direct geometric explanation of this fact (basically describing the singular values of the product of two random orthogonal projections to half-dimensional subspaces of {{\bf C}^n}), but I do not know of one off-hand.
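As a numerical check (with an arbitrary test point), formula (11) at {\alpha = 1/2} does reproduce the inverse semicircle (arcsine) density:

```python
import numpy as np

def s(t, z):
    # formula (11) for the minors of 2P - I (principal branch)
    et = np.exp(-t)
    return ((2 * et - 1) * z + np.sqrt(z**2 - 4 * et * (1 - et))) / (2 * et * (1 - z**2))

t = np.log(2.0)                      # alpha = e^{-t} = 1/2
x, eps = 0.4, 1e-6                   # arbitrary point in (-1, 1)
density = np.imag(s(t, x + 1j * eps)) / np.pi
arcsine = 1.0 / (np.pi * np.sqrt(1 - x**2))
assert abs(density - arcsine) < 1e-4
```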

The evolution of {s(t,z)} can also be understood using the {R}-transform and {S}-transform from free probability. Formally, let {z(t,s)} be the inverse of {s(t,z)}, thus

\displaystyle  s(t,z(t,s)) = s

for all {t,s}, and then define the {R}-transform

\displaystyle  R(t,s) := z(t,-s) - \frac{1}{s}.

The equation (9) may be rewritten as

\displaystyle  z( t, e^t s ) = z(0,s) + s^{-1} (1-e^{-t})

and hence

\displaystyle  R(t, -e^t s) = R(0, -s)

or equivalently

\displaystyle  R(t,s) = R(0, e^{-t} s). \ \ \ \ \ (12)

See these previous notes for a discussion of free probability topics such as the {R}-transform.

Example 6 If {s(t,z) = \frac{1}{c-z}} then the {R} transform is {R(t,s) = c}.

Example 7 If {s(t,z)} is given by (10), then the {R} transform is

\displaystyle  R(t,s) = \sigma^2 e^{-t} s.

Example 8 If {s(t,z)} is given by (11), then the {R} transform is

\displaystyle  R(t,s) = \frac{-1 + \sqrt{1 + 4 s^2 e^{-2t}}}{2 s e^{-t}}.

This simple relationship (12) is essentially due to Nica and Speicher (thanks to Dima Shlyakhtenko for this reference). It has the remarkable consequence that when {\alpha = 1/m} is the reciprocal of a natural number {m}, then {\mu_{1/m}} is the free arithmetic mean of {m} copies of {\mu}, that is to say {\mu_{1/m}} is the free convolution {\mu \boxplus \dots \boxplus \mu} of {m} copies of {\mu}, pushed forward by the map {\lambda \rightarrow \lambda/m}. In terms of random matrices, this is asserting that the top {n/m \times n/m} minor of a random matrix {M} has spectral measure approximately equal to that of an arithmetic mean {\frac{1}{m} (M_1 + \dots + M_m)} of {m} independent copies of {M}, so that the process of taking top left minors is in some sense a continuous analogue of the process of taking freely independent arithmetic means. There ought to be a geometric proof of this assertion, but I do not know of one. In the limit {m \rightarrow \infty} (or {\alpha \rightarrow 0}), the {R}-transform becomes linear and the spectral measure becomes semicircular, which is of course consistent with the free central limit theorem.
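The minor-versus-free-arithmetic-mean statement can be tested in simulation for GUE with {m=2} (a rough Monte Carlo sketch; the dimension, seed, and tolerance are arbitrary), comparing second moments of the empirical spectra, which should both be close to {\sigma^2/2}:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, n = 1.0, 400

def gue(k, entry_var):
    # k x k Hermitian matrix whose entries have variance entry_var
    X = (rng.normal(size=(k, k)) + 1j * rng.normal(size=(k, k))) / np.sqrt(2)
    return (X + X.conj().T) / np.sqrt(2) * np.sqrt(entry_var)

# top-left n/2 minor of an n x n GUE, entries of variance sigma^2 / n
M = gue(n, sigma**2 / n)
minor_eigs = np.linalg.eigvalsh(M[: n // 2, : n // 2])

# arithmetic mean of m = 2 independent copies at the matching dimension n/2
M1 = gue(n // 2, sigma**2 / (n // 2))
M2 = gue(n // 2, sigma**2 / (n // 2))
mean_eigs = np.linalg.eigvalsh((M1 + M2) / 2)

# both empirical second moments should be close to sigma^2 / 2
m_minor = np.mean(minor_eigs**2)
m_mean = np.mean(mean_eigs**2)
assert abs(m_minor - 0.5) < 0.05 and abs(m_mean - 0.5) < 0.05
```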

In a similar vein, if one defines the function

\displaystyle  \omega(t,z) := \alpha \int_{\bf R} \frac{zx}{1-zx}\ d\mu_\alpha(x) = e^{-t} (- 1 - z^{-1} s(t, z^{-1}))

and inverts it to obtain a function {z(t,\omega)} with

\displaystyle  \omega(t, z(t,\omega)) = \omega

for all {t, \omega}, then the {S}-transform {S(t,\omega)} is defined by

\displaystyle  S(t,\omega) := \frac{1+\omega}{\omega} z(t,\omega).


Since

\displaystyle  s(t,z) = - z^{-1} ( 1 + e^t \omega(t, z^{-1}) )

for any {t}, {z}, we have

\displaystyle  z_0 + s(0,z_0)^{-1} (1-e^{-t}) = z_0 \frac{\omega(0,z_0^{-1})+e^{-t}}{\omega(0,z_0^{-1})+1}

and so (9) becomes

\displaystyle  - z_0^{-1} \frac{\omega(0,z_0^{-1})+1}{\omega(0,z_0^{-1})+e^{-t}} (1 + e^{t} \omega(t, z_0^{-1} \frac{\omega(0,z_0^{-1})+1}{\omega(0,z_0^{-1})+e^{-t}}))

\displaystyle = - e^t z_0^{-1} (1 + \omega(0, z_0^{-1}))

which simplifies to

\displaystyle  \omega(t, z_0^{-1} \frac{\omega(0,z_0^{-1})+1}{\omega(0,z_0^{-1})+e^{-t}}) = \omega(0, z_0^{-1});

replacing {z_0} by {z(0,\omega)^{-1}} we obtain

\displaystyle  \omega(t, z(0,\omega) \frac{\omega+1}{\omega+e^{-t}}) = \omega

and thus

\displaystyle  z(0,\omega)\frac{\omega+1}{\omega+e^{-t}} = z(t, \omega)

and hence

\displaystyle  S(0, \omega) = \frac{\omega+e^{-t}}{\omega+1} S(t, \omega).

One can compute {\frac{\omega+e^{-t}}{\omega+1}} to be the {S}-transform of the measure {(1-\alpha) \delta_0 + \alpha \delta_1}; from the link between {S}-transforms and free products (see e.g. these notes of Guionnet), we conclude that {(1-\alpha)\delta_0 + \alpha \mu_\alpha} is the free product of {\mu_1} and {(1-\alpha) \delta_0 + \alpha \delta_1}. This is consistent with the random matrix theory interpretation, since {(1-\alpha)\delta_0 + \alpha \mu_\alpha} is also the spectral measure of {PMP}, where {P} is the orthogonal projection to the span of the first {\alpha n} basis elements, so in particular {P} has spectral measure {(1-\alpha) \delta_0 + \alpha \delta_1}. If {M} is unitarily invariant then (by a fundamental result of Voiculescu) it is asymptotically freely independent of {P}, so the spectral measure of {PMP = P^{1/2} M P^{1/2}} is asymptotically the free product of that of {M} and of {P}.

Filed under: expository, math.PR, math.RA, math.SP Tagged: free probability, Gelfand-Tsetlin patterns, Schur complement

September 16, 2017

David Hogg: new parallel-play workshop

Today was the first try at a new group-meeting idea for my group. I invited my NYC close collaborators to my (new) NYU office (which is also right across the hall from Huppenkothen and Leistedt) to work on whatever they are working on. The idea is that we will work in parallel (and independently), but we are all there to answer questions, discuss, debug, and pair-code. It was intimate today, but successful. Megan Bedell (Flatiron) and I debugged a part of her code that infers the telluric absorption spectrum (in a data-driven way, of course). And Elisabeth Andersson (NYU) got kplr and batman installed inside the sandbox that runs her Jupyter notebooks.

David Hogg: latent variable models, weak lensing

The day started with a call with Bernhard Schölkopf (MPI-IS), Hans-Walter Rix (MPIA), and Markus Bonse (Darmstadt) to discuss taking Christina Eilers's (MPIA) problem of modeling spectra with partial labels over to a latent-variable model, probably starting with the GPLVM. We discussed data format and how we might start. There is a lot of work in astronomy using GANs and deep learning to make data generators. These are great, but we are betting it will be easier to put causal structure that we care about into the latent-variable model.

At Cosmology & Data Group Meeting at Flatiron, the whole group discussed the big batch of weak lensing results released by the Dark Energy Survey last month. A lot of the discussion was about understanding the covariances of the likelihood information coming from the weak lensing. This is a bit hard to understand, because everyone uses highly informative priors (for good reasons, of course) from prior data. We also discussed the multiplicative bias and other biases in shape measurement; how might we constrain these independently from the cosmological parameters themselves? Data simulations, of course, but most of us would like to see a measurement to constrain them.

At the end of Cosmology Meeting, Ben Wandelt (Flatiron) and I spent time discussing projects of mutual interest. In particular we discussed dimensionality reduction related to galaxy morphologies and spatially resolved spectroscopy, in part inspired by the weak-lensing discussion, and also the future of Euclid.

September 15, 2017

Terence Tao: Continuous analogues of the Schur and skew Schur polynomials

Fix a non-negative integer {k}. Define a (weak) integer partition of length {k} to be a tuple {\lambda = (\lambda_1,\dots,\lambda_k)} of non-increasing non-negative integers {\lambda_1 \geq \dots \geq \lambda_k \geq 0}. (Here our partitions are “weak” in the sense that we allow some parts of the partition to be zero. Henceforth we will omit the modifier “weak”, as we will not need to consider the more usual notion of “strong” partitions.) To each such partition {\lambda}, one can associate a Young diagram consisting of {k} left-justified rows of boxes, with the {i^{th}} row containing {\lambda_i} boxes. A semi-standard Young tableau (or Young tableau for short) {T} of shape {\lambda} is a filling of these boxes by integers in {\{1,\dots,k\}} that is weakly increasing along rows (moving rightwards) and strictly increasing along columns (moving downwards). The collection of such tableaux will be denoted {{\mathcal T}_\lambda}. The weight {|T|} of a tableau {T} is the tuple {(n_1,\dots,n_k)}, where {n_i} is the number of occurrences of the integer {i} in the tableau. For instance, if {k=3} and {\lambda = (6,4,2)}, an example of a Young tableau of shape {\lambda} would be

\displaystyle  \begin{tabular}{|c|c|c|c|c|c|} \hline 1 & 1 & 1 & 2 & 3 & 3 \\ \cline{1-6} 2 & 2 & 2 &3\\ \cline{1-4} 3 & 3\\ \cline{1-2} \end{tabular}

The weight here would be {|T| = (3,4,5)}.

To each partition {\lambda} one can associate the Schur polynomial {s_\lambda(u_1,\dots,u_k)} on {k} variables {u = (u_1,\dots,u_k)}, which we will define as

\displaystyle  s_\lambda(u) := \sum_{T \in {\mathcal T}_\lambda} u^{|T|}

using the multinomial convention

\displaystyle (u_1,\dots,u_k)^{(n_1,\dots,n_k)} := u_1^{n_1} \dots u_k^{n_k}.

Thus for instance the Young tableau {T} given above would contribute a term {u_1^3 u_2^4 u_3^5} to the Schur polynomial {s_{(6,4,2)}(u_1,u_2,u_3)}. In the case of partitions of the form {(n,0,\dots,0)}, the Schur polynomial {s_{(n,0,\dots,0)}} is just the complete homogeneous symmetric polynomial {h_n} of degree {n} on {k} variables:

\displaystyle  s_{(n,0,\dots,0)}(u_1,\dots,u_k) := \sum_{n_1,\dots,n_k \geq 0: n_1+\dots+n_k = n} u_1^{n_1} \dots u_k^{n_k},

thus for instance

\displaystyle  s_{(3,0)}(u_1,u_2) = u_1^3 + u_1^2 u_2 + u_1 u_2^2 + u_2^3.

Schur polynomials are ubiquitous in the algebraic combinatorics of “type {A} objects” such as the symmetric group {S_k}, the general linear group {GL_k}, or the unitary group {U_k}. For instance, one can view {s_\lambda} as the character of an irreducible polynomial representation of {GL_k({\bf C})} associated with the partition {\lambda}. However, we will not focus on these interpretations of Schur polynomials in this post.

This definition of Schur polynomials allows for a way to describe the polynomials recursively. If {k > 1} and {T} is a Young tableau of shape {\lambda = (\lambda_1,\dots,\lambda_k)}, taking values in {\{1,\dots,k\}}, one can form a sub-tableau {T'} of some shape {\lambda' = (\lambda'_1,\dots,\lambda'_{k-1})} by removing all the appearances of {k} (which, among other things, necessarily deletes the {k^{th}} row). For instance, with {T} as in the previous example, the sub-tableau {T'} would be

\displaystyle  \begin{tabular}{|c|c|c|c|} \hline 1 & 1 & 1 & 2 \\ \cline{1-4} 2 & 2 & 2 \\ \cline{1-3} \end{tabular}

and the reduced partition {\lambda'} in this case is {(4,3)}. As Young tableaux are required to be strictly increasing down columns, we can see that the reduced partition {\lambda'} must intersperse the original partition {\lambda} in the sense that

\displaystyle  \lambda_{i+1} \leq \lambda'_i \leq \lambda_i \ \ \ \ \ (1)

for all {1 \leq i \leq k-1}; we denote this interspersion relation as {\lambda' \prec \lambda} (though we caution that this is not intended to be a partial ordering). In the converse direction, if {\lambda' \prec \lambda} and {T'} is a Young tableau with shape {\lambda'} with entries in {\{1,\dots,k-1\}}, one can form a Young tableau {T} with shape {\lambda} and entries in {\{1,\dots,k\}} by appending to {T'} an entry of {k} in all the boxes that appear in the {\lambda} shape but not the {\lambda'} shape. This one-to-one correspondence leads to the recursion

\displaystyle  s_\lambda(u) = \sum_{\lambda' \prec \lambda} s_{\lambda'}(u') u_k^{|\lambda| - |\lambda'|} \ \ \ \ \ (2)

where {u = (u_1,\dots,u_k)}, {u' = (u_1,\dots,u_{k-1})}, and the size {|\lambda|} of a partition {\lambda = (\lambda_1,\dots,\lambda_k)} is defined as {|\lambda| := \lambda_1 + \dots + \lambda_k}.

One can use this recursion (2) to prove some further standard identities for Schur polynomials, such as the determinant identity

\displaystyle  s_\lambda(u) V(u) = \det( u_i^{\lambda_j+k-j} )_{1 \leq i,j \leq k} \ \ \ \ \ (3)

for {u=(u_1,\dots,u_k)}, where {V(u)} denotes the Vandermonde determinant

\displaystyle  V(u) := \prod_{1 \leq i < j \leq k} (u_i - u_j), \ \ \ \ \ (4)

or the Jacobi-Trudi identity

\displaystyle  s_\lambda(u) = \det( h_{\lambda_j - j + i}(u) )_{1 \leq i,j \leq k}, \ \ \ \ \ (5)

with the convention that {h_d(u) = 0} if {d} is negative. Thus for instance

\displaystyle s_{(1,1,0,\dots,0)}(u) = h_1^2(u) - h_0(u) h_2(u) = \sum_{1 \leq i < j \leq k} u_i u_j.

We review the (standard) derivation of these identities via (2) below the fold. Among other things, these identities show that the Schur polynomials are symmetric, which is not immediately obvious from their definition.
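To make the recursion concrete, here is a short script (a sketch, not from the post) that evaluates {s_\lambda(u)} by iterating (2) over interspersed partitions and checks it against the determinant identity (3):

```python
import numpy as np
from itertools import product

def schur(lam, u):
    # evaluate s_lambda(u) via the branching rule (2): sum over
    # partitions lam' interspersing lam, i.e. lam_{i+1} <= lam'_i <= lam_i
    if not lam:
        return 1.0
    ranges = [range(lam[i + 1], lam[i] + 1) for i in range(len(lam) - 1)]
    return sum(
        schur(lamp, u[:-1]) * u[-1] ** (sum(lam) - sum(lamp))
        for lamp in product(*ranges)
    )

lam, u = (6, 4, 2), (1.0, 2.0, 3.0)
k = len(lam)

# determinant identity (3): s_lambda(u) V(u) = det(u_i^{lam_j + k - j})
det = np.linalg.det(np.array([[ui ** (lam[j] + k - 1 - j) for j in range(k)] for ui in u]))
vand = np.prod([u[i] - u[j] for i in range(k) for j in range(i + 1, k)])
assert np.isclose(schur(lam, u), det / vand)

# the complete homogeneous example s_{(3,0)}(u_1,u_2) from above
assert np.isclose(schur((3, 0), (2.0, 3.0)), 2.0**3 + 2.0**2 * 3 + 2.0 * 3**2 + 3.0**3)
```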

One can also iterate (2) to write

\displaystyle  s_\lambda(u) = \sum_{() = \lambda^0 \prec \lambda^1 \prec \dots \prec \lambda^k = \lambda} \prod_{j=1}^k u_j^{|\lambda^j| - |\lambda^{j-1}|} \ \ \ \ \ (6)

where the sum is over all tuples {\lambda^1,\dots,\lambda^k}, where each {\lambda^j} is a partition of length {j} that intersperses the next partition {\lambda^{j+1}}, with {\lambda^k} set equal to {\lambda}. We will call such a tuple an integral Gelfand-Tsetlin pattern based at {\lambda}.

One can generalise (6) by introducing the skew Schur functions

\displaystyle  s_{\lambda/\mu}(u) := \sum_{\mu = \lambda^i \prec \dots \prec \lambda^k = \lambda} \prod_{j=i+1}^k u_j^{|\lambda^j| - |\lambda^{j-1}|} \ \ \ \ \ (7)

for {u = (u_{i+1},\dots,u_k)}, whenever {\lambda} is a partition of length {k} and {\mu} a partition of length {i} for some {0 \leq i \leq k}, thus the Schur polynomial {s_\lambda} is also the skew Schur polynomial {s_{\lambda /()}} with {i=0}. (One could relabel the variables here to be something like {(u_1,\dots,u_{k-i})} instead, but this labeling seems slightly more natural, particularly in view of identities such as (8) below.)

By construction, we have the decomposition

\displaystyle  s_{\lambda/\nu}(u_{i+1},\dots,u_k) = \sum_\mu s_{\mu/\nu}(u_{i+1},\dots,u_j) s_{\lambda/\mu}(u_{j+1},\dots,u_k) \ \ \ \ \ (8)

whenever {0 \leq i \leq j \leq k}, and {\nu, \mu, \lambda} are partitions of lengths {i,j,k} respectively. This gives another recursive way to understand Schur polynomials and skew Schur polynomials. For instance, one can use it to establish the generalised Jacobi-Trudi identity

\displaystyle  s_{\lambda/\mu}(u) = \det( h_{\lambda_j - j - \mu_i + i}(u) )_{1 \leq i,j \leq k}, \ \ \ \ \ (9)

with the convention that {\mu_i = 0} for {i} larger than the length of {\mu}; we do this below the fold.
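As a sanity check on (9), one can also compare the determinant against a brute-force evaluation of the defining sum (7); for {\lambda = (3,2,1)}, {\mu = (1)} and {(u_2,u_3) = (2,3)} both sides come out to {180}. A throwaway sketch:

```python
from itertools import combinations_with_replacement, product
from math import prod

def h(d, u):
    """Complete homogeneous symmetric polynomial h_d(u); h_d = 0 for d < 0."""
    if d < 0:
        return 0
    return sum(prod(c) for c in combinations_with_replacement(u, d))

def det(m):
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def skew_schur(lam, mu, u):
    """s_{lam/mu}(u) via the chains mu = lam^i < ... < lam^k = lam of (7);
    len(u) must equal len(lam) - len(mu)."""
    if len(lam) == len(mu):
        return 1 if tuple(lam) == tuple(mu) else 0
    ranges = [range(lam[i + 1], lam[i] + 1) for i in range(len(lam) - 1)]
    return sum(skew_schur(lp, mu, u[:-1]) * u[-1] ** (sum(lam) - sum(lp))
               for lp in product(*ranges))

lam, mu, u = (3, 2, 1), (1,), (2, 3)
k = len(lam)
mu_padded = mu + (0,) * (k - len(mu))   # convention: mu_i = 0 beyond its length

direct = skew_schur(lam, mu, u)
jacobi_trudi = det([[h(lam[j] - (j + 1) - mu_padded[i] + (i + 1), u)
                     for j in range(k)] for i in range(k)])
print(direct, jacobi_trudi)  # 180 180
```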

The Schur polynomials (and skew Schur polynomials) are “discretised” (or “quantised”) in the sense that their parameters {\lambda, \mu} are required to be integer-valued, and their definition similarly involves summation over a discrete set. It turns out that there are “continuous” (or “classical”) analogues of these functions, in which the parameters {\lambda,\mu} now take real values rather than integers, and are defined via integration rather than summation. One can view these continuous analogues as a “semiclassical limit” of their discrete counterparts, in a manner that can be made precise using the machinery of geometric quantisation, but we will not do so here.

The continuous analogues can be defined as follows. Define a real partition of length {k} to be a tuple {\lambda = (\lambda_1,\dots,\lambda_k)} where {\lambda_1 \geq \dots \geq \lambda_k \geq 0} are now real numbers. We can define the relation {\lambda' \prec \lambda} of interspersion between a length {k-1} real partition {\lambda' = (\lambda'_1,\dots,\lambda'_{k-1})} and a length {k} real partition {\lambda = (\lambda_1,\dots,\lambda_{k})} precisely as before, by requiring that the inequalities (1) hold for all {1 \leq i \leq k-1}. We can then define the continuous Schur functions {S_\lambda(x)} for {x = (x_1,\dots,x_k) \in {\bf R}^k} recursively by defining

\displaystyle  S_{()}() = 1


\displaystyle  S_\lambda(x) = \int_{\lambda' \prec \lambda} S_{\lambda'}(x') \exp( (|\lambda| - |\lambda'|) x_k )\ d\lambda' \ \ \ \ \ (10)

for {k \geq 1} and {\lambda} of length {k}, where {x' := (x_1,\dots,x_{k-1})} and the integral is with respect to {k-1}-dimensional Lebesgue measure, and {|\lambda| = \lambda_1 + \dots + \lambda_k} as before. Thus for instance

\displaystyle  S_{(\lambda_1)}(x_1) = \exp( \lambda_1 x_1 )


\displaystyle  S_{(\lambda_1,\lambda_2)}(x_1,x_2) = \int_{\lambda_2}^{\lambda_1} \exp( \lambda'_1 x_1 + (\lambda_1+\lambda_2-\lambda'_1) x_2 )\ d\lambda'_1.

More generally, we can define the continuous skew Schur functions {S_{\lambda/\mu}(x)} for {\lambda} of length {k}, {\mu} of length {j \leq k}, and {x = (x_{j+1},\dots,x_k) \in {\bf R}^{k-j}} recursively by defining

\displaystyle  S_{\mu/\mu}() = 1


\displaystyle  S_{\lambda/\mu}(x) = \int_{\lambda' \prec \lambda} S_{\lambda'/\mu}(x') \exp( (|\lambda| - |\lambda'|) x_k )\ d\lambda'

for {k > j}. Thus for instance

\displaystyle  S_{(\lambda_1,\lambda_2,\lambda_3)/(\mu_1,\mu_2)}(x_3) = 1_{\lambda_3 \leq \mu_2 \leq \lambda_2 \leq \mu_1 \leq \lambda_1} \exp( x_3 (\lambda_1+\lambda_2+\lambda_3 - \mu_1 - \mu_2 ))


\displaystyle  S_{(\lambda_1,\lambda_2,\lambda_3)/(\mu_1)}(x_2, x_3) = \int_{\lambda_3 \leq \lambda'_2 \leq \lambda_2, \mu_1} \int_{\mu_1, \lambda_2 \leq \lambda'_1 \leq \lambda_1}

\displaystyle \exp( x_2 (\lambda'_1+\lambda'_2 - \mu_1) + x_3 (\lambda_1+\lambda_2+\lambda_3 - \lambda'_1 - \lambda'_2))\ d\lambda'_1 d\lambda'_2.

By expanding out the recursion, one obtains the analogue

\displaystyle  S_\lambda(x) = \int_{() = \lambda^0 \prec \lambda^1 \prec \dots \prec \lambda^k = \lambda} \exp( \sum_{j=1}^k x_j (|\lambda^j| - |\lambda^{j-1}|))\ d\lambda^1 \dots d\lambda^{k-1},

of (6), and more generally one has

\displaystyle  S_{\lambda/\mu}(x) = \int_{\mu = \lambda^i \prec \dots \prec \lambda^k = \lambda} \exp( \sum_{j=i+1}^k x_j (|\lambda^j| - |\lambda^{j-1}|))\ d\lambda^{i+1} \dots d\lambda^{k-1}.

We will call the tuples {(\lambda^1,\dots,\lambda^k)} in the first integral real Gelfand-Tsetlin patterns based at {\lambda}. The analogue of (8) is then

\displaystyle  S_{\lambda/\nu}(x_{i+1},\dots,x_k) = \int S_{\mu/\nu}(x_{i+1},\dots,x_j) S_{\lambda/\mu}(x_{j+1},\dots,x_k)\ d\mu

where the integral is over all real partitions {\mu} of length {j}, with Lebesgue measure.

By approximating various integrals by their Riemann sums, one can relate the continuous Schur functions to their discrete counterparts by the limiting formula

\displaystyle  N^{-k(k-1)/2} s_{\lfloor N \lambda \rfloor}( \exp[ x/N ] ) \rightarrow S_\lambda(x) \ \ \ \ \ (11)

as {N \rightarrow \infty} for any length {k} real partition {\lambda = (\lambda_1,\dots,\lambda_k)} and any {x = (x_1,\dots,x_k) \in {\bf R}^k}, where

\displaystyle  \lfloor N \lambda \rfloor := ( \lfloor N \lambda_1 \rfloor, \dots, \lfloor N \lambda_k \rfloor )


\displaystyle  \exp[x/N] := (\exp(x_1/N), \dots, \exp(x_k/N)).

More generally, one has

\displaystyle  N^{j(j-1)/2-k(k-1)/2} s_{\lfloor N \lambda \rfloor / \lfloor N \mu \rfloor}( \exp[ x/N ] ) \rightarrow S_{\lambda/\mu}(x)

as {N \rightarrow \infty} for any length {k} real partition {\lambda}, any length {j} real partition {\mu} with {0 \leq j \leq k}, and any {x = (x_{j+1},\dots,x_k) \in {\bf R}^{k-j}}.
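For {k=2} the convergence in (11) is easy to observe numerically: the discrete Schur polynomial is a single geometric sum and the normalising power is just {N^{-1}}. A rough sketch (test values arbitrary; the limit uses the closed-form evaluation of the {k=2} integral above):

```python
import math

def schur2(a, b, u1, u2):
    """s_{(a,b)}(u1, u2) = sum_{j=b}^{a} u1^j u2^{a+b-j}, from the recursion (2)."""
    return sum(u1 ** j * u2 ** (a + b - j) for j in range(b, a + 1))

l1, l2, x1, x2 = 1.7, 0.4, 0.9, -0.3
# S_{(l1,l2)}(x1,x2), evaluating the k=2 integral in closed form:
S = (math.exp(x1 * l1 + x2 * l2) - math.exp(x1 * l2 + x2 * l1)) / (x1 - x2)

for N in (100, 1000, 10000):
    approx = schur2(int(N * l1), int(N * l2),
                    math.exp(x1 / N), math.exp(x2 / N)) / N   # N^{-k(k-1)/2} = 1/N
    print(N, abs(approx - S) / S)   # relative error shrinks like O(1/N)
```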

As a consequence of these limiting formulae, one expects all of the discrete identities above to have continuous counterparts. This is indeed the case; below the fold we shall prove the discrete and continuous identities in parallel. These are not new results by any means, but I was not able to locate a good place in the literature where they are explicitly written down, so I thought I would try to do so here (primarily for my own internal reference, but perhaps the calculations will be worthwhile to some others also).

— 1. Proofs of identities —

We first prove the determinant identity (3), by induction on {k}. The case {k=0} is trivial (one could also use {k=1} as the base case if desired); now suppose {k \geq 1} and the claim has already been proven for {k-1}. Writing {u = (u',u_k)} with {u' = (u_1,\dots,u_{k-1})}, we have from (4) that

\displaystyle  V(u) = V(u') \prod_{i=1}^{k-1} (u_i - u_k) \ \ \ \ \ (12)

so by (2) it will suffice to show that

\displaystyle  \sum_{\lambda' \prec \lambda} \det( u_i^{\lambda'_j+k-1-j} )_{1 \leq i,j \leq k-1} u_k^{|\lambda| - |\lambda'|} \prod_{i=1}^{k-1} (u_i - u_k) = \det( u_i^{\lambda_j+k-j} )_{1 \leq i,j \leq k}.

By continuity we may assume {u_k} is non-zero. Both sides are homogeneous in {u} of degree {|\lambda|+k(k-1)/2}, so without loss of generality we may normalise {u_k=1}, thus we need to show

\displaystyle  \sum_{\lambda' \prec \lambda} \det( u_i^{\lambda'_j+k-1-j} )_{1 \leq i,j \leq k-1} \prod_{i=1}^{k-1} (u_i - 1) = \det( u_i^{\lambda_j+k-j} )_{1 \leq i,j \leq k} \ \ \ \ \ (13)

where the bottom row of the matrix on the right-hand side consists entirely of {1}s.

The sum {\sum_{\lambda' \prec \lambda}} can be factored into {k-1} sums {\sum_{\lambda_{j+1} \leq \lambda'_j \leq \lambda_j}} for {j=1,\dots,k-1}. By the multilinearity of the determinant, the left-hand side of (13) may thus be written as

\displaystyle  \det( (u_i-1) \sum_{\lambda_{j+1} \leq \lambda'_j \leq \lambda_j} u_i^{\lambda'_j+k-1-j} )_{1 \leq i,j \leq k-1}.

This telescopes to

\displaystyle  \det( u_i^{\lambda_j+k-j} - u_i^{\lambda_{j+1}+k-(j+1)} )_{1 \leq i,j \leq k-1}.

By multilinearity, this expands out to an alternating sum of {2^{k-1}} terms; however, all but {k} of these terms vanish because two of their columns are identical. The {k} terms that survive are of the form

\displaystyle  (-1)^{k-a} \det( u_i^{\lambda_j+k-j} )_{1 \leq i \leq k-1; j \in \{1,\dots,k\} \backslash \{a\}}

for {a=1,\dots,k} (where we enumerate {\{1,\dots,k\} \backslash \{a\}} in increasing order); but this sums to {\det( u_i^{\lambda_j+k-j} )_{1 \leq i,j \leq k}} after performing cofactor expansion on the bottom row of the latter determinant. This proves (3).

The continuous analogue of (3) is

\displaystyle  S_\lambda(x) V(x) = \det( \exp( x_i \lambda_j ) )_{1 \leq i,j \leq k}

and can either be proven from (3) and (11), or by mimicking the proof of (3) (replacing sums by integrals). We do the latter, leaving the former as an exercise for the reader. (This identity is also discussed at this MathOverflow question of mine, where it was noted that it essentially appears in this paper of Shatashvili; Apoorva Khare and I also used it in this recent paper.) Again we induct on {k}; the {k=0} case is trivial, so suppose {k \geq 1} and the claim has already been proven for {k-1}. Since

\displaystyle  S_\lambda(x) = \int_{\lambda' \prec \lambda} S_{\lambda'}(x') \exp( x_k (|\lambda| - |\lambda'|) )\ d\lambda'

it will suffice by (10) and (12) to prove that

\displaystyle  \int_{\lambda' \prec \lambda} \det( \exp( x_i \lambda'_j ) )_{1 \leq i,j \leq k-1} \exp( x_k (|\lambda| - |\lambda'|)) \prod_{i=1}^{k-1} (x_i - x_k)\ d\lambda'

\displaystyle = \det( \exp( x_i \lambda_j ) )_{1 \leq i,j \leq k}.

If we shift all of the {x_i} by the same shift {h}, both sides of this identity multiply by {\exp( h |\lambda| )}, so we may normalise {x_k=0}. Our task is now to show that

\displaystyle  \int_{\lambda' \prec \lambda} \det( \exp( x_i \lambda'_j ) )_{1 \leq i,j \leq k-1} \prod_{i=1}^{k-1} x_i\ d\lambda'

\displaystyle  = \det( \exp( x_i \lambda_j ) )_{1 \leq i,j \leq k}, \ \ \ \ \ (14)

where the matrix on the right-hand side has a bottom row consisting entirely of {1}s.

The integral {\int_{\lambda' \prec \lambda}\ d\lambda'} can be factored into {k-1} integrals {\int_{\lambda_{j+1}}^{\lambda_j}\ d\lambda'_j} for {j=1,\dots,k-1}. By the multilinearity of the determinant, the left-hand side of (14) may thus be written as

\displaystyle  \det( x_i \int_{\lambda_{j+1}}^{\lambda_j} \exp( x_i \lambda'_j )\ d\lambda'_j )_{1 \leq i,j \leq k-1}.

By the fundamental theorem of calculus, this evaluates to

\displaystyle  \det( \exp( x_i \lambda_j ) - \exp( x_i \lambda_{j+1} ) )_{1 \leq i,j \leq k-1}.

Again, this expands to {2^{k-1}} terms, all but {k} of which vanish, and the remaining {k} terms form the cofactor expansion of the right-hand side of (14).

Remark 1 Comparing (13) with (14) we obtain a relation between the discrete and continuous Schur functions, namely that

\displaystyle  s_\lambda(\exp[x]) V(\exp[x]) = S_{(\lambda_1+k-1,\dots,\lambda_k)}(x) V(x)

for any integer partition {\lambda} and any {x \in {\bf R}^k}. One can use this identity to obtain an alternate proof of the limiting relation (11).

Now we turn to (5), which can be proven by a similar argument to (3). Again, the base case {k=0} (or {k=1}, if one prefers) is trivial, so suppose {k \geq 1} and the claim has already been proven for {k-1}. By (2) it will suffice to show that

\displaystyle  \sum_{\lambda' \prec \lambda} \det( h_{\lambda'_j - j + i}(u') )_{1 \leq i,j \leq k-1} u_k^{|\lambda|-|\lambda'|} = \det( h_{\lambda_j - j + i}(u) )_{1 \leq i,j \leq k}. \ \ \ \ \ (15)

Both sides are homogeneous of degree {|\lambda|}, so as before we may normalise {u_k=1}. Factoring the left-hand side summation into {k-1} summations and using multilinearity as before, the left-hand side may be written as

\displaystyle  \det( \sum_{\lambda_{j+1} \leq \lambda'_j \leq \lambda_j} h_{\lambda'_j - j + i}(u') )_{1 \leq i,j \leq k-1}.

Now one observes the identities

\displaystyle  h_{\lambda_j - j + i}(u) = \sum_{\lambda'_j \leq \lambda_j} h_{\lambda'_j - j + i}(u')

and similarly

\displaystyle  h_{\lambda_{j+1} - (j+1) + i}(u) = \sum_{\lambda'_j < \lambda_{j+1}} h_{\lambda'_j - j + i}(u')

(where {\lambda'_j} is understood to range over the integers), hence on subtracting

\displaystyle  h_{\lambda_j - j+i}(u) - h_{\lambda_{j+1} - (j+1) + i}(u) = \sum_{\lambda_{j+1} \leq \lambda'_j \leq \lambda_j} h_{\lambda'_j - j + i}(u')

and so the above determinant may be written as

\displaystyle  \det( h_{\lambda_j - j+i}(u) - h_{\lambda_{j+1} - (j+1) + i}(u) )_{1 \leq i,j \leq k-1}.

Again, this expands into {2^{k-1}} terms, all but {k} of which vanish, and which can be collected by cofactor expansion to become the determinant of the {k \times k} matrix whose top {k-1} rows are {(h_{\lambda_j - j+i}(u))_{1 \leq i \leq k-1; 1 \leq j \leq k}}, and whose bottom row {(1)_{1 \leq j \leq k}} consists entirely of {1}s.

Now we use the identity

\displaystyle  1 = \sum_{S \subset \{1,\dots,k-1\}} (-1)^{|S|} h_{d-|S|}(u) \prod_{i \in S} u_i

for any {d \geq 0}. To verify this identity, we observe that the {u^n} coefficient of the right-hand side is equal to

\displaystyle  \sum_{S \subset \{1 \leq i \leq k-1: n_i \neq 0\}} (-1)^{|S|}

if {|n| \leq d}, and zero otherwise; but from the binomial theorem we see that this coefficient is {1} when {n=0} and {0} otherwise, giving the claim. Using this identity with {d = \lambda_j - j + k}, we can write the bottom row {(1)_{1 \leq j \leq k}} as {(h_{\lambda_j-j+k}(u))} plus a linear combination of {(h_{\lambda_j-j+i}(u))} for {i=1,\dots,k-1}, so after some row operations we conclude (15). The generalised Jacobi-Trudi identity (9) is proven similarly (keeping {\mu} fixed, and inducting on the length of {\lambda}); we leave this to the interested reader.

The continuous analogue of the Jacobi-Trudi identity (5) is a little less intuitive. The analogue of the complete homogeneous polynomials

\displaystyle  h_n(u_1,\dots,u_k) = \sum_{n_1+\dots+n_k=n: n_1,\dots,n_k \geq 0} u_1^{n_1} \dots u_k^{n_k}

for {n \geq 0} an integer, will be the functions

\displaystyle  H_t(x_1,\dots,x_k) := \int_{t_1+\dots+t_k=t: t_1,\dots,t_k \geq 0} \exp( t_1 x_1 + \dots + t_k x_k)\ dt_1 \dots dt_{k-1}

for {t \geq 0} a real number. Thus for example {H_t(x_1) = \exp(tx_1)} when {k=1}, while {H_t(x_1,0) = \frac{\exp(tx_1) - 1}{x_1}} when {k=2} and {t \geq 0}. By rescaling one may write

\displaystyle  H_t(x_1,\dots,x_k)

\displaystyle = t^{k-1} \int_{t_1+\dots+t_k=1: t_1,\dots,t_k \geq 0} \exp( t_1 t x_1 + \dots + t_k t x_k)\ dt_1 \dots dt_{k-1},

at which point it is clear that these expressions are smooth in {t} for any {t \geq 0}, so we may form derivatives {H^{(j)}_t(x) = \frac{d^j}{dt^j} H_t(x)} for any non-negative integer {j} and any {t \geq 0}; here our differentiation will always be in the {t} variable rather than the {x} variables. The analogue of (5) is then

\displaystyle  S_\lambda(x) = \det( H^{(i-1)}_{\lambda_j}(x) )_{1 \leq i,j \leq k}, \ \ \ \ \ (16)

thus for instance

\displaystyle  S_{(\lambda_1)}(x_1) = H_{\lambda_1}(x_1)


\displaystyle  S_{(\lambda_1,\lambda_2)}(x_1,x_2) = H_{\lambda_1}(x_1,x_2) H^{(1)}_{\lambda_2}(x_1,x_2) - H_{\lambda_2}(x_1,x_2) H^{(1)}_{\lambda_1}(x_1,x_2)

and so forth.
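For {k=2} everything is available in closed form, so this instance of (16) can be checked directly: from the definition, {H_t(x_1,x_2) = \frac{\exp(tx_1)-\exp(tx_2)}{x_1-x_2}}, while the integral defining {S_{(\lambda_1,\lambda_2)}} evaluates to {\frac{\exp(\lambda_1 x_1 + \lambda_2 x_2) - \exp(\lambda_2 x_1 + \lambda_1 x_2)}{x_1 - x_2}}. A quick numerical check with arbitrary test values:

```python
import math

x1, x2 = 0.9, -0.3
d = x1 - x2

def H(t):
    """H_t(x1, x2) = int_0^t exp(t1*x1 + (t-t1)*x2) dt1, in closed form."""
    return (math.exp(t * x1) - math.exp(t * x2)) / d

def H1(t):
    """d/dt of H_t(x1, x2)."""
    return (x1 * math.exp(t * x1) - x2 * math.exp(t * x2)) / d

l1, l2 = 1.7, 0.4
determinant = H(l1) * H1(l2) - H(l2) * H1(l1)                 # (16) for k=2
schur_value = (math.exp(x1 * l1 + x2 * l2)
               - math.exp(x1 * l2 + x2 * l1)) / d             # S_{(l1,l2)}(x)
print(abs(determinant - schur_value) < 1e-12)  # True
```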

As before, we may prove (16) by induction on {k}. The cases {k=0,1} are easy, so let us suppose {k \geq 2} and that the claim already holds for {k-1} (actually the inductive argument will also work for {k=1} if one pays careful attention to the conventions). By (10), it suffices to show that

\displaystyle  \int_{\lambda' \prec \lambda} \det( H^{(i-1)}_{\lambda'_j}(x') )_{1 \leq i,j \leq k-1} \exp(x_k (|\lambda|-|\lambda'|))\ d\lambda'

\displaystyle  = \det( H^{(i-1)}_{\lambda_j}(x) )_{1 \leq i,j \leq k} \ \ \ \ \ (17)

whenever {x = (x_1,\dots,x_k) \in {\bf R}^k}, {\lambda} is a real partition of length {k}, and {x' := (x_1,\dots,x_{k-1})}. Shifting all the {x_j} by {h} will multiply each {H_t(x)} by {\exp( ht)}, and (after some application of the Leibniz rule and row operations) can be seen to multiply both sides here by {\exp(h|\lambda|)}; thus we may normalise {x_k=0}. We can then factor the integral and use multilinearity of the determinant to write the left-hand side of (17) as

\displaystyle  \det( \int_{\lambda_{j+1}}^{\lambda_j} H^{(i-1)}_{\lambda'_j}(x')\ d\lambda'_j )_{1 \leq i,j \leq k-1}.

From construction we see that

\displaystyle  H_t(x) = \int_0^t H_{t'}(x')\ dt'

for any {t \geq 0}, and hence

\displaystyle  H_{t_2}(x) - H_{t_1}(x) = \int_{t_1}^{t_2} H_t(x')\ dt

for any {t_2 \geq t_1 \geq 0}; actually with the convention that {H_t = 0} for negative {t}, this identity holds for all {t_2 \geq t_1}. Shifting {t_1,t_2,t} by {h} and then differentiating repeatedly at {h=0}, we conclude that

\displaystyle  H^{(i-1)}_{t_2}(x) - H^{(i-1)}_{t_1}(x) = \int_{t_1}^{t_2} H^{(i-1)}_t(x')\ dt

for any natural number {i}. Thus we can rewrite the preceding determinant as

\displaystyle  \det( H^{(i-1)}_{\lambda_j}(x) - H^{(i-1)}_{\lambda_{j+1}}(x) )_{1 \leq i,j \leq k-1}.

Performing the now familiar manoeuvre of expanding out into {2^{k-1}} terms, observing that all but {k} of them vanish, and interpreting the surviving terms as cofactors, this is the determinant of the {k \times k} matrix whose top {k-1} rows are {( H^{(i-1)}_{\lambda_j}(x))_{1 \leq i \leq k-1; 1 \leq j \leq k}}, and whose bottom row is {(1)_{1 \leq j \leq k}}.

Next, we observe from definition that

\displaystyle  H_t(x_1,\dots,x_k) = \int_0^t H_{t'}(x_2,\dots,x_k) \exp( (t-t') x_1 )\ dt'

for any {t \geq 0} and {(x_1,\dots,x_k) \in {\bf R}^k}, and hence by the fundamental theorem of calculus

\displaystyle  (\frac{d}{dt} - x_1) H_t(x_1,\dots,x_k) = H_t(x_2,\dots,x_k).

Iterating this identity we conclude that

\displaystyle  (\frac{d}{dt}-x_{k-1}) \dots (\frac{d}{dt}-x_1) H_t(x_1,\dots,x_k) = H_t(x_k)

and in particular when {x_k=0} we have

\displaystyle  (\frac{d}{dt}-x_{k-1}) \dots (\frac{d}{dt}-x_1) H_t(x) = 1.

Thus we can write {1} as {H^{(k-1)}_t(x)} plus a linear combination of the {H^{(i-1)}_t(x)} for {i=1,\dots,k-1}, where the coefficients are independent of {t}. This allows us to write the bottom row {(1)_{1 \leq j \leq k}} as {(H^{(k-1)}_{\lambda_j}(x))_{1 \leq j \leq k}} plus a linear combination of the {(H^{(i-1)}_{\lambda_j}(x))_{1 \leq j \leq k}} for {i=1,\dots,k}, and (17) follows.

A similar argument gives the more general Jacobi-Trudi identity

\displaystyle  S_{\lambda/\mu}(x) = \det( ( H_{\lambda_j-\mu_i}(x) )_{1 \leq j \leq k; 1 \leq i \leq k'}, (H^{(i-1)}_{\lambda_j}(x))_{1 \leq j \leq k; 1 \leq i \leq k-k'} ),

whenever {\lambda} is a real partition of length {k}, {\mu} is a real partition of length {0 \leq k' \leq k}, {x = (x_{k'+1},\dots,x_k) \in {\bf R}^{k-k'}}, and one adopts the convention that {H_t} (and its first {k-1} derivatives) vanish for {t < 0}. Thus for instance

\displaystyle  S_{(\lambda_1,\lambda_2)/(\mu_1)}(x_2) = \det \begin{pmatrix} H_{\lambda_1-\mu_1}(x_2) & H_{\lambda_1}(x_2) \\ H_{\lambda_2 - \mu_1}(x_2) & H_{\lambda_2}(x_2) \end{pmatrix},

\displaystyle  S_{(\lambda_1,\lambda_2,\lambda_3)/(\mu_1)}(x_2,x_3) = \det \begin{pmatrix} H_{\lambda_1-\mu_1}(x_2,x_3) & H_{\lambda_1}(x_2,x_3) & H^{(1)}_{\lambda_1}(x_2,x_3) \\ H_{\lambda_2 - \mu_1}(x_2,x_3) & H_{\lambda_2}(x_2,x_3) & H^{(1)}_{\lambda_2}(x_2,x_3) \\ H_{\lambda_3 - \mu_1}(x_2,x_3) & H_{\lambda_3}(x_2,x_3) & H^{(1)}_{\lambda_3}(x_2,x_3) \end{pmatrix},

and so forth.

Exercise 2 If {\lambda,\mu} are real partitions of length {k} with positive entries, and {k' \geq k}, show that

\displaystyle  \det( H_{\lambda_i-\mu_j}(x))_{1 \leq i,j \leq k} = \lim_{\nu \rightarrow 0} \frac{1}{V(\nu)} S_{(\lambda,\nu)/\mu}(x)

for any {x \in {\bf R}^{k'-k}}, where {\nu} ranges over real partitions of length {k'-k} with distinct entries, and {(\lambda,\nu)} is the length {k'} partition formed by concatenating {\lambda} and {\nu} (this will also be a partition if {\nu} is sufficiently small).

(Sep 14: updated with several suggestions and corrections supplied by Darij Grinberg.)

Filed under: expository, math.CO, math.RA Tagged: determinants, Schur polynomials, skew-Schur functions

Doug NatelsonDOE experimental condensed matter physics PI meeting, day 3

And from the last half-day of the meeting:

  • Because the mobile electrons in graphene have an energy-momentum relationship similar to that of relativistic particles, the physics of electrons bound to atomic-scale defects in graphene has much in common with the physics that sets the limits on the stability of heavy atoms - when the kinetic energy of the electrons in the innermost orbitals is high enough that relativistic effects become very important.  It is possible to examine single defect sites with a scanning tunneling microscope and look at the energies of bound states, and see this kind of physics in 2d.  
  • There is a ton of activity concentrating on realizing Majorana fermions, expected to show up in the solid state when topologically interesting "edge states" are coupled to superconducting leads.  One way to do this would be to use the edge states of the quantum Hall effect, but usually the magnetic fields required to get in the quantum Hall regime don't play well with superconductivity.  Graphene can provide a way around this, with amorphous MoRe acting as very efficient superconducting contact material.  The results are some rather spectacular and complex superconducting devices (here and here).
  • With an excellent transmission electron microscope, it's possible to carve out atomically well defined holes in boron nitride monolayers, and then use those to create confined potential wells for carriers in graphene.  Words don't do justice to the fabrication process - it's amazing.  See here and here.
  • It's possible to induce and see big collective motions of a whole array of molecules on a surface that each act like little rotors.
  • In part due to the peculiar band structure of some topologically interesting materials, they can have truly remarkable nonlinear optical properties.
My apologies for not including everything - side discussions made it tough to take notes on everything, and the selection in these postings is set by that and not any judgment of excitement.  Likewise, the posters at the meeting were very informative, but I did not take notes on those.

September 14, 2017

David HoggGaia, asteroseismology, robots

In our panic about upcoming Gaia DR2, Adrian Price-Whelan and I have established a weekly workshop on Wednesdays, in which we discuss, hack, and parallel-work on Gaia projects in the library at the Flatiron CCA. In our first meeting we just said what we wanted to do, jointly edited a big shared google doc, and then started working. At each workshop meeting, we will spend some time talking and some time working. My plan is to do data-driven photometric parallaxes, and maybe infer some dust.

At the Stars Group Meeting, Stephen Feeney (Flatiron) talked about asteroseismology, where we are trying to get the seismic parameters without ever taking a Fourier Transform. Some of the crowd (Cantiello in particular) suggested that we have started on stars that are too hard; we should choose super-easy, super-bright, super-standard stars to start. Others in the crowd (Hawkins in particular) pointed out that we could be using asteroseismic H-R diagram priors on our inference. Why not be physically motivated? Duh.

At the end of Group Meeting, Kevin Schawinski (ETH) said a few words about auto-encoders. We discussed imposing more causal structure on them, and seeing what happens. He is going down this path. We also veered off into networks-of-autonomous-robots territory for LSST follow-up, keying off remarks from Or Graur (CfA) about time-domain and spectroscopic surveys. Building robots that know about scientific costs and utility is an incredibly promising direction, but hard.

David Hoggstatistics of power spectra

Daniela Huppenkothen (NYU) came to talk about power spectra and cross-spectra today. The idea of the cross-spectrum is that you multiply one signal's Fourier transform against the complex conjugate of the other's. If the signals are identical, this is the power spectrum. If they differ by phase lags, the answer has an imaginary part, and so on. We then launched into a long conversation about the distribution of cross-spectrum components given distributions for the original signals. In the simplest case, this is about distributions of sums of products of Gaussian-distributed variables, where analytic results are rare. And that's the simplest case!
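As a toy illustration of the phase-lag statement (the lag, seed, and frequency bin below are arbitrary choices of mine): delaying a signal puts a linear phase ramp into the cross-spectrum, while the power spectrum stays purely real.

```python
import cmath, random

random.seed(7)
n = 256
x = [random.gauss(0, 1) for _ in range(n)]
y = x[-3:] + x[:-3]                      # the same signal, delayed by 3 samples

def dft_bin(sig, k):
    """One bin of the discrete Fourier transform."""
    return sum(sig[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))

k = 5
X, Y = dft_bin(x, k), dft_bin(y, k)
power = X * X.conjugate()                # power spectrum: purely real
cross = X * Y.conjugate()                # cross-spectrum: complex
print(power.imag)                                    # 0.0
print(cmath.phase(cross), 2 * cmath.pi * 3 * k / n)  # agree: the lag is a phase
```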

One paradox or oddity that we discussed is the following: In a long time series, imagine that every time point gets a value (flux value, say) that is drawn from a very skew or very non-Gaussian distribution. Now take the Fourier transform. By central-limit reasoning, all the Fourier amplitudes must be very close to Gaussian-distributed! Where did the non-Gaussianity go? After all, the FT is simply a rotation in data space. I think it probably all went into the correlations of the Fourier amplitudes, but how to see that? These are old ideas that are well understood in signal processing, I am sure, but not by me!
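One can watch this happen in a small simulation (all parameters arbitrary): draw heavily skewed noise, compute a band of Fourier coefficients with a naive DFT, and compare sample skewnesses.

```python
import math, random

random.seed(42)
n = 1024
x = [random.expovariate(1.0) for _ in range(n)]     # very skewed marginal

def skew(a):
    """Sample skewness: third central moment over variance^(3/2)."""
    m = sum(a) / len(a)
    c = [v - m for v in a]
    m2 = sum(v * v for v in c) / len(c)
    return sum(v ** 3 for v in c) / len(c) / m2 ** 1.5

# real parts of a band of Fourier coefficients (naive DFT; fine for small n)
coeffs = [sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
          for k in range(1, 201)]

print(skew(x))       # close to 2, the skewness of the exponential distribution
print(skew(coeffs))  # close to 0: each coefficient is a sum of ~n terms
```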

September 13, 2017

Doug NatelsonDOE experimental condensed matter PI meeting, day 2

More things I learned:

  • I've talked about skyrmions before.  It turns out that by coupling a ferromagnet to a strong spin-orbit coupling metal, one can stabilize skyrmions at room temperature.  They can be visualized using magnetic transmission x-ray microscopy - focused, circularly polarized x-ray studies.   The skyrmion motion can show its own form of the Hall effect.  Moreover, it is possible to create structures where skyrmions can be created one at a time on demand, and moved back and forth in a strip of that material - analogous to a racetrack memory.
  • Patterned arrays of little magnetic islands continue to be a playground for looking at analogs of complicated magnetic systems.  They're a kind of magnetic metamaterial.  See here.  It's possible to build in frustration, and to look at how topologically protected magnetic excitations (rather like skyrmions) stick around and can't relax.
  • Topological insulator materials, with their large spin-orbit effects and surface spin-momentum locking, can be used to pump spin and flip magnets.  However, the electronic structure of both the magnet and the TI are changed when one is deposited on the other, due in part to interfacial charge transfer.
  • There continues to be remarkable progress on the growth and understanding of complex oxide heterostructures and interfaces - too many examples and things to describe.
  • The use of nonlinear optics to reveal complicated internal symmetries (talked about here) continues to be very cool.
  • Antiferromagnetic layers can be surprisingly good at passing spin currents.  Also, I want to start working on yttrium iron garnet, so that I can use this at some point in a talk.
  • It's possible to do some impressive manipulation of the valley degree of freedom in 2d transition metal dichalcogenides, creating blobs of complete valley polarization, for example.  It's possible to use an electric field to break inversion symmetry in bilayers and turn some of these effects on and off electrically.
  • The halide perovskites actually can make fantastic nanocrystals in terms of optical properties and homogeneity.

John BaezComplex Adaptive Systems (Part 5)

When we design a complex system, we often start with a rough outline and fill in details later, one piece at a time. And if the system is supposed to be adaptive, these details may need to be changed as the system is actually being used!

The use of operads should make this easier. One reason is that an operad typically has more than one algebra.

Remember from Part 3: an operad has operations, which are abstract ways of sticking things together. An algebra makes these operations concrete: it specifies some sets of actual things, and how the operations in the operad get implemented as actual ways to stick these things together.

So, an operad O can have one algebra in which things are described in a bare-bones, simplified way, and another algebra in which things are described in more detail. Indeed it will typically have many algebras, corresponding to many levels of detail, but let’s just think about two for a minute.

When we have a ‘less detailed’ algebra A and a ‘more detailed’ algebra A', they will typically be related by a map

f : A' \to A

which ‘forgets the extra details’. This map should be a ‘homomorphism’ of algebras, but I’ll postpone the definition of that concept.

What we often want to do, when designing a system, is not forget extra detail, but rather add extra detail to some rough specification. There is not always a systematic way to do this. If there is, then we may have a homomorphism

g : A \to A'

going back the other way. This is wonderful, because it lets us automate the process of filling in the details. But we can’t always count on being able to do this—especially not if we want an optimal or even acceptable result. So, often we may have to start with an element of A and search for elements of A' that are mapped to it by f : A' \to A.

Let me give some examples. I’ll take the operad that I described last time, and describe some of its algebras, and homomorphisms between these.

I’ll start with an algebra that has very little detail: its elements will be simple graphs. As the name suggests, these are among the simplest possible ways of thinking about networks. They just look like this:

Then I’ll give an algebra with more detail, where the vertices of our simple graphs are points in the plane. There’s nothing special about the plane: we could replace the plane by any other set, and get another algebra of our operad. For example, we could use the set of points on the surface of the Caribbean Sea, the blue stuff in the rectangle here:

That’s what we might use in a search and rescue operation. The points could represent boats, and the edges could represent communication channels.

Then I’ll give an algebra with even more detail, where two points connected by an edge can’t be too far apart. This would be good for range-limited communication channels.

Then I’ll give an algebra with still more detail, where the locations of the points are functions of time. Now our boats are moving around!

Okay, here we go.

The operad from last time was called O_G. Here G is the network model of simple graphs. The best way to picture an operation of O_G is as a way of sticking together a list of simple graphs to get a new simple graph.

For example, an operation

f \in O_G(3,4,2;9)

is a way of sticking together a simple graph with 3 vertices, one with 4 vertices and one with 2 vertices to get one with 9 vertices. Here’s a picture of such an operation:

Note that this operation is itself a simple graph. An operation in O_G(3,4,2;9) is just a simple graph with 9 vertices, where we have labelled the vertices from 1 to 9.

This operad comes with a very obvious algebra A where the operations do just what I suggested. In this algebra, an element of A(t) is a simple graph with t vertices, listed in order. Here t is any natural number, which I’m calling ‘t’ for ‘type’.

We also need to say how the operations in O_G act on these sets A(t). If we take simple graphs in A(3), A(4), and A(2):

we can use our operation f to stick them together and get this:
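To make this concrete, here is a small Python sketch of the algebra A. The encoding is my own, purely for illustration: a simple graph on vertices 0, ..., t-1 is a set of two-element frozensets, and an operation in O_G(t_1, ..., t_k; t) is itself such an edge set on the combined vertex set.

```python
def act(op_edges, graphs):
    """Glue a list of simple graphs together using an operation.

    graphs is a list of (t_i, edge_set) pairs; each graph's vertices are
    shifted into the combined numbering, and then the operation's own
    edges are overlaid on top.
    """
    offset, result = 0, set()
    for t, edges in graphs:
        result |= {frozenset({u + offset, v + offset})
                   for u, v in map(tuple, edges)}
        offset += t
    return result | {frozenset(e) for e in op_edges}

# An operation in O_G(2, 2; 4) that adds one edge between the two middle
# vertices, applied to two copies of the one-edge graph on two vertices:
glued = act([{1, 2}], [(2, [{0, 1}]), (2, [{0, 1}])])
# glued == {frozenset({0, 1}), frozenset({2, 3}), frozenset({1, 2})}
```

The operation's own edges are simply overlaid on the shifted inputs, which is exactly what "sticking graphs together" means here.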

But we can also make up a more interesting algebra of O_G. Let’s call this algebra A'. We’ll let an element of A'(t) be a simple graph whose t vertices, listed in order, are points in the plane.

My previous pictures can be reused to show how operations in O_G act on this new algebra A'. The only difference is that now we treat the vertices literally as points in the plane! Before, you should have been imagining them as abstract points not living anywhere; now they have locations.

Now let’s make up an even more detailed algebra A''.

What if our communication channels are ‘range-limited’? For example, what if two boats can’t communicate if they are more than 100 kilometers apart?

Then we can let an element of A''(t) be a simple graph with t vertices in the plane such that no two vertices connected by an edge have distance > 100.

Now the operations of our operad O_G act in a more interesting way. If we have an operation, and we apply it to elements of our algebra, it ‘tries’ to put in new edges as it did before, but it ‘fails’ for any edge that would have length > 100. In other words, we just leave out any edges that would be too long.
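Here is a hypothetical sketch of this range-limited action, again in my own encoding: an element of A''(t) is a list of t points in the plane together with an edge set, and any edge the operation tries to create between points that are too far apart is simply dropped.

```python
import math

def act_limited(op_edges, graphs, max_dist=100.0):
    # graphs is a list of (points, edge_set) pairs; shift each graph's
    # vertices into the combined numbering, then try the operation's edges.
    points, edges, offset = [], set(), 0
    for pts, es in graphs:
        edges |= {frozenset({u + offset, v + offset})
                  for u, v in map(tuple, es)}
        points += list(pts)
        offset += len(pts)
    for e in op_edges:
        u, v = tuple(e)
        if math.dist(points[u], points[v]) <= max_dist:
            edges.add(frozenset({u, v}))
        # else: the edge silently fails -- no crash, just no channel
    return points, edges

# Three boats; the operation tries to connect boat 0 to boats 1 and 2.
pts, es = act_limited(
    [{0, 1}, {0, 2}],
    [([(0.0, 0.0)], []), ([(50.0, 0.0)], []), ([(500.0, 0.0)], [])])
# The 50 km edge is established; the 500 km edge is dropped.
```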

It took me a while to figure this out. At first I thought the result of the operation would need to be undefined whenever we tried to create an edge that violated the length constraint. But in fact it acts in a perfectly well-defined way: we just don’t put in edges that would be too long!

This is good. This means that if you tell two boats to set up a communication channel, and they’re too far apart, you don’t get the ‘blue screen of death’: your setup doesn’t crash and burn. Instead, you just get a polite warning—‘communication channel not established’—and you can proceed.

The nontrivial part is to check that if we do this, we really get an algebra of our operad! There are some laws that must hold in any algebra. But since I haven’t yet described those laws, I won’t check them here. You’ll have to wait for our paper to come out.

Let’s do one more algebra today. For lack of creativity I’ll call it A'''. Now an element of A'''(t) is a time-dependent graph in the plane with t vertices, listed in order. Namely, the positions of the vertices depend on time, and the presence or absence of an edge between two vertices can also depend on time. Furthermore, let’s impose the requirement that any two vertices can only be connected by an edge at times when their distance is ≤ 100.

When I say ‘functions of time’ here, what do I mean by ‘time’? We can model time by some interval [T_1, T_2]. But if you don’t like that, you can change it.

This algebra A''' works more or less like A''. The operations of O_G try to create edges, but these edges only ‘take’ at times when the vertices they connect have distance ≤ 100.

There’s something here you might not like. Our operations can only try to create edges ‘for all times’… and succeed at times when the vertices are close enough. We can’t try to set up a communication channel for a limited amount of time.

But fear not: this is just a limitation in our chosen network model, ‘simple graphs’. With a fancier network model, we’d get a fancier operad, with fancier operations. Right now I’m trying to keep the operad simple (pun not intended), and show you a variety of different algebras.

As you might expect, we have algebra homomorphisms going from more detailed algebras to less detailed ones:

f_T : A''' \to A'', \quad h : A' \to A

The homomorphism h takes a simple graph in the plane and forgets the location of its vertices. The homomorphism f_T depends on a choice of time T \in [T_1, T_2]. For any time T, it takes a time-dependent graph in the plane and evaluates it at that time, getting a graph in the plane (which obeys the distance constraints, since the time-dependent graph obeyed those constraints at any time).

We do not have a homomorphism g: A'' \to A' that takes a simple graph in the plane obeying our distance constraints and forgets about those constraints. There’s a map g sending elements of A'' to elements of A' in this way. But it’s not an algebra homomorphism! The problem is that first trying to connect two graphs with an edge and then applying g may give a different result than first applying g and then connecting two graphs with an edge.
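To see that failure concretely, here is a toy calculation (my own encoding, not the paper's): two boats 200 km apart, and an operation that tries to connect them with a single edge.

```python
import math

points = [(0.0, 0.0), (200.0, 0.0)]
op_edge = frozenset({0, 1})

# Acting in A'' first: the edge fails to be created, since 200 > 100;
# applying g afterwards just forgets the (now vacuous) constraint.
edges_after_g = set() if math.dist(*points) > 100 else {op_edge}

# Applying g first lands us in A', where the operation then succeeds
# unconditionally.
edges_g_first = {op_edge}

# The two orders disagree, so g is not an algebra homomorphism.
print(edges_after_g == edges_g_first)   # prints False
```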

In short: a single operad has many algebras, which we can use to describe our desired system at different levels of detail. Algebra homomorphisms relate these different levels of detail.

Next time I’ll look at some more interesting algebras of the same operad. For example, there’s one that describes a system of interacting mobile agents, which move around in some specific way, determined by their location and the locations of the agents they’re communicating with.

Even this is just the tip of the iceberg—that is, still a rather low level of detail. We can also introduce stochasticity (that is, randomness). And to go even further, we could switch to a more sophisticated operad, based on a fancier ‘network model’.

But not today.

BackreactionAway Note

I'm in Switzerland this week, for a conference on "Thinking about Space and Time: 100 Years of Applying and Interpreting General Relativity." I am also behind with several things and blogging will remain slow for the next weeks. If you miss my writing all too much, here is a new paper.

Richard EastherSet The Controls For The Heart of Saturn


It has been a bitter-sweet month for solar system explorers.

As a teenager and a space-geek, I had a poster of this iconic montage of Saturn's moons, composed from images taken by the two Voyager probes as they rushed past the ringed planet.

Time passes. In August, the Voyager mission celebrated its 40th anniversary; the twin spacecraft are heading into interstellar space but still regularly dispatch data to Earth, their signals growing fainter and fainter as they travel further and further from home. With luck they will survive another decade, but they must eventually fall silent and will journey mutely to the stars.

This week, on September 15, the Cassini spacecraft – the most recent visitor human beings have sent to Saturn – will meet a far more emphatic demise. Launched 20 years ago, Cassini arrived at Saturn in 2004 and took up orbit around the giant planet, making hundreds of loops through its retinue of moons and skimming the iconic rings. It's about to run out of the propellant needed for manoeuvring; theoretically it might circle Saturn forever, but we could no longer steer it.

So, as I write this, Cassini has climbed away from Saturn for the final time to make a flyby of Titan, Saturn's largest moon, which has nudged it onto a collision course with the giant planet.

Cassini's spectacular finale is, in part, a tribute to its success. All spacecraft carry stowaways: bacterial spores which can, remarkably, remain viable in the vacuum of space. And one of Cassini's many discoveries was that three of Saturn's moons appear to have oceans of water beneath their solid crusts. Leaving the spacecraft to wander unguided around the Saturnian system would risk a collision with one of these moons. So, rather than let its microbial hitchhikers disembark onto a pristine world, Cassini will not "go gentle into that good night" as its propellant runs low but has been steered towards a fiery demise.

Geysers of warm water from the sub-surface oceans of Enceladus.

For cosmologists like myself, the contents of our own solar system (and even our own galaxy, much of the time) are things to look past rather than to look at. The complex worlds that circle our sun can seem to be a motley collection of adorable oddballs when compared to the deep simplicity of the universe itself,  like Shakespearean comic turns in the foreground with the serious actors behind.

If cosmologists were architects, I suspect most of us would be austere modernists, whereas planetary scientists might prefer rococo delights adorned with complicated facades and gratuitous flourishes.

Saturn's moon Mimas, with the impact crater Herschel. 

Saturn is a prime example: beyond its gaudy rings and complex cloud tops, it boasts an astonishingly diverse collection of moons. These include Titan, the only known moon with its own atmosphere, to which Cassini dispatched the Huygens lander; and Mimas, a battered icy world sporting a giant impact crater that makes it look uncannily like the Death Star. ("That's no moon, it's a space station" – and yet this, indeed, is a moon.)

NASA has an undoubted ability to sell a story, and it has been making the most of the anthropomorphic appeal of this brave little $3 billion, 5 ton, plutonium-powered spacecraft on its two-decade mission. But the hype is not misplaced: Saturn has a key place in the evolving human understanding of the cosmos. "Childlike wonder" is both a cliché and the literal truth when we speak of space. The most distant planet easily visible to the naked eye, Saturn once marked the apparent edge of our solar system. Its rings are visible through even the smallest of telescopes, and seeing them this way still takes my breath away. Cassini has shown us Saturn with its rings and its moons up close and personal, with astonishing clarity and precision. The spacecraft and the team of scientists responsible for it have written themselves into the history books.

Beyond revealing the universe to us, space exploration exposes our own  small place in the big picture. Saturn's rings, backlit by the distant sun, dominate what may be the most haunting image returned by Cassini. After taking in the planet's dark bulk and golden rings, our eye drifts to the pale blue dot in the lower right hand corner of the frame, and the image's full weight is revealed: it is a lovely, lonely, long-distance portrait of the Earth and of humanity itself.

The earth, as seen from Saturn. 

CODA: All images courtesy of NASA, JPL and/or the ESA. The header image is from a visualisation of Cassini's final plunge into Saturn's atmosphere. 

As a physicist, I am always stunned by the complex orbital dynamics of Saturn's moons, the chaotically braided and filigreed rings, and the planet's astonishing atmospheric dynamics. Truly, there is something here for everyone.

No introduction necessary
A hexagon storm at Saturn's north pole. Fluid dynamics at its best.
Braided structure in the rings.
Hyperion
Enceladus
Titan
The surface of Titan, from the Huygens lander.

September 12, 2017

Tommaso DorigoCMS Reports Evidence For Higgs Decays To B-Quark Pairs

Another chapter in the saga of the search for the elusive, but dominant, decay mode of the Higgs boson was reported by the CMS collaboration last month. This is one of those specific sub-fields of research where fierce competition arises over the answer to a relatively minor scientific question. That the Higgs boson couples to b-quarks is already well demonstrated indirectly by a number of other measurements - its coupling to (third generation) quarks being demonstrated by its production rate, for example. Yet, being the first ones to "observe" the H->bb decay is a coveted goal.

read more

September 11, 2017

Richard EastherPop Science

This week I acquired a copy of Steven Weinberg's 1977 book The First Three Minutes, courtesy of an emeritus colleague downsizing his library. It was the first detailed popularisation of the Big Bang and is a pop sci classic, written by one of the leading theoretical physicists of the modern era.

An absolute classic, even if this copy has seen better days. 

As you can guess from the title, The First Three Minutes tells the story of the moments following the Big Bang. The early universe sets the stage for the development of the cosmos we see around us now, and the Cosmic Microwave Background is a key link between the distant past and the present day. Discovered just a dozen years before the book appeared in 1977, the microwave background is a time capsule buried moments after the Big Bang, and Weinberg explains how it reveals the nature of the infant universe.

And, as it happens, the latest addition to my library fell open to reveal these words:

Page 77

This text is almost a time capsule on its own. In 1992 CoBE made headlines by providing a map of the microwave background sensitive enough to reveal minute variations in the temperature of different regions of the sky. In 2006, Mather shared the Nobel prize for his work on CoBE. Meanwhile, Rai Weiss moved on from CoBE to become a founder of LIGO which earned its own place in history by successfully detecting gravitational waves in 2015.

Bon voyage indeed. 

CODA: And it goes without saying that Weiss is an odds-on favourite to get the call from Stockholm a few weeks from now when this year's prizes are announced.

IMAGE: The header image shows the temperature differences across the sky, as measured by the CoBE satellite. The temperature range corresponds to changes of a few parts in 100,000. 

John BaezA Compositional Framework for Reaction Networks

For a long time Blake Pollard and I have been working on ‘open’ chemical reaction networks: that is, networks of chemical reactions where some chemicals can flow in from an outside source, or flow out. The picture to keep in mind is something like this:

where the yellow circles are different kinds of chemicals and the aqua boxes are different reactions. The purple dots in the sets X and Y are ‘inputs’ and ‘outputs’, where certain kinds of chemicals can flow in or out.

Here’s our paper on this stuff:

• John Baez and Blake Pollard, A compositional framework for reaction networks, Reviews in Mathematical Physics 29, 1750028.

Blake and I gave talks about this stuff in Luxembourg this June, at a nice conference called Dynamics, thermodynamics and information processing in chemical networks. So, if you’re the sort who prefers talk slides to big scary papers, you can look at those:

• John Baez, The mathematics of open reaction networks.

• Blake Pollard, Black-boxing open reaction networks.

But I want to say here what we do in our paper, because it’s pretty cool, and it took a few years to figure it out. To get things to work, we needed my student Brendan Fong to invent the right category-theoretic formalism: ‘decorated cospans’. But we also had to figure out the right way to think about open dynamical systems!

In the end, we figured out how to first ‘gray-box’ an open reaction network, converting it into an open dynamical system, and then ‘black-box’ it, obtaining the relation between input and output flows and concentrations that holds in steady state. The first step extracts the dynamical behavior of an open reaction network; the second extracts its static behavior. And both these steps are functors!

Lawvere had the idea that the process of assigning ‘meaning’ to expressions could be seen as a functor. This idea has caught on in theoretical computer science: it’s called ‘functorial semantics’. So, what we’re doing here is applying functorial semantics to chemistry.

Now Blake has passed his thesis defense based on this work, and he just needs to polish up his thesis a little before submitting it. This summer he’s doing an internship at the Princeton branch of the engineering firm Siemens. He’s working with Arquimedes Canedo on ‘knowledge representation’.

But I’m still eager to dig deeper into open reaction networks. They’re a small but nontrivial step toward my dream of a mathematics of living systems. My working hypothesis is that living systems seem ‘messy’ to physicists because they operate at a higher level of abstraction. That’s what I’m trying to explore.

Here’s the idea of our paper.

The idea

Reaction networks are a very general framework for describing processes where entities interact and transform into other entities. While they first showed up in chemistry, and are often called ‘chemical reaction networks’, they have lots of other applications. For example, a basic model of infectious disease, the ‘SIRS model’, is described by this reaction network:

S + I \stackrel{\iota}{\longrightarrow} 2 I  \qquad  I \stackrel{\rho}{\longrightarrow} R \stackrel{\lambda}{\longrightarrow} S

We see here three types of entity, called species:

S: susceptible,
I: infected,
R: resistant.

We also have three `reactions’:

\iota : S + I \to 2 I: infection, in which a susceptible individual meets an infected one and becomes infected;
\rho : I \to R: recovery, in which an infected individual gains resistance to the disease;
\lambda : R \to S: loss of resistance, in which a resistant individual becomes susceptible.

In general, a reaction network involves a finite set of species, but reactions go between complexes, which are finite linear combinations of these species with natural number coefficients. The reaction network is a directed graph whose vertices are certain complexes and whose edges are called reactions.

If we attach a positive real number called a rate constant to each reaction, a reaction network determines a system of differential equations saying how the concentrations of the species change over time. This system of equations is usually called the rate equation. In the example I just gave, the rate equation is

\begin{array}{ccl} \displaystyle{\frac{d S}{d t}} &=& r_\lambda R - r_\iota S I \\ \\ \displaystyle{\frac{d I}{d t}} &=&  r_\iota S I - r_\rho I \\  \\ \displaystyle{\frac{d R}{d t}} &=& r_\rho I - r_\lambda R \end{array}

Here r_\iota, r_\rho and r_\lambda are the rate constants for the three reactions, and S, I, R now stand for the concentrations of the three species, which are treated in a continuum approximation as smooth functions of time:

S, I, R: \mathbb{R} \to [0,\infty)

The rate equation can be derived from the law of mass action, which says that any reaction occurs at a rate equal to its rate constant times the product of the concentrations of the species entering it as inputs.
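As a quick sanity check, the SIRS rate equation can be integrated numerically. Here is a minimal Euler-method sketch, with made-up rate constants and initial concentrations (nothing below comes from the paper):

```python
def sirs_step(S, I, R, r_i, r_r, r_l, dt):
    # Right-hand sides of the SIRS rate equation, via mass action.
    dS = r_l * R - r_i * S * I
    dI = r_i * S * I - r_r * I
    dR = r_r * I - r_l * R
    return S + dS * dt, I + dI * dt, R + dR * dt

S, I, R = 0.99, 0.01, 0.0
for _ in range(10_000):            # integrate up to t = 100
    S, I, R = sirs_step(S, I, R, r_i=0.5, r_r=0.1, r_l=0.05, dt=0.01)

# The reactions only move individuals between species, so the total
# concentration S + I + R is conserved: the three right-hand sides
# sum to zero.
print(round(S + I + R, 6))   # prints 1.0
```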

But a reaction network is more than just a stepping-stone to its rate equation! Interesting qualitative properties of the rate equation, like the existence and uniqueness of steady state solutions, can often be determined just by looking at the reaction network, regardless of the rate constants. Results in this direction began with Feinberg and Horn’s work in the 1960’s, leading to the Deficiency Zero and Deficiency One Theorems, and more recently to Craciun’s proof of the Global Attractor Conjecture.

In our paper, Blake and I present a ‘compositional framework’ for reaction networks. In other words, we describe rules for building up reaction networks from smaller pieces, in such a way that the rate equation of the whole can be figured out from those of the pieces. But this framework requires that we view reaction networks in a somewhat different way, as ‘Petri nets’.

Petri nets were invented by Carl Petri in 1939, when he was just a teenager, for the purposes of chemistry. Much later, they became popular in theoretical computer science, biology and other fields. A Petri net is a bipartite directed graph: vertices of one kind represent species, vertices of the other kind represent reactions. The edges into a reaction specify which species are inputs to that reaction, while the edges out specify its outputs.

You can easily turn a reaction network into a Petri net and vice versa. For example, the reaction network above translates into this Petri net:

Beware: there are a lot of different names for the same thing, since the terminology comes from several communities. In the Petri net literature, species are called places and reactions are called transitions. In fact, Petri nets are sometimes called ‘place-transition nets’ or ‘P/T nets’. On the other hand, chemists call them ‘species-reaction graphs’ or ‘SR-graphs’. And when each reaction of a Petri net has a rate constant attached to it, it is often called a ‘stochastic Petri net’.

While some qualitative properties of a rate equation can be read off from a reaction network, others are more easily read from the corresponding Petri net. For example, properties of a Petri net can be used to determine whether its rate equation can have multiple steady states.
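The mass-action recipe—each reaction runs at a speed equal to its rate constant times the product of its input concentrations—is mechanical enough to sketch in code. Here is a hypothetical encoding (mine, purely for illustration) of a Petri net with rate constants as a list of (rate, inputs, outputs) triples, with the SIRS net as the example:

```python
def rate_rhs(net, conc):
    # Right-hand side of the rate equation, from the law of mass action.
    rhs = {s: 0.0 for s in conc}
    for rate, inputs, outputs in net:
        speed = rate
        for s, m in inputs.items():
            speed *= conc[s] ** m      # product of input concentrations
        for s, m in inputs.items():
            rhs[s] -= m * speed        # inputs are consumed
        for s, m in outputs.items():
            rhs[s] += m * speed        # outputs are produced
    return rhs

sirs = [
    (0.5,  {'S': 1, 'I': 1}, {'I': 2}),   # infection
    (0.1,  {'I': 1},         {'R': 1}),   # recovery
    (0.05, {'R': 1},         {'S': 1}),   # loss of resistance
]
rhs = rate_rhs(sirs, {'S': 0.8, 'I': 0.2, 'R': 0.0})
# rhs matches the SIRS rate equation evaluated at these concentrations.
```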

Petri nets are also better suited to a compositional framework. The key new concept is an ‘open’ Petri net. Here’s an example:

The box at left is a set X of ‘inputs’ (which happens to be empty), while the box at right is a set Y of ‘outputs’. Both inputs and outputs are points at which entities of various species can flow in or out of the Petri net. We say the open Petri net goes from X to Y. In our paper, we show how to treat it as a morphism f : X \to Y in a category we call \textrm{RxNet}.

Given an open Petri net with rate constants assigned to each reaction, our paper explains how to get its ‘open rate equation’. It’s just the usual rate equation with extra terms describing inflows and outflows. The above example has this open rate equation:

\begin{array}{ccr} \displaystyle{\frac{d S}{d t}} &=&  - r_\iota S I - o_1 \\ \\ \displaystyle{\frac{d I}{d t}} &=&  r_\iota S I - o_2  \end{array}

Here o_1, o_2 : \mathbb{R} \to \mathbb{R} are arbitrary smooth functions describing outflows as a function of time.

Given another open Petri net g: Y \to Z, for example this:

it will have its own open rate equation, in this case

\begin{array}{ccc} \displaystyle{\frac{d S}{d t}} &=& r_\lambda R + i_2 \\ \\ \displaystyle{\frac{d I}{d t}} &=& - r_\rho I + i_1 \\  \\ \displaystyle{\frac{d R}{d t}} &=& r_\rho I - r_\lambda R  \end{array}

Here i_1, i_2: \mathbb{R} \to \mathbb{R} are arbitrary smooth functions describing inflows as a function of time. Now for a tiny bit of category theory: we can compose f and g by gluing the outputs of f to the inputs of g. This gives a new open Petri net gf: X \to Z, as follows:

But this open Petri net gf has an empty set of inputs, and an empty set of outputs! So it amounts to an ordinary Petri net, and its open rate equation is a rate equation of the usual kind. Indeed, this is the Petri net we have already seen.

As it turns out, there’s a systematic procedure for combining the open rate equations for two open Petri nets to obtain that of their composite. In the example we’re looking at, we just identify the outflows of f with the inflows of g (setting i_1 = o_1 and i_2 = o_2) and then add the right hand sides of their open rate equations.
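This bookkeeping can be checked with a toy symbolic calculation. The encoding is my own: each right-hand side is a dict from formal terms to coefficients, with the flow terms written so that each outflow of f is already identified with the inflow of g it gets glued to.

```python
def add_rhs(a, b):
    # Add two right-hand sides term by term, dropping cancelled terms.
    out = dict(a)
    for term, c in b.items():
        out[term] = out.get(term, 0) + c
    return {t: c for t, c in out.items() if c != 0}

f_S = {'r_iota*S*I': -1, 'o1': -1}      # dS/dt in f
f_I = {'r_iota*S*I': 1, 'o2': -1}       # dI/dt in f
g_S = {'r_lambda*R': 1, 'o1': 1}        # dS/dt in g, inflow identified
g_I = {'r_rho*I': -1, 'o2': 1}          # dI/dt in g, inflow identified
g_R = {'r_rho*I': 1, 'r_lambda*R': -1}  # dR/dt in g

gf_S = add_rhs(f_S, g_S)   # {'r_iota*S*I': -1, 'r_lambda*R': 1}
gf_I = add_rhs(f_I, g_I)   # {'r_iota*S*I': 1, 'r_rho*I': -1}
gf_R = add_rhs({}, g_R)
# The arbitrary flow terms cancel, leaving the closed SIRS rate equation.
```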

The first goal of our paper is to precisely describe this procedure, and to prove that it defines a functor

\diamond: \textrm{RxNet} \to \textrm{Dynam}

from \textrm{RxNet} to a category \textrm{Dynam} where the morphisms are ‘open dynamical systems’. By a dynamical system, we essentially mean a vector field on \mathbb{R}^n, which can be used to define a system of first-order ordinary differential equations in n variables. An example is the rate equation of a Petri net. An open dynamical system allows for the possibility of extra terms that are arbitrary functions of time, such as the inflows and outflows in an open rate equation.

In fact, we prove that \textrm{RxNet} and \textrm{Dynam} are symmetric monoidal categories and that \diamond is a symmetric monoidal functor. To do this, we use Brendan Fong’s theory of ‘decorated cospans’.

Decorated cospans are a powerful general tool for describing open systems. A cospan in any category is just a diagram like this:

We are mostly interested in cospans in \mathrm{FinSet}, the category of finite sets and functions between these. The set S, the so-called apex of the cospan, is the set of states of an open system. The sets X and Y are the inputs and outputs of this system. The legs of the cospan, meaning the morphisms i: X \to S and o: Y \to S, describe how these inputs and outputs are included in the system. In our application, S is the set of species of a Petri net.

For example, we may take this reaction network:

A+B \stackrel{\alpha}{\longrightarrow} 2C \quad \quad C \stackrel{\beta}{\longrightarrow} D

treat it as a Petri net with S = \{A,B,C,D\}:

and then turn that into an open Petri net by choosing any finite sets X,Y and maps i: X \to S, o: Y \to S, for example like this:

(Notice that the maps including the inputs and outputs into the states of the system need not be one-to-one. This is technically useful, but it introduces some subtleties that I don’t feel like explaining right now.)

An open Petri net can thus be seen as a cospan of finite sets whose apex S is ‘decorated’ with some extra information, namely a Petri net with S as its set of species. Fong’s theory of decorated cospans lets us define a category with open Petri nets as morphisms, with composition given by gluing the outputs of one open Petri net to the inputs of another.
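Here is a small sketch of that gluing for bare cospans of finite sets, ignoring the Petri net decorations (the code and its encoding are mine): given f with legs i1: X → S1, o1: Y → S1 and g with legs i2: Y → S2, o2: Z → S2, the composite's set of states is the pushout, computed by gluing o1(y) to i2(y) for each y in Y.

```python
def compose_cospans(i1, o1, S1, i2, o2, S2, Y):
    # Union-find over the disjoint union of S1 and S2; legs are dicts.
    parent = {('L', s): ('L', s) for s in S1}
    parent.update({('R', s): ('R', s) for s in S2})

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for y in Y:                    # glue o1(y) with i2(y)
        parent[find(('L', o1[y]))] = find(('R', i2[y]))

    states = {find(x) for x in parent}
    legs_in = {x: find(('L', s)) for x, s in i1.items()}
    legs_out = {z: find(('R', s)) for z, s in o2.items()}
    return states, legs_in, legs_out

# Gluing the two halves of the SIRS net: f's species {S, I} are glued
# onto g's {S, I, R} along two shared points, leaving three states.
states, f_in, g_out = compose_cospans(
    {}, {1: 'S', 2: 'I'}, {'S', 'I'},
    {1: 'S', 2: 'I'}, {}, {'S', 'I', 'R'}, {1, 2})
print(len(states))   # prints 3
```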

We call the functor

\diamond: \textrm{RxNet} \to \textrm{Dynam}

gray-boxing because it hides some but not all the internal details of an open Petri net. (In the paper we draw it as a gray box, but that’s too hard here!)

We can go further and black-box an open dynamical system. This amounts to recording only the relation between input and output variables that must hold in steady state. We prove that black-boxing gives a functor

\square: \textrm{Dynam} \to \mathrm{SemiAlgRel}

(yeah, the box here should be black, and in our paper it is). Here \mathrm{SemiAlgRel} is a category where the morphisms are semi-algebraic relations between real vector spaces, meaning relations defined by polynomials and inequalities. This relies on the fact that our dynamical systems involve algebraic vector fields, meaning those whose components are polynomials; more general dynamical systems would give more general relations.

That semi-algebraic relations are closed under composition is a nontrivial fact, a spinoff of the Tarski–Seidenberg theorem. This says that a subset of \mathbb{R}^{n+1} defined by polynomial equations and inequalities can be projected down onto \mathbb{R}^n, and the resulting set is still definable in terms of polynomial identities and inequalities. This wouldn’t be true if we didn’t allow inequalities. It’s neat to see this theorem, important in mathematical logic, showing up in chemistry!

Structure of the paper

Okay, now you’re ready to read our paper! Here’s how it goes:

In Section 2 we review and compare reaction networks and Petri nets. In Section 3 we construct a symmetric monoidal category \textrm{RNet} where an object is a finite set and a morphism is an open reaction network (or more precisely, an isomorphism class of open reaction networks). In Section 4 we enhance this construction to define a symmetric monoidal category \textrm{RxNet} where the transitions of the open reaction networks are equipped with rate constants. In Section 5 we explain the open dynamical system associated to an open reaction network, and in Section 6 we construct a symmetric monoidal category \textrm{Dynam} of open dynamical systems. In Section 7 we construct the gray-boxing functor

\diamond: \textrm{RxNet} \to \textrm{Dynam}

In Section 8 we construct the black-boxing functor

\square: \textrm{Dynam} \to \mathrm{SemiAlgRel}

We show both of these are symmetric monoidal functors.

Finally, in Section 9 we fit our results into a larger ‘network of network theories’. This is where various results in various papers I’ve been writing in the last few years start assembling to form a big picture! But this picture needs to grow….

September 09, 2017

Tommaso DorigoOn Turtles, Book Writing, And Overcommitments

Back from vacations, I think I need to report a few random things before I get back into physics blogging. So I'll peruse the science20 article category aptly called "Random Thoughts" for this one occasion.
My summer vacations took place just after a week spent in Ecuador, where I gave 6 hours of lectures on LHC physics and statistics for data analysis to astrophysics PhD students. I reported on that, and on an eventful hike, in the last post. Unfortunately, the first week of my alleged rest was mostly spent fixing a few documents that the European Commission expected to receive by August 31st. As a coordinator of a training network, I do indeed have certain obligations that I cannot escape.

read more

September 08, 2017

Sean CarrollJoe Polchinski’s Memories, and a Mark Wise Movie

Joe Polchinski, a universally-admired theoretical physicist at the Kavli Institute for Theoretical Physics in Santa Barbara, recently posted a 150-page writeup of his memories of doing research over the years.

Memories of a Theoretical Physicist
Joseph Polchinski

While I was dealing with a brain injury and finding it difficult to work, two friends (Derek Westen, a friend of the KITP, and Steve Shenker, with whom I was recently collaborating), suggested that a new direction might be good. Steve in particular regarded me as a good writer and suggested that I try that. I quickly took to Steve’s suggestion. Having only two bodies of knowledge, myself and physics, I decided to write an autobiography about my development as a theoretical physicist. This is not written for any particular audience, but just to give myself a goal. It will probably have too much physics for a nontechnical reader, and too little for a physicist, but perhaps there will be different things for each. Parts may be tedious. But it is somewhat unique, I think, a blow-by-blow history of where I started and where I got to. Probably the target audience is theoretical physicists, especially young ones, who may enjoy comparing my struggles with their own. Some disclaimers: This is based on my own memories, jogged by the arXiv and Inspire. There will surely be errors and omissions. And note the title: this is about my memories, which will be different for other people. Also, it would not be possible for me to mention all the authors whose work might intersect mine, so this should not be treated as a reference work.

As the piece explains, it’s a bittersweet project, as it was brought about by Joe struggling with a serious illness and finding it difficult to do physics. We all hope he fully recovers and gets back to leading the field in creative directions.

I had the pleasure of spending three years down the hall from Joe when I was a postdoc at the ITP (it didn’t have the “K” at that time). You’ll see my name pop up briefly in his article, sadly in the context of an amusing anecdote rather than an exciting piece of research, since I stupidly spent three years in Santa Barbara without collaborating with any of the brilliant minds on the faculty there. Not sure exactly what I was thinking.

Joe is of course a world-leading theoretical physicist, and his memories give you an idea why, while at the same time being very honest about setbacks and frustrations. His style has never been to jump on a topic while it was hot, but to think deeply about fundamental issues and look for connections others have missed. This approach led him to such breakthroughs as a new understanding of the renormalization group, the discovery of D-branes in string theory, and the possibility of firewalls in black holes. It’s not necessarily a method that would work for everyone, especially because it doesn’t necessarily lead to a lot of papers being written at a young age. (Others who somehow made this style work for them, and somehow survived, include Ken Wilson and Alan Guth.) But the purity and integrity of Joe’s approach to doing science is an example for all of us.

Somehow over the course of 150 pages Joe neglected to mention perhaps his greatest triumph, as a three-time guest blogger (one, two, three). Too modest, I imagine.

His memories make for truly compelling reading, at least for physicists — he’s an excellent stylist and pedagogue, but the intended audience is people who have already heard about the renormalization group. This kind of thoughtful but informal recollection is an invaluable resource, as you get to see not only the polished final product of a physics paper, but the twists and turns of how it came to be, especially the motivations underlying why the scientist chose to think about things one way rather than some other way.

(Idea: there is a wonderful online magazine called The Players’ Tribune, which gives athletes an opportunity to write articles expressing their views and experiences, e.g. the raw feelings after you are traded. It would be great to have something like that for scientists, or for academics more broadly, to write about the experiences [good and bad] of doing research. Young people in the field would find it invaluable, and non-scientists could learn a lot about how science really works.)

You also get to read about many of the interesting friends and colleagues of Joe’s over the years. A prominent one is my current Caltech colleague Mark Wise, a leading physicist in his own right (and someone I was smart enough to collaborate with — with age comes wisdom, or at least more wisdom than you used to have). Joe and Mark got to know each other as postdocs, and have remained friends ever since. When it came time for a scientific gathering to celebrate Joe’s 60th birthday, Mark contributed a home-made movie showing (in inimitable style) how much progress he had made over the years in the activities they had enjoyed together in their relative youth. And now, for the first time, that movie is available to the general public. It’s seven minutes long, but don’t make the mistake of skipping the blooper reel that accompanies the end credits. Many thanks to Kim Boddy, the former Caltech student who directed and produced this lost masterpiece.

When it came time for his own 60th, Mark being Mark, he didn’t want the usual conference, and decided instead to gather physicist friends from over the years and take them to a local ice rink for a bout of curling. (Canadian heritage showing through.) Joe being Joe, this was an invitation he couldn’t resist, and we had a grand old time, free of any truly serious injuries.

We don’t often say it out loud, but one of the special privileges of being in this field is getting to know brilliant and wonderful people, and interacting with them over periods of many years. I owe Joe a lot — even if I wasn’t smart enough to collaborate with him when he was down the hall, I learned an enormous amount from his example, and often wonder how he would think about this or that issue in physics.


John BaezPostdoc in Applied Category Theory

guest post by Spencer Breiner

One Year Postdoc Position at Carnegie Mellon/NIST

We are seeking an early-career researcher with a background in category theory, functional programming and/or electrical engineering for a one-year post-doctoral position supported by an Early-concept Grant (EAGER) from the NSF’s Systems Science program. The position will be managed through Carnegie Mellon University (PI: Eswaran Subrahmanian), but the position itself will be located at the US National Institute of Standards and Technology (NIST) in Gaithersburg, Maryland, outside of Washington, DC.

The project aims to develop a compositional semantics for electrical networks which is suitable for system prediction, analysis and control. This work will extend existing methods for linear circuits (featured on this blog!) to include (i) probabilistic estimates of future consumption and (ii) top-down incentives for load management. We will model a multi-layered system of such “distributed energy resources” including loads and generators (e.g., solar array vs. power plant), different types of resource aggregation (e.g., apartment to apartment building), and across several time scales. We hope to demonstrate that such a system can balance local load and generation in order to minimize expected instability at higher levels of the electrical grid.

This post is available full-time (40 hours/5 days per week) for 12 months, and can begin as early as October 1st.

For more information on this position, please contact Dr. Eswaran Subrahmanian ( or Dr. Spencer Breiner (

September 07, 2017

John PreskillWhat Clocks have to do with Quantum Computation

Have you ever played the game “telephone”? You might remember it from your nursery days, blissfully oblivious to the fact that quantum mechanics governs your existence, and not yet wondering why Fox canceled Firefly. For everyone who forgot, here is the gist of the game: sit in a circle with your friends. Now you think of a story (prompt: a spherical weapon that can destroy planets). Once you have the story laid out in your head, tell it to your neighbor on your left. She takes the story and tells it to her friend on her left. It is important to master the art of whispering for this game: you don’t want to be overheard when the story is passed on. After one round, the friend on your right tells you what he heard from his friend on his right. Does the story match your masterpiece?

If your story is generic, it probably survived without alterations. Tolstoy’s War and Peace, on the other hand, might turn into a version of Game of Thrones. Passing along complex stories seems to be more difficult than passing on easy ones, and it also becomes more prone to errors the more friends join your circle—which makes intuitive sense.

So what does this have to do with physics or quantum computation?

Let’s add maths to this game, because why not. Take a difficult calculation that follows a certain procedure, such as long division of two integer numbers.


Now you perform one step of the division and pass the piece of paper on to your left. Your friend there is honest and trusts you: she doesn’t check what you did, but happily performs the next step in the division. Once she’s done, she passes the piece of paper on to her left, and so on. By the time the paper reaches you again, you hopefully have the result of the calculation, given you have enough friends to divide your favorite numbers, and given that everyone performed their steps accurately.
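As a toy illustration of these snapshots (my own sketch, not from the post), the short Python below performs long division digit by digit and records the partial remainder and quotient after each step; each recorded pair plays the role of one snapshot | \psi_t \rangle handed from one friend to the next.

```python
# Toy sketch: long division of 1234 by 7, recording a "snapshot"
# of the partial state after each step of the computation.
def division_snapshots(dividend, divisor):
    snapshots = [(0, 0)]          # (remainder, quotient-so-far) before any step
    remainder, quotient = 0, 0
    for digit in str(dividend):   # bring down one digit per step
        remainder = remainder * 10 + int(digit)
        quotient = quotient * 10 + remainder // divisor
        remainder = remainder % divisor
        snapshots.append((remainder, quotient))
    return snapshots

for t, (rem, quo) in enumerate(division_snapshots(1234, 7)):
    print(f"t={t}: remainder={rem}, quotient={quo}")
# final snapshot: remainder=2, quotient=176, i.e. 1234 = 7*176 + 2
```

Each friend in the circle needs only the previous snapshot to produce the next one, which is exactly the structure the history state will encode.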

I’m not sure if Feynman thought about telephone when he, in 1986, proposed a method of embedding computation into eigenstates (e.g. the ground state) of a Hamiltonian, but the fact remains that the similarity is striking. Remember that writing down a Hamiltonian is a way of describing a quantum-mechanical system, for instance how the constituents of a multi-body system are coupled with each other. The ground state of such a Hamiltonian describes the lowest energy state that a system assumes when it is cooled down as far as possible. Before we dive into how the Hamiltonian looks, let’s try to understand how, in Feynman’s construction, a game of telephone can be represented as a quantum state of a physical system.

\frac{1}{\sqrt{T}} \sum_{t=1}^{T} |t\rangle \otimes |\psi_t\rangle = \frac{1}{\sqrt{T}} \big( |1\rangle\otimes|\psi_1\rangle + |2\rangle\otimes|\psi_2\rangle + \ldots + |T\rangle\otimes|\psi_T\rangle \big)
In this picture, | \psi_t \rangle represents a snapshot of the story or calculation at time t—in the division example, this would be the current divisor and remainder terms; so e.g. the snapshot | \psi_1 \rangle represents the initial dividend and divisor, and the person next to you is thinking of | \psi_2 \rangle, one step into the calculation. The label |t\rangle in front of the tensor sign \otimes is like a tag that you put on files on your computer, and uniquely associates the snapshot | \psi_t \rangle with the t-th time step. We say that the story snapshot is entangled with its label.

This is also an example of quantum superposition: all the |t\rangle\otimes|\psi_t\rangle are distinct states (the time labels, if not the story snapshots, are all unique), and by adding these states up we put them into superposition. So if we were to measure the time label, we would obtain one of the snapshots uniformly at random—it’s as if you had a cloth bag full of cards, and you blindly pick one. One side of the card will have the time label on it, while the other side contains the story snapshot. But don’t be fooled—you cannot access all story snapshots by successive measurements! Quantum states collapse; whatever measurement outcome you have dictates what the quantum state will look like after the measurement. In our example, this means that we burn the cloth bag after you pick your card; in this sense, the quantum state behaves differently than a simple juxtaposition of scraps of paper.

Nonetheless, this is the reason why we call such a quantum state a history state: it preserves the history of the computation, where every step that is performed is appropriately tagged. If we manage to compare all pairs of successively-labeled snapshots (without measuring them!), we can verify that the end result does, in fact, stem from a valid computation—and not just a random guess. In the division example, this would correspond to checking that each of your friends performs a correct division step.

So history states are clearly useful. But how do you design a Hamiltonian with a history state as the ground state? Is it even possible? The answer is yes, and it all boils down to verifying that two successive snapshots | \psi_t \rangle and | \psi_{t+1} \rangle are related to each other in the correct manner, e.g. that your friend on seat t+1 performs a valid division step from the snapshot prepared by the person on seat t. In fancy physics speak (aka Bra-Ket notation), we can for example write

|2\rangle\langle 1| \otimes | \psi_2 \rangle \langle \psi_1 |
The actual Hamiltonian will then be a sum of such terms, and one can verify that its ground state is indeed the one representing the history state we introduced above.

I’m glossing over a few details here: there is a minus sign in front of this term, and we have to add its Hermitian conjugate (flip the labels and snapshots around). But this is not essential for the argument, so let’s not go there for now. However, you’re totally right with one thing: it wouldn’t make sense to write down all snapshots themselves into the Hamiltonian! After all, if we had to calculate every snapshot transition like | \psi_2 \rangle \langle \psi_1 | in advance, there would be no use to this construction. So instead, we can write

|t+1\rangle\langle t| \otimes \mathbf U_\text{DIVISION}
Perfect. We now have a Hamiltonian which, in its ground state, can encode the history of a computation, and if we replace the transition operator \mathbf U_\text{DIVISION} with another desired transition operator (a unitary matrix), we can perform any computation we want (more precisely, any computation that can be written as a unitary matrix; this includes anything your laptop can do). However, this is only half of the story, since we need to have a way of reading out the final answer. So let’s step back for a moment, and go back to the telephone game.
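To see this construction in action, here is a small numerical sketch (my own, not from the post). It uses the standard Feynman-Kitaev form of the propagation term, which supplements the transition term quoted above with identity pieces so that the Hamiltonian is positive semidefinite, and checks that the history state of a one-qubit computation (a chain of NOT gates) has zero energy. Clock steps are labelled 0 to T-1 here.

```python
import numpy as np

T = 4                           # number of clock steps
d = 2                           # one qubit of "story" space
U = np.array([[0.0, 1.0],
              [1.0, 0.0]])      # one computational step: a NOT gate

def ket(t):
    """Clock basis state |t> as a column vector."""
    v = np.zeros((T, 1))
    v[t] = 1.0
    return v

I = np.eye(d)
H = np.zeros((T * d, T * d))
for t in range(T - 1):
    # Standard propagation term: enforces |psi_{t+1}> = U |psi_t>.
    H += 0.5 * np.kron(ket(t) @ ket(t).T + ket(t + 1) @ ket(t + 1).T, I)
    H -= 0.5 * (np.kron(ket(t + 1) @ ket(t).T, U)
                + np.kron(ket(t) @ ket(t + 1).T, U.conj().T))

# History state for initial snapshot |0>: (1/sqrt(T)) sum_t |t> (x) U^t |0>
psi0 = np.array([[1.0], [0.0]])
hist = sum(np.kron(ket(t), np.linalg.matrix_power(U, t) @ psi0)
           for t in range(T)) / np.sqrt(T)

print(np.linalg.norm(H @ hist))   # essentially 0: zero-energy ground state
```

Any state that fails one of the step-checks picks up positive energy, which is exactly the sense in which the Hamiltonian “verifies” the computation.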

Can you motivate your friends to cheat?

Your friends playing telephone make mistakes.


Ok, let’s assume we give them a little incentive: offer $1 to the person on your right in case the result is an even number. Will he cheat? With so much at stake?


In fact, maybe your friend is not only greedy but also dishonest: he wants to hide the fact that he miscalculates on purpose, and sometimes tells his friend on his right to make a mistake instead (maybe giving him a share of the money). So for a few of your friends close to the person at the end of the chain, there is a real incentive to cheat!


Can we motivate spins to cheat?

We already discussed how to write down a Hamiltonian that verifies valid computational steps. But can we do the same thing as bribing your friends to procure a certain outcome? Can we give an energy bonus to certain outcomes of the computation?

In fact, we can. Alexei Kitaev proposed adding a term to Feynman’s Hamiltonian which raises the energy of an unwanted outcome, relative to a desirable outcome. How? Again in fancy physics language,

-|T\rangle\langle T| \otimes \Pi_\text{even}, where \Pi_\text{even} is the projector onto snapshots representing an even number.
What this term does is that it takes the history state and yields a negative energy contribution (signaled by the minus sign in front) if the last snapshot | \psi_T \rangle is an even number. If it isn’t, no bonus is felt; this would correspond to you keeping the dollar you promised to your friend. This simply means that in case the computation has a desirable outcome—i.e. an even number—the Hamiltonian allows a lower energy ground state than for any other output. Et voilà, we can distinguish between different outputs of the computation.
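A minimal numerical sketch of the bonus term (my own, with clock steps labelled 0 to T-1 and “even” modelled as the single-qubit state |0>): a history whose final snapshot is |0> picks up an energy bonus of -1/T, while one ending in |1> gets none.

```python
import numpy as np

T, d = 4, 2

def ket(i, dim):
    """Basis state |i> in a dim-dimensional space, as a column vector."""
    v = np.zeros((dim, 1))
    v[i] = 1.0
    return v

# Bonus term: -|T-1><T-1| (x) |0><0|, rewarding histories whose last
# snapshot lies in the "even" state |0>.
H_bonus = -np.kron(ket(T - 1, T) @ ket(T - 1, T).T,
                   ket(0, d) @ ket(0, d).T)

def history_state(final):
    # Trivial history: every snapshot equals the final snapshot.
    return sum(np.kron(ket(t, T), final) for t in range(T)) / np.sqrt(T)

good = history_state(ket(0, d))   # computation "outputs" an even number
bad = history_state(ket(1, d))    # odd output: no bonus

print((good.T @ H_bonus @ good).item())   # -0.25, i.e. a bonus of -1/T
print((bad.T @ H_bonus @ bad).item())     #  0.0, no bonus
```

Only a 1/T fraction of the history state’s weight sits on the final clock step, which is why the bonus (and, in the penalty version, the energy shift one must resolve) shrinks with the length of the computation.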

The true picture is, of course, a tad more complicated; generally, we give penalty terms to unwanted states instead of bonus terms to desirable ones. The reason for this is somewhat subtle, but can potentially be explained with an analogy: humans fear loss much more than they value gains of the same magnitude. Quantum systems behave in a completely opposite manner: the promise of a bonus at the end of the computation is such a great incentive that most of the weight of the history state will flock to the bonus term (for the physicists: the system now has a bound state, meaning that the wave function is localized around a specific site, and drops off exponentially quickly away from it). This makes it difficult to verify the computation far away from the bonus term.

So the Feynman-Kitaev Hamiltonian consists of three parts: one which checks each step of the computation, one which penalizes invalid outcomes—and obviously we also need to make sure the input of the computation is valid. Why? Well, are you saying you are more honest than your friends?


Physical Implications of History State Hamiltonians

If there is one thing I’ve learned throughout my PhD it is that we should always ask what use a theory is. So what can we learn from this construction? Almost 20 years ago, Alexei Kitaev used Feynman’s idea to prove that estimating the ground state energy of a physical system with local interactions is hard, even on a quantum computer (for the experts: QMA-hard under the assumption of a 1/\text{poly} promise gap splitting the embedded YES and NO instances). Why is estimating the ground state energy hard? The energy shift induced by the output penalty depends on the outcome of the computation that we embed (e.g. even or odd outcome). And as fun as long division is, there are much more difficult tasks we can write down as a history state Hamiltonian—in fact, it is this very freedom which makes estimating the ground state energy difficult: if we can embed any computation we want, estimating the induced energy shift should be at least as hard as actually performing the computation on a quantum computer. This has one curious implication: if we don’t expect that we can estimate the ground state energy efficiently, the physical system will take a long time to actually assume its ground state when cooled down, and potentially behave like a spin glass!

Feynman’s history state construction and the QMA-hardness proof of Kitaev were a big part of the research I did for my PhD. I formalized the case where the message is not passed on along a unique path from neighbor to neighbor, but can take an arbitrary path between beginning and end in a more complicated graph; in this way, computation can in some sense be parallelized.

Well, to be honest, the last statement is not entirely true: while there can be parallel tracks of computation from A to B, these tracks have to perform the same computation (albeit in potentially different steps); otherwise the system becomes much more complicated to analyze. The reason why this admittedly quite restricted form of branching might still be an advantage is somewhat subtle: if your computation has a lot of classical if-else cases, but you don’t have enough space on your piece of paper to store all the variables to check the conditions, it might be worth just taking a gamble: pass your message down one branch, in the hope that the condition is met. The only thing that you have to be careful about is that in case the condition isn’t met, you don’t produce invalid results. What use is that in physics? If you don’t have to store a lot of information locally, it means you can get away using a much lower local spin dimension for the system you describe.

Such small and physically realistic models have as of late been proposed as actual computational devices (called Hamiltonian quantum computers), where a prepared initial state is evolved under such a history state Hamiltonian for a specific time, in contrast to the static property of a history ground state we discussed above. Yet whether or not this is something one could actually build in a lab remains an open question.

Last year, Thomas Vidick invited me to visit Caltech, and I worked with IQIM postdoc Elizabeth Crosson to improve the analysis of the energy penalty that is assigned to any history state that cheats the constraints in the Feynman-Kitaev Hamiltonian. We identified some open problems and also proved limitations on the extent of the energetic penalty that these kinds of Hamiltonians can have. This summer I went back to Caltech to further develop these ideas and make progress towards a complete understanding of such “clock” Hamiltonians, which Elizabeth and I are putting together in a follow-up work that should appear soon.

It is striking how such a simple idea can have so profound an implication across fields, and remain relevant, even 30 years after its first proposal.


Feynman concludes his 1986 Foundations of Physics paper with the following words.

At any rate, it seems that the laws of physics present no barrier to reducing the size of computers until bits are the size of atoms, and quantum behavior holds dominant sway.

For my part, I hope that he was right and that history state constructions will play a part in this future.

Matt StrasslerWatch for Auroras

Those of you who remember my post on how to keep track of opportunities to see northern (and southern) lights will be impressed by this image from .

The latest space weather overview plot

The top plot shows the number of X-rays (high-energy photons [particles of light]) coming from the sun, and that huge spike in the middle of the plot indicates that a very powerful solar flare occurred about 24 hours ago.  It should take about 2 days from the time of the flare for its other effects — the cloud of electrically-charged particles expelled from the Sun’s atmosphere — to arrive at Earth.  The electrically-charged particles are what generate the auroras: directed by Earth’s magnetic field, they enter the atmosphere near the magnetic poles and crash into atoms in the upper atmosphere, exciting them and causing them to radiate visible light.

The flare was very powerful, but its cloud of particles didn’t head straight for Earth.  We might get only a glancing blow.  So we don’t know how big an effect to expect here on our planet.  All we can do for now is be hopeful, and wait.

In any case, auroras borealis and australis are possible in the next day or so.  Watch for the middle plot to go haywire, and for the bars in the lower plot to jump higher; then you know the time has arrived.

Filed under: Astronomy Tagged: astronomy, auroras

Matt StrasslerAn Experience of a Lifetime: My 1999 Eclipse Adventure

Back in 1999 I saw a total solar eclipse in Europe, and it was a life-altering experience.  I wrote about it back then, but was never entirely happy with the article.  This week I’ve revised it.  It could still benefit from some editing and revision (comments welcome), but I think it’s now a good read.  It’s full of intellectual observations, but there are powerful emotions too.

If you’re interested, you can read it as a pdf, or just scroll down.



A Luminescent Darkness: My 1999 Eclipse Adventure

© Matt Strassler 1999

After two years of dreaming, two months of planning, and two hours of packing, I drove to John F. Kennedy airport, took the shuttle to the Air France terminal, and checked in.  I was brimming with excitement. In three days time, with a bit of luck, I would witness one of the great spectacles that a human being can experience: a complete, utter and total eclipse of the Sun.

I had missed one eight years earlier. In July 1991, a total solar eclipse crossed over Baja California. I had thought seriously about driving the fourteen hundred miles from the San Francisco area, where I was a graduate student studying theoretical physics, to the very southern tip of the peninsula. But worried about my car’s ill health and scared by rumors of gasoline shortages in Baja, I chickened out. Four of my older colleagues, more worldly and more experienced, and supplied with a more reliable vehicle, drove down together. When they returned, exhilarated, they regaled us with stories of their magical adventure. Hearing their tales, I kicked myself for not going, and had been kicking myself ever since. Life is not so long that such opportunities can be rationalized or procrastinated away.

A total eclipse of the Sun is an event of mythic significance, so rare and extraordinary and unbelievable that it really ought to exist only in ancient legends, in epic poems, and in science fiction stories. There are other types of eclipses — partial and total eclipses of the Moon, in which the Earth blocks sunlight that normally illuminates the Moon, and various eclipses of the Sun in which the Moon blocks sunlight that normally illuminates the Earth. But total solar eclipses are in a class all their own. Only during the brief moments of totality does the Sun vanish altogether, leaving the shocked spectator in a suddenly darkened world, gazing uncomprehendingly at a black disk of nothingness.

Our species relies on daylight. Day is warm; day grows our food; day permits travel with a clear sense of what lies ahead. We are so fearful of the night — of what lurks there unseen, of the sounds that we cannot interpret. Horror films rely on this fear; demons and axe murderers are rarely found walking about in bright sunshine. Dark places are dangerous places; sudden unexpected darkness is worst of all. These are the conventions of cinema, born of our inmost psychology. But the Sun and the Moon are not actors projected on a screen. The terror is real.

It has been said that if the Earth were a member of a federation of a million planets, it would be a famous tourist attraction, because this home of ours would be the only one in the federation with such beautiful eclipses. For our skies are witness to a coincidence truly of cosmic proportions. It is a stunning accident that although the Sun is so immense that it could hold a million Earths, and the Moon so small that dozens could fit inside our planet, these two spheres, the brightest bodies in Earth’s skies, appear the same size. A faraway giant may seem no larger than a nearby child. And this perfect match of their sizes and distances makes our planet’s eclipses truly spectacular, visually and scientifically. They are described by witnesses as a sight of weird and unique beauty, a visual treasure completely unlike anything else a person will ever see, or even imagine.

But total solar eclipses are uncommon, occurring only once every year or two. Even worse, totality only occurs in a narrow band that sweeps across the Earth — often just across its oceans. Only a small fraction of the Earth sees a total eclipse in any century. And so these eclipses are precious; only the lucky, or the devoted, will experience one before they die.

In my own life, I’d certainly been more devoted than lucky. I knew it wasn’t wise to wait for the Moon’s shadow to find me by chance. Instead I was going on a journey to place myself in its path.

The biggest challenge in eclipse-chasing is the logistics. The area in which totality is visible is very long but very narrow. For my trip, in 1999, it was a long strip running west to east all across Europe, but only a hundred miles wide from north to south. A narrow zone crossing heavily populated areas is sure to attract a massive crowd, so finding hotels and transport can be difficult. Furthermore, although eclipses are precisely predictable, governed by the laws of gravity worked out by Isaac Newton himself, weather and human beings are far less dependable.

But I had a well-considered plan. I would travel by train to a small city east of Paris, where I had reserved a rental car. Keeping a close watch on the weather forecast, I would drive on back roads, avoiding clogged highways. I had no hotel reservations. It would have been pointless to make them for the night before the event, since it was well known that everything within two hours drive of the totality zone was booked solid. Moreover, I wanted the flexibility to adjust to the weather and couldn’t know in advance where I’d want to stay. So my idea was that on the night prior to the eclipse, I would drive to a good location in the path of the lunar shadow, and sleep in the back of my car. I had a sleeping bag with me to keep me warm, and enough lightweight clothing for the week — and not much else.

Oh, it was such a good plan, clean and simple, and that’s why my heart had so far to sink and my brain so ludicrous a calamity to contemplate when I checked my wallet, an hour before flight time, and saw a gaping black emptiness where my driver’s license was supposed to be. I was struck dumb. No license meant no car rental; no car meant no flexibility and no place to sleep. Sixteen years of driving and I had never lost it before; why, why, of all times, now, when it was to play a central role in a once-in-a-lifetime adventure?

I didn’t panic. I walked calmly back to the check-in counters, managed to get myself rescheduled for a flight on the following day, drove the three hours back to New Jersey, and started looking. It wasn’t in my car. Nor was it in the pile of unneeded items I’d removed from my wallet. Not in my suitcase, not under my bed, not in my office. As it was Sunday, I couldn’t get a replacement license. Hope dimmed, flickered, and went dark.

Deep breaths. Plan B?

I didn’t have a tent, and couldn’t easily have found one. But I did have a rain poncho, large enough to keep my sleeping bag off the ground. As long as it didn’t rain too hard, I could try, the night before the eclipse, to find a place to camp outdoors; with luck I’d find lodging for the other nights. I doubted this would be legal, but I was willing to take the chance. But what about my suitcase? I couldn’t carry that around with me into the wilderness. Fortunately, I knew a solution. For a year after college, I had studied music in France, and had often gone sightseeing by rail. On those trips I had commonly made use of the ubiquitous lockers at the train stations, leaving some luggage while I explored the nearby town. As for flexibility of location, that was unrecoverable; the big downside of Plan B was that I could no longer adjust to the weather. I’d just have to be lucky. I comforted myself with the thought that the worst that could happen to me would be a week of eating French food.

So the next day, carrying the additional weight of a poncho and an umbrella, but having in compensation discarded all inessential clothing and tourist information, I headed back to the airport, this time by bus. Without further misadventures, I was soon being carried across the Atlantic.

As usual I struggled to nap amid the loud silence of a night flight. But my sleeplessness was rewarded with one of those good omens that makes you think that you must be doing the right thing. As we approached the European coastline, and I gazed sleepily out my window, I suddenly saw a bright glowing light. It was the rising white tip of the thin crescent Moon.

Solar eclipses occur at New Moon, always. This is nothing but simple geometry; the Moon must place itself exactly between the Sun and the Earth to cause an eclipse, and that means the half of the Moon that faces us must be in shadow. (At Full Moon, the opposite is true; the Earth is between the Sun and the Moon, so the half of the Moon that faces us is in full sunlight. That’s when lunar eclipses can occur.) And just before a New Moon, the Moon is close to the Sun’s location in the sky. It becomes visible, as the Earth turns, just before the Sun does, rising as a morning crescent shortly before sunrise. (Similarly, we get an evening crescent just after a New Moon.)

There, out over the vast Atlantic, from a dark ocean of water into a dark sea of stars, rose the delicate thin slip of Luna the lover, on her way to her mystical rendezvous with Sol. Her crescent smiled at me and winked a greeting. I smiled back, and whispered, “see you in two days…” For totality is not merely the only time you can look straight at the Sun and see its crown. It is the only time you can see the New Moon.

We landed in Paris at 6:30 Monday morning, E-day-minus-two. I headed straight to the airport train station, and pored over rail maps and my road maps, trying to guess a good location to use as a base. Eventually I chose a medium-sized town with the name Charleville-Mezieres. It was on the northern edge of the totality zone, at the end of a large spoke of the Paris-centered rail system, and was far enough from Paris, Brussels, and all large German towns that I suspected it might escape the worst of the crowds. It would then be easy, the night before the eclipse, to take a train back into the center of the zone, where totality would last the longest.

Two hours later I was in the Paris-East rail station and had purchased my ticket for Charleville-Mezieres. With ninety minutes to wait, I wandered around the station. It was evident that France had gone eclipse-happy. Every magazine had a cover story; every newspaper had a special insert; signs concerning the event were everywhere. Many of the magazines carried free eclipse glasses, with a black opaque metallic material for lenses that only the Sun can penetrate. Warnings against looking at the Sun without them were to be found on every newspaper front page. I soon learned that there had been a dreadful scandal in which a widely distributed shipment of imported glasses was discovered to be dangerously defective, leading the government to make a hurried and desperate attempt to recall them. There were also many leaflets advertising planned events in towns lying in the totality zone, and information about extra trains that would be running. A chaotic rush out of Paris was clearly expected.

Before noon I was on a train heading through the Paris suburbs into the farmlands of the Champagne region. The rocking of the train put me right to sleep, but the shrieking children halfway up the rail car quickly ended my nap. I watched the lovely sunlit French countryside as it rolled by. The Sun was by now well overhead — or rather, the Earth had rotated so that France was nearly facing the Sun head on. Sometimes, when the train banked on a turn, the light nearly blinded me, and I had to close my eyes.

With my eyelids shut, I thought about how I’d managed, over decades, to avoid ever once accidentally staring at the Sun for even a second… and about how almost every animal with eyes manages to do this during its entire life. It’s quite a feat, when you think about it. But it’s essential, of course. The Sun’s ferocious blaze is even worse than it appears, for it contains more than just visible light. It also radiates light too violet for us to see — ultraviolet — which is powerful enough to destroy our vision. Any animal lacking instincts powerful enough to keep its eyes off the Sun will go blind, soon to starve or be eaten. But humans are in danger during solar eclipses, because our intense curiosity can make us ignore our instincts. Many of us will suffer permanent eye damage, not understanding when and how it is safe to look at the Sun… which is almost, but not quite, never.

In fact the only time it is safe to look with the naked eye is during totality, when the Sun’s disk is completely blocked by the New Moon, and the world is dark. Then, and only then, can one see that the Sun is not a sphere, and that it has a sort of atmosphere, immense and usually unseen.

At the heart of the Sun, and source of its awesome power, is its nuclear furnace, some fifteen million degrees Celsius and nearly five billion years old. All that heat gradually filters and boils out of the Sun’s core toward its visible surface, which is a mere six thousand degrees… still white-hot. Outside this region is a large irregular halo of material that is normally too dim to see against the blinding disk. The inner part of that halo is called the chromosphere; there, giant eruptions called “prominences” loop outward into space. The outer part of the halo is the “corona”, Latin for “crown.” The opportunity to see the Sun’s corona is one of the main reasons to seek totality.

Still very drowsy, but in a good mood, I arrived in Charleville. Wanting to leave my bags in the station while I looked for a hotel room, I searched for the luggage lockers. After three tiring trips around the station, I asked at a ticket booth. “Oh,” said the woman behind the desk, “we haven’t had them available since the Algerian terrorism of a few years ago.”

I gulped. This threatened plan B, for what was I to do with my luggage on eclipse day? I certainly couldn’t walk out into the French countryside looking for a place to camp while carrying a full suitcase and a sleeping bag! And even the present problem of looking for a hotel would be daunting. The woman behind the desk was sympathetic, but her only suggestion was to try one of the hotels near the station. Since the tourist information office was a mile away, it seemed the only good option, and I lugged my bags across the street.

Here, finally, luck smiled. The very first place I stopped at had a room for that night, reasonably priced and perfectly clean, if spartan. It was also available the night after the eclipse. My choice of Charleville had been wise. Unfortunately, even here, Eclipse Eve — Tuesday evening — was as bad as I imagined. The hotelière assured me that all of Charleville was booked (and my later attempts to find a room, even a last-minute cancellation, proved fruitless). Still, she was happy for me to leave my luggage at the hotel while I tramped through the French countryside. Thus was Plan B saved.

Somewhat relieved, I wandered around the town. Charleville is not unattractive, and the orange sandstone 16th century architecture of its central square is very pleasing to the eye. By dusk I was exhausted and collapsed on my bed. I slept long and deep, and awoke refreshed. I took a short sightseeing trip by train, ate a delicious lunch, and tried one more time to find a room in Charleville for Eclipse Eve. Failing once again, I resolved to camp in the heart of the totality zone.

But where? I had several criteria in mind. For the eclipse, I wanted to be far from any large town or highway, so that streetlights, often automatically triggered by darkness, would not spoil the experience. Also I wanted hills and farmland; I wanted to be at a summit, with no trees nearby, in order to have the best possible view. It didn’t take long to decide on a location. About five miles south of the unassuming town of Rethel, rebuilt after total destruction in the first world war, my map showed a high hill. It seemed perfect.

Fortunately, I learned just in time that this same high hill had attracted the attention of the local authorities, and they had decided to designate this very place the “official viewing site” in the region. A hundred thousand people were expected to descend on Rethel and take shuttles from the town to the “site.” Clearly this was not where I wanted to be!

So instead, when I arrived in Rethel, I walked in another direction. I aimed for an area a few miles west of town, quiet hilly farmland.

Yet again, my luck seemed to be on the wane. By four it was drizzling, and by five it was raining. Darkness would settle at around eight, and I had little time to find a site for unobtrusive camping, much less a dry one. The rain stopped, restarted, hesitated, spat, but refused to go away. An unending mass of rain clouds could be seen heading toward me from the west. I had hoped to use trees for some shelter against rain, but now the trees were drenched and dripping, even worse than the rain itself.

Still completely unsure what I would do, I continued walking into the evening. I must have cut a very odd figure, carrying an open umbrella, a sleeping bag, and a small black backpack. I took a break in a village square, taking shelter at a church’s side door, where I munched on French bread and cheese. Maybe one of these farmers would let me sleep in a dry spot in his barn, I thought to myself. But I still hadn’t reached the hills I was aiming for, so I kept walking.

After another mile, I came to a hilltop with a dirt farm track crossing the road. There, just off the road to the right, was a large piece of farm machinery. And underneath it, a large, flat, sheltered spot. Hideous, but I could sleep there. Since it wasn’t quite nightfall yet and I could see a hill on the other side of the road along the same track, one which looked like it might be good for watching the eclipse, I took a few minutes to explore it. There I found another piece of farm equipment, also with a sheltered underbelly. This one was much further from the road, looked unused, and presumably offered both safer and quieter shelter. It was sitting just off the dirt track in a fallow field. The field was of thick, sticky, almost hard mud, the kind you don’t slip in and which doesn’t ooze but which gloms onto the sides of your shoe.

And so it was that Eclipse Eve found me spreading my poncho in a friendly unknown farmer’s field, twisting my body so as not to hit my head on the metal bars of my shelter, carefully unwrapping my sleeping bag and removing my shoes so as not to cover everything in mud, brushing my teeth in bottled water, and bedding down for the night. The whole scene was so absurd that I found myself sporting a slightly manic grin and giggling. But still, I was satisfied. Despite the odds, I was in the zone at the appointed time; when I awoke the next morning I would be scarcely two miles from my final destination. If the clouds were against me, so be it. I had done my part.

I slept pretty well, considering both my excitement and the uneven ground. At daybreak I was surrounded by fog, but by 8 a.m. the fog was lifting, revealing a few spots of blue sky amid low clouds. My choice of shelter was also confirmed; my sleeping bag was dry, and across the road the other piece of machinery I had considered was already in use.

I packed up and started walking west again. The weather seemed uncertain, with three layers of clouds — low stratus, medium cumulus, and high cirrus — crossing over each other. Blue patches would appear, then close up. I trudged to the base of my chosen hill, then followed another dirt track to the top, where I was graced with a lovely view. The rolling landscape of fertile France stretched before me, blotched here and there with sunshine.  Again I had chosen well, better than I realized, as it turned out, for I was not alone on the hill. A Belgian couple had chosen it too — and they had a car…

There I waited. The minutes ticked by. The temperature fluctuated, and the fields changed color, as the Sun played hide and seek. I didn’t need these reminders of the Sun’s importance — that without its heat the Earth would freeze, and without its light, plants would not grow and the cycle of life would quickly end. I thought about how pre-scientific cultures had viewed the Sun. In cultures and religions around the world, the blazing disk has often been credited with divine power and regal authority. And why not? In the past century, we’ve finally learned what the Sun is made from and why it shines. But we are no less in awe than our ancestors, for the Sun is much larger, much older, and much more powerful than most of them imagined.

For a while, I listened to the radio. Crowds were assembling across Europe. Special events — concerts, art shows, contests — were taking place, organized by towns in the zone to coincide with the eclipse. This was hardly surprising. All those tourists had come for totality. But totality is brief, never more than a handful of minutes.  It’s the luck of geometry, the details of the orbits of the Earth and Moon, that set its duration. For my eclipse, the Moon’s shadow was only about a hundred miles wide. Racing along at three thousand miles per hour, it would darken any one location for at most two minutes. Now if a million people are expected to descend on your town for a two-minute event, I suppose it is a good idea to give them something else to do while they wait. And of course, the French cultural establishment loves this kind of opportunity. Multimedia events are their specialty, and they often give commissions to contemporary artists. I was particularly amused to discover later that an old acquaintance of mine — I met him in 1987 at the composers’ entrance exams for the Paris Conservatory — had been commissioned to write an orchestral piece, called “Eclipse,” for the festival in the large city of Reims. It was performed just before the moment of darkness.

Finally, around 11:30, the eclipse began. The Moon nibbled a tiny notch out of the Sun. I looked at it briefly through my eclipse glasses, and felt the first butterflies of anticipation. The Belgian couple, in their late forties, came up to the top of the hill and stood alongside me. They were Flemish, but the man spoke French, and we chatted for a while. It turned out he was a scientist also, and had spent some time in the United States, so we had plenty to talk about. But our discussion kept turning to the clouds, which showed no signs of dissipating. The Sun was often veiled by thin cirrus or completely hidden by thick cumulus. We kept a nervous watch.

Time crawled as the Moon inched across the brilliant disk. It passed the midway point and the Sun became a crescent. With only twenty minutes before totality, my Belgian friends conversed in Dutch. The man turned to me. “We have decided to drive toward that hole in the clouds back to the east,” he said in French. “It’s really not looking so good here. Do you want to come with us?” I paused to think. How far away was that hole? Would we end up back at the town? Would we get caught in traffic? Would we end up somewhere low? What were my chances if I stayed where I was? I hesitated, unsure. If I went with them, I was subject to their whims, not my own. But after looking at the oncoming clouds one more time, I decided my present location was not favorable. I joined them.

We descended the dirt track and turned left onto the road I’d taken so long to walk. It was completely empty. We kept one eye on where we were going and five eyes on the sky. After two miles, the crescent sun became visible through a large gap in the low clouds. There were still high thin clouds slightly veiling it, but the sky around it was a pale blue. We went a bit further, and then stopped… at the very same dirt track where I had slept the night before. A line of ten or fifteen cars now stretched along it, but there was plenty of room for our vehicle.

By now, with ten minutes to go, the light was beginning to change. When only five percent of the Sun remains, your eye can really tell. The blues become deeper, the whites become milkier, and everything is more subdued. Also it becomes noticeably cooler. I’d seen this light before, in New Mexico in 1994. I had gone there to watch an “annular” eclipse of the Sun. An annular eclipse occurs when the Moon passes directly in front of the Sun but is just a bit too far away from the Earth for its shadow to reach the ground. In such an eclipse, the Moon fails to completely block the Sun; a narrow ringlet, or “annulus”, often called the “ring of fire,” remains visible. That day I watched from a mountain top, site of several telescopes, in nearly clear skies. But imagine the dismay of the spectators as the four-and-a-half minutes of annularity were blocked by a five-minute cloud! Fortunately there was a bright spot. For a brief instant — no more than three seconds — the cloud became thin, and a perfect circle of light shone through, too dim to penetrate eclipse glasses but visible with the naked eye… a veiled, surreal vision.

On the dirt track in the middle of French fields, we started counting down the minutes. There was more and more tension in the air. I loaded faster film into my camera. The light became still milkier, and as the crescent became a fingernail, all eyes were focused either on the Sun itself or on a small but thick and dangerous-looking cloud heading straight for it. Except mine. I didn’t care if I saw the last dot of sunlight disappear. What I wanted to watch was the coming of Moon-shadow.

One of my motivations for seeking a hill was that I wanted to observe the approach of darkness. Three thousand miles an hour is just under a mile per second, so if one had a view extending out five miles or so, I thought, one could really see the edge coming. I expected it would be much like watching the shadow of a cloud coming toward me, with the darkness sweeping along the ground, only much darker and faster. I looked to the west and waited for the drama to unfold.

And it did, but it was not what I was expecting. Even though observing the shadow is a common thing for eclipse watchers to do, nothing I had ever read about eclipses prepared me in the slightest for what I was about to witness. I’ve never seen it photographed, or even described. Maybe it was an effect of all the clouds around us. Or maybe others, just as I do, find it difficult to convey.

For how can one relate the sight of daylight sliding swiftly, like a sigh, to deep twilight? of the western sky, seen through scattered clouds, changing seamlessly and inexorably from blue to pink to slate gray to the last yellow of sunset? of colors rising up out of the horizon and spreading across the sky like water from a broken dyke flooding onto a field?

I cannot find the right combination of words to capture the sense of being swept up, of being overwhelmed, of being transfixed with awe, as one might be before the summoning of a great wave or a great wind by the command of a god, yet all in utter silence and great beauty. Reliving it as I write this brings a tear. In the end I have nothing to compare it to.

The great metamorphosis passed. The light stabilized. Shaken, I looked up.

And quickly looked away. I had seen a near-disk of darkness, the fuzzy whiteness of the corona, and some bright dots around the disk’s edge, one especially bright where the Sun still clearly shone through. Accidentally I had seen with my naked eyes the “diamond ring,” a moment when the last brilliant drop of Sun and the glistening corona are simultaneously visible. It’s not safe to look at. I glanced again. Still several bright dots. I glanced again. Still there — but the Sun had to be covered by now…

So I looked longer, and realized that the Sun was indeed covered, that those bright dots were there to stay. There it was. The eclipsed Sun, or rather, the dark disk of the New Moon, surrounded by the Sun’s crown, studded at its edge with seven bright pink jewels. It was bizarre, awe-inspiring, a spooky hallucination. It shimmered.

The Sun’s corona didn’t really resemble what I had seen in photographs, and I could immediately see why. The corona looked as though it were made of glistening white wispy hair, billowing outward like a mop of whiskers. It gleamed with a celestial light, a shine resembling that of well-lit tinsel. No camera could capture that glow, no photograph reproduce it.

But the greatest, most delightful surprise was the seven beautiful gems. I knew they had to be the great eruptions on the surface of the Sun, prominences, huge magnetic storms larger than our planet and more violent than anything else in the solar system. However, nobody ever told me they were bright pink! I always assumed they were orange (silly of me, since the whole Sun looks orange if you look at it through an orange filter, which the photographs always do.) They were arranged almost symmetrically around the Sun, with one of them actually well separated from its surface and halfway out into the lovely soft filaments of the corona. I explored them with my binoculars. The colors, the glistening timbre, the rich detail: it was a visual delight. The scene was living, vibrant, delicate and soft; by comparison, all the photographs and films seemed dry, flat, deadened.

I was surprised at my calm. After the great rush of the shadow, the stasis of totality had caught me off guard.  Around me it was much lighter than I had expected. The sense was of late twilight, with a deep blue-purple sky; yet it was still bright enough to read by. The yellow light of late sunset stretched all the way around the horizon. The planet Venus was visible, but no stars peeked through the clouds. Perhaps longer eclipses have darker skies, a larger Moon-shadow putting daylight further away.

I had scarcely had time to absorb all of this when, just at the halfway point of totality, the dangerous-looking cumulus cloud finally arrived, and blotted out the view. A groan, but only a half-hearted one, emerged from the spectators; after all we’d seen what we’d come to see. I took in the colors emanating from the different parts of the sky, and then looked west again, waiting for the light to return. A thin red glow touched the horizon. I waited. Suddenly the red began to grow furiously. I yelled “Il revient!” — it is returning! — and then watched in awe as the reds became pinks, swarmed over us, turned yellow-white…

And then… it was daylight again. Normality, or a slightly muted version of it. The magical show was over, heavenly love had been consummated, we who had traveled far had been rewarded. The weather had been kind to us. There was a pause as we savored the experience, and waited for our brains to resume functioning. Then congratulations were passed around as people shook hands and hugged each other. I thanked my Belgian friends, who like me were smiling broadly. They offered me a ride back to town. I almost accepted, but stopped short, and instead thanked them again and told them I somehow wanted to be outside for a while longer. We exchanged addresses, said goodbyes, they drove off.

I started retracing my steps from the previous evening. As I walked back to the town of Rethel in the returning sunshine, the immensity of what I had seen began gradually to make its way through my skin into my blood, making me teary-eyed. I thought about myself, a scientist, educated and knowledgeable about the events that had just taken place, and tried to imagine what would have happened to me today if I had not had that knowledge and had found myself, unexpectedly, in the Moon’s shadow.

It was not difficult; I had only to imagine what I would feel if the sky suddenly, without any warning, turned a fiery red instead of blue and began to howl. It would have been a living nightmare. The terror that I would have felt would have penetrated my bones. I would have fallen on my knees in panic; I would have screamed and wept; I would have called on every deity I knew and others I didn’t know for help; I would have despaired; I would have thought death or hell had come; I would have assumed my life was about to end. The two minutes of darkness, filled with the screams and cries of my neighbors, would have been timeless, maddening. When the Sun just as suddenly returned, I would have collapsed onto the ground with relief, profusely and weepingly thanking all of the deities for restoring the world to its former condition, and would have rushed home to relatives and friends, hoping to find some comfort and solace.

I would have sought explanations. I would have been willing to consider anything: dragons eating the Sun, spirits seeking to punish our village or country for its transgressions, evil and spiteful monsters trying to freeze the Earth, gods warning us of terrible things to come in future. But above all, I could never, never have imagined that this brief spine-chilling extinction and transformation of the Sun was a natural phenomenon. Nothing so spectacular and sudden and horrifying could have been the work of mere matter. It would once and for all have convinced me of the existence of creatures greater and more powerful than human beings, if I had previously had any doubt.

And I would have been forever changed. No longer could I have entirely trusted the regularity of days and nights, of seasons, of years. For the rest of my life I would have always found myself glancing at the sky, wanting to make sure that all, for the moment, was well. For if the Sun could suddenly vanish for two minutes, perhaps the next time it could vanish for two hours, or two days… or two centuries. Or forever.

I pondered the impact that eclipses, both solar and lunar, have had throughout human history. They have shaped civilizations. Wars and slaughters were begun and ended on their appearance; they sent ordinary people to their deaths as appeasement sacrifices; new gods and legends were invoked to give meaning to them. The need to predict them, and the coincidences which made their prediction possible, helped give birth to astronomy as a mathematically precise science, in China, in Greece, in modern Europe — developments without which my profession, and even my entire technologically-based culture, might not exist.

It was an hour’s walk to Rethel, but that afternoon it was a long journey. It took me across the globe to nations ancient and distant. By the time I reached the town, I’d communed with my ancestors, reconsidered human history, and examined anew my tiny place in the universe.  If I’d been a bit calm during totality itself, I wasn’t anymore. What I’d seen was gradually filtering, with great potency, into my soul.

I took the train back to Charleville, and slept dreamlessly. The next two days were an opportunity to unwind, to explore, and to eat well. On my last evening I returned to Paris to visit my old haunts. I managed to sneak into the courtyard of the apartment house where I had had a one-room garret up five flights of stairs, with its spartan furnishings and its one window that looked over the roofs of Paris to the Eiffel Tower. I wandered past the old Music Conservatory, since moved to the northeast corner of town, and past the bookstore where I bought so much music. My favorite bakery was still open.

That night I slept in an airport hotel, and the next day flew happily home to the American continent. I never did find my driver’s license.

But psychological closure came already on the day following the eclipse. I spent that day in Laon, a small city perched magnificently atop a rocky hill that rises vertically out of the French plains. I wandered its streets and visited its sights — an attractive church, old houses, pleasant old alleyways, ancient walls and gates. As evening approached I began walking about, looking for a restaurant, and I came to the northwestern edge of town overlooking the new city and the countryside beyond. The clouds had parted, and the Sun, looking large and dull red, was low in the sky. I leaned on the city wall and watched as the turning Earth carried me, and Laon, and all of France, at hundreds of miles an hour, intent on placing itself between me and the Sun. Yet another type of solar eclipse, one we call “sunset.”

The ruddy disk touched the horizon. I remembered the wispy white mane and the brilliant pink jewels. In my mind the Sun had always been grand and powerful, life-giver and taker, essential and dangerous. It could blind, burn, and kill.  I respected it, was impressed and awed by it, gave thanks for it, swore at it, feared it. But in the strange light of totality, I had seen beyond its unforgiving, blazing sphere, and glimpsed a softer side of the Sun. With its feathery hair blowing in a dark sky, it had seemed delicate, even vulnerable. It is, I thought to myself, as mortal as we.

The distant French hills rose across its face. As it waned, I found myself feeling a warmth, even a tenderness — affection for this giant glowing ball of hydrogen, this protector of our planet, this lonely beacon in a vast emptiness… the only star you and I will ever know.


September 06, 2017

BackreactionInterpretations of Quantum Mechanics: The Cat’s View

Something else I made for the book but later removed.

September 02, 2017

Scott AaronsonGapP, Oracles, and Quantum Supremacy

Let me start with a few quick announcements before the main entrée:

First, the website is now live!  Thanks so much to my friend Adam Chalmers for setting it up.  Please try it out on your favorite P vs. NP solution paper—I think you’ll be impressed by how well our secret validation algorithm performs.

Second, some readers might enjoy a YouTube video of me lecturing about the computability theory of closed timelike curves, from the Workshop on Computational Complexity and High Energy Physics at the University of Maryland a month ago.  Other videos from the workshop—including of talks by John Preskill, Daniel Harlow, Stephen Jordan, and other names known around Shtetl-Optimized, and of a panel discussion in which I participated—are worth checking out as well.  Thanks so much to Stephen for organizing such a great workshop!

Third, thanks to everyone who’s emailed to ask whether I’m holding up OK with Hurricane Harvey, and whether I know how to swim (I do).  As it happens, I haven’t been in Texas for two months—I spent most of the summer visiting NYU and doing other travel, and this year, Dana and I are doing an early sabbatical at Tel Aviv University.  However, I understand from friends that Austin, being several hours’ drive further inland, got nothing compared to what Houston did, and that UT is open on schedule for the fall semester.  Hopefully our house is still standing as well!  Our thoughts go to all those affected by the disaster in Houston.  Eventually, the Earth’s rapidly destabilizing climate almost certainly means that Austin will be threatened as well by “500-year events” happening every year or two, as for that matter will a large portion of the earth’s surface.  For now, though, Austin lives to be weird another day.

GapP, Oracles, and Quantum Supremacy

by Scott Aaronson

Stuart Kurtz 60th Birthday Conference, Columbia, South Carolina

August 20, 2017

It’s great to be here, to celebrate the life and work of Stuart Kurtz, which could never be … eclipsed … by anything.

I wanted to say something about work in structural complexity and counting complexity and oracles that Stuart was involved with “back in the day,” and how that work plays a major role in issues that concern us right now in quantum computing.  A major goal for the next few years is the unfortunately-named Quantum Supremacy.  What this means is to get a clear quantum speedup, for some task: not necessarily a useful task, but something that we can be as confident as possible is classically hard.  For example, consider the 49-qubit superconducting chip that Google is planning to fabricate within the next year or so.  This won’t yet be good enough for running Shor’s algorithm, to factor numbers of any interesting size, but it hopefully will be good enough to sample from a probability distribution over n-bit strings—in this case, 49-bit strings—that’s hard to sample from classically, taking somewhere on the order of 2^49 steps.

Furthermore, the evidence that that sort of thing is indeed classically hard, might actually be stronger than the evidence that factoring is classically hard.  As I like to say, a fast classical factoring algorithm would “merely” collapse the world’s electronic commerce—as far as we know, it wouldn’t collapse the polynomial hierarchy!  By contrast, a fast classical algorithm to simulate quantum sampling would collapse the polynomial hierarchy, assuming the simulation is exact.  Let me first go over the argument for that, and then explain some of the more recent things we’ve learned.

Our starting point will be two fundamental complexity classes, #P and GapP.

#P is the class of all nonnegative integer functions f, for which there exists a nondeterministic polynomial-time Turing machine M such that f(x) equals the number of accepting paths of M(x).  Less formally, #P is the class of problems that boil down to summing up an exponential number of nonnegative terms, each of which is efficiently computable individually.

GapP—introduced by Fenner, Fortnow, and Kurtz in 1992—can be defined as the set {f-g : f,g∈#P}; that is, the closure of #P under subtraction.  Equivalently, GapP is the class of problems that boil down to summing up an exponential number of terms, each of which is efficiently computable individually, but which could be either positive or negative, and which can therefore cancel each other out.  As you can see, GapP is a class that in some sense anticipates quantum computing!
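
To make the two definitions concrete, here is a brute-force toy in Python (purely illustrative, not from the talk; the predicate phi is an arbitrary choice): a #P-style sum of nonnegative terms, and the corresponding GapP-style signed sum obtained by letting the terms be ±1.

```python
from itertools import product

# A small, efficiently computable predicate phi : {0,1}^n -> {0,1}.
# (Arbitrary choice for illustration: accept iff the bits have odd parity.)
def phi(bits):
    return sum(bits) % 2

n = 4

# #P-style quantity: an exponential sum of NONNEGATIVE terms.
f = sum(phi(bits) for bits in product([0, 1], repeat=n))

# GapP-style quantity: terms are +1 or -1 and can cancel each other out.
# With s(z) = (-1)^phi(z), the sum equals (#rejecting z) - (#accepting z),
# i.e. a difference of two #P functions.
g = sum((-1) ** phi(bits) for bits in product([0, 1], repeat=n))

print(f, g)
```

For this phi, exactly half of the 16 strings are accepted, so f = 8 while the signed sum cancels to g = 0; that cancellation is precisely what makes GapP the harder class to approximate.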

For our purposes, the most important difference between #P and GapP is that #P functions can at least be multiplicatively approximated in the class BPP^NP, by using Stockmeyer’s technique of approximating counting with universal hash functions.  By contrast, even if you just want to approximate a GapP function to within (say) a factor of 2—or for that matter, just decide whether a GapP function is positive or negative—it’s not hard to see that that’s already a #P-hard problem.  For, supposing we had an oracle to solve this problem, we could then shift the sum this way and that by adding positive and negative dummy terms, and use binary search, to zero in on the sum’s exact value in polynomial time.
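
That binary-search trick can be sketched in a few lines (a hypothetical illustration: the hidden value and its range are made up for the demo, and the oracle stands in for the sign-decision procedure, with the shift t playing the role of the dummy terms):

```python
def recover_gap_value(sign_oracle, lo, hi):
    """Recover the hidden integer F in [lo, hi], given only
    sign_oracle(t) = (F + t > 0), i.e. the sign of the shifted sum."""
    while lo < hi:
        mid = (lo + hi) // 2
        # Asking whether F + (-mid) > 0 is asking whether F > mid.
        if sign_oracle(-mid):
            lo = mid + 1   # F is at least mid + 1
        else:
            hi = mid       # F is at most mid
    return lo

# Demo with a made-up hidden GapP value:
F = -37
print(recover_gap_value(lambda t: F + t > 0, -2**10, 2**10))  # prints -37
```

Polynomially many queries suffice, since each query halves a range that is at most exponential in the input size.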

It’s also not hard to see that a quantum computation can encode an arbitrary GapP function in one of its amplitudes.  Indeed, let s:{0,1}^n→{1,-1} be any Boolean function that’s given by a polynomial-size circuit.  Then consider the quantum circuit that starts in the state |0…0⟩, applies a Hadamard gate to each of the n qubits, applies the diagonal phase |z⟩→s(z)|z⟩, applies a Hadamard to each qubit once more, and then measures in the computational basis.

When we run this circuit, the probability that we see the all-0 string as output is

$$ \left( \frac{1}{2^n} \sum_{z\in \{0,1\}^n} s(z) \right)^2 = \frac{1}{4^n} \sum_{z,w\in \{0,1\}^n} s(z) s(w) $$

which is clearly in GapP, and clearly #P-hard even to approximate to within a multiplicative factor.
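
One can check the all-zeros probability numerically with a tiny state-vector simulation (a sketch in plain Python; the particular s below is an arbitrary ±1-valued example, not from the talk):

```python
from itertools import product
from math import sqrt

n = 3
N = 2 ** n

# An arbitrary example of s : {0,1}^n -> {+1,-1}.
def s(bits):
    return 1 if sum(bits) % 3 == 0 else -1

basis = list(product([0, 1], repeat=n))

# Start in |0...0> and apply a Hadamard to every qubit: uniform superposition.
state = [1 / sqrt(N)] * N

# Apply the diagonal phase |z> -> s(z)|z>.
state = [s(z) * a for z, a in zip(basis, state)]

# Apply a Hadamard to every qubit again and read off the |0...0> amplitude;
# the first row of H^n is all 1/sqrt(N), so this amplitude is sum(state)/sqrt(N).
amp0 = sum(state) / sqrt(N)

p_all_zero = amp0 ** 2
predicted = (sum(s(z) for z in basis) / N) ** 2  # ((1/2^n) sum_z s(z))^2
print(p_all_zero, predicted)
```

With this s, the signed sum over the 8 basis states is 2 - 6 = -4, so the all-zeros probability comes out to (-4/8)^2 = 1/4, matching the formula.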

By contrast, suppose we had a probabilistic polynomial-time classical algorithm, call it M, to sample the output distribution of the above quantum circuit.  Then we could rewrite the above probability as Pr_r[M(r) outputs 0…0], where r consists of the classical random bits used by M.  This is again an exponentially large sum, with one term for each possible r value—but now it’s a sum of nonnegative terms (probabilities), which is therefore approximable in BPP^NP.
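
The contrast can be made concrete with a toy sketch (illustrative only; the sampler M below is a made-up stand-in, not a simulation of the circuit): for any classical sampler, the probability of a fixed outcome is an exponential sum of nonnegative, individually easy-to-compute terms, one per random string r.

```python
from itertools import product

m = 8  # number of classical random bits used by the sampler

# A made-up classical sampler M: maps its random bits r to an output string.
def M(r):
    # Output all-zeros iff the random bits have even parity (arbitrary rule).
    return (0,) * 4 if sum(r) % 2 == 0 else (1,) * 4

# Pr[M outputs 0...0] written as a sum of NONNEGATIVE terms, one per r:
# each term is (1/2^m) * [M(r) = 0...0], and each is easy to compute alone.
p = sum(
    (1 / 2 ** m) * (M(r) == (0,) * 4)
    for r in product([0, 1], repeat=m)
)
print(p)
```

Because every term is nonnegative, Stockmeyer-style approximate counting applies to this sum, which is exactly the leverage the argument exploits.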

We can state the upshot as follows.  Let ExactSampBPP be the class of sampling problems—that is, families of probability distributions {D_x}_x, one for each input x∈{0,1}^n—for which there exists a polynomial-time randomized algorithm that outputs a sample exactly from D_x, in time polynomial in |x|.  Let ExactSampBQP be the same thing except that we allow a polynomial-time quantum algorithm.  Then we have that, if ExactSampBPP = ExactSampBQP, then squared sums of both positive and negative terms could efficiently be rewritten as sums of nonnegative terms only—and hence P^#P = BPP^NP.  This, in turn, would collapse the polynomial hierarchy to the third level, by Toda’s Theorem that PH⊆P^#P, together with the result BPP^NP⊆Σ_3.  To summarize:

Theorem 1.  Quantum computers can efficiently solve exact sampling problems that are classically hard unless the polynomial hierarchy collapses.

(In fact, the argument works not only if the classical algorithm exactly samples Dx, but if it samples from any distribution in which the probabilities are multiplicatively close to Dx‘s.  If we really only care about exact sampling, then we can strengthen the conclusion to get that PH collapses to the second level.)

This sort of reasoning was implicit in several early works, including those of Fenner et al. and Terhal and DiVincenzo.  It was made fully explicit in my paper with Alex Arkhipov on BosonSampling in 2011, and in the independent work of Bremner, Jozsa, and Shepherd on the IQP model.  These works actually showed something stronger, which is that we get a collapse of PH, not merely from a fast classical algorithm to simulate arbitrary quantum systems, but from fast classical algorithms to simulate various special quantum systems.  In the case of BosonSampling, that special system is a collection of identical, non-interacting photons passing through a network of beamsplitters, then being measured at the very end to count the number of photons in each mode.  In the case of IQP, the special system is a collection of qubits that are prepared, subjected to some commuting Hamiltonians acting on various subsets of the qubits, and then measured.  These special systems don’t seem to be capable of universal quantum computation (or for that matter, even universal classical computation!)—and correspondingly, many of them seem easier to realize in the lab than a full universal quantum computer.

From an experimental standpoint, though, all these results are unsatisfactory, because they all talk only about the classical hardness of exact (or very nearly exact) sampling—and indeed, the arguments are based around the hardness of estimating just a single, exponentially-small amplitude.  But any real experiment will have tons of noise and inaccuracy, so it seems only fair to let the classical simulation be subject to serious noise and inaccuracy as well—but as soon as we do, the previous argument collapses.

Thus, from the very beginning, Alex Arkhipov and I took it as our “real” goal to show, under some reasonable assumption, that there’s a distribution D that a polynomial-time quantum algorithm can sample from, but such that no polynomial-time classical algorithm can sample from any distribution that’s even ε-close to D in variation distance.  Indeed, this goal is what led us to BosonSampling in the first place: we knew that we needed amplitudes that were not only #P-hard but “robustly” #P-hard; we knew that the permanent of an n×n matrix (at least over finite fields) was the canonical example of a “robustly” #P-hard function; and finally, we knew that systems of identical non-interacting bosons, such as photons, gave rise to amplitudes that were permanents in an extremely natural way.  The fact that photons actually exist in the physical world, and that our friends with quantum optics labs like to do experiments with them, was just a nice bonus!

A bit more formally, let ApproxSampBPP be the class of sampling problems for which there exists a classical algorithm that, given an input x∈{0,1}n and a parameter ε>0, samples a distribution that’s at most ε away from Dx in variation distance, in time polynomial in n and 1/ε.  Let ApproxSampBQP be the same except that we allow a quantum algorithm.  Then the “dream” result that we’d love to prove—both then and now—is the following.

Strong Quantum Supremacy Conjecture.  If ApproxSampBPP = ApproxSampBQP, then the polynomial hierarchy collapses.

Unfortunately, Alex and I were only able to prove this conjecture assuming a further hypothesis, about the permanents of i.i.d. Gaussian matrices.

Theorem 2 (A.-Arkhipov).  Given an n×n matrix X of independent complex Gaussian entries, each of mean 0 and variance 1, assume it’s a #P-hard problem to approximate |Per(X)|2 to within ±ε⋅n!, with probability at least 1-δ over the choice of X, in time polynomial in n, 1/ε, and 1/δ.  Then the Strong Quantum Supremacy Conjecture holds.  Indeed, more than that: in such a case, even a fast approximate classical simulation of BosonSampling, in particular, would imply P#P=BPPNP and hence a collapse of PH.

Alas, after some months of effort, we were unable to prove the needed #P-hardness result for Gaussian permanents, and it remains an outstanding open problem—there’s not even a consensus as to whether it should be true or false.  Note that there is a famous polynomial-time classical algorithm to approximate the permanents of nonnegative matrices, due to Jerrum, Sinclair, and Vigoda, but that algorithm breaks down for matrices with negative or complex entries.  This is once again the power of cancellations, the difference between #P and GapP.
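For intuition, one can compute such permanents exactly (in exponential time, of course) via Ryser’s formula; a small sketch, with the brute-force definition as a cross-check:

```python
import itertools
import numpy as np

def permanent(A):
    """Permanent via Ryser's formula:
    Per(A) = (-1)^n * sum over column subsets S of
             (-1)^|S| * prod_i (sum of row i restricted to S).
    Runs in O(2^n * n^2) time, versus O(n! * n) for the definition."""
    n = A.shape[0]
    total = 0.0
    for k in range(1, n + 1):
        for S in itertools.combinations(range(n), k):
            row_sums = A[:, list(S)].sum(axis=1)
            total += (-1) ** k * np.prod(row_sums)
    return (-1) ** n * total

# An i.i.d. complex Gaussian matrix, as in the hardness assumption above.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
# Cross-check against the sum-over-permutations definition.
brute = sum(np.prod([X[i, p[i]] for i in range(3)])
            for p in itertools.permutations(range(3)))
assert np.isclose(permanent(X), brute)
```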

Frustratingly, if we want the exact permanents of i.i.d. Gaussian matrices, we were able to prove that that’s #P-hard; and if we want the approximate permanents of arbitrary matrices, we also know that that’s #P-hard—it’s only when we have approximation and random inputs in the same problem that we no longer have the tools to prove #P-hardness.

In the meantime, one can also ask a meta-question.  How hard should it be to prove the Strong Quantum Supremacy Conjecture?  Were we right to look at slightly exotic objects, like the permanents of Gaussian matrices?  Or could Strong Quantum Supremacy have a “pure, abstract complexity theory proof”?

Well, one way to formalize that question is to ask whether Strong Quantum Supremacy has a relativizing proof, a proof that holds in the presence of an arbitrary oracle.  Alex and I explicitly raised that as an open problem in our BosonSampling paper.

Note that “weak” quantum supremacy—i.e., the statement that ExactSampBPP = ExactSampBQP collapses the polynomial hierarchy—has a relativizing proof, namely the proof that I sketched earlier.  All the ingredients that we used—Toda’s Theorem, Stockmeyer approximate counting, simple manipulations of quantum circuits—were relativizing ingredients.  By contrast, all the way back in 1998, Fortnow and Rogers proved the following.

Theorem 3 (Fortnow and Rogers).  There exists an oracle relative to which P=BQP and yet PH is infinite.

In other words, if you want to prove that P=BQP collapses the polynomial hierarchy, the proof can’t be relativizing.  This theorem was subsequently generalized in a paper by Fenner, Fortnow, Kurtz, and Li, which used concepts like “generic oracles” that seem powerful but that I don’t understand.

The trouble is, Fortnow and Rogers’s construction was extremely tailored to making P=BQP.  It didn’t even make PromiseBPP=PromiseBQP (that is, it allowed that quantum computers might still be stronger than classical ones for promise problems), let alone collapse quantum with classical for sampling problems.

We can organize the various quantum/classical collapse possibilities as follows:

ExactSampBPP = ExactSampBQP

⇓

ApproxSampBPP = ApproxSampBQP   ⇔   FBPP = FBQP

⇓

PromiseBPP = PromiseBQP

Here FBPP is the class of relation problems solvable in randomized polynomial time—that is, problems where given an input x∈{0,1}n and a parameter ε>0, the goal is to produce any output in a certain set Sx, with success probability at least 1-ε, in time polynomial in n and 1/ε.  FBQP is the same thing except for quantum polynomial time.

The equivalence between the two equalities ApproxSampBPP = ApproxSampBQP and FBPP=FBQP is not obvious, and was the main result in my 2011 paper The Equivalence of Sampling and Searching.  While it’s easy to see that ApproxSampBPP = ApproxSampBQP implies FBPP=FBQP, the opposite direction requires us to take an arbitrary sampling problem S, and define a relation problem RS that has “essentially the same difficulty” as S (in the sense that RS has an efficient classical algorithm iff S does, RS has an efficient quantum algorithm iff S does, etc.).  This, in turn, we do using Kolmogorov complexity: basically, RS asks us to output a tuple of samples that have large probabilities according to the requisite probability distribution from the sampling problem; and that also, conditioned on that, are close to algorithmically random.  The key observation is that, if a probabilistic Turing machine of fixed size can solve that relation problem for arbitrarily large inputs, then it must be doing so by sampling from a probability distribution close in variation distance to the target distribution Dx—since any other approach would lead to outputs that were algorithmically compressible.

Be that as it may, staring at the chain of implications above, a natural question is which equalities in the chain collapse the polynomial hierarchy in a relativizing way, and which equalities collapse PH (if they do) only for deeper, non-relativizing reasons.

This is one of the questions that Lijie Chen and I took up, and settled, in our paper Complexity-Theoretic Foundations of Quantum Supremacy Experiments, which was presented at this summer’s Computational Complexity Conference (CCC) in Riga.  The “main” results in our paper—or at least, the results that the physicists care about—were about how confident we can be in the classical hardness of simulating quantum sampling experiments with random circuits, such as the experiments that the Google group will hopefully be able to do with its 49-qubit device in the near future.  This involved coming up with a new hardness assumption, which was tailored to those sorts of experiments, and giving a reduction from that new assumption, and studying how far existing algorithms come toward breaking the new assumption (tl;dr: not very far).

But our paper also had what I think of as a “back end,” containing results mainly of interest to complexity theorists, about what kinds of quantum supremacy theorems we can and can’t hope for in principle.  When I’m giving talks about our paper to physicists, I never have time to get to this back end—it’s always just “blah, blah, we also did some stuff involving structural complexity and oracles.”  But given that a large fraction of all the people on earth who enjoy those things are probably right here in this room, in the rest of this talk, I’d like to tell you about what was in the back end.

The first thing there was the following result.

Theorem 4 (A.-Chen).  There exists an oracle relative to which ApproxSampBPP = ApproxSampBQP and yet PH is infinite. In other words, any proof of the Strong Quantum Supremacy Conjecture will require non-relativizing techniques.

Theorem 4 represents a substantial generalization of Fortnow and Rogers’s Theorem 3, in that it makes quantum and classical equivalent not only for promise problems, but even for approximate sampling problems.  There’s also a sense in which Theorem 4 is the best possible: there are no oracles relative to which ExactSampBPP = ExactSampBQP and yet PH is infinite, because, as we already saw, the proof that ExactSampBPP = ExactSampBQP collapses PH relativizes.

So how did we prove Theorem 4?  Well, we learned at this workshop that Stuart Kurtz pioneered the development of principled ways to prove oracle results just like this one, with multiple “nearly conflicting” requirements.  But, because we didn’t know that at the time, we basically just plunged in and built the oracle we wanted by hand!

In more detail, you can think of our oracle construction as proceeding in three steps.

  1. We throw in an oracle for a PSPACE-complete problem.  This collapses ApproxSampBPP with ApproxSampBQP, which is what we want.  Unfortunately, it also collapses the polynomial hierarchy down to P, which is not what we want!
  2. So then we need to add in a second part of the oracle that makes PH infinite again.  From Håstad’s seminal work in the 1980s until recently, even if we just wanted any oracle that makes PH infinite, without doing anything else at the same time, we only knew how to achieve that with quite special oracles.  But in their 2015 breakthrough, Rossman, Servedio, and Tan have shown that even a random oracle makes PH infinite with probability 1.  So for simplicity, we might as well take this second part of the oracle to be random.  The “only” problem is that, along with making PH infinite, a random oracle will also re-separate ApproxSampBPP and ApproxSampBQP (and for that matter, even ExactSampBPP and ExactSampBQP)—for example, because of the Fourier sampling task performed by the quantum circuit I showed you earlier!  So we once again seem back where we started.
    (To ward off confusion: ever since Fortnow and Rogers posed the problem in 1998, it remains frustratingly open whether BPP and BQP can be separated by a random oracle—that’s a problem that I and others have worked on, making partial progress that makes a query complexity separation look unlikely without definitively ruling one out.  But separating the sampling versions of BPP and BQP by a random oracle is much, much easier.)
  3. So, finally, we need to take the random oracle that makes PH infinite, and “scatter its bits around randomly” in such a way that a PH machine can still find the bits, but an ApproxSampBQP machine can’t.  In other words: given our initial random oracle A, we can make a new oracle B such that B(y,r)=(1,A(y)) if r is equal to a single randomly-chosen “password” ry, depending on the query y, and B(y,r)=(0,0) otherwise.  In that case, it takes just one more existential quantifier to guess the password ry, so PH can do it, but a quantum algorithm is stuck, basically because the linearity of quantum mechanics makes the algorithm not very sensitive to tiny random changes to the oracle string (i.e., the same reason why Grover’s algorithm can’t be arbitrarily sped up).  Incidentally, the reason why the password ry needs to depend on the query y is that otherwise the input x to the quantum algorithm could hardcode a password, and thereby reveal exponentially many bits of the random oracle A.
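A toy version of this “scattered” oracle B, with small made-up stand-ins for the random oracle A and the random passwords, might look like:

```python
import secrets

def make_scattered_oracle(A, query_bits, password_bits):
    """Hide each bit A(y) behind a per-query random password r_y:
    B(y, r) = (1, A(y)) if r equals the password for y, and (0, 0) otherwise.
    A PH machine can existentially guess r_y; a quantum algorithm cannot
    find it much faster than Grover search."""
    passwords = {y: secrets.randbits(password_bits)
                 for y in range(2 ** query_bits)}
    def B(y, r):
        return (1, A(y)) if r == passwords[y] else (0, 0)
    return B, passwords

A = lambda y: y % 2            # toy stand-in for the random oracle A
B, passwords = make_scattered_oracle(A, query_bits=4, password_bits=16)
y = 7
assert B(y, passwords[y]) == (1, A(y))   # the right password reveals A(y)
assert B(y, passwords[y] ^ 1) == (0, 0)  # a wrong guess reveals nothing
```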

We should now check: why does the above oracle “only” collapse ApproxSampBPP and ApproxSampBQP?  Why doesn’t it also collapse ExactSampBPP and ExactSampBQP—as we know that it can’t, by our previous argument?  The answer is: because a quantum algorithm does have an exponentially small probability of correctly guessing a given password ry.  And that’s enough to make the distribution sampled by the quantum algorithm differ, by 1/exp(n) in variation distance, from the distribution sampled by any efficient classical simulation of the algorithm—an error that doesn’t matter for approximate sampling, but does matter for exact sampling.

Anyway, it’s then just like seven pages of formalizing the above intuitions and you’re done!

OK, since there seems to be time, I’d like to tell you about one more result from the back end of my and Lijie’s paper.

If we can work relative to whatever oracle A we like, then it’s easy to get quantum supremacy, and indeed BPPA≠BQPA.  We can, for example, use Simon’s problem, or Shor’s period-finding problem, or Forrelation, or other choices of black-box problems that admit huge, provable quantum speedups.  In the unrelativized world, by contrast, it’s clear that we have to make some complexity assumption for quantum supremacy—even if we just want ExactSampBPP ≠ ExactSampBQP.  For if (say) P=P#P, then ExactSampBPP and ExactSampBQP would collapse as well.

Lijie and I were wondering: what happens if we try to “interpolate” between the relativized and unrelativized worlds?  More specifically, what happens if our algorithms are allowed to query a black box, but we’re promised that whatever’s inside the black box is efficiently computable (i.e., has a small circuit)?  How hard is it to separate BPP from BQP, or ApproxSampBPP from ApproxSampBQP, relative to an oracle A that’s constrained to lie in P/poly?

Here, we’ll start with a beautiful observation that’s implicit in 2004 work by Servedio and Gortler, as well as 2012 work by Mark Zhandry.  In our formulation, this observation is as follows:

Theorem 5.  Suppose there exist cryptographic one-way functions (even just against classical adversaries).  Then there exists an oracle A∈P/poly such that BPPA≠BQPA.

While we still need to make a computational hardness assumption here, to separate quantum from classical computing, the surprise is that the assumption is so much weaker than what we’re used to.  We don’t need to assume the hardness of factoring or discrete log—or for that matter, of any “structured” problem that could be a basis for, e.g., public-key cryptography.  Just a one-way function that’s hard to invert, that’s all!

The intuition here is really simple.  Suppose there’s a one-way function; then it’s well-known, by the HILL and GGM Theorems of classical cryptography, that we can bootstrap it to get a cryptographic pseudorandom function family.  This is a family of polynomial-time computable functions fs:{0,1}n→{0,1}n, parameterized by a secret seed s, such that fs can’t be distinguished from a truly random function f by any polynomial-time algorithm that’s given oracle access to the function and that doesn’t know s.  Then, as our efficiently computable oracle A that separates quantum from classical computing, we take an ensemble of functions like

gs,r(x) = fs(x mod r),

where r is an exponentially large integer that serves as a “hidden period,” and s and r are both secrets stored by the oracle that are inaccessible to the algorithm that queries it.
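As a toy illustration, with a hash function as a heuristic (not provably pseudorandom) stand-in for f_s:

```python
import hashlib

def make_oracle(s, r):
    """g_{s,r}(x) = f_s(x mod r), with a hash-based stand-in for f_s.
    The seed s and the period r are secrets held inside the oracle,
    inaccessible to the algorithm that queries g."""
    def f_s(z):
        return hashlib.sha256(s + z.to_bytes(16, "big")).digest()
    def g(x):
        return f_s(x % r)
    return g

s = b"secret seed"
r = 2 ** 40 + 17          # an (illustratively) large hidden period
g = make_oracle(s, r)
assert g(123) == g(123 + r) == g(123 + 5 * r)   # g is r-periodic
assert g(123) != g(124)                          # but not constant
```

Shor’s period-finding algorithm recovers r from query access to g; any classical algorithm that did the same would either work on a truly random f (contradicting classical query lower bounds for period-finding) or distinguish f_s from random (contradicting pseudorandomness).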

The reasoning is now as follows: certainly there’s an efficient quantum algorithm to find r, or to solve some decision problem involving r, which we can use to define a language that’s in BQPA but not in BPPA.  That algorithm is just Shor’s period-finding algorithm!  (Technically, Shor’s algorithm needs certain assumptions on the starting function fs to work—e.g., it couldn’t be a constant function—but if those assumptions aren’t satisfied, then fs wasn’t pseudorandom anyway.)  On the other hand, suppose there were an efficient classical algorithm to find the period r.  In that case, we have a dilemma on our hands: would the classical algorithm still have worked, had we replaced fs by a truly random function?  If so, then the classical algorithm would violate well-known lower bounds on the classical query complexity of period-finding.  But if not, then by working on pseudorandom functions but not on truly random functions, the algorithm would be distinguishing the two—so fs wouldn’t have been a cryptographic pseudorandom function at all, contrary to assumption!

This all caused Lijie and me to wonder whether Theorem 5 could be strengthened even further, so that it wouldn’t use any complexity assumption at all.  In other words, why couldn’t we just prove unconditionally that there’s an oracle A∈P/poly such that BPPA≠BQPA?  By comparison, it’s not hard to see that we can unconditionally construct an oracle A∈P/poly such that PA≠NPA.

Alas, with the following theorem, we were able to explain why BPP vs. BQP (and even ApproxSampBPP vs. ApproxSampBQP) are different from P vs. NP in this respect, and why some computational assumption is still needed to separate quantum from classical, even if we’re working relative to an efficiently computable oracle.

Theorem 6 (A.-Chen).  Suppose that, in the real world, ApproxSampBPP = ApproxSampBQP and NP⊆BPP (granted, these are big assumptions!).  Then ApproxSampBPPA = ApproxSampBQPA for all oracles A∈P/poly.

Taking the contrapositive, this is saying that you can’t separate ApproxSampBPP from ApproxSampBQP relative to an efficiently computable oracle, without separating some complexity classes in the real world.  This contrasts not only with P vs. NP, but even with ExactSampBPP vs. ExactSampBQP, which can be separated unconditionally relative to efficiently computable oracles.

The proof of Theorem 6 is intuitive and appealing.  Not surprisingly, we’re going to heavily exploit the assumptions ApproxSampBPP = ApproxSampBQP and NP⊆BPP.  Let Q be a polynomial-time quantum algorithm that queries an oracle A∈P/poly.  Then we need to simulate Q—and in particular, sample close to the same probability distribution over outputs—using a polynomial-time classical algorithm that queries A.


Let

$$ \sum_{x,w} \alpha_{x,w} \left|x,w\right\rangle $$

be the state of Q immediately before its first query to the oracle A, where x is the input to be submitted to the oracle.  Then our first task is to get a bunch of samples from the probability distribution D={|αx,w|2}x,w, or something close to D in variation distance.  But this is easy to do, using the assumption ApproxSampBPP = ApproxSampBQP.

Let x1,…,xk be our samples from D, marginalized to the x part.  Then next, our classical algorithm queries A on each of x1,…,xk, getting responses A(x1),…,A(xk).  The next step is to search for a function f∈P/poly—or more specifically, a function of whatever fixed polynomial size is relevant—that agrees with A on the sample data, i.e. such that f(xi)=A(xi) for all i∈[k].  This is where we’ll use the assumption NP⊆BPP (together, of course, with the fact that at least one such f exists, namely A itself!), to make the task of finding f efficient.  We’ll also appeal to a fundamental fact about the sample complexity of PAC-learning.  The fact is that, if we find a polynomial-size circuit f that agrees with A on a bunch of sample points drawn independently from a distribution, then f will probably agree with A on most further points drawn from the same distribution as well.

So, OK, we then have a pretty good “mock oracle,” f, that we can substitute for the real oracle on the first query that Q makes.  Of course f and A won’t perfectly agree, but the small fraction of disagreements won’t matter much, again because of the linearity of quantum mechanics (i.e., the same thing that prevents us from speeding up Grover’s algorithm arbitrarily).  So we can basically simulate Q’s first query, and now our classical simulation is good to go until Q’s second query!  But now you can see where this is going: we iterate the same approach, and reuse the same assumptions ApproxSampBPP = ApproxSampBQP and NP⊆BPP, to find a new “mock oracle” that lets us simulate Q’s second query, and so on until all of Q’s queries have been simulated.
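A toy version of the learning step, with parity functions standing in for small circuits and brute-force search standing in for the NP oracle, might look like:

```python
import random

random.seed(1)
n = 8
target_mask = 0b10110001                 # the secret behind the "oracle" A
A = lambda x: bin(x & target_mask).count("1") % 2

# Draw samples from the query distribution and record the oracle's answers.
data = [(x, A(x)) for x in (random.randrange(2 ** n) for _ in range(40))]

# Find ANY hypothesis in the class that agrees with A on the sample data
# (in the proof, this search is what the NP-in-BPP assumption makes fast).
for mask in range(2 ** n):
    f = lambda x, m=mask: bin(x & m).count("1") % 2
    if all(f(x) == y for x, y in data):
        break

# PAC-style guarantee: a hypothesis consistent with enough samples agrees
# with A on most further points drawn from the same distribution.
fresh = [random.randrange(2 ** n) for _ in range(1000)]
agreement = sum(f(x) == A(x) for x in fresh) / len(fresh)
assert agreement > 0.95
```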

OK, I’ll stop there.  I don’t have a clever conclusion or anything.  Thank you.

August 31, 2017

Terence TaoDodgson condensation from Schur complementation

The determinant {\det_n(A)} of an {n \times n} matrix (with coefficients in an arbitrary field) obeys many useful identities, starting of course with the fundamental multiplicativity {\det_n(AB) = \det_n(A) \det_n(B)} for {n \times n} matrices {A,B}. This multiplicativity can in turn be used to establish many further identities; in particular, as shown in this previous post, it implies the Schur determinant identity

\displaystyle  \det_{n+k}\begin{pmatrix} A & B \\ C & D \end{pmatrix} = \det_n(A) \det_k( D - C A^{-1} B ) \ \ \ \ \ (1)

whenever {A} is an invertible {n \times n} matrix, {B} is an {n \times k} matrix, {C} is a {k \times n} matrix, and {D} is a {k \times k} matrix. The matrix {D - CA^{-1} B} is known as the Schur complement of the block {A}.
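The identity (1) is easy to check numerically; a quick sketch with random real blocks (which are invertible with probability 1):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 4, 3
A = rng.normal(size=(n, n))
B = rng.normal(size=(n, k))
C = rng.normal(size=(k, n))
D = rng.normal(size=(k, k))

M = np.block([[A, B], [C, D]])           # the (n+k) x (n+k) block matrix
schur = D - C @ np.linalg.inv(A) @ B     # the Schur complement of the block A
# det(M) = det(A) * det(D - C A^{-1} B)
assert np.isclose(np.linalg.det(M),
                  np.linalg.det(A) * np.linalg.det(schur))
```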

I only recently discovered that this identity in turn immediately implies what I always found to be a somewhat curious identity, namely the Dodgson condensation identity (also known as the Desnanot-Jacobi identity)

\displaystyle  \det_n(M) \det_{n-2}(M^{1,n}_{1,n}) = \det_{n-1}( M^1_1 ) \det_{n-1}(M^n_n)

\displaystyle - \det_{n-1}(M^1_n) \det_{n-1}(M^n_1)

for any {n \geq 3} and {n \times n} matrix {M}, where {M^i_j} denotes the {n-1 \times n-1} matrix formed from {M} by removing the {i^{th}} row and {j^{th}} column, and similarly {M^{i,i'}_{j,j'}} denotes the {n-2 \times n-2} matrix formed from {M} by removing the {i^{th}} and {(i')^{th}} rows and {j^{th}} and {(j')^{th}} columns. Thus for instance when {n=3} we obtain

\displaystyle  \det_3 \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} \cdot e

\displaystyle  = \det_2 \begin{pmatrix} e & f \\ h & i \end{pmatrix} \cdot \det_2 \begin{pmatrix} a & b \\ d & e \end{pmatrix}

\displaystyle  - \det_2 \begin{pmatrix} b & c \\ e & f \end{pmatrix} \cdot \det_2 \begin{pmatrix} d & e \\ g & h \end{pmatrix}

for any scalars {a,b,c,d,e,f,g,h,i}. (Charles Dodgson, better known by his pen name Lewis Carroll, is of course also known for writing “Alice in Wonderland” and “Through the Looking Glass”.)
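A quick numerical sanity check of the condensation identity on a random matrix (indices 0-based in the code):

```python
import numpy as np

def minor(M, rows, cols):
    """Delete the given rows and columns (0-indexed) from M."""
    keep_r = [i for i in range(M.shape[0]) if i not in rows]
    keep_c = [j for j in range(M.shape[1]) if j not in cols]
    return M[np.ix_(keep_r, keep_c)]

det = np.linalg.det
rng = np.random.default_rng(2)
n = 5
M = rng.normal(size=(n, n))
first, last = 0, n - 1

# det(M) det(M^{1,n}_{1,n}) = det(M^1_1) det(M^n_n) - det(M^1_n) det(M^n_1)
lhs = det(M) * det(minor(M, [first, last], [first, last]))
rhs = (det(minor(M, [first], [first])) * det(minor(M, [last], [last]))
       - det(minor(M, [first], [last])) * det(minor(M, [last], [first])))
assert np.isclose(lhs, rhs)
```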

The derivation is not new; it is for instance noted explicitly in this paper of Brualdi and Schneider, though I do not know if this is the earliest place in the literature where it can be found. (EDIT: Apoorva Khare has pointed out to me that the original arguments of Dodgson can be interpreted as implicitly following this derivation.) I thought it was worth presenting the short derivation here, though.

Firstly, by swapping the first and {(n-1)^{th}} rows, and similarly for the columns, it is easy to see that the Dodgson condensation identity is equivalent to the variant

\displaystyle  \det_n(M) \det_{n-2}(M^{n-1,n}_{n-1,n}) = \det_{n-1}( M^{n-1}_{n-1} ) \det_{n-1}(M^n_n) \ \ \ \ \ (2)

\displaystyle  - \det_{n-1}(M^{n-1}_n) \det_{n-1}(M^n_{n-1}).

Now write

\displaystyle  M = \begin{pmatrix} A & B_1 & B_2 \\ C_1 & d_{11} & d_{12} \\ C_2 & d_{21} & d_{22} \end{pmatrix}

where {A} is an {n-2 \times n-2} matrix, {B_1, B_2} are {n-2 \times 1} column vectors, {C_1, C_2} are {1 \times n-2} row vectors, and {d_{11}, d_{12}, d_{21}, d_{22}} are scalars. If {A} is invertible, we may apply the Schur determinant identity repeatedly to conclude that

\displaystyle  \det_n(M) = \det_{n-2}(A) \det_2 \begin{pmatrix} d_{11} - C_1 A^{-1} B_1 & d_{12} - C_1 A^{-1} B_2 \\ d_{21} - C_2 A^{-1} B_1 & d_{22} - C_2 A^{-1} B_2 \end{pmatrix}

\displaystyle  \det_{n-2} (M^{n-1,n}_{n-1,n}) = \det_{n-2}(A)

\displaystyle  \det_{n-1}( M^{n-1}_{n-1} ) = \det_{n-2}(A) (d_{22} - C_2 A^{-1} B_2 )

\displaystyle  \det_{n-1}( M^{n-1}_{n} ) = \det_{n-2}(A) (d_{21} - C_2 A^{-1} B_1 )

\displaystyle  \det_{n-1}( M^{n}_{n-1} ) = \det_{n-2}(A) (d_{12} - C_1 A^{-1} B_2 )

\displaystyle  \det_{n-1}( M^{n}_{n} ) = \det_{n-2}(A) (d_{11} - C_1 A^{-1} B_1 )

and the claim (2) then follows by a brief calculation (and the explicit form {\det_2 \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad-bc} of the {2 \times 2} determinant). To remove the requirement that {A} be invertible, one can use a limiting argument, noting that one can work without loss of generality in an algebraically closed field, and in such a field, the set of invertible matrices is dense in the Zariski topology. (In the case when the scalars are reals or complexes, one can just use density in the ordinary topology instead if desired.)

The same argument gives the more general determinant identity of Sylvester

\displaystyle  \det_n(M) \det_{n-k}(M^S_S)^{k-1} = \det_k \left( \det_{n-k+1}(M^{S \backslash \{i\}}_{S \backslash \{j\}}) \right)_{i,j \in S}

whenever {n > k \geq 1}, {S} is a {k}-element subset of {\{1,\dots,n\}}, and {M^S_{S'}} denotes the matrix formed from {M} by removing the rows associated to {S} and the columns associated to {S'}. (The Dodgson condensation identity is basically the {k=2} case of this identity.)
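Sylvester’s identity can be checked numerically the same way; a sketch with 0-based indices, where minor(M, R, C) deletes the rows in R and the columns in C:

```python
import numpy as np

def minor(M, rows, cols):
    keep_r = [i for i in range(M.shape[0]) if i not in rows]
    keep_c = [j for j in range(M.shape[1]) if j not in cols]
    return M[np.ix_(keep_r, keep_c)]

det = np.linalg.det
rng = np.random.default_rng(3)
n, k = 5, 3
M = rng.normal(size=(n, n))
S = [0, 2, 4]                           # a k-element subset of the indices

# det(M) det(M^S_S)^{k-1} = det_k( det(M^{S\{i}}_{S\{j}}) )_{i,j in S}
lhs = det(M) * det(minor(M, S, S)) ** (k - 1)
inner = [[det(minor(M, [a for a in S if a != i], [b for b in S if b != j]))
          for j in S] for i in S]
rhs = det(np.array(inner))
assert np.isclose(lhs, rhs)
```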

A closely related proof of (2) proceeds by elementary row and column operations. Observe that if one adds some multiple of one of the first {n-2} rows of {M} to one of the last two rows of {M}, then the left and right sides of (2) do not change. If the minor {A} is invertible, this allows one to reduce to the case where the components {C_1,C_2} of the matrix vanish. Similarly, using elementary column operations instead of row operations we may assume that {B_1,B_2} vanish. All matrices involved are now block-diagonal and the identity follows from a routine computation.

The latter approach can also prove the cute identity

\displaystyle  \det_2 \begin{pmatrix} \det_n( X_1, Y_1, A ) & \det_n( X_1, Y_2, A ) \\ \det_n(X_2, Y_1, A) & \det_n(X_2,Y_2, A) \end{pmatrix} = \det_n( X_1,X_2,A) \det_n(Y_1,Y_2,A)

for any {n \geq 2}, any {n \times 1} column vectors {X_1,X_2,Y_1,Y_2}, and any {n \times n-2} matrix {A}, which can for instance be found on page 7 of this text of Karlin. Observe that both sides of this identity are unchanged if one adds some multiple of any column of {A} to one of {X_1,X_2,Y_1,Y_2}; for generic {A}, this allows one to reduce {X_1,X_2,Y_1,Y_2} to have only the first two entries allowed to be non-zero, at which point the determinants split into {2 \times 2} and {n-2 \times n-2} determinants and we can reduce to the {n=2} case (eliminating the role of {A}). One can now either proceed by a direct computation, or by observing that the left-hand side is quadrilinear in {X_1,X_2,Y_1,Y_2} and antisymmetric in {X_1,X_2} and {Y_1,Y_2}, which forces it to be a scalar multiple of {\det_2(X_1,X_2) \det_2(Y_1,Y_2)}, at which point one can test the identity at a single point (e.g. {X_1=Y_1 = e_1} and {X_2=Y_2=e_2} for the standard basis {e_1,e_2}) to conclude the argument. (One can also derive this identity from the Sylvester determinant identity but I think the calculations are a little messier if one goes by that route. Conversely, one can recover the Dodgson condensation identity from Karlin’s identity by setting {X_1=e_1}, {X_2=e_2} (for instance) and then permuting some rows and columns.)
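Karlin’s identity is likewise easy to test numerically; here det_n(X, Y, A) is implemented as the determinant of the n×n matrix whose columns are X, Y, and the columns of A:

```python
import numpy as np

det = np.linalg.det
rng = np.random.default_rng(4)
n = 4
X1, X2, Y1, Y2 = (rng.normal(size=(n, 1)) for _ in range(4))
A = rng.normal(size=(n, n - 2))

def d(u, v):
    """det of the n x n matrix with columns u, v, and the columns of A."""
    return det(np.hstack([u, v, A]))

# det_2 of the 2x2 matrix of bordered determinants, expanded out:
lhs = d(X1, Y1) * d(X2, Y2) - d(X1, Y2) * d(X2, Y1)
rhs = d(X1, X2) * d(Y1, Y2)
assert np.isclose(lhs, rhs)
```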

Filed under: expository, math.RA Tagged: Dodgson condensation, matrix identities, Schur complement

August 30, 2017

BackreactionThe annotated math of (almost) everything

Have you heard of the principle of least action? It’s the most important idea in physics, and it underlies everything. According to this principle, our reality is optimal in a mathematically exact way: it minimizes a function called the “action.” The universe that we find ourselves in is the one for which the action takes on the smallest value. In quantum mechanics, reality isn’t quite that

August 29, 2017

John PreskillDecoding (the allure of) the apparent horizon

I took 32 hours to unravel why Netta Engelhardt’s talk had struck me.

We were participating in Quantum Information in Quantum Gravity III, a workshop hosted by the University of British Columbia (UBC) in Vancouver. Netta studies quantum gravity as a Princeton postdoc. She discussed a feature of black holes—an apparent horizon—I’d not heard of. After hearing of it, I had to grasp it. I peppered Netta with questions three times in the following day. I didn’t understand why, for 32 hours.

After 26 hours, I understood apparent horizons like so.

Imagine standing beside a glass sphere, an empty round shell. Imagine light radiating from a point source in the sphere’s center. Think of the point source as a minuscule flash light. Light rays spill from the point source.

Which paths do the rays follow through space? They fan outward from the sphere’s center, hit the glass, and fan out more. Imagine turning your back to the sphere and looking outward. Light rays diverge as they pass you.

At least, rays diverge in flat space-time. We live in nearly flat space-time. We wouldn’t if we neighbored a supermassive object, like a black hole. Mass curves space-time, as described by Einstein’s theory of general relativity.

Sphere 2

Imagine standing beside the sphere near a black hole. Let the sphere have roughly the black hole’s diameter—around 10 kilometers, according to astrophysical observations. You can’t see much of the sphere. So—imagine—you recruit your high-school-physics classmates. You array yourselves around the sphere, planning to observe light and compare observations. Imagine turning your back to the sphere. Light rays would converge, or flow toward each other. You’d know yourself to be far from Kansas.

Picture you, your classmates, and the sphere falling into the black hole. When would everyone agree that the rays switch from diverging to converging? Sometime after you passed the event horizon, the point of no return.1 Before you reached the singularity, the black hole’s center, where space-time warps infinitely. The rays would switch when you reached an in-between region, the apparent horizon.

Imagine pausing at the apparent horizon with your sphere, facing away from the sphere. Light rays would neither diverge nor converge; they’d point straight. Continue toward the singularity, and the rays would converge. Reverse away from the singularity, and the rays would diverge.


UBC near twilight

Rays diverged from the horizon beyond UBC at twilight. Twilight suits UBC as marble suits the Parthenon; and UBC’s twilight suits musing. You can reflect while gazing on reflections in glass buildings, or reflections in a pool by a rose garden. Your mind can roam as you roam paths lined by elms, oaks, and willows. I wandered while wondering why the sphere intrigued me.

Science thrives on instrumentation. Galileo improved the telescope, which unveiled Jupiter’s moons. Alexander von Humboldt measured temperatures and pressures with thermometers and barometers, charting South America at the turn of the 19th century. The Large Hadron Collider revealed the Higgs particle’s mass in 2012.

The sphere reminded me of a thermometer. As thermometers register temperature, so does the sphere register space-time curvature. Not that you’d need a sphere to distinguish a black hole from Kansas. Nor do you need a thermometer to distinguish Vancouver from a Brazilian jungle. But thermometers quantify the distinction. A sphere would sharpen your observations’ precision.

A sphere and a light source—free of supercolliders, superconductors, and superfridges. The instrument boasts not only profundity, but also simplicity.


Alexander von Humboldt

Netta proved a profound theorem about apparent horizons, with coauthor Aron Wall. Jacob Bekenstein and Stephen Hawking had studied event horizons during the 1970s. An event horizon’s area, Bekenstein and Hawking showed, is proportional to the black hole’s thermodynamic entropy. Netta and Aron proved a proportionality between another area and another entropy.

They calculated an apparent horizon’s area, A. The math that represents their black hole also represents a quantum system, by a duality called AdS/CFT. The quantum system can occupy any of several states. Different states encode different information about the black hole. Consider the information needed to describe, fully and only, the region outside the apparent horizon. Some quantum state \rho encodes this information. \rho encodes no information about the region behind the apparent horizon, closer to the black hole. How would you quantify this lack of information? With the von Neumann entropy S(\rho). This entropy is proportional to the apparent horizon’s area: S(\rho) \propto A.
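The von Neumann entropy can be computed directly from a density matrix’s eigenvalues. Here is a minimal numerical sketch (a generic recipe for illustration, not the calculation in Netta and Aron’s paper):

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log rho), computed from the eigenvalues of rho."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]          # drop zeros: 0 * log 0 = 0 by convention
    return float(-np.sum(evals * np.log(evals)))

# A pure state encodes complete information about its system: zero entropy.
rho_pure = np.array([[1.0, 0.0],
                     [0.0, 0.0]])

# The maximally mixed qubit encodes none: entropy log 2.
rho_mixed = np.eye(2) / 2
```

The entropy is zero exactly when \rho encodes everything there is to know, and grows as information is missing, which is why it serves as the quantifier above.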

Netta and Aron entitled their paper “Decoding the apparent horizon.” Decoding the apparent horizon’s allure took me 32 hours and took me to an edge of campus. But I didn’t mind. Edges and horizons suited my visit as twilight suits UBC. Where can we learn, if not at edges, as where quantum information meets other fields?


With gratitude to Mark van Raamsdonk and UBC for hosting Quantum Information in Quantum Gravity III; to Mark, the other organizers, and the “It from Qubit” Simons Foundation collaboration for the opportunity to participate; and to Netta Engelhardt for sharing her expertise.

1Nothing that draws closer to a black hole than the event horizon can turn around and leave, according to general relativity. The black hole’s gravity pulls too strongly. Quantum mechanics implies that information leaves, though, in Hawking radiation.

August 27, 2017

Scott AaronsonHTTPS / Kurtz / eclipse / Charlottesville / Blum / P vs. NP

This post has a grab bag of topics, unified only by the fact that I can no longer put off blogging about them. So if something doesn’t interest you, just scroll down till you find something that does.

Great news, everyone: following a few reader complaints about the matter, the domain now supports https—and even automatically redirects to it! I’m so proud that Shtetl-Optimized has finally entered the technological universe of 1994. Thanks so much to heroic reader Martin Dehnel-Wild for setting this up for me.

Update 26/08/2017: Comments should now be working again; comments are now coming through to the moderated view in the blog’s control panel, so if they don’t show up immediately it might just be awaiting moderation. Thanks for your patience.

Last weekend, I was in Columbia, South Carolina, for a workshop to honor the 60th birthday of Stuart Kurtz, theoretical computer scientist at the University of Chicago.  I gave a talk about how work Kurtz was involved in from the 1990s—for example, on defining the complexity class GapP, and constructing oracles that satisfy conflicting requirements simultaneously—plays a major role in modern research on quantum computational supremacy: as an example, my recent paper with Lijie Chen.  (Except, what a terrible week to be discussing the paths to supremacy!  I promise there are no tiki torches involved, only much weaker photon sources.)

Coincidentally, I don’t know if you read anything about this on social media, but there was this total solar eclipse that passed right over Columbia at the end of the conference.

I’d always wondered why some people travel to remote corners of the earth to catch these.  So the sky gets dark for two minutes, and then it gets light again, in a way that’s been completely understood and predictable for centuries?

Having seen it, I can now tell you the deal, if you missed it and prefer to read about it here rather than in 10^500 other places online.  At risk of stating the obvious: it’s not the dark sky; it’s the sun’s corona visible around the moon.  Ironically, it’s only when the sun’s blotted out that you can actually look at the sun, at all the weird stuff going on around its disk.

OK, but totality is “only” to eclipses as orgasms are to sex.  There’s also the whole social experience of standing around outside with friends for an hour as the moon gradually takes a bigger bite out of the sun, staring up from time to time with eclipse-glasses to check its progress—and then everyone breaking into applause as the sky finally goes mostly dark, and you can look at the corona with the naked eye.  And then, if you like, standing around for another hour as the moon gradually exits the other way.  (If you’re outside the path of totality, this standing around and checking with eclipse-glasses is the whole experience.)

One cool thing is that, a little before and after totality, shadows on the ground have little crescents in them, as if the eclipse is imprinting its “logo” all over the earth.

For me, the biggest lesson the eclipse drove home was the logarithmic nature of perceived brightness (see also Scott Alexander’s story).  Like, the sun can be more than 90% occluded, and yet it’s barely a shade darker outside.  And you can still only look up with glasses so dark that they blot out everything except the sliver of sun, which still looks pretty much like the normal sun if you catch it out of the corner of your unaided eye.  Only during totality, and a few minutes before and after, is the darkening obvious.
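That logarithmic response can be made rough-and-ready quantitative with the astronomers’ magnitude scale (a sketch assuming a Weber–Fechner-style logarithmic model of perceived brightness; the numbers are illustrative):

```python
import math

def magnitude_drop(visible_fraction):
    """Dimming in astronomical magnitudes when only a fraction of the
    sun's light remains: 5 magnitudes corresponds to a factor of 100 in flux."""
    return -2.5 * math.log10(visible_fraction)

# 90% occlusion leaves 10% of the light: only 2.5 magnitudes of dimming,
# small next to the roughly 10 magnitudes separating full daylight from twilight,
# which is why the sky barely looks darker until just before totality.
drop_at_90_percent = magnitude_drop(0.10)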

Another topic at the workshop, unsurprisingly, was the ongoing darkening of the United States.  If it wasn’t obvious from my blog’s name, and if saying so explicitly will make any difference for anything, let the record state:

Shtetl-Optimized condemns Nazis, as well as anyone who knowingly marches with Nazis or defends them as “fine people.”

For a year, this blog has consistently described the now-president as a thug, liar, traitor, bully, sexual predator, madman, racist, and fraud, and has urged decent people everywhere to fight him by every peaceful and legal means available.  But if there’s some form of condemnation that I accidentally missed, then after Charlottesville, and Trump’s unhinged quasi-defenses of violent neo-Nazis, and defenses of his previous defenses, etc.—please consider Shtetl-Optimized to have condemned Trump that way also.

At least Charlottesville seems to have set local decisionmakers on an unstoppable course toward removing the country’s remaining Confederate statues—something I strongly supported back in May, before it had become the fully thermonuclear issue that it is now.  In an overnight operation, UT Austin has taken down its statues of Robert E. Lee, Albert Johnston, John Reagan, and Stephen Hogg.  (I confess, the postmaster general of the Confederacy wouldn’t have been my #1 priority for removal.  And, genuine question: what did Texas governor Stephen Hogg do that was so awful for his time, besides naming his daughter Ima Hogg?)

A final thing to talk about—yeah, we can’t avoid it—is Norbert Blum’s claimed proof of P≠NP.  I suppose I should be gratified that, after my last post, there were commenters who said, “OK, but enough about gender politics—what about P vs. NP?”  Here’s what I wrote on Tuesday the 15th:

To everyone who keeps asking me about the “new” P≠NP proof: I’d again bet $200,000 that the paper won’t stand, except that the last time I tried that, it didn’t achieve its purpose, which was to get people to stop asking me about it. So: please stop asking, and if the thing hasn’t been refuted by the end of the week, you can come back and tell me I was a closed-minded fool.

Many people misunderstood me to be saying that I’d again bet $200,000, even though the sentence said the exact opposite.  Maybe I should’ve said: I’m searching in vain for the right way to teach the nerd world to get less excited about these claims, to have the same reaction that the experts do, which is ‘oh boy, not another one’—which doesn’t mean that you know the error, or even that there is an error, but just means that you know the history.

Speaking of which, some friends and I recently had an awesome idea.  Just today, I registered the domain name  I’d like to set this up with a form that lets you type in the URL of a paper claiming to resolve the P vs. NP problem.  The site will then take 30 seconds or so to process the paper—with a status bar, progress updates, etc.—before finally rendering a verdict about the paper’s correctness.  Do any readers volunteer to help me create this?  Don’t worry, I’ll supply the secret algorithm to decide correctness, and will personally vouch for that algorithm for as long as the site remains live.

I have nothing bad to say about Norbert Blum, who made important contributions including the 3n circuit size lower bound for an explicit Boolean function—something that stood until very recently as the world record—and whose P≠NP paper was lucidly written, passing many of the most obvious checks.  And I received a bit of criticism for my “dismissive” stance.  Apparently, some right-wing former string theorist who I no longer read, whose name rhymes with Mubos Lotl, even accused me of being a conformist left-wing ideologue, driven to ignore Blum’s proof by an irrational conviction that any P≠NP proof will necessarily be so difficult that it will need to “await the Second Coming of Christ.”  Luca Trevisan’s reaction to that is worth quoting:

I agree with [Mubos Lotl] that the second coming of Jesus Christ is not a necessary condition for a correct proof that P is different from NP. I am keeping an open mind as to whether it is a sufficient condition.

On reflection, though, Mubos has a point: all of us, including me, should keep an open mind.  Maybe P≠NP (or P=NP!) is vastly easier to prove than most experts think, and is susceptible to a “fool’s mate.”

That being the case, it’s only intellectual honesty that compels me to report that, by about Friday of last week—i.e., exactly on my predicted schedule—a clear consensus had developed among experts that Blum’s P≠NP proof was irreparably flawed, and the consensus has stood since that time.

I’ve often wished that, even just for an hour or two, I could be free from this terrifying burden that I’ve carried around since childhood: the burden of having the right instincts about virtually everything.  Trust me, this “gift” is a lot less useful than it sounds, especially when reality so often contradicts what’s popular or expedient to say.

The background to Blum’s attempt, the counterexample that shows the proof has to fail somewhere, and the specifics of what appears to go wrong have already been covered at length elsewhere: see especially Luca’s post, Dick Lipton’s post, John Baez’s post, and the CS Theory StackExchange thread.

Very briefly, though: Blum claims to generalize some of the most celebrated complexity results of the 1980s—namely, superpolynomial lower bounds on the sizes of monotone circuits, which consist entirely of Boolean AND and OR gates—so that they also work for general (non-monotone) circuits, consisting of AND, OR, and NOT gates.  Everyone agrees that, if this succeeded, it would imply P≠NP.

Alas, another big discovery from the 1980s was that there are monotone Boolean functions (like Perfect Matching) that require superpolynomial-size monotone circuits, even though they have polynomial-size non-monotone circuits.  Why is that such a bummer?  Because it means our techniques for proving monotone circuit lower bounds can’t possibly work in as much generality as one might’ve naïvely hoped: if they did, they’d imply not merely that P doesn’t contain NP, but also that P doesn’t contain itself.

Blum was aware of all this, and gave arguments as to why his approach evades the Matching counterexample.  The trouble is, there’s another counterexample, which Blum doesn’t address, called Tardos’s function.  This is a weird creature: it’s obtained by starting with a graph invariant called the Lovász theta function, then looking at a polynomial-time approximation scheme for the theta function, and finally rounding the output of that PTAS to get a monotone function.  But whatever: in constructing this function, Tardos achieved her goal, which was to produce a monotone function that all known lower bound techniques for monotone circuits work perfectly fine for, but which is nevertheless in P (i.e., has polynomial-size non-monotone circuits).  In particular, if Blum’s proof worked, then it would also work for Tardos’s function, and that gives us a contradiction.

Of course, this merely tells us that Blum’s proof must have one or more mistakes; it doesn’t pinpoint where they are.  But the latter question has now been addressed as well.  On CS StackExchange, an anonymous commenter who goes variously by “idolvon” and “vloodin” provides a detailed analysis of the proof of Blum’s crucial Theorem 6.  I haven’t gone through every step myself, and there might be more to say about the matter than “vloodin” has, but several experts who are at once smarter, more knowledgeable, more cautious, and more publicity-shy than me have confirmed for me that vloodin correctly identified the erroneous region.

To those who wonder what gave me the confidence to call this immediately, without working through the details: besides the Cassandra-like burden that I was born with, I can explain something that might be helpful.  When Razborov achieved his superpolynomial monotone lower bounds in the 1980s, there was a brief surge of excitement: how far away could a P≠NP proof possibly be?  But then people, including Razborov himself, understood much more deeply what was going on—an understanding that was reflected in the theorems they proved, but also wasn’t completely captured by those theorems.

What was going on was this: monotone circuits are an interesting and nontrivial computational model.  Indeed for certain Boolean functions, such as the “slice functions,” they’re every bit as powerful as general circuits.  However, insofar as it’s possible to prove superpolynomial lower bounds on monotone circuit size, it’s possible only because monotone circuits are ridiculously less expressive than general Boolean circuits for the problems in question.  E.g., it’s possible only because monotone circuits aren’t expressing pseudorandom functions, and therefore aren’t engaging the natural proofs barrier or most of the other terrifying beasts that we’re up against.

So what can we say about the prospect that a minor tweak to the monotone circuit lower bound techniques from the 1980s would yield P≠NP?  If, like Mubos Lotl, you took the view that discrete math and theory of computation are just a mess of disconnected, random statements, then such a prospect would seem as likely to you as not.  But if you’re armed with the understanding above, then this possibility is a lot like the possibility that the OPERA experiment discovered superluminal neutrinos: no, not a logical impossibility, but something that’s safe to bet against at 10,000:1 odds.

During the discussion of Deolalikar’s earlier P≠NP claim, I once compared betting against a proof that all sorts of people are calling “formidable,” “solid,” etc., to standing in front of a huge pendulum—behind the furthest point that it reached the last time—even as it swings toward your face.  Just as certain physics teachers stake their lives on the conservation of energy, so I’m willing to stake my academic reputation, again and again, on the conservation of circuit-lower-bound difficulty.  And here I am, alive to tell the tale.

Jordan EllenbergRoad trip to totality

My kids both wanted to see the eclipse and I said “that sounds fun but it’s too far” and I kept thinking about it and thinking about it and finally, Saturday night, I looked inward and asked myself is there really a reason we can’t do this? And the answer was no.  Or rather the answer was “it might be the case that it’s totally impossible to find a place to sleep in the totality zone within 24 hours for a non-insane amount of money, and that would be a reason” so I said, if I can get a room, we’re going.  Hotel Tonight did the rest.  (Not the first time this last-minute hotel app has saved my bacon, by the way.  I don’t use it a lot, but when I need it, it gets the job done.)

Notes on the trip:

  • We got to St. Louis Sunday night; the only sight still open was my favorite one, the Gateway Arch.  The arch is one of those things whose size and physical strangeness a photo really doesn’t capture, like Mt. Rushmore.  It works for me in the same way a Richard Serra sculpture works; it cuts the sky up in a way that doesn’t quite make sense.
  • I thought I was doing this to be a good dad, but in fact the total eclipse was more spectacular than I’d imagined, worth it in its own right.  From the photos I imagined the whole sky going nighttime dark.  But no, it’s more like twilight. That makes it better.  A dark blue sky with a flaming hole in it.
  • Underrated aspect:  the communality of it all.  An experience now rare in everyday life.  You’re in a field with thousands of other people there for the same reason as you, watching the same thing you’re watching.  Like a baseball game!  No radio call can compare with the feeling of jumping up with the crowd for a home run.  You’re just one in an array of sensors, all focused on a sphere briefly suspended in the sky.
  • People thought it was going to be cloudy.  I never read so many weather blogs as I did Monday morning.  Our Hotel Tonight room was in O’Fallon, MO, right at the edge of the totality.  Our original plan was to meet Patrick LaVictoire in Hermann, west of where we were.  But the weather blogs said south, go south, as far as you can.  That was a problem, because at the end of the day we had to drive back north.  We got as far as Festus.  There were still three hours to totality and we thought it might be smart to drive further, maybe even all the way to southern Illinois.  But a guy outside the Comfort Inn with a telescope, who seemed to know what he was doing, told us not to bother, it was a crapshoot either way and we weren’t any better off there than here.  I always trust a man with a telescope.
  • Google Maps (or the Waze buried within Google Maps) not really adequate to handle the surge of traffic after a one-time event.  Its estimates for how long it would take us to traverse I-55 through southern Illinois were … unduly optimistic.  Google sent us off the highway onto back roads, but here’s the thing — it sent the same suggestion to everyone else, which meant that instead of being in a traffic jam on the interstate we were in a traffic jam on a gravel road in the middle of a cornfield.  When Google says “switch to this road, it’ll save you ten minutes,” does it take into account the effect of its own suggestion, broadcast to thousands of cars in the same jam?  My optimization friends tell me this kind of secondary prediction is really hard.  It would have been much better, in retrospect, for us to have chosen a back road at random; if everybody injected stochasticity that way, the traffic would have been better-distributed, you have to figure.  Should Google build that stochasticity into its route suggestions?
  • It became clear around Springfield we weren’t going to get home until well after midnight, so we stopped for the night in David Foster Wallace’s hometown, Normal, IL, fitting, considering we did a supposedly fun thing that turned out to be an actual fun thing which we will hardly ever have the chance to, and thus may never, do again.
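The routing point above, that broadcasting one detour to every driver just moves the jam, can be illustrated with a toy congestion model (entirely hypothetical numbers; real traffic assignment and the secondary-prediction problem are far subtler):

```python
import random

random.seed(0)
N_CARS = 1000

def travel_time(cars_on_road):
    """Hypothetical linear congestion model: more cars, slower road."""
    return 10 + 0.1 * cars_on_road

# Everyone obeys the same broadcast suggestion: all cars pile onto one back road.
herd_time = travel_time(N_CARS)

# Each driver instead picks one of two equivalent back roads at random.
loads = [0, 0]
for _ in range(N_CARS):
    loads[random.randrange(2)] += 1
worst_random_time = travel_time(max(loads))

# The randomized crowd splits roughly 50/50, so even the worse-off road
# carries only about half the traffic, and everyone moves faster.
```

This is the sense in which injected stochasticity spreads the load: no coordination is needed, just independent coin flips.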



August 26, 2017

Jordan EllenbergNot to exceed 25%

Supreme Court will hear a math case!

At issue in Murphy v. Smith:  the amount of a judgment that a court can apply to covering attorney’s fees.  Here’s the relevant statute:

Whenever a monetary judgment is awarded in an action described in paragraph (1), a portion of the judgment (not to exceed 25 percent) shall be applied to satisfy the amount of attorney’s fees awarded against the defendant.

To be clear: there are two amounts of money here.  The first is the amount of attorney’s fees awarded against the defendant; the second is the portion of the judgment which the court applies towards that first amount.  This case concerns the discretion of the court to decide on the second number.

In Murphy’s case, the court decided to apply just 10% of the judgment to attorney’s fees.  Other circuit courts have licensed this practice, interpreting the law to allow the court discretion to apply any portion between 0 and 25% of the judgment to attorney’s fees.  The 7th circuit disagreed, saying that, given that the amount of attorney’s fees awarded exceeded 25% of the judgment, the court was obligated to apply the full 25% maximum.

The cert petition to the Supreme Court hammers this view, which it calls “non-literal”:

The Seventh Circuit is simply wrong in interpreting this language to mean “exactly 25 percent.” “Statutory interpretation, as we always say, begins with the text.” Ross v. Blake, 136 S. Ct. 1850, 1856 (2016). Here, the text is so clear that interpretation should end with the text as well. “Not to exceed” does not mean “exactly.”

This seems pretty clearly correct:  “not to exceed 25%” means what it means, not “exactly 25%.”  So the 7th circuit just blew it, right?

Nope!  The 7th circuit is right, the other circuits and the cert are wrong, and the Supreme Court should affirm.  At least that’s what I say.  Here’s why.

I can imagine at least three interpretations of the statute.

  1.  The court has to apply exactly 25% of the judgment to attorney’s fees.
  2.  The court has to apply the smaller of the following numbers:  the total amount awarded in attorney’s fees, or 25% of the judgment.
  3.  The court has full discretion to apply any nonnegative amount of the judgment to attorney’s fees.

Cert holds that 3 is correct, that the 7th circuit applied 1, and that 1 is absurdly wrong.  In fact, the 7th circuit applied 2, which is correct, and 1 and 3 are both wrong.

1 is wrong:  1 is wrong for two reasons.  One is pointed out by the cert petition:  “Not to exceed 25%” doesn’t mean “Exactly 25%.”  Another reason is that “Exactly 25%” might be more than the amount awarded in attorney’s fees, in which case it would be ridiculous to apply more money than was actually owed.

7th circuit applied 2, not 1:  The opinion reads:

In Johnson v. Daley, 339 F.3d 582, 585 (7th Cir. 2003) (en banc), we explained that § 1997e(d)(2) required that “attorneys’ compensation come[] first from the damages.” “[O]nly  if 25% of the award is inadequate to compensate counsel fully” does the defendant contribute more to the fees. Id. We continue to believe that is the most natural reading of the statutory text. We do not think the statute contemplated a discretionary decision by the district court. The statute neither uses discretionary language nor provides any guidance for such discretion.

The attorney’s compensation comes first out of the damages, but if that compensation is less than 25% of the damages, then less than 25% of the damages will be applied.  This is interpretation 2.  In the case at hand, 25% of the damages was $76,933.46, while the attorney’s fees awarded were $108,446.54.  So, in this case, the results of applying 1 and 2 are the same; but the court’s interpretation is clearly 2, not the absurd 1.
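Interpretation 2 is just the minimum of two dollar amounts. A quick sketch with the figures from the case (the judgment total is back-computed here from the 25% figure quoted above):

```python
def portion_applied(judgment, fees_awarded):
    """Interpretation 2: apply the smaller of the fee award
    and 25% of the judgment toward attorney's fees."""
    return min(fees_awarded, 0.25 * judgment)

# Murphy v. Smith: 25% of the judgment was $76,933.46, so the judgment
# was $307,733.84; the attorney's fees awarded were $108,446.54.
judgment = 307_733.84
fees = 108_446.54
applied = portion_applied(judgment, fees)            # capped at 25% of the judgment

# A smaller fee award fits under the 25% cap and is satisfied in full.
applied_small = portion_applied(judgment, 60_000.00)
```

Because the fee award here exceeds the cap, interpretations 1 and 2 happen to coincide; only a fee award below 25% of the judgment separates them.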

3 is wrong:  Interpretation 3 is on first glance appealing.  Why shouldn’t “a portion of the judgment (not to exceed 25%)” mean any portion satisfying that inequality?  The reason comes later in the statute; that portion is required to “satisfy the amount of attorney’s fees awarded against the defendant.”  To “satisfy” a claim is to pay it in full, not in part.  Circuits that have adopted interpretation 3, as the 8th did in Boesing v. Spiess, are adopting a reading at least as non-literal as the one cert accuses the 7th of.

Of course, in cases like Murphy v. Smith, the two clauses are in conflict:  25% of the judgment is insufficient to satisfy the amount awarded.  In this case, one requirement must bend.  Under interpretation 2, when the two clauses are in conflict, “satisfy” is the one to give way.  The 7th circuit recognizes this, correctly describing the 25% awarded as “toward satisfying the attorney fee the court awarded,” not “satisfying” it.

Under interpretation 3, on the other hand, the requirement to “satisfy” has no force even when it is not in conflict with the first clause.  In other words, they interpret the law as if the word “satisfy” were absent, and the clause read “shall be applied to the amount of attorney’s fees.”

Suppose the attorney’s fees awarded in Murphy had been $60,000.  Under interpretation 3, the court would be free to ignore the requirement to satisfy entirely, and apply only 10% of the judgment to the attorneys, despite the fact that satisfaction was achievable within the statutory 25% limit.

Even worse:  imagine that the statute didn’t have the parenthetical, and said just

Whenever a monetary judgment is awarded in an action described in paragraph (1), a portion of the judgment shall be applied to satisfy the amount of attorney’s fees awarded against the defendant.

It would be crystal clear that the court was required to apply $60,000, the amount necessary to satisfy the award.  On interpretation 3, the further constraint imposed by the statute gives the court more discretion rather than less in a case like this one!  This can’t be right.

You could imagine switching to an interpretation 3′, in which the court is required to satisfy the amount awarded if it can do so without breaking the 25% limit, but is otherwise totally unconstrained.  Under this theory, an increase in award from $60,000 to $100,000 lessens the amount the court is required to contribute — indeed, lessens it to essentially zero.  This also can’t be right.


2 is right:  When two clauses of a statute can’t simultaneously be satisfied, the court’s job is to find some balance which satisfies each requirement to the greatest extent possible in a range of possible cases.  Interpretation 2 seems the most reasonable choice.  The Supreme Court should recognize that, contra the cert petition, this is the interpretation actually adopted by the 7th Circuit, and should go along with it.



Terence TaoAn addendum to “amplification, arbitrage, and the tensor power trick”

In one of the earliest posts on this blog, I talked about the ability to “arbitrage” a disparity of symmetry in an inequality, and in particular to “amplify” such an inequality into a stronger one. (The principle can apply to other mathematical statements than inequalities, with the “hypothesis” and “conclusion” of that statement generally playing the role of the “right-hand side” and “left-hand side” of an inequality, but for sake of discussion I will restrict attention here to inequalities.) One can formalise this principle as follows. Many inequalities in analysis can be expressed in the form

\displaystyle A(f) \leq B(f) \ \ \ \ \ (1)

for all {f} in some space {X} (in many cases {X} will be a function space, and {f} a function in that space), where {A(f)} and {B(f)} are some functionals of {f} (that is to say, real-valued functions of {f}). For instance, {B(f)} might be some function space norm of {f} (e.g. an {L^p} norm), and {A(f)} might be some function space norm of some transform of {f}. In addition, we assume we have some group {G} of symmetries {T: X \rightarrow X} acting on the underlying space. For instance, if {X} is a space of functions on some spatial domain, the group might consist of translations (e.g. {Tf(x) = f(x-h)} for some shift {h}), or perhaps dilations with some normalisation (e.g. {Tf(x) = \frac{1}{\lambda^\alpha} f(\frac{x}{\lambda})} for some dilation factor {\lambda > 0} and some normalisation exponent {\alpha \in {\bf R}}, which can be thought of as the dimensionality of length one is assigning to {f}). If we have

\displaystyle A(Tf) = A(f)

for all symmetries {T \in G} and all {f \in X}, we say that {A} is invariant with respect to the symmetries in {G}; otherwise, it is not.

Suppose we know that the inequality (1) holds for all {f \in X}, but that there is an imbalance of symmetry: either {A} is {G}-invariant and {B} is not, or vice versa. Suppose first that {A} is {G}-invariant and {B} is not. Substituting {f} by {Tf} in (1) and taking infima, we can then amplify (1) to the stronger inequality

\displaystyle A(f) \leq \inf_{T \in G} B(Tf).

In particular, it is often the case that there is a way to send {T} off to infinity in such a way that the functional {B(Tf)} has a limit {B_\infty(f)}, in which case we obtain the amplification

\displaystyle A(f) \leq B_\infty(f) \ \ \ \ \ (2)

of (1). Note that these amplified inequalities will now be {G}-invariant on both sides (assuming that the way in which we take limits as {T \rightarrow \infty} is itself {G}-invariant, which it often is in practice). Similarly, if {B} is {G}-invariant but {A} is not, we may instead amplify (1) to

\displaystyle \sup_{T \in G} A(Tf) \leq B(f)

and in particular (if {A(Tf)} has a limit {A_\infty(f)} as {T \rightarrow \infty})

\displaystyle A_\infty(f) \leq B(f). \ \ \ \ \ (3)

If neither {A(f)} nor {B(f)} has a {G}-symmetry, one can still use the {G}-symmetry by replacing {f} by {Tf} and taking a limit to conclude that

\displaystyle A_\infty(f) \leq B_\infty(f),

though now this inequality is not obviously stronger than the original inequality (1) (for instance it could well be trivial). In some cases one can also average over {G} instead of taking a limit as {T \rightarrow \infty}, thus averaging a non-invariant inequality into an invariant one.

As discussed in the previous post, this use of amplification gives rise to a general principle about inequalities: the most efficient inequalities are those in which the left-hand side and right-hand side enjoy the same symmetries. It is certainly possible to have true inequalities that have an imbalance of symmetry, but as shown above, such inequalities can always be amplified to more efficient and more symmetric inequalities. In the case when limits such as {A_\infty} and {B_\infty} exist, the limiting functionals {A_\infty(f)} and {B_\infty(f)} are often simpler in form, or more tractable analytically, than their non-limiting counterparts {A(f)} and {B(f)} (this is one of the main reasons why we take limits at infinity in the first place!), and so in many applications there is really no reason to use the weaker and more complicated inequality (1), when stronger, simpler, and more symmetric inequalities such as (2), (3) are available. Among other things, this explains why many of the most useful and natural inequalities one sees in analysis are dimensionally consistent.
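A concrete instance of this amplification, in the spirit of the earlier post, uses the scaling symmetry {(f,g) \mapsto (\lambda f, \lambda^{-1} g)} on a real inner product space (sketched here in LaTeX):

```latex
% Start from the elementary bound, a consequence of 0 \le \|f - g\|^2:
%   \langle f, g \rangle \le \tfrac{1}{2}\|f\|^2 + \tfrac{1}{2}\|g\|^2.
% The left-hand side is invariant under (f,g) \mapsto (\lambda f, \lambda^{-1} g)
% for \lambda > 0; the right-hand side is not. Substituting and taking the
% infimum over \lambda amplifies the bound to the Cauchy--Schwarz inequality:
\[
  \langle f, g \rangle
    \;\le\; \inf_{\lambda > 0}
      \left( \frac{\lambda^2}{2} \|f\|^2 + \frac{1}{2\lambda^2} \|g\|^2 \right)
    \;=\; \|f\| \, \|g\|,
\]
% the infimum being attained at \lambda^2 = \|g\| / \|f\| when f, g \neq 0.
```

Note that both sides of the amplified inequality are now invariant under the scaling, in line with the principle just stated.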

One often tries to prove inequalities (1) by directly chaining together simpler inequalities. For instance, one might attempt to prove (1) by first bounding {A(f)} by some auxiliary quantity {C(f)}, and then bounding {C(f)} by {B(f)}, thus obtaining (1) by chaining together two inequalities

\displaystyle A(f) \leq C(f) \leq B(f). \ \ \ \ \ (4)

A variant of the above principle then asserts that when proving inequalities by such direct methods, one should, whenever possible, try to maintain the symmetries that are present in both sides of the inequality. Why? Well, suppose that we ignored this principle and tried to prove (1) by establishing (4) for some {C} that is not {G}-invariant. Assuming for sake of argument that (4) were actually true, we could amplify the first half {A(f) \leq C(f)} of this inequality to conclude that

\displaystyle A(f) \leq \inf_{T \in G} C(Tf)

and also amplify the second half {C(f) \leq B(f)} of the inequality to conclude that

\displaystyle \sup_{T \in G} C(Tf) \leq B(f)

and hence (4) amplifies to

\displaystyle A(f) \leq \inf_{T \in G} C(Tf) \leq \sup_{T \in G} C(Tf) \leq B(f). \ \ \ \ \ (5)

Let’s say for sake of argument that all the quantities involved here are positive numbers (which is often the case in analysis). Then we see in particular that

\displaystyle \frac{\sup_{T \in G} C(Tf)}{\inf_{T \in G} C(Tf)} \leq \frac{B(f)}{A(f)}. \ \ \ \ \ (6)

Informally, (6) asserts that in order for the strategy (4) used to prove (1) to work, the extent to which {C} fails to be {G}-invariant cannot exceed the amount of “room” present in (1). In particular, when dealing with those “extremal” {f} for which the left and right-hand sides of (1) are comparable to each other, one can only have a bounded amount of non-{G}-invariance in the functional {C}. If {C} fails so badly to be {G}-invariant that one does not expect the left-hand side of (6) to be at all bounded in such extremal situations, then the strategy of proving (1) using the intermediate quantity {C} is doomed to failure – even if one has already produced some clever proof of one of the two inequalities {A(f) \leq C(f)} or {C(f) \leq B(f)} needed to make this strategy work. And even if it did work, one could amplify (4) to a simpler inequality

\displaystyle A(f) \leq C_\infty(f) \leq B(f) \ \ \ \ \ (7)

(assuming that the appropriate limit {C_\infty(f) = \lim_{T \rightarrow \infty} C(Tf)} existed) which would likely also be easier to prove (one can take whatever proofs one had in mind of the inequalities in (4), conjugate them by {T}, and take a limit as {T \rightarrow \infty} to extract a proof of (7)).
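To make the abstract amplification scheme concrete, here is a small numerical sketch (an illustration of ours, not taken from the post): starting from the elementary bound {\langle f, g \rangle \leq \frac{1}{2}(\|f\|^2 + \|g\|^2)} and taking the infimum of the right-hand side over the scaling group {(f,g) \mapsto (tf, g/t)} recovers the sharp Cauchy-Schwarz bound {\|f\| \|g\|}. The left-hand side is invariant under the scaling while the right-hand side is not, so amplification strengthens the bound to its sharp, scale-invariant form:

```python
import math, random

# random vectors standing in for f and g
random.seed(0)
f = [random.uniform(-1, 1) for _ in range(5)]
g = [random.uniform(-1, 1) for _ in range(5)]

dot = sum(a * b for a, b in zip(f, g))   # A(f,g) = <f,g>, invariant under scaling
nf = math.sqrt(sum(a * a for a in f))    # ||f||
ng = math.sqrt(sum(b * b for b in g))    # ||g||

B = 0.5 * (nf ** 2 + ng ** 2)            # the unamplified right-hand side
# amplify: take the infimum of B over the scaling group (f, g) -> (t f, g / t)
B_amp = min(0.5 * ((t * nf) ** 2 + (ng / t) ** 2)
            for t in (0.1 + 0.001 * k for k in range(3000)))

print(dot <= B_amp <= B)                 # the amplified bound still holds
print(abs(B_amp - nf * ng) < 1e-3)       # ... and now equals ||f|| ||g|| (sharp)
```

The grid minimum sits near {t = \sqrt{\|g\|/\|f\|}}, where the two terms on the right balance; at that point the right-hand side collapses to {\|f\| \|g\|}, which is Cauchy-Schwarz.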

Here are some simple (but somewhat contrived) examples to illustrate these points. Suppose one wishes to prove the inequality

\displaystyle xy \leq x^2 + y^2 \ \ \ \ \ (8)

for all {x,y>0}. Both sides of this inequality are invariant with respect to interchanging {x} with {y}, so the principle suggests that when proving this inequality directly, one should only use sub-inequalities that are also invariant with respect to this interchange. However, in this particular case there is enough “room” in the inequality that it is possible (though somewhat unnatural) to violate this principle. For instance, one could decide (for whatever reason) to start with the inequality

\displaystyle 0 \leq (x - y/2)^2 = x^2 - xy + y^2/4

to conclude that

\displaystyle xy \leq x^2 + y^2/4

and then use the obvious inequality {x^2 + y^2/4 \leq x^2+y^2} to conclude the proof. Here, the intermediate quantity {x^2 + y^2/4} is not invariant with respect to interchange of {x} and {y}, but the failure is fairly mild (changing {x} and {y} only modifies the quantity {x^2 + y^2/4} by a multiplicative factor of {4} at most), and disappears completely in the most extremal case {x=y}, which helps explain why one could get away with using this quantity in the proof here. But it would be significantly harder (though still not impossible) to use non-symmetric intermediaries to prove the sharp version

\displaystyle xy \leq \frac{x^2 + y^2}{2}

of (8) (that is to say, the arithmetic mean-geometric mean inequality). Try it!
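For contrast (and without spoiling the exercise, which asks for non-symmetric intermediaries), the fully symmetric route to the sharp inequality starts from the interchange-invariant inequality

\displaystyle 0 \leq (x-y)^2 = x^2 - 2xy + y^2

and rearranges directly to {xy \leq \frac{x^2+y^2}{2}}; every quantity appearing in this argument is invariant under swapping {x} and {y}, in accordance with the principle.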

Similarly, consider the task of proving the triangle inequality

\displaystyle |z+w| \leq |z| + |w| \ \ \ \ \ (9)

for complex numbers {z, w}. One could try to leverage the triangle inequality {|x+y| \leq |x| + |y|} for real numbers by using the crude estimate

\displaystyle |z+w| \leq |\hbox{Re}(z+w)| + |\hbox{Im}(z+w)|

and then use the real triangle inequality to obtain

\displaystyle |\hbox{Re}(z+w)| \leq |\hbox{Re}(z)| + |\hbox{Re}(w)|


and

\displaystyle |\hbox{Im}(z+w)| \leq |\hbox{Im}(z)| + |\hbox{Im}(w)|

and then finally use the inequalities

\displaystyle |\hbox{Re}(z)|, |\hbox{Im}(z)| \leq |z| \ \ \ \ \ (10)


and

\displaystyle |\hbox{Re}(w)|, |\hbox{Im}(w)| \leq |w| \ \ \ \ \ (11)

but when one puts this all together at the end of the day, one loses a factor of two:

\displaystyle |z+w| \leq 2(|z| + |w|).

One can “blame” this loss on the fact that while the original inequality (9) was invariant with respect to phase rotation {(z,w) \mapsto (e^{i\theta} z, e^{i\theta} w)}, the intermediate expressions we tried to use when proving it were not, leading to inefficient estimates. One can try to be smarter than this by using Pythagoras’ theorem {|z|^2 = |\hbox{Re}(z)|^2 + |\hbox{Im}(z)|^2}; this reduces the loss from {2} to {\sqrt{2}} but does not eliminate it completely, which is to be expected as one is still using non-invariant estimates in the proof. But one can remove the loss completely by using amplification; see the previous blog post for details (we also give a reformulation of this amplification below).
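A quick numerical illustration of the loss (a sketch of ours, plain Python): at the extremal configuration {z = w = e^{i\pi/4}}, the very first step of the crude chain already overshoots the target right-hand side {|z|+|w|}, so no amount of cleverness downstream can recover (9):

```python
import cmath, math

z = w = cmath.exp(1j * math.pi / 4)     # unit modulus, parallel: extremal for (9)

lhs = abs(z + w)                        # |z+w| = 2
target = abs(z) + abs(w)                # sharp right-hand side = 2
crude = abs((z + w).real) + abs((z + w).imag)   # first step of the crude chain

print(lhs, target, crude)               # crude = 2*sqrt(2), already above target
```

Rotating {z} and {w} by {e^{-i\pi/4}} before splitting into real and imaginary parts makes the crude chain lossless for this pair, which is precisely the phase-rotation amplification at work.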

Here is a slight variant of the above example. Suppose that you had just learned in class to prove the triangle inequality

\displaystyle (\sum_{n=1}^\infty |a_n+b_n|^2)^{1/2} \leq (\sum_{n=1}^\infty |a_n|^2)^{1/2} + (\sum_{n=1}^\infty |b_n|^2)^{1/2} \ \ \ \ \ (12)

for (say) real square-summable sequences {(a_n)_{n=1}^\infty}, {(b_n)_{n=1}^\infty}, and was tasked to conclude the corresponding inequality

\displaystyle (\sum_{n \in {\bf Z}} |a_n+b_n|^2)^{1/2} \leq (\sum_{n \in {\bf Z}} |a_n|^2)^{1/2} + (\sum_{n \in {\bf Z}} |b_n|^2)^{1/2} \ \ \ \ \ (13)

for doubly infinite square-summable sequences {(a_n)_{n \in {\bf Z}}, (b_n)_{n \in {\bf Z}}}. The quickest way to do this is of course to exploit a bijection between the natural numbers {1,2,\dots} and the integers, but let us say for sake of argument that one was unaware of such a bijection. One could then proceed instead by splitting the integers into the positive integers and the non-positive integers, and use (12) on each component separately; this is very similar to the strategy of proving (9) by splitting a complex number into real and imaginary parts, and will similarly lose a factor of {2} or {\sqrt{2}}. In this case, one can “blame” this loss on the abandonment of translation invariance: both sides of the inequality (13) are invariant with respect to shifting the sequences {(a_n)_{n \in {\bf Z}}}, {(b_n)_{n \in {\bf Z}}} by some shift {h} to arrive at {(a_{n-h})_{n \in {\bf Z}}, (b_{n-h})_{n \in {\bf Z}}}, but the intermediate quantities caused by splitting the integers into two subsets are not invariant. Another way of thinking about this is that the splitting of the integers gives a privileged role to the origin {n=0}, whereas the inequality (13) treats all values of {n} equally thanks to the translation invariance, and so using such a splitting is unnatural and not likely to lead to optimal estimates. On the other hand, one can deduce (13) from (12) by sending this symmetry to infinity; indeed, after applying a shift to (12) we see that

\displaystyle (\sum_{n=-N}^\infty |a_n+b_n|^2)^{1/2} \leq (\sum_{n=-N}^\infty |a_n|^2)^{1/2} + (\sum_{n=-N}^\infty |b_n|^2)^{1/2}

for any {N}, and on sending {N \rightarrow \infty} we obtain (13) (one could invoke the monotone convergence theorem here to justify the limit, though in this case it is simple enough that one can just use first principles).

Note that the principle of preserving symmetry only applies to direct approaches to proving inequalities such as (1). There is a complementary approach, discussed for instance in this previous post, which is to spend the symmetry to place the variable {f} “without loss of generality” in a “normal form”, “convenient coordinate system”, or a “good gauge”. Abstractly: suppose that there is some subset {Y} of {X} with the property that every {f \in X} can be expressed in the form {f = Tg} for some {T \in G} and {g \in Y} (that is to say, {X = GY}). Then, if one wishes to prove an inequality (1) for all {f \in X}, and one knows that both sides {A(f), B(f)} of this inequality are {G}-invariant, then it suffices to check (1) just for those {f} in {Y}, as this together with the {G}-invariance will imply the same inequality (1) for all {f} in {GY=X}. By restricting to those {f} in {Y}, one has given up (or spent) the {G}-invariance, as the set {Y} will typically not be preserved by the group action {G}. But by the same token, by eliminating the invariance, one also eliminates the prohibition on using non-invariant proof techniques, and one is now free to use a wider range of inequalities in order to try to establish (1). Of course, such inequalities should make crucial use of the restriction {f \in Y}, for if they did not, then the arguments would work in the more general setting {f \in X}, and then the previous principle would again kick in and warn us that the use of non-invariant inequalities would be inefficient. Thus one should “spend” the symmetry wisely to “buy” a restriction {f \in Y} that will be of maximal utility in calculations (for instance by setting as many annoying factors and terms in one’s analysis to be {0} or {1} as possible).

As a simple example of this, let us revisit the complex triangle inequality (9). As already noted, both sides of this inequality are invariant with respect to the phase rotation symmetry {(z,w) \mapsto (e^{i\theta} z, e^{i\theta} w)}. This seems to limit one to using phase-rotation-invariant techniques to establish the inequality, in particular ruling out the use of real and imaginary parts as discussed previously. However, we can instead spend the phase rotation symmetry to restrict to a special class of {z} and {w}. It turns out that the most efficient way to spend the symmetry is to achieve the normalisation of {z+w} being a nonnegative real; this is of course possible since any complex number {z+w} can be turned into a nonnegative real by multiplying by an appropriate phase {e^{i\theta}}. Once {z+w} is a nonnegative real, the imaginary part disappears and we have

\displaystyle |z+w| = \hbox{Re}(z+w) = \hbox{Re}(z) + \hbox{Re}(w),

and the triangle inequality (9) is now an immediate consequence of (10), (11). (But note that if one had unwisely spent the symmetry to normalise, say, {z} to be a non-negative real, then one is no closer to establishing (9) than before one had spent the symmetry.)
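The normalisation step is easy to check numerically; this sketch (ours, standard library only) spends the phase symmetry on random pairs and confirms that the triangle inequality then follows from the real-part bounds (10), (11) alone:

```python
import cmath, random

random.seed(1)
for _ in range(1000):
    z = complex(random.uniform(-1, 1), random.uniform(-1, 1))
    w = complex(random.uniform(-1, 1), random.uniform(-1, 1))
    if abs(z + w) < 1e-9:
        continue
    # spend the symmetry: rotate so that z + w is a nonnegative real
    phase = cmath.exp(-1j * cmath.phase(z + w))
    zr, wr = phase * z, phase * w               # moduli are unchanged
    assert abs((zr + wr).imag) < 1e-9           # imaginary part is gone
    # |z+w| = Re(zr) + Re(wr), and each Re is dominated by the modulus
    assert abs(abs(z + w) - (zr.real + wr.real)) < 1e-9
    assert zr.real <= abs(zr) + 1e-12 and wr.real <= abs(wr) + 1e-12
print("(9) verified in the rotated gauge")
```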

Filed under: math.CA, tricks Tagged: amplification, arbitrage, inequalities

Tommaso DorigoA Narrow Escape From The Cumbre Rucu Volcano

If I am alive, I probably owe it to my current very good physical shape.

That does not mean I narrowly escaped certain death; rather, it means that if I had been slower, there is a good chance I would have been hit by lightning, under arduous conditions, at 4300 meters of altitude.


August 24, 2017

John PreskillTeacher Research at Caltech

The Yeh Lab group’s research activities at Caltech have been instrumental in studying semiconductors and making two-dimensional materials such as graphene, as highlighted on a BBC Horizons show.  

An emerging sub-field of semiconductor and two-dimensional research is that of transition metal dichalcogenide (TMDC) monolayers. In particular, a monolayer of Tungsten disulfide, a TMDC, is believed to exhibit interesting semiconductor properties when exposed to circularly polarized light. My role in the Yeh Lab, as a visiting high school Physics Teacher intern for the Summer of 2017, has been to help research and set up a vacuum chamber to study Tungsten disulfide samples under circularly polarized light.

What makes semiconductors unique is that their conductivity can be controlled by doping or changes in temperature. Higher temperatures or doping can bridge the energy gap between the valence and conduction bands; in other words, electrons can start moving from one side of the material to the other. Like graphene, Tungsten disulfide has a hexagonal, symmetric crystal structure. Monolayers of transition metal dichalcogenides in such a honeycomb structure have two distinct energy valleys, and circularly polarized light can be used to populate one valley rather than the other. This gives a degree of control over the electron population by polarized light.

The Yeh Lab Group prides itself on making in-house the materials and devices needed for research. For example, in order to study high temperature superconductors, the Yeh Group designed and built their own scanning tunneling microscope. When they began researching graphene, instead of buying vast quantities of graphene, they pioneered new ways of fabricating it. This research topic has been no different: Wei-hsiang Lin, a Caltech graduate student, has been busy fabricating Tungsten disulfide samples via chemical vapor deposition (CVD) using Tungsten oxide and sulfur powder.  


Wei-hsiang Lin’s area for using CVD to form the TMDC samples

The first portion of my assignment was spent learning more about vacuum chambers and researching what to order in order to mount our sample inside the chamber. One must determine how the electronic feedthroughs should be attached, how many are necessary, which vacuum pump will be used, and how many flanges and gaskets of each size must be purchased in order to prepare the vacuum chamber.

There were also a number of flanges and parts already in the lab that needed to be examined for possible use. After triple-checking the details, the order was placed with Kurt J. Lesker. With a sufficient amount of anti-seize lubricant and numerous nuts, washers, and bolts, we assembled the vacuum chamber that will hold the TMDC sample.


The original vacuum chamber


Fun in the lab


The prepped vacuum chamber


The second part of my assignment was spent researching how to set up the optics for our experiment and ordering the necessary equipment. Once the experiment is up and running, we will use a milliwatt broad-spectrum light source directed into a monochromator to narrow the light down to specific wavelengths for testing. Ultimately we will be evaluating the wide wavelength range of 300 nm through 1800 nm. After the monochromator, the light will be refocused by a plano-convex lens. Next, the light will pass through a linear polarizer and then a circular polarizer (quarter-wave plate). Lastly, the light will be refocused by a biconvex lens into the vacuum chamber and onto a 1 mm by 1 mm area of the sample.
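For a sense of scale (our back-of-the-envelope aside, not part of the lab write-up): with photon energy {E = hc/\lambda \approx 1239.84 \textrm{ eV} \cdot \textrm{nm} / \lambda}, that 300 nm to 1800 nm sweep covers roughly 0.7 eV to 4.1 eV, comfortably bracketing typical semiconductor band gaps (the direct gap of monolayer Tungsten disulfide is around 2 eV):

```python
HC_EV_NM = 1239.84   # h*c in eV*nm

def photon_energy_ev(wavelength_nm):
    """Photon energy in eV for a vacuum wavelength given in nanometres."""
    return HC_EV_NM / wavelength_nm

print(photon_energy_ev(300))    # ~4.13 eV (near-ultraviolet end of the sweep)
print(photon_energy_ev(1800))   # ~0.69 eV (near-infrared end of the sweep)
```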

Soon, we are excited to verify how tungsten disulfide responds to circularly polarized light.  Does our sample resonate at the exact same wavelengths as the first labs found? Why or why not?  What other unique properties are observed?  How can they be explained?  How is the Hall Effect observed?  What does this mean for the possible applications of semiconductors? How can the transfer of information from one valley to another be used in advanced electronics for communication?  Then, similar exciting experimentation will take place with graphene under circularly polarized light.

I love the sharp contrast of the high-energy, adolescent classroom to the quiet, calm of the lab.  I am grateful for getting to learn a different and new-to-me area of Physics during the summer.  Yes, I remember studying polarization and semiconductors in high school and as an undergraduate.  But it is completely different to set up an experiment from scratch, to be a part of groundbreaking research in these areas.  And it is just fun to get to work with your hands and build research equipment at a world leading research university.  Sometimes Science teachers can get bogged down with all the paperwork and meetings.  I am grateful to have had this fabulous opportunity during the summer to work on applied Science and to be re-energized in my love for Physics.  I look forward to meeting my new batch of students in a few short weeks to share my curiosity and joy for learning how the world works with them.

August 23, 2017

Scott AaronsonAmsterdam art museums plagiarizing my blog?

This past week I had the pleasure of attending COLT (Conference on Learning Theory) 2017 in Amsterdam, and of giving an invited talk on “PAC-Learning and Reconstruction of Quantum States.”  You can see the PowerPoint slides here; videos were also made, but don’t seem to be available yet.

This was my first COLT, but almost certainly not the last.  I learned lots of cool new tidbits, from the expressive power of small-depth neural networks, to a modern theoretical computer science definition of “non-discriminatory” (namely, your learning algorithm’s output should be independent of protected categories like race, sex, etc. after conditioning on the truth you’re trying to predict), to the inapproximability of VC dimension (assuming the Exponential Time Hypothesis).  You can see the full schedule here.  Thanks so much to the PC chairs, Ohad Shamir and Satyen Kale, for inviting me and for putting on a great conference.

And one more thing: I’m not normally big on art museums, but Amsterdam turns out to have two in close proximity to each other—the Rijksmuseum and the Stedelijk—each containing something that Shtetl-Optimized readers might recognize.


Photo credits: Ronald de Wolf and Marijn Heule.

Tommaso DorigoRevenge Of The Slimeballs Part 5: When US Labs Competed For Leadership In HEP

This is the fifth and final part of Chapter 3 of the book "Anomaly! Collider Physics and the Quest for New Phenomena at Fermilab". (The beginning of the chapter is omitted, since it described a different story.) The chapter recounts the pioneering measurement of the Z mass by the CDF detector, and the competition with SLAC during the summer of 1989. The title of the post is the same as that of chapter 3; it refers to the way some SLAC physicists called their Fermilab colleagues, whose hadron collider was to their eyes obviously inferior to the electron-positron linear collider.


August 22, 2017

Chad OrzelThe Age Math Game

I keep falling down on my duty to provide cute-kid content here; I also keep forgetting to post something about a nerdy bit of our morning routine. So, let’s maximize the bird-to-stone ratio, and do them at the same time.

The Pip can be a Morning Dude at times, but SteelyKid is never very happy to get up. So on weekday mornings, we’ve developed a routine to ease the two of them into the day: SteelyKid has a radio alarm, and then I go in and gently shake her out of bed. I usually carry her downstairs to the couch, where she burrows into the cushions a bit; The Pip mostly comes downstairs under his own power, though occasionally he needs a lot of badgering to get him out of bed.

Once on the couch, we play one level of Candy Crush on my phone, often while SteelyKid has a small snack. At the end of this, if we beat the level, we get a leaderboard showing our place among my Facebook friends who play, and also Kate’s ranking (she’s something like a hundred levels ahead of us, so she always has a ranking…).

Once we get those two numbers, we play a math game with them: The kids have to figure out how to combine those two numbers to get their ages (currently five and nine). Allowed operations are all ordinary arithmetic (addition, subtraction, multiplication, division), and also operations between the digits of two-digit numbers. Extra pairs of the starting numbers can be brought in as needed.

So, for example, if we’re in eighth place and Kate’s one spot ahead, the process of getting to 5 would be something like:

“Seven plus eight is fifteen, and one times five is five, then you’re done.”

And to get to nine would be:

“Eight minus seven is one, then add that to another eight, and you get nine.”

SteelyKid learned about square roots at some point, and she’s now a big fan of taking the square root of nine to get a three– so if we end up in second and Kate was seventh, she’ll go for:

“Two plus seven is nine, and the square root of nine is three, then three plus another two is five.”
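For fun, the kids’ head-search can be sketched as a tiny brute-force program (our sketch; the exact operation set, e.g. whether subtraction order matters, is an assumption based on the rules above):

```python
import math
from itertools import combinations

def expand(pool):
    """All numbers reachable in one move from the current pool."""
    new = set(pool)
    for a, b in combinations(pool, 2):
        new.update({a + b, abs(a - b), a * b})
        if b and a % b == 0:
            new.add(a // b)
        if a and b % a == 0:
            new.add(b // a)
    for n in pool:
        if n >= 10:                     # split a multi-digit number into digits
            new.update(int(d) for d in str(n))
        s = math.isqrt(n)
        if s * s == n:                  # exact square roots (SteelyKid's favorite)
            new.add(s)
    return new

def solvable(a, b, target, moves=3):
    """Can `target` be reached from leaderboard places a and b in a few moves?"""
    pool = {a, b}
    for _ in range(moves):
        pool = expand(pool)
        if target in pool:
            return True
    return target in pool

# eighth place, with Kate one spot ahead in seventh:
print(solvable(8, 7, 5), solvable(8, 7, 9))   # True True
```

(Both ages fall out within two moves, matching the spoken solutions above: 7 + 8 = 15 splits into a 1 and a 5, and 8 − 7 = 1 plus another 8 gives 9.)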

I have no recollection of how I started doing this with SteelyKid (it used to be just her, but The Pip decided a few months back that he wanted in on the game), but this works amazingly well to get them to wake up a bit. It’s a nice introduction to math-as-a-game, too, which I hope will serve them well down the line.

And there’s your cute-and-nerdy kid content. Also here’s a bonus photo of the two of them wearing eclipse glasses in preparation for yesterday’s solar spectacle:

Sillyheads modeling eclipse glasses. Photo by Kate Nepveu.

(They were duly impressed by the Sun looking like a crescent moon, up here in 60-odd-percent country. They saw it at day camp; I was waiting for an eye doctor appointment at a local mall, and shared around a set of eclipse glasses with random shoppers and retail workers.)

August 21, 2017

John PreskillTopological qubits: Arriving in 2018?

Editor‘s note: This post was prepared jointly by Ryan Mishmash and Jason Alicea.

Physicists appear to be on the verge of demonstrating proof-of-principle “usefulness” of small quantum computers.  Preskill’s notion of quantum supremacy spotlights a particularly enticing goal: use a quantum device to perform some computation—any computation in fact—that falls beyond the reach of the world’s best classical computers.  Efforts along these lines are being vigorously pursued along many fronts, from academia to large corporations to startups.  IBM’s publicly accessible 16-qubit superconducting device, Google’s pursuit of a 7×7 superconducting qubit array, and the recent synthesis of a 51-qubit quantum simulator using rubidium atoms are a few of many notable highlights.  While the number of qubits obtainable within such “conventional” approaches has steadily risen, synthesizing the first “topological qubit” remains an outstanding goal.  That ceiling may soon crumble however—vaulting topological qubits into a fascinating new chapter in the quest for scalable quantum hardware.

Why topological quantum computing?

As quantum computing progresses from minimalist quantum supremacy demonstrations to attacking real-world problems, hardware demands will naturally steepen.  In, say, a superconducting-qubit architecture, a major source of overhead arises from quantum error correction needed to combat decoherence.  Quantum-error-correction schemes such as the popular surface-code approach encode a single fault-tolerant logical qubit in many physical qubits, perhaps thousands.  The number of physical qubits required for practical applications can thus rapidly balloon.

The dream of topological quantum computing (introduced by Kitaev) is to construct hardware inherently immune to decoherence, thereby mitigating the need for active error correction.  In essence, one seeks physical qubits that by themselves function as good logical qubits.  This lofty objective requires stabilizing exotic phases of matter that harbor emergent particles known as “non-Abelian anyons”.  Crucially, nucleating non-Abelian anyons generates an exponentially large set of ground states that cannot be distinguished from each other by any local measurement.  Topological qubits encode information in those ground states, yielding two key virtues:

(1) Insensitivity to local noise.  For reference, consider a conventional qubit encoded in some two-level system, with the 0 and 1 states split by an energy \hbar \omega.  Local noise sources—e.g., random electric and magnetic fields—cause that splitting to fluctuate stochastically in time, dephasing the qubit.  In practice one can engender immunity against certain environmental perturbations.  One famous example is the transmon qubit (see “Charge-insensitive qubit design derived from the Cooper pair box” by Koch et al.) used extensively at IBM, Google, and elsewhere.  The transmon is a superconducting qubit that cleverly suppresses the effects of charge noise by operating in a regime where Josephson couplings are sizable compared to charging energies.  Transmons remain susceptible, however, to other sources of randomness such as flux noise and critical-current noise.  By contrast, topological qubits embed quantum information in global properties of the system, building in immunity against all local noise sources.  Topological qubits thus realize “perfect” quantum memory.

(2) Perfect gates via braiding.  By exploiting the remarkable phenomenon of non-Abelian statistics, topological qubits further enjoy “perfect” quantum gates: Moving non-Abelian anyons around one another reshuffles the system among the ground states—thereby processing the qubits—in exquisitely precise ways that depend only on coarse properties of the exchange.

Disclaimer: Adjectives like “perfect” should come with the qualifier “up to exponentially small corrections”, a point that we revisit below.

Experimental status

The catch is that systems supporting non-Abelian anyons are not easily found in nature.  One promising topological-qubit implementation exploits exotic 1D superconductors whose ends host “Majorana modes”—novel zero-energy degrees of freedom that underlie non-Abelian-anyon physics.  In 2010, two groups (Lutchyn et al. and Oreg et al.) proposed a laboratory realization that combines semiconducting nanowires, conventional superconductors, and modest magnetic fields.

Since then, the materials-science progress on nanowire-superconductor hybrids has been remarkable.  Researchers can now grow extremely clean, versatile devices featuring various manipulation and readout bells and whistles.  These fabrication advances paved the way for experiments that have reported increasingly detailed Majorana characteristics: tunneling signatures including recent reports of long-sought quantized response, evolution of Majorana modes with system size, mapping out of the phase diagram as a function of external parameters, etc.  Alternate explanations are still being debated though.  Perhaps the most likely culprits are conventional localized fermionic levels (“Andreev bound states”) that can imitate Majorana signatures under certain conditions; see in particular Liu et al.  Still, the collective experimental effort on this problem over the last 5+ years has provided mounting evidence for the existence of Majorana modes.  Revealing their prized quantum-information properties poses a logical next step.

Validating a topological qubit

Ideally one would like to verify both hallmarks of topological qubits noted above—“perfect” insensitivity to local noise and “perfect” gates via braiding.  We will focus on the former property, which can be probed in simpler device architectures.  Intuitively, noise insensitivity should imply long qubit coherence times.  But how do you pinpoint the topological origin of long coherence times, and in any case what exactly qualifies as “long”?

Here is one way to sharply address these questions (for more details, see our work in Aasen et al.).  As alluded to in our disclaimer above, logical 0 and 1 topological-qubit states aren’t exactly degenerate.  In nanowire devices they’re split by an energy \hbar \omega that is exponentially small in the separation distance L between Majorana modes divided by the superconducting coherence length \xi.  Correspondingly, the qubit states are not quite locally indistinguishable either, and hence not perfectly immune to local noise.  Now imagine pulling apart Majorana modes to go from a relatively poor to a perfect topological qubit.  During this process two things transpire in tandem: The topological qubit’s oscillation frequency, \omega, vanishes exponentially while the dephasing time T_2 becomes exponentially long.  That is,

\displaystyle T_2 \propto \frac{1}{\omega}.

This scaling relation could in fact be used as a practical definition of a topologically protected quantum memory.  Importantly, mimicking this property in any non-topological qubit would require some form of divine intervention.  For example, even if one fine-tuned conventional 0 and 1 qubit states (e.g., resulting from the Andreev bound states mentioned above) to be exactly degenerate, local noise could still readily produce dephasing.
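As a toy numerical illustration (our construction, not the modeling in Aasen et al.): suppose parameter noise feeds through to splitting fluctuations proportional to the splitting itself, {\delta\omega \sim \varepsilon \omega}, as the exponential flatness suggests. A Monte-Carlo free-induction-decay estimate of the 1/e coherence time then grows by exactly the exponential factor {e^{(L_2 - L_1)/\xi}} by which the splitting shrinks:

```python
import math, random

def t2_one_over_e(omega, eps=0.05, samples=20_000, seed=1):
    """1/e coherence time of a qubit whose splitting fluctuates shot-to-shot
    as delta ~ N(0, eps*omega); coherence(t) = |average of exp(i*delta*t)|."""
    random.seed(seed)
    deltas = [random.gauss(0.0, eps * omega) for _ in range(samples)]
    t, dt = 0.0, 0.02 / (eps * omega)
    while True:
        t += dt
        re = sum(math.cos(d * t) for d in deltas) / samples
        im = sum(math.sin(d * t) for d in deltas) / samples
        if math.hypot(re, im) < 1.0 / math.e:
            return t

xi = 1.0                                 # coherence length (arbitrary units)
w_short = math.exp(-4.0 / xi)            # splitting for Majorana separation L = 4
w_long = math.exp(-6.0 / xi)             # splitting for separation L = 6
ratio = t2_one_over_e(w_long) / t2_one_over_e(w_short)
print(ratio, w_short / w_long)           # both ~ e^2: T2 scales as 1/omega
```

Because the noise amplitude in this toy model scales with the splitting, the whole decay curve is self-similar in {\omega}, and the dephasing time comes out inversely proportional to the splitting, as in the scaling relation above.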

As discussed in Aasen et al., this topological-qubit scaling relation can be tested experimentally via Ramsey-like protocols in a setup that might look something like the following:


This device contains two adjacent Majorana wires (orange rectangles) with couplings controlled by local gates (“valves” represented by black switches).  Incidentally, the design was inspired by a gate-controlled variation of the transmon pioneered in Larsen et al. and de Lange et al.  In fact, if only charge noise was present, we wouldn’t stand to gain much in the way of coherence times: both the transmon and topological qubit would yield exponentially long T_2 times.  But once again, other noise sources can efficiently dephase the transmon, whereas a topological qubit enjoys exponential protection from all sources of local noise.  Mathematically, this distinction occurs because the splitting for transmon qubit states is exponentially flat only with respect to variations in a “gate offset” n_g.  For the topological qubit, the splitting is exponentially flat with respect to variations in all external parameters (e.g., magnetic field, chemical potential, etc.), so long as Majorana modes still survive.  (By “exponentially flat” we mean constant up to exponentially small deviations.)  Plotting the energies of the qubit states in the two respective cases versus external parameters, the situation can be summarized as follows:


Outlook: Toward “topological quantum ascendancy”

These qubit-validation experiments constitute a small stepping stone toward building a universal topological quantum computer.  Explicitly demonstrating exponentially protected quantum information as discussed above would, nevertheless, go a long way toward establishing practical utility of Majorana-based topological qubits.  One might even view this goal as single-qubit-level “topological quantum ascendancy”.  Completion of this milestone would further set the stage for implementing “perfect” quantum gates, which requires similar capabilities albeit in more complex devices.  Researchers at Microsoft and elsewhere have their sights set on bringing a prototype topological qubit to life in the very near future.  It is not unreasonable to anticipate that 2018 will mark the debut of the topological qubit.  We could of course be off target.  There is, after all, still plenty of time in 2017 to prove us wrong.

August 18, 2017

Scott AaronsonWhat I believe II (ft. Sarah Constantin and Stacey Jeffery)

Unrelated Update: To everyone who keeps asking me about the “new” P≠NP proof: I’d again bet $200,000 that the paper won’t stand, except that the last time I tried that, it didn’t achieve its purpose, which was to get people to stop asking me about it. So: please stop asking, and if the thing hasn’t been refuted by the end of the week, you can come back and tell me I was a closed-minded fool.

In my post “The Kolmogorov Option,” I tried to step back from current controversies, and use history to reflect on the broader question of how nerds should behave when their penchant for speaking unpopular truths collides head-on with their desire to be kind and decent and charitable, and to be judged as such by their culture.  I was gratified to get positive feedback about this approach from men and women all over the ideological spectrum.

However, a few people who I like and respect accused me of “dogwhistling.” They warned, in particular, that if I wouldn’t just come out and say what I thought about the James Damore Google memo thing, then people would assume the very worst—even though, of course, my friends themselves knew better.

So in this post, I’ll come out and say what I think.  But first, I’ll do something even better: I’ll hand the podium over to two friends, Sarah Constantin and Stacey Jeffery, both of whom were kind enough to email me detailed thoughts in response to my Kolmogorov post.

Sarah Constantin completed her PhD in math at Yale. I don’t think I’ve met her in person yet, but we have a huge number of mutual friends in the so-called “rationalist community.”  Whenever Sarah emails me about something I’ve written, I pay extremely close attention, because I have yet to read a single thing by her that wasn’t full of insight and good sense.  I strongly urge anyone who likes her beautiful essay below to check out her blog, which is called Otium.

Sarah Constantin’s Commentary:

I’ve had a women-in-STEM essay brewing in me for years, but I’ve been reluctant to actually write publicly on the topic for fear of stirring up a firestorm of controversy.  On the other hand, we seem to be at a cultural inflection point on the issue, especially in the wake of the leaked Google memo, and other people are already scared to speak out, so I think it’s past time for me to put my name on the line, and Scott has graciously provided me a platform to do so.

I’m a woman in tech myself. I’m a data scientist doing machine learning for drug discovery at Recursion Pharmaceuticals, and before that I was a data scientist at Palantir. Before that I was a woman in math — I got my PhD from Yale, studying applied harmonic analysis. I’ve been in this world all my adult life, and I obviously don’t believe my gender makes me unfit to do the work.

I’m also not under any misapprehension that I’m some sort of exception. I’ve been mentored by Ingrid Daubechies and Maryam Mirzakhani (the first female Fields Medalist, who died tragically young last month).  I’ve been lucky enough to work with women who are far, far better than me.  There are a lot of remarkable women in math and computer science — women just aren’t the majority in those fields. But “not the majority” doesn’t mean “rare” or “unknown.”

I even think diversity programs can be worthwhile. I went to the Institute for Advanced Study’s Women and Math Program, which would be an excellent graduate summer school even if it weren’t all-female, and taught at its sister program for high school girls, which likewise is a great math camp independent of the gender angle. There’s a certain magic, if you’re in a male-dominated field, of once in a while being in a room full of women doing math, and I hope that everybody gets to have that experience once.

But (you knew the “but” was coming), I think the Google memo was largely correct, and the way people conventionally talk about women in tech is wrong.

Let’s look at some of his claims. From the beginning of the memo:

  • Google’s political bias has equated the freedom from offense with psychological safety, but shaming into silence is the antithesis of psychological safety.
  • This silencing has created an ideological echo chamber where some ideas are too sacred to be honestly discussed.
  • The lack of discussion fosters the most extreme and authoritarian elements of this ideology.
  • Extreme: all disparities in representation are due to oppression
  • Authoritarian: we should discriminate to correct for this oppression

Okay, so there’s a pervasive assumption that any deviation from 50% representation of women in technical jobs is a.) due to oppression, and b.) ought to be corrected by differential hiring practices. I think it is basically true that people widely believe this, and that people can lose their jobs for openly contradicting it (as James Damore, the author of the memo, did).  I have heard people I work with advocating hiring quotas for women (i.e. explicitly earmarking a number of jobs for women candidates only).  It’s not a strawman.

Then, Damore disagrees with this assumption:

  • Differences in distributions of traits between men and women may in part explain why we don’t have 50% representation of women in tech and leadership. Discrimination to reach equal representation is unfair, divisive, and bad for business.

Again, I agree with Damore. Note that this doesn’t mean that I must believe that sexism against women isn’t real and important (I’ve heard enough horror stories to be confident that some work environments are toxic to women).  It doesn’t even mean that I must be certain that the different rates of men and women in technical fields are due to genetics.  I’m very far from certain, and I’m not an expert in psychology. I don’t think I can do justice to the science in this post, so I’m not going to cover the research literature.

But I do think it’s irresponsible to assume a priori that there are no innate sex differences that might explain what we see.  It’s an empirical matter, and a topic for research, not dogma.

Moreover, I think discrimination on the basis of sex to reach equal representation is unfair and unproductive.  It’s unfair, because it’s not meritocratic.  You’re not choosing the best human for the job regardless of gender.

I think women might actually benefit from companies giving genuine meritocracy a chance. “Blind” auditions (in which the evaluator doesn’t see the performer) gave women a better chance of landing orchestra jobs; apparently, orchestras were prejudiced against female musicians, and the blinding canceled out that prejudice. Google’s own research has actually shown that the single best predictor of work performance is a work sample — testing candidates with a small project similar to what they’d do on the job. Work samples are easy to anonymize to reduce gender bias, and they’re more effective than traditional interviews, where split-second first impressions usually decide who gets hired, but don’t correlate at all with job performance. A number of tech companies have switched to work samples as part of their interview process.  I used work samples myself when I was hiring for a startup, just because they seemed more accurate at predicting who’d be good at the job; entirely without intending to, I got a 50% gender ratio.  If you want to reduce gender bias in tech, it’s worth at least considering blinded hiring via work samples.

Moreover, thinking about “representation” in science and technology reflects underlying assumptions that I think are quite dangerous.

You expect interest groups to squabble over who gets a piece of the federal budget. In politics, people will band together in blocs, and try to get the biggest piece of the spoils they can.  “Women should get such-and-such a percent of tech jobs” sounds precisely like this kind of politicking; women are assumed to be a unified bloc who will vote together, and the focus is on what size chunk they can negotiate for themselves. If a tech job (or a university position) were a cushy sinecure, a ticket to privilege, and nothing more, you might reasonably ask “how come some people get more goodies than others? Isn’t meritocracy just an excuse to restrict the goodies to your preferred group?”

Again, this is not a strawman. Here’s one Vox response to the memo stating explicitly that she believes women are a unified bloc:

The manifesto’s sleight-of-hand delineation between “women, on average” and the actual living, breathing women who have had to work alongside this guy failed to reassure many of those women — and failed to reassure me. That’s because the manifesto’s author overestimated the extent to which women are willing to be turned against their own gender.

Speaking for myself, it doesn’t matter to me how soothingly a man coos that I’m not like most women, when those coos are accompanied by misogyny against most women. I am a woman. I do not stop being one during the parts of the day when I am practicing my craft. There can be no realistic chance of individual comfort for me in an environment where others in my demographic categories (or, really, any protected demographic categories) are subjected to skepticism and condescension.

She can’t be comfortable unless everybody in any protected demographic category — note that this is a legal, governmental category — is given the benefit of the doubt?  That’s a pretty collectivist commitment!

Or, look at Piper Harron, an assistant professor in math who blogged on the American Mathematical Society’s website that universities should simply “stop hiring white cis men”, and explicitly says “If you are on a hiring committee, and you are looking at applicants and you see a stellar white male applicant, think long and hard about whether your department needs another white man. You are not hiring a researching robot who will output papers from a dark closet. You are hiring an educator, a role model, a spokesperson, an advisor, a committee person … There is no objectivity. There is no meritocracy.”

Piper Harron reflects an extreme, of course, but she’s explicitly saying, on America’s major communication channel for and by mathematicians, that whether you get to work in math should not be based on whether you’re actually good at math. For her, it’s all politics.  Life itself is political, and therefore a zero-sum power struggle between groups.  

But most of us, male or female, didn’t fall in love with science and technology for that. Science is the mission to explore and understand our universe. Technology is the project of expanding human power to shape that universe. What we do towards those goals will live longer than any “protected demographic category”, any nation, any civilization.  We know how the Babylonians mapped the stars.

Women deserve an equal chance at a berth on the journey of exploration not because they form a political bloc but because some of them are discoverers and can contribute to the human mission.

Maybe, in a world corrupted by rent-seeking, the majority of well-paying jobs have some element of unearned privilege; perhaps almost all of us got at least part of our salaries by indirectly expropriating someone who had as good a right to it as us.

But that’s not a good thing, and that’s not what we hope for science and engineering to be, and I truly believe that this is not the inevitable fate of the human race — that we can only squabble over scraps, and never create.  

I’ve seen creation, and I’ve seen discovery. I know they’re real.

I care a lot more about whether my company achieves its goal of curing 100 rare diseases in 10 years than about the demographic makeup of our team.  We have an actual mission; we are trying to do something beyond collecting spoils.  

Do I rely on brilliant work by other women every day? I do. My respect for myself and my female colleagues is not incompatible with primarily caring about the mission.

Am I “turning against my own gender” because I see women as individuals first? I don’t think so. We’re half the human race, for Pete’s sake! We’re diverse. We disagree. We’re human.

When you think of “women-in-STEM” as a talking point on a political agenda, you mention Ada Lovelace and Grace Hopper in passing, and move on to talking about quotas.  When you think of women as individuals, you start to notice how many genuinely foundational advances were made by women — just in my own field of machine learning, Adele Cutler co-invented random forests, Corinna Cortes co-invented support vector machines, and Fei-Fei Li created the famous ImageNet benchmark dataset that started a revolution in image recognition.

As a child, my favorite book was Carl Sagan’s Contact, a novel about Ellie Arroway, an astronomer loosely based on his wife Ann Druyan. The name is not an accident; like the title character in Sinclair Lewis’ Arrowsmith, Ellie is a truth-seeking scientist who battles corruption, anti-intellectualism, and blind prejudice.  Sexism is one of the challenges she faces, but the essence of her life is about wonder and curiosity. She’s what I’ve always tried to become.

I hope that, in seeking to encourage the world’s Ellies in science and technology, we remember why we’re doing that in the first place. I hope we remember humans are explorers.

Now let’s hear from another friend who wrote to me recently, and who has a slightly different take.  Stacey Jeffery is a quantum computing theorist at one of my favorite research centers, CWI in Amsterdam.  She completed her PhD at University of Waterloo, and has done wonderful work on quantum query complexity and other topics close to my heart.  When I was being viciously attacked in the comment-171 affair, Stacey was one of the first people to send me a note of support, and I’ve never forgotten it.

Stacey Jeffery’s Commentary

I don’t think Google was right to fire Damore. This makes me a minority among people with whom I have discussed this issue.  Hopefully some people come out in the comments in support of the other position, so it’s not just me presenting that view, but the main argument I encountered was that what he said just sounded way too sexist for Google to put up with.  I agree with part of that, it did sound sexist to me.  In fact it also sounded racist to me. But that’s not because he necessarily said anything actually sexist or actually racist, but because he said the kinds of things that you usually only hear from sexist people, and in particular, the kind of sexist people who are also racist.  I’m very unlikely to try to pursue further interaction with a person who says these kinds of things for those reasons, but I think firing him for what he said between the lines sets a very bad precedent.  It seems to me he was fired for associating himself with the wrong ideas, and it does feel a bit like certain subjects are not up for rational discussion.  If Google wants an open environment, where employees can feel safe discussing company policy, I don’t think this contributes to that.  If they want their employees, and the world, to think that they aim for diversity because it’s the most rational course of action to achieve their overall objectives, rather than because it serves some secret agenda, like maintaining a PC public image, then I don’t think they’ve served that cause either.  Personally, this irritates me the most, because I feel they have damaged the image for a cause I feel strongly about.

My position is independent of the validity of Damore’s attempt at scientific argument, which is outside my area of expertise.  I personally don’t think it’s very productive for non-social-scientists to take authoritative positions on social science issues, especially ones that appear to be controversial within the field (but I say this as a layperson).  This may include some of the other commentary in this blog post, which I have not yet read, and might even extend to Scott’s decision to comment on this issue at all (but this bridge was crossed in the previous blog post).  However, I think one of the reasons that many of us do this is that the burden of solving the problem of too few women in STEM is often placed on us.  Some people in STEM feel they are blamed for not being welcoming enough to women (in fact, in my specific field, it’s my experience that the majority of people are very sympathetic).  Many scientific funding applications even ask applicants how they plan to address the issue of diversity, as if they should be the ones to come up with a solution for this difficult problem that nobody knows the answer to, and is not even within their expertise.  So it’s not surprising when these same people start to think about and form opinions on these social science issues.  Obviously, we working in STEM have valuable insight into how we might encourage women to pursue STEM careers, and we should be pushed to think about this, but we don’t have all the answers (and maybe we should remember that the next time we consider authoring an authoritative memo on the subject).

Scott’s Mansplaining Commentary

I’m incredibly grateful to Sarah and Stacey for sharing their views.  Now it’s time for me to mansplain my own thoughts in light of what they said.  Let me start with a seven-point creed.

1. I believe that science and engineering, both in academia and in industry, benefit enormously from contributions from people of every ethnic background and gender identity.  This sort of university-president-style banality shouldn’t even need to be said, but in a world where the President of the US criticizes neo-Nazis only under extreme pressure from his own party, I suppose it does.

2. I believe that there’s no noticeable difference in average ability between men and women in STEM fields—or if there’s some small disparity, for all I know the advantage goes to women. I have enough Sheldon Cooper in me that, if this hadn’t been my experience, I’d probably let it slip that it hadn’t been, but it has been.  When I taught 6.045 (undergrad computability and complexity) at MIT, women were only 20% or so of the students, but for whatever reasons they were wildly overrepresented among the top students.

3. I believe that women in STEM face obstacles that men don’t.  These range from the sheer awkwardness of sometimes being the only woman in a room full of guys, to challenges related to pregnancy and childcare, to actual belittlement and harassment.  Note that, even if men in STEM fields are no more sexist on average than men in other fields—or are less sexist, as one might expect from their generally socially liberal views and attitudes—the mere fact of the gender imbalance means that women in STEM will have many more opportunities to be exposed to whatever sexists there are.  This puts a special burden on us to create a welcoming environment for women.

4. Given that we know that gender gaps in interest and inclination appear early in life, I believe in doing anything we can to encourage girls’ interest in STEM fields.  Trust me, my four-year-old daughter Lily wishes I didn’t believe so fervently in working with her every day on her math skills.

5. I believe that gender diversity is valuable in itself.  It’s just nicer, for men and women alike, to have a work environment with many people of both sexes—especially if (as is often the case in STEM) so much of our lives revolves around our work.  I think that affirmative action for women, women-only scholarships and conferences, and other current efforts to improve gender diversity can all be defended and supported on that ground alone.

6. I believe that John Stuart Mill’s The Subjection of Women is one of the masterpieces of history, possibly the highest pinnacle that moral philosophy has ever reached.  Everyone should read it carefully and reflect on it if they haven’t already.

7. I believe it’s a tragedy that the current holder of the US presidency is a confessed sexual predator, who’s full of contempt not merely for feminism, but for essentially every worthwhile human value. I believe those of us on the “pro-Enlightenment side” now face the historic burden of banding together to stop this thug by every legal and peaceful means available. I believe that, whenever the “good guys” tear each other down in internecine warfare—e.g. “nerds vs. feminists”—it represents a wasted opportunity and an unearned victory for the enemies of progress.

OK, now for the part that might blow some people’s minds.  I hold that every single belief above is compatible with what James Damore wrote in his now-infamous memo—at least, if we’re talking about the actual words in it.  In some cases, Damore even makes the above points himself.  In particular, there’s nothing in what he wrote about female Googlers being less qualified on average than male Googlers, or being too neurotic to code, or anything like that: the question at hand is just why there are fewer women in these positions, and that in turn becomes a question about why there are fewer women earlier in the CS pipeline.  Reasonable people need not agree about the answers to those questions, or regard them as known or obvious, to see that the failure to make this one elementary distinction, between quality and quantity, already condemns 95% of Damore’s attackers as not having read or understood what he wrote.

Let that be the measure of just how terrifyingly efficient the social-media outrage machine has become at twisting its victims’ words to fit a clickbait narrative—a phenomenon with which I happen to be personally acquainted.  Strikingly, it seems not to make the slightest difference if (as in this case) the original source text is easily available to everyone.

Still, while most coverage of Damore’s memo was depressing in its monotonous incomprehension, dissent was by no means confined to the right-wingers eager to recruit Damore to their side.  Peter Singer—the legendary leftist moral philosopher, and someone whose fearlessness and consistency I’ve always admired whether I’ve agreed with him or not—wrote a powerful condemnation of Google’s decision to fire Damore.  Scott Alexander was brilliant as usual in picking apart bad arguments.  Megan McArdle drew on her experiences to illustrate some of Damore’s contentions.  Steven Pinker tweeted that Damore’s firing “makes [the] job of anti-Trumpists harder.”

Like Peter Singer, and also like Sarah Constantin and Stacey Jeffery above, I have no plans to take any position on biological differences in male and female inclinations and cognitive styles, and what role (if any) such differences might play in 80% of Google engineers being male—or, for that matter, what role they might play in 80% of graduating veterinarians now being female, or other striking gender gaps.  I decline to take a position not only because I’m not an expert, but also because, as Singer says, doing so isn’t necessary to reach the right verdict about Damore’s firing.  It suffices to note that the basic thesis being discussed—namely, that natural selection doesn’t stop at the neck, and that it’s perfectly plausible that it acted differently on women and men in ways that might help explain many of the population-level differences that we see today—can also be found in, for example, The Blank Slate by Steven Pinker, and other mainstream works by some of the greatest thinkers alive.

And therefore I say: if James Damore deserves to be fired from Google, for treating evolutionary psychology as potentially relevant to social issues, then Steven Pinker deserves to be fired from Harvard for the same offense.

Yes, I realize that an employee of a private company is different from a tenured professor.  But I don’t see why it’s relevant here.  For if someone really believes that mooting the hypothesis of an evolutionary reason for average differences in cognitive styles between men and women, is enough by itself to create a hostile environment for women—well then, why should tenure be a bar to firing, any more than it is in cases of sexual harassment?

But the reductio needn’t stop there.  It seems to me that, if Damore deserves to be fired, then so do the 56% of Googlers who said in a poll that they opposed his firing.  For isn’t that 56% just as responsible for maintaining a hostile environment as Damore himself was? (And how would Google find out which employees opposed the firing? Well, if there’s any company on earth that could…)  Furthermore, after those 56% of Googlers are fired, any of the remaining 44% who think the 56% shouldn’t have been fired should be fired as well!  And so on iteratively, until only an ideologically reliable core remains, which might or might not be the empty set.

OK, but while the wider implications of Damore’s firing have frightened and depressed me all week, as I said, I depart from Damore on the question of affirmative action and other diversity policies.  Fundamentally, what I want is a sort of negotiated agreement or bargain, between STEM nerds and the wider culture in which they live.  The agreement would work like this: STEM nerds do everything they can to foster diversity, including by creating environments that are welcoming for women, and by supporting affirmative action, women-only scholarships and conferences, and other diversity policies.  The STEM nerds also agree never to talk in public about possible cognitive-science explanations for gender disparities in which careers people choose, or overlapping bell curves,  or anything else potentially inflammatory.  In return, just two things:

  1. Male STEM nerds don’t regularly get libelled as misogynist monsters, who must be scaring all the women away with their inherently gross, icky, creepy, discriminatory brogrammer maleness.
  2. The fields beloved by STEM nerds are suffered to continue to exist, rather than getting destroyed and rebuilt along explicitly ideological lines, as already happened with many humanities and social science fields.

So in summary, neither side advances its theories about the causes of gender gaps; both sides simply agree that there are more interesting topics to explore.  In concrete terms, the social-justice side gets to retain 100% of what it has now, or maybe even expand it.  And all it has to offer in exchange is “R-E-S-P-E-C-T”!  Like, don’t smear and shame male nerds as a class, or nerdy disciplines themselves, for gender gaps that the male nerds would be as happy as anybody to see eradicated.

The trouble is that, fueled by outrage-fests on social media, I think the social-justice side is currently failing to uphold its end of this imagined bargain.  Nearly every day the sun rises to yet another thinkpiece about the toxic “bro culture” of Silicon Valley: a culture so uniquely and incorrigibly misogynist, it seems, that it still intentionally keeps women out, even after law and biology and most other white-collar fields have achieved or exceeded gender parity, their own “bro cultures” notwithstanding.  The trouble with this slander against male STEM nerds, besides its fundamental falsity (which Scott Alexander documented), is that it puts the male nerds into an impossible position.  For how can they refute the slander without talking about other possible explanations for fields like CS being 80% male, which is the very thing we all know they’re not supposed to talk about?

In Europe, in the Middle Ages, the Church would sometimes enjoy forcing the local Jews into “disputations” about whose religion was the true one.  At these events, a popular tactic on the Church’s side was to make statements that the Jews couldn’t possibly answer without blaspheming the name of Christ—which, of course, could lead to the Jews’ expulsion or execution if they dared it.

Maybe I have weird moral intuitions, but it’s hard for me to imagine a more contemptible act of intellectual treason, than deliberately trapping your opponents between surrender and blasphemy.  I’d actually rather have someone force me into one or the other, than make me choose, and thereby make me responsible for whichever choice I made.  So I believe the social-justice left would do well to forswear this trapping tactic forever.

Ironically, I suspect that in the long term, doing so would benefit no entity more than the social-justice left itself.  If I had to steelman, in one sentence, the argument that in the space of one year propelled the “alt-right” from obscurity in dark and hateful corners of the Internet, to the improbable and ghastly ascent of Donald Trump and his white-nationalist brigade to the most powerful office on earth, the argument would be this:

If the elites, the technocrats, the “Cathedral”-dwellers, were willing to lie to the masses about humans being blank slates—and they obviously were—then why shouldn’t we assume that they also lied to us about healthcare and free trade and guns and climate change and everything else?

We progressives deluded ourselves that we could permanently shame our enemies into silence, on pain of sexism, racism, xenophobia, and other blasphemies.  But the “victories” won that way were hollow and illusory, and the crumbling of the illusion brings us to where we are now: with a vindictive, delusional madman in the White House who has a non-negligible chance of starting a nuclear war this week.

The Enlightenment was a specific historical period in 18th-century Europe.  But the term can also be used much more broadly, to refer to every trend in human history that’s other than horrible.  Seen that way, the Enlightenment encompasses the scientific revolution, the abolition of slavery, the decline of all forms of violence, the spread of democracy and literacy, and the liberation of women from domestic drudgery to careers of their own choosing.  The invention of Google, which made the entire world’s knowledge just a search bar away, is now also a permanent part of the story of the Enlightenment.

I fantasize that, within my lifetime, the Enlightenment will expand further to tolerate a diversity of cognitive styles—including people on the Asperger’s and autism spectrum, with their penchant for speaking uncomfortable truths—as well as a diversity of natural abilities and inclinations.  Society might or might not get the “demographically correct” percentage of Ellie Arroways—Ellie might decide to become a doctor or musician rather than an astronomer, and that’s fine too—but most important, it will nurture all the Ellie Arroways that it gets, all the misfits and explorers of every background.  I wonder whether, while disagreeing on exactly what’s meant by it, all parties to this debate could agree that diversity represents a next frontier for the Enlightenment.

Comment Policy: Any comment, from any side, that attacks people rather than propositions will be deleted.  I don’t care if the comment also makes useful points: if it contains a single ad hominem, it’s out.

As it happens, I’m at a quantum supremacy workshop in Bristol, UK right now—yeah, yeah, I’m a closet supremacist after all, hur hur—so I probably won’t participate in the comments until later.

August 13, 2017

Chad OrzelKid Art Update

Our big home renovation has added a level of chaos to everything, which has gotten in the way of my doing more regular cute-kid updates, and of even more routine tasks, like photographing the giant pile of kid art that we had to move out of the dining room. Clearing things up for the next big stage of the renovation– cabinets arrive tomorrow– led me back to that pile, though, so I finally took pictures of a whole bunch of good stuff. (On the spiffy new tile floor in the kitchen, because the light was good there…)

The kids’ school sends home portfolios of what they’ve done in art class for the year, and I collected those photos together into a Google Photos album for easy sharing, because they’re pretty cool. My favorite piece of the lot is this polar bear by SteelyKid:

Polar bear by SteelyKid.

That’s in pastel chalk on construction paper; you can see some preliminary sketches of the bear in the album. She drew the scene in pencil, colored it in chalk, then traced important lines with a marker. It’s very cool.

The Pip has some neat stuff in his portfolio, too– I especially like that they had the kids making Mondrians out of strips of construction paper– but my favorite of his was a non-art-class drawing that was in the pile:

The Pigeon, by The Pip.

That’s a very credible rendering of Mo Willems’s Pigeon for a kindergartener…

Anyway, other than that, life continues in the usual whirl. I’m getting really tired of living out of a mini-fridge in the living room and a temporary sink in the kitchen that’s at about knee level to me (when I have to wash dishes, I pull up a chair and sit down, which takes the stress on my back from “agonizing” down to “annoying”). But, cabinets this week, so we can see the oncoming train at the end of this tunnel…

August 12, 2017

Tim GowersIntransitive dice VII — aiming for further results

While Polymath13 has (barring a mistake that we have not noticed) led to an interesting and clearly publishable result, there are some obvious follow-up questions that we would be wrong not to try to answer before finishing the project, especially as some of them seem to be either essentially solved or promisingly close to a solution. The ones I myself have focused on are the following.

  1. Is it true that if two random elements A and B of [n]^n are chosen, then A beats B with very high probability if it has a sum that is significantly larger? (Here “significantly larger” should mean larger by f(n) for some function f(n)=o(n^{3/2}) — note that the standard deviation of the sum has order n^{3/2}, so the idea is that this condition should be satisfied one way or the other with probability 1-o(1)).
  2. Is it true that the stronger conjecture, which is equivalent (given what we now know) to the statement that for almost all pairs (A,B) of random dice, the event that A beats a random die C has almost no correlation with the event that B beats C, is false?
  3. Can the proof of the result obtained so far be modified to show a similar result for the multisets model?
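To make the underlying "beats" relation concrete for readers who want to experiment, here is a small sketch (my own illustrative code, not part of the Polymath project; the function names are invented) of drawing dice from the [n]^n model and comparing them, with ties between faces counted half each way:

```python
import random

def beats(A, B):
    """A beats B if a_i > b_j for more pairs (i, j) than a_i < b_j (ties split evenly)."""
    return sum((a > b) - (a < b) for a in A for b in B) > 0

def random_die(n, rng):
    """A random element of [n]^n: n faces, each uniform on {1, ..., n}."""
    return [rng.randint(1, n) for _ in range(n)]

rng = random.Random(0)
n = 100
A, B, C = (random_die(n, rng) for _ in range(3))
# An intransitive triple would have, e.g., A beats B, B beats C, C beats A.
print(beats(A, B), beats(B, C), beats(C, A))
```

Sampling many triples this way is how one gathers the kind of experimental evidence referred to below.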

The status of these three questions, as I see it, is as follows. The first is basically solved, a claim I shall try to justify later in the post. For the second there is a promising approach that I think will lead to a solution; again I shall try to back up this assertion. The third feels as though it shouldn’t be impossibly difficult, but we have so far made very little progress on it, apart from experimental evidence suggesting that all the results should be similar to those for the balanced-sequences model. [Added after finishing the post: I may possibly have made significant progress on the third question as a result of writing this post, but I haven’t checked carefully.]

The strength of a die depends strongly on the sum of its faces.

Let A=(a_1,\dots,a_n) and B=(b_1,\dots,b_n) be elements of [n]^n chosen uniformly and independently at random. I shall now show that the average of

\sum_jf_A(b_j)-\frac {n^2}2-\sum_jb_j+\sum_ia_i

(with f_A as defined below) is zero, and that the probability that this quantity differs from its average by substantially more than n\log n is very small. Since typically the modulus of \sum_ia_i-\sum_jb_j has order n^{3/2}, it follows that whether or not A beats B is almost always determined by which has the bigger sum.

As in the proof of the main theorem, it is convenient to define the functions

f_A(j)=|\{i:a_i<j\}|+\frac 12|\{i:a_i=j\}|

and

g_A(j)=f_A(j)-j+\frac 12.

Then

\sum_jf_A(b_j)=\sum_{i,j}\mathbf 1_{a_i<b_j}+\frac 12\sum_{i,j}\mathbf 1_{a_i=b_j},

from which it follows that B beats A if and only if \sum_jf_A(b_j)>n^2/2. Note also that

\sum_jg_A(b_j)=\sum_jf_A(b_j)-\sum_jb_j+\frac n2.
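The identity for \sum_jf_A(b_j), and the resulting beats criterion, are easy to check numerically; the following sketch (my own illustrative code) compares the sum of f_A over the faces of B with the direct pairwise count:

```python
import random

def f_A(A, j):
    """f_A(j) = #{i : a_i < j} + (1/2) #{i : a_i = j}, as defined in the post."""
    return sum(a < j for a in A) + 0.5 * sum(a == j for a in A)

rng = random.Random(1)
n = 50
A = [rng.randint(1, n) for _ in range(n)]
B = [rng.randint(1, n) for _ in range(n)]

# Summing f_A over the faces of B should equal the direct pairwise count.
lhs = sum(f_A(A, b) for b in B)
direct = sum((a < b) + 0.5 * (a == b) for a in A for b in B)
print(lhs == direct)             # the two counts agree exactly
print("B beats A:", lhs > n * n / 2)
```

(All the terms are multiples of 1/2, so the floating-point comparison is exact.)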

If we choose A purely at random from [n]^n, then the expectation of f_A(j) is j-1/2, and Chernoff’s bounds imply that the probability that there exists j with |g_A(j)|=|f_A(j)-j+1/2|\geq C\sqrt{n\log n} is, for suitable C, at most n^{-10}. Let us now fix some A for which there is no such j, but keep B as a purely random element of [n]^n.

Then \sum_jg_A(b_j) is a sum of n independent random variables, each bounded in modulus by C\sqrt{n\log n}. The expectation of this sum is \sum_jg_A(j)=\sum_jf_A(j)-n^2/2. Moreover,

\sum_jf_A(j)=\sum_{i,j}\mathbf 1_{a_i<j}+\frac 12\sum_{i,j}\mathbf 1_{a_i=j}

=\sum_i(n-a_i)+\frac n2=n^2+\frac n2-\sum_ia_i,

so the expectation of \sum_jg_A(b_j) is n(n+1)/2-\sum_ia_i.

By standard probabilistic estimates for sums of independent random variables, with probability at least 1-n^{-10} the difference between \sum_jg_A(b_j) and its expectation \sum_jf_A(j)-n^2/2 is at most Cn\log n. Writing this out, we have

|\sum_jf_A(b_j)-\sum_jb_j+\frac n2-n(n+1)/2+\sum_ia_i|\leq Cn\log n,

which works out as

|\sum_jf_A(b_j)-\frac {n^2}2-\sum_jb_j+\sum_ia_i|\leq Cn\log n.

Therefore, if \sum_ia_i>\sum_jb_j+Cn\log n, it follows that with high probability \sum_jf_A(b_j)<n^2/2, which implies that A beats B, and if \sum_jb_j>\sum_ia_i+Cn\log n, then with high probability B beats A. But one or other of these two cases almost always happens, since the standard deviations of \sum_ia_i and \sum_jb_j are of order n^{3/2}. So almost always the die that wins is the one with the bigger sum, as claimed. And since “has a bigger sum than” is a transitive relation, we get transitivity almost all the time.
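This conclusion is easy to probe experimentally. The following rough Monte Carlo sketch (my own code, with arbitrary parameters and seed, not part of the project) estimates how often the die with the larger face-sum wins in the unconditioned [n]^n model:

```python
import random

def score(A, B):
    """Positive if A beats B, negative if B beats A, zero for a tie (face ties split)."""
    return sum((a > b) - (a < b) for a in A for b in B)

rng = random.Random(2)
n, trials = 60, 200
agree = total = 0
for _ in range(trials):
    A = [rng.randint(1, n) for _ in range(n)]
    B = [rng.randint(1, n) for _ in range(n)]
    s = score(A, B)
    diff = sum(A) - sum(B)
    if s != 0 and diff != 0:     # ignore exact ties in either quantity
        total += 1
        agree += (s > 0) == (diff > 0)
print(f"larger-sum die won {agree} of {total} decisive trials")
```

For modest n the agreement is already a large majority, and the argument above says it tends to certainty as n grows.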

Why the strong conjecture looks false

As I mentioned, the experimental evidence seems to suggest that the strong conjecture is false. But there is also the outline of an argument that points in the same direction. I’m going to be very sketchy about it, and I don’t expect all the details to be straightforward. (In particular, it looks to me as though the argument will be harder than the argument in the previous section.)

The basic idea comes from a comment of Thomas Budzinski. It is to base a proof on the following structure.

  1. With probability bounded away from zero, two random dice A and B are “close”.
  2. If A and B are two fixed dice that are close to each other and C is random, then the events “A beats C” and “B beats C” are positively correlated.

Here is how I would imagine going about defining “close”. First of all, note that the function g_A is somewhat like a random walk that is constrained to start and end at zero. There are results showing that random walks have a positive probability of never deviating very far from the origin (at most half a standard deviation, say), so something like the following idea ought to work for proving the first step (remaining agnostic for the time being about the precise definition of “close”). We choose some fixed positive integer k and let x_1<\dots<x_k be integers evenly spread through the interval \{1,2,\dots,n\}. Then we argue — and this should be very straightforward — that with probability bounded away from zero, the values of f_A(x_i) and f_B(x_i) are close to each other, where here I mean that the difference is at most some small (but fixed) fraction of a standard deviation.

If that holds, it should also be the case, since the intervals between x_{i-1} and x_i are short, that f_A and f_B are uniformly close with positive probability.

I’m not quite sure whether proving the second part would require the local central limit theorem in the paper or whether it would be an easier argument that could just use the fact that since f_A and f_B are close, the sums \sum_jf_A(c_j) and \sum_jf_B(c_j) are almost certainly close too. Thomas Budzinski sketches an argument of the first kind, and my guess is that that is indeed needed. But either way, I think it ought to be possible to prove something like this.
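For anyone who wants to explore this numerically, here is a rough sketch (my own code; for simplicity it samples unconditioned dice from [n]^n, whereas the conjecture proper concerns the balanced model) estimating the correlation between the events “A beats C” and “B beats C” for one fixed pair (A,B):

```python
import random

def beats(X, Y):
    """X beats Y, with ties between faces counted half each way."""
    return sum((x > y) - (x < y) for x in X for y in Y) > 0

rng = random.Random(3)
n, m = 40, 300

def random_die():
    return [rng.randint(1, n) for _ in range(n)]

A, B = random_die(), random_die()
pa = pb = pab = 0
for _ in range(m):
    C = random_die()
    a_wins, b_wins = beats(A, C), beats(B, C)
    pa += a_wins
    pb += b_wins
    pab += a_wins and b_wins
pa, pb, pab = pa / m, pb / m, pab / m
print("P(A beats C) ~", pa, " P(B beats C) ~", pb)
print("covariance estimate:", pab - pa * pb)
```

Averaging this covariance over many pairs (A,B) is the natural way to test the strong conjecture experimentally.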

What about the multisets model?

We haven’t thought about this too hard, but there is a very general approach that looks to me promising. However, it depends on something happening that should be either quite easy to establish or not true, and at the moment I haven’t worked out which, and as far as I know neither has anyone else.

The difficulty is that while we still know in the multisets model that A beats B if and only if \sum_jf_A(b_j)<n^2/2 (since this depends just on the dice and not on the model that is used to generate them randomly), it is less easy to get traction on the sum because it isn’t obvious how to express it as a sum of independent random variables.

Of course, we had that difficulty with the balanced-sequences model too, but there we got round the problem by considering purely random sequences B and conditioning on their sum, having established that certain events held with sufficiently high probability for the conditioning not to stop them holding with high probability.

But with the multisets model, there isn’t an obvious way to obtain the distribution over random dice B by choosing b_1,\dots,b_n independently (according to some distribution) and conditioning on some suitable event. (A quick thought here is that it would be enough if we could approximate the distribution of B in such a way, provided the approximation was good enough. The obvious distribution to take on each b_i is the marginal distribution of that b_i in the multisets model, and the obvious conditioning would then be on the sum, but it is far from clear to me whether that works.)

A somewhat different approach that I have not got far with myself is to use the standard one-to-one correspondence between increasing sequences of length n taken from [n] and subsets of [2n-1] of size n. (Given such a sequence (a_1,\dots,a_n) one takes the subset \{a_1,a_2+1,\dots,a_n+n-1\}, and given a subset S=\{s_1,\dots,s_n\}\subset[2n-1], where the s_i are written in increasing order, one takes the multiset of all values s_i-i+1, with multiplicity.) Somehow a subset of [2n-1] of size n feels closer to a bunch of independent random variables. For example, we could model it by choosing each element with probability n/(2n-1) and conditioning on the number of elements being exactly n, which will happen with non-tiny probability.
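The correspondence described above is simple to implement; here is a sketch (my own code) of the bijection and its inverse, with a round trip on a small example:

```python
def multiset_to_subset(a):
    """Map an increasing sequence (a_1 <= ... <= a_n) from [n] to an n-subset of
    [2n-1] via a_i -> a_i + (i - 1), as in the post (0-indexed: a[i] + i)."""
    return {ai + i for i, ai in enumerate(a)}

def subset_to_multiset(s):
    """Inverse map: the sorted subset {s_1 < ... < s_n} goes back to the
    multiset of values s_i - (i - 1)."""
    return [si - i for i, si in enumerate(sorted(s))]

a = [1, 1, 2, 4, 4]            # an increasing sequence of length 5 from [5]
s = multiset_to_subset(a)
print(sorted(s))               # -> [1, 2, 4, 7, 8], a 5-subset of [9]
print(subset_to_multiset(s))   # -> [1, 1, 2, 4, 4], the original multiset
```

Note how each element of the multiset corresponds to one element of the subset, which is the feature exploited below.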

Actually, now that I’m writing this, I’m coming to think that I may have accidentally got closer to a solution. The reason is that earlier I was using a holes-and-pegs approach to defining the bijection between multisets and subsets, whereas with this approach, which I had wrongly assumed was essentially the same, there is a nice correspondence between the elements of the multiset and the elements of the set. So I suddenly feel more optimistic that the approach for balanced sequences can be adapted to the multisets model.

I’ll end this post on that optimistic note: no doubt it won’t be long before I run up against some harsh reality.

August 08, 2017

Chad OrzelPhysics Blogging Round-Up: July

Another month, another collection of blog posts for Forbes:

The Physics Of Century-Old Mirror Selfies: Back in the early 1900s there was a brief vogue for trick pictures showing the same person from five different angles; this post explains how to do that with mirrors.

Why Research By Undergraduates Is Important For Science And Students: A reply to an essay talking up the products of undergraduate research projects, arguing that the most valuable part of research is the effect on students.

What Does It Mean To Share ‘Raw Data’?: Some thoughts on the uselessness of much “raw data” in my field to anyone outside the lab where it was produced.

Breaking Stuff Is An Essential Part Of The Scientific Process: Thoughts on how the most important year of my grad school career was the frustrating one in which I broke and then repaired everything in the lab.

Measuring The Speed Of Quantum Tunneling: A couple of recent experiments use a clever trick to look at whether there’s a time delay as electrons tunnel out of an atom in a strong electric field. Unfortunately, they get very different results…

I was a little disappointed that the photo-multigraph thing didn’t get more traction, but it was fun to do, so that’s okay. The quantum tunneling post did surprisingly well– I thought it was likely to be a little too technical to really take off, but it did. Always nice when that happens.

The other three are closely related to a development at work, namely that on July 1 I officially added “Director of Undergraduate Research” to the many hats I wear. I’m in charge of supervising the research program at Union, disbursing summer fellowships and small grants for research projects and conference travel, and arranging a number of research-oriented events on campus. This involves a certain amount of administrative hassle, but then again, it’s hassle in the service of helping students do awesome stuff, so I’m happy to do it.

Anyway, that’s where things are. Blogging will very likely tail off dramatically for the fall, possibly as soon as this month (though I already have one post up), as I have a book on contract due Dec. 1, and a review article due to a journal a month later. And, you know, classes to teach and research to direct…

August 02, 2017

Jordan EllenbergMaryland flag, my Maryland flag

The Maryland flag is, in my opinion as a Marylander, the greatest state flag.

Ungepotch?  Yes.  But it has that ineffable “it shouldn’t work but it does” that marks really great art.

But here’s something I didn’t know about my home state’s flag:


Despite the antiquity of its design, the Maryland flag is of post-Civil War origin. Throughout the colonial period, only the yellow-and-black Calvert family colors are mentioned in descriptions of the Maryland flag. After independence, the use of the Calvert family colors was discontinued. Various banners were used to represent the state, although none was adopted officially as a state flag. By the Civil War, the most common Maryland flag design probably consisted of the great seal of the state on a blue background. These blue banners were flown at least until the late 1890s….

Reintroduction of the Calvert coat of arms on the great seal of the state [in 1854] was followed by a reappearance at public events of banners in the yellow-and-black Calvert family colors. Called the “Maryland colors” or “Baltimore colors,” these yellow-and-black banners lacked official sanction of the General Assembly, but appear to have quickly become popular with the public as a unique and readily identifiable symbol of Maryland and its long history.

The red-and-white Crossland arms gained popularity in quite a different way. Probably because the yellow-and-black “Maryland colors” were popularly identified with a state which, reluctantly or not, remained in the Union, Marylanders who sympathized with the South adopted the red-and-white of the Crossland arms as their colors. Following Lincoln’s election in 1860, red and white “secession colors” appeared on everything from yarn stockings and cravats to children’s clothing. People displaying these red-and-white symbols of resistance to the Union and to Lincoln’s policies were vigorously prosecuted by Federal authorities.

During the war, Maryland-born Confederate soldiers used both the red-and-white colors and the cross bottony design from the Crossland quadrants of the Calvert coat of arms as a unique way of identifying their place of birth. Pins in the cross bottony shape were worn on uniforms, and the headquarters flag of the Maryland-born Confederate general Bradley T. Johnson was a red cross bottony on a white field.

By the end of the Civil War, therefore, both the yellow-and-black Calvert arms and the red-and-white colors and bottony cross design of the Crossland arms were clearly identified with Maryland, although they represented opposing sides in the conflict.

In 4th grade, in Maryland history, right after having to memorize the names of the counties, we learned about the flag’s origin in the Calvert coat of arms

but not about the symbolic meaning of the flag’s adoption, as an explicit gesture of reconciliation between Confederate sympathizers and Union loyalists sharing power in a post-war border state.

The Howard County flag is based on the Crossland arms.  (There’s also a sheaf of wheat and a silhouette of Howard County nosing its way through a golden triangle.)  The city of Baltimore, on the other hand, uses the Calvert yellow-and-black only.

Oh, and there’s one more flag:

That’s the flag of the Republic of Maryland, an independent country in West Africa settled mostly by free black Marylanders.  It existed only from 1854 to 1857, when it was absorbed into Liberia, of which it’s still a part, called Maryland County.  The county flag still has Lord Baltimore’s yellow, but not the black.





August 01, 2017

Jordan EllenbergGood math days

I have good math days and bad math days; we all do.  An outsider might think the good math days are the days when you have good ideas.  That’s not how it works, at least for me.  You have good ideas on the bad math days, too; but one at a time.  You have an idea, you try it, you make some progress, it doesn’t work, your mind says “too bad.”

On the good math days, you have an idea, you try it, it doesn’t work, you click over to the next idea, you get over the obstacle that was blocking you, then you’re stuck again, you ask your mind “What’s the next thing to do?” you get the next idea, you take another step, and you just keep going.

You don’t feel smarter on the good math days.  It’s not even momentum, exactly, because it’s not a feeling of speed.  More like:  the feeling of being a big, heavy, not very fast vehicle, with very large tires, that’s just going to keep on traveling, over a bump, across a ditch, through a river, continually and inexorably moving in a roughly fixed direction.


July 29, 2017

Jordan EllenbergShow report: Camp Friends and Omni at the Terrace

Beautiful weather last night so I decided, why not, go to the Terrace for the free show WUD put on:  Camp Friends (Madison) and Omni (Atlanta).

Missed most of Camp Friends, who were billed as experimental but in fact played genial, not-real-tight college indie.  Singer took his shirt off.

Omni, though — this is the real thing.  Everyone says it sounds like 1981 (specifically:  1981), and they’re right, but it rather wonderfully doesn’t sound like any particular thing in 1981.  There’s the herky-jerky-shoutiness and clipped chords (but on some songs that sounds like Devo and on others like Joe Jackson) and the jazz chords high on the neck (the Fall?  The Police?) and weird little technical guitar runs that sound like Genesis learning to play new wave guitar on Abacab and arpeggios that sound like Peter Buck learning to play guitar in the first place (these guys are from Georgia, after all.)  What I kind of love about young people is this.  To me, all these sounds are separate styles; to a kid picking up these records now, they’re just 1981, they’re all material to work from, you can put them together and something kind of great comes out of it.

You see a lot of bands with a frontman but not that many which, like Omni, have a frontman and a backman.  Philip Frobos sings and plays bass and mugs and talks to the audience.  Frankie Broyles, the guitar player, is a slight guy who looks like a librarian and stands still and almost expressionless while he plays his tight little runs.  Then, every once in a while, he unleashes an absolute storm of noise.  But still doesn’t grimace, still doesn’t move!  Amazing.  Penn and Teller is the only analogue I can think of.

Omni plays “Jungle Jenny,” live in Atlanta:

And here’s “Wire,” to give a sense of their more-dance-less-rock side:


Both songs are on Omni’s debut album, Deluxe, listenable at Bandcamp.

Best show I’ve seen at the Terrace in a long time.  Good job, WUD.