Planet Musings

August 20, 2017

John PreskillTwo Views of the Eclipse

I am sure many of us are thinking about the eclipse.

It all starts with how far are we going to drive in order to see totality. My family and I are currently in Colorado, so we are relatively close to the path of darkness in Wyoming. I thought about trying to book a hotel room. But if you’d like to see the dusk in Lusk, here is what you get:

Let us just say that I became quite acquainted with small-town WY and any-ville NE before giving up. Driving in the same day for 10 hours with my two children, ages 4 and 5, was not an option. So I will have to be content with 90% coverage.

90% coverage sounds like it is good enough… But when you think about the sun and its output, you realize that it won’t actually be very dark. The sun gives out about 1kW of light and heat per square meter. 90% of that still leaves us with 100W per meter squared. Imagine a room lit by a square array of 100W incandescent bulbs at one meter apart from each other. Not so dark. Luckily, we have really dark eclipse glasses.

All things considered, it is a huge coincidence that the moon is just about the right size and distance from the earth to block the sun exactly, \frac{\mbox{sun radius}}{\mbox{sun-Earth distance}}=\frac{0.7\cdot 10^6 km}{150\cdot 10^6 km}\approx \frac{\mbox{luna radius}}{\mbox{luna-Earth distance}}=\frac{1.7\cdot 10^3 km}{385\cdot 10^3 km}.

On a more personal note, another coincidence of a lesser cosmic meaning is that my wife, Jocelyn Holland, a professor of comparative literature at UCSB and Caltech, has also done research on eclipses. She has recently published an essay that shows how, for nineteenth-century observers, and astronomers in particular, the unique darkness associated with the eclipse during totality shook their subjective experience of time. Readers might want to share their own personal experiences at the end of this blog so that we can see how a twenty-first century perspective compares.

As for Jocelyn’s paper, here is a redacted ‘poetry for scientists’ excerpt from it.

Eclipses are well-known objects of scientific study but it is just as true that, throughout history, they have been perceived as the most supernatural of events, permitting superstition and fear to intrude. As a result, eclipses have frequently been used across cultures, in particular, by the community of scientists and scholars, as an index of “enlightenment.” Astronomers in the nineteenth century – an epoch that witnessed several mathematical advances in the calculation of solar and lunar eclipses, as exemplified in the work of Friedrich Bessel – looked back at prior centuries with scorn, mocking the irrational fears of times past. The German astronomer Ludwig August Busch, in text published shortly before a total eclipse in 1851, points out with some smugness that scarcely 200 years before then, in Germany, “the majority of the population threw itself upon its knees in desperation during a total eclipse,” and that the composure with which the next eclipse will be greeted is “the most certain proof how only science is able to conquer prejudices and superstition which prior centuries have gone through.”

Two solar eclipses were witnessed by Europeans in the mid-nineteenth century, on July 8th, 1842 and July 28th, 1851, when the first photographic image of an eclipse was made by Julius Berkowski (see below).

What Berkowski’s daguerreotype cannot convey, however, is a particular perception shared by both professional astronomers and amateur observers of these eclipses: that the darkness of the eclipse’s totality is unlike any darkness they had experienced before. As it turns out, this perception posed a challenge to their self-proclaimed enlightenment.

There was already a historical record in place describing the strange darkness of a total eclipse. As another nineteenth-century astronomer, Jacob Lehmann, phrased it, “How is it now to be explained, namely what several observers report during the eclipse of 1706, that the darkness at the time of the total occultation of the sun compares neither to night nor to dusk, but rather is of a particular kind. What is this particular kind?” The strange darkness of the eclipse presents a problem that one can state quite simply in temporal terms: it corresponds to no prior experience of natural light or time of day.

It might strike us as odd that August Ludwig Busch, the same astronomer who derided the superstition of prior generations, writes the following with reference to eclipses past, and in anticipation of the eclipse of 1851:

You will all remember the inexplicable melancholic frame of mind which one already experiences during large if not even total eclipses, when all objects appear in a dull, unusual light, there lies namely in the sight of great plains and far-spread drifts, upon which trees and rocks, although still illuminated by sunlight, still seem to cast no shadow, such a thing which causes mourning, that one is involuntarily overcome by horror. This feeling should occur more intensely in people when, during the total eclipse, a very peculiar darkness arrives which can be named neither night nor dusk.

August Ludwig Busch.

One can say that the perceived relationship between the quality of light and time of day is based on expectations that are so innate as to be taken as infallible until experience teaches otherwise. It is natural for us to use the available light in the sky as the basis for a measure of time when no time-keeping piece is on hand. The cyclical predictability of a steady increase and decrease in available light during the course of the day, however, in addition to all the nuances of how the midday light differs from dawn and twilight, is less than helpful in the rare event of an eclipse. The quality of light does not correspond to any experience of lived time. As a consequence, not only August Ludwig Busch, but also numerous other observers, attributed it to death, as if for lack of an alternative.

For all their claims of rationality, nineteenth-century observers were troubled by this darkness that conformed to no experienced time of day. It signaled to them, among other things, that time and light are out of joint. In short, as natural and as it may be, a full solar eclipse has, historically, posed a real challenge: not to the predictability of mechanical time-keeping, but rather to a very human experience of time.

n-Category Café Simplicial Sets vs. Simplicial Complexes

I’m looking for a reference. Homotopy theorists love simplicial sets; certain other topologists love simplicial complexes; they are related in various ways, and I’m interested in one such relation.

Let me explain…

There’s a category SSetSSet of simplicial sets and a category SCpxSCpx of simplicial complexes.
There is, I believe, a functor

F:SCpxSSetF: SCpx \to SSet

that takes the simplices in a simplicial complex, which have unordered vertices, and creates a simplicial set in which the vertices of each simplex are given all possible orderings.

There is a geometric realization functor

||:SSetTop| \cdot | : SSet \to Top

There’s also a geometric realization functor for simplicial complexes. I’d better give it another name… let’s say

[]:SCpxTop [ \cdot ] : SCpx \to Top

Here’s what I want a reference for, assuming it’s true: there’s a natural transformation α\alpha from the composite

SCpxFSSet||Top SCpx \stackrel{F}{\longrightarrow} SSet \stackrel{|\cdot|}{\longrightarrow} Top


SCpx[]Top SCpx \stackrel{[\cdot]}{\longrightarrow} Top

and for any simplicial complex XX

α X:|F(X)|[X] \alpha_X : |F(X)| \to [X]

is a homotopy equivalence. (I don’t think it’s a homeomorphism; a bunch of simplices need to be squashed down.)

Since this field is fraught with confusion, I’d better say exactly what I mean by some things. By a simplicial set I mean a functor X:Δ opSetX : \Delta^{op} \to Set , and a morphism between simplicial sets to be a natural transformation between them. That seems pretty uncontroversial. By a simplicial complex I mean what Wikipedia calls an abstract simplicial complex, to distinguish them from simplicial complexes that are made of concrete and used as lawn ornaments… or something like that. Namely, I mean a set XX equipped with a family U XU_X of non-empty finite subsets that is downward closed and contains all singletons. By a map of simplicial complexes from (X,U X)(X,U_X) to (Y,U Y)(Y,U_Y) I mean a map f:XYf: X \to Y such that the image of any set in U XU_X is a set in U YU_Y.

On MathOverflow the topologist Allen Hatcher wrote:

In some areas simplicial sets are far more natural and useful than simplicial complexes, in others the reverse is true. If one drew a Venn diagram of the people using one or the other structure, the intersection might be very small.

Perhaps this cultural divide is reflected in the fact that the usual definition of an abstract simplicial complex involves a set that contains the set of vertices, whose extra members are entirely irrelevant when it comes to defining morphisms! It’s like defining a vector space to consist of two sets, one of which is a vector space in the usual sense and the other of which doesn’t show up in the definition of linear map. Someone categorically included would instead take the approach described on the nLab.

Ultimately you get equivalent categories either way. And as Alex Hoffnung and I explain, the resulting category of simplicial complexes is a quasitopos of concrete sheaves:

A quasitopos has many, but not quite all, of the good properties of a topos. Simplicial sets are a topos of presheaves, so from a category-theoretic viewpoint they’re more tractable than simplicial complexes.

August 18, 2017

Doug NatelsonInvited symposia/speaker deadlines, APS 2018

For those readers who are APS members, a reminder.  The deadline for nominations for invited symposia for the Division of Condensed Matter Physics for the 2018 March Meeting is August 31.   See here.

Likewise, the Division of Materials Physics has their invited speaker (w/in focus topic) deadline of August 29.  See here.

Please nominate!

Doug NatelsonPower output of a lightsaber

It's been a long week and lots of seriously bad things have been in the news.   I intend to briefly distract myself by looking at that long-standing question, what is the power output of a lightsaber?  I'm talking about the power output when the lightsaber is actually slicing through something, not just looking cool.  We can get a very rough, conservative estimate from the documentary video evidence before us.  I choose not to use the prequels, on general principle, though the scene in The Phantom Menace when Gui-Gon Jinn cuts through the blast door would be a good place to start.  Instead, let's look at The Force Awakens, where Kylo Ren throws a tantrum and slices up an instrument panel, leaving behind dripping molten metal.  

With each low-effort swing of his arm, Ren's lightsaber, with a diameter of around 2 cm moves at something more than 2 m/s, slicing metal to a depth of, say, 3 cm (actually probably deeper than that).  That is, the cutting part of the blade is sweeping out a volume of around 1200 cc/sec.  It is heating that volume of console material up to well above its melting point, so we need to worry about the energy it takes to heat the solid from room temperature (300 K) up to its melting point, and then the heat of fusion required to melt the material.  At a rough guess, suppose imperial construction is aluminum.  Aluminum has a specific heat of 0.9 J/g-K, and a density of 2.7 g/cc when solid, a melting point of 933 K, and a heat of fusion of 10.7 kJ/mol.  In terms of volume, that's (10.7 kJ/mol)(1 mol/27 g)(2.7 g/cc) = 1070 J/cc.  So, the total power is around (933K-300K)*(0.9J/g-K)*(2.7g/cc)*(1200 cc/sec) + (1070 J/cc)(1200 cc/sec) = 3.1e6 J/s = 3.1 MW.  

Hot stuff.

Matt StrasslerAn Experience of a Lifetime: My 1999 Eclipse Adventure

Back in 1999 I saw a total solar eclipse in Europe, and it was a life-altering experience.  I wrote about it back then, but was never entirely happy with the article.  This week I’ve revised it.  It could still benefit from some editing and revision (comments welcome), but I think it’s now a good read.  It’s full of intellectual observations, but there are powerful emotions too.

If you’re interested, you can read it as a pdf, or just scroll down.



A Luminescent Darkness: My 1999 Eclipse Adventure

© Matt Strassler 1999

After two years of dreaming, two months of planning, and two hours of packing, I drove to John F. Kennedy airport, took the shuttle to the Air France terminal, and checked in.  I was brimming with excitement. In three days time, with a bit of luck, I would witness one the great spectacles that a human being can experience: a complete, utter and total eclipse of the Sun.

I had missed one eight years earlier. In July 1991, a total solar eclipse crossed over Baja California. I had thought seriously about driving the fourteen hundred miles from the San Francisco area, where I was a graduate student studying theoretical physics, to the very southern tip of the peninsula. But worried about my car’s ill health and scared by rumors of gasoline shortages in Baja, I chickened out. Four of my older colleagues, more worldly and more experienced, and supplied with a more reliable vehicle, drove down together. When they returned, exhilarated, they regaled us with stories of their magical adventure. Hearing their tales, I kicked myself for not going, and had been kicking myself ever since. Life is not so long that such opportunities can be rationalized or procrastinated away.

A total eclipse of the Sun is a event of mythic significance, so rare and extraordinary and unbelievable that it really ought to exist only in ancient legends, in epic poems, and in science fiction stories. There are other types of eclipses — partial and total eclipses of the Moon, in which the Earth blocks sunlight that normally illuminates the Moon, and various eclipses of the Sun in which the Moon blocks sunlight that normally illuminates the Earth. But total solar eclipses are in a class all their own. Only during the brief moments of totality does the Sun vanish altogether, leaving the shocked spectator in a suddenly darkened world, gazing uncomprehendingly at a black disk of nothingness.

Our species relies on daylight. Day is warm; day grows our food; day permits travel with a clear sense of what lies ahead. We are so fearful of the night — of what lurks there unseen, of the sounds that we cannot interpret. Horror films rely on this fear; demons and axe murderers are rarely found walking about in bright sunshine. Dark places are dangerous places; sudden unexpected darkness is worst of all. These are the conventions of cinema, born of our inmost psychology. But the Sun and the Moon are not actors projected on a screen. The terror is real.

It has been said that if the Earth were a member of a federation of a million planets, it would be a famous tourist attraction, because this home of ours would be the only one in the republic with such beautiful eclipses. For our skies are witness to a coincidence truly of cosmic proportions. It is a stunning accident that although the Sun is so immense that it could hold a million Earths, and the Moon so small that dozens could fit inside our planet, these two spheres, the brightest bodies in Earth’s skies, appear the same size. A faraway giant may seem no larger than a nearby child. And this perfect match of their sizes and distances makes our planet’s eclipses truly spectacular, visually and scientifically. They are described by witnesses as a sight of weird and unique beauty, a visual treasure completely unlike anything else a person will ever see, or even imagine.

But total solar eclipses are uncommon, occurring only once every year or two. Even worse, totality only occurs in a narrow band that sweeps across the Earth — often just across its oceans. Only a small fraction of the Earth sees a total eclipse in any century. And so these eclipses are precious; only the lucky, or the devoted, will experience one before they die.

In my own life, I’d certainly been more devoted than lucky. I knew it wasn’t wise to wait for the Moon’s shadow to find me by chance. Instead I was going on a journey to place myself in its path.

The biggest challenge in eclipse-chasing is the logistics. The area in which totality is visible is very long but very narrow. For my trip, in 1999, it was a long strip running west to east all across Europe, but only a hundred miles wide from north to south. A narrow zone crossing heavily populated areas is sure to attract a massive crowd, so finding hotels and transport can be difficult. Furthermore, although eclipses are precisely predictable, governed by the laws of gravity worked out by Isaac Newton himself, weather and human beings are far less dependable.

But I had a well-considered plan. I would travel by train to a small city east of Paris, where I had reserved a rental car. Keeping a close watch on the weather forecast, I would drive on back roads, avoiding clogged highways. I had no hotel reservations. It would have been pointless to make them for the night before the event, since it was well known that everything within two hours drive of the totality zone was booked solid. Moreover, I wanted the flexibility to adjust to the weather and couldn’t know in advance where I’d want to stay. So my idea was that on the night prior to the eclipse, I would drive to a good location in the path of the lunar shadow, and sleep in the back of my car. I had a sleeping bag with me to keep me warm, and enough lightweight clothing for the week — and not much else.

Oh, it was such a good plan, clean and simple, and that’s why my heart had so far to sink and my brain so ludicrous a calamity to contemplate when I checked my wallet, an hour before flight time, and saw a gaping black emptiness where my driver’s license was supposed to be. I was struck dumb. No license meant no car rental; no car meant no flexibility and no place to sleep. Sixteen years of driving and I had never lost it before; why, why, of all times, now, when it was to play a central role in a once-in-a-lifetime adventure?

I didn’t panic. I walked calmly back to the check-in counters, managed to get myself rescheduled for a flight on the following day, drove the three hours back to New Jersey, and started looking. It wasn’t in my car. Nor was it in the pile of unneeded items I’d removed from my wallet. Not in my suitcase, not under my bed, not in my office. As it was Sunday, I couldn’t get a replacement license. Hope dimmed, flickered, and went dark.

Deep breaths. Plan B?

I didn’t have a tent, and couldn’t easily have found one. But I did have a rain poncho, large enough to keep my sleeping bag off the ground. As long as it didn’t rain too hard, I could try, the night before the eclipse, to find a place to camp outdoors; with luck I’d find lodging for the other nights. I doubted this would be legal, but I was willing to take the chance. But what about my suitcase? I couldn’t carry that around with me into the wilderness. Fortunately, I knew a solution. For a year after college, I had studied music in France, and had often gone sightseeing by rail. On those trips I had commonly made use of the ubiquitous lockers at the train stations, leaving some luggage while I explored the nearby town. As for flexibility of location, that was unrecoverable; the big downside of Plan B was that I could no longer adjust to the weather. I’d just have to be lucky. I comforted myself with the thought that the worst that could happen to me would be a week of eating French food.

So the next day, carrying the additional weight of a poncho and an umbrella, but having in compensation discarded all inessential clothing and tourist information, I headed back to the airport, this time by bus. Without further misadventures, I was soon being carried across the Atlantic.

As usual I struggled to nap amid the loud silence of a night flight. But my sleeplessness was rewarded with one of those good omens that makes you think that you must be doing the right thing. As we approached the European coastline, and I gazed sleepily out my window, I suddenly saw a bright glowing light. It was the rising white tip of the thin crescent Moon.

Solar eclipses occur at New Moon, always. This is nothing but simple geometry; the Moon must place itself exactly between the Sun and the Earth to cause an eclipse, and that means the half of the Moon that faces us must be in shadow. (At Full Moon, the opposite is true; the Earth is between the Sun and the Moon, so the half of the Moon that faces us is in full sunlight. That’s when lunar eclipses can occur.) And just before a New Moon, the Moon is close to the Sun’s location in the sky. It becomes visible, as the Earth turns, just before the Sun does, rising as a morning crescent shortly before sunrise. (Similarly, we get an evening crescent just after a New Moon.)

There, out over the vast Atlantic, from a dark ocean of water into a dark sea of stars, rose the delicate thin slip of Luna the lover, on her way to her mystical rendezvous with Sol. Her crescent smiled at me and winked a greeting. I smiled back, and whispered, “see you in two days…” For totality is not merely the only time you can look straight at the Sun and see its crown. It is the only time you can see the New Moon.

We landed in Paris at 6:30 Monday morning, E-day-minus-two. I headed straight to the airport train station, and poured over rail maps and my road maps trying to guess a good location to use as a base. Eventually I chose a medium-sized town with the name Charleville-Mezieres. It was on the northern edge of the totality zone, at the end of a large spoke of the Paris-centered rail system, and was far enough from Paris, Brussels, and all large German towns that I suspected it might escape the worst of the crowds. It would then be easy, the night before the eclipse, to take a train back into the center of the zone, where totality would last the longest.

Two hours later I was in the Paris-East rail station and had purchased my ticket for Charleville-Mezieres. With ninety minutes to wait, I wandered around the station. It was evident that France had gone eclipse-happy. Every magazine had a cover story; every newspaper had a special insert; signs concerning the event were everywhere. Many of the magazines carried free eclipse glasses, with a black opaque metallic material for lenses that only the Sun can penetrate. Warnings against looking at the Sun without them were to be found on every newspaper front page. I soon learned that there had been a dreadful scandal in which a widely distributed shipment of imported glasses was discovered to be dangerously defective, leading the government to make a hurried and desperate attempt to recall them. There were also many leaflets advertising planned events in towns lying in the totality zone, and information about extra trains that would be running. A chaotic rush out of Paris was clearly expected.

Before noon I was on a train heading through the Paris suburbs into the farmlands of the Champagne region. The rocking of the train put me right to sleep, but the shrieking children halfway up the rail car quickly ended my nap. I watched the lovely sunlit French countryside as it rolled by. The Sun was by now well overhead — or rather, the Earth had rotated so that France was nearly facing the Sun head on. Sometimes, when the train banked on a turn, the light nearly blinded me, and I had to close my eyes.

With my eyelids shut, I thought about how I’d managed, over decades, to avoid ever once accidentally staring at the Sun for even a second… and about how almost every animal with eyes manages to do this during its entire life. It’s quite a feat, when you think about it. But it’s essential, of course. The Sun’s ferocious blaze is even worse than it appears, for it contains more than just visible light. It also radiates light too violet for us to see — ultraviolet — which is powerful enough to destroy our vision. Any animal lacking instincts powerful enough to keep its eyes off the Sun will go blind, soon to starve or be eaten. But humans are in danger during solar eclipses, because our intense curiosity can make us ignore our instincts. Many of us will suffer permanent eye damage, not understanding when and how it is safe to look at the Sun… which is almost, but not quite, never.

In fact the only time it is safe to look with the naked eye is during totality, when the Sun’s disk is completely blocked by the New Moon, and the world is dark. Then, and only then, can one see that the Sun is not a sphere, and that it has a sort of atmosphere, immense and usually unseen.

At the heart of the Sun, and source of its awesome power, is its nuclear furnace, nearly thirty million degrees hot and nearly five billion years old. All that heat gradually filters and boils out of the Sun’s core toward its visible surface, which is a mere six thousand degrees… still white-hot. Outside this region is a large irregular halo of material that is normally too dim to see against the blinding disk. The inner part of that halo is called the chromosphere; there, giant eruptions called “prominences” loop outward into space. The outer part of the halo is the “corona”, Latin for “crown.” The opportunity to see the Sun’s corona is one of the main reasons to seek totality.

Still very drowsy, but in a good mood, I arrived in Charleville. Wanting to leave my bags in the station while I looked for a hotel room, I searched for the luggage lockers. After three tiring trips around the station, I asked at a ticket booth. “Oh,” said the woman behind the desk, “we haven’t had them available since the Algerian terrorism of a few years ago.”

I gulped. This threatened plan B, for what was I to do with my luggage on eclipse day? I certainly couldn’t walk out into the French countryside looking for a place to camp while carrying a full suitcase and a sleeping bag! And even the present problem of looking for a hotel would be daunting. The woman behind the desk was sympathetic, but her only suggestion was to try one of the hotels near the station. Since the tourist information office was a mile away, it seemed the only good option, and I lugged my bags across the street.

Here, finally, luck smiled. The very first place I stopped at had a room for that night, reasonably priced and perfectly clean, if spartan. It was also available the night after the eclipse. My choice of Charleville had been wise. Unfortunately, even here, Eclipse Eve — Tuesday evening — was as bad as I imagined. The hoteliere assured me that all of Charleville was booked (and my later attempts to find a room, even a last-minute cancellation, proved fruitless.) Still, she she was happy for me to leave my luggage at the hotel while I tramped through the French countryside. Thus was Plan B saved.

Somewhat relieved, I wandered around the town. Charleville is not unattractive, and the orange sandstone 16th century architecture of its central square is very pleasing to the eye. By dusk I was exhausted and collapsed on my bed. I slept long and deep, and awoke refreshed. I took a short sightseeing trip by train, ate a delicious lunch, and tried one more time to find a room in Charleville for Eclipse Eve. Failing once again, I resolved to camp in the heart of the totality zone.

But where? I had several criteria in mind. For the eclipse, I wanted to be far from any large town or highway, so that streetlights, often automatically triggered by darkness, would not spoil the experience. Also I wanted hills and farmland; I wanted to be at a summit, with no trees nearby, in order to have the best possible view. It didn’t take long to decide on a location. About five miles south of the unassuming town of Rethel, rebuilt after total destruction in the first world war, my map showed a high hill. It seemed perfect.

Fortunately, I learned just in time that this same high hill had attracted the attention of the local authorities, and they had decided to designate this very place the “official viewing site” in the region. A hundred thousand people were expected to descend on Rethel and take shuttles from the town to the “site.” Clearly this was not where I wanted to be!

So instead, when I arrived in Rethel, I walked in another direction. I aimed for an area a few miles west of town, quiet hilly farmland.

Yet again, my luck seemed to be on the wane. By four it was drizzling, and by five it was raining. Darkness would settle at around eight, and I had little time to find a site for unobtrusive camping, much less a dry one. The rain stopped, restarted, hesitated, spat, but refused to go away. An unending mass of rain clouds could be seen heading toward me from the west. I had hoped to use trees for some shelter against rain, but now the trees were drenched and dripping, even worse than the rain itself.

Still completely unsure what I would do, I continued walking into the evening. I must have cut a very odd figure, carrying an open umbrella, a sleeping bag, and a small black backpack. I took a break in a village square, taking shelter at a church’s side door, where I munched on French bread and cheese. Maybe one of these farmers would let me sleep in a dry spot in his barn, I thought to myself. But I still hadn’t reached the hills I was aiming for, so I kept walking.

After another mile, I came to a hilltop with a dirt farm track crossing the road. There, just off the road to the right, was a large piece of farm machinery. And underneath it, a large, flat, sheltered spot. Hideous, but I could sleep there. Since it wasn’t quite nightfall yet and I could see a hill on the other side of the road along the same track, one which looked like it might be good for watching the eclipse, I took a few minutes to explore it. There I found another piece of farm equipment, also with a sheltered underbelly. This one was much further from the road, looked unused, and presumably offered both safer and quieter shelter. It was sitting just off the dirt track in a fallow field. The field was of thick, sticky, almost hard mud, the kind you don’t slip in and which doesn’t ooze but which gloms onto the sides of your shoe.

And so it was that Eclipse Eve found me spreading my poncho in a friendly unknown farmer’s field, twisting my body so as not to hit my head on the metal bars of my shelter, carefully unwrapping my sleeping bag and removing my shoes so as not to cover everything in mud, brushing my teeth in bottled water, and bedding down for the night. The whole scene was so absurd that I found myself sporting a slightly manic grin and giggling. But still, I was satisfied. Despite the odds, I was in the zone at the appointed time; when I awoke the next morning I would be scarcely two miles from my final destination. If the clouds were against me, so be it. I had done my part.

I slept pretty well, considering both my excitement and the uneven ground. At daybreak I was surrounded by fog, but by 8 a.m.~the fog was lifting, revealing a few spots of blue sky amid low clouds. My choice of shelter was also confirmed; my sleeping bag was dry, and across the road the other piece of machinery I had considered was already in use.

I packed up and started walking west again. The weather seemed uncertain, with three layers of clouds — low stratus, medium cumulus, and high cirrus — crossing over each other. Blue patches would appear, then close up. I trudged to the base of my chosen hill, then followed another dirt track to the top, where I was graced with a lovely view. The rolling paysage of fertile France stretched before me, blotched here and there with sunshine.  Again I had chosen well, better than I realized, as it turned out, for I was not alone on the hill. A Belgian couple had chosen it too — and they had a car…

There I waited. The minutes ticked by. The temperature fluctuated, and the fields changed color, as the Sun played hide and seek. I didn’t need these reminders of the Sun’s importance — that without its heat the Earth would freeze, and without its light, plants would not grow and the cycle of life would quickly end. I thought about how pre-scientific cultures had viewed the Sun. In cultures and religions around the world, the blazing disk has often been attributed divine power and regal authority. And why not? In the past century, we’ve finally learned what the Sun is made from and why it shines. But we are no less in awe than our ancestors, for the Sun is much larger, much older, and much more powerful than most of them imagined.

For a while, I listened to the radio. Crowds were assembling across Europe. Special events — concerts, art shows, contests — were taking place, organized by towns in the zone to coincide with the eclipse. This was hardly surprising. All those tourists had come for totality. But totality is brief, never more than a handful of minutes.  It’s the luck of geometry, the details of the orbits of the Earth and Moon, that set its duration. For my eclipse, the Moon’s shadow was only about a hundred miles wide. Racing along at three thousand miles per hour, it would darken any one location for at most two minutes. Now if a million people are expected to descend on your town for a two-minute event, I suppose it is a good idea to give them something else to do while they wait. And of course, the French cultural establishment loves this kind of opportunity. Multimedia events are their specialty, and they often give commissions to contemporary artists. I was particularly amused to discover later that an old acquaintance of mine — I met him in 1987 at the composers’ entrance exams for the Paris Conservatory — had been commissioned to write an orchestral piece, called “Eclipse,” for the festival in the large city of Reims. It was performed just before the moment of darkness.

Finally, around 11:30, the eclipse began. The Moon nibbled a tiny notch out of the sun. I looked at it briefly through my eclipse glasses, and felt the first butterflies of anticipation. The Belgian couple, in their late fourties, came up to the top of the hill and stood alongside me. They were Flemish, but the man spoke French, and we chatted for a while. It turned out he was a scientist also, and had spent some time in the United States, so we had plenty to talk about. But our discussion kept turning to the clouds, which showed no signs of dissipating. The Sun was often veiled by thin cirrus or completely hidden by thick cumulus. We kept a nervous watch.

Time crawled as the Moon inched across the brilliant disk. It passed the midway point and the Sun became a crescent. With only twenty minutes before totality, my Belgian friends conversed in Dutch. The man turned to me. “We have decided to drive toward that hole in the clouds back to the east,” he said in French. “It’s really not looking so good here. Do you want to come with us?” I paused to think. How far away was that hole? Would we end up back at the town? Would we get caught in traffic? Would we end up somewhere low? What were my chances if I stayed where I was? I hesitated, unsure. If I went with them, I was subject to their whims, not my own. But after looking at the oncoming clouds one more time, I decided my present location was not favorable. I joined them.

We descended the dirt track and turned left onto the road I’d taken so long to walk. It was completely empty. We kept one eye on where we were going and five eyes on the sky. After two miles, the crescent sun became visible through a large gap in the low clouds. There were still high thin clouds slightly veiling it, but the sky around it was a pale blue. We went a bit further, and then stopped… at the very same dirt track where I had slept the night before. A line of ten or fifteen cars now stretched along it, but there was plenty of room for our vehicle.

By now, with ten minutes to go, the light was beginning to change. When only five percent of the Sun remains, your eye can really tell. The blues become deeper, the whites become milkier, and everything is more subdued. Also it becomes noticeably cooler. I’d seen this light before, in New Mexico in 1994. I had gone there to watch an “annular” eclipse of the Sun. An annular eclipse occurs when the Moon passes directly in front of the Sun but is just a bit too far away from the Earth for its shadow to reach the ground. In such an eclipse, the Moon fails to completely block the Sun; a narrow ringlet, or “annulus”, often called the “ring of fire,” remains visible. That day I watched from a mountain top, site of several telescopes, in nearly clear skies. But imagine the dismay of the spectators as the four-and-a-half minutes of annularity were blocked by a five-minute cloud! Fortunately there was a bright spot. For a brief instant — no more than three seconds — the cloud became thin, and a perfect circle of light shone through, too dim to penetrate eclipse glasses but visible with the naked eye… a veiled, surreal vision.

On the dirt track in the middle of French fields, we started counting down the minutes. There was more and more tension in the air. I put faster speed film into my camera. The light became still milkier, and as the crescent became a fingernail, all eyes were focused either on the Sun itself or on a small but thick and dangerous-looking cloud heading straight for it. Except mine. I didn’t care if I saw the last dot of sunlight disappear. What I wanted to watch was the coming of Moon-shadow.

One of my motivations for seeking a hill was that I wanted to observe the approach of darkness. Three thousand miles an hour is just under a mile per second, so if one had a view extending out five miles or so, I thought, one could really see the edge coming. I expected it would be much like watching the shadow of a cloud coming toward me, with the darkness sweeping along the ground, only much darker and faster. I looked to the west and waited for the drama to unfold.

And it did, but it was not what I was expecting. Even though observing the shadow is a common thing for eclipse watchers to do, nothing I had ever read about eclipses prepared me in the slightest for what I was about to witness. I’ve never seen it photographed, or even described. Maybe it was an effect of all the clouds around us. Or maybe others, just as I do, find it difficult to convey.

For how can one relate the sight of daylight sliding swiftly, like an sigh, to deep twilight? of the western sky, seen through scattered clouds, changing seamlessly and inexorably from blue to pink to slate gray to the last yellow of sunset? of colors rising up out of the horizon and spreading across the sky like water from a broken dyke flooding onto a field?

I cannot find the right combination of words to capture the sense of being swept up, of being overwhelmed, of being transfixed with awe, as one might be before the summoning of a great wave or a great wind by the command of a god, yet all in utter silence and great beauty. Reliving it as I write this brings a tear. In the end I have nothing to compare it to.

The great metamorphosis passed. The light stabilized. Shaken, I looked up.

And quickly looked away. I had seen a near-disk of darkness, the fuzzy whiteness of the corona, and some bright dots around the disk’s edge, one especially bright where the Sun still clearly shone through. Accidentally I had seen with my naked eyes the “diamond ring,” a moment when the last brilliant drop of Sun and the glistening corona are simultaneously visible. It’s not safe to look at. I glanced again. Still several bright dots. I glanced again. Still there — but the Sun had to be covered by now…

So I looked longer, and realized that the Sun was indeed covered, that those bright dots were there to stay. There it was. The eclipsed Sun, or rather, the dark disk of the New Moon, surrounded by the Sun’s crown, studded at its edge with seven bright pink jewels. It was bizarre, awe-inspiring, a spooky hallucination. It shimmered.

The Sun’s corona didn’t really resemble what I had seen in photographs, and I could immediately see why. The corona looked as though it were made of glistening white wispy hair, billowing outward like a mop of whiskers. It gleamed with a celestial light, a shine resembling that of well-lit tinsel. No camera could capture that glow, no photograph reproduce it.

But the greatest, most delightful surprise was the seven beautiful gems. I knew they had to be the great eruptions on the surface of the Sun, prominences, huge magnetic storms larger than our planet and more violent than anything else in the solar system. However, nobody ever told me they were bright pink! I always assumed they were orange (silly of me, since the whole Sun looks orange if you look at it through an orange filter, which the photographs always do.) They were arranged almost symmetrically around the sun, with one of them actually well separated from its surface and halfway out into the lovely soft filaments of the corona. I explored them with my binoculars. The colors, the glistening timbre, the rich detail, it is a visual delight. The scene is living, vibrant, delicate and soft; by comparison, all the photographs and films seem dry, flat, deadened.

I was surprised at my calm. After the great rush of the shadow, the stasis of totality had caught me off guard.  Around me it was much lighter than I had expected. The sense was of late twilight, with a deep blue-purple sky; yet it was still bright enough to read by. The yellow light of late sunset stretched all the way around the horizon. The planet Venus was visible, but no stars peeked through the clouds. Perhaps longer eclipses have darker skies, a larger Moon-shadow putting daylight further away.

I had scarcely had time to absorb all of this when, just at the halfway point of totality, the dangerous-looking cumulus cloud finally arrived, and blotted out the view. A groan, but only a half-hearted one, emerged from the spectators; after all we’d seen what we’d come to see. I took in the colors emanating from the different parts of the sky, and then looked west again, waiting for the light to return. A thin red glow touched the horizon. I waited. Suddenly the red began to grow furiously. I yelled “Il revient!” — it is returning! — and then watched in awe as the reds became pinks, swarmed over us, turned yellow-white…

And then… it was daylight again. Normality, or a slightly muted version of it. The magical show was over, heavenly love had been consummated, we who had traveled far had been rewarded. The weather had been kind to us. There was a pause as we savored the experience, and waited for our brains to resume functioning. Then congratulations were passed around as people shook hands and hugged each other. I thanked my Belgian friends, who like me were smiling broadly. They offered me a ride back to town. I almost accepted, but stopped short, and instead thanked them again and told them I somehow wanted to be outside for a while longer. We exchanged addresses, said goodbyes, they drove off.

I started retracing my steps from the previous evening. As I walked back to the town of Rethel in the returning sunshine, the immensity of what I had seen began gradually to make its way through my skin into my blood, making me teary-eyed. I thought about myself, a scientist, educated and knowledgeable about the events that had just taken place, and tried to imagine what would have happened to me today if I had not had
that knowledge and had found myself, unexpectedly, in the Moon’s shadow.

It was not difficult; I had only to imagine what I would feel if the sky suddenly, without any warning, turned a fiery red instead of blue and began to howl. It would have been a living nightmare. The terror that I would have felt would have penetrated my bones. I would have fallen on my knees in panic; I would have screamed and wept; I would have called on every deity I knew and others I didn’t know for help; I would have despaired; I would have thought death or hell had come; I would have assumed my life was about to end. The two minutes of darkness, filled with the screams and cries of my neighbors, would have been timeless, maddening. When the Sun just as suddenly returned, I would have collapsed onto the ground with relief, profusely and weepingly thanking all of the deities for restoring the world to its former condition, and would have rushed home to relatives and friends, hoping to find some comfort and solace.

I would have sought explanations. I would have been willing to consider anything: dragons eating the Sun, spirits seeking to punish our village or country for its transgressions, evil and spiteful monsters trying to freeze the Earth, gods warning us of terrible things to come in future. But above all, I could never, never have imagined that this brief spine-chilling extinction and transformation of the Sun was a natural phenomenon. Nothing so spectacular and sudden and horrifying could have been the work of mere matter. It would once and for all have convinced me of the existence of creatures greater and more powerful than human beings, if I had previously had any doubt.

And I would have been forever changed. No longer could I have entirely trusted the regularity of days and nights, of seasons, of years. For the rest of my life I would have always found myself glancing at the sky, wanting to make sure that all, for the moment, was well. For if the Sun could suddenly vanish for two minutes, perhaps the next time it could vanish for two hours, or two days… or two centuries. Or forever.

I pondered the impact that eclipses, both solar and lunar, have had throughout human history. They have shaped civilizations. Wars and slaughters were begun and ended on their appearance; they sent ordinary people to their deaths as appeasement sacrifices; new gods and legends were invoked to give meaning to them. The need to predict them, and the coincidences which made their prediction possible, helped give birth to astronomy as a mathematically precise science, in China, in Greece, in modern Europe — developments without which my profession, and even my entire technologically-based culture, might not exist.

It was an hour’s walk to Rethel, but that afternoon it was a long journey. It took me across the globe to nations ancient and distant. By the time I reached the town, I’d communed with my ancestors, reconsidered human history, and examined anew my tiny place in the universe.  If I’d been a bit calm during totality itself, I wasn’t anymore. What I’d seen was gradually filtering, with great potency, into my soul.

I took the train back to Charleville, and slept dreamlessly. The next two days were an opportunity to unwind, to explore, and to eat well. On my last evening I returned to Paris to visit my old haunts. I managed to sneak into the courtyard of the apartment house where I had had a one-room garret up five flights of stairs, with its spartan furnishings and its one window that looked over the roofs of Paris to the Eiffel Tower. I wandered past the old Music Conservatory, since moved to the northeast corner of town, and past the bookstore where I bought so much music. My favorite bakery was still open.

That night I slept in an airport hotel, and the next day flew happily home to the American continent. I never did find my driver’s license.

But psychological closure came already on the day following the eclipse. I spent that day in Laon, a small city perched magnificently atop a rocky hill that rises vertically out of the French plains. I wandered its streets and visited its sights — an attractive church, old houses, pleasant old alleyways, ancient walls and gates. As evening approached I began walking about, looking for a restaurant, and I came to the northwestern edge of town overlooking the new city and the countryside beyond. The clouds had parted, and the Sun, looking large and dull red, was low in the sky. I leaned on the city wall and watched as the turning Earth carried me, and Laon, and all of France, at hundreds of miles an hour, intent on placing itself between me and the Sun. Yet another type of solar eclipse, one we call “sunset.”

The ruddy disk touched the horizon. I remembered the wispy white mane and the brilliant pink jewels. In my mind the Sun had always been grand and powerful, life-giver and taker, essential and dangerous. It could blind, burn, and kill.  I respected it, was impressed and awed by it, gave thanks for it, swore at it, feared it. But in the strange light of totality, I had seen beyond its unforgiving, blazing sphere, and glimpsed a softer side of the Sun. With its feathery hair blowing in a dark sky, it had seemed delicate, even vulnerable. It is, I thought to myself, as mortal as we.

The distant French hills rose across its face. As it waned, I found myself feeling a warmth, even a tenderness — affection for this giant glowing ball of hydrogen, this protector of our planet, this lonely beacon in a vast emptiness… the only star you and I will ever know.

Filed under: Astronomy, History of Science Tagged: astronomy, earth, eclipse, moon, space, sun

Scott AaronsonWhat I believe II (ft. Sarah Constantin and Stacey Jeffery)

Unrelated Update: To everyone who keeps asking me about the “new” P≠NP proof: I’d again bet $200,000 that the paper won’t stand, except that the last time I tried that, it didn’t achieve its purpose, which was to get people to stop asking me about it. So: please stop asking, and if the thing hasn’t been refuted by the end of the week, you can come back and tell me I was a closed-minded fool.

In my post “The Kolmogorov Option,” I tried to step back from current controversies, and use history to reflect on the broader question of how nerds should behave when their penchant for speaking unpopular truths collides head-on with their desire to be kind and decent and charitable, and to be judged as such by their culture.  I was gratified to get positive feedback about this approach from men and women all over the ideological spectrum.

However, a few people who I like and respect accused me of “dogwhistling.” They warned, in particular, that if I wouldn’t just come out and say what I thought about the James Damore Google memo thing, then people would assume the very worst—even though, of course, my friends themselves knew better.

So in this post, I’ll come out and say what I think.  But first, I’ll do something even better: I’ll hand the podium over to two friends, Sarah Constantin and Stacey Jeffery, both of whom were kind enough to email me detailed thoughts in response to my Kolmogorov post.

Sarah Constantin completed her PhD in math at Yale. I don’t think I’ve met her in person yet, but we have a huge number of mutual friends in the so-called “rationalist community.”  Whenever Sarah emails me about something I’ve written, I pay extremely close attention, because I have yet to read a single thing by her that wasn’t full of insight and good sense.  I strongly urge anyone who likes her beautiful essay below to check out her blog, which is called Otium.

Sarah Constantin’s Commentary:

I’ve had a women-in-STEM essay brewing in me for years, but I’ve been reluctant to actually write publicly on the topic for fear of stirring up a firestorm of controversy.  On the other hand, we seem to be at a cultural inflection point on the issue, especially in the wake of the leaked Google memo, and other people are already scared to speak out, so I think it’s past time for me to put my name on the line, and Scott has graciously provided me a platform to do so.

I’m a woman in tech myself. I’m a data scientist doing machine learning for drug discovery at Recursion Pharmaceuticals, and before that I was a data scientist at Palantir. Before that I was a woman in math — I got my PhD from Yale, studying applied harmonic analysis. I’ve been in this world all my adult life, and I obviously don’t believe my gender makes me unfit to do the work.

I’m also not under any misapprehension that I’m some sort of exception. I’ve been mentored by Ingrid Daubechies and Maryam Mirzakhani (the first female Fields Medalist, who died tragically young last month).  I’ve been lucky enough to work with women who are far, far better than me.  There are a lot of remarkable women in math and computer science — women just aren’t the majority in those fields. But “not the majority” doesn’t mean “rare” or “unknown.”

I even think diversity programs can be worthwhile. I went to the Institute for Advanced Studies’ Women and Math Program, which would be an excellent graduate summer school even if it weren’t all-female, and taught at its sister program for high school girls, which likewise is a great math camp independent of the gender angle. There’s a certain magic, if you’re in a male-dominated field, of once in a while being in a room full of women doing math, and I hope that everybody gets to have that experience once.  

But (you knew the “but” was coming), I think the Google memo was largely correct, and the way people conventionally talk about women in tech is wrong.

Let’s look at some of his claims. From the beginning of the memo:

  • Google’s political bias has equated the freedom from offense with psychological safety, but shaming into silence is the antithesis of psychological safety.
  • This silencing has created an ideological echo chamber where some ideas are too sacred to be honestly discussed.
  • The lack of discussion fosters the most extreme and authoritarian elements of this ideology.
  • Extreme: all disparities in representation are due to oppression
  • Authoritarian: we should discriminate to correct for this oppression

Okay, so there’s a pervasive assumption that any deviation from 50% representation of women in technical jobs is a.) due to oppression, and b.) ought to be corrected by differential hiring practices. I think it is basically true that people widely believe this, and that people can lose their jobs for openly contradicting it (as James Damore, the author of the memo, did).  I have heard people I work with advocating hiring quotas for women (i.e. explicitly earmarking a number of jobs for women candidates only).  It’s not a strawman.

Then, Damore disagrees with this assumption:

  • Differences in distributions of traits between men and women may in part explain why we don’t have 50% representation of women in tech and leadership. Discrimination to reach equal representation is unfair, divisive, and bad for business.

Again, I agree with Damore. Note that this doesn’t mean that I must believe that sexism against women isn’t real and important (I’ve heard enough horror stories to be confident that some work environments are toxic to women).  It doesn’t even mean that I must be certain that the different rates of men and women in technical fields are due to genetics.  I’m very far from certain, and I’m not an expert in psychology. I don’t think I can do justice to the science in this post, so I’m not going to cover the research literature.

But I do think it’s irresponsible to assume a priori that there are no innate sex differences that might explain what we see.  It’s an empirical matter, and a topic for research, not dogma.

Moreover, I think discrimination on the basis of sex to reach equal representation is unfair and unproductive.  It’s unfair, because it’s not meritocratic.  You’re not choosing the best human for the job regardless of gender.

I think women might actually benefit from companies giving genuine meritocracy a chance. “Blind” auditions (in which the evaluator doesn’t see the performer) gave women a better chance of landing orchestra jobs; apparently, orchestras were prejudiced against female musicians, and the blinding canceled out that prejudice. Google’s own research has actually shown that the single best predictor of work performance is a work sample — testing candidates with a small project similar to what they’d do on the job. Work samples are easy to anonymize to reduce gender bias, and they’re more effective than traditional interviews, where split-second first impressions usually decide who gets hired, but don’t correlate at all with job performance. A number of tech companies have switched to work samples as part of their interview process.  I used work samples myself when I was hiring for a startup, just because they seemed more accurate at predicting who’d be good at the job; entirely without intending to, I got a 50% gender ratio.  If you want to reduce gender bias in tech, it’s worth at least considering blinded hiring via work samples.

Moreover, thinking about “representation” in science and technology reflects underlying assumptions that I think are quite dangerous.

You expect interest groups to squabble over who gets a piece of the federal budget. In politics, people will band together in blocs, and try to get the biggest piece of the spoils they can.  “Women should get such-and-such a percent of tech jobs” sounds precisely like this kind of politicking; women are assumed to be a unified bloc who will vote together, and the focus is on what size chunk they can negotiate for themselves. If a tech job (or a university position) were a cushy sinecure, a ticket to privilege, and nothing more, you might reasonably ask “how come some people get more goodies than others? Isn’t meritocracy just an excuse to restrict the goodies to your preferred group?”

Again, this is not a strawman. Here’s one Vox response to the memo stating explicitly that she believes women are a unified bloc:

The manifesto’s sleight-of-hand delineation between “women, on average” and the actual living, breathing women who have had to work alongside this guy failed to reassure many of those women — and failed to reassure me. That’s because the manifesto’s author overestimated the extent to which women are willing to be turned against their own gender.

Speaking for myself, it doesn’t matter to me how soothingly a man coos that I’m not like most women, when those coos are accompanied by misogyny against most women. I am a woman. I do not stop being one during the parts of the day when I am practicing my craft. There can be no realistic chance of individual comfort for me in an environment where others in my demographic categories (or, really, any protected demographic categories) are subjected to skepticism and condescension.

She can’t be comfortable unless everybody in any protected demographic category — note that this is a legal, governmental category — is given the benefit of the doubt?  That’s a pretty collectivist commitment!

Or, look at Piper Harron, an assistant professor in math who blogged on the American Mathematical Society’s website that universities should simply “stop hiring white cis men”, and explicitly says “If you are on a hiring committee, and you are looking at applicants and you see a stellar white male applicant, think long and hard about whether your department needs another white man. You are not hiring a researching robot who will output papers from a dark closet. You are hiring an educator, a role model, a spokesperson, an advisor, a committee person … There is no objectivity. There is no meritocracy.”

Piper Harron reflects an extreme, of course, but she’s explicitly saying, on America’s major communication channel for and by mathematicians, that whether you get to work in math should not be based on whether you’re actually good at math. For her, it’s all politics.  Life itself is political, and therefore a zero-sum power struggle between groups.  

But most of us, male or female, didn’t fall in love with science and technology for that. Science is the mission to explore and understand our universe. Technology is the project of expanding human power to shape that universe. What we do towards those goals will live longer than any “protected demographic category”, any nation, any civilization.  We know how the Babylonians mapped the stars.

Women deserve an equal chance at a berth on the journey of exploration not because they form a political bloc but because some of them are discoverers and can contribute to the human mission.

Maybe, in a world corrupted by rent-seeking, the majority of well-paying jobs have some element of unearned privilege; perhaps almost all of us got at least part of our salaries by indirectly expropriating someone who had as good a right to it as us.

But that’s not a good thing, and that’s not what we hope for science and engineering to be, and I truly believe that this is not the inevitable fate of the human race — that we can only squabble over scraps, and never create.  

I’ve seen creation, and I’ve seen discovery. I know they’re real.

I care a lot more about whether my company achieves its goal of curing 100 rare diseases in 10 years than about the demographic makeup of our team.  We have an actual mission; we are trying to do something beyond collecting spoils.  

Do I rely on brilliant work by other women every day? I do. My respect for myself and my female colleagues is not incompatible with primarily caring about the mission.

Am I “turning against my own gender” because I see women as individuals first? I don’t think so. We’re half the human race, for Pete’s sake! We’re diverse. We disagree. We’re human.

When you think of “women-in-STEM” as a talking point on a political agenda, you mention Ada Lovelace and Grace Hopper in passing, and move on to talking about quotas.  When you think of women as individuals, you start to notice how many genuinely foundational advances were made by women — just in my own field of machine learning, Adele Cutler co-invented random forests, Corrina Cortes co-invented support vector machines, and Fei Fei Li created the famous ImageNet benchmark dataset that started a revolution in image recognition.

As a child, my favorite book was Carl Sagan’s Contact, a novel about Ellie Arroway, an astronomer loosely based on his wife Ann Druyan. The name is not an accident; like the title character in Sinclair Lewis’ Arrowsmith, Ellie is a truth-seeking scientist who battles corruption, anti-intellectualism, and blind prejudice.  Sexism is one of the challenges she faces, but the essence of her life is about wonder and curiosity. She’s what I’ve always tried to become.

I hope that, in seeking to encourage the world’s Ellies in science and technology, we remember why we’re doing that in the first place. I hope we remember humans are explorers.

Now let’s hear from another friend who wrote to me recently, and who has a slightly different take.  Stacey Jeffery is a quantum computing theorist at one of my favorite research centers, CWI in Amsterdam.  She completed her PhD at University of Waterloo, and has done wonderful work on quantum query complexity and other topics close to my heart.  When I was being viciously attacked in the comment-171 affair, Stacey was one of the first people to send me a note of support, and I’ve never forgotten it.

Stacey Jeffery’s Commentary

I don’t think Google was right to fire Damore. This makes me a minority among people with whom I have discussed this issue.  Hopefully some people come out in the comments in support of the other position, so it’s not just me presenting that view, but the main argument I encountered was that what he said just sounded way too sexist for Google to put up with.  I agree with part of that, it did sound sexist to me.  In fact it also sounded racist to me. But that’s not because he necessarily said anything actually sexist or actually racist, but because he said the kinds of things that you usually only hear from sexist people, and in particular, the kind of sexist people who are also racist.  I’m very unlikely to try to pursue further interaction with a person who says these kinds of things for those reasons, but I think firing him for what he said between the lines sets a very bad precedent.  It seems to me he was fired for associating himself with the wrong ideas, and it does feel a bit like certain subjects are not up for rational discussion.  If Google wants an open environment, where employees can feel safe discussing company policy, I don’t think this contributes to that.  If they want their employees, and the world, to think that they aim for diversity because it’s the most rational course of action to achieve their overall objectives, rather than because it serves some secret agenda, like maintaining a PC public image, then I don’t think they’ve served that cause either.  Personally, this irritates me the most, because I feel they have damaged the image for a cause I feel strongly about.

My position is independent of the validity of Damore’s attempt at scientific argument, which is outside my area of expertise.  I personally don’t think it’s very productive for non-social-scientists to take authoritative positions on social science issues, especially ones that appear to be controversial within the field (but I say this as a layperson).  This may include some of the other commentary in this blog post, which I have not yet read, and might even extend to Scott’s decision to comment on this issue at all (but this bridge was crossed in the previous blog post).  However, I think one of the reasons that many of us do this is that the burden of solving the problem of too few women in STEM is often placed on us.  Some people in STEM feel they are blamed for not being welcoming enough to women (in fact, in my specific field, it’s my experience that the majority of people are very sympathetic).  Many scientific funding applications even ask applicants how they plan to address the issue of diversity, as if they should be the ones to come up with a solution for this difficult problem that nobody knows the answer to, and is not even within their expertise.  So it’s not surprising when these same people start to think about and form opinions on these social science issues.  Obviously, we working in STEM have valuable insight into how we might encourage women to pursue STEM careers, and we should be pushed to think about this, but we don’t have all the answers (and maybe we should remember that the next time we consider authoring an authoritative memo on the subject).

Scott’s Mansplaining Commentary

I’m incredibly grateful to Sarah and Stacey for sharing their views.  Now it’s time for me to mansplain my own thoughts in light of what they said.  Let me start with a seven-point creed.

1. I believe that science and engineering, both in academia and in industry, benefit enormously from contributions from people of every ethnic background and gender identity.  This sort of university-president-style banality shouldn’t even need to be said, but in a world where the President of the US criticizes neo-Nazis only under extreme pressure from his own party, I suppose it does.

2. I believe that there’s no noticeable difference in average ability between men and women in STEM fields—or if there’s some small disparity, for all I know the advantage goes to women. I have enough Sheldon Cooper in me that, if this hadn’t been my experience, I’d probably let it slip that it hadn’t been, but it has been.  When I taught 6.045 (undergrad computability and complexity) at MIT, women were only 20% or so of the students, but for whatever reasons they were wildly overrepresented among the top students.

3. I believe that women in STEM face obstacles that men don’t.  These range from the sheer awkwardness of sometimes being the only woman in a room full of guys, to challenges related to pregnancy and childcare, to actual belittlement and harassment.  Note that, even if men in STEM fields are no more sexist on average than men in other fields—or are less sexist, as one might expect from their generally socially liberal views and attitudes—the mere fact of the gender imbalance means that women in STEM will have many more opportunities to be exposed to whatever sexists there are.  This puts a special burden on us to create a welcoming environment for women.

4. Given that we know that gender gaps in interest and inclination appear early in life, I believe in doing anything we can to encourage girls’ interest in STEM fields.  Trust me, my four-year-old daughter Lily wishes I didn’t believe so fervently in working with her every day on her math skills.

5. I believe that gender diversity is valuable in itself.  It’s just nicer, for men and women alike, to have a work environment with many people of both sexes—especially if (as is often the case in STEM) so much of our lives revolves around our work.  I think that affirmative action for women, women-only scholarships and conferences, and other current efforts to improve gender diversity can all be defended and supported on that ground alone.

6. I believe that John Stuart Mill’s The Subjection of Women is one of the masterpieces of history, possibly the highest pinnacle that moral philosophy has ever reached.  Everyone should read it carefully and reflect on it if they haven’t already.

7. I believe it’s a tragedy that the current holder of the US presidency is a confessed sexual predator, who’s full of contempt not merely for feminism, but for essentially every worthwhile human value. I believe those of us on the “pro-Enlightenment side” now face the historic burden of banding together to stop this thug by every legal and peaceful means available. I believe that, whenever the “good guys” tear each other down in internecine warfare—e.g. “nerds vs. feminists”—it represents a wasted opportunity and an unearned victory for the enemies of progress.

OK, now for the part that might blow some people’s minds.  I hold that every single belief above is compatible with what James Damore wrote in his now-infamous memo—at least, if we’re talking about the actual words in it.  In some cases, Damore even makes the above points himself.  In particular, there’s nothing in what he wrote about female Googlers being less qualified on average than male Googlers, or being too neurotic to code, or anything like that: the question at hand is just why there are fewer women in these positions, and that in turn becomes a question about why there are fewer women earlier in the CS pipeline.  Reasonable people need not agree about the answers to those questions, or regard them as known or obvious, to see that the failure to make this one elementary distinction, between quality and quantity, already condemns 95% of Damore’s attackers as not having read or understood what he wrote.

Let that be the measure of just how terrifyingly efficient the social-media outrage machine has become at twisting its victims’ words to fit a clickbait narrative—a phenomenon with which I happen to be personally acquainted.  Strikingly, it seems not to make the slightest difference if (as in this case) the original source text is easily available to everyone.

Still, while most coverage of Damore’s memo was depressing in its monotonous incomprehension, dissent was by no means confined to the right-wingers eager to recruit Damore to their side.  Peter Singer—the legendary leftist moral philosopher, and someone whose fearlessness and consistency I’ve always admired whether I’ve agreed with him or not—wrote a powerful condemnation of Google’s decision to fire Damore.  Scott Alexander was brilliant as usual in picking apart bad arguments.  Megan McArdle drew on her experiences to illustrate some of Damore’s contentions.  Steven Pinker tweeted that Damore’s firing “makes [the] job of anti-Trumpists harder.”

Like Peter Singer, and also like Sarah Constantin and Stacey Jeffery above, I have no plans to take any position on biological differences in male and female inclinations and cognitive styles, and what role (if any) such differences might play in 80% of Google engineers being male—or, for that matter, what role they might play in 80% of graduating veterinarians now being female, or other striking gender gaps.  I decline to take a position not only because I’m not an expert, but also because, as Singer says, doing so isn’t necessary to reach the right verdict about Damore’s firing.  It suffices to note that the basic thesis being discussed—namely, that natural selection doesn’t stop at the neck, and that it’s perfectly plausible that it acted differently on women and men in ways that might help explain many of the population-level differences that we see today—can also be found in, for example, The Blank Slate by Steven Pinker, and other mainstream works by some of the greatest thinkers alive.

And therefore I say: if James Damore deserves to be fired from Google, for treating evolutionary psychology as potentially relevant to social issues, then Steven Pinker deserves to be fired from Harvard for the same offense.

Yes, I realize that an employee of a private company is different from a tenured professor.  But I don’t see why it’s relevant here.  For if someone really believes that mooting the hypothesis of an evolutionary reason for average differences in cognitive styles between men and women, is enough by itself to create a hostile environment for women—well then, why should tenure be a bar to firing, any more than it is in cases of sexual harassment?

But the reductio needn’t stop there.  It seems to me that, if Damore deserves to be fired, then so do the 56% of Googlers who said in a poll that they opposed his firing.  For isn’t that 56% just as responsible for maintaining a hostile environment as Damore himself was? (And how would Google find out which employees opposed the firing? Well, if there’s any company on earth that could…)  Furthermore, after those 56% of Googlers are fired, any of the remaining 44% who think the 56% shouldn’t have been fired should be fired as well!  And so on iteratively, until only an ideologically reliable core remains, which might or might not be the empty set.

OK, but while the wider implications of Damore’s firing have frightened and depressed me all week, as I said, I depart from Damore on the question of affirmative action and other diversity policies.  Fundamentally, what I want is a sort of negotiated agreement or bargain, between STEM nerds and the wider culture in which they live.  The agreement would work like this: STEM nerds do everything they can to foster diversity, including by creating environments that are welcoming for women, and by supporting affirmative action, women-only scholarships and conferences, and other diversity policies.  The STEM nerds also agree never to talk in public about possible cognitive-science explanations for gender disparities in which careers people choose, or overlapping bell curves,  or anything else potentially inflammatory.  In return, just two things:

  1. Male STEM nerds don’t regularly get libelled as misogynist monsters, who must be scaring all the women away with their inherently gross, icky, creepy, discriminatory brogrammer maleness.
  2. The fields beloved by STEM nerds are suffered to continue to exist, rather than getting destroyed and rebuilt along explicitly ideological lines, as already happened with many humanities and social science fields.

So in summary, neither side advances its theories about the causes of gender gaps; both sides simply agree that there are more interesting topics to explore.  In concrete terms, the social-justice side gets to retain 100% of what it has now, or maybe even expand it.  And all it has to offer in exchange is “R-E-S-P-E-C-T“!  Like, don’t smear and shame male nerds as a class, or nerdy disciplines themselves, for gender gaps that the male nerds would be as happy as anybody to see eradicated.

The trouble is that, fueled by outrage-fests on social media, I think the social-justice side is currently failing to uphold its end of this imagined bargain.  Nearly every day the sun rises to yet another thinkpiece about the toxic “bro culture” of Silicon Valley: a culture so uniquely and incorrigibly misogynist, it seems, that it still intentionally keeps women out, even after law and biology and most other white-collar fields have achieved or exceeded gender parity, their own “bro cultures” notwithstanding.  The trouble with this slander against male STEM nerds, besides its fundamental falsity (which Scott Alexander documented), is that puts the male nerds into an impossible position.  For how can they refute the slander without talking about other possible explanations for fields like CS being 80% male, which is the very thing we all know they’re not supposed to talk about?

In Europe, in the Middle Ages, the Church would sometimes enjoy forcing the local Jews into “disputations” about whose religion was the true one.  At these events, a popular tactic on the Church’s side was to make statements that the Jews couldn’t possibly answer without blaspheming the name of Christ—which, of course, could lead to the Jews’ expulsion or execution if they dared it.

Maybe I have weird moral intuitions, but it’s hard for me to imagine a more contemptible act of intellectual treason, than deliberately trapping your opponents between surrender and blasphemy.  I’d actually rather have someone force me into one or the other, than make me choose, and thereby make me responsible for whichever choice I made.  So I believe the social-justice left would do well to forswear this trapping tactic forever.

Ironically, I suspect that in the long term, doing so would benefit no entity more than the social-justice left itself.  If I had to steelman, in one sentence, the argument that in the space of one year propelled the “alt-right” from obscurity in dark and hateful corners of the Internet, to the improbable and ghastly ascent of Donald Trump and his white-nationalist brigade to the most powerful office on earth, the argument would be this:

If the elites, the technocrats, the “Cathedral”-dwellers, were willing to lie to the masses about humans being blank slates—and they obviously were—then why shouldn’t we assume that they also lied to us about healthcare and free trade and guns and climate change and everything else?

We progressives deluded ourselves that we could permanently shame our enemies into silence, on pain of sexism, racism, xenophobia, and other blasphemies.  But the “victories” won that way were hollow and illusory, and the crumbling of the illusion brings us to where we are now: with a vindictive, delusional madman in the White House who has a non-negligible chance of starting a nuclear war this week.

The Enlightenment was a specific historical period in 18th-century Europe.  But the term can also be used much more broadly, to refer to every trend in human history that’s other than horrible.  Seen that way, the Enlightenment encompasses the scientific revolution, the abolition of slavery, the decline of all forms of violence, the spread of democracy and literacy, and the liberation of women from domestic drudgery to careers of their own choosing.  The invention of Google, which made the entire world’s knowledge just a search bar away, is now also a permanent part of the story of the Enlightenment.

I fantasize that, within my lifetime, the Enlightenment will expand further to tolerate a diversity of cognitive styles—including people on the Asperger’s and autism spectrum, with their penchant for speaking uncomfortable truths—as well as a diversity of natural abilities and inclinations.  Society might or might not get the “demographically correct” percentage of Ellie Arroways—Ellie might decide to become a doctor or musician rather than an astronomer, and that’s fine too—but most important, it will nurture all the Ellie Arroways that it gets, all the misfits and explorers of every background.  I wonder whether, while disagreeing on exactly what’s meant by it, all parties to this debate could agree that diversity represents a next frontier for the Enlightenment.

Comment Policy: Any comment, from any side, that attacks people rather than propositions will be deleted.  I don’t care if the comment also makes useful points: if it contains a single ad hominem, it’s out.

As it happens, I’m at a quantum supremacy workshop in Bristol, UK right now—yeah, yeah, I’m a closet supremacist after all, hur hur—so I probably won’t participate in the comments until later.

Tommaso DorigoRevenge Of The Slimeballs - Part 4

This is the fourth part of Chapter 3 of the book "Anomaly! Collider Physics and the Quest for New Phenomena at Fermilab". The chapter recounts the pioneering measurement of the Z mass by the CDF detector, and the competition with SLAC during the summer of 1989. The title of the post is the same as the one of chapter 3, and it refers to the way some SLAC physicists called their Fermilab colleagues, whose hadron collider was to their eyes obviously inferior to the electron-positron linear collider.

read more

Terence TaoOn the sign patterns of entrywise positivity preservers in fixed dimension

Apoorva Khare and I have just uploaded to the arXiv our paper “On the sign patterns of entrywise positivity preservers in fixed dimension“. This paper explores the relationship between positive definiteness of Hermitian matrices, and entrywise operations on these matrices. The starting point for this theory is the Schur product theorem, which asserts that if {A = (a_{ij})_{1 \leq i,j \leq N}} and {B = (b_{ij})_{1 \leq i,j \leq N}} are two {N \times N} Hermitian matrices that are positive semi-definite, then their Hadamard product

\displaystyle  A \circ B := (a_{ij} b_{ij})_{1 \leq i,j \leq N}

is also positive semi-definite. (One should caution that the Hadamard product is not the same as the usual matrix product.) To prove this theorem, first observe that the claim is easy when {A = {\bf u} {\bf u}^*} and {B = {\bf v} {\bf v}^*} are rank one positive semi-definite matrices, since in this case {A \circ B = ({\bf u} \circ {\bf v}) ({\bf u} \circ {\bf v})^*} is also a rank one positive semi-definite matrix. The general case then follows by noting from the spectral theorem that a general positive semi-definite matrix can be expressed as a non-negative linear combination of rank one positive semi-definite matrices, and using the bilinearity of the Hadamard product and the fact that the set of positive semi-definite matrices form a convex cone. A modification of this argument also lets one replace “positive semi-definite” by “positive definite” in the statement of the Schur product theorem.

One corollary of the Schur product theorem is that any polynomial {P(z) = c_0 + c_1 z + \dots + c_d z^d} with non-negative coefficients {c_n \geq 0} is entrywise positivity preserving on the space {{\mathbb P}_N({\bf C})} of {N \times N} positive semi-definite Hermitian matrices, in the sense that for any matrix {A = (a_{ij})_{1 \leq i,j \leq N}} in {{\mathbb P}_N({\bf C})}, the entrywise application

\displaystyle  P[A] := (P(a_{ij}))_{1 \leq i,j \leq N}

of {P} to {A} is also positive semi-definite. (As before, one should caution that {P[A]} is not the application {P(A)} of {P} to {A} by the usual functional calculus.) Indeed, one can expand

\displaystyle  P[A] = c_0 A^{\circ 0} + c_1 A^{\circ 1} + \dots + c_d A^{\circ d},

where {A^{\circ i}} is the Hadamard product of {i} copies of {A}, and the claim now follows from the Schur product theorem and the fact that {{\mathbb P}_N({\bf C})} is a convex cone.

A slight variant of this argument, already observed by Pólya and Szegö in 1925, shows that if {I} is any subset of {{\bf C}} and

\displaystyle  f(z) = \sum_{n=0}^\infty c_n z^n \ \ \ \ \ (1)

is a power series with non-negative coefficients {c_n \geq 0} that is absolutely and uniformly convergent on {I}, then {f} will be entrywise positivity preserving on the set {{\mathbb P}_N(I)} of positive definite matrices with entries in {I}. (In the case that {I} is of the form {I = [0,\rho]}, such functions are precisely the absolutely monotonic functions on {I}.)

In the work of Schoenberg and of Rudin, we have a converse: if {f: (-1,1) \rightarrow {\bf C}} is a function that is entrywise positivity preserving on {{\mathbb P}_N((-1,1))} for all {N}, then it must be of the form (1) with {c_n \geq 0}. Variants of this result, with {(-1,1)} replaced by other domains, appear in the work of Horn, Vasudeva, and Guillot-Khare-Rajaratnam.

This gives a satisfactory classification of functions {f} that are entrywise positivity preservers in all dimensions {N} simultaneously. However, the question remains as to what happens if one fixes the dimension {N}, in which case one may have a larger class of entrywise positivity preservers. For instance, in the trivial case {N=1}, a function would be entrywise positivity preserving on {{\mathbb P}_1((0,\rho))} if and only if {f} is non-negative on {I}. For higher {N}, there is a necessary condition of Horn (refined slightly by Guillot-Khare-Rajaratnam) which asserts (at least in the case of smooth {f}) that all derivatives of {f} at zero up to {(N-1)^{th}} order must be non-negative in order for {f} to be entrywise positivity preserving on {{\mathbb P}_N((0,\rho))} for some {0 < \rho < \infty}. In particular, if {f} is of the form (1), then {c_0,\dots,c_{N-1}} must be non-negative. In fact, a stronger assertion can be made, namely that the first {N} non-zero coefficients in (1) (if they exist) must be positive, or equivalently any negative term in (1) must be preceded (though not necessarily immediately) by at least {N} positive terms. If {f} is of the form (1) is entrywise positivity preserving on the larger set {{\mathbb P}_N((0,+\infty))}, one can furthermore show that any negative term in (1) must also be followed (though not necessarily immediately) by at least {N} positive terms.

The main result of this paper is that these sign conditions are the only constraints for entrywise positivity preserving power series. More precisely:

Theorem 1 For each {n}, let {\epsilon_n \in \{-1,0,+1\}} be a sign.

  • Suppose that any negative sign {\epsilon_M = -1} is preceded by at least {N} positive signs (thus there exists {0 \leq n_0 < \dots < n_{N-1}< M} with {\epsilon_{n_0} = \dots = \epsilon_{n_{N-1}} = +1}). Then, for any {0 < \rho < \infty}, there exists a convergent power series (1) on {(0,\rho)}, with each {c_n} having the sign of {\epsilon_n}, which is entrywise positivity preserving on {{\bf P}_N((0,\rho))}.
  • Suppose in addition that any negative sign {\epsilon_M = -1} is followed by at least {N} positive signs (thus there exists {M < n_N < \dots < n_{2N-1}} with {\epsilon_{n_N} = \dots = \epsilon_{n_{2N-1}} = +1}). Then there exists a convergent power series (1) on {(0,+\infty)}, with each {c_n} having the sign of {\epsilon_n}, which is entrywise positivity preserving on {{\bf P}_N((0,+\infty))}.

One can ask the same question with {(0,\rho)} or {(0,+\infty)} replaced by other domains such as {(-\rho,\rho)}, or the complex disk {D(0,\rho)}, but it turns out that there are far fewer entrywise positivity preserving functions in those cases basically because of the non-trivial zeroes of Schur polynomials in these ranges; see the paper for further discussion. We also have some quantitative bounds on how negative some of the coefficients can be compared to the positive coefficients, but they are a bit technical to state here.

The heart of the proofs of these results is an analysis of the determinants {\mathrm{det} P[ {\bf u} {\bf u}^* ]} of polynomials {P} applied entrywise to rank one matrices {uu^*}; the positivity of these determinants can be used (together with a continuity argument) to establish the positive definiteness of {P[uu^*]} for various ranges of {P} and {u}. Using the Cauchy-Binet formula, one can rewrite such determinants as linear combinations of squares of magnitudes of generalised Vandermonde determinants

\displaystyle  \mathrm{det}( u_i^{n_j} )_{1 \leq i,j \leq N},

where {{\bf u} = (u_1,\dots,u_N)} and the signs of the coefficients in the linear combination are determined by the signs of the coefficients of {P}. The task is then to find upper and lower bounds for the magnitudes of such generalised Vandermonde determinants. These determinants oscillate in sign, which makes the problem look difficult; however, an algebraic miracle intervenes, namely the factorisation

\displaystyle  \mathrm{det}( u_i^{n_j} )_{1 \leq i,j \leq N} = \pm V({\bf u}) s_\lambda({\bf u})

of the generalised Vandermonde determinant into the ordinary Vandermonde determinant

\displaystyle  V({\bf u}) = \prod_{1 \leq i < j \leq N} (u_j - u_i)

and a Schur polynomial {s_\lambda} applied to {{\bf u}}, where the weight {\lambda} of the Schur polynomial is determined by {n_1,\dots,n_N} in a simple fashion. The problem then boils down to obtaining upper and lower bounds for these Schur polynomials. Because we are restricting attention to matrices taking values in {(0,\rho)} or {(0,+\infty)}, the entries of {{\bf u}} can be taken to be non-negative. One can then take advantage of the total positivity of the Schur polynomials to compare these polynomials with a monomial, at which point one can obtain good criteria for {P[A]} to be positive definite when {A} is a rank one positive definite matrix {A = {\bf u} {\bf u}^*}.

If we allow the exponents {n_1,\dots,n_N} to be real numbers rather than integers (thus replacing polynomials or power series by Pusieux series or Hahn series), then we lose the above algebraic miracle, but we can replace it with a geometric miracle, namely the Harish-Chandra-Itzykson-Zuber identity, which I discussed in this previous blog post. This factors the above generalised Vandermonde determinant as the product of the ordinary Vandermonde determinant and an integral of a positive quantity over the orthogonal group, which one can again compare with a monomial after some fairly elementary estimates.

It remains to understand what happens for more general positive semi-definite matrices {A}. Here we use a trick of FitzGerald and Horn to amplify the rank one case to the general case, by expressing a general positive semi-definite matrix {A} as a linear combination of a rank one matrix {{\bf u} {\bf u}^*} and another positive semi-definite matrix {B} that vanishes on the last row and column (and is thus effectively a positive definite {N-1 \times N-1} matrix). Using the fundamental theorem of calculus to continuously deform the rank one matrix {{\bf u} {\bf u}^*} to {A} in the direction {B}, one can then obtain positivity results for {P[A]} from positivity results for {P[{\bf u} {\bf u}^*]} combined with an induction hypothesis on {N}.

Filed under: math.CA, math.SP, paper, Uncategorized Tagged: Hadamard product, Harish-Chandra-Itzykson-Zuber formula, positive definite matrices, Schur polynomials

John BaezA Bicategory of Decorated Cospans

My students are trying to piece together general theory of networks, inspired by many examples. A good general theory should clarify and unify these examples. What some people call network theory, I’d just call ‘applied graph invariant theory’: they come up with a way to calculate numbers from graphs, they calculate these numbers for graphs that show up in nature, and then they try to draw conclusions about this. That’s fine as far as it goes, but there’s a lot more to network theory!

There are many kinds of networks. You can usually create big networks of a given kind by sticking together smaller networks of this kind. The networks usually do something, and the behavior of the whole is usually determined by the behavior of the parts and how the parts are stuck together.

So, we should think of networks of a given kind as morphisms in a category, or more generally elements of an algebra of some operad, and define a map sending each such network to its behavior. Then we can study this map mathematically!

All these insights (and many more) are made precise in Fong’s theory of ‘decorated cospans’:

• Brendan Fong, The Algebra of Open and Interconnected Systems, Ph.D. thesis, University of Oxford, 2016. (Blog article here.)

Kenny Courser is starting to look at the next thing: how one network can turn into another. For example, a network might change over time, or we might want to simplify a complicated network somehow. If a network is morphism, a process where one network turns into another could be a ‘2-morphism’: that is, a morphism between morphisms. Just as categories have objects and morphisms, bicategories have objects, morphisms and 2-morphisms.

So, Kenny is looking at bicategories. As a first step, Kenny took Brendan’s setup and souped it up to define ‘decorated cospan bicategories’:

• Kenny Courser, Decorated cospan bicategories, Theory and Applications of Categories 32 (2017), 985–1027.

In this paper, he showed that these bicategories are often ‘symmetric monoidal’. This means that you can not only stick networks together end to end, you can also set them side by side or cross one over the other—and similarly for processes that turn one network into another! A symmetric monoidal bicategory is a somewhat fearsome structure, so Kenny used some clever machinery developed by Mike Shulman to get the job done:

• Mike Shulman, Constructing symmetric monoidal bicategories.

I would love to talk about the details, but they’re a bit technical so I think I’d better talk about something more basic. Namely: what’s a decorated cospan category and what’s a decorated cospan bicategory?

First: what’s a decorated cospan? A cospan in some category C is a diagram like this:

where the objects and morphisms are all in C. For example, if C is the category of sets, we’ve got two sets X and Y mapped to a set \Gamma.

In a ‘decorated’ cospan, the object \Gamma is equipped or, as we like to say, ‘decorated’ with extra structure. For example:

Here the set \Gamma consists of 3 points—but it’s decorated with a graph whose edges are labelled by numbers! You could use this to describe an electrical circuit made of resistors. The set X would then be the set of ‘input terminals’, and Y the set of ‘output terminals’.

In this example, and indeed in many others, there’s no serious difference between inputs and outputs. We could reflect the picture, switching the roles of X and Y, and the inputs would become outputs and vice versa. One reason for distinguishing them is that we can then attach the outputs of one circuit to the inputs of another and build a larger circuit. If we think of our circuit as a morphism from the input set X to the output set Y, this process of attaching circuits to form larger ones can be seen as composing morphisms in a category.

In other words, if we get the math set up right, we can compose a decorated cospan from X to Y and a decorated cospan from Y to Z and get a decorated cospan from X to Z. So with luck, we get a category with objects of C as objects, and decorated cospans between these guys as morphisms!

For example, we can compose this:

and this:

to get this:

What did I mean by saying ‘with luck’? Well, there’s not really any luck involved, but we need some assumptions for all this to work. Before we even get to the decorations, we need to be able to compose cospans. We can do this whenever our cospans live in a category with pushouts. In category theory, a pushout is how we glue two things together.

So, suppose our category C has pushouts. IF we then have two cospans in C, one from X to Y and one from Y to Z:

we can take a pushout:

and get a cospan from X to Z:

All this is fine and dandy, but there’s a slight catch: the pushout is only defined up to isomorphism, so we can’t expect this process of composing cospans to be associative: it will only be associative up to isomorphism.

What does that mean? What’s an isomorphism of cospans?

I’m glad you asked. A map of cospans is a diagram like this:

where the two triangles commmute. You can see two cospans in this picture; the morphism f provides the map from one to the other. If f is an isomorphism, then this is an isomorphism of cospans.

To get around this problem, we can work with a category where the morphisms aren’t cospans, but isomorphism classes of cospans. That’s what Brendan did, and it’s fine for many purposes.

But back around 1972, when Bénabou was first inventing bicategories, he noticed that you could also create a bicategory with

• objects of C as objects,
• spans in C as morphisms, and
• maps of spans in C as 2-morphisms.

Bicategories are perfectly happy for composition of 1-morphisms to be associative only up to isomorphism, so this solves the problem in a somewhat nicer way. (Taking equivalence classes of things when you don’t absolutely need to is regarded with some disdain in category theory, because it often means you’re throwing out information—and when you throw out information, you often regret it later.)

So, if you’re interested in decorated cospan categories, and you’re willing to work with bicategories, you should consider thinking about decorated cospan bicategories. And now, thanks to Kenny Courser’s work, you can!

He showed how the decorations work in the bicategorical approach: for example, he proved that whenever C has finite colimits and

F : (C,+) \to (\mathrm{Set}, \times)

is a lax symmetric monoidal functor, you get a symmetric monoidal bicategory where a morphism is a cospan in C:

with the object \Gamma decorated by an element of F(\Gamma).

Proving this took some virtuosic work in category theory. The key turns out to be this glorious diagram:

For the explanation, check out Proposition 4.1 in his paper.

I’ll talk more about applications of cospan bicategories when I blog about some other papers Kenny Courser and Daniel Cicala are writing.

August 17, 2017

John PreskillTopological qubits: Arriving in 2018?

Editor‘s note: This post was prepared jointly by Ryan Mishmash and Jason Alicea.

Physicists appear to be on the verge of demonstrating proof-of-principle “usefulness” of small quantum computers.  Preskill’s notion of quantum supremacy spotlights a particularly enticing goal: use a quantum device to perform some computation—any computation in fact—that falls beyond the reach of the world’s best classical computers.  Efforts along these lines are being vigorously pursued along many fronts, from academia to large corporations to startups.  IBM’s publicly accessible 16-qubit superconducting device, Google’s pursuit of a 7×7 superconducting qubit array, and the recent synthesis of a 51-qubit quantum simulator using rubidium atoms are a few of many notable highlights.  While the number of qubits obtainable within such “conventional” approaches has steadily risen, synthesizing the first “topological qubit” remains an outstanding goal.  That ceiling may soon crumble however—vaulting topological qubits into a fascinating new chapter in the quest for scalable quantum hardware.

Why topological quantum computing?

As quantum computing progresses from minimalist quantum supremacy demonstrations to attacking real-world problems, hardware demands will naturally steepen.  In, say, a superconducting-qubit architecture, a major source of overhead arises from quantum error correction needed to combat decoherence.  Quantum-error-correction schemes such as the popular surface-code approach encode a single fault-tolerant logical qubit in many physical qubits, perhaps thousands.  The number of physical qubits required for practical applications can thus rapidly balloon.

The dream of topological quantum computing (introduced by Kitaev) is to construct hardware inherently immune to decoherence, thereby mitigating the need for active error correction.  In essence, one seeks physical qubits that by themselves function as good logical qubits.  This lofty objective requires stabilizing exotic phases of matter that harbor emergent particles known as “non-Abelian anyons”.  Crucially, nucleating non-Abelian anyons generates an exponentially large set of ground states that cannot be distinguished from each other by any local measurement.  Topological qubits encode information in those ground states, yielding two key virtues:

(1) Insensitivity to local noise.  For reference, consider a conventional qubit encoded in some two-level system, with the 0 and 1 states split by an energy \hbar \omega.  Local noise sources—e.g., random electric and magnetic fields—cause that splitting to fluctuate stochastically in time, dephasing the qubit.  In practice one can engender immunity against certain environmental perturbations.  One famous example is the transmon qubit (see “Charge-insensitive qubit design derived from the Cooper pair box” by Koch et al.) used extensively at IBM, Google, and elsewhere.  The transmon is a superconducting qubit that cleverly suppresses the effects of charge noise by operating in a regime where Josephson couplings are sizable compared to charging energies.  Transmons remain susceptible, however, to other sources of randomness such as flux noise and critical-current noise.  By contrast, topological qubits embed quantum information in global properties of the system, building in immunity against all local noise sources.  Topological qubits thus realize “perfect” quantum memory.

(2) Perfect gates via braiding.  By exploiting the remarkable phenomenon of non-Abelian statistics, topological qubits further enjoy “perfect” quantum gates: Moving non-Abelian anyons around one another reshuffles the system among the ground states—thereby processing the qubits—in exquisitely precise ways that depend only on coarse properties of the exchange.

Disclaimer: Adjectives like “perfect” should come with the qualifier “up to exponentially small corrections”, a point that we revisit below.

Experimental status

The catch is that systems supporting non-Abelian anyons are not easily found in nature.  One promising topological-qubit implementation exploits exotic 1D superconductors whose ends host “Majorana modes”—novel zero-energy degrees of freedom that underlie non-Abelian-anyon physics.  In 2010, two groups (Lutchyn et al. and Oreg et al.) proposed a laboratory realization that combines semiconducting nanowires, conventional superconductors, and modest magnetic fields.

Since then, the materials-science progress on nanowire-superconductor hybrids has been remarkable.  Researchers can now grow extremely clean, versatile devices featuring various manipulation and readout bells and whistles.  These fabrication advances paved the way for experiments that have reported increasingly detailed Majorana characteristics: tunneling signatures including recent reports of long-sought quantized response, evolution of Majorana modes with system size, mapping out of the phase diagram as a function of external parameters, etc.  Alternate explanations are still being debated though.  Perhaps the most likely culprit are conventional localized fermionic levels (“Andreev bound states”) that can imitate Majorana signatures under certain conditions; see in particular Liu et al.  Still, the collective experimental effort on this problem over the last 5+ years has provided mounting evidence for the existence of Majorana modes.  Revealing their prized quantum-information properties poses a logical next step.

Validating a topological qubit

Ideally one would like to verify both hallmarks of topological qubits noted above—“perfect” insensitivity to local noise and “perfect” gates via braiding.  We will focus on the former property, which can be probed in simpler device architectures.  Intuitively, noise insensitivity should imply long qubit coherence times.  But how do you pinpoint the topological origin of long coherence times, and in any case what exactly qualifies as “long”?

Here is one way to sharply address these questions (for more details, see our work in Aasen et al.).  As alluded to in our disclaimer above, logical 0 and 1 topological-qubit states aren’t exactly degenerate.  In nanowire devices they’re split by an energy \hbar \omega that is exponentially small in the separation distance L between Majorana modes divided by the superconducting coherence length \xi.  Correspondingly, the qubit states are not quite locally indistinguishable either, and hence not perfectly immune to local noise.  Now imagine pulling apart Majorana modes to go from a relatively poor to a perfect topological qubit.  During this process two things transpire in tandem: The topological qubit’s oscillation frequency, \omega, vanishes exponentially while the dephasing time T_2 becomes exponentially long.  That is,


This scaling relation could in fact be used as a practical definition of a topologically protected quantum memory.  Importantly, mimicking this property in any non-topological qubit would require some form of divine intervention.  For example, even if one fine-tuned conventional 0 and 1 qubit states (e.g., resulting from the Andreev bound states mentioned above) to be exactly degenerate, local noise could still readily produce dephasing.

As discussed in Aasen et al., this topological-qubit scaling relation can be tested experimentally via Ramsey-like protocols in a setup that might look something like the following:


This device contains two adjacent Majorana wires (orange rectangles) with couplings controlled by local gates (“valves” represented by black switches).  Incidentally, the design was inspired by a gate-controlled variation of the transmon pioneered in Larsen et al. and de Lange et al.  In fact, if only charge noise was present, we wouldn’t stand to gain much in the way of coherence times: both the transmon and topological qubit would yield exponentially long T_2 times.  But once again, other noise sources can efficiently dephase the transmon, whereas a topological qubit enjoys exponential protection from all sources of local noise.  Mathematically, this distinction occurs because the splitting for transmon qubit states is exponentially flat only with respect to variations in a “gate offset” n_g.  For the topological qubit, the splitting is exponentially flat with respect to variations in all external parameters (e.g., magnetic field, chemical potential, etc.), so long as Majorana modes still survive.  (By “exponentially flat” we mean constant up to exponentially small deviations.)  Plotting the energies of the qubit states in the two respective cases versus external parameters, the situation can be summarized as follows:


Outlook: Toward “topological quantum ascendancy”

These qubit-validation experiments constitute a small stepping stone toward building a universal topological quantum computer.  Explicitly demonstrating exponentially protected quantum information as discussed above would, nevertheless, go a long way toward establishing practical utility of Majorana-based topological qubits.  One might even view this goal as single-qubit-level “topological quantum ascendancy”.  Completion of this milestone would further set the stage for implementing “perfect” quantum gates, which requires similar capabilities albeit in more complex devices.  Researchers at Microsoft and elsewhere have their sights set on bringing a prototype topological qubit to life in the very near future.  It is not unreasonable to anticipate that 2018 will mark the debut of the topological qubit.  We could of course be off target.  There is, after all, still plenty of time in 2017 to prove us wrong.

John BaezComplex Adaptive System Design (Part 3)

It’s been a long time since I’ve blogged about the Complex Adaptive System Composition and Design Environment or CASCADE project run by John Paschkewitz. For a reminder, read these:

Complex adaptive system design (part 1), Azimuth, 2 October 2016.

Complex adaptive system design (part 2), Azimuth, 18 October 2016.

A lot has happened since then, and I want to explain it.

I’m working with Metron Scientific Solutions to develop new techniques for designing complex networks.

The particular problem we began cutting our teeth on is a search and rescue mission where a bunch of boats, planes and drones have to locate and save people who fall overboard during a boat race in the Caribbean Sea. Subsequently the Metron team expanded the scope to other search and rescue tasks. But the real goal is to develop very generally applicable new ideas on designing and ‘tasking’ networks of mobile agents—that is, designing these networks and telling the agents what to do.

We’re using the mathematics of ‘operads’, in part because Spivak’s work on operads has drawn a lot of attention and raised a lot of hopes:

• David Spivak, The operad of wiring diagrams: formalizing a graphical language for databases, recursion, and plug-and-play circuits.

An operad is a bunch of operations for sticking together smaller things to create bigger ones—I’ll explain this in detail later, but that’s the core idea. Spivak described some specific operads called ‘operads of wiring diagrams’ and illustrated some of their potential applications. But when we got going on our project, we wound up using a different class of operads, which I’ll call ‘network operads’.

Here’s our dream, which we’re still trying to make into a reality:

Network operads should make it easy to build a big network from smaller ones and have every agent know what to do. You should be able to ‘slap together’ a network, throwing in more agents and more links between them, and automatically have it do something reasonable. This should be more flexible than an approach where you need to know ahead of time exactly how many agents you have, and how they’re connected, before you can tell them what to do.

You don’t want a network to malfunction horribly because you forgot to hook it up correctly. You want to focus your attention on optimizing the network, not getting it to work at all. And you want everything to work so smoothly that it’s easy for the network to adapt to changing conditions.

To achieve this we’re using network operads, which are certain special ‘typed operads’. So before getting into the details of our approach, I should say a bit about typed operads. And I think that will be enough for today’s post: I don’t want to overwhelm you with too much information at once.

In general, a ‘typed operad’ describes ways of sticking together things of various types to get new things of various types. An ‘algebra’ of the operad gives a particular specification of these things and the results of sticking them together. For now I’ll skip the full definition of a typed operad and only highlight the most important features. A typed operad O has:

• a set T of types.

• collections of operations O(t_1,...,t_n ; t) where t_i, t \in T. Here t_1, \dots, t_n are the types of the inputs, while t is the type of the output.

• ways to compose operations. Given an operation
f \in O(t_1,\dots,t_n ;t) and n operations

g_1 \in O(t_{11},\dots,t_{1 k_1}; t_1),\dots, g_n \in O(t_{n1},\dots,t_{n k_n};t_n)

we can compose them to get

f \circ (g_1,\dots,g_n) \in O(t_{11}, \dots, t_{nk_n};t)

These must obey some rules.

But if you haven’t seen operads before, you’re probably reeling in horror—so I need to rush in and save you by showing you the all-important pictures that help explain what’s going on!

First of all, you should visualize an operation f \in O(t_1, \dots, t_n; t) as a little gizmo like this:

It has n inputs at top and one output at bottom. Each input, and the output, has a ‘type’ taken from the set T. So, for example, if you operation takes two real numbers, adds them and spits out the closest integer, both input types would be ‘real’, while the output type would be ‘integer’.

The main thing we do with operations is compose them. Given an an operation f \in O(t_1,\dots,t_n ;t), we can compose it with n operations

g_1 \in O(t_{11},\dots,t_{1 k_1}; t_1), \quad \dots, \quad g_n \in O(t_{n1},\dots,t_{n k_n};t_n)

by feeding their outputs into the inputs of f, like this:

The result is an operation we call

f \circ (g_1, \dots, g_n)

Note that the input types of f have to match the output types of the g_i for this to work! This is the whole point of types: they forbid us from composing operations in ways that don’t make sense.

This avoids certain stupid mistakes. For example, you can take the square root of a positive number, but you may not want to take the square root of a negative number, and you definitely don’t want to take the square root of a hamburger. While you can land a plane on an airstrip, you probably don’t want to land a plane on a person.

The operations in an operad are quite abstract: they aren’t really operating on anything. To render them concrete, we need another idea: operads have ‘algebras’.

An algebra A of the operad O specifies a set of things of each type t \in T such that the operations of O act on these sets. A bit more precisely, an algebra consists of:

• for each type t \in T, a set A(t) of things of type t

• an action of O on A, that is, a collection of maps

\alpha : O(t_1,...,t_n ; t) \times A(t_1) \times \cdots \times A(t_n) \to A(t)

obeying some rules.

In other words, an algebra turns each operation f \in O(t_1,...,t_n ; t) into a function that eats things of types t_1, \dots, t_n and spits out a thing of type t.

When we get to designing systems with operads, the fact that the same operad can have many algebras will be useful. Our operad will have operations describing abstractly how to hook up networks to form larger networks. An algebra will give a specific implementation of these operations. We can use one algebra that’s fairly fine-grained and detailed about what the operations actually do, and another that’s less detailed. There will then be a map between from the first algebra to the second, called an ‘algebra homomorphism’, that forgets some fine-grained details.

There’s a lot more to say—all this is just the mathematical equivalent of clearing my throat before a speech—but I’ll stop here for now.

And as I do—since it also takes me time to stop talking—I should make it clear yet again that I haven’t even given the full definition of typed operads and their algebras! Besides the laws I didn’t write down, there’s other stuff I omitted. Most notably, there’s a way to permute the inputs of an operation in an operad, and operads have identity operations, one for each type.

To see the full definition of an ‘untyped’ operad, which is really an operad with just one type, go here:

• Wikipedia, Operad theory.

They just call it an ‘operad’. Note that they first explain ‘non-symmetric operads’, where you can’t permute the inputs of operations, and then explain operads, where you can.

If you’re mathematically sophisticated, you can easily guess the laws obeyed by a typed operad just by looking at this article and inserting the missing types. You can also see the laws written down in Spivak’s paper, but with some different terminology: he calls types ‘objects’, he calls operations ‘morphisms’, and he calls typed operads ‘symmetric colored operads’—or once he gets going, just ‘operads’.

You can also see the definition of a typed operad in Section 2.1 here:

• Donald Yau, Operads of wiring diagrams.

What I would call a typed operad with S as its set of types, he calls an ‘S-colored operad’.

I guess it’s already evident, but I’ll warn you that the terminology in this subject varies quite a lot from author to author: for example, a certain community calls typed operads ‘symmetric multicategories’. This is annoying at first but once you understand the subject it’s as ignorable as the fact that mathematicians have many different accents. The main thing to remember is that operads come in four main flavors, since they can either be typed or untyped, and they can either let you permute inputs or not. I’ll always be working with typed operads where you can permute inputs.

Finally, I’ll say that while the definition of operad looks lengthy and cumbersome at first, it becomes lean and elegant if you use more category theory.

Next time I’ll give you an example of an operad: the simplest ‘network

August 16, 2017

BackreactionYou don’t expand just because the universe does. Here’s why.

Not how it works. It’s tough to wrap your head around four dimensions. We have known that the universe expands since the 1930s, but whether we expand with it is still one of the questions I am asked most frequently. The less self-conscious simply inform me that the universe doesn’t expand but everything in it shrinks – because how could we tell the difference? The best answer to these

David Hogga mistake in an E-M algorithm

[I am on quasi-vacation this week, so only posting irregularly.]

I (finally—or really for the N-th time, because I keep forgetting) understood the basis of E-M algorithms for optimizing (what I call) marginalized likelihoods in latent-variable models. I then worked out the equations for the E-M step for factor analysis, and a generalization of factor analysis that I hope to use in my project with Christina Eilers (MPIA).

Imagine my concern when I got a different update step than I find in the writings of my friend and mentor Sam Roweis (deceased), who is the source of all knowledge, as far as I am concerned! I spent a lot of time looking up stuff on the web, and most things agree with Roweis. But finally I found this note by Andrew Ng (Stanford / Coursera), which agrees with me (and disagrees with Roweis).

If you care about the weeds, the conflict is between equation (8) in those Ng notes and page 3 of these Roweis notes. It is a subtle difference, and it takes some work to translate notation. I wonder if the many documents that match Roweis derive from (possibly unconscious) propagation from Roweis, or whether the flow is in another direction, or whether it is just that the mistake is an easy one to make? Oddly, Ng decorates his equation (8) with a warning about an error you can easily make, but it isn't the error that Roweis made.

So much of importance in computer science and machine learning is buried in lecture notes and poorly indexed documents in user home pages. This is not a good state of affairs!

August 15, 2017

Tommaso DorigoRevenge Of The Slimeballs - Part 3

This is the third part of Chapter 3 of the book "Anomaly! Collider Physics and the Quest for New Phenomena at Fermilab". The chapter recounts the pioneering measurement of the Z mass by the CDF detector, and the competition with SLAC during the summer of 1989. The title of the post is the same as the one of chapter 3, and it refers to the way some SLAC physicists called their Fermilab colleagues, whose hadron collider was to their eyes obviously inferior to the electron-positron linear collider.

read more

John BaezNorbert Blum on P versus NP

There’s a new paper on the arXiv that claims to solve a hard problem:

• Norbert Blum, A solution of the P versus NP problem.

Most papers that claim to solve hard math problems are wrong: that’s why these problems are considered hard. But these papers can still be fun to look at, at least if they’re not obviously wrong. It’s fun to hope that maybe today humanity has found another beautiful grain of truth.

I’m not an expert on the P versus NP problem, so I have no opinion on this paper. So don’t get excited: wait calmly by your radio until you hear from someone who actually works on this stuff.

I found the first paragraph interesting, though. Here it is, together with some highly non-expert commentary. Beware: everything I say could be wrong!

Understanding the power of negations is one of the most challenging problems in complexity theory. With respect to monotone Boolean functions, Razborov [12] was the first who could shown that the gain, if using negations, can be super-polynomial in comparision to monotone Boolean networks. Tardos [16] has improved this to exponential.

I guess a ‘Boolean network’ is like a machine where you feed in a string of bits and it computes new bits using the logical operations ‘and’, ‘or’ and ‘not’. If you leave out ‘not’ the Boolean network is monotone, since then making more inputs equal to 1, or ‘true’, is bound to make more of the output bits 1 as well. Blum is saying that including ‘not’ makes some computations vastly more efficient… but that this stuff is hard to understand.

For the characteristic function of an NP-complete problem like the clique function, it is widely believed that negations cannot help enough to improve the Boolean complexity from exponential to polynomial.

A bunch of nodes in a graph are a clique if each of these nodes is connected by an edge to every other. Determining whether a graph with n vertices has a clique with more than k nodes is a famous problem: the clique decision problem.

For example, here’s a brute-force search for a clique with at least 4 nodes:

The clique decision problem is NP-complete. This means that if you can solve it with a Boolean network whose complexity grows like some polynomial in n, then P = NP. But if you can’t, then P ≠ NP.

(Don’t ask me what the complexity of a Boolean network is; I can guess but I could get it wrong.)

I guess Blum is hinting that the best monotone Boolean network for solving the clique decision problem has a complexity that’s exponential in n. And then he’s saying it’s widely believed that not gates can’t reduce the complexity to a polynomial.

Since the computation of an one-tape Turing machine can be simulated by a non-monotone Boolean network of size at most the square of the number of steps [15, Ch. 3.9], a superpolynomial lower bound for the non-monotone network complexity of such a function would imply P ≠ NP.

Now he’s saying what I said earlier: if you show it’s impossible to solve the clique decision problem with any Boolean network whose complexity grows like some polynomial in n, then you’ve shown P ≠ NP. This is how Blum intends to prove P ≠ NP.

For the monotone complexity of such a function, exponential lower bounds are known [11, 3, 1, 10, 6, 8, 4, 2, 7].

Should you trust someone who claims they’ve proved P ≠ NP, but can’t manage to get their references listed in increasing order?

But until now, no one could prove a non-linear lower bound for the nonmonotone complexity of any Boolean function in NP.

That’s a great example of how helpless we are: we’ve got all these problems whose complexity should grow faster than any polynomial, and we can’t even prove their complexity grows faster than linear. Sad!

An obvious attempt to get a super-polynomial lower bound for the non-monotone complexity of the clique function could be the extension of the method which has led to the proof of an exponential lower bound of its monotone complexity. This is the so-called “method of approximation” developed by Razborov [11].

I don’t know about this. All I know is that Razborov and Rudich proved a whole bunch of strategies for proving P ≠ NP can’t possibly work. These strategies are called ‘natural proofs’. Here are some friendly blog articles on their result:

• Timothy Gowers, How not to prove that P is not equal to NP, 3 October 2013.

• Timothy Gowers, Razborov and Rudich’s natural proofs argument, 7 October 2013.

From these I get the impression that what Blum calls ‘Boolean networks’ may be what other people call ‘Boolean circuits’. But I could be wrong!


Razborov [13] has shown that his approximation method cannot be used to prove better than quadratic lower bounds for the non-monotone complexity of a Boolean function.

So, this method is unable to prove some NP problem can’t be solved in polynomial time and thus prove P ≠ NP. Bummer!

But Razborov uses a very strong distance measure in his proof for the inability of the approximation method. As elaborated in [5], one can use the approximation method with a weaker distance measure to prove a super-polynomial lower bound for the non-monotone complexity of a Boolean function.

This reference [5] is to another paper by Blum. And in the end, he claims to use similar methods to prove that the complexity of any Boolean network that solves the clique decision problem must grow faster than a polynomial.

So, if you’re trying to check his proof that P ≠ NP, you should probably start by checking that other paper!

The picture below, by Behnam Esfahbod on Wikicommons, shows the two possible scenarios. The one at left is the one Norbert Blum claims to have shown we’re in.

August 13, 2017

Chad OrzelKid Art Update

Our big home renovation has added a level of chaos to everything that’s gotten in the way of my doing more regular cute-kid updates. And even more routine tasks, like photographing the giant pile of kid art that we had to move out of the dining room. Clearing stuff up for the next big stage of the renovation– cabinets arrive tomorrow– led me to this stuff, though, so I finally took pictures of a whole bunch of good stuff. (On the spiffy new tile floor in the kitchen, because the light was good there…)

The kids’s school sends home portfolios of what they’ve done in art class for the year, and I collected those photos together into a Google Photos album for easy sharing, because they’re pretty cool. My favorite piece of the lot is this polar bear by SteelyKid:

Polar bear by SteelyKid.

That’s in pastel chalk on construction paper; you can see some preliminary sketches of the bear in the album. She drew the scene in pencil, colored it in chalk, then traced important lines with a marker. It’s very cool.

The Pip has some neat stuff in his portfolio, too– I especially like that they had the kids making Mondrians out of strips of construction paper– but my favorite of his was a non-art-class drawing that was in the pile:

The Pigeon, by The Pip.

That’s a very credible rendering of Mo Willems’s Pigeon for a kindergartener…

Anyway, other than that, life continues in the usual whirl. I’m getting really tired of living out of a mini-fridge in the living room and a temporary sink in the kitchen that’s at about knee level to me (when I have to wash dishes, I pull up a chair and sit down, which takes the stress on my back from “agonizing” down to “annoying”). But, cabinets this week, so we can see the oncoming train at the end of this tunnel…

August 12, 2017

David Hoggserious bugs; dimensionality reduction

Megan Bedell (Chicago) and I had a scare today: Although we can show that in very realistically simulated fake data (with unmodeled tellurics, wrong continuum, and so on) a synthetic spectrum (data-driven) beats a binary mask for measuring radial velocities, we found that in real data from the HARPS instrument that the mask was doing better. Why? We went through a period of doubting everything we know. I was on the point of resigning. And then we realized it was a bug in the code! Whew.

Adrian Price-Whelan (Princeton) also found a serious bug in our binary-star fitting. The thing we were calculating as the pericenter distance is the distance of the primary-star center-of-mass to the system barycenter. That's not the minimum separation of the two stars! Duh. That had us rolling on the floor laughing, as the kids say, especially since we might have gotten all the way to submission without noticing that absolutely critical bug.

At the end of the day, I gave the Königstuhl Colloquium, on the blackboard, about dimensionality reduction. I started with a long discussion about what is good and bad about machine learning, and then went (too fast!) through PCA, ICA, kernel-PCA, PPCA, factor analyzers, HMF, E-M algorithms, latent-variable models, and the GPLVM, drawing connections between them. The idea was to give the audience context and jumping-off points for their projects.

David Hoggmicro-tellurics

Today, in an attempt to make our simulated extreme-precision radial-velocity fake data as conservative as possible, Megan Bedell (Chicago) and I built a ridiculously pessimistic model for un-modeled (and unknown) telluric lines that could be hiding in the spectra, at amplitudes too low to be clearly seen in any individual spectrum, but with the full wavelength range bristling with lines. Sure enough, these “micro-tellurics” (as you might call them) do indeed mess up radial-velocity measurements. The nice thing (from our perspective) is that they mess up the measurements in a way that is co-variant with barycentric velocity, and they mess up synthetic-spectrum-based RV measurements less than binary-mask-based RV measurements.

At MPIA Galaxy Coffee, Irina Smirnova-Pinchukova (MPIA) gave a great talk about her trip on a SOFIA flight.

David Hoggmachine learning, twins, excitation temperature

After our usual start at the Coffee Nerd, it was MW Group Meeting, where we discussed (separately) Cepheids and Solar-neighborhood nucleosynthesis. On the latter, Oliver Philcox (St Andrews) has taken the one-zone models of Jan Rybizki (MPIA) and made them 500 times faster using a neural-network emulator. This emulator is tuned to interpolate a set of (slowly computed) models very quickly and accurately. That's a good use of machine learning! Also, because of backpropagation, it is possible to take the derivatives of the emulator with respect to the inputs (I think) and therefore you also get derivatives, for optimization and sampling.

The afternoon's PSF Coffee meeting had presentations by Meg Bedell (Chicago) about Solar Twin abundances, and by Richard Teague (Michigan) about protoplanetary disk TW Hya. On the former, Bedell showed that she can make extremely precise measurements, because a lot of theoretical uncertainties cancel out. She finds rock-abundance anomalies (that is, abundance anomalies that are stronger in high-condensation-temperature lines) all over the place, which is context for results from Semyeong Oh (Princeton). On TW Hya, Teague showed that it is possible to get pretty consistent temperature information out of line ratios. I would like to see two-dimensional maps of those: Are there embedded temperature anomalies in the disk?

Tim GowersIntransitive dice VII — aiming for further results

While Polymath13 has (barring a mistake that we have not noticed) led to an interesting and clearly publishable result, there are some obvious follow-up questions that we would be wrong not to try to answer before finishing the project, especially as some of them seem to be either essentially solved or promisingly close to a solution. The ones I myself have focused on are the following.

  1. Is it true that if two random elements A and B of [n]^n are chosen, then A beats B with very high probability if it has a sum that is significantly larger? (Here “significantly larger” should mean larger by f(n) for some function f(n)=o(n^{3/2}) — note that the standard deviation of the sum has order n^{3/2}, so the idea is that this condition should be satisfied one way or the other with probability 1-o(1)).
  2. Is it true that the stronger conjecture, which is equivalent (given what we now know) to the statement that for almost all pairs (A,B) of random dice, the event that A beats a random die C has almost no correlation with the event that B beats C, is false?
  3. Can the proof of the result obtained so far be modified to show a similar result for the multisets model?

The status of these three questions, as I see it, is that the first is basically solved — I shall try to justify this claim later in the post, for the second there is a promising approach that will I think lead to a solution — again I shall try to back up this assertion, and while the third feels as though it shouldn’t be impossibly difficult, we have so far made very little progress on it, apart from experimental evidence that suggests that all the results should be similar to those for the balanced sequences model. [Added after finishing the post: I may possibly have made significant progress on the third question as a result of writing this post, but I haven’t checked carefully.]

The strength of a die depends strongly on the sum of its faces.

Let A=(a_1,\dots,a_n) and B=(b_1,\dots,b_n) be elements of [n]^n chosen uniformly and independently at random. I shall now show that the average of


is zero, and that the probability that this quantity differs from its average by substantially more than n\log n is very small. Since typically the modulus of \sum_ia_i-\sum_jb_j has order n^{3/2}, it follows that whether or not A beats B is almost always determined by which has the bigger sum.

As in the proof of the main theorem, it is convenient to define the functions

f_A(j)=|\{i:a_i<j\}|+\frac 12|\{i:a_i=j\}|


g_A(j)=f_A(j)-j+\frac 12.


\sum_jf_A(b_j)=\sum_{i,j}\mathbf 1_{a_i<b_j}+\frac 12\sum_{i,j}\mathbf 1_{a_i=b_j},

from which it follows that B beats A if and only if \sum_jf_A(b_j)>n^2/2. Note also that

\sum_jg_A(b_j)=\sum_jf_A(b_j)-\sum_jb_j+\frac n2.

If we choose A purely at random from [n]^n, then the expectation of f_A(j) is j-1/2, and Chernoff’s bounds imply that the probability that there exists j with |g_A(j)|=|f_A(j)-j+1/2|\geq C\sqrt{n\log n} is, for suitable C at most n^{-10}. Let us now fix some A for which there is no such j, but keep B as a purely random element of [n]^n.

Then \sum_jg_A(b_j) is a sum of n independent random variables, each with maximum at most C\sqrt{n\log n}. The expectation of this sum is \sum_jg_A(j)=\sum_jf_A(j)-n^2/2.


\sum_jf_A(j)=\sum_{i,j}\mathbf 1_{a_i<j}+\frac 12\sum_{i,j}\mathbf 1_{a_i=j}

=\sum_i(n-a_i)+\frac n2=n^2+\frac n2-\sum_ia_i,

so the expectation of \sum_jg_A(b_j) is n(n+1)/2-\sum_ia_i.

By standard probabilistic estimates for sums of independent random variables, with probability at least 1-n^{-10} the difference between \sum_jg_A(b_j) and its expectation \sum_jf_A(j)-n^2/2 is at most Cn\log n. Writing this out, we have

|\sum_jf_A(b_j)-\sum_jb_j+\frac n2-n(n+1)/2+\sum_ia_i|\leq Cn\log n,

which works out as

|\sum_jf_A(b_j)-\frac {n^2}2-\sum_jb_j+\sum_ia_i|\leq Cn\log n.

Therefore, if \sum_ia_i>\sum_jb_j+Cn\log n, it follows that with high probability \sum_jf_A(b_j)<n^2/2, which implies that A beats B, and if \sum_jb_j>\sum_ia_i+Cn\log n, then with high probability B beats A. But one or other of these two cases almost always happens, since the standard deviations of \sum_ia_i and \sum_jb_j are of order n^{3/2}. So almost always the die that wins is the one with the bigger sum, as claimed. And since “has a bigger sum than” is a transitive relation, we get transitivity almost all the time.

Why the strong conjecture looks false

As I mentioned, the experimental evidence seems to suggest that the strong conjecture is false. But there is also the outline of an argument that points in the same direction. I’m going to be very sketchy about it, and I don’t expect all the details to be straightforward. (In particular, it looks to me as though the argument will be harder than the argument in the previous section.)

The basic idea comes from a comment of Thomas Budzinski. It is to base a proof on the following structure.

  1. With probability bounded away from zero, two random dice A and B are “close”.
  2. If A and B are two fixed dice that are close to each other and C is random, then the events “A beats C” and “B beats C” are positively correlated.

Here is how I would imagine going about defining “close”. First of all, note that the function g_A is somewhat like a random walk that is constrained to start and end at zero. There are results that show that random walks have a positive probability of never deviating very far from the origin — at most half a standard deviation, say — so something like the following idea for proving the first step (remaining agnostic for the time being about the precise definition of “close”). We choose some fixed positive integer k and let x_1<\dots<x_k be integers evenly spread through the interval \{1,2,\dots,n\}. Then we argue — and this should be very straightforward — that with probability bounded away from zero, the values of f_A(x_i) and f_B(x_i) are close to each other, where here I mean that the difference is at most some small (but fixed) fraction of a standard deviation.

If that holds, it should also be the case, since the intervals between x_{i-1} and x_i are short, that f_A and f_B are uniformly close with positive probability.

I’m not quite sure whether proving the second part would require the local central limit theorem in the paper or whether it would be an easier argument that could just use the fact that since f_A and f_B are close, the sums \sum_jf_A(c_j) and \sum_jf_B(c_j) are almost certainly close too. Thomas Budzinski sketches an argument of the first kind, and my guess is that that is indeed needed. But either way, I think it ought to be possible to prove something like this.

What about the multisets model?

We haven’t thought about this too hard, but there is a very general approach that looks to me promising. However, it depends on something happening that should be either quite easy to establish or not true, and at the moment I haven’t worked out which, and as far as I know neither has anyone else.

The difficulty is that while we still know in the multisets model that A beats B if and only if \sum_jf_A(b_j)<n^2/2 (since this depends just on the dice and not on the model that is used to generate them randomly), it is less easy to get traction on the sum because it isn’t obvious how to express it as a sum of independent random variables.

Of course, we had that difficulty with the balanced-sequences model too, but there we got round the problem by considering purely random sequences B and conditioning on their sum, having established that certain events held with sufficiently high probability for the conditioning not to stop them holding with high probability.

But with the multisets model, there isn’t an obvious way to obtain the distribution over random dice B by choosing b_1,\dots,b_n independently (according to some distribution) and conditioning on some suitable event. (A quick thought here is that it would be enough if we could approximate the distribution of B in such a way, provided the approximation was good enough. The obvious distribution to take on each b_i is the marginal distribution of that b_i in the multisets model, and the obvious conditioning would then be on the sum, but it is far from clear to me whether that works.)

A somewhat different approach that I have not got far with myself is to use the standard one-to-one correspondence between increasing sequences of length n taken from [n] and subsets of [2n-1] of size n. (Given such a sequence (a_1,\dots,a_n) one takes the subset \{a_1,a_2+1,\dots,a_n+n-1\}, and given a subset S=\{s_1,\dots,s_n\}\subset[2n-1], where the s_i are written in increasing order, one takes the multiset of all values s_i-i+1, with multiplicity.) Somehow a subset of [2n-1] of size n feels closer to a bunch of independent random variables. For example, we could model it by choosing each element with probability n/(2n-1) and conditioning on the number of elements being exactly n, which will happen with non-tiny probability.

Actually, now that I’m writing this, I’m coming to think that I may have accidentally got closer to a solution. The reason is that earlier I was using a holes-and-pegs approach to defining the bijection between multisets and subsets, whereas with this approach, which I had wrongly assumed was essentially the same, there is a nice correspondence between the elements of the multiset and the elements of the set. So I suddenly feel more optimistic that the approach for balanced sequences can be adapted to the multisets model.

I’ll end this post on that optimistic note: no doubt it won’t be long before I run up against some harsh reality.

August 11, 2017

n-Category Café Magnitude Homology in Sapporo

John and I are currently enjoying Applied Algebraic Topology 2017 in the city of Sapporo, on the northern Japanese island of Hokkaido.

I spoke about magnitude homology of metric spaces. A central concept in applied topology is persistent homology, which is also a homology theory of metric spaces. But magnitude homology is different.

It was brought into being one year ago on this very blog, principally by Mike Shulman, though Richard Hepworth and Simon Willerton had worked out a special case before. You can read a long post of mine about it from a year ago, which in turn refers back to a very long comments thread of an earlier post.

But for a short account, try my talk slides. They introduce both magnitude itself (including some exciting new developments) and magnitude homology. Both are defined in the wide generality of enriched categories, but I concentrated on the case of metric spaces.

Slide on magnitude homology

Of course, John’s favourite slide was the one shown.

n-Category Café A Graphical Calculus for Proarrow Equipments

guest post by David Myers

Proarrow equipments (which also go by the names “fibrant double categories” and “framed bicategories”) are wonderful and fundamental category-like objects. If categories are the abstract algebras of functions, then equipments are the abstract algebras of functions and relations. They are a fantastic setting to do formal category theory, which you can learn about in Mike’s post about them on this blog!

For my undergraduate thesis, I came up with a graphical calculus for working with equipments. I wasn’t the first to come up with it (if you’re familiar with both string diagrams and equipments, it’s basically the only sort of thing that you’d try), but I did prove it sound using a proof similar to Joyal and Street’s proof of the soundness of the graphical calculus for monoidal categories. You can see the full paper on the arXiv, or see the slides from a talk I gave about it at CT2017 here. Below the fold, I’ll show you the diagrams and a bit of what you can do with them.

What is a Double Category?

A double category is a category internal to the category of categories. Now, this is fun to say, but takes a bit of unpacking. Here is a more elementary definition together with the string diagrams:

Definition: A double category has

  • Objects AA, BB, C,C, \ldots, which will be written as bounded plane regions of different colors , , , \ldots.
  • Vertical arrows f:f : \rightarrow , \ldots, which we will just call arrows and write as vertical lines , directed downwards, dividing the plane region from .
  • Horizontal arrows JJ, KK, H:H : \to , \ldots, which we will just call proarrows and write as horizontal lines dividing the plane region from .
  • 2-cells ,\ldots, are represented as beads between the arrows and proarrows ,\ldots

The usual square notation is on the left, and the string diagrams are on the right.

There are two ways to compose 2-cells: horizontally, and vertically. These satisfy and interchange law saying that composing horizontally and then vertically is the same as composing vertically and then horizontally.

Note that when we compose 2-cells horizontally, we must compose the vertical arrows. Therefore, the vertical arrows will form a category. Similary, when we compose 2-cells vertically, we must compose the horizontal proarrows. Therefore, the horizontal proarrows will form a category. Except, this is not quite true; in most of our examples in the wild, the composition of proarrows will only commute up to isomorphism, so they will form a bicategory. I’ll just hand wave this away for the rest of the post.

This is about all there is to the graphical calculus for double categories. Any deformation of a double diagram that keeps the vertical arrows vertical and the horizontal proarrows horizontal will describe an equal composite in any double category.

Here are some examples of double categories:

In many double categories that we meet “in the wild”, the arrows will be function-like and the proarrows relation-like. These double categories are called equipments. In these cases, we can turn functions into relations by taking their graphs. This can be realized in the graphical calculus by bending vertical arrows horizontal.

Companions, Conjoints, and Equipments

An arrow has a companion if there is a proarrow together with two 2-cells and such that

= and = .

I call these the “kink identities”, because they are reminiscent of the “zig-zag identities” for adjunctions in string diagrams. We can think of as the graph of as a subset of its domain times its codomain.

Similarly, is said to have a conjoint if there is a proarrow together with two 2-cells and such that

== and == .

Definition: A proarrow equipment is a double category where every arrow has a conjoint and a companion.

The prototypical example of a proarrow equipment, and also the reason for the name, is the equipment of categories, functors, profunctors, and profunctor morphisms. In this equipment, companions are the restriction of the hom of the codomain by the functor on the left, and conjoints are the restriction of the hom of the codomain by the functor on the right.

In the equipment with objects sets, arrows functions, and proarrows relations, the companion and conjoint are the graph of a function as a relation from the domain to codomain or from the codomain to domain respectively.

The following lemma is a central elementary result of the theory of equipments:

Lemma (Spider Lemma): In an equipment, we can bend arrows. More formally, there is a bijective correspondence between diagrams of form of the left, and diagrams of the form of the right:

\approx .

Proof. The correspondence is given by composing the outermost vertical or horizontal arrows by their companion or conjoint (co)units, as suggested by the slight bends in the arrows above. The kink identities then ensure that these two processes are inverse to each other, giving the desired bijection.

In his post, Mike calls this the “fundamental lemma”. This is the engine humming under the graphical calculus; in short, the Spider Lemma says that we can bend vertical wires horizontal. We can use this bending to prove a classical result of category theory in a very general setting.

Hom-Set and Zig-Zag Adjunctions

It is a classical fact of category theory that an adjunction fg:ABf \dashv g : A \rightleftarrows B may be defined using natural transformations η:idfg\eta: \id \rightarrow fg and ϵ:gfid\epsilon: gf \rightarrow \id (which we will call a zig-zag adjunction, after the coherence conditions they have to satisfy – also called the triangle equations), or by giving a natural isomorphism ψ:B(f,1)A(1,g)\psi : B(f, 1) \cong A(1, g). This equivalence holds in any proarrow equipment, which we can now show quickly and intuitively with string diagrams.

Suppose we have an adjunction \dashv , given by the vertical cells and , satisfying the zig-zag identities

= and = .

By bending the unit and counit, we get the horizontal cells and . Bending the zig-zag identities shows that these maps are inverse to each other

= = = ,

and are therefore the natural isomorphism \cong we wanted.

Going the other way, suppose is a natural isomorphism with inverse . That is,

= and = .

Then we can define a unit and counit by bending. These satisfy the zig-zag identities by pulling straight and using (1):

= = = ,

= = = .

Though this proof can be discovered graphically, it specializes to the usual argument in the case that the equipment is an equipment of enriched categories!

And Much, Much More!

In the paper, you’ll find that every deformation of an equipment diagram gives the same composite – the graphical calculus is sound. But you’ll also find an application of the calculus: a “Yoneda-style” embedding of every equipment into the equipment of categories enriched in it. The paper still definitely needs some work, so I welcome any feedback in the comments!

I hope these string diagrams make using equipments easier and more fun.

Doug NatelsonThat's the way the ball bounces.

How does a ball bounce?  Why does a ball, dropped from some height onto a flat surface, not bounce all the way back up to its starting height?  The answers to these questions may seem obvious, but earlier this week, this paper appeared on the arxiv, and it does a great job of showing what we still don't understand about this everyday physics that is directly relevant for a huge number of sports.

The paper talks specifically about hollow or inflated balls.  When a ball is instantaneously at rest, mid-bounce, it's shape has been deformed by its interaction with the flat surface.  The kinetic energy of its motion has been converted into potential energy, tied up in a combination of the elastic deformation of the skin or shell of the ball and the compression of the gas inside the ball.  (One surprising thing I learned from that paper is that high speed photography shows that the non-impacting parts of such inflated balls tend to remain spherical, even as part of the ball deforms flat against the surface.)  That gas compression is quick enough that heat transfer between the gas and the ball is probably negligible.  A real ball does not bounce back to its full height; equivalently, the ratio \(v_{f}/v_{i}\) of the ball's speed immediately after the bounce, \(v_{f}\), to that immediately before the bounce, \(v_{i}\), is less than one.  That ratio is called the coefficient of restitution.

Somehow in the bounce process some energy must've been lost from the macroscopic motion of the ball, and since we know energy is conserved, that energy must eventually show up as disorganized, microscopic energy of jiggling atoms that we colloquially call heat.   How can this happen?

  • The skin of the ball might not be perfectly elastic - there could be some "viscous losses" or "internal friction" as the skin deforms.
  • As the ball impacts the surface, it can launch sound waves into the surface that eventually dissipate.
  • Similarly, the skin of the ball itself can start vibrating in a complicated way, eventually damping out to disorganized jiggling of the atoms.
  • As the ball's skin hits the ground and deforms, it squeezes air out from beneath the ball; the speed of that air can actually exceed the speed of sound in the surrounding medium (!), creating a shock wave that dissipates by heating the air, as well as ordinary sound vibrations.  (It turns out that clapping your hands can also create shock waves!  See here and here.)
  • There can also be irreversible acoustic process in the gas inside the ball that heat the gas in there.
This paper goes through all of these, estimates how big those effects are, and concludes that, for many common balls (e.g., basketballs), we actually don't understand the relative importance of these different contributions.  The authors propose some experiments to figure out what's going on.  The whole thing is a nice exercise in mechanics and elasticity, and it's always fun to realize that there may still be some surprises lurking in the physics of the everyday.

John BaezApplied Algebraic Topology 2017

In the comments on this blog post I’m taking some notes on this conference:

Applied Algebraic Topology 2017, August 8-12, 2017, Hokkaido University, Sapporo, Japan.

Unfortunately these notes will not give you a good summary of the talks—and almost nothing about applications of algebraic topology. Instead, I seem to be jotting down random cool math facts that I’m learning and don’t want to forget.

August 10, 2017

Terence TaoThe structure of logarithmically averaged correlations of multiplicative functions, with applications to the Chowla and Elliott conjectures

Joni Teräväinen and I have just uploaded to the arXiv our paper “The structure of logarithmically averaged correlations of multiplicative functions, with applications to the Chowla and Elliott conjectures“, submitted to Duke Mathematical Journal. This paper builds upon my previous paper in which I introduced an “entropy decrement method” to prove the two-point (logarithmically averaged) cases of the Chowla and Elliott conjectures. A bit more specifically, I showed that

\displaystyle  \lim_{m \rightarrow \infty} \frac{1}{\log \omega_m} \sum_{x_m/\omega_m \leq n \leq x_m} \frac{g_0(n+h_0) g_1(n+h_1)}{n} = 0

whenever {1 \leq \omega_m \leq x_m} were sequences going to infinity, {h_0,h_1} were distinct integers, and {g_0,g_1: {\bf N} \rightarrow {\bf C}} were {1}-bounded multiplicative functions which were non-pretentious in the sense that

\displaystyle  \liminf_{X \rightarrow \infty} \inf_{|t_j| \leq X} \sum_{p \leq X} \frac{1-\mathrm{Re}( g_j(p) \overline{\chi_j}(p) p^{it_j})}{p} = \infty \ \ \ \ \ (1)

for all Dirichlet characters {\chi_j} and for {j=0,1}. Thus, for instance, one had the logarithmically averaged two-point Chowla conjecture

\displaystyle  \sum_{n \leq x} \frac{\lambda(n) \lambda(n+h)}{n} = o(\log x)

for fixed any non-zero {h}, where {\lambda} was the Liouville function.

One would certainly like to extend these results to higher order correlations than the two-point correlations. This looks to be difficult (though perhaps not completely impossible if one allows for logarithmic averaging): in a previous paper I showed that achieving this in the context of the Liouville function would be equivalent to resolving the logarithmically averaged Sarnak conjecture, as well as establishing logarithmically averaged local Gowers uniformity of the Liouville function. However, in this paper we are able to avoid having to resolve these difficult conjectures to obtain partial results towards the (logarithmically averaged) Chowla and Elliott conjecture. For the Chowla conjecture, we can obtain all odd order correlations, in that

\displaystyle  \sum_{n \leq x} \frac{\lambda(n+h_1) \dots \lambda(n+h_k)}{n} = o(\log x) \ \ \ \ \ (2)

for all odd {k} and all integers {h_1,\dots,h_k} (which, in the odd order case, are no longer required to be distinct). (Superficially, this looks like we have resolved “half of the cases” of the logarithmically averaged Chowla conjecture; but it seems the odd order correlations are significantly easier than the even order ones. For instance, because of the Katai-Bourgain-Sarnak-Ziegler criterion, one can basically deduce the odd order cases of (2) from the even order cases (after allowing for some dilations in the argument {n}).

For the more general Elliott conjecture, we can show that

\displaystyle  \lim_{m \rightarrow \infty} \frac{1}{\log \omega_m} \sum_{x_m/\omega_m \leq n \leq x_m} \frac{g_1(n+h_1) \dots g_k(n+h_k)}{n} = 0

for any {k}, any integers {h_1,\dots,h_k} and any bounded multiplicative functions {g_1,\dots,g_k}, unless the product {g_1 \dots g_k} weakly pretends to be a Dirichlet character {\chi} in the sense that

\displaystyle  \sum_{p \leq X} \frac{1 - \hbox{Re}( g_1 \dots g_k(p) \overline{\chi}(p)}{p} = o(\log\log X).

This can be seen to imply (2) as a special case. Even when {g_1,\dots,g_k} does pretend to be a Dirichlet character {\chi}, we can still say something: if the limits

\displaystyle  f(a) := \lim_{m \rightarrow \infty} \frac{1}{\log \omega_m} \sum_{x_m/\omega_m \leq n \leq x_m} \frac{g_1(n+ah_1) \dots g_k(n+ah_k)}{n}

exist for each {a \in {\bf Z}} (which can be guaranteed if we pass to a suitable subsequence), then {f} is the uniform limit of periodic functions {f_i}, each of which is {\chi}isotypic in the sense that {f_i(ab) = f_i(a) \chi(b)} whenever {a,b} are integers with {b} coprime to the periods of {\chi} and {f_i}. This does not pin down the value of any single correlation {f(a)}, but does put significant constraints on how these correlations may vary with {a}.

Among other things, this allows us to show that all {16} possible length four sign patterns {(\lambda(n+1),\dots,\lambda(n+4)) \in \{-1,+1\}^4} of the Liouville function occur with positive density, and all {65} possible length four sign patterns {(\mu(n+1),\dots,\mu(n+4)) \in \{-1,0,+1\}^4 \backslash \{-1,+1\}^4} occur with the conjectured logarithmic density. (In a previous paper with Matomaki and Radziwill, we obtained comparable results for length three patterns of Liouville and length two patterns of Möbius.)

To describe the argument, let us focus for simplicity on the case of the Liouville correlations

\displaystyle  f(a) := \lim_{X \rightarrow \infty} \frac{1}{\log X} \sum_{n \leq X} \frac{\lambda(n) \lambda(n+a) \dots \lambda(n+(k-1)a)}{n}, \ \ \ \ \ (3)

assuming for sake of discussion that all limits exist. (In the paper, we instead use the device of generalised limits, as discussed in this previous post.) The idea is to combine together two rather different ways to control this function {f}. The first proceeds by the entropy decrement method mentioned earlier, which roughly speaking works as follows. Firstly, we pick a prime {p} and observe that {\lambda(pn)=-\lambda(n)} for any {n}, which allows us to rewrite (3) as

\displaystyle  (-1)^k f(a) = \lim_{X \rightarrow \infty} \frac{1}{\log X}

\displaystyle  \sum_{n \leq X} \frac{\lambda(pn) \lambda(pn+ap) \dots \lambda(pn+(k-1)ap)}{n}.

Making the change of variables {n' = pn}, we obtain

\displaystyle  (-1)^k f(a) = \lim_{X \rightarrow \infty} \frac{1}{\log X}

\displaystyle \sum_{n' \leq pX} \frac{\lambda(n') \lambda(n'+ap) \dots \lambda(n'+(k-1)ap)}{n'} p 1_{p|n'}.

The difference between {n' \leq pX} and {n' \leq X} is negligible in the limit (here is where we crucially rely on the log-averaging), hence

\displaystyle  (-1)^k f(a) = \lim_{X \rightarrow \infty} \frac{1}{\log X} \sum_{n \leq X} \frac{\lambda(n) \lambda(n+ap) \dots \lambda(n+(k-1)ap)}{n} p 1_{p|n}

and thus by (3) we have

\displaystyle  (-1)^k f(a) = f(ap) + \lim_{X \rightarrow \infty} \frac{1}{\log X}

\displaystyle \sum_{n \leq X} \frac{\lambda(n) \lambda(n+ap) \dots \lambda(n+(k-1)ap)}{n} (p 1_{p|n}-1).

The entropy decrement argument can be used to show that the latter limit is small for most {p} (roughly speaking, this is because the factors {p 1_{p|n}-1} behave like independent random variables as {p} varies, so that concentration of measure results such as Hoeffding’s inequality can apply, after using entropy inequalities to decouple somewhat these random variables from the {\lambda} factors). We thus obtain the approximate isotopy property

\displaystyle  (-1)^k f(a) \approx f(ap) \ \ \ \ \ (4)

for most {a} and {p}.

On the other hand, by the Furstenberg correspondence principle (as discussed in these previous posts), it is possible to express {f(a)} as a multiple correlation

\displaystyle  f(a) = \int_X g(x) g(T^a x) \dots g(T^{(k-1)a} x)\ d\mu(x)

for some probability space {(X,\mu)} equipped with a measure-preserving invertible map {T: X \rightarrow X}. Using results of Bergelson-Host-Kra, Leibman, and Le, this allows us to obtain a decomposition of the form

\displaystyle  f(a) = f_1(a) + f_2(a) \ \ \ \ \ (5)

where {f_1} is a nilsequence, and {f_2} goes to zero in density (even along the primes, or constant multiples of the primes). The original work of Bergelson-Host-Kra required ergodicity on {X}, which is very definitely a hypothesis that is not available here; however, the later work of Leibman removed this hypothesis, and the work of Le refined the control on {f_1} so that one still has good control when restricting to primes, or constant multiples of primes.

Ignoring the small error {f_2(a)}, we can now combine (5) to conclude that

\displaystyle  f(a) \approx (-1)^k f_1(ap).

Using the equidistribution theory of nilsequences (as developed in this previous paper of Ben Green and myself), one can break up {f_1} further into a periodic piece {f_0} and an “irrational” or “minor arc” piece {f_3}. The contribution of the minor arc piece {f_3} can be shown to mostly cancel itself out after dilating by primes {p} and averaging, thanks to Vinogradov-type bilinear sum estimates (transferred to the primes). So we end up with

\displaystyle  f(a) \approx (-1)^k f_0(ap),

which already shows (heuristically, at least) the claim that {f} can be approximated by periodic functions {f_0} which are isotopic in the sense that

\displaystyle  f_0(a) \approx (-1)^k f_0(ap).

But if {k} is odd, one can use Dirichlet’s theorem on primes in arithmetic progressions to restrict to primes {p} that are {1} modulo the period of {f_0}, and conclude now that {f_0} vanishes identically, which (heuristically, at least) gives (2).

The same sort of argument works to give the more general bounds on correlations of bounded multiplicative functions. But for the specific task of proving (2), we initially used a slightly different argument that avoids using the ergodic theory machinery of Bergelson-Host-Kra, Leibman, and Le, but replaces it instead with the Gowers uniformity norm theory used to count linear equations in primes. Basically, by averaging (4) in {p} using the “{W}-trick”, as well as known facts about the Gowers uniformity of the von Mangoldt function, one can obtain an approximation of the form

\displaystyle  (-1)^k f(a) \approx {\bf E}_{b: (b,W)=1} f(ab)

where {b} ranges over a large range of integers coprime to some primorial {W = \prod_{p \leq w} p}. On the other hand, by iterating (4) we have

\displaystyle  f(a) \approx f(apq)

for most semiprimes {pq}, and by again averaging over semiprimes one can obtain an approximation of the form

\displaystyle  f(a) \approx {\bf E}_{b: (b,W)=1} f(ab).

For {k} odd, one can combine the two approximations to conclude that {f(a)=0}. (This argument is not given in the current paper, but we plan to detail it in a subsequent one.)

Filed under: math.DS, math.NT, paper Tagged: Chowla conjecture, correspondence principle, Elliott conjecture, Joni Teravainen, Liouville function, multiplicative functions

August 09, 2017

Tommaso DorigoHiggs Decays To Tau Leptons: CMS Sees Them First

I have recently been reproached, by colleagues who are members of the competing ATLAS experiment, of misusing the word "see" in this blog, in the context of searches for physics signals. That was because I reported that CMS recently produced a very nice result where we measure the rate of H->bb decays in events where the Higgs boson recoils against a energetic jet; that signal is not statistically significant, so they could argue that CMS did not "see" anything, as I wrote in the blog title. 

read more

Scott AaronsonThe Kolmogorov option

Andrey Nikolaevich Kolmogorov was one of the giants of 20th-century mathematics.  I’ve always found it amazing that the same man was responsible both for establishing the foundations of classical probability theory in the 1930s, and also for co-inventing the theory of algorithmic randomness (a.k.a. Kolmogorov complexity) in the 1960s, which challenged the classical foundations, by holding that it is possible after all to talk about the entropy of an individual object, without reference to any ensemble from which the object was drawn.  Incredibly, going strong into his eighties, Kolmogorov then pioneered the study of “sophistication,” which amends Kolmogorov complexity to assign low values both to “simple” objects and “random” ones, and high values only to a third category of objects, which are “neither simple nor random.”  So, Kolmogorov was at the vanguard of the revolution, counter-revolution, and counter-counter-revolution.

But that doesn’t even scratch the surface of his accomplishments: he made fundamental contributions to topology and dynamical systems, and together with Vladimir Arnold, solved Hilbert’s thirteenth problem, showing that any multivariate continuous function can be written as a composition of continuous functions of two variables.  He mentored an awe-inspiring list of young mathematicians, whose names (besides Arnold) include Dobrushin, Dynkin, Gelfand, Martin-Löf, Sinai, and in theoretical computer science, our own Leonid Levin.  If that wasn’t enough, during World War II Kolmogorov applied his mathematical gifts to artillery problems, helping to protect Moscow from German bombardment.

Kolmogorov was private in his personal and political life, which might have had something to do with being gay, at a time and place when that was in no way widely accepted.  From what I’ve read—for example, in Gessen’s biography of Perelman—Kolmogorov seems to have been generally a model of integrity and decency.  He established schools for mathematically gifted children, which became jewels of the Soviet Union; one still reads about them with awe.  And at a time when Soviet mathematics was convulsed by antisemitism—with students of Jewish descent excluded from the top math programs for made-up reasons, sent instead to remote trade schools—Kolmogorov quietly protected Jewish researchers.

OK, but all this leaves a question.  Kolmogorov was a leading and admired Soviet scientist all through the era of Stalin’s purges, the Gulag, the KGB, the murders and disappearances and forced confessions, the show trials, the rewritings of history, the allies suddenly denounced as traitors, the tragicomedy of Lysenkoism.  Anyone as intelligent, individualistic, and morally sensitive as Kolmogorov would obviously have seen through the lies of his government, and been horrified by its brutality.  So then why did he utter nary a word in public against what was happening?

As far as I can tell, the answer is simply: because Kolmogorov knew better than to pick fights he couldn’t win.  He judged that he could best serve the cause of truth by building up an enclosed little bubble of truth, and protecting that bubble from interference by the Soviet system, and even making the bubble useful to the system wherever he could—rather than futilely struggling to reform the system, and simply making martyrs of himself and all his students for his trouble.

There’s a saying of Kolmogorov, which associates wisdom with keeping your mouth shut:

“Every mathematician believes that he is ahead of the others. The reason none state this belief in public is because they are intelligent people.”

There’s also a story that Kolmogorov loved to tell about himself, which presents math as a sort of refuge from the arbitrariness of the world: he said that he once studied to become a historian, but was put off by the fact that historians demanded ten different proofs for the same proposition, whereas in math, a single proof suffices.

There was also a dark side to political quietism.  In 1936, Kolmogorov joined other mathematicians in testifying against his former mentor in the so-called Luzin affair.  By many accounts, he did this because the police blackmailed him, by threatening to reveal his homosexual relationship with Pavel Aleksandrov.  On the other hand, while he was never foolish enough to take on Lysenko directly, Kolmogorov did publish a paper in 1940 courageously supporting Mendelian genetics.

It seems likely that in every culture, there have been truths, which moreover everyone knows to be true on some level, but which are so corrosive to the culture’s moral self-conception that one can’t assert them, or even entertain them seriously, without (in the best case) being ostracized for the rest of one’s life.  In the USSR, those truths were the ones that undermined the entire communist project: for example, that humans are not blank slates; that Mendelian genetics is right; that Soviet collectivized agriculture was a humanitarian disaster.  In our own culture, those truths are—well, you didn’t expect me to say, did you? 🙂

I’ve long been fascinated by the psychology of unspeakable truths.  Like, for any halfway perceptive person in the USSR, there must have been an incredible temptation to make a name for yourself as a daring truth-teller: so much low-hanging fruit!  So much to say that’s correct and important, and that best of all, hardly anyone else is saying!

But then one would think better of it.  It’s not as if, when you speak a forbidden truth, your colleagues and superiors will thank you for correcting their misconceptions.  Indeed, it’s not as if they didn’t already know, on some level, whatever you imagined yourself telling them.  In fact it’s often because they fear you might be right that the authorities see no choice but to make an example of you, lest the heresy spread more widely.  One corollary is that the more reasonably and cogently you make your case, the more you force the authorities’ hand.

But what’s the inner psychology of the authorities?  For some, it probably really is as cynical as the preceding paragraph makes it sound.  But for most, I doubt that.  I think that most authorities simply internalize the ruling ideology so deeply that they equate dissent with sin.  So in particular, the better you can ground your case in empirical facts, the craftier and more conniving a deceiver you become in their eyes, and hence the more virtuous they are for punishing you.  Someone who’s arrived at that point is completely insulated from argument: absent some crisis that makes them reevaluate their entire life, there’s no sense in even trying.  The question of whether or not your arguments have merit won’t even get entered upon, nor will the authority ever be able to repeat back your arguments in a form you’d recognize—for even repeating the arguments correctly could invite accusations of secretly agreeing with them.  Instead, the sole subject of interest will be you: who you think you are, what your motivations were to utter something so divisive and hateful.  And you have as good a chance of convincing authorities of your benign motivations as you’d have of convincing the Inquisition that, sure, you’re a heretic, but the good kind of heretic, the kind who rejects the divinity of Jesus but believes in niceness and tolerance and helping people.  To an Inquisitor, “good heretic” doesn’t parse any better than “round square,” and the very utterance of such a phrase is an invitation to mockery.  If the Inquisition had had Twitter, its favorite sentence would be “I can’t even.”

If it means anything to be a lover of truth, it means that anytime society finds itself stuck in one of these naked-emperor equilibriums—i.e., an equilibrium with certain facts known to nearly everyone, but severe punishments for anyone who tries to make those facts common knowledge—you hope that eventually society climbs its way out.  But crucially, you can hope this while also realizing that, if you tried singlehandedly to change the equilibrium, it wouldn’t achieve anything good for the cause of truth.  If iconoclasts simply throw themselves against a ruling ideology one by one, they can be picked off as easily as tribesmen charging a tank with spears, and each kill will only embolden the tank-gunners still further.  The charging tribesmen don’t even have the assurance that, if truth ultimately does prevail, then they’ll be honored as martyrs: they might instead end up like Ted Nelson babbling about hypertext in 1960, or H.C. Pocklington yammering about polynomial-time algorithms in 1917, nearly forgotten by history for being too far ahead of their time.

Does this mean that, like Winston Smith, the iconoclast simply must accept that 2+2=5, and that a boot will stamp on a human face forever?  No, not at all.  Instead the iconoclast can choose what I think of as the Kolmogorov option.  This is where you build up fortresses of truth in places the ideological authorities don’t particularly understand or care about, like pure math, or butterfly taxonomy, or irregular verbs.  You avoid a direct assault on any beliefs your culture considers necessary for it to operate.  You even seek out common ground with the local enforcers of orthodoxy.  Best of all is a shared enemy, and a way your knowledge and skills might be useful against that enemy.  For Kolmogorov, the shared enemy was the Nazis; for someone today, an excellent choice might be Trump, who’s rightly despised by many intellectual factions that spend most of their time despising each other.  Meanwhile, you wait for a moment when, because of social tectonic shifts beyond your control, the ruling ideology has become fragile enough that truth-tellers acting in concert really can bring it down.  You accept that this moment of reckoning might never arrive, or not in your lifetime.  But even if so, you could still be honored by future generations for building your local pocket of truth, and for not giving falsehood any more aid or comfort than was necessary for your survival.

When it comes to the amount of flak one takes for defending controversial views in public under one’s own name, I defer to almost no one.  For anyone tempted, based on this post, to call me a conformist or coward: how many times have you been denounced online, and from how many different corners of the ideological spectrum?  How many people have demanded your firing?   How many death threats have you received?  How many threatened lawsuits?  How many comments that simply say “kill yourself kike” or similar?  Answer and we can talk about cowardice.

But, yes, there are places even I won’t go, hills I won’t die on.  Broadly speaking:

  • My Law is that, as a scientist, I’ll hold discovering and disseminating the truth to be a central duty of my life, one that overrides almost every other value.  I’ll constantly urge myself to share what I see as the truth, even if it’s wildly unpopular, or makes me look weird, or is otherwise damaging to me.
  • The Amendment to the Law is that I’ll go to great lengths not to hurt anyone else’s feelings: for example, by propagating negative stereotypes, or by saying anything that might discourage any enthusiastic person from entering science.  And if I don’t understand what is or isn’t hurtful, then I’ll defer to the leading intellectuals in my culture to tell me.  This Amendment often overrides the Law, causing me to bite my tongue.
  • The Amendment to the Amendment is that, when pushed, I’ll stand by what I care about—such as free scientific inquiry, liberal Enlightenment norms, humor, clarity, and the survival of the planet and of family and friends and colleagues and nerdy misfits wherever they might be found.  So if someone puts me in a situation where there’s no way to protect what I care about without speaking a truth that hurts someone’s feelings, then I might speak the truth, feelings be damned.  (Even then, though, I’ll try to minimize collateral damage.)

When I see social media ablaze with this or that popular falsehood, I sometimes feel the “Galileo urge” washing over me.  I think: I’m a tenured professor with a semi-popular blog.  How can I look myself in the mirror, if I won’t use my platform and relative job safety to declare to the world, “and yet it moves”?

But then I remember that even Galileo weighed his options and tried hard to be prudent.  In his mind, the Dialogue Concerning the Two Chief World Systems actually represented a compromise (!).  Galileo never declared outright that the earth orbits the sun.  Instead, he put the Copernican doctrine, as a “possible view,” into the mouth of his character Salviati—only to have Simplicio “refute” Salviati, by the final dialogue, with the argument that faith always trumps reason, and that human beings are pathetically unequipped to deduce the plan of God from mere surface appearances.  Then, when that fig-leaf turned out not to be wide enough to fool the Church, Galileo quickly capitulated.  He repented of his error, and agreed never to defend the Copernican heresy again.  And he didn’t, at least not publicly.

Some have called Galileo a coward for that.  But the great David Hilbert held a different view.  Hilbert said that science, unlike religion, has no need for martyrs, because it’s based on facts that can’t be denied indefinitely.  Given that, Hilbert considered Galileo’s response to be precisely correct: in effect Galileo told the Inquisitors, hey, you’re the ones with the torture rack.  Just tell me which way you want it.  I can have the earth orbiting Mars and Venus in figure-eights by tomorrow if you decree it so.

Three hundred years later, Andrey Kolmogorov would say to the Soviet authorities, in so many words: hey, you’re the ones with the Gulag and secret police.  Consider me at your service.  I’ll even help you stop Hitler’s ideology from taking over the world—you’re 100% right about that one, I’ll give you that.  Now as for your own wondrous ideology: just tell me the dogma of the week, and I’ll try to make sure Soviet mathematics presents no threat to it.

There’s a quiet dignity to Kolmogorov’s (and Galileo’s) approach: a dignity that I suspect will be alien to many, but recognizable to those in the business of science.

Comment Policy: I welcome discussion about the responses of Galileo, Kolmogorov, and other historical figures to official ideologies that they didn’t believe in; and about the meta-question of how a truth-valuing person ought to behave when living under such ideologies.  In the hopes of maintaining a civil discussion, any comments that mention current hot-button ideological disputes will be ruthlessly deleted.

David Hogglatent-variable model; bound-saturating EPRV

Today, Christina Eilers (MPIA) and I switched her project over to a latent variable model. In this model, stellar spectra (every pixel of every spectrum) and stellar labels (Teff, logg, and so on for every star) are treated on an equal footing as “data”. Then we fit an underlying low-dimensional model to all these data (spectra and labels together). By the end of the day, cross-validation tests were pushing us to higher and higher dimensionality for our latent space, and the quality of our predictions was improving. This seems to work, and is a fully probabilistic generalization of The Cannon. Extremely optimistic about this!

Also today, Megan Bedell (Chicago) built a realistic-data simulation for our EPRV project, and also got pipelines working that measure radial velocities precisely. We have realistic, achievable methods that saturate the Cramér–Rao bound! This is what we planned to do this week not today. However, we have a serious puzzle: We can show that a data-driven synthetic spectral template saturates the bound for radial-velocity measurement, and that a binary mask template does not. But we find that the binary mask is so bad, we can't understand how the HARPS pipeline is doing such a great job. My hypothesis: We are wrong that HARPS is using a binary mask.

BackreactionOutraged about the Google diversity memo? I want you to think about it.

Chairs. [Image: Verco] That leaked internal memo from James Damore at Google? The one that says one shouldn’t expect employees in all professions to reflect the demographics of the whole population? Well, that was a pretty dumb thing to write. But not because it’s wrong. Dumb is that Damore thought he could have a reasoned discussion about this. In the USA, out of all places. The version of

August 08, 2017

Terence TaoSchur convexity and positive definiteness of the even degree complete homogeneous symmetric polynomials

The complete homogeneous symmetric polynomial {h_d(x_1,\dots,x_n)} of {n} variables {x_1,\dots,x_n} and degree {d} can be defined as

\displaystyle h_d(x_1,\dots,x_n) := \sum_{1 \leq i_1 \leq \dots \leq i_d \leq n} x_{i_1} \dots x_{i_d},

thus for instance

\displaystyle h_0(x_1,\dots,x_n) = 0,

\displaystyle h_1(x_1,\dots,x_n) = x_1 + \dots + x_n,


\displaystyle h_2(x_1,\dots,x_n) = x_1^2 + \dots + x_n^2 + \sum_{1 \leq i < j \leq n} x_i x_j.

One can also define all the complete homogeneous symmetric polynomials of {n} variables simultaneously by means of the generating function

\displaystyle \sum_{d=0}^\infty h_d(x_1,\dots,x_n) t^d = \frac{1}{(1-t x_1) \dots (1-tx_n)}. \ \ \ \ \ (1)

We will think of the variables {x_1,\dots,x_n} as taking values in the real numbers. When one does so, one might observe that the degree two polynomial {h_2} is a positive definite quadratic form, since it has the sum of squares representation

\displaystyle h_2(x_1,\dots,x_n) = \frac{1}{2} \sum_{i=1}^n x_i^2 + \frac{1}{2} (x_1+\dots+x_n)^2.

In particular, {h_2(x_1,\dots,x_n) > 0} unless {x_1=\dots=x_n=0}. This can be compared against the superficially similar quadratic form

\displaystyle x_1^2 + \dots + x_n^2 + \sum_{1 \leq i < j \leq n} \epsilon_{ij} x_i x_j

where {\epsilon_{ij} = \pm 1} are independent randomly chosen signs. The Wigner semicircle law says that for large {n}, the eigenvalues of this form will be mostly distributed in the interval {[-\sqrt{n}, \sqrt{n}]} using the semicircle distribution, so in particular the form is quite far from being positive definite despite the presence of the first {n} positive terms. Thus the positive definiteness is coming from the finer algebraic structure of {h_2}, and not just from the magnitudes of its coefficients.

One could ask whether the same positivity holds for other degrees {d} than two. For odd degrees, the answer is clearly no, since {h_d(-x_1,\dots,-x_n) = -h_d(x_1,\dots,x_n)} in that case. But one could hope for instance that

\displaystyle h_4(x_1,\dots,x_n) = \sum_{1 \leq i \leq j \leq k \leq l \leq n} x_i x_j x_k x_l

also has a sum of squares representation that demonstrates positive definiteness. This turns out to be true, but is remarkably tedious to establish directly. Nevertheless, we have a nice result of Hunter that gives positive definiteness for all even degrees {d}. In fact, a modification of his argument gives a little bit more:

Theorem 1 Let {n \geq 1}, let {d \geq 0} be even, and let {x_1,\dots,x_n} be reals.

  • (i) (Positive definiteness) One has {h_d(x_1,\dots,x_n) \geq 0}, with strict inequality unless {x_1=\dots=x_n=0}.
  • (ii) (Schur convexity) One has {h_d(x_1,\dots,x_n) \geq h_d(y_1,\dots,y_n)} whenever {(x_1,\dots,x_n)} majorises {(y_1,\dots,y_n)}, with equality if and only if {(y_1,\dots,y_n)} is a permutation of {(x_1,\dots,x_n)}.
  • (iii) (Schur-Ostrowski criterion for Schur convexity) For any {1 \leq i < j \leq n}, one has {(x_i - x_j) (\frac{\partial}{\partial x_i} - \frac{\partial}{\partial x_j}) h_d(x_1,\dots,x_n) \geq 0}, with strict inequality unless {x_i=x_j}.

Proof: We induct on {d} (allowing {n} to be arbitrary). The claim is trivially true for {d=0}, and easily verified for {d=2}, so suppose that {d \geq 4} and the claims (i), (ii), (iii) have already been proven for {d-2} (and for arbitrary {n}).

If we apply the differential operator {(x_i - x_j) (\frac{\partial}{\partial x_i} - \frac{\partial}{\partial x_j})} to {\frac{1}{(1-t x_1) \dots (1-tx_n)}} using the product rule, one obtains after a brief calculation

\displaystyle \frac{(x_i-x_j)^2 t^2}{(1-t x_1) \dots (1-tx_n) (1-t x_i) (1-t x_j)}.

Using (1) and extracting the {t^d} coefficient, we obtain the identity

\displaystyle (x_i - x_j) (\frac{\partial}{\partial x_i} - \frac{\partial}{\partial x_j}) h_d(x_1,\dots,x_n)

\displaystyle = (x_i-x_j)^2 h_{d-2}( x_1,\dots,x_n,x_i,x_j). \ \ \ \ \ (2)

The claim (iii) then follows from (i) and the induction hypothesis.

To obtain (ii), we use the more general statement (known as the Schur-Ostrowski criterion) that (ii) is implied from (iii) if we replace {h_d} by an arbitrary symmetric, continuously differentiable function. To establish this criterion, we induct on {n} (this argument can be made independently of the existing induction on {d}). If {(y_1,\dots,y_n)} is majorised by {(x_1,\dots,x_n)}, it lies in the permutahedron of {(x_1,\dots,x_n)}. If {(y_1,\dots,y_n)} lies on a face of this permutahedron, then after permuting both the {x_i} and {y_j} we may assume that {(y_1,\dots,y_m)} is majorised by {(x_1,\dots,x_m)}, and {(y_{m+1},\dots,y_n)} is majorised by {(x_{m+1},\dots,x_n)} for some {1 \leq m < n}, and the claim then follows from two applications of the induction hypothesis. If instead {(y_1,\dots,y_n)} lies in the interior of the permutahedron, one can follow it to the boundary by using one of the vector fields {(x_i - x_j) (\frac{\partial}{\partial x_i} - \frac{\partial}{\partial x_j})}, and the claim follows from the boundary case.

Finally, to obtain (i), we observe that {(x_1,\dots,x_n)} majorises {(x,\dots,x)}, where {x} is the arithmetic mean of {x_1,\dots,x_n}. But {h_d(x,\dots,x)} is clearly a positive multiple of {x^d}, and the claim now follows from (ii). \Box

If the variables {x_1,\dots,x_n} are restricted to be nonnegative, the same argument gives Schur convexity for odd degrees also.

The proof in Hunter of positive definiteness is arranged a little differently than the one above, but still relies ultimately on the identity (2). I wonder if there is a genuinely different way to establish positive definiteness that does not go through this identity.

Filed under: expository, math.AC, math.CA, Uncategorized Tagged: positive definiteness, symmetric polynomials

Chad OrzelPhysics Blogging Round-Up: July

Another month, another collection of blog posts for Forbes:

The Physics Of Century-Old Mirror Selfies: Back in the early 1900’s there was a brief vogue for trick pictures showing the same person from five different angles; this post explains how to do that with mirrors.

Why Research By Undergraduates Is Important For Science And Students: A reply to an essay talking up the products of undergraduate research projects, arguing that the most valuable part of research is the effect on students.

What Does It Mean To Share ‘Raw Data’?: Some thoughts on the uselessness of much “raw data” in my field to anyone outside the lab where it was produced.

Breaking Stuff Is An Essential Part Of The Scientific Process: Thoughts on how the most important year of my grad school career was the frustrating one in which I broke and then repaired everything in the lab.

Measuring The Speed Of Quantum Tunneling: A couple of recent experiments use a clever trick to look at whether there’s a time delay as electrons tunnel out of an atom in a strong electric field. Unfortunately, they get very different results…

I was a little disappointed that the photo-multigraph thing didn’t get more traction, but it was fun to do, so that’s okay. The quantum tunneling post did surprisingly well– I thought it was likely to be a little too technical to really take off, but it did. Always nice when that happens.

The other three are closely related to a development at work, namely that on July 1 I officially added “Director of Undergraduate Research” to the many hats I wear. I’m in charge of supervising the research program at Union, disbursing summer fellowships and small grants for research projects and conference travel, and arranging a number of research-oriented events on campus. This involves a certain amount of administrative hassle, but then again, it’s hassle in the service of helping students do awesome stuff, so I’m happy to do it.

Anyway, that’s where things are. Blogging will very likely tail off dramatically for the fall, possibly as soon as this month (though I already have one post up), as I have a book on contract due Dec. 1, and a review article due to a journal a month later. And, you know, classes to teach and research to direct…

Doug NatelsonWorkshop on quantum transport

Blogging has been slow b/c of travel.  I'm attending a workshop on "Quantum transport in nanoscale molecular systems".  This is rather like a Gordon Conference, with a fair bit of unpublished work being presented, but when it's over I'll hit a few highlights that are already in the literature.  Update:  here you go.

August 07, 2017

Doug NatelsonHighlights from Telluride

Here are a few highlights from the workshop I mentioned.  I'll amend this over the next couple of days as I have time.  There is no question that smaller meetings (this one was about 28 people) can be very good for discussions.
  • I learned that there is a new edition of Cuevas and Scheer that I should pick up.  (The authors are Juan Carlos Cuevas and Elke Scheer, a great theorist/experimentalist team-up.)
  • Apparently it's possible to make a guitar amplifier using tunnel junctions made from self-assembled monolayers.  For more detail, see here.
  • Some folks at Aachen have gotten serious about physics lab experiments you can do with your mobile phone.
  • Richard Berndt gave a very nice talk about light emission from atomic-scale junctions made with a scanning tunneling microscope.  Some of that work has been written about here and here.  A key question is, when a bias of \(eV\) is applied to such a junction, what is the mechanism that leads to the emission of photons of energies \(\hbar \omega > eV\)?  Clearly the processes involve multiple electrons, but exactly how things work is quite complicated, involving both the plasmonic/optical resonances of the junction and the scattering of electrons at the atomic-scale region.  Two relevant theory papers are here and here.
  • Latha Venkataraman showed some intriguing new results indicating room temperature Coulomb blockade-like transport in nanoclusters.  (It's not strictly Coulomb blockade, since the dominant energy scale seems to be set by single-particle level spacing rather than by the electrostatic charging energy of changing the electronic population by one electron).
  • Katharina Franke showed some very pretty data on single porphyrins measured via scanning tunneling microscope, as in here.  Interactions between the tip and the top of the molecule result in mechanical deformation of the molecule, which in turn tunes the electronic coupling between the transition metal in the middle of the porphyrin and the substrate.  This ends up being a nice system for tunable studies of Kondo physics.
  • Uri Peskin explained some interesting recent results that were just the beginning of some discussions about what kind of photoelectric responses one can see in very small junctions.  One recurring challenge:  multiple mechanisms that seem to be rather different physics can lead to similar experimentally measurable outcomes (currents, voltages).
  • Jascha Repp discussed some really interesting experiments combining STM and THz optics, to do true time-resolved measurements in the STM, such as watching a molecule bounce up and down on a metal surface (!).  This result is timely (no pun intended), as this remarkable paper just appeared on the arxiv, looking at on-chip ways of doing THz and faster electronics.
  • Jeff Neaton spoke about the ongoing challenge of using techniques like density functional theory to calculate and predict the energy level alignment between molecules and surfaces to which they're adsorbed or bonded.  This is important for transport, but also for catalysis and surface chemistry broadly.  A relevant recent result is here.
  • Jan van Ruitenbeek talked about their latest approach to measuring shot noise spectra in atomically small structures up to a few MHz, and some interesting things that this technique has revealed to them at high bias.  
  • There were multiple theory talks looking at trying to understand transport, inelastic processes, and dissipation in open, driven quantum systems.  Examples include situations where higher driving biases can actually make cooling processes more efficient; whether it's possible to have experiments in condensed matter systems that "see" many-body localization, an effect most explored in cold atom systems; using ballistic effects in graphene to do unusual imaging experiments or make electronic "beam splitters"; open systems from a quantum information point of view; what we mean by local effective temperature on very small scales; and new techniques for transport calculations. 
  • Pramod Reddy gave a really nice presentation about his group's extremely impressive work measuring thermal conduction at the atomic scale.  Directly related, he also talked about the challenges of measuring radiative heat transfer down to nm separations, where the Stefan-Boltzmann approach should be supplanted by near-field physics.  This was a very convincing lesson in how difficult it is to ensure that surfaces are truly clean, even in ultrahigh vacuum.
  • Joe Subotnik's talk about electronic friction was particularly striking to me, as I'd been previously unaware of some of the critical experiments (1, 2).  When and how do electron-hole excitations in metals lead to big changes in vibrational energy content of molecules, and how to think about this.  These issues are related to these experiments as well.  
  • Ron Naaman spoke about chiral molecules and how electron transfer to and from these objects can have surprising, big effects (see here and here).
  • Gemma Solomon closed out the proceedings with a very interesting talk about whether molecules could be used to make effective insulating layers better at resisting tunneling current than actual vacuum, and a great summary of the whole research area, where it's been, and where it's going.

August 05, 2017

n-Category Café Instantaneous Dimension of Finite Metric Spaces via Magnitude and Spread

In June I went to the following conference.

This was held at the Będlewo Conference Centre which is run by the Polish Academy of Sciences’ Institute of Mathematics. Like Oberwolfach it is kind of in the middle of nowhere, being about half an hour’s bus ride from Poznan. (As our excursion guide told us, Poznan is 300km from anywhere: 300 km from Warsaw, 300 km from Berlin, 300 km from the sea and 300 km from the mountains.) You get to eat and drink in the palace pictured below; the seminar rooms and accommodation are in a somewhat less grand building out of shot of the photo.

Bedlewo palace

I gave a 20-minute long, magnitude-related talk. You can download the slides below. Do try the BuzzFeed-like quiz at the end. How many of the ten spaces can just identify just from their dimension profile?

To watch the animation I think that you will have to use acrobat reader. If you don’t want to use that then there’s a movie-free version.

Here’s the abstract.

Some spaces seem to have different dimensions at different scales. A long thin strip might appear one-dimensional at a distance, then two-dimensional when zoomed in on, but when zoomed in on even closer it is seen to be made of a finite array of points, so at that scale it seems zero-dimensional. I will present a way of quantifying this phenomenon.

The main idea is to think of dimension as corresponding to growth rate of size: when you double distances, a line will double in size and a square will quadruple in size. You then just need some good notions of size of metric spaces. One such notion is ‘magnitude’ which was introduced by Leinster, using category theoretic ideas. but was found to have links to many other areas of maths such as biodiversity and potential theory. There’s a closely related, but computationally more tractable, family of notions of size called ‘spreads’ which I introduced following connections with biodiversity.

Meckes showed that the asymptotic growth rate of the magnitude of a metric space is the Minkowski dimension (i.e. the usual dimension for squares and lines and the usual fractal dimension for things like Cantor sets). But this is zero for finite metric spaces. However, by considering growth rate non-asymptotically you get interesting looking results for finite metric spaces, such as the phenomenon described in the first parargraph.

I have blogged about instantaneous dimension before at this post. One connection with applied topology is that as in for persistent homology, one is considering what is happens to a metric space as you scale the metric.

The talk was in the smallest room of three parallel talks, so I had a reasonably small audience. However, it was very nice that almost everyone who was in the talk came up and spoke to me about it afterwards; some even told me how I could calculate magnitude of large metric spaces much faster! For instance Brad Nelson showed me how you can use iterative methods, such as the Krylov subspace method, for solving large linear systems numerically. This is much faster than just naively asking Maple to solve the linear system.

Anyway, do say below how well you did in the quiz!

n-Category Café The Rise and Spread of Algebraic Topology

People have been using algebraic topology in data analysis these days, so we’re starting to see conferences like this:

I’m giving the first talk at this one. I’ve done a lot of work on applied category theory, but only a bit on on applied algebraic topology. It was tempting to smuggle in some categories, operads and props under the guise of algebraic topology. But I decided it would be more useful, as a kind of prelude to the conference, to say a bit about the overall history of algebraic topology, and its inner logic: how it was inevitably driven to categories, and then 2-categories, and then \infty-categories.

This may be the least ‘applied’ of all the talks at this conference, but I’m hoping it will at least trigger some interesting thoughts. We don’t want the ‘applied’ folks to forget the grand view that algebraic topology has to offer!

Here are my talk slides:

Abstract. As algebraic topology becomes more important in applied mathematics it is worth looking back to see how this subject has changed our outlook on mathematics in general. When Noether moved from working with Betti numbers to homology groups, she forced a new outlook on topological invariants: namely, they are often functors, with two invariants counting as ‘the same’ if they are naturally isomorphic. To formalize this it was necessary to invent categories, and to formalize the analogy between natural isomorphisms between functors and homotopies between maps it was necessary to invent 2-categories. These are just the first steps in the ‘homotopification’ of mathematics, a trend in which algebra more and more comes to resemble topology, and ultimately abstract ‘spaces’ (for example, homotopy types) are considered as fundamental as sets. It is natural to wonder whether topological data analysis is a step in the spread of these ideas into applied mathematics, and how the importance of ‘robustness’ in applications will influence algebraic topology.

I thank Mike Shulman with some help on model categories and quasicategories. Any mistakes are, of course, my own fault.

John BaezThe Rise and Spread of Algebraic Topology

People have been using algebraic topology in data analysis these days, so we’re starting to see conferences like this:

Applied Algebraic Topology 2017, August 8-12, 2017, Hokkaido University, Sapporo, Japan.

I’m giving the first talk at this one. I’ve done a lot of work on applied category theory, but only a bit on on applied algebraic topology. It was tempting to smuggle in some categories, operads and props under the guise of algebraic topology. But decided it would be more useful, as a kind of prelude to the conference, to say a bit about the overall history of algebraic topology, and its inner logic: how it was inevitably driven to categories, and then 2-categories, and then ∞-categories.

This may be the least ‘applied’ of all the talks at this conference, but I’m hoping it will at least trigger some interesting thoughts. We don’t want the ‘applied’ folks to forget the grand view that algebraic topology has to offer!

Here are my talk slides:

The rise and spread of algebraic topology.

Abstract. As algebraic topology becomes more important in applied mathematics it is worth looking back to see how this subject has changed our outlook on mathematics in general. When Noether moved from working with Betti numbers to homology groups, she forced a new outlook on topological invariants: namely, they are often functors, with two invariants counting as ‘the same’ if they are naturally isomorphic. To formalize this it was necessary to invent categories, and to formalize the analogy between natural isomorphisms between functors and homotopies between maps it was necessary to invent 2-categories. These are just the first steps in the ‘homotopification’ of mathematics, a trend in which algebra more and more comes to resemble topology, and ultimately abstract ‘spaces’ (for example, homotopy types) are considered as fundamental as sets. It is natural to wonder whether topological data analysis is a step in the spread of these ideas into applied mathematics, and how the importance of ‘robustness’ in applications will influence algebraic topology.

I thank Mike Shulman with some help on model categories and quasicategories. Any mistakes are, of course, my own fault.

August 04, 2017

BackreactionNew paper claims string theory can be tested with Bose-Einstein-Condensates

Fluorescence image of Bose-Einstein-Condensate. Image Credits: Stefan Kuhr and Immanuel Bloch, MPQ String theory is infamously detached from experiment. But in a new paper, a group from Mexico put forward a proposal to change that String theory phenomenology and quantum many–body systems Sergio Gutiérrez, Abel Camacho, Héctor Hernández arXiv:1707.07757 [gr-qc] Ahead, let me be clear they

BackreactionSelf-tuning brings wireless power closer to reality

Cables under my desk. One of the unlikelier fights I picked while blogging was with an MIT group that aimed to wirelessly power devices – by tunneling: “If you bring another resonant object with the same frequency close enough to these tails then it turns out that the energy can tunnel from one object to another,” said Professor Soljacic. [Source: BBC] They had proposed a new method for

August 03, 2017

John PreskillTaming wave functions with neural networks

Note from Nicole Yunger Halpern: One sunny Saturday this spring, I heard Sam Greydanus present about his undergraduate thesis. Sam was about to graduate from Dartmouth with a major in physics. He had worked with quantum-computation theorist Professor James Whitfield. The presentation — about applying neural networks to quantum computation — so intrigued me that I asked him to share his research on Quantum Frontiers. Sam generously agreed; this is his story.

Wave functions in the wild


The wave function, \psi , is a mixed blessing. At first, it causes unsuspecting undergrads (me) some angst via the Schrodinger’s cat paradox. This angst morphs into full-fledged panic when they encounter concepts such as nonlocality and Bell’s theorem (which, by the way, is surprisingly hard to verify experimentally). The real trouble with \psi , though, is that it grows exponentially with the number of entangled particles in a system. We couldn’t even hope to write the wavefunction of 100 entangled particles, much less perform computations on it…but there’s a lot to gain from doing just that.

The thing is, we (a couple of luckless physicists) love \psi . Manipulating wave functions can give us ultra-precise timekeeping, secure encryption, and polynomial-time factoring of integers (read: break RSA). Harnessing quantum effects can also produce better machine learning, better physics simulations, and even quantum teleportation.

Taming the beast

Though \psi grows exponentially with the number of particles in a system, most physical wave functions can be described with a lot less information. Two algorithms for doing this are the Density Matrix Renormalization Group (DMRG) and Quantum Monte Carlo (QMC).


Density Matrix Renormalization Group (DMRG). Imagine we want to learn about trees, but studying a full-grown, 50-foot tall tree in the lab is too unwieldy. One idea is to keep the tree small, like a bonsai tree. DMRG is an algorithm which, like a bonsai gardener, prunes the wave function while preserving its most important components. It produces a compressed version of the wave function called a Matrix Product State (MPS). One issue with DMRG is that it doesn’t extend particularly well to 2D and 3D systems.

Screen Shot 2017-07-29 at 12.01.23 AM

Quantum Monte Carlo (QMC). Another way to study the concept of “tree” in a lab (bear with me on this metaphor) would be to study a bunch of leaf, seed, and bark samples. Quantum Monte Carlo algorithms do this with wave functions, taking “samples” of a wave function (pure states) and using the properties and frequencies of these samples to build a picture of the wave function as a whole. The difficulty with QMC is that it treats the wave function as a black box. We might ask, “how does flipping the spin of the third electron affect the total energy?” and QMC wouldn’t have much of a physical answer.

Brains \gg Brawn

Neural Quantum States (NQS). Some state spaces are far too large for even Monte Carlo to sample adequately. Suppose now we’re studying a forest full of different species of trees. If one type of tree vastly outnumbers the others, choosing samples from random trees isn’t an efficient way to map biodiversity. Somehow, we need to make the sampling process “smarter”. Last year, Google DeepMind used a technique called deep reinforcement learning to do just that – and achieved fame for defeating the world champion human Go player. A recent Science paper by Carleo and Troyer (2017) used the same technique to make QMC “smarter” and effectively compress wave functions with neural networks. This approach, called “Neural Quantum States (NQS)”, produced several state-of-the-art results.


The general idea of my thesis.

My thesis. My undergraduate thesis centered upon much the same idea. In fact, I had to abandon some of my initial work after reading the NQS paper. I then focused on using machine learning techniques to obtain MPS coefficients. Like Carleo and Troyer, I used neural networks to approximate  \psi . Unlike Carleo and Troyer, I trained my model to output a set of Matrix Product State coefficients which have physical meaning (MPS coefficients always correspond to a certain state and site, e.g. “spin up, electron number 3”).

Cool – but does it work?

Yes – for small systems. In my thesis, I considered a toy system of 4 spin-\frac{1}{2} particles interacting via the Heisenberg Hamiltonian. Solving this system is not difficult so I was able to focus on fitting the two disparate parts – machine learning and Matrix Product States – together.

Success! My model solved for ground states with arbitrary precision. Even more interestingly, I used it to automatically obtain MPS coefficients. Shown below, for example, is a visualization of my model’s coefficients for the GHZ state, compared with coefficients taken from the literature.

Screen Shot 2017-07-28 at 11.46.45 PM

A visual comparison of a 4-site Matrix Product State for the GHZ state a) listed in the literature b) obtained from my neural network model. Colored squares correspond to real-valued elements of 2×2 matrices.

Limitations. The careful reader might point out that, according to the schema of my model (above), I still have to write out the full wave function. To scale my model up, I instead trained it variationally over a subspace of the Hamiltonian (just as the authors of the NQS paper did). Results are decent for larger (10-20 particle) systems, but the training itself is still unstable. I’ll finish ironing out the details soon, so keep an eye on arXiv* :).

Outside the ivory tower


A quantum computer developed by Joint Quantum Institute, U. Maryland.

Quantum computing is a field that’s poised to take on commercial relevance. Taming the wave function is one of the big hurdles we need to clear before this happens. Hopefully my findings will have a small role to play in making this happen.

On a more personal note, thank you for reading about my work. As a recent undergrad, I’m still new to research and I’d love to hear constructive comments or criticisms. If you found this post interesting, check out my research blog.

*arXiv is an online library for electronic preprints of scientific papers

August 02, 2017

Jordan EllenbergMaryland flag, my Maryland flag

The Maryland flag is, in my opinion as a Marylander, the greatest state flag.

Ungepotch?  Yes.  But it has that ineffable “it shouldn’t work but it does” that marks really great art.

But here’s something I didn’t know about my home state’s flag:


Despite the antiquity of its design, the Maryland flag is of post-Civil War origin. Throughout the colonial period, only the yellow-and-black Calvert family colors are mentioned in descriptions of the Maryland flag. After independence, the use of the Calvert family colors was discontinued. Various banners were used to represent the state, although none was adopted officially as a state flag. By the Civil War, the most common Maryland flag design probably consisted of the great seal of the state on a blue background. These blue banners were flown at least until the late 1890s….

Reintroduction of the Calvert coat of arms on the great seal of the state [in 1854] was followed by a reappearance at public events of banners in the yellow-and-black Calvert family colors. Called the “Maryland colors” or “Baltimore colors,” these yellow-and-black banners lacked official sanction of the General Assembly, but appear to have quickly become popular with the public as a unique and readily identifiable symbol of Maryland and its long history.

The red-and-white Crossland arms gained popularity in quite a different way. Probably because the yellow-and-black “Maryland colors” were popularly identified with a state which, reluctantly or not, remained in the Union, Marylanders who sympathized with the South adopted the red-and-white of the Crossland arms as their colors. Following Lincoln’s election in 1861, red and white “secession colors” appeared on everything from yarn stockings and cravats to children’s clothing. People displaying these red-and-white symbols of resistance to the Union and to Lincoln’s policies were vigorously prosecuted by Federal authorities.

During the war, Maryland-born Confederate soldiers used both the red-and-white colors and the cross bottony design from the Crossland quadrants of the Calvert coat of arms as a unique way of identifying their place of birth. Pins in the cross bottony shape were worn on uniforms, and the headquarters flag of the Maryland-born Confederate general Bradley T. Johnson was a red cross bottony on a white field.

By the end of the Civil War, therefore, both the yellow-and-black Calvert arms and the red-and-white colors and bottony cross design of the Crossland arms were clearly identified with Maryland, although they represented opposing sides in the conflict.

In 4th grade, in Maryland history, right after having to memorize the names of the counties, we learned about the flag’s origin in the Calvert coat of arms

but not about the symbolic meaning of the flag’s adoption, as an explicit gesture of reconciliation between Confederate sympathizers and Union loyalists sharing power in a post-war border state.

The Howard County flag is based on the Crossland arms.  (There’s also a sheaf of wheat and a silhouette of Howard County nosing its way through a golden triangle.)  The city of Baltimore, on the other hand, uses the Calvert yellow-and-black only.

Oh, and there’s one more flag:

That’s the flag of the Republic of Maryland, an independent country in West Africa settled mostly by free black Marylanders.  It existed only from 1854 to 1857, when it was absorbed into Liberia, of which it’s still a part, called Maryland County.  The county flag still has Lord Baltimore’s yellow, but not the black.





August 01, 2017

Jordan EllenbergGood math days

I have good math days and bad math days; we all do.  An outsider might think the good math days are the days when you have good ideas.  That’s not how it works, at least for me.  You have good ideas on the bad math days, too; but one at a time.  You have an idea, you try it, you make some progress, it doesn’t work, your mind says “too bad.”

On the good math days, you have an idea, you try it, it doesn’t work, you click over to the next idea, you get over the obstacle that was blocking you, then you’re stuck again, you ask your mind “What’s the next thing to do?” you get the next idea, you take another step, and you just keep going.

You don’t feel smarter on the good math days.  It’s not even momentum, exactly, because it’s not a feeling of speed.  More like:  the feeling of being a big, heavy, not very fast vehicle, with very large tires, that’s just going to keep on traveling, over a bump, across a ditch, through a river, continually and inexorably moving in a roughly fixed direction.


July 31, 2017

John PreskillThe sign problem(s)

The thirteen-month-old had mastered the word “dada” by the time I met her. Her parents were teaching her to communicate other concepts through sign language. Picture her, dark-haired and bibbed, in a high chair. Banana and mango slices litter the tray in front of her. More fruit litters the floor in front of the tray. The baby lifts her arms and flaps her hands.

Dada looks up from scrubbing the floor.

“Look,” he calls to Mummy, “she’s using sign language! All done.” He performs the gesture that his daughter seems to have aped: He raises his hands and rotates his forearms about his ulnas, axes perpendicular to the floor. “All done!”

The baby looks down, seizes another morsel, and stuffs it into her mouth.

“Never mind,” Dada amends. “You’re not done, are you?”

His daughter had a sign(-language) problem.


So does Dada, MIT professor Aram Harrow. Aram studies quantum information theory. His interests range from complexity to matrices, from resource theories to entropies. He’s blogged for The Quantum Pontiff, and he studies—including with IQIM postdoc Elizabeth Crossonthe quantum sign problem.

Imagine calculating properties of a chunk of fermionic quantum matter. The chunk consists of sites, each inhabited by one particle or by none. Translate as “no site can house more than one particle” the jargon “the particles are fermions.”

The chunk can have certain amounts of energy. Each amount E_j corresponds to some particle configuration indexed by j: If the system has some amount E_1 of energy, particles occupy certain sites and might not occupy others. If the system has a different amount E_2 \neq E_1 of energy, particles occupy different sites. A Hamiltonian, a mathematical object denoted by H, encodes the energies E_j and the configurations. We represent H with a matrix, a square grid of numbers.

Suppose that the chunk has a temperature T = \frac{ 1 }{ k_{\rm B} \beta }. We could calculate the system’s heat capacity, the energy required to raise the chunk’s temperature by one Kelvin. We could calculate the free energy, how much work the chunk could perform in powering a motor or lifting a weight. To calculate those properties, we calculate the system’s partition function, Z.

How? We would list the configurations j. With each configuration, we would associate the weight e^{ - \beta E_j }. We would sum the weights: Z = e^{ - \beta E_1 }  +  e^{ - \beta E_2}  +  \ldots  =  \sum_j e^{ - \beta E_j}.

Easier—like feeding a 13-month-old—said than done. Let N denote the number of qubits in the chunk. If N is large, the number of configurations is gigantic. Our computers can’t process so many configurations. This inability underlies quantum computing’s promise of speeding up certain calculations.

We don’t have quantum computers, and we can’t calculate Z. Can we  approximate Z?

Yes, if H “lacks the sign problem.” The math that models our system models also a classical system. If our system has D dimensions, the classical system has D+1 dimensions. Suppose, for example, that our sites form a line. The classical system forms a square.

We replace the weights e^{ - \beta E_j } with different weights—numbers formed from a matrix that represents H. If H lacks the sign problem, the new weights are nonnegative and behave like probabilities. Many mathematical tools suit probabilities. Aram and Elizabeth apply such tools to Z, here and here, as do many other researchers.

We call Hamiltonians that lack the sign problem “stoquastic,” which I think fanquastic.Stay tuned for a blog post about stoquasticity by Elizabeth.

What if H has the sign problem? The new weights can assume negative and nonreal values. The weights behave unlike probabilities; we can’t apply those tools. We find ourselves knee-deep in banana and mango chunks.

Mango chunks

Solutions to the sign problem remain elusive. Theorists keep trying to mitigate the problem, though. Aram, Elizabeth, and others are improving calculations of properties of sign-problem-free systems. One scientist-in-the-making has achieved a breakthrough: Aram’s daughter now rotates her hands upon finishing meals and when she wants to leave her car seat or stroller.

One sign problem down; one to go.


With gratitude to Aram’s family for its hospitality and to Elizabeth Crosson for sharing her expertise.

1For experts: A local Hamiltonian is stoquastic relative to the computational basis if each local term is represented, relative to the computational basis, by a matrix whose off-diagonal entries are real and nonpositive.

July 29, 2017

Jordan EllenbergShow report: Camp Friends and Omni at the Terrace

Beautiful weather last night so I decided, why not, go to the Terrace for the free show WUD put on:  Camp Friends (Madison) and Omni (Atlanta).

Missed most of Camp Friends, who were billed as experimental but in fact played genial, not-real-tight college indie.  Singer took his shirt off.

Omni, though — this is the real thing.  Everyone says it sounds like 1981 (specifically:  1981), and they’re right, but it rather wonderfully doesn’t sound like any particular thing in 1981.  There’s the herky-jerky-shoutiness and clipped chords (but on some songs that sounds like Devo and on others like Joe Jackson) and the jazz chords high on the neck (the Fall?  The Police?) and weird little technical guitar runs that sound like Genesis learning to play new wave guitar on Abacab and arpeggios that sound like Peter Buck learning to play guitar in the first place (these guys are from Georgia, after all.)  What I kind of love about young people is this.  To me, all these sounds are separate styles; to a kid picking up these records now, they’re just 1981, they’re all material to work from, you can put them together and something kind of great comes out of it.

You see a lot of bands with a frontman but not that many which, like Omni, have a frontman and a backman.  Philip Frobos sings and plays bass and mugs and talks to the audience.  Frankie Broyles, the guitar player, is a slight guy who looks like a librarian and stands still and almost expressionless while he plays his tight little runs.  Then, every once in a while, he unleashes an absolute storm of noise.  But still doesn’t grimace, still doesn’t move!  Amazing.  Penn and Teller is the only analogue I can think of.

Omni plays “Jungle Jenny,” live in Atlanta:

And here’s “Wire,” to give a sense of their more-dance-less-rock side:


Both songs are on Omni’s debut album, Deluxe, listenable at Bandcamp.

Best show I’ve seen at the Terrace in a long time.  Good job, WUD.




July 27, 2017

John PreskillEntropy Avengers

As you already know if you read my rare (but highly refined!) blog samples, I have spent a big chunk of my professorial career teaching statistical mechanics. And if you teach statistical mechanics, there is pretty much one thing you obsess about: entropy.

So you can imagine my joy of finally seeing a fully anti-entropic superhero appearing on my facebook account (physics enthusiasts out there – the project is seeking support on Kickstarter):

Apart from the plug for Assa Auerbach’s project (which, for full disclosure, I have just supported), I would like to use this as an excuse to share my lessons about entropy. With the same level of seriousness. Here they are, in order of increasing entropy.

1. Cost of entropy. Entropy is always marketed as a very palpable thing. Disorder. In class, however, it is calculated via an enumeration of the ‘microscopic states of the system’. For an atomic gas I know how to calculate the entropy (throw me at the blackboard in the middle of the night, no problem. Bosons or Fermions – anytime!) But how can the concept be applied to our practical existence? I have a proposal:

Quantify entropy by the cost (in $’s) of cleaning up the mess!

Examples can be found at all scales. For anything household-related, we should use the H_k constant. H_k=$25/hour for my housekeeper. You break a glass – it takes about 10 minutes to clean. That puts the entropy of the wreckage at $4.17. Having a birthday party takes about 2 hours to clean up: $50 entropy.

Another insight which my combined experience as professor and parent has produced:

2. Conjecture: Babies are maximally efficient topological entropy machines. If you raised a 1 year-old you know exactly what I mean. You can at least guess why maximum efficiency. But why topological? A baby sauntering through the house leaves a string of destruction behind itself. The baby is a mess-creation string-operator! If you start lagging behind, doom will emerge – hence the maximum efficiency. By the way, the only strategy viable is to undo the damage as it happens. But this blog post is about entropy, not about parenting.

In fact, this allows us to establish a conversion of entropy measured in k_B units, to its, clearly more natural, measure in dollar units. A baby eats about 1000kCal/day=4200kJ/day. To fully deal with the consequences, we need a housekeeper to visit about once a week. 4200kJ/day times 7 days=29400 kJoules. These are consumed at T=300K. So an entropy of S=Q/T~100J/K, which is also S~6 \times 10^{24} (Q/k_B T) in dimensionless units, converts to S~$120, which is the cost of our weekly housekeeper visit. This gives a value of $ 10^{-23} per entropy of a two-level system. Quite a reasonable bang for the buck, don’t you think?

3. My conjecture (2) fails. The second law of thermodynamics is an inequality. Entropy \geq Q/T. Why does the conjecture fail? Babies are not ‘maximal’. Consider presidents. Consider the mess that the government can make. It is at the scale of trillions per year. $ 10^{12}. Using the rigorous conversion rule established above, this corresponds to 10^{35} two-level systems. Which happens to quite precisely match the combined number of electrons present in the human bodies of all our military personnel. But the mess, however, is created by very few individuals.

Given the large amounts of taxpayer money we dish out to deal with entropy in the world, Auerbach’s book is bound to make a big impact. In fact, maybe Max the demon would one day be nominated for the national medal of freedom, or at least be inducted into the National Academy of Sciences.

Tim GowersAnother journal flips

There is widespread (even if not universal) agreement that something is deeply wrong with the current system of academic publishing. The basic point, which has been made innumerable times by innumerable people, is that the really hard parts — the writing of papers, and the peer review and selection of the ones to publish — are done voluntarily by academics, and modern technology makes things like typesetting and dissemination extremely cheap. And yet publishers are making more money than ever before. They do this by insisting that we give them ownership of the content we produce (though many journals will publish papers even if you strike out the part of the contract that hands them this ownership — these days I never agree to give copyright to a publisher, and I urge you not to either), and by bundling their journals together so that libraries are forced into an all-or-nothing decision. (As another aside, I also urge libraries to look closely at what is happening in Germany, where they have gone for the “nothing” option with Elsevier and the world has not come to an end.)

What can be done about this? There are many actions, none of which are likely to be sufficient to bring about major change on their own, but which in combination will help to get us to a tipping point. In no particular order, here are some of them.

  1. Create new journals that operate much more cheaply and wait for them to become established.
  2. Persuade libraries not to agree to Big Deals with the big publishers.
  3. Refuse to publish with, write for, or edit for, the big publishers.
  4. Make sure all your work is freely available online.
  5. Encourage journals that are supporting the big publishers to leave those publishers and set up in a cheaper and fairer way.

Not all of these are easy things to do, but I’m delighted to report that a small group I belong to, set up by Mark Wilson, has, after approaching a large number of maths journals, found one that was ready to “flip”: the Journal of Algebraic Combinatorics has just announced that it will be leaving Springer. Or if you want to be more pedantic about it, a new journal will be starting, called Algebraic Combinatorics and published by The Mersenne Centre for Open Scientific Publishing, and almost all the editors of the Journal of Algebraic Combinatorics will resign from that journal and become editors of the new one, which will adhere to Fair Open Access Principles.

If you want to see change, then you should from now on regard Algebraic Combinatorics as the true continuation of the Journal of Algebraic Combinatorics, and the Journal of Algebraic Combinatorics as a zombie journal that happens to have a name that coincides with a former real journal. And of course, that means that if you are an algebraic combinatorialist with a paper that would have been suitable for the Journal of Algebraic Combinatorics, you should understand that the reputation of the Journal of Algebraic Combinatorics is being transferred, along with the editorial board, to Algebraic Combinatorics, and you should therefore submit it to Algebraic Combinatorics. This has worked with previous flips: the zombie journal rarely thrives afterwards and in some notable cases has ceased to publish after a couple of years or so.

The words of one of the editors of the Journal of Algebraic Combinatorics, Hugh Thomas, are particularly telling, especially the first sentence: “There wasn’t a particular crisis. It has been becoming more and more clear that commercial journal publishers are charging high subscription fees and high Article Processing Charges (APCs), profiting from the volunteer labour of the academic community, and adding little value. It is getting easier and easier to automate the things that they once took care of. The actual printing and distribution of paper copies is also much less important than it has been in the past; this is something which we have decided we can do without.”

I mentioned earlier that we approached many journals. Although it is very exciting that one journal is flipping, I must also admit to disappointment at how low our strike rate has been so far. However, the words “so far” are important: many members of editorial boards were very sympathetic with our aims, and some journals were adopting a wait-and-see attitude, so if the flip of JACo is successful, we hope that it will encourage other journals. I should say that we weren’t just saying, “Why don’t you flip?” but we were also offering support, including financial support. The current situation is that we can almost certainly finance journals that are ready to flip to an “ultra-cheap” model (using a platform that charges either nothing or a very small fee per submission) and help with administrative support, and are working on financial support for more expensive models, but still far cheaper than the commercial publishers, where more elaborate services are offered.

Understandably, the main editors tended to be a lot more cautious on average than the bulk of the editorial boards. I think many of them were worried that they might accidentally destroy their journals if they flipped them, and in the case of journals with long traditions, this is not something one would want to be remembered for. So again, the more we can support Algebraic Combinatorics, the more likely it is that this caution will be reduced and other journals will consider following. (If you are an editor of a journal we have not approached, please do get in touch to discuss what the possibilities are — we have put a lot of thought into it.)

Another argument put forward by some editors is that to flip a journal risks damaging the reputation of the old version of the journal, and therefore, indirectly, the reputation of the papers published in it, some of which are by early-career researchers. So they did not want to flip in order to avoid damaging the careers of young mathematicians. If you are a young mathematician and would like to comment on whether you would be bothered by a journal flipping after you had published in it, we would be very interested to hear what you have to say.

Against that background I’d like to congratulate the editors of the Journal of Algebraic Combinatorics for their courage and for the work they have put into this. (But that word “work” should not put off other editors: one of the aims of our small group was to provide support and expertise, including from Johann Rooryck, the editor of the Elsevier journal Lingua, which flipped to become Glossa, in order to make the transition as easy as possible.) I’d also like to make clear, to avoid any misunderstanding that might arise, that although I’ve been involved in a lot of discussion with Mark Wilson’s group and wrote to many editors of other journals, my role in this particular flip has been a minor one.

And finally, let me repeat the main message of this post: please support the newly flipped journal, since the more successful it is, the greater the chance that other journals will follow, and the greater the chance that we will be able to move to a more sensible academic publishing system.

Tommaso DorigoAn ATLAS 240 GeV Higgs-Like Fluctuation Meets Predictions From Independent Researcher

A new analysis by the ATLAS collaboration, based of the data collected in 13 TeV proton-proton collisions delivered by the LHC in 2016, finds an excess of X-->4 lepton events at a mass of 240 GeV, with a local significance of 3.6 standard deviations. The search, which targeted objects of similar phenomenology to the 125 GeV Higgs boson discovered in 2012, is published in ATLAS CONF-2017-058. Besides the 240 GeV excess, another one at 700 GeV is found, with the same statistical significance.

read more

July 26, 2017

Tommaso DorigoRevenge Of The Slimeballs - Part 2

This is the second part of a section taken from Chapter 3 of the book "Anomaly! Collider Physics and the Quest for New Phenomena at Fermilab". The chapter recounts the pioneering measurement of the Z mass by the CDF detector, and the competition with SLAC during the summer of 1989. The title of the post is the same as the one of chapter 3, and it refers to the way some SLAC physicists called their Fermilab colleagues, whose hadron collider was to their eyes obviously inferior to the electron-positron linear collider.

read more

Terence TaoOn the universality of the incompressible Euler equation on compact manifolds

I’ve just uploaded to the arXiv my paper “On the universality of the incompressible Euler equation on compact manifolds“, submitted to Discrete and Continuous Dynamical Systems. This is a variant of my recent paper on the universality of potential well dynamics, but instead of trying to embed dynamical systems into a potential well {\partial_{tt} u = -\nabla V(u)}, here we try to embed dynamical systems into the incompressible Euler equations

\displaystyle  \partial_t u + \nabla_u u = - \mathrm{grad}_g p \ \ \ \ \ (1)

\displaystyle  \mathrm{div}_g u = 0

on a Riemannian manifold {(M,g)}. (One is particularly interested in the case of flat manifolds {M}, particularly {{\bf R}^3} or {({\bf R}/{\bf Z})^3}, but for the main result of this paper it is essential that one is permitted to consider curved manifolds.) This system, first studied by Ebin and Marsden, is the natural generalisation of the usual incompressible Euler equations to curved space; it can be viewed as the formal geodesic flow equation on the infinite-dimensional manifold of volume-preserving diffeomorphisms on {M} (see this previous post for a discussion of this in the flat space case).

The Euler equations can be viewed as a nonlinear equation in which the nonlinearity is a quadratic function of the velocity field {u}. It is thus natural to compare the Euler equations with quadratic ODE of the form

\displaystyle  \partial_t y = B(y,y) \ \ \ \ \ (2)

where {y: {\bf R} \rightarrow {\bf R}^n} is the unknown solution, and {B: {\bf R}^n \times {\bf R}^n \rightarrow {\bf R}^n} is a bilinear map, which we may assume without loss of generality to be symmetric. One can ask whether such an ODE may be linearly embedded into the Euler equations on some Riemannian manifold {(M,g)}, which means that there is an injective linear map {U: {\bf R}^n \rightarrow \Gamma(TM)} from {{\bf R}^n} to smooth vector fields on {M}, as well as a bilinear map {P: {\bf R}^n \times {\bf R}^n \rightarrow C^\infty(M)} to smooth scalar fields on {M}, such that the map {y \mapsto (U(y), P(y,y))} takes solutions to (2) to solutions to (1), or equivalently that

\displaystyle  U(B(y,y)) + \nabla_{U(y)} U(y) = - \mathrm{grad}_g P(y,y)

\displaystyle  \mathrm{div}_g U(y) = 0

for all {y \in {\bf R}^n}.

For simplicity let us restrict {M} to be compact. There is an obvious necessary condition for this embeddability to occur, which comes from energy conservation law for the Euler equations; unpacking everything, this implies that the bilinear form {B} in (2) has to obey a cancellation condition

\displaystyle  \langle B(y,y), y \rangle = 0 \ \ \ \ \ (3)

for some positive definite inner product {\langle, \rangle: {\bf R}^n \times {\bf R}^n \rightarrow {\bf R}} on {{\bf R}^n}. The main result of the paper is the converse to this statement: if {B} is a symmetric bilinear form obeying a cancellation condition (3), then it is possible to embed the equations (2) into the Euler equations (1) on some Riemannian manifold {(M,g)}; the catch is that this manifold will depend on the form {B} and on the dimension {n} (in fact in the construction I have, {M} is given explicitly as {SO(n) \times ({\bf R}/{\bf Z})^{n+1}}, with a funny metric on it that depends on {B}).

As a consequence, any finite dimensional portion of the usual “dyadic shell models” used as simplified toy models of the Euler equation, can actually be embedded into a genuine Euler equation, albeit on a high-dimensional and curved manifold. This includes portions of the self-similar “machine” I used in a previous paper to establish finite time blowup for an averaged version of the Navier-Stokes (or Euler) equations. Unfortunately, the result in this paper does not apply to infinite-dimensional ODE, so I cannot yet establish finite time blowup for the Euler equations on a (well-chosen) manifold. It does not seem so far beyond the realm of possibility, though, that this could be done in the relatively near future. In particular, the result here suggests that one could construct something resembling a universal Turing machine within an Euler flow on a manifold, which was one ingredient I would need to engineer such a finite time blowup.

The proof of the main theorem proceeds by an “elimination of variables” strategy that was used in some of my previous papers in this area, though in this particular case the Nash embedding theorem (or variants thereof) are not required. The first step is to lessen the dependence on the metric {g} by partially reformulating the Euler equations (1) in terms of the covelocity {g \cdot u} (which is a {1}-form) instead of the velocity {u}. Using the freedom to modify the dimension of the underlying manifold {M}, one can also decouple the metric {g} from the volume form that is used to obtain the divergence-free condition. At this point the metric can be eliminated, with a certain positive definiteness condition between the velocity and covelocity taking its place. After a substantial amount of trial and error (motivated by some “two-and-a-half-dimensional” reductions of the three-dimensional Euler equations, and also by playing around with a number of variants of the classic “separation of variables” strategy), I eventually found an ansatz for the velocity and covelocity that automatically solved most of the components of the Euler equations (as well as most of the positive definiteness requirements), as long as one could find a number of scalar fields that obeyed a certain nonlinear system of transport equations, and also obeyed a positive definiteness condition. Here I was stuck for a bit because the system I ended up with was overdetermined – more equations than unknowns. After trying a number of special cases I eventually found a solution to the transport system on the sphere, except that the scalar functions sometimes degenerated and so the positive definiteness property I wanted was only obeyed with positive semi-definiteness. I tried for some time to perturb this example into a strictly positive definite solution before eventually working out that this was not possible. Finally I had the brainwave to lift the solution from the sphere to an even more symmetric space, and this quickly led to the final solution of the problem, using the special orthogonal group rather than the sphere as the underlying domain. The solution ended up being rather simple in form, but it is still somewhat miraculous to me that it exists at all; in retrospect, given the overdetermined nature of the problem, relying on a large amount of symmetry to cut down the number of equations was basically the only hope.

Filed under: math.AP, math.DS, math.MG, paper Tagged: Euler equations, universality

July 25, 2017

Tim GowersIntransitive dice VI: sketch proof of the main conjecture for the balanced-sequences model

I have now completed a draft of a write-up of a proof of the following statement. Recall that a random n-sided die (in the balanced-sequences model) is a sequence of length n of integers between 1 and n that add up to n(n+1)/2, chosen uniformly from all such sequences. A die (a_1,\dots,a_n) beats a die (b_1,\dots,b_n) if the number of pairs (i,j) such that a_i>b_j exceeds the number of pairs (i,j) such that a_i<b_j. If the two numbers are the same, we say that A ties with B.

Theorem. Let A,B and C be random n-sided dice. Then the probability that A beats C given that A beats B and B beats C is \frac 12+o(1).

In this post I want to give a fairly detailed sketch of the proof, which will I hope make it clearer what is going on in the write-up.

The first step is to show that the theorem is equivalent to the following statement.

Theorem. Let A be a random n-sided die. Then with probability 1-o(1), the proportion of n-sided dice that A beats is \frac 12+o(1).

We had two proofs of this statement in earlier posts and comments on this blog. In the write-up I have used a very nice short proof supplied by Luke Pebody. There is no need to repeat it here, since there isn’t much to say that will make it any easier to understand than it already is. I will, however, mention once again an example that illustrates quite well what this statement does and doesn’t say. The example is of a tournament (that is, complete graph where every edge is given a direction) where every vertex beats half the other vertices (meaning that half the edges at the vertex go in and half go out) but the tournament does not look at all random. One just takes an odd integer n and puts arrows out from x to x+y mod n for every y\in\{1,2,\dots,(n-1)/2\}, and arrows into x for every y\in\{(n+1)/2,\dots,n-1\}. It is not hard to check that the probability that there is an arrow from x to z given that there are arrows from x to y and y to z is approximately 1/2, and this turns out to be a general phenomenon.

So how do we prove that almost all n-sided dice beat approximately half the other n-sided dice?

The first step is to recast the problem as one about sums of independent random variables. Let [n] stand for \{1,2,\dots,n\} as usual. Given a sequence A=(a_1,\dots,a_n)\in[n]^n we define a function f_A:[n]\to[n] by setting f_A(j) to be the number of i such that a_i<j plus half the number of i such that a_i=j. We also define g_A(j) to be f_A(j)-(j-1/2). It is not hard to verify that A beats B if \sum_jg_A(b_j)<0, ties with B if \sum_jg_A(b_j)=0, and loses to B if \sum_jg_A(b_j)>0.

So our question now becomes the following. Suppose we choose a random sequence (b_1,\dots,b_n) with the property that \sum_jb_j=n(n+1)/2. What is the probability that \sum_jg_A(b_j)>0? (Of course, the answer depends on A, and most of the work of the proof comes in showing that a “typical” A has properties that ensure that the probability is about 1/2.)

It is convenient to rephrase the problem slightly, replacing b_j by b_j-(n+1)/2. We can then ask it as follows. Suppose we choose a sequence (V_1,\dots,V_n) of n elements of the set \{-(n-1)/2,-(n-1)/2+1,\dots,(n-1)/2\}, where the terms of the sequence are independent and uniformly distributed. For each j let U_j=g_A(V_j). What is the probability that \sum_jU_j>0 given that \sum_jV_j=0?

This is a question about the distribution of \sum_j(U_j,V_j), where the (U_j,V_j) are i.i.d. random variables taking values in \mathbb Z^2 (at least if n is odd — a small modification is needed if n is even). Everything we know about probability would lead us to expect that this distribution is approximately Gaussian, and since it has mean (0,0), it ought to be the case that if we sum up the probabilities that \sum_j(U_j,V_j)=(x,0) over positive x, we should get roughly the same as if we sum them up over negative x. Also, it is highly plausible that the probability of getting (0,0) will be a lot smaller than either of these two sums.

So there we have a heuristic argument for why the second theorem, and hence the first, ought to be true.

There are several theorems in the literature that initially seemed as though they should be helpful. And indeed they were helpful, but we were unable to apply them directly, and had instead to develop our own modifications of their proofs.

The obvious theorem to mention is the central limit theorem. But this is not strong enough for two reasons. The first is that it tells you about the probability that a sum of random variables will lie in some rectangular region of \mathbb R^2 of size comparable to the standard deviation. It will not tell you the probability of belonging to some subset of the y-axis (even for discrete random variables). Another problem is that the central limit on its own does not give information about the rate of convergence to a Gaussian, whereas here we require one.

The second problem is dealt with for many applications by the Berry-Esseen theorem, but not the first.

The first problem is dealt with for many applications by local central limit theorems, about which Terence Tao has blogged in the past. These tell you not just about the probability of landing in a region, but about the probability of actually equalling some given value, with estimates that are precise enough to give, in many situations, the kind of information that we seek here.

What we did not find, however, was precisely the theorem we were looking for: a statement that would be local and 2-dimensional and would give information about the rate of convergence that was sufficiently strong that we would be able to obtain good enough convergence after only n steps. (I use the word “step” here because we can think of a sum of n independent copies of a 2D random variable as an n-step random walk.) It was not even clear in advance what such a theorem should say, since we did not know what properties we would be able to prove about the random variables (U_i,V_i) when A was “typical”. That is, we knew that not every A worked, so the structure of the proof (probably) had to be as follows.

1. Prove that A has certain properties with probability 1-o(1).

2. Using these properties, deduce that the sum \sum_{i=1}^n(U_i,V_i) converges very well after n steps to a Gaussian.

3. Conclude that the heuristic argument is indeed correct.

The key properties that A needed to have were the following two. First, there needed to be a bound on the higher moments of U. This we achieved in a slightly wasteful way — but the cost was a log factor that we could afford — by arguing that with high probability no value of g_A(j) has magnitude greater than 6\sqrt{n\log n}. To prove this the steps were as follows.

  1. Let A be a random element of [n]^n. Then the probability that there exists j with g_A(j)\geq 6\sqrt{n\log n} is at most n^{-k} (for some k such as 10).
  2. The probability that \sum_ia_i=n(n+1)/2 is at least cn^{-3/2} for some absolute constant c>0.
  3. It follows that if A is a random n-sided die, then with probability 1-o(1) we have |g_A(j)|\leq 6\sqrt{n\log n} for every j.

The proofs of the first two statements are standard probabilistic estimates about sums of independent random variables.

The second property that A needed to have is more difficult to obtain. There is a standard Fourier-analytic approach to proving central limit theorems, and in order to get good convergence it turns out that what one wants is for a certain Fourier transform to be sufficiently well bounded away from 1. More precisely, we define the characteristic function of the random variable (U,V) to be

\hat f(\alpha,\beta)=\mathbb E e(\alpha U+\beta V)=\sum_{x,y}f(x,y)e(\alpha x+\beta y),

where e(x) is shorthand for \exp(2\pi ix), f(x,y)=\mathbb P[(U,V)=(x,y)], and \alpha and \beta range over \mathbb T=\mathbb R/\mathbb Z.

I’ll come later to why it is good for \hat f(\alpha,\beta) not to be too close to 1. But for now I want to concentrate on how one proves a statement like this, since that is perhaps the least standard part of the argument.

To get an idea, let us first think what it would take for |\hat f(\alpha,\beta)| to be very close to 1. This condition basically tells us that \alpha U+\beta V is highly concentrated mod 1: indeed, if \alpha U+\beta V is highly concentrated, then e(\alpha U+\beta V) takes approximately the same value almost all the time, so the average is roughly equal to that value, which has modulus 1; conversely, if \alpha U+\beta V is not highly concentrated mod 1, then there is plenty of cancellation between the different values of e(\alpha U+\beta V) and the result is that the average has modulus appreciably smaller than 1.

So the task is to prove that the values of \alpha U+\beta V are reasonably well spread about mod 1. Note that this is saying that the values of \alpha g_A(j)+\beta(j-(n+1)/2) are reasonably spread about.

The way we prove this is roughly as follows. Let \alpha>0, let m be of order of magnitude \alpha^{-2}, and consider the values of g_A at the four points j, j+m, j+2m and j+3m. Then a typical order of magnitude of g_A(j)-g_A(j+m) is around \sqrt m, and one can prove without too much trouble (here the Berry-Esseen theorem was helpful to keep the proof short) that the probability that

|g_A(j)-g_A(j+m)-g_A(j+2m)+g_A(3m)|\geq c\sqrt m

is at least c, for some positive absolute constant c. It follows by Markov’s inequality that with positive probability one has the above inequality for many values of j.

That’s not quite good enough, since we want a probability that’s very close to 1. This we obtain by chopping up [n] into intervals of length 4m and applying the above argument in each interval. (While writing this I’m coming to think that I could just as easily have gone for progressions of length 3, not that it matters much.) Then in each interval there is a reasonable probability of getting the above inequality to hold many times, from which one can prove that with very high probability it holds many times.

But since m is of order \alpha^{-2}, \alpha\sqrt m is of order 1, which gives that the values e(g_A(j)), e(g_A(j+m)), e(g_A(j+2m), e(g_A(j+3m)) are far from constant whenever the above inequality holds. So by averaging we end up with a good upper bound for |\hat f(\alpha,\beta)|.

The alert reader will have noticed that if \alpha\ll n^{-1/2}, then the above argument doesn’t work, because we can’t choose m to be bigger than n. In that case, however, we just do the best we can: we choose m to be of order n/\log n, the logarithmic factor being there because we need to operate in many different intervals in order to get the probability to be high. We will get many quadruples where

\alpha|g_A(j)-g_A(j+m)-g_A(j+2m)+g_A(3m)|\geq c\alpha\sqrt m=c'\alpha\sqrt{n/\log n},

and this translates into a lower bound for 1-|\hat f(\alpha,\beta)| of order \alpha^2n/\log n, basically because 1-\cos\theta has order \theta^2 for small \theta. This is a good bound for us as long as we can use it to prove that |\hat f(\alpha,\beta)|^n is bounded above by a large negative power of n. For that we need \alpha^2n/\log n to be at least C\log n/n (since (1-C\log n/n)^n is about \exp(-C\log n)=n^{-C}), so we are in good shape provided that \alpha\gg\log n/n.

The alert reader will also have noticed that the probabilities for different intervals are not independent: for example, if some f_A(j) is equal to n, then beyond that g_A(j) depends linearly on j. However, except when j is very large, this is extremely unlikely, and it is basically the only thing that can go wrong. To make this rigorous we formulated a concentration inequality that states, roughly speaking, that if you have a bunch of k events, and almost always (that is, always, unless some very unlikely event occurs) the probability that the ith event holds given that all the previous events hold is at least c, then the probability that fewer than ck/2 of the events hold is exponentially small in k. The proof of the concentration inequality is a standard exponential-moment argument, with a small extra step to show that the low-probability events don’t mess things up too much.

Incidentally, the idea of splitting up the interval in this way came from an answer by Serguei Popov to a Mathoverflow question I asked, when I got slightly stuck trying to prove a lower bound for the second moment of U. I eventually didn’t use that bound, but the interval-splitting idea helped for the bound for the Fourier coefficient as well.

So in this way we prove that |\hat f(\alpha,\beta)|^n is very small if |\alpha|\gg\log n/n. A simpler argument of a similar flavour shows that |\hat f(\alpha,\beta)|^n is also very small if |\alpha| is smaller than this and |\beta|\gg n^{-3/2}.

Now let us return to the question of why we might like |\hat f(\alpha,\beta)|^n to be small. It follows from the inversion and convolution formulae in Fourier analysis. The convolution formula tells us that the characteristic function of the sum of the U_i (which are independent and each have characteristic function \hat f) is (\hat f)^n. And then the inversion formula tells us that

\mathbb P[(\sum_iU_i,\sum_iV_i)=(x,y)]=\int_{(\alpha,\beta)\in\mathbb T^2}\hat f(\alpha,\beta)^ne(-\alpha x-\beta y)\mathop{d\alpha}\mathop{d\beta}

What we have proved can be used to show that the contribution to the integral on the right-hand side from those pairs (\alpha,\beta) that lie outside a small rectangle (of width n^{-1} in the \alpha direction and n^{-3/2} in the \beta direction, up to log factors) is negligible.

All the above is true provided the random n-sided die A satisfies two properties (the bound on \|U\|_\infty and the bound on |\hat f(\alpha,\beta)|), which it does with probability 1-o(1).

We now take a die A with these properties and turn our attention to what happens inside this box. First, it is a standard fact about characteristic functions that their derivatives tell us about moments. Indeed,

\frac{\partial^{r+s}}{\partial^r\alpha\partial^s\beta}\mathbb E e(\alpha U+\beta V)=(2\pi i)^{r+s}\mathbb E U^rV^s e(\alpha U+\beta V),

and when \alpha=\beta=0 this is \mathbb E U^rV^s. It therefore follows from the two-dimensional version of Taylor’s theorem that

\hat f(\alpha,\beta)=1-2\pi^2(\alpha^2\mathbb EU^2+2\alpha\beta\mathbb EUV+\beta^2\mathbb EV^2)

plus a remainder term R(\alpha,\beta) that can be bounded above by a constant times (|\alpha|\|U\|_\infty+|\beta|\|V\|_\infty)^3.

Writing Q(\alpha,\beta) for 2\pi^2(\alpha^2\mathbb EU^2+2\alpha\beta\mathbb EUV+\beta^2\mathbb EV^2) we have that Q is a positive semidefinite quadratic form in \alpha and \beta. (In fact, it turns out to be positive definite.) Provided R(\alpha,\beta) is small enough, replacing it by zero does not have much effect on \hat f(\alpha,\beta)^n, and provided Q(\alpha,\beta)^2 is small enough, (1-Q(\alpha,\beta))^n is well approximated by \exp(-Q(\alpha,\beta)).

It turns out, crucially, that the approximations just described are valid in a box that is much bigger than the box inside which \hat f(\alpha,\beta) has a chance of not being small. That implies that the Gaussian decays quickly (and is why we know that Q is positive definite).

There is a bit of back-of-envelope calculation needed to check this, but the upshot is that the probability that (\sum_iU_i,\sum_iV_i)=(x,y) is very well approximated, at least when x and y aren’t too big, by a formula of the form

G(x,y)=\int\exp(-Q(\alpha,\beta))e(-\alpha x-\beta y)\mathop{d\alpha}\mathop{d\beta}.

But this is the formula for the Fourier transform of a Gaussian (at least if we let \alpha and \beta range over \mathbb R^2, which makes very little difference to the integral because the Gaussian decays so quickly), so it is the restriction to \mathbb Z^2 of a Gaussian, just as we wanted.

When we sum over infinitely many values of x and y, uniform estimates are not good enough, but we can deal with that very directly by using simple measure concentration estimates to prove that the probability that (\sum_iU_i,\sum_iV_i)=(x,y) is very small outside a not too large box.

That completes the sketch of the main ideas that go into showing that the heuristic argument is indeed correct.

Any comments about the current draft would be very welcome, and if anyone feels like working on it directly rather than through me, that is certainly a possibility — just let me know. I will try to post soon on the following questions, since it would be very nice to be able to add answers to them.

1. Is the more general quasirandomness conjecture false, as the experimental evidence suggests? (It is equivalent to the statement that if A and B are two random n-sided dice, then with probability 1-o(1), the four possibilities for whether another die beats A and whether it beats B each have probability \frac 14+o(1).)

2. What happens in the multiset model? Can the above method of proof be adapted to this case?

3. The experimental evidence suggests that transitivity almost always occurs if we pick purely random sequences from [n]^n. Can we prove this rigorously? (I think I basically have a proof of this, by showing that whether or not A beats B almost always depends on whether A has a bigger sum than B. I’ll try to find time reasonably soon to add this to the draft.)

Of course, other suggestions for follow-up questions will be very welcome, as will ideas about the first two questions above.

July 23, 2017

Jordan EllenbergHow many books do I read in a year?


2006: 27

2007: 19

2008: 22

2009: 30

2010: 23

2011: 19

2012: 27

2013: 35

2014: 31

2015: 38

2016: 29

Don’t quite know what to make of this.  I’m sort of surprised there’s so much variation!  I’d have thought I’d have read less when my kids were infants, or when I was writing my own book, but it seems pretty random.   I do see that I’ve been clearly reading more books the last few years than I did in 2012 and before.

Lists, as always, are here (2011 on) and here (2006-2010.)


July 22, 2017

Scott AaronsonIs “information is physical” contentful?

“Information is physical.”

This slogan seems to have originated around 1991 with Rolf Landauer.  It’s ricocheted around quantum information for the entire time I’ve been in the field, incanted in funding agency reports and popular articles and at the beginnings and ends of talks.

But what the hell does it mean?

There are many things it’s taken to mean, in my experience, that don’t make a lot of sense when you think about them—or else they’re vacuously true, or purely a matter of perspective, or not faithful readings of the slogan’s words.

For example, some people seem to use the slogan to mean something more like its converse: “physics is informational.”  That is, the laws of physics are ultimately not about mass or energy or pressure, but about bits and computations on them.  As I’ve often said, my problem with that view is less its audacity than its timidity!  It’s like, what would the universe have to do in order not to be informational in this sense?  “Information” is just a name we give to whatever picks out one element from a set of possibilities, with the “amount” of information given by the log of the set’s cardinality (and with suitable generalizations to infinite sets, nonuniform probability distributions, yadda yadda).  So, as long as the laws of physics take the form of telling us that some observations or configurations of the world are possible and others are not, or of giving us probabilities for each configuration, no duh they’re about information!

Other people use “information is physical” to pour scorn on the idea that “information” could mean anything without some actual physical instantiation of the abstract 0’s and 1’s, such as voltage differences in a loop of wire.  Here I certainly agree with the tautology that in order to exist physically—that is, be embodied in the physical world—a piece of information (like a song, video, or computer program) does need to be embodied in the physical world.  But my inner Platonist slumps in his armchair when people go on to assert that, for example, it’s meaningless to discuss the first prime number larger than 1010^125, because according to post-1998 cosmology, one couldn’t fit its digits inside the observable universe.

If the cosmologists revise their models next week, will this prime suddenly burst into existence, with all the mathematical properties that one could’ve predicted for it on general grounds—only to fade back into the netherworld if the cosmologists revise their models again?  Why would anyone want to use language in such a tortured way?

Yes, brains, computers, yellow books, and so on that encode mathematical knowledge comprise only a tiny sliver of the physical world.  But it’s equally true that the physical world we observe comprises only a tiny sliver of mathematical possibility-space.

Still other people use “information is physical” simply to express their enthusiasm for the modern merger of physical and information sciences, as exemplified by quantum computing.  Far be it from me to temper that enthusiasm: rock on, dudes!

Yet others use “information is physical” to mean that the rules governing information processing and transmission in the physical world aren’t knowable a priori, but can only be learned from physics.  This is clearest in the case of quantum information, which has its own internal logic that generalizes the logic of classical information.  But in some sense, we didn’t need quantum mechanics to tell us this!  Of course the laws of physics have ultimate jurisdiction over whatever occurs in the physical world, information processing included.

My biggest beef, with all these unpackings of the “information is physical” slogan, is that none of them really engage with any of the deep truths that we’ve learned about physics.  That is, we could’ve had more-or-less the same debates about any of them, even in a hypothetical world where the laws of physics were completely different.

So then what should we mean by “information is physical”?  In the rest of this post, I’d like to propose an answer to that question.

We get closer to the meat of the slogan if we consider some actual physical phenomena, say in quantum mechanics.  The double-slit experiment will do fine.

Recall: you shoot photons, one by one, at a screen with two slits, then examine the probability distribution over where the photons end up on a second screen.  You ask: does that distribution contain alternating “light” and “dark” regions, the signature of interference between positive and negative amplitudes?  And the answer, predicted by the math and confirmed by experiment, is: yes, but only if the information about which slit the photon went through failed to get recorded anywhere else in the universe, other than the photon location itself.

Here a skeptic interjects: but that has to be wrong!  The criterion for where a physical particle lands on a physical screen can’t possibly depend on anything as airy as whether “information” got “recorded” or not.  For what counts as “information,” anyway?  As an extreme example: what if God, unbeknownst to us mortals, took divine note of which slit the photon went through?  Would that destroy the interference pattern?  If so, then every time we do the experiment, are we collecting data about the existence or nonexistence of an all-knowing God?

It seems to me that the answer is: insofar as the mind of God can be modeled as a tensor factor in Hilbert space, yes, we are.  And crucially, if quantum mechanics is universally true, then the mind of God would have to be such a tensor factor, in order for its state to play any role in the prediction of observed phenomena.

To say this another way: it’s obvious and unexceptionable that, by observing a physical system, you can often learn something about what information must be in it.  For example, you need never have heard of DNA to deduce that chickens must somehow contain information about making more chickens.  What’s much more surprising is that, in quantum mechanics, you can often deduce things about what information can’t be present, anywhere in the physical world—because if such information existed, even a billion light-years away, it would necessarily have a physical effect that you don’t see.

Another famous example here concerns identical particles.  You may have heard the slogan that “if you’ve seen one electron, you’ve seen them all”: that is, apart from position, momentum, and spin, every two electrons have exactly the same mass, same charge, same every other property, including even any properties yet to be discovered.  Again the skeptic interjects: but that has to be wrong.  Logically, you could only ever confirm that two electrons were different, by observing a difference in their behavior.  Even if the electrons had behaved identically for a billion years, you couldn’t rule out the possibility that they were actually different, for example because of tiny nametags (“Hi, I’m Emily the Electron!” “Hi, I’m Ernie!”) that had no effect on any experiment you’d thought to perform, but were visible to God.

You can probably guess where this is going.  Quantum mechanics says that, no, you can verify that two particles are perfectly identical by doing an experiment where you swap them and see what happens.  If the particles are identical in all respects, then you’ll see quantum interference between the swapped and un-swapped states.  If they aren’t, you won’t.  The kind of interference you’ll see is different for fermions (like electrons) than for bosons (like photons), but the basic principle is the same in both cases.  Once again, quantum mechanics lets you verify that a specific type of information—in this case, information that distinguishes one particle from another—was not present anywhere in the physical world, because if it were, it would’ve destroyed an interference effect that you in fact saw.

This, I think, already provides a meatier sense in which “information is physical” than any of the senses discussed previously.

But we haven’t gotten to the filet mignon yet.  The late, great Jacob Bekenstein will forever be associated with the discovery that information, wherever and whenever it occurs in the physical world, takes up a minimum amount of space.  The most precise form of this statement, called the covariant entropy bound, was worked out in detail by Raphael Bousso.  Here I’ll be discussing a looser version of the bound, which holds in “non-pathological” cases, and which states that a bounded physical system can store at most A/(4 ln 2) bits of information, where A is the area in Planck units of any surface that encloses the system—so, about 1069 bits per square meter.  (Actually it’s 1069 qubits per square meter, but because of Holevo’s theorem, an upper bound on the number of qubits is also an upper bound on the number of classical bits that can be reliably stored in a system and then retrieved later.)

You might have heard of the famous way Nature enforces this bound.  Namely, if you tried to create a hard drive that stored more than 1069 bits per square meter of surface area, the hard drive would necessarily collapse to a black hole.  And from that point on, the information storage capacity would scale “only” with the area of the black hole’s event horizon—a black hole itself being the densest possible hard drive allowed by physics.

Let’s hear once more from our skeptic.  “Nonsense!  Matter can take up space.  Energy can take up space.  But information?  Bah!  That’s just a category mistake.  For a proof, suppose God took one of your black holes, with a 1-square-meter event horizon, which already had its supposed maximum of ~1069 bits of information.  And suppose She then created a bunch of new fundamental fields, which didn’t interact with gravity, electromagnetism, or any of the other fields that we know from observation, but which had the effect of encoding 10300 new bits in the region of the black hole.  Presto!  An unlimited amount of additional information, exactly where Bekenstein said it couldn’t exist.”

We’d like to pinpoint what’s wrong with the skeptic’s argument—and do so in a self-contained, non-question-begging way, a way that doesn’t pull any rabbits out of hats, other than the general principles of relativity and quantum mechanics.  I was confused myself about how to do this, until a month ago, when Daniel Harlow helped set me straight (any remaining howlers in my exposition are 100% mine, not his).

I believe the logic goes like this:

  1. Relativity—even just Galilean relativity—demands that, in flat space, the laws of physics must have the same form for all inertial observers (i.e., all observers who move through space at constant speed).
  2. Anything in the physical world that varies in space—say, a field that encodes different bits of information at different locations—also varies in time, from the perspective of an observer who moves through the field at a constant speed.
  3. Combining 1 and 2, we conclude that anything that can vary in space can also vary in time.  Or to say it better, there’s only one kind of varying: varying in spacetime.
  4. More strongly, special relativity tells us that there’s a specific numerical conversion factor between units of space and units of time: namely the speed of light, c.  Loosely speaking, this means that if we know the rate at which a field varies across space, we can also calculate the rate at which it varies across time, and vice versa.
  5. Anything that varies across time carries energy.  Why?  Because this is essentially the definition of energy in quantum mechanics!  Up to a constant multiple (namely, Planck’s constant), energy is the expected speed of rotation of the global phase of the wavefunction, when you apply your Hamiltonian.  If the global phase rotates at the slowest possible speed, then we take the energy to be zero, and say you’re in a vacuum state.  If it rotates at the next highest speed, we say you’re in a first excited state, and so on.  Indeed, assuming a time-independent Hamiltonian, the evolution of any quantum system can be fully described by simply decomposing the wavefunction into a superposition of energy eigenstates, then tracking of the phase of each eigenstate’s amplitude as it loops around and around the unit circle.  No energy means no looping around means nothing ever changes.
  6. Combining 3 and 5, any field that varies across space carries energy.
  7. More strongly, combining 4 and 5, if we know how quickly a field varies across space, we can lower-bound how much energy it has to contain.
  8. In general relativity, anything that carries energy couples to the gravitational field.  This means that anything that carries energy necessarily has an observable effect: if nothing else, its effect on the warping of spacetime.  (This is dramatically illustrated by dark matter, which is currently observable via its spacetime warping effect and nothing else.)
  9. Combining 6 and 8, any field that varies across space couples to the gravitational field.
  10. More strongly, combining 7 and 8, if we know how quickly a field varies across space, then we can lower-bound by how much it has to warp spacetime.  This is so because of another famous (and distinctive) feature of gravity: namely, the fact that it’s universally attractive, so all the warping contributions add up.
  11. But in GR, spacetime can only be warped by so much before we create a black hole: this is the famous Schwarzschild bound.
  12. Combining 10 and 11, the information contained in a physical field can only vary so quickly across space, before it causes spacetime to collapse to a black hole.

Summarizing where we’ve gotten, we could say: any information that’s spatially localized at all, can only be localized so precisely.  In our world, the more densely you try to pack 1’s and 0’s, the more energy you need, therefore the more you warp spacetime, until all you’ve gotten for your trouble is a black hole.  Furthermore, if we rewrote the above conceptual argument in math—keeping track of all the G’s, c’s, h’s, and so on—we could derive a quantitative bound on how much information there can be in a bounded region of space.  And if we were careful enough, that bound would be precisely the holographic entropy bound, which says that the number of (qu)bits is at most A/(4 ln 2), where A is the area of a bounding surface in Planck units.

Let’s pause to point out some interesting features of this argument.

Firstly, we pretty much needed the whole kitchen sink of basic physical principles: special relativity (both the equivalence of inertial frames and the finiteness of the speed of light), quantum mechanics (in the form of the universal relation between energy and frequency), and finally general relativity and gravity.  All three of the fundamental constants G, c, and h made appearances, which is why all three show up in the detailed statement of the holographic bound.

But secondly, gravity only appeared from step 8 onwards.  Up till then, everything could be said solely in the language of quantum field theory: that is, quantum mechanics plus special relativity.  The result would be the so-called Bekenstein bound, which upper-bounds the number of bits in any spatial region by the product of the region’s radius and its energy content.  I learned that there’s an interesting history here: Bekenstein originally deduced this bound using ingenious thought experiments involving black holes.  Only later did people realize that the Bekenstein bound can be derived purely within QFT (see here and here for example)—in contrast to the holographic bound, which really is a statement about quantum gravity.  (An early hint of this was that, while the holographic bound involves Newton’s gravitational constant G, the Bekenstein bound doesn’t.)

Thirdly, speaking of QFT, some readers might be struck by the fact that at no point in our 12-step program did we ever seem to need QFT machinery.  Which is fortunate, because if we had needed it, I wouldn’t have been able to explain any of this!  But here I have to confess that I cheated slightly.  Recall step 4, which said that “if you know the rate at which a field varies across space, you can calculate the rate at which it varies across time.”  It turns out that, in order to give that sentence a definite meaning, one uses the fact that in QFT, space and time derivatives in the Hamiltonian need to be related by a factor of c, since otherwise the Hamiltonian wouldn’t be Lorentz-invariant.

Fourthly, eagle-eyed readers might notice a loophole in the argument.  Namely, we never upper-bounded how much information God could add to the world, via fields that are constant across all of spacetime.  For example, there’s nothing to stop Her from creating a new scalar field that takes the same value everywhere in the universe—with that value, in suitable units, encoding 1050000 separate divine thoughts in its binary expansion.  But OK, being constant, such a field would interact with nothing and affect no observations—so Occam’s Razor itches to slice it off, by rewriting the laws of physics in a simpler form where that field is absent.  If you like, such a field would at most be a comment in the source code of the universe: it could be as long as the Great Programmer wanted it to be, but would have no observable effect on those of us living inside the program’s execution.

Of course, even before relativity and quantum mechanics, information had already been playing a surprisingly fleshy role in physics, through its appearance as entropy in 19th-century thermodynamics.  Which leads to another puzzle.  To a computer scientist, the concept of entropy, as the log of the number of microstates compatible with a given macrostate, seems clear enough, as does the intuition for why it should increase monotonically with time.  Or at least, to whatever extent we’re confused about these matters, we’re no more confused than the physicists are!

But then why should this information-theoretic concept be so closely connected to tangible quantities like temperature, and pressure, and energy?  From the mere assumption that a black hole has a nonzero entropy—that is, that it takes many bits to describe—how could Bekenstein and Hawking have possibly deduced that it also has a nonzero temperature?  Or: if you put your finger into a tub of hot water, does the heat that you feel somehow reflect how many bits are needed to describe the water’s microstate?

Once again our skeptic pipes up: “but surely God could stuff as many additional bits as She wanted into the microstate of the hot water—for example, in degrees of freedom that are still unknown to physics—without the new bits having any effect on the water’s temperature.”

But we should’ve learned by now to doubt this sort of argument.  There’s no general principle, in our universe, saying that you can hide as many bits as you want in a physical object, without those bits influencing the object’s observable properties.  On the contrary, in case after case, our laws of physics seem to be intolerant of “wallflower bits,” which hide in a corner without talking to anyone.  If a bit is there, the laws of physics want it to affect other nearby bits and be affected by them in turn.

In the case of thermodynamics, the assumption that does all the real work here is that of equidistribution.  That is, whatever degrees of freedom might be available to your thermal system, your gas in a box or whatever, we assume that they’re all already “as randomized as they could possibly be,” subject to a few observed properties like temperature and volume and pressure.  (At least, we assume that in classical thermodynamics.  Non-equilibrium thermodynamics is a whole different can of worms, worms that don’t stay in equilibrium.)  Crucially, we assume this despite the fact that we might not even know all the relevant degrees of freedom.

Why is this assumption justified?  “Because experiment bears it out,” the physics teacher explains—but we can do better.  The assumption is justified because, as long as the degrees of freedom that we’re talking about all interact with each other, they’ve already had plenty of time to equilibrate.  And conversely, if a degree of freedom doesn’t interact with the stuff we’re observing—or with anything that interacts with the stuff we’re observing, etc.—well then, who cares about it anyway?

But now, because the microscopic laws of physics have the fundamental property of reversibility—that is, they never destroy information—a new bit has to go somewhere, and it can’t overwrite degrees of freedom that are already fully randomized.  This is why, if you pump more bits of information into a tub of hot water, while keeping it at the same volume, the new bits have nowhere to go except into pushing up the energy.  Now, there are often ways to push up the energy other than by raising the temperature—the concept of specific heat, in chemistry, is precisely about this—but if you need to stuff more bits into a substance, at the cost of raising its energy, certainly one of the obvious ways to do it is to describe a greater range of possible speeds for the water molecules.  So since that can happen, by equidistribution it typically does happen, which means that the molecules move faster on average, and your finger feels the water get hotter.

In summary, our laws of physics are structured in such a way that even pure information often has “nowhere to hide”: if the bits are there at all in the abstract machinery of the world, then they’re forced to pipe up and have a measurable effect.  And this is not a tautology, but comes about only because of nontrivial facts about special and general relativity, quantum mechanics, quantum field theory, and thermodynamics.  And this is what I think people should mean when they say “information is physical.”

Anyway, if this was all obvious to you, I apologize for having wasted your time!  But in my defense, it was never explained to me quite this way, nor was it sorted out in my head until recently—even though it seems like one of the most basic and general things one can possibly say about physics.

Endnotes. Thanks again to Daniel Harlow, not only for explaining the logic of the holographic bound to me but for several suggestions that improved this post.

Some readers might suspect circularity in the arguments we’ve made: are we merely saying that “any information that has observable physical consequences, has observable physical consequences”?  No, it’s more than that.  In all the examples I discussed, the magic was that we inserted certain information into our abstract mathematical description of the world, taking no care to ensure that the information’s presence would have any observable consequences whatsoever.  But then the principles of quantum mechanics, quantum gravity, or thermodynamics forced the information to be detectable in very specific ways (namely, via the destruction of quantum interference, the warping of spacetime, or the generation of heat respectively).

July 21, 2017

BackreactionPenrose claims LIGO noise is evidence for Cyclic Cosmology

Noise is the physicists’ biggest enemy. Unless you are a theorist whose pet idea masquerades as noise. Then you are best friends with noise. Like Roger Penrose. Correlated "noise" in LIGO gravitational wave signals: an implication of Conformal Cyclic Cosmology Roger Penrose arXiv:1707.04169 [gr-qc] Roger Penrose made his name with the Penrose-Hawking theorems and twistor theory. He is also

July 20, 2017

Andrew JaffePython Bug Hunting

This is a technical, nerdy post, mostly so I can find the information if I need it later, but possibly of interest to others using a Mac with the Python programming language, and also since I am looking for excuses to write more here. (See also updates below.)

It seems that there is a bug in the latest (mid-May 2017) release of Apple’s macOS Sierra 10.12.5 (ok, there are plenty of bugs, as there in any sufficiently complex piece of software).

It first manifested itself (to me) as an error when I tried to load the jupyter notebook, a web-based graphical front end to Python (and other languages). When the command is run, it opens up a browser window. However, after updating macOS from 10.12.4 to 10.12.5, the browser didn’t open. Instead, I saw an error message:

    0:97: execution error: "http://localhost:8888/tree?token=<removed>" doesn't understand the "open location" message. (-1708)

A little googling found that other people had seen this error, too. I was able to figure out a workaround pretty quickly: this behaviour only happens when I wanted to use the “default” browser, which is set in the “General” tab of the “System Preferences” app on the Mac (I have it set to Apple’s own “Safari” browser, but you can use Firefox or Chrome or something else). Instead, there’s a text file you can edit to explicitly set the browser that you want jupyter to use, located at ~/.jupyter/, by including the line

c.NotebookApp.browser = u'Safari'

(although an unrelated bug in Python means that you can’t currently use “Chrome” in this slot).

But it turns out this isn’t the real problem. I went and looked at the code in jupyter that is run here, and it uses a Python module called webbrowser. Even outside of jupyter, trying to use this module to open the default browser fails, with exactly the same error message (though I’m picking a simpler URL at instead of the jupyter-related one above):

>>> import webbrowser
>>> br = webbrowser.get()
0:33: execution error: "" doesn't understand the "open location" message. (-1708)

So I reported this as an error in the Python bug-reporting system, and hoped that someone with more experience would look at it.

But it nagged at me, so I went and looked at the source code for the webbrowser module. There, it turns out that the programmers use a macOS command called “osascript” (which is a command-line interface to Apple’s macOS automation language “AppleScript”) to launch the browser, with a slightly different syntax for the default browser compared to explicitly picking one. Basically, the command is osascript -e 'open location ""'. And this fails with exactly the same error message. (The similar code osascript -e 'tell application "Safari" to open location ""' which picks a specific browser runs just fine, which is why explicitly setting “Safari” back in the jupyter file works.)

But there is another way to run the exact same AppleScript command. Open the Mac app called “Script Editor”, type open location "" into the window, and press the “run” button. From the experience with “osascript”, I expected it to fail, but it didn’t: it runs just fine.

So the bug is very specific, and very obscure: it depends on exactly how the offending command is run, so appears to be a proper bug, and not some sort of security patch from Apple (and it certainly doesn’t appear in the 10.12.5 release notes). I have filed a bug report with Apple, but these are not publicly accessible, and are purported to be something of a black hole, with little feedback from the still-secretive Apple development team.


July 18, 2017

Terence TaoMaryam Mirzakhani

I am totally stunned to learn that Maryam Mirzakhani died today yesterday, aged 40, after a severe recurrence of the cancer she had been fighting for several years.  I had planned to email her some wishes for a speedy recovery after learning about the relapse yesterday; I still can’t fully believe that she didn’t make it.

My first encounter with Maryam was in 2010, when I was giving some lectures at Stanford – one on Perelman’s proof of the Poincare conjecture, and another on random matrix theory.  I remember a young woman sitting in the front who asked perceptive questions at the end of both talks; it was only afterwards that I learned that it was Mirzakhani.  (I really wish I could remember exactly what the questions were, but I vaguely recall that she managed to put a nice dynamical systems interpretation on both of the topics of my talks.)

After she won the Fields medal in 2014 (as I posted about previously on this blog), we corresponded for a while.  The Fields medal is of course one of the highest honours one can receive in mathematics, and it clearly advances one’s career enormously; but it also comes with a huge initial burst of publicity, a marked increase in the number of responsibilities to the field one is requested to take on, and a strong expectation to serve as a public role model for mathematicians.  As the first female recipient of the medal, and also the first to come from Iran, Maryam was experiencing these pressures to a far greater extent than previous medallists, while also raising a small daughter and fighting off cancer.  I gave her what advice I could on these matters (mostly that it was acceptable – and in fact necessary – to say “no” to the vast majority of requests one receives).

Given all this, it is remarkable how productive she still was mathematically in the last few years.  Perhaps her greatest recent achievement has been her “magic wandtheorem with Alex Eskin, which is basically the analogue of the famous measure classification and orbit closure theorems of Marina Ratner, in the context of moduli spaces instead of unipotent flows on homogeneous spaces.  (I discussed Ratner’s theorems in this previous post.  By an unhappy coincidence, Ratner also passed away this month, aged 78.)  Ratner’s theorems are fundamentally important to any problem to which a homogeneous dynamical system can be associated (for instance, a special case of that theorem shows up in my work with Ben Green and Tamar Ziegler on the inverse conjecture for the Gowers norms, and on linear equations in primes), as it gives a good description of the equidistribution of any orbit of that system (if it is unipotently generated); and it seems the Eskin-Mirzakhani result will play a similar role in problems associated instead to moduli spaces.  The remarkable proof of this result – which now stands at over 200 pages, after three years of revision and updating – uses almost all of the latest techniques that had been developed for homogeneous dynamics, and ingeniously adapts them to the more difficult setting of moduli spaces, in a manner that had not been dreamed of being possible only a few years earlier.

Maryam was an amazing mathematician and also a wonderful and humble human being, who was at the peak of her powers.  Today was a huge loss for Maryam’s family and friends, as well as for mathematics.

[EDIT, Jul 16: New York times obituary here.]

[EDIT, Jul 18: New Yorker memorial here.]


Filed under: obituary Tagged: Maryam Mirzakhani

July 17, 2017

Scott AaronsonThree things

I was shocked and horrified to learn of the loss of Maryam Mirzakhani at age 40, after a battle with cancer (see here or here).  Mirzakhani was a renowned mathematician at Stanford and the world’s first and so far only female Fields Medalist.  I never had the privilege of meeting her, but everything I’ve read about her fills me with admiration.  I wish to offer condolences to her friends and family, including her husband Jan Vondrák, also a professor at Stanford and a member of the CS theory community.

In other depressing news, discussion continues to rage on social media about “The Uninhabitable Earth,” the New York magazine article by David Wallace-Wells arguing that the dangers of climate change have been systematically understated even by climate scientists; that sea level rise is the least of the problems; and that if we stay the current course, much of the earth’s landmass has a good chance of being uninhabitable by the year 2100.  In an unusual turn of events, the Wallace-Wells piece has been getting slammed by climate scientists, including Michael Mann (see here and also this interview)—people who are usually in the news to refute the claims of deniers.

Some of the critics’ arguments seem cogent to me: for example, that Wallace-Wells misunderstood some satellite data, and more broadly, that the piece misleadingly presents its scenario as overwhelmingly probable by 2100 if we do nothing, rather than as “only” 10% likely or whatever—i.e., a mere Trump-becoming-president level of risk.  Other objections to the article impressed me less: for example, that doom-and-gloom is a bad way to motivate people about climate change; that the masses need a more optimistic takeaway.  That obviously has no bearing on the truth of what’s going to happen—but even if we did agree to entertain such arguments, well, it’s not as if mainstream messaging on climate change has been an unmitigated success.  What if everyone should be sweating-in-the-night terrified?

As far as I understand it, the question of the plausibility of Wallace-Wells’s catastrophe scenario mostly just comes down to a single scientific unknown: namely, will the melting permafrost belch huge amounts of methane into the atmosphere?  If it does, then “Armageddon” is probably a fair description of what awaits us in the next century, and if not, not.  Alas, our understanding of permafrost doesn’t seem especially reliable, and it strikes me that models of such feedbacks have a long history of erring on the side of conservatism (for example, researchers were astonished by how quickly glaciers and ice shelves fell apart).

So, while I wish the article was written with more caveats, I submit that runaway warming scenarios deserve more attention rather than less.  And we should be putting discussion of those scenarios in exactly the broader context that Wallace-Wells does: namely, that of the Permian-Triassic extinction event, the Fermi paradox, and the conditions for a technological civilization to survive past its infancy.

Certainly we spend much more time on risks to civilization (e.g., nuclear terrorism, bioengineered pandemics) that strike me as less probable than this one.  And certainly this tail, in the distribution of possible outcomes, deserves at least as much attention as its more popular opposite, the tail where climate change turns out not to be much of a problem at all.  For the grim truth about climate change is that history won’t end in 2100: only the projections do.  And the mere addition of 50 more years could easily suffice to turn a tail risk into a body risk.

Of course, that the worst will happen is a clear prediction of reverse Hollywoodism theory—besides being the “natural, default” prediction for a computer scientist used to worst-case analysis.  This is one prediction that I hope turns out to be as wrong as possible.

OK, now for something to cheer us all up.  Yesterday the group of Misha Lukin, at Harvard, put a paper on the arXiv reporting the creation of a 51-qubit quantum simulator using cold atoms.  The paper doesn’t directly address the question of quantum supremacy, or indeed of performance comparisons between the new device and classical simulations at all.  But this is clearly a big step forward, while the world waits for the fully-programmable 50-qubit superconducting QCs that have been promised by the groups at Google and IBM.

Indeed, this strikes me as the most exciting news in experimental quantum information since last month, when Jian-Wei Pan’s group in Shanghai reported the first transmission of entangled photons from a satellite to earth—thereby allowing violations of the Bell inequality over 1200 kilometers, teleportation of a qubit from earth to space, and other major firsts.  These are breakthroughs that we knew were in the works ever since the Chinese government launched the QUESS satellite devoted to quantum communications.  I should’ve blogged about them in June.  Then again, regular readers of Shtetl-Optimized, familiar as they already are with the universal reach of quantum mechanics and with the general state of quantum information technology, shouldn’t find anything here that fundamentally surprises them, should they?

Noncommutative GeometryMaryam Mirzakhani (May 3, 1977 - July 15, 2017)

Some twenty five years ago when I was told by one of  her teachers that she is truly brilliant and exceptional I could have not imagined that I will see this terribly sad day. A shining star is turned off far too soon. Heartfelt condolences to her husband, family, and mathematics community worldwide. از شمار دو چشم یک تن کم وز شمار خرد هزاران بیش "Two eyes are gone  Thousands of minds

July 16, 2017

Matt StrasslerOngoing Chance of Northern (or Southern) Lights

As forecast, the cloud of particles from Friday’s solar flare (the “coronal mass emission”, or “CME”) arrived at our planet a few hours after my last post, early in the morning New York time. If you’d like to know how I knew that it had reached Earth, and how I know what’s going on now, scroll down to the end of this post and I’ll show you the data I was following, which is publicly available at all times.

So far the resulting auroras have stayed fairly far north, and so I haven’t seen any — though they were apparently seen last night in Washington and Wyoming, and presumably easily seen in Canada and Alaska. [Caution: sometimes when people say they’ve been “seen”, they don’t quite mean that; I often see lovely photos of aurora that were only visible to a medium-exposure camera shot, not to the naked eye.]  Or rather, I should say that the auroras have stayed fairly close to the Earth’s poles; they were also seen in New Zealand.

Russia and Europe have a good opportunity this evening. As for the U.S.? The storm in the Earth’s magnetic field is still going on, so tonight is still a definite possibility for northern states. Keep an eye out! Look for what is usually a white or green-hued glow, often in swathes or in stripes pointing up from the northern horizon, or even overhead if you’re lucky.  The stripes can move around quite rapidly.

Now, here’s how I knew all this.  I’m no expert on auroras; that’s not my scientific field at all.   But the U.S. Space Weather Prediction Center at the National Oceanic and Atmospheric Administration, which needs to monitor conditions in space in case they should threaten civilian and military satellites or even installations on the ground, provides a wonderful website with lots of relevant data.

The first image on the site provides the space weather overview; a screenshot from the present is shown below, with my annotations.  The upper graph indicates a blast of x-rays (a form of light not visible to the human eye) which is generated when the solar flare, the magnetically-driven explosion on the sun, first occurs.  Then the slower cloud of particles (protons, electrons, and other atomic nuclei, all of which have mass and therefore can’t travel at light’s speed) takes a couple of days to reach Earth.  It’s arrival is shown by the sudden jump in the middle graph.  Finally, the lower graph measures how active the Earth’s magnetic field is.  The only problem with that plot is it tends to be three hours out of date, so beware of that! A “Kp index” of 5 shows significant activity; 6 means that auroras are likely to be moving away from the poles, and 7 or 8 mean that the chances in a place like the north half of the United States are pretty good.  So far, 6 has been the maximum generated by the current flare, but things can fluctuate a little, so 6 or 7 might occur tonight.  Keep an eye on that lower plot; if it drops back down to 4, forget it, but it it’s up at 7, take a look for sure!


Also on the site is data from the ACE satellite.  This satellite sits 950 thousand miles [1.5 million kilometers] from Earth, between Earth and the Sun, which is 93 million miles [150 million kilometers] away.  At that vantage point, it gives us (and our other satellites) a little early warning, of up to an hour, before the cloud of slow particles from a solar flare arrives.  That provides enough lead-time to turn off critical equipment that might otherwise be damaged.  And you can see, in the plot below, how at a certain time in the last twenty-four hours the readings from the satellite, which had been tepid before, suddenly started fluctuating wildly.  That was the signal that the flare had struck the satellite, and would arrive shortly at our location.


It’s a wonderful feature of the information revolution that you can get all this scientific data yourself, and not wait around hoping for a reporter or blogger to process it for you.  None of this was available when I was a child, and I missed many a sky show.  A big thank you to NOAA, and to the U.S. taxpayers who make their work possible.



Filed under: Astronomy Tagged: astronomy, auroras, space

July 15, 2017

Matt StrasslerLights in the Sky (maybe…)

The Sun is busy this summer. The upcoming eclipse on August 21 will turn day into deep twilight and transfix millions across the United States.  But before we get there, we may, if we’re lucky, see darkness transformed into color and light.

On Friday July 14th, a giant sunspot in our Sun’s upper regions, easily visible if you project the Sun’s image onto a wall, generated a powerful flare.  A solar flare is a sort of magnetically powered explosion; it produces powerful electromagnetic waves and often, as in this case, blows a large quantity of subatomic particles from the Sun’s corona. The latter is called a “coronal mass ejection.” It appears that the cloud of particles from Friday’s flare is large, and headed more or less straight for the Earth.

Light, visible and otherwise, is an electromagnetic wave, and so the electromagnetic waves generated in the flare — mostly ultraviolet light and X-rays — travel through space at the speed of light, arriving at the Earth in eight and a half minutes. They cause effects in the Earth’s upper atmosphere that can disrupt radio communications, or worse.  That’s another story.

But the cloud of subatomic particles from the coronal mass ejection travels a few hundred times slower than light, and it takes it about two or three days to reach the Earth.  The wait is on.

Bottom line: a huge number of high-energy subatomic particles may arrive in the next 24 to 48 hours. If and when they do, the electrically charged particles among them will be trapped in, and shepherded by, the Earth’s magnetic field, which will drive them spiraling into the atmosphere close to the Earth’s polar regions. And when they hit the atmosphere, they’ll strike atoms of nitrogen and oxygen, which in turn will glow. Aurora Borealis, Northern Lights.

So if you live in the upper northern hemisphere, including Europe, Canada and much of the United States, keep your eyes turned to the north (and to the south if you’re in Australia or southern South America) over the next couple of nights. Dark skies may be crucial; the glow may be very faint.

You can also keep abreast of the situation, as I will, using NOAA data, available for instance at

The plot on the upper left of that website, an example of which is reproduced below, shows three types of data. The top graph shows the amount of X-rays impacting the atmosphere; the big jump on the 14th is Friday’s flare. And if and when the Earth’s magnetic field goes nuts and auroras begin, the bottom plot will show the so-called “Kp Index” climbing to 5, 6, or hopefully 7 or 8. When the index gets that high, there’s a much greater chance of seeing auroras much further away from the poles than usual.

The latest space weather overview plot

Keep an eye also on the data from the ACE satellite, lower down on the website; it’s placed to give Earth an early warning, so when its data gets busy, you’ll know the cloud of particles is not far away.

Wishing you all a great sky show!

Filed under: LHC News

July 13, 2017

Jordan EllenbergLo!

“A naked man in a city street — the track of a horse in volcanic mud — the mystery of reindeer’s ears — a huge, black form, like a whale, in the sky, and it drips red drops as if attacked by celestial swordfishes — an appalling cherub appears in the sea —




Scott AaronsonAmsterdam art museums plagiarizing my blog?

This past week I had the pleasure of attending COLT (Conference on Learning Theory) 2017 in Amsterdam, and of giving an invited talk on “PAC-Learning and Reconstruction of Quantum States.”  You can see the PowerPoint slides here; videos were also made, but don’t seem to be available yet.

This was my first COLT, but almost certainly not the last.  I learned lots of cool new tidbits, from the expressive power of small-depth neural networks, to a modern theoretical computer science definition of “non-discriminatory” (namely, your learning algorithm’s output should be independent of protected categories like race, sex, etc. after conditioning on the truth you’re trying to predict), to the inapproximability of VC dimension (assuming the Exponential Time Hypothesis).  You can see the full schedule here.  Thanks so much to the PC chairs, Ohad Shamir and Satyen Kale, for inviting me and for putting on a great conference.

And one more thing: I’m not normally big on art museums, but Amsterdam turns out to have two in close proximity to each other—the Rijksmuseum and the Stedelijk—each containing something that Shtetl-Optimized readers might recognize.

Photo credits: Ronald de Wolf and Marijn Heule.