Planet Musings

January 27, 2020

John Preskill: On the merits of flatworm reproduction

On my right sat a quantum engineer. She was facing a melanoma specialist who works at a medical school. Leftward of us sat a networks expert, a flatworm enthusiast, and a condensed-matter theorist.

Farther down sat a woman who slices up mouse brains. 

Welcome to “Coherent Spins in Biology,” a conference that took place at the University of California, Los Angeles (UCLA) this past December. Two southern Californians organized the workshop: Clarice Aiello heads UCLA’s Quantum Biology Tech lab. Thorsten Ritz, of the University of California, Irvine, cofounded a branch of quantum biology.

Quantum biology served as the conference’s backdrop. According to conventional wisdom, quantum phenomena can’t influence biology significantly: Biological systems have high temperatures, many particles, and fluids. Quantum phenomena, such as entanglement (a relationship that quantum particles can share), die quickly under such conditions.

Yet perhaps some survive. Quantum biologists search for biological systems that might use quantum resources. Then, they model and measure the uses and resources. Three settings (at least) have held out promise during the past few decades: avian navigation, photosynthesis, and olfaction. You can read about them in this book, cowritten by a conference participant for the general public. I’ll give you a taste (or a possibly quantum smell?) by sketching the avian-navigation proposal, developed by Thorsten and colleagues.

Birds migrate southward during the autumn and northward during the spring. How do they know where to fly? At least partially by sensing the Earth’s magnetic field, which leads compass needles to point northward. How do birds sense the field?

Possibly with a protein called “cryptochrome.” A photon (a particle of light) could knock an electron out of part of the protein and into another part. Each part would have one electron that lacked a partner. The electrons would share entanglement. One electron would interact with the Earth’s magnetic field differently than its partner, because its surroundings would differ. (Experts: The electrons would form a radical pair. One electron would neighbor different atoms than the other, so the electron would experience a different local magnetic field. The discrepancy would change the relative phase between the electrons’ spins.) The discrepancy could affect the rate at which the chemical system could undergo certain reactions. Which reactions occur could snowball into larger and larger effects, eventually signaling the brain about where the bird should fly.
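
(Experts, continued: the relative-phase statement can be sketched in a toy model. The assumptions here are not part of the proposal as described above: the pair starts in the singlet state, each spin simply precesses about a common axis, and \( \omega_1 \) and \( \omega_2 \) are hypothetical labels for the two electrons’ local precession frequencies. A realistic radical-pair model would include hyperfine couplings and relaxation.)

\[
|\psi(0)\rangle = \frac{1}{\sqrt{2}}\left( |{\uparrow\downarrow}\rangle - |{\downarrow\uparrow}\rangle \right)
\;\longrightarrow\;
|\psi(t)\rangle = \frac{1}{\sqrt{2}}\left( e^{-i\,\Delta\omega\, t/2}\,|{\uparrow\downarrow}\rangle - e^{+i\,\Delta\omega\, t/2}\,|{\downarrow\uparrow}\rangle \right),
\qquad \Delta\omega = \omega_1 - \omega_2 ,
\]

\[
P_{\text{singlet}}(t) = \left| \langle \psi(0) | \psi(t) \rangle \right|^2 = \cos^2\!\left( \frac{\Delta\omega\, t}{2} \right).
\]

If the two local fields matched (\( \Delta\omega = 0 \)), the pair would stay a singlet in this toy picture. A mismatch rotates the pair partly into a triplet state, and that is the handle by which the field could shift which reactions occur.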

Quantum mechanics and life rank amongst the universe’s mysteries. How could a young researcher resist the combination? A postdoc warned me away, one lunchtime at the start of my PhD. Quantum biology had enjoyed attention several years earlier, he said, but noise obscured the experimental data. Controversy marred the field.

I ate lunch with that postdoc in 2013. Interest in quantum biology is reviving, as evidenced in the conference. Two reasons suggested themselves: new technologies and new research avenues. For example, Thorsten described the disabling and deletion of genes that code for cryptochrome. Such studies require years’ more work but might illuminate whether cryptochrome affects navigation.

The keynote speaker, Harvard’s Misha Lukin, illustrated new technologies and new research avenues. Misha’s lab has diamonds that contain quantum defects, which serve as artificial atoms. The defects sense tiny magnetic fields and temperatures. Misha’s group applies these quantum sensors to biology problems.

For example, different cells in an embryo divide at different times. Imagine reversing the order in which the cells divide. Would the reversal harm the organism? You could find out by manipulating the temperatures in different parts of the embryo: Temperature controls the rate at which cells divide.

Misha’s team injected nanoscale diamonds into a worm embryo. (See this paper for a related study.) The diamonds reported the temperature at various points in the worm. This information guided experimentalists who heated the embryo with lasers.

The manipulated embryos grew into fairly normal adults. But their cells, and their descendants’ cells, cycled through the stages of life slowly. This study exemplified, to me, one of the most meaningful opportunities for quantum physicists interested in biology: to develop technologies and analyses that can answer biology questions.


I mentioned, in an earlier blog post, another avenue emerging in quantum biology: Physicist Matthew Fisher proposed a mechanism by which entanglement might enhance coordinated neuron firing. My collaborator Elizabeth Crosson and I analyzed how the molecules in Matthew’s proposal—Posner clusters—could process quantum information. The field of Posner quantum biology had a population of about two, when Elizabeth and I entered, and I wondered whether anyone would join us.

The conference helped resolve my uncertainty. Three speakers (including me) presented work based on Matthew’s; two other participants were tilling the Posner soil; and another speaker mentioned Matthew’s proposal. The other two Posner talks related data from three experiments. The experimentalists haven’t finished their papers, so I won’t share details. But stay tuned.

Posner molecule (image by Swift et al.)

Clarice and Thorsten’s conference reminded me of a conference I’d participated in at the end of my PhD: Last month, I moonlighted as a quantum biologist. In 2017, I moonlighted as a quantum-gravity theorist. Two years earlier, I’d been dreaming about black holes and space-time. At UCLA, I was finishing the first paper I’ve coauthored with biophysicists. What a toolkit quantum information theory and thermodynamics provide, that it can unite such disparate fields. 

The contrast—on top of what I learned at UCLA—filled my mind for weeks. And reminded me of the description of asexual reproduction that we heard from the conference’s flatworm enthusiast. According to Western Michigan University’s Wendy Beane, a flatworm “glues its butt down, pops its head off, and grows a new one. Y’know. As one does.” 

I hope I never flinch from popping my head off and growing a new one—on my quantum-information-thermodynamics spine—whenever new science calls for figuring out.


With thanks to Clarice, Thorsten, and UCLA for their invitation and hospitality.

January 26, 2020

Tommaso Dorigo: Anomaly Detection: When Old Statistics School May Still Beat Super-Duper Machine Learning

One of the most surprising results of the "Machine Learning for Jets" (but really, for particle physics in general) workshop I attended in New York City two weeks ago was the outcome of a challenge that the organizers had proposed to the participants: find a hidden signal of some new physics process in a dataset otherwise made up of some physics background, when no information on the new physics was given, nor on the model of the background.

The problem is known, in statistical terms, as one of anomaly detection.
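
To give a flavor of what an "old school" approach to such a problem can look like, here is a minimal sketch of a sideband scan over a one-dimensional histogram. It is not the method used by any participant in the challenge, which this excerpt does not describe. The function name `scan_for_excess`, the parameter `k`, and the toy data are all hypothetical, and the background estimate assumes the background varies smoothly from bin to bin.

```python
import math
import random


def scan_for_excess(values, lo, hi, n_bins, k=2):
    """Return (bin_index, z) for the bin with the largest excess over its sidebands.

    Hypothetical helper: the background in each bin is estimated as the mean
    of the k bins on either side, assuming a smooth, locally linear background.
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            # min() guards against floating-point rounding at the upper edge.
            counts[min(int((v - lo) / width), n_bins - 1)] += 1

    best_bin, best_z = None, -math.inf
    for i in range(k, n_bins - k):
        sidebands = counts[i - k:i] + counts[i + 1:i + k + 1]
        expected = sum(sidebands) / (2 * k)
        if expected == 0:
            continue  # no background estimate available for this bin
        # Poisson variance of the observed count plus that of the estimate.
        sigma = math.sqrt(expected * (1 + 1 / (2 * k)))
        z = (counts[i] - expected) / sigma
        if z > best_z:
            best_bin, best_z = i, z
    return best_bin, best_z


if __name__ == "__main__":
    # Toy data (an assumption): flat background plus a narrow injected bump.
    rng = random.Random(0)
    background = [rng.uniform(0, 100) for _ in range(20000)]
    signal = [rng.gauss(61, 0.3) for _ in range(600)]
    print(scan_for_excess(background + signal, 0, 100, 50))
```

The returned z is only a local significance. Because the scan tries many bins, a real analysis would also have to correct for the look-elsewhere effect before claiming anything.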


Scott Aaronson: From shtetl to Forum

Daily Updates:
Saturday January 18 (introduction)
Sunday January 19 (Elton John and Greta Thunberg)
Monday January 20 (the $71,000-a-head ski resort conference for Equality)
Tuesday January 21 (Trump! Greta! QC panel!)
Wednesday January 22 (wherein I fail to introduce myself to Al Gore)
Thursday January 23 (wherein I attend the IBM QC panel and “drunkenly unload” at the Canada Reception)
Friday January 24 (second Al Gore session, and getting lost)

It would be great to know whether anyone’s actually reading the later updates, so I know whether to continue putting effort into them!

Saturday January 18

Today I’m headed to the 50th World Economic Forum in Davos, where on Tuesday I’ll participate in a panel discussion on “The Quantum Potential” with Jeremy O’Brien of the quantum computing startup PsiQuantum, and will also host an ask-me-anything session about quantum computational supremacy and Google’s claim to have achieved it.

I’m well aware that this will be unlike any other conference I’ve ever attended: STOC or FOCS it ain’t. As one example, also speaking on Tuesday—although not conflicting with my QC sessions—will be a real-estate swindler and reality-TV star who’s somehow (alas) the current President of the United States. Yes, even while his impeachment trial in the Senate gets underway. Also speaking on Tuesday, a mere hour and a half after him, will be TIME’s Person of the Year, 17-year-old climate activist Greta Thunberg.

In short, this Davos is shaping up to be an epic showdown between two diametrically opposed visions for the future of life on Earth. And your humble blogger will be right there in the middle of it, to … uhh … explain how quantum computers can sample probability distributions that are classically intractable unless the polynomial hierarchy collapses to the third level. I feel appropriately sheepish.

Since the experience will be so unusual for me, I’m planning to “live-blog Davos”: I’ll be updating this post, all week, with any strange new things that I see or learn. As a sign of my devotion to you, my loyal readers, I’ll even clothespin my nose and attend Trump’s speech so I can write about it.

And Greta: on the off chance that you happen to read Shtetl-Optimized, let me treat you to a vegan lunch or dinner! I’d like to try to persuade you of just how essential nuclear power will be to a carbon-free future. Oh, and if it’s not too much trouble, I’d also like a selfie with you for this blog. (Alas, a friend pointed out to me that it would probably be easier to meet Trump: unlike Greta, he won’t be swarmed with thousands of fans!)

Anyway, check back here throughout the week for updates. And if you’re in Davos and would like to meet, please shoot me an email. And please use the comment section to give me your advice, suggestions, well-wishes, requests, or important messages for me to fail to deliver to the “Davoisie” who run the world.

Sunday January 19

So I’ve arrived in Klosters, a village in the Swiss Alps close to Davos where I’ll be staying. (All the hotels in Davos itself were booked by the time I checked.)

I’d braced myself for the challenge of navigating three different trains through the Alps not knowing German. In reality, it was like a hundred times easier than public transportation at home. Every train arrived at the exact right second at the exact platform that was listed, bearing the exact right number, and there were clear visible signs strategically placed at exactly the places where anyone could get confused. I’d entered Bizarro Opposite World. I’m surely one of the more absentminded people on earth, as well as one of the more neurotic about being judged by bystanders if I ever admit to being lost, and it was nothing.

Snow! Once a regular part of my life, now the first I’d seen in several years. Partly because I now live in Texas, but also because even when we take the kids back to Pennsylvania for ChanuChrismaNewYears, it no longer snows like it did when I was a kid. If you show my 2-year-old, Daniel, a picture of snow-covered wilderness, he calls it a “beach.” Daniel’s soon-to-be 7-year-old sister still remembers snow from Boston, but the memory is rapidly fading. I wonder for how many of the children of the 21st century snow will just be a thing from old books and movies, like typewriters or rotary phones.

The World Economic Forum starts tomorrow afternoon. In the meantime, though, I thought I’d give an update not on the WEF itself, but on the inflight movie that I watched on my way here.

I watched Rocketman, the recent biopic/hagiography about Elton John, though as I watched I found that I kept making comparisons between Elton John and Greta Thunberg.

On the surface, these two might not seem to have a great deal of similarity.

But I gathered that they had this in common: while still teenagers, they saw a chance and they seized it. And doing so involved taking inner turmoil and then successfully externalizing it to the whole planet. Making hundreds of millions of people feel the same emotions that they had felt. If I’m being painfully honest (how often am I not?), that’s something I’ve always wanted to achieve and haven’t.

Of course, when some of the most intense and distinctive emotions you’ve ever felt revolved around the discovery of quantum query complexity lower bounds … yeah, it might be tough to find more people than could fill a room to relive those emotional journeys with you. But a child’s joy at discovering numbers like Ackermann(100) (to say nothing of BB(100)), which are so incomprehensibly bigger than \( 9^{9^{9^{9^9}}} \) that I didn’t need to think twice about how many 9’s I put there? Or the exasperation at those who, yeah, totally get that quantum computers aren’t known to give exponential speedups for NP-complete problems, that’s a really important clarification coming from the theory side, but still, let’s continue to base our entire business or talk or article around the presupposition that quantum computers do give exponential speedups for NP-complete problems? Or even just the type of crush that comes with a ceaseless monologue about what an objectifying, misogynist pig you must be to experience it? Maybe I could someday make people vicariously experience and understand those emotions–if I could only find the right words.

My point is, this is precisely what Greta did for the burgeoning emotion of existential terror about the Anthropocene—another emotion that’s characterized my life since childhood. Not that I ever figured out anything to do about it, with the exception of Gore/Nader vote-swapping. By the standards of existential terrors, I consider this terror to be extraordinarily well-grounded. If Steven Weinberg is scared, who among us has the right to be calm?

The obvious objection to Greta—why should anyone care what a histrionic teenager thinks about a complicated scientific field that thousands of people get PhDs in?—calls for a substantive answer. So here’s mine. Like many concerned citizens, I try to absorb some of the research on ocean warming or the collapse of ice sheets and the melting permafrost leading to even more warming or the collapse of ecosystems due to changes in rainfall or bushfires or climate migrations or whatever. And whenever I do, I’m reminded of Richard Feynman’s remark, during the investigation of the Challenger disaster, that maybe it wasn’t all that interesting for the commission to spend its time reconstructing the exact details of which system caused which other system to malfunction at which millisecond, after the Space Shuttle had already started exploding. The thing was hosed at that point.

Still, even after the 80s and 90s, there remained deep open questions about the eventual shape of the climate crisis, and foremost among them was: how do you get people to stop talking about this crisis in the language of intellectual hypotheticals and meaningless virtue-signalling gestures and “those crazy scientists, who knows what they’ll say tomorrow”? How does one get people to revert to a more ancient language, the one that was used to win WWII for example, which speaks of courage and duty and heroism and defiance in the jaws of death?

Greta’s origin story—the one where the autistic girl spends months so depressed over climate inaction that she can’t eat or leave her room, until finally, no longer able to bear the psychic burden, she ditches school and carries a handmade protest sign to the front of the Swedish parliament—is not merely a prerequisite to a real contribution. It is Greta’s real contribution (so far anyway), and by that I don’t mean to diminish it. The idea was “trivial,” yes, but only in the sense that the wheel, Arabic numerals, or “personal computers will be important” were trivial ideas. Greta modeled for the rest of the world how they, too, would probably feel about climate change were they able to sync up their lizard brains with their higher brains … and crucially, a substantial segment of the world was already primed to agree with her. But it needed to see one example of a successful sync between the science and the emotions appropriate to the science, as a crystal needs a seed.

The thesis of Rocketman is that Elton John’s great achievement was not only to invent a new character, but actually to become that character, since only by successfully fusing the two could he touch the emotions of the masses. In a similar way, Greta Thunberg’s great accomplishment of her short life has been to make herself into the human race’s first Greta Thunberg.

Monday January 20

Happy 7th birthday to my daughter Lily!  (No, I didn’t miss her birthday party.  We did it on the 18th, right before I flew out.)

I think my goals for Davos have been downgraded from delivering a message of peace and nerd liberation to the world’s powerful, or even getting a selfie with Greta, to simply taking in a week in an environment that’s so alien to me.

Everything in Davos is based on a tiered system of badges, which determine which buildings you can get into to participate in the sessions.  I have a white badge, the highest tier, which would’ve set me back around $71,000 had WEF not thankfully waived its fees for academics.  I should mention that I’m also extremely underdressed compared to most of the people here, and that I spent much of my time today looking for free food.  It turns out that there’s pretty copious and excellent free food, although the sponsors sometimes ask you to leave your business card before you take any.  I don’t have a business card.

The above, for me, represents the true spirit of Davos: a conference at a Swiss ski resort that costs $71,000 to attend, held on behalf of the ideal of human equality.

But maybe I shouldn’t scoff.  I learned today about a war between Greece and Turkey that was averted only because the heads of the two countries talked it over at Davos, so that’s cool.  At the opening ceremony today, besides a beautiful orchestral rendition of “Ode to Joy,” there were a bunch of speeches about how Davos pioneered the entire concept of corporate social responsibility.  I suppose the critics might say instead that Davos pioneered the concept of corporate whitewashing—as with the wall-sized posters that I saw this afternoon, wherein a financial services corporation showcased a diverse cast of people each above their preferred pronouns (he/him, she/her, they/them).  Amazing how pronouns make everything woke and social-justicey!  I imagine that the truth is somewhere between these visions.  Just like the easiest way for NASA to fake a moon landing was actually to send humans to the moon, sometimes the easiest way to virtue-signal is actually to become more virtuous.

Tonight I went to a reception specifically for the academics at Davos.  There, for the first time since my arrival, I saw people who I knew (Shafi Goldwasser, Neha Narula…), and met someone who I’d known by reputation (Brian Schmidt, who shared the Nobel Prize in Physics for the discovery of dark energy).  But even the people who I didn’t know were clearly “my people,” with familiar nerdy mannerisms and interests, and in some cases even a thorough knowledge of SlateStarCodex references.  Imagine visiting a foreign country where no one spoke your language, then suddenly stumbling on the first ones who did.  I found it a hundred times easier than at the main conference to strike up conversations.

Oh yeah, quantum computing.  This afternoon I hosted three roundtable discussions about quantum computing, which were fun and stress-free — I spent much more of my mental energy today figuring out the shuttle buses.  If you’re a regular reader of this blog or my popular articles, or a watcher of my talks on YouTube, etc., then congratulations: you’ve gotten the same explanations of quantum computing for free that others may have paid $71,000 apiece to hear!  Tomorrow are my two “real” quantum computing sessions, as well as the speeches by both the Donald and the Greta (the latter being the much hotter ticket).  So it’s a big day, which I’ll tell you about after it’s happened. Stay tuned!

Tuesday January 21

PsiQuantum’s Jeremy O’Brien and I did the Davos quantum computing panel this morning (moderated by Jennifer Schenker). You can watch our 45-minute panel here. For regular readers of this blog, the territory will be familiar, but I dunno, I hope someone enjoys it anyway!

I’m now in the Congress Hall, in a seat near the front, waiting for Trump to arrive. I will listen to the President of the United States and not attract the Secret Service’s attention by loudly booing, but I have no intention to stand or applaud either.

Alas, getting a seat at Greta’s talk is looking like it will be difficult or impossible.

I was struck by the long runup to Trump’s address: the President of Switzerland gave a searing speech about the existential threats of climate change and ecosystem destruction, and “the politicians in many nations who appeal to fear and bigotry”—never mentioning Trump by name but making clear that she despised the entire ideology of the man people had come to hear. I thought it was a nice touch. Then some technicians spent 15 minutes adjusting Trump’s podium, then nothing happened for 20 minutes as we all waited for a tardy Trump, then some traditional Swiss singers did a performance on stage (!), and finally Klaus Schwab, director of the WEF, gave Trump a brief and coldly cordial introduction, joking about the weather in Davos.

And … now Trump is finally speaking. Once he starts, I suddenly realize that I have no idea what new insight I expected from this. He’s giving his standard stump speech, America has regained its footing after the disaster of the previous administration, winning like it’s never won before, unemployment is the lowest in recorded history, blah blah blah. I estimate that less than half of the audience applauded Trump’s entrance; the rest sat in stony silence. Meanwhile, some people were passing out flyers to the audience documenting all the egregious errors in Trump’s economic statistics.

Given the small and childish nature of the remarks (“we’re the best! ain’t no one gonna push us around!”), it feels somehow right to be looking down at my phone, blogging, rather than giving my undivided attention to the President of the United States speaking 75 feet in front of me.

Ok, I admit I just looked up, when Trump mentioned America’s commitment to developing new technologies like “5G and quantum computing” (he slowly drew out the word “quantum”).

His whole delivery is strangely lethargic, as if he didn’t sleep well last night (I didn’t either).

Trump announced that the US would be joining the WEF’s “1 trillion trees” environmental initiative, garnering the only applause in his speech. But he then immediately pivoted to a denunciation of the “doomsayers and pessimists and socialists who want to control our lives and take away our liberty” (he presumably meant people worried about climate change).

Now, I kid you not, Trump is expanding on his “optimism” theme by going on and on about the architectural achievements of Renaissance Florence.

You can watch Trump’s speech for yourself here.

While I wasn’t able to get in to see Greta Thunberg in person, you can watch her (along with others) here. I learned that her name is pronounced “toon-berg.”

Having now listened to Greta’s remarks, I confess that I disagree with the content of what she says.  She explicitly advocates a sort of purity-based carbon absolutism—demanding that companies and governments immediately implement, not merely net zero emissions (i.e. offsetting their emissions by paying to plant trees and so forth), but zero emissions period.  Since she can’t possibly mean literally zero, I’ll interpret her to mean close to zero.  Even so, it seems to me that the resulting economic upheavals would provoke a massive backlash against whoever tried to enforce such a policy.  Greta also dismisses the idea of technological solutions to climate change, saying that we don’t have time to invent such solutions.  But of course, some of the solutions already exist—a prime example being nuclear power.  And if we no longer have time to nuclearize the world, then to a great extent, that’s the fault of the antinuclear activists—an unbelievable moral and strategic failure that may have doomed our civilization, and for which there’s never been a reckoning.

Despite all my disagreements, if Greta’s strident, uncompromising rhetoric helps push the world toward cutting emissions, then she’ll have to be counted as one of the greatest people who ever lived. Of course, another possibility is the world’s leaders will applaud her and celebrate her moral courage, while not taking anything beyond token actions.

Wednesday January 22

Alas, I’ve come down with a nasty cold (is there any other kind?).  So I’m paring back my participation in the rest of Davos to the stuff that really interests me.  The good news is that my quantum computing sessions are already finished!

This morning, as I sat in the lobby of the Congress Centre checking my email and blowing my nose, I noticed some guy playing a cello nearby.  Dozens were gathered around him — so many that I could barely see the guy, only hear the music.  After he was finished, I worked up the courage to ask someone what the fuss was about.  Turns out that the guy was Yo-Yo Ma.

The Prince Regent of Liechtenstein was explaining to one of my quantum computing colleagues that Liechtenstein does not have much in the way of quantum.

Speaking of princes, I’m now at a cybersecurity session with Shafi Goldwasser and others, at which the attendance might be slightly depressed because it’s up against Prince Charles. That’s right: Davos is the conference where the heir apparent to the British throne speaks in a parallel session.

I’ve realized these past few days that I’m not very good at schmoozing with powerful people.  On the other hand, it’s possible that my being bad at it is a sort of mental defense mechanism.  The issue is that, the more I became a powerful “thought leader” who unironically used phrases like “Fourth Industrial Revolution” or “disruptive innovation,” the more I used business cards and LinkedIn to expand my network of contacts or checked my social media metrics … well, the less I’d be able to do the research that led to stuff like being invited here in the first place.  I imagine that many Davos regulars started out as nerds like me, and that today, coming to Davos to talk about “disruptive innovation” is a fun kind of semi-retirement.  If so, though, I’m not ready to retire just yet!  I still want to do things that are new enough that they don’t need to be described using multiple synonyms for newness.

Apparently one of the hottest tickets at Davos is a post-Forum Shabbat dinner, which used to be frequented by Shimon Peres, Elie Wiesel, etc.  Alas, not having known about it, I already planned my travel in a way that won’t let me attend it.  I feel a little like the guy in this Onion article.

I had signed up for a session entitled What’s At Stake: The Arctic, featuring Al Gore. As I waited for them to start letting people in, I suddenly realized that Al Gore was standing right next to me. However, he was engrossed in conversation with a young woman, and even though I assumed she was just some random fan like I was, I didn’t work up the courage to interrupt them. Only once the panel had started, with the woman on it two seats from Gore, did I realize that she was Sanna Marin, the new Prime Minister of Finland (and at 34, the world’s second-youngest head of state).

You can watch the panel here. Briefly, the Arctic has lost about half of its ice cover, not merely since preindustrial times but since a few decades ago. And this is not only a problem for polar bears. It’s increasing the earth’s absorption of sunlight and hence significantly accelerating global warming, and it’s also screwing up weather patterns all across the northern hemisphere. Of course, the Siberian permafrost is also thawing and releasing greenhouse gases that are even worse than CO2, further accelerating the wonderful feedback loop of doom.

I thought that Gore gave a masterful performance. He was in total command of the facts—discoursing clearly and at length on the relative roles of CO2, SO2, and methane in the permafrost as well as the economics of oil extraction, less in the manner of a thundering (or ‘thunberging’?) prophet than in the manner of an academic savoring all the non-obvious twists as he explains something to a colleague—and his every response to the other panelists was completely on point.

In 2000, there was indeed a bifurcation of the universe, and we ended up in a freakishly horrible branch. Instead of something close to the best, most fact-driven US president one could conjure in one’s mind, we got something close to the worst, and then, after an 8-year interregnum just to lull us into complacency, we got something even worse than the worst.

The other panelists were good too. Gail Whiteman (the scientist) had the annoying tic of starting sentence after sentence with “the science says…,” but then did a good job of summarizing what the science does say about the melting of the Arctic and the permafrost.

Alas, rather than trying to talk to Gore, immediately after the session ended, I headed back to my hotel to go to sleep. Why? Partly because of my cold. But partly also because of an incident immediately before the panel. I was sitting in the front row, next to an empty seat, when a woman who wanted to occupy that seat hissed at me that I was “manspreading.”

If, on these narrow seats packed so tightly together that they were basically a bench, my left leg had strayed an inch over the line, I would’ve addressed the situation differently: for example, “oh hello, may I sit here?” (At which point I would’ve immediately squeezed in.) Amazingly, the woman didn’t seem to care that a different woman, the one to my right, kept her pocketbook and other items on the seat next to her throughout the panel, preventing anyone else from using the seat in what was otherwise a packed house. (Is that “womanspreading”?)

Anyway, the effect of her comment was to transform the way I related to the panel. I looked around at the audience and thought: “these activists, who came to hear a panel on climate change, are fighting for a better world. And in their minds, one of the main ways that the world will be better is that it won’t contain sexist, entitled ‘manspreaders’ like me.”

In case any SneerClubbers are reading, I should clarify that I recognize an element of the irrational in these thoughts. I’m simply reporting, truthfully, that they’re what bubbled up outside the arena of conscious control. But furthermore, I feel like the fact that my brain works this way might give me some insight into the psychology of Trump support that few Democrats share—so much that I wonder if I could provide useful service as a Democratic political consultant!

I understand the mindset that howls: “better that every tree burn to the ground, every fish get trawled from the ocean, every coastal city get flooded out of existence, than that these sanctimonious hypocrites ‘on the right side of history,’ singing of their own universal compassion even as they build a utopia with no place for me in it, should get to enjoy even a second of smug self-satisfaction.” I hasten to add that I’ve learned how to override that mindset with a broader, better mindset: I can jump into the abyss, but I can also climb back out, and I can even look down at the abyss from above and report what’s there. It’s as if I’d captured some virulent strain of Ebola in a microbiology lab of the soul. And if nearly half of American voters (more in crucial swing states) have gotten infected with that Ebola strain, then maybe my lab work could have some broader interest.

I thought about Scott Minerd, the investor on the panel, who became a punching bag for the other panelists (except for Gore, a politician in a good sense, who went out of his way to find points of agreement). In his clumsy way, Minerd was making the same point that climate activists themselves correctly make: namely, that the oil companies need to be incentivized (for example, through a carbon tax) to leave reserves in the ground, that we can’t just trust them to do the noble thing and write off their own assets. But for some reason, Minerd presented himself as a greedy fat-cat, raining on the dreams of the hippies all around him for a carbon-free future, so then that’s how the other panelists duly treated him (except, again, for Gore).

But I looked at the audience, which was cheering attacks on Minerd, and the Ebola in my internal microbiology lab said: “the way these activists see Scott Minerd is not far from how they see Scott Aaronson. You’ll never be good enough for them. The people in this room might or might not succeed at saving the world, but in any case they don’t want your help.”

After all, what was the pinnacle of my contribution to saving the world? It was surely when I was 19, and created a website to defend the practice of NaderTrading (i.e., Ralph Nader supporters in swing states voting for Al Gore, while Gore supporters in safe states pledged to vote Nader on their behalf). Alas, we failed. We did help arrange a few thousand swaps, including a few hundred swaps in Florida, but it was 538 too few. We did too little, too late.

So what would I have talked to Gore about, anyway? Would I have reminded him of the central tragedy of his life, which was also a central tragedy of recent American history, just in order to babble, or brag, about a NaderTrading website that I made half a lifetime ago? Would I have made up a post-hoc rationalization for why I work on quantum computing, like that I hope it will lead to the discovery of new carbon-capture methods? Immediately after Gore’s eloquent brief for the survival of the Arctic and all life on earth, would I have asked him for an autograph or a selfie? No, better to just reflect on his words. At a crucial pivot point in history, Gore failed by a mere 538 votes, and I also failed to prevent the failure. But amazingly, Gore never gave up (he just kept on fighting for what he knew civilization needed to do), and yesterday I sat a few feet away while he explained why the rest of us shouldn’t give up either. And he’s right about this—if not in the sense of the outlook being especially hopeful or encouraging right now, then surely in the sense of which attitude is the useful one to adopt. And my attitude, which you might call “Many-Worlds-inflected despair,” might be epistemically sound but it definitely wasn’t useful. What further clarifications did I need?

Thursday January 23

I attended a panel discussion on quantum computing hosted by IBM. The participants were Thomas Friedman (the New York Times columnist), Arvind Krishna (a senior Vice President at IBM), Raoul Klingner (director of a European research organization), and Alison Snyder (the managing editor of Axios magazine). There were about 100 people in the audience, more than at all of my Davos quantum computing sessions combined. I sat right in front, although I don’t think anyone on the panel recognized me.

Ginni Rometty, the CEO of IBM, gave an introduction. She said that quantum will change the world by speeding up supply-chain and other optimization problems. I assume she was talking about the Grover speedup? She also said that IBM is committed to delivering value for its customers, rather than “things you can do in two seconds that are not commercially valid” (I assume she meant Google’s supremacy experiment). She asked for a show of hands of who knows absolutely nothing about the science behind quantum computing. She then quipped, “well, that’s all of you!” She may have missed two hands that hadn’t gone up (both belonging to the same person).

I accepted an invitation to this session firstly for the free lunch (which turned out to be delicious), and secondly because I was truly, genuinely curious to hear what Thomas Friedman, many of whose columns I’ve liked, had to teach me about quantum computing. The answer turns out to be this: in his travels around the world over the past 6 years, Friedman has witnessed firsthand how the old dichotomy between right-wing parties and left-wing parties is breaking down everywhere (I assume he means, as both sides get taken over by populist movements?). And this is just like how a qubit breaks down the binary dichotomy between 0’s and 1’s! Also, the way a quantum computer can be in multiple states at once, is like how the US now has to be in multiple states at once in its relationship with China.

Friedman opened his remarks by joking about how he never took a single physics course, and had no idea why he was on a quantum computing panel at all. He quickly added, though, that he toured IBM’s QC labs, where he found IBM’s leaders to be wonderful explainers of what it all means.

I’ll note that Friedman, the politics and Middle East affairs writer — not the two panelists serving the role of quantum experts — was the only one who mentioned, even in passing, the idea that the advantage of QCs depends on something called “constructive interference.”

Krishna, the IBM Vice President, explained why IBM rejects the entire concept of “quantum supremacy”: because it’s an irrelevant curiosity, and creating value for customers in the marketplace (for example by solving their supply-chain optimization problems) is the only test that matters. No one on the panel expressed a contrary view.

Later, Krishna explained why quantum computers will never replace classical computers: because if you stored your bank balance on a quantum computer, one day you’d have $1, the next day $1000, the day after that $1 again, and so forth! He explained how, where current supercomputers use the same amount of energy needed to power all of Davos to train machine learning models, quantum computers would use less than the energy needed to power a single house. New algorithms do need to be designed to run neural networks quantumly, but fortunately that’s all being done as we speak.

I got the feeling that the businesspeople who came to this session felt like they got a lot more out of it than the businesspeople who came to my and Jeremy O’Brien’s session felt like they got out of ours. After all, this session got across some big real-world takeaways—e.g., that if you don’t quantum, your business will be left in the dust, stuck with a single value at a time rather than exploring all values in parallel, and IBM can help you rather than your competitors win the quantum race. It didn’t muddy the message with all the incomprehensible technicalities about how QCs only give exponential speedups for problems with special structure.

Later Update:

Tonight I went to a Davos reception hosted by the government of Canada (🇨🇦). I’m not sure why exactly they invited me, although I have of course enjoyed a couple years of life “up north” (well, in Waterloo, so actually further south than a decent chunk of the US … you see that I do have a tiny speck of a Canadian in me?).

I didn’t recognize a single person at the reception. So I just ate the food, drank beer, and answered emails. But then a few people did introduce themselves (two who recognized me, one who didn’t). As they gathered around, they started asking me questions about quantum computing: is it true that QCs could crack the classically impossible Traveling Salesman Problem? That they try all possible answers in parallel? Are they going to go commercial in 2-5 years, or have they already?

It might have been the beer, but for some reason I decided to launch an all-out assault of truth bombs, one after the next, with what they might have considered a somewhat emotional delivery.

OK fine, it wasn’t the beer. That’s just who I am.

And then, improbably, I was a sort of localized “life of the party” — although possibly for the amusement / novelty value of my rant more than for the manifest truth of my assertions. One person afterward told me that it was by far the most useful conversation he’d had at Davos.

And I replied: I’m flattered by your surely inflated praise, but in truth I should also thank you. You caught me at a moment when I’d been thinking to myself that, if only I could make one or two people’s eyes light up with comprehension about the fallacy of a QC simply trying all possible answers in parallel and then magically picking the best one, or about the central role of amplitudes and interference, or about the “merely” quadratic nature of the Grover speedup, or about the specialized nature of the most dramatic known applications for QCs, or about the gap between where the experimentalists are now and what’s needed for error correction and hence true scalability, or about the fact that “quantum supremacy” is obviously not a sufficient condition for a QC to be useful, but it’s equally obviously a necessary condition, or about the fact that doing something “practical” with a QC is of very little interest unless the task in question is actually harder for classical computers, which is a question of great subtlety … I say, if I could make only two or four eyes light up with comprehension of these things, then on that basis alone I could declare that the whole trip to Davos was worth it.

And then one of the people hugged me … and that was the coolest thing that happened to me today.

Friday January 24

I attended a second session with Al Gore, about the problem of the world filling up with plastic. I learned that the world’s plastic waste is set to double over the next 15-20 years, and that a superb solution (indeed, it seems like a crime that it hasn’t been implemented already) would be to set up garbage booms at the mouths of a few major rivers, which carry something like 80% of the plastic waste that reaches the ocean.

Anyway, still didn’t introduce myself.

I wrote before about how surprisingly clear and logical the trains to Davos were, even with multiple changes. Unfortunately God’s mercy on me didn’t last. All week, I kept getting lost in warren-like buildings with dozens of “secret passageways” (often literally behind unmarked doors) and few signs—not even exit signs. In one case I missed a tram that was the only way out from somewhere because I arrived at the wrong side of the tram—and getting to the right side required entering a building and navigating another unmarked labyrinth, by which point the tram had already left. In another case, I wandered through a Davos hotel for almost an hour trying to find an exit, ricocheting like a pinball off person after person giving me conflicting directions. Only after I literally started ranting to a crowd: “holy f-ck, is this place some psychological torture labyrinth designed by Franz Kafka? Am I the only one? Is it clear to all of you? Please, WHERE IS THE F-CKING EXIT???” did some local finally take pity and walk me through the maze. As I mentioned earlier, logistical issues like these made me about 5,000 times more anxious on this trip than the prospect of giving quantum computing talks to the world’s captains of industry. I don’t recall having had a nightmare about lecturing even once—but I’ve had never-ending nightmares about failing to show up to give a lecture because I’m wandering endlessly through an airport or a research center or whatever, always the only one who’s lost.

Terence TaoEquidistribution of Syracuse random variables and density of Collatz preimages

Define the Collatz map {\mathrm{Col}: {\bf N}+1 \rightarrow {\bf N}+1} on the natural numbers {{\bf N}+1 = \{1,2,\dots\}} by setting {\mathrm{Col}(N)} to equal {3N+1} when {N} is odd and {N/2} when {N} is even, and let {\mathrm{Col}^{\bf N}(N) := \{ N, \mathrm{Col}(N), \mathrm{Col}^2(N), \dots \}} denote the forward Collatz orbit of {N}. The notorious Collatz conjecture asserts that {1 \in \mathrm{Col}^{\bf N}(N)} for all {N \in {\bf N}+1}. Equivalently, if we define the backwards Collatz orbit {(\mathrm{Col}^{\bf N})^*(N) := \{ M \in {\bf N}+1: N \in \mathrm{Col}^{\bf N}(M) \}} to be all the natural numbers {M} that encounter {N} in their forward Collatz orbit, then the Collatz conjecture asserts that {(\mathrm{Col}^{\bf N})^*(1) = {\bf N}+1}. As a partial result towards this latter statement, Krasikov and Lagarias in 2003 established the bound

\displaystyle  \# \{ N \leq x: N \in (\mathrm{Col}^{\bf N})^*(1) \} \gg x^\gamma \ \ \ \ \ (1)

for all {x \geq 1} and {\gamma = 0.84}. (This improved upon previous values of {\gamma = 0.81} obtained by Applegate and Lagarias in 1995, {\gamma = 0.65} by Applegate and Lagarias in 1995 by a different method, {\gamma=0.48} by Wirsching in 1993, {\gamma=0.43} by Krasikov in 1989, and some {\gamma>0} by Crandall in 1978.) This is still the largest value of {\gamma} for which (1) has been established. Of course, the Collatz conjecture would imply that we can take {\gamma} equal to {1}, which is the assertion that a positive density set of natural numbers obeys the Collatz conjecture. This is not yet established, although the results in my previous paper do at least imply that a positive density set of natural numbers iterates to an (explicitly computable) bounded set, so in principle the {\gamma=1} case of (1) could now be verified by an (enormous) finite computation in which one verifies that every number in this explicit bounded set iterates to {1}. In this post I would like to record a possible alternate route to this problem that depends on the distribution of a certain family of random variables that appeared in my previous paper, that I called Syracuse random variables.
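To make the objects in (1) concrete, here is a minimal Python sketch (an illustration only, not part of the argument) of the Collatz map and a brute-force count of the left-hand side of (1) for small {x}. The function names and the step cutoff `max_steps` are assumptions introduced for this example; a `False` from `reaches_one` only means the cutoff was hit, not that the orbit avoids {1}.

```python
def col(n: int) -> int:
    """One step of the Collatz map on the positive integers."""
    return 3 * n + 1 if n % 2 else n // 2


def reaches_one(n: int, max_steps: int = 10_000) -> bool:
    """Return True if the forward orbit of n hits 1 within max_steps steps.

    max_steps is an arbitrary cutoff chosen for this sketch.
    """
    for _ in range(max_steps):
        if n == 1:
            return True
        n = col(n)
    return n == 1


def count_reaching_one(x: int) -> int:
    """Count N <= x whose forward orbit contains 1 (the left side of (1))."""
    return sum(reaches_one(n) for n in range(1, x + 1))
```

Such a count can only confirm the conjecture on a finite range; the content of (1) is the growth rate in {x} that can actually be proven.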

Definition 1 (Syracuse random variables) For any natural number {n}, a Syracuse random variable {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} on the cyclic group {{\bf Z}/3^n{\bf Z}} is defined as a random variable of the form

\displaystyle  \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) = \sum_{m=1}^n 3^{n-m} 2^{-{\mathbf a}_m-\dots-{\mathbf a}_n} \ \ \ \ \ (2)

where {\mathbf{a}_1,\dots,\mathbf{a}_n} are independent copies of a geometric random variable {\mathbf{Geom}(2)} on the natural numbers with mean {2}, thus

\displaystyle  \mathop{\bf P}( \mathbf{a}_1=a_1,\dots,\mathbf{a}_n=a_n) = 2^{-a_1-\dots-a_n}

for {a_1,\dots,a_n \in {\bf N}+1}. In (2) the arithmetic is performed in the ring {{\bf Z}/3^n{\bf Z}}.

Thus for instance

\displaystyle  \mathbf{Syrac}({\bf Z}/3{\bf Z}) = 2^{-\mathbf{a}_1} \hbox{ mod } 3

\displaystyle  \mathbf{Syrac}({\bf Z}/3^2{\bf Z}) = 2^{-\mathbf{a}_1-\mathbf{a}_2} + 3 \times 2^{-\mathbf{a}_2} \hbox{ mod } 3^2

\displaystyle  \mathbf{Syrac}({\bf Z}/3^3{\bf Z}) = 2^{-\mathbf{a}_1-\mathbf{a}_2-\mathbf{a}_3} + 3 \times 2^{-\mathbf{a}_2-\mathbf{a}_3} + 3^2 \times 2^{-\mathbf{a}_3} \hbox{ mod } 3^3

and so forth. One could also view {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} as the mod {3^n} reduction of a {3}-adic random variable

\displaystyle  \mathbf{Syrac}({\bf Z}_3) = \sum_{m=1}^\infty 3^{m-1} 2^{-{\mathbf a}_1-\dots-{\mathbf a}_m}.

The probability density function {x \mapsto \mathbf{P}( \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) = x )} of the Syracuse random variable can be explicitly computed by a recursive formula (see Lemma 1.12 of my previous paper). For instance, when {n=1}, {\mathbf{P}( \mathbf{Syrac}({\bf Z}/3{\bf Z}) = x )} is equal to {0,1/3,2/3} for {x=0,1,2 \hbox{ mod } 3} respectively, while when {n=2}, {\mathbf{P}( \mathbf{Syrac}({\bf Z}/3^2{\bf Z}) = x )} is equal to

\displaystyle  0, \frac{8}{63}, \frac{16}{63}, 0, \frac{11}{63}, \frac{4}{63}, 0, \frac{2}{63}, \frac{22}{63}

when {x=0,\dots,8 \hbox{ mod } 9} respectively.
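These values can be reproduced by a short exact computation. The Python sketch below does not use the recursive formula of Lemma 1.12; instead it assumes the elementary recursion {X_n = 2^{-\mathbf{a}}(1 + 3 X_{n-1}) \hbox{ mod } 3^n}, obtained by factoring {2^{-\mathbf{a}_1}} out of the first {n} terms of the {3}-adic series above, with {\mathbf{a}} an independent copy of {\mathbf{Geom}(2)}. It also uses the fact that {2^{-a} \hbox{ mod } 3^n} depends only on {a} modulo {\phi(3^n) = 2 \times 3^{n-1}}. The function names are hypothetical.

```python
from fractions import Fraction


def geom_mod(period: int) -> dict:
    """P(a = r mod period) for a ~ Geom(2), i.e. P(a = k) = 2^-k for k >= 1.

    Keys are r = 1, ..., period (r = period stands for the class of 0).
    """
    denom = 2**period - 1
    # The sum of 2^-(r + period*t) over t >= 0 is 2^(period - r) / (2^period - 1).
    return {r: Fraction(2 ** (period - r), denom) for r in range(1, period + 1)}


def syrac_pmf(n: int) -> dict:
    """Exact probabilities P(Syrac(Z/3^n Z) = x), as a dict x -> probability.

    Residues that occur with probability zero are simply absent from the dict.
    """
    pmf = {0: Fraction(1)}  # n = 0: empty sum in the trivial ring
    for level in range(1, n + 1):
        modulus = 3**level
        period = 2 * 3 ** (level - 1)  # Euler phi(3^level)
        inv2 = pow(2, -1, modulus)     # requires Python 3.8+
        new_pmf = {}
        for r, p_a in geom_mod(period).items():
            scale = pow(inv2, r, modulus)  # 2^-a mod 3^level
            for y, p_y in pmf.items():
                x = scale * (1 + 3 * y) % modulus
                new_pmf[x] = new_pmf.get(x, Fraction(0)) + p_a * p_y
        pmf = new_pmf
    return pmf
```

For example, `syrac_pmf(2)` should return the six nonzero values listed above, with the multiples of {3} absent.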

The relationship of these random variables to the Collatz problem can be explained as follows. Let {2{\bf N}+1 = \{1,3,5,\dots\}} denote the odd natural numbers, and define the Syracuse map {\mathrm{Syr}: 2{\bf N}+1 \rightarrow 2{\bf N}+1} by

\displaystyle  \mathrm{Syr}(N) := \frac{3N+1}{2^{\nu_2(3N+1)}}

where the {2}-valuation {\nu_2(3N+1) \in {\bf N}} is the number of times {2} divides {3N+1}. We can define the forward orbit {\mathrm{Syr}^{\bf N}(N)} and backward orbit {(\mathrm{Syr}^{\bf N})^*(N)} of the Syracuse map as before. It is not difficult to then see that the Collatz conjecture is equivalent to the assertion {(\mathrm{Syr}^{\bf N})^*(1) = 2{\bf N}+1}, and that the assertion (1) for a given {\gamma} is equivalent to the assertion

\displaystyle  \# \{ N \leq x: N \in (\mathrm{Syr}^{\bf N})^*(1) \} \gg x^\gamma \ \ \ \ \ (3)

for all {x \geq 1}, where {N} is now understood to range over odd natural numbers. A brief calculation then shows that for any odd natural number {N} and natural number {n}, one has

\displaystyle  \mathrm{Syr}^n(N) = 3^n 2^{-a_1-\dots-a_n} N + \sum_{m=1}^n 3^{n-m} 2^{-a_m-\dots-a_n}

where the natural numbers {a_1,\dots,a_n} are defined by the formula

\displaystyle  a_i := \nu_2( 3 \mathrm{Syr}^{i-1}(N) + 1 ),

so in particular

\displaystyle  \mathrm{Syr}^n(N) = \sum_{m=1}^n 3^{n-m} 2^{-a_m-\dots-a_n} \hbox{ mod } 3^n.

Heuristically, one expects the {2}-valuation {a = \nu_2(N)} of a typical even number {N} to be approximately distributed according to the geometric distribution {\mathbf{Geom}(2)}, so one therefore expects the residue class {\mathrm{Syr}^n(N) \hbox{ mod } 3^n} to be distributed approximately according to the random variable {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}.
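The "brief calculation" behind the formula for {\mathrm{Syr}^n(N)} can be checked numerically. The following Python sketch (the function names are hypothetical, and this is a sanity check rather than a proof) iterates the Syracuse map, records the valuations {a_i}, and compares the result with the closed form, both as an exact rational identity and modulo {3^n}.

```python
from fractions import Fraction


def nu2(m: int) -> int:
    """2-adic valuation: the number of times 2 divides the positive integer m."""
    return (m & -m).bit_length() - 1


def syr(n: int) -> int:
    """Syracuse map on odd positive integers: (3n + 1) / 2^nu2(3n + 1)."""
    m = 3 * n + 1
    return m >> nu2(m)


def iterate_with_valuations(start: int, n: int):
    """Return (Syr^n(start), [a_1, ..., a_n]), a_i = nu2(3 Syr^(i-1)(start) + 1)."""
    cur, vals = start, []
    for _ in range(n):
        vals.append(nu2(3 * cur + 1))
        cur = syr(cur)
    return cur, vals


def closed_form(start: int, vals: list) -> Fraction:
    """3^n 2^-(a_1+...+a_n) N + sum_m 3^(n-m) 2^-(a_m+...+a_n), as a rational."""
    n = len(vals)
    total = Fraction(3**n * start, 2 ** sum(vals))
    for m in range(1, n + 1):
        total += Fraction(3 ** (n - m), 2 ** sum(vals[m - 1:]))
    return total


def residue_mod_3n(vals: list) -> int:
    """sum_m 3^(n-m) 2^-(a_m+...+a_n), with 2 inverted in the ring Z/3^n Z."""
    n = len(vals)
    modulus = 3**n
    inv2 = pow(2, -1, modulus)  # requires Python 3.8+
    return sum(3 ** (n - m) * pow(inv2, sum(vals[m - 1:]), modulus)
               for m in range(1, n + 1)) % modulus
```

For an odd `start` and `n >= 1`, one expects `closed_form(start, vals)` to equal the iterate exactly and `residue_mod_3n(vals)` to equal the iterate modulo {3^n}.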

The Syracuse random variables {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} will always avoid multiples of three (this reflects the fact that {\mathrm{Syr}(N)} is never a multiple of three), but attains any non-multiple of three in {{\bf Z}/3^n{\bf Z}} with positive probability. For any natural number {n}, set

\displaystyle  c_n := \inf_{b \in {\bf Z}/3^n{\bf Z}: 3 \not | b} \mathbf{P}( \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) = b ).

Equivalently, {c_n} is the greatest quantity for which we have the inequality

\displaystyle  \sum_{(a_1,\dots,a_n) \in S_{n,N}} 2^{-a_1-\dots-a_n} \geq c_n \ \ \ \ \ (4)

for all integers {N} not divisible by three, where {S_{n,N} \subset ({\bf N}+1)^n} is the set of all tuples {(a_1,\dots,a_n)} for which

\displaystyle  N = \sum_{m=1}^n 3^{m-1} 2^{-a_1-\dots-a_m} \hbox{ mod } 3^n.

Thus for instance {c_0=1}, {c_1 = 1/3}, and {c_2 = 2/63}. On the other hand, since all the probabilities {\mathbf{P}( \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) = b)} sum to {1} as {b \in {\bf Z}/3^n{\bf Z}} ranges over the non-multiples of {3}, we have the trivial upper bound

\displaystyle  c_n \leq \frac{3}{2} 3^{-n}.

There is also an easy submultiplicativity result:

Lemma 2 For any natural numbers {n_1,n_2}, we have

\displaystyle  c_{n_1+n_2-1} \geq c_{n_1} c_{n_2}.

Proof: Let {N} be an integer not divisible by {3}, then by (4) we have

\displaystyle  \sum_{(a_1,\dots,a_{n_1}) \in S_{n_1,N}} 2^{-a_1-\dots-a_{n_1}} \geq c_{n_1}.

If we let {S'_{n_1,N}} denote the set of tuples {(a_1,\dots,a_{n_1-1})} that can be formed from the tuples in {S_{n_1,N}} by deleting the final component {a_{n_1}} from each tuple, then we have

\displaystyle  \sum_{(a_1,\dots,a_{n_1-1}) \in S'_{n_1,N}} 2^{-a_1-\dots-a_{n_1-1}} \geq c_{n_1}. \ \ \ \ \ (5)

Next, observe that if {(a_1,\dots,a_{n_1-1}) \in S'_{n_1,N}}, then

\displaystyle  N = \sum_{m=1}^{n_1-1} 3^{m-1} 2^{-a_1-\dots-a_m} + 3^{n_1-1} 2^{-a_1-\dots-a_{n_1-1}} M

with {M = M_{N,n_1,a_1,\dots,a_{n_1-1}}} an integer not divisible by three. By definition of {S_{n_2,M}} and a relabeling, we then have

\displaystyle  M = \sum_{m=1}^{n_2} 3^{m-1} 2^{-a_{n_1}-\dots-a_{m+n_1-1}} \hbox{ mod } 3^{n_2}

for all {(a_{n_1},\dots,a_{n_1+n_2-1}) \in S_{n_2,M}}. For such tuples we then have

\displaystyle  N = \sum_{m=1}^{n_1+n_2-1} 3^{m-1} 2^{-a_1-\dots-a_m} \hbox{ mod } 3^{n_1+n_2-1}

so that {(a_1,\dots,a_{n_1+n_2-1}) \in S_{n_1+n_2-1,N}}. Since

\displaystyle  \sum_{(a_{n_1},\dots,a_{n_1+n_2-1}) \in S_{n_2,M}} 2^{-a_{n_1}-\dots-a_{n_1+n_2-1}} \geq c_{n_2}

for each {M}, the claim follows. \Box

From this lemma we see that {c_n = 3^{-\beta n + o(n)}} for some absolute constant {\beta \geq 1}. Heuristically, we expect the Syracuse random variables to be somewhat approximately equidistributed amongst the non-multiples of {3} in {{\bf Z}/3^n{\bf Z}} (in Proposition 1.4 of my previous paper I prove a fine scale mixing result that supports this heuristic). As a consequence it is natural to conjecture that {\beta=1}. I cannot prove this, but I can show that this conjecture would imply that we can take the exponent {\gamma} in (1), (3) arbitrarily close to one:

Proposition 3 Suppose that {\beta=1} (that is to say, {c_n = 3^{-n+o(n)}} as {n \rightarrow \infty}). Then

\displaystyle  \# \{ N \leq x: N \in (\mathrm{Syr}^{\bf N})^*(1) \} \gg x^{1-o(1)}

as {x \rightarrow \infty}, or equivalently

\displaystyle  \# \{ N \leq x: N \in (\mathrm{Col}^{\bf N})^*(1) \} \gg x^{1-o(1)}

as {x \rightarrow \infty}. In other words, (1), (3) hold for all {\gamma < 1}.

I prove this proposition below the fold. A variant of the argument shows that for any value of {\beta}, (1), (3) hold whenever {\gamma < f(\beta)}, where {f: [1,\infty) \rightarrow [0,1]} is an explicitly computable function with {f(\beta) \rightarrow 1} as {\beta \rightarrow 1}. In principle, one could then improve the Krasikov-Lagarias result {\gamma = 0.84} by getting a sufficiently good upper bound on {\beta}, which is in principle achievable numerically (note for instance that Lemma 2 implies the bound {c_n \leq 3^{-\beta(n-1)}} for any {n}, since {c_{kn-k+1} \geq c_n^k} for any {k}).
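To indicate what such a numerical bound could look like for very small {n}, here is a hedged Python sketch. It computes {c_n} exactly from the distribution of {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} (using the same elementary recursion {X_n = 2^{-\mathbf{a}}(1+3X_{n-1}) \hbox{ mod } 3^n} that comes from the {3}-adic series, which is an assumption of this sketch rather than the method of Lemma 1.12), and then rearranges {c_n \leq 3^{-\beta(n-1)}} into {\beta \leq -\log_3(c_n)/(n-1)}. All function names are hypothetical, and the brute-force approach is only feasible for small {n}.

```python
import math
from fractions import Fraction


def syrac_pmf(n: int) -> dict:
    """Exact probabilities P(Syrac(Z/3^n Z) = x) for n >= 1, as a dict."""
    pmf = {0: Fraction(1)}
    for level in range(1, n + 1):
        modulus = 3**level
        period = 2 * 3 ** (level - 1)  # Euler phi(3^level)
        inv2 = pow(2, -1, modulus)     # requires Python 3.8+
        denom = 2**period - 1
        new_pmf = {}
        for r in range(1, period + 1):
            p_a = Fraction(2 ** (period - r), denom)  # P(a = r mod period)
            scale = pow(inv2, r, modulus)
            for y, p_y in pmf.items():
                x = scale * (1 + 3 * y) % modulus
                new_pmf[x] = new_pmf.get(x, Fraction(0)) + p_a * p_y
        pmf = new_pmf
    return pmf


def c(n: int) -> Fraction:
    """c_n: the smallest probability over residues b mod 3^n with 3 not dividing b."""
    if n == 0:
        return Fraction(1)
    pmf = syrac_pmf(n)
    return min(pmf.get(b, Fraction(0)) for b in range(3**n) if b % 3)


def beta_upper_bound(n: int) -> float:
    """The bound -log_3(c_n) / (n - 1) on beta, meaningful for n >= 2."""
    return -math.log(c(n), 3) / (n - 1)
```

The same `c` can be used to spot-check Lemma 2 and the trivial upper bound {c_n \leq \frac{3}{2} 3^{-n}} for small indices; whether the resulting bounds on {\beta} come close enough to {1} to be useful is exactly the open numerical question.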

— 1. Proof of proposition —

Assume {\beta=1}. Let {\varepsilon>0} be sufficiently small, and let {n_0} be sufficiently large depending on {\varepsilon}. We first establish the following proposition, that shows that elements in a certain residue class have a lot of Syracuse preimages:

Proposition 4 There exists a residue class of {{\bf Z}/3^{n_0}{\bf Z}} with the property that for all integers {N} in this class, and all non-negative integers {j}, there exist natural numbers {n_j, L_j} with

\displaystyle  (2-\varepsilon^2) n_j \leq L_j \leq (2+\varepsilon^2) n_j


\displaystyle  (4/3)^{(1-\varepsilon^2) (1+\varepsilon)^j n_0} \leq 3^{-n_j} 2^{L_j} \leq (4/3)^{(1+\varepsilon^2) (1+\varepsilon)^j n_0}

and at least {3^{-n_j - \varepsilon^4 n_j} 2^{L_j}} tuples

\displaystyle  (a_1,\dots,a_{n_j-1}) \in S'_{n_j,N}

obeying the additional properties

\displaystyle  a_1+\dots+a_{n_j-1} = L_j \ \ \ \ \ (6)


\displaystyle  a_1+\dots+a_i - \frac{\log 3}{\log 2} i \geq - \varepsilon^5 n_0 \ \ \ \ \ (7)

for all {1 \leq i \leq n_j-1}.

Proof: We begin with the base case {j=0}. By (4) and the hypothesis {\beta=1}, we see that

\displaystyle  \sum_{(a_1,\dots,a_{n_0-1}) \in S'_{n_0,N}} 2^{-a_1-\dots-a_{n_0-1}} \gg 3^{-(1+\varepsilon^6) n_0}

for all integers {N} not divisible by {3}. Let {S''_{n_0,N}} denote the tuples {(a_1,\dots,a_{n_0-1})} in {S'_{n_0,N}} that obey the additional regularity hypotheses

\displaystyle  |a_1 + \dots + a_i - 2i| \leq \varepsilon^5 n_0 \ \ \ \ \ (8)

for all {1 \leq i \leq n_0-1}; note that this implies in particular the {j=0} case of (7). From the Chernoff inequality (noting that the geometric random variable {\mathbf{Geom}(2)} has mean {2}) and the union bound we have

\displaystyle  \sum_{b \in {\bf Z}/3^{n_0}{\bf Z}: 3 \not | b} \sum_{(a_1,\dots,a_{n_0-1}) \in S'_{n_0,b} \backslash S''_{n_0,b}} 2^{-a_1-\dots-a_{n_0-1}} \ll 3^{-c \varepsilon^5 n_0}

for an absolute constant {c>0} (where we use the periodicity of {S'_{n_0,N}, S''_{n_0,N}} in {N} to define {S'_{n_0,b}, S''_{n_0,b}} for {b \in {\bf Z}/3^{n_0}{\bf Z}} by abuse of notation). Hence by the pigeonhole principle we can find a residue class {b} not divisible by {3} such that

\displaystyle  \sum_{(a_1,\dots,a_{n_0-1}) \in S'_{n_0,b} \backslash S''_{n_0,b}} 2^{-a_1-\dots-a_{n_0-1}} \ll 3^{-(1+c \varepsilon^5) n_0}

and hence by the triangle inequality we have

\displaystyle  \sum_{(a_1,\dots,a_{n_0-1}) \in S''_{n_0,N}} 2^{-a_1-\dots-a_{n_0-1}} \gg 3^{-(1+\varepsilon^6) n_0}

for all {N} in this residue class.

Henceforth {N} is assumed to be an element of this residue class. For {(a_1,\dots,a_{n_0-1}) \in S''_{n_0,N}}, we see from (8)

\displaystyle  a_1 + \dots + a_{n_0-1} = (2+O(\varepsilon^5)) n_0,

hence by the pigeonhole principle there exists {L_0 = (2+O(\varepsilon^5)) n_0} (so in particular {3^{-n_0} 2^{L_0} = (4/3)^{(1+O(\varepsilon^5))n_0}}) such that

\displaystyle  \sum_{(a_1,\dots,a_{n_0-1}) \in S''_{n_0,N}: a_1+\dots+a_{n_0-1} = L_0} 2^{-L_0} \gg 3^{-(1+\varepsilon^6) n_0}

so the number of summands here is at least {\gg 2^{L_0} 3^{-(1+\varepsilon^6) n_0}}. This establishes the base case {j=0}.

Now suppose inductively that {j \geq 1}, and that the claim has already been proven for {j-1}. By induction hypothesis, there exist natural numbers {n_{j-1}, L_{j-1}} with

\displaystyle  (2-\varepsilon^2) n_{j-1} \leq L_{j-1} \leq (2+\varepsilon^2) n_{j-1}


\displaystyle  (4/3)^{(1-\varepsilon^2) (1+\varepsilon)^{j-1} n_0} \leq 3^{-n_{j-1}} 2^{L_{j-1}} \leq (4/3)^{(1+\varepsilon^2) (1+\varepsilon)^{j-1} n_0} \ \ \ \ \ (9)

(which in particular imply that {n_{j-1} = (1+O(\varepsilon^2)) (1+\varepsilon)^{j-1} n_0}) and at least {3^{-n_{j-1} - \varepsilon^4 n_{j-1}} 2^{L_{j-1}}} tuples

\displaystyle  (a_1,\dots,a_{n_{j-1}-1}) \in S'_{n_{j-1},N} \ \ \ \ \ (10)

obeying the additional properties

\displaystyle  a_1+\dots+a_{n_{j-1}-1} = L_{j-1} \ \ \ \ \ (11)

and (7) for all {1 \leq i \leq n_{j-1}-1}.

Let {n_{j}} be an integer such that

\displaystyle  3^{-n_{j}} 2^{L_{j-1} + 2(n_{j}-n_{j-1})} \asymp (4/3)^{(1+\varepsilon)^j n_0}. \ \ \ \ \ (12)

One easily checks that

\displaystyle  n_{j} = (1+\varepsilon+O(\varepsilon^2)) n_{j-1} = (1+O(\varepsilon^2)) (1+\varepsilon)^{j} n_0.

For each tuple (10), we may write (as in the proof of Lemma 2)

\displaystyle  N = \sum_{m=1}^{n_{j-1}-1} 3^{m-1} 2^{-a_1-\dots-a_m} + 3^{n_{j-1}-1} 2^{-L_{j-1}} M_{\vec a}

for some integers {M_{\vec a}}. We claim that these integers lie in distinct residue classes modulo {3^k} where

\displaystyle  k :=\lfloor \frac{\log 2}{\log 3} L_{j-1} - n_{j-1} + \varepsilon^4 n_{j-1} \rfloor.

Indeed, suppose that {M_{\vec a} = M_{\vec b} \hbox{ mod } 3^k} for two tuples {\vec a = (a_1,\dots,a_{n_{j-1}-1})}, {\vec b = (b_1,\dots,b_{n_{j-1}-1})} of the above form. Then

\displaystyle  \sum_{m=1}^{n_{j-1}-1} 3^{m-1} 2^{-a_1-\dots-a_m} = \sum_{m=1}^{n_{j-1}-1} 3^{m-1} 2^{-b_1-\dots-b_m} \hbox{ mod } 3^{n_{j-1}-1+k}

(where we now invert {2} in the ring {{\bf Z}/3^{n_{j-1}-1+k}{\bf Z}}), or equivalently

\displaystyle  \sum_{m=1}^{n_{j-1}-1} 3^{m-1} 2^{a_{m+1}+\dots+a_{n_{j-1}-1}} = \sum_{m=1}^{n_{j-1}-1} 3^{m-1} 2^{b_{m+1}+\dots+b_{n_{j-1}-1}} \hbox{ mod } 3^{n_{j-1}-1+k}.

By (11), (7), all the summands on the left-hand side are natural numbers of size {O( 2^{L_{j-1}} 3^{O(\varepsilon^5 n_{j-1})})}, hence the sum also has this size; similarly for the right-hand side. From the estimates of {n_{j-1}, n_{j}}, we thus see that both sides are natural numbers between {1} and {3^{n_{j-1}-1+k}}, by hypothesis on {k}. Thus we may remove the modular constraint and conclude that

\displaystyle  \sum_{m=1}^{n_{j-1}-1} 3^{m-1} 2^{a_{m+1}+\dots+a_{n_{j-1}-1}} = \sum_{m=1}^{n_{j-1}-1} 3^{m-1} 2^{b_{m+1}+\dots+b_{n_{j-1}-1}}

and then a routine induction (see Lemma 6.2 of my paper) shows that {(a_1,\dots,a_{n_{j-1}-1}) = (b_1,\dots,b_{n_{j-1}-1})}. This establishes the claim.

As a corollary, we see that every residue class modulo {3^{n_j-n_{j-1}}} contains

\displaystyle O( 3^{k - (n_j-n_{j-1})} ) = O( 2^{L_{j-1}} 3^{-n_j + \varepsilon^4 n_{j-1}} )

of the {M_{\vec a}} at most. Since there were at least {3^{-n_{j-1} - \varepsilon^4 n_{j-1}} 2^{L_{j-1}}} tuples {\vec a} to begin with, we may therefore forbid up to {O(3^{n_j-n_{j-1} - 3 \varepsilon^4 n_{j-1}})} residue classes modulo {3^{n_j-n_{j-1}}}, and still have {\gg 3^{-n_{j-1} - \varepsilon^4 n_{j-1}} 2^{L_{j-1}}} surviving tuples {\vec a} with the property that {M_{\vec a}} avoids all the forbidden classes.

Let {\vec a} be one of the tuples (10). By the hypothesis {\beta = 1}, we have

\displaystyle  \sum_{(a_{n_{j-1}},\dots,a_{n_j-1}) \in S'_{n_j-n_{j-1},M_{\vec a}}} 2^{-a_{n_{j-1}}-\dots-a_{n_j-1}} \gg 3^{-(1+\varepsilon^6) (n_j-n_{j-1})}.

Let {S'''_{n_j-n_{j-1},M}} denote the set of tuples {(a_{n_{j-1}},\dots,a_{n_j-1}) \in S'_{n_j-n_{j-1},M}} with the additional property

\displaystyle  |a_{n_{j-1}} + \dots + a_i - 2(i-n_{j-1}+1)| \leq \varepsilon^3 (n_j - n_{j-1})

for all {n_{j-1} \leq i \leq n_j - 1}, then by the Chernoff bound we have

\displaystyle  \sum_{b \in {\bf Z}/3^{n_j-n_{j-1}}{\bf Z}} \sum_{(a_{n_{j-1}},\dots,a_{n_j-1}) \in S'_{n_j-n_{j-1},b} \backslash S'''_{n_j-n_{j-1},b}} 2^{-a_{n_{j-1}}-\dots-a_{n_j-1}}

\displaystyle  \ll 3^{-c\varepsilon^3 (n_j-n_{j-1})}

for some absolute constant {c>0}. Thus, by the Markov inequality, by forbidding up to {O(3^{n_j-n_{j-1} - 3 \varepsilon^4 n_{j-1}})} classes, we may ensure that

\displaystyle  \sum_{(a_{n_{j-1}},\dots,a_{n_j-1}) \in S'_{n_j-n_{j-1},M_{\vec a}} \backslash S'''_{n_j-n_{j-1},M_{\vec a}}} 2^{-a_{n_{j-1}}-\dots-a_{n_j-1}} \ll 3^{-(1+\varepsilon^5) (n_j-n_{j-1})}

and hence

\displaystyle  \sum_{(a_{n_{j-1}},\dots,a_{n_j-1}) \in S'''_{n_j-n_{j-1},M_{\vec a}}} 2^{-a_{n_{j-1}}-\dots-a_{n_j-1}} \gg 3^{-(1+\varepsilon^6) (n_j-n_{j-1})}.

We thus have

\displaystyle  \sum_{a_1,\dots,a_{n_j-1}} 2^{-a_{n_{j-1}}-\dots-a_{n_j-1}} \gg 3^{-n_{j-1} - \varepsilon^4 n_{j-1}} 2^{L_{j-1}} 3^{-(1+\varepsilon^6) (n_j-n_{j-1})}

where {(a_1,\dots,a_{n_j-1})} run over all tuples with {\vec a = (a_1,\dots,a_{n_{j-1}-1})} being one of the previously surviving tuples, and {(a_{n_{j-1}},\dots,a_{n_j-1}) \in S'''_{n_j-n_{j-1},M_{\vec a}}}. By (11) we may rearrange this a little as

\displaystyle  \sum_{a_1,\dots,a_{n_j-1}} 2^{-a_1-\dots-a_{n_j-1}} \gg 3^{-n_{j} - \varepsilon^4 n_{j-1}-\varepsilon^6 (n_j-n_{j-1})}.

By construction, we have

\displaystyle  a_1 + \dots + a_{n_j-1} = L_{j-1} + (2 + O(\varepsilon^3)) (n_j - n_{j-1})

for any tuple in the above sum, hence by the pigeonhole principle we may find an integer

\displaystyle  L_j = L_{j-1} + (2 + O(\varepsilon^3)) (n_j - n_{j-1}) \ \ \ \ \ (13)

for which

\displaystyle  \sum_{a_1,\dots,a_{n_j-1}: a_1+\dots+a_{n_j-1}=L_j} 2^{-a_1-\dots-a_{n_j-1}} \geq 3^{-n_{j} - \varepsilon^4 n_j}.

In particular the number of summands is at least {3^{-n_{j} - \varepsilon^4 n_j} 2^{L_j}}. Also observe from (13), (12) that

\displaystyle  3^{-n_j} 2^{L_j} = 3^{-n_{j} + O( \varepsilon^3 (n_j - n_{j-1}))} 2^{L_{j-1} + 2(n_j - n_{j-1})}

\displaystyle  = (4/3)^{(1+\varepsilon)^j n_0} 3^{O( \varepsilon^3 (n_j - n_{j-1}))}

so in particular

\displaystyle  (4/3)^{(1-\varepsilon^2) (1+\varepsilon)^j n_0} \leq 3^{-n_j} 2^{L_j} \leq (4/3)^{(1+\varepsilon^2) (1+\varepsilon)^j n_0}.

It is a routine matter to verify that all tuples in this sum lie in {S'_{n_j,N}} and obey the requirements (6), (7), closing the induction. \Box

Corollary 5 For all {N} in the residue class from the previous proposition, and all {j \geq 0}, we have

\displaystyle  \# \{ M \in (\mathrm{Syr}^{\bf N})^*(N): M \leq 3 (4/3)^{(1+\varepsilon^2) (1+\varepsilon)^j n_0} N \}

\displaystyle  \gg (4/3)^{(1-\varepsilon) (1+\varepsilon)^j n_0}.

In particular, we have

\displaystyle  \# \{ M \in (\mathrm{Syr}^{\bf N})^*(N): M \leq x \} \gg_{\varepsilon,n_0,N} x^{1-\varepsilon}

as {x \rightarrow \infty}.

Proof: For every tuple {(a_1,\dots,a_{n_j-1})} in the previous proposition, we have

\displaystyle  N = \sum_{m=1}^{n_{j}-1} 3^{m-1} 2^{-a_1-\dots-a_m} + 3^{n_{j}-1} 2^{-L_{j}} M

for some integer {M}. As before, all these integers {M} are distinct, and have magnitude

\displaystyle  M \leq 3^{-n_j+1} 2^{L_j} N \leq 3 (4/3)^{(1+\varepsilon^2) (1+\varepsilon)^j n_0} N.

From construction we also have {\mathrm{Syr}^{n_j}(M) = N}, so that {M \in (\mathrm{Syr}^{\bf N})^*(N)}. The number of tuples is at least

\displaystyle  3^{-n_j - \varepsilon^4 n_j} 2^{L_j}

which can be computed from the properties of {n_j,L_j} to be of size at least {(4/3)^{(1-\varepsilon) (1+\varepsilon)^j n_0}}. This gives the first claim, and the second claim follows by taking {j} to be the first integer for which {3 (4/3)^{(1+\varepsilon^2) (1+\varepsilon)^j n_0} N \geq x}. \Box

To conclude the proof of Proposition 3, it thus suffices to show that

Lemma 6 Every residue class {b \hbox{ mod } 3^{n_0}} has a non-trivial intersection with {(\mathrm{Syr}^{\bf N})^*(1)}.

Indeed, if we let {b \hbox{ mod } 3^{n_0}} be the residue class from the preceding propositions, and use this lemma to produce an element {N} of {(\mathrm{Syr}^{\bf N})^*(1)} that lies in this class, then from the inclusion {(\mathrm{Syr}^{\bf N})^*(N) \subset (\mathrm{Syr}^{\bf N})^*(1)} we obtain (3) with {\gamma = 1-O(\varepsilon)}, and then on sending {\varepsilon} to zero we obtain the claim.

Proof: An easy induction (based on first establishing that {2^{2 \times 3^n} = 1 + 3^{n+1} \hbox{ mod } 3^{n+2}} for all natural numbers {n}) shows that the powers of two modulo {3^{n_0+1}} occupy every residue class not divisible by {3}. From this we can locate an integer {N} in {b \hbox{ mod } 3^{n_0}} of the form {N = \frac{2^n-1}{3}}. Since {\mathrm{Syr}(N)=1}, the claim follows. \Box
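As a numerical sanity check on the two facts used in this proof, here is a small Python sketch. It is not part of the argument: the function names `syr` and `residues_of_powers_of_two` are illustrative, and it assumes the usual definition of the Syracuse map, namely that {\mathrm{Syr}(N)} is the largest odd divisor of {3N+1}.

```python
def syr(n):
    """Syracuse map (assumed definition): largest odd divisor of 3n + 1, for odd n."""
    m = 3 * n + 1
    while m % 2 == 0:
        m //= 2
    return m


def residues_of_powers_of_two(k):
    """Set of all residues of 2^n modulo 3^k, found by iterating until a repeat."""
    modulus = 3 ** k
    seen, x = set(), 1
    while x not in seen:
        seen.add(x)
        x = (2 * x) % modulus
    return seen


# Check 2^(2 * 3^n) = 1 + 3^(n+1) mod 3^(n+2) for small n.
for n in range(6):
    assert pow(2, 2 * 3 ** n, 3 ** (n + 2)) == (1 + 3 ** (n + 1)) % 3 ** (n + 2)

# Check that the powers of two fill every residue class mod 3^k not divisible by 3.
for k in range(1, 7):
    units = {r for r in range(3 ** k) if r % 3 != 0}
    assert residues_of_powers_of_two(k) == units

# Check that N = (2^n - 1)/3 (n even, so that N is an integer) satisfies Syr(N) = 1.
for n in range(2, 40, 2):
    N = (2 ** n - 1) // 3
    assert syr(N) == 1
```

The checks only cover small exponents, so they illustrate the induction rather than replace it.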

We remark that the same argument in fact shows (assuming {\beta=1} of course) that

\displaystyle  \# \{ N \leq x: N \in (\mathrm{Col}^{\bf N})^*(N_0) \} \gg_{N_0} x^{1-o(1)}

for any natural number {N_0} not divisible by three.

John BaezEntropy in the Universe

If you click on this picture, you’ll see a zoomable image of the Milky Way with 84 million stars:

But stars contribute only a tiny fraction of the total entropy in the observable Universe. If it’s random information you want, look elsewhere!

First: what’s the ‘observable Universe’, exactly?

The further you look out into the Universe, the further you look back in time. You can’t see through the hot gas from 380,000 years after the Big Bang. That ‘wall of fire’ marks the limits of the observable Universe.

But as the Universe expands, the distant ancient stars and gas we see have moved even farther away, so they’re no longer observable. Thus, the so-called ‘observable Universe’ is really the ‘formerly observable Universe’. Its edge is 46.5 billion light years away now!

This is true even though the Universe is only 13.8 billion years old. A standard challenge in understanding general relativity is to figure out how this is possible, given that nothing can move faster than light.

What’s the total number of stars in the observable Universe? Estimates go up as telescopes improve. Right now people think there are between 100 and 400 billion stars in the Milky Way. They think there are between 170 billion and 2 trillion galaxies in the Universe.

In 2009, Chas Egan and Charles Lineweaver estimated the total entropy of all the stars in the observable Universe at 10^{81} bits. You should think of these as qubits: it’s the amount of information to describe the quantum state of everything in all these stars.

But the entropy of interstellar and intergalactic gas and dust is about ten times the entropy of stars! It’s about 10^{82} bits.

The entropy in all the photons in the Universe is even more! The Universe is full of radiation left over from the Big Bang. The photons in the observable Universe left over from the Big Bang have a total entropy of about 10^{90} bits. It’s called the ‘cosmic microwave background radiation’.

The neutrinos from the Big Bang also carry about 10^{90} bits, a bit less than the photons. The gravitons carry much less, about 10^{88} bits. That’s because they decoupled from other matter and radiation very early, and have been cooling ever since. On the other hand, photons in the cosmic microwave background radiation were formed by annihilating electron-positron pairs until about 10 seconds after the Big Bang. Thus the graviton radiation is expected to be cooler than the microwave background radiation: about 0.6 kelvin as compared to 2.7 kelvin.

Black holes have immensely more entropy than anything listed so far. Egan and Lineweaver estimate the entropy of stellar-mass black holes in the observable Universe at 10^{98} bits. This is connected to why black holes are so stable: the Second Law says entropy likes to increase.

But the entropy of black holes grows quadratically with mass! So black holes tend to merge and form bigger black holes, ultimately forming the ‘supermassive’ black holes at the centers of most galaxies. These dominate the entropy of the observable Universe: about 10^{104} bits.
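To see the quadratic growth in numbers, here is a rough Python sketch. It assumes the Bekenstein-Hawking formula for a non-rotating black hole, S/k = 4πGM²/(ħc), uses rounded values of the physical constants, and idealizes a merger as simply adding the two masses (a real merger radiates some mass away as gravitational waves). The function name is illustrative.

```python
import math

# Rounded SI values of the constants (assumed inputs, not from the text above).
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
HBAR = 1.0546e-34    # reduced Planck constant, J s
C = 2.998e8          # speed of light, m/s
M_SUN = 1.989e30     # solar mass, kg


def black_hole_entropy_bits(mass_kg):
    """Bekenstein-Hawking entropy S/k = 4 pi G M^2 / (hbar c), converted from nats to bits."""
    nats = 4 * math.pi * G * mass_kg ** 2 / (HBAR * C)
    return nats / math.log(2)


one = black_hole_entropy_bits(M_SUN)
merged = black_hole_entropy_bits(2 * M_SUN)
print(f"one solar-mass hole: {one:.2e} bits")
print(f"two separate holes:  {2 * one:.2e} bits")
print(f"one merged hole:     {merged:.2e} bits")
```

In this idealization the merged hole has four times the entropy of a single one, so twice the entropy of the two holes kept apart. That is the sense in which merging is favored by the Second Law.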

Hawking predicted that black holes slowly radiate away their mass when they’re in a cold enough environment. But the Universe is much too hot for supermassive black holes to be losing mass now. Instead, they very slowly grow by eating the cosmic microwave background, even when they’re not eating stars, gas and dust.

So, only in the far future will the Universe cool down enough for large black holes to start slowly decaying via Hawking radiation. Entropy will continue to increase… going mainly into photons and gravitons! This process will take a very long time. Assuming nothing is falling into it and no unknown effects intervene, a solar-mass black hole takes about 10^{67} years to evaporate due to Hawking radiation, while a really big one, comparable to the mass of a galaxy, should take about 10^{99} years.
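The solar-mass figure can be reproduced with a back-of-the-envelope Python sketch. It assumes the standard textbook estimate t = 5120πG²M³/(ħc⁴), which counts photon emission only and ignores anything falling in, and it uses rounded constants. Treat it as an order-of-magnitude check, not a precise lifetime.

```python
import math

# Rounded SI values of the constants (assumed inputs).
G = 6.674e-11            # m^3 kg^-1 s^-2
HBAR = 1.0546e-34        # J s
C = 2.998e8              # m/s
M_SUN = 1.989e30         # kg
SECONDS_PER_YEAR = 3.156e7


def evaporation_time_years(mass_kg):
    """Photon-only Hawking evaporation estimate: t = 5120 pi G^2 M^3 / (hbar c^4)."""
    seconds = 5120 * math.pi * G ** 2 * mass_kg ** 3 / (HBAR * C ** 4)
    return seconds / SECONDS_PER_YEAR


print(f"solar-mass black hole: {evaporation_time_years(M_SUN):.1e} years")
```

Because the time scales as the cube of the mass, a black hole ten times heavier lasts a thousand times longer, which is why galaxy-scale masses push the estimate up by tens of orders of magnitude.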

If our current most popular ideas on dark energy are correct, the Universe will continue to expand exponentially. Thanks to this, there will be a cosmological event horizon surrounding each observer, which will radiate Hawking radiation at a temperature of roughly 10^{-30} kelvin.

In this scenario the Universe in the very far future will mainly consist of massless particles produced as Hawking radiation at this temperature: photons and gravitons. The entropy within the exponentially expanding ball of space that is today our ‘observable Universe’ will continue to increase exponentially… but more to the point, the entropy density will approach that of a gas of photons and gravitons in thermal equilibrium at 10^{-30} kelvin.

Of course, it’s quite likely that some new physics will turn up, between now and then, that changes the story! I hope so: this would be a rather dull ending to the Universe.

For more details, go here:

• Chas A. Egan and Charles H. Lineweaver, A larger estimate of the entropy of the universe, The Astrophysical Journal 710 (2010), 1825.

Also read my page on information.

Peter Rohde The NSW Police annual calendar: 2020 edition

I'm pleased to present the first edition of the annual NSW Police calendar.

Get your free PDF copy here.

January 24, 2020

Matt von HippelMath Is the Art of Stating Things Clearly

Why do we use math?

In physics we describe everything, from the smallest of particles to the largest of galaxies, with the language of mathematics. Why should that one field be able to describe so much? And why don’t we use something else?

The truth is, this is a trick question. Mathematics isn’t a language like English or French, where we can choose whichever translation we want. We use mathematics because it is, almost by definition, the best choice. That is because mathematics is the art of stating things clearly.

An infinite number of mathematicians walk into a bar. The first orders a beer. The second orders half a beer. The third orders a quarter. The bartender stops them, pours two beers, and says “You guys should know your limits.”

That was an (old) joke about infinite series of numbers. You probably learned in high school that if you add up one plus a half plus a quarter…you eventually get two. To be a bit more precise:

\sum_{i=0}^\infty \frac{1}{2^i} = 1+\frac{1}{2}+\frac{1}{4}+\ldots=2

We say that this infinite sum limits to two.

But what does it actually mean for an infinite sum to limit to a number? What does it mean to sum infinitely many numbers, let alone infinitely many beers ordered by infinitely many mathematicians?

You’re asking these questions because I haven’t yet stated the problem clearly. Those of you who’ve learned a bit more mathematics (maybe in high school, maybe in college) will know another way of stating it.

You know how to sum a finite set of beers. You start with one beer, then one and a half, then one and three-quarters. Sum N beers, and you get

\sum_{i=0}^N \frac{1}{2^i}

What does it mean for the sum to limit to two?

Let’s say you just wanted to get close to two. You want to get \epsilon close, where epsilon is the Greek letter we use for really small numbers.

For every \epsilon>0 you choose, no matter how small, I can pick a (finite!) N and get at least that close. That means that, with higher and higher N, I can get as close to two as I want.

As it turns out, that’s what it means for a sum to limit to two. It’s saying the same thing, but more clearly, without sneaking in confusing claims about infinity.
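The game of "you pick \epsilon, I pick N" can be played out in a few lines of Python. This is only an illustrative sketch: the function names are made up, exact fractions are used to avoid rounding, and "that close" is read as "strictly closer than \epsilon".

```python
from fractions import Fraction


def partial_sum(n):
    """Exact value of 1 + 1/2 + ... + 1/2^n."""
    return sum(Fraction(1, 2 ** i) for i in range(n + 1))


def smallest_n(epsilon):
    """Smallest N whose partial sum is strictly within epsilon of two."""
    n = 0
    while 2 - partial_sum(n) >= epsilon:
        n += 1
    return n


for eps in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 10 ** 6)):
    n = smallest_n(eps)
    print(f"epsilon = {eps}: N = {n}, gap = {2 - partial_sum(n)}")
```

The loop always terminates, because the gap after N terms is exactly 1/2^N, which eventually drops below any positive \epsilon.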

These sorts of proofs, with \epsilon (and usually another variable, \delta), form what mathematicians view as the foundations of calculus. They’re immortalized in story and song.

And they’re not even the clearest way of stating things! Go down that road, and you find more mathematics: definitions of numbers, foundations of logic, rabbit holes upon rabbit holes, all from the effort to state things clearly.

That’s why I’m not surprised that physicists use mathematics. We have to. We need clarity, if we want to understand the world. And mathematicians, they’re the people who spend their lives trying to state things clearly.

Mark GoodsellThe KOTO anomaly

The KOTO anomaly

Patrick Meade pointed out some new papers about an experimental anomaly, starting with his own. The KOTO experiment at J-PARC in Japan (where they are also building a \( g-2 \) experiment) has seen 3 events when looking for the rare process \( K_L \rightarrow \pi_0 + \mathrm{invisible} \), when they expect a background of \( 0.05 \pm 0.02 \). (Update: it was pointed out to me that the effective background rate is \( 0.1 \pm 0.02 \) as in Meade's paper, because the Standard Model rate is \( 0.049 \pm 0.01 \).) For more details see the slides of the talk where the results are reported; there is currently no paper about the excess. This is interesting as the Standard Model process \( K_L \rightarrow \pi \overline{\nu} \nu \) has a tiny branching ratio, two orders of magnitude too small to explain the number of events.

Assuming the anomaly is just statistics, the probability of observing three or more events would be of the order of one chance in \( 10,000 \) if we take the more generous estimate of the background. On the other hand, it is apparently only roughly two-sigma evidence for an anomalous \( K_L \rightarrow \pi_0 + \mathrm{invisible} \) signal. Moreover, the central value of the required signal is just above (but well within errors of) the Grossman-Nir bound, which says that if something generates \( K_L \rightarrow \pi \overline{\nu} \nu \), it should also generate \( K^+ \rightarrow \pi^+ \overline{\nu} \nu\) in the ratio $$ \frac{\mathrm{Br} (K_L \rightarrow \pi_0 \overline{\nu} \nu)}{\mathrm{Br}(K^+ \rightarrow \pi^+ \overline{\nu} \nu)} = \sin^2 \theta_c$$ where \( \theta_c \) is the Cabibbo angle, provided that the interactions respect isospin. Since the charged process is not observed, the observed anomaly might be in slight tension with this bound.
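The "one chance in 10,000" figure can be checked with a short Python sketch. It assumes the event count is Poisson-distributed with a mean equal to the expected background, and it ignores the quoted uncertainty on that background, so it is a rough estimate only. The function name is illustrative.

```python
import math


def prob_at_least(k, mean):
    """P(X >= k) for a Poisson-distributed count X with the given mean."""
    below = sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k))
    return 1.0 - below


for background in (0.05, 0.1):
    p = prob_at_least(3, background)
    print(f"background {background}: P(3 or more events) = {p:.2e}")
```

With a mean of 0.1 this gives roughly 1.5 chances in 10,000, and with a mean of 0.05 roughly 2 in 100,000.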

So far I can find three papers seeking to explain this anomaly, through light scalar extensions of the Standard Model (with masses less than 180 MeV) and the inevitable two-Higgs doublet model. Since such scalars must couple to quarks/mesons, they look a bit like axion-like particles, and there are many astrophysical and beam-dump experiments that exclude large swathes of the potential parameter space. But this is quite exciting: if the anomaly is confirmed, it should also be possible to look for it easily in (many) other experiments.

Clifford JohnsonSynchronicity?

Here’s a striking coincidence. Last Friday I was preparing to deliver a lecture on special relativity to my undergrad General Relativity class with this Hobbity thought experiment (that helps one discover Lorentz-Fitzgerald contraction), when I heard that Christopher Tolkien (the boy the Hobbit was originally written for) had died. (RIP. … Click to continue reading this post

The post Synchronicity? appeared first on Asymptotia.

January 23, 2020

n-Category Café Emily Gets A Huge Prize

Café host Emily Riehl has just been awarded a $250,000 prize by her university!! Johns Hopkins gives one President’s Frontier Award every year across the whole university, and the 2020 one has gone to Emily. Up to now it’s usually been given to biological and medical researchers, but when Emily came along they had to make an exception and give it to a mathematician. The award has the goal of “supporting exceptional scholars … who are on the cusp of transforming their fields.”

Congratulations, Emily! Obviously you’re too modest to announce it yourself here, but someone had to.

You can read all about it here, including the delightful description of how the news was sprung on her:

When Riehl arrived at what she thought was a meeting with a department administrator, she says it was “a complete shock” to find JHU President Ronald J. Daniels, Provost Sunil Kumar, other university leaders, and many colleagues poised to surprise her.

Doug NatelsonStretchy bioelectronics and ptychographic imaging - two fun talks

One of the great things about a good university is the variety of excellent talks that you can see. 

Yesterday we had our annual Chapman Lecture on Nanotechnology, in honor of Rice alum Richard Chapman, who turned down a first-round draft selection to the Detroit Lions to pursue a physics PhD and a career in engineering.  This year's speaker was Zhenan Bao from Stanford, whom I know from back in my Bell Labs postdoc days.  She spoke about her group's remarkable work on artificial skin:  biocompatible, ultraflexible electronics including active matrices of touch sensors, transistors, etc.  Here are a few representative papers that give you some idea of the kind of work that goes into this: Engineering semiconducting polymers to have robust elastic properties while retaining high charge mobilities; a way of combining conducting polymers (PEDOT) with hydrogels so that you can pattern them and then hydrate to produce super-soft devices; a full-on demonstration of artificial skin for sensing applications.  Very impressive stuff. 

Today, we had a colloquium by Gabe Aeppli of ETH and the Paul Scherrer Institute, talking about x-ray ptychographic imaging.  Ptychography is a simple enough idea.  Use a coherent source of radiation to illuminate some sample at some spot, and with a large-area detector, measure the diffraction pattern.  Now scan the spot over the sample (including perhaps rotating the sample) and record all those diffraction patterns as well.  With the right approach, you can combine all of those diffraction patterns and invert to get the spatial distribution of the scatterers (that is, the matter in the sample).  Sounds reasonable, but these folks have taken it to the next level (pdf here).   The video I'm embedding here is the old result from 2017.  The 2019 paper I linked here is even more impressive, able to image, nondestructively, in 3D, individual circuit elements within a commercial integrated circuit at nanoscale resolution.  It's clear that a long-term goal is to be able to image, non-destructively, the connectome of brains. 

January 22, 2020

Clifford JohnsonBlack Holes and a Return to 2D Gravity! – Part II

(A somewhat more technical post follows.)

Continuing from part I: Well, I set the scene there, and so after that, a number of different ideas come together nicely. Let me list them:

Figure: What "nearly" AdS_2 looks like via JT gravity. The boundary wiggles, but has fixed length 1/T.

  • Exact solution of the SYK model (or dual JT model) in that low temperature limit I mentioned before gave an answer for the partition function $latex Z(\beta)$, by solving the Schwarzian dynamics for the wiggling boundary that I mentioned earlier. (The interior has a model of gravity on $latex AdS_2$, as I mentioned before, but as we're in 2D, there's no local dynamics associated with that part. But we'll see in a moment that there's very interesting stuff to take into account there too.) Anyway, the result for the Schwarzian dynamics can be written (see Stanford and Witten) in a way familiar from standard, say, statistical mechanics: $latex Z_0(\beta)=\int dE \rho_0(E) \exp(-\beta E)$, where $latex \rho_0(E)\sim\sinh(2\pi\sqrt{E})$ is the spectral density of the model. I now need to explain why everything has a subscript 0 in it in the last sentence.
  • On the other hand, the JT gravity model organises itself as a very interesting topological sum that is important if we are doing quantum gravity. First, recall that we're working in the "Euclidean" manner discussed before (i.e., time is a spatial parameter, and so 2D space can be tessellated in that nice Escher way). The point is that the Einstein-Hilbert action in 2D is a topological counting parameter (as mentioned before, there's no dynamics!). The thing that is being counted is the Euler characteristic of the space: $latex \chi=2-2g-b-c$, where $latex g,b,c$ are the number of handles, boundaries, and crosscaps the surface has, characterising its topology. Forget about crosscaps for now (that has to do with unorientable surfaces like a Möbius strip $latex (g=0,b=1,c=1)$ - we'll stick with orientable surfaces here). The full JT gravity action therefore has just the thing one needs to keep track of the dynamics of the quantum theory, and the partition function (or other quantities that you might wish to compute) can be written as a sum of contributions from every possible topology. So one can write the JT partition function as $latex Z(\beta)=\sum_{g=0}^\infty\hbar^{-(1-2g)}Z_g(\beta)$ where the parameter $latex \hbar$ weights different genus surfaces. In that sum the weight of a surface is $latex \hbar^{-\chi}$ and $latex b=1$ since there's a boundary of length $latex \beta$, you may recall.

    The basic Schwarzian computation mentioned above therefore gives the leading piece of the partition function, i.e., $latex g=0$, and so that's why I put the subscript 0 on it at the outset. A big question then is what is the result for JT gravity computed on all those other topologies?!
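To make the bookkeeping concrete, the first few terms of the topological sum can be written out. This is just a substitution of $latex b=1$ and $latex c=0$ into the Euler characteristic and the weight $latex \hbar^{-\chi}$ quoted above, with nothing new assumed:

```latex
\chi = 2 - 2g - b - c \;\xrightarrow{\; b=1,\ c=0 \;}\; 1 - 2g,
\qquad
Z(\beta) = \sum_{g=0}^\infty \hbar^{-(1-2g)} Z_g(\beta)
         = \hbar^{-1} Z_0(\beta) + \hbar\, Z_1(\beta) + \hbar^{3} Z_2(\beta) + \cdots
```

So for small $latex \hbar$ each extra handle is suppressed by a factor of $latex \hbar^2$, and the disc ($latex g=0$) dominates.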

  • Click to continue reading this post

    The post Black Holes and a Return to 2D Gravity! – Part II appeared first on Asymptotia.

January 21, 2020

Clifford JohnsonBlack Holes and a Return to 2D Gravity! – Part I

(A somewhat more technical post follows.) Well, I think I promised to say a bit more about what I’ve been up to in that work that resulted in the paper I talked about in an earlier post. The title of my paper, “Non-perturbative JT gravity” has JT (Jackiw-Teitelboim) gravity in … Click to continue reading this post

The post Black Holes and a Return to 2D Gravity! – Part I appeared first on Asymptotia.

John BaezTopos Theory (Part 4)

In Part 1, I said how to push sheaves forward along a continuous map. Now let’s see how to pull them back! This will set up a pair of adjoint functors with nice properties, called a ‘geometric morphism’.

First recall how we push sheaves forward. I’ll say it more concisely this time. If you have a continuous map f \colon X \to Y between topological spaces, the inverse image of any open set is open, so we get a map

f^{-1} \colon \mathcal{O}(Y) \to \mathcal{O}(X)

A functor between categories gives a functor between the opposite categories. I’ll use the same name for this, if you can stand it:

f^{-1} \colon \mathcal{O}(Y)^{\mathrm{op}} \to \mathcal{O}(X)^{\mathrm{op}}

A presheaf on X is a functor

F \colon \mathcal{O}(X)^{\mathrm{op}} \to \mathsf{Set}

and we can compose this with f^{-1} to get a presheaf on Y,

F \circ f^{-1} \colon \mathcal{O}(Y)^{\mathrm{op}} \to \mathsf{Set}

We call this presheaf on Y the direct image or pushforward of F along f, and we write it as f_\ast F. In a nutshell:

f_\ast F = F \circ f^{-1}

Even better, this direct image operation extends to a functor from the category of presheaves on X to the category of presheaves on Y:

f_\ast \colon \widehat{\mathcal{O}(X)} \to \widehat{\mathcal{O}(Y)}

Better still, this functor sends sheaves to sheaves, so it restricts to a functor

f_\ast \colon \mathsf{Sh}(X) \to \mathsf{Sh}(Y)

This is how we push forward sheaves on X to get sheaves on Y.
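For readers who like to compute, here is a toy Python sketch of the formula f_\ast F = F \circ f^{-1} on finite sets. Everything in it is an illustrative assumption: X is a two-point discrete space, Y is a single point, F is the presheaf of \{0,1\}-valued functions, and only the action on open sets is modelled (restriction maps are left out).

```python
from itertools import product


def preimage(f, V):
    """f^{-1}(V) for a map f given as a dict from points of X to points of Y."""
    return frozenset(x for x in f if f[x] in V)


def functions_to_bits(U):
    """F(U): all functions U -> {0, 1}, each encoded as a frozenset of (point, value) pairs."""
    points = sorted(U)
    return {frozenset(zip(points, values))
            for values in product((0, 1), repeat=len(points))}


def pushforward(F, f):
    """The assignment on open sets of the presheaf f_* F = F o f^{-1}."""
    return lambda V: F(preimage(f, V))


f = {"a": "*", "b": "*"}                   # the unique map from {a, b} to a point
f_star_F = pushforward(functions_to_bits, f)

print(len(f_star_F(frozenset({"*"}))))     # sections over all of Y
print(len(f_star_F(frozenset())))          # sections over the empty set
```

Over the whole of Y the pushforward has four elements, namely the four functions on \{a, b\}, and over the empty set it has exactly one.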

All this seems very natural and nice. But now let’s stop pushing and start pulling! This will give a functor going the other way:

f^\ast \colon \mathsf{Sh}(Y) \to \mathsf{Sh}(X)

The inverse image of a sheaf

At first it seems hard to see how to pull back sheaves, given how natural it was to push them forward. This is where our second picture of sheaves comes in handy!

Remember, a bundle over a topological space Y is a topological space E equipped with a continuous map

p \colon E \to Y

We say it’s an etale space over Y if it has a special property: each point e \in E has an open neighborhood such that p restricted to this neighborhood is a homeomorphism from this neighborhood to an open subset of Y. In Part 2 we defined the category of bundles over X, which is called \mathsf{Top}/X, and the full subcategory of this whose objects are etale spaces, called \mathsf{Etale}(X). I also sketched how we get an equivalence of categories

\mathsf{Sh}(X) \simeq \mathsf{Etale}(X)

So, to pull back sheaves we can just convert them into etale spaces, pull those back, and then convert them back into sheaves!

First I’ll tell you how to pull back a bundle. I’ll assume you know the general concept of ‘pullbacks’, and what they’re like in the category of sets. The category of topological spaces and continuous maps has pullbacks, and they work a lot like they do in the category of sets. Say we’re given a bundle over Y, which is really just a continuous map

p \colon E \to Y

and a continuous map

f \colon X \to Y

Then we can form their pullback and get a bundle over X called

f^\ast(p) \colon f^\ast(E) \to X

In class I’ll draw the pullback diagram, but it’s too much work to do here! As a set,

f^\ast E = \{ (e,x) \in E \times X \; \colon \; p(e) = f(x) \}

It’s a subset of E \times X, and we make it into a topological space using the subspace topology. The map

f^\ast p  \colon f^\ast E \to X

does the obvious thing: it sends (e,x) to x.
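Here is a small Python sketch of the underlying set of the pullback. It is only an illustration: the spaces are finite sets with the topology ignored (or discrete, if you prefer), and the names `pullback`, `E`, `p`, `X` and `f` below are toy data chosen to look like a two-sheeted cover.

```python
def pullback(E, p, X, f):
    """Underlying set of f^* E = {(e, x) : p(e) = f(x)}, plus the projection to X."""
    total = {(e, x) for e in E for x in X if p[e] == f[x]}
    projection = {pair: pair[1] for pair in total}   # sends (e, x) to x
    return total, projection


# A two-sheeted cover of Y = {0, 1}: two points of E over each point of Y.
E = {"0a", "0b", "1a", "1b"}
p = {"0a": 0, "0b": 0, "1a": 1, "1b": 1}

# A map f from a three-point set X to Y.
X = {"x", "y", "z"}
f = {"x": 0, "y": 0, "z": 1}

total, projection = pullback(E, p, X, f)
for pair in sorted(total):
    print(pair, "->", projection[pair])
```

Each point of X ends up with exactly two points over it, one for each sheet of the original cover over its image in Y.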

Puzzle. Prove that this construction really obeys the universal property for pullbacks in the category \mathsf{Top} where objects are topological spaces and morphisms are continuous maps.

Puzzle. Show that this construction extends to a functor

f^\ast \colon \mathsf{Top}/Y \to \mathsf{Top}/X

That is, find a natural way to define the pullback of a morphism between bundles, and prove that this makes f^\ast into a functor.

Puzzle. Prove that if p \colon E \to Y is an etale space over Y, and f \colon X \to Y is any continuous map, then f^\ast p \colon f^\ast E \to X is an etale space over X.

Putting these puzzles together, it instantly follows that we can restrict the functor

f^\ast \colon \mathsf{Top}/Y \to \mathsf{Top}/X

to etale spaces and morphisms between those, and get a functor

f^\ast \colon \mathsf{Etale}(Y) \to \mathsf{Etale}(X)

Using the equivalence

\mathsf{Sh}(X) \simeq \mathsf{Etale}(X)

we then get our desired functor

f^\ast \colon \mathsf{Sh}(Y) \to \mathsf{Sh}(X)

called the inverse image or pullback functor.

Slick! But what does the inverse image of a sheaf actually look like?

Suppose we have a sheaf F on Y and a continuous map f \colon X \to Y. We get an inverse image sheaf f^\ast(F) on X. But what is it like, concretely?

That is, suppose we have an open set U \subseteq X. What does an element s of (f^\ast F)(U) amount to?

Unraveling the definitions, s must be a section over U of the pullback along f of the etale space corresponding to F.

A point in the etale space corresponding to F is the germ at some y \in Y of some t \in F(V) where V is some open neighborhood of y.

Thus, our section s is just a continuous function sending each point x \in U to some germ of this sort at y = f(x).

There is more to say: we could try to unravel the definitions a bit more, and describe (f^\ast F)(U) directly in terms of the sheaf F, without mentioning the corresponding etale space! But maybe one of you reading this can do that more gracefully than I can.

The adjunction between direct and inverse image functors

Once they have direct and inverse images in hand, Mac Lane and Moerdijk prove the following as Theorem 2 in Section II.9:

Theorem. For any continuous map f \colon X \to Y, the direct image functor

f_\ast \colon \mathsf{Sh}(X) \to \mathsf{Sh}(Y)

is right adjoint to the inverse image functor:

f^\ast \colon \mathsf{Sh}(Y) \to \mathsf{Sh}(X)

I won’t do it here, so please look at their proof if you’re curious! As you might expect, it involves hopping back and forth between our two pictures of sheaves: as presheaves with an extra property, and as bundles with an extra property — namely, etale spaces.

I don’t think there’s anything especially sneaky about their argument. They do however use this: if you take a sheaf, and convert it into an etale space, and convert that back into a sheaf, you get back where you started up to natural isomorphism. This isomorphism is just the unit \eta that I mentioned in Part 3.

Remember, the functor that turns presheaves into bundles

\Lambda \colon \widehat{\mathcal{O}(X)} \to \mathsf{Top}/X

is left adjoint to the functor that turns bundles into presheaves:

\Gamma \colon \mathsf{Top}/X \to \widehat{\mathcal{O}(X)}

So, there’s a unit

\eta \colon 1 \Rightarrow \Gamma \Lambda

and a counit

\epsilon \colon \Lambda \Gamma \Rightarrow 1

The fact we need now is that whenever a presheaf F is a sheaf, its unit

\eta_F \colon F \to \Gamma \Lambda F

is an isomorphism. This is part of Theorem 2 in Section II.6 in Mac Lane and Moerdijk.

And by the way, this fact has a partner! Whenever a bundle is an etale space, its counit is an isomorphism. So, converting an etale space into a sheaf and then back into an etale space also gets you back where you started, up to natural isomorphism. But the favored direction of this morphism is the other way around: any sheaf maps to the sheaf of sections of its associated etale space, while the etale space of the sheaf of sections of any bundle maps to that bundle.

David Hoggnothing

Nothing for two days. For reasons that are out of scope here.

Cylindrical OnionCMS Photo Contest 2020 - share the beauty of your everyday CMS

A probe station for testing silicon sensors. CMS will use silicon sensors for building prototypes of a highly granular sandwich calorimeter, the CMS HGC (High Granular Calorimeter) upgrade for High-Luminosity LHC. (Credit: N. Caraban Gonzalez)

A picture is worth a thousand words.

70% of all your sensory receptors are in your eyes. 50% of our brain is involved in visual processing.*

John BaezTopos Theory (Part 3)

Last time I described two viewpoints on sheaves. In the first, a sheaf on a topological space X is a special sort of presheaf

F \colon \mathcal{O}(X)^{\mathrm{op}} \to \mathsf{Set}

Namely, it’s one obeying the ‘sheaf condition’.

I explained this condition in Part 1, but here’s a slicker way to say it. Suppose U \subseteq X is an open set covered by a collection of open sets U_i \subseteq U. Then we get this diagram:

\displaystyle{ FU \rightarrow \prod_i FU_i \rightrightarrows \prod_{i,j} F(U_i \cap U_j) }

The first arrow comes from restricting elements of FU to the smaller sets U_i. The other two arrows come from this: we can either restrict from FU_i to F(U_i \cap U_j), or restrict from FU_j to F(U_i \cap U_j).

The sheaf condition says that this diagram is an equalizer! This is just another way of saying that a family of s_i \in FU_i are the restrictions of a unique s \in FU iff their restrictions to the overlaps U_i \cap U_j are equal.

In the second viewpoint, a sheaf is a bundle over X

p \colon Y \to X

with the special property of being ‘etale’. Remember, this means that every point in Y has an open neighborhood that’s mapped homeomorphically onto an open neighborhood in X.

Last time I showed you how to change viewpoints. We got a functor that turns presheaves into bundles

\Lambda \colon \widehat{\mathcal{O}(X)} \to \mathsf{Top}/X

and a functor that turns bundles into presheaves:

\Gamma \colon \mathsf{Top}/X \to \widehat{\mathcal{O}(X)}

Moreover, I claimed \Lambda actually turns presheaves into etale spaces, and \Gamma actually turns bundles into sheaves. And I claimed that these functors restrict to an equivalence between the category of sheaves and the category of etale spaces:

\mathsf{Sh}(X) \simeq  \mathsf{Etale}(X)

What can we do with these ideas? Right away we can do two things:

• We can describe ‘sheafification’: the process of improving a presheaf to get a sheaf.

• We can see how to push forward and pull back sheaves along a continuous map between spaces.

I’ll do the first now and the second next time. I’m finding it pleasant to break up these notes into small bite-sized pieces, shorter than my actual lectures.


To turn a presheaf into a sheaf, we just hit it with \Lambda and then with \Gamma. In other words, we turn our presheaf into a bundle and then turn it back into a presheaf. It turns out the result is a sheaf!

Why? The reason is this:

Theorem. If we apply the functor

\Gamma \colon \mathsf{Top}/X \to \widehat{\mathcal{O}(X)}

to any object, the result is a sheaf on X.

(The objects of \mathsf{Top}/X are, of course, the bundles over X.)

Proving this theorem was a puzzle last time; let me outline the solution. Remember that if we take a bundle

p \colon Y \to X

and hit it with \Gamma, we get a presheaf called \Gamma_p where \Gamma_p U is the set of sections of Y over U, and we restrict sections in the usual way, by restricting functions. But you can check that if we have an open set U covered by a bunch of open subsets U_i, and a bunch of sections s_i on the U_i that agree on the overlaps U_i \cap U_j, these sections piece together to define a unique section on all of U that restricts to each of the s_i. So, \Gamma_p is a sheaf!
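The gluing step can be mimicked in Python for finite sets. This is a toy sketch under loud assumptions: a section over U_i is encoded as a dict from the points of U_i to points of Y, continuity is ignored, and the function name `glue` is made up for illustration.

```python
def glue(sections):
    """Glue sections (dicts: point -> value) that agree on overlaps.

    Returns the unique glued section, or None if two sections disagree
    at some point of an overlap.
    """
    glued = {}
    for s in sections:
        for x, y in s.items():
            if x in glued and glued[x] != y:
                return None          # the sections disagree on an overlap
            glued[x] = y
    return glued


s1 = {1: "a", 2: "b"}        # a section over U_1 = {1, 2}
s2 = {2: "b", 3: "c"}        # a section over U_2 = {2, 3}, agreeing with s1 at 2
print(glue([s1, s2]))

s3 = {2: "z", 3: "c"}        # disagrees with s1 on the overlap {2}
print(glue([s1, s3]))
```

Uniqueness is visible in the code: the value of the glued section at each point is forced by whichever s_i is defined there.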

It follows that \Gamma \Lambda sends presheaves to sheaves. Since sheaves are a full subcategory of presheaves, \Gamma \Lambda automatically sends any morphism of presheaves to a morphism of sheaves, and we get the sheafification functor

\Gamma \Lambda \colon \widehat{\mathcal{O}(X)} \to \mathsf{Sh}(X)

To fully understand this, it’s good to actually take a presheaf and sheafify it. So take a presheaf:

F \colon \mathcal{O}(X)^{\mathrm{op}} \to \mathsf{Set}

When we hit this with \Lambda, we get a bundle

p \colon \Lambda F \to X

Remember: any element of F(U) for any open neighborhood U of x gives a point over x in \Lambda F, all points over x show up this way, and two such elements s \in F(U), s' \in F(U') determine the same point iff they become equal when we restrict them to some sufficiently small open neighborhood of x.

When we hit this bundle with \Gamma, we get a sheaf

\Gamma \Lambda F \colon \mathcal{O}(X)^{\mathrm{op}} \to \mathsf{Set}

where (\Gamma \Lambda F)U is the set of sections of p over U. This is the sheafification of F.

So, if you think about it, you’ll see this: to define a section of the sheafification of F over an open set U, you can just take a bunch of sections of F over open sets covering U that agree when restricted to the overlaps.

Puzzle. Prove the above claim. Give a procedure for constructing a section of \Gamma \Lambda F over U given open sets U_i \subseteq U covering U and sections s_i of F over the U_i that obey

\displaystyle{ s_i|_{U_i \cap U_j} = s_j|_{U_i \cap U_j} }

The adjunction between presheaves and bundles

Here’s one nice consequence of the last puzzle. We can always use the trivial cover of U by U itself! Thus, any section of F over U gives a section of \Gamma \Lambda F over U. This is the key to the following puzzle:

Puzzle. Show that for any presheaf F there is a morphism of presheaves

\eta_F \colon F \to \Gamma \Lambda F

Show that these morphisms are natural in F, so they define a natural transformation \eta \colon 1 \Rightarrow \Gamma \Lambda.

Now, this is just the sort of thing we’d expect if \Lambda were the left adjoint of \Gamma. Remember, when you have a left adjoint L \colon C \to D and a right adjoint R \colon D \to C, you always have a ‘unit’

\eta \colon 1 \Rightarrow R L

and a ‘counit’

\epsilon \colon L R \Rightarrow 1

where the double arrows stand for natural transformations.

And indeed, in Part 2 I claimed that \Lambda is the left adjoint of \Gamma. But I didn’t prove it. What we’re doing now could be part of the proof: in fact Mac Lane and Moerdijk prove it this way in Theorem 2 of Section II.6.

Let’s see if we can construct the counit

\epsilon \colon \Lambda \Gamma \Rightarrow 1

For this I hand you a bundle

p \colon Y \to X

You form its sheaf of sections \Gamma_p, and then you form the etale space \Lambda \Gamma_p of that. Then you want to construct a morphism of bundles \epsilon_p from your etale space \Lambda \Gamma_p to my original bundle.

Mac Lane and Moerdijk call the construction ‘inevitable’. Here’s how it works. We get points in \Lambda \Gamma_p over x \in X from sections of p \colon Y \to X over open sets containing x. But you can just take one of these sections and evaluate it at x and get a point in Y.

Puzzle. Show that this procedure gives a well-defined continuous map

\epsilon_p \colon \Lambda \Gamma_p \to Y

and that this is actually a morphism of bundles over X. Show that these morphisms define a natural transformation \epsilon \colon \Lambda \Gamma \Rightarrow 1.

Now that we have the unit and counit, if you’re feeling ambitious you can show they obey the two equations required to get a pair of adjoint functors, thus solving the following puzzle:

Puzzle. Show that

\Lambda \colon \widehat{\mathcal{O}(X)} \to \mathsf{Top}/X

is left adjoint to

\Gamma \colon \mathsf{Top}/X \to \widehat{\mathcal{O}(X)}

If you’re not feeling so ambitious, just look at Mac Lane and Moerdijk’s proof of Theorem 2 in Section II.6!
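For reference, the two equations in question are the usual triangle identities for an adjunction. Here they are written in terms of the unit \eta and counit \epsilon constructed above; the whiskering notation (\Lambda \eta, \epsilon \Lambda, and so on) is standard but is not spelled out in this post, so treat it as an assumption about notation.

```latex
% Triangle identities for \Lambda left adjoint to \Gamma
(\epsilon \Lambda) \circ (\Lambda \eta) = 1_{\Lambda},
\qquad
(\Gamma \epsilon) \circ (\eta \Gamma) = 1_{\Gamma}

% Componentwise: for each presheaf F and each bundle p \colon Y \to X,
\Lambda F \xrightarrow{\ \Lambda \eta_F\ } \Lambda \Gamma \Lambda F \xrightarrow{\ \epsilon_{\Lambda F}\ } \Lambda F
\quad \text{is the identity, and} \quad
\Gamma_p \xrightarrow{\ \eta_{\Gamma_p}\ } \Gamma \Lambda \Gamma_p \xrightarrow{\ \Gamma \epsilon_p\ } \Gamma_p
\quad \text{is the identity.}
```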

January 20, 2020

Doug Natelson Brief items

Here are some items of interest:

  • An attempt to lay out a vision for research in the US beyond Science: The Endless Frontier.  The evolving roles of the national academies are interesting, though I found the description of the future of research universities to be rather vague - I'm not sure growing universities to the size of Arizona State is the best way to provide high quality access to knowledge for a large population.  It still feels to me like an eventual successful endpoint for online education could be natural language individualized tutoring ("Alexa, teach me multivariable calculus."), but we are still a long way from there.
  • Atomic-resolution movies of chemistry are still cool.
  • Dan Ralph at Cornell has done a nice service to the community by making his lecture notes available on the arxiv.  The intent is for these to serve as a supplement to a solid state course such as one out of Ashcroft and Mermin, bringing students up to date about Berry curvature and topology at a similar level to that famous text.
  • This preprint tries to understand an extremely early color photography process developed by Becquerel (the photovoltaic one, who was the father of the radioactivity Becquerel).  It turns out that there are systematic changes in reflectivity spectra of the exposed Ag/AgCl films depending on the incident wavelength.  Why the reflectivity changes that way remains a mystery to me after reading this.
  • On a related note, this led me to this PNAS paper about the role of plasmons in the daguerreotype process.  Voila, nanophotonics in the 19th century.
  • This preprint (now out in Nature Nano) demonstrates incredibly sensitive measurements of torques on very rapidly rotating dielectric nanoparticles.  This could be used to see vacuum rotational friction.
  • The inventors of chemically amplified photoresists have been awarded the Charles Stark Draper prize.  Without that research, you probably would not have the computing device sitting in front of you....

Terence Tao An uncountable Moore-Schmidt theorem

Asgar Jamneshan and I have just uploaded to the arXiv our paper “An uncountable Moore-Schmidt theorem”. This paper revisits a classical theorem of Moore and Schmidt in measurable cohomology of measure-preserving systems. To state the theorem, let {X = (X,{\mathcal X},\mu)} be a probability space, and {\mathrm{Aut}(X, {\mathcal X}, \mu)} be the group of measure-preserving automorphisms of this space, that is to say the invertible bimeasurable maps {T: X \rightarrow X} that preserve the measure {\mu}: {T_* \mu = \mu}. To avoid some ambiguity later in this post when we introduce abstract analogues of measure theory, we will refer to measurable maps as concrete measurable maps, and measurable spaces as concrete measurable spaces. (One could also call {X = (X,{\mathcal X}, \mu)} a concrete probability space, but we will not need to do so here as we will not be working explicitly with abstract probability spaces.)

Let {\Gamma = (\Gamma,\cdot)} be a discrete group. A (concrete) measure-preserving action of {\Gamma} on {X} is a group homomorphism {\gamma \mapsto T^\gamma} from {\Gamma} to {\mathrm{Aut}(X, {\mathcal X}, \mu)}, thus {T^1} is the identity map and {T^{\gamma_1} \circ T^{\gamma_2} = T^{\gamma_1 \gamma_2}} for all {\gamma_1,\gamma_2 \in \Gamma}. A large portion of ergodic theory is concerned with the study of such measure-preserving actions, especially in the classical case when {\Gamma} is the integers (with the additive group law).

Let {K = (K,+)} be a compact Hausdorff abelian group, which we can endow with the Borel {\sigma}-algebra {{\mathcal B}(K)}. A (concrete measurable) {K}-valued cocycle is a collection {\rho = (\rho_\gamma)_{\gamma \in \Gamma}} of concrete measurable maps {\rho_\gamma: X \rightarrow K} obeying the cocycle equation

\displaystyle \rho_{\gamma_1 \gamma_2}(x) = \rho_{\gamma_1} \circ T^{\gamma_2}(x) + \rho_{\gamma_2}(x) \ \ \ \ \ (1)

for {\mu}-almost every {x \in X}. (Here we are glossing over a measure-theoretic subtlety that we will return to later in this post – see if you can spot it before then!) Cocycles arise naturally in the theory of group extensions of dynamical systems; in particular (and ignoring the aforementioned subtlety), each cocycle induces a measure-preserving action {\gamma \mapsto S^\gamma} on {X \times K} (which we endow with the product of {\mu} with Haar probability measure on {K}), defined by

\displaystyle S^\gamma( x, k ) := (T^\gamma x, k + \rho_\gamma(x) ).

This connection with group extensions was the original motivation for our study of measurable cohomology, but is not the focus of the current paper.
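To see concretely why the cocycle equation is what makes {S^\gamma} an action, here is a minimal finite sketch. It assumes a toy setting that is not in the paper: {\Gamma} is the integers (so the group law is written additively), {X = {\bf Z}/5} with counting measure and {T^n} the rotation by {n}, and {K = {\bf Z}/7}. The function `g` and all names below are hypothetical choices made for illustration only.

```python
from itertools import product

N, M = 5, 7            # hypothetical sizes: X = Z/N, K = Z/M
g = [3, 1, 4, 1, 5]    # arbitrary function g: X -> K used to generate a cocycle


def T(n, x):
    """Action of n in Gamma = Z on X = Z/N: rotation by n."""
    return (x + n) % N


def rho(n, x):
    """Cocycle rho_n(x), obtained by summing g along the orbit of x."""
    if n >= 0:
        return sum(g[T(j, x)] for j in range(n)) % M
    return (-sum(g[T(j, x)] for j in range(n, 0))) % M


def S(n, point):
    """Skew-product map S^n(x, k) = (T^n x, k + rho_n(x)) on X x K."""
    x, k = point
    return (T(n, x), (k + rho(n, x)) % M)


def cocycle_equation_holds(n1, n2):
    """Check rho_{n1+n2}(x) = rho_{n1}(T^{n2} x) + rho_{n2}(x) for every x."""
    return all(
        rho(n1 + n2, x) == (rho(n1, T(n2, x)) + rho(n2, x)) % M
        for x in range(N)
    )


def action_law_holds(n1, n2):
    """Check S^{n1} o S^{n2} = S^{n1+n2} at every point of X x K."""
    return all(
        S(n1, S(n2, p)) == S(n1 + n2, p)
        for p in product(range(N), range(M))
    )
```

In this finite model everything holds everywhere, so the null-set subtlety mentioned above does not appear; the sketch only illustrates the algebra.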

A special case of a {K}-valued cocycle is a (concrete measurable) {K}-valued coboundary, in which {\rho_\gamma} for each {\gamma \in \Gamma} takes the special form

\displaystyle \rho_\gamma(x) = F \circ T^\gamma(x) - F(x)

for {\mu}-almost every {x \in X}, where {F: X \rightarrow K} is some measurable function; note that (ignoring the aforementioned subtlety), every function of this form is automatically a concrete measurable {K}-valued cocycle. One of the first basic questions in measurable cohomology is to try to characterize which {K}-valued cocycles are in fact {K}-valued coboundaries. This is a difficult question in general. However, there is a general result of Moore and Schmidt that at least allows one to reduce to the model case when {K} is the unit circle {\mathbf{T} = {\bf R}/{\bf Z}}, by taking advantage of the Pontryagin dual group {\hat K} of characters {\hat k: K \rightarrow \mathbf{T}}, that is to say the collection of continuous homomorphisms {\hat k: k \mapsto \langle \hat k, k \rangle} to the unit circle. More precisely, we have

Theorem 1 (Countable Moore-Schmidt theorem) Let {\Gamma} be a discrete group acting in a concrete measure-preserving fashion on a probability space {X}. Let {K} be a compact Hausdorff abelian group. Assume the following additional hypotheses:

  • (i) {\Gamma} is at most countable.
  • (ii) {X} is a standard Borel space.
  • (iii) {K} is metrisable.

Then a {K}-valued concrete measurable cocycle {\rho = (\rho_\gamma)_{\gamma \in \Gamma}} is a concrete coboundary if and only if for each character {\hat k \in \hat K}, the {\mathbf{T}}-valued cocycles {\langle \hat k, \rho \rangle = ( \langle \hat k, \rho_\gamma \rangle )_{\gamma \in \Gamma}} are concrete coboundaries.

The hypotheses (i), (ii), (iii) are saying in some sense that the data {\Gamma, X, K} are not too “large”; in all three cases they are saying in some sense that the data are only “countably complicated”. For instance, (iii) is equivalent to {K} being second countable, and (ii) is equivalent to {X} being modeled by a complete separable metric space. It is because of this restriction that we refer to this result as a “countable” Moore-Schmidt theorem. This theorem is a useful tool in several other applications, such as the Host-Kra structure theorem for ergodic systems; I hope to return to these subsequent applications in a future post.

Let us very briefly sketch the main ideas of the proof of Theorem 1. Ignore for now issues of measurability, and pretend that something that holds almost everywhere in fact holds everywhere. The hard direction is to show that if each {\langle \hat k, \rho \rangle} is a coboundary, then so is {\rho}. By hypothesis, we then have an equation of the form

\displaystyle \langle \hat k, \rho_\gamma(x) \rangle = \alpha_{\hat k} \circ T^\gamma(x) - \alpha_{\hat k}(x) \ \ \ \ \ (2)

for all {\hat k, \gamma, x} and some functions {\alpha_{\hat k}: X \rightarrow {\mathbf T}}, and our task is then to produce a function {F: X \rightarrow K} for which

\displaystyle \rho_\gamma(x) = F \circ T^\gamma(x) - F(x)

for all {\gamma,x}.

Comparing the two equations, the task would be easy if we could find an {F: X \rightarrow K} for which

\displaystyle \langle \hat k, F(x) \rangle = \alpha_{\hat k}(x) \ \ \ \ \ (3)

for all {\hat k, x}. However there is an obstruction to this: the left-hand side of (3) is additive in {\hat k}, so the right-hand side would have to be also in order to obtain such a representation. In other words, for this strategy to work, one would have to first establish the identity

\displaystyle \alpha_{\hat k_1 + \hat k_2}(x) - \alpha_{\hat k_1}(x) - \alpha_{\hat k_2}(x) = 0 \ \ \ \ \ (4)

for all {\hat k_1, \hat k_2, x}. On the other hand, the good news is that if we somehow manage to obtain the equation (4), then we can obtain a function {F} obeying (3), thanks to Pontryagin duality, which gives a one-to-one correspondence between {K} and the homomorphisms of the (discrete) group {\hat K} to {\mathbf{T}}.
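The role of Pontryagin duality here can be illustrated in the simplest finite case. The sketch below assumes {K = {\bf Z}/12} (a hypothetical choice), whose characters are indexed by {m \in {\bf Z}/12} with pairing {\langle m, k \rangle = mk/12 \hbox{ mod } 1}. Given values {\alpha_m \in {\bf T}} that are additive in {m}, it recovers the unique point {k \in K} with {\langle m, k \rangle = \alpha_m}; the function names are invented for this example.

```python
from fractions import Fraction

M = 12  # hypothetical choice: K = Z/M, so the dual group is also Z/M


def pairing(m, k):
    """Character pairing <m, k> = m*k/M mod 1, as an exact fraction in [0, 1)."""
    return Fraction(m * k, M) % 1


def is_additive(alpha):
    """Check alpha[m1 + m2] = alpha[m1] + alpha[m2] in R/Z for all m1, m2."""
    return all(
        (alpha[(m1 + m2) % M] - alpha[m1] - alpha[m2]) % 1 == 0
        for m1 in range(M)
        for m2 in range(M)
    )


def recover_point(alpha):
    """Return the k in K with <m, k> = alpha[m] for every character m.

    Additivity forces M * alpha[1] to be an integer, and that integer
    (mod M) is the desired point of K.
    """
    if not is_additive(alpha):
        raise ValueError("alpha is not additive, so no such point exists")
    return int(alpha[1] * M) % M
```

For instance, `recover_point([pairing(m, 5) for m in range(M)])` returns `5`, while a non-additive family is rejected, mirroring the obstruction (4).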

Now, it turns out that one cannot derive the equation (4) directly from the given information (2). However, the left-hand side of (2) is additive in {\hat k}, so the right-hand side must be also. Manipulating this fact, we eventually arrive at

\displaystyle (\alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1} - \alpha_{\hat k_2}) \circ T^\gamma(x) = (\alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1} - \alpha_{\hat k_2})(x).

In other words, we don’t get to show that the left-hand side of (4) vanishes, but we do at least get to show that it is {\Gamma}-invariant. Now let us assume for sake of argument that the action of {\Gamma} is ergodic, which (ignoring issues about sets of measure zero) basically asserts that the only {\Gamma}-invariant functions are constant. So now we get a weaker version of (4), namely

\displaystyle \alpha_{\hat k_1 + \hat k_2}(x) - \alpha_{\hat k_1}(x) - \alpha_{\hat k_2}(x) = c_{\hat k_1, \hat k_2} \ \ \ \ \ (5)

for some constants {c_{\hat k_1, \hat k_2} \in \mathbf{T}}.
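The manipulation leading to the invariance statement can be spelled out as follows, in the same idealisation where everything holds everywhere. The abbreviation {\beta} is introduced only for this illustration and does not appear in the original argument.

```latex
% Abbreviation (for this illustration only):
%   \beta := \alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1} - \alpha_{\hat k_2}
\begin{aligned}
0 &= \langle \hat k_1 + \hat k_2, \rho_\gamma(x) \rangle
     - \langle \hat k_1, \rho_\gamma(x) \rangle
     - \langle \hat k_2, \rho_\gamma(x) \rangle
     && \text{(additivity of the pairing in } \hat k\text{)} \\
  &= \bigl( \alpha_{\hat k_1 + \hat k_2} \circ T^\gamma(x) - \alpha_{\hat k_1 + \hat k_2}(x) \bigr)
     - \bigl( \alpha_{\hat k_1} \circ T^\gamma(x) - \alpha_{\hat k_1}(x) \bigr)
     - \bigl( \alpha_{\hat k_2} \circ T^\gamma(x) - \alpha_{\hat k_2}(x) \bigr)
     && \text{(by (2), three times)} \\
  &= \beta \circ T^\gamma(x) - \beta(x),
\end{aligned}
% so \beta \circ T^\gamma = \beta for every \gamma, i.e. \beta is \Gamma-invariant.
```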

Now we need to eliminate the constants. This can be done by the following group-theoretic projection. Let {L^0({\bf X} \rightarrow {\bf T})} denote the space of concrete measurable maps {\alpha} from {{\bf X}} to {{\bf T}}, up to almost everywhere equivalence; this is an abelian group where the various terms in (5) naturally live. Inside this group we have the subgroup {{\bf T}} of constant functions (up to almost everywhere equivalence); this is where the right-hand side of (5) lives. Because {{\bf T}} is a divisible group, there is an application of Zorn’s lemma (a good exercise for those who are not acquainted with these things) to show that there exists a retraction {w: L^0({\bf X} \rightarrow {\bf T}) \rightarrow {\bf T}}, that is to say a group homomorphism that is the identity on the subgroup {{\bf T}}. We can use this retraction, or more precisely the complement {\alpha \mapsto \alpha - w(\alpha)}, to eliminate the constant in (5). Indeed, if we set

\displaystyle \tilde \alpha_{\hat k}(x) := \alpha_{\hat k}(x) - w(\alpha_{\hat k})

then from (5) we see that

\displaystyle \tilde \alpha_{\hat k_1 + \hat k_2}(x) - \tilde \alpha_{\hat k_1}(x) - \tilde \alpha_{\hat k_2}(x) = 0

while from (2) one has

\displaystyle \langle \hat k, \rho_\gamma(x) \rangle = \tilde \alpha_{\hat k} \circ T^\gamma(x) - \tilde \alpha_{\hat k}(x)

and now the previous strategy works with {\alpha_{\hat k}} replaced by {\tilde \alpha_{\hat k}}. This concludes the sketch of proof of Theorem 1.
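The two claims about {\tilde \alpha_{\hat k}} follow from the fact that {w} is a homomorphism that fixes constants. Here is the short verification, written as a sketch in the same "everywhere" idealisation as above.

```latex
% Apply the homomorphism w to both sides of (5); w fixes the constant c_{\hat k_1, \hat k_2}:
w(\alpha_{\hat k_1 + \hat k_2}) - w(\alpha_{\hat k_1}) - w(\alpha_{\hat k_2}) = c_{\hat k_1, \hat k_2}.

% Subtract this from (5):
\tilde \alpha_{\hat k_1 + \hat k_2}(x) - \tilde \alpha_{\hat k_1}(x) - \tilde \alpha_{\hat k_2}(x)
  = c_{\hat k_1, \hat k_2} - c_{\hat k_1, \hat k_2} = 0.

% The constant w(\alpha_{\hat k}) cancels in the coboundary expression, so (2) is unaffected:
\tilde \alpha_{\hat k} \circ T^\gamma(x) - \tilde \alpha_{\hat k}(x)
  = \bigl( \alpha_{\hat k} \circ T^\gamma(x) - w(\alpha_{\hat k}) \bigr)
    - \bigl( \alpha_{\hat k}(x) - w(\alpha_{\hat k}) \bigr)
  = \alpha_{\hat k} \circ T^\gamma(x) - \alpha_{\hat k}(x)
  = \langle \hat k, \rho_\gamma(x) \rangle.
```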

In making the above argument rigorous, the hypotheses (i)-(iii) are used in several places. For instance, to reduce to the ergodic case one relies on the ergodic decomposition, which requires the hypothesis (ii). Also, most of the above equations only hold outside of a set of measure zero, and the hypothesis (i) and the hypothesis (iii) (which is equivalent to {\hat K} being at most countable) are needed to avoid the problem that an uncountable union of sets of measure zero could have positive measure (or fail to be measurable at all).

My co-author Asgar Jamneshan and I are working on a long-term project to extend many results in ergodic theory (such as the aforementioned Host-Kra structure theorem) to “uncountable” settings in which hypotheses analogous to (i)-(iii) are omitted; thus we wish to consider actions of uncountable groups, on spaces that are not standard Borel, and cocycles taking values in groups that are not metrisable. Such uncountable contexts naturally arise when trying to apply ergodic theory techniques to combinatorial problems (such as the inverse conjecture for the Gowers norms), as one often relies on the ultraproduct construction (or something similar) to generate an ergodic theory translation of these problems, and these constructions usually give “uncountable” objects rather than “countable” ones. (For instance, the ultraproduct of finite groups is a hyperfinite group, which is usually uncountable.) This paper marks the first step in this project by extending the Moore-Schmidt theorem to the uncountable setting.

If one simply drops the hypotheses (i)-(iii) and tries to prove the Moore-Schmidt theorem, several serious difficulties arise. We have already mentioned the loss of the ergodic decomposition and the possibility that one has to control an uncountable union of null sets. But there is in fact a more basic problem when one deletes (iii): the addition operation {+: K \times K \rightarrow K}, while still continuous, can fail to be measurable as a map from {(K \times K, {\mathcal B}(K) \otimes {\mathcal B}(K))} to {(K, {\mathcal B}(K))}! Thus for instance the sum of two measurable functions {F: X \rightarrow K} need not remain measurable, which makes even the very definition of a measurable cocycle or measurable coboundary problematic (or at least unnatural). This phenomenon is known as the Nedoma pathology. A standard example arises when {K} is the uncountable torus {{\mathbf T}^{{\bf R}}}, endowed with the product topology. Crucially, the Borel {\sigma}-algebra {{\mathcal B}(K)} generated by this uncountable product is not the product {{\mathcal B}(\mathbf{T})^{\otimes {\bf R}}} of the factor Borel {\sigma}-algebras (the discrepancy ultimately arises from the fact that topologies permit uncountable unions, but {\sigma}-algebras do not); relating to this, the product {\sigma}-algebra {{\mathcal B}(K) \otimes {\mathcal B}(K)} is not the same as the Borel {\sigma}-algebra {{\mathcal B}(K \times K)}, but is instead a strict sub-algebra. If the group operations on {K} were measurable, then the diagonal set

\displaystyle K^\Delta := \{ (k,k') \in K \times K: k = k' \} = \{ (k,k') \in K \times K: k - k' = 0 \}

would be measurable in {{\mathcal B}(K) \otimes {\mathcal B}(K)}. But it is an easy exercise in manipulation of {\sigma}-algebras to show that if {(X, {\mathcal X}), (Y, {\mathcal Y})} are any two measurable spaces and {E \subset X \times Y} is measurable in {{\mathcal X} \otimes {\mathcal Y}}, then the fibres {E_x := \{ y \in Y: (x,y) \in E \}} of {E} are contained in some countably generated subalgebra of {{\mathcal Y}}. Thus if {K^\Delta} were {{\mathcal B}(K) \otimes {\mathcal B}(K)}-measurable, then all the points of {K} would lie in a single countably generated {\sigma}-algebra. But the cardinality of such an algebra is at most {2^{\aleph_0}} while the cardinality of {K} is {2^{2^{\aleph_0}}}, and Cantor’s theorem then gives a contradiction.

To resolve this problem, we give {K} a coarser {\sigma}-algebra than the Borel {\sigma}-algebra, namely the Baire {\sigma}-algebra {{\mathcal B}^\otimes(K)}, thus coarsening the measurable space structure on {K = (K,{\mathcal B}(K))} to a new measurable space {K_\otimes := (K, {\mathcal B}^\otimes(K))}. In the case of compact Hausdorff abelian groups, {{\mathcal B}^{\otimes}(K)} can be defined as the {\sigma}-algebra generated by the characters {\hat k: K \rightarrow {\mathbf T}}; for more general compact abelian groups, one can define {{\mathcal B}^{\otimes}(K)} as the {\sigma}-algebra generated by all continuous maps into metric spaces. This {\sigma}-algebra is equal to {{\mathcal B}(K)} when {K} is metrisable but can be smaller for other {K}. With this measurable structure, {K_\otimes} becomes a measurable group; it seems that once one leaves the metrisable world, {K_\otimes} is a superior (or at least equally good) space to work with than {K} for analysis, as it avoids the Nedoma pathology. (For instance, from Plancherel’s theorem, we see that if {m_K} is the Haar probability measure on {K}, then {L^2(K,m_K) = L^2(K_\otimes,m_K)} (thus, every {K}-measurable set is equivalent modulo {m_K}-null sets to a {K_\otimes}-measurable set), so there is no damage to Plancherel caused by passing to the Baire {\sigma}-algebra.)

Passing to the Baire {\sigma}-algebra {K_\otimes} fixes the most severe problems with an uncountable Moore-Schmidt theorem, but one is still faced with an issue of having to potentially take an uncountable union of null sets. To avoid this sort of problem, we pass to the framework of abstract measure theory, in which we remove explicit mention of “points” and can easily delete all null sets at a very early stage of the formalism. In this setup, the category of concrete measurable spaces is replaced with the larger category of abstract measurable spaces, which we formally define as the opposite category of the category of {\sigma}-algebras (with Boolean algebra homomorphisms). Thus, we define an abstract measurable space to be an object of the form {{\mathcal X}^{\mathrm{op}}}, where {{\mathcal X}} is an (abstract) {\sigma}-algebra and {\mathrm{op}} is a formal placeholder symbol that signifies use of the opposite category, and an abstract measurable map {T: {\mathcal X}^{\mathrm{op}} \rightarrow {\mathcal Y}^{\mathrm{op}}} is an object of the form {(T^*)^{\mathrm{op}}}, where {T^*: {\mathcal Y} \rightarrow {\mathcal X}} is a Boolean algebra homomorphism and {\mathrm{op}} is again used as a formal placeholder; we call {T^*} the pullback map associated to {T}.  [UPDATE: It turns out that this definition of a measurable map led to technical issues.  In a forthcoming revision of the paper we also impose the requirement that the abstract measurable map be \sigma-complete (i.e., it respects countable joins).] The composition {S \circ T: {\mathcal X}^{\mathrm{op}} \rightarrow {\mathcal Z}^{\mathrm{op}}} of two abstract measurable maps {T: {\mathcal X}^{\mathrm{op}} \rightarrow {\mathcal Y}^{\mathrm{op}}}, {S: {\mathcal Y}^{\mathrm{op}} \rightarrow {\mathcal Z}^{\mathrm{op}}} is defined by the formula {S \circ T := (T^* \circ S^*)^{\mathrm{op}}}, or equivalently {(S \circ T)^* = T^* \circ S^*}.
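The contravariant composition rule {(S \circ T)^* = T^* \circ S^*} is easiest to see in the concrete case, where the pullback of a map is simply the preimage operation on sets. The following sketch checks this on small finite sets with their power-set {\sigma}-algebras; the sets, the maps `T` and `S`, and the helper names are all hypothetical and chosen only for illustration.

```python
from itertools import chain, combinations


def subsets(points):
    """All subsets of a finite set, i.e. its power-set sigma-algebra."""
    pts = list(points)
    return [
        frozenset(c)
        for c in chain.from_iterable(combinations(pts, r) for r in range(len(pts) + 1))
    ]


def pullback(f, domain):
    """Return the pullback map f^*: E |-> f^{-1}(E) for a map given as a dict."""
    def f_star(E):
        return frozenset(x for x in domain if f[x] in E)
    return f_star


X, Y, Z = range(6), range(3), range(2)
T = {x: x % 3 for x in X}            # hypothetical concrete map T: X -> Y
S = {y: y % 2 for y in Y}            # hypothetical concrete map S: Y -> Z
S_after_T = {x: S[T[x]] for x in X}  # the composite S o T: X -> Z

T_star = pullback(T, X)
S_star = pullback(S, Y)
ST_star = pullback(S_after_T, X)


def contravariance_holds():
    """Check (S o T)^* E = T^*(S^* E) for every subset E of Z."""
    return all(ST_star(E) == T_star(S_star(E)) for E in subsets(Z))
```

In the abstract setting there are no points to take preimages of, which is why the composition law has to be imposed by definition rather than verified.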

Every concrete measurable space {X = (X,{\mathcal X})} can be identified with an abstract counterpart {{\mathcal X}^{op}}, and similarly every concrete measurable map {T: X \rightarrow Y} can be identified with an abstract counterpart {(T^*)^{op}}, where {T^*: {\mathcal Y} \rightarrow {\mathcal X}} is the pullback map {T^* E := T^{-1}(E)}. Thus the category of concrete measurable spaces can be viewed as a subcategory of the category of abstract measurable spaces. The advantage of working in the abstract setting is that it gives us access to more spaces that could not be directly defined in the concrete setting. Most importantly for us, we have a new abstract space, the opposite measure algebra {X_\mu} of {X}, defined as {({\bf X}/{\bf N})^{op}} where {{\bf N}} is the ideal of null sets in {{\bf X}}. Informally, {X_\mu} is the space {X} with all the null sets removed; there is a canonical abstract embedding map {\iota: X_\mu \rightarrow X}, which allows one to convert any concrete measurable map {f: X \rightarrow Y} into an abstract one {[f]: X_\mu \rightarrow Y}. One can then define the notion of an abstract action, abstract cocycle, and abstract coboundary by replacing every occurrence of the category of concrete measurable spaces with their abstract counterparts, and replacing {X} with the opposite measure algebra {X_\mu}; see the paper for details. Our main theorem is then

Theorem 2 (Uncountable Moore-Schmidt theorem) Let {\Gamma} be a discrete group acting abstractly on a {\sigma}-finite measure space {X}. Let {K} be a compact Hausdorff abelian group. Then a {K_\otimes}-valued abstract measurable cocycle {\rho = (\rho_\gamma)_{\gamma \in \Gamma}} is an abstract coboundary if and only if for each character {\hat k \in \hat K}, the {\mathbf{T}}-valued cocycles {\langle \hat k, \rho \rangle = ( \langle \hat k, \rho_\gamma \rangle )_{\gamma \in \Gamma}} are abstract coboundaries.

With the abstract formalism, the proof of the uncountable Moore-Schmidt theorem is almost identical to the countable one (in fact we were able to make some simplifications, such as avoiding the use of the ergodic decomposition). A key tool is what we call a “conditional Pontryagin duality” theorem, which asserts that if one has an abstract measurable map {\alpha_{\hat k}: X_\mu \rightarrow {\bf T}} for each {\hat k \in \hat K} obeying the identity { \alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1} - \alpha_{\hat k_2} = 0} for all {\hat k_1,\hat k_2 \in \hat K}, then there is an abstract measurable map {F: X_\mu \rightarrow K_\otimes} such that {\alpha_{\hat k} = \langle \hat k, F \rangle} for all {\hat k \in \hat K}. This is derived from the usual Pontryagin duality and some other tools, most notably the completeness of the {\sigma}-algebra of {X_\mu}, and the Sikorski extension theorem.

We feel that it is natural to stay within the abstract measure theory formalism whenever dealing with uncountable situations. However, it is still an interesting question as to when one can guarantee that the abstract objects constructed in this formalism are representable by concrete analogues. The basic questions in this regard are:

  • (i) Suppose one has an abstract measurable map {f: X_\mu \rightarrow Y} into a concrete measurable space. Does there exist a representation of {f} by a concrete measurable map {\tilde f: X \rightarrow Y}? Is it unique up to almost everywhere equivalence?
  • (ii) Suppose one has a concrete cocycle that is an abstract coboundary. When can it be represented by a concrete coboundary?

For (i) the answer is somewhat interesting (as I learned after posing this MathOverflow question):

  • If {Y} does not separate points, or is not compact metrisable or Polish, there can be counterexamples to uniqueness. If {Y} is not compact or Polish, there can be counterexamples to existence.
  • If {Y} is a compact metric space or a Polish space, then one always has existence and uniqueness.
  • If {Y} is a compact Hausdorff abelian group, one always has existence.
  • If {X} is a complete measure space, then one always has existence (from a theorem of Maharam).
  • If {X} is the unit interval with the Borel {\sigma}-algebra and Lebesgue measure, then one has existence for all compact Hausdorff {Y} assuming the continuum hypothesis (from a theorem of von Neumann) but existence can fail under other extensions of ZFC (from a theorem of Shelah, using the method of forcing).
  • For more general {X}, existence for all compact Hausdorff {Y} is equivalent to the existence of a lifting from the {\sigma}-algebra {\mathcal{X}/\mathcal{N}} to {\mathcal{X}} (or, in the language of abstract measurable spaces, the existence of an abstract retraction from {X} to {X_\mu}).
  • It is a long-standing open question (posed for instance by Fremlin) whether it is relatively consistent with ZFC that existence holds whenever {Y} is compact Hausdorff.

Our understanding of (ii) is much less complete:

  • If {K} is metrisable, the answer is “always” (which among other things establishes the countable Moore-Schmidt theorem as a corollary of the uncountable one).
  • If {\Gamma} is at most countable and {X} is a complete measure space, then the answer is again “always”.

In view of the answers to (i), I would not be surprised if the full answer to (ii) was also sensitive to axioms of set theory. However, such set theoretic issues seem to be almost completely avoided if one sticks with the abstract formalism throughout; they only arise when trying to pass back and forth between the abstract and concrete categories.

January 19, 2020

n-Category Café Random Permutations (Part 13)

Last time I started talking about the groupoid of ‘finite sets equipped with permutation’, \mathsf{Perm}. Remember:

  • an object (X,\sigma) of \mathsf{Perm} is a finite set X with a bijection \sigma \colon X \to X;
  • a morphism f \colon (X,\sigma) \to (X',\sigma') is a bijection f \colon X \to X' such that \sigma' = f \sigma f^{-1}.

In other words, \mathsf{Perm} is the groupoid of finite \mathbb{Z}-sets. It’s also equivalent to the groupoid of covering spaces of the circle having finitely many sheets!

Today I’d like to talk about another slightly bigger groupoid. It’s very pretty, and I think it will shed light on a puzzle we saw earlier: the mysterious connection between random permutations and Poisson distributions.

I’ll conclude with a question for homotopy theorists.

Remember the formula I proved last time:

\mathsf{Perm} \simeq \sum_{y \in Y} \prod_{k=1}^\infty \frac{ \mathsf{B}(\mathbb{Z}/k)^{y(k)}}{ y(k)!}

where Y is the set of Young diagrams, y(k) is the number of columns of length k in the Young diagram y, \mathsf{B}(G) is the one-object groupoid corresponding to the group G, and for any category \mathsf{C} I’m using

\frac{\mathsf{C}^n}{n!}

to stand for the ‘weak quotient’ of \mathsf{C}^n by the permutation group S_n. (That is, instead of just modding out, we throw in isomorphisms coming from permutations. I explained this in more detail last time.)

Now, in math we often see expressions like

\sum_{n = 0}^\infty \frac{x^n}{n!}

and for any category \mathsf{C},

\mathsf{S}(\mathsf{C}) = \sum_{n = 0}^\infty \frac{\mathsf{C}^n}{n!}

is the free symmetric monoidal category on \mathsf{C}. The formula for \mathsf{Perm} looks vaguely similar! Indeed, the free symmetric monoidal category on \mathsf{B}(\mathbb{Z}/k) is

\mathsf{S}(\mathsf{B}(\mathbb{Z}/k)) = \sum_{n = 0}^\infty \frac{\mathsf{B}(\mathbb{Z}/k)^n}{n!}

and this seems to be lurking in the background in a strangely fractured way here:

\mathsf{Perm} \simeq \sum_{y \in Y} \prod_{k=1}^\infty \frac{ \mathsf{B}(\mathbb{Z}/k)^{y(k)}}{ y(k)!}

What’s going on?

What’s going on is that Y, the set of Young diagrams, is really the set of functions y \colon \mathbb{N}^+ \to \mathbb{N} that vanish except at finitely many points. Suppose we drop that finiteness condition! Then things get nicer.

Remember, in any situation where products distribute over sums, if we have a bunch of things x_{i j} indexed by i \in I, j \in J we can write the distributive law as

\prod_{i \in I} \sum_{j \in J} x_{i j} \quad \simeq \quad \sum_{f \colon I \to J} \prod_{i \in I} x_{i f(i)}

For example, if we multiply 5 sums of 2 things we get a sum of 2^5 products of 5 things. So, if we take

\mathsf{Perm} \simeq \sum_{y \in Y} \prod_{k=1}^\infty \frac{ \mathsf{B}(\mathbb{Z}/k)^{y(k)}}{ y(k)!}

and drop the finiteness condition on y, we get a new groupoid, which I’ll call \mathsf{Poiss}:

\mathsf{Poiss} = \sum_{y \colon \mathbb{N}^+ \to \mathbb{N}} \prod_{k=1}^\infty \frac{ \mathsf{B}(\mathbb{Z}/k)^{y(k)}}{ y(k)!}

and we can rewrite this using the distributive law:

\begin{array}{ccl} \mathsf{Poiss} & \simeq & \displaystyle{ \prod_{k =1}^\infty \sum_{n = 0}^\infty \frac{ \mathsf{B}(\mathbb{Z}/k)^n}{n!} } \\ \\ & \simeq & \displaystyle{ \prod_{k =1}^\infty \mathsf{S}(\mathsf{B}(\mathbb{Z}/k)) } \end{array}

It’s just the product of the free symmetric monoidal categories on all the \mathsf{B}(\mathbb{Z}/k).
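The distributive law used in this rewriting can be sanity-checked with ordinary numbers. The sketch below multiplies 5 sums of 2 numbers and compares the result with the sum, over all 2^5 functions f from the 5-element index set to the 2-element index set, of the products of the chosen entries. The particular numbers and function names are arbitrary choices for illustration.

```python
import math
from itertools import product

# x[i][j] with i ranging over I (5 elements) and j over J (2 elements);
# the entries are arbitrary.
x = [[2, 3], [5, 7], [11, 13], [17, 19], [23, 29]]


def product_of_sums(x):
    """Left-hand side: the product over i of the sum over j of x[i][j]."""
    return math.prod(sum(row) for row in x)


def sum_of_products(x):
    """Right-hand side: the sum over functions f: I -> J of prod_i x[i][f(i)]."""
    I = range(len(x))
    J = range(len(x[0]))
    # Each tuple f produced here lists the values f(0), ..., f(len(x) - 1).
    return sum(
        math.prod(x[i][f[i]] for i in I)
        for f in product(J, repeat=len(x))
    )


def number_of_terms(x):
    """Number of functions f: I -> J, i.e. |J| ** |I|."""
    return len(x[0]) ** len(x)
```

Both sides agree, and `number_of_terms(x)` is 32, matching the count of 2^5 products of 5 things.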

What is the category \mathsf{S}(\mathsf{B}(\mathbb{Z}/k)) actually like? It’s a groupoid. It has objects 1, x, x^{\otimes 2}, x^{\otimes 3}, \dots and so on. There are no morphisms between distinct objects. The automorphism group of x^{\otimes n} is the semidirect product of S_n and \mathbb{Z}/k \times \cdots \times \mathbb{Z}/k, where the symmetric group acts to permute the factors.

So, in words, \mathsf{S}(\mathsf{B}(\mathbb{Z}/k)) is the ‘free symmetric monoidal category on an object x having \mathbb{Z}/k as its symmetry group’.

This sounds abstract. But it’s equivalent to something concrete: the groupoid of finite sets that are equipped with a permutation all of whose cycles have length k. The object x corresponds to a set with a permutation having a single cycle of length k. The tensor product corresponds to disjoint union. Thus, x^{\otimes n} corresponds to a set with a permutation having n disjoint cycles of length k.

So, you can think of an object of

Poiss k=1 S(B(/k))\mathsf{Poiss} \cong \displaystyle{ \prod_{k =1}^\infty \mathsf{S}(\mathsf{B}(\mathbb{Z}/k)) }

as an infinite list of finite sets, one for each k=1,2,3,k = 1, 2, 3, \dots, where the kkth set is equipped with a permutation having only cycles of length kk.

Taking the disjoint union of all these sets, we get a single set with a permutation on it. This set may be infinite, but all its cycles have finite length, and it has finitely many cycles of each length k=1,2,3,k = 1, 2, 3, \dots . So:

Theorem. The groupoid Poiss k=1 S(B(/k)) \mathsf{Poiss} \simeq \prod_{k =1}^\infty \mathsf{S}(\mathsf{B}(\mathbb{Z}/k)) is equivalent to the groupoid of sets equipped with a permutation having only cycles of finite length, with finitely many cycles of each length.

It’s easy from this description to see the inclusion

$$\mathsf{Perm} \hookrightarrow \mathsf{Poiss}$$

It’s just the inclusion of the finite sets equipped with a permutation!

I claim that the groupoid $\mathsf{Poiss}$ explains why the number of cycles of length $k$ in a randomly chosen permutation of an $n$-element set approaches a Poisson-distributed random variable with mean $1/k$ as $n \to \infty$. The fact that it’s a product also explains why these random variables become independent in the $n \to \infty$ limit.
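The limiting statement can be checked by hand for small $n$. The following Python sketch enumerates every permutation of an $n$-element set, tabulates the exact distribution of the number of $k$-cycles, and prints it next to the Poisson probabilities with mean $1/k$. It is only a numerical illustration and not the categorical explanation; the helper names are made up for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import exp, factorial


def cycle_counts(perm):
    """Map each cycle length to the number of cycles of that length in perm."""
    seen = [False] * len(perm)
    counts = Counter()
    for start in range(len(perm)):
        if seen[start]:
            continue
        length = 0
        j = start
        while not seen[j]:  # walk around the cycle containing start
            seen[j] = True
            j = perm[j]
            length += 1
        counts[length] += 1
    return counts


def k_cycle_distribution(n, k):
    """Exact distribution of the number of k-cycles in a uniform random permutation."""
    tally = Counter(cycle_counts(p)[k] for p in permutations(range(n)))
    total = factorial(n)
    return {m: Fraction(c, total) for m, c in sorted(tally.items())}


def poisson_pmf(mean, m):
    """Probability that a Poisson variable with the given mean equals m."""
    return exp(-mean) * mean**m / factorial(m)


if __name__ == "__main__":
    n = 7
    for k in (1, 2, 3):
        print(f"k = {k}")
        for m, p in k_cycle_distribution(n, k).items():
            print(f"  {m} cycles: exact {float(p):.5f}, Poisson {poisson_pmf(1 / k, m):.5f}")
```

Even at $n = 7$ the exact probabilities are already close to the Poisson values, and the exact mean equals $1/k$ whenever $k \le n$.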

I’ll talk about this more later. But to get an idea of how it works, let’s compute the groupoid cardinality of $\mathsf{S}(\mathsf{B}(\mathbb{Z}/k))$. It’s

$$\begin{array}{ccl} | \mathsf{S}(\mathsf{B}(\mathbb{Z}/k)) | &=&\displaystyle{ \left| \sum_{n = 0}^\infty \frac{ \mathsf{B}(\mathbb{Z}/k)^n}{n!} \right| } \\ \\ &=& \displaystyle{ \sum_{n = 0}^\infty \frac{ |\mathsf{B}(\mathbb{Z}/k)|^n}{n!} } \\ \\ &=& \displaystyle{ \sum_{n = 0}^\infty \frac{ (1/k)^n}{n!} } \\ \\ &=& e^{1/k} \end{array}$$

The Poisson distribution with mean $1/k$ is

$$p_n = e^{-1/k} \frac{ (1/k)^n}{n!}$$

so we’re seeing that lurking here. But I need to think about this more before I can give a really convincing explanation.
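One way to see the Poisson distribution lurking in this computation: the component of $x^{\otimes n}$ contributes $1/(n!\,k^n)$ to the groupoid cardinality, and dividing each contribution by the total $e^{1/k}$ gives exactly $p_n$. The Python sketch below checks this numerically with a truncated sum. The truncation at 30 terms and the function names are choices made for this illustration.

```python
from fractions import Fraction
from math import exp, factorial


def component_cardinality(n, k):
    """Contribution 1 / |Aut(x^{(tensor) n})| = 1 / (n! * k^n) of one component."""
    return Fraction(1, factorial(n) * k**n)


def groupoid_cardinality(k, terms=30):
    """Truncated sum over n of the component contributions."""
    return sum(component_cardinality(n, k) for n in range(terms))


def component_weights(k, terms=30):
    """Each component's share of the (truncated) groupoid cardinality."""
    total = groupoid_cardinality(k, terms)
    return [component_cardinality(n, k) / total for n in range(terms)]


def poisson_pmf(mean, n):
    """Probability that a Poisson variable with the given mean equals n."""
    return exp(-mean) * mean**n / factorial(n)


if __name__ == "__main__":
    k = 3
    print("cardinality:", float(groupoid_cardinality(k)), "vs e^(1/k):", exp(1 / k))
    for n, w in enumerate(component_weights(k)[:5]):
        print(f"n={n}: weight {float(w):.6f}, Poisson {poisson_pmf(1 / k, n):.6f}")
```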

Let me conclude with a puzzle for homotopy theorists.

First, some background so others can enjoy the puzzle, or at least learn something. Homotopy theorists know how to take any category and turn it into a topological space: the geometric realization of its nerve. If we take a group $G$ and apply this trick to the one-object groupoid $\mathsf{B}(G)$ we get the Eilenberg–Mac Lane space $K(G,1)$. This is the connected space with $G$ as its fundamental group and all higher homotopy groups being trivial. As long as we give $G$ the discrete topology, as we’ll do for $G = \mathbb{Z}/k$ here, $K(G,1)$ is also the classifying space for $G$-bundles, denoted $B G$. (This looks a lot like the groupoid $\mathsf{B}(G)$ — but that’s okay, because in homotopy theory a groupoid is considered only slightly different from the geometric realization of its nerve.)

Puzzle. Given a discrete group $G$, what’s a nice description of the geometric realization of the nerve of $\mathsf{S}(\mathsf{B}(G))$, the free symmetric monoidal category on the one-object groupoid corresponding to $G$? I’m especially interested in the case where $G$ is a finite cyclic group.

By the way, the classifying space $B(\mathbb{Z}/k)$ is a ‘lens space’: it’s formed by taking the unit sphere in $\mathbb{C}^\infty$ and quotienting by the action of the $k$th roots of unity. My first guess on the puzzle is to take the disjoint union

$$\sum_{n=0}^\infty \frac{ B(\mathbb{Z}/k)^n}{S_n}$$

A point in here is a finite set of points in the lens space! Note that the construction here is different from the infinite symmetric product used in the Dold–Thom theorem, because we are not identifying an $n$-element set of points with an $(n+1)$-element set whose extra element is the basepoint.

What do you think, experts?

David Hogginterpolation

It was a low-research day (job season) but I worked a bit with Lily Zhao (Yale) on interpolation methods and comparing interpolations. This is for our hierarchical, non-parametric wavelength calibration method.

David Hoggnon-parametric and hierarchical

On the flight home from #AAS235, I did some writing in a paper by Lily Zhao (Yale) about spectrograph (wavelength) calibration. I'm very excited about this project; we removed all dependence on polynomials and other kinds of strict functional forms. We went non-parametric. But of course this greatly increases the degrees of freedom of the fitting or interpolation of the calibration data. So when we do this, we also have to go hierarchical; we have to restrict the calibration freedom using the data. That is, we don't have any strict functional form for the calibration of the spectrograph, but we require that the calibration solution we find lives in the space of solutions that we have seen before. That is, if you increase the freedom by going non-parametric, you need to restrict the freedom by going hierarchical. (The results look incredible.)

David Hogg#AAS235, day 4 and #hackaas

Today was Hack Together Day #hackaas at #AAS235. We computed that this is the eighth winter AAS meeting to have a hack day, making it (AAS Hack Together Day) one of my scientific accomplishments of the decade. At the hack day, the main thing I did was hack on hack day, working with Jim Davenport (UW) to brainstorm things we can do to keep the event fresh, and keep us experimenting with it. I also had a great conversation with Brigitta Sipocz, Geert Barentsen, and others about ways we can use our hacking and design thinking to support a reduction in CO2 emissions by astronomers and academics in general. Related to my conversations of yesterday.

But many great things happened in the Hack Together Day. Too many to list here. Look at the wrap-up slides to get a sense of the range and depth of the projects. So many people learned a lot and did a lot. I'm proud, which is a sin, apparently.

David Hoggremote meetings

A highlight of today was a long meeting with Chris Lintott (Oxford) covering many subjects. But he told me about dot-dot-astronomy, which is a fully-remote reboot they are working on for the niche but extremely influential dot-astronomy meetings. The idea is to go fully remote—all participants remote—but then change the meeting expectations and structure to respect that. The idea is: Maybe not try to do remote meetings so they are just as good as face-to-face meetings, but to try to do remote meetings so they are something very different from face-to-face meetings. That seems like a great idea. Let's re-frame our goals. We have to do something about what we are doing to this planet.

January 17, 2020

Matt von HippelWhat Do Theorists Do at Work?

Picture a scientist at work. You’re probably picturing an experiment, test tubes and beakers bubbling away. But not all scientists do experiments. Theoretical physicists work on the mathematical side of the field, making predictions and trying to understand how to make them better. So what does it look like when a theoretical physicist is working?

A theoretical physicist, at work in the equation mines

The first thing you might imagine is that we just sit and think. While that happens sometimes, we don’t actually do that very often. It’s better, and easier, to think by doing something.

Sometimes, this means working with pen and paper. This should be at least a little familiar to anyone who has done math homework. We’ll do short calculations and draw quick diagrams to test ideas, and do a more detailed, organized, “show your work” calculation if we’re trying to figure out something more complicated. Sometimes very short calculations are done on a blackboard instead; it can help us visualize what we’re doing.

Sometimes, we use computers instead. There are computer algebra packages, like Mathematica, Maple, or Sage, that let us do roughly what we would do with pen and paper, but with the speed and efficiency of a computer. Others program in more normal programming languages: C++, Python, even Fortran, making programs that can calculate whatever they are interested in.

Sometimes we read. With most of our field’s papers available for free on arXiv, we spend time reading up on what our colleagues have done, trying to understand their work and use it to improve ours.

Sometimes we talk. A paper can only communicate so much, and sometimes it’s better to just walk down the hall and ask a question. Conversations are also a good way to quickly rule out bad ideas, and narrow down to the promising ones. Some people find it easier to think clearly about something if they talk to a colleague about it, even (sometimes especially) if the colleague isn’t understanding much.

And sometimes, of course, we do all the other stuff. We write up our papers, making the diagrams nice and the formulas clean. We teach students. We go to meetings. We write grant applications.

It’s been said that a theoretical physicist can work anywhere. That’s kind of true. Some places are more comfortable, and everyone has different preferences: a busy office, a quiet room, a cafe. But with pen and paper, a computer, and people to talk to, we can do quite a lot.

Scott AaronsonMIP*=RE

Another Update (Jan. 16): Yet another reason to be excited about this result—one that somehow hadn’t occurred to me—is that, as far as I know, it’s the first-ever fully convincing example of a non-relativizing computability result. See this comment for more.

Update: If you’re interested in the above topic, then you should probably stop reading this post right now, and switch to this better post by Thomas Vidick, one of the authors of the new breakthrough. (Or this by Boaz Barak or this by Lance Fortnow or this by Ken Regan.) (For background, also see Thomas Vidick’s excellent piece for the AMS Notices.)

Still here? Alright, alright…

Here’s the paper, which weighs in at 165 pages. The authors are Zhengfeng Ji, Anand Natarajan, my former postdoc Thomas Vidick, John Wright (who will be joining the CS faculty here at UT Austin this fall), and my wife Dana’s former student Henry Yuen. Rather than pretending that I can provide intelligent commentary on this opus in the space of a day, I’ll basically just open my comment section to discussion and quote the abstract:

We show that the class MIP* of languages that can be decided by a classical verifier interacting with multiple all-powerful quantum provers sharing entanglement is equal to the class RE of recursively enumerable languages. Our proof builds upon the quantum low-degree test of (Natarajan and Vidick, FOCS 2018) by integrating recent developments from (Natarajan and Wright, FOCS 2019) and combining them with the recursive compression framework of (Fitzsimons et al., STOC 2019).
An immediate byproduct of our result is that there is an efficient reduction from the Halting Problem to the problem of deciding whether a two-player nonlocal game has entangled value 1 or at most 1/2. Using a known connection, undecidability of the entangled value implies a negative answer to Tsirelson’s problem: we show, by providing an explicit example, that the closure $C_{qa}$ of the set of quantum tensor product correlations is strictly included in the set $C_{qc}$ of quantum commuting correlations. Following work of (Fritz, Rev. Math. Phys. 2012) and (Junge et al., J. Math. Phys. 2011) our results provide a refutation of Connes’ embedding conjecture from the theory of von Neumann algebras.

To say it differently (in response to a commenter’s request), some of the major implications are as follows.

(1) There is a protocol by which two entangled provers can convince a polynomial-time verifier of the answer to any computable problem whatsoever (!!), or indeed that a given Turing machine halts.

(2) There is a two-prover game, analogous to the Bell/CHSH game, for which Alice and Bob can do markedly better with a literally infinite amount of entanglement than they can with any finite amount of entanglement.

(3) There is no algorithm even to approximate the entangled value of a two-prover game (i.e., the probability that Alice and Bob win the game, if they use the best possible strategy and as much entanglement as they like). Instead, this problem is equivalent to the halting problem.

(4) There are types of correlations between Alice and Bob that can be produced using infinite entanglement, but that can’t even be approximated using any finite amount of entanglement.

(5) The Connes embedding conjecture, a central conjecture from the theory of operator algebras dating back to the 1970s, is false.
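For readers who have not met the Bell/CHSH game that points (2) and (3) allude to, here is a minimal Python sketch of what the "value" of a two-prover game means in the simplest case. It brute-forces the best deterministic classical strategy for CHSH (shared randomness cannot beat the best deterministic strategy) and compares it with the known optimal entangled value $\cos^2(\pi/8)$. This is standard textbook material and not the game constructed in the paper; the function names are made up for this example.

```python
from itertools import product
from math import cos, pi


def chsh_win(x, y, a, b):
    """CHSH winning condition: the XOR of the answers equals the AND of the questions."""
    return (a ^ b) == (x & y)


def classical_value():
    """Best winning probability over all deterministic strategies, questions uniform."""
    best = 0.0
    # A deterministic strategy is an answer table for each player: one bit per question.
    for a0, a1, b0, b1 in product((0, 1), repeat=4):
        alice, bob = (a0, a1), (b0, b1)
        wins = sum(chsh_win(x, y, alice[x], bob[y]) for x, y in product((0, 1), repeat=2))
        best = max(best, wins / 4)
    return best


def entangled_value():
    """Known optimal entangled winning probability for CHSH (Tsirelson's bound)."""
    return cos(pi / 8) ** 2


if __name__ == "__main__":
    print("classical value:", classical_value())
    print("entangled value:", round(entangled_value(), 4))
```

For CHSH the entangled value is attained with a single shared pair of entangled qubits. The new result concerns games where no finite amount of entanglement suffices and where the value cannot even be approximated by any algorithm.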

Note that all of these implications—including the ones for pure math and the foundations of quantum physics—were obtained using tools that originated in theoretical computer science, specifically the study of interactive proof systems.

I can remember when the class MIP* was first defined and studied, back around 2003, and people made the point that we didn’t know any reasonable upper bound on the class’s power—not NEXP, not NEEEEXP, not even the set of all computable languages. Back then, the joke was how far our proof techniques were from what was self-evidently the truth. I don’t remember a single person who seriously contemplated that two entangled provers could convince a polynomial-time verifier that an arbitrary Turing machine halts.

Still, ever since Natarajan and Wright’s NEEXP in MIP* breakthrough last year, all of us in quantum computing theory knew that MIP*=RE was a live possibility—and all through the summer and fall, I heard many hints that such a breakthrough was imminent.

It’s worth pointing out that, with only classical correlations between the provers, MIP gives “merely” the power of NEXP (Nondeterministic Exponential Time), while with arbitrary non-signalling correlations between the provers, the so-called MIPns gives the power of EXP (Deterministic Exponential Time). So it’s particularly striking that quantum entanglement, which is “intermediate” between classical correlations and arbitrary non-signalling correlations, yields such wildly greater computational power than either of those two.

The usual proviso applies: when I’ve blogged excitedly about preprints with amazing new results, most have stood, but at least two ended up being retracted. Still, assuming this one stands (as I’m guessing it will), I regard it as easily one of the biggest complexity-theoretic (and indeed computability-theoretic!) surprises so far in this century. Huge congratulations to the authors on what looks to be a historic achievement.

In unrelated news, for anyone for whom the 165-page MIP* paper is too heavy going (really??), please enjoy this CNBC video on quantum computing, which features several clips of yours truly speaking in front of a fake UT tower.

In other unrelated news, I’m also excited about this preprint by Avishay Tal, which sets a new record for the largest known separation between quantum query complexity and classical randomized query complexity, making substantial progress toward proving a conjecture by me and Andris Ambainis from 2015. (Not the “Aaronson-Ambainis Conjecture,” a different conjecture.)

January 16, 2020

Scott AaronsonAn alternative argument for why women leave STEM: Guest post by Karen Morenz

Scott’s preface: Imagine that every time you turned your blog over to a certain topic, you got denounced on Twitter and Reddit as a privileged douchebro, entitled STEMlord, counterrevolutionary bourgeoisie, etc. etc. The sane response would simply be to quit blogging about that topic. But there’s also an insane (or masochistic?) response: the response that says, “but if everyone like me stopped talking, we’d cede the field by default to the loudest, angriest voices on all sides—thereby giving those voices exactly what they wanted. To hell with that!”

A few weeks ago, while I was being attacked for sharing Steven Pinker’s guest post about NIPS vs. NeurIPS, I received a beautiful message of support from a PhD student in physical chemistry and quantum computing named Karen Morenz. Besides her strong words of encouragement, Karen wanted to share with me an essay she had written on Medium about why too many women leave STEM.

Karen’s essay, I found, marshaled data, logic, and her own experience in support of an insight that strikes me as true and important and underappreciated—one that dovetails with what I’ve heard from many other women in STEM fields, including my wife Dana. So I asked Karen for permission to reprint her essay on this blog, and she graciously agreed.

Briefly: anyone with a brain and a soul wants there to be many more women in STEM. Karen outlines a realistic way to achieve this shared goal. Crucially, Karen’s way is not about shaming male STEM nerds for their deep-seated misogyny, their arrogant mansplaining, or their gross, creepy, predatory sexual desires. Yes, you can go the shaming route (God knows it’s being tried). If you do, you’ll probably snare many guys who really do deserve to be shamed as creeps or misogynists, along with many more who don’t. Yet for all your efforts, Karen predicts, you’ll no more solve the original problem of too few women in STEM, than arresting the kulaks solved the problem of lifting the masses out of poverty.

For you still won’t have made a dent in the real issue: namely that, the way we’ve set things up, pursuing an academic STEM career demands fanatical devotion, to the exclusion of nearly everything else in life, between the ages of roughly 18 and 35. And as long as that’s true, Karen says, the majority of talented women are going to look at academic STEM, in light of all the other great options available to them, and say “no thanks.” Solving this problem might look like more money for maternity leave and childcare. It might also look like re-imagining the academic career trajectory itself, to make it easier to rejoin it after five or ten years away. Way back in 2006, I tried to make this point in a blog post called Nerdify the world, and the women will follow. I’m grateful to Karen for making it more cogently than I did.

Without further ado, here’s Karen’s essay. –SA

Is it really just sexism? An alternative argument for why women leave STEM

by Karen Morenz

Everyone knows that you’re not supposed to start your argument with ‘everyone knows,’ but in this case, I think we ought to make an exception:

Everyone knows that STEM (Science, Technology, Engineering and Mathematics) has a problem retaining women (see, for example, Jean, Payne, and Thompson 2015). We pour money into attracting girls and women to STEM fields. We pour money into recruiting women, training women, and addressing sexism, both overt and subconscious. In 2011, the United States spent nearly $3 billion in tax dollars on STEM education, of which roughly one third was spent supporting and encouraging underrepresented groups to enter STEM (including women). And yet, women are still leaving at alarming rates.

Alarming? Isn’t that a little, I don’t know, alarmist? Well, let’s look at some stats.

A recent report by the National Science Foundation (2011) found that women received 20.3% of the bachelor’s degrees and 18.6% of the PhD degrees in physics in 2008. In chemistry, women earned 49.95% of the bachelor’s degrees but only 36.1% of the doctoral degrees. By comparison, in biology women received 59.8% of the bachelor’s degrees and 50.6% of the doctoral degrees. A recent article in Chemical and Engineering News showed a chart based on a survey of life sciences workers by Liftstream and MassBio demonstrating how women are vastly underrepresented in science leadership despite earning degrees at similar rates, which I’ve copied below. The story is the same in academia, as you can see on the second chart — from a comparable or even larger number of women at the student level, we move towards a significantly larger proportion of men at the more and more advanced stages of an academic career.

Although 74% of women in STEM report “loving their work,” half (56%, in fact) leave over the course of their career — largely at the “mid-level” point, when the loss of their talent is most costly as they have just completed training and begun to contribute maximally to the work force.

A study by Dr. Flaherty found that women who obtain a faculty position in astronomy spent on average 1 year less than their male counterparts between completing their PhD and obtaining their position — but he concluded that this is because women leave the field at a rate 3 to 4 times greater than men, and in particular, if they do not obtain a faculty position quickly, will simply move to another career. So, women and men are hired at about the same rate during the early years of their post docs, but women stop applying to academic positions and drop out of the field as time goes on, pulling down the average time to hiring for women.

There are many more studies to this effect. At this point, the assertion that women leave STEM at an alarming rate after obtaining PhDs is nothing short of an established fact. In fact, it’s actually a problem across all academic disciplines, as you can see in this matching chart showing the same phenomenon in humanities, social sciences, and education. The phenomenon has been affectionately dubbed the “leaky pipeline.”

But hang on a second, maybe there just aren’t enough women qualified for the top levels of STEM? Maybe it’ll all get better in a few years if we just wait around doing nothing?

Nope, sorry. This study says that 41% of highly qualified STEM people are female. And also, it’s clear from the previous charts and stats that a significantly larger number of women are getting PhDs than going on to be professors, in comparison to their male counterparts. Dr. Laurie Glimcher, when she started her professorship at Harvard University in the early 1980s, remembers seeing very few women in leadership positions. “I thought, ‘Oh, this is really going to change dramatically,’ ” she says. But 30 years later, “it’s not where I expected it to be.” Her experiences are similar to those of other leading female faculty.

So what gives? Why are all the STEM women leaving?

It is widely believed that sexism is the leading problem. A quick google search of “sexism in STEM” will turn up a veritable cornucopia of articles to that effect. And indeed, around 60% of women report experiencing some form of sexism in the last year (Robnett 2016). So, that’s clearly not good.

And yet, if you ask leading women researchers like Nobel Laureate in Physics 2018, Professor Donna Strickland, or Canada Research Chair in Advanced Functional Materials (Chemistry), Professor Eugenia Kumacheva, they say that sexism was not a barrier in their careers. Moreover, extensive research has shown that sexism has overall decreased since Professors Strickland and Kumacheva (for example) were starting their careers. Even more interestingly, Dr. Rachael Robnett showed that more mathematical fields such as Physics have a greater problem with sexism than less mathematical fields, such as Chemistry, a finding which rings true with the subjective experience of many women I know in Chemistry and Physics. However, as we saw above, women leave the field of Chemistry in greater proportions following their BSc than they leave Physics. On top of that, although 22% of women report experiencing sexual harassment at work, the proportion is the same among STEM and non-STEM careers, and yet women leave STEM careers at a much higher rate than non-STEM careers.
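The comparison between fields can be made explicit with a little arithmetic on the 2008 figures quoted earlier. The Python sketch below divides women's share of doctoral degrees by their share of bachelor's degrees in each field. This ratio is a crude proxy chosen for this illustration, not a measure used in the cited report: it compares two different cohorts in the same year, so it only hints at attrition.

```python
# Women's share of degrees in 2008 (percent), as quoted from the NSF report above.
degree_shares = {
    "physics": {"bachelor": 20.3, "phd": 18.6},
    "chemistry": {"bachelor": 49.95, "phd": 36.1},
    "biology": {"bachelor": 59.8, "phd": 50.6},
}


def phd_to_bachelor_ratio(field):
    """Women's share of PhDs divided by their share of bachelor's degrees."""
    shares = degree_shares[field]
    return shares["phd"] / shares["bachelor"]


if __name__ == "__main__":
    # Lower ratios mean a larger relative drop between the two degree levels.
    for field in sorted(degree_shares, key=phd_to_bachelor_ratio):
        print(f"{field}: {phd_to_bachelor_ratio(field):.2f}")
```

On these numbers chemistry shows the largest relative drop and physics the smallest, which is the pattern the paragraph above relies on.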

So, it seems that sexism cannot fully explain why women with STEM PhDs are leaving STEM. At the point when women have earned a PhD, for the most part they have already survived the worst of the sexism. They’ve already proven themselves to be generally thick-skinned and, as anyone with a PhD can attest, very stubborn in the face of overwhelming difficulties. Sexism is frustrating, and it can limit advancement, but it doesn’t fully explain why we have so many women obtaining PhDs in STEM, and then leaving. In fact, at least in the U of T chemistry department, faculty hires are directly proportional to the applicant pool — although the exact number of applicants is not made public, from public information we can see that approximately one in four interview invitees are women, and approximately one in four hires are women. Our hiring committees have received bias training, and it seems that it has been largely successful. That’s not to say that we’re done, but it’s time to start looking elsewhere to explain why there are so few women sticking around.

So why don’t more women apply?

Well, one truly brilliant researcher had the groundbreaking idea of asking women why they left the field. When you ask women why they left, the number one reason they cite is balancing work/life responsibilities — which as far as I can tell is a euphemism for family concerns.

The research is in on this. Women who stay in academia expect to marry later, and delay or completely forego having children, and if they do have children, plan to have fewer than their non-STEM counterparts (Sassler et al 2016; Owens 2012). Men in STEM have no such difference compared to their non-STEM counterparts; they marry and have children about the same ages and rates as their non-STEM counterparts (Sassler et al 2016). Women leave STEM in droves in their early to mid thirties (Funk and Parker 2018) — the time when women’s fertility begins to decrease, and risks of childbirth complications begin to skyrocket for both mother and child. Men don’t see an effect on their fertility until their mid forties. Of the 56% of women who leave STEM, 50% wind up self-employed or using their training in a not for profit or government, 30% leave to a non-STEM more ‘family friendly’ career, and 20% leave to be stay-at-home moms (Ashcraft and Blithe 2002). Meanwhile, institutions with better childcare and maternity leave policies have twice(!) the number of female faculty in STEM (Troeger 2018). In analogy to the affectionately named “leaky pipeline,” the challenge of balancing motherhood and career has been titled the “maternal wall.”

To understand the so-called maternal wall better, let’s take a quick look at the sketch of a typical academic career.

For the sake of this exercise, let’s all pretend to be me. I’m a talented 25 year old PhD candidate studying Physical Chemistry — I use laser spectroscopy to try to understand atypical energy transfer processes in innovative materials that I hope will one day be used to make vastly more efficient solar panels. I got my BSc in Chemistry and Mathematics at the age of 22, and have published 4 scientific papers in two different fields already (Astrophysics and Environmental Chemistry). I’ve got a big scholarship, and a lot of people supporting me to give me the best shot at an academic career — a career I dearly want. But, I also want a family — maybe two or three kids. Here’s what I can expect if I pursue an academic career:

With any luck, 2–3 years from now I’ll graduate with a PhD, at the age of 27. Academics are expected to travel a lot, and to move a lot, especially in their 20s and early 30s — all of the key childbearing years. I’m planning to go on exchange next year, and then the year after that I’ll need to work hard to wrap up research, write a thesis, and travel to several conferences to showcase my work. After I finish my PhD, I’ll need to undertake one or two post doctoral fellowships, lasting one or two years each, probably in completely different places. During that time, I’ll start to apply for professorships. In order to do this, I’ll travel around to conferences to advertise my work and to meet important leaders in my field, and then, if I am invited for interviews, I’ll travel around to different universities for two or three days at a time to undertake these interviews. This usually occurs in a person’s early 30s — our helpful astronomy guy, Dr. Flaherty, found the average time to hiring was 5 years, so let’s say I’m 32 at this point. If offered a position, I’ll spend the next year or two renovating and building a lab, buying equipment, recruiting talented graduate students, and designing and teaching courses. People work really, really hard during this time and have essentially no leisure time. Now I’m 34. Within usually 5 years I’ll need to apply for tenure. This means that by the time I’m 36, I’ll need to be making significant contributions in my field, and then in the final year before applying for tenure, I will once more need to travel to many conferences to promote my work, in order to secure tenure — if I fail to do so, my position at the university would probably be terminated. Although many universities offer a “tenure extension” in cases where an assistant professor has had a child, this does not solve all of the problems. 
Taking a year off during that critical 5 or 6 year period often means that the research “goes bad” — students flounder, projects that were promising get “scooped” by competitors at other institutions, and sometimes, in biology and chemistry especially, experiments literally go bad. You wind up needing to rebuild much more than just a year’s worth of effort.

At no point during this time do I appear stable enough, career-wise, to take even six months off to be pregnant and care for a newborn. Hypothetical future-me is travelling around, or even moving, conducting and promoting my own independent research and training students. As you’re likely aware, very pregnant people and newborns don’t travel well. And academia has a very individualistic and meritocratic culture. Starting at the graduate level, huge emphasis is placed on independent research, and independent contributions, rather than valuing team efforts. This feature of academia is both a blessing and a curse. The individualistic culture means that people have the independence and the freedom to pursue whatever research interests them — in fact this is the main draw for me personally. But it also means that there is often no one to fall back on when you need extra support, and because of biological constraints, this winds up impacting women more than men.

At this point, I need to make sure that you’re aware of some basics of female reproductive biology. According to Wikipedia, the unquestionable source of all reliable knowledge, at age 25, my risk of conceiving a baby with chromosomal abnormalities (including Down’s Syndrome) is 1 in about 1400. By 35, that risk more than quadruples to 1 in 340. At 30, I have a 75% chance of a successful birth in one year, but by 35 it has dropped to 66%, and by 40 it’s down to 44%. Meanwhile, 87 to 94% of women report at least 1 health problem immediately after birth, and 1.5% of mothers have a severe health problem, while 31% have long-term persistent health problems as a result of pregnancy (defined as lasting more than six months after delivery). Furthermore, mothers over the age of 35 are at higher risk for pregnancy complications like preterm delivery, hypertension, superimposed preeclampsia, and severe preeclampsia (Cavazos-Rehg et al 2016). Because of this drastically increased risk of complications, pregnancies in women over 35 are known as “geriatric pregnancies.” This tight timeline for births is often called the “biological clock” — if women want a family, they basically need to start before 35. Now, that’s not to say it’s impossible to have a child later on, and in fact some studies show that it has positive impacts on the child’s mental health. But it is riskier.

So, women with a PhD in STEM know that they have the capability to make interesting contributions to STEM, and to make plenty of money doing it. They usually marry someone who also has, or expects to make, a high salary. But this isn’t the only consideration. Such highly educated women are usually aware of the biological clock and the risks associated with pregnancy, and are confident in their understanding of statistical risks.

The Irish say, “The common challenge facing young women is achieving a satisfactory work-life balance, especially when children are small. From a career perspective, this period of parenthood (which after all is relatively short compared to an entire working life) tends to coincide exactly with the critical point at which an individual’s career may or may not take off. […] All the evidence shows that it is at this point that women either drop out of the workforce altogether, switch to part-time working or move to more family-friendly jobs, which may be less demanding and which do not always utilise their full skillset.”

And in the Netherlands, “The research project in Tilburg also showed that women academics have more often no children or fewer children than women outside academia.” Meanwhile in Italy “On a personal level, the data show that for a significant number of women there is a trade-off between family and work: a large share of female economists in Italy do not live with a partner and do not have children”

Most jobs available to women with STEM PhDs offer greater stability and a larger salary earlier in the career. Moreover, most non-academic careers have less emphasis on independent research, meaning that employees usually work within the scope of a larger team, and so if a person has to take some time off, there are others who can help cover their workload. By and large, women leave to go to a career where they will be stable, well funded, and well supported, even if it doesn’t fulfill their passion for STEM — or they leave to be stay-at-home moms or self-employed.

I would presume that if we made academia a more feasible place for a woman with a family to work, we could keep almost all of the 20% of leavers who leave just to stay at home, almost all of the 30% who leave for self-employment, and all of the 30% who leave for more family-friendly careers (after all, if academia were made as family friendly as other careers, there would be no incentive to leave). Of course, there is nothing wrong with being a stay-at-home parent: it’s an admirable choice and contributes greatly to our society. One estimate valued the equivalent salary benefit of stay-at-home parenthood at about $160,000/year. Moreover, children with a stay-at-home parent show long-term benefits such as better school performance, something that most academic women would want for their children. But a lot of people only choose it out of necessity: about half of stay-at-home moms would prefer to be working (Ciciolla, Curlee, & Luthar 2017). When the reality is that your salary is barely more than the cost of daycare, a lot of people wind up giving up and staying home with their kids rather than paying for daycare. In a heterosexual couple it will usually be the woman who winds up staying home, since she is the one who needs to do things like breastfeed anyway. And so we lose these women from the workforce.

And yet, somehow, during this informal research adventure of mine, most scholars and policy makers seem to be advising that we try to encourage young girls to be interested in STEM, and to address sexism in the workplace, with the implication that this will fix the high attrition rate in STEM women. But from what I’ve found, the stats don’t back up sexism as the main reason women leave. There is sexism, and that is a problem, and women do leave STEM because of it — but it’s a problem that we’re already dealing with pretty successfully, and it’s not why the majority of women who have already obtained STEM PhDs opt to leave the field. The whole family planning thing is huge and for some reason, almost totally swept under the rug — mostly because we’re too shy to talk about it, I think.

In fact, I think that the plethora of articles suggesting that the problem is sexism actually contributes to our unwillingness to talk about the family planning problem, because it reinforces the perception that men in power will not hire a woman for fear that she’ll get pregnant and take time off. Why would anyone talk about how they want to have a family when they keep hearing that even the mere suggestion of such a thing will limit their chances of being hired? I personally know women who have avoided bringing up the topic with colleagues or supervisors for fear of professional repercussions. So we spend all this time and energy talking about how sexism is really bad, and very little time trying to address the family planning challenge, because, I guess, as the stats show, if women are serious enough about science then they just give up on the family (except for the really, really exceptional ones who can handle the stresses of both simultaneously).

To be very clear, I’m not saying that sexism is not a problem. What I am saying is that, thanks to the sustained efforts of a large number of people over a long period of time, we’ve reduced the sexism problem to the point where, at least at the graduate level, it is no longer the largest major barrier to women’s advancement in STEM. Hurray! That does not mean that we should stop paying attention to the issue of sexism, but does mean that it’s time to start paying more attention to other issues, like how to properly support women who want to raise a family while also maintaining a career in STEM.

So what can we do to better support STEM women who want families?

A couple of solutions have been tentatively tested. From a study mentioned above, it’s clear that providing free and conveniently located childcare makes a colossal difference to women’s choices of whether or not to stay in STEM, alongside extended and paid maternity leave. Another popular and successful strategy was implemented by a leading woman in STEM, Laurie Glimcher, a past Harvard Professor in Immunology and now CEO of Dana-Farber Cancer Institute. While working at NIH, Dr. Glimcher designed a program to provide primary caregivers (usually women) with an assistant or lab technician to help manage their laboratories while they cared for children. Now, at Dana-Farber Cancer Institute, she has created a similar program to pay for a technician or postdoctoral researcher for assistant professors. In the academic setting, Dr. Glimcher’s strategies are key for helping to alleviate the challenges associated with the individualistic culture of academia without compromising women’s research and leadership potential.

For me personally, I’m in the ideal situation for an academic woman. I graduated my BSc with high honours in four years, and with many awards. I’ve already had success in research and have published several peer reviewed papers. I’ve faced some mild sexism from peers and a couple of TAs, but nothing that’s seriously held me back. My supervisors have all been extremely supportive and feminist, and all of the people that I work with on a daily basis are equally wonderful. Despite all of this support, I’m looking at the timelines of an academic career, and the time constraints of female reproduction, and honestly, I don’t see how I can feasibly expect to stay in academia and have the family life I want. And since I’m in the privileged position of being surrounded by supportive and feminist colleagues, I can say it: I’m considering leaving academia, if something doesn’t change, because even though I love it, I don’t see how it can fit into my family plans.

But wait! All of these interventions are really expensive. Money doesn’t just grow on trees, you know!

It doesn’t in general, but in this case it kind of does. Well, actually, we already grew it. We spend billions of dollars training women in STEM. By not making full use of their skills, if we look at only the American economy, we are wasting about $1.5 billion USD per year in economic benefits they would have produced if they stayed in STEM. So here’s a business proposal: let’s spend half of that on better family support and scientific assistants for primary caregivers, and keep the other half in profit. Heck, let’s spend 99%, or $1.485 billion (in the States alone), on better support. That should put a dent in the support bill, and I’d sure pick up $15 million if I saw it lying around. Wouldn’t you?

By demonstrating that we will support women in STEM who choose to have a family, we will encourage more women with PhDs to apply for the academic positions that they are eminently qualified for. Our institutions will benefit from the wider applicant pool, and our whole society will benefit from having the skills of these highly trained and intelligent women put to use innovating new solutions to our modern day challenges.

Tommaso DorigoMachine Learning For Jets: A Workshop In New York

The third "Machine Learning for Jets" workshop is ongoing these days at the Kimmel centre of New York University, a nice venue overlooking Washington Square park in downtown Manhattan. I came to attend it and remain up-to-date with the most advanced new algorithms that are being used for research in collider physics, as I did last year. The workshop is really well organized and all talks are quite interesting, so this is definitely a good time investment for me.

n-Category Café Codensity Monads

Yesterday I gave a seminar at the University of California, Riverside, through the magic of Skype. It was the first time I’ve given a talk sitting down, and only the second time I’ve done it in my socks.

The talk was on codensity monads, and that link takes you to the slides. I blogged about this subject lots of times in 2012 (1, 2, 3, 4), and my then-PhD student Tom Avery blogged about it too. In a nutshell, the message is:

Whenever you meet a functor, ask what its codensity monad is.

This should probably be drilled into learning category theorists as much as better-known principles like “whenever you meet a functor, ask what adjoints it has”. But codensity monads took longer to be discovered, and are saddled with a forbidding name — should we just call them “induced monads”?

In any case, following this principle quickly leads to many riches, of which my talk was intended to give a taste.

Mark GoodsellHow to make progress in High Energy Physics

Before I start, just following from my previous post about B-mesons, today I saw a CERN press release about lepton universality in B-baryons (i.e. particles made of three quarks, at least one of which is a bottom, rather than B-mesons, which have two quarks, at least one of which is a bottom). It seems there is a \( 1 \sigma \) deviation in $$ R_{pK}^{-1} \equiv \frac{\mathrm{BR} (\Lambda_b^0 \rightarrow p K^- e^+ e^-)}{\mathrm{BR} (\Lambda_b^0 \rightarrow p K^- J/\psi(\rightarrow e^+ e^-))} \times \frac{\mathrm{BR} (\Lambda_b^0 \rightarrow p K^- J/\psi(\rightarrow \mu^+ \mu^-))}{\mathrm{BR} (\Lambda_b^0 \rightarrow p K^- \mu^+ \mu^-)} $$ While, by itself, it is amazing that this gets a press release heralding a "crack in the Standard Model", it does add some small evidence to the picture of deviations from Standard Model predictions; no doubt the interpretation in terms of a global fit with other observables will appear soon on the arXiv. So it's a positive way to start this entry.

Polemics on foundations of HEP

Recently an article appeared by S. Hossenfelder that again makes the claim that "fundamental physics" is stuck, has failed etc, that theorists are pursuing dead-end theories and "do not think about which hypotheses are promising" and "theoretical physicists have developed a habit of putting forward entirely baseless speculations." It is fairly common and depressing to see this message echoed in a public space. However what riled me enough to write was the article on Not Even Wrong discussing it, in which surprise is expressed that people at elite institutions would still teach courses in Beyond the Standard Model Physics and Supersymmetry. This feels to me like an inadvertent personal attack, since I happen to teach courses on BSM physics and SUSY at an elite French institution (Ecole Polytechnique) ... hence this post.

Physicists are not sheep

Firstly though I'd like to address the idea that physicists are not aware of the state of their field. While "maverick outsiders" might like to believe that HEP theorists live in a bubble, just following what they are told to work on, it upsets me that this message has cut through enough that a lot of students now feel that it is true, that if they do not work on what they perceive to be "hot topics" then they will be censured. The truth is that now, more than perhaps at any time since I entered the field, there is a lack of really "hot topics." In previous decades, there were often papers that would appear with a new idea that would be immediately jumped on by tens or hundreds of people, leading to a large number of more-or-less fast follow-up papers (someone once categorised string theorists as monkeys running from tree to tree eating only the low-hanging fruit). This seems to me to be much less prevalent now. People are really stepping back and thinking about what they do. They are aware that there is a lack of clear evidence for what form new physics should take, and that the previously very popular idea of naturalness is probably not a reliable guide. Some people are embracing new directions in dark matter searches, others are trying to re-interpret experimental data in terms of an effective field theory extension of the Standard Model, others are seeing what we can learn from gravitational waves, still others are looking at developing new searches for axions, some people are looking instead at fundamental Quantum Field Theory problems, cosmology has made huge progress, etc etc (see also this thread by Dan Green). There is a huge diversity of ideas in the field and that is actually very healthy. There is also a very healthy amount of skepticism.

On the other hand, as I mentioned in my previous post, there are some tentative pieces of evidence pointing to new physics at accessible scales; and whatever explains dark matter, it should be possible to probe it with some form of experiment or observation. This is the reason for continued optimism in the field about real breakthroughs. We could be on the verge of overturning the status quo, via the (apparently outdated) method of doing experiments, and then we will race to understand the results and interpret them in terms of our favourite theories -- or maybe genuinely new ones. Of course, maybe these are mirages; as physicists we will continue to look for new and creative ways to search for new phenomena, even if we do not have a new high energy collider -- but if we don't build a new collider we will never know what we might find.

What courses should you take

Coming now to the idea of what students entering the field should learn, in the current negative climate it needs repeating that the Standard Model is incomplete. I'm not just talking about a lack of quantum gravity, but there is a laundry list of problems that I repeat to my students at the beginning of the course:
  1. Quantum gravity.
  2. Dark matter, or something that explains rotation curves, the CMB, etc.
  3. Dark energy -- no, it hasn't been ruled out by one paper on supernovae. It was awarded the Nobel Prize because people already expected it from other observations.
  4. Inflation, or something else that solves the same problems.
  5. The strong CP problem. We have phases in the quark Yukawas, so we should have a neutron electric dipole moment \(10^{10} \) times greater than we observe. Most people believe this should be solved by an axion -- which might also be dark matter -- hence a lot of effort to find it, and ADMX (among other experiments) might be getting close.
  6. Baryogenesis. The Standard Model Higgs is too heavy to have electroweak baryogenesis. There is apparently not enough CP violation in the Standard Model either.
  7. Neutrino masses. We can't write them into the Standard Model because we don't even know if neutrinos are Majorana or Dirac! Maybe a heavy right-handed neutrino can give us Baryogenesis through leptogenesis. There is a huge amount going on in neutrino physics at the moment, too ...

Almost none of these topics is generally covered in a standard set of graduate courses (at least here), so the first time a lot of students encounter these issues is through popular press articles and oblique references in "standard" courses. In my course I try to present the evidence and some possible solutions. And if we are going to make progress on solving some of these fundamental issues, should students not have some idea of what attempts have been made to solve them?

Turning now to supersymmetry, I would not recommend that a beginning student in particle phenomenology make it the sole focus of their work (unless they really have a good motivation to do so). But there are many reasons to study it still:

  1. It is hugely important in formal applications -- to give us a handle on strongly coupled theories, allowing us to compute things we could never do in non-SUSY theories, as toy models, \( N=4 \) SYM being the "simplest field theory" (as Arkani-Hamed likes to reiterate) etc etc.
  2. It seems to be necessary for the consistency of string theory. I personally prefer string theory as a candidate framework for quantum gravity; if you want to study it, you need to study SUSY.
  3. A lot of the difficulty with the formalism for beginning students is just understanding two-component spinors -- these are actually very useful tools if you want to study amplitudes in general.
  4. It allows us to actually address the Hierarchy problem, and related to this, the idea of the vacuum energy of the theory being related to a cosmological constant. This is a subtle (and maybe heated) discussion for another time.
  5. The gauge couplings apparently unify in the simplest SUSY extensions of the Standard Model. If this is just a coincidence then I feel that nature is playing a cruel joke on us.
  6. The Standard Model appears to be at best metastable (there is some dispute about this). It has been further suggested (e.g. here) that black holes might seed the vacuum decays, so that if it is not absolutely stable then it should decay much quicker than we would otherwise think; and in any case the standard calculation has to assume that there are no quantum gravity contributions (giving higher-order operators). New physics at an intermediate scale (below \( \sim 10^{11} \) GeV) such as supersymmetry would then be necessary to stabilise the vacuum.
  7. It genuinely could still be found at accessible energies; the LHC is actually very poor at finding particles that don't couple to the strong force, and new electroweak states could easily be lurking in plain sight ...
  8. ... related to this, it's just about the only "phenomenological framework" for new physics that addresses lots of different problems with the Standard Model.

Of course, nowadays as a community we are trying to hedge our bets: there is much more ambivalence about what theories might be found just around the corner, hence my own work on generic phenomenology, and a lot of interest in the Standard Model EFT.

How we should make progress

Finally we get to the topic of the post. In the original article that I linked to above, Hossenfelder does make (as she has made elsewhere) the positive suggestion that physicists should talk to philosophers. [ In France, this is amusing, because there is a fantastic tradition of famous philosophers, and every schoolchild has to study philosophy up to the age of 18 ] It is good to make suggestions. In the article, though, is the idea that people cannot recognise promising new ideas amidst a sea of "bad" ones, so people are either following old dead ends or endlessly making ridiculous suggestions. I admit that, superficially, this is the impression people could have got once upon a time, but I would argue it is not the state of the field now. I disagree that the problem is a fundamental one about how people think, or that there is a systematic censuring of "good" radical ideas. I don't think there is only one way to make progress: if I had a suggestion for how the scientific creative process should work I would be applying it like crazy before advertising the benefits publicly! And as for censorship of "good" ideas, there are a lot of people willing to take a risk on new concepts. Research is hard, and creativity is not something that is easily taught. But I am constantly amazed by the creativity and ingenuity of my peers, the diversity of their ideas, and it is heartbreaking to see their effort denigrated in popular articles.

Indeed, repeatedly making the claim in public that one group of scientists are dishonest (or kidding themselves) about progress, that the field has failed etc, helps no-one. It deeply worries me to read that Dominic Cummings has Not Even Wrong on his blog roll; and I have already seen that people in other fields often hold very wrong opinions about the state of fundamental physics due to this filtering through (most people only see wildly speculative and hagiographic articles on one side and hugely negative pessimism on the other). It has been an issue when deciding about grant funding for at least a decade already. And it also filters through to students when they are deciding what to do, who, as I pointed out above, usually haven't really seen enough about the fundamentals of the field before they have to make a decision on what they want to study.

Finally, coming back to the suggestion that physicists do not think about what they are doing or why, there are two very important times when we emphatically do do this: when we are teaching, and when we are writing grant proposals. The preparation of both of these things can be hard work, but they are rewarding, and more reasons that I have faith in my fellow physicists' ability to genuinely try to challenge the big problems in our field.

John BaezTopos Theory (Part 2)

Last time I defined sheaves on a topological space X; this time I’ll say how to get these sheaves from ‘bundles’ over X. You may or may not have heard of bundles of various kinds, like vector bundles or fiber bundles. If you have, be glad: the bundles I’m talking about now include these as special cases. If not, don’t worry: the bundles I’m talking about now are much simpler!

A bundle over X is simply a topological space Y equipped with a continuous map to X, say

p \colon Y \to X

You should visualize Y as hovering above X, and p as projecting points y \in Y down to their shadows p(y) in X. This explains the word ‘over’, the term ‘projection’ for the map p, and many other things. It’s a powerful metaphor.

Bundles are not only a great source of examples of sheaves; in fact every sheaf comes from a bundle! Conversely, every sheaf—and even every presheaf—gives rise to a bundle.

But these constructions, which I’ll explain, do not give an equivalence of categories. That is, sheaves are not just another way of thinking about bundles, and neither are presheaves. Instead, we’ll get adjoint functors between the category of presheaves on X and the category of bundles over X, and these will restrict to give an equivalence between the category of ‘nice’ presheaves on X—namely, the sheaves—and a certain category of ‘nice’ bundles over X, which are called ‘etale spaces’.

Thus, in the end we’ll get two complementary viewpoints on sheaves: the one I discussed last time, and another, where we think of them as specially nice bundles over X. In Sections 2.8 and 2.9 Mac Lane and Moerdijk use these complementary viewpoints to efficiently prove some of the big theorems about sheaves that I stated last time.

Before we get going, a word about a word: ‘etale’. This is really a French word, ‘étalé’, meaning ‘spread out’. We’ll see why Grothendieck chose this word. But now I mainly just want to apologize for leaving out the accents. I’m going to be typing a lot, it’s a pain to stick in those accents each time, and in English words with accents feel ‘fancy’.

From bundles to presheaves

Any bundle over X, meaning any continuous map

p \colon Y \to X,

gives a sheaf over X. Here’s how. Given an open set U \subseteq X, define a section of p over U to be a continuous function

s \colon U \to Y

such that

p \circ s = 1_U

In terms of pictures (which I’m too lazy to draw here) s maps each point of U to a point in Y ‘sitting directly over it’. There’s a presheaf \Gamma_p on X that assigns to each open set U \subset X the set of all sections of p over U:

\Gamma_p U = \{s: \; s \textrm{ is a section of } p \textrm{ over } U \}

Of course, to make \Gamma_p into a presheaf we need to say how to restrict sections over U to sections over a smaller open set, but we do this in the usual way: by restricting a function to a subset of its domain.

Puzzle. Check that with this choice of restriction maps \Gamma_p is a presheaf, and in fact a sheaf.
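
To get a feel for \Gamma_p, it can help to compute sections by brute force on a finite example. The following is a minimal sketch, not taken from the post: the Sierpinski space X = {0, 1}, the two-sheeted bundle Y = X × {a, b}, and the helper names `topology_from_basis` and `sections` are all assumptions made for illustration.

```python
from itertools import combinations, product


def topology_from_basis(basis):
    """Return every union of subfamilies of a finite basis (empty union = empty set)."""
    basis = [frozenset(b) for b in basis]
    opens = set()
    for r in range(len(basis) + 1):
        for family in combinations(basis, r):
            opens.add(frozenset().union(*family))
    return opens


def sections(p, opens_X, opens_Y, U):
    """List all continuous s: U -> Y with p(s(x)) == x for every x in U."""
    U = frozenset(U)
    points = sorted(U)
    opens_U = {O & U for O in opens_X}  # subspace topology on U
    # Picking one point of Y in the fibre over each x guarantees p∘s = 1_U.
    fibres = [[y for y in p if p[y] == x] for x in points]
    found = []
    for choice in product(*fibres):
        s = dict(zip(points, choice))
        # s is continuous iff the preimage of every open set of Y is open in U.
        if all(frozenset(x for x in points if s[x] in V) in opens_U for V in opens_Y):
            found.append(s)
    return found


# Hypothetical example: Sierpinski space X, whose open sets are {}, {1}, {0, 1}.
opens_X = {frozenset(), frozenset({1}), frozenset({0, 1})}

# Y = X x {a, b} with {a, b} discrete; p is projection onto the first factor.
p = {(x, c): x for x in (0, 1) for c in "ab"}
opens_Y = topology_from_basis(
    [{(1, "a")}, {(1, "b")}, {(0, "a"), (1, "a")}, {(0, "b"), (1, "b")}]
)

for U in [set(), {1}, {0, 1}]:
    print(sorted(U), len(sections(p, opens_X, opens_Y, U)))
```

In this toy case only the two constant sections are continuous over all of X, there are two sections over {1}, and there is exactly one (the empty function) over the empty set. Restricting a dictionary to fewer keys plays the role of the restriction maps.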

There’s actually a category of bundles over X. Given bundles

p \colon Y \to X

and

p' \colon Y' \to X

a morphism from the first to the second is a continuous map

f \colon Y \to Y'

making the obvious triangle commute:

p' \circ f = p

I’m too lazy to draw this as a triangle, so if you don’t see it in your mind’s eye you’d better draw it. Draw Y and Y' as two spaces hovering over X, and f as mapping each point in Y over x \in X to a point in Y' over the same point x.

We can compose morphisms between bundles over X in an evident way: a morphism is a continuous map with some property, so we just compose those maps. We thus get a category of bundles over X, which is called \mathsf{Top}/X.

I’ve told you how a bundle over X gives a presheaf on X. Similarly, a morphism of bundles over X gives a morphism of presheaves on X. Because this works in a very easy way, it should be no surprise that this gives a functor, which we call

\Gamma \colon \mathsf{Top}/X \to \widehat{\mathcal{O}(X)}

Puzzle. Suppose we have two bundles over X, say p \colon Y \to X and p' \colon Y' \to X, and a morphism from the first to the second, say f \colon Y \to Y'. Suppose s \colon U \to Y is a section of the first bundle over the open set U \subset X. Show that f \circ s is a section of the second bundle over U. Use this to describe what the functor \Gamma does on morphisms, and check functoriality.
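
The first step of this puzzle comes down to a one-line computation. As a sketch (the symbol \Gamma(f)_U for the component at U of the induced presheaf morphism is notation introduced here, not taken from the post):

```latex
% Assumption: \Gamma(f)_U names the map on sections over U induced by f.
p' \circ (f \circ s) \;=\; (p' \circ f) \circ s \;=\; p \circ s \;=\; 1_U ,
\qquad\text{so}\qquad
\Gamma(f)_U \colon \Gamma_p U \to \Gamma_{p'} U , \quad s \mapsto f \circ s .
```

Compatibility with restriction, (f \circ s)|_{U'} = f \circ (s|_{U'}), and functoriality still need to be checked, as the puzzle asks.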

From presheaves to bundles

How do we go back from presheaves to bundles? Start with a presheaf

F \colon \mathcal{O}(X)^{\mathrm{op}} \to \mathsf{Set}

on X. To build a bundle over X, we’ll start by building a bunch of sets called \Lambda(F)_x, one for each point x \in X. Then we’ll take the union of these and put a topology on it, getting a space called \Lambda(F). There will be a map

p \colon \Lambda(F) \to X

sending all the points in \Lambda(F)_x to x, and this will be our bundle over X.

How do we build these sets \Lambda(F)_x? Our presheaf

F \colon \mathcal{O}(X)^{\mathrm{op}} \to \mathsf{Set}

doesn’t give us sets for points of X, just for open sets. So, we should take some sort of ‘limit’ of the sets F U over smaller and smaller open neighborhoods U of x. Remember, if U' \subseteq U our presheaf gives a restriction map

F U \to FU'

So, what we’ll actually do is take the colimit of all these sets FU, as U ranges over all neighborhoods of x. That gives us our set \Lambda(F)_x.
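
Because the open neighborhoods of x form a directed system under reverse inclusion, this colimit of sets can be written as a quotient. The following is the standard description, stated here as a sketch; the notation s|_W for the image of s under the restriction map FU \to FW is an assumption of this note.

```latex
\Lambda(F)_x \;=\; \operatorname*{colim}_{U \ni x} F U
\;\cong\; \Big( \coprod_{U \ni x} F U \Big) \Big/ \sim ,
\qquad
(U, s) \sim (U', s') \iff s|_W = s'|_W
\ \text{ for some open } W \text{ with } x \in W \subseteq U \cap U' .
```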

It’s good to ponder what elements of \Lambda(F)_x are actually like. They’re called germs at x, which is a nice name, because you can only see them under a microscope! For example, suppose F is the sheaf of continuous real-valued functions, so that FU consists of all continuous functions from U to \mathbb{R}. By the definition of colimit, for any open neighborhood U of x we have a map

FU \to \Lambda(F)_x

So any continuous real-valued function defined on any open neighborhood of x gives a ‘germ’ of a function at x. But also by the definition of colimit, any two such functions give the same germ iff they become equal when restricted to some open neighborhood of x. So the germ of a function is what’s left of that function as you zoom in closer and closer to the point x.

(If we were studying analytic functions on the real line, the germ at x would remember exactly their Taylor series at that point. But smooth functions have more information in their germs, and continuous functions are weirder still. For more on germs, watch this video.)

Now that we have the set of germs \Lambda(F)_x for each point x \in X, we define

\Lambda(F) = \bigcup_{x \in X} \Lambda(F)_x

There is then a unique function

p \colon \Lambda(F) \to X

sending everybody in \Lambda(F)_x to x. So we’ve almost gotten our bundle over X. We just need to put a topology on \Lambda(F).

We do this as follows. We’ll give a basis for the topology, by describing a bunch of open neighborhoods of each point in \Lambda(F). Remember, any point in \Lambda(F) is a germ. More specifically, any point in \Lambda(F) is in some set \Lambda(F)_x, so it’s the germ of some s \in FU where U is an open neighborhood of x. But this s has lots of other germs, too, namely its germs at all points y \in U. We take the collection of all these germs to be an open neighborhood of the germ we started with. A general open set in \Lambda(F) will then be an arbitrary union of sets like this.
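
In symbols, the basic open sets just described can be sketched as follows. The names \mathrm{germ}_y\, s and V_{U,s} are notation introduced for this note, not taken from the post.

```latex
% Assumption: \mathrm{germ}_y\, s is the image of s \in FU under FU \to \Lambda(F)_y.
V_{U,s} \;=\; \{\, \mathrm{germ}_y\, s \;:\; y \in U \,\} \;\subseteq\; \Lambda(F) ,
\qquad U \subseteq X \text{ open}, \; s \in FU ,
\qquad p(V_{U,s}) = U .
```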

Puzzle. Show that with this topology on \Lambda(F), the map p \colon \Lambda(F) \to X is continuous.

Thus any presheaf on X gives a bundle over X.

Puzzle. Describe how a morphism of presheaves on X gives a morphism of bundles over X, and show that your construction defines a functor

\Lambda \colon \widehat{\mathcal{O}(X)} \to \mathsf{Top}/X

Etale spaces

So now we have functors that turn bundles into presheaves:

\Gamma \colon \mathsf{Top}/X \to \widehat{\mathcal{O}(X)}

and presheaves into bundles:

\Lambda \colon \widehat{\mathcal{O}(X)} \to \mathsf{Top}/X

But we have already seen that the presheaves coming from bundles are ‘better than average’: they are sheaves! Similarly, the bundles coming from presheaves are better than average. They are ‘etale spaces’.

What does this mean? Well, if you think back on how we took a presheaf F and gave \Lambda(F) a topology a minute ago, you’ll see something very funny about that topology. Each point in \Lambda(F) has a neighborhood such that

p \colon \Lambda(F) \to X

restricted to that neighborhood is a homeomorphism. Indeed, remember that each point in \Lambda(F) is a germ of some

s \in F U

for some open U \subseteq X. We made the set of all germs of s into an open set in \Lambda(F). Call that open set V.

Puzzle. Show that p is a homeomorphism from V to U.

In class I’ll draw a picture of what’s going on. The space \Lambda(F) sitting over X has lots of open sets V that look exactly like open sets U down in X. In terms of our visual metaphor, these open sets V are ‘horizontal’, which is why we invoke the term ‘etale’:

Definition. A bundle p \colon Y \to X is etale if each point y \in Y has an open neighborhood V such that p restricted to V is a homeomorphism from V to an open subset of X. We often call such a bundle an etale space over X.

So, if you did the last puzzle, you’ve shown that any presheaf on X gives an etale space over X.

(By the way, if you know about covering spaces, you should note that every covering space of X is an etale space over X but not conversely. In a covering space p \colon Y \to X we demand that each point down below, in X, has a neighborhood U such that p^{-1}(U) is a disjoint union of open sets homeomorphic to U, with p restricting to a homeomorphism on each of these open sets. In an etale space we merely demand that each point up above, in Y, has a neighborhood V such that p restricted to V is a homeomorphism. This is a weaker condition. In general, etale spaces are rather weird if you’re used to spaces like manifolds: for example, Y will often not be Hausdorff.)
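
A small example may make the gap between the two conditions concrete. It is a hypothetical illustration chosen for this note, not one given in the post: the inclusion of an open interval into the real line.

```latex
% Hypothetical example: an open inclusion is etale but not a covering space.
j \colon (0,1) \hookrightarrow \mathbb{R} , \qquad j(y) = y .
% Etale: for every y \in (0,1), take V = (0,1); j restricted to V is a
% homeomorphism onto the open subset (0,1) of \mathbb{R}.
% Not a covering: for every open U containing 0, the preimage
% j^{-1}(U) = U \cap (0,1) is nonempty, but no subset of it maps onto U,
% because 0 \in U is not in the image of j.
```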

Sheaves versus etale spaces

Now things are nicely symmetrical! We have a functor that turns bundles into presheaves

\Gamma \colon \mathsf{Top}/X \to \widehat{\mathcal{O}(X)}

but in fact it turns bundles into sheaves. We have a functor that turns presheaves into bundles

\Lambda \colon \widehat{\mathcal{O}(X)} \to \mathsf{Top}/X

but in fact it turns presheaves into etale spaces.

Last time we defined \mathsf{Sh}(X) to be the full subcategory of \widehat{\mathcal{O}(X)} having sheaves as objects. Now let’s define \mathsf{Etale}(X) to be the full subcategory of \mathsf{Top}/X having etale spaces as objects. And here’s the punchline:

Theorem. The functor

\Lambda \colon \widehat{\mathcal{O}(X)} \to \mathsf{Top}/X

is left adjoint to the functor

\Gamma \colon \mathsf{Top}/X \to \widehat{\mathcal{O}(X)}

Moreover, if we restrict these functors to the subcategories \mathsf{Sh}(X) and \mathsf{Etale}(X), we get an equivalence of categories

\mathsf{Sh}(X) \simeq  \mathsf{Etale}(X)
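
Unpacked, the adjunction part of the theorem says there is a bijection of hom-sets, natural in the presheaf F and the bundle p \colon Y \to X. The formulas for the unit and counit below are the standard ones, written here as a sketch; the notation \mathrm{germ}_y\, s for the germ of s at y is an assumption of this note.

```latex
\mathrm{Hom}_{\mathsf{Top}/X}\big( \Lambda(F),\, p \big)
\;\cong\;
\mathrm{Hom}_{\widehat{\mathcal{O}(X)}}\big( F,\, \Gamma_p \big)

% Unit: a presheaf element becomes the section made of its germs.
\eta_F \colon F U \to \Gamma(\Lambda(F))\, U , \qquad
s \mapsto \big( y \mapsto \mathrm{germ}_y\, s \big)

% Counit: the germ of a section is sent to the value of that section.
\epsilon_p \colon \Lambda(\Gamma_p) \to Y , \qquad
\mathrm{germ}_x\, s \mapsto s(x)
```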

The proof involves some work but also some very beautiful abstract nonsense: see Theorem 2, Corollary 3 and Lemma 4 of Section II.6. There’s a lot more to say, but this seems like a good place to stop.

January 14, 2020

Doug NatelsonThe Wolf Prize and how condensed matter physics works

The Wolf Prize in Physics for 2020 was announced yesterday, and it's going to Pablo Jarillo-Herrero, Allan MacDonald, and Rafi Bistritzer, for twisted bilayer graphene.  This prize is both well-deserved and a great example of how condensed matter physics works.  

MacDonald and Bistritzer did key theory work (for example) highlighting how the band structure of twisted bilayer graphene would become very interesting for certain twist angles - how the moire pattern from the two layers would produce a lateral periodicity, and that interactions between the layers would lead to very flat bands.  Did they predict every exotic thing that has been seen in this system?  No, but they had the insight to get key elements, and the knowledge that flat bands would likely lead to many competing energy scales, including electron-electron interactions, the weak kinetic energy of the flat bands, the interlayer coupling, effective magnetic interactions, etc.  Jarillo-Herrero was the first to implement this with sufficient control and sample quality to uncover a remarkable phase diagram involving superconductivity and correlated insulating states.  Figuring out what is really going on here and looking at all the possibilities in related layered materials will keep people busy for years.   (As an added example of how condensed matter works as a field, Bistritzer is in industry working for Applied Materials.)

All of this activity and excitement, thanks to feedback between well-motivated theory and experiment, is how the bulk of physics that isn't "high energy theory" actually works.  

John BaezTopos Theory (Part 1)

I’m teaching an introduction to topos theory this quarter, loosely based on Mac Lane and Moerdijk’s Sheaves in Geometry and Logic.

I’m teaching one and a half hours each week for 10 weeks, so we probably won’t make it very far through this 629-page book. I may continue for the next quarter, but still, to make good progress I’ll have to do various things.

First, I’ll assume basic knowledge of category theory, a lot of which is explained in the Categorical Preliminaries and Chapter 1 of this book. I’ll start in with Chapter 2. Feel free to ask questions!

Second, I’ll skip a lot of proofs and focus on stating definitions and theorems, and explaining what they mean and why they’re interesting.

These notes to myself will be compressed versions of what I will later write on the whiteboard.


Topos theory emerged from Grothendieck’s work on algebraic geometry; he developed it as part of his plan to prove the Weil Conjectures. It was really just one of many linked innovations in algebraic geometry that emerged from the French school, and it makes the most sense if you examine the whole package. Unfortunately algebraic geometry takes a long time to explain! But later Lawvere and Tierney realized that topos theory could serve as a grand generalization of logic and set theory. This logical approach is more self-contained, and easier to explain, but also a bit more dry—at least to me. I will try to steer a middle course, and the title Sheaves in Geometry and Logic shows that Mac Lane and Moerdijk were trying to do this too.

The basic idea of algebraic geometry is to associate to a space the commutative ring of functions on that space, and study the geometry and topology of this space using that ring. For example, if X is a compact Hausdorff space there’s a ring C(X) consisting of all continuous real-valued functions on X, and you can recover X from this ring. But algebraic geometers often deal with situations where there aren’t enough everywhere-defined functions (of the sort they want to consider) on a space. For example, the only analytic functions on the Riemann sphere are constant functions. That’s not good enough! Most analytic functions on the Riemann sphere have poles, and are only defined away from these poles. (I’m giving an example from complex analysis, in hopes that more people will get what I’m talking about, but there are plenty of purely algebraic examples.)

This forced algebraic geometers to invent ‘sheaves’, around 1945 or so. The idea of a sheaf is that instead of only considering functions defined everywhere, we look at functions defined on open sets.

So, let X be a topological space and let \mathcal{O}(X) be the collection of open subsets of X. This is a poset with inclusion as the partial ordering, and thus it is a category. A presheaf is a functor

F \colon \mathcal{O}(X)^{\mathrm{op}} \to \mathsf{Set}

So, a presheaf assigns to each open set U a set F U. It allows us to restrict an element of F U to any smaller open set U' \subseteq U, and a couple of axioms hold, which are encoded in the word ‘functor’. Note the ‘op’: that’s what lets us restrict elements of F U to smaller open sets.

The example to keep in mind is where F U consists of functions on U (that is, functions of the sort we want to consider, such as continuous or smooth or analytic functions). However, other examples are important too.

In many of these examples something nice happens. First, suppose we have s \in F U and an open cover of U by open sets U_i. Then we can restrict s to U_i getting something we can call s|_{U_i}. We can then further restrict this to U_i \cap U_j. And by the definition of presheaf, we have

(s|_{U_i})|_{U_i \cap U_j} = (s|_{U_j})|_{U_i \cap U_j}

In other words, if we take a guy in F U and restrict it to a bunch of open sets covering U, the resulting guys agree on the overlaps U_i \cap U_j. Check that this follows from the definition of functor and some other facts!

This is true for any presheaf. A presheaf is a sheaf if we can start the other way around, with a bunch of guys s_i \in F U_i that agree on overlaps:

s_i|_{U_i \cap U_j} = s_j|_{U_i \cap U_j}

and get a unique s \in F U that restricts to all these guys:

s|_{U_i} = s_i

Note this definition secretly has two clauses: I’m saying that in this situation s exists and is unique. If we have uniqueness but not necessarily existence, we say our presheaf is a separated presheaf.

The point of a sheaf is that you can tell if something is in F U by examining it locally. These examples explain what I mean:

Puzzle. Let X = \mathbb{R} and for each open set U \subseteq \mathbb{R} take F U to be the set of continuous real-valued functions on U. Show that with the usual concept of restriction of functions, F is a presheaf and in fact a sheaf.

Puzzle. Let X = \mathbb{R} and for each open set U \subseteq \mathbb{R} take F U to be the set of bounded continuous real-valued functions on U. Show that with the usual concept of restriction of functions, F is a separated presheaf but not a sheaf.

The problem is that a function can be bounded on each open set in an open cover of U yet not bounded on U. You can tell if a function is continuous by examining it locally, but you can’t tell if it’s bounded!
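A finite analogue of these two puzzles can be checked by brute force. The following is only a sketch under stated assumptions: X is the two-point discrete space, the cover of X consists of its two points, and the helper names (all_functions, constant_functions, restrict, gluing_counts) are made up for illustration. "Constant" plays the role of "bounded" here: a function can be constant on each piece of the cover without being constant on X. Only one cover is tested, so this illustrates the sheaf condition rather than proving it.

```python
from itertools import product

def all_functions(U, values=(0, 1)):
    """F(U) = all functions U -> values, each stored as a dict."""
    U = sorted(U)
    return [dict(zip(U, vals)) for vals in product(values, repeat=len(U))]

def constant_functions(U, values=(0, 1)):
    """F(U) = constant functions U -> values (one empty function if U is empty)."""
    U = sorted(U)
    if not U:
        return [{}]
    return [{x: v for x in U} for v in values]

def restrict(s, V):
    """Restrict the function s to the smaller set V."""
    return {x: s[x] for x in V}

def gluing_counts(F, U, cover):
    """For each family (s_i) agreeing on overlaps, count the s in F(U)
    that restrict to every s_i."""
    counts = []
    for family in product(*(F(Ui) for Ui in cover)):
        pairs = list(zip(cover, family))
        compatible = all(
            restrict(si, Ui & Uj) == restrict(sj, Ui & Uj)
            for Ui, si in pairs
            for Uj, sj in pairs
        )
        if compatible:
            glued = [s for s in F(U)
                     if all(restrict(s, Ui) == si for Ui, si in pairs)]
            counts.append(len(glued))
    return counts

X = frozenset({0, 1})
cover = [frozenset({0}), frozenset({1})]

# Every compatible family glues in exactly one way: the sheaf condition holds.
print(gluing_counts(all_functions, X, cover))
# Every count is at most 1 (separated), but some are 0 (no gluing exists).
print(gluing_counts(constant_functions, X, cover))
```

A count of exactly 1 for every compatible family is the sheaf condition for this cover. Counts of at most 1 give separatedness, and a 0 marks a compatible family with no gluing.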

So, in a sense that should gradually become clear, sheaves are about ‘local truth’.

The category of sheaves on a space

There’s a category of presheaves on any topological space X. Since a presheaf on X is a functor

F \colon \mathcal{O}(X)^{\mathrm{op}} \to \mathsf{Set},

a morphism between presheaves is a natural transformation between such functors.

Remember, if \mathsf{C} and \mathsf{D} are categories, we use \mathsf{C}^{\mathsf{D}} to stand for the category where the objects are functors from \mathsf{D} to \mathsf{C}, and the morphisms are natural transformations. This is called a functor category.

So, a category of presheaves is just an example of a functor category, and the category of presheaves on X is called

\mathsf{Set}^{\mathcal{O}(X)^{\mathrm{op}}}

But this name is rather ungainly, so we make an abbreviation

\widehat{\mathsf{C}} = \mathsf{Set}^{\mathsf{C}^{\mathrm{op}}}

Then the category of presheaves on X is called

\widehat{\mathcal{O}(X)}

Sheaves are subtler, but we define morphisms of sheaves the exact same way. Every sheaf has an underlying presheaf, so we define a morphism between sheaves to be a morphism between their underlying presheaves. This gives the category of sheaves on X, which we call \mathsf{Sh}(X).

By how we’ve set things up, \mathsf{Sh}(X) is a full subcategory of \widehat{\mathcal{O}(X)}.

Now, what Grothendieck realized is that \mathsf{Sh}(X) acts a whole lot like the category of sets. For example, in the category of sets we can define ‘commutative rings’, and we can copy the definition in \mathsf{Sh}(X) and get ‘sheaves of commutative rings’, and so on. The point is that we’re copying ordinary math, but doing it locally, in a topological space.

Elementary topoi

Lawvere and Tierney clarified what was going on here by inventing the concept of ‘elementary topos’. I’ll throw the definition at you now and explain all the pieces in future classes:

Definition. An elementary topos, or topos for short, is a category with finite limits and colimits, exponentials and a subobject classifier.

I hope you know limits and colimits, since that’s the kind of basic category theory definition I’m assuming. Given two objects x and y in a category, their exponential is an object x^y that acts like the thing of all maps from y to x. I’ll give the actual definition later. A subobject classifier is, roughly, an object \Omega that generalizes the usual set of truth values

2 = \{0,1\}

Namely, subobjects of any object x are in one-to-one correspondence with morphisms from x to \Omega, which serve as ‘characteristic functions’. Again, this is just a sketch: I’ll give the actual definition later, or you can click on the link and read it now.

The point is that an elementary topos has enough bells and whistles that we can ‘do mathematics inside it’. It’s like an alternative universe, a variant of our usual category of sets and functions, where mathematicians can live. But beware: in general, the kind of mathematics we do in an elementary topos is finitistic mathematics using intuitionistic logic.

You see, the category of finite sets is an elementary topos, so you can’t expect to have ‘infinite objects’ like the set of natural numbers in an elementary topos—unless you decree that you want them (which people often do).

Also, we will see that while 2 = \{0,1\} is a Boolean algebra, the subobject classifier of an elementary topos need only be a ‘Heyting algebra’: a generalization of a Boolean algebra in which the law of excluded middle fails. This is actually not weird: it’s connected to the fact that a category of sheaves lets us reason ‘locally’. For example, we don’t just care if two functions are equal or not, we care if they’re equal or not in each open set. So we need a subtler form of logic than classical Boolean logic.
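One standard example of a Heyting algebra is the lattice of open sets of a topological space, where the negation of an open set U is the interior of its complement. The sketch below assumes the Sierpinski space (two points 0 and 1, with open sets the empty set, {1}, and the whole space) and uses made-up helper names. It checks that U together with its negation need not cover the space, so the law of excluded middle fails.

```python
def interior(S, opens):
    """Largest open set contained in S: the union of all open sets inside S."""
    result = frozenset()
    for U in opens:
        if U <= S:
            result |= U
    return result

def negation(U, X, opens):
    """Heyting negation of an open set: the interior of its complement."""
    return interior(X - U, opens)

# Sierpinski space: points 0 and 1, with {1} open but {0} not open.
X = frozenset({0, 1})
opens = [frozenset(), frozenset({1}), X]

U = frozenset({1})
not_U = negation(U, X, opens)

# Excluded middle would say that U together with not-U is everything.
excluded_middle_holds = (U | not_U) == X
print(sorted(not_U), excluded_middle_holds)
```

Here the complement of U is {0}, which contains no nonempty open set, so the negation of U is empty and "U or not U" is just U, not the whole space.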

There’s a lot more to say, and I’m just sketching out the territory now, but one of the first big theorems we’re aiming for is this:

Theorem. For any topological space X, \mathsf{Sh}(X) is an elementary topos.

The topos of sheaves \mathsf{Sh}(X) remembers a lot about the topological space X that it came from… so a topos can also be seen as a way of talking about a space! This is even true for elementary topoi that aren’t topoi of sheaves on an actual space. So, topos theory is more than a generalization of set theory. It’s also, in a different way, a generalization of topology.

Grothendieck topoi

You’ll notice that sheaves on X were defined starting with the poset \mathcal{O}(X) of open sets of X. In fact, to define them we never used anything about X except this poset! This suggests that we could define sheaves more generally starting from any poset.

And that’s true—but Grothendieck went further: he defined sheaves starting from any category, as long as that category was equipped with some extra structure saying when a bunch of morphisms f_i \colon x_i \to x serve to ‘cover’ the object x. This extra data is called a ‘coverage’ or more often (rather confusingly) a ‘Grothendieck topology’. A category equipped with a Grothendieck topology is called a ‘site’.

So, Grothendieck figured out how to talk about the category of sheaves \mathsf{Sh}(\mathsf{C}) on any site \mathsf{C}. He did this before Lawvere and Tierney came along, and this was his definition of a topos. So, nowadays we say a category of sheaves on a site is a Grothendieck topos. However:

Theorem. Any Grothendieck topos is an elementary topos.

So, Lawvere and Tierney’s approach subsumes Grothendieck’s, in a sense. Not every elementary topos is a Grothendieck topos, though! For example, the category of finite sets is an elementary topos but not a Grothendieck topos. (It’s not big enough: any Grothendieck topos has, not just finite limits and colimits, but all small limits and colimits.) So both concepts of topos are important and still used. But when I say just ‘topos’, I’ll mean ‘elementary topos’.

Why did Grothendieck bother to generalize the concept of sheaves from sheaves on a topological space to sheaves on a site? He wasn’t just doing it for fun: it was a crucial step in his attempt to prove the Weil Conjectures!

Basically, when you’re dealing with spaces that algebraic geometers like—say, algebraic varieties—there aren’t enough open sets to do everything we want, so we need to use covering spaces as a generalization of open covers. So, instead of defining sheaves using the poset of open subsets of our space X, Grothendieck needed to use the category of covering spaces of X.

That’s the rough idea, anyway.

Geometric morphisms

As you probably know if you’re reading this, category theory is all about the morphisms. This is true not just within a category, but between them. The point of topos theory is not just to study one topos, but many. We don’t want merely to do mathematics in alternative universes: we want to be able to translate mathematics from one alternative universe to another!

So, what are the morphisms between topoi?

First, if you have a continuous map f \colon X \to Y between topological spaces, you can take the ‘direct image’ of a presheaf on X to get a presheaf on Y. Here’s how this works.

The inverse image of any open set is open, so we get an inverse image map

f^{-1} \colon \mathcal{O}(Y) \to \mathcal{O}(X)

sending each open set V \subseteq Y to the open set

f^{-1} V = \{x \in X :\; f(x) \in V \} \subseteq X

Given a presheaf F on X, we define its direct image to be the presheaf on Y given by

(f_\ast F)(V) = F(f^{-1} V)

Note the double reversal here: f maps points in X to points in Y, but open sets in Y give open sets in X, and then presheaves on X give presheaves on Y.
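The double reversal is easy to see on finite sets. This is a small sketch under stated assumptions: X and Y are finite sets with the discrete topology (so every map is continuous), F is the presheaf of all {0,1}-valued functions, and the helper names are invented for the example.

```python
from itertools import product

def all_functions(U, values=(0, 1)):
    """F(U) = all functions U -> values, each stored as a dict."""
    U = sorted(U)
    return [dict(zip(U, vals)) for vals in product(values, repeat=len(U))]

def preimage(f, V):
    """Inverse image of V under the map f, given as a dict from X to Y."""
    return frozenset(x for x in f if f[x] in V)

def direct_image(F, f):
    """The presheaf V |-> F(preimage of V) on Y."""
    return lambda V: F(preimage(f, V))

def restrict_direct_image(s, f, V_small):
    """Restrict s in (f_* F)(V) to a smaller open V_small by restricting
    the underlying function to the preimage of V_small."""
    return {x: s[x] for x in preimage(f, V_small)}

# A map from X = {0, 1, 2} to Y = {"a", "b"}.
f = {0: "a", 1: "a", 2: "b"}
f_star_F = direct_image(all_functions, f)

V = frozenset({"a", "b"})
V_small = frozenset({"a"})

# Sections over V_small are functions on the preimage {0, 1}.
sections = f_star_F(V_small)
print(sorted(preimage(f, V_small)), len(sections))

# Restricting a section over V gives a section over V_small.
s = {0: 1, 1: 0, 2: 1}
print(restrict_direct_image(s, f, V_small) in sections)
```

The restriction works because a smaller open set in Y has a smaller inverse image in X, so the restriction maps of F can be reused.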

Of course we need to check that it works:

Puzzle. Show that f_\ast F is a presheaf. That is, explain how we can restrict an element of (f_\ast F)(V) to any open set contained in V, and check that we get a presheaf this way.

In fact it works very nicely:

Puzzle. Show that taking direct images gives a functor from the category of presheaves on X to the category of presheaves on Y.

Puzzle. Show that if F is a sheaf on X, its direct image f_\ast F is a sheaf on Y.

The upshot of all this is that a continuous map between topological spaces

f \colon X \to Y

gives a functor between sheaf categories

f_\ast \colon \mathsf{Sh}(X) \to \mathsf{Sh}(Y)

And this functor turns out to be very nice! This is another big theorem we aim to prove later:

Theorem. If f \colon X \to Y is a continuous map between topological spaces, the functor

f_\ast \colon \mathsf{Sh}(X) \to \mathsf{Sh}(Y)

has a left adjoint

f^\ast \colon \mathsf{Sh}(Y) \to \mathsf{Sh}(X)

that preserves finite limits.

This left adjoint is called the inverse image map. Note that because f_\ast has a left adjoint, it is a right adjoint, so it preserves limits. Because f^\ast is a left adjoint, it preserves colimits. The fact that f^\ast preserves finite limits is extra gravy on top of an already nice situation!

We bundle all this niceness into a definition:

Definition. A functor f_\ast \colon \mathsf{T} \to \mathsf{T'} between topoi is a geometric morphism if it has a left adjoint that preserves finite limits.

And this is the most important kind of morphism between topoi. It’s not a very obvious definition, but it’s extracted straight from what happens in examples.

To wrap up, I should add that people usually call the pair consisting of f_\ast \colon \mathsf{T} \to \mathsf{T'} and its left adjoint f^\ast \colon \mathsf{T'} \to \mathsf{T} a geometric morphism. A functor has at most one adjoint, up to natural isomorphism, so my definition is at least tolerable. But I’ll probably switch to the standard one when we get serious about geometric morphisms.

And we will eventually see that geometric morphisms let us translate mathematics from one alternative universe to another!


If this seemed like too much too soon, fear not, I’ll go over it again and actually define a lot of the concepts I merely sketched, like ‘exponentials’, ‘subobject classifier’, ‘Heyting algebra’, ‘Grothendieck topology’, and ‘Grothendieck topos’. I just wanted to get a lot of the main concepts on the table quickly. You should do the puzzles to see if you understand what I wanted you to understand. Unless I made a mistake, all of these are straightforward definition-pushing if you’re comfortable with some basic category theory.

For more background on topos theory I highly recommend this:

• Colin McLarty, The uses and abuses of the history of topos theory.

Abstract. The view that toposes originated as generalized set theory is a figment of set theoretically educated common sense. This false history obstructs understanding of category theory and especially of categorical foundations for mathematics. Problems in geometry, topology, and related algebra led to categories and toposes. Elementary toposes arose when Lawvere’s interest in the foundations of physics and Tierney’s in the foundations of topology led both to study Grothendieck’s foundations for algebraic geometry. I end with remarks on a categorical view of the history of set theory, including a false history plausible from that point of view that would make it helpful to introduce toposes as a generalization from set theory.

There’s also a lot of background material in the book for this course, Mac Lane and Moerdijk’s Sheaves in Geometry and Logic.

January 13, 2020

Terence TaoSome recent papers

Just a brief post to record some notable papers in my fields of interest that appeared on the arXiv recently.

  • “A sharp square function estimate for the cone in {\bf R}^3”, by Larry Guth, Hong Wang, and Ruixiang Zhang.  This paper establishes an optimal (up to epsilon losses) square function estimate for the three-dimensional light cone that was essentially conjectured by Mockenhaupt, Seeger, and Sogge. This estimate has a number of other consequences, including Sogge’s local smoothing conjecture for the wave equation in two spatial dimensions, which in turn implies the (already known) Bochner-Riesz, restriction, and Kakeya conjectures in two dimensions.   Interestingly, modern techniques such as polynomial partitioning and decoupling estimates are not used in this argument; instead, the authors mostly rely on an induction on scales argument and Kakeya type estimates.  Many previous authors (including myself) were able to get weaker estimates of this type by an induction on scales method, but there were always significant inefficiencies in doing so; in particular knowing the sharp square function estimate at smaller scales did not imply the sharp square function estimate at the given larger scale.  The authors here get around this issue by finding an even stronger estimate that implies the square function estimate, but behaves significantly better with respect to induction on scales.
  • “On the Chowla and twin primes conjectures over {\mathbb F}_q[T]”, by Will Sawin and Mark Shusterman.  This paper resolves a number of well known open conjectures in analytic number theory, such as the Chowla conjecture and the twin prime conjecture (in the strong form conjectured by Hardy and Littlewood), in the case of function fields where the order of the field is a prime power q=p^j which is fixed (in contrast to a number of existing results in the “large q” limit) but has a large exponent j.  The techniques here are orthogonal to those used in recent progress towards the Chowla conjecture over the integers (e.g., in this previous paper of mine); the starting point is an algebraic observation that in certain function fields, the Mobius function behaves like a quadratic Dirichlet character along certain arithmetic progressions.  In principle, this reduces problems such as Chowla’s conjecture to problems about estimating sums of Dirichlet characters, for which more is known; but the task is still far from trivial.
  • “Bounds for sets with no polynomial progressions”, by Sarah Peluse.  This paper can be viewed as part of a larger project to obtain quantitative density Ramsey theorems of Szemeredi type.  For instance, Gowers famously established a relatively good quantitative bound for Szemeredi’s theorem that all dense subsets of integers contain arbitrarily long arithmetic progressions a, a+r, \dots, a+(k-1)r.  The corresponding question for polynomial progressions a+P_1(r), \dots, a+P_k(r) is considered more difficult for a number of reasons.  One of them is that dilation invariance is lost; a dilation of an arithmetic progression is again an arithmetic progression, but a dilation of a polynomial progression will in general not be a polynomial progression with the same polynomials P_1,\dots,P_k.  Another issue is that the ranges of the two parameters a,r are now at different scales.  Peluse gets around these difficulties in the case when all the polynomials P_1,\dots,P_k have distinct degrees, which is in some sense the opposite case to that considered by Gowers (in particular, thanks to a degree lowering argument that is available in this case, she avoids the need for quantitative inverse theorems for high order Gowers norms; these were recently obtained in this integer setting by Manners, but with bounds that are probably not strong enough for the bounds in Peluse’s results).  To resolve the first difficulty one has to make all the estimates rather uniform in the coefficients of the polynomials P_j, so that one can still run a density increment argument efficiently.  To resolve the second difficulty one needs to find a quantitative concatenation theorem for Gowers uniformity norms.  Many of these ideas were developed in previous papers of Peluse and Peluse-Prendiville in simpler settings.
  • “On blow up for the energy super critical defocusing non linear Schrödinger equations”, by Frank Merle, Pierre Raphael, Igor Rodnianski, and Jeremie Szeftel.  This paper (when combined with two companion papers) resolves a long-standing problem as to whether finite time blowup occurs for the defocusing supercritical nonlinear Schrödinger equation (at least in certain dimensions and nonlinearities).  I had a previous paper establishing a result like this if one “cheated” by replacing the nonlinear Schrödinger equation by a system of such equations, but remarkably they are able to tackle the original equation itself without any such cheating.  Given the very analogous situation with Navier-Stokes, where again one can create finite time blowup by “cheating” and modifying the equation, it does raise hope that finite time blowup for the incompressible Navier-Stokes and Euler equations can be established…  In fact the connection may not just be at the level of analogy; a surprising key ingredient in the proofs here is the observation that a certain blowup ansatz for the nonlinear Schrödinger equation is governed by solutions to the (compressible) Euler equation, and finite time blowup examples for the latter can be used to construct finite time blowup examples for the former.

Doug NatelsonPopular treatment of condensed matter - topics

I'm looking more seriously at trying to do some popularly accessible writing about condensed matter.  I have a number of ideas about what should be included in such a work, but I'm always interested in other people's thoughts on this.   Suggestions? 

January 12, 2020

n-Category Café Random Permutations (Part 12)

This time I’d like to repackage some of the results in Part 11 in a prettier way. I’ll describe the groupoid of ‘finite sets equipped with a permutation’ in terms of Young diagrams and cyclic groups. Taking groupoid cardinalities, this description will give a well-known formula for the probability that a random permutation belongs to any given conjugacy class!

Here is what we’ll prove:

\mathsf{Perm} \simeq \sum_{y \in Y} \prod_{k=1}^\infty \frac{ \mathsf{B}(\mathbb{Z}/k)^{y(k)}}{ y(k)!}

Let me explain the notation here.

First, \mathsf{Perm} stands for the groupoid of finite sets equipped with a permutation. Explicitly:

  • an object (X,\sigma) of \mathsf{Perm} is a finite set X with a bijection \sigma \colon X \to X;
  • a morphism f \colon (X,\sigma) \to (X',\sigma') is a bijection f \colon X \to X' such that \sigma' = f \sigma f^{-1}.

Second, Y stands for the set of Young diagrams. A Young diagram looks like this:

but we will think of Young diagrams as functions y \colon \mathbb{N}^+ \to \mathbb{N} that vanish at all but finitely many points. The idea is that a Young diagram y has y(k) columns of length k for each k = 1, 2, 3, \dots. For example, the Young diagram above has y(1) = 1, y(2) = 3, y(3) = 1 and y(n) = 0 for all other n.

Third, \mathsf{B}(G) stands for the one-object groupoid corresponding to the group G.

Fourth, for any category \mathsf{C},

\frac{\mathsf{C}^k}{k!}

stands for the kth symmetrized power of \mathsf{C}. This is easiest to understand if we recall that the free symmetric monoidal category on \mathsf{C}, say \mathsf{S}(\mathsf{C}), has a description as

\mathsf{S}(\mathsf{C}) \simeq \sum_{k = 0}^\infty \frac{\mathsf{C}^k}{k!}

where an object of \mathsf{C}^k/k! is a k-tuple (c_1, \dots, c_k) of objects of \mathsf{C} and a morphism is a k-tuple (f_1, \dots, f_k) of morphisms in \mathsf{C} together with a permutation \sigma \in S_k. The morphisms are composed in a manner familiar from the ‘wreath product’ of groups. Indeed, if G is a group and \mathsf{B}(G) is the corresponding one-object groupoid, we have

\frac{\mathsf{B}(G)^k}{k!} \simeq \mathsf{B}(S_k \ltimes G^k) \quad \quad (1)

where the semidirect product S_k \ltimes G^k is called the wreath product of S_k and G.

Now that all the notation is defined, we can prove the result:

Theorem. There is an equivalence of groupoids

\mathsf{Perm} \simeq \sum_{y \in Y} \prod_{k=1}^\infty \frac{ \mathsf{B}(\mathbb{Z}/k)^{y(k)}}{ y(k)!}

Proof. First note that \mathsf{Perm} is equivalent to its full subcategory where we use one finite set with each cardinality. It is thus equivalent to the groupoid where

  • an object is a natural number n and an element \sigma \in S_n,
  • a morphism f \colon (n,\sigma) \to (n, \sigma') is a permutation f \in S_n such that \sigma' = f \sigma f^{-1}.

Thus, isomorphism classes of objects in \mathsf{Perm} correspond to conjugacy classes of permutations. A conjugacy class of permutations is classified by its number of cycles of each length, and thus by a Young diagram y \colon \mathbb{N}^+ \to \mathbb{N} saying that there are y(k) cycles of length k for each k = 1, 2, 3, \dots.

In short, if we use \pi_0(G) to stand for the set of isomorphism classes of objects of the groupoid G, we have established an isomorphism

\pi_0(\mathsf{Perm}) \cong Y

where Y is the set of Young diagrams. The groupoid \mathsf{Perm} is thus equivalent to a coproduct of connected groupoids, one for each Young diagram:

\mathsf{Perm} \simeq \sum_{y \in Y} \mathsf{Perm}_y

By taking a skeleton we can assume each groupoid \mathsf{Perm}_y has one object, namely (n,\sigma) where \sigma \in S_n is a chosen permutation with y(k) cycles of length k for each k = 1, 2, 3, \dots. The automorphisms of this object are then permutations f \in S_n with \sigma = f \sigma f^{-1}.

In short, \mathsf{Perm}_y is the one-object groupoid corresponding to the centralizer of \sigma \in S_n, where \sigma is any permutation with y(k) cycles of length k for all k.

We can choose \sigma to act on the boxes of the Young diagram y, cyclically permuting the entries in each column in such a way that the first entry in each column is mapped to the second, the second is mapped to the third, and so on, with the last entry being mapped to the first. Any element of the centralizer of \sigma thus consists of a permutation of the columns, mapping each column to some other column of the same height, followed by an arbitrary cyclic permutation of the entries in each column. It follows that the centralizer is isomorphic to

\prod_{k=1}^\infty S_{y(k)} \ltimes (\mathbb{Z}/k)^{y(k)}

and thus

\mathsf{Perm}_y \simeq \mathsf{B} \left( \prod_{k=1}^\infty S_{y(k)} \ltimes (\mathbb{Z}/k)^{y(k)} \right)

Then, by equation (1) and the fact that \mathsf{B} \colon \mathsf{Gp} \to \mathsf{Gpd} preserves products, we have

\begin{array}{ccl} \mathsf{Perm}_y &\simeq& \displaystyle{ \prod_{k=1}^\infty \mathsf{B} \left( S_{y(k)} \ltimes (\mathbb{Z}/k)^{y(k)} \right) } \\ \\ &\simeq & \displaystyle{ \prod_{k=1}^\infty \frac{ \mathsf{B}(\mathbb{Z}/k)^{y(k)}}{ y(k)!} } \end{array}

It follows that

\begin{array}{ccl} \mathsf{Perm} &\simeq & \displaystyle{ \sum_{y \in Y} \mathsf{Perm}_y } \\ \\ &\simeq& \displaystyle{ \sum_{y \in Y} \prod_{k=1}^\infty \frac{ \mathsf{B}(\mathbb{Z}/k)^{y(k)}}{ y(k)!} } \end{array}

as desired.   ■

Now let’s see how this result lets us compute the probability that a random permutation of an n-element set lies in any given conjugacy class. The conjugacy classes in S_n correspond to Young diagrams y with n boxes. For each such y we will compute the probability that a random element of S_n lies in the corresponding conjugacy class. Let’s call this probability p_y.

In general, the probability that a randomly chosen element of a finite group G lies in some conjugacy class K is |K|/|G|. But K \cong G/C(k) where C(k) is the centralizer of some element k \in K. Thus, the probability in question equals 1/|C(k)|.
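This counting argument can be checked by brute force in a small case. The sketch below assumes G = S_4 with permutations stored as tuples, picks one element with two cycles of length 2, and uses made-up helper names. It lists the conjugacy class and the centralizer of that element so their sizes can be compared with |G|.

```python
from itertools import permutations

def compose(f, g):
    """Composite permutation: first apply g, then f."""
    return tuple(f[g[i]] for i in range(len(g)))

def inverse(f):
    """Inverse of a permutation stored as a tuple."""
    inv = [0] * len(f)
    for i, fi in enumerate(f):
        inv[fi] = i
    return tuple(inv)

def conjugacy_class(sigma, group):
    """All conjugates f sigma f^{-1} with f in the group."""
    return {compose(compose(f, sigma), inverse(f)) for f in group}

def centralizer(sigma, group):
    """All f in the group that commute with sigma."""
    return [f for f in group if compose(f, sigma) == compose(sigma, f)]

G = list(permutations(range(4)))   # the symmetric group S_4
sigma = (1, 0, 3, 2)               # swaps 0 with 1 and 2 with 3

K = conjugacy_class(sigma, G)
C = centralizer(sigma, G)

# |K| * |C| should equal |G|, so |K| / |G| = 1 / |C|.
print(len(K), len(C), len(G))
```

For this element the class has 3 members and the centralizer has 8, so the probability of landing in the class is 3/24 = 1/8.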

Recall that \mathsf{Perm}_y is the one-object groupoid corresponding to the centralizer of some \sigma \in S_n whose conjugacy class corresponds to the Young diagram y. The cardinality of a one-object groupoid is the reciprocal of the cardinality of the corresponding group, so

|\mathsf{Perm}_y| = \frac{1}{|C(\sigma)|}

It follows that

p_y = |\mathsf{Perm}_y|

In other words, the probability we are trying to compute is the cardinality of a groupoid we have already studied! We saw in the proof of the Theorem that

\mathsf{Perm}_y \simeq \mathsf{B} \left( \prod_{k=1}^\infty S_{y(k)} \ltimes (\mathbb{Z}/k)^{y(k)} \right)

so

\begin{array}{ccl} p_y &=& |\mathsf{Perm}_y| \\ \\ &=& \displaystyle{ \prod_{k=1}^\infty \frac{1}{|S_{y(k)} \ltimes (\mathbb{Z}/k)^{y(k)}| } } \\ \\ &=& \displaystyle{ \prod_{k=1}^\infty \frac{1}{y(k)! \, k^{y(k)} } } \end{array}

So, we get a well-known result:

Theorem. The probability p yp_y that a random permutation of an nn-element set has y(k)y(k) cycles of length kk for all k=1,2,3,k = 1, 2, 3, \dots is given by

p y= k=1 1y(k)!k y(k) p_y = \prod_{k=1}^\infty \frac{1}{y(k)! \, k^{y(k)}}

The theorem is easy to prove, so the point is just that this probability is the cardinality of a naturally defined groupoid, and a similar formula holds at the level of groupoids:

\[ \mathsf{Perm}_y \simeq \prod_{k=1}^\infty \frac{ \mathsf{B}(\mathbb{Z}/k)^{y(k)}}{ y(k)!} \]
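The probability formula in the Theorem can be checked by brute force for small \(n\). The following is a minimal Python sketch under stated assumptions: the function names (`cycle_type`, `formula`, `brute_force`) are made up for illustration, a Young diagram is represented simply as a sorted tuple of cycle lengths, and \(n = 5\) is an arbitrary small test case.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial


def cycle_type(perm):
    """Return the sorted tuple of cycle lengths of a permutation given as a tuple of images."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = perm[i]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths))


def formula(lengths):
    """Evaluate the product of 1 / (y(k)! * k^y(k)) over the cycle lengths k that occur."""
    p = Fraction(1)
    for k, y_k in Counter(lengths).items():
        p /= factorial(y_k) * k ** y_k
    return p


def brute_force(n):
    """Return the exact probability of each cycle type among all n! permutations."""
    counts = Counter(cycle_type(p) for p in permutations(range(n)))
    return {t: Fraction(c, factorial(n)) for t, c in counts.items()}


probabilities = brute_force(5)
formula_matches = all(probabilities[t] == formula(t) for t in probabilities)
print(formula_matches)
for t in sorted(probabilities):
    print(t, probabilities[t])
```

Exact rational arithmetic via `Fraction` avoids any floating-point doubt about whether the enumerated frequencies and the product formula agree.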

Here is the series so far, on my website:

  • Part 0 — What’s the average length of the longest cycle in a random permutation of an \(n\)-element set?
  • Part 1 — What is the probability that a randomly chosen permutation of an \(n\)-element set has exactly \(k\) fixed points?
  • Part 2 — What is the probability that the shortest cycle in a randomly chosen permutation of an \(n\)-element set has length greater than \(k\)?
  • Part 3 — A large collection of questions about random permutations, with answers.
  • Part 4 — What is the probability that a randomly chosen permutation of an \(n\)-element set has a cycle of length greater than \(n/2\)?
  • Part 5 — What is the average length of a cycle in a randomly chosen permutation of an \(n\)-element set?
  • Part 6 — What is the expected number of cycles of length \(k\) in a randomly chosen permutation of an \(n\)-element set?
  • Part 7 — How is the distribution of the number of cycles of length \(k\) in a random permutation related to a Poisson distribution?
  • Part 8 — What’s the \(n\)th moment of a Poisson distribution?
  • Part 9 — If we treat the number of cycles of length \(k\) in a random permutation of an \(n\)-element set as a random variable, what do the moments of this random variable approach as \(n \to \infty\)?
  • Part 10 — How to compute statistics of random permutations using groupoid cardinalities.
  • Part 11 — How to prove the Cycle Length Lemma, a fundamental result on random permutations, using groupoid cardinalities.
  • Part 12 — How to write the groupoid of finite sets equipped with a permutation as a sum over Young diagrams, and how to use this to compute the probability that a random permutation has given cycle lengths.

Mark Goodsell Let's try that again

It's been suggested that if you go to send someone a "Happy New Year" message and notice that the last one you sent was the same thing last year, then maybe you don't need to send the message. Well, I'm going to try to prove that wrong and once again kick-start this blog.

My writing in 2018 was hamstrung partly by illness. I had much better excuses last year, which was actually much worse. But this blog is called "Real Self Energy" and I am always trying to look on the bright side and see the positives, so if we look at what I was looking forward to last year in physics, in fact this year we could say almost exactly the same things:

  1. We're still waiting for the results on muon \( g-2 \). At the French intensity frontier GDR meeting in November we had a great talk by Marc Knecht on this and on the theory challenges ahead, which really convinced me that theorists do have a good handle on the calculation and we are just waiting for the experiments to have their final say.
  2. The B-meson anomalies lost a bit of their lustre with an update that just preserved the status quo: the measurement of \( R(K) \equiv \frac{\mathrm{BR}(B \rightarrow K \mu \mu)}{\mathrm{BR}(B \rightarrow K e e)}\) by LHCb moved closer to the Standard Model value while the uncertainty shrank, keeping the deviation about the same, while a preliminary measurement by Belle of \( R(K^*) \equiv \frac{\mathrm{BR}(B \rightarrow K^* \mu \mu)}{\mathrm{BR}(B \rightarrow K^* e e)} \) was consistent with the SM value, but with much poorer uncertainty than the (anomalous) values from LHCb. Again we discussed this extensively at the GDR in November and there is still a lot of excitement and anticipation that looks set to continue for some time, with many experiments set to report data over the coming years. A good reference of the current status is this paper.
  3. The CMS and ATLAS collaborations seem to be taking their time with analyses of the full dataset of Run 2, so we are still waiting for lots of new results to come out. From the theory perspective, I have recently been involved in putting collider limits on new theory models ("recasting") and every time new experimental results come out, there is a lag before they are implemented in the various theory tools. One of the interesting questions for me will be which theory tool emerges as the winner in the long run from this effort, or if the experiments will first make their analyses completely unreproducible (e.g. by moving from cut-based analyses to neural networks)!
  4. Regarding the Higgs, the mass has already been experimentally determined to an accuracy much better than we "need" (compared e.g. to the top quark), and the accuracy of the coupling measurements will only be incrementally improved with more data. There has been a lot of interest in the production cross-section, for both single and double Higgs events, where I learnt recently that the prediction in the Standard Model is now more accurately known than it can ever be determined at the LHC. This is an interesting effort that one of my LPTHE colleagues got into last year: here and here.
None of this mentions dark matter, neutrinos or axions, where there are lots of interesting things going on. And I was going to mention last year's politics, but I have run out of time for today, and, since that's a rather personal and bleak post, I will leave it for later!

January 10, 2020

n-Category Café Quotienting Out The Degenerate

This is a quick, off-the-cuff, conceptual question. Hopefully, it has an easy answer.

Often in algebra, we want to quotient out by a set of elements that we regard as trivial or degenerate. That’s almost a tautology: any time we take a quotient, the elements quotiented out are by definition treated as negligible. And often the situation is mathematically trivial too, as when we quotient by the kernel of a homomorphism.

But some examples of quotienting by degenerates are slightly more subtle. The two I have in mind are:

  • the definition of exterior power;

  • the definition of normalized chain complex.

I’d like to know whether there’s a thread connecting the two.

Let me now explain those two examples in a way that makes them look somewhat similar. I’ll start with exterior powers.

Take a vector space \(X\) over some field, and take an integer \(r \geq 0\). A multilinear map

\[ f: X^r \to Y \]

to another vector space \(Y\) is said to be alternating if

\[ (x_1, \ldots, x_r) \ \text{ linearly dependent} \implies f(x_1, \ldots, x_r) = 0. \]

The \(r\)th exterior power \(\bigwedge^r X\) is the codomain of the universal alternating map out of \(X^r\).

That’s a characterization of \(\bigwedge^r X\) by a universal property, but it’s not an actual construction. It’s constructed like this: the tensor power \(X^{\otimes r}\) has a linear subspace \(D_r\) generated by

\[ \{ x_1 \otimes \cdots \otimes x_r \mid (x_1, \ldots, x_r) \ \text{ linearly dependent} \}, \]

and then we put \(\bigwedge^r X = X^{\otimes r}/D_r\). The letter \(D\) is chosen to stand for either “degenerate” or “dependent”, linear dependence being a kind of degeneracy condition. So, \(\bigwedge^r X\) is \(X^{\otimes r}\) quotiented out by its degenerate part.

Incidentally, the universal property and construction of the exterior power aren’t usually phrased this way. More often, a multilinear map \(f: X^r \to Y\) is defined to be “alternating” if \(f(x_1, \ldots, x_r) = 0\) whenever \(x_i = x_j\) for some \(i \neq j\). But this is equivalent. Similarly, \(D_r\) can equivalently be defined as the subspace of \(X^{\otimes r}\) generated by the elements \(x_1 \otimes \cdots \otimes x_r\) where \(x_i = x_j\) for some \(i \neq j\).
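To spell out why the two conditions on a multilinear map agree, here is a short sketch; the index \(j\) and the coefficients \(a_i\) are notation introduced only for this argument. In one direction, a tuple with \(x_i = x_j\) for some \(i \neq j\) is linearly dependent, so vanishing on dependent tuples implies vanishing on tuples with a repeated argument. In the other direction, suppose \(f\) vanishes whenever two arguments are equal and \((x_1, \ldots, x_r)\) is linearly dependent. Then some \(x_j\) can be written as \(x_j = \sum_{i \neq j} a_i x_i\), and multilinearity gives

\[ f(x_1, \ldots, x_{j-1}, x_j, x_{j+1}, \ldots, x_r) = \sum_{i \neq j} a_i \, f(x_1, \ldots, x_{j-1}, x_i, x_{j+1}, \ldots, x_r) = 0, \]

because in each term on the right \(x_i\) appears in both the \(i\)th and the \(j\)th slot. The same expansion, applied to tensors instead of values of \(f\), is what one would use to compare the two descriptions of the generators of \(D_r\).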

Personally, my comfort with exterior algebra took a leap forward when I learned that the definition of exterior power could be expressed in terms of linear dependence, as opposed to the standard presentation via repeated arguments. I never felt entirely motivated by that standard approach, despite the volume interpretation and the nearly-equivalent condition that swapping arguments changes the sign. But the condition that degenerate terms get sent to zero feels more natural to me, whatever “natural” means.

Now let’s do normalized chains. I guess I could say this in the context of an arbitrary abelian category, but I’ll just say it for modules over some commutative ring.

Let \(X\) be a simplicial module. It gives rise to a chain complex \(C(X)\) of modules (the unnormalized complex of \(X\)). The \(r\)th module \(C_r(X)\) is just \(X_r\), and the boundary maps in \(C(X)\) are the alternating sums \(\sum (-1)^i d_i\) of the face maps of \(X\), as usual.

Now, some of the elements of \(X_r\) are “degenerate”, in the sense that they can be obtained from lower-dimensional elements. Specifically, let \(D_r(X)\) be the submodule of \(X_r\) generated by

\[ \{ (X f)(y) \mid f: [r] \to [q], \ q < r, \ y \in X_q\}. \]

A little calculation shows that \(D(X) = (D_r(X))_{r \geq 0}\) is a subcomplex of the unnormalized complex \(C(X)\). So we can form the quotient complex \(N(X) = C(X)/D(X)\), which by definition is the normalized complex of the simplicial module \(X\).

Again, this isn’t quite the usual presentation. An important fact is that \(N(X)\) is not just a quotient of \(C(X)\), but a direct summand. In particular, it can also be viewed as a subcomplex of \(C(X)\). There are two dual ways to view it thus: \(N_r(X)\) can be seen as the intersection \(\bigcap_{i = 0}^{r - 1} \ker(d_i)\) of the kernels of all but the last face map, or dually as the intersection \(\bigcap_{i = 1}^r \ker(d_i)\) of the kernels of all but the first face map. Most often, \(N(X)\) is defined to be one of these two subcomplexes. But to me it seems more natural to view it primarily as \(C(X)/D(X)\): the quotient of \(C(X)\) by its degenerate part.

So, those are the two situations I wanted to describe. They seem moderately similar to me: both involve a quotient by a subobject generated by degenerate elements; in both, the degenerate elements don’t actually form a subobject themselves (the words “generated by” are crucial); both involve an indexing over the natural numbers \(r\). So I’m wondering whether the two situations are related. Maybe one is a special case of the other, or maybe there’s a common generalization. Even if not, perhaps there’s some good point of view on “quotient out the degenerate” constructions of this kind, including my two examples and maybe others. Can anyone shed any light?

Matt von Hippel The Road to Reality

I build tools, mathematical tools to be specific, and I want those tools to be useful. I want them to be used to study the real world. But when I build those tools, most of the time, I don’t test them on the real world. I use toy models, simpler cases, theories that don’t describe reality and weren’t intended to.

I do this, in part, because it lets me stay one step ahead. I can do more with those toy models, answer more complicated questions with greater precision, than I can for the real world. I can do more ambitious calculations, and still get an answer. And by doing those calculations, I can start to anticipate problems that will crop up for the real world too. Even if we can’t do a calculation yet for the real world, if it requires too much precision or too many particles, we can still study it in a toy model. Then when we’re ready to do those calculations in the real world, we know better what to expect. The toy model will have shown us some of the key challenges, and how to tackle them.

There’s a risk, working with simpler toy models. The risk is that their simplicity misleads you. When you solve a problem in a toy model, could you solve it only because the toy model is easy? Or would a similar solution work in the real world? What features of the toy model did you need, and which are extra?

The only way around this risk is to be careful. You have to keep track of how your toy model differs from the real world. You must keep in mind difficulties that come up on the road to reality: the twists and turns and potholes that real-world theories will give you. You can’t plan around all of them, that’s why you’re working with a toy model in the first place. But for a few key, important ones, you should keep your eye on the horizon. You should keep in mind that, eventually, the simplifications of the toy model will go away. And you should have ideas, perhaps not full plans but at least ideas, for how to handle some of those difficulties. If you put the work in, you stand a good chance of building something that’s useful, not just for toy models, but for explaining the real world.

January 07, 2020

Tommaso Dorigo Guest Post: Andre Kovacs, Mistaken Assumptions In Physics - What Hurts You Is What Everyone Knows To Be True

What hurts you is not what you don't know, but those mistaken assumptions which "everyone knows to be true".
[The following text is courtesy Andras Kovacs - T.D.]


January 06, 2020

Jacques Distler Entanglement for Laymen

I’ve been asked, innumerable times, to explain quantum entanglement to some lay audience. Most of the elementary explanations that I have seen (heck, maybe all of them) fail to draw any meaningful distinction between “entanglement” and mere “(classical) correlation.”

This drives me up the wall, so each time I am asked, I strive to come up with an elementary explanation of the difference. Rather than keep reinventing the wheel, let me herewith record my latest attempt.

“Entanglement” is a bit tricky to explain, versus “correlation” — which has a perfectly classical interpretation.

Say I tear a page of paper in two, crumple up the two pieces into balls and (at random) hand one to Adam and the other to Betty. They then go their separate ways and — sometime later — Adam unfolds his piece of paper. There’s a 50% chance that he got the top half, and 50% that he got the bottom half. But if he got the top half, we know for certain that Betty got the bottom half (and vice versa).

That’s correlation.

In this regard, the entangled state behaves exactly the same way. What distinguishes the entangled state from the merely correlated is something that doesn’t have a classical analogue. So let me shift from pieces of paper to photons.

You’re probably familiar with the polaroid filters in good sunglasses. They absorb light polarized along the horizontal axis, but transmit light polarized along the vertical axis.

Say, instead of crumpled pieces of paper, I send Adam and Betty a pair of photons.

In the correlated state, one photon is polarized horizontally, and one photon is polarized vertically, and there’s a 50% chance that Adam got the first while Betty got the second and a 50% chance that it’s the other way around.

Adam and Betty send their photons through polaroid filters, both aligned vertically. If Adam’s photon makes it through the filter, we can be certain that Betty’s gets absorbed and vice versa. The same is true if they both align their filters horizontally.

Say Adam aligns his filter horizontally, while Betty aligns hers vertically. Then either both photons make it through (with 50% probability) or both get absorbed (also with 50% probability).

All of the above statements are also true in the entangled state.

The tricky thing, the thing that makes the entangled state different from the correlated state, is what happens if both Adam and Betty align their filters at a 45° angle. Now there’s a 50% chance that Adam’s photon makes it through his filter, and a 50% chance that Betty’s photon makes it through her filter.

(You can check this yourself, if you’re willing to sacrifice an old pair of sunglasses. Polarize a beam of light with one sunglass lens, and view it through the other sunglass lens. As you rotate the second lens, the intensity varies from 100% (when the lenses are aligned) to 0 (when they are at 90°). The intensity is 50% when the second lens is at 45°.)
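The numbers in this sunglasses experiment follow Malus’s law, a standard result for ideal polarizers. The symbols \(I_0\) (the intensity after the first lens) and \(\theta\) (the angle between the two lenses) are notation introduced here, not in the text above:

\[ I(\theta) = I_0 \cos^2\theta, \qquad I(0^\circ) = I_0, \quad I(45^\circ) = \tfrac{1}{2} I_0, \quad I(90^\circ) = 0. \]

For a single photon, the same \(\cos^2\theta\) is the probability of making it through the second filter, which is where the 50% at 45° comes from.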

So what is the probability that both Adam’s and Betty’s photons make it through? Well, if there’s a 50% chance that his made it through and a 50% chance that hers made it through, then you might surmise that there’s a 25% chance that both made it through.

That’s indeed the correct answer in the correlated state.

In fact, in the correlated state, each of the 4 possible outcomes (both photons made it through, Adam’s made it through but Betty’s got absorbed, Adam’s got absorbed but Betty’s made it through or both got absorbed) has a 25% chance of taking place.

But, in the entangled state, things are different.

In the entangled state, the probability that both photons made it through is 50% – the same as the probability that one made it through. In other words, if Adam’s photon made it through the 45° filter, then we can be certain that Betty’s made it through. And if Adam’s was absorbed, so was Betty’s. There’s zero chance that one of their photons made it through while the other got absorbed.

Unfortunately, while it’s fairly easy to create the correlated state with classical tools (polaroid filters, half-silvered mirrors, …), creating the entangled state requires some quantum mechanical ingredients. So you’ll just have to believe me that quantum mechanics allows for a state of two photons with all of the aforementioned properties.
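For readers who would rather compute than believe, here is a minimal Python sketch of the probabilities quoted above. It rests on assumptions that the text does not state: the entangled state is taken to be \((|HV\rangle + |VH\rangle)/\sqrt{2}\), chosen only because it reproduces every property listed; the filters are ideal; angles give the polarization a filter transmits, measured from horizontal; and all names (`axis`, `pair`, `outcome_probs`, `CORRELATED`, `ENTANGLED`) are made up for illustration.

```python
import math


def axis(theta_deg):
    """Polarization direction, as (H, V) amplitudes, at angle theta from horizontal."""
    t = math.radians(theta_deg)
    return (math.cos(t), math.sin(t))


def pair(a, b):
    """Two-photon amplitudes for (Adam, Betty), ordered HH, HV, VH, VV."""
    return [a[i] * b[j] for i in range(2) for j in range(2)]


H, V = axis(0), axis(90)
HV, VH = pair(H, V), pair(V, H)

# A state is a list of (classical weight, pure two-photon state) pairs.
# Correlated state: a coin flip between the two product states HV and VH.
CORRELATED = [(0.5, HV), (0.5, VH)]
# Assumed entangled state (|HV> + |VH>)/sqrt(2): one pure state, no coin flip.
ENTANGLED = [(1.0, [(x + y) / math.sqrt(2) for x, y in zip(HV, VH)])]


def outcome_probs(state, theta_adam, theta_betty):
    """Return probabilities keyed by (Adam's photon passes?, Betty's photon passes?)."""
    probs = {}
    for a_pass in (True, False):
        for b_pass in (True, False):
            # An absorbed photon is polarized perpendicular to the transmitted direction.
            a = axis(theta_adam if a_pass else theta_adam + 90)
            b = axis(theta_betty if b_pass else theta_betty + 90)
            projector = pair(a, b)
            probs[(a_pass, b_pass)] = sum(
                weight * sum(p * s for p, s in zip(projector, pure)) ** 2
                for weight, pure in state
            )
    return probs


for name, state in (("correlated", CORRELATED), ("entangled", ENTANGLED)):
    for angles in ((90, 90), (0, 90), (45, 45)):
        probs = outcome_probs(state, *angles)
        print(name, angles, {k: round(v, 3) for k, v in probs.items()})
```

With both filters vertical, or with one horizontal and one vertical, the two states give identical numbers. Only at 45° do they part ways: the mixture spreads evenly over the four outcomes, while the assumed entangled state puts all its weight on "both pass" and "both absorbed".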

Sorry if this explanation was a bit convoluted; I told you that entanglement is subtle…

January 05, 2020

Doug Natelson Brief items

Happy new year.  As we head into 2020, here are a few links I've been meaning to point out:

  • This paper is a topical review of high-throughput (sometimes called combinatorial) approaches to searching for new superconductors.   The basic concept is simple enough:  co-deposit multiple different elements in a way that deliberately produces compositional gradients across the target substrate.  This can be done via geometry of deposition, or with stencils that move during the deposition process.  Then characterize the local properties in an efficient way across the various compositional gradients, looking for the target properties you want (e.g., maximum superconducting transition temperature).  Ideally, you combine this with high-throughput structural characterization and even annealing or other post-deposition treatment.  Doing all of this well in practice is a craft.  
  • Calling back to my post on this topic, Scientific American has an article about wealth distribution based on statistical mechanics-like models of economies.   It's hard for me to believe that some of these insights are really "new" - seems like many of these models could have been examined decades ago....
  • This is impressive.  Jason Petta's group at Princeton has demonstrated controlled entanglement between single-electron spins in Si/SiGe gate-defined quantum dots separated by 4 mm.  That may not sound all that exciting; one could use photons to entangle atoms separated by km, as has been done with optical fiber.  However, doing this on-chip using engineered quantum dots (with gates for tunable control) in an arrangement that is in principle scalable via microfabrication techniques is a major achievement.
  • Just in case you needed another demonstration that correlated materials like the copper oxide superconductors are complicated, here you go.  These investigators use an approach based on density functional theory (see here, here, and here), and end up worrying about energetic competition between 26 different electronic/magnetic phases.  Regardless of the robustness of their specific conclusions, just that tells you the inherent challenge of those systems:  Many possible ordered states all with very similar energy scales.

January 03, 2020

Jordan Ellenberg Edith Wharton shipped Esther/Haman

That theory, now, that Odysseus never really forgot Circe; and that Esther was in love with Haman, and decoyed him to the banquet with Ahasuerus just for the sake of once having him near her and hearing him speak; and that Dante, perhaps, if he could have been brought to book, would have had to confess to caring a good deal more for the pietosa donna of the window than for a long-dead Beatrice — well, you know, it tallies wonderfully with the inconsequences and surprises that one is always discovering under the superficial fitnesses of life.

(Edith Wharton, “That Good May Come,” 1894.)

Matt von Hippel Why You Might Want to Bootstrap

A few weeks back, Quanta Magazine had an article about attempts to “bootstrap” the laws of physics, starting from simple physical principles and pulling out a full theory “by its own bootstraps”. This kind of work is a cornerstone of my field, a shared philosophy that motivates a lot of what we do. Building on deep older results, people in my field have found that just a few simple principles are enough to pick out specific physical theories.

There are limits to this. These principles pick out broad traits of theories: gravity versus the strong force versus the Higgs boson. As far as we know they don’t separate more closely related forces, like the strong nuclear force and the weak nuclear force. (Originally, the Quanta article accidentally made it sound like we know why there are four fundamental forces: we don’t, and the article’s phrasing was corrected.) More generally, a bootstrap method isn’t going to tell you which principles are the right ones. For any set of principles, you can always ask “why?”

With that in mind, why would you want to bootstrap?

First, it can make your life simpler. Those simple physical principles may be clear at the end, but they aren’t always obvious at the start of a calculation. If you don’t make good use of them, you might find you’re calculating many things that violate those principles, things that in the end all add up to zero. Bootstrapping can let you skip that part of the calculation, and sometimes go straight to the answer.

Second, it can suggest possibilities you hadn’t considered. Sometimes, your simple physical principles don’t select a unique theory. Some of the options will be theories you’ve heard of, but some might be theories that never would have come up, or even theories that are entirely new. Trying to understand the new theories, to see whether they make sense and are useful, can lead to discovering new principles as well.

Finally, even if you don’t know which principles are the right ones, some principles are better than others. If there is an ultimate theory that describes the real world, it can’t be logically inconsistent. That’s a start, but it’s quite a weak requirement. There are principles that aren’t required by logic itself, but that still seem important in making the world “make sense”. Often, we appreciate these principles only after we’ve seen them at work in the real world. The best example I can think of is relativity: while Newtonian mechanics is logically consistent, it requires a preferred reference frame, a fixed notion for which things are moving and which things are still. This seemed reasonable for a long time, but now that we understand relativity the idea of a preferred reference frame seems like it should have been obviously wrong. It introduces something arbitrary into the laws of the universe, a “why is it that way?” question that doesn’t have an answer. That doesn’t mean it’s logically inconsistent, or impossible, but it does make it suspect in a way other ideas aren’t. Part of the hope of these kinds of bootstrap methods is that they uncover principles like that, principles that aren’t mandatory but that are still in some sense “obvious”. Hopefully, enough principles like that really do specify the laws of physics. And if they don’t, we’ll at least have learned how to calculate better.

Scott Aaronson Quantum Dominance, Hegemony, and Superiority

Yay! I’m now a Fellow of the ACM. Along with my fellow new inductee Peter Shor, who I hear is a real up-and-comer in the quantum computing field. I will seek to use this awesome responsibility to steer the ACM along the path of good rather than evil.

Also, last week, I attended the Q2B conference in San Jose, where a central theme was the outlook for practical quantum computing in the wake of the first clear demonstration of quantum computational supremacy. Thanks to the folks at QC Ware for organizing a fun conference (full disclosure: I’m QC Ware’s Chief Scientific Advisor). I’ll have more to say about the actual scientific things discussed at Q2B in future posts.

None of that is why you’re here, though. You’re here because of the battle over “quantum supremacy.”

A week ago, my good friend and collaborator Zach Weinersmith, of SMBC Comics, put out a cartoon with a dark-curly-haired scientist named “Dr. Aaronson,” who’s revealed on a hot mic to be an evil “quantum supremacist.” Apparently a rush job, this cartoon is far from Zach’s finest work. For one thing, if the character is supposed to be me, why not draw him as me, and if he isn’t, why call him “Dr. Aaronson”? In any case, I learned from talking to Zach that the cartoon’s timing was purely coincidental: Zach didn’t even realize what a hornet’s-nest he was poking with this.

Ever since John Preskill coined it in 2012, “quantum supremacy” has been an awkward term. Much as I admire John Preskill’s wisdom, brilliance, generosity, and good sense, in physics as in everything else—yeah, “quantum supremacy” is not a term I would’ve coined, and it’s certainly not a hill I’d choose to die on. Once it had gained common currency, though, I sort of took a liking to it, mostly because I realized that I could mine it for dark one-liners in my talks.

The thinking was: even as white supremacy was making its horrific resurgence in the US and around the world, here we were, physicists and computer scientists and mathematicians of varied skin tones and accents and genders, coming together to pursue a different and better kind of supremacy—a small reflection of the better world that we still believed was possible. You might say that we were reclaiming the word “supremacy”—which, after all, just means a state of being supreme—for something non-sexist and non-racist and inclusive and good.

In the world of 2019, alas, perhaps it was inevitable that people wouldn’t leave things there.

My first intimation came a month ago, when Leonie Mueck—someone who I’d gotten to know and like when she was an editor at Nature handling quantum information papers—emailed me about her view that our community should abandon the term “quantum supremacy,” because of its potential to make women and minorities uncomfortable in our field. She advocated using “quantum advantage” instead.

So I sent Leonie back a friendly reply, explaining that, as the father of a math-loving 6-year-old girl, I understood and shared her concerns—but also, that I didn’t know an alternative term that really worked.

See, it’s like this. Preskill meant “quantum supremacy” to refer to a momentous event that seemed likely to arrive in a matter of years: namely, the moment when programmable quantum computers would first outpace the ability of the fastest classical supercomputers on earth, running the fastest algorithms known by humans, to simulate what the quantum computers were doing (at least on special, contrived problems). And … “the historic milestone of quantum advantage”? It just doesn’t sound right. Plus, as many others pointed out, the term “quantum advantage” is already used to refer to … well, quantum advantages, which might fall well short of supremacy.

But one could go further. Suppose we did switch to “quantum advantage.” Couldn’t that term, too, remind vulnerable people about the unfair advantages that some groups have over others? Indeed, while “advantage” is certainly subtler than “supremacy,” couldn’t that make it all the more insidious, and therefore dangerous?

Oblivious though I sometimes am, I realized Leonie would be unhappy if I offered that, because of my wholehearted agreement, I would henceforth never again call it “quantum supremacy,” but only “quantum superiority,” “quantum dominance,” or “quantum hegemony.”

But maybe you now see the problem. What word does the English language provide to describe one thing decisively beating or being better than a different thing for some purpose, and which doesn’t have unsavory connotations?

I’ve heard “quantum ascendancy,” but that makes it sound like we’re a UFO cult—waiting to ascend, like ytterbium ions caught in a laser beam, to a vast quantum computer in the sky.

I’ve heard “quantum inimitability” (that is, inability to imitate using a classical computer), but who can pronounce that?

Yesterday, my brilliant former student Ewin Tang (yes, that one) relayed to me a suggestion by Kevin Tian: “quantum eclipse” (that is, the moment when quantum computers first eclipse classical ones for some task). But would one want to speak of a “quantum eclipse experiment”? And shouldn’t we expect that, the cuter and cleverer the term, the harder it will be to use unironically?

In summary, while someone might think of a term so inspired that it immediately supplants “quantum supremacy” (and while I welcome suggestions), I currently regard it as an open problem.

Anyway, evidently dissatisfied with my response, last week Leonie teamed up with 13 others to publish a letter in Nature, which was originally entitled “Supremacy is for racists—use ‘quantum advantage,'” but whose title I see has now been changed to the less inflammatory “Instead of ‘supremacy’ use ‘quantum advantage.'” Leonie’s co-signatories included four of my good friends and colleagues: Alan Aspuru-Guzik, Helmut Katzgraber, Anne Broadbent, and Chris Granade (the last of whom got started in the field by helping me edit Quantum Computing Since Democritus).

(Update: Leonie pointed me to a longer list of signatories here, at their website. A few names that might be known to Shtetl-Optimized readers are Andrew White, David Yonge-Mallo, Debbie Leung, Matt Leifer, Matthias Troyer.)

Their letter says:

The community claims that quantum supremacy is a technical term with a specified meaning. However, any technical justification for this descriptor could get swamped as it enters the public arena after the intense media coverage of the past few months.

In our view, ‘supremacy’ has overtones of violence, neocolonialism and racism through its association with ‘white supremacy’. Inherently violent language has crept into other branches of science as well — in human and robotic spaceflight, for example, terms such as ‘conquest’, ‘colonization’ and ‘settlement’ evoke the terra nullius arguments of settler colonialism and must be contextualized against ongoing issues of neocolonialism.

Instead, quantum computing should be an open arena and an inspiration for a new generation of scientists.

When I did an “Ask Me Anything” session, as the closing event at Q2B, Sarah Kaiser asked me to comment on the Nature petition. So I repeated what I’d said in my emailed response to Leonie, running through the problems with each proposed alternative term, talking about the value of reclaiming the word “supremacy,” and mostly just trying to defuse the tension by getting everyone laughing together. Sarah later tweeted that she was “really disappointed” in my response.

Then the Wall Street Journal got in on the action, with a brief editorial (warning: paywalled) mocking the Nature petition:

There it is, folks: Mankind has hit quantum wokeness. Our species, akin to Schrödinger’s cat, is simultaneously brilliant and brain-dead. We built a quantum computer and then argued about whether the write-up was linguistically racist.

Taken seriously, the renaming game will never end. First put a Sharpie to the Supremacy Clause of the U.S. Constitution, which says federal laws trump state laws. Cancel Matt Damon for his 2004 role in “The Bourne Supremacy.” Make the Air Force give up the term “air supremacy.” Tell lovers of supreme pizza to quit being so chauvinistic about their toppings. Please inform Motown legend Diana Ross that the Supremes are problematic.

The quirks of quantum mechanics, some people argue, are explained by the existence of many universes. How did we get stuck in this one?

Steven Pinker also weighed in, with a linguistically-informed tweetstorm:

This sounds like something from The Onion but actually appeared in Nature … It follows the wokified stigmatization of other innocent words, like “House Master” (now, at Harvard, Residential Dean) and “NIPS” (Neural Information Processing Society, now NeurIPS). It’s a familiar linguistic phenomenon, a lexical version of Gresham’s Law: bad meanings drive good ones out of circulation. Examples: the doomed “niggardly” (no relation to the n-word) and the original senses of “cock,” “ass,” “prick,” “pussy,” and “booty.” Still, the prissy banning of words by academics should be resisted. It dumbs down understanding of language: word meanings are conventions, not spells with magical powers, and all words have multiple senses, which are distinguished in context. Also, it makes academia a laughingstock, tars the innocent, and does nothing to combat actual racism & sexism.

Others had a stronger reaction. Curtis Yarvin, better known as Mencius Moldbug, is one of the founders of “neoreaction” (and a significant influence on Steve Bannon, Michael Anton, and other Trumpists). Regulars might remember that Yarvin argued with me in Shtetl-Optimized’s comment section, under a post in which I denounced Trump’s travel ban and its effects on my Iranian PhD student. Since then, Yarvin has sent me many emails, which have ranged from long to extremely long, and whose message could be summarized as: “[labored breathing] Abandon your liberal Enlightenment pretensions, young Nerdwalker. Come over to the Dark Side.”

After the “supremacy is for racists” letter came out in Nature, though, Yarvin sent me his shortest email ever. It was simply a link to the letter, along with the comment “I knew it would come to this.”

He meant: “What more proof do you need, young Nerdawan, that this performative wokeness is a cancer that will eventually infect everything you value—even totally apolitical research in quantum information? And by extension, that my whole worldview, which warned of this, is fundamentally correct, while your faith in liberal academia is naïve, and will be repaid only with backstabbing?”

In a subsequent email, Yarvin predicted that in two years, the whole community will be saying “quantum advantage” instead of “quantum supremacy,” and in five years I’ll be saying “quantum advantage” too. As Yarvin famously wrote: “Cthulhu may swim slowly. But he only swims left.”

So what do I really think about this epic battle for (and against) supremacy?

Truthfully, half of me just wants to switch to “quantum advantage” right now and be done with it. As I said, I know some of the signatories of the Nature letter to be smart and reasonable and kind. They don’t wish to rid the planet of everyone like me. They’re not Amanda Marcottes or Arthur Chus. Furthermore, there’s little I despise more than a meaty scientific debate devolving into a pointless semantic one, with brilliant friend after brilliant friend getting sucked into the vortex (“you too?”). I’m strongly in the Pinkerian camp, which holds that words are just arbitrary designators, devoid of the totemic power to dictate thoughts. So if friends and colleagues—even just a few of them—tell me that they find some word I use to be offensive, why not just be a mensch, apologize for any unintended hurt, switch words midsentence, and continue discussing the matter at hand?

But then the other half of me wonders: once we’ve ceded an open-ended veto over technical terms that remind anyone of anything bad, where does it stop? How do we ever certify a word as kosher? At what point do we all get to stop arguing and laugh together?

To make this worry concrete, look back at Sarah Kaiser’s Twitter thread—the one where she expresses disappointment in me. Below her tweet, someone remarks that, besides “quantum supremacy,” the word “ancilla” (as in ancilla qubit, a qubit used for intermediate computation or other auxiliary purposes) is problematic as well. Here’s Sarah’s response:

I agree, but I wanted to start by focusing on the obvious one, Its harder for them to object to just one to start with, then once they admit the logic, we can expand the list

(What would Curtis Yarvin say about that?)

You’re probably now wondering: what’s wrong with “ancilla”? Apparently, in ancient Rome, an “ancilla” was a female slave, and indeed that’s the Latin root of the English adjective “ancillary” (as in “providing support to”). I confess that I hadn’t known that—had you? Admittedly, once you do know, you might never again look at a Controlled-NOT gate—pitilessly flipping an ancilla qubit, subject only to the whims of a nearby control qubit—in quite the same way.

(Ah, but the ancilla can fight back against her controller! And she does—in the Hadamard basis.)

The thing is, if we’re gonna play this game: what about annihilation operators? Won’t those need to be … annihilated from physics?

And what about unitary matrices? Doesn’t their very name negate the multiplicity of perspectives and cultures?

What about Dirac’s oddly-named bra/ket notation, with its limitless potential for puerile jokes, about the “bra” vectors displaying their contents horizontally and so forth? (Did you smile at that, you hateful pig?)

What about daggers? Don’t we need a less violent conjugate transpose?

Not to beat a dead horse, but once you hunt for examples, you realize that the whole dictionary is shot through with domination and brutality—that you’d have to massacre the English language to take it out. There’s nothing special about math or physics in this respect.

The same half of me also thinks about my friends and colleagues who oppose claims of quantum supremacy, or even the quest for quantum supremacy, on various scientific grounds. I.e., either they don’t think that the Google team achieved what it said, or they think that the task wasn’t hard enough for classical computers, or they think that the entire goal is misguided or irrelevant or uninteresting.

Which is fine—these are precisely the arguments we should be having—except that I’ve personally seen some of my respected colleagues, while arguing for these positions, opportunistically tack on ideological objections to the term “quantum supremacy.” Just to goose up their case, I guess. And I confess that every time they did this, it made me want to keep saying “quantum supremacy” from now till the end of time—solely to deny these colleagues a cheap and unearned “victory,” one they apparently felt they couldn’t obtain on the merits alone. I realize that this is childish and irrational.

Most of all, though, the half of me that I’m talking about thinks about Curtis Yarvin and the Wall Street Journal editorial board, cackling with glee to see their worldview so dramatically confirmed—as theatrical wokeness, that self-parodying modern monstrosity, turns its gaze on (of all things) quantum computing research. More red meat to fire up the base—or at least that sliver of the base nerdy enough to care. And the left, as usual, walks right into the trap, sacrificing its credibility with the outside world to pursue a runaway virtue-signaling spiral.

The same half of me thinks: do we really want to fight racism and sexism? Then let’s work together to assemble a broad coalition that can defeat Trump. And Jair Bolsonaro, and Viktor Orbán, and all the other ghastly manifestations of humanity’s collective lizard-brain. Then, if we’re really fantasizing, we could liberalize the drug laws, and get contraception and loans and education to women in the Third World, and stop the systematic disenfranchisement of black voters, and open up the world’s richer, whiter, and higher-elevation countries to climate refugees, and protect the world’s remaining indigenous lands (those that didn’t burn to the ground this year).

In this context, the trouble with obsessing over terms like “quantum supremacy” is not merely that it diverts attention, while contributing nothing to fighting the world’s actual racism and sexism. The trouble is that the obsessions are actually harmful. For they make academics—along with progressive activists—look silly. They make people think that we must not have meant it when we talked about the existential urgency of climate change and the world’s other crises. They pump oxygen into right-wing echo chambers.

But it’s worse than ridiculous, because of the message that I fear is received by many outside the activists’ bubble. When you say stuff like “[quantum] supremacy is for racists,” what’s heard might be something more like:

“Watch your back, you disgusting supremacist. Yes, you. You claim that you mentor women and minorities, donate to good causes, try hard to confront the demons in your own character? Ha! None of that counts for anything with us. You’ll never be with-it enough to be our ally, so don’t bother trying. We’ll see to it that you’re never safe, not even in the most abstruse and apolitical fields. We’ll comb through your words—even words like ‘ancilla qubit’—looking for any that we can cast as offensive by our opaque and ever-shifting standards. And once we find some, we’ll have it within our power to end your career, and you’ll be reduced to groveling that we don’t. Remember those popular kids who bullied you in second grade, giving you nightmares of social ostracism that persist to this day? We plan to achieve what even those bullies couldn’t: to shame you with the full backing of the modern world’s moral code. See, we’re the good guys of this story. It’s goodness itself that’s branding you as racist scum.”

In short, I claim that the message—not the message intended, of course, by anyone other than a Chu or a Marcotte or a SneerClubber, but the message received—is basically a Trump campaign ad. I claim further that our civilization’s current self-inflicted catastrophe will end—i.e., the believers in science and reason and progress and rule of law will claw their way back to power—when, and only when, a generation of activists emerges that understands these dynamics as well as Barack Obama did.

Wouldn’t it be awesome if, five years from now, I could say to Curtis Yarvin: you were wrong? If I could say to him: my colleagues and I still use the term ‘quantum supremacy’ whenever we care to, and none of us have been cancelled or ostracized for it—so maybe you should revisit your paranoid theories about Cthulhu and the Cathedral and so forth? If I could say: quantum computing researchers now have bigger fish to fry than arguments over words—like moving beyond quantum supremacy to the first useful quantum simulations, as well as the race for scalability and fault-tolerance? And even: progressive activists now have bigger fish to fry too—like retaking actual power all over the world?

Anyway, as I said, that’s how half of me feels. The other half is ready to switch to “quantum advantage” or any other serviceable term and get back to doing science.

January 02, 2020

Peter Rohde Happy New Year & thank you to our Firies!

Sydney fireworks (2019-2020), taken from Blues Point Tower.

I'm glad Sydney went ahead with this, while giving my absolute respect to those who have perished or lost their homes in the surrounding fires. As I watched in amazement at the display, I chose to dedicate that time to reflecting on my gratitude to the RFS volunteers. A celebration needn’t be disrespectful. It can be used to show gratitude too. Perhaps the City of Sydney should have made such a dedication. Thank you RFS.

Richard EastherRed Sky at Noon

On New Year’s Day, social media in New Zealand was flooded with images of eerie orange skies above the South Island as the smoke from a continent-scale fire disaster crossed the Tasman Sea.

The smoke shows equally well in the view from space. To set the scale, it is 2000km or three hours’ flying from New Zealand to the Australian coast; roughly, London to Moscow, or Denver to Washington DC.


This stunning image (and you'll be seeing more of them as the fires burn) was taken by the Japanese Himawari satellite. The spacecraft is in a geostationary orbit above the Pacific, keeping track with the rotating Earth and high enough to take images of the whole hemisphere beneath it.

There is no sensible doubt that the magnitude of this crisis is driven by climate change. Australia has always had fires, but climate change makes the continent hotter and drier, nudging more places above the tipping point at which fire will take hold, on more days of the year.

Climate change is, ultimately, an unintended consequence of human ingenuity. We can summon light at the flick of a switch, cross continents in a day, and the world's knowledge is at our fingertips. Billions of humans have access to miracles that emperors could never command.

In fairy tales and fables, wishes granted by genies always come with an unanticipated price. These real-life wonders are no exception: carbon dioxide is poured into the atmosphere as we forge steel, burn fossil fuels to power cars and planes and generate electricity.

When Pandora’s Box was opened, all manner of harm was unleashed into the world, but the last thing to be found in the box was hope. Climate change is not a myth or a fairy tale, but hope is still to be found in the satellite images of smoke spilling across the Tasman. Our capacity to snap pictures from space is just one of the tools we have to understand the climate, and it underscores our ability to develop technologies that let us walk more lightly on the earth. Ironically, our ability to significantly, albeit unintentionally, change the planet within a few generations reminds us that we have shaped the world that we live in, and can do so for both worse and for better.

If you’ve been reassuring friends and family and yourself that climate change is probably nothing to worry about, look at the news from Australia. Take a moment. Take a deep breath. And think again.

There are people who'll try to minimise and muddy the role of climate change, by telling you that this is all a mistake, or a con; that the fires were lit by arsonists; that forest floors should have been cleared of fuel. But none of this explains the ferocity of the scenes we are seeing on our screens and hearing from friends across the Tasman.

There's no shame in having gotten it wrong when people have profited from telling you not to worry: glib climate sceptics get air time, and media sites earn clicks from spreading ignorance as well as elucidation.

In truth, the science is complex. Weather forecasts are in Celsius (or Fahrenheit, for Americans) but the Universe thinks in Kelvins; degrees above absolute zero. Average temperatures have moved 1 degree Celsius, or 1 Kelvin. A 25 Celsius day is 298 Kelvins, so the global change in temperature we’ve seen so far is just over 1 part in 300. At the scale of the universe, our current planetary warming is small change.

But ecosystems are finely balanced, and attuned to differences in temperature, not its absolute value – like passengers in a small boat getting seasick from a gentle swell on top of a deep ocean. For Australia, an extra degree or two means hotter days, less rain, more heatwaves, more fires, which burn more fiercely once they get started. If our carbon emissions do not shift radically from "business as usual", that is only going to get worse.

The real challenge is not scientific: it is social and political. We know what we need to do, and we can figure out how to do it. But we need to make it happen. If you are looking for New Year's resolutions, try some of these. Drive petrol-powered cars less. Ride a bike more (you’ll get fitter). Get solar panels (which will likely save you money). Think carefully about the food you eat and how it’s grown. Purchase thoughtfully. Fly less. And this is the big one: insist that our leaders are serious about climate, and expect them to follow through on their promises.

You should worry. You should not despair.

I’ll close on this: another New Year’s Day photo from New Zealand, from a South Islander whose security lights flicked on at noon as smoke darkened the skies. I’m a scientist and a rationalist; I don’t believe in fairy tales or fables. But you could take this as a sign.

Blog version of this Twitter thread

December 30, 2019

Clifford JohnsonMore Pie

Well, you know the saying: When life hands you several apples, some blackberries, and a chunk of pastry left over from your last pie-making… you make another apple-blackberry pie, taking the opportunity to make it even better! (No? Never heard that saying? Huh.) -cvj

The post More Pie appeared first on Asymptotia.

December 29, 2019

Scott AaronsonQuantum computing motte-and-baileys

In the wake of two culture-war posts—the first on the term “quantum supremacy,” the second on the acronym “NIPS”—it’s clear that we all need to cool off with something anodyne and uncontroversial. Fortunately, this holiday season, I know just the thing to bring everyone together: groaning about quantum computing hype!

When I was at the Q2B conference in San Jose, I learned about lots of cool stuff that’s happening in the wake of Google’s quantum supremacy announcement. I heard about the 57-qubit superconducting chip that the Google group is now building, following up on its 53-qubit one; and also about their first small-scale experimental demonstration of my certified randomness protocol. I learned about recent progress on costing out the numbers of qubits and gates needed to do fault-tolerant quantum simulations of useful chemical reactions (IIRC, maybe a hundred thousand qubits and a few hours’ worth of gates—scary, but not Shor’s algorithm scary).

I also learned about two claims about quantum algorithms that startups have made, and which are being wrongly interpreted. The basic pattern is one that I’ve come to know well over the years, and which you could call a science version of the motte-and-bailey. (For those not up on nerd blogosphere terminology: in medieval times, the motte was a dank castle to which you’d retreat while under attack; the bailey was the desirable land that you’d farm once the attackers left.)

To wit:

  1. Startup makes claims that have both a true boring interpretation (e.g., you can do X with a quantum computer), as well as a false exciting interpretation (e.g., you can do X with a quantum computer, and it would actually make sense to do this, because you’ll get an asymptotic speedup over the best known classical algorithm).
  2. Lots of business and government people get all excited, because they assume the false exciting interpretation must be true (or why else would everyone be talking about this?). Some of those people ask me for comment.
  3. I look into it, perhaps by asking the folks at the startup. The startup folks clarify that they meant only the true boring interpretation. To be sure, they’re actively exploring the false exciting interpretation—whether some parts of it might be true after all—but they’re certainly not making any claims about it that would merit, say, a harsh post on Shtetl-Optimized.
  4. I’m satisfied to have gotten to the bottom of things, and I tell the startup folks to go their merry way.
  5. Yet many people continue to seem as excited as if the false exciting interpretation had been shown to be true. They continue asking me questions that presuppose its truth.

Our first instance of this pattern is the recent claim, by Zapata Computing, to have set a world record for integer factoring (1,099,551,473,989 = 1,048,589 × 1,048,601) with a quantum computer, by running a QAOA/variational algorithm on IBM’s superconducting device. Gosh! That sure sounds a lot better than the 21 that’s been factored with Shor’s algorithm, doesn’t it?

I read the Zapata paper that this is based on, entitled “Variational Quantum Factoring,” and I don’t believe that a single word in it is false. My issue is something the paper omits: namely, that once you’ve reduced factoring to a generic optimization problem, you’ve thrown away all the mathematical structure that Shor’s algorithm cleverly exploits, and that makes factoring asymptotically easy for a quantum computer. And hence there’s no reason to expect your quantum algorithm to scale any better than brute-force trial division (or in the most optimistic scenario, trial division enhanced with Grover search). On large numbers, your algorithm will be roundly outperformed even by classical algorithms that do exploit structure, like the Number Field Sieve. Indeed, the quantum computer’s success at factoring the number will have had little or nothing to do with its being quantum at all—a classical optimization algorithm would’ve served as well. And thus, the only reasons to factor a number on a quantum device in this way, would seem to be stuff like calibrating the device.
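
To make the scaling point concrete, here is a minimal Python sketch of plain classical trial division applied to the same 13-digit number. It is added for illustration only and is not taken from the Zapata paper; the function name `trial_division` is made up for this example.

```python
def trial_division(n):
    """Return a factor pair (p, q) with p the smallest factor > 1 of n.

    Returns (n, 1) when n is prime. Brute force: tries 2, then odd p up to sqrt(n).
    """
    if n % 2 == 0:
        return 2, n // 2
    p = 3
    while p * p <= n:
        if n % p == 0:
            return p, n // p
        p += 2
    return n, 1


n = 1_099_551_473_989  # the number from the record claim quoted above
p, q = trial_division(n)
print(p, q, p * q == n)
```

Since the square root of this number is a little over a million, the loop runs at most about half a million times, which is why a record at this size says little about how any method scales.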

Admittedly, to people who work in quantum algorithms, everything above is so obvious that it doesn’t need to be said. But I learned at Q2B that there are interested people for whom this is not obvious, and even comes as a revelation. So that’s why I’m saying it.

Again and again over the past twenty years, I’ve seen people reinvent the notion of a “simpler alternative” to Shor’s algorithm: one that cuts out all the difficulty of building a fault-tolerant quantum computer. In every case, the trouble, typically left unstated, has been that these alternatives also cut out the exponential speedup that’s Shor’s algorithm’s raison d’être.

Our second example today of a quantum computing motte-and-bailey is the claim, by Toronto-based quantum computing startup Xanadu, that Gaussian BosonSampling can be used to solve all sorts of graph problems, like graph isomorphism, graph similarity, and densest subgraph. As the co-inventor of BosonSampling, few things would warm my heart more than finding an actual application for that model (besides quantum supremacy experiments and, perhaps, certified random number generation). But I still regard this as an open problem—if by “application,” we mean outperforming what you could’ve done classically.

In papers (see for example here, here, here), members of the Xanadu team have given all sorts of ways to take a graph, and encode it into an instance of Gaussian BosonSampling, in such a way that the output distribution will then reveal features of the graph, like its isomorphism type or its dense subgraphs. The trouble is that so far, I’ve seen no indications that this will actually lead to quantum algorithms that outperform the best classical algorithms, for any graph problems of practical interest.

In the case of Densest Subgraph, the Xanadu folks use the output of a Gaussian BosonSampler to seed (that is, provide an initial guess for) a classical local search algorithm. They say they observe better results this way than if they seed that classical local search algorithm with completely random initial conditions. But of course, the real question is: could we get equally good results by seeding with the output of some classical heuristic? Or by solving Densest Subgraph with a different approach entirely? Given how hard it’s turned out to be just to verify that the outputs of a BosonSampling device come from such a device at all, it would seem astonishing if the answer to these questions wasn’t “yes.”

In the case of Graph Isomorphism, the situation is even clearer. There, the central claim made by the Xanadu folks is that given a graph G, they can use a Gaussian BosonSampling device to sample a probability distribution that encodes G’s isomorphism type. So, isn’t this “promising” for solving GI with a quantum computer? All you’d need to do now is invent some fast classical algorithm that could look at the samples coming from two graphs G and H, and tell you whether the probability distributions were the same.

Except, not really. While the Xanadu paper never says so, if all you want is to sample a distribution that encodes a graph’s isomorphism type, that’s easy to do classically! (I even put this on the final exam for my undergraduate Quantum Information Science course a couple weeks ago.) Here’s how: given as input a graph G, just output G but with its vertices randomly permuted. Indeed, this will even provide a further property, better than anything the BosonSampling approach has been shown to provide (or than it probably does provide): namely, if G and H are not isomorphic, then the two probability distributions will not only be different but will have disjoint supports. Alas, this still leaves us with the problem of distinguishing which distribution a given sample came from, which is as hard as Graph Isomorphism itself. None of these approaches, classical or quantum, seem to lead to any algorithm that’s subexponential time, let alone competitive with the “Babai approach” of thinking really hard about graphs.
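
The classical sampler just described can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions: graphs are given as undirected edge lists on vertices 0..n-1, the helper names (`relabel`, `sample`, `support`) are invented here, and `support` enumerates all n! relabelings, so it is only usable for tiny graphs.

```python
import random
from itertools import permutations


def relabel(edges, perm):
    """Apply the vertex relabeling perm to an undirected edge list."""
    return frozenset(frozenset((perm[u], perm[v])) for u, v in edges)


def sample(edges, n, rng=random):
    """Output the graph with its n vertices randomly permuted."""
    perm = list(range(n))
    rng.shuffle(perm)
    return relabel(edges, perm)


def support(edges, n):
    """Every graph the sampler can output (brute force, tiny n only)."""
    return {relabel(edges, p) for p in permutations(range(n))}


path = [(0, 1), (1, 2), (2, 3)]            # path 0-1-2-3
path_relabeled = [(2, 0), (0, 3), (3, 1)]  # the same path, drawn as 2-0-3-1
star = [(0, 1), (0, 2), (0, 3)]            # star centred at 0, not isomorphic to the path

same = support(path, 4) == support(path_relabeled, 4)
disjoint = support(path, 4).isdisjoint(support(star, 4))
print(same, disjoint)
```

Isomorphic inputs give identical supports and non-isomorphic inputs give disjoint ones, but deciding which support a single sample belongs to is still the Graph Isomorphism problem itself.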

All of this stuff falls victim to what I regard as the Fundamental Error of Quantum Algorithms Research: namely, to treat it as “promising” that a quantum algorithm works at all, or works better than some brute-force classical algorithm, without asking yourself whether there are any indications that your approach will ever be able to exploit interference of amplitudes to outperform the best classical algorithm.

Incidentally, I’m not sure exactly why, but in practice, a major red flag that the Fundamental Error is about to be committed is when someone starts talking about “hybrid quantum/classical algorithms.” By this they seem to mean: “outside the domain of traditional quantum algorithms, so don’t judge us by the standards of that domain.” But I liked the way someone at Q2B put it to me: every quantum algorithm is a “hybrid quantum/classical algorithm,” with classical processors used wherever they can be, and qubits used only where they must be.

The other thing people do, when challenged, is to say “well, admittedly we have no rigorous proof of an asymptotic quantum speedup”—thereby brilliantly reframing the whole conversation, to make people like me look like churlish theoreticians insisting on an impossible and perhaps irrelevant standard of rigor, blind to some huge practical quantum speedup that’s about to change the world. The real issue, of course, is not that they haven’t given a proof of a quantum speedup (in either the real world or the black-box world); rather, it’s that they’ve typically given no reasons whatsoever to think that there might be a quantum speedup, compared to the best classical algorithms available.

In the holiday spirit, let me end on a positive note. When I did the Q&A at Q2B—the same one where Sarah Kaiser asked me to comment on the term “quantum supremacy”—one of my answers touched on the most important theoretical open problems about sampling-based quantum supremacy experiments. At the top of the list, I said, was whether there’s some interactive protocol by which a near-term quantum computer can not only exhibit quantum supremacy, but prove it to a polynomial-time-bounded classical skeptic. I mentioned that there was one proposal for how to do this, in the IQP model, due to Bremner and Shepherd, from way back in 2008. I said that their proposal deserved much more attention than it had received, and that trying to break it would be one obvious thing to work on. Little did I know that, literally while I was speaking, a paper was being posted to the arXiv, by Gregory Kahanamoku-Meyer, that claims to break Bremner and Shepherd’s protocol. I haven’t yet studied the paper, but assuming it’s correct, it represents the first clear progress on this problem in years (even though of a negative kind). Cool!!

December 28, 2019

Clifford JohnsonChalky

I realized recently that I’ve forgotten a great deal of my drawing skills, settling back into some clunky habits, due to zero practice. But I’m going to need them back for a project, and so will start teaching myself again. Above is a (digital) chalk doodle I did yesterday. –cvj

The post Chalky appeared first on Asymptotia.

December 27, 2019

Matt von HippelScience, the Gift That Keeps on Giving

Merry Newtonmas, everyone!

You’ll find many scientists working over the holidays this year. Partly that’s because of the competitiveness of academia, with many scientists competing for a few positions, where even those who are “safe” have students who aren’t. But to put a more positive spin on it, it’s also because science is a gift that keeps on giving.

Scientists are driven by curiosity. We want to know more about the world, to find out everything we can. And the great thing about science is that, every time we answer a question, we have another one to ask.

Discover a new particle? You need to measure its properties, understand how it fits into your models and look for alternative explanations. Do a calculation, and in addition to checking it, you can see if the same method works on other cases, or if you can use the result to derive something else.

Down the line, the science that survives leads to further gifts. Good science spreads, with new fields emerging to investigate new phenomena. Eventually, science leads to technology, and our lives are enriched by the gifts of new knowledge.

Science is the gift that keeps on giving. It takes new forms, builds new ideas, it fills our lives and nourishes our minds. It’s a neverending puzzle.

So this Newtonmas, I hope you receive the greatest gift of all: the gift of science.

December 26, 2019

Terence TaoElgindi’s approximation of the Biot-Savart law

Let {u: {\bf R}^3 \rightarrow {\bf R}^3} be a divergence-free vector field, thus {\nabla \cdot u = 0}, which we interpret as a velocity field. In this post we will proceed formally, largely ignoring the analytic issues of whether the fields in question have sufficient regularity and decay to justify the calculations. The vorticity field {\omega: {\bf R}^3 \rightarrow {\bf R}^3} is then defined as the curl of the velocity:

\displaystyle  \omega = \nabla \times u.

(From a differential geometry viewpoint, it would be more accurate (especially in other dimensions than three) to define the vorticity as the exterior derivative {\omega = d(g \cdot u)} of the musical isomorphism {g \cdot u} of the Euclidean metric {g} applied to the velocity field {u}; see these previous lecture notes. However, we will not need this geometric formalism in this post.)

Assuming suitable regularity and decay hypotheses of the velocity field {u}, it is possible to recover the velocity from the vorticity as follows. From the general vector identity {\nabla \times \nabla \times X = \nabla(\nabla \cdot X) - \Delta X} applied to the velocity field {u}, we see that

\displaystyle  \nabla \times \omega = -\Delta u

and thus (by the commutativity of all the differential operators involved)

\displaystyle  u = - \nabla \times \Delta^{-1} \omega.

Using the Newton potential formula

\displaystyle  -\Delta^{-1} \omega(x) := \frac{1}{4\pi} \int_{{\bf R}^3} \frac{\omega(y)}{|x-y|}\ dy

and formally differentiating under the integral sign, we obtain the Biot-Savart law

\displaystyle  u(x) = \frac{1}{4\pi} \int_{{\bf R}^3} \frac{\omega(y) \times (x-y)}{|x-y|^3}\ dy. \ \ \ \ \ (1)
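
To spell out the differentiation step (a sketch at the same formal level as the rest of this post): since {\omega(y)} does not depend on {x}, the identity {\nabla \times (f A) = \nabla f \times A} for a constant vector {A}, together with {\nabla_x \frac{1}{|x-y|} = -\frac{x-y}{|x-y|^3}}, gives

\displaystyle  \nabla_x \times \frac{\omega(y)}{|x-y|} = \nabla_x \frac{1}{|x-y|} \times \omega(y) = -\frac{(x-y) \times \omega(y)}{|x-y|^3} = \frac{\omega(y) \times (x-y)}{|x-y|^3},

and integrating this against {\frac{1}{4\pi}\ dy} in the formula {u = \nabla \times (-\Delta^{-1} \omega)} recovers (1).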

This law is of fundamental importance in the study of incompressible fluid equations, such as the Euler equations

\displaystyle  \partial_t u + (u \cdot \nabla) u = -\nabla p; \quad \nabla \cdot u = 0

since on applying the curl operator one obtains the vorticity equation

\displaystyle  \partial_t \omega + (u \cdot \nabla) \omega = (\omega \cdot \nabla) u \ \ \ \ \ (2)

and then by substituting (1) one gets an autonomous equation for the vorticity field {\omega}. Unfortunately, this equation is non-local, due to the integration present in (1).

In a recent work, it was observed by Elgindi that in a certain regime, the Biot-Savart law can be approximated by a more “low rank” law, which makes the non-local effects significantly simpler in nature. This simplification was carried out in spherical coordinates, and hinged on a study of the invertibility properties of a certain second order linear differential operator in the latitude variable {\theta}; however in this post I would like to observe that the approximation can also be seen directly in Cartesian coordinates from the classical Biot-Savart law (1). As a consequence one can also begin Elgindi’s analysis in constructing somewhat regular solutions to the Euler equations that exhibit self-similar blowup in finite time, though I have not attempted to execute the entirety of the analysis in this setting.

Elgindi’s approximation applies under the following hypotheses:

A model example of a divergence-free vector field obeying these properties (but without good decay at infinity) is the linear vector field

\displaystyle  X(x) = (x_1, x_2, -2x_3) \ \ \ \ \ (5)

which is of the form (3) with {u_r(r,x_3) = r} and {u_3(r,x_3) = -2x_3}. The associated vorticity {\omega} vanishes.

We can now give an illustration of Elgindi’s approximation:

Proposition 1 (Elgindi’s approximation) Under the above hypotheses (and assuming suitable regularity and decay), we have the pointwise bounds

\displaystyle  u(x) = \frac{1}{2} {\mathcal L}_{12}(\omega)(|x|) X(x) + O( |x| \|\omega\|_{L^\infty({\bf R}^3)} )

for any {x \in {\bf R}^3}, where {X} is the vector field (5), and {{\mathcal L}_{12}(\omega): {\bf R}^+ \rightarrow {\bf R}} is the scalar function

\displaystyle  {\mathcal L}_{12}(\omega)(\rho) := \frac{3}{4\pi} \int_{|y| \geq \rho} \frac{r y_3}{|y|^5} \omega_{r3}(r,y_3)\ dy.

Thus under the hypotheses (i), (ii), and assuming that {\omega} is slowly varying, we expect {u} to behave like the linear vector field {X} modulated by a radial scalar function. In applications one needs to control the error in various function spaces instead of pointwise, and with {\omega} similarly controlled in other function space norms than the {L^\infty} norm, but this proposition already gives a flavour of the approximation. If one uses spherical coordinates

\displaystyle  \omega_{r3}( \rho \cos \theta, \rho \sin \theta ) = \Omega( \rho, \theta )

then we have (using the spherical change of variables formula {dy = \rho^2 \cos \theta d\rho d\theta d\phi} and the odd nature of {\Omega})

\displaystyle  {\mathcal L}_{12}(\omega) = L_{12}(\Omega),

where

\displaystyle L_{12}(\Omega)(\rho) = 3 \int_\rho^\infty \int_0^{\pi/2} \frac{\Omega(r, \theta) \sin(\theta) \cos^2(\theta)}{r}\ d\theta dr

is the operator introduced in Elgindi’s paper.

Proof: By a limiting argument we may assume that {x} is non-zero, and we may normalise {\|\omega\|_{L^\infty({\bf R}^3)}=1}. From the triangle inequality we have

\displaystyle  \left| \int_{|y| \leq 10|x|} \frac{\omega(y) \times (x-y)}{|x-y|^3}\ dy \right| \leq \int_{|y| \leq 10|x|} \frac{1}{|x-y|^2}\ dy

\displaystyle  \leq \int_{|z| \leq 11 |x|} \frac{1}{|z|^2}\ dz

\displaystyle  = O( |x| )

and hence by (1)

\displaystyle  u(x) = \frac{1}{4\pi} \int_{|y| > 10|x|} \frac{\omega(y) \times (x-y)}{|x-y|^3}\ dy + O(|x|).

In the regime {|y| > 2|x|} we may perform the Taylor expansion

\displaystyle  \frac{x-y}{|x-y|^3} = \frac{x-y}{|y|^3} (1 - \frac{2 x \cdot y}{|y|^2} + \frac{|x|^2}{|y|^2})^{-3/2}

\displaystyle  = \frac{x-y}{|y|^3} (1 + \frac{3 x \cdot y}{|y|^2} + O( \frac{|x|^2}{|y|^2} ) )

\displaystyle  = -\frac{y}{|y|^3} + \frac{x}{|y|^3} - \frac{3 (x \cdot y) y}{|y|^5} + O( \frac{|x|^2}{|y|^4} ).

Since

\displaystyle  \int_{|y| > 10|x|} \frac{|x|^2}{|y|^4}\ dy = O(|x|)

we see from the triangle inequality that the error term contributes {O(|x|)} to {u(x)}. We thus have

\displaystyle  u(x) = \frac{1}{4\pi} \left( -A_0(x) + A_1(x) - 3A'_1(x) \right) + O(|x|)

where {A_0} is the constant term

\displaystyle  A_0 := \int_{|y| > 10|x|} \frac{\omega(y) \times y}{|y|^3}\ dy,

and {A_1, A'_1} are the linear terms

\displaystyle  A_1 := \int_{|y| > 10|x|} \frac{\omega(y) \times x}{|y|^3}\ dy,

\displaystyle  A'_1 := \int_{|y| > 10|x|} (x \cdot y) \frac{\omega(y) \times y}{|y|^5}\ dy.

By the hypotheses (i), (ii), we have the symmetries

\displaystyle  \omega(y_1,y_2,-y_3) = - \omega(y_1,y_2,y_3) \ \ \ \ \ (6)

and

\displaystyle  \omega(-y_1,-y_2,y_3) = - \omega(y_1,y_2,y_3) \ \ \ \ \ (7)

and hence also

\displaystyle  \omega(-y_1,-y_2,-y_3) = \omega(y_1,y_2,y_3). \ \ \ \ \ (8)

The even symmetry (8) ensures that the integrand in {A_0} is odd, so {A_0} vanishes. The symmetry (6) or (7) similarly ensures that {\int_{|y| > 10|x|} \frac{\omega(y)}{|y|^3}\ dy = 0}, so {A_1} vanishes. Since {\int_{|x| < |y| \leq 10|x|} \frac{|x \cdot y| |y|}{|y|^5}\ dy = O( |x| )}, and recalling the factor {\frac{1}{4\pi}} in (1), we conclude that

\displaystyle  u(x) = -\frac{3}{4\pi}\int_{|y| \geq |x|} (x \cdot y) \frac{\omega(y) \times y}{|y|^5}\ dy + O(|x|).

Using (4), the right-hand side is

\displaystyle  -\frac{3}{4\pi}\int_{|y| \geq |x|} (x_1 y_1 + x_2 y_2 + x_3 y_3) \frac{\omega_{r3}(r,y_3) (-y_1 y_3, -y_2 y_3, y_1^2+y_2^2)}{r|y|^5}\ dy

\displaystyle + O(|x|)

where {r := \sqrt{y_1^2+y_2^2}}. Because of the odd nature of {\omega_{r3}}, only those terms with one factor of {y_3} give a non-vanishing contribution to the integral. Using the rotation symmetry {(y_1,y_2,y_3) \mapsto (-y_2,y_1,y_3)} we also see that any term with a factor of {y_1 y_2} also vanishes. We can thus simplify the above expression as

\displaystyle  -\frac{3}{4\pi}\int_{|y| \geq |x|} \frac{\omega_{r3}(r,y_3) (-x_1 y_1^2 y_3, -x_2 y_2^2 y_3, x_3 (y_1^2+y_2^2) y_3)}{r|y|^5}\ dy + O(|x|).

Using the rotation symmetry {(y_1,y_2,y_3) \mapsto (-y_2,y_1,y_3)} again, we see that the term {y_1^2} in the first component can be replaced by {y_2^2} or by {\frac{1}{2} (y_1^2+y_2^2) = \frac{r^2}{2}}, and similarly for the {y_2^2} term in the second component. Thus the above expression is

\displaystyle  \frac{3}{8\pi} \int_{|y| \geq |x|} \frac{\omega_{r3}(r,y_3) (x_1 , x_2, -2x_3) r y_3}{|y|^5}\ dy + O(|x|)

giving the claim. \Box
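The Taylor expansion used in the proof above can be sanity-checked numerically. The following is only a minimal sketch in plain Python (the function names are illustrative and do not come from Elgindi’s paper): it compares the exact kernel {\frac{x-y}{|x-y|^3}} with the three explicit terms of the expansion and divides the discrepancy by {\frac{|x|^2}{|y|^4}}; the quotient should stay bounded as {x} shrinks with {y} fixed.

```python
import math

def norm(v):
    """Euclidean norm of a 3-vector given as a tuple or list."""
    return math.sqrt(sum(c * c for c in v))

def exact_kernel(x, y):
    """Exact kernel (x - y) / |x - y|^3 from the Biot-Savart law."""
    d = [a - b for a, b in zip(x, y)]
    n = norm(d)
    return [c / n ** 3 for c in d]

def three_term_expansion(x, y):
    """-y/|y|^3 + x/|y|^3 - 3 (x.y) y/|y|^5, i.e. the expansion without its error term."""
    ny = norm(y)
    xy = sum(a * b for a, b in zip(x, y))
    return [-b / ny ** 3 + a / ny ** 3 - 3 * xy * b / ny ** 5 for a, b in zip(x, y)]

def scaled_error(x, y):
    """|exact - expansion| divided by |x|^2 / |y|^4; expected to remain bounded."""
    diff = [e - a for e, a in zip(exact_kernel(x, y), three_term_expansion(x, y))]
    return norm(diff) * norm(y) ** 4 / norm(x) ** 2

if __name__ == "__main__":
    y = (4.0, -7.0, 6.0)
    for scale in (1.0, 0.1, 0.01):
        x = (0.3 * scale, -0.2 * scale, 0.5 * scale)
        print(scale, scaled_error(x, y))
```

The sample points are arbitrary choices satisfying {|y| > 10|x|}; the check illustrates the size of the error term and is of course not a substitute for the estimate itself.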

Example 2 Consider the divergence-free vector field {u := \nabla \times \psi}, where the vector potential {\psi} takes the form

\displaystyle  \psi(x_1,x_2,x_3) := (x_2 x_3, -x_1 x_3, 0) \eta(|x|)

for some bump function {\eta: {\bf R} \rightarrow {\bf R}} supported in {(0,+\infty)}. We can then calculate

\displaystyle  u(x_1,x_2,x_3) = X(x) \eta(|x|) + (x_1 x_3, x_2 x_3, -x_1^2-x_2^2) \frac{\eta'(|x|) x_3}{|x|}

and

\displaystyle  \omega(x_1,x_2,x_3) = (-6x_2 x_3, 6x_1 x_3, 0) \frac{\eta'(|x|)}{|x|} + (-x_2 x_3, x_1 x_3, 0) \eta''(|x|).

In particular the hypotheses (i), (ii) are satisfied with

\displaystyle  \omega_{r3}(r,x_3) = - 6 \eta'(|x|) \frac{x_3 r}{|x|} - \eta''(|x|) x_3 r.

One can then calculate

\displaystyle  {\mathcal L}_{12}(\omega)(\rho) = -\frac{3}{4\pi} \int_{|y| \geq \rho} (6\frac{\eta'(|y|)}{|y|^6} + \frac{\eta''(|y|)}{|y|^5}) r^2 y_3^2\ dy

\displaystyle  = -\frac{2}{5} \int_\rho^\infty 6\eta'(s) + s\eta''(s)\ ds

\displaystyle  = 2\eta(\rho) + \frac{2}{5} \rho \eta'(\rho).

If we take the specific choice

\displaystyle  \eta(\rho) = \varphi( \rho^\alpha )

where {\varphi} is a fixed bump function supported in some interval {[c,C] \subset (0,+\infty)} and {\alpha>0} is a small parameter (so that {\eta} is spread out over the range {\rho \in [c^{1/\alpha},C^{1/\alpha}]}), then we see that

\displaystyle  \| \omega \|_{L^\infty} = O( \alpha )

(with implied constants allowed to depend on {\varphi}),

\displaystyle  {\mathcal L}_{12}(\omega)(\rho) = 2\eta(\rho) + O(\alpha),

and

\displaystyle  u = X(x) \eta(|x|) + O( \alpha |x| ),

which is completely consistent with Proposition 1.
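The one-dimensional identity in this example, namely that {-\frac{2}{5} \int_\rho^\infty 6\eta'(s) + s\eta''(s)\ ds} equals {2\eta(\rho) + \frac{2}{5} \rho \eta'(\rho)}, is easy to test numerically. The sketch below is an illustration under an explicit assumption: it uses the Gaussian {\eta(s) = e^{-s^2}} in place of a genuine bump function (the identity only needs decay at infinity), and a hand-written Simpson rule truncated at {\rho + 12}.

```python
import math

# Assumed test profile: a Gaussian, not a compactly supported bump function.
def eta(s):
    return math.exp(-s * s)

def d_eta(s):
    return -2.0 * s * math.exp(-s * s)

def dd_eta(s):
    return (4.0 * s * s - 2.0) * math.exp(-s * s)

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0

def radial_integral(rho):
    """-(2/5) * integral over [rho, rho + 12] of 6 eta'(s) + s eta''(s)."""
    return -0.4 * simpson(lambda s: 6.0 * d_eta(s) + s * dd_eta(s), rho, rho + 12.0)

def closed_form(rho):
    """2 eta(rho) + (2/5) rho eta'(rho)."""
    return 2.0 * eta(rho) + 0.4 * rho * d_eta(rho)

if __name__ == "__main__":
    for rho in (0.0, 0.5, 1.0, 2.0):
        print(rho, radial_integral(rho), closed_form(rho))
```

The truncation is harmless here because the Gaussian tail beyond {\rho + 12} is far below floating-point precision.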

One can use this approximation to extract a plausible ansatz for a self-similar blowup to the Euler equations. We let {\alpha>0} be a small parameter and let {\omega_{rx_3}} be a time-dependent vorticity field obeying (i), (ii) of the form

\displaystyle  \omega_{rx_3}(t,r,x_3) \approx \alpha \Omega( t, R ) \mathrm{sgn}(x_3)

where {R := |x|^\alpha = (r^2+x_3^2)^{\alpha/2}} and {\Omega: {\bf R} \times [0,+\infty) \rightarrow {\bf R}} is a smooth field to be chosen later. Admittedly the signum function {\mathrm{sgn}} is not smooth at {x_3=0}, but let us ignore this issue for now (to rigorously make an ansatz one will have to smooth out this function a little bit; Elgindi uses the choice {(|\sin \theta| \cos^2 \theta)^{\alpha/3} \mathrm{sgn}(x_3)}, where {\theta := \mathrm{arctan}(x_3/r)}). With this ansatz one may compute

\displaystyle  {\mathcal L}_{12}(\omega(t))(\rho) \approx \frac{3\alpha}{2\pi} \int_{|y| \geq \rho; y_3 \geq 0} \Omega(t,R) \frac{r y_3}{|y|^5}\ dy

\displaystyle  = \alpha \int_\rho^\infty \Omega(t, s^\alpha) \frac{ds}{s}

\displaystyle  = \int_{\rho^\alpha}^\infty \Omega(t,s) \frac{ds}{s}.

By Proposition 1, we thus expect to have the approximation

\displaystyle  u(t,x) \approx \frac{1}{2} \int_{|x|^\alpha}^\infty \Omega(t,s) \frac{ds}{s} X(x).

We insert this into the vorticity equation (2). The transport term {(u \cdot \nabla) \omega} will be expected to be negligible because {R}, and hence {\omega_{rx_3}}, is slowly varying (the discontinuity of {\mathrm{sgn}(x_3)} will not be encountered because the vector field {X} is parallel to this singularity). The modulating function {\frac{1}{2} \int_{|x|^\alpha}^\infty \Omega(t,s) \frac{ds}{s}} is similarly slowly varying, so derivatives falling on this function should be lower order. Neglecting such terms, we arrive at the approximation

\displaystyle  (\omega \cdot \nabla) u \approx \frac{1}{2} \int_{|x|^\alpha}^\infty \Omega(t,s) \frac{ds}{s} \omega

and so in the limit {\alpha \rightarrow 0} we expect to obtain a simple model equation for the evolution of the vorticity envelope {\Omega}:

\displaystyle  \partial_t \Omega(t,R) = \frac{1}{2} \int_R^\infty \Omega(t,S) \frac{dS}{S} \Omega(t,R).

If we write {L(t,R) := \int_R^\infty \Omega(t,S)\frac{dS}{S}} for the logarithmic primitive of {\Omega}, then we have {\Omega = - R \partial_R L} and hence

\displaystyle  \partial_t (R \partial_R L) = \frac{1}{2} L (R \partial_R L)

which integrates to the Riccati equation

\displaystyle  \partial_t L = \frac{1}{4} L^2

which can be explicitly solved as

\displaystyle  L(t,R) = \frac{2}{f(R) - t/2}

where {f(R)} is any function of {R} that one pleases. (In Elgindi’s work a time dilation is used to remove the unsightly factor of {1/2} appearing here in the denominator.) If for instance we set {f(R) = 1+R}, we obtain the self-similar solution

\displaystyle  L(t,R) = \frac{2}{1+R-t/2}

and then on applying {-R \partial_R}

\displaystyle  \Omega(t,R) = \frac{2R}{(1+R-t/2)^2}.
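As a quick consistency check of the explicit solution, one can integrate the Riccati equation {\partial_t L = \frac{1}{4} L^2} numerically and compare with the closed form. The sketch below is a hedged illustration only: it uses a hand-written fourth-order Runge-Kutta step, the choice {f(R) = 1+R} from the text, and hypothetical function names.

```python
def exact_L(t, R):
    """Closed-form solution L(t, R) = 2 / (1 + R - t/2) for the choice f(R) = 1 + R."""
    return 2.0 / (1.0 + R - t / 2.0)

def rk4_L(t_end, R, steps=2000):
    """Integrate dL/dt = L^2 / 4 from t = 0 to t_end, starting from L(0, R) = 2 / (1 + R)."""
    L = exact_L(0.0, R)
    h = t_end / steps
    rhs = lambda value: 0.25 * value * value
    for _ in range(steps):
        k1 = rhs(L)
        k2 = rhs(L + 0.5 * h * k1)
        k3 = rhs(L + 0.5 * h * k2)
        k4 = rhs(L + h * k3)
        L += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return L

def omega_envelope(t, R):
    """Vorticity envelope Omega(t, R) = 2R / (1 + R - t/2)^2."""
    return 2.0 * R / (1.0 + R - t / 2.0) ** 2

def minus_R_dR_L(t, R, h=1e-6):
    """Central-difference approximation of -R * dL/dR, which should reproduce Omega."""
    return -R * (exact_L(t, R + h) - exact_L(t, R - h)) / (2.0 * h)

if __name__ == "__main__":
    print(rk4_L(1.5, 0.5), exact_L(1.5, 0.5))
    print(minus_R_dR_L(1.0, 0.5), omega_envelope(1.0, 0.5))
```

The integration is only meaningful before the blowup time of the chosen {R}, that is, while {1 + R - t/2 > 0}.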

Thus, we expect to be able to construct a self-similar blowup to the Euler equations with a vorticity field approximately behaving like

\displaystyle  \omega(t,x) \approx \alpha \frac{2R}{(1+R-t/2)^2} \mathrm{sgn}(x_3) (\frac{x_2}{r}, -\frac{x_1}{r}, 0)

and velocity field behaving like

\displaystyle  u(t,x) \approx \frac{1}{1+R-t/2} X(x).

In particular, {u} would be expected to be of regularity {C^{1,\alpha}} (and smooth away from the origin), with the vorticity {\omega} blowing up in (say) {L^\infty} norm at time {t = 2}, and one has the self-similarity

\displaystyle  u(t,x) = (1-t/2)^{\frac{1}{\alpha}-1} u( 0, \frac{x}{(1-t/2)^{1/\alpha}} )

and

\displaystyle  \omega(t,x) = (1-t/2)^{-1} \omega( 0, \frac{x}{(1-t/2)^{1/\alpha}} ).

A self-similar solution of this approximate shape is in fact constructed rigorously in Elgindi’s paper (using spherical coordinates instead of the Cartesian approach adopted here), using a nonlinear stability analysis of the above ansatz. It seems plausible that one could also carry out this stability analysis using this Cartesian coordinate approach, although I have not tried to do this in detail.
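The self-similar scaling of the approximate velocity field {u(t,x) \approx \frac{1}{1+R-t/2} X(x)} can be confirmed with a few lines of arithmetic. The sketch below treats the approximation as if it were exact (an assumption made purely for illustration) and compares both sides of the scaling identity at a sample point; it also evaluates the envelope {\frac{2R}{(1+R-t/2)^2}} at {R = 1-t/2}, where it equals {\frac{1}{2(1-t/2)}}.

```python
def X_field(x):
    """Linear vector field X(x) = (x1, x2, -2 x3)."""
    return (x[0], x[1], -2.0 * x[2])

def radial_R(x, alpha):
    """R = |x|^alpha."""
    return (x[0] ** 2 + x[1] ** 2 + x[2] ** 2) ** (alpha / 2.0)

def u_ansatz(t, x, alpha):
    """Model velocity X(x) / (1 + |x|^alpha - t/2), treated here as exact."""
    denom = 1.0 + radial_R(x, alpha) - t / 2.0
    return tuple(c / denom for c in X_field(x))

def u_rescaled(t, x, alpha):
    """Right-hand side (1 - t/2)^(1/alpha - 1) * u(0, x / (1 - t/2)^(1/alpha))."""
    lam = 1.0 - t / 2.0
    y = tuple(c / lam ** (1.0 / alpha) for c in x)
    return tuple(lam ** (1.0 / alpha - 1.0) * c for c in u_ansatz(0.0, y, alpha))

def envelope(t, R):
    """Vorticity envelope 2R / (1 + R - t/2)^2."""
    return 2.0 * R / (1.0 + R - t / 2.0) ** 2

if __name__ == "__main__":
    point = (0.3, -0.4, 0.2)
    print(u_ansatz(1.0, point, 0.25))
    print(u_rescaled(1.0, point, 0.25))
    print(envelope(1.9, 0.05))
```

The values of {\alpha}, {t} and the sample point are arbitrary; any {t < 2} should give agreement up to rounding.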

December 23, 2019

John PreskillAn equation fit for a novel

Archana Kamal was hunting for an apartment in Cambridge, Massachusetts. She was moving to MIT, to work as a postdoc in physics. The first apartment she toured had housed John Updike, during his undergraduate career at Harvard. No other apartment could compete; Archana signed the lease.

The apartment occupied the basement of a red-brick building covered in vines. The rooms spanned no more than 350 square feet. Yet her window opened onto the neighbors’ garden, whose leaves she tracked across the seasons. And Archana cohabited with history.

Apartment photos

She’s now studying the universe’s history, as an assistant professor of physics at the University of Massachusetts Lowell. The cosmic microwave background (CMB) pervades the universe. The CMB consists of electromagnetic radiation, or light. Light has particle-like properties and wavelike properties. The wavelike properties include wavelength, the distance between successive peaks. Long-wavelength light includes red light, infrared light, and radio waves. Short-wavelength light includes blue light, ultraviolet light, and X-rays. Light of one wavelength and light of another wavelength are said to belong to different modes.


Does the CMB have nonclassical properties, impossible to predict with classical physics but (perhaps) predictable with quantum theory? It does, according to the theory of inflation. According to the theory, during a short time interval after the Big Bang, the universe expanded very quickly: Spacetime stretched. Inflation explains features of our universe, though we don’t know what mechanism would have effected the expansion.

According to inflation, around the Big Bang time, all the light in the universe crowded together. The photons (particles of light) interacted, entangling (developing strong quantum correlations). Spacetime then expanded, and the photons separated. But they might retain entanglement.

Detecting that putative entanglement poses challenges. For instance, the particles that you’d need to measure could produce a signal too weak to observe. Cosmologists have been scratching their heads about how to observe nonclassicality in the CMB. One team—Nishant Agarwal at UMass Lowell and Sarah Shandera at Pennsylvania State University—turned to Archana for help.

A sky full of stars

Archana studies the theory of open quantum systems, quantum systems that interact with their environments. She thinks most about systems such as superconducting qubits, tiny circuits with which labs are building quantum computers. But the visible universe constitutes an open quantum system.

We can see only part of the universe—or, rather, only part of what we believe is the whole universe. Why? We can see only stuff that’s emitted light that has reached us, and light has had only so long to travel. But the visible universe interacts (we believe) with stuff we haven’t seen. For instance, according to the theory of inflation, that rapid expansion stretched some light modes’ wavelengths. Those wavelengths grew longer than the visible universe. We can’t see those modes’ peak-to-peak variations or otherwise observe the modes, often called “frozen.” But the frozen modes act as an environment that exchanges information and energy with the visible universe.

We describe an open quantum system’s evolution with a quantum master equation, which I blogged about four-and-a-half years ago. Archana and collaborators constructed a quantum master equation for the visible universe. The frozen modes, they found, retain memories of the visible universe. (Experts: the bath is non-Markovian.) Next, they need to solve the equation. Then, they’ll try to use their solution to identify quantum observables that could reveal nonclassicality in the CMB.

Frozen modes

Archana’s project caught my fancy for two reasons. First, when I visited her in October, I was collaborating on a related project. My coauthors and I were concocting a scheme for detecting nonclassical correlations in many-particle systems by measuring large-scale properties. Our paper debuted last month. It might—with thought and a dash of craziness—be applied to detect nonclassicality in the CMB. Archana’s explanation improved my understanding of our scheme’s potential. 

Second, Archana and collaborators formulated a quantum master equation for the visible universe. A quantum master equation for the visible universe. The phrase sounded romantic to me.¹ It merited a coauthor who’d seized on an apartment lived in by a Pulitzer Prize-winning novelist.

Archana’s cosmology and Updike stories reminded me of one reason why I appreciate living in the Boston area: History envelops us here. Last month, while walking to a grocery, I found a sign that marks the building in which the poet e. e. cummings was born. My walking partner then generously tolerated a recitation of cummings’s “anyone lived in a pretty how town.” History enriches our lives—and some of it might contain entanglement.


¹ It might sound like gobbledygook to you, if I’ve botched my explanations of the terminology.

With thanks to Archana and the UMass Lowell Department of Physics and Applied Physics for their hospitality and seminar invitation.

December 20, 2019

Jordan EllenbergIf you build it they will come and exploit it

There’s no way to build a program for people in need that can’t be taken advantage of by unscrupulous people who aren’t in need. I have a friend, an attorney, who used to work cases involving people who defrauded the foster care system, taking state money for the care of children who didn’t exist, or who weren’t really in their care. It’s maddening. She eventually quit that job, partially because it was so dispiriting to come in daily contact with people being awful. But what can you do? You can’t build a fence strong enough to keep out all fraud without making the administrative burden impossibly high for the many honest people doing the hard, humane work of raising kids who need parents. There’s some optimal level of vigilance that leads to some optimal level of fraud and that optimal level of fraud isn’t zero.

I thought of my friend when I read this story, about developer Dan Gilbert getting an “opportunity zone” tax break officially intended for spurring development in impoverished areas:

Gilbert’s relationship with the White House helped him win his desired tax break, an email obtained by ProPublica suggests. In February 2018, as the selection process was underway, a top Michigan economic development official asked her colleague to call Quicken’s executive vice president for government affairs about opportunity zones.

“They worked with the White House on it and want to be sure we are coordinated,” wrote the official, Christine Roeder, in an email with the subject line “Quicken.”

The exact role of the White House is not clear. But less than two weeks after the email was written, the Trump administration revised its list of census tracts that were eligible for the tax break. New to the list? One of the downtown Detroit tracts dominated by Gilbert that had not previously been included. And the area made the cut even though it did not meet the poverty requirements of the program. The Gilbert opportunity zone is one of a handful around the country that were included despite not meeting the eligibility criteria, according to an analysis by ProPublica.

Maybe there’s no way to design a program like this without billionaires with phalanxes of lawyers and friends in high places being able to sop up some of the money. Even before the “opportunity zones,” Jared Kushner was able to game a similar program by drawing a gerrymandered “low-income district” that snaked its way through Jersey City to include the site of his luxury skyscraper and also some poor neighborhoods miles away. But I have to believe the optimal enforcement level is higher and the optimal malfeasance level lower than what we have now.

Jordan EllenbergTwo wards in Madison

Ward 49 and Ward 65 are two big voting precincts in Madison. Ward 65, where I vote, has 2,819 registered voters. Ward 49, in the campus area with tons of undergrad-heavy high rises, has 3,505.

In the 2018 governor’s race, Ward 65 went for Tony Evers 2190-179. Ward 49 also went big for Evers, though not as dominantly: he won there 1985-591.

Now look at the April 2019 Supreme Court election. Ward 65 went strongly for the more liberal candidate, Lisa Neubauer, voting for her by a 1631-103 margin. Ward 49 also liked Neubauer but the margin was 531-101. 25% more voters but about a third as many votes. Evers narrowly won his election. Neubauer narrowly lost hers. Young voters sitting out downballot elections is pretty important.
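The arithmetic behind “25% more voters but about a third as many votes” can be laid out explicitly. This is a small sketch under stated assumptions: the registration figure for Ward 65 is read as 2,819, only the votes for the two candidates quoted above are counted, and the same registration numbers are used for both elections.

```python
# Figures quoted in the post; (winner_votes, loser_votes) per race.
wards = {
    "Ward 65": {"registered": 2819, "gov_2018": (2190, 179), "court_2019": (1631, 103)},
    "Ward 49": {"registered": 3505, "gov_2018": (1985, 591), "court_2019": (531, 101)},
}

def total_votes(ward, race):
    """Two-candidate vote total for a ward in a given race."""
    return sum(wards[ward][race])

def turnout(ward, race):
    """Two-candidate votes divided by registered voters (a rough turnout proxy)."""
    return total_votes(ward, race) / wards[ward]["registered"]

def registered_ratio():
    """How many registered voters Ward 49 has relative to Ward 65."""
    return wards["Ward 49"]["registered"] / wards["Ward 65"]["registered"]

def court_vote_ratio():
    """Ward 49's April 2019 vote total relative to Ward 65's."""
    return total_votes("Ward 49", "court_2019") / total_votes("Ward 65", "court_2019")

if __name__ == "__main__":
    print("registered ratio:", round(registered_ratio(), 3))
    print("court vote ratio:", round(court_vote_ratio(), 3))
    for ward in wards:
        for race in ("gov_2018", "court_2019"):
            print(ward, race, round(turnout(ward, race), 3))
```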

Terence Tao254A, Notes 10 – mean values of nonpretentious multiplicative functions

Let us call an arithmetic function {f: {\bf N} \rightarrow {\bf C}} {1}-bounded if we have {|f(n)| \leq 1} for all {n \in {\bf N}}. In this section we focus on the asymptotic behaviour of {1}-bounded multiplicative functions. Some key examples of such functions include:

  • The Möbius function {\mu};
  • The Liouville function {\lambda};
  • “Archimedean” characters {n \mapsto n^{it}} (which I call Archimedean because they are pullbacks of a Fourier character {x \mapsto x^{it}} on the multiplicative group {{\bf R}^+}, which has the Archimedean property);
  • Dirichlet characters (or “non-Archimedean” characters) {\chi} (which are essentially pullbacks of Fourier characters on a multiplicative cyclic group {({\bf Z}/q{\bf Z})^\times} with the discrete (non-Archimedean) metric);
  • Hybrid characters {n \mapsto \chi(n) n^{it}}.

The space of {1}-bounded multiplicative functions is also closed under multiplication and complex conjugation.

Given a multiplicative function {f}, we are often interested in the asymptotics of long averages such as

\displaystyle  \frac{1}{X} \sum_{n \leq X} f(n)

for large values of {X}, as well as short sums

\displaystyle  \frac{1}{H} \sum_{x \leq n \leq x+H} f(n)

where {H} and {x} are both large, but {H} is significantly smaller than {x}. (Throughout these notes we will try to normalise most of the sums and integrals appearing here as averages that are trivially bounded by {O(1)}; note that other normalisations are preferred in some of the literature cited here.) For instance, as we established in Theorem 58 of Notes 1, the prime number theorem is equivalent to the assertion that

\displaystyle  \frac{1}{X} \sum_{n \leq X} \mu(n) = o(1) \ \ \ \ \ (1)

as {X \rightarrow \infty}. The Liouville function behaves almost identically to the Möbius function, in that estimates for one function almost always imply analogous estimates for the other:

Exercise 1 Without using the prime number theorem, show that (1) is also equivalent to

\displaystyle  \frac{1}{X} \sum_{n \leq X} \lambda(n) = o(1) \ \ \ \ \ (2)

as {X \rightarrow \infty}. (Hint: use the identities {\lambda(n) = \sum_{d^2|n} \mu(n/d^2)} and {\mu(n) = \sum_{d^2|n} \mu(d) \lambda(n/d^2)}.)
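The two identities in the hint can be verified by brute force for small {n} before using them. The following is a minimal sketch (the helper names are illustrative) that tabulates {\lambda} and {\mu} with a smallest-prime-factor sieve and checks both convolution identities up to a given limit.

```python
def smallest_prime_factors(limit):
    """spf[n] = smallest prime factor of n for 2 <= n <= limit."""
    spf = list(range(limit + 1))
    for p in range(2, int(limit ** 0.5) + 1):
        if spf[p] == p:  # p is prime
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    return spf

def liouville_and_mobius(limit):
    """Tables lam[n], mu[n] for 1 <= n <= limit (index 0 is unused)."""
    spf = smallest_prime_factors(limit)
    lam = [0] * (limit + 1)
    mu = [0] * (limit + 1)
    lam[1] = mu[1] = 1
    for n in range(2, limit + 1):
        p = spf[n]
        m = n // p
        lam[n] = -lam[m]                       # completely multiplicative
        mu[n] = 0 if m % p == 0 else -mu[m]    # vanishes when p^2 divides n
    return lam, mu

def check_identities(limit):
    """True if both hinted identities hold for every 1 <= n <= limit."""
    lam, mu = liouville_and_mobius(limit)
    for n in range(1, limit + 1):
        sum_for_lambda = 0
        sum_for_mu = 0
        d = 1
        while d * d <= n:
            if n % (d * d) == 0:
                q = n // (d * d)
                sum_for_lambda += mu[q]
                sum_for_mu += mu[d] * lam[q]
            d += 1
        if sum_for_lambda != lam[n] or sum_for_mu != mu[n]:
            return False
    return True

if __name__ == "__main__":
    print(check_identities(2000))
```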

Henceforth we shall focus our discussion more on the Liouville function, and turn our attention to averages on shorter intervals. From (2) one has

\displaystyle  \frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n) = o(1) \ \ \ \ \ (3)

as {x \rightarrow \infty} if {H = H(x)} is such that {H \geq \varepsilon x} for some fixed {\varepsilon>0}. However it is significantly more difficult to understand what happens when {H} grows much slower than this. By using the techniques based on zero density estimates discussed in Notes 6, it was shown by Motohashi that one can also establish (3) for certain {H} growing more slowly than this. Assuming the Riemann Hypothesis, Maier and Montgomery lowered the threshold to {H \geq x^{1/2} \log^C x} for an absolute constant {C} (the bound {H \geq x^{1/2+\varepsilon}} is more classical, following from Exercise 33 of Notes 2). On the other hand, the randomness heuristics from Supplement 4 suggest that {H} should be able to be taken as small as {x^\varepsilon}, and perhaps even {\log^{1+\varepsilon} x} if one is particularly optimistic about the accuracy of these probabilistic models. In the opposite direction, the Chowla conjecture (mentioned for instance in Supplement 4) predicts that {H} cannot be taken arbitrarily slowly growing in {x}, due to the conjectured existence of arbitrarily long strings of consecutive numbers where the Liouville function does not change sign (and in fact one can already show from the known partial results towards the Chowla conjecture that (3) fails for some sequence {x \rightarrow \infty} and some sufficiently slowly growing {H = H(x)}, by modifying the arguments in these papers of mine).

The situation is better when one asks to understand the mean value on almost all short intervals, rather than all intervals. There are several equivalent ways to formulate this question:

Exercise 2 Let {H = H(X)} be a function of {X} such that {H \rightarrow \infty} and {H = o(X)} as {X \rightarrow \infty}. Let {f: {\bf N} \rightarrow {\bf C}} be a {1}-bounded function. Show that the following assertions are equivalent:

  • (i) One has

    \displaystyle  \frac{1}{H} \sum_{x \leq n \leq x+H} f(n) = o(1)

    as {X \rightarrow \infty}, uniformly for all {x \in [X,2X]} outside of a set of measure {o(X)}.

  • (ii) One has

    \displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} f(n)|\ dx = o(1)

    as {X \rightarrow \infty}.

  • (iii) One has

    \displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} f(n)|^2\ dx = o(1) \ \ \ \ \ (4)

    as {X \rightarrow \infty}.

As it turns out the second moment formulation in (iii) will be the most convenient for us to work with in this set of notes, as it is well suited to Fourier-analytic techniques (and in particular the Plancherel theorem).

Using zero density methods, for instance, it was shown by Ramachandra that

\displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n)|^2\ dx \ll_{A,\varepsilon} \log^{-A} X

whenever {X^{1/6+\varepsilon} \leq H \leq X} and {\varepsilon>0}. With this quality of bound (saving arbitrary powers of {\log X} over the trivial bound of {O(1)}), this is still the lowest value of {H} one can reach unconditionally. However, in a striking recent breakthrough, it was shown by Matomaki and Radziwill that as long as one is willing to settle for weaker bounds (saving a small power of {\log X} or {\log H}, or just a qualitative decay of {o(1)}), one can obtain non-trivial estimates on far shorter intervals. For instance, they show

Theorem 3 (Matomaki-Radziwill theorem for Liouville) For any {2 \leq H \leq X}, one has

\displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n)|^2\ dx \ll \log^{-c} H

for some absolute constant {c>0}.

In fact they prove a slightly more precise result: see Theorem 1 of that paper. In particular, they obtain the asymptotic (4) for any function {H = H(X)} that goes to infinity as {X \rightarrow \infty}, no matter how slowly! This ability to let {H} grow slowly with {X} is important for several applications; for instance, in order to combine this type of result with the entropy decrement methods from Notes 9, it is essential that {H} be allowed to grow more slowly than {\log X}. See also this survey of Soundararajan for further discussion.
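To get a feel for the second moment appearing in Theorem 3, one can compute a discrete analogue for modest {X} and {H}. The sketch below is illustrative only: it replaces the integral over {x \in [X,2X]} by an average over integers {x}, counts the {H+1} integers in {[x,x+H]}, and uses hypothetical function names; small numerical values are of course no evidence for the asymptotic statement.

```python
def liouville_table(limit):
    """lam[n] = (-1)^(number of prime factors of n counted with multiplicity)."""
    spf = list(range(limit + 1))  # smallest prime factor of each n
    for p in range(2, int(limit ** 0.5) + 1):
        if spf[p] == p:
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0] * (limit + 1)
    lam[1] = 1
    for n in range(2, limit + 1):
        lam[n] = -lam[n // spf[n]]
    return lam

def second_moment(values, X, H):
    """Mean over integers x in [X, 2X] of |(1/H) * sum_{x <= n <= x+H} values[n]|^2.

    values must be indexable up to 2X + H.
    """
    prefix = [0] * (2 * X + H + 2)
    for n in range(1, 2 * X + H + 1):
        prefix[n] = prefix[n - 1] + values[n]
    total = 0.0
    for x in range(X, 2 * X + 1):
        window_sum = prefix[x + H] - prefix[x - 1]
        total += (window_sum / H) ** 2
    return total / (X + 1)

if __name__ == "__main__":
    X = 20000
    lam = liouville_table(2 * X + 1000)
    for H in (10, 100, 1000):
        print(H, second_moment(lam, X, H))
```

For comparison, a sequence of independent random signs would give a value of roughly {1/H}, which is the heuristic benchmark for the printed numbers.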

Exercise 4 In this exercise you may use Theorem 3 freely.

  • (i) Establish the lower bound

    \displaystyle  \frac{1}{X} \sum_{n \leq X} \lambda(n)\lambda(n+1) > -1+c

    for some absolute constant {c>0} and all sufficiently large {X}. (Hint: if this bound failed, then {\lambda(n)=\lambda(n+1)} would hold for almost all {n}; use this to create many intervals {[x,x+H]} for which {\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n)} is extremely large.)

  • (ii) Show that Theorem 3 also holds with {\lambda(n)} replaced by {\chi_2 \lambda(n)}, where {\chi_2} is the principal character of period {2}. (Use the fact that {\lambda(2n)=-\lambda(n)} for all {n}.) Use this to establish the corresponding upper bound

    \displaystyle  \frac{1}{X} \sum_{n \leq X} \lambda(n)\lambda(n+1) < 1-c

    to (i).

(There is a curious asymmetry to the difficulty level of these bounds; the upper bound in (ii) was established much earlier by Harman, Pintz, and Wolke, but the lower bound in (i) was only established in the Matomaki-Radziwill paper.)

The techniques discussed previously were highly complex-analytic in nature, relying in particular on the fact that functions such as {\mu} or {\lambda} have Dirichlet series {{\mathcal D} \mu(s) = \frac{1}{\zeta(s)}}, {{\mathcal D} \lambda(s) = \frac{\zeta(2s)}{\zeta(s)}} that extend meromorphically into the critical strip. In contrast, the Matomaki-Radziwill theorem does not rely on such meromorphic continuations, and in fact holds for more general classes of {1}-bounded multiplicative functions {f}, for which one typically does not expect any meromorphic continuation into the strip. Instead, one can view the Matomaki-Radziwill theory as following the philosophy of a slightly different approach to multiplicative number theory, namely the pretentious multiplicative number theory of Granville and Soundararajan (as presented for instance in their draft monograph). A basic notion here is the pretentious distance between two {1}-bounded multiplicative functions {f,g} (at a given scale {X}), which informally measures the extent to which {f} “pretends” to be like {g} (or vice versa). The precise definition is

Definition 5 (Pretentious distance) Given two {1}-bounded multiplicative functions {f,g}, and a threshold {X>0}, the pretentious distance {\mathbb{D}(f,g;X)} between {f} and {g} up to scale {X} is given by the formula

\displaystyle  \mathbb{D}(f,g;X) := \left( \sum_{p \leq X} \frac{1 - \mathrm{Re}(f(p) \overline{g(p)})}{p} \right)^{1/2}

Note that one can also define an infinite version {\mathbb{D}(f,g;\infty)} of this distance by removing the constraint {p \leq X}, though in such cases the pretentious distance may then be infinite. The pretentious distance is not quite a metric (because {\mathbb{D}(f,f;X)} can be non-zero, and furthermore {\mathbb{D}(f,g;X)} can vanish without {f,g} being equal), but it is still quite close to behaving like a metric, in particular it obeys the triangle inequality; see Exercise 16 below. The philosophy of pretentious multiplicative number theory is that two {1}-bounded multiplicative functions {f,g} will exhibit similar behaviour at scale {X} if their pretentious distance {\mathbb{D}(f,g;X)} is bounded, but will become uncorrelated from each other if this distance becomes large. A simple example of this philosophy is given by the following “weak Halasz theorem”, proven in Section 2:

Proposition 6 (Logarithmically averaged version of Halasz) Let {X} be sufficiently large. Then for any {1}-bounded multiplicative functions {f,g}, one has

\displaystyle  \frac{1}{\log X} \sum_{n \leq X} \frac{f(n) \overline{g(n)}}{n} \ll \exp( - c \mathbb{D}(f, g;X)^2 )

for an absolute constant {c>0}.

In particular, if {f} does not pretend to be {1}, then the logarithmic average {\frac{1}{\log X} \sum_{n \leq X} \frac{f(n)}{n}} will be small. This condition is basically necessary, since of course {\frac{1}{\log X} \sum_{n \leq X} \frac{1}{n} = 1 + o(1)}.
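The pretentious distance of Definition 5 only involves the values of {f} and {g} at primes, so it is straightforward to compute for small {X}. The sketch below is a minimal illustration with hypothetical helper names: functions are passed as Python callables evaluated at primes, and the examples are the Liouville function ({\lambda(p) = -1}), the constant function {1}, and an Archimedean character {n^{it}}.

```python
import cmath
import math

def primes_up_to(X):
    """List of primes p <= X (for X >= 2) via the sieve of Eratosthenes."""
    sieve = [True] * (X + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(X ** 0.5) + 1):
        if sieve[p]:
            for m in range(p * p, X + 1, p):
                sieve[m] = False
    return [p for p in range(2, X + 1) if sieve[p]]

def pretentious_distance(f, g, X):
    """D(f, g; X) = ( sum_{p <= X} (1 - Re(f(p) conj(g(p)))) / p )^(1/2)."""
    total = sum(
        (1.0 - (f(p) * complex(g(p)).conjugate()).real) / p
        for p in primes_up_to(X)
    )
    return math.sqrt(max(total, 0.0))  # guard against tiny negative rounding

def liouville_at_prime(p):
    return -1  # lambda(p) = -1 at every prime

def one(p):
    return 1

def archimedean(t):
    """The character n -> n^{it}, returned as a callable."""
    return lambda p: cmath.exp(1j * t * math.log(p))

if __name__ == "__main__":
    for X in (10 ** 2, 10 ** 4, 10 ** 6):
        print(X, pretentious_distance(liouville_at_prime, one, X) ** 2)
```

Since {\mathbb{D}(\lambda,1;X)^2 = \sum_{p \leq X} \frac{2}{p}}, the printed values grow like {2 \log\log X}, which is the sense in which {\lambda} does not pretend to be {1}.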

If one works with non-logarithmic averages {\frac{1}{X} \sum_{n \leq X} f(n)}, then not pretending to be {1} is insufficient to establish decay, as was already observed in Exercise 11 of Notes 1: if {f} is an Archimedean character {f(n) = n^{it}} for some non-zero real {t}, then {\frac{1}{\log X} \sum_{n \leq X} \frac{f(n)}{n}} goes to zero as {X \rightarrow \infty} (which is consistent with Proposition 6), but {\frac{1}{X} \sum_{n \leq X} f(n)} does not go to zero. However, this is in some sense the “only” obstruction to these averages decaying to zero, as quantified by the following basic result:

Theorem 7 (Halasz’s theorem) Let {X} be sufficiently large. Then for any {1}-bounded multiplicative function {f}, one has

\displaystyle  \frac{1}{X} \sum_{n \leq X} f(n) \ll \exp( - c \min_{|t| \leq T} \mathbb{D}(f, n \mapsto n^{it};X)^2 ) + \frac{1}{T}

for an absolute constant {c>0} and any {T > 0}.
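The contrast between logarithmic and non-logarithmic averages for an Archimedean character {f(n) = n^{it}} is visible numerically. The sketch below (with illustrative function names) computes both averages directly; for {t=1} the plain average should have modulus close to {\frac{1}{|1+it|} = \frac{1}{\sqrt{2}}}, reflecting the approximation {\frac{1}{X} \sum_{n \leq X} n^{it} \approx \frac{X^{it}}{1+it}}, while the logarithmic average is much smaller.

```python
import cmath
import math

def plain_average(t, X):
    """(1/X) * sum_{n <= X} n^{it}."""
    return sum(cmath.exp(1j * t * math.log(n)) for n in range(1, X + 1)) / X

def log_average(t, X):
    """(1/log X) * sum_{n <= X} n^{it} / n."""
    total = sum(cmath.exp(1j * t * math.log(n)) / n for n in range(1, X + 1))
    return total / math.log(X)

if __name__ == "__main__":
    t = 1.0
    for X in (10 ** 3, 10 ** 4, 10 ** 5):
        print(X, abs(plain_average(t, X)), abs(log_average(t, X)))
```

The decay of the logarithmic average is slow (of order {\frac{1}{\log X}} for fixed non-zero {t}), so the second column shrinks only gradually as {X} grows.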

Informally, we refer to a {1}-bounded multiplicative function as “pretentious” if it pretends to be a character such as {n^{it}}, and “non-pretentious” otherwise. The precise distinction is rather malleable, as the precise class of characters that one views as “obstructions” varies from situation to situation. For instance, in Proposition 6 it is just the trivial character {1} which needs to be considered, but in Theorem 7 it is the characters {n \mapsto n^{it}} with {|t| \leq T}. In other contexts one may also need to add Dirichlet characters {\chi(n)} or hybrid characters such as {\chi(n) n^{it}} to the list of characters that one might pretend to be. The division into pretentious and non-pretentious functions in multiplicative number theory is faintly analogous to the division into major and minor arcs in the circle method applied to additive number theory problems; see Notes 8. The Möbius and Liouville functions are model examples of non-pretentious functions; see Exercise 24.

In the contrapositive, Halasz’s theorem can be formulated as the assertion that if one has a large mean

\displaystyle  |\frac{1}{X} \sum_{n \leq X} f(n)| \geq \eta

for some {\eta > 0}, then one has the pretentious property

\displaystyle  \mathbb{D}( f, n \mapsto n^{it}; X ) \ll \sqrt{\log(1/\eta)}

for some {t \ll \eta^{-1}}. This has the flavour of an “inverse theorem”, of the type often found in arithmetic combinatorics.

Among other things, Halasz’s theorem gives yet another proof of the prime number theorem (1); see Section 2.

We now give a version of the Matomaki-Radziwill theorem for general (non-pretentious) multiplicative functions that is formulated in a similar contrapositive (or “inverse theorem”) fashion, though to simplify the presentation we only state a qualitative version that does not give explicit bounds.

Theorem 8 ((Qualitative) Matomaki-Radziwill theorem) Let {\eta>0}, and let {1 \leq H \leq X}, with {H} sufficiently large depending on {\eta}. Suppose that {f} is a {1}-bounded multiplicative function such that

\displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} f(n)|^2\ dx \geq \eta^2.

Then one has

\displaystyle  \mathbb{D}(f, n \mapsto n^{it};X) \ll_\eta 1

for some {t \ll_\eta \frac{X}{H}}.

The condition {t \ll_\eta \frac{X}{H}} is basically optimal, as the following example shows:

Exercise 9 Let {\varepsilon>0} be a sufficiently small constant, and let {1 \leq H \leq X} be such that {\frac{1}{\varepsilon} \leq H \leq \varepsilon X}. Let {f} be the Archimedean character {f(n) = n^{it}} for some {|t| \leq \varepsilon \frac{X}{H}}. Show that

\displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} f(n)|^2\ dx \asymp 1.

Combining Theorem 8 with standard non-pretentiousness facts about the Liouville function (see Exercise 24), we recover Theorem 3 (but with a decay rate of only {o(1)} rather than {\log^{-c} H}). We refer the reader to the original paper of Matomaki-Radziwill (as well as this followup paper with myself) for the quantitative version of Theorem 8 that is strong enough to recover the full version of Theorem 3, and which can also handle real-valued pretentious functions.

With our current state of knowledge, the only arguments that can establish the full strength of the Halasz and Matomaki-Radziwill theorems are Fourier analytic in nature, relating sums involving an arithmetic function {f} with its Dirichlet series

\displaystyle  {\mathcal D} f(s) := \sum_{n=1}^\infty \frac{f(n)}{n^s}

which one can view as a discrete Fourier transform of {f} (or more precisely of the measure {\sum_{n=1}^\infty \frac{f(n)}{n} \delta_{\log n}}, if one evaluates the Dirichlet series on the right edge {\{ 1+it: t \in {\bf R} \}} of the critical strip). In this aspect, the techniques resemble the complex-analytic methods from Notes 2, but with the key difference that no analytic or meromorphic continuation into the strip is assumed. The key identity that allows us to pass to Dirichlet series is the following variant of Proposition 7 of Notes 2:

Proposition 10 (Parseval type identity) Let {f,g: {\bf N} \rightarrow {\bf C}} be finitely supported arithmetic functions, and let {\psi: {\bf R} \rightarrow {\bf R}} be a Schwartz function. Then

\displaystyle  \sum_{n=1}^\infty \sum_{m=1}^\infty \frac{f(n)}{n} \frac{\overline{g(m)}}{m} \psi(\log n - \log m) = \frac{1}{2\pi} \int_{\bf R} {\mathcal D} f(1+it) \overline{{\mathcal D} g(1+it)} \hat \psi(t)\ dt

where {\hat \psi(t) := \int_{\bf R} \psi(u) e^{itu}\ du} is the Fourier transform of {\psi}. (Note that the finite support of {f,g} and the Schwartz nature of {\psi,\hat \psi} ensure that both sides of the identity are absolutely convergent.)

The restriction that {f,g} be finitely supported will be slightly annoying in places, since most multiplicative functions will fail to be finitely supported, but this technicality can usually be overcome by suitably truncating the multiplicative function, and taking limits if necessary.

Proof: By expanding out the Dirichlet series, it suffices to show that

\displaystyle  \psi(\log n - \log m) = \frac{1}{2\pi} \int_{\bf R} \frac{1}{n^{it}} \frac{1}{m^{-it}} \hat \psi(t)\ dt

for any natural numbers {n,m}. But this follows from the Fourier inversion formula {\psi(u) = \frac{1}{2\pi} \int_{\bf R} e^{-itu} \hat \psi(t)\ dt} applied at {u = \log n - \log m}. \Box
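As a sanity check on the normalisations in Proposition 10, one can test the identity numerically. The sketch below makes several illustrative assumptions of our own: the Gaussian {\psi(u) = e^{-u^2/2}} (whose Fourier transform under the convention above is {\sqrt{2\pi} e^{-t^2/2}}), small hand-picked {f,g}, and a trapezoid rule truncated to {|t| \leq 12}; none of the function names come from the notes.

```python
import cmath
import math


def dirichlet_series(f: dict, t: float) -> complex:
    """Evaluate D f(1+it) = sum_n f(n) / n^{1+it} for finitely supported f."""
    return sum(c * cmath.exp(-(1 + 1j * t) * math.log(n)) for n, c in f.items())


def psi(u: float) -> float:
    """Gaussian test function (an illustrative choice of Schwartz function)."""
    return math.exp(-u * u / 2)


def psi_hat(t: float) -> float:
    """Fourier transform of psi under the convention int psi(u) e^{itu} du."""
    return math.sqrt(2 * math.pi) * math.exp(-t * t / 2)


def physical_side(f: dict, g: dict) -> complex:
    """Left-hand side: the double sum over n and m."""
    return sum(
        (fn / n) * (complex(gm).conjugate() / m) * psi(math.log(n) - math.log(m))
        for n, fn in f.items()
        for m, gm in g.items()
    )


def frequency_side(f: dict, g: dict, cutoff: float = 12.0, steps: int = 4800) -> complex:
    """Right-hand side, approximated by the trapezoid rule on [-cutoff, cutoff]."""
    h = 2 * cutoff / steps
    total = 0j
    for k in range(steps + 1):
        t = -cutoff + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        df = dirichlet_series(f, t)
        dg = dirichlet_series(g, t)
        total += weight * df * dg.conjugate() * psi_hat(t)
    return total * h / (2 * math.pi)


if __name__ == "__main__":
    f = {1: 1, 2: -1, 3: -1, 5: -1, 6: 1}  # Moebius function truncated to n <= 6
    g = {1: 1, 2: 1j, 4: -1}               # an arbitrary complex-valued example
    print(physical_side(f, g), frequency_side(f, g))
```

Both printed values should agree to many decimal places; a sign error in the Fourier convention would show up immediately as a mismatch in the imaginary part.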

For applications to Halasz type theorems, one sets {g(n)} equal to the Kronecker delta {\delta_{n=1}}, producing weighted integrals of {{\mathcal D} f(1+it)} of “{L^1}” type. For applications to Matomaki-Radziwill theorems, one instead sets {f=g}, and more precisely uses the following corollary of the above proposition, to obtain weighted integrals of {|{\mathcal D} f(1+it)|^2} of “{L^2}” type:

Exercise 11 (Plancherel type identity) If {f: {\bf N} \rightarrow {\bf C}} is finitely supported, and {\varphi: {\bf R} \rightarrow {\bf R}} is a Schwartz function, establish the identity

\displaystyle  \int_0^\infty |\sum_{n=1}^\infty \frac{f(n)}{n} \varphi(\log n - \log y)|^2 \frac{dy}{y} = \frac{1}{2\pi} \int_{\bf R} |{\mathcal D} f(1+it)|^2 |\hat \varphi(t)|^2\ dt.

In contrast, information about the non-pretentious nature of a multiplicative function {f} will give “pointwise” or “{L^\infty}” type control on the Dirichlet series {{\mathcal D} f(1+it)}, as is suggested from the Euler product factorisation of {{\mathcal D} f}.

It will be convenient to formalise the notion of {L^1}, {L^2}, and {L^\infty} control of the Dirichlet series {{\mathcal D} f}, which as previously mentioned can be viewed as a sort of “Fourier transform” of {f}:

Definition 12 (Fourier norms) Let {f: {\bf N} \rightarrow {\bf C}} be finitely supported, and let {\Omega \subset {\bf R}} be a bounded measurable set. We define the Fourier {L^\infty} norm

\displaystyle  \| f\|_{FL^\infty(\Omega)} := \sup_{t \in \Omega} |{\mathcal D} f(1+it)|,

the Fourier {L^2} norm

\displaystyle  \| f\|_{FL^2(\Omega)} := \left(\int_\Omega |{\mathcal D} f(1+it)|^2\ dt\right)^{1/2},

and the Fourier {L^1} norm

\displaystyle  \| f\|_{FL^1(\Omega)} := \int_\Omega |{\mathcal D} f(1+it)|\ dt.

One could more generally define {FL^p} norms for other exponents {p}, but we will only need the exponents {p=1,2,\infty} in this current set of notes. It is clear that all the above norms are in fact (semi-)norms on the space of finitely supported arithmetic functions.

As mentioned above, Halasz’s theorem gives good control on the Fourier {L^\infty} norm for restrictions of non-pretentious functions to intervals:

Exercise 13 (Fourier {L^\infty} control via Halasz) Let {f: {\bf N} \rightarrow {\bf C}} be a {1}-bounded multiplicative function, let {I} be an interval in {[C^{-1} X, CX]} for some {X \geq C \geq 1}, let {R \geq 1}, and let {\Omega \subset {\bf R}} be a bounded measurable set. Show that

\displaystyle  \| f 1_I \|_{FL^\infty(\Omega)} \ll_C \exp( - c \min_{t: \mathrm{dist}(t,\Omega) \leq R} \mathbb{D}(f, n \mapsto n^{it};X)^2 ) + \frac{1}{R}.

(Hint: you will need to use summation by parts (or an equivalent device) to deal with a {\frac{1}{n}} weight.)

Meanwhile, the Plancherel identity in Exercise 11 gives good control on the Fourier {L^2} norm for functions on long intervals (compare with Exercise 2 from Notes 6):

Exercise 14 ({L^2} mean value theorem) Let {T \geq 1}, and let {f: {\bf N} \rightarrow {\bf C}} be finitely supported. Show that

\displaystyle  \| f \|_{FL^2([-T,T])}^2 \ll \sum_n \frac{1}{n} (\frac{T}{n} \sum_{m: |n-m| \leq n/T} |f(m)|)^2.

Conclude in particular that if {f} is supported in {[C^{-1} N, C N]} for some {C \geq 1} and {N \gg T}, then

\displaystyle  \| f \|_{FL^2([-T,T])}^2 \ll C^{O(1)} \frac{1}{N} \sum_n |f(n)|^2.

In the simplest case of the logarithmically averaged Halasz theorem (Proposition 6), Fourier {L^\infty} estimates are already sufficient to obtain decent control on the (weighted) Fourier {L^1} type expressions that show up. However, these estimates are not enough by themselves to establish the full Halasz theorem or the Matomaki-Radziwill theorem. To get from Fourier {L^\infty} control to Fourier {L^1} or {L^2} control more efficiently, the key trick is to use Hölder’s inequality, which when combined with the basic Dirichlet series identity

\displaystyle  {\mathcal D}(f*g) = ({\mathcal D} f) ({\mathcal D} g)

gives the inequalities

\displaystyle  \| f*g \|_{FL^1(\Omega)} \leq \|f\|_{FL^2(\Omega)} \|g\|_{FL^2(\Omega)} \ \ \ \ \ (5)


\displaystyle  \| f*g \|_{FL^2(\Omega)} \leq \|f\|_{FL^2(\Omega)} \|g\|_{FL^\infty(\Omega)} \ \ \ \ \ (6)

The strategy is then to factor (or approximately factor) the original function {f} as a Dirichlet convolution (or average of convolutions) of various components, each of which enjoys reasonably good Fourier {L^2} or {L^\infty} estimates on various regions {\Omega}, and then combine them using the Hölder inequalities (5), (6) and the triangle inequality. For instance, to prove Halasz’s theorem, we will split {f} into the Dirichlet convolution of three factors, one of which will be estimated in {FL^\infty} using the non-pretentiousness hypothesis, and the other two being estimated in {FL^2} using Exercise 14. For the Matomaki-Radziwill theorem, one uses a significantly more complicated decomposition of {f} into a variety of Dirichlet convolutions of factors, and also splits up the Fourier domain {[-T,T]} into several subregions depending on whether the Dirichlet series associated to some of these components are large or small. In each region and for each component of these decompositions, all but one of the factors will be estimated in {FL^\infty}, and the other in {FL^2}; but the precise way in which this is done will vary from component to component. For instance, in some regions a key factor will be small in {FL^\infty} by construction of the region; in other places, the {FL^\infty} control will come from Exercise 13. Similarly, in some regions, satisfactory {FL^2} control is provided by Exercise 14, but in other regions one must instead use “large value” theorems (in the spirit of Proposition 9 from Notes 6), or amplify the power of the standard {L^2} mean value theorems by combining the Dirichlet series with other Dirichlet series that are known to be large in this region.

There are several ways to achieve the desired factorisation. In the case of Halasz’s theorem, we can simply work with a crude version of the Euler product factorisation, dividing the primes into three categories (“small”, “medium”, and “large” primes) and expressing {f} as a triple Dirichlet convolution accordingly. For the Matomaki-Radziwill theorem, one instead exploits the Turan-Kubilius phenomenon (Section 5 of Notes 1, or Lemma 2 of Notes 9) that for various moderately wide ranges {[P,Q]} of primes, the number of prime divisors of a large number {n} in the range {[P,Q]} is almost always close to {\log\log Q - \log\log P}. Thus, if we introduce the arithmetic functions

\displaystyle  w_{[P,Q]}(n) = \frac{1}{\log\log Q - \log\log P} \sum_{P \leq p \leq Q} 1_{n=p} \ \ \ \ \ (7)

then we have

\displaystyle  1 \approx 1 * w_{[P,Q]}

and more generally we have a twisted approximation

\displaystyle  f \approx f * fw_{[P,Q]}

for multiplicative functions {f}. (Actually, for technical reasons it will be convenient to work with a smoothed out version of these functions; see Section 3.) Informally, these formulas suggest that the “{FL^2} energy” of a multiplicative function {f} is concentrated in those regions where {f w_{[P,Q]}} is extremely large in a {FL^\infty} sense. Iterations of this formula (or variants of this formula, such as an identity due to Ramaré) will then give the desired (approximate) factorisation of {{\mathcal D} f}.
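To get a feel for the approximation {1 \approx 1 * w_{[P,Q]}}, note that {(1 * w_{[P,Q]})(n)} is just the number of primes in {[P,Q]} dividing {n}, divided by {\log\log Q - \log\log P}. The sketch below (with hypothetical helper names and the illustrative parameters {P=10}, {Q=1000}, {N=10^5}, which are far too small for the asymptotic regime) only checks that the mean of this quantity is of the right size; it makes no attempt to verify the "almost always" concentration.

```python
import math


def primes_in_range(P: int, Q: int) -> list[int]:
    """Primes p with P <= p <= Q, via a simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * (Q + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(Q**0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, Q + 1, i)))
    return [p for p in range(max(P, 2), Q + 1) if sieve[p]]


def one_conv_w(N: int, P: int, Q: int) -> list[float]:
    """Values (1 * w_{[P,Q]})(n) for 0 <= n <= N (index 0 is unused)."""
    normaliser = math.log(math.log(Q)) - math.log(math.log(P))
    counts = [0] * (N + 1)
    for p in primes_in_range(P, Q):
        for multiple in range(p, N + 1, p):
            counts[multiple] += 1  # p | n contributes w(p) to (1 * w)(n)
    return [c / normaliser for c in counts]


def mean_one_conv_w(N: int, P: int, Q: int) -> float:
    """Average of (1 * w_{[P,Q]})(n) over 1 <= n <= N."""
    values = one_conv_w(N, P, Q)
    return sum(values[1:]) / N


if __name__ == "__main__":
    print(mean_one_conv_w(10**5, 10, 1000))
```

At these small scales the mean falls somewhat short of {1}, reflecting the lower-order terms in Mertens' theorem; this is one reason the approximation symbol {\approx} should be read loosely here.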

— 1. Pretentious distance —

In this section we explore the notion of pretentious distance. The following Hilbert space lemma will be useful for establishing the triangle inequality for this distance:

Lemma 15 (Triangle inequality) Let {u,v,w} be vectors in a real Hilbert space {H} with {\|u\|, \|v\|, \|w\| \leq 1}. Then

\displaystyle  (1-\langle u,w \rangle)^{1/2} \leq (1-\langle u,v \rangle)^{1/2} + (1-\langle v,w \rangle)^{1/2}.

Proof: First suppose that {u,v,w} are unit vectors: {\|u\|=\|v\|=\|w\|=1}. Then by the cosine rule {(1-\langle u,v\rangle)^{1/2} = 2^{-1/2} \|u-v\|}, and similarly for {(1-\langle v,w \rangle)^{1/2}} and {(1-\langle u,w \rangle)^{1/2}}. The claim now follows from the usual triangle inequality {\|u-w\| \leq \|u-v\| + \|v-w\|}.

Now suppose we are in the general case when {\|u\|,\|v\|,\|w\| \leq 1}. In this case we extend {u,v,w} to unit vectors by working in the product {H \times {\bf R}^3} of {H} with the Euclidean space {{\bf R}^3} and applying the previous inequality to the extended unit vectors

\displaystyle  \tilde u := (u, (1-\|u\|^2)^{1/2},0,0)

\displaystyle  \tilde v := (v,0,(1-\|v\|^2)^{1/2},0)

\displaystyle  \tilde w := (w,0,0,(1-\|w\|^2)^{1/2}),

observing that the extensions have the same inner products as the original vectors. \Box
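For readers who like to experiment, here is a small randomised check of Lemma 15, together with the lifting trick from the proof. This is only a sketch under our own conventions: we work in {{\bf R}^5} as a stand-in for {H}, and the helper names `gap`, `lift`, and `worst_slack` are hypothetical.

```python
import math
import random


def inner(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def random_vector_in_unit_ball(dim, rng):
    """Random direction with a random length in [0, 1)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(inner(v, v))
    radius = rng.random()
    return [radius * x / norm for x in v]


def gap(u, v):
    """(1 - <u, v>)^{1/2}, clamped at 0 to absorb rounding error."""
    return math.sqrt(max(0.0, 1.0 - inner(u, v)))


def lift(x, slot):
    """Extend x to a unit vector in H x R^3, using coordinate `slot` of R^3."""
    extra = [0.0, 0.0, 0.0]
    extra[slot] = math.sqrt(max(0.0, 1.0 - inner(x, x)))
    return list(x) + extra


def worst_slack(trials=2000, dim=5, seed=0):
    """Smallest value of gap(u,v) + gap(v,w) - gap(u,w) over random triples."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        u, v, w = (random_vector_in_unit_ball(dim, rng) for _ in range(3))
        worst = min(worst, gap(u, v) + gap(v, w) - gap(u, w))
    return worst


if __name__ == "__main__":
    print("worst slack:", worst_slack())
```

Because the three lifted vectors use different coordinates of {{\bf R}^3}, the extra components never interact, which is exactly why the inner products are preserved.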

Exercise 16 (Basic properties of pretentious distance) Let {f,g,h,k} be {1}-bounded multiplicative functions, and let {X,Y \geq 2}.

Exercise 17 If {\chi, \chi'} are Dirichlet characters of periods {q,q'} respectively induced from the same primitive character, and {X \geq 2}, show that {\mathbb{D}(\chi,\chi';X) \ll \log\log\log(C+qq')} for some absolute constant {C>0} (the only purpose of which is to keep the triple logarithm {\log\log\log(C+qq')} positive). (Hint: control the contributions of the primes {p} in each dyadic block {2^k \leq p < 2^{k+1}} separately for {\log qq' \ll 2^k \ll qq'}.)

Next, we relate pretentious distance to the value of Dirichlet series just to the right of the critical strip. There is an annoying minor technicality that the prime {2} has to be treated separately, but this will not cause too much trouble.

Lemma 18 (Dirichlet series and pretentious distance) Let {f: {\bf N} \rightarrow {\bf C}} be a {1}-bounded multiplicative function, {X \geq 10}, and {t \in {\bf R}}. Then

\displaystyle  \frac{1}{\log X} |{\mathcal D} f(1+\frac{1}{\log X}+it)| \asymp |\sum_{j=0}^\infty \frac{f(2^j)}{2^{(1+\frac{1}{\log X}+it)j}}| \ \ \ \ \ (8)

\displaystyle  \times \exp( - {\mathbb D}(f,n \mapsto n^{it};X)^2 ).

In particular, we always have the upper bound

\displaystyle  \frac{1}{\log X} {\mathcal D} f(1+\frac{1}{\log X}+it) \ll \exp( - {\mathbb D}(f,n \mapsto n^{it};X)^2 ), \ \ \ \ \ (9)

and if one imposes the technical condition that either {f(2^j) = f(2)^j} for all {j} or {f(2^j) = 0} for all {j > 1}, then

\displaystyle  \frac{1}{\log X} |{\mathcal D} f(1+\frac{1}{\log X}+it)| \asymp \exp( -{\mathbb D}(f,n \mapsto n^{it};X)^2 ). \ \ \ \ \ (10)

If {f(p^j) = 0} for all {p > X} and {j \geq 1}, then we may delete the {\frac{1}{\log X}} terms in the above claims.

Proof: By replacing {f} with {n \mapsto f(n) n^{-it}} we may assume without loss of generality that {t=0}. We begin with the first claim (8). By expanding out the Euler product, the left-hand side of (8) is equal to

\displaystyle  \frac{1}{\log X} \prod_p |\sum_{j=0}^\infty \frac{f(p^j)}{p^{(1+\frac{1}{\log X})j}}|

and from Definition 5 and Mertens’ theorem we have

\displaystyle {\mathbb D}(f,1;X)^2 = \log\log X - \sum_{p \leq X} \frac{\mathrm{Re} f(p)}{p} + O(1)

and so it will suffice on canceling the {p=2} factor and taking logarithms to show that

\displaystyle  \sum_{p>2} \log |\sum_{j=0}^\infty \frac{f(p^j)}{p^{(1+\frac{1}{\log X})j}}| = \sum_{p \leq X} \frac{\mathrm{Re} f(p)}{p} + O(1).

For {p>2}, the quantity {\sum_{j=0}^\infty \frac{f(p^j)}{p^{(1+\frac{1}{\log X})j}}} differs from {1} by at most {\sum_{j=1}^\infty \frac{1}{p^j} \leq \frac{1}{2}}. Also we have

\displaystyle \sum_{j=0}^\infty \frac{f(p^j)}{p^{(1+\frac{1}{\log X})j}} = 1 + \frac{f(p)}{p^{1+\frac{1}{\log X}}} + O(\frac{1}{p^2})

and hence by Taylor expansion

\displaystyle \log |\sum_{j=0}^\infty \frac{f(p^j)}{p^{(1+\frac{1}{\log X})j}}| = \frac{\mathrm{Re} f(p)}{p^{1+\frac{1}{\log X}}} + O(\frac{1}{p^2}).

By the triangle inequality, it thus suffices to show that

\displaystyle  \sum_{p \leq X} \frac{1}{p} - \frac{1}{p^{1+\frac{1}{\log X}}}, \sum_{p>X} \frac{1}{p^{1+\frac{1}{\log X}}} \ll 1.

But the first bound follows from the mean value estimate {\frac{1}{p} - \frac{1}{p^{1+\frac{1}{\log X}}} \ll \frac{\log p}{p \log X}} and Mertens’ theorems, while the second bound follows from summing the bounds

\displaystyle  \sum_{X^{2^j} < p \leq X^{2^{j+1}}} \frac{1}{p^{1+\frac{1}{\log X}}} \ll 2^{-2^j}

that also arise from Mertens’ theorems.

The quantity {\sum_{j=0}^\infty \frac{f(2^j)}{2^{(1+\frac{1}{\log X}+it)j}}} is bounded in magnitude by {\sum_{j=0}^\infty \frac{1}{2^j} = 2}, giving (9). Under either of the two technical conditions listed, this quantity is equal to either {\frac{1}{1-f(2)/2^{1+\frac{1}{\log X}}}} or {1 + \frac{f(2)}{2^{1+\frac{1}{\log X}}}}, and in either case it is comparable in magnitude to {1}, giving (10).

If {f(p^j)=0} for {p>X} and {j \geq 1}, we may repeat the above arguments with the {\frac{1}{\log X}} terms deleted, since we no longer need to control the tail contribution {\sum_{p>X}}. \Box

Now we explore the geometry of the Archimedean characters {n \mapsto n^{it}} with respect to pretentious distance.

Proposition 19 If {X} is sufficiently large, then

\displaystyle  \mathbb{D}(1, n \mapsto n^{it}; X) \asymp \sqrt{\log(1 + \min(|t|,1) \log X)}

for {|t| \ll \exp( \log^{1.4} X )}. In particular one has

\displaystyle  \mathbb{D}(1, n \mapsto n^{it}; X) \asymp \sqrt{\log\log X}

for {1 \ll |t| \ll \exp( \log^{1.4} X )}.

The precise exponent {1.4} here is not of particular significance; any constant between {1} and {1.5} would work for our application to the Matomaki-Radziwill theorem, with the most important feature being that {\exp(\log^{1.4} X)} grows significantly faster than any fixed power {X^C} of {X}. The ability to raise the exponent beyond {1} will be provided by the Vinogradov estimates for the zeta function. (For Halasz’s theorem one only needs these bounds in the easier range {|t| \leq X}, which does not require the Vinogradov estimates.) As a particular corollary of this proposition and Exercise 16(iii), we see that

\displaystyle  \mathbb{D}(n \mapsto n^{it}, n \mapsto n^{it'}; X) \asymp \sqrt{\log\log X}

whenever {1 \ll |t-t'| \ll \exp( \log^{1.4} X )}; thus the Archimedean characters {n \mapsto n^{it}} do not pretend to be like each other at all once the {t} parameter is changed by at least a unit distance (but not changed by an enormous amount).

Proof: By Definition 5, our task is to show that

\displaystyle  \sum_{p \leq X} \frac{1 - \mathrm{Re} p^{it}}{p} \asymp \log(1 + \min(|t|,1) \log X).

We begin with the upper bound. For {|t| \gg 1}, the claim follows from Mertens’ theorems and the triangle inequality. For {|t| \ll \frac{1}{\log X}}, we bound

\displaystyle  1 - \mathrm{Re} p^{it} = 1 - \cos(t \log p) \ll (t \log p)^2 \ll |t|^2 \log X \log p

and the claim again follows from Mertens’ theorems (note that {\log(1 + \min(|t|,1) \log X) \asymp |t| \log X} in this case). For {\frac{1}{\log X} \ll |t| \ll 1}, we bound {1-\cos(t\log p)} by {O( t \log p )} for {\log p \leq \frac{1}{|t|}} and by {O(1)} for {\frac{1}{|t|} \leq \log p \leq \log X}, and the claim once again follows from Mertens’ theorems.

Now we establish the lower bound. We first work in the range {|t| \leq \frac{100}{\log X}}. In this case we have a matching lower bound

\displaystyle  1 - \mathrm{Re} p^{it} = 1 - \cos(t \log p) \gg (t \log p)^2

for {p \leq X^c} and some small absolute constant {c>0}, and hence

\displaystyle  \sum_{p \leq X^c} \frac{1 - \mathrm{Re} p^{it}}{p} \gg_c |t|^2 \sum_{X^{c/2} \leq p \leq X^c} \frac{\log^2 X}{p} \gg_c |t|^2 \log^2 X

giving the lower bound. Now suppose that {\frac{100}{\log X} \leq |t| \leq 100}. Applying Lemma 18 with {f=1} and {X} replaced by some {y>10}, we have

\displaystyle  \frac{1}{\log y} |\zeta(1+\frac{1}{\log y}+it)| \asymp \exp( -{\mathbb D}(1,n \mapsto n^{it};y)^2 ); \ \ \ \ \ (11)

for {|t| \leq 100} and {\log y \gg 1/|t|} one has

\displaystyle  |\zeta(1+\frac{1}{\log y}+it)| \asymp \frac{1}{|t|}

and thus

\displaystyle  {\mathbb D}(1,n \mapsto n^{it};y)^2 = \log\log y - \log \frac{1}{|t|} + O(1)

or equivalently by Mertens’ theorem

\displaystyle  \sum_{p \leq y} \frac{\mathrm{Re} p^{it}}{p} = \log \frac{1}{|t|} + O(1).

Applying this bound with {y} replaced by {\exp(1/|t|)} and by {X} we conclude that

\displaystyle \sum_{\exp(1/|t|) \leq p \leq X} \frac{\mathrm{Re} p^{it}}{p} \ll 1

and hence by Mertens’ theorem and the triangle inequality (for {c} small enough)

\displaystyle \sum_{\exp(1/|t|) \leq p \leq X} \frac{1 - \mathrm{Re} p^{it}}{p} \gg \log( |t|\log X),

giving the claim.

Finally, assume that {100 \leq |t| \ll \exp(\log^{1.4} X)}. From Mertens’ theorems we have

\displaystyle  \sum_{\exp(\log^{0.99} X) \leq p \leq X} \frac{1}{p} \gg \log\log X

so by the triangle inequality it will suffice to show that

\displaystyle  \sum_{\exp(\log^{0.99} X) \leq p \leq X} \frac{1}{p^{1+it}} \ll 1.

Taking logarithms in (11) for {y = X, \exp(\log^{0.99} X)} we have

\displaystyle  \sum_{p \leq X} \frac{1}{p^{1+it}} = \log \zeta(1+\frac{1}{\log X} + it) + O(1)

and also

\displaystyle  \sum_{p \leq \exp(\log^{0.99} X)} \frac{1}{p^{1+it}} = \log \zeta(1+\frac{1}{\log^{0.99} X} + it) + O(1)

hence by the fundamental theorem of calculus

\displaystyle  \sum_{\exp(\log^{0.99} X) \leq p \leq X} \frac{1}{p^{1+it}} \ll \frac{1}{\log^{0.99} X} |\frac{\zeta'}{\zeta}(\sigma+it)| + 1

for some {1+\frac{1}{\log X} \leq \sigma \leq 1+\frac{1}{\log^{0.99} X}}. However, from the Vinogradov-Korobov estimates (Exercise 43 of Notes 2) we have

\displaystyle  \frac{\zeta'}{\zeta}(\sigma+it) \ll \log^{2/3} |t|

whenever {\sigma > 1 + \frac{(\log\log |t|)^{1/3}}{\log^{2/3} |t|}}; since we are assuming {\log |t| \ll \log^{1.4} X}, the claim follows. \Box
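The quantity estimated in Proposition 19 is simple to compute directly from the prime sum appearing at the start of the proof. The following sketch (with hypothetical function names, and with {X = 10^4} chosen only because it runs quickly) tabulates {\mathbb{D}(1, n \mapsto n^{it}; X)^2} against the comparison quantity {\log(1 + \min(|t|,1) \log X)}. Since the proposition only asserts comparability up to constants, one should not expect the two columns to match closely.

```python
import math


def primes_up_to(X: int) -> list[int]:
    """All primes p <= X, via a simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * (X + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(X**0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, X + 1, i)))
    return [p for p in range(2, X + 1) if sieve[p]]


def distance_sq_to_archimedean(t: float, X: int) -> float:
    """D(1, n -> n^{it}; X)^2 = sum_{p <= X} (1 - cos(t log p)) / p."""
    return sum((1.0 - math.cos(t * math.log(p))) / p for p in primes_up_to(X))


def predicted_size(t: float, X: int) -> float:
    """The comparison quantity log(1 + min(|t|, 1) log X)."""
    return math.log(1.0 + min(abs(t), 1.0) * math.log(X))


if __name__ == "__main__":
    X = 10**4  # illustrative scale only
    for t in (0.01, 0.1, 0.5, 1.0, 5.0, 50.0):
        d2 = distance_sq_to_archimedean(t, X)
        print(f"t={t:>6}  D^2={d2:.4f}  predicted={predicted_size(t, X):.4f}")
```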

Exercise 20 Assume the Riemann hypothesis. Establish a bound of the form

\displaystyle  \frac{1}{P} \sum_{n \leq P} \Lambda(n) n^{-it} \ll \frac{1}{1+|t|} + P^{-c}

for some absolute constant {c>0} whenever {P \geq \log^C(2+|t|)} for a sufficiently large absolute constant {C}. (Hint: use Perron’s formula and shift the contour to within {\frac{1}{\log P}} of the critical line.) Use this to conclude that the upper bound {|t| \ll \exp(\log^{1.4} X)} in Proposition 19 can be relaxed (assuming RH) to {|t| \ll \exp(\exp(\log^{0.9} X))}.

Exercise 21 Let {f} be a {1}-bounded multiplicative function with {|f(p)|=1} for all {p}. For any {X \geq 2}, show that

\displaystyle  \lim\inf_{t \rightarrow \infty} \mathbb{D}(f, n \mapsto n^{it}; X) = 0.

Thus some sort of upper bound on {t} in Proposition 19 is necessary.

Exercise 22 Let {\chi} be a non-principal character of modulus {q}, and let {X} be sufficiently large depending on {q}. Show that

\displaystyle  \mathbb{D}(1, n \mapsto \chi(n) n^{it}; X) \asymp \sqrt{\log\log X}

for all {t = O( \exp( \log^{1.4} X ) )}. (One will need to adapt the Vinogradov-Korobov theory to Dirichlet {L}-functions.)

Proposition 19 measures how close the function {1} lies to the Archimedean characters {n \mapsto n^{it}}. Using the triangle inequality, one can then lower bound the distance of any other {1}-bounded multiplicative function to these characters:

Proposition 23 Let {X} be sufficiently large. Then for any {1}-bounded multiplicative function {f}, there exists a real number {t_0} with {|t_0| \leq \exp(\log^{1.4} X)} such that

\displaystyle  \mathbb{D}( f, n \mapsto n^{it}; X ) \gg \sqrt{\log(1 + \min(|t-t_0|,1) \log X)}

whenever {|t| \leq \exp(\log^{1.4} X)}. In particular we have

\displaystyle  \mathbb{D}( f, n \mapsto n^{it}; X ) \asymp \sqrt{\log\log X}

if {|t| \leq \exp(\log^{1.4} X)} and {|t-t_0| \geq 1}. If {f} is real-valued, one can take {t_0=0}.

Proof: For the first claim, choose {t_0} to minimize {\mathbb{D}( f, n \mapsto n^{it_0}; X )} among all real numbers with {|t_0| \leq \exp(\log^{1.4} X)}. Then for any other {t}, we see from the triangle inequality that

\displaystyle  \mathbb{D}( n \mapsto n^{it}, n \mapsto n^{it_0}; X) \leq \mathbb{D}( f, n \mapsto n^{it}; X ) + \mathbb{D}( f, n \mapsto n^{it_0}; X ) \leq 2 \mathbb{D}( f, n \mapsto n^{it}; X ).

But from Proposition 19 we have

\displaystyle  \mathbb{D}( n \mapsto n^{it}, n \mapsto n^{it_0}; X) = \mathbb{D}( 1, n \mapsto n^{i(t-t_0)}; X)

\displaystyle \gg \sqrt{\log(1 + \min(|t-t_0|,1) \log X)},

giving the first claim. When {f} is real valued, we can similarly use the triangle inequality to bound

\displaystyle  \mathbb{D}( n \mapsto n^{it}, n \mapsto n^{-it}; X)

\displaystyle \leq \mathbb{D}( f, n \mapsto n^{it}; X ) + \mathbb{D}( f, n \mapsto n^{-it}; X )

\displaystyle = 2 \mathbb{D}( f, n \mapsto n^{it}; X )

which gives

\displaystyle  \mathbb{D}( f, n \mapsto n^{it}; X ) \gg \sqrt{\log(1 + \min(|t|,1) \log X)}

giving the second claim. \Box

We can now quantify the non-pretentious nature of the Möbius and Liouville functions.

Exercise 24

  • (i) If {X} is sufficiently large, and {f} is a real-valued {1}-bounded multiplicative function, show that

    \displaystyle  \mathbb{D}( f, n \mapsto n^{it}; X) \gg \mathbb{D}(f, 1; X)

    whenever {t \ll \exp(\log^{1.4} X)}.

  • (ii) Show that

    \displaystyle  \mathbb{D}( \mu, n \mapsto n^{it}; X) = \mathbb{D}( \lambda, n \mapsto n^{it}; X) \gg \sqrt{\log\log X}

    whenever {t \ll \exp(\log^{1.4} X)} and {X} is sufficiently large.

  • (iii) If {\chi} is a Dirichlet character of some period {q}, show that

    \displaystyle  \mathbb{D}( \mu, n \mapsto \chi(n) n^{it}; X) = \mathbb{D}( \lambda, n \mapsto \chi(n) n^{it}; X) \gg \sqrt{\log\log X}

    whenever {t \ll \exp(\log^{1.4} X)} and {X} is sufficiently large depending on {q}.

— 2. Halasz’s inequality —

We now prove Halasz’s inequality. As a warm up, we prove Proposition 6:

Proof: (Proof of Proposition 6) By Exercise 16(iv) we may normalise {g=1}. We may assume that {f(p^j)=0} when {p>X}, since the value of {f} on these primes has no impact on the sum {\sum_{n \leq X} \frac{f(n)}{n}} or on {\mathbb{D}(f,1;X)}. In particular, from Euler products we now have the absolute convergence {\sum_{n=1}^\infty \frac{|f(n)|}{n} < \infty}. Let {0 < \varepsilon < 1} be a small quantity to be optimized later, and let {\psi: {\bf R} \rightarrow {\bf R}} be a smooth compactly supported function on {[-1,1]} that equals one on {[-1+\varepsilon,1-\varepsilon]}, with the derivative bounds {\psi^{(j)}(u) \ll_j \varepsilon^{-j}} on {[-1,1] \backslash [-1+\varepsilon,1-\varepsilon]}. On integration by parts we see that the Fourier transform {\hat \psi(t) = \int_{\bf R} \psi(u) e^{itu}\ du} obeys the bounds

\displaystyle  \hat \psi(t) \ll_j \min( 1, \frac{1}{|t|}, \frac{1}{\varepsilon^{j-1} |t|^j} )

for any {t \in {\bf R}} and {j \geq 2}. From the triangle inequality we have

\displaystyle  \frac{1}{\log X} \sum_{n \leq X} \frac{f(n)}{n} = \frac{1}{\log X} \sum_{n=1}^\infty \frac{f(n)}{n} \psi(\frac{\log n}{\log X}) + O( \varepsilon ).

Applying Proposition 10 to a finite truncation of {f}, and then using the absolute convergence of {\sum_{n=1}^\infty \frac{|f(n)|}{n}} and dominated convergence to eliminate the truncation (or by using Proposition 7 of Notes 2 and then shifting the contour), we can write the right-hand side as

\displaystyle  \frac{1}{2\pi} \int_{\bf R} {\mathcal D} f(1+it) \hat \psi( t \log X )\ dt + O(\varepsilon)

which after rescaling by {\log X} gives

\displaystyle  \frac{1}{\log X} \sum_{n \leq X} \frac{f(n)}{n} \ll_j \int_{\bf R} \frac{1}{\log X} |{\mathcal D} f(1+\frac{it}{\log X})| \min( 1, \frac{1}{|t|}, \frac{1}{\varepsilon^{j-1} |t|^j} )\ dt + \varepsilon. \ \ \ \ \ (12)

Now from Lemma 18 one has

\displaystyle  \frac{1}{\log X} {\mathcal D} f(1+\frac{it}{\log X}) \ll \exp( - \mathbb{D}( f, n \mapsto n^{it/\log X}; X)^2 ) \ \ \ \ \ (13)

\displaystyle \ll \exp( -\frac{M}{2} + O(\log(1 + \min(|t/\log X|,1) \log X)) )

\displaystyle \leq e^{-M/2} (1 + |t|)^{O(1)},

where {M := \mathbb{D}(f,1;X)^2}, and thus we can bound (12) by

\displaystyle  \ll_j e^{-M/2} \int_{\bf R} (1+|t|)^{O(1)} \min( 1, \frac{1}{|t|}, \frac{1}{\varepsilon^{j-1} |t|^j} )\ dt + \varepsilon

\displaystyle  \ll e^{-M/2} \varepsilon^{-O(1)} + \varepsilon

if {j} is chosen to be a sufficiently large absolute constant. The claim then follows by optimising in {\varepsilon}. \Box

We remark that an elementary proof of Proposition 6 with {c=1/2} was given in Proposition 1.2.6 of .

It was observed that we can also sharpen Proposition 6 when {f} is non-negative by purely elementary methods:

Proposition 25 Let {X \geq 10}, and suppose that {f: {\bf N} \rightarrow [0,1]} is multiplicative, {1}-bounded, and non-negative. Then

\displaystyle  \frac{1}{\log X} \sum_{n \leq X} \frac{f(n)}{n} \asymp \exp( -{\mathbb D}(f,1;X)^2 ).

Proof: From Definition 5 and Mertens’ theorems the estimate is equivalent to

\displaystyle  \sum_{n \leq X} \frac{f(n)}{n} \asymp \exp( \sum_{p \leq X} \frac{f(p)}{p} ).

For the upper bound, we may assume that {f(p^j)} vanishes for {p>X} since these primes make no contribution to either side. We can then bound

\displaystyle  \sum_{n \leq X} \frac{f(n)}{n} \leq \sum_{n=1}^\infty \frac{f(n)}{n}

\displaystyle  = \prod_{p \leq X} \sum_{j=0}^\infty \frac{f(p^j)}{p^j}

\displaystyle  = \prod_{p \leq X} \exp( \frac{f(p)}{p} + O(\frac{1}{p^2}) )

\displaystyle  \ll \exp( \sum_{p \leq X} \frac{f(p)}{p} ).

For the lower bound, we let {g(n)} be the {1}-bounded multiplicative function with {g(p) := 1-f(p)} and {g(p^j) = 1} for {j > 1} and all primes {p}. Then we observe the pointwise bound {f*g(n) \geq 1} for all {n}, hence

\displaystyle  \sum_{n \leq X} \frac{f(n)}{n} \sum_{n \leq X} \frac{g(n)}{n} \geq \sum_{n \leq X} \frac{1}{n} \gg \log X.

By the upper bound just obtained and Mertens’ theorems, we have

\displaystyle  \sum_{n \leq X} \frac{g(n)}{n} \ll \exp( \sum_{p \leq X} \frac{g(p)}{p} ) \ll \log X \exp( -\sum_{p \leq X} \frac{f(p)}{p} )

and the claim follows. \Box
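Proposition 25 can be illustrated with a concrete non-negative example. In the sketch below we take, as our own illustrative choice, the completely multiplicative function {f(n) = 2^{-\Omega(n)}} (so that {f(p)=1/2}), and compute the ratio of the two sides of the proposition at a modest value of {X}. The helper names are hypothetical, and the proposition only predicts that this ratio stays bounded above and below, not that it equals {1}.

```python
import math


def smallest_prime_factors(X: int) -> list[int]:
    """spf[n] = smallest prime factor of n for 2 <= n <= X."""
    spf = list(range(X + 1))
    for i in range(2, int(X**0.5) + 1):
        if spf[i] == i:  # i is prime
            for j in range(i * i, X + 1, i):
                if spf[j] == j:
                    spf[j] = i
    return spf


def proposition_25_ratio(X: int) -> float:
    """Ratio of (1/log X) sum_{n<=X} f(n)/n to exp(-D(f,1;X)^2) for f = 2^{-Omega}."""
    spf = smallest_prime_factors(X)
    big_omega = [0] * (X + 1)  # number of prime factors counted with multiplicity
    log_sum = 1.0              # the n = 1 term, since f(1) = 1
    dist_sq = 0.0              # accumulates sum_{p <= X} (1 - f(p)) / p
    for n in range(2, X + 1):
        big_omega[n] = big_omega[n // spf[n]] + 1
        log_sum += 0.5 ** big_omega[n] / n
        if spf[n] == n:        # n is prime, and f(n) = 1/2
            dist_sq += (1.0 - 0.5) / n
    return (log_sum / math.log(X)) / math.exp(-dist_sq)


if __name__ == "__main__":
    for X in (10**3, 10**4, 10**5):
        print(X, round(proposition_25_ratio(X), 4))
```

Here both sides decay like a constant multiple of {\log^{-1/2} X}, so the ratio should stabilise rather than drift as {X} grows.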

Now we can prove Theorem 7.

Proof: (Proof of Theorem 7) We may assume that {T \geq 10}, since the claim is trivial otherwise. On the other hand, for {T > \log^C X} for a sufficiently large {C}, the second term on the right-hand side is dominated by the first, so the estimate does not become any stronger as one increases {T} beyond {\log^C X}, and hence we may assume without loss of generality that {T \leq \log^{O(1)} X}.

We abbreviate {M := \min_{|t| \leq T} \mathbb{D}(f, n \mapsto n^{it};X)^2}, thus we wish to show that

\displaystyle  \frac{1}{X} \sum_{n \leq X} f(n) \ll \exp( - c M ) + \frac{1}{T}. \ \ \ \ \ (14)

It is convenient to remove some exceptional values of {n}. Let {0 < \varepsilon < \frac{1}{10}} be a small quantity to be chosen later, subject to the restriction

\displaystyle  X^{\varepsilon^3} \geq T^{10}. \ \ \ \ \ (15)

From standard sieves (e.g., Theorem 32 from Notes 4), we see that the proportion of numbers in {[X/2,X]} that do not have a “large” prime factor in {[X^\varepsilon,X]}, or do not have a “medium” prime factor in {[X^{\varepsilon^2},X^\varepsilon)}, is {O(\varepsilon)}. Thus by paying an error of {\varepsilon}, we may restrict to numbers that have at least one “large” prime factor in {[X^\varepsilon,X]} and at least one “medium” prime factor in {[X^{\varepsilon^2},X^\varepsilon)} (and no prime factor larger than {X}). This is the same as replacing {f} with the Dirichlet convolution

\displaystyle  \tilde f := f_1 * f_2 * f_3

where {f_1} is the restriction of {f} to numbers with all prime factors in the “small” range {[1,X^{\varepsilon^2})}, {f_2} is the restriction of {f} to numbers in {[2,X]} with all prime factors in the “medium” range {[X^{\varepsilon^2},X^\varepsilon)}, and {f_3} is the restriction of {f} to numbers in {[2,X]} with all prime factors in the “large” range {[X^\varepsilon,X]}. We can thus write

\displaystyle  \frac{1}{X} \sum_{n \leq X} f(n) = \frac{1}{X} \sum_{n \leq X} \tilde f(n) + O(\varepsilon).

This we can write in turn as

\displaystyle  \sum_{n=1}^\infty \frac{\tilde f(n)}{n} \psi(\log n - \log X) + O(\varepsilon)

where {\psi(u) := 1_{(-\infty, 0]}(u) e^u}. It is not advantageous to immediately apply Proposition 10 due to the rough nature of {\psi} (which is not even Schwartz). But if we let {\varphi} be a Schwartz function of total mass {1} whose Fourier transform is supported on {[-1,1]}, and define the mollified function

\displaystyle  \tilde \psi(u) := \int_{\bf R} \psi(u - \frac{v}{T}) \varphi(v)\ dv

then one easily checks that

\displaystyle  \psi(u) - \tilde \psi(u) \ll (1 + T |u|)^{-10}

which from the triangle inequality soon gives the bounds

\displaystyle  \sum_{n=1}^\infty \frac{\tilde f(n)}{n} (\psi-\tilde \psi)(\log n - \log X) \ll \frac{1}{T}.

Hence we may write

\displaystyle  \frac{1}{X} \sum_{n \leq X} f(n) = \sum_{n=1}^\infty \frac{\tilde f(n)}{n} \tilde \psi(\log n - \log X) + O(\varepsilon) + O(\frac{1}{T}).

Now we apply Proposition 10 and the triangle inequality to bound this by

\displaystyle  \int_{\bf R} |{\mathcal D} \tilde f(1 + it)| |\hat{\tilde \psi}(t)|\ dt + \varepsilon + \frac{1}{T}.

But we may factor

\displaystyle  \hat{\tilde \psi}(t) = \hat \psi(t) \hat \varphi(t/T)

and we may rather crudely bound {\hat \psi(t) = O(1)}, hence we have

\displaystyle  \frac{1}{X} \sum_{n \leq X} f(n) \ll \| \tilde f \|_{FL^1([-T,T])} + \varepsilon + \frac{1}{T}.

Using the Hölder inequalities (5), (6) we have

\displaystyle  \| \tilde f \|_{FL^1([-T,T])} \leq \| f_1 \|_{FL^\infty([-T,T])} \| f_2 \|_{FL^2([-T,T])} \| f_3 \|_{FL^2([-T,T])}.

From Exercise 14 we have

\displaystyle  \| f_2 \|_{FL^2([-T,T])}^2 \ll \sum_n \frac{1}{n} (\frac{T}{n} \sum_{m: |n-m| \leq n/T} |f_2(m)|)^2.

Note that {f_2} is supported on {[X^{\varepsilon^2}, X]} and hence {n} is effectively also restricted to the range {X^{\varepsilon^2} \ll n \ll X}. From standard sieves (using (15)), we have

\displaystyle  \frac{T}{n} \sum_{m: |n-m| \leq n/T} |f_2(m)| \ll \frac{\varepsilon^{-O(1)}}{\log X}

and thus

\displaystyle \| f_2 \|_{FL^2([-T,T])}^2 \ll \frac{\varepsilon^{-O(1)}}{\log X}.

A similar argument gives

\displaystyle \| f_3 \|_{FL^2([-T,T])}^2 \ll \frac{\varepsilon^{-O(1)}}{\log X}.

Finally, from Lemma 18 one has

\displaystyle  \| f_1 \|_{FL^\infty([-T,T])} \ll \log X^{\varepsilon^2} \exp( - \inf_{|t| \leq T} \mathbb{D}( f, n \mapsto n^{-it}; X^{\varepsilon^2})^2 )

\displaystyle  \ll \log X \exp( - \inf_{|t| \leq T} \mathbb{D}( f, n \mapsto n^{-it}; X)^2)

\displaystyle  = e^{-M} \log X.

Putting all this together, we conclude that

\displaystyle \frac{1}{X} \sum_{n \leq X} f(n) \ll \varepsilon^{-O(1)} e^{-M} + \varepsilon + \frac{1}{T}.

Setting {\varepsilon := e^{-cM}} for some sufficiently small constant {c>0} (which in particular will ensure (15) since {M \ll \log\log X + O(1)}), we obtain the claim. \Box
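The reduction to {\tilde f = f_1 * f_2 * f_3} in the proof above can be sanity-checked numerically. The following is only a sketch under stated assumptions: it takes {f = \mu}, a small cutoff {X = 2000}, and explicit thresholds `medium_start` and `large_start` standing in for {X^{\varepsilon^2}} and {X^\varepsilon} (all of these names and values are hypothetical choices made for illustration). It checks that the triple convolution agrees with {f} on numbers having both a medium and a large prime factor, and vanishes on all other {n \leq X}.

```python
def smallest_prime_factors(limit):
    """spf[n] = smallest prime factor of n for 2 <= n <= limit."""
    spf = list(range(limit + 1))
    for p in range(2, int(limit ** 0.5) + 1):
        if spf[p] == p:  # p is prime
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    return spf


def prime_factors(n, spf):
    """Set of distinct prime factors of n."""
    factors = set()
    while n > 1:
        factors.add(spf[n])
        n //= spf[n]
    return factors


def mobius(n, spf):
    """Moebius function, computed from the smallest-prime-factor table."""
    result = 1
    while n > 1:
        p = spf[n]
        n //= p
        if n % p == 0:  # p^2 divided the original number
            return 0
        result = -result
    return result


def restrict(f, limit, spf, lo, hi, allow_one):
    """Restrict f to n <= limit whose prime factors all lie in [lo, hi).

    The value at n = 1 is kept only if allow_one is True (this mirrors the
    support condition [2, X] imposed on f_2 and f_3 in the text).
    """
    g = [0] * (limit + 1)
    for n in range(1, limit + 1):
        if n == 1:
            g[1] = f[1] if allow_one else 0
        elif all(lo <= p < hi for p in prime_factors(n, spf)):
            g[n] = f[n]
    return g


def dirichlet_convolve(g, h, limit):
    """(g * h)(n) = sum over d | n of g(d) h(n / d), for n <= limit."""
    out = [0] * (limit + 1)
    for d in range(1, limit + 1):
        if g[d] == 0:
            continue
        for m in range(1, limit // d + 1):
            out[d * m] += g[d] * h[m]
    return out


def decomposition_demo(limit=2000, medium_start=5, large_start=30):
    """Build f = mu and f_tilde = f_1 * f_2 * f_3 with illustrative thresholds."""
    spf = smallest_prime_factors(limit)
    f = [0] + [mobius(n, spf) for n in range(1, limit + 1)]
    f1 = restrict(f, limit, spf, 1, medium_start, True)             # small primes
    f2 = restrict(f, limit, spf, medium_start, large_start, False)  # medium primes
    f3 = restrict(f, limit, spf, large_start, limit + 1, False)     # large primes
    f_tilde = dirichlet_convolve(dirichlet_convolve(f1, f2, limit), f3, limit)
    return f, f_tilde, spf
```

The identity holds because each {n} factors uniquely as a product of a small-smooth part, a medium part, and a large part, which are pairwise coprime, so multiplicativity of {f} does the rest.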

One can optimise this argument to make the constant {c} in Theorem 7 arbitrarily close to {1/2}; see this previous post. With an even more refined argument, one can prove the sharper estimate

\displaystyle  \frac{1}{X} \sum_{n \leq X} f(n) \ll (1+M) e^{-M} + \frac{1}{T}

with {M := \min_{|t| \leq T} \mathbb{D}(f, n \mapsto n^{it};X)^2}, a result initially due to Montgomery and Tenenbaum; see Theorem 2.3.1 of this text of Granville and Soundararajan. In the case of non-negative {f}, an elementary argument gives the stronger bound {\frac{1}{X} \sum_{n \leq X} f(n) \ll \exp( - \mathbb{D}(f, 1;X)^2)}; see Corollary 1.2.3 of . However, the slightly weaker estimates in Theorem 7 already yield the following result (Wirsing’s theorem).

Let {X} be sufficiently large. Then for any real-valued {1}-bounded multiplicative function {f}, one has

\displaystyle  \frac{1}{X} \sum_{n \leq X} f(n) \ll \exp( - c \mathbb{D}(f, 1;X)^2 ).

for an absolute constant {c>0}.

Thus for instance, setting {f = \mu}, we can use Wirsing’s theorem and Exercise 24 (or Mertens’ theorem) to recover a form of the prime number theorem with a modestly decaying error term, in that

\displaystyle  \frac{1}{X} \sum_{n \leq X} \mu(n) \ll \log^{-c} X \ \ \ \ \ (16)

for all large {X} and some absolute constant {c>0}. (Admittedly, we did use the far stronger Vinogradov-Korobov estimates earlier in this set of notes; but a careful inspection reveals that those estimates were not used in the proof of (16), so this is a non-circular proof of the prime number theorem.)
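For readers who like to see numbers, here is a small sketch that tabulates the two quantities appearing in this application: the Mertens sum {\sum_{n \leq X} \mu(n)} and the squared distance {\mathbb{D}(\mu,1;X)^2}. It assumes the usual definition {\mathbb{D}(f,g;X)^2 = \sum_{p \leq X} \frac{1 - \mathrm{Re}\, f(p) \overline{g(p)}}{p}} (taken here to be the one used earlier in these notes), which for {f = \mu}, {g = 1} is just {\sum_{p \leq X} \frac{2}{p}}. The function names are hypothetical.

```python
def mobius_and_primes(limit):
    """Linear sieve returning (mu, primes) with mu[n] for 0 <= n <= limit."""
    mu = [0] * (limit + 1)
    mu[1] = 1
    primes = []
    is_composite = [False] * (limit + 1)
    for i in range(2, limit + 1):
        if not is_composite[i]:
            primes.append(i)
            mu[i] = -1
        for p in primes:
            if i * p > limit:
                break
            is_composite[i * p] = True
            if i % p == 0:
                mu[i * p] = 0      # i * p is divisible by p^2
                break
            mu[i * p] = -mu[i]     # one extra distinct prime factor
    return mu, primes


def mertens(limit):
    """Sum of mu(n) over 1 <= n <= limit."""
    mu, _ = mobius_and_primes(limit)
    return sum(mu[1:])


def distance_squared_mu_to_one(limit):
    """D(mu, 1; X)^2 = sum over primes p <= X of (1 - mu(p)) / p = sum of 2/p."""
    _, primes = mobius_and_primes(limit)
    return sum(2.0 / p for p in primes)
```

Comparing `mertens(X) / X` with `math.exp(-c * distance_squared_mu_to_one(X))` for a few values of `X` and a small `c` gives a feel for (16); by Mertens’ theorem the exponential is comparable to a negative power of {\log X}. Such a table is of course only an illustration, not evidence about the asymptotic regime.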

— 3. The Matomaki-Radziwill theorem —

We now give the proof of the Matomaki-Radziwill theorem, though we will leave several of the details to exercises. We first make a small but convenient reduction:

Exercise 26 Show that to prove Theorem 8, it suffices to do so for functions {f} that are completely multiplicative. (This is similar to Exercise 1.)

Now we use Exercise 11 to phrase the theorem in an equivalent Fourier form:

Theorem 27 (Matomaki-Radziwill theorem, Fourier form) Let {\eta>0}, and let {1 \leq T \leq X}, with {X/T} sufficiently large depending on {\eta}. Let {\psi:{\bf R} \rightarrow {\bf R}} be a fixed smooth compactly supported function, and set {\psi_X(n) := \psi(\log n - \log X)}. Suppose that {f} is a {1}-bounded completely multiplicative function such that

\displaystyle  \| f \psi_X \|_{FL^2([-T,T])} \gg \eta. \ \ \ \ \ (17)

Then one has

\displaystyle  \mathbb{D}(f, n \mapsto n^{it};X) \ll_\eta 1

for some {t \ll_\eta T}.

Let us assume Theorem 27 for the moment and see how it implies Theorem 8. In the latter theorem we may assume without loss of generality that {\eta} is small. We may assume that {H \leq \eta^2 X}, since the case {\eta^2 X \leq H \leq X} follows easily from Theorem 7.

Let {\psi} be a smooth compactly supported function with {\psi(u) = e^u} on {[-10,10]}. By hypothesis, we have

\displaystyle  \frac{1}{X} \int_X^{2X} |\frac{X}{H} \sum_{x \leq n \leq x+H} \frac{f \psi_X(n)}{n}|^2\ dx \gg \eta^2. \ \ \ \ \ (18)

Let {\varphi} be a Schwartz function of mean {1} whose Fourier transform is supported on {[-1,1]}. For any {x\in [X,2X]}, we consider the expression

\displaystyle  \frac{X}{H} \int_x^{x+H} \frac{X}{\eta^2 H} \sum_n \frac{f\psi_X(n)}{n} \varphi( \frac{X}{\eta^2 H} (\log n - \log y) )\ \frac{dy}{y}. \ \ \ \ \ (19)

A routine calculation using the rapid decrease of {\varphi} shows that

\displaystyle  \frac{X}{\eta^2 H} \int_x^{x+H} \varphi( \frac{X}{\eta^2 H} (\log n - \log y) )\ \frac{dy}{y} = 1_{[x,x+H]}(n)

\displaystyle + O\left( (1 + \frac{|\log n - \log x|}{\eta^2 H/X})^{-10} \right) + O\left( (1 + \frac{|\log n - \log(x+H)|}{\eta^2 H/X})^{-10} \right)

and thus the expression (19) can be estimated as

\displaystyle  \frac{X}{H} \sum_{x \leq n \leq x+H} \frac{f\psi_X(n)}{n} + O(\eta^2).

By Cauchy-Schwarz we then have

\displaystyle  |\frac{X}{H} \sum_{x \leq n \leq x+H} \frac{f\psi_X(n)}{n}|^2

\displaystyle \ll \frac{X}{H} \int_x^{x+H} |\frac{X}{\eta^2 H} \sum_n \frac{f\psi_X(n)}{n} \varphi( \frac{X}{\eta^2 H} (\log n - \log y) )|^2\ \frac{dy}{y} + \eta^4.

Averaging this for {x \in [X,2X]} and using Fubini’s theorem, we have

\displaystyle  \frac{1}{X} \int_X^{2X} |\frac{X}{H} \sum_{x \leq n \leq x+H} \frac{f\psi_X(n)}{n}|^2\ dx

\displaystyle \ll \int_0^\infty |\frac{X}{\eta^2 H} \sum_n \frac{f\psi_X(n)}{n} \varphi( \frac{X}{\eta^2 H} (\log n - \log y) )|^2\ \frac{dy}{y} + \eta^4

and thus from (18) and the triangle inequality we have

\displaystyle  \int_0^\infty |\frac{X}{\eta^2 H} \sum_n \frac{f\psi_X(n)}{n} \varphi( \frac{X}{\eta^2 H} (\log n - \log y) )|^2\ \frac{dy}{y} \gg \eta^2.

On the other hand, from Exercise 11 we have, with {T := \frac{X}{\eta^2 H}},

\displaystyle  \int_0^\infty |\frac{X}{\eta^2 H} \sum_n \frac{f\psi_X(n)}{n} \varphi( \frac{X}{\eta^2 H} (\log n - \log y) )|^2\ \frac{dy}{y} \ll \| f \psi_X \|_{FL^2([-T,T])}^2.

Applying Theorem 27 (with a slightly smaller value of {H} and {\eta}), we obtain the claim.

Exercise 28 In the converse direction, show that Theorem 27 is a consequence of Theorem 8.

Exercise 29 Let {f} be supported on {[X,3X]}, and let {1 \leq H \leq X}. Show that

\displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} f(n)|^2\ dx

\displaystyle \ll \frac{1}{X/H} \int_{H/X}^{3H/X} \int_0^\infty |\frac{X}{H} \sum_{x \leq n \leq (1+\theta)x} \frac{f(n)}{n}|^2\ \frac{dx}{x} d\theta.

(Hint: use summation by parts to express {\sum_{x \leq n \leq x+H} f(n)} as a suitable linear combination of sums {\sum_{x \leq n \leq (1+\theta)x} \frac{f(n)}{n}} and {\sum_{(x+H) \leq n \leq (1+\theta)(x+H)} \frac{f(n)}{n}}, then use the Cauchy-Schwarz inequality and the Fubini-Tonelli theorem.) Conclude in particular that

\displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} f(n)|^2\ dx \ll \int_{\bf R} |{\mathcal D} f(1+it)|^2 (1 + \frac{|t|}{T})^{-2}\ dt

where {T := X/H}. (This argument is due to Saffari and Vaughan.)

It remains to establish Theorem 27. As before we may assume that {\eta} is small. Let us call a finitely supported arithmetic function {g} large on some subset {\Omega} of {[-X,X]} if

\displaystyle  \| g \|_{FL^2(\Omega)} \gg \eta,

and small on {\Omega} if

\displaystyle  \| g \|_{FL^2(\Omega)} \ll \eta^2.

Note that a function cannot be simultaneously large and small on the same set {\Omega}; and if a function is large on some subset {\Omega}, then it remains large on {\Omega} after modifying by any small error (assuming {\eta} is small enough, and adjusting the implied constants appropriately). From the hypothesis (17) we know that {f \psi_X} is large on {[-T,T]}. As discussed in the introduction, the strategy is now to decompose {[-T,T]} into various regions, and on each of these regions split {f \psi_X} (up to small errors) as an average of Dirichlet convolutions of other factors which enjoy either good {FL^\infty} estimates or good {FL^2} estimates on the given region.

We will need the following ranges:

  • (i) {[P_0,Q_0]} is the interval

    \displaystyle  [P_0,Q_0] := [\exp( \log^{0.9} X ), \exp( \log^{0.95} X )].

  • (ii) {[P_1,Q_1]} is the interval

    \displaystyle  [P_1,Q_1] := [\exp( (\log\log X)^2 ), \exp( (\log\log X)^3 )].

  • (iii) {[P_2,Q_2]} is the interval

    \displaystyle  [P_2,Q_2] := [\exp( (\log\log\log X)^2 ), \exp( (\log\log\log X)^3 )].

We will be able to cover the range {T \leq X/Q_0} just using arguments involving the zeroth interval {[P_0,Q_0]}; the range {T \leq X/Q_1} can be covered using arguments involving the zeroth interval {[P_0,Q_0]} and the first interval {[P_1,Q_1]}; and the range {T \leq X/Q_2} can be covered using arguments involving all three intervals {[P_0,Q_0], [P_1,Q_1], [P_2,Q_2]}. Coverage of the remaining ranges of {T} can be done by an extension of the methods given here and will be left to the exercises at the end of the notes.
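To get a feel for how far apart these three ranges are, one can tabulate their logarithms. The sketch below is only illustrative: it takes {\log X} as input (so that astronomically large {X} can be handled without overflow) and returns the pairs {(\log P_i, \log Q_i)}; the function name and the dictionary layout are hypothetical.

```python
import math


def prime_ranges_log(log_x):
    """Return {i: (log P_i, log Q_i)} for the three ranges, given log X > 1."""
    l1 = log_x            # log X
    l2 = math.log(l1)     # log log X
    l3 = math.log(l2)     # log log log X
    return {
        0: (l1 ** 0.9, l1 ** 0.95),
        1: (l2 ** 2, l2 ** 3),
        2: (l3 ** 2, l3 ** 3),
    }
```

For instance, with `log_x = 1000.0` the three intervals are disjoint and ordered as {Q_2 < P_1 < Q_1 < P_0 < Q_0}, whereas with `log_x = 100.0` one already has {Q_1 > P_0}; the clean separation between scales is an asymptotic feature.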

We introduce some weight functions and some exceptional sets. For any {P > 10}, let {\phi: {\bf R} \rightarrow {\bf R}} be a bump function on {[-1,1]} of total mass {1}, and let {w_P} denote the arithmetic function

\displaystyle  w_P(n) := \frac{\log P}{\eta^4} \sum_{p} \phi( \frac{\log p - \log P}{\eta^4} ) 1_{n=p}

supported on the primes {p = (1+O(\eta^4)) P}. We then define the following subsets of {[-X,X]}:

  • (i) {\Omega_0} is the set of those {t \in [-X,X]} such that

    \displaystyle  |{\mathcal D}(fw_P)(1+it)| \geq \frac{1}{\log^{10} X}

    for some dyadic {P \in [P_0,Q_0]} (i.e., {P} is restricted to be a power of {2}).

  • (ii) {\Omega_1} is the set of those {t \in [-X,X]} such that

    \displaystyle  |{\mathcal D}(fw_P)(1+it)| \geq P^{-1/20}

    for some dyadic {P \in [P_1,Q_1]}.

  • (iii) {\Omega_2} is the set of those {t \in [-X,X]} such that

    \displaystyle  |{\mathcal D}(fw_P)(1+it)| \geq P^{-1/30}

    for some dyadic {P \in [P_2,Q_2]}.

We will establish the following claims:

Proposition 30

  • (i) If {T \leq X/Q_i} for some {i=0,1,2}, then {f \psi_X} is small on {[-T,T] \backslash \Omega_i}.
  • (ii) {f \psi_X} is small on {\Omega_1 \backslash \Omega_0}.
  • (iii) {f \psi_X} is small on {\Omega_2 \backslash \Omega_1}.
  • (iv) If {f \psi_X} is large on {[-T,T] \cap \Omega_0}, then one has {\mathbb{D}(f, n \mapsto n^{it};X) \ll_\eta 1} for some {t \ll_\eta T}.

Note that parts (i) (with {i=0}) and (iv) of the claims are already enough to treat the case {T \leq X/Q_0}; parts (i) (with {i=1}), (ii), and (iv) are enough to treat the case {T \leq X/Q_1}; and parts (i) (with {i=2}), (ii), (iii), and (iv) are enough to treat the case {T \leq X/Q_2}.

We first prove (i). For {i=0,1,2}, let {\tilde w_{[P_i,Q_i]}} denote the function

\displaystyle  \tilde w_{[P_i,Q_i]} := \frac{1}{\log \frac{\log Q_i}{\log P_i}} \sum_{P \in [P_i,Q_i]} \frac{1}{\log P} w_P. \ \ \ \ \ (20)

This function is a variant of the function {w_{[P_i,Q_i]}} introduced in (7). A key point is that the convolutions {1 * \tilde w_{[P_i,Q_i]}} stay close to {1}:

Exercise 31 (Turan-Kubilius inequalities) For {i=0,1,2}, show that

\displaystyle  \frac{1}{X} \sum_{X/2 \leq n \leq 4X} |1*\tilde w_{[P_i,Q_i]}(n) - 1|^2 \ll \eta^4.

(Hint: use the second moment method, as in the proof of Lemma 2 of Notes 9.)
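The second moment phenomenon behind this exercise can be observed in a simplified, unsmoothed model. In the sketch below (an assumption-laden stand-in, not the weight {\tilde w_{[P_i,Q_i]}} itself), the weight is the indicator of the primes in {[P,Q]} divided by {S := \sum_{P \leq p \leq Q} \frac{1}{p}}, so that its convolution with {1} is the number of prime factors of {n} in {[P,Q]} divided by {S}. The code computes the mean square deviation of this quantity from {1} over {n \leq X}; expanding the square shows that it is at most {(1 + 2\pi/X)/S}, where {\pi} is the number of primes in {[P,Q]}. All names and parameter values are hypothetical.

```python
def primes_in_range(lo, hi):
    """Primes p with lo <= p <= hi, via a sieve of Eratosthenes (hi >= 2)."""
    sieve = bytearray([1]) * (hi + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(hi ** 0.5) + 1):
        if sieve[p]:
            # Zero out the multiples of p starting at p*p.
            sieve[p * p::p] = bytearray(len(range(p * p, hi + 1, p)))
    return [p for p in range(lo, hi + 1) if sieve[p]]


def normalized_prime_count_deviation(limit, lo, hi):
    """Return (mean square of count(n)/S - 1 over n <= limit, S).

    Here count(n) is the number of distinct primes p | n with lo <= p <= hi,
    and S is the sum of 1/p over those primes.
    """
    primes = primes_in_range(lo, hi)
    total_weight = sum(1.0 / p for p in primes)
    count = [0] * (limit + 1)
    for p in primes:
        for m in range(p, limit + 1, p):
            count[m] += 1
    mean_square = sum(
        (count[n] / total_weight - 1.0) ** 2 for n in range(1, limit + 1)
    ) / limit
    return mean_square, total_weight
```

Calling `normalized_prime_count_deviation(10**5, 10, 1000)` returns the deviation together with `S`; making the range {[P,Q]} longer on a doubly logarithmic scale increases `S` and hence shrinks the deviation, which is the mechanism the exercise exploits.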

Let {i=0,1,2}. Inserting the bounded factor {|f \psi_X(n)|^2} in the above estimates, and applying Exercise 14, we conclude in particular that the expression

\displaystyle  f \psi_X (1 * \tilde w_{[P_i,Q_i]}) - f \psi_X

is small on {[-X,X]}. Since {f} is completely multiplicative, we can write this expression as

\displaystyle  (f * (f\tilde w_{[P_i,Q_i]})) \psi_X - f \psi_X

We now perform some technical manipulations to move the cutoff {\psi_X} to a more convenient location. From (20) we have

\displaystyle (f * (f \tilde w_{[P_i,Q_i]})) \psi_X = \frac{1}{\log\frac{\log Q_i}{\log P_i}} \sum_{P \in [P_i,Q_i]} \frac{1}{\log P} (f * (f w_{P})) \psi_X.

We would like to approximate {(f*(f w_{P})) \psi_X} by {(f \psi_{X/P}) * (fw_{P})}. A brief triangle inequality calculation using the smoothness of {\psi}, the {1}-boundedness of {f}, and the narrow support of {w_{P}} shows that

\displaystyle  (f * (f w_{P})) \psi_X = (f \psi_{X/P}) * (fw_{P}) + O( \eta^4 (1 * w_{P}) \tilde \psi_X )

where {\tilde \psi_X} is defined similarly to {\psi_X} but with a slightly larger choice of initial cutoff {\tilde \psi}. Integrating this we conclude that

\displaystyle (f * (f \tilde w_{[P_i,Q_i]})) \psi_X

\displaystyle = \frac{1}{\log\frac{\log Q_i}{\log P_i}} \sum_{P \in [P_i,Q_i]} \frac{1}{\log P} ((f\psi_{X/P}) * (f w_{P}))

\displaystyle + O( \eta^4 (1 * \tilde w_{[P_i,Q_i]}) \tilde \psi_X ).

Using Exercise 31 and Exercise 14, the error term is small on {[-X,X]}. Thus we conclude that

\displaystyle  \frac{1}{\log\frac{\log Q_i}{\log P_i}} \sum_{P \in [P_i,Q_i]} \frac{1}{\log P} ((f\psi_{X/P}) * (f w_{P})) - f \psi_X \ \ \ \ \ (21)

is small on {[-X,X]}, and hence also on {[-T,T] \backslash \Omega_i}. Thus by the triangle inequality, it will suffice to show that

\displaystyle  (f\psi_{X/P}) * (f w_{P})

is small on {[-T,T] \backslash \Omega_i} for each {P \in [P_i,Q_i]}. But by construction we definitely have

\displaystyle  \|fw_P \|_{FL^\infty([-T,T] \backslash \Omega_i)} \ll \eta^4

while from Exercise 14 and the hypothesis {T \leq X/Q_i \leq X/P} we have

\displaystyle  \|f \psi_{X/P} \|_{FL^2([-T,T] \backslash \Omega_i)} \ll 1

and the claim (i) now follows from (6).

We now jump to (iv). The first observation is that the set {\Omega_0} is quite small. To quantify this we use the following bound:

Proposition 32 (Large values of {{\mathcal D} w_P(1+it)}) Let {X} be sufficiently large depending on {\eta}, and let {2 \leq P \leq X^{1/2}}. Then for any {\sigma > 0}, the set

\displaystyle  \Omega := \{ t \in [-X,X]: |{\mathcal D} w_P(1+it)| \geq \sigma \}

has measure at most {\exp( (\frac{\log X}{\log P} +1) (1 + \log \frac{\log X}{\sigma \log P} ) )}.

Proof: We use the high moment method. Let {k} be a natural number to be optimised later, and let {w_P^{*k}} be the convolution of {k} copies of {w_P}. Then on {\Omega} we have {|{\mathcal D} w_P^{*k}(1+it)| \geq \sigma^k}. Thus by Markov’s inequality, the measure of {\Omega} is at most

\displaystyle  \sigma^{-2k} \| w_P^{*k} \|_{FL^2([-X,X])}^2.

To bound this we use Exercise 14. If we choose {k} to be the first integer for which {P^k \geq X}, then this exercise gives us the bound

\displaystyle  \| w_P^{*k} \|_{FL^2([-X,X])}^2 \ll O_\eta(1)^{O(k)} \frac{1}{P^k} \sum_n w_P^{*k}(n)^2.

From the fundamental theorem of arithmetic we see that {w_P^{*k}(n) \ll k^k}, hence

\displaystyle \sum_n w_P^{*k}(n)^2 \ll k^k \sum_n w_P^{*k}(n) = k^k (\sum_n w_P(n))^k.

From the prime number theorem we have {\sum_n w_P(n) \ll_\eta P}. Putting all this together, we conclude that the measure of {\Omega} is at most

\displaystyle  O_\eta( \frac{k}{\sigma^2} )^{k}.

Since {k \leq \frac{\log X}{\log P} + 1}, we obtain the claim. \Box

Applying this proposition with {P} ranging between {P_0} and {Q_0} and {\sigma = \frac{1}{\log^{10} X}}, and applying the union bound, we see that the measure of {\Omega_0} is at most {\exp(\log^{0.2} X)}. To exploit this, we will need some bounds of Vinogradov-Korobov type:

Exercise 33 (Vinogradov bounds) For {P \geq 10}, establish the bound

\displaystyle  {\mathcal D} w_P( 1+it ) \ll_{\eta,\varepsilon} \frac{1}{(1+|t|)^2}

\displaystyle  + \log^{O(1)}(2+|t|) \exp( - \frac{\log P}{\log^{2/3+\varepsilon}(2+|t|)} )

for any {t \in {\bf R}} and {\varepsilon>0}. (Hint: replace {w_P} with a weighted version of the von Mangoldt function, apply Proposition 7 of Notes 2, and shift the contour, using the zero free region from Exercise 43 of Notes 2.)
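Before using these bounds, it may help to unpack the arithmetic behind the measure bound {\exp(\log^{0.2} X)} for {\Omega_0} stated just before Exercise 33. The following is a sketch of that computation with no attempt to optimise constants; the notation {|\Omega_0|} for the measure of {\Omega_0} is introduced here only for brevity.

```latex
% For dyadic P in [P_0, Q_0] and \sigma = \log^{-10} X:
\frac{\log X}{\log P} \le \frac{\log X}{\log P_0} = \log^{0.1} X,
\qquad
\log \frac{\log X}{\sigma \log P} \le \log\left( \log^{10.1} X \right) = 10.1 \log\log X.
% Proposition 32 then bounds the contribution of each such P by
\exp\left( (\log^{0.1} X + 1)(1 + 10.1 \log\log X) \right)
  = \exp\left( O( \log^{0.1} X \, \log\log X ) \right).
% There are O(\log Q_0) = O(\log^{0.95} X) dyadic values of P, so by the union bound
|\Omega_0| \ll \log^{0.95} X \cdot \exp\left( O( \log^{0.1} X \, \log\log X ) \right)
  \le \exp( \log^{0.2} X )
% for X sufficiently large.
```

Note that the last inequality only takes hold for extremely large {X}, since {\log^{0.1} X \log\log X} is dominated by {\log^{0.2} X} rather slowly.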

Now we can establish the following variant of the Montgomery-Halasz large values theorem (cf. Proposition 9 from Notes 6):

Proposition 34 (Montgomery-Halasz for primes) Let {X \geq 10}, let {P \in [P_0,Q_0]}, and {\Omega \subset [-X,X]} have measure {O(\exp(\log^{0.2} X))}. Then for any {1}-bounded function {f}, one has

\displaystyle  \| f w_P \|_{FL^2(\Omega)} \ll_{\eta} 1.

Proof: By duality, we may write

\displaystyle  \| f w_P \|_{FL^2(\Omega)} = \int_\Omega {\mathcal D} fw_P(1+it) G(t)\ dt

for some measurable function {G} with {\int_\Omega |G(t)|^2\ dt = 1}. We can rearrange the right-hand side as

\displaystyle  \sum_p \frac{w_P(p)}{p} f(p) \int_\Omega G(t) p^{-it}\ dt

which by Cauchy-Schwarz is bounded in magnitude by

\displaystyle  \ll \sum_p \frac{w_P(p)}{p} |\int_\Omega G(t) p^{-it}\ dt|^2.

We can rearrange this as

\displaystyle  \int_\Omega \int_\Omega G(t) \overline{G(t')} {\mathcal D} w_P(1 + i(t-t'))\ dt dt',

which by the elementary inequality {G(t) \overline{G(t')} \ll |G(t)|^2 + |G(t')|^2} is bounded by

\displaystyle  \ll \sup_{t' \in \Omega} \int_\Omega |{\mathcal D} w_P(1 + i(t-t'))|\ dt

(cf. Lemma 6 from Notes 9). By Exercise 33 we have

\displaystyle  {\mathcal D} w_P( 1+i(t-t') ) \ll_{\eta,\varepsilon} \frac{1}{(1+|t-t'|)^2} + \exp( - \log^{0.2} X ).

The claim follows. \Box

Now we can prove (iv). By hypothesis, {f \psi_X} is large on {[-T,T] \cap \Omega_0}. On the other hand, the function (21) (with {i=0}) is small on {[-T,T] \cap \Omega_0}. By the triangle inequality, we conclude that

\displaystyle  \frac{1}{\log\frac{\log Q_0}{\log P_0}} \sum_{P \in [P_0,Q_0]} \frac{1}{\log P} ((f\psi_{X/P}) * (f w_{P}))

is large on {[-T,T] \cap \Omega_0}, hence by the pigeonhole principle, there exists {P \in [P_0,Q_0]} such that

\displaystyle (f\psi_{X/P}) * (f w_{P})

is large on {[-T,T] \cap \Omega_0}. On the other hand, from Proposition 32 and Proposition 34 we have

\displaystyle  \| f w_P \|_{FL^2([-T,T] \cap \Omega_0)} \ll_{\eta} 1,

hence by (6)

\displaystyle  \| f\psi_{X/P} \|_{FL^\infty([-T,T] \cap \Omega_0)} \gg_\eta 1.

By the pigeonhole principle, this implies that

\displaystyle  \| f 1_I \|_{FL^\infty([-T,T] \cap \Omega_0)} \gg_\eta 1.

for some interval {I \subset [X/10P, 10X/P]}. The claim (iv) now follows from Exercise 13. Note that this already concludes the argument in the range {T \leq X/Q_0}.

Now we establish (ii). Here the set {\Omega_1} is not as well controlled in size as {\Omega_0}, but is still quite small. Indeed, from applying Proposition 32 with {P} ranging between {P_1} and {Q_1} and {\sigma = P^{-1/20}}, and applying the union bound, we see that the measure of {\Omega_1} is at most {X^{0.1}}. This is too large of a bound to apply Proposition 34, but we may instead apply a different bound:

Exercise 35 (Montgomery-Halasz for integers) Let {X \geq 10}, and let {\Omega \subset [-X,X]} have measure {O(X^{0.1})}. For any {1}-bounded function {f} supported on {[X^{0.8},2X]}, show that

\displaystyle  \| f \|_{FL^2(\Omega)} \ll \log X.

(It is possible to remove the logarithmic loss here by being careful, but this loss will be acceptable for our arguments. One can either repeat the arguments used to prove Proposition 34, or else appeal to Proposition 9 from Notes 6.)

The point here is that we can get good bounds even when the function {f} is supported at narrower scales (such as {X^{0.8}}) than the Fourier interval under consideration (such as {[-X,X]} or {[-T,T]}). In particular, this exercise will serve as a replacement for Exercise 14, which will not give good estimates in this case.

As before, the function (21) (with {i=0}) is small on {\Omega_1 \backslash \Omega_0}, so it will suffice by the triangle inequality to show that

\displaystyle (f\psi_{X/P}) * (f w_{P})

is small on {\Omega_1 \backslash \Omega_0} for all {P \in [P_0,Q_0]}. From Exercise 35, we have

\displaystyle  \| f \psi_{X/P} \|_{FL^2(\Omega_1 \backslash \Omega_0)} \ll \log X

while from the definition of {\Omega_0} we have

\displaystyle  \| f w_P \|_{FL^\infty(\Omega_1 \backslash \Omega_0)} \leq \log^{-10} X

and the claim now follows from (6). Note that this already concludes the argument in the range {T \leq X/Q_1}.

Finally, we establish (iii). The function (21) (with {i=1}) is small on {\Omega_2 \backslash \Omega_1}, so by the triangle inequality as before it suffices to show that

\displaystyle (f\psi_{X/P}) * (f w_{P})

is small on {\Omega_2 \backslash \Omega_1} for all {P \in [P_1,Q_1]}. On the one hand, the definition of {\Omega_1} gives a good {FL^\infty} bound on {f w_P}:

\displaystyle  \| fw_P \|_{FL^\infty(\Omega_2 \backslash \Omega_1)} \ll P^{-1/20}. \ \ \ \ \ (22)

To conclude using (6) we now need a good bound for {\|f \psi_{X/P} \|_{FL^2(\Omega_2 \backslash \Omega_1)}}. Unfortunately, the function {f \psi_{X/P}} is now supported on too short of an interval for Exercise 14 to give good estimates, and {\Omega_2} is too large for Exercise 35 to be applicable either.

But from the definition we see that for {t \in \Omega_2}, we have

\displaystyle  |{\mathcal D}(f w_{P'})(1+it)| \geq (P')^{-1/30} \ \ \ \ \ (23)

for at least one {P' \in [P_2,Q_2]}. We can use this to amplify the power of Exercise 14:

Proposition 36 (Amplified {L^2} mean value estimate) One has {\| f \psi_{X/P} \|_{FL^2(\Omega_2)} \ll P^{1/25}}.

Proof: For each {P' \in [P_2,Q_2]}, let {\Omega_{2,P'}} denote the set of {t \in [-X,X]} where (23) holds. Since there are {O((\log\log\log X)^3)} values of {P'}, and {P \geq P_1 = \exp((\log\log X)^2)}, it suffices by the union bound to show that

\displaystyle \| f \psi_{X/P} \|_{FL^2(\Omega_{2,P'})} \ll P^{1/26}

(say) for each {P'}. Let {k} be the first integer for which the quantity {(P')^k \geq P}, thus

\displaystyle  k \leq \frac{\log P}{\log P'} + 1.

From (23) we have the pointwise bound

\displaystyle  |{\mathcal D}( f \psi_{X/P})(1+it)| \leq (P')^{k/30} |{\mathcal D}( (f \psi_{X/P}) * (f w_{P'})^{*k} )(1+it)|

and hence by Exercise 14

\displaystyle  \| f \psi_{X/P} \|_{FL^2(\Omega_{2,P'})}^2 \ll \eta^{-O(k)} (P')^{2k/30} \frac{1}{Y} \sum_{n} |(f \psi_{X/P}) * (f w_{P'})^{*k}(n)|^2

where {Y := (P')^k X/P \geq X}. As {f} is {1}-bounded, and the summand vanishes unless {n = \exp(O(k)) Y}, we can bound the right-hand side by

\displaystyle  \ll (\eta^{-O(1)} (P')^{2/30} \log P')^k \frac{1}{Y} \sum_{n = \exp(O(k)) Y} |1 * 1_A^{*k}(n)|^2

where {A} denotes the set of primes in the interval {[P'/2, 2P']}.

Suppose {n} has {l} prime factors in this interval (counting multiplicity). Then {1 * 1_A^{*k}(n)} vanishes unless {l \geq k}, in which case we can bound

\displaystyle  1 * 1_A^{*k}(n) \ll l^k

and

\displaystyle  1 * 1_A^{*l}(n) \geq 1.

Thus we may bound the above sum by

\displaystyle  \ll (\eta^{-O(1)} (P')^{2/30} \log P')^k \frac{1}{Y} \sum_{l \geq k} l^k \sum_{n = \exp(O(k)) Y} 1 * 1_A^{*l}(n).

By the prime number theorem, {A} has {O(P'/\log P')} elements, so by double counting we have

\displaystyle  \sum_{n = \exp(O(k)) Y} 1 * 1_A^{*l}(n) \ll O(P'/\log P')^l \exp(O(k)) Y / (P')^l

and thus the previous bound becomes

\displaystyle  \ll (\eta^{-O(1)} (P')^{2/30})^k \sum_{l \geq k} l^k O(\frac{1}{\log P'})^{l-k}

which sums to

\displaystyle  \ll (k \eta^{-O(1)} (P')^{2/30})^k.

Since

\displaystyle  (P')^k \leq P P'

we thus have

\displaystyle \| f \psi_{X/P} \|_{FL^2(\Omega_{2,P'})} \ll (PP')^{1/30} \exp( O( k \log( k/\eta ) ) );

since

\displaystyle  k \ll \frac{\log P}{\log P_2} \ll (\log\log\log X)^{-2} \log P \ll (\log\log X)^3

so that {\log(k/\eta) \ll \log\log\log X}, we obtain the claim. \Box

Combining this proposition with (22) and (6), we conclude part (iii) of Proposition 30. This establishes Theorem 27 up to the range {T \leq X/Q_2}.

Exercise 37 Show that for any fixed {k \geq 2}, Theorem 27 holds in the range

\displaystyle  T \leq X \exp( - (\log_{k} X)^2 )

where {\log_k X = \log \dots \log X} denotes the {k}-fold iterated logarithm of {X}. (Hint: this is already accomplished for {k=3}. For higher {k}, one has to introduce additional exceptional intervals {\Omega_3,\dots,\Omega_{k-1}} and extend Proposition 30 appropriately.)

Exercise 38 Establish Theorem 27 (and hence Theorem 8) in full generality. (This is the preceding exercise, but now with {k} potentially as large as {O(\log_* X)}, where the inverse tower exponential function {\log_* X} is defined as the least {k} for which {\log_k X \leq 2}. Now one has to start tracking dependence on {k} of all the arguments in the above analysis; in particular, the convenient notation of arithmetic functions being “large” or “small” needs to be replaced with something more precise.)
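The iterated logarithm {\log_k X} and the function {\log_* X} from the last two exercises grow (respectively, shrink) so slowly that it can be instructive to evaluate them. Here is a minimal sketch following the definitions given above; the function names are hypothetical, and the code assumes its inputs are large enough that every logarithm taken is of a positive number.

```python
import math


def iterated_log(x, k):
    """k-fold iterated natural logarithm log_k x (requires positive intermediate values)."""
    for _ in range(k):
        x = math.log(x)
    return x


def log_star(x):
    """Least k >= 0 such that log_k x <= 2."""
    k = 0
    while x > 2:
        x = math.log(x)
        k += 1
    return k
```

For example, `log_star(1e100)` is only `3`, which indicates how few exceptional sets {\Omega_i} are needed for any {X} of practical size, even though their number is unbounded as {X \rightarrow \infty}.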

December 17, 2019

Peter Rohde Referee request response (decline)

Dear Editor,

Thank you for your invitation to review this manuscript for your journal. Unfortunately, I must decline the invitation given that, as a matter of principle, I do not support or endorse the activities of for-profit scientific journals.

The scientific community has previously offered this industry, free of charge:

  • Conducting all scientific research.
  • Writing all scientific manuscripts.
  • Acting voluntarily in editorial roles.
  • Performing all refereeing.
  • (i.e., the entire workload of your organisation, other than hosting the website on which you serve the PDFs).

In exchange, we receive:

  • Massive journal subscription fees.
  • Article download fees.
  • Article publication fees.
  • Intimidation tactics employed against us when we prefer not to be a part of it.
  • Anti-competitive and financially predatory distribution tactics.
  • Institutionalised mandates for the above.

This is not a symbiotic relationship, but a parasitic one, for the larger part financed by the taxpayer, who should rather be financing our research. I can no longer endorse this one-sided relationship, in which for-profit journals effectively tax scientific research, to the tune of billions of dollars annually, often using coercive and intimidatory sales tactics, whilst providing very little or no value in return. This capital is best spent on what it was intended for — scientific research for the benefit of humankind — training students, hiring research staff, financing equipment, travel and infrastructure — to which your organisation contributes nothing whatsoever other than to extort value.

In addition to declining this offer, please for future reference:

  • Remove my name from your referee database.
  • Immediately cease and desist from using intimidatory tactics when I decline to volunteer my labour (which is of very high value) to your pursuit of profit (in exchange for nothing).
  • Hassling me for failing to voluntarily contribute my labour to your revenue-raising is tantamount to harassment and extortion.
  • Do not request that I voluntarily act as your journal editor.
  • Do not work in cahoots with national scientific funding agencies to enforce your own vendor lock-in, thereby effectively mandating your own services, which are in fact of very little or no value whatsoever. This is an indirect form of taxation upon scientific research, which I have no interest in paying, and which we should not be expected or forced to pay.
  • I do not intend personally to submit any further manuscripts to your journal for consideration (if my co-authors do, I won’t stand in their way).

Personal note to the Editor: this should not be construed as a personal attack against you, who I absolutely respect, but rather against the industry which is exploiting you in a slave-like work relationship, whilst using you as a conduit to engage me for the same purpose. I write this as an act of solidarity with you, not as a personal attack against you.

We advance human knowledge for the benefit of humanity, and provide it as a gift for all.

Referee 2.

(This post may be freely linked to, reused, or modified without acknowledgement)

December 15, 2019

John PreskillBreaking up the band structure

Note from the editor: During the Summer of 2019, a group of thirteen undergraduate students from Caltech and universities around the world spent 10 weeks on campus performing research in experimental quantum physics. Below, Aiden Cullo, a student from Binghamton University in New York, shares his experience working in Professor Yeh’s lab. The program, termed QuantumSURF, will run again during the Summer of 2020.

This summer, I worked in Nai-Chang Yeh’s experimental condensed matter lab. The aim of my project was to observe the effects of a magnetic field on our topological insulator (TI) sample, {(BiSb)}_2{Te}_3. The motivation behind this project was to examine more closely the transformation between a topological insulator and a state exhibiting the anomalous Hall effect (AHE).

Both states of matter have garnered a good deal of interest in condensed matter research because of their interesting transport properties, among other things. TIs have gained popularity due to their applications in electronics (spintronics), superconductivity, and quantum computation. TIs are peculiar in that they simultaneously have insulating bulk states and conducting surface states. Due to time-reversal symmetry (TRS) and spin-momentum locking, these surface states have a very symmetric hourglass-like gapless energy band structure (Dirac cone).

The focus of our particular study was the effects of “c-plane” magnetization of our TI’s surface state. Theory predicts TRS and spin-momentum locking will be broken, resulting in a gapped spectrum with a single connection between the valence and conduction bands. This gapping has been theorized and shown experimentally in Chromium (Cr)-doped {(BiSb)}_2{Te}_3 and numerous other TIs with similar make-up.

In 2014, Nai-Chang Yeh’s group showed that Cr-doped {Bi}_2{Se}_3 exhibits this gap opening due to the surface state of {Bi}_2{Se}_3 interacting via the proximity effect with a ferromagnet. Our contention is that a similar material, Cr-doped {(BiSb)}_2{Te}_3, exhibits a similar effect, but more homogeneously because of reduced structural strain between atoms. Specifically, at temperatures below the Curie temperature (Tc), we expect to see a gap in the energy band and an overall increase in the gap magnitude. In short, the main goal of my summer project was to observe the gapping of our TI’s energy band.

Overall, my summer project entailed a combination of reading papers/textbooks and hands-on experimental work. It was difficult to understand fully the theory behind my project in such a short amount of time, but even with a cursory knowledge of topological insulators, I was able to provide a meaningful analysis/interpretation of our data.

Additionally, my experiment relied heavily on external factors such as our supplier for liquid helium, argon gas, etc. As a result, our progress was slowed if an order was delayed or not placed far enough in advance. Most of the issues we encountered were not related to the abstract theory of the materials/machinery, but rather problems with less complex mechanisms such as wiring, insulation, and temperature regulation.

While I expected to spend a good deal of time troubleshooting, I severely underestimated the amount of time that would be spent dealing with quotidian problems such as configuring software or etching STM tips. Working on a machine as powerful as an STM was frustrating at times, but also very rewarding as eventually we were able to collect a large amount of data on our samples.

An important (and extremely difficult) part of our analysis of STM data was determining whether patterns/features in our data set were artifacts, genuine phenomena, or a combination of the two. I was fortunate enough to be surrounded by other researchers who helped me sift through the volumes of data and identify traits of our samples. Reflecting on my SURF, I believe it was a positive experience as it not only taught me a great deal about research, but also, more importantly, closely mimicked the experience of graduate school.

December 11, 2019

Jordan EllenbergThe real diehard

Yesterday a guy saw my Orioles hat and said, “Wow, you’re a real diehard.”

He was wearing a Hartford Whalers hat.

December 10, 2019

Jordan Ellenberg"Hamilton"

We saw the last show of the touring company’s visit to Madison. The kids have played the record hundreds of times so I know the songs very well. But there’s a lot you get from seeing the songs realized by actors in physical space.

  • I had imagined King George as a character in the plot interacting with the rest of the cast; but in the show, he’s a kind of god/chorus floating above the action, seeing certain things clearly that the people in the thick of it can’t. So his famous line, “I will kill your friends and family to remind you of my love,” comes off in person as less menacing, more cosmic. Neil Haskell played the role very, very, very mincy, which I think was a mistake, but it got laughs.
  • On the other hand, I hadn’t grasped from the songs how big a role George Washington plays. It’s set up very nicely, with the relation between Hamilton and the two Schuyler sisters presented as a shadow of the much more robust and fully felt love triangle between Hamilton, Burr, and Washington.
  • The biggest thing I hadn’t understood from the record was the show’s gentle insistence, built up slowly and unavoidably over the whole of the night, that the winner of a duel is the one who gets shot.