Planet Musings

April 19, 2021

Doug Natelson Place your bets. Muon g-2....

Back in the early 20th century, there was a major advance in physics when people realized that particles like the electron have intrinsic angular momentum, spin, discussed here a bit.  The ratio between the magnetic dipole moment of a particle (think of this like the strength of a little bar magnet directed along the direction of the angular momentum) and the angular momentum is characterized by a dimensionless number, the g-factor.  (Note that for an electron in a solid, the effective g-factor is different, because of the coupling between electron spin and orbital angular momentum, but that's another story.)

For a free electron, the g-factor is a little bit larger than 2, deviating from the nice round number due to contributions of high-order processes.  The idea here is that apparently empty space is not so empty: there are fluctuating virtual particles of all sorts, whose interactions with the electron lead to small corrections related to high powers of (m/M), where m is the electron mass and M is the mass of some heavier virtual particle.  The "anomalous" g-factor of the electron has been measured to better than one part in a trillion and is in agreement with theory calculations involving contributions of over 12000 Feynman diagrams, calculations that include only corrections due to the Standard Model of particle physics.

A muon is very similar to an electron, but about 207 times heavier.  That means that the anomalous g-factor of the muon is a great potential test for new physics, because any contributions from yet-undiscovered particles are larger than in the electron case.  Technique-wise, measuring the g-factor for the muon is complicated by the fact that muons aren't stable and each decays into an electron (plus a muon neutrino and an electron antineutrino).  In 2006, a big effort at Brookhaven reported a result (from a data run that ended in 2001) that seems to deviate from Standard Model calculations by around 3 \(\sigma\).
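To get a rough feel for why the muon is the more sensitive probe, here is a small sketch. It assumes the simplest rule of thumb, that a heavy particle of mass M shifts the anomalous moment in proportion to (m/M)^2, so that M cancels when comparing muon and electron. The masses are rounded reference values and the function name is made up for this illustration; this is not the calculation used by either collaboration.

```python
# Rounded lepton masses in MeV/c^2 (approximate reference values).
ELECTRON_MASS_MEV = 0.511
MUON_MASS_MEV = 105.66


def relative_sensitivity(m_light: float, m_heavy: float, power: int = 2) -> float:
    """Ratio of (m_heavy/M)**power to (m_light/M)**power; the heavy mass M cancels.

    The default power of 2 is an assumption (the usual leading-order rule of thumb).
    """
    return (m_heavy / m_light) ** power


mass_ratio = MUON_MASS_MEV / ELECTRON_MASS_MEV
enhancement = relative_sensitivity(ELECTRON_MASS_MEV, MUON_MASS_MEV)

print(f"muon/electron mass ratio: {mass_ratio:.1f}")
print(f"(m_mu/m_e)^2 enhancement: {enhancement:.0f}")
```

Under that assumption the muon gains several orders of magnitude in sensitivity, which has to be weighed against the fact that the electron measurement is far more precise.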

The experiment was moved from Brookhaven to Fermilab and reconstituted and improved, and on Wednesday the group will report their latest results from a new, large dataset.  The big question is, will that deviation from Standard Model expectations grow in significance, indicating possible new physics?  Or will the aggregate result be consistent with the Standard Model?   Stay tuned.

Update: Here is the FNAL page that includes a zoom link to the webinar, which will happen at 10 AM CST on Wednesday, April 7.

John Preskill One if by land minus two if by sea, over the square-root of two

Happy National Poetry Month! The United States salutes word and whimsy in April, and Quantum Frontiers is continuing its tradition of celebrating. As a resident of Cambridge, Massachusetts and as a quantum information scientist, I have trouble avoiding the poem “Paul Revere’s Ride.” 

Henry Wadsworth Longfellow wrote the poem, as well as others in the American canon, during the 1800s. Longfellow taught at Harvard in Cambridge, and he lived a few blocks away from the university, in what’s now a national historic site. Across the street from the house, a bust of the poet gazes downward, as though lost in thought, in Longfellow Park. Longfellow wrote one of his most famous poems about an event staged a short drive from—and, arguably, partially in—Cambridge.

Longfellow Park

The event took place “on the eighteenth of April, in [Seventeen] Seventy-Five,” as related by the narrator of “Paul Revere’s Ride.” Revere was a Boston silversmith and a supporter of the American colonies’ independence from Britain. Revolutionaries anticipated that British troops would set out from Boston sometime during the spring. The British planned to seize revolutionaries’ weapons in the nearby town of Concord and to jail revolutionary leaders in Lexington. The troops departed Boston during the night of April 18th. 

Upon learning of their movements, sexton Robert Newman sent a signal from Boston’s old North Church to Charlestown. Revere and the physician William Dawes rode out from Charlestown to warn the people of Lexington and the surrounding areas. A line of artificial hoof prints, pressed into a sidewalk a few minutes from the Longfellow house, marks part of Dawes’s trail through Cambridge. The initial riders galvanized more riders, who stirred up colonial militias that resisted the troops’ advance. The Battles of Lexington and Concord ensued, initiating the Revolutionary War.

Longfellow took liberties with the facts he purported to relate. But “Paul Revere’s Ride” has blown the dust off history books for generations of schoolchildren. The reader shares Revere’s nervous excitement as he fidgets, awaiting Newman’s signal: 

Now he patted his horse’s side, 
Now gazed on the landscape far and near, 
Then impetuous stamped the earth, 
And turned and tightened his saddle-girth;
But mostly he watched with eager search 
The belfry-tower of the old North Church.

The moment the signal arrives, that excitement bursts its seams, and Revere leaps astride his horse. The reader comes to gallop with the silversmith through the night, the poem’s clip-clop-clip-clop rhythm evoking a horse’s hooves on cobblestones.

The author, outside Longfellow House, on the eighteenth of April in…Twenty Twenty.

Not only does “Paul Revere’s Ride” revitalize history, but it also offers a lesson in information theory. While laying plans, Revere instructs Newman: 

He said to his friend, “If the British march
By land or sea from the town to-night,
Hang a lantern aloft in the belfry-arch
Of the North-Church-tower, as a signal light.

Then comes one of the poem’s most famous lines: “One if by land, and two if by sea.” The British could have left Boston by foot or by boat, and Newman had to communicate which. Specifying one of two options, he related one bit, or one basic unit of information. Newman thereby exemplifies a cornerstone of information theory: the encoding of a bit of information (an abstraction) in a physical system that can be in one of two possible states (a light that shines from one or two lanterns).
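The bit count can be made concrete with Shannon's formula for the entropy of a probability distribution. This is only a minimal sketch: it assumes, purely for illustration, that the two routes were equally likely beforehand, and the names below are invented for the example.

```python
import math


def shannon_entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Assumed for illustration: land and sea are equally likely beforehand.
routes = {"land": 0.5, "sea": 0.5}

# Newman's code: the number of lanterns hung in the belfry.
lantern_code = {"land": 1, "sea": 2}
decode = {lanterns: route for route, lanterns in lantern_code.items()}

print(shannon_entropy_bits(routes.values()))  # entropy of the message, in bits
print(decode[2])                              # route signaled by two lanterns
```

If one route had been far more likely than the other, the same function would return less than one bit: the signal would have carried less news.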

Benjamin Schumacher and Michael Westmoreland point out the information-theoretic interpretation of Newman’s action in their quantum-information textbook. I used their textbook in my first quantum-information course, as a senior in college. Before reading the book, I’d never felt that I could explain what information is or how it can be quantified. Information is an abstraction and a Big Idea, like consciousness, life, and piety. But, Schumacher and Westmoreland demonstrated, most readers already grasp the basics of information theory; some readers even studied the basics while memorizing a poem in elementary school. So I doff my hat—or, since we’re discussing the 1700s, my mobcap—to the authors.

Reading poetry enriches us more than we realize. So read a poem this April. You can find Longfellow’s poem here or ride off wherever your fancy takes you.  

April 18, 2021

Tommaso Dorigo Sunday Morning Teleology

Design and purpose are definitely not two things that scientists consider as their guiding ideas in trying to decipher the fabric of our Universe, or of natural phenomena in general. So teleology should not belong to this blog, I agree.

n-Category Café Algebraic Closure

This semester I’ve been teaching an undergraduate course on Galois theory. It was all online, which meant a lot of work, but it was also a lot of fun: the students were great, and I got to know them individually better than I usually would.

For a category theorist, Galois theory is a constant provocation: very little is canonical or functorial, or at least, not in the obvious sense (for reasons closely related to the nontriviality of the Galois group). One important not-obviously-functorial construction is algebraic closure. We didn’t get to it in the course, but I spent a while absorbed in an expository note on it by Keith Conrad.

Proving that every field has an algebraic closure is not entirely trivial, but the proof in Conrad’s note seems easier and more obvious than the argument you’ll find in many algebra books. As he says, it’s a variant on a proof by Zorn, which he attributes to “B. Conrad” (presumably his brother Brian). It should be more widely known, and now I find myself asking: why would you prove it any other way?

What follows is a somewhat categorical take on the Conrad–Zorn proof.

The reason why many constructions in field theory fail to be functorial is that they involve arbitrary choices, often of roots or irreducible factors of polynomials. For instance, to construct the splitting field of a polynomial, we have to make a finite number of arbitrary choices.

The algebraic closure of a field is something like the splitting field of all polynomials, so you’d expect it to require an infinite number of arbitrary choices (at least if the field is infinite). It’s no surprise, then, that it cannot be done without some form of the axiom of choice. Actually, it doesn’t require the full-strength axiom, but it does need a small dose: there are models of ZF in which some fields have no algebraic closure.

That explains why Zorn’s name came up, and it also suggests why no proof of the existence of algebraic closures can be entirely trivial.

An algebraic closure of a field \(K\) is, by definition, an extension \(M\) of \(K\) that is algebraically closed (every polynomial over \(M\) splits in \(M\)) and is minimal as such. So to construct an algebraic closure of \(K\), it looks as if we have to manufacture an extension \(M\) in which it’s possible to split not only every polynomial over \(K\), but also every polynomial over \(M\) itself.

Fortunately, there’s a lemma that saves us from that: any algebraic extension \(M\) of \(K\) in which every polynomial over \(K\) splits is, in fact, an algebraic closure of \(K\). This is proved by a standard field-theoretic argument that I won’t dwell on here, except to say that the main ingredient is the “transitivity of algebraicity”: given extensions \(K \subseteq L \subseteq M\) and \(\alpha \in M\), if \(\alpha\) is algebraic over \(L\) and \(L\) is algebraic over \(K\) then \(\alpha\) is algebraic over \(K\).

So: an algebraic closure of \(K\) is an algebraic extension of \(K\) in which every polynomial over \(K\) splits. That’s the formulation I’ll use for the rest of this post.

Colin McLarty’s paper The rising sea: Grothendieck on simplicity and generality begins with a joke I find very resonant:

In 1949, André Weil published striking conjectures linking number theory to topology and a striking strategy for a proof [Weil, 1949]. Around 1953, Jean-Pierre Serre took on the project and soon recruited Alexander Grothendieck. Serre created a series of concise elegant tools which Grothendieck and coworkers simplified into thousands of pages of category theory.

How do we construct an algebraic closure of a given field? If you just want a concise presentation, I can’t do better than point you to Conrad’s note. But here I want to come at it a bit differently.

In my mind, the key to the Conrad–Zorn approach is to work in the world of rings for as long as possible and only descend to fields at the last minute. Behind this is the fact that the theory of rings is algebraic and the theory of fields is not, which is ultimately why constructions with fields often fail to be functorial.

(Here I mean “algebraic theory” in the technical sense. One could argue that little is more algebraic than the theory of fields, just as one could argue that “algebraic group” is a tautology…)

So we’re going to use rings, by which I mean commutative rings with \(1\). A field extension \(M\) of our field \(K\) is just a homomorphism of fields \(K \to M\), but we’re going to be interested in “ring extensions” of \(K\), by which I mean homomorphisms \(K \to A\), where \(A\) is a ring. Such a thing is just a (commutative) \(K\)-algebra, which is why you don’t hear anyone talking about “ring extensions”.

Given a polynomial \(f \neq 0\) over \(K\), we could consider the splitting field \(SF_K(f)\) of \(f\) over \(K\). But instead, let’s consider what I’ll call its splitting ring \(SR_K(f)\). Informally, \(SR_K(f)\) is the \(K\)-algebra you get by freely adjoining to \(K\) the right number of roots, say

\[ \alpha_{f, 1}, \ldots, \alpha_{f, \deg(f)}, \]

and imposing relations to say that they are the roots of \(f\). Write

\[ \hat{f}(x) = a \prod_{i = 1}^{\deg(f)} (x - \alpha_{f, i}), \]

where \(a\) is the leading coefficient of \(f\). The relations should make \(\hat{f}(x)\) and \(f(x)\) be equal as polynomials over \(SR_K(f)\).

Formally, then, we define \(SR_K(f)\) to be the \(K\)-algebra

\[ SR_K(f) = K[\alpha_{f, 1}, \ldots, \alpha_{f, \deg(f)}]/\sim \]

where \(\sim\) identifies the \(r\)th coefficient of \(f\) with the \(r\)th coefficient of \(\hat{f}\) for each \(r \geq 0\). So \(SR_K(f)\) is the free \(K\)-algebra in which \(f\) splits.

For example, take \(f(x) = x(x - 1)\). The splitting field of \(f\) over \(K\) is just \(K\) itself. But the splitting ring \(SR_K(f)\) is another story. It’s the polynomial ring \(K[\alpha, \beta]\) quotiented out by the relations that identify the coefficients of \(f(x) = x(x - 1)\) with those of

\[ \hat{f}(x) = (x - \alpha)(x - \beta). \]

Comparing constant terms gives \(\alpha\beta = 0\), and comparing coefficients of degree \(1\) gives \(\alpha + \beta = 1\). So

\[ SR_K(f) = K[\alpha, \beta]/(\alpha\beta = 0, \ \alpha + \beta = 1), \]

or equivalently,

\[ SR_K(f) = K[\alpha]/(\alpha^2 = \alpha). \]

This is bigger than the splitting field! A field can’t have nontrivial idempotents: \(\alpha^2 = \alpha\) forces \(\alpha \in \{0, 1\}\). But a ring can, and \(SR_K(f)\) does.
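If you’d like to poke at this example by hand, here is a minimal sketch that models the splitting ring of x(x - 1) on a computer. It assumes the base field is the rationals (any field would do), and the class name `Elem` and the constants are invented for this illustration.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Elem:
    """The element a + b*alpha of Q[alpha]/(alpha^2 = alpha)."""
    a: Fraction
    b: Fraction

    def __add__(self, other):
        return Elem(self.a + other.a, self.b + other.b)

    def __sub__(self, other):
        return Elem(self.a - other.a, self.b - other.b)

    def __mul__(self, other):
        # (a + b*alpha)(c + d*alpha) = ac + (ad + bc + bd)*alpha,
        # using alpha^2 = alpha to reduce the quadratic term.
        return Elem(
            self.a * other.a,
            self.a * other.b + self.b * other.a + self.b * other.b,
        )


ZERO = Elem(Fraction(0), Fraction(0))
ONE = Elem(Fraction(1), Fraction(0))
ALPHA = Elem(Fraction(0), Fraction(1))
BETA = ONE - ALPHA  # the relation alpha + beta = 1

print(ALPHA * ALPHA == ALPHA)     # checks that alpha is idempotent
print(ALPHA * BETA == ZERO)       # checks the relation alpha * beta = 0
print(ALPHA not in (ZERO, ONE))   # checks that alpha is neither 0 nor 1
```

The pair alpha, beta is a pair of nonzero elements whose product is zero, which is exactly what cannot happen in a field.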

More generally, we can form the splitting ring of not just one polynomial but any set of them (providing they’re nonzero). For \(\mathcal{P} \subseteq K[x] \setminus \{0\}\), put

\[ SR_K(\mathcal{P}) = K[\{ \alpha_{f, i}: f \in \mathcal{P}, \ 1 \leq i \leq \deg(f)\}]/\sim, \]

where \(\sim\) identifies the coefficients of \(f\) with the corresponding coefficients of \(\hat{f}\) for each \(f \in \mathcal{P}\). Equivalently, \(SR_K(\mathcal{P})\) is the coproduct \(\coprod_{f \in \mathcal{P}} SR_K(f)\) in the category of \(K\)-algebras. It’s the free \(K\)-algebra in which all the polynomials in \(\mathcal{P}\) split.

The Conrad proof goes like this: put \(R = SR_K(K[x] \setminus \{0\})\), then (i) show that \(R\) is nontrivial, (ii) quotient \(R\) out by a maximal ideal to get a field \(M\), and (iii) show that \(M\) is an algebraic closure of \(K\). Step (i) is where the substance of the argument lies. Step (ii) calls on the fact that every nontrivial ring has a maximal ideal (a routine application of Zorn’s lemma). And step (iii) is pretty much immediate by the lemma I started with, since by construction every polynomial over \(K\) splits in \(R\), hence in \(M\).
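To see step (ii) in miniature, take the toy polynomial \(f(x) = x(x - 1)\) in place of the full ring \(R\). The following worked sketch uses nothing beyond the Chinese remainder theorem and is meant only as an illustration of how quotienting by a maximal ideal involves a choice:

```latex
% The ideals (\alpha) and (\alpha - 1) are comaximal, since \alpha - (\alpha - 1) = 1.
\[
  SR_K(f) = K[\alpha]/(\alpha^2 - \alpha)
  \;\cong\; K[\alpha]/(\alpha) \times K[\alpha]/(\alpha - 1)
  \;\cong\; K \times K,
  \qquad p(\alpha) \mapsto (p(0),\, p(1)).
\]
% The two maximal ideals are the kernels of the two projections:
\[
  SR_K(f)/(\alpha) \cong K \quad (\alpha \mapsto 0,\ \beta \mapsto 1),
  \qquad
  SR_K(f)/(\alpha - 1) \cong K \quad (\alpha \mapsto 1,\ \beta \mapsto 0).
\]
```

Either quotient recovers the splitting field \(K\), but the two quotient maps disagree about which root is called \(\alpha\). The arbitrary choice of roots has not disappeared; it has moved into the choice of maximal ideal.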

But you can come at it a bit differently, as follows. First let me list some apparent trivialities.

  • A ring \(A\) is trivial if and only if \(0_A = 1_A\). (Triviality is an equational property.)

  • The forgetful functor from rings to sets preserves filtered colimits. This is equivalent to the statement that the theory of rings is finitary: the ring operations (such as addition) take only finitely many inputs.

  • A filtered colimit of nontrivial rings is nontrivial. This isn’t as obvious as it might seem: for instance, the analogous statement for abelian groups is false. It follows from the previous two bullet points and the way filtered colimits work in \(Set\).

  • A ring \(A\) is nontrivial if and only if there is a homomorphism from \(A\) to some field, if and only if there is a surjective homomorphism from \(A\) to some field. Indeed, the first bullet point together with the nontriviality of fields shows that if \(A\) admits a homomorphism to a field then \(A\) is nontrivial. On the other hand, if \(A\) is nontrivial then by Zorn’s lemma, we can find a maximal ideal \(J\) of \(A\), and the natural homomorphism \(A \to A/J\) is then a surjection to a field.

  • Given a homomorphism \(A \to B\) of \(K\)-algebras and a polynomial \(f\) over \(K\), if \(f\) splits in \(A\) then \(f\) splits in \(B\).
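The remark in the third bullet point, that the statement fails for abelian groups, can be made concrete. Here is one standard counterexample, offered as an illustration rather than anything taken from Conrad’s note: a sequence of nontrivial groups linked by zero maps.

```latex
\[
  \mathbb{Z}/2 \xrightarrow{\;0\;} \mathbb{Z}/2 \xrightarrow{\;0\;} \mathbb{Z}/2 \xrightarrow{\;0\;} \cdots,
  \qquad
  \operatorname{colim}_n \mathbb{Z}/2 = 0 .
\]
```

Every element is sent to \(0\) at the next stage, so it becomes \(0\) in the colimit. Rings escape this fate because ring homomorphisms must send \(1\) to \(1\): if \(0 = 1\) held in a filtered colimit of rings, the equation would already hold at some stage of the diagram, making that ring trivial.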

Now, here’s a proof that every field KK has an algebraic closure.

  • Put \(S = \operatorname{colim}_{\mathcal{F}} SR_K(\mathcal{F})\), where the colimit is over finite subsets \(\mathcal{F}\) of \(K[x]\setminus\{0\}\). Here I’m implicitly using the fact that \(SR_K(\mathcal{F})\) is functorial in \(\mathcal{F}\) with respect to inclusion. This is a filtered colimit.

  • For each \(\mathcal{F}\), the polynomial \(\prod_{f \in \mathcal{F}} f\) has a splitting field \(L\). Then every polynomial in \(\mathcal{F}\) splits in \(L\), so by the universal property of \(SR_K(\mathcal{F})\), there is a homomorphism \(SR_K(\mathcal{F}) \to L\). It follows that \(SR_K(\mathcal{F})\) is nontrivial.

  • So the \(K\)-algebra \(S\) is a filtered colimit of nontrivial rings, and is therefore nontrivial.

  • Hence there is a surjective homomorphism from \(S\) to some field \(M\). Composing the natural maps \(K \to S \to M\) makes \(M\) into an extension of \(K\).

  • For each nonzero \(f \in K[x]\), we have homomorphisms of \(K\)-algebras \(SR_K(f) \to S \to M\), and \(f\) splits in \(SR_K(f)\), so \(f\) splits in \(M\). Hence the field \(M\) is an extension of \(K\) in which every polynomial over \(K\) splits.

  • Each of the \(K\)-algebras \(SR_K(\mathcal{F})\) is generated by elements algebraic over \(K\), so the same is true of \(M\). Hence \(M\) is algebraic over \(K\), and is, therefore, an algebraic closure of \(K\).

In any construction of algebraic closures, the challenge is to find some extension \(M\) of \(K\) in which every polynomial over \(K\) splits. It may be that \(M\) is wastefully big, but you can cut it down to size by taking the subfield \(M'\) of \(M\) consisting of the elements algebraic over \(K\), and \(M'\) is then an algebraic closure of \(K\). This provides an alternative to the last step above. But I like it less, because the fact is that the \(M\) we constructed is already the algebraic closure.

Whether one uses the approach in Conrad’s note or the slight variant I’ve just given, I prefer this to the Artin (?) proof that appears in many books and is also summarized on page 2 of Conrad’s note. It seems much more direct and appealing.

There’s another proof I also like: the existence of algebraic closures follows very quickly from the compactness theorem of model theory. Like the proof above, this argument relies on the knowledge that for any finite collection of polynomials, there’s some field in which they all split. In other words, it uses the existence of splitting fields.

And there’s one more thing I can’t resist saying. Although algebraic closure isn’t functorial in the obvious sense, I believe it’s functorial in a non-obvious sense — what I’d think of as a “Hakim-type” sense.

Let me explain what I mean, taking a bit of a run-up.

The category of rings has a subcategory consisting of the local rings. The inclusion has no left adjoint: there’s no way to take the free local ring on a ring. But there is if you allow the ambient topos to vary! In other words, consider all pairs \((\mathcal{E}, R)\) where \(\mathcal{E}\) is a topos and \(R\) is a ring in \(\mathcal{E}\). Given such a pair, one can look for the universal map from \((\mathcal{E}, R)\) to some pair \((\mathcal{E}', R')\) where \(R'\) is local.

Monique Hakim proved, among other things, that if we start with a ring \(R\) in the topos \(Set\), then this construction gives the topos \(Sh(Spec(R))\) of sheaves on the prime spectrum of \(R\), together with the structure sheaf \(\mathcal{O}_R\), which is a sheaf of local rings on \(Spec(R)\), or equivalently a local ring in \(Sh(Spec(R))\). This wonderful result appeared in her 1972 book Topos Annelés et Schémas Relatifs.

Now if I’m not mistaken, there’s a similar description of algebraic closure. Consider all pairs \((\mathcal{E}, K)\) where \(\mathcal{E}\) is a topos and \(K\) is a field in \(\mathcal{E}\). Given such a pair, one can look for the universal map from it to a pair \((\mathcal{E}', K')\) where \(K'\) is algebraically closed. And I believe that if we start with a field \(K\) in \(Set\), the resulting pair is the topos of sets equipped with a continuous action by the absolute Galois group \(G\) of \(K\), together with the algebraic closure \(\overline{K}\) with its natural \(G\)-action. There are lots of details I’ve skipped here, but perhaps someone can tell me whether that’s more or less right and, if so, where this is all written up.

April 17, 2021

David Hogg re-scoping our gauge-invariant GNN project

I am in a project with Weichi Yao (NYU) and Soledad Villar (NYU) to look at building machine-learning methods that are constrained by the same symmetries as Newtonian mechanics: rotation, translation, Galilean boost, and particle exchange, for example. Kate Storey-Fisher (NYU) joined our weekly call today, because she has ideas about toy problems we could use to demonstrate the value of encoding these symmetries. She steered us towards things in the area of “halo occupation”, or the question of which dark-matter halos contain what kinds of galaxies. Right now halo occupation is performed with very blunt tools, and maybe a sharp tool could do better? We would have the advantage (over others) that anything we found would, by construction, obey the fundamental symmetries of physical law.
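To make "obeys the symmetries by construction" concrete, here is a toy sketch. It is not the method of this project; it only illustrates the simplest possible idea, which is to feed a model features that cannot change under the symmetry. Sorted pairwise distances are unchanged by rotation, translation, and particle exchange (Galilean boosts act on velocities and are not covered here). All function names are invented for the example.

```python
import math
from itertools import combinations


def invariant_features(points):
    """Sorted pairwise distances: unchanged by rotation, translation, relabeling."""
    return sorted(math.dist(p, q) for p, q in combinations(points, 2))


def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def translate(point, shift):
    """Shift a 3D point by the vector `shift`."""
    return tuple(a + b for a, b in zip(point, shift))


particles = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (1.0, 1.0, 3.0)]

# Apply a rotation, a translation, and a particle exchange (reversal of labels).
moved = [translate(rotate_z(p, 0.7), (5.0, -2.0, 1.5)) for p in particles][::-1]

original = invariant_features(particles)
transformed = invariant_features(moved)
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(original, transformed)))
```

Any function of these features inherits the invariance automatically, which is the sense in which a result can respect the symmetries without having to learn them from data.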

David Hogg domain adaptation and instrument calibration

At the end of the day I had a wide-ranging conversation with Andy Casey (Monash) about all things spectroscopic. I mentioned to him my new interest in domain adaptation, and whether it could be used to build data-driven models. The SDSS-V project has two spectrographs, at two different telescopes, each of which observes stars down different fibers (which have their own idiosyncrasies). Could we build a data-driven model to see what any star observed down one fiber of one spectrograph would look like if it had been observed down any other fiber or any fiber of the other spectrograph? That would permit us to see what systematics are spectrograph-specific, and whether we would have got the same answers with the other spectrograph, and other questions like that.

There are some stars observed multiple times and by both observatories, but I'm kind-of interested in whether we could do better using the huge number of stars that haven't been observed twice instead. Indeed, it isn't clear which contains more information about the transformations. Another fun thing: The northern sky and the southern sky are different! We would have to re-build domain adaptation to be sensitive to those differences, which might get into causal-inference territory.

April 16, 2021

John Baez Applied Category Theory 2021 — Call for Papers

The deadline for submitting papers is coming up soon: May 10th.

Fourth Annual International Conference on Applied Category Theory (ACT 2021), July 12–16, 2021, online and at the Computer Laboratory of the University of Cambridge.

Plans to run ACT 2021 as one of the first physical conferences post-lockdown are progressing well. Consider going to Cambridge! Financial support is available for students and junior researchers.

Applied category theory is a topic of interest for a growing community of researchers, interested in studying many different kinds of systems using category-theoretic tools. These systems are found across computer science, mathematics, and physics, as well as in social science, linguistics, cognition, and neuroscience. The background and experience of our members is as varied as the systems being studied. The goal of the Applied Category Theory conference series is to bring researchers together, disseminate the latest results, and facilitate further development of the field.

We accept submissions of both original research papers and work accepted/submitted/published elsewhere. Accepted original research papers will be invited for publication in a proceedings volume. The keynote addresses will be drawn from the best accepted papers. The conference will include an industry showcase event.

We hope to run the conference as a hybrid event, with physical attendees present in Cambridge, and other participants taking part online. However, due to the state of the pandemic, the possibility of in-person attendance is not yet confirmed. Please do not book your travel or hotel accommodation yet.

Financial support

We are able to offer financial support to PhD students and junior researchers. Full guidance is on the webpage.

Important dates (all in 2021)

• Submission Deadline: Monday 10 May
• Author Notification: Monday 7 June
• Financial Support Application Deadline: Monday 7 June
• Financial Support Notification: Tuesday 8 June
• Priority Physical Registration Opens: Wednesday 9 June
• Ordinary Physical Registration Opens: Monday 13 June
• Reserved Accommodation Booking Deadline: Monday 13 June
• Adjoint School: Monday 5 to Friday 9 July
• Main Conference: Monday 12 to Friday 16 July


The following two types of submissions are accepted:

Proceedings Track. Original contributions of high-quality work consisting of an extended abstract, up to 12 pages, that provides evidence of results of genuine interest, and with enough detail to allow the program committee to assess the merits of the work. Submission of work-in-progress is encouraged, but it must be more substantial than a research proposal.

Non-Proceedings Track. Descriptions of high-quality work submitted or published elsewhere will also be considered, provided the work is recent and relevant to the conference. The work may be of any length, but the program committee members may only look at the first 3 pages of the submission, so you should ensure that these pages contain sufficient evidence of the quality and rigour of your work.

Papers in the two tracks will be reviewed against the same standards of quality. Since ACT is an interdisciplinary conference, we use two tracks to accommodate the publishing conventions of different disciplines. For example, those from a Computer Science background may prefer the Proceedings Track, while those from a Mathematics, Physics or other background may prefer the Non-Proceedings Track. However, authors from any background are free to choose the track that they prefer, and submissions may be moved from the Proceedings Track to the Non-Proceedings Track at any time at the request of the authors.

Contributions must be submitted in PDF format. Submissions to the Proceedings Track must be prepared with LaTeX, using the EPTCS style files available at

The submission link will soon be available on the ACT2021 web page:

Program Committee

Chair

• Kohei Kishida, University of Illinois, Urbana-Champaign

Members

• Richard Blute, University of Ottawa
• Spencer Breiner, NIST
• Daniel Cicala, University of New Haven
• Robin Cockett, University of Calgary
• Bob Coecke, Cambridge Quantum Computing
• Geoffrey Cruttwell, Mount Allison University
• Valeria de Paiva, Samsung Research America and University of Birmingham
• Brendan Fong, Massachusetts Institute of Technology
• Jonas Frey, Carnegie Mellon University
• Tobias Fritz, Perimeter Institute for Theoretical Physics
• Fabrizio Romano Genovese, Statebox
• Helle Hvid Hansen, University of Groningen
• Jules Hedges, University of Strathclyde
• Chris Heunen, University of Edinburgh
• Alex Hoffnung, Bridgewater
• Martti Karvonen, University of Ottawa
• Kohei Kishida, University of Illinois, Urbana-Champaign (chair)
• Martha Lewis, University of Bristol
• Bert Lindenhovius, Johannes Kepler University Linz
• Ben MacAdam, University of Calgary
• Dan Marsden, University of Oxford
• Jade Master, University of California, Riverside
• Joe Moeller, NIST
• Koko Muroya, Kyoto University
• Simona Paoli, University of Leicester
• Daniela Petrisan, Université de Paris, IRIF
• Mehrnoosh Sadrzadeh, University College London
• Peter Selinger, Dalhousie University
• Michael Shulman, University of San Diego
• David Spivak, MIT and Topos Institute
• Joshua Tan, University of Oxford
• Dmitry Vagner
• Jamie Vicary, University of Cambridge
• John van de Wetering, Radboud University Nijmegen
• Vladimir Zamdzhiev, Inria, LORIA, Université de Lorraine
• Maaike Zwart

Matt von Hippel: Doing Difficult Things Is Its Own Reward

Does antimatter fall up, or down?

Technically, we don’t know yet. The ALPHA-g experiment would have been the first to check this, making anti-hydrogen by trapping anti-protons and positrons in a long tube and seeing which way it falls. While they got most of their setup working, the LHC complex shut down before they could finish. It starts up again next month, so we should have our answer soon.

That said, for most theorists’ purposes, we absolutely do know: antimatter falls down. Antimatter is one of the cleanest examples of a prediction from pure theory that was confirmed by experiment. When Paul Dirac first tried to write down an equation that described electrons, he found the math forced him to add another particle with the opposite charge. With no such particle in sight, he speculated it could be the proton (this doesn’t work, they need the same mass), before Carl D. Anderson discovered the positron in 1932.

The same math that forced Dirac to add antimatter also tells us which way it falls. There’s a bit more involved, in the form of general relativity, but the recipe is pretty simple: we know how to take an equation like Dirac’s and add gravity to it, and we have enough practice doing it in different situations that we’re pretty sure it’s the right way to go. Pretty sure doesn’t mean 100% sure: talk to the right theorists, and you’ll probably find a proposal or two in which antimatter falls up instead of down. But they tend to be pretty weird proposals, from pretty weird theorists.

Ok, but if those theorists are that “weird”, that outside the mainstream, why does an experiment like ALPHA-g exist? Why does it happen at CERN, one of the flagship facilities for all of mainstream particle physics?

This gets at a misconception I occasionally hear from critics of the physics mainstream. They worry about groupthink among mainstream theorists, the physics community dismissing good ideas just because they’re not trendy (you may think I did that just now, for antigravity antimatter!). They expect this to result in a self-fulfilling prophecy where nobody tests ideas outside the mainstream, so they find no evidence for them, so they keep dismissing them.

The mistake of these critics is in assuming that what gets tested has anything to do with what theorists think is reasonable.

Theorists talk to experimentalists, sure. We motivate them, give them ideas and justification. But ultimately, people do experiments because they can do experiments. I watched a talk about the ALPHA experiment recently, and one thing that struck me was how so many different techniques play into it. They make antiprotons using a proton beam from the accelerator, slow them down with magnetic fields, and cool them with lasers. They trap their antihydrogen in an extremely precise vacuum, and confirm it’s there with particle detectors. The whole setup is a blend of cutting-edge accelerator physics and cutting-edge tricks for manipulating atoms. At its heart, ALPHA-g feels like its primary goal is to stress-test all of those tricks: to push the state of the art in a dozen experimental techniques in order to accomplish something remarkable.

And so even if the mainstream theorists don’t care, ALPHA will keep going. It will keep getting funding, it will keep getting visited by celebrities and inspiring pop fiction. Because enough people recognize that doing something difficult can be its own reward.

In my experience, this motivation applies to theorists too. Plenty of us will dismiss this or that proposal as unlikely or impossible. But give us a concrete calculation, something that lets us use one of our flashy theoretical techniques, and the tune changes. If we’re getting the chance to develop our tools, and get a paper out of it in the process, then sure, we’ll check your wacky claim. Why not?

I suspect critics of the mainstream would have a lot more success with this kind of pitch-based approach. If you can find a theorist who already has the right method, who’s developing and extending it and looking for interesting applications, then make your pitch: tell them how they can answer your question just by doing what they do best. They’ll think of it as a chance to disprove you, and you should let them: that’s the right attitude to take as a scientist anyway. It’ll work a lot better than accusing them of hogging the grant money.

John Baez: Black Dwarf Supernovae

“Black dwarf supernovae”. They sound quite dramatic! And indeed, they may be the last really exciting events in the Universe.

It’s too early to be sure. There could be plenty of things about astrophysics we don’t understand yet—and intelligent life may throw up surprises even in the very far future. But there’s a nice scenario here:

• M. E. Caplan, Black dwarf supernova in the far future, Monthly Notices of the Royal Astronomical Society 497 (2020), 4357–4362.

First, let me set the stage. What happens in the short run: say, the first 10^23 years or so?

For a while, galaxies will keep colliding. These collisions seem to destroy spiral galaxies: they fuse into bigger elliptical galaxies. We can already see this happening here and there—and our own Milky Way may have a near collision with Andromeda in only 3.85 billion years or so, well before the Sun becomes a red giant. If this happens, a bunch of new stars will be born from the shock waves due to colliding interstellar gas.

By 7 billion years we expect that Andromeda and the Milky Way will merge and form a large elliptical galaxy. Unfortunately, elliptical galaxies lack spiral arms, which seem to be a crucial part of the star formation process, so star formation may cease even before the raw materials run out.

Of course, no matter what happens, the birth of new stars must eventually cease, since there’s a limited amount of hydrogen, helium, and other stuff that can undergo fusion.

This means that all the stars will eventually burn out. The longest lived are the red dwarf stars, the smallest stars capable of supporting fusion today, with a mass about 0.08 times that of the Sun. These will run out of hydrogen about 10 trillion years from now, and not be able to burn heavier elements–so then they will slowly cool down.

(I’m deliberately ignoring what intelligent life may do. We can imagine civilizations that develop the ability to control stars, but it’s hard to predict what they’ll do so I’m leaving them out of this story.)

A star becomes a white dwarf—and eventually a black dwarf when it cools—if its core, made of highly compressed matter, has a mass less than 1.4 solar masses. In this case the core can be held up by the ‘electron degeneracy pressure’ caused by the Pauli exclusion principle, which works even at zero temperature. But if the core is heavier than this, it collapses! It becomes a neutron star if it’s between 1.4 and 2 solar masses, and a black hole if it’s more massive.

In about 100 trillion years, all normal star formation processes will have ceased, and the universe will have a population of stars consisting of about 55% white dwarfs, 45% brown dwarfs, and a smaller number of neutron stars and black holes. Star formation will continue at a very slow rate due to collisions between brown and/or white dwarfs.

The black holes will suck up some of the other stars they encounter. This is especially true for the big black holes at the galactic centers, which power radio galaxies if they swallow stars at a sufficiently rapid rate. But most of the stars, as well as interstellar gas and dust, will eventually be hurled into intergalactic space. This happens to a star whenever it accidentally reaches escape velocity through its random encounters with other stars. It’s a slow process, but computer simulations show that about 90% of the mass of the galaxies will eventually ‘boil off’ this way — while the rest becomes a big black hole.

How long will all this take? Well, the white dwarfs will cool to black dwarfs in about 100 quadrillion years, and the galaxies will boil away by about 10 quintillion years. Most planets will have already been knocked off their orbits by then, thanks to random disturbances which gradually take their toll over time. But any that are still orbiting stars will spiral in thanks to gravitational radiation in about 100 quintillion years.

I think the numbers are getting a bit silly. 100 quintillion is 10^20, and let’s use scientific notation from now on.

Then what? Well, in about 10^23 years the dead stars will actually boil off from the galactic clusters, not just the galaxies, so the clusters will disintegrate. At this point the cosmic background radiation will have cooled to about 10^-13 Kelvin, and most things will be at about that temperature unless proton decay or some other such process keeps them warmer.

Okay: so now we have a bunch of isolated black holes, neutron stars, and black dwarfs together with lone planets, asteroids, rocks, dust grains, molecules and atoms of gas, photons and neutrinos, all very close to absolute zero.

I had a dream, which was not all a dream.
The bright sun was extinguish’d, and the stars
Did wander darkling in the eternal space,
Rayless, and pathless, and the icy earth
Swung blind and blackening in the moonless air.

— Lord Byron

So what happens next?

We expect that black holes evaporate due to Hawking radiation: a solar-mass one should do so in 10^67 years, and a really big one, comparable to the mass of a galaxy, should take about 10^99 years. Small objects like planets and asteroids may eventually ‘sublimate’: that is, slowly dissipate by losing atoms due to random processes. I haven’t seen estimates on how long this will take. For larger objects, like neutron stars, this may take a very long time.
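As a rough check on those two figures, here is a minimal sketch using the standard textbook estimate for the Hawking evaporation time of a non-rotating, uncharged black hole, t = 5120 π G² M³ / (ħ c⁴). The formula, the rounded constants, and the choice of 10^11 solar masses as "comparable to the mass of a galaxy" are assumptions made for illustration and are not taken from the post.

```python
import math

# Rounded physical constants in SI units (illustrative precision only).
G = 6.674e-11             # gravitational constant, m^3 kg^-1 s^-2
HBAR = 1.0546e-34         # reduced Planck constant, J s
C = 2.998e8               # speed of light, m/s
M_SUN = 1.989e30          # solar mass, kg
SECONDS_PER_YEAR = 3.156e7


def evaporation_time_years(mass_kg):
    """Textbook Hawking evaporation time, which scales as mass cubed."""
    seconds = 5120 * math.pi * G**2 * mass_kg**3 / (HBAR * C**4)
    return seconds / SECONDS_PER_YEAR


t_sun = evaporation_time_years(M_SUN)
t_galaxy = evaporation_time_years(1e11 * M_SUN)  # assumed "galaxy-mass" hole

print(f"solar-mass black hole: roughly 10^{math.log10(t_sun):.0f} years")
print(f"10^11 solar masses:    roughly 10^{math.log10(t_galaxy):.0f} years")
```

Because the time scales as M³, every factor of ten in mass adds three to the exponent, so the galaxy-scale figure is sensitive to exactly what mass is assumed.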

But I want to focus on stars lighter than 1.2 solar masses. As I mentioned, these will become white dwarfs held up by their electron degeneracy pressure, and by about 10^17 years they will cool down to become very cold black dwarfs. Their cores will crystallize!

Then what? If a proton can decay into other particles, for example a positron and a neutral pion, black dwarfs may slowly shrink away to nothing due to this process, emitting particles as they fade away! Right now we know that the lifetime of the proton to decay via such processes is at least 10^32 years. It could be much longer.

But suppose the proton is completely stable. Then what happens? In this scenario, a very slow process of nuclear fusion will slowly turn black dwarfs into iron! It’s called pycnonuclear fusion. The idea is that due to quantum tunneling, nuclei next to each other in the crystal lattice within a black dwarf will occasionally get ‘right on top of each other’ and fuse into a heavier nucleus! Since iron-56 is the most stable nucleus, eventually iron will predominate.

Iron is more dense than lighter elements, so as this happens the black dwarf will shrink. It may eventually shrink down to being so dense that electron pressure will no longer hold it up. If this happens, the black dwarf will suddenly collapse, just like heavier stars. It will release a huge amount of energy and explode as gravitational potential energy gets converted into heat. This is a black dwarf supernova.

When will black dwarf supernovae first happen, assuming proton decay or some other unknown processes don’t destroy the black dwarfs first?

This is what Matt Caplan calculated:

We now consider the evolution of a white dwarf toward an iron black dwarf and the circumstances that result in collapse. Going beyond the simple order of magnitude estimates of Dyson (1979), we know pycnonuclear fusion rates are strongly dependent on density so they are greatest in the core of the black dwarf and slowest at the surface. Therefore, the internal structure of a black dwarf evolving toward collapse can be thought of as an astronomically slowly moving ‘burning’ front growing outward from the core toward the surface. This burning front grows outward much more slowly than any hydrodynamical or nuclear timescale, and the star remains at approximately zero temperature for this phase. Furthermore, in contrast to traditional thermonuclear stellar burning, the later reactions with higher Z parents take significantly longer due to the larger tunneling barriers for fusion.

Here “later reactions with higher Z parents” means fusion reactions involving heavier nuclei. The very last step, for example, is when two silicon nuclei fuse to form a nucleus of iron. In an ordinary star these later reactions happen much faster than those involving light nuclei, but for black dwarfs this pattern is reversed, and everything happens at a ridiculously slow rate, at a temperature near absolute zero.

He estimates a black dwarf of 1.24 solar masses will collapse and go supernova after about 10^1600 years, when roughly half its mass has turned to iron.

Lighter ones will take much longer. A black dwarf of 1.16 solar masses could take 10^32000 years to go supernova.

These black dwarf supernovae could be the last really energetic events in the Universe.

It’s downright scary to think how far apart these black dwarfs will be when they explode. As I mentioned, galaxies and clusters will have long since boiled away, so every black dwarf will be completely alone in the depths of space. Distances between them will be doubling every 12 billion years according to the current standard model of cosmology, the ΛCDM model. But 12 billion years is peanuts compared to the time scales I’m talking about now!

So, by the time black dwarfs start to explode, the distances between these stars will be expanded by a factor of roughly

\displaystyle{ e^{10^{1000}} }

compared to their distances today. That’s a very rough estimate, but it means that each black dwarf supernova will be living in its own separate world.
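The size of that exponent can be checked with a short calculation. This is only a sketch under stated assumptions: it supposes the 12-billion-year doubling time holds forever, and it takes the earliest collapse time quoted above, about 10^1600 years, as the elapsed time t. The symbols N (number of doublings) and T_2 (doubling time) are introduced here for illustration.

\displaystyle{ N = \frac{t}{T_2} \approx \frac{10^{1600}\ \mathrm{yr}}{1.2 \times 10^{10}\ \mathrm{yr}} \approx 8 \times 10^{1589} }

\displaystyle{ 2^{N} = e^{N \ln 2} \approx e^{6 \times 10^{1589}} }

Under these assumptions the exponent comes out nearer 10^1590 than 10^1000, so the round figure above is on the small side, but the conclusion is the same at either size.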

Doug Natelson: NSF Workshop on Quantum Engineering Infrastructure

I spent three afternoons this week attending an NSF workshop on Quantum Engineering Infrastructure.  This was based in part on the perceived critical need for shared infrastructure (materials growth, lithographic patterning, deposition, etching, characterization) across large swaths of experimental quantum information sciences, and the fact that the NSF already runs the NNCI, which was the successor of the NNIN.  There will end up being a report generated as a result of the workshop, hopefully steering future efforts.  (I was invited because of this post.)

The workshop was very informative, touching on platforms including superconducting qubits, trapped ions, photonic devices including color centers in diamond/SiC, topological materials, and spin qubits in semiconductors.  Some key themes emerged:

  • There are many possible platforms out there for quantum information science, and all of them will require very serious materials development to be ready for prime time.  People forget that our command of silicon comes after thousands of person-years worth of research and process development.  Essentially every platform is in its infancy compared to that.  
  • There is clearly a tension between the need for exploratory research, trying new processes at the onesy-twosy level, and the requirements for work at larger scale, which needs dedicated process expertise and control at a level not typically possible in a shared university facility.  Everyone also knows that progress is automatically slow if people have to travel off-site to some user facility to do part of their processing.  Some places are well situated - MIT, for example, has an exploratory fab facility here, and a dedicated 200 mm substrate superconducting circuit fab at Lincoln Labs.  Life is extra complicated when running an unusual process in some tool like a PECVD system or an etcher can "season" the gadget, leaving an imprint on subsequent process runs.
  • Whoever really figures out how to do wafer-scale heteroepitaxy of single-crystal diamond will either become incredibly rich or will be assassinated by De Beers.  
  • Fostering a healthy relationship between industrial materials growers and academic researchers would be very important.  Industrial expertise can be fantastic, but there is not necessarily much economic incentive to work closely with academia compared with large-scale commercial pressures.  There may be a key role for government encouragement or subsidy.  
  • It's going to be increasingly challenging for new faculty to get started in some research topics at universities - the detailed process knowhow and the need to build up expertise can be expensive and slow to acquire compared to the timescale of, e.g., promotion to tenure.  An improved network that supports, curates, and communicates process development expertise might be extremely helpful.

April 15, 2021

Scott Aaronson: The ACM Prize thing

Last week I got an email from Dina Katabi, my former MIT colleague, asking me to call her urgently. Am I in trouble? For what, though?? I haven’t even worked at MIT for five years!

Luckily, Dina only wanted to tell me that I’d been selected to receive the 2020 ACM Prize in Computing, a mid-career award founded in 2007 that comes with $250,000 from Infosys. Not the Turing Award but I’d happily take it! And I could even look back on 2020 fondly for something.

I was utterly humbled to see the list of past ACM Prize recipients, which includes amazing computer scientists I’ve been privileged to know and learn from (like Jon Kleinberg, Sanjeev Arora, and Dan Boneh) and others who I’ve admired from afar (like Daphne Koller, Jeff Dean and Sanjay Ghemawat of Google MapReduce, and David Silver of AlphaGo and AlphaZero).

I was even more humbled, later, to read my prize citation, which focuses on four things:

  1. The theoretical foundations of the sampling-based quantum supremacy experiments now being carried out (and in particular, my and Alex Arkhipov’s 2011 paper on BosonSampling);
  2. My and Avi Wigderson’s 2008 paper on the algebrization barrier in complexity theory;
  3. Work on the limitations of quantum computers (in particular, the 2002 quantum lower bound for the collision problem); and
  4. Public outreach about quantum computing, including through QCSD, popular talks and articles, and this blog.

I don’t know if I’m worthy of such a prize—but I know that if I am, then it’s mainly for work I did between roughly 2001 and 2012. This honor inspires me to want to be more like I was back then, when I was driven, non-jaded, and obsessed with figuring out the contours of BQP and efficient computation in the physical universe. It makes me want to justify the ACM’s faith in me.

I’m grateful to the committee and nominators, and more broadly, to the whole quantum computing and theoretical computer science communities—which I “joined” in some sense around age 16, and which were the first communities where I ever felt like I belonged. I’m grateful to the mentors who made me what I am, especially Chris Lynch, Bart Selman, Lov Grover, Umesh Vazirani, Avi Wigderson, and (if he’ll allow me to include him) John Preskill. I’m grateful to the slightly older quantum computer scientists who I looked up to and tried to emulate, like Dorit Aharonov, Andris Ambainis, Ronald de Wolf, and John Watrous. I’m grateful to my wonderful colleagues at UT Austin, in the CS department and beyond. I’m grateful to my students and postdocs, the pride of my professional life. I’m grateful, of course, to my wife, parents, and kids.

By coincidence, my last post was also about prizes to theoretical computer scientists—in that case, two prizes that attracted controversy because of the recipient’s (or would-be recipient’s) political actions or views. It would understate matters to point out that not everyone has always agreed with everything I’ve said on this blog. I’m ridiculously lucky, and I know it, that even living through this polarized and tumultuous era, I never felt forced to choose between academic success and the freedom to speak my conscience in public under my real name. If there’s been one constant in my public stands, I’d like to think that—inspired by memories of my own years as an unknown, awkward, self-conscious teenager—it’s been my determination to nurture and protect talented young scientists, whatever they look like and wherever they come from. And I’ve tried to live up to that ideal in real life, and I welcome anyone’s scrutiny as to how well I’ve done.

What should I do with the prize money? I confess that my first instinct was to donate it, in its entirety, to some suitable charity—specifically, something that would make all the strangers who’ve attacked me on Twitter, Reddit, and so forth over the years realize that I’m fundamentally a good person. But I was talked out of this plan by my family, who pointed out that
(1) in all likelihood, nothing will make online strangers stop hating me,
(2) in any case this seems like a poor basis for making decisions, and
(3) if I really want to give others a say in what to do with the winnings, then why not everyone who’s stood by me and supported me?

So, beloved commenters! Please mention your favorite charitable causes below, especially weird ones that I wouldn’t have heard of otherwise. If I support their values, I’ll make a small donation from my prize winnings. Or a larger donation, especially if you donate yourself and challenge me to match. Whatever’s left after I get tired of donating will probably go to my kids’ college fund.

Update: And by an amusing coincidence, today is apparently “World Quantum Day”! I hope your Quantum Day is as pleasant as mine (and stable and coherent).

April 14, 2021

David Hogg: The Practice of Astrophysics (tm)

Over the last few weeks—and the last few decades—I have had many conversations about all the things that are way more important to being a successful astrophysicist than facility with electromagnetism and quantum mechanics: There's writing, and mentoring, and project design, and reading, and visualization, and so on. Today I fantasized about a (very long) book entitled The Practice of Astrophysics that covers all of these things.

April 13, 2021

John Baez: Can We Understand the Standard Model Using Octonions?

I gave two talks in Latham Boyle and Kirill Krasnov’s Perimeter Institute workshop Octonions and the Standard Model.

The first talk was on Monday April 5th at noon Eastern Time. The second was exactly one week later, on Monday April 12th at noon Eastern Time.

Here they are:

Can we understand the Standard Model? (video, slides)

Abstract. 40 years trying to go beyond the Standard Model hasn’t yet led to any clear success. As an alternative, we could try to understand why the Standard Model is the way it is. In this talk we review some lessons from grand unified theories and also from recent work using the octonions. The gauge group of the Standard Model and its representation on one generation of fermions arises naturally from a process that involves splitting 10d Euclidean space into 4+6 dimensions, but also from a process that involves splitting 10d Minkowski spacetime into 4d Minkowski space and 6 spacelike dimensions. We explain both these approaches, and how to reconcile them.

The second talk was on Monday April 12th at noon Eastern Time:

Can we understand the Standard Model using octonions? (video, slides)

Abstract. Dubois-Violette and Todorov have shown that the Standard Model gauge group can be constructed using the exceptional Jordan algebra, consisting of 3×3 self-adjoint matrices of octonions. After an introduction to the physics of Jordan algebras, we ponder the meaning of their construction. For example, it implies that the Standard Model gauge group consists of the symmetries of an octonionic qutrit that restrict to symmetries of an octonionic qubit and preserve all the structure arising from a choice of unit imaginary octonion. It also sheds light on why the Standard Model gauge group acts on 10d Euclidean space, or Minkowski spacetime, while preserving a 4+6 splitting.

You can see all the slides and videos and also some articles with more details here.

April 12, 2021

David Hogg: best setting of hyper-parameters

Adrian Price-Whelan (Flatiron) and I encountered an interesting conceptual point today in our distance estimation project: When you are doing cross-validation to set your hyper-parameters (a regularization strength in this case), what do you use as your validation scalar? That is, what are you optimizing? We started by naively optimizing the cost function, which is something like a weighted L2 of the residual and an L2 of the parameters. But then we switched from the cost function to just the data part (not the regularization part) of the cost function, and everything changed! The point is duh, actually, when you think about it from a Bayesian perspective: You want to improve the likelihood not the posterior pdf. That's another nice point for my non-existent paper on the difference between a likelihood and a posterior pdf. It also shows that, in general, the data and the regularization will be at odds.
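The distinction can be made concrete with a toy example. What follows is a minimal sketch, not the distance-estimation code described above: it assumes a one-parameter model y = w·x with an unweighted L2 (ridge) penalty, and every function name and number in it is hypothetical. It scores each regularization strength on held-out data in two ways, once with only the data term and once with the full regularized cost.

```python
import random


def fit_ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for the one-parameter model y = w * x."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def data_term(xs, ys, w):
    """Sum of squared residuals: the data part of the cost only."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys))


def cross_validate(xs, ys, lams, n_folds=5):
    """Held-out scores per lam: (data term only, data term plus penalty)."""
    data_scores = {lam: 0.0 for lam in lams}
    full_scores = {lam: 0.0 for lam in lams}
    for k in range(n_folds):
        train = [i for i in range(len(xs)) if i % n_folds != k]
        held = [i for i in range(len(xs)) if i % n_folds == k]
        for lam in lams:
            w = fit_ridge_slope([xs[i] for i in train],
                                [ys[i] for i in train], lam)
            d = data_term([xs[i] for i in held], [ys[i] for i in held], w)
            data_scores[lam] += d                # likelihood-like score
            full_scores[lam] += d + lam * w * w  # posterior-like score
    return data_scores, full_scores


# Synthetic data, fixed seed: true slope 2 with Gaussian noise (made up).
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(40)]
ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
lams = [0.0, 0.1, 1.0, 10.0, 100.0]

data_scores, full_scores = cross_validate(xs, ys, lams)
best_by_data = min(lams, key=data_scores.get)
best_by_full = min(lams, key=full_scores.get)
print("lambda chosen by held-out data term:", best_by_data)
print("lambda chosen by held-out full cost:", best_by_full)
```

The full-cost score adds the penalty to the held-out residual, so it measures how well the parameters satisfy the prior as well as how well they predict new data; scoring on the data term alone isolates the prediction quality, which is the quantity cross-validation is meant to probe.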

Matt Strassler: Physics is Broken!!!

Last Thursday, an experiment reported that the magnetic properties of the muon, the electron’s middleweight cousin, are a tiny bit different from what particle physics equations say they should be. All around the world, the headlines screamed: PHYSICS IS BROKEN!!! And indeed, it’s been pretty shocking to physicists everywhere. For instance, my equations are working erratically; many of the calculations I tried this weekend came out upside-down or backwards. Even worse, my stove froze my coffee instead of heating it, I just barely prevented my car from floating out of my garage into the trees, and my desk clock broke and spilled time all over the floor. What a mess!

Broken, eh? When we say a coffee machine or a computer is broken, it means it doesn’t work. It’s unavailable until it’s fixed. When a glass is broken, it’s shattered into pieces. We need a new one. I know it’s cute to say that so-and-so’s video “broke the internet.” But aren’t we going a little too far now? Nothing’s broken about physics; it works just as well today as it did a month ago.

More reasonable headlines have suggested that “the laws of physics have been broken”. That’s better; I know what it means to break a law. (Though the metaphor is imperfect, since if I were to break a state law, I’d be punished, whereas if an object were to break a fundamental law of physics, that law would have to be revised!) But as is true in the legal system, not all physics laws, and not all violations of law, are equally significant.

What’s a physics law, anyway? Crudely, physics is a strategy for making predictions about the behavior of physical objects, based on a set of equations and a conceptual framework for using those equations. Sometimes we refer to the equations as laws; sometimes parts of the conceptual framework are referred to that way.

But that story has layers. Physics has an underlying conceptual foundation, which includes the pillar of quantum physics and its view of reality, and the pillar of Einstein’s relativity and its view of space and time. (There are other pillars too, such as those of statistical mechanics, but let me not complicate the story now.) That foundation supports many research areas of physics. Within particle physics itself, these two pillars are combined into a more detailed framework, with concepts and equations that go by the name of “quantum effective field theory” (“QEFT”). But QEFT is still very general; this framework can describe an enormous number of possible universes, most with completely different particles and forces from the ones we have in our own universe. We can start making predictions for real-world experiments only when we put the electron, the muon, the photon, and all the other familiar particles and forces into our equations, building up a specific example of a QEFT known as “The Standard Model of particle physics.”

All along the way there are equations and rules that you might call “laws.” They too come in layers. The Standard Model itself, as a specific QEFT, has few high-level laws: there are no principles telling us why quarks exist, why there is one type of photon rather than two, or why the weak nuclear force is so weak. The few laws it does have are mostly low-level, true of our universe but not essential to it.

I’m bringing attention to these layers because an experiment might cause a problem for one layer but not another. I think you could only fairly suggest that “physics is broken” if data were putting a foundational pillar of the entire field into question. And to say “the laws of physics have been violated”, emphasis on the word “the”, is a bit melodramatic if the only thing that’s been violated is a low-level, dispensable law.

Has physics, as a whole, ever broken? You could argue that Newton’s 17th century foundation, which underpinned the next two centuries of physics, broke at the turn of the 20th century. Just after 1900, Newton-style equations had to be replaced by equations of a substantially different type; the ways physicists used the equations changed, and the concepts, the language, and even the goals of physics changed. For instance, in Newtonian physics, you can predict the outcome of any experiment, at least in principle; in post-Newtonian quantum physics, you often can only predict the probability for one or another outcome, even in principle. And in Newtonian physics we all agree what time it is; in Einsteinian physics, different observers experience time differently and there is no universal clock that we all agree on. These were immense changes in the foundation of the field.

Conversely, you could also argue that physics didn’t break; it was just remodeled and expanded. No one who’d been studying steam engines or wind erosion or electrical circuit diagrams had to throw out their books and start again from scratch. In fact this “broken” Newtonian physics is still taught in physics classes, and many physicists and engineers never use anything else. If you’re studying the physics of weather, or building a bridge, Newtonian physics is just fine. The fact that Newton-style equations are an incomplete description of the world — that there are phenomena they can’t describe properly — doesn’t invalidate them when they’re applied within their wheelhouse.

No matter which argument you prefer, it’s hard to see how to justify the phrase “physics is broken” without a profound revolution that overthrows foundational concepts. It’s rare for a serious threat to foundations to arise suddenly, because few experiments can single-handedly put fundamental principles at risk. [The infamous case of the “faster-than-light neutrinos” provides an exception. Had that experiment been correct, it would have invalidated Einstein’s relativity principles. But few of us were surprised when a glaring error turned up.]

In the Standard Model, the electron, muon and tau particles (known as the “charged leptons”) are all identical except for their masses. (More fundamentally, they have different interactions with the Higgs field, from which their rest masses arise.) This almost-identity is sometimes stated as a “principle of lepton universality.” Oh, wow, a principle — a law! But here’s the thing. Some principles are enormously important; the principles of Einsteinian relativity determine how cause and effect work in our universe, and you can’t drop them without running into big paradoxes. Other principles are weak, and could easily be discarded without making a mess of any other part of physics. The principle of lepton universality is one of these. In fact, if you extend the Standard Model by adding new particles to its equations, it can be difficult to avoid ruining this fragile principle. [In a sense, the Higgs field has already violated the principle, but we don’t hold that against it.]

All the fuss is about a new experimental result which confirms an older one and slightly disagrees with the latest theoretical predictions, which are made using the Standard Model’s equations. What could be the cause of the discrepancy? One possibility is that it arises from a previously unknown difference between muons and electrons — from a violation of the principle of lepton universality. For those who live and breathe particle physics, breaking lepton universality would be a big deal; there’d be lots of adventure in trying to figure out which of the many possible extensions of the Standard Model could actually explain what broke this law. That’s why the scientists involved sound so excited.

But the failure of lepton universality wouldn’t come as a huge surprise. From certain points of view, the surprise is that the principle has survived this long! Since this low-level law is easily violated, its demise may not lead us to a profound new understanding of the world. It’s way too early for headlines that argue that what’s at stake is the existence of “forms of matter and energy vital to the nature and evolution of the cosmos.” No one can say how much is at stake; it might be a lot, or just a little.

In particular, there’s absolutely no evidence that physics is broken, or even that particle physics is broken. The pillars of physics and QEFT are not (yet) threatened. Even to say that “the Standard Model might be broken” seems a bit melodramatic to me. Does adding a new wing to a house require “breaking” the house? Typically you can still live in the place while it’s being extended. The Standard Model’s many successes suggest that it might survive largely intact as a recognizable part of a larger, more complete set of equations.

In any case, right now it’s still too early to say anything so loudly. The apparent discrepancy may not survive the heavy scrutiny it is coming under. There’s plenty of controversy about the theoretical prediction for muon magnetism; the required calculation is extraordinarily complex, elaborate and difficult.

So, from my perspective, the headlines of the past week are way over the top. The idea that a single measurement of the muon’s magnetism could “shake physics to its core”, as claimed in another headline I happened upon, is amusing at best. Physics, and its older subdisciplines, have over time become very difficult to break, or even shake. That’s the way it should be, when science is working properly. And that’s why we can safely base the modern global economy on scientific knowledge; it’s unlikely that a single surprise could instantly invalidate large chunks of its foundation.

Some readers may view the extreme, click-baiting headlines as harmless. Maybe I’m overly concerned about them. But don’t they implicitly suggest that one day we will suddenly find physics “upended”, and in need of a complete top-to-bottom overhaul? To imply physics can “break” so easily makes a mockery of science’s strengths, and obscures the process by which scientific knowledge is obtained. And how can it be good to claim “physics is broken” and “the laws of physics have been broken” over and over and over again, in stories that almost never merit that level of hype and eventually turn out to have been much ado about nada? The constant manufacturing of scientific crisis cannot possibly be lost on readers, who I suspect are becoming increasingly jaded. At some point readers may become as skeptical of science journalism, and the science it describes, as they are of advertising; it’s all lies, so caveat emptor. That’s not where we want our society to be. As we are seeing in spades during the current pandemic, there can be serious consequences when “TRUST IN SCIENCE IS BROKEN!!!”

A final footnote: Ironically, the Standard Model itself poses one of the biggest threats to the framework of QEFT. The discovery of the Higgs boson and nothing else (so far) at the Large Hadron Collider poses a conceptual challenge — the “naturalness” problem. There’s no sharp paradox, which is why I can’t promise you that the framework of QEFT will someday break if it isn’t resolved. But the breakdown of lepton universality might someday help solve the naturalness problem, by requiring a more “natural” extension of the Standard Model, and thus might actually save QEFT instead of “breaking” it.

April 11, 2021

David Hoggstrange binary star system; orbitize!

Sarah Blunt (Caltech) crashed Stars & Exoplanets Meeting today. She told us about her ambitious, community-built orbitize project, and also results on a mysterious binary-star system, HD 104304. This is a directly-imaged binary, but when they took radial-velocity measurements, the mass of the primary is way too high for its color and luminosity. The beauty of orbitize is that it can take heterogeneous data, and it uses brute-force importance sampling (like my one true love The Joker), so she can deal with very non-trivial likelihood functions and low signal-to-noise, sparse data.

The crowd had many reactions, one of which is that probably the main issue is that ESA Gaia is giving a wrong parallax. That's a boring explanation, but it opens a nice question of using the data to infer or predict a distance, which is old-school fundamental astronomy.

April 09, 2021

John BaezThe Expansion of the Universe

We can wait a while to explore the Universe, but we shouldn’t wait too long. If the Universe continues its accelerating expansion as predicted by the usual model of cosmology, it will eventually expand by a factor of 2 every 12 billion years. So if we wait too long, we can’t ever reach a distant galaxy.

In fact, after 150 billion years, all galaxies outside our Local Group will become completely inaccessible, even in principle, by any form of transportation not faster than light!

For an explanation, read this:

• Toby Ord, The edges of our Universe.

This is where I got the table.

150 billion years sounds like a long time, but the smallest stars powered by fusion—the red dwarf stars, which are very plentiful—are expected to last much longer: about 10 trillion years!  So, we can imagine a technologically advanced civilization that has managed to spread over the Local Group and live near red dwarf stars, which eventually regrets that it has waited too long to expand through more of the Universe.  
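To get a feel for these timescales, here is a rough back-of-the-envelope sketch in Python. It assumes the doubling time of 12 billion years applies from today onward, which is a simplification, since the text says the expansion only eventually settles into that rate. The function names are made up for illustration.

```python
import math

# Numbers quoted in the text, in billions of years (Gyr).
DOUBLING_TIME_GYR = 12.0            # Universe expands by a factor of 2 every 12 Gyr
ACCESS_LIMIT_GYR = 150.0            # other galaxies become unreachable after this
RED_DWARF_LIFETIME_GYR = 10_000.0   # 10 trillion years

def doublings(elapsed_gyr: float) -> float:
    """Number of size doublings in the given time (assumes a constant doubling time)."""
    return elapsed_gyr / DOUBLING_TIME_GYR

def log10_expansion(elapsed_gyr: float) -> float:
    """Base-10 logarithm of the expansion factor, to avoid gigantic numbers."""
    return doublings(elapsed_gyr) * math.log10(2)

factor_at_limit = 2 ** doublings(ACCESS_LIMIT_GYR)
print(f"Doublings in 150 Gyr: {doublings(ACCESS_LIMIT_GYR):.1f}")
print(f"Expansion factor in 150 Gyr: about {factor_at_limit:.0f}")
print(f"log10(expansion factor) over a red dwarf lifetime: "
      f"{log10_expansion(RED_DWARF_LIFETIME_GYR):.0f}")
```

Under that assumption, distances grow by a factor of a few thousand during the 150 billion years, and by a factor with roughly 250 digits over the lifetime of a red dwarf.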

The Local Group is a collection of roughly 50 nearby galaxies containing about 2 trillion stars, so there’s certainly plenty to do here. It’s held together by gravity, so it won’t get stretched out by the expansion of the Universe, at least not until its stars slowly “boil off” as some randomly pick up high speeds. But that will happen much, much later: after more than 10 quintillion years, that is, \(10^{19}\) years.

For more, see this article of mine:

• The end of the Universe.

Scott AaronsonJust some prizes

Oded Goldreich is a theoretical computer scientist at the Weizmann Institute in Rehovot, Israel. He’s best known for helping to lay the rigorous foundations of cryptography in the 1980s, through seminal results like the Goldreich-Levin Theorem (every one-way function can be modified to have a hard-core predicate), the Goldreich-Goldwasser-Micali Theorem (every pseudorandom generator can be made into a pseudorandom function), and the Goldreich-Micali-Wigderson protocol for secure multi-party computation. I first met Oded more than 20 years ago, when he lectured at a summer school at the Institute for Advanced Study in Princeton, barefoot and wearing a tank top and what looked like pajama pants. It was a bracing introduction to complexity-theoretic cryptography. Since then, I’ve interacted with Oded from time to time, partly around his firm belief that quantum computing is impossible.

Last month a committee in Israel voted to award Goldreich the Israel Prize (roughly analogous to the US National Medal of Science), for which I’d say Goldreich had been a plausible candidate for decades. But alas, Yoav Gallant, Netanyahu’s Education Minister, then rather non-gallantly blocked the award, solely because he objected to Goldreich’s far-left political views (and apparently because of various statements Goldreich signed, including in support of a boycott of Ariel University, which is in the West Bank). The case went all the way to the Israeli Supreme Court (!), which ruled two days ago in Gallant’s favor: he gets to “delay” the award to investigate the matter further, and in the meantime has apparently sent out invitations for an award ceremony next week that doesn’t include Goldreich. Some are now calling for the other winners to boycott the prize in solidarity until this is righted.

I doubt readers of this blog need convincing that this is a travesty and an embarrassment, a shanda, for the Netanyahu government itself. That I disagree with Goldreich’s far-left views (or might disagree, if I knew in any detail what they were) is totally immaterial to that judgment. In my opinion, not even Goldreich’s belief in the impossibility of quantum computers should affect his eligibility for the prize. 🙂

Maybe it would be better to say that, as far as his academic colleagues in Israel and beyond are concerned, Goldreich has won the Israel Prize; it’s only some irrelevant external agent who’s blocking his receipt of it. Ironically, though, among Goldreich’s many heterodox beliefs is a total rejection of the value of scientific prizes (although Goldreich has also said he wouldn’t refuse the Israel Prize if offered it!).

In unrelated news, the 2020 Turing Award has been given to Al Aho and Jeff Ullman. Aho and Ullman have both been celebrated leaders in CS for half a century, having laid many of the foundations of formal languages and compilers, and having coauthored one of CS’s defining textbooks with John Hopcroft (who already received a different Turing Award).

But again there’s a controversy. Apparently, in 2011, Ullman wrote to an Iranian student who wanted to work with him, saying that as “a matter of principle,” he would not accept Iranian students until the Iranian government recognized Israel. Maybe I should say that I, like Ullman, am both a Jew and a Zionist, but I find it hard to imagine the state of mind that would cause me to hold some hapless student responsible for the misdeeds of their birth-country’s government. Ironically, this is a mirror-image of the tactics that the BDS movement has wielded against Israeli academics. Unlike Goldreich, though, Ullman seems to have gone beyond merely expressing his beliefs, actually turning them into a one-man foreign policy.

I’m proud of the Iranian students I’ve mentored and hope to mentor more. While I don’t think this issue should affect Ullman’s Turing Award (and I haven’t seen anyone claim that it should), I do think it’s appropriate to use the occasion to express our opposition to all forms of discrimination. I fully endorse Shafi Goldwasser’s response in her capacity as Director of the Simons Institute for Theory of Computing in Berkeley:

As a senior member of the computer science community and an American-Israeli, I stand with our Iranian students and scholars and outright reject any notion by which admission, support, or promotion of individuals in academic settings should be impeded by national origin or politics. Individuals should not be conflated with the countries or institutions they come from. Statements and actions to the contrary have no place in our computer science community. Anyone experiencing such behavior will find a committed ally in me.

As for Al Aho? I knew him fifteen years ago, when he became interested in quantum computing, in part due to his then-student Krysta Svore (who’s now the head of Microsoft’s quantum computing efforts). Al struck me as not only a famous scientist but a gentleman who radiated kindness everywhere. I’m not aware of any controversies he’s been involved in and never heard anyone say a bad word about him.

Anyway, this seems like a good occasion to recognize some foundational achievements in computer science, as well as the complex human beings who produce them!

Matt von HippelTheoretical Uncertainty and Uncertain Theory

Yesterday, Fermilab’s Muon g-2 experiment announced a new measurement of the magnetic moment of the muon, a number which describes how muons interact with magnetic fields. For what might seem like a small technical detail, physicists have been very excited about this measurement because it’s a small technical detail that the Standard Model seems to get wrong, making it a potential hint of new undiscovered particles. Quanta magazine has a great piece on the announcement, which explains more than I will here, but the upshot is that there are two different calculations on the market that attempt to predict the magnetic moment of the muon. One of them, using older methods, disagrees with the experiment. The other, with a new approach, agrees. The question then becomes, which calculation was wrong? And why?

What does it mean for a prediction to match an experimental result? The simple, wrong, answer is that the numbers must be equal: if you predict “3”, the experiment has to measure “3”. The reason why this is wrong is that in practice, every experiment and every prediction has some uncertainty. If you’ve taken a college physics class, you’ve run into this kind of uncertainty in one of its simplest forms, measurement uncertainty. Measure with a ruler, and you can only confidently measure down to the smallest divisions on the ruler. If you measure 3cm, but your ruler has ticks only down to a millimeter, then what you’re measuring might be as large as 3.1cm or as small as 2.9 cm. You just don’t know.

This uncertainty doesn’t mean you throw up your hands and give up. Instead, you estimate the effect it can have. You report, not a measurement of 3cm, but of 3cm plus or minus 1mm. If the prediction was 2.9cm, then you’re fine: it falls within your measurement uncertainty.
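As a minimal sketch of that comparison, the ruler example can be written in a few lines of Python. The function name `agrees` is made up for illustration, and lengths are kept in whole millimeters so that floating-point rounding doesn't muddy the comparison.

```python
def agrees(measured_mm: int, uncertainty_mm: int, predicted_mm: int) -> bool:
    """True if the prediction falls within the measurement's plus-or-minus range."""
    return abs(measured_mm - predicted_mm) <= uncertainty_mm

# The example from the text: a measurement of 3 cm plus or minus 1 mm.
print(agrees(measured_mm=30, uncertainty_mm=1, predicted_mm=29))  # prediction of 2.9 cm
print(agrees(measured_mm=30, uncertainty_mm=1, predicted_mm=28))  # prediction of 2.8 cm
```

The first prediction is consistent with the measurement; the second falls outside the quoted range.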

Measurements aren’t the only thing that can be uncertain. Predictions have uncertainty too, theoretical uncertainty. Sometimes, this comes from uncertainty on a previous measurement: if you make a prediction based on that experiment that measured 3cm plus or minus 1mm, you have to take that plus or minus into account and estimate its effect (we call this propagation of errors). Sometimes, the uncertainty comes instead from an approximation you’re making. In particle physics, we sometimes approximate interactions between different particles with diagrams, beginning with the simplest diagrams and adding on more complicated ones as we go. To estimate the uncertainty there, we estimate the size of the diagrams we left out, the more complicated ones we haven’t calculated yet. Other times, that approximation doesn’t work, and we need to use a different approximation, treating space and time as a finite grid where we can do computer simulations. In that case, you can estimate your uncertainty based on how small you made your grid. The new approach to predicting the muon magnetic moment uses that kind of approximation.

There’s a common thread in all of these uncertainty estimates: you don’t expect to be too far off on average. Your measurements won’t be perfect, but they won’t all be screwed up in the same way either: chances are, they will randomly be a little below or a little above the truth. Your calculations are similar: whether you’re ignoring complicated particle physics diagrams or the spacing in a simulated grid, you can treat the difference as something small and random. That randomness means you can use statistics to talk about your errors: you have statistical uncertainty. When you have statistical uncertainty, you can estimate, not just how far off you might get, but how likely it is you ended up that far off. In particle physics, we have very strict standards for this kind of thing: to call something new a discovery, we demand that it is so unlikely that it would only show up randomly under the old theory roughly one in a million times. The muon magnetic moment isn’t quite up to our standards for a discovery yet, but the new measurement brought it closer.

The two dueling predictions for the muon’s magnetic moment both estimate some amount of statistical uncertainty. It’s possible that the two calculations just disagree due to chance, and that better measurements or a tighter simulation grid would make them agree. Given their estimates, though, that’s unlikely. That takes us from the realm of theoretical uncertainty, and into uncertainty about the theoretical. The two calculations use very different approaches. The new calculation tries to compute things from first principles, using the Standard Model directly. The risk is that such a calculation needs to make assumptions, ignoring some effects that are too difficult to calculate, and one of those assumptions may be wrong. The older calculation is based more on experimental results, using different experiments to estimate effects that are hard to calculate but that should be similar between different situations. The risk is that the situations may be less similar than expected, their assumptions breaking down in a way that the bottom-up calculation could catch.

None of these risks are easy to estimate. They’re “unknown unknowns”, or rather, “uncertain uncertainties”. And until some of them are resolved, it won’t be clear whether Fermilab’s new measurement is a sign of undiscovered particles, or just a (challenging!) confirmation of the Standard Model.

ResonaancesWhy is it when something happens it is ALWAYS you, muons?

April 7, 2021 was like a good TV episode: high-speed action, plot twists, and a cliffhanger ending. We now know that the strength of the little magnet inside the muon is described by the g-factor: 

g = 2.00233184122(82).

Any measurement of basic properties of matter is priceless, especially when it comes with this incredible precision. But for a particle physicist the main source of excitement is that this result could herald the breakdown of the Standard Model. The point is that the g-factor or the magnetic moment of an elementary particle can be calculated theoretically to a very good accuracy. Last year, the white paper of the Muon g−2 Theory Initiative came up with the consensus value for the Standard Model prediction

g = 2.00233183620(86),

which is significantly smaller than the experimental value.  The discrepancy is estimated at 4.2 sigma, assuming the theoretical error is Gaussian and combining the errors in quadrature. 
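The 4.2 sigma figure can be checked directly from the two numbers above. The Python sketch below reads the bracketed digits as the uncertainty on the last two decimal places and treats both errors as independent and Gaussian, as the text assumes. The function name `significance` is made up for illustration.

```python
import math
from decimal import Decimal

# Values quoted above; "(82)" means an uncertainty of 82 in the last two digits.
G_EXP, G_EXP_ERR = Decimal("2.00233184122"), Decimal("82e-11")
G_SM, G_SM_ERR = Decimal("2.00233183620"), Decimal("86e-11")

def significance(x: Decimal, x_err: Decimal, y: Decimal, y_err: Decimal) -> float:
    """Difference between two values in units of their combined (quadrature) error."""
    combined_err = math.sqrt(float(x_err) ** 2 + float(y_err) ** 2)
    return float(x - y) / combined_err

n_sigma = significance(G_EXP, G_EXP_ERR, G_SM, G_SM_ERR)
print(f"Difference: {G_EXP - G_SM}")
print(f"Discrepancy: {n_sigma:.1f} sigma")
```

The subtraction is done with `Decimal` so that the tiny difference between two nearly equal numbers is exact; only the final ratio uses floating point.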

As usual, when we see an experiment and the Standard Model disagree, these three things come to mind first:

  1.  Statistical fluctuation. 
  2.  Flawed theory prediction. 
  3.  Experimental screw-up.   

The odds for 1. are extremely low in this case.  3. is not impossible but unlikely as of April 7. Basically the same experiment was repeated twice, first in Brookhaven 20 years ago, and now in Fermilab, yielding very consistent results. One day it would be nice to get an independent confirmation using alternative experimental techniques, but we are not losing any sleep over it. It is fair to say, however,  that 2. is not yet written off by most of the community. The process leading to the Standard Model prediction is of enormous complexity. It combines technically challenging perturbative calculations (5-loop QED!), data-driven methods, and non-perturbative inputs from dispersion relations, phenomenological models, and lattice QCD. One especially difficult contribution to evaluate is due to loops of light hadrons (pions etc.) affecting photon propagation.  In the white paper,  this hadronic vacuum polarization is related by theoretical tricks to low-energy electron scattering and determined from experimental data. However, the currently most precise lattice evaluation of the same quantity gives a larger value that would take the Standard Model prediction closer to the experiment. The lattice paper first appeared a year ago but only now was published in Nature in a well-timed move that can be compared to an ex crashing a wedding party. The theory and experiment are now locked in a three-way duel, and we are waiting for the shootout to see which theoretical prediction survives. Until this controversy is resolved, there will be a cloud of doubt hanging over every interpretation of the muon g-2 anomaly.   

  But let us assume for a moment that the white paper value is correct. This would be huge, as it would mean that the Standard Model does not fully capture how muons interact with light. The correct interaction Lagrangian would have to be (pardon my Greek)

The first term is the renormalizable minimal coupling present in the Standard Model, which gives the Coulomb force and all the usual electromagnetic phenomena. The second term is called the magnetic dipole. It leads to a small shift of the muon g-factor, so as to explain the Brookhaven and Fermilab measurements.  This is a non-renormalizable interaction, and so it must be an effective description of virtual effects of some new particle from beyond the Standard Model. Theorists have invented countless models for this particle in order to address the old Brookhaven measurement, and the Fermilab update changes little in this enterprise. I will write about it another time.  For now, let us just crunch some numbers to highlight one general feature. Even though the scale suppressing the effective dipole operator is in the EeV range, there are indications that the culprit particle is much lighter than that. First, electroweak gauge invariance forces it to be less than ~100 TeV in a rather model-independent way.  Next, in many models contributions to muon g-2 come with the chiral suppression proportional to the muon mass. Moreover, they typically appear at one loop, so the operator will pick up a loop suppression factor unless the new particle is strongly coupled.  The same dipole operator as above can be more suggestively recast as  

The scale 300 GeV appearing in the denominator indicates that the new particle should be around the corner!  Indeed, the discrepancy between the theory and experiment is larger than the contribution of the W and Z bosons to the muon g-2, so it seems logical to put the new particle near the electroweak scale. That's why the stakes of the April 7 Fermilab announcement are so enormous. If the gap between the Standard Model and experiment is real, the new particles and forces responsible for it should be within reach of the present or near-future colliders. This would open a new experimental era that is almost too beautiful to imagine. And for theorists, it would bring new pressing questions about who ordered it. 

April 08, 2021

Doug Natelson"Fireside Chat" about Majoranas

Along with Zeila Zanolli, tomorrow (Friday April 9) I will be serving as a moderator for a "fireside chat" about Majorana fermions being given by Sergey Frolov and Vincent Mourik.   This is being done as a zoom webinar (registration info here), at 11am EDT.   Should be an interesting discussion - about 20 minutes of presentation followed by q & a.  

Tommaso DorigoLooking For A Ph.D. In Physics Or Computer Science? Look No Further

Today the University of Padova has issued a call for Ph. D. positions to start in October 2021, and the Department of Physics and Astronomy has 23 new openings. The English version of the call page is here.


April 07, 2021

Alexey PetrovLook, there is a crack in the Standard Model!

This post is about the result of the Muon g-2 experiment announced today at Fermilab. It has an admittedly bad title: the Standard Model has not cracked in any way, it is still a correct theory that decently describes interactions of known elementary particles at the energies we checked. Yet, maybe we finally found an observable that is not completely described by the Standard Model. Its theoretically computed value needs an additional component to agree with the newly reported experimental value and this component might well be New Physics! What is this observable?

This observable is the value of the anomalous magnetic moment of the muon. The muon, an elementary particle (a lepton), is a close cousin of an electron. It has very similar properties to the electron, but is about 200 times heavier and is unstable — it only lives for about two microseconds. We don’t yet know why Nature chose to create two similar copies of an electron: muon and tau-lepton. But we can study their properties to find out.

Just like an electron, the muon has spin, which makes it susceptible to magnetic fields. How strongly it responds is characterized by its magnetic moment: think of a compass needle as a classical analogy. Nearly a century ago, the brilliant physicist Paul Dirac predicted the value of the electron’s magnetic moment, a prediction that applies directly to the muon as well. It involved a parameter called g, the gyromagnetic ratio or g-factor. Dirac’s prediction was that, for an electron (and a muon), g is exactly 2. This was one of the predictions that allowed experimentalists to test the validity of Dirac’s theory, which eventually led to its triumph.

With further development of quantum field theories, it was realized that g is not exactly two. The photon of the magnetic field probing the muon can hit virtual particles instead of the muon itself, changing the value of the g-factor. Now, dealing with virtual particles can be tricky in theoretical computations, as their effects lead to unphysical infinities that need to be absorbed into the definitions of the muon’s mass, charge, and wave function. But the leading effect of such particles (assuming only the Standard Model particles) turns out to be finite! Julian Schwinger showed that in his 1948 paper. This result was so influential at the time that it is literally engraved on his tombstone! It paved the way for computing the quantum radiative corrections to the muon’s magnetic moment. Since such radiative corrections change the magnetic moment, they produce a deviation from the prediction of Dirac’s theory and a non-zero value of a = (g-2)/2, which is conventionally referred to as the anomalous magnetic moment. This is the quantity that the Muon g-2 collaboration measured with such precision.
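To see how much of the anomaly the leading correction already captures, here is a small Python sketch. Schwinger's result is a = α/(2π), where α is the fine-structure constant. The value of α used below (about 1/137.036) is an assumption brought in from outside this post, and the measured value is the one quoted later in the post.

```python
import math

# Assumed input, not from this post: the fine-structure constant.
ALPHA = 1 / 137.035999

# Measured anomalous magnetic moment quoted later in this post.
A_MEASURED = 0.00116592061

# Schwinger's leading-order correction.
a_schwinger = ALPHA / (2 * math.pi)

# Invert a = (g - 2) / 2 to get the g-factor at this order.
g_leading = 2 + 2 * a_schwinger

print(f"Leading-order a: {a_schwinger:.8f}")
print(f"Leading-order g: {g_leading:.6f}")
print(f"Fraction of the measured a: {a_schwinger / A_MEASURED:.4f}")
```

The one-loop term alone accounts for more than 99% of the measured value; all the hard work described in the rest of this post goes into the remaining fraction of a percent.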

Why is it interesting? The thing is that among the known virtual particles there could also be new, unknown particles. If those particles interact with the photons, they could also affect the numerical value of the anomalous magnetic moment. So the idea is simple: compute it with as much precision as possible and then compare it to the measurement that is done with the best precision possible. This is precisely what was done.

Easily said but not so easily done. Precise predictions of the anomalous magnetic moment involve computations of thousands of Feynman diagrams and evaluation of contributions that cannot be computed by expanding in some small quantity (aka non-perturbative effects). There are many theoretical methods used to compute those, including numerical computations in lattice QCD. But there is now an agreement among the theorists on the anomalous magnetic moment of the muon: a = 0.00116591810(43) (see here for a paper). This number is known with astonishing precision, as indicated by the bracketed numbers, which give the uncertainty in the last two digits.

The experimental analysis is incredibly hard. Since muons decay, the measurement of their properties is not trivial. Muons are produced in the decays of other particles, called pions, that are created at Fermilab by smashing accelerated protons into targets. Once created, they are directed into a storage ring, where they decay in a magnetic field, giving out their spin information. The storage ring contains about 10,000 muons at a time. To make the measurement, it is important to know the magnetic field in which those muons are moving with incredible precision. There is also an electric field that helps keep the muons going around the ring, whose effect is carefully removed by choosing how fast the muons fly. If all those (and other) effects are not accounted for, they would affect the result of the measurement! Their combined effect is usually referred to as systematic uncertainties. Most of the work done by the Muon g-2 collaboration at Fermilab was to reduce such effects, which eventually brought those systematic uncertainties down to an acceptable level.

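And here is the result (drum roll): the anomalous magnetic moment measured by the Muon g-2 collaboration is a=0.00116592040(54). Averaged with the earlier Brookhaven measurement, the experimental value becomes a=0.00116592061(41).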

Ok, what does it all mean? At first glance, the result differs only ever so slightly from the theoretical prediction. But the difference is not insignificant. What is more interesting is that if one combines this new result with the old result from the Brookhaven National Lab, one gets a very significant difference between the theoretical prediction and the combined result of the two measurements: about 4.2 sigma. Sigmas measure the statistical significance of the result: 4.2 sigma means that the chance of a difference this large arising from statistical fluctuations alone, if theory and experiment actually agreed, is about 1 in 40,000! This is incredible!
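For readers who want to see where the 4.2 sigma comes from, here is a minimal sketch. It takes the theory value and the experimental value 0.00116592061(41) quoted above, and assumes the two uncertainties are independent and Gaussian, so they add in quadrature. The function names are mine, not anything from the collaboration's analysis.

```python
import math

# Values quoted in the text, in units of 1e-11; the second number of each
# pair is the bracketed uncertainty on the last two digits.
A_THEORY, SIGMA_THEORY = 116_591_810, 43
A_EXPERIMENT, SIGMA_EXPERIMENT = 116_592_061, 41


def significance(x1, s1, x2, s2):
    """Difference between two values in units of their combined uncertainty.

    Assumes independent Gaussian errors, so they are added in quadrature.
    """
    return abs(x1 - x2) / math.hypot(s1, s2)


def two_sided_p_value(z):
    """Chance of a Gaussian fluctuation of at least z sigma in either direction."""
    return math.erfc(z / math.sqrt(2))


z = significance(A_EXPERIMENT, SIGMA_EXPERIMENT, A_THEORY, SIGMA_THEORY)
p = two_sided_p_value(z)
print(f"{z:.1f} sigma, roughly 1 chance in {1 / p:,.0f}")
```

The "1 in 40,000" figure corresponds to counting fluctuations in both directions; a one-sided convention would halve the probability.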

The result might mean that there are particles that are not described by the Standard Model and the New Physics could be just around the corner! Come back here for more discoveries!

Tommaso Dorigo: New Muon G-2 Results!

Note: this is an updated version of the article. For the original discussion of the muon anomaly, published before the release of results, please scroll down.


April 05, 2021

Scott Aaronson: The Computational Expressiveness of a Model Train Set: A Paperlet

Update (April 5, 2021): So it turns out that Adam Chalcraft and Michael Greene already proved the essential result of this post back in 1994 (hat tip to commenter Dylan). Not terribly surprising in retrospect!

My son Daniel had his fourth birthday a couple weeks ago. For a present, he got an electric train set. (For completeness—and since the details of the train set will be rather important to the post—it’s called “WESPREX Create a Dinosaur Track”, but this is not an ad and I’m not getting a kickback for it.)

As you can see, the main feature of this set is a Y-shaped junction, which has a flap that can control which direction the train goes. The logic is as follows:

  • If the train is coming up from the “bottom” of the Y, then it continues to either the left arm or the right arm, depending on where the flap is. It leaves the flap as it was.
  • If the train is coming down the left or right arms of the Y, then it continues to the bottom of the Y, pushing the flap out of its way if it’s in the way. (Thus, if the train were ever to return to this Y-junction coming up from the bottom, not having passed the junction in the interim, it would necessarily go to the same arm, left or right, that it came down from.)
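The two rules above can be written down as a tiny state machine. This is only a sketch of the logic as described: the class name, the port labels "bottom", "left" and "right", and the method name are my own choices for illustration.

```python
class YJunction:
    """One Y-junction; the flap position ("left" or "right") is its single bit."""

    def __init__(self, flap="left"):
        if flap not in ("left", "right"):
            raise ValueError("flap must be 'left' or 'right'")
        self.flap = flap

    def pass_through(self, entry):
        """Return the port the train leaves by, updating the flap if it is pushed.

        entry is the port the train arrives at: "bottom", "left" or "right".
        """
        if entry == "bottom":
            # The train "reads" the bit: it follows the flap and leaves it alone.
            return self.flap
        if entry in ("left", "right"):
            # The train "writes" the bit: coming down an arm, it pushes the flap
            # aside, so the flap now points to the arm the train came from.
            self.flap = entry
            return "bottom"
        raise ValueError("entry must be 'bottom', 'left' or 'right'")


junction = YJunction(flap="left")
print(junction.pass_through("bottom"))  # follows the flap
print(junction.pass_through("right"))   # pushes the flap over to "right"
print(junction.pass_through("bottom"))  # now goes back up the arm it came down
```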

The train set also comes with bridges and tunnels; thus, there’s no restriction of planarity. Finally, the train set comes with little gadgets that can reverse the train’s direction, sending it back in the direction that it came from:

These gadgets don’t seem particularly important, though, since we could always replace them if we wanted by a Y-junction together with a loop.

Notice that, at each Y-junction, the position of the flap stores one bit of internal state, and that the train can both “read” and “write” these bits as it moves around. Thus, a question naturally arises: can this train set do any nontrivial computations? If there are n Y-junctions, then can it cycle through exp(n) different states? Could it even solve PSPACE-complete problems, if we let it run for exponential time? (For a very different example of a model-train-like system that, as it turns out, is able to express PSPACE-complete problems, see this recent paper by Erik Demaine et al.)

Whatever the answers regarding Daniel’s train set, I knew immediately on watching the thing go that I’d have to write a “paperlet” on the problem and publish it on my blog (no, I don’t inflict such things on journals!). Today’s post constitutes my third “paperlet,” on the general theme of a discrete dynamical system that someone showed me in real life (e.g. in a children’s toy or in biology) having more structure and regularity than one might naïvely expect. My first such paperlet, from 2014, was on a 1960s toy called the Digi-Comp II; my second, from 2016, was on DNA strings acted on by recombinase (OK, that one was associated with a paper in Science, but my combinatorial analysis wasn’t the main point of the paper).

Anyway, after spending an enjoyable evening on the problem of Daniel’s train set, I was able to prove that, alas, the possible behaviors are quite limited (I classified them all), falling far short of computational universality.

If you feel like I’m wasting your time with trivialities (or if you simply enjoy puzzles), then before you read any further, I encourage you to stop and try to prove this for yourself!

Back yet? OK then…

Theorem: Assume a finite amount of train track. Then after a linear amount of time, the train will necessarily enter a “boring infinite loop”—i.e., an attractor state in which at most two of the flaps keep getting toggled, and the rest of the flaps are fixed in place. In more detail, the attractor must take one of four forms:

I. a line (with reversing gadgets on both ends),
II. a simple cycle,
III. a “lollipop” (with one reversing gadget and one flap that keeps getting toggled), or
IV. a “dumbbell” (with two flaps that keep getting toggled).

In more detail still, there are seven possible topologically distinct trajectories for the train, as shown in the figure below.

Here the red paths represent the attractors, where the train loops around and around for an unlimited amount of time, while the blue paths represent “runways” where the train spends a limited amount of time on its way into the attractor. Every degree-3 vertex is assumed to have a Y-junction, while every degree-1 vertex is assumed to have a reversing gadget, unless (in IIb) the train starts at that vertex and never returns to it.

The proof of the theorem rests on two simple observations.

Observation 1: While the Y-junctions correspond to vertices of degree 3, there are no vertices of degree 4 or higher. This means that, if the train ever revisits a vertex v (other than the start vertex) for a second time, then there must be some edge e incident to v that it also traverses for a second time immediately afterward.

Observation 2: Suppose the train traverses some edge e, then goes around a simple cycle (meaning, one where no edges or vertices are reused), and then traverses e again, going in the same direction as the first time. Then from that point forward, the train will just continue around the same simple cycle forever.

The proof of Observation 2 is simply that, if there were any flap that might be in the train’s way as it continued around the simple cycle, then the train would already have pushed it out of the way its first time around the cycle, and nothing that happened thereafter could possibly change the flap’s position.

Using the two observations above, let’s now prove the theorem. Let the train start where it will, and follow it as it traces out a path. Since the graph is finite, at some point some already-traversed edge must be traversed a second time. Let e be the first such edge. By Observation 1, this will also be the first time the train’s path intersects itself at all. There are then three cases:

Case 1: The train traverses e in the same direction as it did the first time. By Observation 2, the train is now stuck in a simple cycle forever after. So the only question is what the train could’ve done before entering the simple cycle. We claim that at most, it could’ve traversed a simple path. For otherwise, we’d contradict the assumption that e was the first edge that the train visited twice on its journey. So the trajectory must have type IIa, IIb, or IIc in the figure.

Case 2: Immediately after traversing e, the train hits a reversing gadget and traverses e again the other way. In this case, the train will clearly retrace its entire path and then continue past its starting point; the question is what happens next. If it hits another reversing gadget, then the trajectory will have type I in the figure. If it enters a simple cycle and stays in it, then the trajectory will have type IIb in the figure. If, finally, it makes a simple cycle and then exits the cycle, then the trajectory will have type III in the figure. In this last case, the train’s trajectory will form a “lollipop” shape. Note that there must be a Y-junction where the “stick” of the lollipop meets the “candy” (i.e., the simple cycle), with the base of the Y aligned with the stick (since otherwise the train would’ve continued around and around the candy). From this, we deduce that every time the train goes around the candy, it does so in a different orientation (clockwise or counterclockwise) than the time before; and that the train toggles the Y-junction’s flap every time it exits the candy (although not when it enters the candy).
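The lollipop behavior is easy to check by hand, but here is a small simulation sketch anyway. It assumes the layout described in Case 2: a stick ending in a reversing gadget, and a Y-junction whose base is the stick and whose two arms are joined by the candy loop. Which direction gets called "clockwise" is an arbitrary labeling on my part.

```python
def simulate_lollipop(laps, flap="left"):
    """Follow the train for `laps` trips around the candy of a lollipop.

    Returns one (orientation, flap_after_exit) pair per trip.
    """
    other_arm = {"left": "right", "right": "left"}
    history = []
    for _ in range(laps):
        # Coming up the stick (the base of the Y), the train follows the flap
        # into one arm; the flap is not moved on entry.
        entry_arm = flap
        orientation = "clockwise" if entry_arm == "left" else "counterclockwise"
        # After circling the candy it comes down the other arm and pushes the
        # flap over to that arm on its way out.
        flap = other_arm[entry_arm]
        history.append((orientation, flap))
        # The train then runs down the stick, hits the reversing gadget and
        # returns to the base of the Y for the next trip.
    return history


for orientation, flap_after in simulate_lollipop(4):
    print(orientation, "-> flap now", flap_after)
```

The orientation alternates on every trip and the flap is toggled once per exit, which is exactly the single toggling flap of attractor type III.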

Case 3: At some point after traversing e in the forward direction (but not immediately after), the train traverses e in the reverse direction. In this case, the broad picture is analogous to Case 2. So far, the train has made a lollipop with a Y-junction connecting the stick to the candy (i.e. cycle), the base of the Y aligned with the stick, and e at the very top of the stick. The question is what happens next. If the train next hits a reversing gadget, the trajectory will have type III in the figure. If it enters a new simple cycle, disjoint from the first cycle, and never leaves it, the trajectory will have type IId in the figure. If it enters a new simple cycle, disjoint from the first cycle, and does leave it, then the trajectory now has a “dumbbell” pattern, type IV in the figure (also shown in the first video). There’s only one other situation to worry about: namely, that the train makes a new cycle that intersects the first cycle, forming a “theta” (θ) shaped trajectory. In this case, there must be a Y-junction at the point where the new cycle bumps into the old cycle. Now, if the base of the Y isn’t part of the old cycle, then the train never could’ve made it all the way around the old cycle in the first place (it would’ve exited the old cycle at this Y-junction), contradiction. If the base of the Y is part of the old cycle, then the flap must have been initially set to let the train make it all the way around the old cycle; when the train then reenters the old cycle, the flap must be moved so that the train will never make it all the way around the old cycle again. So now the train is stuck in a new simple cycle (sharing some edges with the old cycle), and the trajectory has type IIc in the figure.

This completes the proof of the theorem.

We might wonder: why isn’t this model train set capable of universal computation, of AND, OR, and NOT gates—or at any rate, of some computation more interesting than repeatedly toggling one or two flaps? My answer might sound tautological: it’s simply that the logic of the Y-junctions is too limited. Yes, the flaps can get pushed out of the way—that’s a “bit flip”—but every time such a flip happens, it helps to set up a “groove” in which the train just wants to continue around and around forever, not flipping any additional bits, with only the minor complications of the lollipop and dumbbell structures to deal with. Even though my proof of the theorem might’ve seemed like a tedious case analysis, it had this as its unifying message.

It’s interesting to think about what gadgets would need to be added to the train set to make it computationally universal, or at least expressively richer—able, as turned out to be the case for the Digi-Comp II, to express some nontrivial complexity class falling short of P. So for example, what if we had degree-4 vertices, with little turnstile gadgets? Or multiple trains, which could be synchronized to the millisecond to control how they interacted with each other via the flaps, or which could even crash into each other? I look forward to reading your ideas in the comment section!

For the truth is this: quantum complexity classes, BosonSampling, closed timelike curves, circuit complexity in black holes and AdS/CFT, etc. etc.—all these topics are great, but the same models and problems do get stale after a while. I aspire for my research agenda to chug forward, full steam ahead, into new computational domains.

PS. Happy Easter to those who celebrate!

April 04, 2021

John Baez: The Koide Formula

There are three charged leptons: the electron, the muon and the tau. Let m_e, m_\mu and m_\tau be their masses. Then the Koide formula says

\displaystyle{ \frac{m_e + m_\mu + m_\tau}{\big(\sqrt{m_e} + \sqrt{m_\mu} + \sqrt{m_\tau}\big)^2} = \frac{2}{3} }

There’s no known reason for this formula to be true! But if you plug in the experimentally measured values of the electron, muon and tau masses, it’s accurate within the current experimental error bars:

\displaystyle{ \frac{m_e + m_\mu + m_\tau}{\big(\sqrt{m_e} + \sqrt{m_\mu} + \sqrt{m_\tau}\big)^2} = 0.666661 \pm 0.000007 }

Is this significant or just a coincidence? Will it fall apart when we measure the masses more accurately? Nobody knows.

Here’s something fun, though:

Puzzle. Show that no matter what the electron, muon and tau masses might be—that is, any positive numbers whatsoever—we must have

\displaystyle{ \frac{1}{3} \le \frac{m_e + m_\mu + m_\tau}{\big(\sqrt{m_e} + \sqrt{m_\mu} + \sqrt{m_\tau}\big)^2} \le 1}

For some reason this ratio turns out to be almost exactly halfway between the lower bound and upper bound!

Koide came up with his formula in 1982 before the tau’s mass was measured very accurately.  At the time, using the observed electron and muon masses, his formula predicted the tau’s mass was

m_\tau = 1776.97 MeV/c^2

while the observed mass was

m_\tau = 1784.2 ± 3.2 MeV/c^2

Not very good.

In 1992 the tau’s mass was measured much more accurately and found to be

m_\tau = 1776.99 ± 0.28 MeV/c^2

Much better!
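If you want to play with the numbers yourself, here is a short sketch. The mass values are recent central values as I remember them from the Particle Data Group tables, so treat them as assumptions. The prediction treats the Koide formula as a quadratic equation in the square root of the tau mass and takes the larger root; the function names are just for illustration.

```python
import math

# Charged lepton masses in MeV/c^2 (assumed central values, error bars ignored).
M_E, M_MU, M_TAU = 0.51099895, 105.6583755, 1776.86


def koide_ratio(m1, m2, m3):
    """(m1 + m2 + m3) / (sqrt(m1) + sqrt(m2) + sqrt(m3))^2."""
    return (m1 + m2 + m3) / (math.sqrt(m1) + math.sqrt(m2) + math.sqrt(m3)) ** 2


def predicted_tau_mass(m_e, m_mu):
    """Solve koide_ratio(m_e, m_mu, m_tau) = 2/3 for m_tau.

    With x = sqrt(m_tau), s = sqrt(m_e) + sqrt(m_mu) and p = m_e + m_mu, the
    formula becomes x^2 - 4 s x + (3 p - 2 s^2) = 0. The larger root is the
    physically relevant one.
    """
    s = math.sqrt(m_e) + math.sqrt(m_mu)
    p = m_e + m_mu
    x = 2 * s + math.sqrt(6 * s * s - 3 * p)
    return x * x


print(f"Koide ratio: {koide_ratio(M_E, M_MU, M_TAU):.6f}")
print(f"Predicted tau mass: {predicted_tau_mass(M_E, M_MU):.2f} MeV/c^2")
```

The quadratic also has a smaller root, corresponding to a "tau" of only a few MeV, which is clearly not the particle we observe.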

Koide has some more recent thoughts about his formula:

• Yoshio Koide, What physics does the charged lepton mass relation tell us?, 2018.

He points out how difficult it is to explain a formula like this, given how masses depend on an energy scale in quantum field theory.

April 02, 2021

Matt von Hippel: Is Outreach for Everyone?

Betteridge’s law applies here: the answer is “no”. It’s a subtle “no”, though.

As a scientist, you will always need to be able to communicate your work. Most of the time you can get away with papers and talks aimed at your peers. But the longer you mean to stick around, the more often you will have to justify yourself to others: to departments, to universities, and to grant agencies. A scientist cannot survive on scientific ability alone: to get jobs, to get funding, to survive, you need to be able to promote yourself, at least a little.

Self-promotion isn’t outreach, though. Talking to the public, or to journalists, is a different skill from talking to other academics or writing grants. And it’s entirely possible to go through an entire scientific career without exercising that skill.

That’s a reassuring message for some. I’ve met people for whom science is a refuge from the mess of human interaction, people horrified by the thought of fame or even being mentioned in a newspaper. When I meet these people, they sometimes seem to worry that I’m silently judging them, thinking that they’re ignoring their responsibilities by avoiding outreach. They think this in part because the field seems to be going in that direction. Grants that used to focus just on science have added outreach as a requirement, demanding that each application come with a plan for some outreach project.

I can’t guarantee that more grants won’t add outreach requirements. But I can say at least that I’m on your side here: I don’t think you should have to do outreach if you don’t want to. I don’t think you have to, just yet. And I think if grant agencies are sensible, they’ll find a way to encourage outreach without making it mandatory.

I think that overall, collectively, we have a responsibility to do outreach. Beyond the old arguments about justifying ourselves to taxpayers, we also just ought to be open about what we do. In a world where people are actively curious about us, we ought to encourage and nurture that curiosity. I don’t think this is unique to science, I think it’s something every industry, every hobby, and every community should foster. But in each case, I think that communication should be done by people who want to do it, not forced on every member.

I also think that, potentially, anyone can do outreach. Outreach can take different forms for different people, anything from speaking to high school students to talking to journalists to writing answers for Stack Exchange. I don’t think anyone should feel afraid of outreach because they think they won’t be good enough. Chances are, you know something other people don’t: I guarantee if you want to, you will have something worth saying.

April 01, 2021

Resonaances: April Fools'21: Trouble with g-2

On April 7, the g-2 experiment at Fermilab was supposed to reveal their new measurement of the magnetic moment of the muon.  *Was*, because the announcement may be delayed for the most bizarre reason. You may have heard that the data are blinded to avoid biasing the outcome. This is now standard practice, but the g-2 collaboration went further: they are unable to unblind the data by themselves, to make sure that there are no leaks or temptations. Instead, the unblinding procedure requires an input from an external person, who is one of the Fermilab theorists. How does this work? The experiment measures the frequency of precession of antimuons circulating in a ring. From that and the known magnetic field, the sought fundamental quantity - the magnetic moment of the muon, or g-2 for short - can be read off.  However, the whole analysis chain is performed using a randomly chosen number instead of the true clock frequency. Only at the very end, once all statistical and systematic errors are determined, is the true frequency inserted and the final result uncovered. For that last step they need to type the secret code into this machine looking like something from a 60s movie: 

The code was picked by the Fermilab theorist, and he is the only person to know it.  There is the rub... this theorist now refuses to give away the code.  It is not clear why. One time he said he had forgotten the envelope with the code on a train, another time he said the dog had eaten it. For the last few days he has locked himself in his home and completely stopped taking any calls. 

The situation is critical. PhD students from the collaboration are working round the clock to crack the code. They are basically trying out all possible combinations, but the process is painstakingly slow and may take months, delaying the long-expected announcement.  The collaboration even got permission from the Fermilab director to search the office of the said theorist.  But they only found this piece of paper behind the bookshelf: 

It may be that the paper holds a clue about the code. If you have any idea what the code may be, please email or just write it in the comments below. 

Update: a part of this post (but strangely enough not all) is an April Fools joke. The new g-2 results are going to be presented on April 7, 2021, as planned.  The code is OPE, which stands for "operator product expansion", which is an  important technique used in the theoretical calculation of hadronic corrections to muon g-2: 

Tommaso Dorigo: Remembering Marni

The remains of what probably was Marni Dee Sheppeard were found last week in the mountains of the west coast of New Zealand, near Otira. Although a positive identification is still pending, this does seem to mean that Marni has died while hiking in her beloved mountains, some time between mid November and December of last year.


Jordan Ellenberg: Passover food

For the second year in a row, Seder was virtual, with Dr. Mrs. Q’s family the first night and mine the second, so for the second year in a row I cooked our own Passover meal. After a whole cycle of chagim I’m pretty OK at making brisket by now. I like matzah balls, really like them, and so does everybody else in the house, so I went nuts and tripled the matzah ball recipe. What I envisioned: matzah ball soup absolutely brimming with matzah balls cheek by unleavened jowl. What I did not take into account: the matzah balls absorb soup as they cook, which meant that what was left in the pot when they were done was a kind of matzah ball slurry with no soup at all (pictured, top right). The matzah balls themselves were moist and delicious! But it didn’t really feel right. The next night I made a whole new soup and plopped the leftover matzah balls in there (you make triple, you have leftovers) and that was much more like it.

What we usually do on Passover is visit Dr. Mrs. Q’s mom in Columbus, and get a big helping of deli from Katzinger’s; missing that, I put in a mail order to Katz’s, which arrived today. So now, the brisket leftovers all gone, I am happily eating pastrami and tongue and matzah-chopped liver sandwiches.

(photo by CJ who never ceases to clown me about how much better his phone’s camera is than my phone’s camera.)

March 31, 2021

Doug Natelson: Amazingly good harmonic oscillators

One way that we judge the "quality" of a harmonic oscillator is by how long it takes to ring down.  A truly perfect, lossless harmonic oscillator would ring forever, so that's the limiting ideal.  If you ding a tuning fork, it will oscillate about 1000 times before its energy falls by a factor of around \(\exp(-2\pi) \approx 1/535\).  That means that its quality factor, \(Q\), is about 1000.  (An ideal, lossless harmonic oscillator would have \(Q = \infty\).)  In contrast, if you ding the side of a coffee mug, the sound dies out almost immediately - it doesn't seem bell-like at all, because it has a much lower \(Q\), something like 10-50.  The quality is limited by damping, and in a mechanical system this is the lossy frictional process that, in the simplest treatment, acts on the moving parts of the oscillator with a force proportional to the speed of the motion.  That damping can be from air resistance, or in the case of the coffee mug example, it's dominated by "internal friction".

So, how good of a mechanical oscillator can we make?  This paper on the arxiv last night shows a truly remarkable (to me, anyway) example, where \(Q \sim 10^{8}\) in vacuum.  The oscillators in question are nanofabricated (drumhead-like) membranes of silicon nitride, with resonant frequencies of about 300 kHz.  To put this in perspective, if a typical 1 kHz tuning fork had the same product of \(Q\) and frequency, it would take \(3 \times 10^{10}\) oscillations, or \(3 \times 10^{7}\) seconds (about a year), for its energy content to ring down by that 1/535 factor.  The product of \(Q\) and frequency is so high, it should be possible to do quantum mechanics experiments with these resonators at room temperature.  
A relevant ad from a favorite book.
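As a sanity check on the ring-down arithmetic, here is a small sketch that uses only the definition above: the energy falls by \(\exp(-2\pi)\) after \(Q\) oscillations, so the ring-down time is \(Q\) cycles divided by the frequency. The membrane numbers are the ones quoted; the function and variable names are mine.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def ringdown_time(q, frequency_hz):
    """Seconds for the energy to fall by exp(-2*pi): Q cycles at this frequency."""
    return q / frequency_hz


# Silicon nitride membrane, numbers as quoted in the text.
membrane_q, membrane_f = 1e8, 300e3
qf_product = membrane_q * membrane_f            # Q times frequency, in Hz

# Hypothetical 1 kHz tuning fork with the same Q-frequency product.
fork_f = 1e3
fork_q = qf_product / fork_f                    # number of cycles to ring down
fork_time = ringdown_time(fork_q, fork_f)       # seconds

print(f"energy factor per Q cycles: 1/{math.exp(2 * math.pi):.0f}")
print(f"membrane ring-down: {ringdown_time(membrane_q, membrane_f):.0f} s")
print(f"fork: {fork_q:.1e} cycles, {fork_time:.1e} s, "
      f"{fork_time / SECONDS_PER_YEAR:.2f} years")
```

Note that the number of cycles and the number of seconds differ by the factor of the frequency, which is a thousand for a 1 kHz fork.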

That's impressive, but it's even more so if you know a bit about internal friction in most solids, especially amorphous ones like silicon nitride.  If you made a similar design out of ordinary silicon dioxide glass, it would have a \(Q\) at room temperature of maybe 1000.  About 15 years ago, it was discovered that there is something special about silicon nitride, so that when it is stretched into a state of high tensile stress, its internal friction falls dramatically.  This actually shows a failure of the widely used tunneling two-level system model for glasses.  The investigators in the present work have taken this to a new extreme, and it could really pave the way for some very exciting work in mechanical devices operating in the quantum regime.  

update:  In resonators made from silicon nitride beams with specially engineered clamping geometries, you can do even better.  How about the equivalent of a guitar string that takes 30000 years to ring down?  “Listen to that sustain!”

March 30, 2021

Resonaances: Death of a forgotten anomaly

Anomalies come with a big splash, but often go down quietly. A recent ATLAS measurement, just posted on arXiv, killed a long-standing and by now almost forgotten anomaly from the LEP collider.  LEP was an electron-positron collider operating some time in the late Holocene. Its most important legacy is the very precise measurements of the interaction strength between the Z boson and matter, which to this day are unmatched in accuracy. In the second stage of the experiment, called LEP-2, the collision energy was gradually raised to about 200 GeV, so that pairs of W bosons could be produced. The experiment was able to measure the branching fractions for W decays into electrons, muons, and tau leptons.  These are precisely predicted by the Standard Model: they should be equal to 10.8%, independently of the flavor of the lepton (up to a very small correction due to the lepton masses).  However, LEP-2 found 

Br(W → τν)/Br(W → eν) = 1.070 ± 0.029,     Br(W → τν)/Br(W → μν) = 1.076 ± 0.028.

While the decays to electrons and muons conformed very well to the Standard Model predictions, there was a 2.8 sigma excess in the tau channel. The question was whether it was simply a statistical fluctuation or new physics violating the Standard Model's sacred principle of lepton flavor universality. The ratio Br(W → τν)/Br(W → eν) was later measured at the Tevatron without finding any excess; however, the errors were larger. More recently, there have been hints of large lepton flavor universality violation in B-meson decays, so it was not completely crazy to think that the LEP-2 excess was a part of the same story.  

The solution came 20 years after LEP-2: there is no large violation of lepton flavor universality in W boson decays. The LHC has already produced hundreds of millions of top quarks, and each of them (as far as we know) creates a W boson in the process of its disintegration. ATLAS used this big sample to compare the W boson decay rate to taus and to muons. Their result: 

Br(W → τν)/Br(W → μν) = 0.992 ± 0.013.

There is not the slightest hint of an excess here. But what is most impressive is that the error is smaller, by more than a factor of two, than in LEP-2! After the W boson mass, this is another precision measurement where a dirty hadron collider environment achieves a better accuracy than an electron-positron machine. 
Yes, more of that :)   
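To make the comparison concrete, here is a quick sketch that turns each quoted ratio into a "pull", i.e. its distance from the lepton-universal value of 1 in units of its own error. It treats each ratio on its own with a Gaussian error, so it will not reproduce the 2.8 sigma of the combined tau excess, which accounts for correlations between the two LEP-2 ratios. The dictionary keys and function name are mine.

```python
def pull(value, sigma, expected=1.0):
    """Deviation from the expected value in units of the quoted uncertainty."""
    return (value - expected) / sigma


# (central value, uncertainty) for the ratios of branching fractions quoted above.
measurements = {
    "LEP-2 tau/e": (1.070, 0.029),
    "LEP-2 tau/mu": (1.076, 0.028),
    "ATLAS tau/mu": (0.992, 0.013),
}

for name, (value, sigma) in measurements.items():
    print(f"{name}: {pull(value, sigma):+.1f} sigma")

# How much smaller the ATLAS error is than the LEP-2 error on the same ratio.
error_ratio = measurements["LEP-2 tau/mu"][1] / measurements["ATLAS tau/mu"][1]
print(f"LEP-2 / ATLAS error on tau/mu: {error_ratio:.2f}")
```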

Thanks to the ATLAS measurement, our knowledge of the W boson couplings has increased substantially, as shown in the picture (errors are 1 sigma): 

The current uncertainty is a few per mille. This is still worse than for the Z boson couplings to leptons, in which case the accuracy is better than per mille, but we're getting there... Within the present accuracy, the W boson couplings to all leptons are consistent with the Standard Model prediction, and with lepton flavor universality in particular. Some tensions appearing in earlier global fits are all gone. The Standard Model wins again, nothing to see here, we can move on to the next anomaly. 

Resonaances: Hail the XENON excess

Where were we...  It's been years since particle physics last made an exciting headline. The result announced today by the XENON collaboration is a welcome breath of fresh air. It's too early to say whether it heralds a real breakthrough, or whether it's another bubble to be burst. But it certainly gives food for thought for particle theorists, enough to keep hep-ph going for the next few months.

The XENON collaboration was operating a 1-ton xenon detector in an underground lab in Italy. Originally, this line of experiments was devised to search for hypothetical heavy particles constituting dark matter, so-called WIMPs. For that they offer a basically background-free environment, where a signal of dark matter colliding with xenon nuclei would stand out like a lighthouse. However, all WIMP searches so far have returned zero, null, and nada. Partly out of boredom and despair, the xenon-based collaborations began thinking out-of-the-box to find out what else their shiny instruments could be good for. One idea was to search for axions. These are hypothetical superlight and superweakly interacting particles, originally devised to plug a certain theoretical hole in the Standard Model of particle physics. If they exist, they should be copiously produced in the core of the Sun with energies of order a keV. This is too little to perceptibly knock an atomic nucleus, as xenon weighs over a hundred GeV. However, many variants of the axion scenario, in particular the popular DFSZ model, predict axions interacting with electrons. Then a keV axion may occasionally hit the cloud of electrons orbiting xenon atoms, sending one to an excited level or ionizing the atom. These electron-recoil events can be identified principally by the ratio of ionization and scintillation signals, which is totally different than for WIMP-like nuclear recoils. This is no longer a background-free search, as radioactive isotopes present inside the detector may lead to the same signal. Therefore the collaboration has to search for a peak of electron-recoil events at keV energies.     

This is what they saw in the XENON1T data:
Energy spectrum of electron-recoil events measured by the XENON1T experiment. 
The expected background is approximately flat from 30 keV down to the detection threshold at 1 keV, below which it falls off abruptly. On the other hand, the data seem to show a signal component growing towards low energies, and possibly peaking at 1-2 keV. Concentrating on the 1-7 keV range (so with a bit of cherry-picking), 285 events are observed in the data compared to an expected 232 events from the background-only fit. In purely statistical terms, this is a 3.5 sigma excess.
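The 3.5 sigma can be reproduced with a back-of-the-envelope counting estimate. The sketch below assumes simple Poisson statistics, with the fluctuation of the background taken as the square root of the expected count and no uncertainty on the background model itself. It is not the collaboration's likelihood analysis, and the names are mine.

```python
import math


def counting_significance(observed, expected_background):
    """Naive excess significance: (observed - expected) / sqrt(expected)."""
    return (observed - expected_background) / math.sqrt(expected_background)


# Event counts in the 1-7 keV range, as quoted in the text.
excess_sigma = counting_significance(observed=285, expected_background=232)
print(f"{excess_sigma:.1f} sigma")
```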

Assuming it's new physics, what does this mean? XENON shows that there is a flux of light relativistic particles arriving into their detector.  The peak of the excess corresponds to the temperature in the core of the Sun (15 million kelvin = 1.3 keV), so our star is a natural source of these particles (but at this point XENON cannot prove they arrive from the Sun). Furthermore, the  particles must couple to electrons, because they can knock xenon's electrons off their orbits. Several theoretical models contain particles matching that description. Axions are the primary suspects, because today they are arguably the best motivated extension of the Standard Model. They are naturally light, because their mass is protected by built-in symmetries, and for the same reason their coupling to matter must be extremely suppressed.  For QCD axions the defining feature is their coupling to gluons, but in generic constructions one also finds the pseudoscalar-type interaction between the axion and electrons e:

To explain the excess, one needs the coupling g to be of order 10^-12, which is totally natural in this context. But axions are by no means the only possibility. A related option is the dark photon, which differs from the axion by certain technicalities, in particular it has spin-1 instead of spin-0. The palette of viable models is certainly much broader, with  the details to be found soon on arXiv.           

A distinct avenue to explain the XENON excess is neutrinos. Here, the advantage is that we already know that neutrinos exist, and that the Sun emits some 10^38 of them every second. In fact, the background model used by XENON includes 220 neutrino-induced events in the 1-210 keV range. However, in the standard picture, the interactions of neutrinos with electrons are too weak to explain the excess. To that end one has to either increase their flux (so fiddle with the solar model), or increase their interaction strength with matter (so go beyond the Standard Model). For example, neutrinos could interact with electrons via a photon intermediary. While neutrinos do not have an electric charge, uncharged particles can still couple to photons via dipole or higher-multipole moments. It is possible that new physics (possibly the same that generates the neutrino masses) also pumps up the neutrino magnetic dipole moment. This can be described in a model-independent way by adding a non-renormalizable dimension-7 operator to the Standard Model, e.g.
To explain the XENON excess we need d of order 10^-6. That means the new physics responsible for the dipole moment must be just around the corner, below 100 TeV or so.

How confident should we be that it's new physics? Experience has shown again and again that anomalies in new physics searches have, with a very large confidence, a mundane origin that does not involve exotic particles or interactions.  In this case, possible explanations are, in order of likelihood,  1) small contamination of the detector, 2) some other instrumental effect that the collaboration hasn't thought of, 3) the ghost of Roberto Peccei, 4) a genuine signal of new physics. In fact, the collaboration itself is hedging for the first option, as they cannot exclude the presence of a small amount of  tritium in the detector, which would produce a signal similar to the observed excess. Moreover, there are a few orange flags for the new physics interpretation:
  1.  Simplest models explaining the excess are excluded by astrophysical observations. If axions can be produced in the Sun at the rate suggested by the XENON result, they can be produced at even larger rates in hotter stars, e.g. in red giants or white dwarfs. This would lead to excessive cooling of these stars, in conflict with observations. The upper limit on the axion-electron coupling g from red giants is 3*10^-13, which is an order of magnitude less than what is needed for the XENON excess. The neutrino magnetic moment explanation faces a similar difficulty. Of course, astrophysical limits reside in a different epistemological reality; it is not unheard of that they are relaxed by an order of magnitude or disappear completely. But certainly this is something to worry about.
  2.  At a more psychological level, a small excess over a large background near a detection threshold.... sounds familiar. We've seen that before in the case of the DAMA and CoGeNT dark matter experiments, and it didn't turn out well.
  3. The bump is at 1.5 keV, which is *twice* 750 eV.  
So, as usual, more data, time, and patience are needed to verify the new physics hypothesis. On the experimental side, the near future is very optimistic, with the XENONnT, LUX-ZEPLIN, and PandaX-4T experiments all jostling for position to confirm the excess and earn eternal glory. On the theoretical side, the big question is whether the stellar cooling constraints can be avoided, without too many epicycles. It would also be good to know whether the particle responsible for the XENON excess could be related to dark matter and/or to other existing anomalies, in particular to the B-meson ones. For answers, tune in to arXiv, from tomorrow on.

Resonaances Thoughts on RK

The hashtag #CautiouslyExcited is trending on Twitter, in spite of the raging plague. The updated RK measurement in LHCb has made a big splash and has been covered by every news outlet. RK measures the ratio of the B->Kμμ and B->Kee decay probabilities, which the Standard Model predicts to be very close to one. Using all the data collected so far, LHCb instead finds RK = 0.846 with an error of 0.044. This is the same central value and a 30% smaller error compared to their 2019 result based on half of the data. Mathematically speaking, the update does not much change the global picture of the B-meson anomalies. However, it has an important psychological impact, which goes beyond the PR story of crossing the 3 sigma threshold. Let me explain why.
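The numbers in this paragraph can be put through a back-of-the-envelope check. The sketch below assumes a symmetric Gaussian error and a Standard Model prediction of exactly one, both of which are simplifications: the collaboration's quoted significance comes from a full likelihood with asymmetric errors, so this naive pull will not reproduce it exactly. It also checks that doubling the data should shrink a purely statistical error by roughly the 30% mentioned above.

```python
import math

# Values quoted in the text.
rk, sigma = 0.846, 0.044
# "Very close to one"; taking exactly 1 is an assumption of this sketch.
sm_prediction = 1.0

# Naive pull: distance from the prediction in units of the (symmetric) error.
naive_pull = (sm_prediction - rk) / sigma

# A purely statistical error scales as 1/sqrt(N), so doubling the data
# shrinks it by a factor 1/sqrt(2).
error_shrink = 1.0 - 1.0 / math.sqrt(2.0)

print(f"naive pull: {naive_pull:.1f} sigma")
print(f"expected error reduction from 2x data: {error_shrink:.0%}")
```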

For the last few decades, every deviation from the Standard Model prediction in a particle collider experiment would mean one of these 3 things:    

  1. Statistical fluctuation. 
  2. Flawed theory prediction. 
  3. Experimental screw-up.   

In the case of RK, the option 2. is not a worry.  Yes, flavor physics is a swamp full of snake pits, however in the RK ratio the dangerous hadronic uncertainties cancel out to a large extent, so that precise theoretical predictions are possible.  Before March 23 the biggest worry was option 1.  Indeed, 2-3 sigma fluctuations happen all the time at the LHC, due to a huge number of measurements being taken.  However, you expect statistical fluctuations to decrease in significance as more data is collected.  This is what seems to be happening to the sister RD anomaly, and the earlier history of RK was not very encouraging either (in the 2019 update the significance neither increased nor decreased).  The fact that, this time, the significance of the RK anomaly increased, more or less as you would expect it to assuming it is a genuine new physics signal, makes it unlikely that it is merely a statistical fluctuation.  This is the main reason for the excitement you may perceive among particle physicists these days. 
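To put a number on "2-3 sigma fluctuations happen all the time", here is a rough sketch. It assumes Gaussian statistics, independent measurements, and a purely hypothetical count of 1000 measurements (not an actual LHC figure). The last function illustrates the other half of the argument: for a genuine effect with a stable central value, the significance should grow roughly as the square root of the amount of data.

```python
import math


def two_sided_p_value(n_sigma: float) -> float:
    """Probability that a Gaussian variable deviates by at least n_sigma."""
    return math.erfc(n_sigma / math.sqrt(2.0))


def expected_fluctuations(n_measurements: int, n_sigma: float) -> float:
    """Expected number of independent measurements deviating by >= n_sigma."""
    return n_measurements * two_sided_p_value(n_sigma)


def projected_significance(current_sigma: float, data_factor: float) -> float:
    """Significance of a genuine signal after scaling the data set,
    assuming statistics-dominated errors and a stable central value."""
    return current_sigma * math.sqrt(data_factor)


# Hypothetical number of independent measurements, for illustration only.
n_measurements = 1000
for n_sigma in (2.0, 3.0):
    count = expected_fluctuations(n_measurements, n_sigma)
    print(f">= {n_sigma:.0f} sigma: about {count:.1f} of {n_measurements}")
```

Under these assumptions a few 3 sigma deviations among a thousand measurements are expected from chance alone, which is why the trend with more data matters more than any single threshold.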

On the other hand, option 3. remains a possibility. In their analysis, LHCb reconstructed 3850 B->Kμμ decays vs. 1640 B->Kee decays, but from that they concluded that decays to muons are less probable than those to electrons. This is because one has to take into account the different reconstruction efficiencies for muons and electrons. An estimate of that efficiency is the most difficult ingredient of the measurement, and the LHCb folks have spent many nights of heavy drinking worrying about it. Of course, they have made multiple cross-checks and are quite confident that there is no mistake but... there will always be a shadow of a doubt until RK is confirmed by an independent experiment. Fortunately for everyone, a verification will be provided by the Belle-II experiment, probably 3-4 years from now. Only when Belle-II sees the same thing will we breathe a sigh of relief and put all our money on option

4. Physics beyond the Standard Model 

From that point of view explaining the RK measurement is trivial.  All we need is to add a new kind of interaction between b- and s-quarks and muons to the Standard Model Lagrangian.  For example, this 4-fermion contact term will do: 

where Q3=(t,b), Q2=(c,s), L2=(νμ,μ). The Standard Model won't let you have this interaction because it violates one of its founding principles: renormalizability.  But we know that the Standard Model is just an effective theory, and that non-renormalizable interactions must exist in nature, even if they are very suppressed so as to be unobservable most of the time.  In particular, neutrino oscillations are best explained by certain dimension-5 non-renormalizable interactions.  RK may be the first evidence that also dimension-6 non-renormalizable interactions exist in nature.  The nice thing is that the interaction term above 1) does not violate any existing experimental constraints,  2) explains not only RK but also some other 2-3 sigma tensions in the data (RK*, P5'),  and 3) fits well with some smaller 1-2 sigma effects (Bs->μμ, RpK,...). The existence of a simple theoretical explanation and a consistent pattern in the data is the other element that prompts cautious optimism.  

The LHC run-3 is coming soon, and with it more data on RK.  In the shorter perspective (less than a year?) there will be other important updates (RK*, RpK) and new observables (Rϕ , RK*+) probing the same physics. Finally something to wait for.   

March 29, 2021

Tommaso Dorigo This $1000 Says Lepton Universality Is OK

Almost exactly 15 years ago, I was following a nice conference in the Azores island of San Miguel, where I witnessed with a bit of gloom how the Standard Model was capable of explaining to the tiniest level all observed features not only of electroweak physics observables, but also of low-energy hadronic physics in weak decays of bottom hadrons, from a number of different experiments. I especially remember a talk by Guido Martinelli, among others, who was remarking that if new physics was there, it was really well concealed.


n-Category Café Native Type Theory (Part 3)

guest post by Christian Williams

In Part 2 we described higher-order algebraic theories: categories with products and finite-order exponents, which present languages with (binding) operations, equations, and rewrites; from these we construct native type systems.

Now let’s use the wisdom of the Yoneda embedding!

Every category embeds into a topos of presheaves

\[ y\colon C\rightarrowtail \mathscr{P}C=[C^{op},Set] \]

\[ y(c) = C(-,c) \quad\quad y(f)=C(-,f)\colon C(-,c)\to C(-,d). \]

If \((C,\otimes,[-,-])\) is monoidal closed, then the embedding preserves this structure:

\[ y(c\otimes d)\simeq y(c)\otimes y(d) \quad \quad y([c,d])\simeq [y(c),y(d)] \]

i.e. using Day convolution, \(y\) is monoidal closed. So, we can move into a richer environment while preserving higher-order algebraic structure, or languages.

We now explore the native type system of a language, using the \(\rho\)-calculus as our running example. The complete type system is in the paper, page 9.


The simplest kind of object of the native type system is a representable \(T(-,\mathtt{S})\). This is the set of all terms of sort \(\mathtt{S}\), indexed by the context of the language. Whereas many works in computer science either restrict to closed terms or lump all terms together, this indexing is natural and useful.

In the \(\rho\)-calculus, \(y(\mathtt{P}) = T_\rho(-,\mathtt{P})\) is the indexed set of all processes.

\[ y(\mathtt{P})(\Gamma) = \{p \;|\; (x_1,\dots,x_n):\Gamma \vdash p:\mathtt{P}\}. \]

The type system is built from these basic objects by the operations of \(T\) and the structure of \(\mathscr{P}T\). We can then construct predicates, dependent types, co/limits, etc., and each constructor has corresponding inference rules which can be used by a computer.

Predicates and Types

The language of a topos is represented by two fibrations: the subobject fibration gives predicate logic, and the codomain fibration gives dependent type theory. Hence the two basic entities are predicates and (dependent) types. Types are more general, and we can think of them as the “new sorts” of language \(T\), which can be much more expressive.

A predicate \(\varphi:y(\mathtt{P})\to \Omega\) corresponds to a subobject of a representable \(\{p \;|\; \varphi(p)\}\rightarrowtail y(\mathtt{P})\), which is equivalent to a sieve: a set of morphisms into \(\mathtt{P}\), closed under precomposition:

This emphasizes the idea that predicate logic over representables is actually reasoning about abstract syntax trees: here \(g\) is some tree of operations in \(T\) with an \(\mathtt{S}\)-shaped hole of variables, and the predicate \(\varphi\) only cares about the outer shape of \(g\); you can plug in any term \(f\) while still satisfying \(\varphi\).

More generally, a morphism \(f\colon B\to A\) is understood as an “indexed presheaf” or dependent type

\[ x:A\vdash B(x):Type. \]

i.e. for every element \(x\colon X\to A\), there is a fiber \(B(x):= f^*(x)\) which is the “type depending on term \(x\)”.

An example of a type in the \(\rho\)-calculus is given by the input operation,

\[ y(\mathtt{in}):y(\mathtt{N\times P\times [N,P]})\to y(\mathtt{P}) \]

where the fiber over \(\varphi\) is the set of all channel-context pairs \((n,\lambda x.p)\) such that \(\varphi(\mathtt{in}(n,\lambda x.p))\).

Dependent Sum and Product

Here we use the structure described in Part 1. The predicate functor \(\mathscr{P}T(-,\Omega):\mathscr{P}T^{op}\to CHA\) is a hyperdoctrine, which for each presheaf \(A\) gives a complete Heyting algebra of predicates \(\Omega^A\), and for each \(f\colon B\to A\) gives adjoints \(\exists_f\dashv \Omega^f\dashv \forall_f\colon \Omega^B\to \Omega^A\) for image, preimage, and secure image.

Similarly, the slice functor \(\mathscr{P}T/-:\mathscr{P}T^{op} \to CCT\) is a hyperdoctrine into co/complete toposes with adjoints \(\Sigma_f\dashv \Delta^f\dashv \Pi_f\). These are dependent sum, substitution, and dependent product. From these we can reconstruct all the operations of predicate logic, and much more.

As (briefly) explained in Part 1, the idea of dependent sum is that indexed sums generalize products; here the codomain is the set of indices and its fibers are the sets in the family; so an element of the indexed sum is a dependent pair \((a,x\in X_a)\). Dually, indexed products generalize functions: an element of the product of the fibers is a tuple \((x_1\in X_{a_1},\dots,x_n\in X_{a_n})\) which can be understood as a dependent function where the codomain \(X_a\) depends on which \(a\) you plug in.

Explicitly, given \(f\colon A\to B\) and \(p\colon X\to A\), \(q\colon Y\to B\), we have \(\Delta_f(q)_\mathtt{S}^a = q_\mathtt{S}^{f_\mathtt{S}(a)}\) and

\[ \begin{array}{ll} \Sigma_f(p)_\mathtt{S}^b & := \sum_{a:A_\mathtt{S}}\sum_{f_\mathtt{S}(a)=b} p_\mathtt{S}^{a} \\ \Pi_f(p)_\mathtt{S}^b & := \prod_{u:\mathtt{R}\to \mathtt{S}}\prod_{f_\mathtt{R}(a)=B(u)(b)} p_\mathtt{R}^{a} \end{array} \qquad (1) \]

(letting \(X_\mathtt{S}=X(\mathtt{S})\) and \(p_\mathtt{S}^b\) denote the fiber over \(b\)). Despite the complex formulae, the intuition is essentially the same as in Set, except we need to ensure the resulting objects are still presheaves, i.e. closed under precomposition. The point is:

\[ \Sigma \;\; \text{generalizes product, and categorifies } \;\; \exists \;\; \text{ or image; and} \]

\[ \Pi \;\; \text{generalizes internal hom, and categorifies } \;\; \forall \;\; \text{ or secure image.} \]
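Since the intuition is the same as in Set, a small finite example may help. The sketch below works only in finite sets (so there is no indexing over sorts and no precomposition condition), with a map given as a dictionary and a family of fibers given as a dictionary of lists; all names and data are made up for illustration.

```python
from itertools import product


def dependent_sum(f, fibers, b):
    """All dependent pairs (a, x) with f(a) == b and x in the fiber over a."""
    return [(a, x) for a in sorted(fibers) if f[a] == b for x in fibers[a]]


def dependent_product(f, fibers, b):
    """All dependent functions over b: one choice of x in the fiber over a,
    for every a with f(a) == b."""
    preimage = [a for a in sorted(fibers) if f[a] == b]
    return [dict(zip(preimage, choice))
            for choice in product(*(fibers[a] for a in preimage))]


# A map f: A -> B and a family of fibers X_a indexed by A.
f = {"a1": "b", "a2": "b", "a3": "c"}
fibers = {"a1": ["x", "y"], "a2": ["u", "v", "w"], "a3": ["z"]}

print(len(dependent_sum(f, fibers, "b")))      # sizes add: 2 + 3
print(len(dependent_product(f, fibers, "b")))  # sizes multiply: 2 * 3
```

Over a point with empty preimage the sum is empty while the product contains exactly one (empty) function, which is the familiar behaviour of empty sums and products.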

The main examples start with just “pushing forward” operations in the theory, using \(\exists\). Given an operation \(f\colon \mathtt{S}\to \mathtt{T}\), the image \(\exists_{y(f)}:\Omega^{y(\mathtt{S})}\to \Omega^{y(\mathtt{T})}\) takes a predicate (sieve) \(\varphi\rightarrowtail y(\mathtt{S})\) and simply postcomposes every term in \(\varphi\) with \(f\).

Hence an example predicate (leaving \(\exists\) and \(y\) implicit) is

\[ \mathsf{multi.thread} = \neg(0)\vert \neg(0) \;\; \rightarrowtail y(\mathtt{P}). \]

This predicate determines processes which are the parallel of two non-null processes.

As an example of the distinct utility of the adjoints, recall from Part 2 that we can model computational dynamics using a graph of processes and rewrites \(s,t:\mathtt{E\to P}\). Now these operations give adjunctions between sieves on \(\mathtt{E}\) and sieves on \(\mathtt{P}\), which give operators for “step forward or backward”:

\[ \Sigma_t\Omega^s(\varphi) = \{q \;|\; \exists r.\; r:p\rightsquigarrow q \wedge \varphi(p)\} \]

\[ \Pi_t(\Omega^s(\varphi)) = \{q \;|\; \forall r.\; r:p\rightsquigarrow q \Rightarrow \varphi(p)\} \]

While “image” step-forward gives all possible next terms, the “secure” step-forward gives terms which could only have come from \(\varphi\). For security protocols, this can be used to filter processes by past behavior.
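On a finite rewrite graph the two step-forward operators are just an "exists" and a "for all" over incoming edges. A minimal sketch, assuming rewrites are modelled as plain (source, target) pairs (so parallel rewrites between the same two processes are not distinguished) and predicates as Python sets; the example graph is invented:

```python
def image_step(rewrites, phi):
    """Targets q reached by at least one rewrite p ~> q with p in phi."""
    return {q for (p, q) in rewrites if p in phi}


def secure_step(rewrites, phi, processes):
    """Processes q such that every rewrite p ~> q has its source p in phi."""
    return {q for q in processes
            if all(p in phi for (p, target) in rewrites if target == q)}


processes = {"a", "b", "c", "d"}
rewrites = {("a", "c"), ("b", "c"), ("a", "d")}
phi = {"a"}

print(sorted(image_step(rewrites, phi)))              # reachable from phi
print(sorted(secure_step(rewrites, phi, processes)))  # only reachable from phi
```

Here "c" is in the image step but not the secure step, because it can also be reached from "b". Processes with no incoming rewrite at all satisfy the secure condition vacuously, exactly as the universally quantified formula says.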

Image / Comprehension and Subtyping

Predicates and types are related by an adjunction between the fibrations.

To convert a predicate \(\varphi:A\to \Omega\) to a type, apply comprehension to construct the subobject of terms \(\mathrm{c}(\varphi)\) which satisfy \(\varphi\). To convert a type \(p:X\to A\) to a predicate, apply image factorization to construct the predicate \(\mathrm{i}(p)\) for whether each fiber is inhabited.

We implicitly use the comprehension direction all the time (thinking of predicates as their subobjects); and while taking the image is more destructive, it can certainly be useful for the sake of simplification. For example, rather than thinking about the type \(y(\mathtt{out}):y(\mathtt{N\times P})\to y(\mathtt{P})\), we may simply want to consider the image \(\mathrm{i}(y(\mathtt{out}))\), the set of all output processes.

Internal Hom and Reification

While the Grothendieck construction is relatively known, there is less awareness about how the local structure of an indexed category (complete Heyting algebras for predicates) can often be converted to a global structure on the total category of the corresponding fibration. The total category of the predicate functor \(\Omega\mathscr{P}T\) is cartesian closed, allowing us to construct predicate homs.

The construction can be understood in the category of sets. Given \(\varphi:2^A\) and \(\psi:2^B\), we can define

\[ [\varphi,\psi]:[A,B]\to 2 \quad \quad [\varphi, \psi](f) = \forall a\in A.\; \varphi(a)\Rightarrow \psi(f(a)). \]

Hence it constructs “contexts which ensure implications”.
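In finite sets this predicate hom can be computed by brute force. The sketch below enumerates all functions between two small sets (represented as dictionaries) and keeps those that send everything satisfying the first predicate into the second; the sets and predicates are arbitrary illustrative choices.

```python
from itertools import product


def all_functions(domain, codomain):
    """Yield every function domain -> codomain as a dictionary."""
    domain = list(domain)
    for values in product(list(codomain), repeat=len(domain)):
        yield dict(zip(domain, values))


def predicate_hom(phi, psi, f):
    """[phi, psi](f): every a satisfying phi is sent by f into psi."""
    return all(f[a] in psi for a in f if a in phi)


A = [0, 1, 2]
B = ["lo", "hi"]
phi = {1, 2}     # predicate on A
psi = {"hi"}     # predicate on B

satisfying = [f for f in all_functions(A, B) if predicate_hom(phi, psi, f)]
print(len(satisfying), "of", len(B) ** len(A), "functions satisfy [phi, psi]")
```

Only the value at 0 is unconstrained, so two of the eight functions survive.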

For example, we can construct the “wand” of separation logic: let \(T_h\) be the theory of a commutative monoid \((H,\cup,e)\), with a set of constants \(\{h\}:1\to H\) adjoined as the elements of a heap. If we define

\[ (\varphi \multimap \psi) = \Omega^{\lambda x.x\cup-}[\varphi, \psi] \]

then \(h_1:(\varphi \multimap \psi)\) asserts that \(h_2:\varphi\Rightarrow h_1\cup h_2:\psi\).
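As a toy illustration of this wand, take heaps to be finite sets of cells with ordinary union as the monoid operation and the empty set as unit. That is an assumption made for this sketch: separation logic proper uses a partial, disjoint union, whereas the total union below follows the commutative monoid described above. The cell names are hypothetical.

```python
from itertools import combinations


def all_heaps(cells):
    """All subsets of the given cells, as frozensets."""
    cells = sorted(cells)
    return [frozenset(c)
            for r in range(len(cells) + 1)
            for c in combinations(cells, r)]


def wand(phi, psi, heaps):
    """Heaps h1 such that for every h2 in phi, the union h1 | h2 lies in psi."""
    return {h1 for h1 in heaps if all((h1 | h2) in psi for h2 in phi)}


heaps = all_heaps({"x", "y"})
phi = {frozenset({"x"})}                       # "the heap is exactly {x}"
psi = {h for h in heaps if {"x", "y"} <= h}    # "the heap contains x and y"

result = wand(phi, psi, heaps)
print(sorted(sorted(h) for h in result))
```

The heaps satisfying the wand are those that supply the missing cell "y".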

There is a much more expressive way of forming homs which we call reification (p7); we do not know if it has been explored, and we have yet to determine its relation to dependent product.


Similarly, the fibers of \(\Omega\mathscr{P}T\to \mathscr{P}T\) are co/complete, and this can be assembled into a global co/complete structure on the total category. Hence, we can use this to construct co/inductive types.

For example, given a predicate on names \(\alpha\), we may construct a predicate for “liveness and safety” on \(\alpha\):

\[ \mathsf{}(\alpha) = \mu X. \mathtt{in}(\alpha,\mathtt{N}.X)\wedge \neg\mathtt{in}(\neg\alpha,\mathtt{N.P}) \]

where \(\mu\) denotes the initial algebra, which is constructed as a colimit. This determines whether a process inputs on \(\alpha\), does not input on \(\neg\alpha\), and continues as a process which satisfies this same predicate. This can be understood as a static type for a firewall.


Once these type constructors are combined, they can express highly useful and complex ideas about code. The best part is that this type system can be generated from any language with product and function types, which includes large chunks of many popular programming languages.

To get a feel for more applications, check out the final section of Native Type Theory. Of course, check out the rest of the paper, and let me know what you think! Thank you for reading.

n-Category Café Funding of the nLab

The nLab was born out of conversations at the Café back in 2008. Over the past 12 years it has grown as a wiki to over 15000 pages.

For many years it was funded personally by Urs Schreiber; in the last few years it has relied on a fund kindly provided by Steve Awodey.

But now we have new arrangements in place, and are looking to its users to help fund the nLab:

In 2021, the nLab will move to the cloud. To fund the running of the nLab in the cloud, we have decided to rely upon funding by donations. In the autumn of 2020, at the kind initiative of Brendan Fong, the nLab decided to collaborate with the Topos Institute for the practical side of this: the Topos Institute is legally able to handle donations, and the financing of the nLab will be handled by the Topos Institute. The Topos Institute owns the cloud account in which the nLab will be run.

Please do consider making a donation here.

Doug Natelson Brief items

Catching up after the APS meeting, here are a couple of links of interest:

  • This video has been making the rounds, and it's fun to watch.  It's an updated take on one of those powers-of-ten videos, though in this case it's really powers-of-two.  Nicely done, though I think the discussion of the Planck Length is not really correct.  As far as I know, the Planck Length is a characteristic scale where quantum gravity effects cannot be neglected - that doesn't mean that the structure of the universe is discrete on that scale.
  • There have also been a lot of articles like this one implying that new (non-Standard Model) physics has been seen at the LHC.  As is usually the case, it's premature to get too excited.  At the 3\(\sigma\) level, there is an asymmetry in decay channels (electrons vs muons) seen by the LHCb experiment when none is expected.  As the always reliable Tommaso Dorigo writes here, everyone should just take a breath before getting too excited.  At least when the LHC starts back up next year, there should be a lot of new data coming in, and either this effect will grow, or it will fade away.  Anyone want to bet on the over/under for the number of theory papers about leptoquarks that are going to show up on the arxiv in the next month?
  • We were fortunate enough to have Pablo Jarillo-Herrero give our colloquium this past Wednesday, talking about some really exciting recent results (here, here) in twisted trilayer graphene.
I'll hopefully write more soon, also touching on a recent paper of ours.

March 26, 2021

Matt von Hippel Redefining Fields for Fun and Profit

When we study subatomic particles, particle physicists use a theory called Quantum Field Theory. But what is a quantum field?

Some people will describe a field in vague terms, and say it’s like a fluid that fills all of space, or a vibrating rubber sheet. These are all metaphors, and while they can be helpful, they can also be confusing. So let me avoid metaphors, and say something that may be just as confusing: a field is the answer to a question.

Suppose you’re interested in a particle, like an electron. There is an electron field that tells you, at each point, your chance of detecting one of those particles spinning in a particular way. Suppose you’re trying to measure a force, say electricity or magnetism. There is an electromagnetic field that tells you, at each point, what force you will measure.

Sometimes the question you’re asking has a very simple answer: just a single number, for each point and each time. An example of a question like that is the temperature: pick a city, pick a date, and the temperature there and then is just a number. In particle physics, the Higgs field answers a question like that: at each point, and each time, how “Higgs-y” is it there and then? You might have heard that the Higgs field gives other particles their mass: what this means is that the more “Higgs-y” it is somewhere, the higher these particles’ mass will be. The Higgs field is almost constant, because it’s very difficult to get it to change. That’s in some sense what the Large Hadron Collider did when they discovered the Higgs boson: pushed hard enough to cause a tiny, short-lived ripple in the Higgs field, a small area that was briefly more “Higgs-y” than average.

We like to think of some fields as fundamental, and others as composite. A proton is composite: it’s made up of quarks and gluons. Quarks and gluons, as far as we know, are fundamental: they’re not made up of anything else. More generally, since we’re thinking about fields as answers to questions, we can just as well ask more complicated, “composite” questions. For example, instead of “what is the temperature?”, we can ask “what is the temperature squared?” or “what is the temperature times the Higgs-y-ness?”.

But this raises a troubling point. When we single out a specific field, like the Higgs field, why are we sure that that field is the fundamental one? Why didn’t we start with “Higgs squared” instead? Or “Higgs plus Higgs squared”? Or something even weirder?

The inventor of the Higgs-squared field, Peter Higgs-squared

That kind of swap, from Higgs to Higgs squared, is called a field redefinition. In the math of quantum field theory, it’s something you’re perfectly allowed to do. Sometimes, it’s even a good idea. Other times, it can make your life quite complicated.

The reason why is that some fields are much simpler than others. Some are what we call free fields. Free fields don’t interact with anything else. They just move, rippling along in easy-to-calculate waves.

Redefine a free field, swapping it for some more complicated function, and you can easily screw up, and make it into an interacting field. An interacting field might interact with another field, like how electromagnetic fields move (and are moved by) electrons. It might also just interact with itself, a kind of feedback effect that makes any calculation we’d like to do much more difficult.

If we persevere with this perverse choice, and do the calculation anyway, we find a surprise. The final results we calculate, the real measurements people can do, are the same in both theories. The field redefinition changed how the theory appeared, quite dramatically…but it didn’t change the physics.

You might think the moral of the story is that you must always choose the right fundamental field. You might want to, but you can’t: not every field is secretly free. Some will be interacting fields, whatever you do. In that case, you can make one choice or another to simplify your life…but you can also just refuse to make a choice.

That’s something quite a few physicists do. Instead of looking at a theory and calling some fields fundamental and others composite, they treat every one of these fields, every different question they could ask, on the same footing. They then ask, for these fields, what one can measure about them. They can ask which fields travel at the speed of light, and which ones go slower, or which fields interact with which other fields, and how much. Field redefinitions will shuffle the fields around, but the patterns in the measurements will remain. So those, and not the fields, can be used to specify the theory. Instead of describing the world in terms of a few fundamental fields, they think about the world as a kind of field soup, characterized by how it shifts when you stir it with a spoon.

It’s not a perspective everyone takes. If you overhear physicists, sometimes they will talk about a theory with only a few fields, sometimes they will talk about many, and you might be hard-pressed to tell what they’re talking about. But if you keep in mind these two perspectives: either a few fundamental fields, or a “field soup”, you’ll understand them a little better.

Clifford Johnson Talking at Fermilab!

This evening at 7:30pm Central time, come to Fermilab (online) for a public talk I’ll give about shaking up how we present serious scientific ideas in books for the public. It should be fun! The information is here. -cvj

The post Talking at Fermilab! appeared first on Asymptotia.

March 23, 2021

Jacques Distler Cosmic Strings in the Standard Model

Over at the n-Category Café, John Baez is making a big deal of the fact that the global form of the Standard Model gauge group is \(G = (SU(3)\times SU(2)\times U(1))/N\), where \(N\) is the \(\mathbb{Z}_6\) subgroup of the center of \(G'=SU(3)\times SU(2)\times U(1)\) generated by the element \(\left(e^{2\pi i/3}\mathbb{1},-\mathbb{1},e^{2\pi i/6}\right)\).

The global form of the gauge group has various interesting topological effects. For instance, the fact that the center of the gauge group is \(Z(G)= U(1)\), rather than \(Z(G')=U(1)\times \mathbb{Z}_6\), determines the global 1-form symmetry of the theory. It also determines the presence or absence of various topological defects (in particular, cosmic strings). I pointed this out, but a proper explanation deserved a post of its own.

None of this is new. I’m pretty sure I spent a sunny afternoon in the summer of 1982 on the terrace of Café Pamplona doing this calculation. (As any incoming graduate student should do, I spent many a sunny afternoon at a café doing this and similar calculations.)

At low energies, \(G\) is broken to the subgroup \(H=U(3)\), where the embedding \(i\colon H\hookrightarrow G\) is given as follows. Let \(h\in H\) and let \(d\coloneqq \det(h)\). Choose a 6th root \(b = d^{1/6}\). Then

\[ i(h) = \left(h b^{-2}, \left(\begin{smallmatrix}b^{{\color{red}-}3}&0\\0&b^{{\color{red}+}3}\end{smallmatrix}\right), b^{\color{red}-1}\right) \qquad (1) \]

The ambiguity in defining \(b\) leads precisely to an ambiguity in \(i(h)\) by multiplication by an element of \(N\). Thus (1) is ill-defined as a map to \(G'\), but well-defined as a map to \(G\).

The (would-be) cosmic strings associated to the breaking of \(G\) to \(H\) are classified by \(\pi_1(G/H)\). Both \(\pi_1(H)\) and \(\pi_1(G)\) are equal to \(\mathbb{Z}\). The long-exact sequence in homotopy yields
\[ 0\to \pi_1(H)\to \pi_1(G) \to \pi_1(G/H)\to 0 \]
So what we need to do is compute the image of the generator of \(\pi_1(H)\) in \(\pi_1(G)\). If the image is \(n\) times the generator of \(\pi_1(G)\), then the quotient is nontrivial and we have \(\mathbb{Z}_n\) cosmic strings.

\(\pi_1(G)\) is generated by the (homotopy class of) the loop

\[ g(s)=\left(\left(\begin{smallmatrix}e^{2\pi i s/3}&0&0\\0&e^{2\pi i s/3}&0\\0&0&e^{-4\pi i s/3}\end{smallmatrix}\right),\begin{pmatrix}e^{i\pi s}&0\\0&e^{-i\pi s}\end{pmatrix},e^{2\pi i s/6}\right), \qquad s\in[0,1] \qquad (2) \]

\(\pi_1(H)\) is generated by the loop

\[ h(s)= \begin{pmatrix}1&0&0\\0&1&0\\0&0&e^{{\color{red} -}2\pi i s}\end{pmatrix}, \qquad s\in[0,1] \qquad (3) \]

Plugging (3) into (1), we see that \(i(h(s))=g(s)\). Hence \(\pi_1(G/H)=0\) and there are no cosmic strings.
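The substitution of (3) into (1) can be checked numerically. The sketch below is only a sanity check, not part of the argument: since all matrices involved are diagonal, each is stored as a tuple of its diagonal entries, and the 6th root is taken as the continuous branch b(s) = exp(-2πi s/6) along the loop (a principal-branch root of the determinant would jump halfway round). Function names are made up for the example.

```python
import cmath

TWO_PI_I = 2j * cmath.pi


def h_loop(s):
    """Diagonal entries of h(s) from equation (3)."""
    return (1.0, 1.0, cmath.exp(-TWO_PI_I * s))


def g_loop(s):
    """Diagonal entries of the three factors of g(s) from equation (2)."""
    su3 = (cmath.exp(TWO_PI_I * s / 3),
           cmath.exp(TWO_PI_I * s / 3),
           cmath.exp(-2 * TWO_PI_I * s / 3))
    su2 = (cmath.exp(1j * cmath.pi * s), cmath.exp(-1j * cmath.pi * s))
    u1 = cmath.exp(TWO_PI_I * s / 6)
    return su3, su2, u1


def embed_loop(s):
    """Equation (1) applied to h(s), using the continuous 6th root of det h(s)."""
    b = cmath.exp(-TWO_PI_I * s / 6)
    su3 = tuple(x * b ** -2 for x in h_loop(s))
    su2 = (b ** -3, b ** 3)
    u1 = b ** -1
    return su3, su2, u1


def max_deviation(samples=101):
    """Largest entrywise difference between i(h(s)) and g(s) on a grid in [0, 1]."""
    worst = 0.0
    for k in range(samples):
        s = k / (samples - 1)
        (a3, a2, a1), (b3, b2, b1) = embed_loop(s), g_loop(s)
        diffs = [abs(x - y) for x, y in zip(a3 + a2 + (a1,), b3 + b2 + (b1,))]
        worst = max(worst, max(diffs))
    return worst


print(f"max |i(h(s)) - g(s)| = {max_deviation():.2e}")
```

At s = 1 the loop g(s) ends on the generator of N rather than on the identity, so it is closed in G but not in G', which is the reason the quotient by N matters here.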

March 22, 2021

Scott Aaronson QC ethics and hype: the call is coming from inside the house

For years, I’d sometimes hear discussions about the ethics of quantum computing research. Quantum ethics!

When the debates weren’t purely semantic, over the propriety of terms like “quantum supremacy” or “ancilla qubit,” they were always about chin-strokers like “but what if cracking RSA encryption gives governments more power to surveil their citizens? or what if only a few big countries or companies get quantum computers, thereby widening the divide between haves and have-nots?” Which, OK, conceivably these will someday be issues. But, besides barely depending on any specific facts about quantum computing, these debates always struck me as oddly safe, because the moral dilemmas were so hypothetical and far removed from us in time.

I confess I may have even occasionally poked fun when asked to expound on quantum ethics. I may have commented that quantum computers probably won’t kill anyone unless a dilution refrigerator tips over onto their head. I may have asked forgiveness for feeding custom-designed oracles to BQP and QMA, without first consulting an ethics committee about the long-term effects on those complexity classes.

Now fate has punished me for my flippancy. These days, I really do feel like quantum computing research has become an ethical minefield—but not for any of the reasons mentioned previously. What’s new is that millions of dollars are now potentially available to quantum computing researchers, along with equity, stock options, and whatever else causes “ka-ching” sound effects and bulging eyes with dollar signs. And in many cases, to have a shot at such riches, all an expert needs to do is profess optimism that quantum computing will have revolutionary, world-changing applications and have them soon. Or at least, not object too strongly when others say that.

Some of today’s rhetoric will of course remind people of the D-Wave saga, which first brought this blog to prominence when it began in earnest in 2007. Quantum computers, we hear now as then, will soon leave the Earth’s fastest supercomputers in the dust. They’re going to harness superposition to try all the exponentially many possible solutions at once. They’ll crack the Traveling Salesman Problem, and will transform machine learning and AI beyond recognition. Meanwhile, simulations of quantum systems will be key to solving global warming and cancer.

Despite the parallels, though, this new gold rush doesn’t feel to me like the D-Wave one, which seems in retrospect like just a little dry run. If I had to articulate what’s new in one sentence, it’s that this time “the call is coming from inside the house.” Many of the companies making wildly overhyped claims are recognized leaders of the field. They have brilliant quantum computing theorists and experimentalists on their staff with impeccable research records. Some of those researchers are among my best friends. And even when I wince at the claims of near-term applications, in many cases (especially with quantum simulation) the claims aren’t obviously false—we won’t know for certain until we try it and see! It’s genuinely gotten harder to draw the line between defensible optimism and exaggerations verging on fraud.

Indeed, this time around virtually everyone in QC is “complicit” to a greater or lesser degree. I, too, have accepted compensation to consult on quantum computing topics, to give talks at hedge funds, and in a few cases to serve as a scientific adviser to quantum computing startups. I tell myself that, by 2021 standards, this stuff is all trivial chump change—a few thousands of dollars here or there, to expound on the same themes that I already discuss free of charge on this blog. I actually get paid to dispel hype, rather than propagate it! I tell myself that I’ve turned my back on the orders of magnitude more money available to those willing to hitch their scientific reputations to the aspirations of this or that specific QC company. (Yes, this blog, and my desire to preserve its intellectual independence and credibility, might well be costing me millions!)

But, OK, some would argue that accepting any money from QC companies or QC investors just puts you at the top of a slope with unabashed snake-oil salesmen at the bottom. With the commercialization of our field that started around 2015, there’s no bright line anymore marking the boundary between pure scientific curiosity and the pursuit of filthy lucre; it’s all just points along a continuum. I’m not sure that these people are wrong.

As some of you might’ve seen already, IonQ, the trapped-ion QC startup that originated from the University of Maryland, is poised to have the first-ever quantum computing IPO—a so-called “SPAC IPO,” which while I’m a financial ignoramus, apparently involves merging with a shell company and thereby bypassing the SEC’s normal IPO rules. Supposedly they’re seeking $650 million in new funding and a $2 billion market cap. If you want to see what IonQ is saying about QC to prospective investors, click here. Lacking any choice in the matter, I’ll probably say more about these developments in a future post.

Meanwhile, PsiQuantum, the Palo-Alto-based optical QC startup, has said that it’s soon going to leave “stealth mode.” And Amazon, Microsoft, Google, IBM, Honeywell, and other big players continue making large investments in QC—treating it, at least rhetorically, not at all like blue-sky basic research, but like a central part of their future business plans.

All of these companies have produced or funded excellent QC research. And of course, they’re all heterogeneous, composed of individuals who might vehemently disagree with each other about the near- or long-term prospects of QC. And yet all of them have, at various times, inspired reflections in me like the ones in this post.

I regret that this post has no clear conclusion. I’m still hashing things out, soliciting thoughts from my readers and friends. Speaking of which: this coming Monday, March 22, at 8-10pm US Eastern time, I’ve decided to hold a discussion around these issues on Clubhouse: my “grand debut” on that app, and an opportunity to see whether I like it or not! My friend Adam Brown will moderate the discussion; other likely participants will be John Horgan, George Musser, Michael Nielsen, and Matjaž Leonardis. If you’re on Clubhouse, I hope to see you there!

Update (March 22): Read this comment by “FB” if you’d like to understand how we got to this point.

Jordan EllenbergEquinox

Spring arrived right on schedule, just a little snow left in the shady places, sunny out and windy in the high 60’s. AB and I did our first real bike ride of the year, going out about 15 miles to the very agreeable Riley Tavern where you eat outside on picnic tables. A lot of people are watching Wisconsin’s basketball season slowly sputter out as the Badgers fail to mount a comeback against the much higher-seeded team from Baylor. Riley Tavern serves amazing ice cream sandwiches with two chocolate chip cookies instead of the rectangular brown things; they’re not made there, they’re from Mullen’s Dairy Bar in Watertown. The thing about an ice cream sandwich is, they use the rectangular brown things which are soft and not very interesting because you can bite right through them without messing up the ice cream. Any cookie with a little more of a resistance to the tooth tends to smoosh the ice cream out the side when you bite down. That’s unacceptable. Mullen’s has somehow found a way to use a cookie with a real bite but give the ice cream itself enough structural integrity to hold itself in place while you eat it. Extraordinary!

I’d figured it had been warm enough long enough for the bike trail to be dry, and that was sort of true, but in many places it was badly rutted from the people who’d ridden on it when it was muddy, and even though it wasn’t really muddy anymore, it was soft for a couple of miles, so that your weight pushed your back wheel down into the dirt, which clutched your tire so that you were perpetually in a kind of low-grade partially submerged wheelie. We fought our way through at about 5mph for the whole stretch. So a more strenuous 30 miles than the usual. But the last 5 miles home, on pavement, felt like absolute gliding.

John PreskillLife among the experimentalists

I used to catch lizards—brown anoles, as I learned to call them later—as a child. They were colored as their name suggests, were about as long as one of my hands, and resented my attention. But they frequented our back porch, and I had a butterfly net. So I’d catch lizards, with my brother or a friend, and watch them. They had throats that occasionally puffed out, exposing red skin, and tails that detached and wriggled of their own accord, to distract predators.

Some theorists might appreciate butterfly nets, I imagine, for catching experimentalists. Some of us theorists will end a paper or a talk with “…and these predictions are experimentally accessible.” A pause will follow the paper’s release or the talk, in hopes that a reader or an audience member will take up the challenge. Usually, none does, and the writer or speaker retires to the Great Deck Chair of Theory on the Back Patio of Science.

So I was startled when an anole, metaphorically speaking, volunteered a superconducting qubit for an experiment I’d proposed.

The experimentalist is one of the few people I can compare to a reptile without fear that he’ll take umbrage: Kater Murch, an associate professor of physics at Washington University in St. Louis. The most evocative description of Kater that I can offer appeared in an earlier blog post: “Kater exudes the soberness of a tenured professor but the irreverence of a Californian who wears his hair slightly long and who tattooed his wedding band on.”

Kater expressed interest in an uncertainty relation I’d proved with theory collaborators. According to some of the most famous uncertainty relations, a quantum particle can’t have a well-defined position and a well-defined momentum simultaneously. Measuring the position disturbs the momentum; any later momentum measurement outputs a completely random, or uncertain, number. We measure uncertainties with entropies: The greater an entropy, the greater our uncertainty. We can cast uncertainty relations in terms of entropies.

I’d proved, with collaborators, an entropic uncertainty relation that describes chaos in many-particle quantum systems. Other collaborators and I had shown that weak measurements, which don’t disturb a quantum system much, characterize chaos. So you can check our uncertainty relation using weak measurements—as well as strong measurements, which do disturb quantum systems much. One can simplify our uncertainty relation—eliminate the chaos from the problem and even eliminate most of the particles. An entropic uncertainty relation for weak and strong measurements results.

Kater specializes in weak measurements, so he resolved to test our uncertainty relation. Physical Review Letters published the paper about our collaboration this month. Quantum measurements can not only create uncertainty, the paper shows, but also reduce it: Kater and his PhD student Jonathan Monroe used light to measure a superconducting qubit, a tiny circuit in which current can flow forever. The qubit had properties analogous to position and momentum (the spin’s z- and x-components). If the atom started with a well-defined “position” (the z-component) and the “momentum” (the x-component) was measured, the outcome was highly random; the total uncertainty about the two measurements was large. But if the atom started with a well-defined “position” (z-component) and another property (the spin’s y-component) was measured before the “momentum” (the x-component) was measured strongly, the total uncertainty was lower. The extra measurement was designed not to disturb the atom much. But the nudge prodded the atom enough, rendering the later “momentum” measurement (the x measurement) more predictable. So not only can quantum measurements create uncertainty, but gentle quantum measurements can also reduce it.

I didn’t learn only physics from our experiment. When I’d catch a lizard, I’d tip it into a tank whose lid contained a magnifying lens, and I’d watch the lizard. I didn’t trap Kater and Jonathan under a magnifying glass, but I did observe their ways. Here’s what I learned about the species experimentalus quanticus.

1) They can run experiments remotely when a pandemic shuts down campus: A year ago, when universities closed and cities locked down, I feared that our project would grind to a halt. But Jonathan twiddled knobs and read dials via his computer, and Kater popped into the lab for the occasional fixer-upper. Jonathan even continued his experiment from another state, upon moving to Texas to join his parents. And here we theorists boast of being able to do our science almost anywhere.

2) They speak with one less layer of abstraction than I: We often discussed, for instance, the thing used to measure the qubit. I’d call the thing “the detector.” Jonathan would call it “the cavity mode,” referring to the light that interacts with the qubit, which sits in a box, or cavity. I’d say “poh-tay-toe”; they’d say “poh-tah-toe”; but I’m glad we didn’t call the whole thing off.

Fred Astaire: “Detector.”
Ginger Rogers: “Cavity mode.”

3) Experiments take longer than expected—even if you expect them to take longer than estimated: Kater and I hatched the plan for this project during June 2018. The experiment would take a few months, Kater estimated. It terminated last summer.

4) How they explain their data: Usually in terms of decoherence, the qubit’s leaking of quantum information into its environment. For instance, to check that the setup worked properly, Jonathan ran a simple test that ended with a measurement. (Experts: He prepared a \sigma_z eigenstate, performed a Hadamard gate, and measured \sigma_z.) The measurement should have had a 50% chance of yielding +1 and a 50% chance of yielding -1. But the -1 outcome dominated the trials. Why? Decoherence pushed the qubit toward -1. (Amplitude damping dominated the noise.)

5) Seeing one’s theoretical proposal turn into an experiment feels satisfying: Due to point (3), among other considerations, experiments aren’t cheap. The lab’s willingness to invest in the idea I’d developed with other theorists was heartening. Furthermore, the experiment pushed us to uncover more theory—for example, how tight the uncertainty bound could grow.
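The sanity check in point (4) can be imitated with a few lines of linear algebra. What follows is only a toy sketch, not the lab's analysis: it assumes that the excited state is the \sigma_z = +1 eigenstate and the ground state is the \sigma_z = -1 eigenstate, that all the noise is a single amplitude-damping channel applied after the Hadamard gate, and that the damping strength `gamma` is a free, hypothetical parameter.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def apply_channel(rho, kraus_ops):
    """Return sum_k K rho K^dagger for the given list of Kraus operators."""
    out = [[0j, 0j], [0j, 0j]]
    for k in kraus_ops:
        term = matmul(matmul(k, rho), dagger(k))
        for i in range(2):
            for j in range(2):
                out[i][j] += term[i][j]
    return out

def sigma_z_probabilities(gamma):
    """Return (P(+1), P(-1)) after: sigma_z eigenstate -> Hadamard -> amplitude damping.

    Assumed basis order: index 0 = excited state (sigma_z = +1),
    index 1 = ground state (sigma_z = -1).
    """
    s = 1 / math.sqrt(2)
    hadamard = [[s, s], [s, -s]]
    rho = [[1 + 0j, 0j], [0j, 0j]]          # start in the sigma_z = +1 eigenstate
    rho = apply_channel(rho, [hadamard])     # a unitary is a one-operator channel
    k0 = [[math.sqrt(1 - gamma), 0], [0, 1]]  # no decay
    k1 = [[0, 0], [math.sqrt(gamma), 0]]      # excited -> ground decay
    rho = apply_channel(rho, [k0, k1])
    return rho[0][0].real, rho[1][1].real

p_plus_ideal, p_minus_ideal = sigma_z_probabilities(0.0)
p_plus_noisy, p_minus_noisy = sigma_z_probabilities(0.3)
print(p_plus_ideal, p_minus_ideal)
print(p_plus_noisy, p_minus_noisy)
```

Under these assumptions the ideal run splits evenly, while any nonzero `gamma` shifts weight toward the -1 outcome, which is the qualitative pattern described in point (4).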

After getting to know an anole, I’d release it into our backyard and bid it adieu.1 So has Kater moved on to experimenting with topology, and Jonathan has progressed toward graduation. But more visitors are wriggling in the Butterfly Net of Theory-Experiment Collaboration. Stay tuned.

1Except for the anole I accidentally killed, by keeping it in the tank for too long. But let’s not talk about that.

March 19, 2021

Matt von HippelBlack Holes, Neutron Stars, and the Power of Love

What’s the difference between a black hole and a neutron star?

When a massive star nears the end of its life, it starts running out of nuclear fuel. Without the support of a continuous explosion, the star begins to collapse, crushed under its own weight.

What happens then depends on how much weight that is. The most massive stars collapse completely, into the densest form anything can take: a black hole. Einstein’s equations say a black hole is a single point, infinitely dense: get close enough and nothing, not even light, can escape. A quantum theory of gravity would change this, but not a lot: a quantum black hole would still be as dense as quantum matter can get, still equipped with a similar “point of no return”.

A slightly less massive star collapses, not to a black hole, but to a neutron star. Matter in a neutron star doesn’t collapse to a single point, but it does change dramatically. Each electron in the old star is crushed together with a proton until it becomes a neutron, a forced reversal of the more familiar process of beta decay. Instead of a ball of hydrogen and helium, the star then ends up like a single atomic nucleus, one roughly the size of a city.

Not kidding about the “city” thing…and remember, this is more massive than the Sun

Now, let me ask a slightly different question: how do you tell the difference between a black hole and a neutron star?

Sometimes, you can tell this through ordinary astronomy. Neutron stars do emit light, unlike black holes, though for most neutron stars this is hard to detect. In the past, astronomers would use other objects instead, looking at light from matter falling in, orbiting, or passing by a black hole or neutron star to estimate its mass and size.

Now they have another tool: gravitational wave telescopes. Maybe you’ve heard of LIGO, or its European cousin Virgo: massive machines that do astronomy not with light but by detecting ripples in space and time. In the future, these will be joined by an even bigger setup in space, called LISA. When two black holes or neutron stars collide they “ring” the fabric of space and time like a bell, sending out waves in every direction. By analyzing the frequency of these waves, scientists can learn something about what made them: in particular, whether the waves were made by black holes or neutron stars.

One big difference between black holes and neutron stars lies in something called their “Love numbers”. From far enough away, you can pretend both black holes and neutron stars are single points, like fundamental particles. Try to get more precise, and this picture starts to fail, but if you’re smart you can include small corrections and keep things working. Some of those corrections, called Love numbers, measure how much one object gets squeezed and stretched by the other’s gravitational field. They’re called Love numbers not because they measure how hug-able a neutron star is, but after the mathematician who first proposed them, A. E. H. Love.

What can we learn from Love numbers? Quite a lot. More impressively, there are several different types of questions Love numbers can answer. There are questions about our theories, questions about the natural world, and questions about fundamental physics.

You might have heard that black holes “have no hair”. A black hole in space can be described by just two numbers: its mass, and how much it spins. A star is much more complicated, with sunspots and solar flares and layers of different gases in different amounts. For a black hole, all of that is compressed down to nothing, reduced to just those two numbers and nothing else.

With that in mind, you might think a black hole should have zero Love numbers: it should be impossible to squeeze it or stretch it. This is fundamentally a question about a theory, Einstein’s theory of relativity. If we took that theory for granted, and didn’t add anything to it, what would the consequences be? Would black holes have zero Love number, or not?

It turns out black holes do have zero Love number, if they aren’t spinning. If they are, things are more complicated: a few calculations made it look like spinning black holes also had zero Love number, but just last year a more detailed proof showed that this doesn’t hold. Somehow, despite having “no hair”, you can actually “squeeze” a spinning black hole.

(EDIT: Folks on twitter pointed out a wrinkle here: more recent papers are arguing that spinning black holes actually do have zero Love number as well, and that the earlier papers confused Love numbers with a different effect. All that is to say this is still very much an active area of research!)

The physics behind neutron stars is in principle known, but in practice hard to understand. When they are formed, almost every type of physics gets involved: gas and dust, neutrino blasts, nuclear physics, and general relativity holding it all together.

Because of all this complexity, the structure of neutron stars can’t be calculated from “first principles” alone. Finding it out isn’t a question about our theories, but a question about the natural world. We need to go out and measure how neutron stars actually behave.

Love numbers are a promising way to do that. Love numbers tell you how an object gets squeezed and stretched in a gravitational field. Learning the Love numbers of neutron stars will tell us something about their structure: namely, how squeezable and stretchable they are. Already, LIGO and Virgo have given us some information about this, and ruled out a few possibilities. In future, the LISA telescope will show much more.

Returning to black holes, you might wonder what happens if we don’t stick to Einstein’s theory of relativity. Physicists expect that relativity has to be modified to account for quantum effects, to make a true theory of quantum gravity. We don’t quite know how to do that yet, but there are a few proposals on the table.

Asking for the true theory of quantum gravity isn’t just a question about some specific part of the natural world, it’s a question about the fundamental laws of physics. Can Love numbers help us answer it?

Maybe. Some theorists think that quantum gravity will change the Love numbers of black holes. Fewer, but still some, think they will change enough to be detectable, with future gravitational wave telescopes like LISA. I get the impression this is controversial, both because of the different proposals involved and the approximations used to understand them. Still, it’s fun that Love numbers can answer so many different types of questions, and teach us so many different things about physics.

Unrelated: For those curious about what I look/sound like, I recently gave a talk of outreach advice for the Max Planck Institute for Physics, and they posted it online here.

March 18, 2021

Scott AaronsonAbel to win

Many of you will have seen the happy news today that Avi Wigderson and László Lovász share this year’s Abel Prize (which now contends with the Fields Medal for the highest award in pure math). This is only the second time that the Abel Prize has been given wholly or partly for work in theoretical computer science, after Szemerédi in 2012. See also the articles in Quanta or the NYT, which actually say most of what I would’ve said for a lay audience about Wigderson’s and Lovász’s most famous research results and their importance (except, no, Avi hasn’t yet proved P=BPP, just taken some major steps toward it…).

On a personal note, Avi was both my and my wife Dana’s postdoctoral advisor at the Institute for Advanced Study in Princeton. He’s been an unbelievably important mentor to both of us, as he’s been for dozens of others in the CS theory community. Back in 2007, I also had the privilege of working closely with Avi for months on our Algebrization paper. Now would be a fine time to revisit Avi’s Permanent Impact on Me (or watch the YouTube video), which is the talk I gave at IAS in 2016 on the occasion of Avi’s 60th birthday.

Huge congratulations to both Avi and László!

March 14, 2021

Clifford JohnsonPi Day!

It’s pi day today! Don’t forget to celebrate! Options include walking around in circles at 1:59pm, baking a pie, eating pie, etc! Or maybe explain what pi is to somebody… I won’t list irrational behaviour as there’s a bit too much of that already… –cvj (Above image snapped from a … Click to continue reading this post

The post Pi Day! appeared first on Asymptotia.

March 13, 2021

Terence TaoBoosting the van der Corput inequality using the tensor power trick

In this previous blog post I noted the following easy application of Cauchy-Schwarz:

Lemma 1 (Van der Corput inequality) Let {v,u_1,\dots,u_n} be unit vectors in a Hilbert space {H}. Then

\displaystyle  (\sum_{i=1}^n |\langle v, u_i \rangle_H|)^2 \leq \sum_{1 \leq i,j \leq n} |\langle u_i, u_j \rangle_H|.

Proof: The left-hand side may be written as {|\langle v, \sum_{i=1}^n \epsilon_i u_i \rangle_H|^2} for some unit complex numbers {\epsilon_i}. By Cauchy-Schwarz we have

\displaystyle  |\langle v, \sum_{i=1}^n \epsilon_i u_i \rangle_H|^2 \leq \langle \sum_{i=1}^n \epsilon_i u_i, \sum_{j=1}^n \epsilon_j u_j \rangle_H

and the claim now follows from the triangle inequality. \Box
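As a quick numerical sanity check of Lemma 1, one can draw random unit vectors and compare the two sides. This is only an illustrative sketch: it works in a real, finite-dimensional space {{\bf R}^d} (a special case of the lemma), and the helper names `inner`, `random_unit_vector` and `van_der_corput_sides` are made up for this example.

```python
import math
import random

def inner(a, b):
    """Real inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def random_unit_vector(dim, rng):
    """Draw a Gaussian vector and normalise it to unit length."""
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(inner(w, w))
    return [x / norm for x in w]

def van_der_corput_sides(v, us):
    """Return (left-hand side, right-hand side) of the inequality in Lemma 1."""
    lhs = sum(abs(inner(v, u)) for u in us) ** 2
    rhs = sum(abs(inner(a, b)) for a in us for b in us)
    return lhs, rhs

rng = random.Random(0)
dim, n = 5, 8
v = random_unit_vector(dim, rng)
us = [random_unit_vector(dim, rng) for _ in range(n)]
lhs, rhs = van_der_corput_sides(v, us)
print(lhs <= rhs + 1e-12)
```

The small tolerance only guards against floating-point rounding; the inequality itself is exact.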

As a corollary, correlation becomes transitive in a statistical sense (even though it is not transitive in an absolute sense):

Corollary 2 (Statistical transitivity of correlation) Let {v,u_1,\dots,u_n} be unit vectors in a Hilbert space {H} such that {|\langle v,u_i \rangle_H| \geq \delta} for all {i=1,\dots,n} and some {0 < \delta \leq 1}. Then we have {|\langle u_i, u_j \rangle_H| \geq \delta^2/2} for at least {\delta^2 n^2/2} of the pairs {(i,j) \in \{1,\dots,n\}^2}.

Proof: From the lemma, we have

\displaystyle  \sum_{1 \leq i,j \leq n} |\langle u_i, u_j \rangle_H| \geq \delta^2 n^2.

The contribution of those {i,j} with {|\langle u_i, u_j \rangle_H| < \delta^2/2} is at most {\delta^2 n^2/2}, and all the remaining summands are at most {1}, giving the claim. \Box
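To see the counting in Corollary 2 on a concrete instance, here is a small sketch that builds unit vectors with {\langle v, u_i \rangle = \delta} and counts the correlated pairs. The construction {u_i = \delta v + \sqrt{1-\delta^2} w_i} with {w_i} orthogonal to {v}, the real space {{\bf R}^d}, and all function names are assumptions made for illustration only.

```python
import math
import random

def inner(a, b):
    """Real inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalise(w):
    """Rescale a nonzero vector to unit length."""
    norm = math.sqrt(inner(w, w))
    return [x / norm for x in w]

def correlated_unit_vector(v, delta, rng):
    """Unit vector u = delta*v + sqrt(1 - delta^2)*w with w a random unit vector orthogonal to v."""
    g = [rng.gauss(0.0, 1.0) for _ in v]
    proj = inner(g, v)
    w = normalise([x - proj * y for x, y in zip(g, v)])
    s = math.sqrt(1.0 - delta * delta)
    return [delta * y + s * x for x, y in zip(w, v)]

def count_correlated_pairs(us, threshold):
    """Number of ordered pairs (i, j), diagonal included, with |<u_i, u_j>| >= threshold."""
    return sum(1 for a in us for b in us if abs(inner(a, b)) >= threshold)

rng = random.Random(1)
dim, n, delta = 20, 30, 0.5
v = normalise([rng.gauss(0.0, 1.0) for _ in range(dim)])
us = [correlated_unit_vector(v, delta, rng) for _ in range(n)]
pairs = count_correlated_pairs(us, delta ** 2 / 2)
print(pairs, ">=", delta ** 2 * n ** 2 / 2)
```

Random examples like this one typically beat the guaranteed count by a wide margin; the corollary is a worst-case statement.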

One drawback with this corollary is that it does not tell us which pairs {u_i,u_j} correlate. In particular, if the vector {v} also correlates with a separate collection {w_1,\dots,w_n} of unit vectors, the pairs {(i,j)} for which {u_i,u_j} correlate may have no intersection whatsoever with the pairs in which {w_i,w_j} correlate (except of course on the diagonal {i=j} where they must correlate).

While working on an ongoing research project, I recently found that there is a very simple way to get around the latter problem by exploiting the tensor power trick:

Corollary 3 (Simultaneous statistical transitivity of correlation) Let {v, u^k_i} be unit vectors in a Hilbert space for {i=1,\dots,n} and {k=1,\dots,K} such that {|\langle v, u^k_i \rangle_H| \geq \delta_k} for all {i=1,\dots,n}, {k=1,\dots,K} and some {0 < \delta_k \leq 1}. Then there are at least {(\delta_1 \dots \delta_K)^2 n^2/2} pairs {(i,j) \in \{1,\dots,n\}^2} such that {\prod_{k=1}^K |\langle u^k_i, u^k_j \rangle_H| \geq (\delta_1 \dots \delta_K)^2/2}. In particular (by Cauchy-Schwarz) we have {|\langle u^k_i, u^k_j \rangle_H| \geq (\delta_1 \dots \delta_K)^2/2} for all {k}.

Proof: Apply Corollary 2 to the unit vectors {v^{\otimes K}} and {u^1_i \otimes \dots \otimes u^K_i}, {i=1,\dots,n} in the tensor power Hilbert space {H^{\otimes K}}. \Box
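For readers who want the one-line proof unpacked: the only fact being used is that inner products of pure tensors factor. Writing {U_i := u^1_i \otimes \dots \otimes u^K_i} (a shorthand introduced here, not in the original statement), one has

```latex
\begin{align*}
\langle v^{\otimes K}, U_i \rangle_{H^{\otimes K}}
  &= \prod_{k=1}^K \langle v, u^k_i \rangle_H,
  \qquad\text{hence}\qquad
  |\langle v^{\otimes K}, U_i \rangle_{H^{\otimes K}}| \geq \delta_1 \dots \delta_K, \\
\langle U_i, U_j \rangle_{H^{\otimes K}}
  &= \prod_{k=1}^K \langle u^k_i, u^k_j \rangle_H .
\end{align*}
```

In particular {v^{\otimes K}} and the {U_i} are unit vectors, so Corollary 2 applies with {\delta := \delta_1 \dots \delta_K}, and the second identity converts its conclusion about {|\langle U_i, U_j \rangle|} into the stated bound on the product.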

It is surprisingly difficult to obtain even a qualitative version of the above conclusion (namely, if {v} correlates with all of the {u^k_i}, then there are many pairs {(i,j)} for which {u^k_i} correlates with {u^k_j} for all {k} simultaneously) without some version of the tensor power trick. For instance, even the powerful Szemerédi regularity lemma, when applied to the set of pairs {i,j} for which one has correlation of {u^k_i}, {u^k_j} for a single {k}, does not seem to be sufficient. However, there is a reformulation of the argument using the Schur product theorem as a substitute for (or really, a disguised version of) the tensor power trick. For simplicity of notation let us just work with real Hilbert spaces to illustrate the argument. We start with the identity

\displaystyle  \langle u^k_i, u^k_j \rangle_H = \langle v, u^k_i \rangle_H \langle v, u^k_j \rangle_H + \langle \pi(u^k_i), \pi(u^k_j) \rangle_H

where {\pi} is the orthogonal projection to the complement of {v}. This implies a Gram matrix inequality

\displaystyle  (\langle u^k_i, u^k_j \rangle_H)_{1 \leq i,j \leq n} \succ (\langle v, u^k_i \rangle_H \langle v, u^k_j \rangle_H)_{1 \leq i,j \leq n} \succ 0

for each {k} where {A \succ B} denotes the claim that {A-B} is positive semi-definite. By the Schur product theorem, we conclude that

\displaystyle  (\prod_{k=1}^K \langle u^k_i, u^k_j \rangle_H)_{1 \leq i,j \leq n} \succ (\prod_{k=1}^K \langle v, u^k_i \rangle_H \langle v, u^k_j \rangle_H)_{1 \leq i,j \leq n}

and hence for a suitable choice of signs {\epsilon_1,\dots,\epsilon_n},

\displaystyle  \sum_{1 \leq i, j \leq n} \epsilon_i \epsilon_j \prod_{k=1}^K \langle u^k_i, u^k_j \rangle_H \geq \delta_1^2 \dots \delta_K^2 n^2.

One now argues as in the proof of Corollary 2.
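The "suitable choice of signs" can be made explicit; the following is one way to fill in the step, using the shorthand {b_i := \prod_{k=1}^K \langle v, u^k_i \rangle_H} (not notation from the argument above). The right-hand matrix in the Schur product inequality is then the rank one matrix {(b_i b_j)_{1 \leq i,j \leq n}}, and taking {\epsilon_i} to be the sign of {b_i} and testing the inequality against the vector {(\epsilon_1,\dots,\epsilon_n)} gives

```latex
\begin{align*}
\sum_{1 \leq i,j \leq n} \epsilon_i \epsilon_j \prod_{k=1}^K \langle u^k_i, u^k_j \rangle_H
  &\geq \sum_{1 \leq i,j \leq n} \epsilon_i \epsilon_j b_i b_j
   = \Big( \sum_{i=1}^n |b_i| \Big)^2 \\
  &\geq (\delta_1 \dots \delta_K\, n)^2
   = \delta_1^2 \dots \delta_K^2 n^2 .
\end{align*}
```

Since {|\epsilon_i \epsilon_j| = 1}, the triangle inequality then bounds {\sum_{i,j} \prod_k |\langle u^k_i, u^k_j \rangle_H|} from below by the same quantity, which is the starting point of the proof of Corollary 2.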

A separate application of tensor powers to amplify correlations was also noted in this previous blog post giving a cheap version of the Kabatjanskii-Levenstein bound, but this seems to not be directly related to this current application.

March 05, 2021

Clifford JohnsonA Dialogue about Art and Science!

On Saturday (tomorrow), I'll be talking with science writer Philip Ball at the Malvern Festival of Ideas! The topic will be Science and Art, and I think it will be an interesting and fun exchange. It is free, online, and starts at 5:15 pm UK time. You can click here for the details.

I'll talk a little bit about how I came to create the non-fiction science book The Dialogues, using graphic narrative art to help frame and drive the ideas forward, and how I really wanted to re-shape what is the norm for a popular science book, where somehow using just prose to talk about serious scientific ideas has become regarded as the pinnacle of achievement - this runs counter to so many things, not the least being the fact that scientists themselves don't just use prose to communicate with each other!

But anyway, that's just the beginning of it all. Philip and I will talk about [...] Click to continue reading this post

The post A Dialogue about Art and Science! appeared first on Asymptotia.

March 01, 2021

Terence Tao246B, Notes 3: Elliptic functions and modular forms

Previous set of notes: Notes 2. Next set of notes: Notes 4.

On the real line, the quintessential examples of a periodic function are the (normalised) sine and cosine functions {\sin(2\pi x)}, {\cos(2\pi x)}, which are {1}-periodic in the sense that

\displaystyle  \sin(2\pi(x+1)) = \sin(2\pi x); \quad \cos(2\pi (x+1)) = \cos(2\pi x).

By taking various polynomial combinations of {\sin(2\pi x)} and {\cos(2\pi x)} we obtain more general trigonometric polynomials that are {1}-periodic; and the theory of Fourier series tells us that all other {1}-periodic functions (with reasonable integrability conditions) can be approximated in various senses by such polynomial combinations. Using Euler’s identity, one can use {e^{2\pi ix}} and {e^{-2\pi ix}} in place of {\sin(2\pi x)} and {\cos(2\pi x)} as the basic generating functions here, provided of course one is willing to use complex coefficients instead of real ones. Of course, by rescaling one can also make similar statements for other periods than {1}. {1}-periodic functions {f: {\bf R} \rightarrow {\bf C}} can also be identified (by abuse of notation) with functions {f: {\bf R}/{\bf Z} \rightarrow {\bf C}} on the quotient space {{\bf R}/{\bf Z}} (known as the additive {1}-torus or additive unit circle), or with functions {f: [0,1] \rightarrow {\bf C}} on the fundamental domain (up to boundary) {[0,1]} of that quotient space with the periodic boundary condition {f(0)=f(1)}. The map {x \mapsto (\cos(2\pi x), \sin(2\pi x))} also identifies the additive unit circle {{\bf R}/{\bf Z}} with the geometric unit circle {S^1 = \{ (x,y) \in {\bf R}^2: x^2+y^2=1\} \subset {\bf R}^2}, thanks in large part to the fundamental trigonometric identity {\cos^2 x + \sin^2 x = 1}; this can also be identified with the multiplicative unit circle {S^1 = \{ z \in {\bf C}: |z|=1 \}}. (Usually by abuse of notation we refer to all of these three sets simultaneously as the “unit circle”.) Trigonometric polynomials on the additive unit circle then correspond to ordinary polynomials of the real coefficients {x,y} of the geometric unit circle, or Laurent polynomials of the complex variable {z}.
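To make the dictionary between trigonometric polynomials and Laurent polynomials concrete, here is the standard computation with {z = e^{2\pi i x}}, followed by one small worked example (the particular product chosen is just an illustration):

```latex
\begin{align*}
\cos(2\pi x) &= \frac{e^{2\pi i x} + e^{-2\pi i x}}{2} = \frac{z + z^{-1}}{2}, \qquad
\sin(2\pi x) = \frac{e^{2\pi i x} - e^{-2\pi i x}}{2i} = \frac{z - z^{-1}}{2i}, \\
\sin(2\pi x)\cos(2\pi x) &= \frac{(z - z^{-1})(z + z^{-1})}{4i}
  = \frac{z^2 - z^{-2}}{4i} = \frac{1}{2}\sin(4\pi x).
\end{align*}
```

So a polynomial in {\sin(2\pi x)} and {\cos(2\pi x)} becomes a Laurent polynomial in {z}, at the cost of complex coefficients.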

What about periodic functions on the complex plane? We can start with singly periodic functions {f: {\bf C} \rightarrow {\bf C}} which obey a periodicity relationship {f(z+\omega)=f(z)} for all {z} in the domain and some period {\omega \in {\bf C} \backslash \{0\}}; such functions can also be viewed as functions on the “additive cylinder” {\omega {\bf Z} \backslash {\bf C}} (or equivalently {{\bf C} / \omega {\bf Z}}). We can rescale {\omega=1} as before. For holomorphic functions, we have the following characterisations:

Proposition 1 (Description of singly periodic holomorphic functions)
  • (i) Every {1}-periodic entire function {f: {\bf C} \rightarrow {\bf C}} has an absolutely convergent expansion

    \displaystyle  f(z) = \sum_{n=-\infty}^\infty a_n e^{2\pi i nz} = \sum_{n=-\infty}^\infty a_n q^n \ \ \ \ \ (1)

    where {q} is the nome {q := e^{2\pi i z}}, and the {a_n} are complex coefficients such that

    \displaystyle  \limsup_{n \rightarrow +\infty} |a_n|^{1/n} = \limsup_{n \rightarrow +\infty} |a_{-n}|^{1/n} = 0. \ \ \ \ \ (2)

    Conversely, every doubly infinite sequence {(a_n)_{n \in {\bf Z}}} of coefficients obeying (2) gives rise to a {1}-periodic entire function {f: {\bf C} \rightarrow {\bf C}} via the formula (1).
  • (ii) Every bounded {1}-periodic holomorphic function {f: {\bf H} \rightarrow {\bf C}} on the upper half-plane {\{ z: \mathrm{Im}(z) > 0\}} has an expansion

    \displaystyle  f(z) = \sum_{n=0}^\infty a_n e^{2\pi i nz} = \sum_{n=0}^\infty a_n q^n \ \ \ \ \ (3)

    where the {a_n} are complex coefficients such that

    \displaystyle  \limsup_{n \rightarrow +\infty} |a_n|^{1/n} \leq 1. \ \ \ \ \ (4)

    Conversely, every infinite sequence {(a_n)_{n \geq 0}} obeying (4) gives rise to a {1}-periodic holomorphic function {f: {\bf H} \rightarrow {\bf C}} which is bounded away from the real axis (i.e., bounded on {\{ z: \mathrm{Im}(z) \geq \varepsilon\}} for every {\varepsilon > 0}).
In both cases, the coefficients {a_n} can be recovered from {f} by the Fourier inversion formula

\displaystyle  a_n = \int_{\gamma_{z_0 \rightarrow z_0+1}} f(z) e^{-2\pi i nz}\ dz \ \ \ \ \ (5)

for any {z_0} in {{\bf C}} (in case (i)) or {{\bf H}} (in case (ii)).
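The inversion formula (5) is easy to test numerically. The sketch below is an illustration under stated assumptions: the contour is taken to be the straight segment from {z_0} to {z_0+1}, the integral is approximated by an equally spaced Riemann sum (which for a {1}-periodic integrand coincides with the trapezoid rule), and the test function {f(z) = \exp(e^{2\pi i z}) = \sum_{n \geq 0} q^n/n!} is chosen here only because its coefficients {a_n = 1/n!} are known in closed form.

```python
import cmath
import math

def fourier_coefficient(f, n, z0=0j, samples=64):
    """Approximate a_n = integral of f(z) e^{-2 pi i n z} dz over the segment z0 -> z0 + 1."""
    total = 0j
    for m in range(samples):
        z = z0 + m / samples
        total += f(z) * cmath.exp(-2j * math.pi * n * z)
    return total / samples

def f(z):
    """Sample 1-periodic entire function exp(q) with q = e^{2 pi i z}."""
    return cmath.exp(cmath.exp(2j * math.pi * z))

a3 = fourier_coefficient(f, 3)
a3_shifted = fourier_coefficient(f, 3, z0=0.3 + 0.2j)  # a different base point z0
a_neg1 = fourier_coefficient(f, -1)                     # no negative powers of q
print(abs(a3 - 1 / 6), abs(a3_shifted - 1 / 6), abs(a_neg1))
```

All three printed errors should be at the level of floating-point rounding, reflecting both the independence of (5) from {z_0} and the absence of negative-index coefficients for this particular {f}.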

Proof: If {f: {\bf C} \rightarrow {\bf C}} is {1}-periodic, then it can be expressed as {f(z) = F(q) = F(e^{2\pi i z})} for some function {F: {\bf C} \backslash \{0\} \rightarrow {\bf C}} on the “multiplicative cylinder” {{\bf C} \backslash \{0\}}, since the fibres of the map {z \mapsto e^{2\pi i z}} are cosets of the integers {{\bf Z}}, on which {f} is constant by hypothesis. As the map {z \mapsto e^{2\pi i z}} is a covering map from {{\bf C}} to {{\bf C} \backslash \{0\}}, we see that {F} will be holomorphic if and only if {f} is. Thus {F} must have a Laurent series expansion {F(q) = \sum_{n=-\infty}^\infty a_n q^n} with coefficients {a_n} obeying (2), which gives (1), and the inversion formula (5) follows from the usual contour integration formula for Laurent series coefficients. The converse direction to (i) also follows by reversing the above arguments.

For part (ii), we observe that the map {z \mapsto e^{2\pi i z}} is also a covering map from {{\bf H}} to the punctured disk {D(0,1) \backslash \{0\}}, so we can argue as before except that now {F} is a bounded holomorphic function on the punctured disk. By the Riemann singularity removal theorem (Exercise 35 of 246A Notes 3) {F} extends to be holomorphic on all of {D(0,1)}, and thus has a Taylor expansion {F(q) = \sum_{n=0}^\infty a_n q^n} for some coefficients {a_n} obeying (4). The argument now proceeds as with part (i). \Box
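As a quick numerical sanity check of the inversion formula (5), the following Python sketch approximates the contour integral by an equally spaced Riemann sum along the horizontal segment from {z_0} to {z_0+1}. The test function {f(z) = \exp(e^{2\pi i z})}, the base points, the sample count and the helper name `fourier_coefficient` are illustrative assumptions, not taken from the notes. For this {f} one has {F(q) = e^q}, so one expects {a_n = 1/n!} for {n \geq 0} and {a_n = 0} for {n < 0}.

```python
import cmath
import math

def fourier_coefficient(f, n, z0, samples=64):
    """Approximate a_n = integral of f(z) e^{-2 pi i n z} dz over the segment z0 -> z0 + 1.

    The integrand is 1-periodic along the segment, so an equally spaced
    Riemann sum converges very quickly when f is holomorphic.
    """
    total = 0j
    for k in range(samples):
        z = z0 + k / samples
        total += f(z) * cmath.exp(-2j * math.pi * n * z)
    return total / samples

def f(z):
    # Illustrative 1-periodic entire function: f(z) = F(q) with F(q) = exp(q).
    return cmath.exp(cmath.exp(2j * math.pi * z))

z0 = 0.3 + 0.2j  # arbitrary base point; (5) should not depend on this choice
coeffs = {n: fourier_coefficient(f, n, z0) for n in range(-2, 6)}
for n, a in coeffs.items():
    print(n, a)
```

Changing `z0` (for instance to a point below the real axis, which is allowed in case (i)) should leave the computed coefficients unchanged up to rounding, reflecting the contour independence of (5).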

The additive cylinder {{\bf Z} \backslash {\bf C}} and the multiplicative cylinder {{\bf C} \backslash \{0\}} can both be identified (on the level of smooth manifolds, at least) with the geometric cylinder {\{ (x,y,z) \in {\bf R}^3: x^2+y^2=1\}}, but we will not use this identification here.

Now let us turn attention to doubly periodic functions of a complex variable {z}, that is to say functions {f} that obey two periodicity relations

\displaystyle  f(z+\omega_1) = f(z); \quad f(z+\omega_2) = f(z)

for all {z \in {\bf C}} and some periods {\omega_1,\omega_2 \in {\bf C}}, which to avoid degeneracies we will assume to be linearly independent over the reals (thus {\omega_1,\omega_2} are non-zero and the ratio {\omega_2/\omega_1} is not real). One can rescale {\omega_1,\omega_2} by a common scaling factor {\lambda \in {\bf C} \backslash \{0\}} to normalise either {\omega_1=1} or {\omega_2=1}, but one of course cannot simultaneously normalise both parameters in this fashion. As in the singly periodic case, such functions can also be identified with functions on the additive {2}-torus {\Lambda \backslash {\bf C}}, where {\Lambda} is the lattice {\Lambda := \omega_1 {\bf Z} + \omega_2 {\bf Z}}, or with functions {f} on the solid parallelogram bounded by the contour {\gamma_{0 \rightarrow \omega_1 \rightarrow \omega_1+\omega_2 \rightarrow \omega_2 \rightarrow 0}} (a fundamental domain up to boundary for that torus), obeying the boundary periodicity conditions

\displaystyle  f(z+\omega_1) = f(z)

for {z} in the edge {\gamma_{\omega_2 \rightarrow 0}}, and

\displaystyle  f(z+\omega_2) = f(z)

for {z} in the edge {\gamma_{0 \rightarrow \omega_1}}.

Within the world of holomorphic functions, the collection of doubly periodic functions is boring:

Proposition 2 Let {f: {\bf C} \rightarrow {\bf C}} be an entire doubly periodic function (with periods {\omega_1,\omega_2} linearly independent over {{\bf R}}). Then {f} is constant.

In the language of Riemann surfaces, this proposition asserts that the torus {\Lambda \backslash {\bf C}} is a non-hyperbolic Riemann surface; it cannot be holomorphically mapped non-trivially into a bounded subset of the complex plane.

Proof: The fundamental domain (up to boundary) enclosed by {\gamma_{0 \rightarrow \omega_1 \rightarrow \omega_1+\omega_2 \rightarrow \omega_2 \rightarrow 0}} is compact, hence {f} is bounded on this domain, hence bounded on all of {{\bf C}} by double periodicity. The claim now follows from Liouville’s theorem. (One could alternatively have argued here using the compactness of the torus {(\omega_1 {\bf Z} + \omega_2 {\bf Z}) \backslash {\bf C}}.) \Box

To obtain more interesting examples of doubly periodic functions, one must therefore turn to the world of meromorphic functions – or equivalently, holomorphic functions into the Riemann sphere {{\bf C} \cup \{\infty\}}. As it turns out, a particularly fundamental example of such a function is the Weierstrass elliptic function

\displaystyle  \wp(z) := \frac{1}{z^2} + \sum_{z_0 \in \Lambda \backslash 0} \frac{1}{(z-z_0)^2} - \frac{1}{z_0^2} \ \ \ \ \ (6)

which plays a role in doubly periodic functions analogous to the role of {x \mapsto \cos(2\pi x)} for {1}-periodic real functions. This function will have a double pole at the origin {0}, and more generally at all other points on the lattice {\Lambda}, but no other poles. The derivative

\displaystyle  \wp'(z) = -2 \sum_{z_0 \in \Lambda} \frac{1}{(z-z_0)^3} \ \ \ \ \ (7)

of the Weierstrass function is another doubly periodic meromorphic function, now with a triple pole at every point of {\Lambda}, and plays a role analogous to {x \mapsto \sin(2\pi x)}. Remarkably, all the other doubly periodic meromorphic functions with these periods will turn out to be rational combinations of {\wp} and {\wp'}; furthermore, in analogy with the identity {\cos^2 x+ \sin^2 x = 1}, one has an identity of the form

\displaystyle  \wp'(z)^2 = 4 \wp(z)^3 - g_2 \wp(z) - g_3 \ \ \ \ \ (8)

for all {z \in {\bf C}} (avoiding poles) and some complex numbers {g_2,g_3} that depend on the lattice {\Lambda}. Indeed, much as the map {x \mapsto (\cos 2\pi x, \sin 2\pi x)} creates a diffeomorphism between the additive unit circle {{\bf R}/{\bf Z}} and the geometric unit circle {\{ (x,y) \in{\bf R}^2: x^2+y^2=1\}}, the map {z \mapsto (\wp(z), \wp'(z))} turns out to be a complex diffeomorphism between the torus {(\omega_1 {\bf Z} + \omega_2 {\bf Z}) \backslash {\bf C}} and the elliptic curve

\displaystyle  \{ (z, w) \in {\bf C}^2: w^2 = 4z^3 - g_2 z - g_3 \} \cup \{\infty\}

with the convention that {(\wp,\wp')} maps the origin {\omega_1 {\bf Z} + \omega_2 {\bf Z}} of the torus to the point {\infty} at infinity. (Indeed, one can view elliptic curves as “multiplicative tori”, and both the additive and multiplicative tori can be identified as smooth manifolds with the more familiar geometric torus, but we will not use such an identification here.) This fundamental identification between elliptic curves and tori motivates many of the further remarkable properties of elliptic curves; for instance, the fact that tori are obviously an abelian group gives rise to an abelian group law on elliptic curves (and this law can be interpreted as an analogue of the trigonometric sum identities for {\wp, \wp'}). The description of the various meromorphic functions on the torus also helps motivate the more general Riemann-Roch theorem that is a fundamental law governing meromorphic functions on other compact Riemann surfaces (and is discussed further in these 246C notes).

So far we have focused on studying a single torus {\Lambda \backslash {\bf C}}. However, another important mathematical object of study is the space of all such tori, modulo isomorphism; this is a basic example of a moduli space, known as the (classical, level one) modular curve {X_0(1)}. This curve can be described in a number of ways. On the one hand, it can be viewed as the upper half-plane {{\bf H} = \{ z: \mathrm{Im}(z) > 0 \}} quotiented out by the discrete group {SL_2({\bf Z})}; on the other hand, by using the {j}-invariant, it can be identified with the complex plane {{\bf C}}; alternatively, one can compactify the modular curve and identify this compactification with the Riemann sphere {{\bf C} \cup \{\infty\}}. (This identification, by the way, produces a very short proof of the little and great Picard theorems, which we proved in 246A Notes 4.)

Functions on the modular curve (such as the {j}-invariant) can be viewed as {SL_2({\bf Z})}-invariant functions on {{\bf H}}, and include the important class of modular functions; they naturally generalise to the larger class of (weakly) modular forms, which are functions on {{\bf H}} which transform in a very specific way under {SL_2({\bf Z})}-action, and which are ubiquitous throughout mathematics, and particularly in number theory. Basic examples of modular forms include the Eisenstein series, which are also the Laurent coefficients of the Weierstrass elliptic functions {\wp}. More number theoretic examples of modular forms include (suitable powers of) theta functions {\theta}, and the modular discriminant {\Delta}. Modular forms are {1}-periodic functions on the half-plane, and hence by Proposition 1 come with Fourier coefficients {a_n}; these coefficients often turn out to encode a surprising amount of number-theoretic information; a dramatic example of this is the famous modularity theorem, (a special case of which was) used amongst other things to establish Fermat’s last theorem. Modular forms can be generalised to other discrete groups than {SL_2({\bf Z})} (such as congruence groups) and to other domains than the half-plane {{\bf H}}, leading to the important larger class of automorphic forms, which are of major importance in number theory and representation theory, but which are well outside the scope of this course to discuss.

— 1. Doubly periodic functions —

Throughout this section we fix two complex numbers {\omega_1,\omega_2} that are linearly independent over {{\bf R}}, which then generate a lattice {\Lambda := \omega_1 {\bf Z} + \omega_2{\bf Z}}.

We now study the doubly periodic meromorphic functions with respect to these periods that are not identically zero. We first observe some constraints on the poles of these functions. Of course, by periodicity, the poles will themselves be periodic, and thus the set of poles forms a finite union of disjoint cosets {\zeta_j + \Lambda} of the lattice {\Lambda}. Similarly, the zeroes form a finite union of disjoint cosets {\lambda_j + \Lambda}. Using the residue theorem, we can obtain some further constraints:

Lemma 3 (Consequences of residue theorem) Let {f: {\bf C} \rightarrow {\bf C} \cup \{\infty\}} be a doubly periodic meromorphic function (not identically zero) with periods {\omega_1,\omega_2}, poles at {\zeta_j + \Lambda}, and zeroes at {\lambda_j + \Lambda}.
  • (i) The sum of residues at each {\zeta_j} (i.e., we sum one residue per coset) is equal to zero.
  • (ii) The number of poles {\zeta_j} (counting multiplicity, but only counting once per coset) is equal to the number of zeroes {\lambda_j} (again counting multiplicity, and once per coset).
  • (iii) The sum of the poles {\zeta_j + \Lambda} (counting multiplicity, and working in the group {\Lambda \backslash {\bf C}}) is equal to the sum of the zeroes {\lambda_j + \Lambda}.

Proof: For (i), we first apply a translation so that none of the pole cosets {\zeta_j + \Lambda} intersects the fundamental parallelogram boundary {\gamma_{0 \rightarrow \omega_1 \rightarrow \omega_1+\omega_2 \rightarrow \omega_2 \rightarrow 0}}; this of course does not affect the sum of residues. Then, by the residue theorem, the sum in (i) is equal to the expression

\displaystyle  \frac{\pm 1}{2\pi i} \int_{\gamma_{0 \rightarrow \omega_1 \rightarrow \omega_1+\omega_2 \rightarrow \omega_2 \rightarrow 0}} f(z)\ dz

(the sign depends on whether this contour is oriented counter-clockwise or clockwise). But from the double periodicity we see that the integral vanishes (the contribution of parallel pairs of edges cancel each other). For part (ii), apply part (i) to the logarithmic derivative {f'/f}, which is also doubly periodic.

For part (iii), we again translate so that none of the pole or zero cosets intersects {\gamma_{0 \rightarrow \omega_1 \rightarrow \omega_1+\omega_2 \rightarrow \omega_2 \rightarrow 0}}, noting from part (ii) that any such translation affects the sum of poles and sum of zeroes by the same amount. By the residue theorem, it now suffices to show that

\displaystyle  \frac{\pm 1}{2\pi i} \int_{\gamma_{0 \rightarrow \omega_1 \rightarrow \omega_1+\omega_2 \rightarrow \omega_2 \rightarrow 0}} z \frac{f'(z)}{f(z)}\ dz

lies in the lattice {\Lambda}. But one can rewrite this using the double periodicity as

\displaystyle  \frac{\pm 1}{2\pi i} ( \int_{\gamma_{0 \rightarrow \omega_2}} \omega_1 \frac{f'(z)}{f(z)}\ dz - \int_{\gamma_{0 \rightarrow \omega_1}} \omega_2 \frac{f'(z)}{f(z)}\ dz ),

so it suffices to show that {\frac{1}{2\pi i} \int_{\gamma_{0 \rightarrow \omega_j}} \frac{f'(z)}{f(z)}\ dz} is an integer for {j=1,2}. But (a slight modification of) the argument principle shows that this number is precisely the winding number around the origin of the image of {\gamma_{0 \rightarrow \omega_j}} under the map {f} (this image is a closed curve, since {f(\omega_j) = f(0)} by periodicity), and the claim follows. \Box

This lemma severely limits the possible behaviors of the zeroes and poles of a meromorphic function. To formalise this, we introduce some general notation:

Definition 4 (Divisors)
  • (i) A divisor on the torus {\Lambda\backslash {\bf C}} is a formal integer linear combination {D = \sum_P c_P \cdot (P)}, where {P} ranges over a finite collection of points in the torus {\Lambda \backslash {\bf C}} (i.e., a finite collection of cosets {\zeta + \Lambda}), and {c_P} are integers, with the obvious additive group structure; equivalently, the space {\mathrm{Div}( \Lambda\backslash {\bf C} )} of divisors is the free abelian group with generators {(P)} for {P \in \Lambda \backslash {\bf C}} (with the convention {1 \cdot (P) = (P)}).
  • (ii) The number {\sum_P c_P} is the degree {\mathrm{deg}(D)} of a divisor {D = \sum_P c_P \cdot (P)}, the point {\sum_P c_P P \in \Lambda \backslash {\bf C}} is the sum {\mathrm{sum}(D)} of {D}, and each {c_P} is the order {\mathrm{ord}_P(D)} of the divisor at {P} (with the convention that the order is {0} if {P} does not appear in the sum). A divisor is non-negative (or effective) if {c_P \geq 0} for all {P}. We write {D_1 \geq D_2} if {D_1 - D_2} is non-negative (i.e., the order of {D_1} is greater than or equal to that of {D_2} at every point {P}), and {D_1 > D_2} if {D_1 \geq D_2} and {D_1 \neq D_2}.
  • (iii) Given a meromorphic function {f: \Lambda \backslash {\bf C} \rightarrow {\bf C} \cup \{\infty\}} (or equivalently, a doubly periodic function {f: {\bf C} \rightarrow {\bf C} \cup \{\infty\}}) that is not identically zero, the principal divisor {(f)} is the divisor {\sum_P \mathrm{ord}_P(f) (P)}, where {P} ranges over the zeroes and poles of {f}, and {\mathrm{ord}_P(f)} is the order of the zero (if {P} is a zero) or negative the order of the pole (if {P} is a pole).
  • (iv) Given a divisor {D = \sum_P c_P \cdot (P)}, we define {L(D)} to be the space of all meromorphic functions {f} that are either zero, or are such that {(f)+D \geq 0}. That is to say, {L(D)} consists of those meromorphic functions that have at most a pole of order {c_P} at {P} if {c_P} is positive, or at least a zero of order {-c_P} if {c_P} is negative.

A divisor can be viewed as an abstraction of the concept of a set of zeroes and poles (counting multiplicity). Observe that principal divisors obey the laws {(fg) = (f)+(g)}, {(f/g) = (f) - (g)} when {f,g} are meromorphic and non-zero. In particular, the space {\mathrm{PDiv}(\Lambda \backslash {\bf C})} of principal divisors is a subgroup of the space {\mathrm{Div}(\Lambda \backslash {\bf C})} of all divisors. By Lemma 3(ii), all principal divisors have degree zero, and from Lemma 3(iii), all principal divisors have sum zero as well. Later on we shall establish the converse claim that every divisor of degree and sum zero is a principal divisor; see Exercise 7.

Remark 5 One can define divisors on other Riemann surfaces, such as the complex plane {{\bf C}}. Observe from the fundamental theorem of algebra that if one has two non-zero polynomials {P(z), Q(z)}, then {(P) \leq (Q)} if and only if {P} divides {Q} as a polynomial. This may give some hint as to the origin of the terminology “divisor”. The machinery of divisors turns out to have a rich algebraic and topological structure when applied to more general Riemann surfaces than tori, for instance enabling one to associate an abelian variety (the Jacobian variety) to every algebraic curve; see these 246C notes for further discussion.

It is easy to see that {L(D)} is always a vector space. All non-zero meromorphic functions {f: \Lambda \backslash {\bf C} \rightarrow {\bf C}} belong to at least one of the {L(D)}, namely {L(-(f))}, so to classify all the meromorphic functions on {\Lambda \backslash {\bf C}}, it would suffice to understand what all the spaces {L(D)} are.

Liouville’s theorem (in the form of Proposition 2) tells us that all elements of {L(0)} – that is to say, the holomorphic functions on {\Lambda \backslash {\bf C}} – are constant; thus {L(0)} is one-dimensional. If {D<0} is a negative divisor, the elements of {L(D)} are thus constant and have at least one zero, thus in these cases {L(D)} is trivial.

Now we gradually work our way up to the spaces {L(D)} for higher degree divisors {D}. A basic fact, proven from elementary linear algebra, is that every time one adds a pole to {D}, the dimension of the space {L(D)} goes up by at most one:

Lemma 6 For any divisor {D} and any {P \in \Lambda \backslash {\bf C}}, {L(D)} is a subspace of {L(D + (P))} of codimension at most one. In particular, {L(D)} is finite-dimensional for any {D}.

Proof: It is clear that {L(D)} is a subspace of {L(D + (P))}. If {D} has order {m} at {P = \zeta + \Lambda}, then there is a linear functional {\lambda: L(D+(P)) \rightarrow {\bf C}} that assigns to each meromorphic function {f: \Lambda \backslash {\bf C} \rightarrow {\bf C} \cup \{\infty\}} the {\frac{1}{(z-\zeta)^{m+1}}} coefficient of the Laurent expansion of {f} at {\zeta} (note from periodicity that the exact choice of coset representative {\zeta} is not relevant). A little thought reveals that the kernel of {\lambda} is precisely {L(D)}, and the first claim follows. The second claim follows from iterating the first claim, noting that any divisor {D} can be obtained from a suitable negative divisor by the addition of finitely many poles {(P)}. \Box

Now consider the space {L((P))} for some point {P \in \Lambda \backslash {\bf C}}. Lemma 6 tells us that the dimension of this space is either one or two, since {L(0)} was one-dimensional. The space {L((P))} consists of functions {f} that have at most a simple pole at {P}, and no other poles. But Lemma 3(i) tells us that the residue at {P} has to vanish, and so {f} is in fact in {L(0)} and thus is constant. (One could also argue here using the other two parts of Lemma 3; how?) So {L((P))} is no larger than {L(0)}, and is thus also one-dimensional.

Now let us study the space {L(2 \cdot (P))} – the space of meromorphic functions that have at most a double pole at {P} and no other poles. Again, Lemma 6 tells us that this space is one or two dimensional. To figure out which, we can normalise {P} to be the origin coset {\Lambda}. The question is now whether there is a doubly periodic meromorphic function that has a double pole at each point of {\Lambda}. A naive candidate for such a function would be the infinite series

\displaystyle  \sum_{z_0 \in \Lambda} \frac{1}{(z-z_0)^2},

however this series turns out to not be absolutely convergent. Somewhat in analogy with the discussion of the Weierstrass and Hadamard factorisation theorems in Notes 1, we then proceed instead by working with the normalised function {\wp} defined by the formula (6). Let us first verify that the series in (6) is absolutely convergent for {z \not \in \Lambda}. There are only finitely many {z_0 \in \Lambda} with {|z_0| \leq 2|z|}, and all the summands are finite for {z \not \in\Lambda}, so we only need to establish convergence of the tail

\displaystyle  \sum_{z_0 \in \Lambda: |z_0| \geq 2|z|} \frac{1}{(z-z_0)^2} - \frac{1}{z_0^2}.

However, from the fundamental theorem of calculus we have

\displaystyle \frac{1}{(z-z_0)^2} - \frac{1}{z_0^2} = z\int_0^1 \frac{-2}{(tz-z_0)^3}\ dt = O_z( |z_0|^{-3} )

so to demonstrate absolute convergence it suffices to show that

\displaystyle  \sum_{z_0 \in \Lambda: z_0 \neq 0} \frac{1}{|z_0|^3} < \infty.

But a simple volume packing argument (considering the areas of the translates {z_0 + D} of the fundamental domain {D}) shows that the number of lattice points {z_0} in any disk {D(0,R)}, {R \geq 1} is {O_\Lambda(R^2)}, and so by dyadic decomposition as in Notes 1, the series is absolutely convergent. Further repetition of the arguments from Notes 1 shows that the series in (6) converges locally uniformly in {{\bf C} \backslash \Lambda}, and thus is holomorphic on this set. Furthermore, for any {z_0 \in \Lambda}, the same arguments show that {\wp(z) - \frac{1}{(z-z_0)^2}} stays bounded in a punctured neighbourhood of {z_0}, thus by the Riemann singularity removal theorem {\wp(z)} is equal to {\frac{1}{(z-z_0)^2}} plus a bounded holomorphic function in the neighbourhood of {z_0}. Thus {\wp} is meromorphic with double poles (and vanishing residue) at every lattice point {\Lambda}, and no other poles.
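The counting bound and the dyadic decomposition can be illustrated numerically. The Python sketch below counts the lattice points in {D(0,R)} for a few radii and sums {|z_0|^{-3}} over the dyadic shells {2^k \leq |z_0| < 2^{k+1}}; the periods {\omega_1 = 1}, {\omega_2 = 0.25 + i}, the radii, and the helper name `lattice_points` are illustrative assumptions. One expects the counts to grow like {R^2} and the shell contributions to decay roughly geometrically.

```python
import math

def lattice_points(omega1, omega2, radius):
    """Return the lattice points m*omega1 + n*omega2 of modulus at most `radius`."""
    area = abs((omega1.conjugate() * omega2).imag)  # area of a fundamental parallelogram
    # Taking imaginary parts of conj(omega1)*z and conj(omega2)*z shows that
    # |n| <= |omega1||z|/area and |m| <= |omega2||z|/area, so this box suffices.
    bound = math.ceil(radius * max(abs(omega1), abs(omega2)) / area)
    points = []
    for m in range(-bound, bound + 1):
        for n in range(-bound, bound + 1):
            z0 = m * omega1 + n * omega2
            if abs(z0) <= radius:
                points.append(z0)
    return points

omega1, omega2 = complex(1, 0), complex(0.25, 1.0)  # illustrative periods
area = abs((omega1.conjugate() * omega2).imag)
points = lattice_points(omega1, omega2, 64)

# Number of lattice points in D(0, R), compared with R^2.
counts = {R: sum(1 for z0 in points if abs(z0) <= R) for R in (8, 16, 32, 64)}
for R, c in counts.items():
    print(R, c, c / R**2)

# Dyadic shells 2^k <= |z0| < 2^(k+1): each should contribute O(2^(-k)).
shells = [sum(abs(z0) ** -3 for z0 in points if 2**k <= abs(z0) < 2 ** (k + 1))
          for k in range(6)]
print(shells)
```

The ratio `c / R**2` should settle near {\pi} divided by the area of the fundamental domain, which is the constant hidden in the {O_\Lambda(R^2)} notation for this particular lattice.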

Now we show that {\wp} is doubly periodic, thus {\wp(z + \omega_1) = \wp(z)} and {\wp(z + \omega_2) = \wp(z)} for {z \in {\bf C} \backslash \Lambda}. We just prove the first identity, as the second is analogous. From (6) we have

\displaystyle  \wp(z+\omega_1) - \wp(z) = \sum_{z_0 \in \Lambda} \frac{1}{(z-z_0+\omega_1)^2} - \frac{1}{(z-z_0)^2}.

The series on the right is absolutely convergent, and on every coset of {\omega_1 {\bf Z}} it telescopes to zero. The claim then follows by Fubini’s theorem.

By construction, {\wp} lies in {L(2 \cdot (\Lambda))}, and is clearly non-constant. Thus {L(2 \cdot (\Lambda))} is two-dimensional, being spanned by the constant function {1} and {\wp}. By translation, we see that {L(2 \cdot (P))} is two-dimensional for any other point {P \in \Lambda \backslash {\bf C}} as well.
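The double periodicity and evenness of {\wp} can be observed numerically from a truncation of the series (6). The following Python sketch sums over the lattice points {m \omega_1 + n \omega_2} with {|m|, |n| \leq N}; the periods, the sample point, the truncation parameter {N} and the function name `wp` are illustrative assumptions. The truncated sum is only approximately periodic (the discrepancy comes from the boundary of the truncated index range and should shrink as {N} grows), whereas the symmetric truncation is even up to rounding error.

```python
def wp(z, omega1, omega2, N=60):
    """Truncation of the series (6) to lattice points m*omega1 + n*omega2 with |m|, |n| <= N."""
    total = 1 / z**2
    for m in range(-N, N + 1):
        for n in range(-N, N + 1):
            if m == 0 and n == 0:
                continue  # the z0 = 0 term is the separate 1/z^2 above
            z0 = m * omega1 + n * omega2
            total += 1 / (z - z0) ** 2 - 1 / z0**2
    return total

omega1, omega2 = complex(1, 0), complex(0.25, 1.0)  # illustrative periods
z = complex(0.3, 0.2)                               # illustrative point off the lattice

base = wp(z, omega1, omega2)
shift1 = abs(wp(z + omega1, omega1, omega2) - base)  # periodicity in omega1
shift2 = abs(wp(z + omega2, omega1, omega2) - base)  # periodicity in omega2
parity = abs(wp(-z, omega1, omega2) - base)          # evenness
print(shift1, shift2, parity)
```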

From (6) it is also clear that the function {\wp} is even: {\wp(-z) = \wp(z)}. In particular, for any {a \in {\bf C}} avoiding the half-lattice {\frac{1}{2} \Lambda = \{ \frac{1}{2} z_0: z_0 \in \Lambda\}} (so that {a} and {-a} occupy different locations in the torus {\Lambda \backslash {\bf C}}), the function {\wp - \wp(a) \in L(2 \cdot (\Lambda))} has a zero at both {a+\Lambda} and {-a+\Lambda}. By Lemma 3(ii) there are no other zeroes of this function (and this claim is also consistent with Lemma 3(iii)); thus the divisor {(\wp - \wp(a))} of this function is given by

\displaystyle  (\wp - \wp(a)) = (a + \Lambda) + (-a+\Lambda) - 2 \cdot (\Lambda). \ \ \ \ \ (9)

If {a} lies in the half-lattice {\frac{1}{2} \Lambda} but not in {\Lambda} (thus, it lies in one of the half-periods {\frac{\omega_1}{2} + \Lambda}, {\frac{\omega_2}{2} + \Lambda}, or {\frac{\omega_1+\omega_2}{2} + \Lambda}) then from the even and doubly periodic nature of {\wp} we see that {\wp(a+z) = \wp(a-z)} for all {z \not \in \frac{1}{2} \Lambda}, so {\wp - \wp(a)} in fact must have at least a double zero at {a}, and again from Lemma 3(ii) these are the only zeroes of this function. So the identity (9) also holds in this case.

Exercise 7 (Classification of principal divisors)

Now let us study the space {L(3 \cdot (P))}, where we again normalise {P = \Lambda} for sake of discussion. Lemma 6 tells us that this space is two or three dimensional, being spanned by {1}, {\wp}, and possibly one other function. Note that the derivative {\wp'} of the meromorphic function {\wp} is also doubly periodic with a triple pole at {P}, so it lies in {L(3 \cdot (P))} and is not a linear combination of {1} or {\wp} (as these have a lower order singularity at {P}). Thus {L(3 \cdot (\Lambda))} is three-dimensional, being spanned by {1,\wp,\wp'}. A formal term-by-term differentiation of (6) gives (7). To justify (7), observe that the arguments that demonstrated the meromorphicity of the right-hand side of (6) also show the meromorphicity of (7). From Fubini’s theorem, the fundamental theorem of calculus, and (6) we see that

\displaystyle  \int_\gamma (-2 \sum_{z_0 \in \Lambda} \frac{1}{(z-z_0)^3})\ dz = \wp(z_2) - \wp(z_1)

for any contour {\gamma} in {{\bf C} \backslash \Lambda} from one point {z_1 \in {\bf C} \backslash \Lambda} to another {z_2 \in {\bf C} \backslash \Lambda}, and the claim (7) now follows from another appeal to the fundamental theorem of calculus. Of course, {L(3 \cdot (P))} will then also be three-dimensional for any other point {P} on the torus. From (7) we also see that {\wp'} is odd; this also follows from the even nature of {\wp}. From the oddness and periodicity {\wp'} has to have zeroes at the half-periods {\frac{1}{2} \Lambda \backslash \Lambda}; in particular, from Lemma 3(ii) there are no other zeroes, and the principal divisor is given by

\displaystyle  (\wp') = (\frac{\omega_1}{2} + \Lambda) + (\frac{\omega_2}{2}+\Lambda) + (\frac{\omega_1+\omega_2}{2}+\Lambda) - 3 \cdot (\Lambda). \ \ \ \ \ (10)
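To make the bookkeeping of Definition 4 concrete, here is a small Python sketch that stores a divisor as a dictionary from torus points to integer orders, and checks that the principal divisors of {\wp - \wp(a)} and {\wp'} have degree zero and sum zero, as Lemma 3(ii) and Lemma 3(iii) require. The representation of a point {s \omega_1 + t \omega_2 + \Lambda} by the pair {(s,t)} modulo {1}, the choice {a = \frac{1}{3} \omega_1 + \frac{1}{5} \omega_2}, and all function names are hypothetical conveniences; zeroes are counted positively and poles negatively, following Definition 4(iii).

```python
from fractions import Fraction

def torus_point(s, t):
    """Point s*omega1 + t*omega2 + Lambda, stored by its coordinates (s, t) mod 1."""
    return (Fraction(s) % 1, Fraction(t) % 1)

def degree(divisor):
    """deg(D): the sum of the orders c_P."""
    return sum(divisor.values())

def divisor_sum(divisor):
    """sum(D): the point sum_P c_P P, computed in the group Lambda \\ C."""
    s = sum(c * p[0] for p, c in divisor.items())
    t = sum(c * p[1] for p, c in divisor.items())
    return torus_point(s, t)

origin = torus_point(0, 0)

# Divisor of wp - wp(a): simple zeroes at a and -a, double pole at the origin.
a = torus_point(Fraction(1, 3), Fraction(1, 5))
minus_a = torus_point(-Fraction(1, 3), -Fraction(1, 5))
div_wp_shifted = {a: 1, minus_a: 1, origin: -2}

# Divisor of wp': simple zeroes at the three half-periods, triple pole at the origin.
half_periods = [torus_point(Fraction(1, 2), 0),
                torus_point(0, Fraction(1, 2)),
                torus_point(Fraction(1, 2), Fraction(1, 2))]
div_wp_prime = {h: 1 for h in half_periods}
div_wp_prime[origin] = -3

for name, d in [("wp - wp(a)", div_wp_shifted), ("wp'", div_wp_prime)]:
    print(name, degree(d), divisor_sum(d))
```

Exact rational arithmetic is used so that the reduction modulo the lattice is not affected by rounding.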

Turning now to {L(4 \cdot (\Lambda))}, we could differentiate {\wp} yet again to generate a doubly periodic function {\wp''} with a fourth order pole at the origin, but we can also work with the square {\wp^2} of the Weierstrass function. From Lemma 6 we conclude that {L(4 \cdot (\Lambda))} is four-dimensional and is spanned by {1, \wp, \wp^2, \wp'}. In a similar fashion, {L(5 \cdot (\Lambda))} is a five-dimensional space spanned by {1, \wp, \wp^2, \wp', \wp \wp'}.

Something interesting happens though at {L(6 \cdot (\Lambda))}. Lemma 6 tells us that this space is the span of {1, \wp, \wp^2, \wp', \wp\wp'}, and possibly one other function, which will have a pole of order six at the origin. Here we have two natural candidates for such a function: the cube {\wp^3} of the Weierstrass function, and the square {(\wp')^2} of its derivative. Both have a pole of order exactly six and lie in {L(6 \cdot (\Lambda))}, and so {(\wp')^2} must be a linear combination of {1, \wp, \wp^2, \wp^3, \wp', \wp \wp'}. But since {(\wp')^2, 1, \wp, \wp^2, \wp^3} are even and {\wp', \wp \wp'} are odd, {(\wp')^2} must in fact just be a linear combination of {1, \wp, \wp^2, \wp^3}. To work out the precise combination, we see by repeating the derivation of (7) that

\displaystyle  \wp^{(k)}(z) = (-1)^k (k+1)! \sum_{z_0 \in \Lambda} \frac{1}{(z-z_0)^{k+2}}

for any {k=1,2,\dots}, so that the function {\wp(z) - \frac{1}{z^2}}, which extends holomorphically to the origin, has {k^{th}} derivative at the origin equal to {(-1)^k (k+1)! G_{k+2}(\Lambda)}, where the Eisenstein series {G_k(\Lambda)} is defined by the formula

\displaystyle  G_k(\Lambda) := \sum_{z_0 \in \Lambda \backslash \{0\}} \frac{1}{z_0^k};

we can also extend this to the {k=0} case with convention {G_2(\Lambda)=0}. Note from the symmetric nature of {\Lambda} that {G_k(\Lambda)} vanishes for odd {k}; this is consistent with the even nature of {\wp}. We thus have the Laurent expansion

\displaystyle  \wp(z) = \frac{1}{z^2} + \sum_{k=1}^\infty (2k+1) G_{2k+2}(\Lambda) z^{2k}

\displaystyle  = \frac{1}{z^2} + 3 G_4(\Lambda) z^2 + 5 G_6(\Lambda) z^4 + \dots

for {z} near zero. This then gives the further Laurent expansions

\displaystyle  \wp(z)^2 = \frac{1}{z^4} + 6 G_4(\Lambda) + 10 G_6(\Lambda) z^2 + \dots

\displaystyle  \wp(z)^3 = \frac{1}{z^6} + \frac{9 G_4(\Lambda)}{z^2} + 15 G_6(\Lambda) + \dots

\displaystyle  \wp'(z) = -\frac{2}{z^3} + 6 G_4(\Lambda) z + 20 G_6(\Lambda) z^3 + \dots

\displaystyle  \wp'(z)^2 = \frac{4}{z^6} - \frac{24 G_4(\Lambda)}{z^2} - 80 G_6(\Lambda) + \dots.

From these expansions we see that the decomposition of {\wp'(z)^2} into a linear combination of {1,\wp,\wp^2,\wp^3} does not actually involve {\wp^2} (as this function is the only one that has a {1/z^4} term in its Laurent expansion), and on comparing {1/z^6} coefficients we see that the coefficient of {\wp^3} must be {4}. Thus we have a linear relationship of the form (8) for some coefficients {g_2 = g_2(\Lambda), g_3 = g_3(\Lambda)}, which on inspection of the {\frac{1}{z^2}} and constant terms leads to the formulae

\displaystyle  g_2 := 60 G_4(\Lambda); \quad g_3 := 140 G_6(\Lambda).
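The identity (8) with these values of {g_2, g_3} can be tested numerically. The Python sketch below truncates the series (6), (7) and the Eisenstein series to the lattice points {m \omega_1 + n \omega_2} with {|m|,|n| \leq N}, and compares the two sides of (8) at a sample point. The periods, the sample point, the truncation parameter and the helper names are illustrative assumptions, and because of the truncation the two sides should only agree approximately.

```python
def lattice(omega1, omega2, N):
    """Nonzero lattice points m*omega1 + n*omega2 with |m|, |n| <= N."""
    return [m * omega1 + n * omega2
            for m in range(-N, N + 1) for n in range(-N, N + 1)
            if (m, n) != (0, 0)]

def wp(z, points):
    """Truncated series (6)."""
    return 1 / z**2 + sum(1 / (z - z0) ** 2 - 1 / z0**2 for z0 in points)

def wp_prime(z, points):
    """Truncated series (7); the z0 = 0 term is written separately."""
    return -2 / z**3 - 2 * sum(1 / (z - z0) ** 3 for z0 in points)

def eisenstein(k, points):
    """Truncated Eisenstein series G_k."""
    return sum(1 / z0**k for z0 in points)

omega1, omega2 = complex(1, 0), complex(0.25, 1.0)  # illustrative periods
points = lattice(omega1, omega2, 60)
z = complex(0.3, 0.2)                               # illustrative point off the lattice

g2 = 60 * eisenstein(4, points)
g3 = 140 * eisenstein(6, points)
p, dp = wp(z, points), wp_prime(z, points)

lhs = dp**2
rhs = 4 * p**3 - g2 * p - g3
odd_eisenstein = eisenstein(3, points)  # should vanish by the symmetry of the lattice
print(abs(lhs - rhs) / abs(lhs), abs(odd_eisenstein))
```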

Exercise 8 Derive (8) directly from Proposition 2 by showing that the difference between the two sides is doubly periodic and holomorphic after removing singularities.

Exercise 9 (Classification of doubly periodic meromorphic functions)
  • (i) For any {k \geq 2}, show that {L( k \cdot (\Lambda))} has dimension {k}, and every element of this space is a polynomial combination of {\wp, \wp'}.
  • (ii) Show that every doubly periodic meromorphic function is a rational function of {\wp, \wp'}.

We have an alternate form of (8):

Exercise 10 Define the roots {e_1 := \wp( \frac{\omega_1}{2} )}, {e_2 := \wp( \frac{\omega_2}{2} )}, {e_3 := \wp( \frac{\omega_1+\omega_2}{2} )}.

If we now define the elliptic curve

\displaystyle  C := \{ (z,w) \in {\bf C}^2: w^2 = 4z^3 - g_2 z - g_3 \} \cup \{ \infty\}

\displaystyle  = \{ (z,w) \in {\bf C}^2: w^2 = 4(z-e_1)(z-e_2)(z-e_3) \} \cup \{ \infty\}

to be the union of a certain cubic curve in the complex plane {{\bf C}^2} together with the point at infinity (where the notions of “curve” and “plane” are relative to the underlying complex field {{\bf C}} rather than the more familiar real field {{\bf R}}), then we have a map

\displaystyle  \Phi: z \mapsto (\wp(z), \wp'(z)) \ \ \ \ \ (11)

from {\Lambda \backslash {\bf C}} to {C}, with the convention that the origin {\Lambda} is mapped to the point at infinity {\infty}. For instance, the half-periods {\frac{\omega_1}{2} + \Lambda}, {\frac{\omega_2}{2} + \Lambda}, {\frac{\omega_1+\omega_2}{2} + \Lambda} are mapped to the points {(e_1,0), (e_2,0), (e_3,0)} of {C} respectively.
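The behaviour of the map at the half-periods can be checked numerically with truncated series. In the Python sketch below (the periods, truncation parameter and helper names are illustrative assumptions), the derivative {\wp'} should come out close to zero at each half-period, so that these points are sent to {(e_j, 0)}. In addition, comparing the {z^2} coefficients of the two cubics {4z^3 - g_2 z - g_3} and {4(z-e_1)(z-e_2)(z-e_3)} describing {C} shows that {e_1+e_2+e_3} should vanish, which the sketch also tests approximately.

```python
def lattice(omega1, omega2, N):
    """Nonzero lattice points m*omega1 + n*omega2 with |m|, |n| <= N."""
    return [m * omega1 + n * omega2
            for m in range(-N, N + 1) for n in range(-N, N + 1)
            if (m, n) != (0, 0)]

def wp(z, points):
    """Truncated Weierstrass function."""
    return 1 / z**2 + sum(1 / (z - z0) ** 2 - 1 / z0**2 for z0 in points)

def wp_prime(z, points):
    """Truncated derivative of the Weierstrass function."""
    return -2 / z**3 - 2 * sum(1 / (z - z0) ** 3 for z0 in points)

omega1, omega2 = complex(1, 0), complex(0.25, 1.0)  # illustrative periods
points = lattice(omega1, omega2, 60)

half_periods = [omega1 / 2, omega2 / 2, (omega1 + omega2) / 2]
roots = [wp(h, points) for h in half_periods]         # approximations to e_1, e_2, e_3
slopes = [wp_prime(h, points) for h in half_periods]  # should be close to zero

for e, s in zip(roots, slopes):
    print(e, abs(s))
print(abs(sum(roots)))
```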

Lemma 11 The map {\Phi} defined by (11) is a bijection between {\Lambda \backslash {\bf C}} and {C}.

Among other things, this lemma implies that the elliptic curve {C} is topologically equivalent (i.e., homeomorphic) to a torus, which is not an entirely obvious fact (though if one squints hard enough, the real analogue of an elliptic curve does resemble a distorted slice of a torus embedded in {{\bf R}^3}).

Proof: Clearly {\Lambda} is the only point that maps to {\infty}, and (from (10)) the half-periods are the only points that map to {(e_1,0), (e_2,0), (e_3,0)}. It remains to show that all the other points {(z,w)} arise via {\Phi} from exactly one element of {\Lambda \backslash {\bf C}}. The function {\wp - z} has exactly two zeroes by Lemma 3(ii), which lie at {a+\Lambda, -a+\Lambda} for some {a} as {\wp} is even; since {(z,w) \neq (e_1,0), (e_2,0), (e_3,0)}, {z} is not equal to {e_1,e_2,e_3}, hence {a} is not a half-period. As {\wp'} is odd, the map (11) must therefore map {a+\Lambda, -a+\Lambda} to the two points {(z,w), (z,-w)} of the elliptic curve {C} that lie above {z}, and the claim follows. \Box

Analogously to the Riemann sphere {{\bf C} \cup \{ \infty\}}, the elliptic curve {C} can be given the structure of a Riemann surface, by prescribing the following charts:

  • (i) When {(z_0,w_0)} is a point in {C} other than {\infty} or {(e_1,0), (e_2,0), (e_3,0)}, then locally {C} is the graph of a holomorphic branch {(4(z-e_1)(z-e_2)(z-e_3))^{1/2}} of the square root of {4(z-e_1)(z-e_2)(z-e_3)} near {(z_0,w_0)}, and one can use {z} as a coordinate function in a sufficiently small neighbourhood of {(z_0,w_0)}.
  • (ii) In the neighbourhood of {(e_j,0)} for some {j=1,2,3}, the function {f: z \mapsto 4(z-e_1)(z-e_2)(z-e_3)} has a simple zero at {e_j} and so has a local inverse {f^{-1}} that maps a neighbourhood of {0} to a neighbourhood of {e_j}, and a point {(z,w)} sufficiently near {(e_j,0)} can be parameterised by {(f^{-1}(w^2),w)}. One can then use {w} as a coordinate function in a neighbourhood of {(e_j,0)}.
  • (iii) A neighbourhood of {\infty} consists of {\infty} and the points {(z,w)} in the remaining portion of {C} with {z, w} sufficiently large; then {w} is asymptotic to a square root of {4z^3}, so in particular {w/z^2} and {1/z} should both go to zero as {(z,w)} goes to infinity in {C}. We rewrite the defining equation {w^2 = 4z^3 - g_2 z - g_3} of the curve in terms of {w/z^2} and {1/z} as {(w/z^2)^2 = 4 (1/z) - g_2 (1/z)^3 - g_3 (1/z)^4}. The function {h(\xi) := 4 \xi - g_2 \xi^3 - g_3 \xi^4} has a simple zero at zero and thus has a holomorphic local inverse {h^{-1}} that maps {0} to {0}, and we have {1/z = h^{-1}((w/z^2)^2)} in a neighbourhood of infinity. We can then use {w/z^2} as a coordinate function in a neighbourhood of {\infty}, with the convention that this coordinate function vanishes at infinity.

It is then a tedious but routine matter to check that {C} has the structure of a Riemann surface. We then claim that the bijection {\Phi} defined by (11) is holomorphic, and thus a complex diffeomorphism of Riemann surfaces. In the neighbourhood of any point {\zeta+\Lambda} of the torus {\Lambda \backslash {\bf C}} other than the origin {\Lambda}, {\Phi} maps to a neighbourhood of a finite point {(z,w)} of {C} (possibly one of the three points {(e_1,0), (e_2,0), (e_3,0)}), and the holomorphicity is a routine consequence of composing together the various local holomorphic functions and their inverses. In the neighbourhood of the origin {\Lambda}, {\Phi} maps {z+\Lambda} for small {z} to a point of {C} with a Laurent expansion

\displaystyle  (\frac{1}{z^2} + O(z^2), \frac{-2}{z^3} + O(z))

from the Laurent expansions of {\wp, \wp'}, so in particular the coordinate {w/z^2} (the second component divided by the square of the first) takes the form {-2 z + O(z^5)} where the error term {O(z^5)} is holomorphic, with {0} mapping to {0}. In particular the map is a local complex diffeomorphism here and again we have holomorphicity. We thus conclude that the elliptic curve {C} is complex diffeomorphic to the torus {\Lambda \backslash {\bf C}} using the map {\Phi}. From Exercise 9, the meromorphic functions on {\Lambda \backslash {\bf C}} may be identified with the rational functions on {C}.

While we have shown that all tori are complex diffeomorphic to elliptic curves, the converse statement that all elliptic curves are complex diffeomorphic to tori will have to wait until the next section for a proof, once we have set up the machinery of modular forms.

Exercise 12 (Group law on elliptic curves)
  • (i) Let {P,Q,R} be three distinct elements of the torus {\Lambda \backslash {\bf C}} that are not equal to the origin {\Lambda}. Show that {P+Q+R=\Lambda} if and only if the three points {(\wp(P), \wp'(P))}, {(\wp(Q), \wp'(Q))}, {(\wp(R), \wp'(R))} are collinear in {C}, in the sense that they lie on a common complex line {\{ (z,w) \in {\bf C}^2: az+bw=c\}} for some complex numbers {a,b,c} with {a,b} not both zero.
  • (ii) What happens in (i) if (say) {P} and {Q} agree? What about if {R = \Lambda}?
  • (iii) Using (i), (ii), give a purely geometric definition of a group addition law on the elliptic curve {C} which is compatible with the group addition law on the torus {\Lambda \backslash {\bf C}} via (11). (We remark that the associativity property of this law is not obvious from a purely geometric perspective, and is related to the Cayley-Bacharach theorem in classical geometry; see this previous blog post.)

Exercise 13 (Addition law) Show that for any {z, w \in {\bf C}} lying in distinct cosets of {\Lambda}, one has

\displaystyle  \wp(z+w) = \frac{1}{4} \left( \frac{\wp'(z) - \wp'(w)}{\wp(z)-\wp(w)} \right)^2 - \wp(z) - \wp(w).
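As a numerical sanity check (not a proof), one can test the addition law with truncated lattice sums. The sketch below assumes the standard definition {\wp(z) = \frac{1}{z^2} + \sum_{\omega \in \Lambda \backslash \{0\}} (\frac{1}{(z-\omega)^2} - \frac{1}{\omega^2})} on the lattice {{\bf Z} + \tau {\bf Z}}; the function names and the truncation parameter `N` are illustrative choices, and the truncation limits the accuracy to a few decimal places.

```python
def wp_and_derivative(z, tau, N=40):
    """Truncated lattice sums for wp(z) and wp'(z) on the lattice Z + tau*Z.

    Only lattice points n + m*tau with |n|, |m| <= N are used, so the
    values are approximations (error roughly of size 1/N^2).
    """
    wp = 1 / z**2
    dwp = -2 / z**3
    for m in range(-N, N + 1):
        for n in range(-N, N + 1):
            if (n, m) == (0, 0):
                continue
            omega = n + m * tau
            wp += 1 / (z - omega) ** 2 - 1 / omega**2
            dwp += -2 / (z - omega) ** 3
    return wp, dwp


def addition_law_sides(z, w, tau, N=40):
    """Return (wp(z+w), right-hand side of the addition law) for comparison."""
    pz, dpz = wp_and_derivative(z, tau, N)
    pw, dpw = wp_and_derivative(w, tau, N)
    lhs, _ = wp_and_derivative(z + w, tau, N)
    slope = (dpz - dpw) / (pz - pw)
    rhs = slope**2 / 4 - pz - pw
    return lhs, rhs


if __name__ == "__main__":
    lhs, rhs = addition_law_sides(0.3 + 0j, 0.25j, 1j)
    print("wp(z+w)         ~", lhs)
    print("right-hand side ~", rhs)
```

The two printed values should agree up to the truncation error; increasing `N` tightens the agreement at the cost of more terms.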

Exercise 14 (Special case of Riemann-Roch)
  • (i) Show that if two divisors {D,D'} are equivalent (in the sense of Exercise 7(iii)), then the vector spaces {L(D)} and {L(D')} are isomorphic (in particular, they have the same dimension).
  • (ii) If {D} is a divisor of some degree {d}, show that the dimension of the space {L(D)} is zero if {d < 0}, equal to {d} if {d>0}, equal to {0} if {d=0} and {D} has non-zero sum, and equal to {1} if {d=0} and {D} has zero sum. (Hint: use Exercise 7(iii) and part (i) to replace {D} with an equivalent divisor of a simple form.)
  • (iii) Verify the identity

    \displaystyle  \mathrm{dim} L(D) - \mathrm{dim} L(-D) = \mathrm{deg}(D)

    for any divisor {D}. This is a special case of the more general Riemann-Roch theorem, discussed in these 246C notes.

Exercise 15 (Elliptic integrals)
  • (i) Show that {\wp} is a covering map from {{\bf C} \backslash \frac{1}{2} \Lambda} to the thrice-punctured plane {{\bf C} \backslash \{e_1,e_2,e_3\}}.
  • (ii) Let {\gamma} be a contour in {{\bf C} \backslash \{e_1,e_2,e_3\}} from some complex number {z_1} to another complex number {z_2}, and suppose that there is a holomorphic branch {\sqrt{4(z-e_1)(z-e_2)(z-e_3)}} of the square root of {4(z-e_1)(z-e_2)(z-e_3)} in a neighbourhood of {\gamma}. Show that there exist complex numbers {\zeta_1, \zeta_2 \in {\bf C} \backslash \frac{1}{2} \Lambda} with {\wp(\zeta_1) = z_1}, {\wp(\zeta_2) = z_2} such that

    \displaystyle  \int_\gamma \frac{dz}{\sqrt{4(z-e_1)(z-e_2)(z-e_3)}} = \zeta_2 - \zeta_1. \ \ \ \ \ (12)

Remark 16 The integral {\int_\gamma \frac{dz}{\sqrt{4(z-e_1)(z-e_2)(z-e_3)}}} is an example of an elliptic integral; many other elliptic integrals (such as the integral arising when computing the perimeter of an ellipse) can be transformed into this form (or into a closely related integral) by various elementary substitutions. Thus the Weierstrass elliptic function {\wp} can be employed to evaluate elliptic integrals, which may help explain the terminology “elliptic” that occurs throughout these notes. In 246C notes we will introduce the notion of a meromorphic {1}-form on a Riemann surface. The identity (12) can then be interpreted in this language as the differential form identity {d(\Phi^{-1}) = \frac{dz}{w}}, where {z,w} are the standard coordinates on the elliptic curve {C}; the meromorphic {1}-form is initially only defined on {C} outside of the four points {(e_1,0), (e_2,0), (e_3,0), \infty}, but this identity in fact reveals that the form extends holomorphically to all of {C}; it is an example of what is known as an Abelian differential of the first kind.

Remark 17 The elliptic curve {C} (for various choices of parameters {g_2,g_3}) can be defined in other fields than the complex numbers (though some technicalities arise in characteristic two and three due to the pathological behaviour of the discriminant in those cases). On the other hand, the Weierstrass elliptic function {\wp} is a transcendental function which only exists in complex analysis and does not have a direct analogue in other fields. So this connection between elliptic curves and tori is specific to the complex field. Nevertheless, many facts about elliptic curves that were initially discovered over the complex numbers through this complex-analytic link to tori, were then reproven by purely algebraic means, so that they could be extended without much difficulty to many other fields than the complex numbers, such as finite fields. (For instance, the role of the complex torus can be replaced by the Jacobian variety, which was briefly introduced in Exercise 7.) Elliptic curves over such fields are of major importance in number theory (and cryptography), but we will not discuss these topics further here.

— 2. Modular functions and modular forms —

In Exercise 32 of 246A Notes 5, it was shown that two tori {(\omega_1 {\bf Z} + \omega_2{\bf Z}) \backslash {\bf C}} and {(\omega'_1 {\bf Z} + \omega'_2{\bf Z}) \backslash {\bf C}} are complex diffeomorphic if and only if one has

\displaystyle  \frac{\omega'_1}{\omega'_2} = \pm \frac{a\omega_1+b\omega_2}{c\omega_1+d\omega_2} \ \ \ \ \ (13)

for some integers {a,b,c,d} with {ad-bc=1}. From this it is not difficult to see that if {\Lambda,\Lambda'} are two lattices in {{\bf C}}, then {\Lambda \backslash {\bf C}} and {\Lambda' \backslash {\bf C}} are diffeomorphic if and only if {\Lambda' = \lambda \cdot \Lambda} for some {\lambda \in {\bf C} \backslash \{0\}}, i.e., the lattices {\Lambda,\Lambda'} are complex dilations of each other.

Let us write {X_0(1)} for the set of all tori {\Lambda \backslash {\bf C}} quotiented by the equivalence relation of complex diffeomorphism; this is the (classical, level one, noncompactified) modular curve. By the above discussion, this set can also be identified with the set of pairs {(\omega_1,\omega_2)} of linearly independent (over {{\bf R}}) complex numbers quotiented by the equivalence relation given implicitly by (13). One can simplify this a little by observing that any pair {(\omega_1,\omega_2)} is equivalent to {(1,\tau)} for some {\tau} in the upper half-plane {{\mathbf H}}, namely either {+\omega_2/\omega_1} or {-\omega_2/\omega_1} depending on the relative phases of {\omega_1} and {\omega_2}; this quantity {\tau} is known as the period ratio. From (13) (swapping the roles of {\omega_1,\omega_2} as necessary), we then see that two pairs {(1,\tau), (1,\tau')} are equivalent if one has

\displaystyle  \tau' = \pm \frac{a \tau + b}{c\tau + d}

for some integers {a,b,c,d} with {ad-bc=1}. Recall that the Möbius transformation {\tau \mapsto \frac{a \tau + b}{c\tau + d}} preserves {{\bf H}} (see Exercise 20 of 246A Notes 5), so the {\pm} sign here must actually be positive. We can interpret this in terms of the action of the matrix group

\displaystyle  SL_2({\bf Z}) := \{ \begin{pmatrix} a & b \\ c & d \end{pmatrix}: a,b,c,d \in {\bf Z}: ad-bc = 1 \}

on {{\bf H}} by Möbius transformation

\displaystyle  \begin{pmatrix} a & b \\ c & d \end{pmatrix} \cdot \tau := \frac{a\tau+b}{c\tau+d}

and we conclude that two pairs {(1,\tau), (1,\tau')} are equivalent if and only if the period ratios {\tau,\tau'} lie in the same orbit of {SL_2({\bf Z})}. Thus we can identify the modular curve {X_0(1)} (as a set, at least) with the quotient space {SL_2({\bf Z}) \backslash {\bf H}}. Actually if one wished one could replace {SL_2({\bf Z})} here with the projective group {PSL_2({\bf Z}) := SL_2({\bf Z})/\{\pm 1\}}, since the negative identity matrix {\begin{pmatrix} -1 & 0 \\0 & -1 \end{pmatrix}} acts trivially on {{\bf H}}.
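The action just described is easy to experiment with numerically. The following is a minimal sketch in which matrices are stored as plain tuples `(a, b, c, d)`; the helper names `mobius` and `matmul` are illustrative and not part of the notes. It uses the standard identity {\mathrm{Im}(\gamma \cdot \tau) = \mathrm{Im}(\tau)/|c\tau+d|^2} for {ad-bc=1}, which is one way to see that {{\bf H}} is preserved.

```python
def mobius(g, tau):
    """Apply the Mobius transformation of g = (a, b, c, d) to tau."""
    a, b, c, d = g
    return (a * tau + b) / (c * tau + d)


def matmul(g, h):
    """Product of two 2x2 integer matrices stored as (a, b, c, d)."""
    a, b, c, d = g
    e, f, p, q = h
    return (a * e + b * p, a * f + b * q, c * e + d * p, c * f + d * q)


if __name__ == "__main__":
    g = (2, 1, 5, 3)   # determinant 2*3 - 1*5 = 1
    h = (1, 4, 0, 1)   # the translation tau -> tau + 4
    tau = 0.3 + 0.8j

    # The imaginary part stays positive, so the upper half-plane is preserved.
    print(mobius(g, tau).imag, tau.imag / abs(5 * tau + 3) ** 2)
    # Group action property: (g h).tau = g.(h.tau).
    print(mobius(matmul(g, h), tau), mobius(g, mobius(h, tau)))
    # The negative identity acts trivially.
    print(mobius((-1, 0, 0, -1), tau), tau)
```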

If we use the relation {ad-bc=1} to write

\displaystyle  \frac{a \tau + b}{c\tau + d} = \frac{a}{c} - \frac{1}{c(c\tau+d)} \ \ \ \ \ (14)

we see that {\frac{a \tau + b}{c\tau + d}} approaches the real line as {(c,d) \rightarrow \infty} if {c} is non-zero; also, if {c} is zero, then from {ad-bc=1} we must have {a = d = \pm 1}, and {\frac{a\tau+b}{c\tau+d} = \tau \pm b} will have real part going to infinity as {b} goes to infinity. In all cases we then conclude that {\frac{a\tau+b}{c\tau+d}} goes to infinity (in the sense of leaving every compact subset of {{\bf H}}) as {\begin{pmatrix} a & b \\ c & d \end{pmatrix} \in SL_2({\bf Z})} goes to infinity, uniformly for {\tau} in any fixed compact subset of {{\bf H}}, which makes the action of {SL_2({\bf Z})} on {{\bf H}} proper (for any compact set {K \subset {\bf H}}, one has {\gamma K} intersecting {K} for at most finitely many {\gamma \in SL_2({\bf Z})}). If {PSL_2({\bf Z})} acted freely on {{\bf H}} (i.e., any element {\gamma} of {SL_2({\bf Z})} other than the identity and negative identity has no fixed points in {{\bf H}}), then the quotient {SL_2({\bf Z}) \backslash {\bf H}} would be a Riemann surface by the discussion in Section 2 of 246A Notes 5. Unfortunately, this is not quite true. For instance, the point {i \in {\bf H}} is fixed by the Möbius transformation {\omega \mapsto \frac{-1}{\omega}} coming from the rotation matrix {\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}} of {SL_2({\bf Z})}, and the point {e^{\pi i/3} \in {\bf H}} is similarly fixed by the transformation {\omega \mapsto \frac{-1}{\omega-1}} coming from the matrix {\begin{pmatrix} 0 & -1 \\ 1 & -1 \end{pmatrix}} of {SL_2({\bf Z})}. Geometrically, these fixed points come from the fact that the Gaussian integers {{\bf Z} + i{\bf Z}} are invariant with respect to rotation by {\pi/2}, while the Eisenstein integers {{\bf Z} + e^{\pi i/3} {\bf Z}} are invariant with respect to rotation by {\pi/3}. On the other hand, these are basically the only two places where the action is not free:

Exercise 18 Suppose that {\tau} is an element of {{\bf H}} which is fixed by some element {\gamma} of {SL_2({\bf Z})} which is not the identity or negative identity. Let {\Lambda} be the lattice {\Lambda := {\bf Z} + \tau {\bf Z}}.
  • (i) Show that {\Lambda} obeys a dilation invariance {\Lambda = \lambda \cdot \Lambda} for some complex number {\lambda} which is not real.
  • (ii) Show that the dilation {\lambda} in part (i) must have magnitude one. (Hint: look at a non-zero element of {\Lambda} of minimal magnitude.)
  • (iii) Show that there is no rotation invariance {\Lambda = e^{i\theta} \cdot \Lambda} with {0 < \theta < \pi/3}. (Hint: again, work with a non-zero element of {\Lambda} of minimal magnitude, and use the fact that {\Lambda} is closed under addition and subtraction. It may help to think geometrically and draw plenty of pictures.)
  • (iv) Show that {\Lambda} is equivalent to either the Gaussian lattice {{\bf Z} + i{\bf Z}} or the Eisenstein lattice {{\bf Z} + e^{\pi i/3} {\bf Z}}, and conclude that the period ratio {\tau} is equivalent to either {i} or {e^{\pi i/3}}.
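The two exceptional fixed points mentioned before Exercise 18 can be confirmed by direct computation. The sketch below (with illustrative helper names, and matrices stored as tuples `(a, b, c, d)`) checks that {i} is fixed by the matrix with rows {(0,-1),(1,0)}, that {e^{\pi i/3}} is fixed by the matrix with rows {(0,-1),(1,-1)}, and that these matrices have order {2} and {3} respectively once the negative identity is ignored.

```python
import cmath


def mobius(g, tau):
    """Apply the Mobius transformation of g = (a, b, c, d) to tau."""
    a, b, c, d = g
    return (a * tau + b) / (c * tau + d)


def matmul(g, h):
    """Product of two 2x2 integer matrices stored as (a, b, c, d)."""
    a, b, c, d = g
    e, f, p, q = h
    return (a * e + b * p, a * f + b * q, c * e + d * p, c * f + d * q)


S = (0, -1, 1, 0)    # tau -> -1/tau
U = (0, -1, 1, -1)   # tau -> -1/(tau - 1)
RHO = cmath.exp(1j * cmath.pi / 3)

if __name__ == "__main__":
    print(abs(mobius(S, 1j) - 1j))    # distance moved by S at i
    print(abs(mobius(U, RHO) - RHO))  # distance moved by U at e^{pi i/3}
    print(matmul(S, S))               # S squared
    print(matmul(matmul(U, U), U))    # U cubed
```

The orders {2} and {3} found here are the stabiliser orders that reappear as the weights {\frac{1}{2}} and {\frac{1}{3}} in the valence formula later in this section.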

Remark 19 The conformal map {z \mapsto iz} on the complex numbers preserves the Gaussian integers {{\bf Z} + i{\bf Z}} and thus descends to a conformal map from the Gaussian torus {({\bf Z} + i{\bf Z}) \backslash {\bf C}} to itself; similarly the conformal map {z \mapsto e^{\pi i/3} z} preserves the Eisenstein integers and thus descends to a conformal map from the Eisenstein torus {({\bf Z} + e^{\pi i/3}{\bf Z}) \backslash {\bf C}} to itself. These rare examples of complex tori equipped with additional conformal automorphisms are examples of tori (or elliptic curves) endowed with complex multiplication. There are additional examples of elliptic curves endowed with conformal endomorphisms that are still considered to have complex multiplication, and have a particularly nice algebraic number theory structure, but we will not pursue this topic further here.

Remark 20 The fact that the action of {PSL_2({\bf Z})} on lattices contains fixed points is somewhat annoying, as it prevents one from immediately viewing the modular curve as a Riemann surface. However by passing to a suitable finite index subgroup of {PSL_2({\bf Z})}, one can remove these fixed points, leading to a theory that is cleaner in some respects. For instance, one can work with the congruence group {\Gamma(2)}, which roughly speaking amounts to decorating the lattices {\Lambda} (or their tori {\Lambda \backslash {\bf C}}) with an additional “{2}-marking” that eliminates the fixed points. This leads to a modification of the theory which is for instance well suited for studying theta functions; the role of the {j}-invariant in the discussion below is then played by the modular lambda function {\lambda}, which also gives a uniformisation of the twice-punctured complex plane {{\bf C} \backslash \{0,1\}}. However we will not develop this parallel theory further here.

If we let {{\bf H}'} be the elements {\tau} of {{\bf H}} not equivalent to {i} or {e^{\pi i/3}}, and {X_0(1)'} the set of equivalence classes of tori not equivalent to the Gaussian torus {({\bf Z}+i{\bf Z}) \backslash {\bf C}} or the Eisenstein torus {({\bf Z} + e^{\pi i/3} {\bf Z}) \backslash {\bf C}}, then {X_0(1)'} can be viewed as the quotient {PSL_2({\bf Z}) \backslash {\bf H}'} of the Riemann surface {{\bf H}'} by the free and proper action of {PSL_2({\bf Z})}, so it has the structure of a Riemann surface; {X_0(1)} can thus be thought of as the Riemann surface {X_0(1)'} with two additional points added. Later on we will also add a third point {\infty} (known as the cusp) to the Riemann surface to compactify it to {X_0(1) \cup \{\infty\}}.

A function {f: X_0(1) \rightarrow {\bf C}} on the modular curve {X_0(1)} can be thought of, equivalently, as a function {f: {\bf H} \rightarrow {\bf C}} that is {SL_2({\bf Z})}-invariant in the sense that {f(\gamma \cdot \tau) = f(\tau)} for all {\tau \in {\bf H}} and {\gamma \in SL_2({\bf Z})}, or equivalently that one has the identity

\displaystyle  f(\frac{a\tau+b}{c\tau+d}) = f(\tau) \ \ \ \ \ (15)

whenever {\tau \in {\bf H}} and {a,b,c,d} are integers with {ad-bc = 1}. Similarly if {f} takes values in the Riemann sphere {{\bf C} \cup \{\infty\}} rather than {{\bf C}}. If {f} is holomorphic (resp. meromorphic) on {{\bf H}}, this will in particular define a holomorphic (resp. meromorphic) function on {X_0(1)'}, and morally on all of {X_0(1)} as well (although we have not yet defined a Riemann surface structure on all of {X_0(1)}).

We define a modular function to be a meromorphic function {f} on {{\bf H}} that obeys the condition (15), and which also has at most polynomial growth at the cusp {\infty} in the sense that one has a bound of the form

\displaystyle  |f(\tau)| \leq C e^{2\pi m |\mathrm{Im}(\tau)|} \ \ \ \ \ (16)

for all {\tau} with sufficiently large imaginary part, and some constants {C,m} (this bound is needed for technical reasons to ensure “meromorphic” behaviour at the cusp {\infty}, as opposed to an essential singularity). Specialising to the matrices

\displaystyle  \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \ \ \ \ \ (17)

we see that the condition (15) in particular implies the {1}-periodicity

\displaystyle  f(\tau+1) = f(\tau) \ \ \ \ \ (18)

and the inversion law

\displaystyle  f(-1/\tau) = f(\tau) \ \ \ \ \ (19)

for all {\tau \in {\bf H}}. Conversely, these two special cases of (15) imply the general case:

Exercise 21
  • (i) Let {(a,b), (c,d)} be two elements of {{\bf Z}^2} with {ad-bc=1}. Show that it is possible to transform the quadruplet {((a,b),(c,d))} to the quadruplet {((1,0),(0,1))} after a finite number of applications of the moves

    \displaystyle  ((a,b),(c,d)) \mapsto ((a+c,b+d), (c,d)),

    \displaystyle  ((a,b),(c,d)) \mapsto ((a-c,b-d), (c,d))


    \displaystyle  ((a,b),(c,d)) \mapsto ((-c,-d), (a,b)).

    (Hint: use the principle of infinite descent, applying the moves in a suitable order to decrease the lengths of {(a,b)} and {(c,d)} when the dot product {ac+bd} is not too small, taking advantage of the Lagrange identity {(a^2+b^2)(c^2+d^2) = (ac+bd)^2 + (ad-bc)^2} to determine when this procedure terminates. It may help to think geometrically and draw plenty of pictures.) Conclude that the two matrices (17) generate all of {SL_2({\bf Z})}.
  • (ii) Show that a function {f: {\bf H} \rightarrow {\bf C}} obeys (15) if and only if it obeys both (18) and (19).
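To see the three moves of part (i) in action, here is a small sketch that reduces any integer quadruplet with {ad-bc=1} to {((1,0),(0,1))}. Note that it does not follow the length-based descent suggested in the hint; instead it runs the Euclidean algorithm on the first column {(a,c)}, which is a different (and perhaps less geometric) way to organise the descent. The move labels `'T'`, `'Tinv'`, `'S'` and the function names are illustrative.

```python
def reduce_to_identity(a, b, c, d):
    """Return a list of moves taking ((a,b),(c,d)) to ((1,0),(0,1)).

    'T'    : ((a,b),(c,d)) -> ((a+c,b+d),(c,d))
    'Tinv' : ((a,b),(c,d)) -> ((a-c,b-d),(c,d))
    'S'    : ((a,b),(c,d)) -> ((-c,-d),(a,b))
    """
    assert a * d - b * c == 1
    moves = []

    def shift(q):
        # Apply 'T' q times (or 'Tinv' |q| times when q is negative).
        nonlocal a, b
        a, b = a + q * c, b + q * d
        moves.extend(["T" if q > 0 else "Tinv"] * abs(q))

    def swap():
        nonlocal a, b, c, d
        a, b, c, d = -c, -d, a, b
        moves.append("S")

    while c != 0:
        shift(-(a // c))  # now a has been replaced by a mod c, so |a| < |c|
        swap()            # the new c is the old a, strictly smaller in size
    # Here c = 0 and ad = 1, so a = d = +1 or a = d = -1.
    if a == -1:
        swap()
        swap()            # applying 'S' twice negates both rows
    shift(-b)             # remove the remaining translation
    return moves


def apply_moves(rows, moves):
    """Apply a list of moves to ((a,b),(c,d)) and return the result."""
    (a, b), (c, d) = rows
    for move in moves:
        if move == "T":
            a, b = a + c, b + d
        elif move == "Tinv":
            a, b = a - c, b - d
        else:
            a, b, c, d = -c, -d, a, b
    return ((a, b), (c, d))


if __name__ == "__main__":
    word = reduce_to_identity(2, 1, 5, 3)
    print(word)
    print(apply_moves(((2, 1), (5, 3)), word))
```

Each move is a left multiplication by one of the matrices in (17) or its inverse, so reading the move list backwards and inverting each move expresses the original matrix as a word in those two matrices.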

Exercise 22 (Standard fundamental domain) Define the standard fundamental domain {{\mathcal F}} for {X_0(1)} to be the set

\displaystyle  {\mathcal F} := \{ \tau \in {\bf H}: |\tau| \geq 1; |\mathrm{Re} \tau| \leq \frac{1}{2} \}.

  • (i) Show that every lattice {\Lambda} is equivalent (up to dilations) to a lattice {{\bf Z} + \tau {\bf Z}} with {\tau \in {\mathcal F}}, with {\tau} unique except when it lies on the boundary of {{\mathcal F}}, in which case the lack of uniqueness comes either from the pair {\tau = \frac{1}{2}+it, -\frac{1}{2} +it} for some {t \geq \frac{\sqrt{3}}{2}}, or from the pair {\tau = e^{i\theta}, e^{i(\pi-\theta)}} for some {\pi/3 < \theta < \pi/2}. (Hint: arrange {\Lambda} so that {1} is a non-zero element of {\Lambda} of minimal magnitude.)
  • (ii) Show that {X_0(1)} can be identified with the fundamental domain {{\mathcal F}} after identifying {\frac{1}{2}+it} with {-\frac{1}{2}+it} for {t \geq \frac{\sqrt{3}}{2}}, and {e^{i\theta}} with {e^{i(\pi-\theta)}} for {\pi/3 < \theta < \pi/2}. Show also that the set {X_0(1)'} is then formed the same way, but first deleting the points {i, e^{\pi i/3}, e^{2\pi i/3}} from {{\mathcal F}}.
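The existence half of part (i) is effectively an algorithm: translate {\tau} by an integer to bring its real part into {[-\frac{1}{2},\frac{1}{2})}, invert if {|\tau| < 1}, and repeat. A minimal floating-point sketch follows; the function name, the step cap and the small tolerance are illustrative choices rather than anything prescribed by the notes.

```python
import math


def reduce_to_fundamental_domain(tau, max_steps=1000):
    """Move tau (with Im tau > 0) into the standard fundamental domain.

    Uses only tau -> tau + n (n an integer) and tau -> -1/tau, so the
    result is equivalent to tau under SL_2(Z).
    """
    if tau.imag <= 0:
        raise ValueError("tau must lie in the upper half-plane")
    for _ in range(max_steps):
        # Translate so that -1/2 <= Re(tau) < 1/2.
        tau = tau - math.floor(tau.real + 0.5)
        # Stop once |tau| >= 1 (up to a small rounding tolerance).
        if abs(tau) >= 1 - 1e-12:
            return tau
        # Otherwise invert; this strictly increases the imaginary part.
        tau = -1 / tau
    raise RuntimeError("did not converge; tau may be too close to the real axis")


if __name__ == "__main__":
    tau = 0.123 + 0.456j
    print(reduce_to_fundamental_domain(tau))
    # An equivalent period ratio should reduce to (nearly) the same point.
    print(reduce_to_fundamental_domain((2 * tau + 1) / (5 * tau + 3)))
```

Termination relies on the fact that each inversion with {|\tau|<1} increases {\mathrm{Im}(\tau)} by the factor {1/|\tau|^2}; points on the boundary of {{\mathcal F}} are of course sensitive to rounding.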

We will give some examples of modular functions (beyond the trivial example of constant functions) shortly, but let us first observe that when one differentiates a modular function one gets a more general class of function, known as a modular form. In more detail, observe from (14) that the derivative of the Möbius transformation {\tau \mapsto \frac{a\tau+b}{c\tau+d}} is {\frac{1}{(c\tau+d)^2}}, and hence by the chain rule and (15) the derivative of a modular function {f} would obey the variant law

\displaystyle  f'(\frac{a\tau+b}{c\tau+d}) = (c\tau+d)^2 f'(\tau).
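For convenience, the chain rule computation behind this law can be written out as follows. This is a routine verification; the shorthand {\gamma\tau} for {\frac{a\tau+b}{c\tau+d}} is introduced only for this display, and the derivative of the Möbius transformation is computed here by the quotient rule, which also covers the case {c=0}.

```latex
\begin{align*}
\frac{d}{d\tau} \frac{a\tau+b}{c\tau+d}
  &= \frac{a(c\tau+d) - c(a\tau+b)}{(c\tau+d)^2}
   = \frac{ad-bc}{(c\tau+d)^2}
   = \frac{1}{(c\tau+d)^2}, \\
\frac{d}{d\tau} \bigl[ f(\gamma\tau) \bigr]
  &= \frac{f'(\gamma\tau)}{(c\tau+d)^2}, \\
\frac{d}{d\tau} \bigl[ f(\gamma\tau) \bigr] = \frac{d}{d\tau} \bigl[ f(\tau) \bigr]
  &\implies f'(\gamma\tau) = (c\tau+d)^2 f'(\tau).
\end{align*}
```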

Motivated by this, we can define a (weakly) modular form of weight {k} for any natural number {k} to be a meromorphic function {f: {\bf H} \rightarrow {\bf C} \cup \{\infty\}} which obeys the modularity relation

\displaystyle  f(\frac{a\tau+b}{c\tau+d}) = (c\tau+d)^k f(\tau) \ \ \ \ \ (20)

for all {\tau \in {\bf H}} and all integers {a,b,c,d} with {ad-bc=1} (with the convention that {c \infty = \infty} for any non-zero complex {c}), and which is meromorphic at the cusp {\infty} in the sense of (16). Thus for instance modular functions are weakly modular forms of weight {0}. A modular form of weight {k} is a weakly modular form {f} of weight {k} which is holomorphic (not just meromorphic) on {{\bf H}}, and also “holomorphic at {\infty}” in the sense that {f(\tau)} is bounded for {\mathrm{Im}(\tau)} large enough. Note that when viewed as a function {f(\tau) = \tilde f(e^{2\pi i \tau}) = \tilde f(q)} of the nome {q = e^{2\pi i \tau}}, a modular form {f} can be thought of as a certain type of holomorphic function {\tilde f} on the disk {D(0,1)} (using the Riemann singularity removal theorem to remove the singularity at the origin {q=0}), while weakly modular forms (and in particular modular functions) are certain types of meromorphic functions on this disk. A modular form that vanishes at infinity is known as a cusp form.

Exercise 23 Let {k} be a natural number. Show that a function {f: {\bf H} \rightarrow {\bf C}} obeys (20) if and only if it is {1}-periodic in the sense of (18) and obeys the law

\displaystyle  f(-1/\tau) = \tau^k f(\tau) \ \ \ \ \ (21)

for all {\tau \in {\bf H}}.

Exercise 24 (Lattice interpretation of modular forms) Let {f} be a modular form of weight {k}. Show that there is a unique function {F} from lattices {\Lambda} to complex numbers such that

\displaystyle  f(\tau) = F({\bf Z} + \tau {\bf Z})

for all {\tau \in {\bf H}}, and such that one has the homogeneity relation

\displaystyle  F(\lambda \cdot \Lambda) = \lambda^{-k} F(\Lambda)

for any lattice {\Lambda} and non-zero complex number {\lambda}.

Observe that the product of a modular form of weight {k} and a modular form of weight {l} is a modular form of weight {k+l}, and that the ratio of two modular forms of weight {k} will be a modular function (if the denominator is not identically zero). Also, the space of modular forms of a given weight is a vector space, as is the space of modular functions. This suggests a way to generate non-trivial modular functions, by first locating some modular forms and then taking suitable rational combinations of these forms.

Somewhat analogously to how we used Lemma 3 to investigate the spaces {L(D)} for divisors {D} on a torus, we will investigate the space of modular forms via the following basic formula:

Theorem 25 (Valence formula) Let {f} be a modular form of weight {k}, not identically zero. Then we have

\displaystyle  \sum_\rho \mathrm{ord}_\rho(f) + \frac{1}{2} \mathrm{ord}_i(f) + \frac{1}{3} \mathrm{ord}_{e^{\pi i/3}}(f) + \mathrm{ord}_\infty(f) = \frac{k}{12} \ \ \ \ \ (22)

where {\mathrm{ord}_\rho} is the order of vanishing of {f} at {\rho}, {\mathrm{ord}_\infty(f)} is the order of vanishing of {\tilde f(q) := f(\tau)} (i.e., {f} viewed as a function of the nome {q = e^{2\pi i \tau}}) at {q=0}, and {\rho} ranges over the zeroes of {f} that are not equivalent to {i, e^{\pi i/3}, \infty}, with just one zero counted per equivalence class. (This will be a finite sum.)

Informally, this formula asserts that the point {i} only “deserves” to be counted in {X_0(1)} with multiplicity {1/2} due to its order {2} stabiliser, while the point {e^{\pi i/3}} only “deserves” to be counted in {X_0(1)} with multiplicity {1/3} due to its order {3} stabiliser. (The cusp {\infty} has an infinite stabiliser, but this is compensated for by taking the order with respect to the nome variable {q} rather than the period ratio variable {\tau}.) The general philosophy of weighting points by the reciprocal of the order of their stabiliser occurs throughout mathematics; see this blog post for more discussion.

Proof: Firstly, from Exercise 22, we can place all the zeroes {\rho} in the fundamental domain {{\mathcal F}}. When parameterised in terms of the nome {q}, this domain is compact, hence has only finitely many zeros, so the sum in (22) is finite.

As in the proof of Lemma 3(ii), we use the residue theorem. For simplicity, let us first suppose that there are no zeroes on the boundary of the fundamental domain {{\mathcal F}} except possibly at the cusp {\infty}. Then for {T} large enough, we have from the residue theorem that

\displaystyle  \sum_\rho \mathrm{ord}_\rho(f) = \frac{1}{2\pi i} \int_\gamma \frac{f'(z)}{f(z)}\ dz,

where {\gamma} is the closed contour consisting of the polygonal path {\gamma_{\frac{1}{2}+\frac{\sqrt{3}i}{2} \rightarrow \frac{1}{2} + iT \rightarrow -\frac{1}{2}+iT \rightarrow -\frac{1}{2}+\frac{\sqrt{3}i}{2}}} concatenated with the circular arc {\{ e^{i(2\pi/3 - \theta)}: 0 \leq \theta \leq \pi/3 \}}. From the {1}-periodicity, the contribution of the two vertical edges {\gamma_{\frac{1}{2}+\frac{\sqrt{3}i}{2} \rightarrow \frac{1}{2} + iT}} and {\gamma_{-\frac{1}{2}+iT \rightarrow -\frac{1}{2}+\frac{\sqrt{3}i}{2}}} cancel each other out. The contribution of the horizontal edge {\gamma_{\frac{1}{2}+iT \rightarrow -\frac{1}{2}+iT}} can be written using the change of variables {q = e^{2\pi i \tau}} as

\displaystyle  -\frac{1}{2\pi i} \int_{\gamma_{0,e^{-2\pi T},\circlearrowleft}} \frac{\tilde f'(q)}{\tilde f(q)}\ dq

which by the residue theorem is equal to {-\mathrm{ord}_\infty(f)}. Finally, using the modularity (21), one calculates that the contribution of the left arc {\{ e^{i(2\pi/3 - \theta)}: 0 \leq \theta \leq \pi/6 \}} is equal to {k/12} minus the contribution of the right arc {\{ e^{i(2\pi/3 - \theta)}: \pi/6 \leq \theta \leq \pi/3 \}}. This gives the proof of the valence theorem in the case that there are no zeroes on the boundary of {{\mathcal F}}.

Suppose now that there is a zero on the right edge {\frac{1}{2}+it} of {{\mathcal F}}, and hence also on the left edge {-\frac{1}{2}+it} by periodicity, for some {t > \frac{\sqrt{3}}{2}}. One can account for this zero by perturbing the contour {\gamma} to make a little detour to the right of {\frac{1}{2}+it} (e.g., by a circular arc), and a matching detour to the right of {-\frac{1}{2}+it}. One can then verify that the same argument as before continues to work, with this boundary zero being counted exactly once. Similarly, if there is a zero on the left arc {e^{i\theta}} for some {\pi/2 < \theta < 2\pi/3}, and hence also at {e^{i(\pi-\theta)}} by modularity, one can make a detour slightly above {e^{i\theta}} and slightly below {e^{i(\pi-\theta)}} (with the two detours being related by the transform {\tau \mapsto -1/\tau} to ensure cancellation), and again we can argue as before. If instead there is a zero at {i}, one makes an (approximately) semicircular detour above {i}; in this case the detour does not cancel out, but instead contributes a term of {\frac{1}{2} \mathrm{ord}_i(f)} in the limit as the radius of the detour goes to zero. Finally, if there is a zero at {e^{i\pi/3}} (and hence also at {e^{2\pi i/3}}), one makes detours by two arcs of angle approximately {\pi/3} at these two points; these two (approximate) sixth-circles end up contributing a term of {\frac{1}{3} \mathrm{ord}_{e^{\pi i/3}}(f)} in the limit, giving the claim. \Box

Exercise 26 (Quick applications of the valence formula)
  • (i) Let {f} be a modular form of weight {k}, not identically zero. Show that {k} is equal to {0} or an even number that is at least {4}.
  • (ii) (Liouville theorem for {X_0(1)}) If {f} is a modular form of weight zero, show that it is constant. (Hint: apply the valence theorem to various shifts {f-c} of {f} by constants {c}.)
  • (iii) For {k=4,6,8,10,14}, show that the vector space of modular forms of weight {k} is at most one dimensional. (Hint: in these cases, there are a very limited number of solutions to the equation {a + \frac{1}{2} b + \frac{1}{3} c = \frac{k}{12}} with {a,b,c} natural numbers.)
  • (iv) Show that there are no cusp forms of weight {k} when {k < 12} or {k=14}, and for {k=12} the space of cusp forms of weight {k} is at most one dimensional.
  • (v) Show that for any {k}, the space of cusp forms of weight {k} is a subspace of the space of modular forms of weight {k} of codimension at most one, and that both spaces are finite-dimensional.
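The case checking suggested in the hint to part (iii) can be mechanised. The sketch below enumerates the triples {(a,b,c)} of natural numbers (including zero) solving {a + \frac{1}{2} b + \frac{1}{3} c = \frac{k}{12}}, rewritten as {12a + 6b + 4c = k} to stay in integer arithmetic. Here {a} stands for the combined contribution of the cusp and of the zeroes not equivalent to {i} or {e^{\pi i/3}}; the function name is an illustrative choice.

```python
def valence_solutions(k):
    """All (a, b, c) of non-negative integers with 12a + 6b + 4c = k.

    Equivalently a + b/2 + c/3 = k/12, the shape of the valence formula
    with b = ord_i(f) and c = ord_{e^{pi i/3}}(f).
    """
    return [
        (a, b, c)
        for a in range(k // 12 + 1)
        for b in range(k // 6 + 1)
        for c in range(k // 4 + 1)
        if 12 * a + 6 * b + 4 * c == k
    ]


if __name__ == "__main__":
    for k in range(0, 16, 2):
        print(k, valence_solutions(k))
```

For {k=4,6,8,10,14} there is exactly one solution, and it has {a=0}, which is the numerical input behind parts (iii) and (iv); for {k=12} there are three solutions, and {k=2} has none.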

Basic examples of modular forms are provided by the Eisenstein series

\displaystyle  G_k( \Lambda ) := \sum_{z \in \Lambda \backslash \{0\}} \frac{1}{z^{k}} \ \ \ \ \ (23)

that we have already encountered for even integers {k = 4,6,8,\dots} greater than two (we ignore the odd Eisenstein series as they vanish). We can view this as a function on {{\bf H}} by the formula

\displaystyle  G_k( \tau) := G_k( {\bf Z} + \tau {\bf Z} ). \ \ \ \ \ (24)

Observe that if {a,b,c,d} are integers with {ad-bc = 1}, then

\displaystyle  {\bf Z} + \frac{a\tau+b}{c\tau+d} {\bf Z} = \frac{1}{c\tau+d} ( (c\tau+d) {\bf Z} + (a\tau+b) {\bf Z} ) = \frac{1}{(c\tau+d)} ({\bf Z} + \tau {\bf Z})

using the matrix inverse in {SL_2({\bf Z})}. Inserting this into (23), (24) we conclude that

\displaystyle  G_k( \frac{a\tau+b}{c\tau+d} ) = (c\tau+d)^k G_k(\tau).

(Compare also with Exercise 24.) Also, from (23), (24) we have

\displaystyle  G_k(\tau) = \sum_{(n,m) \in {\bf Z}^2 \backslash \{(0,0)\}} \frac{1}{(n+m\tau)^{k}}. \ \ \ \ \ (25)

The series here is locally uniformly convergent for {\tau \in {\bf H}}, so {G_k} is holomorphic. Also, using the bounds

\displaystyle  \sum_{n \in {\bf Z}} \frac{1}{|n+m\tau|^k} \lesssim \sum_{n \in {\bf Z}} \min( \frac{1}{|n + m \mathrm{Re} \tau|^k}, \frac{1}{|m \mathrm{Im} \tau|^k} )

\displaystyle  \lesssim \int_{\bf R} \min( \frac{1}{t^k}, \frac{1}{|m \mathrm{Im} \tau|^k} )\ dt

\displaystyle  \lesssim \frac{1}{|m \mathrm{Im} \tau|^{k-1}}

for non-zero {m}, while

\displaystyle  \sum_{n \in {\bf Z} \backslash 0} \frac{1}{n^k} = 2 \zeta(k) \ \ \ \ \ (26)

where {\zeta(k)} is the famous Riemann zeta function

\displaystyle  \zeta(k) := \sum_{n=1}^\infty \frac{1}{n^k},

we conclude on summing in {m} and using the hypothesis {k>2} that

\displaystyle  G_k(\tau) = 2 \zeta(k) + O_k( \frac{1}{|\mathrm{Im}(\tau)|^{k-1}}).

In particular, {G_k} is bounded at infinity. Summarising, we have established that the Eisenstein series {G_k} is a modular form of weight {k}, which is not identically zero (since it approaches the non-zero value {2 \zeta(k)} at the cusp {\infty}). Combining this with Exercise 26(iii), we see that we have completely classified the modular forms of weight {k} for {k=4,6,8,10,14}, namely they are the scalar multiples of {G_k}. For instance, the coefficients

\displaystyle  g_2(\tau) = 60 G_4(\tau)


\displaystyle  g_3(\tau) = 140 G_6(\tau)

appearing in the previous section are modular forms of weight {4} and weight {6} respectively, and the modular discriminant

\displaystyle  \Delta(\tau) = g_2^3(\tau) - 27 g_3^2(\tau)

from Exercise 10 is a modular form of weight {12}. From that exercise, this modular form never vanishes on {{\bf H}}, hence by the valence formula it must have a simple zero at {\infty}, and in particular is a cusp form. From Exercise 26 it is the unique cusp form of weight {12}, up to constants.
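The modularity and the behaviour at the cusp of the Eisenstein series can be observed numerically from the double sum (25). The following sketch truncates the sum to {|n|,|m| \leq N}; the function name and the cutoff `N` are illustrative, and the truncation means that the values are only accurate to roughly {1/N^2}.

```python
import math


def eisenstein(k, tau, N=60):
    """Truncated version of (25): sum of (n + m*tau)^(-k) over 0 < max(|n|,|m|) <= N."""
    total = 0j
    for m in range(-N, N + 1):
        for n in range(-N, N + 1):
            if (n, m) != (0, 0):
                total += (n + m * tau) ** (-k)
    return total


if __name__ == "__main__":
    tau = 0.3 + 1.1j
    # Inversion law (21) with k = 4: G_4(-1/tau) versus tau^4 G_4(tau).
    print(eisenstein(4, -1 / tau), tau**4 * eisenstein(4, tau))
    # 1-periodicity (18): G_4(tau + 1) versus G_4(tau).
    print(eisenstein(4, tau + 1), eisenstein(4, tau))
    # Behaviour at the cusp: G_4(10i) versus 2*zeta(4) = pi^4/45.
    print(eisenstein(4, 10j), math.pi**4 / 45)
```

The inversion law holds for the truncated sums essentially to machine precision, because the square {|n|,|m| \leq N} is mapped to itself by {(n,m) \mapsto (-m,n)}; the periodicity check is only approximate, since the shear {(n,m) \mapsto (n+m,m)} does not preserve the truncation region.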

Exercise 27 Give an alternate proof that {\Delta} is a cusp form, not using the valence identity, by first establishing that {\zeta(4) = \frac{\pi^4}{90}} and {\zeta(6) = \frac{\pi^6}{945}}.
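Once the two zeta values are granted, the remaining arithmetic in this exercise is a one-line check, which can be done exactly with rational numbers. The sketch below tracks only the rational coefficients of {\pi^4}, {\pi^6} and {\pi^{12}} in the limits of {g_2}, {g_3} and {\Delta} at the cusp, using the limit {G_k \rightarrow 2\zeta(k)} established above; the function name is illustrative.

```python
from fractions import Fraction


def cusp_limits():
    """Rational coefficients of the limits of g_2, g_3, Delta at the cusp.

    Returns (c2, c3, cD) where g_2 -> c2 * pi^4, g_3 -> c3 * pi^6 and
    Delta -> cD * pi^12, assuming zeta(4) = pi^4/90 and zeta(6) = pi^6/945.
    """
    zeta4 = Fraction(1, 90)    # coefficient of pi^4 in zeta(4)
    zeta6 = Fraction(1, 945)   # coefficient of pi^6 in zeta(6)
    c2 = 60 * 2 * zeta4        # g_2 = 60 G_4 and G_4 -> 2 zeta(4)
    c3 = 140 * 2 * zeta6       # g_3 = 140 G_6 and G_6 -> 2 zeta(6)
    cD = c2**3 - 27 * c3**2    # Delta = g_2^3 - 27 g_3^2
    return c2, c3, cD


if __name__ == "__main__":
    print(cusp_limits())
```

The third coefficient vanishes, which is the statement that {\Delta} tends to zero at the cusp.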

We can now create our first non-trivial modular function, the {j}-invariant

\displaystyle  j(\tau) := 1728 \frac{g_2(\tau)^3}{\Delta(\tau)}.

The factor of {1728} is traditional, as it gives a nice normalisation at {\infty}, as we shall see later. One can take advantage of complex multiplication to compute two special values immediately:

Lemma 28 We have {j(i) = 1728} and {j(e^{\pi i/3})=0}.

Proof: Using the rotation symmetry {z \mapsto iz} we see that {G_6(i) = 0}, hence {g_3(i)=0} which implies that {\Delta(i)=g_2(i)^3} and hence {j(i)=1728}. Similarly, using the rotation symmetry {z \mapsto e^{\pi i/3} z} we have {G_4(e^{\pi i/3}) = 0}, hence {j(e^{\pi i/3})=0}. (One can also use the valence formula to get the vanishing {G_6(i)=G_4(e^{\pi i/3})=0}.) \Box
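These two special values can also be seen numerically from truncated Eisenstein sums. The sketch below is only an approximation (the lattice sum (25) is cut off at {|n|,|m| \leq N}), and the function names are illustrative rather than taken from the notes.

```python
import cmath


def eisenstein(k, tau, N=40):
    """Truncated version of (25): sum of (n + m*tau)^(-k) over 0 < max(|n|,|m|) <= N."""
    total = 0j
    for m in range(-N, N + 1):
        for n in range(-N, N + 1):
            if (n, m) != (0, 0):
                total += (n + m * tau) ** (-k)
    return total


def j_invariant(tau, N=40):
    """Approximate j(tau) = 1728 g_2^3 / (g_2^3 - 27 g_3^2) from truncated sums."""
    g2 = 60 * eisenstein(4, tau, N)
    g3 = 140 * eisenstein(6, tau, N)
    return 1728 * g2**3 / (g2**3 - 27 * g3**2)


if __name__ == "__main__":
    rho = cmath.exp(1j * cmath.pi / 3)
    print(j_invariant(1j))    # close to 1728
    print(j_invariant(rho))   # close to 0
```

At {\tau = i} the truncated sum for {G_6} already vanishes up to rounding, since the truncation region is invariant under the rotation {z \mapsto iz}; at {\tau = e^{\pi i/3}} the truncated {G_4} is merely small, which is enough to make the computed {j} value tiny.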

Being modular, we can think of {j} as a map from {X_0(1)} to {{\bf C}}. We have the following fundamental fact:

Proposition 29 The map {j: X_0(1) \rightarrow {\bf C}} is a bijection.

Proof: Note that for any {\lambda \in {\bf C}}, {j(\tau) = 1728 \lambda} if and only if {\tau} is a zero of {g_2^3 - \lambda \Delta}. It thus suffices to show that for every {\lambda \in {\bf C}}, the zeroes of the function {g_2^3 - \lambda \Delta} in {{\bf H}} consist of precisely one orbit of {SL_2({\bf Z})}. This function is a modular form of weight {12} that does not vanish at infinity (since {g_2} does not vanish there while {\Delta} does). By the valence formula, we thus have

\displaystyle  \sum_\rho \mathrm{ord}_\rho( g_2^3 - \lambda \Delta ) + \frac{1}{2} \mathrm{ord}_i( g_2^3 - \lambda \Delta ) + \frac{1}{3} \mathrm{ord}_{e^{\pi i/3}}( g_2^3 - \lambda \Delta ) = 1.

As the orders are all natural numbers, some case checking reveals that there are now only three possibilities:
  • {g_2^3 - \lambda \Delta} has a simple zero at precisely one {SL_2({\bf Z})}-orbit, not equivalent to {i} or {e^{\pi i/3}}.
  • {g_2^3 - \lambda \Delta} has a double zero at {i} (and equivalent points), and no other zeroes.
  • {g_2^3 - \lambda \Delta} has a triple zero at {e^{\pi i/3}} (and equivalent points), and no other zeroes.
In any of these three cases, the claim follows. \Box
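The case checking in the proof above is small enough to enumerate by machine. The sketch below is illustrative only (the function name `valence_cases` is made up for this example); it lists all triples of natural numbers (total order at generic orbits, order at {i}, order at {e^{\pi i/3}}) that satisfy the valence formula for a weight {12} form not vanishing at infinity.

```python
from fractions import Fraction

def valence_cases(weight=12):
    """All (generic, at_i, at_rho) in natural numbers with
    generic + at_i/2 + at_rho/3 == weight/12 (no zero at infinity assumed)."""
    target = Fraction(weight, 12)
    cases = []
    for generic in range(int(target) + 1):
        for at_i in range(int(2 * target) + 1):
            for at_rho in range(int(3 * target) + 1):
                if generic + Fraction(at_i, 2) + Fraction(at_rho, 3) == target:
                    cases.append((generic, at_i, at_rho))
    return cases

print(valence_cases())
```

Exact rational arithmetic is used so that the comparison with {1} is not affected by rounding; the three triples found correspond to the three bullet points in the proof.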

Note that this proof also shows that {j(\tau)-1728} has a double zero at {i} and {j(\tau)} has a triple zero at {e^{\pi i/3}}, but that {j(\tau)-j(\tau_0)} has a simple zero for any {\tau_0 \in {\bf H}} not equivalent to {i} or {e^{\pi i/3}}.

We can now give the entire modular curve {X_0(1)} the structure of a Riemann surface by declaring {j} to be the coordinate function. This is compatible with the existing Riemann surface structure on {X_0(1)'} since {j} was already holomorphic on this portion of the curve. Any modular function {f} can then factor as {f(\tau) = F(j(\tau))} for some meromorphic function {F} that is initially defined on the punctured complex plane {{\bf C} \backslash \{ 0, 1728 \}}; but from meromorphicity of {f} on {{\bf H}} and at infinity we see that {F} blows up at an at most polynomial rate as one approaches {0}, {1728}, or {\infty}, and so {F} is in fact a meromorphic function on the entire Riemann sphere and is thus a rational function (Exercise 19 of 246A Notes 4). We conclude

Proposition 30 Every modular function is a rational function of the {j}-invariant {j}.

Conversely, it is clear that every rational function of {j} is modular, thus giving a satisfactory description of the modular functions.

Exercise 31 Show that every modular function is the ratio of two modular forms of equal weight (with the denominator not identically zero).

Exercise 32 (All elliptic curves are tori) Let {A, B} be two complex numbers with {A^3 - 27B^2 \neq 0}. Show that there is a lattice {\Lambda} such that {g_2(\Lambda) = A} and {g_3(\Lambda) = B}, so in particular the elliptic curve

\displaystyle  \{ (z,w): w^2 = 4 z^3 - A z - B \} \cup \{\infty\}

is complex diffeomorphic to a torus {\Lambda \backslash {\bf C}}.

Remark 33 By applying some elementary algebraic geometry transformations one can show that any (smooth, irreducible) cubic plane curve {\{ (z,w): P(z,w) = 0 \}} generated by a polynomial {P: {\bf C} \times {\bf C} \rightarrow {\bf C}} of degree {3} is a Riemann surface complex diffeomorphic to a torus {\Lambda \backslash {\bf C}} after adding some finite number of points at infinity; also, some degree {4} curves such as

\displaystyle  \{ (z,w): w^2 = (z-a)(z-b)(z-c)(z-d) \} \cup \{\infty\}

can also be placed in this form. However we will not detail the required transformations here.

A famous application of the theory of the {j}-invariant is to give a short Riemann surface-based proof of the little Picard theorem (first proven in Theorem 55 of 246A Notes 4):

Theorem 34 (Little Picard theorem) Let {f: {\bf C} \rightarrow {\bf C}} be entire and non-constant. Then {f({\bf C})} omits at most one point of {{\bf C}}.

Proof: Suppose for contradiction that {f({\bf C})} omits at least two points of {{\bf C}}. By applying a linear transformation, we may assume that {f} omits the points {0} and {1728}. Then {j^{-1} \circ f} is a holomorphic function from {{\bf C}} to {X_0(1)' = PSL_2({\bf Z}) \backslash {\bf H}'}. Since the domain {{\bf C}} is simply connected, {j^{-1} \circ f} lifts to a holomorphic function from {{\bf C}} to {{\bf H}}. Since {{\bf H}} is complex diffeomorphic to a disk, this lift must be constant by Liouville’s theorem, hence {f} is constant as required. (This is essentially Picard’s original proof of this theorem.) \Box

The great Picard theorem can also be proven by a more sophisticated version of these methods, but it requires some study of the possible behavior of elements of {SL_2({\bf Z})}; see Exercise 37 below.

All modular forms are {1}-periodic, and hence by Proposition 1 should have a Fourier expansion, which is also a Laurent expansion in the nome. As it turns out, the Fourier coefficients often have a highly number-theoretic interpretation. This can be illustrated with the Eisenstein series {G_k}; here we follow the treatment in Stein-Shakarchi. To compute the Fourier coefficients we first need a computation:

Exercise 35 Let {k \geq 2} and {\tau \in {\bf H}}, and let {q := e^{2\pi i \tau}} be the nome. Establish the identity

\displaystyle  \sum_{n \in {\bf Z}} \frac{1}{(n+\tau)^k} = \frac{(-2\pi i)^k}{(k-1)!} \sum_{\ell=1}^\infty \ell^{k-1} q^\ell

in two different ways:
  • (i) By applying the Poisson summation formula (Proposition 3(v) of Notes 2).
  • (ii) By first establishing the identity

    \displaystyle  \sum_{n \in {\bf Z}} \frac{1}{(n+\tau)^2} = \frac{\pi^2}{\sin^2(\pi \tau)} \ \ \ \ \ (27)

    by applying Proposition 1 to the difference of the two sides, and differentiating in {\tau}. (It is also possible to establish (27) from differentiating and then manipulating the identities in Exercises 25 or 27 of Notes 1.)
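Before attempting either proof, it can be reassuring to test the identity of Exercise 35 numerically. The following sketch is an illustration rather than a proof; the function names and the sample point {\tau = 0.3 + 0.8i} are arbitrary choices made here. It compares a symmetric truncation of the sum over {n} with a truncation of the series in the nome.

```python
import cmath
from math import factorial

def lattice_side(tau, k, cutoff=2000):
    """Truncation of the sum over n in Z of 1/(n + tau)^k."""
    return sum(1 / (n + tau) ** k for n in range(-cutoff, cutoff + 1))

def nome_side(tau, k, terms=80):
    """Truncation of (-2 pi i)^k / (k-1)! * sum_{l >= 1} l^(k-1) q^l."""
    q = cmath.exp(2j * cmath.pi * tau)
    series = sum(l ** (k - 1) * q ** l for l in range(1, terms + 1))
    return (-2j * cmath.pi) ** k / factorial(k - 1) * series

tau = 0.3 + 0.8j
gap4 = abs(lattice_side(tau, 4) - nome_side(tau, 4))
gap6 = abs(lattice_side(tau, 6) - nome_side(tau, 6))
print(gap4, gap6)
```

The discrepancy for {k=4} is dominated by the discarded tail of the sum over {n}, which is of size roughly {N^{-3}} for a cutoff {N}; the series in {q} converges geometrically and contributes a negligible error.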

From (25), (26) (and symmetry) one has

\displaystyle  G_k(\tau) = 2 \zeta(k) + 2 \sum_{m=1}^\infty \sum_{n \in {\bf Z}} \frac{1}{(n+m\tau)^k}

and hence by the above exercise

\displaystyle  G_k(\tau) = 2 \zeta(k) + 2 \frac{(-2\pi i)^k}{(k-1)!} \sum_{m=1}^\infty \sum_{\ell=1}^\infty \ell^{k-1} q^{m\ell}.

Since {|q|<1} it is not difficult to show that the double sum here is absolutely convergent and can be rearranged as we please. If we group the terms based on the product {r=m\ell} we thus have the Fourier expansion

\displaystyle  G_k(\tau) = 2 \zeta(k) + 2 \frac{(-2\pi i)^k}{(k-1)!} \sum_{r=1}^\infty \sigma_{k-1}(r) q^r

where the {(k-1)^{th}} divisor function {\sigma_{k-1}(r)} is defined by

\displaystyle \sigma_{k-1}(r) := \sum_{\ell|r} \ell^{k-1}

where the sum is over those natural numbers {\ell} that divide {r}. Thus for instance

\displaystyle  G_4(\tau) = 2 \zeta(4) + \frac{2(2\pi)^4}{3!} \sum_{r=1}^\infty \sigma_3(r) q^r

\displaystyle  = \frac{\pi^4}{45} ( 1 + 240 \sum_{r=1}^\infty \sigma_3(r) q^r )

\displaystyle  = \frac{\pi^4}{45} ( 1 + 240 q + \dots )


\displaystyle  G_6(\tau) = 2 \zeta(6) - \frac{2(2\pi)^6}{5!} \sum_{r=1}^\infty \sigma_5(r) q^r

\displaystyle  = \frac{2\pi^6}{945} ( 1 - 504 \sum_{r=1}^\infty \sigma_5(r) q^r )

\displaystyle  = \frac{2\pi^6}{945} ( 1 - 504 q - \dots )

so after some calculation

\displaystyle  g_2(\tau) = \frac{4 \pi^4}{3} (1 + 240 q + \dots )

\displaystyle  g_3(\tau) = \frac{8 \pi^6}{27} (1 - 504 q - \dots)

\displaystyle  \Delta(\tau) = (2\pi)^{12} (q - \dots)

and therefore

\displaystyle  j(\tau) = q^{-1} + \dots, \ \ \ \ \ (28)

thus the factor of {1728} in the definition of the {j}-invariant normalises the “residue” of {j} at infinity to equal {1}.
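The "some calculation" above can be carried out with exact integer arithmetic. The sketch below is one possible way to organise it and is not taken from the notes; the names `E4`, `E6` denote the normalised series {G_4/2\zeta(4)} and {G_6/2\zeta(6)}, so that {\Delta = (2\pi)^{12} (E_4^3 - E_6^2)/1728} and {j = 1728 E_4^3/(E_4^3 - E_6^2)}, as follows from the displayed expansions of {g_2} and {g_3}.

```python
def sigma(power, r):
    """Divisor function: sum of d^power over the positive divisors d of r."""
    return sum(d ** power for d in range(1, r + 1) if r % d == 0)

def multiply(a, b):
    """Product of two truncated power series given as equal-length lists."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n - i):
            out[i + j] += a[i] * b[j]
    return out

N = 6  # number of coefficients kept
E4 = [1] + [240 * sigma(3, r) for r in range(1, N)]    # G_4 / (2 zeta(4))
E6 = [1] + [-504 * sigma(5, r) for r in range(1, N)]   # G_6 / (2 zeta(6))

E4_cubed = multiply(multiply(E4, E4), E4)
E6_squared = multiply(E6, E6)

# Delta / (2 pi)^12 = (E4^3 - E6^2) / 1728; the division is exact.
diff = [x - y for x, y in zip(E4_cubed, E6_squared)]
assert all(c % 1728 == 0 for c in diff)
delta = [c // 1728 for c in diff]           # coefficients of q^0, q^1, ...

# j = E4^3 / delta.  Write delta = q * unit with unit[0] == 1 and divide.
unit = delta[1:]
j_coeffs = []                                # coefficients of q^-1, q^0, q^1, ...
for n in range(len(unit)):
    c = E4_cubed[n] - sum(unit[i] * j_coeffs[n - i] for i in range(1, n + 1))
    j_coeffs.append(c)

print(delta)
print(j_coeffs)
```

The list `delta` begins with a zero constant term and a unit coefficient of {q}, confirming that {\Delta} is a cusp form with leading term {(2\pi)^{12} q}, and `j_coeffs` begins with the coefficient {1} of {q^{-1}}, in agreement with (28).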

Remark 36 If one expands out a few more terms in the above expansions, one can calculate

\displaystyle  j(\tau) = q^{-1} + 744 + 196884 q + 21493760 q^2 + \dots. \ \ \ \ \ (29)

The various coefficients in here have several remarkable algebraic properties. For instance, applying this expansion at {\tau = (1 + \sqrt{-d})/2} for a natural number {d}, so that {q = -e^{-\pi \sqrt{d}}}, one obtains the approximation

\displaystyle  j( (1 + \sqrt{-d})/2 ) = - e^{\pi \sqrt{d}} + 744 + O( e^{-\pi \sqrt{d}} ).

Now for certain values of {d}, most famously {d = 163}, the torus {({\bf Z} + (1 + \sqrt{-d})/2 {\bf Z}) \backslash {\bf C}} admits a complex multiplication that allows for computation of the {j}-invariant by algebraic means (think of this as a more advanced version of Lemma 28; it is closely related to the fact that the ring of algebraic integers in {{\bf Q}(\sqrt{-163})} admits unique factorisation, see these previous notes for some related discussion). For instance, one can eventually establish that

\displaystyle  j( (1 + \sqrt{-163})/2 ) = (-640320)^3

which eventually leads to the famous approximation

\displaystyle  e^{\pi \sqrt{163}} \approx 640320^3 + 744

(first observed by Hermite, but also attributed to Ramanujan via an April Fools’ joke of Martin Gardner) which is accurate to twelve decimal places. The remaining coefficients have a remarkable interpretation as dimensions of components of a certain representation of the monster group known as the moonshine module, a phenomenon known as monstrous moonshine. For instance, the smallest irreducible representation of the monster group has dimension {196883}, precisely one less than the {q} coefficient of {j}. The Fourier coefficients {\tau(n)} of the (normalised) modular discriminant,

\displaystyle  \Delta(\tau) = (2\pi)^{12} \sum_{n=1}^\infty \tau(n) q^n = (2\pi)^{12} (q - 24 q^2 + 252 q^3 + \dots)

form a sequence known as the Ramanujan {\tau} function, which obeys many remarkable properties. For instance, there is the totally non-obvious fact that this function is multiplicative in the sense that {\tau(nm) = \tau(n) \tau(m)} whenever {n,m} are coprime; see Exercise 43.
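The approximation {e^{\pi \sqrt{163}} \approx 640320^3 + 744} from Remark 36 can be checked with high-precision decimal arithmetic. This is a minimal sketch using only the Python standard library; the value of {\pi} is hard-coded to fifty decimal places because the `decimal` module does not supply it, and the working precision of sixty digits is an arbitrary but comfortable choice.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60

# pi to 50 decimal places (hard-coded; the decimal module has no pi constant).
PI = Decimal("3.14159265358979323846264338327950288419716939937510")

heegner = (PI * Decimal(163).sqrt()).exp()      # e^{pi sqrt(163)}
target = Decimal(640320) ** 3 + 744             # 640320^3 + 744
gap = target - heegner

# The expansion (29) predicts that the gap is about 196884 * e^{-pi sqrt(163)}.
predicted_gap = Decimal(196884) / heegner

print(heegner)
print(gap, predicted_gap)
```

The gap is positive and smaller than {10^{-12}}, and it matches the contribution {196884 q} of the next Fourier coefficient to many further digits, which illustrates where the {O( e^{-\pi \sqrt{d}} )} error term comes from.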

Exercise 37 (Great Picard theorem)

Exercise 38 (Dimension of space of modular forms)
  • (i) If {k} is an even natural number, show that the dimension of the space of modular forms of weight {k} is equal to {\lfloor k/12\rfloor+1} except when {k} is equal to {2} mod {12}, in which case it is equal to {\lfloor k/12\rfloor}. (Hint: for {k \leq 12} this follows from Exercise 26; to cover the larger ranges of {k}, use the modular discriminant {\Delta} to show that the space of cusp forms of weight {k+12} is isomorphic to the space of modular forms of weight {k}.)
  • (ii) If {k} is an even natural number, show that a basis for the space of modular forms of weight {k} is provided by the products {G_4^i G_6^j} where {i,j} range over natural numbers (including zero) with {4i+6j=k}.
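The two parts of Exercise 38 are consistent with each other only if the number of pairs {(i,j)} with {4i+6j=k} agrees with the dimension formula in part (i). That counting statement is easy to test by brute force. The sketch below (with made-up function names) checks it for even {k} below {200}; it is a sanity check of the bookkeeping, not a solution of the exercise.

```python
def dimension_formula(k):
    """Dimension claimed in part (i) for even k >= 0."""
    return k // 12 if k % 12 == 2 else k // 12 + 1

def monomial_count(k):
    """Number of pairs (i, j) of natural numbers with 4i + 6j = k."""
    return sum(1 for i in range(k // 4 + 1)
                 for j in range(k // 6 + 1)
                 if 4 * i + 6 * j == k)

mismatches = [k for k in range(0, 200, 2)
              if dimension_formula(k) != monomial_count(k)]
print(mismatches)
```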

Thus far we have constructed modular forms and modular functions starting from Eisenstein series {G_k}. There is another important, and seemingly quite different, way to generate modular forms coming from theta functions. Typically these functions are not quite modular in the sense given in these notes, but are close enough that after some manipulation one can transform theta functions into modular forms. The simplest example of a theta function is the Jacobi theta function

\displaystyle  \theta(\tau) := \sum_{n \in {\bf Z}} e^{\pi i n^2 \tau}, \ \ \ \ \ (30)

which is easily seen to be a holomorphic function on {{\bf H}} that converges to {1} as {\mathrm{Im}(\tau) \rightarrow +\infty} (only the {n=0} term survives in the limit). It is not quite a modular form, but is {2}-periodic (instead of {1}-periodic) in the sense that

\displaystyle  \theta(\tau+2) = \theta(\tau)

for all {\tau \in {\bf H}}, and from Poisson summation (see Exercise 7 of Notes 2) we have the variant

\displaystyle  \theta(-1/\tau) = (-i\tau)^{1/2} \theta(\tau)

of the modularity relation (21), using the standard branch of the square root. This is not quite modular in nature, but a slight variant of the theta function fares better:

Exercise 39 Define the Dedekind eta function {\eta: {\bf H} \rightarrow {\bf C}} by the formula

\displaystyle  \eta(\tau) := e^{\pi i \tau/12} \sum_{n \in {\bf Z}} (-1)^n e^{(3n^2-n) \pi i \tau}

or in terms of the nome {q = e^{2\pi i \tau}}

\displaystyle  \eta(\tau) = q^{1/24} \sum_{n \in {\bf Z}} (-1)^n q^{(3n^2-n)/2}

where {q^{1/24}} denotes the specific {24^{th}} root {e^{\pi i \tau/12}} of {q}.
  • (i) Establish the modified {1}-periodicity

    \displaystyle  \eta(\tau+1) = e^{\pi i/12} \eta(\tau)

    and the modified modularity

    \displaystyle  \eta(-1/\tau) = (-i\tau)^{1/2} \eta(\tau)

    using the standard branch of the square root. (Hint: a direct application of Poisson summation applied to {\eta(-1/\tau)} gives a sum that looks somewhat like {\eta(\tau)} but with different numerical constants (in particular, one sees terms like {e^{\pi i n^2 \tau/3}} instead of {e^{3\pi i n^2 \tau}} arising). Split the index of summation {n} into three components {n = 3m}, {n=3m+1}, {n=3m+2} based on the residue classes modulo {3} and rearrange each component separately.)
  • (ii) Establish the identity

    \displaystyle  \Delta(\tau) = (2\pi)^{12} \eta(\tau)^{24}.

    (Hint: show that both sides are cusp forms of weight {12} that vanish like {(2\pi)^{12} q} near the cusp.)
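The transformation laws for {\theta} and {\eta} can be tested numerically before one tries to prove them. The following sketch is an illustration under stated assumptions: the function names, the truncation at {|n| \leq 8} and the sample point {\tau = 0.2 + 0.9i} are choices made here. It uses `cmath.sqrt`, which returns the principal square root; this agrees with the standard branch because {-i\tau} has positive real part for {\tau \in {\bf H}}.

```python
import cmath

def theta(tau, cutoff=8):
    """Truncation of the Jacobi theta function (30)."""
    return sum(cmath.exp(1j * cmath.pi * n * n * tau)
               for n in range(-cutoff, cutoff + 1))

def eta(tau, cutoff=8):
    """Truncation of the series defining the Dedekind eta function."""
    series = sum((-1) ** n * cmath.exp((3 * n * n - n) * 1j * cmath.pi * tau)
                 for n in range(-cutoff, cutoff + 1))
    return cmath.exp(1j * cmath.pi * tau / 12) * series

tau = 0.2 + 0.9j
root = cmath.sqrt(-1j * tau)          # (-i tau)^{1/2}, principal branch

theta_gap = abs(theta(-1 / tau) - root * theta(tau))
eta_shift_gap = abs(eta(tau + 1) - cmath.exp(1j * cmath.pi / 12) * eta(tau))
eta_inversion_gap = abs(eta(-1 / tau) - root * eta(tau))

print(theta_gap, eta_shift_gap, eta_inversion_gap)
```

All three gaps should be at the level of rounding error, since the discarded terms of both series are far below machine precision at this value of {\tau}.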

Remark 40 The relationship between {\Delta} and the {24^{th}} power of the eta function can be interpreted (after some additional effort) as a relation {\Theta_\Lambda = G_{12} - \frac{65520}{691} \Delta} between the modular discriminant {\Delta} and the theta function {\Theta_\Lambda(\tau) := \sum_{x \in \Lambda} e^{i\pi \tau \|x\|^2}} of a certain highly symmetric {24}-dimensional lattice {\Lambda \subset {\bf R}^{24}} known as the Leech lattice, but we will not pursue this connection further here.

The {\eta} function has a remarkable factorisation coming from Euler’s pentagonal number theorem

\displaystyle  \sum_{n \in {\bf Z}} (-1)^n q^{(3n^2-n)/2} = \prod_{m=1}^\infty (1 - q^m), \ \ \ \ \ (31)

so that

\displaystyle  \eta(\tau) = e^{\pi i \tau/12} \prod_{m=1}^\infty (1 - e^{2\pi i m \tau}). \ \ \ \ \ (32)
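The pentagonal number theorem (31) is an identity of formal power series in {q}, so it can be checked coefficient by coefficient with integer arithmetic. The sketch below (function names are illustrative) expands the product up to a chosen order and compares it with the sparse series on the left-hand side.

```python
def euler_product(n_terms):
    """Coefficients of prod_{m >= 1} (1 - q^m), truncated below q^n_terms."""
    coeffs = [1] + [0] * (n_terms - 1)
    for m in range(1, n_terms):
        # Multiply in place by (1 - q^m); descending r keeps old values intact.
        for r in range(n_terms - 1, m - 1, -1):
            coeffs[r] -= coeffs[r - m]
    return coeffs

def pentagonal_series(n_terms):
    """Coefficients of sum_{n in Z} (-1)^n q^{(3n^2 - n)/2}, truncated."""
    coeffs = [0] * n_terms
    n = 0
    while (3 * n * n - n) // 2 < n_terms:
        sign = 1 if n % 2 == 0 else -1
        for signed_n in {n, -n}:            # n = 0 is counted only once
            exponent = (3 * signed_n * signed_n - signed_n) // 2
            if exponent < n_terms:
                coeffs[exponent] += sign
        n += 1
    return coeffs

product_side = euler_product(200)
series_side = pentagonal_series(200)
print(series_side[:16])
print(product_side == series_side)
```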

There are many proofs of the pentagonal number theorem in the literature. One approach is to first establish the more general Jacobi triple product identity:

Theorem 41 (Jacobi triple product identity) For any {\tau \in {\bf H}} and {z \in{\bf C}}, one has

\displaystyle  \sum_{n \in {\bf Z}} e^{\pi i n^2 \tau} e^{2\pi i nz} \ \ \ \ \ (33)

\displaystyle  = \prod_{m=1}^\infty (1 - e^{2\pi i m \tau}) (1 + e^{\pi i (2m-1) \tau} e^{2\pi i z}) (1 + e^{\pi i (2m-1) \tau} e^{-2\pi i z}).

Observe that by replacing {\tau} by {3\tau} and {z} by {1/2 - \tau/2} (and writing {q = e^{2\pi i \tau}} as before) we have

\displaystyle  \sum_{n \in {\bf Z}} (-1)^n q^{(3n^2-n)/2} = \prod_{m=1}^\infty (1 - q^{3m}) (1 - q^{3m-2}) (1-q^{3m-1})

and this gives the identity (31) after splitting the integers into the three residue classes {3m, 3m-2, 3m-1} modulo {3}. One can obtain many further identities of this type by other substitutions; for instance, by setting {z=0} in the triple product identity, one obtains

\displaystyle  \theta(\tau) = \prod_{n=1}^\infty (1 - e^{2\pi i n\tau}) (1 + e^{\pi i (2n-1)\tau})^2.

Proof: Let us denote the left-hand side and right-hand side of (33) by {\Theta(z|\tau)} and {\Pi(z|\tau)} respectively. For fixed {\tau \in {\bf H}}, both sides are clearly holomorphic in {z}. Our strategy in showing that {\Theta} and {\Pi} agree (following Stein-Shakarchi) is to first observe that they have many of the same periodicity properties. We clearly have {1}-periodicity

\displaystyle  \Theta(z+1|\tau) = \Theta(z|\tau); \quad \Pi(z+1|\tau) = \Pi(z|\tau).

From the identity

\displaystyle  e^{\pi i n^2 \tau} e^{2\pi i n(z+\tau)} = e^{-\pi i \tau} e^{-2\pi i z} e^{\pi i (n+1)^2 \tau} e^{2\pi i (n+1)z}

we also have the modified {\tau}-periodicity

\displaystyle  \Theta(z+\tau|\tau) = e^{-\pi i \tau} e^{-2\pi i z} \Theta(z|\tau);

similarly from the telescoping products

\displaystyle  \prod_{m=1}^\infty (1 + e^{\pi i (2m-1) \tau} e^{2\pi i (z+\tau)}) = \prod_{m=1}^\infty (1 + e^{\pi i (2(m+1)-1) \tau} e^{2\pi i z})

\displaystyle  = (1 + e^{\pi i \tau} e^{2\pi i z})^{-1} \prod_{m=1}^\infty (1 + e^{\pi i (2m-1) \tau} e^{2\pi i z})

and

\displaystyle  \prod_{m=1}^\infty (1 + e^{\pi i (2m-1) \tau} e^{-2\pi i (z+\tau)}) = \prod_{m=1}^\infty (1 + e^{\pi i (2(m-1)-1) \tau} e^{-2\pi i z})

\displaystyle  = (1 + e^{-\pi i \tau} e^{-2\pi i z}) \prod_{m=1}^\infty (1 + e^{\pi i (2m-1) \tau} e^{-2\pi i z})

we conclude that {\Pi} also obeys the same modified {\tau}-periodicity

\displaystyle  \Pi(z+\tau|\tau) = e^{-\pi i \tau} e^{-2\pi i z} \Pi(z|\tau).

Thus the ratio {z \mapsto \Theta(z|\tau)/\Pi(z|\tau)} is meromorphic and doubly periodic. Furthermore, one checks that {\Pi(z|\tau)} only vanishes when {z} is equal to {\frac{1+\tau}{2}} modulo {{\bf Z} + \tau{\bf Z}}, with a simple zero at those locations, so the ratio {z \mapsto \Theta(z|\tau)/\Pi(z|\tau)} has at most a single simple pole on the torus {({\bf Z} + \tau {\bf Z}) \backslash {\bf C}} and is thus constant by the discussion after Lemma 6 (alternatively, one can show that {\Theta(z|\tau)} also vanishes at this point and apply Proposition 2). Thus there is some quantity {c(\tau)} depending only on {\tau} for which we have the identity

\displaystyle  \Theta(z|\tau) = c(\tau) \Pi(z|\tau)

for all {z \in {\bf C}}. To exploit this, we first set {z = 1/2} and replace {\tau} by {4\tau} to conclude that

\displaystyle  \sum_{n \in {\bf Z}} (-1)^n e^{4\pi i n^2 \tau} = c(4\tau) \prod_{m=1}^\infty (1 - e^{8\pi i m \tau}) (1 - e^{4\pi i (2m-1) \tau})^2

but on rearranging the absolutely convergent product we have

\displaystyle  \prod_{m=1}^\infty (1 - e^{8\pi i m \tau}) (1 - e^{4\pi i (2m-1) \tau}) = \prod_{m=1}^\infty (1 - e^{4\pi i m \tau})

and thus

\displaystyle  \sum_{n \in {\bf Z}} (-1)^n e^{4\pi i n^2 \tau} = c(4\tau) \prod_{m=1}^\infty (1 - e^{4\pi i m \tau}) (1 - e^{4\pi i (2m-1) \tau}). \ \ \ \ \ (34)

If instead we set {z=1/4} and do not modify {\tau}, we have

\displaystyle  \sum_{n \in {\bf Z}} i^n e^{\pi i n^2 \tau} = c(\tau) \prod_{m=1}^\infty (1 - e^{2\pi i m \tau}) (1 + e^{2\pi i (2m-1) \tau});

the contributions of {n} and {-n} on the left-hand side cancel when {n} is odd, so that on making the substitution {n = 2\tilde n} we obtain

\displaystyle  \sum_{n \in {\bf Z}} i^n e^{\pi i n^2 \tau} = \sum_{\tilde n \in {\bf Z}} (-1)^{\tilde n} e^{4\pi i \tilde n^2 \tau}

while from rearranging an absolutely convergent product we have

\displaystyle  \prod_{m=1}^\infty (1 - e^{2\pi i m \tau}) = \prod_{m=1}^\infty (1 - e^{2\pi i (2m-1) \tau}) (1 - e^{2\pi i (2m) \tau})

and thus by difference of two squares

\displaystyle  \sum_{\tilde n \in {\bf Z}} (-1)^{\tilde n} e^{4\pi i \tilde n^2 \tau} = c(\tau) \prod_{m=1}^\infty (1 - e^{4\pi i m \tau}) (1 - e^{4\pi i (2m-1) \tau}). \ \ \ \ \ (35)

Comparing this with (34) we obtain the surprising additional symmetry

\displaystyle  c(\tau) = c(4\tau). \ \ \ \ \ (36)

On the other hand, taking limits in (say) (35) we see that {c(\tau) \rightarrow 1} as {\mathrm{Im} \tau \rightarrow +\infty}. If we then iterate (36) we conclude that

\displaystyle  c(\tau) = \lim_{j \rightarrow \infty} c(4^j \tau) = 1

and the claim follows. \Box
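As a sanity check on the statement just proved, one can compare truncations of the two sides of (33) at a sample point. The sketch below is illustrative only; the function names, the truncation levels and the values {\tau = 0.1 + 0.7i}, {z = 0.3 + 0.2i} are arbitrary choices made here. The ratio of the two sides is the quantity {c(\tau)} from the proof, so it should come out equal to {1} up to rounding.

```python
import cmath

def theta_sum(z, tau, cutoff=20):
    """Truncation of the left-hand side of (33)."""
    return sum(cmath.exp(1j * cmath.pi * n * n * tau + 2j * cmath.pi * n * z)
               for n in range(-cutoff, cutoff + 1))

def triple_product(z, tau, cutoff=60):
    """Truncation of the right-hand side of (33)."""
    w = cmath.exp(2j * cmath.pi * z)
    total = 1 + 0j
    for m in range(1, cutoff + 1):
        even = cmath.exp(2j * cmath.pi * m * tau)            # e^{2 pi i m tau}
        odd = cmath.exp(1j * cmath.pi * (2 * m - 1) * tau)   # e^{pi i (2m-1) tau}
        total *= (1 - even) * (1 + odd * w) * (1 + odd / w)
    return total

tau = 0.1 + 0.7j
z = 0.3 + 0.2j
lhs = theta_sum(z, tau)
rhs = triple_product(z, tau)
c_of_tau = lhs / rhs

print(lhs, rhs, abs(c_of_tau - 1))
```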

Remark 42 Another equivalent form of (32) is

\displaystyle  \eta(\tau)^{-1} = e^{-\pi i \tau/12} \sum_{n=0}^\infty p(n) e^{2\pi i n \tau}

where {p(n)} is the partition function of {n} – the number of ways to represent {n} as the sum of positive integers (up to rearrangement). Among other things, this formula can be used to ascertain the asymptotic growth of {p(n)} (which turns out to roughly be of the order of {\exp( \pi \sqrt{2n/3} )}, as famously established by Hardy and Ramanujan).
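Multiplying the generating function {\sum_n p(n) q^n} by the pentagonal series (31) gives {1}, which yields a fast recurrence for {p(n)}. The sketch below uses that recurrence; the function names are illustrative. For the comparison with the asymptotic it uses the classical Hardy-Ramanujan form {p(n) \sim \frac{1}{4n\sqrt{3}} \exp( \pi \sqrt{2n/3} )}; the prefactor {\frac{1}{4n\sqrt{3}}} is a standard fact that is not stated in these notes and is included here as an assumption of the example.

```python
from math import exp, pi, sqrt

def partitions(limit):
    """p(0), ..., p(limit) via the pentagonal number recurrence."""
    p = [1] + [0] * limit
    for n in range(1, limit + 1):
        total, k = 0, 1
        while k * (3 * k - 1) // 2 <= n:
            sign = 1 if k % 2 == 1 else -1
            total += sign * p[n - k * (3 * k - 1) // 2]
            if k * (3 * k + 1) // 2 <= n:
                total += sign * p[n - k * (3 * k + 1) // 2]
            k += 1
        p[n] = total
    return p

def hardy_ramanujan(n):
    """Leading-order asymptotic for p(n) (prefactor assumed, see lead-in)."""
    return exp(pi * sqrt(2 * n / 3)) / (4 * n * sqrt(3))

p = partitions(200)
ratio_100 = hardy_ramanujan(100) / p[100]
print(p[:11], p[100], ratio_100)
```

The ratio of the asymptotic to the true value is already within a few percent at {n=100}, which gives a feel for the growth rate quoted in Remark 42.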

Theta functions can be used to encode various number-theoretic quantities involving quadratic forms, such as sums of squares. For instance, from (30) and collecting terms one obtains the formula

\displaystyle  \theta(\tau)^k = \sum_{m=0}^\infty r_k(m) e^{\pi i m \tau}

for any natural number {k}, where {r_k(m)} denotes the number of ways to express a natural number {m} as the sum {m = n_1^2 + \dots + n_k^2} of {k} squares of integers. From Fourier inversion (Proposition 1 and a rescaling) one then has a representation

\displaystyle  r_k(m) = \frac{1}{2} \int_{\gamma_{\tau_0 \rightarrow \tau_0+2}} \theta(\tau)^k e^{-\pi i m \tau}\ d\tau

for any {\tau_0 \in {\bf H}}, which allows one to obtain asymptotics for {r_k} when {k} is large through estimation of the theta function (this is an example of the circle method); moreover, explicit identities relating the theta function to other near-modular forms (such as the Eisenstein series and their relatives) can be used to obtain exact formulae for {r_k(m)} for small values of {k} that can be used for instance to establish the famous Lagrange four-square theorem that all natural numbers are the sum of four squares. We refer the reader to the Stein-Shakarchi text for an exposition of this connection.
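The coefficients {r_k(m)} can be read off by expanding {\theta^k} as a power series in {x = e^{\pi i \tau}}. The following sketch does this with integer arithmetic (function names are illustrative). As an extra check it compares {r_4(m)} with Jacobi's four-square formula {r_4(m) = 8 \sum_{d | m, 4 \nmid d} d}; this formula is a standard result that is not stated in these notes and is brought in here only as an independent test.

```python
def theta_power_coeffs(k, limit):
    """r_k(m) for 0 <= m <= limit, from theta^k as a series in x = e^{pi i tau}."""
    base = [0] * (limit + 1)            # coefficients of theta itself
    n = 0
    while n * n <= limit:
        base[n * n] += 1 if n == 0 else 2   # n and -n give the same square
        n += 1
    result = [1] + [0] * limit
    for _ in range(k):
        new = [0] * (limit + 1)
        for a, coeff_a in enumerate(result):
            if coeff_a:
                for b in range(limit + 1 - a):
                    if base[b]:
                        new[a + b] += coeff_a * base[b]
        result = new
    return result

def jacobi_four_square(m):
    """8 times the sum of the divisors of m that are not multiples of 4."""
    return 8 * sum(d for d in range(1, m + 1) if m % d == 0 and d % 4 != 0)

r2 = theta_power_coeffs(2, 50)
r4 = theta_power_coeffs(4, 200)
print(r2[:11])
print(r4[:11])
```

In the computed range every {r_4(m)} is positive, in line with the Lagrange four-square theorem mentioned above.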

Exercise 43 (Hecke operators) Let {k} be a natural number.
  • (i) If {f} is a modular form of weight {k}, and {F} is the corresponding function on lattices given by Exercise 24, and {m} is a positive natural number, show that there is a unique modular form {T_m f} of weight {k} whose corresponding function {G} on lattices is related to {F} by the formula

    \displaystyle  G(\Lambda) := m^{k-1} \sum_{\Lambda' \subset \Lambda: [\Lambda:\Lambda'] = m} F(\Lambda')

    where the sum ranges over all sublattices {\Lambda'} of {\Lambda} whose index {[\Lambda:\Lambda']} is equal to {m}. Show that {T_m} is a linear operator on the space of weight {k} modular forms that also maps the space of weight {k} cusp forms to itself; this operator is known as a Hecke operator.
  • (ii) Give the more explicit formula

    \displaystyle  T_m f(\tau) = m^{k-1} \sum_{a,d>0: ad=m} \frac{1}{d^k} \sum_{b=0}^{d-1} f( \frac{a\tau+b}{d} ).

  • (iii) Show that the Hecke operators all commute with each other, thus {T_n T_m f = T_m T_n f} whenever {f} is a modular form of weight {k} and {n,m} are positive natural numbers. Furthermore show that {T_n T_m = T_{nm}} if {n,m} are coprime.
  • (iv) If {f} is a modular form of weight {k} with Fourier expansion {f(\tau) = \sum_{n=0}^\infty a_n q^n}, show that for any coprime positive integers {n,m} the {q^n} coefficient of {T_m f} is equal to {a_{nm}}.
  • (v) Establish the multiplicativity {\tau(nm) = \tau(n) \tau(m)} of the Ramanujan tau function (the Fourier coefficients of the modular discriminant). (Hint: use the one-dimensionality of the space of cusp forms of weight {12} to conclude that {\Delta} is a simultaneous eigenfunction of the Hecke operators.)
Simultaneous eigenfunctions of the Hecke operators are known as Hecke eigenfunctions and are of major importance in number theory.
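The multiplicativity in part (v) can be observed experimentally. The sketch below (function name illustrative) computes {\tau(n)} from the product formula {\Delta(\tau) = (2\pi)^{12} q \prod_{m=1}^\infty (1-q^m)^{24}}, which follows from Exercise 39(ii) and (32), and then tests {\tau(nm) = \tau(n) \tau(m)} on all coprime pairs within range. This is evidence, not a proof; the proof is the point of the exercise.

```python
from math import gcd

def ramanujan_tau(limit):
    """tau(1), ..., tau(limit) from q * prod_{m >= 1} (1 - q^m)^24."""
    coeffs = [1] + [0] * (limit - 1)        # product truncated below q^limit
    for m in range(1, limit):
        for _ in range(24):                 # multiply by (1 - q^m) 24 times
            for r in range(limit - 1, m - 1, -1):
                coeffs[r] -= coeffs[r - m]
    # The extra factor of q shifts indices by one.
    return {n: coeffs[n - 1] for n in range(1, limit + 1)}

LIMIT = 60
tau = ramanujan_tau(LIMIT)

failures = [(n, m)
            for n in range(1, LIMIT + 1)
            for m in range(1, LIMIT // n + 1)
            if gcd(n, m) == 1 and tau[n * m] != tau[n] * tau[m]]

print([tau[n] for n in range(1, 8)])
print(failures)
```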

John PreskillProject Ant-Man

The craziest challenge I’ve undertaken hasn’t been skydiving; sailing the Amazon on a homemade raft; scaling Mt. Everest; or digging for artifacts atop a hill in a Middle Eastern desert, near midday, during high summer.1 The craziest challenge has been to study the possibility that quantum phenomena affect cognition significantly. 

Most physicists agree that quantum phenomena probably don’t affect cognition significantly. Cognition occurs in biological systems, which have high temperatures, many particles, and watery components. Such conditions quash entanglement (a relationship that quantum particles can share and that can produce correlations stronger than any produceable by classical particles). 

Yet Matthew Fisher, a condensed-matter physicist, proposed a mechanism by which entanglement might enhance coordinated neuron firing. Phosphorus nuclei have spins (quantum properties similar to angular momentum) that might store quantum information for long times when in Posner molecules. These molecules may protect the information from decoherence (leaking quantum information to the environment), via mechanisms that Fisher described.

I can’t check how correct Fisher’s proposal is; I’m not a biochemist. But I’m a quantum information theorist. So I can identify how Posners could process quantum information if Fisher were correct. I undertook this task with my colleague Elizabeth Crosson, during my PhD.

Experimentalists have begun testing elements of Fisher’s proposal. What if, years down the road, they find that Posners exist in biofluids and protect quantum information for long times? We’ll need to test whether Posners can share entanglement. But detecting entanglement tends to require control finer than you can exert with a stirring rod. How could you check whether a beakerful of particles contains entanglement?

I asked that question of Adam Bene Watts, a PhD student at MIT, and John Wright, then an MIT postdoc and now an assistant professor in Texas. John gave our project its codename. At a meeting one day, he reported that he’d watched the film Avengers: Endgame. Had I seen it? he asked.

No, I replied. The only superhero movie I’d seen recently had been Ant-Man and the Wasp—and that because, according to the film’s scientific advisor, the movie riffed on research of mine. 

Go on, said John.

Spiros Michalakis, the Caltech mathematician in charge of this blog, served as the advisor. The film came out during my PhD; during a meeting of our research group, Spiros advised me to watch the movie. There was something in it “for you,” he said. “And you,” he added, turning to Elizabeth. I obeyed, to hear Laurence Fishburne’s character tell Ant-Man that another character had entangled with the Posner molecules in Ant-Man’s brain.2 

John insisted on calling our research Project Ant-Man.

John and Adam study Bell tests. Bell test sounds like a means of checking whether the collar worn by your cat still jingles. But the test owes its name to John Stewart Bell, a Northern Irish physicist who wrote a groundbreaking paper in 1964.

Say you’d like to check whether two particles share entanglement. You can run an experiment, described by Bell, on them. The experiment ends with a measurement of the particles. You repeat this experiment in many trials, using identical copies of the particles in subsequent trials. You accumulate many measurement outcomes, whose statistics you calculate. You plug those statistics into a formula concocted by Bell. If the result exceeds some number that Bell calculated, the particles shared entanglement.

We needed a variation on Bell’s test. In our experiment, every trial would involve hordes of particles. The experimentalists—large, clumsy, classical beings that they are—couldn’t measure the particles individually. The experimentalists could record only aggregate properties, such as the intensity of the phosphorescence emitted by a test tube.

Adam, MIT physicist Aram Harrow, and I concocted such a Bell test, with help from John. Physical Review A published our paper this month—as a Letter and an Editor’s Suggestion, I’m delighted to report.

For experts: The trick was to make the Bell correlation function nonlinear in the state. We assumed that the particles shared mostly pairwise correlations, though our Bell inequality can accommodate small aberrations. Alas, no one can guarantee that particles share only mostly pairwise correlations. Violating our Bell inequality therefore doesn’t rule out hidden-variables theories. Under reasonable assumptions, though, a not-completely-paranoid experimentalist can check for entanglement using our test. 

One can run our macroscopic Bell test on photons, using present-day technology. But we’re more eager to use the test to characterize lesser-known entities. For instance, we sketched an application to Posner molecules. Detecting entanglement in chemical systems will require more thought, as well as many headaches for experimentalists. But our paper broaches the cask—which I hope to see flow in the next Ant-Man film. Due to debut in 2022, the movie has the subtitle Quantumania. Sounds almost as crazy as studying the possibility that quantum phenomena affect cognition.

1Of those options, I’ve undertaken only the last.

2In case of any confusion: We don’t know that anyone’s brain contains Posner molecules. The movie features speculative fiction.

February 28, 2021

Jordan EllenbergOedipaVision

Like a lot of people I’m watching WandaVision, the latest Marvel show. CJ is an MCU fanatic and this show, well-acted, imaginatively shot, and legible without extreme knowledge of Marvel lore, is a good one for us to watch together.

It has settled, on the surface, into being a more “normal” MCU show after doing a lot of really interesting stuff in the first half of the season. But weirdness remains, under the surface. For example (and now the rest of this is spoilers) — the scene where Wanda magically blasts a new rendition of her dead husband Vision out of her own abdomen is clearly shot as a childbirth scene, which makes Vision both her son and her husband, so the whole thing has suddenly taken on a Freudian cast which I don’t think is from the comics. And this explains the shock of the old expert witch Agatha Harkness, who tells Wanda she’s something that isn’t supposed to exist; she is “chaos magic,” a witch with the power to spontaneously create. Witches, traditionally, are supposed to be infertile, but Wanda is not. (This is complicated, I guess, by the fact that Harkness herself apparently has a son in comics continuity but she’s presented as married and childless here.)

Isn’t the Mind Stone placed in the middle of Vision’s forehead a little like a third eye? And isn’t death by getting that eye ripped out kind of Vision’s thing?

I know, I know, sometimes a synthezoid is just a synthezoid.

February 27, 2021

Terence Tao246B, Notes 4: The Riemann zeta function and the prime number theorem

Previous set of notes: Notes 3. Next set of notes: 246C Notes 1.

One of the great classical triumphs of complex analysis was in providing the first complete proof (by Hadamard and de la Vallée Poussin in 1896) of arguably the most important theorem in analytic number theory, the prime number theorem:

Theorem 1 (Prime number theorem) Let {\pi(x)} denote the number of primes less than a given real number {x}. Then

\displaystyle  \lim_{x \rightarrow \infty} \frac{\pi(x)}{x/\ln x} = 1

(or in asymptotic notation, {\pi(x) = (1+o(1)) \frac{x}{\ln x}} as {x \rightarrow \infty}).

(Actually, it turns out to be slightly more natural to replace the approximation {\frac{x}{\ln x}} in the prime number theorem by the logarithmic integral {\int_2^x \frac{dt}{\ln t}}, which turns out to be a more precise approximation, but we will not stress this point here.)
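To get a feel for Theorem 1 and for the remark about the logarithmic integral, one can count primes directly. The sketch below is a rough numerical illustration with names chosen here: it uses a sieve of Eratosthenes for {\pi(x)} and a simple midpoint rule for {\int_2^x \frac{dt}{\ln t}}, which is accurate enough for this comparison but is not a careful quadrature.

```python
from math import log

def prime_count(x):
    """pi(x) for an integer x >= 2, by the sieve of Eratosthenes."""
    sieve = [True] * (x + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(x ** 0.5) + 1):
        if sieve[p]:
            for multiple in range(p * p, x + 1, p):
                sieve[multiple] = False
    return sum(sieve)

def log_integral(x, steps=200_000):
    """Midpoint-rule approximation to the integral of dt / ln t from 2 to x."""
    h = (x - 2) / steps
    return h * sum(1 / log(2 + (i + 0.5) * h) for i in range(steps))

x = 10 ** 5
count = prime_count(x)
simple_estimate = x / log(x)
integral_estimate = log_integral(x)

print(count, simple_estimate, integral_estimate)
print(count / simple_estimate)
```

At {x = 10^5} the ratio {\pi(x) / (x/\ln x)} is still noticeably larger than {1}, reflecting the slow convergence in the prime number theorem, while the logarithmic integral is much closer to {\pi(x)}.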

The complex-analytic proof of this theorem hinges on the study of a key meromorphic function related to the prime numbers, the Riemann zeta function {\zeta}. Initially, it is only defined on the half-plane {\{ s \in {\bf C}: \mathrm{Re} s > 1 \}}:

Definition 2 (Riemann zeta function, preliminary definition) Let {s \in {\bf C}} be such that {\mathrm{Re} s > 1}. Then we define

\displaystyle  \zeta(s) := \sum_{n=1}^\infty \frac{1}{n^s}. \ \ \ \ \ (1)

Note that the series is locally uniformly convergent in the half-plane {\{ s \in {\bf C}: \mathrm{Re} s > 1 \}}, so in particular {\zeta} is holomorphic on this region. In previous notes we have already evaluated some special values of this function:

\displaystyle  \zeta(2) = \frac{\pi^2}{6}; \quad \zeta(4) = \frac{\pi^4}{90}; \quad \zeta(6) = \frac{\pi^6}{945}. \ \ \ \ \ (2)

However, it turns out that the zeroes (and pole) of this function are of far greater importance to analytic number theory, particularly with regards to the study of the prime numbers.

The Riemann zeta function has several remarkable properties, some of which we summarise here:

Theorem 3 (Basic properties of the Riemann zeta function)
  • (i) (Euler product formula) For any {s \in {\bf C}} with {\mathrm{Re} s > 1}, we have

    \displaystyle  \zeta(s) = \prod_p (1 - \frac{1}{p^s})^{-1} \ \ \ \ \ (3)

    where the product is absolutely convergent (and locally uniform in {s}) and is over the prime numbers {p = 2, 3, 5, \dots}.
  • (ii) (Trivial zero-free region) {\zeta(s)} has no zeroes in the region {\{s: \mathrm{Re}(s) > 1 \}}.
  • (iii) (Meromorphic continuation) {\zeta} has a unique meromorphic continuation to the complex plane (which by abuse of notation we also call {\zeta}), with a simple pole at {s=1} and no other poles. Furthermore, the Riemann xi function

    \displaystyle  \xi(s) := \frac{1}{2} s(s-1) \pi^{-s/2} \Gamma(s/2) \zeta(s) \ \ \ \ \ (4)

    is an entire function of order {1} (after removing all singularities). The function {(s-1) \zeta(s)} is an entire function of order one after removing the singularity at {s=1}.
  • (iv) (Functional equation) After applying the meromorphic continuation from (iii), we have

    \displaystyle  \zeta(s) = 2^s \pi^{s-1} \sin(\frac{\pi s}{2}) \Gamma(1-s) \zeta(1-s) \ \ \ \ \ (5)

    for all {s \in {\bf C}} (excluding poles). Equivalently, we have

    \displaystyle  \xi(s) = \xi(1-s) \ \ \ \ \ (6)

    for all {s \in {\bf C}}. (The equivalence between (5) and (6) is a routine consequence of the Euler reflection formula and the Legendre duplication formula, see Exercises 26 and 31 of Notes 1.)

Proof: We just prove (i) and (ii) for now, leaving (iii) and (iv) for later sections.

The claim (i) is an encoding of the fundamental theorem of arithmetic, which asserts that every natural number {n} is uniquely representable as a product {n = \prod_p p^{a_p}} over primes, where the {a_p} are natural numbers, all but finitely many of which are zero. Writing this representation as {\frac{1}{n^s} = \prod_p \frac{1}{p^{a_p s}}}, we see that

\displaystyle  \sum_{n \in S_{x,m}} \frac{1}{n^s} = \prod_{p \leq x} \sum_{a=0}^m \frac{1}{p^{as}}

whenever {x \geq 1}, {m \geq 0}, and {S_{x,m}} consists of all the natural numbers of the form {n = \prod_{p \leq x} p^{a_p}} for some {a_p \leq m}. Sending {m} and {x} to infinity, we conclude from monotone convergence and the geometric series formula that

\displaystyle  \sum_{n=1}^\infty \frac{1}{n^s} = \prod_{p} \sum_{a=0}^\infty \frac{1}{p^{as}} =\prod_p (1 - \frac{1}{p^s})^{-1}

whenever {s>1} is real, and then from dominated convergence we see that the same formula holds for complex {s} with {\mathrm{Re} s > 1} as well. Local uniform convergence then follows from the product form of the Weierstrass {M}-test (Exercise 19 of Notes 1).

The claim (ii) is immediate from (i) since the Euler product {\prod_p (1-\frac{1}{p^s})^{-1}} is absolutely convergent and all terms are non-zero. \Box
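As a quick numerical sanity check of (3) (a sketch only, not part of the proof), one can compare a truncated Dirichlet series with a truncated Euler product at {s=2}, where both should be close to {\pi^2/6} by (2). The helper names and truncation points below are illustrative choices.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: return all primes p <= n (for n >= 1)."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def zeta_partial_sum(s, N):
    """Truncation of the Dirichlet series (1) at n <= N."""
    return sum(n ** (-s) for n in range(1, N + 1))

def euler_partial_product(s, x):
    """Truncation of the Euler product (3) at primes p <= x."""
    prod = 1.0
    for p in primes_up_to(x):
        prod *= 1.0 / (1.0 - p ** (-s))
    return prod

if __name__ == "__main__":
    s = 2.0
    print("partial sum    :", zeta_partial_sum(s, 100000))
    print("partial product:", euler_partial_product(s, 100000))
    print("pi^2 / 6       :", math.pi ** 2 / 6)
```

Both truncations approach {\zeta(2)} from below, since every omitted term of the series is positive and every omitted factor of the product exceeds {1}.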

We remark that by sending {s} to {1} in Theorem 3(i) we conclude that

\displaystyle  \sum_{n=1}^\infty \frac{1}{n} = \prod_p (1-\frac{1}{p})^{-1}

and from the divergence of the harmonic series we then conclude Euler’s theorem {\sum_p \frac{1}{p} = \infty}. This can be viewed as a weak version of the prime number theorem, and already illustrates the potential applicability of the Riemann zeta function to control the distribution of the prime numbers.

The meromorphic continuation (iii) of the zeta function is initially surprising, but can be interpreted either as a manifestation of the extremely regular spacing of the natural numbers {n} occurring in the sum (1), or as a consequence of various integral representations of {\zeta} (or slight modifications thereof). We will focus in this set of notes on a particular representation of {\zeta} as essentially the Mellin transform of the theta function {\theta} that briefly appeared in previous notes, and the functional equation (iv) can then be viewed as a consequence of the modularity of that theta function. This in turn was established using the Poisson summation formula, so one can view the functional equation as ultimately being a manifestation of Poisson summation. (For a direct proof of the functional equation via Poisson summation, see these notes.)

Henceforth we work with the meromorphic continuation of {\zeta}. The functional equation (iv), when combined with special values of {\zeta} such as (2), gives some additional values of {\zeta} outside of its initial domain {\{s: \mathrm{Re} s > 1\}}, most famously

\displaystyle  \zeta(-1) = -\frac{1}{12}.

If one formally compares this formula with (1), one arrives at the infamous identity

\displaystyle  1 + 2 + 3 + \dots = -\frac{1}{12}

although this identity has to be interpreted in a suitable non-classical sense in order for it to be rigorous (see this previous blog post for further discussion).

From Theorem 3 and the non-vanishing nature of {\Gamma}, we see that {\zeta} has simple zeroes (known as trivial zeroes) at the negative even integers {-2, -4, \dots}, and all other zeroes (the non-trivial zeroes) inside the critical strip {\{ s \in {\bf C}: 0 \leq \mathrm{Re} s \leq 1 \}}. (The non-trivial zeroes are conjectured to all be simple, but this is hopelessly far from being proven at present.) As we shall see shortly, these latter zeroes turn out to be closely related to the distribution of the primes. The functional equation tells us that if {\rho} is a non-trivial zero then so is {1-\rho}; also, we have the identity

\displaystyle  \zeta(s) = \overline{\zeta(\overline{s})} \ \ \ \ \ (7)

for all {s>1} by (1), hence for all {s} (except the pole at {s=1}) by meromorphic continuation. Thus if {\rho} is a non-trivial zero then so is {\overline{\rho}}. We conclude that the set of non-trivial zeroes is symmetric under reflection across both the real axis and the critical line {\{ s \in {\bf C}: \mathrm{Re} s = \frac{1}{2} \}}. We have the following infamous conjecture:

Conjecture 4 (Riemann hypothesis) All the non-trivial zeroes of {\zeta} lie on the critical line {\{ s \in {\bf C}: \mathrm{Re} s = \frac{1}{2} \}}.

This conjecture would have many implications in analytic number theory, particularly with regard to the distribution of the primes. Of course, it is far from proven at present, but the partial results we have towards this conjecture are still sufficient to establish results such as the prime number theorem.

Return now to the original region where {\mathrm{Re} s > 1}. To take more advantage of the Euler product formula (3), we take complex logarithms to conclude that

\displaystyle  -\log \zeta(s) = \sum_p \log(1 - \frac{1}{p^s})

for suitable branches of the complex logarithm, and then on taking derivatives (using for instance the generalised Cauchy integral formula and Fubini’s theorem to justify the interchange of summation and derivative) we see that

\displaystyle  -\frac{\zeta'(s)}{\zeta(s)} = \sum_p \frac{\ln p/p^s}{1 - \frac{1}{p^s}}.

From the geometric series formula we have

\displaystyle  \frac{\ln p/p^s}{1 - \frac{1}{p^s}} = \sum_{j=1}^\infty \frac{\ln p}{p^{js}}

and so (by another application of Fubini’s theorem) we have the identity

\displaystyle  -\frac{\zeta'(s)}{\zeta(s)} = \sum_{n=1}^\infty \frac{\Lambda(n)}{n^s}, \ \ \ \ \ (8)

for {\mathrm{Re} s > 1}, where the von Mangoldt function {\Lambda(n)} is defined to equal {\Lambda(n) = \ln p} whenever {n = p^j} is a power {p^j} of a prime {p} for some {j=1,2,\dots}, and {\Lambda(n)=0} otherwise. The contribution of the higher prime powers {p^2, p^3, \dots} is negligible in practice, and as a first approximation one can think of the von Mangoldt function as the indicator function of the primes, weighted by the logarithm function.

The series {\sum_{n=1}^\infty \frac{1}{n^s}} and {\sum_{n=1}^\infty \frac{\Lambda(n)}{n^s}} that show up in the above formulae are examples of Dirichlet series, which are a convenient device to transform various sequences of arithmetic interest into holomorphic or meromorphic functions. Here are some more examples:

Exercise 5 (Standard Dirichlet series) Let {s} be a complex number with {\mathrm{Re} s > 1}.
  • (i) Show that {-\zeta'(s) = \sum_{n=1}^\infty \frac{\ln n}{n^s}}.
  • (ii) Show that {\zeta^2(s) = \sum_{n=1}^\infty \frac{\tau(n)}{n^s}}, where {\tau(n) := \sum_{d|n} 1} is the divisor function of {n} (the number of divisors of {n}).
  • (iii) Show that {\frac{1}{\zeta(s)} = \sum_{n=1}^\infty \frac{\mu(n)}{n^s}}, where {\mu(n)} is the Möbius function, defined to equal {(-1)^k} when {n} is the product of {k} distinct primes for some {k \geq 0}, and {0} otherwise.
  • (iv) Show that {\frac{\zeta(2s)}{\zeta(s)} = \sum_{n=1}^\infty \frac{\lambda(n)}{n^s}}, where {\lambda(n)} is the Liouville function, defined to equal {(-1)^k} when {n} is the product of {k} (not necessarily distinct) primes for some {k \geq 0}.
  • (v) Show that {\log \zeta(s) = \sum_{n=1}^\infty \frac{\Lambda(n)/\ln n}{n^s}}, where {\log \zeta} is the holomorphic branch of the logarithm that is real for {s>1}, and with the convention that {\Lambda(n)/\ln n} vanishes for {n=1}.
  • (vi) Use the fundamental theorem of arithmetic to show that the von Mangoldt function is the unique function {\Lambda: {\bf N} \rightarrow {\bf R}} such that

    \displaystyle  \ln n = \sum_{d|n} \Lambda(d)

    for every positive integer {n}. Use this and (i) to provide an alternate proof of the identity (8). Thus we see that (8) is really just another encoding of the fundamental theorem of arithmetic.
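The arithmetic identities behind parts (iii) and (vi) can be checked by brute force for small {n}. The sketch below (helper names are illustrative) verifies {\ln n = \sum_{d|n} \Lambda(d)} and the relation {\sum_{d|n} \mu(d) = 1_{n=1}}, which is the coefficient-level form of {\zeta(s) \cdot \frac{1}{\zeta(s)} = 1}.

```python
import math

def von_mangoldt(n):
    """Lambda(n) = ln p if n = p^j for a prime p and some j >= 1, else 0."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            # n was a prime power exactly when nothing is left over
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)  # no factor found up to sqrt(n), so n is prime

def mobius(n):
    """mu(n) = (-1)^k if n is a product of k distinct primes, else 0."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # p^2 divides the original n
            result = -result
        p += 1
    if n > 1:
        result = -result  # one remaining prime factor
    return result

def divisors(n):
    """All positive divisors of n (brute force)."""
    return [d for d in range(1, n + 1) if n % d == 0]

if __name__ == "__main__":
    for n in (1, 12, 30, 64, 97):
        lhs = sum(von_mangoldt(d) for d in divisors(n))
        print(n, lhs, math.log(n), sum(mobius(d) for d in divisors(n)))
```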

Given the appearance of the von Mangoldt function {\Lambda}, it is natural to reformulate the prime number theorem in terms of this function:

Theorem 6 (Prime number theorem, von Mangoldt form) One has

\displaystyle  \lim_{x \rightarrow \infty} \frac{1}{x} \sum_{n \leq x} \Lambda(n) = 1

(or in asymptotic notation, {\sum_{n\leq x} \Lambda(n) = x + o(x)} as {x \rightarrow \infty}).

Let us see how Theorem 6 implies Theorem 1. Firstly, for any {x \geq 2}, we can write

\displaystyle  \sum_{n \leq x} \Lambda(n) = \sum_{p \leq x} \ln p + \sum_{j=2}^\infty \sum_{p \leq x^{1/j}} \ln p.

The sum {\sum_{p \leq x^{1/j}} \ln p} is non-zero for only {O(\ln x)} values of {j}, and is of size {O( x^{1/2} \ln x )}, thus

\displaystyle  \sum_{n \leq x} \Lambda(n) = \sum_{p \leq x} \ln p + O( x^{1/2} \ln^2 x ).

Since {x^{1/2} \ln^2 x = o(x)}, we conclude from Theorem 6 that

\displaystyle  \sum_{p \leq x} \ln p = x + o(x)

as {x \rightarrow \infty}. Next, observe from the fundamental theorem of calculus that

\displaystyle  \frac{1}{\ln p} - \frac{1}{\ln x} = \int_p^x \frac{1}{\ln^2 y} \frac{dy}{y}.

Multiplying by {\ln p} and summing over all primes {p \leq x}, we conclude that

\displaystyle  \pi(x) - \frac{\sum_{p \leq x} \ln p}{\ln x} = \int_2^x \sum_{p \leq y} \ln p \frac{1}{\ln^2 y} \frac{dy}{y}.

From Theorem 6 we certainly have {\sum_{p \leq y} \ln p = O(y)}, thus

\displaystyle  \pi(x) - \frac{x + o(x)}{\ln x} = O( \int_2^x \frac{dy}{\ln^2 y} ).

By splitting the integral into the ranges {2 \leq y \leq \sqrt{x}} and {\sqrt{x} < y \leq x} we see that the right-hand side is {o(x/\ln x)}, and Theorem 1 follows.
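To get a feel for the two normalisations, one can tabulate {\frac{1}{x} \sum_{n \leq x} \Lambda(n)} and {\pi(x) \ln x / x} for moderate {x}. This is only a numerical illustration under the assumption that a simple sieve suffices at this scale; it proves nothing about the limit. The function names are illustrative.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: return all primes p <= n (for n >= 1)."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def chebyshev_psi(x):
    """sum_{n <= x} Lambda(n): each prime p contributes ln p once per power p^j <= x."""
    total = 0.0
    for p in primes_up_to(int(x)):
        pk = p
        while pk <= x:
            total += math.log(p)
            pk *= p
    return total

def prime_count(x):
    """pi(x), the number of primes p <= x."""
    return len(primes_up_to(int(x)))

if __name__ == "__main__":
    for x in (10 ** 3, 10 ** 4, 10 ** 5):
        print(x, chebyshev_psi(x) / x, prime_count(x) * math.log(x) / x)
```

In such a table the von Mangoldt average should sit noticeably closer to {1} than {\pi(x) \ln x / x} does, which is one practical reason to prefer the form in Theorem 6.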

Exercise 7 Show that Theorem 1 conversely implies Theorem 6.

The alternate form (8) of the Euler product identity connects the primes (represented here via proxy by the von Mangoldt function) with the logarithmic derivative of the zeta function, and can be used as a starting point for describing further relationships between {\zeta} and the primes. Most famously, we shall see later in these notes that it leads to the remarkably precise Riemann-von Mangoldt explicit formula:

Theorem 8 (Riemann-von Mangoldt explicit formula) For any non-integer {x > 1}, we have

\displaystyle  \sum_{n \leq x} \Lambda(n) = x - \lim_{T \rightarrow \infty} \sum_{\rho: |\hbox{Im}(\rho)| \leq T} \frac{x^\rho}{\rho} - \ln(2\pi) - \frac{1}{2} \ln( 1 - x^{-2} )

where {\rho} ranges over the non-trivial zeroes of {\zeta} with imaginary part in {[-T,T]}. Furthermore, the convergence of the limit is locally uniform in {x}.

Actually, it turns out that this formula is in some sense too precise; in applications it is often more convenient to work with smoothed variants of this formula in which the sum on the left-hand side is smoothed out, but the contribution of zeroes with large imaginary part is damped; see Exercise 22. Nevertheless, this formula clearly illustrates how the non-trivial zeroes {\rho} of the zeta function influence the primes. Indeed, if one formally differentiates the above formula in {x}, one is led to the (quite nonrigorous) approximation

\displaystyle  \Lambda(n) \approx 1 - \sum_\rho n^{\rho-1} \ \ \ \ \ (9)

or (writing {\rho = \sigma+i\gamma})

\displaystyle  \Lambda(n) \approx 1 - \sum_{\sigma+i\gamma} \frac{n^{i\gamma}}{n^{1-\sigma}}.

Thus we see that each zero {\rho = \sigma + i\gamma} induces an oscillation in the von Mangoldt function, with {\gamma} controlling the frequency of the oscillation and {\sigma} the rate at which the oscillation dies out as {n \rightarrow \infty}. This relationship is sometimes known informally as “the music of the primes”.
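One can watch these oscillations numerically by truncating the sum over zeroes in Theorem 8. The sketch below is a rough illustration only: it hard-codes rounded imaginary parts of the first ten zeroes (standard tabulated values, taken here as given input) and assumes they lie on the critical line, which is known numerically for these low zeroes. With so few zeroes the agreement with the true sum is only approximate.

```python
import cmath
import math

# Rounded imaginary parts of the first ten non-trivial zeroes 1/2 + i*gamma
# (tabulated values, assumed as input rather than computed here).
ZERO_HEIGHTS = [14.134725, 21.022040, 25.010858, 30.424876, 32.935062,
                37.586178, 40.918719, 43.327073, 48.005151, 49.773832]

def psi_exact(x):
    """sum_{n <= x} Lambda(n), by brute force over prime powers."""
    total = 0.0
    for p in range(2, int(x) + 1):
        if all(p % q for q in range(2, int(p ** 0.5) + 1)):  # p is prime
            pk = p
            while pk <= x:
                total += math.log(p)
                pk *= p
    return total

def psi_truncated(x, heights=ZERO_HEIGHTS):
    """Right-hand side of the explicit formula, keeping only the listed zeroes."""
    total = x - math.log(2 * math.pi) - 0.5 * math.log(1 - x ** (-2))
    for gamma in heights:
        rho = complex(0.5, gamma)
        # rho and its conjugate together contribute 2 Re(x^rho / rho)
        total -= 2 * (cmath.exp(rho * math.log(x)) / rho).real
    return total

if __name__ == "__main__":
    for x in (10.5, 20.5, 30.5, 40.5):
        print(x, psi_exact(x), psi_truncated(x))
```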

Comparing Theorem 8 with Theorem 6, it is natural to suspect that the key step in the proof of the latter is to establish the following slight but important extension of Theorem 3(ii), which can be viewed as a very small step towards the Riemann hypothesis:

Theorem 9 (Slight enlargement of zero-free region) There are no zeroes of {\zeta} on the line {\{ 1+it: t \in {\bf R} \}}.

It is not quite immediate to see how Theorem 6 follows from Theorem 8 and Theorem 9, but we will demonstrate it below the fold.

Although Theorem 9 only seems like a slight improvement of Theorem 3(ii), proving it is surprisingly non-trivial. The basic idea is the following: if there was a zero at {1+it}, then there would also be a different zero at {1-it} (note {t} cannot vanish due to the pole at {s=1}), and then the approximation (9) becomes

\displaystyle  \Lambda(n) \approx 1 - n^{it} - n^{-it} + \dots = 1 - 2 \cos(t \log n) + \dots.

But the expression {1 - 2 \cos(t \log n)} can be negative for large regions of the variable {n}, whereas {\Lambda(n)} is always non-negative. This conflict eventually leads to a contradiction, but it is not immediately obvious how to make this argument rigorous. We will present here the classical approach to doing so using a trigonometric identity of Mertens.

In fact, Theorem 9 is basically equivalent to the prime number theorem:

Exercise 10 For the purposes of this exercise, assume Theorem 6, but do not assume Theorem 9. For any non-zero real {t}, show that

\displaystyle  -\frac{\zeta'(\sigma+it)}{\zeta(\sigma+it)} = o( \frac{1}{\sigma-1})

as {\sigma \rightarrow 1^+}, where {o( \frac{1}{\sigma-1})} denotes a quantity that goes to zero as {\sigma \rightarrow 1^+} after being multiplied by {\sigma-1}. Use this to derive Theorem 9.

This equivalence can help explain why the prime number theorem is remarkably non-trivial to prove, and why the Riemann zeta function has to be either explicitly or implicitly involved in the proof.

This post is only intended as the briefest of introductions to complex-analytic methods in analytic number theory; also, we have not chosen the shortest route to the prime number theorem, electing instead to travel in directions that particularly showcase the complex-analytic results introduced in this course. For some further discussion see this previous set of lecture notes, particularly Notes 2 and Supplement 3 (with much of the material in this post drawn from the latter).

— 1. Meromorphic continuation and functional equation —

We now focus on understanding the meromorphic continuation of {\zeta}, as well as the functional equation that this continuation satisfies. The arguments here date back to Riemann’s original paper on the zeta function. The general strategy is to relate the zeta function {\zeta(s)} for {\mathrm{Re}(s) > 1} to some sort of integral involving the parameter {s}, which is manipulated in such a way that the integral makes sense for values of {s} outside of the half-plane {\{ s: \mathrm{Re}(s) > 1 \}}, and can thus be used to define the zeta function meromorphically in such a region. Often the Gamma function {\Gamma} is involved in the relationship between the zeta function and the integral. There are many such ways to connect {\zeta} to an integral; we present some of the more classical ones here.

One way to motivate the meromorphic continuation of {\zeta} is to look at the continuous analogue

\displaystyle  \frac{1}{s-1} = \int_1^\infty \frac{1}{t^s}\ dt, \quad \mathrm{Re} s > 1

of (1). This clearly extends meromorphically to the whole complex plane. So one now just has to understand the analytic continuation properties of the residual

\displaystyle  \frac{1}{s-1} - \zeta(s) = \int_1^\infty \frac{1}{t^s}\ dt - \sum_{n=1}^\infty \frac{1}{n^s}, \quad \mathrm{Re} s > 1.

For instance, using the Riemann sum type quadrature

\displaystyle  \int_n^{n+1} \frac{1}{t^s}\ dt = \frac{1}{n^s} + \int_n^{n+1} \frac{1}{t^s} - \frac{1}{n^s}\ dt

one can write this residual as

\displaystyle  \sum_{n=1}^\infty \int_n^{n+1} \frac{1}{t^s} - \frac{1}{n^s}\ dt;

since {\frac{1}{t^s} - \frac{1}{n^s} = O_s( \frac{1}{n^{\mathrm{Re} s+1}} )}, it is a routine application of the Fubini and Morera theorems to establish analytic continuation of the residual to the half-plane {\mathrm{Re} s > 0}, thus giving a meromorphic extension

\displaystyle  \zeta(s) = \frac{1}{s-1} - \sum_{n=1}^\infty \int_n^{n+1} \frac{1}{t^s} - \frac{1}{n^s}\ dt \ \ \ \ \ (10)

of {\zeta} to the region {\{ s: \mathrm{Re}(s) > 0 \}}. Among other things, this shows that (the meromorphic continuation of) {\zeta} has a simple pole at {s=1} with residue {1}.

Exercise 11 Using the trapezoid rule, show that for any {s} in the region {\{ s: \hbox{Re}(s) > -1 \}} with {s \neq 1}, there exists a unique complex number {\zeta(s)} for which one has the asymptotic

\displaystyle  \sum_{n=1}^N \frac{1}{n^s} = \zeta(s) + \frac{N^{1-s}}{1-s} + \frac{1}{2} N^{-s} + O( \frac{|s| |s+1|}{\sigma+1} N^{-s-1} )

for any natural number {N}, where {s=\sigma+it}. Use this to extend the Riemann zeta function meromorphically to the region {\{ s: \hbox{Re}(s) > -1 \}}. Conclude in particular that {\zeta(0)=-\frac{1}{2}} and {\xi(0)=\xi(1)= \frac{1}{2}}.

Exercise 12 Obtain the refinement

\displaystyle  \sum_{y \leq n \leq x} f(n) = \int_y^x f(t)\ dt + \frac{1}{2} f(x) + \frac{1}{2} f(y)

\displaystyle  + \frac{1}{12} (f'(x) - f'(y)) + O( \int_y^x |f'''(t)|\ dt )

to the trapezoid rule when {y < x} are integers and {f: [y,x] \rightarrow {\bf C}} is continuously three times differentiable. Then show that for any {s} in the region {\{ s: \hbox{Re}(s) > -2 \}} with {s \neq 1}, there exists a unique complex number {\zeta(s)} for which one has the asymptotic

\displaystyle  \sum_{n=1}^N \frac{1}{n^s} = \zeta(s) + \frac{N^{1-s}}{1-s} + \frac{1}{2} N^{-s} - \frac{s}{12} N^{-s-1}

\displaystyle  + O( \frac{|s| |s+1| |s+2|}{\sigma+2} N^{-s-2} )

for any natural number {N}, where {s=\sigma+it}. Use this to extend the Riemann zeta function meromorphically to the region {\{ s: \hbox{Re}(s) > -2 \}}. Conclude in particular that {\zeta(-1)=-\frac{1}{12}}.
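The trapezoid-type expansions in Exercises 11 and 12 also give a practical way to evaluate {\zeta} numerically to the left of {\mathrm{Re} s = 1}. The sketch below applies the refined trapezoid rule to {f(t) = t^{-s}} (so that {f'(N) = -s N^{-s-1}}) and solves for {\zeta(s)}, discarding the error term; the function name and the default {N} are illustrative choices.

```python
import math

def zeta_trapezoid(s, N=1000):
    """Approximate zeta(s) for Re(s) > -2, s != 1, from the refined trapezoid rule.

    Uses zeta(s) ~ sum_{n<=N} n^{-s} - N^{1-s}/(1-s) - N^{-s}/2 + (s/12) N^{-s-1},
    where the last term is -(1/12) f'(N) with f(t) = t^{-s}.
    """
    s = complex(s)
    partial = sum(n ** (-s) for n in range(1, N + 1))
    return (partial
            - N ** (1 - s) / (1 - s)
            - 0.5 * N ** (-s)
            + (s / 12) * N ** (-s - 1))

if __name__ == "__main__":
    print("zeta(2)   ~", zeta_trapezoid(2), "vs", math.pi ** 2 / 6)
    print("zeta(0)   ~", zeta_trapezoid(0))
    print("zeta(-1)  ~", zeta_trapezoid(-1))
    print("zeta(1/2) ~", zeta_trapezoid(0.5))
```

At {s=0} and {s=-1} the discarded error term vanishes identically, so the values {-\frac{1}{2}} and {-\frac{1}{12}} are recovered up to floating-point rounding.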

One can keep going in this fashion using the Euler-Maclaurin formula (see this previous blog post) to extend the range of meromorphic continuation to the rest of the complex plane. However, we will now proceed in a different fashion, using the theta function

\displaystyle  \theta(ix) = \sum_{n \in {\bf Z}} e^{-\pi n^2 x} \ \ \ \ \ (11)

that made an appearance in previous notes, and try to transform this function into the zeta function. We will only need this function for imaginary values {ix} of the argument in the upper half-plane (so {x>0}); from Exercise 7 of Notes 2 we have the modularity relation

\displaystyle  \theta(i/x) = x^{1/2} \theta(ix). \ \ \ \ \ (12)

In particular, since {\theta(ix)} decays exponentially to {1} as {x \rightarrow \infty}, {\theta(ix)} blows up like {x^{-1/2}} as {x \rightarrow 0}.
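The modularity relation (12) is easy to test numerically, since the series (11) converges very rapidly for {x} bounded away from zero. A minimal sketch (the truncation parameter is an illustrative choice):

```python
import math

def theta(x, terms=50):
    """theta(ix) = sum over all integers n of exp(-pi n^2 x), for real x > 0.

    The sum is truncated at |n| <= terms; the n and -n terms are paired.
    """
    return 1.0 + 2.0 * sum(math.exp(-math.pi * n * n * x)
                           for n in range(1, terms + 1))

if __name__ == "__main__":
    for x in (0.1, 0.5, 1.0, 2.0, 7.0):
        # The two columns should agree: theta(i/x) and x^{1/2} theta(ix)
        print(x, theta(1.0 / x), math.sqrt(x) * theta(x))
```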

We will attempt to apply the Mellin transform (Exercise 11 from Notes 2) to this function; formally, we have

\displaystyle  {\mathcal M}[\theta(i\cdot)](s) := \int_0^\infty x^s \theta(ix) \frac{dx}{x}.

There is however a problem: as {x} goes to infinity, {\theta(ix)} converges to one, and the integral here is unlikely to be convergent. So we will compute the Mellin transform of {\theta(ix)-1}:

\displaystyle  {\mathcal M}(\theta(i\cdot)-1)(s) := \int_0^\infty x^s (\theta(ix)-1) \frac{dx}{x}. \ \ \ \ \ (13)

The function {\theta-1} decays exponentially as {x \rightarrow \infty}, and blows up like {O(x^{-1/2})} as {x \rightarrow 0}, so this integral will be absolutely integrable when {\mathrm{Re} s > 1/2}. Since

\displaystyle  \theta(ix)-1 = 2 \sum_{n=1}^\infty e^{-\pi n^2 x}

we can write

\displaystyle  {\mathcal M}(\theta(i\cdot)-1)(s) = 2 \int_0^\infty \sum_{n=1}^\infty x^s e^{-\pi n^2 x} \frac{dx}{x}.

By the Fubini–Tonelli theorem, the integrand here is absolutely integrable, and hence

\displaystyle  {\mathcal M}(\theta(i\cdot)-1)(s) = 2 \sum_{n=1}^\infty \int_0^\infty x^s e^{-\pi n^2 x} \frac{dx}{x}.

From the Bernoulli definition of the Gamma function (Exercise 29(ii) of Notes 1) and a change of variables we have

\displaystyle  \int_0^\infty x^s e^{-\pi n^2 x} \frac{dx}{x} = \frac{\Gamma(s)}{(\pi n^2)^s}

and hence by (1) we obtain the identity

\displaystyle  {\mathcal M}(\theta(i\cdot)-1)(s) = \frac{2 \Gamma(s) \zeta(2s)}{\pi^s}

whenever {\mathrm{Re}(s) > 1/2}. Replacing {s} by {s/2}, we can rearrange this as a formula for the {\xi} function (4), namely

\displaystyle  \xi(s) = \frac{s(s-1)}{4} {\mathcal M}(\theta(i\cdot)-1)(s/2)

whenever {\mathrm{Re}(s) > 1}.

Now we exploit the modular identity (12) to improve the convergence of this formula. The convergence of {\theta(ix)-1} is much better near {x=\infty} than near {x=0}, so we use (13) to split

\displaystyle  \xi(s) = \frac{s(s-1)}{4} ( \int_0^1 x^{s/2} (\theta(ix)-1) \frac{dx}{x} + \int_1^\infty x^{s/2} (\theta(ix)-1) \frac{dx}{x} )

and then transform the first integral using the change of variables {x \mapsto 1/x} to obtain

\displaystyle  \xi(s) = \frac{s(s-1)}{4} ( \int_1^\infty x^{-s/2} (\theta(i/x)-1) \frac{dx}{x} + \int_1^\infty x^{s/2} (\theta(ix)-1) \frac{dx}{x} ).

Using (12) we can write this as

\displaystyle  \xi(s) = \frac{s(s-1)}{4} \int_1^\infty (x^{(1-s)/2} + x^{s/2}) (\theta(ix)-1) + (x^{(1-s)/2}-x^{- s/2})\frac{dx}{x}.

Direct computation shows that

\displaystyle  \int_1^\infty (x^{(1-s)/2}-x^{-s/2})\frac{dx}{x} = \frac{2}{s-1} - \frac{2}{s} = \frac{2}{s(s-1)}

and thus

\displaystyle  \xi(s) = \frac{1}{2} + \frac{s(s-1)}{4} \int_1^\infty (x^{(1-s)/2} + x^{s/2}) (\theta(ix)-1) \frac{dx}{x}

whenever {\mathrm{Re}(s) > 1}. However, the integrand here is holomorphic in {s} and exponentially decaying in {x}, so from the Fubini and Morera theorems we easily see that the right-hand side is an entire function of {s}; also from inspection we see that it is symmetric with respect to the symmetry {s \mapsto 1-s}. Thus we can define {\xi} as an entire function, and hence {\zeta} as a meromorphic function, and one verifies the functional equation (6).
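Because the integrand decays exponentially, the integral representation of {\xi} just obtained can be evaluated by elementary quadrature. The following sketch uses composite Simpson's rule on a truncated interval; the cutoff, step count and function names are illustrative assumptions. It can be compared against the closed forms {\xi(2) = \pi/6} and {\xi(4) = \pi^2/15} that follow from (4) and (2).

```python
import math

def theta_minus_one(x, terms=8):
    """theta(ix) - 1 = 2 * sum_{n>=1} exp(-pi n^2 x), truncated (intended for x >= 1)."""
    return 2.0 * sum(math.exp(-math.pi * n * n * x) for n in range(1, terms + 1))

def xi(s, upper=15.0, steps=4000):
    """xi(s) = 1/2 + s(s-1)/4 * int_1^infty (x^{(1-s)/2} + x^{s/2}) (theta(ix)-1) dx/x.

    The integral is cut off at `upper` and evaluated by composite Simpson's
    rule with an even number of steps.
    """
    s = complex(s)

    def integrand(x):
        return (x ** ((1 - s) / 2) + x ** (s / 2)) * theta_minus_one(x) / x

    h = (upper - 1.0) / steps
    total = integrand(1.0) + integrand(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(1.0 + k * h)
    return 0.5 + s * (s - 1) / 4 * (h / 3) * total

if __name__ == "__main__":
    print("xi(2)      ~", xi(2), "vs pi/6 =", math.pi / 6)
    print("xi(0.3+2i) ~", xi(0.3 + 2j))
    print("xi(0.7-2i) ~", xi(0.7 - 2j))
```

The last two printed values should coincide, reflecting the symmetry {s \mapsto 1-s} that is visible in the integrand.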

It remains to establish that {\xi} is of order {1}. From (11) we have {\theta(ix)-1 = O( e^{-\pi x} )} so from the triangle inequality

\displaystyle  \xi(s) \lesssim 1 + (1+|s|)^2 \int_1^\infty x^{\frac{1+|s|}{2}} e^{-\pi x} \frac{dx}{x}

\displaystyle  \leq 1 + (1+|s|)^2 \pi^{-\frac{1+|s|}{2}} \Gamma( \frac{1+|s|}{2} ).

From the Stirling approximation (Exercise 30(v) from Notes 1) we conclude that

\displaystyle  \xi(s) \lesssim \exp( O( |s| \log |s| ) )

for {|s| \geq 2} (say), and hence {\xi} is of order at most {1} as required. (One can show that {\xi} has order exactly one by inspecting what happens to {\xi(s)} as {s \rightarrow +\infty}, using that {\zeta(s) \rightarrow 1} in this regime.) This completes the proof of Theorem 3.

Exercise 13 (Alternate derivation of meromorphic continuation and functional equation)
  • (i) Establish the identity

    \displaystyle  \Gamma(s) \zeta(s) = \int_0^\infty \frac{t^s}{e^t-1} \frac{dt}{t}

    whenever {\mathrm{Re}(s) > 1}.
  • (ii) Establish the identity

    \displaystyle  \zeta(s) = \frac{1}{\Gamma(s) (1 - e^{2\pi i(s-1)})} \lim_{R \rightarrow \infty} \int_{C_{R,\varepsilon}} \frac{z^s}{e^z-1} \frac{dz}{z} \ \ \ \ \ (14)

    whenever {\mathrm{Re}(s) > 1}, {s} is not an integer, {\varepsilon>0}, {z^s := \exp( s \mathrm{Log}_{[0,2\pi)} z)} where {\mathrm{Log}_{[0,2\pi)}} is the branch of the logarithm with imaginary part in {[0,2\pi)}, and {C_{R,\varepsilon}} is the contour consisting of the line segment {\gamma_{R-i\varepsilon \rightarrow -i\varepsilon}}, the semicircle {\{ \varepsilon e^{i(3\pi/2 - \theta)}: 0 \leq \theta \leq \pi \}}, and the line segment {\gamma_{i\varepsilon \rightarrow R+i\varepsilon}}.
  • (iii) Use (ii) to meromorphically continue {\zeta} to the entire complex plane {{\bf C}}.
  • (iv) By shifting the contour {C_{R,\varepsilon}} to the contour {\tilde C_{R,N} := \gamma_{R - 2\pi i (N+1/2) \rightarrow -1 - 2\pi i(N+1/2) \rightarrow -1 + 2\pi i (N+1/2) \rightarrow R + 2\pi i (N+1/2)}} for a large natural number {N} and applying the residue theorem, show that

    \displaystyle  \lim_{R \rightarrow \infty} \int_{C_{R,\varepsilon}} \frac{z^s}{e^z-1} \frac{dz}{z} = 2\pi i \sum_{n \in {\bf Z} \backslash \{0\}} (2\pi i n)^{s-1}

    again using the branch {\mathrm{Log}_{[0,2\pi)}} of the logarithm to define {(2\pi i n)^{s-1}}.
  • (v) Establish the functional equation (5).

Exercise 14 Use the formula {\zeta(-1)=\frac{-1}{12}} from Exercise 12, together with the functional equation, to give yet another proof of the identity {\zeta(2) = \frac{\pi^2}{6}}.

Exercise 15 (Relation between zeta function and Bernoulli numbers)
  • (i) For any complex number {z} with {\hbox{Re}(z)>0}, use the Poisson summation formula (Proposition 3(v) from Notes 2) to establish the identity

    \displaystyle  \sum_{n \in {\bf Z}} e^{-|n|z} = \sum_{m \in {\bf Z}} \frac{2z}{z^2+(2\pi m)^2}.

  • (ii) For {z} as above and sufficiently small, show that

    \displaystyle  2\sum_k (-1)^{k+1} \zeta(2k) (z/2\pi)^{2k} = \frac{z}{1-e^{-z}} - 1 - \frac{z}{2}.

    Conclude that

    \displaystyle  \zeta(2k) = \frac{(-1)^{k+1} (2\pi)^{2k}}{2 (2k)!} B_{2k}

    for any natural number {k}, where the Bernoulli numbers {B_2, B_4, B_6, \dots} are defined through the Taylor expansion

    \displaystyle  \frac{z}{1-e^{-z}} = 1 + \frac{z}{2} + \sum_k \frac{B_{2k}}{(2k)!} z^{2k}.

    Thus for instance {B_2 = 1/6}, {B_4 = -1/30}, and so forth.
  • (iii) Show that

    \displaystyle  \zeta(-n) = -\frac{B_{n+1}}{n+1} \ \ \ \ \ (15)

    for any odd natural number {n}. (This identity can also be deduced from the Euler-Maclaurin formula, which generalises the approach in Exercise 12; see this previous post.)
  • (iv) Use (14) and the residue theorem (now working inside the contour {\tilde C_{R,N}}, rather than outside) to give an alternate proof of (15).
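The relation between even zeta values and Bernoulli numbers can be checked in exact arithmetic. The sketch below computes the Bernoulli numbers from the standard recurrence {\sum_{j=0}^{n} \binom{n+1}{j} B_j = 0} for {n \geq 1} (which agrees with the Taylor expansion above for all even indices), and then evaluates {\zeta(2k) = \frac{(-1)^{k+1} (2\pi)^{2k}}{2 (2k)!} B_{2k}}. The function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial, pi

def bernoulli_numbers(m):
    """Return [B_0, ..., B_m] as exact fractions.

    Uses sum_{j=0}^{n} C(n+1, j) B_j = 0 for n >= 1, which gives B_1 = -1/2;
    the even-index values do not depend on the sign convention for B_1.
    """
    B = [Fraction(1)]
    for n in range(1, m + 1):
        B.append(-sum(comb(n + 1, j) * B[j] for j in range(n)) / (n + 1))
    return B

def zeta_even(k):
    """zeta(2k) for a natural number k >= 1, via the Bernoulli number B_{2k}."""
    B = bernoulli_numbers(2 * k)
    sign = 1 if k % 2 == 1 else -1  # (-1)^{k+1}
    return sign * (2 * pi) ** (2 * k) * float(B[2 * k]) / (2 * factorial(2 * k))

if __name__ == "__main__":
    B = bernoulli_numbers(6)
    print("B_2, B_4, B_6 =", B[2], B[4], B[6])
    print("zeta(2) ~", zeta_even(1), "vs", pi ** 2 / 6)
    print("zeta(4) ~", zeta_even(2), "vs", pi ** 4 / 90)
    print("zeta(6) ~", zeta_even(3), "vs", pi ** 6 / 945)
```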

Exercise 16 (Convexity bounds)
  • (i) Establish the bounds {\zeta(\sigma+it) \lesssim_\sigma 1} for any {\sigma > 1} and {t \in {\bf R}} with {|t|>1}.
  • (ii) Establish the bounds {\zeta(\sigma+it) \lesssim_\sigma |t|^{1/2-\sigma}} for any {\sigma < 0} and {t \in {\bf R}} with {|t|>1}. (Hint: use the functional equation.)
  • (iii) Establish the bounds {\zeta(\sigma+it) \lesssim_{\sigma,\varepsilon} |t|^{\frac{1-\sigma}{2}+\varepsilon}} for any {0 \leq \sigma \leq 1, \varepsilon > 0} and {t \in {\bf R}} with {|t|>1}. (Hint: use the Phragmén-Lindelöf principle, Exercise 19 from Notes 2, after dealing somehow with the pole at {s=1}.)
It is possible to improve the bounds (iii) in the region {0 < \sigma < 1}; such improvements are known as subconvexity estimates. For instance, it is currently known that {\zeta(\frac{1}{2}+it) \lesssim_\mu (1+|t|)^\mu} for any {\mu > 13/84} and {t \in {\bf R}}, a result of Bourgain; the Lindelöf hypothesis asserts that this bound in fact holds for all {\mu>0}, although this remains unproven (it is however a consequence of the Riemann hypothesis).

Exercise 17 (Riemann-von Mangoldt formula) Show that for any {T \geq 2}, the number of zeroes of {\zeta} in the rectangle {\{ \sigma+it: 0 \leq \sigma \leq 1; 0 \leq t \leq T \}} is equal to {\frac{T}{2\pi} \log \frac{T}{2\pi} - \frac{T}{2\pi} + O( \log T )}. (Hint: apply the argument principle to {\xi} evaluated at a rectangle {\gamma_{2-iT' \rightarrow 2+iT' \rightarrow -1+iT' \rightarrow -1-iT' \rightarrow 2-iT'}} for some {T' = T + O(1)} that is chosen so that the horizontal edges of the rectangle do not come too close to any of the zeroes (cf. the selection of the radii {R_k} in the proof of the Hadamard factorisation theorem in Notes 1), and use the functional equation and Stirling’s formula to control the asymptotics for the horizontal edges.)

We remark that the error term {O(\log T)}, due to von Mangoldt in 1905, has not been significantly improved despite over a century of effort. Even assuming the Riemann hypothesis, the error has only been reduced very slightly to {O(\log T/\log\log T)} (a result of Littlewood from 1924).
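For a rough sense of scale, the main term of the Riemann-von Mangoldt formula can be compared with an actual count of low-lying zeroes. The sketch below hard-codes rounded imaginary parts of the first ten zeroes (tabulated values, taken as given input rather than computed) and compares the count up to height {T=50} with the main term; the discrepancy is absorbed by the {O(\log T)} error.

```python
import math

# Rounded imaginary parts of the first ten non-trivial zeroes in the upper
# half-plane (tabulated values, assumed as input).
ZERO_HEIGHTS = [14.134725, 21.022040, 25.010858, 30.424876, 32.935062,
                37.586178, 40.918719, 43.327073, 48.005151, 49.773832]

def zero_count_main_term(T):
    """Main term (T / 2 pi) log(T / 2 pi) - T / 2 pi of the zero-counting formula."""
    u = T / (2 * math.pi)
    return u * math.log(u) - u

def zero_count_from_table(T, heights=ZERO_HEIGHTS):
    """Number of tabulated zeroes with imaginary part in [0, T] (only valid for T <= 50)."""
    return sum(1 for g in heights if 0 <= g <= T)

if __name__ == "__main__":
    for T in (20, 30, 40, 50):
        print(T, zero_count_from_table(T), zero_count_main_term(T))
```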

Remark 18 Thanks to the functional equation and Rouche’s theorem, it is possible to numerically verify the Riemann hypothesis in any finite portion {\{ \sigma+it: 0 \leq \sigma \leq 1; |t| \leq T \}} of the critical strip, so long as the zeroes in that strip are all simple. Indeed, if there was a zero {\sigma+it} off of the critical line {\sigma \neq 1/2}, then an application of the argument principle (and Rouche’s theorem) in some small contour around {\sigma+it} but avoiding the critical line would be capable of numerically determining that there was a zero off of the line. Similarly, for each simple zero {\frac{1}{2}+it} on the critical line, applying the argument principle for some small contour around that zero and symmetric around the critical line would numerically verify that there was exactly one zero within that contour, which by the functional equation would then have to lie exactly on that line. (In practice, more efficient methods are used to numerically verify the Riemann hypothesis over large finite portions of the strip, but we will not detail them here.)

— 2. The explicit formula —

We now prove the Riemann-von Mangoldt explicit formula. Since {\xi} is a non-trivial entire function of order {1}, with zeroes at the non-trivial zeroes of {\zeta} (the trivial zeroes having been cancelled out by the Gamma function), we see from the Hadamard factorisation theorem (in the form of Exercise 35 from Notes 1) that

\displaystyle  \frac{\xi'(s)}{\xi(s)} = B + \sum_\rho (\frac{1}{s-\rho} + \frac{1}{\rho})

away from the zeroes of {\xi}, where {\rho} ranges over the non-trivial zeroes of {\zeta} (note from Exercise 11 that there is no zero at the origin), and {B} is some constant. From (4) we can calculate

\displaystyle \frac{\xi'(s)}{\xi(s)} = \frac{\zeta'(s)}{\zeta(s)} + \frac{1}{s} + \frac{1}{s-1} - \frac{1}{2} \ln \pi + \frac{1}{2} \frac{\Gamma'(s/2)}{\Gamma(s/2)}

while from Exercise 27 of Notes 1 we have

\displaystyle  \frac{1}{2} \frac{\Gamma'(s/2)}{\Gamma(s/2)} = -\frac{1}{2} \gamma + \sum_{k=0,-2,-4,\dots} \frac{1}{2-k} - \frac{1}{s-k}

and thus (after some rearranging)

\displaystyle  -\frac{\zeta'}{\zeta}(s) = B' + \frac{1}{s-1} - \sum_\rho (\frac{1}{s-\rho} + \frac{1}{\rho}) - \sum_{n=1}^\infty (\frac{1}{s+2n} - \frac{1}{2n}) \ \ \ \ \ (16)


where

\displaystyle  B' = -B - \frac{1}{2} \log \pi - \frac{1}{2} \gamma.

One can compute the values of {B,B'} explicitly:

Exercise 19 By inspecting both sides of (16) as {s \rightarrow 0}, show that {B' = - \log 2\pi + 1}, and hence {B = \log 2 + \frac{1}{2} \log \pi - 1 - \frac{1}{2} \gamma}.

Jensen’s formula tells us that the number of non-trivial zeroes of {\zeta} in a disk {D(0,R)} is at most {O_\varepsilon(R^{1+\varepsilon})} for any {\varepsilon>0} and {R>1}. One can obtain a local version:

Exercise 20 (Local bound on zeroes)
  • (i) Establish the upper bound {|\zeta(\sigma+it)| \lesssim |t|^{O(1)}} whenever {\frac{1}{4} \leq \sigma \lesssim 1} and {t \in {\bf R}} with {|t| \geq 1}. (Hint: use (10). More precise bounds are available with more effort, but will not be needed here.)
  • (ii) Establish the bounds {|\zeta(2+it)| \sim 1} uniformly in {t}. (Hint: use the Euler product.)
  • (iii) Show that for any {T \geq 2}, the number of non-trivial zeroes with imaginary part in {[T,T+1]} is {O(\log T)}. (Hint: use Jensen’s formula and the functional equation.)
  • (iv) For {T \geq 2}, {|\mathrm{Re} s| \leq 10}, and {|\mathrm{Im} s| \in [T,T+1]}, with {s} not a zero of {\zeta}, show that

    \displaystyle  \frac{\zeta'(s)}{\zeta(s)} = \sum_{\rho: |\rho-s| \leq 1} \frac{1}{s-\rho} + O(\log T).

    (Hint: use Exercise 9 of Notes 1.)

Meanwhile, from Perron’s formula (Exercise 12 of Notes 2) and (8) we see that for any non-integer {x>1}, we have

\displaystyle  \sum_{n \leq x} \Lambda(n) = \frac{1}{2\pi i} \lim_{T \rightarrow \infty} \int_{\gamma_{2-iT \rightarrow 2+iT}} (B' + \frac{1}{s-1} - \sum_\rho (\frac{1}{s-\rho} + \frac{1}{\rho})

\displaystyle - \sum_{n=1}^\infty (\frac{1}{s+2n} - \frac{1}{2n})) \frac{x^s}{s}\ ds.

We can compute individual terms here and then conclude the Riemann-von Mangoldt explicit formula:

Exercise 21 (Riemann-von Mangoldt explicit formula) Let {T \geq 2} and {x > 1}. Establish the following bounds:
  • (i) {\frac{1}{2\pi i} \int_{\gamma_{2-iT \rightarrow 2+iT}} B' \frac{x^s}{s}\ ds = B' + O_x( 1/T )}.
  • (ii) {\frac{1}{2\pi i} \int_{\gamma_{2-iT \rightarrow 2+iT}} \frac{1}{s-1} \frac{x^s}{s}\ ds = x - 1 + O_x( 1/T )}.
  • (iii) For any positive integer {n}, we have

    \displaystyle  \frac{1}{2\pi i} \int_{\gamma_{2-iT \rightarrow 2+iT}} (\frac{1}{s+2n} - \frac{1}{2n}) \frac{x^s}{s}\ ds

    \displaystyle = -\frac{x^{-2n}}{2n} + O_x( \frac{1}{n(n+T)} ).

  • (iv) For any non-trivial zero {\rho}, we have

    \displaystyle  \frac{1}{2\pi i} \int_{\gamma_{2-iT \rightarrow 2+iT}} (\frac{1}{s-\rho} + \frac{1}{\rho}) \frac{x^s}{s}\ ds

    \displaystyle = 1_{|\mathrm{Im}(\rho)| \leq T} \frac{x^{\rho}}{\rho} + O_x( \frac{1}{(1+||\mathrm{Im}(\rho)| - T|) (|\mathrm{Im}(\rho)|+T)} ).

  • (v) We have {\lim_{T \rightarrow \infty} \sum_{n=1}^\infty \frac{1}{n(n+T)} = 0}.
  • (vi) We have {\lim_{T \rightarrow \infty} \sum_\rho \frac{1}{(1+||\mathrm{Im}(\rho)| - T|) (|\mathrm{Im}(\rho)|+T)} = 0}.
(Hint: for (i)-(iii), shift the contour {\gamma_{2-iT \rightarrow 2+iT}} to {\gamma_{2-iT \rightarrow -R-iT \rightarrow -R+iT \rightarrow 2+iT}} for an {R} that gets sent to infinity, and use the residue theorem. The same argument works for (iv) except when {\rho} is really close to {2+iT}, in which case a detour to the contour may be called for. For (vi), use Exercise 20 and partition the zeroes depending on what unit interval {\mathrm{Im}(\rho)} falls into.)
  • (vii) Using the above estimates, conclude Theorem 8.

The explicit formula in Theorem 8 is completely exact, but turns out to be a little bit inconvenient for applications because it involves all the zeroes {\rho}, and the series involving them converges very slowly (indeed the convergence is not even absolute). In practice it is preferable to work with a smoothed version of the formula. Here is one such smoothing:

Exercise 22 (Smoothed explicit formula)

— 3. Extending the zero free region, and the prime number theorem —

We now show how Theorem 9 implies Theorem 6. Let {2 \leq T \leq x} be parameters to be chosen later. We will apply Exercise 22 to a function {\eta = \eta_{T,x}} which equals one on {[2,x]}, is supported on {[1.5, x + x/T]}, and obeys the derivative estimates

\displaystyle  \eta^{(j)}(y) \lesssim_j 1

for all {y \in [1.5,2]} and {j \geq 0}, and

\displaystyle  \eta^{(j)}(y) \lesssim_j (T/x)^j

for all {y \in [x,x+x/T]} and {j \geq 0}. Such a function can be constructed by gluing together various rescaled versions of (antiderivatives of) standard bump functions. For such a function, we have

\displaystyle  \sum_{n \leq x} \Lambda(n) \leq \sum_n \Lambda(n) \eta(n).

On the other hand, we have

\displaystyle  \int_{\bf R} \eta(y)\ dy = x + O(x/T)


and

\displaystyle  \int_{\bf R} \frac{1}{y^3-y} \eta(y)\ dy = O(1)

and hence

\displaystyle  \sum_{n \leq x} \Lambda(n) \leq x + O(x/T) + \sum_\rho O( |\int_{\bf R} \eta(y) y^{\rho-1}\ dy| ). \ \ \ \ \ (17)

We split into the two cases {|\rho| \leq T_*} and {|\rho| > T_*}, where {T_* \geq T} is a parameter to be chosen later. For {|\rho| \leq T_*}, there are only {O_{T_*}(1)} zeroes, and all of them have real part strictly less than {1} by Theorem 9. Hence there exists {\varepsilon = \varepsilon_{T_*} > 0} such that {\mathrm{Re} \rho \leq 1-\varepsilon} for all such zeroes. For each such zero, we have from the triangle inequality

\displaystyle  \int_{\bf R} \eta(y) y^{\rho-1}\ dy \lesssim_{T_*} x^{1-\varepsilon}

and so the total contribution of these zeroes to (17) is {O_{T_*}( x^{1-\varepsilon})}. For each zero {\rho} with {|\rho| > T_*}, we integrate by parts twice to get some decay in {\rho}:

\displaystyle  \int_{\bf R} \eta(y) y^{\rho-1}\ dy = \frac{-1}{\rho} \int_{\bf R} \eta'(y) y^\rho\ dy

\displaystyle  = \frac{1}{\rho(\rho+1)} \int_{\bf R} \eta''(y) y^{\rho+1}\ dy,

and from the triangle inequality and the fact that {\mathrm{Re} \rho \leq 1} we conclude

\displaystyle  \int_{\bf R} \eta(y) y^{\rho-1}\ dy \lesssim \frac{1}{|\rho|^2} \frac{T^2}{x^2} \frac{x}{T} x^2 = \frac{T}{|\rho|^2} x.

Since {\sum_\rho \frac{1}{|\rho|^2}} is convergent (this follows from Exercise 20), we conclude (for {T_*} large enough depending on {T}) that the total contribution here is {O(x/T)}. Thus, after choosing {T_*} suitably, we obtain the bound

\displaystyle  \sum_{n \leq x} \Lambda(n) \leq x + O(x/T) + O_T(x^{1-\varepsilon})

and thus

\displaystyle  \sum_{n \leq x} \Lambda(n) \leq x + O(x/T)

whenever {x} is sufficiently large depending on {T} (since {\varepsilon} depends only on {T_*}, which depends only on {T}). A similar argument (replacing {x,x+x/T} by {x-x/T, x} in the construction of {\eta}) gives the matching lower bound

\displaystyle  \sum_{n \leq x} \Lambda(n) \geq x - O(x/T)

whenever {x} is sufficiently large depending on {T}. Sending {T \rightarrow \infty}, we obtain Theorem 6.
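To get a feel for the conclusion {\sum_{n \leq x} \Lambda(n) = x + o(x)}, one can compute the left-hand side directly for moderate {x}. The following is only an illustrative sketch: the helper name `chebyshev_psi` is chosen here, and a plain sieve of Eratosthenes is used, so it says nothing about the rate of convergence that the argument above actually gives.

```python
import math

def chebyshev_psi(x: int) -> float:
    """Return the sum of Lambda(n) over n <= x, using a sieve of Eratosthenes."""
    is_prime = bytearray([1]) * (x + 1)
    total = 0.0
    for p in range(2, x + 1):
        if not is_prime[p]:
            continue
        # Strike out the multiples of the prime p.
        for m in range(p * p, x + 1, p):
            is_prime[m] = 0
        # Each prime power p^k <= x contributes Lambda(p^k) = log p.
        power = p
        while power <= x:
            total += math.log(p)
            power *= p
    return total

for x in (10**3, 10**4, 10**5):
    print(x, chebyshev_psi(x) / x)
```

The printed ratios should drift towards {1} as {x} grows, in line with Theorem 6.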

Exercise 23 Assuming the Riemann hypothesis, show that

\displaystyle  \sum_{n \leq x} \Lambda(n) = x + O_\varepsilon(x^{1/2+\varepsilon})

for any {\varepsilon>0} and {x>1}, and that

\displaystyle  \pi(x) = \int_2^x \frac{dt}{\log t} + O_\varepsilon(x^{1/2+\varepsilon})

for any {\varepsilon>0} and {x>1}. Conversely, show that either of these two estimates is equivalent to the Riemann hypothesis. (Hint: find a holomorphic continuation of {-\frac{\zeta'(s)}{\zeta(s)} - \frac{1}{s-1}} to the region {\mathrm{Re}(s) > 1/2} in a manner similar to how {\zeta(s)-\frac{1}{s-1}} was first holomorphically continued to the region {\mathrm{Re}(s) > 0}).

It remains to prove Theorem 9. The claim is clear for {t=0} thanks to the simple pole of {\zeta} at {s=1}, so we may assume {t \neq 0}. Suppose for contradiction that there was a zero of {\zeta} at {1+it}, thus

\displaystyle  \zeta(\sigma+it) = O_t( \sigma-1)

for {\sigma>1} sufficiently close to {1}. Taking logarithms, we see in particular that

\displaystyle  \ln |\zeta(\sigma+it)| \leq - \ln \frac{1}{\sigma-1} + O_t(1).

Using Lemma 5(v), we conclude that

\displaystyle  \sum_{n=2}^\infty \frac{\Lambda(n)/\ln n}{n^\sigma} \cos(t \ln n) \leq - \ln \frac{1}{\sigma-1} + O_t(1).

Note that the summands here are oscillatory due to the cosine term. To manage the oscillation, we use the simple pole at {s=1} that gives

\displaystyle  \zeta(\sigma) \sim \frac{1}{\sigma-1}

for {\sigma>1} sufficiently close to one, and on taking logarithms as before we get

\displaystyle  \sum_{n=2}^\infty \frac{\Lambda(n)/\ln n}{n^\sigma} = \ln \frac{1}{\sigma-1} + O(1).

These two estimates come close to being contradictory, but not quite (because we could have {\cos(t\ln n)} close to {-1} for most numbers {n} that are weighted by {\frac{\Lambda(n)/\ln n}{n^\sigma}}). To get the contradiction, we use the analytic continuation of {\zeta} to {1+2it} to conclude that

\displaystyle  \zeta(\sigma+2it) = O_t( 1)

and hence

\displaystyle  \sum_{n=2}^\infty \frac{\Lambda(n)/\ln n}{n^\sigma} \cos(2 t \ln n) \leq O_t(1).

Now we take advantage of the Mertens inequality

\displaystyle  3 + 4 \cos(\theta) + \cos(2\theta) = 2 (1 + \cos(\theta))^2 \geq 0

(which is a quantitative variant of the observation that if {\cos(\theta)} is close to {-1} then {\cos(2\theta)} has to be close to {+1}) as well as the non-negative nature of {\Lambda(n)} to conclude that

\displaystyle  \sum_{n=2}^\infty \frac{\Lambda(n)/\ln n}{n^\sigma} (3 + 4\cos(t \ln n) + \cos(2 t \ln n)) \geq 0

and hence

\displaystyle  0 \leq - \ln \frac{1}{\sigma-1} + O_t(1).

This leads to the desired contradiction by sending {\sigma \rightarrow 1^+}, and proves the prime number theorem.

Exercise 24 Establish the inequality

\displaystyle  \zeta(\sigma)^3 |\zeta(\sigma+it)|^4 |\zeta(\sigma+2it)| \geq 1

for any {t \in {\bf R}} and {\sigma>1}.
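The mechanism behind this argument and Exercise 24 can be checked numerically. The sketch below verifies the identity {3 + 4 \cos(\theta) + \cos(2\theta) = 2 (1 + \cos(\theta))^2} at a sample angle and evaluates a truncation of the series {\sum_n \frac{\Lambda(n)/\ln n}{n^\sigma} (3 + 4\cos(t \ln n) + \cos(2 t \ln n))}, whose full sum is the logarithm of the left-hand side in Exercise 24. The function names and the truncation parameter `limit` are assumptions made for this illustration.

```python
import math

def mertens_trig(theta: float) -> float:
    """The trigonometric polynomial 3 + 4 cos(theta) + cos(2 theta)."""
    return 3.0 + 4.0 * math.cos(theta) + math.cos(2.0 * theta)

def log_product_partial(sigma: float, t: float, limit: int) -> float:
    """Sum over prime powers n = p^k <= limit of
    (Lambda(n)/ln n) n^{-sigma} (3 + 4 cos(t ln n) + cos(2 t ln n)),
    using Lambda(p^k)/ln(p^k) = 1/k."""
    sieve = bytearray([1]) * (limit + 1)
    total = 0.0
    for p in range(2, limit + 1):
        if not sieve[p]:
            continue
        for m in range(p * p, limit + 1, p):
            sieve[m] = 0
        n, k = p, 1
        while n <= limit:
            total += (1.0 / k) * n ** (-sigma) * mertens_trig(t * math.log(n))
            n *= p
            k += 1
    return total

theta = 2.5
print(mertens_trig(theta), 2.0 * (1.0 + math.cos(theta)) ** 2)
print(log_product_partial(1.1, 14.134725, 10**4))
```

Every summand is non-negative, so every truncation is non-negative as well; exponentiating the full sum gives the inequality of Exercise 24.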

Remark 25 There are a number of ways to improve Theorem 9 that move a little closer in the direction of the Riemann hypothesis. Firstly, there are a number of zero-free regions for the Riemann zeta function known that give lower bounds for {|\zeta(s)|} (and in particular preclude the existence of zeros) a small amount inside the critical strip, and can be used to improve the error term in the prime number theorem; for instance, the classical zero-free region shows that there are no zeroes in the region {\{ \sigma+it: \sigma \geq 1 - \frac{c}{\ln(1+|t|)} \}} for some sufficiently small absolute constant {c}, and lets one improve the {o(x)} error term in Theorem 6 to {O( x \exp( -c\sqrt{\log x}))} (with a corresponding improvement in Theorem 1, provided that one replaces {x/\ln x} with the logarithmic integral {\int_2^x \frac{dt}{\ln t}}). A further improvement in the zero free region and in the prime number theorem error term was subsequently given by Vinogradov. We also mention a number of important zero density estimates which provide non-trivial upper bounds for the number of zeroes in other, somewhat larger regions of the critical strip; the bounds are not strong enough to completely exclude zeroes as is the case with zero-free regions, but can at least limit the collective influence of such zeroes. For more discussion of these topics, see the various lecture notes to this previous course.

Terence Tao246B, Notes 2: Some connections with the Fourier transform

Previous set of notes: Notes 1. Next set of notes: Notes 3.

In Exercise 5 (and Lemma 1) of 246A Notes 4 we already observed some links between complex analysis on the disk (or annulus) and Fourier series on the unit circle:

  • (i) Functions {f} that are holomorphic on a disk {\{ |z| < R \}} are expressed by a convergent Fourier series (and also Taylor series) {f(re^{i\theta}) = \sum_{n=0}^\infty r^n a_n e^{in\theta}} for {0 \leq r < R} (so in particular {a_n = \frac{1}{n!} f^{(n)}(0)}), where

    \displaystyle  \limsup_{n \rightarrow +\infty} |a_n|^{1/n} \leq \frac{1}{R}; \ \ \ \ \ (1)

    conversely, every infinite sequence {(a_n)_{n=0}^\infty} of coefficients obeying (1) arises from such a function {f}.
  • (ii) Functions {f} that are holomorphic on an annulus {\{ r_- < |z| < r_+ \}} are expressed by a convergent Fourier series (and also Laurent series) {f(re^{i\theta}) = \sum_{n=-\infty}^\infty r^n a_n e^{in\theta}}, where

    \displaystyle  \limsup_{n \rightarrow +\infty} |a_n|^{1/n} \leq \frac{1}{r_+}; \limsup_{n \rightarrow -\infty} |a_n|^{1/|n|} \leq r_-; \ \ \ \ \ (2)

    conversely, every doubly infinite sequence {(a_n)_{n=-\infty}^\infty} of coefficients obeying (2) arises from such a function {f}.
  • (iii) In the situation of (ii), there is a unique decomposition {f = f_1 + f_2} where {f_1} extends holomorphically to {\{ z: |z| < r_+\}}, and {f_2} extends holomorphically to {\{ z: |z| > r_-\}} and goes to zero at infinity, and are given by the formulae

    \displaystyle  f_1(z) = \sum_{n=0}^\infty a_n z^n = \frac{1}{2\pi i} \int_\gamma \frac{f(w)}{w-z}\ dw

    where {\gamma} is any anticlockwise contour in {\{ z: |z| < r_+\}} enclosing {z}, and

    \displaystyle  f_2(z) = \sum_{n=-\infty}^{-1} a_n z^n = - \frac{1}{2\pi i} \int_\gamma \frac{f(w)}{w-z}\ dw

    where {\gamma} is any anticlockwise contour in {\{ z: |z| > r_-\}} enclosing {0} but not {z}.

This connection lets us interpret various facts about Fourier series through the lens of complex analysis, at least for some special classes of Fourier series. For instance, the Fourier inversion formula {a_n = \frac{1}{2\pi} \int_0^{2\pi} f(e^{i\theta}) e^{-in\theta}\ d\theta} becomes the Cauchy-type formula for the Laurent or Taylor coefficients of {f}, in the event that the coefficients are doubly infinite and obey (2) for some {r_- < 1 < r_+}, or singly infinite and obey (1) for some {R > 1}.
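As a small illustration of this dictionary, the coefficient formula {a_n = \frac{1}{2\pi} \int_0^{2\pi} f(e^{i\theta}) e^{-in\theta}\ d\theta} can be approximated by an equally spaced Riemann sum on the unit circle. The sketch below uses the entire function {f(z) = e^z}, for which {a_n = 1/n!}; the helper name `fourier_coefficient` and the choice of 64 sample points are assumptions made for this example.

```python
import cmath
import math

def fourier_coefficient(f, n: int, samples: int = 64) -> complex:
    """Approximate (1/2pi) * integral over [0, 2pi] of f(e^{i theta}) e^{-i n theta}
    by an equally spaced Riemann sum on the unit circle."""
    total = 0j
    for j in range(samples):
        theta = 2.0 * math.pi * j / samples
        total += f(cmath.exp(1j * theta)) * cmath.exp(-1j * n * theta)
    return total / samples

for n in range(5):
    a_n = fourier_coefficient(cmath.exp, n)
    print(n, a_n.real, 1.0 / math.factorial(n))
```

For negative {n} the same routine returns a value very close to zero, reflecting the fact that {e^z} has no negative-index Laurent coefficients.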

It turns out that there are similar links between complex analysis on a half-plane (or strip) and Fourier integrals on the real line, which we will explore in these notes.

We first fix a normalisation for the Fourier transform. If {f \in L^1({\bf R})} is an absolutely integrable function on the real line, we define its Fourier transform {\hat f: {\bf R} \rightarrow {\bf C}} by the formula

\displaystyle  \hat f(\xi) := \int_{\bf R} f(x) e^{-2\pi i x \xi}\ dx. \ \ \ \ \ (3)

From the dominated convergence theorem {\hat f} will be a bounded continuous function; from the Riemann-Lebesgue lemma it also decays to zero as {\xi \rightarrow \pm \infty}. My choice to place the {2\pi} in the exponent is a personal preference (it is slightly more convenient for some harmonic analysis formulae such as the identities (4), (5), (6) below), though in the complex analysis and PDE literature there are also some slight advantages in omitting this factor. In any event it is not difficult to adapt the discussion in these notes for other choices of normalisation. It is of interest to extend the Fourier transform beyond the {L^1({\bf R})} class into other function spaces, such as {L^2({\bf R})} or the space of tempered distributions, but we will not pursue this direction here; see for instance these lecture notes of mine for a treatment.

Exercise 1 (Fourier transform of Gaussian) If {a} is a complex number with {\mathrm{Re} a>0} and {f} is the Gaussian function {f(x) := e^{-\pi a x^2}}, show that the Fourier transform {\hat f} is given by the Gaussian {\hat f(\xi) = a^{-1/2} e^{-\pi \xi^2/a}}, where we use the standard branch for {a^{-1/2}}.
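The formula in Exercise 1 can be tested numerically with the normalisation (3). The sketch below approximates the Fourier integral by the trapezoid rule on a truncated interval and compares it with {a^{-1/2} e^{-\pi \xi^2/a}} for one complex value of {a}. The helper `fourier_transform`, the truncation `half_width` and the step size are assumptions of this illustration; Python's complex power uses the principal branch, which matches the standard branch of {a^{-1/2}} here.

```python
import cmath
import math

def fourier_transform(f, xi: float, half_width: float = 8.0, step: float = 0.01) -> complex:
    """Trapezoid-rule approximation of the integral of f(x) e^{-2 pi i x xi} over
    [-half_width, half_width]; f is assumed negligible outside that interval."""
    n = int(round(2 * half_width / step))
    total = 0j
    for j in range(n + 1):
        x = -half_width + j * step
        weight = 0.5 if j in (0, n) else 1.0
        total += weight * f(x) * cmath.exp(-2j * math.pi * x * xi)
    return total * step

a = 1.0 + 0.5j  # a sample complex number with positive real part

def gaussian(x: float) -> complex:
    return cmath.exp(-math.pi * a * x * x)

def claimed_transform(xi: float) -> complex:
    return a ** -0.5 * cmath.exp(-math.pi * xi * xi / a)

for xi in (0.0, 0.5, 1.0):
    print(xi, abs(fourier_transform(gaussian, xi) - claimed_transform(xi)))
```

The printed discrepancies should be at the level of rounding error, since the trapezoid rule is extremely accurate for rapidly decaying analytic integrands.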

The Fourier transform has many remarkable properties. On the one hand, as long as the function {f} is sufficiently “reasonable”, the Fourier transform enjoys a number of very useful identities, such as the Fourier inversion formula

\displaystyle  f(x) = \int_{\bf R} \hat f(\xi) e^{2\pi i x \xi} d\xi, \ \ \ \ \ (4)

the Plancherel identity

\displaystyle  \int_{\bf R} |f(x)|^2\ dx = \int_{\bf R} |\hat f(\xi)|^2\ d\xi, \ \ \ \ \ (5)

and the Poisson summation formula

\displaystyle  \sum_{n \in {\bf Z}} f(n) = \sum_{k \in {\bf Z}} \hat f(k). \ \ \ \ \ (6)

On the other hand, the Fourier transform also intertwines various qualitative properties of a function {f} with “dual” qualitative properties of its Fourier transform {\hat f}; in particular, “decay” properties of {f} tend to be associated with “regularity” properties of {\hat f}, and vice versa. For instance, the Fourier transform of a rapidly decreasing function tends to be smooth. There are complex analysis counterparts of this Fourier dictionary, in which “decay” properties are described in terms of exponentially decaying pointwise bounds, and “regularity” properties are expressed using holomorphicity on various strips, half-planes, or the entire complex plane. The following exercise gives some examples of this:

Exercise 2 (Decay of {f} implies regularity of {\hat f}) Let {f \in L^1({\bf R})} be an absolutely integrable function. Hint: to establish holomorphicity in each of these cases, use Morera’s theorem and the Fubini-Tonelli theorem. For uniqueness, use analytic continuation, or (for part (iv)) the Cauchy integral formula.

Later in these notes we will give a partial converse to part (ii) of this exercise, known as the Paley-Wiener theorem; there are also partial converses to the other parts of this exercise.

From (3) we observe the following intertwining property between multiplication by an exponential and complex translation: if {\xi_0} is a complex number and {f: {\bf R} \rightarrow {\bf C}} is an absolutely integrable function such that the modulated function {f_{\xi_0}(x) := e^{2\pi i \xi_0 x} f(x)} is also absolutely integrable, then we have the identity

\displaystyle  \widehat{f_{\xi_0}}(\xi) = \hat f(\xi - \xi_0) \ \ \ \ \ (7)

whenever {\xi} is a complex number such that at least one of the two sides of the equation in (7) is well defined. Thus, multiplication of a function by an exponential weight corresponds (formally, at least) to translation of its Fourier transform. By using contour shifting, we will also obtain a dual relationship: under suitable holomorphicity and decay conditions on {f}, translation by a complex shift will correspond to multiplication of the Fourier transform by an exponential weight. It turns out to be possible to exploit this property to derive many Fourier-analytic identities, such as the inversion formula (4) and the Poisson summation formula (6), which we do later in these notes. (The Plancherel theorem can also be established by complex analytic methods, but this requires a little more effort; see Exercise 8.)

The material in these notes is loosely adapted from Chapter 4 of Stein-Shakarchi’s “Complex Analysis”.

— 1. The inversion and Poisson summation formulae —

We now explore how the Fourier transform {\hat f} of a function {f} behaves when {f} extends holomorphically to a strip. For technical reasons we will also impose a fairly mild decay condition on {f} at infinity to ensure integrability. As we shall shortly see, the method of contour shifting then allows us to insert various exponentially decaying factors into Fourier integrals that make the justification of identities such as the Fourier inversion formula straightforward.

Proposition 3 (Fourier transform of functions holomorphic in a strip) Let {a > 0}, and suppose that {f} is a holomorphic function on the strip {\{ z: |\mathrm{Im} z| < a \}} which obeys a decay bound of the form

\displaystyle  |f(x+iy)| \leq \frac{C_b}{1+|x|^\alpha} \ \ \ \ \ (8)

for all {x \in {\bf R}}, {0 < b < a}, {y \in [-b,b]}, and some {C_b>0} and {\alpha > 1} (or in asymptotic notation, one has {f(x+iy) \lesssim_{b,f} \frac{1}{1+|x|^\alpha}} whenever {x \in {\bf R}} and {|y| \leq b < a}).
  • (i) (Translation intertwines with modulation) For any {w} in the strip {\{ z: |\mathrm{Im}(z)| < a \}}, the Fourier transform of the function {x \mapsto f(x+w)} is {\xi \mapsto e^{2\pi i w \xi} \hat f(\xi)}.
  • (ii) (Exponential decay of Fourier transform) For any {0 < b < a}, there is a quantity {C_{b,\alpha}} such that {|\hat f(\xi)| \leq C_{b,\alpha} e^{-2\pi b|\xi|}} for all {\xi \in {\bf R}} (or in asymptotic notation, one has {\hat f(\xi) \lesssim_{a,b,\alpha,f} e^{-2\pi b|\xi|}} for {0 < b < a} and {\xi \in {\bf R}}).
  • (iii) (Partial Fourier inversion) For any {0 < \varepsilon < a} and {x \in {\bf R}}, one has

    \displaystyle  \int_0^\infty \hat f(\xi) e^{2\pi i x \xi}\ d\xi = \frac{1}{2\pi i} \int_{-\infty}^\infty \frac{f(y-i\varepsilon)}{y-i\varepsilon - x}\ dy


    and

    \displaystyle  \int_{-\infty}^0 \hat f(\xi) e^{2\pi i x \xi}\ d\xi = -\frac{1}{2\pi i} \int_{-\infty}^\infty \frac{f(y+i\varepsilon)}{y+i\varepsilon - x}\ dy.

  • (iv) (Full Fourier inversion) For any {x \in {\bf R}}, the identity (4) holds for this function {f}.
  • (v) (Poisson summation formula) The identity (6) holds for this function {f}.

Proof: We begin with (i), which is a standard application of contour shifting. Applying the definition (3) of the Fourier transform, our task is to show that

\displaystyle  \int_{\bf R} f(x+w) e^{-2\pi i x \xi}\ dx = e^{2\pi i w \xi} \int_{\bf R} f(x) e^{-2\pi i x \xi}\ dx

whenever {|\mathrm{Im} w| < a} and {\xi \in {\bf R}}. Clearly

\displaystyle  \int_{\bf R} f(x) e^{-2\pi i x \xi}\ dx = \lim_{R \rightarrow \infty} \int_{\gamma_{-R \rightarrow R}} f(z) e^{-2\pi i z \xi}\ dz

where {\gamma_{z_1 \rightarrow z_2}} is the line segment contour from {z_1} to {z_2}, and similarly after a change of variables

\displaystyle  e^{-2\pi i w \xi} \int_{\bf R} f(x+w) e^{-2\pi i x \xi}\ dx = \lim_{R \rightarrow \infty} \int_{\gamma_{-R+w \rightarrow R+w}} f(z) e^{-2\pi i z \xi}\ dz.

On the other hand, from Cauchy’s theorem we have

\displaystyle  \int_{\gamma_{-R+w \rightarrow R+w}} = \int_{\gamma_{-R \rightarrow R}} + \int_{\gamma_{R \rightarrow R+w}} - \int_{\gamma_{-R \rightarrow -R+w}}

when applied to the holomorphic integrand {f(z) e^{-2\pi i z \xi}}. So it suffices to show that

\displaystyle  \int_{\gamma_{\pm R \rightarrow \pm R+w}} f(z) e^{-2\pi i z \xi}\ dz \rightarrow 0

as {R \rightarrow \infty}. But the left hand side can be rewritten as

\displaystyle  w e^{\mp 2\pi i R \xi} \int_0^1 f(\pm R + tw) e^{-2\pi i tw \xi}\ dt,

and the claim follows from (8) and dominated convergence.

For (ii), we apply (i) with {w = \pm i b} to observe that the Fourier transform of {x \mapsto f(x \pm ib)} is {\xi \mapsto e^{\mp 2\pi b \xi} \hat f(\xi)}. Applying (8) and the triangle inequality we conclude that

\displaystyle  e^{\mp 2\pi b \xi} \hat f(\xi) \lesssim_{f,b,\alpha} 1

for both choices of sign {\mp} and all {\xi \in {\bf R}}, giving the claim.

For the first part of (iii), we write {f_{-\varepsilon}(y) := f(y-i\varepsilon)}. By part (i), we have {\hat f_{-\varepsilon}(\xi) = e^{2\pi \varepsilon \xi} \hat f(\xi)}, so we can rewrite the desired identity as

\displaystyle  \int_0^\infty \hat f_{-\varepsilon}(\xi) e^{-2\pi \varepsilon \xi} e^{2\pi i x \xi}\ d\xi = \frac{1}{2\pi i} \int_{-\infty}^\infty \frac{f_{-\varepsilon}(y)}{y-i\varepsilon-x}\ dy.

By (3) and Fubini’s theorem (taking advantage of (8) and the exponential decay of {e^{-2\pi \varepsilon \xi}} as {\xi \rightarrow +\infty}) the left-hand side can be written as

\displaystyle  \int_{\bf R} f_{-\varepsilon}(y) \int_0^\infty e^{-2\pi i y \xi} e^{-2\pi \varepsilon \xi} e^{2\pi i x \xi}\ d\xi dy.

But a routine calculation shows that

\displaystyle  \int_0^\infty e^{-2\pi i y \xi} e^{-2\pi \varepsilon \xi} e^{2\pi i x \xi}\ d\xi = \frac{1}{2\pi i} \frac{1}{y-i\varepsilon-x}

giving the claim. The second part of (iii) is proven similarly.

To prove (iv), it suffices in light of (iii) to show that

\displaystyle  \int_{-\infty}^\infty \frac{f(y-i\varepsilon)}{y-i\varepsilon - x}\ dy - \int_{-\infty}^\infty \frac{f(y+i\varepsilon)}{y+i\varepsilon - x}\ dy = 2\pi i f(x)

for any {x \in {\bf R}}. The left-hand side can be written after a change of variables as

\displaystyle  \lim_{R \rightarrow \infty} \int_{\gamma_{-R-i\varepsilon \rightarrow R-i\varepsilon}} \frac{f(z)}{z-x}\ dz + \int_{\gamma_{R+i\varepsilon \rightarrow -R+i\varepsilon}} \frac{f(z)}{z-x}\ dz.

On the other hand, from dominated convergence as in the proof of (i) we have

\displaystyle  \lim_{R \rightarrow \infty} \int_{\gamma_{R-i\varepsilon \rightarrow R+i\varepsilon}} \frac{f(z)}{z-x}\ dz + \int_{\gamma_{-R+i\varepsilon \rightarrow -R-i\varepsilon}} \frac{f(z)}{z-x}\ dz = 0

while from the Cauchy integral formula one has

\displaystyle  \lim_{R \rightarrow \infty} \int_{\gamma_{-R-i\varepsilon \rightarrow R-i\varepsilon \rightarrow R+i\varepsilon \rightarrow -R+i\varepsilon \rightarrow -R-i\varepsilon}} \frac{f(z)}{z-x}\ dz = 2\pi i f(x)

giving the claim.

Now we prove (v). Let {0 < \varepsilon < a}. From (i) we have

\displaystyle  \hat f(k) = \int_{\bf R} f(x-i\varepsilon) e^{-2\pi \varepsilon k} e^{-2\pi i k x}\ dx


and

\displaystyle  \hat f(k) = \int_{\bf R} f(x+i\varepsilon) e^{2\pi \varepsilon k} e^{-2\pi i k x}\ dx

for any {k \in {\bf Z}}. If we sum the first identity for {k=0,1,2,\dots} we see from the geometric series formula and Fubini’s theorem that

\displaystyle  \sum_{k=0}^\infty \hat f(k) = \int_{\bf R} \frac{f(x-i\varepsilon)}{1 - e^{-2\pi i (x-i\varepsilon)}}\ dx

and similarly if we sum the second identity for {k=-1,-2,\dots} we have

\displaystyle  \sum_{k=-\infty}^{-1} \hat f(k) = \int_{\bf R} \frac{f(x+i\varepsilon) e^{2\pi i (x+i\varepsilon)}}{1 - e^{2\pi i (x+i\varepsilon)}}\ dx

\displaystyle  = - \int_{\bf R} \frac{f(x+i\varepsilon)}{1 - e^{-2\pi i (x+i\varepsilon)}}\ dx.

Adding these two identities and changing variables, we conclude that

\displaystyle  \sum_{k=-\infty}^\infty \hat f(k) = \lim_{R \rightarrow \infty} \int_{\gamma_{-R-i\varepsilon \rightarrow R-i\varepsilon}} \frac{f(z)}{1 - e^{-2\pi i z}}\ dz + \int_{\gamma_{R+i\varepsilon \rightarrow -R+i\varepsilon}} \frac{f(z)}{1 - e^{-2\pi i z}}\ dz.

We would like to use the residue theorem to evaluate the right-hand side, but we need to take a little care to avoid the poles of the integrand {\frac{f(z)}{1 - e^{-2\pi i z}}}, which are at the integers. Hence we shall restrict {R} to be a half-integer {R = N + \frac{1}{2}}. In this case, a routine application of the residue theorem shows that

\displaystyle  \int_{\gamma_{-R-i\varepsilon \rightarrow R-i\varepsilon \rightarrow R+i\varepsilon \rightarrow -R+i\varepsilon \rightarrow -R-i\varepsilon}} \frac{f(z)}{1 - e^{-2\pi i z}}\ dz = \sum_{n=-N}^N f(n).

Noting that {\frac{1}{1-e^{-2\pi i z}}} stays bounded for {z} in {\gamma_{R-i\varepsilon \rightarrow R+i\varepsilon}} or {\gamma_{-R+i\varepsilon \rightarrow -R-i\varepsilon}} when {R} is a half-integer, we also see from dominated convergence as before that

\displaystyle  \lim_{R \rightarrow \infty} \int_{\gamma_{R-i\varepsilon \rightarrow R+i\varepsilon}} \frac{f(z)}{1 - e^{-2\pi i z}}\ dz + \int_{\gamma_{-R+i\varepsilon \rightarrow -R-i\varepsilon}} \frac{f(z)}{1 - e^{-2\pi i z}}\ dz = 0.

The claim follows. \Box
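Part (ii) of Proposition 3 can be illustrated with the function {f(x) = \mathrm{sech}(\pi x)}, which extends holomorphically to the strip {\{ z: |\mathrm{Im} z| < 1/2 \}} and decays exponentially there. The sketch below assumes the standard fact (not proved in these notes) that with the normalisation (3) this function is its own Fourier transform, {\hat f(\xi) = \mathrm{sech}(\pi \xi)}, and compares a trapezoid-rule approximation against it. The helper names and numerical parameters are choices made for this example.

```python
import cmath
import math

def fourier_transform(f, xi: float, half_width: float = 15.0, step: float = 0.01) -> complex:
    """Trapezoid-rule approximation of the integral of f(x) e^{-2 pi i x xi} over
    [-half_width, half_width]; f is assumed negligible outside that interval."""
    n = int(round(2 * half_width / step))
    total = 0j
    for j in range(n + 1):
        x = -half_width + j * step
        weight = 0.5 if j in (0, n) else 1.0
        total += weight * f(x) * cmath.exp(-2j * math.pi * x * xi)
    return total * step

def sech_pi(x: float) -> float:
    """f(x) = sech(pi x), holomorphic in the strip |Im z| < 1/2."""
    return 1.0 / math.cosh(math.pi * x)

for xi in (0.0, 1.0, 2.0, 3.0):
    value = fourier_transform(sech_pi, xi).real
    # The last column stays bounded, reflecting decay like e^{-2 pi b |xi|}
    # for b close to 1/2.
    print(xi, value, value * math.exp(math.pi * abs(xi)))
```

The rescaled values in the last column approach {2}, consistent with decay at the rate {e^{-2\pi b|\xi|}} for every {b < 1/2} but no faster.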

Exercise 4 (Hilbert transform and Plemelj formula) Let {a, \alpha, f} be as in Proposition 3. Define the Cauchy-Stieltjes transform {{\mathcal C} f: {\bf C} \backslash {\bf R} \rightarrow {\bf C}} by the formula

\displaystyle  {\mathcal C} f(z) := \int_{\bf R} \frac{f(x)}{z-x}\ dx.

  • (i) Show that {{\mathcal C} f} is holomorphic on {{\bf C} \backslash {\bf R}} and has the Fourier representation

    \displaystyle  {\mathcal C} f(z) = -2\pi i \int_0^\infty \hat f(\xi) e^{2\pi i z \xi}\ d\xi

    in the upper half-plane {\mathrm{Im} z > 0} and

    \displaystyle  {\mathcal C} f(z) = 2\pi i \int_{-\infty}^0 \hat f(\xi) e^{2\pi i z \xi}\ d\xi

    in the lower half-plane {\mathrm{Im} z < 0}.
  • (ii) Establish the Plemelj formulae

    \displaystyle  \lim_{\varepsilon \rightarrow 0^+} {\mathcal C} f(x+i\varepsilon) = \pi H f(x) - i \pi f(x)


    and

    \displaystyle  \lim_{\varepsilon \rightarrow 0^+} {\mathcal C} f(x-i\varepsilon) = \pi H f(x) + i \pi f(x)

    uniformly for any {x \in {\bf R}}, where the Hilbert transform {Hf} of {f} is defined by the principal value integral

    \displaystyle  Hf(x) := \lim_{\varepsilon \rightarrow 0^+, R \rightarrow \infty} \frac{1}{\pi} \int_{\varepsilon \leq |y-x| \leq R} \frac{f(y)}{x-y}\ dy.

  • (iii) Show that {{\mathcal C} f} is the unique holomorphic function on {{\bf C} \backslash {\bf R}} that obeys the decay bound

    \displaystyle  \sup_{x+iy \in {\bf C} \backslash {\bf R}} (1+|y|) |{\mathcal C} f(x+iy)| < \infty

    and solves the (very simple special case of the) Riemann-Hilbert problem

    \displaystyle  \lim_{\varepsilon \rightarrow 0^+} {\mathcal C} f(x+i\varepsilon) - \lim_{\varepsilon \rightarrow 0^+} {\mathcal C} f(x-i\varepsilon) = - 2\pi i f(x)

    uniformly for all {x \in {\bf R}}, with both limits existing uniformly in {x}.
  • (iv) Establish the identity

    \displaystyle  Hf(x) = -\int_{\bf R} i \mathrm{sgn}(\xi) \hat f(\xi) e^{2\pi i x \xi}\ d\xi,

    where the signum function {\mathrm{sgn}(\xi)} is defined to equal {+1} for {\xi>0}, {-1} for {\xi < 0}, and {0} for {\xi=0}.
  • (v) Assume now that {f} has mean zero (i.e., {\int_{\bf R} f(x)\ dx = 0}). Show that {Hf} extends holomorphically to the strip {\{ z: |\mathrm{Im} z| < a \}} and obeys the bound (8) (but possibly with a different constant {C_b}, and with {\alpha} replaced by a slightly smaller quantity {1 < \alpha' < \alpha}), with the identity

    \displaystyle  {\mathcal C} f(z) = \pi H f(z) - \pi i \mathrm{sgn}(\mathrm{Im}(z)) f(z) \ \ \ \ \ (9)

    holding for {0 < |\mathrm{Im} z| < a}. (Hint: To exploit the mean zero hypothesis to get good decay bounds on {Hf(x)}, write {\frac{f(y)}{x-y}} as the sum of {\frac{f(y)}{x}} and {f(y) (\frac{1}{x-y}-\frac{1}{x})} and use the mean zero hypothesis to manage the first term. For the contribution of the second term, take advantage of contour shifting to avoid the singularity at {y=x}. One may have to divide the integrals one encounters into a couple of pieces and estimate each piece separately.)
  • (vi) Continue to assume that {f} has mean zero. Establish the identities

    \displaystyle  H(Hf) = -f


    and

    \displaystyle  H( fHf) = \frac{(Hf)^2 - f^2}{2}.

    (Hint: for the latter identity, square both sides of (9) and use (iii).)

Exercise 5 (Kramers-Kronig relations) Let {f} be a continuous function on the upper half-plane {\{ z: \mathrm{Im} z \geq 0 \}} which is holomorphic on the interior of this half-plane, and obeys the bound {|f(z)| \leq C/|z|} for all non-zero {z} in this half-plane and some {C>0}. Establish the Kramers-Kronig relations

\displaystyle  \mathrm{Re} f(x) = \lim_{\varepsilon \rightarrow 0, R \rightarrow \infty} \frac{1}{\pi} \int_{\varepsilon \leq |y-x| \leq R} \frac{\mathrm{Im} f(y)}{y-x}\ dy


and

\displaystyle  \mathrm{Im} f(x) = -\lim_{\varepsilon \rightarrow 0, R \rightarrow \infty} \frac{1}{\pi} \int_{\varepsilon \leq |y-x| \leq R} \frac{\mathrm{Re} f(y)}{y-x}\ dy

relating the real and imaginary parts of {f} to each other.
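The Kramers-Kronig relations can be tested numerically on a concrete example. The sketch below uses {f(z) = 1/(z+i)^2}, which is holomorphic on the closed upper half-plane and obeys {|f(z)| \leq 1/|z|} there, so that {\mathrm{Re} f(x) = \frac{x^2-1}{(x^2+1)^2}} and {\mathrm{Im} f(x) = \frac{-2x}{(x^2+1)^2}}. The principal value integral is approximated by folding it about {y = x} and applying the midpoint rule; the helper `principal_value`, the cutoff and the step size are assumptions of this illustration.

```python
import math

def principal_value(g, x: float, cutoff: float = 100.0, step: float = 2e-3) -> float:
    """Approximate p.v. integral of g(y)/(y - x) dy by folding about y = x:
    the integral over u in (0, cutoff) of (g(x+u) - g(x-u))/u, midpoint rule."""
    n = int(round(cutoff / step))
    total = 0.0
    for j in range(n):
        u = (j + 0.5) * step
        total += (g(x + u) - g(x - u)) / u
    return total * step

# Real and imaginary parts of f(z) = 1/(z + i)^2 on the real axis.
def re_f(y: float) -> float:
    return (y * y - 1.0) / (y * y + 1.0) ** 2

def im_f(y: float) -> float:
    return -2.0 * y / (y * y + 1.0) ** 2

x = 0.7
print(re_f(x), principal_value(im_f, x) / math.pi)
print(im_f(x), -principal_value(re_f, x) / math.pi)
```

Each printed pair should agree to several decimal places, with the remaining discrepancy coming from the truncation at `cutoff` and the quadrature step.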

Exercise 6
  • (i) By applying the Poisson summation formula to the function {x \mapsto \frac{1}{x^2+a^2}}, establish the identity

    \displaystyle  \sum_{n \in {\bf Z}} \frac{1}{n^2 + a^2} = \frac{\pi}{a} \frac{e^{2\pi a}+1}{e^{2\pi a}-1}

    for any positive real number {a}. Explain why this is consistent with Exercise 24 from Notes 1.
  • (ii) By carefully taking limits of (i) as {a \rightarrow 0}, establish yet another alternate proof of Euler’s identity

    \displaystyle  \sum_{n=1}^\infty \frac{1}{n^2} = \frac{\pi^2}{6}.
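Both parts of Exercise 6 lend themselves to a numerical check. The sketch below compares a symmetric partial sum of {\sum_{n \in {\bf Z}} \frac{1}{n^2+a^2}} with the closed form {\frac{\pi}{a} \frac{e^{2\pi a}+1}{e^{2\pi a}-1} = \frac{\pi}{a} \coth(\pi a)}, and then looks at what happens for small {a} once the {n=0} term {1/a^2} is removed. The function names and the number of terms are assumptions made for this example; the partial sum carries a truncation error of roughly {2/N} for {N} terms.

```python
import math

def lattice_sum(a: float, terms: int = 200_000) -> float:
    """Symmetric partial sum of 1/(n^2 + a^2) over |n| <= terms."""
    total = 1.0 / (a * a)
    for n in range(1, terms + 1):
        total += 2.0 / (n * n + a * a)
    return total

def closed_form(a: float) -> float:
    """(pi/a) (e^{2 pi a} + 1)/(e^{2 pi a} - 1), written as (pi/a) coth(pi a)."""
    return (math.pi / a) / math.tanh(math.pi * a)

for a in (0.5, 1.0, 2.0):
    print(a, lattice_sum(a), closed_form(a))

# Part (ii): drop the n = 0 term, halve, and let a be small.
a_small = 1e-3
euler_limit = 0.5 * (closed_form(a_small) - 1.0 / (a_small * a_small))
print(euler_limit, math.pi ** 2 / 6)
```

The last line shows the small-{a} expression landing close to {\pi^2/6}, which is the limit that part (ii) asks one to justify carefully.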

Exercise 7 For {\tau} in the upper half-plane {\{ \mathrm{Im} \tau > 0\}}, define the theta function {\theta(\tau) := \sum_{n \in {\bf Z}} e^{\pi i n^2 \tau}}. Use Exercise 1 and the Poisson summation formula to establish the modular identity

\displaystyle  \theta(\tau) = (-i\tau)^{-1/2} \theta(-1/\tau)

for such {\tau}, where one takes the standard branch of the square root.
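The modular identity can be checked numerically at a sample point of the upper half-plane. In the sketch below the theta series is truncated at {|n| \leq 12}, which is ample when {\mathrm{Im} \tau} is not too small but would need to be increased for {\tau} close to the real axis; the function name `theta`, the cutoff, and the sample value of {\tau} are assumptions of this illustration. Since {\mathrm{Re}(-i\tau) = \mathrm{Im} \tau > 0}, the principal square root returned by `cmath.sqrt` agrees with the standard branch.

```python
import cmath
import math

def theta(tau: complex, cutoff: int = 12) -> complex:
    """Truncated theta series: sum of e^{pi i n^2 tau} over |n| <= cutoff."""
    if tau.imag <= 0:
        raise ValueError("tau must lie in the upper half-plane")
    return sum(cmath.exp(1j * math.pi * n * n * tau) for n in range(-cutoff, cutoff + 1))

tau = 0.3 + 0.7j
lhs = theta(tau)
rhs = theta(-1.0 / tau) / cmath.sqrt(-1j * tau)  # (-i tau)^{-1/2} theta(-1/tau)
print(lhs, rhs, abs(lhs - rhs))
```

The two sides should agree to within rounding error.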

Exercise 8 (Fourier proof of Plancherel identity) Let {f: {\bf R} \rightarrow {\bf C}} be smooth and compactly supported. For any {\xi \in {\bf C}} with {\mathrm{Im} \xi \geq 0}, define the quantity

\displaystyle  A(\xi) := 2 \iint_{x>y} e^{2\pi i \xi (x-y)} \overline{f}(x) f(y)\ dx dy.

Remarkably, this proof of the Plancherel identity generalises to a nonlinear version involving a trace formula for the scattering transform for either Schrödinger or Dirac operators. For Schrödinger operators this was first obtained (implicitly) by Buslaev and Faddeev, and later more explicitly by Deift and Killip. The version for Dirac operators more closely resembles the linear Plancherel identity; see for instance the appendix to this paper of Muscalu, Thiele, and myself. The quantity {A(\xi)} is a component of a nonlinear quantity known as the transmission coefficient {a(\xi)} of a Dirac operator with potential {f} and spectral parameter {\xi} (or {2\pi \xi}, depending on normalisations).

The Fourier inversion formula was only established in Proposition 3 for functions that had a suitable holomorphic extension to a strip, but one can relax the hypotheses by a limiting argument. Here is one such example of this:

Exercise 9 (More general Fourier inversion formula) Let {f: {\bf R} \rightarrow {\bf C}} be continuous and obey the bound {|f(x)| \leq \frac{C}{1+|x|^2}} for all {x \in {\bf R}} and some {C>0}. Suppose that the Fourier transform {\hat f: {\bf R} \rightarrow {\bf C}} is absolutely integrable.

Exercise 10 (Laplace inversion formula) Let {f: [0,+\infty) \rightarrow {\bf C}} be a continuously twice differentiable function, obeying the bounds {|f(x)|, |f'(x)|, |f''(x)| \leq \frac{C}{1+|x|^2}} for all {x \geq 0} and some {C>0}.
  • (i) Show that the Fourier transform {\hat f} obeys the asymptotic

    \displaystyle  \hat f(\xi) = \frac{f(0)}{2\pi i \xi} + O( \frac{C}{|\xi|^2} )

    for any non-zero {\xi \in {\bf R}}.
  • (ii) Establish the principal value inversion formula

    \displaystyle  f(x) = \lim_{T \rightarrow +\infty} \int_{-T}^T \hat f(\xi) e^{2\pi i x \xi}\ d\xi

    for any positive real {x}. (Hint: modify the proof of Exercise 9(ii).) What happens when {x} is negative? zero?
  • (iii) Define the Laplace transform {{\mathcal L} f(s)} of {f} for {\mathrm{Re}(s) \geq 0} by the formula

    \displaystyle  {\mathcal L} f(s) := \int_0^\infty f(t) e^{-st}\ dt.

    Show that {{\mathcal L} f} is continuous on the half-plane {\{ s: \mathrm{Re}(s) \geq 0\}}, holomorphic on the interior of this half-plane, and obeys the Laplace-Mellin inversion formula

    \displaystyle  f(t) = \frac{1}{2\pi i} \lim_{T \rightarrow +\infty} \int_{\gamma_{\sigma-iT \rightarrow \sigma+iT}} e^{st} {\mathcal L} f(s) \ ds \ \ \ \ \ (10)

    for any {t>0} and {\sigma \geq 0}, where {\gamma_{\sigma-iT \rightarrow \sigma+iT}} is the line segment contour from {\sigma-iT} to {\sigma+iT}. Conclude in particular that the Laplace transform {{\mathcal L}} is injective on this class of functions {f}.
The Laplace-Mellin inversion formula in fact holds under more relaxed decay and regularity hypotheses than the ones given in this exercise, but we will not pursue these generalisations here. The limiting integral in (10) is also known as the Bromwich integral, and often written (with a slight abuse of notation) as {\frac{1}{2\pi i} \int_{\sigma-i\infty}^{\sigma+i\infty} e^{st} {\mathcal L} f(s)\ ds}. The Laplace transform is a close cousin of the Fourier transform that has many uses; for instance, it is a popular tool for analysing ordinary differential equations on half-lines such as {[0,+\infty)}.

Exercise 11 (Mellin inversion formula) Let {f: (0,+\infty) \rightarrow {\bf C}} be a continuous function that is compactly supported in {(0,+\infty)}. Define the Mellin transform {{\mathcal M} f: {\bf C} \rightarrow {\bf C}} by the formula

\displaystyle  {\mathcal M} f(s) := \int_0^\infty x^s f(x) \frac{dx}{x}.

Show that {{\mathcal M} f} is entire and one has the Mellin inversion formula

\displaystyle  f(x) = \frac{1}{2\pi i} \lim_{T \rightarrow +\infty} \int_{\gamma_{\sigma-iT \rightarrow \sigma+iT}} x^{-s} {\mathcal M} f(s)\ ds

for any {x \in (0,+\infty)} and {\sigma \in {\bf R}}. The regularity and support hypotheses on {f} can be relaxed significantly, but we will not pursue this direction here.

Exercise 12 (Perron’s formula) Let {f: {\bf N} \rightarrow {\bf C}} be a function which is of subpolynomial growth in the sense that {|f(n)| \leq C_\varepsilon n^\varepsilon} for all {n \in {\bf N}} and {\varepsilon>0}, where {C_\varepsilon} depends on {\varepsilon} (and {f}). For {s} in the half-plane {\{ \mathrm{Re} s > 1 \}}, form the Dirichlet series

\displaystyle  F(s) := \sum_{n=1}^\infty \frac{f(n)}{n^s}.

For any non-integer {x>0} and any {\sigma>1}, establish Perron’s formula

\displaystyle  \sum_{n \leq x} f(n) = \frac{1}{2\pi i} \lim_{T \rightarrow \infty} \int_{\gamma_{\sigma-iT \rightarrow \sigma+iT}} F(s) \frac{x^s}{s}\ ds. \ \ \ \ \ (11)

What happens when {x} is an integer? (The Perron formula and its many variants are of great utility in analytic number theory; see these previous lecture notes for further discussion.)
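To get a feel for how the truncated integral in (11) behaves, here is a small numerical sketch in Python (standard library only). It is not a proof: we assume a finitely supported {f}, so that {F} is a finite Dirichlet series, and the helper name `truncated_perron`, the cutoff {T=2000} and the step size are illustrative choices of ours rather than anything from the notes.

```python
import math

def truncated_perron(coeffs, x, sigma, T, step=0.05):
    """Approximate (1/(2 pi i)) * integral of F(s) x^s / s over the segment
    from sigma - iT to sigma + iT by the trapezoid rule, where
    F(s) = sum_n coeffs[n] / n^s is a finite Dirichlet series."""
    def integrand(t):
        s = complex(sigma, t)
        F = sum(c * n ** (-s) for n, c in coeffs.items())
        return F * x ** s / s

    n_steps = int(round(2 * T / step))
    total = 0.5 * (integrand(-T) + integrand(T))
    for k in range(1, n_steps):
        total += integrand(-T + k * step)
    # s = sigma + it gives ds = i dt, so the 1/(2 pi i) prefactor becomes 1/(2 pi)
    return (total * step / (2 * math.pi)).real

# Hypothetical test data: f(1) = 1, f(2) = -2, f(3) = 5 and f(n) = 0 otherwise.
coeffs = {1: 1.0, 2: -2.0, 3: 5.0}
approx = truncated_perron(coeffs, x=2.5, sigma=1.5, T=2000.0)
exact = sum(c for n, c in coeffs.items() if n <= 2.5)  # f(1) + f(2)
print(approx, exact)
```

The convergence in {T} is only of order {1/T} (with a constant that degrades as {x} approaches an integer), which is why a fairly large cutoff is used here.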

Exercise 13 (Solution to Schrödinger equation) Let {f, a} be as in Proposition 3. Define the function {u: {\bf R} \times {\bf R} \rightarrow {\bf C}} by the formula

\displaystyle  u(t,x) := \int_{\bf R} \hat f(\xi) e^{2\pi i x \xi - 4 \pi^2 i \xi^2 t}\ d\xi.

  • (i) Show that {u} is a smooth function of {t,x} that obeys the Schrödinger equation {i \partial_t u + \partial_{xx} u = 0} with initial condition {u(0,x) = f(x)} for {x \in {\bf R}}.
  • (ii) Establish the formula

    \displaystyle  u(t,x) = \frac{1}{(4\pi i t)^{1/2}} \int_{\bf R} e^{-\frac{|x-y|^2}{4it}} f(y)\ dy

    for {x \in {\bf R}} and {t \neq 0}, where we use the standard branch of the square root.
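As a sanity check (not a solution to the exercise), one can test the definition of {u} numerically in the special case of the gaussian {f(x) = e^{-\pi x^2}}, assuming that this {f} is admissible and using {\hat f(\xi) = e^{-\pi \xi^2}} from Exercise 1. In this case the integral defining {u} is a complex gaussian integral, and we take as given the closed form {u(t,x) = \alpha^{-1/2} e^{-\pi x^2/\alpha}} with {\alpha := 1 + 4\pi i t}. The Python sketch below (standard library only; all function names are ours) compares this closed form with a direct quadrature and checks the Schrödinger equation by finite differences.

```python
import cmath
import math

def u(t: float, x: float) -> complex:
    """Assumed closed form of u(t, x) for f(x) = exp(-pi x^2).
    alpha = 1 + 4 pi i t has positive real part, so the principal
    branch of the square root is used."""
    alpha = 1 + 4j * math.pi * t
    return cmath.exp(-math.pi * x * x / alpha) / cmath.sqrt(alpha)

def u_quadrature(t: float, x: float, L: float = 6.0, n: int = 6000) -> complex:
    """Trapezoid approximation of the defining integral over [-L, L]."""
    h = 2 * L / n
    total = 0j
    for k in range(n + 1):
        xi = -L + k * h
        weight = 0.5 if k in (0, n) else 1.0
        phase = 2j * math.pi * x * xi - 4j * math.pi ** 2 * xi * xi * t
        total += weight * math.exp(-math.pi * xi * xi) * cmath.exp(phase)
    return total * h

def schrodinger_residual(t: float, x: float, ht: float = 1e-4, hx: float = 1e-3) -> complex:
    """Central-difference approximation to i u_t + u_xx at (t, x)."""
    u_t = (u(t + ht, x) - u(t - ht, x)) / (2 * ht)
    u_xx = (u(t, x + hx) - 2 * u(t, x) + u(t, x - hx)) / (hx * hx)
    return 1j * u_t + u_xx

print(abs(u(0.1, 0.3) - u_quadrature(0.1, 0.3)))   # closed form vs. quadrature
print(abs(schrodinger_residual(0.1, 0.3)))          # should be small
```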

— 2. Phragmén-Lindelöf and Paley-Wiener —

The maximum modulus principle (Exercise 26 from 246A Notes 1) for holomorphic functions asserts that if a function continuous on a compact subset {K} of the plane and holomorphic on the interior of that set is bounded in magnitude by a bound {M} on the boundary {\partial K}, then it is also bounded by {M} on the interior. This principle does not directly apply for noncompact domains {K}: for instance, on the entire complex plane {{\bf C}}, there is no boundary whatsoever and the bound is clearly vacuous. On the half-plane {\{ \mathrm{Im} z \geq 0 \}}, the holomorphic function {\cos z} (for instance) is bounded in magnitude by {1} on the boundary of the half-plane, but grows exponentially in the interior. Similarly, in the strip {\{ z: -\pi/2 \leq \mathrm{Re} z \leq \pi/2\}}, the holomorphic function {\exp(\exp(iz))} is bounded in magnitude by {1} on the boundary of the strip, but grows double-exponentially in the interior of the strip. However, if one does not have such absurdly high growth, one can recover a form of the maximum principle, known as the Phragmén-Lindelöf principle. Here is one formulation of this principle:
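The strip counterexample is easy to check numerically. The following Python sketch (standard library only; the helper name `g` and the sample points are ours) evaluates {\exp(\exp(iz))} on the two edges of the strip and on the centre line {\mathrm{Re} z = 0}.

```python
import cmath
import math

def g(z: complex) -> complex:
    """The function exp(exp(iz)) from the strip counterexample."""
    return cmath.exp(cmath.exp(1j * z))

# On the edges Re z = +-pi/2 we have exp(iz) = +-i e^{-t}, which is purely
# imaginary, so |g| equals 1 there (up to rounding).
edge_values = [abs(g(complex(sign * math.pi / 2, t)))
               for sign in (-1, 1) for t in (-3.0, -1.0, 0.0, 1.0, 3.0)]

# On the centre line Re z = 0 we have exp(iz) = e^{-t}, so |g(it)| = exp(e^{-t}),
# which is double-exponentially large as t -> -infinity.
centre_values = [abs(g(complex(0.0, t))) for t in (0.0, -1.0, -2.0, -3.0)]

print(max(abs(v - 1.0) for v in edge_values))
print(centre_values)
```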

Theorem 14 (Lindelöf’s theorem) Let {f} be a continuous function on a strip {S := \{ \sigma+it: a \leq \sigma \leq b; t \in {\bf R} \}} for some {b>a}, which is holomorphic in the interior of the strip and obeys the bound

\displaystyle  |f(\sigma+it)| \leq A \exp( B \exp( (1-\delta) \frac{\pi}{b-a} |t|) ) \ \ \ \ \ (12)

for all {\sigma+it \in S} and some constants {A, B, \delta > 0}. Suppose also that {|f(a+it)| \leq M} and {|f(b+it)| \leq M} for all {t \in {\bf R}} and some {M>0}. Then we have {|f(\sigma+it)| \leq M} for all {a \leq \sigma \leq b} and {t \in {\bf R}}.

Remark 15 The hypothesis (12) is a qualitative hypothesis rather than a quantitative one, since the exact values of {A, B, \delta} do not show up in the conclusion. It is quite a mild condition; any function of exponential growth in {t}, or even with such super-exponential growth as {O( |t|^{|t|})} or {O(e^{|t|^{O(1)}})}, will obey (12). The principle however fails without this hypothesis, as discussed previously.

Proof: By shifting and dilating (adjusting {A,B,\delta} as necessary) we can reduce to the case {a = -\pi/2}, {b = \pi/2}, and by multiplying {f} by a constant we can also normalise {M=1}.

Suppose we temporarily assume that {f(\sigma+it) \rightarrow 0} as {|\sigma+it| \rightarrow \infty}. Then on a sufficiently large rectangle {\{ \sigma+it: -\pi/2 \leq \sigma \leq \pi/2; -T \leq t \leq T \}}, we have {|f| \leq 1} on the boundary of the rectangle, hence on the interior by the maximum modulus principle. Sending {T \rightarrow \infty}, we obtain the claim.

To remove the assumption that {f} goes to zero at infinity, we use the trick of giving ourselves an epsilon of room. Namely, we multiply {f(z)} by the holomorphic function {g_\varepsilon(z) := \exp( -\varepsilon \exp(i(1-\delta/2) z) )} for some {\varepsilon > 0}. A little complex arithmetic shows that the function {f(z) g_\varepsilon(z) g_\varepsilon(-z)} goes to zero at infinity in {S}. Applying the previous case to this function, then taking limits as {\varepsilon \rightarrow 0}, we obtain the claim. \Box

Corollary 16 (Phragmén-Lindelöf principle in a sector) Let {f} be a continuous function on a sector {S := \{ re^{i\theta}: r \geq 0, \alpha \leq \theta \leq \beta \}} for some {\alpha < \beta < \alpha + 2\pi}, which is holomorphic on the interior of the sector and obeys the bound

\displaystyle  |f(z)| \leq A \exp( B |z|^a )

for some {A,B > 0} and {0 < a < \frac{\pi}{\beta-\alpha}}. Suppose also that {|f(z)| \leq M} on the boundary of the sector {S} for some {M >0}. Then one also has {|f(z)| \leq M} in the interior.

Proof: Apply Theorem 14 to the function {f(\exp(iz))} on the strip {\{ \sigma+it: \alpha \leq \sigma \leq \beta\}}. \Box

Exercise 17 With the notation and hypotheses of Theorem 14, show that the function {\sigma \mapsto \sup_{t \in {\bf R}} |f(\sigma+it)|} is log-convex on {[a,b]}.

Exercise 18 (Hadamard three-circles theorem) Let {f} be a holomorphic function on an annulus {\{ z \in {\bf C}: R_1 \leq |z| \leq R_2 \}}. Show that the function {r \mapsto \sup_{\theta \in [0,2\pi]} |f(re^{i\theta})|} is log-convex on {[R_1,R_2]}.

Exercise 19 (Phragmén-Lindelöf principle) Let {f} be as in Theorem 14 with {a=0, b=1}, but with the hypotheses after “Suppose also” in that theorem replaced instead by the bounds {|f(0+it)| \leq C(1+|t|)^{a_0}} and {|f(1+it)| \leq C(1+|t|)^{a_1}} for all {t \in {\bf R}} and some exponents {a_0,a_1 \in {\bf R}} and a constant {C>0}. Show that one has {|f(\sigma+it)| \leq C' (1+|t|)^{(1-\sigma) a_0 + \sigma a_1}} for all {\sigma+it \in S} and some constant {C'} (which is allowed to depend on the constants {A, \delta} in (12), as well as {C,a_0,a_1}). (Hint: it is convenient to work first in a half-strip such as {\{ \sigma+it \in S: t \geq T \}} for some large {T}. Then multiply {f} by something like {\exp( - ((1-z)a_0+z a_1) \log(-iz) )} for some suitable branch of the logarithm and apply a variant of Theorem 14 for the half-strip. A more refined estimate in this regard is due to Rademacher.) This particular version of the principle gives the convexity bound for Dirichlet series such as the Riemann zeta function. Bounds which exploit the deeper properties of these functions to improve upon the convexity bound are known as subconvexity bounds and are of major importance in analytic number theory, which is of course well outside the scope of this course.

Now we can establish a remarkable converse of sorts to Exercise 2(ii) known as the Paley-Wiener theorem, that links the exponential growth of (the analytic continuation) of a function with the support of its Fourier transform:

Theorem 20 (Paley-Wiener theorem) Let {f: {\bf R} \rightarrow {\bf C}} be a continuous function obeying the decay condition

\displaystyle  |f(x)| \leq C/(1+|x|^2) \ \ \ \ \ (13)

for all {x \in {\bf R}} and some {C>0}. Let {M > 0}. Then the following are equivalent:
  • (i) The Fourier transform {\hat f} is supported on {[-M,M]}.
  • (ii) {f} extends analytically to an entire function that obeys the bound {|f(z)| \leq A e^{2\pi M |z|}} for some {A>0}.
  • (iii) {f} extends analytically to an entire function that obeys the bound {|f(z)| \leq A e^{2\pi M |\mathrm{Im} z|}} for some {A>0}.

The continuity and decay hypotheses on {f} can be relaxed, but we will not explore such generalisations here.

Proof: If (i) holds, then by Exercise 9, we have the inversion formula (4), and the claim (iii) then holds by a slight modification of Exercise 2(ii). Also, the claim (iii) clearly implies (ii).

Now we see why (iii) implies (i). We first assume that we have the stronger bound

\displaystyle  |f(z)| \leq A e^{2\pi M |\mathrm{Im} z|} / (1 + |z|^2) \ \ \ \ \ (14)

for {z \in {\bf C}}. Then we can apply Proposition 3 for any {a>0}, and conclude in particular that

\displaystyle  \hat f(\xi) = e^{-2\pi b \xi} \int_{\bf R} f(x-ib) e^{-2\pi i x \xi}\ dx

for any {\xi \in {\bf R}} and {b \in {\bf R}}. Applying (14) and the triangle inequality, we see that

\displaystyle  |\hat f(\xi)| \lesssim_A e^{-2\pi b \xi} e^{2\pi M |b|}.

If {\xi > M}, we can then send {b \rightarrow +\infty} and conclude that {\hat f(\xi)=0}; similarly for {\xi < -M} we can send {b \rightarrow -\infty} and again conclude {\hat f(\xi) = 0}. This establishes (i) in this case.

Now suppose we only have the weaker bound on {f} assumed in (iii). We again use the epsilon of room trick. For any {\varepsilon>0}, we consider the modified function {f_\varepsilon(z) := f(z) / (1 + i \varepsilon z)^2}. Its only pole is at {z = i/\varepsilon}, so it is still holomorphic on the lower half-plane {\{ z: \mathrm{Im} z \leq 0 \}}, and it obeys a bound of the form (14) on this half-plane. An inspection of the previous arguments shows that we can still show that {\hat f_\varepsilon(\xi) = 0} for {\xi > M} despite no longer having holomorphicity on the entire upper half-plane; sending {\varepsilon \rightarrow 0} using dominated convergence we conclude that {\hat f(\xi) = 0} for {\xi > M}. A similar argument (now using {1-i\varepsilon z} in place of {1+i\varepsilon z}) shows that {\hat f(\xi) = 0} for {\xi < -M}. This proves (i).

Finally, we show that (ii) implies (iii). The function {f(z) e^{2\pi i Mz}} is entire, bounded on the real axis by (13), bounded on the upper imaginary axis by (ii), and has exponential growth. By Corollary 16 (applied to the first and second quadrants), it is also bounded on the upper half-plane, which gives (iii) in the upper half-plane. A similar argument (using {e^{-2\pi i Mz}} in place of {e^{2\pi i Mz}}) also yields (iii) in the lower half-plane. \Box
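To illustrate the theorem on a concrete example, consider {f(x) = (\frac{\sin \pi x}{\pi x})^2}, which obeys (13); we take as a standard fact that its Fourier transform is the triangle function {\max(1-|\xi|,0)}, supported on {[-1,1]}, so that {M=1}. The Python sketch below (standard library only; the function names, grid and quadrature parameters are our own choices) checks the growth bound in (iii) with {A=1} on a grid in the complex plane, and approximates {\hat f} at one point inside and one point outside {[-1,1]}.

```python
import cmath
import math

M = 1.0  # assumed support radius of f_hat for this example

def f(z: complex) -> complex:
    """Entire extension of (sin(pi x) / (pi x))^2."""
    w = math.pi * z
    if abs(w) < 1e-8:
        return 1.0 + 0j
    return (cmath.sin(w) / w) ** 2

def f_hat(xi: float, L: float = 200.0, step: float = 0.01) -> float:
    """Trapezoid approximation of the integral of f(x) e^{-2 pi i x xi} over [-L, L]."""
    n = int(round(2 * L / step))
    total = 0j
    for k in range(n + 1):
        x = -L + k * step
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * f(x) * cmath.exp(-2j * math.pi * x * xi)
    return (total * step).real

# Largest value of |f(z)| e^{-2 pi M |Im z|} on a grid; (iii) predicts this is bounded.
worst_ratio = max(
    abs(f(complex(0.25 * i, 0.25 * j))) * math.exp(-2 * math.pi * M * abs(0.25 * j))
    for i in range(-40, 41) for j in range(-20, 21)
)
inside, outside = f_hat(0.5), f_hat(1.5)
print(worst_ratio, inside, outside)
```

Here `inside` should be close to {1/2} and `outside` close to zero, up to the error coming from truncating the integral to {[-L,L]}.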

— 3. The Hardy uncertainty principle —

Informally speaking, the uncertainty principle for the Fourier transform asserts that a function {f} and its Fourier transform cannot simultaneously be strongly localised, except in the degenerate case when {f} is identically zero. There are many rigorous formulations of this principle. Perhaps the best known is the Heisenberg uncertainty principle

\displaystyle  (\int_{\bf R} (\xi-\xi_0)^2 |\hat f(\xi)|^2\ d\xi)^{1/2} (\int_{\bf R} (x-x_0)^2 |f(x)|^2\ dx)^{1/2} \geq \frac{1}{4\pi} \int_{\bf R} |f(x)|^2\ dx,

valid for all {f: {\bf R} \rightarrow {\bf C}} and all {x_0,\xi_0 \in {\bf R}}, which we will not prove here (see for instance Exercise 47 of this previous set of lecture notes).

Another manifestation of the uncertainty principle is the following simple fact:

Lemma 21
  • (i) If {f: {\bf R} \rightarrow {\bf C}} is an integrable function that has exponential decay in the sense that one has {|f(x)| \leq C e^{-a|x|}} for all {x \in {\bf R}} and some {C,a>0}, then the Fourier transform {\hat f: {\bf R} \rightarrow {\bf C}} is either identically zero, or only has isolated zeroes (that is to say, the set {\{ \xi \in {\bf R}: \hat f(\xi) = 0 \}} is discrete).
  • (ii) If {f: {\bf R} \rightarrow {\bf C}} is a compactly supported continuous function such that {\hat f} is also compactly supported, then {f} is identically zero.

Proof: For (i), we observe from Exercise 2(iii) that {\hat f} extends holomorphically to a strip around the real axis, and the claim follows since non-zero holomorphic functions have isolated zeroes. For (ii), we observe from (i) that {\hat f} must be identically zero, and the claim now follows from the Fourier inversion formula (Exercise 9). \Box

Lemma 21(ii) rules out the existence of a bump function whose Fourier transform is also a bump function, which would have been a rather useful tool to have in harmonic analysis over the reals. (Such functions do exist however in some non-archimedean domains, such as the {p}-adics.) On the other hand, from Exercise 1 we see that we do at least have gaussian functions whose Fourier transform also decays as a gaussian. Unfortunately this is basically the best one can do:

Theorem 22 (Hardy uncertainty principle) Let {f} be a continuous function which obeys the bound {|f(x)| \leq C e^{-\pi ax^2}} for all {x \in {\bf R}} and some {C,a>0}. Suppose also that {|\hat f(\xi)| \leq C' e^{-\pi \xi^2/a}} for all {\xi \in {\bf R}} and some {C'>0}. Then {f(x)} is a scalar multiple of the gaussian {e^{-\pi ax^2}}, that is to say one has {f(x) = c e^{-\pi ax^2}} for some {c \in {\bf C}}.

Proof: By replacing {f} with the rescaled version {x \mapsto f(x/a^{1/2})}, which replaces {\hat f} with the rescaled version {\xi \mapsto a^{1/2} \hat f(a^{1/2} \xi)}, we may normalise {a=1}. By multiplying {f} by a small constant we may also normalise {C=C'=1}.

From Exercise 2(i), {\hat f} extends to an entire function. By the triangle inequality, we can bound

\displaystyle  |\hat f(\xi+i\eta)| \leq \int_{\bf R} e^{-\pi x^2} e^{2\pi x \eta}\ dx

for any {\xi,\eta \in {\bf R}}. Completing the square {e^{-\pi x^2} e^{2\pi x \eta} = e^{-\pi (x-\eta)^2} e^{\pi \eta^2}} and using {\int_{\bf R} e^{-\pi (x-\eta)^2}\ dx = \int_{\bf R} e^{-\pi x^2}\ dx = 1}, we conclude the bound

\displaystyle  |\hat f(\xi+i\eta)| \leq e^{\pi \eta^2}.

In particular, if we introduce the normalised function

\displaystyle  F(z) := e^{\pi z^2} \hat f(z)

then, since {|e^{\pi z^2}| = e^{\pi(\xi^2-\eta^2)}} for {z = \xi+i\eta}, we have

\displaystyle  |F(\xi+i\eta)| \leq e^{\pi \xi^2}. \ \ \ \ \ (15)

In particular, {|F|} is bounded by {1} on the imaginary axis. On the other hand, by hypothesis {|F|} is also bounded by {1} on the real axis. We can now almost invoke the Phragmén-Lindelöf principle (Corollary 16) to conclude that {F} is bounded on all four quadrants, but the growth bound (15) that we have is just barely too weak. To get around this we use the epsilon of room trick. For any {\varepsilon>0}, the function {F_\varepsilon(z) := e^{\pi i\varepsilon z^2} F(z)} is still entire, and is still bounded by {1} in magnitude on the real line. From (15) we have

\displaystyle  |F_\varepsilon(\xi+i\eta)| \leq e^{\pi \xi^2 - 2\varepsilon \pi \xi \eta}

so in particular it is bounded by {1} on the slightly tilted imaginary axis {(2\varepsilon + i) {\bf R}}. We can now apply Corollary 16 in the two acute-angle sectors between {(2\varepsilon+i) {\bf R}} and {{\bf R}} to conclude that {|F_\varepsilon(z)| \leq 1} in those two sectors; letting {\varepsilon \rightarrow 0}, we conclude that {|F(z)| \leq 1} in the first and third quadrants. A similar argument (using negative values of {\varepsilon}) shows that {|F(z)| \leq 1} in the second and fourth quadrants. By Liouville’s theorem, we conclude that {F} is constant, thus we have {\hat f(z) = c e^{-\pi z^2}} for some complex number {c}. The claim now follows from the Fourier inversion formula (Proposition 3(iv)) and Exercise 1. \Box
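The two computations behind the epsilon of room step can be checked numerically. The Python sketch below (standard library only; the helper names and the value {\varepsilon = 0.1} are ours) confirms that {|e^{\pi i \varepsilon z^2}| = e^{-2\pi\varepsilon \xi\eta}}, that the exponent {\pi \xi^2 - 2\varepsilon\pi\xi\eta} vanishes on the tilted axis {(2\varepsilon+i){\bf R}}, and that the sector between {{\bf R}} and this axis has angle less than {\pi/2}, so that quadratic growth is permitted in Corollary 16.

```python
import cmath
import math

def damping_modulus(z: complex, eps: float) -> float:
    """|exp(pi i eps z^2)|, the factor that turns F into F_eps."""
    return abs(cmath.exp(1j * math.pi * eps * z * z))

def bound_exponent(z: complex, eps: float) -> float:
    """Exponent pi xi^2 - 2 eps pi xi eta in the bound for |F_eps(xi + i eta)|."""
    xi, eta = z.real, z.imag
    return math.pi * xi * xi - 2 * eps * math.pi * xi * eta

def sector_angle(eps: float) -> float:
    """Opening angle of the sector between the positive real axis and (2 eps + i) R."""
    return math.atan2(1.0, 2 * eps)

eps = 0.1
z = complex(0.7, -1.3)
print(damping_modulus(z, eps), math.exp(-2 * math.pi * eps * z.real * z.imag))
print([bound_exponent((2 * eps + 1j) * t, eps) for t in (-2.0, 0.5, 3.0)])
print(math.pi / sector_angle(eps))  # must exceed 2 for Corollary 16 with a = 2
```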

One corollary of this theorem is that if {f} is continuous and decays like {e^{-\pi ax^2}} or better, then {\hat f} cannot decay any faster than {e^{-\pi \xi^2/a}} without {f} vanishing identically. This is a stronger version of Lemma 21(ii). There is a more general tradeoff known as the Gel’fand-Shilov uncertainty principle, which roughly speaking asserts that if {f} decays like {e^{-\pi a x^p}} then {\hat f} cannot decay faster than {e^{-\pi b x^q}} without {f} vanishing identically, whenever {1 < p,q < \infty} are dual exponents in the sense that {\frac{1}{p}+\frac{1}{q}=1}, and {a^{1/p} b^{1/q}} is large enough (the precise threshold was established in work of Morgan). See for instance this article of Nazarov for further discussion of these variants.

Exercise 23 If {f} is continuous and obeys the bound {|f(x)| \leq C (1+|x|)^M e^{-\pi ax^2}} for some {M \geq 0} and {C,a>0} and all {x \in {\bf R}}, and {\hat f} obeys the bound {|\hat f(\xi)| \leq C' (1+|\xi|)^M e^{-\pi \xi^2/a}} for some {C'>0} and all {\xi \in {\bf R}}, show that {f} is of the form {f(x) = P(x) e^{-\pi ax^2}} for some polynomial {P} of degree at most {M}.

Exercise 24 In this exercise we develop an alternate proof of (a special case of) the Hardy uncertainty principle, which can be found in the original paper of Hardy. Let the hypotheses be as in Theorem 22.
  • (i) Show that the function {F(s) := (s+a)^{1/2} \int_{\bf R} e^{-\pi s x^2} f(x)\ dx} is holomorphic on the region {\{ s \in {\bf C}: \mathrm{Re}(s) > -a \}} and obeys the bound {|F(s)| \lesssim C \frac{|s+a|^{1/2}}{|\mathrm{Re}(s)+a|^{1/2}}} in this region, where we use the standard branch of the square root.
  • (ii) Show that the function {\tilde F(s) := (1 + a/s)^{1/2} \int_{\bf R} e^{-\pi \xi^2/s} \hat f(\xi)\ d\xi} is holomorphic on the region {\{ s \in {\bf C} \backslash \{0\}: \mathrm{Re}(1/s) > -1/a \}} and obeys the bound {|\tilde F(s)| \lesssim C \frac{|1+a/s|^{1/2}}{|\mathrm{Re}(1/s)+1/a|^{1/2}}} in this region.
  • (iii) Show that {F} and {\tilde F} agree on their common domain of definition.
  • (iv) Show that the functions {F,\tilde F} are constant. (You may find Exercise 13 from 246A Notes 4 to be useful.)
  • (v) Use the above to give an alternate proof of Theorem 22 in the case when {f} is even. (Hint: subtract a constant multiple of a gaussian from {f} to make {F} vanish, and conclude on Taylor expansion around the origin that all the even moments {\int_{\bf R} x^{2k} f(x)\ dx} vanish. Conclude that the Taylor series coefficients of {\hat f} around the origin all vanish.)
It is possible to adapt this argument to also cover the case of general {f} that are not required to be even; see the paper of Hardy for details.

Exercise 25 (This problem is due to Tom Liggett; see this previous post.) Let {a_0,a_1,\dots} be a sequence of complex numbers bounded in magnitude by some bound {M}, and suppose that the power series {f(t) := \sum_{n=0}^\infty a_n \frac{t^n}{n!}} obeys the bound {|f(t)| \leq C e^{-t}} for all {t \geq 0} and some {C>0}.
  • (i) Show that the Laplace transform {{\mathcal L} f(s) := \int_0^\infty f(t) e^{-st}\ dt} extends holomorphically to the region {\{ s: \mathrm{Re}(s) > -1 \}} and obeys the bound {|{\mathcal L} f(s)| \leq \frac{C}{1+\mathrm{Re}(s)}} in this region.
  • (ii) Show that the function {\tilde F(s) := \sum_{n=0}^\infty \frac{a_n}{s^{n+1}}} is holomorphic in the region {\{ s: |s| > 1\}}, obeys the bound {|\tilde F(s)| \leq \frac{M}{|s|-1}} in this region, and agrees with {{\mathcal L} f(s)} on the common domain of definition.
  • (iii) Show that {{\mathcal L} f(s)} is a constant multiple of {\frac{1}{1+s}}.
  • (iv) Show that the sequence {a_n} is a constant multiple of the sequence {(-1)^n}.

Remark 26 There are many further variants of the Hardy uncertainty principle. For instance we have the following uncertainty principle of Beurling, which we state in a strengthened form due to Bonami, Demange, and Jaming: if {f} is a square-integrable function such that {\int_{\bf R} \int_{\bf R} \frac{|f(x)| |\hat f(\xi)|}{(1+|x|+|\xi|)^N} e^{2\pi |x| |\xi|}\ dx d\xi < \infty}, then {f} is equal (almost everywhere) to a polynomial times a gaussian; it is not difficult to show that this implies Theorem 22 and Exercise 23, as well as the Gel’fand-Shilov uncertainty principle. In recent years, PDE-based proofs of the Hardy uncertainty principle have been established, which have then been generalised to establish uncertainty principles for various Schrödinger type equations; see for instance this review article of Kenig. I also have some older notes on the Hardy uncertainty principle in this blog post. Finally, we mention the Beurling-Malliavin theorem, which provides a precise description of the possible decay rates of a function whose Fourier transform is compactly supported; see for instance this paper of Mashregi, Nazarov, and Khavin for a modern treatment.

Terence Tao246B, Notes 1: Zeroes, poles, and factorisation of meromorphic functions

Previous set of notes: 246A Notes 5. Next set of notes: Notes 2.

— 1. Jensen’s formula —

Suppose {f} is a non-zero rational function {f =P/Q}, then by the fundamental theorem of algebra one can write