Planet Musings

December 10, 2016

Scott Aaronson: May reason trump the Trump in all of us

Two years ago, when I was the target of an online shaming campaign, what helped me through it were hundreds of messages of support from friends, slight acquaintances, and strangers of every background.  I vowed then to return the favor, by standing up when I saw decent people unfairly shamed.  Today I have an opportunity to make good.

Some time ago I had the privilege of interacting a bit with Sam Altman, president of the famed startup incubator Y Combinator (and a guy who’s thanked in pretty much everything Paul Graham writes).  By way of our mutual friend, the renowned former quantum computing researcher Michael Nielsen, Sam got in touch with me to solicit suggestions for “outside-the-box” scientists and writers, for a new grant program that Y Combinator was starting. I found Sam eager to delve into the merits of any suggestion, however outlandish, and was delighted to be able to make a difference for a few talented people who needed support.

Sam has also been one of the Silicon Valley leaders who’s written most clearly and openly about the threat to America posed by Donald Trump and the need to stop him, and he’s donated tens of thousands of dollars to anti-Trump causes.  Needless to say, I supported Sam on that as well.

Now Sam is under attack on social media, and there are even calls for him to resign as the president of Y Combinator.  Like me two years ago, Sam has instantly become the corporeal embodiment of the “nerd privilege” that keeps the marginalized out of Silicon Valley.

Why? Because, despite his own emphatic anti-Trump views, Sam rejected demands to fire Peter Thiel (who has an advisory role at Y Combinator) because of Thiel’s support for Trump.  Sam explained his reasoning at some length:

[A]s repugnant as Trump is to many of us, we are not going to fire someone over his or her support of a political candidate.  As far as we know, that would be unprecedented for supporting a major party nominee, and a dangerous path to start down (of course, if Peter said some of the things Trump says himself, he would no longer be part of Y Combinator) … The way we got into a situation with Trump as a major party nominee in the first place was by not talking to people who are very different than we are … I don’t understand how 43% of the country supports Trump.  But I’d like to find out, because we have to include everyone in our path forward.  If our best ideas are to stop talking to or fire anyone who disagrees with us, we’ll be facing this whole situation again in 2020.

The usual criticism of nerds is that we might have narrow technical abilities, but we lack wisdom about human affairs.  It’s ironic, then, that it appears to have fallen to Silicon Valley nerds to guard some of the most important human wisdom our sorry species ever came across—namely, the liberal ideals of the Enlightenment.  Like Sam, I despise pretty much everything Trump stands for, and I’ve been far from silent about it: I’ve blogged, donated money, advocated vote swapping, endured anonymous comments like “kill yourself kike”—whatever seemed like it might help even infinitesimally to ensure the richly-deserved electoral thrashing that Trump mercifully seems to be headed for in a few weeks.

But I also, I confess, oppose the forces that apparently see Trump less as a global calamity to be averted, than as a golden opportunity to take down anything they don’t like that’s ever been spotted within a thousand-mile radius of Trump Tower.  (Where does this Kevin Bacon game end, anyway?  Do “six degrees of Trump” suffice to contaminate you?)

And not only do I not feel a shadow of a hint of a moral conflict here, but it seems to me that precisely the same liberal Enlightenment principles are behind both of these stances.

But I’d go yet further.  It sort of flabbergasts me when social-justice activists don’t understand that, if we condemn not only Trump, not only his supporters, but even vociferous Trump opponents who associate with Trump supporters (!), all we’ll do is feed the narrative that got Trumpism as far as it has—namely, that of a smug, bubble-encased, virtue-signalling leftist elite subject to runaway political correctness spirals.  Like, a hundred million Americans’ worldviews revolve around the fear of liberal persecution, and we’re going to change their minds by firing anyone who refuses to fire them?  As a recent Washington Post story illustrates, the opposite approach is harder but can bear spectacular results.

Now, as for Peter Thiel: three years ago, he funded a small interdisciplinary workshop on the coast of France that I attended.  With me there were a bunch of honest-to-goodness conservative Christians, a Freudian psychoanalyst, a novelist, a right-wing radio host, some scientists and Silicon Valley executives, and of course Thiel himself.  Each, I found, offered tons to disagree about but also some morsels to learn.

Thiel’s worldview, focused on the technological and organizational greatness that (in his view) Western civilization used to have and has subsequently lost, was a bit too dark and pessimistic for me, and I’m a pretty dark and pessimistic person.  Thiel gave a complicated, meandering lecture that involved comparing modern narratives about Silicon Valley entrepreneurs against myths of gods, heroes, and martyrs throughout history, such as Romulus and Remus (the legendary founders of Rome).  The talk might have made more sense to Thiel than to his listeners.

At the same time, Thiel’s range of knowledge and curiosity was pretty awesome.  He avidly followed all the talks (including mine, on P vs. NP and quantum complexity theory) and asked pertinent questions. When the conversation turned to D-Wave, and Thiel’s own decision not to invest in it, he laid out the conclusions he’d come to from an extremely quick look at the question, then quizzed me as to whether he’d gotten anything wrong.  He hadn’t.

From that conversation among others, I formed the impression that Thiel’s success as an investor is, at least in part, down neither to luck nor to connections, but to a module in his brain that most people lack, which makes blazingly fast and accurate judgments about tech startups.  No wonder Y Combinator would want to keep him as an adviser.

But, OK, I’m so used to the same person being spectacularly right on some things and spectacularly wrong on others, that it no longer causes even slight cognitive dissonance.  You just take the issues one by one.

I was happy, on balance, when it came out that Thiel had financed the lawsuit that brought down Gawker Media.  Gawker really had used its power to bully the innocent, and it had broken the law to do it.  And if it’s an unaccountable, anti-egalitarian, billionaire Godzilla against a vicious, privacy-violating, nerd-baiting King Kong—well then, I guess I’m with Godzilla.

More recently, I was appalled when Thiel spoke at the Republican convention, pandering to the crowd with Fox-News-style attack lines that were unworthy of a mind of his caliber.  I lost a lot of respect for Thiel that day.  But that’s the thing: unlike with literally every other speaker at the GOP convention, my respect for Thiel had started from a point that made a decrease possible.

I reject huge parts of Thiel’s worldview.  I also reject any worldview that would threaten me with ostracism for talking to Thiel, attending a workshop he sponsors, or saying anything good about him.  This is not actually a difficult balance.

Today, when it sometimes seems like much of the world has united in salivating for a cataclysmic showdown between whites and non-whites, Christians and Muslims, “dudebros” and feminists, etc., and that the salivators differ mostly just in who they want to see victorious in the coming battle and who humiliated, it can feel lonely to stick up for naïve, outdated values like the free exchange of ideas, friendly disagreement, the presumption of innocence, and the primacy of the individual over the tribe.  But those are the values that took us all the way from a bronze spear through the enemy’s heart to a snarky rebuttal on the arXiv, and they’ll continue to build anything worth building.

And now to watch the third debate (I’ll check the comments afterward)…

Update (Oct. 20): See also this post from a blog called TheMoneyIllusion. My favorite excerpt:

So let’s see. Not only should Trump be shunned for his appalling political views, an otherwise highly respected Silicon Valley entrepreneur who just happens to support Trump (along with 80 million other Americans) should also be shunned. And a person who despises Trump and works against him but who defends Thiel’s right to his own political views should also resign. Does that mean I should be shunned too? After all, I’m a guy who hates Trump, writing a post that defends a guy who hates Trump, who wrote a post defending a guy’s freedom to support Trump, who in turn supports Trump. And suppose my mother sticks up for me? Should she also be shunned?

It’s almost enough to make me vote . . . no, just kidding.

Question … Which people on the left are beyond the pale? Suppose Thiel had supported Hugo Chavez? How about Castro? Mao? Pol Pot? Perhaps the degrees of separation could be calibrated to the awfulness of the left-winger:

Chavez: One degree of separation. (Corbyn, Sean Penn, etc.)

Castro: Two degrees of separation is still toxic.

Lenin: Three degrees of separation.

Mao: Four degrees of separation.

Pol Pot: Five degrees of separation.

Scott Aaronson: The No-Cloning Theorem and the Human Condition: My After-Dinner Talk at QCRYPT

The following are the after-dinner remarks that I delivered at QCRYPT’2016, the premier quantum cryptography conference, on Thursday Sep. 15 in Washington DC.  You can compare them to my after-dinner remarks at QIP’2006 to see how much I’ve “matured” since then. Thanks so much to Yi-Kai Liu and the other organizers for inviting me and for putting on a really fantastic conference.

It’s wonderful to be here at QCRYPT among so many friends—this is the first significant conference I’ve attended since I moved from MIT to Texas. I do, however, need to register a complaint with the organizers, which is: why wasn’t I allowed to bring my concealed firearm to the conference? You know, down in Texas, we don’t look too kindly on you academic elitists in Washington DC telling us what to do, who we can and can’t shoot and so forth. Don’t mess with Texas! As you might’ve heard, many of us Texans even support a big, beautiful, physical wall being built along our border with Mexico. Personally, though, I don’t think the wall proposal goes far enough. Forget about illegal immigration and smuggling: I don’t even want Americans and Mexicans to be able to win the CHSH game with probability exceeding 3/4. Do any of you know what kind of wall could prevent that? Maybe a metaphysical wall.
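The 3/4 there isn’t a joke number, by the way: it’s the classical bound for the CHSH game, and you can verify it by brute force, since it suffices to check all deterministic strategies. A minimal sketch (entangled players can reach cos²(π/8) ≈ 0.854, which this purely classical check of course won’t show):

```python
from itertools import product

# CHSH game: the referee sends bit x to Alice and bit y to Bob; they answer
# bits a and b without communicating, and win iff a XOR b == x AND y.
# Classically it suffices to maximize over deterministic strategies:
# Alice answers a0 on x=0, a1 on x=1; Bob answers b0 on y=0, b1 on y=1.
best = 0.0
for a0, a1, b0, b1 in product([0, 1], repeat=4):
    wins = 0
    for x, y in product([0, 1], repeat=2):
        a = a1 if x else a0          # Alice's answer depends only on x
        b = b1 if y else b0          # Bob's answer depends only on y
        if (a ^ b) == (x & y):
            wins += 1
    best = max(best, wins / 4)

print(best)  # 0.75 -- no classical strategy wins more than 3/4 of the time
```

Every deterministic strategy satisfies at most 3 of the 4 win conditions (they form an inconsistent system of XOR constraints), which is exactly what the exhaustive search confirms.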

OK, but that’s not what I wanted to talk about. When Yi-Kai asked me to give an after-dinner talk, I wasn’t sure whether to try to say something actually relevant to quantum cryptography or just make jokes. So I’ll do something in between: I’ll tell you about research directions in quantum cryptography that are also jokes.

The subject of this talk is a deep theorem that stands as one of the crowning achievements of our field. I refer, of course, to the No-Cloning Theorem. Almost everything we’re talking about at this conference, from QKD onwards, is based in some way on quantum states being unclonable. If you read Stephen Wiesner’s paper from 1968, which founded quantum cryptography, the No-Cloning Theorem already played a central role—although Wiesner didn’t call it that. By the way, here’s my #1 piece of research advice to the students in the audience: if you want to become immortal, just find some fact that everyone already knows and give it a name!

I’d like to pose the question: why should our universe be governed by physical laws that make the No-Cloning Theorem true? I mean, it’s possible that there’s some other reason for our universe to be quantum-mechanical, and No-Cloning is just a byproduct of that. No-Cloning would then be like the armpit of quantum mechanics: not there because it does anything useful, but just because there’s gotta be something under your arms.

OK, but No-Cloning feels really fundamental. One of my early memories is when I was 5 years old or so, and utterly transfixed by my dad’s home fax machine, one of those crappy 1980s fax machines with wax paper. I kept thinking about it: is it really true that a piece of paper gets transmaterialized, sent through a wire, and reconstituted at the other location? Could I have been that wrong about how the universe works? Until finally I got it—and once you get it, it’s hard even to recapture your original confusion, because it becomes so obvious that the world is made not of stuff but of copyable bits of information. “Information wants to be free!”

The No-Cloning Theorem represents nothing less than a partial return to the view of the world that I had before I was five. It says that quantum information doesn’t want to be free: it wants to be private. There is, it turns out, a kind of information that’s tied to a particular place, or set of places. It can be moved around, or even teleported, but it can’t be copied the way a fax machine copies bits.

So I think it’s worth at least entertaining the possibility that we don’t have No-Cloning because of quantum mechanics; we have quantum mechanics because of No-Cloning—or because quantum mechanics is the simplest, most elegant theory that has unclonability as a core principle. But if so, that just pushes the question back to: why should unclonability be a core principle of physics?
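For reference, the theorem we’re elevating to a core principle has a famously short proof from unitarity alone, which I can’t resist sketching:

```latex
% Suppose a single unitary U cloned every pure state:
%   U |\psi\rangle |0\rangle = |\psi\rangle |\psi\rangle  for all  |\psi\rangle.
% Taking the inner product of the cloning equations for two states
% |\psi\rangle and |\varphi\rangle, and using that U preserves inner products:
\langle \varphi | \psi \rangle
  \;=\; \big( \langle \varphi | \langle 0 | \big)\, U^\dagger U \,\big( | \psi \rangle | 0 \rangle \big)
  \;=\; \big( \langle \varphi | \langle \varphi | \big) \big( | \psi \rangle | \psi \rangle \big)
  \;=\; \langle \varphi | \psi \rangle^2 .
% Hence \langle \varphi | \psi \rangle \in \{0, 1\}: only mutually orthogonal
% (or identical) states can be cloned, so no universal cloner exists.
```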

Quantum Key Distribution

A first suggestion about this question came from Gilles Brassard, who’s here. Years ago, I attended a talk by Gilles in which he speculated that the laws of quantum mechanics are what they are because Quantum Key Distribution (QKD) has to be possible, while bit commitment has to be impossible. If true, that would be awesome for the people at this conference. It would mean that, far from being this exotic competitor to RSA and Diffie-Hellman that’s distance-limited and bandwidth-limited and has a tiny market share right now, QKD would be the entire reason why the universe is as it is! Or maybe what this really amounts to is an appeal to the Anthropic Principle. Like, if QKD hadn’t been possible, then we wouldn’t be here at QCRYPT to talk about it.

Quantum Money

But maybe we should search more broadly for the reasons why our laws of physics satisfy a No-Cloning Theorem. Wiesner’s paper sort of hinted at QKD, but the main thing it had was a scheme for unforgeable quantum money. This is one of the most direct uses imaginable for the No-Cloning Theorem: to store economic value in something that it’s physically impossible to copy. So maybe that’s the reason for No-Cloning: because God wanted us to have e-commerce, and didn’t want us to have to bother with blockchains (and certainly not with credit card numbers).

The central difficulty with quantum money is: how do you authenticate a bill as genuine? (OK, fine, there’s also the difficulty of how to keep a bill coherent in your wallet for more than a microsecond or whatever. But we’ll leave that for the engineers.)

In Wiesner’s original scheme, he solved the authentication problem by saying that, whenever you want to verify a quantum bill, you bring it back to the bank that printed it. The bank then looks up the bill’s classical serial number in a giant database, which tells the bank in which basis to measure each of the bill’s qubits.

With this system, you can actually get information-theoretic security against counterfeiting. OK, but the fact that you have to bring a bill to the bank to be verified negates much of the advantage of quantum money in the first place. If you’re going to keep involving a bank, then why not just use a credit card?
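The arithmetic behind that security claim fits in a few lines. Here’s a toy sketch, working out pass probabilities analytically rather than simulating qubits; the per-qubit numbers are the standard ones, and Molina, Vidick, and Watrous later showed that (3/4)ⁿ is in fact the optimal counterfeiting probability in this model:

```python
# Toy model of Wiesner's scheme: each qubit encodes a bit in one of two
# conjugate bases (rectilinear or diagonal). The bank's database stores
# (basis, bit) for each qubit; verification measures in the stored basis
# and checks that the stored bit comes out.

def honest_pass_prob(n):
    # A genuine bill is measured in the same basis it was prepared in,
    # so every one of the n qubits yields the stored bit with certainty.
    return 1.0

def measure_resend_pass_prob(n):
    # A counterfeiter who measures each qubit in a random basis and resends:
    #  - right basis (prob 1/2): she learns the bit exactly and passes for sure;
    #  - wrong basis (prob 1/2): the resent qubit is in the wrong basis, so the
    #    bank's measurement returns the stored bit with probability only 1/2.
    per_qubit = 0.5 * 1.0 + 0.5 * 0.5   # = 3/4 per qubit
    return per_qubit ** n

print(honest_pass_prob(20))           # 1.0
print(measure_resend_pass_prob(20))   # (3/4)^20, about 0.003
```

So even a modest number of qubits per bill drives the forger’s success probability to essentially zero, with no computational assumptions anywhere.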

That’s why over the past decade, some of us have been working on public-key quantum money: that is, quantum money that anyone can verify. For this kind of quantum money, it’s easy to see that the No-Cloning Theorem is no longer enough: you also need some cryptographic assumption. But OK, we can consider that. In recent years, we’ve achieved glory by proposing a huge variety of public-key quantum money schemes—and we’ve achieved even greater glory by breaking almost all of them!

After a while, there were basically two schemes left standing: one based on knot theory by Ed Farhi, Peter Shor, et al. That one has been proven to be secure under the assumption that it can’t be broken. The second scheme, which Paul Christiano and I proposed in 2012, is based on hidden subspaces encoded by multivariate polynomials. For our scheme, Paul and I were able to do better than Farhi et al.: we gave a security reduction. That is, we proved that our quantum money scheme is secure, unless there’s a polynomial-time quantum algorithm to find hidden subspaces encoded by low-degree multivariate polynomials (yadda yadda, you can look up the details) with much greater success probability than we thought possible.

Today, the situation is that my and Paul’s security proof remains completely valid, but meanwhile, our money is completely insecure! Our reduction means the opposite of what we thought it did. There is a break of our quantum money scheme, and as a consequence, there’s also a quantum algorithm to find large subspaces hidden by low-degree polynomials with much better success probability than we’d thought. What happened was that first, some French algebraic cryptanalysts—Faugere, Pena, I can’t pronounce their names—used Gröbner bases to break the noiseless version of the scheme, in classical polynomial time. So I thought, phew! At least I’d acceded when Paul insisted that we also include a noisy version of the scheme. But later, Paul noticed that there’s a quantum reduction from the problem of breaking our noisy scheme to the problem of breaking the noiseless one, so the former is broken as well.

I’m choosing to spin this positively: “we used quantum money to discover a striking new quantum algorithm for finding subspaces hidden by low-degree polynomials. Err, yes, that’s exactly what we did.”

But, bottom line, until we manage to invent a better public-key quantum money scheme, or otherwise sort this out, I don’t think we’re entitled to claim that God put unclonability into our universe in order for quantum money to be possible.

Copy-Protected Quantum Software

So if not money, then what about its cousin, copy-protected software—could that be why No-Cloning holds? By copy-protected quantum software, I just mean a quantum state that, if you feed it into your quantum computer, lets you evaluate some Boolean function on any input of your choice, but that doesn’t let you efficiently prepare more states that let the same function be evaluated. I think this is important as one of the preeminent evil applications of quantum information. Why should nuclear physicists and genetic engineers get a monopoly on the evil stuff?

OK, but is copy-protected quantum software even possible? The first worry you might have is that, yeah, maybe it’s possible, but then every time you wanted to run the quantum program, you’d have to make a measurement that destroyed it. So then you’d have to go back and buy a new copy of the program for the next run, and so on. Of course, to the software company, this would presumably be a feature rather than a bug!

But as it turns out, there’s a fact many of you know—sometimes called the “Gentle Measurement Lemma,” other times the “Almost As Good As New Lemma”—which says that, as long as the outcome of your measurement on a quantum state could be predicted almost with certainty given knowledge of the state, the measurement can be implemented in such a way that it hardly damages the state at all. This tells us that, if quantum money, copy-protected quantum software, and the other things we’re talking about are possible at all, then they can also be made reusable. I summarize the principle as: “if rockets, then space shuttles.”
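In symbols, one common formulation of the lemma reads as follows (the exact constants differ slightly from paper to paper, so treat this as a sketch):

```latex
% Gentle Measurement / Almost As Good As New Lemma (one common form):
% if a two-outcome measurement \{M, I - M\} accepts the state \rho with
% probability  \mathrm{Tr}(M \rho) \ge 1 - \varepsilon,  then the
% post-measurement state conditioned on acceptance,
\rho' \;=\; \frac{\sqrt{M}\, \rho\, \sqrt{M}}{\mathrm{Tr}(M \rho)},
% is close to the original in trace distance:
\tfrac{1}{2} \left\lVert \rho - \rho' \right\rVert_1 \;\le\; \sqrt{\varepsilon},
% so a near-certain measurement barely disturbs the state, and the
% measured state can be reused.
```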

Much like with quantum money, one can show that, relative to a suitable oracle, it’s possible to quantumly copy-protect any efficiently computable function—or rather, any function that’s hard to learn from its input/output behavior. Indeed, the implementation can be not only copy-protected but also obfuscated, so that the user learns nothing besides the input/output behavior. As Bill Fefferman pointed out in his talk this morning, the No-Cloning Theorem lets us bypass Barak et al.’s famous result on the impossibility of obfuscation, because their impossibility proof assumed the ability to copy the obfuscated program.

Of course, what we really care about is whether quantum copy-protection is possible in the real world, with no oracle. I was able to give candidate implementations of quantum copy-protection for extremely special functions, like one that just checks the validity of a password. In the general case—that is, for arbitrary programs—Paul Christiano has a beautiful proposal for how to do it, which builds on our hidden-subspace money scheme. Unfortunately, since our money scheme is currently in the shop being repaired, it’s probably premature to think about the security of the much more complicated copy-protection scheme! But these are wonderful open problems, and I encourage any of you to come and scoop us. Once we know whether uncopyable quantum software is possible at all, we could then debate whether it’s the “reason” for our universe to have unclonability as a core principle.

Unclonable Proofs and Advice

Along the same lines, I can’t resist mentioning some favorite research directions, which some enterprising student here could totally turn into a talk at next year’s QCRYPT.

Firstly, what can we say about clonable versus unclonable quantum proofs—that is, QMA witness states? In other words: for which problems in QMA can we ensure that there’s an accepting witness that lets you efficiently create as many additional accepting witnesses as you want? (I mean, besides the QCMA problems, the ones that have short classical witnesses?) For which problems in QMA can we ensure that there’s an accepting witness that doesn’t let you efficiently create any additional accepting witnesses? I do have a few observations about these questions—ask me if you’re interested—but on the whole, I believe almost anything one can ask about them remains open.

Admittedly, it’s not clear how much use an unclonable proof would be. Like, imagine a quantum state that encoded a proof of the Riemann Hypothesis, and which you would keep in your bedroom, in a glass orb on your nightstand or something. And whenever you felt your doubts about the Riemann Hypothesis resurfacing, you’d take the state out of its orb and measure it again to reassure yourself of RH’s truth. You’d be like, “my preciousssss!” And no one else could copy your state and thereby gain the same Riemann-faith-restoring powers that you had. I dunno, I probably won’t hawk this application in a DARPA grant.

Similarly, one can ask about clonable versus unclonable quantum advice states—that is, initial states that are given to you to boost your computational power beyond that of an ordinary quantum computer. And that’s also a fascinating open problem.

OK, but maybe none of this quite gets at why our universe has unclonability. And this is an after-dinner talk, so do you want me to get to the really crazy stuff? Yes?

Self-Referential Paradoxes

OK! What if unclonability is our universe’s way around the paradoxes of self-reference, like the unsolvability of the halting problem and Gödel’s Incompleteness Theorem? Allow me to explain what I mean.

In kindergarten or wherever, we all learn Turing’s proof that there’s no computer program to solve the halting problem. But what isn’t usually stressed is that that proof actually does more than advertised. If someone hands you a program that they claim solves the halting problem, Turing doesn’t merely tell you that that person is wrong—rather, he shows you exactly how to expose the person as a jackass, by constructing an example input on which their program fails. All you do is, you take their claimed halt-decider, modify it in some simple way, and then feed the result back to the halt-decider as input. You thereby create a situation where, if your program halts given its own code as input, then it must run forever, and if it runs forever then it halts. “WHOOOOSH!” [head-exploding gesture]
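The jackass-exposure procedure is short enough to write out. Here’s a Python sketch, where `halts` stands in for someone’s claimed (and necessarily wrong) halt-decider; to keep the demo runnable, we feed it a decider that predicts nothing ever halts:

```python
# Turing's diagonal construction: given any claimed halt-decider
# `halts(program)`, build a program on which it provably errs.

def make_counterexample(halts):
    def diagonal():
        # Do the opposite of whatever the decider predicts about us.
        if halts(diagonal):
            while True:      # decider said "halts" -> run forever
                pass
        # decider said "runs forever" -> halt immediately
    return diagonal

# Expose a (wrong) claimed decider that predicts nothing ever halts:
claimed_decider = lambda program: False
d = make_counterexample(claimed_decider)

prediction = claimed_decider(d)   # False: "d runs forever"
d()                               # ...but d returns immediately
print("decider predicted halts =", prediction, "yet d actually halted")
```

Had the decider instead answered True on `d`, then `d` would loop forever, again contradicting the prediction; either way, the counterexample is explicit. That explicitness is exactly what the quantum version below threatens to take away.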

OK, but now imagine that the program someone hands you, which they claim solves the halting problem, is a quantum program. That is, it’s a quantum state, which you measure in some basis depending on the program you’re interested in, in order to decide whether that program halts. Well, the truth is, this quantum program still can’t work to solve the halting problem. After all, there’s some classical program that simulates the quantum one, albeit less efficiently, and we already know that the classical program can’t work.

But now consider the question: how would you actually produce an example input on which this quantum program failed to solve the halting problem? Like, suppose the program worked on every input you tried. Then ultimately, to produce a counterexample, you might need to follow Turing’s proof and make a copy of the claimed quantum halt-decider. But then, of course, you’d run up against the No-Cloning Theorem!

So we seem to arrive at the conclusion that, while of course there’s no quantum program to solve the halting problem, there might be a quantum program for which no one could explicitly refute that it solved the halting problem, by giving a counterexample.

I was pretty excited about this observation for a day or two, until I noticed the following. Let’s suppose your quantum program that allegedly solves the halting problem has n qubits. Then it’s possible to prove that the program can’t possibly be used to compute more than, say, 2n bits of Chaitin’s constant Ω, which is the probability that a random program halts. OK, but if we had an actual oracle for the halting problem, we could use it to compute as many bits of Ω as we wanted. So, suppose I treated my quantum program as if it were an oracle for the halting problem, and I used it to compute the first 2n bits of Ω. Then I would know that, assuming the truth of quantum mechanics, the program must have made a mistake somewhere. There would still be something weird, which is that I wouldn’t know on which input my program had made an error—I would just know that it must’ve erred somewhere! With a bit of cleverness, one can narrow things down to two inputs, such that the quantum halt-decider must have erred on at least one of them. But I don’t know whether it’s possible to go further, and concentrate the wrongness on a single query.

We can play a similar game with other famous applications of self-reference. For example, suppose we use a quantum state to encode a system of axioms. Then that system of axioms will still be subject to Gödel’s Incompleteness Theorem (which I guess I believe despite the umlaut). If it’s consistent, it won’t be able to prove all the true statements of arithmetic. But we might never be able to produce an explicit example of a true statement that the axioms don’t prove. To do so we’d have to clone the state encoding the axioms and thereby violate No-Cloning.

Personal Identity

But since I’m a bit drunk, I should confess that all this stuff about Gödel and self-reference is just a warmup to what I really wanted to talk about, which is whether the No-Cloning Theorem might have anything to do with the mysteries of personal identity and “free will.” I first encountered this idea in Roger Penrose’s book, The Emperor’s New Mind. But I want to stress that I’m not talking here about the possibility that the brain is a quantum computer—much less about the possibility that it’s a quantum-gravitational hypercomputer that uses microtubules to solve the halting problem! I might be drunk, but I’m not that drunk. I also think that the Penrose-Lucas argument, based on Gödel’s Theorem, for why the brain has to work that way is fundamentally flawed.

But here I’m talking about something different. See, I have a lot of friends in the Singularity / Friendly AI movement. And I talk to them whenever I pass through the Bay Area, which is where they congregate. And many of them express great confidence that before too long—maybe in 20 or 30 years, maybe in 100 years—we’ll be able to upload ourselves to computers and live forever on the Internet (as opposed to just living 70% of our lives on the Internet, like we do today).

This would have lots of advantages. For example, any time you were about to do something dangerous, you’d just make a backup copy of yourself first. If you were struggling with a conference deadline, you’d spawn 100 temporary copies of yourself. If you wanted to visit Mars or Jupiter, you’d just email yourself there. If Trump became president, you’d not run yourself for 8 years (or maybe 80 or 800 years). And so on.

Admittedly, some awkward questions arise. For example, let’s say the hardware runs three copies of your code and takes a majority vote, just for error-correcting purposes. Does that bring three copies of you into existence, or only one copy? Or let’s say your code is run homomorphically encrypted, with the only decryption key stored in another galaxy. Does that count? Or you email yourself to Mars. If you want to make sure that you’ll wake up on Mars, is it important that you delete the copy of your code that remains on earth? Does it matter whether anyone runs the code or not? And what exactly counts as “running” it? Or my favorite one: could someone threaten you by saying, “look, I have a copy of your code, and if you don’t do what I say, I’m going to make a thousand copies of it and subject them all to horrible tortures?”

The issue, in all these cases, is that in a world where there could be millions of copies of your code running on different substrates in different locations—or things where it’s not even clear whether they count as a copy or not—we don’t have a principled way to take as input a description of the state of the universe, and then identify where in the universe you are—or even a probability distribution over places where you could be. And yet you seem to need such a way in order to make predictions and decisions.

A few years ago, I wrote this gigantic, post-tenure essay called The Ghost in the Quantum Turing Machine, where I tried to make the point that we don’t know at what level of granularity a brain would need to be simulated in order to duplicate someone’s subjective identity. Maybe you’d only need to go down to the level of neurons and synapses. But if you needed to go all the way down to the molecular level, then the No-Cloning Theorem would immediately throw a wrench into most of the paradoxes of personal identity that we discussed earlier.

For it would mean that there were some microscopic yet essential details about each of us that were fundamentally uncopyable, localized to a particular part of space. We would all, in effect, be quantumly copy-protected software. Each of us would have a core of unpredictability—not merely probabilistic unpredictability, like that of a quantum random number generator, but genuine unpredictability—that an external model of us would fail to capture completely. Of course, by having futuristic nanorobots scan our brains and so forth, it would be possible in principle to make extremely realistic copies of us. But those copies necessarily wouldn’t capture quite everything. And, one can speculate, maybe not enough for your subjective experience to “transfer over.”

Maybe the most striking aspect of this picture is that sure, you could teleport yourself to Mars—but to do so you’d need to use quantum teleportation, and as we all know, quantum teleportation necessarily destroys the original copy of the teleported state. So we’d avert this metaphysical crisis about what to do with the copy that remained on Earth.

Look—I don’t know if any of you are like me, and have ever gotten depressed by reflecting that all of your life experiences, all your joys and sorrows and loves and losses, every itch and flick of your finger, could in principle be encoded by a huge but finite string of bits, and therefore by a single positive integer. (Really? No one else gets depressed about that?) It’s kind of like: given that this integer has existed since before there was a universe, and will continue to exist after the universe has degenerated into a thin gruel of radiation, what’s the point of even going through the motions? You know?

But the No-Cloning Theorem raises the possibility that at least this integer is really your integer. At least it’s something that no one else knows, and no one else could know in principle, even with futuristic brain-scanning technology: you’ll always be able to surprise the world with a new digit. I don’t know if that’s true or not, but if it were true, then it seems like the sort of thing that would be worthy of elevating unclonability to a fundamental principle of the universe.

So as you enjoy your dinner and dessert at this historic Mayflower Hotel, I ask you to reflect on the following. People can photograph this event, they can video it, they can type up transcripts, in principle they could even record everything that happens down to the millimeter level, and post it on the Internet for posterity. But they’re not gonna get the quantum states. There’s something about this evening, like about every evening, that will vanish forever, so please savor it while it lasts. Thank you.

Update (Sep. 20): Unbeknownst to me, Marc Kaplan did video the event and put it up on YouTube! Click here to watch. Thanks very much to Marc! I hope you enjoy, even though of course, the video can’t precisely clone the experience of having been there.

[Note: The part where I raise my middle finger is an inside joke—one of the speakers during the technical sessions inadvertently did the same while making a point, causing great mirth in the audience.]

December 09, 2016

John BaezModelling Interconnected Systems with Decorated Corelations

Here at the Simons Institute workshop on compositionality, my talk on network theory explained how to use ‘decorated cospans’ as a general model of open systems. These were invented by Brendan Fong, and are nicely explained in his thesis:

• Brendan Fong, The Algebra of Open and Interconnected Systems. (Blog article here.)

But he went further: to understand the externally observable behavior of an open system we often want to simplify a decorated cospan and get another sort of structure, which he calls a ‘decorated corelation’.

In this talk, Brendan explained decorated corelations and what they’re good for:

Abstract. Hypergraph categories are monoidal categories in which every object is equipped with a special commutative Frobenius monoid. Morphisms in a hypergraph category can hence be represented by string diagrams in which strings can branch and split: diagrams that are reminiscent of electrical circuit diagrams. As such they provide a framework for formalising the syntax and semantics of circuit-type diagrammatic languages. In this talk I will introduce decorated corelations as a tool for building hypergraph categories and hypergraph functors, drawing examples from linear algebra and dynamical systems.

David Hoggforegrounds and optimization

It was a low-research day today. But I did get in a short and valuable discussion of CMB foregrounds with Boris Leistedt (NYU). The approach I want to pursue is to make a latent-variable model that posits a set of scalar fields, plus nonlinear functions that convert them into (high-resolution) maps, which are compared to the data through the relevant beams. I think this will (almost provably) beat current approaches. I also had some conversations with Bedell about optimization. We are trying to fit for stellar spectra and radial velocities, and (as usual) we are finding that out-of-the-box optimizers don't work well!

Robert HellingWorkshop on IoT Liability at 33c3

After my recent blog post on the dangers of liability for manufacturers of devices in the age of IoT, I decided to run a workshop at 33C3, the annual hacker conference of the Chaos Computer Club. I am proud that I could convince Ulf Buermeyer (well-known judge, expert in constitutional law, hacker, podcaster) to host this workshop with me.

My main motivation is the hope that this will be a big issue in the coming year, while it is still early enough to influence policy before everybody commits to their favorite (snake oil) solution.

I have started collecting and sorting ideas in a Google document.

BackreactionNo, physicists have no fear of math. But they should have more respect.

Heart curve. [Img Src]
“Even physicists are ‘afraid’ of mathematics,” a recent headline screamed at me. This, I thought, is ridiculous. You can accuse physicists of many stupidities, but being afraid of math isn’t one of them.

But the headline was supposedly based on scientific research. Someone, somewhere, had written a paper claiming that physicists are more likely to cite papers which are light on math. So, I put aside my confirmation bias and read the paper. It was more interesting than expected.

The paper in question, it turned out, didn’t show that physicists are afraid of math. Instead, it was a reply to a comment on an analysis of an earlier paper which had claimed that biologists are afraid of math.

The original paper, “Heavy use of equations impedes communication among biologists,” was published in 2012 by Tim Fawcett and Andrew Higginson, both at the Centre for Research in Animal Behaviour at the University of Exeter. They analyzed a sample of 649 papers published in the top journals in ecology and evolution and looked for a correlation between the density of equations (equations per text) and the number of citations. They found a statistically significant negative correlation: Papers with a higher density of equations were less cited.

Unexpectedly, a group of physicists came to the defense of biologists. In a paper published last year under the title “Are physicists afraid of mathematics?” Jonathan Kollmer, Thorsten Pöschel, and Jason Gallas set out to demonstrate that the statistics underlying the conclusion that biologists are afraid of math were fundamentally flawed. With these methods, the authors claimed, you could show anything, even that physicists are afraid of math. Which is surely absurd. Right? They argued that Fawcett and Higginson had arrived at a wrong conclusion because they had sorted their data into peculiar and seemingly arbitrarily chosen bins.

It’s a good point to make. The chance of finding a correlation with at least one of many binnings is much higher than the chance of finding it with one particular binning chosen in advance. Therefore, you can easily screw up measures of statistical significance if you allow a search for a correlation over different binnings.

As an example, Kollmer et al. used a sample of papers from Physical Review Letters (PRL) and showed that, with the bins used by Fawcett and Higginson, physicists too could be said to be afraid of math. Alas, the correlation goes away with a finer binning and hence is meaningless.
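To see the effect being pointed at here, consider a small self-contained simulation (my own illustration, not from either paper): synthetic “papers” whose equation density and citation counts are drawn independently, so any correlation between binned means is pure noise. A binning chosen in advance typically gives a modest correlation, but searching over many binnings still turns up an impressively strong one.

```python
import random
import statistics

random.seed(0)

# Hypothetical synthetic data: equation density and citation counts
# drawn independently, so there is no true correlation at all.
n_papers = 649
density = [random.random() for _ in range(n_papers)]
citations = [random.expovariate(1 / 20) for _ in range(n_papers)]

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def binned_r(edges):
    """Correlation between bin midpoints and mean citations per bin."""
    mids, means = [], []
    for lo, hi in zip(edges, edges[1:]):
        ys = [c for d, c in zip(density, citations) if lo <= d < hi]
        if ys:  # skip the (rare) empty bin
            mids.append((lo + hi) / 2)
            means.append(statistics.fmean(ys))
    return pearson(mids, means)

# One binning fixed in advance: whatever correlation it shows is chance.
fixed = binned_r([i / 6 for i in range(7)])

# Now search over many random binnings and keep the strongest correlation.
best = 0.0
for _ in range(500):
    k = random.randint(3, 8)
    cuts = sorted(random.sample([i / 100 for i in range(1, 100)], k - 1))
    best = max(best, abs(binned_r([0.0] + cuts + [1.0])))

print(f"|r| with one pre-chosen binning: {abs(fixed):.2f}")
print(f"max |r| over 500 binnings:      {best:.2f}")
```

With only a handful of bins there are so few binned means that a near-perfect correlation shows up somewhere in the search by chance alone, which is exactly why a significance measure computed as if the binning had been fixed in advance doesn’t survive a binning search.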

PRL, for those not familiar with it, is one of the most highly ranked journals in physics generally. It publishes papers from all subfields that are of broad interest to the community. PRL also has a strictly enforced page limit: You have to squeeze everything on four pages – an imo completely idiotic policy that more often than not means the authors have to publish a longer, comprehensible, paper elsewhere.

The paper that now made headlines is a reply by the authors of the original study to the physicists who criticized it. Fawcett and Higginson explain that the physicists’ data analysis is too naïve. They point out that the citation rates have a pronounced rich-get-richer trend which amplifies any initial differences. This leads to an ‘overdispersed’ data set in which the standard errors are misleading. In that case, a more complicated statistical analysis is necessary, which is the type of analysis they had done in the original paper. The arbitrary-seeming bins were just chosen to visualize the results, they write, and their finding is independent of that choice.

Fawcett and Higginson then repeated the same analysis on the physics papers and revealed a clear trend: Physicists too are more likely to cite papers with a smaller density of equations!

I have to admit this doesn’t surprise me much. A paper with fewer verbal explanations per equation assumes the reader is more familiar with the particular formalism being used, and this means the target audience shrinks. The consequence is fewer citations.

But this doesn’t mean physicists are afraid of math, it merely means they have to decide which calculations are worth their time. If it’s a topic they might never have an application for, making their way through a paper heavy on math might not be so helpful for advancing their research. On the other hand, reading a more general introduction or short survey with fewer equations might be useful even on topics farther from one’s own research. These citation habits therefore show mostly that the more specialized a paper, the fewer people will read it.

I had a brief exchange with Andrew Higginson, one of the authors of the paper that’s been headlined as “Physicists are afraid of math.” He emphasizes that their point was that “busy scientists might not have time to digest lots of equations without accompanying text.” But I don’t think that’s the right conclusion to draw. Busy scientists who are familiar with the equations might not have the time to digest much text, and busy scientists might not have the time to digest long papers, period. (The corresponding author of the physicists’ study did not respond to my request for comment.)

In their recent reply, Fawcett and Higginson suggest that “an immediate, pragmatic solution to this apparent problem would be to reduce the density of equations and add explanatory text for non-specialised readers.”

I’m not sure, however, there is any problem here in need of being solved. Adding text for non-specialized readers might be cumbersome for the specialized readers. I understand the risk that the current practice exaggerates the already pronounced specialization, which can hinder communication. But this, I think, would be better taken care of by reviews and overview papers to be referenced in the, typically short, papers on recent research.

So, I don’t think physicists are afraid of math. Indeed, it sometimes worries me how much and how uncritically they love math.

Math can do a lot of things for you, but in the end it’s merely a device to derive consequences from assumptions. Physics isn’t math, however, and physics papers don’t work by theorems and proofs. Theoretical physicists pride themselves on their intuition and frequently take the liberty of shortcutting mathematical proofs by drawing on experience. This, however, amounts to making additional assumptions, for example that a certain relation holds or an expansion is well-defined.

That works well as long as these assumptions are used to arrive at testable predictions. In that case it matters only if the theory works, and the mathematical rigor can well be left to mathematical physicists for clean-up, which is how things went historically.

But today in the foundations of physics, theory-development proceeds largely without experimental feedback. In such cases, keeping track of assumptions is crucial – otherwise it becomes impossible to tell what really follows from what. Or, I should say, it would be crucial because theoretical physicists are bad at this.

The result is that some research areas can amass loosely connected arguments that follow from a set of assumptions that aren’t written down anywhere. This might result in an entirely self-consistent construction that nevertheless has nothing to do with reality. If the underlying assumptions aren’t written down anywhere, the result is conceptual mud in which we can’t tell philosophy from mathematics.

One such unwritten assumption that is widely used, for example, is the absence of finetuning or that a physical theory be “natural.” This assumption isn’t supported by evidence and it can’t be mathematically derived. Hence, it should be treated as a hypothesis - but that isn’t happening because the assumption itself isn’t recognized for what it is.

Another unwritten assumption is that more fundamental theories should somehow be simpler. This is reflected for example in the belief that the gauge couplings of the standard model should meet in one point. That’s an assumption; it isn’t supported by evidence. And yet it’s not treated as a hypothesis but as a guide to theory-development.

And all presently existing research on the quantization of gravity rests on the assumption that quantum theory itself remains unmodified at short distance scales. This is another assumption that isn’t written down anywhere. Should that turn out to be not true, decades of research will have been useless.

In the absence of experimental guidance, what we need in the foundations of physics is conceptual clarity. We need rigorous math, not appeals to experience, intuition, and aesthetics. Don’t be afraid: we need more math.

David Hoggbetter exoplanet searches through chemistry

Today I had the great privilege of spending the day with the group of Karin Öberg (CfA) at Harvard and also the NG Next team. Öberg's group is doing so many great things, related to astronomical observations of proto-planetary disks and also real lab experiments on ices and solid-state chemistry relevant to interstellar and accretion-disk physical conditions. Here are some highlights:

Ellen Price (CfA) showed a consistent chemical model in which they evolve the molecular contents of gas as it orbits in the evolving accretion disk. This is based on an NLTE chemical model built by Ilse Cleeves (CfA). She can see big changes as gas crosses the snow line (or various snow lines for different species). Edith Fayolle (CfA) showed absolutely incredible ALMA observations they have (with Cleeves and also Ryan Loomis, CfA) of proto-planetary disks around young stars. In these observations, there is so much that I could fill a whole separate set of blog posts: They see various kinds of organics that weren't expected to be formed in abiotic conditions. They also can image the disk in two spatial dimensions and the radial velocity dimension in thousands of chemical species. This is unprecedented detail on a disk, and also unprecedented information about molecules in these conditions. We discussed ways we could simultaneously model all of this and make very sensitive measurements of what is going on at various places in the disk. As part of this discussion, Öberg and I discussed the problem that there isn't a good out-of-the-box imaging pipeline for ALMA, in part because different users with different targets have very different priors and goals.

But then we switched to lab stuff! Mahesh Rajappan (CfA) described the Öberg-lab experimental setups, in which they can deposit ices, including multilayer things, and then irradiate or heat them, to measure solid-state chemistry processes directly. Jennifer Bergner (CfA) is doing lab experiments to find and measure configurational rate constants for chemical processes in ices. These rates relate to the processes by which molecules find one another and reorient to permit solid-state reactions to take place. She was working in particular on O+CH4 to CH3OH. One theme of the day's conversations is that the organic chemistry of proto-planetary disks is seriously complex and contains everything that is needed for life (we think).

At the end of the day we discussed research synergies. I think the biggest is in building consistent models of the thousands of molecules, in the dynamical disk. One incredible idea is that a forming gas-giant planet should be hot (gravitational or accretion energy); this could affect the local chemistry in the disk: We could see the thermal signature of a forming planet in molecular species! That's a great goal for the near future. Öberg's group (and especially Cleeves and Loomis) have the data in hand, or coming when ALMA gets to their targets.

December 08, 2016

Jordan EllenbergWomen in math: accountability

I’ve talked about women in math a lot on this blog and maybe you think of me as someone who is aware of and resistant to sexism in our profession.  But what if we look at some actual numbers?

My Ph.D. students:  2 out of 15 are women.

Coauthors, last 5 years: 2 out of 23 are women.

Letters posted on MathJobs, last 2 years:  3 out of 24 are women.

That is sobering.  I’m hesitant about posting this, but I think it’s a good idea for senior people to look at their own numbers and get some sense of how much they’re actually doing to support early-career women in the profession.

Update:  I removed the numbers for tenure/promotion letters.  A correspondent pointed out that these, unlike the other items, are supposed to be confidential, and given the small numbers are at least partially de-anonymizable.

John BaezSemantics for Physicists

I once complained that my student Brendan Fong said ‘semantics’ too much. You see, I’m in a math department, but he was actually in the computer science department at Oxford: I was his informal supervisor. Theoretical computer scientists love talking about syntax versus semantics—that is, written expressions versus what those expressions actually mean, or programs versus what those programs actually do. So Brendan was very comfortable with that distinction. But my other grad students, coming from a math department, didn’t understand it… and he was mentioning it in practically every other sentence.

In 1963, in his PhD thesis, Bill Lawvere figured out a way to talk about syntax versus semantics that even mathematicians—well, even category theorists—could understand. It’s called ‘functorial semantics’. The idea is that things you write are morphisms in some category X, while their meanings are morphisms in some other category Y. There’s a functor F \colon X \to Y which sends things you write to their meanings. This functor sends syntax to semantics!
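A toy illustration in code (my own sketch, not Lawvere’s actual construction): take the “syntax” to be a little language of arithmetic expressions, and the “semantics” to be the numbers those expressions denote. The eval function then plays the role of the functor F, and compositionality is the statement that the meaning of a compound expression is built from the meanings of its parts.

```python
from dataclasses import dataclass
from typing import Union

# Syntax: a tiny language of arithmetic expressions.
Expr = Union["Lit", "Add", "Mul"]

@dataclass
class Lit:
    value: int

@dataclass
class Add:
    left: Expr
    right: Expr

@dataclass
class Mul:
    left: Expr
    right: Expr

def meaning(e: Expr) -> int:
    """The 'semantics functor': written expressions in, denotations out.
    Compositionality: the meaning of a compound expression depends only
    on the meanings of its subexpressions."""
    if isinstance(e, Lit):
        return e.value
    if isinstance(e, Add):
        return meaning(e.left) + meaning(e.right)
    if isinstance(e, Mul):
        return meaning(e.left) * meaning(e.right)
    raise TypeError(e)

# Two syntactically different expressions with the same meaning:
print(meaning(Add(Lit(2), Mul(Lit(3), Lit(4)))))  # → 14
print(meaning(Mul(Lit(2), Lit(7))))               # → 14
```

The last two lines make the syntax/semantics distinction concrete: the two expressions are different morphisms on the syntax side, but the functor sends them to the same morphism on the semantics side.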

But physicists may not enjoy this idea unless they see it at work in physics. In physics, too, the distinction is important! But it takes a while to understand. I hope Prakash Panangaden’s talk at the start of the Simons Institute workshop on compositionality is helpful. Check it out:

December 07, 2016

John BaezCompositionality in Network Theory

Here are the slides of my talk at the workshop on compositionality at the Simons Institute for the Theory of Computing next week. I decided to talk about some new work with Blake Pollard. You can see slides here:

• John Baez, Compositionality in network theory, 6 December 2016.

and a video here:

Abstract. To describe systems composed of interacting parts, scientists and engineers draw diagrams of networks: flow charts, Petri nets, electrical circuit diagrams, signal-flow graphs, chemical reaction networks, Feynman diagrams and the like. In principle all these different diagrams fit into a common framework: the mathematics of symmetric monoidal categories. This has been known for some time. However, the details are more challenging, and ultimately more rewarding, than this basic insight. Two complementary approaches are presentations of symmetric monoidal categories using generators and relations (which are more algebraic in flavor) and decorated cospan categories (which are more geometrical). In this talk we focus on the latter.

This talk assumes considerable familiarity with category theory. For a much gentler talk on the same theme, see:

Monoidal categories of networks.


Jordan EllenbergHast and Matei, “Moments of arithmetic functions in short intervals”

Two of my students, Daniel Hast and Vlad Matei, have an awesome new paper, and here I am to tell you about it!

A couple of years ago at AIM I saw Jon Keating talk about this charming paper by him and Ze’ev Rudnick.  Here’s the idea.  Let f be an arithmetic function: in that particular paper, it’s the von Mangoldt function, but you can ask the same question (and they do) for Möbius and many others.

Now we know the von Mangoldt function is 1 on average.  To be more precise: in a suitably long interval ([X,X+X^{1/2 + \epsilon}] is long enough under Riemann) the average of von Mangoldt is always close to 1.  But the average over a short interval can vary.  You can think of the average of von Mangoldt over [x,x+H], with H = x^d, as a function f(x) which has mean 1 but which for d < 1/2 need not be concentrated at 1.  Can we understand how much it varies?  For a start, can we compute its variance as x ranges from 1 to X?  This is the subject of a conjecture of Goldston and Montgomery.

Keating and Rudnick don’t prove that conjecture in its original form; rather, they study the problem transposed into the context of the polynomial ring F_q[t].  Here, the analogue of the archimedean absolute value is the absolute value

|f| = q^{\deg f}

so an interval of size q^h is the set of f such that deg(f-f_0) < h (equivalently, |f-f_0| < q^h) for some fixed polynomial f_0.

So you can take the monic polynomials of degree n, split that up into q^{n-h} intervals of size q^h, and sum f over each interval, and take the variance of all these sums.  Call this V_f(n,h).  What Keating and Rudnick show is that

\lim_{q \rightarrow \infty} q^{-(h+1)} V(n,h) = n - h - 2.

This is not quite the analogue of the Goldston-Montgomery conjecture; that would be the limit as n,h grow with q fixed.  That, for now, seems out of reach.  Keating and Rudnick’s argument goes through the Katz equidistribution theorems (plus some rather hairy integration over groups) and the nature of those equidistribution theorems — like the Weil bounds from which they ultimately derive — is to give you control as q gets large with everything else fixed (or at least growing very slo-o-o-o-o-wly.)  Generally speaking, a large-q result like this reflects knowledge of the top cohomology group, while getting a fixed-q result requires some control of all the cohomology groups, or at least all the cohomology groups in a large range.
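The function-field setup is concrete enough to check by brute force at toy size. The following sketch (my own, with toy parameters q=5, n=4, h=1 chosen just for illustration) enumerates monic polynomials over F_q, computes the polynomial von Mangoldt function by trial division, sums it over the intervals described above, and compares the variance with the Keating–Rudnick large-q prediction. At such small q the agreement is only rough, but the bookkeeping is exactly the one in the text.

```python
import itertools
from statistics import fmean

q, n, h = 5, 4, 1  # toy parameters; the theorem is a q -> infinity limit

def trim(p):
    """Drop trailing zero coefficients (coefficients stored low-to-high)."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def polydivmod(a, b):
    """Quotient and remainder of a by b in F_q[t] (b monic)."""
    a = trim(a)
    quo = [0] * max(len(a) - len(b) + 1, 1)
    while len(a) >= len(b) and a != [0]:
        d, c = len(a) - len(b), a[-1]  # b monic, so no inverse needed
        quo[d] = c
        for i, bc in enumerate(b):
            a[d + i] = (a[d + i] - c * bc) % q
        a = trim(a)
    return tuple(quo), tuple(a)

def divides(p, f):
    return polydivmod(f, p)[1] == (0,)

def monic(d):
    """All monic polynomials of degree d, as coefficient tuples."""
    for lower in itertools.product(range(q), repeat=d):
        yield lower + (1,)

# Sieve for monic irreducibles of degree <= n: a degree-d polynomial is
# irreducible iff no irreducible of degree <= d/2 divides it.
irred = []
for d in range(1, n + 1):
    for f in monic(d):
        if not any(divides(p, f) for p in irred if 2 * (len(p) - 1) <= d):
            irred.append(f)

def von_mangoldt(f):
    """Lambda(f) = deg P if f is a power of an irreducible P, else 0."""
    for p in irred:
        if divides(p, f):
            while divides(p, f):
                f = polydivmod(f, p)[0]
            return (len(p) - 1) if f == (1,) else 0
    return 0

# Sum Lambda over each interval {f : deg(f - f_0) < h}: an interval is
# labeled by the coefficients of t^h, ..., t^(n-1).
sums = {}
for f in monic(n):
    key = f[h:n]
    sums[key] = sums.get(key, 0) + von_mangoldt(f)

vals = list(sums.values())
mu = fmean(vals)
var = fmean([(v - mu) ** 2 for v in vals])
print(f"computed variance: {var:.2f}")
print(f"large-q prediction (n-h-2) q^(h+1): {(n - h - 2) * q ** (h + 1)}")
```

As a sanity check on the enumeration, the prime polynomial theorem says the sum of Λ over all monic polynomials of degree n is exactly q^n, the function-field analogue of ψ(x) ~ x.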

Now for Hast and Matei’s paper.  Their observation is that the variance of the von Mangoldt function can actually be studied algebro-geometrically without swinging the Katz hammer.  Namely:  there’s a variety X_{2,n,h} which parametrizes pairs (f_1,f_2) of monic degree-n polynomials whose difference has degree less than h, together with an ordering of the roots of each polynomial.  X_{2,n,h} carries an action of S_n x S_n by permuting the roots.  Write Y_{2,n,h} for the quotient by this action; that’s just the space of pairs of polynomials in the same h-interval.  Now the variance Keating and Rudnick ask about is more or less

\sum_{(f_1, f_2) \in Y_{2,n,h}(\mathbf{F}_q)} \Lambda(f_1) \Lambda(f_2)

where $\Lambda$ is the von Mangoldt function.  But note that $\Lambda(f_i)$ is completely determined by the factorization of $f_i$; this being the case, we can use Grothendieck-Lefschetz to express the sum above in terms of the Frobenius traces on the groups

H^i(X_{2,n,h},\mathbf{Q}_\ell) \otimes_{\mathbf{Q}_\ell[S_n \times S_n]} V_\Lambda

where $V_\Lambda$ is a representation of $S_n \times S_n$ keeping track of the function $\Lambda$.  (This move is pretty standard and is the kind of thing that happens all over the place in my paper with Church and Farb about point-counting and representation stability, in section 2.2 particularly.)

When the smoke clears, the behavior of the variance V(n,h) as q gets large is controlled by the top “interesting” cohomology group of X_{2,n,h}. Now X_{2,n,h} is a complete intersection, so you might think its interesting cohomology is all in the middle.  But no — it’s singular, so you have to be more careful. Hast and Matei carry out a careful analysis of the singular locus of X_{2,n,h}, and use this to show that the cohomology groups vanish in a large range.  Outside that range, Weil bounds give an upper bound on the trace of Frobenius.  In the end they get

V(n,h) = O(q^{h+1}).

In other words, they get the order of growth from Keating-Rudnick but not the constant term, and they get it without invoking all the machinery of Katz.  What’s more, their argument has nothing to do with von Mangoldt; it applies to essentially any function of f that only depends on the degrees and multiplicities of the irreducible factors.

What would be really great is to understand that top cohomology group H as an S_n x S_n – representation.  That’s what you’d need in order to get that n-h-2 from Keating-Rudnick; you could just compute it as the inner product of H with V_\Lambda.  You want the variance of a different arithmetic function, you pair H with a different representation.  H has all the answers.  But neither they nor I could see how to compute H.

Then came Brad Rodgers.  Two months ago, he posted a preprint which gets the constant term for the variance of any arithmetic function in short intervals.  His argument, like Keating-Rudnick, goes through Katz equidistribution.  This is the same information we would have gotten from knowing H.  And it turns out that Hast and Matei can actually provably recover H from Rodgers’ result; the point is that the power of q Rodgers gets can only arise from H, because all the other cohomology groups of high enough weight are the ones Hast and Matei already showed are zero.

So in the end they find

H = \oplus_\lambda V_\lambda \boxtimes V_\lambda

where \lambda ranges over all partitions of n whose top row has length at most n-h-2.
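As a small numerical companion (my own sketch, with toy values of n and h), one can enumerate the partitions appearing in this decomposition and compute dim H = Σ (dim V_λ)² via the hook length formula:

```python
from math import factorial

def partitions(total, max_part):
    """All partitions of `total` with largest part at most `max_part`."""
    if total == 0:
        yield ()
        return
    for k in range(min(total, max_part), 0, -1):
        for rest in partitions(total - k, k):
            yield (k,) + rest

def dim_irrep(lam):
    """Dimension of the S_n irrep V_lambda, by the hook length formula."""
    n = sum(lam)
    prod = 1
    for i, row in enumerate(lam):
        for j in range(row):
            arm = row - j - 1
            leg = sum(1 for r in lam[i + 1:] if r > j)
            prod *= arm + leg + 1
    return factorial(n) // prod

n, h = 6, 1      # toy values chosen for illustration
cap = n - h - 2  # largest allowed part, per the formula above
dims = [dim_irrep(l) for l in partitions(n, cap)]
print(f"dim H = sum of (dim V_lambda)^2 = {sum(d * d for d in dims)}")
```

Nothing deep here, but it makes the constraint “top row at most n-h-2” tangible: as h grows, fewer partitions qualify and H shrinks.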

I don’t think I’ve ever seen this kind of representation come up before — is it familiar to anyone?

Anyway:  what I like so much about this new development is that it runs contrary to the main current in this subject, in which you prove theorems in topology or algebraic geometry and use them to solve counting problems in arithmetic statistics over function fields.  Here, the arrow goes the other way; from Rodgers’s counting theorem, they get a computation of a cohomology group which I can’t see any way to get at by algebraic geometry.  That’s cool!  The other example I know of the arrow going this direction is this beautiful paper of Browning and Vishe, in which they use the circle method over function fields to prove the irreducibility of spaces of rational curves on low-degree hypersurfaces.  I should blog about that paper too!  But this is already getting long….



Doug NatelsonSuggested textbooks for "Modern Physics"?

I'd be curious for opinions out there regarding available textbooks for "Modern Physics".  Typically this is a sophomore-level undergraduate course at places that offer such a class.  Often these tend to focus on special relativity and "baby quantum", making the bulk of "modern" end in approximately 1930.   Ideally it would be great to have a book that includes topics from the latter half of the 20th century, too, without having them be too simplistic.  Looking around on Amazon, there are a number of choices, but I wonder if I'm missing some diamond in the rough out there by not necessarily using the right search terms, or perhaps there is a new book in development of which I am unaware.   The book by Rohlf looks interesting, but the price tag is shocking - a trait shared by many similarly titled works on Amazon.  Any suggestions?

December 06, 2016

Jacques Distler MathML Update

For a while now, Frédéric Wang has been urging me to enable native MathML rendering for Safari. He and his colleagues have made many improvements to Webkit’s MathML support. But there were at least two show-stopper bugs that prevented me from flipping the switch.


  • The STIX Two fonts were released this week. They represent a big improvement on Version 1, and are finally definitively better than LatinModern for displaying MathML on the web. Most interestingly, they fix this bug. That means I can bundle these fonts1, solving both that problem and the more generic problem of users not having a good set of Math fonts installed.
  • Thus inspired, I wrote a little Javascript polyfill to fix the other bug.

While there are still a lot of remaining issues (for instance this one), I think Safari’s native MathML rendering is now good enough for everyday use (and, in enough respects, superior to MathJax’s) to enable it by default in Instiki, Heterotic Beast and on this blog.

Of course, you’ll need to be using2 Safari 10.1 or Safari Technology Preview.

1 In an ideal world, OS vendors would bundle the STIX Two fonts with their next release (as Apple previously bundled the STIX fonts with MacOSX ≥10.7) and motivated users would download and install them in the meantime.

2 N.B.: We’re not browser-sniffing (anymore). We’re just checking for MathML support comparable to Webkit version 203640. If Google (for instance) decided to re-enable MathML support in Chrome, that would work too.

Terence TaoFinite time blowup for a supercritical defocusing nonlinear Schrodinger system

I’ve just uploaded to the arXiv my paper Finite time blowup for a supercritical defocusing nonlinear Schrödinger system, submitted to Analysis and PDE. This paper is an analogue of a recent paper of mine in which I constructed a supercritical defocusing nonlinear wave (NLW) system {-\partial_{tt} u + \Delta u = (\nabla F)(u)} which exhibited smooth solutions that developed singularities in finite time. Here, we achieve essentially the same conclusion for the (inhomogeneous) supercritical defocusing nonlinear Schrödinger (NLS) equation

\displaystyle  i \partial_t u + \Delta u = (\nabla F)(u) + G \ \ \ \ \ (1)

where {u: {\bf R} \times {\bf R}^d \rightarrow {\bf C}^m} is now a system of scalar fields, {F: {\bf C}^m \rightarrow {\bf R}} is a potential which is strictly positive and homogeneous of degree {p+1} (and invariant under phase rotations {u \mapsto e^{i\theta} u}), and {G: {\bf R} \times {\bf R}^d \rightarrow {\bf C}^m} is a smooth compactly supported forcing term, needed for technical reasons.

To oversimplify somewhat, the equation (1) is known to be globally regular in the energy-subcritical case when {d \leq 2}, or when {d \geq 3} and {p < 1+\frac{4}{d-2}}; global regularity is also known (but is significantly more difficult to establish) in the energy-critical case when {d \geq 3} and {p = 1 +\frac{4}{d-2}}. (This is an oversimplification for a number of reasons, in particular in higher dimensions one only knows global well-posedness instead of global regularity. See this previous post for some exploration of this issue in the context of nonlinear wave equations.) The main result of this paper is to show that global regularity can break down in the remaining energy-supercritical case when {d \geq 3} and {p > 1 + \frac{4}{d-2}}, at least when the target dimension {m} is allowed to be sufficiently large depending on the spatial dimension {d} (I did not try to achieve the optimal value of {m} here, but the argument gives a value of {m} that grows quadratically in {d}). Unfortunately, this result does not directly impact the most interesting case of the defocusing scalar NLS equation
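For orientation, the dividing line p = 1 + \frac{4}{d-2} comes from the scaling symmetry of the homogeneous equation; here is the standard computation (textbook material, not specific to this paper):

```latex
% If u solves i \partial_t u + \Delta u = |u|^{p-1} u, then so does
%   u_\lambda(t,x) := \lambda^{2/(p-1)} u(\lambda^2 t, \lambda x).
% The energy norm of the initial data transforms as
\| u_\lambda(0) \|_{\dot H^1({\bf R}^d)}
  = \lambda^{\frac{2}{p-1} + 1 - \frac{d}{2}} \| u(0) \|_{\dot H^1({\bf R}^d)},
% and the scaling exponent vanishes exactly when
\frac{2}{p-1} = \frac{d}{2} - 1
\quad\Longleftrightarrow\quad
p = 1 + \frac{4}{d-2}.
```

For p > 1 + \frac{4}{d-2} the exponent is negative: rescaling to fine scales (large \lambda) shrinks the energy, so the conserved energy gives no control over small-scale behavior, which is what “energy-supercritical” means here.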

\displaystyle  i \partial_t u + \Delta u = |u|^{p-1} u \ \ \ \ \ (2)

in which {m=1}; however it does establish a rigorous barrier to any attempt to prove global regularity for the scalar NLS equation, in that such an attempt needs to crucially use some property of the scalar NLS that is not shared by the more general systems in (1). For instance, any approach that is primarily based on the conservation laws of mass, momentum, and energy (which are common to both (1) and (2)) will not be sufficient to establish global regularity of supercritical defocusing scalar NLS.

The method of proof in this paper is broadly similar to that in the previous paper for NLW, but with a number of additional technical complications. Both proofs begin by reducing matters to constructing a discretely self-similar solution. In the case of NLW, this solution lived on a forward light cone {\{ (t,x): |x| \leq t \}} and obeyed a self-similarity

\displaystyle  u(2t, 2x) = 2^{-\frac{2}{p-1}} u(t,x).

The ability to restrict to a light cone arose from the finite speed of propagation properties of NLW. For NLS, the solution will instead live on the domain

\displaystyle  H_d := ([0,+\infty) \times {\bf R}^d) \backslash \{(0,0)\}

and obey a parabolic self-similarity

\displaystyle  u(4t, 2x) = 2^{-\frac{2}{p-1}} u(t,x)

and solve the homogeneous version {G=0} of (1). (The inhomogeneity {G} emerges when one truncates the self-similar solution so that the initial data is compactly supported in space.) A key technical point is that {u} has to be smooth everywhere in {H_d}, including the boundary component {\{ (0,x): x \in {\bf R}^d \backslash \{0\}\}}. This unfortunately rules out many of the existing constructions of self-similar solutions, which typically will have some sort of singularity at the spatial origin.

The remaining steps of the argument can broadly be described as quantifier elimination: one systematically eliminates each of the degrees of freedom of the problem in turn by locating the necessary and sufficient conditions required of the remaining degrees of freedom in order for the constraints of a particular degree of freedom to be satisfiable. The first such degree of freedom to eliminate is the potential function {F}. The task here is to determine what constraints must exist on a putative solution {u} in order for there to exist a (positive, homogeneous, smooth away from origin) potential {F} obeying the homogeneous NLS equation

\displaystyle  i \partial_t u + \Delta u = (\nabla F)(u).

Firstly, the requirement that {F} be homogeneous implies the Euler identity

\displaystyle  \langle (\nabla F)(u), u \rangle = (p+1) F(u)

(where {\langle,\rangle} denotes the standard real inner product on {{\bf C}^m}), while the requirement that {F} be phase invariant similarly yields the variant identity

\displaystyle  \langle (\nabla F)(u), iu \rangle = 0,

so if one defines the potential energy field to be {V = F(u)}, we obtain from the chain rule the equations

\displaystyle  \langle i \partial_t u + \Delta u, u \rangle = (p+1) V

\displaystyle  \langle i \partial_t u + \Delta u, iu \rangle = 0

\displaystyle  \langle i \partial_t u + \Delta u, \partial_t u \rangle = \partial_t V

\displaystyle  \langle i \partial_t u + \Delta u, \partial_{x_j} u \rangle = \partial_{x_j} V.

Conversely, it turns out (roughly speaking) that if one can locate fields {u} and {V} obeying the above equations (as well as some other technical regularity and non-degeneracy conditions), then one can find an {F} with all the required properties. The first of these equations can be thought of as a definition of the potential energy field {V}, and the other three equations are basically disguised versions of the conservation laws of mass, energy, and momentum respectively. The construction of {F} relies on a classical extension theorem of Seeley that is a relative of the Whitney extension theorem.
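To make the Euler and phase-invariance identities concrete (an illustrative check of my own, not from the paper), take the model potential {F(u) = (\sum_j |u_j|^2)^{(p+1)/2}}, which is positive, homogeneous of degree {p+1}, and phase invariant; identifying {{\bf C}^m} with {{\bf R}^{2m}}, a finite-difference gradient satisfies both identities numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 3, 3.0

def F(v):
    # v in R^{2m} encodes u in C^m; model potential F(u) = |u|^{p+1},
    # positive, homogeneous of degree p+1, and phase invariant
    u = v[:m] + 1j * v[m:]
    return np.sum(np.abs(u)**2) ** ((p + 1) / 2)

def grad(f, v, h=1e-6):
    # central finite-difference gradient in R^{2m}
    g = np.zeros_like(v)
    for i in range(len(v)):
        e = np.zeros_like(v)
        e[i] = h
        g[i] = (f(v + e) - f(v - e)) / (2 * h)
    return g

v = rng.standard_normal(2 * m)
g = grad(F, v)
u = v[:m] + 1j * v[m:]
iv = np.concatenate([(1j * u).real, (1j * u).imag])  # encodes iu

assert abs(np.dot(g, v) - (p + 1) * F(v)) < 1e-4   # Euler identity
assert abs(np.dot(g, iv)) < 1e-4                   # phase invariance
```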

Now that the potential {F} is eliminated, the next degree of freedom to eliminate is the solution field {u}. One can observe that the above equations involving {u} and {V} can be expressed instead in terms of {V} and the Gram-type matrix {G[u,u]} of {u}, which is a {(2d+4) \times (2d+4)} matrix consisting of the inner products {\langle D_1 u, D_2 u \rangle} where {D_1,D_2} range amongst the {2d+4} differential operators

\displaystyle  D_1,D_2 \in \{ 1, i, \partial_t, i\partial_t, \partial_{x_1},\dots,\partial_{x_d}, i\partial_{x_1}, \dots, i\partial_{x_d}\}.

To eliminate {u}, one thus needs to answer the question of what properties are required of a {(2d+4) \times (2d+4)} matrix {G} for it to be the Gram-type matrix {G = G[u,u]} of a field {u}. Amongst some obvious necessary conditions are that {G} needs to be symmetric and positive semi-definite; there are also additional constraints coming from identities such as

\displaystyle  \partial_t \langle u, u \rangle = 2 \langle u, \partial_t u \rangle

\displaystyle  \langle i u, \partial_t u \rangle = - \langle u, i \partial_t u \rangle


\displaystyle  \partial_{x_j} \langle iu, \partial_{x_k} u \rangle - \partial_{x_k} \langle iu, \partial_{x_j} u \rangle = 2 \langle i \partial_{x_j} u, \partial_{x_k} u \rangle.
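A small numerical illustration (mine, not from the paper) of two facts in play here: the identity {\langle iu, v \rangle = -\langle u, iv \rangle} for the standard real inner product on {{\bf C}^m}, and the basic fact that any matrix of mutual inner products is symmetric and positive semi-definite:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 6   # target C^m, n sample vectors

def rip(a, b):
    # the standard real inner product on C^m ~ R^{2m}: Re sum_j a_j conj(b_j)
    return float(np.real(np.vdot(b, a)))

u = rng.standard_normal(m) + 1j * rng.standard_normal(m)
v = rng.standard_normal(m) + 1j * rng.standard_normal(m)

# the constraint <iu, v> = -<u, iv> relating entries of the Gram matrix
assert abs(rip(1j * u, v) + rip(u, 1j * v)) < 1e-12

# any matrix of mutual inner products is symmetric positive semi-definite
vecs = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))
G = np.array([[rip(a, b) for b in vecs] for a in vecs])
assert np.allclose(G, G.T)
assert np.linalg.eigvalsh(G).min() > -1e-10
```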

Ideally one would like a theorem that asserts (for {m} large enough) that as long as {G} obeys all of the “obvious” constraints, then there exists a suitably non-degenerate map {u} such that {G = G[u,u]}. In the case of NLW, the analogous claim was basically a consequence of the Nash embedding theorem (which can be viewed as a theorem about the solvability of the system of equations {\langle \partial_{x_j} u, \partial_{x_k} u \rangle = g_{jk}} for a given positive definite symmetric set of fields {g_{jk}}). However, the presence of the complex structure in the NLS case poses some significant technical challenges (note for instance that the naive complex version of the Nash embedding theorem is false, due to obstructions such as Liouville’s theorem that prevent a compact complex manifold from being embeddable holomorphically in {{\bf C}^m}). Nevertheless, by adapting the proof of the Nash embedding theorem (in particular, the simplified proof of Gunther that avoids the need to use the Nash-Moser iteration scheme) we were able to obtain a partial complex analogue of the Nash embedding theorem that sufficed for our application; it required an artificial additional “curl-free” hypothesis on the Gram-type matrix {G[u,u]}, but fortunately this hypothesis ends up being automatic in our construction. Also, this version of the Nash embedding theorem is unable to prescribe the component {\langle \partial_t u, \partial_t u \rangle} of the Gram-type matrix {G[u,u]}, but fortunately this component is not used in any of the conservation laws and so the loss of this component does not cause any difficulty.

After applying the above-mentioned Nash embedding theorem, the task is now to locate a matrix {G} obeying all the hypotheses of that theorem, as well as the conservation laws for mass, momentum, and energy (after defining the potential energy field {V} in terms of {G}). This is quite a lot of fields and constraints, but one can cut down significantly on the degrees of freedom by requiring that {G} is spherically symmetric (in a tensorial sense) and also continuously self-similar (not just discretely self-similar). Note that this hypothesis is weaker than the assertion that the original field {u} is spherically symmetric and continuously self-similar; indeed we do not know if non-trivial solutions of this type actually exist. These symmetry hypotheses reduce the number of independent components of the {(2d+4) \times (2d+4)} matrix {G} to just six: {g_{1,1}, g_{1,i\partial_t}, g_{1,i\partial_r}, g_{\partial_r, \partial_r}, g_{\partial_\omega, \partial_\omega}, g_{\partial_r, \partial_t}}, which now take as their domain the {1+1}-dimensional space

\displaystyle  H_1 := ([0,+\infty) \times {\bf R}) \backslash \{(0,0)\}.

One now has to construct these six fields, together with a potential energy field {v}, that obey a number of constraints, notably some positive definiteness constraints as well as the aforementioned conservation laws for mass, momentum, and energy.

The field {g_{1,i\partial_t}} only arises in the equation for the potential {v} (coming from Euler’s identity) and can easily be eliminated. Similarly, the field {g_{\partial_r,\partial_t}} only makes an appearance in the current of the energy conservation law, and so can also be easily eliminated so long as the total energy is conserved. But in the energy-supercritical case, the total energy is infinite, and so it is relatively easy to eliminate the field {g_{\partial_r, \partial_t}} from the problem also. This leaves us with the task of constructing just five fields {g_{1,1}, g_{1,i\partial_r}, g_{\partial_r,\partial_r}, g_{\partial_\omega,\partial_\omega}, v} obeying a number of positivity conditions, symmetry conditions, regularity conditions, and conservation laws for mass and momentum.

The potential field {v} can effectively be absorbed into the angular stress field {g_{\partial_\omega,\partial_\omega}} (after placing an appropriate counterbalancing term in the radial stress field {g_{\partial_r, \partial_r}} so as not to disrupt the conservation laws), so we can also eliminate this field. The angular stress field {g_{\partial_\omega, \partial_\omega}} is then only constrained through the momentum conservation law and a requirement of positivity; one can then eliminate this field by converting the momentum conservation law from an equality to an inequality. Finally, the radial stress field {g_{\partial_r, \partial_r}} is also only constrained through a positive definiteness constraint and the momentum conservation inequality, so it can also be eliminated from the problem after some further modification of the momentum conservation inequality.

The task then reduces to locating just two fields {g_{1,1}, g_{1,i\partial_r}} that obey a mass conservation law

\displaystyle  \partial_t g_{1,1} = 2 \left(\partial_r + \frac{d-1}{r} \right) g_{1,i\partial_r}

together with an additional inequality that is the remnant of the momentum conservation law. One can solve for the mass conservation law in terms of a single scalar field {W} using the ansatz

\displaystyle g_{1,1} = 2 r^{1-d} \partial_r (r^d W)

\displaystyle g_{1,i\partial_r} = r^{1-d} \partial_t (r^d W)

so the problem has finally been simplified to the task of locating a single scalar field {W} with some scaling and homogeneity properties that obeys a certain differential inequality relating to momentum conservation. This turns out to be possible by explicitly writing down a specific scalar field {W} using some asymptotic parameters and cutoff functions.
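One can verify symbolically that the ansatz above does satisfy the mass conservation law for an arbitrary scalar field {W} (a check of my own, using sympy):

```python
import sympy as sp

t = sp.symbols('t', real=True)
r, d = sp.symbols('r d', positive=True)
W = sp.Function('W')(t, r)

# the ansatz expressing both fields in terms of one scalar field W
g11  = 2 * r**(1 - d) * sp.diff(r**d * W, r)
g1ir =     r**(1 - d) * sp.diff(r**d * W, t)

# mass conservation law: d_t g_{1,1} = 2 (d_r + (d-1)/r) g_{1,i d_r}
lhs = sp.diff(g11, t)
rhs = 2 * (sp.diff(g1ir, r) + (d - 1) / r * g1ir)
assert sp.simplify(sp.expand(lhs - rhs)) == 0
```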

Filed under: paper Tagged: conservation laws, Nash embedding theorem, nonlinear Schrodinger equation

December 04, 2016

David Hoggstars, disruption, photometric redshifts

Today began with a meeting about GALEX, where Steven Mohammed (Columbia) showed that there is great metallicity information in the overlap of GALEX and Gaia, and we discovered that something must be seriously wrong with the astrometry in our re-calibration of the data.

Andy Casey (Cambridge) organized a phone meeting in which a bunch of us discussed possible scientific exploitation of the data in the ESO HARPS archive, which contains thousands of stars, each of which has tens to thousands of epochs, each of which has a signal-to-noise ratio of a hundred-ish, and a resolution of 100,000. Incredibly huge amounts of data. Huge. Casey asked each of us to describe low-hanging fruit, and take on short-term tasks. One thing we might do is re-factor the archive into something more directly useful to investigators.

Sjoert Van Velzen (JHU) gave the astrophysics seminar about tidal disruption events. He has a great set of results, starting from search and discovery, going through theory and models, and continuing on to multi-wavelength follow-up. The most intriguing result is that the TDEs are amazingly over-represented in post-starburst (E+A) galaxies (which I used to work on). It is hard to imagine any origin for TDEs that would so strongly concentrate them into these environments. It makes me wonder whether the things they are seeing aren't TDEs at all?

After the seminar, Boris Leistedt (NYU) posted to the arXiv our new paper on photometric redshifts. The idea is that we use what we know about the Doppler shift, bandpasses, and calibration of photometry, but let the galaxy SEDs themselves be inferred latent variables. This combines the best properties of machine-learning methods (that is, flexibility, non-parametrics) with the best properties of template-based methods (that is, regularization to physically realizable models, a generative model, and interpretability). It seems to work very well!

Tommaso DorigoIs The X(5568) A True Resonance?

The DZERO collaboration published earlier this year a search for resonances decaying to B_s π pairs in its Run-2 dataset of 2-TeV proton-antiproton collisions, produced by the now-defunct Tevatron collider in the first 10 years of this century.


Chad OrzelThe Hold Steady at Brooklyn Bowl 12/2/16

There are only a couple of bands I’d drive a significant distance to see live, and now I’ve made the trip to NYC to see two of them. I went to see the Afghan Whigs in 2014, and this past Friday, I drove to Brooklyn for a Hold Steady show. And this time, I have a cool picture as a bonus…

Me with Craig Finn of the Hold Steady.


The origin of the picture, obviously, needs a little explaining. The current set of shows is a four-night stand (originally three, but they added one after the first three sold out) at the Brooklyn Bowl, reuniting with keyboardist Franz Nicolay for the 10th anniversary of their album Boys and Girls in America. They did a fan club presale thing for the shows, which included a package with all three shows and a happy hour at Friday’s sound check.

Obviously, I didn’t buy tickets for all three shows, but way back in 2013 or so, I contributed to a crowdfunded EP they did. We’ve never been able to work out the timing for the bonus item I bought then, so when I bought a ticket for Friday’s show, I contacted the people who ran the crowdfunding to see if I could swap it for a spot at the sound check event. They agreed, so I made the trip down a little earlier than I otherwise might’ve, and got to see my favorite band rehearse…

The sound check was fascinating in the way that seeing behind the scenes with people who are really good at what they do is always fascinating. They ran through a few songs that turned up in the set list that night– “Adderall,” “Stevie Nix,” “Charlemagne in Sweatpants”– and went over “Don’t Let Me Explode” about three times to sort out some issue that was completely inaudible to me. There was a bit of back-and-forth after the first pass between Franz Nicolay and Steve Selvidge (who only joined the band after Franz left) about what key they were in, but whatever it was that they were hearing wasn’t super obvious. And whatever it was must’ve gotten fixed, because they played it during the show later that night. This was, obviously, a very late stage in the rehearsal process, so they were mostly fine-tuning stuff, and the atmosphere was pretty loose and fun, though much more low-key than the real show.

The sound check was also the first definitive proof I’ve seen that Craig Finn’s guitar is actually connected to anything when he’s on stage– I’ve seen them twice before, and one of those times he never even pretended to play, but even when he had a guitar, I don’t think I heard anything that definitely came from it. It does work, though, because in the bit of “Charlemagne in Sweatpants” where Selvidge and original guitarist Tad Kubler trade off solos for a bit, Finn played a solo, too (and got teased by Kubler about it…). They also paid off another crowdfunding bonus, for a guy from Montana who had bought a “sound check jam,” and came on stage to play “Magazines” using Finn’s guitar (I have video of this on my phone, but haven’t watched it).

They also took a couple of requests– “Spinners,” and… I forget if “Multitude of Casualties” was the request, or if “Stevie Nix” was– and answered some questions from the crowd. After the sound check, there was a happy hour while the opener (The So So Glos) set up their gear and did a little soundchecking of their own. At least some of the band members hung around for the happy hour (Franz Nicolay had his family there, so he was just playing with a small child)– I spent a few minutes talking bowling and booze with bassist Galen Polivka, and Craig Finn worked the room, talking with just about everyone, signing, and taking photos. As you can see above.

(I wish I could say I had a deep and meaningful conversation with him, but really, it was just inconsequential small talk. I asked what the “North” on his hat meant, and he explained that some people he knows in Minneapolis were trying to “re-brand” the “upper midwest” as “the North,” on the grounds that Minnesota, Wisconsin, and Michigan don’t really have all that much in common with, say, Missouri. The other guys I was talking with before Finn stopped by asked a couple of questions about doing a run of shows, and then we took pictures (my phone was embarrassingly Difficult about this) and he moved on.)

I was actually starting to be drunk by the end of the happy hour– they had good local microbrews on tap at the venue, and some of them were pretty potent– and there were about three hours to kill before the opening act was scheduled to start, so I walked off to a Moroccan place that had been recommended by the guys I was talking to at happy hour, who lived in the area, and got some very good food and cleared my head.

The So So Glos were one of the opening acts on the last Hold Steady tour, so they obviously know each other (and the lead singer would come out at the end of the encore to cover “American Music” with the Hold Steady). They’ve gotten a bit of radio play with “ADD Life” (which I actually find slightly annoying), and turned in a good, high-energy opening set. As is often the case, I was struck by how the guy who played the tricky guitar parts stayed pretty still, while the rhythm guitarist jumped around a whole lot…

And then, the main event… They’re one of only two acts I’ve seen three times (the other is Bob Dylan), and every show I’ve been to has been excellent. This was a bit different than past shows, in that they don’t have a new record to promote, but are celebrating an old one. And, of course, the big selling point of these shows is that they’re back with Franz Nicolay, so the set list was very heavy on stuff from a few albums ago– I don’t think they played anything from either Heaven is Whenever or Teeth Dreams (other than “Spinners” in the sound check)– and stuff with prominent keyboard parts. This means that some of my favorites didn’t make the set list– “Cattle and the Creeping Things,” “You Gotta Dance,” “Chicago Seemed Tired Last Night,” “Most People Are DJ’s”– but, you know, if they were going to play all my favorite songs, the show would run about six hours.

They did a good mix of stuff, with several songs I haven’t heard live before– the set-opening “Positive Jam” (first track off their first album), “Don’t Let Me Explode,” and “Killer Parties” (which is a semi-traditional closing number, though in this case they went into “American Music” after it). It was neat to see the difference in energy level between the sound check run-throughs and the final performances– a great advertisement for live performances with a friendly crowd– and they remain a really tight band. Selvidge and Kubler are great guitarists but not exactly showy– Kubler sometimes turns his back to the crowd when playing really involved solos– but Finn remains a tremendously energetic front man, and it’s amazing how much dancing Nicolay manages despite being required to stay behind his keyboard.

And, of course, it’s 2016, so there’s already a video clip from Friday’s show online.

On the drive down and back, I had my whole Hold Steady collection on shuffle play (including a couple of live shows that haven’t been officially released), and as always, I was struck by how unlikely some of their success seems. I mean, these are really complicated songs with key changes and time-signature changes left and right, and really dense lyrics full of literary and religious references. Even the sing-along parts seem too complicated to work– “Chips Ahoy!” has a shout-along bit that goes eight syllables (“Whoa-oh-oh, oh-oh oh-oh-oh!”).

And yet, through some combination of formidable musical talent and sheer charisma, it all works brilliantly, especially live. Yeah, fine, they’re not selling out stadiums, but they can pack a venue with fans who turn out to spend a couple hours shouting along with all those dense lyrics and complicated musical transitions. As Craig Finn regularly notes from the stage, there is “So! Much! Joy!” in what they do, and it was totally worth the drive down to Brooklyn and back to be a part of it.

December 03, 2016

Clifford JohnsonNeither a Hot Shot nor a Know-it-All, But…

I'm neither a Hot Shot nor a Know-it-All, but I've agreed to appear as one or the other Saturday night (Sat. Dec. 3rd), in the company of real hot shots Joel Hodgson (from MST3K!), Sarah Silverman, Sam Phillips, and several others. It's a fun event with the proceeds going to a charity, and tickets are still available! I think (I'm not sure) that I'll be part of a team sitting at tables around the venue that people can ask questions about... you know, stuff. I imagine I'll be asked physics questions...?

More information here. Or here.

For the record, I'm a know-it-some.

-cvj

The post Neither a Hot Shot nor a Know-it-All, But… appeared first on Asymptotia.

David HoggSolar twins, probabilism, model complexity, spectral hacking, and so much more

Today was the usual research-packed day at Flatiron. In the stars group meeting, Megan Bedell (UChicago) told us about her multi-epoch survey of Solar twins. Because they are twins, they have similar log g and Teff values, so she can get very precise differential abundances. Her goal is to understand the relationships between abundances and planets; she gave us mechanisms by which the stellar abundances could affect planet formation, mechanisms by which planet formation could affect stellar surface abundances, and common causes that could affect both. She has measured 20-ish detailed abundances at high precision in 88 stars with (because: multi-epoch) SNR 2000-ish!

Doug Finkbeiner (Harvard) and Stephen Portillo (Harvard) told us about probabilistic catalogs; a project they are doing that builds on work Brewer, Foreman-Mackey, and I did a few years ago. They find (like us) that a probabilistic catalog—a sampling of the posterior in catalog space—can reliably find fainter sources than any standard point-estimate catalog, even one built using crowded-field software. They use HST to deliver ground truth. They aren't going fully hierarchical; we discussed that in the meeting, and the relative merits of probabilistic catalogs and delivering an API to the likelihood function (my new baby).

Neven Caplar (ETHZ) went off-topic in the meeting to describe some results on the time-variability of AGN. Sensibly, he wants to use time-domain data to test accretion disk models. He is working with PTF data, which he had to recalibrate in a self-calibration (he even shouted out our uber-calibration of SDSS). He is computing structure functions (which look random-walk-like) and also doing inference in the context of CARMA models. He pointed out that there must be a long-term damping term in the covariance kernel, but no-one can see it, even with years of data. That's interesting; AGN really are like random walkers on very long timescales.

In the cosmology group meeting, Phil Bull (JPL) walked us through a probabilistic graphical model that replaces simple halo occupation models with something that is a bit more connected to what we think is going on with galaxy evolution. Importantly, it permits him to do large-scale structure experiments with multiple overlapping tracers from different surveys. Much of the discussion was about whether it is better to have a more sophisticated model that is more realistic, or whether it is better to have a simpler model that is more tractable. This is an important question in every data analysis and my answer is very different in different contexts.

Between these two meetings, Bedell and I worked out the simplest representation for our Avast model of stellar spectra and Bedell went off to implement it. She crushed it! She has code that can optimize a smooth model given a set of noisily measured different epochs, accounting for differences in throughput and radial velocity. Not everything is working—we need to diagnose the optimizer we are using (yes, optimization is always the hardest part of any of my projects)—but Bedell did in one afternoon more than I have got done in the last three months! Now we are in a state to make bound-saturating radial-velocity measurements and look for covariant spectral variations in an agnostic way. I couldn't have been more excited at the end of the day.

David Hoggcosmography

It's job season and my head is only just above water! Adam Riess (JHU) gave a nice colloquium at NYU today about the distance scale, and the comparison between the distance ladder and the cosmic microwave background.

December 02, 2016

Terence Tao246A, Notes 5: conformal mapping

In the previous set of notes we introduced the notion of a complex diffeomorphism {f: U \rightarrow V} between two open subsets {U,V} of the complex plane {{\bf C}} (or more generally, two Riemann surfaces): an invertible holomorphic map whose inverse was also holomorphic. (Actually, the last part is automatic, thanks to Exercise 40 of Notes 4.) Such maps are also known as biholomorphic maps or conformal maps (although in some literature the notion of “conformal map” is expanded to permit maps such as the complex conjugation map {z \mapsto \overline{z}} that are angle-preserving but not orientation-preserving, as well as maps such as the exponential map {z \mapsto \exp(z)} from {{\bf C}} to {{\bf C} \backslash \{0\}} that are only locally injective rather than globally injective). Such complex diffeomorphisms can be used in complex analysis (or in the analysis of harmonic functions) to change the underlying domain {U} to a domain that may be more convenient for calculations, thanks to the following basic lemma:

Lemma 1 (Holomorphicity and harmonicity are conformal invariants) Let {\phi: U \rightarrow V} be a complex diffeomorphism between two Riemann surfaces {U,V}.

  • (i) If {f: V \rightarrow W} is a function to another Riemann surface {W}, then {f} is holomorphic if and only if {f \circ \phi: U \rightarrow W} is holomorphic.
  • (ii) If {U,V} are open subsets of {{\bf C}} and {u: V \rightarrow {\bf R}} is a function, then {u} is harmonic if and only if {u \circ \phi: U \rightarrow {\bf R}} is harmonic.

Proof: Part (i) is immediate since the composition of two holomorphic functions is holomorphic. For part (ii), observe that if {u: V \rightarrow {\bf R}} is harmonic then on any ball {B(z_0,r)} in {V}, {u} is the real part of some holomorphic function {f: B(z_0,r) \rightarrow {\bf C}} thanks to Exercise 62 of Notes 3. By part (i), {f \circ \phi: \phi^{-1}(B(z_0,r)) \rightarrow {\bf C}} is also holomorphic. Taking real parts we see that {u \circ \phi} is harmonic on each preimage {\phi^{-1}(B(z_0,r))} of a ball in {V}, and hence harmonic on all of {U}, giving one direction of (ii); the other direction is proven similarly. \Box

Exercise 2 Establish Lemma 1(ii) by direct calculation, avoiding the use of holomorphic functions. (Hint: the calculations are cleanest if one uses Wirtinger derivatives, as per Exercise 27 of Notes 1.)

Exercise 3 Let {\phi: U \rightarrow V} be a complex diffeomorphism between two open subsets {U,V} of {{\bf C}}, let {z_0} be a point in {U}, let {m} be a natural number, and let {f: V \rightarrow {\bf C} \cup \{\infty\}} be holomorphic. Show that {f: V \rightarrow {\bf C} \cup \{\infty\}} has a zero (resp. a pole) of order {m} at {\phi(z_0)} if and only if {f \circ \phi: U \rightarrow {\bf C} \cup \{\infty\}} has a zero (resp. a pole) of order {m} at {z_0}.

From Lemma 1(ii) we can now define the notion of a harmonic function {u: M \rightarrow {\bf R}} on a Riemann surface {M}; such a function {u} is harmonic if, for every coordinate chart {\phi_\alpha: U_\alpha \rightarrow V_\alpha} in some atlas, the map {u \circ \phi_\alpha^{-1}: V_\alpha \rightarrow {\bf R}} is harmonic. Lemma 1(ii) ensures that this definition of harmonicity does not depend on the choice of atlas. Similarly, using Exercise 3 one can define what it means for a holomorphic map {f: M \rightarrow {\bf C} \cup \{\infty\}} on a Riemann surface {M} to have a pole or zero of a given order at a point {p_0 \in M}, with the definition being independent of the choice of atlas.

In view of Lemma 1, it is thus natural to ask which Riemann surfaces are complex diffeomorphic to each other, and more generally to understand the space of holomorphic maps from one given Riemann surface to another. We will initially focus attention on three important model Riemann surfaces:

  • (i) (Elliptic model) The Riemann sphere {{\bf C} \cup \{\infty\}};
  • (ii) (Parabolic model) The complex plane {{\bf C}}; and
  • (iii) (Hyperbolic model) The unit disk {D(0,1)}.

The designation of these model Riemann surfaces as elliptic, parabolic, and hyperbolic comes from Riemannian geometry, where it is natural to endow each of these surfaces with a constant curvature Riemannian metric which is positive, zero, or negative in the elliptic, parabolic, and hyperbolic cases respectively. However, we will not discuss Riemannian geometry further here.

All three model Riemann surfaces are simply connected, but none of them are complex diffeomorphic to any other; indeed, there are no non-constant holomorphic maps from the Riemann sphere to the plane or the disk, nor are there any non-constant holomorphic maps from the plane to the disk (although there are plenty of holomorphic maps going in the opposite directions). The complex automorphisms (that is, the complex diffeomorphisms from a surface to itself) of each of the three surfaces can be classified explicitly. The automorphisms of the Riemann sphere turn out to be the Möbius transformations {z \mapsto \frac{az+b}{cz+d}} with {ad-bc \neq 0}, also known as fractional linear transformations. The automorphisms of the complex plane are the linear transformations {z \mapsto az+b} with {a \neq 0}, and the automorphisms of the disk are the fractional linear transformations of the form {z \mapsto e^{i\theta} \frac{\alpha - z}{1 - \overline{\alpha} z}} for {\theta \in {\bf R}} and {\alpha \in D(0,1)}. Holomorphic maps {f: D(0,1) \rightarrow D(0,1)} from the disk {D(0,1)} to itself that fix the origin obey a basic but incredibly important estimate known as the Schwarz lemma: they are “dominated” by the identity function {z \mapsto z} in the sense that {|f(z)| \leq |z|} for all {z \in D(0,1)}. Among other things, this lemma gives guidance to determine when a given Riemann surface is complex diffeomorphic to a disk; we shall discuss this point further below.

It is a beautiful and fundamental fact in complex analysis that these three model Riemann surfaces are in fact an exhaustive list of the simply connected Riemann surfaces, up to complex diffeomorphism. More precisely, we have the Riemann mapping theorem and the uniformisation theorem:

Theorem 4 (Riemann mapping theorem) Let {U} be a simply connected open subset of {{\bf C}} that is not all of {{\bf C}}. Then {U} is complex diffeomorphic to {D(0,1)}.

Theorem 5 (Uniformisation theorem) Let {M} be a simply connected Riemann surface. Then {M} is complex diffeomorphic to {{\bf C} \cup \{\infty\}}, {{\bf C}}, or {D(0,1)}.

As we shall see, every connected Riemann surface can be viewed as the quotient of its simply connected universal cover by a discrete group of automorphisms known as deck transformations. This in principle gives a complete classification of Riemann surfaces up to complex diffeomorphism, although the situation is still somewhat complicated in the hyperbolic case because of the wide variety of discrete groups of automorphisms available in that case.

We will prove the Riemann mapping theorem in these notes, using the elegant argument of Koebe that is based on the Schwarz lemma and Montel’s theorem (Exercise 57 of Notes 4). The uniformisation theorem is however more difficult to establish; we discuss some components of a proof (based on the Perron method of subharmonic functions) here, but stop short of providing a complete proof.

The above theorems show that it is in principle possible to conformally map various domains into model domains such as the unit disk, but the proofs of these theorems do not readily produce explicit conformal maps for this purpose. For some domains we can just write down a suitable such map. For instance:

Exercise 6 (Cayley transform) Let {{\bf H} := \{ z \in {\bf C}: \mathrm{Im} z > 0 \}} be the upper half-plane. Show that the Cayley transform {\phi: {\bf H} \rightarrow D(0,1)}, defined by

\displaystyle  \phi(z) := \frac{z-i}{z+i},

is a complex diffeomorphism from the upper half-plane {{\bf H}} to the disk {D(0,1)}, with inverse map {\phi^{-1}: D(0,1) \rightarrow {\bf H}} given by

\displaystyle  \phi^{-1}(w) := i \frac{1+w}{1-w}.
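A quick numerical sanity check of this exercise (a Python sketch; the helper names `phi` and `psi` are mine) confirms that the Cayley transform sends sample points of the upper half-plane into the unit disk and that the stated inverse really inverts it:

```python
# Numerical check of the Cayley transform phi(z) = (z - i)/(z + i)
# and its claimed inverse psi(w) = i(1 + w)/(1 - w).

def phi(z: complex) -> complex:
    return (z - 1j) / (z + 1j)

def psi(w: complex) -> complex:
    return 1j * (1 + w) / (1 - w)

# Sample points in the upper half-plane H (positive imaginary part).
samples = [1j, 2 + 3j, -5 + 0.1j, 0.5j]
for z in samples:
    w = phi(z)
    assert abs(w) < 1                # phi(z) lies in the unit disk
    assert abs(psi(w) - z) < 1e-12   # psi inverts phi at this point
print("Cayley transform checks passed")
```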

Exercise 7 Show that for any real numbers {a<b}, the strip {\{ z \in {\bf C}: a < \mathrm{Re}(z) < b \}} is complex diffeomorphic to the disk {D(0,1)}. (Hint: use the complex exponential and a linear transformation to map the strip onto the half-plane {{\bf H}}.)
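Following the hint, one can assemble an explicit strip-to-disk map from standard pieces: an affine map taking the vertical strip to the horizontal strip {\{0 < \mathrm{Im}\, w < \pi\}}, the exponential taking that strip to the upper half-plane, and then the Cayley transform. The following Python sketch (my own assembly, for illustration) checks numerically that the composite lands in the disk:

```python
# A concrete conformal map of the strip {a < Re z < b} onto D(0,1):
# rotate/scale to {0 < Im w < pi}, apply exp to reach the upper
# half-plane, then apply the Cayley transform.
import cmath

def strip_to_disk(z: complex, a: float, b: float) -> complex:
    w = 1j * cmath.pi * (z - a) / (b - a)   # strip -> {0 < Im w < pi}
    h = cmath.exp(w)                        # -> upper half-plane
    return (h - 1j) / (h + 1j)              # Cayley transform -> disk

a, b = -1.0, 2.0
for z in (0.0, -0.5 + 10j, 1.9 - 7j):
    assert a < z.real < b                   # sample points in the strip
    assert abs(strip_to_disk(z, a, b)) < 1  # image lies in the disk
print("strip mapped into the disk")
```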

Exercise 8 Show that for any real numbers {a<b<a+2\pi}, the strip {\{ re^{i\theta}: r>0, a < \theta < b \}} is complex diffeomorphic to the disk {D(0,1)}. (Hint: use a branch of either the complex logarithm, or of a complex power {z \mapsto z^\alpha}.)

We will discuss some other explicit conformal maps in this set of notes, such as the Schwarz-Christoffel maps that transform the upper half-plane {{\bf H}} to polygonal regions. Further examples of conformal mapping can be found in the text of Stein-Shakarchi.

— 1. Maps between the model Riemann surfaces —

In this section we study the various holomorphic maps, and conformal maps, between the three model Riemann surfaces {{\bf C} \cup \{\infty\}}, {{\bf C}}, and {D(0,1)}.

From Exercise 19 of Notes 4, we know that the only holomorphic maps {f: {\bf C} \cup \{\infty\} \rightarrow {\bf C} \cup \{\infty\}} from the Riemann sphere to itself (besides the constant function {\infty}) take the form of a rational function {f(z) = P(z) / Q(z)} away from the zeroes of {Q} (and from {\infty}), with these singularities all being removable, and with {Q} not identically zero. We can of course reduce to lowest terms and assume that {P} and {Q} have no common factors. In particular, if {f} is to take values in {{\bf C}} rather than {{\bf C} \cup \{\infty\}}, then {Q} can have no roots (since {f} will have a pole at these roots) and so by the fundamental theorem of algebra {Q} is constant and {f} is a polynomial; in order for {f} to have no pole at infinity, {f} must then be constant. Thus the only holomorphic maps from {{\bf C} \cup \{\infty\}} to {{\bf C}} are the constants; in particular, the only holomorphic maps from {{\bf C} \cup \{\infty\}} to {D(0,1)} are the constants. In particular, {{\bf C} \cup \{\infty\}} is not complex diffeomorphic to {{\bf C}} or {D(0,1)} (this is also topologically obvious since the Riemann sphere is compact, and {{\bf C}} and {D(0,1)} are not).

Exercise 9 More generally, show that if {M} is a compact Riemann surface and {N} is a connected non-compact Riemann surface, then the only holomorphic maps from {M} to {N} are the constants. (Hint: use the open mapping theorem, Theorem 37 of Notes 4.)

Now we consider complex automorphisms of the Riemann sphere {{\bf C} \cup \{\infty\}} to itself. There are some obvious examples of such automorphisms:

  • Translation maps {z \mapsto z + c} for some {c \in {\bf C}}, with the convention that {\infty} is mapped to {\infty};
  • Dilation maps {z \mapsto \lambda z} for some {\lambda \in {\bf C} \backslash \{0\}}, with the convention that {\infty} is mapped to {\infty}; and
  • The inversion map {z \mapsto 1/z}, with the convention that {\infty} is mapped to {0}.

More generally, given any complex numbers {a,b,c,d} with {ad-bc \neq 0}, we can define the Möbius transformation (or fractional linear transformation) {z \mapsto \frac{az+b}{cz+d}} for {z \neq \infty, -d/c}, with the convention that {-d/c} is mapped to {\infty} and {\infty} is mapped to {a/c} (where we adopt the further convention that {a/0=\infty} for non-zero {a}). For {c=0}, this is an affine transformation {z \mapsto \frac{a}{d} z + \frac{b}{d}}, which is clearly a composition of a translation and dilation map; for {c \neq 0}, this is a combination {z \mapsto \frac{a}{c} - \frac{ad-bc}{cz+d}} of translations, dilations, and the inversion map. Thus all Möbius transformations are formed from composition of the translations, dilations, and inversions, and in particular are also automorphisms of the Riemann sphere; it is also easy to see that the Möbius transformations are closed under composition, and are thus the group generated by the translations, dilations, and inversions.
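The group law just described — composition of Möbius transformations corresponds to multiplication of the associated {2 \times 2} matrices — can be spot-checked numerically. In this Python sketch the matrices and test point are illustrative choices of mine:

```python
# Composition of Möbius transformations z -> (az + b)/(cz + d)
# corresponds to multiplication of the matrices ((a, b), (c, d)).

def mobius(m, z: complex) -> complex:
    (a, b), (c, d) = m
    return (a * z + b) / (c * z + d)

def matmul(m1, m2):
    (a, b), (c, d) = m1
    (e, f), (g, h) = m2
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))

m1 = ((1, 2j), (3, 4))   # z -> (z + 2i)/(3z + 4), det = 4 - 6i != 0
m2 = ((0, 1), (1, 0))    # the inversion z -> 1/z
z = 0.7 - 1.3j
lhs = mobius(m1, mobius(m2, z))      # compose the two maps
rhs = mobius(matmul(m1, m2), z)      # apply the product matrix
assert abs(lhs - rhs) < 1e-12
print("composition agrees with the matrix product")
```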

One can interpret the Möbius transformations as projective linear transformations as follows. Recall that the general linear group {GL_2({\bf C})} is the group of {2 \times 2} matrices {\begin{pmatrix} a & b \\ c & d \end{pmatrix}} with non-vanishing determinant {ad-bc}. Clearly every such matrix generates a Möbius transformation {z \mapsto \frac{az+b}{cz+d}}. However, two different elements of {GL_2({\bf C})} can generate the same Möbius transformation if they are scalar multiples of each other. If we define the projective linear group {PGL_2({\bf C})} to be the quotient group of {GL_2({\bf C})} by the group of scalar invertible matrices, then we may identify the set of Möbius transformations with {PGL_2({\bf C})}. The group {GL_2({\bf C})} acts on the space {{\bf C}^2} by the usual map

\displaystyle  \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} z \\ w \end{pmatrix} = \begin{pmatrix} az+bw \\ cz+dw \end{pmatrix}.

If we let {{\bf CP}^1} be the complex projective line, that is to say the space of one-dimensional subspaces of {{\bf C}^2}, then {GL_2({\bf C})} acts on this space also, with the action of the scalars being trivial, so we have an action of {PGL_2({\bf C})} on {{\bf CP}^1}. We can identify the Riemann sphere {{\bf C} \cup \{\infty\}} with the complex projective line by identifying each {c \in {\bf C} \subset {\bf C} \cup \{\infty\}} with the one-dimensional subspace {\{ (cw,w): w \in {\bf C} \}} of {{\bf C}^2}, and identifying {\infty \in {\bf C} \cup \{\infty\}} with {\{ (z,0): z \in {\bf C}\}}. With this identification, one can check that the action of {PGL_2({\bf C})} on {{\bf CP}^1} has become identified with the action of the group of Möbius transformations on {{\bf C} \cup \{\infty\}}. (In particular, the group of Möbius transformations is isomorphic to {PGL_2({\bf C})}.)

There are enough Möbius transformations available that their action on the Riemann sphere is not merely transitive, but is in fact {3}-transitive:

Lemma 10 ({3}-transitivity) Let {z_1,z_2,z_3} be distinct elements of the Riemann sphere {{\bf C} \cup \{\infty\}}, and let {w_1,w_2,w_3} also be three distinct elements of the Riemann sphere. Then there exists a unique Möbius transformation {T} such that {T(z_j) = w_j} for {j=1,2,3}.

Proof: We first show existence. As the Möbius transformations form a group, it suffices to verify the claim for a single choice of {z_1,z_2,z_3}, for instance {z_1 = 0, z_2 = 1, z_3 = \infty}. If {w_3=\infty} then the affine transformation {z \mapsto w_1 + z(w_2-w_1)} will have the desired properties. If {w_3 \neq \infty}, we can use translation and inversion to find a Möbius transformation {S} that maps {w_3} to {\infty}; applying the previous case with {w_1,w_2,w_3} replaced by {S(w_1), S(w_2), S(w_3)}, and then applying {S^{-1}}, we obtain the claim.

Now we prove uniqueness. By composing on the left and right with Möbius transforms we may assume that {z_1=w_1=0, z_2=w_2=1, z_3=w_3=\infty}. A Möbius transformation {z \mapsto \frac{az+b}{cz+d}} that fixes {0,1,\infty} must obey the constraints {b=0, a+b=c+d, c=0} and so must be the identity, as required. \Box
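For finite distinct points, the transformation produced by this proof can be written explicitly: {T(z) = \frac{(z - z_1)(z_2 - z_3)}{(z - z_3)(z_2 - z_1)}} sends {z_1, z_2, z_3} to {0, 1, \infty}. A small numerical witness (Python sketch, sample points mine):

```python
# The explicit Möbius transformation sending finite distinct points
# z1, z2, z3 to 0, 1, infinity respectively.

def to_0_1_inf(z1: complex, z2: complex, z3: complex):
    def T(z: complex) -> complex:
        return ((z - z1) * (z2 - z3)) / ((z - z3) * (z2 - z1))
    return T

z1, z2, z3 = 1 + 1j, -2.0, 3j
T = to_0_1_inf(z1, z2, z3)
assert abs(T(z1)) < 1e-12          # z1 -> 0
assert abs(T(z2) - 1) < 1e-12      # z2 -> 1
assert abs(T(z3 + 1e-9)) > 1e6     # T blows up near z3, i.e. z3 -> infinity
print("3-transitivity witness verified")
```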

Möbius transformations are not 4-transitive, thanks to the invariant known as the cross-ratio:

Exercise 11 Define the cross-ratio {[z_1,z_2; z_3,z_4]} between four distinct points {z_1,z_2,z_3,z_4} on the Riemann sphere {{\bf C} \cup \{\infty\}} by the formula

\displaystyle  [z_1,z_2; z_3,z_4] = \frac{(z_1-z_3)(z_2-z_4)}{(z_2-z_3)(z_1-z_4)}

if all of {z_1,z_2,z_3,z_4} avoid {\infty}, and extended continuously to the case when one of the points equals {\infty} (e.g. {[z_1,z_2;z_3,\infty] = \frac{z_1-z_3}{z_2-z_3}}).

  • (i) Show that an injective map {T: {\bf C} \cup \{\infty\} \rightarrow {\bf C} \cup \{\infty\}} is a Möbius transform if and only if it preserves the cross-ratio, that is to say that {[T(z_1),T(z_2);T(z_3),T(z_4)] = [z_1,z_2;z_3,z_4]} for all distinct points {z_1,z_2,z_3,z_4 \in {\bf C} \cup \{\infty\}}. (Hint: for the “only if” part, work with the basic Möbius transforms. For the “if” part, reduce to the case when {T} fixes three points, such as {0,1,\infty}.)
  • (ii) If {z_1,z_2,z_3,z_4} are distinct points in {{\bf C} \cup \{\infty\}}, show that {z_1,z_2,z_3,z_4} lie on a common extended line (i.e., a line in {{\bf C}} together with {\infty}) or circle in {{\bf C}} if and only if the cross-ratio {[z_1,z_2;z_3,z_4]} is real. Conclude that a Möbius transform will map an extended line or circle to an extended line or circle.
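The invariance in part (i) is easy to spot-check numerically. In the Python sketch below, the Möbius coefficients and the four sample points are illustrative choices of mine:

```python
# Numerical check that a Möbius transformation preserves the cross-ratio
# [z1, z2; z3, z4] = ((z1 - z3)(z2 - z4)) / ((z2 - z3)(z1 - z4)).

def cross_ratio(z1, z2, z3, z4) -> complex:
    return ((z1 - z3) * (z2 - z4)) / ((z2 - z3) * (z1 - z4))

def mobius(z: complex, a=2, b=1j, c=1, d=3) -> complex:
    # ad - bc = 6 - i is nonzero, so this is a genuine Möbius transform.
    return (a * z + b) / (c * z + d)

pts = [0.5, 1 + 1j, -2j, 4 - 1j]
before = cross_ratio(*pts)
after = cross_ratio(*(mobius(z) for z in pts))
assert abs(before - after) < 1e-9
print("cross-ratio preserved:", before)
```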

As one quick application of Möbius transformations, we have

Proposition 12 {{\bf C} \cup \{\infty\}} is simply connected.

Proof: We have to show that any closed curve {\gamma} in {{\bf C} \cup \{\infty\}} is contractible to a point in {{\bf C} \cup \{\infty\}}. By deforming {\gamma} locally into line segments in either of the two standard coordinate charts of {{\bf C} \cup \{\infty\}} we may assume that {\gamma} is the concatenation of finitely many such line segments; in particular, {\gamma} cannot be a space-filling curve (as one can see from e.g. the Baire category theorem) and thus avoids at least one point in {{\bf C} \cup \{\infty\}}. If {\gamma} avoids {\infty} then it lies in {{\bf C}} and can thus be contracted to a point in {{\bf C}} (and hence in {{\bf C} \cup \{\infty\}}) since {{\bf C}} is convex. If {\gamma} avoids any other point {z_0}, then we can apply a Möbius transformation to move {z_0} to {\infty}, contract the transformed curve to a point, and then invert the Möbius transform to contract {\gamma} to a point in {{\bf C} \cup \{\infty\}}. \Box

Exercise 13 (Jordan curve theorem in the Riemann sphere) Let {\gamma: [a,b] \rightarrow {\bf C} \cup \{\infty\}} be a simple closed curve in the Riemann sphere. Show that the complement of {\gamma([a,b])} in {{\bf C} \cup \{\infty\}} is the union of two disjoint simply connected open subsets of {{\bf C} \cup \{\infty\}}. (Hint: one first has to exclude the possibility that {\gamma} is space-filling. Do this by verifying that {\gamma([a,b])} is homeomorphic to the unit circle.)

It turns out that there are no other automorphisms of the Riemann sphere than the Möbius transformations:

Proposition 14 (Automorphisms of Riemann sphere) Let {T: {\bf C} \cup \{\infty\} \rightarrow {\bf C} \cup \{\infty\}} be a complex diffeomorphism. Then {T} is a Möbius transformation.

Proof: By Lemma 10 and composing {T} with a Möbius transformation, we may assume without loss of generality that {T} fixes {0,1,\infty}. From Exercise 19 of Notes 4 we know that {T} is a rational function {T(z) = P(z)/Q(z)} (with all singularities removed); we may reduce terms so that {P,Q} have no common factors. Since {T} is bijective and fixes {\infty}, it has no poles in {{\bf C}}, and hence {Q} can have no roots; by the fundamental theorem of algebra, this makes {Q} constant. Similarly, {P} has no zeroes other than {0}, and so must be a monomial; as {T} also fixes {1}, it must be of the form {T(z) = z^n} for some natural number {n}. But this is only injective if {n=1}, in which case {T} is clearly a Möbius transformation. \Box

Now we look at holomorphic maps on {{\bf C}}. There are plenty of holomorphic maps from {{\bf C}} to {{\bf C}}; indeed, these are nothing more than the entire functions, of which there are many (indeed, an entire function is nothing more than a power series with an infinite radius of convergence). There are even more holomorphic maps from {{\bf C}} to {{\bf C} \cup \{\infty\}}, as these are just the meromorphic functions on {{\bf C}}. For instance, any ratio {f/g} of two entire functions, with {g} not identically zero, will be meromorphic on {{\bf C}}. On the other hand, from Liouville’s theorem (Theorem 28 of Notes 3) we see that the only holomorphic maps from {{\bf C}} to {D(0,1)} are the constants. In particular, {{\bf C}} and {D(0,1)} are not complex diffeomorphic (despite the fact that they are diffeomorphic over the reals, as can be seen for instance by using the projection {z \mapsto \frac{z}{\sqrt{1+|z|^2}}}).

The affine maps {z \mapsto az+b} with {a \in {\bf C} \backslash \{0\}} and {b \in {\bf C}} are clearly complex automorphisms on {{\bf C}}. In analogy with Proposition 14, these turn out to be the only automorphisms:

Proposition 15 (Automorphisms of complex plane) Let {T: {\bf C} \rightarrow {\bf C}} be a complex diffeomorphism. Then {T} is an affine transformation {T(z) = az+b} for some {a \in {\bf C} \backslash \{0\}} and {b \in {\bf C}}.

Proof: By the open mapping theorem (Theorem 37 of Notes 4), {T(D(0,1))} is open, and hence {T} avoids the non-empty open set {T(D(0,1))} on {{\bf C} \backslash D(0,1)}, as {T} is injective. By the Casorati-Weierstrass theorem (Theorem 11 of Notes 4), we conclude that {T} does not have an essential singularity at infinity. Thus {T} extends to a holomorphic function from {{\bf C} \cup \{\infty\}} to {{\bf C} \cup \{\infty\}}, hence by Exercise 19 of Notes 4 is rational. As the only pole of {T} is at infinity, {T} is a polynomial; as {T} is a diffeomorphism, the derivative has no zeroes and is thus constant by the fundamental theorem of algebra. Thus {T} must be affine, and the claim follows. \Box

Exercise 16 Let {f: {\bf C} \rightarrow {\bf C} \cup \{\infty\}} be an injective holomorphic map. Show that {f} is a Möbius transformation (restricted to {{\bf C}}).

We remark that injective holomorphic maps are often referred to as univalent functions in the literature.

Finally, we consider holomorphic maps on {D(0,1)}. There are plenty of holomorphic maps from {D(0,1)} to {{\bf C}} (indeed, these are just the power series with radius of convergence at least {1}), and even more holomorphic maps from {D(0,1)} to {{\bf C} \cup \{\infty\}} (for instance, one can take the quotient of two holomorphic functions {f,g: D(0,1) \rightarrow {\bf C}} with {g} non-zero). There are also many holomorphic maps from {D(0,1)} to {D(0,1)}, for instance one can take any bounded holomorphic function {f: D(0,1) \rightarrow {\bf C}} and multiply it by a small constant. However, we have the following fundamental estimate concerning such functions, the Schwarz lemma:

Lemma 17 (Schwarz lemma) Let {f: D(0,1) \rightarrow D(0,1)} be a holomorphic map such that {f(0)=0}. Then we have {|f(z)| \leq |z|} for all {z \in D(0,1)}. In particular, {|f'(0)| \leq 1}.

Furthermore, if {|f(z)|=|z|} for some {z \in D(0,1) \backslash \{0\}}, or if {|f'(0)|=1}, then there exists a real number {\theta} such that {f(z) = e^{i \theta} z} for all {z \in D(0,1)}.

Proof: By the factor theorem (Corollary 22 of Notes 3), we may write {f(z) = z g(z)} for some holomorphic {g: D(0,1) \rightarrow {\bf C}}. On any circle {\{ z: |z| = 1-\varepsilon \}} with {0 < \varepsilon < 1}, we have {|f(z)| <1} and hence {|g(z)| < \frac{1}{1-\varepsilon}}; by the maximum principle we conclude that {|g(z)| \leq \frac{1}{1-\varepsilon}} for all {z \in D(0,1-\varepsilon)}. Sending {\varepsilon} to zero, we conclude that {|g(z)| \leq 1} for all {z \in D(0,1)}, and hence {|f(z)| \leq |z|}; since {f'(0) = g(0)}, we also have {|f'(0)| \leq 1}.

Finally, if {|f(z)|=|z|} for some {z \in D(0,1) \backslash \{0\}} or {|f'(0)|=1}, then {|g(z)|} equals {1} for some {z \in D(0,1)}, and hence by a variant of the maximum principle (see Exercise 18 below) we see that {g} is constant, giving the claim. \Box
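To make the lemma concrete, consider the example {f(z) = z(z+1)/2} (a function of my choosing), which maps {D(0,1)} to itself and fixes the origin; the Schwarz bounds can be verified numerically on a grid of sample points:

```python
# Illustrating the Schwarz lemma with f(z) = z(z + 1)/2, which maps
# the unit disk to itself and fixes 0: here |f(z)| <= |z| and |f'(0)| = 1/2.
import cmath

def f(z: complex) -> complex:
    return z * (z + 1) / 2

for r in (0.1, 0.5, 0.9, 0.99):
    for k in range(8):
        z = r * cmath.exp(2j * cmath.pi * k / 8)
        assert abs(f(z)) <= abs(z) + 1e-12   # the Schwarz bound |f(z)| <= |z|

# Difference quotient near 0 approximates f'(0) = 1/2, consistent
# with the derivative bound |f'(0)| <= 1.
h = 1e-6
assert abs((f(h) - f(0)) / h - 0.5) < 1e-5
print("Schwarz lemma bounds hold for this f")
```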

Exercise 18 (Variant of maximum principle) Let {U} be a connected Riemann surface, and let {z_0} be a point in {U}. Prove the following claims:

  • (i) If {u: U \rightarrow {\bf R}} is a harmonic function such that {u(z) \leq u(z_0)} for all {z \in U}, then {u(z) = u(z_0)} for all {z \in U}.
  • (ii) If {f: U \rightarrow {\bf C}} is a holomorphic function such that {|f(z)| \leq |f(z_0)|} for all {z \in U}, then {f(z) = f(z_0)} for all {z \in U}.

(Hint: use Exercise 17 of Notes 3.)

One can think of the Schwarz lemma as follows. Let {{\mathcal H}_0} denote the collection of holomorphic functions {f: D(0,1) \rightarrow D(0,1)} with {f(0)=0}. Inside this collection we have the rotations {R_\theta: D(0,1) \rightarrow D(0,1)} for {\theta \in {\bf R}} defined by {R_\theta(z) :=e^{i\theta} z}. The Schwarz lemma asserts that these rotations “dominate” the remaining functions {f} in {{\mathcal H}_0} in the sense that {|f(z)| \leq |R_\theta(z)|} on {D(0,1) \backslash \{0\}}, and in particular {|f'(0)| \leq |R'_\theta(0)|}; furthermore these inequalities are strict as long as {f} is not one of the {R_\theta}.

As a first application of the Schwarz lemma, we characterise the automorphisms of the disk {D(0,1)}. For any {\alpha \in D(0,1)}, one can check that the Möbius transformation {z \mapsto \frac{z-\alpha}{1-\overline{\alpha} z}} preserves the boundary of the disk {D(0,1)} (since {1 - \overline{\alpha} z = z \overline{(z-\alpha)}} when {|z|=1}), and maps the point {\alpha} to the origin, and thus maps the disk {D(0,1)} to itself. More generally, for any {\alpha \in D(0,1)} and {\theta \in {\bf R}}, the Möbius transformation {z \mapsto e^{i\theta} \frac{z-\alpha}{1-\overline{\alpha} z}} is an automorphism of the disk {D(0,1)}. It turns out that these are the only such automorphisms:
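The two defining properties of these Blaschke-type factors — mapping {\alpha} to the origin and preserving the unit circle — are easy to confirm numerically (Python sketch; the choice of {\alpha} is illustrative):

```python
# The factor b_alpha(z) = (z - alpha)/(1 - conj(alpha) z) sends alpha
# to 0 and maps the unit circle to itself; a numerical spot check.
import cmath

def blaschke(alpha: complex, z: complex) -> complex:
    return (z - alpha) / (1 - alpha.conjugate() * z)

alpha = 0.3 - 0.4j                       # a point of D(0,1)
assert abs(blaschke(alpha, alpha)) < 1e-12   # alpha is sent to the origin
for k in range(12):                      # sample points on |z| = 1
    z = cmath.exp(2j * cmath.pi * k / 12)
    assert abs(abs(blaschke(alpha, z)) - 1) < 1e-12   # circle preserved
print("Blaschke factor checks passed")
```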

Theorem 19 (Automorphisms of disk) Let {f: D(0,1) \rightarrow D(0,1)} be a complex diffeomorphism. Then there exists {\alpha \in D(0,1)} and {\theta \in {\bf R}} such that {f(z) = e^{i\theta} \frac{z-\alpha}{1-\overline{\alpha} z}} for all {z \in D(0,1)}. If furthermore {f(0)=0}, then we can take {\alpha=0}, thus {f(z) =e^{i\theta} z} for {z \in D(0,1)}.

Proof: First suppose that {f(0)=0}. By the Schwarz lemma applied to both {f} and its inverse {f^{-1}}, we see that {|f'(0)|, |(f^{-1})'(0)| \leq 1}. But by the inverse function theorem (or the chain rule), {(f^{-1})'(0) = 1/f'(0)}, hence {|f'(0)|=1}. Applying the Schwarz lemma again, we conclude that {f(z) = e^{i\theta} z} for some {\theta}, as required.

In the general case, there exists {\alpha \in D(0,1)} such that {f(\alpha) = 0}. If one then applies the previous analysis to {f \circ g^{-1}}, where {g: D(0,1) \rightarrow D(0,1)} is the automorphism {g(z) := \frac{z-\alpha}{1-\overline{\alpha} z}}, we obtain the claim. \Box

Exercise 20 (Automorphisms of half-plane) Let {f: {\bf H} \rightarrow {\bf H}} be a complex diffeomorphism from the upper half-plane {{\bf H} := \{z \in {\bf C}: \mathrm{Im}(z) > 0 \}} to itself. Show that there exist real numbers {a,b,c,d} with {ad-bc = 1} such that {f(z) = \frac{az+b}{cz+d}} for {z \in {\bf H}}. Conclude that the automorphism group of either {D(0,1)} or {{\bf H}} is isomorphic as a group to the projective special linear group {PSL_2({\bf R})} formed by starting with the special linear group {SL_2({\bf R})} of {2 \times 2} real matrices {\begin{pmatrix} a & b \\ c & d \end{pmatrix}} of determinant {1}, and then quotienting out by the central subgroup {\{ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} \}}.
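The key identity behind this exercise is {\mathrm{Im} \frac{az+b}{cz+d} = \frac{(ad-bc)\,\mathrm{Im}(z)}{|cz+d|^2}} for real {a,b,c,d}, which explains why determinant-one real matrices preserve the upper half-plane; a numerical check (Python sketch, sample matrix mine):

```python
# For real a, b, c, d the imaginary part transforms as
#   Im((az + b)/(cz + d)) = (ad - bc) * Im(z) / |cz + d|^2,
# so real matrices of determinant 1 preserve the upper half-plane.

def act(a: float, b: float, c: float, d: float, z: complex) -> complex:
    return (a * z + b) / (c * z + d)

a, b, c, d = 2.0, 1.0, 1.0, 1.0           # ad - bc = 1
z = 0.3 + 0.7j
w = act(a, b, c, d, z)
expected = (a * d - b * c) * z.imag / abs(c * z + d) ** 2
assert abs(w.imag - expected) < 1e-12     # the identity holds numerically
assert w.imag > 0                         # image stays in the half-plane
print("upper half-plane preserved")
```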

Remark 21 Collecting the various assertions above about the holomorphic maps between the elliptic, parabolic, and hyperbolic model Riemann surfaces {{\bf C} \cup \{\infty\}}, {{\bf C}}, {D(0,1)}, one arrives at the following rule of thumb: there are “many” holomorphic maps from “more hyperbolic” surfaces to “less hyperbolic” surfaces, but “very few” maps going in the other direction (and also relatively few automorphisms from one space to an “equally hyperbolic” surface). This rule of thumb also turns out to be accurate in the context of compact Riemann surfaces, where “higher genus” becomes the analogue of “more hyperbolic” (and similarly for “less hyperbolic” or “equally hyperbolic”). One can formalise this latter version of the rule of thumb using such results as the Riemann-Hurwitz formula and the de Franchis theorem, but these are beyond the scope of this course.

Exercise 22 Let {f: M \rightarrow N} be a non-constant holomorphic map between Riemann surfaces {M,N}. If {M} is compact and {N} is connected, show that {f} is surjective and {N} is compact. Conclude in particular that there are no non-constant bounded holomorphic functions {f: M \rightarrow {\bf C}} on a compact Riemann surface.

— 2. Quotients of the model Riemann surfaces —

The three model Riemann surfaces {{\bf C} \cup \{\infty\}}, {{\bf C}}, {D(0,1)} are all simply connected, and the uniformisation theorem will tell us that up to complex diffeomorphism, these are the only simply connected Riemann surfaces that exist. However, it is possible to form non-simply-connected Riemann surfaces from these model surfaces by the procedure of taking quotients, as follows. Let {M} be a Riemann surface, and let {\Gamma} be a group of complex automorphisms of {M}. We assume that the action of {\Gamma} on {M} is free, which means that the non-identity transformations {T: M \rightarrow M} in {\Gamma} have no fixed points (thus {T p \neq p} for all {p \in M}). We also assume that the action is proper (viewing {\Gamma} as a discrete group), which means that for any compact subset {K} of {M}, there are only finitely many automorphisms {T} in {\Gamma} for which {TK} intersects {K}. If the action is both free and proper, then we see that every point {p \in M} has a small neighbourhood {U_p} with the property that the images {TU_p, T \in \Gamma} are all disjoint; by making {U_p} small enough we can also find a holomorphic coordinate chart {\phi_p: U_p \rightarrow V_p} to some open subset {V_p} of {{\bf C}}. We can then form the quotient manifold {\Gamma \backslash M} of orbits {\Gamma p := \{ Tp: T \in \Gamma\}}, using the coordinate charts {\tilde \phi_p: \Gamma \backslash \bigcup_{T \in \Gamma} TU_p \rightarrow V_p} for any {p \in M} defined by setting

\displaystyle  \tilde \phi_p( \Gamma q ) := \phi_p( q )

for all {q \in U_p}. One can easily verify that {\Gamma \backslash M} is a Riemann surface, and that the quotient map {\pi: M \rightarrow \Gamma \backslash M} defined by {\pi(p) := \Gamma p} is a surjective holomorphic map. The ability to easily take quotients is one of the key advantages of the Riemann surface formalism; another is the converse ability to construct covers, such as the universal cover of a Riemann surface, defined in Theorem 25 below.

Exercise 23 Let {M} be a Riemann surface, {\Gamma} a group of complex automorphisms of {M} acting in a proper and free fashion, and let {\pi: M \rightarrow \Gamma \backslash M} be the quotient map. Let {f: M \rightarrow N} be a holomorphic map to another Riemann surface {N}. Show that there exists a holomorphic map {\tilde f: \Gamma \backslash M \rightarrow N} such that {f = \tilde f \circ \pi} if and only if {f = f \circ T} for all {T \in \Gamma}.

Remark 24 It is also of interest to quotient a Riemann surface {M} by a group {\Gamma} of complex automorphisms whose action is not free. In that case, the quotient space {\Gamma \backslash M} need not be a manifold, but is instead a more general object known as an orbifold. A typical example is the modular curve {SL_2({\bf Z}) \backslash {\bf H}} (where {SL_2({\bf Z})} is the group of {2 \times 2} matrices with integer coefficients and determinant {1}); this is of great importance in analytic number theory. However, we will not study orbifolds in this course.

Since the continuous image of a connected space is always connected, we see that any quotient {\Gamma \backslash M} of a connected Riemann surface is again connected. In the converse direction, one can use this construction to describe a connected Riemann surface as a quotient of a simply connected Riemann surface:

Theorem 25 (Universal cover) Let {M} be a connected Riemann surface. Then there exists a simply connected Riemann surface {N}, and a group {\Gamma} of complex automorphisms acting on {N} in a proper and free fashion, such that {M} is complex diffeomorphic to {\Gamma \backslash N}.

Proof: For sake of brevity we omit some of the details of the construction as exercises.

We use the following abstract construction to build the Riemann surface {N}. Fix a base point {p_0} in {M}. For any point {p} in {M}, we can form the space of all continuous paths {\gamma: [a,b] \rightarrow M} from {p_0} to {p} for some interval {[a,b]} with {a<b}. We let {N_p} denote the space of equivalence classes of such paths with respect to the operation of homotopy with fixed endpoints up to reparameterisation; for instance, if {M} was simply connected, then {N_p} would simply be a point. (One could also omit the reparameterisation by restricting the domain of {\gamma} to be a fixed interval such as {[0,1]}.) As {M} is connected, all the {N_p} are non-empty. We then let {N} be the (disjoint) union of all the {N_p}. This defines a set {N} together with a projection map {\pi: N \rightarrow M} that sends all the homotopy classes in {N_p} to {p} for each {p \in M}; this is clearly a surjective map.

This defines {N} as a set, but we want to give {N} the structure of a Riemann surface, and thus must create an atlas of coordinate charts. For every {p \in M}, let {\phi_p: U_p \rightarrow D(0,1)} be a coordinate chart that is a diffeomorphism between some neighbourhood {U_p} of {p} and the unit disk. Given a homotopy class {\tilde p} in {N_p} and a point {p'} in {U_p}, we can then associate a point {\tilde p'} in {N_{p'}} by taking a path {\gamma} from {p_0} to {p} in the homotopy class {\tilde p}, and concatenating it with the path {\phi_p^{-1} \circ \gamma_{0 \rightarrow \phi_p(p')}} that connects {p} to {p'} via a line segment in the disk {D(0,1)} using the coordinate chart {\phi_p}; the homotopy class of this concatenated path does not depend on the precise choice of {\gamma} and will be denoted {\tilde p'}. If we let {\tilde U_{\tilde p}} denote all the points {\tilde p'} obtained in this fashion as {p'} varies over {U_p}, then it is easy to see (Exercise!) that the {\tilde U_{\tilde p}, \tilde p \in N_p} are disjoint and partition the set {\pi^{-1}(U_p)}. We can then form coordinate charts {\tilde \phi_{\tilde p}: \tilde U_{\tilde p} \rightarrow D(0,1)} for each {\tilde p \in N_p} and {p \in M} by setting {\tilde \phi_{\tilde p}(\tilde p') = \phi_p \circ \pi(\tilde p')} for all {\tilde p' \in \tilde U_{\tilde p}}. This defines both a topology on {N} (by declaring a subset {U} of {N} to be open if {\tilde \phi_{\tilde p}( U \cap \tilde U_{\tilde p})} is open for all {\tilde p \in N}) and a complex structure, as the transition maps are easily verified (Exercise!) to be both continuous and holomorphic (after first shrinking the neighbourhoods {U_p} and {\tilde U_p} if necessary). By construction we now see that {N} is a covering space of {M}, with {\pi: N \rightarrow M} the covering map.

Let {\tilde p_0 \in N_{p_0}} be the homotopy class of the constant curve at {p_0}. It is easy to see (Exercise!) that {N} is connected (indeed, any point {\tilde p} in {N} determines (more or less tautologically) a family of paths in {N} from {\tilde p_0} to {\tilde p}). Next, we make the stronger claim that {N} is simply connected. It suffices to show that any closed path {\tilde \gamma: [0,1] \rightarrow N} from {\tilde p_0} to {\tilde p_0} is contractible to a point. Let {\gamma: [0,1] \rightarrow M} denote the projected curve {\gamma := \pi \circ \tilde \gamma}, thus {\gamma} is a closed curve from {p_0} to itself. From the continuity method (Exercise!) we see that for any {0 \leq t \leq 1}, the restriction {\gamma_{[0,t]}: [0,t] \rightarrow M} of {\gamma} to {[0,t]} lies in the homotopy class of {\tilde \gamma(t)}; in particular, {\gamma} itself lies in the homotopy class of {\tilde p_0}, and is thus homotopic to a point. Another application of the continuity method (Exercise!) then shows that as one continuously deforms {\gamma} to a point, each of the curves {\gamma_s} obtained in this deformation lifts to a closed curve {\tilde \gamma_s} in {N} from {\tilde p_0} to {\tilde p_0}, in the sense that {\gamma_s = \pi \circ \tilde \gamma_s}; furthermore, {\tilde \gamma_s} varies continuously in {s}, giving the required homotopy from {\tilde \gamma} to a point.

Define a deck transformation to be a holomorphic map {T: N \rightarrow N} such that {\pi \circ T = \pi} (that is to say, {T} preserves each of the “fibres” {N_p} of {N}). Clearly the composition of two deck transformations is again a deck transformation. From Corollary 50 of Notes 4 we see that for any {p \in M} and {\tilde p, \tilde q \in N_p}, there exists a unique deck transformation that maps {\tilde p} to {\tilde q}. Composing such a deck transformation with the deck transformation that maps {\tilde q} to {\tilde p}, we see that all deck transformations are invertible and are thus complex automorphisms. If we let {\Gamma} denote the collection of all deck transformations then we see that {\Gamma} is a group that acts freely on {N} and transitively on each fibre {N_p}. For any {p \in M}, and the neighbourhoods {U_p} as before, one can verify (Exercise!) that each deck transformation in {\Gamma} permutes the disjoint open sets {\tilde U_{\tilde p}, \tilde p \in N_p} covering {U_p}, and given any two of these sets {\tilde U_{\tilde p}, \tilde U_{\tilde q}} there is exactly one deck transformation that maps {\tilde U_{\tilde p}} to {\tilde U_{\tilde q}}. From this one can check (Exercise!) that {\Gamma \backslash N} is complex diffeomorphic to {M} as required. \Box

Exercise 26 Write out the steps marked “Exercise!” in the above argument.

The manifold {N} in the above theorem is called a universal cover of {M}, and the group {\Gamma} is (a copy of) the fundamental group of {M}. These objects are basically uniquely determined by {M}:

Exercise 27 Suppose one has two simply connected Riemann surfaces {N, N'} and two groups {\Gamma, \Gamma'} of automorphisms of {N, N'} respectively acting in proper and free fashions. Show that the following statements are equivalent:

(Hint: use Exercise 23 for one direction of the implication, and Corollary 50 of Notes 4 for the other implication.)

Exercise 28 Let {M} be a connected Riemann surface, and let {p_0} be a point in {M}. Define the fundamental group {\pi_1(M)} based at {p_0} to be the collection of equivalence classes {[\gamma]} of closed curves {\gamma:[a,b] \rightarrow M} from {p_0} to {p_0}, under the relation of homotopy with fixed endpoints up to reparameterisation.

  • (i) Show that {\pi_1(M)} is indeed a group, with the equivalence class of the constant curves as the identity element, the inverse of a homotopy class {[\gamma]} of a curve {\gamma} defined as {[-\gamma]}, and the product {[\gamma_1] [\gamma_2]} of two homotopy classes of curves {\gamma_1,\gamma_2} as {[\gamma_1+\gamma_2]}.
  • (ii) If {N, \Gamma} are as in Theorem 25, show that {\Gamma} is isomorphic to {\pi_1(M)}.

Exercise 29 Show that the fundamental group of {{\bf C} \backslash \{0\}} is isomorphic to the integers {{\bf Z}} (viewed as an additive group).
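The isomorphism in Exercise 29 is realised by the winding number of a loop about the origin, which can be computed numerically by summing argument increments along the loop (Python sketch; the discretisation into 200 sample points is my choice):

```python
# The homotopy class of a loop in C \ {0} is detected by its winding
# number: (1/2pi) times the total change of arg(z) along the loop.
import cmath

def winding_number(points) -> int:
    total = 0.0
    # Sum argument increments over consecutive points, closing the loop.
    for p, q in zip(points, points[1:] + points[:1]):
        total += cmath.phase(q / p)   # increment lies in (-pi, pi]
    return round(total / (2 * cmath.pi))

def loop(n: int, steps: int = 200):
    # Sample the loop t -> exp(2*pi*i*n*t) at `steps` equally spaced times.
    return [cmath.exp(2j * cmath.pi * n * k / steps) for k in range(steps)]

assert winding_number(loop(0)) == 0
assert winding_number(loop(1)) == 1
assert winding_number(loop(-3)) == -3
print("winding numbers computed")
```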

If we assume for now the uniformisation theorem, we conclude that every connected Riemann surface is the quotient of one of the three model surfaces {{\bf C} \cup \{\infty\}}, {{\bf C}}, {D(0,1)} by a group of complex automorphisms that act freely and properly; depending on which surface is used, we call these Riemann surfaces of elliptic type, parabolic type, and hyperbolic type respectively. We can then study each of the three model types in turn:

Elliptic type: By Proposition 14, the automorphisms of {{\bf C} \cup \{\infty\}} are the Möbius transformations. From the quadratic formula (or the fundamental theorem of algebra) we see that every Möbius transformation has at least one fixed point (for instance, the translations {z \mapsto z+c} fix {\infty}). Thus the only group of complex automorphisms that can act freely on {{\bf C} \cup \{\infty\}} is the trivial group, so the only Riemann surfaces of elliptic type are those that are complex diffeomorphic to the Riemann sphere.
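The fixed point computation here is completely explicit: a Möbius transformation {z \mapsto \frac{az+b}{cz+d}} fixes {z} precisely when {cz^2 + (d-a)z - b = 0}, with {\infty} fixed in the degenerate case {c=0}. A small sketch (the helper below is purely illustrative):

```python
import cmath

def mobius_fixed_points(a, b, c, d):
    # fixed points in C u {inf} of z -> (a z + b)/(c z + d), with ad - bc != 0
    if c == 0:
        pts = [complex("inf")]          # affine maps always fix infinity
        if a != d:
            pts.append(b / (a - d))     # plus a finite fixed point when a != d
        return pts
    # quadratic formula for c z^2 + (d - a) z - b = 0
    disc = cmath.sqrt((d - a) ** 2 + 4 * b * c)
    return [((a - d) + disc) / (2 * c), ((a - d) - disc) / (2 * c)]

# the inversion z -> 1/z fixes +1 and -1
print(sorted(z.real for z in mobius_fixed_points(0, 1, 1, 0)))  # [-1.0, 1.0]
```

In every case the returned list is non-empty, which is exactly what rules out non-trivial free actions on the Riemann sphere.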

Parabolic type: By Proposition 14, the automorphisms of {{\bf C}} are the affine transformations {z \mapsto az+b}. These transformations have fixed points in {{\bf C}} if {a \neq 1}, so in order to obtain a free action we must restrict {\Gamma} to the translations {z \mapsto z+b}. Thus we can view {\Gamma} as an additive subgroup of {{\bf C}}, with {\Gamma \backslash {\bf C}} now being the group quotient; as the action is additive, we can also write {\Gamma \backslash {\bf C}} as {{\bf C} / \Gamma}. In order for the action to be proper, {\Gamma} must be a discrete subgroup of {{\bf C}} (every point isolated). We can classify all such subgroups:

Exercise 30 Let {\Gamma} be a discrete additive subgroup of {{\bf C}}. Show that {\Gamma} takes on one of the following three forms:

  • (i) (Rank zero case) the trivial group {\{0\}};
  • (ii) (Rank one case) a cyclic group {\omega {\bf Z} := \{ n \omega: n \in {\bf Z}\}} for some {\omega \in {\bf C} \backslash \{0\}}; or
  • (iii) (Rank two case) a group {\omega_1 {\bf Z} + \omega_2 {\bf Z} := \{ n_1 \omega_1 + n_2 \omega_2: n_1,n_2 \in {\bf Z} \}} for some {\omega_1,\omega_2 \in {\bf C} \backslash \{0\}} with {\omega_2/\omega_1} strictly complex (i.e., not real).

We conclude that every Riemann surface of parabolic type is complex diffeomorphic to a plane {{\bf C}}, a cylinder {\omega{\bf Z} \backslash {\bf C}} for some {\omega \in {\bf C} \backslash \{0\}}, or a torus {(\omega_1 {\bf Z} + \omega_2{\bf Z}) \backslash {\bf C}} for {\omega_1,\omega_2 \in {\bf C} \backslash \{0\}} and {\omega_1/\omega_2} strictly complex.

The case of the plane is self-explanatory. Using dilation maps we see that all cylinders are complex diffeomorphic to each other; for instance, they are all diffeomorphic to {2\pi i {\bf Z} \backslash {\bf C}}. The exponential map {z \mapsto \exp(z)} is {2\pi i}-periodic and thus descends to a map from {2\pi i {\bf Z} \backslash {\bf C}} to {{\bf C} \backslash \{0\}}; it is easy to see that this map is a complex diffeomorphism, thus the punctured plane {{\bf C} \backslash \{0\}} can be used as a model for all Riemann surface cylinders.
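Concretely (an informal numerical check), the {2\pi i}-periodicity of {\exp} and the existence of a logarithm for every nonzero {w} are the two facts behind this identification of the cylinder with the punctured plane:

```python
import cmath

# exp is 2*pi*i-periodic, so it descends to the cylinder 2*pi*i*Z \ C ...
z0 = 0.7 - 2.1j
w0 = -3.5 + 1.25j
periodic = abs(cmath.exp(z0 + 2j * cmath.pi) - cmath.exp(z0)) < 1e-12
# ... and every nonzero w is hit: log|w| + i*arg(w) is a preimage
surjective = abs(cmath.exp(cmath.log(w0)) - w0) < 1e-12
print(periodic, surjective)  # True True
```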

The case of the tori {(\omega_1 {\bf Z} + \omega_2{\bf Z}) \backslash {\bf C}} is more interesting. One can use dilations to normalise one of the {\omega_1,\omega_2} to be a specific value such as {1}, but one cannot normalise both:

Exercise 31 Let {(\omega_1 {\bf Z} + \omega_2{\bf Z}) \backslash {\bf C}} and {(\omega'_1 {\bf Z} + \omega'_2{\bf Z}) \backslash {\bf C}} be two tori. Show that these two tori are complex diffeomorphic if and only if there exists an element {\begin{pmatrix} a & b \\ c & d \end{pmatrix}} of the special linear group {SL_2({\bf Z})} (thus {a,b,c,d} are integers with {ad-bc=1}) such that

\displaystyle  \frac{\omega'_1}{\omega'_2} = \pm \frac{a \omega_1 + b \omega_2}{c \omega_1 + d \omega_2}.

(Hint: lift any such diffeomorphism to a holomorphic map from {{\bf C}} to {{\bf C}} of linear growth.)
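One direction of Exercise 31 can be seen very concretely: an {SL_2({\bf Z})} change of generators does not change the lattice {\omega_1 {\bf Z} + \omega_2 {\bf Z}} at all, since the inverse matrix also has integer entries. An informal numerical sketch (the sample generators are chosen arbitrarily):

```python
def coords(z, u, v):
    # coordinates (x, y) with x*u + y*v = z, solving the real 2x2 system
    # by Cramer's rule (u, v form an R-basis of C when v/u is non-real)
    det = u.real * v.imag - v.real * u.imag
    x = (z.real * v.imag - v.real * z.imag) / det
    y = (u.real * z.imag - z.real * u.imag) / det
    return x, y

w1, w2 = 1.0 + 0j, 0.3 + 1.2j             # sample generators (w2/w1 non-real)
a, b, c, d = 2, 1, 1, 1                    # element of SL_2(Z): ad - bc = 1
W1, W2 = a * w1 + b * w2, c * w1 + d * w2  # new generators

# the old generators have integer coordinates in the new basis,
# so the two lattices coincide
for z in (w1, w2):
    x, y = coords(z, W1, W2)
    print(round(x, 9), round(y, 9))
```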

In contrast to the cylinders {\omega {\bf Z} \backslash {\bf C}}, which are complex diffeomorphic to a subset {{\bf C} \backslash \{0\}} of the complex plane, one cannot model a torus by a subset {U} of {{\bf C}}; indeed, if there were a complex diffeomorphism {\phi: (\omega_1 {\bf Z} + \omega_2{\bf Z}) \backslash {\bf C} \rightarrow U}, then {U} would have to be non-empty, compact, and (by the open mapping theorem) open in {{\bf C}}, which is impossible since {{\bf C}} is non-compact and connected. However, it is an important fact in algebraic geometry, classical analysis and number theory that these tori can be modeled instead by elliptic curves. The theory of elliptic curves is extremely rich, but is beyond the scope of this course and will not be discussed further here (but the Weierstrass elliptic functions used to construct the complex diffeomorphism between tori and elliptic curves may be covered in subsequent quarters).

Exercise 32 Let {U} be a connected subset of {{\bf C}} that omits at least two points of {{\bf C}}. Show that {U} cannot be of elliptic or parabolic type. (Hint: in addition to the open mapping theorem argument given above, one can use either the great Picard theorem, Theorem 56 of Notes 4, or the simpler Casorati-Weierstrass theorem (Theorem 11 of Notes 4).) In particular, assuming the uniformisation theorem, such sets {U} must be of hyperbolic type. (Note this is compatible with our previous intuition that “more hyperbolic” is analogous to “higher genus” or “has more holes”.)

Hyperbolic type: Here it is convenient to model the hyperbolic Riemann surface using the upper half-plane {{\bf H}} (the Poincaré half-plane model) rather than the disk {D(0,1)} (the Poincaré disk model). By Exercise 20, a Riemann surface of hyperbolic type is then isomorphic to a quotient {\Gamma \backslash {\bf H}} of {{\bf H}} by some subgroup {\Gamma} of {PSL_2({\bf R})} that acts freely and properly. Properness is easily seen to be equivalent to {\Gamma} being a discrete subgroup of {PSL_2({\bf R})} (using the topology inherited from the embedding of {SL_2({\bf R})} in {{\bf R}^4}); such groups are known as Fuchsian groups. Freeness can also be described explicitly:

Exercise 33 Show that a subgroup {\Gamma} of {PSL_2({\bf R})} acts freely on {{\bf H}} if and only if it avoids all (equivalence classes) of matrices {\begin{pmatrix} a & b \\ c & d \end{pmatrix}} in {PSL_2({\bf R})} that are elliptic in the sense that they obey the trace condition {|a+d| < 2}.
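The trace condition can be checked directly from the quadratic formula: when {ad-bc=1} and {c \neq 0}, the fixed points of {z \mapsto \frac{az+b}{cz+d}} are {\frac{(a-d) \pm \sqrt{(a+d)^2-4}}{2c}}, and these leave the real line exactly when {|a+d| < 2}. A quick informal sketch:

```python
import cmath

def fixes_point_of_H(a, b, c, d):
    # real coefficients, ad - bc = 1, c != 0; note (d-a)^2 + 4bc = (a+d)^2 - 4
    disc = cmath.sqrt((a + d) ** 2 - 4)
    roots = [((a - d) + disc) / (2 * c), ((a - d) - disc) / (2 * c)]
    return any(z.imag > 1e-12 for z in roots)

print(fixes_point_of_H(0, -1, 1, 1))  # elliptic, |trace| = 1 < 2: True
print(fixes_point_of_H(2, 1, 1, 1))   # hyperbolic, |trace| = 3 > 2: False
```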

It turns out that in contrast to the elliptic type and parabolic type situations, there are a very large number of possible subgroups {\Gamma} obeying these conditions, and a complete classification of them is basically hopeless. The theory of Fuchsian groups is again very rich, being a foundational topic in hyperbolic geometry, but is again beyond the scope of this course.

Remark 34 The twice-punctured plane {{\bf C} \backslash \{0,1\}} must be of hyperbolic type by the uniformisation theorem and Exercise 32. This gives another proof of the little Picard theorem: an entire function {f: {\bf C} \rightarrow {\bf C} \backslash \{0,1\}} that omits (say) the points {0,1} must then lift (by Corollary 50 of Notes 4) to a holomorphic map from {{\bf C}} to {D(0,1)}, which must then be constant by Liouville’s theorem. A more complicated argument along these lines also proves the great Picard theorem. It turns out that the covering map from {D(0,1)} to {{\bf C} \backslash \{0,1\}} can be described explicitly using the theory of elliptic functions (and specifically the modular lambda function), but this is beyond the scope of this course.

Exercise 35 (Schottky’s theorem) Show that any annulus {\{ z: r < |z| < R \}} is of hyperbolic type, and is in fact complex diffeomorphic to {\Gamma \backslash {\bf H}} for some cyclic group {\Gamma} of dilations. (Hint: first use the complex exponential to cover the annulus by a strip, then use Exercise 7.)

Exercise 36 Let {A_1 := \{ z: r_1 < |z| < R_1\}} and {A_2 := \{ z: r_2 < |z| < R_2 \}} be two annuli with {0 < r_1 < R_1} and {0 < r_2 < R_2}. Show that {A_1} and {A_2} are complex diffeomorphic if and only if {R_2/r_2 = R_1/r_1}. (Hint: one can either argue by lifting to the half-plane using the previous exercise, or else using the Schwarz reflection principle (adapted to circles in place of lines) repeatedly to extend a holomorphic map from {A_1} to {A_2} to a holomorphic map from a punctured disk to a punctured disk; one can also combine the methods by taking logarithms to lift {A_1}, {A_2} to strips, and then using the original Schwarz reflection principle.)

Exercise 37

  • (i) Show that the punctured disk {D(0,1) \backslash \{0\}} is of hyperbolic type, and is complex diffeomorphic to {{\bf Z} \backslash {\bf H}}, where {{\bf Z}} acts on {{\bf H}} by translations.
  • (ii) Show that the Joukowski transform {z \mapsto \frac{1}{2} (z + \frac{1}{z})} is a complex diffeomorphism from {D(0,1)} to the slitted extended complex plane {({\bf C} \cup \{\infty\}) \backslash [-1,1]}. Conclude that {{\bf C} \backslash [-1,1]} is also complex diffeomorphic to {{\bf Z} \backslash {\bf H}}.
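For part (ii), one can at least sanity-check numerically (an informal aside) that the Joukowski transform sends the punctured disk off the slit {[-1,1]}:

```python
import cmath, random

def joukowski(z):
    return (z + 1 / z) / 2

random.seed(0)
for _ in range(1000):
    r = random.uniform(0.01, 0.99)
    t = random.uniform(0.0, 2 * cmath.pi)
    w = joukowski(r * cmath.exp(1j * t))
    # for z = r e^{it} with r < 1, Im w = (r - 1/r) sin(t) / 2 vanishes only
    # when sin(t) = 0, and then |Re w| = (r + 1/r)/2 > 1: never on the slit
    assert not (abs(w.imag) < 1e-15 and abs(w.real) <= 1)
print("no sample landed on [-1, 1]")
```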

— 3. The Riemann mapping theorem —

We are now ready to prove the Riemann mapping theorem, Theorem 4, using an argument of Koebe. To motivate the argument, let us rephrase the Schwarz lemma in the following form:

Lemma 38 (Schwarz lemma, again) Let {U} be a Riemann surface, and let {p_0} be a point in {U}. Let {{\mathcal H}_{p_0}} denote the collection of holomorphic functions {f: U \rightarrow D(0,1)} with {f(p_0) = 0}. If {{\mathcal H}_{p_0}} contains an element {\phi} that is a complex diffeomorphism, then {|f(p)| \leq |\phi(p)|} for all {p \in U \backslash \{p_0\}}; if {U} is a subset of the complex plane {{\bf C}}, we also have {|f'(p_0)| \leq |\phi'(p_0)|}. Furthermore, in either of these two inequalities, equality holds if and only if {f = e^{i\theta} \phi} for some real number {\theta}.

Proof: Apply Lemma 17 to the map {f \circ \phi^{-1}: D(0,1) \rightarrow D(0,1)}. \Box

This lemma suggests the following strategy to prove the Riemann mapping theorem: starting with the open subset {U} of the complex plane {{\bf C}}, pick a point {p_0} in that subset, and form the collection {{\mathcal H}_{p_0}} of holomorphic maps {f: U \rightarrow D(0,1)} that map {p_0} to {0}, and locate an element {\phi} of this collection for which the magnitude {|\phi'(p_0)|} is maximal. If the Riemann mapping theorem were true, then Lemma 38 would ensure that this {\phi} would be a complex diffeomorphism, and we would be done.

It turns out to be convenient to work with the somewhat smaller collection {{\mathcal I}_{p_0}} of injective holomorphic maps {f: U \rightarrow D(0,1)} (also known as univalent functions from {U} to {D(0,1)}). We first observe that this collection is non-empty for the sets {U} of interest:

Proposition 39 Let {U} be a simply connected subset of {{\bf C}} that is not all of {{\bf C}}. Then there exists an injective holomorphic map {f: U \rightarrow D(0,1)}.

Proof: By applying a translation to {U}, we may assume that {U} avoids the origin {0}. If {U} in fact avoided a disk {D(z_0,r)}, then we could use the map {z \mapsto \frac{r}{2(z-z_0)}} to map {U} injectively into the disk {D(0,1)}. At present, {U} need not avoid any disk (e.g. {U} could be the complex plane with the negative axis {\{ t \in {\bf R}: t \leq 0 \}} removed). However, as {U} is simply connected and avoids {0}, we can argue as in Section 4 of Notes 4 to obtain a holomorphic branch {f: U \rightarrow {\bf C} \backslash \{0\}} of the square root function, that is to say a holomorphic map {f} such that {f(z)^2 = z} for all {z \in U}. As {z \mapsto f(z)^2} is injective, {f} must also be injective; it is also clearly non-constant, so from the open mapping theorem {f(U)} is open and thus contains some disk {D(z_0,r)}. But if {w} lies in {f(U)} then {-w} cannot lie in {f(U)} since this would make the map {z \mapsto f(z)^2} non-injective; thus {f(U)} avoids a disk {D(-z_0,r)}, and the claim follows. \Box
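For a concrete instance of this construction (an informal sketch; the domain and constants are chosen purely for illustration), take {U = {\bf C} \backslash \{ t \in {\bf R}: t \leq 0 \}}: the principal square root maps {U} into the right half-plane, which avoids the disk {D(-1,1)}, and the inversion step then lands in {D(0,1)}:

```python
import cmath, random

def into_disk(z):
    # U = C minus the negative real axis; the principal sqrt has Re > 0 on U,
    # so sqrt(U) avoids D(-1, 1); then apply z -> r/(2(z - z0)) with z0 = -1, r = 1
    s = cmath.sqrt(z)
    return 1 / (2 * (s + 1))   # |s + 1| > 1, hence the image has modulus < 1/2

random.seed(2)
ok = True
for _ in range(1000):
    z = complex(random.uniform(-10, 10), random.uniform(-10, 10))
    if z.real <= 0 and abs(z.imag) < 1e-9:
        continue  # skip points on the removed half-line
    ok = ok and abs(into_disk(z)) < 1
print(ok)  # True: every sample maps into D(0,1)
```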

If {p_0} is a point in {U}, then the map {f} constructed by the above proposition need not map {p_0} to the origin, but this is easily fixed by composing {f} with a suitable automorphism of {D(0,1)}. To prove the Riemann mapping theorem, it will thus suffice to show

Proposition 40 Let {U} be a simply connected Riemann surface, let {p_0} be a point in {U}, and let {{\mathcal I}_{p_0}} be the collection of injective holomorphic maps {f: U \rightarrow D(0,1)} with {f(p_0)=0}. If {{\mathcal I}_{p_0}} is non-empty, then {U} is complex diffeomorphic to {D(0,1)}.

Proof: By identifying {U} with its image under one of the elements of {{\mathcal I}_{p_0}}, we may assume without loss of generality that {U} is itself an open subset of {D(0,1)}, with {p_0=0}.

Define the quantity

\displaystyle  M := \sup \{ |f'(0)|: f \in {\mathcal I}_0 \}.

As {{\mathcal I}_0} contains the identity map, {M} is at least {1}; from the Cauchy inequalities (Corollary 27 of Notes 3) we see that {M} is finite. Hence there exists a sequence {f_n} in {{\mathcal I}_0} with {|f'_n(0)|} converging to {M} as {n \rightarrow \infty}. From Montel’s theorem (Exercise 57(i) of Notes 4) we know that {{\mathcal I}_0} is a normal family, so on passing to a subsequence we may assume that the {f_n} converge locally uniformly to some limit {f: U \rightarrow \overline{D(0,1)}}. By Hurwitz’s theorem (Exercise 41 of Notes 4), the limit {f} is holomorphic and is either injective or constant. But from the higher order Cauchy integral formula (Theorem 25 of Notes 3), {f'_n(0)} converges to {f'(0)}, hence {|f'(0)|=M} and so {f} cannot be constant, and is thus injective. From the maximum principle (Exercise 18), we know that {f} takes values in {D(0,1)}, and not just {\overline{D(0,1)}}.

To conclude the proposition, we need to show that {f} is also surjective. Here we use a variant of the argument used to prove Proposition 39. Suppose for contradiction that {f(U)} avoids some point {\alpha} in {D(0,1)}. Let {R_\alpha: D(0,1) \rightarrow D(0,1)} be an automorphism of {D(0,1)} that sends {\alpha} to {0}, then {R_\alpha \circ f(U)} avoids the origin. As {U} is simply connected, we can thus find a holomorphic square root {g: U \rightarrow D(0,1) \backslash \{0\}} of {R_\alpha \circ f}, thus

\displaystyle  R_\alpha \circ f = g^2.

Since {f} and hence {R_\alpha \circ f} are injective, {g} is also. Finally, if {R_{g(0)}: D(0,1) \rightarrow D(0,1)} is an automorphism of {D(0,1)} that sends {g(0)} to {0}, then the map {h := R_{g(0)} \circ g} lies in {{\mathcal I}_0}. The map {h} is related to {f} by the formula

\displaystyle  f = R_\alpha^{-1} \circ s \circ R_{g(0)}^{-1} \circ h

where {s: z \mapsto z^2} is the squaring map. Observe that the map {R_\alpha^{-1} \circ s \circ R_{g(0)}^{-1}} is a holomorphic map from {D(0,1)} to {D(0,1)} that maps {0} to {0}, and is not a rotation map (since {s} is not a Möbius transformation). Thus by the Schwarz lemma (Lemma 17), we have

\displaystyle  |(R_\alpha^{-1} \circ s \circ R_{g(0)}^{-1})'(0)| < 1

and hence by the chain rule

\displaystyle  M = |f'(0)| < |h'(0)|.

But this contradicts the definition of {M}, and we are done. \Box

Exercise 41 Let {U} be an open connected non-empty subset of {{\bf C}}. Show that the following are equivalent:

  • (i) {U} is simply connected.
  • (ii) One has {\int_\gamma f(z)\ dz = 0} for every holomorphic function {f: U \rightarrow {\bf C}} and every closed curve {\gamma} in {U}.
  • (iii) For every holomorphic function {f: U \rightarrow {\bf C} \backslash \{0\}} there exists a holomorphic {g: U \rightarrow {\bf C}} with {f = \exp(g)}.
  • (iv) For every holomorphic function {f: U \rightarrow {\bf C} \backslash \{0\}} there exists a holomorphic {g: U \rightarrow {\bf C} \backslash \{0\}} with {f = g^2}.
  • (v) The complement {({\bf C} \cup \{\infty\}) \backslash U} of {U} in the Riemann sphere is connected.

(Hint: to relate (v) to the other claims, use Exercise 43 from Notes 4.)

— 4. Schwarz-Christoffel mappings —

The Riemann mapping theorem guarantees the existence of complex diffeomorphisms {f: U \rightarrow D(0,1)} for any simply connected subset {U} of the complex plane that is not all of {{\bf C}}; in particular, such diffeomorphisms exist if {U} is a polygon, by which we mean the interior region of a simple closed anticlockwise polygonal path {\gamma_{z_1 \rightarrow z_2 \rightarrow \dots \rightarrow z_n \rightarrow z_1}}. However, the proof of the Riemann mapping theorem does not supply an easy way to compute what this map {f} is. Nevertheless, in the case of polygons a reasonably explicit formula for {f} (or more precisely, for the derivative of the inverse of {f}) may be found. Our arguments here are based on those in the text of Ahlfors.

We set up some notation. Let {\gamma_{z_1 \rightarrow z_2 \rightarrow \dots \rightarrow z_n \rightarrow z_1}} be a simple closed anticlockwise polygonal path (in particular, the {z_1,\dots,z_n} are all distinct), let {U} be the polygon enclosed by this path, and let {f: U \rightarrow D(0,1)} be a complex diffeomorphism, the existence of which is guaranteed by the Riemann mapping theorem. (The map {f} is only unique up to composition by an automorphism of {D(0,1)}, but this will not concern us for the present analysis.) We adopt the convention that {z_0 := z_n} and {z_{n+1} := z_1}, and for {j=1,\dots,n}, we let {0 < \alpha_j < 2} denote the counterclockwise angle subtended by the polygon at {z_j} (normalised by a factor of {1/\pi}), in the sense that

\displaystyle  (z_{j-1}-z_j) = c_j e^{i \alpha_j \pi} (z_{j+1} - z_j)

for some real {c_j > 0}. (Note that {\alpha_j} cannot attain the values {0} or {2} as this would cause the polygonal path to be non-simple.) It is also convenient to introduce the normalised exterior angle {-1 < \beta_j < 1} by {\beta_j := 1 - \alpha_j} (thus {\beta_j} is positive at a convex angle of the polygon, zero at a straight angle, and negative at a reflex (concave) angle), so that

\displaystyle  (z_{j+1}-z_j) = c_j^{-1} e^{i \beta_j \pi} (z_j - z_{j-1}).

Telescoping this identity, we conclude that {\beta_1+\dots+\beta_n} must be an even integer. Indeed, from Euclidean geometry we know that the exterior angles of a polygon add up to {2\pi}, so that

\displaystyle  \beta_1+\dots+\beta_n = 2; \ \ \ \ \ (2)

we will give an analytic proof of this fact presently.
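The normalised exterior angles are easy to compute from the vertices, since {\beta_j = \frac{1}{\pi} \mathrm{arg} \frac{z_{j+1}-z_j}{z_j-z_{j-1}}}; the following informal computation confirms (2) on a sample non-convex polygon:

```python
import cmath

def exterior_angles(vertices):
    # beta_j = (1/pi) * arg((z_{j+1} - z_j) / (z_j - z_{j-1})) for a simple
    # closed anticlockwise polygonal path given by its list of vertices
    n = len(vertices)
    betas = []
    for j in range(n):
        prev_edge = vertices[j] - vertices[j - 1]
        next_edge = vertices[(j + 1) % n] - vertices[j]
        betas.append(cmath.phase(next_edge / prev_edge) / cmath.pi)
    return betas

# an anticlockwise L-shaped hexagon; the vertex 1+1j is reflex (beta = -1/2)
L_shape = [0, 2, 2 + 1j, 1 + 1j, 1 + 2j, 2j]
betas = exterior_angles(L_shape)
print(round(sum(betas), 12))  # 2.0, as identity (2) predicts
```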

From the Alexander numbering rule (Exercise 55 of Notes 3) we see that {U} always lies to the left of the polygonal path {\gamma_{z_1 \rightarrow z_2 \rightarrow \dots \rightarrow z_n \rightarrow z_1}}. We can formalise this statement as follows. First suppose that {z_*} is a non-vertex boundary point of {U}, thus {z_* = (1-t) z_j + t z_{j+1}} for some {1 \leq j \leq n} and {0 < t < 1}. Then we can form the affine map {\phi_{z_*}: {\bf C} \rightarrow {\bf C}} by the formula

\displaystyle  \phi_{z_*}( \zeta ) := z_* + (z_{j+1} - z_{j}) \zeta,

and the numbering rule tells us that for {\varepsilon > 0} small enough, the half-disk

\displaystyle  D_+(0,\varepsilon) := \{ z \in {\bf C}: |z| < \varepsilon, \mathrm{Im}(z) > 0 \}

is mapped holomorphically by {\phi_{z_*}} into {U}. If {z_* = z_j} is instead a vertex of {U}, the situation is a little trickier; we now define the map {\phi_{z_j}: {\bf C} \backslash I_- \rightarrow {\bf C}} by the formula

\displaystyle  \phi_{z_j}( \zeta ) := z_* + (z_{j+1}- z_{j}) \zeta^{\alpha_j}

where we choose the branch of {\zeta \mapsto \zeta^{\alpha_j}} with branch cut at the negative imaginary axis {I_- := \{ iy: y \leq 0 \}} and to be positive real on the positive real axis. Then again {\phi_{z_j}} will map {D_+(0,\varepsilon)} holomorphically into {U} for {\varepsilon} small enough. (The reader is encouraged to draw a picture to understand these maps.)

Now we perform some local analysis near the boundary. We first need a version of the Schwarz reflection principle (Exercise 37 of Notes 3) for harmonic functions.

Exercise 42 (Dirichlet problem) Let {f: S^1 \rightarrow {\bf R}} be a continuous function. Show that there exists a unique function {u: \overline{D(0,1)} \rightarrow {\bf R}} that is continuous on the closed disk {\overline{D(0,1)}}, harmonic on the open disk {D(0,1)}, and equal to {f} on the boundary {S^1}. Furthermore, show that {u} is given by the formula

\displaystyle  u(z) = \frac{1}{2\pi} \int_0^{2\pi} P( e^{-i\alpha}z) f(e^{i\alpha})\ d\alpha

for {z \in D(0,1)}, where {P} is the Poisson kernel

\displaystyle  P(z) := \mathrm{Re} \frac{1+z}{1-z}

(compare with Exercise 17 of Notes 3).
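The Poisson formula lends itself to a direct numerical check (an informal aside; note the {\frac{1}{2\pi}} normalisation, which makes constant boundary data extend to the same constant): with boundary data {f(e^{i\alpha}) = \cos 2\alpha}, the harmonic extension must be {\mathrm{Re}(z^2)}:

```python
import cmath, math

def poisson_extend(f, z, samples=2000):
    # u(z) = (1/(2*pi)) * integral_0^{2*pi} P(e^{-i a} z) f(a) da,
    # with Poisson kernel P(w) = Re (1 + w)/(1 - w); midpoint quadrature
    total = 0.0
    for k in range(samples):
        a = 2 * math.pi * (k + 0.5) / samples
        w = cmath.exp(-1j * a) * z
        total += ((1 + w) / (1 - w)).real * f(a)
    return total / samples

z = 0.5 * cmath.exp(1j * math.pi / 6)
u = poisson_extend(lambda a: math.cos(2 * a), z)
print(round(u, 9))  # 0.125 = Re(z^2)
```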

Lemma 43 (Schwarz reflection for harmonic functions) Let {U} be an open subset of {{\bf C}} symmetric around the real axis, and let {u: \overline{U_+} \rightarrow {\bf R}} be a continuous function on the region {\overline{U_+} := \{z \in U: \mathrm{Im} z \geq 0\}} that vanishes on {U \cap {\bf R}} and is harmonic in {U_+ := \{ z \in U: \mathrm{Im} z > 0 \}}. Let {\tilde u: U \rightarrow {\bf R}} be the antisymmetric extension of {u}, defined by setting {\tilde u(z) = u(z)} and {\tilde u(\overline{z}) = -u(z)} for {z \in \overline{U_+}}. Then {\tilde u} is harmonic.

Proof: Morally speaking, this lemma follows from the analogous reflection principle for holomorphic functions, but there is a difficulty because we do not have enough regularity on the real axis to easily build a harmonic conjugate that is continuous all the way to the real axis. Instead we shall rely on the maximum principle as follows.

It is clear that {\tilde u} is continuous and harmonic away from the real axis, so it suffices to show for any {x_0 \in U \cap {\bf R}} and any small {\varepsilon>0} that {\tilde u} is harmonic on {D(x_0,\varepsilon)}.

Using Exercise 42 (rescaled to the disk {D(x_0,\varepsilon)}), we can find a continuous function {v: \overline{D(x_0,\varepsilon)} \rightarrow {\bf R}} which agrees with {\tilde u} on the boundary and is harmonic on the interior. From the antisymmetry of {\tilde u} and uniqueness (or the Poisson kernel formula) we see that {v} is also antisymmetric and thus vanishes on the real axis. The difference {\tilde u-v} is then harmonic on the half-disks {\{ z \in D(x_0,\varepsilon): \mathrm{Im} z > 0\}} and {\{ z \in D(x_0,\varepsilon): \mathrm{Im} z < 0\}} and vanishes on the boundary of these half-disks, so by the maximum principle it vanishes everywhere in {D(x_0,\varepsilon)}. Thus {\tilde u} agrees with {v} on {D(x_0,\varepsilon)} and is therefore harmonic on this disk as required. \Box

Proposition 44 Let {z_*} be a boundary point of {U} (which may or may not be a vertex). Then for {\varepsilon > 0} small enough, the map {f \circ \phi_{z_*}: D_+(0,\varepsilon) \rightarrow D(0,1)} extends holomorphically to a map from {D(0,\varepsilon)} to {{\bf C}} which maps the origin to a point on the unit circle. Furthermore, this map is injective for {\varepsilon} small enough.

Proof: For any {0 < r < 1}, the preimage {f^{-1}(\overline{D(0,r)})} of the closed disk {\overline{D(0,r)}} is a compact subset of {U} and thus stays a positive distance away from the boundary of {U}. In particular, for {z \in U} sufficiently close to the boundary of {U}, {|f(z)|} must exceed {r}. We conclude that the function {|f|: U \rightarrow [0,1)} extends continuously to a map from {\overline{U}} to {[0,1]}, by declaring the map to equal {1} on the boundary. In particular, for {\varepsilon} small enough, the map {|f| \circ \phi_{z_*}: D_+(0,\varepsilon) \rightarrow [0,1)} also extends continuously to {\overline{D_+(0,\varepsilon)}}, and equals {1} on the real boundary of {D_+(0,\varepsilon)}. For {\varepsilon} small enough, {|f|} avoids zero on this region, and so the function {\log |f| \circ \phi_{z_*}: D_+(0,\varepsilon) \rightarrow {\bf R}} will extend continuously to {\overline{D_+(0,\varepsilon)}}, and vanish on the real portion of the boundary. By taking local branches of {\log f} we see that this function {\log |f| \circ \phi_{z_*}} is also harmonic. By Lemma 43, {\log |f| \circ \phi_{z_*}} extends harmonically to {D(0,\varepsilon)}, and on taking harmonic conjugates we conclude that {\log f \circ \phi_{z_*}} extends holomorphically to {D(0,\varepsilon)}. Taking exponentials, we obtain a holomorphic extension {g_{z_*}: D(0,\varepsilon) \rightarrow {\bf C}} of {f \circ \phi_{z_*}} to {D(0,\varepsilon)}, with {|g_{z_*}(0)|=1}. To prove injectivity, it suffices (shrinking {\varepsilon} as necessary) to show that the derivative of {g_{z_*}} at {0} is non-zero. But if this were not the case, then {g_{z_*} - g_{z_*}(0)} would have a zero of order at least two, which by the factor theorem implies that {g_{z_*} - g_{z_*}(0)} would not map {D_+(0,\varepsilon)} into a half-plane bordering the origin, and in particular cannot map into {D(0,1) - g_{z_*}(0)}, a contradiction. \Box

As a corollary, we see that {f} extends to a continuous map {f: \overline{U} \rightarrow \overline{D(0,1)}} that maps {\partial U} to {S^1}, and around every point {z_*} in the boundary of {U}, {f} maps a small neighbourhood {\phi_{z_*}(D_+(0,\varepsilon))} of {z_*} in {U} to a small neighbourhood of {f(z_*)} in {D(0,1)}. As {f} is injective on {U}, this implies that {f} is also injective on the boundary of {U}. The image {f(\overline{U})} is compact in {\overline{D(0,1)}} and contains {D(0,1)}, hence {f: \overline{U} \rightarrow \overline{D(0,1)}} is in fact a bijective continuous map between compact Hausdorff spaces and is thus a homeomorphism. Thus we can form an inverse map {F: \overline{D(0,1)} \rightarrow \overline{U}}, which maps {D(0,1)} holomorphically to {U}. (This latter claim in fact works if one replaces the polygonal path {\gamma_{z_1 \rightarrow \dots \rightarrow z_n \rightarrow z_1}} by an arbitrary simple closed curve; this is a theorem of Carathéodory.)

Consider the function {f} on the line segment from {z_j} to {z_{j+1}}. By Proposition 44, {f} is smooth on this line segment, has non-zero derivative, and takes values in {S^1}; setting {w_j := f(z_j) \in S^1}, we see that {f} must traverse a simple curve from {w_j} to {w_{j+1}} in {S^1}. As {f} is orientation preserving, {U} lies to the left of the line segment {\gamma_{z_j \rightarrow z_{j+1}}}, and the disk {D(0,1)} lies to the left of {S^1} traversed anticlockwise, we see that {f} must traverse the anticlockwise arc from {w_j} to {w_{j+1}}. Following {f} all around {\gamma_{z_1 \rightarrow \dots \rightarrow z_n \rightarrow z_1}}, we see that {w_1,\dots,w_n} must be arranged anticlockwise in the unit circle in the sense that we have {w_j = e^{i \theta_j}} for all {1 \leq j \leq n} for some

\displaystyle  \theta_1 < \theta_2 < \dots < \theta_n < \theta_{n+1} := \theta_1 + 2\pi.

Inverting, we see that for any {1 \leq j \leq n}, {F} smoothly maps the anticlockwise arc {\{ e^{i\theta}: \theta_j < \theta < \theta_{j+1}\}} from {w_j} to {w_{j+1}} to the line segment {\{ (1-t) z_j + t z_{j+1}: 0 < t < 1 \}} from {z_j} to {z_{j+1}}, with derivative nonvanishing. Thus on taking arguments

\displaystyle  \mathrm{arg} \frac{d}{d\theta} F(e^{i\theta}) = \mathrm{arg}(z_{j+1} - z_j)

and thus by the chain rule

\displaystyle  \mathrm{arg} i e^{i\theta} F'(e^{i\theta}) = \mathrm{arg}(z_{j+1} - z_j) \ \ \ \ \ (3)

for {\theta_j < \theta < \theta_{j+1}}.

Next, we study {f} near {z_j} (and {F} near {w_j}) for some {1 \leq j \leq n}. From Proposition 44 we see that in a sufficiently small neighbourhood of {w_j} in {\overline{D(0,1)}}, one has {F = \phi_{z_j} \circ h_{z_j}} for some injective holomorphic map {h_{z_j}} from a neighbourhood of {w_j} in {{\bf C}} to a neighbourhood of {0} in {{\bf C}} that maps {w_j} to zero. Since {F} maps the arc from {w_j} to {w_{j+1}} to the line segment from {z_j} to {z_{j+1}}, {h_{z_j}} must map the portion of the arc from {w_j} to {w_{j+1}} near {w_j} to a portion of the positive real axis; in particular, by the chain rule, {i w_j h'_{z_j}(w_j)} is a positive real, call it {a_j}. If we factor

\displaystyle  h_{z_j}(w) = a_j \frac{w-w_j}{iw_j} \frac{h_{z_j}(w)}{h'_{z_j}(w_j) (w - w_j)},

noting that the third factor is close to one and the second factor lies in the upper half-plane, we have

\displaystyle  h_{z_j}(w)^{\alpha_j} = a_j^{\alpha_j} (\frac{w-w_j}{iw_j})^{\alpha_j} (\frac{h_{z_j}(w)}{h'_{z_j}(w_j) (w - w_j)})^{\alpha_j}

and hence from {F = \phi_{z_j} \circ h_{z_j}} we have the factorisation

\displaystyle  F(w) = z_j + (\frac{w-w_j}{iw_j})^{\alpha_j} G_j(w)

for {w} near {w_j} in {D(0,1)}, for some {G_j} that is holomorphic and non-zero in a neighbourhood of {w_j} in {{\bf C}}. Differentiating using {\alpha_j - 1 = -\beta_j}, we conclude that

\displaystyle  F'(w) = (\frac{w-w_j}{iw_j})^{-\beta_j} \tilde G_j(w) \ \ \ \ \ (4)

for {w} near {w_j} in {D(0,1)}, for some {\tilde G_j} that is also holomorphic and non-zero in a neighbourhood of {w_j} in {{\bf C}}.

The function {F': D(0,1) \rightarrow {\bf C}} is holomorphic and non-vanishing; as {D(0,1)} is simply connected, we must therefore have {F' = \exp(H)} for some holomorphic {H: D(0,1) \rightarrow {\bf C}} (by Exercise 46 of Notes 4). For any {\theta} between {\theta_j} and {\theta_{j+1}}, we see from the previous discussion that {F} extends holomorphically to a neighbourhood of {e^{i\theta}}, with {F'} non-vanishing at {e^{i\theta}}, so {H} extends also. From (3) we see that the argument of {i e^{i\theta} \exp( H(e^{i\theta}) )} is constant on the interval {(\theta_j, \theta_{j+1})}, and hence

\displaystyle  \theta + \mathrm{Im}( H(e^{i\theta})) \ \ \ \ \ (5)

is also constant on this interval. Meanwhile, from (4) we see that for {w} near {w_j = e^{i\theta_j}} in {D(0,1)}, we have

\displaystyle  H(w) = - \beta_j \mathrm{Log}_{I_-}( \frac{w-w_j}{iw_j}) + g_j(w)

for some {g_j} holomorphic in a neighbourhood of {w_j} in {{\bf C}}, where {\mathrm{Log}_{I_-}} is a branch of the complex logarithm with branch cut at {I_-}. From this we see that the function {\theta \mapsto \theta + \mathrm{Im}( H(e^{i\theta}))} has a jump discontinuity with jump {\beta_j \pi} as {\theta} crosses {\theta_j}. As this function is constant between consecutive {\theta_j} and clearly increases by {2\pi} when {\theta} increases by {2\pi}, the jumps must sum to {2\pi}, and we conclude the geometric identity (2).

Now consider the modified function {\tilde H: D(0,1) \rightarrow {\bf C}} defined by

\displaystyle  \tilde H(w) := H(w) + \sum_{j=1}^n \beta_j \mathrm{Log}_{I_-}( \frac{w-w_j}{iw_j}).

Then {\tilde H} is holomorphic on {D(0,1)}, and by the above analysis it extends continuously to {\overline{D(0,1)}}. We consider the imaginary part at {w = e^{i\theta}},

\displaystyle  \mathrm{Im} \tilde H(e^{i\theta}) := \mathrm{Im} H(e^{i\theta}) + \sum_{j=1}^n \beta_j \mathrm{Arg}_{I_-}( \frac{e^{i\theta}-w_j}{iw_j}),

where {\mathrm{Arg}_{I_-}} is a branch of the argument function with branch cut at {I_-}. Writing {e^{i\theta} - w_j = 2 i e^{i(\theta+\theta_j)/2} \sin( (\theta - \theta_j)/2 )}, we see that {\mathrm{Arg}_{I_-}( \frac{e^{i\theta}-w_j}{iw_j}) - \theta/2} is constant as long as {\theta - \theta_j} is not an integer multiple of {2\pi}. From this, (5), and (2), we see that the function {\theta \mapsto \mathrm{Im} \tilde H(e^{i\theta})} is constant on each arc {\theta_j < \theta < \theta_{j+1}}. Thus the function {\mathrm{Im} \tilde H} is harmonic on {D(0,1)}, continuous on {\overline{D(0,1)}}, and constant on the boundary {S^1}, so by the maximum principle it is constant, which from the Cauchy-Riemann equations makes {\tilde H} constant also. Thus we have

\displaystyle  H(w) = c - \sum_{j=1}^n \beta_j \mathrm{Log}_{I_-}( \frac{w-w_j}{iw_j})

on {D(0,1)} for some complex constant {c}, which on exponentiating gives

\displaystyle  F'(w) = \frac{C_1}{\prod_{j=1}^n ( \frac{w-w_j}{iw_j} )^{\beta_j} } \ \ \ \ \ (6)

on {D(0,1)} for some non-zero complex constant {C_1}. Applying the fundamental theorem of calculus, we obtain the Schwarz-Christoffel formula:

Theorem 45 (Schwarz-Christoffel for the disk) Let {\gamma_{z_1 \rightarrow \dots \rightarrow z_n \rightarrow z_1}} be a closed simple anticlockwise polygonal path, and define the exterior angles {-1 < \beta_1,\dots,\beta_n < 1} as above. Let {U} be the polygon enclosed by this path, and let {F: D(0,1) \rightarrow U} be a complex diffeomorphism. Then there exist phases {w_j = e^{i \theta_j}}, {j=1,\dots,n} for some {\theta_1 < \dots < \theta_n < \theta_1+2\pi}, a non-zero complex number {C_1}, and a complex number {C_2} such that

\displaystyle  F(w) = C_1 \int_0^w \frac{d\omega}{\prod_{j=1}^n ( \frac{\omega-w_j}{iw_j} )^{\beta_j} } + C_2

for all {w \in D(0,1)}, where the integral is over an arbitrary curve from {0} to {w}, and one selects a branch of {z \mapsto z^{\beta_j}} with branch cut on the negative imaginary axis {I_- := \{iy: y \leq 0\}}. Furthermore, {F(w)} converges to {z_j} as {w} approaches {w_j} for every {1 \leq j \leq n}.

Note that one can change the branches of {z \mapsto z^{\beta_j}} here, and also modify the normalising factors {iw_j}, by adjusting the constant {C_1} in a suitable fashion, as long as one does not move the branch cut for {\prod_{j=1}^n ( \frac{\omega-w_j}{iw_j} )^{\beta_j}} into the disk {D(0,1)}; one can similarly change the initial point {0} of the curve to any other point in {D(0,1)} by adjusting {C_2}. By taking log-derivatives in (6), we can also express the Schwarz-Christoffel formula equivalently as a partial fractions decomposition of {F''/F'}:

\displaystyle  \frac{F''(w)}{F'(w)} = -\sum_{j=1}^n \frac{\beta_j}{w-w_j}.
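This logarithmic derivative identity {F''/F' = -\sum_j \beta_j/(w-w_j)} is easy to sanity-check numerically: taking {C_1 = 1} in (6), a centered finite difference of {F''/F'} should match the partial fraction sum at any interior point. A minimal sketch (our own choice of prevertex configuration):

```python
import cmath

# a sample prevertex configuration (our choice) with exponents beta_j = 1/2
W = [cmath.exp(1j * cmath.pi * (2 * k + 1) / 4) for k in range(4)]
BETA = [0.5, 0.5, 0.5, 0.5]

def fprime(w):
    # the right-hand side of (6) with C_1 = 1
    p = 1 + 0j
    for wj, b in zip(W, BETA):
        p *= ((w - wj) / (1j * wj)) ** (-b)
    return p

w0, h = 0.3 + 0.2j, 1e-6
# centered difference for F''/F' = (d/dw) log F'
numeric = (fprime(w0 + h) - fprime(w0 - h)) / (2 * h) / fprime(w0)
exact = -sum(b / (w0 - wj) for wj, b in zip(W, BETA))
```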

The Schwarz-Christoffel formula does not completely describe the conformal mappings from {U} to the disk, because it does not specify exactly what the phases {w_j} and the complex constants {C_1,C_2} are. As the group of automorphisms {z \mapsto e^{i\theta} \frac{z-\alpha}{1-\overline{\alpha}z}} of {D(0,1)} has three degrees of freedom (one real parameter {\theta} and one complex parameter {\alpha}), one can for instance fix three of the phases {w_j}, but in general there are no simple formulae to then reconstruct the remaining parameters in the Schwarz-Christoffel formula, although numerical algorithms exist to compute them approximately. (In the case when the polygon is a rectangle, though, the Schwarz-Christoffel formula essentially produces an elliptic integral, and the complex diffeomorphisms from the rectangle to the disk or half-space are closely tied to elliptic functions; see Section 4.5 of Stein-Shakarchi for more discussion.)

Exercise 46 (Schwarz-Christoffel in a half-space) Let {\gamma_{z_1 \rightarrow \dots \rightarrow z_n \rightarrow z_1}} be a closed simple anticlockwise polygonal path, and define the exterior angles {-1 < \beta_1,\dots,\beta_n < 1} as above. Let {U} be the polygon enclosed by this path, and let {F: {\bf H} \rightarrow U} be a complex diffeomorphism from the upper half-plane {{\bf H} := \{ z \in {\bf C}: \mathrm{Im} z > 0 \}} to {U}.

  • (i) Show that {F} extends to a homeomorphism from the closure {\overline{{\bf H}}} of the upper half-plane in the Riemann sphere to {\overline{U}}, and that the preimages {F^{-1}(z_1),\dots,F^{-1}(z_n)} of the vertices all lie on {{\bf R} \cup \{\infty\}}.
  • (ii) If all of the {F^{-1}(z_1),\dots,F^{-1}(z_n)} are finite, show that after a cyclic permutation one has {F^{-1}(z_1) < \dots < F^{-1}(z_n)}, and that there exists a non-zero complex number {C_1}, and a complex number {C_2} such that

    \displaystyle  F(w) = C_1 \int_i^w \frac{d\omega}{\prod_{j=1}^{n} ( \omega-F^{-1}(z_j) )^{\beta_j} } + C_2

    for all {w \in {\bf H}}, where the integral is over any curve from {i} to {w}.

  • (iii) If one of the {F^{-1}(z_1),\dots,F^{-1}(z_n)} is infinite, show after a cyclic permutation that one has {F^{-1}(z_1) < \dots < F^{-1}(z_{n-1})} and {F^{-1}(z_n) = \infty}, and that there exists a non-zero complex number {C_1}, and a complex number {C_2} such that

    \displaystyle  F(w) = C_1 \int_i^w \frac{d\omega}{\prod_{j=1}^{n-1} ( \omega-F^{-1}(z_j) )^{\beta_j} } + C_2

    for all {w \in {\bf H}}.

Remark 47 One could try to apply the Schwarz-Christoffel formula to a closed polygonal path {\gamma_{z_1 \rightarrow \dots \rightarrow z_n \rightarrow z_1}} that is not simple. In such cases (and after choosing the parameters {C_1,C_2,w_1,\dots,w_n} correctly), what tends to happen is that the map {F} still maps the circle {S^1} to the closed path, but fails to be injective.

Exercise 48 Let {F: {\bf H} \rightarrow P} be a complex diffeomorphism from the upper half-plane {{\bf H}} to the half-strip {P := \{ z \in {\bf C}: \mathrm{Im} z > 0; -\frac{\pi}{2} < \mathrm{Re} z < \frac{\pi}{2} \}}, which extends to a continuous map {F: \overline{{\bf H}} \rightarrow \overline{P}} between the closures of {{\bf H}}, {P} in the Riemann sphere. Suppose that {F} maps {-1, 1, \infty} to {-\frac{\pi}{2}, \frac{\pi}{2}, \infty} respectively. Show that {F'(w) = \frac{1}{(1-w^2)^{1/2}}}, where we take the branch of the square root that is positive on the real axis and has a branch cut at {I_-}. (Hint: {P} is not quite a polygon, so one cannot directly apply the Schwarz-Christoffel formula; however the proof of that formula will still apply.)
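Since {(1-w^2)^{-1/2}} is the derivative of the principal branch of {\arcsin}, the map in this exercise is (up to the stated normalisation) the principal arcsine. A quick numerical check (ours, using Python's cmath) that the principal arcsine takes the upper half-plane into the half-strip {P} and inverts the sine:

```python
import cmath, math

# principal arcsin should map H = {Im w > 0} into P = {Im z > 0, |Re z| < pi/2},
# sending -1, 1 to -pi/2, pi/2
samples = [0.5j, -2 + 1j, 3 + 0.25j, -0.9 + 0.01j, 10 + 5j]
images = [cmath.asin(w) for w in samples]
in_strip = all(z.imag > 0 and abs(z.real) < math.pi / 2 for z in images)
# arcsin inverts sin on these points
err = max(abs(cmath.sin(z) - w) for z, w in zip(images, samples))
```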

— 5. The uniformisation theorem (optional) —

Now we discuss a proof of the uniformisation theorem, Theorem 5, following the approach in these notes of Marshall. Unfortunately the argument is rather complicated, and we will only give a portion of the proof here. One of the many difficulties in trying to prove this theorem is the fact that the conclusion is a disjunction of three alternatives, each with a rather different complex geometry; it would be easier if there were only one target geometry that one was trying to impose on the Riemann surface {M}. To begin separating the three geometries from each other, recall from Liouville’s theorem that there are no non-constant bounded holomorphic functions on {{\bf C}} or {{\bf C} \cup \{\infty\}}, but plenty of non-constant bounded holomorphic functions on {D(0,1)}. By Lemma 1, the same claims hold for Riemann surfaces that are complex diffeomorphic to {{\bf C}} or {{\bf C} \cup \{\infty\}} or to {D(0,1)} respectively. Note that without loss of generality we may normalise “bounded” by replacing it with “mapping into {D(0,1)}”. From this we see that the uniformisation theorem can be broken up into two simpler pieces:

Theorem 49 (Uniformisation theorem, hyperbolic case) Let {M} be a simply connected Riemann surface that admits a non-constant holomorphic map from {M} to {D(0,1)}. Then {M} is complex diffeomorphic to {D(0,1)}.

Theorem 50 (Uniformisation theorem, non-hyperbolic case) Let {M} be a simply connected Riemann surface that does not admit a non-constant holomorphic map from {M} to {D(0,1)}. Then {M} is complex diffeomorphic to {{\bf C}} or {{\bf C} \cup \{\infty\}}.

Let us now focus on the hyperbolic case of the uniformisation theorem, Theorem 49. Now we do not have the disjunction problem as there is only one target geometry to impose on {M}; we will be able to give a complete proof of this theorem here (in contrast to Theorem 50, where we will only give part of the proof). Let {p_0} be a point in {M}, and recall that {{\mathcal H}_{p_0}} denotes the collection of holomorphic maps {f: M \rightarrow D(0,1)} that map {p_0} to {0}. By hypothesis (and after applying a suitable automorphism of {D(0,1)}), {{\mathcal H}_{p_0}} contains at least one non-constant map. If Theorem 49 were true, then from Lemma 38 we see that {{\mathcal H}_{p_0}} would contain a “maximal” element {\phi} which would exhibit the desired complex diffeomorphism between {M} and {D(0,1)}.

It turns out that the converse statement is true: if we can locate “maximal” elements of {{\mathcal H}_{p_0}} with certain properties, then we can prove Theorem 49. More precisely, Theorem 49 can be readily deduced from the following claim.

Theorem 51 (Maximal maps into {D(0,1)}) Let {M} be a simply connected Riemann surface, let {p_0} be a point in {M}, and let {{\mathcal H}_{p_0}} be the collection of holomorphic maps from {M} to {D(0,1)} that map {p_0} to {0}. Suppose that {{\mathcal H}_{p_0}} contains a non-constant map. Then {{\mathcal H}_{p_0}} contains a map {\phi_{p_0}: M \rightarrow D(0,1)} with the property that {|f(p)| \leq |\phi_{p_0}(p)|} for all {f \in {\mathcal H}_{p_0}} and {p \in M \backslash \{p_0\}}, with equality for some such {f} and {p} only if {f = e^{i\theta} \phi_{p_0}} for some real number {\theta}. Furthermore {\phi_{p_0}} has a simple zero at {p_0}, and no other zeroes.

We have seen how Theorem 49 implies Theorem 51. Let us now demonstrate the converse implication, assuming Theorem 51 for the moment and deriving Theorem 49. Let {M} be a simply connected Riemann surface that admits non-constant holomorphic maps from {M} to {D(0,1)}, and pick a point {p_0} in {M}. By applying a suitable automorphism of {D(0,1)} we see that {{\mathcal H}_{p_0}} has a non-constant map, so by Theorem 51 this collection contains an element {\phi_{p_0}} with the stated properties. If {\phi_{p_0}} were injective, then we could apply Proposition 40 to conclude that {M} and {D(0,1)} were complex diffeomorphic, so suppose for contradiction that {\phi_{p_0}} was not injective. Since {\phi_{p_0}} has a zero only at {p_0}, we thus have {\phi_{p_0}(p_1) = \phi_{p_0}(p_2) \neq 0} for some distinct {p_1, p_2 \in M \backslash \{p_0\}}. Let {R: D(0,1) \rightarrow D(0,1)} be the automorphism

\displaystyle  R(z) := \frac{z - \phi_{p_0}(p_1)}{1 - \overline{\phi_{p_0}(p_1)} z}

that maps {\phi_{p_0}(p_1)} to {0} and {0} to {-\phi_{p_0}(p_1)}. Then the function {R \circ \phi_{p_0}: M \rightarrow D(0,1)} lies in {{\mathcal H}_{p_1}} and also has a zero at {p_2}. From Theorem 51 (applied with base point {p_1}), we thus have

\displaystyle  |R \circ \phi_{p_0}(p_0)| \leq |\phi_{p_1}(p_0)|;

since {\phi_{p_0}(p_0)} vanishes, we thus have from the definition of {R} that

\displaystyle |\phi_{p_0}(p_1)| \leq |\phi_{p_1}(p_0)|.

Swapping the roles of {p_0} and {p_1} gives the reverse inequality, thus we in fact have

\displaystyle  |\phi_{p_0}(p_1)| =|\phi_{p_1}(p_0)| \ \ \ \ \ (7)

and hence

\displaystyle  |R \circ \phi_{p_0}(p_0)| = |\phi_{p_1}(p_0)|.

Applying Theorem 51 again, we conclude that

\displaystyle  R \circ \phi_{p_0} = e^{i\theta} \phi_{p_1}

for some {\theta \in {\bf R}}. But {R \circ \phi_{p_0}} has a zero at {p_2} while {\phi_{p_1}} cannot have any zeroes other than at {p_1}, a contradiction.

Remark 52 We only established that {\phi_{p_0}: M \rightarrow D(0,1)} was injective in the above argument, but by inspecting the proof of Proposition 40 and using the maximality properties of {\phi_{p_0}} we see that {\phi_{p_0}} is also surjective, and thus supplies the required complex diffeomorphism between {M} and {D(0,1)}. In a similar vein, the arguments in the preceding section show that under the hypotheses of Theorem 49, there exists a surjective map from {M} to {D(0,1)}, but one needs something like Theorem 51 to obtain the crucial additional property of injectivity (which was automatic in the preceding section, since one already started with an injection in hand).

To finish off the hyperbolic case of the uniformisation theorem, it remains to prove Theorem 51. It is convenient to work with harmonic functions instead of holomorphic functions. Observe that if {\phi_{p_0}: M \rightarrow D(0,1)} were holomorphic with a simple zero at {p_0} but no other zeroes, then we have local holomorphic branches of {\log \frac{1}{\phi_{p_0}}} on small neighbourhoods of any point in {M \backslash \{p_0\}}. Taking real parts, we conclude that the function {g_{p_0} := \log \frac{1}{|\phi_{p_0}|}: M \backslash \{p_0\} \rightarrow {\bf R}} is harmonic on the punctured surface {M \backslash \{p_0\}}; it is also positive since {\phi_{p_0}} takes values in {D(0,1)}. Furthermore, the function {g_{p_0}} has a logarithmic singularity at {p_0} in the following sense: if {z: U_{p_0} \rightarrow D(0,1)} is any coordinate chart on some neighbourhood {U_{p_0}} of {p_0} that maps {p_0} to {0}, then as {\phi_{p_0}} has a simple zero at {p_0}, the function {\log \frac{1}{|\phi_{p_0}|} - \log \frac{1}{|z|}}, defined on {U_{p_0} \backslash \{p_0\}}, stays bounded as one approaches {p_0}.

Conversely, one can reconstruct {\phi_{p_0}} from {g_{p_0}} (up to a harmless phase {e^{i\theta}}) by the following lemma.

Lemma 53 (Reconstructing a holomorphic function from its magnitude) Let {M} be a simply connected Riemann surface, let {p_0} be a point in {M}, and let {g_{p_0}: M \backslash \{p_0\} \rightarrow {\bf R}} be harmonic. Suppose that {g_{p_0}} has a logarithmic singularity at {p_0} in the sense that {g_{p_0} - \log \frac{1}{|z|}} is bounded near {p_0} for some coordinate chart {z: U_{p_0} \rightarrow D(0,1)} on a neighbourhood {U_{p_0}} of {p_0} that maps {p_0} to {0}. Then there exists a holomorphic function {\phi: M \rightarrow {\bf C}} with a simple zero at {p_0} and no other zeroes, such that {g_{p_0} = \log \frac{1}{|\phi|}} on {M \backslash \{p_0\}}.

Proof: Let {g_{p_0}} be as above. Call a function {\phi: U \rightarrow {\bf C}} on an open subset {U} of {M} good if it is holomorphic with {g_{p_0} = \log \frac{1}{|\phi|}} on {U \backslash \{p_0\}} (in particular this forces {\phi} to be non-zero away from {p_0}), and has a simple zero at {p_0} if {p_0} lies in {U}. Clearly it will suffice to find a good function on all of {M}.

We first solve the local problem, showing that for any {p \in M} there exists a neighbourhood {U_p} of {p} that supports a good function {\phi_p: U_p \rightarrow {\bf C}}. If {p \neq p_0}, we can work in a chart {U_p} avoiding {p_0} which is diffeomorphic to a disk {D(0,1)}. If we identify {U_p} with {D(0,1)} then {g_{p_0}} restricted to {U_p} can be viewed as a harmonic function on {D(0,1)}. As this disk is simply connected, {g_{p_0}} will have a harmonic conjugate and is thus the real part of a holomorphic function {f} on this disk. Taking {\phi_p} to be {e^{-f}} we obtain the required good function. Now suppose instead that {p = p_0}. Using the coordinate chart {z: U_{p_0} \rightarrow D(0,1)} to identify {U_{p_0}} with {D(0,1)}, we now have a harmonic function {g_{p_0}: D(0,1) \backslash \{0\} \rightarrow {\bf R}} with {g_{p_0} - \log\frac{1}{|z|}} bounded near zero. Applying Exercise 59 of Notes 4, we conclude that {g_{p_0} - \log\frac{1}{|z|}} extends to a harmonic function {h} on {D(0,1)}, which is then the real part of a holomorphic function {f}; taking {\phi_{p_0} := z e^{-f}} then gives a good function on {U_{p_0}}.

Next, we make the following compatibility observation: if {\phi: U \rightarrow {\bf C}} and {\psi: V \rightarrow {\bf C}} are both good functions, then {\phi/\psi} is constant on every connected component of {U \cap V} (after removing any singularity at {p_0}). Indeed, by construction {\phi/\psi} is holomorphic and of magnitude one, so locally there are holomorphic branches of {\log(\phi/\psi)} that have vanishing real part, hence locally constant imaginary part by the Cauchy-Riemann equations. Hence {\phi/\psi} is locally constant as claimed.

Now we need to glue together the local good functions into a global good function. This is a “monodromy problem”, which can be solved using analytic continuation and the simply connected nature of {M} by the following “monodromy theorem” argument. Let us pick a good function {\phi_{p_0}: U_{p_0} \rightarrow {\bf C}} on some neighbourhood of {p_0}. Given any other point {p} in {M}, we can form a path {\gamma: [0,1] \rightarrow M} from {p_0} to {p}. We claim that for any {0 \leq T \leq 1}, we can find a finite sequence {0 = t_0 < t_1 < \dots < t_n = T} and good functions {\phi_j: U_j \rightarrow {\bf C}} for {j=1,\dots,n} such that each {U_j} contains {\gamma([t_{j-1},t_j])}, such that {\phi_j} and {\phi_{j+1}} agree on a neighbourhood of {\gamma(t_j)} for each {j=1,\dots,n-1}, and {\phi_1} and {\phi_{p_0}} also agree on a neighbourhood of {\gamma(t_0)}. The set {\Omega} of such {T} is easily seen to be an open non-empty subset of {[0,1]}. Now we claim that it is closed. Suppose that a sequence {T_k \in \Omega} converges to a limit {T_* \in [0,1]} as {k \rightarrow \infty}. If any of the {T_k} are greater than or equal to {T_*} it is easy to see that {T_* \in \Omega}, so suppose instead that the {T_k} are all less than {T_*}. We take a good function {\psi_*: U_* \rightarrow {\bf C}} defined on some neighbourhood {U_*} of {\gamma(T_*)}. By continuity, {U_*} will contain {\gamma([T_k,T_*])} for some sufficiently large {k}. We would like to append {\psi_*} and {U_*} to the sequence of good functions {\phi_j: U_j \rightarrow {\bf C}}, {j=1,\dots,m}, that one obtains from the hypothesis {T_k \in \Omega}, but there is the issue that {\psi_*} need not agree with the final function {\phi_m} at the endpoint {\gamma(T_k)}. However, the two functions only differ by a constant of magnitude one near this endpoint, so after multiplying {\psi_*} by an appropriate constant of magnitude one, we can conclude that {T_*\in \Omega} as claimed.

By the continuity method, {\Omega} is all of {[0,1]}, and in particular contains {1}. Thus we can find {0 = t_0 < t_1 < \dots < t_n = 1} and good functions {\phi_j: U_j \rightarrow {\bf C}} for {j=1,\dots,n} such that each {U_j} contains {\gamma([t_{j-1},t_j])}, and such that {\phi_j} and {\phi_{j+1}} agree on a neighbourhood of {\gamma(t_j)} for each {j=1,\dots,n-1}, and {\phi_1} and {\phi_{p_0}} also agree on a neighbourhood of {\gamma(t_0)}. Consider the final value {\phi_n(p)} obtained by the last good function {\phi_n: U_n \rightarrow {\bf C}} at the endpoint {\gamma(t_n) = p} of the curve {\gamma}. From analytic continuation and a continuity argument we see that if we perform a homotopy of {\gamma} with fixed endpoints, this final value does not change (even if the number {n} of good functions may vary). Thus we can define a function {\phi: M \rightarrow {\bf C}} by setting {\phi(p) := \phi_n(p)} whenever {\gamma} is a path from {p_0} to {p} and {\phi_n} is the final good function constructed by the above procedure. By construction we see that {\phi} is locally equal to a good function at every point in {M}, and is thus itself a good function, as required. \Box

Exercise 54 (Monodromy theorem) Let {M} be a simply connected Riemann surface, let {N} be another Riemann surface, let {p_0} be a point in {M}, let {U_{p_0}} be an open neighbourhood, and let {\phi_{p_0}: U_{p_0} \rightarrow N} be holomorphic. Prove that the following statements are equivalent.

  • (i) {\phi_{p_0}} has a holomorphic extension to {M}; that is to say, there is a holomorphic function {f: M \rightarrow N} whose restriction to {U_{p_0}} is equal to {\phi_{p_0}}.
  • (ii) For every curve {\gamma: [0,1] \rightarrow M} starting at {p_0}, we can find {0 = t_0 < t_1 < \dots < t_n = 1} and holomorphic functions {\phi_j: U_j \rightarrow N} on open subsets {U_j} of {M} for {j=0,\dots,n}, with {\phi_0 = \phi_{p_0}} and with {U_j} containing {\gamma([t_{j-1},t_j])} for {j=1,\dots,n}, such that {\phi_j} and {\phi_{j+1}} agree on a neighbourhood of {\gamma(t_j)} for each {j=0,\dots,n-1}.

Furthermore, if (i) holds, show that the holomorphic extension {f: M \rightarrow N} is unique. Give a counterexample that shows that the monodromy theorem fails if {M} is only assumed to be connected rather than simply connected.

We remark that while the condition (ii) in the monodromy theorem looks somewhat complicated, it becomes more geometrically natural if one adopts the language of sheaves, which we will not do here.

In view of Lemma 53, we may reduce the task of establishing Theorem 51 to that of establishing the existence of a special type of harmonic function on {M} (with one point {p_0} removed), namely a Green’s function:

Definition 55 (Green’s function) Let {M} be a connected Riemann surface, and let {p_0} be a point in {M}. A Green’s function for {M} at {p_0} is a function {g_{p_0}: M \backslash \{p_0\} \rightarrow {\bf R}} with the following properties:

  • (i) {g_{p_0}} is harmonic on {M \backslash \{p_0\}}.
  • (ii) {g_{p_0}} is non-negative on {M \backslash \{p_0\}}.
  • (iii) {g_{p_0}} has a logarithmic singularity at {p_0} in the sense that {g_{p_0} - \log \frac{1}{|z|}} is bounded near {p_0} for some coordinate chart {z: U_{p_0} \rightarrow D(0,1)} that maps {p_0} to {0}.
  • (iv) {g_{p_0}} is minimal with respect to the properties (i)-(iii), in the sense that for any other {g'_{p_0}: M \backslash \{p_0\} \rightarrow {\bf R}} obeying (i)-(iii), we have {g_{p_0} \leq g'_{p_0}} pointwise in {M \backslash \{p_0\}}.

Clearly if a Green’s function for {M} at {p_0} exists, it is unique by property (iv), so we can talk about the Green’s function for {M} at {p_0}, if it exists. In the case of the disk {D(0,1)}, a Green’s function may be explicitly computed:

Exercise 56 If {\alpha \in D(0,1)}, show that the function {g_\alpha: D(0,1) \backslash \{\alpha\} \rightarrow {\bf R}} defined by {g_\alpha(z) := \log \frac{|1-\overline{\alpha} z|}{|z-\alpha|}} is a Green’s function for {D(0,1)} at {\alpha}.
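The defining properties of this Green’s function can be checked numerically (a small sketch of ours, with a sample choice of {\alpha}): it is harmonic away from {\alpha} (vanishing five-point discrete Laplacian), positive in the interior, zero on the boundary circle, and {g_\alpha(z) - \log\frac{1}{|z-\alpha|}} stays bounded as {z \rightarrow \alpha}.

```python
import cmath, math

A = 0.3 + 0.2j   # the point alpha (our sample choice)

def g(z):
    # Green's function of D(0,1) at alpha
    return math.log(abs(1 - A.conjugate() * z) / abs(z - A))

z0, h = -0.4 + 0.1j, 1e-3
# five-point discrete Laplacian: ~0 since g is harmonic away from alpha
lap = (g(z0 + h) + g(z0 - h) + g(z0 + 1j * h) + g(z0 - 1j * h) - 4 * g(z0)) / h**2
interior_value = g(z0)                  # should be positive
boundary_value = g(cmath.exp(0.7j))     # should vanish on |z| = 1
# logarithmic singularity: g(z) - log(1/|z - alpha|) bounded near alpha
near_sing = g(A + 1e-8) - math.log(1 / 1e-8)
```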

Theorem 51 may now be deduced from the following claim.

Proposition 57 (Existence of Green’s function) Let {M} be a connected Riemann surface, let {p_0} be a point in {M}, and suppose that the collection {{\mathcal H}_{p_0}} of holomorphic maps {f: M \rightarrow D(0,1)} that map {p_0} to {0} contains at least one non-constant map. Then the Green’s function {g_{p_0}: M \backslash \{p_0\} \rightarrow {\bf R}} for {M} at {p_0} exists. Furthermore, for any {f \in {\mathcal H}_{p_0}}, one has {|f(p)| \leq e^{-g_{p_0}(p)}} for any {p \in M \backslash \{p_0\}}.

(Note that in this proposition we no longer need {M} to be simply connected.) Indeed, suppose that Proposition 57 held. Let {M} be a simply connected Riemann surface, and let {p_0 \in M} with {{\mathcal H}_{p_0}} containing a non-constant map. By Proposition 57, the Green’s function {g_{p_0}} exists and is non-negative on {M \backslash \{p_0\}}. Noting that {M} remains connected if we remove a small disk around {p_0}, and from (iii) that {g_{p_0}} will be strictly positive on the boundary of that disk, we observe from the maximum principle (Exercise 18) and (ii) that {g_{p_0}} is in fact strictly positive on {M \backslash \{p_0\}}. By Lemma 53 we can find a holomorphic function {\phi_{p_0}: M \rightarrow {\bf C}} with a simple zero at {p_0} and no other zeroes, such that {|\phi_{p_0}| = e^{-g_{p_0}}} on {M \backslash \{p_0\}}. As {g_{p_0}} is strictly positive, {\phi_{p_0}} takes values in {D(0,1)} and is thus in {{\mathcal H}_{p_0}}. From Proposition 57 we see that {|f(p)| \leq |\phi_{p_0}(p)|} for all {f \in {\mathcal H}_{p_0}} and {p \in M \backslash \{p_0\}}. If equality occurs anywhere, then the quotient {f/\phi_{p_0}} (after removing the singularity) is a function taking values in the closed unit disk {\overline{D(0,1)}}, which has magnitude {1} at the point of equality; by the maximum principle we then have {f/\phi_{p_0} = e^{i\theta}} for some real {\theta}. Thus {\phi_{p_0}} obeys all the properties required for Theorem 51.

It remains to obtain the existence of the Green’s function {g_{p_0}}. To do this, we use a powerful technique for constructing harmonic functions, known as Perron’s method of subharmonic functions. The basic idea is to build a harmonic function by taking a suitable large family of subharmonic functions and then forming their supremum. We first give a definition of subharmonic function.

Definition 58 (Subharmonic function) Let {M} be a Riemann surface. A subharmonic function on {M} is an upper semi-continuous function {u: M \rightarrow {\bf R} \cup \{-\infty\}} obeying the following upper maximum principle: for any compact set {K} in {M} and any function {v: K \rightarrow {\bf R}} that is continuous on {K} and harmonic on the interior of {K}, if {u(p) \leq v(p)} for all {p \in \partial K}, then {u(p) \leq v(p)} for all {p \in K}.

A superharmonic function is similarly defined as a lower semi-continuous function {u: M \rightarrow {\bf R} \cup \{+\infty\}} such that for any compact {K \subset M} and any function {v: K \rightarrow {\bf R}} continuous on {K} and harmonic on the interior of {K}, the bound {u(p) \geq v(p)} for {p \in \partial K} implies that {u(p) \geq v(p)} for all {p \in K}.

Clearly subharmonicity and superharmonicity are conformal invariants in the sense that the analogue of Lemma 1 holds for these concepts. We have the following elementary properties of subharmonic functions and superharmonic functions:

Exercise 59 Let {M} be a Riemann surface.

  • (i) Show that a function {u: M \rightarrow {\bf R} \cup \{-\infty\}} is subharmonic if and only if {-u} is superharmonic.
  • (ii) Show that a function {u: M \rightarrow {\bf R}} is harmonic if and only if it is both subharmonic and superharmonic.
  • (iii) If {u,v: M \rightarrow {\bf R} \cup \{-\infty\}} are subharmonic, show that {\max(u,v)} is also.
  • (iv) Let {u: M \rightarrow {\bf R} \cup \{-\infty\}} be subharmonic, and let {U} be an open subset of {M}. Show that the restriction of {u} to {U} is subharmonic.
  • (v) (Subharmonicity is a local property) Conversely, let {u: M \rightarrow {\bf R} \cup \{-\infty\}}, and suppose that for each {p \in M} there is a neighbourhood {U_p} of {p} such that the restriction of {u} to {U_p} is subharmonic. Show that {u} is itself subharmonic. (Hint: If {v} is continuous on a compact set {K} and harmonic on the interior, and {v-u} attains a maximum at an interior point of {K}, show that {v-u} is constant in some neighbourhood of that point.)
  • (vi) (Maximum principle) Let {u: M \rightarrow {\bf R} \cup \{-\infty\}} be subharmonic, let {v: M \rightarrow {\bf R} \cup \{+\infty\}} be superharmonic, and let {K} be a compact subset of {M} such that {u(p) \leq v(p)} for all {p \in \partial K}. Show that {u(p) \leq v(p)} for all {p \in K}. (This is a similar argument to (v).)
  • (vii) Show that the sum of two subharmonic functions is again subharmonic (using the usual conventions on adding {-\infty} to itself or to another real number).
  • (viii) (Harmonic patching) Let {u: M \rightarrow {\bf R} \cup \{-\infty\}} be subharmonic, let {K} be compact, and let {v: K \rightarrow {\bf R}} be a continuous function on {K} that is harmonic on the interior of {K} and agrees with {u} on the boundary of {K}. Show that the function {\tilde u: M \rightarrow {\bf R} \cup \{-\infty\}}, defined to equal {v} on {K} and {u} on {M \backslash K}, is subharmonic.
  • (ix) Let {f: M \rightarrow {\bf C}} be a holomorphic function. Show that {\log |f|} is subharmonic, with the convention that {\log 0 = -\infty}. (Hint: first use the maximum principle and harmonic conjugates to show that if {M} contains a copy of a closed disk {\overline{D(z_0,r)}}, and {\log |f| \leq u} on the boundary of this disk for some continuous {u: \overline{D(z_0,r)} \rightarrow {\bf R}} that is harmonic in the interior of the disk, then {\log |f| \leq u} in the interior of the disk also.)
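Part (ix) can be probed numerically via the sub-mean value inequality (compare Exercise 60 below): for holomorphic {f}, the circle average of {\log|f|} dominates its value at the centre, strictly so when {f} has a zero inside the circle, with the excess given by Jensen’s formula. A small sketch (our own sample function):

```python
import cmath, math

def f(z):
    # sample holomorphic function with a single zero at z = 0.2
    return (z - 0.2) * cmath.exp(z)

def circle_mean_log_abs(z0, r, n=4096):
    # average of log|f| over the circle |z - z0| = r (f has no zero on it)
    total = 0.0
    for k in range(n):
        total += math.log(abs(f(z0 + r * cmath.exp(2j * math.pi * k / n))))
    return total / n

# zero of f outside the circle: log|f| is harmonic there, mean = centre value
gap_harmonic = circle_mean_log_abs(0.6 + 0.1j, 0.3) - math.log(abs(f(0.6 + 0.1j)))
# zero of f strictly inside the circle: the mean strictly dominates; by
# Jensen's formula the gap equals log(r / |centre - 0.2|) = log(0.5/0.1)
gap_jensen = circle_mean_log_abs(0.3, 0.5) - math.log(abs(f(0.3)))
```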

For smooth functions on an open subset of {{\bf C}}, one can express the property of subharmonicity quite explicitly:

Exercise 60 Let {U} be an open subset of {{\bf C}}, and let {u: U \rightarrow {\bf R}} be continuously twice (Fréchet) differentiable. Show that the following are equivalent:

  • (i) {u} is subharmonic.
  • (ii) For all closed disks {\overline{D(z_0,r)}} in {U}, one has

    \displaystyle  u(z_0) \leq \frac{1}{2\pi} \int_0^{2\pi} u(z_0+re^{i\theta})\ d\theta.

  • (iii) One has {\Delta u(z_0) \geq 0} for all {z_0 \in U}.

Show that the equivalence of (i) and (ii) in fact holds even if {u} is only assumed to be continuous rather than continuously twice differentiable.
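As a toy illustration (ours) of the equivalence of (ii) and (iii): the function {u(z) := |z|^2} has {\Delta u = 4 \geq 0}, and its circle averages can be computed in closed form, the mean of {|z_0 + re^{i\theta}|^2} being {|z_0|^2 + r^2 \geq u(z_0)}:

```python
import cmath

def u(z):
    # u(z) = |z|^2, with Laplacian 4 >= 0
    return abs(z) ** 2

def circle_mean(z0, r, n=1024):
    # average of u over the circle |z - z0| = r
    return sum(u(z0 + r * cmath.exp(2j * cmath.pi * k / n)) for k in range(n)) / n

z0, r = 0.3 - 0.4j, 0.25
mean = circle_mean(z0, r)      # closed form: |z0|^2 + r^2
```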

However, we will not use the above exercise in our analysis here as it will not be convenient to impose a hypothesis of continuous twice differentiability on our subharmonic functions.

The Perron method is based on the observation that under certain conditions, the supremum of a family of subharmonic functions is not just subharmonic (as per Exercise 59(iii)), but is in fact harmonic. A key concept here is that of a Perron family:

Definition 61 Let {M} be a Riemann surface. A continuous Perron family on {M} is a family {{\mathcal F}} of continuous subharmonic functions {u: M \rightarrow {\bf R}} with the following properties:

  • (i) If {u, v \in {\mathcal F}}, then {\max(u,v) \in {\mathcal F}}.
  • (ii) (Harmonic patching) If {u \in {\mathcal F}}, {K} is a compact subset of {M}, and {v: K \rightarrow {\bf R}} is a continuous function that is harmonic in the interior of {K} and equals {u} on the boundary of {K}, then the function {\tilde u: M \rightarrow {\bf R}} defined to equal {v} on {K} and {u} outside of {K} also lies in {{\mathcal F}}.
  • (iii) {{\mathcal F}} contains at least one element.

One can also consider more general Perron families of subharmonic functions that are merely upper semi-continuous rather than continuous, but for the current application continuous Perron families will suffice.

The fundamental theorem that powers the Perron method is then

Theorem 62 (Perron method) Let {{\mathcal F}} be a continuous Perron family on a connected Riemann surface {M}, and set {u: M \rightarrow {\bf R} \cup \{+\infty\}} to be the function {u(p) := \sup_{v \in {\mathcal F}} v(p)} (note that {u(p)} cannot equal {-\infty} thanks to axiom (iii) of a Perron family). Then one of the following two statements holds:

  • (i) {u(p) = +\infty} for all {p \in M}.
  • (ii) {u} is a harmonic function on {M}.

Proof: Let us first work locally in some open subset {U} of {M} that is complex diffeomorphic to a disk {D(0,1)}; to simplify the discussion we abuse notation by identifying {U} with {D(0,1)} in the following discussion.

Assume for the moment that {u} is not identically equal to {+\infty} on {D(0,1/2)}. Let {p_0} be an arbitrary point in {D(0,1/2)} (viewed as a subset of {U}). Then we can find a sequence {u_n \in {\mathcal F}} such that {u_n(p_0) \rightarrow u(p_0)} as {n \rightarrow \infty}.

We can use Exercise 42 to find a continuous function {v_1: \overline{D(0,1/2)} \rightarrow {\bf R}} that equals {u_1} on the boundary of this disk (viewed as a subset of {U}) and is harmonic and at least as large as {u_1} in the interior; if we then let {\tilde u_1} be the function defined to equal {v_1} on {\overline{D(0,1/2)}} and {u_1} outside of this disk, then {\tilde u_1} is larger than {u_1} and also lies in {{\mathcal F}} thanks to axiom (ii). Thus, by replacing {u_1} with {\tilde u_1}, we may assume that {u_1} is harmonic on {D(0,1/2)}. Next, by replacing {u_1} with {\max(u_1,u_2)} and using axiom (i), we may assume that {u_1 \leq u_2} pointwise; replacing {u_2} with a harmonic function on {D(0,1/2)} as before we may assume that {u_2} is harmonic on {D(0,1/2)}. Continuing in this fashion we may assume that {u_1 \leq u_2 \leq \dots} and that {u_1,u_2,\dots} are harmonic on {D(0,1/2)}. Form the function {w_{p_0} := \sup_n u_n}; then we have {w_{p_0} \leq u} pointwise with {w_{p_0}(p_0) = u(p_0)}. By the Harnack principle (Exercise 58 of Notes 4), we thus see that {w_{p_0}} is either harmonic on {D(0,1/2)}, or equal to {+\infty} on {D(0,1/2)}. The latter cannot occur since we are assuming {u} not identically equal to {+\infty}, thus {w_{p_0}} is harmonic.

Now let {p_1} be another point in {D(0,1/2)}. We can find another sequence {u'_n \in {\mathcal F}} with {u'_n(p_1) \rightarrow u(p_1)}. As before we may assume that the {u'_n} are increasing and are harmonic on {D(0,1/2)}; we may also assume that {u'_n \geq u_n} pointwise. Setting {w_{p_1} := \sup_n u'_n}, we conclude that {w_{p_1}} is harmonic with {w_{p_0} \leq w_{p_1} \leq u} on {D(0,1/2)}. In particular {w_{p_0}(p_0) = w_{p_1}(p_0)}. The harmonic function {w_{p_1}-w_{p_0}} is non-negative on {D(0,1/2)} and vanishes at {p_0}, hence is identically zero on {D(0,1/2)} by the maximum principle. Since {w_{p_1}(p_1) = u(p_1)}, we conclude that {w_{p_0}} and {u} agree at {p_1}. Since {p_1} was an arbitrary point on {D(0,1/2)}, we conclude that {u=w_{p_0}} is harmonic on {D(0,1/2)}.

Putting all this together, we see that for any point {p} in {M} there is a neighbourhood {U_p} (corresponding to the disk {D(0,1/2)} in the above arguments) with the property that {u} is either equal to {+\infty} on {U_p}, or is harmonic on {U_p}. By a continuity argument we conclude that one of the two options (i), (ii) of the theorem must hold. \Box

Now we can conclude the proof of Proposition 57, and hence the hyperbolic case of the uniformisation theorem, by applying the above theorem to a well-chosen Perron family. Let {M} be a connected Riemann surface with a marked point {p_0}, and let {{\mathcal F}} be the collection of all continuous subharmonic functions {u: M \backslash \{p_0\} \rightarrow {\bf R}} that vanish outside of a compact subset of {M}, and which have a logarithmic singularity at {p_0} in the sense that {u - \log \frac{1}{|z|}} is bounded near {p_0} for some coordinate chart {z: U_{p_0} \rightarrow D(0,1)} that takes {p_0} to {0} (note that the precise choice of chart here is irrelevant). This collection is non-empty, for it contains the function {u} that equals (say) {\log \frac{1}{2|z|}} on {z^{-1}(D(0,1/2))}, and zero elsewhere (this follows from the observation that {\log \frac{1}{|z|}} is harmonic away from the origin, and {0} is harmonic everywhere, as well as the various properties in Exercise 59). From Exercise 59 we see that {{\mathcal F}} is a continuous Perron family on the connected Riemann surface {M \backslash \{p_0\}}; thus, by Theorem 62, the function {g_{p_0} := \sup_{u \in {\mathcal F}} u} is either harmonic on {M \backslash \{p_0\}}, or is infinite everywhere. Comparing with the element of {{\mathcal F}} constructed above we see that {g_{p_0}} is non-negative.
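The subharmonicity of the candidate element just described can also be tested numerically (a sketch of ours, working in the coordinate chart): the only delicate points are on the gluing circle {|z| = 1/2}, where the sub-mean value inequality should hold with room to spare.

```python
import cmath, math

def u(z):
    # equals log(1/(2|z|)) for |z| < 1/2 and 0 for |z| >= 1/2, i.e. the
    # maximum of a function harmonic away from 0 and of the zero function
    return max(math.log(1 / (2 * abs(z))), 0.0)

def circle_mean(z0, r, n=4096):
    # average of u over the circle |z - z0| = r
    return sum(u(z0 + r * cmath.exp(2j * math.pi * k / n)) for k in range(n)) / n

# sub-mean value inequality at a point on the kink circle |z| = 1/2:
# u(z0) = 0 there, while the circle average picks up the positive values
# from the part of the circle lying inside |z| < 1/2
z0, r = 0.5 * cmath.exp(0.3j), 0.1
gap = circle_mean(z0, r) - u(z0)
```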

Let {f: M \rightarrow D(0,1)} be an arbitrary element of {{\mathcal H}_{p_0}}. By Exercise 59(ix), {\log |f|} is subharmonic, hence {\log \frac{1}{|f|}} is superharmonic and also non-negative since {f} takes values in {D(0,1)}; as {f} vanishes at {p_0}, {\log \frac{1}{|f|}} has at least a logarithmic singularity at {p_0} in the sense that {\log \frac{1}{|f|} - \log \frac{1}{|z|}} is bounded from below near {p_0}. If {u \in {\mathcal F}}, then {u} vanishes outside of a compact set {K}, hence {u \leq (1+\varepsilon) \log \frac{1}{|f|}} outside of {K} for any {\varepsilon > 0}. As {u} has a logarithmic singularity at {p_0} we also have {u \leq (1+\varepsilon) \log \frac{1}{|f|}} in a sufficiently small neighbourhood of {p_0}. Applying the maximum principle (Exercise 59(vi)) we conclude that {u \leq (1+\varepsilon) \log \frac{1}{|f|}} on all of {M \backslash \{p_0\}}; sending {\varepsilon} to zero and then taking suprema in {u} we conclude that

\displaystyle  g_{p_0} \leq \log \frac{1}{|f|}

or equivalently

\displaystyle  |f| \leq e^{-g_{p_0}}

pointwise on {M \backslash \{p_0\}}. In particular, since {{\mathcal H}_{p_0}} contains at least one non-constant map, {g_{p_0}} cannot be infinite everywhere and must therefore be harmonic.

Similarly, if {g'_{p_0}: M \backslash \{p_0\} \rightarrow {\bf R}} is a function obeying the properties (i)-(iii) of a Green’s function, and {u \in {\mathcal F}}, then another application of the maximum principle shows that {u \leq (1+\varepsilon) g'_{p_0}} on {M \backslash \{p_0\}} for any {\varepsilon > 0}; sending {\varepsilon \rightarrow 0} and taking suprema in {u} we see that {g_{p_0} \leq g'_{p_0}} pointwise.

The only remaining task is to show that {g_{p_0}} has a logarithmic singularity at {p_0}. Certainly it has at least this much of a singularity, in that {g_{p_0} - \log \frac{1}{|z|}} is bounded from below near {p_0}, as can be seen by comparing {g_{p_0}} to any element of {{\mathcal F}}. To get the upper bound, observe that for any {u \in {\mathcal F}} and {\varepsilon > 0}, the function {u - (1+\varepsilon) \log \frac{1}{|z|}} is subharmonic on {M \backslash \{p_0\}} and diverges to {-\infty} at {p_0}, and is hence in fact subharmonic on all of {M}. In particular, for {p} in the disk {z^{-1}(D(0,1/2))}, we have from the maximum principle that

\displaystyle  u(p) - (1+\varepsilon) \log \frac{1}{|z(p)|} \leq \sup_{q \in U_{p_0}: |z(q)| = 1/2} u(q) - (1+\varepsilon) \log 2

and hence on taking suprema in {u} and limits in {\varepsilon}

\displaystyle  g_{p_0}(p) - \log \frac{1}{|z(p)|} \leq \sup_{q \in U_{p_0}: |z(q)| = 1/2} g_{p_0}(q) - \log 2.

The right-hand side is finite, and this gives the required upper bound to complete the proof that {g_{p_0}} has a logarithmic singularity at {p_0}. This concludes the proof of Proposition 57 and hence Theorem 49.

Before we turn to the non-hyperbolic case of the uniformisation theorem, we record a symmetry property of the Green’s functions that is used to establish that case:

Proposition 63 (Symmetry of Green’s functions) Let {M} be a connected Riemann surface, and suppose that the Green’s functions {g_{p_0}: M \backslash \{p_0\} \rightarrow {\bf R}} exist for all {p_0}. Then for all distinct {p_0,p_1 \in M}, we have {g_{p_0}(p_1) = g_{p_1}(p_0)}.

When {M} is simply connected, this symmetry can be deduced from (7). For {M} that are not simply connected, the argument is trickier, requiring one to pass to a universal cover {N} of {M}, establish the existence of Green’s functions on {N}, and find an identity relating the Green’s functions on {N} with the Green’s functions on {M}. For details see Marshall’s notes.

We can now turn to the non-hyperbolic case of the uniformisation theorem, Theorem 50. Here we do not have any Green’s functions, or any non-constant bounded holomorphic functions. However, note that all three of the model Riemann surfaces {D(0,1)}, {{\bf C}} and {{\bf C} \cup \{\infty\}} still have plenty of meromorphic functions: in particular, for any two distinct points {z_0,z_1} in {{\bf C}}, one can find a holomorphic function {f: {\bf C} \rightarrow {\bf C} \cup \{\infty\}} that has a simple zero at {z_0}, a simple pole at {z_1}, and no other zeroes and poles, namely {f(z) := \frac{z-z_0}{z-z_1}}; one can think of this function with a zero-pole pair as a “dipole”. Similarly if one works on the domain {{\bf C} \cup \{\infty\}} or {D(0,1)} rather than {{\bf C}}. From this we see that Theorem 50 would imply the following claim:
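As a quick numerical illustration (my own sanity check, not part of the argument), one can verify the advertised properties of the model dipole {f(z) = \frac{z-z_0}{z-z_1}} on {{\bf C}}: a simple zero at {z_0}, a pole at {z_1}, and {|f|} bounded away from both {0} and {\infty} outside a large disk (indeed {|f| \rightarrow 1} at infinity):

```python
import numpy as np

# The model "dipole" on C: simple zero at z0, simple pole at z1.
z0, z1 = 1.0 + 0.0j, -1.0 + 0.5j
f = lambda z: (z - z0) / (z - z1)

assert abs(f(z0)) == 0.0                    # exact zero at z0
assert abs(f(z1 + 1e-12)) > 1e10            # blows up approaching the pole z1

# Far from the zero-pole pair, |f| is pinned near 1, hence bounded
# away from 0 and infinity outside any sufficiently large compact set.
far = 1e6 * np.exp(1j * np.linspace(0, 2 * np.pi, 100))
assert np.allclose(np.abs(f(far)), 1.0, atol=1e-5)
```

The last assertion is the quantitative form of the boundedness condition in Theorem 64 below, with {c} and {C} both close to {1} once the compact set {K} is a large enough disk.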

Theorem 64 (Existence of dipoles) Let {M} be a simply connected Riemann surface. Let {p_0,p_1} be distinct points in {M}. Then there exists a holomorphic map {f_{p_0,p_1}: M \rightarrow {\bf C} \cup \{\infty\}} that has a simple zero at {p_0}, a simple pole at {p_1}, and no other zeroes and poles. Furthermore, outside of a compact set {K} containing {p_0,p_1}, the function {f_{p_0,p_1}} can be chosen to be bounded away from both {0} and {\infty} (that is, there exists {C > c > 0} such that {c \leq |f_{p_0,p_1}(p)| \leq C} for all {p \in M \backslash K}).

In the converse direction, we can use Theorem 64 to recover Theorem 50 in a manner analogous to how Theorem 51 implies Theorem 49. Indeed, let {M} be a simply connected Riemann surface without non-constant holomorphic maps from {M} to {D(0,1)}. Given any three distinct points {p_0,p_1,p_2} in {M}, we consider the dipoles {f_{p_0,p_1}} and {f_{p_0,p_2}}. The function

\displaystyle  \frac{f_{p_0,p_1} - f_{p_0,p_1}(p_2)}{f_{p_0,p_2}} \ \ \ \ \ (8)

has removable singularities at {p_0} and at {p_2}, no poles, and is also bounded outside of a compact set. Thus this function extends to a bounded holomorphic function on {M}. Since {M} does not have any non-constant bounded holomorphic functions, the function (8) must be constant, thus {f_{p_0,p_1} = a f_{p_0,p_2} + b} for some complex numbers {a,b}; as {f_{p_0,p_1}} is non-constant, {a} must be non-zero. Since {f_{p_0,p_2}} vanishes only at {p_2}, we conclude that {f_{p_0,p_1}(p_2) \neq f_{p_0,p_1}(p)} for any {p \neq p_2}. Since {f_{p_0,p_1}} also has its only zero at {p_0} and its only pole at {p_1}, we conclude that {f_{p_0,p_1}} is injective. By Exercise 40 of Notes 4, {f_{p_0,p_1}} is thus a complex diffeomorphism from {M} to an open subset {U} of {{\bf C} \cup \{\infty\}}, which of course is simply connected since {M} is. If {U} is all of {{\bf C} \cup \{\infty\}} then we are in the elliptic case and we are done. If {U} omits at least one point in {{\bf C} \cup \{\infty\}} then by applying a Möbius transform {U} is complex diffeomorphic to a simply connected open subset of {{\bf C}}; by the Riemann mapping theorem, we conclude that {M} is either complex diffeomorphic to {{\bf C}} or to {D(0,1)}. The latter case cannot occur by hypothesis, and we are done.

It remains to prove Theorem 64. As before, we convert the problem to one of finding a specific harmonic function. More precisely, one can derive Theorem 64 from

Theorem 65 (Existence of dipole Green’s functions) Let {M} be a connected Riemann surface. Let {p_0,p_1} be distinct points in {M}, and let {z_0: U_{p_0} \rightarrow D(0,1)} and {z_1: U_{p_1} \rightarrow D(0,1)} be coordinate charts on disjoint neighbourhoods {U_{p_0}, U_{p_1}} of {p_0,p_1} respectively, which map {p_0} and {p_1} respectively to {0}. Then there exists a harmonic function {g_{p_0,p_1}: M \backslash \{p_0,p_1\} \rightarrow {\bf R}} such that {g_{p_0,p_1} - \log |z_0|} is bounded near {p_0}, and {g_{p_0,p_1} + \log |z_1|} is bounded near {p_1}. Furthermore, {g_{p_0,p_1}} is bounded outside of a compact subset {K} of {M}.

In the case {M = {\bf C}}, one can take the dipole Green’s function {g_{p_0,p_1}} to be the function {\log |z-p_0| - \log |z-p_1| + C} for an arbitrary constant {C}.
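One can sanity-check this explicit formula numerically (a quick illustration of mine, not part of the proof): with {C=0}, the function {g(z) = \log|z-p_0| - \log|z-p_1|} has vanishing Laplacian away from the two singularities, and it tends to {0} at infinity, hence is bounded outside a compact set as Theorem 65 requires:

```python
import numpy as np

# Dipole Green's function on C (with C = 0): g(z) = log|z - p0| - log|z - p1|.
p0, p1 = 0.0 + 0.0j, 1.0 + 0.0j
g = lambda z: np.log(abs(z - p0)) - np.log(abs(z - p1))

# Five-point finite-difference Laplacian at a point away from p0, p1:
# it should vanish up to discretisation and rounding error, since g is harmonic there.
h = 1e-4
z = 0.3 + 0.7j
lap = (g(z + h) + g(z - h) + g(z + 1j * h) + g(z - 1j * h) - 4 * g(z)) / h**2
assert abs(lap) < 1e-4

# g(z) = log(|z - p0| / |z - p1|) -> log 1 = 0 as |z| -> infinity,
# so g is bounded outside any large disk containing p0 and p1.
far = 1e5 * np.exp(1j * np.linspace(0, 2 * np.pi, 50))
assert max(abs(g(w)) for w in far) < 1e-4
```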

Exercise 66 Adapt the proof of Lemma 53 to show that Theorem 65 implies Theorem 64 (and hence Theorem 50).

We still need to prove Theorem 65. If {M} admitted Green’s functions {g_{p_0}} for every point {p_0 \in M}, we could simply take {g_{p_0,p_1}} to be the difference {g_{p_1} - g_{p_0}}. Unfortunately, as we are in the non-hyperbolic case, {M} is not expected to have Green’s functions, and it does not appear possible to construct the dipole Green’s functions {g_{p_0,p_1}} directly from Perron’s method due to the indefinite sign of these functions. However, it turns out that if one removes a small disk from {M} of some small radius {t>0} in a given coordinate chart, then the resulting Riemann surface {M_t} will admit Green’s functions {g_{p_0,t}}, and by considering limits of the sequence {g_{p_1,t} - g_{p_0,t}} as {t \rightarrow 0} using a version of Montel’s theorem one will be able to obtain the required dipole Green’s function, after first making heavy use of the maximum principle (and an important variant of that principle known as Harnack’s inequality, see Exercise 68 below) to obtain some locally uniform control on the difference {g_{p_1,t} - g_{p_0,t}} in {t}. To obtain this locally uniform control, the symmetry property in Proposition 63 is key, as it allows one to write

\displaystyle  g_{p_1,t}(p) - g_{p_0,t}(p) = (g_{p_1,t}(p) - g_{p_1,t}(p_0)) - (g_{p_0,t}(p) - g_{p_0,t}(p_1))

so that the main challenge is to show that the differences {g_{p_1,t}(p) - g_{p_1,t}(p_0)} and {g_{p_0,t}(p) - g_{p_0,t}(p_1)} are bounded uniformly in {t}, which can be done from the maximum principle and the Harnack inequality. The details are unfortunately a little complicated, and we refer the reader to Marshall’s notes for the complete argument.

To close this section we give a quick corollary to the uniformisation theorem, namely Rado’s theorem on the topology of Riemann surfaces:

Corollary 67 (Rado’s theorem) Every connected Riemann surface is second countable and separable.

Proof: By passing to the universal cover, it suffices to verify this claim for simply connected Riemann surfaces. But the three model surfaces {{\bf C} \cup \{\infty\}}, {{\bf C}}, {D(0,1)} are clearly second countable and separable, so the claim follows from the uniformisation theorem. \Box

It is remarkably difficult to prove this theorem directly, without going through the uniformisation theorem. (As just one indication of the difficulty of this theorem, the analogue of Rado’s theorem for complex manifolds in two and higher dimensions is known to be false.)

Exercise 68 (Harnack inequality) Let {u: \overline{D(z_0,R)} \rightarrow {\bf R}} be a non-negative continuous function on a closed disk {\overline{D(z_0,R)}} that is harmonic on the interior of the disk. Show that for every {0 \leq r \leq R} and {z \in \overline{D(z_0,r)}}, one has

\displaystyle  \frac{R-r}{R+r} u(z_0) \leq u(z) \leq \frac{R+r}{R-r} u(z_0).

(Hint: use Exercise 42.)
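For readers who like to experiment, here is a small numerical check (a sketch of mine, not part of the exercise). The Poisson kernel, viewed as a function of {z}, is a non-negative harmonic function on the disk that actually attains both Harnack bounds, so it is the extremal case to test against:

```python
import numpy as np

# Harnack bounds tested on the Poisson kernel u(z) = (R^2 - |z|^2)/|R e^{i theta0} - z|^2,
# a non-negative harmonic function on D(0, R) with u(z0) = u(0) = 1.
R, theta0 = 1.0, 0.7   # disk radius and an arbitrary boundary angle
u = lambda z: (R**2 - abs(z)**2) / abs(R * np.exp(1j * theta0) - z)**2

for r in [0.1, 0.5, 0.9]:
    zs = r * np.exp(1j * np.linspace(0, 2 * np.pi, 400))
    vals = u(zs)
    lower, upper = (R - r) / (R + r), (R + r) / (R - r)
    assert lower - 1e-9 <= vals.min() and vals.max() <= upper + 1e-9
    print(f"r={r}: values in [{vals.min():.4f}, {vals.max():.4f}], "
          f"Harnack bounds [{lower:.4f}, {upper:.4f}]")
```

The sampled minima and maxima hug the bounds {(R-r)/(R+r)} and {(R+r)/(R-r)}, which is why no improvement of the constants in the inequality is possible.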

Filed under: 246A - complex analysis, math.CV, math.DG Tagged: conformal mapping, Green's function, Riemann mapping theorem, Schwarz-Christoffel formula, uniformisation theorem

Terence TaoCall for nominations for the 2018 Chern Medal

[This guest post is authored by Caroline Series.]

The Chern Medal is a relatively new prize, awarded once every four years jointly by the IMU
and the Chern Medal Foundation (CMF) to an individual whose accomplishments warrant
the highest level of recognition for outstanding achievements in the field of mathematics.
Funded by the CMF, the Medalist receives a cash prize of US$ 250,000.  In addition, each
Medalist may nominate one or more organizations to receive funding totalling US$ 250,000, for the support of research, education, or other outreach programs in the field of mathematics.

Professor Chern devoted his life to mathematics, both in active research and education, and in nurturing the field whenever the opportunity arose. He obtained fundamental results in all the major aspects of modern geometry and founded the area of global differential geometry. Chern exhibited keen aesthetic tastes in his selection of problems, and the breadth of his work deepened the connections of geometry with different areas of mathematics. He was also generous during his lifetime in his personal support of the field.

Nominations should be sent to the Prize Committee Chair:  Caroline Series, email: by 31st December 2016. Further details and nomination guidelines for this and the other IMU prizes can be found at


Filed under: advertising, guest blog Tagged: Caroline Series, Chern medal

Jordan EllenbergReader poll: how many times have you worn a necktie?

This Fermi question is probably easy for some people:  I’m guessing there are some women for whom the answer is zero (am I right?), and some men whose necktie-wearing is dominated by “every day at work, 5 days a week, 50 weeks a year.”  For me it’s much harder.  I guess I’d say — three or four times a year, as an adult?  More when I was younger and went to more weddings.  I’m gonna estimate I’ve put on a tie 150 times in my life.

Followup question:  how many ties do you own?  I think I probably have about 8, but 5 are in the “will never wear again” category, and one of the 3 “sometimes wear” is the American flag tie I only wear on July 4.

Answer in comments!

BackreactionCan dark energy and dark matter emerge together with gravity?

A macaroni pie? Elephants blowing balloons?
No, it’s Verlinde’s entangled universe.
In a recent paper, the Dutch physicist Erik Verlinde explains how dark energy and dark matter arise in emergent gravity as deviations from general relativity.

It’s taken me some while to get through the paper. Vaguely titled “Emergent Gravity and the Dark Universe,” it’s a 51-page catalog of ideas patched together from general relativity, quantum information, quantum gravity, condensed matter physics, and astrophysics. It is clearly still research in progress and not anywhere close to completion.

The new paper substantially expands on Verlinde’s earlier idea that the gravitational force is some type of entropic force. If that was so, it would mean gravity is not due to the curvature of space-time – as Einstein taught us – but instead caused by the interaction of the fundamental elements which make up space-time. Gravity, hence, would be emergent.

I find it an appealing idea because it allows one to derive consequences without having to specify exactly what the fundamental constituents of space-time are. Like you can work out the behavior of gases under pressure without having a model for atoms, you can work out the emergence of gravity without having a model for whatever builds up space-time. The details would become relevant only at very high energies.

As I noted in a comment on the first paper, Verlinde’s original idea was merely a reinterpretation of gravity in thermodynamic quantities. What one really wants from emergent gravity, however, is not merely to get back general relativity. One wants to know which deviations from general relativity come with it, deviations that are specific predictions of the model and which can be tested.

Importantly, in emergent gravity such deviations from general relativity could make themselves noticeable at long distances. The reason is that the criterion for what it means for two points to be close by each other emerges with space-time itself. Hence, in emergent gravity there isn’t a priori any reason why new physics must be at very short distances.

In the new paper, Verlinde argues that his variant of emergent gravity gives rise to deviations from general relativity at long distances, and these deviations correspond to dark energy and dark matter. He doesn’t explain dark energy itself. Instead, he starts with a universe that by assumption contains dark energy like we observe, i.e. one that has a positive cosmological constant. Such a universe is described approximately by what theoretical physicists call a de Sitter space.

Verlinde then argues that when one interprets this cosmological constant as the effect of long-distance entanglement between the conjectured fundamental elements, then one gets a modification of the gravitational law which mimics dark matter.

The reason it works is that to get normal gravity one assigns an entropy to a volume of space which scales with the area of the surface that encloses the volume. This is known as the “holographic scaling” of entropy, and is at the core of Verlinde’s first paper (and earlier work by Jacobson and Padmanabhan and others). To get deviations from normal gravity, one has to do something else. For this, Verlinde argues that de Sitter space is permeated by long-distance entanglement which gives rise to an entropy which scales, not with the surface area of a volume, but with the volume itself. It consequently leads to a different force-law. And this force-law, so he argues, has an effect very similar to dark matter.

Not only does this modified force-law from the volume-scaling of the entropy mimic dark matter, it more specifically reproduces some of the achievements of modified gravity.

In his paper, Verlinde derives the observed relation between the luminosity of spiral galaxies and the rotation velocity of their outermost stars, known as the Tully-Fisher relation. The Tully-Fisher relation can also be found in certain modifications of gravity, such as Moffat Gravity (MOG), but more generally in every modification that approximates Milgrom’s modified Newtonian Dynamics (MOND). Verlinde, however, does more than that. He also derives the parameter which quantifies the acceleration at which the modification of general relativity becomes important, and gets a value that fits well with observations.

It was known before that this parameter is related to the cosmological constant. There have been various attempts to exploit this relation, most recently by Lee Smolin. In Verlinde’s approach the relation between the acceleration scale and the cosmological constant comes out naturally, because dark matter has the same origin as dark energy. Verlinde further offers expressions for the apparent density of dark matter in galaxies and clusters, something that, with some more work, can probably be checked observationally.
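The numerical coincidence behind this relation is easy to check yourself (a back-of-envelope estimate of my own, not a computation from the paper): the empirical MOND acceleration scale is close to the combination c·H0/(2π) built from the speed of light and the Hubble rate, which is what ties it to the cosmological constant:

```python
import math

# Back-of-envelope check of the coincidence a0 ~ c*H0/(2*pi), using
# standard observed values (not numbers taken from Verlinde's paper).
c = 2.998e8                    # speed of light, m/s
H0 = 70 * 1000 / 3.086e22     # Hubble constant ~70 km/s/Mpc, converted to 1/s
a0_observed = 1.2e-10         # empirical MOND acceleration scale, m/s^2

a0_estimate = c * H0 / (2 * math.pi)
print(f"c*H0/(2*pi) = {a0_estimate:.2e} m/s^2, observed a0 ~ {a0_observed:.1e} m/s^2")
```

The two numbers agree to within roughly ten percent, which is the coincidence a derivation like Verlinde’s is supposed to make natural rather than accidental.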

I find this is an intriguing link which suggests that Verlinde is onto something. However, I also find the model sketchy and unsatisfactory in many regards. General Relativity is a rigorously tested theory with many achievements. To do any better than general relativity is hard, and thus for any new theory of gravity the most important thing is to have a controlled limit in which General Relativity is reproduced to good precision. How this might work in Verlinde’s approach isn’t clear to me because he doesn’t even attempt to deal with the general case. He starts right away with cosmology.

Now in cosmology we have a preferred frame which is given by the distribution of matter (or by the restframe of the CMB if you wish). In general relativity this preferred frame does not originate in the structure of space-time itself but is generated by the stuff in it. In emergent gravity models, in contrast, the fundamental structure of space-time tends to have an imprint of the preferred frame. This fundamental frame can lead to violations of the symmetries of general relativity and the effects aren’t necessarily small. Indeed, there are many experiments that have looked for such effects and haven’t found anything. It is hence a challenge for any emergent gravity approach to demonstrate just how to avoid such violations of symmetries.

Another potential problem with the idea is the long-distance entanglement which is sprinkled over the universe. The physics which we know so far works “locally,” meaning stuff can’t interact over long distances without a messenger that travels through space and time from one to the other point. It’s the reason my brain can’t make spontaneous visits to the Andromeda nebula, and most days I think that benefits both of us. But like that or not, the laws of nature we presently have are local, and any theory of emergent gravity has to reproduce that.

I have worked for some years on non-local space-time defects, and based on what I learned from that I don’t think the non-locality of Verlinde’s model is going to be a problem. My non-local defects aren’t the same as Verlinde’s entanglement, but guessing that the observational consequences scale similarly, the amount of entanglement that you need to get something like a cosmological constant is too small to leave any other noticeable effects on particle physics. I am therefore more worried about the recovery of local Lorentz-invariance. I went to great pain in my models to make sure I wouldn’t get these, and I can’t see how Verlinde addresses the issue.

The more general problem I have with Verlinde’s paper is the same one I had with his 2010 paper, which is that it’s fuzzy. It remained unclear to me exactly what the necessary assumptions are. I hence don’t know whether it’s really necessary to have this interpretation with the entanglement and the volume-scaling of the entropy and with assigning elasticity to the dark energy component that pushes in on galaxies. Maybe it would already be sufficient to add a non-local modification to the sources of general relativity. Having toyed with that idea for a while, I doubt it. But I think Verlinde’s approach would benefit from a more axiomatic treatment.

In summary, Verlinde’s recent paper offers the most convincing argument I have seen so far that dark matter and dark energy are related. However, it is presently unclear whether this approach also has unwanted side-effects that are already in conflict with observation.

Jordan EllenbergWhy did Terrill Thomas die of thirst?

Nobody decided to kill Terrill Thomas.  He kept flooding his cell at the Milwaukee County Jail and making a mess so they just turned off the water to his cell.  Then they left it off until he was dead.  It took six days.  Fellow inmates say he was calling out for water.  Corrections officers say they checked in on Thomas every half hour, and “he had made some type of noise or movement” every time.  Until the last time, when he didn’t make any type of noise or movement because he’d died of thirst.

How did this happen?  It didn’t happen because David Clarke — the sheriff of Milwaukee County, and a top candidate to lead the Department of Homeland Security in the Trump administration — wanted to kill a prisoner in the most agonizing way imaginable.  What kind of psycho would want that?  I don’t think Sheriff Clarke wanted to kill a newborn baby either.  The baby was the fourth person to die in the jail since April.

These people died because nobody really seems to care what happens in the Milwaukee County Jail.  The medical services there are run by Armor Correctional Health Services, a company which oversees healthcare for 40,000 inmates in 8 states.  What are you saying about your priorities if you call your health care company “Armor”?

Armor’s Glassdoor page doesn’t make it sound like a great place to work.  One employee writes: “Stop being bean counters and start listening to your employees. We are asked to do too much: too many patients, too many intakes, not nearly enough staff to be in compliance with your own rules!”  Armor was sued by the state of New York this year over 12 inmates who died in the Nassau County Jail, including Daniel Pantera, who died of hypothermia in solitary confinement; they settled the suit last month for $350,000 and are barred for three years from bidding for contracts in the state.  Armor does get a very nice endorsement, though, from Palm Beach County Sheriff Ric Bradshaw, who says right on the front of their website, “Armor stands out as an exemplary model of what partnership in correctional health should look like.”  Bradshaw’s department was held liable this year for $22.4m in damages to Dontrell Stephens, and this summer settled for $550,000 against a former U.S. Marshal who said he was roughed up by deputies after stopping to help the victims of a traffic accident.

Here in Wisconsin, Armor’s performance is overseen by Ronald Shansky, a court-appointed monitor and the first president of the Society of Correctional Physicians.  Some of what Shansky has to say, based on his visit to Milwaukee County correctional facilities last month:


As for the deaths:


Shansky also, I should say, has a lot of praise for some staff members at the jail, characterizing them as devoted to their jobs and patients and doing the best they can under strained circumstances.  And I believe that’s true.  Again:  the doctors of Armor didn’t want Terrill Thomas to spend six days dying of thirst.  Neither did the CEO of Armor.  Neither did David Clarke.  But it happened.  And everyone participated in creating the circumstances under which it happened, and under which it’s likely to happen again:  public services outsourced to companies without the staff or resources to do the job right.

It starts with jails.  But it goes on to schools, to parking, to Medicare, to policing, to the maintenance of our bridges and roads.  You’ll hear people say those services should be run like businesses.  We can see in Milwaukee County what that looks like.  Does it look good?

December 01, 2016

Chad OrzelPhysics Blogging Round-Up: November

I’m not posting as much as I did last year, when I was on sabbatical (gasp, shock, surprise), so making Forbes-blog links dump posts a monthly thing is probably just about sustainable.

What Math Do You Need For Physics? It Depends: Some thoughts about, well, the math you need to learn to be a physicist. Which may not be all that much, depending on your choice of subfield. Prompted a nice response from Peter Woit, too.

Physics And The Science Of Finding Missing Pieces: One of several recent-ish posts prompted by my last term teaching from Matter and Interactions.

How To Make A White Dwarf With Lasers And Cold Atoms: An explanation of ultracold plasma physics, prompted by a visit and a very nice colloquium talk by Tom Killian from Rice.

Here’s The Physics That Got Left Out Of ‘Arrival’: As noted previously, the movie adaptation of Ted Chiang’s “Story of Your Life” is very good but would’ve been better if they’d kept more of the physics from the original story. This is an explanation of that physics.

Three Candidates For The ‘Hamilton’ Of Physics: Because I needed something frivolous and morale-boosting, some suggestions for splashy Broadway biographies of some notable physicists.

So, that’s the month of November for you. The math and Arrival posts did really well, traffic-wise, but I’m probably happiest with the cold plasma one. But, you know, such is science blogging.

Chad OrzelOn Feelings and Votes

This is going to be a bit of a rant, because there’s a recurring theme in my recent social media that’s really bugging me, and I need to vent. I’m going to do it as a blog post rather than an early-morning tweetstorm, because tweets are more likely to be pulled out of context, and then I’m going to unfollow basically everybody that isn’t a weird Twitter bot or a band that I like, and try to avoid politics until the end of the year. Also, I’ll do some physics stuff.

This morning saw the umpteenth reshared tweetstorm (no link because it doesn’t matter who it was) berating people who write about how liberals ought to reach out to working-class whites– as I did a little while back— for caring too much about the “feelings” of white people. While there are undoubtedly some disingenuous op-eds being written for which that’s true, I think it misses an extremely important point about this whole thing. That is, it’s true that these pieces are concerned about the feelings of white people, but only as a means to an end. What really matters isn’t their feelings, but their votes.

And all the stuff being thrown out there as progressives work through the Kübler-Ross model need those votes. You think it’s ridiculous that Hillary Clinton won the popular vote by 2.5 million votes but still lost, so you want to get rid of the Electoral College? Great. To do that, you need to amend the Constitution, which requires control of Congress and/or a whole bunch of state legislatures, most of which are in Republican hands, because they get the votes of those working-class whites. You want to ditch the Electoral College, you need to change those votes.

Think those working-class whites have too much power because of gerrymandered districts that over-weight rural areas? You’re probably right, but if you want to fix it, you need to control the legislatures that make the districts, and those are mostly in Republican hands because they get the votes of those people in rural districts. You want to stop gerrymandering and protect voting rights, you need to change those votes.

There are a whole host of things wrong with our current system. Fixing any of them requires winning elections, particularly those off-year legislative elections where Democrats underperform even when they’re winning statewide and national elections. Winning some of those is going to require getting the people who vote in those elections to change their votes, and hopefully their minds.

And that is why pundits and those who play pundit in a half-assed way on their blogs are saying you should care about the feelings of those working-class whites: because they vote, and you need their votes. And you’re not going to get those votes by berating them and insulting them and disparaging their feelings. You get their votes by understanding where they’re coming from, offering them something they want, and treating them with respect.

And again, this does not mean you need to cater to their basest impulses. Fundamental principles of tolerance and equality are not negotiable, and can not be compromised. But you don’t have to pander to racism to move some votes– most of the policies in the Democratic platform are already clearly better for those people than the Republican alternatives. It’s just a matter of pitching them in a way that makes that clear.

As an attempt at a concrete example, look at issues of affirmative action and immigration. If you’re dealing with someone who’s concerned about immigrants or people of color “taking our jobs,” you’re not likely to bring them to your side by lecturing them about how they’re not really entitled to that job, they’re just the beneficiary of hundreds of years of racist policy, and so on. You might be right about the history, but that’s not terribly persuasive to somebody who’s worried about having a stable income and health insurance to support their family. But you don’t need to go full “build a wall,” either– something like “The real problem is that there ought to be enough good jobs for both you and them, and here’s what we’re going to do to make that happen” could work. (It has the disadvantage of needing a plan to create jobs for all, admittedly, but as the recent election shows, such a plan doesn’t even need to be all that plausible.) That steps around the implicit racism of the original concern in a way that preserves their feelings, gets their vote for better policy, and doesn’t compromise any fundamental principles.

(Yes, this is basically the Bernie Sanders strategy. I would’ve been all for Bernie’s economic program; I don’t think he would’ve been a viable candidate in the general election, though.)

Another common and maddening refrain the past few weeks has been “Why do we have to care about their feelings, when they’re hateful toward us?” The answer is, bluntly, that they don’t need your votes. They’re living in gerrymandered districts that give them too much power, and they’re winning the elections that matter. If you want to change the broken system in fundamental ways, you need to convince them to vote for policies that involve giving up some of that power. They can keep things just the way they are, or make them much, much worse, without any assistance from you.

And, yes, it’s unquestionably true that a distressingly large number of those voters are openly racist and probably not persuadable. But the hard-core racist fraction is not 100%, and you wouldn’t need a huge effect to make things better. As I said before, even if 39 out of 40 Trump voters in PA, MI, and WI was a full-on alt-right Twitter frog, flipping the vote of that one decent human being would’ve avoided our current situation. I think that would’ve been worth a little bit of effort to respect their feelings, at least long enough to win their votes.

Yes, that’s messy, and compromised, and leaves some big issues unaddressed. Welcome to politics. It’s not about feelings, on either side, it’s about getting enough votes to win elections.

Rant over, catharsis achieved. Shutting up about politics, now.

BackreactionDear Dr. B: What is emergent gravity?

    “Hello Sabine, I've seen a couple of articles lately on emergent gravity. I'm not a scientist so I would love to read one of your easy-to-understand blog entries on the subject.


    Michael Tucker
    Wichita, KS”

Dear Michael,

Emergent gravity has been in the news lately because of a new paper by Erik Verlinde. I’ll tell you some more about that paper in an upcoming post, but answering your question makes for a good preparation.

The “gravity” in emergent gravity refers to the theory of general relativity in the regimes where we have tested it. That means Einstein’s field equations and curved space-time and all that.

The “emergent” means that gravity isn’t fundamental, but instead can be derived from some underlying structure. That’s what we mean by “emergent” in theoretical physics: If theory B can be derived from theory A but not the other way round, then B emerges from A.

You might be more familiar with seeing the word “emergent” applied to objects or properties of objects, which is another way physicists use the expression. Sound waves in the theory of gases, for example, emerge from molecular interactions. Van der Waals forces emerge from quantum electrodynamics. Protons emerge from quantum chromodynamics. And so on.

Everything that isn’t in the standard model or general relativity is known to be emergent already. And since I know that it annoys so many of you, let me point out again that, yes, to our current best knowledge this includes cells and brains and free will. Fundamentally, you’re all just a lot of interacting particles. Get over it.

General relativity and the standard model are currently the most fundamental descriptions of nature that we have. For the theoretical physicist, the interesting question is then whether these two theories are also emergent from something else. Most physicists in the field think the answer is yes. And any theory in which general relativity – in the tested regimes – is derived from a more fundamental theory, is a case of “emergent gravity.”

That might not sound like such a new idea and indeed it isn’t. In string theory, for example, gravity – like everything else – “emerges” from, well, strings. There are a lot of other attempts to explain gravitons – the quanta of the gravitational interaction – as not-fundamental “quasi-particles” which emerge, much like sound-waves, because space-time is made of something else. An example of this is the model pursued by Xiao-Gang Wen and collaborators in which space-time, and matter, and really everything is made of qubits. Including cells and brains and so on.

Xiao-Gang’s model stands out because it can also include the gauge-groups of the standard model, though last time I looked chirality was an issue. But there are many other models of emergent gravity which focus on just getting general relativity. Lorenzo Sindoni has written a very useful, though quite technical, review of such models.

Almost all such attempts to have gravity emerge from some underlying “stuff” run into trouble because the “stuff” defines a preferred frame which shouldn’t exist in general relativity. They violate Lorentz-invariance, which we know observationally is fulfilled to very high precision.

An exception to this is entropic gravity, an idea pioneered by Ted Jacobson 20 years ago. Jacobson pointed out that there are very close relations between gravity and thermodynamics, and this research direction has since gained a lot of momentum.

The relation between general relativity and thermodynamics in itself doesn’t make gravity emergent, it’s merely a reformulation of gravity. But thermodynamics itself is an emergent theory – it describes the behavior of very large numbers of some kind of small things. Hence, that gravity looks a lot like thermodynamics makes one think that maybe it’s emergent from the interaction of a lot of small things.

What are the small things? Well, the currently best guess is that they’re strings. That’s because string theory is (at least to my knowledge) the only way to avoid the problems with Lorentz-invariance violation in emergent gravity scenarios. (Gravity is not emergent in Loop Quantum Gravity – its quantized version is directly encoded in the variables.)

But as long as you’re not looking at very short distances, it might not matter much exactly what gravity emerges from. Like thermodynamics was developed before it could be derived from statistical mechanics, we might be able to develop emergent gravity before we know what to derive it from.

This is only interesting, however, if the gravity that “emerges” is only approximately identical to general relativity, and differs from it in specific ways. For example, if gravity is emergent, then the cosmological constant and/or dark matter might emerge with it, whereas in our current formulation, these have to be added as sources for general relativity.

So, in summary “emergent gravity” is a rather vague umbrella term that encompasses a large number of models in which gravity isn’t a fundamental interaction. The specific theory of emergent gravity which has recently made headlines is better known as “entropic gravity” and is, I would say, the currently most promising candidate for emergent gravity. It’s believed to be related to, or maybe even be part of string theory, but if there are such links they aren’t presently well understood.

Thanks for an interesting question!

Aside: Sorry about the issue with the comments. I turned on G+ comments, thinking they'd be displayed in addition, but that instead removed all the other comments. So I've reset this to the previous version, though I find it very cumbersome to have to follow four different comment threads for the same post.

November 30, 2016

Jordan EllenbergCall for nominations for the Chern Medal

This is a guest post by Caroline Series.

The Chern Medal is a relatively new prize, awarded once every four years jointly by the IMU and the Chern Medal Foundation (CMF) to an individual whose accomplishments warrant the highest level of recognition for outstanding achievements in the field of mathematics. Funded by the CMF, the Medalist receives a cash prize of US$ 250,000. In addition, each Medalist may nominate one or more organizations to receive funding totalling US$ 250,000, for the support of research, education, or other outreach programs in the field of mathematics.

Professor Chern devoted his life to mathematics, both in active research and education, and in nurturing the field whenever the opportunity arose. He obtained fundamental results in all the major aspects of modern geometry and founded the area of global differential geometry. Chern exhibited keen aesthetic tastes in his selection of problems, and the breadth of his work deepened the connections of geometry with different areas of mathematics. He was also generous during his lifetime in his personal support of the field.

Nominations should be sent to the Prize Committee Chair: Caroline Series, email: chair(at) by 31st December 2016. Further details and nomination guidelines for this and the other IMU prizes can be found here.  Note that previous winners of other IMU prizes, such as the Fields Medal, are not eligible for consideration.

n-Category Café Quarter-Turns

Teaching linear algebra this semester has made me face up to the fact that for a linear operator $T$ on a real inner product space, $\langle T x, x \rangle = 0 \;\forall x \iff T^\ast = -T$, whereas for an operator on a complex inner product space, $\langle T x, x \rangle = 0 \;\forall x \iff T = 0$. In other words, call an operator $T$ a quarter-turn if $\langle T x, x \rangle = 0$ for all $x$. Then the real quarter-turns correspond to the skew-symmetric matrices, but apart from the zero operator, there are no complex quarter-turns at all.

Where in my mental landscape should I place these facts?

The proofs of both facts are easy enough. Everyone who’s met an inner product space knows the real polarization identity: in a real inner product space $X$,

$$\langle x, y \rangle = \frac{1}{4} \bigl( \| x + y \|^2 - \| x - y \|^2 \bigr).$$

All we used about $\langle -, - \rangle$ here is that it’s a symmetric bilinear form (and that $\|w\|^2 = \langle w, w \rangle$). In other words, for a symmetric bilinear form $\beta$ on $X$, writing $Q(w) = \beta(w, w)$, we have

$$\beta(x, y) = \frac{1}{4} \bigl( Q(x + y) - Q(x - y) \bigr)$$

for all $x, y \in X$.
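Since the identity is purely algebraic, it is easy to sanity-check numerically. Here is a quick sketch with NumPy; the dimension, seed, and random symmetric form are my own arbitrary choices, not anything from the post:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random symmetric bilinear form beta(x, y) = x^T A y, with A symmetrized.
A = rng.standard_normal((4, 4))
A = (A + A.T) / 2

def beta(x, y):
    return x @ A @ y

def Q(w):
    # The quadratic form associated to beta.
    return beta(w, w)

x, y = rng.standard_normal(4), rng.standard_normal(4)

# Polarization: beta(x, y) = (Q(x+y) - Q(x-y)) / 4, valid because A is symmetric.
assert np.isclose(beta(x, y), (Q(x + y) - Q(x - y)) / 4)
```

If you skip the symmetrization step, the assertion fails for a generic $A$, which is exactly the point made next: without symmetry, polarization only recovers the symmetric part of $\beta$.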

The crucial point is that we really did need the symmetry. (For it’s clear that the right-hand side is symmetric whether or not $\beta$ is.) For a not-necessarily-symmetric bilinear form $\beta$, all we can say is

$$\frac{1}{2} \bigl( \beta(x, y) + \beta(y, x) \bigr) = \frac{1}{4} \bigl( Q(x + y) - Q(x - y) \bigr)$$

or more simply put,

$$\beta(x, y) + \beta(y, x) = \frac{1}{2} \bigl( Q(x + y) - Q(x - y) \bigr).$$

Now let $T$ be a linear operator on $X$. There is a bilinear form $\beta$ defined by $\beta(x, y) = \langle T x, y \rangle$. It’s not symmetric unless $T$ is self-adjoint; nevertheless, the polarization identity just stated tells us that

$$\langle T x, y \rangle + \langle T y, x \rangle = \frac{1}{2} \bigl( \langle T(x + y), x + y \rangle - \langle T(x - y), x - y \rangle \bigr).$$

It follows that $T$ is a quarter-turn if and only if

$$\langle T x, y \rangle + \langle T y, x \rangle = 0$$

for all $x, y \in X$. After some elementary rearrangement, this in turn is equivalent to

$$\langle (T + T^\ast) x, y \rangle = 0$$

for all $x, y$, where $T^\ast$ is the adjoint of $T$. But that just means that $T + T^\ast = 0$. So, $T$ is a quarter-turn if and only if $T^\ast = -T$.

The complex case involves a more complicated polarization identity, but is ultimately simpler. To be clear, when I say “complex inner product” I’m talking about something that’s linear in the first argument and conjugate linear in the second.

In a complex inner product space, the polarization formula is

$$\langle x, y \rangle = \frac{1}{4} \sum_{p = 0}^{3} i^p \| x + i^p y \|^2.$$

This can be compared with the real version, which (in unusually heavy notation) says that

$$\langle x, y \rangle = \frac{1}{4} \sum_{p = 0}^{1} (-1)^p \| x + (-1)^p y \|^2.$$

And the crucial point in the complex case is that this time, we don’t need any symmetry. In other words, for any sesquilinear form $\beta$ on $X$ (linear in the first argument, conjugate-linear in the second, with no symmetry assumption), writing $Q(x) = \beta(x, x)$, we have

$$\beta(x, y) = \frac{1}{4} \sum_{p = 0}^{3} i^p Q(x + i^p y).$$

So given a quarter-turn $T$ on $X$, we can define a sesquilinear form $\beta$ by $\beta(x, y) = \langle T x, y \rangle$, and it follows immediately from this polarization identity that $\langle T x, y \rangle = 0$ for all $x, y$; that is, $T = 0$.
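The complex identity can be checked numerically in the same way. A small NumPy sketch (random operator and vectors, arbitrary dimension, my own example), using the convention above that the form is linear in the first argument and conjugate-linear in the second:

```python
import numpy as np

rng = np.random.default_rng(1)

# An arbitrary operator T on C^3 and the sesquilinear form
# beta(x, y) = <T x, y>, linear in x and conjugate-linear in y.
T = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

def beta(x, y):
    return (T @ x) @ np.conj(y)

def Q(w):
    return beta(w, w)

x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
y = rng.standard_normal(3) + 1j * rng.standard_normal(3)

# Complex polarization: beta(x, y) = (1/4) * sum_p i^p Q(x + i^p y),
# with no symmetry assumption on beta.
rhs = sum((1j ** p) * Q(x + (1j ** p) * y) for p in range(4)) / 4
assert np.isclose(beta(x, y), rhs)
```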

So we’ve now shown that over $\mathbb{R}$,

$$T \;\text{is a quarter-turn} \iff T^\ast = -T$$

but over $\mathbb{C}$,

$$T \;\text{is a quarter-turn} \iff T = 0.$$
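To see the contrast concretely, here is a small NumPy sketch (my own example): the literal quarter-turn of the plane, rotation by 90°, is a quarter-turn over $\mathbb{R}$, but the very same matrix fails to be one over $\mathbb{C}$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Over R: the 90-degree rotation J is skew-symmetric, hence a quarter-turn:
# <Jx, x> = 0 for every real x.
J = np.array([[0.0, -1.0], [1.0, 0.0]])
for _ in range(5):
    x = rng.standard_normal(2)
    assert np.isclose((J @ x) @ x, 0.0)

# Over C: the same matrix is no longer a quarter-turn. With z = (1, i),
# <Jz, z> = (Jz) . conj(z) = -2i, which is nonzero.
z = np.array([1.0, 1j])
assert not np.isclose((J @ z) @ np.conj(z), 0.0)
```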

Obviously everything I’ve said is very well-known to those who know it. (For instance, most of it’s in Axler’s Linear Algebra Done Right.) But how should I think about these results? How can I train my intuition so that the real and complex results seem simultaneously obvious?

Whatever the intuitive picture, here’s a nice consequence, also in Axler’s book.

This pair of results immediately implies that whether we’re over $\mathbb{R}$ or $\mathbb{C}$, the only self-adjoint quarter-turn is zero. Now let $T$ be any operator on a real or complex inner product space, and recall that $T$ is said to be normal if it commutes with $T^\ast$.

Equivalently, $T$ is normal if the operator $T^\ast T - T T^\ast$ is zero.

But $T^\ast T - T T^\ast$ is always self-adjoint, so $T$ is normal if and only if $T^\ast T - T T^\ast$ is a quarter-turn.

Finally, a bit of routine messing around with inner products shows that this is in turn equivalent to

$$\| T^\ast x \| = \| T x \| \;\text{for all}\; x \in X.$$

So a real or complex operator $T$ is normal if and only if $T^\ast x$ and $T x$ have the same length for all $x$.
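A quick numerical illustration of this characterization (the example matrices are my own, not from the post): a rotation is normal but not self-adjoint and passes the equal-lengths test, while a nilpotent shift fails it:

```python
import numpy as np

rng = np.random.default_rng(3)

# A normal but not self-adjoint operator: a real rotation matrix.
theta = 0.7
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
assert np.allclose(T.T @ T, T @ T.T)   # T commutes with its adjoint: normal
assert not np.allclose(T, T.T)         # but T is not self-adjoint

# Normality is equivalent to ||T* x|| = ||T x|| for all x.
for _ in range(5):
    x = rng.standard_normal(2)
    assert np.isclose(np.linalg.norm(T.T @ x), np.linalg.norm(T @ x))

# A non-normal operator fails the length test for some x.
S = np.array([[0.0, 1.0], [0.0, 0.0]])  # nilpotent shift, not normal
x = np.array([1.0, 0.0])
assert not np.isclose(np.linalg.norm(S.T @ x), np.linalg.norm(S @ x))
```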

November 29, 2016

John BaezCompositional Frameworks for Open Systems


Here are the slides of Blake Pollard’s talk at the Santa Fe Institute workshop on Statistical Physics, Information Processing and Biology:

• Blake Pollard, Compositional frameworks for open systems, 17 November 2016.

He gave a really nice introduction to how we can use categories to study open systems, with his main example being ‘open Markov processes’, where probability can flow in and out of the set of states. People liked it a lot!


Tim GowersTime for Elsexit?

This post is principally addressed to academics in the UK, though some of it may apply to people in other countries too. The current deal that the universities have with Elsevier expires at the end of this year, and a new one has been negotiated between Elsevier and Jisc Collections, the body tasked with representing the UK universities. If you want, you can read a thoroughly misleading statement about it on Elsevier’s website. On Jisc’s website is a brief news item with a link to further details that tells you almost nothing and then contains a further link entitled “Read the full description here”, which appears to be broken. On the page with that link can be found the statement

The ScienceDirect agreement provides access to around 1,850 full text scientific, technical and medical (STM) journals – managed by renowned editors, written by respected authors and read by researchers from around the globe – all available in one place: ScienceDirect. Elsevier’s full text collection covers titles from the core scientific literature including high impact factor titles such as The Lancet, Cell and Tetrahedron.

Unless things have changed, this too is highly misleading, since up to now most Cell Press titles have not been part of the Big Deal but instead are part of a separate package. This point is worth stressing, since failure to appreciate it may cause some people to overestimate how much they rely on the Big Deal — in Cambridge at least, the Cell Press journals account for a significant percentage of our total downloads. (To be more precise, the top ten Elsevier journals accessed by Cambridge are, in order, Cell, Neuron, Current Biology, Molecular Cell, The Lancet, Developmental Cell, NeuroImage, Cell Stem Cell, Journal of Molecular Biology, and Earth and Planetary Science Letters. Of those, Cell, Neuron, Current Biology, Molecular Cell, Developmental Cell and Cell Stem Cell are Cell Press journals, and they account for over 10% of all our access to Elsevier journals.)

Jisc has also put up a Q&A, which can be found here.

Roughly how much do universities currently pay for access to ScienceDirect?

Just to remind you, here is what a number of universities were paying annually for their Elsevier subscriptions during the current deal. To be precise, these are the figures for 2014, obtained using FOI requests: they are likely to be a little higher for 2016.

University Cost Enrolment Academic Staff
Birmingham £764,553 31,070 2355 + 440
Bristol £808,840 19,220 2090 + 525
Cambridge £1,161,571 19,945 4205 + 710
Cardiff £720,533 30,000 2130 + 825
*Durham £461,020 16,570 1250 + 305
**Edinburgh £845,000 31,323 2945 + 540
*Exeter £234,126 18,720 1270 + 290
Glasgow £686,104 26,395 2000 + 650
Imperial College London £1,340,213 16,000 3295 + 535
King’s College London £655,054 26,460 2920 + 1190
Leeds £847,429 32,510 2470 + 655
Liverpool £659,796 21,875 1835 + 530
§London School of Economics £146,117 9,805 755 + 825
Manchester £1,257,407 40,860 3810 + 745
Newcastle £974,930 21,055 2010 + 495
Nottingham £903,076 35,630 2805 + 585
Oxford £990,775 25,595 5190 + 775
* ***Queen Mary U of London £454,422 14,860 1495 + 565
Queen’s U Belfast £584,020 22,990 1375 + 170
Sheffield £562,277 25,965 2300 + 460
Southampton £766,616 24,135 2065 + 655
University College London £1,381,380 25,525 4315 + 1185
Warwick £631,851 27,440 1535 + 305
*York £400,445 17,405 1205 + 285

*Joined the Russell Group two years ago.
**Information obtained by Sean Williams.
***Information obtained by Edward Hughes.
§LSE subscribes to a package of subject collections rather than to the full Freedom Collection.

These are figures for Russell Group universities: the total amount spent annually by all UK universities for access to ScienceDirect is around £40 million.

An important additional factor is that since the last deal was struck with Elsevier, we have had the Finch Report, which has led to a policy of requiring publications in the UK to be open access. The big publishers (who lobbied hard when the report was being written) have responded by turning many of their journals into “hybrid” journals, that is, subscription journals where for an additional fee, usually in the region of £2000, you can pay to make your article freely readable to everybody. This has added significantly to the total bill. Cambridge, for example, has paid over £750,000 this year in article processing charges, from a grant provided for the purpose.

How were the negotiations conducted?

Jisc started preparing for these negotiations at least two years ago, for example going on fact-finding missions round the world to see what had happened in other countries. The negotiations began in earnest in 2016, and Jisc started out with some core aims, some of which they described as red lines and some as important aims. (I know this from a briefing meeting I attended in Cambridge — I think that similar meetings took place at other universities.) Some of these were as follows.

  1. No real-terms price increases.
  2. An offsetting agreement for article processing charges.
  3. No confidentiality clauses.
  4. A move away from basing price on “historic spend”.
  5. A three-year deal rather than a five-year deal.

Let me say a little about each of these.

No real-terms price increases

This seemed extraordinarily unambitious as a starting point for negotiations. The whole point of universities asking an organization like Jisc to negotiate on our behalf was supposed to be that they would be able to negotiate hard and that the threat of not coming to an agreement would be one that Elsevier would have to be genuinely worried about. Journal prices have gone up far more than inflation for decades, while the costs of dissemination have (or at the very least should have) gone down substantially. In addition, there are a number of subjects, mathematics and high-energy physics being two notable examples, where it is now common practice to claim priority for a result by posting a preprint, and in those subjects it is less and less common for people to look at the journal versions of articles because repositories such as arXiv are much more convenient, and the value that the publishers claim they add to articles is small to nonexistent. So Jisc should have been pressing for a substantial cut in prices: maintenance of the status quo is not appropriate when technology and reading habits are changing so rapidly.

Offsetting for APCs

An offsetting agreement means a deal where if somebody pays an article processing charge in order to make an article open access in an Elsevier journal, then that charge is subtracted from the Big Deal payment. There are arguments for and against this idea. The main argument for it is that it is a way of avoiding double dipping: the phenomenon where Elsevier effectively gets paid twice for the same article, since it rakes in the article processing charges but does not reduce the subscription cost of the Big Deal.
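To make the arithmetic of offsetting concrete, here is a toy sketch in Python; all the figures and the simple model are hypothetical illustrations of the mechanism, not numbers from any actual contract:

```python
# Toy model of offsetting vs "double dipping". All figures hypothetical.

def total_paid(big_deal: float, apcs: float, offsetting: bool) -> float:
    """Total annual payment to the publisher from one consortium."""
    if offsetting:
        # Each APC paid is subtracted from the Big Deal subscription bill,
        # so the total stays flat as long as APCs don't exceed the Big Deal.
        return max(big_deal - apcs, 0.0) + apcs
    # Without offsetting, the publisher collects both in full.
    return big_deal + apcs

big_deal, apcs = 1_000_000.0, 750_000.0
assert total_paid(big_deal, apcs, offsetting=True) == 1_000_000.0
assert total_paid(big_deal, apcs, offsetting=False) == 1_750_000.0
```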

In its defence, Elsevier makes the following two points. First, it has an explicit policy against double dipping. In answer to the obvious accusation that they are receiving a lot of APCs and we are seeing no corresponding drop in Big Deal prices, they point out that the total volume of articles they publish is going up. This highlights a huge problem with Big Deals: if universities could say that they did not want the extra content then it might be OK, but as it is, all Elsevier has to do to adhere to its policy is to found enough worthless journals that nobody reads to equal the volume of articles for which APCs are paid.

But there is a second argument that carries more weight. It is that if one country has an offsetting agreement, then all other countries benefit (at least in theory) from lower subscription prices, so in total Elsevier has lost out. Or to put it another way, with an offsetting agreement, it basically becomes free for people in that country to publish an open access article with Elsevier, so they are effectively giving away that content.

Against this are two arguments: that if somebody has to lose out, why should it not be Elsevier, and that in any case it would be entirely consistent with a no-double-dipping policy for Elsevier not to reduce its Big Deal subscriptions for the other countries. In the longer term, if lots of countries had offsetting agreements, this might cease to be sustainable, since nobody would need subscriptions any more, but since most countries are not following the UK’s lead in pursuing open access with article processing charges, this is unlikely to happen any time soon.

Personally, I am not in favour of an offsetting agreement if it works on a per-article basis, since that may lead to pressure from universities for their academics to publish with Elsevier rather than with publishers that do not have offsetting agreements: that is, it gives an artificial advantage to Elsevier journals. What I would like to see is a big drop in the subscription price to allow for the fact that we are now paying a lot of APC money to Elsevier. That way, if other journals are better, they will get used, and there will be some semblance of a market.

No confidentiality clauses

It goes without saying that confidentiality clauses are one of the most obnoxious features of Elsevier contracts. And now that FOI requests have been successful in obtaining information about what universities pay for their subscriptions, they also seem rather pointless. In any case, Jisc was strongly against them, as they certainly should have been.

Another remark is that if contracts are kept confidential, there is no way of assessing whether Elsevier is double dipping.

Historic spend

When we moved from looking at print copies of journals to looking at articles online, it suddenly ceased to be obvious on what basis we should be charged. Elsevier came up with the idea of not changing anything, so even if in practice with a big deal we get access to all the journals, nominally a university subscribes to a “Core Collection”, which is based on what it used to have print subscriptions to (they are allowed to change what is in the Core Collection, but they cannot reduce its size), and then the rest goes under the Orwellian name of the Freedom Collection.

This system is manifestly unfair: for example, Cambridge, with its numerous college libraries, used to subscribe to several copies of certain journals and is now penalized for this. It also means that if a university starts to need journals less, there is no way for this to be reflected in the price it pays.

Jisc recognised the problem, and came up with a rather mealy-mouthed formula about “moving away from historic spend”. Not abolishing the system and replacing it by a fairer one (which is hard to do as there will be losers as well as winners), but “moving away” from it in ways that they did not specify when we asked about it at the briefing meeting.

A three-year deal

At some point I was told (indirectly by Cambridge’s then head librarian) that the idea was to go for a three-year deal, so that we would not be locked in for too long. This I was very pleased to hear, as a lot can change in three years.

And what was the result?

For reasons I’ve given in the previous section, even if Jisc had succeeded in its aims, I would have been disappointed by the deal. But as it was, something very strange happened. We had been told of considerable ill feeling, including cancelled meetings because the deals that Elsevier was offering were so insultingly bad, and then suddenly in late September we learned that a deal had been reached. And then when the deal was announced it was all smiles and talk of “landmark deals” and “value for money”.

So how did Jisc do, by their own criteria? Well, it is conceivable that they will end up achieving their first aim of not having any real-terms price increases: this will depend on whether Brexit causes enough inflation to cancel out such money-terms price increases as there may or may not be — I leave it to you to guess which. (In the interests of balance, I should also point out that the substantial drop in the pound means that what Elsevier receives has, in their terms, gone down. That said, currency fluctuations are a fact of life and over the last few years they have benefited a lot from a weak euro.)

Jisc said that an offsetting agreement was not just an aspiration but a red line — a requirement of any deal they would be prepared to strike. However, there is no offsetting agreement.

Jisc also said that they would insist on transparency, but when Elsevier insisted on confidentiality clauses, they meekly accepted this. (Their reasoning: Elsevier was not prepared to reach a deal without these clauses. But why didn’t an argument of exactly the same type apply to Jisc in the other direction?) It is for that reason that I have been a bit vague about prices above.

As far as historic spend is concerned, I see on the Jisc statement the following words: “The agreement includes the ability for the consortium to migrate from historical print spend and reallocate costs should we so wish.” I have no information about whether any “migration” has started, but my guess would be that it hasn’t, since if there were to be moves in that direction, then there would surely need to be difficult negotiations between the universities about how to divide up the total bill, and there has been no sign of any such negotiations taking place.

Finally, the deal is for five years and not for three years.

So Jisc has not won any clear victories and has had several clear defeats. Now if you were in that position more than three months before the end of the existing deal, what would you do? Perhaps you would follow the course suggested by a Jisc representative at one of the briefing meetings, who said the following.

We know from analysis of the experiences of other consortia that Elsevier really do want to reach an agreement this year. They really hate to go over into the next year …

A number of colleagues from other consortia have said they wished they had held on longer …

If we can hold firm even briefly into 2017 that should have quite a profound impact on what we can achieve in these negotiations.

Of course, all that is just common sense. But this sensible negotiating strategy was mysteriously abandoned, on the grounds that it had become clear that the deal on offer was the best that Jisc was going to get. Again there is a curious lack of symmetry here: why didn’t Jisc make it clear that a better deal (for Jisc) was the best that Elsevier was going to get? At the very least, why didn’t Jisc try to extract further concessions from Elsevier by letting the negotiations continue until much closer to the expiry of the current deal?

Jisc defended itself by saying that their job was simply to obtain the best deal they could to put before the universities, but no university was obliged to sign up to the deal. This is not a wholly satisfactory response, since (i) the whole point of using Jisc rather than negotiating individually was to exploit the extra bargaining power that should come from acting in concert and (ii) Elsevier have made it clear that they will not offer a better deal to any institution that opts out of the Jisc-negotiated one. (This is one of many parallels with Brexit — in this case with the fact that the EU cannot be seen to be giving the UK a better deal than it had in the EU.)

A particularly irritating aspect of the situation was that I and some others had organized for an open letter to be sent to Jisc from many academics, urging them to bargain hard. We asked Jisc whether this would be helpful and they requested that we should delay sending it until after a particular meeting with Elsevier had taken place. And then the premature deal took us by surprise and the letter never got sent.

What is happening now?

Several universities have already accepted the deal, and the mood amongst heads of department in Cambridge appears to be that although it is not a good deal we do not have a realistic alternative to accepting it. This may be correct, but we appear to be rushing into a decision (in Cambridge it is due to be taken in a few days’ time). We are talking about a lot of money: would it not be sensible to delay signing a contract until there has been a proper assessment of the consequences of rejecting a deal?

For Cambridge, I personally would be in favour of cancelling the Big Deal and subscribing individually to a selection of the most important journals, even if this ended up costing more than what we pay at the moment. The reason is that we would have taken back control (those parallels again). At the moment the market is completely dysfunctional, since the price we pay bears virtually no relationship to demand. But if departments were given budgets and told they could choose whether to spend them on journal subscriptions or to use the money for other purposes, then they would be able to do a proper cost-benefit analysis and act on it. Then as more and more papers became freely available online, costs would start to go down. And if other universities did the same (as some notable universities such as Harvard already have), then Elsevier might start having to lower the list prices of their journals.

If the deal is accepted, it should not be the end of the story. A large part of the reason that Elsevier and the other large publishers walk all over Jisc in these negotiations is that we lack a credible Plan B. (For mathematics there is one — just cancel the deal and read papers on the arXiv, as we do already — but many other subjects have not reached this stage.) We need to think about this, so that in future negotiations any threat to cancel the deal is itself credible. We also need to think about whether Jisc is the right body to be negotiating on our behalf, given what has happened this time. What I am hearing from many people, even those who think we should accept the deal, is full agreement that it is a bad one. Even if we accept it, the very least we can do is make clear that we are not happy with what we are accepting. It may not be very polite to those at Jisc who worked hard on our behalf, but we have paid a heavy price for politeness.

If Elsevier will not give us a proper market, we can at least create mini-markets ourselves within universities: why not charge more from faculties that rely on ScienceDirect more heavily? Such is the culture of secrecy that I am not even allowed to tell you how the cost is shared out in Cambridge, but it does not appear to be based on need.

I am often asked why I focus on Elsevier, but the truth is that I no longer do: Springer, Wiley, and Taylor and Francis are in many ways just as bad, and in some respects are even worse. (For example, while Elsevier now makes mathematics papers over four years old freely available, Springer has consistently refused to make any such move.) I am very reluctant to submit papers to any of these publishers — for example, now that the London Mathematical Society has switched from OUP to Wiley I will not be sending papers to their journals. It will be depressing if we have to wait another five years to improve the situation with Elsevier, but in the meantime there are smaller, but still pretty big, Big Deals coming up with the other members of the big four. Because they are smaller, perhaps we are less reliant on their journals, and perhaps that would allow us to drive harder bargains.

In any case, if you are unhappy with the way things are, please make your feelings known. Part of the problem is that the people who negotiate on our behalf are, quite reasonably, afraid of the reaction they would get if we lost access to important journals. It’s just a pity that they are not also afraid of the reaction if the deal they strike is significantly more expensive than it need have been. (We are in a classic game-theoretic situation where there is a wide range of prices at which it is worth it for Elsevier to provide the deal and not worth it for a university to cancel it, and Elsevier is very good at pushing the price to the top of this range.) Pressure should also be put on librarians to get organized with a proper Plan B so that we can survive for a reasonable length of time without Big Deal subscriptions. Just as with nuclear weapons, it is not necessary for such a Plan B ever to be put to use, but it needs to exist and be credible so that any threat to walk away from negotiations will be taken seriously.

n-Category Café Linear Algebraic Groups (Part 8)

The course proceeds apace, but my posts here have slowed down as I become over-saturated with work.

In Part 8, I began explaining a bit of algebraic geometry. Following the general pattern of this course I took a quasi-historical approach, explaining some older ideas before moving on to newer ones. I’m afraid I never got to explaining schemes. That’s a tragedy, but hey—life is full of tragedies; nobody will notice this one. Affine schemes are all I had time for, despite the fact that I was discussing a lot of projective geometry. And before explaining affine schemes, it seemed wise to mention some earlier ideas and their defects.

  • Lecture 8 (Oct. 18) - Group objects in various categories. What’s the right category for linear algebraic groups? First try: algebraic sets. Over an algebraically closed field k, Hilbert’s Nullstellensatz says there’s an order-reversing one-to-one correspondence between algebraic sets S \subseteq X in a finite-dimensional vector space X over k and radical ideals J \subseteq k[X] of the polynomial algebra k[X]. The algebra k[S] of polynomials restricted to S is isomorphic to k[X]/J. Problem: we’d like an ‘intrinsic’ approach that does not make use of the ambient space X. Second try: affine algebras. For any algebraic set S, the algebra k[S] is an affine algebra, meaning a finitely generated commutative algebra without nilpotents. Up to isomorphism, every affine algebra arises this way. Thus we can use affine algebras as a more intrinsic substitute for algebraic sets. Problem: all this works only over an algebraically closed field.
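A standard example (mine, not from the lecture) makes the correspondence concrete. Take k algebraically closed, X = k^2, and let S be the parabola:

```latex
S = \{(x,y) \in k^2 : y = x^2\}
\quad\longleftrightarrow\quad
J = (y - x^2) \subseteq k[x,y],
\qquad
k[S] \cong k[x,y]/(y - x^2) \cong k[x].
```

So k[S] is a finitely generated commutative algebra without nilpotents, i.e. an affine algebra. Note that the non-radical ideal (y^2) cuts out the same algebraic set as the radical ideal (y), which is why the correspondence is restricted to radical ideals.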

Supplementary reading:

November 28, 2016

Tommaso DorigoThe Five Stages Of A Dying Theory

I am told that when a patient is diagnosed with a terminal illness, he or she will likely go through a well-defined sequence of stages. 
The first stage is Denial: the patient will convince him- or herself that there is a mistake in the diagnosis, that somehow the doctors are wrong, or something of the sort. It is a protective, visceral reaction, one preventing the shock of reckoning with a completely altered landscape. There follows a state of Anger, driven by the “why me” sentiment. Then there is Fear, brought about by not knowing what is coming. Then comes Grief - for oneself as well as for the loved ones. And finally, Acceptance, which brings peace to the soul.

read more

BackreactionThis isn’t quantum physics. Wait. Actually it is.

Rocket science isn’t what it used to be. Now that you can shoot someone to Mars if you can spare a few million, the colloquialism for “It’s not that complicated” has become “This isn’t quantum physics.” And there are many things which aren’t quantum physics. For example, making a milkshake:
“Guys, this isn’t quantum physics. Put the stuff in the blender.”
Or losing weight:
“if you burn more calories than you take in, you will lose weight. This isn't quantum physics.”
Or economics:
“We’re not talking about quantum physics here, are we? We’re talking ‘this rose costs 40p, so 10 roses costs £4’.”
You should also know that Big Data isn’t Quantum Physics and Basketball isn’t Quantum Physics and not driving drunk isn’t quantum physics. Neither is understanding that “[Shoplifting isn’t] a way to accomplish anything of meaning,” or grasping that no doesn’t mean yes.

But my favorite use of the expression comes from Noam Chomsky, who explains how the world works (such is the modest title of his book):
“Everybody knows from their own experience just about everything that’s understood about human beings – how they act and why – if they stop to think about it. It’s not quantum physics.”
From my own experience, stopping to think and believing one understands other people effortlessly is the root of much unnecessary suffering. Leaving aside that it’s quite remarkable some people believe they can explain the world, and even more remarkable others buy their books, all of this is, as a matter of fact, quantum physics. Sorry, Noam.

Yes, that’s right. Basketballs, milkshakes, weight loss – it’s all quantum physics. Because it’s all happening by the interactions of tiny particles which obey the rules of quantum mechanics. If it wasn’t for quantum physics, there wouldn’t be atoms to begin with. There’d be no Sun, there’d be no drunk driving, and there’d be no rocket science.

Quantum mechanics is often portrayed as the theory of the very small, but this isn’t so. Quantum effects can stretch over large distances and have been measured over distances up to several hundred kilometers. It’s just that we don’t normally observe them in daily life.

The typical quantum effects that you have heard of – things whose position and momentum can’t be measured precisely, are both dead and alive, have a spooky action at a distance and so on – don’t usually manifest themselves for large objects. But that doesn’t mean that the laws of quantum physics suddenly stop applying at a hair’s width. It’s just that the effects are feeble and human experience is limited. There is some quantum physics, however, which we observe wherever we look: If it wasn’t for Pauli’s exclusion principle, you’d fall right through the ground.

Indeed, a much more interesting question is “What is not quantum physics?” For all we presently know, the only thing not quantum is space-time and its curvature, manifested by gravity. Most physicists believe, however, that gravity too is a quantum theory; we just haven’t been able to figure out how this works.

“This isn’t quantum physics,” is the most unfortunate colloquialism ever because really everything is quantum physics. Including Noam Chomsky.

November 27, 2016

John BaezJarzynski on Non-Equilibrium Statistical Mechanics


Here at the Santa Fe Institute we’re having a workshop on Statistical Physics, Information Processing and Biology. Unfortunately the talks are not being videotaped, so it’s up to me to spread the news of what’s going on here.

Christopher Jarzynski is famous for discovering the Jarzynski equality. It says

\displaystyle{ e^ { -\Delta F / k T} = \langle e^{ -W/kT } \rangle }

where k is Boltzmann’s constant and T is the temperature of a system that’s in equilibrium before some work is done on it. \Delta F is the change in free energy, W is the amount of work, and the angle brackets represent an average over the possible options for what takes place—this sort of process is typically nondeterministic.
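A quick numerical sanity check (my own sketch, not from the talk): if the work W is Gaussian with mean \mu and variance \sigma^2—as happens, for example, when a harmonic trap is dragged through a thermal bath—then the exponential average can be done in closed form, giving \Delta F = \mu - \sigma^2/2kT exactly, and a Monte Carlo estimate of \langle e^{-W/kT} \rangle should reproduce it:

```python
import numpy as np

rng = np.random.default_rng(0)
kT = 1.0                 # measure energies in units of kT
mu, sigma = 2.0, 1.0     # mean and standard deviation of the work distribution

# For Gaussian work, <exp(-W/kT)> = exp(-mu/kT + sigma^2/(2 kT^2)),
# so the Jarzynski equality gives Delta F in closed form:
dF_exact = mu - sigma**2 / (2 * kT)

# Monte Carlo estimate of the exponential average over many realizations
W = rng.normal(mu, sigma, size=2_000_000)
dF_estimate = -kT * np.log(np.mean(np.exp(-W / kT)))

print(dF_exact, dF_estimate)  # the two values agree closely
```

Note that \Delta F = 1.5 here is strictly less than \langle W \rangle = 2: the equality implies \langle W \rangle \geq \Delta F by Jensen’s inequality, which is one way of stating the Second Law.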

We’ve seen a good quick explanation of this equation here on Azimuth:

• Eric Downes, Crooks’ Fluctuation Theorem, Azimuth, 30 April 2011.

We’ve also gotten a proof, where it was called the ‘integral fluctuation theorem’:

• Matteo Smerlak, The mathematical origin of irreversibility, Azimuth, 8 October 2012.

It’s a fundamental result in nonequilibrium statistical mechanics—a subject where inequalities are so common that this equation is called an ‘equality’.

Two days ago, Jarzynski gave an incredibly clear hour-long tutorial on this subject, starting with the basics of thermodynamics and zipping forward to modern work. With his permission, you can see the slides here:

• Christopher Jarzynski, A brief introduction to the delights of non-equilibrium statistical physics.

Also try this review article:

• Christopher Jarzynski, Equalities and inequalities: irreversibility and the Second Law of thermodynamics at the nanoscale, Séminaire Poincaré XV Le Temps (2010), 77–102.

Terence TaoMath 246A, Notes 3: Cauchy’s theorem and its consequences

We now come to perhaps the most central theorem in complex analysis (save possibly for the fundamental theorem of calculus), namely Cauchy’s theorem, which allows one to compute (or at least transform) a large number of contour integrals {\int_\gamma f(z)\ dz} even without knowing any explicit antiderivative of {f}. There are many forms and variants of Cauchy’s theorem. To give one such version, we need the basic topological notion of a homotopy:

Definition 1 (Homotopy) Let {U} be an open subset of {{\bf C}}, and let {\gamma_0: [a,b] \rightarrow U}, {\gamma_1: [a,b] \rightarrow U} be two curves in {U}.

  • (i) If {\gamma_0, \gamma_1} have the same initial point {z_0} and final point {z_1}, we say that {\gamma_0} and {\gamma_1} are homotopic with fixed endpoints in {U} if there exists a continuous map {\gamma: [0,1] \times [a,b] \rightarrow U} such that {\gamma(0,t) = \gamma_0(t)} and {\gamma(1,t) = \gamma_1(t)} for all {t \in [a,b]}, and such that {\gamma(s,a) = z_0} and {\gamma(s,b) = z_1} for all {s \in [0,1]}.
  • (ii) If {\gamma_0, \gamma_1} are closed (but possibly with different initial points), we say that {\gamma_0} and {\gamma_1} are homotopic as closed curves in {U} if there exists a continuous map {\gamma: [0,1] \times [a,b] \rightarrow U} such that {\gamma(0,t) = \gamma_0(t)} and {\gamma(1,t) = \gamma_1(t)} for all {t \in [a,b]}, and such that {\gamma(s,a) = \gamma(s,b)} for all {s \in [0,1]}.
  • (iii) If {\gamma_2: [c,d] \rightarrow U} and {\gamma_3: [e,f] \rightarrow U} are curves with the same initial point and same final point, we say that {\gamma_2} and {\gamma_3} are homotopic with fixed endpoints up to reparameterisation in {U} if there is a reparameterisation {\tilde \gamma_2: [a,b] \rightarrow U} of {\gamma_2} which is homotopic with fixed endpoints in {U} to a reparameterisation {\tilde \gamma_3: [a,b] \rightarrow U} of {\gamma_3}.
  • (iv) If {\gamma_2: [c,d] \rightarrow U} and {\gamma_3: [e,f] \rightarrow U} are closed curves, we say that {\gamma_2} and {\gamma_3} are homotopic as closed curves up to reparameterisation in {U} if there is a reparameterisation {\tilde \gamma_2: [a,b] \rightarrow U} of {\gamma_2} which is homotopic as closed curves in {U} to a reparameterisation {\tilde \gamma_3: [a,b] \rightarrow U} of {\gamma_3}.

In the first two cases, the map {\gamma} will be referred to as a homotopy from {\gamma_0} to {\gamma_1}, and we will also say that {\gamma_0} can be continuously deformed to {\gamma_1} (either with fixed endpoints, or as closed curves).

Example 2 If {U} is a convex set, that is to say that {(1-s) z_0 + s z_1 \in U} whenever {z_0,z_1 \in U} and {0 \leq s \leq 1}, then any two curves {\gamma_0, \gamma_1: [0,1] \rightarrow U} from one point {z_0} to another {z_1} are homotopic with fixed endpoints, by using the homotopy

\displaystyle \gamma(s,t) := (1-s) \gamma_0(t) + s \gamma_1(t).

For a similar reason, in a convex open set {U}, any two closed curves will be homotopic to each other as closed curves.

Exercise 3 Let {U} be an open subset of {{\bf C}}.

  • (i) Prove that the property of being homotopic with fixed endpoints in {U} is an equivalence relation.
  • (ii) Prove that the property of being homotopic as closed curves in {U} is an equivalence relation.
  • (iii) If {\gamma_0, \gamma_1: [a,b] \rightarrow U} are closed curves with the same initial point, show that {\gamma_0} is homotopic to {\gamma_1} as closed curves if and only if {\gamma_0} is homotopic to {\gamma_2 + \gamma_1 + (-\gamma_2)} with fixed endpoints for some closed curve {\gamma_2} with the same initial point as {\gamma_0} or {\gamma_1}.
  • (iv) Define a point in {U} to be a curve {\gamma_1: [a,b] \rightarrow U} of the form {\gamma_1(t) = z_0} for some {z_0 \in U} and all {t \in [a,b]}. Let {\gamma_0: [a,b] \rightarrow U} be a closed curve in {U}. Show that {\gamma_0} is homotopic with fixed endpoints to a point in {U} if and only if {\gamma_0} is homotopic as a closed curve to a point in {U}. (In either case, we will call {\gamma_0} homotopic to a point, null-homotopic, or contractible to a point in {U}.)
  • (v) If {\gamma_0, \gamma_1: [a,b] \rightarrow U} are curves with the same initial point and the same terminal point, show that {\gamma_0} is homotopic to {\gamma_1} with fixed endpoints in {U} if and only if {\gamma_0 + (-\gamma_1)} is homotopic to a point in {U}.
  • (vi) If {U} is connected, and {\gamma_0, \gamma_1: [a,b] \rightarrow U} are any two curves in {U}, show that there exists a continuous map {\gamma: [0,1] \times [a,b] \rightarrow U} such that {\gamma(0,t) = \gamma_0(t)} and {\gamma(1,t) = \gamma_1(t)} for all {t \in [a,b]}. Thus the notion of homotopy becomes rather trivial if one does not fix the endpoints or require the curve to be closed.
  • (vii) Show that if {\gamma_1: [a,b] \rightarrow U} is a reparameterisation of {\gamma_0: [a,b] \rightarrow U}, then {\gamma_0} and {\gamma_1} are homotopic with fixed endpoints in {U}.
  • (viii) Prove that the property of being homotopic with fixed endpoints in {U} up to reparameterisation is an equivalence relation.
  • (ix) Prove that the property of being homotopic as closed curves in {U} up to reparameterisation is an equivalence relation.

We can then phrase Cauchy’s theorem as an assertion that contour integration on holomorphic functions is a homotopy invariant. More precisely:

Theorem 4 (Cauchy’s theorem) Let {U} be an open subset of {{\bf C}}, and let {f: U \rightarrow {\bf C}} be holomorphic.

  • (i) If {\gamma_0: [a,b] \rightarrow U} and {\gamma_1: [c,d] \rightarrow U} are rectifiable curves that are homotopic in {U} with fixed endpoints up to reparameterisation, then

    \displaystyle \int_{\gamma_0} f(z)\ dz = \int_{\gamma_1} f(z)\ dz.

  • (ii) If {\gamma_0: [a,b] \rightarrow U} and {\gamma_1: [c,d] \rightarrow U} are closed rectifiable curves that are homotopic in {U} as closed curves up to reparameterisation, then

    \displaystyle \int_{\gamma_0} f(z)\ dz = \int_{\gamma_1} f(z)\ dz.

This version of Cauchy’s theorem is particularly useful for applications, as it explicitly brings into play the powerful technique of contour shifting, which allows one to compute a contour integral by replacing the contour with a homotopic contour on which the integral is easier to compute or estimate. This formulation of Cauchy’s theorem also highlights the close relationship between contour integrals and the algebraic topology of the complex plane (and open subsets {U} thereof). Setting {\gamma_1} to be a point, we obtain an important special case of Cauchy’s theorem (which is in fact equivalent to the full theorem):

Corollary 5 (Cauchy’s theorem, again) Let {U} be an open subset of {{\bf C}}, and let {f: U \rightarrow {\bf C}} be holomorphic. Then for any closed rectifiable curve {\gamma} in {U} that is contractible in {U} to a point, one has {\int_\gamma f(z)\ dz = 0}.

Exercise 6 Show that Theorem 4 and Corollary 5 are logically equivalent.

An important feature to note about Cauchy’s theorem is the global nature of its hypothesis on {f}. The conclusion of Cauchy’s theorem only involves the values of a function {f} on the images of the two curves {\gamma_0, \gamma_1}. However, in order for the hypotheses of Cauchy’s theorem to apply, the function {f} must be holomorphic not only on the images of {\gamma_0, \gamma_1}, but on an open set {U} that is large enough (and sufficiently free of “holes”) to support a homotopy between the two curves. This point can be emphasised through the following fundamental near-counterexample to Cauchy’s theorem:

Example 7 Let {U := {\bf C} \backslash \{0\}}, and let {f: U \rightarrow {\bf C}} be the holomorphic function {f(z) := \frac{1}{z}}. Let {\gamma_{0,1,\circlearrowleft}: [0,2\pi] \rightarrow {\bf C}} be the closed unit circle contour {\gamma_{0,1,\circlearrowleft}(t) := e^{it}}. Direct calculation shows that

\displaystyle \int_{\gamma_{0,1,\circlearrowleft}} f(z)\ dz = 2\pi i \neq 0.

As a consequence of this and Cauchy’s theorem, we conclude that the contour {\gamma_{0,1,\circlearrowleft}} is not contractible to a point in {U}; note that this does not contradict Example 2 because {U} is not convex. Thus we see that the lack of holomorphicity (or singularity) of {f} at the origin can be “blamed” for the non-vanishing of the integral of {f} on the closed contour {\gamma_{0,1,\circlearrowleft}}, even though this contour does not come anywhere near the origin. The global behaviour of {f}, not just its behaviour in a local neighbourhood of {\gamma_{0,1,\circlearrowleft}}, thus has an impact on the contour integral.

One can of course rewrite this example to involve non-closed contours instead of closed ones. For instance, if we let {\gamma_0, \gamma_1: [0,\pi] \rightarrow U} denote the half-circle contours {\gamma_0(t) := e^{it}} and {\gamma_1(t) := e^{-it}}, then {\gamma_0,\gamma_1} are both contours in {U} from {+1} to {-1}, but one has

\displaystyle \int_{\gamma_0} f(z)\ dz = +\pi i

and

\displaystyle \int_{\gamma_1} f(z)\ dz = -\pi i.

In order for this to be consistent with Cauchy’s theorem, we conclude that {\gamma_0} and {\gamma_1} are not homotopic in {U} (even after reparameterisation).
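These values are easy to confirm numerically (a sketch of my own, not part of the notes), approximating each contour integral by a Riemann sum with the integrand evaluated at parameter midpoints:

```python
import numpy as np

def contour_integral(f, gamma, a, b, n=100_000):
    # Approximate \int_gamma f(z) dz by summing f(midpoint) * (increment of gamma)
    t = np.linspace(a, b, n + 1)
    z = gamma(t)
    mid = gamma((t[:-1] + t[1:]) / 2)
    return np.sum(f(mid) * np.diff(z))

f = lambda z: 1 / z
full  = contour_integral(f, lambda t: np.exp(1j * t), 0, 2 * np.pi)   # closed unit circle
upper = contour_integral(f, lambda t: np.exp(1j * t), 0, np.pi)       # +1 to -1, upper half
lower = contour_integral(f, lambda t: np.exp(-1j * t), 0, np.pi)      # +1 to -1, lower half

print(full, upper, lower)  # approximately 2*pi*i, +pi*i, -pi*i
```

The discrepancy between the two half-circle integrals is exactly the full-circle integral 2\pi i, as it must be, since the upper half-circle followed by the reversed lower half-circle traverses the full circle.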

In the specific case of functions of the form {\frac{1}{z}}, or more generally {\frac{f(z)}{z-z_0}} for some point {z_0} and some {f} that is holomorphic in some neighbourhood of {z_0}, we can quantify the precise failure of Cauchy’s theorem through the Cauchy integral formula, and through the concept of a winding number. These turn out to be extremely powerful tools for understanding both the nature of holomorphic functions and the topology of open subsets of the complex plane, as we shall see in this and later notes.

— 1. Proof of Cauchy’s theorem —

The underlying reason for the truth of Cauchy’s theorem can be explained in one sentence: complex differentiable functions behave locally like complex linear functions, which are conservative thanks to the fundamental theorem of calculus. More precisely, if {f(z) = a + bz} is any complex linear function of {z}, then {f} has an antiderivative {az + \frac{1}{2} bz^2}, and hence

\displaystyle \int_\gamma a+bz\ dz = 0 \ \ \ \ \ (1)

for any rectifiable closed curve {\gamma} in the complex plane.
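For instance, on the closed unit circle {\gamma(t) = e^{it}}, {0 \leq t \leq 2\pi}, one can check (1) by hand:

```latex
\int_\gamma (a + bz)\, dz
= \int_0^{2\pi} \left( a + b e^{it} \right) i e^{it}\, dt
= \left[\, a e^{it} + \frac{b}{2} e^{2it} \,\right]_{t=0}^{t=2\pi}
= 0,
```

since {e^{it}} and {e^{2it}} return to their starting values after a full period; the same cancellation occurs for any closed rectifiable curve, by evaluating the antiderivative {az + \frac{1}{2} bz^2} at the coinciding endpoints.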

Perhaps the slickest way to make this intuition rigorous is through the following special case of Cauchy’s theorem.

Theorem 8 (Goursat’s theorem) Let {U} be an open subset of {{\bf C}}, and {z_1,z_2,z_3} be complex numbers such that the solid (and closed) triangle spanned by {z_1,z_2,z_3} (or more precisely, the convex hull of {\{z_1,z_2,z_3\}}) is contained in {U}. (We allow the triangle to degenerate in that we allow the {z_1,z_2,z_3} to be collinear, or even coincident.) Then for any holomorphic function {f: U \rightarrow {\bf C}}, one has

\displaystyle \int_{\gamma_{z_1 \rightarrow z_2 \rightarrow z_3 \rightarrow z_1}} f(z)\ dz = 0,

where {\gamma_{z_1 \rightarrow z_2 \rightarrow z_3 \rightarrow z_1}} is the closed polygonal path that traverses the vertices {z_1, z_2, z_3} of the solid triangle in order.

Proof: Let us denote the triangular contour {\gamma_{z_1 \rightarrow z_2 \rightarrow z_3 \rightarrow z_1}} as {T_0}. It is convenient (though odd-looking at first sight) to prove this theorem by contradiction. That is to say, suppose for contradiction that we had

\displaystyle |\int_{T_0} f(z)\ dz| \geq \varepsilon \ \ \ \ \ (2)

for some {\varepsilon > 0}. We now run the following “divide and conquer” strategy. We let {z_{12} := \frac{z_1+z_2}{2}}, {z_{23} := \frac{z_2+z_3}{2}}, {z_{31} := \frac{z_3 + z_1}{2}} be the midpoints of the sides {z_1 z_2}, {z_2 z_3}, {z_3 z_1}. Then from the basic properties of contour integration (see Exercise 16 of Notes 2) we can split the triangular integral {\int_{T_0} f(z)\ dz} as the sum of four integrals on smaller triangles, namely

\displaystyle \int_{\gamma_{z_1 \rightarrow z_{12} \rightarrow z_{31} \rightarrow z_1}} f(z)\ dz

\displaystyle \int_{\gamma_{z_{12} \rightarrow z_2 \rightarrow z_{23} \rightarrow z_{12}}} f(z)\ dz

\displaystyle \int_{\gamma_{z_{23} \rightarrow z_3 \rightarrow z_{31} \rightarrow z_{23}}} f(z)\ dz

\displaystyle \int_{\gamma_{z_{12} \rightarrow z_{23} \rightarrow z_{31} \rightarrow z_{12}}} f(z)\ dz.

(The reader is encouraged to draw a picture to visualise this decomposition.) By (2) and the triangle inequality (or, if one prefers, the pigeonhole principle), we must therefore have

\displaystyle |\int_{T_1} f(z)\ dz| \geq \frac{\varepsilon}{4}

where {T_1} is one of the four triangular contours {\gamma_{z_1 \rightarrow z_{12} \rightarrow z_{31} \rightarrow z_1}}, {\gamma_{z_{12} \rightarrow z_2 \rightarrow z_{23} \rightarrow z_{12}}}, {\gamma_{z_{23} \rightarrow z_3 \rightarrow z_{31} \rightarrow z_{23}}}, or {\gamma_{z_{12} \rightarrow z_{23} \rightarrow z_{31} \rightarrow z_{12}}}. Regardless of which of the four contours {T_1} is, observe that the triangular region enclosed by {T_1} is contained in that of {T_0}. Furthermore, the diameter of {T_1} is precisely half that of {T_0}, where the diameter {\mathrm{diam}(\gamma)} of a curve {\gamma: [a,b] \rightarrow {\bf C}} is defined by the formula

\displaystyle \mathrm{diam}(\gamma) := \sup_{t, t' \in [a,b]} |\gamma(t) - \gamma(t')|;

similarly, the perimeter {|T_1|} of {T_1} is precisely half that of {T_0}. If we iterate the above process, we can find a nested sequence {T_0, T_1, T_2, \dots} of triangular contours, each of which is contained in the previous one with half the diameter and perimeter, such that

\displaystyle |\int_{T_n} f(z)\ dz| \geq \frac{\varepsilon}{4^n} \ \ \ \ \ (3)

for all {n=0,1,2,\dots}. If we let {z_n} be any point enclosed by {T_n}, then from the decreasing diameters it is clear that the {z_n} are a Cauchy sequence and thus converge to some limit {z_*}, which is then contained in all of the closed triangles enclosed by any of the {T_n}.

In particular, {z_*} lies in {U} and so {f} is differentiable at {z_*}. This implies, for any {\varepsilon'>0}, that there exists a {\delta > 0} such that

\displaystyle |\frac{f(z) - f(z_*)}{z-z_*} - f'(z_*)| \leq \varepsilon'

whenever {z \in D(z_*,\delta) \backslash \{z_*\}}. We can rearrange this as

\displaystyle |f(z) - (f(z_*) + (z-z_*) f'(z_*))| \leq \varepsilon' |z-z_*|

on {D(z_*,\delta)}. In particular, for {n} large enough, this bound holds on the image of {T_n}. In this case we can bound {|z-z_*|} by {\mathrm{diam}(T_n)}, and hence by Exercise 16(v) of Notes 2,

\displaystyle |\int_{T_n} f(z)\ dz - \int_{T_n} (f(z_*) + (z-z_*) f'(z_*))\ dz| \leq \varepsilon' \hbox{diam}(T_n) |T_n|.

From (1), the second integral vanishes. As each {T_n} has half the diameter and perimeter of the previous, we thus have

\displaystyle |\int_{T_n} f(z)\ dz| \leq \frac{\varepsilon' \hbox{diam}(T_0) |T_0|}{4^n}.

But if one chooses {\varepsilon'} small enough depending on {\varepsilon} and {T_0}, we contradict (3). \Box
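One can also test Goursat’s conclusion numerically (my own sketch, not part of the notes): for an entire function such as {e^z}, the integral over any triangular contour vanishes up to discretisation error.

```python
import numpy as np

def segment_integral(f, z_start, z_end, n=20_000):
    # Midpoint-rule approximation of \int f(z) dz along a straight segment
    s = (np.arange(n) + 0.5) / n
    mid = z_start + s * (z_end - z_start)
    return np.sum(f(mid)) * (z_end - z_start) / n

def triangle_integral(f, z1, z2, z3):
    # Integral over the closed polygonal contour z1 -> z2 -> z3 -> z1
    return (segment_integral(f, z1, z2)
            + segment_integral(f, z2, z3)
            + segment_integral(f, z3, z1))

# For an entire (everywhere-holomorphic) function the result should vanish
I = triangle_integral(np.exp, 0.0 + 0.0j, 1.0 + 0.5j, -0.3 + 1.2j)
print(abs(I))  # close to zero
```

The specific vertices are an arbitrary choice; any triangle in the domain of holomorphicity gives the same vanishing, which is exactly Theorem 8.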

Remark 9 This is a rare example of an argument in which a hypothesis of differentiability, rather than continuous differentiability, is used, because one can localise any failure of the conclusion all the way down to a single point. Another instance of such an argument is the standard proof of Rolle’s theorem.

Exercise 10 Find a proof of Goursat’s theorem that avoids explicit use of proof by contradiction. (Hint: use the fact that a solid triangle is compact, in the sense that every open cover has a finite subcover. For the purposes of this question, ignore the possibility that the proof of this latter fact might also use proof by contradiction.)

Goursat’s theorem only directly handles triangular contours, but as long as one works “locally”, or more precisely in a convex domain, we can quickly generalise:

Corollary 11 (Local Cauchy’s theorem for polygonal paths) Let {U} be a convex open subset of {{\bf C}}, and let {f: U \rightarrow {\bf C}} be a holomorphic function. Then for any closed polygonal path {\gamma = \gamma_{z_1 \rightarrow \dots \rightarrow z_n \rightarrow z_1}} in {U}, we have {\int_\gamma f(z)\ dz = 0}.

Proof: We induct on the number of vertices {n}. The cases {n=1,2} are trivial, and the {n=3} case follows directly from Goursat’s theorem (using the convexity of {U} to ensure that the interior of the polygon lies in {U}). If {n > 3}, we can split

\displaystyle \int_{\gamma_{z_1 \rightarrow \dots \rightarrow z_n \rightarrow z_1}} f(z)\ dz = \int_{\gamma_{z_1 \rightarrow \dots \rightarrow z_{n-1} \rightarrow z_1}} f(z)\ dz + \int_{z_{n-1} \rightarrow z_n \rightarrow z_1 \rightarrow z_{n-1}} f(z)\ dz.

The second integral on the right-hand side vanishes by Goursat’s theorem. The claim then follows from induction. \Box

Exercise 12 By using the (real-variable) fundamental theorem of calculus and Fubini’s theorem in place of Goursat’s theorem, give an alternate proof of Corollary 11 in the case that {\gamma} is a rectangle {\gamma = \gamma_{a+bi \rightarrow c+bi \rightarrow c+di \rightarrow a+di \rightarrow a+bi}} and the derivative {f'} of {f} is continuous. (One can also use Stokes’ theorem in place of the fundamental theorem of calculus and Fubini’s theorem.)

We can amplify Corollary 11 using the fundamental theorem of calculus again:

Corollary 13 (Local Cauchy’s theorem) Let {U} be a convex open subset of {{\bf C}}, and let {f: U \rightarrow {\bf C}} be a holomorphic function. Then {f} has an antiderivative {F: U \rightarrow {\bf C}}. Also, {\int_\gamma f(z)\ dz = 0} for any closed rectifiable curve {\gamma} in {U}, and {\int_{\gamma_1} f(z)\ dz = \int_{\gamma_2} f(z)\ dz} whenever {\gamma_1, \gamma_2} are two rectifiable curves in {U} with the same initial point and same terminal point. In other words, {f} is conservative on {U}.

Proof: The first claim follows from Corollary 11 and the second fundamental theorem of calculus (Theorem 30 from Notes 2). The remaining claims then follow from the first fundamental theorem of calculus (Theorem 27 from Notes 2). \Box

We can now prove Cauchy’s theorem in the form of Theorem 4.

Proof: We will just prove part (i), as part (ii) is similar (and in any event it follows from part (i)). Since reparameterisation does not affect the integral, we may assume without loss of generality that {\gamma_0: [a,b] \rightarrow U} and {\gamma_1: [a,b] \rightarrow U} are homotopic with fixed endpoints, and not merely homotopic with fixed endpoints up to reparameterisation.

Let {\gamma: [0,1] \times [a,b] \rightarrow U} be a homotopy from {\gamma_0} to {\gamma_1}. Note that for any {s \in [0,1]} and {t \in [a,b]}, {\gamma(s,t)} lies in the open set {U}. From compactness, there must exist a radius {r>0} such that {D(\gamma(s,t),r) \subset U} for all {s \in [0,1]} and {t \in [a,b]}. Next, as {\gamma} is continuous on a compact set, it is uniformly continuous. In particular, there exists {\delta > 0} such that

\displaystyle |\gamma(s',t') - \gamma(s,t)| \leq \frac{r}{4}

whenever {s,s' \in [0,1]} and {t,t' \in [a,b]} are such that {|s-s'| \leq \delta} and {|t-t'| \leq \delta}.

Now partition {[0,1]} and {[a,b]} as {0 = s_0 < \dots < s_n = 1} and {a = t_0 < \dots < t_m = b} in such a way that {|s_i - s_{i-1}| \leq \delta} and {|t_j-t_{j-1}| \leq \delta} for all {1 \leq i \leq n} and {1 \leq j \leq m}. For each such {i} and {j}, let {C_{i,j}} denote the closed polygonal contour

\displaystyle C_{i,j} := \gamma_{\gamma(s_i,t_{j-1}) \rightarrow \gamma(s_i,t_j) \rightarrow \gamma(s_{i-1},t_j) \rightarrow \gamma(s_{i-1},t_{j-1}) \rightarrow \gamma(s_i,t_{j-1})}.

(the reader is encouraged here to draw a picture of the situation; we are using polygonal contours here rather than the homotopy {\gamma} because we did not require any rectifiability properties on the homotopy). By construction, the diameter of this contour is at most {\frac{r}{4}+\frac{r}{4}+\frac{r}{4}+\frac{r}{4} = r}, so the contour is contained entirely in the disk {D( \gamma(s_i,t_j), r)}. This disk is convex and contained in {U}. Applying Corollary 11 or Corollary 13, we conclude that

\displaystyle \int_{C_{i,j}} f(z)\ dz = 0

for all {1 \leq i \leq n} and {1 \leq j \leq m}. If we sum this over all {i} and {j}, noting that the homotopy fixes the endpoints, we conclude after a lot of cancelling that

\displaystyle \int_{\gamma_{\gamma(0,t_0) \rightarrow \gamma(0, t_1) \rightarrow \dots \rightarrow \gamma(0, t_m)}} f(z)\ dz = \int_{\gamma_{\gamma(1,t_0) \rightarrow \gamma(1, t_1) \rightarrow \dots \rightarrow \gamma(1, t_m)}} f(z)\ dz

(again, the reader is encouraged to draw a picture to see this cancellation). However, from a further application of Corollary 13 we have

\displaystyle \int_{\gamma_{\gamma(0,t_{j-1}) \rightarrow \gamma(0,t_j)}} f(z)\ dz = \int_{\gamma_{0,[t_{j-1},t_j]}} f(z)\ dz

for {j=1,\dots,m}, where {\gamma_{0,[t_{j-1},t_j]}: [t_{j-1},t_j] \rightarrow U} is the restriction of {\gamma_0: [a,b] \rightarrow U} to {[t_{j-1},t_j]}, and similarly for {\gamma_1}. Putting all this together we conclude that

\displaystyle \int_{\gamma_0} f(z)\ dz = \int_{\gamma_1} f(z)\ dz

as required. \Box

One nice feature of Cauchy’s theorem is that it allows one to integrate holomorphic functions on curves that are not necessarily rectifiable. Indeed, if {\gamma: [a,b] \rightarrow U} is a curve in {U}, then for a sufficiently fine partition {a = t_0 < t_1 < \dots < t_n = b}, the polygonal (and hence rectifiable) path {\gamma_{\gamma(t_0) \rightarrow \gamma(t_1) \rightarrow \dots \rightarrow \gamma(t_n)}} will be contained in {U}, and furthermore be homotopic to {\gamma} with fixed endpoints. One can then define {\int_\gamma f(z)\ dz} when {f} is holomorphic in {U} and {\gamma} is non-rectifiable by declaring

\displaystyle \int_\gamma f(z)\ dz := \int_{\tilde \gamma} f(z)\ dz

where {\tilde \gamma} is any rectifiable curve that is homotopic (with fixed endpoints) to {\gamma}. This is well defined thanks to the above discussion as well as Cauchy’s theorem; also observe that the exact open set {U} in which the homotopy lives is not relevant, since given any two open sets {U,U'} containing the image of {\gamma} one can find a rectifiable curve {\tilde \gamma} which is homotopic to {\gamma} with fixed endpoints in {U \cap U'}, and hence in {U} and {U'} separately. With this extended notion of the contour integral, one can then remove the hypothesis of rectifiability from many theorems involving integration of holomorphic functions. In particular, Cauchy’s theorem itself now holds for non-rectifiable curves. This reflects some duality in the integration concept {\int_\gamma f(z)\ dz}; if one assumes more regularity on the function {f}, one can get away with worse regularity on the curve {\gamma}, and vice versa.

A special case of Cauchy’s theorem is worth recording explicitly. We say that an open set {U} in the complex plane is simply connected if it is non-empty, connected, and if every closed curve in {U} is contractible in {U} to a point. For instance, from Example 2 we see that any convex non-empty open set is simply connected. From Theorem 4 we then have

Theorem 14 (Cauchy’s theorem, simply connected case) Let {U} be a simply connected open subset of {{\bf C}}, and let {f: U \rightarrow {\bf C}} be holomorphic. Then {\int_\gamma f(z)\ dz = 0} for any closed curve {\gamma} in {U}. In particular (by Exercise 31 of Notes 2), {f} is conservative and has an antiderivative.

— 2. Consequences of Cauchy’s theorem —

Now that we have Cauchy’s theorem, we use it to quickly give a large number of striking consequences. We begin with a special case of the Cauchy integral formula.

Theorem 15 (Cauchy integral formula, special case) Let {U} be an open subset of {{\bf C}}, let {f: U \rightarrow {\bf C}} be holomorphic, and let {z_0} be a point in {U}. Let {r>0} be such that the closed disk {\overline{D(z_0,r)} := \{ z \in {\bf C}: |z-z_0| \leq r \}} is contained in {U}. Let {\gamma} be a closed curve in {U \backslash \{z_0\}} that is homotopic (as a closed curve, and up to reparameterisation) in {U \backslash \{z_0\}} to {\gamma_{z_0,r,\circlearrowleft}}. Then

\displaystyle f(z_0) = \frac{1}{2\pi i} \int_\gamma \frac{f(z)}{z-z_0}\ dz.

Here we are already taking advantage of the ability to integrate holomorphic functions (such as {\frac{f(z)}{z-z_0}}, which is holomorphic on {U \backslash \{z_0\}}) on curves {\gamma} that are not necessarily rectifiable.

Note the remarkable feature here that the value of {f} at a point {z_0} not on {\gamma} is completely determined by the values of {f} on the curve {\gamma}; this is a strong manifestation of the “rigid” or “global” nature of holomorphic functions. Such a formula is certainly not available in the real case (Cauchy’s theorem is technically true on the real line, but there is no analogue of the circular contours {\gamma_{z_0,r,\circlearrowleft}} available in that setting).

Proof: Observe that for any {0 < \varepsilon < r}, the circles {\gamma_{z_0,r,\circlearrowleft}} and {\gamma_{z_0,\varepsilon,\circlearrowleft}} are homotopic (as closed curves) in {\overline{D(z_0,r)}}, and hence in {U}. Since the function {z \mapsto \frac{f(z)}{z-z_0}} is holomorphic on {U \backslash \{z_0\}}, we conclude from Cauchy’s theorem that

\displaystyle \int_\gamma \frac{f(z)}{z-z_0}\ dz = \int_{\gamma_{z_0,\varepsilon,\circlearrowleft}} \frac{f(z)}{z-z_0}\ dz.

As {f} is complex differentiable at {z_0}, there exists a finite {M} such that

\displaystyle |\frac{f(z)-f(z_0)}{z-z_0}| \leq M

for all {z} in {\gamma_{z_0,\varepsilon,\circlearrowleft}}, and all sufficiently small {\varepsilon}. The length of this circle is of course {2\pi \varepsilon}. Applying Exercise 16(v) of Notes 2 we have

\displaystyle | \int_{\gamma_{z_0,\varepsilon,\circlearrowleft}} \frac{f(z)-f(z_0)}{z-z_0}\ dz| \leq 2 \pi \varepsilon M.

On the other hand, from explicit computation (cf. Example 7) we have

\displaystyle \int_{\gamma_{z_0,\varepsilon,\circlearrowleft}} \frac{1}{z-z_0}\ dz = 2\pi i;

putting all this together, we see that

\displaystyle |\frac{1}{2\pi i} \int_\gamma \frac{f(z)}{z-z_0}\ dz - f(z_0)| \leq \varepsilon M.

Sending {\varepsilon} to zero, we obtain the claim. \Box

Note the same argument would give

\displaystyle m f(z_0) = \frac{1}{2\pi i} \int_\gamma \frac{f(z)}{z-z_0}\ dz

if {\gamma} were homotopic to the curve {t \mapsto z_0 + r e^{i m t}}, {0 \leq t \leq 2\pi}, rather than {\gamma_{z_0,r,\circlearrowleft}}, for some integer {m}. In particular, if {\gamma} were homotopic to a point in {U \backslash \{z_0\}}, then the right-hand side would vanish.

Remark 16 For various explicit examples of closed contours {\gamma}, it is also possible to prove the Cauchy integral formula by applying Cauchy’s theorem to various “keyhole contours”. We will not pursue this approach here, but see for instance Chapter 2 of Stein-Shakarchi.
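The Cauchy integral formula can also be checked numerically; the following Python snippet (an illustrative sketch, not part of the notes; the helper `circle_integral` and the choice of {f} and {z_0} are ours) approximates {\frac{1}{2\pi i} \int_{\gamma_{z_0,r,\circlearrowleft}} \frac{f(z)}{z-z_0}\ dz} by a Riemann sum and compares it to {f(z_0)}.

```python
import cmath, math

def circle_integral(g, center, r, n=2000):
    """Riemann-sum approximation of the contour integral of g over the
    circle |z - center| = r, traversed once counterclockwise."""
    total = 0j
    for k in range(n):
        t = 2 * math.pi * k / n
        z = center + r * cmath.exp(1j * t)
        dz = 1j * r * cmath.exp(1j * t) * (2 * math.pi / n)
        total += g(z) * dz
    return total

f = cmath.exp              # an entire function, so holomorphic on all of C
z0 = 0.3 + 0.2j
approx = circle_integral(lambda z: f(z) / (z - z0), z0, 1.0) / (2j * math.pi)
print(abs(approx - f(z0)))  # essentially zero
```

Because the integrand is smooth and periodic in the angle variable, the equispaced Riemann sum here converges extremely quickly (this is the classical spectral accuracy of the trapezoid rule on periodic functions).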

Exercise 17 (Mean value property and Poisson kernel) Let {U} be an open subset of {{\bf C}}, and let {\overline{D(z_0,r)}} be a closed disk contained in {U}.

  • (i) If {f: U \rightarrow {\bf C}} is holomorphic, show that

    \displaystyle f(z_0) = \frac{1}{2\pi} \int_0^{2\pi} f(z_0 + re^{i\theta})\ d\theta. Use this to give an alternate proof of Exercise 26 from Notes 1.

  • (ii) If {u: U \rightarrow {\bf R}} is harmonic, show that

    \displaystyle u(z_0) = \frac{1}{2\pi} \int_0^{2\pi} u(z_0 + re^{i\theta})\ d\theta. Use this to give an alternate proof of Theorem 25 from Notes 1.

  • (iii) If {u: U \rightarrow {\bf R}} is harmonic, show that

    \displaystyle u(z) = \frac{1}{2\pi} \int_0^{2\pi} P( \frac{z - z_0}{re^{i\theta}} ) u(z_0 + re^{i\theta})\ d\theta

    for any {z \in D(z_0,r)}, where the Poisson kernel {P: D(0,1) \rightarrow {\bf R}} is defined by the formula

    \displaystyle P(z) := \mathrm{Re} \frac{1 + z}{1-z}.

    (Hint: it simplifies the calculations somewhat if one reduces to the case {z_0=0}, {r=1}, and {z = s} for some {0 < s < 1}. Then compute the integral {\frac{1}{2\pi i} \int_{\gamma_{0,1,\circlearrowleft}} f(w) \frac{1}{2} (\frac{1+s/w}{1-s/w} + \frac{1+sw}{1-sw})\ dw} in two different ways, where {f=u+iv} is holomorphic with real part {u}.)

The first important consequence of the Cauchy integral formula is the analyticity of holomorphic functions:

Corollary 18 (Holomorphic functions are analytic) Let {U} be an open subset of {{\bf C}}, let {f: U \rightarrow {\bf C}} be holomorphic, and let {z_0} be a point in {U}. Let {r>0} be such that the closed disk {\overline{D(z_0,r)} := \{ z \in {\bf C}: |z-z_0| \leq r \}} is contained in {U}. For each natural number {n}, let {a_n} denote the complex number

\displaystyle a_n := \frac{1}{2\pi i} \int_{\gamma_{z_0,r,\circlearrowleft}} \frac{f(z)}{(z-z_0)^{n+1}}\ dz. \ \ \ \ \ (4)

Then the power series {\sum_{n=0}^\infty a_n (z-z_0)^n} has radius of convergence at least {r}, and converges to {f(z)} inside the disk.

Proof: By continuity, there exists a finite {M} such that {|f(z)| \leq M} for all {z} on the circle {\gamma_{z_0,r,\circlearrowleft}}, which of course has length {2\pi r}. From Exercise 16(v) of Notes 2 we conclude that

\displaystyle |a_n| \leq \frac{1}{2\pi} 2\pi r \frac{M}{r^{n+1}}.

From this and Proposition 7 of Notes 1, we see that the radius of convergence of the power series {\sum_{n=0}^\infty a_n (z-z_0)^n} is indeed at least {r}.

Next, for any {w \in D(z_0,r)}, the circle {\gamma_{z_0,r,\circlearrowleft}} is homotopic (as a closed curve) in {\overline{D(z_0,r)} \backslash \{w\}} (and hence in {U \backslash \{w\}}) to {\gamma_{w,\varepsilon,\circlearrowleft}} for {\varepsilon} small enough that {D(w,\varepsilon)} lies in {D(z_0,r)}. Applying the Cauchy integral formula, we conclude that

\displaystyle f(w) = \frac{1}{2\pi i} \int_{\gamma_{z_0,r,\circlearrowleft}} \frac{f(z)}{z-w}\ dz.

On the other hand, from the geometric series formula (Exercise 12 of Notes 1) one has

\displaystyle \frac{1}{z-w} = \sum_{n=0}^\infty \frac{1}{(z-z_0)^{n+1}} (w-z_0)^n

for all {w \in D(z_0,r)}, and thus

\displaystyle f(w) = \frac{1}{2\pi i} \int_{\gamma_{z_0,r,\circlearrowleft}} (\sum_{n=0}^\infty \frac{f(z)}{(z-z_0)^{n+1}} (w-z_0)^n)\ dz.

If we could interchange the sum and integral, we would conclude from (4) that

\displaystyle f(w) = \sum_{n=0}^\infty a_n (w-z_0)^n

which would give the claim. To justify the interchange, we will use the Weierstrass {M}-test (the dominated convergence theorem would also work here). We have the pointwise bound

\displaystyle |\frac{f(z)}{(z-z_0)^{n+1}} (w-z_0)^n| \leq \frac{M}{r^{n+1}} |w-z_0|^n;

by the geometric series formula and the hypothesis {w \in D(z_0,r)}, the sum {\sum_{n=0}^\infty \frac{M}{r^{n+1}} |w-z_0|^n} is finite, and so the {M}-test applies and we are done. \Box
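One can also compute the coefficients (4) numerically and compare them with known Taylor coefficients. The sketch below (illustrative only; the helper `taylor_coefficient` is not from the notes) does this for {f = \exp} at {z_0 = 0}, where the coefficients should be {1/n!}.

```python
import cmath, math

def taylor_coefficient(f, z0, r, n, m=4000):
    """Approximate the coefficient a_n from formula (4) by a Riemann sum
    over the circle |z - z0| = r, traversed counterclockwise."""
    total = 0j
    for k in range(m):
        t = 2 * math.pi * k / m
        z = z0 + r * cmath.exp(1j * t)
        dz = 1j * r * cmath.exp(1j * t) * (2 * math.pi / m)
        total += f(z) / (z - z0) ** (n + 1) * dz
    return total / (2j * math.pi)

# For f = exp and z0 = 0 the coefficients should be 1/n!.
errs = [abs(taylor_coefficient(cmath.exp, 0j, 1.0, n) - 1 / math.factorial(n)) for n in range(6)]
print(max(errs))  # tiny
```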

Remark 19 A function {f: U \rightarrow {\bf C}} on an open set {U \subset {\bf C}} is said to be complex analytic on {U} if, for every {z_0 \in U}, there is a power series {\sum_{n=0}^\infty a_n(z-z_0)^n} with a positive radius of convergence that converges to {f} on some neighbourhood of {z_0}. Combining the above corollary with Theorem 15 of Notes 1, we see that {f} is holomorphic on {U} if and only if {f} is complex analytic on {U}; thus the terms “complex differentiable”, “holomorphic”, and “complex analytic” may be used interchangeably. This can be contrasted with the real variable case: there is a completely parallel notion of a real analytic function {f: (a,b) \rightarrow {\bf R}} (i.e., a function that, around every point {x_0} in the domain, can be expanded as a convergent power series in some neighbourhood of that point), and real analytic functions are automatically smooth, but the converse is quite false.

Recalling (see Remark 21 of Notes 1) that power series are infinitely differentiable (in both the real and complex senses) inside their disk of convergence, and working locally in various small disks in {U}, we conclude

Corollary 20 Let {U} be an open subset of {{\bf C}}, and let {f: U \rightarrow {\bf C}} be a holomorphic function. Then {f': U \rightarrow {\bf C}} is also holomorphic, and {f} is smooth (i.e. infinitely differentiable in the real sense).

In view of this corollary, we may now drop hypotheses of continuous first or second differentiability from several of the theorems in Notes 1, such as Exercise 26 from that set of notes.

Combining Corollary 20 with Proposition 28 of Notes 1 (with {{\bf C}} replaced by various rectangles in {U}), we obtain a form of elliptic regularity:

Corollary 21 (Elliptic regularity) Let {U} be an open subset of {{\bf C}}, and let {u: U \rightarrow {\bf R}} be a harmonic function. Then {u} is smooth.

In fact one can even omit the hypothesis of continuous twice differentiability in the definition of harmonicity if one works with the notion of weak harmonicity, but this is a topic for a PDE or distribution theory course and will not be pursued further here.

Another immediate consequence of Corollary 18 is a version of the factor theorem:

Corollary 22 (Factor theorem for analytic functions) Let {U} be an open subset of {{\bf C}}, and let {z_0} be a point in {{\bf C}}. Let {f: U \rightarrow {\bf C}} be a complex analytic function that vanishes at {z_0}. Then there exists a unique complex analytic function {g: U \rightarrow {\bf C}} such that {f(z) = g(z) (z-z_0)} for all {z \in U}.

Proof:  For {z \neq z_0}, we can simply define {g(z) := f(z)/(z-z_0)}, and this is clearly the unique choice here.  Uniqueness at {z_0} follows from continuity. For {z} equal to or near {z_0}, we can expand {f} as a Taylor series {f(z) = \sum_{n=1}^\infty a_n (z-z_0)^n} (noting that the constant term vanishes since {f(z_0)=0}) and then set {g(z) := \sum_{n=0}^\infty a_{n+1} (z-z_0)^n}. One can check that these two definitions of {g} agree on their common domain, and that {g} is complex differentiable (and hence analytic) on {U}. \Box
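One can see the two definitions of {g} matching up numerically in a simple example. The snippet below (a sketch, not part of the proof) takes {f(z) = \sin(z)} and {z_0 = 0}, for which the Taylor-series definition gives {g(0) = 1}, and checks that {\sin(z)/z} indeed approaches {1} as {z \rightarrow 0}.

```python
import cmath

# f(z) = sin(z) vanishes at z0 = 0; the quotient g(z) = sin(z)/z
# (defined at 0 by the Taylor series, so g(0) = 1) extends continuously there.
for eps in (1e-1, 1e-3, 1e-5):
    z = eps * cmath.exp(0.7j)              # approach 0 along a fixed ray
    print(eps, abs(cmath.sin(z) / z - 1))  # tends to 0, roughly like |z|^2 / 6
```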

Yet another consequence is the important property of analytic continuation:

Corollary 23 (Analytic continuation) Let {U} be a connected non-empty open subset of {{\bf C}}, and let {f: U \rightarrow {\bf C}}, {g: U \rightarrow {\bf C}} be complex analytic functions. If {f} and {g} agree on some non-empty open subset of {U}, then they in fact agree on all of {U}.

Proof: Let {V} denote the set of all points {z_0} in {U} where {f} and {g} agree to all orders, that is to say that

\displaystyle f^{(n)}(z_0) = g^{(n)}(z_0)

for all {n=0,1,\dots}. By hypothesis, {V} is non-empty; by the continuity of the {f^{(n)}}, {V} is closed; and from analyticity and Taylor expansion (Exercise 17 of Notes 1), {V} is open. As {U} is connected, {V} must therefore be all of {U}, and the claim follows. \Box

There is also a variant of the above corollary:

Corollary 24 (Non-trivial analytic functions have isolated zeroes) Let {U} be a connected non-empty open subset of {{\bf C}}, and let {f: U \rightarrow {\bf C}} be a complex analytic function which vanishes at some point {z_0 \in U} but is not identically zero. Then there exists a disk {D(z_0,r)} in {U} on which {f} does not vanish except at {z_0}; in other words, all the zeroes of {f} are isolated points.

Proof: If all the derivatives {f^{(n)}(z_0)} of {f} at {z_0} vanish, then by Taylor expansion {f} vanishes in some open neighbourhood of {z_0}, and then by Corollary 23 {f} vanishes everywhere, a contradiction. Thus at least one of the {f^{(n)}(z_0)} is non-zero. If {n_0} is the first natural number for which {f^{(n_0)}(z_0) \neq 0}, then by iterating the factor theorem (Corollary 22) we see that {f(z) = (z-z_0)^{n_0} g(z)} for some analytic function {g:U \rightarrow {\bf C}} which is non-vanishing at {z_0}. By continuity, {g} is also non-vanishing in some disk {D(z_0,r)} in {U}, and the claim follows. \Box

One particular consequence of the above corollary is that if two entire functions {f,g} agree on the real line (or even on an infinite bounded subset of the complex plane), then they must agree everywhere, since otherwise {f-g} would have a non-isolated zero, contradicting Corollary 24. This strengthens Corollary 23, and helps explain why real-variable identities such as {\sin^2(x)+\cos^2(x)=1} automatically extend to their complex counterparts {\sin^2(z) + \cos^2(z) = 1}. Another consequence is that if an entire function {f: {\bf C} \rightarrow {\bf C}} is real-valued on the real axis, then one has the identity

\displaystyle f(z) = \overline{f(\overline{z})}

for all complex {z}, because this identity already holds on the real line, and both sides are complex analytic. Thus for instance

\displaystyle \sin(z) = \overline{\sin(\overline{z})}.
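This reflection identity is easy to confirm numerically at random sample points; the following quick check (illustrative only, not part of the argument) does so for {\sin}.

```python
import cmath, random

# Check sin(z) = conj(sin(conj(z))) at random points of the plane.
random.seed(0)
for _ in range(100):
    z = complex(random.uniform(-5, 5), random.uniform(-5, 5))
    assert abs(cmath.sin(z) - cmath.sin(z.conjugate()).conjugate()) < 1e-9
print("reflection identity holds at 100 random sample points")
```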

Next, if we combine Corollary 18 with Exercise 17 of Notes 1, as well as Cauchy’s theorem, we obtain

Theorem 25 (Higher order Cauchy integral formula, special case) Let {U} be an open subset of {{\bf C}}, let {f: U \rightarrow {\bf C}} be holomorphic, and let {z_0} be a point in {U}. Let {r>0} be such that the closed disk {\overline{D(z_0,r)} := \{ z \in {\bf C}: |z-z_0| \leq r \}} is contained in {U}. Let {\gamma} be a closed curve in {U \backslash \{z_0\}} that is homotopic (as a closed curve, up to reparameterisation) in {U \backslash \{z_0\}} to {\gamma_{z_0,r,\circlearrowleft}}. Then for any natural number {n}, the {n^{\mathrm{th}}} derivative {f^{(n)}(z_0)} of {f} at {z_0} is given by the formula

\displaystyle f^{(n)}(z_0) = \frac{n!}{2\pi i} \int_\gamma \frac{f(z)}{(z-z_0)^{n+1}}\ dz.

Exercise 26 Give an alternate proof of Theorem 25 by rigorously differentiating the Cauchy integral formula with respect to the {z_0} parameter.

Combining Theorem 25 with Exercise 16(v) of Notes 2, we obtain a more quantitative form of Corollary 20, which asserts not only that the higher derivatives of a holomorphic function exist, but also places a bound on them:

Corollary 27 (Cauchy inequalities) Let {U} be an open subset of {{\bf C}}, let {f: U \rightarrow {\bf C}} be holomorphic, and let {z_0} be a point in {U}. Let {r>0} be such that the closed disk {\overline{D(z_0,r)} := \{ z \in {\bf C}: |z-z_0| \leq r \}} is contained in {U}. Suppose that there is an {M} such that {|f(z)| \leq M} on the circle {\{ z \in {\bf C}: |z-z_0| = r\}}. Then for any natural number {n}, we have

\displaystyle |f^{(n)}(z_0)| \leq \frac{n!}{r^n} M. \ \ \ \ \ (5)

Note that the {n=0} case of this corollary is compatible with the maximum principle (Exercise 26 of Notes 1).
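For concreteness, here is a small numerical illustration of the inequalities (5) (a sketch, not part of the text): for {f = \exp} and {z_0 = 0} one has {f^{(n)}(0) = 1} for every {n}, while the right-hand side of (5) with {M = \max_{|z|=r} |e^z| = e^r} gives the bound {n! e^r / r^n}.

```python
import math

# For f = exp and z0 = 0: |f^{(n)}(0)| = 1, while (5) gives the bound n! e^r / r^n.
r = 3.0
M = math.exp(r)  # max of |e^z| on the circle |z| = r is attained at z = r
bounds = [math.factorial(n) * M / r ** n for n in range(8)]
assert all(b >= 1.0 for b in bounds)  # the Cauchy inequalities hold
print(min(bounds))  # about 4.46: the bound is not tight here, but always valid
```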

The right-hand side of (5) has a denominator {r^n} that improves when {r} gets large. In particular we have the remarkable theorem of Liouville:

Theorem 28 (Liouville’s theorem) Let {f: {\bf C} \rightarrow {\bf C}} be an entire function that is bounded. Then {f} is constant.

Proof: By hypothesis, there is a finite {M} such that {|f(z)| \leq M} for all {z \in {\bf C}}. Applying the Cauchy inequalities with {n=1} and any disk {D(z_0,r)}, we conclude that

\displaystyle |f'(z_0)| \leq \frac{M}{r}

for any {z_0 \in {\bf C}} and {r>0}. Sending {r} to infinity, we conclude that {f'} vanishes identically. The claim then follows from the fundamental theorem of calculus. \Box

This theorem displays a strong “rigidity” property for entire functions; if such a function is even vaguely close to being constant (by being bounded), then it almost magically “snaps into place” and actually is forced to be a constant! This is in stark contrast to the real case, in which there are functions such as {\sin(x)} that are differentiable (and even smooth and analytic) on the real line and bounded, but definitely not constant. Note that the complex analogue {\sin(z)} of the sine function is not a counterexample to Liouville’s theorem, since {\sin(z)} becomes quite unbounded away from the real axis (Exercise 16 of Notes 0). This also fits well with the intuition of harmonic functions (and hence also holomorphic functions) being “balanced” in that any convexity in one direction has to be balanced by concavity in the orthogonal direction, and vice versa (as discussed before Theorem 25 of Notes 1): any attempt to create an entire function that is bounded and oscillating in one direction will naturally force that function to become unbounded in the orthogonal direction.

Exercise 29 Let {f: {\bf C} \rightarrow {\bf C}} be an entire function which is of polynomial growth in the sense that there exists a finite quantity {M>0} and some exponent {A \geq 0} such that {|f(z)| \leq M (1+|z|)^A} for all {z \in {\bf C}}. Show that {f} is, in fact, a polynomial.

Now we can prove the fundamental theorem of algebra discussed back in Notes 0.

Theorem 30 (Fundamental theorem of algebra) Let

\displaystyle P(z) = a_n z^n + \dots + a_0

be a polynomial of degree {n \geq 0} for some {a_0,\dots,a_n \in {\bf C}} with {a_n} non-zero. Then there exist complex numbers {z_1,\dots,z_n} such that

\displaystyle P(z) = a_n (z-z_1) \dots (z-z_n).

Proof: This is trivial for {n=0,1}, so suppose inductively that {n \geq 2} and the claim has already been proven for {n-1}. Suppose first that the equation {P(z)=0} has no roots in the complex plane; then the function {1/P(z)} is entire. Also, this function goes to zero as {|z| \rightarrow \infty}, and so is bounded on the exterior of any sufficiently large disk; as it is also continuous, it is bounded on any closed disk and is thus bounded everywhere. By Liouville’s theorem, {1/P(z)} is constant, which implies that {P(z)} is constant, which is absurd (for instance, the {n^{\mathrm{th}}} derivative of {P} is the non-zero constant {n! a_n}). Hence {P(z)} has at least one root {z_n}. By the factor theorem (which works in any field, including the complex numbers) we can then write {P(z) = Q(z) (z-z_n)} for some polynomial {Q(z)}, which by the long division algorithm (or by comparing coefficients) must take the form

\displaystyle Q(z) = a_n z^{n-1} + b_{n-2} z^{n-2} + \dots + b_0

for some complex numbers {b_0,\dots,b_{n-2}}. The claim then follows from the induction hypothesis. \Box
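The theorem can be witnessed numerically by any all-roots polynomial solver. The sketch below uses a minimal Durand–Kerner iteration (an illustrative root-finder, not discussed in the notes) to locate all three roots of {z^3 - 1} simultaneously, confirming that the cubic factors completely over {{\bf C}}.

```python
import cmath

def durand_kerner(coeffs, iters=100):
    """All complex roots of the monic polynomial
    z^n + coeffs[0] z^{n-1} + ... + coeffs[-1],
    found by the simultaneous (Durand-Kerner) iteration."""
    n = len(coeffs)

    def p(z):
        val = 1 + 0j           # Horner evaluation of the monic polynomial
        for c in coeffs:
            val = val * z + c
        return val

    roots = [(0.4 + 0.9j) ** (k + 1) for k in range(n)]  # standard starting guesses
    for _ in range(iters):
        new_roots = []
        for i, z in enumerate(roots):
            denom = 1 + 0j
            for j, w in enumerate(roots):
                if j != i:
                    denom *= z - w
            new_roots.append(z - p(z) / denom)
        roots = new_roots
    return roots

roots = durand_kerner([0.0, 0.0, -1.0])     # the polynomial z^3 - 1
print(max(abs(z ** 3 - 1) for z in roots))  # tiny: all three cube roots of unity found
```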

The following exercises show that {{\bf C}} can be alternatively defined as an algebraic closure of the reals {{\bf R}} (together with a designated square root {i} of {-1}), and that extending {{\bf R}} using a different irreducible polynomial than {x^2+1} would still give a field isomorphic to the complex numbers, thus supporting the notion that the complex numbers are not an arbitrary extension of the reals, but rather a quite natural and canonical one.

Exercise 31 Let {k} be a field containing {{\bf R}} which is a finite extension of {{\bf R}}, in the sense that {k} is a finite-dimensional vector space over {{\bf R}}. Show that {k} is isomorphic (as a field) to either {{\bf R}} or {{\bf C}}. (Hint: if {\alpha} is some element of {k} not in {{\bf R}}, show that {P(\alpha)=0} for some irreducible polynomial {P} with real coefficients but no real roots. Use this to set up an isomorphism between the field {\tilde k} generated by {{\bf R}} and {\alpha} with {{\bf C}}. If there is an element {\beta} of {k} not in this field {\tilde k}, show that {Q(\beta)=0} for some irreducible polynomial {Q} with coefficients in {\tilde k} and no roots in {\tilde k}, and contradict the fundamental theorem of algebra.)

Exercise 32 A field {k} is said to be algebraically closed if the conclusion of Theorem 30 holds with {{\bf C}} replaced by {k}. Show that any algebraically closed field {k} containing {{\bf R}} contains a subfield that is isomorphic to {{\bf C}} (and which contains {{\bf R}} as a subfield, isomorphic to the copy of {{\bf R}} inside {{\bf C}}). Thus, up to isomorphism, {{\bf C}} is the unique algebraic closure of {{\bf R}}, that is to say a minimal algebraically closed field containing {{\bf R}}.

Another nice consequence of the Cauchy integral formula is a converse to Cauchy’s theorem known as Morera’s theorem.

Theorem 33 (Morera’s theorem) Let {U} be an open subset of {{\bf C}}, and let {f: U \rightarrow {\bf C}} be a continuous function. Suppose that {f} is conservative in the sense that {\int_\gamma f(z)\ dz = 0} for any closed polygonal path {\gamma} in {U}. Then {f} is holomorphic on {U}.

Proof: By working locally with small balls in {U} we may assume that {U} is a ball (and in particular connected). By Exercise 31 of Notes 2, {f} has an antiderivative {F: U \rightarrow {\bf C}}. By definition, {F} is complex differentiable at every point of {U} (with derivative {f}), so by Corollary 20, {F} is smooth, which implies in particular that {f = F'} is holomorphic on {U} as claimed. \Box

The power of Morera’s theorem comes from the fact that there are no differentiability requirements in the hypotheses on {f}, and yet the conclusion is that {f} is differentiable (and hence smooth, by Corollary 20); it can be viewed as another manifestation of “elliptic regularity”. Here is one basic application of Morera’s theorem:

Theorem 34 (Uniform limit of holomorphic functions is holomorphic) Let {U} be an open subset of {{\bf C}}, and let {f_n: U \rightarrow {\bf C}} be a sequence of holomorphic functions that converge uniformly on compact sets to a limit {f: U \rightarrow {\bf C}}. Then {f} is also holomorphic. Furthermore, for each natural number {k}, the derivatives {f_n^{(k)}: U \rightarrow {\bf C}} also converge uniformly on compact sets to {f^{(k)}: U \rightarrow {\bf C}}. (In particular, {f_n^{(k)}} converges pointwise to {f^{(k)}} on {U}.)

Proof: Again we may work locally and assume that {U} is a ball (and in particular is convex and simply connected). The {f_n} are continuous, hence their locally uniform limit {f} is also continuous. From Corollary 11 (or Theorem 14), we have {\int_\gamma f_n(z)\ dz = 0} on any closed polygonal path in {U}, hence on taking locally uniform limits we also have {\int_\gamma f(z)\ dz = 0} for such paths. The holomorphicity of {f} then follows from Morera’s theorem. The uniform convergence of {f_n^{(k)}} to {f^{(k)}} on compact sets {K} follows from applying Theorem 25 to circular contours {\gamma_{z_0,\varepsilon,\circlearrowleft}} for {z_0 \in K} and {\varepsilon>0} small enough that these contours lie in {U} (note from compactness that one can take {\varepsilon} independent of {z_0}). \Box
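A concrete instance of this theorem (a sketch, with the example chosen by us): the Taylor partial sums {f_n} of {\exp} converge to {\exp} uniformly on compact sets, and, since the derivative of the degree-{n} partial sum is the degree-{(n-1)} partial sum, their derivatives converge to {\exp} as well.

```python
import cmath

def partial_exp(z, n):
    """The degree-n Taylor partial sum of exp at the origin."""
    term, total = 1 + 0j, 1 + 0j
    for k in range(1, n + 1):
        term *= z / k
        total += term
    return total

# Sample a point in the compact disk |z| <= 2; both f_n and its derivative
# (the partial sum of one lower degree) are already very close to exp there.
z = 1.5 + 0.5j
err_f = abs(partial_exp(z, 30) - cmath.exp(z))
err_fprime = abs(partial_exp(z, 29) - cmath.exp(z))
print(err_f, err_fprime)  # both tiny
```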

Actually, one can weaken the uniform nature of the convergence in Theorem 34 substantially; even the weak limit of holomorphic functions in the space of locally integrable functions on {U} will remain holomorphic. However, we will not need these weaker versions of this theorem here.

Exercise 35 (Riemann’s theorem on removable singularities) Let {U} be an open subset of {{\bf C}}, let {z_0} be a point in {U}, and let {f: U \backslash \{z_0\} \rightarrow {\bf C}} be a holomorphic function on {U \backslash \{z_0\}} which is bounded near {z_0}, in the sense that it is bounded on some punctured disk {D(z_0,r) \backslash \{z_0\}} contained in {U}. Show that {f} has a removable singularity at {z_0}, in the sense that {f} is the restriction to {U \backslash \{z_0\}} of a holomorphic function {\tilde f: U \rightarrow {\bf C}} on {U}. (Hint: show that {f} is conservative near {z_0}, find an antiderivative, extend it to {U}, and use Morera’s theorem to show that this extension is holomorphic. Alternatively, one can also proceed by some version of the Cauchy integral formula.)

Exercise 36 (Integrals of holomorphic functions) Let {U} be an open subset of {{\bf C}}, and let {f: [0,1] \times U \rightarrow {\bf C}} be a continuous function such that, for each {t \in [0,1]}, the function {z \mapsto f(t,z)} is holomorphic on {U}. Show that the function {z \mapsto \int_0^1 f(t,z)\ dt} is also holomorphic on {U}. (Hint: work locally and use Cauchy’s theorem, Morera’s theorem, and Fubini’s theorem.)

Exercise 37 (Schwarz reflection principle) Let {U} be an open subset of {{\bf C}} that is symmetric around the real axis, that is to say {\overline{z} \in U} whenever {z \in U}. Let {f_+: \overline{U_+} \rightarrow {\bf C}} be a continuous function on the set {\overline{U_+} := \{ z \in U: \mathrm{Im}(z) \geq 0\}} that is holomorphic in the open subset {U_+ := \{ z \in U: \mathrm{Im}(z) > 0 \}}. Similarly, let {f_-: \overline{U_-} \rightarrow {\bf C}} be continuous on {\overline{U_-} := \{ z \in U: \mathrm{Im}(z) \leq 0\}} that is holomorphic in the open subset {U_- := \{ z \in U: \mathrm{Im}(z) < 0 \}}. Suppose further that {f_+} and {f_-} agree on {U \cap {\bf R}}. Show that {f_+} and {f_-} are both restrictions of a single holomorphic function {f: U \rightarrow {\bf C}}.

The following two Venn diagrams (or more precisely, Euler diagrams) summarise the relationships between different types of regularity amongst continuous functions over both the reals and the complexes. The first diagram


describes the class of continuous functions on some interval {(a,b)} in the real line; such functions are automatically conservative, but not necessarily differentiable, while differentiable functions are not necessarily smooth, and smooth functions are not necessarily analytic. On the other hand, when considering the class of continuous functions on an open subset {U} of {{\bf C}}, the picture is different:


Now, very few continuous functions are conservative, and only slightly more functions are complex differentiable (and for simply connected domains {U}, these two classes in fact coincide). Whereas in the real case, differentiable functions were considerably less regular than analytic functions, in the complex case the two classes in fact coincide.

— 3. Winding number —

One defect of the current formulation of the Cauchy integral formula (see Theorem 15 and the ensuing discussion) is that the curve {\gamma} involved has to be homotopic (as a closed curve, up to reparameterisation) to a circle {\gamma_{z_0,r,\circlearrowleft}}, or at least to a curve of the form {t \mapsto z_0 + re^{imt}}, {t \in [0,2\pi]} for some integer {m}. We now investigate what happens when this hypothesis is removed. A key notion is that of a winding number.

Definition 38 (Winding number) Let {\gamma} be a closed curve, and let {z_0} be a complex number that is not in the image of {\gamma}. The winding number {W_\gamma(z_0)} of {\gamma} around {z_0} is defined by the integral

\displaystyle W_\gamma(z_0) := \frac{1}{2\pi i} \int_\gamma \frac{dz}{z-z_0}. \ \ \ \ \ (6)

Here we again take advantage of the ability to integrate holomorphic functions on curves that are not necessarily rectifiable. Clearly the winding number is unchanged if we replace {\gamma} by any equivalent curve, and if one replaces the curve {\gamma} with its reversal {-\gamma}, then the winding number is similarly negated. In some texts, the winding number is also referred to as the index or degree.

From the Cauchy integral formula we see that

\displaystyle W_{\gamma}(z_0) = 1

when {\gamma} is homotopic in {{\bf C} \backslash \{z_0\}} (as a closed curve, up to reparameterisation) to a circle {\gamma_{z_0,r,\circlearrowleft}}, and more generally that

\displaystyle W_{\gamma}(z_0) = m

if {\gamma} is homotopic in {{\bf C} \backslash \{z_0\}} (as a closed curve, up to reparameterisation) to a curve of the form {t \mapsto z_0 + r e^{imt}}, {t \in [0,2\pi]}. Thus we see, intuitively at least, that {W_\gamma(z_0)} measures the number of times {\gamma} winds counterclockwise about {z_0}, which explains the term “winding number”.
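This intuition is easy to confirm by direct numerical integration of (6); the sketch below (illustrative only, with the helper `winding_number` not from the notes) evaluates the winding number of the curve {t \mapsto e^{imt}} around the origin for several integers {m}.

```python
import cmath, math

def winding_number(gamma, dgamma, z0, n=4000):
    """Riemann-sum approximation of (6) for a smooth closed curve
    gamma: [0, 2*pi] -> C with derivative dgamma, about the point z0."""
    total = 0j
    for k in range(n):
        t = 2 * math.pi * k / n
        total += dgamma(t) / (gamma(t) - z0) * (2 * math.pi / n)
    return total / (2j * math.pi)

# The curve t -> e^{imt} winds m times counterclockwise around the origin.
for m in (-2, -1, 1, 3):
    W = winding_number(lambda t: cmath.exp(1j * m * t),
                       lambda t: 1j * m * cmath.exp(1j * m * t), 0j)
    print(m, round(W.real, 6))  # each computed winding number matches m
```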

We can now state a more general form of the Cauchy integral formula:

Theorem 39 (General Cauchy integral formula) Let {U} be a simply connected subset of {{\bf C}}, let {\gamma} be a closed curve in {U}, and let {f: U \rightarrow {\bf C}} be holomorphic. Then for any {z_0} that lies in {U} but not in the image of {\gamma}, we have

\displaystyle \frac{1}{2\pi i} \int_\gamma \frac{f(z)}{z-z_0}\ dz = W_\gamma(z_0) f(z_0).

Proof: By Corollary 22 (or Exercise 35), we have {f(z) - f(z_0) = (z-z_0) g(z)} for some holomorphic function {g: U \rightarrow {\bf C}}. Hence by Theorem 14 we have

\displaystyle \int_\gamma \frac{f(z)-f(z_0)}{z-z_0}\ dz = \int_\gamma g(z)\ dz = 0.

The claim then follows from (6). \Box

Exercise 40 (Higher order general Cauchy integral formula) With {U, \gamma, f, z_0} as in the above theorem, show that

\displaystyle W_\gamma(z_0) f^{(n)}(z_0) = \frac{n!}{2\pi i} \int_\gamma \frac{f(z)}{(z-z_0)^{n+1}}\ dz

for every natural number {n}. (Hint: instead of approximating {f(z)} by {f(z_0)}, use a partial Taylor expansion of {f}. Many of the terms that arise can be handled using the fundamental theorem of calculus. Alternatively, one can use differentiation under the integral sign and Lemma 44 below.)

To use Theorem 39, it becomes of interest to obtain more properties of the winding number. From Cauchy’s theorem we have

Lemma 41 (Homotopy invariance) Let {z_0 \in {\bf C}}, and let {\gamma_0, \gamma_1} be two closed curves in {{\bf C} \backslash \{z_0\}} that are homotopic as closed curves up to reparameterisation in {{\bf C} \backslash \{z_0\}}. Then {W_{\gamma_0}(z_0) = W_{\gamma_1}(z_0)}.

The following specific corollary of this lemma will be useful for us.

Corollary 42 (Rouché’s theorem for winding number) Let {\gamma_0: [a,b] \rightarrow {\bf C}} be a closed curve, and let {z_0} lie outside of the image of {\gamma_0}. Let {\gamma_1: [a,b] \rightarrow {\bf C}} be a closed curve such that

\displaystyle |\gamma_1(t) -\gamma_0(t)| < |\gamma_0(t) - z_0| \ \ \ \ \ (7)

for all {t \in [a,b]}. Then {W_{\gamma_0}(z_0) = W_{\gamma_1}(z_0)}.

Proof: The map {\gamma: [0,1] \times [a,b] \rightarrow {\bf C}} defined by {\gamma(s,t) := (1-s) \gamma_0(t) + s \gamma_1(t)} is a homotopy from {\gamma_0} to {\gamma_1}; by (7) and the triangle inequality, it avoids {z_0}. The claim then follows from Lemma 41. \Box

Corollary 42 can be used to compute the winding number near infinity as follows. Given a curve {\gamma: [a,b] \rightarrow {\bf C}} and a point {z_0}, define the distance

\displaystyle \mathrm{dist}(z_0,\gamma) := \inf_{t \in [a,b]} |z_0-\gamma(t)|

and the diameter

\displaystyle \mathrm{diam}(\gamma) := \sup_{t,t' \in [a,b]} |\gamma(t) - \gamma(t')|.

Corollary 43 (Vanishing near infinity) Let {\gamma} be a closed curve. Then {W_\gamma(z_0) = 0} whenever {z_0 \in {\bf C}} is such that {\mathrm{dist}(z_0,\gamma) > \mathrm{diam}(\gamma)}.

Proof: Apply Corollary 42 with {\gamma_0} equal to {\gamma} and {\gamma_1} equal to the constant curve at some point in the image of {\gamma}; the hypothesis (7) then follows from the definitions of distance and diameter, and the winding number of a constant curve clearly vanishes. \Box
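Numerically, this vanishing is also easy to see (a sketch, not part of the proof): the unit circle has diameter {2}, so a point such as {z_0 = 5}, at distance {4 > 2} from the circle, should have winding number zero.

```python
import cmath, math

# Winding number of the unit circle about the distant point z0 = 5,
# computed from definition (6) by a Riemann sum.
z0, n = 5.0 + 0j, 4000
total = 0j
for k in range(n):
    t = 2 * math.pi * k / n
    z = cmath.exp(1j * t)                              # gamma(t)
    total += 1j * cmath.exp(1j * t) / (z - z0) * (2 * math.pi / n)
W = total / (2j * math.pi)
print(abs(W))  # essentially zero
```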

Corollary 42 also gives local constancy of the winding number:

Lemma 44 (Local constancy in {z_0}) Let {\gamma} be a closed curve. Then {W_\gamma} is locally constant. That is to say, if {z_0} does not lie in the image of {\gamma}, then there exists a disk {D(z_0,r)} outside of the image of {\gamma} such that {W_\gamma(z) = W_\gamma(z_0)} for all {z \in D(z_0,r)}.

Proof: From Corollary 42, we see that if {r} is small enough and {h \in D(0,r)}, then

\displaystyle W_{\gamma-h}(z_0) = W_\gamma(z_0),

where {\gamma-h: t \mapsto \gamma(t)-h} is the translation of {\gamma} by {h}. But by a translation change of variables we see that

\displaystyle W_{\gamma-h}(z_0) = W_\gamma(z_0+h)

and the claim follows. \Box

Exercise 45 Give an alternate proof of Lemma 44 based on differentiation under the integral sign and using the fact that {\frac{1}{(z-z_0)^2}} has an antiderivative away from {z_0}.

As confirmation of the interpretation of {W_\gamma(z_0)} as a winding number, we can now establish integrality:

Lemma 46 (Integrality) Let {\gamma} be a closed curve, and let {z_0} lie outside of the image of {\gamma}. Then {W_\gamma(z_0)} is an integer.

Proof: By Corollary 42 we may assume without loss of generality that {\gamma} is a closed polygonal path. By partitioning a polygon into triangles (and using Lemma 44 to move {z_0} slightly out of the way of any new edges formed by this partition) it suffices to verify this for triangular {\gamma}. But this follows from the Cauchy integral formula (if {z_0} is in the interior of the triangle) or Cauchy’s theorem (if {z_0} is in the exterior). \Box

Exercise 47 Give another proof of Lemma 46 by restricting again to closed polygonal paths {\gamma: [a,b] \rightarrow {\bf C}}, and showing that the function {t \mapsto \exp( \int_a^t \frac{\gamma'(s)}{\gamma(s)-z_0}\ ds ) / (\gamma(t) - z_0)} is constant on {[a,b]} by establishing that it is continuous and has vanishing derivative at all but finitely many points. (Note that {\gamma'(s)} exists for all but finitely many {s}, so the integral here can be well defined.)

We now come to a fundamental and well known theorem about simple closed curves, namely the Jordan curve theorem.

Theorem 48 (Jordan curve theorem) Let {\gamma: [a,b] \rightarrow {\bf C}} be a non-trivial simple closed curve. Then there is an orientation {\sigma \in \{-1,+1\}} such that the complex plane {{\bf C}} is partitioned into the boundary region {\gamma([a,b])}, the exterior region

\displaystyle \{ z_0 \not \in \gamma([a,b]): W_\gamma(z_0) = 0 \}, \ \ \ \ \ (8)

and the interior region

\displaystyle \{ z_0 \not \in \gamma([a,b]): W_\gamma(z_0) = \sigma \}. \ \ \ \ \ (9)

Furthermore the exterior region is connected and unbounded, and the interior region is connected, non-empty and bounded. Finally, if {U} is any open set that contains {\gamma} and its interior, then {\gamma} is contractible to a point in {U}.

This theorem is relatively easy to prove for “nice” curves, such as polygons, but is surprisingly delicate to prove in general. Some idea of the subtlety involved can be seen by considering pathological examples such as the lakes of Wada, which are three disjoint open connected subsets of {{\bf C}} which all happen to have exactly the same boundary! This does not contradict the Jordan curve theorem, because the boundary set in this example is not given by a simple closed curve. However it does indicate that one has to carefully use the hypothesis of being a simple closed curve in order to prove Theorem 48. Another indication of the difficulty of the theorem is its global nature; the claim does not hold if one replaces the complex plane {{\bf C}} by other surfaces such as the torus, the projective plane, or the Klein bottle, so the global topological structure of the complex plane must come into play at some point. For the sake of completeness, we give a proof of this theorem in an appendix to these notes.

If the quantity {\sigma} in the above theorem is equal to {+1}, we say that the simple closed curve {\gamma} has an anticlockwise orientation; if instead {\sigma=-1} we say that {\gamma} has a clockwise orientation. Thus for instance, {\gamma_{z_0,r,\circlearrowleft}} has an anticlockwise orientation, while its reversal {-\gamma_{z_0,r,\circlearrowleft}} has the clockwise orientation.
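The orientation convention can be tested numerically as well; in the illustrative sketch below (not part of the notes), sampling {\gamma_{0,1,\circlearrowleft}} gives winding number {+1} about the centre, and reversing the traversal gives {-1}.

```python
import cmath

def winding_number(points, z0):
    # discretised winding number: sum of argument increments of gamma(t) - z0
    total = 0.0
    n = len(points)
    for j in range(n):
        total += cmath.phase((points[(j + 1) % n] - z0) / (points[j] - z0))
    return total / (2 * cmath.pi)

# the unit circle traversed anticlockwise
circle = [cmath.exp(2j * cmath.pi * k / 200) for k in range(200)]
anticlockwise = winding_number(circle, 0j)    # +1: anticlockwise orientation
clockwise = winding_number(circle[::-1], 0j)  # -1: reversal flips orientation
```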

Exercise 49 Let {\gamma_1}, {\gamma_2} be non-trivial simple closed curves.

  • (i) If {\gamma_1,\gamma_2} have disjoint image, show that {\gamma_2} either lies entirely in the interior of {\gamma_1}, or in the exterior.
  • (ii) If {\gamma_2} avoids the exterior of {\gamma_1}, show that the interior of {\gamma_2} is contained in the interior of {\gamma_1}, and the exterior of {\gamma_2} contains the exterior of {\gamma_1}.
  • (iii) If {\gamma_2} avoids the interior of {\gamma_1}, and {\gamma_1} avoids the interior of {\gamma_2}, and the two curves have disjoint images, show that the interior of {\gamma_2} is contained in the exterior of {\gamma_1}, and the exterior of {\gamma_2} contains the interior of {\gamma_1}.

(This is all visually “obvious” as soon as one draws a picture, but the challenge is to provide a rigorous proof. One should of course use the Jordan curve theorem extensively to do so. You will not need to use the final part of the Jordan curve theorem concerning contractibility.)

Exercise 50 Let {\gamma} be a non-trivial simple closed curve. Show that the interior of {\gamma} is simply connected. (Hint: first show that any simple closed polygonal path in the interior of {\gamma} is contractible to a point in the interior; then extend this to closed polygonal paths that are not necessarily simple by an induction on the number of edges in the path; then handle general closed curves.)

Remark 51 There is a refinement of the Jordan curve theorem known as the Jordan-Schoenflies theorem, which asserts that for any non-trivial simple closed curve {\gamma} there is a homeomorphism {\phi: {\bf C} \rightarrow {\bf C}} that maps {\gamma} to the unit circle {S^1}, the interior of {\gamma} to the unit disk {D(0,1)}, and the exterior to the exterior region {\{ z \in {\bf C}: |z| > 1 \}}. The proof of this improved version of the Jordan curve theorem will have to wait until we have the Riemann mapping theorem (as well as a refinement of this theorem due to Carathéodory). The Jordan-Schoenflies theorem may seem self-evident, but it is worth pointing out that the analogous result in three dimensions fails without additional regularity assumptions on the boundary surface, thanks to the counterexample of the Alexander horned sphere.

From the Jordan curve theorem we have yet another form of the Cauchy theorem and Cauchy integral formula:

Theorem 52 (Cauchy’s theorem and Cauchy integral formula for simple curves) Let {\gamma} be a simple closed curve, and let {U} be an open set containing {\gamma} and its interior. Let {f: U \rightarrow {\bf C}} be a holomorphic function.

  • (i) (Cauchy’s theorem) One has {\int_\gamma f(z)\ dz = 0}.
  • (ii) (Cauchy integral formula) If {z_0 \in U} lies outside of the image of {\gamma}, then the expression {\frac{1}{2\pi i} \int_\gamma \frac{f(z)}{z-z_0}\ dz} vanishes if {z_0} lies in the exterior of {\gamma}, equals {f(z_0)} if {z_0} lies in the interior of {\gamma} and {\gamma} is oriented anti-clockwise, and equals {-f(z_0)} if {z_0} lies in the interior of {\gamma} and {\gamma} is oriented clockwise.

Exercise 53 Let {P(z) = a_n z^n + \dots + a_0} be a polynomial with complex coefficients {a_0,\dots,a_n} and {a_n \neq 0}. For any {R>0}, let {\gamma_R: [0,2\pi] \rightarrow {\bf C}} denote the closed contour {\gamma_R(t) := P(R e^{it})}.

  • (i) Show that if {R} is sufficiently large, then {W_{\gamma_R}(0) = n}.
  • (ii) Show that if {P} does not vanish on the closed disk {\overline{D(0,R)}}, then {W_{\gamma_R}(0)=0}.
  • (iii) Use these facts to give an alternate proof of the fundamental theorem of algebra that does not invoke Liouville’s theorem.
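Parts (i) and (ii) of this exercise can be probed numerically. In the illustrative sketch below (the `winding_number` helper and the choice of polynomial are not from the notes), {P(z) = z^3 - 2z + 5} satisfies {|z^3 - 2z| \leq 3 < 5} on the closed unit disk, so {P} has no zero there; accordingly the computed winding number of {\gamma_R} about {0} is {3} for large {R} and {0} for {R = 1}.

```python
import cmath

def winding_number(points, z0):
    # discretised winding number: sum of argument increments of gamma(t) - z0
    total = 0.0
    n = len(points)
    for j in range(n):
        total += cmath.phase((points[(j + 1) % n] - z0) / (points[j] - z0))
    return total / (2 * cmath.pi)

def P(z):
    return z**3 - 2*z + 5  # degree 3; no roots in the closed unit disk

def gamma_R(R, samples=4000):
    # the closed contour t -> P(R e^{it}), finely sampled
    return [P(R * cmath.exp(2j * cmath.pi * k / samples)) for k in range(samples)]

large = winding_number(gamma_R(10.0), 0j)  # 3 = deg P, for R large
small = winding_number(gamma_R(1.0), 0j)   # 0: P nonvanishing on the disk
```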

In the case when the closed curve {\gamma} is a contour (which includes of course the case of closed polygonal paths), one can describe the interior and exterior regions, as well as the winding number, more explicitly.

Exercise 54 (Local structure of interior and exterior) Let {\gamma = \gamma_1 + \dots + \gamma_n: [a,b] \rightarrow {\bf C}} be a simple closed contour formed by concatenating smooth curves {\gamma_1,\dots,\gamma_n} together. Let {z_0} be an interior point of one of these curves {\gamma_i: [a_i,b_i] \rightarrow {\bf C}}, thus {z_0 = \gamma_i(t_i)} for some {a_i < t_i < b_i}. Set {\gamma_i'(t_i) = re^{i\theta}} for some {r>0} and {\theta \in {\bf R}}. Recall from Exercise 24 of Notes 2 that for sufficiently small {\varepsilon}, the set {\gamma([a,b]) \cap D(z_0,\varepsilon)} can be expressed as a graph of the form

\displaystyle  \gamma([a,b]) \cap D(z_0,\varepsilon) = \{ z_0 + e^{i\theta} (s + i f(s)): s \in I_\varepsilon \}

for some interval {I_\varepsilon} and some continuously differentiable function {f: I_\varepsilon \rightarrow {\bf R}} with {f(0) = f'(0) = 0}. Show that if {\gamma} is oriented anticlockwise and {\varepsilon} is sufficiently small, then the interior of {\gamma} contains all points in {D(z_0,\varepsilon)} of the form {z_0 + e^{i\theta} (s + i (f(s)+u))} for some {s \in I_\varepsilon} and {u>0}, and the exterior of {\gamma} contains all points in {D(z_0,\varepsilon)} of the form {z_0 + e^{i\theta} (s + i (f(s)+u))} for some {s \in I_\varepsilon} and {u<0}. Similarly if {\gamma} is oriented clockwise, with the conditions {u>0} and {u<0} swapped.

Exercise 55 (Alexander numbering rule) Let {\gamma = \gamma_1 + \dots + \gamma_n: [a,b] \rightarrow {\bf C}} be a simple closed contour oriented anticlockwise formed by concatenating smooth curves {\gamma_1,\dots,\gamma_n} together. Let {\sigma = \sigma_1 + \dots + \sigma_m: [c,d] \rightarrow {\bf C}} be a contour formed by concatenating smooth curves {\sigma_1,\dots,\sigma_m}, with initial point {z_0} and final point {z_1}. Assume that there are only finitely many points {w_1,\dots,w_k} where the images of {\gamma} and of {\sigma} intersect. Furthermore, assume at each of the points {w_l}, {l=1,\dots,k}, that one has a “smooth simple transverse intersection” in the sense that the following axioms are obeyed:

  • (i) {w_l} lies in the interior of one of the smooth curves {\gamma_i: [a_i,b_i] \rightarrow {\bf C}} that make up {\gamma}, thus {w_l = \gamma_i(t_i)} for some {a_i < t_i < b_i}.
  • (ii) {w_l} lies in the interior of one of the smooth curves {\sigma_j: [c_j,d_j] \rightarrow {\bf C}} that make up {\sigma}, thus {w_l = \sigma_j(s_j)} for some {c_j < s_j < d_j}.
  • (iii) {w_l} is only traversed once by {\sigma}, thus there do not exist {t \neq t'} in {[c,d]} such that {\sigma(t)=\sigma(t')=w_l}.
  • (iv) The derivatives {\gamma'_i(t_i)} and {\sigma'_j(s_j)} are linearly independent over {{\bf R}}. In other words, we either have a crossing from the right in which {\sigma'_j(s_j) = \lambda e^{i\theta} \gamma'_i(t_i)} for some {\lambda > 0} and {0 < \theta < \pi}, or else we have a crossing from the left in which {\sigma'_j(s_j) = \lambda e^{i\theta} \gamma'_i(t_i)} for some {\lambda > 0} and {-\pi < \theta < 0}.

Show that {W_\gamma(z_0) - W_\gamma(z_1)} is equal to the total number of crossings from the left, minus the total number of crossings from the right.
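The rule can be checked on the simplest configuration: {\gamma} the anticlockwise unit circle and {\sigma} the straight segment from {z_0 = 2} to {z_1 = 0}, which crosses {\gamma} exactly once, at {z = 1}. In the illustrative sketch below (not from the notes), the crossing is classified via the sign of {\mathrm{Im}(\overline{\gamma'_i(t_i)}\,\sigma'_j(s_j))}, which is positive precisely when {0 < \theta < \pi}, i.e. for a crossing from the right.

```python
import cmath

def winding_number(points, z0):
    # discretised winding number: sum of argument increments of gamma(t) - z0
    total = 0.0
    n = len(points)
    for j in range(n):
        total += cmath.phase((points[(j + 1) % n] - z0) / (points[j] - z0))
    return total / (2 * cmath.pi)

circle = [cmath.exp(2j * cmath.pi * k / 400) for k in range(400)]
z0, z1 = 2 + 0j, 0j      # sigma runs from z0 to z1, crossing gamma at z = 1

gamma_tangent = 1j       # derivative of e^{it} at t = 0, i.e. at the crossing
sigma_tangent = z1 - z0  # direction of the segment sigma
# Im(conj(gamma') * sigma') > 0 iff sigma' = lambda e^{i theta} gamma'
# with 0 < theta < pi, i.e. a crossing from the right.
from_right = 1 if (gamma_tangent.conjugate() * sigma_tangent).imag > 0 else 0
from_left = 1 - from_right

lhs = round(winding_number(circle, z0)) - round(winding_number(circle, z1))
print(lhs, from_left - from_right)  # both equal -1
```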

Exercise 56 Let {U} be a non-empty connected open subset of {{\bf C}}. Show that {U} is simply connected if and only if every holomorphic function on {U} is conservative.

— 4. Appendix: proof of the Jordan curve theorem (optional) —

We now prove the Jordan curve theorem. We begin with a variant of Corollary 42 in which the curve {\gamma_1} is only required to have image close to the image of {\gamma_0}, rather than be close to {\gamma_0} in a pointwise (and uniform) sense. For any curve {\gamma} and any {r>0}, let {N_r(\gamma) := \{ z \in {\bf C}: \mathrm{dist}(z,\gamma) < r \}} denote the {r}-neighbourhood of {\gamma}.

Proposition 57 Let {\gamma_0} be a non-trivial simple closed curve, and let {\delta>0}. Suppose that {\varepsilon>0} is sufficiently small depending on {\gamma_0} and {\delta}. Let {\gamma_1} be a closed curve (not necessarily simple) whose image lies in {N_{\varepsilon}(\gamma_0)}. Then there is an integer {m} such that {\gamma_1} is homotopic (as a closed curve, up to reparameterisation) to {m\gamma_0} in {N_\delta(\gamma_0)}, where {m\gamma_0} is defined as the concatenation of {m} copies of {\gamma_0} if {m} is positive, the trivial curve at the initial point of {\gamma_0} if {m} is zero, and the concatenation of {-m} copies of {-\gamma_0} if {m} is negative. In particular, from Lemma 41 one has

\displaystyle W_{\gamma_1}(z_0) = m W_{\gamma_0}(z_0)

for all {z_0 \in {\bf C} \backslash N_{\delta}(\gamma_0)}.


Proof: After reparameterisation, we can take {\gamma_0: [0,1] \rightarrow {\bf C}} to have domain on the unit interval {[0,1]}, and then by periodic extension we can view {\gamma_0: {\bf R} \rightarrow {\bf C}} as a continuous {1}-periodic function on {{\bf R}}.

As {[0,1]} is compact, {\gamma_0} is uniformly continuous on {[0,1]}, and hence also on {{\bf R}}. In particular, there exists {0 < \kappa < \frac{1}{10}} such that

\displaystyle |\gamma_0(t_1) - \gamma_0(t_2)| \leq \frac{\delta}{2} \ \ \ \ \ (10)

whenever {t_1,t_2 \in {\bf R}} are such that {|t_1-t_2| \leq \kappa}.

Fix this {\kappa}. Observe that the function {(t_1,t_2) \mapsto |\gamma_0(t_1)-\gamma_0(t_2)|} is continuous and nowhere vanishing on the region {\{ (t_1,t_2) \in [0,2] \times [0,2]: \kappa \leq |t_1-t_2| \leq 1-\kappa \}}. Thus, if {\varepsilon} is small enough depending on {\gamma_0,\kappa}, we have the lower bound

\displaystyle |\gamma_0(t_1)-\gamma_0(t_2)| \geq 3\varepsilon

whenever {t_1,t_2 \in [0,2]} are such that {\kappa \leq |t_1-t_2| \leq 1-\kappa}. Using the {1}-periodicity of {\gamma_0}, we conclude that if {t_1,t_2 \in {\bf R}} are such that

\displaystyle |\gamma_0(t_1)-\gamma_0(t_2)| < 3\varepsilon

then there must be an integer {m_{t_1,t_2}} such that

\displaystyle |t_2 - (t_1+m_{t_1,t_2})| < \kappa. \ \ \ \ \ (11)

Note that this integer {m_{t_1,t_2}} is uniquely determined by {t_1} and {t_2}.

Let {[a,b]} be the domain of {\gamma_1: [a,b] \rightarrow {\bf C}}. By the uniform continuity of {\gamma_1}, we can find a partition {a = s_0 < \dots < s_n = b} of {[a,b]} such that

\displaystyle |\gamma_1(s) - \gamma_1(s')| < \varepsilon \ \ \ \ \ (12)

for all {1 \leq j \leq n} and {s_{j-1} \leq s, s' \leq s_j}. Since the image of {\gamma_1} lies in {N_{\varepsilon}(\gamma_0)}, we can find, for each {0 \leq j \leq n}, a real number {t_j} such that

\displaystyle |\gamma_1(s_j) - \gamma_0(t_j)| < \varepsilon. \ \ \ \ \ (13)

Since {\gamma_1} is closed, we may arrange matters so that

\displaystyle \gamma_0(t_0) = \gamma_0(t_n). \ \ \ \ \ (14)

From the triangle inequality and (12), (13) we have

\displaystyle |\gamma_0(t_j) - \gamma_0(t_{j-1})| < 3\varepsilon.

Using (11), we conclude that for each {1 \leq j \leq n}, there is an integer {m_j} such that

\displaystyle |t_j - (t_{j-1}+m_j)| < \kappa.

As {\gamma_0} is {1}-periodic, we have the freedom to shift each of the {t_j} by an arbitrary integer, and by doing this for {t_1, \dots, t_n} in turn, we may assume without loss of generality that all the {m_j} vanish, thus

\displaystyle |t_j - t_{j-1}| < \kappa \ \ \ \ \ (15)

for all {j=1,\dots,n}. In particular, from (10) we have

\displaystyle |\gamma_0(t) - \gamma_0(t')| < \frac{\delta}{2} \ \ \ \ \ (16)

whenever {t_{j-1} \leq t, t' \leq t_j}. Also, as {\gamma_0} is simple, we have from (14) that

\displaystyle t_n = t_0 + m

for some integer {m}. (Note that by enforcing (15), we no longer have the freedom to individually move {t_0} or {t_n} by an integer, so we cannot assume without loss of generality that {m} vanishes.)

For {j=1,\dots,n}, let {\gamma_{0,j}: [j-1,j] \rightarrow {\bf C}} denote the curve

\displaystyle \gamma_{0,j}(t) := \gamma_0( t_{j-1} + (t-j+1) (t_j - t_{j-1}) )

from {\gamma_0(t_{j-1})} to {\gamma_0(t_j)}; similarly let {\gamma_{1,j}: [j-1,j] \rightarrow {\bf C}} denote the curve

\displaystyle \gamma_{1,j}(t) := \gamma_1( s_{j-1} + (t-j+1) (s_j - s_{j-1}) )

from {\gamma_1(s_{j-1})} to {\gamma_1(s_j)}. Observe from (16), (12), (13) that for each {j=1,\dots,n}, the images of {\gamma_{0,j}} and {\gamma_{1,j}} both lie in {D( \gamma_0(t_j), \frac{\delta}{2} + 3\varepsilon)}, which will lie in {N_\delta(\gamma_0)} if {\varepsilon} is small enough. We can thus form a homotopy {\gamma: [0,1] \times [0,n] \rightarrow N_\delta(\gamma_0)} from {\gamma_{0,1} + \dots + \gamma_{0,n}} to {\gamma_{1,1} + \dots + \gamma_{1,n}} by defining

\displaystyle \gamma( s, t ) = (1-s) \gamma_{0,j}(t) + s \gamma_{1,j}(t)

for all {1 \leq j \leq n} and {j-1 \leq t \leq j}. Thus {\gamma_{0,1} + \dots + \gamma_{0,n}} and {\gamma_{1,1} + \dots + \gamma_{1,n}} are homotopic as closed curves in {N_\delta(\gamma_0)}. But by Exercise 3, {\gamma_{0,1} + \dots + \gamma_{0,n}} is homotopic up to reparameterisation as closed curves to {m \gamma_0} in {N_\delta(\gamma_0)}, and {\gamma_{1,1} + \dots + \gamma_{1,n}} is similarly homotopic up to reparameterisation as closed curves to {\gamma_1} in {N_\delta(\gamma_0)}, and the claim follows. \Box

We can now prove Theorem 48. We first verify the claim in the easy (and visually intuitive) case that {\gamma} is a non-trivial simple closed polygonal curve. Removing the polygon {\gamma([a,b])} from {{\bf C}} leaves an open set, which we may decompose into connected components as per Exercise 34 of Notes 2. On each of these components, the winding number {W_\gamma} is constant. Since each component has a non-empty boundary that is contained in {\gamma([a,b])}, this constant value of {W_\gamma} must also be attained arbitrarily close to {\gamma([a,b])}.

Now, a routine application of the Cauchy integral formula (see Exercise 59) shows that as {z_0} crosses one of the edges of the polygon {\gamma([a,b])}, the winding number {W_\gamma(z_0)} shifts by either {+1} or {-1}. Hence at each point {z} on {\gamma([a,b])}, the winding number takes exactly two values {\{ k, k+1\}} in a sufficiently small neighbourhood of {z} (excluding {\gamma([a,b])} itself). By a continuity argument, the integer {k} is independent of {z}. On the other hand, from Corollary 43 the winding number must attain the value zero somewhere. Thus we have {\{k,k+1\} = \{0,\sigma\}} for some {\sigma = \pm 1}. Dividing a small neighbourhood of {\gamma([a,b])} (excluding {\gamma([a,b])} itself) into the regions where the winding number is {0} or {\sigma}, a further continuity argument shows that each of these regions lies in a single connected component. Thus there are only two connected components, one where the winding number is zero and one where it is {\sigma}. By Corollary 43 the latter component is bounded, hence the former is unbounded, and the claim follows.

Now we handle the significantly more difficult case when {\gamma} is just a non-trivial simple closed curve. As one may expect, the strategy will be to approximate this curve by a polygonal path, but some care has to be taken when performing a limit, in order to prevent the interior region from collapsing into nothingness, or becoming disconnected, in the limit.

The first challenge is to ensure that there is at least one point {z_0} outside of {\gamma([a,b])} at which {W_\gamma(z_0)} is non-zero. This is actually rather tricky; we will achieve this by a parity argument (loosely inspired by a nonstandard version of this argument from this paper of Kanovei and Reeken). Clearly, {\gamma([a,b])} contains at least two points; by an appropriate rotation, translation, and dilation we may assume that {\gamma([a,b])} contains the points {+i} and {-i}, with {i} being both the initial point and the final point. Then we can decompose {\gamma = \gamma_1 + \gamma_2}, where {\gamma_1: [a,c] \rightarrow {\bf C}} is a curve from {i} to {-i}, and {\gamma_2: [c,b] \rightarrow {\bf C}} is a curve from {-i} to {i}.


Observe from the simplicity of {\gamma} that {|\gamma_1(t_1) - \gamma_2(t_2)| > 0} whenever {t_1 \in [a,c]} and {t_2 \in [c,b]} are such that

\displaystyle |\mathrm{Im}(\gamma_1(t_1))|, |\mathrm{Im}(\gamma_2(t_2))| \leq \frac{1}{2}. \ \ \ \ \ (17)

Thus, by compactness, there exists {0 < \delta < \frac{1}{10}} such that one has the lower bound

\displaystyle |\gamma_1(t_1) - \gamma_2(t_2)| \geq \delta \ \ \ \ \ (18)

separating {\gamma_1} from {\gamma_2} whenever {t_1 \in [a,c]}, {t_2 \in [c,b]} are such that (17) holds.

Next, for any natural number {N}, we may approximate {\gamma: [a,b] \rightarrow {\bf C}} by a polygonal closed path {\gamma^{(N)}: [a,b] \rightarrow {\bf C}} with

\displaystyle |\gamma^{(N)}(t) - \gamma(t)| < \frac{1}{N} \ \ \ \ \ (19)

for all {t \in [a,b]}. Although it is not particularly necessary, we can ensure that {\gamma^{(N)}(a) = \gamma(a) = i} and {\gamma^{(N)}(c) = \gamma(c) = -i}. By perturbing the edges of the polygonal path {\gamma^{(N)}} slightly, we may assume that none of the vertices of {\gamma^{(N)}} lie on the real axis, and that none of the self-crossings of {\gamma^{(N)}} (if any exist) lie on the real axis; thus, whenever {\gamma^{(N)}} crosses the real axis, it does so at an interior point of an edge, with no other edge of {\gamma^{(N)}} passing through that point. Note that we do not assert that the curve {\gamma^{(N)}} is simple; with some more effort one could “prune” {\gamma^{(N)}} by deleting short loops to make it simple, but this turns out to be unnecessary for the parity argument we give below.

Let {x^{(N)}_1 < x^{(N)}_2 < \dots < x^{(N)}_{n^{(N)}}} be the points on the real axis where {\gamma^{(N)}} crosses. By Exercise 59 below, the winding number {W_{\gamma^{(N)}}(x)} changes by {+1} or {-1} as {x} crosses each of the {x^{(N)}_j}; by Lemma 44, this winding number is constant otherwise, and by Corollary 43 it vanishes near infinity. Thus {n^{(N)}} is even, and the winding number is odd between {x^{(N)}_j} and {x^{(N)}_{j+1}} for any odd {j}.
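The bookkeeping of this paragraph can be simulated for a concrete polygonal curve (an illustrative sketch, with hypothetical sampling choices, not from the notes): a polygon inscribed in the unit circle, rotated so that no vertex lies on the real axis, has an even number of real-axis crossings, and the winding number between the crossings {x^{(N)}_j} and {x^{(N)}_{j+1}} with {j} odd is indeed odd.

```python
import cmath

def winding_number(points, z0):
    # discretised winding number: sum of argument increments of gamma(t) - z0
    total = 0.0
    n = len(points)
    for j in range(n):
        total += cmath.phase((points[(j + 1) % n] - z0) / (points[j] - z0))
    return total / (2 * cmath.pi)

# polygonal approximation of the unit circle; the angular offset 0.1
# keeps every vertex off the real axis
verts = [cmath.exp(1j * (2 * cmath.pi * k / 21 + 0.1)) for k in range(21)]

# real-axis crossings: edges whose endpoints have imaginary parts of
# opposite sign; each crossing located by linear interpolation
crossings = []
for j in range(21):
    w1, w2 = verts[j], verts[(j + 1) % 21]
    if w1.imag * w2.imag < 0:
        s = w1.imag / (w1.imag - w2.imag)
        crossings.append(w1.real + s * (w2.real - w1.real))
crossings.sort()

assert len(crossings) % 2 == 0                    # n^(N) is even
mid = (crossings[0] + crossings[1]) / 2           # between x_1 and x_2 (j = 1 odd)
assert round(winding_number(verts, mid)) % 2 == 1  # odd winding there
```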

Next, observe that each point {x^{(N)}_j} belongs to exactly one of the polygonal paths {\gamma^{(N)}([a,c])} or {\gamma^{(N)}([c,b])}. Since each of these curves starts on one side of the real axis and ends up on the other, they must both cross the real axis an odd number of times. On the other hand, the crossing points {x^{(N)}_1,\dots,x^{(N)}_{n^{(N)}}} can be grouped into pairs {\{x^{(N)}_j,x^{(N)}_{j+1}\}} with {j} odd. If every such pair lay entirely within one of the two paths, then each path would cross the real axis an even number of times, a contradiction. We conclude that there must exist an odd {j} such that one of {x^{(N)}_j,x^{(N)}_{j+1}} lies in {\gamma^{(N)}([a,c])} and the other lies in {\gamma^{(N)}([c,b])}.

Fix such a {j}. For sake of discussion, let us suppose that {x^{(N)}_j} lies in {\gamma^{(N)}([a,c])} and {x^{(N)}_{j+1}} lies in {\gamma^{(N)}([c,b])}. From (19) we have

\displaystyle \mathrm{dist}( x^{(N)}_j, \gamma([a,c]) ) \leq \frac{1}{N}; \quad \mathrm{dist}( x^{(N)}_{j+1}, \gamma([c,b]) ) \leq \frac{1}{N}

and from (18) we have

\displaystyle \mathrm{dist}( x, \gamma([a,c]) ) + \mathrm{dist}( x, \gamma([c,b]) ) \geq \delta

for any {x \in [x^{(N)}_j, x^{(N)}_{j+1}]}. By the intermediate value theorem, we can thus (for {N} large enough) find {x^{(N)}_j < x^{(N)}_* < x^{(N)}_{j+1}} such that

\displaystyle \mathrm{dist}( x^{(N)}_*, \gamma([a,c]) ) = \mathrm{dist}( x^{(N)}_*, \gamma([c,b]) )

and thus

\displaystyle \mathrm{dist}( x^{(N)}_*, \gamma([a,c]) ), \mathrm{dist}( x^{(N)}_*, \gamma([c,b]) ) \geq \frac{\delta}{2}

or equivalently

\displaystyle \mathrm{dist}( x^{(N)}_*, \gamma([a,b]) ) \geq \frac{\delta}{2}.

We arrive at the same conclusion in the opposite case when {x^{(N)}_j} lies in {\gamma^{(N)}([c,b])} and {x^{(N)}_{j+1}} lies in {\gamma^{(N)}([a,c])}.

By Corollary 43 (and (19)), the {x^{(N)}_*} are bounded in {N}. By the Bolzano-Weierstrass theorem, we can thus extract a subsequence of the {x^{(N)}_*} that converges to some limit {x_*}. By continuity we then have

\displaystyle \mathrm{dist}( x_*, \gamma([a,b]) ) \geq \frac{\delta}{2},

in particular {x_*} does not lie in {\gamma([a,b])}. By construction of {x^{(N)}_*}, we know that {W_{\gamma^{(N)}}( x^{(N)}_* )} is odd for all {N}; using Lemma 44 and Corollary 42 we conclude that {W_\gamma(x_*)} is also odd. Thus we have found at least one point where the winding number is non-zero.

Now we can finish the proof of the Jordan curve theorem. Let {\gamma: [a,b] \rightarrow {\bf C}} be a non-trivial simple closed curve. By the preceding discussion, we can find a point {z_*} outside of {\gamma([a,b])} where the winding number {W_\gamma} is non-zero. Let {\delta > 0} be a sufficiently small parameter, and let {0 < \varepsilon = \varepsilon(\delta) < \delta} be sufficiently small depending on