Planet Musings

September 30, 2016

Clifford Johnson: Super-Strong…?

Are you going to watch the Luke Cage series that debuts today on Netflix? I probably will at some point (I've got several decades-old reasons, and also it was set up well in the excellent Jessica Jones last year)... but not soon, as I've got far too many deadlines. Here's a related item: using the Luke Cage character as a jumping-off point, physicist Martin Archer has put together a very nice short video about the business of strong and tough (not the same thing) materials in the real world.


Have a look if you want to appreciate the nuances, and learn a bit about what's maybe just over the horizon for new amazing materials that might become part of our everyday lives.

The post Super-Strong…? appeared first on Asymptotia.

Terence Tao: 246A, Notes 0: the complex numbers

Kronecker is famously reported to have said, “God created the natural numbers; all else is the work of man”. The truth of this statement (literal or otherwise) is debatable; but one can certainly view the other standard number systems {{\bf Z}, {\bf Q}, {\bf R}, {\bf C}} as (iterated) completions of the natural numbers {{\bf N}} in various senses. For instance:

  • The integers {{\bf Z}} are the additive completion of the natural numbers {{\bf N}} (the minimal additive group that contains a copy of {{\bf N}}).
  • The rationals {{\bf Q}} are the multiplicative completion of the integers {{\bf Z}} (the minimal field that contains a copy of {{\bf Z}}).
  • The reals {{\bf R}} are the metric completion of the rationals {{\bf Q}} (the minimal complete metric space that contains a copy of {{\bf Q}}).
  • The complex numbers {{\bf C}} are the algebraic completion of the reals {{\bf R}} (the minimal algebraically closed field that contains a copy of {{\bf R}}).

These descriptions of the standard number systems are elegant and conceptual, but not entirely suitable for constructing the number systems in a non-circular manner from more primitive foundations. For instance, one cannot quite define the reals {{\bf R}} from scratch as the metric completion of the rationals {{\bf Q}}, because the definition of a metric space itself requires the notion of the reals! (One can of course construct {{\bf R}} by other means, for instance by using Dedekind cuts or by using uniform spaces in place of metric spaces.) The definition of the complex numbers as the algebraic completion of the reals does not suffer from such a non-circularity issue, but a certain amount of field theory is required to work with this definition initially. For the purposes of quickly constructing the complex numbers, it is thus more traditional to first define {{\bf C}} as a quadratic extension of the reals {{\bf R}}, and more precisely as the extension {{\bf C} = {\bf R}(i)} formed by adjoining a square root {i} of {-1} to the reals, that is to say a solution to the equation {i^2+1=0}. It is not immediately obvious that this extension is in fact algebraically closed; this is the content of the famous fundamental theorem of algebra, which we will prove later in this course.

The two equivalent definitions of {{\bf C}} – as the algebraic closure, and as a quadratic extension, of the reals respectively – each reveal important features of the complex numbers in applications. Because {{\bf C}} is algebraically closed, all polynomials over the complex numbers split completely, which leads to a good spectral theory for both finite-dimensional matrices and infinite-dimensional operators; in particular, one expects to be able to diagonalise most matrices and operators. Applying this theory to constant coefficient ordinary differential equations leads to a unified theory of such solutions, in which real-variable ODE behaviour such as exponential growth or decay, polynomial growth, and sinusoidal oscillation all become aspects of a single object, the complex exponential {z \mapsto e^z} (or more generally, the matrix exponential {A \mapsto \exp(A)}). Applying this theory more generally to diagonalise arbitrary translation-invariant operators over some locally compact abelian group, one arrives at Fourier analysis, which is thus most naturally phrased in terms of complex-valued functions rather than real-valued ones. If one drops the assumption that the underlying group is abelian, one instead discovers the representation theory of unitary representations, which is simpler to study than the real-valued counterpart of orthogonal representations. For closely related reasons, the theory of complex Lie groups is simpler than that of real Lie groups.

Meanwhile, the fact that the complex numbers are a quadratic extension of the reals lets one view the complex numbers geometrically as a two-dimensional plane over the reals (the Argand plane). Whereas a point singularity in the real line disconnects that line, a point singularity in the Argand plane leaves the rest of the plane connected (although, importantly, the punctured plane is no longer simply connected). As we shall see, this fact causes singularities in complex analytic functions to be better behaved than singularities of real analytic functions, ultimately leading to the powerful residue calculus for computing complex integrals. Remarkably, this calculus, when combined with the quintessentially complex-variable technique of contour shifting, can also be used to compute some (though certainly not all) definite integrals of real-valued functions that would be much more difficult to compute by purely real-variable methods; this is a prime example of Hadamard’s famous dictum that “the shortest path between two truths in the real domain passes through the complex domain”.

Another important geometric feature of the Argand plane is the angle between two tangent vectors to a point in the plane. As it turns out, the operation of multiplication by a complex scalar preserves the magnitude and orientation of such angles; the same fact is true for any non-degenerate complex analytic mapping, as can be seen by performing a Taylor expansion to first order. This fact ties the study of complex mappings closely to that of the conformal geometry of the plane (and more generally, of two-dimensional surfaces and domains). In particular, one can use complex analytic maps to conformally transform one two-dimensional domain to another, leading among other things to the famous Riemann mapping theorem, and to the classification of Riemann surfaces.

If one Taylor expands complex analytic maps to second order rather than first order, one discovers a further important property of these maps, namely that they are harmonic. This fact makes the class of complex analytic maps extremely rigid and well behaved analytically; indeed, the entire theory of elliptic PDE now comes into play, giving useful properties such as elliptic regularity and the maximum principle. In fact, due to the magic of residue calculus and contour shifting, we already obtain these properties for maps that are merely complex differentiable rather than complex analytic, which leads to the striking fact that complex differentiable functions are automatically analytic (in contrast to the real-variable case, in which real differentiable functions can be very far from being analytic).

The geometric structure of the complex numbers (and more generally of complex manifolds and complex varieties), when combined with the algebraic closure of the complex numbers, leads to the beautiful subject of complex algebraic geometry, which motivates the much more general theory developed in modern algebraic geometry. However, we will not develop the algebraic geometry aspects of complex analysis here.

Last, but not least, because of the good behaviour of Taylor series in the complex plane, complex analysis is an excellent setting in which to manipulate various generating functions, particularly Fourier series {\sum_n a_n e^{2\pi i n \theta}} (which can be viewed as boundary values of power (or Laurent) series {\sum_n a_n z^n}), as well as Dirichlet series {\sum_n \frac{a_n}{n^s}}. The theory of contour integration provides a very useful dictionary between the asymptotic behaviour of the sequence {a_n}, and the complex analytic behaviour of the Dirichlet or Fourier series, particularly with regard to its poles and other singularities. This turns out to be a particularly handy dictionary in analytic number theory, for instance relating the distribution of the primes to the Riemann zeta function. Nowadays, many of the analytic number theory results first obtained through complex analysis (such as the prime number theorem) can also be obtained by more “real-variable” methods; however the complex-analytic viewpoint is still extremely valuable and illuminating.

We will frequently touch upon many of these connections to other fields of mathematics in these lecture notes. However, these are mostly side remarks intended to provide context, and it is certainly possible to skip most of these tangents and focus purely on the complex analysis material in these notes if desired.

Note: complex analysis is a very visual subject, and one should draw plenty of pictures while learning it. I am however not planning to put too many pictures in these notes, partly as it is somewhat inconvenient to do so on this blog from a technical perspective, but also because pictures that one draws on one’s own are likely to be far more useful to you than pictures that were supplied by someone else.

— 1. The construction and algebra of the complex numbers —

Note: this section will be far more algebraic in nature than the rest of the course; we are concentrating almost all of the algebraic preliminaries in this section in order to get them out of the way and focus subsequently on the analytic aspects of the complex numbers.

Thanks to the laws of high-school algebra, we know that the real numbers {{\bf R}} form a field: they are endowed with the arithmetic operations of addition, subtraction, multiplication, and division, as well as the additive identity {0} and multiplicative identity {1}, all obeying the usual laws of algebra (i.e. the field axioms).

The algebraic structure of the reals does have one drawback though – not all (non-trivial) polynomials have roots! Most famously, the polynomial equation {x^2+1=0} has no solutions over the reals, because {x^2} is always non-negative, and hence {x^2+1} is always strictly positive, whenever {x} is a real number.

As mentioned in the introduction, one traditional way to define the complex numbers {{\bf C}} is as the smallest possible extension of the reals {{\bf R}} that fixes this one specific problem:

Definition 1 (The complex numbers) A field of complex numbers is a field {{\bf C}} that contains the real numbers {{\bf R}} as a subfield, as well as a root {i} of the equation {i^2+1=0}. (Thus, strictly speaking, a field of complex numbers is a pair {({\bf C},i)}, but we will almost always abuse notation and use {{\bf C}} as a metonym for the pair {({\bf C},i)}.) Furthermore, {{\bf C}} is generated by {{\bf R}} and {i}, in the sense that there is no subfield of {{\bf C}}, other than {{\bf C}} itself, that contains both {{\bf R}} and {i}; thus, in the language of field extensions, we have {{\bf C} = {\bf R}(i)}.

(We will take the existence of the real numbers {{\bf R}} as a given in this course; constructions of the real number system can of course be found in many real analysis texts, including my own.)

Definition 1 is short, but proposing it as a definition of the complex numbers raises some immediate questions:

  • (Existence) Does such a field {{\bf C}} even exist?
  • (Uniqueness) Is such a field {{\bf C}} unique (up to isomorphism)?
  • (Non-arbitrariness) Why the square root of {-1}? Why not adjoin instead, say, a fourth root of {-42}, or the solution to some other algebraic equation? Also, could one iterate the process, extending {{\bf C}} further by adding more roots of equations?

The third set of questions can be answered satisfactorily once we possess the fundamental theorem of algebra. For now, we focus on the first two questions.

We begin with existence. One can construct the complex numbers quite explicitly and quickly using the Argand plane construction; see Remark 7 below. However, from the perspective of higher mathematics, it is more natural to view the construction of the complex numbers as a special case of the more general algebraic construction that can extend any field {k} by the root {\alpha} of an irreducible nonlinear polynomial {P \in k[\mathrm{x}]} over that field; this produces a field of complex numbers {{\bf C}} when specialising to the case where {k={\bf R}} and {P = \mathrm{x}^2+1}. We will just describe this construction in that special case, leaving the general case as an exercise.

Starting with the real numbers {{\bf R}}, we can form the space {{\bf R}[\mathrm{x}]} of (formal) polynomials

\displaystyle P(\mathrm{x}) = a_d \mathrm{x}^d + a_{d-1} \mathrm{x}^{d-1} + \dots + a_0

with real coefficients {a_0,\dots,a_d \in {\bf R}} and arbitrary non-negative integer {d} in one indeterminate variable {\mathrm{x}}. (A small technical point: we do not view this indeterminate {\mathrm{x}} as belonging to any particular domain such as {{\bf R}}, so we do not view these polynomials {P} as functions but merely as formal expressions involving a placeholder symbol {\mathrm{x}} (which we have rendered in Roman type to indicate its formal character). In this particular characteristic zero setting of working over the reals, it turns out to be harmless to identify each polynomial {P} with the corresponding function {P: {\bf R} \rightarrow {\bf R}} formed by interpreting the indeterminate {\mathrm{x}} as a real variable; but if one were to generalise this construction to positive characteristic fields, and particularly finite fields, then one can run into difficulties if polynomials are not treated formally, due to the fact that two distinct formal polynomials might agree on all inputs in a given finite field (e.g. the polynomials {\mathrm{x}} and {\mathrm{x}^p} agree at every point of the finite field {{\mathbf F}_p}). However, this subtlety can be ignored for the purposes of this course.) This space {{\bf R}[\mathrm{x}]} of polynomials has a pretty good algebraic structure; in particular, the usual operations of addition, subtraction, and multiplication on polynomials, together with the zero polynomial {0} and the unit polynomial {1}, give {{\bf R}[\mathrm{x}]} the structure of a (unital) commutative ring. This commutative ring also contains {{\bf R}} as a subring (identifying each real number {a} with the degree zero polynomial {a \mathrm{x}^0}). The ring {{\bf R}[\mathrm{x}]} is however not a field, because many non-zero elements of {{\bf R}[\mathrm{x}]} do not have multiplicative inverses. 
(In fact, no non-constant polynomial in {{\bf R}[\mathrm{x}]} has an inverse in {{\bf R}[\mathrm{x}]}, because the product of two non-constant polynomials has a degree that is the sum of the degrees of the factors.)

If a unital commutative ring fails to be a field, then it will instead possess a number of non-trivial ideals. The only ideal we will need to consider here is the principal ideal

\displaystyle  \langle \mathrm{x}^2+1 \rangle := \{ (\mathrm{x}^2+1) P(\mathrm{x}): P(\mathrm{x}) \in {\bf R}[\mathrm{x}] \}.

This is clearly an ideal of {{\bf R}[\mathrm{x}]} – it is closed under addition and subtraction, and the product of any element of the ideal {\langle \mathrm{x}^2 + 1 \rangle} with an element of the full ring {{\bf R}[\mathrm{x}]} remains in the ideal {\langle \mathrm{x}^2 + 1 \rangle}.

We now define {{\bf C}} to be the quotient space

\displaystyle {\bf C} := {\bf R}[\mathrm{x}] / \langle \mathrm{x}^2+1 \rangle

of the commutative ring {{\bf R}[\mathrm{x}]} by the ideal {\langle \mathrm{x}^2+1 \rangle}; this is the space of cosets {P(\mathrm{x}) + \langle \mathrm{x}^2+1 \rangle = \{ P(\mathrm{x}) + Q(\mathrm{x}): Q(\mathrm{x}) \in \langle \mathrm{x}^2+1 \rangle \}} of {\langle \mathrm{x}^2+1 \rangle} in {{\bf R}[\mathrm{x}]}. Because {\langle \mathrm{x}^2+1 \rangle} is an ideal, there is an obvious way to define addition, subtraction, and multiplication in {{\bf C}}, namely by setting

\displaystyle  (P(\mathrm{x}) + \langle \mathrm{x}^2+1 \rangle) + (Q(\mathrm{x}) + \langle \mathrm{x}^2+1 \rangle) := (P(\mathrm{x}) + Q(\mathrm{x}) + \langle \mathrm{x}^2+1 \rangle),

\displaystyle  (P(\mathrm{x}) + \langle \mathrm{x}^2+1 \rangle) - (Q(\mathrm{x}) + \langle \mathrm{x}^2+1 \rangle) := (P(\mathrm{x}) - Q(\mathrm{x}) + \langle \mathrm{x}^2+1 \rangle)


\displaystyle  (P(\mathrm{x}) + \langle \mathrm{x}^2+1 \rangle) \cdot (Q(\mathrm{x}) + \langle \mathrm{x}^2+1 \rangle) := (P(\mathrm{x}) Q(\mathrm{x}) + \langle \mathrm{x}^2+1 \rangle)

for all {P(\mathrm{x}), Q(\mathrm{x}) \in {\bf R}[\mathrm{x}]}; these operations, together with the additive identity {0 = 0 + \langle \mathrm{x}^2+1 \rangle} and the multiplicative identity {1 = 1 + \langle \mathrm{x}^2+1 \rangle}, can be easily verified to give {{\bf C}} the structure of a commutative ring. Also, the real line {{\bf R}} embeds into {{\bf C}} by identifying each real number {a} with the coset {a + \langle \mathrm{x}^2+1 \rangle}; note that this identification is injective, as no real number is a multiple of the polynomial {\mathrm{x}^2+1}.
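For readers who like to compute, the coset arithmetic above can be sketched in a few lines of Python: each coset has a unique representative {a + b\mathrm{x}} of degree less than two, obtained by repeatedly replacing {\mathrm{x}^2} with {-1}. (An illustrative sketch, not part of the notes; all names are ours.)

```python
def reduce_mod(coeffs):
    """Reduce a polynomial (coeffs[k] is the x^k coefficient) modulo
    x^2 + 1, returning the degree-<2 coset representative (a, b)."""
    coeffs = list(coeffs)
    for k in range(len(coeffs) - 1, 1, -1):
        # x^k = x^(k-2) * x^2 = -x^(k-2) modulo x^2 + 1
        coeffs[k - 2] -= coeffs[k]
        coeffs[k] = 0
    a = coeffs[0] if len(coeffs) > 0 else 0.0
    b = coeffs[1] if len(coeffs) > 1 else 0.0
    return (a, b)

def add(p, q):
    return (p[0] + q[0], p[1] + q[1])

def mul(p, q):
    # (a + b x)(c + d x) = ac + (ad + bc) x + bd x^2, then reduce
    a, b = p
    c, d = q
    return reduce_mod([a * c, a * d + b * c, b * d])

i = (0.0, 1.0)      # the coset of x
print(mul(i, i))    # (-1.0, 0.0): i^2 = -1 in the quotient
```

Multiplying the coset of {\mathrm{x}} by itself and reducing gives {-1}, which is exactly the relation the quotient was designed to enforce.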

If we define {i \in {\bf C}} to be the coset

\displaystyle  i := \mathrm{x} + \langle \mathrm{x}^2 + 1 \rangle,

then it is clear from construction that {i^2+1=0}. Thus {{\bf C}} contains both {{\bf R}} and a solution of the equation {i^2+1=0}. Also, since every element of {{\bf C}} is of the form {P(\mathrm{x}) + \langle \mathrm{x}^2+1 \rangle} for some polynomial {P \in {\bf R}[\mathrm{x}]}, we see that every element of {{\bf C}} is a polynomial combination {P(i)} of {i} with real coefficients; in particular, any subring of {{\bf C}} that contains {{\bf R}} and {i} will necessarily have to contain every element of {{\bf C}}. Thus {{\bf C}} is generated by {{\bf R}} and {i}.

The only remaining thing to verify is that {{\bf C}} is a field and not just a commutative ring. In other words, we need to show that every non-zero element of {{\bf C}} has a multiplicative inverse. This stems from a particular property of the polynomial {\mathrm{x}^2 + 1}, namely that it is irreducible in {{\bf R}[\mathrm{x}]}. That is to say, we cannot factor {\mathrm{x}^2+1} into non-constant polynomials

\displaystyle  \mathrm{x}^2 + 1 = P(\mathrm{x}) Q(\mathrm{x})

with {P(\mathrm{x}), Q(\mathrm{x}) \in {\bf R}[\mathrm{x}]}. Indeed, as {\mathrm{x}^2+1} has degree two, the only possible way such a factorisation could occur is if {P(\mathrm{x}), Q(\mathrm{x})} both have degree one, which would imply that the polynomial {\mathrm{x}^2+1} has a root in the reals {{\bf R}}, which of course it does not.

Because the polynomial {\mathrm{x}^2+1} is irreducible, it is also prime: if {\mathrm{x}^2+1} divides a product {P(\mathrm{x}) Q(\mathrm{x})} of two polynomials in {{\bf R}[\mathrm{x}]}, then it must also divide at least one of the factors {P(\mathrm{x})}, {Q(\mathrm{x})}. Indeed, if {\mathrm{x}^2 + 1} does not divide {P(\mathrm{x})}, then by irreducibility the greatest common divisor of {\mathrm{x}^2+1} and {P(\mathrm{x})} is {1}. Applying the Euclidean algorithm for polynomials, we then obtain a representation of {1} as

\displaystyle  1 = R(\mathrm{x}) (\mathrm{x}^2+1) + S(\mathrm{x}) P(\mathrm{x})

for some polynomials {R(\mathrm{x}), S(\mathrm{x})}; multiplying both sides by {Q(\mathrm{x})}, we conclude that {Q(\mathrm{x})} is a multiple of {\mathrm{x}^2+1}.
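The Euclidean-algorithm step can be made fully explicit. Below is a sketch of the extended Euclidean algorithm for polynomials over the rationals, applied to {\mathrm{x}^2+1} and the hypothetical example {P(\mathrm{x}) = \mathrm{x}+2}; the example and all names are ours, not from the notes.

```python
from fractions import Fraction

# Polynomials are coefficient lists: p[k] is the coefficient of x^k.

def trim(p):
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def add(p, q):
    n = max(len(p), len(q))
    p, q = p + [Fraction(0)] * (n - len(p)), q + [Fraction(0)] * (n - len(q))
    return trim([a + b for a, b in zip(p, q)])

def sub(p, q):
    return add(p, [-c for c in q])

def mul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def divmod_poly(p, q):
    """Polynomial long division: returns (quotient, remainder), deg r < deg q."""
    p, q = trim(p), trim(q)
    quot = [Fraction(0)] * max(1, len(p) - len(q) + 1)
    while len(p) >= len(q) and p != [Fraction(0)]:
        shift = len(p) - len(q)
        c = p[-1] / q[-1]
        quot[shift] = c
        p = sub(p, mul([Fraction(0)] * shift + [c], q))
    return trim(quot), p

def ext_gcd(a, b):
    """Extended Euclid: returns (g, s, t) with s*a + t*b = g."""
    if trim(b) == [Fraction(0)]:
        return trim(a), [Fraction(1)], [Fraction(0)]
    q, r = divmod_poly(a, b)
    g, s, t = ext_gcd(b, r)
    # g = s*b + t*r and r = a - q*b, hence g = t*a + (s - t*q)*b
    return g, t, sub(s, mul(t, q))

m = [Fraction(1), Fraction(0), Fraction(1)]   # x^2 + 1
P = [Fraction(2), Fraction(1)]                # x + 2 (coprime to x^2 + 1)
g, R, S = ext_gcd(m, P)
c = g[-1]                                     # normalise the gcd to 1
g, R, S = [x / c for x in g], [x / c for x in R], [x / c for x in S]
assert g == [Fraction(1)]
assert add(mul(R, m), mul(S, P)) == [Fraction(1)]   # 1 = R*(x^2+1) + S*P
```

For this example the algorithm produces {R(\mathrm{x}) = \frac{1}{5}} and {S(\mathrm{x}) = \frac{2-\mathrm{x}}{5}}, since {(\mathrm{x}^2+1) + (2-\mathrm{x})(\mathrm{x}+2) = 5}.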

Since {\mathrm{x}^2+1} is prime, the quotient space {{\bf C} = {\bf R}[\mathrm{x}] / \langle \mathrm{x}^2+1 \rangle} is an integral domain: there are no zero-divisors in {{\bf C}} other than zero. This brings us closer to the task of showing that {{\bf C}} is a field, but we are not quite there yet; note for instance that {{\bf R}[\mathrm{x}]} is an integral domain, but not a field. But one can finish up by using finite dimensionality. As {{\bf C}} is a ring containing the field {{\bf R}}, it is certainly a vector space over {{\bf R}}; as {{\bf C}} is generated by {{\bf R}} and {i}, and {i^2=-1}, we see that it is in fact a two-dimensional vector space over {{\bf R}}, spanned by {1} and {i} (which are linearly independent, as {i} clearly cannot be real). In particular, it is finite dimensional. For any non-zero {z \in {\bf C}}, the multiplication map {w \mapsto zw} is an {{\bf R}}-linear map from this finite-dimensional vector space to itself. As {{\bf C}} is an integral domain, this map is injective; by finite-dimensionality, it is therefore surjective (by the rank-nullity theorem). In particular, there exists {w} such that {zw = 1}, and hence {z} is invertible and {{\bf C}} is a field. This concludes the construction of a complex field {{\bf C}}.
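The rank-nullity argument can be made concrete: in the basis {1, i}, multiplication by {z = a+bi} is the {2 \times 2} real matrix {\begin{pmatrix} a & -b \\ b & a \end{pmatrix}}, and solving {zw = 1} as a linear system recovers the familiar inverse formula. (A sketch of ours, not a computation carried out in the notes.)

```python
def inverse(a, b):
    """Solve z*w = 1 for w = c + di, i.e. the 2x2 real linear system
         a*c - b*d = 1
         b*c + a*d = 0.
    The determinant a^2 + b^2 is nonzero precisely when z != 0
    (injectivity of w -> zw), matching the rank-nullity argument."""
    det = a * a + b * b
    if det == 0:
        raise ZeroDivisionError("0 has no inverse")
    # Cramer's rule: c = a/det, d = -b/det
    return (a / det, -b / det)

c, d = inverse(3.0, 4.0)   # 1/(3+4i) = 3/25 - (4/25)i
print(c, d)
```

The formula obtained, {z^{-1} = \frac{a - bi}{a^2+b^2}}, is of course the same one that later follows from complex conjugation and the modulus.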

Remark 2 One can think of the action of passing from a ring {R} to a quotient {R/I} by some ideal {I} as the action of forcing some relations to hold between the various elements of {R}, by requiring all the elements of the ideal {I} (or equivalently, all the generators of {I}) to vanish. Thus one can think of {{\bf R}[\mathrm{x}] / \langle \mathrm{x}^2 + 1 \rangle} as the ring formed by adjoining a new element {i} to the existing ring {{\bf R}} and then demanding the constraint {i^2+1=0}. With this perspective, the main issues to check in order to obtain a complex field are firstly that these relations do not collapse the ring so much that two previously distinct elements of {{\bf R}} become equal, and secondly that all the non-zero elements become invertible once the relations are imposed, so that we obtain a field rather than merely a ring or integral domain.

Remark 3 It is instructive to compare the complex field {{\bf R}[\mathrm{x}] / \langle \mathrm{x}^2 + 1 \rangle}, formed by adjoining the square root of {-1} to the reals, with other commutative rings such as the dual numbers {{\bf R}[\mathrm{x}] / \langle \mathrm{x}^2 \rangle} (which adjoins an additional square root of {0} to the reals) or the split complex numbers {{\bf R}[\mathrm{x}] / \langle \mathrm{x}^2 - 1 \rangle} (which adjoins a new root of {+1} to the reals). The latter two objects are perfectly good rings, but are not fields (they contain zero divisors, and the first ring even contains a nilpotent). This is ultimately due to the reducible nature of the polynomials {\mathrm{x}^2} and {\mathrm{x}^2-1} in {{\bf R}[\mathrm{x}]}.

Uniqueness of {{\bf C}} up to isomorphism is a straightforward exercise:

Exercise 4 (Uniqueness up to isomorphism) Suppose that one has two complex fields {{\bf C} = ({\bf C},i)} and {{\bf C}' = ({\bf C}',i')}. Show that there is a unique field isomorphism {\iota: {\bf C} \rightarrow {\bf C}'} that maps {i} to {i'} and is the identity on {{\bf R}}.

Now that we have existence and uniqueness up to isomorphism, it is safe to designate one of the complex fields {{\bf C} = ({\bf C},i)} as the complex field; the other complex fields out there will no longer be of much importance in this course (or indeed, in most of mathematics), with one small exception that we will get to later in this section. One can, if one wishes, use the above abstract algebraic construction {( {\bf R}[\mathrm{x}] / \langle \mathrm{x}^2+1 \rangle, \mathrm{x} + \langle \mathrm{x}^2 + 1 \rangle)} as the choice for “the” complex field {{\bf C}}, but one can certainly pick other choices if desired (e.g. the Argand plane construction in Remark 7 below). But in view of Exercise 4, the precise construction of {{\bf C}} is not terribly relevant for the purposes of actually doing complex analysis, much as the question of whether to construct the real numbers {{\bf R}} using Dedekind cuts, equivalence classes of Cauchy sequences, or some other construction is not terribly relevant for the purposes of actually doing real analysis. So, from here on out, we will no longer refer to the precise construction of {{\bf C}} used; the reader may certainly substitute his or her own favourite construction of {{\bf C}} in place of {{\bf R}[\mathbf{x}] / \langle {\mathbf x}^2 + 1 \rangle} if desired, with essentially no change to the rest of the lecture notes.

Exercise 5 Let {k} be an arbitrary field, let {k[\mathrm{x}]} be the ring of polynomials with coefficients in {k}, and let {P(\mathrm{x})} be an irreducible polynomial in {k[\mathrm{x}]} of degree at least two. Show that {k[\mathrm{x}] / \langle P(\mathrm{x}) \rangle} is a field containing an embedded copy of {k}, as well as a root {\alpha} of the equation {P(\alpha)=0}, and that this field is generated by {k} and {\alpha}. Also show that all such fields are unique up to isomorphism. (This field {k[\mathrm{x}] / \langle P(\mathrm{x}) \rangle} is an example of a field extension of {k}, the further study of which can be found in any advanced undergraduate or early graduate text on algebra, and is the starting point in particular for the beautiful topic of Galois theory, which we will not discuss here.)

Exercise 6 Let {k} be an arbitrary field. Show that every non-constant polynomial {P(\mathrm{x})} in {k[\mathrm{x}]} can be factored as the product {P_1(\mathrm{x}) \dots P_r(\mathrm{x})} of irreducible non-constant polynomials. Furthermore show that this factorisation is unique up to permutation of the factors {P_1(\mathrm{x}),\dots,P_r(\mathrm{x})}, and multiplication of each of the factors by a constant (with the product of all such constants being one). In other words: the polynomial ring {k[\mathrm{x}]} is a unique factorisation domain.

Remark 7 (Real and imaginary coordinates) As a complex field {{\bf C}} is spanned (over {{\bf R}}) by the linearly independent elements {1} and {i}, we can write

\displaystyle  {\bf C} = \{ a + b i: a,b \in {\bf R} \}

with each element {z} of {{\bf C}} having a unique representation of the form {a+bi}, thus

\displaystyle  a+bi = c+di \iff a=c \hbox{ and } b=d

for real {a,b,c,d}. The addition, subtraction, and multiplication operations can then be written down explicitly in these coordinates as

\displaystyle  (a+bi) + (c+di) = (a+c) + (b+d)i

\displaystyle  (a+bi) - (c+di) = (a-c) + (b-d)i

\displaystyle  (a+bi) (c+di) = (ac-bd) + (ad+bc)i

and with a bit more work one can compute the division operation as

\displaystyle  \frac{a+bi}{c+di} = \frac{ac+bd}{c^2+d^2} + \frac{bc-ad}{c^2+d^2} i

if {c+di \neq 0}. One could take these coordinate representations as the definition of the complex field {{\bf C}} and its basic arithmetic operations, and this is indeed done in many texts introducing the complex numbers. In particular, one could take the Argand plane {({\bf R}^2, (0,1))} as the choice of complex field, where we identify each point {(a,b)} in {{\bf R}^2} with {a+bi} (so for instance {{\bf R}^2} becomes endowed with the multiplication operation {(a,b) (c,d) = (ac-bd,ad+bc)}). This is a very concrete and direct way to construct the complex numbers; the main drawback is that it is not immediately obvious that the field axioms are all satisfied. For instance, the associativity of multiplication is rather tedious to verify in the coordinates of the Argand plane. In contrast, the more abstract algebraic construction of the complex numbers given above makes it more evident what the source of the field structure on {{\bf C}} is, namely the irreducibility of the polynomial {\mathrm{x}^2+1}.
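While associativity is tedious to verify by hand in Argand-plane coordinates, it is at least easy to sanity-check numerically: the following sketch (ours, not from the notes) tests the multiplication law {(a,b)(c,d) = (ac-bd, ad+bc)} on random triples and compares it against Python's built-in complex type. This is a numerical check, of course, not a proof.

```python
import random

def mul(p, q):
    # the Argand-plane multiplication law (a,b)(c,d) = (ac - bd, ad + bc)
    (a, b), (c, d) = p, q
    return (a * c - b * d, a * d + b * c)

random.seed(0)
for _ in range(100):
    z, w, u = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(3)]
    # associativity: (zw)u == z(wu), up to floating-point rounding
    assert all(abs(x - y) < 1e-9
               for x, y in zip(mul(mul(z, w), u), mul(z, mul(w, u))))
    # agreement with the built-in complex multiplication
    assert abs(complex(*mul(z, w)) - complex(*z) * complex(*w)) < 1e-9
```

The abstract quotient construction explains *why* these identities hold (they are inherited from the ring {{\bf R}[\mathrm{x}]}); the coordinate check merely confirms them on samples.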

Remark 8 Because of the Argand plane construction, we will sometimes refer to the space {{\bf C}} of complex numbers as the complex plane. We should warn, though, that in some areas of mathematics, particularly in algebraic geometry, {{\bf C}} is viewed as a one-dimensional complex vector space (or a one-dimensional complex manifold or complex variety), and so {{\bf C}} is sometimes referred to in those cases as a complex line. (Similarly, Riemann surfaces, which from a real point of view are two-dimensional surfaces, can sometimes be referred to as complex curves in the literature; the modular curve is a famous instance of this.) In this current course, though, the topological notion of dimension turns out to be more important than the algebraic notions of dimension, and as such we shall generally refer to {{\bf C}} as a plane rather than a line.

Elements of {{\bf C}} of the form {bi} for {b} real are known as purely imaginary numbers; the terminology is colourful, but despite the name, imaginary numbers have precisely the same first-class mathematical object status as real numbers. If {z=a+bi} is a complex number, the real components {a,b} of {z} are known as the real part {\mathrm{Re}(z)} and imaginary part {\mathrm{Im}(z)} of {z} respectively. Complex numbers that are not real are occasionally referred to as strictly complex numbers. In the complex plane, the set {{\bf R}} of real numbers forms the real axis, and the set {i{\bf R}} of imaginary numbers forms the imaginary axis. Traditionally, elements of {{\bf C}} are denoted with symbols such as {z}, {w}, or {\zeta}, while symbols such as {a,b,c,d,x,y} are typically intended to represent real numbers instead.

Remark 9 We noted earlier that the equation {x^2+1=0} had no solutions in the reals because {x^2+1} was always positive. In other words, the properties of the order relation {<} on {{\bf R}} prevented the existence of a root for the equation {x^2+1=0}. As {{\bf C}} does have a root for {x^2+1=0}, this means that the complex numbers {{\bf C}} cannot be ordered in the same way that the reals are ordered (that is to say, being totally ordered, with the positive numbers closed under both addition and multiplication). Indeed, one usually refrains from putting any order structure on the complex numbers, so that statements such as {z < w} for complex numbers {z,w} are left undefined (unless {z,w} are real, in which case one can of course use the real ordering). In particular, the complex number {i} is considered to be neither positive nor negative, and an assertion such as {z<w} is understood to implicitly carry with it the claim that {z,w} are real numbers and not just complex numbers. (Of course, if one really wants to, one can find some total orderings to place on {{\bf C}}, e.g. lexicographical ordering on the real and imaginary parts. However, such orderings do not interact too well with the algebraic structure of {{\bf C}} and are rarely useful in practice.)

As with any other field, we can raise a complex number {z} to a non-negative integer power {n} by declaring inductively {z^0 := 1} and {z^{n+1} := z \times z^n} for {n \geq 0}; in particular we adopt the usual convention that {0^0=1} (when thinking of the base {0} as a complex number, and the exponent {0} as a non-negative integer). For negative integers {n = -m}, we define {z^n := 1/z^m} for non-zero {z}; we leave {z^n} undefined when {z} is zero and {n} is negative. At the present time we do not attempt to define {z^\alpha} for any exponent {\alpha} other than an integer; we will return to such exponentiation operations later in the course, though we will at least define the complex exponential {e^z} for any complex {z} later in this set of notes.
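The inductive definition of integer powers transcribes directly into code; here is a sketch of ours, with Python's built-in complex type standing in for {{\bf C}}:

```python
def cpow(z, n):
    """z^n for integer n, following the inductive definition:
    z^0 := 1, z^(n+1) := z * z^n, and z^(-m) := 1 / z^m for z != 0."""
    if n == 0:
        return 1 + 0j          # in particular 0^0 = 1 by convention
    if n > 0:
        return z * cpow(z, n - 1)
    if z == 0:
        raise ZeroDivisionError("z^n is undefined for z = 0 and n < 0")
    return 1 / cpow(z, -n)

print(cpow(1j, 2))     # i^2 = -1
```
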

By definition, a complex field {({\bf C},i)} is a field {{\bf C}} together with a root {z=i} of the equation {z^2+1=0}. But if {z=i} is a root of the equation {z^2+1=0}, then so is {z=-i} (indeed, from the factorisation {z^2+1 = (z-i) (z+i)} we see that these are the only two roots of this quadratic equation). Thus we have another complex field {\overline{{\bf C}} := ({\bf C},-i)} which differs from {{\bf C}} only in the choice of root {i}. By Exercise 4, there is a unique field isomorphism from {{\bf C}} to {{\bf C}} that maps {i} to {-i} (i.e. a complex field isomorphism from {{\bf C}} to {\overline{{\bf C}}}); this operation is known as complex conjugation and is denoted {z \mapsto \overline{z}}. In coordinates, we have

\displaystyle  \overline{a+bi} = a-bi.

Being a field isomorphism, we have in particular that

\displaystyle  \overline{z+w} = \overline{z} + \overline{w}

and
\displaystyle  \overline{zw} = \overline{z} \overline{w}

for all complex numbers {z,w}. It is also clear that complex conjugation fixes the real numbers, and only the real numbers: {z = \overline{z}} if and only if {z} is real. Geometrically, complex conjugation is the operation of reflection in the complex plane across the real axis. It is clearly an involution in the sense that it is its own inverse:

\displaystyle  \overline{\overline{z}} = z.

One can also relate the real and imaginary parts to complex conjugation via the identities

\displaystyle  \mathrm{Re}(z) = \frac{z + \overline{z}}{2}; \quad \mathrm{Im}(z) = \frac{z-\overline{z}}{2i}. \ \ \ \ \ (1)
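These conjugation identities are easy to spot-check numerically with Python's built-in `complex` type (a sanity check at a sample point, of course, not a proof):

```python
# Spot-check of the conjugation identities and of (1), using the
# built-in conjugate() method of Python's complex type.
z, w = 3 + 4j, 1 - 2j

assert (z + w).conjugate() == z.conjugate() + w.conjugate()
assert (z * w).conjugate() == z.conjugate() * w.conjugate()
assert z.conjugate().conjugate() == z      # conjugation is an involution
# Identities (1): real and imaginary parts via conjugation.
assert (z + z.conjugate()) / 2 == z.real
assert (z - z.conjugate()) / (2j) == z.imag
```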

Remark 10 Any field automorphism of {{\bf C}} has to map {i} to a root of {z^2+1=0}, and so the only field automorphisms of {{\bf C}} that preserve the real line are the identity map and the conjugation map; conversely, the real line is the subfield of {{\bf C}} fixed by both of these automorphisms. In the language of Galois theory, this means that {{\bf C}} is a Galois extension of {{\bf R}}, with Galois group {\mathrm{Gal}({\bf C}/{\bf R})} consisting of two elements. There is a certain sense in which one can think of the complex numbers (or more precisely, the scheme {{\mathcal C}} of complex numbers) as a double cover of the real numbers (or more precisely, the scheme {{\mathcal R}} of real numbers), analogous to how the boundary of a Möbius strip can be viewed as a double cover of the unit circle formed by shrinking the width of the strip to zero. (In this analogy, points on the unit circle correspond to specific models of the real number system {{\bf R}}, and lying above each such point are two specific models {({\bf C}, i)}, {({\bf C},-i)} of the complex number system; this analogy can be made precise using Grothendieck’s “functor of points” interpretation of schemes.) The operation of complex conjugation is then analogous to the operation of monodromy caused by looping once around the base unit circle, causing the two complex fields sitting above a real field to swap places with each other. (This analogy is not quite perfect, by the way, because the boundary of a Möbius strip is not simply connected and can in turn be finitely covered by other curves, whereas the complex numbers are algebraically complete and admit no further finite extensions; one should really replace the unit circle here by something with a two-element fundamental group, such as the projective plane {\mathbf{RP}^2} that is double covered by the sphere {S^2}, but this is harder to visualize.) 
The analogy between (absolute) Galois groups and fundamental groups suggested by this picture can be made precise in scheme theory by introducing the concept of an étale fundamental group, which unifies the two concepts, but developing this further is well beyond the scope of this course; see this book of Szamuely for further discussion.

Observe that if we multiply a complex number {z} by its complex conjugate {\overline{z}}, we obtain a quantity {N(z) := z \overline{z}} which is invariant with respect to conjugation (i.e. {\overline{N(z)} = N(z)}) and is therefore real. The map {N: {\bf C} \rightarrow {\bf R}} produced this way is known in field theory as the norm form of {{\bf C}} over {{\bf R}}; it is clearly multiplicative in the sense that {N(zw) = N(z) N(w)}, and is only zero when {z} is zero. It can be used to link multiplicative inversion with complex conjugation, in that we clearly have

\displaystyle  \frac{1}{z} = \frac{\overline{z}}{N(z)} \ \ \ \ \ (2)

for any non-zero complex number {z}. In coordinates, we have

\displaystyle  N(a+bi) = (a+bi) (a-bi) = a^2+b^2

(thus recovering, by the way, the inversion formula {\frac{1}{a+bi} = \frac{a}{a^2+b^2} - \frac{b}{a^2+b^2} i} implicit in Remark 7). In coordinates, the multiplicativity {N(zw) = N(z) N(w)} takes the form of Lagrange’s identity

\displaystyle  (ac-bd)^2 + (ad+bc)^2 = (a^2+b^2) (c^2+d^2)
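A quick numerical instance of Lagrange's identity (again a spot-check rather than a proof):

```python
# Lagrange's identity is the coordinate form of the multiplicativity
# N(zw) = N(z) N(w) of the norm form; here both sides are computed
# exactly in integer arithmetic.
a, b, c, d = 2, 3, 5, 7
lhs = (a * c - b * d) ** 2 + (a * d + b * c) ** 2
rhs = (a ** 2 + b ** 2) * (c ** 2 + d ** 2)
assert lhs == rhs  # both equal N((a+bi)(c+di)) = N(a+bi) N(c+di)
```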

— 2. The geometry of the complex numbers —

The norm form {N} of the complex numbers has the feature of being positive definite: {N(z)} is always non-negative (and strictly positive when {z} is non-zero). This is a feature that is somewhat special to the complex numbers; for instance, the quadratic extension {{\bf Q}(\sqrt{2})} of the rationals {{\bf Q}} by {\sqrt{2}} has the norm form {N(n+m\sqrt{2}) = (n+m\sqrt{2}) (n-m\sqrt{2}) = n^2-2m^2}, which is indefinite. One can view this positive definiteness of the norm form as the one remaining vestige in {{\bf C}} of the order structure {<} on the reals, which as remarked previously is no longer present directly in the complex numbers. (One can also view the positive definiteness of the norm form as a consequence of the topological connectedness of the punctured complex plane {{\bf C} \backslash \{0\}}: the norm form is positive at {z=1}, and cannot change sign anywhere in {{\bf C} \backslash \{0\}}, so is forced to be positive on the rest of this connected region.)

One consequence of positive definiteness is that the bilinear form

\displaystyle  \langle z, w \rangle := \mathrm{Re}( z \overline{w} )

becomes a positive definite inner product on {{\bf C}} (viewed as a vector space over {{\bf R}}). In particular, this turns the complex numbers into an inner product space over the reals. From the usual theory of inner product spaces, we can then construct a norm

\displaystyle  |z| := \langle z, z \rangle^{1/2} = N(z)^{1/2}

(thus, the norm is the square root of the norm form) which obeys the triangle inequality

\displaystyle  |z+w| \leq |z| + |w|; \ \ \ \ \ (3)

(which implies the usual permutations of this inequality, such as {||z|-|w|| \leq |z-w| \leq |z| + |w|}), and from the multiplicativity of the norm form we also have

\displaystyle  |zw| = |z| |w| \ \ \ \ \ (4)

(and hence also {|z/w| = |z|/|w|} if {w} is non-zero) and from the involutive nature of complex conjugation we have

\displaystyle  |\overline{z}| = |z|. \ \ \ \ \ (5)

The norm {|\cdot|} clearly extends the absolute value operation {x \mapsto |x|} on the real numbers, and so we also refer to the norm {|z|} of a complex number {z} as its absolute value or magnitude. In coordinates, we have

\displaystyle  |a+bi| = \sqrt{a^2+b^2}, \ \ \ \ \ (6)

thus for instance {|i|=1}, and from (6) we also immediately have the useful inequalities

\displaystyle  |\mathrm{Re}(z)|, |\mathrm{Im}(z)| \leq |z| \leq |\mathrm{Re}(z)| + |\mathrm{Im}(z)|. \ \ \ \ \ (7)
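The basic norm inequalities can likewise be spot-checked with the built-in `abs`, which computes (6) for complex arguments:

```python
import math

# Spot-check of the triangle inequality (3), multiplicativity (4),
# the coordinate formula (6), and the bounds (7), at sample points.
z, w = 3 + 4j, -1 + 2j
assert abs(z + w) <= abs(z) + abs(w)                        # (3)
assert math.isclose(abs(z * w), abs(z) * abs(w))            # (4)
assert abs(z) == math.hypot(z.real, z.imag)                 # (6)
assert abs(z.real) <= abs(z) <= abs(z.real) + abs(z.imag)   # (7)
assert abs(z.imag) <= abs(z)
```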

As with any other normed vector space, the norm {z \mapsto |z|} defines a metric on the complex numbers via the definition

\displaystyle  d(z,w) := |z-w|.

Note that, using the Argand plane representation of {{\bf C}} as {{\bf R}^2}, this metric coincides with the usual Euclidean metric on {{\bf R}^2}. This metric in turn defines a topology on {{\bf C}} (generated in the usual manner by the open disks {D(z,r) := \{ w \in {\bf C}: |z-w| < r \}}), which in turn generates all the usual topological notions such as the concept of an open set, closed set, compact set, connected set, and boundary of a set; the notion of a limit of a sequence {z_n}; the notion of a continuous map, and so forth. For instance, a sequence {z_n} of complex numbers converges to a limit {z \in {\bf C}} if {|z_n -z| \rightarrow 0} as {n \rightarrow \infty}, and a map {f: {\bf C} \rightarrow {\bf C}} is continuous if one has {f(z_n) \rightarrow f(z)} whenever {z_n \rightarrow z}, or equivalently if the inverse image of any open set is open. Again, using the Argand plane representation, these notions coincide with their counterparts on the Euclidean plane {{\bf R}^2}.

As usual, if a sequence {z_n} of complex numbers converges to a limit {z}, we write {z = \lim_{n \rightarrow \infty} z_n}. From the triangle inequality (3) and the multiplicativity (4) we see that the addition operation {+: {\bf C} \times {\bf C} \rightarrow {\bf C}}, subtraction operation {-: {\bf C} \times {\bf C} \rightarrow {\bf C}}, and multiplication operation {\times: {\bf C} \times {\bf C} \rightarrow {\bf C}} are all continuous; thus we have the familiar limit laws

\displaystyle  \lim_{n \rightarrow \infty} (z_n + w_n) = \lim_{n \rightarrow \infty} z_n + \lim_{n \rightarrow \infty} w_n,

\displaystyle  \lim_{n \rightarrow \infty} (z_n - w_n) = \lim_{n \rightarrow \infty} z_n - \lim_{n \rightarrow \infty} w_n

and
\displaystyle  \lim_{n \rightarrow \infty} (z_n \cdot w_n) = \lim_{n \rightarrow \infty} z_n \cdot \lim_{n \rightarrow \infty} w_n

whenever the limits on the right-hand side exist. Similarly, from (5) we see that complex conjugation is an isometry of the complex numbers, thus

\displaystyle  \lim_{n \rightarrow \infty} \overline{z_n} = \overline{\lim_{n \rightarrow \infty} z_n}

when the limit on the right-hand side exists. As a consequence, the norm form {N: {\bf C} \rightarrow {\bf R}} and the absolute value {|\cdot|: {\bf C} \rightarrow {\bf R}} are also continuous, thus

\displaystyle  \lim_{n \rightarrow \infty} |z_n| = |\lim_{n \rightarrow \infty} z_n|

whenever the limit on the right-hand side exists. Using the formula (2) for the reciprocal of a complex number, we also see that division is a continuous operation as long as the denominator is non-zero, thus

\displaystyle  \lim_{n \rightarrow \infty} \frac{z_n}{w_n} = \frac{\lim_{n \rightarrow \infty} z_n}{\lim_{n \rightarrow \infty} w_n}

as long as the limits on the right-hand side exist, and the limit in the denominator is non-zero.

From (7) we see that

\displaystyle  z_n \rightarrow z \iff \mathrm{Re}(z_n) \rightarrow \mathrm{Re}(z) \hbox{ and } \mathrm{Im}(z_n) \rightarrow \mathrm{Im}(z);

in particular

\displaystyle  \lim_{n \rightarrow \infty} \mathrm{Re}(z_n) = \mathrm{Re} \lim_{n \rightarrow \infty} z_n

and
\displaystyle  \lim_{n \rightarrow \infty} \mathrm{Im}(z_n) = \mathrm{Im} \lim_{n \rightarrow \infty} z_n

whenever the limit on the right-hand side exists. One consequence of this is that {{\bf C}} is complete: every sequence {z_n} of complex numbers that is a Cauchy sequence (thus {|z_n-z_m| \rightarrow 0} as {n,m \rightarrow \infty}) converges to a unique complex limit {z}. (As such, one can view the complex numbers as a (very small) example of a Hilbert space.)

As with the reals, we have the fundamental fact that any formal series {\sum_{n=0}^\infty z_n} of complex numbers which is absolutely convergent, in the sense that the non-negative series {\sum_{n=0}^\infty |z_n|} is finite, is necessarily convergent to some complex number {S}, in the sense that the partial sums {\sum_{n=0}^N z_n} converge to {S} as {N \rightarrow \infty}. This is because the triangle inequality ensures that the partial sums are a Cauchy sequence. As usual we write {S = \sum_{n=0}^\infty z_n} to denote the assertion that {S} is the limit of the partial sums {\sum_{n=0}^N z_n}. We will occasionally have need to deal with series that are only conditionally convergent rather than absolutely convergent, but in most of our applications the only series we will actually evaluate are the absolutely convergent ones. Many of the limit laws imply analogues for series, thus for instance

\displaystyle  \sum_{n=0}^\infty \mathrm{Re}(z_n) = \mathrm{Re} \sum_{n=0}^\infty z_n

whenever the series on the right-hand side is absolutely convergent (or even just convergent). We will not write down an exhaustive list of such series laws here.
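As a concrete illustration, the geometric series gives an absolutely convergent example that can be tested numerically:

```python
# For |z| < 1 the geometric series sum_{n>=0} z^n is absolutely
# convergent with sum 1/(1-z), and taking real parts term-by-term
# commutes with the sum, as in the series law above.  Here |z| < 1,
# so 200 terms already give full floating-point accuracy.
z = 0.5 + 0.3j
partial = sum(z ** n for n in range(200))
limit = 1 / (1 - z)
assert abs(partial - limit) < 1e-12
assert abs(sum((z ** n).real for n in range(200)) - limit.real) < 1e-12
```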

An important role in complex analysis is played by the unit circle

\displaystyle  S^1 := \{ z \in {\bf C}: |z|=1 \}.

In coordinates, this is the set of points {a+bi} for which {a^2+b^2=1}, and so this indeed has the geometric structure of a unit circle. Elements of the unit circle will be referred to in these notes as phases. Every non-zero complex number {z} has a unique polar decomposition as {z = r \omega} where {r>0} is a positive real and {\omega} lies on the unit circle {S^1}. Indeed, it is easy to see that this decomposition is given by {r = |z|} and {\omega = \frac{z}{|z|}}, and that this is the only polar decomposition of {z}. We refer to the polar components {r=|z|} and {\omega = z/|z|} of a non-zero complex number {z} as the magnitude and phase of {z} respectively.

From (4) we see that the unit circle {S^1} is a multiplicative group; it contains the multiplicative identity {1}, and if {z, w} lie in {S^1}, then so do {zw} and {1/z}. From (2) we see that reciprocation and complex conjugation agree on the unit circle, thus

\displaystyle  \frac{1}{z} = \overline{z}

for {z \in S^1}. It is worth emphasising that this useful identity does not hold once one leaves the unit circle, in which case one must use the more general formula (2) instead! If {z_1,z_2} are non-zero complex numbers with polar decompositions {z_1 = r_1 \omega_1} and {z_2 = r_2 \omega_2} respectively, then clearly the polar decompositions of {z_1 z_2} and {z_1/z_2} are given by {z_1 z_2 = (r_1 r_2) (\omega_1 \omega_2)} and {z_1/z_2 = (r_1/r_2) (\omega_1/\omega_2)} respectively. Thus polar coordinates are very convenient for performing complex multiplication, although they turn out to be atrocious for performing complex addition. (This can be contrasted with the usual Cartesian coordinates {z=a+bi}, which are very convenient for performing complex addition and mildly inconvenient for performing complex multiplication.) In the language of group theory, the polar decomposition splits the multiplicative complex group {{\bf C}^\times = ({\bf C} \backslash \{0\}, \times)} as the direct product of the positive reals {(0,+\infty)} and the unit circle {S^1}: {{\bf C}^\times \equiv (0,+\infty) \times S^1}.
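Numerically, the polar decomposition corresponds to the standard library pair `cmath.polar` / `cmath.rect` (which parameterise the phase by an angle, anticipating the discussion of arguments below); a quick sketch:

```python
import cmath

# Polar decomposition z = r * omega with r = |z| and omega = z/|z| a
# phase; cmath.polar returns (r, theta) with omega = e^{i theta}, and
# cmath.rect inverts it.
z = 3 + 4j
r, omega = abs(z), z / abs(z)
assert abs(abs(omega) - 1) < 1e-15   # omega lies on the unit circle
assert abs(r * omega - z) < 1e-12    # z = r * omega

r2, theta = cmath.polar(z)
assert abs(r2 - r) < 1e-12
assert abs(cmath.rect(r2, theta) - z) < 1e-12
```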

If {\omega} is an element of the unit circle {S^1}, then from (4) we see that the operation {z \mapsto \omega z} of multiplication by {\omega} is an isometry of {{\bf C}}, in the sense that

\displaystyle  |\omega z - \omega w| = |z-w|

for all complex numbers {z, w}. This isometry also preserves the origin {0}. As such, it is geometrically obvious (see Exercise 11 below) that the map {z \mapsto \omega z} must either be a rotation around the origin, or a reflection around a line. The former operation is orientation preserving, and the latter is orientation reversing. Since the map {z \mapsto \omega z} is clearly orientation preserving when {\omega = 1}, and the unit circle {S^1} is connected, a continuity argument shows that {z \mapsto \omega z} must be orientation preserving for all {\omega \in S^1}, and so must be a rotation around the origin by some angle. Of course, by trigonometry, we may write

\displaystyle  \omega = \cos \theta + i \sin \theta

for some real number {\theta}. The rotation {z \mapsto \omega z} clearly maps the number {1} to the number {\cos \theta + i \sin \theta}, and so the rotation must be a counter-clockwise rotation by {\theta} (adopting the usual convention of placing {1} to the right of the origin and {i} above it). In particular, when applying this rotation {z \mapsto \omega z} to another point {\cos \phi + i \sin \phi} on the unit circle, this point must get rotated to {\cos(\theta+\phi) + i \sin(\theta+\phi)}. We have thus given a geometric proof of the multiplication formula

\displaystyle  (\cos(\theta + \phi) + i \sin(\theta + \phi)) = (\cos \theta + i \sin \theta) (\cos \phi + i \sin \phi); \ \ \ \ \ (8)

taking real and imaginary parts, we recover the familiar trigonometric addition formulae

\displaystyle  \cos(\theta+\phi) = \cos \theta \cos \phi - \sin \theta \sin \phi

\displaystyle  \sin(\theta+\phi) = \sin \theta \cos \phi + \cos \theta \sin \phi.

We can also iterate the multiplication formula to give de Moivre’s formula

\displaystyle  \cos( n \theta ) + i \sin(n \theta) = (\cos \theta + i \sin \theta)^n

for any natural number {n} (or indeed for any integer {n}), which can in turn be used to recover familiar identities such as the double angle formulae

\displaystyle  \cos(2\theta) = \cos^2 \theta - \sin^2 \theta

\displaystyle  \sin(2\theta) = 2 \sin \theta \cos \theta

or triple angle formulae

\displaystyle  \cos(3\theta) = \cos^3 \theta - 3 \sin^2 \theta \cos \theta

\displaystyle  \sin(3\theta) = 3 \sin \theta \cos^2 \theta - \sin^3 \theta

after expanding out de Moivre’s formula for {n=2} or {n=3} and taking real and imaginary parts.
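A numerical spot-check of de Moivre's formula and the triple angle formulae (at one sample angle, so a sanity check rather than a derivation):

```python
import math

# De Moivre's formula for n = 3, checked against the triple angle
# formulae obtained by taking real and imaginary parts.
theta = 0.7
lhs = complex(math.cos(3 * theta), math.sin(3 * theta))
rhs = complex(math.cos(theta), math.sin(theta)) ** 3
assert abs(lhs - rhs) < 1e-12

c, s = math.cos(theta), math.sin(theta)
assert math.isclose(math.cos(3 * theta), c ** 3 - 3 * s ** 2 * c)
assert math.isclose(math.sin(3 * theta), 3 * s * c ** 2 - s ** 3)
```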

Exercise 11

Every non-zero complex number {z} can now be written in polar form as

\displaystyle  z = r (\cos(\theta) + i \sin(\theta)) \ \ \ \ \ (9)

with {r>0} and {\theta \in {\bf R}}; we refer to {\theta} as an argument of {z}, which can be interpreted as an angle of counterclockwise rotation needed to rotate the positive real axis to a position that contains {z}. The argument is not quite unique, due to the periodicity of sine and cosine: if {\theta} is an argument of {z}, then so is {\theta + 2\pi k} for any integer {k}, and conversely these are all the possible arguments that {z} can have. The set of all such arguments will be denoted {\mathrm{arg}(z)}; it is a coset of the discrete group {2\pi {\bf Z} := \{ 2\pi k: k \in {\bf Z}\}}, and can thus be viewed as an element of the {1}-torus {{\bf R}/2\pi{\bf Z}}.

The operation {w \mapsto zw} of multiplying a complex number {w} by a given non-zero complex number {z} now has a very appealing geometric interpretation when expressing {z} in polar coordinates (9): this operation is the composition of the operation of dilation by {r} around the origin, and counterclockwise rotation by {\theta} around the origin. For instance, multiplication by {i} performs a counter-clockwise rotation by {\pi/2} around the origin, while multiplication by {-i} performs instead a clockwise rotation by {\pi/2}. As complex multiplication is commutative and associative, it does not matter in which order one performs the dilation and rotation operations. Similarly, using Cartesian coordinates, we see that the operation {w \mapsto z+w} of adding a given complex number {z} to a complex number {w} is simply a spatial translation by a displacement of {z}. The multiplication operation need not be isometric (due to the presence of the dilation {r}), but observe that both the addition and multiplication operations are conformal (angle-preserving) and also orientation-preserving (a counterclockwise loop will transform to another counterclockwise loop, and similarly for clockwise loops). As we shall see later, these conformal and orientation-preserving properties of the addition and multiplication maps will extend to the larger class of complex differentiable maps (at least outside of critical points of the map), and are an important aspect of the geometry of such maps.

Remark 12 One can also interpret the operations of complex arithmetic geometrically on the Argand plane as follows. As the addition law on {{\bf C}} coincides with the vector addition law on {{\bf R}^2}, addition and subtraction of complex numbers is given by the usual parallelogram rule for vector addition; thus, to add a complex number {z} to another {w}, we can translate the complex plane until the origin {0} gets mapped to {z}, and then {w} gets mapped to {z+w}; conversely, subtraction by {z} corresponds to translating {z} back to {0}. Similarly, to multiply a complex number {z} with another {w}, we can dilate and rotate the complex plane around the origin until {1} gets mapped to {z}, and then {w} will be mapped to {zw}; conversely, division by {z} corresponds to dilating and rotating {z} back to {1}.

When performing computations, it is convenient to restrict the argument {\theta} of a non-zero complex number {z} to lie in a fundamental domain of the {1}-torus {{\bf R}/2\pi{\bf Z}}, such as the half-open interval {\{ \theta: 0 \leq \theta < 2\pi \}} or {\{ \theta: -\pi < \theta \leq \pi \}}, in order to recover a unique parameterisation (at the cost of creating a branch cut at one point of the unit circle). Traditionally, the fundamental domain that is most often used is the half-open interval {\{ \theta: -\pi < \theta \leq \pi \}}. The unique argument of {z} that lies in this interval is called the standard argument of {z} and is denoted {\mathrm{Arg}(z)}, and {\mathrm{Arg}} is called the standard branch of the argument function. Thus for instance {\mathrm{Arg}(1)=0}, {\mathrm{Arg}(i) = \pi/2}, {\mathrm{Arg}(-1) = \pi}, and {\mathrm{Arg}(-i) = -\pi/2}. Observe that the standard branch of the argument has a discontinuity on the negative real axis {\{ x \in {\bf R}: x \leq 0\}}, which is the branch cut of this branch. Changing the fundamental domain used to define a branch of the argument can move the branch cut around, but cannot eliminate it completely, due to non-trivial monodromy (if one continuously loops once counterclockwise around the origin, and varies the argument continuously as one does so, the argument will increment by {2\pi}, and so no branch of the argument function can be continuous at every point on the loop).
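Python's `cmath.phase` computes an argument in {[-\pi,\pi]}, agreeing with the standard argument at the sample points below; one can also observe the branch cut numerically:

```python
import cmath
import math

# cmath.phase agrees with the standard argument Arg at these points.
assert cmath.phase(1) == 0.0
assert math.isclose(cmath.phase(1j), math.pi / 2)
assert math.isclose(cmath.phase(-1), math.pi)
assert math.isclose(cmath.phase(-1j), -math.pi / 2)

# The branch cut on the negative real axis: the argument jumps by
# (roughly) 2*pi as one crosses the axis.
assert cmath.phase(complex(-1.0, 1e-12)) > 3.0
assert cmath.phase(complex(-1.0, -1e-12)) < -3.0
```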

The multiplication formula (8) resembles the multiplication formula

\displaystyle  \exp( x + y) = \exp( x ) \exp( y ) \ \ \ \ \ (10)

for the real exponential function {\exp: {\bf R} \rightarrow {\bf R}}. The two formulae can be unified through the famous Euler formula involving the complex exponential {\exp: {\bf C} \rightarrow {\bf C}}. There are many ways to define the complex exponential. Perhaps the most natural is through the ordinary differential equation {\frac{d}{dz} \exp(z) = \exp(z)} with boundary condition {\exp(0)=1}. However, as we have not yet set up a theory of complex differentiation, we will proceed (at least temporarily) through the device of Taylor series. Recalling that the real exponential function {\exp: {\bf R} \rightarrow {\bf R}} has the Taylor expansion

\displaystyle  \exp(x) = \sum_{n=0}^\infty \frac{x^n}{n!}

\displaystyle  = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots

which is absolutely convergent for any real {x}, one is led to define the complex exponential function {\exp: {\bf C} \rightarrow {\bf C}} by the analogous expansion

\displaystyle  \exp(z) = \sum_{n=0}^\infty \frac{z^n}{n!} \ \ \ \ \ (11)

\displaystyle  = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \dots

noting from (4) that the absolute convergence of the real exponential {\exp(x)} for any {x \in {\bf R}} implies the absolute convergence of the complex exponential for any {z \in {\bf C}}. We also frequently write {e^z} for {\exp(z)}. The multiplication formula (10) for the real exponential extends to the complex exponential:

Exercise 13 Use the binomial theorem and Fubini’s theorem for (complex) doubly infinite series to conclude that

\displaystyle  \exp(z+w) = \exp(z) \exp(w) \ \ \ \ \ (12)

for any complex numbers {z,w}.
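One can sketch the series definition (11) directly in Python (the helper name `exp_series` is ours) and spot-check it, together with (12), against the library exponential:

```python
import cmath

def exp_series(z: complex, terms: int = 60) -> complex:
    """Truncation of the Taylor series (11) for exp(z); the series
    converges absolutely for every complex z, so a modest number of
    terms suffices for moderate |z|."""
    total, term = 0j, 1 + 0j
    for n in range(terms):
        total += term
        term *= z / (n + 1)  # turns z^n/n! into z^{n+1}/(n+1)!
    return total

z, w = 1 + 2j, -0.5 + 0.3j
assert abs(exp_series(z) - cmath.exp(z)) < 1e-12
# Numerical spot-check of the multiplication formula (12).
assert abs(exp_series(z + w) - exp_series(z) * exp_series(w)) < 1e-12
```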

If one compares the Taylor series for {\exp(z)} with the familiar Taylor expansions

\displaystyle  \sin(x) = \sum_{n=0}^\infty (-1)^n \frac{x^{2n+1}}{(2n+1)!}

\displaystyle  = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots

and
\displaystyle  \cos(x) = \sum_{n=0}^\infty (-1)^n \frac{x^{2n}}{(2n)!}

\displaystyle  = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots

for the (real) sine and cosine functions, one obtains Euler’s formula

\displaystyle  e^{i x} = \cos x + i \sin x \ \ \ \ \ (13)

for any real number {x}; in particular we have the famous identities

\displaystyle  e^{2\pi i} = 1 \ \ \ \ \ (14)

and
\displaystyle  e^{\pi i} + 1 = 0. \ \ \ \ \ (15)
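These identities are easy to confirm numerically (up to floating-point error):

```python
import cmath
import math

# Euler's formula (13) at a sample point, and the identities (14), (15).
x = 0.9
assert abs(cmath.exp(1j * x) - complex(math.cos(x), math.sin(x))) < 1e-12
assert abs(cmath.exp(2j * math.pi) - 1) < 1e-12   # (14)
assert abs(cmath.exp(1j * math.pi) + 1) < 1e-12   # (15)
```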

We now see that the multiplication formula (8) can be written as a special form

\displaystyle  e^{i(\theta + \phi)} = e^{i\theta} e^{i\phi}

of (12); similarly, de Moivre’s formula takes the simple and intuitive form

\displaystyle  e^{i n \theta} = (e^{i\theta})^n.

From (12) and (13) we also see that the exponential function basically transforms Cartesian coordinates to polar coordinates:

\displaystyle  \exp( x+iy ) = e^x ( \cos y + i \sin y ).

Later on in the course we will study (the various branches of) the logarithm function that inverts the complex exponential, thus converting polar coordinates back to Cartesian ones.

From (13) and (1), together with the easily verified identity

\displaystyle  \overline{e^{ix}} = e^{-ix},

we see that we can recover the trigonometric functions {\sin(x), \cos(x)} from the complex exponential by the formulae

\displaystyle  \sin(x) = \frac{e^{ix} - e^{-ix}}{2i}; \quad \cos(x) = \frac{e^{ix} + e^{-ix}}{2}. \ \ \ \ \ (16)

(Indeed, if one wished, one could take these identities as the definition of the sine and cosine functions, giving a purely analytic way to construct these trigonometric functions.) From these identities one can derive all the usual trigonometric identities from the basic properties of the exponential (and in particular (12)). For instance, using a little bit of high school algebra we can prove the familiar identity

\displaystyle  \sin^2(x) + \cos^2(x) = 1

from (16):

\displaystyle  \sin^2(x) + \cos^2(x) = \frac{(e^{ix}-e^{-ix})^2}{(2i)^2} + \frac{(e^{ix} + e^{-ix})^2}{2^2}

\displaystyle  = \frac{e^{2ix} - 2 + e^{-2ix}}{-4} + \frac{e^{2ix} + 2 + e^{-2ix}}{4}

\displaystyle  = 1.
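The same computation can be spot-checked in floating point via (16):

```python
import cmath
import math

# Recover sin and cos from the complex exponential via (16), and check
# the Pythagorean identity just derived, at a sample point.
x = 1.3
s = (cmath.exp(1j * x) - cmath.exp(-1j * x)) / 2j
c = (cmath.exp(1j * x) + cmath.exp(-1j * x)) / 2
assert abs(s - math.sin(x)) < 1e-12
assert abs(c - math.cos(x)) < 1e-12
assert abs(s * s + c * c - 1) < 1e-12
```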

Thus, in principle at least, one no longer has a need to memorize all the different trigonometric identities out there, since they can now all be unified as consequences of just a handful of basic identities for the complex exponential, such as (12), (14), and (15).

In view of (16), it is now natural to introduce the complex sine and cosine functions {\sin: {\bf C} \rightarrow {\bf C}} and {\cos: {\bf C} \rightarrow {\bf C}} by the formula

\displaystyle  \sin(z) = \frac{e^{iz} - e^{-iz}}{2i}; \quad \cos(z) = \frac{e^{iz} + e^{-iz}}{2}. \ \ \ \ \ (17)

These complex trigonometric functions no longer have a direct trigonometric interpretation (as one cannot easily develop a theory of complex angles), but they still inherit almost all of the algebraic properties of their real-variable counterparts. For instance, one can repeat the above high school algebra computations verbatim to conclude that

\displaystyle  \sin^2(z) + \cos^2(z) = 1 \ \ \ \ \ (18)

for all {z}. (We caution however that this does not imply that {\sin(z)} and {\cos(z)} are bounded in magnitude by {1} – note carefully the lack of absolute value signs outside of {\sin(z)} and {\cos(z)} in the above formula! See also Exercise 16 below.) Similarly for all of the other trigonometric identities. (Later on in this series of lecture notes, we will develop the concept of analytic continuation, which can explain why so many real-variable algebraic identities naturally extend to their complex counterparts.) From (11) we see that the complex sine and cosine functions have the same Taylor series expansion as their real-variable counterparts, namely

\displaystyle  \sin(z) = \sum_{n=0}^\infty (-1)^n \frac{z^{2n+1}}{(2n+1)!}

\displaystyle  = z - \frac{z^3}{3!} + \frac{z^5}{5!} - \dots

and
\displaystyle  \cos(z) = \sum_{n=0}^\infty (-1)^n \frac{z^{2n}}{(2n)!}

\displaystyle  = 1 - \frac{z^2}{2!} + \frac{z^4}{4!} - \dots.

The formulae (17) for the complex sine and cosine functions greatly resemble those of the hyperbolic trigonometric functions {\sinh, \cosh: {\bf R} \rightarrow {\bf R}}, defined by the formulae

\displaystyle  \sinh(x) := \frac{e^x - e^{-x}}{2}; \quad \cosh(x) := \frac{e^x + e^{-x}}{2}.

Indeed, if we extend these functions to the complex domain by defining {\sinh, \cosh: {\bf C} \rightarrow {\bf C}} to be the functions

\displaystyle  \sinh(z) := \frac{e^z - e^{-z}}{2}; \quad \cosh(z) := \frac{e^z + e^{-z}}{2},

then on comparison with (17) we obtain the complex identities

\displaystyle  \sinh(z) = -i \sin(iz); \quad \cosh(z) = \cos(iz) \ \ \ \ \ (19)

or equivalently

\displaystyle  \sin(z) = -i \sinh(iz); \quad \cos(z) = \cosh(iz) \ \ \ \ \ (20)

for all complex {z}. Thus we see that once we adopt the perspective of working over the complex numbers, the hyperbolic trigonometric functions are “rotations by 90 degrees” of the ordinary trigonometric functions; this is a simple example of what physicists call a Wick rotation. In particular, we see from these identities that any trigonometric identity will have a hyperbolic counterpart, though due to the presence of various factors of {i}, the signs may change as one passes from trigonometric to hyperbolic functions or vice versa (a fact quantified by Osborn’s rule). For instance, by substituting (19) or (20) into (18) (and replacing {z} by {iz} or {-iz} as appropriate), we end up with the analogous identity

\displaystyle  \cosh^2(z) - \sinh^2(z) = 1

for the hyperbolic trigonometric functions. Similarly for all other trigonometric identities. Thus we see that the complex exponential single-handedly unites the trigonometry, hyperbolic trigonometry, and the real exponential function into a single coherent theory!

Exercise 14

  • (i) If {n} is a positive integer, show that the only complex number solutions to the equation {z^n = 1} are given by the {n} complex numbers {e^{2\pi i k/n}} for {k=0,\dots,n-1}; these numbers are thus known as the {n^{th}} roots of unity. Conclude the identity {z^n - 1 = \prod_{k=0}^{n-1} (z - e^{2\pi i k/n})} for any complex number {z}.
  • (ii) Show that the only compact subgroups {G} of the multiplicative complex numbers {{\bf C}^\times} are the unit circle {S^1} and the {n^{th}} roots of unity

    \displaystyle  C_n := \{ e^{2\pi i k/n}: k=0,1,\dots,n-1\}

    for {n=1,2,\dots}. (Hint: there are two cases, depending on whether {1} is a limit point of {G} or not.)

  • (iii) Give an example of a non-compact subgroup of {S^1}.
  • (iv) (Warning: this one is tricky.) Show that the only connected closed subgroups of {{\bf C}^\times} are the whole group {{\bf C}^\times}, the trivial group {\{1\}}, and the one-parameter groups of the form {\{ \exp( tz ): t \in {\bf R} \}} for some non-zero complex number {z}.
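As a sanity check on part (i) of this exercise (not a substitute for the requested proof), one can compute the roots of unity and the product factorisation numerically:

```python
import cmath
import math

# The n-th roots of unity e^{2 pi i k / n}, together with the product
# factorisation z^n - 1 = prod_k (z - e^{2 pi i k / n}) at a sample z.
n = 5
roots = [cmath.exp(2j * math.pi * k / n) for k in range(n)]
for w in roots:
    assert abs(w ** n - 1) < 1e-12

z = 1.7 - 0.6j
prod = 1 + 0j
for w in roots:
    prod *= z - w
assert abs(prod - (z ** n - 1)) < 1e-10
```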

The next exercise gives a special case of the fundamental theorem of algebra, when considering the roots of polynomials of the specific form {P(z) = z^n - w}.

Exercise 15 Show that if {w} is a non-zero complex number and {n} is a positive integer, then there are exactly {n} distinct solutions to the equation {z^n = w}, and any two such solutions differ (multiplicatively) by an {n^{th}} root of unity. In particular, a non-zero complex number {w} has two square roots, each of which is the negative of the other. What happens when {w=0}?

Exercise 16 Let {z_n} be a sequence of complex numbers. Show that {\sin(z_n)} is bounded if and only if the imaginary part of {z_n} is bounded, and similarly with {\sin(z_n)} replaced by {\cos(z_n)}.

Exercise 17 (This question was drawn from a previous version of this course taught by Rowan Killip.) Let {w_1, w_2} be distinct complex numbers, and let {\lambda} be a positive real that is not equal to {1}.

  • (i) Show that the set {\{ z \in {\bf C}: |\frac{z-w_1}{z-w_2}| = \lambda \}} defines a circle in the complex plane. (Ideally, you should be able to do this without breaking everything up into real and imaginary parts.)
  • (ii) Conversely, show that every circle in the complex plane arises in such a fashion (for suitable choices of {w_1,w_2,\lambda}, of course).
  • (iii) What happens if {\lambda=1}?
  • (iv) Let {\gamma} be a circle that does not pass through the origin. Show that the image of {\gamma} under the inversion map {z \mapsto 1/z} is a circle. What happens if {\gamma} is a line? What happens if {\gamma} passes through the origin (and one then deletes the origin from {\gamma} before applying the inversion map)?

Exercise 18 If {z} is a complex number, show that {\exp(z) = \lim_{n \rightarrow \infty} (1 + \frac{z}{n})^n}.
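A numerical illustration of this limit (the decay rate in the comment below is a heuristic observation, not something proven here):

```python
import cmath

# (1 + z/n)^n approaches exp(z) as n grows; empirically the error
# decays roughly like 1/n.
z = 1 + 1j
err = lambda n: abs((1 + z / n) ** n - cmath.exp(z))
assert err(10_000) < err(100)   # the error decreases with n
assert err(1_000_000) < 1e-5
```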

Filed under: 246A - complex analysis, math.CV, math.RA Tagged: complex numbers, exponentiation, trigonometry

Terence Tao246A, Notes 2: complex integration

Having discussed differentiation of complex mappings in the preceding notes, we now turn to the integration of complex maps. We first briefly review the situation of integration of (suitably regular) real functions {f: [a,b] \rightarrow {\bf R}} of one variable. Actually there are three closely related concepts of integration that arise in this setting:

  • (i) The signed definite integral {\int_a^b f(x)\ dx}, which is usually interpreted as the Riemann integral (or equivalently, the Darboux integral), which can be defined as the limit (if it exists) of the Riemann sums

    \displaystyle  \sum_{j=1}^n f(x_j^*) (x_j - x_{j-1}) \ \ \ \ \ (1)

    where {a = x_0 < x_1 < \dots < x_n = b} is some partition of {[a,b]}, {x_j^*} is an element of the interval {[x_{j-1},x_j]}, and the limit is taken as the maximum mesh size {\max_{1 \leq j \leq n} |x_j - x_{j-1}|} goes to zero. It is convenient to adopt the convention that {\int_b^a f(x)\ dx := - \int_a^b f(x)\ dx} for {a < b}; alternatively one can interpret {\int_b^a f(x)\ dx} as the limit of the Riemann sums (1), where now the (reversed) partition {b = x_0 > x_1 > \dots > x_n = a} goes leftwards from {b} to {a}, rather than rightwards from {a} to {b}.

  • (ii) The unsigned definite integral {\int_{[a,b]} f(x)\ dx}, usually interpreted as the Lebesgue integral. The precise definition of this integral is a little complicated (see e.g. this previous post), but roughly speaking the idea is to approximate {f} by simple functions {\sum_{i=1}^n c_i 1_{E_i}} for some coefficients {c_i \in {\bf R}} and sets {E_i \subset [a,b]}, and then approximate the integral {\int_{[a,b]} f(x)\ dx} by the quantities {\sum_{i=1}^n c_i m(E_i)}, where {m(E_i)} is the Lebesgue measure of {E_i}. In contrast to the signed definite integral, no orientation is imposed or used on the underlying domain of integration, which is viewed as an “undirected” set {[a,b]}.
  • (iii) The indefinite integral or antiderivative {\int f(x)\ dx}, defined as any function {F: [a,b] \rightarrow {\bf R}} whose derivative {F'} exists and is equal to {f} on {[a,b]}. Famously, the antiderivative is only defined up to the addition of an arbitrary constant {C}, thus for instance {\int x\ dx = \frac{1}{2} x^2 + C}.
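
As a concrete sketch of notion (i) (illustrative names only), here is the Riemann sum (1) on a uniform partition with left-endpoint tags; note that feeding in the endpoints in reversed order automatically respects the sign convention {\int_b^a f(x)\ dx = -\int_a^b f(x)\ dx}:

```python
def riemann_sum(f, a, b, n):
    # Riemann sum (1) on the uniform partition x_j = a + j*(b-a)/n,
    # with tags x_j^* = x_{j-1} (left endpoints); h is negative when b < a,
    # which implements the reversed-partition convention.
    h = (b - a) / n
    return sum(f(a + (j - 1) * h) * h for j in range(1, n + 1))

approx = riemann_sum(lambda x: x * x, 0.0, 1.0, 10 ** 5)  # ∫_0^1 x^2 dx = 1/3
```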

There are some other variants of the above integrals (e.g. the Henstock-Kurzweil integral, discussed for instance in this previous post), which can handle slightly different classes of functions and have slightly different properties than the standard integrals listed here, but we will not need to discuss such alternative integrals in this course (with the exception of some improper and principal value integrals, which we will encounter in later notes).

The above three notions of integration are closely related to each other. For instance, if {f: [a,b] \rightarrow {\bf R}} is a Riemann integrable function, then the signed definite integral and unsigned definite integral coincide (when the former is oriented correctly), thus

\displaystyle  \int_a^b f(x)\ dx = \int_{[a,b]} f(x)\ dx


\displaystyle  \int_b^a f(x)\ dx = -\int_{[a,b]} f(x)\ dx

If {f: [a,b] \rightarrow {\bf R}} is continuous, then by the fundamental theorem of calculus, it possesses an antiderivative {F = \int f(x)\ dx}, which is well defined up to an additive constant {C}, and

\displaystyle  \int_c^d f(x)\ dx = F(d) - F(c)

for any {c,d \in [a,b]}, thus for instance {\int_a^b f(x)\ dx = F(b) - F(a)} and {\int_b^a f(x)\ dx = F(a) - F(b)}.

All three of the above integration concepts have analogues in complex analysis. By far the most important notion will be the complex analogue of the signed definite integral, namely the contour integral {\int_\gamma f(z)\ dz}, in which the directed line segment from one real number {a} to another {b} is now replaced by a type of curve in the complex plane known as a contour. The contour integral can be viewed as the special case of the more general line integral {\int_\gamma f(z)\ dx + g(z)\ dy} that is of particular relevance in complex analysis. There are also analogues of the Lebesgue integral, namely the arclength measure integrals {\int_\gamma f(z)\ |dz|} and the area integrals {\int_\Omega f(x+iy)\ dx dy}, but these play only an auxiliary role in the subject. Finally, we still have the notion of an antiderivative {F(z)} (also known as a primitive) of a complex function {f(z)}.

As it turns out, the fundamental theorem of calculus continues to hold in the complex plane: under suitable regularity assumptions on a complex function {f} and a primitive {F} of that function, one has

\displaystyle  \int_\gamma f(z)\ dz = F(z_1) - F(z_0)

whenever {\gamma} is a contour from {z_0} to {z_1} that lies in the domain of {f}. In particular, functions {f} that possess a primitive must be conservative in the sense that {\int_\gamma f(z)\ dz = 0} for any closed contour. This property of being conservative is not typical, in that “most” functions {f} will not be conservative. However, there is a remarkable and far-reaching theorem, the Cauchy integral theorem (also known as the Cauchy-Goursat theorem), which asserts that any holomorphic function is conservative, so long as the domain is simply connected (or if one restricts attention to contractible closed contours). We will explore this theorem and several of its consequences in the next set of notes.
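
For instance (a numerical sketch only, with illustrative names), for the entire function {f(z) = z^2} with primitive {F(z) = z^3/3}, one can approximate the contour integral along a straight line segment by Riemann sums and compare with {F(z_1) - F(z_0)}:

```python
def segment_integral(f, z0, z1, n=10 ** 5):
    # Midpoint Riemann-sum approximation of the contour integral of f along
    # the directed segment gamma(t) = (1-t) z0 + t z1, 0 <= t <= 1.
    h = 1.0 / n
    dz = (z1 - z0) * h
    return sum(f(z0 + (z1 - z0) * (k + 0.5) * h) * dz for k in range(n))

F = lambda z: z ** 3 / 3
val = segment_integral(lambda z: z * z, 0, 1 + 2j)
```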

— 1. Integration along a contour —

The notion of a curve is a very intuitive one. However, the precise mathematical definition of what a curve actually is depends a little bit on what type of mathematics one wishes to do. If one is mostly interested in topology, then a good notion is that of a continuous (parameterised) curve. If one wants to do analysis in somewhat irregular domains, it is convenient to restrict the notion of curve somewhat, to the rectifiable curves. If one is doing analysis in “nice” domains (such as the complex plane {{\bf C}}, a half-plane, a punctured plane, a disk, or an annulus), then it is convenient to restrict the notion further, to the piecewise smooth curves, also known as contours. If one wished to get to the main theorems of complex analysis as quickly as possible, then one would restrict attention only to contours and skip much of this section; however we shall take a more leisurely approach here, discussing curves and rectifiable curves as well, as these concepts are also useful outside of complex analysis.

We begin by defining the notion of a continuous curve.

Definition 1 (Continuous curves) A continuous parameterised curve, or curve for short, is a continuous map {\gamma: [a,b] \rightarrow {\bf C}} from a compact interval {[a,b] \subset {\bf R}} to the complex plane {{\bf C}}. We call the curve trivial if {a=b}, and non-trivial otherwise. We refer to the complex numbers {\gamma(a), \gamma(b)} as the initial point and terminal point of the curve respectively, and refer to these two points collectively as the endpoints of the curve. We say that the curve is closed if {\gamma(a) = \gamma(b)}. We say that the curve is simple if one has {\gamma(t) \neq \gamma(t')} for any distinct {t,t' \in [a,b]}, with the possible exception of the endpoint cases {t=a, t'=b} or {t=b, t'=a} (thus we allow closed curves to be simple). We refer to the subset {\gamma([a,b]) := \{ \gamma(t): t \in [a,b]\}} of the complex plane as the image of the curve.

We caution that the term “closed” here does not refer to the topological notion of closure: for any curve {\gamma} (closed or otherwise), the image {\gamma([a,b])} of the curve, being the continuous image of a compact set, is necessarily a compact subset of {{\bf C}} and is thus always topologically closed.

A basic example of a curve is the directed line segment {\gamma_{z_1 \rightarrow z_2}: [0,1] \rightarrow {\bf C}} from one complex point {z_1} to another {z_2}, defined by

\displaystyle  \gamma_{z_1 \rightarrow z_2}(t) := (1-t) z_1 + t z_2

for {0 \leq t \leq 1}. (Thus, contrary to the informal English meaning of the terms, we consider line segments to be examples of curves, despite having zero curvature; in general, it is convenient in mathematics to admit such “degenerate” objects into one’s definitions, in order to obtain good closure properties for these objects, and to maximise the generality of the definition.) If {z_1 \neq z_2}, this is a simple curve, while for {z_1 = z_2} it is (a rather degenerate, but still non-trivial) closed curve. Another important example of a curve is the anti-clockwise circle {\gamma_{z_0,r,\circlearrowleft}: [0,2\pi] \rightarrow {\bf C}} of some radius {r>0} around a complex centre {z_0 \in {\bf C}}, defined by

\displaystyle  \gamma_{z_0,r,\circlearrowleft}(t) := z_0 + r e^{it}. \ \ \ \ \ (2)

This is a simple closed non-trivial curve. If we extended the domain here from {[0,2\pi]} to (say) {[0,4\pi]}, the curve would remain closed, but would no longer be simple (every point in the image is now traversed twice by the curve).

Note that it is technically possible for two distinct curves to have the same image. For instance, the anti-clockwise circle {\tilde \gamma_{z_0,r,\circlearrowleft}: [0,1] \rightarrow {\bf C}} of some radius {r>0} around a complex centre {z_0 \in {\bf C}} defined by

\displaystyle  \tilde \gamma_{z_0,r,\circlearrowleft}(t) := z_0 + r e^{2\pi i t}

traverses the same image as the previous curve (2), but is considered a distinct curve from {\gamma_{z_0,r,\circlearrowleft}}. Nevertheless the two curves are closely related to each other, and we formalise this as follows. We say that one curve {\gamma_2: [a_2,b_2] \rightarrow {\bf C}} is a continuous reparameterisation of another {\gamma_1: [a_1, b_1] \rightarrow {\bf C}}, if there is a homeomorphism {\phi: [a_1,b_1] \rightarrow [a_2,b_2]} (that is to say, a continuous invertible map whose inverse {\phi^{-1}: [a_2,b_2] \rightarrow [a_1,b_1]} is also continuous) which is endpoint preserving (i.e., {\phi(a_1)=a_2} and {\phi(b_1)=b_2}) such that {\gamma_2(\phi(t)) = \gamma_1(t)} for all {t \in [a_1,b_1]} (that is to say, {\gamma_1 = \gamma_2 \circ \phi}, or equivalently {\gamma_2 = \gamma_1 \circ \phi^{-1}}), in which case we write {\gamma_1 \equiv \gamma_2}. Thus for instance {\gamma_{z_0,r,\circlearrowleft} \equiv \tilde \gamma_{z_0,r,\circlearrowleft}}. The relation of being a continuous reparameterisation is an equivalence relation, so one can talk about the notion of a curve “up to continuous reparameterisation”, by which we mean an equivalence class of a curve under this relation. Thus for instance the image of a curve, as well as its initial point and end point, are well defined up to continuous reparameterisation, since if {\gamma_1 \equiv \gamma_2} then {\gamma_1} and {\gamma_2} have the same image, the same initial point, and the same terminal point. It is common to depict an equivalence class of a curve {\gamma} graphically, by drawing its image together with an arrow depicting the direction of motion from the initial point to its endpoint. (In the case of a non-simple curve, one may need multiple arrows in order to clarify the direction of motion, and also the possible multiplicity of the curve.)

Exercise 2 Let {\phi: [a_1,b_1] \rightarrow [a_2,b_2]} be a continuous invertible map.

  • (i) Show that {\phi^{-1}: [a_2,b_2] \rightarrow [a_1,b_1]} is continuous, so that {\phi} is a homeomorphism. (Hint: use the fact that a continuous image of a compact set is compact, and that a subset of an interval is topologically closed if and only if it is compact.)
  • (ii) If {\phi(a_1)=a_2}, show that {\phi(b_1) = b_2} and that {\phi} is monotone increasing. (Hint: use the intermediate value theorem.)
  • (iii) Conversely, if {\psi: [a_1,b_1] \rightarrow [a_2,b_2]} is a continuous monotone increasing map with {\psi(a_1) = a_2} and {\psi(b_1) = b_2}, show that {\psi} is a homeomorphism.

It will be important for us that we do not allow reparameterisations to reverse the endpoints. For instance, if {z_1,z_2} are distinct points in the complex plane, the directed line segment {\gamma_{z_1 \rightarrow z_2}} is not a reparameterisation of the directed line segment {\gamma_{z_2 \rightarrow z_1}} since they do not have the same initial point (or same terminal point); the map {t \mapsto 1-t} is a homeomorphism from {[0,1]} to {[0,1]} but it does not preserve the initial point or the terminal point. In general, given a curve {\gamma: [a,b] \rightarrow {\bf C}}, we define its reversal {-\gamma: [-b,-a] \rightarrow {\bf C}} to be the curve {(-\gamma)(t) := \gamma(-t)}, thus for instance {\gamma_{z_2 \rightarrow z_1}} is (up to reparameterisation) the reversal of {\gamma_{z_1 \rightarrow z_2}}, thus

\displaystyle  \gamma_{z_2 \rightarrow z_1} \equiv - \gamma_{z_1 \rightarrow z_2}.

Another basic operation on curves is that of concatenation. Suppose we have two curves {\gamma_1: [a_1,b_1] \rightarrow {\bf C}} and {\gamma_2: [a_2,b_2] \rightarrow {\bf C}} with the property that the terminal point {\gamma_1(b_1)} of {\gamma_1} equals the initial point {\gamma_2(a_2)} of {\gamma_2}. We can reparameterise {\gamma_2} by translation to {\tilde \gamma_2: [b_1, b_2+b_1-a_2] \rightarrow {\bf C}}, defined by {\tilde \gamma_2(t) := \gamma_2(t - b_1 + a_2)}. We then define the concatenation or sum {\gamma_1 + \gamma_2: [a_1, b_2+b_1-a_2] \rightarrow {\bf C}} by setting

\displaystyle  (\gamma_1 + \gamma_2)(t) := \gamma_1(t)

for {a_1 \leq t \leq b_1} and

\displaystyle  (\gamma_1 + \gamma_2)(t) := \tilde \gamma_2(t)

for {b_1 \leq t \leq b_2+b_1-a_2} (note that these two definitions agree on their common domain point {t=b_1} by the hypothesis {\gamma_1(b_1) = \gamma_2(a_2)}). It is easy to see that this concatenation {\gamma_1+\gamma_2} is still a continuous curve. The reader at this point is encouraged to draw a picture to understand what the concatenation operation is doing; it is much simpler to grasp it visually than the above lengthy definition may suggest. If the terminal point of {\gamma_1} does not equal the initial point of {\gamma_2}, we leave the sum {\gamma_1+\gamma_2} undefined. (One can define more general spaces than the space of curves in which such an addition can make sense, such as the space of {1}-currents if one assumes some rectifiability on the curves, but we will not need such general spaces here.)

Concatenation is well behaved with respect to equivalence and reversal:

Exercise 3 Let {\gamma_1, \gamma_2, \gamma_3, \tilde \gamma_1, \tilde \gamma_2} be continuous curves. Suppose that the terminal point of {\gamma_1} equals the initial point of {\gamma_2}, and the terminal point of {\gamma_2} equals the initial point of {\gamma_3}.

  • (i) (Concatenation well defined up to equivalence) If {\gamma_1 \equiv \tilde \gamma_1} and {\gamma_2 \equiv \tilde \gamma_2}, show that {\gamma_1+\gamma_2 \equiv \tilde \gamma_1 + \tilde \gamma_2}.
  • (ii) (Concatenation associative) Show that {(\gamma_1+\gamma_2)+\gamma_3 = \gamma_1 + (\gamma_2 + \gamma_3)}. In particular, we certainly have {(\gamma_1+\gamma_2)+\gamma_3 \equiv \gamma_1 + (\gamma_2 + \gamma_3)}.
  • (iii) (Concatenation and reversal) Show that {-(\gamma_1+\gamma_2) \equiv (-\gamma_2) + (-\gamma_1)}.
  • (iv) (Non-commutativity) Give an example in which {\gamma_1+\gamma_2} and {\gamma_2+\gamma_1} are both well-defined, but not equivalent to each other.
  • (v) (Identity) If {z_1} and {z_2} denote the initial and terminal points of {\gamma_1} respectively, and {\delta_z: [0,0] \rightarrow {\bf C}} is the trivial curve {\delta_z: 0 \rightarrow z} defined for any {z \in {\bf C}}, show that {\delta_{z_1} + \gamma_1 \equiv \gamma_1} and {\gamma_1 + \delta_{z_2} \equiv \gamma_1}.
  • (vi) (Non-invertibility) Give an example in which {\gamma_1 + (-\gamma_1)} is not equivalent to a trivial curve.

Remark 4 The above exercise allows one to view the space of curves up to equivalence as a category, with the points in the complex plane being the objects of the category, and each equivalence class of curves being a single morphism from the initial point to the terminal point (and with the equivalence class of trivial curves being the identity morphisms). This point of view can be useful in topology, particularly when relating to concepts such as the fundamental group (and fundamental groupoid), monodromy, and holonomy. However, we will not need to use any advanced category-theoretic concepts in this course.

Exercise 5 Let {z_0 \in {\bf C}} and {r>0}. For any integer {m}, let {\gamma_m: [0,2\pi] \rightarrow {\bf C}} denote the curve

\displaystyle  \gamma_m(t) := z_0 + r e^{i m t}

(thus for instance {\gamma_1 = \gamma_{z_0,r,\circlearrowleft}}).

  • (i) Show that for any integer {m}, we have {-\gamma_m \equiv \gamma_{-m}}.
  • (ii) Show that for any non-negative integers {m,m'}, we have {\gamma_m + \gamma_{m'} \equiv \gamma_{m+m'}}. What happens for other values of {m,m'}?
  • (iii) If {m,m'} are distinct integers, show that {\gamma_m \not \equiv \gamma_{m'}}.

Given a sequence of complex numbers {z_0,z_1,\dots,z_n}, we define the polygonal path {\gamma_{z_0 \rightarrow z_1 \rightarrow \dots \rightarrow z_n}} traversing these numbers in order to be the curve

\displaystyle  \gamma_{z_0 \rightarrow z_1 \rightarrow \dots \rightarrow z_n} := \gamma_{z_0 \rightarrow z_1} + \gamma_{z_1 \rightarrow z_2} + \dots + \gamma_{z_{n-1} \rightarrow z_n}.

This is well-defined thanks to Exercise 3(ii) (actually all we really need in applications is being well-defined up to equivalence). Thus for instance {\gamma_{z_0 \rightarrow z_1 \rightarrow z_2 \rightarrow z_0}} would traverse a closed triangular path connecting {z_0}, {z_1}, and {z_2} (this path may end up being non-simple if the points {z_0,z_1,z_2} are collinear).

In order to do analysis, we need to restrict our attention to those curves which are rectifiable:

Definition 6 Let {\gamma: [a,b] \rightarrow {\bf C}} be a curve. The arc length {|\gamma|} of the curve is defined to be the supremum of the quantities

\displaystyle  \sum_{j=1}^n |\gamma(t_j)-\gamma(t_{j-1})|

where {n} ranges over the natural numbers and {a = t_0 < t_1 < \dots < t_n = b} ranges over the partitions of {[a,b]}. We say that the curve {\gamma} is rectifiable if its arc length is finite.

The concept is best understood visually: a curve is rectifiable if there is some finite bound on the length of polygonal paths one can form while traversing the curve in order. From Exercise 2 we see that equivalent curves have the same arclength, so the concepts of arclength and rectifiability are well defined for curves that are only given up to continuous reparameterisation.
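
To illustrate Definition 6 numerically (a sketch only; the names are illustrative), one can compute inscribed polygonal lengths along uniform partitions of a circle and watch them increase towards the expected arc length {2\pi r}:

```python
import cmath
import math

def polygonal_length(gamma, a, b, n):
    # Length of the polygon inscribed in the curve along the uniform
    # partition t_j = a + j*(b-a)/n; the supremum over all partitions
    # is the arc length |gamma| of Definition 6.
    pts = [gamma(a + (b - a) * j / n) for j in range(n + 1)]
    return sum(abs(q - p) for p, q in zip(pts, pts[1:]))

circle = lambda t: 2.0 * cmath.exp(1j * t)  # radius-2 circle about the origin
coarse = polygonal_length(circle, 0.0, 2 * math.pi, 100)
fine = polygonal_length(circle, 0.0, 2 * math.pi, 10 ** 5)
```

The polygonal lengths are monotone under refinement and bounded above by the true arc length {4\pi}.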

Exercise 7 Let {\gamma_1,\gamma_2} be curves, with the terminal point of {\gamma_1} equal to the initial point of {\gamma_2}. Show that

\displaystyle  |\gamma_1+\gamma_2| = |\gamma_1| + |\gamma_2|.

In particular, {\gamma_1+\gamma_2} is rectifiable if and only if {\gamma_1, \gamma_2} are both individually rectifiable.

It is not immediately obvious that reasonable curves (e.g. the line segments {\gamma_{z_1 \rightarrow z_2}} or the circles {\gamma_{z_0,r,\circlearrowleft}}) are rectifiable. To verify this, we need two preliminary results.

Lemma 8 (Triangle inequality) Let {f: [a,b] \rightarrow {\bf C}} be a continuous function. Then

\displaystyle  |\int_a^b f(t)\ dt| \leq \int_a^b |f(t)|\ dt.

Here we interpret {\int_a^b f(t)\ dt} as the Riemann integral (or equivalently, {\int_a^b f(t)\ dt = \int_a^b \mathrm{Re} f(t)\ dt + i \int_a^b \mathrm{Im} f(t)\ dt}).

Proof: We first attempt to prove this inequality by considering the real and imaginary parts separately. From the real-valued triangle inequality (and basic properties of the Riemann integral) we have

\displaystyle | \mathrm{Re} \int_a^b f(t)\ dt | = |\int_a^b \mathrm{Re} f(t)\ dt |

\displaystyle \leq \int_a^b |\mathrm{Re} f(t)|\ dt

\displaystyle  \leq \int_a^b |f(t)|\ dt

and similarly

\displaystyle  |\mathrm{Im} \int_a^b f(t)\ dt| \leq \int_a^b |f(t)|\ dt

but these two bounds only yield the weaker estimate

\displaystyle  |\int_a^b f(t)\ dt| \leq \sqrt{2} \int_a^b |f(t)|\ dt.

To eliminate this {\sqrt{2}} loss we can amplify the above argument by exploiting phase rotation. For any real {\theta}, we can repeat the above arguments (using the complex linearity of the Riemann integral, which is easily verified) to give

\displaystyle | \mathrm{Re} e^{i\theta} \int_a^b f(t)\ dt | = |\int_a^b \mathrm{Re} e^{i\theta} f(t)\ dt |

\displaystyle \leq \int_a^b |\mathrm{Re} e^{i\theta} f(t)|\ dt

\displaystyle  \leq \int_a^b |f(t)|\ dt.

But we have {|z| = \sup_\theta \mathrm{Re} e^{i\theta} z} for any complex number {z}, so taking the supremum of both sides in {\theta} we obtain the claim. \Box

Exercise 9 Let {a \leq b} be real numbers. Show that the interval {[a,b]} is topologically connected, that is to say the only two subsets of {[a,b]} that are both open and closed relative to {[a,b]} are the empty set and all of {[a,b]}. (Hint: if {E} is a non-empty set that is both open and closed in {[a,b]} and contains {a}, consider the supremum of all {T_* \in [a,b]} such that {[a,T_*] \subset E}.)

Next, we say that a non-trivial curve {\gamma: [a,b] \rightarrow {\bf C}} is continuously differentiable if the derivative

\displaystyle  \gamma'(t) := \lim_{t' \rightarrow t: t' \in [a,b] \backslash \{t\}} \frac{\gamma(t') - \gamma(t)}{t'-t}

exists and is continuous for all {t \in [a,b]} (note that we are only taking right-derivatives at {t=a} and left-derivatives at {t=b}).

Proposition 10 (Arclength formula) If {\gamma: [a,b] \rightarrow {\bf C}} is a continuously differentiable curve, then it is rectifiable, and

\displaystyle  |\gamma| = \int_a^b |\gamma'(t)|\ dt.

Proof: We first prove the upper bound

\displaystyle  |\gamma| \leq \int_a^b |\gamma'(t)|\ dt \ \ \ \ \ (3)

which in particular implies the rectifiability of {\gamma} since the right-hand side of (3) is finite. Let {a = t_0 < \dots < t_n = b} be any partition of {[a,b]}. By the fundamental theorem of calculus (applied to the real and imaginary parts of {\gamma}) we have

\displaystyle  \gamma(t_j) - \gamma(t_{j-1}) = \int_{t_{j-1}}^{t_j} \gamma'(t)\ dt

for any {1 \leq j \leq n}, and hence by Lemma 8 we have

\displaystyle  |\gamma(t_j) - \gamma(t_{j-1})| \leq \int_{t_{j-1}}^{t_j} |\gamma'(t)|\ dt.

Summing in {j} we obtain

\displaystyle  \sum_{j=1}^n |\gamma(t_j) - \gamma(t_{j-1})| \leq \int_a^b |\gamma'(t)|\ dt

and taking suprema over all partitions we obtain (3).

Now we need to show the matching lower bound. Let {\varepsilon>0} be a small quantity, and for any {a \leq T \leq b}, let {\gamma_{[a,T]}: [a,T] \rightarrow {\bf C}} denote the restriction of {\gamma: [a,b] \rightarrow {\bf C}} to {[a,T]}. We will show the bound

\displaystyle  |\gamma_{[a,T]}| \geq \int_a^T |\gamma'(t)|\ dt - \varepsilon (T-a) \ \ \ \ \ (4)

for all {a \leq T \leq b}; specialising to {T=b} and then sending {\varepsilon \rightarrow 0} will give the claim.

It remains to prove (4) for a given choice of {\varepsilon}. We will use a continuous version of induction known as the continuity method, which exploits Exercise 9.

Let {\Omega_\varepsilon \subset [a,b]} denote the set of {T_* \in [a,b]} such that (4) holds for all {a \leq T \leq T_*}. It is clear (using Exercise 7) that this set {\Omega_\varepsilon} is topologically closed, and also contains the left endpoint {a} of {[a,b]}. If {T_* \in \Omega_\varepsilon} and {T_* < b}, then from the differentiability of {\gamma} at {T_*}, we have some interval {[T_*, T_*+\delta] \subset [a,b]} such that

\displaystyle  |\frac{\gamma(T) - \gamma(T_*)}{T-T_*} - \gamma'(T_*)| \leq \varepsilon/2

for all {T \in [T_*, T_*+\delta]}. Rearranging this using the triangle inequality, we have

\displaystyle  |\gamma(T) - \gamma(T_*)| \geq |\gamma'(T_*)| (T-T_*) - \frac{\varepsilon}{2} (T-T_*).

Also, from the continuity of {|\gamma'|} we have

\displaystyle  \int_{T_*}^T |\gamma'(t)|\ dt \leq |\gamma'(T_*)| (T-T_*) + \frac{\varepsilon}{2} (T-T_*)

for all {T \in [T_*, T_*+\delta]}, if {\delta} is small enough. We conclude that

\displaystyle  |\gamma(T) - \gamma(T_*)| \geq \int_{T_*}^T |\gamma'(t)|\ dt - \varepsilon (T-T_*),

and hence

\displaystyle  |\gamma_{[T_*,T]}| \geq \int_{T_*}^T |\gamma'(t)|\ dt - \varepsilon (T-T_*)

where {\gamma_{[T_*,T]}: [T_*,T] \rightarrow {\bf C}} is the restriction of {\gamma: [a,b] \rightarrow {\bf C}} to {[T_*,T]}. Adding this to the {T=T_*} case of (4) using Exercise 7, we conclude that (4) also holds for all {T \in [T_*,T_*+\delta]}. From this we see that {\Omega_\varepsilon} is (relatively) open in {[a,b]}; from the connectedness of {[a,b]} we conclude that {\Omega_\varepsilon = [a,b]}, and we are done. \Box

It is now easy to verify that the line segment {\gamma_{z_1 \rightarrow z_2}} is rectifiable with arclength {|z_2-z_1|}, and that the circle {\gamma_{z_0,r,\circlearrowleft}} is rectifiable with arclength {2\pi r}, exactly as one would expect from elementary geometry. Finally, from Exercise 7, a polygonal path {\gamma_{z_0 \rightarrow z_1 \rightarrow \dots \rightarrow z_n}} will be rectifiable with arclength {|z_0-z_1| + \dots + |z_{n-1}-z_n|}, again exactly as one would expect.

Exercise 11 Show that the curve {\gamma: [0,1] \rightarrow {\bf C}} defined by setting {\gamma(t) := t + i t \sin(\frac{1}{t})} for {0 < t \leq 1} and {\gamma(0) := 0} is continuous but not rectifiable. (Hint: it is not necessary to compute the arclength precisely; a lower bound that goes to infinity will suffice. Graph the curve to discover some convenient partitions with which to generate such lower bounds. Alternatively, one can apply the arclength formula to some subcurves of {\gamma}.)

Exercise 12 (This exercise presumes familiarity with Lebesgue measure.) Show that the image of a rectifiable curve is necessarily of measure zero in the complex plane. (In particular, space-filling curves such as the Peano curve or the Hilbert curve cannot be rectifiable.)

Remark 13 As the above exercise suggests, many fractal curves will fail to be rectifiable; for instance the Koch snowflake is a famous example of an unrectifiable curve. (The situation is clarified once one develops the theory of Hausdorff dimension, as is done for instance in this previous post: any curve of Hausdorff dimension strictly greater than one will be unrectifiable.)

Much as continuous functions {f: [a,b] \rightarrow {\bf R}} on an interval may be integrated by taking limits of Riemann sums, we may also integrate continuous functions {f: \gamma([a,b]) \rightarrow {\bf C}} on the image of a rectifiable curve {\gamma: [a,b] \rightarrow {\bf C}}:

Proposition 14 (Integration along rectifiable curves) Let {\gamma: [a,b] \rightarrow {\bf C}} be a rectifiable curve, and let {f: \gamma([a,b]) \rightarrow {\bf C}} be a continuous function on the image {\gamma([a,b])} of {\gamma}. Then the “Riemann sums”

\displaystyle  \sum_{j=1}^n f(\gamma(t^*_j)) (\gamma(t_j) - \gamma(t_{j-1})), \ \ \ \ \ (5)

where {a = t_0 < \dots < t_n = b} ranges over the partitions of {[a,b]}, and for each {1 \leq j \leq n}, {t^*_j} is an element of {[t_{j-1},t_j]}, converge as the maximum mesh size {\max_{1 \leq j \leq n} |t_j - t_{j-1}|} goes to zero to some complex limit, which we will denote as {\int_\gamma f(z)\ dz}. In other words, for every {\varepsilon>0} there exists a {\delta > 0} such that

\displaystyle  |\sum_{j=1}^n f(\gamma(t^*_j)) (\gamma(t_j) - \gamma(t_{j-1})) - \int_\gamma f(z)\ dz| \leq \varepsilon

whenever {\max_{1 \leq j \leq n} |t_j - t_{j-1}| \leq \delta}.

Proof: In real analysis courses, one often uses the order properties of the real line to replace the rather complicated looking Riemann sums with the simpler Darboux sums, en route to proving the real-variable analogue of the above proposition. However, in our complex setting the ordering of the real line is not available, so we will tackle the Riemann sums directly rather than try to compare them with Darboux sums.

It suffices to prove that the “Riemann sums” (5) are a Cauchy sequence, in the sense that the difference

\displaystyle  \sum_{j=1}^n f(\gamma(t^*_j)) (\gamma(t_j) - \gamma(t_{j-1})) - \sum_{k=1}^m f(\gamma(s^*_k)) (\gamma(s_k) - \gamma(s_{k-1}))

between two sums of the form (5) is smaller than any specified {\varepsilon>0} if the maximum mesh sizes of the two partitions {a = t_0 < \dots < t_n = b} and {a = s_0 < \dots < s_m = b} are both small enough. From the triangle inequality, and from the fact that any two partitions have a common refinement, it suffices to prove this under the additional assumption that the second partition {a = s_0 < \dots < s_m = b} is a refinement of {a = t_0 < \dots < t_n = b}. This means that there is an increasing sequence {0 = m_0 < m_1 < \dots < m_n = m} of natural numbers such that {s_{m_j} = t_j} for {j=0,\dots,n}. In that case, the above difference may be rearranged as

\displaystyle  \sum_{j=1}^n E_j


\displaystyle  E_j := f(\gamma(t^*_j)) (\gamma(t_j) - \gamma(t_{j-1})) - \sum_{k=m_{j-1}+1}^{m_j} f(\gamma(s^*_k)) (\gamma(s_k) - \gamma(s_{k-1})).

By telescoping series, we may rearrange {E_j} further as

\displaystyle  E_j := \sum_{k=m_{j-1}+1}^{m_j} (f(\gamma(t^*_j)) - f(\gamma(s^*_k))) (\gamma(s_k) - \gamma(s_{k-1})).

As {\gamma} is continuous and {f} is continuous on the compact image {\gamma([a,b])}, the composition {f \circ \gamma: [a,b] \rightarrow {\bf C}} is continuous, and hence uniformly continuous since {[a,b]} is compact. In particular, if the maximum mesh sizes are small enough, we have

\displaystyle |f(\gamma(t^*_j)) - f(\gamma(s^*_k))| \leq \varepsilon

for all {1 \leq j \leq n} and {m_{j-1}+1 \leq k \leq m_j}. From the triangle inequality we conclude that

\displaystyle  |E_j| \leq \varepsilon \sum_{k=m_{j-1}+1}^{m_j} |\gamma(s_k) - \gamma(s_{k-1})|

and hence on summing in {j} and using the triangle inequality, we can bound

\displaystyle  |\sum_{j=1}^n E_j| \leq \varepsilon |\gamma|.

Since {|\gamma|} is finite, and {\varepsilon} can be made arbitrarily small, we obtain the required Cauchy sequence property. \Box
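
In practice one rarely evaluates {\int_\gamma f(z)\ dz} directly from these Riemann sums, but it can be done numerically. The sketch below (illustrative names; midpoint tags {t^*_j = (t_{j-1}+t_j)/2}) recovers the standard value {\int_\gamma \frac{dz}{z} = 2\pi i} for the unit anticlockwise circle:

```python
import cmath
import math

def contour_integral(f, gamma, a, b, n):
    # Riemann sum (5) on the uniform partition t_j = a + j*(b-a)/n,
    # with midpoint tags t_j^* = (t_{j-1} + t_j)/2.
    ts = [a + (b - a) * j / n for j in range(n + 1)]
    return sum(f(gamma(0.5 * (t0 + t1))) * (gamma(t1) - gamma(t0))
               for t0, t1 in zip(ts, ts[1:]))

unit_circle = lambda t: cmath.exp(1j * t)  # anticlockwise unit circle
val = contour_integral(lambda z: 1 / z, unit_circle, 0.0, 2 * math.pi, 10 ** 4)
```

By contrast, the integrand {z}, which has the primitive {z^2/2}, integrates to (approximately) zero around the same closed curve.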

One cannot simply omit the rectifiability hypothesis from the above proposition:

Exercise 15 Give an example of a curve {\gamma: [a,b] \rightarrow {\bf C}} such that the Riemann sums

\displaystyle \sum_{j=1}^n \gamma(t^*_j) (\gamma(t_j) - \gamma(t_{j-1}))

fail to converge to a limit as the maximum mesh size goes to zero, so the integral {\int_\gamma z\ dz} does not exist even though the integrand {z} is extremely smooth. (Of course, such a curve {\gamma} cannot be rectifiable, thanks to Proposition 14.) (Hint: the non-rectifiable curve in Exercise 11 is a good place to start, but it turns out that this curve does not oscillate wildly enough to make the Riemann sums here diverge, because of the decay of the function {z \mapsto z} near the origin. Come up with a variant of this curve which oscillates more.)

By abuse of notation, we will refer to the quantity {\int_\gamma f(z)\ dz} as the contour integral of {f} along {\gamma}, even though {\gamma} is not necessarily a contour (we will define this concept shortly). We have some easy properties of this integral:

Exercise 16 Let {\gamma: [a,b] \rightarrow {\bf C}} be a rectifiable curve, and {f: \gamma([a,b]) \rightarrow {\bf C}} be a continuous function.

Exercise 17 (This exercise assumes familiarity with the Riemann-Stieltjes integral.) Let {\gamma: [a,b] \rightarrow {\bf C}} be a rectifiable curve. Let {g: [a,b] \rightarrow {\bf C}} denote the monotone non-decreasing function

\displaystyle  g(T) := |\gamma_{[a,T]}|

for {a \leq T \leq b}, where {\gamma_{[a,T]}: [a,T] \rightarrow {\bf C}} is the restriction of {\gamma: [a,b] \rightarrow {\bf C}} to {[a,T]}. For any continuous function {f: \gamma([a,b]) \rightarrow {\bf C}}, define the arclength measure integral {\int_\gamma f(z)\ |dz|} by the formula

\displaystyle  \int_\gamma f(z)\ |dz| := \int_a^b f(\gamma(t))\ dg(t)

where the right-hand side is a Riemann-Stieltjes integral. Establish the triangle inequality

\displaystyle  |\int_\gamma f(z)\ dz| \leq \int_\gamma |f(z)|\ |dz|

for any continuous {f: \gamma([a,b]) \rightarrow {\bf C}}. Also establish the identity

\displaystyle  \int_\gamma\ |dz| = |\gamma|

and obtain an alternate proof of Exercise 16(v).

The change of variables formula (iv) lets one compute many contour integrals using the familiar Riemann integral. For instance, if {a < b} are real numbers and {f: [a,b] \rightarrow {\bf C}} is continuous, then the contour integral along {\gamma_{a \rightarrow b}} coincides with the Riemann integral,

\displaystyle  \int_{\gamma_{a \rightarrow b}} f(z)\ dz = \int_a^b f(x)\ dx

and on reversal we also have

\displaystyle  \int_{\gamma_{b \rightarrow a}} f(z)\ dz = \int_b^a f(x)\ dx = - \int_a^b f(x)\ dx.
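As a quick numerical sanity check of this identity, one can approximate the contour integral by Riemann sums as in the definition. (This is a minimal sketch; the helper `contour_integral` is an ad hoc name, not notation from these notes.)

```python
def contour_integral(f, gamma, n=20000):
    """Riemann-sum approximation of the contour integral of f along the
    parametrized curve gamma : [0, 1] -> C, with midpoint tag points."""
    total = 0.0 + 0.0j
    for j in range(n):
        t0, t1 = j / n, (j + 1) / n
        tag = gamma((t0 + t1) / 2)           # tag point t_j^*
        total += f(tag) * (gamma(t1) - gamma(t0))
    return total

# The integral of z along the segment gamma_{1 -> 3} should equal the
# Riemann integral of x on [1, 3], namely (3^2 - 1^2)/2 = 4.
a, b = 1.0, 3.0
segment = lambda t: a + (b - a) * t
approx = contour_integral(lambda z: z, segment)
print(abs(approx - 4.0))  # essentially zero
```

For the linear integrand {z}, the midpoint Riemann sum telescopes exactly, so the agreement here is to machine precision rather than merely to discretization error.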

Similarly, if {f} is continuous on the circle {\{ z \in {\bf C}: |z-z_0| = r \}}, we have

\displaystyle  \int_{\gamma_{z_0,r,\circlearrowleft}} f(z)\ dz = i \int_0^{2\pi} f(z_0+re^{i\theta}) r e^{i\theta}\ d\theta \ \ \ \ \ (6)

\displaystyle  = 2\pi i r \int_0^1 f(z_0 + re^{2\pi i \theta}) e^{2\pi i \theta}\ d\theta.
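Formula (6) can likewise be checked numerically; for instance, for the (non-holomorphic) function {f(z) = \overline{z}} on the circle {|z|=r} about the origin, the integral works out to {2\pi i r^2}. (A hedged sketch; the helper name is ad hoc.)

```python
import cmath, math

def circle_integral(f, z0, r, n=20000):
    """Approximate formula (6): the integral of f over the anticlockwise
    circle of radius r about z0, via the parametrization z0 + r e^{i theta}."""
    total = 0.0 + 0.0j
    dtheta = 2 * math.pi / n
    for j in range(n):
        theta = (j + 0.5) * dtheta
        w = cmath.exp(1j * theta)
        total += f(z0 + r * w) * 1j * r * w * dtheta  # dz = i r e^{i theta} dtheta
    return total

# For f(z) = conj(z) on the circle |z| = 2 about the origin, the answer
# is 2*pi*i*r^2 = 8*pi*i.
val = circle_integral(lambda z: z.conjugate(), 0.0, 2.0)
print(val)
```

This example also illustrates Remark 18 below: the integrand {\overline{z}} is far from real on the contour, and the answer is purely imaginary.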

Remark 18 We caution that if {f(z)} is real-valued, we cannot conclude that {\int_\gamma f(z)\ dz} is also real valued, unless the contour {\gamma} lies in the real line. This is because the complex line element {dz} may introduce some non-trivial imaginary part, as is the case for instance in (6). For similar reasons, we have

\displaystyle  \mathrm{Re} \int_\gamma f(z)\ dz \neq \int_\gamma \mathrm{Re} f(z)\ dz

\displaystyle  \mathrm{Im} \int_\gamma f(z)\ dz \neq \int_\gamma \mathrm{Im} f(z)\ dz


\displaystyle  \overline{\int_\gamma f(z)\ dz} \neq \int_\gamma \overline{f(z)}\ dz

in general. If one wishes to mix line integrals with real and imaginary parts, it is recommended to replace the contour integrals above with the line integrals

\displaystyle  \int_\gamma f(x+iy) dx + g(x+iy) dy,

which are defined as in Proposition 14 but where the expression

\displaystyle  f(\gamma(t^*_j)) (\gamma(t_j) - \gamma(t_{j-1}))

appearing in (5) is replaced by

\displaystyle  f(\gamma(t^*_j)) \mathrm{Re}(\gamma(t_j) - \gamma(t_{j-1})) + g(\gamma(t^*_j)) \mathrm{Im}(\gamma(t_j) - \gamma(t_{j-1})).

The contour integral corresponds to the special case {g=if} (or more informally, {dz = dx + idy}). Line integrals are in turn special cases of the more general concept of integration of differential forms, discussed for instance in this article of mine, and which are used extensively in differential geometry and geometric topology. However we will not use these more general line integrals or differential form integrals much in this course.

In later notes it will be convenient to restrict to a more regular class of curves than the rectifiable curves. We thus give the definitions here:

Definition 19 (Smooth curves and contours) A smooth curve is a curve {\gamma: [a,b] \rightarrow {\bf C}} that is continuously differentiable, and such that {\gamma'(t) \neq 0} for all {t \in [a,b]}. A contour is a curve that is equivalent to the concatenation of finitely many smooth curves (that is to say, a piecewise smooth curve).

Example 20 The line segments {\gamma_{z_1 \rightarrow z_2}} and circles {\gamma_{z_0, r, \circlearrowright}, \gamma_{z_0,r,\circlearrowleft}} are smooth curves and hence contours. Polygonal paths are usually not smooth, but they are contours. Any sum of finitely many contours is again a contour, and the reversal of a contour is also a contour.

Note that the term “smooth” here differs somewhat from the real-variable notion of smoothness, which usually requires infinite differentiability. Smooth curves are only assumed to be continuously differentiable; we do not assume that the second derivative of {\gamma} exists. (In particular, smooth curves may have infinite curvature at some points.) In practice this distinction tends to be minor, as the smooth curves that one actually uses in complex analysis do tend to be infinitely differentiable; on the other hand, for most applications one does not need to control any derivative of a contour beyond the first.

The following examples and exercises may help explain why the non-vanishing condition {\gamma'(t) \neq 0} imbues curves with a certain degree of “smoothness”.

Example 21 (Cuspidal curve) Consider the curve {\gamma: [-1,1] \rightarrow {\bf C}} defined by {\gamma(t) := t^2 + it^3}. Clearly {\gamma} is continuously differentiable (and even infinitely differentiable), but we do not view this curve as smooth, because {\gamma'(t) = 2t + 3it^2} vanishes at the origin. Indeed, the image of the curve is {\{ x + i x^{3/2}: 0 \leq x \leq 1 \} \cup \{ x - i x^{3/2}: 0 \leq x \leq 1 \}}, which looks visibly non-smooth at the origin if one plots it, due to the presence of a cusp.

Example 22 (Absolute value function) Consider the curve {\gamma: [-1,1] \rightarrow {\bf C}} defined by {\gamma(t) := t^5 + i |t|^5}. This curve is certainly continuously differentiable, and in fact is four times continuously differentiable, but is not smooth because {\gamma'(t) = 5 t^4 + 5 i |t|^3 t} vanishes at the origin. The image of this curve is {\{ x + i |x|: -1 \leq x \leq 1\}}, which looks visibly non-smooth at the origin (in particular, there is no unique tangent line to this curve here).

Example 23 (Spiral) Consider the curve {\gamma: [-1,1] \rightarrow {\bf C}} defined by {\gamma(t) := t^2 e^{i/t}} for {t \neq 0}, and {\gamma(0) := 0}. One can check (exercise!) that {\gamma} is continuously differentiable, even at the origin {t=0}; but it is not a smooth curve because {\gamma'(0)} vanishes. The image of {\gamma} has some rather complicated behaviour at the origin, for instance it intersects itself multiple times (try to sketch it!).

Exercise 24 (Local behaviour of smooth curves) Let {\gamma: [a,b] \rightarrow {\bf C}} be a simple smooth curve, and let {t_0} be an interior point of {(a,b)}. Let {\theta \in {\bf R}} be a phase of {\gamma'(t_0)}, thus {\gamma'(t_0)= re^{i\theta}} for some {r>0}. Show that for all sufficiently small {\varepsilon>0}, the portion {\gamma([a,b]) \cap D(\gamma(t_0),\varepsilon)} of the image of {\gamma} near {\gamma(t_0)} looks like a rotated graph, in the sense that

\displaystyle  \gamma([a,b]) \cap D(\gamma(t_0),\varepsilon) = \{ \gamma(t_0) + e^{i\theta} ( s + i f(s) ): s \in I_\varepsilon \}

for some interval {I_\varepsilon = (-c_-(\varepsilon), c_+(\varepsilon))} containing the origin, and some continuously differentiable function {f: I_\varepsilon \rightarrow {\bf R}} with {f(0) = f'(0) = 0}. Furthermore, show that {c_-(\varepsilon), c_+(\varepsilon) = \varepsilon + o(\varepsilon)} as {\varepsilon \rightarrow 0} (or equivalently, that {\frac{c_-(\varepsilon)}{\varepsilon}, \frac{c_+(\varepsilon)}{\varepsilon} \rightarrow 1} as {\varepsilon \rightarrow 0}). (Hint: you may find it easier to first work with the model case where {\gamma(t_0) = 0} and {\gamma'(t_0) = 1}. The real-variable inverse function theorem will also be helpful.)

Exercise 25 Show that a curve {\gamma} is a contour if and only if it is equivalent to the concatenation of finitely many simple smooth curves.

Exercise 26 Show that the cuspidal curve and absolute value curves in Examples 21, 22 are contours, but the spiral curve in Example 23 is not.

— 2. The fundamental theorem of calculus —

Now we establish the complex analogues of the fundamental theorem of calculus. As in the real-variable case, there are two useful formulations of this theorem. Here is the first:

Theorem 27 (First fundamental theorem of calculus) Let {U} be an open subset of {{\bf C}}, let {f: U \rightarrow {\bf C}} be a continuous function, and suppose that {f} has an antiderivative {F: U \rightarrow {\bf C}}, that is to say a holomorphic function with {f(z) = F'(z)} for all {z \in U}. Let {\gamma: [a,b] \rightarrow {\bf C}} be a rectifiable curve in {U} with initial point {z_0} and terminal point {z_1}. Then

\displaystyle  \int_\gamma f(z)\ dz = F(z_1) - F(z_0).

Proof: If {\gamma} were continuously differentiable, or at least piecewise continuously differentiable (the concatenation of finitely many continuously differentiable curves), we could establish this theorem by using Exercise 16 to rewrite everything in terms of real-variable Riemann integrals, at which point one can use the real-variable fundamental theorem of calculus (and the chain rule). But actually we can just give a direct proof that does not need any rectifiability hypothesis whatsoever.

We again use the continuity method. Let {\varepsilon > 0}, and for each {a \leq T \leq b}, let {\gamma_{[a,T]}: [a,T] \rightarrow {\bf C}} be the restriction of {\gamma: [a,b] \rightarrow {\bf C}} to {[a,T]}. It will suffice to show that

\displaystyle  |\int_{\gamma_{[a,T]}} f(z)\ dz - (F(\gamma(T)) - F(\gamma(a)))| \leq \varepsilon |\gamma_{[a,T]}| \ \ \ \ \ (7)

for all {a \leq T \leq b}, as the claim then follows by setting {T=b} and sending {\varepsilon} to zero.

Let {\Omega_\varepsilon} denote the set of all {T_* \in [a,b]} such that (7) holds for all {a \leq T \leq T_*}. As before, {\Omega_\varepsilon} is clearly closed and contains {a}; as {[a,b]} is connected, the only remaining task is to show that {\Omega_\varepsilon} is open in {[a,b]}. Let {T_* \in \Omega_\varepsilon} be such that {T_* < b}. As {F} is differentiable at {\gamma(T_*)} with derivative {f(\gamma(T_*))}, and {\gamma} is continuous, there exists {\delta>0} with {[T_*,T_*+\delta] \subset [a,b]} such that

\displaystyle  |\frac{F(\gamma(T)) - F(\gamma(T_*))}{\gamma(T) - \gamma(T_*)} - f(\gamma(T_*))| \leq \varepsilon/2

for any {T_* < T \leq T_*+\delta}, and hence

\displaystyle  |F(\gamma(T)) - F(\gamma(T_*)) - f(\gamma(T_*)) (\gamma(T)-\gamma(T_*))| \leq \varepsilon/2 |\gamma_{[T_*,T]}| \ \ \ \ \ (8)

for any {T \in [T_*, T_*+\delta]}. On the other hand, if {\delta} is small enough, we have that

\displaystyle  |f(\gamma(t)) - f(\gamma(T_*))| \leq \varepsilon/2

for all {t \in [T_*, T_*+\delta]}, and hence by Exercise 16(v) we have

\displaystyle  |\int_{\gamma_{[T_*,T]}} (f(z) - f(\gamma(T_*)))\ dz| \leq \frac{\varepsilon}{2} |\gamma_{[T_*, T]}|

where {\gamma_{[T_*,T]}: [T_*,T] \rightarrow {\bf C}} is the restriction of {\gamma: [a,b] \rightarrow {\bf C}} to {[T_*,T]}. Applying Exercise 16(vi), (vii) we thus have

\displaystyle  |\int_{\gamma_{[T_*,T]}} f(z)\ dz - f(\gamma(T_*)) (\gamma(T)-\gamma(T_*))| \leq \frac{\varepsilon}{2} |\gamma_{[T_*, T]}|.

Combining this with (8) and the triangle inequality, we conclude that

\displaystyle  |\int_{\gamma_{[T_*,T]}} f(z)\ dz - (F(\gamma(T)) - F(\gamma(T_*)))| \leq \varepsilon |\gamma_{[T_*,T]}|

and on adding this to the {T=T_*} case of (7) and again using the triangle inequality, we conclude that (7) holds for all {T \in [T_*,T_*+\delta]}. This ensures that {\Omega_\varepsilon} is open, as desired. \Box

One can use this theorem to quickly evaluate many integrals by using an antiderivative for the integrand as in the real-variable case. For instance, for any rectifiable curve {\gamma} with initial point {z_1} and terminal point {z_2}, we have

\displaystyle  \int_\gamma z\ dz = \frac{1}{2} z_2^2 - \frac{1}{2} z_1^2,

\displaystyle  \int_\gamma e^z\ dz = e^{z_2} - e^{z_1},

\displaystyle  \int_\gamma \cos(z)\ dz = \sin(z_2) - \sin(z_1),

and so forth. If the curve {\gamma} avoids the origin, we also have

\displaystyle  \int_\gamma \frac{1}{z^2}\ dz = -\frac{1}{z_2} + \frac{1}{z_1}

since {-\frac{1}{z}} is an antiderivative of {\frac{1}{z^2}} on {{\bf C} \backslash \{0\}}. If {\sum_{n=0}^\infty a_n (z-z_0)^n} is a power series with radius of convergence {R}, and {\gamma} is a curve in {D(z_0,R)} with initial point {z_1} and terminal point {z_2}, we similarly have

\displaystyle  \int_\gamma \sum_{n=0}^\infty a_n (z-z_0)^n\ dz = \sum_{n=0}^\infty a_n ( \frac{(z_2-z_0)^{n+1}}{n+1} - \frac{(z_1-z_0)^{n+1}}{n+1} ).
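These endpoint evaluations are easy to confirm numerically, since by Theorem 27 the answer should not depend on the path chosen. (A minimal sketch with an arbitrary wiggly path; helper names are ad hoc.)

```python
import cmath, math

def contour_integral(f, gamma, n=20000):
    """Riemann-sum approximation of the contour integral of f along
    gamma : [0, 1] -> C (a sketch, not an efficient quadrature rule)."""
    total = 0.0 + 0.0j
    for j in range(n):
        tag = gamma((j + 0.5) / n)
        total += f(tag) * (gamma((j + 1) / n) - gamma(j / n))
    return total

# A wiggly path from z1 = 0 to z2 = 1 + i; by the first fundamental
# theorem the integral of e^z depends only on these endpoints.
z1, z2 = 0.0 + 0.0j, 1.0 + 1.0j
wiggle = lambda t: z1 + (z2 - z1) * t + 0.3j * cmath.sin(3 * math.pi * t)
lhs = contour_integral(cmath.exp, wiggle)
rhs = cmath.exp(z2) - cmath.exp(z1)
print(abs(lhs - rhs))  # small
```

Replacing `wiggle` with any other rectifiable path from {z_1} to {z_2} in the domain leaves the answer unchanged, up to discretization error.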

For the second fundamental theorem of calculus, we need a topological preliminary result.

Exercise 28 Let {U} be a non-empty open subset of {{\bf C}}. Show that the following statements are equivalent:

  • (i) {U} is topologically connected (that is to say, the only two subsets of {U} that are both open and closed relative to {U} are the empty set and {U} itself).
  • (ii) {U} is path connected (that is to say, for any {z_1,z_2 \in U} there exists a curve {\gamma} with image in {U} whose initial point is {z_1} and terminal point is {z_2}).
  • (iii) {U} is polygonally path connected (the same as (ii), except that {\gamma} is now required to also be a polygonal path).

(Hint: to show that (i) implies (iii), pick a base point {z_0} in {U} and consider the set of all {z_1} in {U} that can be reached from {z_0} by a polygonal path.)

We remark that the relationship between path connectedness and connectedness is more delicate when one does not assume that the space {U} is open; every path-connected space is still connected, but the converse need not be true.

Remark 29 There is some debate as to whether to view the empty set {\emptyset} as connected, disconnected, or neither. I view this as analogous to the debate as to whether the natural number {1} should be viewed as prime, composite, or neither. In both cases I personally prefer the convention of “neither” (and like to use the term “unconnected” to describe the empty set, and “unit” to describe {1}), but to avoid any confusion I will restrict the discussion of connectedness to non-empty sets in this course (which is all we will need in applications).

In real analysis, the second fundamental theorem of calculus asserts that if a function {f: [a,b] \rightarrow {\bf R}} is continuous, then the function {F: x \mapsto \int_a^x f(y)\ dy} is an antiderivative of {f}. In the complex case, there is an analogous result, but one needs the additional requirement that the function is conservative:

Theorem 30 (Second fundamental theorem of calculus) Let {U \subset {\bf C}} be a non-empty open connected subset of the complex numbers. Let {f: U \rightarrow {\bf C}} be a continuous function which is conservative in the sense that

\displaystyle  \int_\gamma f(z)\ dz = 0 \ \ \ \ \ (9)

whenever {\gamma} is a closed polygonal path in {U}. Fix a base point {z_0 \in U}, and define the function {F: U \rightarrow {\bf C}} by the formula

\displaystyle  F(z_1) := \int_{\gamma_{z_0 \rightsquigarrow z_1}} f(z)\ dz

for all {z_1 \in U}, where {\gamma_{z_0 \rightsquigarrow z_1}} is any polygonal path from {z_0} to {z_1} in {U} (the existence of such a path follows from Exercise 28 and the hypothesis that {U} is connected, and the independence of the choice of path for the purposes of defining {F(z_1)} follows from Exercise 16 and the conservative hypothesis (9)). Then {F} is holomorphic on {U} and is an antiderivative of {f}, thus {f(z_1) = F'(z_1)} for all {z_1 \in U}.

Proof: We mimic the proof of the real-variable second fundamental theorem of calculus. Let {z_1} be any point in {U}. As {U} is open, it contains some disk {D(z_1,r)} centred at {z_1}. In particular, if {z_2} lies in this disk, then the line segment {\gamma_{z_1 \rightarrow z_2}} will have image in {U}. If {\gamma_{z_0 \rightsquigarrow z_1}} is any polygonal path from {z_0} to {z_1}, then we have

\displaystyle  F(z_1) = \int_{\gamma_{z_0 \rightsquigarrow z_1}} f(z)\ dz


\displaystyle  F(z_2) = \int_{\gamma_{z_0 \rightsquigarrow z_1} + \gamma_{z_1 \rightarrow z_2}} f(z)\ dz

and hence by Exercise 16

\displaystyle  F(z_2) - F(z_1) - f(z_1) (z_2-z_1) = \int_{\gamma_{z_1 \rightarrow z_2}} (f(z) - f(z_1))\ dz.

(The reader is strongly advised to draw a picture depicting the situation here.) Let {\varepsilon>0}, then for {z_2} sufficiently close to {z_1}, we have {|f(z)-f(z_1)| \leq \varepsilon} for all {z} in the image of {\gamma_{z_1 \rightarrow z_2}}. Thus by Exercise 16(v) we have

\displaystyle  |F(z_2) - F(z_1) - f(z_1) (z_2-z_1)| \leq \varepsilon |z_2-z_1|

for {z_2} sufficiently close to {z_1}, which implies that

\displaystyle  \lim_{z_2 \rightarrow z_1; z_2 \in U \backslash \{z_1\}} \frac{F(z_2)-F(z_1)}{z_2-z_1} = f(z_1)

for any {z_1 \in U}, and thus {F} is an antiderivative of {f} as required. \Box

The notion of a non-empty open connected subset {U} of the complex plane comes up so frequently in complex analysis that many texts assign a special term to this notion; for instance, Stein-Shakarchi refers to such sets as regions, and in other texts they may be called domains. We will stick to just “non-empty open connected subset of {{\bf C}}” in this course.

The requirement that {f} be conservative is necessary, as the following exercise shows. Actually it is also necessary in the real-variable case, but it is redundant in that case due to the topological triviality of closed polygonal paths in one dimension: see Exercise 33 below.

Exercise 31 Let {U} be a non-empty open connected subset of {{\bf C}}, and let {f: U \rightarrow {\bf C}} be continuous. Show that the following are equivalent:

  • (i) {f} possesses at least one antiderivative {F}.
  • (ii) {f} is conservative in the sense that (9) holds for all closed polygonal paths in {U}.
  • (iii) {f} is conservative in the sense that (9) holds for all simple closed polygonal paths in {U}.
  • (iv) {f} is conservative in the sense that (9) holds for all closed contours in {U}.
  • (v) {f} is conservative in the sense that (9) holds for all closed rectifiable curves in {U}.

(Hint: to show that (iii) implies (ii), induct on the number of edges in the closed polygonal path, and find a way to decompose non-simple closed polygonal paths into paths with fewer edges. One should avoid non-rigorous “hand-waving” arguments, and make sure that one actually has covered all possible cases, e.g. paths that include some backtracking.) Furthermore, show that if {f} has two antiderivatives {F_1, F_2}, then there exists a constant {C \in {\bf C}} such that {F_2 = F_1 + C}.

Exercise 32 Show that the function {\frac{1}{z}} does not have an antiderivative on {{\bf C} \backslash \{0\}}. (Hint: integrate {\frac{1}{z}} on {\gamma_{0,1,\circlearrowleft}}.) In later notes we will see that {\frac{1}{z}} nevertheless does have antiderivatives on many subsets of {{\bf C} \backslash \{0\}}, formed by various branches of the complex logarithm.
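A hedged numerical illustration of the hint: the integral of {1/z} over the unit circle comes out to {2\pi i \neq 0}, which by Theorem 27 rules out an antiderivative on all of {{\bf C} \backslash \{0\}}.

```python
import cmath, math

# Midpoint Riemann sum for the integral of 1/z over gamma_{0,1,anticlockwise},
# using dz = i e^{i theta} d theta; the exact answer is 2*pi*i.
n = 20000
total = 0.0 + 0.0j
for j in range(n):
    theta = 2 * math.pi * (j + 0.5) / n
    z = cmath.exp(1j * theta)
    total += (1.0 / z) * 1j * z * (2 * math.pi / n)
print(total)  # approximately 2*pi*i, hence nonzero
```

Since each summand reduces to {i\, \Delta\theta}, the sum is {2\pi i} up to rounding; the nonzero answer over a closed curve is exactly what an antiderivative would forbid.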

Exercise 33 If {f: [a,b] \rightarrow {\bf C}} is a continuous function on an interval, show that {f} is conservative in the sense that (9) holds for any closed polygonal path in {[a,b]}. What happens for closed rectifiable paths?

Exercise 34 Let {U} be an open subset of {{\bf C}} (not necessarily connected).

  • (i) Show that there is a unique collection {{\mathcal U}} of non-empty subsets of {U} that are open, connected, disjoint, and partition {U}: {U = \biguplus_{V \in {\mathcal U}} V}. (The elements of {{\mathcal U}} are known as the connected components of {U}.)
  • (ii) Show that the number of connected components of {U} is at most countable. (Hint: show that each connected component contains at least one complex number with rational real and imaginary parts.)
  • (iii) If {f: U \rightarrow {\bf C}} is a continuous conservative function on {U}, show that {f} has at least one antiderivative {F}.
  • (iv) If {U} has more than one connected component, show that it is possible for a function {f: U \rightarrow {\bf C}} to have two antiderivatives {F_1, F_2} which do not differ by a constant (i.e. there is no complex number {C} such that {F_1 = F_2 + C}).

Exercise 35 (Integration by parts) Let {f,g: U \rightarrow {\bf C}} be holomorphic functions on an open set {U}, and let {\gamma} be a rectifiable curve in {U} with initial point {z_1} and terminal point {z_2}. Prove that

\displaystyle  \int_\gamma f(z) g'(z)\ dz = f(z_2) g(z_2) - f(z_1) g(z_1) - \int_\gamma f'(z) g(z)\ dz.
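This identity is also easy to check numerically for a concrete pair of functions, say {f(z) = z} and {g(z) = e^z} along a line segment. (A minimal sketch; helper names are ad hoc.)

```python
import cmath

def contour_integral(f, gamma, n=20000):
    """Midpoint Riemann-sum approximation along gamma : [0, 1] -> C."""
    total = 0.0 + 0.0j
    for j in range(n):
        tag = gamma((j + 0.5) / n)
        total += f(tag) * (gamma((j + 1) / n) - gamma(j / n))
    return total

# Integration by parts with f(z) = z, g(z) = e^z (so f' = 1, g' = e^z)
# along the segment from z1 = 0 to z2 = 1 + i.
z1, z2 = 0.0 + 0.0j, 1.0 + 1.0j
seg = lambda t: z1 + (z2 - z1) * t
lhs = contour_integral(lambda z: z * cmath.exp(z), seg)
rhs = z2 * cmath.exp(z2) - z1 * cmath.exp(z1) - contour_integral(cmath.exp, seg)
print(abs(lhs - rhs))  # small
```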

Filed under: 246A - complex analysis, math.CV, math.GT Tagged: contour integration, fundamental theorem of calculus, rectifiable curve

September 29, 2016

Mark Chu-CarrollPolls and Sampling Errors in the Presidental Debate Results

My biggest pet peeve is press coverage of statistics. As someone who is mathematically literate, I’m constantly infuriated by it. Basic statistics isn’t that hard, but people can’t be bothered to actually learn a tiny bit in order to understand the meaning of the things they’re covering.

My twitter feed has been exploding with a particularly egregious example of this. After monday night’s presidential debate, there’s been a ton of polling about who “won” the debate. One conservative radio host named Bill Mitchell has been on a rampage about those polls. Here’s a sample of his tweets:

Let’s start with a quick refresher about statistics, why we use them, and how they work.

Statistical analysis has a very simple point. We’re interested in understanding the properties of a large population of things. For whatever reason, we can’t measure the properties of every object in that population.

The exact reason can vary. In political polling, we can’t ask every single person in the country who they’re going to vote for. (Even if we could, we simply don’t know who’s actually going to show up and vote!) For a very different example, my first exposure to statistics was through my father, who worked in semiconductor manufacturing. They’d produce a run of 10,000 chips for use in satellites. They needed to know when, on average, a chip would fail from exposure to radiation. (If they measured that in every chip, they’d end up with nothing to sell.)

Anyway: you can’t measure every element of the population, but you still want to take measurements. So what you do is randomly select a collection of representative elements from the population, and you measure those. Then you can say that with a certain probability, the result of analyzing that representative subset will match the result that you’d get if you measured the entire population.

How close can you get? If you’ve really selected a random sample of the population, then the answer depends on the size of the sample. We measure that using something called the “margin of error”. “Margin of error” is actually a terrible name for it, and that’s the root cause of one of the most common problems in reporting about statistics. The margin of error is a probability measurement that says “there is an N% probability that the value for the full population lies within the margin of error of the measured value of the sample.”

Right away, there’s a huge problem with that. What is that variable doing in there? The margin of error measures the probability that the full population value is within a confidence interval around the measured sample value. If you don’t say what the confidence interval is, the margin of error is worthless. Most of the time – but not all of the time – we’re talking about a 95% confidence interval.

But there are several subtler issues with the margin of error, both due to the name.

  1. The “true” value for the full population is not guaranteed to be within the margin of error of the sampled value. It’s just a probability. There is no hard bound on the size of the error: just a high probability of it being within the margin of error.
  2. The margin of error only includes errors due to sample size. It does not incorporate any other factor – and there are many! – that may have affected the result.
  3. The margin of error is deeply dependent on the way that the underlying sample was taken. It’s only meaningful for a random sample. That randomness is critically important: all of sampled statistics is built around the idea that you’ve got a randomly selected subset of your target population.
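The standard formula behind the margin of error can be sketched in a few lines. (A minimal illustration, assuming a simple random sample and the usual normal approximation; the helper name is ad hoc.)

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error for an estimated proportion p from a simple random
    sample of size n, at the confidence level corresponding to z
    (z = 1.96 gives roughly a 95% confidence interval)."""
    return z * math.sqrt(p * (1 - p) / n)

# p = 0.5 is the worst case; this is the familiar "plus or minus" figure.
for n in (500, 1000, 2000):
    print(n, round(100 * margin_of_error(n), 1), "%")
```

Note that the sample size enters only through a square root: quadrupling the sample merely halves the margin of error, which is why well-run polls settle for samples in the hundreds or low thousands.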

Let’s get back to our friend the radio host, and his first tweet, because he’s doing a great job of illustrating some of these errors.

The quality of a sampled statistic is entirely dependent on how well the sample matches the population. The sample is critical. It doesn’t matter how big the sample size is if it’s not random. A non-random sample cannot be treated as a representative sample.

So: an internet poll, where a group of people has to deliberately choose to exert the effort to participate cannot be a valid sample for statistical purposes. It’s not random.

It’s true that the set of people who show up to vote isn’t a random sample. But that’s fine: the purpose of an election isn’t to try to divine what the full population thinks. It’s to count what the people who chose to vote think. It’s deliberately measuring a full population: the population of people who chose to vote.

But if you’re trying to statistically measure something about the population of people who will go and vote, you need to take a randomly selected sample of people who will go to vote. The set of voters is the full population; you need to select a representative sample of that population.

Internet polls do not do that. At best, they measure a different population of people. (At worst, with ballot stuffing, they measure absolutely nothing, but we’ll give them this much benefit of the doubt.) So you can’t take much of anything about the sample population and use it to reason about the full population.

And you can’t say anything about the margin of error, either. Because the margin of error is only meaningful for a representative sample. You cannot compute a meaningful margin of error for a non-representative sample, because there is no way of knowing how that sampled population compares to the true full target population.

And that brings us to the second tweet. A properly sampled random population of 500 people can produce a high quality result with a roughly 5% margin of error at a 95% confidence interval. (I’m doing a back-of-the-envelope calculation here, so that’s not precise.) That means that if the population were randomly sampled, then in 19 out of 20 polls of that size, the full population value would be within +/- 5% of the value measured by the poll. For a non-randomly selected sample of 10 million people, the margin of error cannot be measured, because it’s meaningless. The random sample of 500 people tells us a reasonable estimate based on data; the non-random sample of 10 million people tells us nothing.
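One can check the “19 out of 20” interpretation with a quick simulation. (A sketch under assumed inputs: a hypothetical true level of support of 52% and the worst-case 95% margin of error.)

```python
import math, random

# Simulate many properly random polls of 500 people from a population
# whose true level of support is (hypothetically) 52%, and count how
# often the true value lands inside the 95% margin of error.
random.seed(0)
p_true, n, trials = 0.52, 500, 2000
moe = 1.96 * math.sqrt(0.25 / n)       # worst-case 95% margin of error
covered = sum(
    abs(sum(random.random() < p_true for _ in range(n)) / n - p_true) <= moe
    for _ in range(trials)
)
print(covered / trials)  # roughly 0.95 or a bit above
```

The crucial point is that the simulation draws each respondent at random; rerun it with any non-random selection rule and the coverage guarantee evaporates.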

And with that, on to the third tweet!

In a poll like this, the margin of error only tells us one thing: what’s the probability that the sampled population will respond to the poll in the same way that the full population would?

There are many, many things that can affect a poll beyond the sample size. Even with a truly random and representative sample, there are many things that can affect the outcome. For a couple of examples:

How, exactly, is the question phrased? For example, if you ask people “Should police shoot first and ask questions later?”, you’ll get a very different answer from “Should police shoot dangerous criminal suspects if they feel threatened?” – but both of those questions are trying to measure very similar things. But the phrasing of the questions dramatically affects the outcome.

What context is the question asked in? Is this the only question asked? Or is it asked after some other set of questions? The preceding questions can bias the answers. If you ask a bunch of questions about how each candidate did with respect to particular issues before you ask who won, those preceding questions will bias the answers.

When you’re looking at a collection of polls that asked different questions in different ways, you expect a significant variation between them. That doesn’t mean that there’s anything wrong with any of them. They can all be correct even though their results vary by much more than their margins of error, because the margin of error has nothing to do with how you compare their results: they used different samples, and measured different things.

The problem with the reporting is the same things I mentioned up above. The press treats the margin of error as an absolute bound on the error in the computed sample statistics (which it isn’t); and the press pretends that all of the polls are measuring exactly the same thing, when they’re actually measuring different (but similar) things. They don’t tell us what the polls are really measuring; they don’t tell us what the sampling methodology was; and they don’t tell us the confidence interval.

Which leads to exactly the kind of errors that Mr. Mitchell made.

And one bonus. Mr. Mitchell repeatedly rants about how many polls show a “bias” by “over-sampling” democratic party supporters. This is a classic mistake by people who don’t understand statistics. As I keep repeating, for a sample to be meaningful, it must be random. You can report on all sorts of measurements of the sample, but you cannot change it.

If you’re randomly selecting phone numbers and polling the respondents, you cannot screen them based on their self-reported party affiliation. If you do, you are biasing your sample. Mr. Mitchell may not like the results, but that doesn’t make them invalid. People report what they report.

In the last presidential election, we saw exactly this notion in the idea of “unskewing” polls, where a group of conservative folks decided that the polls were all biased in favor of the democrats for exactly the reasons cited by Mr. Mitchell. They recomputed the poll results based on shifting the samples to represent what they believed to be the “correct” breakdown of party affiliation in the voting population. The results? The actual election results closely tracked the supposedly “skewed” polls, and the unskewers came off looking like idiots.

We also saw exactly this phenomenon going on in the Republican primaries this year. Randomly sampled polls consistently showed Donald Trump crushing his opponents. But the political press could not believe that Donald Trump would actually win – and so they kept finding ways to claim that the poll samples were off: for instance, that the polls relied on land-lines and so oversampled older people, and that if you corrected for that sampling error, Trump wasn’t actually winning. Nope: the randomly sampled polls were correct, and Donald Trump is the Republican nominee.

If you want to use statistics, you must work with random samples. If you don’t, you’re going to screw up the results, and make yourself look stupid.

September 28, 2016

Richard EastherThe Man Who Sold Mars

Elon Musk knows how to make a splash, and today he outlined his plan to turn humanity into a "multiplanetary species". Getting people to Mars is certainly doable and Musk's company SpaceX is at the forefront of current developments in space technology. But Musk painted a picture of a future where travel to Mars was downright cheap, with tickets costing as little as $200,000, the median price of an American home.

So is this possible? I have no idea, but it makes for a great Fermi question, a problem so fuzzy and incomplete that educated guesswork is the only way forward. (And "educated" is the key word – these puzzlers are part of the legacy of Enrico Fermi, who used to test students with problems like "how many piano tuners are there in Chicago?" alongside more technical physics topics.)

The key number is not distance but cost. Musk talks about making a journey to Mars that lasts a few months, but it will take the better part of a year for the Interplanetary Spaceship to make a return trip to Mars, even if most passengers are only travelling one way. Let's compare that to the cost of long-haul plane travel: I might pay $700 for a one-way trans-Pacific flight. This fare entitles me to 1/300th of a very expensive airplane for most of one day and covers my share of the fuel and the crew.

So how does this stack up against Musk's goal for a trip to Mars? His presentation talked about a craft that holds 100-200 passengers. Let's assume that SpaceX can eventually get to the point where Interplanetary Spaceships are about as expensive to build and run as modern airliners. Splitting the difference between 100 and 200, the cost of a single seat would be $1,400 a day. And each passenger is effectively using their slice of an Interplanetary Spaceship for one year, so that works out at about $500,000 for the trip.

By the standards that apply to Fermi problems (where you worry about factors of 10 but ignore factors of 2 or 3) that is pretty close to $200,000. But to make this actually happen we will need to build spaceships that are as cheap as airplanes. 
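The arithmetic above can be written out explicitly. (A back-of-the-envelope sketch; every number here is a rough guess taken from the text, not a real SpaceX or airline figure.)

```python
# All inputs here are the rough guesses from the text, not real figures.
airfare = 700                 # dollars for one seat-day on an airliner
airliner_seats = 300
ship_seats = 150              # splitting the difference between 100 and 200
trip_days = 365               # a return trip takes about a year

# Assume the ship costs about as much per day to run as an airliner.
ship_day_cost = airfare * airliner_seats      # ~$210,000 per day
ticket = ship_day_cost / ship_seats * trip_days
print(round(ticket))          # 511000, i.e. about half a million dollars
```

In Fermi-problem spirit, any of these inputs could easily be off by a factor of two without changing the conclusion that the answer is within an order of magnitude of $200,000.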

And you would still be flying economy class.

Coda: It's a reasonably safe bet that Elon Musk has read his Heinlein. 

David Hoggfitting spectroscopic systematics and marginalizing out #GaiaDR1

In the morning, I discussed new NYU graduate student Jason Cao's project to generalize The Cannon to fit for radial-velocity offsets and line-spread function variations at test time. This involves generalizing the model, but in a way that doesn't make anything much more computationally complex.

In the afternoon, I had a realization that we probably can compute fully marginalized likelihoods for the wide-separation binary problem in Gaia DR1 TGAS. The idea is that if we treat the velocity distribution as Gaussian, and the proper-motion errors as Gaussian, then at fixed true distance there is an analytic velocity integral. That reduces the marginalization to only two non-analytic dimensions (the true distances to the two stars). I started to work out the math and then foundered on the rocks of completing the square in the case of non-square matrix algebra. No problem really; we have succeeded before (in our K2 work).

September 27, 2016

Terence Tao: 246A, Notes 1: complex differentiation

At the core of almost any undergraduate real analysis course are the concepts of differentiation and integration, with these two basic operations being tied together by the fundamental theorem of calculus (and its higher dimensional generalisations, such as Stokes’ theorem). Similarly, the notion of the complex derivative and the complex line integral (that is to say, the contour integral) lie at the core of any introductory complex analysis course. Once again, they are tied to each other by the fundamental theorem of calculus; but in the complex case there is a further variant of the fundamental theorem, namely Cauchy’s theorem, which endows complex differentiable functions with many important and surprising properties that are often not shared by their real differentiable counterparts. We will give complex differentiable functions another name to emphasise this extra structure, by referring to such functions as holomorphic functions. (This term is also useful to distinguish these functions from the slightly less well-behaved meromorphic functions, which we will discuss in later notes.)

In this set of notes we will focus solely on the concept of complex differentiation, deferring the discussion of contour integration to the next set of notes. To begin with, the theory of complex differentiation will greatly resemble the theory of real differentiation; the definitions look almost identical, and well known laws of differential calculus such as the product rule, quotient rule, and chain rule carry over verbatim to the complex setting, and the theory of complex power series is similarly almost identical to the theory of real power series. However, when one compares the “one-dimensional” differentiation theory of the complex numbers with the “two-dimensional” differentiation theory of two real variables, we find that the dimensional discrepancy forces complex differentiable functions to obey a real-variable constraint, namely the Cauchy-Riemann equations. These equations make complex differentiable functions substantially more “rigid” than their real-variable counterparts; they imply for instance that the imaginary part of a complex differentiable function is essentially determined (up to constants) by the real part, and vice versa. Furthermore, even when considered separately, the real and imaginary components of complex differentiable functions are forced to obey the strong constraint of being harmonic. In later notes we will see these constraints manifest themselves in integral form, particularly through Cauchy’s theorem and the closely related Cauchy integral formula.

Despite all the constraints that holomorphic functions have to obey, a surprisingly large number of the functions of a complex variable that one actually encounters in applications turn out to be holomorphic. For instance, any polynomial {z \mapsto P(z)} with complex coefficients will be holomorphic, as will the complex exponential {z \mapsto \exp(z)}. From this and the laws of differential calculus one can then generate many further holomorphic functions. Also, as we will show presently, complex power series will automatically be holomorphic inside their disk of convergence. On the other hand, there are certainly basic complex functions of interest that are not holomorphic, such as the complex conjugation function {z \mapsto \overline{z}}, the absolute value function {z \mapsto |z|}, or the real and imaginary part functions {z \mapsto \mathrm{Re}(z), z \mapsto \mathrm{Im}(z)}. We will also encounter functions that are only holomorphic on some portions of the complex plane, but not on others; for instance, rational functions will be holomorphic except at those few points where the denominator vanishes, and are prime examples of the meromorphic functions mentioned previously. Later on we will also consider functions such as branches of the logarithm or square root, which will be holomorphic outside of a branch cut corresponding to the choice of branch. It is a basic but important skill in complex analysis to be able to quickly recognise which functions are holomorphic and which ones are not, as many of the useful theorems available to the former (such as Cauchy’s theorem) break down spectacularly for the latter. Indeed, in my experience, one of the most common “rookie errors” that beginning complex analysis students make is the error of attempting to apply a theorem about holomorphic functions to a function that is not at all holomorphic. 
This stands in contrast to the situation in real analysis, in which one can often obtain correct conclusions by formally applying the laws of differential or integral calculus to functions that might not actually be differentiable or integrable in a classical sense. (This latter phenomenon, by the way, can be largely explained using the theory of distributions, as covered for instance in this previous post, but this is beyond the scope of the current course.)

Remark 1 In this set of notes it will be convenient to impose some unnecessarily generous regularity hypotheses (e.g. continuous second differentiability) on the holomorphic functions one is studying in order to make the proofs simpler. In later notes, we will discover that these hypotheses are in fact redundant, due to the phenomenon of elliptic regularity that ensures that holomorphic functions are automatically smooth.

— 1. Complex differentiation and power series —

Recall in real analysis that if {f: U \rightarrow {\bf R}} is a function defined on some subset {U} of the real line {{\bf R}}, and {x_0} is an interior point of {U} (that is to say, {U} contains an interval of the form {(x_0-\varepsilon,x_0+\varepsilon)} for some {\varepsilon>0}), then we say that {f} is differentiable at {x_0} if the limit

\displaystyle  \lim_{x \rightarrow x_0; x \in U \backslash \{x_0\}} \frac{f(x)-f(x_0)}{x-x_0}

exists (note we have to exclude {x_0} from the possible values of {x} to avoid division by zero). If {f} is differentiable at {x_0}, we denote the above limit as {f'(x_0)} or {\frac{df}{dx}(x_0)}, and refer to this as the derivative of {f} at {x_0}. If {U} is open (that is to say, every element of {U} is an interior point), and {f} is differentiable at every point of {U}, then we say that {f} is differentiable on {U}, and call {f': U \rightarrow {\bf R}} the derivative of {f}. (One can also define differentiability at non-interior points if they are not isolated, but for simplicity we will restrict attention to interior derivatives only.)

We can adapt this definition to the complex setting without any difficulty:

Definition 2 (Complex differentiability) Let {U} be a subset of the complex numbers {{\bf C}}, and let {f: U \rightarrow {\bf C}} be a function. If {z_0} is an interior point of {U} (that is to say, {U} contains a disk {D(z_0,\varepsilon) := \{ z \in {\bf C}: |z-z_0| < \varepsilon\}} for some {\varepsilon>0}), we say that {f} is complex differentiable at {z_0} if the limit

\displaystyle  \lim_{z \rightarrow z_0; z \in U \backslash \{z_0\}} \frac{f(z)-f(z_0)}{z-z_0}

exists, in which case we denote this limit as {f'(z_0)} or {\frac{df}{dz}(z_0)}, and refer to this as the complex derivative of {f} at {z_0}. If {U} is open (that is to say, every point in {U} is an interior point), and {f} is complex differentiable at every point of {U}, we say that {f} is complex differentiable on {U}, or holomorphic on {U}.

In terms of epsilons and deltas: {f} is complex differentiable at {z_0} with derivative {f'(z_0)} if and only if, for every {\varepsilon>0}, there exists {\delta>0} such that {|\frac{f(z)-f(z_0)}{z-z_0} - f'(z_0)| < \varepsilon} whenever {z \in U} is such that {0 < |z-z_0| < \delta}. Another way of writing this is that we have an approximate linearisation

\displaystyle  f(z) = f(z_0) + f'(z_0) (z-z_0) + o( |z-z_0| ) \ \ \ \ \ (1)

as {z} approaches {z_0}, where {o(|z-z_0|)} denotes a quantity of the form {|z-z_0| c(z)} for {z} in a neighbourhood of {z_0}, where {c(z)} goes to zero as {z} goes to {z_0}.

If {f} is differentiable at {z_0}, then from the limit laws we see that

\displaystyle  \lim_{z \rightarrow z_0: z \in U \backslash \{z_0\}} (f(z) - f(z_0)) = f'(z_0) \lim_{z \rightarrow z_0}(z-z_0) = 0

and hence

\displaystyle  \lim_{z \rightarrow z_0: z \in U} f(z) = f(z_0),

that is to say that {f} is continuous at {z_0}. In particular, holomorphic functions are automatically continuous. (Later on we will see that they are in fact far more regular than this, being smooth and even analytic.)

It is usually quite tedious to verify complex differentiability of a function, and to compute its derivative, from first principles. We will give just one example of this:

Proposition 3 Let {n} be a non-negative integer. Then the function {z \mapsto z^n} is holomorphic on the entire complex plane {{\bf C}}, with derivative {z \mapsto n z^{n-1}} (with the convention that {n z^{n-1}} is zero when {n=0}).

Proof: This is clear for {n=0}, so suppose {n \geq 1}. We need to show that for any complex number {z_0}, that

\displaystyle  \lim_{z \rightarrow z_0; z \neq z_0} \frac{z^n - z_0^n}{z - z_0} = n z_0^{n-1}.

But we have the geometric series identity

\displaystyle  \frac{z^n - z_0^n}{z - z_0} = z^{n-1} + z^{n-2} z_0 + \dots + z z_0^{n-2} + z_0^{n-1},

which is valid (in any field) whenever {z \neq z_0}, as can be seen either by induction or by multiplying both sides by {z-z_0} and cancelling the telescoping series on the right-hand side. The claim then follows from the usual limit laws. \Box

Fortunately, we have the familiar laws of differential calculus, that allow us to more quickly establish the differentiability of functions if they arise as various combinations of functions that are already known to be differentiable, and to compute the derivative:

Exercise 4 (Laws of differentiation) Let {U} be an open subset of {{\bf C}}, let {z_0} be a point in {U}, and let {f, g: U \rightarrow {\bf C}} be functions that are complex differentiable at {z_0}.

  • (i) (Linearity) Show that {f+g} is complex differentiable at {z_0}, with derivative {f'(z_0)+g'(z_0)}. For any constant {c \in {\bf C}}, show that {cf} is differentiable at {z_0}, with derivative {cf'(z_0)}.
  • (ii) (Product rule) Show that {fg} is complex differentiable at {z_0}, with derivative {f'(z_0) g(z_0) + f(z_0) g'(z_0)}.
  • (iii) (Quotient rule) If {g(z_0)} is non-zero, show that {f/g} (which is defined in a neighbourhood of {z_0}, by continuity) is complex differentiable at {z_0}, with derivative {\frac{f'(z_0) g(z_0) - f(z_0) g'(z_0)}{g(z_0)^2}}.
  • (iv) (Chain rule) If {V} is a neighbourhood of {f(z_0)}, and {h: V \rightarrow {\bf C}} is a function that is complex differentiable at {f(z_0)}, show that the composition {h \circ f} (which is defined in a neighbourhood of {z_0}) is complex differentiable at {z_0}, with derivative

    \displaystyle  (h \circ f)'(z_0) = h'(f(z_0)) f'(z_0).

(Hint: take your favourite proof of the real-variable version of these facts and adapt them to the complex setting.)
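One can spot-check these laws numerically before proving them; the sketch below compares difference quotients against the claimed formulas for the product and chain rules, with the test functions (a square and the exponential) and the sample point chosen arbitrarily.

```python
import cmath

# Numerical difference quotient with a small complex increment.
def num_deriv(F, z0, eps=1e-7 + 1e-7j):
    return (F(z0 + eps) - F(z0)) / eps

z0 = 0.7 - 0.4j
f = lambda z: z * z          # f'(z) = 2z
h = cmath.exp                # h'(z) = exp(z)

# Product rule: (f h)'(z0) = f'(z0) h(z0) + f(z0) h'(z0).
prod = num_deriv(lambda z: f(z) * h(z), z0)
assert abs(prod - (2 * z0 * h(z0) + f(z0) * h(z0))) < 1e-5

# Chain rule: (h o f)'(z0) = h'(f(z0)) f'(z0).
chain = num_deriv(lambda z: h(f(z)), z0)
assert abs(chain - h(f(z0)) * 2 * z0) < 1e-5
```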

One could also state and prove a complex-variable form of the inverse function theorem here, but the proof of that statement is a bit more complicated than the ones in the above exercise, so we defer it until later in the course when it becomes needed.

If a function {f: {\bf C} \rightarrow {\bf C}} is holomorphic on the entire complex plane, we call it an entire function; clearly such functions remain holomorphic when restricted to any open subset {U} of the complex plane. Thus for instance Proposition 3 tells us that the functions {z \mapsto z^n} are entire, and from linearity we then see that any complex polynomial

\displaystyle  P(z) = a_n z^n + \dots + a_1 z + a_0

will be an entire function, with derivative given by the familiar formula

\displaystyle  P'(z) = n a_n z^{n-1} + \dots + a_1. \ \ \ \ \ (2)

A function of the form {P(z)/Q(z)}, where {P,Q} are polynomials with {Q} not identically zero, is called a rational function, being to polynomials as rational numbers are to integers. Such a rational function is well defined at any point where {Q} does not vanish. From the factor theorem (which works over any field, and in particular over the complex numbers) we know that the number of zeroes of {Q} is finite, being bounded by the degree of {Q} (of course we will be able to say something stronger once we have the fundamental theorem of algebra). Because of these singularities, rational functions are rarely entire; but from the quotient rule we do at least see that {P(z)/Q(z)} is complex differentiable wherever the denominator is non-zero. Such functions are prime examples of meromorphic functions, which we will discuss later in the course.

Exercise 5 (Gauss-Lucas theorem) Let {P(z)} be a complex polynomial that is factored as

\displaystyle  P(z) = c (z-z_1) \dots (z-z_n)

for some non-zero constant {c \in {\bf C}} and roots {z_1,\dots,z_n \in {\bf C}} (not necessarily distinct) with {n \geq 1}.

  • (i) Suppose that {z_1,\dots,z_n} all lie in the upper half-plane {\{ z \in {\bf C}: \mathrm{Im}(z) \geq 0 \}}. Show that any root of the derivative {P'(z)} also lies in the upper half-plane. (Hint: use the product rule to decompose the log-derivative {\frac{P'(z)}{P(z)}} into partial fractions, and then investigate the sign of the imaginary part of this log-derivative for {z} outside the upper half-plane.)
  • (ii) Show that all the roots of {P'} lie in the convex hull of the set {z_1,\dots,z_n} of roots of {P}, that is to say the smallest convex polygon that contains {z_1,\dots,z_n}.
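The theorem is easy to illustrate numerically. The sketch below takes the cubic P(z) = z(z-2)(z-i) (an arbitrary choice), finds its two critical points with the quadratic formula, and checks that both lie in the triangle spanned by the roots 0, 2, i.

```python
import cmath

# P(z) = z(z-2)(z-i) = z^3 - (2+i) z^2 + 2i z, so P'(z) = 3z^2 - (4+2i)z + 2i.
roots = [0, 2, 1j]
a, b, c = 3, -(4 + 2j), 2j
disc = cmath.sqrt(b * b - 4 * a * c)
crit = [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]

def cross(o, p, q):
    # 2D cross product of the vectors o->p and o->q (complex numbers as points).
    return (p.real - o.real) * (q.imag - o.imag) - (p.imag - o.imag) * (q.real - o.real)

def in_triangle(w, A, B, C):
    # w lies in the counterclockwise triangle ABC iff it is on or to the
    # left of each directed edge.
    return all(cross(P, Q, w) >= -1e-12 for P, Q in [(A, B), (B, C), (C, A)])

assert all(in_triangle(w, *roots) for w in crit)
```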

Now we discuss power series, which are infinite degree variants of polynomials, and which turn out to inherit many of the algebraic and analytic properties of such polynomials, at least if one stays within the disk of convergence.

Definition 6 (Power series) Let {z_0} be a complex number. A formal power series with complex coefficients around the point {z_0} is a formal series of the form

\displaystyle  \sum_{n=0}^\infty a_n (\mathrm{z}-z_0)^n

for some complex numbers {a_0, a_1, \dots}, with {\mathrm{z}} an indeterminate.

One can attempt to evaluate a formal power series {\sum_{n=0}^\infty a_n (\mathrm{z}-z_0)^n} at a given complex number {z} by replacing the formal indeterminate {\mathrm{z}} with the complex number {z}. This may or may not produce a convergent (or absolutely convergent) series, depending on where {z} is; for instance, the power series {\sum_{n=0}^\infty a_n (\mathrm{z}-z_0)^n} is always absolutely convergent at {z=z_0}, but the geometric power series {\sum_{n=0}^\infty \mathrm{z}^n} fails to be even conditionally convergent whenever {|z| \geq 1} (since the summands do not go to zero). As it turns out, the region of convergence is always essentially a disk, the size of which depends on how rapidly the coefficients {a_n} decay (or how slowly they grow):

Proposition 7 (Convergence of power series) Let {\sum_{n=0}^\infty a_n (\mathrm{z}-z_0)^n} be a formal power series, and define the radius of convergence {R \in [0,+\infty]} of the series to be the quantity

\displaystyle  R := \liminf_{n \rightarrow \infty} |a_n|^{-1/n} \ \ \ \ \ (3)

with the convention that {|a_n|^{-1/n}} is infinite if {a_n=0}. (Note that {R} is allowed to be zero or infinite.) Then the formal power series is absolutely convergent for any {z} in the disk {D(z_0,R) := \{ z: |z-z_0| < R \}} (known as the disk of convergence), and is divergent (i.e., not convergent) for any {z} in the exterior region {\{ z: |z-z_0| > R \}}.

Proof: The proof is nearly identical to the analogous result for real power series. First suppose that {z} is a complex number with {|z-z_0| > R} (this of course implies that {R} is finite). Then by (3), we have {|a_n|^{-1/n} < |z-z_0|} for infinitely many {n}, which after some rearranging implies that {| a_n (z-z_0)^n | > 1} for infinitely many {n}. In particular, the sequence {a_n (z-z_0)^n} does not go to zero as {n \rightarrow \infty}, which implies that {\sum_{n=0}^\infty a_n (z-z_0)^n} is divergent.

Now suppose that {z} is a complex number with {|z-z_0| < R} (this of course implies that {R} is non-zero). Choose a real number {r} with {|z-z_0| < r < R}, then by (3), we have {|a_n|^{-1/n} > r} for all sufficiently large {n}, which after some rearranging implies that

\displaystyle  |a_n (z-z_0)^n | < (\frac{|z-z_0|}{r})^n

for all sufficiently large {n}. Since the geometric series {\sum_{n=0}^\infty (\frac{|z-z_0|}{r})^n} is absolutely convergent, this implies that {\sum_{n=0}^\infty a_n (z-z_0)^n} is absolutely convergent also, as required. \Box

Remark 8 Note that this proposition does not say what happens on the boundary {\{ z: |z-z_0| = R \}} of this disk (assuming for sake of discussion that the radius of convergence {R} is finite and non-zero). The behaviour of power series on and near the boundary of the disk of convergence is in fact remarkably subtle; see for instance Example 11 below.

The above proposition gives a “root test” formula for the radius of convergence. The following “ratio test” variant gives a convenient lower bound for the radius of convergence which suffices in many applications:

Exercise 9 (Ratio test) If {\sum_{n=0}^\infty a_n (\mathrm{z}-z_0)^n} is a formal power series with the {a_n} non-zero for all sufficiently large {n}, show that the radius of convergence {R} of the series obeys the lower bound

\displaystyle  \limsup_{n \rightarrow \infty} \frac{|a_n|}{|a_{n+1}|} \geq R \geq \liminf_{n \rightarrow \infty} \frac{|a_n|}{|a_{n+1}|}. \ \ \ \ \ (4)

In particular, if the limit {\lim_{n \rightarrow \infty} \frac{|a_n|}{|a_{n+1}|}} exists, then it is equal to {R}. Give examples to show that strict inequality can hold in both bounds in (4).

If a formal power series {\sum_{n=0}^\infty a_n (\mathrm{z}-z_0)^n} has a positive radius of convergence, then it defines a function {F: D(z_0,R) \rightarrow {\bf C}} in the disk of convergence by setting

\displaystyle  F(z) := \sum_{n=0}^\infty a_n (z-z_0)^n.

We refer to such a function as a power series, and refer to {R} as the radius of convergence of that power series. (Strictly speaking, a formal power series and a power series are different concepts, but there is little actual harm in conflating them together in practice, because of the uniqueness property established in Exercise 17 below.)

Example 10 The formal power series {\sum_{n=0}^\infty n! \mathrm{z}^n} has a zero radius of convergence, thanks to the ratio test, and so only converges at {z=0}. Conversely, the exponential formal power series {\sum_{n=0}^\infty \frac{\mathrm{z}^n}{n!}} has an infinite radius of convergence (thanks to the ratio test), and converges of course to {\exp(z)} when evaluated at any complex number {z}.

Example 11 (Geometric series) The formal power series {\sum_{n=0}^\infty \mathrm{z}^n} has radius of convergence {1}. If {z} lies in the disk of convergence {D(0,1) = \{ z \in {\bf C}: |z| < 1 \}}, then we have

\displaystyle  z \sum_{n=0}^\infty z^n = \sum_{n=0}^\infty z^{n+1}

\displaystyle  = \sum_{n=1}^\infty z^n

\displaystyle  = \sum_{n=0}^\infty z^n - 1

and thus after some algebra we obtain the geometric series formula

\displaystyle  \sum_{n=0}^\infty z^n = \frac{1}{1-z} \ \ \ \ \ (5)

as long as {z} is inside the disk {D(0,1)}. The function {z \mapsto \frac{1}{1-z}} does not extend continuously to the boundary point {z=1} of the disk, but does extend continuously (and even smoothly) to the rest of the boundary, and is in fact holomorphic on the remainder {{\bf C} \backslash \{1\}} of the complex plane. However, the geometric series {\sum_{n=0}^\infty z^n} diverges at every single point of this boundary (when {|z|=1}, the summands {z^n} of the series do not converge to zero), and of course definitely diverges outside of the disk as well. Thus we see that the function that a power series converges to can extend well beyond the disk of convergence, which thus may only capture a portion of the domain of definition of that function. For instance, if one formally applies (5) with, say, {z=2}, one ends up with the apparent identity

\displaystyle  1 + 2 + 2^2 + 2^3 + \dots = -1.

This identity does not make sense if one interprets infinite series in the classical fashion, as the series {1 + 2 + 2^2 + \dots} is definitely divergent. However, by formally extending identities such as (5) beyond their disk of convergence, we can generalise the notion of summation of infinite series to assign meaningful values to such series even if they do not converge in the classical sense. This leads to generalised summation methods such as zeta function regularisation, which are discussed in this previous blog post. However, we will not use such generalised interpretations of summation very much in this course.
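A two-line numerical check of formula (5) at an arbitrarily chosen point inside the unit disk:

```python
# Partial sums of the geometric series at z = 0.5 + 0.3i (|z| < 1)
# converge rapidly to 1/(1-z).
z = 0.5 + 0.3j
partial = sum(z ** n for n in range(200))
assert abs(partial - 1 / (1 - z)) < 1e-12
```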

Exercise 12 For any complex numbers {a, r, z_0}, show that the formal power series {\sum_{n=0}^\infty a r^n (\mathrm{z}-z_0)^n} has radius of convergence {1/|r|} (with the convention that this is infinite for {r=0}), and is equal to the function {z \mapsto \frac{a}{1 - r(z-z_0)}} inside the disk of convergence.

Exercise 13 For any positive integer {m}, show that the formal power series

\displaystyle  \sum_{n=0}^\infty \binom{n+m}{m} \mathrm{z}^n

has radius of convergence {1}, and converges to the function {z \mapsto \frac{1}{(1-z)^{m+1}}} in the disk {D(0,1)}. Here of course {\binom{n+m}{m} := \frac{(n+m)!}{n! m!}} is the usual binomial coefficient.
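The identity in Exercise 13 can be checked numerically before proving it (a quick sanity check at one arbitrary point, not a solution to the exercise):

```python
from math import comb

# sum_{n>=0} C(n+m, m) z^n should equal 1/(1-z)^{m+1} inside D(0,1).
z = 0.2 + 0.4j
for m in (1, 2, 3):
    s = sum(comb(n + m, m) * z ** n for n in range(400))
    assert abs(s - 1 / (1 - z) ** (m + 1)) < 1e-10
```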

We have seen above that power series can be well behaved as one approaches the boundary of the disk of convergence, while being divergent at the boundary. However, the converse scenario, in which the power series converges at the boundary but does not behave well as one approaches the boundary, does not occur:

Exercise 14

  • (i) (Summation by parts formula) Let {a_0,a_1,a_2,\dots,a_N} be a finite sequence of complex numbers, and let {A_n := a_0 + \dots + a_n} be the partial sums for {n=0,\dots,N}. Show that for any complex numbers {b_0,\dots,b_N}, that

    \displaystyle  \sum_{n=0}^N a_n b_n = \sum_{n=0}^{N-1} A_n (b_n - b_{n+1}) + b_N A_N.

  • (ii) Let {a_0,a_1,\dots} be a sequence of complex numbers such that {\sum_{n=0}^\infty a_n} is convergent (not necessarily absolutely) to zero. Show that for any {0 < r < 1}, the series {\sum_{n=0}^\infty a_n r^n} is absolutely convergent, and

    \displaystyle  \lim_{r \rightarrow 1^-} \sum_{n=0}^\infty a_n r^n = 0.

    (Hint: use summation by parts and a limiting argument to express {\sum_{n=0}^\infty a_n r^n} in terms of the partial sums {A_n = a_0 + \dots + a_n}.)

  • (iii) (Abel’s theorem) Let {F(z) = \sum_{n=0}^\infty a_n (z-z_0)^n} be a power series with a finite positive radius of convergence {R}, and let {z_1 := z_0+Re^{i\theta}} be a point on the boundary of the disk of convergence at which the series {\sum_{n=0}^\infty a_n (z_1 - z_0)^n} converges (not necessarily absolutely). Show that {\lim_{r \rightarrow R^-} F(z_0 + r e^{i\theta}) = F(z_1)}. (Hint: use various translations and rotations to reduce to the case considered in (ii).)
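Abel's theorem is easy to see in action numerically. With {a_n = (-1)^n/(n+1)} the series converges conditionally at {z=1} to {\log 2} (a shifted alternating harmonic series), and the radial sums approach that value; the truncation point and tolerances below are arbitrary choices.

```python
import math

# F(r) = sum_{n>=0} (-1)^n r^n / (n+1), truncated; F(r) -> log 2 as r -> 1-.
def F(r, N=20000):
    return sum((-1) ** n * r ** n / (n + 1) for n in range(N))

for r, tol in [(0.9, 0.03), (0.99, 0.003), (0.999, 0.0003)]:
    assert abs(F(r) - math.log(2)) < tol
```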
As a general rule of thumb, as long as one is inside the disk of convergence, power series behave very similarly to polynomials. In particular, we can generalise the differentiation formula (2) to such power series:

Theorem 15 Let {F(z) = \sum_{n=0}^\infty a_n (z-z_0)^n} be a power series with a positive radius of convergence {R}. Then {F} is holomorphic on the disk of convergence {D(z_0,R)}, and the derivative {F'} is given by the power series

\displaystyle  F'(z) = \sum_{n=1}^\infty n a_n (z-z_0)^{n-1} = \sum_{n=0}^\infty (n+1) a_{n+1} (z-z_0)^n

that has the same radius of convergence {R} as {F}.

Proof: From (3), the standard limit {\lim_{n \rightarrow \infty} n^{1/n} = 1} and the usual limit laws, it is easy to see that the power series {\sum_{n=0}^\infty (n+1) a_{n+1} (z-z_0)^n} has the same radius of convergence {R} as {\sum_{n=0}^\infty a_n (z-z_0)^n}. To show that this series is actually the derivative of {F}, we use first principles. If {z_1} lies in the disk of convergence, we consider the Newton quotient

\displaystyle  \frac{F(z)-F(z_1)}{z-z_1}

for {z \in D(z_0,R) \backslash \{z_1\}}. Expanding out the absolutely convergent series {F(z)} and {F(z_1)}, we can write

\displaystyle \frac{F(z)-F(z_1)}{z-z_1} = \sum_{n=0}^\infty a_n \frac{(z-z_0)^n - (z_1-z_0)^n}{z-z_1}.

The ratio {\frac{(z-z_0)^n - (z_1-z_0)^n}{z-z_1}} vanishes for {n=0}, and for {n \geq 1} it is equal to {(z-z_0)^{n-1} + (z-z_0)^{n-2} (z_1-z_0) + \dots + (z_1-z_0)^{n-1}} as in the proof of Proposition 3. Thus

\displaystyle \frac{F(z)-F(z_1)}{z-z_1} = \sum_{n=1}^\infty a_n ((z-z_0)^{n-1} + (z-z_0)^{n-2} (z_1-z_0) + \dots + (z_1-z_0)^{n-1}).

As {z} approaches {z_1}, each summand {a_n ((z-z_0)^{n-1} + (z-z_0)^{n-2} (z_1-z_0) + \dots + (z_1-z_0)^{n-1})} converges to {n a_n (z_1-z_0)^{n-1}}. This almost proves the desired limiting formula

\displaystyle  \lim_{z \rightarrow z_1: z \in D(z_0,R) \backslash \{z_1\}} \frac{F(z)-F(z_1)}{z-z_1} = \sum_{n=1}^\infty n a_n (z_1-z_0)^{n-1},

but we need to justify the interchange of a sum and limit. Fortunately we have a standard tool for this, namely the Weierstrass {M}-test (which works for complex-valued functions exactly as it does for real-valued functions; one could also use the dominated convergence theorem here). It will be convenient to select two real numbers {r_1,r_2} with {|z_1-z_0| < r_1 < r_2 < R}. Clearly, for {z} close enough to {z_1}, we have {|z-z_0| < r_1}. By the triangle inequality we then have

\displaystyle  |a_n ((z-z_0)^{n-1} + (z-z_0)^{n-2} (z_1-z_0) + \dots + (z_1-z_0)^{n-1})| \leq n |a_n| r_1^{n-1}.

On the other hand, from (3) we know that {|a_n|^{-1/n} \geq r_2} for sufficiently large {n}, hence {|a_n| \leq r_2^{-n}} for sufficiently large {n}. From the ratio test we know that the series {\sum_{n=1}^\infty n r_2^{-n} r_1^{n-1}} is absolutely convergent, hence the series {\sum_{n=1}^\infty n |a_n| r_1^{n-1}} is also. Thus, for {z} sufficiently close to {z_1}, the summands {|a_n ((z-z_0)^{n-1} + (z-z_0)^{n-2} (z_1-z_0) + \dots + (z_1-z_0)^{n-1})|} are uniformly dominated by an absolutely summable sequence of numbers {n |a_n| r_1^{n-1}}. Applying the Weierstrass {M}-test (or dominated convergence theorem), we obtain the claim. \Box
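For a concrete instance of the theorem (a numerical check at one arbitrarily chosen point): differentiating the geometric series term by term gives {\sum (n+1) z^n}, which should match {\frac{d}{dz} \frac{1}{1-z} = \frac{1}{(1-z)^2}} inside the unit disk.

```python
# Term-by-term derivative of sum z^n versus the closed form 1/(1-z)^2.
z = -0.3 + 0.5j                       # |z| < 1
termwise = sum((n + 1) * z ** n for n in range(200))
assert abs(termwise - 1 / (1 - z) ** 2) < 1e-10
```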

Exercise 16 Prove the above theorem directly using epsilon and delta type arguments, rather than invoking the {M}-test or the dominated convergence theorem.

We remark that the above theorem is a little easier to prove once we have the complex version of the fundamental theorem of calculus, but this will have to wait until the next set of notes, where we will also prove a remarkable converse to the above theorem, in that any holomorphic function can be expanded as a power series around any point in its domain.

A convenient feature of power series is the ability to equate coefficients: if two power series around the same point {z_0} agree, then their coefficients must also agree. More precisely, we have:

Exercise 17 (Taylor expansion and uniqueness of power series) Let {F(z) = \sum_{n=0}^\infty a_n (z-z_0)^n} be a power series with a positive radius of convergence. Show that {a_n = \frac{1}{n!} F^{(n)}(z_0)}, where {F^{(n)}} denotes the {n^{\mathrm{th}}} complex derivative of {F}. In particular, if {G(z) = \sum_{n=0}^\infty b_n (z-z_0)^n} is another power series around {z_0} with a positive radius of convergence which agrees with {F} on some neighbourhood {U} of {z_0} (thus, {F(z)=G(z)} for all {z \in U}), show that the coefficients of {F} and {G} are identical, that is to say that {a_n = b_n} for all {n \geq 0}.

Of course, one can no longer compare coefficients so easily if the power series are based around two different points. For instance, from Example 11 we see that the geometric series {\sum_{n=0}^\infty z^n} and {\sum_{n=0}^\infty \frac{1}{2^{n+1}} (z+1)^n} both converge to the same function {\frac{1}{1-z}} on the unit disk {D(0,1)}, but have differing coefficients. The precise relation between the coefficients of power series of the same function is given as follows:

Exercise 18 (Changing the origin of a power series) Let {F(z) = \sum_{n=0}^\infty a_n (z-z_0)^n} be a power series with a positive radius of convergence {R}. Let {z_1} be an element of the disk of convergence {D(z_0,R)}. Show that the formal power series {\sum_{n=0}^\infty b_n (z-z_1)^n}, where

\displaystyle  b_n := \sum_{m=n}^\infty \binom{m}{n} a_m (z_1-z_0)^{m-n},

has radius of convergence at least {R - |z_1-z_0|}, and converges to {F(z)} on the disk {D(z_1, R - |z_1-z_0|)}. Here of course {\binom{m}{n} = \frac{m!}{n!(m-n)!}} is the usual binomial coefficient.
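The recentring formula can be tested numerically on the geometric series ({a_n = 1}, {z_0 = 0}) recentred at {z_1 = 1/2}, where the exact answer {b_n = 2^{n+1}} follows from expanding {\frac{1}{1-z}} around {1/2}; the truncation of the infinite sum at m = 200 is an ad hoc choice.

```python
from math import comb

# b_n = sum_{m>=n} C(m, n) (z1 - z0)^{m-n} a_m with a_m = 1, z0 = 0, z1 = 0.5.
z1 = 0.5
for n in range(6):
    b_n = sum(comb(m, n) * z1 ** (m - n) for m in range(n, 200))
    assert abs(b_n - 2 ** (n + 1)) < 1e-9
```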

Theorem 15 gives us a rich supply of complex differentiable functions, particularly when combined with Exercise 4. For instance, the complex exponential function

\displaystyle  e^z = \sum_{n=0}^\infty \frac{z^n}{n!}

has an infinite radius of convergence, and so is entire, and is its own derivative:

\displaystyle  \frac{d}{dz} e^z = \sum_{n=1}^\infty n \frac{z^{n-1}}{n!} = \sum_{n=0}^\infty \frac{z^n}{n!} = e^z.

This makes the complex trigonometric functions

\displaystyle  \sin(z) := \frac{e^{iz} - e^{-iz}}{2i}, \quad \cos(z) := \frac{e^{iz} + e^{-iz}}{2}

entire as well, and from the chain rule we recover the familiar formulae

\displaystyle  \frac{d}{dz} \sin(z) = \cos(z); \quad \frac{d}{dz} \cos(z) = - \sin(z).

Of course, one can combine these functions together in many ways to create countless other complex differentiable functions with explicitly computable derivatives, e.g. {\sin(z^2)} is an entire function with derivative {2z \cos(z^2)}, the tangent function {\tan(z) := \sin(z) / \cos(z)} is holomorphic outside of the discrete set {\{ (2k+1) \pi/2: k \in {\bf Z}\}} with derivative {\mathrm{sec}(z)^2 = 1/\cos(z)^2}, and so forth.
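These formulae can be sanity-checked numerically; since complex differentiability requires the same limit along every direction of approach, the sketch below tries increments along the real, imaginary, and diagonal directions (the sample point and tolerance are arbitrary).

```python
import cmath

# Difference quotients of sin at a complex point should approach cos(z0)
# regardless of the direction from which h tends to zero.
z0 = 0.3 + 1.2j
for h in (1e-7, 1e-7j, (1 + 1j) * 1e-7):
    quotient = (cmath.sin(z0 + h) - cmath.sin(z0)) / h
    assert abs(quotient - cmath.cos(z0)) < 1e-6
```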

Exercise 19 (Multiplication of power series) Let {F(z) = \sum_{n=0}^\infty a_n(z-z_0)^n} and {G(z) = \sum_{n=0}^\infty b_n (z-z_0)^n} be power series that both have radius of convergence at least {R}. Show that on the disk {D(z_0,R)}, we have

\displaystyle  F(z) G(z) = \sum_{n=0}^\infty c_n (z-z_0)^n

where the right-hand side is another power series of radius of convergence at least {R}, with coefficients {c_n} given as the convolution

\displaystyle  c_n := \sum_{m=0}^n a_m b_{n-m}

of the sequences {a_n} and {b_n}.
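A numerical check of the convolution formula on a pair of explicit series ({\frac{1}{1-z}} and {\frac{1}{1-z/2}}, both convergent on {D(0,1)}; the truncation at 80 terms is ad hoc):

```python
# a_n = 1 and b_n = 2^{-n} give F(z) = 1/(1-z) and G(z) = 1/(1-z/2);
# the convolved coefficients should reproduce the product F(z) G(z).
N = 80
a = [1.0] * N
b = [2.0 ** -n for n in range(N)]
c = [sum(a[m] * b[n - m] for m in range(n + 1)) for n in range(N)]

z = 0.4 - 0.2j
lhs = sum(c[n] * z ** n for n in range(N))
rhs = (1 / (1 - z)) * (1 / (1 - z / 2))
assert abs(lhs - rhs) < 1e-10
```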

    — 2. The Cauchy-Riemann equations —

    Thus far, the theory of complex differentiation closely resembles the analogous theory of real differentiation that one sees in an introductory real analysis class. But now we take advantage of the Argand plane representation of {{\bf C}} to view a function {z \mapsto f(z)} of one complex variable as a function {(x,y) \mapsto f(x+iy)} of two real variables. This gives rise to some further notions of differentiation. Indeed, if {f: U \rightarrow {\bf C}} is a function defined on an open subset of {{\bf C}}, and {z_0 = x_0 + i y_0} is a point in {U}, then in addition to the complex derivative

    \displaystyle  f'(z_0) := \lim_{z \rightarrow z_0: z \in U \backslash \{z_0\}} \frac{f(z)-f(z_0)}{z-z_0} \ \ \ \ \ (6)

    already discussed, we can also define (if they exist) the partial derivatives

    \displaystyle  \frac{\partial f}{\partial x}(z_0) := \lim_{h \rightarrow 0: h \in {\bf R} \backslash \{0\}} \frac{f((x_0+h) + i y_0) - f(x_0+iy_0)}{h}


    \displaystyle  \frac{\partial f}{\partial y}(z_0) := \lim_{h \rightarrow 0: h \in {\bf R} \backslash \{0\}} \frac{f(x_0 + i (y_0+h)) - f(x_0+iy_0)}{h};

    these will be complex numbers if the limits on the right-hand side exist. There is also (if it exists) the gradient (or Fréchet derivative) {Df(z_0) \in {\bf C}^2}, defined as the vector {(D_1 f(z_0), D_2 f(z_0)) \in {\bf C}^2} with the property that

    \displaystyle  \lim_{(h_1,h_2) \rightarrow 0: (h_1,h_2) \in {\bf R}^2 \backslash \{0\}} \ \ \ \ \ (7)

    \displaystyle  \frac{|f((x_0+h_1)+i(y_0+h_2)) - f(x_0+iy_0) - h_1 D_1 f(z_0) - h_2 D_2 f(z_0)|}{|(h_1,h_2)|} = 0

    where {|(h_1,h_2)| := \sqrt{h_1^2+h_2^2}} is the Euclidean norm of {(h_1,h_2)}.

    These notions of derivative are of course closely related to each other. If a function {f: U \rightarrow {\bf C}} is Fréchet differentiable at {z_0}, in the sense that the gradient {Df(z_0)} exists, then on specialising the limit in (7) to vectors {(h_1,h_2)} of the form {(h,0)} or {(0,h)} we see that

    \displaystyle  \frac{\partial f}{\partial x}(z_0) = D_1 f(z_0)


    \displaystyle  \frac{\partial f}{\partial y}(z_0) = D_2 f(z_0)

    leading to the familiar formula

    \displaystyle  Df(z_0) = ( \frac{\partial f}{\partial x}(z_0), \frac{\partial f}{\partial y}(z_0)) \ \ \ \ \ (8)

    for the gradient {Df(z_0)} of a function {f} that is Fréchet differentiable at {z_0}. We caution however that it is possible for the partial derivatives { \frac{\partial f}{\partial x}(z_0), \frac{\partial f}{\partial y}(z_0)} of a function to exist without the function being Fréchet differentiable, in which case the formula (8) is of course not valid. (A typical example is the function {f: {\bf C} \rightarrow {\bf C}} defined by setting {f(x+iy) := \frac{xy}{x^2+y^2}} for {x+iy \neq 0}, with {f(0) := 0}; this function has both partial derivatives {\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}} existing at the origin {z_0 = 0}, but {f} is not Fréchet differentiable (or even continuous) there.) On the other hand, if the partial derivatives { \frac{\partial f}{\partial x}(z_0), \frac{\partial f}{\partial y}(z_0)} exist everywhere on {U} and are additionally known to be continuous, then the fundamental theorem of calculus gives the identity

    \displaystyle  f((x_0+h_1) + i(y_0+h_2)) = f(x_0+iy_0) + \int_0^{h_1} \frac{\partial f}{\partial x}(x_0+t+iy_0)\ dt

    \displaystyle  + \int_0^{h_2} \frac{\partial f}{\partial y}(x_0 + h_1 + i(y_0+t))\ dt

    for {x_0+iy_0 \in U} and {h_1,h_2} sufficiently small (with the convention that {\int_a^b = -\int_b^a} if {a>b}), and from this it is not difficult to see that {f} is then Fréchet differentiable everywhere on {U}.
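As an informal aside (the code and tolerances are my own, not from the notes), one can see numerically that the example {f(x+iy) = xy/(x^2+y^2)} mentioned above has vanishing partial derivatives at the origin while failing to even be continuous there:

```python
def f(x, y):
    # The counterexample from the notes: partial derivatives exist at 0,
    # but f is not continuous (hence not Frechet differentiable) there.
    return x * y / (x**2 + y**2) if (x, y) != (0.0, 0.0) else 0.0

h = 1e-8
# Both partial derivatives at the origin exist and equal 0:
assert abs((f(h, 0.0) - f(0.0, 0.0)) / h) < 1e-6
assert abs((f(0.0, h) - f(0.0, 0.0)) / h) < 1e-6
# But along the diagonal x = y the function is identically 1/2,
# so it cannot tend to f(0) = 0:
assert abs(f(h, h) - 0.5) < 1e-12
```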

    Similarly, if {f} is complex differentiable at {z_0}, then by specialising the limit (6) to variables {z} of the form {z = z_0 + h} or {z = z_0 + ih} for some non-zero real {h} near zero, we see that

    \displaystyle  \frac{\partial f}{\partial x}(z_0) = f'(z_0)


    \displaystyle  \frac{\partial f}{\partial y}(z_0) = i f'(z_0)

    leading in particular to the Cauchy-Riemann equations

    \displaystyle  \frac{\partial f}{\partial x}(z_0) = \frac{1}{i} \frac{\partial f}{\partial y}(z_0) \ \ \ \ \ (9)

    that must be satisfied in order for {f} to be complex differentiable. More generally, from (6) we see that if {f} is complex differentiable at {z_0}, then

    \displaystyle  \lim_{h \rightarrow 0: h \neq 0} \frac{|f(z_0+h) - f(z_0) - f'(z_0) h|}{|h|} = 0,

    which on comparison with (7) shows that {f} is also Fréchet differentiable with

    \displaystyle  Df(z_0) = (f'(z_0), i f'(z_0)).

    Finally, if {f} is Fréchet differentiable at {z_0} and one has the Cauchy-Riemann equations (9), then from (7) we have

    \displaystyle  \lim_{(h_1,h_2) \rightarrow 0: (h_1,h_2) \in {\bf R}^2 \backslash \{0\}}

    \displaystyle \frac{|f((x_0+h_1)+i(y_0+h_2)) - f(x_0+iy_0) - (h_1+ih_2) \frac{\partial f}{\partial x}(z_0)|}{|(h_1,h_2)|} = 0

    which after making the substitution {z := (x_0+h_1)+i(y_0+h_2)} gives

    \displaystyle  \lim_{z \rightarrow z_0: z \in U \backslash \{z_0\}} \frac{|f(z) - f(z_0) - (z-z_0) \frac{\partial f}{\partial x}(z_0)|}{|z-z_0|} = 0

    which on comparison with (6) shows that {f} is complex differentiable with

    \displaystyle  f'(z_0) =\frac{\partial f}{\partial x}(z_0).

    We summarise the above discussion as follows:

    Proposition 20 (Differentiability and the Cauchy-Riemann equations) Let {U} be an open subset of {{\bf C}}, let {f: U \rightarrow {\bf C}} be a function, and let {z_0} be an element of {U}.

    (i) If {f} is complex differentiable at {z_0}, then {f} is also Fréchet differentiable at {z_0}, obeys the Cauchy-Riemann equations (9) at {z_0}, and has {f'(z_0) = \frac{\partial f}{\partial x}(z_0)}.

    (ii) Conversely, if {f} is Fréchet differentiable at {z_0} and obeys the Cauchy-Riemann equations (9) at {z_0}, then {f} is complex differentiable at {z_0}, with {f'(z_0) = \frac{\partial f}{\partial x}(z_0)}.

    Remark 21 From part (ii) of the above proposition we see that if {f} is Fréchet differentiable on {U} and obeys the Cauchy-Riemann equations on {U}, then it is holomorphic on {U}. One can ask whether the requirement of Fréchet differentiability can be weakened. It cannot be omitted entirely; one can show, for instance, that the function {f: {\bf C} \rightarrow {\bf C}} defined by {f(z) := e^{-1/z^4}} for non-zero {z} and {f(0) := 0} obeys the Cauchy-Riemann equations at every point {z_0 \in {\bf C}}, but is not complex differentiable (or even continuous) at the origin. But there is a somewhat difficult theorem of Looman and Menchoff that asserts that if {f} is continuous on {U} and obeys the Cauchy-Riemann equations on {U}, then it is holomorphic. We will not prove or use this theorem in this course; generally in modern applications, when one wants to weaken the regularity hypotheses of a theorem involving classical differentiation, the best way to do so is to replace the notion of a classical derivative with that of a weak derivative, rather than insist on computing derivatives in the classical pointwise sense. See this blog post for more discussion.

    Combining part (i) of the above proposition with Theorem 15, we also conclude as a corollary that any power series will be smooth inside its disk of convergence, in the sense that all partial derivatives to all orders of this power series exist.

    Remark 22 From the geometric perspective, one can interpret complex differentiability at a point {z_0} as a requirement that a map {f} is conformal and orientation-preserving at {z_0}, at least in the non-degenerate case when {f'(z_0)} is non-zero. In more detail: suppose that {f: U \rightarrow {\bf C}} is a map that is complex differentiable at some point {z_0 \in U} with {f'(z_0) \neq 0}. Let {\gamma: (-\varepsilon,\varepsilon) \rightarrow U} be a differentiable curve with {\gamma(0)=z_0}; we view this as the trajectory of some particle which passes through {z_0} at time {t=0}. The derivative {\gamma'(0)} (defined in the usual manner by limits of Newton quotients) can then be viewed as the velocity of the particle as it passes through {z_0}. The map {f} takes this particle to a new particle parameterised by the curve {f \circ \gamma: (-\varepsilon,\varepsilon) \rightarrow {\bf C}}; at time {t=0}, this new particle passes through {f(z_0)}, and by the chain rule we see that the velocity of the new particle at this time is given by

    \displaystyle  (f \circ \gamma)'(0) = f'(z_0) \gamma'(0).

    Thus, if we write {f'(z_0)} in polar coordinates as {f'(z_0) = r e^{i\theta}}, the map {f} transforms the velocity of the particle by multiplying the speed by a factor of {r} and rotating the direction of travel counter-clockwise by {\theta}. In particular, if we consider two differentiable trajectories {\gamma_1, \gamma_2} both passing through {z_0} at time {t=0} (with non-zero speeds), then the map {f} preserves the angle between the two velocity vectors {\gamma'_1(0), \gamma'_2(0)}, as well as their orientation (e.g. if {\gamma'_2(0)} is counterclockwise to {\gamma'_1(0)}, then {(f \circ \gamma_2)'(0)} is counterclockwise to {(f \circ \gamma_1)'(0)}). This is in contrast to, for instance, shear transformations such as {f(x+iy) := x + i(x+y)}, which preserve orientation but not angles, or the complex conjugation map {f(x+iy) := x-iy}, which preserves angles but not orientation. The same preservation of angle is present for real differentiable functions {f: U \rightarrow {\bf R}}, but is much less impressive in that case since the only angles possible between two vectors on the real line are {0} and {\pi}; it is the geometric two-dimensionality of the complex plane that makes conformality a much stronger and more “rigid” property for complex differentiable functions.

    If one breaks up a complex function {f: U \rightarrow {\bf C}} into real and imaginary parts {f = u+iv} for some {u,v: U \rightarrow {\bf R}}, then on taking real and imaginary parts one can express the Cauchy-Riemann equations as a system

    \displaystyle  \frac{\partial u}{\partial x}(z_0) = \frac{\partial v}{\partial y}(z_0) \ \ \ \ \ (10)

    \displaystyle  \frac{\partial v}{\partial x}(z_0) = -\frac{\partial u}{\partial y}(z_0) \ \ \ \ \ (11)

    of two partial differential equations for two functions {u,v}. This gives a quick way to test whether various functions are differentiable. Consider for instance the conjugation function {f: z \mapsto \overline{z}}. In this case, {u(x+iy) = x} and {v(x+iy) = -y}. These functions, being polynomial in {x,y}, are certainly Fréchet differentiable everywhere; the equation (11) is always satisfied, but the equation (10) is never satisfied. As such, the conjugation function is never complex differentiable. Similarly for the real part function {z \mapsto \mathrm{Re}(z)}, the imaginary part function {z \mapsto \mathrm{Im}(z)}, and the absolute value function {z \mapsto |z|}. The function {z \mapsto |z|^2} has real part {u(x+iy) = x^2+y^2} and imaginary part {v(x+iy)=0}; one easily checks that the system (10), (11) is only satisfied when {x=y=0}, so this function is only complex differentiable at the origin. In particular, it is not holomorphic on any non-empty open set.
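As a numerical aside (code my own, not from the notes), one can test the system (10), (11) by finite differences, confirming that {z^2} satisfies it everywhere while the conjugation function never satisfies (10):

```python
def cr_residual(u, v, x, y, h=1e-6):
    # Residuals |u_x - v_y| and |v_x + u_y| of the Cauchy-Riemann
    # system (10), (11) at x+iy, via central differences.
    ux = (u(x + h, y) - u(x - h, y)) / (2 * h)
    uy = (u(x, y + h) - u(x, y - h)) / (2 * h)
    vx = (v(x + h, y) - v(x - h, y)) / (2 * h)
    vy = (v(x, y + h) - v(x, y - h)) / (2 * h)
    return abs(ux - vy), abs(vx + uy)

# f(z) = z^2: u = x^2 - y^2, v = 2xy satisfies both equations.
r1, r2 = cr_residual(lambda x, y: x * x - y * y, lambda x, y: 2 * x * y, 1.0, 2.0)
assert r1 < 1e-6 and r2 < 1e-6

# f(z) = conj(z): u = x, v = -y has u_x = 1 but v_y = -1, violating (10).
s1, s2 = cr_residual(lambda x, y: x, lambda x, y: -y, 1.0, 2.0)
assert abs(s1 - 2.0) < 1e-6 and s2 < 1e-6
```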

    The general rule of thumb that one should take away from these examples is that complex functions that are constructed purely out of “good” functions such as polynomials, the complex exponential, complex trigonometric functions, or other convergent power series are likely to be holomorphic, whereas functions that involve “bad” functions such as complex conjugation, the real and imaginary part, or the absolute value, are unlikely to be holomorphic.

    Exercise 23 (Wirtinger derivatives) Let {U} be an open subset of {{\bf C}}, and let {f: U \rightarrow {\bf C}} be a Fréchet differentiable function. Define the Wirtinger derivatives {\frac{\partial f}{\partial z}: U \rightarrow {\bf C}}, {\frac{\partial f}{\partial \overline{z}}: U \rightarrow {\bf C}} by the formulae

    \displaystyle  \frac{\partial f}{\partial z} := \frac{1}{2}( \frac{\partial f}{\partial x} + \frac{1}{i} \frac{\partial f}{\partial y} )

    \displaystyle  \frac{\partial f}{\partial \overline{z}} := \frac{1}{2}( \frac{\partial f}{\partial x} - \frac{1}{i} \frac{\partial f}{\partial y} ).
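One can also probe the Wirtinger derivative {\frac{\partial f}{\partial \overline{z}}} numerically (an informal sketch of my own, using that {-\frac{1}{i} = i}): it vanishes for holomorphic functions, and equals {1} for the conjugation function:

```python
def wirtinger_dbar(f, z, h=1e-6):
    # df/d(conj z) = (f_x + i f_y)/2, via central differences; this is
    # the second Wirtinger derivative above, since -(1/i) = i.
    fx = (f(z + h) - f(z - h)) / (2 * h)
    fy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return 0.5 * (fx + 1j * fy)

# A holomorphic function has vanishing anti-holomorphic derivative:
assert abs(wirtinger_dbar(lambda z: z**2, 0.5 + 0.5j)) < 1e-6
# while for f(z) = conj(z) the derivative equals 1:
assert abs(wirtinger_dbar(lambda z: z.conjugate(), 0.5 + 0.5j) - 1) < 1e-6
```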

    Remark 24 Any polynomial

    \displaystyle  f(x+iy) = \sum_{n,m \geq 0: n+m \leq d} a_{n,m} x^n y^m

    in the real and imaginary parts {x,y} of {x+iy} can be rewritten as a polynomial in {z} and {\overline{z}} as per (17), using the usual identities

    \displaystyle  x = \frac{z+\overline{z}}{2}, y = \frac{z - \overline{z}}{2i}

    for {z = x+iy}. Thus such a non-holomorphic polynomial of one complex variable {z=x+iy} can be viewed as the restriction of a holomorphic polynomial

    \displaystyle  P(z_1,z_2) := \sum_{n,m \geq 0: n+m \leq d} c_{n,m} z_1^n z_2^m

    of two complex variables {z_1,z_2 \in {\bf C}} to the anti-diagonal {\{ (z_1,z_2) \in {\bf C}^2: z_2 = \overline{z_1}\}}, and the Wirtinger derivatives can then be interpreted as genuine (complex) partial derivatives in these two complex variables. More generally, Wirtinger derivatives are convenient tools in the subject of several complex variables, which we will not cover in this course.

    The Cauchy-Riemann equations couple the real and imaginary parts {u,v: U \rightarrow {\bf R}} of a holomorphic function to each other. But it is also possible to eliminate one of these components from the equations and obtain a constraint on just the real part {u}, or just the imaginary part {v}. Suppose for the moment that {f: U \rightarrow {\bf C}} is a holomorphic function which is twice continuously differentiable (thus the second partial derivatives {\frac{\partial^2 f}{\partial x^2}}, {\frac{\partial^2 f}{\partial y\partial x}}, {\frac{\partial^2 f}{\partial x \partial y}}, {\frac{\partial^2 f}{\partial y^2}} all exist and are continuous on {U}); we will show in the next set of notes that this extra hypothesis is in fact redundant. Assuming continuous second differentiability for now, we have Clairaut’s theorem

    \displaystyle  \frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x}

    everywhere on {U}. Similarly for the real and imaginary parts {u,v}. If we then differentiate (10) in the {x} direction, (11) in the {y} direction, and then sum, the derivatives of {v} cancel thanks to Clairaut’s theorem, and we obtain Laplace’s equation

    \displaystyle  \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0 \ \ \ \ \ (13)

    which is often written more compactly as

    \displaystyle  \Delta u = 0

    where {\Delta} is the Laplacian operator

    \displaystyle  \Delta := \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}.

    A similar argument gives {\Delta v = 0}; by linearity we then also have {\Delta f=0}.

    Functions {u} for which {\Delta u = 0} are known as harmonic functions: thus we have shown that (twice continuously differentiable) holomorphic functions are automatically harmonic, as are the real and imaginary parts of such functions. The converse is not true: not every harmonic function {f: U \rightarrow {\bf C}} is holomorphic. For instance, the conjugation function {z \mapsto \overline{z}} is clearly harmonic on {{\bf C}}, but not holomorphic. We will return to the precise relationship between harmonic and holomorphic functions shortly.

    Harmonic functions have many remarkable properties. Since the second derivative in a given direction is a local measure of “convexity” of a function, we see from (13) that any convex behaviour of a harmonic function in one direction has to be balanced by an equal and opposite amount of concave behaviour in the orthogonal direction. A good example of a harmonic function to keep in mind is the function

    \displaystyle  u(x+iy) := x^2 - y^2

    which exhibits convex behavior in {x} and concave behavior in {y} in exactly opposite amounts. This function is the real part of the holomorphic function {f(z) :=z^2}, which is of course consistent with the previous observation that the real parts of holomorphic functions are harmonic.
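As an informal numerical aside (code my own, not from the notes), a five-point finite-difference stencil confirms that {x^2-y^2} is harmonic while {x^2+y^2} is not:

```python
def laplacian(u, x, y, h=1e-4):
    # Five-point finite-difference approximation of u_xx + u_yy.
    return (u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h)
            - 4 * u(x, y)) / h**2

u = lambda x, y: x**2 - y**2          # Re(z^2): harmonic
assert abs(laplacian(u, 1.3, -0.7)) < 1e-4
w = lambda x, y: x**2 + y**2          # not harmonic: Laplacian is 4
assert abs(laplacian(w, 1.3, -0.7) - 4) < 1e-4
```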

    We will discuss harmonic functions more in later notes. For now, we record just one important property of these functions, namely the maximum principle:

    Theorem 25 (Maximum principle) Let {U} be an open subset of {{\bf C}}, and let {u: U \rightarrow {\bf R}} be a continuously twice differentiable harmonic function. Let {K} be a compact subset of {U}, and let {\partial K} be the boundary of {K}. Then

    \displaystyle  \sup_{z \in K} u(z) = \sup_{z \in \partial K} u(z) \ \ \ \ \ (14)

    and similarly

    \displaystyle  \inf_{z \in K} u(z) = \inf_{z \in \partial K} u(z) \ \ \ \ \ (15)

    Informally, the maximum principle asserts that the maximum of a real-valued harmonic function on a compact set is always attained on the boundary, and similarly for the minimum. In particular, any bound on the harmonic function that one can obtain on the boundary is automatically inherited by the interior. Compare this with a non-harmonic function such as {u(x+iy) := 1 - x^2 - y^2}, which is bounded by {0} on the boundary of the compact unit disk {\overline{D(0,1)} := \{ z \in {\bf C}: |z| \leq 1 \}}, but is not bounded by {0} on the interior of this disk.
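As an informal numerical illustration of Theorem 25 (the code and the choice of harmonic function are mine), one can sample the harmonic function {u = x^3 - 3xy^2 = \mathrm{Re}(z^3)} on the closed unit disk and observe that its extrema over the disk are attained on the boundary circle:

```python
import math

def u(x, y):
    # A harmonic function: the real part of z^3 is x^3 - 3 x y^2.
    return x**3 - 3 * x * y**2

# Sample u on a grid over the closed unit disk and on its boundary circle.
grid = [u(i / 50, j / 50) for i in range(-50, 51) for j in range(-50, 51)
        if (i / 50)**2 + (j / 50)**2 <= 1]
boundary = [u(math.cos(t), math.sin(t))
            for t in [2 * math.pi * k / 1000 for k in range(1000)]]

# The interior samples never beat the boundary samples:
assert max(grid) <= max(boundary) + 1e-6
assert min(grid) >= min(boundary) - 1e-6
```

The small tolerance only accounts for the discreteness of the sampling.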

    Proof: We begin with an “almost proof” of this principle, and then repair this attempted proof so that it is an actual proof.

    We will just prove (14), as (15) is proven similarly (or one can just observe that if {u} is harmonic then so is {-u}). Clearly we have

    \displaystyle  \sup_{z \in K} u(z) \geq \sup_{z \in \partial K} u(z)

    so the only scenario that needs to be excluded is when

    \displaystyle  \sup_{z \in K} u(z) > \sup_{z \in \partial K} u(z). \ \ \ \ \ (16)

    Suppose that this is the case. As {u} is continuous and {K} is compact, {u} must attain its maximum at some point {z_0} in {K}; from (16) we see that {z_0} must be an interior point. Since {z_0} is a local maximum of {u}, and {u} is twice differentiable, we must have

    \displaystyle  \frac{\partial^2 u}{\partial x^2}(z_0) \leq 0

    and similarly

    \displaystyle  \frac{\partial^2 u}{\partial y^2}(z_0) \leq 0.

    This almost, but does not quite, contradict the harmonicity of {u}, since it is still possible that both of these partial derivatives vanish. To get around this problem we use the trick of creating an epsilon of room, adding a tiny bit of convexity to {u}. Let {\varepsilon>0} be a small number to be chosen later, and let {u_\varepsilon: U \rightarrow {\bf R}} be the modified function

    \displaystyle  u_\varepsilon(x+iy) := u(x+iy) + \varepsilon(x^2+y^2).

    Since {K} is compact, the function {x^2+y^2} is bounded on {K}. Thus, from (16), we see that if {\varepsilon>0} is small enough we have

    \displaystyle  \sup_{z \in K} u_\varepsilon(z) > \sup_{z \in \partial K} u_\varepsilon(z).

    Arguing as before, {u_\varepsilon} must attain its maximum at some interior point {z_\varepsilon} of {K}, and so we again have

    \displaystyle  \frac{\partial^2 u_\varepsilon}{\partial x^2}(z_\varepsilon) \leq 0

    and similarly

    \displaystyle  \frac{\partial^2 u_\varepsilon}{\partial y^2}(z_\varepsilon) \leq 0.

    On the other hand, since {u} is harmonic, we have

    \displaystyle  \frac{\partial^2 u_\varepsilon}{\partial x^2} + \frac{\partial^2 u_\varepsilon}{\partial y^2} = \frac{\partial^2 u}{\partial x^2} + 2 \varepsilon + \frac{\partial^2 u}{\partial y^2} + 2\varepsilon = 4 \varepsilon > 0

    on {U}. These facts contradict each other, and we are done. \Box

    Exercise 26 (Maximum principle for holomorphic functions) If {f: U \rightarrow {\bf C}} is a continuously twice differentiable holomorphic function on an open set {U}, and {K} is a compact subset of {U}, show that

    \displaystyle  \sup_{z \in K} |f(z)| = \sup_{z \in \partial K} |f(z)|.

    (Hint: use Theorem 25 and the fact that {|w| = \sup_{\theta \in {\bf R}} \mathrm{Re} w e^{i\theta}} for any complex number {w}.) What happens if we replace the suprema on both sides by infima?

    Exercise 27 Recall the Wirtinger derivatives defined in Exercise 23(i).

    We have seen that the real and imaginary parts {u,v: U \rightarrow {\bf R}} of any holomorphic function {f: U \rightarrow {\bf C}} are harmonic functions. Conversely, let us call a harmonic function {v:U \rightarrow {\bf R}} a harmonic conjugate of another harmonic function {u: U \rightarrow {\bf R}} if {u+iv} is holomorphic on {U}; in the case that {u,v} are continuously twice differentiable, this is equivalent by Proposition 20 to {u,v} satisfying the Cauchy-Riemann equations (10), (11). Here is a short table giving some examples of harmonic conjugates:

    {u} {v} {u+iv}
    {x} {y} {z}
    {x} {y+1} {z+i}
    {y} {-x} {-iz}
    {x^2-y^2} {2xy} {z^2}
    {e^x \cos y} {e^x \sin y} {e^z}
    {\frac{x}{x^2+y^2}} {\frac{-y}{x^2+y^2}} {\frac{1}{z}}

    (for the last example one of course has to exclude the origin from the domain {U}).

    From Exercise 27(ii) we know that every harmonic polynomial has at least one harmonic conjugate; it is natural to ask whether the same fact is true for more general harmonic functions than polynomials. In the case that the domain {U} is the entire complex plane, the answer is affirmative:

    Proposition 28 Let {u: {\bf C} \rightarrow {\bf R}} be a continuously twice differentiable harmonic function. Then there exists a harmonic conjugate {v: {\bf C} \rightarrow {\bf R}} of {u}. Furthermore, this harmonic conjugate is unique up to constants: if {v,v'} are two harmonic conjugates of {u}, then {v'-v} is a constant function.

    Proof: We first prove uniqueness. If {v} is a harmonic conjugate of {u}, then from the fundamental theorem of calculus, we have

    \displaystyle  v(x+iy) = v(0) + \int_0^x \frac{\partial v}{\partial x}(t)\ dt + \int_0^y \frac{\partial v}{\partial y}(x+it)\ dt

    and hence by the Cauchy-Riemann equations (10), (11) we have

    \displaystyle  v(x+iy) = v(0) - \int_0^x \frac{\partial u}{\partial y}(t)\ dt + \int_0^y \frac{\partial u}{\partial x}(x+it)\ dt.

    Similarly for any other harmonic conjugate {v'} of {u}. It is now clear that {v} and {v'} differ by a constant.

    Now we prove existence. Inspired by the above calculations, we define {v: {\bf C} \rightarrow {\bf R}} explicitly by the formula

    \displaystyle  v(x+iy) := -\int_0^x \frac{\partial u}{\partial y}(t)\ dt + \int_0^y \frac{\partial u}{\partial x}(x+it)\ dt. \ \ \ \ \ (18)

    From the fundamental theorem of calculus, we see that {v} is differentiable in the {y} direction with

    \displaystyle  \frac{\partial v}{\partial y}(x+iy) = \frac{\partial u}{\partial x}(x+iy).

    This is one of the two Cauchy-Riemann equations needed. To obtain the other one, we differentiate (18) in the {x} variable. The fact that {u} is continuously twice differentiable allows one to differentiate under the integral sign (exercise!) and conclude that

    \displaystyle  \frac{\partial v}{\partial x}(x+iy) = - \frac{\partial u}{\partial y}(x) + \int_0^y \frac{\partial^2 u}{\partial x^2}(x+it)\ dt.

    As {u} is harmonic, we have {\frac{\partial^2 u}{\partial x^2} = - \frac{\partial^2 u}{\partial y^2}}, so by the fundamental theorem of calculus we conclude that

    \displaystyle  \frac{\partial v}{\partial x}(x+iy) = -\frac{\partial u}{\partial y}(x+iy).

    Thus we now have both of the Cauchy-Riemann equations (10), (11) in {{\bf C}}. Differentiating these equations again, we conclude that {v} is twice continuously differentiable, and hence by Proposition 20 we have {u+iv} holomorphic on {{\bf C}}, giving the claim. \Box
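As an informal numerical aside (code my own), one can implement the integral formula (18) by quadrature and check it against the table entry above pairing {u = e^x \cos y} with the conjugate {v = e^x \sin y}:

```python
import math

def u_x(x, y):
    # Partial derivatives of u(x, y) = e^x cos y.
    return math.exp(x) * math.cos(y)

def u_y(x, y):
    return -math.exp(x) * math.sin(y)

def conjugate(x, y, n=2000):
    # Harmonic conjugate via formula (18), midpoint-rule quadrature:
    # v(x+iy) = -int_0^x u_y(t, 0) dt + int_0^y u_x(x, t) dt.
    s1 = sum(u_y(x * (k + 0.5) / n, 0.0) for k in range(n)) * x / n
    s2 = sum(u_x(x, y * (k + 0.5) / n) for k in range(n)) * y / n
    return -s1 + s2

x, y = 0.8, -1.1
assert abs(conjugate(x, y) - math.exp(x) * math.sin(y)) < 1e-4
```

Here the first integral vanishes identically since {u_y(t,0) = 0}, and the second reproduces {e^x \sin y}, as the table predicts.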

    The same argument would also work for some other domains than {{\bf C}}, such as rectangles {\{ z: a < \mathrm{Re}(z) < b; c < \mathrm{Im}(z) < d \}}. To handle the general case, though, it becomes convenient to introduce the notion of contour integration, which we will do in the next set of notes. In some cases (specifically, when the underlying domain {U} fails to be simply connected), it will turn out that some harmonic functions do not have conjugates!

    Exercise 29 Show that an entire function {f: {\bf C} \rightarrow {\bf C}} can be real-valued on {{\bf C}} only if it is constant.

    Exercise 30 Let {c} be a complex number. Show that if {f: {\bf C} \rightarrow {\bf C}} is an entire function such that {\frac{df}{dz}(z) = cf(z)} for all {z \in {\bf C}}, then {f(z) = f(0) \exp( cz)} for all {z \in {\bf C}}.

    Filed under: 246A - complex analysis, math.CA, math.CV Tagged: Cauchy-Riemann equations, differentiation, harmonic functions, holomorphicity, power series

    BackreactionDear Dr B: What do physicists mean by “quantum gravity”?

    “Please could you give me a simple definition of "quantum gravity"?”


    Dear J,

    By “quantum gravity,” physicists refer not so much to a specific theory as to the sought-after solution to various problems in the established theories. The most pressing problem is that the standard model combined with general relativity is internally inconsistent. If we just use both as they are, we arrive at conclusions which do not agree with each other. So just throwing them together doesn’t work. Something else is needed, and that something else is what we call quantum gravity.

    Unfortunately, the effects of quantum gravity are very small, so presently we have no observations to guide theory development. In all experiments made so far, it’s sufficient to use unquantized gravity.

    Nobody knows how to combine a quantum theory – like the standard model – with a non-quantum theory – like general relativity – without running into difficulties (except for me, but nobody listens). Therefore the main strategy has become to find a way to give quantum properties to gravity. Or, since Einstein taught us gravity is nothing but the curvature of space-time, to give quantum properties to space and time.

    Just combining quantum field theory with general relativity doesn’t work because, as confirmed by countless experiments, all the particles we know have quantum properties. This means (among many other things) they are subject to Heisenberg’s uncertainty principle and can be in quantum superpositions. But they also carry energy and hence should create a gravitational field. In general relativity, however, the gravitational field can’t be in a quantum superposition, so it can’t be directly attached to the particles, as it should be.

    One can try to find a solution to this conundrum, for example by not directly coupling the energy (and related quantities like mass, pressure, momentum flux and so on) to gravity, but instead only coupling the average value, which behaves more like a classical field. This solves one problem, but creates a new one. The average value of a quantum state must be updated upon measurement. This measurement postulate is a non-local prescription and general relativity can’t deal with it – after all Einstein invented general relativity to get rid of the non-locality of Newtonian gravity. (Neither decoherence nor many worlds remove the problem, you still have to update the probabilities, somehow, somewhere.)

    The quantum field theories of the standard model and general relativity clash in other ways. If we try to understand the evaporation of black holes, for example, we run into another inconsistency: Black holes emit Hawking-radiation due to quantum effects of the matter fields. This radiation doesn’t carry information about what formed the black hole. And so, if the black hole entirely evaporates, this results in an irreversible process because from the end-state one can’t infer the initial state. This evaporation however can’t be accommodated in a quantum theory, where all processes can be time-reversed – it’s another contradiction that we hope quantum gravity will resolve.

    Then there is the problem with the singularities in general relativity. Singularities, where the space-time curvature becomes infinitely large, are not mathematical inconsistencies. But they are believed to be physical nonsense. Using dimensional analysis, one can estimate that the effects of quantum gravity should become large close to the singularities. And so we think that quantum gravity should replace the singularities with a better-behaved quantum space-time.

    The sought-after theory of quantum gravity is expected to solve these three problems: tell us how to couple quantum matter to gravity, explain what happens to information that falls into a black hole, and avoid singularities in general relativity. Any theory which achieves this we’d call quantum gravity, whether or not you actually get it by quantizing gravity.

    Physicists are presently pursuing various approaches to a theory of quantum gravity, notably string theory, loop quantum gravity, asymptotically safe gravity, and causal dynamical triangulation, to name just the most popular ones. But none of these approaches has experimental evidence speaking for it. Indeed, so far none of them has made a testable prediction.

    This is why, in the area of quantum gravity phenomenology, we’re bridging the gap between theory and experiment with simplified models, some of which are motivated by specific approaches (hence: string phenomenology, loop quantum cosmology, and so on). These phenomenological models don’t aim to directly solve the above-mentioned problems; they merely provide a mathematical framework – consistent in its range of applicability – to quantify and hence test the presence of effects that could be signals of quantum gravity, for example space-time fluctuations, violations of the equivalence principle, deviations from general relativity, and so on.

    Thanks for an interesting question!

    September 26, 2016

    Clifford JohnsonWhere I’d Rather Be…?

    Right now, I'd much rather be on the sofa reading a novel (or whatever it is she's reading)....instead of drawing all those floorboards near her. (Going to add "rooms with lots of floorboards" to [...] Click to continue reading this post

    The post Where I’d Rather Be…? appeared first on Asymptotia.

    n-Category Café Euclidean, Hyperbolic and Elliptic Geometry

    There are two famous kinds of non-Euclidean geometry: hyperbolic geometry and elliptic geometry (which almost deserves to be called ‘spherical’ geometry, but not quite because we identify antipodal points on the sphere).

    In fact, these two kinds of geometry, together with Euclidean geometry, fit into a unified framework with a parameter s \in \mathbb{R} that tells you the curvature of space:

    • when s \gt 0 you’re doing elliptic geometry

    • when s = 0 you’re doing Euclidean geometry

    • when s \lt 0 you’re doing hyperbolic geometry.

    This is all well-known, but I’m trying to explain it in a course I’m teaching, and there’s something that’s bugging me.

    It concerns the precise way in which elliptic and hyperbolic geometry reduce to Euclidean geometry as s \to 0. I know this is a problem of deformation theory involving a group contraction, indeed I know all sorts of fancy junk, but my problem is fairly basic and this junk isn’t helping.

    Here’s the nice part:

    Give \mathbb{R}^3 a bilinear form that depends on the parameter s \in \mathbb{R}:

    v \cdot_s w = v_1 w_1 + v_2 w_2 + s v_3 w_3

    Let SO_s(3) be the group of linear transformations of \mathbb{R}^3 having determinant 1 that preserve \cdot_s. Then:

    • when s \gt 0, SO_s(3) is isomorphic to the symmetry group of elliptic geometry,

    • when s = 0, SO_s(3) is isomorphic to the symmetry group of Euclidean geometry,

    • when s \lt 0, SO_s(3) is isomorphic to the symmetry group of hyperbolic geometry.

    This is sort of obvious except for s = 0. The cool part is that it’s still true in the case s = 0! The linear transformations having determinant 1 that preserve the bilinear form

    v \cdot_0 w = v_1 w_1 + v_2 w_2

    look like this:

    \left( \begin{array}{ccc} \cos \theta & -\sin \theta & 0 \\ \sin \theta & \cos \theta & 0 \\ a & b & 1 \end{array} \right)

    And these form a group isomorphic to the Euclidean group — the group of transformations of the plane generated by rotations and translations!
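Here is a quick numerical check (mine, not from the post) that these matrices really do preserve the degenerate form v \cdot_0 w, since the third coordinate simply drops out of the form:

```python
import math, random

def apply(theta, a, b, v):
    # The displayed matrix acting on a column vector v in R^3:
    # a rotation of the first two coordinates, plus a shear into the third.
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1],
            s * v[0] + c * v[1],
            a * v[0] + b * v[1] + v[2])

def form0(v, w):
    # The degenerate bilinear form v ._0 w = v_1 w_1 + v_2 w_2.
    return v[0] * w[0] + v[1] * w[1]

random.seed(0)
theta, a, b = 0.7, 1.5, -2.0
for _ in range(100):
    v = tuple(random.uniform(-1, 1) for _ in range(3))
    w = tuple(random.uniform(-1, 1) for _ in range(3))
    assert abs(form0(apply(theta, a, b, v), apply(theta, a, b, w))
               - form0(v, w)) < 1e-12
```

The form only sees the first two coordinates, which the matrix rotates; the (a, b) row contributes only to the third coordinate, which the form ignores.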

    So far, everything sounds pleasantly systematic. But then things get a bit quirky:

    • Elliptic case. When s \gt 0, the space X = \{v \cdot_s v = 1\} is an ellipsoid. The 1d linear subspaces of \mathbb{R}^3 having nonempty intersection with X are the points of elliptic geometry. The 2d linear subspaces of \mathbb{R}^3 having nonempty intersection with X are the lines. The group SO_s(3) acts on the space of points and the space of lines, preserving the obvious incidence relation.

    Why not just use XX as our space of points? This would give a sphere, and we could use great circles as our lines—but then distinct lines would always intersect in two points, and two points would not determine a unique line. So we want to identify antipodal points on the sphere, and one way is to do what I’ve done.

    • Hyperbolic case. When s < 0, the space X = \{v \cdot_s v = -1\} is a hyperboloid with two sheets. The 1d linear subspaces of \mathbb{R}^3 having nonempty intersection with X are the points of hyperbolic geometry. The 2d linear subspaces of \mathbb{R}^3 having nonempty intersection with X are the lines. The group SO_s(3) acts on the space of points and the space of lines, preserving the obvious incidence relation.

    This time X is a hyperboloid with two sheets, but my procedure identifies antipodal points, leaving us with a single sheet. That’s nice.

    But the obnoxious thing is that in the hyperbolic case I took X to be the set of points with v \cdot_s v = -1, instead of v \cdot_s v = 1. If I hadn’t switched the sign like that, X would be the hyperboloid with one sheet. Maybe there’s a version of hyperbolic geometry based on the one-sheeted hyperboloid (with antipodal points identified), but nobody seems to talk about it! Have you heard about it? If not, why not?


    • Euclidean case. When s = 0, the space X = \{v \cdot_s v = 1\} is a cylinder. The 1d linear subspaces of \mathbb{R}^3 having nonempty intersection with X are the lines of Euclidean geometry. The 2d linear subspaces of \mathbb{R}^3 having nonempty intersection with X are the points. The group SO_s(3) acts on the space of points and the space of lines, preserving their incidence relation.

    Yes, any point (a,b,c) on the cylinder

    X_0 = \{(a,b,c) : \; a^2 + b^2 = 1 \}

    determines a line in the Euclidean plane, namely the line

    a x + b y + c = 0

    and antipodal points on the cylinder determine the same line. I’ll let you figure out the rest, or tell you if you’re curious.
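    Here is a small numerical check of that incidence claim (my own illustration, not from the post; the rule that a plane point, being covector data attached to a 2d subspace, moves by the inverse transpose of the matrix is my reading of the setup above, which works out to the Euclidean motion (x, y) \mapsto R((x, y) - (p, q))):

```python
# Check that the SO_0(3) action preserves incidence: the line coefficients
# (a, b, c) transform by the matrix M itself, while a plane point (x, y)
# moves by the induced Euclidean motion, rotating (x, y) - (p, q) by theta.
import math

def line_image(theta, p, q, a, b, c):
    """Apply the group element (rotation theta, bottom row (p, q, 1)) to (a, b, c)."""
    ct, st = math.cos(theta), math.sin(theta)
    return (ct * a - st * b, st * a + ct * b, p * a + q * b + c)

def point_image(theta, p, q, x, y):
    """The induced motion of the Euclidean plane: rotate (x - p, y - q) by theta."""
    ct, st = math.cos(theta), math.sin(theta)
    u, v = x - p, y - q
    return (ct * u - st * v, st * u + ct * v)

theta, p, q = 0.9, 1.5, -2.0
a, b, c = 3.0, 4.0, -5.0   # the line 3x + 4y - 5 = 0
x, y = -1.0, 2.0           # a point on it: 3(-1) + 4(2) - 5 = 0

a2, b2, c2 = line_image(theta, p, q, a, b, c)
x2, y2 = point_image(theta, p, q, x, y)
assert abs(a * x + b * y + c) < 1e-12        # incidence before
assert abs(a2 * x2 + b2 * y2 + c2) < 1e-12   # incidence after
```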

    The problem with the Euclidean case is that points and lines are getting switched! Points are corresponding to certain 2d subspaces of \mathbb{R}^3, and lines are corresponding to certain 1d subspaces.

    You may just tell me I got the analogy backwards. Indeed, in elliptic geometry every point has a line orthogonal to it, and vice versa. So we can switch what counts as points and what counts as lines in that case, without causing trouble. Unfortunately, it seems that for hyperbolic geometry this is not true.

    There’s got to be some way to smooth things down and make them nice. I could explain my favorite option, and why it doesn’t quite work, but I shouldn’t pollute your brain with my failed ideas. At least not until you try the exact same ideas.

    I’m sure someone has figured this out already, somewhere.

    David Hoggbinaries, velocities, Gaia

    Early in the day, I discussed with Hans-Walter Rix (MPIA) the wide-separation binaries that Adrian Price-Whelan (Princeton) and I are finding in the Gaia DR1 data. He expressed some skepticism: Are we sure that such pairs can't be produced spuriously by the pipelines or systematic errors? That's important to check; no need to hurry out a wrong paper!

    Late in the day, I had two tiny, eensy breakthroughs: In the first, I figured out that Price-Whelan and I can cast our binary discovery project in terms of a ratio of tractable marginalized likelihoods. That would be fun, and it would constitute a (relatively) responsible use of the (noisy) parallax information. In the second, I was able to confirm (by experimental coding) the (annoyingly correct) intuition of Dan Foreman-Mackey (UW) that the linearized spectral shift is not precise enough for our extreme-precision radial-velocity needs. So I have to do full-up redshifting of everything.

    September 25, 2016

    John BaezStruggles with the Continuum (Part 8)

    We’ve been looking at how the continuum nature of spacetime poses problems for our favorite theories of physics—problems with infinities. Last time we saw a great example: general relativity predicts the existence of singularities, like black holes and the Big Bang. I explained exactly what these singularities really are. They’re not points or regions of spacetime! They’re more like ways for a particle to ‘fall off the edge of spacetime’. Technically, they are incomplete timelike or null geodesics.

    The next step is to ask whether these singularities rob general relativity of its predictive power. The ‘cosmic censorship hypothesis’, proposed by Penrose in 1969, claims they do not.

    In this final post I’ll talk about cosmic censorship, and conclude with some big questions… and a place where you can get all these posts in a single file.

    Cosmic censorship

    To say what we want to rule out, we must first think about what behaviors we consider acceptable. Consider first a black hole formed by the collapse of a star. According to general relativity, matter can fall into this black hole and ‘hit the singularity’ in a finite amount of proper time, but nothing can come out of the singularity.

    The time-reversed version of a black hole, called a ‘white hole’, is often considered more disturbing. White holes have never been seen, but they are mathematically valid solutions of Einstein’s equation. In a white hole, matter can come out of the singularity, but nothing can fall in. Naively, this seems to imply that the future is unpredictable given knowledge of the past. Of course, the same logic applied to black holes would say the past is unpredictable given knowledge of the future.

    Big Bang cosmology

    If white holes are disturbing, perhaps the Big Bang should be more so. In the usual solutions of general relativity describing the Big Bang, all matter in the universe comes out of a singularity! More precisely, if one follows any timelike geodesic back into the past, it becomes undefined after a finite amount of proper time. Naively, this may seem a massive violation of predictability: in this scenario, the whole universe ‘sprang out of nothing’ about 14 billion years ago.

    However, in all three examples so far—astrophysical black holes, their time-reversed versions and the Big Bang—spacetime is globally hyperbolic. I explained what this means last time. In simple terms, it means we can specify initial data at one moment in time and use the laws of physics to predict the future (and past) throughout all of spacetime. How is this compatible with the naive intuition that a singularity causes a failure of predictability?

    For any globally hyperbolic spacetime M, one can find a smoothly varying family of Cauchy surfaces S_t (t \in \mathbb{R}) such that each point of M lies on exactly one of these surfaces. This amounts to a way of chopping spacetime into ‘slices of space’ for various choices of the ‘time’ parameter t. For an astrophysical black hole, the singularity is in the future of all these surfaces. That is, an incomplete timelike or null geodesic must go through all these surfaces S_t before it becomes undefined. Similarly, for a white hole or the Big Bang, the singularity is in the past of all these surfaces. In either case, the singularity cannot interfere with our predictions of what occurs in spacetime.

    A more challenging example is posed by the Kerr–Newman solution of Einstein’s equation coupled to the vacuum Maxwell equations. When

    e^2 + (J/m)^2 < m^2

    this solution describes a rotating charged black hole with mass m, charge e and angular momentum J in units where c = G = 1. However, an electron violates this inequality. In 1968, Brandon Carter pointed out that if the electron were described by the Kerr–Newman solution, it would have a gyromagnetic ratio of g = 2, much closer to the true answer than a classical spinning sphere of charge, which gives g = 1. But since

    e^2 + (J/m)^2 > m^2

    this solution gives a spacetime that is not globally hyperbolic: it has closed timelike curves! It also contains a ‘naked singularity’. Roughly speaking, this is a singularity that can be seen by arbitrarily faraway observers in a spacetime whose geometry asymptotically approaches that of Minkowski spacetime. The existence of a naked singularity implies a failure of global hyperbolicity.
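    Carter’s observation can be checked with a rough back-of-envelope computation (mine, not from the post), converting the electron’s mass, charge, and spin angular momentum into geometrized units, where everything becomes a length:

```python
# Back-of-envelope check that the electron violates e^2 + (J/m)^2 < m^2
# in units with c = G = 1 (approximate SI constants; mass -> G m / c^2,
# charge -> q sqrt(k G) / c^2, angular momentum -> G J / c^3).
import math

G    = 6.674e-11   # m^3 kg^-1 s^-2
c    = 2.998e8     # m / s
k    = 8.988e9     # Coulomb constant, N m^2 C^-2
hbar = 1.055e-34   # J s

m_e = 9.109e-31    # kg
q_e = 1.602e-19    # C
J_e = hbar / 2     # spin 1/2

m = G * m_e / c**2                  # mass as a length, roughly 7e-58 m
e = q_e * math.sqrt(k * G) / c**2   # charge as a length, roughly 1.4e-36 m
J = G * J_e / c**3                  # angular momentum as an area, roughly 1.3e-70 m^2

# The inequality fails spectacularly: (J/m)^2 alone dwarfs m^2.
assert e**2 + (J / m)**2 > m**2
```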

    The cosmic censorship hypothesis comes in a number of forms. The original version due to Penrose is now called ‘weak cosmic censorship’. It asserts that in a spacetime whose geometry asymptotically approaches that of Minkowski spacetime, gravitational collapse cannot produce a naked singularity.

    In 1991, Preskill and Thorne made a bet against Hawking in which they claimed that weak cosmic censorship was false. Hawking conceded this bet in 1997 when a counterexample was found. This features finely-tuned infalling matter poised right on the brink of forming a black hole. It almost creates a region from which light cannot escape—but not quite. Instead, it creates a naked singularity!

    Given the delicate nature of this construction, Hawking did not give up. Instead he made a second bet, which says that weak cosmic censorship holds ‘generically’ — that is, for an open dense set of initial conditions.

    In 1999, Christodoulou proved that for spherically symmetric solutions of Einstein’s equation coupled to a massless scalar field, weak cosmic censorship holds generically. While spherical symmetry is a very restrictive assumption, this result is a good example of how, with plenty of work, we can make progress in rigorously settling the questions raised by general relativity.

    Indeed, Christodoulou has been a leader in this area. For example, the vacuum Einstein equations have solutions describing gravitational waves, much as the vacuum Maxwell equations have solutions describing electromagnetic waves. However, gravitational waves can actually form black holes when they collide. This raises the question of the stability of Minkowski spacetime. Must sufficiently small perturbations of the Minkowski metric go away in the form of gravitational radiation, or can tiny wrinkles in the fabric of spacetime somehow amplify themselves and cause trouble—perhaps even a singularity? In 1993, together with Klainerman, Christodoulou proved that Minkowski spacetime is indeed stable. Their proof fills a 514-page book.

    In 2008, Christodoulou completed an even longer rigorous study of the formation of black holes. This can be seen as a vastly more detailed look at questions which Penrose’s original singularity theorem addressed in a general, preliminary way. Nonetheless, there is much left to be done to understand the behavior of singularities in general relativity.


    In this series of posts, we’ve seen that in every major theory of physics, challenging mathematical questions arise from the assumption that spacetime is a continuum. The continuum threatens us with infinities! Do these infinities threaten our ability to extract predictions from these theories—or even our ability to formulate these theories in a precise way?

    We can answer these questions, but only with hard work. Is this a sign that we are somehow on the wrong track? Is the continuum as we understand it only an approximation to some deeper model of spacetime? Only time will tell. Nature is providing us with plenty of clues, but it will take patience to read them correctly.

    For more

    To delve deeper into singularities and cosmic censorship, try this delightful book, which is free online:

    • John Earman, Bangs, Crunches, Whimpers and Shrieks: Singularities and Acausalities in Relativistic Spacetimes, Oxford U. Press, Oxford, 1993.

    To read this whole series of posts in one place, with lots more references and links, see:

    • John Baez, Struggles with the continuum.

    Sean CarrollLive Q&As, Past and Future

    On Friday I had a few minutes free, and did an experiment: put my iPhone on a tripod, pointed it at myself, and did a live video on my public Facebook page, taking questions from anyone who happened by. There were some technical glitches, as one might expect from a short-notice happening. The sound wasn’t working when I first started, and in the recording below the video fails (replacing the actual recording with a still image of me sideways, for inexplicable reasons) just when the sound starts working. (I don’t think this happened during the actual event, but maybe it did and everyone was too polite to mention it.) And for some reason the video keeps going long after the 20-some minutes for which I was actually recording.

    But overall I think it was fun and potentially worth repeating. If I were to make this an occasional thing, how best to do it? This time around I literally just read off a selection of questions that people were typing into the Facebook comment box. Alternatively, I could just talk on some particular topic, or I could solicit questions ahead of time and pick out some good ones to answer in detail.

    What do you folks think? Also — is Facebook Live the right tool for this? I know the kids these days use all sorts of different technologies. No guarantees that I’ll have time to do this regularly, but it’s worth contemplating.

    What makes the most sense to talk about in live chats?

    David Hogggroup meetings

    At my morning group meeting, Will Farr (Birmingham) told us about CARMA models and their use in stellar radial velocity analysis. His view is that they are a possible basis or method for looking (coarsely) at asteroseismology. That meshes well with things we have been talking about at NYU about doing Gaussian Processes with kernels that are non-trivial in the frequency domain to identify asteroseismic modes.

    In the afternoon group meeting, we had a very wide-ranging conversation, about possible future work on CMB foregrounds, and about using shrinkage priors to improve noisy measurements of SZ clusters and other low signal-to-noise objects. We also discussed the recent Dragonfly discovery of a very low surface-brightness galaxy, and whether it presents a challenge for cosmological models.

    Chad OrzelOnline Life Is Real Life, Aleph-Nought in a Series

    Back before The Pip was born, our previous departmental administrative assistant used to bug me– in a friendly way– about how Kate and I ought to have another kid. (She had two kids of her own, about two years apart in age.) “When are you guys going to have another baby?” she would ask, and I always said “We’re thinking about it.”

    About a week passed between the last time we had that exchange and the day I came in and taped ultrasound photos of the prenatal Little Dude to my door. “You sonofabitch!” she said (again, in a friendly way), “You were expecting this whole time!” “Yeah,” I said, “but we weren’t telling anyone until now.”

    I thought of this while I was reading John Scalzi’s epic post about self-presentation, prompted by someone who complained that he behaved differently in person than that person had expected from Scalzi’s online persona. (Personally, having met John in person several times, I don’t see it, but whatever…) Scalzi rightly notes that there’s nothing at all wrong with this, and that much of the difference is (probably) just basic courtesy and politeness.

    This is not remotely a new argument, as you probably got from my snarky post title– it’s come around before, and will come around again. I side with Scalzi in thinking that there’s nothing wrong with presenting yourself in a slightly different way online than off. I’d go maybe a little further than that, though, and note that presenting yourself in different ways to different people is something we do all the time, even in strictly offline interactions.

    This made me think about the incident with our former admin, which is a bit of an extreme example, but illustrates the point. At home, Kate and I had obviously known about the proto-Pip for a couple of months, and were making all sorts of plans and so on. But while I cheerfully talked at some length about SteelyKid, I dodged any questions about future kids, because Kate and I had agreed that we weren’t making it public, yet. I think our parents knew (but I don’t recall the exact timing), but it wasn’t something going out on the blog or even to people I spent a lot of time talking to at work.

    And basically anyone who isn’t a phenomenal boor modifies their self-presentation in this sort of way. If you have a job and family, there are things you just aren’t allowed to share with people outside those contexts, and that modifies your interaction with different groups. I talk about campus life in a different way when I’m with a bunch of students than when I’m with fellow faculty, for example. Milder versions that don’t involve trade secrets also happen all the time: when we’re with friends who share a particular interest, we play up that interest, and play down other things– I talk a lot about sports at the gym with the regular pick-up basketball crowd, but sports aren’t as big a topic around the physics department. Even when we don’t have the Internet as an intermediary, we’re slightly different things to different groups of people, because that’s one of the things that lets human society function.

    This is something we do so smoothly that we’re often not really aware of it, which is why so many people think it’s an online-only phenomenon. But if you think about it a little, it’s probably not hard to come up with offline examples of people who behave very differently in different contexts. Even people who insist that they present themselves the same way in all possible situations almost certainly change the way they talk and what they talk about when they go from home, to work, to whatever they do for fun. It’s just how we work, and sticking a computer in the middle doesn’t fundamentally change that.

    You might argue that, by removing some of the non-verbal cues and ingrained rules about in-person interactions, the Internet enables a bigger disconnect between on- and off-line personae than are possible in a strictly offline context. I’m not entirely sure I buy that, though, because I’ve seen some pretty extreme divergences in offline-only interactions. (And, of course, there’s the “He was such a quiet guy…” trope about serial killers and the like.) What’s different about the Internet is the possibility of exposing these different personae to the whole world, rather than a small group of close acquaintances who might happen to run into a person in two different subgroups.

    But that’s another topic. The main point at the moment is that I agree with Scalzi: there’s nothing inappropriate about self-presenting in a different way in person than online. Mostly because that sort of shifting is a phenomenon that pre-dates the Internet by a good many years.

    September 24, 2016

    Tommaso DorigoA Book By Guido Tonelli

    Yesterday I read with interest and curiosity some pages of a book on the search and discovery of the Higgs boson, which was published last March by Rizzoli (in Italian only, at least for the time being). The book, authored by physics professor and ex CMS spokesperson Guido Tonelli, is titled "La nascita imperfetta delle cose" ("The imperfect birth of things"). 


    September 23, 2016

    John BaezStruggles with the Continuum (Part 7)

    Combining electromagnetism with relativity and quantum mechanics led to QED. Last time we saw the immense struggles with the continuum this caused. But combining gravity with relativity led Einstein to something equally remarkable: general relativity.

    In general relativity, infinities coming from the continuum nature of spacetime are deeply connected to its most dramatic successful predictions: black holes and the Big Bang. In this theory, the density of the Universe approaches infinity as we go back in time toward the Big Bang, and the density of a star approaches infinity as it collapses to form a black hole. Thus we might say that instead of struggling against infinities, general relativity accepts them and has learned to live with them.

    General relativity does not take quantum mechanics into account, so the story is not yet over. Many physicists hope that quantum gravity will eventually save physics from its struggles with the continuum! Since quantum gravity is far from being understood, this remains just a hope. This hope has motivated a profusion of new ideas on spacetime: too many to survey here. Instead, I’ll focus on the humbler issue of how singularities arise in general relativity—and why they might not rob this theory of its predictive power.

    General relativity says that spacetime is a 4-dimensional Lorentzian manifold. Thus, it can be covered by patches equipped with coordinates, so that in each patch we can describe points by lists of four numbers. Any curve \gamma(s) going through a point then has a tangent vector v whose components are v^\mu = d \gamma^\mu(s)/ds. Furthermore, given two tangent vectors v,w at the same point we can take their inner product

    g(v,w) = g_{\mu \nu} v^\mu w^\nu

    where as usual we sum over repeated indices, and g_{\mu \nu} is a 4 \times 4 matrix called the metric, depending smoothly on the point. We require that at any point we can find some coordinate system where this matrix takes the usual Minkowski form:

    \displaystyle{  g = \left( \begin{array}{cccc} -1 & 0 &0 & 0 \\ 0 & 1 &0 & 0 \\ 0 & 0 &1 & 0 \\ 0 & 0 &0 & 1 \\ \end{array}\right). }

    However, as soon as we move away from our chosen point, the form of the matrix g in these particular coordinates may change.

    General relativity says how the metric is affected by matter. It does this in a single equation, Einstein’s equation, which relates the ‘curvature’ of the metric at any point to the flow of energy-momentum through that point. To define the curvature, we need some differential geometry. Indeed, Einstein had to learn this subject from his mathematician friend Marcel Grossman in order to write down his equation. Here I will take some shortcuts and try to explain Einstein’s equation with a bare minimum of differential geometry. For how this approach connects to the full story, and a list of resources for further study of general relativity, see:

    • John Baez and Emory Bunn, The meaning of Einstein’s equation.

    Consider a small round ball of test particles that are initially all at rest relative to each other. This requires a bit of explanation. First, because spacetime is curved, it only looks like Minkowski spacetime—the world of special relativity—in the limit of very small regions. The usual concepts of ’round’ and ‘at rest relative to each other’ only make sense in this limit. Thus, all our forthcoming statements are precise only in this limit, which of course relies on the fact that spacetime is a continuum.

    Second, a test particle is a classical point particle with so little mass that while it is affected by gravity, its effects on the geometry of spacetime are negligible. We assume our test particles are affected only by gravity, no other forces. In general relativity this means that they move along timelike geodesics. Roughly speaking, these are paths that go slower than light and bend as little as possible. We can make this precise without much work.

    For a path in space to be a geodesic means that if we slightly vary any small portion of it, it can only become longer. However, a path \gamma(s) in spacetime traced out by a particle moving slower than light must be ‘timelike’, meaning that its tangent vector v = \gamma'(s) satisfies g(v,v) < 0. We define the proper time along such a path from s = s_0 to s = s_1 to be

    \displaystyle{  \int_{s_0}^{s_1} \sqrt{-g(\gamma'(s),\gamma'(s))} \, ds }

    This is the time ticked out by a clock moving along that path. A timelike path is a geodesic if the proper time can only decrease when we slightly vary any small portion of it. Particle physicists prefer the opposite sign convention for the metric, and then we do not need the minus sign under the square root. But the fact remains the same: timelike geodesics locally maximize the proper time.
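    As a tiny worked example (my illustration, not the post’s), take flat Minkowski spacetime with metric diag(-1, 1, 1, 1) and the straight path \gamma(s) = (s, vs, 0, 0) of a particle moving at constant speed v < 1. Its tangent is (1, v, 0, 0), so -g(\gamma', \gamma') = 1 - v^2, and the proper-time integral reproduces the familiar time-dilation factor \sqrt{1 - v^2}:

```python
# Numerically evaluate the proper-time integral for a constant-velocity path
# in flat Minkowski spacetime; the answer should be s1 * sqrt(1 - v^2).
import math

def proper_time(tangent_sq, s0, s1, n=2000):
    """Midpoint-rule integral of sqrt(-g(gamma'(s), gamma'(s))) ds."""
    h = (s1 - s0) / n
    return sum(math.sqrt(-tangent_sq(s0 + (i + 0.5) * h)) * h for i in range(n))

v = 0.8
g_tangent = lambda s: -1.0 + v**2   # g(gamma'(s), gamma'(s)), constant here

tau = proper_time(g_tangent, 0.0, 10.0)
assert abs(tau - 10.0 * math.sqrt(1 - v**2)) < 1e-9   # 10 * 0.6 = 6.0
```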

    Actual particles are not test particles! First, the concept of test particle does not take quantum theory into account. Second, all known particles are affected by forces other than gravity. Third, any actual particle affects the geometry of the spacetime it inhabits. Test particles are just a mathematical trick for studying the geometry of spacetime. Still, a sufficiently light particle that is affected very little by forces other than gravity can be approximated by a test particle. For example, an artificial satellite moving through the Solar System behaves like a test particle if we ignore the solar wind, the radiation pressure of the Sun, and so on.

    If we start with a small round ball consisting of many test particles that are initially all at rest relative to each other, to first order in time it will not change shape or size. However, to second order in time it can expand or shrink, due to the curvature of spacetime. It may also be stretched or squashed, becoming an ellipsoid. This should not be too surprising, because any linear transformation applied to a ball gives an ellipsoid.

    Let V(t) be the volume of the ball after a time t has elapsed, where time is measured by a clock attached to the particle at the center of the ball. Then in units where c = 8 \pi G = 1, Einstein’s equation says:

    \displaystyle{  \left.{\ddot V\over V} \right|_{t = 0} = -{1\over 2} \left( \begin{array}{l} {\rm flow \; of \;} t{\rm -momentum \; in \; the \;\,} t {\rm \,\; direction \;} + \\ {\rm flow \; of \;} x{\rm -momentum \; in \; the \;\,} x {\rm \; direction \;} + \\ {\rm flow \; of \;} y{\rm -momentum \; in \; the \;\,} y {\rm \; direction \;} + \\ {\rm flow \; of \;} z{\rm -momentum \; in \; the \;\,} z {\rm \; direction} \end{array} \right) }

    These flows here are measured at the center of the ball at time zero, and the coordinates used here take advantage of the fact that to first order, at any one point, spacetime looks like Minkowski spacetime.

    The flows in Einstein’s equation are the diagonal components of a 4 \times 4 matrix T called the ‘stress-energy tensor’. The components T_{\alpha \beta} of this matrix say how much momentum in the \alpha direction is flowing in the \beta direction through a given point of spacetime. Here \alpha and \beta range from 0 to 3, corresponding to the t,x,y and z coordinates.

    For example, T_{00} is the flow of t-momentum in the t-direction. This is just the energy density, usually denoted \rho. The flow of x-momentum in the x-direction is the pressure in the x direction, denoted P_x, and similarly for y and z. You may be more familiar with direction-independent pressures, but it is easy to manufacture a situation where the pressure depends on the direction: just squeeze a book between your hands!

    Thus, Einstein’s equation says

    \displaystyle{ {\ddot V\over V} \Bigr|_{t = 0} = -{1\over 2} (\rho + P_x + P_y + P_z) }

    It follows that positive energy density and positive pressure both curve spacetime in a way that makes a freely falling ball of point particles tend to shrink. Since E = mc^2 and we are working in units where c = 1, ordinary mass density counts as a form of energy density. Thus a massive object will make a swarm of freely falling particles at rest around it start to shrink. In short, gravity attracts.

    Already from this, gravity seems dangerously inclined to create singularities. Suppose that instead of test particles we start with a stationary cloud of ‘dust’: a fluid of particles having nonzero energy density but no pressure, moving under the influence of gravity alone. The dust particles will still follow geodesics, but they will affect the geometry of spacetime. Their energy density will make the ball start to shrink. As it does, the energy density \rho will increase, so the ball will tend to shrink ever faster, approaching infinite density in a finite amount of time. This in turn makes the curvature of spacetime become infinite in a finite amount of time. The result is a ‘singularity’.
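    Here is a deliberately crude caricature of that argument (my toy model, not the post’s): pretend the relation \ddot V / V = -\rho/2, which really holds only instantaneously, persists throughout the collapse, and conserve mass so that \rho V stays constant. Then \ddot V is a constant negative number, and the volume hits zero, i.e. the density diverges, at the finite time t = 2/\sqrt{\rho_0}:

```python
# Toy model of dust collapse: with Vdotdot / V = -rho/2 and rho * V = rho0 * V0,
# the second derivative Vdotdot = -rho0 * V0 / 2 is constant, so
# V(t) = V0 - (rho0 * V0 / 4) t^2 vanishes at t = 2 / sqrt(rho0).
import math

rho0, V0 = 4.0, 1.0
acc = -rho0 * V0 / 2.0             # constant second derivative of V

def volume(t):
    return V0 + 0.5 * acc * t**2   # V(0) = V0, Vdot(0) = 0

t_collapse = 2.0 / math.sqrt(rho0)
assert abs(volume(t_collapse)) < 1e-12   # volume reaches zero in finite time
assert volume(0.5 * t_collapse) > 0.0    # but is still positive before that
```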

    In reality, matter is affected by forces other than gravity. Repulsive forces may prevent gravitational collapse. However, this repulsion creates pressure, and Einstein’s equation says that pressure also creates gravitational attraction! In some circumstances this can overwhelm whatever repulsive forces are present. Then the matter collapses, leading to a singularity—at least according to general relativity.

    When a star more than 8 times the mass of our Sun runs out of fuel, its core suddenly collapses. The surface is thrown off explosively in an event called a supernova. Most of the energy—the equivalent of thousands of Earth masses—is released in a ten-minute burst of neutrinos, formed as a byproduct when protons and electrons combine to form neutrons. If the star’s mass is below 20 times that of our Sun, its core crushes down to a large ball of neutrons with a crust of iron and other elements: a neutron star.

    However, this ball is unstable if its mass exceeds the Tolman–Oppenheimer–Volkoff limit, somewhere between 1.5 and 3 times that of our Sun. Above this limit, gravity overwhelms the repulsive forces that hold up the neutron star. And indeed, no neutron stars heavier than 3 solar masses have been observed. Thus, for very heavy stars, the endpoint of collapse is not a neutron star, but something else: a black hole, an object that bends spacetime so much even light cannot escape.

    If general relativity is correct, a black hole contains a singularity. Many physicists expect that general relativity breaks down inside a black hole, perhaps because of quantum effects that become important at strong gravitational fields. The singularity is considered a strong hint that this breakdown occurs. If so, the singularity may be a purely theoretical entity, not a real-world phenomenon. Nonetheless, everything we have observed about black holes matches what general relativity predicts. Thus, unlike all the other theories we have discussed, general relativity predicts infinities that are connected to striking phenomena that are actually observed.

    The Tolman–Oppenheimer–Volkoff limit is not precisely known, because it depends on properties of nuclear matter that are not well understood. However, there are theorems that say singularities must occur in general relativity under certain conditions.

    One of the first was proved by Raychaudhuri and Komar in the mid-1950’s. It applies only to ‘dust’, and indeed it is a precise version of our verbal argument above. It introduced Raychaudhuri’s equation, which is the geometrical way of thinking about spacetime curvature as affecting the motion of a small ball of test particles. It shows that under suitable conditions, the energy density must approach infinity in a finite amount of time along the path traced out by a dust particle.

    The first required condition is that the flow of dust be initially converging, not expanding. The second condition, not mentioned in our verbal argument, is that the dust be ‘irrotational’, not swirling around. The third condition is that the dust particles be affected only by gravity, so that they move along geodesics. Due to the last two conditions, the Raychaudhuri–Komar theorem does not apply to collapsing stars.

    The more modern singularity theorems eliminate these conditions. But they do so at a price: they require a more subtle concept of singularity! There are various possible ways to define this concept. They’re all a bit tricky, because a singularity is not a point or region in spacetime.

    For our present purposes, we can define a singularity to be an ‘incomplete timelike or null geodesic’. As already explained, a timelike geodesic is the kind of path traced out by a test particle moving slower than light. Similarly, a null geodesic is the kind of path traced out by a test particle moving at the speed of light. We say a geodesic is ‘incomplete’ if it ceases to be well-defined after a finite amount of time. For example, general relativity says a test particle falling into a black hole follows an incomplete geodesic. In a rough-and-ready way, people say the particle ‘hits the singularity’. But the singularity is not a place in spacetime. What we really mean is that the particle’s path becomes undefined after a finite amount of time.

    We need to be a bit careful about what we mean by ‘time’ here. For test particles moving slower than light this is easy, since we can parametrize a timelike geodesic by proper time. However, the tangent vector v = \gamma'(s) of a null geodesic has g(v,v) = 0, so a particle moving along a null geodesic does not experience any passage of proper time. Still, any geodesic, even a null one, has a family of preferred parametrizations. These differ only by changes of variable like this: s \mapsto as + b. By ‘time’ we really mean the variable s in any of these preferred parametrizations. Thus, if our spacetime is some Lorentzian manifold M, we say a geodesic \gamma \colon [s_0, s_1] \to M is incomplete if, parametrized in one of these preferred ways, it cannot be extended to a strictly longer interval.

    The first modern singularity theorem was proved by Penrose in 1965. It says that if space is infinite in extent, and light becomes trapped inside some bounded region, and no exotic matter is present to save the day, either a singularity or something even more bizarre must occur. This theorem applies to collapsing stars. When a star of sufficient mass collapses, general relativity says that its gravity becomes so strong that light becomes trapped inside some bounded region. We can then use Penrose’s theorem to analyze the possibilities.

    Shortly thereafter Hawking proved a second singularity theorem, which applies to the Big Bang. It says that if space is finite in extent, and no exotic matter is present, generically either a singularity or something even more bizarre must occur. The singularity here could be either a Big Bang in the past, a Big Crunch in the future, both—or possibly something else. Hawking also proved a version of his theorem that applies to certain Lorentzian manifolds where space is infinite in extent, as seems to be the case in our Universe. This version requires extra conditions.

    There are some undefined phrases in this summary of the Penrose–Hawking singularity theorems, most notably these:

    • ‘exotic matter’

    • ‘singularity’

    • ‘something even more bizarre’.

    So, let me say a bit about each.

    These singularity theorems precisely specify what is meant by ‘exotic matter’. This is matter for which

    \rho + P_x + P_y + P_z < 0

    at some point, in some coordinate system. By Einstein’s equation, this would make a small ball of freely falling test particles tend to expand. In other words, exotic matter would create a repulsive gravitational field. No matter of this sort has ever been found; the matter we know obeys the so-called ‘dominant energy condition’

    \rho + P_x + P_y + P_z \ge 0
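As a quick sanity check, here is this criterion applied to two familiar kinds of matter. This is a minimal sketch: the densities and pressures are illustrative numbers in an orthonormal frame, not data from any real source.

```python
def is_exotic(rho, Px, Py, Pz):
    """Matter is 'exotic' in the sense above if rho + Px + Py + Pz < 0."""
    return rho + Px + Py + Pz < 0

# Dust: positive energy density, zero pressure.
dust = dict(rho=1.0, Px=0.0, Py=0.0, Pz=0.0)
# Radiation: isotropic pressure equal to one third of the energy density.
radiation = dict(rho=1.0, Px=1/3, Py=1/3, Pz=1/3)

assert not is_exotic(**dust)
assert not is_exotic(**radiation)
```

Both familiar cases satisfy the condition; only matter with sufficiently negative energy density or pressures would fail it.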

    The Penrose–Hawking singularity theorems also say what counts as ‘something even more bizarre’. An example would be a closed timelike curve. A particle following such a path would move slower than light yet eventually reach the same point where it started—and not just the same point in space, but the same point in spacetime! If you could do this, perhaps you could wait, see if it would rain tomorrow, and then go back and decide whether to buy an umbrella today. There are certainly solutions of Einstein’s equation with closed timelike curves. The first interesting one was found by Einstein’s friend Gödel in 1949, as part of an attempt to probe the nature of time. However, closed timelike curves are generally considered less plausible than singularities.

In the Penrose–Hawking singularity theorems, ‘something even more bizarre’ means that spacetime is not ‘globally hyperbolic’. To understand this, we need to think about when we can predict the future or past given initial data. When studying field equations like Maxwell’s theory of electromagnetism or Einstein’s theory of gravity, physicists like to specify initial data on space at a given moment of time. However, in general relativity there is considerable freedom in how we choose a slice of spacetime and call it ‘space’. What should we require? For starters, we want a 3-dimensional submanifold S of spacetime that is ‘spacelike’: every vector v tangent to S should have g(v,v) > 0. However, we also want any timelike or null curve to hit S exactly once. A spacelike surface with this property is called a Cauchy surface, and a Lorentzian manifold containing a Cauchy surface is said to be globally hyperbolic. There are many theorems justifying the importance of this concept. Global hyperbolicity excludes closed timelike curves, but also other bizarre behavior.

    By now the original singularity theorems have been greatly generalized and clarified. Hawking and Penrose gave a unified treatment of both theorems in 1970. The 1973 textbook by Hawking and Ellis gives a systematic introduction to this subject. Hawking gave an elegant informal overview of the key ideas in 1994, and a paper by Garfinkle and Senovilla reviews the subject and its history up to 2015.

    If we accept that general relativity really predicts the existence of singularities in physically realistic situations, the next step is to ask whether they rob general relativity of its predictive power. I’ll talk about that next time!

    Doug NatelsonNanovation podcast

     Michael Filler is a chemical engineering professor at Georgia Tech, developing new and interesting nanomaterials.  He is also the host of the outstanding Nanovation podcast, a very fun and informative approach to public outreach and science communication - much more interesting than blogging :-) .  I was fortunate enough to be a guest on his podcast a couple of weeks ago - here is the link.  It was really enjoyable, and I hope you have a chance to listen, if not to that one, then to some of the other discussions.

    September 22, 2016

    David Hoggdata-driven models of images and stars

    Today was a low-research day! That said, I had two phone conversations of great value. The first was with Andy Casey (Cambridge), about possibly building a fully data-driven model of stars that goes way beyond The Cannon, using the Gaia data as labels, and de-noising the Gaia data themselves. I am trying to conceptualize a project for the upcoming #GaiaSprint.

    I also had a great phone conference with Dun Wang (NYU), Dan Foreman-Mackey (UW), and Bernhard Schölkopf (MPI-IS) about image differencing, or Wang's new version of it, that has been so successful in Kepler data. We talked about the regimes in which it would fail, and vowed to test these in writing the paper. In traditional image differencing, you use the past images to make a reference image, and you use the present image to determine pointing, rotation, and PSF adjustments. In Wang's version, you use the past images to determine regression coefficients, and you use the present image to predict itself, using those regression coefficients. That's odd, but not all that different if you view it from far enough away. We have writing to do!
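A toy numpy sketch of the "use the past to learn coefficients, then predict the present from itself" idea described above. Everything here is synthetic and invented for illustration (the real method works on Kepler pixel time series, not random data): fit regression coefficients on past frames that predict a target pixel from the other pixels, then apply them to a new frame and look at the residual.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 past frames of 50 pixels each. Pixel 0 is the "target";
# its systematics are a fixed linear blend of the other pixels plus noise.
n_frames, n_pix = 100, 50
frames = rng.normal(size=(n_frames, n_pix))
true_coeffs = rng.normal(size=n_pix - 1)
frames[:, 0] = frames[:, 1:] @ true_coeffs + 0.01 * rng.normal(size=n_frames)

# "Past" step: learn regression coefficients for the target pixel
# from all the other pixels, using least squares.
coeffs, *_ = np.linalg.lstsq(frames[:, 1:], frames[:, 0], rcond=None)

# "Present" step: predict the target pixel of a new frame from the rest
# of the same frame; the residual plays the role of a difference image.
new_frame = rng.normal(size=n_pix)
new_frame[0] = new_frame[1:] @ true_coeffs  # no transient in this frame
prediction = new_frame[1:] @ coeffs
residual = new_frame[0] - prediction        # near zero: nothing changed
```

A transient source (a supernova, say) would show up as a residual the regression cannot explain away.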

    David Hoggmeasuring and modeling radial velocities

    Dan Foreman-Mackey (UW) appeared for a few days in New York City. I had various conversations with him, including one in which I sanity-checked my data-driven model for radial velocities. He was suspicious that I can take the first-order (linear) approximation on the velocities. I said that they are a thousandth of a pixel! He still was suspicious. I also discussed with him the point of—and the mathematical basis underlying—the project we have with Adrian Price-Whelan (Princeton) on inferring companion orbits from stellar radial-velocity data. He agrees with me that we have a point in doing this project despite its unbelievably limited scope! Remotely, I worked a bit more on the wide-separation binaries in Gaia DR1 with Price-Whelan.

    John PreskillTripping over my own inner product

    A scrape stood out on the back of my left hand. The scrape had turned greenish-purple, I noticed while opening the lecture-hall door. I’d jounced the hand against my dining-room table while standing up after breakfast. The table’s corners form ninety-degree angles. The backs of hands do not.

    Earlier, when presenting a seminar, I’d forgotten to reference papers by colleagues. Earlier, I’d offended an old friend without knowing how. Some people put their feet in their mouths. I felt liable to swallow a clog.

The lecture was for Ph 219: Quantum Computation. I was TAing (working as a teaching assistant for) the course. John Preskill was discussing quantum error correction.

    Computers suffer from errors as humans do: Imagine setting a hard drive on a table. Coffee might spill on the table (as it probably would have if I’d been holding a mug near the table that week). If the table is in my California dining room, an earthquake might judder the table. Juddering bangs the hard drive against the wood, breaking molecular bonds and deforming the hardware. The information stored in computers degrades.

    How can we protect information? By encoding it—by translating the message into a longer, encrypted message. An earthquake might judder the encoded message. We can reverse some of the damage by error-correcting.

    Different types of math describe different codes. John introduced a type of math called symplectic vector spaces. “Symplectic vector space” sounds to me like a garden of spiny cacti (on which I’d probably have pricked fingers that week). Symplectic vector spaces help us translate between the original and encoded messages.


    Symplectic vector space?

    Say that an earthquake has juddered our hard drive. We want to assess how the earthquake corrupted the encoded message and to error-correct. Our encryption scheme dictates which operations we should perform. Each possible operation, we represent with a mathematical object called a vector. A vector can take the form of a list of numbers.

    We construct the code’s vectors like so. Say that our quantum hard drive consists of seven phosphorus nuclei atop a strip of silicon. Each nucleus has two observables, or measurable properties. Let’s call the observables Z and X.

    Suppose that we should measure the first nucleus’s Z. The first number in our symplectic vector is 1. If we shouldn’t measure the first nucleus’s Z, the first number is 0. If we should measure the second nucleus’s Z, the second number is 1; if not, 0; and so on for the other nuclei. We’ve assembled the first seven numbers in our vector. The final seven numbers dictate which nuclei’s Xs we measure. An example vector looks like this: ( 1, \, 0, \, 1, \, 0, \, 1, \, 0, \, 1 \; | \; 0, \, 0, \, 0, \, 0, \, 0, \, 0, \, 0 ).

    The vector dictates that we measure four Zs and no Xs.


    Symplectic vectors represent the operations we should perform to correct errors.

A vector space is a collection of vectors. Many problems—not only codes—involve vector spaces. Have you used Google Maps? Google illustrates the step that you should take next with an arrow. We can represent that arrow with a vector. A vector, recall, can take the form of a list of numbers. The step’s list of two numbers1 indicates whether you should walk ( \text{Northward or not} \; | \; \text{Westward or not} ).


    I’d forgotten about my scrape by this point in the lecture. John’s next point wiped even cacti from my mind.

    Say you want to know how similar two vectors are. You usually calculate an inner product. A vector v tends to have a large inner product with any vector w that points parallel to v.


    Parallel vectors tend to have a large inner product.

    The vector v tends to have an inner product of zero with any vector w that points perpendicularly. Such v and w are said to annihilate each other. By the end of a three-hour marathon of a research conversation, we might say that v and w “destroy” each other. v is orthogonal to w.


    Two orthogonal vectors, having an inner product of zero, annihilate each other.

    You might expect a vector v to have a huge inner product with itself, since v points parallel to v. Quantum-code vectors defy expectations. In a symplectic vector space, John said, “you can be orthogonal to yourself.”

    A symplectic vector2 can annihilate itself, destroy itself, stand in its own way. A vector can oppose itself, contradict itself, trip over its own feet. I felt like I was tripping over my feet that week. But I’m human. A vector is a mathematical ideal. If a mathematical ideal could be orthogonal to itself, I could allow myself space to err.
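Here is a minimal sketch of how this self-annihilation works, assuming the standard mod-2 symplectic form on bit strings written as (Z-part | X-part): the product of (z|x) with (z'|x') is z·x' + x·z' mod 2, so every vector annihilates itself, because z·x + x·z = 2(z·x) ≡ 0 mod 2.

```python
def symplectic_product(u, v):
    """Mod-2 symplectic form on bit strings written as (z-part | x-part)."""
    n = len(u) // 2
    uz, ux = u[:n], u[n:]
    vz, vx = v[:n], v[n:]
    return (sum(a * b for a, b in zip(uz, vx))
            + sum(a * b for a, b in zip(ux, vz))) % 2

# The example vector from the text: measure Z on nuclei 1, 3, 5, 7.
v = (1, 0, 1, 0, 1, 0, 1,  0, 0, 0, 0, 0, 0, 0)
# A vector that measures both Z and X on nucleus 1.
w = (1, 0, 0, 0, 0, 0, 0,  1, 0, 0, 0, 0, 0, 0)

print(symplectic_product(v, v))  # 0: v annihilates itself
print(symplectic_product(w, w))  # 0: so does w, despite its mixed Z and X parts
print(symplectic_product(v, w))  # 1: but v and w do not annihilate each other
```

Tripping over your own inner product, in six lines of arithmetic.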


    Tripping over my own inner product.

    Lloyd Alexander wrote one of my favorite books, the children’s novel The Book of Three. The novel features a stout old farmer called Coll. Coll admonishes an apprentice who’s burned his fingers: “See much, study much, suffer much.” We smart while growing smarter.

    An ant-sized scar remains on the back of my left hand. The scar has been fading, or so I like to believe. I embed references to colleagues’ work in seminar Powerpoints, so that I don’t forget to cite anyone. I apologized to the friend, and I know about symplectic vector spaces. We all deserve space to err, provided that we correct ourselves. Here’s to standing up more carefully after breakfast.


    1Not that I advocate for limiting each coordinate to one bit in a Google Maps vector. The two-bit assumption simplifies the example.

2Symplectic vectors aren’t the only vectors orthogonal to themselves, John pointed out. Consider a string of bits that contains an even number of ones. Examples include (0, 0, 0, 0, 1, 1). Each such string has a bit-wise inner product, over the field {\mathbb Z}_2, of zero with itself.

    John BaezStruggles with the Continuum (Part 6)

    Last time I sketched how physicists use quantum electrodynamics, or ‘QED’, to compute answers to physics problems as power series in the fine structure constant, which is

    \displaystyle{ \alpha = \frac{1}{4 \pi \epsilon_0} \frac{e^2}{\hbar c} \approx \frac{1}{137.036} }

    I concluded with a famous example: the magnetic moment of the electron. With a truly heroic computation, physicists have used QED to compute this quantity up to order \alpha^5. If we also take other Standard Model effects into account we get agreement to roughly one part in 10^{12}.
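The leading term of that series is Schwinger’s famous \alpha/2\pi correction, which already lands remarkably close to the measured anomaly. A back-of-the-envelope check (the value of \alpha and the quoted measurement are approximate):

```python
import math

alpha = 1 / 137.036                  # fine structure constant (approximate)
a_e_leading = alpha / (2 * math.pi)  # Schwinger's leading-order term

print(a_e_leading)  # ≈ 0.001161, vs. the measured anomaly ≈ 0.00115965
```

One term of the series already agrees with experiment to about 0.15%; the heroic higher-order calculations close most of the remaining gap.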

    However, if we continue adding up terms in this power series, there is no guarantee that the answer converges. Indeed, in 1952 Freeman Dyson gave a heuristic argument that makes physicists expect that the series diverges, along with most other power series in QED!

    The argument goes as follows. If these power series converged for small positive \alpha, they would have a nonzero radius of convergence, so they would also converge for small negative \alpha. Thus, QED would make sense for small negative values of \alpha, which correspond to imaginary values of the electron’s charge. If the electron had an imaginary charge, electrons would attract each other electrostatically, since the usual repulsive force between them is proportional to e^2. Thus, if the power series converged, we would have a theory like QED for electrons that attract rather than repel each other.

However, there is a good reason to believe that QED cannot make sense for electrons that attract. The reason is that it describes a world where the vacuum is unstable. That is, there would be states with arbitrarily large negative energy containing many electrons and positrons. Thus, we expect that the vacuum could spontaneously turn into electrons and positrons together with photons (to conserve energy). Of course, this is not a rigorous proof that the power series in QED diverge: just an argument that it would be strange if they did not.

    To see why electrons that attract could have arbitrarily large negative energy, consider a state \psi with a large number N of such electrons inside a ball of radius R. We require that these electrons have small momenta, so that nonrelativistic quantum mechanics gives a good approximation to the situation. Since its momentum is small, the kinetic energy of each electron is a small fraction of its rest energy m_e c^2. If we let \langle \psi, E \psi\rangle be the expected value of the total rest energy and kinetic energy of all the electrons, it follows that \langle \psi, E\psi \rangle is approximately proportional to N.

    The Pauli exclusion principle puts a limit on how many electrons with momentum below some bound can fit inside a ball of radius R. This number is asymptotically proportional to the volume of the ball. Thus, we can assume N is approximately proportional to R^3. It follows that \langle \psi, E \psi \rangle is approximately proportional to R^3.

There is also the negative potential energy to consider. Let V be the operator for potential energy. Since we have N electrons attracted by a 1/r potential, and each pair contributes to the potential energy, we see that \langle \psi , V \psi \rangle is approximately proportional to -N^2 R^{-1}, or -R^5. Since R^5 grows faster than R^3, we can make the expected energy \langle \psi, (E + V) \psi \rangle arbitrarily large and negative as N,R \to \infty.
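The scaling argument can be checked with a toy computation: take the rest-plus-kinetic energy to grow like R^3 and the potential energy like -R^5, with arbitrary constants of proportionality, and watch the total turn over and plunge without bound.

```python
# Toy version of the scaling argument. The constants are arbitrary;
# only the powers of R matter for the conclusion.
def total_energy(R, c_kin=1.0, c_pot=0.1):
    return c_kin * R**3 - c_pot * R**5

energies = [total_energy(R) for R in range(1, 20)]
# Beyond the turnover, the energy decreases without bound.
assert energies[-1] < energies[0]
assert min(energies) < -1e5
```

However small the coefficient of the R^5 term, it eventually dominates.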

Note the interesting contrast between this result and some previous ones we have seen. In Newtonian mechanics, the energy of particles attracting each other with a 1/r potential is unbounded below. In quantum mechanics, thanks to the uncertainty principle, the energy is bounded below for any fixed number of particles. However, quantum field theory allows for the creation of particles, and this changes everything! Dyson’s disaster arises because the vacuum can turn into a state with arbitrarily large numbers of electrons and positrons. This disaster only occurs in an imaginary world where \alpha is negative—but it may be enough to prevent the power series in QED from having a nonzero radius of convergence.

    We are left with a puzzle: how can perturbative QED work so well in practice, if the power series in QED diverge?

    Much is known about this puzzle. There is an extensive theory of ‘Borel summation’, which allows one to extract well-defined answers from certain divergent power series. For example, consider a particle of mass m on a line in a potential

    V(x) = x^2 + \beta x^4

    When \beta \ge 0 this potential is bounded below, but when \beta < 0 it is not: classically, it describes a particle that can shoot to infinity in a finite time. Let H = K + V be the quantum Hamiltonian for this particle, where K is the usual operator for the kinetic energy and V is the operator for potential energy. When \beta \ge 0, the Hamiltonian H is essentially self-adjoint on the set of smooth wavefunctions that vanish outside a bounded interval. This means that the theory makes sense. Moreover, in this case H has a ‘ground state’: a state \psi whose expected energy \langle \psi, H \psi \rangle is as low as possible. Call this expected energy E(\beta). One can show that E(\beta) depends smoothly on \beta for \beta \ge 0, and one can write down a Taylor series for E(\beta).
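One can estimate E(\beta) for \beta \ge 0 with a rough numerical sketch: discretize H = -\frac{d^2}{dx^2} + x^2 + \beta x^4 (units in which the \beta = 0 ground energy is exactly 1) on a finite grid and diagonalize. The box length and grid size below are arbitrary choices, good enough for a few digits:

```python
import numpy as np

def ground_energy(beta, L=6.0, n=1200):
    """Ground-state energy of H = -d^2/dx^2 + x^2 + beta*x^4,
    estimated by finite differences on the interval [-L, L]."""
    x = np.linspace(-L, L, n)
    dx = x[1] - x[0]
    # Kinetic term: second-difference Laplacian; potential on the diagonal.
    H = (np.diag(np.full(n, 2.0)) - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / dx**2
    H += np.diag(x**2 + beta * x**4)
    return np.linalg.eigvalsh(H)[0]   # eigenvalues come back sorted

print(ground_energy(0.0))   # ≈ 1, the harmonic oscillator value
print(ground_energy(0.1))   # slightly above 1: E(beta) grows with beta
```

Numerics like this are well-behaved for \beta \ge 0; the trouble described next is what happens when \beta turns negative.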

    On the other hand, when \beta < 0 the Hamiltonian H is not essentially self-adjoint. This means that the quantum mechanics of a particle in this potential is ill-behaved when \beta < 0. Heuristically speaking, the problem is that such a particle could tunnel through the barrier given by the local maxima of V(x) and shoot off to infinity in a finite time.

    This situation is similar to Dyson’s disaster, since we have a theory that is well-behaved for \beta \ge 0 and ill-behaved for \beta < 0. As before, the bad behavior seems to arise from our ability to convert an infinite amount of potential energy into other forms of energy. However, in this simpler situation one can prove that the Taylor series for E(\beta) does not converge. Barry Simon did this around 1969. Moreover, one can prove that Borel summation, applied to this Taylor series, gives the correct value of E(\beta) for \beta \ge 0. The same is known to be true for certain quantum field theories. Analyzing these examples, one can see why summing the first few terms of a power series can give a good approximation to the correct answer even though the series diverges. The terms in the series get smaller and smaller for a while, but eventually they become huge.

    Unfortunately, nobody has been able to carry out this kind of analysis for quantum electrodynamics. In fact, the current conventional wisdom is that this theory is inconsistent, due to problems at very short distance scales. In our discussion so far, we summed over Feynman diagrams with \le n vertices to get the first n terms of power series for answers to physical questions. However, one can also sum over all diagrams with \le n loops. This more sophisticated approach to renormalization, which sums over infinitely many diagrams, may dig a bit deeper into the problems faced by quantum field theories.

If we use this alternate approach for QED we find something surprising. Recall that in renormalization we impose a momentum cutoff \Lambda, essentially ignoring waves of wavelength less than \hbar/\Lambda, and use this to work out a relation between the electron’s bare charge e_\mathrm{bare}(\Lambda) and its renormalized charge e_\mathrm{ren}. We try to choose e_\mathrm{bare}(\Lambda) so that e_\mathrm{ren} equals the electron’s experimentally observed charge e. If we sum over Feynman diagrams with \le n vertices this is always possible. But if we sum over Feynman diagrams with at most one loop, it ceases to be possible when \Lambda reaches a certain very large value, namely

    \displaystyle{  \Lambda \; = \; \exp\left(\frac{3 \pi}{2 \alpha} + \frac{5}{6}\right) m_e c \; \approx \; e^{647} m_e c}

    According to this one-loop calculation, the electron’s bare charge becomes infinite at this point! This value of \Lambda is known as a ‘Landau pole’, since it was first noticed in about 1954 by Lev Landau and his colleagues.

    What is the meaning of the Landau pole? We said that poetically speaking, the bare charge of the electron is the charge we would see if we could strip off the electron’s virtual particle cloud. A somewhat more precise statement is that e_\mathrm{bare}(\Lambda) is the charge we would see if we collided two electrons head-on with a momentum on the order of \Lambda. In this collision, there is a good chance that the electrons would come within a distance of \hbar/\Lambda from each other. The larger \Lambda is, the smaller this distance is, and the more we penetrate past the effects of the virtual particle cloud, whose polarization ‘shields’ the electron’s charge. Thus, the larger \Lambda is, the larger e_\mathrm{bare}(\Lambda) becomes.

    So far, all this makes good sense: physicists have done experiments to actually measure this effect. The problem is that according to a one-loop calculation, e_\mathrm{bare}(\Lambda) becomes infinite when \Lambda reaches a certain huge value.

    Of course, summing only over diagrams with at most one loop is not definitive. Physicists have repeated the calculation summing over diagrams with \le 2 loops, and again found a Landau pole. But again, this is not definitive. Nobody knows what will happen as we consider diagrams with more and more loops. Moreover, the distance \hbar/\Lambda corresponding to the Landau pole is absurdly small! For the one-loop calculation quoted above, this distance is about

    \displaystyle{  e^{-647} \frac{\hbar}{m_e c} \; \approx \; 6 \cdot 10^{-294}\, \mathrm{meters} }

    This is hundreds of orders of magnitude smaller than the length scales physicists have explored so far. Currently the Large Hadron Collider can probe energies up to about 10 TeV, and thus distances down to about 2 \cdot 10^{-20} meters, or about 0.00002 times the radius of a proton. Quantum field theory seems to be holding up very well so far, but no reasonable physicist would be willing to extrapolate this success down to 6 \cdot 10^{-294} meters, and few seem upset at problems that manifest themselves only at such a short distance scale.
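These one-loop numbers are easy to reproduce with plain floating point. The values of \alpha and of the electron’s reduced Compton wavelength below are approximate:

```python
import math

alpha = 1 / 137.036
hbar_over_me_c = 3.8616e-13   # reduced Compton wavelength of the electron, meters

# One-loop Landau pole: Lambda = exp(3*pi/(2*alpha) + 5/6) * m_e * c.
exponent = 3 * math.pi / (2 * alpha) + 5 / 6
print(exponent)               # ≈ 646.6, the "e^647" quoted in the text

# The corresponding distance scale hbar / Lambda, in meters.
landau_length = hbar_over_me_c * math.exp(-exponent)
print(landau_length)          # ≈ 6e-294 meters
```

Conveniently, 10^{-294} still fits inside a double-precision float, so no special care is needed.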

Indeed, attitudes on renormalization have changed significantly since 1948, when Feynman, Schwinger and Tomonaga developed it for QED. At first it seemed a bit like a trick. Later, as the success of renormalization became ever more thoroughly confirmed, it became accepted. However, some of the most thoughtful physicists remained worried. In 1975, Dirac said:

    Most physicists are very satisfied with the situation. They say: ‘Quantum electrodynamics is a good theory and we do not have to worry about it any more.’ I must say that I am very dissatisfied with the situation, because this so-called ‘good theory’ does involve neglecting infinities which appear in its equations, neglecting them in an arbitrary way. This is just not sensible mathematics. Sensible mathematics involves neglecting a quantity when it is small—not neglecting it just because it is infinitely great and you do not want it!

    As late as 1985, Feynman wrote:

    The shell game that we play [. . .] is technically called ‘renormalization’. But no matter how clever the word, it is still what I would call a dippy process! Having to resort to such hocus-pocus has prevented us from proving that the theory of quantum electrodynamics is mathematically self-consistent. It’s surprising that the theory still hasn’t been proved self-consistent one way or the other by now; I suspect that renormalization is not mathematically legitimate.

    By now renormalization is thoroughly accepted among physicists. The key move was a change of attitude emphasized by Kenneth Wilson in the 1970s. Instead of treating quantum field theory as the correct description of physics at arbitrarily large energy-momenta, we can assume it is only an approximation. For renormalizable theories, one can argue that even if quantum field theory is inaccurate at large energy-momenta, the corrections become negligible at smaller, experimentally accessible energy-momenta. If so, instead of seeking to take the \Lambda \to \infty limit, we can use renormalization to relate bare quantities at some large but finite value of \Lambda to experimentally observed quantities.

From this practical-minded viewpoint, the possibility of a Landau pole in QED is less important than the behavior of the Standard Model. Physicists believe that the Standard Model would suffer from a Landau pole at momenta low enough to cause serious problems if the Higgs boson were considerably more massive than it actually is. Thus, they were relieved when the Higgs was discovered at the Large Hadron Collider with a mass of about 125 GeV/c^2. However, the Standard Model may still suffer from a Landau pole at high momenta, as well as an instability of the vacuum.

Regardless of practicalities, for the mathematical physicist, the question of whether QED and the Standard Model can be made into well-defined mathematical structures that obey the axioms of quantum field theory remains an open problem of great interest. Most physicists believe that this can be done for pure Yang–Mills theory, but actually proving this is the first step towards winning $1,000,000 from the Clay Mathematics Institute.

    ParticlebitesDaya Bay and the search for sterile neutrinos

    Article: Improved search for a light sterile neutrino with the full configuration of the Daya Bay Experiment
    Authors: Daya Bay Collaboration
    Reference: arXiv:1607.01174

    Today I bring you news from the Daya Bay reactor neutrino experiment, which detects neutrinos emitted by three nuclear power plants on the southern coast of China. The results in this paper are based on the first 621 days of data, through November 2013; more data remain to be analyzed, and we can expect a final result after the experiment ends in 2017.


    Figure 1: Antineutrino detectors installed in the far hall of the Daya Bay experiment. Source: LBL news release.

    For more on sterile neutrinos, see also this recent post by Eve.

    Neutrino oscillations

    Neutrinos exist in three flavors, each corresponding to one of the charged leptons: electron neutrinos (\nu_e), muon neutrinos (\nu_\mu) and tau neutrinos (\nu_\tau). When a neutrino is born via the weak interaction, it is created in a particular flavor eigenstate. So, for example, a neutrino born in the sun is always an electron neutrino. However, the electron neutrino does not have a definite mass. Instead, each flavor eigenstate is a linear combination of the three mass eigenstates. This “mixing” of the flavor and mass eigenstates is described by the PMNS matrix, as shown in Figure 2.

    Figure 2: Each neutrino flavor eigenstate is a linear combination of the three mass eigenstates.

The PMNS matrix can be parameterized by 4 numbers: three mixing angles (\theta_{12}, \theta_{23} and \theta_{13}) and a phase (\delta).1  These parameters aren’t known a priori — they must be measured by experiments.

    Solar neutrinos stream outward in all directions from their birthplace in the sun. Some intercept Earth, where human-built neutrino observatories can inventory their flavors. After traveling 150 million kilometers, only ⅓ of them register as electron neutrinos — the other ⅔ have transformed along the way into muon or tau neutrinos. These neutrino flavor oscillations are the experimental signature of neutrino mixing, and the means by which we can tease out the values of the PMNS parameters. In any specific situation, the probability of measuring each type of neutrino  is described by some experiment-specific parameters (the neutrino energy, distance from the source, and initial neutrino flavor) and some fundamental parameters of the theory (the PMNS mixing parameters and the neutrino mass-squared differences). By doing a variety of measurements with different neutrino sources and different source-to-detector (“baseline”) distances, we can attempt to constrain or measure the individual theory parameters. This has been a major focus of the worldwide experimental neutrino program for the past 15 years.

    1 This assumes the neutrino is a Dirac particle. If the neutrino is a Majorana particle, there are two more phases, for a total of 6 parameters in the PMNS matrix.
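In the simplified two-flavor case, the survival probability has a standard closed form: P = 1 - \sin^2(2\theta)\,\sin^2(1.27\,\Delta m^2 L/E), with \Delta m^2 in eV², L in km and E in GeV; the full three-flavor case replaces the single angle with the PMNS matrix. The numbers plugged in below are illustrative reactor-like values, not Daya Bay’s measured parameters:

```python
import math

def survival_probability(theta, dm2_ev2, L_km, E_GeV):
    """Two-flavor survival probability:
    P = 1 - sin^2(2*theta) * sin^2(1.27 * dm2[eV^2] * L[km] / E[GeV])."""
    return 1 - math.sin(2 * theta)**2 * math.sin(1.27 * dm2_ev2 * L_km / E_GeV)**2

# Illustrative numbers: a theta_13-like angle, a ~2 km baseline,
# and ~4 MeV reactor antineutrinos.
p = survival_probability(theta=0.15, dm2_ev2=2.5e-3, L_km=2.0, E_GeV=0.004)
print(p)   # a bit below 1: a modest fraction of the antineutrinos disappear
```

The far-hall baseline of 1.5 – 1.9 km mentioned below is chosen precisely so that the second sine factor is near its maximum for reactor antineutrino energies.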

    Sterile neutrinos

    Many neutrino experiments have confirmed our model of neutrino oscillations and the existence of three neutrino flavors. However, some experiments have observed anomalous signals which could be explained by the presence of a fourth neutrino. This proposed “sterile” neutrino doesn’t have a charged lepton partner (and therefore doesn’t participate in weak interactions) but does mix with the other neutrino flavors.

    The discovery of a new type of particle would be tremendously exciting, and neutrino experiments all over the world (including Daya Bay) have been checking their data for any sign of sterile neutrinos.

    Neutrinos from reactors

    Figure 3: Chart of the nuclides, color-coded by decay mode. Source: modified from Wikimedia Commons.

    Nuclear reactors are a powerful source of electron antineutrinos. To see why, take a look at this zoomed out version of the chart of the nuclides. The chart of the nuclides is a nuclear physicist’s version of the periodic table. For a chemist, Hydrogen-1 (a single proton), Hydrogen-2 (one proton and one neutron) and Hydrogen-3 (one proton and two neutrons) are essentially the same thing, because chemical bonds are electromagnetic and every hydrogen nucleus has the same electric charge. In the realm of nuclear physics, however, the number of neutrons is just as important as the number of protons. Thus, while the periodic table has a single box for each chemical element, the chart of the nuclides has a separate entry for every combination of protons and neutrons (“nuclide”) that has ever been observed in nature or created in a laboratory.

    The black squares are stable nuclei. You can see that stability only occurs when the ratio of neutrons to protons is just right. Furthermore, unstable nuclides tend to decay in such a way that the daughter nuclide is closer to the line of stability than the parent.

    Nuclear power plants generate electricity by harnessing the energy released by the fission of Uranium-235. Each U-235 nucleus contains 143 neutrons and 92 protons (n/p ≈ 1.6). When U-235 undergoes fission, the resulting fragments also have n/p ~ 1.6, because the overall number of neutrons and protons is still the same. Thus, fission products tend to lie along the white dashed line in Figure 3, which falls above the line of stability. These nuclides have too many neutrons to be stable, and therefore undergo beta decay: n \to p + e + \bar{\nu}_e. A typical power reactor emits 6 × 10^20 \bar{\nu}_e per second.
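That quoted rate is easy to sanity-check with round numbers (both assumed here, not taken from the text): roughly 200 MeV is released per fission, and the neutron-rich fragments beta-decay about six times per fission chain on average:

```python
# Back-of-envelope estimate of antineutrinos per second from one reactor core.
# Assumed round numbers: ~3 GW of thermal power, ~200 MeV released per fission,
# ~6 anti-nu_e per fission (each fragment beta-decays roughly 3 times).
thermal_power_W = 3e9
energy_per_fission_J = 200e6 * 1.602e-19  # 200 MeV in joules
fissions_per_s = thermal_power_W / energy_per_fission_J
nu_per_s = 6 * fissions_per_s
print(f"{nu_per_s:.1e} antineutrinos per second")  # ~6e20, matching the text
```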

    Figure 4: Layout of the Daya Bay experiment. Source: arXiv:1508.03943.

    The Daya Bay experiment

    The Daya Bay nuclear power complex is located on the southern coast of China, 55 km northeast of Hong Kong. With six reactor cores, it is one of the most powerful reactor complexes in the world — and therefore an excellent source of electron antineutrinos. The Daya Bay experiment consists of 8 identical antineutrino detectors in 3 underground halls. One experimental hall is located as close as possible to the Daya Bay nuclear power plant; the second is near the two Ling Ao power plants; the third is located 1.5 – 1.9 km away from all three pairs of reactors, a distance chosen to optimize Daya Bay’s sensitivity to the mixing angle \theta_{13}.

    The neutrino target at the heart of each detector is a cylindrical vessel filled with 20 tons of Gadolinium-doped liquid scintillator. The vast majority of \bar{\nu}_e pass through undetected, but occasionally one will undergo inverse beta decay in the target volume, interacting with a proton to produce a positron and a neutron: \bar{\nu}_e + p \to e^+ + n.

    Figure 5: Design of the Daya Bay \bar{\nu}_e detectors. Each detector consists of three nested cylindrical vessels. The inner acrylic vessel is about 3 meters tall and 3 meters in diameter. It contains 20 tons of Gadolinium-doped liquid scintillator; when a \bar{\nu}_e interacts in this volume, the resulting signal can be picked up by the detector. The outer acrylic vessel holds an additional 22 tons of liquid scintillator; this layer exists so that \bar{\nu}_e interactions near the edge of the inner volume are still surrounded by scintillator on all sides — otherwise, some of the gamma rays produced in the event might escape undetected. The stainless steel outer vessel is filled with 40 tons of mineral oil; its purpose is to prevent outside radiation from reaching the scintillator. Finally, the outer vessel is lined with 192 photomultiplier tubes, which collect the scintillation light produced by particle interactions in the active scintillation volumes. The whole device is underwater for additional shielding. Source: arXiv:1508.03943.

    Figure 6: Cartoon version of the signal produced in the Daya Bay detectors by inverse beta decay. The size of the prompt pulse is related to the antineutrino energy; the delayed pulse has a characteristic energy of 8 MeV.

    The positron and neutron create signals in the detector with a characteristic time relationship, as shown in Figure 6. The positron immediately deposits its energy in the scintillator and then annihilates with an electron. This all happens within a few nanoseconds and causes a prompt flash of scintillation light. The neutron, meanwhile, spends some tens of microseconds bouncing around (“thermalizing”) until it is slow enough to be captured by a Gadolinium nucleus. When this happens, the nucleus emits a cascade of gamma rays, which in turn interact with the scintillator and produce a second flash of light. This combination of prompt and delayed signals is used to identify \bar{\nu}_e interaction events.
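Schematically, this event selection is a delayed-coincidence search. A toy version is sketched below; the hit format, time window, and energy window are invented for illustration and are not Daya Bay's actual cuts:

```python
def find_ibd_candidates(hits, window_us=(1, 200), delayed_mev=(6, 12)):
    """Toy delayed-coincidence search for inverse-beta-decay candidates.

    hits: time-ordered list of (time_us, energy_mev) flashes.
    Pairs a prompt flash with a later ~8 MeV neutron-capture-like flash
    arriving within the coincidence window (times in microseconds)."""
    candidates = []
    for i, (t_prompt, _e_prompt) in enumerate(hits):
        for t_delayed, e_delayed in hits[i + 1:]:
            dt = t_delayed - t_prompt
            if dt > window_us[1]:
                break  # hits are time-ordered; nothing later can match
            if window_us[0] <= dt and delayed_mev[0] <= e_delayed <= delayed_mev[1]:
                candidates.append((t_prompt, t_delayed))
    return candidates

# A 4 MeV prompt flash followed 30 us later by an 8 MeV capture-like flash
# forms a candidate pair; the lone 2 MeV flash much later does not.
print(find_ibd_candidates([(0.0, 4.0), (30.0, 8.0), (500.0, 2.0)]))
```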

    Daya Bay’s search for sterile neutrinos

    Daya Bay is a neutrino disappearance experiment. The electron antineutrinos emitted by the reactors can oscillate into muon or tau antineutrinos as they travel, but the detectors are only sensitive to \bar{\nu}_e, because the antineutrinos have enough energy to produce a positron but not the more massive \mu^+ or \tau^+. Thus, Daya Bay observes neutrino oscillations by measuring fewer \bar{\nu}_e than would be expected otherwise.

    Based on the number of \bar{\nu}_e detected at one of the Daya Bay experimental halls, the usual three-neutrino oscillation theory can predict the number that will be seen at the other two experimental halls (EH). You can see how this plays out in Figure 7. We are looking at the neutrino energy spectrum measured at EH2 and EH3, divided by the prediction computed from the EH1 data. The gray shaded regions mark the one-standard-deviation uncertainty bounds of the predictions. If the black data points deviated significantly from the shaded region, that would be a sign that the three-neutrino oscillation model is not complete, possibly due to the presence of sterile neutrinos. However, in this case, the black data points are statistically consistent with the prediction. In other words, Daya Bay sees no evidence for sterile neutrinos.

    Figure 7: Some results of the Daya Bay sterile neutrino search. Source: arxiv:1607.01174.

    Does that mean sterile neutrinos don’t exist? Not necessarily. For one thing, the effect of a sterile neutrino on the Daya Bay results would depend on the sterile neutrino mass and mixing parameters. The blue and red dashed lines in Figure 7 show the sterile neutrino prediction for two specific choices of \theta_{14} and \Delta m_{41}^2; these two examples look quite different from the three-neutrino prediction and can be ruled out because they don’t match the data. However, there are other parameter choices for which the presence of a sterile neutrino wouldn’t have a discernable effect on the Daya Bay measurements. Thus, Daya Bay can constrain the parameter space, but can’t rule out sterile neutrinos completely. However, as more and more experiments report “no sign of sterile neutrinos here,” it appears less and less likely that they exist.
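For concreteness, the short-baseline disappearance probability in the 3+1 picture is often approximated by adding one extra oscillation term per the new mass splitting. A schematic sketch (an approximation for illustration, not Daya Bay's full fit):

```python
import math

def survival_3plus1(theta13, theta14, dm2_31, dm2_41, L_km, E_GeV):
    """Schematic 3+1 short-baseline survival probability: one oscillation
    term per mass splitting (solar and interference terms neglected).
    Angles in radians, dm2 in eV^2, L in km, E in GeV."""
    d31 = 1.27 * dm2_31 * L_km / E_GeV
    d41 = 1.27 * dm2_41 * L_km / E_GeV
    return (1.0
            - math.sin(2 * theta13) ** 2 * math.sin(d31) ** 2
            - math.sin(2 * theta14) ** 2 * math.sin(d41) ** 2)

# With theta14 = 0 the sterile term vanishes and the usual three-neutrino
# prediction is recovered; a nonzero theta14 adds an extra dip whose position
# depends on dm2_41 -- the kind of shape compared against the data in Figure 7.
```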

    Further Reading

    Doug NatelsonDeborah Jin - gone way too soon.

    As was pointed out by a commenter on my previous post, and mentioned here by ZapperZ, atomic physicist Deborah Jin passed away last week from cancer at 47.   I don't think I ever met Prof. Jin (though she graduated from my alma mater when I was a freshman) face to face, and I'm not by any means an expert in her subdiscipline, but I will do my best to give an overview of some of her scientific legacy.  There is a sad shortage of atomic physics blogs....  I'm sure I'm missing things - please fill in additional information in the comments if you like.

    The advent of optical trapping and laser cooling (relevant Nobel here) transformed atomic physics from what had been a comparatively sleepy specialty, concerned with measuring details of optical transitions and precision spectroscopy (useful for atomic clocks), into a hive of activity, looking at the onset of new states of matter that happen when gases become sufficiently cold and dense that their quantum statistics start to be important.  In a classical noninteracting gas, there are few limits on the constituent molecules - as long as they don't actually try to be in the same place at the same time (think of this as the billiard ball restriction), the molecules can take on whatever spatial locations and momenta that they can reach.  However, if a gas is very cold (low average kinetic energy per molecule) and dense, the quantum properties of the constituents matter - for historical reasons this is called the onset of "degeneracy".  If the constituents are fermions, then the Pauli principle, the same physics that keeps all 79 electrons in an atom of gold from hanging out in the 1s orbital, keeps the constituents apart, and keeps them from all falling into the lowest available energy state.   In contrast, if the constituents are bosons, then a macroscopic fraction of the constituents can fall into the lowest energy state, a process called Bose-Einstein condensation (relevant Nobel here); the condensed state is a single quantum state with a large occupation, and therefore can show exotic properties.

    Prof. Jin's group did landmark work with these systems.  She and her student Brian DeMarco showed that you could actually reach the degenerate limit in a trapped atomic Fermi gas.  A major challenge in this field is trying to avoid 3-body and other collisions that can create states of the atoms that are no longer trapped by the lasers and magnetic fields used to do the confinement, and yet still create systems that are (in their quantum way) dense.  Prof. Jin's group showed that you could actually finesse this issue and pair up fermionic atoms to create trapped, ultracold diatomic molecules.  Moreover, you could then create a Bose-Einstein condensate of molecules (since a pair of fermions can be considered as a composite boson).  In superconductors, we're used to the idea that electrons can form Cooper pairs, which act as composite bosons and form a coherent quantum system, the superconducting state.  However, in superconductors, the Cooper pairs are "large" - the average real-space separation between the electrons that constitute a pair is big compared to the typical separation between particles.  Prof. Jin's work showed that in atomic gases you could span between the limits (BEC of tightly bound molecules on the one hand, vs. condensed state of loosely paired fermions on the other).  More recently, her group had been doing cool work looking at systems good for testing models of magnetism and other more complicated condensed matter phenomena, by using dipolar molecules, and examining very strongly interacting fermions.   Basically, Prof. Jin was an impressively creative, technically skilled, extremely productive physicist, and by all accounts a generous person who was great at mentoring students and postdocs.   She has left a remarkable scientific legacy for someone whose professional career was tragically cut short, and she will be missed.

    September 21, 2016

    Clifford JohnsonSuper Nailed It…

    On the sofa, during a moment while we watched Captain America: Civil War over the weekend:

    Amy: Wait, what...? Why's Cat-Woman in this movie?
    Me: Er... (hesitating, not wanting to spoil what is to come...)
    Amy: Isn't she a DC character?
    Me: Well... (still hesitating, but secretly impressed by her awareness of the different universes... hadn't realized she was paying attention all these years.)
    Amy: So who's going to show up next? Super-Dude? Bat-Fella? Wonder-Lady? (Now she's really showing off and poking fun.)
    Me: We'll see... (Now choking with laughter on dinner...)

    I often feel bad subjecting my wife to this stuff, but this alone was worth it.

    For those who know the answers and are wondering, I held off on launching into a discussion about the fascinating history of Marvel, representation of people of African descent in superhero comics (and now movies and TV), the [...] Click to continue reading this post

    The post Super Nailed It… appeared first on Asymptotia.

    Chad OrzelTeaching Evaluations and the Problem of Unstated Assumptions

    There’s a piece in Inside Higher Ed today on yet another study showing that student course evaluations don’t correlate with student learning. For a lot of academics, the basic reaction to this is summed up in the Chuck Pearson tweet that sent me to the story: “Haven’t we settled this already?”

    The use of student course evaluations, though, is a perennial argument in academia, not likely to be nailed into a coffin any time soon. It’s also a good example of a hard problem made intractable by a large number of assumptions and constraints that are never clearly spelled out.

    As discussed in faculty lounges and on social media, the basic argument here (over)simplifies to a collection of administrators who like using student course evaluations as a way to measure faculty teaching, and a collection of faculty who hate this practice. If this were just an argument about what is the most accurate way to assess the quality of teaching in the abstract, studies like the one reported in IHE (and numerous past examples) would probably settle the question, but it’s not, because there’s a lot of other stuff going on. And because a lot of the other stuff that’s going on is never clearly stated, a lot of the stuff people wind up saying in the course of this argument is not actually helpful.

    One source of fundamental conflict and miscommunication is over the need for evaluating teaching in the first place. On the faculty side, administrative mandates for some sort of teaching assessment are often derided as brainless corporatism– pointless hoop-jumping that is being pushed on academia by people who want everything to be run like a business. The preference of many faculty in these arguments would be for absolutely no teaching evaluation whatsoever.

    That kind of suggestion, though, gives the people who are responsible for running institutions the howling fantods. Not because they’ve sold their souls to creeping corporatism, but because some kind of evaluation is just basic, common-sense due diligence. You’ve got to do something to keep tabs on what your teaching faculty are doing in the classroom, if nothing else in order to have a response when some helicopter parent calls in and rants about how Professor So-and-So is mistreating their precious little snowflake. Or, God forbid, so you get wind of any truly outrageous misconduct on the part of faculty before it becomes a giant splashy news story that makes you look terrible.

    That helps explain why administrators want some sort of evaluation, but why are the student comment forms so ubiquitous in spite of their flaws? The big advantage that these have is that they’re cheap and easy. You just pass out bubble sheets or direct students to the right URL, and their feedback comes right to you in an easily digestible form.

    And, again, this is something that’s often derided as corporatist penny-pinching, but it’s a very real concern. We know how to do teaching evaluation well– we do it when the stakes are highest— but it’s a very expensive and labor-intensive process. It’s not something that would be practical to do every year for every faculty member, and that’s not just because administrators are cheap– it’s because the level of work required from faculty would be seen as even more of an outrage than continuing to use the bubble-sheet student comment forms.

    And that’s why the studies showing that student comments don’t accurately measure teaching quality don’t get much traction. Everybody knows that it’s a bad measurement of that, but doing a good measurement of that isn’t practical, and also isn’t really the point.

    So, what’s to be done about this?

    On the faculty side, one thing to do is to recognize that there’s a legitimate need for some sort of institutional oversight, and look for practical alternatives that avoid the worst biases of student course comment forms without being unduly burdensome to implement. You’re not going to get a perfect measure of teaching quality, and “do nothing at all” is not an option, but maybe there’s some middle ground that can provide the necessary oversight without quintupling everybody’s workload. Regular classroom observations, say, though you’d need some safeguard against personal conflicts– maybe two different observers, one by the dean/chair or their designee, one by a colleague chosen by the faculty member being evaluated. It’s more work than just passing out forms, but better and fairer evaluation might be worth the effort.

    On the administrative side, more acknowledgement that evaluation is less about assessing faculty “merit” in a meaningful way, and more about assuring some minimum level of quality for the institution as a whole. And student comments have some role to play in this, but it should be acknowledged that these are mostly customer satisfaction surveys, not serious assessments of faculty quality. In which case they shouldn’t be tied to faculty compensation, as is all too often the case– if there must be financial incentives tied to faculty evaluation, they need to be based on better information than that, and the sums involved should be commensurate with the level of effort required to make the system work.

    I don’t really expect any of those to go anywhere, of course, but that’s my $0.02 on this issue. And though it should go without saying, let me emphasize that this is only my opinion as an individual academic. While I fervently hope that my employer agrees with me about the laws of physics, I don’t expect that they share my opinions on academic economics or politics, so don’t hold it against them.

    BackreactionWe understand gravity just fine, thank you.

    Yesterday I came across a Q&A on the website of Discover magazine, titled “The Root of Gravity - Does recent research bring us any closer to understanding it?” Jeff Lepler from Michigan has the following question:
    Q: Are we any closer to understanding the root cause of gravity between objects with mass? Can we use our newly discovered knowledge of the Higgs boson or gravitational waves to perhaps negate mass or create/negate gravity?”
    A person by name Bill Andrews (unknown to me) gives the following answer:
    A: Sorry, Jeff, but scientists still don’t really know why gravity works. In a way, they’ve just barely figured out how it works.”
    The answer continues, but let’s stop right there where the nonsense begins. What’s that even mean scientists don’t know “why” gravity works? And did the Bill person really think he could get away with swapping “why” for a “how” and nobody would notice?

    The purpose of science is to explain observations. We have a theory by name General Relativity that explains literally all data of gravitational effects. Indeed, that General Relativity is so dramatically successful is a great frustration for all those people who would like to revolutionize science a la Einstein. So in which sense, please, do scientists barely know how it works?

    For all we can presently tell gravity is a fundamental force, which means we have no evidence for an underlying theory from which gravity could be derived. Sure, theoretical physicists are investigating whether there is such an underlying theory that would give rise to gravity as well as the other interactions, a “theory of everything”. (Please submit nomenclature complaints to your local language police, not to me.) Would such a theory of everything explain “why” gravity works? No, because that’s not a meaningful scientific question. A theory of everything could potentially explain how gravity can arise from more fundamental principles, similar to the way, say, the ideal gas law arises from statistical properties of many atoms in motion. But that still wouldn’t explain why there should be something like gravity, or anything, in the first place.

    Either way, even if gravity arises within a larger framework like, say, string theory, the effects of what we call gravity today would still come about because energy-densities (and related quantities like pressure and momentum flux and so on) curve space-time, and fields move in that space-time. Just that these quantities might no longer be fundamental. We’ve known for 101 years how this works.

    After a few words on Newtonian gravity, the answer continues:
    “Because the other forces use “force carrier particles” to impart the force onto other particles, for gravity to fit the model, all matter must emit gravitons, which physically embody gravity. Note, however, that gravitons are still theoretical. Trying to reconcile these different interpretations of gravity, and understand its true nature, are among the biggest unsolved problems of physics.”
    Reconciling which different interpretations of gravity? These are all the same “interpretation.” It is correct that we don’t know how to quantize gravity so that the resulting theory remains viable also when gravity becomes strong. It’s also correct that the force-carrying particle associated to the quantization – the graviton – hasn’t been detected. But the question was about gravity, not quantum gravity. Reconciling the graviton with unquantized gravity is straightforward – it’s called perturbative quantum gravity – and exactly the reason most theoretical physicists are convinced the graviton exists. It’s just that this reconciliation breaks down when gravity becomes strong, which means it’s only an approximation.
    “But, alas, what we do know does suggest antigravity is impossible.”
    That’s correct on a superficial level, but it depends on what you mean by antigravity. If you mean by antigravity that you can let any of the matter which surrounds us “fall up,” it’s correct. But there are modifications of general relativity that have effects one can plausibly call anti-gravitational. That’s a longer story though and shall be told another time.

    A sensible answer to this question would have been:
    “Dear Jeff,

    The recent detection of gravitational waves has been another confirmation of Einstein’s theory of General Relativity, which still explains all the gravitational effects that physicists know of. According to General Relativity the root cause of gravity is that all types of energy curve space-time and all matter moves in this curved space-time. Near planets, such as our own, this can be approximated to good accuracy by Newtonian gravity.

    There isn’t presently any observation which suggests that gravity itself emerges from another theory, though it is certainly a speculation that many theoretical physicists have pursued. There thus isn’t any deeper root for gravity because it’s presently part of the foundations of physics. The foundations are the roots of everything else.

    The discovery of the Higgs boson doesn’t tell us anything about the gravitational interaction. The Higgs boson is merely there to make sure particles have mass in addition to energy, but gravity works the same either way. The detection of gravitational waves is exciting because it allows us to learn a lot about the astrophysical sources of these waves. But the waves themselves have proved to be as expected from General Relativity, so from the perspective of fundamental physics they didn’t bring news.

    Within the incredibly well confirmed framework of General Relativity, you cannot negate mass or its gravitational pull.”
    You might also enjoy hearing what Richard Feynman had to say when he was asked a similar question about the origin of the magnetic force.

    This answer really annoyed me because it’s a lost opportunity to explain how well physicists understand the fundamental laws of nature.

    Scott AaronsonThe No-Cloning Theorem and the Human Condition: My After-Dinner Talk at QCRYPT

    The following are the after-dinner remarks that I delivered at QCRYPT’2016, the premier quantum cryptography conference, on Thursday Sep. 15 in Washington DC.  You could compare to my after-dinner remarks at QIP’2006 to see how much I’ve “matured” since then. Thanks so much to Yi-Kai Liu and the other organizers for inviting me and for putting on a really fantastic conference.

    It’s wonderful to be here at QCRYPT among so many friends—this is the first significant conference I’ve attended since I moved from MIT to Texas. I do, however, need to register a complaint with the organizers, which is: why wasn’t I allowed to bring my concealed firearm to the conference? You know, down in Texas, we don’t look too kindly on you academic elitists in Washington DC telling us what to do, who we can and can’t shoot and so forth. Don’t mess with Texas! As you might’ve heard, many of us Texans even support a big, beautiful, physical wall being built along our border with Mexico. Personally, though, I don’t think the wall proposal goes far enough. Forget about illegal immigration and smuggling: I don’t even want Americans and Mexicans to be able to win the CHSH game with probability exceeding 3/4. Do any of you know what kind of wall could prevent that? Maybe a metaphysical wall.
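For anyone who wants to check the 3/4: a brute-force enumeration of deterministic classical CHSH strategies, plus the quantum (Tsirelson) value, in a few lines:

```python
import itertools
import math

# In the CHSH game, Alice gets bit x, Bob gets bit y, and they win iff
# their answers satisfy a XOR b = x AND y. Enumerate every deterministic
# classical strategy: Alice answers (a0, a1) on x = 0, 1; Bob answers (b0, b1).
best = 0.0
for a0, a1, b0, b1 in itertools.product([0, 1], repeat=4):
    wins = sum(((a0, a1)[x] ^ (b0, b1)[y]) == (x & y)
               for x in (0, 1) for y in (0, 1))
    best = max(best, wins / 4)

print(best)                        # 0.75: the classical bound
print(math.cos(math.pi / 8) ** 2)  # ~0.8536: the quantum (Tsirelson) value
```

(Shared randomness doesn't help, since it's just a mixture of deterministic strategies; only entanglement gets you past 3/4.)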

    OK, but that’s not what I wanted to talk about. When Yi-Kai asked me to give an after-dinner talk, I wasn’t sure whether to try to say something actually relevant to quantum cryptography or just make jokes. So I’ll do something in between: I’ll tell you about research directions in quantum cryptography that are also jokes.

    The subject of this talk is a deep theorem that stands as one of the crowning achievements of our field. I refer, of course, to the No-Cloning Theorem. Almost everything we’re talking about at this conference, from QKD onwards, is based in some way on quantum states being unclonable. If you read Stephen Wiesner’s paper from 1968, which founded quantum cryptography, the No-Cloning Theorem already played a central role—although Wiesner didn’t call it that. By the way, here’s my #1 piece of research advice to the students in the audience: if you want to become immortal, just find some fact that everyone already knows and give it a name!
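For reference, the theorem itself is a short linearity argument:

```latex
% Suppose a single unitary U cloned two distinct states |\psi\rangle, |\varphi\rangle:
U(|\psi\rangle \otimes |0\rangle) = |\psi\rangle \otimes |\psi\rangle, \qquad
U(|\varphi\rangle \otimes |0\rangle) = |\varphi\rangle \otimes |\varphi\rangle.
% Taking the inner product of the two equations (U preserves inner products):
\langle \varphi | \psi \rangle = \langle \varphi | \psi \rangle^2 ,
% so \langle \varphi | \psi \rangle \in \{0, 1\}: one device can clone only
% states that are identical or mutually orthogonal.
```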

    I’d like to pose the question: why should our universe be governed by physical laws that make the No-Cloning Theorem true? I mean, it’s possible that there’s some other reason for our universe to be quantum-mechanical, and No-Cloning is just a byproduct of that. No-Cloning would then be like the armpit of quantum mechanics: not there because it does anything useful, but just because there’s gotta be something under your arms.

    OK, but No-Cloning feels really fundamental. One of my early memories is when I was 5 years old or so, and utterly transfixed by my dad’s home fax machine, one of those crappy 1980s fax machines with wax paper. I kept thinking about it: is it really true that a piece of paper gets transmaterialized, sent through a wire, and reconstituted at the other location? Could I have been that wrong about how the universe works? Until finally I got it—and once you get it, it’s hard even to recapture your original confusion, because it becomes so obvious that the world is made not of stuff but of copyable bits of information. “Information wants to be free!”

    The No-Cloning Theorem represents nothing less than a partial return to the view of the world that I had before I was five. It says that quantum information doesn’t want to be free: it wants to be private. There is, it turns out, a kind of information that’s tied to a particular place, or set of places. It can be moved around, or even teleported, but it can’t be copied the way a fax machine copies bits.

    So I think it’s worth at least entertaining the possibility that we don’t have No-Cloning because of quantum mechanics; we have quantum mechanics because of No-Cloning—or because quantum mechanics is the simplest, most elegant theory that has unclonability as a core principle. But if so, that just pushes the question back to: why should unclonability be a core principle of physics?

    Quantum Key Distribution

    A first suggestion about this question came from Gilles Brassard, who’s here. Years ago, I attended a talk by Gilles in which he speculated that the laws of quantum mechanics are what they are because Quantum Key Distribution (QKD) has to be possible, while bit commitment has to be impossible. If true, that would be awesome for the people at this conference. It would mean that, far from being this exotic competitor to RSA and Diffie-Hellman that’s distance-limited and bandwidth-limited and has a tiny market share right now, QKD would be the entire reason why the universe is as it is! Or maybe what this really amounts to is an appeal to the Anthropic Principle. Like, if QKD hadn’t been possible, then we wouldn’t be here at QCRYPT to talk about it.

    Quantum Money

    But maybe we should search more broadly for the reasons why our laws of physics satisfy a No-Cloning Theorem. Wiesner’s paper sort of hinted at QKD, but the main thing it had was a scheme for unforgeable quantum money. This is one of the most direct uses imaginable for the No-Cloning Theorem: to store economic value in something that it’s physically impossible to copy. So maybe that’s the reason for No-Cloning: because God wanted us to have e-commerce, and didn’t want us to have to bother with blockchains (and certainly not with credit card numbers).

    The central difficulty with quantum money is: how do you authenticate a bill as genuine? (OK, fine, there’s also the difficulty of how to keep a bill coherent in your wallet for more than a microsecond or whatever. But we’ll leave that for the engineers.)

    In Wiesner’s original scheme, he solved the authentication problem by saying that, whenever you want to verify a quantum bill, you bring it back to the bank that printed it. The bank then looks up the bill’s classical serial number in a giant database, which tells the bank in which basis to measure each of the bill’s qubits.

    With this system, you can actually get information-theoretic security against counterfeiting. OK, but the fact that you have to bring a bill to the bank to be verified negates much of the advantage of quantum money in the first place. If you’re going to keep involving a bank, then why not just use a credit card?
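As a toy illustration of why that database check works, here is a purely classical Monte Carlo of the simplest measure-and-resend counterfeiting attack: per qubit the forgery survives with probability 3/4, so a forged n-qubit bill passes with probability about (3/4)^n. (The data layout and numbers are invented for illustration.)

```python
import random

def make_bill(n):
    """The bank's secret page for one bill: a random basis (0 = rectilinear,
    1 = diagonal) and a random bit for each of the n qubits."""
    return [(random.randint(0, 1), random.randint(0, 1)) for _ in range(n)]

def measure_and_resend_passes(secret):
    """Simulate the simplest attack: the counterfeiter measures each qubit
    in a guessed basis and re-prepares what it saw. Returns True if the
    forged copy then passes the bank's verification."""
    for basis, _bit in secret:
        if random.randint(0, 1) != basis:
            # Wrong basis guessed: the re-prepared qubit gives the bank a
            # uniformly random outcome, so it fails half the time.
            if random.random() < 0.5:
                return False
    return True

# Per qubit: 1/2 * 1 + 1/2 * 1/2 = 3/4 survival, hence ~(3/4)**n per bill.
random.seed(0)
n, trials = 4, 20000
rate = sum(measure_and_resend_passes(make_bill(n)) for _ in range(trials)) / trials
print(rate)  # close to 0.75**4 ~ 0.316
```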

    That’s why over the past decade, some of us have been working on public-key quantum money: that is, quantum money that anyone can verify. For this kind of quantum money, it’s easy to see that the No-Cloning Theorem is no longer enough: you also need some cryptographic assumption. But OK, we can consider that. In recent years, we’ve achieved glory by proposing a huge variety of public-key quantum money schemes—and we’ve achieved even greater glory by breaking almost all of them!

    After a while, there were basically two schemes left standing: one based on knot theory by Ed Farhi, Peter Shor, et al. That one has been proven to be secure under the assumption that it can’t be broken. The second scheme, which Paul Christiano and I proposed in 2012, is based on hidden subspaces encoded by multivariate polynomials. For our scheme, Paul and I were able to do better than Farhi et al.: we gave a security reduction. That is, we proved that our quantum money scheme is secure, unless there’s a polynomial-time quantum algorithm to find hidden subspaces encoded by low-degree multivariate polynomials (yadda yadda, you can look up the details) with much greater success probability than we thought possible.

    Today, the situation is that my and Paul’s security proof remains completely valid, but meanwhile, our money is completely insecure! Our reduction means the opposite of what we thought it did. There is a break of our quantum money scheme, and as a consequence, there’s also a quantum algorithm to find large subspaces hidden by low-degree polynomials with much better success probability than we’d thought. What happened was that first, some French algebraic cryptanalysts—Faugere, Pena, I can’t pronounce their names—used Gröbner bases to break the noiseless version of the scheme, in classical polynomial time. So I thought, phew! At least I had acceded when Paul insisted that we also include a noisy version of the scheme. But later, Paul noticed that there’s a quantum reduction from the problem of breaking our noisy scheme to the problem of breaking the noiseless one, so the former is broken as well.

    I’m choosing to spin this positively: “we used quantum money to discover a striking new quantum algorithm for finding subspaces hidden by low-degree polynomials. Err, yes, that’s exactly what we did.”

    But, bottom line, until we manage to invent a better public-key quantum money scheme, or otherwise sort this out, I don’t think we’re entitled to claim that God put unclonability into our universe in order for quantum money to be possible.

    Copy-Protected Quantum Software

    So if not money, then what about its cousin, copy-protected software—could that be why No-Cloning holds? By copy-protected quantum software, I just mean a quantum state that, if you feed it into your quantum computer, lets you evaluate some Boolean function on any input of your choice, but that doesn’t let you efficiently prepare more states that let the same function be evaluated. I think this is important as one of the preeminent evil applications of quantum information. Why should nuclear physicists and genetic engineers get a monopoly on the evil stuff?

    OK, but is copy-protected quantum software even possible? The first worry you might have is that, yeah, maybe it’s possible, but then every time you wanted to run the quantum program, you’d have to make a measurement that destroyed it. So then you’d have to go back and buy a new copy of the program for the next run, and so on. Of course, to the software company, this would presumably be a feature rather than a bug!

    But as it turns out, there’s a fact many of you know—sometimes called the “Gentle Measurement Lemma,” other times the “Almost As Good As New Lemma”—which says that, as long as the outcome of your measurement on a quantum state could be predicted almost with certainty given knowledge of the state, the measurement can be implemented in such a way that it hardly damages the state at all. This tells us that, if quantum money, copy-protected quantum software, and the other things we’re talking about are possible at all, then they can also be made reusable. I summarize the principle as: “if rockets, then space shuttles.”
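As a toy illustration of that lemma (my own minimal two-level sketch, not the general proof): if a measurement outcome has probability 1 − ε on a state, then conditioned on that outcome the post-measurement state still has fidelity 1 − ε with the original, so a near-certain measurement barely disturbs anything.

```python
import math

# Toy qubit state (real amplitudes over basis {|0>, |1>}): outcome "0" is nearly certain.
eps = 1e-4
psi = (math.sqrt(1 - eps), math.sqrt(eps))

# Measure the projector onto |0>. If the outcome is "0" (probability 1 - eps),
# the state collapses to the normalized projection (1, 0).
p_outcome = psi[0] ** 2
post = (1.0, 0.0)

# Fidelity between the original and the post-measurement state:
fidelity = (psi[0] * post[0] + psi[1] * post[1]) ** 2

print(p_outcome, fidelity)  # both equal 1 - eps: a near-certain outcome barely damages the state
```

The same arithmetic is what makes reusable quantum money or software possible: verifying it is a measurement whose outcome you could predict almost with certainty, so the state survives verification almost intact.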

    Much like with quantum money, one can show that, relative to a suitable oracle, it’s possible to quantumly copy-protect any efficiently computable function—or rather, any function that’s hard to learn from its input/output behavior. Indeed, the implementation can be not only copy-protected but also obfuscated, so that the user learns nothing besides the input/output behavior. As Bill Fefferman pointed out in his talk this morning, the No-Cloning Theorem lets us bypass Barak et al.’s famous result on the impossibility of obfuscation, because their impossibility proof assumed the ability to copy the obfuscated program.

    Of course, what we really care about is whether quantum copy-protection is possible in the real world, with no oracle. I was able to give candidate implementations of quantum copy-protection for extremely special functions, like one that just checks the validity of a password. In the general case—that is, for arbitrary programs—Paul Christiano has a beautiful proposal for how to do it, which builds on our hidden-subspace money scheme. Unfortunately, since our money scheme is currently in the shop being repaired, it’s probably premature to think about the security of the much more complicated copy-protection scheme! But these are wonderful open problems, and I encourage any of you to come and scoop us. Once we know whether uncopyable quantum software is possible at all, we could then debate whether it’s the “reason” for our universe to have unclonability as a core principle.

    Unclonable Proofs and Advice

    Along the same lines, I can’t resist mentioning some favorite research directions, which some enterprising student here could totally turn into a talk at next year’s QCRYPT.

    Firstly, what can we say about clonable versus unclonable quantum proofs—that is, QMA witness states? In other words: for which problems in QMA can we ensure that there’s an accepting witness that lets you efficiently create as many additional accepting witnesses as you want? (I mean, besides the QCMA problems, the ones that have short classical witnesses?) For which problems in QMA can we ensure that there’s an accepting witness that doesn’t let you efficiently create any additional accepting witnesses? I do have a few observations about these questions—ask me if you’re interested—but on the whole, I believe almost anything one can ask about them remains open.

    Admittedly, it’s not clear how much use an unclonable proof would be. Like, imagine a quantum state that encoded a proof of the Riemann Hypothesis, and which you would keep in your bedroom, in a glass orb on your nightstand or something. And whenever you felt your doubts about the Riemann Hypothesis resurfacing, you’d take the state out of its orb and measure it again to reassure yourself of RH’s truth. You’d be like, “my preciousssss!” And no one else could copy your state and thereby gain the same Riemann-faith-restoring powers that you had. I dunno, I probably won’t hawk this application in a DARPA grant.

    Similarly, one can ask about clonable versus unclonable quantum advice states—that is, initial states that are given to you to boost your computational power beyond that of an ordinary quantum computer. And that’s also a fascinating open problem.

    OK, but maybe none of this quite gets at why our universe has unclonability. And this is an after-dinner talk, so do you want me to get to the really crazy stuff? Yes?

    Self-Referential Paradoxes

    OK! What if unclonability is our universe’s way around the paradoxes of self-reference, like the unsolvability of the halting problem and Gödel’s Incompleteness Theorem? Allow me to explain what I mean.

    In kindergarten or wherever, we all learn Turing’s proof that there’s no computer program to solve the halting problem. But what isn’t usually stressed is that that proof actually does more than advertised. If someone hands you a program that they claim solves the halting problem, Turing doesn’t merely tell you that that person is wrong—rather, he shows you exactly how to expose the person as a jackass, by constructing an example input on which their program fails. All you do is, you take their claimed halt-decider, modify it in some simple way, and then feed the result back to the halt-decider as input. You thereby create a situation where, if your program halts given its own code as input, then it must run forever, and if it runs forever then it halts. “WHOOOOSH!” [head-exploding gesture]
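Turing's construction can be sketched in a few lines of Python (a toy sketch: `claimed_halts` here is a deliberately bogus stand-in for whatever program someone hands you):

```python
def claimed_halts(prog, arg):
    """A candidate halt-decider someone hands you.
    This one bogusly claims that every program halts on every input."""
    return True

def adversary(prog):
    """Turing's trick: do the opposite of whatever the decider
    predicts about running `prog` on its own code."""
    if claimed_halts(prog, prog):
        while True:     # decider said "halts", so loop forever
            pass
    return "halted"     # decider said "loops", so halt immediately

# Expose the decider by asking it about adversary run on itself.
prediction = claimed_halts(adversary, adversary)
# By construction, adversary(adversary) does the opposite of this prediction,
# so the decider is provably wrong on this explicitly constructed input.
# (We don't actually call adversary(adversary), since here it would loop forever.)
print("decider predicts halt:", prediction)
```

The key step for what follows is the line that feeds `adversary` its own code to `claimed_halts`: classically you can always make that copy of the claimed decider, quantumly you can't.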

    OK, but now imagine that the program someone hands you, which they claim solves the halting problem, is a quantum program. That is, it’s a quantum state, which you measure in some basis depending on the program you’re interested in, in order to decide whether that program halts. Well, the truth is, this quantum program still can’t work to solve the halting problem. After all, there’s some classical program that simulates the quantum one, albeit less efficiently, and we already know that the classical program can’t work.

    But now consider the question: how would you actually produce an example input on which this quantum program failed to solve the halting problem? Like, suppose the program worked on every input you tried. Then ultimately, to produce a counterexample, you might need to follow Turing’s proof and make a copy of the claimed quantum halt-decider. But then, of course, you’d run up against the No-Cloning Theorem!

    So we seem to arrive at the conclusion that, while of course there’s no quantum program to solve the halting problem, there might be a quantum program for which no one could explicitly refute that it solved the halting problem, by giving a counterexample.

    I was pretty excited about this observation for a day or two, until I noticed the following. Let’s suppose your quantum program that allegedly solves the halting problem has n qubits. Then it’s possible to prove that the program can’t possibly be used to compute more than, say, 2n bits of Chaitin’s constant Ω, which is the probability that a random program halts. OK, but if we had an actual oracle for the halting problem, we could use it to compute as many bits of Ω as we wanted. So, suppose I treated my quantum program as if it were an oracle for the halting problem, and I used it to compute the first 2n bits of Ω. Then I would know that, assuming the truth of quantum mechanics, the program must have made a mistake somewhere. There would still be something weird, which is that I wouldn’t know on which input my program had made an error—I would just know that it must’ve erred somewhere! With a bit of cleverness, one can narrow things down to two inputs, such that the quantum halt-decider must have erred on at least one of them. But I don’t know whether it’s possible to go further, and concentrate the wrongness on a single query.

    We can play a similar game with other famous applications of self-reference. For example, suppose we use a quantum state to encode a system of axioms. Then that system of axioms will still be subject to Gödel’s Incompleteness Theorem (which I guess I believe despite the umlaut). If it’s consistent, it won’t be able to prove all the true statements of arithmetic. But we might never be able to produce an explicit example of a true statement that the axioms don’t prove. To do so we’d have to clone the state encoding the axioms and thereby violate No-Cloning.

    Personal Identity

    But since I’m a bit drunk, I should confess that all this stuff about Gödel and self-reference is just a warmup to what I really wanted to talk about, which is whether the No-Cloning Theorem might have anything to do with the mysteries of personal identity and “free will.” I first encountered this idea in Roger Penrose’s book, The Emperor’s New Mind. But I want to stress that I’m not talking here about the possibility that the brain is a quantum computer—much less about the possibility that it’s a quantum-gravitational hypercomputer that uses microtubules to solve the halting problem! I might be drunk, but I’m not that drunk. I also think that the Penrose-Lucas argument, based on Gödel’s Theorem, for why the brain has to work that way is fundamentally flawed.

    But here I’m talking about something different. See, I have a lot of friends in the Singularity / Friendly AI movement. And I talk to them whenever I pass through the Bay Area, which is where they congregate. And many of them express great confidence that before too long—maybe in 20 or 30 years, maybe in 100 years—we’ll be able to upload ourselves to computers and live forever on the Internet (as opposed to just living 70% of our lives on the Internet, like we do today).

    This would have lots of advantages. For example, any time you were about to do something dangerous, you’d just make a backup copy of yourself first. If you were struggling with a conference deadline, you’d spawn 100 temporary copies of yourself. If you wanted to visit Mars or Jupiter, you’d just email yourself there. If Trump became president, you’d not run yourself for 8 years (or maybe 80 or 800 years). And so on.

    Admittedly, some awkward questions arise. For example, let’s say the hardware runs three copies of your code and takes a majority vote, just for error-correcting purposes. Does that bring three copies of you into existence, or only one copy? Or let’s say your code is run homomorphically encrypted, with the only decryption key stored in another galaxy. Does that count? Or you email yourself to Mars. If you want to make sure that you’ll wake up on Mars, is it important that you delete the copy of your code that remains on earth? Does it matter whether anyone runs the code or not? And what exactly counts as “running” it? Or my favorite one: could someone threaten you by saying, “look, I have a copy of your code, and if you don’t do what I say, I’m going to make a thousand copies of it and subject them all to horrible tortures?”

    The issue, in all these cases, is that in a world where there could be millions of copies of your code running on different substrates in different locations—or things where it’s not even clear whether they count as a copy or not—we don’t have a principled way to take as input a description of the state of the universe, and then identify where in the universe you are—or even a probability distribution over places where you could be. And yet you seem to need such a way in order to make predictions and decisions.

    A few years ago, I wrote this gigantic, post-tenure essay called The Ghost in the Quantum Turing Machine, where I tried to make the point that we don’t know at what level of granularity a brain would need to be simulated in order to duplicate someone’s subjective identity. Maybe you’d only need to go down to the level of neurons and synapses. But if you needed to go all the way down to the molecular level, then the No-Cloning Theorem would immediately throw a wrench into most of the paradoxes of personal identity that we discussed earlier.

    For it would mean that there were some microscopic yet essential details about each of us that were fundamentally uncopyable, localized to a particular part of space. We would all, in effect, be quantumly copy-protected software. Each of us would have a core of unpredictability—not merely probabilistic unpredictability, like that of a quantum random number generator, but genuine unpredictability—that an external model of us would fail to capture completely. Of course, by having futuristic nanorobots scan our brains and so forth, it would be possible in principle to make extremely realistic copies of us. But those copies necessarily wouldn’t capture quite everything. And, one can speculate, maybe not enough for your subjective experience to “transfer over.”

    Maybe the most striking aspect of this picture is that sure, you could teleport yourself to Mars—but to do so you’d need to use quantum teleportation, and as we all know, quantum teleportation necessarily destroys the original copy of the teleported state. So we’d avert this metaphysical crisis about what to do with the copy that remained on Earth.

    Look—I don’t know if any of you are like me, and have ever gotten depressed by reflecting that all of your life experiences, all your joys and sorrows and loves and losses, every itch and flick of your finger, could in principle be encoded by a huge but finite string of bits, and therefore by a single positive integer. (Really? No one else gets depressed about that?) It’s kind of like: given that this integer has existed since before there was a universe, and will continue to exist after the universe has degenerated into a thin gruel of radiation, what’s the point of even going through the motions? You know?

    But the No-Cloning Theorem raises the possibility that at least this integer is really your integer. At least it’s something that no one else knows, and no one else could know in principle, even with futuristic brain-scanning technology: you’ll always be able to surprise the world with a new digit. I don’t know if that’s true or not, but if it were true, then it seems like the sort of thing that would be worthy of elevating unclonability to a fundamental principle of the universe.

    So as you enjoy your dinner and dessert at this historic Mayflower Hotel, I ask you to reflect on the following. People can photograph this event, they can video it, they can type up transcripts, in principle they could even record everything that happens down to the millimeter level, and post it on the Internet for posterity. But they’re not gonna get the quantum states. There’s something about this evening, like about every evening, that will vanish forever, so please savor it while it lasts. Thank you.

    Update (Sep. 20): Unbeknownst to me, Marc Kaplan did video the event and put it up on YouTube! Click here to watch. Thanks very much to Marc! I hope you enjoy, even though of course, the video can’t precisely clone the experience of having been there.

    [Note: The part where I raise my middle finger is an inside joke—one of the speakers during the technical sessions inadvertently did the same while making a point, causing great mirth in the audience.]

    September 20, 2016

    ParticlebitesA new anomaly: the electromagnetic duality anomaly

    Article: Electromagnetic duality anomaly in curved spacetimes
    Authors: I. Agullo, A. del Rio and J. Navarro-Salas
    Reference: arXiv:1607.08879

    Disclaimer: this blogpost requires some basic knowledge of QFT (or being comfortable with taking my word at face value for some of the claims made :))

Anomalies exist everywhere. Probably the most intriguing ones are medical, but in particle physics they can be pretty fascinating too. In physics, anomalies refer to the breaking of a classical symmetry at the quantum level. There are basically two types of anomalies:

    • The first type, the gauge anomaly, is a red flag: if it shows up in your theory, it indicates that the theory is mathematically inconsistent.
    • The second type of anomaly does not signal any problems with the theory and in fact can have experimentally observable consequences. A prime example is the chiral anomaly. This anomaly nicely explains the decay rate of the neutral pion into two photons.

      Fig. 1: Illustration of pion decay into two photons. [Credit: Wikimedia Commons]

    In this paper, a new anomaly is discussed. This anomaly is related to the polarization of light and is called the electromagnetic duality anomaly.

    Chiral anomaly 101
    So let’s first brush up on the basics of the chiral anomaly. How does this anomaly explain the decay rate of the neutral pion into two photons? For that we need to start with the Lagrangian for QED that describes the interactions between the electromagnetic field (that is, the photons) and spin-½ fermions (which pions are built from):

    \displaystyle \mathcal L = \bar\psi \left( i \gamma^\mu \partial_\mu - e \gamma^\mu A_\mu \right) \psi - m \bar\psi \psi

    where the important players in the above equation are the \psis that describe the spin-½ particles and the vector potential A_\mu that describes the electromagnetic field. This Lagrangian is invariant under the chiral symmetry:

    \displaystyle \psi \to e^{i \theta \gamma_5} \psi .

    Due to this symmetry the current density j^\mu = \bar{\psi} \gamma_5 \gamma^\mu \psi is conserved: \nabla_\mu j^\mu = 0. This immediately tells us that the charge associated with this current density is time-independent. Since the chiral charge is time-independent, it prevents the \psi fields from decaying into the electromagnetic fields, because the \psi field has a non-zero chiral charge while the photons have none. Hence, if this were the end of the story, a pion would never be able to decay into two photons.

    However, the conservation of the charge is only valid classically! As soon as you go from classical field theory to quantum field theory this is no longer true; hence, the name (quantum) anomaly.  This can be seen most succinctly using Fujikawa’s observation that even though the field \psi and Lagrangian are invariant under the chiral symmetry, this is not enough for the quantum theory to also be invariant. If we take the path integral approach to quantum field theory, it is not just the Lagrangian that needs to be invariant but the entire path integral needs to be:

    \displaystyle \int D[A] \, D[\bar\psi] \, D[\psi] \, e^{i\int d^4x \, \mathcal L} .

    From calculating how the chiral symmetry acts on the measure D \left[\psi \right]  \, D \left[\bar \psi \right], one can extract all the relevant physics such as the decay rate.
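    For reference, the end result of Fujikawa’s computation is the textbook anomalous divergence of the chiral current (not derived in this post; the sign and factors of 2 depend on conventions):

    \displaystyle \nabla_\mu j^\mu = \frac{e^2}{16\pi^2} \, \epsilon^{\mu\nu\rho\sigma} F_{\mu\nu} F_{\rho\sigma} ,

    and it is precisely this non-zero right-hand side that allows the neutral pion to decay into two photons.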

    The electromagnetic duality anomaly
    Just like the chiral anomaly, the electromagnetic duality anomaly also breaks a symmetry at the quantum level that exists classically. The symmetry that is broken in this case is – as you might have guessed from its name – the electromagnetic duality. This symmetry is a generalization of a symmetry you are already familiar with from source-free electromagnetism. If you write down source-free Maxwell equations, you can just swap the electric and magnetic field and the equations look the same (you just have to send  \displaystyle \vec{E} \to \vec{B} and \vec{B} \to - \vec{E}). Now the more general electromagnetic duality referred to here is slightly more difficult to visualize: it is a rotation in the space of the electromagnetic field tensor and its dual. However, its transformation is easy to write down mathematically:

    \displaystyle F_{\mu \nu} \to \cos \theta \, F_{\mu \nu} + \sin \theta \, \, ^\ast F_{\mu \nu} .

    In other words, since this is a symmetry, if you plug this transformation into the Lagrangian of electromagnetism, the Lagrangian will not change: it is invariant. Now following the same steps as for the chiral anomaly, we find that the associated current is conserved and its charge is time-independent due to the symmetry. Here, the charge is simply the difference between the number of photons with left helicity and those with right helicity.
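    As a quick numerical sanity check (my own sketch, with arbitrary made-up field values), the duality rotation written in terms of the electric and magnetic fields, \vec{E} \to \cos\theta \, \vec{E} + \sin\theta \, \vec{B} and \vec{B} \to \cos\theta \, \vec{B} - \sin\theta \, \vec{E}, leaves the energy density E^2 + B^2 and the Poynting vector \vec{E} \times \vec{B} unchanged:

```python
import math

def dual_rotate(E, B, theta):
    """Electromagnetic duality rotation by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    Er = [c * e + s * b for e, b in zip(E, B)]
    Br = [c * b - s * e for e, b in zip(E, B)]
    return Er, Br

def cross(u, v):
    return [u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0]]

E, B = [1.0, -2.0, 0.5], [0.3, 0.7, -1.1]   # arbitrary field values
Er, Br = dual_rotate(E, B, 0.813)            # arbitrary rotation angle

energy = sum(e*e for e in E) + sum(b*b for b in B)
energy_r = sum(e*e for e in Er) + sum(b*b for b in Br)

print(abs(energy - energy_r) < 1e-12)                                        # True: energy density invariant
print(all(abs(x - y) < 1e-12 for x, y in zip(cross(E, B), cross(Er, Br))))   # True: Poynting vector invariant
```
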

    Let us continue following the exact same steps as those for the chiral anomaly. The key is to first write electromagnetism in variables analogous to those of the chiral theory. Then you apply Fujikawa’s method and… *drum roll for the anomaly that is approaching*…. Anti-climax: nothing happens, everything seems to be fine. There are no anomalies, nothing!

    So why the title of this blog? Well, as soon as you couple the electromagnetic field to a gravitational field, the electromagnetic duality is broken in a deeply quantum way. The difference between the number of photons with left helicity and those with right helicity is no longer conserved when your spacetime is curved.

    Physical consequences
    Some potentially really cool consequences have to do with the study of light passing by rotating stars, black holes or even rotating clusters. These astrophysical objects do not only gravitationally bend the light; the optical helicity anomaly tells us that there might be a difference in polarization between light rays coming from different sides of these objects. This may also have consequences for the cosmic microwave background radiation, which is a ‘picture’ of our universe when it was only 380,000 years old (as compared to the 13.8 billion years it is today!). How big this effect is and whether we will be able to see it in the near future is still an open question.



    Further reading 

    • An introduction to anomalies using only quantum mechanics instead of quantum field theory is “Anomalies for pedestrians” by Barry Holstein.
    • The beautiful book “Quantum field theory and the Standard Model” by Michael Schwartz has a nice discussion in the later chapters on the chiral anomaly.
    • Lecture notes on anomalies in general by Adel Bilal, aimed at graduate students, can be found here

    Jordan EllenbergSuch shall not become the degradation of Wisconsin

    I’ve lived in Wisconsin for more than a decade and had never heard of Joshua Glover.  That’s not as it should be!

    Glover was a slave who escaped Missouri in 1852 and settled in Racine, a free man.  He found a job and settled down into a new life.  Two years later, his old master found out where he was, and, licensed by the Fugitive Slave Act, came north to claim his property.  The U.S. marshals seized Glover and locked him in the Milwaukee courthouse. (Cathedral Square Park is where that courthouse stood.)   A Wisconsin court issued a writ holding the Fugitive Slave Law unconstitutional, and demanding that Glover be given a trial, but the federal officers refused to comply.  So Sherman Booth, an abolitionist newspaperman from Waukesha, gathered a mob and broke Glover out.  Eventually he made it to Canada via the Underground Railroad.

    Booth spent years tangled in court, thanks to his role in the prison break.  Wisconsin, thrilled by its defiance of the hated law, bloomed with abolitionist fervency.  Judge Abram Daniel Smith declared that Wisconsin, a sovereign state, would never accept federal interference within its borders:

    “They will never consent that a slave-owner, his agent, or an officer of the United States, armed with process to arrest a fugitive from service, is clothed with entire immunity from state authority; to commit whatever crime or outrage against the laws of the state; that their own high prerogative writ of habeas corpus shall be annulled, their authority defied, their officers resisted, the process of their own courts contemned, their territory invaded by federal force, the houses of their citizens searched, the sanctuary of their homes invaded, their streets and public places made the scenes of tumultuous and armed violence, and state sovereignty succumb–paralyzed and aghast–before the process of an officer unknown to the constitution and irresponsible to its sanctions. At least, such shall not become the degradation of Wisconsin, without meeting as stern remonstrance and resistance as I may be able to interpose, so long as her people impose upon me the duty of guarding their rights and liberties, and maintaining the dignity and sovereignty of their state.”

    The sentiment, of course, was not so different from that the Southern states would use a few years later to justify their right to buy and sell human beings.  By the end of the 1850s, Wisconsin’s governor Alexander Randall would threaten to secede from the Union should slavery not be abolished.

    When Booth was arrested by federal marshals in 1860, state assemblyman Benjamin Hunkins of New Berlin went even further, introducing a bill declaring war on the United States in protest.  The speaker of the assembly declared the bill unconstitutional and no vote was taken.  (This was actually the second time Hunkins tried to declare war on the federal government; as a member of the Wisconsin territorial assembly in 1844, he became so outraged over the awarding of the Upper Peninsula to Michigan that he introduced an amendment declaring war on Great Britain, Illinois, Michigan, and the United States!)

    Milwaukee has both a Booth Street and a Glover Avenue; and they cross.

    Madison has a Randall Street (and a Randall School, and Camp Randall Stadium) but no Glover Street and no Booth Street.  Should it?






    September 19, 2016

    Clifford JohnsonKitchen Design…

    (Click for larger view.)
    Apparently I was designing a kitchen recently. Yes, but not one I intend to build in the physical world. It's the setting (in part) for a new story I'm working on for the book. The everyday household is a great place to have a science conversation, by the way, and this is what we will see in this story. It might be one of the most important conversations in the book in some sense.

    This story is meant to be done in a looser, quicker style, and there I go again with the ridiculous level of detail... Just to get a sense of how ridiculous I'm being, note that this is not a page, but a small panel within a page of several.

    The page establishes the overall setting, and hopefully roots you [...] Click to continue reading this post

    The post Kitchen Design… appeared first on Asymptotia.

    Robert HellingBrute forcing Crazy Game Puzzles

    In the 1980s, as a kid I loved my Crazy Turtles Puzzle ("Das verrückte Schildkrötenspiel"). For a number of variations, see here or here.

    I had completely forgotten about those, but a few days ago, I saw a self-made reincarnation when staying at a friend's house:

    I tried a few minutes to solve it, unsuccessfully (in case it is not clear: you are supposed to arrange the nine tiles in a square such that they form color matching arrows wherever they meet).

    So I took the picture above with the plan to either try a bit more at home or write a program to solve it. Yesterday, I had about an hour and did the latter. I am a bit proud of the implementation I came up with and in particular the fact that I essentially came up with a correct program: It came up with the unique solution the first time I executed it. So, here I share it:


    # 1 red    8
    # 2 yellow 7
    # 3 green  6
    # 4 blue   5

    @karten = (7151, 6754, 4382, 2835, 5216, 2615, 2348, 8253, 4786);

    foreach $karte (0..8) {
        $farbe[$karte] = [split //, $karten[$karte]];
    }

    sub ausprobieren {
        my $pos = shift;

        foreach my $karte (0..8) {
            next if $benutzt[$karte];
            foreach my $dreh (0..3) {
                if ($pos % 3) {
                    # not in the left column: match the right edge of the left neighbour
                    $suche = 9 - $farbe[$gelegt[$pos - 1]]->[(1 - $drehung[$gelegt[$pos - 1]]) % 4];
                    next if $farbe[$karte]->[(3 - $dreh) % 4] != $suche;
                }
                if ($pos >= 3) {
                    # not in the top row: match the bottom edge of the upper neighbour
                    $suche = 9 - $farbe[$gelegt[$pos - 3]]->[(2 - $drehung[$gelegt[$pos - 3]]) % 4];
                    next if $farbe[$karte]->[(4 - $dreh) % 4] != $suche;
                }
                $benutzt[$karte] = 1;
                $gelegt[$pos] = $karte;
                $drehung[$karte] = $dreh;
                #print @gelegt[0..$pos]," ",@drehung[0..$pos],"\n";

                if ($pos == 8) {
                    print "Fertig!\n";
                    for $l (0..8) {
                        print "$gelegt[$l] $drehung[$gelegt[$l]]\n";
                    }
                } else {
                    &ausprobieren($pos + 1);
                }
                $benutzt[$karte] = 0;
            }
        }
    }

    &ausprobieren(0);
    Sorry for the variable names in German, but the idea should be clear. Regarding the implementation: red, yellow, green and blue backs of arrows get numbers 1, 2, 3, 4 respectively, and pointy sides of arrows get 8, 7, 6, 5 (so matching combinations sum to 9).

    It implements a depth-first tree search where tile positions (numbered 0 to 8) are tried left to right, top to bottom. So tile $n$ shares a vertical edge with tile $n-1$ unless its number is 0 mod 3 (leftmost column), and it shares a horizontal edge with tile $n-3$ unless $n$ is less than 3, which means it is in the first row.

    It tries rotating tiles by 0 to 3 times 90 degrees clockwise, so finding which arrow to match with a neighboring tile can also be computed with mod 4 arithmetic.
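    The same bookkeeping, sketched in Python for readers who don't speak Perl (the side order 0=top, 1=right, 2=bottom, 3=left of the four digits is my assumed convention for illustration; only the mod-4 arithmetic matters):

```python
# Digits encode edge colors clockwise; backs of arrows are 1..4 and points
# are 8..5, so two half-arrows fit together exactly when their digits sum to 9.
def fits(a, b):
    return a + b == 9

# After rotating a tile d quarter-turns clockwise, the digit now showing on
# side s (0=top, 1=right, 2=bottom, 3=left -- an assumed convention) is the
# one originally printed on side (s - d) mod 4.
def side(tile_digits, s, d):
    return tile_digits[(s - d) % 4]

tile = [7, 1, 5, 1]                  # the tile encoded as 7151 above
print(fits(2, 7), fits(3, 5))        # True False: a point matches only the back of its own color
print(side(tile, 0, 1))              # 1: after one clockwise turn, the old left digit is on top
```
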

    Clifford JohnsonBreaking, not Braking

    Well, that happened. I’ve not, at least as I recollect, written a breakup letter before…until now. It had the usual “It’s not you it’s me…”, “we’ve grown apart…” sorts of phrases. And they were all well meant. This was written to my publisher, I hasten to add! Over the last … Click to continue reading this post

    The post Breaking, not Braking appeared first on Asymptotia.

    ParticlebitesHorton Hears a Sterile Neutrino?

    Article: Limits on Active to Sterile Neutrino Oscillations from Disappearance Searches in the MINOS, Daya Bay, and Bugey-3 Experiments
    Authors:  Daya Bay and MINOS collaborations
    Reference: arXiv:1607.01177v4

    So far, the hunt for sterile neutrinos has come up empty. Could a joint analysis between MINOS, Daya Bay and Bugey-3 data hint at their existence?

    Neutrinos, like the beloved Whos in Dr. Seuss’ “Horton Hears a Who!,” are light and elusive, yet have a large impact on the universe we live in. While neutrinos only interact with matter through the weak nuclear force and gravity, they played a critical role in the formation of the early universe. Neutrino physics is now an exciting line of research pursued by the Hortons of particle physics, cosmology, and astrophysics alike. While most of what we currently know about neutrinos is well described by a three-flavor neutrino model, a few inconsistent experimental results such as those from the Liquid Scintillator Neutrino Detector (LSND) and the Mini Booster Neutrino Experiment (MiniBooNE) hint at the presence of a new kind of neutrino that only interacts with matter through gravity. If this “sterile” kind of neutrino does in fact exist, it might also have played an important role in the evolution of our universe.

    Horton hears a sterile neutrino?

    The three known neutrinos come in three flavors: electron, muon, or tau. The discovery of neutrino oscillation by the Sudbury Neutrino Observatory and the Super-Kamiokande Observatory, which won the 2015 Nobel Prize, proved that one flavor of neutrino can transform into another. This led to the realization that each neutrino mass state is a superposition of the three different neutrino flavor states. From neutrino oscillation measurements, most of the parameters that define the mixing between neutrino states are well known for the three standard neutrinos.

    The relationship between the three known neutrino flavor states and mass states is usually expressed as a 3×3 matrix known as the PMNS matrix, after Bruno Pontecorvo, Ziro Maki, Masami Nakagawa, and Shoichi Sakata. The PMNS matrix includes three mixing angles, the values of which determine “how much” of each neutrino flavor state is in each mass state. The distance required for one neutrino flavor to become another, the neutrino oscillation wavelength, is determined by the difference between the squared masses of the two mass states. The values of the mass splittings m_2^2-m_1^2 and m_3^2-m_2^2 are known to good precision.

    A fourth flavor? Adding a sterile neutrino to the mix

    A “sterile” neutrino is referred to as such because it would not interact weakly: it would only interact through the gravitational force. Neutrino oscillations involving the hypothetical sterile neutrino can be understood using a “four-flavor model,” which introduces a fourth neutrino mass state, m_4, heavier than the three known “active” mass states. This fourth neutrino state would be mostly sterile, with only a small contribution from a mixture of the three known neutrino flavors. If the sterile neutrino exists, it should be possible to experimentally observe neutrino oscillations with a wavelength set by the difference between m_4^2 and the square of the mass of another known neutrino mass state. Current observations suggest a squared mass difference in the range of 0.1-10 eV^2.

    Oscillations between active and sterile states would result in the disappearance of muon (anti)neutrinos and electron (anti)neutrinos. In a disappearance experiment, you know how many neutrinos of a specific type you produce, you count how many of that type you detect a distance away, and you find that some of the neutrinos have “disappeared”; in other words, they have oscillated into a different type that you are not detecting.
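    For concreteness, here is a minimal sketch of the standard two-flavor vacuum oscillation formula that disappearance searches fit; the function name and the numbers below are illustrative choices of mine, not tied to any one experiment:

```python
import math

def survival_probability(sin2_2theta, dm2_ev2, L_km, E_GeV):
    """Two-flavor survival probability P(nu -> nu).

    Standard vacuum formula P = 1 - sin^2(2*theta) * sin^2(1.27 * dm2 * L / E),
    with dm2 in eV^2, L in km, and E in GeV.
    """
    return 1.0 - sin2_2theta * math.sin(1.27 * dm2_ev2 * L_km / E_GeV) ** 2

# A ~1 eV^2 splitting (the sterile-search range) oscillates over much
# shorter distances than the known splittings, so a detector ~1 km from
# a reactor could see a deficit of ~MeV electron antineutrinos.
print(survival_probability(0.1, 1.0, 0.5, 0.004))
```

    The fewer neutrinos of the produced type that survive, the larger the inferred mixing with the fourth state.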

    A joint analysis by the MINOS and Daya Bay collaborations

    The MINOS and Daya Bay collaborations have conducted a joint analysis to combine independent measurements of muon (anti)neutrino disappearance by MINOS and electron antineutrino disappearance by Daya Bay and Bugey-3. Here’s a breakdown of the involved experiments:

    • MINOS, the Main Injector Neutrino Oscillation Search: A long-baseline neutrino experiment with detectors at Fermilab and northern Minnesota that use an accelerator at Fermilab as the neutrino source
    • The Daya Bay Reactor Neutrino Experiment: Uses antineutrinos produced by the reactors of China’s Daya Bay Nuclear Power Plant and the Ling Ao Nuclear Power Plant
    • The Bugey-3 experiment: Performed in the early 1990s, used antineutrinos from the Bugey Nuclear Power Plant in France for its neutrino oscillation observations

    MINOS and Daya Bay/Bugey-3 combined 90% confidence level limits (in red) compared to the LSND and MiniBooNE 90% confidence level allowed regions (in green/purple). The plot shows the mass splitting between mass states 1 and 4 (corresponding to the sterile neutrino) against a function of the μ-e mixing angle, which is equivalent to a function involving the 1-4 and 2-4 mixing angles. Regions of parameter space to the right of the red contour are excluded, ruling out the majority of the LSND/MiniBooNE allowed regions. Source: arXiv:1607.01177v4.

    Assuming a four-flavor model, the MINOS and Daya Bay collaborations put new constraints on the value of the mixing angle \theta_{\mu e}, the parameter controlling electron (anti)neutrino appearance in experiments with short neutrino travel distances. As for the hypothetical sterile neutrino? The analysis excluded the parameter space allowed by the LSND and MiniBooNE appearance-based indications for the existence of light sterile neutrinos for \Delta m_{41}^2 < 0.8 eV^2 at a 95% confidence level. In other words, the MINOS and Daya Bay analysis essentially rules out the LSND and MiniBooNE inconsistencies that allowed for the presence of a sterile neutrino in the first place. These results illustrate just how at odds disappearance searches and appearance searches are when it comes to providing insight into the existence of light sterile neutrinos. If the Whos exist, they will need to be a little louder in order for the world to hear them.



    n-Category Café Logical Uncertainty and Logical Induction

    Quick - what’s the 10^{100}th digit of π?

    If you’re anything like me, you have some uncertainty about the answer to this question. In fact, your uncertainty probably takes the following form: you assign a subjective probability of about 1/10 to this digit being any one of the possible values 0, 1, 2, …, 9. This is despite the fact that

    • the normality of π in base 10 is a wide open problem, and
    • even if it weren’t, nothing random is happening; the 10^{100}th digit of π is a particular digit, not a randomly selected one, and it being a particular value is a mathematical fact which is either true or false.

    If you’re bothered by this state of affairs, you could try to resolve it by computing the 10^{100}th digit of π, but as far as I know nobody has the computational resources to do this in a reasonable amount of time.

    Because of this lack of computational resources, among other things, you and I aren’t logically omniscient; we don’t have access to all of the logical consequences of our beliefs. The kind of uncertainty we have about mathematical questions that are too difficult for us to settle one way or another right this moment is logical uncertainty, and standard accounts of how to have uncertain beliefs (for example, assign probabilities and update them using Bayes’ theorem) don’t capture it.

    Nevertheless, somehow mathematicians manage to have lots of beliefs about how likely mathematical conjectures such as the Riemann hypothesis are to be true, and even about simpler but still difficult mathematical questions such as how likely some very large complicated number N is to be prime (a reasonable guess, before we’ve done any divisibility tests, is about 1/ln N by the prime number theorem). In some contexts we have even more sophisticated guesses like the Cohen-Lenstra heuristics for assigning probabilities to mathematical statements such as “the class number of such-and-such complicated number field has p-part equal to so-and-so.”
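    The 1/ln N guess is easy to sanity-check numerically. A quick sketch (the choice of N = 10^6 and the window size are arbitrary choices of mine):

```python
import math

def is_prime(n):
    """Trial division; fine for moderately sized n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True

# Density of primes in a window just above N, versus the 1/ln N guess.
N = 10**6
window = range(N, N + 10_000)
density = sum(is_prime(n) for n in window) / len(window)
print(density, 1 / math.log(N))
```

    The two numbers come out close, which is the prime number theorem doing its job.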

    In general, what criteria might we use to judge an assignment of probabilities to mathematical statements as reasonable or unreasonable? Given some criteria, how easy is it to find a way to assign probabilities to mathematical statements that actually satisfies them? These fundamental questions are the subject of the following paper:

    Scott Garrabrant, Tsvi Benson-Tilsen, Andrew Critch, Nate Soares, and Jessica Taylor, Logical Induction. arXiv:1609.03543.

    Loosely speaking, in this paper the authors

    • describe a criterion called logical induction that an assignment of probabilities to mathematical statements could satisfy,
    • show that logical induction implies many other desirable criteria, some of which have previously appeared in the literature, and
    • prove that a computable logical inductor (an algorithm producing probability assignments satisfying logical induction) exists.

    Logical induction is a weak “no Dutch book” condition; the idea is that a logical inductor makes bets about which statements are true or false, and does so in a way that doesn’t lose it too much money over time.

    A warmup

    Before describing logical induction, let me describe a different and more naive criterion you could ask for, but in fact don’t want to ask for because it’s too strong. Let φ ↦ P(φ) be an assignment of probabilities to statements in some first-order language; for example, we might want to assign probabilities to statements in the language of Peano arithmetic (PA), conditioned on the axioms of PA being true (which means having probability 1). Say that such an assignment φ ↦ P(φ) is coherent if

    • P(⊤) = 1.
    • If φ₁ is equivalent to φ₂, then P(φ₁) = P(φ₂).
    • P(φ₁) = P(φ₁ ∧ φ₂) + P(φ₁ ∧ ¬φ₂).

    These axioms together imply various other natural-looking conditions; for example, setting φ₁ = ⊤ in the third axiom, we get that P(φ₂) + P(¬φ₂) = 1. Various other axiomatizations of coherence are possible.
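    As a tiny executable illustration of coherence arising from a measure on models (a toy with two propositional atoms standing in for a first-order language; the weights are an arbitrary dyadic choice of mine, made dyadic so that float sums are exact):

```python
import itertools

# Toy "models": truth assignments to two atoms. A sentence is any
# boolean function of a model, and P(phi) is the total weight of the
# models in which phi is true.
models = list(itertools.product([False, True], repeat=2))
weights = dict(zip(models, [0.125, 0.125, 0.25, 0.5]))

def P(phi):
    return sum(w for m, w in weights.items() if phi(m))

a = lambda m: m[0]
b = lambda m: m[1]
top = lambda m: True

assert P(top) == 1.0                                              # axiom 1
assert P(lambda m: a(m) and b(m)) == P(lambda m: b(m) and a(m))   # axiom 2
assert P(a) == P(lambda m: a(m) and b(m)) + P(lambda m: a(m) and not b(m))  # axiom 3
print(P(a), P(lambda m: not a(m)))
```

    Probabilities built this way satisfy the three axioms automatically, which is one direction of the representation theorem.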

    Theorem: A probability assignment such that P(φ) = 1 for all statements φ in a first-order theory T is coherent iff there is a probability measure on models of T such that P(φ) is the probability that φ is true in a random model.

    This theorem is a logical counterpart of the Riesz-Markov-Kakutani representation theorem relating probability distributions to linear functionals on spaces of functions; I believe it is due to Gaifman.

    For example, if T is PA, then the sort of uncertainty that a coherent probability assignment conditioned on PA captures is uncertainty about which of the various first-order models of PA is the “true” natural numbers. However, coherent probability assignments are still logically omniscient: syntactically, every provable statement is assigned probability 1 because they’re all equivalent to ⊤, and semantically, provable statements are true in every model. In particular, coherence is too strong to capture uncertainty about the digits of π.

    Coherent probability assignments can update over time whenever they learn that some statement is true which they haven’t assigned probability 1 to; for example, if you start by believing PA and then come to also believe that PA is consistent, then conditioning on that belief will cause your probability distribution over models to exclude models of PA where PA is inconsistent. But this doesn’t capture the kind of updating a non-logically omniscient reasoner like you or me actually does, where our beliefs about mathematics can change solely because we’ve thought a bit longer and proven some statements that we didn’t previously know (for example, about the values of more and more digits of π).

    Logical induction

    The framework of logical induction is for describing the above kind of updating, based solely on proving more statements. It takes as input a deductive process which is slowly producing proofs of statements over time (for example, of theorems in PA), and assigns probabilities to statements that haven’t been proven yet. Remarkably, it’s able to do this in a way that eventually outpaces the deductive process, assigning high probabilities to true statements long before they are proven (see Theorem 4.2.1).

    So how does logical induction work? The coherence axioms above can be justified by Dutch book arguments, following Ramsey and de Finetti, which loosely say that a bookie can’t offer a coherent reasoner a bet about mathematical statements which they will take but which is in fact guaranteed to lose them money. But this is much too strong a requirement for a reasoner who is not logically omniscient. The logical induction criterion is a weaker version of this condition; we only require that an efficiently computable bookie can’t make arbitrarily large amounts of money by betting with a logical inductor about mathematical statements unless it’s willing to take on arbitrarily large amounts of risk (see Definition 3.0.1).

    This turns out to be a surprisingly useful condition to require, loosely speaking because it corresponds to being able to “notice patterns” in mathematical statements even if we can’t prove anything about them yet. A logical inductor has to be able to notice patterns that could otherwise be used by an efficiently computable bookie to exploit the inductor; for example, a logical inductor eventually assigns probability about 1/10 to claims that a very large digit of π has a particular value, intuitively because otherwise a bookie could continue to bet with the logical inductor about more and more digits of π, making money each time (see Theorem 4.4.2).

    Logical induction has many other desirable properties, some of which are described in this blog post. One of the more remarkable properties is that because logical inductors are computable, they can reason about themselves, and hence assign probabilities to statements about the probabilities they assign. Despite the possibility of running into self-referential paradoxes, logical inductors eventually have accurate beliefs about their own beliefs (see Theorem 4.11.1).

    Overall I’m excited about this circle of ideas and hope that they get more attention from the mathematical community. Speaking very speculatively, it would be great if logical induction shed some light on the role of probability in mathematics more generally - for example, in the use of informal probabilistic arguments for or against difficult conjectures. A recent example is Boklan and Conway’s probabilistic arguments in favor of the conjecture that there are no Fermat primes beyond those currently known.

    I’ve made several imprecise claims about the contents of the paper above, so please read it to get the precise claims!

    Tommaso DorigoAre There Two Higgses ? No, And I Won Another Bet!

    The 2012 measurements of the Higgs boson, performed by ATLAS and CMS on 7- and 8-TeV datasets collected during Run 1 of the LHC, were a giant triumph of fundamental physics, which conclusively showed the correctness of the theoretical explanation of electroweak symmetry breaking conceived in the 1960s.

    The Higgs boson signals found by the experiments were strong and coherent enough to convince physicists as well as the general public, but at the same time the few small inconsistencies unavoidably present in any data sample, driven by statistical fluctuations, were a stimulus for fantasy interpretations. Supersymmetry enthusiasts, in particular, saw the 125 GeV boson as the first to be found of a set of five; SUSY in fact requires the presence of at least five such states.

    read more

    BackreactionExperimental Search for Quantum Gravity 2016

    Research in quantum gravity is quite a challenge since we neither have a theory nor data. But some of us like a challenge.

    So far, most effort in the field has gone into using requirements of mathematical consistency to construct a theory. It is impossible of course to construct a theory based on mathematical consistency alone, because we can never prove our assumptions to be true. All we know is that the assumptions give rise to good predictions in the regime where we’ve tested them. Without assumptions, no proof. Still, you may hope that mathematical consistency tells you where to look for observational evidence.

    But in the second half of the 20th century, theorists used the weakness of gravity as an excuse not to think about how to experimentally test quantum gravity at all. This isn’t merely a sign of laziness; it’s a return to the days when philosophers believed they could find out how nature works by introspection, except that now many theoretical physicists believe mathematical introspection is science. Particularly disturbing to me is how frequently I speak with students or young postdocs who have never even given thought to the question of what makes a theory scientific. That’s one of the reasons the disconnect between physics and philosophy worries me.

    In any case, the cure clearly isn’t more philosophy, but more phenomenology. The effects of quantum gravity aren’t necessarily entirely out of experimental reach. Gravity isn’t generally a weak force, not in the same way that, for example, the weak nuclear force is weak. That’s because the effects of gravity get stronger with the amount of mass (or energy) that exerts the force. Indeed, this property of the gravitational force is the very reason why it’s so hard to quantize.

    Quantum gravitational effects hence were strong in the early universe, they are strong inside black holes, and they can be non-negligible for massive objects that have pronounced quantum properties. Furthermore, the theory of quantum gravity can be expected to give rise to deviations from general relativity or the symmetries of the standard model, which can have consequences that are observable even at low energies.

    The often-repeated argument that we’d need to reach enormously high energies – close to the Planck energy, 16 orders of magnitude higher than LHC energies – is simply wrong. Physics is full of examples of short-distance phenomena that give rise to effects at longer distances, such as atoms causing Brownian motion, or quantum electrodynamics allowing stable atoms to begin with.

    I have spent the last 10 years or so studying the prospects to find experimental evidence for quantum gravity. Absent a fully-developed theory we work with models to quantify effects that could be signals of quantum gravity, and aim to test these models with data. The development of such models is relevant to identify promising experiments to begin with.

    Next week, we will hold the 5th international conference on Experimental Search for Quantum Gravity, here in Frankfurt. And I dare to say we have managed to pull together an awesome selection of talks.

    We’ll hear about the prospects of finding evidence for quantum gravity in the CMB (Bianchi, Krauss, Vennin) and in quantum oscillators (Paternostro). We have a lecture about the interface between gravity and quantum physics, both on long and short distances (Fuentes), and a talk on how to look for moduli and axion fields that are generic consequences of string theory (Conlon). Of course we’ll also cover Loop Quantum Cosmology (Barrau), asymptotically safe gravity (Eichhorn), and causal sets (Glaser). We’re super up-to-date by having a talk about constraints from the LIGO gravitational wave-measurements on deviations from general relativity (Yunes), and several of the usual suspects speaking about deviations from Lorentz-invariance (Mattingly), Planck stars (Rovelli, Vidotto), vacuum dispersion (Giovanni), and dimensional reduction (Magueijo). There’s neutrino physics (Paes), a talk about what the cosmological constant can tell us about new physics (Afshordi), and, and, and!

    You can download the abstracts here and the timetable here.

    But the best is I’m not telling you this to depress you because you can’t be with us, but because our IT guys still tell me we’ll both record the talks and livestream them (to the extent that the speakers consent of course). I’ll share the URL with you here once everything is set up, so stay tuned.

    Update: Streaming link will be posted on the institute's main page shortly before the event. Another update: Livestream is available here.

    Jordan EllenbergKevin Jamieson, hyperparameter optimization, playoffs

    Kevin Jamieson gave a great seminar here on Hyperband, his algorithm for hyperparameter optimization.

    Here’s the idea.  Doing machine learning involves making a lot of choices.  You set up your deep learning neural thingamajig but that’s not exactly one size fits all:  How many layers do you want in your net?  How fast do you want your gradient descents to step?  And etc. and etc.  The parameters are the structures your thingamajig learns.  The hyperparameters are the decisions you make about your thingamajig before you start learning.  And it turns out these decisions can actually affect performance a lot.  So how do you know how to make them?

    Well, one option is to pick N choices of hyperparameters at random, run your algorithm on your test set with each choice, and see how you do.  The problem is, thingamajigs take a long time to converge.  This is expensive to do, and when N is small, you’re not really seeing very much of hyperparameter space (which might have dozens of dimensions).

    A more popular choice is to place some prior on the function

    F:[hyperparameter space] -> [performance on test set]

    You make a choice of hyperparameters, you run the thingamajig, based on the output you update your distribution on F, based on your new distribution you choose a likely-to-be-informative hyperparameter and run again, etc.

    This is called “Bayesian optimization of hyperparameters” — it works pretty well — but really only about as well as taking twice as many random choices of hyperparameters, in practice.  A 2x speedup is nothing to sneeze at, but it still means you can’t get N large enough to search much of the space.

    Kevin thinks you should think of this as a multi-armed bandit problem.  You have a hyperparameter whose performance you’d like to judge.  You could run your thingamajig with those parameters until it seems to be converging, and see how well it does.  But that’s expensive.  Alternatively, you could run your thingamajig (1/c) times as long; then you have time to consider Nc values of the hyperparameters, much better.  But of course you have a much less accurate assessment of the performance:  maybe the best performer in that first (1/c) time segment is actually pretty bad, and just got off to a good start!

    So you do this instead.  Run the thingamajig for time (1/c) on Nc values.  That costs you N.  Then throw out all values of the hyperparameters that came in below median on performance.  You still have (1/2)Nc values left, so continue running those processes for another time (1/c).  That costs you (1/2)N.  Throw out everything below the median.  And so on.  When you get to the end you’ve spent less than 2N in total (the rounds cost N, (1/2)N, (1/4)N, and so on), not bad at all, but instead of looking at only N hyperparameters, you’ve looked at Nc, where c might be pretty big.  And you haven’t wasted lots of processor time following unpromising choices all the way to the end; rather, you’ve mercilessly culled the low performers along the way.
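    In case it helps, here is my own toy rendering of the culling procedure just described (a sketch of successive halving, not Kevin’s actual implementation; the noisy objective is invented for illustration):

```python
import random

def successive_halving(configs, partial_score, unit=1):
    """Run all configs briefly, cull the below-median half, let the
    survivors run longer, and repeat until one config remains.

    partial_score(config, budget) stands in for "run the thingamajig
    with these hyperparameters for `budget` time units and report how
    well it is doing so far".
    """
    survivors = list(configs)
    budget = unit
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda c: partial_score(c, budget), reverse=True)
        survivors = ranked[: max(1, len(ranked) // 2)]  # keep the top half
        budget += unit  # survivors have run for another time unit
    return survivors[0]

# Toy demo: a "hyperparameter" is a number in [0, 1], its true quality
# is the number itself, and short runs see a noisy version of it that
# sharpens as the budget grows.
random.seed(0)
configs = [random.random() for _ in range(64)]
noisy = lambda c, budget: c + random.gauss(0, 0.25 / budget)
print(successive_halving(configs, noisy))
```

    Early rounds are cheap but noisy, so a genuinely good config can occasionally be culled off a bad start; averaging over the schedule of c values, as described next, hedges against that.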

    But how do you choose c?  I insisted to Kevin that he call c a hyperhyperparameter but he wasn’t into it.  No fun!  Maybe the reason Kevin resisted my choice is that he doesn’t actually choose c; he just carries out his procedure once for each c as c ranges over 1, 2, 4, 8, …, N; this costs you only another factor of log N.

    In practice, this seems to find hyperparameters just as well as more fancy Bayesian methods, and much faster.  Very cool!  You can imagine doing the same things in simpler situations (e.g. I want to do a gradient descent, where should I start?) and Kevin says this works too.

    In some sense this is how a single-elimination tournament works!  In the NCAA men’s basketball finals, 64 teams each play a game; the teams above the median are 1-0, while the teams below the median, at 0-1, get cut.  Then the 1-0 teams each play one more game:  the teams above the median at 2-0 stay, the teams below the median at 1-1 get cut.

    What if the regular season worked like this?  Like if in June, the bottom half of major league baseball just stopped playing, and the remaining 15 teams duked it out until August, then down to 8… It would be impossible to schedule, of course.  But in a way we have some form of it:  at the July 31 trade deadline, teams sufficiently far out of the running can just give up on the season and trade their best players for contending teams’ prospects.  Of course the bad teams keep playing games, but in some sense, the competition has narrowed to a smaller field.




    September 18, 2016

    Doug NatelsonAlan Alda Center for Communicating Science, posting

    Tomorrow I'll be a participant in an all-day workshop that Rice's Center for Teaching Excellence will be hosting with representatives from the Alan Alda Center for Communicating Science - the folks responsible for the Flame Challenge, a contest about trying to explain a science topic to an 11-year-old.  I'll write a follow-up post sometime soon about what this was like.

    I'm in the midst of some major writing commitments right now, so posting frequency may slow for a bit.  I am trying to plan out how to write some accessible content about some recent exciting work in a few different material systems. 


    Chad OrzelAdvice for New Faculty, 2016

    A couple of weeks ago, I was asked to speak on a panel about teaching during Union’s new-faculty orientation. We had one person from each of the academic divisions (arts and literature, social science, natural science, and engineering), and there was a ton of overlap in the things we said, but here’s a rough reconstruction of the advice I gave them:

    1) Be Wary of Advice

    Because it’s always good to start off with something that sounds a little counter-intuitive… What I mean by this is that lots of people will be more than happy to offer advice to a new faculty member– often without being asked– but a great deal of that advice will be bad for the person getting it. This isn’t the result of active malice, just that teaching is a highly individual endeavor.

    The relatively harmless example I use to illustrate this comes from my first year of teaching, when I went around and asked my new colleagues for advice. One very successful guy said that he made an effort to maximize the effect of our small classes by “breaking the fourth wall” and walking away from the chalkboard out into the middle of the room.

    That sounded good, so I tried it for a while. And quickly found that while it worked well for him, it was a disaster for me. I’m multiple standard deviations above average height, and several years later a student wrote on a course evaluation “He is loud and intense.” That combination meant that when I would walk out into the room while lecturing, the students closest to me were basically cowering in fear. Once I noticed that, I made a point of staying back near the blackboard unless I needed to go out into the room for a demo or activity, and everybody was happier.

    So, my advice to new faculty is: be wary of advice from older faculty. You’ll get lots of it, but much of it will be bad for you. You need to be independent enough and self-aware enough to recognize and use the bits that will work for you, and discard the bits that won’t.

    2) Don’t Be Afraid to Try New Things

    One of the big categories of well-intentioned bad advice that most new faculty can expect to hear is some form of “don’t rock the boat until you have tenure.” This particularly comes up in the context of teaching– I had people who were speakers at a workshop on improving introductory physics courses tell me not to try to implement any new methods as a junior faculty member. The logic is that changing anything will have a short-term negative effect on your teaching evaluations, and you shouldn’t risk that before a tenure review.

    I think this is bad advice, because if you’re going to go with that, there’s always some reason not to change things– you’ll be up for promotion to full professor, or looking for a fellowship, or an endowed chair, or something. There are always good, logical-seeming reasons to keep your head down and do something functional and not take risks that could improve your teaching.

    So, my advice is that if you look around and see something that you’d like to try that would improve your teaching, go for it. Again, you need to be independent and self-aware enough to recognize what’s likely to work, and make adjustments when needed, and you need to be prepared to defend the choices you make should it become necessary: “My evaluations went down when I changed teaching methods; this is a well-known effect of change, but they’ve improved since, and student learning is better by these metrics.”

    3) Don’t Assume Your Students Are Like You Were

    I count this as the best one-sentence piece of advice I got as a junior faculty member. It’s really important to remember that people who become college faculty are necessarily unusual– we’re the ones who had enough interest in our subjects to continue into graduate school, and who were sufficiently passionate and self-motivated to succeed there.

    That’s just not going to be the case for the vast majority of the students we will encounter as faculty. There will be a few, and we should cherish them, but most of them are not going to find the subject as intrinsically fascinating as we do, and won’t put in the same level of independent effort.

    So, my advice is to remember that (paraphrasing another famous comment) you go to class with the students you have, not the students you wish you had or the students you were. If you go in expecting students to react the same way you did, you’re going to end up disappointed and frustrated. This doesn’t mean you can’t ask them to become better than they are, it just means that you need to make an effort to meet them where they are, and move them toward where you’d like them to be.

    (For much of my college career, I was probably closer to the mean attitude of our students than many of my colleagues were– I played rugby and partied a lot– but I still regularly need to remind myself of this…)

    That’s the advice I offered, with the repeated caveat that they should remember the first item on my list, and be wary of all advice, including mine… Happily, many elements of this were echoed by my fellow panelists, so I think I was basically on the right track. But still: it’s impossible for any one approach to succeed for everyone.

    September 17, 2016

    Jordan EllenbergRoger Ailes, man of not many voices

    From Janet Maslin’s review of Gabriel Sherman’s book about Roger Ailes:

    Among those who did speak on the record to Mr. Sherman is Stephanie Gordon, an actress who in one part of that show dropped the towel she wore. She was asked by Mr. Ailes to come to his office for a Sunday photo session and felt extremely uncomfortable about having to do this for the producer. But she says Mr. Ailes could not have been nicer. He took pictures and later sent her a signed print inscribed: “Don’t throw in the towel, you’re a great actress. Roger Ailes.” But Mr. Sherman also has a story from a woman named Randi Harrison, also on the record, who claims Mr. Ailes offered her a $400-a-week job at NBC, saying: “If you agree to have sex with me whenever I want, I will add an extra hundred dollars a week.”

    These don’t sound like the voices of the same man.

    I think they totally sound like the voices of the same man.  It’s not like someone who sexually harasses one woman can be counted on to sexually harass every single woman within arm’s reach.  Bank robbers don’t rob every single bank!  “Why, I saw that man walk by a bank just the other day without robbing it — the person who told you he was a bank robber must just have been misinterpreting.  Probably he was just making a withdrawal and the teller took it the wrong way.”

    And what’s more:  don’t you think Ailes kind of could have been nicer to Gordon?  Like, a lot nicer?  Look at that exchange again.  He put her in a position where she felt extremely uncomfortable, and declined to sexually assault her on that occasion.  Then he sent her a signed print, on which he wrote a message reminding her that he’d seen her naked body.

    I think both these stories depict a man who sees women as existing mainly for his enjoyment, and a man who takes special pleasure in letting women know he sees them that way.  One man, one voice.


    September 15, 2016

    Tim GowersIn case you haven’t heard what’s going on in Leicester …

    Strangely, this is my second post about Leicester in just a few months, but it’s about something a lot more depressing than the football team’s fairytale winning of the Premier League (but let me quickly offer my congratulations to them for winning their first Champions League match — I won’t offer advice about whether they are worth betting on to win that competition too). News has just filtered through to me that the mathematics department is facing compulsory redundancies.

    The structure of the story is wearily familiar after what happened with USS pensions. The authorities declare that there is a financial crisis, and that painful changes are necessary. They offer a consultation. In the consultation their arguments appear to be thoroughly refuted. The refutation is then ignored and the changes go ahead.

    Here is a brief summary of the painful changes that are proposed for the Leicester mathematics department. The department has 21 permanent research-active staff. Six of those are to be made redundant. There are also two members of staff who concentrate on teaching. Their number will be increased to three. How will the six be chosen? Basically, almost everyone will be sacked and then invited to reapply for their jobs in a competitive process, and the plan is to get rid of “the lowest performers” at each level of seniority. Those lowest performers will be considered for “redeployment” — which means that the university will make efforts to find them a job of a broadly comparable nature, but doesn’t guarantee to succeed. It’s not clear to me what would count as broadly comparable to doing pure mathematical research.

    How is performance defined? It’s based on things like research grants, research outputs, teaching feedback, good citizenship, and “the ongoing and potential for continued career development and trajectory”, whatever that means. In other words, on the typical flawed metrics so beloved of university administrators, together with some subjective opinions that will presumably have to come from the department itself — good luck with offering those without creating enemies for life.

    Oh, and another detail is that they want to reduce the number of straight maths courses and promote actuarial science and service teaching in other departments.

There is a consultation period that started in late August and ends on the 30th of September. So the lucky members of the Leicester mathematics faculty have had a whole month to marshal their to-be-ignored arguments against the changes.

    It’s important to note that mathematics is not the only department that is facing cuts. But it’s equally important to note that it is being singled out: the university is aiming for cuts of 4.5% on average, and mathematics is being asked to make a cut of more like 20%. One reason for this seems to be that the department didn’t score all that highly in the last REF. It’s a sorry state of affairs for a university that used to boast Sir Michael Atiyah as its chancellor.

    I don’t know what can be done to stop this, but at the very least there is a petition you can sign. It would be good to see a lot of signatures, so that Leicester can see how damaging a move like this will be to its reputation.

n-Category Café: Disaster at Leicester

You’ve probably met mathematicians at the University of Leicester, or read their work, or attended their talks, or been to events they’ve organized. Their pure mathematics group includes at least four people working in categorical areas: Frank Neumann, Simona Paoli, Teimuraz Pirashvili and Andy Tonks.

    Now this department is under severe threat. A colleague of mine writes:

    24 members of the Department of Mathematics at the University of Leicester — the great majority of the members of the department — have been informed that their post is at risk of redundancy, and will have to reapply for their positions by the end of September. Only 18 of those applying will be re-appointed (and some of those have been changed to purely teaching positions).

    It’s not only mathematics at stake. The university is apparently on a process of “institutional transformation”, involving:

    the closure of departments, subject areas and courses, including the Vaughan Centre for Lifelong Learning and the university bookshop. Hundreds of academic, academic-related and support staff are to be made redundant, many of them compulsorily.

    If you don’t like this, sign the petition objecting! You’ll see lots of familiar names already on the list (Tim Gowers, John Baez, Ross Street, …). As signatory David Pritchard wrote, “successful departments and universities are hard to build and easy to destroy.”

Jordan Ellenberg: Are the 2016 Orioles the slowest team in baseball history?

    The Orioles are last in the AL in stolen bases, with 17.  They also have the fewest times caught stealing, with 11; they’re so slow they’re not even trying to run.

    But here’s the thing that really jumps out at you.  With just 17 games to play, the Orioles have 6 triples on the season.  And this is a team with power, a team that hits the ball to the deep outfield a lot.  Six triples.  You know what the record is for the fewest triples ever recorded by a team?  11.  By the 1998 Orioles.  This year’s team is like the 1998 squad without the speed machine that was 34-year-old Brady Anderson.   They are going to set a fewest-triples record that may never be broken.


Jordan Ellenberg: Show report: Xenia Rubinos at the Frequency

    Xenia Rubinos is a — ok, what is she?  A singer-songwriter-yeller-wreaker-of-havoc who plays an avant-garde version of R&B with a lot of loud, hectic guitar in it.  I’ve been pronouncing her name “Zenya” but she says “Senia.”  She played to about 100 people at the Frequency last Thursday.  She seems to belong in a much bigger place in front of a much bigger crowd, so much so that it feels a little weird to be right there next to her as she does her frankly pretty amazing thing.  Here’s “Cherry Tree,” from her 2013 debut, still her best song by my lights.  It would be most people’s best song.

This, live, was pretty close to the record.  Other songs weren’t.  Live, I thought she and her band sometimes sounded like Fiery Furnaces, which doesn’t come through on the records.  “Pan Y Cafe”, a fun romp on the album, is much more aggro live.  It’s kind of what the Pixies’ “Spanish songs” would be like if somebody who actually spoke Spanish wrote them.  (She likes the Pixies.)

    Maybe I should make a post about the greatest shows I’ve seen in Madison.  This was one of them.  Who else?  Man Man in 2007.  The Breeders in 2009.  Fatty Acids / Sat Nite Duets in 2012.  I’ll have to think about this more thoroughly.


    September 14, 2016

Particlebites: Inspecting the Higgs with a golden probe

    Hello particle nibblers,

    After recovering from a dead-diphoton-excess induced depression (see here, here, and here for summaries) I am back to tell you a little more about something that actually does exist, our old friend Monsieur Higgs boson. All of the fuss over the past few months over a potential new particle at 750 GeV has perhaps made us forget just how special and interesting the Higgs boson really is, but as more data is collected at the LHC, we will surely be reminded of this fact once again (see Fig.1).

    Figure 1: Monsieur Higgs boson struggles to understand the Higgs mechanism.


Previously I discussed how one of the best and most precise ways to study the Higgs boson is just by ‘shining light on it’, or more specifically via its decays to pairs of photons. Today I want to expand on another fantastic and precise way to study the Higgs which I briefly mentioned previously: Higgs decays to four charged leptons (specifically electrons and muons) shown in Fig.2. This is a channel near and dear to my heart and has a long history because it was realized, way before the Higgs was actually discovered at 125 GeV, to be among the best ways to find a Higgs boson over a large range of potential masses above around 100 GeV. This led to it being dubbed the “gold plated” Higgs discovery mode, or “golden channel”, and in fact it was one of the first channels (along with the diphoton channel) in which the 125 GeV Higgs boson was discovered at the LHC.

    Figure 2: Higgs decays to four leptons are mediated by the various physics effects which can enter in the grey blob. Could new physics be hiding in there?


    One of the characteristics that makes the golden channel so valuable as a probe of the Higgs is that it is very precisely measured by the ATLAS and CMS experiments and has a very good signal to background ratio. Furthermore, it is very well understood theoretically since most of the dominant contributions can be calculated explicitly for both the signal and background. The final feature of the golden channel that makes it valuable, and the one that I will focus on today, is that it contains a wealth of information in each event due to the large number of observables associated with the four final state leptons.

Since there are four charged leptons which are measured and each has an associated four momentum, there are in principle 16 separate numbers which can be measured in each event. However, the masses of the charged leptons are tiny in comparison to the Higgs mass, so we can treat them as massless (see Footnote 1) to a very good approximation. This then reduces (using energy-momentum conservation) the number of observables to 12 which, in the lab frame, are given by the transverse momentum, rapidity, and azimuthal angle of each lepton. Now, Lorentz invariance tells us that physics doesn’t care which frame of reference we pick to analyze the four lepton system. This allows us to perform a Lorentz transformation from the lab frame where the leptons are measured, but where the underlying physics can be obscured, to the much more convenient and intuitive center of mass frame of the four lepton system. Due to energy-momentum conservation, this is also the center of mass frame of the Higgs boson. In this frame the Higgs boson is at rest and the pairs of leptons come out back to back (see Footnote 2).
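The frame change described here is simple enough to sketch in a few lines of Python. The snippet below is an illustrative toy, not code from any experiment, and the lepton four-momenta are invented numbers: it boosts four massless lepton four-vectors from the lab frame into their common center of mass frame, then checks that the total 3-momentum vanishes there while the total energy equals the four-lepton invariant mass.

```python
import math

def boost_to_rest_frame(p4s):
    """Boost four-vectors (E, px, py, pz) into the rest frame of
    their total four-momentum, using the standard Lorentz boost."""
    E, px, py, pz = (sum(c) for c in zip(*p4s))
    bx, by, bz = px / E, py / E, pz / E          # velocity of the system
    b2 = bx * bx + by * by + bz * bz
    gamma = 1.0 / math.sqrt(1.0 - b2)
    g2 = (gamma - 1.0) / b2
    boosted = []
    for (e, x, y, z) in p4s:
        bp = bx * x + by * y + bz * z            # beta . p
        boosted.append((gamma * (e - bp),
                        x + (g2 * bp - gamma * e) * bx,
                        y + (g2 * bp - gamma * e) * by,
                        z + (g2 * bp - gamma * e) * bz))
    return boosted

# Four made-up massless lepton momenta in the lab frame (GeV):
leptons = [(50, 30, 40, 0), (25, 0, -15, 20),
           (30, -18, 0, -24), (13, 5, -12, 0)]
cm = boost_to_rest_frame(leptons)
E, px, py, pz = (sum(c) for c in zip(*cm))

# Invariant mass computed from the lab-frame totals:
E0, px0, py0, pz0 = (sum(c) for c in zip(*leptons))
m = math.sqrt(E0**2 - px0**2 - py0**2 - pz0**2)

print(abs(px) + abs(py) + abs(pz) < 1e-9)   # True: the system is at rest
print(abs(E - m) < 1e-9)                    # True: CM energy = invariant mass
```

In this frame the two lepton-pair four-momenta come out back to back, which is what makes the angular variables described below natural to define.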

In this frame the 12 observables can be divided into 4 production and 8 decay observables (see Footnote 3). The 4 production variables are characterized by the transverse momentum (which has two components), the rapidity, and the azimuthal angle of the four lepton system. The differential spectra for these four variables (especially the transverse momentum and rapidity) depend very much on how the Higgs is produced and are also affected by parton distribution functions at hadron colliders like the LHC. Thus the differential spectra for these variables cannot in general be computed explicitly for Higgs production at the LHC.

The 8 decay observables are characterized by the center of mass energy of the four lepton system, which in this case is equal to the Higgs mass, as well as two invariant masses associated with each pair of leptons (how one picks the pairs is arbitrary). There are also five angles (Θ, θ₁, θ₂, Φ, Φ₁) shown in Fig. 3 for a particular choice of lepton pairings. The angle Θ is defined as the angle between the beam axis (labeled by p or z) and the axis defined to be in the direction of the momentum of one of the lepton pair systems (labeled by Z1 or z’). This angle also defines the ‘production plane’. The angles θ₁, θ₂ are the polar angles defined in the lepton pair rest frames. The angle Φ₁ is the azimuthal angle between the production plane and the plane formed from the four vectors of one of the lepton pairs (in this case the muon pair). Finally Φ is defined as the azimuthal angle between the decay planes formed out of the two lepton pairs.
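To make the plane definitions concrete, here is a small Python sketch that computes cos Θ and Φ from the lepton 3-momenta in the Higgs rest frame, building each decay-plane normal as a cross product. The momenta are hypothetical, and the sign convention is just one of several used in the literature.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def unit(a):
    n = math.sqrt(dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)

def decay_angles(p1m, p1p, p2m, p2p, beam=(0.0, 0.0, 1.0)):
    """3-momenta of the leptons (l1-, l1+, l2-, l2+) in the Higgs rest
    frame -> (cos Theta, Phi).  One common sign convention."""
    z1 = unit((p1m[0] + p1p[0], p1m[1] + p1p[1], p1m[2] + p1p[2]))  # z' axis
    cos_big_theta = dot(unit(beam), z1)
    n1 = unit(cross(p1m, p1p))          # normal to the first decay plane
    n2 = unit(cross(p2m, p2p))          # normal to the second decay plane
    # atan2 keeps the sign of the angle between the two planes:
    return cos_big_theta, math.atan2(dot(cross(n1, n2), z1), dot(n1, n2))

# Toy configuration: first pair along +z, second along -z, and the two
# decay planes rotated by 90 degrees relative to each other.
ct, phi = decay_angles((1, 0, 2), (-1, 0, 2), (0, 1, -2), (0, -1, -2))
# Here the pair axis lies along the beam, so cos Theta = 1, and the
# right-angle planes give |Phi| = pi/2.
```

The back-to-back structure of the two pairs in this frame is what lets a single axis (z’) serve both the Θ and Φ definitions.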

Figure 3: Angular center of mass observables (Θ, θ₁, θ₂, Φ, Φ₁) in Higgs to four lepton decays.


    To a good approximation these decay observables are independent of how the Higgs boson is produced. Furthermore, unlike the production variables, the fully differential spectra for the decay observables can be computed explicitly and even analytically. Each of them contains information about the properties of the Higgs boson as do the correlations between them. We see an example of this in Fig. 4 where we show the one dimensional (1D) spectrum for the Φ variable under various assumptions about the CP properties of the Higgs boson.

    Figure 4: Here I show various examples for the Φ differential spectrum assuming different possibilities for the CP properties of the Higgs boson.


This variable has long been known to be sensitive to the CP properties of the Higgs boson. An effect like CP violation would show up as an asymmetry in this Φ distribution, which we can see in curve number 5 shown in orange. Keep in mind that although I show a 1D spectrum for Φ, the Higgs to four lepton decay is a multidimensional differential spectrum of the 8 decay observables and all of their correlations. Thus, though we can already see from a 1D projection for Φ how information about the Higgs is contained in these distributions, MUCH more information is contained in the fully differential decay width of Higgs to four lepton decays. This makes the golden channel a powerful probe of the detailed properties of the Higgs boson.
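As a toy illustration of that asymmetry (not a simulation of real data; the spectrum shape and the coefficient 0.3 are invented), take a 1D spectrum f(Φ) = 1 + b·sin Φ on [-π, π]. The sin Φ piece is odd under Φ → -Φ, so a nonzero b produces an excess of events at positive Φ over negative Φ, which the snippet below quantifies by numerical integration:

```python
import math

def phi_asymmetry(b, n=100_000):
    """Asymmetry A = (N+ - N-)/(N+ + N-) of f(Phi) = 1 + b*sin(Phi)
    between Phi > 0 and Phi < 0, via midpoint-rule integration."""
    step = math.pi / n
    plus  = sum(1.0 + b * math.sin((i + 0.5) * step) for i in range(n)) * step
    minus = sum(1.0 + b * math.sin(-(i + 0.5) * step) for i in range(n)) * step
    return (plus - minus) / (plus + minus)

print(phi_asymmetry(0.0))              # a symmetric spectrum gives 0.0
print(round(phi_asymmetry(0.3), 3))    # analytic value is 0.6/pi ~ 0.191
```

A real analysis fits the fully differential distribution rather than one projection, but the toy shows the basic point: CP-even terms are even in Φ, CP-odd terms are odd, and an integrated asymmetry isolates the latter.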

    OK nibblers, hopefully I have given you a flavor of the golden channel and why it is valuable as a probe of the Higgs boson. In a future post I will discuss in more detail the various types of physics effects which can enter in the grey blob in Fig. 2. Until then, keep nibbling and don’t let dead diphotons get you down!

Footnote 1: If you are feeling uneasy about the fact that the Higgs can only “talk to” particles with mass and yet can decay to four massless (at least approximately) leptons, keep in mind they do not interact directly. The Higgs decay to four charged leptons is mediated by intermediate particles which DO talk to the Higgs and charged leptons.

    Footnote 2: More precisely, in the Higgs rest frame, the four vector formed out of the sum of the two four vectors of any pair of leptons which are chosen will be back to back with the four vector formed out of the sum of the second pair of leptons.

    Footnote 3: This dividing into production and decay variables after transforming to the four lepton system center of mass frame (i.e. Higgs rest frame) is only possible in practice because all four leptons are visible and their four momentum can be reconstructed with very good precision at the LHC. This then allows for the rest frame of the Higgs boson to be reconstructed on an event by event basis. For final states with missing energy or jets which can not be reconstructed with high precision, transforming to the Higgs rest frame is in general not possible.

John Baez: The Circular Electron Positron Collider

Chen-Ning Yang is perhaps China’s most famous particle physicist. Together with Tsung-Dao Lee, he won the Nobel prize in 1957 for discovering that the laws of physics know the difference between left and right. He helped create Yang–Mills theory: the theory that describes all the forces in nature except gravity. He helped find the Yang–Baxter equation, which describes what particles do when they move around on a thin sheet of matter, tracing out braids.

    Right now the world of particle physics is in a shocked, somewhat demoralized state because the Large Hadron Collider has not yet found any physics beyond the Standard Model. Some Chinese scientists want to forge ahead by building an even more powerful, even more expensive accelerator.

But Yang recently came out against this. This is a big deal, because he is very prestigious, and only China has the will to pay for the next machine. The director of the Chinese institute that wants to build the next machine, Wang Yifang, issued a point-by-point rebuttal of Yang the very next day.

    Over on G+, Willie Wong translated some of Wang’s rebuttal in some comments to my post on this subject. The real goal of my post here is to make this translation a bit easier to find—not because I agree with Wang, but because this discussion is important: it affects the future of particle physics.

    First let me set the stage. In 2012, two months after the Large Hadron Collider found the Higgs boson, the Institute of High Energy Physics proposed a bigger machine: the Circular Electron Positron Collider, or CEPC.

    This machine would be a ring 100 kilometers around. It would collide electrons and positrons at an energy of 250 GeV, about twice what you need to make a Higgs. It could make lots of Higgs bosons and study their properties. It might find something new, too! Of course that would be the hope.

    It would cost $6 billion, and the plan was that China would pay for 70% of it. Nobody knows who would pay for the rest.

    According to Science:

    On 4 September, Yang, in an article posted on the social media platform WeChat, says that China should not build a supercollider now. He is concerned about the huge cost and says the money would be better spent on pressing societal needs. In addition, he does not believe the science justifies the cost: The LHC confirmed the existence of the Higgs boson, he notes, but it has not discovered new particles or inconsistencies in the standard model of particle physics. The prospect of an even bigger collider succeeding where the LHC has failed is “a guess on top of a guess,” he writes. Yang argues that high-energy physicists should eschew big accelerator projects for now and start blazing trails in new experimental and theoretical approaches.

That same day, IHEP’s director, Wang Yifang, posted a point-by-point rebuttal on the institute’s public WeChat account. He criticized Yang for rehashing arguments he had made in the 1970s against building the BEPC. “Thanks to comrade [Deng] Xiaoping,” who didn’t follow Yang’s advice, Wang wrote, “IHEP and the BEPC … have achieved so much today.” Wang also noted that the main task of the CEPC would not be to find new particles, but to carry out detailed studies of the Higgs boson.

    Yang did not respond to request for comment. But some scientists contend that the thrust of his criticisms are against the CEPC’s anticipated upgrade, the Super Proton-Proton Collider (SPPC). “Yang’s objections are directed mostly at the SPPC,” says Li Miao, a cosmologist at Sun Yat-sen University, Guangzhou, in China, who says he is leaning toward supporting the CEPC. That’s because the cost Yang cites—$20 billion—is the estimated price tag of both the CEPC and the SPPC, Li says, and it is the SPPC that would endeavor to make discoveries beyond the standard model.

    Still, opposition to the supercollider project is mounting outside the high-energy physics community. Cao Zexian, a researcher at CAS’s Institute of Physics here, contends that Chinese high-energy physicists lack the ability to steer or lead research in the field. China also lacks the industrial capacity for making advanced scientific instruments, he says, which means a supercollider would depend on foreign firms for critical components. Luo Huiqian, another researcher at the Institute of Physics, says that most big science projects in China have suffered from arbitrary cost cutting; as a result, the finished product is often a far cry from what was proposed. He doubts that the proposed CEPC would be built to specifications.

    The state news agency Xinhua has lauded the debate as “progress in Chinese science” that will make big science decision-making “more transparent.” Some, however, see a call for transparency as a bad omen for the CEPC. “It means the collider may not receive the go-ahead in the near future,” asserts Institute of Physics researcher Wu Baojun. Wang acknowledged that possibility in a 7 September interview with Caijing magazine: “opposing voices naturally have an impact on future approval of the project,” he said.

Willie Wong prefaced his translation of Wang’s rebuttal with this:

    Here is a translation of the essential parts of the rebuttal; some standard Chinese language disclaimers of deference etc are omitted. I tried to make the translation as true to the original as possible; the viewpoints expressed are not my own.

    Here is the translation:

Today (September 4) saw the publication of the article by CN Yang titled “China should not build an SSC today”. As a scientist who works on the front line of high energy physics and the current director of the high energy physics institute in the Chinese Academy of Sciences, I cannot agree with his viewpoint.

(A) Dr. Yang’s first objection is that a supercollider is a bottomless pit. His objection stems from the American SSC wasting 3 billion US dollars and amounting to naught. The LHC cost over 10 billion US dollars. Thus the proposed Chinese accelerator cannot cost less than 20 billion US dollars, with no guaranteed returns. [Ed: emphasis original]

    Here, there are actually three problems. The first is “why did SSC fail”? The second is “how much would a Chinese collider cost?” And the third is “is the estimate reasonable and realistic?” Here I address them point by point.

    (1) Why did the American SSC fail? Are all colliders bottomless pits?

The many reasons leading to the failure of the American SSC include the government deficit at the time, the fight for funding against the International Space Station, the party politics of the United States, and the regional competition between Texas and other states. Additionally, there were problems with poor management, bad budgeting, ballooning construction costs, and a failure to secure international collaboration. See references [2,3] [Ed: consult original article for references; items 1-3 are English language]. In reality, “exceeding the budget” is definitely not the primary reason for the failure of the SSC; rather, the failure should be attributed to some special and circumstantial reasons, caused mainly by political elements.

For the US, abandoning the SSC was a very incorrect decision. It lost the US the chance of discovering the Higgs boson, as well as the foundations and opportunities for future development, and thereby also the leadership position that the US had occupied internationally in high energy physics until then. This definitely had a very negative impact on big science initiatives in the US, and caused one generation of Americans to lose the courage to dream. The reasons given by the American scientific community against the SSC are very similar to what we hear today against the Chinese collider project. But actually the cancellation of the SSC did not increase funding to other scientific endeavors. Of course, going ahead with the SSC would not have reduced the funding to other scientific endeavors either, and many people who objected to the project now regret it.

Since then, the LHC was constructed in Europe, and achieved great success. Even though its construction exceeded its original budget, it did not do so by a lot. This shows that supercollider projects do not have to be bottomless pits, and have a chance to succeed.

The Chinese political landscape is entirely different from that of the US. In particular, for large scale constructions, the political system is superior. China has already accomplished many tasks which the Americans would not, or could not, do; many more will happen in the future. The failure of the SSC doesn’t mean that we cannot do it. We should scientifically analyze the situation, and at the same time foster international collaboration and properly manage the budget.

(2) How much would it cost? Our planned collider (using a circumference of 100 kilometers for computations) will proceed in two steps. [Ed: details omitted. The author estimated that the electron-positron collider will cost 40 billion Yuan, followed by the proton-proton collider which will cost 100 billion Yuan, not accounting for inflation, with approximately 10 year construction time for each phase.] The two-phase planning is meant to showcase the scientific longevity of the project, especially the entrainment of other technical development (e.g. high energy superconductors), and the fact that the second phase [ed: the proton-proton collider] is complementary to the scientific and technical developments of the first phase. The reason that the second phase designs are incorporated in the discussion is to prevent the scenario where design elements of the first phase inadvertently shut off the possibility of further expansion in the second phase.

    (3) Is this estimate realistic? Are we going to go down the same road as the American SSC?

First, note that in the past 50 years, there were many successful colliders internationally (LEP, LHC, PEPII, KEKB/SuperKEKB etc) and many unsuccessful ones (ISABELLE, SSC, FAIR, etc). The failed ones were all proton accelerators; all electron colliders have been successful. The main reason is that proton accelerators are more complicated, and it is harder to correctly estimate the costs related to constructing machines beyond the current frontiers.

There are many successful large-scale constructions in China. In the 40 years since the founding of the high energy physics institute, we’ve built [list of high energy experiment facilities, I don’t know all their names in English], each costing over 100 million Yuan, and none more than 5% over budget in terms of actual costs of construction, time to completion, and meeting milestones. We have well developed expertise in budgeting, construction, and management.

    For the CEPC (electron-positron collider) our estimates relied on two methods:

    (i) Summing of the parts: separately estimating costs of individual elements and adding them up.

    (ii) Comparisons: using costs for elements derived from costs of completed instruments both domestically and abroad.

    At the level of the total cost and at the systems level, the two methods should produce cost estimates within 20% of each other.

After completing the initial design [ref. 1], we produced a list of more than 1000 pieces of required equipment, and based our estimates on that list. The estimates were reviewed by local and international experts.

For the SPPC (the proton-proton collider; second phase) we only used the second method (comparison). This is because the second phase is not the main mission at hand, and we are not yet sure whether we should commit to it. It is therefore not very meaningful to discuss its potential cost right now. We are committed to only building the SPPC once we are sure the science and the technology are mature.

    (B) The second reason given by Dr. Yang is that China is still a developing country, and there are many social-economic problems that should be solved before considering a supercollider.

Any country, especially one as big as China, must consider both the immediate and the long-term in its planning. Of course social-economic problems need to be solved, and indeed solving them currently takes the lion’s share of our national budget. But we also need to consider the long term, including an appropriate amount of expenditure on basic research, to enable our continuous development and the potential to lead the world. China at the end of the Qing dynasty had a rich populace with the world’s highest GDP. But even though the government had the ability to purchase armaments, the lack of scientific understanding reduced the country to always being on the losing side of wars.

In the past few hundred years, developments in understanding the structure of matter, from molecules and atoms to the nucleus and the elementary particles, all contributed to and led the scientific developments of their eras. High energy physics pursues the finest structure of matter and its laws, and the techniques used cover many different fields, from accelerators and detectors to low temperature, superconducting, microwave, high frequency, vacuum, electronic, high precision instrumentation, automatic control, computer science and networking technologies, in many ways leading the developments in those fields and their broad adoption. This is an indicator field in basic science and technical development. Building the supercollider can result in China occupying the leadership position in such diverse scientific fields for several decades, and also lead to the domestic production of many of the important scientific and technical instruments. Furthermore, it will allow us to attract international intellectual capital, and allow the training of thousands of world-leading specialists in our institutes. How is this not an urgent need for the country?

In fact, the impression the Chinese government and the average Chinese people create for the world at large is of a populace with lots of money, and also infatuated with money. It is hard for a large country to have an international voice and influence without significant contributions to human culture. This influence, in turn, affects the benefits China receives from other countries. In terms of current GDP, the cost of the proposed project (including also the phase 2 SPPC) does not exceed that of the Beijing positron-electron collider completed in the 80s, and is in fact lower than those of LEP, LHC, SSC, and ILC.

Designing and starting the construction of the next supercollider within the next 5 years is a rare opportunity to let us achieve a leadership position internationally in the field of high energy physics. First, the newly discovered Higgs boson has a relatively low mass, which allows us to probe it further using a circular positron-electron collider; furthermore, such colliders have a chance to be modified into proton colliders, so this facility will have over 5 decades of scientific use. Secondly, Europe, the US, and Japan all already have scientific items on their agendas, and probably cannot construct similar facilities within 20 years. This gives us an advantage in competitiveness. Thirdly, we already have the experience of building the Beijing positron-electron collider, so such a facility is within our strengths. The window of opportunity typically lasts only 10 years; if we miss it, we don’t know when the next window will be. Furthermore, we have extensive experience in underground construction, and the Chinese economy is currently at a stage of high growth. We have the ability to do the construction and also the scientific need. Therefore a supercollider is a very suitable item to consider.

    (C) The third reason given by Dr. Yang is that constructing a supercollider necessarily excludes funding other basic sciences.

China currently spends 5% of its total R&D budget on basic research; internationally 15% is more typical for developed countries. As a developing country aiming to join the ranks of developed countries, and as a large country, I believe we should aim to raise the ratio gradually to 10% and eventually to 15%. In terms of numbers, funding for basic science has a large potential for growth (around 100 billion yuan per annum) without taking away from other basic science research.

On the other hand, where should the increased funding be directed? Everyone knows that a large portion of our basic science research budgets is spent on purchasing scientific instruments, especially from international sources. If we evenly distribute the increased funding among all basic science fields, the end result is raising the GDP of the US, Europe, and Japan. If we instead spend 10 years putting 30 billion Yuan into accelerator science, more than 90% of the money will remain in the country, improving our technical development and the market share of domestic companies. This will also allow us to train many new scientists and engineers, and greatly improve the state of the art in domestically produced scientific instruments.

    In addition, putting emphasis on high energy physics will only bring us up to the normal international funding level (it is a fact that particle physics and nuclear physics are severely underfunded in China). For the purpose of developing a world-leading big science project, CEPC is a very good candidate. And it does not conflict with a desire to also develop other basic sciences.

    (D) Dr. Yang’s fourth objection is that both supersymmetry and quantum gravity have not been verified, and that the particles we hope to discover with the new collider may in fact not exist.

    That is of course not the goal of collider science. In [ref 1], which I gave to Dr. Yang myself, we clearly discussed the scientific purpose of the instrument. Briefly speaking, the standard model is only an effective theory in the low energy limit, and a new and deeper theory is needed. Even though there is some experimental evidence beyond the standard model, more data will be needed to indicate the correct direction in which to develop the theory. Of the known problems with the standard model, most are related to the Higgs boson. Thus a deeper physical theory should be hinted at by a better understanding of the Higgs boson. CEPC can probe the Higgs boson to 1% precision [ed. I am not sure what this means], 10 times better than the LHC. From this we have the hope of correctly identifying various properties of the Higgs boson, and testing whether it in fact matches the standard model. At the same time, CEPC has the possibility of measuring the self-coupling of the Higgs boson, and of understanding the Higgs contribution to the vacuum phase transition, which is important for understanding the early universe. [Ed. in this previous sentence, the translations are a bit questionable since some HEP jargon is used with which I am not familiar] Therefore, regardless of whether the LHC has discovered new physics, CEPC is necessary.

    If there are new coupling mechanisms for the Higgs, new associated particles, a composite structure for the Higgs boson, or other differences from the standard model, we can continue with the second phase, the proton-proton collider, to directly probe the difference. Of course this could be due to supersymmetry, but it could also be due to other particles. For us experimentalists, while we care about theoretical predictions, our experiments are not designed only for them. To predict at this moment in time whether a collider can or cannot discover a hypothetical particle seems premature, and is not the viewpoint of the HEP community in general.

    (E) The fifth objection is that in the past 70 years high energy physics has not led to tangible improvements for humanity, and in the future likely will not.

    In the past 70 years, there have been many results from high energy physics that led to techniques now common in everyday life. [Ed: the list of examples includes synchrotron radiation, free electron lasers, spallation neutron sources, MRI, PET, radiation therapy, touch screens, smart phones, and the world-wide web. I omit the prose.]

    [Ed. Author proceeds to discuss hypothetical economic benefits from
    a) superconductor science
    b) microwave source
    c) cryogenics
    d) electronics
    sort of the usual stuff you see in funding proposals.]

    (F) The sixth reason was that the Institute of High Energy Physics of the Chinese Academy of Sciences has not produced much in the past 30 years. The major scientific contributions at the proposed collider will be directed by non-Chinese scientists, and so the Nobel will also go to a non-Chinese.

    [Ed. I’ll skip this section because it is a self-congratulatory pat on one’s back (we actually did pretty well for the amount of money invested), a promise to promote Chinese participation in the project (in accordance with the economic investment), and the required comment that “we do science for the sake of science, and not for winning the Nobel.”]

    (G) The seventh reason is that the future of HEP lies in developing new techniques to accelerate particles and in developing a geometric theory, not in building large accelerators.

    A new method of accelerating particles is definitely an important direction for accelerator science. In the next several decades it may prove useful for scattering experiments or for applied fields where beam confinement is not essential. For high energy colliders, however, in terms of beam emittance and energy efficiency, new acceleration principles have a long way to go. In the meantime, high energy physics cannot simply be put on hold. As for “geometric theory” or “string theory”, these are too far from being experimentally approachable, and are not problems we can consider currently.

    People disagree on the future of high energy physics. Currently there are no Chinese winners of the Nobel prize in physics, but there are many internationally. Dr. Yang’s viewpoints are clearly out of the mainstream, not just currently, but over the past several decades. Dr. Yang is documented to have held a pessimistic view of high energy physics and its future since the 60s, which is how he missed out on the discovery of the standard model. He has been on record as opposing Chinese collider science since the 70s. It is fortunate that the government supported the Institute of High Energy Physics and constructed various supporting facilities, leading to our current achievements in synchrotron radiation and neutron scattering. For the future, we should listen to the younger scientists at the forefront of current research, for that is how we can gain international recognition for our scientific research.

    It will be very interesting to see how this plays out.

    John BaezStruggles with the Continuum (Part 5)

    Quantum field theory is the best method we have for describing particles and forces in a way that takes both quantum mechanics and special relativity into account. It makes many wonderfully accurate predictions. And yet, it has embroiled physics in some remarkable problems: struggles with infinities!

    I want to sketch some of the key issues in the case of quantum electrodynamics, or ‘QED’. The history of QED has been nicely told here:

    • Silvan S. Schweber, QED and the Men Who Made It: Dyson, Feynman, Schwinger, and Tomonaga, Princeton U. Press, Princeton, 1994.

    Instead of explaining the history, I will give a very simplified account of the current state of the subject. I hope that experts forgive me for cutting corners and trying to get across the basic ideas at the expense of many technical details. The nonexpert is encouraged to fill in the gaps with the help of some textbooks.

    QED involves just one dimensionless parameter, the fine structure constant:

    \displaystyle{ \alpha = \frac{1}{4 \pi \epsilon_0} \frac{e^2}{\hbar c} \approx \frac{1}{137.036} }

    Here e is the electron charge, \epsilon_0 is the permittivity of the vacuum, \hbar is Planck’s constant and c is the speed of light. We can think of \alpha^{1/2} as a dimensionless version of the electron charge. It says how strongly electrons and photons interact.

    Nobody knows why the fine structure constant has the value it does! In computations, we are free to treat it as an adjustable parameter. If we set it to zero, quantum electrodynamics reduces to a free theory, where photons and electrons do not interact with each other. A standard strategy in QED is to take advantage of the fact that the fine structure constant is small and expand answers to physical questions as power series in \alpha^{1/2}. This is called ‘perturbation theory’, and it allows us to exploit our knowledge of free theories.
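As a quick numerical sanity check (my own addition; the CODATA constant values below are inputs I am supplying, not numbers quoted in the post), one can plug the definitions into the formula for \alpha:

```python
# Verify alpha = e^2 / (4 pi eps0 hbar c) ≈ 1/137.036 using assumed
# CODATA values for the constants (SI units).
import math

e = 1.602176634e-19      # electron charge, C
eps0 = 8.8541878128e-12  # vacuum permittivity, F/m
hbar = 1.054571817e-34   # reduced Planck constant, J s
c = 299792458.0          # speed of light, m/s

alpha = e**2 / (4 * math.pi * eps0 * hbar * c)
print(round(1 / alpha, 3))  # 137.036
```

Dimensionally, the charges cancel against \epsilon_0 and the result is a pure number, which is why \alpha (and not e itself) is the natural expansion parameter.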

    One of the main questions we try to answer in QED is this: if we start with some particles with specified energy-momenta in the distant past, what is the probability that they will turn into certain other particles with certain other energy-momenta in the distant future? As usual, we compute this probability by first computing a complex amplitude and then taking the square of its absolute value. The amplitude, in turn, is computed as a power series in \alpha^{1/2}.

    The term of order \alpha^{n/2} in this power series is a sum over Feynman diagrams with n vertices. For example, suppose we are computing the amplitude for two electrons with some specified energy-momenta to interact and become two electrons with some other energy-momenta. One Feynman diagram appearing in the answer is this:

    Here the electrons exchange a single photon. Since this diagram has two vertices, it contributes a term of order \alpha. The electrons could also exchange two photons:

    giving a term of order \alpha^2. A more interesting term of order \alpha^2 is this:

    Here the electrons exchange a photon that splits into an electron-positron pair and then recombines. There are infinitely many diagrams with two electrons coming in and two going out. However, there are only finitely many with n vertices. Each of these contributes a term proportional to \alpha^{n/2} to the amplitude.

    In general, the external edges of these diagrams correspond to the experimentally observed particles coming in and going out. The internal edges correspond to ‘virtual particles’: that is, particles that are not directly seen, but appear in intermediate steps of a process.

    Each of these diagrams is actually a notation for an integral! There are systematic rules for writing down the integral starting from the Feynman diagram. To do this, we first label each edge of the Feynman diagram with an energy-momentum, a variable p \in \mathbb{R}^4. The integrand, which we shall not describe here, is a function of all these energy-momenta. In carrying out the integral, the energy-momenta of the external edges are held fixed, since these correspond to the experimentally observed particles coming in and going out. We integrate over the energy-momenta of the internal edges, which correspond to virtual particles, while requiring that energy-momentum is conserved at each vertex.

    However, there is a problem: the integral typically diverges! Whenever a Feynman diagram contains a loop, the energy-momenta of the virtual particles in this loop can be arbitrarily large. Thus, we are integrating over an infinite region. In principle the integral could still converge if the integrand goes to zero fast enough. However, we rarely have such luck.

    What does this mean, physically? It means that if we allow virtual particles with arbitrarily large energy-momenta in intermediate steps of a process, there are ‘too many ways for this process to occur’, so the amplitude for this process diverges.

    Ultimately, the continuum nature of spacetime is to blame. In quantum mechanics, particles with large momenta are the same as waves with short wavelengths. Allowing light with arbitrarily short wavelengths created the ultraviolet catastrophe in classical electromagnetism. Quantum electromagnetism averted that catastrophe—but the problem returns in a different form as soon as we study the interaction of photons and charged particles.

    Luckily, there is a strategy for tackling this problem. The integrals for Feynman diagrams become well-defined if we impose a ‘cutoff’, integrating only over energy-momenta p in some bounded region, say a ball of some large radius \Lambda. In quantum theory, a particle with momentum of magnitude greater than \Lambda is the same as a wave with wavelength less than \hbar/\Lambda. Thus, imposing the cutoff amounts to ignoring waves of short wavelength—and for the same reason, ignoring waves of high frequency. We obtain well-defined answers to physical questions when we do this. Unfortunately the answers depend on \Lambda, and if we let \Lambda \to \infty, they diverge.

    However, this is not the correct limiting procedure. Indeed, among the quantities that we can compute using Feynman diagrams are the charge and mass of the electron! Its charge can be computed using diagrams in which an electron emits or absorbs a photon:

    Similarly, its mass can be computed using a sum over Feynman diagrams where one electron comes in and one goes out.

    The interesting thing is this: to do these calculations, we must start by assuming some charge and mass for the electron—but the charge and mass we get out of these calculations do not equal the masses and charges we put in!

    The reason is that virtual particles affect the observed charge and mass of a particle. Heuristically, at least, we should think of an electron as surrounded by a cloud of virtual particles. These contribute to its mass and ‘shield’ its electric field, reducing its observed charge. It takes some work to translate between this heuristic story and actual Feynman diagram calculations, but it can be done.

    Thus, there are two different concepts of mass and charge for the electron. The numbers we put into the QED calculations are called the ‘bare’ charge and mass, e_\mathrm{bare} and m_\mathrm{bare}. Poetically speaking, these are the charge and mass we would see if we could strip the electron of its virtual particle cloud and see it in its naked splendor. The numbers we get out of the QED calculations are called the ‘renormalized’ charge and mass, e_\mathrm{ren} and m_\mathrm{ren}. These are computed by doing a sum over Feynman diagrams. So, they take virtual particles into account. These are the charge and mass of the electron clothed in its cloud of virtual particles. It is these quantities, not the bare quantities, that should agree with experiment.

    Thus, the correct limiting procedure in QED calculations is a bit subtle. For any value of \Lambda and any choice of e_\mathrm{bare} and m_\mathrm{bare}, we compute e_\mathrm{ren} and m_\mathrm{ren}. The necessary integrals all converge, thanks to the cutoff. We choose e_\mathrm{bare} and m_\mathrm{bare} so that e_\mathrm{ren} and m_\mathrm{ren} agree with the experimentally observed charge and mass of the electron. The bare charge and mass chosen this way depend on \Lambda, so call them e_\mathrm{bare}(\Lambda) and m_\mathrm{bare}(\Lambda).

    Next, suppose we want to compute the answer to some other physics problem using QED. We do the calculation with a cutoff \Lambda, using e_\mathrm{bare}(\Lambda) and m_\mathrm{bare}(\Lambda) as the bare charge and mass in our calculation. Then we take the limit \Lambda \to \infty.

    In short, rather than simply fixing the bare charge and mass and letting \Lambda \to \infty, we cleverly adjust the bare charge and mass as we take this limit. This procedure is called ‘renormalization’, and it has a complex and fascinating history:

    • Laurie M. Brown, ed., Renormalization: From Lorentz to Landau (and Beyond), Springer, Berlin, 2012.

    There are many technically different ways to carry out renormalization, and our account so far neglects many important issues. Let us mention three of the simplest.

    First, besides the classes of Feynman diagrams already mentioned, we must also consider those where one photon goes in and one photon goes out, such as this:

    These affect properties of the photon, such as its mass. Since we want the photon to be massless in QED, we have to adjust parameters as we take \Lambda \to \infty to make sure we obtain this result. We must also consider Feynman diagrams where nothing comes in and nothing comes out—so-called ‘vacuum bubbles’—and make these behave correctly as well.

    Second, the procedure just described, where we impose a ‘cutoff’ and integrate over energy-momenta p lying in a ball of radius \Lambda, is not invariant under Lorentz transformations. Indeed, any theory featuring a smallest time or smallest distance violates the principles of special relativity: thanks to time dilation and Lorentz contractions, different observers will disagree about times and distances. We could accept that Lorentz invariance is broken by the cutoff and hope that it is restored in the \Lambda \to \infty limit, but physicists prefer to maintain symmetry at every step of the calculation. This requires some new ideas: for example, replacing Minkowski spacetime with 4-dimensional Euclidean space. In 4-dimensional Euclidean space, Lorentz transformations are replaced by rotations, and a ball of radius \Lambda is a rotation-invariant concept. To do their Feynman integrals in Euclidean space, physicists often let time take imaginary values. They do their calculations in this context and then transfer the results back to Minkowski spacetime at the end. Luckily, there are theorems justifying this procedure.

    Third, besides infinities that arise from waves with arbitrarily short wavelengths, there are infinities that arise from waves with arbitrarily long wavelengths. The former are called ‘ultraviolet divergences’. The latter are called ‘infrared divergences’, and they afflict theories with massless particles, like the photon. For example, in QED the collision of two electrons will emit an infinite number of photons with very long wavelengths and low energies, called ‘soft photons’. In practice this is not so bad, since any experiment can only detect photons with energies above some nonzero value. However, infrared divergences are conceptually important. It seems that in QED any electron is inextricably accompanied by a cloud of soft photons. These are real, not virtual particles. This may have remarkable consequences.

    Battling these and many other subtleties, many brilliant physicists and mathematicians have worked on QED. The good news is that this theory has been proved to be ‘perturbatively renormalizable’:

    • J. S. Feldman, T. R. Hurd, L. Rosen and J. D. Wright, QED: A Proof of Renormalizability, Lecture Notes in Physics 312, Springer, Berlin, 1988.

    • Günter Scharf, Finite Quantum Electrodynamics: The Causal Approach, Springer, Berlin, 1995

    This means that we can indeed carry out the procedure roughly sketched above, obtaining answers to physical questions as power series in \alpha^{1/2}.

    The bad news is we do not know if these power series converge. In fact, it is widely believed that they diverge! This puts us in a curious situation.

    For example, consider the magnetic dipole moment of the electron. An electron, being a charged particle with spin, has a magnetic field. A classical computation says that its magnetic dipole moment is

    \displaystyle{ \vec{\mu} = -\frac{e}{2m_e} \vec{S} }

    where \vec{S} is its spin angular momentum. Quantum effects correct this computation, giving

    \displaystyle{ \vec{\mu} = -g \frac{e}{2m_e} \vec{S} }

    for some constant g called the gyromagnetic ratio, which can be computed using QED as a sum over Feynman diagrams with an electron exchanging a single photon with a massive charged particle:

    The answer is a power series in \alpha^{1/2}, but since all these diagrams have an even number of vertices, it only contains integral powers of \alpha. The lowest-order term gives simply g = 2. In 1948, Julian Schwinger computed the next term and found a small correction to this simple result:

    \displaystyle{ g = 2 + \frac{\alpha}{\pi} \approx 2.00232 }
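Numerically, Schwinger's one-loop correction is easy to reproduce (a one-line sketch of my own; the value of \alpha used is the CODATA number, not something computed in the post):

```python
# Schwinger's correction to the electron gyromagnetic ratio: g = 2 + alpha/pi.
# The value of alpha is an assumed CODATA input.
import math

alpha = 0.0072973525693  # fine structure constant (assumed CODATA value)
g = 2 + alpha / math.pi
print(round(g, 5))  # 2.00232
```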

    By now a team led by Toichiro Kinoshita has computed g up to order \alpha^5. This requires computing over 13,000 integrals, one for each Feynman diagram of the above form with up to 10 vertices! The answer agrees very well with experiment: in fact, if we also take other Standard Model effects into account we get agreement to roughly one part in 10^{12}.

    This is the most accurate prediction in all of science.

    However, as mentioned, it is widely believed that this power series diverges! Next time I’ll explain why physicists think this, and what it means for a divergent series to give such a good answer when you add up the first few terms.

    September 13, 2016

    ResonaancesNext stop: tth

    This was a summer of brutally dashed hopes for a quick discovery of the many fundamental particles we had been imagining. For the time being we need to focus on the ones that actually exist, such as the Higgs boson. In the Run-1 of the LHC, the Higgs existence and identity were firmly established, while its mass and basic properties were measured. The signal was observed with large significance in 4 different decay channels (γγ, ZZ*, WW*, ττ), and two different production modes (gluon fusion, vector-boson fusion) have been isolated. Still, there remain many fine details to sort out. The realistic goal for the Run-2 is to pinpoint the following Higgs processes:
    • (h→bb): Decays to b-quarks.
    • (Vh): Associated production with W or Z boson. 
    • (tth): Associated production with top quarks. 

    It seems that the last objective may be achieved quicker than expected. The tth production process is very interesting theoretically, because its rate is proportional to the (square of the) Yukawa coupling between the Higgs boson and top quarks. Within the Standard Model, the value of this parameter is known to a good accuracy, as it is related to the mass of the top quark. But that relation can be  disrupted in models beyond the Standard Model, with the two-Higgs-doublet model and composite/little Higgs models serving as prominent examples. Thus, measurements of the top Yukawa coupling will provide a crucial piece of information about new physics.

    In the Run-1, a not-so-small signal of tth production was observed by the ATLAS and CMS collaborations in several channels. Assuming that Higgs decays have the same branching fraction as in the Standard Model, the tth signal strength normalized to the Standard Model prediction was estimated as

    At face value, strong evidence for tth production was obtained in the Run-1! This fact was not advertised by the collaborations because the measurement is not clean, due to the large number of top quarks produced by other processes at the LHC. The tth signal is thus a small blip on top of a huge background, and it's not excluded that some unaccounted-for systematic errors are skewing the measurements. The collaborations thus preferred to play it safe, and wait for more data to be collected.

    In the Run-2 with 13 TeV collisions the tth production cross section is 4 times larger than in the Run-1, therefore the new data are coming at a fast pace. Both ATLAS and CMS presented their first Higgs results in early August, and the tth signal is only getting stronger. ATLAS showed their measurements in the γγ, WW/ττ, and bb final states of Higgs decay, as well as their combination:
    Most channels display a signal-like excess, which is reflected by the Run-2 combination being 2.5 sigma away from zero. A similar picture is emerging in CMS, with 2-sigma signals in the γγ and WW/ττ channels. Naively combining all Run-1 and Run-2 results one then finds
    At face value, this is a discovery! Of course, this number should be treated with some caution because, due to large systematic errors, a naive Gaussian combination may not represent very well the true likelihood. Nevertheless, it indicates that, if all goes well, the discovery of the tth production mode should be officially announced in the near future, maybe even this year.
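For readers wondering what a "naive Gaussian combination" means in practice: it is just an inverse-variance weighted average of the individual signal-strength measurements. A minimal sketch, with placeholder (μ, σ) values that are purely illustrative and not the actual ATLAS/CMS numbers (those appeared in plots not reproduced here):

```python
# Naive Gaussian combination of measurements mu_i ± sigma_i:
# weight each by 1/sigma_i^2, then average. Assumes uncorrelated
# Gaussian errors, which is exactly the caveat raised in the text.
def combine(measurements):
    """measurements: list of (mu, sigma) pairs; returns (mu_comb, sigma_comb)."""
    weights = [1.0 / s**2 for _, s in measurements]
    mu = sum(w * m for (m, _), w in zip(measurements, weights)) / sum(weights)
    sigma = (1.0 / sum(weights)) ** 0.5
    return mu, sigma

# Hypothetical (mu, sigma) inputs, for illustration only:
example = [(2.0, 0.8), (1.5, 0.6), (2.5, 1.0)]
print(combine(example))
```

Large correlated systematic errors violate the independent-Gaussian assumption baked into this formula, which is why such a combination may misrepresent the true likelihood.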

    Should we get excited that the measured tth rate is significantly larger than the Standard Model one? Assuming that the current central value remains, it would mean that the top Yukawa coupling is 40% larger than predicted by the Standard Model. This is not impossible, but very unlikely in practice. The reason is that the top Yukawa coupling also controls gluon fusion - the main Higgs production channel at the LHC - whose rate is measured to be in perfect agreement with the Standard Model. Therefore, a realistic model that explains the large tth rate would also have to provide negative contributions to the gluon fusion amplitude, so as to cancel the effect of the large top Yukawa coupling. It is possible to engineer such a cancellation in concrete models, but I'm not aware of any construction where this conspiracy arises in a natural way. Most likely, the currently observed excess is a statistical fluctuation (possibly in combination with underestimated theoretical and/or experimental errors), and the central value will drift toward μ=1 as more data is collected.

    ResonaancesWeekend Plot: update on WIMPs

    There's been a lot of discussion on this blog about the LHC not finding new physics.  I should however give justice to other experiments that also don't find new physics, often in a spectacular way. One area where this is happening is direct detection of WIMP dark matter. This weekend plot summarizes the current limits on the spin-independent scattering cross-section of dark matter particles on nucleons:
    For large WIMP masses, currently the most successful detection technology is to fill up a tank with a ton of liquid xenon and wait for a passing dark matter particle to knock into one of the nuclei. Recently, we have had updates from two such experiments: LUX in the US, and PandaX in China, whose limits now cut below zeptobarn cross sections (1 zb = 10^-9 pb = 10^-45 cm^2). These two experiments are currently going head-to-head, but PandaX, being larger, will ultimately overtake LUX. Soon, however, it'll have to face a new fierce competitor: the XENON1T experiment, and the plot will have to be updated next year. Fortunately, we won't need to learn another prefix soon. Once yoctobarn sensitivity is achieved by the experiments, we will hit the neutrino floor: the irreducible background from solar and atmospheric neutrinos (gray area at the bottom of the plot). This will make detecting a dark matter signal much more challenging, and will certainly slow down progress for WIMP masses larger than ~5 GeV. For lower masses, the distance to the floor remains large. Xenon detectors lose their steam there, and another technology is needed, like the germanium detectors of CDMS and CDEX, or the CaWO4 crystals of CRESST. On this front too, important progress is expected soon.
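The prefix arithmetic behind "1 zb = 10^-9 pb = 10^-45 cm^2" can be spelled out (a small bookkeeping sketch of my own, not from the post):

```python
import math

# Cross-section unit bookkeeping: 1 barn = 1e-24 cm^2, and the SI prefixes
# pico/femto/atto/zepto/yocto each step down by a factor of 1000.
BARN_CM2 = 1e-24
prefix = {"pb": 1e-12, "fb": 1e-15, "ab": 1e-18, "zb": 1e-21, "yb": 1e-24}

zb_in_cm2 = prefix["zb"] * BARN_CM2     # zeptobarn expressed in cm^2
zb_in_pb = prefix["zb"] / prefix["pb"]  # zeptobarn expressed in picobarn

print(math.isclose(zb_in_cm2, 1e-45))  # True
print(math.isclose(zb_in_pb, 1e-9))    # True
```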

    What does the theory say about when we will find dark matter? It is perfectly viable that the discovery is waiting for us just around the corner in the remaining space above the neutrino floor, but currently there are no strong theoretical hints in favor of that possibility. Usually, dark matter experiments advertise that they're just beginning to explore the interesting parameter space predicted by theory models. This is not quite correct. If the WIMP were true to its name, that is to say if it interacted via the weak force (meaning, coupled to the Z with order 1 strength), it would have an order 10 fb scattering cross section on nucleons. Unfortunately, that natural possibility was excluded in the previous century. Years of experimental progress have shown that WIMPs, if they exist, must interact super-weakly with matter. For example, for a 100 GeV fermionic dark matter particle with vector coupling g to the Z boson, the current limits imply g ≲ 10^-4. The coupling can be larger if the Higgs boson is the mediator of interactions between the dark and visible worlds, as the Higgs already couples very weakly to nucleons. This construction is, arguably, the most plausible one currently probed by direct detection experiments. For a scalar dark matter particle X with mass 0.1-1 TeV coupled to the Higgs via the interaction λ v h |X|^2, the experiments are currently probing couplings λ in the 0.01-1 ballpark. In general, there's no theoretical lower limit on the dark matter coupling to nucleons. Nevertheless, the weak coupling implied by direct detection limits creates some tension for the thermal production paradigm, which requires a weak (that is, order picobarn) annihilation cross section for dark matter particles. This tension needs to be resolved by more complicated model building, e.g. by arranging for resonant annihilation or for co-annihilation.

    n-Category Café HoTT and Philosophy

    I’m down in Bristol at a conference – HoTT and Philosophy. Slides for my talk – The modality of physical law in modal homotopy type theory – are here.

    Perhaps ‘The modality of differential equations’ would have been more accurate as I’m looking to work through an analogy in modal type theory between necessity and the jet comonad, partial differential equations being the latter’s coalgebras.

    The talk should provide some intuition for a pair of talks the following day:

    • Urs Schreiber & Felix Wellen: ‘Formalizing higher Cartan geometry in modal HoTT’
    • Felix Wellen: ‘Synthetic differential geometry in homotopy type theory via a modal operator’

    I met up with Urs and Felix yesterday evening. Felix is coding up in Agda geometric constructions, such as frame bundles, using the modalities of differential cohesion.

    n-Category Café Twitter

    I’m now trying to announce all my new writings in one place: on Twitter.

    Why? Well…

    Someone I respect said he’s been following my online writings, off and on, ever since the old days of This Week’s Finds. He wishes it were easier to find my new stuff all in one place. Right now it’s spread out over several locations:

    Azimuth: serious posts on environmental issues and applied mathematics, fairly serious popularizations of diverse scientific subjects.

    Google+: short posts of all kinds, mainly light popularizations of math, physics, and astronomy.

    The n-Category Café: posts on mathematics, leaning toward category theory and other forms of pure mathematics that seem too intimidating for the above forums.

    Visual Insight: beautiful pictures of mathematical objects, together with explanations.

    Diary: more personal stuff, and polished versions of the more interesting Google+ posts, just so I have them on my own website.

    It’s absurd to expect anyone to look at all these locations to see what I’m writing. Even more absurdly, I claimed I was going to quit posting on Google+, but then didn’t. So, I’ll try to make it possible to reach everything via Twitter.

    September 12, 2016

    Scott AaronsonThe Ninth Circuit ruled that vote-swapping is legal. Let’s use it to stop Trump.

    Updates: Commenter JT informs me that there’s already a vote-swapping site available:  (I particularly like their motto: “Everybody wins.  Except Trump.”)  I still think there’s a need for more sites, particularly ones that would interface with Facebook, but this is a great beginning.  I’ve signed up for it myself.

    Also, Toby Ord, a philosopher I know at Oxford, points me to a neat academic paper he wrote that analyzes vote-swapping as an example of “moral trade,” and that mentions the Porter v. Bowen decision holding vote-swapping to be legal in the US.

    Also, if we find two Gary Johnson supporters in swing states willing to trade, I’ve been contacted by a fellow Austinite who’d be happy to accept the second trade.

    As regular readers might know, my first appearance in the public eye (for a loose definition of “public eye”) had nothing to do with D-Wave, Gödel’s Theorem, the computational complexity of quantum gravity, Australian printer ads, or—god forbid—social justice shaming campaigns.  Instead it centered on NaderTrading: the valiant but doomed effort, in the weeks leading up to the 2000 US Presidential election, to stop George W. Bush’s rise to power by encouraging Ralph Nader supporters in swing states (such as Florida) to vote for Al Gore, while pairing themselves off over the Internet with Gore supporters in safe states (such as Texas or California) who would vote for Nader on their behalf.  That way, Nader’s vote share (and his chance of reaching 5% of the popular vote, which would’ve qualified him for federal funds in 2004) wouldn’t be jeopardized, but neither would Gore’s chance of winning the election.

    Here’s what I thought at the time:

    1. The election would be razor-close (though I never could’ve guessed how close).
    2. Bush was a malignant doofus who would be a disaster for the US and the world (though I certainly didn’t know how—recall that, at the time, Bush was running as an isolationist).
    3. Many Nader supporters, including the ones who I met at Berkeley, prioritized personal virtue so completely over real-world consequences that they might actually throw the election to Bush.

    NaderTrading, as proposed by law professor Jamin Raskin and others, seemed like one of the clearest ways for nerds who knew these points, but who lacked political skills, to throw themselves onto the gears of history and do something good for the world.

    So, as a 19-year-old grad student, I created a website called “In Defense of NaderTrading” (archived version), which didn’t arrange vote swaps themselves—other sites did that—but which explored some of the game theory behind the concept and answered some common objections to it.  (See also here.)  Within days of creating the site, I’d somehow become an “expert” on the topic, and was fielding hundreds of emails as well as requests for print, radio, and TV interviews.

    Alas, the one question everyone wanted to ask me was the one that I, as a CS nerd, was the least qualified to answer: is NaderTrading legal? isn’t it kind of like … buying and selling votes?

    I could only reply that, to my mind, NaderTrading obviously ought to be legal, because:

    1. Members of Congress and state legislatures trade votes all the time.
    2. A private agreement between two friends to each vote for the other’s preferred candidate seems self-evidently legal, so why should it be any different if a website is involved?
    3. The whole point of NaderTrading is to exercise your voting power more fully—pretty much the opposite of bartering it away for private gain.
    4. While the election laws vary by state, the ones I read very specifically banned trading votes for tangible goods—they never even mentioned trading votes for other votes, even though they easily could’ve done so had legislators intended to ban that.

    But—and here was the fatal problem—I could only address principles and arguments, rather than politics and power.  I couldn’t honestly assure the people who wanted to vote-swap, or to set up vote-swapping sites, that they wouldn’t be prosecuted for it.

As it happened, the main vote-swapping site, voteswap2000, was shut down by California’s Republican attorney general, Bill Jones, only four days after it opened.  A second vote-swapping site was never directly threatened but also ceased operations because of what happened to voteswap2000.  Many legal scholars felt confident that these shutdowns wouldn’t hold up in court, but with just a few weeks until the election, there was no time to fight it.

    Before it was shut down, voteswap2000 had brokered 5,041 vote-swaps, including hundreds in Florida.  Had that and similar sites been allowed to continue operating, it’s entirely plausible that they would’ve changed the outcome of the election.  No Iraq war, no 2008 financial meltdown: we would’ve been living in a different world.  Note that, of the 100,000 Floridians who ultimately voted for Nader, we would’ve needed to convince fewer than 1% of them.

    Today, we face something I didn’t expect to face in my lifetime: namely, a serious prospect of a takeover of the United States by a nativist demagogue with open contempt for democratic norms and legendarily poor impulse control. Meanwhile, there are two third-party candidates—Gary Johnson and Jill Stein—who together command 10% of the vote.  A couple months ago, I’d expressed hopes that Johnson might help Hillary, by splitting the Republican vote. But it now looks clear that, on balance, not only Stein but also Johnson are helping Trump, by splitting up that part of the American vote that’s not driven by racial resentment.

    So recently a friend—the philanthropist and rationalist Holden Karnofsky—posed a question to me: should we revive the vote-swapping idea from 2000? And presumably this time around, enhance the idea with 21st-century bells and whistles like mobile apps and Facebook, to make it all the easier for Johnson/Stein supporters in swing states and Hillary supporters in safe states to find each other and trade votes?
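The pairing mechanism described above can be sketched in a few lines of code. This is a purely illustrative toy, not any real site’s algorithm: the state list, voter records, and the greedy one-to-one pairing rule are all assumptions made up for the example.

```python
# Toy sketch of vote-swap matching: pair each third-party supporter in a
# swing state with a Clinton supporter in a safe state.  Each pair swaps
# ballots, so the third-party candidate's national total is unchanged
# while the swing-state vote goes to Clinton.
# All names and the swing-state set below are hypothetical.

SWING_STATES = {"FL", "OH", "PA", "NV"}  # assumed example set

def match_swaps(third_party_voters, clinton_voters):
    """Greedily pair swing-state third-party voters with safe-state
    Clinton voters; returns a list of (swing_voter, safe_voter) pairs."""
    swing = [v for v in third_party_voters if v["state"] in SWING_STATES]
    safe = [v for v in clinton_voters if v["state"] not in SWING_STATES]
    # zip() stops at the shorter list, so unmatched voters simply
    # keep voting as they originally intended.
    return [(a["name"], b["name"]) for a, b in zip(swing, safe)]

johnson_fans = [
    {"name": "Alice", "state": "FL"},  # swing state: eligible to swap
    {"name": "Bob", "state": "TX"},    # safe state: no swap needed
]
clinton_fans = [
    {"name": "Carol", "state": "TX"},  # safe state: can vote Johnson for Alice
]

print(match_swaps(johnson_fans, clinton_fans))  # [('Alice', 'Carol')]
```

A real system would of course need identity verification and some trust mechanism, since ballots are secret and swaps are unenforceable; the point of the sketch is only the matching logic.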

    Just like so many well-meaning people back in 2000, Holden was worried about one thing: is vote-swapping against the law? If someone created a mobile vote-swapping app, could that person be thrown in jail?

    At first, I had no idea: I assumed that vote-swapping simply remained in the legal Twilight Zone where it was last spotted in 2000.  But then I did something radical: I looked it up.  And when I did, I discovered a decade-old piece of news that changes everything.

On August 6, 2007, the Ninth Circuit Court of Appeals finally ruled on a case, Porter v. Bowen, stemming from the California attorney general’s shutdown of voteswap2000.  Their ruling, which is worth reading in full, was unequivocal.

    Vote-swapping, it said, is protected by the First Amendment, which state election laws can’t supersede.  It is fundamentally different from buying or selling votes.

    Yes, the decision also granted the California attorney general immunity from prosecution, on the ground that vote-swapping’s legality hadn’t yet been established in 2000—indeed it wouldn’t be, until the Ninth Circuit’s decision itself!  Nevertheless, the ruling made clear that the appellants (the creators of voteswap2000 and some others) were granted the relief they sought: namely, an assurance that vote-swapping websites would be protected from state interference in the future.

    Admittedly, if vote-swapping takes off again, it’s possible that the question will be re-litigated and will end up in the Supreme Court, where the Ninth Circuit’s ruling could be reversed.  For now, though, let the message be shouted from the rooftops: a court has ruled. You cannot be punished for cooperating with your fellow citizens to vote strategically, or for helping others do the same.

    For those of you who oppose Donald Trump and who are good at web and app development: with just two months until the election, I think the time to set up some serious vote-swapping infrastructure is right now.  Let your name be etched in history, alongside those who stood up to all the vicious demagogues of the past.  And let that happen without your even needing to get up from your computer chair.

    I’m not, I confess, a huge fan of either Gary Johnson or Jill Stein (especially not Stein).  Nevertheless, here’s my promise: on November 8, I will cast my vote in the State of Texas for Gary Johnson, if I can find at least one Johnson supporter who lives in a swing state, who I feel I can trust, and who agrees to vote for Hillary Clinton on my behalf.

    If you think you’ve got what it takes to be my vote-mate, send me an email, tell me about yourself, and let’s talk!  I’m not averse to some electoral polyamory—i.e., lots of Johnson supporters in swing states casting their votes for Clinton, in exchange for the world’s most famous quantum complexity blogger voting for Johnson—but I’m willing to settle for a monogamous relationship if need be.

    And as for Stein? I’d probably rather subsist on tofu than vote for her, because of her support for seemingly every pseudoscience she comes across, and especially because of her endorsement of the vile campaign to boycott Israel.  Even so: if Stein supporters in swing states whose sincerity I trusted offered to trade votes with me, and Johnson supporters didn’t, I would bury my scruples and vote for Stein.  Right now, the need to stop the madman takes precedence over everything else.

One last thing to get out of the way.  When they learn of my history with NaderTrading, people keep pointing me to a website called Balanced Rebellion, and exclaiming “look! isn’t this exactly that vote-trading thing you were talking about?”

    On examination, Balanced Rebellion turns out to be the following proposal:

    1. A Trump supporter in a swing state pairs off with a Hillary supporter in a swing state.
    2. Both of them vote for Gary Johnson, thereby helping Johnson without giving an advantage to either Hillary or Trump.

    So, exercise for the reader: see if you can spot the difference between this idea and the kind of vote-swapping I’m talking about.  (Here’s a hint: my version helps prevent a racist lunatic from taking command of the most powerful military on earth, rather than being neutral about that outcome.)

    Not surprisingly, the “balanced rebellion” is advocated by Johnson fans.

    Tommaso DorigoINFN Selections - A Last Batch Of Advices

Next Monday, the Italian city of Rome will swarm with about 700 young physicists. They will be there to participate in a selection of 58 INFN research scientists. In previous articles (see e.g.

    read more

    Doug NatelsonProfessional service

    An underappreciated part of a scientific career is "professional service" - reviewing papers and grant proposals, filling roles in professional societies, organizing workshops/conferences/summer schools - basically carrying your fair share of the load, so that the whole scientific enterprise actually functions.  Some people take on service roles primarily because they want to learn better how the system works; others do so out of altruism, realizing that it's only fair, for example, to perform reviews of papers and grants at roughly the rate you submit them; still others take on responsibility because they either think they know best how to run/fix things, or because they don't like the alternatives.   Often it's a combination of all of these.

    More and more journals proliferate; numbers of grant applications climb even as (in the US anyway) support remains flat or declining; and conference attendance continues to grow (the APS March Meeting is now twice as large as in my last year of grad school).  This means that professional demands are on the rise.  At the same time, it is difficult to track and quantify (except by self-reporting) these activities, and reward structures give only indirect incentive (e.g., reviewing grants gives you a sense of what makes a better proposal) to good citizenship.  So, when you're muttering under your breath about referee number 3 or about how the sessions are organized nonoptimally at your favorite conference (as we all do from time to time), remember that at least the people in question are trying to contribute, rather than sitting on the sidelines.

    Chad Orzel362-366/366: Sillyhead-Centric Closing

    And now, the photo-a-day project straggles in to the finish line, with a final five photos dominated by the kids:

    362/366: Kid Art I

    A set of figures drawn by The Pip at day care over the summer.

    A set of figures drawn by The Pip at day care over the summer.

    363/366: Kid Art II

    Awesome owl drawing by SteelyKid.

    Awesome owl drawing by SteelyKid.

    One of the official end of summer activities is cleaning off the “art shelf” in the bookcase in the dining room, where we pile the various projects the kids bring home from school and day care. I sort these, and take photos of the best, for historical documentation purposes, and these are two of my favorites from the lot. The four round-bodied figures all came home on the same day, from The Pip. The owl was in SteelyKid’s backpack the last week of second grade, which shows just how long we’ve gone without cleaning that shelf off. And also how good she’s gotten at drawing…

    Also, here’s some bonus kid art: a fan decorated by SteelyKid:

    A fan decorated by SteelyKid.

    A fan decorated by SteelyKid.

    I’m not sure exactly how she did this– whether the paper was pre-stretched on the fan, or whether she painted on a flat paper that somebody then attached to the frame. I should ask, as I bet the explanation will be entertainingly detailed.

    364/366: Farewell to the Pool I

    SteelyKid going off the diving board at the Niskayuna town pool.

    SteelyKid going off the diving board at the Niskayuna town pool.

    365/366: Farewell to the Pool II

    The Pip swimming with a noodle at the Niskayuna town pool.

    The Pip swimming with a noodle at the Niskayuna town pool.

    These are from the next-to-last day of our summer membership at the Niskayuna town pool, when we all went over as a family (which is why there are good photos– Kate was in the pool with The Pip, just out of frame). It’s been really remarkable to see The Pip’s development in terms of swimming. At the start of last summer, he wouldn’t even dip his feet into the pool, and now, he’s an eager swimmer. He’s using a noodle here to help him float, which lets him really motor around, but he can furiously dog-paddle short distances all by himself. And will, in fact, angrily insist on being allowed to dog-paddle freely over short distances even in the wave pool at Great Escape (which we also visited last weekend, to catch the water park before it closed), which is utterly terrifying.

He’s extremely proud of this too, and asked for this to be the picture we printed out to send in for inclusion in his kindergarten class book. And commemorated it with a self-portrait, which I’ll throw in here as more bonus kid art:

    The Pip's drawing of himself swimming with a green pool noodle.

    The Pip’s drawing of himself swimming with a green pool noodle.

    And we’ll close this out with the thing that’s making me close this out: the return of the school year.

    366/366: Workspace

    My new office on campus.

    My new office on campus.

    After three years in the department chair’s office (which is much bigger), I’m back in a standard-size office. Not the same office I was in before I was Chair, but one a couple doors further down the hall. A colleague wanted to be in the office I used to have, though, and moved the giant desk I ordered when I arrived to make room. So I moved into the office where my good desk was, and have, for the moment, organized things in a sensible and reasonably aesthetic way. This will be covered in random stacks of books and papers within three weeks, but it looks nice and professional right now, so we’ll give it the final spot, to mark the closing of my sabbatical with a picture of the workspace I’m returning to.

    And that’s it for that. I may or may not have some wrap-up thoughts in a day or two, but for right now, I have classes to teach, and need to go get ready.

    Terence TaoCourse announcement: 246A, complex analysis

    Next week, I will be teaching Math 246A, the first course in the three-quarter graduate complex analysis sequence.  This first course covers much of the same ground as an honours undergraduate complex analysis course, in particular focusing on the basic properties of holomorphic functions such as the Cauchy and residue theorems, the classification of singularities, and the maximum principle, but there will be more of an emphasis on rigour, generalisation and abstraction, and connections with other parts of mathematics.  If time permits I may also cover topics such as factorisation theorems, harmonic functions, conformal mapping, and/or applications to analytic number theory.  The main text I will be using for this course is Stein-Shakarchi (with Ahlfors as a secondary text), but as usual I will also be writing notes for the course on this blog.

    Filed under: 246A - complex analysis, admin Tagged: announcement

    September 11, 2016

    Tommaso DorigoStatistics At A Physics Conference ?!

    Particle physics conferences are a place where you can listen to many different topics - not just news about the latest precision tests of the standard model or searches for new particles at the energy frontier. If we exclude the very small, workshop-like events where people gather to focus on a very precise topic, all other events do allow for the contamination from reports of parallel fields of research. The reason is of course that there is a significant cross-fertilization between these fields. 

    read more

    Chad Orzel353-361/366: Penultimate Photo Dump

    Another day, another batch of photos from August.

    353/366: Trail Blaze

    One lost sock next to the hiking path in the Reist bird sanctuary.

    One lost sock next to the hiking path in the Reist bird sanctuary.

    This is actually a slight reversal of chronology, as the hike through Vischer’s Ferry was the day after I went for a hike in the H. G. Reist Wildlife Sanctuary. The pictures from Vischer’s Ferry were better though, as despite the name I didn’t see much wildlife in the Reist Sanctuary. I did, however, find this bright pink child-size sock sitting next to the trail right after a fork. Presumably a way for some wandering toddler to find their way home…

    354/366: Web

Spiderweb near the top of a tree in the Reist sanctuary.

    Spiderweb near the top of a tree in the Reist sanctuary.

    I didn’t see much in the way of wildlife in the wildlife sanctuary. They did, however, have a wealth of spiderwebs, mostly strung face-high across the path. This one high up in a tree was a pleasant exception, and caught the light well.

    355/366: Window

    A dead tree trunk with a hole clean through it, in the Reist sanctuary.

    A dead tree trunk with a hole clean through it, in the Reist sanctuary.

    356/366: New and Old

    Fine tendrils of new vine on a dead tree in the Reist sanctuary.

    Fine tendrils of new vine on a dead tree in the Reist sanctuary.

    The sanctuary did have a wide variety of interesting-looking dead trees, though. The second of these is one of those “imprinted deeply on The Lord of the Rings” moments, because this kind of thing always reminds me of Sam and Frodo finding a defaced statue in The Two Towers.

    357/366: Please Repost

    The hanging remains of a broken telephone pole in Niskayuna.

    The hanging remains of a broken telephone pole in Niskayuna.

On the way back from the Reist sanctuary, I stopped and made a stab at getting an image I’ve wanted for months. This is a telephone pole on Balltown Road that was hit by something and shattered several months ago. They brought crews in and cleaned out the wreckage, but left this one bit of pole hanging from the wires. I have no idea how the wires are attached in order for this to seem like a good idea. Or maybe it’s just capitalism in action– something like “National Grid owns the power lines at the top, and takes care of the poles, but can’t be bothered to remove a hanging weight stressing the telephone lines owned by Verizon down below.”

    358/366: Market Morning

    The Schenectady Greenmarket, our regular Sunday-morning activity.

    The Schenectady Greenmarket, our regular Sunday-morning activity.

    Every Sunday morning, I take the kids to the Schenectady Greenmarket to get a few things and give Kate a bit of a break. I don’t generally take the camera along, though, because it’s hard enough to wrangle the sillyheads without also looking for aesthetic views. My parents took the kids for a weekend in late August, though, so I finally got some decent shots of the market; this one is from the steps of the Post Office across the street.

    The market takes up two streets in an “L” shape, so here’s the same scene from the other angle:

    The Schenectady Greenmarket, looking down Franklin Street.

    The Schenectady Greenmarket, looking down Franklin Street.

    I’m sort of torn as to which of these I like better; the first gives a better sense of the market itself, because it’s a longer street and thus more crowded, but the isolated building in the second really pops. So I’ll put them both here, and you can decide for yourself which is the real photo 358.

    359/366: Hanging Flowers

    Planters hanging from a light pole in downtown Schenectady.

    Planters hanging from a light pole in downtown Schenectady.

    I’m sort of proud of the alignment in this one, as there’s an ugly smokestack behind this that I managed to hide pretty well behind the lamp post.

    (“Why not pick a different post?” you ask? Because the others on this street don’t have as clear a view of the sky as this one, and it was easier to hide the smokestack than the stuff that would clutter the background for the others.)

    360/366: Planted Flowers

    Roses outside the Wold building at Union.

    Roses outside the Wold building at Union.

    The end of August means the start of a new school year, which means making campus look all pretty for the new students when they arrive. I’m not sure if these roses in front of the Wold building just bloom at a fortuitous time, or if they’re recent transplants, but they’re very pretty.

    361/366: Green Lions

    Cool lions on the base of a lamp post by Memorial Chapel.

    Cool lions on the base of a lamp post by Memorial Chapel.

Probably the most enduring effect of this experiment in taking lots of pictures will be a somewhat heightened awareness of odd little photogenic things around me. Like these funky lions holding up a light outside of Memorial Chapel. I’ve been here fifteen years, and never noticed these guys until I was wandering around campus with a camera, looking for interesting shots.

    That’s it for today. The final five photos will be posted in one last photo dump, probably tomorrow morning, but maybe Tuesday.

    BackreactionI’ve read a lot of books recently

    [Reading is to writing what eating is to...]

    Dreams Of A Final Theory: The Scientist's Search for the Ultimate Laws of Nature
    Steven Weinberg
    Vintage, Reprint Edition (1994)

This book appeared when I was still in high school and I didn’t take note of it then. Later it seemed too out-of-date to bother, but meanwhile it’s almost become a historical document. Written with the pretty explicit aim to argue in favor of the Superconducting Supercollider (a US proposal for a large particle collider that was scrapped in the early 90s), it’s the most flawless popular science book about theoretical physics I’ve ever come across.

    Weinberg’s explanations are both comprehensible and remarkably accurate. The book contains no unnecessary clutter, is both well-structured and well written, and Weinberg doesn’t hold back with his opinions, neither on religion nor on philosophy.

It’s also the first time I’ve tried an audio-book. I listened to it while treadmill running. A lot of sweat went into the first chapters. But I gave up halfway through and bought the paperback, which I read on the plane to Austin. Weinberg is one of the people I interviewed for my book.

    Lesson learned: Audiobooks aren’t for me.

    Truth And Beauty – Aesthetics and Motivations in Science
    Subrahmanyan Chandrasekhar
    University of Chicago Press (1987)

I had read this book before but wanted to remind myself of its content. It’s a collection of essays on the role of beauty in physics, mostly focused on general relativity and the early 20th century. Drawing on historical examples like Milne, Eddington, Weyl, and Einstein, Chandrasekhar discusses various aspects of beauty, like elegance, simplicity, or harmony. I find it too bad that Chandrasekhar didn’t bring in more of his own opinion but mostly summarizes other people’s thoughts.

    Lesson learned: Tell the reader what you think.

    Truth or Beauty – Science and the Quest for Order
    David Orrell
    Yale University Press (2012)

In this book, mathematician David Orrell argues that beauty isn’t a good guide to truth. It’s an engagingly written book which covers a lot of ground, primarily in physics, from heliocentrism to string theory. But Orrell tries too hard to make everything fit his bad-beauty narrative. Many of his interpretations are over-the-top, like his complaint that
 “[T]he aesthetics of science – and particularly the “hard” sciences such as physics – have been characterized by a distinctly male feel. For example, feminist psychologists have noted that the classical picture of the atom as hard, indivisible, independent, separate, and so on corresponds very closely to the stereotypically masculine sense of self. It must have come as a shock to the young, male, champions of quantum theory when they disco