# Planet Musings

## September 30, 2016

### Clifford Johnson — Super-Strong…?

Are you going to watch the Luke Cage series that debuts today on Netflix? I probably will at some point (I've got several decades-old reasons, and also it was set up well in the excellent Jessica Jones last year)... but not soon, as I've got far too many deadlines. Here's a related item: using the Luke Cage character as a jumping-off point, physicist Martin Archer has put together a very nice short video about the business of strong and tough (not the same thing) materials in the real world.

Have a look if you want to appreciate the nuances, and learn a bit about what's maybe just over the horizon for new amazing materials that might become part of our everyday lives. Video embed below: [...]

The post Super-Strong…? appeared first on Asymptotia.

### Terence Tao — 246A, Notes 0: the complex numbers

Kronecker is famously reported to have said, “God created the natural numbers; all else is the work of man”. The truth of this statement (literal or otherwise) is debatable; but one can certainly view the other standard number systems ${{\bf Z}, {\bf Q}, {\bf R}, {\bf C}}$ as (iterated) completions of the natural numbers ${{\bf N}}$ in various senses. For instance:

• The integers ${{\bf Z}}$ are the additive completion of the natural numbers ${{\bf N}}$ (the minimal additive group that contains a copy of ${{\bf N}}$).
• The rationals ${{\bf Q}}$ are the multiplicative completion of the integers ${{\bf Z}}$ (the minimal field that contains a copy of ${{\bf Z}}$).
• The reals ${{\bf R}}$ are the metric completion of the rationals ${{\bf Q}}$ (the minimal complete metric space that contains a copy of ${{\bf Q}}$).
• The complex numbers ${{\bf C}}$ are the algebraic completion of the reals ${{\bf R}}$ (the minimal algebraically closed field that contains a copy of ${{\bf R}}$).

These descriptions of the standard number systems are elegant and conceptual, but not entirely suitable for constructing the number systems in a non-circular manner from more primitive foundations. For instance, one cannot quite define the reals ${{\bf R}}$ from scratch as the metric completion of the rationals ${{\bf Q}}$, because the definition of a metric space itself requires the notion of the reals! (One can of course construct ${{\bf R}}$ by other means, for instance by using Dedekind cuts or by using uniform spaces in place of metric spaces.) The definition of the complex numbers as the algebraic completion of the reals does not suffer from such a circularity issue, but a certain amount of field theory is required to work with this definition initially. For the purposes of quickly constructing the complex numbers, it is thus more traditional to first define ${{\bf C}}$ as a quadratic extension of the reals ${{\bf R}}$, and more precisely as the extension ${{\bf C} = {\bf R}(i)}$ formed by adjoining a square root ${i}$ of ${-1}$ to the reals, that is to say a solution to the equation ${i^2+1=0}$. It is not immediately obvious that this extension is in fact algebraically closed; this is the content of the famous fundamental theorem of algebra, which we will prove later in this course.

The two equivalent definitions of ${{\bf C}}$ – as the algebraic closure, and as a quadratic extension, of the reals respectively – each reveal important features of the complex numbers in applications. Because ${{\bf C}}$ is algebraically closed, all polynomials over the complex numbers split completely, which leads to a good spectral theory for both finite-dimensional matrices and infinite-dimensional operators; in particular, one expects to be able to diagonalise most matrices and operators. Applying this theory to constant coefficient ordinary differential equations leads to a unified theory of such solutions, in which real-variable ODE behaviour such as exponential growth or decay, polynomial growth, and sinusoidal oscillation all become aspects of a single object, the complex exponential ${z \mapsto e^z}$ (or more generally, the matrix exponential ${A \mapsto \exp(A)}$). Applying this theory more generally to diagonalise arbitrary translation-invariant operators over some locally compact abelian group, one arrives at Fourier analysis, which is thus most naturally phrased in terms of complex-valued functions rather than real-valued ones. If one drops the assumption that the underlying group is abelian, one instead discovers the representation theory of unitary representations, which is simpler to study than the real-valued counterpart of orthogonal representations. For closely related reasons, the theory of complex Lie groups is simpler than that of real Lie groups.

Meanwhile, the fact that the complex numbers are a quadratic extension of the reals lets one view the complex numbers geometrically as a two-dimensional plane over the reals (the Argand plane). Whereas a point singularity in the real line disconnects that line, a point singularity in the Argand plane leaves the rest of the plane connected (although, importantly, the punctured plane is no longer simply connected). As we shall see, this fact causes singularities in complex analytic functions to be better behaved than singularities of real analytic functions, ultimately leading to the powerful residue calculus for computing complex integrals. Remarkably, this calculus, when combined with the quintessentially complex-variable technique of contour shifting, can also be used to compute some (though certainly not all) definite integrals of real-valued functions that would be much more difficult to compute by purely real-variable methods; this is a prime example of Hadamard’s famous dictum that “the shortest path between two truths in the real domain passes through the complex domain”.

Another important geometric feature of the Argand plane is the angle between two tangent vectors to a point in the plane. As it turns out, the operation of multiplication by a complex scalar preserves the magnitude and orientation of such angles; the same fact is true for any non-degenerate complex analytic mapping, as can be seen by performing a Taylor expansion to first order. This fact ties the study of complex mappings closely to that of the conformal geometry of the plane (and more generally, of two-dimensional surfaces and domains). In particular, one can use complex analytic maps to conformally transform one two-dimensional domain to another, leading among other things to the famous Riemann mapping theorem, and to the classification of Riemann surfaces.

If one Taylor expands complex analytic maps to second order rather than first order, one discovers a further important property of these maps, namely that they are harmonic. This fact makes the class of complex analytic maps extremely rigid and well behaved analytically; indeed, the entire theory of elliptic PDE now comes into play, giving useful properties such as elliptic regularity and the maximum principle. In fact, due to the magic of residue calculus and contour shifting, we already obtain these properties for maps that are merely complex differentiable rather than complex analytic, which leads to the striking fact that complex differentiable functions are automatically analytic (in contrast to the real-variable case, in which real differentiable functions can be very far from being analytic).

The geometric structure of the complex numbers (and more generally of complex manifolds and complex varieties), when combined with the algebraic closure of the complex numbers, leads to the beautiful subject of complex algebraic geometry, which motivates the much more general theory developed in modern algebraic geometry. However, we will not develop the algebraic geometry aspects of complex analysis here.

Last, but not least, because of the good behaviour of Taylor series in the complex plane, complex analysis is an excellent setting in which to manipulate various generating functions, particularly Fourier series ${\sum_n a_n e^{2\pi i n \theta}}$ (which can be viewed as boundary values of power (or Laurent) series ${\sum_n a_n z^n}$), as well as Dirichlet series ${\sum_n \frac{a_n}{n^s}}$. The theory of contour integration provides a very useful dictionary between the asymptotic behaviour of the sequence ${a_n}$, and the complex analytic behaviour of the Dirichlet or Fourier series, particularly with regard to its poles and other singularities. This turns out to be a particularly handy dictionary in analytic number theory, for instance relating the distribution of the primes to the Riemann zeta function. Nowadays, many of the analytic number theory results first obtained through complex analysis (such as the prime number theorem) can also be obtained by more “real-variable” methods; however the complex-analytic viewpoint is still extremely valuable and illuminating.

We will frequently touch upon many of these connections to other fields of mathematics in these lecture notes. However, these are mostly side remarks intended to provide context, and it is certainly possible to skip most of these tangents and focus purely on the complex analysis material in these notes if desired.

Note: complex analysis is a very visual subject, and one should draw plenty of pictures while learning it. I am however not planning to put too many pictures in these notes, partly as it is somewhat inconvenient to do so on this blog from a technical perspective, but also because pictures that one draws on one’s own are likely to be far more useful to you than pictures that were supplied by someone else.

— 1. The construction and algebra of the complex numbers —

Note: this section will be far more algebraic in nature than the rest of the course; we are concentrating almost all of the algebraic preliminaries in this section in order to get them out of the way and focus subsequently on the analytic aspects of the complex numbers.

Thanks to the laws of high-school algebra, we know that the real numbers ${{\bf R}}$ form a field: they are endowed with the arithmetic operations of addition, subtraction, multiplication, and division, as well as the additive identity ${0}$ and the multiplicative identity ${1}$, obeying the usual laws of algebra (i.e. the field axioms).

The algebraic structure of the reals does have one drawback though – not all (non-trivial) polynomials have roots! Most famously, the polynomial equation ${x^2+1=0}$ has no solutions over the reals, because ${x^2}$ is always non-negative, and hence ${x^2+1}$ is always strictly positive, whenever ${x}$ is a real number.

As mentioned in the introduction, one traditional way to define the complex numbers ${{\bf C}}$ is as the smallest possible extension of the reals ${{\bf R}}$ that fixes this one specific problem:

Definition 1 (The complex numbers) A field of complex numbers is a field ${{\bf C}}$ that contains the real numbers ${{\bf R}}$ as a subfield, as well as a root ${i}$ of the equation ${i^2+1=0}$. (Thus, strictly speaking, a field of complex numbers is a pair ${({\bf C},i)}$, but we will almost always abuse notation and use ${{\bf C}}$ as a metonym for the pair ${({\bf C},i)}$.) Furthermore, ${{\bf C}}$ is generated by ${{\bf R}}$ and ${i}$, in the sense that there is no subfield of ${{\bf C}}$, other than ${{\bf C}}$ itself, that contains both ${{\bf R}}$ and ${i}$; thus, in the language of field extensions, we have ${{\bf C} = {\bf R}(i)}$.

(We will take the existence of the real numbers ${{\bf R}}$ as a given in this course; constructions of the real number system can of course be found in many real analysis texts, including my own.)

Definition 1 is short, but proposing it as a definition of the complex numbers raises some immediate questions:

• (Existence) Does such a field ${{\bf C}}$ even exist?
• (Uniqueness) Is such a field ${{\bf C}}$ unique (up to isomorphism)?
• (Non-arbitrariness) Why the square root of ${-1}$? Why not adjoin instead, say, a fourth root of ${-42}$, or the solution to some other algebraic equation? Also, could one iterate the process, extending ${{\bf C}}$ further by adding more roots of equations?

The third set of questions can be answered satisfactorily once we possess the fundamental theorem of algebra. For now, we focus on the first two questions.

We begin with existence. One can construct the complex numbers quite explicitly and quickly using the Argand plane construction; see Remark 7 below. However, from the perspective of higher mathematics, it is more natural to view the construction of the complex numbers as a special case of the more general algebraic construction that can extend any field ${k}$ by the root ${\alpha}$ of an irreducible nonlinear polynomial ${P \in k[\mathrm{x}]}$ over that field; this produces a field of complex numbers ${{\bf C}}$ when specialising to the case where ${k={\bf R}}$ and ${P = \mathrm{x}^2+1}$. We will just describe this construction in that special case, leaving the general case as an exercise.

Starting with the real numbers ${{\bf R}}$, we can form the space ${{\bf R}[\mathrm{x}]}$ of (formal) polynomials

$\displaystyle P(\mathrm{x}) = a_d \mathrm{x}^d + a_{d-1} \mathrm{x}^{d-1} + \dots + a_0$

with real coefficients ${a_0,\dots,a_d \in {\bf R}}$ and arbitrary non-negative integer ${d}$ in one indeterminate variable ${\mathrm{x}}$. (A small technical point: we do not view this indeterminate ${\mathrm{x}}$ as belonging to any particular domain such as ${{\bf R}}$, so we do not view these polynomials ${P}$ as functions but merely as formal expressions involving a placeholder symbol ${\mathrm{x}}$ (which we have rendered in Roman type to indicate its formal character). In this particular characteristic zero setting of working over the reals, it turns out to be harmless to identify each polynomial ${P}$ with the corresponding function ${P: {\bf R} \rightarrow {\bf R}}$ formed by interpreting the indeterminate ${\mathrm{x}}$ as a real variable; but if one were to generalise this construction to positive characteristic fields, and particularly finite fields, then one can run into difficulties if polynomials are not treated formally, due to the fact that two distinct formal polynomials might agree on all inputs in a given finite field (e.g. the distinct formal polynomials ${\mathrm{x}}$ and ${\mathrm{x}^p}$ agree at every point of the finite field ${{\mathbf F}_p}$). However, this subtlety can be ignored for the purposes of this course.) This space ${{\bf R}[\mathrm{x}]}$ of polynomials has a pretty good algebraic structure; in particular, the usual operations of addition, subtraction, and multiplication on polynomials, together with the zero polynomial ${0}$ and the unit polynomial ${1}$, give ${{\bf R}[\mathrm{x}]}$ the structure of a (unital) commutative ring. This commutative ring also contains ${{\bf R}}$ as a subring (identifying each real number ${a}$ with the degree zero polynomial ${a \mathrm{x}^0}$). The ring ${{\bf R}[\mathrm{x}]}$ is however not a field, because many non-zero elements of ${{\bf R}[\mathrm{x}]}$ do not have multiplicative inverses.
(In fact, no non-constant polynomial in ${{\bf R}[\mathrm{x}]}$ has an inverse in ${{\bf R}[\mathrm{x}]}$, because the product of two non-constant polynomials has a degree that is the sum of the degrees of the factors.)
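
This degree arithmetic is easy to see in action. The following minimal Python sketch (purely illustrative; the function names are not from the notes) stores a formal polynomial ${a_0 + a_1 \mathrm{x} + \dots + a_d \mathrm{x}^d}$ as its coefficient list:

```python
# Illustrative sketch (function names not from the notes): a formal
# polynomial a_0 + a_1 x + ... + a_d x^d stored as the coefficient list
# [a_0, a_1, ..., a_d].

def poly_add(P, Q):
    """Add two formal polynomials coefficientwise."""
    n = max(len(P), len(Q))
    P = P + [0.0] * (n - len(P))
    Q = Q + [0.0] * (n - len(Q))
    return [p + q for p, q in zip(P, Q)]

def poly_mul(P, Q):
    """Multiply two formal polynomials (a convolution of coefficients)."""
    out = [0.0] * (len(P) + len(Q) - 1)
    for i, p in enumerate(P):
        for j, q in enumerate(Q):
            out[i + j] += p * q
    return out

# Degrees add under multiplication: deg((1+x)(1+2x+x^2)) = 1 + 2 = 3,
# which is why no non-constant polynomial has an inverse in R[x].
assert len(poly_mul([1.0, 1.0], [1.0, 2.0, 1.0])) - 1 == 3
```

The final assertion is exactly the observation in the parenthetical above: the degree of a product is the sum of the degrees of the factors.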

If a unital commutative ring fails to be a field, then it will instead possess a number of non-trivial ideals. The only ideal we will need to consider here is the principal ideal

$\displaystyle \langle \mathrm{x}^2+1 \rangle := \{ (\mathrm{x}^2+1) P(\mathrm{x}): P(\mathrm{x}) \in {\bf R}[\mathrm{x}] \}.$

This is clearly an ideal of ${{\bf R}[\mathrm{x}]}$ – it is closed under addition and subtraction, and the product of any element of the ideal ${\langle \mathrm{x}^2 + 1 \rangle}$ with an element of the full ring ${{\bf R}[\mathrm{x}]}$ remains in the ideal ${\langle \mathrm{x}^2 + 1 \rangle}$.

We now define ${{\bf C}}$ to be the quotient space

$\displaystyle {\bf C} := {\bf R}[\mathrm{x}] / \langle \mathrm{x}^2+1 \rangle$

of the commutative ring ${{\bf R}[\mathrm{x}]}$ by the ideal ${\langle \mathrm{x}^2+1 \rangle}$; this is the space of cosets ${P(\mathrm{x}) + \langle \mathrm{x}^2+1 \rangle = \{ P(\mathrm{x}) + Q(\mathrm{x}): Q(\mathrm{x}) \in \langle \mathrm{x}^2+1 \rangle \}}$ of ${\langle \mathrm{x}^2+1 \rangle}$ in ${{\bf R}[\mathrm{x}]}$. Because ${\langle \mathrm{x}^2+1 \rangle}$ is an ideal, there is an obvious way to define addition, subtraction, and multiplication in ${{\bf C}}$, namely by setting

$\displaystyle (P(\mathrm{x}) + \langle \mathrm{x}^2+1 \rangle) + (Q(\mathrm{x}) + \langle \mathrm{x}^2+1 \rangle) := (P(\mathrm{x}) + Q(\mathrm{x}) + \langle \mathrm{x}^2+1 \rangle),$

$\displaystyle (P(\mathrm{x}) + \langle \mathrm{x}^2+1 \rangle) - (Q(\mathrm{x}) + \langle \mathrm{x}^2+1 \rangle) := (P(\mathrm{x}) - Q(\mathrm{x}) + \langle \mathrm{x}^2+1 \rangle)$

and

$\displaystyle (P(\mathrm{x}) + \langle \mathrm{x}^2+1 \rangle) \cdot (Q(\mathrm{x}) + \langle \mathrm{x}^2+1 \rangle) := (P(\mathrm{x}) Q(\mathrm{x}) + \langle \mathrm{x}^2+1 \rangle)$

for all ${P(\mathrm{x}), Q(\mathrm{x}) \in {\bf R}[\mathrm{x}]}$; these operations, together with the additive identity ${0 = 0 + \langle \mathrm{x}^2+1 \rangle}$ and the multiplicative identity ${1 = 1 + \langle \mathrm{x}^2+1 \rangle}$, can be easily verified to give ${{\bf C}}$ the structure of a commutative ring. Also, the real line ${{\bf R}}$ embeds into ${{\bf C}}$ by identifying each real number ${a}$ with the coset ${a + \langle \mathrm{x}^2+1 \rangle}$; note that this identification is injective, as no real number is a multiple of the polynomial ${\mathrm{x}^2+1}$.

If we define ${i \in {\bf C}}$ to be the coset

$\displaystyle i := \mathrm{x} + \langle \mathrm{x}^2 + 1 \rangle,$

then it is clear from construction that ${i^2+1=0}$. Thus ${{\bf C}}$ contains both ${{\bf R}}$ and a solution of the equation ${i^2+1=0}$. Also, since every element of ${{\bf C}}$ is of the form ${P(\mathrm{x}) + \langle \mathrm{x}^2+1 \rangle}$ for some polynomial ${P \in {\bf R}[\mathrm{x}]}$, we see that every element of ${{\bf C}}$ is a polynomial combination ${P(i)}$ of ${i}$ with real coefficients; in particular, any subring of ${{\bf C}}$ that contains ${{\bf R}}$ and ${i}$ will necessarily have to contain every element of ${{\bf C}}$. Thus ${{\bf C}}$ is generated by ${{\bf R}}$ and ${i}$.
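
Since every coset has a unique representative ${a + b\mathrm{x}}$ of degree less than two (divide by ${\mathrm{x}^2+1}$ and keep the remainder), one can sketch the quotient construction concretely, storing the coset of ${a + b\mathrm{x}}$ as the pair ${(a,b)}$. This is an illustrative Python sketch, not code from the notes:

```python
# Illustrative sketch (not code from the notes): every coset modulo
# x^2 + 1 has a unique representative a + b*x of degree < 2, which we
# store as the pair (a, b).

def reduce_mod(P):
    """Reduce a coefficient list [a_0, a_1, ...] modulo x^2 + 1,
    i.e. rewrite x^2 -> -1 until only a + b*x remains."""
    a, b = 0.0, 0.0
    for k, c in enumerate(P):
        # x^k = (x^2)^(k//2) * x^(k%2) = (-1)^(k//2) * x^(k%2)
        sign = -1.0 if (k // 2) % 2 else 1.0
        if k % 2 == 0:
            a += sign * c
        else:
            b += sign * c
    return (a, b)

def coset_mul(z, w):
    """Multiply the cosets of a + b*x and c + d*x and reduce."""
    (a, b), (c, d) = z, w
    # (a + b x)(c + d x) = ac + (ad + bc) x + bd x^2
    return reduce_mod([a * c, a * d + b * c, b * d])

i = (0.0, 1.0)                          # the coset x + <x^2 + 1>
assert coset_mul(i, i) == (-1.0, 0.0)   # i^2 = -1, as constructed
```

The assertion at the end is precisely the statement ${i^2 + 1 = 0}$ in the quotient ring.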

The only remaining thing to verify is that ${{\bf C}}$ is a field and not just a commutative ring. In other words, we need to show that every non-zero element of ${{\bf C}}$ has a multiplicative inverse. This stems from a particular property of the polynomial ${\mathrm{x}^2 + 1}$, namely that it is irreducible in ${{\bf R}[\mathrm{x}]}$. That is to say, we cannot factor ${\mathrm{x}^2+1}$ into non-constant polynomials

$\displaystyle \mathrm{x}^2 + 1 = P(\mathrm{x}) Q(\mathrm{x})$

with ${P(\mathrm{x}), Q(\mathrm{x}) \in {\bf R}[\mathrm{x}]}$. Indeed, as ${\mathrm{x}^2+1}$ has degree two, the only possible way such a factorisation could occur is if ${P(\mathrm{x}), Q(\mathrm{x})}$ both have degree one, which would imply that the polynomial ${\mathrm{x}^2+1}$ has a root in the reals ${{\bf R}}$, which of course it does not.

Because the polynomial ${\mathrm{x}^2+1}$ is irreducible, it is also prime: if ${\mathrm{x}^2+1}$ divides a product ${P(\mathrm{x}) Q(\mathrm{x})}$ of two polynomials in ${{\bf R}[\mathrm{x}]}$, then it must also divide at least one of the factors ${P(\mathrm{x})}$, ${Q(\mathrm{x})}$. Indeed, if ${\mathrm{x}^2 + 1}$ does not divide ${P(\mathrm{x})}$, then by irreducibility the greatest common divisor of ${\mathrm{x}^2+1}$ and ${P(\mathrm{x})}$ is ${1}$. Applying the Euclidean algorithm for polynomials, we then obtain a representation of ${1}$ as

$\displaystyle 1 = R(\mathrm{x}) (\mathrm{x}^2+1) + S(\mathrm{x}) P(\mathrm{x})$

for some polynomials ${R(\mathrm{x}), S(\mathrm{x})}$; multiplying both sides by ${Q(\mathrm{x})}$, and noting that ${\mathrm{x}^2+1}$ divides both ${R(\mathrm{x}) (\mathrm{x}^2+1) Q(\mathrm{x})}$ and (by hypothesis) ${S(\mathrm{x}) P(\mathrm{x}) Q(\mathrm{x})}$, we conclude that ${Q(\mathrm{x})}$ is a multiple of ${\mathrm{x}^2+1}$.
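
The Bezout identity ${1 = R(\mathrm{x}) (\mathrm{x}^2+1) + S(\mathrm{x}) P(\mathrm{x})}$ can be made concrete with a short Python sketch of the extended Euclidean algorithm for polynomials (all names are illustrative, not from the notes; polynomials are coefficient lists with exact rational arithmetic and no trailing zero coefficients):

```python
from fractions import Fraction as F

# Illustrative sketch (not code from the notes): the extended Euclidean
# algorithm for polynomials over the rationals, stored as coefficient
# lists [a_0, a_1, ...] in exact Fraction arithmetic.

def trim(P):
    """Drop trailing zero coefficients (keeping at least one entry)."""
    while len(P) > 1 and P[-1] == 0:
        P.pop()
    return P

def add(P, Q):
    n = max(len(P), len(Q))
    return trim([(P[k] if k < len(P) else F(0)) +
                 (Q[k] if k < len(Q) else F(0)) for k in range(n)])

def mul(P, Q):
    out = [F(0)] * (len(P) + len(Q) - 1)
    for i, p in enumerate(P):
        for j, q in enumerate(Q):
            out[i + j] += p * q
    return trim(out)

def neg(P):
    return [-p for p in P]

def divmod_poly(P, Q):
    """Polynomial long division: P = quot * Q + rem with deg rem < deg Q."""
    P = P[:]
    quot = [F(0)] * max(len(P) - len(Q) + 1, 1)
    while len(P) >= len(Q) and any(P):
        shift = len(P) - len(Q)
        c = P[-1] / Q[-1]
        quot[shift] = c
        for k, q in enumerate(Q):
            P[shift + k] -= c * q
        trim(P)
    return trim(quot), trim(P)

def gcdex(A, B):
    """Extended Euclidean algorithm: returns (S, T, G) with S*A + T*B = G."""
    s0, s1, t0, t1 = [F(1)], [F(0)], [F(0)], [F(1)]
    while B != [F(0)]:
        Q, R = divmod_poly(A, B)
        A, B = B, R
        s0, s1 = s1, add(s0, neg(mul(Q, s1)))
        t0, t1 = t1, add(t0, neg(mul(Q, t1)))
    return s0, t0, A

# x^2 + 1 and the non-multiple P(x) = x + 1 have a constant gcd,
# yielding 1 = R(x) (x^2+1) + S(x) P(x) after normalisation.
A = [F(1), F(0), F(1)]          # x^2 + 1
P = [F(1), F(1)]                # x + 1
S, T, G = gcdex(A, P)
c = G[-1]
R_, S_ = [s / c for s in S], [t / c for t in T]
assert add(mul(R_, A), mul(S_, P)) == [F(1)]
```

For this choice of ${P(\mathrm{x}) = \mathrm{x}+1}$ the sketch recovers ${R(\mathrm{x}) = 1/2}$ and ${S(\mathrm{x}) = (1-\mathrm{x})/2}$, since ${(\mathrm{x}^2+1) + (1-\mathrm{x})(\mathrm{x}+1) = 2}$.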

Since ${\mathrm{x}^2+1}$ is prime, the quotient space ${{\bf C} = {\bf R}[\mathrm{x}] / \langle \mathrm{x}^2+1 \rangle}$ is an integral domain: there are no zero-divisors in ${{\bf C}}$ other than zero. This brings us closer to the task of showing that ${{\bf C}}$ is a field, but we are not quite there yet; note for instance that ${{\bf R}[\mathrm{x}]}$ is an integral domain, but not a field. But one can finish up by using finite dimensionality. As ${{\bf C}}$ is a ring containing the field ${{\bf R}}$, it is certainly a vector space over ${{\bf R}}$; as ${{\bf C}}$ is generated by ${{\bf R}}$ and ${i}$, and ${i^2=-1}$, we see that it is in fact a two-dimensional vector space over ${{\bf R}}$, spanned by ${1}$ and ${i}$ (which are linearly independent, as ${i}$ clearly cannot be real). In particular, it is finite dimensional. For any non-zero ${z \in {\bf C}}$, the multiplication map ${w \mapsto zw}$ is an ${{\bf R}}$-linear map from this finite-dimensional vector space to itself. As ${{\bf C}}$ is an integral domain, this map is injective; by finite-dimensionality, it is therefore surjective (by the rank-nullity theorem). In particular, there exists ${w}$ such that ${zw = 1}$, and hence ${z}$ is invertible and ${{\bf C}}$ is a field. This concludes the construction of a complex field ${{\bf C}}$.
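
The finite-dimensionality argument is also easy to make concrete: in the basis ${1, i}$, multiplication by ${z = a+bi}$ is the ${2 \times 2}$ matrix with rows ${(a,-b)}$ and ${(b,a)}$, whose determinant ${a^2+b^2}$ is non-zero for ${z \neq 0}$, so solving ${zw = 1}$ produces the inverse explicitly. A small Python sketch (illustrative names, exact rational arithmetic):

```python
from fractions import Fraction as F

# Illustrative sketch (not code from the notes): identify z = a + bi with
# the pair (a, b).  In the basis 1, i, multiplication by z is the matrix
# [[a, -b], [b, a]], whose determinant a^2 + b^2 is non-zero for z != 0;
# solving z*w = 1 by Cramer's rule gives the inverse explicitly.

def mul(z, w):
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

def inverse(z):
    a, b = z
    det = a * a + b * b              # determinant of the multiplication map
    if det == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    # Cramer's rule for [[a, -b], [b, a]] (x, y) = (1, 0):
    return (a / det, -b / det)

z = (F(3), F(4))
assert mul(z, inverse(z)) == (F(1), F(0))
```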

Remark 2 One can think of the action of passing from a ring ${R}$ to a quotient ${R/I}$ by some ideal ${I}$ as the action of forcing some relations to hold between the various elements of ${R}$, by requiring all the elements of the ideal ${I}$ (or equivalently, all the generators of ${I}$) to vanish. Thus one can think of ${{\bf R}[\mathrm{x}] / \langle \mathrm{x}^2 + 1 \rangle}$ as the ring formed by adjoining a new element ${i}$ to the existing ring ${{\bf R}}$ and then demanding the constraint ${i^2+1=0}$. With this perspective, the main issues to check in order to obtain a complex field are firstly that these relations do not collapse the ring so much that two previously distinct elements of ${{\bf R}}$ become equal, and secondly that all the non-zero elements become invertible once the relations are imposed, so that we obtain a field rather than merely a ring or integral domain.

Remark 3 It is instructive to compare the complex field ${{\bf R}[\mathrm{x}] / \langle \mathrm{x}^2 + 1 \rangle}$, formed by adjoining the square root of ${-1}$ to the reals, with other commutative rings such as the dual numbers ${{\bf R}[\mathrm{x}] / \langle \mathrm{x}^2 \rangle}$ (which adjoins an additional square root of ${0}$ to the reals) or the split complex numbers ${{\bf R}[\mathrm{x}] / \langle \mathrm{x}^2 - 1 \rangle}$ (which adjoins a new root of ${+1}$ to the reals). The latter two objects are perfectly good rings, but are not fields (they contain zero divisors, and the first ring even contains a nilpotent). This is ultimately due to the reducible nature of the polynomials ${\mathrm{x}^2}$ and ${\mathrm{x}^2-1}$ in ${{\bf R}[\mathrm{x}]}$.
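
One can see the zero divisors mentioned in Remark 3 directly by redoing the pair arithmetic with ${\mathrm{x}^2}$ rewritten to ${0}$ or ${+1}$ instead of ${-1}$ (an illustrative Python sketch, not from the notes):

```python
# Illustrative sketch (not from the notes): the same pair arithmetic as
# for C, but with x^2 rewritten to eps2 instead of -1.

def mul_mod(z, w, eps2):
    """Multiply a + b*x and c + d*x modulo x^2 - eps2 (i.e. x^2 -> eps2)."""
    (a, b), (c, d) = z, w
    return (a * c + eps2 * b * d, a * d + b * c)

# Dual numbers (x^2 -> 0): x is a non-zero nilpotent.
assert mul_mod((0, 1), (0, 1), eps2=0) == (0, 0)
# Split complex numbers (x^2 -> 1): (1 + x)(1 - x) = 0, a pair of zero divisors.
assert mul_mod((1, 1), (1, -1), eps2=1) == (0, 0)
# Complex numbers (x^2 -> -1): no such degeneration; x * x = -1.
assert mul_mod((0, 1), (0, 1), eps2=-1) == (-1, 0)
```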

Uniqueness of ${{\bf C}}$ up to isomorphism is a straightforward exercise:

Exercise 4 Let ${({\bf C}, i)}$ and ${({\bf C}', i')}$ be two complex fields. Show that there is a unique field isomorphism ${\iota: {\bf C} \rightarrow {\bf C}'}$ that maps ${i}$ to ${i'}$ and is the identity on ${{\bf R}}$ (such an isomorphism is known as a complex field isomorphism).

Now that we have existence and uniqueness up to isomorphism, it is safe to designate one of the complex fields ${{\bf C} = ({\bf C},i)}$ as the complex field; the other complex fields out there will no longer be of much importance in this course (or indeed, in most of mathematics), with one small exception that we will get to later in this section. One can, if one wishes, use the above abstract algebraic construction ${( {\bf R}[\mathrm{x}] / \langle \mathrm{x}^2+1 \rangle, \mathrm{x} + \langle \mathrm{x}^2 + 1 \rangle)}$ as the choice for “the” complex field ${{\bf C}}$, but one can certainly pick other choices if desired (e.g. the Argand plane construction in Remark 7 below). But in view of Exercise 4, the precise construction of ${{\bf C}}$ is not terribly relevant for the purposes of actually doing complex analysis, much as the question of whether to construct the real numbers ${{\bf R}}$ using Dedekind cuts, equivalence classes of Cauchy sequences, or some other construction is not terribly relevant for the purposes of actually doing real analysis. So, from here on out, we will no longer refer to the precise construction of ${{\bf C}}$ used; the reader may certainly substitute his or her own favourite construction of ${{\bf C}}$ in place of ${{\bf R}[\mathrm{x}] / \langle \mathrm{x}^2 + 1 \rangle}$ if desired, with essentially no change to the rest of the lecture notes.

Exercise 5 Let ${k}$ be an arbitrary field, let ${k[\mathrm{x}]}$ be the ring of polynomials with coefficients in ${k}$, and let ${P(\mathrm{x})}$ be an irreducible polynomial in ${k[\mathrm{x}]}$ of degree at least two. Show that ${k[\mathrm{x}] / \langle P(\mathrm{x}) \rangle}$ is a field containing an embedded copy of ${k}$, as well as a root ${\alpha}$ of the equation ${P(\alpha)=0}$, and that this field is generated by ${k}$ and ${\alpha}$. Also show that all such fields are unique up to isomorphism. (This field ${k[\mathrm{x}] / \langle P(\mathrm{x}) \rangle}$ is an example of a field extension of ${k}$, the further study of which can be found in any advanced undergraduate or early graduate text on algebra, and is the starting point in particular for the beautiful topic of Galois theory, which we will not discuss here.)

Exercise 6 Let ${k}$ be an arbitrary field. Show that every non-constant polynomial ${P(\mathrm{x})}$ in ${k[\mathrm{x}]}$ can be factored as the product ${P_1(\mathrm{x}) \dots P_r(\mathrm{x})}$ of irreducible non-constant polynomials. Furthermore show that this factorisation is unique up to permutation of the factors ${P_1(\mathrm{x}),\dots,P_r(\mathrm{x})}$, and multiplication of each of the factors by a constant (with the product of all such constants being one). In other words: the polynomial ring ${k[\mathrm{x}]}$ is a unique factorisation domain.

Remark 7 One can also describe the complex field ${{\bf C}}$ more explicitly and more geometrically as

$\displaystyle {\bf C} = \{ a + b i: a,b \in {\bf R} \}$

with each element ${z}$ of ${{\bf C}}$ having a unique representation of the form ${a+bi}$, thus

$\displaystyle a+bi = c+di \iff a=c \hbox{ and } b=d$

for real ${a,b,c,d}$. The addition, subtraction, and multiplication operations can then be written down explicitly in these coordinates as

$\displaystyle (a+bi) + (c+di) = (a+c) + (b+d)i$

$\displaystyle (a+bi) - (c+di) = (a-c) + (b-d)i$

$\displaystyle (a+bi) (c+di) = (ac-bd) + (ad+bc)i$

and with a bit more work one can compute the division operation as

$\displaystyle \frac{a+bi}{c+di} = \frac{ac+bd}{c^2+d^2} + \frac{bc-ad}{c^2+d^2} i$

if ${c+di \neq 0}$. One could take these coordinate representations as the definition of the complex field ${{\bf C}}$ and its basic arithmetic operations, and this is indeed done in many texts introducing the complex numbers. In particular, one could take the Argand plane ${({\bf R}^2, (0,1))}$ as the choice of complex field, where we identify each point ${(a,b)}$ in ${{\bf R}^2}$ with ${a+bi}$ (so for instance ${{\bf R}^2}$ becomes endowed with the multiplication operation ${(a,b) (c,d) = (ac-bd,ad+bc)}$). This is a very concrete and direct way to construct the complex numbers; the main drawback is that it is not immediately obvious that the field axioms are all satisfied. For instance, the associativity of multiplication is rather tedious to verify in the coordinates of the Argand plane. In contrast, the more abstract algebraic construction of the complex numbers given above makes it more evident what the source of the field structure on ${{\bf C}}$ is, namely the irreducibility of the polynomial ${\mathrm{x}^2+1}$.
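
As a quick illustration of both points (the coordinate formulas, and the tedium of verifying axioms such as associativity by hand), the formulas can be checked mechanically, here cross-checked against Python's built-in complex type (an illustrative sketch, not code from the notes):

```python
from fractions import Fraction as F

# Illustrative sketch (not code from the notes): the coordinate formulas
# above, cross-checked against Python's built-in complex type.

def add(z, w):
    (a, b), (c, d) = z, w
    return (a + c, b + d)

def mul(z, w):
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

def div(z, w):
    (a, b), (c, d) = z, w
    den = c * c + d * d              # non-zero as long as c + di != 0
    return ((a * c + b * d) / den, (b * c - a * d) / den)

z, w, u = (2, 3), (5, -1), (1, 2)

# multiplication agrees with the built-in complex arithmetic
zw = complex(*z) * complex(*w)
assert mul(z, w) == (zw.real, zw.imag)

# associativity of multiplication: tedious by hand, instant by machine
assert mul(mul(z, w), u) == mul(z, mul(w, u))

# the division formula really does invert multiplication (exact check
# using rational arithmetic)
zq, wq = (F(2), F(3)), (F(5), F(-1))
assert mul(div(zq, wq), wq) == zq
```

Of course, a handful of numerical spot checks is not a proof of the field axioms; the abstract construction above is what actually supplies that proof.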

Remark 8 Because of the Argand plane construction, we will sometimes refer to the space ${{\bf C}}$ of complex numbers as the complex plane. We should warn, though, that in some areas of mathematics, particularly in algebraic geometry, ${{\bf C}}$ is viewed as a one-dimensional complex vector space (or a one-dimensional complex manifold or complex variety), and so ${{\bf C}}$ is sometimes referred to in those cases as a complex line. (Similarly, Riemann surfaces, which from a real point of view are two-dimensional surfaces, can sometimes be referred to as complex curves in the literature; the modular curve is a famous instance of this.) In this current course, though, the topological notion of dimension turns out to be more important than the algebraic notions of dimension, and as such we shall generally refer to ${{\bf C}}$ as a plane rather than a line.

Elements of ${{\bf C}}$ of the form ${bi}$ for ${b}$ real are known as purely imaginary numbers; the terminology is colourful, but despite the name, imaginary numbers have precisely the same first-class mathematical object status as real numbers. If ${z=a+bi}$ is a complex number, the real components ${a,b}$ of ${z}$ are known as the real part ${\mathrm{Re}(z)}$ and imaginary part ${\mathrm{Im}(z)}$ of ${z}$ respectively. Complex numbers that are not real are occasionally referred to as strictly complex numbers. In the complex plane, the set ${{\bf R}}$ of real numbers forms the real axis, and the set ${i{\bf R}}$ of imaginary numbers forms the imaginary axis. Traditionally, elements of ${{\bf C}}$ are denoted with symbols such as ${z}$, ${w}$, or ${\zeta}$, while symbols such as ${a,b,c,d,x,y}$ are typically intended to represent real numbers instead.

Remark 9 We noted earlier that the equation ${x^2+1=0}$ had no solutions in the reals because ${x^2+1}$ was always positive. In other words, the properties of the order relation ${<}$ on ${{\bf R}}$ prevented the existence of a root for the equation ${x^2+1=0}$. As ${{\bf C}}$ does have a root for ${x^2+1=0}$, this means that the complex numbers ${{\bf C}}$ cannot be ordered in the same way that the reals are ordered (that is to say, being totally ordered, with the positive numbers closed under both addition and multiplication). Indeed, one usually refrains from putting any order structure on the complex numbers, so that statements such as ${z < w}$ for complex numbers ${z,w}$ are left undefined (unless ${z,w}$ are real, in which case one can of course use the real ordering). In particular, the complex number ${i}$ is considered to be neither positive nor negative, and an assertion such as ${z < w}$ is understood to implicitly carry with it the claim that ${z,w}$ are real numbers and not just complex numbers. (Of course, if one really wants to, one can find some total orderings to place on ${{\bf C}}$, e.g. lexicographical ordering on the real and imaginary parts. However, such orderings do not interact too well with the algebraic structure of ${{\bf C}}$ and are rarely useful in practice.)

As with any other field, we can raise a complex number ${z}$ to a non-negative integer power ${n}$ by declaring inductively ${z^0 := 1}$ and ${z^{n+1} := z \times z^n}$ for ${n \geq 0}$; in particular we adopt the usual convention that ${0^0=1}$ (when thinking of the base ${0}$ as a complex number, and the exponent ${0}$ as a non-negative integer). For negative integers ${n = -m}$, we define ${z^n := 1/z^m}$ for non-zero ${z}$; we leave ${z^n}$ undefined when ${z}$ is zero and ${n}$ is negative. At the present time we do not attempt to define ${z^\alpha}$ for any exponent ${\alpha}$ other than an integer; we will return to such exponentiation operations later in the course, though we will at least define the complex exponential ${e^z}$ for any complex ${z}$ later in this set of notes.
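
The inductive definition and its conventions can be sketched as follows (illustrative Python; the name cpow is not from the notes):

```python
# Illustrative sketch (the name cpow is not from the notes): the
# inductive definition of z^n for integer n, with the conventions above.

def cpow(z, n):
    """Raise the complex number z to the integer power n."""
    if n < 0:
        if z == 0:
            raise ZeroDivisionError("z^n is undefined for z = 0, n < 0")
        return 1 / cpow(z, -n)
    w = complex(1, 0)                # z^0 := 1, so in particular 0^0 = 1
    for _ in range(n):
        w = z * w                    # z^(k+1) := z * z^k
    return w

assert cpow(complex(0, 0), 0) == 1                # the convention 0^0 = 1
assert cpow(complex(0, 1), 2) == complex(-1, 0)   # i^2 = -1
assert cpow(complex(0, 1), -1) == complex(0, -1)  # 1/i = -i
```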

By definition, a complex field ${({\bf C},i)}$ is a field ${{\bf C}}$ together with a root ${z=i}$ of the equation ${z^2+1=0}$. But if ${z=i}$ is a root of the equation ${z^2+1=0}$, then so is ${z=-i}$ (indeed, from the factorisation ${z^2+1 = (z-i) (z+i)}$ we see that these are the only two roots of this quadratic equation). Thus we have another complex field ${\overline{{\bf C}} := ({\bf C},-i)}$, which differs from ${{\bf C}}$ only in the choice of root ${i}$. By Exercise 4, there is a unique field isomorphism from ${{\bf C}}$ to ${{\bf C}}$ that maps ${i}$ to ${-i}$ (i.e. a complex field isomorphism from ${{\bf C}}$ to ${\overline{{\bf C}}}$); this operation is known as complex conjugation and is denoted ${z \mapsto \overline{z}}$. In coordinates, we have

$\displaystyle \overline{a+bi} = a-bi.$

Being a field isomorphism, we have in particular that

$\displaystyle \overline{z+w} = \overline{z} + \overline{w}$

and

$\displaystyle \overline{zw} = \overline{z} \overline{w}$

for all complex numbers ${z,w}$. It is also clear that complex conjugation fixes the real numbers, and only the real numbers: ${z = \overline{z}}$ if and only if ${z}$ is real. Geometrically, complex conjugation is the operation of reflection in the complex plane across the real axis. It is clearly an involution in the sense that it is its own inverse:

$\displaystyle \overline{\overline{z}} = z.$

One can also relate the real and imaginary parts to complex conjugation via the identities

$\displaystyle \mathrm{Re}(z) = \frac{z + \overline{z}}{2}; \quad \mathrm{Im}(z) = \frac{z-\overline{z}}{2i}. \ \ \ \ \ (1)$
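As a quick numerical illustration (not part of the original notes), the conjugation identities above, including (1), can be checked in Python, whose built-in complex type implements the conjugation map via the `.conjugate()` method:

```python
# Verify Re(z) = (z + conj(z))/2 and Im(z) = (z - conj(z))/(2i)
# for a sample complex number, using Python's built-in complex type.
z = 3 + 4j

re_z = (z + z.conjugate()) / 2
im_z = (z - z.conjugate()) / (2j)

assert re_z == 3 + 0j  # Re(z), returned as a complex value with zero imaginary part
assert im_z == 4 + 0j  # Im(z), likewise real after the division by 2i
```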

Remark 10 Any field automorphism of ${{\bf C}}$ has to map ${i}$ to a root of ${z^2+1=0}$, and so the only field automorphisms of ${{\bf C}}$ that preserve the real line are the identity map and the conjugation map; conversely, the real line is the subfield of ${{\bf C}}$ fixed by both of these automorphisms. In the language of Galois theory, this means that ${{\bf C}}$ is a Galois extension of ${{\bf R}}$, with Galois group ${\mathrm{Gal}({\bf C}/{\bf R})}$ consisting of two elements. There is a certain sense in which one can think of the complex numbers (or more precisely, the scheme ${{\mathcal C}}$ of complex numbers) as a double cover of the real numbers (or more precisely, the scheme ${{\mathcal R}}$ of real numbers), analogous to how the boundary of a Möbius strip can be viewed as a double cover of the unit circle formed by shrinking the width of the strip to zero. (In this analogy, points on the unit circle correspond to specific models of the real number system ${{\bf R}}$, and lying above each such point are two specific models ${({\bf C}, i)}$, ${({\bf C},-i)}$ of the complex number system; this analogy can be made precise using Grothendieck’s “functor of points” interpretation of schemes.) The operation of complex conjugation is then analogous to the operation of monodromy caused by looping once around the base unit circle, causing the two complex fields sitting above a real field to swap places with each other. (This analogy is not quite perfect, by the way, because the boundary of a Möbius strip is not simply connected and can in turn be finitely covered by other curves, whereas the complex numbers are algebraically complete and admit no further finite extensions; one should really replace the unit circle here by something with a two-element fundamental group, such as the projective plane ${\mathbf{RP}^2}$ that is double covered by the sphere ${S^2}$, but this is harder to visualize.) 
The analogy between (absolute) Galois groups and fundamental groups suggested by this picture can be made precise in scheme theory by introducing the concept of an étale fundamental group, which unifies the two concepts, but developing this further is well beyond the scope of this course; see this book of Szamuely for further discussion.

Observe that if we multiply a complex number ${z}$ by its complex conjugate ${\overline{z}}$, we obtain a quantity ${N(z) := z \overline{z}}$ which is invariant with respect to conjugation (i.e. ${\overline{N(z)} = N(z)}$) and is therefore real. The map ${N: {\bf C} \rightarrow {\bf R}}$ produced this way is known in field theory as the norm form of ${{\bf C}}$ over ${{\bf R}}$; it is clearly multiplicative in the sense that ${N(zw) = N(z) N(w)}$, and is only zero when ${z}$ is zero. It can be used to link multiplicative inversion with complex conjugation, in that we clearly have

$\displaystyle \frac{1}{z} = \frac{\overline{z}}{N(z)} \ \ \ \ \ (2)$

for any non-zero complex number ${z}$. In coordinates, we have

$\displaystyle N(a+bi) = (a+bi) (a-bi) = a^2+b^2$

(thus recovering, by the way, the inversion formula ${\frac{1}{a+bi} = \frac{a}{a^2+b^2} - \frac{b}{a^2+b^2} i}$ implicit in Remark 7). In coordinates, the multiplicativity ${N(zw) = N(z) N(w)}$ takes the form of Lagrange’s identity

$\displaystyle (ac-bd)^2 + (ad+bc)^2 = (a^2+b^2) (c^2+d^2).$
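For a concrete check (an illustrative Python sketch, not part of the notes; the helper `N` below is just the norm form in coordinates), one can confirm both the multiplicativity of the norm form and Lagrange's identity on sample values:

```python
# Check the multiplicativity N(zw) = N(z) N(w) of the norm form N(z) = z * conj(z),
# which in coordinates is exactly Lagrange's two-square identity.
def N(z):
    return (z * z.conjugate()).real  # z * conj(z) is real; extract it as a float

a, b, c, d = 2.0, 3.0, 5.0, 7.0
z, w = complex(a, b), complex(c, d)

assert N(z * w) == N(z) * N(w)
assert (a*c - b*d)**2 + (a*d + b*c)**2 == (a**2 + b**2) * (c**2 + d**2)
```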

— 2. The geometry of the complex numbers —

The norm form ${N}$ of the complex numbers has the feature of being positive definite: ${N(z)}$ is always non-negative (and strictly positive when ${z}$ is non-zero). This is a feature that is somewhat special to the complex numbers; for instance, the quadratic extension ${{\bf Q}(\sqrt{2})}$ of the rationals ${{\bf Q}}$ by ${\sqrt{2}}$ has the norm form ${N(n+m\sqrt{2}) = (n+m\sqrt{2}) (n-m\sqrt{2}) = n^2-2m^2}$, which is indefinite. One can view this positive definiteness of the norm form as the one remaining vestige in ${{\bf C}}$ of the order structure ${<}$ on the reals, which as remarked previously is no longer present directly in the complex numbers. (One can also view the positive definiteness of the norm form as a consequence of the topological connectedness of the punctured complex plane ${{\bf C} \backslash \{0\}}$: the norm form is positive at ${z=1}$, and cannot change sign anywhere in ${{\bf C} \backslash \{0\}}$, so is forced to be positive on the rest of this connected region.)

One consequence of positive definiteness is that the bilinear form

$\displaystyle \langle z, w \rangle := \mathrm{Re}( z \overline{w} )$

becomes a positive definite inner product on ${{\bf C}}$ (viewed as a vector space over ${{\bf R}}$). In particular, this turns the complex numbers into an inner product space over the reals. From the usual theory of inner product spaces, we can then construct a norm

$\displaystyle |z| := \langle z, z \rangle^{1/2} = N(z)^{1/2}$

(thus, the norm is the square root of the norm form) which obeys the triangle inequality

$\displaystyle |z+w| \leq |z| + |w|; \ \ \ \ \ (3)$

the norm is also multiplicative,

$\displaystyle |zw| = |z| |w| \ \ \ \ \ (4)$

and invariant under conjugation:

$\displaystyle |\overline{z}| = |z|. \ \ \ \ \ (5)$

The norm ${|\cdot|}$ clearly extends the absolute value operation ${x \mapsto |x|}$ on the real numbers, and so we also refer to the norm ${|z|}$ of a complex number ${z}$ as its absolute value or magnitude. In coordinates, we have

$\displaystyle |a+bi| = \sqrt{a^2+b^2}, \ \ \ \ \ (6)$

thus for instance ${|i|=1}$, and from (6) we also immediately have the useful inequalities

$\displaystyle |\mathrm{Re}(z)|, |\mathrm{Im}(z)| \leq |z| \leq |\mathrm{Re}(z)| + |\mathrm{Im}(z)|. \ \ \ \ \ (7)$

As with any other normed vector space, the norm ${z \mapsto |z|}$ defines a metric on the complex numbers via the definition

$\displaystyle d(z,w) := |z-w|.$

Note that, using the Argand plane representation of ${{\bf C}}$ as ${{\bf R}^2}$, this metric coincides with the usual Euclidean metric on ${{\bf R}^2}$. This metric in turn defines a topology on ${{\bf C}}$ (generated in the usual manner by the open disks ${D(z,r) := \{ w \in {\bf C}: |z-w| < r \}}$), which in turn generates all the usual topological notions such as the concept of an open set, closed set, compact set, connected set, and boundary of a set; the notion of a limit of a sequence ${z_n}$; the notion of a continuous map, and so forth. For instance, a sequence ${z_n}$ of complex numbers converges to a limit ${z \in {\bf C}}$ if ${|z_n -z| \rightarrow 0}$ as ${n \rightarrow \infty}$, and a map ${f: {\bf C} \rightarrow {\bf C}}$ is continuous if one has ${f(z_n) \rightarrow f(z)}$ whenever ${z_n \rightarrow z}$, or equivalently if the inverse image of any open set is open. Again, using the Argand plane representation, these notions coincide with their counterparts on the Euclidean plane ${{\bf R}^2}$.

As usual, if a sequence ${z_n}$ of complex numbers converges to a limit ${z}$, we write ${z = \lim_{n \rightarrow \infty} z_n}$. From the triangle inequality (3) and the multiplicativity (4) we see that the addition operation ${+: {\bf C} \times {\bf C} \rightarrow {\bf C}}$, subtraction operation ${-: {\bf C} \times {\bf C} \rightarrow {\bf C}}$, and multiplication operation ${\times: {\bf C} \times {\bf C} \rightarrow {\bf C}}$ are all continuous; thus we have the familiar limit laws

$\displaystyle \lim_{n \rightarrow \infty} (z_n + w_n) = \lim_{n \rightarrow \infty} z_n + \lim_{n \rightarrow \infty} w_n,$

$\displaystyle \lim_{n \rightarrow \infty} (z_n - w_n) = \lim_{n \rightarrow \infty} z_n - \lim_{n \rightarrow \infty} w_n$

and

$\displaystyle \lim_{n \rightarrow \infty} (z_n \cdot w_n) = \lim_{n \rightarrow \infty} z_n \cdot \lim_{n \rightarrow \infty} w_n$

whenever the limits on the right-hand side exist. Similarly, from (5) we see that complex conjugation is an isometry of the complex numbers, thus

$\displaystyle \lim_{n \rightarrow \infty} \overline{z_n} = \overline{\lim_{n \rightarrow \infty} z_n}$

when the limit on the right-hand side exists. As a consequence, the norm form ${N: {\bf C} \rightarrow {\bf R}}$ and the absolute value ${|\cdot|: {\bf C} \rightarrow {\bf R}}$ are also continuous, thus

$\displaystyle \lim_{n \rightarrow \infty} |z_n| = |\lim_{n \rightarrow \infty} z_n|$

whenever the limit on the right-hand side exists. Using the formula (2) for the reciprocal of a complex number, we also see that division is a continuous operation as long as the denominator is non-zero, thus

$\displaystyle \lim_{n \rightarrow \infty} \frac{z_n}{w_n} = \frac{\lim_{n \rightarrow \infty} z_n}{\lim_{n \rightarrow \infty} w_n}$

as long as the limits on the right-hand side exist, and the limit in the denominator is non-zero.

From (7) we see that

$\displaystyle z_n \rightarrow z \iff \mathrm{Re}(z_n) \rightarrow \mathrm{Re}(z) \hbox{ and } \mathrm{Im}(z_n) \rightarrow \mathrm{Im}(z);$

in particular

$\displaystyle \lim_{n \rightarrow \infty} \mathrm{Re}(z_n) = \mathrm{Re} \lim_{n \rightarrow \infty} z_n$

and

$\displaystyle \lim_{n \rightarrow \infty} \mathrm{Im}(z_n) = \mathrm{Im} \lim_{n \rightarrow \infty} z_n$

whenever the limit on the right-hand side exists. One consequence of this is that ${{\bf C}}$ is complete: every sequence ${z_n}$ of complex numbers that is a Cauchy sequence (thus ${|z_n-z_m| \rightarrow 0}$ as ${n,m \rightarrow \infty}$) converges to a unique complex limit ${z}$. (As such, one can view the complex numbers as a (very small) example of a Hilbert space.)

As with the reals, we have the fundamental fact that any formal series ${\sum_{n=0}^\infty z_n}$ of complex numbers which is absolutely convergent, in the sense that the non-negative series ${\sum_{n=0}^\infty |z_n|}$ is finite, is necessarily convergent to some complex number ${S}$, in the sense that the partial sums ${\sum_{n=0}^N z_n}$ converge to ${S}$ as ${N \rightarrow \infty}$. This is because the triangle inequality ensures that the partial sums are a Cauchy sequence. As usual we write ${S = \sum_{n=0}^\infty z_n}$ to denote the assertion that ${S}$ is the limit of the partial sums ${\sum_{n=0}^N z_n}$. We will occasionally have need to deal with series that are only conditionally convergent rather than absolutely convergent, but in most of our applications the only series we will actually evaluate are the absolutely convergent ones. Many of the limit laws imply analogues for series, thus for instance

$\displaystyle \sum_{n=0}^\infty \mathrm{Re}(z_n) = \mathrm{Re} \sum_{n=0}^\infty z_n$

whenever the series on the right-hand side is absolutely convergent (or even just convergent). We will not write down an exhaustive list of such series laws here.

An important role in complex analysis is played by the unit circle

$\displaystyle S^1 := \{ z \in {\bf C}: |z|=1 \}.$

In coordinates, this is the set of points ${a+bi}$ for which ${a^2+b^2=1}$, and so this indeed has the geometric structure of a unit circle. Elements of the unit circle will be referred to in these notes as phases. Every non-zero complex number ${z}$ has a unique polar decomposition as ${z = r \omega}$ where ${r>0}$ is a positive real and ${\omega}$ lies on the unit circle ${S^1}$. Indeed, it is easy to see that this decomposition is given by ${r = |z|}$ and ${\omega = \frac{z}{|z|}}$, and that this is the only polar decomposition of ${z}$. We refer to the polar components ${r=|z|}$ and ${\omega = z/|z|}$ of a non-zero complex number ${z}$ as the magnitude and phase of ${z}$ respectively.
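In Python (an illustrative sketch, not part of the notes), the polar decomposition of a non-zero complex number is immediate from the built-in `abs`:

```python
# Polar decomposition z = r * omega, with r = |z| > 0 and omega on the unit circle.
z = 3 + 4j
r = abs(z)      # the magnitude
omega = z / r   # the phase, a point on the unit circle

assert r == 5.0
assert abs(abs(omega) - 1) < 1e-12   # omega lies on the unit circle
assert abs(r * omega - z) < 1e-12    # the decomposition recovers z
```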

From (4) we see that the unit circle ${S^1}$ is a multiplicative group; it contains the multiplicative identity ${1}$, and if ${z, w}$ lie in ${S^1}$, then so do ${zw}$ and ${1/z}$. From (2) we see that reciprocation and complex conjugation agree on the unit circle, thus

$\displaystyle \frac{1}{z} = \overline{z}$

for ${z \in S^1}$. It is worth emphasising that this useful identity does not hold as soon as one leaves the unit circle, in which case one must use the more general formula (2) instead! If ${z_1,z_2}$ are non-zero complex numbers with polar decompositions ${z_1 = r_1 \omega_1}$ and ${z_2 = r_2 \omega_2}$ respectively, then clearly the polar decompositions of ${z_1 z_2}$ and ${z_1/z_2}$ are given by ${z_1 z_2 = (r_1 r_2) (\omega_1 \omega_2)}$ and ${z_1/z_2 = (r_1/r_2) (\omega_1/\omega_2)}$ respectively. Thus polar coordinates are very convenient for performing complex multiplication, although they turn out to be atrocious for performing complex addition. (This can be contrasted with the usual Cartesian coordinates ${z=a+bi}$, which are very convenient for performing complex addition and mildly inconvenient for performing complex multiplication.) In the language of group theory, the polar decomposition splits the multiplicative complex group ${{\bf C}^\times = ({\bf C} \backslash \{0\}, \times)}$ as the direct product of the positive reals ${(0,+\infty)}$ and the unit circle ${S^1}$: ${{\bf C}^\times \equiv (0,+\infty) \times S^1}$.

If ${\omega}$ is an element of the unit circle ${S^1}$, then from (4) we see that the operation ${z \mapsto \omega z}$ of multiplication by ${\omega}$ is an isometry of ${{\bf C}}$, in the sense that

$\displaystyle |\omega z - \omega w| = |z-w|$

for all complex numbers ${z, w}$. This isometry also preserves the origin ${0}$. As such, it is geometrically obvious (see Exercise 11 below) that the map ${z \mapsto \omega z}$ must either be a rotation around the origin, or a reflection around a line. The former operation is orientation preserving, and the latter is orientation reversing. Since the map ${z \mapsto \omega z}$ is clearly orientation preserving when ${\omega = 1}$, and the unit circle ${S^1}$ is connected, a continuity argument shows that ${z \mapsto \omega z}$ must be orientation preserving for all ${\omega \in S^1}$, and so must be a rotation around the origin by some angle. Of course, by trigonometry, we may write

$\displaystyle \omega = \cos \theta + i \sin \theta$

for some real number ${\theta}$. The rotation ${z \mapsto \omega z}$ clearly maps the number ${1}$ to the number ${\cos \theta + i \sin \theta}$, and so the rotation must be a counter-clockwise rotation by ${\theta}$ (adopting the usual convention of placing ${1}$ to the right of the origin and ${i}$ above it). In particular, when applying this rotation ${z \mapsto \omega z}$ to another point ${\cos \phi + i \sin \phi}$ on the unit circle, this point must get rotated to ${\cos(\theta+\phi) + i \sin(\theta+\phi)}$. We have thus given a geometric proof of the multiplication formula

$\displaystyle (\cos(\theta + \phi) + i \sin(\theta + \phi)) = (\cos \theta + i \sin \theta) (\cos \phi + i \sin \phi); \ \ \ \ \ (8)$

expanding out the right-hand side and comparing real and imaginary parts, we recover the familiar trigonometric addition formulae

$\displaystyle \cos(\theta+\phi) = \cos \theta \cos \phi - \sin \theta \sin \phi$

and

$\displaystyle \sin(\theta+\phi) = \sin \theta \cos \phi + \cos \theta \sin \phi.$

We can also iterate the multiplication formula to give de Moivre’s formula

$\displaystyle \cos( n \theta ) + i \sin(n \theta) = (\cos \theta + i \sin \theta)^n$

for any natural number ${n}$ (or indeed for any integer ${n}$), which can in turn be used to recover familiar identities such as the double angle formulae

$\displaystyle \cos(2\theta) = \cos^2 \theta - \sin^2 \theta$

$\displaystyle \sin(2\theta) = 2 \sin \theta \cos \theta$

or triple angle formulae

$\displaystyle \cos(3\theta) = \cos^3 \theta - 3 \sin^2 \theta \cos \theta$

$\displaystyle \sin(3\theta) = 3 \sin \theta \cos^2 \theta - \sin^3 \theta$

after expanding out de Moivre’s formula for ${n=2}$ or ${n=3}$ and taking real and imaginary parts.
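These formulae are easy to spot-check numerically; the following Python sketch (an illustration with an arbitrarily chosen angle, not part of the notes) verifies de Moivre's formula for ${n=3}$ together with the triple angle formulae:

```python
import math

# Check de Moivre's formula for n = 3, and the triple angle formulae
# obtained from its real and imaginary parts.
theta = 0.7
c, s = math.cos(theta), math.sin(theta)

lhs = complex(math.cos(3*theta), math.sin(3*theta))
rhs = complex(c, s) ** 3
assert abs(lhs - rhs) < 1e-12

assert abs(math.cos(3*theta) - (c**3 - 3*s*s*c)) < 1e-12
assert abs(math.sin(3*theta) - (3*s*c*c - s**3)) < 1e-12
```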

Exercise 11

Every non-zero complex number ${z}$ can now be written in polar form as

$\displaystyle z = r (\cos(\theta) + i \sin(\theta)) \ \ \ \ \ (9)$

with ${r>0}$ and ${\theta \in {\bf R}}$; we refer to ${\theta}$ as an argument of ${z}$, which can be interpreted as an angle of counterclockwise rotation needed to rotate the positive real axis to a position that contains ${z}$. The argument is not quite unique, due to the periodicity of sine and cosine: if ${\theta}$ is an argument of ${z}$, then so is ${\theta + 2\pi k}$ for any integer ${k}$, and conversely these are all the possible arguments that ${z}$ can have. The set of all such arguments will be denoted ${\mathrm{arg}(z)}$; it is a coset of the discrete group ${2\pi {\bf Z} := \{ 2\pi k: k \in {\bf Z}\}}$, and can thus be viewed as an element of the ${1}$-torus ${{\bf R}/2\pi{\bf Z}}$.

The operation ${w \mapsto zw}$ of multiplying a complex number ${w}$ by a given non-zero complex number ${z}$ now has a very appealing geometric interpretation when expressing ${z}$ in polar coordinates (9): this operation is the composition of the operation of dilation by ${r}$ around the origin, and counterclockwise rotation by ${\theta}$ around the origin. For instance, multiplication by ${i}$ performs a counter-clockwise rotation by ${\pi/2}$ around the origin, while multiplication by ${-i}$ performs instead a clockwise rotation by ${\pi/2}$. As complex multiplication is commutative and associative, it does not matter in which order one performs the dilation and rotation operations. Similarly, using Cartesian coordinates, we see that the operation ${w \mapsto z+w}$ of adding a given complex number ${z}$ to a complex number ${w}$ is simply a spatial translation by a displacement of ${z}$. The multiplication operation need not be isometric (due to the presence of the dilation ${r}$), but observe that both the addition and multiplication operations are conformal (angle-preserving) and also orientation-preserving (a counterclockwise loop will transform to another counterclockwise loop, and similarly for clockwise loops). As we shall see later, these conformal and orientation-preserving properties of the addition and multiplication maps will extend to the larger class of complex differentiable maps (at least outside of critical points of the map), and are an important aspect of the geometry of such maps.

Remark 12 One can also interpret the operations of complex arithmetic geometrically on the Argand plane as follows. As the addition law on ${{\bf C}}$ coincides with the vector addition law on ${{\bf R}^2}$, addition and subtraction of complex numbers is given by the usual parallelogram rule for vector addition; thus, to add a complex number ${z}$ to another ${w}$, we can translate the complex plane until the origin ${0}$ gets mapped to ${z}$, and then ${w}$ gets mapped to ${z+w}$; conversely, subtraction by ${z}$ corresponds to translating ${z}$ back to ${0}$. Similarly, to multiply a complex number ${z}$ with another ${w}$, we can dilate and rotate the complex plane around the origin until ${1}$ gets mapped to ${z}$, and then ${w}$ will be mapped to ${zw}$; conversely, division by ${z}$ corresponds to dilating and rotating ${z}$ back to ${1}$.

When performing computations, it is convenient to restrict the argument ${\theta}$ of a non-zero complex number ${z}$ to lie in a fundamental domain of the ${1}$-torus ${{\bf R}/2\pi{\bf Z}}$, such as the half-open interval ${\{ \theta: 0 \leq \theta < 2\pi \}}$ or ${\{ \theta: -\pi < \theta \leq \pi \}}$, in order to recover a unique parameterisation (at the cost of creating a branch cut at one point of the unit circle). Traditionally, the fundamental domain that is most often used is the half-open interval ${\{ \theta: -\pi < \theta \leq \pi \}}$. The unique argument of ${z}$ that lies in this interval is called the standard argument of ${z}$ and is denoted ${\mathrm{Arg}(z)}$, and ${\mathrm{Arg}}$ is called the standard branch of the argument function. Thus for instance ${\mathrm{Arg}(1)=0}$, ${\mathrm{Arg}(i) = \pi/2}$, ${\mathrm{Arg}(-1) = \pi}$, and ${\mathrm{Arg}(-i) = -\pi/2}$. Observe that the standard branch of the argument has a discontinuity on the negative real axis ${\{ x \in {\bf R}: x \leq 0\}}$, which is the branch cut of this branch. Changing the fundamental domain used to define a branch of the argument can move the branch cut around, but cannot eliminate it completely, due to non-trivial monodromy (if one continuously loops once counterclockwise around the origin, and varies the argument continuously as one does so, the argument will increment by ${2\pi}$, and so no branch of the argument function can be continuous at every point on the loop).
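For what it's worth, the `phase` function in Python's `cmath` module follows essentially the same convention as the standard branch described above, returning an argument in the interval ${(-\pi,\pi]}$ (an illustrative check, not part of the notes):

```python
import cmath, math

# cmath.phase returns an argument in (-pi, pi], matching the standard branch Arg.
assert cmath.phase(1) == 0.0
assert abs(cmath.phase(1j) - math.pi/2) < 1e-12
assert cmath.phase(-1) == math.pi
assert abs(cmath.phase(-1j) + math.pi/2) < 1e-12

# The branch cut on the negative real axis: approaching -1 from just above
# gives an argument near +pi, from just below gives an argument near -pi.
assert cmath.phase(complex(-1, 1e-9)) > 3.0
assert cmath.phase(complex(-1, -1e-9)) < -3.0
```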

The multiplication formula (8) resembles the multiplication formula

$\displaystyle \exp( x + y) = \exp( x ) \exp( y ) \ \ \ \ \ (10)$

for the real exponential function ${\exp: {\bf R} \rightarrow {\bf R}}$. The two formulae can be unified through the famous Euler formula involving the complex exponential ${\exp: {\bf C} \rightarrow {\bf C}}$. There are many ways to define the complex exponential. Perhaps the most natural is through the ordinary differential equation ${\frac{d}{dz} \exp(z) = \exp(z)}$ with boundary condition ${\exp(0)=1}$. However, as we have not yet set up a theory of complex differentiation, we will proceed (at least temporarily) through the device of Taylor series. Recalling that the real exponential function ${\exp: {\bf R} \rightarrow {\bf R}}$ has the Taylor expansion

$\displaystyle \exp(x) = \sum_{n=0}^\infty \frac{x^n}{n!}$

$\displaystyle = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots$

which is absolutely convergent for any real ${x}$, one is led to define the complex exponential function ${\exp: {\bf C} \rightarrow {\bf C}}$ by the analogous expansion

$\displaystyle \exp(z) = \sum_{n=0}^\infty \frac{z^n}{n!} \ \ \ \ \ (11)$

$\displaystyle = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \dots$

noting from (4) that the absolute convergence of the real exponential ${\exp(x)}$ for any ${x \in {\bf R}}$ implies the absolute convergence of the complex exponential for any ${z \in {\bf C}}$. We also frequently write ${e^z}$ for ${\exp(z)}$. The multiplication formula (10) for the real exponential extends to the complex exponential:
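One can watch this absolute convergence in action with a short Python sketch (illustrative only; `exp_partial` is our ad hoc name for the partial sums of (11)):

```python
import cmath

# Sum the Taylor series (11) up to the z^N/N! term and compare with cmath.exp.
def exp_partial(z, N):
    total, term = 0j, 1 + 0j   # term holds z^n/n!, starting at n = 0
    for n in range(N + 1):
        total += term
        term *= z / (n + 1)    # advance from z^n/n! to z^(n+1)/(n+1)!
    return total

z = 1 + 2j
assert abs(exp_partial(z, 30) - cmath.exp(z)) < 1e-12
```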

Exercise 13 Use the binomial theorem and Fubini’s theorem for (complex) doubly infinite series to conclude that

$\displaystyle \exp(z+w) = \exp(z) \exp(w) \ \ \ \ \ (12)$

for any complex numbers ${z,w}$.

If one compares the Taylor series for ${\exp(z)}$ with the familiar Taylor expansions

$\displaystyle \sin(x) = \sum_{n=0}^\infty (-1)^n \frac{x^{2n+1}}{(2n+1)!}$

$\displaystyle = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots$

and

$\displaystyle \cos(x) = \sum_{n=0}^\infty (-1)^n \frac{x^{2n}}{(2n)!}$

$\displaystyle = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots$

one is led (by substituting ${z=ix}$ into (11) and separating real and imaginary parts) to Euler's formula

$\displaystyle e^{i x} = \cos x + i \sin x \ \ \ \ \ (13)$

for any real ${x}$; setting ${x=2\pi}$ and ${x=\pi}$ respectively, we obtain the famous special cases

$\displaystyle e^{2\pi i} = 1 \ \ \ \ \ (14)$

and

$\displaystyle e^{\pi i} + 1 = 0. \ \ \ \ \ (15)$

We now see that the multiplication formula (8) can be written as a special form

$\displaystyle e^{i(\theta + \phi)} = e^{i\theta} e^{i\phi}$

of (12); similarly, de Moivre’s formula takes the simple and intuitive form

$\displaystyle e^{i n \theta} = (e^{i\theta})^n.$

From (12) and (13) we also see that the exponential function basically transforms Cartesian coordinates to polar coordinates:

$\displaystyle \exp( x+iy ) = e^x ( \cos y + i \sin y ).$

Later on in the course we will study (the various branches of) the logarithm function that inverts the complex exponential, thus converting polar coordinates back to Cartesian ones.

From (13) and (1), together with the easily verified identity

$\displaystyle \overline{e^{ix}} = e^{-ix},$

we see that we can recover the trigonometric functions ${\sin(x), \cos(x)}$ from the complex exponential by the formulae

$\displaystyle \sin(x) = \frac{e^{ix} - e^{-ix}}{2i}; \quad \cos(x) = \frac{e^{ix} + e^{-ix}}{2}. \ \ \ \ \ (16)$

(Indeed, if one wished, one could take these identities as the definition of the sine and cosine functions, giving a purely analytic way to construct these trigonometric functions.) From these identities one can derive all the usual trigonometric identities from the basic properties of the exponential (and in particular (12)). For instance, using a little bit of high school algebra we can prove the familiar identity

$\displaystyle \sin^2(x) + \cos^2(x) = 1$

from (16):

$\displaystyle \sin^2(x) + \cos^2(x) = \frac{(e^{ix}-e^{-ix})^2}{(2i)^2} + \frac{(e^{ix} + e^{-ix})^2}{2^2}$

$\displaystyle = \frac{e^{2ix} - 2 + e^{-2ix}}{-4} + \frac{e^{2ix} + 2 + e^{-2ix}}{4}$

$\displaystyle = 1.$

Thus, in principle at least, one no longer has a need to memorize all the different trigonometric identities out there, since they can now all be unified as consequences of just a handful of basic identities for the complex exponential, such as (12), (14), and (15).
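As an illustration of this principle (a Python sketch, not part of the notes), the derivation above can be replayed numerically: recover sine and cosine from the complex exponential via (16), and confirm that the Pythagorean identity falls out:

```python
import cmath, math

# Recover sin(x) and cos(x) from the complex exponential via (16),
# and confirm the Pythagorean identity numerically.
x = 1.3
s = (cmath.exp(1j * x) - cmath.exp(-1j * x)) / (2j)
c = (cmath.exp(1j * x) + cmath.exp(-1j * x)) / 2

assert abs(s - math.sin(x)) < 1e-12
assert abs(c - math.cos(x)) < 1e-12
assert abs(s * s + c * c - 1) < 1e-12
```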

In view of (16), it is now natural to introduce the complex sine and cosine functions ${\sin: {\bf C} \rightarrow {\bf C}}$ and ${\cos: {\bf C} \rightarrow {\bf C}}$ by the formula

$\displaystyle \sin(z) = \frac{e^{iz} - e^{-iz}}{2i}; \quad \cos(z) = \frac{e^{iz} + e^{-iz}}{2}. \ \ \ \ \ (17)$

These extensions continue to obey the Pythagorean identity: from (12) one easily checks that

$\displaystyle \sin^2(z) + \cos^2(z) = 1 \ \ \ \ \ (18)$

for all ${z}$. (We caution however that this does not imply that ${\sin(z)}$ and ${\cos(z)}$ are bounded in magnitude by ${1}$ – note carefully the lack of absolute value signs outside of ${\sin(z)}$ and ${\cos(z)}$ in the above formula! See also Exercise 16 below.) Similarly for all of the other trigonometric identities. (Later on in this series of lecture notes, we will develop the concept of analytic continuation, which can explain why so many real-variable algebraic identities naturally extend to their complex counterparts.) From (11) we see that the complex sine and cosine functions have the same Taylor series expansion as their real-variable counterparts, namely

$\displaystyle \sin(z) = \sum_{n=0}^\infty (-1)^n \frac{z^{2n+1}}{(2n+1)!}$

$\displaystyle = z - \frac{z^3}{3!} + \frac{z^5}{5!} - \dots$

and

$\displaystyle \cos(z) = \sum_{n=0}^\infty (-1)^n \frac{z^{2n}}{(2n)!}$

$\displaystyle = 1 - \frac{z^2}{2!} + \frac{z^4}{4!} - \dots.$

The formulae (17) for the complex sine and cosine functions greatly resemble those of the hyperbolic trigonometric functions ${\sinh, \cosh: {\bf R} \rightarrow {\bf R}}$, defined by the formulae

$\displaystyle \sinh(x) := \frac{e^x - e^{-x}}{2}; \quad \cosh(x) := \frac{e^x + e^{-x}}{2}.$

Indeed, if we extend these functions to the complex domain by defining ${\sinh, \cosh: {\bf C} \rightarrow {\bf C}}$ to be the functions

$\displaystyle \sinh(z) := \frac{e^z - e^{-z}}{2}; \quad \cosh(z) := \frac{e^z + e^{-z}}{2},$

then on comparison with (17) we obtain the complex identities

$\displaystyle \sinh(z) = -i \sin(iz); \quad \cosh(z) = \cos(iz) \ \ \ \ \ (19)$

$\displaystyle \sin(z) = -i \sinh(iz); \quad \cos(z) = \cosh(iz) \ \ \ \ \ (20)$

for all complex ${z}$. Thus we see that once we adopt the perspective of working over the complex numbers, the hyperbolic trigonometric functions are “rotations by 90 degrees” of the ordinary trigonometric functions; this is a simple example of what physicists call a Wick rotation. In particular, we see from these identities that any trigonometric identity will have a hyperbolic counterpart, though due to the presence of various factors of ${i}$, the signs may change as one passes from trigonometric to hyperbolic functions or vice versa (a fact quantified by Osborn’s rule). For instance, by substituting (19) or (20) into (18) (and replacing ${z}$ by ${iz}$ or ${-iz}$ as appropriate), we end up with the analogous identity

$\displaystyle \cosh^2(z) - \sinh^2(z) = 1$

for the hyperbolic trigonometric functions. Similarly for all other trigonometric identities. Thus we see that the complex exponential single-handedly unites the trigonometry, hyperbolic trigonometry, and the real exponential function into a single coherent theory!
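These hyperbolic identities can likewise be spot-checked numerically (an illustrative Python sketch using the `cmath` module, which implements the complex `sin`, `cos`, `sinh`, `cosh` discussed above):

```python
import cmath

# Check the Wick-rotation identities (19)-(20) and the hyperbolic
# Pythagorean identity at a sample complex point.
z = 0.5 + 0.25j
assert abs(cmath.sinh(z) - (-1j) * cmath.sin(1j * z)) < 1e-12
assert abs(cmath.cosh(z) - cmath.cos(1j * z)) < 1e-12
assert abs(cmath.sin(z) - (-1j) * cmath.sinh(1j * z)) < 1e-12
assert abs(cmath.cos(z) - cmath.cosh(1j * z)) < 1e-12
assert abs(cmath.cosh(z)**2 - cmath.sinh(z)**2 - 1) < 1e-12
```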

Exercise 14

• (i) If ${n}$ is a positive integer, show that the only complex number solutions to the equation ${z^n = 1}$ are given by the ${n}$ complex numbers ${e^{2\pi i k/n}}$ for ${k=0,\dots,n-1}$; these numbers are thus known as the ${n^{th}}$ roots of unity. Conclude the identity ${z^n - 1 = \prod_{k=0}^{n-1} (z - e^{2\pi i k/n})}$ for any complex number ${z}$.
• (ii) Show that the only compact subgroups ${G}$ of the multiplicative complex numbers ${{\bf C}^\times}$ are the unit circle ${S^1}$ and the ${n^{th}}$ roots of unity

$\displaystyle C_n := \{ e^{2\pi i k/n}: k=0,1,\dots,n-1\}$

for ${n=1,2,\dots}$. (Hint: there are two cases, depending on whether ${1}$ is a limit point of ${G}$ or not.)

• (iii) Give an example of a non-compact subgroup of ${S^1}$.
• (iv) (Warning: this one is tricky.) Show that the only connected closed subgroups of ${{\bf C}^\times}$ are the whole group ${{\bf C}^\times}$, the trivial group ${\{1\}}$, and the one-parameter groups of the form ${\{ \exp( tz ): t \in {\bf R} \}}$ for some non-zero complex number ${z}$.

The next exercise gives a special case of the fundamental theorem of algebra, when considering the roots of polynomials of the specific form ${P(z) = z^n - w}$.

Exercise 15 Show that if ${w}$ is a non-zero complex number and ${n}$ is a positive integer, then there are exactly ${n}$ distinct solutions to the equation ${z^n = w}$, and any two such solutions differ (multiplicatively) by an ${n^{th}}$ root of unity. In particular, a non-zero complex number ${w}$ has two square roots, each of which is the negative of the other. What happens when ${w=0}$?
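A constructive way to see Exercise 15 (an illustrative Python sketch, not a proof; `nth_roots` is an ad hoc helper): take one ${n^{th}}$ root of ${w}$ via the polar form, then multiply through by the ${n^{th}}$ roots of unity:

```python
import cmath, math

# The n distinct solutions of z^n = w: take one n-th root via the polar
# form of w, then rotate by the n-th roots of unity.
def nth_roots(w, n):
    r, theta = abs(w), cmath.phase(w)
    return [r**(1/n) * cmath.exp(1j * (theta + 2*math.pi*k) / n) for k in range(n)]

roots = nth_roots(-8, 3)  # the three cube roots of -8
assert all(abs(z**3 - (-8)) < 1e-9 for z in roots)
assert any(abs(z - (-2)) < 1e-9 for z in roots)  # -2 is among them
```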

Exercise 16 Let ${z_n}$ be a sequence of complex numbers. Show that ${\sin(z_n)}$ is bounded if and only if the imaginary part of ${z_n}$ is bounded, and similarly with ${\sin(z_n)}$ replaced by ${\cos(z_n)}$.

Exercise 17 (This question was drawn from a previous version of this course taught by Rowan Killip.) Let ${w_1, w_2}$ be distinct complex numbers, and let ${\lambda}$ be a positive real that is not equal to ${1}$.

• (i) Show that the set ${\{ z \in {\bf C}: |\frac{z-w_1}{z-w_2}| = \lambda \}}$ defines a circle in the complex plane. (Ideally, you should be able to do this without breaking everything up into real and imaginary parts.)
• (ii) Conversely, show that every circle in the complex plane arises in such a fashion (for suitable choices of ${w_1,w_2,\lambda}$, of course).
• (iii) What happens if ${\lambda=1}$?
• (iv) Let ${\gamma}$ be a circle that does not pass through the origin. Show that the image of ${\gamma}$ under the inversion map ${z \mapsto 1/z}$ is a circle. What happens if ${\gamma}$ is a line? What happens if ${\gamma}$ passes through the origin (and one then deletes the origin from ${\gamma}$ before applying the inversion map)?

Exercise 18 If ${z}$ is a complex number, show that ${\exp(z) = \lim_{n \rightarrow \infty} (1 + \frac{z}{n})^n}$.
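For intuition on Exercise 18 (not a proof, just a numerical illustration in Python):

```python
import cmath

# Compound-interest style limit: (1 + z/n)^n -> exp(z) as n -> infinity.
z = 1 + 1j
n = 10**6
approx = (1 + z/n) ** n
assert abs(approx - cmath.exp(z)) < 1e-4
```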

Filed under: 246A - complex analysis, math.CV, math.RA Tagged: complex numbers, exponentiation, trigonometry

### Terence Tao — 246A, Notes 2: complex integration

Having discussed differentiation of complex mappings in the preceding notes, we now turn to the integration of complex maps. We first briefly review the situation of integration of (suitably regular) real functions ${f: [a,b] \rightarrow {\bf R}}$ of one variable. Actually there are three closely related concepts of integration that arise in this setting:

• (i) The signed definite integral ${\int_a^b f(x)\ dx}$, which is usually interpreted as the Riemann integral (or equivalently, the Darboux integral), which can be defined as the limit (if it exists) of the Riemann sums

$\displaystyle \sum_{j=1}^n f(x_j^*) (x_j - x_{j-1}) \ \ \ \ \ (1)$

where ${a = x_0 < x_1 < \dots < x_n = b}$ is some partition of ${[a,b]}$, ${x_j^*}$ is an element of the interval ${[x_{j-1},x_j]}$, and the limit is taken as the maximum mesh size ${\max_{1 \leq j \leq n} |x_j - x_{j-1}|}$ goes to zero. It is convenient to adopt the convention that ${\int_b^a f(x)\ dx := - \int_a^b f(x)\ dx}$ for ${a < b}$; alternatively one can interpret ${\int_b^a f(x)\ dx}$ as the limit of the Riemann sums (1), where now the (reversed) partition ${b = x_0 > x_1 > \dots > x_n = a}$ goes leftwards from ${b}$ to ${a}$, rather than rightwards from ${a}$ to ${b}$.

• (ii) The unsigned definite integral ${\int_{[a,b]} f(x)\ dx}$, usually interpreted as the Lebesgue integral. The precise definition of this integral is a little complicated (see e.g. this previous post), but roughly speaking the idea is to approximate ${f}$ by simple functions ${\sum_{i=1}^n c_i 1_{E_i}}$ for some coefficients ${c_i \in {\bf R}}$ and sets ${E_i \subset [a,b]}$, and then approximate the integral ${\int_{[a,b]} f(x)\ dx}$ by the quantities ${\sum_{i=1}^n c_i m(E_i)}$, where ${E_i}$ is the Lebesgue measure of ${E_i}$. In contrast to the signed definite integral, no orientation is imposed or used on the underlying domain of integration, which is viewed as an “undirected” set ${[a,b]}$.
• (iii) The indefinite integral or antiderivative ${\int f(x)\ dx}$, defined as any function ${F: [a,b] \rightarrow {\bf R}}$ whose derivative ${F'}$ exists and is equal to ${f}$ on ${[a,b]}$. Famously, the antiderivative is only defined up to the addition of an arbitrary constant ${C}$, thus for instance ${\int x\ dx = \frac{1}{2} x^2 + C}$.
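The Riemann sums (1) are easy to experiment with numerically. The following minimal sketch (uniform partitions and midpoint tags are illustrative choices, not required by the definition) integrates ${f(x)=x^2}$ on ${[0,1]}$:

```python
# A minimal illustration of the Riemann sums (1): integrate f(x) = x^2 on [0, 1]
# with uniform partitions and midpoint tags; the sums approach 1/3 as the mesh
# size 1/n goes to zero.
f = lambda x: x * x
a, b = 0.0, 1.0

def riemann_sum(n):
    xs = [a + (b - a) * j / n for j in range(n + 1)]
    tags = [(xs[j - 1] + xs[j]) / 2 for j in range(1, n + 1)]  # x_j^* in [x_{j-1}, x_j]
    return sum(f(t) * (xs[j] - xs[j - 1])
               for j, t in zip(range(1, n + 1), tags))

approx = riemann_sum(1000)
print(approx)   # close to 1/3
```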

There are some other variants of the above integrals (e.g. the Henstock-Kurzweil integral, discussed for instance in this previous post), which can handle slightly different classes of functions and have slightly different properties than the standard integrals listed here, but we will not need to discuss such alternative integrals in this course (with the exception of some improper and principal value integrals, which we will encounter in later notes).

The above three notions of integration are closely related to each other. For instance, if ${f: [a,b] \rightarrow {\bf R}}$ is a Riemann integrable function, then the signed definite integral and unsigned definite integral coincide (when the former is oriented correctly), thus

$\displaystyle \int_a^b f(x)\ dx = \int_{[a,b]} f(x)\ dx$

and

$\displaystyle \int_b^a f(x)\ dx = -\int_{[a,b]} f(x)\ dx$

If ${f: [a,b] \rightarrow {\bf R}}$ is continuous, then by the fundamental theorem of calculus, it possesses an antiderivative ${F = \int f(x)\ dx}$, which is well defined up to an additive constant ${C}$, and

$\displaystyle \int_c^d f(x)\ dx = F(d) - F(c)$

for any ${c,d \in [a,b]}$, thus for instance ${\int_a^b F(x)\ dx = F(b) - F(a)}$ and ${\int_b^a F(x)\ dx = F(a) - F(b)}$.

All three of the above integration concepts have analogues in complex analysis. By far the most important notion will be the complex analogue of the signed definite integral, namely the contour integral ${\int_\gamma f(z)\ dz}$, in which the directed line segment from one real number ${a}$ to another ${b}$ is now replaced by a type of curve in the complex plane known as a contour. The contour integral can be viewed as the special case of the more general line integral ${\int_\gamma f(z)\ dx + g(z)\ dy}$ that is of particular relevance in complex analysis. There are also analogues of the Lebesgue integral, namely the arclength measure integrals ${\int_\gamma f(z)\ |dz|}$ and the area integrals ${\int_\Omega f(x+iy)\ dx dy}$, but these play only an auxiliary role in the subject. Finally, we still have the notion of an antiderivative ${F(z)}$ (also known as a primitive) of a complex function ${f(z)}$.

As it turns out, the fundamental theorem of calculus continues to hold in the complex plane: under suitable regularity assumptions on a complex function ${f}$ and a primitive ${F}$ of that function, one has

$\displaystyle \int_\gamma f(z)\ dz = F(z_1) - F(z_0)$

whenever ${\gamma}$ is a contour from ${z_0}$ to ${z_1}$ that lies in the domain of ${f}$. In particular, functions ${f}$ that possess a primitive must be conservative in the sense that ${\int_\gamma f(z)\ dz = 0}$ for any closed contour. This property of being conservative is not typical, in that “most” functions ${f}$ will not be conservative. However, there is a remarkable and far-reaching theorem, the Cauchy integral theorem (also known as the Cauchy-Goursat theorem), which asserts that any holomorphic function is conservative, so long as the domain is simply connected (or if one restricts attention to contractible closed contours). We will explore this theorem and several of its consequences the next set of notes.
— 1. Integration along a contour —

The notion of a curve is a very intuitive one. However, the precise mathematical definition of what a curve actually is depends a little bit on what type of mathematics one wishes to do. If one is mostly interested in topology, then a good notion is that of a continuous (parameterised) curve. If one wants to do analysis in somewhat irregular domains, it is convenient to restrict the notion of curve somewhat, to the rectifiable curves. If one is doing analysis in “nice” domains (such as the complex plane ${{\bf C}}$, a half-plane, a punctured plane, a disk, or an annulus), then it is convenient to restrict the notion further, to the piecewise smooth curves, also known as contours. If one wished to get to the main theorems of complex analysis as quickly as possible, then one would restrict attention only to contours and skip much of this section; however we shall take a more leisurely approach here, discussing curves and rectifiable curves as well, as these concepts are also useful outside of complex analysis.

We begin by defining the notion of a continuous curve.

Definition 1 (Continuous curves) A continuous parameterised curve, or curve for short, is a continuous map ${\gamma: [a,b] \rightarrow {\bf C}}$ from a compact interval ${[a,b] \subset {\bf R}}$ to the complex plane ${{\bf C}}$. We call the curve trivial if ${a=b}$, and non-trivial otherwise. We refer to the complex numbers ${\gamma(a), \gamma(b)}$ as the initial point and terminal point of the curve respectively, and refer to these two points collectively as the endpoints of the curve. We say that the curve is closed if ${\gamma(a) = \gamma(b)}$. We say that the curve is simple if one has ${\gamma(t) \neq \gamma(t')}$ for any distinct ${t,t' \in [a,b]}$, with the possible exception of the endpoint cases ${t=a, t'=b}$ or ${t=b, t'=a}$ (thus we allow closed curves to be simple). We refer to the subset ${\gamma([a,b]) := \{ \gamma(t): t \in [a,b]\}}$ of the complex plane as the image of the curve.

We caution that the term “closed” here does not refer to the topological notion of closure: for any curve ${\gamma}$ (closed or otherwise), the image ${\gamma([a,b])}$ of the curve, being the continuous image of a compact set, is necessarily a compact subset of ${{\bf C}}$ and is thus always topologically closed.

A basic example of a curve is the directed line segment ${\gamma_{z_1 \rightarrow z_2}: [0,1] \rightarrow {\bf C}}$ from one complex point ${z_1}$ to another ${z_2}$, defined by

$\displaystyle \gamma_{z_1 \rightarrow z_2}(t) := (1-t) z_1 + t z_2$

for ${0 \leq t \leq 1}$. (Thus, contrary to the informal English meaning of the terms, we consider line segments to be examples of curves, despite having zero curvature; in general, it is convenient in mathematics to admit such “degenerate” objects into one’s definitions, in order to obtain good closure properties for these objects, and to maximise the generality of the definition.) If ${z_1 \neq z_2}$, this is a simple curve, while for ${z_1 = z_2}$ it is (a rather degenerate, but still non-trivial) closed curve. Another important example of a curve is the anti-clockwise circle ${\gamma_{z_0,r,\circlearrowleft}: [0,2\pi] \rightarrow {\bf C}}$ of some radius ${r>0}$ around a complex centre ${z_0 \in {\bf C}}$, defined by

$\displaystyle \gamma_{z_0,r,\circlearrowleft}(t) := z_0 + r e^{it}. \ \ \ \ \ (2)$

This is a simple closed non-trivial curve. If we extended the domain here from ${[0,2\pi]}$ to (say) ${[0,4\pi]}$, the curve would remain closed, but would no longer be simple (every point in the image is now traversed twice by the curve).
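For instance, one can represent the circle (2) numerically and confirm that it is closed and that its image lies on the circle ${|z-z_0|=r}$ (a small sketch, with sample values for ${z_0}$ and ${r}$):

```python
import cmath, math

# The anticlockwise circle of radius r about z0, as in (2): gamma(t) = z0 + r e^{it}
# for 0 <= t <= 2*pi. We check it is closed (equal endpoints) and stays on the circle.
z0, r = 1 - 1j, 2.0                       # sample centre and radius
gamma = lambda t: z0 + r * cmath.exp(1j * t)

assert abs(gamma(0) - gamma(2 * math.pi)) < 1e-12          # closed curve
assert all(abs(abs(gamma(2 * math.pi * k / 100) - z0) - r) < 1e-12
           for k in range(100))                            # image on |z - z0| = r
print("closed circle of radius", r, "about", z0)
```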

Note that it is technically possible for two distinct curves to have the same image. For instance, the anti-clockwise circle ${\tilde \gamma_{z_0,r,\circlearrowleft}: [0,1] \rightarrow {\bf C}}$ of some radius ${r>0}$ around a complex centre ${z_0 \in {\bf C}}$ defined by

$\displaystyle \tilde \gamma_{z_0,r,\circlearrowleft}(t) := z_0 + r e^{2\pi i t}$

traverses the same image as the previous curve (2), but is considered a distinct curve from ${\gamma_{z_0,r,\circlearrowleft}}$. Nevertheless the two curves are closely related to each other, and we formalise this as follows. We say that one curve ${\gamma_2: [a_2,b_2] \rightarrow {\bf C}}$ is a continuous reparameterisation of another ${\gamma_1: [a_1, b_1] \rightarrow {\bf C}}$, if there is a homeomorphism ${\phi: [a_1,b_1] \rightarrow [a_2,b_2]}$ (that is to say, a continuous invertible map whose inverse ${\phi^{-1}: [a_2,b_2] \rightarrow [a_1,b_1]}$ is also continuous) which is endpoint preserving (i.e., ${\phi(a_1)=a_2}$ and ${\phi(b_1)=b_2}$) such that ${\gamma_2(\phi(t)) = \gamma_1(t)}$ for all ${t \in [a_1,b_1]}$ (that is to say, ${\gamma_1 = \gamma_2 \circ \phi}$, or equivalently ${\gamma_2 = \gamma_1 \circ \phi^{-1}}$), in which case we write ${\gamma_1 \equiv \gamma_2}$. Thus for instance ${\gamma_{z_0,r,\circlearrowleft} \equiv \tilde \gamma_{z_0,r,\circlearrowleft}}$. The relation of being a continuous reparameterisation is an equivalence relation, so one can talk about the notion of a curve “up to continuous reparameterisation”, by which we mean an equivalence class of a curve under this relation. Thus for instance the image of a curve, as well as its initial point and end point, are well defined up to continuous reparameterisation, since if ${\gamma_1 \equiv \gamma_2}$ then ${\gamma_1}$ and ${\gamma_2}$ have the same image, the same initial point, and the same terminal point. It is common to depict an equivalence class of a curve ${\gamma}$ graphically, by drawing its image together with an arrow depicting the direction of motion from the initial point to its endpoint. (In the case of a non-simple curve, one may need multiple arrows in order to clarify the direction of motion, and also the possible multiplicity of the curve.)

Exercise 2 Let ${\phi: [a_1,b_1] \rightarrow [a_2,b_2]}$ be a continuous bijection between two compact intervals of the real line.

• (i) Show that ${\phi^{-1}: [a_2,b_2] \rightarrow [a_1,b_1]}$ is continuous, so that ${\phi}$ is a homeomorphism. (Hint: use the fact that a continuous image of a compact set is compact, and that a subset of an interval is topologically closed if and only if it is compact.)
• (ii) If ${\phi(a_1)=a_2}$, show that ${\phi(b_1) = b_2}$ and that ${\phi}$ is monotone increasing. (Hint: use the intermediate value theorem.)
• (iii) Conversely, if ${\psi: [a_1,b_1] \rightarrow [a_2,b_2]}$ is a continuous monotone increasing map with ${\psi(a_1) = a_2}$ and ${\psi(b_1) = b_2}$, show that ${\psi}$ is a homeomorphism.

It will be important for us that we do not allow reparameterisations to reverse the endpoints. For instance, if ${z_1,z_2}$ are distinct points in the complex plane, the directed line segment ${\gamma_{z_1 \rightarrow z_2}}$ is not a reparameterisation of the directed line segment ${\gamma_{z_2 \rightarrow z_1}}$ since they do not have the same initial point (or same terminal point); the map ${t \mapsto 1-t}$ is a homeomorphism from ${[0,1]}$ to ${[0,1]}$ but it does not preserve the initial point or the terminal point. In general, given a curve ${\gamma: [a,b] \rightarrow {\bf C}}$, we define its reversal ${-\gamma: [-b,-a] \rightarrow {\bf C}}$ to be the curve ${(-\gamma)(t) := \gamma(-t)}$, thus for instance ${\gamma_{z_2 \rightarrow z_1}}$ is (up to reparameterisation) the reversal of ${\gamma_{z_1 \rightarrow z_2}}$, thus

$\displaystyle \gamma_{z_2 \rightarrow z_1} \equiv - \gamma_{z_1 \rightarrow z_2}.$

Another basic operation on curves is that of concatenation. Suppose we have two curves ${\gamma_1: [a_1,b_1] \rightarrow {\bf C}}$ and ${\gamma_2: [a_2,b_2] \rightarrow {\bf C}}$ with the property that the terminal point ${\gamma_1(b_1)}$ of ${\gamma_1}$ equals the initial point ${\gamma_2(a_2)}$ of ${\gamma_2}$. We can reparameterise ${\gamma_2}$ by translation to ${\tilde \gamma_2: [b_1, b_2+b_1-a_2] \rightarrow {\bf C}}$, defined by ${\tilde \gamma_2(t) := \gamma_2(t - b_1 + a_2)}$. We then define the concatenation or sum ${\gamma_1 + \gamma_2: [a_1, b_2+b_1-a_2] \rightarrow {\bf C}}$ by setting

$\displaystyle (\gamma_1 + \gamma_2)(t) := \gamma_1(t)$

for ${a_1 \leq t \leq b_1}$ and

$\displaystyle (\gamma_1 + \gamma_2)(t) := \tilde \gamma_2(t)$

for ${b_1 \leq t \leq b_2+b_1-a_2}$.
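The reversal and concatenation operations are straightforward to model in code; the helpers below (illustrative names, not part of the text's formal definitions) represent a curve as a triple of a map and its two endpoints of the parameter interval:

```python
# A small sketch of curve operations, representing a curve as (map, a, b).
# These are illustrative helpers, not part of the text's formal definitions.

def segment(z1, z2):                       # gamma_{z1 -> z2} on [0, 1]
    return (lambda t: (1 - t) * z1 + t * z2, 0.0, 1.0)

def reverse(curve):                        # (-gamma)(t) := gamma(-t) on [-b, -a]
    g, a, b = curve
    return (lambda t: g(-t), -b, -a)

def concat(c1, c2):                        # gamma_1 + gamma_2, after translating c2
    g1, a1, b1 = c1
    g2, a2, b2 = c2
    assert abs(g1(b1) - g2(a2)) < 1e-12    # terminal point of c1 = initial point of c2
    h = lambda t: g1(t) if t <= b1 else g2(t - b1 + a2)
    return (h, a1, b2 + b1 - a2)

g, a, b = concat(segment(0, 1), segment(1, 1 + 1j))
assert g(a) == 0 and g(b) == 1 + 1j        # endpoints of the concatenation
```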

Concatenation is well behaved with respect to equivalence and reversal:

Exercise 3 Let ${\gamma_1, \gamma_2, \gamma_3}$ be curves, with the terminal point of each curve equal to the initial point of the next (so that the concatenations below are all defined).

• (i) (Concatenation well defined up to equivalence) If ${\gamma_1 \equiv \tilde \gamma_1}$ and ${\gamma_2 \equiv \tilde \gamma_2}$, show that ${\gamma_1+\gamma_2 \equiv \tilde \gamma_1 + \tilde \gamma_2}$.
• (ii) (Concatenation associative) Show that ${(\gamma_1+\gamma_2)+\gamma_3 = \gamma_1 + (\gamma_2 + \gamma_3)}$. In particular, we certainly have ${(\gamma_1+\gamma_2)+\gamma_3 \equiv \gamma_1 + (\gamma_2 + \gamma_3)}$.
• (iii) (Concatenation and reversal) Show that ${-(\gamma_1+\gamma_2) \equiv (-\gamma_2) + (-\gamma_1)}$.
• (iv) (Non-commutativity) Give an example in which ${\gamma_1+\gamma_2}$ and ${\gamma_2+\gamma_1}$ are both well-defined, but not equivalent to each other.
• (v) (Identity) If ${z_1}$ and ${z_2}$ denote the initial and terminal points of ${\gamma_1}$ respectively, and ${\delta_z: [0,0] \rightarrow {\bf C}}$ is the trivial curve ${\delta_z: 0 \mapsto z}$ defined for any ${z \in {\bf C}}$, show that ${\delta_{z_1} + \gamma_1 \equiv \gamma_1}$ and ${\gamma_1 + \delta_{z_2} \equiv \gamma_1}$.
• (vi) (Non-invertibility) Give an example in which ${\gamma_1 + (-\gamma_1)}$ is not equivalent to a trivial curve.

Remark 4 The above exercise allows one to view the space of curves up to equivalence as a category, with the points in the complex plane being the objects of the category, and each equivalence class of curves being a single morphism from the initial point to the terminal point (and with the equivalence class of trivial curves being the identity morphisms). This point of view can be useful in topology, particularly when relating to concepts such as the fundamental group (and fundamental groupoid), monodromy, and holonomy. However, we will not need to use any advanced category-theoretic concepts in this course.

Exercise 5 Let ${z_0 \in {\bf C}}$ and ${r>0}$. For any integer ${m}$, let ${\gamma_m: [0,2\pi] \rightarrow {\bf C}}$ denote the curve

$\displaystyle \gamma_m(t) := z_0 + r e^{i m t}$

(thus for instance ${\gamma_1 = \gamma_{z_0,r,\circlearrowleft}}$).

• (i) Show that for any integer ${m}$, we have ${-\gamma_m \equiv \gamma_{-m}}$.
• (ii) Show that for any non-negative integers ${m,m'}$, we have ${\gamma_m + \gamma_{m'} \equiv \gamma_{m+m'}}$. What happens for other values of ${m,m'}$?
• (iii) If ${m,m'}$ are distinct integers, show that ${\gamma_m \not \equiv \gamma_{m'}}$.

Given a sequence of complex numbers ${z_0,z_1,\dots,z_n}$, we define the polygonal path ${\gamma_{z_0 \rightarrow z_1 \rightarrow \dots \rightarrow z_n}}$ traversing these numbers in order to be the curve

$\displaystyle \gamma_{z_0 \rightarrow z_1 \rightarrow \dots \rightarrow z_n} := \gamma_{z_0 \rightarrow z_1} + \gamma_{z_1 \rightarrow z_2} + \dots + \gamma_{z_{n-1} \rightarrow z_n}.$

This is well-defined thanks to Exercise 3(ii) (actually all we really need in applications is being well-defined up to equivalence). Thus for instance ${\gamma_{z_0 \rightarrow z_1 \rightarrow z_2 \rightarrow z_0}}$ would traverse a closed triangular path connecting ${z_0}$, ${z_1}$, and ${z_2}$ (this path may end up being non-simple if the points ${z_0,z_1,z_2}$ are collinear).

In order to do analysis, we need to restrict our attention to those curves which are rectifiable:

Definition 6 Let ${\gamma: [a,b] \rightarrow {\bf C}}$ be a curve. The arc length ${|\gamma|}$ of the curve is defined to be the supremum of the quantities

$\displaystyle \sum_{j=1}^n |\gamma(t_j)-\gamma(t_{j-1})|$

where ${n}$ ranges over the natural numbers and ${a = t_0 < t_1 < \dots < t_n = b}$ ranges over the partitions of ${[a,b]}$. We say that the curve ${\gamma}$ is rectifiable if its arc length is finite.

The concept is best understood visually: a curve is rectifiable if there is some finite bound on the length of polygonal paths one can form while traversing the curve in order. From Exercise 2 we see that equivalent curves have the same arclength, so the concepts of arclength and rectifiability are well defined for curves that are only given up to continuous reparameterisation.
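The supremum in Definition 6 can also be explored numerically: for the unit circle, the inscribed polygonal lengths increase towards ${2\pi}$ as the partition is refined (a sketch using uniform partitions, which already approach the supremum):

```python
import cmath, math

# Definition 6 in action: approximate the arclength of the unit circle by the
# lengths of inscribed polygonal paths. The sums increase towards 2*pi as the
# partition is refined.
gamma = lambda t: cmath.exp(1j * t)        # unit circle on [0, 2*pi]

def polygonal_length(n):
    ts = [2 * math.pi * j / n for j in range(n + 1)]
    return sum(abs(gamma(ts[j]) - gamma(ts[j - 1])) for j in range(1, n + 1))

lengths = [polygonal_length(n) for n in (4, 16, 64, 256)]
print(lengths)   # increasing, approaching 2*pi ~ 6.2832
```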

Exercise 7 Let ${\gamma_1, \gamma_2}$ be curves, with the terminal point of ${\gamma_1}$ equal to the initial point of ${\gamma_2}$ (so that the concatenation ${\gamma_1+\gamma_2}$ is defined). Show that

$\displaystyle |\gamma_1+\gamma_2| = |\gamma_1| + |\gamma_2|.$

In particular, ${\gamma_1+\gamma_2}$ is rectifiable if and only if ${\gamma_1, \gamma_2}$ are both individually rectifiable.

It is not immediately obvious that reasonable curves (e.g. the line segments ${\gamma_{z_1 \rightarrow z_2}}$ or the circles ${\gamma_{z_0,r,\circlearrowleft}}$) are rectifiable. To verify this, we need two preliminary results.

Lemma 8 (Triangle inequality for integrals) Let ${f: [a,b] \rightarrow {\bf C}}$ be a Riemann integrable function (that is, the real and imaginary parts of ${f}$ are both Riemann integrable). Then

$\displaystyle |\int_a^b f(t)\ dt| \leq \int_a^b |f(t)|\ dt.$

Here we interpret ${\int_a^b f(t)\ dt}$ as the Riemann integral (or equivalently, ${\int_a^b f(t)\ dt = \int_a^b \mathrm{Re} f(t)\ dt + i \int_a^b \mathrm{Im} f(t)\ dt}$).

Proof: We first attempt to prove this inequality by considering the real and imaginary parts separately. From the real-valued triangle inequality (and basic properties of the Riemann integral) we have

$\displaystyle | \mathrm{Re} \int_a^b f(t)\ dt | = |\int_a^b \mathrm{Re} f(t)\ dt |$

$\displaystyle \leq \int_a^b |\mathrm{Re} f(t)|\ dt$

$\displaystyle \leq \int_a^b |f(t)|\ dt$

and similarly

$\displaystyle |\mathrm{Im} \int_a^b f(t)\ dt| \leq \int_a^b |f(t)|\ dt$

but these two bounds only yield the weaker estimate

$\displaystyle |\int_a^b f(t)\ dt| \leq \sqrt{2} \int_a^b |f(t)|\ dt.$

To eliminate this ${\sqrt{2}}$ loss we can amplify the above argument by exploiting phase rotation. For any real ${\theta}$, we can repeat the above arguments (using the complex linearity of the Riemann integral, which is easily verified) to give

$\displaystyle | \mathrm{Re} e^{i\theta} \int_a^b f(t)\ dt | = |\int_a^b \mathrm{Re} e^{i\theta} f(t)\ dt |$

$\displaystyle \leq \int_a^b |\mathrm{Re} e^{i\theta} f(t)|\ dt$

$\displaystyle \leq \int_a^b |f(t)|\ dt.$

But we have ${|z| = \sup_\theta \mathrm{Re} e^{i\theta} z}$ for any complex number ${z}$, so taking the supremum of both sides in ${\theta}$ we obtain the claim. $\Box$

Exercise 9 Let ${a \leq b}$ be real numbers. Show that the interval ${[a,b]}$ is topologically connected, that is to say the only two subsets of ${[a,b]}$ that are both open and closed relative to ${[a,b]}$ are the empty set and all of ${[a,b]}$. (Hint: if ${E}$ is a non-empty set that is both open and closed in ${[a,b]}$ and contains ${a}$, consider the supremum of all ${T_* \in [a,b]}$ such that ${[a,T_*] \subset E}$.)

Next, we say that a non-trivial curve ${\gamma: [a,b] \rightarrow {\bf C}}$ is continuously differentiable if the derivative

$\displaystyle \gamma'(t) := \lim_{t' \rightarrow t: t' \in [a,b] \backslash \{t\}} \frac{\gamma(t') - \gamma(t)}{t'-t}$

exists and is continuous for all ${t \in [a,b]}$ (note that we are only taking right-derivatives at ${t=a}$ and left-derivatives at ${t=b}$).

Proposition 10 (Arclength formula) If ${\gamma: [a,b] \rightarrow {\bf C}}$ is a continuously differentiable curve, then it is rectifiable, and

$\displaystyle |\gamma| = \int_a^b |\gamma'(t)|\ dt.$

Proof: We first prove the upper bound

$\displaystyle |\gamma| \leq \int_a^b |\gamma'(t)|\ dt \ \ \ \ \ (3)$

which in particular implies the rectifiability of ${\gamma}$ since the right-hand side of (3) is finite. Let ${a = t_0 < \dots < t_n = b}$ be any partition of ${[a,b]}$. By the fundamental theorem of calculus (applied to the real and imaginary parts of ${\gamma}$) we have

$\displaystyle \gamma(t_j) - \gamma(t_{j-1}) = \int_{t_{j-1}}^{t_j} \gamma'(t)\ dt$

for any ${1 \leq j \leq n}$, and hence by Lemma 8 we have

$\displaystyle |\gamma(t_j) - \gamma(t_{j-1})| \leq \int_{t_{j-1}}^{t_j} |\gamma'(t)|\ dt.$

Summing in ${j}$ we obtain

$\displaystyle \sum_{j=1}^n |\gamma(t_j) - \gamma(t_{j-1})| \leq \int_a^b |\gamma'(t)|\ dt$

and taking suprema over all partitions we obtain (3).

Now we need to show the matching lower bound. Let ${\varepsilon>0}$ be a small quantity, and for any ${a \leq T \leq b}$, let ${\gamma_{[a,T]}: [a,T] \rightarrow {\bf C}}$ denote the restriction of ${\gamma: [a,b] \rightarrow {\bf C}}$ to ${[a,T]}$. We will show the bound

$\displaystyle |\gamma_{[a,T]}| \geq \int_a^T |\gamma'(t)|\ dt - \varepsilon (T-a) \ \ \ \ \ (4)$

for all ${a \leq T \leq b}$; specialising to ${T=b}$ and then sending ${\varepsilon \rightarrow 0}$ will give the claim.

It remains to prove (4) for a given choice of ${\varepsilon}$. We will use a continuous version of induction known as the continuity method, which exploits Exercise 9.

Let ${\Omega_\varepsilon \subset [a,b]}$ denote the set of ${T_* \in [a,b]}$ such that (4) holds for all ${a \leq T \leq T_*}$. It is clear (using Exercise 7) that this set ${\Omega_\varepsilon}$ is topologically closed, and also contains the left endpoint ${a}$ of ${[a,b]}$. If ${T_* \in \Omega_\varepsilon}$ and ${T_* < b}$, then from the differentiability of ${\gamma}$ at ${T_*}$, we have some interval ${[T_*, T_*+\delta] \subset [a,b]}$ such that

$\displaystyle |\frac{\gamma(T) - \gamma(T_*)}{T-T_*} - \gamma'(T_*)| \leq \varepsilon/2$

for all ${T \in [T_*, T_*+\delta]}$. Rearranging this using the triangle inequality, we have

$\displaystyle |\gamma(T) - \gamma(T_*)| \geq |\gamma'(T_*)| (T-T_*) - \frac{\varepsilon}{2} (T-T_*).$

Also, from the continuity of ${|\gamma'|}$ we have

$\displaystyle \int_{T_*}^T |\gamma'(t)|\ dt \leq |\gamma'(T_*)| (T-T_*) + \frac{\varepsilon}{2} (T-T_*)$

for all ${T \in [T_*, T_*+\delta]}$, if ${\delta}$ is small enough. We conclude that

$\displaystyle |\gamma(T) - \gamma(T_*)| \geq \int_{T_*}^T |\gamma'(t)|\ dt - \varepsilon (T-T_*),$

and hence

$\displaystyle |\gamma_{[T_*,T]}| \geq \int_{T_*}^T |\gamma'(t)|\ dt - \varepsilon (T-T_*)$

where ${\gamma_{[T_*,T]}: [T_*,T] \rightarrow {\bf C}}$ is the restriction of ${\gamma: [a,b] \rightarrow {\bf C}}$ to ${[T_*,T]}$. Adding this to the ${T=T_*}$ case of (4) using the additivity of arclength (Exercise 7), we conclude that (4) also holds for all ${T \in [T_*,T_*+\delta]}$. From this we see that ${\Omega_\varepsilon}$ is (relatively) open in ${[a,b]}$; from the connectedness of ${[a,b]}$ we conclude that ${\Omega_\varepsilon = [a,b]}$, and we are done. $\Box$

It is now easy to verify that the line segment ${\gamma_{z_1 \rightarrow z_2}}$ is rectifiable with arclength ${|z_2-z_1|}$, and that the circle ${\gamma_{z_0,r,\circlearrowleft}}$ is rectifiable with arclength ${2\pi r}$, exactly as one would expect from elementary geometry. Finally, from Exercise 7, a polygonal path ${\gamma_{z_0 \rightarrow z_1 \rightarrow \dots \rightarrow z_n}}$ will be rectifiable with arclength ${|z_0-z_1| + \dots + |z_{n-1}-z_n|}$, again exactly as one would expect.
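Both computations are easy to corroborate numerically from the arclength formula of Proposition 10 (a quick sketch with sample endpoints and radius; the circle integral is approximated by a midpoint rule):

```python
import cmath, math

# Checking the arclength formula |gamma| = \int_a^b |gamma'(t)| dt numerically:
#   segment: gamma'(t) = z2 - z1 on [0,1], so |gamma| = |z2 - z1|;
#   circle:  gamma'(t) = i r e^{it} on [0,2*pi], so |gamma'(t)| = r and |gamma| = 2*pi*r.
z1, z2, r = 1 + 1j, 4 - 3j, 2.5            # sample endpoints and radius

seg_length = abs(z2 - z1)                  # integral of the constant |z2 - z1| over [0,1]
circ_length = sum(abs(1j * r * cmath.exp(1j * t))
                  for t in [2 * math.pi * (k + 0.5) / 1000 for k in range(1000)]
                  ) * (2 * math.pi / 1000)

print(seg_length, circ_length)             # |3 - 4i| = 5, and 2*pi*r
```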

Exercise 12 (This exercise presumes familiarity with Lebesgue measure.) Show that the image of a rectifiable curve is necessarily of measure zero in the complex plane. (In particular, space-filling curves such as the Peano curve or the Hilbert curve cannot be rectifiable.)

Remark 13 As the above exercise suggests, many fractal curves will fail to be rectifiable; for instance the Koch snowflake is a famous example of an unrectifiable curve. (The situation is clarified once one develops the theory of Hausdorff dimension, as is done for instance in this previous post: any curve of Hausdorff dimension strictly greater than one will be unrectifiable.)

Much as continuous functions ${f: [a,b] \rightarrow {\bf R}}$ on an interval may be integrated by taking limits of Riemann sums, we may also integrate continuous functions ${f: \gamma([a,b]) \rightarrow {\bf C}}$ on the image of a rectifiable curve ${\gamma: [a,b] \rightarrow {\bf C}}$:

Proposition 14 (Existence of the contour integral) Let ${\gamma: [a,b] \rightarrow {\bf C}}$ be a rectifiable curve, and let ${f: \gamma([a,b]) \rightarrow {\bf C}}$ be continuous. Then the Riemann sums

$\displaystyle \sum_{j=1}^n f(\gamma(t^*_j)) (\gamma(t_j) - \gamma(t_{j-1})), \ \ \ \ \ (5)$

where ${a = t_0 < \dots < t_n = b}$ ranges over the partitions of ${[a,b]}$, and for each ${1 \leq j \leq n}$, ${t^*_j}$ is an element of ${[t_{j-1},t_j]}$, converge as the maximum mesh size ${\max_{1 \leq j \leq n} |t_j - t_{j-1}|}$ goes to zero to some complex limit, which we will denote as ${\int_\gamma f(z)\ dz}$. In other words, for every ${\varepsilon>0}$ there exists a ${\delta>0}$ such that

$\displaystyle |\sum_{j=1}^n f(\gamma(t^*_j)) (\gamma(t_j) - \gamma(t_{j-1})) - \int_\gamma f(z)\ dz| \leq \varepsilon$

whenever ${\max_{1 \leq j \leq n} |t_j - t_{j-1}| \leq \delta}$.
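One can approximate these Riemann sums in code. The helper below (a hypothetical `contour_integral`, using midpoint tags) recovers, for the integrand ${f(z)=z}$, the value ${0}$ on a closed circle and ${(z_1^2-z_0^2)/2}$ on a segment, anticipating the fundamental theorem of calculus mentioned earlier:

```python
import cmath, math

# Riemann-sum approximation of the contour integral (5), with midpoint tags.
# For f(z) = z (primitive z^2/2): the integral over a closed circle should be 0,
# and over a segment from z0 to z1 it should be (z1^2 - z0^2)/2.
def contour_integral(f, gamma, a, b, n=2000):
    ts = [a + (b - a) * j / n for j in range(n + 1)]
    return sum(f(gamma((ts[j - 1] + ts[j]) / 2)) * (gamma(ts[j]) - gamma(ts[j - 1]))
               for j in range(1, n + 1))

circle = lambda t: cmath.exp(1j * t)
I1 = contour_integral(lambda z: z, circle, 0.0, 2 * math.pi)

z0, z1 = 1 + 1j, 2 - 1j                    # sample endpoints
seg = lambda t: (1 - t) * z0 + t * z1
I2 = contour_integral(lambda z: z, seg, 0.0, 1.0)

print(abs(I1), I2)   # ~0, and ~(z1**2 - z0**2)/2
```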

Proof: In real analysis courses, one often uses the order properties of the real line to replace the rather complicated looking Riemann sums with the simpler Darboux sums, en route to proving the real-variable analogue of the above proposition. However, in our complex setting the ordering of the real line is not available, so we will tackle the Riemann sums directly rather than try to compare them with Darboux sums.

It suffices to prove that the “Riemann sums” (5) are a Cauchy sequence, in the sense that the difference

$\displaystyle \sum_{j=1}^n f(\gamma(t^*_j)) (\gamma(t_j) - \gamma(t_{j-1})) - \sum_{k=1}^m f(\gamma(s^*_k)) (\gamma(s_k) - \gamma(s_{k-1}))$

between two sums of the form (5) is smaller than any specified ${\varepsilon>0}$ if the maximum mesh sizes of the two partitions ${a = t_0 < \dots < t_n = b}$ and ${a = s_0 < \dots < s_m = b}$ are both small enough. From the triangle inequality, and from the fact that any two partitions have a common refinement, it suffices to prove this under the additional assumption that the second partition ${a = s_0 < \dots < s_m = b}$ is a refinement of ${a = t_0 < \dots < t_n = b}$. This means that there is an increasing sequence ${0 = m_0 < m_1 < \dots < m_n = m}$ of natural numbers such that ${s_{m_j} = t_j}$ for ${j=0,\dots,n}$. In that case, the above difference may be rearranged as

$\displaystyle \sum_{j=1}^n E_j$

where

$\displaystyle E_j := f(\gamma(t^*_j)) (\gamma(t_j) - \gamma(t_{j-1})) - \sum_{k=m_{j-1}+1}^{m_j} f(\gamma(s^*_k)) (\gamma(s_k) - \gamma(s_{k-1})).$

By telescoping series, we may rearrange ${E_j}$ further as

$\displaystyle E_j := \sum_{k=m_{j-1}+1}^{m_j} (f(\gamma(t^*_j)) - f(\gamma(s^*_k))) (\gamma(s_k) - \gamma(s_{k-1})).$

As ${\gamma: [a,b] \rightarrow {\bf C}}$ is continuous and ${[a,b]}$ is compact, the image ${\gamma([a,b])}$ is compact; as ${f}$ is continuous on this compact set, the composition ${f \circ \gamma: [a,b] \rightarrow {\bf C}}$ is uniformly continuous. In particular, since ${t^*_j}$ and ${s^*_k}$ lie in a common interval ${[t_{j-1},t_j]}$ of the coarser partition, if the maximum mesh sizes are small enough, we have

$\displaystyle |f(\gamma(t^*_j)) - f(\gamma(s^*_k))| \leq \varepsilon$

for all ${1 \leq j \leq n}$ and ${m_{j-1} \leq k \leq m_j}$. From the triangle inequality we conclude that

$\displaystyle |E_j| \leq \varepsilon \sum_{k=m_{j-1}+1}^{m_j} |\gamma(s_k) - \gamma(s_{k-1})|$

and hence on summing in ${j}$ and using the triangle inequality, we can bound

$\displaystyle |\sum_{j=1}^n E_j| \leq \varepsilon |\gamma|.$

Since ${|\gamma|}$ is finite, and ${\varepsilon}$ can be made arbitrarily small, we obtain the required Cauchy sequence property. $\Box$

One cannot simply omit the rectifiability hypothesis from the above proposition:

Exercise 15 Give an example of a curve ${\gamma: [a,b] \rightarrow {\bf C}}$ such that the Riemann sums

$\displaystyle \sum_{j=1}^n \gamma(t^*_j) (\gamma(t_j) - \gamma(t_{j-1}))$

fail to converge to a limit as the maximum mesh size goes to zero, so the integral ${\int_\gamma z\ dz}$ does not exist even though the integrand ${z}$ is extremely smooth. (Of course, such a curve ${\gamma}$ cannot be rectifiable, thanks to Proposition 14.) (Hint: the non-rectifiable curve in Exercise 11 is a good place to start, but it turns out that this curve does not oscillate wildly enough to make the Riemann sums here diverge, because of the decay of the function ${z \mapsto z}$ near the origin. Come up with a variant of this curve which oscillates more.)

By abuse of notation, we will refer to the quantity ${\int_\gamma f(z)\ dz}$ as the contour integral of ${f}$ along ${\gamma}$, even though ${\gamma}$ is not necessarily a contour (we will define this concept shortly). We have some easy properties of this integral:

Exercise 17 (This exercise assumes familiarity with the Riemann-Stieltjes integral.) Let ${\gamma: [a,b] \rightarrow {\bf C}}$ be a rectifiable curve. Let ${g: [a,b] \rightarrow {\bf C}}$ denote the monotone non-decreasing function

$\displaystyle g(T) := |\gamma_{[a,T]}|$

for ${a \leq T \leq b}$, where ${\gamma_{[a,T]}: [a,T] \rightarrow {\bf C}}$ is the restriction of ${\gamma: [a,b] \rightarrow {\bf C}}$ to ${[a,T]}$. For any continuous function ${f: \gamma([a,b]) \rightarrow {\bf C}}$, define the arclength measure integral ${\int_\gamma f(z)\ |dz|}$ by the formula

$\displaystyle \int_\gamma f(z)\ |dz| := \int_a^b f(\gamma(t))\ dg(t)$

where the right-hand side is a Riemann-Stieltjes integral. Establish the triangle inequality

$\displaystyle |\int_\gamma f(z)\ dz| \leq \int_\gamma |f(z)|\ |dz|$

for any continuous ${f: \gamma([a,b]) \rightarrow {\bf C}}$. Also establish the identity

$\displaystyle \int_\gamma\ |dz| = |\gamma|$

and obtain an alternate proof of Exercise 16(v).

The change of variables formula (iv) lets one compute many contour integrals using the familiar Riemann integral. For instance, if ${a < b}$ are real numbers and ${f: [a,b] \rightarrow {\bf C}}$ is continuous, then the contour integral along ${\gamma_{a \rightarrow b}}$ coincides with the Riemann integral,

$\displaystyle \int_{\gamma_{a \rightarrow b}} f(z)\ dz = \int_a^b f(x)\ dx$

and on reversal we also have

$\displaystyle \int_{\gamma_{b \rightarrow a}} f(z)\ dz = \int_b^a f(x)\ dx = - \int_a^b f(x)\ dx.$

Similarly, if ${f}$ is continuous on the circle ${\{ z \in {\bf C}: |z-z_0| = r \}}$, we have

$\displaystyle \int_{\gamma_{z_0,r,\circlearrowleft}} f(z)\ dz = i \int_0^{2\pi} f(z_0+re^{i\theta}) r e^{i\theta}\ d\theta \ \ \ \ \ (6)$

$\displaystyle = 2\pi i r \int_0^1 f(z_0 + re^{2\pi i \theta}) e^{2\pi i \theta}\ d\theta.$
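As an illustration of (6), for ${f(z) = 1/(z-z_0)}$ (which is continuous on the circle ${|z-z_0|=r}$) the integrand collapses to the constant ${i}$, so the contour integral equals ${2\pi i}$ for every radius; a numerical check:

```python
import cmath, math

# Formula (6) for f(z) = 1/(z - z0): the integrand is
# i * f(z0 + r e^{i theta}) * r e^{i theta} = i, so the integral is 2*pi*i.
z0, r = 2 - 1j, 0.7                         # sample centre and radius
n = 1000
thetas = [2 * math.pi * (k + 0.5) / n for k in range(n)]
I = sum(1j * (1 / (r * cmath.exp(1j * th))) * r * cmath.exp(1j * th)
        for th in thetas) * (2 * math.pi / n)
print(I)   # ~ 2*pi*i, independent of r
```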

We caution that the contour integral does not interact well with the operations of taking real parts, imaginary parts, or complex conjugates: one has

$\displaystyle \mathrm{Re} \int_\gamma f(z)\ dz \neq \int_\gamma \mathrm{Re} f(z)\ dz$

$\displaystyle \mathrm{Im} \int_\gamma f(z)\ dz \neq \int_\gamma \mathrm{Im} f(z)\ dz$

and

$\displaystyle \overline{\int_\gamma f(z)\ dz} \neq \int_\gamma \overline{f(z)}\ dz$

in general. If one wishes to mix line integrals with real and imaginary parts, it is recommended to replace the contour integrals above with the line integrals

$\displaystyle \int_\gamma f(x+iy) dx + g(x+iy) dy,$

which are defined as in Proposition 14 but where the expression

$\displaystyle f(\gamma(t^*_j)) (\gamma(t_j) - \gamma(t_{j-1}))$

appearing in (5) is replaced by

$\displaystyle f(\gamma(t^*_j)) \mathrm{Re}(\gamma(t_j) - \gamma(t_{j-1})) + g(\gamma(t^*_j)) \mathrm{Im}(\gamma(t_j) - \gamma(t_{j-1})).$

The contour integral corresponds to the special case ${g=if}$ (or more informally, ${dz = dx + idy}$). Line integrals are in turn special cases of the more general concept of integration of differential forms, discussed for instance in this article of mine, and which are used extensively in differential geometry and geometric topology. However we will not use these more general line integrals or differential form integrals much in this course.
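To see the correspondence ${g = if}$ concretely, the sketch below evaluates the line integral ${\int_\gamma f\ dx + g\ dy}$ with ${g=if}$ and ${f(z)=z^2}$ along a sample segment, and compares it with the value ${(z_1^3-z_0^3)/3}$ predicted by the primitive ${z^3/3}$ (an illustrative computation, not a general implementation):

```python
# The line integral \int_gamma f dx + g dy with g = i*f reproduces the contour
# integral (informally, dz = dx + i dy). Here f(z) = z^2 along a segment, so
# the answer should match the primitive-based value (z1^3 - z0^3)/3.
z0, z1 = 0j, 1 + 1j                        # sample endpoints
gamma = lambda t: (1 - t) * z0 + t * z1
f = lambda z: z * z
g = lambda z: 1j * f(z)                    # the special case g = i f

n = 4000
ts = [j / n for j in range(n + 1)]
line_integral = sum(
    f(gamma((ts[j - 1] + ts[j]) / 2)) * (gamma(ts[j]).real - gamma(ts[j - 1]).real)
    + g(gamma((ts[j - 1] + ts[j]) / 2)) * (gamma(ts[j]).imag - gamma(ts[j - 1]).imag)
    for j in range(1, n + 1))

exact = (z1**3 - z0**3) / 3
print(line_integral, exact)   # both ~ (1+i)^3 / 3
```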

In later notes it will be convenient to restrict to a more regular class of curves than the rectifiable curves. We thus give the definitions here:

Definition 19 (Smooth curves and contours) A smooth curve is a curve ${\gamma: [a,b] \rightarrow {\bf C}}$ that is continuously differentiable, and such that ${\gamma'(t) \neq 0}$ for all ${t \in [a,b]}$. A contour is a curve that is equivalent to the concatenation of finitely many smooth curves (that is to say, a piecewise smooth curve).

Example 20 The line segments ${\gamma_{z_1 \rightarrow z_2}}$ and circles ${\gamma_{z_0, r, \circlearrowright}, \gamma_{z_0,r,\circlearrowleft}}$ are smooth curves and hence contours. Polygonal paths are usually not smooth, but they are contours. Any sum of finitely many contours is again a contour, and the reversal of a contour is also a contour.

Note that the term “smooth” here differs somewhat from the real-variable notion of smoothness, which usually means “infinitely differentiable”. Smooth curves are only assumed to be continuously differentiable; we do not assume that the second derivative of ${\gamma}$ exists. (In particular, smooth curves may have infinite curvature at some points.) In practice this distinction tends to be minor, as the smooth curves that one actually uses in complex analysis do tend to be infinitely differentiable; on the other hand, for most applications one does not need to control any derivative of a contour beyond the first.

The following examples and exercises may help explain why the non-vanishing condition ${\gamma'(t) \neq 0}$ imbues curves with a certain degree of “smoothness”.

Exercise 24 (Local behaviour of smooth curves) Let ${\gamma: [a,b] \rightarrow {\bf C}}$ be a simple smooth curve, and let ${t_0}$ be an interior point of ${(a,b)}$. Let ${\theta \in {\bf R}}$ be a phase of ${\gamma'(t_0)}$, thus ${\gamma'(t_0)= re^{i\theta}}$ for some ${r>0}$. Show that for all sufficiently small ${\varepsilon>0}$, the portion ${\gamma([a,b]) \cap D(\gamma(t_0),\varepsilon)}$ of the image of ${\gamma}$ near ${\gamma(t_0)}$ looks like a rotated graph, in the sense that

$\displaystyle \gamma([a,b]) \cap D(\gamma(t_0),\varepsilon) = \{ \gamma(t_0) + e^{i\theta} ( s + i f(s) ): s \in I_\varepsilon \}$

for some open interval ${I_\varepsilon}$ containing ${0}$ and some continuously differentiable function ${f: I_\varepsilon \rightarrow {\bf R}}$ with ${f(0)=0}$ and ${f'(0)=0}$.

Exercise 25 Show that a curve ${\gamma}$ is a contour if and only if it is equivalent to the concatenation of finitely many simple smooth curves.

Exercise 26 Show that the cuspidal curve and absolute value curves in Examples 21, 22 are contours, but the curve in Exercise 23 is not.

— 2. The fundamental theorem of calculus —

Now we establish the complex analogues of the fundamental theorem of calculus. As in the real-variable case, there are two useful formulations of this theorem. Here is the first:

Theorem 27 (First fundamental theorem of calculus) Let ${U}$ be an open subset of ${{\bf C}}$, let ${f: U \rightarrow {\bf C}}$ be a continuous function, and suppose that ${f}$ has an antiderivative ${F: U \rightarrow {\bf C}}$, that is to say a holomorphic function with ${f(z) = F'(z)}$ for all ${z \in U}$. Let ${\gamma: [a,b] \rightarrow {\bf C}}$ be a rectifiable curve in ${U}$ with initial point ${z_0}$ and terminal point ${z_1}$. Then

$\displaystyle \int_\gamma f(z)\ dz = F(z_1) - F(z_0).$

Proof: If ${\gamma}$ were continuously differentiable, or at least piecewise continuously differentiable (the concatenation of finitely many continuously differentiable curves), we could establish this theorem by using Exercise 16 to rewrite everything in terms of real-variable Riemann integrals, at which point one can use the real-variable fundamental theorem of calculus (and the chain rule). But actually we can just give a direct proof that does not need any rectifiability hypothesis whatsoever.

We again use the continuity method. Let ${\varepsilon > 0}$, and for each ${a \leq T \leq b}$, let ${\gamma_{[a,T]}: [a,T] \rightarrow {\bf C}}$ be the restriction of ${\gamma: [a,b] \rightarrow {\bf C}}$ to ${[a,T]}$. It will suffice to show that

$\displaystyle |\int_{\gamma_{[a,T]}} f(z)\ dz - (F(\gamma(T)) - F(\gamma(a)))| \leq \varepsilon |\gamma_{[a,T]}| \ \ \ \ \ (7)$

for all ${a \leq T \leq b}$, as the claim then follows by setting ${T=b}$ and sending ${\varepsilon}$ to zero.

Let ${\Omega_\varepsilon}$ denote the set of all ${T_* \in [a,b]}$ such that (7) holds for all ${a \leq T \leq T_*}$. As before, ${\Omega_\varepsilon}$ is clearly closed and contains ${a}$; as ${[a,b]}$ is connected, the only remaining task is to show that ${\Omega_\varepsilon}$ is open in ${[a,b]}$. Let ${T_* \in \Omega_\varepsilon}$ be such that ${T_* < b}$. As ${F}$ is differentiable at ${\gamma(T_*)}$ with derivative ${f(\gamma(T_*))}$, and ${\gamma}$ is continuous, there exists ${\delta>0}$ with ${[T_*,T_*+\delta] \subset [a,b]}$ such that

$\displaystyle |\frac{F(\gamma(T)) - F(\gamma(T_*))}{\gamma(T) - \gamma(T_*)} - f(\gamma(T_*))| \leq \varepsilon/2$

for any ${T_* < T \leq T_*+\delta}$, and hence

$\displaystyle |F(\gamma(T)) - F(\gamma(T_*)) - f(\gamma(T_*)) (\gamma(T)-\gamma(T_*))| \leq \varepsilon/2 |\gamma_{[T_*,T]}| \ \ \ \ \ (8)$

for any ${T \in [T_*, T_*+\delta]}$. On the other hand, if ${\delta}$ is small enough, we have that

$\displaystyle |f(\gamma(t)) - f(\gamma(T_*))| \leq \varepsilon/2$

for all ${t \in [T_*, T_*+\delta]}$, by continuity of ${f \circ \gamma}$, and hence

$\displaystyle |\int_{\gamma_{[T_*,T]}} (f(z) - f(\gamma(T_*)))\ dz| \leq \frac{\varepsilon}{2} |\gamma_{[T_*, T]}|$

where ${\gamma_{[T_*,T]}: [T_*,T] \rightarrow {\bf C}}$ is the restriction of ${\gamma: [a,b] \rightarrow {\bf C}}$ to ${[T_*,T]}$. Applying Exercise 16(vi), (vii) we thus have

$\displaystyle |\int_{\gamma_{[T_*,T]}} f(z)\ dz - f(\gamma(T_*)) (\gamma(T)-\gamma(T_*))| \leq \frac{\varepsilon}{2} |\gamma_{[T_*, T]}|.$

Combining this with (8) and the triangle inequality, we conclude that

$\displaystyle |\int_{\gamma_{[T_*,T]}} f(z)\ dz - (F(\gamma(T)) - F(\gamma(T_*)))| \leq \varepsilon |\gamma_{[T_*,T]}|$

and on adding this to the ${T=T_*}$ case of (7) and again using the triangle inequality, we conclude that (7) holds for all ${T \in [T_*,T_*+\delta]}$. This ensures that ${\Omega_\varepsilon}$ is open, as desired. $\Box$

One can use this theorem to quickly evaluate many integrals by using an antiderivative for the integrand as in the real-variable case. For instance, for any rectifiable curve ${\gamma}$ with initial point ${z_1}$ and terminal point ${z_2}$, we have

$\displaystyle \int_\gamma z\ dz = \frac{1}{2} z_2^2 - \frac{1}{2} z_1^2,$

$\displaystyle \int_\gamma e^z\ dz = e^{z_2} - e^{z_1},$

$\displaystyle \int_\gamma \cos(z)\ dz = \sin(z_2) - \sin(z_1),$

and so forth. If the curve ${\gamma}$ avoids the origin, we also have

$\displaystyle \int_\gamma \frac{1}{z^2}\ dz = -\frac{1}{z_2} + \frac{1}{z_1}$

since ${-\frac{1}{z}}$ is an antiderivative of ${\frac{1}{z^2}}$ on ${{\bf C} \backslash \{0\}}$. If ${\sum_{n=0}^\infty a_n (z-z_0)^n}$ is a power series with radius of convergence ${R}$, and ${\gamma}$ is a rectifiable curve in ${D(z_0,R)}$ with initial point ${z_1}$ and terminal point ${z_2}$, we similarly have

$\displaystyle \int_\gamma \sum_{n=0}^\infty a_n (z-z_0)^n\ dz = \sum_{n=0}^\infty a_n ( \frac{(z_2-z_0)^{n+1}}{n+1} - \frac{(z_1-z_0)^{n+1}}{n+1} ).$
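These path-independence identities can be checked numerically. The Python sketch below (illustrative only; the zig-zag path and tolerances are my own choices) approximates a contour integral along a polygonal path by midpoint Riemann sums on each edge, and confirms that the integral of ${e^z}$ depends only on the endpoints:

```python
import cmath

def contour_integral(f, vertices, n=2000):
    """Approximate the contour integral of f along the polygonal path
    through the given vertices, using midpoint Riemann sums on each edge."""
    total = 0
    for a, b in zip(vertices, vertices[1:]):
        dz = (b - a) / n
        for k in range(n):
            total += f(a + (k + 0.5) * dz) * dz   # midpoint of subsegment
    return total

# First fundamental theorem: the integral of e^z depends only on the
# endpoints z1, z2, not on the polygonal path joining them.
z1, z2 = 0, 1 + 1j
detour = [z1, 2 - 1j, 0.5 + 2j, z2]   # an arbitrary zig-zag path
print(contour_integral(cmath.exp, detour))
print(cmath.exp(z2) - cmath.exp(z1))  # the two values agree closely
```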

For the second fundamental theorem of calculus, we need a topological preliminary result.

Exercise 28 Let ${U}$ be a non-empty open subset of ${{\bf C}}$. Show that the following statements are equivalent:

• (i) ${U}$ is topologically connected (that is to say, the only two subsets of ${U}$ that are both open and closed relative to ${U}$ are the empty set and ${U}$ itself).
• (ii) ${U}$ is path connected (that is to say, for any ${z_1,z_2 \in U}$ there exists a curve ${\gamma}$ with image in ${U}$ whose initial point is ${z_1}$ and terminal point is ${z_2}$).
• (iii) ${U}$ is polygonally path connected (the same as (ii), except that ${\gamma}$ is now required to also be a polygonal path).

(Hint: to show that (i) implies (iii), pick a base point ${z_0}$ in ${U}$ and consider the set of all ${z_1}$ in ${U}$ that can be reached from ${z_0}$ by a polygonal path.)

We remark that the relationship between path connectedness and connectedness is more delicate when one does not assume that the space ${U}$ is open; every path-connected space is still connected, but the converse need not be true.

Remark 29 There is some debate as to whether to view the empty set ${\emptyset}$ as connected, disconnected, or neither. I view this as analogous to the debate as to whether the natural number ${1}$ should be viewed as prime, composite, or neither. In both cases I personally prefer the convention of “neither” (and like to use the term “unconnected” to describe the empty set, and “unit” to describe ${1}$), but to avoid any confusion I will restrict the discussion of connectedness to non-empty sets in this course (which is all we will need in applications).

In real analysis, the second fundamental theorem of calculus asserts that if a function ${f: [a,b] \rightarrow {\bf R}}$ is continuous, then the function ${F: x \mapsto \int_a^x f(y)\ dy}$ is an antiderivative of ${f}$. In the complex case there is an analogous result, but one needs the additional requirement that the function is conservative.

Theorem 30 (Second fundamental theorem of calculus) Let ${U}$ be a non-empty open connected subset of ${{\bf C}}$, and let ${f: U \rightarrow {\bf C}}$ be a continuous function that is conservative in the sense that

$\displaystyle \int_\gamma f(z)\ dz = 0 \ \ \ \ \ (9)$

whenever ${\gamma}$ is a closed polygonal path in ${U}$. Fix a base point ${z_0 \in U}$, and define the function ${F: U \rightarrow {\bf C}}$ by the formula

$\displaystyle F(z_1) := \int_{\gamma_{z_0 \rightsquigarrow z_1}} f(z)\ dz$

for all ${z_1 \in U}$, where ${\gamma_{z_0 \rightsquigarrow z_1}}$ is any polygonal path from ${z_0}$ to ${z_1}$ in ${U}$ (the existence of such a path follows from Exercise 28 and the hypothesis that ${U}$ is connected, and the independence of the choice of path for the purposes of defining ${F(z_1)}$ follows from Exercise 16 and the conservative hypothesis (9)). Then ${F}$ is holomorphic on ${U}$ and is an antiderivative of ${f}$, thus ${f(z_1) = F'(z_1)}$ for all ${z_1 \in U}$.

Proof: We mimic the proof of the real-variable second fundamental theorem of calculus. Let ${z_1}$ be any point in ${U}$. As ${U}$ is open, it contains some disk ${D(z_1,r)}$ centred at ${z_1}$. In particular, if ${z_2}$ lies in this disk, then the line segment ${\gamma_{z_1 \rightarrow z_2}}$ will have image in ${U}$. If ${\gamma_{z_0 \rightsquigarrow z_1}}$ is any polygonal path from ${z_0}$ to ${z_1}$, then we have

$\displaystyle F(z_1) = \int_{\gamma_{z_0 \rightsquigarrow z_1}} f(z)\ dz$

and

$\displaystyle F(z_2) = \int_{\gamma_{z_0 \rightsquigarrow z_1} + \gamma_{z_1 \rightarrow z_2}} f(z)\ dz$

and hence by Exercise 16

$\displaystyle F(z_2) - F(z_1) - f(z_1) (z_2-z_1) = \int_{\gamma_{z_1 \rightarrow z_2}} (f(z) - f(z_1))\ dz.$

(The reader is strongly advised to draw a picture depicting the situation here.) Now let ${\varepsilon>0}$. For ${z_2}$ sufficiently close to ${z_1}$, we have ${|f(z)-f(z_1)| \leq \varepsilon}$ for all ${z}$ in the image of ${\gamma_{z_1 \rightarrow z_2}}$, by continuity of ${f}$. Thus by Exercise 16(v) we have

$\displaystyle |F(z_2) - F(z_1) - f(z_1) (z_2-z_1)| \leq \varepsilon |z_2-z_1|$

for ${z_2}$ sufficiently close to ${z_1}$, which implies that

$\displaystyle \lim_{z_2 \rightarrow z_1; z_2 \in U \backslash \{z_1\}} \frac{F(z_2)-F(z_1)}{z_2-z_1} = f(z_1)$

for any ${z_1 \in U}$, and thus ${F}$ is an antiderivative of ${f}$ as required. $\Box$

The notion of a non-empty open connected subset ${U}$ of the complex plane comes up so frequently in complex analysis that many texts assign a special term to this notion; for instance, Stein-Shakarchi refers to such sets as regions, and in other texts they may be called domains. We will stick to just “non-empty open connected subset of ${{\bf C}}$” in this course.

The requirement that ${f}$ be conservative is necessary, as the following exercise shows. The same requirement is implicitly present in the real-variable case, but it is automatically satisfied there, thanks to the topological triviality of closed polygonal paths in one dimension: see Exercise 33 below.

Exercise 31 Let ${U}$ be a non-empty open connected subset of ${{\bf C}}$, and let ${f: U \rightarrow {\bf C}}$ be continuous. Show that the following are equivalent:

• (i) ${f}$ possesses at least one antiderivative ${F}$.
• (ii) ${f}$ is conservative in the sense that (9) holds for all closed polygonal paths in ${U}$.
• (iii) ${f}$ is conservative in the sense that (9) holds for all simple closed polygonal paths in ${U}$.
• (iv) ${f}$ is conservative in the sense that (9) holds for all closed contours in ${U}$.
• (v) ${f}$ is conservative in the sense that (9) holds for all closed rectifiable curves in ${U}$.

(Hint: to show that (iii) implies (ii), induct on the number of edges in the closed polygonal path, and find a way to decompose non-simple closed polygonal paths into paths with fewer edges. One should avoid non-rigorous “hand-waving” arguments, and make sure that one actually has covered all possible cases, e.g. paths that include some backtracking.) Furthermore, show that if ${f}$ has two antiderivatives ${F_1, F_2}$, then there exists a constant ${C \in {\bf C}}$ such that ${F_2 = F_1 + C}$.

Exercise 32 Show that the function ${\frac{1}{z}}$ does not have an antiderivative on ${{\bf C} \backslash \{0\}}$. (Hint: integrate ${\frac{1}{z}}$ on ${\gamma_{0,1,\circlearrowleft}}$.) In later notes we will see that ${\frac{1}{z}}$ nevertheless does have antiderivatives on many subsets of ${{\bf C} \backslash \{0\}}$, formed by various branches of the complex logarithm.

Exercise 33 If ${f: [a,b] \rightarrow {\bf C}}$ is a continuous function on an interval, show that ${f}$ is conservative in the sense that (9) holds for any closed polygonal path in ${[a,b]}$. What happens for closed rectifiable paths?

Exercise 34 Let ${U}$ be an open subset of ${{\bf C}}$ (not necessarily connected).

• (i) Show that there is a unique collection ${{\mathcal U}}$ of non-empty subsets of ${U}$ that are open, connected, disjoint, and partition ${U}$: ${U = \biguplus_{V \in {\mathcal U}} V}$. (The elements of ${{\mathcal U}}$ are known as the connected components of ${U}$.)
• (ii) Show that the number of connected components of ${U}$ is at most countable. (Hint: show that each connected component contains at least one complex number with rational real and imaginary parts.)
• (iii) If ${f: U \rightarrow {\bf C}}$ is a continuous conservative function on ${U}$, show that ${f}$ has at least one antiderivative ${F}$.
• (iv) If ${U}$ has more than one connected component, show that it is possible for a function ${f: U \rightarrow {\bf C}}$ to have two antiderivatives ${F_1, F_2}$ which do not differ by a constant (i.e. there is no complex number ${C}$ such that ${F_1 = F_2 + C}$).

Exercise 35 (Integration by parts) Let ${f,g: U \rightarrow {\bf C}}$ be holomorphic functions on an open set ${U}$, and let ${\gamma}$ be a rectifiable curve in ${U}$ with initial point ${z_1}$ and terminal point ${z_2}$. Prove that

$\displaystyle \int_\gamma f(z) g'(z)\ dz = f(z_2) g(z_2) - f(z_1) g(z_1) - \int_\gamma f'(z) g(z)\ dz.$

## September 29, 2016

### Mark Chu-Carroll — Polls and Sampling Errors in the Presidental Debate Results

My biggest pet peeve is press coverage of statistics. As someone who is mathematically literate, I’m constantly infuriated by it. Basic statistics isn’t that hard, but people can’t be bothered to actually learn a tiny bit in order to understand the meaning of the things they’re covering.

My Twitter feed has been exploding with a particularly egregious example of this. After Monday night’s presidential debate, there’s been a ton of polling about who “won” the debate. One conservative radio host named Bill Mitchell has been on a rampage about those polls. Here’s a sample of his tweets:

Statistical analysis has a very simple point. We’re interested in understanding the properties of a large population of things. For whatever reason, we can’t measure the properties of every object in that population.

The exact reason can vary. In political polling, we can’t ask every single person in the country who they’re going to vote for. (Even if we could, we simply don’t know who’s actually going to show up and vote!) For a very different example, my first exposure to statistics was through my father, who worked in semiconductor manufacturing. They’d produce a run of 10,000 chips for use in satellites. They needed to know when, on average, a chip would fail from exposure to radiation. If they measured that in every chip, they’d end up with nothing to sell.

Anyway: you can’t measure every element of the population, but you still want to take measurements. So what you do is randomly select a collection of representative elements from the population, and you measure those. Then you can say that with a certain probability, the result of analyzing that representative subset will match the result that you’d get if you measured the entire population.
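As a toy illustration of that idea (with made-up numbers, not real polling data), you can simulate drawing a simple random sample from a large population and compare the sample proportion to the true one:

```python
import random

random.seed(1)  # fixed seed so this toy example is reproducible

# A pretend population of one million voters, 52% of whom favor candidate A.
population = [1] * 520_000 + [0] * 480_000

sample = random.sample(population, 500)   # a simple random sample of 500
estimate = sum(sample) / len(sample)
print(estimate)   # typically lands within a few percent of the true 0.52
```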

How close can you get? If you’ve really selected a random sample of the population, then the answer depends on the size of the sample. We measure that using something called the “margin of error”. “Margin of error” is actually a terrible name for it, and that’s the root cause of one of the most common problems in reporting about statistics. The margin of error is a probability measurement that says “there is an $N$% probability that the value for the full population lies within the margin of error of the measured value of the sample”.

Right away, there’s a huge problem with that. What is that variable doing in there? The margin of error measures the probability that the full population value is within a confidence interval around the measured sample value. If you don’t say what the confidence interval is, the margin of error is worthless. Most of the time – but not all of the time – we’re talking about a 95% confidence interval.

But there are several subtler issues with the margin of error, largely due to that misleading name.

1. The “true” value for the full population is not guaranteed to be within the margin of error of the sampled value. It’s just a probability. There is no hard bound on the size of the error: just a high probability of it being within the margin.
2. The margin of error only includes errors due to sample size. It does not incorporate any other factor – and there are many! – that may have affected the result.
3. The margin of error is deeply dependent on the way that the underlying sample was taken. It’s only meaningful for a random sample. That randomness is critically important: all of sampled statistics is built around the idea that you’ve got a randomly selected subset of your target population.

Let’s get back to our friend the radio host, and his first tweet, because he’s doing a great job of illustrating some of these errors.

The quality of a sampled statistic is entirely dependent on how well the sample matches the population. The sample is critical. It doesn’t matter how big the sample size is if it’s not random. A non-random sample cannot be treated as a representative sample.

So: an internet poll, where a group of people has to deliberately choose to exert the effort to participate, cannot be a valid sample for statistical purposes. It’s not random.

It’s true that the set of people who show up to vote isn’t a random sample. But that’s fine: the purpose of an election isn’t to try to divine what the full population thinks. It’s to count what the people who chose to vote think. It’s deliberately measuring a full population: the population of people who chose to vote.

But if you’re trying to statistically measure something about the population of people who will go and vote, you need to take a randomly selected sample of people who will go to vote. The set of voters is the full population; you need to select a representative sample of that population.

Internet polls do not do that. At best, they measure a different population of people. (At worst, with ballot stuffing, they measure absolutely nothing, but we’ll give them this much benefit of the doubt.) So you can’t take much of anything about the sample population and use it to reason about the full population.

And you can’t say anything about the margin of error, either. Because the margin of error is only meaningful for a representative sample. You cannot compute a meaningful margin of error for a non-representative sample, because there is no way of knowing how that sampled population compares to the true full target population.

And that brings us to the second tweet. A properly sampled random population of 500 people can produce a high quality result with a margin of error of roughly 4.5% at a 95% confidence interval. (I’m doing a back-of-the-envelope calculation here, so that’s not precise.) That means that if the population were randomly sampled, we could say that in 19 out of 20 polls of that size, the full population value would be within +/- 4.5% of the value measured by the poll. For a non-randomly selected sample of 10 million people, the margin of error cannot be measured, because it’s meaningless. The random sample of 500 people tells us a reasonable estimate based on data; the non-random sample of 10 million people tells us nothing.
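That back-of-the-envelope number comes from the standard formula for the margin of error of a sampled proportion. A minimal sketch, assuming a simple random sample and a 95% confidence interval (z ≈ 1.96):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error for a proportion estimated from a simple random
    sample of size n; z = 1.96 corresponds to a 95% confidence interval,
    and p = 0.5 is the worst case for a yes/no question."""
    return z * math.sqrt(p * (1 - p) / n)

print(round(100 * margin_of_error(500), 1))   # ≈ 4.4 (percent) for n = 500
```

Note what does not appear in the formula: the population size. A random sample of 500 has the same margin of error whether the population is fifty thousand or fifty million, and no sample size rescues a non-random sample.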

And with that, on to the third tweet!

In a poll like this, the margin of error only tells us one thing: what’s the probability that the sampled population will respond to the poll in the same way that the full population would?

There are many, many things that can affect a poll beyond the sample size. Even with a truly random and representative sample, there are many things that can affect the outcome. For a couple of examples:

How, exactly, is the question phrased? For example, if you ask people “Should police shoot first and ask questions later?”, you’ll get a very different answer from “Should police shoot dangerous criminal suspects if they feel threatened?” – but both of those questions are trying to measure very similar things. But the phrasing of the questions dramatically affects the outcome.

What context is the question asked in? Is this the only question asked? Or is it asked after some other set of questions? The preceding questions can bias the answers. If you ask a bunch of questions about how each candidate did with respect to particular issues before you ask who won, those preceding questions will bias the answers.

When you’re looking at a collection of polls that asked different questions in different ways, you expect a significant variation between them. That doesn’t mean that there’s anything wrong with any of them. They can all be correct even though their results vary by much more than their margins of error, because the margin of error has nothing to do with how you compare their results: they used different samples, and measured different things.

The problem with the reporting is the same things I mentioned up above. The press treats the margin of error as an absolute bound on the error in the computed sample statistics (which it isn’t); and the press pretends that all of the polls are measuring exactly the same thing, when they’re actually measuring different (but similar) things. They don’t tell us what the polls are really measuring; they don’t tell us what the sampling methodology was; and they don’t tell us the confidence interval.

Which leads to exactly the kind of errors that Mr. Mitchell made.

And one bonus. Mr. Mitchell repeatedly rants about how many polls show a “bias” by “over-sampling” Democratic party supporters. This is a classic mistake by people who don't understand statistics. As I keep repeating, for a sample to be meaningful, it must be random. You can report on all sorts of measurements of the sample, but you cannot change it.

If you’re randomly selecting phone numbers and polling the respondents, you cannot screen the responders based on their self-reported party affiliation. If you do, you are biasing your sample. Mr. Mitchell may not like the results, but that doesn’t make them invalid. People report what they report.

In the last presidential election, we saw exactly this notion in the idea of “unskewing” polls, where a group of conservative folks decided that the polls were all biased in favor of the democrats for exactly the reasons cited by Mr. Mitchell. They recomputed the poll results based on shifting the samples to represent what they believed to be the “correct” breakdown of party affiliation in the voting population. The results? The actual election results closely tracked the supposedly “skewed” polls, and the unskewers came off looking like idiots.

We also saw exactly this phenomenon going on in the Republican primaries this year. Randomly sampled polls consistently showed Donald Trump crushing his opponents. But the political press could not believe that Donald Trump would actually win – and so they kept finding ways to claim that the poll samples were off: things like they were off because they used land-lines which oversampled older people, and if you corrected for that sampling error, Trump wasn’t actually winning. Nope: the randomly sampled polls were correct, and Donald Trump is the republican nominee.

If you want to use statistics, you must work with random samples. If you don’t, you’re going to screw up the results, and make yourself look stupid.

## September 28, 2016

### Richard Easther — The Man Who Sold Mars

Elon Musk knows how to make a splash, and today he outlined his plan to turn humanity into a "multiplanetary species". Getting people to Mars is certainly doable, and Musk's company SpaceX is at the forefront of current developments in space technology. But Musk painted a picture of a future where travel to Mars is downright cheap, with tickets costing as little as $200,000, the median price of an American home.

So is this possible? I have no idea, but it makes for a great Fermi question, a problem so fuzzy and incomplete that educated guesswork is the only way forward. (And "educated" is the key word – these puzzlers are part of the legacy of Enrico Fermi, who used to test students with problems like "how many piano tuners are there in Chicago?" alongside more technical physics topics.)

The key number is not distance but cost. Musk talks about making a journey to Mars that lasts a few months, but it will take the better part of a year for the Interplanetary Spaceship to make a return trip to Mars, even if most passengers are only travelling one way. Let's compare that to the cost of long-haul plane travel: I might pay $700 for a one-way trans-Pacific flight. This fare entitles me to 1/300th of a very expensive airplane for most of one day and covers my share of the fuel and the crew.

So how does this stack up against Musk's goal for a trip to Mars? His presentation talked about a craft that holds 100-200 passengers. Let's assume that SpaceX can eventually get to the point where Interplanetary Spaceships are about as expensive to build and run as modern airliners. Splitting the difference between 100 and 200 passengers, the cost of a single seat would be $1,400 a day. And each passenger is effectively using their slice of an Interplanetary Spaceship for one year, so that works out at about $500,000 for the trip.
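The estimate above can be written out explicitly. This is a toy Fermi calculation using the rough numbers assumed in the post, not SpaceX figures:

```python
# Fermi estimate of a Mars ticket price, via the airliner analogy above.
# All inputs are rough assumptions from the text, not SpaceX figures.
airliner_seats = 300
seat_cost_per_day = 700       # dollars: one long-haul seat for about a day

spaceship_seats = 150         # splitting the difference between 100 and 200
# Assume a spaceship costs about as much per day to own and run as an
# airliner; with half as many seats, each seat costs twice as much per day.
seat_cost_mars = seat_cost_per_day * airliner_seats / spaceship_seats
trip_days = 365               # the ship is tied up for roughly a year

print(seat_cost_mars)               # 1400.0 dollars per day
print(seat_cost_mars * trip_days)   # 511000.0, i.e. about $500,000
```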

### Particlebites — Daya Bay and the search for sterile neutrinos

Article: Improved search for a light sterile neutrino with the full configuration of the Daya Bay Experiment
Authors: Daya Bay Collaboration
Reference: arXiv:1607.01174

Today I bring you news from the Daya Bay reactor neutrino experiment, which detects neutrinos emitted by three nuclear power plants on the southern coast of China. The results in this paper are based on the first 621 days of data, through November 2013; more data remain to be analyzed, and we can expect a final result after the experiment ends in 2017.

Figure 1: Antineutrino detectors installed in the far hall of the Daya Bay experiment. Source: LBL news release.

For more on sterile neutrinos, see also this recent post by Eve.

Neutrino oscillations

Neutrinos exist in three flavors, each corresponding to one of the charged leptons: electron neutrinos ($\nu_e$), muon neutrinos ($\nu_\mu$) and tau neutrinos ($\nu_\tau$). When a neutrino is born via the weak interaction, it is created in a particular flavor eigenstate. So, for example, a neutrino born in the sun is always an electron neutrino. However, the electron neutrino does not have a definite mass. Instead, each flavor eigenstate is a linear combination of the three mass eigenstates. This “mixing” of the flavor and mass eigenstates is described by the PMNS matrix, as shown in Figure 2.

Figure 2: Each neutrino flavor eigenstate is a linear combination of the three mass eigenstates.

The PMNS matrix can be parameterized by 4 numbers: three mixing angles (θ12, θ23 and θ13) and a phase (δ).1  These parameters aren’t known a priori — they must be measured by experiments.

Solar neutrinos stream outward in all directions from their birthplace in the sun. Some intercept Earth, where human-built neutrino observatories can inventory their flavors. After traveling 150 million kilometers, only ⅓ of them register as electron neutrinos — the other ⅔ have transformed along the way into muon or tau neutrinos. These neutrino flavor oscillations are the experimental signature of neutrino mixing, and the means by which we can tease out the values of the PMNS parameters. In any specific situation, the probability of measuring each type of neutrino  is described by some experiment-specific parameters (the neutrino energy, distance from the source, and initial neutrino flavor) and some fundamental parameters of the theory (the PMNS mixing parameters and the neutrino mass-squared differences). By doing a variety of measurements with different neutrino sources and different source-to-detector (“baseline”) distances, we can attempt to constrain or measure the individual theory parameters. This has been a major focus of the worldwide experimental neutrino program for the past 15 years.

1 This assumes the neutrino is a Dirac particle. If the neutrino is a Majorana particle, there are two more phases, for a total of 6 parameters in the PMNS matrix.
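In the simplest two-flavor approximation (a standard simplification of the full PMNS treatment above), the oscillation probability takes a closed form. The sketch below uses rough, illustrative parameter values, not Daya Bay's measured ones:

```python
import math

def survival_probability(L_km, E_GeV, sin2_2theta, dm2_eV2):
    """Two-flavor neutrino survival probability,
    P = 1 - sin^2(2*theta) * sin^2(1.27 * dm2 * L / E),
    with baseline L in km, energy E in GeV, and the
    mass-squared difference dm2 in eV^2."""
    return 1 - sin2_2theta * math.sin(1.27 * dm2_eV2 * L_km / E_GeV) ** 2

# Illustrative reactor-like numbers: E ~ 4 MeV, L ~ 1.7 km, so a few
# percent of the electron antineutrinos "disappear" at the far detector.
print(survival_probability(1.7, 0.004, 0.085, 2.5e-3))   # ≈ 0.92
```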

Sterile neutrinos

Many neutrino experiments have confirmed our model of neutrino oscillations and the existence of three neutrino flavors. However, some experiments have observed anomalous signals which could be explained by the presence of a fourth neutrino. This proposed “sterile” neutrino doesn’t have a charged lepton partner (and therefore doesn’t participate in weak interactions) but does mix with the other neutrino flavors.

The discovery of a new type of particle would be tremendously exciting, and neutrino experiments all over the world (including Daya Bay) have been checking their data for any sign of sterile neutrinos.

Neutrinos from reactors

Figure 3: Chart of the nuclides, color-coded by decay mode. Source: modified from Wikimedia Commons.

Nuclear reactors are a powerful source of electron antineutrinos. To see why, take a look at this zoomed out version of the chart of the nuclides. The chart of the nuclides is a nuclear physicist’s version of the periodic table. For a chemist, Hydrogen-1 (a single proton), Hydrogen-2 (one proton and one neutron) and Hydrogen-3 (one proton and two neutrons) are essentially the same thing, because chemical bonds are electromagnetic and every hydrogen nucleus has the same electric charge. In the realm of nuclear physics, however, the number of neutrons is just as important as the number of protons. Thus, while the periodic table has a single box for each chemical element, the chart of the nuclides has a separate entry for every combination of protons and neutrons (“nuclide”) that has ever been observed in nature or created in a laboratory.

The black squares are stable nuclei. You can see that stability only occurs when the ratio of neutrons to protons is just right. Furthermore, unstable nuclides tend to decay in such a way that the daughter nuclide is closer to the line of stability than the parent.

Nuclear power plants generate electricity by harnessing the energy released by the fission of Uranium-235. Each U-235 nucleus contains 143 neutrons and 92 protons (n/p = 1.6). When U-235 undergoes fission, the resulting fragments also have n/p ~ 1.6, because the overall number of neutrons and protons is still the same. Thus, fission products tend to lie along the white dashed line in Figure 3, which falls above the line of stability. These nuclides have too many neutrons to be stable, and therefore undergo beta decay: $n \to p + e + \bar{\nu}_e$. A typical power reactor emits $6 \times 10^{20}$ $\bar{\nu}_e$ per second.
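To make the neutron-to-proton bookkeeping concrete (a toy calculation, not from the paper):

```python
# Neutron/proton bookkeeping for U-235 fission.
neutrons, protons = 143, 92          # U-235
print(round(neutrons / protons, 2))  # 1.55, i.e. n/p ~ 1.6

# Fission splits the nucleus into fragments (plus a few free neutrons);
# the fragments inherit roughly this same n/p ratio, which is too
# neutron-rich for their mass range, so they beta-decay:
#   n -> p + e + nu_bar  (one electron antineutrino per decay)
```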

Figure 4: Layout of the Daya Bay experiment. Source: arXiv:1508.03943.

The Daya Bay experiment

The Daya Bay nuclear power complex is located on the southern coast of China, 55 km northeast of Hong Kong. With six reactor cores, it is one of the most powerful reactor complexes in the world — and therefore an excellent source of electron antineutrinos. The Daya Bay experiment consists of 8 identical antineutrino detectors in 3 underground halls. One experimental hall is located as close as possible to the Daya Bay nuclear power plant; the second is near the two Ling Ao power plants; the third is located 1.5 – 1.9 km away from all three pairs of reactors, a distance chosen to optimize Daya Bay’s sensitivity to the mixing angle $\theta_{13}$.

The neutrino target at the heart of each detector is a cylindrical vessel filled with 20 tons of Gadolinium-doped liquid scintillator. The vast majority of $\bar{\nu}_e$ pass through undetected, but occasionally one will undergo inverse beta decay in the target volume, interacting with a proton to produce a positron and a neutron: $\bar{\nu}_e + p \to e^+ + n$.
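As a side note, inverse beta decay only happens above a kinematic threshold, which is why reactor experiments see no events below roughly 1.8 MeV. The threshold is not quoted in the post, but it follows from energy-momentum conservation with standard particle masses; a quick check:

```python
# Kinematic threshold for inverse beta decay, nubar_e + p -> e+ + n,
# for a proton at rest: E_thr = ((m_n + m_e)^2 - m_p^2) / (2 m_p).
# Masses are standard PDG values in MeV.

M_P = 938.272  # proton mass, MeV
M_N = 939.565  # neutron mass, MeV
M_E = 0.511    # positron mass, MeV

e_threshold = ((M_N + M_E)**2 - M_P**2) / (2 * M_P)
print(f"IBD threshold: {e_threshold:.3f} MeV")  # about 1.806 MeV
```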

Figure 5: Design of the Daya Bay $\bar{\nu}_e$ detectors. Each detector consists of three nested cylindrical vessels. The inner acrylic vessel is about 3 meters tall and 3 meters in diameter. It contains 20 tons of Gadolinium-doped liquid scintillator; when a $\bar{\nu}_e$ interacts in this volume, the resulting signal can be picked up by the detector. The outer acrylic vessel holds an additional 22 tons of liquid scintillator; this layer exists so that $\bar{\nu}_e$ interactions near the edge of the inner volume are still surrounded by scintillator on all sides — otherwise, some of the gamma rays produced in the event might escape undetected. The stainless steel outer vessel is filled with 40 tons of mineral oil; its purpose is to prevent outside radiation from reaching the scintillator. Finally, the outer vessel is lined with 192 photomultiplier tubes, which collect the scintillation light produced by particle interactions in the active scintillation volumes. The whole device is underwater for additional shielding. Source: arXiv:1508.03943.

Figure 6: Cartoon version of the signal produced in the Daya Bay detectors by inverse beta decay. The size of the prompt pulse is related to the antineutrino energy; the delayed pulse has a characteristic energy of 8 MeV.

The positron and neutron create signals in the detector with a characteristic time relationship, as shown in Figure 6. The positron immediately deposits its energy in the scintillator and then annihilates with an electron. This all happens within a few nanoseconds and causes a prompt flash of scintillation light. The neutron, meanwhile, spends some tens of microseconds bouncing around (“thermalizing”) until it is slow enough to be captured by a Gadolinium nucleus. When this happens, the nucleus emits a cascade of gamma rays, which in turn interact with the scintillator and produce a second flash of light. This combination of prompt and delayed signals is used to identify $\bar{\nu}_e$ interaction events.
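The prompt-plus-delayed signature lends itself to a simple coincidence search. The sketch below is illustrative logic only, not Daya Bay's actual event selection; the time window and energy cuts are stand-in values of roughly the right magnitude:

```python
# Toy sketch of prompt/delayed coincidence selection in the spirit of the
# IBD signature described above. Pulses are (time_us, energy_mev) tuples;
# the specific cut values are illustrative, not Daya Bay's real cuts.

def find_ibd_candidates(pulses, max_dt_us=200.0,
                        prompt_window=(0.7, 12.0),
                        delayed_window=(6.0, 12.0)):
    """Pair each prompt-like pulse with a later delayed-like pulse within max_dt_us."""
    candidates = []
    pulses = sorted(pulses)
    for i, (t1, e1) in enumerate(pulses):
        if not (prompt_window[0] <= e1 <= prompt_window[1]):
            continue
        for t2, e2 in pulses[i + 1:]:
            if t2 - t1 > max_dt_us:
                break  # pulses are time-ordered, so no later match exists
            if delayed_window[0] <= e2 <= delayed_window[1]:
                candidates.append(((t1, e1), (t2, e2)))
    return candidates

pulses = [(10.0, 3.1), (40.0, 8.2), (500.0, 1.5)]
print(find_ibd_candidates(pulses))  # [((10.0, 3.1), (40.0, 8.2))]
```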

Daya Bay’s search for sterile neutrinos

Daya Bay is a neutrino disappearance experiment. The electron antineutrinos emitted by the reactors can oscillate into muon or tau antineutrinos as they travel, but the detectors are only sensitive to $\bar{\nu}_e$, because the antineutrinos have enough energy to produce a positron but not the more massive $\mu^+$ or $\tau^+$. Thus, Daya Bay observes neutrino oscillations by measuring fewer $\bar{\nu}_e$ than would be expected otherwise.

Based on the number of $\bar{\nu}_e$ detected at one of the Daya Bay experimental halls, the usual three-neutrino oscillation theory can predict the number that will be seen at the other two experimental halls (EH). You can see how this plays out in Figure 7. We are looking at the neutrino energy spectrum measured at EH2 and EH3, divided by the prediction computed from the EH1 data. The gray shaded regions mark the one-standard-deviation uncertainty bounds of the predictions. If the black data points deviated significantly from the shaded region, that would be a sign that the three-neutrino oscillation model is not complete, possibly due to the presence of sterile neutrinos. However, in this case, the black data points are statistically consistent with the prediction. In other words, Daya Bay sees no evidence for sterile neutrinos.
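To get a rough feel for the expected deficit, the $\bar{\nu}_e$ survival probability at these baselines is approximately $P \approx 1 - \sin^2(2\theta_{13})\sin^2(1.267\,\Delta m^2 L/E)$ in the leading two-flavor approximation. The sketch below uses rounded published best-fit parameter values; it is an illustration, not the analysis the collaboration actually performs:

```python
import math

# Leading two-flavor approximation to the short-baseline nubar_e survival
# probability: P = 1 - sin^2(2*theta13) * sin^2(1.267 * dm2 * L / E),
# with dm2 in eV^2, L in km, E in GeV. Parameter values are rounded
# published best fits, used purely for illustration.

SIN2_2THETA13 = 0.084   # roughly the Daya Bay measurement
DM2 = 2.5e-3            # eV^2, roughly |Delta m^2_ee|

def survival_probability(L_km, E_gev):
    phase = 1.267 * DM2 * L_km / E_gev
    return 1.0 - SIN2_2THETA13 * math.sin(phase) ** 2

# Near hall (~0.4 km) vs far hall (~1.7 km) at a typical 4 MeV antineutrino;
# the far hall sits near the first oscillation maximum, hence the deficit.
for L in (0.4, 1.7):
    print(f"L = {L} km: P ~ {survival_probability(L, 0.004):.3f}")
```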

Figure 7: Some results of the Daya Bay sterile neutrino search. Source: arXiv:1607.01174.

Does that mean sterile neutrinos don’t exist? Not necessarily. For one thing, the effect of a sterile neutrino on the Daya Bay results would depend on the sterile neutrino mass and mixing parameters. The blue and red dashed lines in Figure 7 show the sterile neutrino prediction for two specific choices of $\theta_{14}$ and $\Delta m_{41}^2$; these two examples look quite different from the three-neutrino prediction and can be ruled out because they don’t match the data. However, there are other parameter choices for which the presence of a sterile neutrino wouldn’t have a discernible effect on the Daya Bay measurements. Thus, Daya Bay can constrain the parameter space, but can’t rule out sterile neutrinos completely. Still, as more and more experiments report “no sign of sterile neutrinos here,” it appears less and less likely that they exist.

• K. Nakamura, “Neutrino mass, mixing and oscillations,” PDG review of particle physics. (http://pdg.lbl.gov/2015/reviews/rpp2015-rev-neutrino-mixing.pdf)
• C. Mariani, “Review of Reactor Neutrino Oscillation Experiments.” (arXiv:1201.6665)
• Xin Qian and Wei Wang, “Reactor Neutrino Experiments: $\theta_{13}$ and Beyond.” (arXiv:1405.7217)
• Antonio Palazzo, “Constraints on very light sterile neutrinos from $\theta_{13}$-sensitive reactor experiments.” (arXiv:1308.5880)

### Doug Natelson — Deborah Jin - gone way too soon.

As was pointed out by a commenter on my previous post, and mentioned here by ZapperZ, atomic physicist Deborah Jin passed away last week from cancer at 47.  I don't think I ever met Prof. Jin face to face (though she graduated from my alma mater when I was a freshman), and I'm not by any means an expert in her subdiscipline, but I will do my best to give an overview of some of her scientific legacy.  There is a sad shortage of atomic physics blogs....  I'm sure I'm missing things - please fill in additional information in the comments if you like.

The advent of optical trapping and laser cooling (relevant Nobel here) transformed atomic physics from what had been a comparatively sleepy specialty, concerned with measuring details of optical transitions and precision spectroscopy (useful for atomic clocks), into a hive of activity, looking at the onset of new states of matter that happen when gases become sufficiently cold and dense that their quantum statistics start to be important.  In a classical noninteracting gas, there are few limits on the constituent molecules - as long as they don't actually try to be in the same place at the same time (think of this as the billiard ball restriction), the molecules can take on whatever spatial locations and momenta that they can reach.  However, if a gas is very cold (low average kinetic energy per molecule) and dense, the quantum properties of the constituents matter - for historical reasons this is called the onset of "degeneracy".  If the constituents are fermions, then the Pauli principle, the same physics that keeps all 79 electrons in an atom of gold from hanging out in the 1s orbital, keeps the constituents apart, and keeps them from all falling into the lowest available energy state.   In contrast, if the constituents are bosons, then a macroscopic fraction of the constituents can fall into the lowest energy state, a process called Bose-Einstein condensation (relevant Nobel here); the condensed state is a single quantum state with a large occupation, and therefore can show exotic properties.
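To get a feel for the temperatures at which degeneracy sets in, the transition temperature for an ideal uniform Bose gas is $T_c = (2\pi\hbar^2/m k_B)(n/2.612)^{2/3}$. A sketch, using rubidium-87 and an illustrative density of 10^14 atoms/cm³ (neither value comes from the post; they are just typical of trapped-atom experiments):

```python
import math

# Transition temperature for an ideal uniform Bose gas,
# T_c = (2*pi*hbar^2 / (m*k_B)) * (n / 2.612)**(2/3),
# where 2.612 is zeta(3/2). The species (Rb-87) and density
# (1e14 atoms/cm^3) are illustrative choices, not from the post.

HBAR = 1.0546e-34          # J*s
KB = 1.3807e-23            # J/K
M_RB87 = 87 * 1.6605e-27   # kg

def bec_tc(n_per_m3, mass_kg=M_RB87):
    return (2 * math.pi * HBAR**2 / (mass_kg * KB)) * (n_per_m3 / 2.612) ** (2 / 3)

tc = bec_tc(1e20)  # 1e14 per cm^3 = 1e20 per m^3
print(f"T_c ~ {tc * 1e9:.0f} nK")  # a few hundred nanokelvin
```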

Prof. Jin's group did landmark work with these systems.  She and her student Brian DeMarco showed that you could actually reach the degenerate limit in a trapped atomic Fermi gas.  A major challenge in this field is trying to avoid 3-body and other collisions that can create states of the atoms that are no longer trapped by the lasers and magnetic fields used to do the confinement, and yet still create systems that are (in their quantum way) dense.  Prof. Jin's group showed that you could actually finesse this issue and pair up fermionic atoms to create trapped, ultracold diatomic molecules.  Moreover, you could then create a Bose-Einstein condensate of molecules (since a pair of fermions can be considered as a composite boson).  In superconductors, we're used to the idea that electrons can form Cooper pairs, which act as composite bosons and form a coherent quantum system, the superconducting state.  However, in superconductors, the Cooper pairs are "large" - the average real-space separation between the electrons that constitute a pair is big compared to the typical separation between particles.  Prof. Jin's work showed that in atomic gases you could span between the limits (BEC of tightly bound molecules on the one hand, vs. condensed state of loosely paired fermions on the other).  More recently, her group had been doing cool work looking at systems good for testing models of magnetism and other more complicated condensed matter phenomena, by using dipolar molecules, and examining very strongly interacting fermions.   Basically, Prof. Jin was an impressively creative, technically skilled, extremely productive physicist, and by all accounts a generous person who was great at mentoring students and postdocs.   She has left a remarkable scientific legacy for someone whose professional career was tragically cut short, and she will be missed.

## September 21, 2016

### Clifford Johnson — Super Nailed It…

On the sofa, during a moment while we watched Captain America: Civil War over the weekend:

Amy: Wait, what...? Why's Cat-Woman in this movie?
Me: Er... (hesitating, not wanting to spoil what is to come...)
Amy: Isn't she a DC character?
Me: Well... (still hesitating, but secretly impressed by her awareness of the different universes... hadn't realized she was paying attention all these years.)
Amy: So who's going to show up next? Super-Dude? Bat-Fella? Wonder-Lady? (Now she's really showing off and poking fun.)
Me: We'll see... (Now choking with laughter on dinner...)

I often feel bad subjecting my wife to this stuff, but this alone was worth it.

For those who know the answers and are wondering, I held off on launching into a discussion about the fascinating history of Marvel, representation of people of African descent in superhero comics (and now movies and TV), the [...] Click to continue reading this post

The post Super Nailed It… appeared first on Asymptotia.

### Chad Orzel — Teaching Evaluations and the Problem of Unstated Assumptions

There’s a piece in Inside Higher Ed today on yet another study showing that student course evaluations don’t correlate with student learning. For a lot of academics, the basic reaction to this is summed up in the Chuck Pearson tweet that sent me to the story: “Haven’t we settled this already?”

The use of student course evaluations, though, is a perennial argument in academia, not likely to be nailed into a coffin any time soon. It’s also a good example of a hard problem made intractable by a large number of assumptions and constraints that are never clearly spelled out.

As discussed in faculty lounges and on social media, the basic argument here (over)simplifies to a collection of administrators who like using student course evaluations as a way to measure faculty teaching, and a collection of faculty who hate this practice. If this were just an argument about what is the most accurate way to assess the quality of teaching in the abstract, studies like the one reported in IHE (and numerous past examples) would probably settle the question, but it’s not, because there’s a lot of other stuff going on. And because a lot of the other stuff that’s going on is never clearly stated, a lot of the stuff people wind up saying in the course of this argument is not actually helpful.

One source of fundamental conflict and miscommunication is over the need for evaluating teaching in the first place. On the faculty side, administrative mandates for some sort of teaching assessment are often derided as brainless corporatism– pointless hoop-jumping that is being pushed on academia by people who want everything to be run like a business. The preference of many faculty in these arguments would be for absolutely no teaching evaluation whatsoever.

That kind of suggestion, though, gives the people who are responsible for running institutions the howling fantods. Not because they’ve sold their souls to creeping corporatism, but because some kind of evaluation is just basic, common-sense due diligence. You’ve got to do something to keep tabs on what your teaching faculty are doing in the classroom, if nothing else in order to have a response when some helicopter parent calls in and rants about how Professor So-and-So is mistreating their precious little snowflake. Or, God forbid, so you get wind of any truly outrageous misconduct on the part of faculty before it becomes a giant splashy news story that makes you look terrible.

That helps explain why administrators want some sort of evaluation, but why are the student comment forms so ubiquitous in spite of their flaws? The big advantage that these have is that they’re cheap and easy. You just pass out bubble sheets or direct students to the right URL, and their feedback comes right to you in an easily digestible form.

And, again, this is something that’s often derided as corporatist penny-pinching, but it’s a very real concern. We know how to do teaching evaluation well– we do it when the stakes are highest— but it’s a very expensive and labor-intensive process. It’s not something that would be practical to do every year for every faculty member, and that’s not just because administrators are cheap– it’s because the level of work required from faculty would be seen as even more of an outrage than continuing to use the bubble-sheet student comment forms.

And that’s why the studies showing that student comments don’t accurately measure teaching quality don’t get much traction. Everybody knows that it’s a bad measurement of that, but doing a good measurement of that isn’t practical, and also isn’t really the point.

On the faculty side, one thing to do is to recognize that there’s a legitimate need for some sort of institutional oversight, and look for practical alternatives that avoid the worst biases of student course comment forms without being unduly burdensome to implement. You’re not going to get a perfect measure of teaching quality, and “do nothing at all” is not an option, but maybe there’s some middle ground that can provide the necessary oversight without quintupling everybody’s workload. Regular classroom observations, say, though you’d need some safeguard against personal conflicts– maybe two different observers: one the dean/chair or their designee, the other a colleague chosen by the faculty member being evaluated. It’s more work than just passing out forms, but better and fairer evaluation might be worth the effort.

On the administrative side, more acknowledgement that evaluation is less about assessing faculty “merit” in a meaningful way, and more about assuring some minimum level of quality for the institution as a whole. And student comments have some role to play in this, but it should be acknowledged that these are mostly customer satisfaction surveys, not serious assessments of faculty quality. In which case they shouldn’t be tied to faculty compensation, as is all too often the case– if there must be financial incentives tied to faculty evaluation, they need to be based on better information than that, and the sums involved should be commensurate with the level of effort required to make the system work.

I don’t really expect any of those to go anywhere, of course, but that’s my $0.02 on this issue. And though it should go without saying, let me emphasize that this is only my opinion as an individual academic. While I fervently hope that my employer agrees with me about the laws of physics, I don’t expect that they share my opinions on academic economics or politics, so don’t hold it against them.

### Backreaction — We understand gravity just fine, thank you.

Yesterday I came across a Q&A on the website of Discover magazine, titled “The Root of Gravity - Does recent research bring us any closer to understanding it?” Jeff Lepler from Michigan has the following question:

“Q: Are we any closer to understanding the root cause of gravity between objects with mass? Can we use our newly discovered knowledge of the Higgs boson or gravitational waves to perhaps negate mass or create/negate gravity?”

A person by the name of Bill Andrews (unknown to me) gives the following answer:

“A: Sorry, Jeff, but scientists still don’t really know why gravity works. In a way, they’ve just barely figured out how it works.”

The answer continues, but let’s stop right there where the nonsense begins. What’s that even mean, scientists don’t know “why” gravity works? And did the Bill person really think he could get away with swapping “why” for a “how” and nobody would notice?

The purpose of science is to explain observations. We have a theory by the name of General Relativity that explains literally all existing data on gravitational effects. Indeed, that General Relativity is so dramatically successful is a great frustration for all those people who would like to revolutionize science a la Einstein. So in which sense, please, do scientists barely know how it works? For all we can presently tell, gravity is a fundamental force, which means we have no evidence for an underlying theory from which gravity could be derived.
Sure, theoretical physicists are investigating whether there is such an underlying theory that would give rise to gravity as well as the other interactions, a “theory of everything”. (Please submit nomenclature complaints to your local language police, not to me.) Would such a theory of everything explain “why” gravity works? No, because that’s not a meaningful scientific question. A theory of everything could potentially explain how gravity can arise from more fundamental principles, similar to the way the ideal gas law arises from the statistical properties of many atoms in motion. But that still wouldn’t explain why there should be something like gravity, or anything, in the first place.

Either way, even if gravity arises within a larger framework like, say, string theory, the effects of what we call gravity today would still come about because energy densities (and related quantities like pressure and momentum flux and so on) curve space-time, and fields move in that space-time. It’s just that these quantities might no longer be fundamental. We’ve known for 101 years how this works.

After a few words on Newtonian gravity, the answer continues:

“Because the other forces use “force carrier particles” to impart the force onto other particles, for gravity to fit the model, all matter must emit gravitons, which physically embody gravity. Note, however, that gravitons are still theoretical. Trying to reconcile these different interpretations of gravity, and understand its true nature, are among the biggest unsolved problems of physics.”

Reconciling which different interpretations of gravity? These are all the same “interpretation.” It is correct that we don’t know how to quantize gravity so that the resulting theory remains viable also when gravity becomes strong. It’s also correct that the force-carrying particle associated with the quantization – the graviton – hasn’t been detected. But the question was about gravity, not quantum gravity.
Reconciling the graviton with unquantized gravity is straightforward – it’s called perturbative quantum gravity – and exactly the reason most theoretical physicists are convinced the graviton exists. It’s just that this reconciliation breaks down when gravity becomes strong, which means it’s only an approximation.

“But, alas, what we do know does suggest antigravity is impossible.”

That’s correct on a superficial level, but it depends on what you mean by antigravity. If you mean by antigravity that you can let any of the matter which surrounds us “fall up,” it’s correct. But there are modifications of general relativity that have effects one can plausibly call anti-gravitational. That’s a longer story though and shall be told another time.

A sensible answer to this question would have been:

“Dear Jeff,

The recent detection of gravitational waves has been another confirmation of Einstein’s theory of General Relativity, which still explains all the gravitational effects that physicists know of. According to General Relativity the root cause of gravity is that all types of energy curve space-time and all matter moves in this curved space-time. Near planets, such as our own, this can be approximated to good accuracy by Newtonian gravity. There isn’t presently any observation which suggests that gravity itself emerges from another theory, though it is certainly a speculation that many theoretical physicists have pursued. There thus isn’t any deeper root for gravity because it’s presently part of the foundations of physics. The foundations are the roots of everything else.

The discovery of the Higgs boson doesn’t tell us anything about the gravitational interaction. The Higgs boson is merely there to make sure particles have mass in addition to energy, but gravity works the same either way. The detection of gravitational waves is exciting because it allows us to learn a lot about the astrophysical sources of these waves.
But the waves themselves have proved to be as expected from General Relativity, so from the perspective of fundamental physics they didn’t bring news. Within the incredibly well confirmed framework of General Relativity, you cannot negate mass or its gravitational pull.”

You might also enjoy hearing what Richard Feynman had to say when he was asked a similar question about the origin of the magnetic force.

This answer really annoyed me because it’s a lost opportunity to explain how well physicists understand the fundamental laws of nature.

### Scott Aaronson — The No-Cloning Theorem and the Human Condition: My After-Dinner Talk at QCRYPT

The following are the after-dinner remarks that I delivered at QCRYPT’2016, the premier quantum cryptography conference, on Thursday Sep. 15 in Washington DC. You could compare to my after-dinner remarks at QIP’2006 to see how much I’ve “matured” since then.

Thanks so much to Yi-Kai Liu and the other organizers for inviting me and for putting on a really fantastic conference. It’s wonderful to be here at QCRYPT among so many friends—this is the first significant conference I’ve attended since I moved from MIT to Texas.

I do, however, need to register a complaint with the organizers, which is: why wasn’t I allowed to bring my concealed firearm to the conference? You know, down in Texas, we don’t look too kindly on you academic elitists in Washington DC telling us what to do, who we can and can’t shoot and so forth. Don’t mess with Texas! As you might’ve heard, many of us Texans even support a big, beautiful, physical wall being built along our border with Mexico. Personally, though, I don’t think the wall proposal goes far enough. Forget about illegal immigration and smuggling: I don’t even want Americans and Mexicans to be able to win the CHSH game with probability exceeding 3/4. Do any of you know what kind of wall could prevent that? Maybe a metaphysical wall.

OK, but that’s not what I wanted to talk about.
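As an aside, the 3/4 in that joke is the classical bound for the CHSH game, and it is small enough to verify by brute force over all deterministic local strategies. A quick sketch (my illustration, not part of the talk):

```python
from itertools import product

# Brute-force check of the classical CHSH bound: over all deterministic
# local strategies (a function of x for Alice, of y for Bob), the best
# probability of satisfying a XOR b == x AND y is exactly 3/4. Quantum
# strategies reach cos^2(pi/8) ~ 0.854, so 3/4 really is a classical ceiling.

def chsh_win_prob(a_strategy, b_strategy):
    wins = sum(1 for x, y in product((0, 1), repeat=2)
               if a_strategy[x] ^ b_strategy[y] == (x & y))
    return wins / 4

best = max(chsh_win_prob(a, b)
           for a in product((0, 1), repeat=2)
           for b in product((0, 1), repeat=2))
print(best)  # 0.75
```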
When Yi-Kai asked me to give an after-dinner talk, I wasn’t sure whether to try to say something actually relevant to quantum cryptography or just make jokes. So I’ll do something in between: I’ll tell you about research directions in quantum cryptography that are also jokes.

The subject of this talk is a deep theorem that stands as one of the crowning achievements of our field. I refer, of course, to the No-Cloning Theorem. Almost everything we’re talking about at this conference, from QKD onwards, is based in some way on quantum states being unclonable. If you read Stephen Wiesner’s paper from 1968, which founded quantum cryptography, the No-Cloning Theorem already played a central role—although Wiesner didn’t call it that. By the way, here’s my #1 piece of research advice to the students in the audience: if you want to become immortal, just find some fact that everyone already knows and give it a name!

I’d like to pose the question: why should our universe be governed by physical laws that make the No-Cloning Theorem true? I mean, it’s possible that there’s some other reason for our universe to be quantum-mechanical, and No-Cloning is just a byproduct of that. No-Cloning would then be like the armpit of quantum mechanics: not there because it does anything useful, but just because there’s gotta be something under your arms.

OK, but No-Cloning feels really fundamental. One of my early memories is when I was 5 years old or so, and utterly transfixed by my dad’s home fax machine, one of those crappy 1980s fax machines with wax paper. I kept thinking about it: is it really true that a piece of paper gets transmaterialized, sent through a wire, and reconstituted at the other location? Could I have been that wrong about how the universe works? Until finally I got it—and once you get it, it’s hard even to recapture your original confusion, because it becomes so obvious that the world is made not of stuff but of copyable bits of information.
“Information wants to be free!”

The No-Cloning Theorem represents nothing less than a partial return to the view of the world that I had before I was five. It says that quantum information doesn’t want to be free: it wants to be private. There is, it turns out, a kind of information that’s tied to a particular place, or set of places. It can be moved around, or even teleported, but it can’t be copied the way a fax machine copies bits.

So I think it’s worth at least entertaining the possibility that we don’t have No-Cloning because of quantum mechanics; we have quantum mechanics because of No-Cloning—or because quantum mechanics is the simplest, most elegant theory that has unclonability as a core principle. But if so, that just pushes the question back to: why should unclonability be a core principle of physics?

Quantum Key Distribution

A first suggestion about this question came from Gilles Brassard, who’s here. Years ago, I attended a talk by Gilles in which he speculated that the laws of quantum mechanics are what they are because Quantum Key Distribution (QKD) has to be possible, while bit commitment has to be impossible. If true, that would be awesome for the people at this conference. It would mean that, far from being this exotic competitor to RSA and Diffie-Hellman that’s distance-limited and bandwidth-limited and has a tiny market share right now, QKD would be the entire reason why the universe is as it is!

Or maybe what this really amounts to is an appeal to the Anthropic Principle. Like, if QKD hadn’t been possible, then we wouldn’t be here at QCRYPT to talk about it.

Quantum Money

But maybe we should search more broadly for the reasons why our laws of physics satisfy a No-Cloning Theorem. Wiesner’s paper sort of hinted at QKD, but the main thing it had was a scheme for unforgeable quantum money. This is one of the most direct uses imaginable for the No-Cloning Theorem: to store economic value in something that it’s physically impossible to copy.
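Wiesner's scheme, discussed below, can be caricatured classically: the bank records a secret basis for each qubit of a bill, and a counterfeiter who measures each qubit in a guessed basis and resends the result passes verification with probability (3/4)^n. A toy simulation (my illustration; it tracks the measurement statistics only, not actual quantum states):

```python
import random

# Toy, purely classical caricature of Wiesner quantum money: each qubit of a
# bill is recorded as (basis, bit), and we model only measurement statistics.
# A counterfeiter measuring in a random basis and resending passes the bank's
# check with probability 3/4 per qubit: 1 if the basis guess was right, 1/2
# if wrong (the resent state then answers randomly in the bank's basis).

def make_bill(n):
    return [(random.randint(0, 1), random.randint(0, 1)) for _ in range(n)]

def counterfeit_passes(bill):
    for basis, _bit in bill:
        if random.randint(0, 1) == basis:
            continue                 # guessed the right basis: exact copy
        if random.random() < 0.5:    # wrong basis: 50/50 in the bank's basis
            return False
    return True

random.seed(0)
n, trials = 8, 20000
rate = sum(counterfeit_passes(make_bill(n)) for _ in range(trials)) / trials
print(f"empirical {rate:.3f} vs (3/4)^{n} = {(3/4)**n:.3f}")
```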
So maybe that’s the reason for No-Cloning: because God wanted us to have e-commerce, and didn’t want us to have to bother with blockchains (and certainly not with credit card numbers).

The central difficulty with quantum money is: how do you authenticate a bill as genuine? (OK, fine, there’s also the difficulty of how to keep a bill coherent in your wallet for more than a microsecond or whatever. But we’ll leave that for the engineers.)

In Wiesner’s original scheme, he solved the authentication problem by saying that, whenever you want to verify a quantum bill, you bring it back to the bank that printed it. The bank then looks up the bill’s classical serial number in a giant database, which tells the bank in which basis to measure each of the bill’s qubits. With this system, you can actually get information-theoretic security against counterfeiting.

OK, but the fact that you have to bring a bill to the bank to be verified negates much of the advantage of quantum money in the first place. If you’re going to keep involving a bank, then why not just use a credit card? That’s why over the past decade, some of us have been working on public-key quantum money: that is, quantum money that anyone can verify. For this kind of quantum money, it’s easy to see that the No-Cloning Theorem is no longer enough: you also need some cryptographic assumption. But OK, we can consider that.

In recent years, we’ve achieved glory by proposing a huge variety of public-key quantum money schemes—and we’ve achieved even greater glory by breaking almost all of them! After a while, there were basically two schemes left standing: one based on knot theory by Ed Farhi, Peter Shor, et al. That one has been proven to be secure under the assumption that it can’t be broken. The second scheme, which Paul Christiano and I proposed in 2012, is based on hidden subspaces encoded by multivariate polynomials. For our scheme, Paul and I were able to do better than Farhi et al.: we gave a security reduction.
That is, we proved that our quantum money scheme is secure, unless there’s a polynomial-time quantum algorithm to find hidden subspaces encoded by low-degree multivariate polynomials (yadda yadda, you can look up the details) with much greater success probability than we thought possible.

Today, the situation is that my and Paul’s security proof remains completely valid, but meanwhile, our money is completely insecure! Our reduction means the opposite of what we thought it did. There is a break of our quantum money scheme, and as a consequence, there’s also a quantum algorithm to find large subspaces hidden by low-degree polynomials with much better success probability than we’d thought.

What happened was that first, some French algebraic cryptanalysts—Faugere, Pena, I can’t pronounce their names—used Gröbner bases to break the noiseless version of the scheme, in classical polynomial time. So I thought, phew! At least I had acceded when Paul insisted that we also include a noisy version of the scheme. But later, Paul noticed that there’s a quantum reduction from the problem of breaking our noisy scheme to the problem of breaking the noiseless one, so the former is broken as well.

I’m choosing to spin this positively: “we used quantum money to discover a striking new quantum algorithm for finding subspaces hidden by low-degree polynomials. Err, yes, that’s exactly what we did.” But, bottom line, until we manage to invent a better public-key quantum money scheme, or otherwise sort this out, I don’t think we’re entitled to claim that God put unclonability into our universe in order for quantum money to be possible.

Copy-Protected Quantum Software

So if not money, then what about its cousin, copy-protected software—could that be why No-Cloning holds?
By copy-protected quantum software, I just mean a quantum state that, if you feed it into your quantum computer, lets you evaluate some Boolean function on any input of your choice, but that doesn’t let you efficiently prepare more states that let the same function be evaluated. I think this is important as one of the preeminent evil applications of quantum information. Why should nuclear physicists and genetic engineers get a monopoly on the evil stuff?

OK, but is copy-protected quantum software even possible? The first worry you might have is that, yeah, maybe it’s possible, but then every time you wanted to run the quantum program, you’d have to make a measurement that destroyed it. So then you’d have to go back and buy a new copy of the program for the next run, and so on. Of course, to the software company, this would presumably be a feature rather than a bug!

But as it turns out, there’s a fact many of you know—sometimes called the “Gentle Measurement Lemma,” other times the “Almost As Good As New Lemma”—which says that, as long as the outcome of your measurement on a quantum state could be predicted almost with certainty given knowledge of the state, the measurement can be implemented in such a way that it hardly damages the state at all. This tells us that, if quantum money, copy-protected quantum software, and the other things we’re talking about are possible at all, then they can also be made reusable. I summarize the principle as: “if rockets, then space shuttles.”

Much like with quantum money, one can show that, relative to a suitable oracle, it’s possible to quantumly copy-protect any efficiently computable function—or rather, any function that’s hard to learn from its input/output behavior. Indeed, the implementation can be not only copy-protected but also obfuscated, so that the user learns nothing besides the input/output behavior.
As Bill Fefferman pointed out in his talk this morning, the No-Cloning Theorem lets us bypass Barak et al.’s famous result on the impossibility of obfuscation, because their impossibility proof assumed the ability to copy the obfuscated program. Of course, what we really care about is whether quantum copy-protection is possible in the real world, with no oracle. I was able to give candidate implementations of quantum copy-protection for extremely special functions, like one that just checks the validity of a password. In the general case—that is, for arbitrary programs—Paul Christiano has a beautiful proposal for how to do it, which builds on our hidden-subspace money scheme. Unfortunately, since our money scheme is currently in the shop being repaired, it’s probably premature to think about the security of the much more complicated copy-protection scheme! But these are wonderful open problems, and I encourage any of you to come and scoop us. Once we know whether uncopyable quantum software is possible at all, we could then debate whether it’s the “reason” for our universe to have unclonability as a core principle.

Unclonable Proofs and Advice

Along the same lines, I can’t resist mentioning some favorite research directions, which some enterprising student here could totally turn into a talk at next year’s QCRYPT. Firstly, what can we say about clonable versus unclonable quantum proofs—that is, QMA witness states? In other words: for which problems in QMA can we ensure that there’s an accepting witness that lets you efficiently create as many additional accepting witnesses as you want? (I mean, besides the QCMA problems, the ones that have short classical witnesses?) For which problems in QMA can we ensure that there’s an accepting witness that doesn’t let you efficiently create any additional accepting witnesses?
I do have a few observations about these questions—ask me if you’re interested—but on the whole, I believe almost anything one can ask about them remains open. Admittedly, it’s not clear how much use an unclonable proof would be. Like, imagine a quantum state that encoded a proof of the Riemann Hypothesis, and which you would keep in your bedroom, in a glass orb on your nightstand or something. And whenever you felt your doubts about the Riemann Hypothesis resurfacing, you’d take the state out of its orb and measure it again to reassure yourself of RH’s truth. You’d be like, “my preciousssss!” And no one else could copy your state and thereby gain the same Riemann-faith-restoring powers that you had. I dunno, I probably won’t hawk this application in a DARPA grant. Similarly, one can ask about clonable versus unclonable quantum advice states—that is, initial states that are given to you to boost your computational power beyond that of an ordinary quantum computer. And that’s also a fascinating open problem. OK, but maybe none of this quite gets at why our universe has unclonability. And this is an after-dinner talk, so do you want me to get to the really crazy stuff? Yes?

Self-Referential Paradoxes

OK! What if unclonability is our universe’s way around the paradoxes of self-reference, like the unsolvability of the halting problem and Gödel’s Incompleteness Theorem? Allow me to explain what I mean. In kindergarten or wherever, we all learn Turing’s proof that there’s no computer program to solve the halting problem. But what isn’t usually stressed is that that proof actually does more than advertised. If someone hands you a program that they claim solves the halting problem, Turing doesn’t merely tell you that that person is wrong—rather, he shows you exactly how to expose the person as a jackass, by constructing an example input on which their program fails.
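In code, the classical diagonal construction looks like this. This is my own Python sketch, not something from the talk; `claimed_halts` stands in for the alleged halt-decider, and the only real halting claims here are about the toy deciders defined below:

```python
def make_counterexample(claimed_halts):
    """Given an alleged decider claimed_halts(program, data) -> bool,
    build a program on which it must answer wrongly."""
    def diagonal(code):
        if claimed_halts(code, code):
            while True:      # decider said "halts": run forever instead
                pass
        return               # decider said "runs forever": halt at once
    return diagonal

# Against the (obviously wrong) decider that says nothing ever halts:
always_no = lambda prog, data: False
d = make_counterexample(always_no)
print(d(d))  # halts immediately, contradicting always_no's verdict
```

Whichever answer the decider gives about `diagonal` run on itself, the program does the opposite, which is exactly the counterexample Turing hands you.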
All you do is, you take their claimed halt-decider, modify it in some simple way, and then feed the result back to the halt-decider as input. You thereby create a situation where, if your program halts given its own code as input, then it must run forever, and if it runs forever then it halts. “WHOOOOSH!” [head-exploding gesture] OK, but now imagine that the program someone hands you, which they claim solves the halting problem, is a quantum program. That is, it’s a quantum state, which you measure in some basis depending on the program you’re interested in, in order to decide whether that program halts. Well, the truth is, this quantum program still can’t work to solve the halting problem. After all, there’s some classical program that simulates the quantum one, albeit less efficiently, and we already know that the classical program can’t work. But now consider the question: how would you actually produce an example input on which this quantum program failed to solve the halting problem? Like, suppose the program worked on every input you tried. Then ultimately, to produce a counterexample, you might need to follow Turing’s proof and make a copy of the claimed quantum halt-decider. But then, of course, you’d run up against the No-Cloning Theorem! So we seem to arrive at the conclusion that, while of course there’s no quantum program to solve the halting problem, there might be a quantum program for which no one could explicitly refute that it solved the halting problem, by giving a counterexample. I was pretty excited about this observation for a day or two, until I noticed the following. Let’s suppose your quantum program that allegedly solves the halting problem has n qubits. Then it’s possible to prove that the program can’t possibly be used to compute more than, say, 2n bits of Chaitin’s constant Ω, which is the probability that a random program halts. 
OK, but if we had an actual oracle for the halting problem, we could use it to compute as many bits of Ω as we wanted. So, suppose I treated my quantum program as if it were an oracle for the halting problem, and I used it to compute the first 2n bits of Ω. Then I would know that, assuming the truth of quantum mechanics, the program must have made a mistake somewhere. There would still be something weird, which is that I wouldn’t know on which input my program had made an error—I would just know that it must’ve erred somewhere! With a bit of cleverness, one can narrow things down to two inputs, such that the quantum halt-decider must have erred on at least one of them. But I don’t know whether it’s possible to go further, and concentrate the wrongness on a single query. We can play a similar game with other famous applications of self-reference. For example, suppose we use a quantum state to encode a system of axioms. Then that system of axioms will still be subject to Gödel’s Incompleteness Theorem (which I guess I believe despite the umlaut). If it’s consistent, it won’t be able to prove all the true statements of arithmetic. But we might never be able to produce an explicit example of a true statement that the axioms don’t prove. To do so we’d have to clone the state encoding the axioms and thereby violate No-Cloning.

Personal Identity

But since I’m a bit drunk, I should confess that all this stuff about Gödel and self-reference is just a warmup to what I really wanted to talk about, which is whether the No-Cloning Theorem might have anything to do with the mysteries of personal identity and “free will.” I first encountered this idea in Roger Penrose’s book, The Emperor’s New Mind. But I want to stress that I’m not talking here about the possibility that the brain is a quantum computer—much less about the possibility that it’s a quantum-gravitational hypercomputer that uses microtubules to solve the halting problem! I might be drunk, but I’m not that drunk.
I also think that the Penrose-Lucas argument, based on Gödel’s Theorem, for why the brain has to work that way is fundamentally flawed. But here I’m talking about something different. See, I have a lot of friends in the Singularity / Friendly AI movement. And I talk to them whenever I pass through the Bay Area, which is where they congregate. And many of them express great confidence that before too long—maybe in 20 or 30 years, maybe in 100 years—we’ll be able to upload ourselves to computers and live forever on the Internet (as opposed to just living 70% of our lives on the Internet, like we do today). This would have lots of advantages. For example, any time you were about to do something dangerous, you’d just make a backup copy of yourself first. If you were struggling with a conference deadline, you’d spawn 100 temporary copies of yourself. If you wanted to visit Mars or Jupiter, you’d just email yourself there. If Trump became president, you’d not run yourself for 8 years (or maybe 80 or 800 years). And so on. Admittedly, some awkward questions arise. For example, let’s say the hardware runs three copies of your code and takes a majority vote, just for error-correcting purposes. Does that bring three copies of you into existence, or only one copy? Or let’s say your code is run homomorphically encrypted, with the only decryption key stored in another galaxy. Does that count? Or you email yourself to Mars. If you want to make sure that you’ll wake up on Mars, is it important that you delete the copy of your code that remains on earth? Does it matter whether anyone runs the code or not? And what exactly counts as “running” it? 
Or my favorite one: could someone threaten you by saying, “look, I have a copy of your code, and if you don’t do what I say, I’m going to make a thousand copies of it and subject them all to horrible tortures?” The issue, in all these cases, is that in a world where there could be millions of copies of your code running on different substrates in different locations—or things where it’s not even clear whether they count as a copy or not—we don’t have a principled way to take as input a description of the state of the universe, and then identify where in the universe you are—or even a probability distribution over places where you could be. And yet you seem to need such a way in order to make predictions and decisions. A few years ago, I wrote this gigantic, post-tenure essay called The Ghost in the Quantum Turing Machine, where I tried to make the point that we don’t know at what level of granularity a brain would need to be simulated in order to duplicate someone’s subjective identity. Maybe you’d only need to go down to the level of neurons and synapses. But if you needed to go all the way down to the molecular level, then the No-Cloning Theorem would immediately throw a wrench into most of the paradoxes of personal identity that we discussed earlier. For it would mean that there were some microscopic yet essential details about each of us that were fundamentally uncopyable, localized to a particular part of space. We would all, in effect, be quantumly copy-protected software. Each of us would have a core of unpredictability—not merely probabilistic unpredictability, like that of a quantum random number generator, but genuine unpredictability—that an external model of us would fail to capture completely. Of course, by having futuristic nanorobots scan our brains and so forth, it would be possible in principle to make extremely realistic copies of us. But those copies necessarily wouldn’t capture quite everything. 
And, one can speculate, maybe not enough for your subjective experience to “transfer over.” Maybe the most striking aspect of this picture is that sure, you could teleport yourself to Mars—but to do so you’d need to use quantum teleportation, and as we all know, quantum teleportation necessarily destroys the original copy of the teleported state. So we’d avert this metaphysical crisis about what to do with the copy that remained on Earth. Look—I don’t know if any of you are like me, and have ever gotten depressed by reflecting that all of your life experiences, all your joys and sorrows and loves and losses, every itch and flick of your finger, could in principle be encoded by a huge but finite string of bits, and therefore by a single positive integer. (Really? No one else gets depressed about that?) It’s kind of like: given that this integer has existed since before there was a universe, and will continue to exist after the universe has degenerated into a thin gruel of radiation, what’s the point of even going through the motions? You know? But the No-Cloning Theorem raises the possibility that at least this integer is really your integer. At least it’s something that no one else knows, and no one else could know in principle, even with futuristic brain-scanning technology: you’ll always be able to surprise the world with a new digit. I don’t know if that’s true or not, but if it were true, then it seems like the sort of thing that would be worthy of elevating unclonability to a fundamental principle of the universe. So as you enjoy your dinner and dessert at this historic Mayflower Hotel, I ask you to reflect on the following. People can photograph this event, they can video it, they can type up transcripts, in principle they could even record everything that happens down to the millimeter level, and post it on the Internet for posterity. But they’re not gonna get the quantum states. 
There’s something about this evening, like about every evening, that will vanish forever, so please savor it while it lasts. Thank you.

Update (Sep. 20): Unbeknownst to me, Marc Kaplan did video the event and put it up on YouTube! Click here to watch. Thanks very much to Marc! I hope you enjoy, even though of course, the video can’t precisely clone the experience of having been there. [Note: The part where I raise my middle finger is an inside joke—one of the speakers during the technical sessions inadvertently did the same while making a point, causing great mirth in the audience.]

## September 20, 2016

### Particlebites — A new anomaly: the electromagnetic duality anomaly

Article: Electromagnetic duality anomaly in curved spacetimes
Authors: I. Agullo, A. del Rio and J. Navarro-Salas
Reference: arXiv:1607.08879

Disclaimer: this blogpost requires some basic knowledge of QFT (or being comfortable with taking my word at face value for some of the claims made :))

Anomalies exist everywhere. Probably the most intriguing ones are medical, but in particle physics they can be pretty fascinating too. In physics, anomalies refer to the breaking of a symmetry. There are basically two types of anomalies:

• The first type, gauge anomalies, are red flags: if they show up in your theory, they indicate that the theory is mathematically inconsistent.
• The second type of anomaly does not signal any problems with the theory and in fact can have experimentally observable consequences. A prime example is the chiral anomaly. This anomaly nicely explains the decay rate of the neutral pion into two photons.

Fig. 1: Illustration of pion decay into two photons. [Credit: Wikimedia Commons]

In this paper, a new anomaly is discussed. This anomaly is related to the polarization of light and is called the electromagnetic duality anomaly.

Chiral anomaly 101

So let’s first brush up on the basics of the chiral anomaly.
How does this anomaly explain the decay rate of the neutral pion into two photons? For that we need to start with the Lagrangian for QED that describes the interactions between the electromagnetic field (that is, the photons) and spin-½ fermions (which pions are built from):

$\displaystyle \mathcal L = \bar\psi \left( i \gamma^\mu \partial_\mu - e \gamma^\mu A_\mu \right) \psi - m \bar\psi \psi$

where the important players in the above equation are the $\psi$s that describe the spin-½ particles and the vector potential $A_\mu$ that describes the electromagnetic field. In the massless limit, this Lagrangian is invariant under the chiral symmetry: $\displaystyle \psi \to e^{i \gamma_5} \psi .$ Due to this symmetry the current density $j^\mu = \bar{\psi} \gamma_5 \gamma^\mu \psi$ is conserved: $\nabla_\mu j^\mu = 0$. This then immediately tells us that the charge associated with this current density is time-independent. Since the chiral charge is time-independent, it prevents the $\psi$ fields from decaying into the electromagnetic fields, because the $\psi$ field has a non-zero chiral charge and the photons have no chiral charge. Hence, if this was the end of the story, a pion would never be able to decay into two photons. However, the conservation of the charge is only valid classically! As soon as you go from classical field theory to quantum field theory this is no longer true; hence, the name (quantum) anomaly. This can be seen most succinctly using Fujikawa’s observation that even though the field $\psi$ and Lagrangian are invariant under the chiral symmetry, this is not enough for the quantum theory to also be invariant. If we take the path integral approach to quantum field theory, it is not just the Lagrangian that needs to be invariant but the entire path integral needs to be:

$\displaystyle \int D[A] \, D[\bar\psi] \, D[\psi] \, e^{i\int d^4x \, \mathcal L} .$
From calculating how the chiral symmetry acts on the measure $D \left[\psi \right] \, D \left[\bar \psi \right]$, one can extract all the relevant physics such as the decay rate.

The electromagnetic duality anomaly

Just like the chiral anomaly, the electromagnetic duality anomaly also breaks a symmetry at the quantum level that exists classically. The symmetry that is broken in this case is – as you might have guessed from its name – the electromagnetic duality. This symmetry is a generalization of a symmetry you are already familiar with from source-free electromagnetism. If you write down source-free Maxwell equations, you can just swap the electric and magnetic field and the equations look the same (you just have to send $\displaystyle \vec{E} \to \vec{B}$ and $\vec{B} \to - \vec{E}$). Now the more general electromagnetic duality referred to here is slightly more difficult to visualize: it is a rotation in the space of the electromagnetic field tensor and its dual. However, its transformation is easy to write down mathematically: $\displaystyle F_{\mu \nu} \to \cos \theta \, F_{\mu \nu} + \sin \theta \, \, ^\ast F_{\mu \nu} .$ In other words, since this is a symmetry, if you plug this transformation into the Lagrangian of electromagnetism, the Lagrangian will not change: it is invariant. Now following the same steps as for the chiral anomaly, we find that the associated current is conserved and its charge is time-independent due to the symmetry. Here, the charge is simply the difference between the number of photons with left helicity and those with right helicity. Let us continue following the exact same steps as those for the chiral anomaly. The key is to first write electromagnetism in variables analogous to those of the chiral theory. Then you apply Fujikawa’s method and… *drum roll for the anomaly that is approaching*…. Anti-climax: nothing happens, everything seems to be fine. There are no anomalies, nothing! So why the title of this blog?
Well, as soon as you couple the electromagnetic field with a gravitational field, the electromagnetic duality is broken in a deeply quantum way. The number of photons with left helicity and right helicity is no longer conserved when your spacetime is curved.

Physical consequences

Some potentially really cool consequences have to do with the study of light passing by rotating stars, black holes or even rotating clusters. These astrophysical objects do not only gravitationally bend the light, but the optical helicity anomaly tells us that there might be a difference in polarization between light rays coming from different sides of these objects. This may also have some consequences for the cosmic microwave background radiation, which is a ‘picture’ of our universe when it was only 380,000 years old (as compared to the 13.8 billion years it is today!). How big this effect is and whether we will be able to see it in the near future is still an open question.

Further reading

• An introduction to anomalies using only quantum mechanics instead of quantum field theory is “Anomalies for pedestrians” by Barry Holstein.
• The beautiful book “Quantum field theory and the Standard Model” by Michael Schwartz has a nice discussion in the later chapters on the chiral anomaly.
• Lecture notes by Adal Bilal for graduate students on anomalies in general can be found here.

### Jordan Ellenberg — Such shall not become the degradation of Wisconsin

I’ve lived in Wisconsin for more than a decade and had never heard of Joshua Glover. That’s not as it should be! Glover was a slave who escaped Missouri in 1852 and settled in Racine, a free man. He found a job and settled down into a new life. Two years later, his old master found out where he was, and, licensed by the Fugitive Slave Act, came north to claim his property. The U.S. marshals seized Glover and locked him in the Milwaukee courthouse. (Cathedral Square Park is where that courthouse stood.)
A Wisconsin court issued a writ holding the Fugitive Slave Law unconstitutional, and demanding that Glover be given a trial, but the federal officers refused to comply. So Sherman Booth, an abolitionist newspaperman from Waukesha, gathered a mob and broke Glover out. Eventually he made it to Canada via the Underground Railroad. Booth spent years tangled in court, thanks to his role in the prison break. Wisconsin, thrilled by its defiance of the hated law, bloomed with abolitionist fervency. Judge Abram Daniel Smith declared that Wisconsin, a sovereign state, would never accept federal interference within its borders:

“They will never consent that a slave-owner, his agent, or an officer of the United States, armed with process to arrest a fugitive from service, is clothed with entire immunity from state authority; to commit whatever crime or outrage against the laws of the state; that their own high prerogative writ of habeas corpus shall be annulled, their authority defied, their officers resisted, the process of their own courts contemned, their territory invaded by federal force, the houses of their citizens searched, the sanctuary of their homes invaded, their streets and public places made the scenes of tumultuous and armed violence, and state sovereignty succumb–paralyzed and aghast–before the process of an officer unknown to the constitution and irresponsible to its sanctions. At least, such shall not become the degradation of Wisconsin, without meeting as stern remonstrance and resistance as I may be able to interpose, so long as her people impose upon me the duty of guarding their rights and liberties, and maintaining the dignity and sovereignty of their state.”

The sentiment, of course, was not so different from that which the Southern states would use a few years later to justify their right to buy and sell human beings. By the end of the 1850s, Wisconsin’s governor Alexander Randall would threaten to secede from the Union should slavery not be abolished.
When Booth was arrested by federal marshals in 1860, state assemblyman Benjamin Hunkins of New Berlin went even further, introducing a bill declaring war on the United States in protest. The speaker of the assembly declared the bill unconstitutional and no vote was taken. (This was actually the second time Hunkins tried to declare war on the federal government; as a member of the Wisconsin territorial assembly in 1844, he became so outraged over the awarding of the Upper Peninsula to Michigan that he introduced an amendment declaring war on Great Britain, Illinois, Michigan, and the United States!) Madison has a Randall Street (and a Randall School, and Camp Randall Stadium) but no Glover Street and no Booth Street. Should it?

## September 19, 2016

### Clifford Johnson — Kitchen Design…

(Click for larger view.) Apparently I was designing a kitchen recently. Yes, but not one I intend to build in the physical world. It's the setting (in part) for a new story I'm working on for the book. The everyday household is a great place to have a science conversation, by the way, and this is what we will see in this story. It might be one of the most important conversations in the book in some sense. This story is meant to be done in a looser, quicker style, and there I go again with the ridiculous level of detail... Just to get a sense of how ridiculous I'm being, note that this is not a page, but a small panel within a page of several. The page establishes the overall setting, and hopefully roots you [...] Click to continue reading this post

The post Kitchen Design… appeared first on Asymptotia.

### Robert Helling — Brute forcing Crazy Game Puzzles

In the 1980s, as a kid I loved my Crazy Turtles Puzzle ("Das verrückte Schildkrötenspiel"). For a number of variations, see here or here.
I had completely forgotten about those, but a few days ago, I saw a self-made reincarnation when staying at a friend's house: I tried a few minutes to solve it, unsuccessfully (in case it is not clear: you are supposed to arrange the nine tiles in a square such that they form color matching arrows wherever they meet). So I took the picture above with the plan to either try a bit more at home or write a program to solve it. Yesterday, I had about an hour and did the latter. I am a bit proud of the implementation I came up with and in particular the fact that I essentially came up with a correct program: It came up with the unique solution the first time I executed it. So, here I share it:

```perl
#!/usr/bin/perl
# color codes: 1 red 8, 2 yellow 7, 3 green 6, 4 blue 5
# (backs of arrows get 1-4, points get 8-5, so matching pairs sum to 9)
@karten = (7151, 6754, 4382, 2835, 5216, 2615, 2348, 8253, 4786);

# split each card into its four edge digits
foreach $karte (0..8) {
  $farbe[$karte] = [split //, $karten[$karte]];
}

&ausprobieren(0);

sub ausprobieren {
  my $pos = shift;

  foreach my $karte (0..8) {
    next if $benutzt[$karte];
    $benutzt[$karte] = 1;
    foreach my $dreh (0..3) {
      if ($pos % 3) {    # not in the left column: match the left neighbor
        $suche = 9 - $farbe[$gelegt[$pos - 1]]->[(1 - $drehung[$gelegt[$pos - 1]]) % 4];
        next if $farbe[$karte]->[(3 - $dreh) % 4] != $suche;
      }
      if ($pos >= 3) {   # not in the top row: match the tile above
        $suche = 9 - $farbe[$gelegt[$pos - 3]]->[(2 - $drehung[$gelegt[$pos - 3]]) % 4];
        next if $farbe[$karte]->[(4 - $dreh) % 4] != $suche;
      }
      $gelegt[$pos] = $karte;
      $drehung[$karte] = $dreh;
      if ($pos == 8) {
        print "Fertig!\n";
        for $l (0..8) {
          print "$gelegt[$l] $drehung[$gelegt[$l]]\n";
        }
      } else {
        &ausprobieren($pos + 1);
      }
    }
    $benutzt[$karte] = 0;
  }
}
```

Sorry for the variable names in German, but the idea should be clear. Regarding the implementation: red, yellow, green and blue backs of arrows get numbers 1, 2, 3, 4 respectively and pointy sides of arrows 8, 7, 6, 5 (so matching combinations sum to 9).
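For comparison, here is a rough Python port of the same backtracking search, using the card data above. This is my own sketch, with one assumption made explicit: the four digits of each card are read as the top, right, bottom and left edges, and rotations are counted clockwise, matching the index arithmetic of the Perl version.

```python
CARDS = [7151, 6754, 4382, 2835, 5216, 2615, 2348, 8253, 4786]
SIDES = [[int(d) for d in str(c)] for c in CARDS]  # digits = top, right, bottom, left

def side(card, facing, rot):
    """Digit facing direction `facing` (0 top, 1 right, 2 bottom, 3 left)
    after rotating `card` clockwise by `rot` quarter turns."""
    return SIDES[card][(facing - rot) % 4]

def solve():
    """Depth-first search over placements; positions 0..8 are filled row by row."""
    solutions, used, placed = [], [False] * 9, []

    def backtrack(pos):
        if pos == 9:
            solutions.append(list(placed))
            return
        for card in range(9):
            if used[card]:
                continue
            for rot in range(4):
                if pos % 3:      # not leftmost column: touching edges must sum to 9
                    lc, lr = placed[pos - 1]
                    if side(lc, 1, lr) + side(card, 3, rot) != 9:
                        continue
                if pos >= 3:     # not top row: same condition against the tile above
                    uc, ur = placed[pos - 3]
                    if side(uc, 2, ur) + side(card, 0, rot) != 9:
                        continue
                used[card] = True
                placed.append((card, rot))
                backtrack(pos + 1)
                placed.pop()
                used[card] = False

    backtrack(0)
    return solutions

if __name__ == "__main__":
    for sol in solve():
        print(sol)   # list of (card index, clockwise quarter turns), row by row
```

Note that any solution reappears rotated by 90, 180 and 270 degrees, so the search reports each physical arrangement four times.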
It implements a depth-first tree search where tile positions (numbered 0 to 8) are tried left to right, top to bottom. So tile $n$ shares a vertical edge with tile $n-1$ unless its number is 0 mod 3 (leftmost column), and it shares a horizontal edge with tile $n-3$ unless $n$ is less than 3, which means it is in the first row. It tries rotating tiles by 0 to 3 times 90 degrees clockwise, so finding which arrow to match with a neighboring tile can also be computed with mod 4 arithmetic.

### Clifford Johnson — Breaking, not Braking

Well, that happened. I’ve not, at least as I recollect, written a breakup letter before…until now. It had the usual “It’s not you it’s me…”, “we’ve grown apart…” sorts of phrases. And they were all well meant. This was written to my publisher, I hasten to add! Over the last … Click to continue reading this post

The post Breaking, not Braking appeared first on Asymptotia.

### Particlebites — Horton Hears a Sterile Neutrino?

Article: Limits on Active to Sterile Neutrino Oscillations from Disappearance Searches in the MINOS, Daya Bay, and Bugey-3 Experiments
Authors: Daya Bay and MINOS collaborations
Reference: arXiv:1607.01177v4

So far, the hunt for sterile neutrinos has come up empty. Could a joint analysis of MINOS, Daya Bay and Bugey-3 data hint at their existence? Neutrinos, like the beloved Whos in Dr. Seuss’ “Horton Hears a Who!,” are light and elusive, yet have a large impact on the universe we live in. While neutrinos only interact with matter through the weak nuclear force and gravity, they played a critical role in the formation of the early universe. Neutrino physics is now an exciting line of research pursued by the Hortons of particle physics, cosmology, and astrophysics alike.
While most of what we currently know about neutrinos is well described by a three-flavor neutrino model, a few inconsistent experimental results such as those from the Liquid Scintillator Neutrino Detector (LSND) and the Mini Booster Neutrino Experiment (MiniBooNE) hint at the presence of a new kind of neutrino that only interacts with matter through gravity. If this “sterile” kind of neutrino does in fact exist, it might also have played an important role in the evolution of our universe.

Fig.: Horton hears a sterile neutrino? [Source: imdb.com]

The three known neutrinos come in three flavors: electron, muon, or tau. The discovery of neutrino oscillation by the Sudbury Neutrino Observatory and the Super-Kamiokande Observatory, which won the 2015 Nobel Prize, proved that one flavor of neutrino can transform into another. This led to the realization that each neutrino mass state is a superposition of the three different neutrino flavor states. From neutrino oscillation measurements, most of the parameters that define the mixing between neutrino states are well known for the three standard neutrinos. The relationship between the three known neutrino flavor states and mass states is usually expressed as a 3×3 matrix known as the PMNS matrix, for Bruno Pontecorvo, Ziro Maki, Masami Nakagawa and Shoichi Sakata. The PMNS matrix includes three mixing angles, the values of which determine “how much” of each neutrino flavor state is in each mass state. The distance required for one neutrino flavor to become another, the neutrino oscillation wavelength, is determined by the difference between the squared masses of the two mass states. The values of the mass splittings $m_2^2-m_1^2$ and $m_3^2-m_2^2$ are known to good precision.

A fourth flavor? Adding a sterile neutrino to the mix

A “sterile” neutrino is referred to as such because it would not interact weakly: it would only interact through the gravitational force.
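The statement that the oscillation wavelength is set by a squared-mass splitting can be made concrete with the standard two-flavor approximation for the survival probability, $P = 1 - \sin^2 2\theta \, \sin^2(1.27\, \Delta m^2 \, L/E)$, with $\Delta m^2$ in eV$^2$, $L$ in km and $E$ in GeV. This is a textbook formula rather than anything computed in the paper, and the numbers below are purely illustrative:

```python
import math

def survival(theta, dm2_eV2, L_km, E_GeV):
    """Two-flavor survival probability for a neutrino of energy E after distance L."""
    return 1.0 - math.sin(2 * theta) ** 2 * math.sin(1.27 * dm2_eV2 * L_km / E_GeV) ** 2

# Illustrative numbers: with maximal mixing (theta = pi/4), disappearance is
# total when the oscillation phase 1.27 * dm2 * L / E reaches pi/2.
dm2 = 2.5e-3                       # eV^2, roughly the atmospheric splitting
L = (math.pi / 2) / (1.27 * dm2)   # km, for a 1 GeV neutrino
print(survival(math.pi / 4, dm2, L, 1.0))  # ~0: essentially all have oscillated away
```

A larger $\Delta m^2$, as hypothesized for a sterile state, shortens the wavelength, which is why short-baseline experiments like LSND and MiniBooNE are the ones sensitive to it.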
Neutrino oscillations involving the hypothetical sterile neutrino can be understood using a “four-flavor model,” which introduces a fourth neutrino mass state, $m_4$, heavier than the three known “active” mass states. This fourth neutrino state would be mostly sterile, with only a small contribution from a mixture of the three known neutrino flavors. If the sterile neutrino exists, it should be possible to experimentally observe neutrino oscillations with a wavelength set by the difference between $m_4^2$ and the square of the mass of another known neutrino mass state. Current observations suggest a squared mass difference in the range of 0.1-10 eV$^2$. Oscillations between active and sterile states would result in the disappearance of muon (anti)neutrinos and electron (anti)neutrinos. In a disappearance experiment, you know how many neutrinos of a specific type you produce, and you count the number of that type of neutrino a distance away, and find that some of the neutrinos have “disappeared,” or in other words, oscillated into a different type of neutrino that you are not detecting.

A joint analysis by the MINOS and Daya Bay collaborations

The MINOS and Daya Bay collaborations have conducted a joint analysis to combine independent measurements of muon (anti)neutrino disappearance by MINOS and electron antineutrino disappearance by Daya Bay and Bugey-3.
Here’s a breakdown of the involved experiments:

• MINOS, the Main Injector Neutrino Oscillation Search: a long-baseline neutrino experiment with detectors at Fermilab and in northern Minnesota that use an accelerator at Fermilab as the neutrino source
• The Daya Bay Reactor Neutrino Experiment: uses antineutrinos produced by the reactors of China’s Daya Bay Nuclear Power Plant and the Ling Ao Nuclear Power Plant
• The Bugey-3 experiment: performed in the early 1990s, used antineutrinos from the Bugey Nuclear Power Plant in France for its neutrino oscillation observations

Fig.: MINOS and Daya Bay/Bugey-3 combined 90% confidence level limits (in red) compared to the LSND and MiniBooNE 90% confidence level allowed regions (in green/purple). Plots the mass splitting between mass states 1 and 4 (corresponding to the sterile neutrino) against a function of the $\mu-e$ mixing angle, which is equivalent to a function involving the 1-4 and 2-4 mixing angles. Regions of parameter space to the right of the red contour are excluded, counting out the majority of the LSND/MiniBooNE allowed regions. [Source: arXiv:1607.01177v4]

Assuming a four-flavor model, the MINOS and Daya Bay collaborations put new constraints on the value of the mixing angle $\theta_{\mu e}$, the parameter controlling electron (anti)neutrino appearance in experiments with short neutrino travel distances. As for the hypothetical sterile neutrino? The analysis excluded the parameter space allowed by the LSND and MiniBooNE appearance-based indications for the existence of light sterile neutrinos for $\Delta m_{41}^2$ < 0.8 eV$^2$ at a 95% confidence level. In other words, the MINOS and Daya Bay analysis essentially rules out the LSND and MiniBooNE inconsistencies that allowed for the presence of a sterile neutrino in the first place. These results illustrate just how at odds disappearance searches and appearance searches are when it comes to providing insight into the existence of light sterile neutrinos.
If the Whos exist, they will need to be a little louder in order for the world to hear them. Background reading:

### n-Category Café — Logical Uncertainty and Logical Induction

Quick - what’s the $10^{100}$th digit of $\pi$? If you’re anything like me, you have some uncertainty about the answer to this question. In fact, your uncertainty probably takes the following form: you assign a subjective probability of about $\frac{1}{10}$ to this digit being any one of the possible values $0, 1, 2, \dots, 9$. This is despite the fact that

• the normality of $\pi$ in base $10$ is a wide open problem, and
• even if it weren’t, nothing random is happening; the $10^{100}$th digit of $\pi$ is a particular digit, not a randomly selected one, and it being a particular value is a mathematical fact which is either true or false.

If you’re bothered by this state of affairs, you could try to resolve it by computing the $10^{100}$th digit of $\pi$, but as far as I know nobody has the computational resources to do this in a reasonable amount of time. Because of this lack of computational resources, among other things, you and I aren’t logically omniscient; we don’t have access to all of the logical consequences of our beliefs. The kind of uncertainty we have about mathematical questions that are too difficult for us to settle one way or another right this moment is logical uncertainty, and standard accounts of how to have uncertain beliefs (for example, assign probabilities and update them using Bayes’ theorem) don’t capture it. Nevertheless, somehow mathematicians manage to have lots of beliefs about how likely mathematical conjectures such as the Riemann hypothesis are to be true, and even about simpler but still difficult mathematical questions such as how likely some very large complicated number $N$ is to be prime (a reasonable guess, before we’ve done any divisibility tests, is about $\frac{1}{\ln N}$ by the prime number theorem).
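As a concrete aside: while the $10^{100}$th digit is out of reach, early digits of $\pi$ are cheap to compute, and their empirical near-uniformity is exactly what the subjective probability of $\frac{1}{10}$ reflects. Here is a minimal Python sketch (function names are mine) using Machin's formula with integer arithmetic:

```python
from collections import Counter

def arctan_recip(x, prec):
    """arctan(1/x) scaled by 10**prec, via the Taylor series in integer arithmetic."""
    one = 10 ** prec
    power = one // x          # (1/x)^(2k+1), scaled; k = 0 term
    total = power
    k, x2 = 1, x * x
    while power:
        power //= x2
        term = power // (2 * k + 1)
        total += -term if k % 2 else term   # alternating signs
        k += 1
    return total

def pi_digits(n):
    """First n decimal digits of pi, as a string, via Machin's formula:
    pi = 16*arctan(1/5) - 4*arctan(1/239)."""
    prec = n + 10             # guard digits against truncation error
    pi_scaled = 4 * (4 * arctan_recip(5, prec) - arctan_recip(239, prec))
    return str(pi_scaled)[:n]

# Each digit appears roughly 100 times among the first 1000 digits.
counts = Counter(pi_digits(1000))
```

The counts are consistent with uniformity even though, as the post notes, nothing random is happening.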
In some contexts we have even more sophisticated guesses, like the Cohen-Lenstra heuristics for assigning probabilities to mathematical statements such as “the class number of such-and-such complicated number field has $p$-part equal to so-and-so.” In general, what criteria might we use to judge an assignment of probabilities to mathematical statements as reasonable or unreasonable? Given some criteria, how easy is it to find a way to assign probabilities to mathematical statements that actually satisfies them? These fundamental questions are the subject of the following paper:

Scott Garrabrant, Tsvi Benson-Tilsen, Andrew Critch, Nate Soares, and Jessica Taylor, Logical Induction. arXiv:1609.03543.

Loosely speaking, in this paper the authors

• describe a criterion called logical induction that an assignment of probabilities to mathematical statements could satisfy,
• show that logical induction implies many other desirable criteria, some of which have previously appeared in the literature, and
• prove that a computable logical inductor (an algorithm producing probability assignments satisfying logical induction) exists.

Logical induction is a weak “no Dutch book” condition; the idea is that a logical inductor makes bets about which statements are true or false, and does so in a way that doesn’t lose it too much money over time.

## A warmup

Before describing logical induction, let me describe a different and more naive criterion you could ask for, but in fact don’t want to ask for because it’s too strong. Let $\varphi \mapsto \mathbb{P}(\varphi)$ be an assignment of probabilities to statements in some first-order language; for example, we might want to assign probabilities to statements in the language of Peano arithmetic (PA), conditioned on the axioms of PA being true (which means having probability $1$). Say that such an assignment $\varphi \mapsto \mathbb{P}(\varphi)$ is coherent if

• $\mathbb{P}(\top) = 1$.
• If $\varphi_1$ is equivalent to $\varphi_2$, then $\mathbb{P}(\varphi_1) = \mathbb{P}(\varphi_2)$.
• $\mathbb{P}(\varphi_1) = \mathbb{P}(\varphi_1 \wedge \varphi_2) + \mathbb{P}(\varphi_1 \wedge \neg \varphi_2)$.

These axioms together imply various other natural-looking conditions; for example, setting $\varphi_1 = \top$ in the third axiom, we get that $\mathbb{P}(\varphi_2) + \mathbb{P}(\neg \varphi_2) = 1$. Various other axiomatizations of coherence are possible.

Theorem: A probability assignment such that $\mathbb{P}(\varphi) = 1$ for all statements $\varphi$ in a first-order theory $T$ is coherent iff there is a probability measure on models of $T$ such that $\mathbb{P}(\varphi)$ is the probability that $\varphi$ is true in a random model.

This theorem is a logical counterpart of the Riesz-Markov-Kakutani representation theorem relating probability distributions to linear functionals on spaces of functions; I believe it is due to Gaifman. For example, if $T$ is PA, then the sort of uncertainty that a coherent probability assignment conditioned on PA captures is uncertainty about which of the various first-order models of PA is the “true” natural numbers. However, coherent probability assignments are still logically omniscient: syntactically, every provable statement is assigned probability $1$ because they’re all equivalent to $\top$, and semantically, provable statements are true in every model. In particular, coherence is too strong to capture uncertainty about the digits of $\pi$. Coherent probability assignments can update over time whenever they learn that some statement is true which they haven’t assigned probability $1$ to; for example, if you start by believing PA and then come to also believe that PA is consistent, then conditioning on that belief will cause your probability distribution over models to exclude models of PA where PA is inconsistent.
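As an aside, the model-averaging picture in the theorem above is easy to check in a toy propositional setting. Here is a minimal sketch (all names mine, with models as truth assignments to two atoms) where $\mathbb{P}$ is defined from a probability measure over models, and the coherence axioms then hold automatically:

```python
import itertools
import random

def make_P(atoms, weights):
    """Build P(phi) = probability that phi holds in a random model, where
    models are truth assignments to `atoms` and `weights` is a probability
    distribution over those models."""
    models = [dict(zip(atoms, vals))
              for vals in itertools.product([False, True], repeat=len(atoms))]
    def P(phi):
        # phi is a function from a model (dict atom -> bool) to bool
        return sum(w for m, w in zip(models, weights) if phi(m))
    return P

# A random probability measure over the four models of two atoms A, B.
random.seed(0)
raw = [random.random() for _ in range(4)]
P = make_P(["A", "B"], [r / sum(raw) for r in raw])

top = lambda m: True
a = lambda m: m["A"]
not_a = lambda m: not m["A"]
a_and_b = lambda m: m["A"] and m["B"]
a_and_not_b = lambda m: m["A"] and not m["B"]

# The coherence axioms hold automatically for a model-averaged P:
assert abs(P(top) - 1.0) < 1e-12
assert abs(P(a) - (P(a_and_b) + P(a_and_not_b))) < 1e-12
assert abs(P(a) + P(not_a) - 1.0) < 1e-12
```

Logical equivalence (the second axiom) also holds for free here, since equivalent formulas are true in exactly the same models.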
But this doesn’t capture the kind of updating a non-logically omniscient reasoner like you or me actually does, where our beliefs about mathematics can change solely because we’ve thought a bit longer and proven some statements that we didn’t previously know (for example, about the values of more and more digits of $\pi$).

## Logical induction

The framework of logical induction is for describing the above kind of updating, based solely on proving more statements. It takes as input a deductive process which is slowly producing proofs of statements over time (for example, of theorems in PA), and assigns probabilities to statements that haven’t been proven yet. Remarkably, it’s able to do this in a way that eventually outpaces the deductive process, assigning high probabilities to true statements long before they are proven (see Theorem 4.2.1). So how does logical induction work? The coherence axioms above can be justified by Dutch book arguments, following Ramsey and de Finetti, which loosely say that a bookie can’t offer a coherent reasoner a bet about mathematical statements which they will take but which is in fact guaranteed to lose them money. But this is much too strong a requirement for a reasoner who is not logically omniscient. The logical induction criterion is a weaker version of this condition; we only require that an efficiently computable bookie can’t make arbitrarily large amounts of money by betting with a logical inductor about mathematical statements unless it’s willing to take on arbitrarily large amounts of risk (see Definition 3.0.1). This turns out to be a surprisingly useful condition to require, loosely speaking because it corresponds to being able to “notice patterns” in mathematical statements even if we can’t prove anything about them yet.
A logical inductor has to be able to notice patterns that could otherwise be used by an efficiently computable bookie to exploit the inductor; for example, a logical inductor eventually assigns probability about $\frac{1}{10}$ to claims that a very large digit of $\pi$ has a particular value, intuitively because otherwise a bookie could continue to bet with the logical inductor about more and more digits of $\pi$, making money each time (see Theorem 4.4.2). Logical induction has many other desirable properties, some of which are described in this blog post. One of the more remarkable properties is that because logical inductors are computable, they can reason about themselves, and hence assign probabilities to statements about the probabilities they assign. Despite the possibility of running into self-referential paradoxes, logical inductors eventually have accurate beliefs about their own beliefs (see Theorem 4.11.1). Overall I’m excited about this circle of ideas and hope that they get more attention from the mathematical community. Speaking very speculatively, it would be great if logical induction shed some light on the role of probability in mathematics more generally - for example, in the use of informal probabilistic arguments for or against difficult conjectures. A recent example is Boklan and Conway’s probabilistic arguments in favor of the conjecture that there are no Fermat primes beyond those currently known. I’ve made several imprecise claims about the contents of the paper above, so please read it to get the precise claims!

### Tommaso Dorigo — Are There Two Higgses? No, And I Won Another Bet!

The 2012 measurements of the Higgs boson, performed by ATLAS and CMS on 7- and 8-TeV datasets collected during Run 1 of the LHC, were a giant triumph of fundamental physics, which conclusively showed the correctness of the theoretical explanation of electroweak symmetry breaking conceived in the 1960s.
The Higgs boson signals found by the experiments were strong and coherent enough to convince physicists as well as the general public, but at the same time the few small inconsistencies unavoidably present in any data sample, driven by statistical fluctuations, were a stimulus for fantasy interpretations. Supersymmetry enthusiasts, in particular, saw the 125 GeV boson as the first found of a set of five; SUSY in fact requires the presence of at least five such states. read more

### Backreaction — Experimental Search for Quantum Gravity 2016

Research in quantum gravity is quite a challenge, since we have neither a theory nor data. But some of us like a challenge. So far, most effort in the field has gone into using requirements of mathematical consistency to construct a theory. It is of course impossible to construct a theory based on mathematical consistency alone, because we can never prove our assumptions to be true. All we know is that the assumptions give rise to good predictions in the regime where we’ve tested them. Without assumptions, no proof. Still, you may hope that mathematical consistency tells you where to look for observational evidence. But in the second half of the 20th century, theorists used the weakness of gravity as an excuse not to think about how to experimentally test quantum gravity at all. This isn’t merely a sign of laziness; it’s a return to the days when philosophers believed they could find out how nature works by introspection. Just that now many theoretical physicists believe mathematical introspection is science. Particularly disturbing to me is how frequently I speak with students or young postdocs who have never even given thought to the question of what makes a theory scientific. That’s one of the reasons the disconnect between physics and philosophy worries me. In any case, the cure clearly isn’t more philosophy, but more phenomenology. The effects of quantum gravity aren’t necessarily entirely out of experimental reach.
Gravity isn’t generally a weak force, not in the same way that, for example, the weak nuclear force is weak. That’s because the effects of gravity get stronger with the amount of mass (or energy) that exerts the force. Indeed, this property of the gravitational force is the very reason why it’s so hard to quantize. Quantum gravitational effects were hence strong in the early universe, they are strong inside black holes, and they can be non-negligible for massive objects that have pronounced quantum properties. Furthermore, the theory of quantum gravity can be expected to give rise to deviations from general relativity or the symmetries of the standard model, which can have consequences that are observable even at low energies. The often repeated argument that we’d need to reach enormously high energies – close to the Planck energy, 16 orders of magnitude higher than LHC energies – is simply wrong. Physics is full of examples of short-distance phenomena that give rise to effects at longer distances, such as atoms causing Brownian motion, or quantum electrodynamics allowing stable atoms to begin with. I have spent the last 10 years or so studying the prospects of finding experimental evidence for quantum gravity. Absent a fully developed theory, we work with models to quantify effects that could be signals of quantum gravity, and aim to test these models with data. The development of such models is relevant to identify promising experiments to begin with. Next week, we will hold the 5th international conference on Experimental Search for Quantum Gravity, here in Frankfurt. And I dare say we have managed to pull together an awesome selection of talks. We’ll hear about the prospects of finding evidence for quantum gravity in the CMB (Bianchi, Krauss, Vennin) and in quantum oscillators (Paternostro).
We have a lecture about the interface between gravity and quantum physics, both on long and short distances (Fuentes), and a talk on how to look for moduli and axion fields that are generic consequences of string theory (Conlon). Of course we’ll also cover Loop Quantum Cosmology (Barrau), asymptotically safe gravity (Eichhorn), and causal sets (Glaser). We’re super up-to-date by having a talk about constraints from the LIGO gravitational-wave measurements on deviations from general relativity (Yunes), and several of the usual suspects speaking about deviations from Lorentz-invariance (Mattingly), Planck stars (Rovelli, Vidotto), vacuum dispersion (Giovanni), and dimensional reduction (Magueijo). There’s neutrino physics (Paes), a talk about what the cosmological constant can tell us about new physics (Afshordi), and, and, and! You can download the abstracts here and the timetable here. But the best part is that I’m not telling you this to depress you because you can’t be with us, but because our IT guys tell me we’ll both record the talks and livestream them (to the extent that the speakers consent, of course). I’ll share the URL with you here once everything is set up, so stay tuned.

Update: Streaming link will be posted on the institute's main page briefly before the event.

Another update: Livestream is available here.

### Jordan Ellenberg — Kevin Jamieson, hyperparameter optimization, playoffs

Kevin Jamieson gave a great seminar here on Hyperband, his algorithm for hyperparameter optimization. Here’s the idea. Doing machine learning involves making a lot of choices. You set up your deep learning neural thingamajig but that’s not exactly one size fits all: How many layers do you want in your net? How fast do you want your gradient descents to step? And etc. and etc. The parameters are the structures your thingamajig learns. The hyperparameters are the decisions you make about your thingamajig before you start learning.
And it turns out these decisions can actually affect performance a lot. So how do you know how to make them? Well, one option is to pick N choices of hyperparameters at random, run your algorithm on your test set with each choice, and see how you do. The problem is, thingamajigs take a long time to converge. This is expensive to do, and when N is small, you’re not really seeing very much of hyperparameter space (which might have dozens of dimensions.) A more popular choice is to place some prior on the function F:[hyperparameter space] -> [performance on test set] You make a choice of hyperparameters, you run the thingamajig, based on the output you update your distribution on F, based on your new distribution you choose a likely-to-be-informative hyperparameter and run again, etc. This is called “Bayesian optimization of hyperparameters” — it works pretty well — but really only about as well as taking twice as many random choices of hyperparameters, in practice. A 2x speedup is nothing to sneeze at, but it still means you can’t get N large enough to search much of the space. Kevin thinks you should think of this as a multi-armed bandit problem. You have a hyperparameter whose performance you’d like to judge. You could run your thingamajig with those parameters until it seems to be converging, and see how well it does. But that’s expensive. Alternatively, you could run your thingamajig (1/c) times as long; then you have time to consider Nc values of the hyperparameters, much better. But of course you have a much less accurate assessment of the performance: maybe the best performer in that first (1/c) time segment is actually pretty bad, and just got off to a good start! So you do this instead. Run the thingamajig for time (1/c) on Nc values. That costs you N. Then throw out all values of the hyperparameters that came in below median on performance. You still have (1/2)Nc values left, so continue running those processes for another time (1/c). That costs you (1/2)N. 
Throw out everything below the median. And so on. When you get to the end you’ve spent N log Nc, not bad at all, but instead of looking at only N hyperparameters, you’ve looked at Nc, where c might be pretty big. And you haven’t wasted lots of processor time following unpromising choices all the way to the end; rather, you’ve mercilessly culled the low performers along the way. But how do you choose c? I insisted to Kevin that he call c a hyperhyperparameter but he wasn’t into it. No fun! Maybe the reason Kevin resisted my choice is that he doesn’t actually choose c; he just carries out his procedure once for each c as c ranges over 1, 2, 4, 8, …, N; this costs you only another factor of log N. In practice, this seems to find hyperparameters just as well as more fancy Bayesian methods, and much faster. Very cool! You can imagine doing the same things in simpler situations (e.g. I want to do a gradient descent, where should I start?) and Kevin says this works too. In some sense this is how a single-elimination tournament works! In the NCAA men’s basketball finals, 64 teams each play a game; the teams above the median are 1-0, while the teams below the median, at 0-1, get cut. Then the 1-0 teams each play one more game: the teams above the median at 2-0 stay, the teams below the median at 1-1 get cut. What if the regular season worked like this? Like if in June, the bottom half of major league baseball just stopped playing, and the remaining 15 teams duked it out until August, then down to 8… It would be impossible to schedule, of course. But in a way we have some form of it: at the July 31 trade deadline, teams sufficiently far out of the running can just give up on the season and trade their best players for contending teams’ prospects. Of course the bad teams keep playing games, but in some sense, the competition has narrowed to a smaller field.
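The culling loop described above (successive halving, the core of Hyperband) can be sketched in a few lines of Python. This is just a toy sketch, not Kevin's code: `evaluate` stands in for "train with these hyperparameters for `budget` steps and report validation performance", and the outer loop over c is omitted.

```python
def successive_halving(arms, evaluate, r0=1):
    """Score each surviving arm at the current per-arm budget, drop the
    bottom half, and double the budget each round. Returns the surviving
    arm and the total budget spent. `evaluate(arm, budget)` returns a
    score (higher is better)."""
    alive, budget, total_cost = list(arms), r0, 0
    while len(alive) > 1:
        scored = sorted(((evaluate(a, budget), a) for a in alive), reverse=True)
        total_cost += len(alive) * budget
        alive = [a for _, a in scored[: max(1, len(alive) // 2)]]
        budget *= 2   # survivors get a longer look next round
    return alive[0], total_cost

# Toy check: 16 "hyperparameters" whose true quality equals their index.
# With noiseless evaluation, the best arm survives every cull, and the
# total cost is 16 per round over log2(16) = 4 rounds.
best, cost = successive_halving(range(16), lambda arm, budget: arm)
```

With noisy evaluations the early rounds can misjudge arms, which is exactly why Hyperband hedges by rerunning the procedure over a range of values of c.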
## September 18, 2016

### Doug Natelson — Alan Alda Center for Communicating Science, posting

Tomorrow I'll be a participant in an all-day workshop that Rice's Center for Teaching Excellence will be hosting with representatives from the Alan Alda Center for Communicating Science - the folks responsible for the Flame Challenge, a contest about trying to explain a science topic to an 11-year-old. I'll write a follow-up post sometime soon about what this was like. I'm in the midst of some major writing commitments right now, so posting frequency may slow for a bit. I am trying to plan out how to write some accessible content about some recent exciting work in a few different material systems.

### Chad Orzel — Advice for New Faculty, 2016

A couple of weeks ago, I was asked to speak on a panel about teaching during Union’s new-faculty orientation. We had one person from each of the academic divisions (arts and literature, social science, natural science, and engineering), and there was a ton of overlap in the things we said, but here’s a rough reconstruction of the advice I gave them:

1) Be Wary of Advice

Because it’s always good to start off with something that sounds a little counter-intuitive… What I mean by this is that lots of people will be more than happy to offer advice to a new faculty member – often without being asked – but a great deal of that advice will be bad for the person getting it. This isn’t the result of active malice, just that teaching is a highly individual endeavor. The relatively harmless example I use to illustrate this comes from my first year of teaching, when I went around and asked my new colleagues for advice. One very successful guy said that he made an effort to maximize the effect of our small classes by “breaking the fourth wall” and walking away from the chalkboard out into the middle of the room. That sounded good, so I tried it for a while. And quickly found that while it worked well for him, it was a disaster for me.
I’m multiple standard deviations above average height, and several years later a student wrote on a course evaluation “He is loud and intense.” That combination meant that when I would walk out into the room while lecturing, the students closest to me were basically cowering in fear. Once I noticed that, I made a point of staying back near the blackboard unless I needed to go out into the room for a demo or activity, and everybody was happier. So, my advice to new faculty is: be wary of advice from older faculty. You’ll get lots of it, but much of it will be bad for you. You need to be independent enough and self-aware enough to recognize and use the bits that will work for you, and discard the bits that won’t.

2) Don’t Be Afraid to Try New Things

One of the big categories of well-intentioned bad advice that most new faculty can expect to hear is some form of “don’t rock the boat until you have tenure.” This particularly comes up in the context of teaching – I had people who were speakers at a workshop on improving introductory physics courses tell me not to try to implement any new methods as a junior faculty member. The logic is that changing anything will have a short-term negative effect on your teaching evaluations, and you shouldn’t risk that before a tenure review. I think this is bad advice, because if you’re going to go with that, there’s always some reason not to change things – you’ll be up for promotion to full professor, or looking for a fellowship, or an endowed chair, or something. There are always good, logical-seeming reasons to keep your head down and do something functional and not take risks that could improve your teaching. So, my advice is that if you look around and see something that you’d like to try that would improve your teaching, go for it.
Again, you need to be independent and self-aware enough to recognize what’s likely to work, and make adjustments when needed, and you need to be prepared to defend the choices you make should it become necessary: “My evaluations went down when I changed teaching methods; this is a well-known effect of change, but they’ve improved since, and student learning is better by these metrics.”

3) Don’t Assume Your Students Are Like You Were

I count this as the best one-sentence piece of advice I got as a junior faculty member. It’s really important to remember that people who become college faculty are necessarily unusual – we’re the ones who had enough interest in our subjects to continue into graduate school, and who were sufficiently passionate and self-motivated to succeed there. That’s just not going to be the case for the vast majority of the students we will encounter as faculty. There will be a few, and we should cherish them, but most of them are not going to find the subject as intrinsically fascinating as we do, and won’t put in the same level of independent effort. So, my advice is to remember that (paraphrasing another famous comment) you go to class with the students you have, not the students you wish you had or the students you were. If you go in expecting students to react the same way you did, you’re going to end up disappointed and frustrated. This doesn’t mean you can’t ask them to become better than they are, it just means that you need to make an effort to meet them where they are, and move them toward where you’d like them to be.
(For much of my college career, I was probably closer to the mean attitude of our students than many of my colleagues were – I played rugby and partied a lot – but I still regularly need to remind myself of this…) That’s the advice I offered, with the repeated caveat that they should remember the first item on my list, and be wary of all advice, including mine… Happily, many elements of this were echoed by my fellow panelists, so I think I was basically on the right track, but still, it’s impossible to succeed for everyone.

## September 17, 2016

### Jordan Ellenberg — Roger Ailes, man of not many voices

Among those who did speak on the record to Mr. Sherman is Stephanie Gordon, an actress who in one part of that show dropped the towel she wore. She was asked by Mr. Ailes to come to his office for a Sunday photo session and felt extremely uncomfortable about having to do this for the producer. But she says Mr. Ailes could not have been nicer. He took pictures and later sent her a signed print inscribed: “Don’t throw in the towel, you’re a great actress. Roger Ailes.” But Mr. Sherman also has a story from a woman named Randi Harrison, also on the record, who claims Mr. Ailes offered her a $400-a-week job at NBC, saying: “If you agree to have sex with me whenever I want, I will add an extra hundred dollars a week.”

These don’t sound like the voices of the same man.

I think they totally sound like the voices of the same man.  It’s not like someone who sexually harasses one woman can be counted on to sexually harass every single woman within arm’s reach.  Bank robbers don’t rob every single bank!  “Why, I saw that man walk by a bank just the other day without robbing it — the person who told you he was a bank robber must just have been misinterpreting.  Probably he was just making a withdrawal and the teller took it the wrong way.”

And what’s more:  don’t you think Ailes kind of could have been nicer to Gordon?  Like, a lot nicer?  Look at that exchange again.  He put her in a position where she felt extremely uncomfortable, and declined to sexually assault her on that occasion.  Then he sent her a signed print, on which he wrote a message reminding her that he’d seen her naked body.

I think both these stories depict a man who sees women as existing mainly for his enjoyment, and a man who takes special pleasure in letting women know he sees them that way.  One man, one voice.

## September 15, 2016

### Tim Gowers — In case you haven’t heard what’s going on in Leicester …

Strangely, this is my second post about Leicester in just a few months, but it’s about something a lot more depressing than the football team’s fairytale winning of the Premier League (but let me quickly offer my congratulations to them for winning their first Champions League match — I won’t offer advice about whether they are worth betting on to win that competition too). News has just filtered through to me that the mathematics department is facing compulsory redundancies.

The structure of the story is wearily familiar after what happened with USS pensions. The authorities declare that there is a financial crisis, and that painful changes are necessary. They offer a consultation. In the consultation their arguments appear to be thoroughly refuted. The refutation is then ignored and the changes go ahead.

Here is a brief summary of the painful changes that are proposed for the Leicester mathematics department. The department has 21 permanent research-active staff. Six of those are to be made redundant. There are also two members of staff who concentrate on teaching. Their number will be increased to three. How will the six be chosen? Basically, almost everyone will be sacked and then invited to reapply for their jobs in a competitive process, and the plan is to get rid of “the lowest performers” at each level of seniority. Those lowest performers will be considered for “redeployment” — which means that the university will make efforts to find them a job of a broadly comparable nature, but doesn’t guarantee to succeed. It’s not clear to me what would count as broadly comparable to doing pure mathematical research.

How is performance defined? It’s based on things like research grants, research outputs, teaching feedback, good citizenship, and “the ongoing and potential for continued career development and trajectory”, whatever that means. In other words, on the typical flawed metrics so beloved of university administrators, together with some subjective opinions that will presumably have to come from the department itself — good luck with offering those without creating enemies for life.

Oh, and another detail is that they want to reduce the number of straight maths courses and promote actuarial science and service teaching in other departments.

There is a consultation period that started in late August and ends on the 30th of September. So the lucky members of the Leicester mathematics faculty have had a whole month to marshal their to-be-ignored arguments against the changes.

It’s important to note that mathematics is not the only department that is facing cuts. But it’s equally important to note that it is being singled out: the university is aiming for cuts of 4.5% on average, and mathematics is being asked to make a cut of more like 20%. One reason for this seems to be that the department didn’t score all that highly in the last REF. It’s a sorry state of affairs for a university that used to boast Sir Michael Atiyah as its chancellor.

I don’t know what can be done to stop this, but at the very least there is a petition you can sign. It would be good to see a lot of signatures, so that Leicester can see how damaging a move like this will be to its reputation.

### n-Category Café — Disaster at Leicester

You’ve probably met mathematicians at the University of Leicester, or read their work, or attended their talks, or been to events they’ve organized. Their pure group includes at least four people working in categorical areas: Frank Neumann, Simona Paoli, Teimuraz Pirashvili and Andy Tonks.

Now this department is under severe threat. A colleague of mine writes:

24 members of the Department of Mathematics at the University of Leicester — the great majority of the members of the department — have been informed that their post is at risk of redundancy, and will have to reapply for their positions by the end of September. Only 18 of those applying will be re-appointed (and some of those have been changed to purely teaching positions).

It’s not only mathematics at stake. The university is apparently on a process of “institutional transformation”, involving:

the closure of departments, subject areas and courses, including the Vaughan Centre for Lifelong Learning and the university bookshop. Hundreds of academic, academic-related and support staff are to be made redundant, many of them compulsorily.

If you don’t like this, sign the petition objecting! You’ll see lots of familiar names already on the list (Tim Gowers, John Baez, Ross Street, …). As signatory David Pritchard wrote, “successful departments and universities are hard to build and easy to destroy.”

### Jordan Ellenberg — Are the 2016 Orioles the slowest team in baseball history?

The Orioles are last in the AL in stolen bases, with 17.  They also have the fewest times caught stealing, with 11; they’re so slow they’re not even trying to run.

But here’s the thing that really jumps out at you.  With just 17 games to play, the Orioles have 6 triples on the season.  And this is a team with power, a team that hits the ball to the deep outfield a lot.  Six triples.  You know what the record is for the fewest triples ever recorded by a team?  11.  By the 1998 Orioles.  This year’s team is like the 1998 squad without the speed machine that was 34-year-old Brady Anderson.   They are going to set a fewest-triples record that may never be broken.

### Jordan Ellenberg — Show report: Xenia Rubinos at the Frequency

Xenia Rubinos is a — ok, what is she?  A singer-songwriter-yeller-wreaker-of-havoc who plays an avant-garde version of R&B with a lot of loud, hectic guitar in it.  I’ve been pronouncing her name “Zenya” but she says “Senia.”  She played to about 100 people at the Frequency last Thursday.  She seems to belong in a much bigger place in front of a much bigger crowd, so much so that it feels a little weird to be right there next to her as she does her frankly pretty amazing thing.  Here’s “Cherry Tree,” from her 2013 debut, still her best song by my lights.  It would be most people’s best song.

This, live, was pretty close to the record.  Other songs weren’t.  Live, I thought she and her band sometimes sounded like Fiery Furnaces, which doesn’t come through on the records.  “Pan Y Cafe”, a fun romp on the album

is much more aggro live.  It’s kind of what the Pixies “Spanish songs” would be like if somebody who actually spoke Spanish wrote them.  (She likes the Pixies.)

Maybe I should make a post about the greatest shows I’ve seen in Madison.  This was one of them.  Who else?  Man Man in 2007.  The Breeders in 2009.  Fatty Acids / Sat Nite Duets in 2012.  I’ll have to think about this more thoroughly.

## September 14, 2016

### Particlebites — Inspecting the Higgs with a golden probe

Hello particle nibblers,

After recovering from a dead-diphoton-excess induced depression (see here, here, and here for summaries) I am back to tell you a little more about something that actually does exist, our old friend Monsieur Higgs boson. All of the fuss over the past few months over a potential new particle at 750 GeV has perhaps made us forget just how special and interesting the Higgs boson really is, but as more data is collected at the LHC, we will surely be reminded of this fact once again (see Fig.1).

Figure 1: Monsieur Higgs boson struggles to understand the Higgs mechanism.

Previously I discussed how one of the best and most precise ways to study the Higgs boson is just by ‘shining light on it’, or more specifically via its decays to pairs of photons. Today I want to expand on another fantastic and precise way to study the Higgs which I briefly mentioned previously: Higgs decays to four charged leptons (specifically electrons and muons), shown in Fig. 2. This is a channel near and dear to my heart, and it has a long history: way before the Higgs was actually discovered at 125 GeV, it was realized to be among the best ways to find a Higgs boson over a large range of potential masses above around 100 GeV. This led to it being dubbed the “gold plated” Higgs discovery mode, or “golden channel”, and in fact it was one of the first channels (along with the diphoton channel) in which the 125 GeV Higgs boson was discovered at the LHC.

Figure 2: Higgs decays to four leptons are mediated by the various physics effects which can enter in the grey blob. Could new physics be hiding in there?

One of the characteristics that makes the golden channel so valuable as a probe of the Higgs is that it is very precisely measured by the ATLAS and CMS experiments and has a very good signal to background ratio. Furthermore, it is very well understood theoretically since most of the dominant contributions can be calculated explicitly for both the signal and background. The final feature of the golden channel that makes it valuable, and the one that I will focus on today, is that it contains a wealth of information in each event due to the large number of observables associated with the four final state leptons.

Since there are four charged leptons which are measured and each has an associated four-momentum, there are in principle 16 separate numbers which can be measured in each event. However, the masses of the charged leptons are tiny in comparison to the Higgs mass, so we can treat them as massless (see Footnote 1) to a very good approximation. This then reduces (using energy-momentum conservation) the number of observables to 12 which, in the lab frame, are given by the transverse momentum, rapidity, and azimuthal angle of each lepton. Now, Lorentz invariance tells us that physics doesn’t care which frame of reference we pick to analyze the four lepton system. This allows us to perform a Lorentz transformation from the lab frame, where the leptons are measured but where the underlying physics can be obscured, to the much more convenient and intuitive center of mass frame of the four lepton system. By energy-momentum conservation, this is also the center of mass frame of the Higgs boson. In this frame the Higgs boson is at rest and the *pairs* of leptons come out back to back (see Footnote 2).
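As a sanity check of this change of frame, here is a minimal numerical sketch (the lepton momenta below are made-up illustrative values, not data from any experiment) that boosts four massless lab-frame lepton four-vectors into their common center of mass frame and confirms that the spatial momenta cancel there, with the total energy equal to the invariant mass of the system:

```python
import numpy as np

def boost_to_rest_frame(p, p_total):
    """Boost four-vector p = (E, px, py, pz) into the rest frame of p_total."""
    E, P = p_total[0], p_total[1:]
    m = np.sqrt(E**2 - P @ P)        # invariant mass of the system
    beta = P / E                     # boost velocity of the system
    gamma = E / m
    bp = beta @ p[1:]
    E_new = gamma * (p[0] - bp)
    # standard boost formula, using (gamma - 1)/beta^2 = gamma^2/(gamma + 1)
    P_new = p[1:] + beta * (gamma**2 / (gamma + 1.0) * bp - gamma * p[0])
    return np.concatenate(([E_new], P_new))

# Four massless leptons in the lab frame (E = |p| for each), in GeV:
leptons = [np.array([40.0,  24.0,  32.0,  0.0]),
           np.array([35.0,   0.0,   0.0, 35.0]),
           np.array([30.0, -30.0,   0.0,  0.0]),
           np.array([25.0,   0.0, -15.0, 20.0])]
p_total = sum(leptons)
cm = [boost_to_rest_frame(p, p_total) for p in leptons]
# In the CM frame the spatial momenta sum to zero, and the energies
# sum to the invariant mass of the four-lepton system.
```

Because the boost is linear, boosting each lepton and summing gives the same result as boosting the total four-momentum, which by construction lands at rest.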

In this frame the 12 observables can be divided into 4 production and 8 decay observables (see Footnote 3). The 4 production variables are characterized by the transverse momentum (which has two components), the rapidity, and the azimuthal angle of the four lepton system. The differential spectra for these four variables (especially the transverse momentum and rapidity) depend very much on how the Higgs is produced and are also affected by parton distribution functions at hadron colliders like the LHC. Thus the differential spectra for these variables cannot in general be computed explicitly for Higgs production at the LHC.

The 8 decay observables are characterized by the center of mass energy of the four lepton system, which in this case is equal to the Higgs mass, as well as two invariant masses associated with each pair of leptons (how one picks the pairs is arbitrary). There are also five angles ($\Theta, \theta_1, \theta_2, \Phi, \Phi_1$) shown in Fig. 3 for a particular choice of lepton pairings. The angle $\Theta$ is defined as the angle between the beam axis (labeled by p or z) and the axis defined to be in the direction of the momentum of one of the lepton pair systems (labeled by Z1 or z’). This angle also defines the ‘production plane’. The angles $\theta_1, \theta_2$ are the polar angles defined in the lepton pair rest frames. The angle $\Phi_1$ is the azimuthal angle between the production plane and the plane formed from the four vectors of one of the lepton pairs (in this case the muon pair). Finally, $\Phi$ is defined as the azimuthal angle between the decay planes formed out of the two lepton pairs.

Figure 3: Angular center of mass observables in Higgs to four lepton decays.
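To make the pair observables concrete, here is a small illustrative sketch (the momenta are a hypothetical back-to-back configuration in the Higgs rest frame, and the plane angle below is unsigned, ignoring the sign conventions a real analysis would fix) computing the two pair invariant masses and the angle between the decay planes:

```python
import numpy as np

def pair_mass(p_a, p_b):
    """Invariant mass of a lepton pair; p = (E, px, py, pz), leptons ~ massless."""
    s = p_a + p_b
    return np.sqrt(max(s[0]**2 - s[1:] @ s[1:], 0.0))

def decay_plane_angle(p1, p2, p3, p4):
    """Unsigned angle between the two decay planes in the four-lepton rest frame."""
    n1 = np.cross(p1[1:], p2[1:])    # normal to the plane of pair (1,2)
    n2 = np.cross(p3[1:], p4[1:])    # normal to the plane of pair (3,4)
    c = n1 @ n2 / (np.linalg.norm(n1) * np.linalg.norm(n2))
    return np.arccos(np.clip(c, -1.0, 1.0))

# Toy configuration in the Higgs rest frame (GeV): two massless pairs,
# pair (1,2) in the x-z plane, pair (3,4) in the y-z plane, total momentum zero.
p1 = np.array([31.25,  18.75,   0.0,  25.0])
p2 = np.array([31.25, -18.75,   0.0,  25.0])
p3 = np.array([31.25,   0.0,  18.75, -25.0])
p4 = np.array([31.25,   0.0, -18.75, -25.0])
# Total energy is 125 GeV; the two decay planes are perpendicular here.
```

With this configuration both pair masses come out to 37.5 GeV and the plane angle is π/2, as expected from the geometry.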

To a good approximation these decay observables are independent of how the Higgs boson is produced. Furthermore, unlike the production variables, the fully differential spectra for the decay observables can be computed explicitly and even analytically. Each of them contains information about the properties of the Higgs boson as do the correlations between them. We see an example of this in Fig. 4 where we show the one dimensional (1D) spectrum for the Φ variable under various assumptions about the CP properties of the Higgs boson.

Figure 4: Here I show various examples for the Φ differential spectrum assuming different possibilities for the CP properties of the Higgs boson.

This variable has long been known to be sensitive to the CP properties of the Higgs boson. An effect like CP violation would show up as an asymmetry in the Φ distribution, which we can see in curve number 5, shown in orange. Keep in mind, though, that while I show a 1D spectrum for Φ, the Higgs to four lepton decay is a multidimensional differential spectrum over the 8 decay observables and all of their correlations. So although a 1D projection onto Φ already shows how information about the Higgs is contained in these distributions, MUCH more information is contained in the fully differential decay width of Higgs to four lepton decays. This makes the golden channel a powerful probe of the detailed properties of the Higgs boson.
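The kind of asymmetry described here can be illustrated with a toy Monte Carlo (the spectrum shapes below are invented for illustration and are not the actual golden-channel distributions): a simple counting asymmetry in sin 2Φ vanishes for a Φ-symmetric spectrum but not for one with a CP-odd sin 2Φ admixture.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_phi(shape_fn, n, fmax=1.5):
    """Rejection-sample Φ in [0, 2π) from an (unnormalized) spectrum shape_fn."""
    out = []
    while len(out) < n:
        phi = rng.uniform(0.0, 2 * np.pi, size=n)
        keep = rng.uniform(0.0, fmax, size=n) < shape_fn(phi)
        out.extend(phi[keep])
    return np.array(out[:n])

def phi_asymmetry(phi):
    """Counting asymmetry in sin 2Φ: zero for any Φ-symmetric spectrum."""
    s = np.sin(2 * phi)
    return (np.sum(s > 0) - np.sum(s < 0)) / len(phi)

# Toy spectra (illustrative shapes only):
cp_even = sample_phi(lambda x: 1 + 0.5 * np.cos(2 * x), 100_000)  # symmetric
cp_odd  = sample_phi(lambda x: 1 + 0.5 * np.sin(2 * x), 100_000)  # asymmetric
```

For the symmetric shape the asymmetry is consistent with zero; for the sin 2Φ admixture it comes out near 1/π, sharply nonzero even though both 1D histograms look broadly similar.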

OK nibblers, hopefully I have given you a flavor of the golden channel and why it is valuable as a probe of the Higgs boson. In a future post I will discuss in more detail the various types of physics effects which can enter in the grey blob in Fig. 2. Until then, keep nibbling and don’t let dead diphotons get you down!

Footnote 1: If you are feeling uneasy about the fact that the Higgs can only “talk to” particles with mass and yet can decay to four (at least approximately) massless leptons, keep in mind that they do not interact directly. The Higgs decay to four charged leptons is mediated by intermediate particles which DO talk to the Higgs and the charged leptons.

Footnote 2: More precisely, in the Higgs rest frame, the four vector formed out of the sum of the two four vectors of any pair of leptons which are chosen will be back to back with the four vector formed out of the sum of the second pair of leptons.

Footnote 3: This division into production and decay variables after transforming to the center of mass frame of the four lepton system (i.e. the Higgs rest frame) is only possible in practice because all four leptons are visible and their four-momenta can be reconstructed with very good precision at the LHC. This allows the rest frame of the Higgs boson to be reconstructed on an event-by-event basis. For final states with missing energy, or with jets which cannot be reconstructed with high precision, transforming to the Higgs rest frame is in general not possible.

### John Baez — The Circular Electron Positron Collider

Chen-Ning Yang is perhaps China’s most famous particle physicist. Together with Tsung-Dao Lee, he won the Nobel prize in 1957 for discovering that the laws of physics know the difference between left and right. He helped create Yang–Mills theory: the theory that describes all the forces in nature except gravity. He helped find the Yang–Baxter equation, which describes what particles do when they move around on a thin sheet of matter, tracing out braids.

Right now the world of particle physics is in a shocked, somewhat demoralized state because the Large Hadron Collider has not yet found any physics beyond the Standard Model. Some Chinese scientists want to forge ahead by building an even more powerful, even more expensive accelerator.

But Yang recently came out against this. This is a big deal, because he is very prestigious, and only China has the will to pay for the next machine. The director of the Chinese institute that wants to build the next machine, Wang Yifang, issued a point-by-point rebuttal of Yang the very next day.

Over on G+, Willie Wong translated some of Wang’s rebuttal in some comments to my post on this subject. The real goal of my post here is to make this translation a bit easier to find—not because I agree with Wang, but because this discussion is important: it affects the future of particle physics.

First let me set the stage. In 2012, two months after the Large Hadron Collider found the Higgs boson, the Institute of High Energy Physics proposed a bigger machine: the Circular Electron Positron Collider, or CEPC.

This machine would be a ring 100 kilometers around. It would collide electrons and positrons at an energy of 250 GeV, about twice what you need to make a Higgs. It could make lots of Higgs bosons and study their properties. It might find something new, too! Of course that would be the hope.

It would cost $6 billion, and the plan was that China would pay for 70% of it. Nobody knows who would pay for the rest. According to Science:

On 4 September, Yang, in an article posted on the social media platform WeChat, says that China should not build a supercollider now. He is concerned about the huge cost and says the money would be better spent on pressing societal needs. In addition, he does not believe the science justifies the cost: the LHC confirmed the existence of the Higgs boson, he notes, but it has not discovered new particles or inconsistencies in the standard model of particle physics. The prospect of an even bigger collider succeeding where the LHC has failed is “a guess on top of a guess,” he writes. Yang argues that high-energy physicists should eschew big accelerator projects for now and start blazing trails in new experimental and theoretical approaches.

That same day, IHEP’s director, Wang Yifang, posted a point-by-point rebuttal on the institute’s public WeChat account. He criticized Yang for rehashing arguments he had made in the 1970s against building the BEPC. “Thanks to comrade [Deng] Xiaoping,” who didn’t follow Yang’s advice, Wang wrote, “IHEP and the BEPC … have achieved so much today.” Wang also noted that the main task of the CEPC would not be to find new particles, but to carry out detailed studies of the Higgs boson.

Yang did not respond to a request for comment. But some scientists contend that the thrust of his criticisms is against the CEPC’s anticipated upgrade, the Super Proton-Proton Collider (SPPC). “Yang’s objections are directed mostly at the SPPC,” says Li Miao, a cosmologist at Sun Yat-sen University, Guangzhou, in China, who says he is leaning toward supporting the CEPC. That’s because the cost Yang cites ($20 billion) is the estimated price tag of both the CEPC and the SPPC, Li says, and it is the SPPC that would endeavor to make discoveries beyond the standard model.

Still, opposition to the supercollider project is mounting outside the high-energy physics community. Cao Zexian, a researcher at CAS’s Institute of Physics here, contends that Chinese high-energy physicists lack the ability to steer or lead research in the field. China also lacks the industrial capacity for making advanced scientific instruments, he says, which means a supercollider would depend on foreign firms for critical components. Luo Huiqian, another researcher at the Institute of Physics, says that most big science projects in China have suffered from arbitrary cost cutting; as a result, the finished product is often a far cry from what was proposed. He doubts that the proposed CEPC would be built to specifications.

The state news agency Xinhua has lauded the debate as “progress in Chinese science” that will make big science decision-making “more transparent.” Some, however, see a call for transparency as a bad omen for the CEPC. “It means the collider may not receive the go-ahead in the near future,” asserts Institute of Physics researcher Wu Baojun. Wang acknowledged that possibility in a 7 September interview with Caijing magazine: “opposing voices naturally have an impact on future approval of the project,” he said.

Willie Wong prefaced his translation of Wang’s rebuttal with this:

Here is a translation of the essential parts of the rebuttal; some standard Chinese language disclaimers of deference etc are omitted. I tried to make the translation as true to the original as possible; the viewpoints expressed are not my own.

Here is the translation:

Today (September 4), C. N. Yang’s article “China should not build an SSC today” was published. As a scientist who works on the front line of high energy physics and the current director of the Institute of High Energy Physics of the Chinese Academy of Sciences, I cannot agree with his viewpoint.

(A) Dr. Yang’s first objection is that a supercollider is a bottomless hole. His objection stemmed from the American SSC wasting 3 billion US dollars and amounting to naught. The LHC cost over 10 billion US dollars. Thus the proposed Chinese accelerator cannot cost less than 20 billion US dollars, with no guaranteed returns. [Ed: emphasis original]

Here, there are actually three problems. The first is “why did SSC fail”? The second is “how much would a Chinese collider cost?” And the third is “is the estimate reasonable and realistic?” Here I address them point by point.

(1) Why did the American SSC fail? Are all colliders bottomless pits?

The many reasons leading to the failure of the American SSC include the government deficit at the time, the fight for funding against the International Space Station, the party politics of the United States, and the regional competition between Texas and other states. Additionally, there were problems with poor management, bad budgeting, ballooning construction costs, and failure to secure international collaboration. See references [2,3] [Ed: consult original article for references; items 1-3 are English language]. In reality, “exceeding the budget” is definitely not the primary reason for the failure of the SSC; rather, the failure should be attributed to some special and circumstantial reasons, caused mainly by political elements.

For the US, abandoning the SSC was a very incorrect decision. It lost the US the chance for discovering the Higgs Boson, as well as the foundations and opportunities for future development, and thereby also the leadership position that the US had occupied internationally in high energy physics until then. This definitely had a very negative impact on big science initiatives in the US, and caused one generation of Americans to lose the courage to dream. The reasons given by the American scientific community against the SSC are very similar to what we hear today against the Chinese collider project. But actually the cancellation of the SSC did not increase funding to other scientific endeavors. Of course, going ahead with the SSC would not have reduced the funding to other scientific endeavors either, and many people who objected to the project are not regretting it.

Since then, the LHC was constructed in Europe, and achieved great success. Its construction exceeded its original budget, but not by a lot. This shows that supercollider projects do not have to be bottomless pits, and have a chance to succeed.

The Chinese political landscape is entirely different from that of the US. In particular, for large scale constructions, the political system is superior. China has already accomplished to date many tasks which the Americans would not, or could not do; many more will happen in the future. The failure of SSC doesn’t mean that we cannot do it. We should scientifically analyze the situation, and at the same time foster international collaboration, and properly manage the budget.

(2) How much would it cost? Our planned collider (using a circumference of 100 kilometers for computations) will proceed in two steps. [Ed: details omitted. The author estimated that the electron-positron collider will cost 40 billion Yuan, followed by the proton-proton collider which will cost 100 billion Yuan, not accounting for inflation, with approximately 10 years of construction time for each phase.] The two-phase planning is meant to showcase the scientific longevity of the project, especially the entrainment of other technical developments (e.g. high energy superconductors), and the fact that the second phase [ed: the proton-proton collider] is complementary to the scientific and technical developments of the first phase. The reason that the second phase designs are incorporated in the discussion is to prevent the scenario where design elements of the first phase inadvertently shut off the possibility of further expansion in the second phase.

(3) Is this estimate realistic? Are we going to go down the same road as the American SSC?

First, note that in the past 50 years there were many successful colliders internationally (LEP, LHC, PEPII, KEKB/SuperKEKB, etc.) and many unsuccessful ones (ISABELLE, SSC, FAIR, etc.). The failed ones were all proton accelerators; all the electron colliders have been successful. The main reason is that proton accelerators are more complicated, and it is harder to correctly estimate the costs related to constructing machines beyond the current frontiers.

There are many successful large-scale constructions in China. In the 40 years since the founding of the high energy physics institute, we’ve built [list of high energy experiment facilities, I don’t know all their names in English], each costing over 100 million Yuan, and none went more than 5% over budget in terms of actual construction costs, time to completion, or meeting of milestones. We have well developed expertise in budgeting, construction, and management.

For the CEPC (electron-positron collider) our estimates relied on two methods:

(i) Summing of the parts: separately estimating costs of individual elements and adding them up.

(ii) Comparisons: using costs for elements derived from costs of completed instruments both domestically and abroad.

At the level of the total cost and at the systems level, the two methods should produce cost estimates within 20% of each other.

After completing the initial design [ref. 1], we produced a list of more than 1000 required pieces of equipment, and based our estimates on that list. The estimates were reviewed by local and international experts.

For the SPPC (the proton-proton collider; second phase) we only used the second method (comparison). This is due to the second phase not being the main mission at hand, and we are not yet sure whether we should commit to the second phase. It is therefore not very meaningful to discuss its potential cost right now. We are committed to only building the SPPC once we are sure the science and the technology are mature.

(B) The second reason given by Dr. Yang is that China is still a developing country, and there are many social-economic problems that should be solved before considering a supercollider.

Any country, especially one as big as China, must consider both the immediate and the long term in its planning. Of course social-economic problems need to be solved, and indeed solving them currently takes the lion’s share of our national budget. But we also need to consider the long term, including an appropriate amount of expenditure on basic research, to enable our continuous development and the potential to lead the world. China at the end of the Qing dynasty had a rich populace and the world’s highest GDP. But even though the government had the ability to purchase armaments, the lack of scientific understanding condemned the country to always be on the losing side of wars.

In the past few hundred years, developments in understanding the structure of matter, from molecules and atoms to the nucleus and the elementary particles, have contributed to and led the scientific developments of their eras. High energy physics pursues the finest structure of matter and its laws; the techniques used cover many different fields, from accelerators and detectors to low temperatures, superconductivity, microwaves, high frequencies, vacuum, electronics, high precision instrumentation, automatic controls, computer science and networking, and in many ways it has led to developments in those fields and their broad adoption. This is an indicator field for basic science and technical development. Building the supercollider can result in China occupying the leadership position in such diverse scientific fields for several decades, and can also lead to the domestic production of many important scientific and technical instruments. Furthermore, it will allow us to attract international intellectual capital, and allow the training of thousands of world-leading specialists in our institutes. How is this not an urgent need for the country?

In fact, the impression the Chinese government and the average Chinese people create for the world at large is of a populace with lots of money, and also infatuated with money. It is hard for a large country to have an international voice and influence without significant contributions to human culture. This influence, in turn, affects the benefits China receives from other countries. As a fraction of current GDP, the cost of the proposed project (including the phase 2 SPPC) does not exceed that of the Beijing electron-positron collider completed in the 80s, and is in fact lower than LEP, LHC, SSC, and ILC.

Designing and starting the construction of the next supercollider within the next 5 years is a rare opportunity to achieve a leadership position internationally in the field of high energy physics. First, the newly discovered Higgs boson has a relatively low mass, which allows us to probe it further using a circular electron-positron collider; furthermore, such colliders have a chance to be modified into proton colliders, so this facility will have over 5 decades of scientific use. Secondly, Europe, the US, and Japan all already have scientific items on their agendas, and within 20 years probably cannot construct similar facilities. This gives us an advantage in competitiveness. Thirdly, we already have the experience of building the Beijing electron-positron collider, so such a facility plays to our strengths. The window of opportunity typically lasts only 10 years; if we miss it, we don’t know when the next window will be. Furthermore, we have extensive experience in underground construction, and the Chinese economy is currently at a stage of high growth. We have the ability to do the construction and also the scientific need. Therefore a supercollider is a very suitable item to consider.

(C) The third reason given by Dr. Yang is that constructing a supercollider necessarily excludes funding other basic sciences.

China currently spends 5% of its R&D budget on basic research; internationally 15% is more typical for developed countries. As a developing country aiming to join the ranks of developed countries, and as a large country, I believe we should aim to raise the ratio gradually to 10% and eventually to 15%. In terms of numbers, funding for basic science has a large potential for growth (around 100 billion Yuan per annum) without taking away from other basic science research.

On the other hand, where should the increased funding be directed? Everyone knows that a large portion of our basic science research budgets is spent on purchasing scientific instruments, especially from international sources. If we evenly distribute the increased funding among all basic science fields, the end result is raising the GDP of the US, Europe, and Japan. If we instead spend 10 years putting 30 billion Yuan into accelerator science, more than 90% of the money will remain in the country, improving our technical development and the market share of domestic companies. This will also allow us to train many new scientists and engineers, and greatly improve the state of the art in domestically produced scientific instruments.

In addition, putting emphasis on high energy physics will only bring us up to the normal funding level internationally (it is a fact that particle physics and nuclear physics are severely underfunded in China). For the purposes of developing a world-leading big science project, CEPC is a very good candidate. And it does not contradict a desire to also develop other basic sciences.

(D) Dr. Yang’s fourth objection is that both supersymmetry and quantum gravity have not been verified, and the particles we hope to discover using the new collider will in fact be nonexistent.

That is of course not the goal of collider science. In [ref 1], which I gave to Dr. Yang myself, we clearly discussed the scientific purpose of the instrument. Briefly speaking, the standard model is only an effective theory in the low energy limit, and a new and deeper theory is needed. Even though there is some experimental evidence beyond the standard model, more data will be needed to indicate the correct direction in which to develop the theory. Of the known problems with the standard model, most are related to the Higgs boson. Thus a deeper physical theory should show hints in a better understanding of the Higgs boson. CEPC can probe the Higgs boson to 1% precision [ed. I am not sure what this means], 10 times better than the LHC. From this we have the hope to correctly identify various properties of the Higgs boson, and test whether it in fact matches the standard model. At the same time, CEPC has the possibility of measuring the self-coupling of the Higgs boson, and of understanding the Higgs contribution to the vacuum phase transition, which is important for understanding the early universe. [Ed. in this previous sentence, the translations are a bit questionable since some HEP jargon is used with which I am not familiar] Therefore, regardless of whether the LHC has discovered new physics, CEPC is necessary.

If there are new coupling mechanisms for the Higgs, new associated particles, a composite structure for the Higgs boson, or other differences from the standard model, we can continue with the second phase of the proton-proton collider, to directly probe the difference. Of course this could be due to supersymmetry, but it could also be due to other particles. For us experimentalists, while we care about theoretical predictions, our experiments are not designed only for them. To predict at this moment in time whether a collider can or cannot discover a hypothetical particle seems premature, and is not the viewpoint of the HEP community in general.

(E) The fifth objection is that in the past 70 years high energy physics has not led to tangible benefits for humanity, and likely will not in the future.

In the past 70 years, there have been many results from high energy physics which led to techniques common in everyday life. [Ed: the list of examples includes synchrotron radiation, free electron lasers, spallation neutron sources, MRI, PET, radiation therapy, touch screens, smart phones, and the world-wide web. I omit the prose.]

[Ed. Author proceeds to discuss hypothetical economic benefits from
a) superconductor science
b) microwave source
c) cryogenics
d) electronics
sort of the usual stuff you see in funding proposals.]

(F) The sixth reason was that the Institute of High Energy Physics of the Chinese Academy of Sciences has not produced much in the past 30 years. The major scientific contributions to the proposed collider will be directed by non-Chinese, and so the Nobel will also go to a non-Chinese.

[Ed. I’ll skip this section because it is a self-congratulatory pat on one’s back (we actually did pretty well for the amount of money invested), a promise to promote Chinese participation in the project (in accordance to the economic investment), and the required comment that “we do science for the sake of science, and not for winning the Nobel.”]

(G) The seventh reason is that the future in HEP is in developing a new technique to accelerate particles, and developing a geometric theory, not in building large accelerators.

A new method to accelerate particles is definitely an important aspect of accelerator science. In the next several decades this can prove useful for scattering experiments or for applied fields where beam confinement is not essential. For high energy colliders, in terms of beam emittance and energy efficiency, new acceleration principles have a long way to go. During this period, high energy physics cannot simply be put on hold. As for “geometric theory” or “string theory”, these are too far from being experimentally accessible, and are not problems we can consider currently.

People disagree on the future of high energy physics. Currently there are no Chinese winners of the Nobel prize in physics, but there are many internationally. Dr. Yang’s viewpoints are clearly outside the mainstream, not just currently, but also over the past several decades. Dr. Yang is documented to have held a pessimistic view of high energy physics and its future since the 60s, and that’s how he missed out on the discovery of the standard model. He has been on record as being against Chinese collider science since the 70s. It is fortunate that the government supported the Institute of High Energy Physics and constructed various supporting facilities, leading to our current achievements in synchrotron radiation and neutron scattering. For the future, we should listen to the younger scientists at the forefront of current research, for that’s how we can gain international recognition for our scientific research.

It will be very interesting to see how this plays out.

### John Baez — Struggles with the Continuum (Part 5)

Quantum field theory is the best method we have for describing particles and forces in a way that takes both quantum mechanics and special relativity into account. It makes many wonderfully accurate predictions. And yet, it has embroiled physics in some remarkable problems: struggles with infinities!

I want to sketch some of the key issues in the case of quantum electrodynamics, or ‘QED’. The history of QED has been nicely told here:

• Silvan S. Schweber, QED and the Men Who Made It: Dyson, Feynman, Schwinger, and Tomonaga, Princeton U. Press, Princeton, 1994.

Instead of explaining the history, I will give a very simplified account of the current state of the subject. I hope that experts forgive me for cutting corners and trying to get across the basic ideas at the expense of many technical details. The nonexpert is encouraged to fill in the gaps with the help of some textbooks.

QED involves just one dimensionless parameter, the fine structure constant:

$\displaystyle{ \alpha = \frac{1}{4 \pi \epsilon_0} \frac{e^2}{\hbar c} \approx \frac{1}{137.036} }$

Here $e$ is the electron charge, $\epsilon_0$ is the permittivity of the vacuum, $\hbar$ is Planck’s constant and $c$ is the speed of light. We can think of $\alpha^{1/2}$ as a dimensionless version of the electron charge. It says how strongly electrons and photons interact.
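As a quick numerical sanity check (not part of the original post), the formula above can be evaluated with standard CODATA values of the constants:

```python
import math

# CODATA values; e, hbar, and c are exact in SI units since the 2019
# redefinition, while eps0 is a measured quantity.
e = 1.602176634e-19        # electron charge, C
eps0 = 8.8541878128e-12    # vacuum permittivity, F/m
hbar = 1.054571817e-34     # reduced Planck constant, J s
c = 299792458.0            # speed of light, m/s

# alpha = e^2 / (4 pi eps0 hbar c), a dimensionless combination
alpha = e**2 / (4 * math.pi * eps0 * hbar * c)
print(alpha, 1 / alpha)    # ~0.0072974, ~137.036
```

The SI factors of coulombs, farads, and joules all cancel, which is the point: $\alpha$ is a pure number.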

Nobody knows why the fine structure constant has the value it does! In computations, we are free to treat it as an adjustable parameter. If we set it to zero, quantum electrodynamics reduces to a free theory, where photons and electrons do not interact with each other. A standard strategy in QED is to take advantage of the fact that the fine structure constant is small and expand answers to physical questions as power series in $\alpha^{1/2}.$ This is called ‘perturbation theory’, and it allows us to exploit our knowledge of free theories.

One of the main questions we try to answer in QED is this: if we start with some particles with specified energy-momenta in the distant past, what is the probability that they will turn into certain other particles with certain other energy-momenta in the distant future? As usual, we compute this probability by first computing a complex amplitude and then taking the square of its absolute value. The amplitude, in turn, is computed as a power series in $\alpha^{1/2}.$

The term of order $\alpha^{n/2}$ in this power series is a sum over Feynman diagrams with $n$ vertices. For example, suppose we are computing the amplitude for two electrons with some specified energy-momenta to interact and become two electrons with some other energy-momenta. One Feynman diagram appearing in the answer is this:

Here the electrons exchange a single photon. Since this diagram has two vertices, it contributes a term of order $\alpha.$ The electrons could also exchange two photons:

giving a term of order $\alpha^2.$ A more interesting term of order $\alpha^2$ is this:

Here the electrons exchange a photon that splits into an electron-positron pair and then recombines. There are infinitely many diagrams with two electrons coming in and two going out. However, there are only finitely many with $n$ vertices. Each of these contributes a term proportional to $\alpha^{n/2}$ to the amplitude.

In general, the external edges of these diagrams correspond to the experimentally observed particles coming in and going out. The internal edges correspond to ‘virtual particles’: that is, particles that are not directly seen, but appear in intermediate steps of a process.

Each of these diagrams is actually a notation for an integral! There are systematic rules for writing down the integral starting from the Feynman diagram. To do this, we first label each edge of the Feynman diagram with an energy-momentum, a variable $p \in \mathbb{R}^4.$ The integrand, which we shall not describe here, is a function of all these energy-momenta. In carrying out the integral, the energy-momenta of the external edges are held fixed, since these correspond to the experimentally observed particles coming in and going out. We integrate over the energy-momenta of the internal edges, which correspond to virtual particles, while requiring that energy-momentum is conserved at each vertex.

However, there is a problem: the integral typically diverges! Whenever a Feynman diagram contains a loop, the energy-momenta of the virtual particles in this loop can be arbitrarily large. Thus, we are integrating over an infinite region. In principle the integral could still converge if the integrand goes to zero fast enough. However, we rarely have such luck.
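To see the logarithmic character such divergences can have, here is a toy integral, $\int_0^\Lambda p^3\,dp/(p^2+m^2)^2$, where the factor $p^3$ mimics the four-dimensional momentum-space measure. This is an illustrative caricature, not an actual QED integral:

```python
import math

def toy_loop_integral(cutoff, m=1.0, steps=200000):
    """Midpoint-rule estimate of the toy integral
    int_0^cutoff p^3 / (p^2 + m^2)^2 dp,
    where p^3 mimics the 4d momentum-space measure d^4 p."""
    h = cutoff / steps
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) * h
        total += p**3 / (p**2 + m**2) ** 2 * h
    return total

# The integral grows without bound as the cutoff is raised:
for L in (10.0, 100.0, 1000.0):
    print(L, toy_loop_integral(L))
# Each factor of 10 in the cutoff adds roughly ln(10) ≈ 2.303.
```

The result grows like $\log \Lambda$, so it diverges, albeit slowly, as $\Lambda \to \infty$; this mirrors the logarithmic divergences typical of one-loop QED diagrams.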

What does this mean, physically? It means that if we allow virtual particles with arbitrarily large energy-momenta in intermediate steps of a process, there are ‘too many ways for this process to occur’, so the amplitude for this process diverges.

Ultimately, the continuum nature of spacetime is to blame. In quantum mechanics, particles with large momenta are the same as waves with short wavelengths. Allowing light with arbitrarily short wavelengths created the ultraviolet catastrophe in classical electromagnetism. Quantum electromagnetism averted that catastrophe—but the problem returns in a different form as soon as we study the interaction of photons and charged particles.

Luckily, there is a strategy for tackling this problem. The integrals for Feynman diagrams become well-defined if we impose a ‘cutoff’, integrating only over energy-momenta $p$ in some bounded region, say a ball of some large radius $\Lambda.$ In quantum theory, a particle with momentum of magnitude greater than $\Lambda$ is the same as a wave with wavelength less than $\hbar/\Lambda.$ Thus, imposing the cutoff amounts to ignoring waves of short wavelength—and for the same reason, ignoring waves of high frequency. We obtain well-defined answers to physical questions when we do this. Unfortunately the answers depend on $\Lambda,$ and if we let $\Lambda \to \infty,$ they diverge.

However, this is not the correct limiting procedure. Indeed, among the quantities that we can compute using Feynman diagrams are the charge and mass of the electron! Its charge can be computed using diagrams in which an electron emits or absorbs a photon:

Similarly, its mass can be computed using a sum over Feynman diagrams where one electron comes in and one goes out.

The interesting thing is this: to do these calculations, we must start by assuming some charge and mass for the electron—but the charge and mass we get out of these calculations do not equal the masses and charges we put in!

The reason is that virtual particles affect the observed charge and mass of a particle. Heuristically, at least, we should think of an electron as surrounded by a cloud of virtual particles. These contribute to its mass and ‘shield’ its electric field, reducing its observed charge. It takes some work to translate between this heuristic story and actual Feynman diagram calculations, but it can be done.

Thus, there are two different concepts of mass and charge for the electron. The numbers we put into the QED calculations are called the ‘bare’ charge and mass, $e_\mathrm{bare}$ and $m_\mathrm{bare}.$ Poetically speaking, these are the charge and mass we would see if we could strip the electron of its virtual particle cloud and see it in its naked splendor. The numbers we get out of the QED calculations are called the ‘renormalized’ charge and mass, $e_\mathrm{ren}$ and $m_\mathrm{ren}.$ These are computed by doing a sum over Feynman diagrams. So, they take virtual particles into account. These are the charge and mass of the electron clothed in its cloud of virtual particles. It is these quantities, not the bare quantities, that should agree with experiment.

Thus, the correct limiting procedure in QED calculations is a bit subtle. For any value of $\Lambda$ and any choice of $e_\mathrm{bare}$ and $m_\mathrm{bare},$ we compute $e_\mathrm{ren}$ and $m_\mathrm{ren}.$ The necessary integrals all converge, thanks to the cutoff. We choose $e_\mathrm{bare}$ and $m_\mathrm{bare}$ so that $e_\mathrm{ren}$ and $m_\mathrm{ren}$ agree with the experimentally observed charge and mass of the electron. The bare charge and mass chosen this way depend on $\Lambda,$ so call them $e_\mathrm{bare}(\Lambda)$ and $m_\mathrm{bare}(\Lambda).$

Next, suppose we want to compute the answer to some other physics problem using QED. We do the calculation with a cutoff $\Lambda,$ using $e_\mathrm{bare}(\Lambda)$ and $m_\mathrm{bare}(\Lambda)$ as the bare charge and mass in our calculation. Then we take the limit $\Lambda \to \infty.$

In short, rather than simply fixing the bare charge and mass and letting $\Lambda \to \infty,$ we cleverly adjust the bare charge and mass as we take this limit. This procedure is called ‘renormalization’, and it has a complex and fascinating history:

• Laurie M. Brown, ed., Renormalization: From Lorentz to Landau (and Beyond), Springer, Berlin, 2012.

There are many technically different ways to carry out renormalization, and our account so far neglects many important issues. Let us mention three of the simplest.

First, besides the classes of Feynman diagrams already mentioned, we must also consider those where one photon goes in and one photon goes out, such as this:

These affect properties of the photon, such as its mass. Since we want the photon to be massless in QED, we have to adjust parameters as we take $\Lambda \to \infty$ to make sure we obtain this result. We must also consider Feynman diagrams where nothing comes in and nothing comes out—so-called ‘vacuum bubbles’—and make these behave correctly as well.

Second, the procedure just described, where we impose a ‘cutoff’ and integrate over energy-momenta $p$ lying in a ball of radius $\Lambda,$ is not invariant under Lorentz transformations. Indeed, any theory featuring a smallest time or smallest distance violates the principles of special relativity: thanks to time dilation and Lorentz contractions, different observers will disagree about times and distances. We could accept that Lorentz invariance is broken by the cutoff and hope that it is restored in the $\Lambda \to \infty$ limit, but physicists prefer to maintain symmetry at every step of the calculation. This requires some new ideas: for example, replacing Minkowski spacetime with 4-dimensional Euclidean space. In 4-dimensional Euclidean space, Lorentz transformations are replaced by rotations, and a ball of radius $\Lambda$ is a rotation-invariant concept. To do their Feynman integrals in Euclidean space, physicists often let time take imaginary values. They do their calculations in this context and then transfer the results back to Minkowski spacetime at the end. Luckily, there are theorems justifying this procedure.

Third, besides infinities that arise from waves with arbitrarily short wavelengths, there are infinities that arise from waves with arbitrarily long wavelengths. The former are called ‘ultraviolet divergences’. The latter are called ‘infrared divergences’, and they afflict theories with massless particles, like the photon. For example, in QED the collision of two electrons will emit an infinite number of photons with very long wavelengths and low energies, called ‘soft photons’. In practice this is not so bad, since any experiment can only detect photons with energies above some nonzero value. However, infrared divergences are conceptually important. It seems that in QED any electron is inextricably accompanied by a cloud of soft photons. These are real, not virtual particles. This may have remarkable consequences.

Battling these and many other subtleties, many brilliant physicists and mathematicians have worked on QED. The good news is that this theory has been proved to be ‘perturbatively renormalizable’:

• J. S. Feldman, T. R. Hurd, L. Rosen and J. D. Wright, QED: A Proof of Renormalizability, Lecture Notes in Physics 312, Springer, Berlin, 1988.

• Günter Scharf, Finite Quantum Electrodynamics: The Causal Approach, Springer, Berlin, 1995.

This means that we can indeed carry out the procedure roughly sketched above, obtaining answers to physical questions as power series in $\alpha^{1/2}.$

The bad news is we do not know if these power series converge. In fact, it is widely believed that they diverge! This puts us in a curious situation.

For example, consider the magnetic dipole moment of the electron. An electron, being a charged particle with spin, has a magnetic field. A classical computation says that its magnetic dipole moment is

$\displaystyle{ \vec{\mu} = -\frac{e}{2m_e} \vec{S} }$

where $\vec{S}$ is its spin angular momentum. Quantum effects correct this computation, giving

$\displaystyle{ \vec{\mu} = -g \frac{e}{2m_e} \vec{S} }$

for some constant $g$ called the gyromagnetic ratio, which can be computed using QED as a sum over Feynman diagrams with an electron exchanging a single photon with a massive charged particle:

The answer is a power series in $\alpha^{1/2},$ but since all these diagrams have an even number of vertices, it only contains integral powers of $\alpha.$ The lowest-order term gives simply $g = 2.$ In 1948, Julian Schwinger computed the next term and found a small correction to this simple result:

$\displaystyle{ g = 2 + \frac{\alpha}{\pi} \approx 2.00232 }$
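Schwinger's correction is easy to check numerically, using the measured value of $\alpha$; this is just the arithmetic, not the QED calculation itself:

```python
import math

# Measured fine structure constant (CODATA-level value).
alpha = 1 / 137.035999

# Leading order gives g = 2; Schwinger's 1948 one-loop
# correction adds alpha/pi.
g = 2 + alpha / math.pi
print(g)  # ~2.00232
```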

By now a team led by Toichiro Kinoshita has computed $g$ up to order $\alpha^5.$ This requires computing over 13,000 integrals, one for each Feynman diagram of the above form with up to 10 vertices! The answer agrees very well with experiment: in fact, if we also take other Standard Model effects into account we get agreement to roughly one part in $10^{12}.$

This is the most accurate prediction in all of science.

However, as mentioned, it is widely believed that this power series diverges! Next time I’ll explain why physicists think this, and what it means for a divergent series to give such a good answer when you add up the first few terms.

## September 13, 2016

### Resonaances — Next stop: tth

This was a summer of brutally dashed hopes for a quick discovery of the many fundamental particles we had been imagining. For the time being we need to focus on the ones that actually exist, such as the Higgs boson. In Run-1 of the LHC, the Higgs boson's existence and identity were firmly established, while its mass and basic properties were measured. The signal was observed with large significance in 4 different decay channels (γγ, ZZ*, WW*, ττ), and two different production modes (gluon fusion, vector-boson fusion) were isolated. Still, there remain many fine details to sort out. The realistic goal for Run-2 is to pinpoint the following Higgs processes:
• (h→bb): Decays to b-quarks.
• (Vh): Associated production with W or Z boson.
• (tth): Associated production with top quarks.

It seems that the last objective may be achieved quicker than expected. The tth production process is very interesting theoretically, because its rate is proportional to the square of the Yukawa coupling between the Higgs boson and the top quark. Within the Standard Model, the value of this parameter is known to good accuracy, as it is related to the mass of the top quark. But that relation can be disrupted in models beyond the Standard Model, with two-Higgs-doublet models and composite/little Higgs models serving as prominent examples. Thus, measurements of the top Yukawa coupling will provide a crucial piece of information about new physics.

In Run-1, a not-so-small signal of tth production was observed by the ATLAS and CMS collaborations in several channels. Assuming that the Higgs decays have the same branching fractions as in the Standard Model, the tth signal strength normalized to the Standard Model prediction was estimated as

At face value, strong evidence for tth production was obtained already in Run-1! This fact was not advertised by the collaborations because the measurement is not clean, owing to the large number of top quarks produced by other processes at the LHC. The tth signal is thus a small blip on top of a huge background, and it's not excluded that some unaccounted-for systematic errors are skewing the measurements. The collaborations thus preferred to play it safe, and wait for more data to be collected.

In Run-2, with 13 TeV collisions, the tth production cross section is four times larger than in Run-1, so the new data are coming at a fast pace. Both ATLAS and CMS presented their first Higgs results in early August, and the tth signal is only getting stronger. ATLAS showed their measurements in the γγ, WW/ττ, and bb final states of Higgs decay, as well as their combination:
Most channels display a signal-like excess, which is reflected by the Run-2 combination being 2.5 sigma away from zero. A similar picture is emerging in CMS, with 2-sigma signals in the γγ and WW/ττ channels. Naively combining all Run-1 and Run-2 results, one then finds
At face value, this is a discovery! Of course, this number should be treated with some caution because, due to large systematic errors, a naive Gaussian combination may not represent the true likelihood very well. Nevertheless, it indicates that, if all goes well, the discovery of the tth production mode should be officially announced in the near future, maybe even this year.

Should we get excited that the measured tth rate is significantly larger than the Standard Model one? Assuming that the current central value remains, it would mean that the top Yukawa coupling is 40% larger than predicted by the Standard Model. This is not impossible, but very unlikely in practice. The reason is that the top Yukawa coupling also controls gluon fusion, the main Higgs production channel at the LHC, whose rate is measured to be in perfect agreement with the Standard Model. Therefore, a realistic model that explains the large tth rate would also have to provide negative contributions to the gluon fusion amplitude, so as to cancel the effect of the large top Yukawa coupling. It is possible to engineer such a cancellation in concrete models, but I'm not aware of any construction where this conspiracy arises in a natural way. Most likely, the currently observed excess is a statistical fluctuation (possibly in combination with underestimated theoretical and/or experimental errors), and the central value will drift toward μ=1 as more data is collected.

### Resonaances — Weekend Plot: update on WIMPs

There's been a lot of discussion on this blog about the LHC not finding new physics. I should, however, give justice to other experiments that also don't find new physics, often in a spectacular way. One area where this is happening is direct detection of WIMP dark matter. This weekend plot summarizes the current limits on the spin-independent scattering cross section of dark matter particles on nucleons:
For large WIMP masses, currently the most successful detection technology is to fill up a tank with a ton of liquid xenon and wait for a passing dark matter particle to knock into one of the nuclei. Recently, we have had updates from two such experiments: LUX in the US, and PandaX in China, whose limits now cut below zeptobarn cross sections (1 zb = 10^-9 pb = 10^-45 cm^2). These two experiments are currently going head-to-head, but PandaX, being larger, will ultimately overtake LUX. Soon, however, it'll have to face a fierce new competitor: the XENON1T experiment, and the plot will have to be updated next year. Fortunately, we won't need to learn another prefix soon: once yoctobarn sensitivity is achieved by the experiments, we will hit the neutrino floor, the irreducible background from solar and atmospheric neutrinos (gray area at the bottom of the plot). This will make detecting a dark matter signal much more challenging, and will certainly slow down progress for WIMP masses larger than ~5 GeV. For lower masses, the distance to the floor remains large. Xenon detectors lose their steam there, and another technology is needed, like the germanium detectors of CDMS and CDEX, or the CaWO4 crystals of CRESST. On this front, too, important progress is expected soon.
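The prefix arithmetic in the parenthesis is easy to verify (a quick bookkeeping sketch; 1 barn = 10^-24 cm^2):

```python
# Cross-section unit bookkeeping: 1 barn = 1e-24 cm^2.
barn_in_cm2 = 1e-24
pb = 1e-12 * barn_in_cm2   # picobarn
zb = 1e-21 * barn_in_cm2   # zeptobarn
yb = 1e-24 * barn_in_cm2   # yoctobarn

print(zb / pb)   # a zeptobarn is 1e-9 pb
print(zb)        # 1e-45 cm^2, as quoted in the text
print(yb / zb)   # the next prefix down is another factor of 1000
```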

What does the theory say about when we will find dark matter? It is perfectly viable that the discovery is waiting for us just around the corner in the remaining space above the neutrino floor, but currently there are no strong theoretical hints in favor of that possibility. Usually, dark matter experiments advertise that they're just beginning to explore the interesting parameter space predicted by theory models. This is not quite correct. If the WIMP were true to its name, that is to say if it interacted via the weak force (meaning, coupled to the Z with order-1 strength), it would have an order 10 fb scattering cross section on nucleons. Unfortunately, that natural possibility was excluded in the previous century. Years of experimental progress have shown that WIMPs, if they exist, must interact super-weakly with matter. For example, for a 100 GeV fermionic dark matter particle with vector coupling g to the Z boson, the current limits imply g ≲ 10^-4. The coupling can be larger if the Higgs boson is the mediator of interactions between the dark and visible worlds, as the Higgs already couples very weakly to nucleons. This construction is, arguably, the most plausible one currently probed by direct detection experiments. For a scalar dark matter particle X with mass 0.1-1 TeV coupled to the Higgs via the interaction λ v h |X|^2, the experiments are currently probing the coupling λ in the 0.01-1 ballpark. In general, there's no theoretical lower limit on the dark matter coupling to nucleons. Nevertheless, the weak coupling implied by direct detection limits creates some tension with the thermal production paradigm, which requires a weak (that is, order picobarn) annihilation cross section for dark matter particles. This tension needs to be resolved by more complicated model building, e.g. by arranging for resonant annihilation or for co-annihilation.

### n-Category Café — HoTT and Philosophy

I’m down in Bristol at a conference – HoTT and Philosophy. Slides for my talk – The modality of physical law in modal homotopy type theory – are here.

Perhaps ‘The modality of differential equations’ would have been more accurate as I’m looking to work through an analogy in modal type theory between necessity and the jet comonad, partial differential equations being the latter’s coalgebras.

The talk should provide some intuition for a pair of talks the following day:

• Urs Schreiber & Felix Wellen: ‘Formalizing higher Cartan geometry in modal HoTT’
• Felix Wellen: ‘Synthetic differential geometry in homotopy type theory via a modal operator’

I met up with Urs and Felix yesterday evening. Felix is coding up geometric constructions in Agda, such as frame bundles, using the modalities of differential cohesion.

I’m now trying to announce all my new writings in one place: on Twitter.

Why? Well…

Someone I respect said he’s been following my online writings, off and on, ever since the old days of This Week’s Finds. He wishes it were easier to find my new stuff all in one place. Right now it’s spread out over several locations:

Azimuth: serious posts on environmental issues and applied mathematics, fairly serious popularizations of diverse scientific subjects.

Google+: short posts of all kinds, mainly light popularizations of math, physics, and astronomy.

The n-Category Café: posts on mathematics, leaning toward category theory and other forms of pure mathematics that seem too intimidating for the above forums.

Visual Insight: beautiful pictures of mathematical objects, together with explanations.

Diary: more personal stuff, and polished versions of the more interesting Google+ posts, just so I have them on my own website.

It’s absurd to expect anyone to look at all these locations to see what I’m writing. Even more absurdly, I claimed I was going to quit posting on Google+, but then didn’t. So, I’ll try to make it possible to reach everything via Twitter.

## September 12, 2016

### Scott Aaronson — The Ninth Circuit ruled that vote-swapping is legal. Let’s use it to stop Trump.

Updates: Commenter JT informs me that there’s already a vote-swapping site available: MakeMineCount.org.  (I particularly like their motto: “Everybody wins.  Except Trump.”)  I still think there’s a need for more sites, particularly ones that would interface with Facebook, but this is a great beginning.  I’ve signed up for it myself.

Also, Toby Ord, a philosopher I know at Oxford, points me to a neat academic paper he wrote that analyzes vote-swapping as an example of “moral trade,” and that mentions the Porter v. Bowen decision holding vote-swapping to be legal in the US.

Also, if we find two Gary Johnson supporters in swing states willing to trade, I’ve been contacted by a fellow Austinite who’d be happy to accept the second trade.

As regular readers might know, my first appearance in the public eye (for a loose definition of “public eye”) had nothing to do with D-Wave, Gödel’s Theorem, the computational complexity of quantum gravity, Australian printer ads, or—god forbid—social justice shaming campaigns.  Instead it centered on NaderTrading: the valiant but doomed effort, in the weeks leading up to the 2000 US Presidential election, to stop George W. Bush’s rise to power by encouraging Ralph Nader supporters in swing states (such as Florida) to vote for Al Gore, while pairing themselves off over the Internet with Gore supporters in safe states (such as Texas or California) who would vote for Nader on their behalf.  That way, Nader’s vote share (and his chance of reaching 5% of the popular vote, which would’ve qualified him for federal funds in 2004) wouldn’t be jeopardized, but neither would Gore’s chance of winning the election.

Here’s what I thought at the time:

1. The election would be razor-close (though I never could’ve guessed how close).
2. Bush was a malignant doofus who would be a disaster for the US and the world (though I certainly didn’t know how—recall that, at the time, Bush was running as an isolationist).
3. Many Nader supporters, including the ones who I met at Berkeley, prioritized personal virtue so completely over real-world consequences that they might actually throw the election to Bush.

NaderTrading, as proposed by law professor Jamin Raskin and others, seemed like one of the clearest ways for nerds who knew these points, but who lacked political skills, to throw themselves onto the gears of history and do something good for the world.

So, as a 19-year-old grad student, I created a website called “In Defense of NaderTrading” (archived version), which didn’t arrange vote swaps itself—other sites did that—but which explored some of the game theory behind the concept and answered some common objections to it.  (See also here.)  Within days of creating the site, I’d somehow become an “expert” on the topic, and was fielding hundreds of emails as well as requests for print, radio, and TV interviews.

Alas, the one question everyone wanted to ask me was the one that I, as a CS nerd, was least qualified to answer: is NaderTrading legal? isn’t it kind of like … buying and selling votes?  The best I could do was offer arguments like these:

1. Members of Congress and state legislatures trade votes all the time.
2. A private agreement between two friends to each vote for the other’s preferred candidate seems self-evidently legal, so why should it be any different if a website is involved?
3. The whole point of NaderTrading is to exercise your voting power more fully—pretty much the opposite of bartering it away for private gain.
4. While the election laws vary by state, the ones I read very specifically banned trading votes for tangible goods—they never even mentioned trading votes for other votes, even though they easily could’ve done so had legislators intended to ban that.

But—and here was the fatal problem—I could only address principles and arguments, rather than politics and power.  I couldn’t honestly assure the people who wanted to vote-swap, or to set up vote-swapping sites, that they wouldn’t be prosecuted for it.

As it happened, the main vote-swapping site, voteswap2000.com, was shut down by California’s Republican attorney general, Bill Jones, only four days after it opened.  A second vote-swapping site, votexchange.com, was never directly threatened but also ceased operations because of what happened to voteswap2000.  Many legal scholars felt confident that these shutdowns wouldn’t hold up in court, but with just a few weeks until the election, there was no time to fight it.

Before it was shut down, voteswap2000 had brokered 5,041 vote-swaps, including hundreds in Florida.  Had that and similar sites been allowed to continue operating, it’s entirely plausible that they would’ve changed the outcome of the election.  No Iraq war, no 2008 financial meltdown: we would’ve been living in a different world.  Note that, of the 100,000 Floridians who ultimately voted for Nader, we would’ve needed to convince fewer than 1% of them.

Today, we face something I didn’t expect to face in my lifetime: namely, a serious prospect of a takeover of the United States by a nativist demagogue with open contempt for democratic norms and legendarily poor impulse control. Meanwhile, there are two third-party candidates—Gary Johnson and Jill Stein—who together command 10% of the vote.  A couple months ago, I’d expressed hopes that Johnson might help Hillary by splitting the Republican vote. But it now looks clear that, on balance, not only Stein but also Johnson is helping Trump, by splitting up the part of the American vote that’s not driven by racial resentment.

So recently a friend—the philanthropist and rationalist Holden Karnofsky—posed a question to me: should we revive the vote-swapping idea from 2000? And presumably this time around, enhance the idea with 21st-century bells and whistles like mobile apps and Facebook, to make it all the easier for Johnson/Stein supporters in swing states and Hillary supporters in safe states to find each other and trade votes?

Just like so many well-meaning people back in 2000, Holden was worried about one thing: is vote-swapping against the law? If someone created a mobile vote-swapping app, could that person be thrown in jail?

At first, I had no idea: I assumed that vote-swapping simply remained in the legal Twilight Zone where it was last spotted in 2000.  But then I did something radical: I looked it up.  And when I did, I discovered a decade-old piece of news that changes everything.

On August 6, 2007, the Ninth Circuit Court of Appeals finally ruled on a case, Porter v. Bowen, stemming from the California attorney general’s shutdown of voteswap2000.com.  Their ruling, which is worth reading in full, was unequivocal.

Vote-swapping, it said, is protected by the First Amendment, which state election laws can’t supersede.  It is fundamentally different from buying or selling votes.

Yes, the decision also granted the California attorney general immunity from prosecution, on the ground that vote-swapping’s legality hadn’t yet been established in 2000—indeed it wouldn’t be, until the Ninth Circuit’s decision itself!  Nevertheless, the ruling made clear that the appellants (the creators of voteswap2000 and some others) were granted the relief they sought: namely, an assurance that vote-swapping websites would be protected from state interference in the future.

Admittedly, if vote-swapping takes off again, it’s possible that the question will be re-litigated and will end up in the Supreme Court, where the Ninth Circuit’s ruling could be reversed.  For now, though, let the message be shouted from the rooftops: a court has ruled. You cannot be punished for cooperating with your fellow citizens to vote strategically, or for helping others do the same.

For those of you who oppose Donald Trump and who are good at web and app development: with just two months until the election, I think the time to set up some serious vote-swapping infrastructure is right now.  Let your name be etched in history, alongside those who stood up to all the vicious demagogues of the past.  And let that happen without your even needing to get up from your computer chair.

I’m not, I confess, a huge fan of either Gary Johnson or Jill Stein (especially not Stein).  Nevertheless, here’s my promise: on November 8, I will cast my vote in the State of Texas for Gary Johnson, if I can find at least one Johnson supporter who lives in a swing state, who I feel I can trust, and who agrees to vote for Hillary Clinton on my behalf.

If you think you’ve got what it takes to be my vote-mate, send me an email, tell me about yourself, and let’s talk!  I’m not averse to some electoral polyamory—i.e., lots of Johnson supporters in swing states casting their votes for Clinton, in exchange for the world’s most famous quantum complexity blogger voting for Johnson—but I’m willing to settle for a monogamous relationship if need be.

And as for Stein? I’d probably rather subsist on tofu than vote for her, because of her support for seemingly every pseudoscience she comes across, and especially because of her endorsement of the vile campaign to boycott Israel.  Even so: if Stein supporters in swing states whose sincerity I trusted offered to trade votes with me, and Johnson supporters didn’t, I would bury my scruples and vote for Stein.  Right now, the need to stop the madman takes precedence over everything else.

One last thing to get out of the way.  When they learn of my history with NaderTrading, people keep pointing me to a website called BalancedRebellion.com, and exclaiming “look! isn’t this exactly that vote-trading thing you were talking about?”

On examination, Balanced Rebellion turns out to be the following proposal:

1. A Trump supporter in a swing state pairs off with a Hillary supporter in a swing state.
2. Both of them vote for Gary Johnson, thereby helping Johnson without giving an advantage to either Hillary or Trump.

So, exercise for the reader: see if you can spot the difference between this idea and the kind of vote-swapping I’m talking about.  (Here’s a hint: my version helps prevent a racist lunatic from taking command of the most powerful military on earth, rather than being neutral about that outcome.)

Not surprisingly, the “balanced rebellion” is advocated by Johnson fans.
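The difference is easy to check with a toy tally. Here’s a minimal sketch (the function names and vote counts are made up for illustration; this isn’t the logic of any real app) showing that classic vote-swapping moves the swing-state margin, while the “balanced rebellion” leaves it untouched:

```python
def vote_swap(swing, safe):
    """Classic vote-swapping (NaderTrading style): a third-party
    supporter in a swing state votes Clinton; in exchange, a
    Clinton supporter in a safe state votes third-party."""
    swing, safe = dict(swing), dict(safe)
    swing["johnson"] -= 1
    swing["clinton"] += 1
    safe["clinton"] -= 1
    safe["johnson"] += 1
    return swing, safe

def balanced_rebellion(swing):
    """Balanced Rebellion: one Trump and one Clinton supporter,
    both in the swing state, each switch to Johnson."""
    swing = dict(swing)
    swing["trump"] -= 1
    swing["clinton"] -= 1
    swing["johnson"] += 2
    return swing

# Hypothetical starting counts in one swing state and one safe state.
swing0 = {"clinton": 100, "trump": 100, "johnson": 10}
safe0 = {"clinton": 200, "trump": 50, "johnson": 5}

s1, safe1 = vote_swap(swing0, safe0)
s2 = balanced_rebellion(swing0)

# Vote-swapping shifts the swing-state Clinton-Trump margin by one
# per pair; the balanced rebellion leaves that margin at zero.
print(s1["clinton"] - s1["trump"])  # 1
print(s2["clinton"] - s2["trump"])  # 0
```

In both schemes Johnson gains the same number of votes overall; the only difference is where the major-party margin moves, which is exactly the point of the exercise above.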

### Tommaso Dorigo — INFN Selections - A Last Batch Of Advices

Next Monday, the Italian city of Rome will swarm with about 700 young physicists. They will be there to take part in a selection for 58 INFN research scientist positions. In previous articles (see e.g.

### Doug Natelson — Professional service

An underappreciated part of a scientific career is "professional service" - reviewing papers and grant proposals, filling roles in professional societies, organizing workshops/conferences/summer schools - basically carrying your fair share of the load, so that the whole scientific enterprise actually functions.  Some people take on service roles primarily because they want to learn better how the system works; others do so out of altruism, realizing that it's only fair, for example, to perform reviews of papers and grants at roughly the rate you submit them; still others take on responsibility because they either think they know best how to run/fix things, or because they don't like the alternatives.   Often it's a combination of all of these.

More and more journals proliferate; numbers of grant applications climb even as (in the US anyway) support remains flat or declining; and conference attendance continues to grow (the APS March Meeting is now twice as large as in my last year of grad school).  This means that professional demands are on the rise.  At the same time, it is difficult to track and quantify (except by self-reporting) these activities, and reward structures give only indirect incentive (e.g., reviewing grants gives you a sense of what makes a better proposal) to good citizenship.  So, when you're muttering under your breath about referee number 3 or about how the sessions are organized nonoptimally at your favorite conference (as we all do from time to time), remember that at least the people in question are trying to contribute, rather than sitting on the sidelines.

And now, the photo-a-day project straggles across the finish line, with a final five photos dominated by the kids:

362/366: Kid Art I

A set of figures drawn by The Pip at day care over the summer.

363/366: Kid Art II

Awesome owl drawing by SteelyKid.

One of the official end of summer activities is cleaning off the “art shelf” in the bookcase in the dining room, where we pile the various projects the kids bring home from school and day care. I sort these, and take photos of the best, for historical documentation purposes, and these are two of my favorites from the lot. The four round-bodied figures all came home on the same day, from The Pip. The owl was in SteelyKid’s backpack the last week of second grade, which shows just how long we’ve gone without cleaning that shelf off. And also how good she’s gotten at drawing…

Also, here’s some bonus kid art: a fan decorated by SteelyKid:

A fan decorated by SteelyKid.

I’m not sure exactly how she did this– whether the paper was pre-stretched on the fan, or whether she painted on a flat paper that somebody then attached to the frame. I should ask, as I bet the explanation will be entertainingly detailed.

364/366: Farewell to the Pool I

SteelyKid going off the diving board at the Niskayuna town pool.

365/366: Farewell to the Pool II

The Pip swimming with a noodle at the Niskayuna town pool.

These are from the next-to-last day of our summer membership at the Niskayuna town pool, when we all went over as a family (which is why there are good photos– Kate was in the pool with The Pip, just out of frame). It’s been really remarkable to see The Pip’s development in terms of swimming. At the start of last summer, he wouldn’t even dip his feet into the pool, and now, he’s an eager swimmer. He’s using a noodle here to help him float, which lets him really motor around, but he can furiously dog-paddle short distances all by himself. And will, in fact, angrily insist on being allowed to dog-paddle freely over short distances even in the wave pool at Great Escape (which we also visited last weekend, to catch the water park before it closed), which is utterly terrifying.

He’s extremely proud of this too, and asked for this to be the picture we printed out to send in for inclusion in his kindergarten class book. And commemorated it with a self-portrait, which I’ll throw in here as more bonus kid art:

The Pip’s drawing of himself swimming with a green pool noodle.

And we’ll close this out with the thing that’s making me close this out: the return of the school year.

366/366: Workspace

My new office on campus.

After three years in the department chair’s office (which is much bigger), I’m back in a standard-size office. Not the same office I was in before I was Chair, but one a couple doors further down the hall. A colleague wanted to be in the office I used to have, though, and moved the giant desk I ordered when I arrived to make room. So I moved into the office where my good desk was, and have, for the moment, organized things in a sensible and reasonably aesthetic way. This will be covered in random stacks of books and papers within three weeks, but it looks nice and professional right now, so we’ll give it the final spot, to mark the closing of my sabbatical with a picture of the workspace I’m returning to.

And that’s it for that. I may or may not have some wrap-up thoughts in a day or two, but for right now, I have classes to teach, and need to go get ready.

### Terence Tao — Course announcement: 246A, complex analysis

Next week, I will be teaching Math 246A, the first course in the three-quarter graduate complex analysis sequence.  This first course covers much of the same ground as an honours undergraduate complex analysis course, in particular focusing on the basic properties of holomorphic functions such as the Cauchy and residue theorems, the classification of singularities, and the maximum principle, but there will be more of an emphasis on rigour, generalisation and abstraction, and connections with other parts of mathematics.  If time permits I may also cover topics such as factorisation theorems, harmonic functions, conformal mapping, and/or applications to analytic number theory.  The main text I will be using for this course is Stein-Shakarchi (with Ahlfors as a secondary text), but as usual I will also be writing notes for the course on this blog.


## September 11, 2016

### Tommaso Dorigo — Statistics At A Physics Conference ?!

Particle physics conferences are a place where you can listen to many different topics - not just news about the latest precision tests of the standard model or searches for new particles at the energy frontier. If we exclude the very small, workshop-like events where people gather to focus on a very precise topic, all other events allow for contamination from reports on parallel fields of research. The reason is of course that there is significant cross-fertilization between these fields.

### Chad Orzel — 353-361/366: Penultimate Photo Dump

Another day, another batch of photos from August.

353/366: Trail Blaze

One lost sock next to the hiking path in the Reist bird sanctuary.

This is actually a slight reversal of chronology, as the hike through Vischer’s Ferry was the day after I went for a hike in the H. G. Reist Wildlife Sanctuary. The pictures from Vischer’s Ferry were better though, as despite the name I didn’t see much wildlife in the Reist Sanctuary. I did, however, find this bright pink child-size sock sitting next to the trail right after a fork. Presumably a way for some wandering toddler to find their way home…

354/366: Web

Spiderweb near the top of a tree in the Reist sanctuary.

I didn’t see much in the way of wildlife in the wildlife sanctuary. They did, however, have a wealth of spiderwebs, mostly strung face-high across the path. This one high up in a tree was a pleasant exception, and caught the light well.

355/366: Window

A dead tree trunk with a hole clean through it, in the Reist sanctuary.

356/366: New and Old

Fine tendrils of new vine on a dead tree in the Reist sanctuary.

The sanctuary did have a wide variety of interesting-looking dead trees, though. The second of these is one of those “imprinted deeply on The Lord of the Rings” moments, because this kind of thing always reminds me of Sam and Frodo finding a defaced statue in The Two Towers.

The hanging remains of a broken telephone pole in Niskayuna.

On the way back from the Reist sanctuary, I stopped and made a stab at getting an image I’ve wanted for months. This is a telephone pole on Balltown Road that was hit by something and shattered several months ago. They brought crews in and cleaned out the wreckage, but left this one bit of pole hanging from the wires. I have no idea how the wires are attached in order for this to seem like a good idea. Or maybe it’s just capitalism in action– something like “National Grid owns the power lines at the top, and takes care of the poles, but can’t be bothered to remove a hanging weight stressing the telephone lines owned by Verizon down below.”

358/366: Market Morning

The Schenectady Greenmarket, our regular Sunday-morning activity.

Every Sunday morning, I take the kids to the Schenectady Greenmarket to get a few things and give Kate a bit of a break. I don’t generally take the camera along, though, because it’s hard enough to wrangle the sillyheads without also looking for aesthetic views. My parents took the kids for a weekend in late August, though, so I finally got some decent shots of the market; this one is from the steps of the Post Office across the street.

The market takes up two streets in an “L” shape, so here’s the same scene from the other angle:

The Schenectady Greenmarket, looking down Franklin Street.

I’m sort of torn as to which of these I like better; the first gives a better sense of the market itself, because it’s a longer street and thus more crowded, but the isolated building in the second really pops. So I’ll put them both here, and you can decide for yourself which is the real photo 358.

359/366: Hanging Flowers

Planters hanging from a light pole in downtown Schenectady.

I’m sort of proud of the alignment in this one, as there’s an ugly smokestack behind this that I managed to hide pretty well behind the lamp post.

(“Why not pick a different post?” you ask? Because the others on this street don’t have as clear a view of the sky as this one, and it was easier to hide the smokestack than the stuff that would clutter the background for the others.)

360/366: Planted Flowers

Roses outside the Wold building at Union.

The end of August means the start of a new school year, which means making campus look all pretty for the new students when they arrive. I’m not sure if these roses in front of the Wold building just bloom at a fortuitous time, or if they’re recent transplants, but they’re very pretty.

361/366: Green Lions

Cool lions on the base of a lamp post by Memorial Chapel.

Probably the most enduring effect of this experiment in taking lots of pictures will be a somewhat heightened awareness of odd little photogenic things around me. Like these funky lions holding up a light outside of Memorial Chapel. I’ve been here fifteen years, and never noticed these guys until I was wandering around campus with a camera, looking for interesting shots.

That’s it for today. The final five photos will be posted in one last photo dump, probably tomorrow morning, but maybe Tuesday.

### Backreaction — I’ve read a lot of books recently

[Reading is to writing what eating is to...]

Dreams Of A Final Theory: The Scientist's Search for the Ultimate Laws of Nature
Steven Weinberg
Vintage, Reprint Edition (1994)

This book appeared when I was still in high school and I didn’t take note of it then. Later it seemed too out-of-date to bother, but meanwhile it’s almost become a historical document. Written with the pretty explicit aim of arguing in favor of the Superconducting Supercollider (a US proposal for a large particle collider that was scrapped in the early 90s), it’s the most flawless popular science book about theoretical physics I’ve ever come across.

Weinberg’s explanations are both comprehensible and remarkably accurate. The book contains no unnecessary clutter, is both well-structured and well written, and Weinberg doesn’t hold back with his opinions, neither on religion nor on philosophy.

It’s also the first time I’ve tried an audiobook. I listened to it while treadmill running. A lot of sweat went into the first chapters. But I gave up halfway through and bought the paperback, which I read on the plane to Austin. Weinberg is one of the people I interviewed for my book.

Lesson learned: Audiobooks aren’t for me.

Truth And Beauty – Aesthetics and Motivations in Science
Subrahmanyan Chandrasekhar
University of Chicago Press (1987)

I had read this book before but wanted to remind myself of its content. It’s a collection of essays on the role of beauty in physics, mostly focused on general relativity and the early 20th century. Through historical examples like Milne, Eddington, Weyl, and Einstein, Chandrasekhar discusses various aspects of beauty, like elegance, simplicity, or harmony. I find it too bad that Chandrasekhar didn’t bring in more of his own opinion but mostly summarizes other people’s thoughts.

Lesson learned: Tell the reader what you think.

Truth or Beauty – Science and the Quest for Order
David Orrell
Yale University Press (2012)

In this book, mathematician David Orrell argues that beauty isn’t a good guide to truth. It’s an engagingly written book which covers a lot of ground, primarily in physics, from heliocentrism to string theory. But Orrell tries too hard to make everything fit his bad-beauty narrative. Many of his interpretations are over-the-top, like his complaint that
“[T]he aesthetics of science – and particularly the “hard” sciences such as physics – have been characterized by a distinctly male feel. For example, feminist psychologists have noted that the classical picture of the atom as hard, indivisible, independent, separate, and so on corresponds very closely to the stereotypically masculine sense of self. It must have come as a shock to the young, male, champions of quantum theory when they disco