Planet Musings

January 25, 2021

Terence Tao
246B, Notes 1: Zeroes, poles, and factorisation of meromorphic functions

— 1. Jensen’s formula —

Suppose {f} is a non-zero rational function {f = P/Q}. Then by the fundamental theorem of algebra one can write

\displaystyle  f(z) = c \frac{\prod_\rho (z-\rho)}{\prod_\zeta (z-\zeta)}

for some non-zero constant {c}, where {\rho} ranges over the zeroes of {P} (counting multiplicity) and {\zeta} ranges over the zeroes of {Q} (counting multiplicity), and assuming {z} avoids the zeroes of {Q}. Taking absolute values and then logarithms, we arrive at the formula

\displaystyle  \log |f(z)| = \log |c| + \sum_\rho \log|z-\rho| - \sum_\zeta \log |z-\zeta|, \ \ \ \ \ (1)

as long as {z} avoids the zeroes of both {P} and {Q}. (In this set of notes we use {\log} for the natural logarithm when applied to a positive real number, and {\mathrm{Log}} for the standard branch of the complex logarithm (which extends {\log}); the multi-valued complex logarithm {\log} will only be used in passing.) Alternatively, taking logarithmic derivatives, we arrive at the closely related formula

\displaystyle  \frac{f'(z)}{f(z)} = \sum_\rho \frac{1}{z-\rho} - \sum_\zeta \frac{1}{z-\zeta}, \ \ \ \ \ (2)

again for {z} avoiding the zeroes of both {P} and {Q}. Thus we see that the zeroes and poles of a rational function {f} describe the behaviour of that rational function, as well as close relatives of that function such as the log-magnitude {\log|f|} and log-derivative {\frac{f'}{f}}. We have already seen these sorts of formulae arise in our treatment of the argument principle in 246A Notes 4.
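
As a concrete (and entirely optional) illustration, here is a short Python sketch, not part of the original notes, that checks (1) and (2) numerically for a small rational function; the particular choices of {c}, zeroes, poles, and test point are arbitrary, and the log-derivative is approximated by a central difference.

import math

c = 3 - 1j
zeros = [1, 2j]        # zeroes of P (counting multiplicity)
poles = [-1]           # zeroes of Q (counting multiplicity)

def f(z):
    num, den = c, 1
    for rho in zeros:
        num *= (z - rho)
    for zeta in poles:
        den *= (z - zeta)
    return num / den

z = 0.7 + 0.3j   # a test point avoiding all zeroes and poles

# (1): log|f(z)| as a sum over zeroes and poles
lhs1 = math.log(abs(f(z)))
rhs1 = (math.log(abs(c))
        + sum(math.log(abs(z - rho)) for rho in zeros)
        - sum(math.log(abs(z - zeta)) for zeta in poles))
print(lhs1, rhs1)                     # should agree

# (2): f'(z)/f(z) as a sum of simple fractions (f' via a central difference)
h = 1e-6
logderiv = (f(z + h) - f(z - h)) / (2 * h) / f(z)
rhs2 = sum(1 / (z - rho) for rho in zeros) - sum(1 / (z - zeta) for zeta in poles)
print(logderiv, rhs2)                 # should agree up to the finite-difference error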

Exercise 1 Let {P(z)} be a complex polynomial of degree {n \geq 1}.
  • (i) (Gauss-Lucas theorem) Show that the complex roots of {P'(z)} are contained in the closed convex hull of the complex roots of {P(z)}.
  • (ii) (Laguerre separation theorem) If all the complex roots of {P(z)} are contained in a disk {D(z_0,r)}, and {\zeta \not \in D(z_0,r)}, then all the complex roots of {nP(z) + (\zeta - z) P'(z)} are also contained in {D(z_0,r)}. (Hint: apply a suitable Möbius transformation to move {\zeta} to infinity, and then apply part (i) to a polynomial that emerges after applying this transformation.)

There are a number of useful ways to extend these formulae to more general meromorphic functions than rational functions. Firstly there is a very handy “local” variant of (1) known as Jensen’s formula:

Theorem 2 (Jensen’s formula) Let {f} be a meromorphic function on an open neighbourhood of a disk {\overline{D(z_0,r)} = \{ z: |z-z_0| \leq r \}}, with all removable singularities removed. Then, if {z_0} is neither a zero nor a pole of {f}, we have

\displaystyle  \log |f(z_0)| = \int_0^1 \log |f(z_0+re^{2\pi i t})|\ dt + \sum_{\rho: |\rho-z_0| \leq r} \log \frac{|\rho-z_0|}{r} \ \ \ \ \ (3)

\displaystyle  - \sum_{\zeta: |\zeta-z_0| \leq r} \log \frac{|\zeta-z_0|}{r}

where {\rho} and {\zeta} range over the zeroes and poles of {f} respectively (counting multiplicity) in the disk {\overline{D(z_0,r)}}.

One can view (3) as a truncated (or localised) variant of (1). Note also that the summands {\log \frac{|\rho-z_0|}{r}, \log \frac{|\zeta-z_0|}{r}} are always non-positive.

Proof: By perturbing {r} slightly if necessary, we may assume that none of the zeroes or poles of {f} (which form a discrete set) lie on the boundary circle {\{ z: |z-z_0| = r \}}. By translating and rescaling, we may then normalise {z_0=0} and {r=1}, thus our task is now to show that

\displaystyle  \log |f(0)| = \int_0^1 \log |f(e^{2\pi i t})|\ dt + \sum_{\rho: |\rho| < 1} \log |\rho| - \sum_{\zeta: |\zeta| < 1} \log |\zeta|. \ \ \ \ \ (4)

We may remove the poles and zeroes inside the disk {D(0,1)} by the useful device of Blaschke products. Suppose for instance that {f} has a zero {\rho} inside the disk {D(0,1)}. Observe that the function

\displaystyle  B_\rho(z) := \frac{\rho - z}{1 - \overline{\rho} z} \ \ \ \ \ (5)

has magnitude {1} on the unit circle {\{ z: |z| = 1\}}, equals {\rho} at the origin, has a simple zero at {\rho}, but has no other zeroes or poles inside the disk. Thus Jensen’s formula (4) already holds if {f} is replaced by {B_\rho}. To prove (4) for {f}, it thus suffices to prove it for {f/B_\rho}, which effectively deletes a zero {\rho} inside the disk {D(0,1)} from {f} (and replaces it instead with its inversion {1/\overline{\rho}}). Similarly we may remove all the poles inside the disk. As a meromorphic function only has finitely many poles and zeroes inside a compact set, we may thus reduce to the case when {f} has no poles or zeroes on or inside the disk {D(0,1)}, at which point our goal is simply to show that

\displaystyle  \log |f(0)| = \int_0^1 \log |f(e^{2\pi i t})|\ dt.

Since {f} has no zeroes or poles inside the disk, it has a holomorphic logarithm {F} (Exercise 46 of 246A Notes 4). In particular, {\log |f|} is the real part of {F}. The claim now follows by applying the mean value property (Exercise 17 of 246A Notes 3) to {\log |f|}. \Box
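
As an aside (not in the original notes), Jensen's formula is easy to check numerically. The following Python sketch verifies the normalised form (4) for the meromorphic function {f(z) = (z-0.3)(z-2)/(z-(0.5+0.2i))}, which has one zero and one pole inside the unit disk and none on the unit circle; the integral is approximated by a Riemann sum, and all the specific choices are arbitrary.

import math, cmath

zeros = [0.3, 2.0]
poles = [0.5 + 0.2j]

def f(z):
    num, den = 1, 1
    for rho in zeros:
        num *= (z - rho)
    for zeta in poles:
        den *= (z - zeta)
    return num / den

# Riemann sum for the integral over the unit circle (the integrand is smooth and periodic)
N = 4000
integral = sum(math.log(abs(f(cmath.exp(2j * math.pi * k / N)))) for k in range(N)) / N

rhs = (integral
       + sum(math.log(abs(rho)) for rho in zeros if abs(rho) < 1)
       - sum(math.log(abs(zeta)) for zeta in poles if abs(zeta) < 1))
print(math.log(abs(f(0))), rhs)       # the two sides of (4) should agree closely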

An important special case of Jensen’s formula arises when {f} is holomorphic in a neighborhood of {\overline{D(z_0,r)}}, in which case there are no contributions from poles and one simply has

\displaystyle  \int_0^1 \log |f(z_0+re^{2\pi i t})|\ dt = \log |f(z_0)| + \sum_{\rho: |\rho-z_0| \leq r} \log \frac{r}{|\rho-z_0|}. \ \ \ \ \ (6)

This is quite a useful formula, mainly because the summands {\log \frac{r}{|\rho-z_0|}} are non-negative; it can be viewed as a more precise assertion of the subharmonicity of {\log |f|} (see Exercises 60(ix) and 61 of 246A Notes 5). Here are some quick applications of this formula:

Exercise 3 Use (6) to give another proof of Liouville’s theorem: a bounded holomorphic function {f} on the entire complex plane is necessarily constant.

Exercise 4 Use Jensen’s formula to prove the fundamental theorem of algebra: a complex polynomial {P(z)} of degree {n} has exactly {n} complex zeroes (counting multiplicity), and can thus be factored as {P(z) = c (z-z_1) \dots (z-z_n)} for some complex numbers {c,z_1,\dots,z_n} with {c \neq 0}. (Note that the fundamental theorem was invoked previously in this section, but only for motivational purposes, so the proof here is non-circular.)

Exercise 5 (Shifted Jensen’s formula) Let {f} be a meromorphic function on an open neighbourhood of a disk {\{ z: |z-z_0| \leq r \}}, with all removable singularities removed. Show that

\displaystyle  \log |f(z)| = \int_0^1 \log |f(z_0+re^{2\pi i t})| \mathrm{Re} \frac{r e^{2\pi i t} + (z-z_0)}{r e^{2\pi i t} - (z-z_0)}\ dt \ \ \ \ \ (7)

\displaystyle  + \sum_{\rho: |\rho-z_0| \leq r} \log \frac{|\rho-z|}{|r - \rho^* (z-z_0)|}

\displaystyle - \sum_{\zeta: |\zeta-z_0| \leq r} \log \frac{|\zeta-z|}{|r - \zeta^* (z-z_0)|}

for all {z} in the open disk {\{ z: |z-z_0| < r\}} that are not zeroes or poles of {f}, where {\rho^* = \frac{\overline{\rho-z_0}}{r}} and {\zeta^* = \frac{\overline{\zeta-z_0}}{r}}. (The function {\Re \frac{r e^{2\pi i t} + (z-z_0)}{r e^{2\pi i t} - (z-z_0)}} appearing in the integrand is sometimes known as the Poisson kernel, particularly if one normalises so that {z_0=0} and {r=1}.)

Exercise 6 (Bounded type)
  • (i) If {f} is a holomorphic function on {D(0,1)} that is not identically zero, show that {\liminf_{r \rightarrow 1^-} \int_0^{2\pi} \log |f(re^{i\theta})|\ d\theta > -\infty}.
  • (ii) If {f} is a meromorphic function on {D(0,1)} that is the ratio of two bounded holomorphic functions that are not identically zero, show that {\sup_{0 < r < 1} \int_0^{2\pi} |\log |f(re^{i\theta})||\ d\theta < \infty}. (Functions {f} of this form are said to be of bounded type and lie in the Nevanlinna class for the unit disk {D(0,1)}.)

Exercise 7 (Smoothed out Jensen formula) Let {f} be a meromorphic function on an open set {U}, and let {\phi: U \rightarrow {\bf C}} be a smooth compactly supported function. Show that

\displaystyle \sum_\rho \phi(\rho) - \sum_\zeta \phi(\zeta)

\displaystyle  = \frac{-1}{2\pi} \int\int_U ((\frac{\partial}{\partial x} + i \frac{\partial}{\partial y}) \phi(x+iy)) \frac{f'}{f}(x+iy)\ dx dy

\displaystyle  = \frac{1}{2\pi} \int\int_U ((\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}) \phi(x+iy)) \log |f(x+iy)|\ dx dy

where {\rho, \zeta} range over the zeroes and poles of {f} (respectively) in the support of {\phi}. Informally argue why this identity is consistent with Jensen’s formula.

When applied to entire functions {f}, Jensen’s formula relates the order of growth of {f} near infinity with the density of zeroes of {f}. Here is a typical result:

Proposition 8 Let {f: {\bf C} \rightarrow {\bf C}} be an entire function, not identically zero, that obeys a growth bound {|f(z)| \leq C \exp( C|z|^\rho)} for some {C, \rho > 0} and all {z}. Then there exists a constant {C'>0} such that {D(0,R)} has at most {C' R^\rho} zeroes (counting multiplicity) for any {R \geq 1}.

Entire functions that obey a growth bound of the form {|f(z)| \leq C_\varepsilon \exp( C_\varepsilon |z|^{\rho+\varepsilon})} for every {\varepsilon>0} and {z} (where {C_\varepsilon} depends on {\varepsilon}) are said to be of order at most {\rho}. The above theorem shows that for such functions that are not identically zero, the number of zeroes in a disk of radius {R} does not grow much faster than {R^\rho}. This is often a useful preliminary upper bound on the zeroes of entire functions, as the order of an entire function tends to be relatively easy to compute in practice.

Proof: First suppose that {f(0)} is non-zero. From (6) applied with {r=2R} and {z_0=0} one has

\displaystyle  \int_0^1 \log(C \exp( C (2R)^\rho ) )\ dt \geq \log |f(0)| + \sum_{\rho: |\rho| \leq 2R} \log \frac{2R}{|\rho|}.

Every zero in {D(0,R)} contributes at least {\log 2} to a summand on the right-hand side, while all other zeroes contribute a non-negative quantity, thus

\displaystyle  \log C + C (2R)^\rho \geq \log |f(0)| + N_R \log 2

where {N_R} denotes the number of zeroes in {D(0,R)}. This gives the claim for {f(0) \neq 0}. When {f(0)=0}, one can shift {f} by a small amount to make {f} non-zero at the origin (using the fact that zeroes of holomorphic functions not identically zero are isolated), modifying {C} in the process, and then repeating the previous arguments. \Box
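
To illustrate Proposition 8 concretely (this example is not in the notes), one can take {f(z) = \cos(z)}, which obeys {|\cos(z)| \leq \exp(|z|)} (so one may take {C=1} and {\rho=1}) and has {f(0)=1 \neq 0}; the proof above then gives the explicit bound {N_R \leq (\log C + C(2R)^\rho - \log|f(0)|)/\log 2}. A quick Python check of this bound against the true zero count (the zeroes of cosine are the points {\pi(k+\frac{1}{2})}, {k \in {\bf Z}}):

import math

for R in [5, 10, 50, 100]:
    actual = sum(1 for k in range(-1000, 1000) if abs(math.pi * (k + 0.5)) < R)
    bound = (math.log(1) + 2 * R - math.log(abs(math.cos(0)))) / math.log(2)
    print(R, actual, bound)   # actual is about 2R/pi, comfortably below 2R/log 2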

Just as (3) and (7) give truncated variants of (1), we can create truncated versions of (2). The following crude truncation is adequate for many applications:

Theorem 9 (Truncated formula for log-derivative) Let {f} be a holomorphic function on an open neighbourhood of a disk {\{ z: |z-z_0| \leq r \}} that is not identically zero on this disk, and let {0 < c_2 < c_1 < 1} be constants. Suppose that one has a bound of the form {|f(z)| \leq M^{O_{c_1,c_2}(1)} |f(z_0)|} for some {M \geq 1} and all {z} on the circle {\{ z: |z-z_0| = r\}}. Then one has the approximate formula

\displaystyle  \frac{f'(z)}{f(z)} = \sum_{\rho: |\rho - z_0| \leq c_1 r} \frac{1}{z-\rho} + O_{c_1,c_2}( \frac{\log M}{r} )

for all {z} in the disk {\{ z: |z-z_0| < c_2 r \}} other than zeroes of {f}. Furthermore, the number of zeroes {\rho} in the above sum is {O_{c_1,c_2}(\log M)}.

Proof: To abbreviate notation, we allow all implied constants in this proof to depend on {c_1,c_2}.

We mimic the proof of Jensen’s formula. Firstly, we may translate and rescale so that {z_0=0} and {r=1}, so we have {|f(z)| \leq M^{O(1)} |f(0)|} when {|z|=1}, and our main task is to show that

\displaystyle  \frac{f'(z)}{f(z)} - \sum_{\rho: |\rho| \leq c_1} \frac{1}{z-\rho} = O( \log M ) \ \ \ \ \ (8)

for {|z| \leq c_2}. Note that if {f(0)=0} then {f} vanishes on the unit circle and hence (by the maximum principle) vanishes identically on the disk, a contradiction, so we may assume {f(0) \neq 0}. From hypothesis we then have

\displaystyle  \log |f(z)| \leq \log |f(0)| + O(\log M)

on the unit circle, and so from Jensen’s formula (3) we see that

\displaystyle  \sum_{\rho: |\rho| \leq 1} \log \frac{1}{|\rho|} = O(\log M). \ \ \ \ \ (9)

In particular we see that the number of zeroes with {|\rho| \leq c_1} is {O(\log M)}, as claimed.

Suppose {f} has a zero {\rho} with {c_1 < |\rho| \leq 1}. If we factor {f = B_\rho g}, where {B_\rho} is the Blaschke product (5), then

\displaystyle  \frac{f'}{f} = \frac{B'_\rho}{B_\rho} + \frac{g'}{g}

\displaystyle  = \frac{g'}{g} + \frac{1}{z-\rho} - \frac{1}{z-1/\overline{\rho}}.

Observe from Taylor expansion that the distance between {\rho} and {1/\overline{\rho}} is {O( \log \frac{1}{|\rho|} )}, and hence {\frac{1}{z-\rho} - \frac{1}{z-1/\overline{\rho}} = O( \log \frac{1}{|\rho|} )} for {|z| \leq c_2}. Thus we see from (9) that we may use Blaschke products to remove all the zeroes in the annulus {c_1 < |\rho| \leq 1} while only affecting the left-hand side of (8) by {O( \log M)}; also, removing the Blaschke products does not affect {|f(z)|} on the unit circle, and only affects {\log |f(0)|} by {O(\log M)} thanks to (9). Thus we may assume without loss of generality that there are no zeroes in this annulus.

Similarly, given a zero {\rho} with {|\rho| \leq c_1}, we have {\frac{1}{z-1/\overline{\rho}} = O(1)}, so using Blaschke products to remove all of these zeroes also only affects the left-hand side of (8) by {O(\log M)} (since the number of zeroes here is {O(\log M)}), with {\log |f(0)|} also modified by at most {O(\log M)}. Thus we may assume in fact that {f} has no zeroes whatsoever within the unit disk. We may then also normalise {f(0) = 1}, then {\log |f(e^{2\pi i t})| \leq O(\log M)} for all {t \in [0,1]}. By Jensen’s formula again, we have

\displaystyle  \int_0^1 \log |f(e^{2\pi i t})|\ dt = 0

and thus (by using the identity {|x| = 2 \max(x,0) - x} for any real {x})

\displaystyle  \int_0^1 |\log |f(e^{2\pi i t})||\ dt \ll \log M. \ \ \ \ \ (10)

On the other hand, from (7) we have

\displaystyle  \log |f(z)| = \int_0^1 \log |f(e^{2\pi i t})| \Re \frac{e^{2\pi i t} + z}{e^{2\pi i t} - z}\ dt

which implies from (10) that {\log |f(z)|} and its first derivatives are {O( \log M )} on the disk {\{ z: |z| \leq c_2 \}}. But recall from the proof of Jensen’s formula that {\frac{f'}{f}} is the derivative of a logarithm {\log f} of {f}, whose real part is {\log |f|}. By the Cauchy-Riemann equations for {\log f}, we conclude that {\frac{f'}{f} = O(\log M)} on the disk {\{ z: |z| \leq c_2 \}}, as required. \Box

Exercise 10
  • (i) (Borel-Carathéodory theorem) If {f: U \rightarrow {\bf C}} is analytic on an open neighborhood of a disk {\overline{D(z_0,R)}}, and {0 < r < R}, show that

    \displaystyle  \sup_{z \in D(z_0,r)} |f(z)| \leq \frac{2r}{R-r} \sup_{z \in \overline{D(z_0,R)}} \mathrm{Re} f(z) + \frac{R+r}{R-r} |f(z_0)|.

    (Hint: one can normalise {z_0=0}, {R=1}, {f(0)=0}, and {\sup_{|z-z_0| \leq R} \mathrm{Re} f(z)=1}. Now {f} maps the unit disk to the half-plane {\{ \mathrm{Re} z \leq 1 \}}. Use a Möbius transformation to map the half-plane to the unit disk and then use the Schwarz lemma.)
  • (ii) Use (i) to give an alternate way to conclude the proof of Theorem 9.

A variant of the above argument allows one to make precise the heuristic that holomorphic functions locally look like polynomials:

Exercise 11 (Local Weierstrass factorisation) Let the notation and hypotheses be as in Theorem 9. Then show that

\displaystyle  f(z) = P(z) \exp( g(z) )

for all {z} in the disk {\{ z: |z-z_0| < c_2 r \}}, where {P} is a polynomial whose zeroes are precisely the zeroes of {f} in {\{ z: |z-z_0| \leq c_1r \}} (counting multiplicity) and {g} is a holomorphic function on {\{ z: |z-z_0| < c_2 r \}} of magnitude {O_{c_1,c_2}( \log M )} and first derivative {O_{c_1,c_2}( \log M / r )} on this disk. Furthermore, show that the degree of {P} is {O_{c_1,c_2}(\log M)}.

Exercise 12 (Preliminary Beurling factorisation) Let {H^\infty(D(0,1))} denote the space of bounded analytic functions {f: D(0,1) \rightarrow {\bf C}} on the unit disk; this is a normed vector space with norm

\displaystyle  \|f\|_{H^\infty(D(0,1))} := \sup_{z \in D(0,1)} |f(z)|.

  • (i) If {f \in H^\infty(D(0,1))} is not identically zero, and {z_n} denote the zeroes of {f} in {D(0,1)} counting multiplicity, show that

    \displaystyle  \sum_n (1-|z_n|) < \infty

    and

    \displaystyle  \sup_{0 < r < 1} \int_0^{2\pi} | \log |f(re^{i\theta})| |\ d\theta < \infty.

  • (ii) Let the notation be as in (i). If we define the Blaschke product

    \displaystyle  B(z) := z^m \prod_{|z_n| \neq 0} \frac{|z_n|}{z_n} \frac{z_n-z}{1-\overline{z_n} z}

    where {m} is the order of vanishing of {f} at zero, show that this product converges absolutely to a meromorphic function on {{\bf C}} outside of the {1/\overline{z_n}}, and that {|f(z)| \leq \|f\|_{H^\infty(D(0,1))} |B(z)|} for all {z \in D(0,1)}. (It may be easier to work with finite Blaschke products first to obtain this bound.)
  • (iii) Continuing the notation from (i), establish a factorisation {f(z) = B(z) \exp(g(z))} for some holomorphic function {g: D(0,1) \rightarrow {\bf C}} with {\mathrm{Re} g(z) \leq \log \|f\|_{H^\infty(D(0,1))}} for all {z\in D(0,1)}.
  • (iv) (Theorem of F. and M. Riesz, special case) If {f \in H^\infty(D(0,1))} extends continuously to the boundary {\{e^{i\theta}: 0 \leq \theta < 2\pi\}}, show that the set {\{ 0 \leq \theta < 2\pi: f(e^{i\theta})=0 \}} has zero measure.

Remark 13 The factorisation (iii) can be refined further, with {g} being the Poisson integral of some finite measure on the unit circle. Using the Lebesgue decomposition of this finite measure into absolutely continuous and singular parts, one ends up factorising {H^\infty(D(0,1))} functions into “outer functions” and “inner functions”, giving the Beurling factorisation of {H^\infty}. There are also extensions to larger spaces {H^p(D(0,1))} than {H^\infty(D(0,1))} (which are to {H^\infty} as {L^p} is to {L^\infty}), known as Hardy spaces. We will not discuss this topic further here, but see for instance this text of Garnett for a treatment.

Exercise 14 (Littlewood’s lemma) Let {f} be holomorphic on an open neighbourhood of a rectangle {R = \{ \sigma+it: \sigma_0 \leq \sigma \leq \sigma_1; 0 \leq t \leq T \}} for some {\sigma_0 < \sigma_1} and {T>0}, with {f} non-vanishing on the boundary of the rectangle. Show that

\displaystyle  2\pi \sum_\rho (\mathrm{Re}(\rho)-\sigma_0) = \int_0^T \log |f(\sigma_0+it)|\ dt - \int_0^T \log |f(\sigma_1+it)|\ dt

\displaystyle  + \int_{\sigma_0}^{\sigma_1} \mathrm{arg} f(\sigma+iT)\ d\sigma - \int_{\sigma_0}^{\sigma_1} \mathrm{arg} f(\sigma)\ d\sigma

where {\rho} ranges over the zeroes of {f} inside {R} (counting multiplicity) and one uses a branch of {\mathrm{arg} f} which is continuous on the upper, lower, and right edges of {R}. (This lemma is a popular tool to explore the zeroes of Dirichlet series such as the Riemann zeta function.)

— 2. The Weierstrass factorisation theorem —

The fundamental theorem of algebra shows that every polynomial {P(z)} of degree {n} comes with {n} complex zeroes {z_1,\dots,z_n} (counting multiplicity). In the converse direction, given any {n} complex numbers {z_1,\dots,z_n} (again allowing multiplicity), one can form a degree {n} polynomial {P(z)} with precisely these zeroes by the formula

\displaystyle  P(z) := c (z-z_1) \dots (z-z_n) \ \ \ \ \ (11)

where {c} is an arbitrary non-zero constant, and by the factor theorem this is the complete set of polynomials with this set of zeroes (counting multiplicity). Thus, except for the freedom to multiply polynomials by non-zero constants, one has a one-to-one correspondence between polynomials (excluding the zero polynomial as a degenerate case) and finite (multi-)sets of complex numbers.

As discussed earlier in this set of notes, one can think of an entire function as a sort of infinite degree analogue of a polynomial. One can then ask what the analogue of the above correspondence is for entire functions: can one identify entire functions (not identically zero, and up to constants) by their sets of zeroes?

There are two obstructions to this. Firstly there are a number of non-trivial entire functions with no zeroes whatsoever. Most prominently, we have the exponential function {\exp(z)} which has no zeroes despite being non-constant. More generally, if {g(z)} is an entire function, then clearly {\exp(g(z))} is an entire function with no zeroes. In particular one can multiply (or divide) any other entire function {f(z)} by {\exp(g(z))} without affecting the location and order of the zeroes.

Secondly, we know (see Corollary 24 of 246A Notes 3) that the set of zeroes of an entire function (that is not identically zero) must be isolated; in particular, in any compact set there can only be finitely many zeroes. Thus, by covering the complex plane by an increasing sequence of compact sets (e.g., the disks {\overline{D(0,n)}}), one can index the zeroes (counting multiplicity) by a sequence {z_1,z_2,\dots} of complex numbers (possibly with repetition) that is either finite, or goes to infinity.

Now we turn to the Weierstrass factorisation theorem, which asserts that once one accounts for these two obstructions, we recover a correspondence between entire functions and sequences of zeroes.

Theorem 15 (Weierstrass factorization theorem) Let {z_1,z_2,\dots} be a sequence of complex numbers that is either finite or going to infinity. Then there exists an entire function {f} that has zeroes precisely at {z_1,z_2,\dots}, with the order of zero of {f} at each {z_j} equal to the number of times {z_j} appears in the sequence. Furthermore, this entire function is unique up to multiplication by exponentials of entire functions; that is to say, if {f, \tilde f} are entire functions that are both of the above form, then {\tilde f(z) = f(z) \exp(g(z))} for some entire function {g}.

We now establish this theorem. We begin with the easier uniqueness part of the theorem. If {\tilde f, f} are entire functions with the same locations and orders of zeroes, then the ratio {\tilde f/f} is a meromorphic function on {{\bf C}} which only has removable singularities, and becomes an entire function with no zeroes once the singularities are removed. Since the domain {{\bf C}} of an entire function is simply connected, we can then take a branch of the complex logarithm of {\tilde f/f} (see Exercise 46 of 246A Notes 4) to write {\tilde f/f = \exp(g)} for an entire function {g} (after removing singularities), giving the uniqueness claim.

Now we turn to existence. If the sequence {z_1,z_2,\dots} is finite, we can simply use the formula (11) to produce the required entire function {f} (setting {c} to equal {1}, say). So now suppose that the sequence is infinite. Naively, one might try to replicate the formula (11) and set

\displaystyle  f(z) := \prod_{n=1}^\infty (z-z_n).

Here we encounter a serious problem: the infinite product {\prod_{n=1}^\infty (z-z_n)} is likely to be divergent (that is to say, the partial products {\prod_{n=1}^N (z-z_n)} fail to converge), given that the factors {z-z_n} go off to infinity. On the other hand, we do have the freedom to multiply {f} by a constant (or more generally, the exponential of an entire function). One can try to use this freedom to “renormalise” the factors {z-z_n} to make them more likely to converge. Much as an infinite series {\sum_{n=1}^\infty a_n} is more likely to converge when its summands {a_n} converge rapidly to zero, an infinite product {\prod_{n=1}^\infty a_n} is more likely to converge when its factors {a_n} converge rapidly to {1}. Here is one formalisation of this principle:

Lemma 16 (Absolutely convergent products) Let {a_n} be a sequence of complex numbers such that {\sum_{n=1}^\infty |a_n-1| < \infty}. Then the product {\prod_{n=1}^\infty a_n} converges. Furthermore, this product vanishes if and only if one of the factors {a_n} vanishes.

Products covered by this lemma are known as absolutely convergent products. It is possible for products to converge without being absolutely convergent, but such “conditionally convergent products” are infrequently used in mathematics.

Proof: By the zero test, {|a_n-1| \rightarrow 0}, thus {a_n} converges to {1}. In particular, all but finitely many of the {a_n} lie in the disk {D(1,1/2)}. We can then factor {\prod_{n=1}^\infty a_n = \prod_{n=1}^N a_n \times \prod_{n=N+1}^\infty a_n}, where {N} is such that {a_n \in D(1,1/2)} for {n > N}, and we see that it will suffice to show that the infinite product {\prod_{n=N+1}^\infty a_n} converges to a non-zero number. But on using the standard branch {\mathrm{Log}} of the complex logarithm on {D(1,1/2)} we can write {a_n = \exp( \mathrm{Log} a_n )}. By Taylor expansion we have {\mathrm{Log} a_n = O( |a_n-1| )}, hence the series {\sum_{n=N+1}^\infty \mathrm{Log} a_n} is absolutely convergent. From the properties of the complex exponential we then see that the product {\prod_{n=N+1}^\infty a_n} converges to {\exp(\sum_{n=N+1}^\infty \mathrm{Log} a_n)}, giving the claim. \Box
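
As a small numerical aside (not in the notes), here is a Python sketch illustrating Lemma 16 with the factors {a_n = 1 + \frac{1}{n^2}}, for which {\sum_n |a_n - 1| = \sum_n \frac{1}{n^2}} is finite; the partial products can be seen to stabilise (in fact the limit is {\frac{\sinh \pi}{\pi}}, a classical evaluation that also follows from the Euler sine product later in these notes, though the lemma itself only gives convergence).

import math

partial = 1.0
for n in range(1, 100001):
    partial *= 1 + 1 / n**2

print(partial, math.sinh(math.pi) / math.pi)   # partial product vs the known limit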

It is well known that absolutely convergent series are preserved by rearrangement, and the same is true for absolutely convergent products:

Exercise 17 If {\prod_{n=1}^\infty a_n} is an absolutely convergent product of complex numbers {a_n}, show that any permutation of the {a_n} leads to the same absolutely convergent product, thus {\prod_{n=1}^\infty a_n = \prod_{m=1}^\infty a_{\phi(m)}} for any permutation {\phi} of the positive integers {\{1,2,3,\dots\}}.

  • (i) Let {a_n} be a sequence of real numbers with {a_n \geq 1} for all {n}. Show that {\prod_{n=1}^\infty a_n} converges if and only if {\sum_{n=1}^\infty (a_n-1)} converges.
  • (ii) Let {z_n} be a sequence of complex numbers. Show that {\prod_{n=1}^\infty (1+z_n)} is absolutely convergent if and only if {\prod_{n=1}^\infty (1+|z_n|)} is convergent.

To try to use Lemma 16, we can divide each factor {z-z_n} by the constant {-z_n} to make it closer to {1} in the limit {n \rightarrow \infty}. Since

\displaystyle  \frac{z-z_n}{-z_n} = 1 - \frac{z}{z_n}

we can thus attempt to construct the desired entire function {f} using the formula

\displaystyle  f(z) = \prod_{n=1}^\infty (1 - \frac{z}{z_n}). \ \ \ \ \ (12)

Immediately there is the objection that this product is undefined if one or more of the putative zeroes {z_n} is located at the origin. This objection is easily dealt with: the origin can only occur finitely many times (say {m} times) in the sequence, so if we remove the {m} copies of the origin from the sequence of zeroes {z_n}, apply the Weierstrass factorisation theorem to the remaining zeroes, and then multiply the resulting entire function by {z^m}, we can reduce to the case where the origin is not one of the zeroes.

In order to apply Lemma 16 to make this product converge, we would need {\sum_{n=1}^\infty \frac{|z|}{|z_n|}} to converge for every {z}, or equivalently that

\displaystyle  \sum_{n=1}^\infty \frac{1}{|z_n|} < \infty. \ \ \ \ \ (13)

This is basically a requirement that the {z_n} converge to infinity sufficiently quickly. Not all sequences {z_n} covered by Theorem 15 obey this condition, but let us begin with this case for sake of argument. Lemma 16 now tells us that {f(z)} is well-defined for every {z}, and vanishes if and only if {z} is equal to one of the {z_n}. However, we need to establish holomorphicity. This can be accomplished by the following product form of the Weierstrass {M}-test.

Exercise 18 (Product Weierstrass {M}-test) Let {X} be a set, and for any natural number {n}, let {f_n: X \rightarrow {\bf C}} be a bounded function. If the sum {\sum_{n=1}^\infty \sup_{x \in X} |f_n(x)-1| \leq M} for some finite {M}, show that the products {\prod_{n=1}^N f_n} converge uniformly to {\prod_{n=1}^\infty f_n} on {X}. (In particular, if {X} is a topological space and all the {f_n} are continuous, then {\prod_{n=1}^\infty f_n} is continuous also.)

Using this exercise, we see (under the assumption (13)) that the partial products {\prod_{n=1}^N (1 - \frac{z}{z_n})} converge locally uniformly to the infinite product in (12). Since each of the partial products is entire, and the (locally) uniform limit of holomorphic functions is holomorphic (Theorem 34 of 246A Notes 3), we conclude that the function (12) is entire. Finally, if a certain zero {z_j} appears {m} times in the sequence, then after factoring out {m} copies of {(1-\frac{z}{z_j})} we see that {f(z)} is the product of {(1-\frac{z}{z_j})^m} with an entire function that is non-vanishing at {z_j}, and thus {f} has a zero of order exactly {m} at {z_j}. This establishes the Weierstrass factorisation theorem under the additional hypothesis that (13) holds.

What if (13) does not hold? The problem now is that our renormalized factors {1 - \frac{z}{z_n}} do not converge fast enough for Lemma 16 or Exercise 18 to apply. So we need to renormalize further, taking advantage of our ability to not just multiply by constants, but also by exponentials of entire functions. Observe that if {z} is fixed and {n} is large enough, then {1 - \frac{z}{z_n}} lies in {D(1,1/2)} and we can write

\displaystyle  1 - \frac{z}{z_n} = \exp( \mathrm{Log}( 1 - \frac{z}{z_n} ) ).

The function {\mathrm{Log}( 1 - \frac{z}{z_n} )} is not entire, but its Taylor approximation {-\frac{z}{z_n}} is. So it is natural to split

\displaystyle  1 - \frac{z}{z_n} = \exp(-\frac{z}{z_n}) \exp( \mathrm{Log}( 1 - \frac{z}{z_n} ) + \frac{z}{z_n} ) = \exp(-\frac{z}{z_n}) e^{\frac{z}{z_n}} (1-\frac{z}{z_n}).

We now discard the {\exp(-\frac{z}{z_n})} factors (which do not affect the zeroes) and now propose the new candidate entire function

\displaystyle  f(z) = \prod_{n=1}^\infty e^{\frac{z}{z_n}} (1-\frac{z}{z_n}).

The point here is that for {n} large enough, Taylor expansion gives

\displaystyle  \mathrm{Log}( 1 - \frac{z}{z_n} ) + \frac{z}{z_n} = O( \frac{|z|^2}{|z_n|^2})

and thus on exponentiating

\displaystyle  e^{\frac{z}{z_n}} (1-\frac{z}{z_n}) = 1 + O( \frac{|z|^2}{|z_n|^2}).

Repeating the previous arguments, we can then verify that {f} is an entire function with the required properties as long as we have the hypothesis

\displaystyle  \sum_{n=1}^\infty \frac{1}{|z_n|^2} < \infty \ \ \ \ \ (14)

which is weaker than (13) (for instance it is obeyed when {z_n=n}, whereas (13) fails in this case).

This suggests the way forward to the general case of the Weierstrass factorisation theorem, by using increasingly accurate Taylor expansions of

\displaystyle  \mathrm{Log}( 1 - \frac{z}{z_n} ) = \frac{z}{z_n} + \frac{1}{2} \frac{z^2}{z_n^2} + \frac{1}{3} \frac{z^3}{z_n^3} + \dots

when {|z|/|z_n| \leq 1/2}. To formalise this strategy, it is convenient to introduce the canonical factors

\displaystyle  E_k(z) := (1-z) \exp( \sum_{j=1}^k \frac{z^j}{j} ) \ \ \ \ \ (15)

for any complex number {z} and natural number {k}, thus

\displaystyle  E_0(z) = 1-z; \quad E_1(z) = e^z (1-z); \quad E_2(z) = e^{z+z^2/2} (1-z).

For any fixed {k}, these functions are entire and have precisely one zero, at {z=1}. In the disk {D(0,1/2)}, the Taylor expansion

\displaystyle  \mathrm{Log}(1-z) = - z - \frac{z^2}{2} - \frac{z^3}{3} - \dots

indicates that the {E_k(z)} converge uniformly to {1} as {k \rightarrow \infty}. Indeed, for {z \in D(0,1/2)} we have

\displaystyle  E_k(z) = \exp( - \sum_{j=k+1}^\infty \frac{z^j}{j} )

\displaystyle  = \exp( \sum_{j=k+1}^\infty O( 2^{-j} ) )

\displaystyle  = \exp( O( 2^{-k} ) )

\displaystyle  = 1 + O(2^{-k}) \ \ \ \ \ (16)

For our choice of entire function {f} we can now try

\displaystyle  f(z) := \prod_{n=1}^\infty E_{k_n}(z/z_n)

where {k_1,k_2,\dots} are natural numbers that we are at liberty to choose as we please. To get good convergence for the product we can make the {k_n} go to infinity as fast as we please; but it turns out that the (somewhat arbitrary) choice of {k_n := n} will suffice for proving Weierstrass’s theorem, that is to say the product

\displaystyle  f(z) := \prod_{n=1}^\infty E_n(z/z_n)

is absolutely convergent (so that Lemma 16 and Exercise 18 apply, producing an entire function with the required zeroes). Indeed, for {z} in any disk {D(0,R)}, there is some {n_0} such that {|z_n| \geq 2R} for {n \geq n_0}, and for such {n} we have {E_n(z/z_n)-1 = O( 2^{-n})} by (16), giving the desired uniform absolute convergence on any disk {D(0,R)}, establishing the Weierstrass theorem.
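
Here is a minimal Python sketch (an illustration only, not part of the notes) of the construction just performed: the canonical factors {E_k} from (15), and the truncated product {\prod_{n=1}^N E_n(z/z_n)} for the arbitrarily chosen zero sequence {z_n = \sqrt{n}}; the truncation level {N} is likewise an arbitrary choice.

import cmath, math

def E(k, z):
    # canonical factor E_k(z) = (1 - z) exp(z + z^2/2 + ... + z^k/k), as in (15)
    return (1 - z) * cmath.exp(sum(z**j / j for j in range(1, k + 1)))

def weier(z, N=80):
    # truncated Weierstrass product prod_{n=1}^N E_n(z / z_n) with z_n = sqrt(n)
    out = 1
    for n in range(1, N + 1):
        out *= E(n, z / math.sqrt(n))
    return out

print(abs(weier(math.sqrt(5))))   # essentially zero: sqrt(5) is one of the prescribed zeroes
print(abs(weier(2.5 + 1j)))       # a non-zero value at a generic point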

Exercise 19 Let {U} be a connected non-empty open subset of {{\bf C}}.
  • (i) Show that if {z_1,z_2,\dots} is any sequence of points in {U}, that has no accumulation point inside {U}, then there exists a holomorphic function {f: U \rightarrow {\bf C}} that has zeroes precisely at the {z_j}, with the order of each zero {z_j} being the number of times {z_j} occurs in the sequence.
  • (ii) Show that any meromorphic function on {U} can be expressed as the ratio of two holomorphic functions on {U} (with the denominator being non-zero). Conclude that the field of meromorphic functions on {U} is the fraction field of the ring of holomorphic functions on {U}.

Exercise 20 (Mittag-Leffler theorem, special case) Let {z_n} be a sequence of distinct complex numbers going to infinity, and for each {n}, let {P_n} be a polynomial. Show that there exists a meromorphic function {f: {\bf C} \backslash \{z_1,z_2,\dots\} \rightarrow {\bf C}} whose singularity at each {z_n} is given by {P_n(\frac{1}{z-z_n})}, in the sense that {f(z) - P_n(\frac{1}{z-z_n})} has a removable singularity at {z_n}. (Hint: consider a sum of the form {\sum_n (P_n(\frac{1}{z-z_n}) - Q_n(z))}, where {Q_n(z)} is a partial Taylor expansion of {P_n(\frac{1}{z-z_n})} in the disk {D(0,|z_n|/2)}, chosen so that the sum becomes locally uniformly absolutely convergent.) This is a special case of the Mittag-Leffler theorem, which is the same statement but in which the domain {{\bf C}} is replaced by an arbitrary open set {U}; however, the proof of this generalisation is more difficult, requiring tools such as Runge’s approximation theorem which are not covered here.

— 3. The Hadamard factorisation theorem —

The Weierstrass factorisation theorem (and its proof) shows that any entire function {f} that is not identically zero can be factorised as

\displaystyle  f(z) = e^{g(z)} z^m \prod_{n=1}^\infty E_n(z/z_n)

where {m} is the order of vanishing of {f} at the origin, {z_n} are the zeroes of {f} away from the origin (counting multiplicity), and {g} is an additional entire function. However, this factorisation is not always convenient to work with in practice, in large part because the index {n} of the canonical factors {E_n} involved are unbounded, and also because not much information is provided about the entire function {g}. It turns out the situation becomes better if the entire function {f} is also known to be of order at most {\rho} for some {\rho \geq 0}, by which we mean that

\displaystyle |f(z)| \leq C_\varepsilon \exp( |z|^{\rho+\varepsilon})

for every {\varepsilon>0} and {z}, or in asymptotic notation

\displaystyle  f(z) = O_\varepsilon(\exp( |z|^{\rho+\varepsilon} )). \ \ \ \ \ (17)

In this case we expect to obtain a more precise factorisation for the following reason. Let us suppose we are in the simplest case where {f} has no zeroes whatsoever. Then we have {f = \exp(g)} for an entire function {g}. From (17) we then have

\displaystyle  \mathrm{Re} g(z) \leq O_\varepsilon(( 1 + |z|)^{\rho+\varepsilon}) \ \ \ \ \ (18)

for all {z}. This only controls the real part of {g}, but by applying the Borel-Carathéodory theorem (Exercise 10) to the disks {\overline{D(0,2|z|)}} we obtain a bound of the form

\displaystyle  g(z) = O_{\rho,\varepsilon,g(0)}( (1 + |z|)^{\rho+\varepsilon} )

for all {\varepsilon>0} and all {z}. That is to say, {g} is of polynomial growth of order at most {\rho}. Applying Exercise 29 of 246A Notes 3, we conclude that {g} must be a polynomial, and given its growth rate, it must have degree at most {d}, where {d := \lfloor \rho \rfloor} is the integer part of {\rho}. This hints that the entire function {g} that appears in the Weierstrass factorisation theorem could be taken to be a polynomial, if the theorem is formulated correctly. This is indeed the case, and is the content of the Hadamard factorisation theorem:

Theorem 21 (Hadamard factorisation theorem) Let {\rho \geq 0}, let {d := \lfloor \rho \rfloor}, and let {f} be an entire function of order at most {\rho}, with a zero of order {m \geq 0} at the origin and the remaining zeroes indexed (with multiplicity) as a finite or infinite sequence {z_1, z_2, \dots}. Then

\displaystyle  f(z) = e^{g(z)} z^m \prod_n E_d(z/z_n) \ \ \ \ \ (19)

for some polynomial {g} of degree at most {d}. The convergence in the infinite product is locally uniform.

We now prove this theorem. By dividing out by {z^m} (which does not affect the order of {f}) and removing the singularity at the origin, we may assume that {m=0}. If there are no other zeroes {z_n} then we are already done by the previous discussion; similarly if there are only finitely many zeroes {z_n} we can divide out by the finite number of elementary factors and remove singularities and again reduce to a case we have already established. Hence we may suppose that the sequence of zeroes {z_n} is infinite. As the zeroes of {f} are isolated, this forces {z_n} to go to infinity as {n \rightarrow \infty}.

Let us first check that the product {\prod_{n=1}^\infty E_d(z/z_n)} is absolutely convergent and locally uniform. A modification of the bound (16) shows that

\displaystyle  E_d(z/z_n) = 1 + O_d( |z|^{d+1}/|z_n|^{d+1})

if {|z|/|z_n| \leq 1/2}, which is the case for all but finitely many {n} if {z} is confined to a fixed compact set. Thus we will get absolute convergence from Lemma 16 (and also holomorphicity of the product from Exercise 18) once we establish the convergence

\displaystyle  \sum_{n=1}^\infty \frac{1}{|z_n|^{d+1}} < \infty \ \ \ \ \ (20)

(compare with (13), (14), which are basically the {d=0,1} cases of this analysis).

To achieve this convergence we will use the technique of dyadic decomposition (a generalisation of the Cauchy condensation test). Only a finite number of zeroes {z_n} lie in the disk {D(0,1)}, and we have already removed all zeroes at the origin, so by removing those finite zeroes we may assume that {|z_n| \geq 1} for all {n}. In particular, each remaining zero {z_n} lies in an annulus {\{ z: 2^k \leq |z| < 2^{k+1}\}} for some natural number {k}. On each such annulus, the expression {\frac{1}{|z_n|^{d+1}}} is at most {2^{-k(d+1)}} (and is in fact comparable to this quantity up to a constant depending on {d}, which is why we expect the dyadic decomposition method to be fairly efficient). Grouping the terms in (20) according to the annulus they lie in, it thus suffices to show that

\displaystyle  \sum_{k=0}^\infty \frac{N_k}{2^{k(d+1)}} < \infty

where {N_k} is the number of zeroes of {f} (counting multiplicity) in the annulus {\{ z: 2^k \leq |z| < 2^{k+1}\}}. But by Proposition 8, one has {N_k = O_{\rho,\varepsilon,f}( 2^{k(\rho+\varepsilon)})} for any {\varepsilon>0}. Since {d} is the integer part of {\rho}, one can choose {\varepsilon} small enough that {\rho+\varepsilon < d+1}, and so the series is dominated by a convergent geometric series and is thus itself convergent. This establishes the convergence of the product {\prod_n E_d(z/z_n)}. This function has zeroes in the same locations as {f} with the same orders, so by the same arguments as before we have

\displaystyle  f(z) = e^{g(z)} \prod_{n=1}^\infty E_d(z/z_n)

for some entire function {g}. It remains to show that {g} is a polynomial of degree at most {d}. If we could show the bound (18) for all {z} and any {\varepsilon>0}, we would be done. Taking absolute values and logarithms, we have

\displaystyle  \mathrm{Re} g(z) = \log |f(z)| - \sum_{n=1}^\infty \log |E_d(z/z_n)|

for {z \neq z_1,z_2,\dots} and hence as {f} is of order {\rho}, we have

\displaystyle  \mathrm{Re} g(z) \leq O_{f,\rho,\varepsilon}( (1+|z|)^{\rho+\varepsilon} ) - \sum_{n=1}^\infty \log |E_d(z/z_n)|

for {z \neq z_1,z_2,\dots}. So if, for a given choice of {\varepsilon>0}, we could show the lower bound

\displaystyle  \sum_{n=1}^\infty \log |E_d(z/z_n)| \geq - O_{f,\rho,\varepsilon}( (1+|z|)^{\rho+\varepsilon} ) \ \ \ \ \ (21)

for all {z \neq z_1,z_2,\dots}, we would be done (since {g} is continuous at each {z_n}, so the restriction {z \neq z_1,z_2,\dots} can be removed as far as upper bounding {g} is concerned). Unfortunately this can’t quite work as stated, because the factors {E_d(z/z_n)} go to zero as {z} approaches {z_n}, so {\log |E_d(z/z_n)|} approaches {-\infty}. So the desired bound (21) can’t work when {z} gets too close to one of the zeroes {z_n}. On the other hand, this logarithmic divergence is rather mild, so one can hope to somehow evade it. Indeed, suppose we are still able to obtain (21) for a sufficiently “dense” set of {z}, and more precisely for all {z} on a sequence {\{ |z| = R_k \}} of circles of radii {2^k \leq R_k <2^{k+1}} for {k=1,2,\dots}. Then the above argument lets us establish the upper bound

\displaystyle  \mathrm{Re} g(z) \leq O_{f,\rho,\varepsilon}( (1+|z|)^{\rho+\varepsilon} )

when {|z|=R_k}. But by the maximum principle (applied to the harmonic function {\mathrm{Re} g}) this then gives

\displaystyle  \mathrm{Re} g(z) \leq O_{f,\rho,\varepsilon}( (1+|z|)^{\rho+\varepsilon} )

for all {z} (because we have an upper bound of {O_{f,\rho,\varepsilon}( (1+R_k)^{\rho+\varepsilon} )} on each disk {D(0,R_k)}, and each {z} lies in one of the disks {D(0,R_k)} with {1+|z| \sim 1+R_k}).

So it remains to establish (21) for {z} in a sufficiently dense set of circles {\{ |z| = R_k\}}. We need lower bounds on {\log |E_d(z/z_n)|}. In the regime where the zeroes are distant in the sense that {|z|/|z_n| \leq 1/2}, Taylor expansion gives

\displaystyle  E_d(z/z_n) = \exp( O_d( |z|^{d+1}/|z_n|^{d+1} ) )

and hence

\displaystyle  \sum_{n: |z|/|z_n| \leq 1/2} \log |E_d(z/z_n)| \geq - O_d( \sum_{n: |z|/|z_n| \leq 1/2} |z|^{d+1}/|z_n|^{d+1} ).

We can adapt the proof of (20) to control this portion adequately:

Exercise 22 Establish the upper bounds

\displaystyle \sum_{n: |z|/|z_n| \leq 1/2} |z|^{d+1}/|z_n|^{d+1} \leq O_{f,\rho,\varepsilon}( (1+|z|)^{\rho+\varepsilon} )

and

\displaystyle \sum_{n: |z|/|z_n| > 1/2} |z|^{d}/|z_n|^{d} \leq O_{f,\rho,\varepsilon}( (1+|z|)^{\rho+\varepsilon} )

(the latter bound will be useful momentarily).

It thus remains to control the nearby zeroes, in the sense of showing that

\displaystyle  \sum_{n: |z|/|z_n| > 1/2} \log |E_d(z/z_n)| \geq -O_{f,\rho,\varepsilon}( (1+|z|)^{\rho+\varepsilon} )

for all {z} in a set of concentric circles {\{ |z| = R_k\}} with {2^k \leq R_k < 2^{k+1}}. Thus we need to lower bound {\log |E_d(w)|} for {|w| > 1/2}. Here we no longer attempt to take advantage of Taylor expansion as the convergence is too poor in this regime, so we fall back on the triangle inequality. Indeed, from (15) and that inequality we have

\displaystyle  \log |E_d(w)| \geq -\log \frac{1}{|1-w|} - \sum_{j=1}^d \frac{|w|^j}{j} \geq -\log \frac{1}{|1-w|} - O_d(|w|^d)

hence

\displaystyle  \sum_{n: |z|/|z_n| > 1/2} \log |E_d(z/z_n)| \geq -\sum_{n: |z|/|z_n| > 1/2} \log \frac{1}{|1-z/z_n|}

\displaystyle - O_d(\sum_{n: |z|/|z_n| > 1/2} \frac{|z|^d}{|z_n|^d} ).

From the second part of Exercise 22, the contribution of the latter term is acceptable, so it remains to establish the upper bound

\displaystyle  \sum_{n: |z|/|z_n| > 1/2} \log \frac{1}{|1-z/z_n|} \leq O_{f,\rho,\varepsilon}( (1+|z|)^{\rho+\varepsilon} )

for {z} in a set of concentric circles {\{ |z| = R_k\}} with {2^k \leq R_k < 2^{k+1}}.

Given the set of {z} we are working with, it is natural to introduce the radial variable {r := |z|}. From the triangle inequality one has {|1-z/z_n| \geq |1-r/|z_n||}, so it will suffice to show that

\displaystyle  \sum_{n: r/|z_n| > 1/2} \log \frac{1}{|1-r/|z_n||} \leq O_{f,\rho,\varepsilon}( 2^{(\rho+\varepsilon)k} )

for at least one radius {r \in [2^k, 2^{k+1})} for each natural number {k}. Note that we cannot just pick any radius {r} here, because if {r} happens to be too close to one of the {|z_n|} then the term {\log \frac{1}{|1-r/|z_n||}} will become unbounded. But we can avoid this by the strategy of the probabilistic method: we just choose {r} randomly in the interval {[2^k, 2^{k+1})}. As long we can establish the average bound

\displaystyle  \frac{1}{2^k} \int_{2^k}^{2^{k+1}} \sum_{n: r/|z_n| > 1/2} \log \frac{1}{|1-r/|z_n||}\ dr \leq O_{f,\rho,\varepsilon}( 2^{(\rho+\varepsilon)k} )

we can use the pigeonhole principle to find one good radius {r} in the desired range, which is all we need.

The point is that this averaging can take advantage of the mild nature of the logarithmic singularity. Indeed a routine computation shows that

\displaystyle  \frac{1}{2^k} \int_{2^k}^{2^{k+1}} 1_{r/R > 1/2} \log \frac{1}{|1-r/R|}\ dr

vanishes unless {R \leq 2^{k+2}} and is bounded by {O(1)} otherwise, so by Fubini’s theorem the left-hand side is bounded by

\displaystyle  O( \sum_{n: |z_n| \leq 2^{k+2}} 1 )

and the claim now follows from Proposition 8. This concludes the proof of the Hadamard factorisation theorem.

Exercise 23 (Converse to Hadamard factorisation) Let {\rho \geq 0}, let {m} be a natural number, and let {z_1,z_2,\dots} be a finite or infinite sequence of non-zero complex numbers such that {\sum_{n=1}^\infty \frac{1}{|z_n|^{\rho+\varepsilon}} < \infty} for every {\varepsilon>0}. Let {d := \lfloor \rho\rfloor}. Show that for every polynomial {g} of degree at most {d}, the function {f} defined by (19) is an entire function of order at most {\rho}, with a zero of order {m} at the origin, zeroes at each {z_j} of order equal to the number of times {z_j} occurs in the sequence, and no other zeroes. Thus we see that we have a one-to-one correspondence between non-trivial entire functions of order at most {\rho} (up to multiplication by {e^{g(z)}} factors for {g} a polynomial of degree at most {d}) and zeroes {z_n} obeying a certain growth condition.

As an illustration of the Hadamard factorisation theorem, we apply it to the entire function {\sin(z)}. Since

\displaystyle  \sin(z) = O( |e^{iz}| + |e^{-iz}| ) = O( \exp( |z| ) )

we see that {\sin} is of order at most {1} (indeed it is not difficult to show that its order is exactly {1}). Also, its zeroes are the integer multiples {\pi n} of {\pi}, with {n \in {\bf Z}}. The Hadamard factorisation theorem then tells us that we have the product expansion

\displaystyle  \sin(z) = e^{g(z)} z \prod_{n \in {\bf Z} \backslash \{0\}} E_1(\frac{z}{\pi n})

for some polynomial {g} of degree at most {1}. Writing {E_1(\frac{z}{\pi n}) = e^{z/\pi n} (1 - \frac{z}{\pi n})} we can group together the {n} and {-n} terms in the absolutely convergent product to get

\displaystyle  E_1(\frac{z}{\pi n}) E_1(\frac{z}{\pi (-n)}) = (1 - \frac{z^2}{\pi^2 n^2})

so the Hadamard factorisation can also be written in the form

\displaystyle  \sin(z) = e^{g(z)} z \prod_{n=1}^\infty (1 - \frac{z^2}{\pi^2 n^2}).

But what is {g(z)}? Dividing by {z} and taking a suitable branch {\tilde \log} of the logarithm of {\frac{\sin z}{z}} we see that

\displaystyle  \tilde \log \frac{\sin z}{z} = g(z) + \sum_{n=1}^\infty \mathrm{Log} (1 - \frac{z^2}{\pi^2 n^2}) \ \ \ \ \ (22)

for {z} sufficiently close to zero (removing the singularity of {\frac{\sin z}{z}} at the origin). By Taylor expansion we have

\displaystyle  \frac{\sin z}{z} = 1 - \frac{z^2}{3!} + \dots

and

\displaystyle  \mathrm{Log} (1 - \frac{z^2}{\pi^2 n^2}) = - \frac{z^2}{\pi^2 n^2} + \dots

for {z} close to zero, thus

\displaystyle  \tilde \log \frac{\sin z}{z} = 2k \pi i - \frac{z^2}{3!} + \dots

for some integer {k}. Since {g} is linear, we conclude on comparing coefficients that {g} must in fact just be a constant {2k \pi i}, and we can in fact normalise {k=0} since shifting {g} by an integer multiple of {2\pi i} does not affect {e^{g(z)}}. We conclude the Euler sine product formula

\displaystyle  \sin(z) = z \prod_{n=1}^\infty (1 - \frac{z^2}{\pi^2 n^2}) \ \ \ \ \ (23)

(first conjectured by Euler in 1734). Inspecting the {z^2} coefficients of (22), we also see that

\displaystyle  -\frac{1}{3!} = \sum_{n=1}^\infty -\frac{1}{\pi^2 n^2}

which rearranges to yield Euler’s famous formula

\displaystyle  \sum_{n=1}^\infty \frac{1}{n^2} = \frac{\pi^2}{6} \ \ \ \ \ (24)

that solves the Basel problem. Observe that by inspecting higher coefficients {z^{2k}} of the Taylor series one more generally obtains identities of the form

\displaystyle  \sum_{n=1}^\infty \frac{1}{n^{2k}} = b_{2k} \pi^{2k}

for all positive integers {k} and some rational numbers {b_{2k}} (which are essentially Bernoulli numbers). Applying (23) to {z=\pi/2} and rearranging one also gets the famous Wallis product

\displaystyle  \frac{\pi}{2} = \prod_{n=1}^\infty (1 - \frac{1}{4n^2})^{-1}.
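
As a numerical aside (not part of the notes), one can check (23), (24), and the Wallis product directly in Python with straightforward truncations; the truncation levels below are arbitrary, and the truncation errors decay only like {1/N}.

import math, cmath

def sin_product(z, N=200000):
    out = z
    for n in range(1, N + 1):
        out *= 1 - z * z / (math.pi**2 * n**2)
    return out

z = 1.3 + 0.4j
print(sin_product(z), cmath.sin(z))          # (23), up to the truncation error

print(sum(1 / n**2 for n in range(1, 200001)), math.pi**2 / 6)   # (24)

wallis = 1.0
for n in range(1, 200001):
    wallis /= 1 - 1 / (4 * n**2)
print(wallis, math.pi / 2)                   # Wallis product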

Hadamard’s theorem also tells us that any other entire function of order at most {1} that has simple zeroes at the integer multiples of {\pi}, and no other zeroes, must take the form {e^{az+b} \sin(z)} for some complex numbers {a,b}. Thus the sine function is almost completely determined by its set of zeroes, together with the fact that it is an entire function of order {1}.

Exercise 24 Show that

\displaystyle  \sum_{n \in {\bf Z}} \frac{1}{z^2 - n^2} = \frac{\pi \cot(\pi z)}{z}

for any complex number {z} that is not an integer. Use this to give an alternate proof of (24).
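
A quick numerical check of the identity in Exercise 24 (again, not part of the notes): we truncate the absolutely convergent sum at {|n| \leq N} for an arbitrary non-integer test point.

import math, cmath

z = 0.3 + 0.7j
N = 50000
lhs = sum(1 / (z * z - n * n) for n in range(-N, N + 1))
rhs = math.pi * cmath.cos(math.pi * z) / cmath.sin(math.pi * z) / z
print(lhs, rhs)   # agree up to the O(1/N) truncation error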

— 4. The Gamma function —

As we saw in the previous section (and after applying a simple change of variables), the only entire functions of order {1} that have simple zeroes at the integers (and nowhere else) are of the form {e^{az+b} \sin(\pi z)}. It is natural to ask what happens if one replaces the integers by the natural numbers {\{0, 1, 2, \dots\}}; one could think of such functions as being in some sense “half” of the function {\sin(\pi z)}. Actually, it is traditional to normalise the problem a different way, and ask what entire functions of order {1} have zeroes at the non-positive integers {0, -1, -2, -3, \dots}; it is also traditional to refer to the complex variable in this problem by {s} instead of {z}. By the Hadamard factorisation theorem, such functions must take the form

\displaystyle  f(s) = s e^{as+b} \prod_{n=1}^\infty E_1(-s/n)

\displaystyle  =s e^{as+b} \prod_{n=1}^\infty e^{-s/n} (1+\frac{s}{n})

for some complex numbers {a,b}; conversely, Exercise 23 tells us that every function of this form is entire of order at most {1}, with simple zeroes at {0,-1,-2,\dots}. We are free to select the constants {a,b} as we please to produce a function {f} of this class. It is traditional to normalise {b=0}; for the most natural normalisation of {a}, see below.

What properties would such functions have? The zero set {0,-1,-2,\dots} are nearly invariant with respect to the shift {s \mapsto s+1}, so one expects {f(s)} and {f(s+1)} to be related. Indeed we have

\displaystyle  f(s+1) = (s+1) e^{as+a} \prod_{n=1}^\infty e^{-(s+1)/n} (1+\frac{s+1}{n})

\displaystyle  = (s+1) e^{as+a} \prod_{n=2}^\infty e^{-(s+1)/(n-1)} (1+\frac{s+1}{n-1})

\displaystyle  = (s+1) e^{as+a} \prod_{n=2}^\infty e^{-s/(n-1)} (1+\frac{s}{n}) (1+\frac{1}{n-1}) e^{-1/(n-1)}

while

\displaystyle  f(s) = s e^{as} e^{-s} (1+s) \prod_{n=2}^\infty e^{-s/n} (1+\frac{s}{n})

so we have (using the telescoping series {\sum_{n=2}^\infty \frac{1}{n-1}-\frac{1}{n} = 1})

\displaystyle  s f(s+1) = f(s) e^a \prod_{n=2}^\infty (1+\frac{1}{n-1}) e^{-1/(n-1)}

\displaystyle  = f(s) e^a \prod_{n=1}^\infty (1+\frac{1}{n}) e^{-1/n}.

It is then natural to normalise {a} to be the real number {\gamma} for which

\displaystyle  e^\gamma \prod_{n=1}^\infty (1+\frac{1}{n}) e^{-1/n} = 1.

This number is known as the Euler-Mascheroni constant and is approximately equal to {0.577\dots}. Taking logarithms, we can write it in a more familiar form as

\displaystyle  \gamma = -\lim_{N \rightarrow \infty} \sum_{n=1}^N \log( (1+\frac{1}{n}) e^{-1/n} )

\displaystyle  = \lim_{N \rightarrow \infty} 1 + \frac{1}{2} + \dots + \frac{1}{N} - \log N.
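
Numerically (an aside, not in the notes), both limits above can be checked in Python; the cutoff {N} is arbitrary and the error decays like {1/N}.

import math

N = 10**6
harmonic = sum(1 / n for n in range(1, N + 1))
print(harmonic - math.log(N))                                    # about 0.5772...
print(-sum(math.log1p(1 / n) - 1 / n for n in range(1, N + 1)))  # the same limit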

With this choice for {a}, {f(s)} is then an entire function of order one that obeys the functional equation {s f(s+1) = f(s)} for all {s} and has simple zeroes at the non-positive integers and nowhere else; this uniquely specifies {f} up to constants. The reciprocal

\displaystyle  \Gamma(s) := e^{-\gamma s} s^{-1} \prod_{n=1}^\infty e^{s/n} (1+\frac{s}{n})^{-1}, \ \ \ \ \ (25)

known as the (Weierstrass definition of the) Gamma function, is then a meromorphic function with simple poles at the non-positive integers and nowhere else that obeys the functional equation

\displaystyle  \Gamma(s+1) = s \Gamma(s) \ \ \ \ \ (26)

(for {s} away from the poles {0,-1,-2,\dots}) and is the reciprocal of an entire function of order {1} (in particular, it has no zeroes). This uniquely defines {\Gamma} up to constants.

Note that as {s \rightarrow 0}, the function {e^{-\gamma s} \prod_{n=1}^\infty e^{s/n} (1+\frac{s}{n})^{-1}} converges to one, hence {\Gamma} has a residue of {1} at the origin {s=0}; equivalently, by (26) we have

\displaystyle  \Gamma(1)=1. \ \ \ \ \ (27)

Thus {\Gamma} is the unique reciprocal of an entire function of order {1} with simple poles at the non-positive integers and nowhere else that obeys (26), (27).

From (27), (26) and induction we see that

\displaystyle  \Gamma(n+1) = n!

for all natural numbers {n=0,1,2,\dots}. Thus the Gamma function can be viewed as a complex extension of the factorial function (shifted by a unit). From the definition we also see that

\displaystyle  \Gamma(\overline{s}) = \overline{\Gamma(s)} \ \ \ \ \ (28)

for all {s} outside of the poles of {\Gamma}.
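
The Weierstrass definition (25) is also easy to test numerically. The following Python sketch (not from the notes; the truncation level is arbitrary and the error decays only like {1/N}) evaluates a truncated version of (25) and checks it against (26), (27), and {\Gamma(n+1)=n!}.

import cmath, math

EULER_GAMMA = 0.5772156649015329   # the Euler-Mascheroni constant

def gamma_weier(s, N=100000):
    # truncation of the Weierstrass product (25)
    out = cmath.exp(-EULER_GAMMA * s) / s
    for n in range(1, N + 1):
        out *= cmath.exp(s / n) / (1 + s / n)
    return out

print(gamma_weier(1))                              # (27): approximately 1
print(gamma_weier(4.5), 3.5 * gamma_weier(3.5))    # (26): Gamma(s+1) = s Gamma(s)
print(gamma_weier(6), math.factorial(5))           # Gamma(6) = 5! = 120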

One can readily establish several more identities and asymptotics for {\Gamma}:

Exercise 25 (Euler reflection formula) Show that

\displaystyle  \Gamma(s) \Gamma(1-s) = \frac{\pi}{\sin(\pi s)}

whenever {s} is not an integer. (Hint: use the Hadamard factorisation theorem.) Conclude in particular that {\Gamma(1/2) = \sqrt{\pi}}.
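
For what it is worth, the reflection formula can also be checked numerically on the real axis using the standard library's (real) Gamma function; this is of course only a sanity check, not a proof.

import math

for s in [0.1, 0.25, 0.5, 0.8]:
    print(math.gamma(s) * math.gamma(1 - s), math.pi / math.sin(math.pi * s))

print(math.gamma(0.5), math.sqrt(math.pi))   # Gamma(1/2) = sqrt(pi)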

Exercise 26 (Digamma function) Define the digamma function to be the logarithmic derivative {\frac{\Gamma'}{\Gamma}} of the Gamma function. Show that the digamma function is a meromorphic function, with simple poles of residue {-1} at the non-positive integers {0, -1, -2, \dots} and no other poles, and that

\displaystyle  \frac{\Gamma'}{\Gamma}(s) = \lim_{N \rightarrow \infty} \log N - \sum_{n=0}^N \frac{1}{s+n}

\displaystyle  = -\gamma + \sum_{\rho = 0, -1, -2, \dots} (\frac{1}{1-\rho} - \frac{1}{s-\rho})

for {s} outside of the poles of {\frac{\Gamma'}{\Gamma}}, with the sum being absolutely convergent. Establish the reflection formula

\displaystyle  \frac{\Gamma'}{\Gamma}(1-s) - \frac{\Gamma'}{\Gamma}(s) = \pi \cot(\pi s) \ \ \ \ \ (29)

or equivalently

\displaystyle  \pi \cot(\pi s) = \frac{1}{s} + \sum_{\rho \in {\bf Z} \backslash \{0\}} (\frac{1}{s-\rho} + \frac{1}{\rho})

for non-integer {s}.

Exercise 27 (Euler product formula) Show that for any {s \neq 0,-1,-2,\dots}, one has

\displaystyle  \Gamma(s) = \lim_{n \rightarrow \infty} \frac{n! n^s}{s(s+1) \dots (s+n)} = \frac{1}{s} \prod_n \frac{(1+1/n)^s}{1+s/n}.

Exercise 28 Let {s} be a complex number with {\mathrm{Re}(s) > 0}.
  • (i) For any positive integer {n}, show that

    \displaystyle \frac{n! n^s}{s(s+1) \dots (s+n)} = \int_0^n t^s (1-\frac{t}{n})^n\ \frac{dt}{t}.

  • (ii) (Bernoulli definition of Gamma function) Show that

    \displaystyle  \Gamma(s) = \int_0^\infty t^s e^{-t}\ \frac{dt}{t}.

    What happens if the hypothesis {\mathrm{Re} s > 0} is dropped?
  • (iii) (Beta function identity) Show that

    \displaystyle  \int_0^1 t^{s_1-1} (1-t)^{s_2-1}\ dt = \frac{\Gamma(s_1) \Gamma(s_2)}{\Gamma(s_1+s_2)}

    whenever {s_1,s_2} are complex numbers with {\mathrm{Re}(s_1), \mathrm{Re}(s_2) > 0}. (Hint: evaluate {\int_0^\infty \int_0^\infty t_1^{s_1-1} t_2^{s_2-1} e^{-t_1-t_2}\ dt_1 dt_2} in two different ways.)

We remark that the Bernoulli definition of the {\Gamma} function is often the first definition of the Gamma function introduced in texts (see for instance this previous blog post for an arrangement of the material here based on this definition). It is because of the identities (ii), (iii) that the Gamma function frequently arises when evaluating many other integrals involving polynomials or exponentials, and in particular is a frequent presence in identities involving standard integral transforms, such as the Fourier transform, Laplace transform, or Mellin transform.

Exercise 29
  • (i) (Quantitative form of integral test) Show that

    \displaystyle  \sum_{y \leq n \leq x} f(n) = \int_y^x f(t)\ dt + O( \int_y^x |f'(t)|\ dt + |f(y)| )

    for any real {y \leq x} and any continuously differentiable function {f: [y,x] \rightarrow {\bf C}}.
  • (ii) Using (i) and Exercise 26, obtain the asymptotic

    \displaystyle  \frac{\Gamma'}{\Gamma}(s) = \hbox{Log}(s) + O_\varepsilon( \frac{1}{|s|} )

    whenever {\varepsilon>0} and {s} is in the sector {|\hbox{Arg}(s)| \leq \pi - \varepsilon} (that is, {s} makes at least a fixed angle with the negative real axis), where {\hbox{Arg}} and {\hbox{Log}} are the standard branches of the argument and logarithm respectively (with branch cut on the negative real axis).
  • (iii) (Trapezoid rule) Let {y < x} be distinct integers, and let {f: [y,x] \rightarrow {\bf C}} be a continuously twice differentiable function. Show that

    \displaystyle  \sum_{y \leq n \leq x} f(n) = \int_y^x f(t)\ dt + \frac{1}{2} f(x) + \frac{1}{2} f(y) + O( \int_y^x |f''(t)|\ dt ).

    (Hint: first establish the case when {x=y+1}.)
  • (iv) Refine the asymptotics in (ii) to

    \displaystyle  \frac{\Gamma'}{\Gamma}(s) = \hbox{Log}(s) - \frac{1}{2s} + O_\varepsilon( \frac{1}{|s|^2} ).

  • (v) (Stirling approximation) In the sector used in (ii), (iv), establish the Stirling approximation

    \displaystyle  \Gamma(s) = \exp( (s -\frac{1}{2}) \hbox{Log}(s) - s + \frac{1}{2} \log 2\pi + O_\varepsilon( \frac{1}{|s|} ) ).

  • (vi) Establish the size bound

    \displaystyle  |\Gamma(\sigma+it)| \asymp e^{-\pi|t|/2} |t|^{\sigma-\frac{1}{2}} \ \ \ \ \ (30)

    whenever {\sigma,t} are real numbers with {\sigma = O(1)} and {|t| \gtrsim 1}.
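As a numerical aside (not in the notes), the Stirling approximation from part (v) is easy to eyeball on the real axis by comparing {\log \Gamma(s)} with the main term:

    import math
    # log Gamma(s) versus (s - 1/2) log s - s + (1/2) log(2 pi); the gap should
    # shrink like O(1/s) as s grows
    for s in [5.0, 20.0, 100.0]:
        main_term = (s - 0.5) * math.log(s) - s + 0.5 * math.log(2 * math.pi)
        print(s, math.lgamma(s), main_term, math.lgamma(s) - main_term)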

Exercise 30

Exercise 31 Show that {\Gamma'(1) = -\gamma}.

Exercise 32 (Bohr-Mollerup theorem) Establish the Bohr-Mollerup theorem: the function {\Gamma: (0,+\infty) \rightarrow (0,+\infty)}, which is the Gamma function restricted to the positive reals, is the unique log-convex function {f :(0,+\infty) \rightarrow (0,+\infty)} on the positive reals with {f(1)=1} and {sf(s) = f(s+1)} for all {s>0}.

Jordan EllenbergCaring about sports

When I was younger I cared about sports a lot. If the Orioles lost a big game — especially to the hated Yankees — it ruined my day, or more than one day. I remember when Dr. Mrs. Q. first found out about this she thought I was kidding; it made no sense to her that somebody could actually care enough to let it turn your whole ship of mood.

CJ is different. It has been an emotionally complicated last few years for Wisconsin sports fans, with all the local teams being good, really good, but never good enough to win the title. The Badgers losing the NCAA final to (the hated) Duke. The Brewers getting rolled out of the NLCS by the Dodgers. Of course, the Bucks, the team with the best record in the league and the two-time MVP, getting knocked out of the playoffs. And today, the 14-3 Packers losing the NFC championship to the Buccaneers. And I gotta say — CJ, while watching a game, is as intensely into his team as I have ever been. But after it’s over? It’s over. He doesn’t stew. I don’t know where he got this equanimity. Not from me, maybe from Dr. Mrs. Q. But I think I’m starting to get it from him. Maybe it just comes with age — or maybe I’m actually learning something.

January 24, 2021

Terence Tao246B, Notes 2: Some connections with the Fourier transform

In Exercise 5 (and Lemma 1) of 246A Notes 4 we already observed some links between complex analysis on the disk (or annulus) and Fourier series on the unit circle:

  • (i) Functions {f} that are holomorphic on an annulus {\{ r_- < |z| < r_+ \}} are expressed by a convergent Fourier series (and also Laurent series) {f(re^{i\theta}) = \sum_{n=-\infty}^\infty r^n a_n e^{in\theta}}, where

    \displaystyle  \limsup_{n \rightarrow +\infty} |a_n|^{1/n} \leq \frac{1}{r_+}; \limsup_{n \rightarrow -\infty} |a_n|^{1/|n|} \leq \frac{1}{r_-}; \ \ \ \ \ (1)

    conversely, every doubly infinite sequence {(a_n)_{n=-\infty}^\infty} of coefficients obeying (1) arises from such a function {f}.
  • (ii) Functions {f} that are holomorphic on a disk {\{ |z| < R \}} are expressed by a convergent Fourier series (and also Taylor series) {f(re^{i\theta}) = \sum_{n=0}^\infty r^n a_n e^{in\theta}} (so in particular {a_n = \frac{1}{n!} f^{(n)}(0)}), where

    \displaystyle  \limsup_{n \rightarrow +\infty} |a_n|^{1/n} \leq \frac{1}{R}; \ \ \ \ \ (2)

    conversely, every infinite sequence {(a_n)_{n=0}^\infty} of coefficients obeying (2) arises from such a function {f}.
  • (iii) In the situation of (i), there is a unique decomposition {f = f_1 + f_2}, where {f_1} extends holomorphically to {\{ z: |z| < r_+\}}, and {f_2} extends holomorphically to {\{ z: |z| > r_-\}} and goes to zero at infinity; these components are given by the formulae

    \displaystyle  f_1(z) = \sum_{n=0}^\infty a_n z^n = \frac{1}{2\pi i} \int_\gamma \frac{f(w)}{w-z}\ dw

    where {\gamma} is any anticlockwise contour in {\{ z: |z| < r_+\}} enclosing {z}, and

    \displaystyle  f_2(z) = \sum_{n=-\infty}^{-1} a_n z^n = - \frac{1}{2\pi i} \int_\gamma \frac{f(w)}{w-z}\ dw

    where {\gamma} is any anticlockwise contour in {\{ z: |z| > r_-\}} enclosing {0} but not {z}.

This connection lets us interpret various facts about Fourier series through the lens of complex analysis, at least for some special classes of Fourier series. For instance, the Fourier inversion formula {a_n = \frac{1}{2\pi} \int_0^{2\pi} f(e^{i\theta}) e^{-in\theta}\ d\theta} becomes the Cauchy-type formula for the Laurent or Taylor coefficients of {f}, in the event that the coefficients are doubly infinite and obey (1) for some {r_- < 1 < r_+}, or singly infinite and obey (2) for some {R > 1}.
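As a concrete illustration of this dictionary (a quick numerical sketch, not part of the notes; the test function and grid size are ad hoc), one can recover Laurent coefficients by sampling a function on the unit circle and applying a discrete Fourier transform:

    import numpy as np
    # f(z) = 1/z + 1/(z-2) is holomorphic on the annulus 0 < |z| < 2, with
    # Laurent coefficients a_{-1} = 1 and a_n = -1/2^(n+1) for n >= 0.
    N = 256
    t = 2 * np.pi * np.arange(N) / N
    z = np.exp(1j * t)
    samples = 1 / z + 1 / (z - 2)
    coeffs = np.fft.fft(samples) / N          # coeffs[n] approximates a_n (indices mod N)
    print(coeffs[-1].real)                    # a_{-1}: should be close to 1
    print([round(coeffs[n].real, 6) for n in range(4)])
    print([-1 / 2 ** (n + 1) for n in range(4)])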

It turns out that there are similar links between complex analysis on a half-plane (or strip) and Fourier integrals on the real line, which we will explore in these notes.

We first fix a normalisation for the Fourier transform. If {f \in L^1({\bf R})} is an absolutely integrable function on the real line, we define its Fourier transform {\hat f: {\bf R} \rightarrow {\bf C}} by the formula

\displaystyle  \hat f(\xi) := \int_{\bf R} f(x) e^{-2\pi i x \xi}\ dx. \ \ \ \ \ (3)

From the dominated convergence theorem {\hat f} will be a bounded continuous function; from the Riemann-Lebesgue lemma it also decays to zero as {\xi \rightarrow \pm \infty}. My choice to place the {2\pi} in the exponent is a personal preference (it is slightly more convenient for some harmonic analysis formulae such as the identities (4), (5), (6) below), though in the complex analysis and PDE literature there are also some slight advantages in omitting this factor. In any event it is not difficult to adapt the discussion in these notes to other choices of normalisation. It is of interest to extend the Fourier transform beyond the {L^1({\bf R})} class into other function spaces, such as {L^2({\bf R})} or the space of tempered distributions, but we will not pursue this direction here; see for instance these lecture notes of mine for a treatment.

Exercise 1 (Fourier transform of Gaussian) If {a>0} and {f} is the Gaussian function {f(x) := e^{-\pi a x^2}}, show that the Fourier transform {\hat f} is given by the gaussian {\hat f(\xi) = a^{-1/2} e^{-\pi \xi^2/a}}.
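Exercise 1 is also a convenient test case for numerics; the following sketch (not from the notes; the grid and step size are ad hoc) approximates the integral (3) by a Riemann sum and compares it with the claimed Gaussian:

    import numpy as np
    a, h = 2.0, 1e-3
    x = np.arange(-10, 10, h)
    f = np.exp(-np.pi * a * x**2)
    for xi in [0.0, 0.5, 1.3]:
        fhat = h * np.sum(f * np.exp(-2j * np.pi * x * xi))   # Riemann sum for (3)
        print(xi, fhat.real, a ** -0.5 * np.exp(-np.pi * xi**2 / a))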

The Fourier transform has many remarkable properties. On the one hand, as long as the function {f} is sufficiently “reasonable”, the Fourier transform enjoys a number of very useful identities, such as the Fourier inversion formula

\displaystyle  f(x) = \int_{\bf R} \hat f(\xi) e^{2\pi i x \xi} d\xi, \ \ \ \ \ (4)

the Plancherel identity

\displaystyle  \int_{\bf R} |f(x)|^2\ dx = \int_{\bf R} |\hat f(\xi)|^2\ d\xi, \ \ \ \ \ (5)

and the Poisson summation formula

\displaystyle  \sum_{n \in {\bf Z}} f(n) = \sum_{k \in {\bf Z}} \hat f(k). \ \ \ \ \ (6)
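For instance (a numerical aside, not part of the notes), the Poisson summation formula can be checked directly for the Gaussian of Exercise 1, whose Fourier transform is known explicitly:

    import math
    # f(x) = exp(-pi a x^2) has Fourier transform a^(-1/2) exp(-pi k^2 / a)
    a = 0.7
    lhs = sum(math.exp(-math.pi * a * n * n) for n in range(-50, 51))
    rhs = sum(math.exp(-math.pi * k * k / a) / math.sqrt(a) for k in range(-50, 51))
    print(lhs, rhs)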

On the other hand, the Fourier transform also intertwines various qualitative properties of a function {f} with “dual” qualitative properties of its Fourier transform {\hat f}; in particular, “decay” properties of {f} tend to be associated with “regularity” properties of {\hat f}, and vice versa. For instance, the Fourier transforms of rapidly decreasing functions tend to be smooth. There are complex analysis counterparts of this Fourier dictionary, in which “decay” properties are described in terms of exponentially decaying pointwise bounds, and “regularity” properties are expressed using holomorphicity on various strips, half-planes, or the entire complex plane. The following exercise gives some examples of this:

Exercise 2 (Decay of {f} implies regularity of {\hat f}) Let {f \in L^1({\bf R})} be an absolutely integrable function. Hint: to establish holomorphicity in each of these cases, use Morera’s theorem and the Fubini-Tonelli theorem. For uniqueness, use analytic continuation, or (for part (iv)) the Cauchy integral formula.


Later in these notes we will give a partial converse to part (ii) of this exercise, known as the Paley-Wiener theorem; there are also partial converses to the other parts of this exercise.

From (3) we observe the following intertwining property between multiplication by an exponential and complex translation: if {\xi_0} is a complex number and {f: {\bf R} \rightarrow {\bf C}} is an absolutely integrable function such that the modulated function {f_{\xi_0}(x) := e^{2\pi i \xi_0 x} f(x)} is also absolutely integrable, then we have the identity

\displaystyle  \widehat{f_{\xi_0}}(\xi) = \hat f(\xi - \xi_0) \ \ \ \ \ (7)

whenever {\xi} is a complex number such that at least one of the two sides of the equation in (7) is well defined. Thus, multiplication of a function by an exponential weight corresponds (formally, at least) to translation of its Fourier transform. By using contour shifting, we will also obtain a dual relationship: under suitable holomorphicity and decay conditions on {f}, translation by a complex shift will correspond to multiplication of the Fourier transform by an exponential weight. It turns out to be possible to exploit this property to derive many Fourier-analytic identities, such as the inversion formula (4) and the Poisson summation formula (6), which we do later in these notes. (The Plancherel theorem can also be established by complex analytic methods, but this requires a little more effort; see Exercise 8.)

The material in these notes is loosely adapted from Chapter 4 of Stein-Shakarchi’s “Complex Analysis”.

— 1. The inversion and Poisson summation formulae —

We now explore how the Fourier transform {\hat f} of a function {f} behaves when {f} extends holomorphically to a strip. For technical reasons we will also impose a fairly mild decay condition on {f} at infinity to ensure integrability. As we shall shortly see, the method of contour shifting then allows us to insert various exponentially decaying factors into Fourier integrals that make the justification of identities such as the Fourier inversion formula straightforward.

Proposition 3 (Fourier transform of functions holomorphic in a strip) Let {a > 0}, and suppose that {f} is a holomorphic function on the strip {\{ z: |\mathrm{Im} z| < a \}} which obeys a decay bound of the form

\displaystyle  |f(x+iy)| \leq \frac{C_b}{1+|x|^2} \ \ \ \ \ (8)

for all {x \in {\bf R}}, {0 < b < a}, {y \in [-b,b]}, and some {C_b>0} depending on {b} (or in asymptotic notation, one has {f(x+iy) \lesssim_{b,f} \frac{1}{1+|x|^2}} whenever {x \in {\bf R}} and {|y| \leq b < a}).
  • (i) (Translation intertwines with modulation) For any {w} in the strip {\{ z: |\mathrm{Im}(z)| < a \}}, the Fourier transform of the function {x \mapsto f(x+w)} is {\xi \mapsto e^{2\pi i w \xi} \hat f(\xi)}.
  • (ii) (Exponential decay of Fourier transform) For any {0 < b < a}, there is a quantity {C_b} such that {|\hat f(\xi)| \leq C_b e^{-2\pi b|\xi|}} for all {\xi \in {\bf R}} (or in asymptotic notation, one has {\hat f(\xi) \lesssim_{a,b,f} e^{-2\pi b|\xi|}} for {0 < b < a} and {\xi \in {\bf R}}).
  • (iii) (Partial Fourier inversion) For any {0 < \varepsilon < a} and {x \in {\bf R}}, one has

    \displaystyle  \int_0^\infty \hat f(\xi) e^{2\pi i x \xi}\ d\xi = \frac{1}{2\pi i} \int_{-\infty}^\infty \frac{f(y-i\varepsilon)}{y-i\varepsilon - x}\ dy

    and

    \displaystyle  \int_{-\infty}^0 \hat f(\xi) e^{2\pi i x \xi}\ d\xi = -\frac{1}{2\pi i} \int_{-\infty}^\infty \frac{f(y+i\varepsilon)}{y+i\varepsilon - x}\ dy.

  • (iv) (Full Fourier inversion) For any {x \in {\bf R}}, the identity (4) holds for this function {f}.
  • (v) (Poisson summation formula) The identity (6) holds for this function {f}.

Proof: We begin with (i), which is a standard application of contour shifting. Applying the definition (3) of the Fourier transform, our task is to show that

\displaystyle  \int_{\bf R} f(x+w) e^{-2\pi i x \xi}\ dx = e^{2\pi i w \xi} \int_{\bf R} f(x) e^{-2\pi i x \xi}\ dx

whenever {|\mathrm{Im} w| < a} and {\xi \in {\bf R}}. Clearly

\displaystyle  \int_{\bf R} f(x) e^{-2\pi i x \xi}\ dx = \lim_{R \rightarrow \infty} \int_{\gamma_{-R \rightarrow R}} f(z) e^{-2\pi i z \xi}\ dz

where {\gamma_{z_1 \rightarrow z_2}} is the line segment contour from {z_1} to {z_2}, and similarly after a change of variables

\displaystyle  e^{-2\pi i w \xi} \int_{\bf R} f(x+w) e^{-2\pi i x \xi}\ dx = \lim_{R \rightarrow \infty} \int_{\gamma_{-R+w \rightarrow R+w}} f(z) e^{-2\pi i z \xi}\ dz.

On the other hand, from Cauchy’s theorem we have

\displaystyle  \int_{\gamma_{-R+w \rightarrow R+w}} = \int_{\gamma_{-R \rightarrow R}} + \int_{\gamma_{R \rightarrow R+w}} - \int_{\gamma_{-R \rightarrow -R+w}}

when applied to the holomorphic integrand {f(z) e^{-2\pi i z \xi}}. So it suffices to show that

\displaystyle  \int_{\gamma_{\pm R \rightarrow \pm R+w}} f(z) e^{-2\pi i z \xi}\ dz \rightarrow 0

as {R \rightarrow \infty}. But the left hand side can be rewritten as

\displaystyle  w e^{\mp 2\pi i R \xi} \int_0^1 f(\pm R + tw) e^{-2\pi i tw \xi}\ dt,

and the claim follows from (8) and dominated convergence.

For (ii), we apply (i) with {w = \pm i b} to observe that the Fourier transform of {x \mapsto f(x \pm ib)} is {\xi \mapsto e^{\mp 2\pi b \xi} \hat f(\xi)}. Applying (8) and the triangle inequality we conclude that

\displaystyle  e^{\mp 2\pi b \xi} \hat f(\xi) \lesssim_{f,b} 1

for both choices of sign {\mp} and all {\xi \in {\bf R}}, giving the claim.

For the first part of (iii), we write {f_{-\varepsilon}(y) := f(y-i\varepsilon)}. By part (i), we have {\hat f_{-\varepsilon}(\xi) = e^{2\pi \varepsilon \xi} \hat f(\xi)}, so we can rewrite the desired identity as

\displaystyle  \int_0^\infty \hat f_{-\varepsilon}(\xi) e^{-2\pi \varepsilon \xi} e^{2\pi i x \xi}\ d\xi = \frac{1}{2\pi i} \int_{-\infty}^\infty \frac{f_{-\varepsilon}(y)}{y-i\varepsilon-x}\ dy.

By (3) and Fubini’s theorem (taking advantage of (8) and the exponential decay of {e^{-2\pi \varepsilon \xi}} as {\xi \rightarrow +\infty}) the left-hand side can be written as

\displaystyle  \int_{\bf R} f_{-\varepsilon}(y) \int_0^\infty e^{-2\pi i y \xi} e^{-2\pi \varepsilon \xi} e^{2\pi i x \xi}\ d\xi\ dy.

But a routine calculation shows that

\displaystyle  \int_0^\infty e^{-2\pi i y \xi} e^{-2\pi \varepsilon \xi} e^{2\pi i x \xi}\ d\xi = \frac{1}{2\pi i} \frac{1}{y-i\varepsilon-x}

giving the claim. The second part of (iii) is proven similarly.

To prove (iv), it suffices in light of (iii) to show that

\displaystyle  \int_{-\infty}^\infty \frac{f(y-i\varepsilon)}{y-i\varepsilon - x}\ dy - \int_{-\infty}^\infty \frac{f(y+i\varepsilon)}{y+i\varepsilon - x}\ dy = 2\pi i f(x)

for any {x \in {\bf R}}. The left-hand side can be written after a change of variables as

\displaystyle  \lim_{R \rightarrow \infty} \int_{\gamma_{-R-i\varepsilon \rightarrow R-i\varepsilon}} \frac{f(z)}{z-x}\ dz + \int_{\gamma_{R+i\varepsilon \rightarrow -R+i\varepsilon}} \frac{f(z)}{z-x}\ dz.

On the other hand, from dominated convergence as in the proof of (i) we have

\displaystyle  \lim_{R \rightarrow \infty} \int_{\gamma_{R-i\varepsilon \rightarrow R+i\varepsilon}} \frac{f(z)}{z-x}\ dz + \int_{\gamma_{-R+i\varepsilon \rightarrow -R-i\varepsilon}} \frac{f(z)}{z-x}\ dz = 0

while from the Cauchy integral formula one has

\displaystyle  \lim_{R \rightarrow \infty} \int_{\gamma_{-R-i\varepsilon \rightarrow R-i\varepsilon \rightarrow R+i\varepsilon \rightarrow -R+i\varepsilon \rightarrow -R-i\varepsilon}} \frac{f(z)}{z-x}\ dz = 2\pi i f(x)

giving the claim.

Now we prove (v). Let {0 < \varepsilon < a}. From (i) we have

\displaystyle  \hat f(k) = \int_{\bf R} f(x-i\varepsilon) e^{-2\pi \varepsilon k} e^{-2\pi i k x}\ dx

and

\displaystyle  \hat f(k) = \int_{\bf R} f(x+i\varepsilon) e^{2\pi \varepsilon k} e^{-2\pi i k x}\ dx

for any {k \in {\bf Z}}. If we sum the first identity for {k=0,1,2,\dots} we see from the geometric series formula and Fubini’s theorem that

\displaystyle  \sum_{k=0}^\infty \hat f(k) = \int_{\bf R} \frac{f(x-i\varepsilon)}{1 - e^{-2\pi i (x-i\varepsilon)}}\ dx

and similarly if we sum the second identity for {k=-1,-2,\dots} we have

\displaystyle  \sum_{k=-\infty}^{-1} \hat f(k) = \int_{\bf R} \frac{f(x+i\varepsilon) e^{2\pi i (x+i\varepsilon)}}{1 - e^{2\pi i (x+i\varepsilon)}}\ dx

\displaystyle  = - \int_{\bf R} \frac{f(x+i\varepsilon)}{1 - e^{-2\pi i (x+i\varepsilon)}}\ dx.

Adding these two identities and changing variables, we conclude that

\displaystyle  \sum_{k=-\infty}^\infty \hat f(k) = \lim_{R \rightarrow \infty} \int_{\gamma_{-R-i\varepsilon \rightarrow R-i\varepsilon}} \frac{f(z)}{1 - e^{-2\pi i z}}\ dz + \int_{\gamma_{R+i\varepsilon \rightarrow -R+i\varepsilon}} \frac{f(z)}{1 - e^{-2\pi i z}}\ dz.

We would like to use the residue theorem to evaluate the right-hand side, but we need to take a little care to avoid the poles of the integrand {\frac{f(z)}{1 - e^{-2\pi i z}}}, which are at the integers. Hence we shall restrict {R} to be a half-integer {R = N + \frac{1}{2}}. In this case, a routine application of the residue theorem shows that

\displaystyle  \int_{\gamma_{-R-i\varepsilon \rightarrow R-i\varepsilon \rightarrow R+i\varepsilon \rightarrow -R+i\varepsilon \rightarrow -R-i\varepsilon}} \frac{f(z)}{1 - e^{-2\pi i z}}\ dz = \sum_{n=-N}^N f(n).

Noting that {\frac{1}{1-e^{-2\pi i z}}} stays bounded for {z} in {\gamma_{R-i\varepsilon \rightarrow R+i\varepsilon}} or {\gamma_{-R+i\varepsilon \rightarrow -R-i\varepsilon}} when {R} is a half-integer, we also see from dominated convergence as before that

\displaystyle  \lim_{R \rightarrow \infty} \int_{\gamma_{R-i\varepsilon \rightarrow R+i\varepsilon}} \frac{f(z)}{1 - e^{-2\pi i z}}\ dz + \int_{\gamma_{-R+i\varepsilon \rightarrow -R-i\varepsilon}} \frac{f(z)}{1 - e^{-2\pi i z}}\ dz = 0.

The claim follows. \Box

Exercise 4 (Hilbert transform and Plemelj formula) Let {a, f} be as in Proposition 3. Define the Cauchy-Stieltjes transform {{\mathcal C} f: {\bf C} \backslash {\bf R} \rightarrow {\bf C}} by the formula

\displaystyle  {\mathcal C} f(z) := \int_{\bf R} \frac{f(x)}{z-x}\ dx.

  • (i) Show that {{\mathcal C} f} is holomorphic on {{\bf C} \backslash {\bf R}} and has the Fourier representation

    \displaystyle  {\mathcal C} f(z) = -2\pi i \int_0^\infty \hat f(\xi) e^{2\pi i z \xi}\ d\xi

    in the upper half-plane {\mathrm{Im} z > 0} and

    \displaystyle  {\mathcal C} f(z) = 2\pi i \int_{-\infty}^0 \hat f(\xi) e^{2\pi i z \xi}\ d\xi

    in the lower half-plane {\mathrm{Im} z < 0}.
  • (ii) Establish the Plemelj formulae

    \displaystyle  \lim_{\varepsilon \rightarrow 0^+} {\mathcal C} f(x+i\varepsilon) = \pi H f(x) - i \pi f(x)

    and

    \displaystyle  \lim_{\varepsilon \rightarrow 0^+} {\mathcal C} f(x-i\varepsilon) = \pi H f(x) + i \pi f(x)

    for any {x \in {\bf R}}, where the Hilbert transform {Hf} of {f} is defined by the principal value integral

    \displaystyle  Hf(x) := \lim_{\varepsilon \rightarrow 0^+, R \rightarrow \infty} \frac{1}{\pi} \int_{\varepsilon \leq |y-x| \leq R} \frac{f(y)}{x-y}\ dy.

  • (iii) Show that {{\mathcal C} f} is the unique holomorphic function on {{\bf C} \backslash {\bf R}} that obeys the decay bound

    \displaystyle  \sup_{x+iy \in {\bf C} \backslash {\bf R}} |y| |{\mathcal C} f(x+iy)| < \infty

    and solves the (very simple special case of the) Riemann-Hilbert problem

    \displaystyle  \lim_{\varepsilon \rightarrow 0^+} {\mathcal C} f(x+i\varepsilon) - {\mathcal C} f(x-i\varepsilon) = - 2\pi i f(x)

    uniformly in {x \in {\bf R}}.
  • (iv) Establish the identity

    \displaystyle  Hf(x) = -\int_{\bf R} i \mathrm{sgn}(\xi) \hat f(\xi) e^{2\pi i x \xi}\ d\xi,

    where the signum function {\mathrm{sgn}(\xi)} is defined to equal {+1} for {\xi>0}, {-1} for {\xi < 0}, and {0} for {\xi=0}.
  • (v) Show that {Hf} extends holomorphically to the strip {\{ z: |\mathrm{Im} z| < a \}} and obeys the bound (8) (but possibly with a different constant {C_b}), with the identity

    \displaystyle  {\mathcal C} f(z) = \pi H f(z) - \pi i \mathrm{sgn}(\mathrm{Im}(z)) f(z) \ \ \ \ \ (9)

    holding for {0 < |\mathrm{Im} z| < a}.
  • (vi) Establish the identities

    \displaystyle  H(Hf) = -f

    and

    \displaystyle  H( fHf) = \frac{(Hf)^2 - f^2}{2}.

    (Hint: for the latter identity, square both sides of (9).)
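The Fourier multiplier description in part (iv) also suggests a quick numerical experiment (not part of the notes; the grid, test function, and helper name are ad hoc): approximate {H} on a large periodized grid via the FFT, and test the identity {H(Hf) = -f} together with the classical example {H[\frac{1}{1+x^2}](x) = \frac{x}{1+x^2}}:

    import numpy as np
    n, L = 16384, 400.0                       # samples; grid covers [-L, L)
    h = 2 * L / n
    x = (np.arange(n) - n // 2) * h
    xi = np.fft.fftfreq(n, d=h)
    def hilbert(g):
        # apply the multiplier -i sgn(xi) from part (iv) on the discrete grid
        return np.fft.ifft(-1j * np.sign(xi) * np.fft.fft(g)).real
    f = 1.0 / (1.0 + x**2)
    print(np.max(np.abs(hilbert(f) - x / (1.0 + x**2))))   # small, up to periodization error
    print(np.max(np.abs(hilbert(hilbert(f)) + f)))         # small; the zero-frequency (mean) component is not negated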

Exercise 5 (Kramers-Kronig relations) Let {f} be a continuous function on the upper half-plane {\{ z: \mathrm{Im} z \geq 0 \}} which is holomorphic on the interior of this half-plane, and obeys the bound {|f(z)| \leq C/|z|} for all non-zero {z} in this half-plane and some {C>0}. Establish the Kramers-Kronig relations

\displaystyle  \mathrm{Re} f(x) = \lim_{\varepsilon \rightarrow 0, R \rightarrow \infty} \frac{1}{\pi} \int_{\varepsilon \leq |y-x| \leq R} \frac{\mathrm{Im} f(y)}{y-x}\ dy

and

\displaystyle  \mathrm{Im} f(x) = -\lim_{\varepsilon \rightarrow 0, R \rightarrow \infty} \frac{1}{\pi} \int_{\varepsilon \leq |y-x| \leq R} \frac{\mathrm{Re} f(y)}{y-x}\ dy

relating the real and imaginary parts of {f} to each other.

Exercise 6
  • (i) By applying the Poisson summation formula to the function {x \mapsto \frac{1}{x^2+a^2}}, establish the identity

    \displaystyle  \sum_{n \in {\bf Z}} \frac{1}{n^2 + a^2} = \frac{\pi}{a} \frac{e^{2\pi a}+1}{e^{2\pi a}-1}

    for any positive real number {a}. Explain why this is consistent with Exercise 24 from Notes 1.
  • (ii) By carefully taking limits of (i) as {a \rightarrow 0}, establish yet another alternate proof of Euler’s identity

    \displaystyle  \sum_{n=1}^\infty \frac{1}{n^2} = \frac{\pi^2}{6}.
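A direct numerical check of the identity in part (i) of this exercise (not from the notes; the truncation at {|n| \leq 10^5} leaves a tail of size roughly {2 \times 10^{-5}}):

    import math
    a = 0.8
    lhs = sum(1.0 / (n * n + a * a) for n in range(-100_000, 100_001))
    rhs = (math.pi / a) * (math.exp(2 * math.pi * a) + 1) / (math.exp(2 * math.pi * a) - 1)
    print(lhs, rhs)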

Exercise 7 For {\tau} in the upper half-plane {\{ \mathrm{Im} \tau > 0\}}, define the theta function {\theta(\tau) := \sum_{n \in {\bf Z}} e^{\pi i n^2 \tau}}. Use Exercise 1 and the Poisson summation formula to establish the modular identity

\displaystyle  \theta(\tau) = (-i\tau)^{-1/2} \theta(-1/\tau)

for such {\tau}, where one takes the standard branch of the square root.
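Here is a quick numerical sanity check of the modular identity (not part of the notes; the truncation length is ad hoc), using a complex {\tau} in the upper half-plane:

    import cmath
    def theta(tau, N=30):
        # truncated theta(tau) = sum_{|n| <= N} exp(pi i n^2 tau), for Im(tau) > 0
        return sum(cmath.exp(cmath.pi * 1j * n * n * tau) for n in range(-N, N + 1))
    tau = 0.3 + 0.8j
    print(theta(tau))
    print((-1j * tau) ** -0.5 * theta(-1 / tau))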

Exercise 8 (Fourier proof of Plancherel identity) Let {f: {\bf R} \rightarrow {\bf C}} be smooth and compactly supported. For any {\xi \in {\bf C}} with {\mathrm{Im} \xi \geq 0}, define the quantity

\displaystyle  A(\xi) := \int_{\bf R} \int_{\bf R} e^{2\pi i \xi |x-y|} \overline{f}(x) f(y)\ dx dy.

Remarkably, this proof of the Plancherel identity generalises to a nonlinear version involving a trace formula for the scattering transform for either Schrödinger or Dirac operators. For Schrödinger operators this was first obtained (implicitly) by Buslaev and Faddeev, and later more explicitly by Deift and Killip. The version for Dirac operators more closely resembles the linear Plancherel identity; see for instance the appendix to this paper of Muscalu, Thiele, and myself. The quantity {A(\xi)} is a component of a nonlinear quantity known as the transmission coefficient {a(\xi)} of a Dirac operator with potential {f} and spectral parameter {\xi} (or {2\pi \xi}, depending on normalisations).

The Fourier inversion formula was only established in Proposition 3 for functions that had a suitable holomorphic extension to a strip, but one can relax the hypotheses by a limiting argument. Here is one such example of this:

Exercise 9 (More general Fourier inversion formula) Let {f: {\bf R} \rightarrow {\bf C}} be continuous and obey the bound {|f(x)| \leq \frac{C}{1+|x|^2}} for all {x \in {\bf R}} and some {C>0}. Suppose that the Fourier transform {\hat f: {\bf R} \rightarrow {\bf C}} is absolutely integrable.

Exercise 10 (Laplace inversion formula) Let {f: [0,+\infty) \rightarrow {\bf C}} be a continuously twice differentiable function, obeying the bounds {|f(x)|, |f'(x)|, |f''(x)| \leq \frac{C}{1+|x|^2}} for all {x \geq 0} and some {C>0}.
  • (i) Show that the Fourier transform {\hat f} obeys the asymptotic

    \displaystyle  \hat f(\xi) = \frac{f(0)}{2\pi i \xi} + O( \frac{C}{|\xi|^2} )

    for any non-zero {\xi \in {\bf R}}.
  • (ii) Establish the principal value inversion formula

    \displaystyle  f(x) = \lim_{T \rightarrow +\infty} \int_{-T}^T \hat f(\xi) e^{2\pi i x \xi}\ d\xi

    for any positive real {x}. (Hint: modify the proof of Exercise 9(ii).) What happens when {x} is negative? zero?
  • (iii) Define the Laplace transform {{\mathcal L} f(s)} of {f} for {\mathrm{Re}(s) \geq 0} by the formula

    \displaystyle  {\mathcal L} f(s) := \int_0^\infty f(t) e^{-st}\ dt.

    Show that {{\mathcal L} f} is continuous on the half-plane {\{ s: \mathrm{Re}(s) \geq 0\}}, holomorphic on the interior of this half-plane, and obeys the Laplace-Mellin inversion formula

    \displaystyle  f(t) = \frac{1}{2\pi i} \lim_{T \rightarrow +\infty} \int_{\gamma_{\sigma-iT \rightarrow \sigma+iT}} e^{st} {\mathcal L} f(s) \ ds \ \ \ \ \ (10)

    for any {t>0} and {\sigma \geq 0}, where {\gamma_{\sigma-iT \rightarrow \sigma+iT}} is the line segment contour from {\sigma-iT} to {\sigma+iT}. Conclude in particular that the Laplace transform {{\mathcal L}} is injective on this class of functions {f}.
The Laplace-Mellin inversion formula in fact holds under more relaxed decay and regularity hypotheses than the ones given in this exercise, but we will not pursue these generalisations here. The limiting integral in (10) is also known as the Bromwich integral, and often written (with a slight abuse of notation) as {\frac{1}{2\pi i} \int_{\sigma-i\infty}^{\sigma+i\infty} e^{st} {\mathcal L} f(s)\ ds}. The Laplace transform is a close cousin of the Fourier transform that has many uses; for instance, it is a popular tool for analysing ordinary differential equations on half-lines such as {[0,+\infty)}.

Exercise 11 (Mellin inversion formula) Let {f: (0,+\infty) \rightarrow {\bf C}} be a continuous function that is compactly supported in {(0,+\infty)}. Define the Mellin transform {{\mathcal M} f: {\bf C} \rightarrow {\bf C}} by the formula

\displaystyle  {\mathcal M} f(s) := \int_0^\infty x^s f(x) \frac{dx}{x}.

Show that {{\mathcal M} f} is entire and one has the Mellin inversion formula

\displaystyle  f(x) = \frac{1}{2\pi i} \lim_{T \rightarrow +\infty} \int_{\gamma_{\sigma-iT \rightarrow \sigma+iT}} x^{-s} {\mathcal M} f(s)\ ds

for any {x \in (0,+\infty)} and {\sigma \in {\bf R}}. The regularity and support hypotheses on {f} can be relaxed significantly, but we will not pursue this direction here.

Exercise 12 (Perron’s formula) Let {f: {\bf N} \rightarrow {\bf C}} be a function which is of subpolynomial growth in the sense that {|f(n)| \leq C_\varepsilon n^\varepsilon} for all {n \in {\bf N}} and {\varepsilon>0}, where {C_\varepsilon} depends on {\varepsilon} (and {f}). For {s} in the half-plane {\{ \mathrm{Re} s > 1 \}}, form the Dirichlet series

\displaystyle  F(s) := \sum_{n=1}^\infty \frac{f(n)}{n^s}.

For any non-integer {x>0} and any {\sigma>1}, establish Perron’s formula

\displaystyle  \sum_{n \leq x} f(n) = \frac{1}{2\pi i} \lim_{T \rightarrow \infty} \int_{\gamma_{\sigma-iT \rightarrow \sigma+iT}} F(s) \frac{x^s}{s}\ ds. \ \ \ \ \ (11)

What happens when {x} is an integer? (The Perron formula and its many variants are of great utility in analytic number theory; see these previous lecture notes for further discussion.)

Exercise 13 (Solution to Schrödinger equation) Let {f, a} be as in Proposition 3. Define the function {u: {\bf R} \times {\bf R} \rightarrow {\bf C}} by the formula

\displaystyle  u(t,x) := \int_{\bf R} \hat f(\xi) e^{2\pi i x \xi - 4 \pi^2 i \xi^2 t}\ d\xi.

  • (i) Show that {u} is a smooth function of {t,x} that obeys the Schrödinger equation {i \partial_t u + \partial_{xx} u = 0} with initial condition {u(0,x) = f(x)} for {x \in {\bf R}}.
  • (ii) Establish the formula

    \displaystyle  u(t,x) = \frac{1}{(4\pi i t)^{1/2}} \int_{\bf R} e^{-\frac{|x-y|^2}{4it}} f(y)\ dy

    for {x \in {\bf R}} and {t \neq 0}, where we use the standard branch of the square root.

— 2. Phragmen-Lindelof and Paley-Wiener —

The maximum modulus principle (Exercise 26 from 246A Notes 1) for holomorphic functions asserts that if a function that is continuous on a compact subset {K} of the plane and holomorphic on the interior of that set is bounded in magnitude by some {M} on the boundary {\partial K}, then it is also bounded by {M} on the interior. This principle does not directly apply to noncompact domains {K}: for instance, on the entire complex plane {{\bf C}}, there is no boundary whatsoever and the bound is clearly vacuous. On the half-plane {\{ \mathrm{Im} z \geq 0 \}}, the holomorphic function {\cos z} (for instance) is bounded in magnitude by {1} on the boundary of the half-plane, but grows exponentially in the interior. Similarly, in the strip {\{ z: -\pi/2 \leq \mathrm{Re} z \leq \pi/2\}}, the holomorphic function {\exp(\exp(iz))} is bounded in magnitude by {1} on the boundary of the strip, but grows double-exponentially in the interior of the strip. However, if one does not have such absurdly high growth, one can recover a form of the maximum principle, known as the Phragmén-Lindelöf principle. Here is one formulation of this principle:

Theorem 14 (Lindelöf’s theorem) Let {f} be a continuous function on a strip {S := \{ \sigma+it: a \leq \sigma \leq b; t \in {\bf R} \}} for some {b>a}, which is holomorphic in the interior of the strip and obeys the bound

\displaystyle  |f(\sigma+it)| \leq A \exp( B \exp( (1-\delta) \frac{\pi}{b-a} |t|) ) \ \ \ \ \ (12)

for all {\sigma+it \in S} and some constants {A, B, \delta > 0}. Suppose also that {|f(a+it)| \leq M} and {|f(b+it)| \leq M} for all {t \in {\bf R}} and some {M>0}. Then we have {|f(\sigma+it)| \leq M} for all {a \leq \sigma \leq b} and {t \in {\bf R}}.

Remark 15 The hypothesis (12) is a qualitative hypothesis rather than a quantitative one, since the exact values of {A, B, \delta} do not show up in the conclusion. It is quite a mild condition; any function of exponential growth in {t}, or even with such super-exponential growth as {O( |t|^{|t|})} or {O(e^{|t|^{O(1)}})}, will obey (12). The principle however fails without this hypothesis, as discussed previously.

Proof: By shifting and dilating (adjusting {A,B,\delta} as necessary) we can reduce to the case {a = -\pi/2}, {b = \pi/2}, and by multiplying {f} by a constant we can also normalise {M=1}.

Suppose we temporarily assume that {f(\sigma+it) \rightarrow 0} as {|\sigma+it| \rightarrow \infty}. Then on a sufficiently large rectangle {\{ \sigma+it: -\pi/2 \leq \sigma \leq \pi/2; -T \leq t \leq T \}}, we have {|f| \leq 1} on the boundary of the rectangle, hence on the interior by the maximum modulus principle. Sending {T \rightarrow \infty}, we obtain the claim.

To remove the assumption that {f} goes to zero at infinity, we use the trick of giving ourselves an epsilon of room. Namely, we multiply {f(z)} by the holomorphic function {g_\varepsilon(z) := \exp( -\varepsilon \cos( (1-\delta/2) z ) )} for some {\varepsilon > 0}. A little complex arithmetic shows that {\mathrm{Re} \cos((1-\delta/2)(\sigma+it)) = \cos((1-\delta/2)\sigma) \cosh((1-\delta/2)t)} is bounded below by a positive multiple of {e^{(1-\delta/2)|t|}} on {S}, so that {|g_\varepsilon| \leq 1} on {S} and the function {f(z) g_\varepsilon(z)} goes to zero at infinity in {S}. Applying the previous case to this function, then taking limits as {\varepsilon \rightarrow 0}, we obtain the claim. \Box

Corollary 16 (Phragmén-Lindelöf principle in a sector) Let {f} be a continuous function on a sector {S := \{ re^{i\theta}: r \geq 0, \alpha \leq \theta \leq \beta \}} for some {\alpha < \beta < \alpha + 2\pi}, which is holomorphic on the interior of the sector and obeys the bound

\displaystyle  |f(z)| \leq A \exp( B |z|^a )

for some {A,B > 0} and {0 < a < \frac{\pi}{\beta-\alpha}}. Suppose also that {|f(z)| \leq M} on the boundary of the sector {S} for some {M >0}. Then one also has {|f(z)| \leq M} in the interior.

Proof: Apply Theorem 14 to the function {f(\exp(iz))} on the strip {\{ \sigma+it: \alpha \leq \sigma \leq \beta\}}. \Box

Exercise 17 With the notation and hypotheses of Theorem 14, show that the function {\sigma \mapsto \sup_{t \in {\bf R}} |f(\sigma+it)|} is log-convex on {[a,b]}.

Exercise 18 (Hadamard three-circles theorem) Let {f} be a holomorphic function on an annulus {\{ z \in {\bf C}: R_1 \leq |z| \leq R_2 \}}. Show that the function {r \mapsto \sup_{\theta \in [0,2\pi]} |f(re^{i\theta})|} is log-convex on {[R_1,R_2]}.

Exercise 19 (Phragmén-Lindelöf principle) Let {f} be as in Theorem 14 with {a=0, b=1}, but suppose that we have the bounds {|f(0+it)| \leq C(1+|t|)^{a_0}} and {|f(1+it)| \leq C(1+|t|)^{a_1}} for all {t \in {\bf R}} and some exponents {a_0,a_1 \in {\bf R}} and a constant {C>0}. Show that one has {|f(\sigma+it)| \leq C' (1+|t|)^{(1-\sigma) a_0 + \sigma a_1}} for all {\sigma+it \in S} and some constant {C'} (which is allowed to depend on the constants {A, B, \delta} in (12)). (Hint: it is convenient to work first in a half-strip such as {\{ \sigma+it \in S: t \geq T \}} for some large {T}. Then multiply {f} by something like {\exp( - ((1-z)a_0+z a_1) \log(-iz) )} for some suitable branch of the logarithm and apply a variant of Theorem 14 for the half-strip. A more refined estimate in this regard is due to Rademacher.) This particular version of the principle gives the convexity bound for Dirichlet series such as the Riemann zeta function. Bounds which exploit the deeper properties of these functions to improve upon the convexity bound are known as subconvexity bounds and are of major importance in analytic number theory, which is well outside the scope of this course.

Now we can establish a remarkable converse of sorts to Exercise 2(ii) known as the Paley-Wiener theorem, that links the exponential growth of (the analytic continuation) of a function with the support of its Fourier transform:

Theorem 20 (Paley-Wiener theorem) Let {f: {\bf R} \rightarrow {\bf C}} be a continuous function obeying the decay condition

\displaystyle  |f(x)| \leq C/(1+|x|^2) \ \ \ \ \ (13)

for all {x \in {\bf R}} and some {C>0}. Let {M > 0}. Then the following are equivalent:
  • (i) The Fourier transform {\hat f} is supported on {[-M,M]}.
  • (ii) {f} extends analytically to an entire function that obeys the bound {|f(z)| \leq A e^{2\pi M |z|}} for some {A>0}.
  • (iii) {f} extends analytically to an entire function that obeys the bound {|f(z)| \leq A e^{2\pi M |\mathrm{Im} z|}} for some {A>0}.

The continuity and decay hypotheses on {f} can be relaxed, but we will not explore such generalisations here.

Proof: If (i) holds, then by Exercise 9, we have the inversion formula (4), and the claim (iii) then holds by a slight modification of Exercise 2(ii). Also, the claim (iii) clearly implies (ii).

Now we see why (iii) implies (i). We first assume that we have the stronger bound

\displaystyle  |f(z)| \leq A e^{2\pi M |\mathrm{Im} z|} / (1 + |z|^2) \ \ \ \ \ (14)

for {z \in {\bf C}}. Then we can apply Proposition 3 for any {a>0}, and conclude in particular that

\displaystyle  \hat f(\xi) = e^{-2\pi b \xi} \int_{\bf R} f(x-ib) e^{-2\pi i x \xi}\ dx

for any {\xi \in {\bf R}} and {b \in {\bf R}}. Applying (14) and the triangle inequality, we see that

\displaystyle  \hat f(\xi) \lesssim_A e^{-2\pi b \xi} e^{2\pi M |b|}.

If {\xi > M}, we can then send {b \rightarrow +\infty} and conclude that {\hat f(\xi)=0}; similarly for {\xi < -M} we can send {b \rightarrow -\infty} and again conclude {\hat f(\xi) = 0}. This establishes (i) in this case.

Now suppose we only have the weaker bound on {f} assumed in (iii). We again use the epsilon of room trick. For any {\varepsilon>0}, we consider the modified function {f_\varepsilon(z) := f(z) / (1 + i \varepsilon z)^2}. This is still holomorphic on the lower half-plane {\{ z: \mathrm{Im} z \leq 0 \}} and obeys a bound of the form (14) on this half-plane. An inspection of the previous arguments shows that we can still show that {\hat f_\varepsilon(\xi) = 0} for {\xi > M} despite no longer having holomorphicity on the entire upper half-plane; sending {\varepsilon \rightarrow 0} using dominated convergence we conclude that {\hat f(\xi) = 0} for {\xi > M}. A similar argument (now using {1-i\varepsilon z} in place of {1+i\varepsilon z}) shows that {\hat f(\xi) = 0} for {\xi < -M}. This proves (i).

Finally, we show that (ii) implies (iii). The function {f(z) e^{2\pi i Mz}} is entire, bounded on the real axis by (13), bounded on the upper imaginary axis by (ii), and has exponential growth. By Corollary 16, it is also bounded on the upper half-plane, which gives (iii) in the upper half-plane as well. A similar argument (using {e^{-2\pi i Mz}} in place of {e^{2\pi i Mz}}) also yields (iii) in the lower half-plane. \Box
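As a numerical illustration of the theorem (not in the notes; the test function, grid, and cutoff are ad hoc), take the Fejér-type function {f(x) = (\frac{\sin(\pi M x)}{\pi x})^2}, which obeys (13), extends to an entire function bounded by {O(e^{2\pi M |\mathrm{Im} z|})}, and whose Fourier transform is the triangle function {\max(M-|\xi|,0)} supported on {[-M,M]}:

    import numpy as np
    M, h, N = 1.0, 0.01, 80_000
    x = (np.arange(N) - N / 2 + 0.5) * h               # grid on [-400, 400], avoiding x = 0
    f = (np.sin(np.pi * M * x) / (np.pi * x)) ** 2
    for xi in [0.0, 0.5, 0.99, 1.01, 2.0]:
        fhat = h * np.sum(f * np.exp(-2j * np.pi * x * xi)).real
        print(xi, fhat, max(M - abs(xi), 0.0))          # agrees with the triangle up to small truncation error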

— 3. The Hardy uncertainty principle —

Informally speaking, the uncertainty principle for the Fourier transform asserts that a function {f} and its Fourier transform cannot simultaneously be strongly localised, except in the degenerate case when {f} is identically zero. There are many rigorous formulations of this principle. Perhaps the best known is the Heisenberg uncertainty principle

\displaystyle  (\int_{\bf R} (\xi-\xi_0)^2 |\hat f(\xi)|^2\ d\xi)^{1/2} (\int_{\bf R} (x-x_0)^2 |f(x)|^2\ dx)^{1/2} \geq \frac{1}{4\pi} \int_{\bf R} |f(x)|^2\ dx,

valid for all {f: {\bf R} \rightarrow {\bf C}} and all {x_0,\xi_0 \in {\bf R}}, which we will not prove here (see for instance Exercise 47 of this previous set of lecture notes).
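For the Gaussian {f(x) = e^{-\pi x^2}} (and {x_0 = \xi_0 = 0}) the Heisenberg inequality is sharp, as a quick quadrature check illustrates (a sketch, not part of the notes; it uses the fact from Exercise 1 that this Gaussian is its own Fourier transform):

    import numpy as np
    h = 1e-3
    x = np.arange(-15, 15, h)
    f = np.exp(-np.pi * x**2)                  # equal to its own Fourier transform
    second_moment = h * np.sum(x**2 * f**2)    # the same for f and for its transform
    lhs = np.sqrt(second_moment) * np.sqrt(second_moment)
    rhs = h * np.sum(f**2) / (4 * np.pi)
    print(lhs, rhs)                            # both approximately 1/(4 pi sqrt(2))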

Another manifestation of the uncertainty principle is the following simple fact:

Lemma 21
  • (i) If {f: {\bf R} \rightarrow {\bf C}} is an integrable function that has exponential decay in the sense that one has {|f(x)| \leq C e^{-a|x|}} for all {x \in {\bf R}} and some {C,a>0}, then the Fourier transform {\hat f: {\bf R} \rightarrow {\bf C}} is either identically zero, or only has isolated zeroes (that is to say, the set {\{ \xi \in {\bf R}: \hat f(\xi) = 0 \}} is discrete).
  • (ii) If {f: {\bf R} \rightarrow {\bf C}} is a compactly supported continuous function such that {\hat f} is also compactly supported, then {f} is identically zero.

Proof: For (i), we observe from Exercise 2(iii) that {\hat f} extends holomorphically to a strip around the real axis, and the claim follows since non-zero holomorphic functions have isolated zeroes. For (ii), we observe from (i) that {\hat f} must be identically zero, and the claim now follows from the Fourier inversion formula (Exercise 9). \Box

Lemma 21(ii) rules out the existence of a bump function whose Fourier transform is also a bump function, which would have been a rather useful tool to have in harmonic analysis over the reals. (Such functions do exist however in some non-archimedean domains, such as the {p}-adics.) On the other hand, from Exercise 1 we see that we do at least have gaussian functions whose Fourier transform also decays as a gaussian. Unfortunately this is basically the best one can do:

Theorem 22 (Hardy uncertainty principle) Let {f} be a continuous function which obeys the bound {|f(x)| \leq C e^{-\pi ax^2}} for all {x \in {\bf R}} and some {C,a>0}. Suppose also that {|\hat f(\xi)| \leq C' e^{-\pi \xi^2/a}} for all {\xi \in {\bf R}} and some {C'>0}. Then {f(x)} is a scalar multiple of the gaussian {e^{-\pi ax^2}}, that is to say one has {f(x) = c e^{-\pi ax^2}} for some {c \in {\bf C}}.

Proof: By replacing {f} with the rescaled version {x \mapsto f(x/a^{1/2})}, which replaces {\hat f} with the rescaled version {\xi \mapsto a^{1/2} \hat f(a^{1/2} \xi)}, we may normalise {a=1}. By multiplying {f} by a small constant we may also normalise {C=C'=1}.

From Exercise 2(i), {\hat f} extends to an entire function. By the triangle inequality, we can bound

\displaystyle  |\hat f(\xi+i\eta)| \leq \int_{\bf R} e^{-\pi x^2} e^{2\pi x \eta}\ dx

for any {\xi,\eta \in {\bf R}}. Completing the square {e^{-\pi x^2} e^{2\pi x \eta} = e^{-\pi (x-\eta)^2} e^{\pi \eta^2}} and using {\int_{\bf R} e^{-\pi (x-\eta)^2}\ dx = \int_{\bf R} e^{-\pi x^2}\ dx = 1}, we conclude the bound

\displaystyle  |\hat f(\xi+i\eta)| \leq e^{\pi \eta^2}.

In particular, if we introduce the normalised function

\displaystyle  F(z) := e^{\pi z^2} \hat f(z)

then

\displaystyle  |F(\xi+i\eta)| \leq e^{\pi \xi^2}. \ \ \ \ \ (15)

In particular, {|F|} is bounded by {1} on the imaginary axis. On the other hand, from the hypothesis on {\hat f}, {F} is also bounded by {1} on the real axis. We can now almost invoke the Phragmén-Lindelöf principle (Corollary 16) to conclude that {F} is bounded on all four quadrants, but the growth bound we have (15) is just barely too weak. To get around this we use the epsilon of room trick. For any {\varepsilon>0}, the function {F_\varepsilon(z) := e^{\pi i\varepsilon z^2} F(z)} is still entire, and is still bounded by {1} in magnitude on the real line. From (15) we have

\displaystyle  |F_\varepsilon(\xi+i\eta)| \leq e^{\pi \xi^2 - 2\varepsilon \pi \xi \eta}

so in particular it is bounded by {1} on the slightly tilted imaginary axis {(2\varepsilon + i) {\bf R}}. We can now apply Corollary 16 in the two acute-angle sectors between {(2\varepsilon+i) {\bf R}} and {{\bf R}} to conclude that {|F_\varepsilon(z)| \leq 1} in those two sectors; letting {\varepsilon \rightarrow 0}, we conclude that {|F(z)| \leq 1} in the first and third quadrants. A similar argument (using negative values of {\varepsilon}) shows that {|F(z)| \leq 1} in the second and fourth quadrants. By Liouville’s theorem, we conclude that {F} is constant, thus we have {\hat f(z) = c e^{-\pi z^2}} for some complex number {c}. The claim now follows from the Fourier inversion formula (Proposition 3(iv)) and Exercise 1. \Box

One corollary of this theorem is that if {f} is continuous and decays like {e^{-\pi ax^2}} or better, then {\hat f} cannot decay any faster than {e^{-\pi \xi^2/a}} without {f} vanishing identically. This is a stronger version of Lemma 21(ii). There is a more general tradeoff known as the Gel’fand-Shilov uncertainty principle, which roughly speaking asserts that if {f} decays like {e^{-\pi a x^p}} then {\hat f} cannot decay faster than {e^{-\pi b x^q}} without {f} vanishing identically, whenever {1 < p,q < \infty} are dual exponents in the sense that {\frac{1}{p}+\frac{1}{q}=1}, and {a^{1/p} b^{1/q}} is large enough (the precise threshold was established in work of Morgan). See for instance this article of Nazarov for further discussion of these variants.

Exercise 23 If {f} is continuous and obeys the bound {|f(x)| \leq C (1+|x|)^M e^{-\pi ax^2}} for some {M \geq 0} and {C,a>0} and all {x \in {\bf R}}, and {\hat f} obeys the bound {|\hat f(\xi)| \leq C' (1+|\xi|)^M e^{-\pi \xi^2/a}} for some {C'>0} and all {\xi \in {\bf R}}, show that {f} is of the form {f(x) = P(x) e^{-\pi ax^2}} for some polynomial {P} of degree at most {M}.

Remark 24 There are many further variants of the Hardy uncertainty principle. For instance we have the following uncertainty principle of Beurling, which we state in a strengthened form due to Bonami, Demange, and Jaming: if {f} is a square-integrable function such that {\int_{\bf R} \int_{\bf R} \frac{|f(x)| |\hat f(\xi)|}{(1+|x|+|\xi|)^N} e^{2\pi |x| |\xi|}\ dx d\xi < \infty}, then {f} is equal (almost everywhere) to a polynomial times a gaussian; it is not difficult to show that this implies Theorem 22 and Exercise 23, as well as the Gel’fand-Shilov uncertainty principle. In recent years, PDE-based proofs of the Hardy uncertainty principle have been established, which have then been generalised to establish uncertainty principles for various Schrödinger type equations; see for instance this review article of Kenig. I also have some older notes on the Hardy uncertainty principle in this blog post. Finally, we mention the Beurling-Malliavin theorem, which provides a precise description of the possible decay rates of a function whose Fourier transform is compactly supported; see for instance this paper of Mashregi, Nazarov, and Khavin for a modern treatment.

John BaezOpen Systems: A Double Categorical Perspective (Part 3)

Back to Kenny Courser’s thesis:

• Kenny Courser, Open Systems: A Double Categorical Perspective, Ph.D. thesis, U. C. Riverside, 2020.

Last time I explained the problems with decorated cospans as a framework for dealing with open systems. I vaguely hinted that Kenny’s thesis presents two solutions to these problems: so-called ‘structured cospans’, and a new improved approach to decorated cospans. Now let me explain these!

You may wonder why I’m returning to this now, after three months of silence. The reason is that Kenny, Christina Vasilakopoulou, and I just finished a paper that continues this story:

• John Baez, Kenny Courser and Christina Vasilakopoulou, Structured versus decorated cospans.

We showed that under certain conditions, structured and decorated cospans are equivalent. So, I’m excited about this stuff again.

Last time I explained Fong’s theorem about decorated cospans:

Fong’s Theorem. Suppose \mathsf{A} is a category with finite colimits, and make \mathsf{A} into a symmetric monoidal category with its coproduct as the tensor product. Suppose F\colon (\mathsf{A},+) \to (\mathsf{Set},\times) is a symmetric lax monoidal functor. Define an F-decorated cospan to be a cospan in \mathsf{A} with apex N, together with an element x\in F(N) called a decoration. Then there is a symmetric monoidal category with

• objects of \mathsf{A} as objects,
• isomorphism classes of F-decorated cospans as morphisms.

The theorem is true, but it doesn’t apply to all the examples we wanted it to. The problem is that it’s ‘not categorified enough’. It’s fine if we want to decorate the apex N of our cospan with some extra structure: we do this by choosing an element of some set F(N). But in practice, we often want to decorate N with some extra stuff, which means choosing an object of a category F(N). So we should really use not a functor

F\colon (\mathsf{A},+) \to (\mathsf{Set},\times)

but something like a functor

F\colon (\mathsf{A},+) \to (\mathbf{Cat},\times)

What do I mean by ‘something like a functor?’ Well, \mathbf{Cat} is not just a category but a 2-category: it has categories as objects, functors as morphisms, but also natural transformations as 2-morphisms. The natural notion of ‘something like a functor’ from a category to a 2-category is called a pseudofunctor. And just as we can define symmetric lax monoidal functor, we can define a symmetric lax monoidal pseudofunctor.

All these nuances really matter when we’re studying open graphs, as we were last time!

Here we want the feet of our structured cospan to be finite sets and the apex to be a finite graph. So, we have \mathsf{A} = \mathsf{FinSet} and for any N \in \mathsf{FinSet} we want F(N) to be the set, or category, of finite graphs having N as their set of nodes.

I explained last time all the disasters that ensue when you try to let F(N) be the set of finite graphs having N as its set of nodes. You can try, but you will pay dearly for it! You can struggle and fight, like Hercules trying to chop all the heads off the Hydra, but you still can’t get a symmetric lax monoidal functor

F\colon (\mathsf{A},+) \to (\mathsf{Set},\times)

that sends any finite set N to the set of graphs having N as their set of nodes.

But there is a perfectly nice category F(N) of all finite graphs having N as their set of nodes. And you can get a symmetric lax monoidal pseudofunctor

F\colon (\mathsf{A},+) \to (\mathbf{Cat},\times)

that sends any finite set to the category of finite graphs having it as nodes. So you should stop fighting and go with the flow.

Kenny, Christina and I proved an enhanced version of Fong’s theorem that works starting from this more general kind of F. And instead of just giving you a symmetric monoidal category, this theorem gives you a symmetric monoidal double category.

In fact, that is something you should have wanted already, even with Fong’s original hypotheses! The clue is that Fong’s theorem uses isomorphism classes of decorated cospans, which suggests we’d get something better if we used decorated cospans themselves. Kenny tackled this a while ago, getting a version of Fong’s theorem that produces a symmetric monoidal double category, and another version that produces a symmetric monoidal bicategory:

• Kenny Courser, A bicategory of decorated cospans, Theory and Applications of Categories 32 (2017), 995–1027.

Over the years we’ve realized that the double category is better, because it contains more information and is easier to work with. So, in our new improved approach to decorated cospans, we go straight for the jugular and get a double category. And here’s how it works:

Theorem. Suppose \mathsf{A} is a category with finite colimits, and make \mathsf{A} into a symmetric monoidal category with its coproduct as the tensor product. Suppose F\colon (\mathsf{A},+) \to (\mathbf{Cat},\times) is a symmetric lax monoidal pseudofunctor. Then there is a symmetric monoidal double category F\mathbb{C}\mathbf{sp} in which

• an object is an object of \mathsf{A}
• a vertical morphism is a morphism in \mathsf{A}
• a horizontal morphism is an F-decorated cospan, meaning a cospan in \mathsf{A} together with a decoration:


• a 2-morphism is a map of decorated cospans, meaning a commutative diagram in \mathsf{A}:

together with a morphism \tau \colon F(h)(x) \to x', the map of decorations.

We call F\mathbb{C}\mathbf{sp} a decorated cospan double category. And as our paper explains, this idea lets us fix all the broken attempted applications of Fong’s original decorated cospan categories!

All this is just what any category theorist worth their salt would try, in order to fix the original problems with decorated cospans. It turns out that proving the theorem above is not so easy, mainly because the definition of ‘symmetric monoidal double category’ is rather complex. But if you accept the theorem—including the details of how you get the symmetric monoidal structure on the double category, which I have spared you here—then it doesn’t really matter much that the proof takes work.

Next time I’ll tell you about the other way to fix the original decorated cospan formalism: structured cospans. When these work, they are often easier to use.


Part 1: an overview of Courser’s thesis and related papers.

Part 2: problems with the original decorated cospans.

Part 3: the new improved decorated cospans.


Jordan Ellenberg: I don’t work at a finishing school

David Brooks, in the New York Times:

On the left, less viciously, we have elite universities that have become engines for the production of inequality. All that woke posturing is the professoriate’s attempt to mask the fact that they work at finishing schools where more students often come from the top 1 percent of earners than from the bottom 60 percent. Their graduates flock to insular neighborhoods in and around New York, D.C., San Francisco and a few other cities, have little contact with the rest of America and make everybody else feel scorned and invisible.

It’s fun to track down a fact. More from the top 1% than the bottom 60%! That certainly makes professoring sound like basically a grade-inflation concierge service for the wealthy with a few scholarship kids thrown in for flavor. But it’s interesting to try to track down the basis of a quantitative claim like this. Brooks says “more students often come,” which is hard to parse. He does, helpfully, provide a link (not all pundits do this!) to back up his claim.

Now the title of the linked NYT piece is “Some Colleges Have More Students From the Top 1 Percent Than the Bottom 60.” Some is a little different from often; how many colleges, exactly, are that badly income-skewed? The Times piece says 38, including five from the Ivy League. Thirty-eight colleges is… not actually that many! The list doesn’t include Harvard (15.1 from the 1%, 20.4 from the bottom 60%) or famously woke Oberlin (9.3/13.3) or Cornell (10.5/19.6) or MIT (5.7/23.4) or Berkeley (3.8/29.7) and it definitely doesn’t include the University of Wisconsin (1.6/27.3).

We can be more quantitative still! A couple of clicks from the Times article gets you to the paper they’re writing about, which helpfully has all its data in downloadable form. Their list has 2202 colleges. Of those, the number that have as many students from the top 1% as from the bottom 60% is 17. (The Times says 38, I know; the numbers in the authors’ database match what’s in their Feb 2020 paper but not what’s in the 2017 Times article.) The number which have even half as many 1%-ers as folks from the bottom 60% is only 64. But maybe those are the 64 elitest-snooty-tootiest colleges? Not really; a lot of them are small, expensive schools, like Bates, Colgate, Middlebury, Sarah Lawrence, Wake Forest, Vanderbilt — good places to go to school but not the ones whose faculty dominate The Discourse. The authors helpfully separate colleges into “tiers” — there are 173 schools in the tiers they label as “Ivy Plus,” “Other elite schools,” “Highly selective public,” and “Highly selective private.” All 17 of the schools with more 1% than 60% are in this group, as are 59 of the 64 with a ratio greater than 1/2. But still: of those 173 schools, the median ratio between “students in the top 1%” and “students in the bottom 60%” is 0.326; in other words, the typical such school has more than three times as many ordinary kids as it has Richie Riches.
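If you want to redo this arithmetic yourself, it takes only a few lines of pandas. Here is a minimal sketch, with the caveat that the file name, column names, and tier labels below are my guesses at how the downloadable data is laid out, so check them against the actual files before trusting any numbers:

# count colleges with more top-1% than bottom-60% students, plus the elite-tier median ratio
import pandas as pd

df = pd.read_csv("mrc_table2.csv")  # assumed name for the college-level estimates

# assumed columns: par_top1pc = share of students from the top 1%,
# par_q1..par_q3 = shares from the bottom three parent-income quintiles (the bottom 60%)
bottom60 = df["par_q1"] + df["par_q2"] + df["par_q3"]
ratio = df["par_top1pc"] / bottom60

print("at least as many top-1% as bottom-60% students:", int((ratio >= 1).sum()))
print("at least half as many:", int((ratio >= 0.5).sum()))

elite = df[df["tier_name"].isin([
    "Ivy Plus", "Other elite schools",
    "Highly selective public", "Highly selective private",
])]
elite_ratio = elite["par_top1pc"] / (elite["par_q1"] + elite["par_q2"] + elite["par_q3"])
print("median top-1%/bottom-60% ratio in the elite tiers:", elite_ratio.median())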

Conclusion: I don’t think it is fair to characterize the data as saying that the elite universities of the US are “finishing schools where more students often come from the top 1 percent of earners than from the bottom 60 percent.”

On the other hand: of those 173 top-tier schools, 132 of them have more than half their students coming from the top 20% of the income distribution. UW–Madison draws almost two-fifths of its student body from that top quintile (household incomes of about $120K or more.) And only three out of those 173 have as many as 10% of their student body coming from the bottom quintile of the income distribution (UC-Irvine, UCLA, and Stony Brook.) The story about elite higher ed perpetuating inequality isn’t really about the kids of the hedge-fund jackpot winners and far-flung monarchs who spend four years learning critical race theory so they can work at a Gowanus nonprofit and eat locally-sourced brunch; it’s about the kids of the lawyers and the dentists and the high-end realtors, who are maybe also going to be lawyers and dentists and high-end realtors. And the students who are really shut out of elite education aren’t, as Brooks has it, the ones whose families earn the median income; they’re poor kids.

January 22, 2021

David Hogg: writing proposals is hard!

Today I took a serious shot at getting words down in my upcoming NASA proposal for open-source tools, frameworks, and libraries. This is a new call to support development and maintenance of open-source projects that are aligned with NASA science missions (yay open science and NASA!). Dustin Lang (Perimeter) and I are proposing to support Astrometry.net, which is used in multiple NASA missions, including SOFIA and SPHEREx. It is hard to put together a full proposal; writing a proposal is comparable in intellectual scope to writing a scientific paper! And it must be done on deadline, or not at all.

Matt von Hippel: Physical Intuition From Physics Experience

One of the most mysterious powers physicists claim is physical intuition. Let the mathematicians have their rigorous proofs and careful calculations. We just need to ask ourselves, “Does this make sense physically?”

It’s tempting to chalk this up to bluster, or physicist arrogance. Sometimes, though, a physicist manages to figure out something that stumps the mathematicians. Edward Witten’s work on knot theory is a classic example, where he used ideas from physics, not rigorous proof, to win one of mathematics’ highest honors.

So what is physical intuition? And what is its relationship to proof?

Let me walk you through an example. I recently saw a talk by someone in my field who might be a master of physical intuition. He was trying to learn about what we call Effective Field Theories, theories that are “effectively” true at some energy but don’t include the details of higher-energy particles. He calculated that there are limits to the effect these higher-energy particles can have, just based on simple cause and effect. To explain the calculation to us, he gave a physical example, of coupled oscillators.

Oscillators are familiar problems for first-year physics students. Objects that go back and forth, like springs and pendulums, tend to obey similar equations. Link two of them together (couple them), and the equations get more complicated, work for a second-year student instead of a first-year one. Such a student will notice that coupled oscillators “repel” each other: their frequencies get farther apart than they would be if they weren’t coupled.
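Here is a minimal numerical sketch of that repulsion (my own toy numbers, not anything from the talk): two unit-mass oscillators with natural frequencies w1 and w2, coupled by a spring of strength g, have squared normal-mode frequencies equal to the eigenvalues of the stiffness matrix below, and the coupled frequencies come out farther apart than the uncoupled ones.

# frequency repulsion for two coupled oscillators
import numpy as np

w1, w2, g = 1.0, 1.2, 0.5   # natural frequencies and coupling strength (arbitrary choices)

# equations of motion: x1'' = -(w1^2 + g) x1 + g x2,  x2'' = g x1 - (w2^2 + g) x2
K = np.array([[w1**2 + g, -g],
              [-g,         w2**2 + g]])

freqs = np.sqrt(np.linalg.eigvalsh(K))   # normal-mode frequencies, in ascending order

print("uncoupled gap:", abs(w2 - w1))         # 0.2
print("coupled gap:  ", freqs[1] - freqs[0])  # about 0.42: the frequencies are pushed apart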

Our seminar speaker wanted us to revisit those second-year-student days, in order to understand how different particles behave in Effective Field Theory. Just as the frequencies of the oscillators repel each other, the energies of particles repel each other: the unknown high-energy particles could only push the energies of the lighter particles we can detect lower, not higher.

This is an example of physical intuition. Examine it, and you can learn a few things about how physical intuition works.

First, physical intuition comes from experience. Using physical intuition wasn’t just a matter of imagining the particles and trying to see what “makes sense”. Instead, it required thinking about similar problems from our experience as physicists: problems that don’t just seem similar on the surface, but are mathematically similar.

Second, physical intuition doesn’t replace calculation. Our speaker had done the math, he hadn’t just made a physical argument. Instead, physical intuition serves two roles: to inspire, and to help remember. Physical intuition can inspire new solutions, suggesting ideas that you go on to check with calculation. In addition to that, it can help your mind sort out what you already know. Without the physical story, we might not have remembered that the low-energy particles have their energies pushed down. With the story though, we had a similar problem to compare, and it made the whole thing more memorable. Human minds aren’t good at holding a giant pile of facts. What they are good at is holding narratives. “Physical intuition” ties what we know into a narrative, building on past problems to understand new ones.

Finally, physical intuition can be risky. If the problem is too different then the intuition can lead you astray. The mathematics of coupled oscillators and Effective Field Theories was similar enough for this argument to work, but if it turned out to be different in an important way then the intuition would have backfired, making it harder to find the answer and harder to keep track once it was found.

Physical intuition may seem mysterious. But deep down, it’s just physicists using our experience, comparing similar problems to help keep track of what we need to know. I’m sure chemists, biologists, and mathematicians all have similar stories to tell.

Scott Aaronson: Sufficiently amusing that I had no choice

January 21, 2021

Scott Aaronson: A day to celebrate

The reason I’m celebrating is presumably obvious to all: today is my daughter Lily’s 8th birthday! (She had a tiny Star Wars-themed party, dressed in her Rey costume.)

A second reason I’m celebrating: yesterday I began teaching (via Zoom, of course) the latest iteration of my graduate course on Quantum Complexity Theory!

A third reason: I’m now scheduled to get my first covid vaccine shot on Monday! (Texas is working through its “Phase 1b,” which includes both the over-65 and those with underlying conditions—in my case, mild type-2 diabetes.) I’d encourage everyone to do as I did: don’t lie to jump the line, but don’t sacrifice your place either. Just follow the stated rules and get vaccinated the first microsecond you can, and urge all your friends and loved ones to do the same. A crush of demand is actually good if it encourages the providers to expand their hours (they’re taking off weekends! they took off MLK Day!) and not to waste a single dose.

Anyway, people can use this thread to talk about whatever they like, but one thing that would interest me especially is readers’ experiences with vaccination: if you’ve gotten one by now, how hard did you have to look for an appointment, how orderly or chaotic was the process where you live, and what advice can you offer?

Incidentally, to the several commenters on this blog who expressed absolute certainty (as recently as yesterday) that Trump would reverse the election result and be inaugurated instead of Biden, and who confidently accused the rest of us of living in a manufactured media bubble that prevented them from seeing that: I respect that, whatever else is said about you, no one can ever again accuse you of being fair-weather friends!

Congratulations to the new President! There are difficult months ahead, but today the arc of the universe bent slightly toward sanity and goodness.

Update (Jan 21): WOOHOO! Yet another reason to celebrate: Scott Alexander is finally back in business, now blogging at Astral Codex Ten on Substack.

Tommaso Dorigo: Google Ngram Viewer, What A Tool

I know, Google has been around for decades by now, and nobody should be surprised to learn how easy they have made the life of information seekers, among other things (I am also an addict of their search engine, scholar, maps, trends, and gmail utilities). But my mouth still dropped today as I discovered their "ngram viewer". 
It happened by chance. I was trying to find out whether "as best as possible" is really a correct English phrase, or if it is just a tad slang, and the google search pointed to a page where the matter was settled by a cool graph:




John Baez: US Environmental Policy (Part 3)

It’s begun! When it comes to global warming we’re in a race for time, and the US has spent the last four years with its ankles zip-tied together. On his first day in office, the new president of the US signed this executive order:

ACCEPTANCE ON BEHALF OF THE UNITED STATES OF AMERICA

I, Joseph R. Biden Jr., President of the United States of America, having seen and considered the Paris Agreement, done at Paris on December 12, 2015, do hereby accept the said Agreement and every article and clause thereof on behalf of the United States of America.

Done at Washington this 20th day of January, 2021.

JOSEPH R. BIDEN JR.

He also signed this order connected to the climate crisis and other environmental issues:

Executive order on protecting public health and the environment and restoring science to tackle the climate crisis.

It undoes many actions of the previous president.

• It revokes previous executive orders so as to:

  • reduce methane emissions in the oil and gas sector,
  • establish new fuel economy standards,
  • establish new efficiency standards for buildings, and
  • restore protection to a number of park lands and undersea protected areas (“national monuments”).

• It instantly puts a temporary halt to leasing lands in the Arctic National Wildlife Refuge for the purposes of oil and gas drilling, so this program can be reviewed.

• It prevents offshore oil and gas drilling in certain Arctic waters and the Bering Sea.

• It revokes the permit for the Keystone XL pipeline.

• It revives the Interagency Working Group on the Social Cost of Greenhouse Gases, to properly account for the full cost of these emissions.

• It revokes many other executive orders listed in section 7 below.

Here are the details:

By the authority vested in me as President by the Constitution and the laws of the United States of America, it is hereby ordered as follows:

Section 1. Policy. Our Nation has an abiding commitment to empower our workers and communities; promote and protect our public health and the environment; and conserve our national treasures and monuments, places that secure our national memory. Where the Federal Government has failed to meet that commitment in the past, it must advance environmental justice. In carrying out this charge, the Federal Government must be guided by the best science and be protected by processes that ensure the integrity of Federal decision-making. It is, therefore, the policy of my Administration to listen to the science; to improve public health and protect our environment; to ensure access to clean air and water; to limit exposure to dangerous chemicals and pesticides; to hold polluters accountable, including those who disproportionately harm communities of color and low-income communities; to reduce greenhouse gas emissions; to bolster resilience to the impacts of climate change; to restore and expand our national treasures and monuments; and to prioritize both environmental justice and the creation of the well-paying union jobs necessary to deliver on these goals.

To that end, this order directs all executive departments and agencies (agencies) to immediately review and, as appropriate and consistent with applicable law, take action to address the promulgation of Federal regulations and other actions during the last 4 years that conflict with these important national objectives, and to immediately commence work to confront the climate crisis.

Sec. 2. Immediate Review of Agency Actions Taken Between January 20, 2017, and January 20, 2021. (a) The heads of all agencies shall immediately review all existing regulations, orders, guidance documents, policies, and any other similar agency actions (agency actions) promulgated, issued, or adopted between January 20, 2017, and January 20, 2021, that are or may be inconsistent with, or present obstacles to, the policy set forth in section 1 of this order. For any such actions identified by the agencies, the heads of agencies shall, as appropriate and consistent with applicable law, consider suspending, revising, or rescinding the agency actions. In addition, for the agency actions in the 4 categories set forth in subsections (i) through (iv) of this section, the head of the relevant agency, as appropriate and consistent with applicable law, shall consider publishing for notice and comment a proposed rule suspending, revising, or rescinding the agency action within the time frame specified.

(i)    Reducing Methane Emissions in the Oil and Gas Sector:  “Oil and Natural Gas Sector: Emission Standards for New, Reconstructed, and Modified Sources Reconsideration,” 85 Fed. Reg. 57398 (September 15, 2020), by September 2021. 

(ii)   Establishing Ambitious, Job-Creating Fuel Economy Standards:  “The Safer Affordable Fuel-Efficient (SAFE) Vehicles Rule Part One: One National Program,” 84 Fed. Reg. 51310 (September 27, 2019), by April 2021; and “The Safer Affordable Fuel-Efficient (SAFE) Vehicles Rule for Model Years 2021–2026 Passenger Cars and Light Trucks,” 85 Fed. Reg. 24174 (April 30, 2020), by July 2021.  In considering whether to propose suspending, revising, or rescinding the latter rule, the agency should consider the views of representatives from labor unions, States, and industry.

(iii)  Job-Creating Appliance- and Building-Efficiency Standards:  “Energy Conservation Program for Appliance Standards: Procedures for Use in New or Revised Energy Conservation Standards and Test Procedures for Consumer Products and Commercial/Industrial Equipment,” 85 Fed. Reg. 8626 (February 14, 2020), with major revisions proposed by March 2021 and any remaining revisions proposed by June 2021; “Energy Conservation Program for Appliance Standards: Procedures for Evaluating Statutory Factors for Use in New or Revised Energy Conservation Standards,” 85 Fed. Reg. 50937 (August 19, 2020), with major revisions proposed by March 2021 and any remaining revisions proposed by June 2021; “Final Determination Regarding Energy Efficiency Improvements in the 2018 International Energy Conservation Code (IECC),” 84 Fed. Reg. 67435 (December 10, 2019), by May 2021; “Final Determination Regarding Energy Efficiency Improvements in ANSI/ASHRAE/IES Standard 90.1-2016: Energy Standard for Buildings, Except Low-Rise Residential Buildings,” 83 Fed. Reg. 8463 (February 27, 2018), by May 2021.

(iv)   Protecting Our Air from Harmful Pollution:  “National Emission Standards for Hazardous Air Pollutants: Coal- and Oil-Fired Electric Utility Steam Generating Units—Reconsideration of Supplemental Finding and Residual Risk and Technology Review,” 85 Fed. Reg. 31286 (May 22, 2020), by August 2021; “Increasing Consistency and Transparency in Considering Benefits and Costs in the Clean Air Act Rulemaking Process,” 85 Fed. Reg. 84130 (December 23, 2020), as soon as possible; “Strengthening Transparency in Pivotal Science Underlying Significant Regulatory Actions and Influential Scientific Information,” 86 Fed. Reg. 469 (January 6, 2021), as soon as possible.

(b)  Within 30 days of the date of this order, heads of agencies shall submit to the Director of the Office of Management and Budget (OMB) a preliminary list of any actions being considered pursuant to section (2)(a) of this order that would be completed by December 31, 2021, and that would be subject to OMB review.  Within 90 days of the date of this order, heads of agencies shall submit to the Director of OMB an updated list of any actions being considered pursuant to section (2)(a) of this order that would be completed by December 31, 2025, and that would be subject to OMB review.  At the time of submission to the Director of OMB, heads of agencies shall also send each list to the National Climate Advisor.  In addition, and at the same time, heads of agencies shall send to the National Climate Advisor a list of additional actions being considered pursuant to section (2)(a) of this order that would not be subject to OMB review.

(c)  Heads of agencies shall, as appropriate and consistent with applicable law, consider whether to take any additional agency actions to fully enforce the policy set forth in section 1 of this order.  With respect to the Administrator of the Environmental Protection Agency, the following specific actions should be considered:

(i)   proposing new regulations to establish comprehensive standards of performance and emission guidelines for methane and volatile organic compound emissions from existing operations in the oil and gas sector, including the exploration and production, transmission, processing, and storage segments, by September 2021; and

(ii)  proposing a Federal Implementation Plan in accordance with the Environmental Protection Agency’s “Findings of Failure To Submit State Implementation Plan Revisions in Response to the 2016 Oil and Natural Gas Industry Control Techniques Guidelines for the 2008 Ozone National Ambient Air Quality Standards (NAAQS) and for States in the Ozone Transport Region,” 85 Fed. Reg. 72963 (November 16, 2020), for California, Connecticut, New York, Pennsylvania, and Texas by January 2022. 

(d)  The Attorney General may, as appropriate and consistent with applicable law, provide notice of this order and any actions taken pursuant to section 2(a) of this order to any court with jurisdiction over pending litigation related to those agency actions identified pursuant to section (2)(a) of this order, and may, in his discretion, request that the court stay or otherwise dispose of litigation, or seek other appropriate relief consistent with this order, until the completion of the processes described in this order.

(e)  In carrying out the actions directed in this section, heads of agencies shall seek input from the public and stakeholders, including State, local, Tribal, and territorial officials, scientists, labor unions, environmental advocates, and environmental justice organizations.

Sec. 3. Restoring National Monuments. (a) The Secretary of the Interior, as appropriate and consistent with applicable law, including the Antiquities Act, 54 U.S.C. 320301 et seq., shall, in consultation with the Attorney General, the Secretaries of Agriculture and Commerce, the Chair of the Council on Environmental Quality, and Tribal governments, conduct a review of the monument boundaries and conditions that were established by Proclamation 9681 of December 4, 2017 (Modifying the Bears Ears National Monument); Proclamation 9682 of December 4, 2017 (Modifying the Grand Staircase-Escalante National Monument); and Proclamation 10049 of June 5, 2020 (Modifying the Northeast Canyons and Seamounts Marine National Monument), to determine whether restoration of the monument boundaries and conditions that existed as of January 20, 2017, would be appropriate.

(b)  Within 60 days of the date of this order, the Secretary of the Interior shall submit a report to the President summarizing the findings of the review conducted pursuant to subsection (a), which shall include recommendations for such Presidential actions or other actions consistent with law as the Secretary may consider appropriate to carry out the policy set forth in section 1 of this order.

(c)  The Attorney General may, as appropriate and consistent with applicable law, provide notice of this order to any court with jurisdiction over pending litigation related to the Grand Staircase-Escalante, Bears Ears, and Northeast Canyons and Seamounts Marine National Monuments, and may, in his discretion, request that the court stay the litigation or otherwise delay further litigation, or seek other appropriate relief consistent with this order, pending the completion of the actions described in subsection (a) of this section.

Sec. 4. Arctic Refuge. (a) In light of the alleged legal deficiencies underlying the program, including the inadequacy of the environmental review required by the National Environmental Policy Act, the Secretary of the Interior shall, as appropriate and consistent with applicable law, place a temporary moratorium on all activities of the Federal Government relating to the implementation of the Coastal Plain Oil and Gas Leasing Program, as established by the Record of Decision signed August 17, 2020, in the Arctic National Wildlife Refuge. The Secretary shall review the program and, as appropriate and consistent with applicable law, conduct a new, comprehensive analysis of the potential environmental impacts of the oil and gas program.

(b)  In Executive Order 13754 of December 9, 2016 (Northern Bering Sea Climate Resilience), and in the Presidential Memorandum of December 20, 2016 (Withdrawal of Certain Portions of the United States Arctic Outer Continental Shelf From Mineral Leasing), President Obama withdrew areas in Arctic waters and the Bering Sea from oil and gas drilling and established the Northern Bering Sea Climate Resilience Area.  Subsequently, the order was revoked and the memorandum was amended in Executive Order 13795 of April 28, 2017 (Implementing an America-First Offshore Energy Strategy).  Pursuant to section 12(a) of the Outer Continental Shelf Lands Act, 43 U.S.C. 1341(a), Executive Order 13754 and the Presidential Memorandum of December 20, 2016, are hereby reinstated in their original form, thereby restoring the original withdrawal of certain offshore areas in Arctic waters and the Bering Sea from oil and gas drilling.

(c)  The Attorney General may, as appropriate and consistent with applicable law, provide notice of this order to any court with jurisdiction over pending litigation related to the Coastal Plain Oil and Gas Leasing Program in the Arctic National Wildlife Refuge and other related programs, and may, in his discretion, request that the court stay the litigation or otherwise delay further litigation, or seek other appropriate relief consistent with this order, pending the completion of the actions described in subsection (a) of this section.

Sec. 5. Accounting for the Benefits of Reducing Climate Pollution. (a) It is essential that agencies capture the full costs of greenhouse gas emissions as accurately as possible, including by taking global damages into account. Doing so facilitates sound decision-making, recognizes the breadth of climate impacts, and supports the international leadership of the United States on climate issues. The “social cost of carbon” (SCC), “social cost of nitrous oxide” (SCN), and “social cost of methane” (SCM) are estimates of the monetized damages associated with incremental increases in greenhouse gas emissions. They are intended to include changes in net agricultural productivity, human health, property damage from increased flood risk, and the value of ecosystem services. An accurate social cost is essential for agencies to accurately determine the social benefits of reducing greenhouse gas emissions when conducting cost-benefit analyses of regulatory and other actions.

(b)  There is hereby established an Interagency Working Group on the Social Cost of Greenhouse Gases (the “Working Group”).  The Chair of the Council of Economic Advisers, Director of OMB, and Director of the Office of Science and Technology Policy  shall serve as Co-Chairs of the Working Group. 

(i)    Membership.  The Working Group shall also include the following other officers, or their designees:  the Secretary of the Treasury; the Secretary of the Interior; the Secretary of Agriculture; the Secretary of Commerce; the Secretary of Health and Human Services; the Secretary of Transportation; the Secretary of Energy; the Chair of the Council on Environmental Quality; the Administrator of the Environmental Protection Agency; the Assistant to the President and National Climate Advisor; and the Assistant to the President for Economic Policy and Director of the National Economic Council.

(ii)   Mission and Work.  The Working Group shall, as appropriate and consistent with applicable law: 

(A)  publish an interim SCC, SCN, and SCM within 30 days of the date of this order, which agencies shall use when monetizing the value of changes in greenhouse gas emissions resulting from regulations and other relevant agency actions until final values are published;

(B)  publish a final SCC, SCN, and SCM by no later than January 2022;

(C)  provide recommendations to the President, by no later than September 1, 2021, regarding areas of decision-making, budgeting, and procurement by the Federal Government where the SCC, SCN, and SCM should be applied; 

(D)  provide recommendations, by no later than June 1, 2022, regarding a process for reviewing, and, as appropriate, updating, the SCC, SCN, and SCM to ensure that these costs are based on the best available economics and science; and

(E)  provide recommendations, to be published with the final SCC, SCN, and SCM under subparagraph (A) if feasible, and in any event by no later than June 1, 2022, to revise methodologies for calculating the SCC, SCN, and SCM, to the extent that current methodologies do not adequately take account of climate risk, environmental justice, and intergenerational equity.

(iii)  Methodology.  In carrying out its activities, the Working Group shall consider the recommendations of the National Academies of Science, Engineering, and Medicine as reported in Valuing Climate Damages: Updating Estimation of the Social Cost of Carbon Dioxide (2017) and other pertinent scientific literature; solicit public comment; engage with the public and stakeholders; seek the advice of ethics experts; and ensure that the SCC, SCN, and SCM reflect the interests of future generations in avoiding threats posed by climate change.

Sec. 6. Revoking the March 2019 Permit for the Keystone XL Pipeline. (a) On March 29, 2019, the President granted to TransCanada Keystone Pipeline, L.P. a Presidential permit (the “Permit”) to construct, connect, operate, and maintain pipeline facilities at the international border of the United States and Canada (the “Keystone XL pipeline”), subject to express conditions and potential revocation in the President’s sole discretion. The Permit is hereby revoked in accordance with Article 1(1) of the Permit.

(b)  In 2015, following an exhaustive review, the Department of State and the President determined that approving the proposed Keystone XL pipeline would not serve the U.S. national interest.  That analysis, in addition to concluding that the significance of the proposed pipeline for our energy security and economy is limited, stressed that the United States must prioritize the development of a clean energy economy, which will in turn create good jobs.  The analysis further concluded that approval of the proposed pipeline would undermine U.S. climate leadership by undercutting the credibility and influence of the United States in urging other countries to take ambitious climate action.

(c)  Climate change has had a growing effect on the U.S. economy, with climate-related costs increasing over the last 4 years.  Extreme weather events and other climate-related effects have harmed the health, safety, and security of the American people and have increased the urgency for combatting climate change and accelerating the transition toward a clean energy economy.  The world must be put on a sustainable climate pathway to protect Americans and the domestic economy from harmful climate impacts, and to create well-paying union jobs as part of the climate solution. 

(d)  The Keystone XL pipeline disserves the U.S. national interest.  The United States and the world face a climate crisis.  That crisis must be met with action on a scale and at a speed commensurate with the need to avoid setting the world on a dangerous, potentially catastrophic, climate trajectory.  At home, we will combat the crisis with an ambitious plan to build back better, designed to both reduce harmful emissions and create good clean-energy jobs.  Our domestic efforts must go hand in hand with U.S. diplomatic engagement.  Because most greenhouse gas emissions originate beyond our borders, such engagement is more necessary and urgent than ever.  The United States must be in a position to exercise vigorous climate leadership in order to achieve a significant increase in global climate action and put the world on a sustainable climate pathway.  Leaving the Keystone XL pipeline permit in place would not be consistent with my Administration’s economic and climate imperatives.

Sec. 7. Other Revocations. (a) Executive Order 13766 of January 24, 2017 (Expediting Environmental Reviews and Approvals For High Priority Infrastructure Projects), Executive Order 13778 of February 28, 2017 (Restoring the Rule of Law, Federalism, and Economic Growth by Reviewing the “Waters of the United States” Rule), Executive Order 13783 of March 28, 2017 (Promoting Energy Independence and Economic Growth), Executive Order 13792 of April 26, 2017 (Review of Designations Under the Antiquities Act), Executive Order 13795 of April 28, 2017 (Implementing an America-First Offshore Energy Strategy), Executive Order 13868 of April 10, 2019 (Promoting Energy Infrastructure and Economic Growth), and Executive Order 13927 of June 4, 2020 (Accelerating the Nation’s Economic Recovery from the COVID-19 Emergency by Expediting Infrastructure Investments and Other Activities), are hereby revoked. Executive Order 13834 of May 17, 2018 (Efficient Federal Operations), is hereby revoked except for sections 6, 7, and 11.

(b)  Executive Order 13807 of August 15, 2017 (Establishing Discipline and Accountability in the Environmental Review and Permitting Process for Infrastructure Projects), is hereby revoked.  The Director of OMB and the Chair of the Council on Environmental Quality shall jointly consider whether to recommend that a replacement order be issued.

(c)  Executive Order 13920 of May 1, 2020 (Securing the United States Bulk-Power System), is hereby suspended for 90 days.  The Secretary of Energy and the Director of OMB shall jointly consider whether to recommend that a replacement order be issued.

(d)  The Presidential Memorandum of April 12, 2018 (Promoting Domestic Manufacturing and Job Creation Policies and Procedures Relating to Implementation of Air Quality Standards), the Presidential Memorandum of October 19, 2018 (Promoting the Reliable Supply and Delivery of Water in the West), and the Presidential Memorandum of February 19, 2020 (Developing and Delivering More Water Supplies in California), are hereby revoked. 

(e)  The Council on Environmental Quality shall rescind its draft guidance entitled, “Draft National Environmental Policy Act Guidance on Consideration of Greenhouse Gas Emissions,” 84 Fed. Reg. 30097 (June 26, 2019).  The Council, as appropriate and consistent with applicable law, shall review, revise, and update its final guidance entitled, “Final Guidance for Federal Departments and Agencies on Consideration of Greenhouse Gas Emissions and the Effects of Climate Change in National Environmental Policy Act Reviews,” 81 Fed. Reg. 51866 (August 5, 2016).

(f)  The Director of OMB and the heads of agencies shall promptly take steps to rescind any orders, rules, regulations, guidelines, or policies, or portions thereof, including, if necessary, by proposing such rescissions through notice-and-comment rulemaking, implementing or enforcing the Executive Orders, Presidential Memoranda, and draft guidance identified in this section, as appropriate and consistent with applicable law.

Sec. 8. General Provisions. (a) Nothing in this order shall be construed to impair or otherwise affect:

(i)   the authority granted by law to an executive department or agency, or the head thereof; or

(ii)  the functions of the Director of the Office of Management and Budget relating to budgetary, administrative, or legislative proposals.

(b)  This order shall be implemented in a manner consistent with applicable law and subject to the availability of appropriations.

(c)  This order is not intended to, and does not, create any right or benefit, substantive or procedural, enforceable at law or in equity by any party against the United States, its departments, agencies, or entities, its officers, employees, or agents, or any other person.

JOSEPH R. BIDEN JR.

THE WHITE HOUSE,
January 20, 2021.

David Hogg: is leading-order time dependence spirally?

Independently, Kathryn Johnston (Columbia) and David Spergel (Flatiron) have pointed out to me that if you have a Hamiltonian dynamical system that is slightly out of steady-state, you can do a kind of expansion, in which the steady-state equation is just the zeroth order term in an expansion. The first-order term looks like the zeroth-order Hamiltonian acting on the first-order perturbation to the distribution function, plus the first-order perturbation to the Hamiltonian acting on the zeroth-order distribution function (equals a time derivative of the distribution function). That's cool!
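In symbols (my transcription, with my sign convention for the Poisson bracket \{\cdot,\cdot\}): write the distribution function as f = f_0 + f_1 and the Hamiltonian as H = H_0 + H_1, with the subscript-1 pieces small, and expand the collisionless Boltzmann equation \partial f/\partial t + \{f, H\} = 0 order by order. The zeroth-order piece is the steady-state condition \{f_0, H_0\} = 0, and the first-order piece is

\frac{\partial f_1}{\partial t} + \{f_1, H_0\} + \{f_0, H_1\} = 0,

which is the equation described above: the zeroth-order Hamiltonian acting on the first-order perturbation, plus the first-order Hamiltonian acting on the zeroth-order distribution function, balancing the time derivative of the perturbation.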

Now couple that idea with the fact that a steady-state Hamiltonian system is a set of phase-mixed orbits nested in phase space (literally a 3-torus foliation of 6-space). Isn't this first-order equation the equation of a winding-up spiral mode? I think it is! If so, it might unite a bunch of phenomenology, from cold stellar streams to spiral structure in the disk to The Snail. I discussed all this with Adrian Price-Whelan (Flatiron).

n-Category Café Postdoctoral Position in HoTT at the University of San Diego

The University of San Diego invites applications for a postdoctoral research fellowship in homotopy type theory beginning Fall 2021, or earlier if desired. This is intended as a two-year position with potential extension to a third year, funded by the second AFOSR MURI grant for HoTT, entitled “Synthetic and Constructive Mathematics of Higher Structures in Homotopy Type Theory”.

This is the same grant that’s funding Emily’s postdoc, but my bureaucrats took longer to get their ducks in a row, with the result that I won’t be able to make an offer before the AMS coordinated postdoc response deadline of February 1. Indeed, February 1 is only the priority submission deadline for the application, less than 2 weeks away. However, I’ll do my best to make a quick turnaround, and if you’re interested in the position but have other offers requiring a response, please let me know before giving up. As with Emily’s position, you must at least be able to obtain a visa to come and work physically in the U.S.; I’m still working on the details of visa sponsorship, but it should be possible.

Applications are encouraged from candidates working in any area related to homotopy type theory, broadly construed. If your background is in a related area but you’re interested in getting into the field, don’t let that deter you. Please include in your cover letter a discussion of your background and future research goals as they relate specifically to homotopy type theory, as well as any specific interests you may have in collaborating with the team for the grant, which in addition to myself includes Steve Awodey, Bob Harper, Favonia, Dan Licata, and Emily Riehl, and their students and postdocs. This is especially important if you’re new to the field and HoTT doesn’t figure prominently in your CV and past research.

The University of San Diego (not to be confused with the University of California, San Diego) is a relatively small private Catholic university. Although the university includes a few graduate schools, the mathematics department is in the College of Arts and Sciences, which is purely an undergraduate liberal arts college. We have no graduate students, and this will be the first postdoc ever in the math department. So you should be aware what you’re getting into; but on the other hand, if you have any interest in eventually becoming faculty at a liberal-arts college, this could be a good opportunity to get a feel for what such a department is like. Although I can’t make any promises, the postdoc may have the opportunity to help supervise undergraduate research students and/or to teach undergraduate courses if they are interested (though this is not required, and will be decided later).

Applications should be submitted through the University of San Diego recruitment system. Please email to alert me of your application, as well as with any questions you have!

USD is an Equal Opportunity employer, and is especially interested in candidates who can contribute to the diversity and excellence of the academic community.

January 20, 2021

John Baez: Categories of Nets (Part 2)

guest post by Michael Shulman

Now that John gave an overview of the Petri nets paper that he and I have just written with Jade and Fabrizio, I want to dive a bit more into what we accomplish. The genesis of this paper was a paper written by Fabrizio and several other folks entitled Computational Petri Nets: Adjunctions Considered Harmful, which of course sounds to a category theorist like a challenge. Our paper, and particularly the notion of Σ-net and the adjunction in the middle column relating Σ-nets to symmetric strict monoidal categories, is an answer to that challenge.

Suppose you wanted to “freely” generate a symmetric monoidal category from some combinatorial data. What could that data be? In other words (for a category theorist at least), what sort of category \mathsf{C} appears in an adjunction \mathsf{C} \rightleftarrows \mathsf{SMC}? (By the way, all monoidal categories in this post will be strict, so I’m going to drop that adjective for conciseness.)

Perhaps the simplest choice is the same data that naturally generates a plain category, namely a directed graph. However, this is pretty limited in terms of what symmetric monoidal categories it can generate, since the generating morphisms will always only have single generating objects as their domain and codomain.

Another natural choice is the same data that naturally generates a multicategory, which might be called a “multigraph”: a set of objects together with, for every tuple of objects x_1,\dots,x_n and single object y, a set of arrows from (x_1,\dots,x_n) to y. In the generated symmetric monoidal category, such an arrow gives rise to a morphism x_1\otimes\cdots\otimes x_n \to y; thus we can now have multiple generating objects in the domains of generating morphisms, but not the codomains.

Of course, this suggests an even better solution: a set of objects, together with a set of arrows for every pair of tuples (x_1,\dots,x_m) and (y_1,\dots,y_n). I’d be tempted to call this a “polygraph”, since it also naturally generates a polycategory. But other folks got there first and called it a “tensor scheme” and also a “pre-net”. In the latter case, the objects are called “places” and the morphisms “transitions”. But whatever we call it, it allows us to generate free symmetric monoidal categories in which the domains and codomains of generating morphisms can both be arbitrary tensor products of generating objects. For those who like fancy higher-categorical machinery, it’s the notion of computad obtained from the monad for symmetric monoidal categories.

However, pre-nets are not without flaws. One of the most glaring, for people who actually want to compute with freely generated symmetric monoidal categories, is that there aren’t enough morphisms between them. For instance, suppose one pre-net N has three places x,y,z and a transition f:(x,x,y) \to z, while a second pre-net N' has three places x',y',z' and a transition f':(x',y',x') \to z'. Once we generate a symmetric monoidal category, then f can be composed with a symmetry x\otimes y \otimes x \cong x\otimes x\otimes y and similarly for f'; so the symmetric monoidal categories generated by N and N' are isomorphic. But there isn’t even a single map of pre-nets from N to N' or vice versa, because a map of pre-nets has to preserve the ordering on the inputs and outputs. This is weird and annoying for combinatorial data that’s supposed to present a symmetric monoidal category.

Another way of making essentially the same point is that just as the adjunction between SMCs and directed graphs factors through categories, and the adjunction between SMCs and multigraphs factors through multicategories, the adjunction between SMCs and pre-nets factors through non-symmetric monoidal categories. In other words, a pre-net is really better viewed as data for generating a non-symmetric monoidal category, which we can then freely add symmetries to.

By contrast, in the objects that we call “Petri nets”, the domain and codomain of each generating morphism are elements of the free commutative monoid on the set of places—as opposed to elements of the free monoid, which is what they are for a pre-net. Thus, the domain of f and f' above would be x+x+y and x+y+x respectively, which in a commutative monoid are equal (both are 2x+y). So the corresponding Petri nets of N and N' are indeed isomorphic. However, once we squash everything down in this way, we lose the ability to functorially generate a symmetric monoidal category; all we can generate is a commutative monoidal category where all the symmetries are identities.
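Here is the same point in a few lines of Python (my own toy sketch, not anything from the paper): as ordered lists the domains of f and f' are different pieces of data, but as multisets they coincide.

from collections import Counter

f_domain  = ("x", "x", "y")   # the domain of f in the pre-net N
fp_domain = ("x", "y", "x")   # the domain of f' in N' (with the places renamed to match)

print(f_domain == fp_domain)                    # False: pre-nets see different data
print(Counter(f_domain) == Counter(fp_domain))  # True: Petri nets identify them, both 2x + y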

At this point we’ve described the upper row and the left- and right-hand columns in John’s diagram:

What’s missing is a kind of net in the middle that corresponds to symmetric monoidal categories. To motivate the definition of Σ-net, consider how to solve the problem above of the “missing morphisms”. We want to send f:(x,x,y) \to z to a “permuted version” of f':(x',y',x') \to z'. For this to be implemented by an actual set-map, we need this “permuted version” to be present in the data of N' somehow. This suggests that the transitions should come with a permutation action like that of, say, a symmetric multicategory. Then inside N' we can actually act on f' by the transposition \tau = (2,3) \in S_3, yielding a new morphism \tau(f') : (x',x',y')\to z' which we can take to be the image of f. Of course, we can also act on f' by other permutations, and likewise on f; but since these permutation actions are part of the structure they must be preserved by the morphism, so sending f to \tau(f') uniquely determines where we have to send all these permutation images.

Now you can go back and look again at John’s definition of Σ-net: a set S, a groupoid T, and a discrete opfibration T \to P S \times P S ^{op}, where P denotes the free-symmetric-strict-monoidal-category functor \mathsf{Set} \to \mathsf{Cat}. Such a discrete opfibration is the same as a functor N \colon P S \times P S ^{op} \to \mathsf{Set}, and the objects of P S are the finite sequences of elements of S while its morphisms are permutations; thus this is precisely a pre-net (the action of the functor N on objects) with permutation actions as described above. I won’t get into the details of constructing the adjunction relating Σ-nets to symmetric monoidal categories; you can read the paper, or maybe I’ll blog about it later.
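To make the permutation action concrete, here is a toy computation (mine, not from the paper): acting on the domain of f' by the transposition that swaps the second and third entries turns (x', y', x') into (x', x', y'), producing exactly the transition \tau(f') that a map of Σ-nets can send f to.

def act(perm, entries):
    """Permute an ordered list of places: position i of the result takes entry perm[i]."""
    return tuple(entries[perm[i]] for i in range(len(entries)))

tau = (0, 2, 1)                        # swap the second and third entries
print(act(tau, ("x'", "y'", "x'")))    # ("x'", "x'", "y'")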

However, in solving the “missing morphisms” problem, we’ve introduced a new possibility. Suppose we act on f \colon (x,x,y) \to z by the transposition \sigma = (1,2) \in S_3 that switches the first two entries. We get another transition (x,x,y)\to z with the same domain and codomain as f; so it might equal f, or it might not! In other words, transitions in a Σ-net can have isotropy. If \sigma(f)=f, then when we generate a free symmetric monoidal category from our Σ-net, the corresponding morphism f:x\otimes x \otimes y \to z will have the property that when we compose it with the symmetry morphism x\otimes x\otimes y \cong x\otimes x\otimes y we get back f again. No symmetric monoidal category generated by a pre-net has this property; it’s more like the behavior of the commutative monoidal category generated by a Petri net, except that in the latter case the symmetry x\otimes x\otimes y \cong x\otimes x\otimes y itself is the identity, rather than just acting by the identity on f.

This suggests that Σ-nets can either “behave like pre-nets” or “behave like Petri nets”. This is made precise by the bottom row of adjunctions in the diagram. On one hand, we can map a pre-net to a Σ-net by freely generating the action of all permutations. This has a right adjoint that just forgets the permutation action (which actually has a further right adjoint, although that’s a bit weird). On the other hand, we can map a Petri net to a Σ-net by making all the permutations act as trivially as possible; this has a left adjoint that identifies each transition with all its permutation images. And these adjunctions commute with the three “free monoidal category” adjunctions in reasonable ways (see the paper for details).

The right adjoint mapping Petri nets into Σ-nets is fully faithful, so we really can say that Σ-nets “include” Petri nets. The left adjoint mapping pre-nets to Σ-nets is not fully faithful—it can’t possibly be, since the whole point of introducing Σ-nets was that pre-nets don’t have enough morphisms! But the full image of this functor is equivalent to a fourth kind of net: Kock’s whole-grain Petri nets. Kock’s approach to solving the problem of pre-nets is somewhat different, more analogous to the notion of “fat” symmetric monoidal category: he takes the domain and codomain of each transition to be a family of places indexed by a finite set. But his category turns out to be equivalent to the category of Σ-nets that are freely generated by some pre-net. (Kock actually proved this himself, as well as sketching the adjunction between Σ-nets and symmetric monoidal categories. He called Σ-nets “digraphical species”.)

So Σ-nets “include” both Petri nets and pre-nets, in an appropriate sense. The pre-nets (or, more precisely, whole-grain nets) are the Σ-nets with free permutation actions (trivial isotropy), while the Petri nets are the Σ-nets with trivial permutation actions (maximal isotropy). In Petri-net-ese, these correspond to the “individual token philosophy” and the “collective token philosophy”, respectively. (This makes it tempting to refer to the functors from Σ-nets to pre-nets and Petri nets as individuation and collectivization respectively.) But Σ-nets also allow us to mix and match the two philosophies, having some transitions with trivial isotropy, others with maximal isotropy, and still others with intermediate isotropy.

I like to think of Σ-nets as a Petri net analogue of orbifolds. Commutative-monoid-based Petri nets are like “coarse moduli spaces”, where we’ve quotiented by all symmetries but destroyed all the isotropy information; while whole-grain Petri nets are like manifolds, where we have no singularities but can only quotient by free actions. Pre-nets can then be thought of as a “presentation” of a manifold, such as by a particular way of gluing coordinate patches together: useful in concrete examples, but not the “invariant” object we really want to study mathematically.

John Baez: Categories of Nets (Part 1)

I’ve been thinking about Petri nets a lot. Around 2010, I got excited about using them to describe chemical reactions, population dynamics and more, using ideas taken from quantum physics. Then I started working with my student Blake Pollard on ‘open’ Petri nets, which you can glue together to form larger Petri nets. Blake and I focused on their applications to chemistry, but later my student Jade Master and I applied them to computer science and brought in some new math. I was delighted when Evan Patterson and Micah Halter used all this math, along with ideas of Joachim Kock, to develop software for rapidly assembling models of COVID-19.

Now I’m happy to announce that Jade and I have teamed up with Fabrizio Genovese and Mike Shulman to straighten out a lot of mysteries concerning Petri nets and their variants:

• John Baez, Fabrizio Genovese, Jade Master and Mike Shulman, Categories of nets.

This paper is full of interesting ideas, but I’ll just tell you the basic framework.

A Petri net is a seemingly simple thing:

It consists of places (drawn as circles) and transitions (drawn as boxes), with directed edges called arcs from places to transitions and from transitions to places.

The idea is that when you use a Petri net, you put dots called tokens in the places, and then move them around using the transitions:

A Petri net is actually a way of describing a monoidal category. A way of putting a bunch of tokens in the places gives an object of this category, and a way of moving them around repeatedly (as above) gives a morphism.
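Here is a minimal sketch, in Python (my own illustration, nothing from the paper), of that token game, treating a marking and a transition’s inputs and outputs as multisets of places (the collective-token style):

from collections import Counter

def fire(marking, transition):
    """Fire a transition, given as an (inputs, outputs) pair of multisets (Counters)."""
    inputs, outputs = transition
    if (marking & inputs) != inputs:     # & takes minimums, so this checks the tokens are there
        raise ValueError("transition not enabled")
    return marking - inputs + outputs

t = (Counter({"a": 2}), Counter({"b": 1}))   # consume two tokens in place 'a', produce one in 'b'
print(fire(Counter({"a": 3}), t))            # Counter({'a': 1, 'b': 1})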

The idea sounds straightforward enough. But it conceals some subtleties, which researchers have been struggling with for at least 30 years.

There are various ways to make the definition of Petri net precise. For example: is there a finite set of arcs from a given place to a given transition (and the other way around), or merely a natural number of arcs? If there is a finite set, is this set equipped with an ordering or not? Furthermore, what is a morphism between Petri nets?

Different answers are good for different purposes. In the so-called ‘individual token philosophy’, we allow a finite set of tokens in each place. In the ‘collective token philosophy’, we merely allow a natural number of tokens in each place. It’s like the difference between having 4 individual workers named John, Fabrizio, Jade and Mike where you can tell who did what, and having 4 anonymous workers: nameless drones.

Our goal was to sort this all out and make it crystal clear. We focus on 3 kinds of net, each of which naturally generates its own kind of monoidal category:

• pre-nets, which generate free strict monoidal categories.

• Σ-nets, which generate free symmetric strict monoidal categories.

• Petri nets, which generate free commutative monoidal categories.

These three kinds of monoidal category differ in how ‘commutative’ they are:

• In a strict monoidal category we typically have x \otimes y \ne y \otimes x.

• In a strict symmetric monoidal category we have for each pair of objects a chosen isomorphism x \otimes y \cong y \otimes x.

• A commutative monoidal category is a symmetric strict monoidal category where the symmetry isomorphisms are all identities, so x \otimes y = y \otimes x.

So, we have a spectrum running from hardcore individualism, where two different things of the same type are never interchangeable… to hardcore collectivism, where two different things of the same type are so interchangeable that switching them counts as doing nothing at all! In the theory of Petri nets and their variants, the two extremes have been studied better than the middle.

You can summarize the story with this diagram:

There are three different categories of nets at bottom, and three different categories of monoidal categories on top — all related by adjoint functors! Here the left adjoints point up the page — since different kinds of nets freely generate different kinds of monoidal categories — and also to the right, in the direction of increasing ‘commutativity’.

If you’re a category theorist you’ll recognize at least two of the three categories on top:

• \mathsf{StrMC}, with strict monoidal categories as objects and strict monoidal functors as morphisms.

• \mathsf{SSMC}, with symmetric strict monoidal categories as objects and strict symmetric monoidal functors as their morphisms.

• \mathsf{CMC}, with commutative monoidal categories as objects and strict symmetric monoidal functors as morphisms. A commutative monoidal category is a symmetric strict monoidal category where the symmetry is the identity.

The categories of nets are probably less familiar. But they are simple enough. Here I’ll just describe their objects. The morphisms are fairly obvious, but read our paper for details.

• \mathsf{PreNet}, with pre-nets as objects. A pre-net consists of a set S of places, a set T of transitions, and a function T \to S^\ast\times S^\ast, where S^\ast is the set of lists of elements of S.

• \Sigma\mathsf{-net}, with Σ-nets as objects. A Σ-net consists of a set S, a groupoid T, and a discrete opfibration T \to P S \times P S^{\mathrm{op}}, where P S is the free symmetric strict monoidal category generated by a set of objects S and no generating morphisms.

• \mathsf{Petri}, with Petri nets as objects. A Petri net, as we use the term, consists of a set S, a set T, and a function T \to \mathbb{N}[S] \times \mathbb{N}[S], where \mathbb{N}[S] is the set of multisets of elements of S.

What does this mean in practice?

• In a pre-net, each transition has an ordered list of places as ‘inputs’ and an ordered list of places as ‘outputs’. We cannot permute the inputs or outputs of a transition.

• In a Σ-net, each transition has an ordered list of places as inputs and an ordered list of places as outputs. However, permuting the entries of these lists gives a new transition with a new list of inputs and a new list of outputs!

• In a Petri net, each transition has a multiset of places as inputs and a multiset of places as outputs. A multiset is like an ‘unordered list’: entries can appear repeatedly, but the order makes no difference at all.

So, pre-nets are rigidly individualist. Petri nets are rigidly collectivist. And Σ-nets are flexible, including both extremes as special cases!

On the one hand, we can use the left adjoint functor

\mathsf{PreNet} \to \Sigma\mathsf{-net}

to freely generate Σ-nets from pre-nets. If we do this, we get Σ-nets such that permutations of inputs and outputs act freely on transitions. Joachim Kock has recently studied Σ-nets of this sort. He calls them whole-grain Petri nets, and he treats them as forming a category in their own right, but it’s also the full image of the above functor.

On the other hand, we can use the right adjoint functor

\mathsf{Petri} \to \Sigma\mathsf{-net}

to turn Petri nets into Σ-nets. If we do this, we get Σ-nets such that permutations of inputs and outputs act trivially on transitions: the permutations have no effect at all.

I’m not going to explain how we got any of the adjunctions in this diagram:

That’s where the interesting category theory comes in. Nor will I tell you about the various alternative mathematical viewpoints on Σ-nets… nor how we draw them. I also won’t explain our work on open nets and open categories of all the various kinds. I’m hoping Mike Shulman will say some more about what we’ve done. That’s why this blog article is optimistically titled “Part 1”.

But I hope you see the main point. There are three different kinds of things like Petri nets, each of which serves to freely generate a different kind of monoidal category. They’re all interesting, and a lot of confusion can be avoided if we don’t mix them up!

David Hogg: integrated hardware–software systems

I had a wide-ranging conversation today with Rob Simcoe (MIT) about connections between my group in New York and his group in Cambridge MA. He does complex hardware. We do principled software. These two things depend on each other, or should! And yet few instruments are designed with software fully in mind, in the sense of making good, non-trivial trades between hardware costs and software costs. And also few software systems are built with deep knowledge of the hardware that produces the input data. So there are synergies possible here.

January 19, 2021

n-Category Café Applied Category Theory 2021 Adjoint School

Do you want to get involved in applied category theory? Are you willing to do a lot of work and learn a lot? Then this is for you:

There are four projects to choose from, with great mentors. You can see descriptions of them below!

By the way, it’s not yet clear if there will be an in-person component to this school — but if there is, it’ll happen at the University of Cambridge. ACT2021 is being organized by Jamie Vicary, who teaches in the computer science department there.

Who should apply?

Anyone, from anywhere in the world, who is interested in applying category-theoretic methods to problems outside of pure mathematics. This is emphatically not restricted to math students, but one should be comfortable working with mathematics. Knowledge of basic category-theoretic language — the definition of monoidal category for example — is encouraged.

We will consider advanced undergraduates, PhD students, post-docs, as well as people working outside of academia. Members of groups which are underrepresented in the mathematics and computer science communities are especially encouraged to apply.

School overview

Participants are divided into four-person project teams. Each project is guided by a mentor and a TA. The Adjoint School has two main components: an Online Seminar that meets regularly between February and June, and an in-person Research Week in Cambridge, UK on July 5–9.

During the online seminar, we will read, discuss, and respond to papers chosen by the project mentors. Every other week, a pair of participants will present a paper which will be followed by a group discussion. Leading up to this presentation, study groups will meet to digest the reading in progress, and students will submit reading responses. After the presentation, the presenters will summarize the paper into a blog post for the n-Category Café.

The in-person research week will be held the week prior to the International Conference on Applied Category Theory and in the same location. During the week, participants work intensively with their research group under the guidance of their mentor. Projects from the Adjoint School will be presented during this conference. Both components of the school aim to develop a sense of belonging and camaraderie in students so that they can fully participate in the conference, for example by attending talks and chatting with other conference goers.

Projects to choose from

Here are the four projects.

Topic: Categorical and computational aspects of C-sets

Mentors: James Fairbanks and Evan Patterson

Description: Applied category theory includes major threads of inquiry into monoidal categories and hypergraph categories for describing systems in terms of processes or networks of interacting components. Structured cospans are an important class of hypergraph categories. For example, Petri net-structured cospans are models of concurrent processes in chemistry, epidemiology, and computer science. When the structured cospans are given by C-sets (also known as co-presheaves), generic software can be implemented using the mathematics of functor categories. We will study mathematical and computational aspects of these categorical constructions, as well as applications to scientific computing.

Readings:

Topic: The ubiquity of enriched profunctor nuclei

Mentor: Simon Willerton

Description: In 1964, Isbell developed a nice universal embedding for metric spaces: the tight span. In 1966, Isbell developed a duality for presheaves. These are both closely related to enriched profunctor nuclei, but the connection wasn’t spotted for 40 years. Since then, many constructions in mathematics have been observed to be enriched profunctor nuclei too, such as the fuzzy/formal concept lattice, tropical convex hull, and the Legendre–Fenchel transform. We’ll explore the world of enriched profunctor nuclei, perhaps seeking out further useful examples.

Readings:

Topic: Double categories in applied category theory

Mentor: Simona Paoli

Description: Bicategories and double categories (and their symmetric monoidal versions) have recently featured in applied category theory: for instance, structured cospans and decorated cospans have been used to model several examples, such as electric circuits, Petri nets and chemical reaction networks.

An approach to bicategories and double categories is available in higher category theory through models that do not require a direct checking of the coherence axioms, such as the Segal-type models. We aim to revisit the structures used in applications in the light of these approaches, in the hope of facilitating the construction of new examples of interest in applications.

Readings:

and introductory chapters of:

Topic: Extensions of coalgebraic dynamic logic

Mentors: Helle Hvid Hansen and Clemens Kupke

Description: Coalgebra is a branch of category theory in which different types of state-based systems are studied in a uniform framework, parametric in an endofunctor F \colon C \to C that specifies the system type. Many of the systems that arise in computer science, including deterministic/nondeterministic/weighted/probabilistic automata, labelled transition systems, Markov chains, Kripke models and neighbourhood structures, can be modeled as F-coalgebras. Once we recognise that a class of systems are coalgebras, we obtain general coalgebraic notions of morphism, bisimulation, coinduction and observable behaviour.
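
To make the pattern concrete with the simplest textbook case (a standard illustration of my own choosing, not part of the project description): a deterministic automaton over an input alphabet A is a coalgebra for the functor F(X) = 2 × X^A, i.e. a map sending each state to an ‘accepting?’ bit together with a next-state function. A minimal sketch in Python:

    # A deterministic automaton as an F-coalgebra for F(X) = 2 x X^A:
    # each state is mapped to (accepting?, {input letter -> next state}).
    # This toy automaton accepts exactly the words with an even number of a's.
    coalgebra = {
        'even': (True,  {'a': 'odd',  'b': 'even'}),
        'odd':  (False, {'a': 'even', 'b': 'odd'}),
    }

    def behaviour(state: str, word: str) -> bool:
        # Observable behaviour of a state: run the word, read off acceptance.
        for letter in word:
            state = coalgebra[state][1][letter]
        return coalgebra[state][0]

    print(behaviour('even', 'aab'))   # True: 'aab' contains two a's

Swapping in a different functor F changes the system type (probabilistic, weighted, and so on), while morphisms, bisimulation and behaviour are defined once and for all at the level of F-coalgebras; that uniformity is what the paragraph above is pointing at.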

Modal logics are well-known formalisms for specifying properties of state-based systems, and one of the central contributions of coalgebra has been to show that modal logics for coalgebras can be developed in the general parametric setting, and many results can be proved at the abstract level of coalgebras. This area is called coalgebraic modal logic.

In this project, we will focus on coalgebraic dynamic logic, a coalgebraic framework that encompasses Propositional Dynamic Logic (PDL) and Parikh’s Game Logic. The aim is to extend coalgebraic dynamic logic to system types with probabilities. As a concrete starting point, we aim to give a coalgebraic account of stochastic game logic, and apply the coalgebraic framework to prove new expressiveness and completeness results.

Participants in this project would ideally have some prior knowledge of modal logic and PDL, as well as some familiarity with monads.

Readings:

Parts of these:

n-Category Café Categories of Nets (Part 2)

Now that John gave an overview of the Petri nets paper that he and I have just written with Jade and Fabrizio, I want to dive a bit more into what we accomplish. The genesis of this paper was a paper written by Fabrizio and several other folks entitled Computational Petri Nets: Adjunctions Considered Harmful, which of course sounds to a category theorist like a challenge. Our paper, and particularly the notion of Σ-net and the adjunction in the middle column relating Σ-nets to symmetric strict monoidal categories, is an answer to that challenge.

Suppose you wanted to “freely” generate a symmetric monoidal category from some combinatorial data. What could that data be? In other words (for a category theorist at least), what sort of category C appears in an adjunction C \rightleftarrows SMC? (By the way, all monoidal categories in this post will be strict, so I’m going to drop that adjective for conciseness.)

Perhaps the simplest choice is the same data that naturally generates a plain category, namely a directed graph. However, this is pretty limited in terms of what symmetric monoidal categories it can generate, since the generating morphisms will always only have single generating objects as their domain and codomain.

Another natural choice is the same data that naturally generates a multicategory, which might be called a “multigraph”: a set of objects together with, for every tuple of objects x_1,\dots,x_n and single object y, a set of arrows from (x_1,\dots,x_n) to y. In the generated symmetric monoidal category, such an arrow gives rise to a morphism x_1 \otimes \cdots \otimes x_n \to y; thus we can now have multiple generating objects in the domains of generating morphisms, but not the codomains.

Of course, this suggests an even better solution: a set of objects, together with a set of arrows for every pair of tuples (x_1,\dots,x_m) and (y_1,\dots,y_n). I’d be tempted to call this a “polygraph”, since it also naturally generates a polycategory. But other folks got there first and called it a “tensor scheme” and also a “pre-net”. In the latter case, the objects are called “places” and the morphisms “transitions”. But whatever we call it, it allows us to generate free symmetric monoidal categories in which the domains and codomains of generating morphisms can both be arbitrary tensor products of generating objects. For those who like fancy higher-categorical machinery, it’s the notion of computad obtained from the monad for symmetric monoidal categories.

However, pre-nets are not without flaws. One of the most glaring, for people who actually want to compute with freely generated symmetric monoidal categories, is that there aren’t enough morphisms between them. For instance, suppose one pre-net N has three places x,y,z and a transition f:(x,x,y) \to z, while a second pre-net N' has three places x',y',z' and a transition f':(x',y',x') \to z'. Once we generate a symmetric monoidal category, then f can be composed with a symmetry x\otimes y \otimes x \cong x\otimes x\otimes y and similarly for f'; so the symmetric monoidal categories generated by N and N' are isomorphic. But there isn’t even a single map of pre-nets from N to N' or vice versa, because a map of pre-nets has to preserve the ordering on the inputs and outputs. This is weird and annoying for combinatorial data that’s supposed to present a symmetric monoidal category.

Another way of making essentially the same point is that just as the adjunction between SMCs and directed graphs factors through categories, and the adjunction between SMCs and multigraphs factors through multicategories, the adjunction between SMCs and pre-nets factors through non-symmetric monoidal categories. In other words, a pre-net is really better viewed as data for generating a non-symmetric monoidal category, which we can then freely add symmetries to.

By contrast, in the objects that we call “Petri nets”, the domain and codomain of each generating morphism are elements of the free commutative monoid on the set of places — as opposed to elements of the free monoid, which is what they are for a pre-net. Thus, the domains of f and f' above would be x+x+y and x+y+x respectively, which in a commutative monoid are equal (both are 2x+y). So the corresponding Petri nets of N and N' are indeed isomorphic. However, once we squash everything down in this way, we lose the ability to functorially generate a symmetric monoidal category; all we can generate is a commutative monoidal category where all the symmetries are identities.
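
Just to spell the example out concretely (a throwaway check of my own, using Python tuples for ordered lists and Counter for multisets, with the primes on the places of N' dropped):

    from collections import Counter

    dom_f      = ('x', 'x', 'y')   # domain of f  in the pre-net N
    dom_fprime = ('x', 'y', 'x')   # domain of f' in the pre-net N' (primes dropped)

    # As ordered lists the domains differ, so no order-preserving pre-net map
    # can send f to f':
    print(dom_f == dom_fprime)                       # False

    # As multisets both domains are 2x + y, which is why the corresponding
    # Petri nets are isomorphic:
    print(Counter(dom_f) == Counter(dom_fprime))     # True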

At this point we’ve described the upper row and the left- and right-hand columns in John’s diagram:

What’s missing is a kind of net in the middle that corresponds to symmetric monoidal categories. To motivate the definition of Σ-net, consider how to solve the problem above of the “missing morphisms”. We want to send f:(x,x,y) \to z to a “permuted version” of f':(x',y',x') \to z'. For this to be implemented by an actual set-map, we need this “permuted version” to be present in the data of N' somehow. This suggests that the transitions should come with a permutation action like that of, say, a symmetric multicategory. Then inside N' we can actually act on f' by the transposition \tau = (2,3) \in S_3, yielding a new morphism \tau(f') : (x',x',y')\to z' which we can take to be the image of f. Of course, we can also act on f' by other permutations, and likewise on f; but since these permutation actions are part of the structure they must be preserved by the morphism, so sending f to \tau(f') uniquely determines where we have to send all these permutation images.

Now you can go back and look again at John’s definition of Σ-net: a set S, a groupoid T, and a discrete opfibration T \to P S \times P S^{\mathrm{op}}, where P denotes the free-symmetric-strict-monoidal-category functor Set \to Cat. Such a discrete opfibration is the same as a functor N : P S \times P S^{\mathrm{op}} \to Set, and the objects of P S are the finite sequences of elements of S while its morphisms are permutations; thus this is precisely a pre-net (the action of the functor N on objects) with permutation actions as described above. I won’t get into the details of constructing the adjunction relating Σ-nets to symmetric monoidal categories; you can read the paper, or maybe I’ll blog about it later.

However, in solving the “missing morphisms” problem, we’ve introduced a new possibility. Suppose we act on f:(x,x,y) \to z by the transposition \sigma = (1,2) \in S_3 that switches the first two entries. We get another transition (x,x,y)\to z with the same domain and codomain as f; so it might equal f, or it might not! In other words, transitions in a Σ-net can have isotropy. If \sigma(f)=f, then when we generate a free symmetric monoidal category from our Σ-net, the corresponding morphism f:x\otimes x \otimes y \to z will have the property that when we compose it with the symmetry morphism x\otimes x\otimes y \cong x\otimes x\otimes y we get back f again. No symmetric monoidal category generated by a pre-net has this property; it’s more like the behavior of the commutative monoidal category generated by a Petri net, except that in the latter case the symmetry x\otimes x\otimes y \cong x\otimes x\otimes y itself is the identity, rather than just acting by the identity on f.
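
A tiny toy computation may help here (my own sketch of the permutation action on input lists, not the paper’s formalism): \sigma fixes the domain of f, so the Σ-net data gets to decide whether \sigma(f) equals f, whereas a permutation that changes the domain always produces a distinct transition.

    from typing import Tuple

    def act(perm: Tuple[int, ...], inputs: Tuple[str, ...]) -> Tuple[str, ...]:
        # Act on an input list by permuting its positions.
        return tuple(inputs[i] for i in perm)

    dom = ('x', 'x', 'y')    # the domain of f

    sigma = (1, 0, 2)        # the transposition (1 2): swap the first two entries
    rho   = (0, 2, 1)        # the transposition (2 3): swap the last two entries

    print(act(sigma, dom))   # ('x', 'x', 'y'): the domain is unchanged, so the
                             # Sigma-net must say whether sigma(f) = f (isotropy) or
                             # sigma(f) is a distinct transition with this same domain
    print(act(rho, dom))     # ('x', 'y', 'x'): a different domain, so rho(f) lives
                             # over a different object and cannot be f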

This suggests that Σ-nets can either “behave like pre-nets” or “behave like Petri nets”. This is made precise by the bottom row of adjunctions in the diagram. On one hand, we can map a pre-net to a Σ-net by freely generating the action of all permutations. This has a right adjoint that just forgets the permutation action (which actually has a further right adjoint, although that’s a bit weird). On the other hand, we can map a Petri net to a Σ-net by making all the permutations act as trivially as possible; this has a left adjoint that identifies each transition with all its permutation images. And these adjunctions commute with the three “free monoidal category” adjunctions in reasonable ways (see the paper for details).

The right adjoint mapping Petri nets into Σ-nets is fully faithful, so we really can say that Σ-nets “include” Petri nets. The left adjoint mapping pre-nets to Σ-nets is not fully faithful — it can’t possibly be, since the whole point of introducing Σ-nets was that pre-nets don’t have enough morphisms! But the full image of this functor is equivalent to a fourth kind of net: Kock’s whole-grain Petri nets. Kock’s approach to solving the problem of pre-nets is somewhat different, more analogous to the notion of “fat” symmetric monoidal category: he takes the domain and codomain of each transition to be a family of places indexed by a finite set. But his category turns out to be equivalent to the category of Σ-nets that are freely generated by some pre-net. (Kock actually proved this himself, as well as sketching the adjunction between Σ-nets and symmetric monoidal categories. He called Σ-nets “digraphical species”.)

So Σ-nets “include” both Petri nets and pre-nets, in an appropriate sense. The pre-nets (or, more precisely, whole-grain nets) are the Σ-nets with free permutation actions (trivial isotropy), while the Petri nets are the Σ-nets with trivial permutation actions (maximal isotropy). In Petri-net-ese, these correspond to the “individual token philosophy” and the “collective token philosophy”, respectively. (This makes it tempting to refer to the functors from Σ-nets to pre-nets and Petri nets as individuation and collectivization respectively.) But Σ-nets also allow us to mix and match the two philosophies, having some transitions with trivial isotropy, others with maximal isotropy, and still others with intermediate isotropy.

I like to think of Σ-nets as a Petri net analogue of orbifolds. Commutative-monoid-based Petri nets are like “coarse moduli spaces”, where we’ve quotiented by all symmetries but destroyed all the isotropy information; while whole-grain Petri nets are like manifolds, where we have no singularities but can only quotient by free actions. Pre-nets can then be thought of as a “presentation” of a manifold, such as by a particular way of gluing coordinate patches together: useful in concrete examples, but not the “invariant” object we really want to study mathematically.

David Hogg: mathematical derivation of our clustering estimator

In a long conversation, Kate Storey-Fisher (NYU) and I worked through her new and nearly complete derivation of our continuous-function estimator for the correlation function (for large-scale structure). We constructed the estimator heuristically, and demonstrated its correctness somewhat indirectly, so we didn't have a good mathematical derivation per se in the paper. Now we do!

David Hogg: stream finding

Stars and Exoplanets meeting at Flatiron (well on xoom, really) was all about finding stellar streams. Matt Buckley (Rutgers) talked about repurposing, for the astrophysical domain, machine-learning methods used to find anomalies in high-energy-physics experimental data. Sarah Pearson (Flatiron) talked about building things that evolve from Hough transforms. In both cases we (the audience) argued that the projects should make catalogs of potential streams with low (non-conservative) thresholds: after all, it is better to find low-mass streams plus some junk than it is to miss them, since every stream is potentially uniquely valuable.

January 18, 2021

Doug Natelson: Brief items, new year edition

 It's been a busy time, but here are a few items for news and discussion:

  • President-Elect Biden named key members of his science team, and for the first time ever has elevated the role of Presidential Science Advisor (and head of the White House Office of Science and Technology Policy) to a cabinet-level position.  
  • The President-Elect has also written a letter to the science advisor, outlining key questions that he wants to be considered.  
  • There is talk of a "Science New Deal", unsurprisingly directed a lot toward the pandemic, climate change, and American technological competitiveness.
  • The webcomic SMBC has decided to address controversy head on, reporting "Congressman Johnson comes out against Pauli Exclusion."  This would have rather negative unintended consequences, like destabilizing all matter more complex than elementary particles....
  • This session promises to be an interesting one at the March APS meeting, as it goes right to the heart of how difficult it is to distinguish Majorana fermion signatures in superconductor/semiconductor hybrid structures from spurious electrical features.  I may try to write more about this soon.
  • This paper (arxiv version) is very striking.  Looking in the middle of a sheet of WTe2 (that is, away from where the topological edge states live), the authors see quantum oscillations of the resistance as a function of magnetic field that look a lot like Landau quantization, even though the bulk of the material is (at zero field) quite insulating.  I need to think more carefully about the claim that this argues in favor of some kind of emergent neutral fermions.
  • Being on twitter for four months has made me realize how reality-warping that medium is.  Reading about science on twitter can be incredibly wearing - it feels like everyone else out there is publishing in glossy journals, winning major prizes, and landing huge grants.  This is, of course, a selection effect, but I don't think it's healthy.
  • I do think twitter has driven blog traffic up a bit, but I actually wonder if occasionally posting blog links to /r/physics on reddit would be far more effective in terms of outreach.  When one of my posts ends up there, it gets literally 50x the page views than normal.  Still, I have an old-internet-user aversion to astroturfing.  

Tommaso Dorigo: Playing With Radioactivity

Broadly speaking, radioactivity is not something one should mess with just as a pastime. Indeed, ionizing radiation has the potential of causing carcinogenic mutations in your cells' DNA, as well as producing damage to cell tissue. So it makes me chuckle that until 50 years ago or so kids could play with it by purchasing stuff like that shown below...



If you know what you are dealing with and take the necessary precautions, however, radiation _can_ be fun to study at home. The tools and the raw materials are not found at the corner grocery, though, so you need to have a specific interest in it before you get ready to start.


n-Category Café Categories of Nets (Part 1)

I’ve been thinking about Petri nets a lot. Around 2010, I got excited about using them to describe chemical reactions, population dynamics and more, using ideas taken from quantum physics. Then I started working with my student Blake Pollard on ‘open’ Petri nets, which you can glue together to form larger Petri nets. Blake and I focused on their applications to chemistry, but later my student Jade Master and I applied them to computer science and brought in some new math. I was delighted when Evan Patterson and Micah Halter used all this math, along with ideas of Joachim Kock, to develop software for rapidly assembling models of COVID-19.

Now I’m happy to announce that Jade and I have teamed up with Fabrizio Genovese and Mike Shulman to straighten out a lot of mysteries concerning Petri nets and their variants:

This paper is full of interesting ideas, but I’ll just tell you the basic framework.

A Petri net is a seemingly simple thing:

It consists of places (drawn as circles) and transitions (drawn as boxes), with directed edges called arcs from places to transitions and from transitions to places.

The idea is that when you use a Petri net, you put dots called tokens in the places, and then move them around using the transitions:

A Petri net is actually a way of describing a monoidal category. A way of putting a bunch of tokens in the places gives an object of this category, and a way of moving them around repeatedly (as above) gives a morphism.

The idea sounds straightforward enough. But it conceals some subtleties, which researchers have been struggling with for at least 30 years.

There are various ways to make the definition of Petri net precise. For example: is there a finite set of arcs from a given place to a given transition (and the other way around), or merely a natural number of arcs? If there is a finite set, is this set equipped with an ordering or not? Furthermore, what is a morphism between Petri nets?

Different answers are good for different purposes. In the so-called ‘individual token philosophy’, we allow a finite set of tokens in each place. In the ‘collective token philosophy’, we merely allow a natural number of tokens in each place. It’s like the difference between having 4 individual workers named John, Fabrizio, Jade and Mike where you can tell who did what, and having 4 anonymous workers: nameless drones.

Our goal was to sort this all out and make it crystal clear. We focus on three kinds of net, each of which naturally generates its own kind of monoidal category:

  • pre-nets, which generate free strict monoidal categories.
  • Σ-nets, which generate free symmetric strict monoidal categories.
  • Petri nets, which generate free commutative monoidal categories.

These three kinds of monoidal category differ in how ‘commutative’ they are:

  • In a strict monoidal category we typically have x \otimes y \ne y \otimes x.

  • In a strict symmetric monoidal category we have for each pair of objects a chosen isomorphism x \otimes y \cong y \otimes x.

  • A commutative monoidal category is a symmetric strict monoidal category where the symmetry isomorphisms are all identities, so x \otimes y = y \otimes x.

So, we have a spectrum running from hardcore individualism, where two different things of the same type are never interchangeable… to hardcore collectivism, where two different things of the same type are so interchangeable that switching them counts as doing nothing at all! In the theory of Petri nets and their variants, the two extremes have been studied better than the middle.

You can summarize the story with this diagram:

There are three different categories of nets at bottom, and three different categories of monoidal categories on top — all related by adjoint functors! Here the left adjoints point up the page — since different kinds of nets freely generate different kinds of monoidal categories — and also to the right, in the direction of increasing ‘commutativity’.

If you’re a category theorist you’ll recognize at least two of the three categories on top:

  • \mathsf{StrMC}, with strict monoidal categories as objects and strict monoidal functors as morphisms.

  • \mathsf{SSMC}, with symmetric strict monoidal categories as objects and strict symmetric monoidal functors as their morphisms.

  • \mathsf{CMC}, with commutative monoidal categories as objects and strict symmetric monoidal functors as morphisms. A commutative monoidal category is a symmetric strict monoidal category where the symmetry is the identity.

The categories of nets are probably less familiar. But they are simple enough. Here I’ll just describe their objects. The morphisms are fairly obvious, but read our paper for details.

  • \mathsf{PreNet}, with pre-nets as objects. A pre-net consists of a set S of places, a set T of transitions, and a function T \to S^\ast \times S^\ast, where S^\ast is the underlying set of the free monoid on S.

  • \Sigma\mathsf{-net}, with Σ-nets as objects. A Σ-net consists of a set S, a groupoid T, and a discrete opfibration T \to P S \times P S^{\mathrm{op}}, where P S is the free symmetric strict monoidal category generated by a set of objects S and no generating morphisms.

  • \mathsf{Petri}, with Petri nets as objects. A Petri net, as we use the term, consists of a set S, a set T, and a function T \to \mathbb{N}[S] \times \mathbb{N}[S], where \mathbb{N}[S] is the underlying set of the free commutative monoid on S.

What does this mean in practice?

  • In a pre-net, each transition has an ordered list of places as ‘inputs’ and an ordered list of places as ‘outputs’. We cannot permute the inputs or outputs of a transition.

  • In a Σ-net, each transition has an ordered list of places as inputs and an ordered list of places as outputs. However, permuting the entries of these lists gives a new transition with a new list of inputs and a new list of outputs!

  • In a Petri net, each transition has a multiset of places as inputs and a multiset of places as outputs. A multiset is like an ‘unordered list’: entries can appear repeatedly, but the order makes no difference at all.

So, pre-nets are rigidly individualist. Petri nets are rigidly collectivist. And Σ-nets are flexible, including both extremes as special cases!

On the one hand, we can use the left adjoint functor

\mathsf{PreNet} \to \Sigma\mathsf{-net}

to freely generate Σ-nets from pre-nets. If we do this, we get Σ-nets such that permutations of inputs and outputs act freely on transitions. Joachim Kock has recently studied Σ-nets of this sort. He calls them whole-grain Petri nets, and he treats them as forming a category in their own right, but it’s also the full image of the above functor.

On the other hand, we can use the right adjoint functor

\mathsf{Petri} \to \Sigma\mathsf{-net}

to turn Petri nets into Σ-nets. If we do this, we get Σ-nets such that permutations of inputs and outputs act trivially on transitions: the permutations have no effect at all.

I’m not going to explain how we got any of the adjunctions in this diagram:

That’s where the interesting category theory comes in. Nor will I tell you about the various alternative mathematical viewpoints on Σ-nets… nor how we draw them. I also won’t explain our work on open nets and open categories of all the various kinds. I’m hoping Mike Shulman will say some more about what we’ve done. That’s why this blog article is optimistically titled “Part 1”.

But I hope you see the main point. There are three different kinds of things like Petri nets, each of which serves to freely generate a different kind of monoidal category. They’re all interesting, and a lot of confusion can be avoided if we don’t mix them up!

Jordan Ellenberg: Dream (boxes)

I’m at my friend Debbie Wassertzug’s house; for some reason there’s a lot of old stuff of mine in her house, boxes and books and papers and miscellany, stuff I haven’t had access to for years. I have my car with me and I’ve come by to pick it up, but unfortunately, she and her family are going to Miami — they’re leaving for the airport in five minutes — that’s how much time I have to figure out which of my things to pack and which to leave at her house, possibly for good. And I can’t decide. I’m stuck. Some of my stuff is out on shelves. An old boombox, a bunch of books. And when I look at each of those things, I think, can I live without having this? I’ve been getting along without it so far. I should take one of the sealed boxes instead, there might be something in there I really want to have again. But what if what’s in the sealed boxes is worthless to me? I’m paralyzed and very aware of Debbie and her family packing up as they get ready to leave. I feel like I could make a good decision if I only had a second to really think about it. I wake up without deciding anything.

John Preskill: Random walks

A college professor of mine proposed a restaurant venture to our class. He taught statistical mechanics, the physics of many-particle systems. Examples range from airplane fuel to ice cubes to primordial soup. Such systems contain about 10^{24} particles each—so many particles that we couldn’t track them all if we tried. We can gather only a little information about the particles, so their actions look random.

So does a drunkard’s walk. Imagine a college student who (outside of the pandemic) has stayed out an hour too late and accepted one too many red plastic cups. He’s arrived halfway down a sidewalk, where he’s clutching a lamppost, en route home. Each step has a 50% chance of carrying him leftward and a 50% chance of carrying him rightward. This scenario repeats itself every Friday. On average, five minutes after arriving at the lamppost, he’s back at the lamppost. But, if we wait for a time T, we have a decent chance of finding him a distance \sqrt{T} away. These characteristics typify a simple random walk.
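
Here is a quick numerical check of that \sqrt{T} claim (my own throwaway simulation, nothing from the professor’s course): average the squared displacement of many T-step walks and it comes out close to T, so the typical distance from the lamppost grows like \sqrt{T}.

    import random

    def mean_squared_displacement(T: int, trials: int = 20000) -> float:
        # Average the squared end-to-end displacement of many T-step walks,
        # where each step is +1 or -1 with equal probability.
        total = 0
        for _ in range(trials):
            position = sum(random.choice((-1, 1)) for _ in range(T))
            total += position ** 2
        return total / trials

    for T in (10, 100, 1000):
        print(T, mean_squared_displacement(T))   # roughly 10, 100, 1000: the mean
                                                 # squared displacement is about T,
                                                 # so the typical distance is sqrt(T)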

Random walks crop up across statistical physics. For instance, consider a grain of pollen dropped onto a thin film of water. The water molecules buffet the grain, which random-walks across the film. Robert Brown observed this walk in 1827, so we call it Brownian motion. Or consider a magnet at room temperature. The magnet’s constituents don’t walk across the surface, but they orient themselves according to random-walk mathematics. And, in quantum many-particle systems, information can spread via a random walk.

So, my statistical-mechanics professor said, someone should open a restaurant near MIT. Serve lo mein and Peking duck, and call the restaurant the Random Wok.

This is the professor who, years later, confronted another alumna and me at a snack buffet.

“You know what this is?” he asked, waving a pastry in front of us. We stared for a moment, concluded that the obvious answer wouldn’t suffice, and shook our heads.

“A brownie in motion!”

Not only pollen grains undergo Brownian motion, and not only drunkards undergo random walks. Many people random-walk to their careers, trying out and discarding alternatives en route. We may think that we know our destination, but we collide with a water molecule and change course.

Such is the thrust of Random Walks, a podcast to which I contributed an interview last month. Abhigyan Ray, an undergraduate in Mumbai, created the podcast. Courses, he thought, acquaint us only with the successes in science. Stereotypes cast scientists as lone geniuses working in closed offices and silent labs. He resolved to spotlight the collaborations, the wrong turns, the lessons learned the hard way—the random walks—of science. Interviewees range from a Microsoft researcher to a Harvard computer scientist to a neurobiology professor to a genomicist.

You can find my episode on Instagram, Apple Podcasts, Google Podcasts, and Spotify. We discuss the bridging of disciplines; the usefulness of a liberal-arts education in physics; Quantum Frontiers; and the delights of poking fun at my PhD advisor, fellow blogger and Institute for Quantum Information and Matter director John Preskill.

January 16, 2021

Jordan Ellenberg: Am I supposed to say something about the invasion of the United States Capitol?

Or the reimpeachment of the President, a week before the end of his term?

I feel like I should, just because it’s history, and I might wonder how it seemed in real time. It is hard to understand what actually happened on January 6, even though we live in a world where everything is logged in real-time video. We still don’t know who left pipe bombs outside the offices of the Republican and Democratic National Committees. We don’t know what parts of the invasion were spontaneous and what parts were planned, and by whom. Some people are saying members of the House of Representatives collaborated with the invaders, giving them a guided tour of the building the day before the attack. Some people are saying some of the Capitol Police force collaborated, while others fought off the mob.

We don’t know what to expect next. There is said to be “chatter” about armed, angry people at all 50 statehouses. I don’t know how seriously to take that, but I won’t be going downtown this weekend. Moving trucks have been sighted at the White House and some people say the President has given up pretending he won re-election; but then again he is also said to have met with one of his favorite CEOs today to talk legal strategies for keeping up the show.

As I said last week, it is temperamentally hard for me to expect the worst. Probably Trump will slink away and the inauguration will happen without incident and the idea of renewed armed rebellion against the United States government will slink away too, albeit more slowly. But — as last week — I don’t have a good argument that it has to be that way.

What I find really chilling is this. Imagine it had been much worse and some number of Democratic senators, known for opposing Trump, had been kidnapped or killed. Mitch McConnell would have somberly denounced the crimes. But he would also have allowed Republican governors to appoint those senators’ replacements, reclaimed his role as majority leader, and done everything he could to prevent the new government from governing, saying: what happened on January 6 was terrible, to be deplored and mourned, but we have to move on.

January 15, 2021

Matt von Hippel: Physics Acculturation

We all agree physics is awesome, right?

Me, I chose physics as a career, so I’d better like it. And you, right now you’re reading a physics blog for fun, so you probably like physics too.

Ok, so we agree, physics is awesome. But it isn’t always awesome.

Read a blog like this, or the news, and you’ll hear about the more awesome parts of physics: the black holes and big bangs, quantum mysteries and elegant mathematics. As freshman physics majors learn every year, most of physics isn’t like that. It’s careful calculation and repetitive coding, incremental improvements to a piece of a piece of a piece of something that might eventually answer a Big Question. Even if intellectually you can see the line from what you’re doing to the big flashy stuff, emotionally the two won’t feel connected, and you might struggle to feel motivated.

Physics solves this through acculturation. Physicists don’t just work on their own, they’re part of a shared worldwide culture of physicists. They spend time with other physicists, and not just working time but social time: they eat lunch together, drink coffee together, travel to conferences together. Spending that time together gives physics more emotional weight: as humans, we care a bit about Big Questions, but we care a lot more about our community.

This isn’t unique to physics, of course, or even to academics. Programmers who have lunch together, philanthropists who pat each other on the back for their donations, these people are trying to harness the same forces. By building a culture around something, you can get people more motivated to do it.

There’s a risk here, of course, that the culture takes over, and we lose track of the real reasons to do science. It’s easy to care about something because your friends care about it because their friends care about it, looping around until it loses contact with reality. In science we try to keep ourselves grounded, to respect those who puncture our bubbles with a good argument or a clever experiment. But we don’t always succeed.

The pandemic has made acculturation more difficult. As a scientist working from home, that extra bit of social motivation is much harder to get. It’s perhaps even harder for new students, who haven’t had the chance to hang out and make friends with other researchers. People’s behavior, what they research and how and when, has changed, and I suspect changing social ties are a big part of it.

In the long run, I don’t think we can do without the culture of physics. We can’t be lone geniuses motivated only by our curiosity, that’s just not how people work. We have to meld the two, mix the social with the intellectual…and hope that when we do, we keep the engines of discovery moving.

January 14, 2021

Scott Aaronson: To all Trumpists who comment on this blog

The violent insurrection now unfolding in Washington DC is precisely the thing you called me nuts, accused me of “Trump Derangement Syndrome,” for warning about since 2016. Crazy me, huh, always seeing brownshirts around the corner? And you called the other side violent anarchists? This is all your doing. So own it. Wallow in it. May you live the rest of your lives in shame.

Update (Jan. 7): As someone who hasn’t always agreed with BLM’s slogans and tactics, I viewed the stunning passivity of the police yesterday against white insurrectionists in the Capitol as one of the strongest arguments imaginable for BLM’s main contentions.

January 12, 2021

John Baez: This Week’s Finds (1–50)

Take a copy of this!

This Week’s Finds in Mathematical Physics (1-50), 242 pages.

These are the first 50 issues of This Week’s Finds of Mathematical Physics. This series has sometimes been called the world’s first blog, though it was originally posted on a “usenet newsgroup” called sci.physics.research — a form of communication that predated the world-wide web. I began writing this series as a way to talk about papers I was reading and writing, and in the first 50 issues I stuck closely to this format. These issues focus rather tightly on quantum gravity, topological quantum field theory, knot theory, and applications of n-categories to these subjects. There are, however, digressions into elliptic curves, Lie algebras, linear logic and various other topics.

Tim Hosgood kindly typeset all 300 issues of This Week’s Finds in 2020. They will be released in six installments of 50 issues each, for a total of about 2610 pages. I have edited the issues here to make the style a bit more uniform and also to change some references to preprints, technical reports, etc. into more useful arXiv links. This accounts for some anachronisms where I discuss a paper that only appeared on the arXiv later.

The process of editing could have gone on much longer; there are undoubtedly many mistakes remaining. If you find some, please contact me and I will try to fix them.

By the way, sci.physics.research is still alive and well, and you can use it on Google. But I can’t find the first issue of This Week’s Finds there — if you can find it, I’ll be grateful. I can only get back to the sixth issue. Take a look if you’re curious about usenet newsgroups! They were low-tech compared to what we have now, but they felt futuristic at the time, and we had some good conversations.


Clifford Johnson: Reunion

Revisiting an old friend you might recognize. (And discovering that my old inking/shading workflow was just fine. – I’d been experimenting with other approaches and also just getting back into the saddle, as it were. I’ve found that I’d already landed on this approach for good time-cost/benefit reasons.) -cvj

The post Reunion appeared first on Asymptotia.

January 10, 2021

Mark Goodsell: Recasting a spell

For three successive Januaries now, since I started this blog in 2018, I have posted a list of the things to look forward to, which for whatever reason didn't materialise and so were essentially repeated the next year. Given the state of the world right now some positive thinking seems to be needed more than ever, but it would be a bit of a joke to repeat the same mistake again. In particular, the measurement of the muon anomalous magnetic moment (which is apparently all I blog about) has still not been announced, and I'm led to wonder whether last year's controversies regarding the lattice QCD calculations have played a role in this, muddying the water.

Instead today I want to write a little about an effort that I have joined in the last couple of years, and really started to take seriously last year: recasting LHC searches. The LHC has gathered a huge amount of data and both main experiments (CMS and ATLAS) have published O(1000) papers. Many of these are studying Standard Model (SM) processes, but there are a large number with dedicated searches for New Physics models. Some of them contain deviations from the predictions of the Standard Model, although at present there is no clear and sufficiently significant deviation yet -- with the obvious exception of LHCb and the B-meson anomalies. Instead, we can use the data to constrain potential new theories.

The problem is that we can't expect the experiments to cover even a significant fraction of the cases of interest to us. For an important example, the simplest supersymmetric models have four 'neutralinos' (neutral fermions), two 'charginos' (charged fermions), plus scalar squarks and sleptons -- and two heavy neutral Higgs particles plus a charged Higgs; it is clearly impossible to list in a few papers limits on every possible combination of masses and couplings for these. So the experiments do their best: they take either the best-motivated or easiest to search for cases and try to give results that are as general as possible. But even then, supersymmetric models are just one example and it is possible that a given search channel (e.g. looking for pair production of heavy particles that then decay to jets plus some invisible particles as missing energy) could apply to many models, and it is impossible in a given paper to provide all possible interpretations.

This is where recasting comes in. The idea is to write a code that can simulate the response of the relevant LHC detector and the cuts used by the analysis and described in the paper. Then any theorist can simulate signal events for their model (using now well-established tools) and analyse them with this code, hopefully providing a reasonably accurate approximation of what the experiment would have seen. They can then determine whether the model (or really the particular choice of masses and coupling for that model) is allowed or ruled out by the particular search, without having to ask the experiments to do a dedicated analysis.
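
To give a rough flavour of what such a recast looks like in code (a deliberately toy sketch of my own, with invented event fields, cut values and numbers -- not the interface of MadAnalysis, CheckMATE, Rivet or any real analysis):

    import random

    def smear(pt: float, resolution: float = 0.1) -> float:
        # Crude stand-in for the detector response: Gaussian smearing of a momentum.
        return random.gauss(pt, resolution * pt)

    def passes_cuts(event: dict) -> bool:
        # Toy selection: at least two smeared jets above 30 GeV plus a missing-energy cut.
        jets = [smear(pt) for pt in event['jet_pt']]
        hard_jets = [pt for pt in jets if pt > 30.0]
        return event['missing_et'] > 200.0 and len(hard_jets) >= 2

    def expected_signal(events: list, luminosity_pb: float, cross_section_pb: float) -> float:
        # Selection efficiency times cross section times integrated luminosity
        # gives the number of signal events this search would have seen.
        efficiency = sum(passes_cuts(e) for e in events) / len(events)
        return efficiency * cross_section_pb * luminosity_pb

    # One would then compare expected_signal(...) with the experiment's published
    # upper limit to decide whether this model point is excluded.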

Ideally, recasting would be possible for every analysis -- not just searches for particular models, but also Standard Model searches (one example which I have been involved in is recasting the search for four top-quarks which was designed to observe the Standard Model process and measure its cross-section, but we could then use this to constrain new heavy particles that decay to pairs of tops and are produced in pairs). However this is a lot of work, because theorists do not have access to the simulation software used for the experiments' own projections (and it would probably be too computationally intensive to be really useful anyway) so the experiments cannot just hand over their simulation code. Instead there is then a lot of work to make approximations, which is why there is really some physics involved and it is not just a mechanical exercise. Sometimes the experiments provide pseudocode which include an implementation of the cuts made in the analysis, which helps understanding the paper (where there is sometimes some ambiguity) and often they provide supplementary material, but in general getting a good recast is a lot of work.

In recent years there has been a lot of effort by both experimentalists and theorists to make recasting easier and to meet in the middle. There are lots of workshops and common meetings, and lots of initiatives such as making raw data open access. On the theory side, while there are many people who write their own bespoke codes to study some model with some search, there are now several frameworks for grouping together recasts of many different analyses. The ones with which I am most familiar are CheckMATE, Rivet, ColliderBIT and MadAnalysis. These can all be used to check a given model against a (non-identical) list of analyses, but work in somewhat different ways -- at their core though they all involve simulating signals and detectors. There is therefore a little friendly competition and very useful cross-checking. Then there are other useful tools, in particular SModelS, which instead cleverly compares whatever model you give it to a list of simplified model results given by the LHC experiments, and uses those to come up with a much faster limit on the model (of course this loses in generality and can fall prey to the assumptions and whims of the particular simplified models results that are available, but is nonetheless very useful).

So the reason for the post today is the paper I was just involved in. It is a proceedings of a workshop where a group of people got together to recast a bunch of the latest LHC searches in the MadAnalysis framework. I didn't take part in the workshop, but did recast a search which was useful for a paper last year (if you are interested, it involves the signal in the inset picture), so there are now 12 new reinterpretations of some of the latest LHC searches. This should be useful for anyone comparing their models to data. You can see by the number of authors involved how labour-intensive this is -- or check out the author list on last year's white paper; there are still many more searches from Run 2 of the LHC that are yet to be recast, so we still have our work cut out for some time to come before there is any new data!

If you are interested in the latest developments, there will be a forum next month.

Jordan Ellenberg: Yemenite Step chicken

A short post to remind myself of a recipe. Years ago I had a very memorable plate of chicken at a restaurant in Jerusalem called the Yemenite Step. I called it “honey rosemary chicken” because those were the dominant seasonings. Thinking about it recently, I googled and found that while the restaurant no longer exists, people remember the chicken. I even found a recipe. I could link to it, but basically the recipe is “fry pieces of chicken in a pan with some olive oil and just keep pouring more honey and stripping more rosemary sprigs into it until it tastes like Yemenite Step chicken,” — literally there are no other seasonings. (I put in a little salt, it just seemed wrong not to.) Anyway, this is just to record that I did this (with some boneless chicken breast from Conscious Carnivore — I assume this would work with bone-in thighs too but might require slightly more technique.) The chicken was good, I threw some leftover rice from the fridge into the pan after the chicken was done and cooked it in the honey/chicken liquid and that was good, everybody was happy, it was extremely easy. Whether it’s actually Yemenite I have no idea.

January 09, 2021

Doug Natelson: Questions that show who you are as a physicist

There are some cool new physics and nanoscience results out there, but after a frankly absurd week (in which lunatics stormed the US Capitol, the US reached 4000 covid deaths per day, and everything else), we need something light.  Stephen Colbert has started a new segment on his show called "The Colbert Questionert" (see an example here with Tom Hanks - sorry if that's region-coded to the US), in which he asks a list of fifteen questions that (jokingly) reveal the answerer's core as a human being.   These range from "What is your favorite sandwich?" to "What do you think happens when you die?".  Listening to this, I think we need some discipline-specific questions for physicists.  Here are some initial thoughts, and I'd be happy to get more suggestions in the comments.  

  • Food that you eat when traveling to a conference or talk but not at home?
  • Science fiction - yes or no?
  • What is your second-favorite subdiscipline of physics/astronomy/science?
  • Favorite graph: linear-linear? Log-log? Log-linear? Double-log? Polar? Weird uninterpretable 3D representation that would make Edward Tufte's head explode?
  • Lagrangian or Hamiltonian?
  • Bayesian or frequentist?
  • Preferred interpretation of quantum mechanics/solution to the measurement problem?

January 08, 2021

Scott Aaronson: Distribute the vaccines NOW!

My last post about covid vaccines felt like shouting uselessly into the void … at least until Patrick Collison, the cofounder of Stripe and a wonderful friend, massively signal-boosted the post by tweeting it. This business is of such life-and-death urgency right now, and a shift in attitude or a hardening of resolve by just a few people reading could have such an outsized effect, that with apologies to anyone wanting me to return to my math/CS/physics lane, I feel like a second post on the same topic is called for.

Here’s my main point for today (as you might have noticed, I’ve changed the tagline of this entire blog accordingly):

Reasonable people can disagree about whether vaccination could have, or should have, started much earlier. But now that we in the US have painstakingly approved two vaccines, we should all agree about the urgent need to get millions of doses into people’s arms before they spoil! Sure, better the elderly than the young, better essential than inessential workers—but much more importantly, better today than tomorrow, and better anyone than no one!

Israel, which didn’t do especially well in earlier stages of the pandemic, is now putting the rest of the planet to shame with vaccinations. What Dana and I hear from our friends and relatives there confirms what you can read here, here, and elsewhere. Rabin Square in Tel Aviv is now a huge vaccination field site. Vaccinations are now proceeding 24/7, even on Shabbat—something the ultra-Orthodox rabbis are grudgingly tolerating under the doctrine of “pikuach nefesh” (i.e., saving a life overrides almost every other religious obligation). Israelis are receiving texts at all hours telling them when it’s their turn and where to go. Apparently, after the nurses are finished with everyone who had appointments, rather than waste whatever already-thawed supply is left, they simply go into the street and offer the extra doses to anyone passing by.

Contrast that with the historic fiasco—yes, another historic fiasco—now unfolding in the US. The Trump administration had pledged to administer 20 million vaccines (well, Trump originally said 100 million) by the end of 2020. Instead, fewer than three million were administered, with the already-glacial pace slowing even further over the holidays. Unbelievably, millions of doses are on track to spoil this month, before they can be administered. The bottleneck is now not manufacturing, it’s not supply, it’s just pure bureaucratic dysfunction and chaos, lack of funding and staff, and a stone-faced unwillingness by governors to deviate from harebrained “plans” and “guidelines” even with their populations’ survival at stake.

Famously, the CDC urged that essential workers get vaccinated before the elderly, since even though their own modeling predicted that many more people from all ethnic groups would die that way, at least the deaths would be more equitably distributed. While there are some good arguments to prioritize essential workers, an outcry then led to the CDC partially backtracking, and to many states just making up their own guidelines. But we’re now, for real, headed for a scenario where none of these moral-philosophy debates turn out to matter, since the vaccines will simply spoil in freezers (!!!) while the medical system struggles to comply with the Byzantine rules about who gets them first.

While I’d obviously never advocate such a thing, one wonders whether there’s an idealistic medical worker, somewhere in the US, who’s willing to risk jail for vaccinating people without approval, using supply that would otherwise be wasted. If anything could galvanize this sad and declining nation to move faster, maybe it’s that.


In my last post, I invited people to explain to me where I went wrong in my naïve, simplistic, doofus belief that, were our civilization still capable of “WWII” levels of competence, flexibility, and calculated risk-tolerance, most of the world could have already been vaccinated by now. In the rest of this post, I’d like to list the eight most important counterarguments to that position that commenters offered (at least, those that I hadn’t already anticipated in the post itself), together with my brief responses to them.

  1. Faster approval wouldn’t have helped, since the limiting factor was just the time needed to ramp up the supply. As the first part of this post discussed, ironically supply is not now the limiting factor, and approval even a month or two earlier could’ve provided precious time to iron out the massive problems in distribution. More broadly, though, what’s becoming obvious is that we needed faster everything: testing, approval, manufacturing, and distribution.
  2. The real risk, with vaccines, is long-term side effects, ones that might manifest only after years. What I don’t get is, if people genuinely believe this, then why are they OK with having approved the vaccines last month? Why shouldn’t we have waited until 2024, or maybe 2040? By that point, those of us who were still alive could take the covid vaccine with real confidence, at least that the dreaded side effects would be unlikely to manifest before 2060.
  3. Much like with Amdahl’s Law, there are limits to how much more money could’ve sped up vaccine manufacturing. My problem is that, while this is undoubtedly true, I see no indication that we were anywhere close to those limits—or indeed, that the paltry ~$9 billion the US spent on covid vaccines was the output of any rational cost/benefit calculation. It’s like: suppose an enemy army had invaded the US mainland, slaughtered 330,000 people, and shut down much of the economy. Can you imagine Congress responding by giving the Pentagon a 1.3% budget increase to fight back, reasoning that any more would run up against Amdahl’s Law? That’s how much $9 billion is.
  4. The old, inactivated-virus vaccines often took years to develop, so spending years to test them as well made a lot more sense. This is undoubtedly true, but is not a counterargument. It’s time to rethink the whole vaccine approval process for the era of programmable mRNA, which is also the era of pandemics that can spread around the world in months.
  5. Human challenge trials wouldn’t have provided much information, because you can’t do challenge trials with old or sick people, and because covid spread so widely that normal Phase III trials were perfectly informative. Actually, 1DaySooner had plenty of elderly volunteers and volunteers with preexisting conditions. It bothers me how the impossibility of using those volunteers is treated like a law of physics, rather than what it is: another non-obvious moral tradeoff. Also, compared to Phase III trials, it looks like challenge trials would’ve bought us at least a couple months and maybe a half-million lives.
  6. Doctors can’t think like utilitarians—e.g., risking hundreds of lives in challenge trials in order to save millions of lives with a vaccine—because it’s a slippery slope from there to cutting up one person in order to save ten with their organs. Well, I think the informed consent of the challenge trial participants is a pretty important factor here! As is their >99% chance of survival. Look, anyone who works in public health makes utilitarian tradeoffs; the question is whether they’re good or bad ones. As someone who lost most of his extended family in the Holocaust, my rule of thumb is that, if you’re worrying every second about whether you might become Dr. Mengele, that’s a pretty good sign that you won’t become Dr. Mengele.
  7. If a hastily-approved vaccine turned out to be ineffective or dangerous, it could diminish the public’s trust in all future vaccines. Yes, of course there’s such a tradeoff, but I want you to notice the immense irony: this argument effectively says we can condemn millions to die right now, out of concern for hypothetical other millions in the future. And yet some of the people making this argument will then turn around and call me a callous utilitarian!
  8. I’m suffering from hindsight bias: it might be clear now that vaccine approval and distribution should’ve happened a lot faster, but experts had no way of knowing that in the spring. Here’s my post from May 1, entitled “Vaccine challenge trials NOW!” I was encouraged by the many others who said similar things still earlier. Was it just a lucky gamble? Had we been allowed to get vaccinated then, at least we could’ve put our bloodstreams where our mouths were, and profited from the gamble! More seriously, I sympathize with the decision-makers who’d be on the hook had an early vaccine rollout proved disastrous. But if we don’t learn a lesson from this, and ready ourselves for the next pandemic with an mRNA platform that can be customized, tested, and injected into people’s arms within at most 2-3 months, we’ll really have no excuse.

Matt von HippelWhat Tells Your Story

I watched Hamilton on Disney+ recently. With GIFs and songs from the show all over social media for the last few years, there weren’t many surprises. One thing that nonetheless struck me was the focus on historical evidence. The musical Hamilton is based on Ron Chernow’s biography of Alexander Hamilton, and it preserves a surprising amount of the historian’s care for how we know what we know, hidden within the show’s other themes. From the refrain of “who tells your story”, to the importance of Eliza burning her letters with Hamilton (not just the emotional gesture but the “gap in the narrative” it created for historians), to the song “The Room Where It Happens” (which looked from GIFsets like it was about Burr’s desire for power, but is mostly about how much of history is hidden in conversations we can only partly reconstruct), the show keeps the puzzle of reasoning from incomplete evidence front-and-center.

Any time we try to reason about the past, we are faced with these kinds of questions. They don’t just apply to history, but to the so-called historical sciences as well, sciences that study the past. Instead of asking “who” told the story, such scientists must keep in mind “what” is telling the story. For example, paleontologists reason from fossils, and thus are limited by what does and doesn’t get preserved. As a result, after a century of studying dinosaurs, only in the last twenty years did it become clear that they had feathers.

Astronomy, too, is a historical science. Whenever astronomers look out at distant stars, they are looking at the past. And just like historians and paleontologists, they are limited by what evidence happened to be preserved, and what part of that evidence they can access.

These limitations lead to mysteries, and often controversies. Before LIGO, astronomers had an idea of what the typical mass of a black hole was. After LIGO, a new slate of black holes has been observed, with much higher mass. It’s still unclear why.

Try to reason about the whole universe, and you end up asking similar questions. When we see the movement of “standard candle” stars, is that because the universe’s expansion is accelerating, or are the stars moving as a group?

Push far enough back and the evidence doesn’t just lead to controversy, but to hard limits on what we can know. No matter how good our telescopes are, we won’t see light older than the cosmic microwave background: before that background was emitted the universe was filled with plasma, which would have absorbed any earlier light, erasing anything we could learn from it. Gravitational waves may one day let us probe earlier, and make discoveries as surprising as feathered dinosaurs. But there is yet a stronger limit to how far back we can go, beyond which any evidence has been so diluted that it is indistinguishable from random noise. We can never quite see into “the room where it happened”.

It’s gratifying to see questions of historical evidence in a Broadway musical, in the same way it was gratifying to hear fractals mentioned in a Disney movie. It’s important to think about who, and what, is telling the stories we learn. Spreading that lesson helps all of us reason better.

Clifford JohnsonTaking a Breath

Felt like one of the best things to sit and do with family on the morning after the day before...

Sketch of strelitzia plant, colour all green.

-cvj


January 07, 2021

Terence TaoThe effective potential of an M-matrix

Marcel Filoche, Svitlana Mayboroda, and I have just uploaded to the arXiv our preprint “The effective potential of an {M}-matrix“. This paper explores the analogue of the effective potential of Schrödinger operators {-\Delta + V} provided by the “landscape function” {u}, when one works with a certain type of self-adjoint matrix known as an {M}-matrix instead of a Schrödinger operator.

Suppose one has an eigenfunction

\displaystyle  (-\Delta + V) \phi = E \phi

of a Schrödinger operator {-\Delta+V}, where {\Delta} is the Laplacian on {{\bf R}^d}, {V: {\bf R}^d \rightarrow {\bf R}} is a potential, and {E} is an energy. Where would one expect the eigenfunction {\phi} to be concentrated? If the potential {V} is smooth and slowly varying, the correspondence principle suggests that the eigenfunction {\phi} should be mostly concentrated in the potential energy wells {\{ x: V(x) \leq E \}}, with an exponentially decaying amount of tunnelling between the wells. One way to rigorously establish such an exponential decay is through an argument of Agmon, which we will sketch later in this post, and which gives an exponentially decaying upper bound (in an {L^2} sense) on eigenfunctions {\phi} in terms of the distance to the wells {\{ V \leq E \}}, measured with respect to a certain “Agmon metric” on {{\bf R}^d} determined by the potential {V} and energy level {E} (or any upper bound {\overline{E}} on this energy). Similar exponential decay results can also be obtained for discrete Schrödinger matrix models, in which the domain {{\bf R}^d} is replaced with a discrete set such as the lattice {{\bf Z}^d}, and the Laplacian {\Delta} is replaced by a discrete analogue such as a graph Laplacian.

When the potential {V} is very “rough”, as occurs for instance in the random potentials arising in the theory of Anderson localisation, the Agmon bounds, while still true, become very weak because the wells {\{ V \leq E \}} are dispersed in a fairly dense fashion throughout the domain {{\bf R}^d}, and the eigenfunction can tunnel relatively easily between different wells. However, as was first discovered in 2012 by my two coauthors, in these situations one can replace the rough potential {V} by a smoother effective potential {1/u}, with the eigenfunctions typically localised to a single connected component of the effective wells {\{ 1/u \leq E \}}. In fact, a good choice of effective potential comes from locating the landscape function {u}, which is the solution to the equation {(-\Delta + V) u = 1} with reasonable behavior at infinity, and which is non-negative from the maximum principle, and then the reciprocal {1/u} of this landscape function serves as an effective potential.

There are now several explanations for why this particular choice {1/u} is a good effective potential. Perhaps the simplest (as found for instance in this recent paper of Arnold, David, Jerison, and my two coauthors) is the following observation: if {\phi} is an eigenvector for {-\Delta+V} with energy {E}, then {\phi/u} is an eigenvector for {-\frac{1}{u^2} \mathrm{div}(u^2 \nabla \cdot) + \frac{1}{u}} with the same energy {E}, thus the original Schrödinger operator {-\Delta+V} is conjugate to a (variable coefficient, but still in divergence form) Schrödinger operator with potential {1/u} instead of {V}. Closely related to this, we have the integration by parts identity

\displaystyle  \int_{{\bf R}^d} |\nabla f|^2 + V |f|^2\ dx = \int_{{\bf R}^d} u^2 |\nabla(f/u)|^2 + \frac{1}{u} |f|^2\ dx \ \ \ \ \ (1)

for any reasonable function {f}, thus again highlighting the emergence of the effective potential {1/u}.

These particular explanations seem rather specific to the Schrödinger equation (continuous or discrete); we have for instance not been able to find similar identities to explain an effective potential for the bi-Schrödinger operator {\Delta^2 + V}.

In this paper, we demonstrate the (perhaps surprising) fact that effective potentials continue to exist for operators that bear very little resemblance to Schrödinger operators. Our chosen model is that of an {M}-matrix: self-adjoint positive definite matrices {A} whose off-diagonal entries are negative. This model includes discrete Schrödinger operators (with non-negative potentials) but can allow for significantly more non-local interactions. The analogue of the landscape function would then be the vector {u := A^{-1} 1}, where {1} denotes the vector with all entries {1}. Our main result, roughly speaking, asserts that an eigenvector {A \phi = E \phi} of {A} will then be exponentially localised to the “potential wells” {K := \{ j: \frac{1}{u_j} \leq E \}}, where {u_j} denotes the coordinates of the landscape function {u}. In particular, we establish the inequality

\displaystyle  \sum_k \phi_k^2 e^{2 \rho(k,K) / \sqrt{W}} ( \frac{1}{u_k} - E )_+ \leq W \max_{i,j} |a_{ij}|

if {\phi} is normalised in {\ell^2}, where the connectivity {W} is the maximum number of non-zero entries of {A} in any row or column, {a_{ij}} are the coefficients of {A}, and {\rho} is a certain moderately complicated but explicit metric function on the spatial domain. Informally, this inequality asserts that the eigenfunction {\phi_k} should decay like {e^{-\rho(k,K) / \sqrt{W}}} or faster. Indeed, our numerics show a very strong log-linear relationship between {\phi_k} and {\rho(k,K)}, although it appears that our exponent {1/\sqrt{W}} is not quite optimal. We also provide an associated localisation result which is technical to state but very roughly asserts that a given eigenvector will in fact be localised to a single connected component of {K} unless there is a resonance between two wells (by which we mean that an eigenvalue for a localisation of {A} associated to one well is extremely close to an eigenvalue for a localisation of {A} associated to another well); such localisation is also strongly supported by numerics. (Analogous results for Schrödinger operators had been previously obtained in the aforementioned paper of Arnold, David, Jerison, and my two coauthors, and extended to quantum graphs in a very recent paper of Harrell and Maltsev.)
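To get a concrete feel for the landscape function in this setting, here is a minimal numerical sketch (my own illustration, not code from the paper): it builds a small {M}-matrix in the form of a one-dimensional discrete Schrödinger operator with a rough random potential, computes the landscape vector {u = A^{-1} 1}, and checks whether the ground state peaks inside the effective wells {\{ j: 1/u_j \leq E \}}. The size, the potential strength, and the random seed are arbitrary choices.

# Minimal sketch (my illustration, not from the paper): landscape function of an M-matrix.
import numpy as np

rng = np.random.default_rng(0)
N = 200
V = rng.uniform(0.0, 4.0, size=N)                # rough random potential

# 1D discrete Schrodinger operator -Delta + V with Dirichlet boundaries:
# positive diagonal, non-positive off-diagonal entries, positive definite,
# hence an M-matrix.
A = np.diag(2.0 + V) - np.eye(N, k=1) - np.eye(N, k=-1)

u = np.linalg.solve(A, np.ones(N))               # landscape function u = A^{-1} 1
energies, vectors = np.linalg.eigh(A)
E, phi = energies[0], vectors[:, 0]              # ground state

wells = np.where(1.0 / u <= E)[0]                # effective wells { j : 1/u_j <= E }
peak = int(np.argmax(np.abs(phi)))
print("ground-state energy :", E)
print("effective-well sites:", wells)
print("eigenvector peak at :", peak, "| inside a well:", peak in wells)

In toy examples of this kind one typically finds that the effective potential 1/u is far smoother than the raw potential V, and that the ground state sits near a minimum of 1/u, in line with the localisation picture described above.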

Our approach is based on Agmon’s methods, which we interpret as a double commutator method, and in particular relying on exploiting the negative definiteness of certain double commutator operators. In the case of Schrödinger operators {-\Delta+V}, this negative definiteness is provided by the identity

\displaystyle  \langle [[-\Delta+V,g],g] u, u \rangle = -2\int_{{\bf R}^d} |\nabla g|^2 |u|^2\ dx \leq 0 \ \ \ \ \ (2)

for any sufficiently reasonable functions {u, g: {\bf R}^d \rightarrow {\bf R}}, where we view {g} (like {V}) as a multiplier operator. To exploit this, we use the commutator identity

\displaystyle  \langle g [\psi, -\Delta+V] u, g \psi u \rangle = \frac{1}{2} \langle [[-\Delta+V, g \psi],g\psi] u, u \rangle

\displaystyle -\frac{1}{2} \langle [[-\Delta+V, g],g] \psi u, \psi u \rangle

valid for any {g,\psi,u: {\bf R}^d \rightarrow {\bf R}} after a brief calculation. The double commutator identity then tells us that

\displaystyle  \langle g [\psi, -\Delta+V] u, g \psi u \rangle \leq \int_{{\bf R}^d} |\nabla g|^2 |\psi u|^2\ dx.

If we choose {u} to be a non-negative weight and let {\psi := \phi/u} for an eigenfunction {\phi}, then we can write

\displaystyle  [\psi, -\Delta+V] u = [\psi, -\Delta+V - E] u = \psi (-\Delta+V - E) u

and we conclude that

\displaystyle  \int_{{\bf R}^d} \frac{(-\Delta+V-E)u}{u} |g|^2 |\phi|^2\ dx \leq \int_{{\bf R}^d} |\nabla g|^2 |\phi|^2\ dx. \ \ \ \ \ (3)

We have considerable freedom in this inequality to select the functions {u,g}. If we select {u=1}, we obtain the clean inequality

\displaystyle  \int_{{\bf R}^d} (V-E) |g|^2 |\phi|^2\ dx \leq \int_{{\bf R}^d} |\nabla g|^2 |\phi|^2\ dx.

If we take {g} to be a function which equals {1} on the wells {\{ V \leq E \}} but increases exponentially away from these wells, in such a way that

\displaystyle  |\nabla g|^2 \leq \frac{1}{2} (V-E) |g|^2

outside of the wells, we can obtain the estimate

\displaystyle  \int_{V > E} (V-E) |g|^2 |\phi|^2\ dx \leq 2 \int_{V < E} (E-V) |\phi|^2\ dx,

which then gives an exponential type decay of {\phi} away from the wells. This is basically the classic exponential decay estimate of Agmon; one can basically take {g} to be the distance to the wells {\{ V \leq E \}} with respect to the Euclidean metric conformally weighted by a suitably normalised version of {V-E}. If we instead select {u} to be the landscape function {u = (-\Delta+V)^{-1} 1}, (3) then gives

\displaystyle  \int_{{\bf R}^d} (\frac{1}{u} - E) |g|^2 |\phi|^2\ dx \leq \int_{{\bf R}^d} |\nabla g|^2 |\phi|^2\ dx,

and by selecting {g} appropriately this gives an exponential decay estimate away from the effective wells {\{ \frac{1}{u} \leq E \}}, using a metric weighted by {\frac{1}{u}-E}.

It turns out that this argument extends without much difficulty to the {M}-matrix setting. The analogue of the crucial double commutator identity (2) is

\displaystyle  \langle [[A,D],D] u, u \rangle = \sum_{i \neq j} a_{ij} u_i u_j (d_{ii} - d_{jj})^2 \leq 0

for any diagonal matrix {D = \mathrm{diag}(d_{11},\dots,d_{NN})}. The remainder of the Agmon type arguments go through after making the natural modifications.

Numerically we have also found some aspects of the landscape theory to persist beyond the {M}-matrix setting, even though the double commutators cease being negative definite, so this may not yet be the end of the story, but it does at least demonstrate that the utility of the landscape does not purely rely on identities such as (1).

January 03, 2021

Clifford JohnsonNew Year’s Day Sketching

Found a bit of time to start a sketch on New Year's Day during a short family New Year's Day picnic. Did not get to finish, but it does not matter.
In progress watercolour sketch

Happy New Year!

-cvj


January 02, 2021

Scott AaronsonMy vaccine crackpottery: a confession

I hope everyone is enjoying a New Year’s as festive as the circumstances allow!

I’ve heard from a bunch of you awaiting my next post on the continuum hypothesis, and it’s a-comin’, but I confess the new, faster-spreading covid variant is giving me the same sinking feeling that Covid 1.0 gave me in late February, making it really hard to think about the eternal. (For perspectives on Covid 2.0 from individuals who acquitted themselves well with their early warnings about Covid 1.0, see for example this by Jacob Falkovich, or this by Zvi Mowshowitz.)

So on that note: do you hold any opinions, on factual matters of practical importance, that most everyone around you sharply disagrees with? Opinions that those who you respect consider ignorant, naïve, imprudent, and well outside your sphere of expertise? Opinions that, nevertheless, you simply continue to hold, because you’ve learned that, unless and until someone shows you the light, you can no more will yourself to change what you think about the matter than change your blood type?

I try to have as few such opinions as possible. Having run Shtetl-Optimized for fifteen years, I’m acutely aware of the success rate of those autodidacts who think they’ve solved P versus NP or quantum gravity or whatever. It’s basically zero out of hundreds—and why wouldn’t it be?

And yet there’s one issue where I feel myself in the unhappy epistemic situation of those amateurs, spamming the professors in all-caps. So, OK, here it is:

I think that, in a well-run civilization, the first covid vaccines would’ve been tested and approved by around March or April 2020, while mass-manufacturing simultaneously ramped up with trillions of dollars’ investment. I think almost everyone on earth could have, and should have, already been vaccinated by now. I think a faster, “WWII-style” approach would’ve saved millions of lives, prevented economic destruction, and carried negligible risks compared to its benefits. I think this will be clear to future generations, who’ll write PhD theses exploring how it was possible that we invented multiple effective covid vaccines in mere days or weeks, but then simply sat on those vaccines for a year, ticking off boxes called “Phase I,” “Phase II,” etc. while civilization hung in the balance.

I’ve said similar things, on this blog and elsewhere, since the beginning of the pandemic, but part of me kept expecting events to teach me why I was wrong. Instead events—including the staggering cost of delay, the spectacular failures of institutional authorities to adapt to the scientific realities of covid, and the long-awaited finding that all the major vaccines safely work (some better than others), just like the experts predicted back in February—all this only made me more confident of my original, stupid and naïve position.

I’m saying all this—clearly enough that no one will misunderstand—but I’m also scared to say it. I’m scared because it sounds too much like colossal ingratitude, like Monday-morning quarterbacking of one of the great heroic achievements of our era by someone who played no part in it.

Let’s be clear: the ~11 months that it took to get from sequencing the novel coronavirus, to approving and mass-manufacturing vaccines, is a world record, soundly beating the previous record of 4 years. Nobel Prizes and billions of dollars are the least that those who made it happen deserve. Eternal praise is especially due to those like Katalin Karikó, who risked their careers in the decades before covid to do the basic research on mRNA delivery that made the development of these mRNA vaccines so blindingly fast.

Furthermore, I could easily believe that there’s no one agent—neither Pfizer nor BioNTech nor Moderna, neither the CDC nor FDA nor other health or regulatory agencies, neither Bill Gates nor Moncef Slaoui—who could’ve unilaterally sped things up very much. If one of them tried, they would’ve simply been ostracized by the other parts of the system, and they probably all understood that. It might have taken a whole different civilization, with different attitudes about utility and risk.

And yet the fact remains that, historic though it was, a one-to-two-year turnaround time wasn’t nearly good enough. Especially once we factor in the faster-spreading variant, by the time we’ve vaccinated everyone, we’ll already be a large fraction of the way to herd immunity and to the vaccine losing its purpose. For all the advances in civilization, from believing in demonic spirits all the way to understanding mRNA at a machine-code level of detail, covid is running wild much like it would have back in the Middle Ages—partly, yes, because modern transportation helps it spread, but partly also because our political and regulatory and public-health tools have lagged so breathtakingly behind our knowledge of molecular biology.

What could’ve been done faster? For starters, as I said back in March, we could’ve had human challenge trials with willing volunteers, of whom there were tens of thousands. We could’ve started mass-manufacturing months earlier, with funding commensurate with the problem’s scale (think trillions, not billions). Today, we could give as many people as possible the first doses (which apparently already provide something like ~80% protection) before circling back to give the second doses (which boost the protection as high as ~95%). We could distribute the vaccines that are now sitting in warehouses, spoiling, while people in the distribution chain take off for the holidays—but that’s such low-hanging fruit that it feels unsporting even to mention it.

Let me now respond to three counterarguments that would surely come up in the comments if I didn’t address them.

  1. The Argument from Actual Risk. Every time this subject arises, someone patiently explains to me that, since a vaccine gets administered to billions of healthy people, the standards for its safety and efficacy need to be even higher than they are for ordinary medicines. Of course that’s true, and it strikes me as an excellent reason not to inject people with a completely untested vaccine! All I ask is that the people who are, or could be, harmed by a faulty vaccine, be weighed on the same moral scale as the people harmed by covid itself. As an example, we know that the Phase III clinical trials were repeatedly halted for days or weeks because of a single participant developing strange symptoms—often a participant who’d received the placebo rather than the actual vaccine! That person matters. Any future vaccine recipient who might develop similar symptoms matters. But the 10,000 people who die of covid every single day we delay, along with the hundreds of millions more impoverished, kept out of school, etc., matter equally. If we threw them all onto the same utilitarian scale, would we be making the same tradeoffs that we are now? I feel like the question answers itself.
  2. The Argument from Perceived Risk. Even with all the testing that’s been done, somewhere between 16% and 40% of Americans (depending on which poll you believe) say that they’ll refuse to get a covid vaccine, often because of anti-vaxx conspiracy theories. How much higher would the percentage be had the vaccines been rushed out in a month or two? And of course, if not enough people get vaccinated, then R0 remains above 1 and the public-health campaign is a failure. In this way of thinking, we need three phases of clinical trials the same way we need everyone to take off their shoes at airport security: it might not prevent a single terrorist, but the masses will be too scared to get on the planes if we don’t. To me, this (if true) only underscores my broader point, that the year-long delay in getting vaccines out represents a failure of our entire civilization, rather than a failure of any one agent. But also: people’s membership in the pro- or anti-vaxx camps is not static. The percentage saying they’ll get a covid vaccine seems to have already gone up, as a formerly abstract question becomes a stark choice between wallowing in delusions and getting a deadly disease, or accepting reality and not getting it. So while the Phase III trials were still underway—when the vaccines were already known to be safe, and experts thought it much more likely than not that they’d work—would it have been such a disaster to let Pfizer and Moderna sell the vaccines, for a hefty profit, to those who wanted them? With the hope that, just like with the iPhone or any other successful consumer product, satisfied early adopters would inspire the more reticent to get in line too?
  3. The Argument from Trump. Now for the most awkward counterargument, which I’d like to address head-on rather than dodge. If the vaccines had been approved faster in the US, it would’ve looked to many like Trump deserved credit for it, and he might well have been reelected. And devastating though covid has been, Trump is plausibly worse! Here’s my response: Trump has the mentality of a toddler, albeit with curiosity swapped out for cruelty and vindictiveness. His and his cronies’ impulsivity, self-centeredness, and incompetence are likely responsible for at least ~200,000 of the 330,000 Americans now dead from covid. But, yes, reversing his previous anti-vaxx stance, Trump did say that he wanted to see a covid vaccine in months, just like I’ve said. Does it make me uncomfortable to have America’s worst president in my “camp”? Only a little, because I have no problem admitting that sometimes toddlers are right and experts are wrong. The solution, I’d say, is not to put toddlers in charge of the government! As should be obvious by now—indeed, as should’ve been obvious back in 2016—that solution has some exceedingly severe downsides. The solution, rather, is to work for a world where experts are unafraid to speak bluntly, so that it never falls to a mental toddler to say what the experts can’t say without jeopardizing their careers.

Anyway, despite everything I’ve written, considerations of Aumann’s Agreement Theorem still lead me to believe there’s an excellent chance that I’m wrong, and the vaccines couldn’t realistically have been rolled out any faster. The trouble is, I don’t understand why. And I don’t understand why compressing this process, from a year or two to at most a month or two, shouldn’t be civilization’s most urgent priority ahead of the next pandemic. So go ahead, explain it to me! I’ll be eternally grateful to whoever makes me retract this post in shame.

Update (Jan. 1, 2021): If you want a sense of the on-the-ground realities of administering the vaccine in the US, check out this long post by Zvi Mowshowitz. Briefly, it looks like in my post, I gave those in charge way too much benefit of the doubt (!!). The Trump administration pledged to administer 20 million vaccines by the end of 2020; instead it administered fewer than 3 million. Crucially, this is not because of any problem with manufacturing or supply, but just because of pure bureaucratic blank-facedness. Incredibly, even as the pandemic rages, most of the vaccines are sitting in storage, at severe risk of spoiling … and officials’ primary concern is not to administer the precious doses, but just to make sure no one gets a dose “out of turn.” In contrast to Israel, where they’re now administering vaccines 24/7, including on Shabbat, with the goal being to get through the entire population as quickly as possible, in the US they’re moving at a snail’s pace and took off for the holidays. In Wisconsin, a pharmacist intentionally spoiled hundreds of doses; in West Virginia, they mistakenly gave antibody treatments instead of vaccines. There are no longer any terms to understand what’s happening other than those of black comedy.

January 01, 2021

Doug NatelsonIdle speculation can teach physics, vacuum energy edition

To start the new year, a silly anecdote ending in real science.

Back when I was in grad school, around 25 years ago, I was goofing around chatting with one of my fellow group members, explaining about my brilliant (ahem) vacuum energy extraction machine.  See, I had read this paper by Robert L. Forward, which proposed an interesting idea, that one could use the Casimir effect to somehow "extract energy from the vacuum" - see here (pdf).  


(For those not in the know: the Casimir effect is an attractive (usually) interaction between conductors that grows rapidly at very small separations.  The non-exotic explanation for the force is that it is a relativistic generalization of the van der Waals force.  The exotic explanation for the force is that conducting boundaries interact with zero-point fluctuations of the electromagnetic field, so that "empty" space outside the region of the conductors has higher energy density.   As explained in the wiki link and my previous post on the topic, the non-exotic explanation seemingly covers everything without needing exotic physics.)
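For a rough sense of scale, here is a back-of-the-envelope sketch (my own numbers, not from the post) using the textbook ideal-conductor result for the Casimir pressure between parallel plates, P = pi^2*hbar*c/(240*d^4); the separations are arbitrary, and real materials at finite temperature fall short of this ideal value.

# Back-of-the-envelope sketch (my numbers): ideal parallel-plate Casimir pressure vs. separation.
import numpy as np

hbar = 1.054571817e-34   # J*s
c = 2.99792458e8         # m/s

for d_nm in (1000, 100, 10):
    d = d_nm * 1e-9
    P = np.pi**2 * hbar * c / (240 * d**4)   # attractive pressure in pascals
    print(f"d = {d_nm:5d} nm  ->  P ~ {P:.2e} Pa")

The d^-4 scaling is the point: the attraction is negligible at micron separations and grows to roughly atmospheric-pressure levels by about 10 nm.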

Anyway, my (not serious) idea was, conceptually, to make a parallel plate structure where one plate is gold (e.g.) and the other is one of the high temperature superconductors.  Those systems are rather lousy conductors in the normal state.  So, the idea was, cool the system just below the superconducting transition.  The one plate becomes superconducting, leading ideally to dramatically increased Casimir attraction between the plates.  Let the plates get closer, doing work on some external load.  Then warm the plates just slightly, so that the superconductivity goes away.  The attraction should lessen, and the plate would spring back, doing less work of the opposite sign.  It's not obvious that the energy required to switch the superconductivity is larger than the energy one could extract from running such a cycle.   Of course, there has to be a catch (as Forward himself points out in the linked pdf above).  In our conversation, I realized that the interactions between the plates would very likely modify the superconducting transition, probably in just the way needed to avoid extracting net energy through this process.  

Fast forward to last week, when I randomly came upon this article.  Researchers actually did an experiment using nanomechanical resonators to try to measure the impact of the Casimir interactions on the superconducting transition in (ultrasmooth, quench-condensed) lead films.  They were not able to resolve anything (like a change in the transition temperature) in this first attempt, but it shows that techniques now exist to probe such tiny effects, and that idly throwing ideas around can sometimes stumble upon real physics.


Tommaso Dorigo2020 Review, 2021 Agenda

Everybody would agree that 2020 was a difficult time for all of us - the pandemic forced on us dramatic changes in our way of living, working, and interacting with one another; and let's leave alone the horrible, avoidable death toll that came with it. Notwithstanding, for some reason it was a productive year for me, and one which has potentially paved the ground for an even more productive future. Below I will summarize, if only for myself, the most important work milestones of the past year, and the ones that lay ahead in the forthcoming months. But I will also touch on a few ancillary activities and their outcome, for the record.

Geometry optimization of a muon-electron scattering experiment (MUonE) 


Matt von HippelA Physicist New Year

Happy New Year to all!

Physicists celebrate the new year by trying to sneak one last paper in before the year is over. Looking at Facebook last night I saw three different friends preview the papers they just submitted. The site where these papers appear, arXiv, had seventy new papers this morning, just in the category of theoretical high-energy physics. Of those, nine were in my subfield or a closely related one.

I’d love to tell you all about these papers (some exciting! some long-awaited!), but I’m still tired from last night and haven’t read them yet. So I’ll just close by wishing you all, once again, a happy new year.

December 30, 2020

Tommaso DorigoIs The Dunning - Kruger Effect An Artifact ?

Ever had a nervous breakdown from reading Facebook threads where absolutely incompetent people entertain similar ignoramuses by providing explanations of everything from quantum physics to the way vaccines work? Or did you ever have to apply yoga techniques to avoid jumping into a bar conversation wherein some smart ass worked his audience by explaining things he clearly did not have the dimmest clue about?


Doug NatelsonEnd of the year, looking back and looking forward

 A few odds and ends at the close of 2020:

  • This was not a good year, for just about anyone.  Please, let's take better care of each other (e.g.) and ourselves!  
  • The decision to cancel the in-person 2020 APS March Meeting looks pretty darn smart in hindsight.
  • Please take a moment and consider how amazing it is that in less than a year, there are now multiple efficacious vaccines for SARS-Cov-2, using different strategies, when no one had ever produced a successful vaccine for any coronavirus in the past.  Logistical problems of distribution aside, this is a towering scientific achievement.  People who don't "believe" in vaccines, yet are willing to use (without thinking) all sorts of other scientific and engineering marvels, are amazing to me, and not in a good way.  For a compelling book about this kind of science, I again recommend this one, as I had done ten years ago.
  • I also recommend this book about the history of money.  Fascinating and extremely readable.  It's remarkable how we ended up where we are in terms of fiat currencies, and the fact that there are still fundamental disagreements about economics is both interesting and sobering.
  • As is my habit, I've been thinking again about the amazing yet almost completely unsung intellectual achievement that is condensed matter physics.  The history of this is filled with leaps that are incredible in hindsight - for example, the Pauli principle in 1925, the formulation of the Schroedinger equation in 1926, and Bloch's theorem for electrons in crystals in 1928 (!!).  I've also found that there is seemingly only one biography of Sommerfeld (just started it) and no book-length biography of Felix Bloch (though there are this and this).  
  • Four years ago I posted about some reasons for optimism at the end of 2016.  Globally speaking, these are still basically valid, even if it doesn't feel like it many days.  Progress is not inevitable, but there is reason for hope.
Thanks for reading, and good luck in the coming year.  

December 29, 2020

Clifford JohnsonSunday Sketching

A shot of a (relatively) quick sketch in progress on Sunday, done with a dipping ink pen, and then splashing on some watercolour. Mostly just knocking rust off the sketching machinery (haven’t used these tools in a long while), and relaxing for some moments between one thing and the next …


December 28, 2020

John PreskillThe Grand Tour of quantum thermodynamics

Young noblemen used to undertake a “Grand Tour” during the 1600s and 1700s. Many of the tourists hailed from England, though well-to-do compatriots traveled from Scandinavia, Germany, and the United States. The men had just graduated from university—in many cases, Oxford or Cambridge. They’d studied classical history, language, and literature; and now, they’d experience what they’d read. Tourists flocked to Rome, Venice, and Florence, as well as to Paris; optional additions included Naples, Switzerland, Germany, and the Netherlands.

Tutors accompanied the tourists, guiding their charges across Europe. The tutors rounded out the young men’s education, instructing them in art, music, architecture, and continental society. I felt like those tutors, this month and last.1

I’m the one in the awkward-looking pose on the left.

I was lecturing in a quantum-thermodynamics mini course, with fellow postdoctoral scholar Matteo Lostaglio. Gabriel Landi, a professor of theoretical physics at the University of São Paulo in Brazil, organized the course. It targeted early-stage graduate students, who’d mastered the core of physics and who wished to immerse themselves in quantum thermodynamics. But the enrollment ranged from PhD and Masters students to undergraduates, postdocs, faculty members, and industry employees.

The course toured quantum thermodynamics similarly to how young noblemen toured Europe. I imagine quantum thermodynamics as a landscape—one inked on a parchment map, with blue whorls representing the sea and with a dragon breathing fire in one corner. Quantum thermodynamics encompasses many communities whose perspectives differ and who wield different mathematical and conceptual tools. These communities translate into city-states, principalities, republics, and other settlements on the map. The class couldn’t visit every city, just as Grand Tourists couldn’t. But tourists had a leg up on us in their time budgets: A Grand Tour lasted months or years, whereas we presented nine hour-and-a-half lectures.

Attendees in Stuttgart

Grand Tourists returned home with trinkets, books, paintings, and ancient artifacts. I like to imagine that the tutors, too, acquired souvenirs. Here are four of my favorite takeaways from the course:

1) Most captivating subfield that I waded into for the course: Thermodynamic uncertainty relations. Researchers have derived these inequalities using nonequilibrium statistical mechanics, a field that encompasses molecular motors, nanorobots, and single strands of DNA. Despite the name “uncertainty relations,” classical and quantum systems obey these inequalities.

Imagine a small system interacting with big systems that have different temperatures and different concentrations of particles. Energy and particles hop between the systems, dissipating entropy (\Sigma) and forming currents. The currents change in time, due to the probabilistic nature of statistical mechanics. 

How much does a current vary, relative to its average value, \langle J \rangle? We quantify this variation with the relative variance, {\rm var}(J) / \langle J \rangle^2. Say that you want a low-variance, predictable current. You’ll have to pay a high entropy cost: \frac{ {\rm var} (J) }{\langle J \rangle^2 } \geq  \frac{2 k_{\rm B} }{\Sigma}, wherein k_{\rm B} denotes Boltzmann’s constant. 

Thermodynamic uncertainty relations govern systems arbitrarily far from equilibrium. We know loads about systems at equilibrium, in which large-scale properties remain approximately constant and no net flows (such as flows of particles) enter or leave the system. We know much about systems close to equilibrium. The regime arbitrarily far from equilibrium is the Wild, Wild West of statistical mechanics. Proving anything about this regime tends to require assumptions and specific models, to say nothing of buckets of work. But thermodynamic uncertainty relations are general, governing classical and quantum systems from molecular motors to quantum dots.
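If you’d like to see the inequality in action, here is a toy Monte Carlo sketch (my own, not course material): a biased continuous-time random walk with forward rate k_+ and backward rate k_-, for which the net current over a time t has mean (k_+ - k_-)t, variance (k_+ + k_-)t, and average entropy production \Sigma = (k_+ - k_-) t \ln(k_+/k_-) in units of k_{\rm B}. The rates, observation time, and sample count below are arbitrary choices.

# Toy check (my illustration) of the thermodynamic uncertainty relation
# for a biased continuous-time random walk.
import numpy as np

rng = np.random.default_rng(1)
kp, km, t, samples = 3.0, 1.0, 50.0, 20000

# Net current J = (forward jumps) - (backward jumps); both are Poisson counts.
J = rng.poisson(kp * t, samples) - rng.poisson(km * t, samples)

Sigma = (kp - km) * t * np.log(kp / km)    # average entropy production / k_B
lhs = J.var() / J.mean()**2                # relative variance of the current
rhs = 2.0 / Sigma                          # TUR bound

print("var(J)/<J>^2 =", lhs)
print("2/Sigma      =", rhs)
print("TUR satisfied:", lhs >= rhs)

With these numbers the relative variance comes out just above the bound, which is typical of a simple driven jump process: the inequality is close to tight when the bias is weak and loosens as the drive grows.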

Multiple cats attended our mini course, according to the selfies we received.

2) Most unexpected question: During lecture one, I suggested readings that introduce quantum thermodynamics. The suggestions included two reviews and the article I wrote for Scientific American about quantum steampunk, my angle on quantum thermodynamics. The next day, a participant requested recommendations of steampunk novels. I’d prepared more for requests for justifications of the steps in my derivations. But I forwarded a suggestion given to me twice: The Difference Engine, by William Gibson and Bruce Sterling.

3) Most insightful observation: My fellow tutor—I mean lecturer—pointed out how quantum thermodynamics doesn’t and does diverge from classical thermodynamics. Quantum systems can’t break the second law of thermodynamics, as classical systems can’t. Quantum engines can’t operate more efficiently than Carnot’s engine. Erasing information costs work, regardless of whether the information-bearing degree of freedom is classical or quantum. So broad results about quantum thermodynamics coincide with broad results about classical thermodynamics. We can find discrepancies by focusing on specific physical systems, such as a spring that can be classical or quantum.  

4) Most staggering numbers: Unlike undertaking a Grand Tour, participating in the mini course cost nothing. We invited everyone across the world to join, and 420 participants from 48 countries enrolled. I learned of the final enrollment days before the course began, scrolling through the spreadsheet of participants. Motivated as I had been to double-check my lecture notes, the number spurred my determination like steel on a horse’s flanks.

The Grand Tour gave rise to travelogues and guidebooks read by tourists across the centuries: Mark Twain has entertained readers—partially at his own expense—since 1869 in the memoir The Innocents Abroad. British characters in the 1908 novel A Room with a View diverge in their views of Baedeker’s Handbook to Northern Italy. Our course material, and videos of the lectures, remain online and available to everyone for free. You’re welcome to pack your trunk, fetch your cloak, and join the trip.

A screenshot from the final lecture

1In addition to guiding their wards, tutors kept the young men out of trouble—and one can only imagine what trouble wealthy young men indulged in the year after college. I didn’t share that responsibility.

December 27, 2020

Clifford JohnsonConjunction

Jupiter (with some moons) and Saturn, 21st December 2020.

But... while the viewing on the 21st (the peak of the conjunction) was perfect, seeing three of the Galilean moons, and the glorious rings of Saturn, very clearly, getting a decent through-the-lens photo was not so trouble-free. I was dissatisfied with the roughs of the photos I got that night, with lots of blurring and aberrations that I felt I should have been able to overcome. So I spent the next day taking the telescope entirely apart, checking everything, and trying to collimate it properly, and testing schemes for better vibration stabilisation of the camera. I was ready for another session of photographing the next night, but it was cloudy, with only about [...]


December 25, 2020

Noncommutative GeometryFields Academy Graduate Courses

Starting in January 2021, the Fields Institute in Toronto is going to run various advanced graduate courses in collaboration with Ontario universities. Students from all over the world can either take these courses, or just audit them. For details see here. The Fields Academy has an undergraduate training component as well.

Matt von HippelNewtonmas in Uncertain Times

Three hundred and seventy-eight years ago today (depending on which calendars you use), Isaac Newton was born. For a scientist, that’s a pretty good reason to celebrate.

Reason’s Greetings Everyone!

Last month, our local nest of science historians at the Niels Bohr Archive hosted a Zoom talk by Jed Z. Buchwald, a Newton scholar at Caltech. Buchwald had a story to tell about experimental uncertainty, one where Newton had an important role.

If you’ve ever had a lab course in school, you know experiments never quite go like they’re supposed to. Set a room of twenty students to find Newton’s constant, and you’ll get forty different answers. Whether you’re reading a ruler or clicking a stopwatch, you can never measure anything with perfect accuracy. Each time you measure, you introduce a little random error.

Textbooks’ worth of statistical know-how has cropped up over the centuries to compensate for this error and get closer to the truth. The simplest trick, though, is just to average over multiple experiments. It’s so obvious a choice, taking a thousand little errors and smoothing them out, that you might think people have been averaging in this way through history.

They haven’t though. As far as Buchwald had found, the first person to average experiments in this way was Isaac Newton.

What did people do before Newton?

Well, what might you do, if you didn’t have a concept of random error? You can still see that each time you measure you get a different result. But you would blame yourself: if you were more careful with the ruler, quicker with the stopwatch, you’d get it right. So you practice, you do the experiment many times, just as you would if you were averaging. But instead of averaging, you just take one result, the one you feel you did carefully enough to count.

Before Newton, this was almost always what scientists did. If you were an astronomer mapping the stars, the positions you published would be the last of a long line of measurements, not an average of the rest. Some other tricks existed. Tycho Brahe for example folded numbers together pair by pair, averaging the first two and then averaging that average with the next one, getting a final result weighted to the later measurements. But, according to Buchwald, Newton was the first to just add everything together.
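A quick toy simulation (my own illustration, not Buchwald’s) makes the difference concrete: on synthetic measurements with identical random error, the plain average narrows in on the truth much faster than Brahe’s pair-by-pair folding, whose final answer leans heavily on the last few measurements. The true value, noise level, and sample sizes below are arbitrary.

# Toy comparison (my illustration): plain averaging vs. Tycho Brahe's pair-by-pair folding.
import numpy as np

rng = np.random.default_rng(2)
true_value, sigma, n, trials = 10.0, 1.0, 20, 20000

def brahe_fold(xs):
    # Average the first two values, then average that with the next, and so on.
    result = xs[0]
    for x in xs[1:]:
        result = 0.5 * (result + x)
    return result

data = rng.normal(true_value, sigma, size=(trials, n))
plain = data.mean(axis=1)
folded = np.apply_along_axis(brahe_fold, 1, data)

print("spread of plain average  :", plain.std())   # ~ sigma/sqrt(20) ~ 0.22
print("spread of Brahe's folding:", folded.std())  # ~ sigma/sqrt(3)  ~ 0.58

No matter how many measurements you fold in, Brahe’s scheme never does better than a plain average of roughly the last three, because half the weight always sits on the final measurement.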

Even Newton didn’t yet know why this worked. It would take later research, theorems of statistics, to establish the full justification. It seems Newton and his later contemporaries had a vague physics analogy in mind, finding a sort of “center of mass” of different experiments. This doesn’t make much sense – but it worked, well enough for physics as we know it to begin.

So this Newtonmas, let’s thank the scientists of the past. Working piece by piece, concept by concept, they gave us the tools to navigate our uncertain times.

December 19, 2020

Tommaso DorigoJupiter And Saturn Put Up Quite A Show

In what is a once-in-a-few-lifetimes experience, I witnessed today the conjunction of Jupiter and Saturn in the evening sky (with a crescent moon thrown in to boot). While every twenty years or so the two planets end up angularly close because of their different orbital periods (Jupiter revolves around our Sun in 11.9 years, Saturn takes 29.4 years), small differences in their orbital planes make the smallest distance they reach usually of the order of a degree.


Doug NatelsonThe physics of beskar

In keeping with my previous posts about favorite science fiction condensed matter systems and the properties of vibranium, I think we are overdue for an observational analysis of the physics of beskar.  Beskar is the material of choice of the Mandalorians in the Star Wars universe.  It is apparently an alloy (according to Wookieepedia), and it is most notable for being the only material that can resist direct attack by lightsaber, as well as deflecting blaster shots.

Like many fictional materials, beskar has whatever properties are needed to drive the plot and look cool doing so, but it's still fun to think about what would have to be going on in the material for it to behave the way it appears on screen.  

In ingot form, beskar looks rather like Damascus steel (or perhaps Valyrian steel, though without the whole dragonfire aspect).  That's a bit surprising, since the texturing in damascene steel involves phase separation upon solidification from the melt, while the appearance of beskar is homogeneous when it's in the form of armor plating or a spear.  From the way people handle it, beskar seems to have a density similar to steel, though perhaps a bit lower.

Beskar's shiny appearance says that at least at optical frequencies the material is a metal, meaning it has highly mobile charge carriers.  Certainly everyone calls it a metal.  That is interesting in light of two of its other obvious properties:  An extremely high melting point (we know that lightsabers can melt through extremely tough metal plating as in blast doors); and extremely poor thermal conductivity.  (Possible spoilers for The Mandalorian S2E8 - it is possible to hold a beskar spear with gloved hands mere inches from where the spear is visibly glowing orange.)  Because mobile charge carriers tend to conduct heat very well (see the Wiedemann Franz relation), it's tricky to have metals that are really bad thermal conductors.  This is actually a point consistent with beskar being an alloy, though.  Alloys tend to have higher electrical resistivity and poorer thermal conduction than pure substances.  
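For a rough sense of the numbers (my own estimate, with illustrative conductivity values rather than anything canon), the Wiedemann-Franz law kappa ~ L*sigma*T, with the Lorenz number L ~ 2.44e-8 W*Ohm/K^2, shows how hard it is for anything with copper-like carrier mobility to be a poor heat conductor, and how much an alloy-level resistivity helps:

# Rough Wiedemann-Franz estimate (my illustration) of the electronic thermal conductivity.
L = 2.44e-8          # Lorenz number, W*Ohm/K^2
T = 300.0            # temperature, K

for name, sigma in [("copper-like metal", 6.0e7), ("disordered alloy", 1.0e6)]:
    kappa = L * sigma * T
    print(f"{name:18s}: sigma = {sigma:.1e} S/m -> kappa ~ {kappa:7.1f} W/(m*K)")

Even a fairly resistive alloy still conducts heat electronically at the several-W/(m*K) level, so beskar's apparent thermal insulation really does demand something unusual.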

The high melting temperature is consistent with the nice acoustic properties of beskar (as seen here, in S2E7), and its extreme mechanical toughness.  The high melting temperature is tricky, though, because there is on-screen evidence that beskar may be melted (for forging into armor) without being heated to glowing.  Indeed, at about 1:02 in this video, the Armorer is able to melt a beskar ingot at the touch of a button on a console.  This raises a very interesting possibility, that beskar is close to a solid-liquid phase transition that may be tuned to room temperature via a simple external parameter (some externally applied field?).  This must be something subtle, because otherwise you could imagine anti-beskar weapons that would turn Mandalorian armor into a puddle on the floor.  

Regardless of the inconsistencies in its on-screen portrayal (which are all minor compared to the way dilithium has been shown), beskar is surely a worthy addition to fictional materials science.  This is The Way.

 

December 13, 2020

Terence TaoSendov’s conjecture for sufficiently high degree polynomials

I’ve just uploaded to the arXiv my paper “Sendov’s conjecture for sufficiently high degree polynomials“. This paper is a contribution to an old conjecture of Sendov on the zeroes of polynomials:

Conjecture 1 (Sendov’s conjecture) Let {f: {\bf C} \rightarrow {\bf C}} be a polynomial of degree {n \geq 2} that has all zeroes in the closed unit disk {\{ z: |z| \leq 1 \}}. If {\lambda_0} is one of these zeroes, then {f'} has at least one zero in {\{z: |z-\lambda_0| \leq 1\}}.

It is common in the literature on this problem to normalise {f} to be monic, and to rotate the zero {\lambda_0} to be an element {a} of the unit interval {[0,1]}. As it turns out, the location of {a} on this unit interval {[0,1]} ends up playing an important role in the arguments.

Many cases of this conjecture are already known from prior work.

In particular, in high degrees the only cases left uncovered by prior results are when {a} is close (but not too close) to {0}, or when {a} is close (but not too close) to {1}; see Figure 1 of my paper.

Our main result covers the high degree case uniformly for all values of {a \in [0,1]}:

Theorem 2 There exists an absolute constant {n_0} such that Sendov’s conjecture holds for all {n \geq n_0}.

In principle, this reduces the verification of Sendov’s conjecture to a finite time computation, although our arguments use compactness methods and thus do not easily provide an explicit value of {n_0}. I believe that the compactness arguments can be replaced with quantitative substitutes that provide an explicit {n_0}, but the value of {n_0} produced is likely to be extremely large (certainly much larger than {9}).
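As a small-scale illustration of such a computation (my own sketch, not part of the paper), one can sample random monic polynomials with all zeroes in the closed unit disk and confirm numerically that every zero has a critical point within unit distance; the degrees and sample counts below are arbitrary.

# Numerical spot-check (my sketch) of Sendov's conjecture for random low-degree polynomials.
import numpy as np

rng = np.random.default_rng(3)
worst = 0.0
for _ in range(2000):
    n = rng.integers(2, 15)
    # zeroes drawn uniformly from the closed unit disk
    zeros = np.sqrt(rng.uniform(0, 1, n)) * np.exp(2j * np.pi * rng.uniform(0, 1, n))
    coeffs = np.poly(zeros)                  # monic polynomial with these zeroes
    crit = np.roots(np.polyder(coeffs))      # zeroes of the derivative
    dists = np.abs(zeros[:, None] - crit[None, :]).min(axis=1)
    worst = max(worst, dists.max())

print("largest zero-to-nearest-critical-point distance found:", worst)
print("consistent with Sendov's conjecture (<= 1):", worst <= 1.0)

Of course, random sampling at low degree is no substitute for a proof; the whole difficulty of the conjecture lives in the extremal configurations.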

Because of the previous results (particularly those of Chalebgwa and Chijiwa), we will only need to establish the following two subcases of the above theorem:

Theorem 3 (Sendov’s conjecture near the origin) Under the additional hypothesis {a = o(1/\log n)}, Sendov’s conjecture holds for sufficiently large {n}.

Theorem 4 (Sendov’s conjecture near the unit circle) Under the additional hypothesis {1-o(1) \leq a \leq 1 - \varepsilon_0^n} for a fixed {\varepsilon_0>0}, Sendov’s conjecture holds for sufficiently large {n}.

We approach these theorems using the “compactness and contradiction” strategy, assuming that there is a sequence of counterexamples whose degrees {n} going to infinity, using various compactness theorems to extract various asymptotic objects in the limit {n \rightarrow \infty}, and somehow using these objects to derive a contradiction. There are many ways to effect such a strategy; we will use a formalism that I call “cheap nonstandard analysis” and which is common in the PDE literature, in which one repeatedly passes to subsequences as necessary whenever one invokes a compactness theorem to create a limit object. However, the particular choice of asymptotic formalism one selects is not of essential importance for the arguments.

I also found it useful to use the language of probability theory. Given a putative counterexample {f} to Sendov’s conjecture, let {\lambda} be a zero of {f} (chosen uniformly at random among the {n} zeroes of {f}, counting multiplicity), and let {\zeta} similarly be a uniformly random zero of {f'}. We introduce the logarithmic potentials

\displaystyle  U_\lambda(z) := {\bf E} \log \frac{1}{|z-\lambda|}; \quad U_\zeta(z) := {\bf E} \log \frac{1}{|z-\zeta|}

and the Stieltjes transforms

\displaystyle  s_\lambda(z) := {\bf E} \frac{1}{z-\lambda}; \quad s_\zeta(z) := {\bf E} \frac{1}{z-\zeta}.

Standard calculations using the fundamental theorem of algebra yield the basic identities

\displaystyle  U_\lambda(z) = \frac{1}{n} \log \frac{1}{|f(z)|}; \quad U_\zeta(z) = \frac{1}{n-1} \log \frac{n}{|f'(z)|}

and

\displaystyle  s_\lambda(z) = \frac{1}{n} \frac{f'(z)}{f(z)}; \quad s_\zeta(z) = \frac{1}{n-1} \frac{f''(z)}{f'(z)} \ \ \ \ \ (1)

and in particular the random variables {\lambda, \zeta} are linked to each other by the identity

\displaystyle  U_\lambda(z) - \frac{n-1}{n} U_\zeta(z) = \frac{1}{n} \log |s_\lambda(z)|. \ \ \ \ \ (2)
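Identity (2), together with the definitions above, is easy to check numerically; the following small sketch (my own, not from the paper) does so for a random monic polynomial with zeroes in the unit disk, at an arbitrarily chosen test point z outside the disk.

# Numerical check (my sketch) of identity (2) for a random monic polynomial.
import numpy as np

rng = np.random.default_rng(4)
n = 12
lam = np.sqrt(rng.uniform(0, 1, n)) * np.exp(2j * np.pi * rng.uniform(0, 1, n))
zeta = np.roots(np.polyder(np.poly(lam)))    # zeroes of f'

z = 1.7 + 0.4j                               # test point outside the unit disk
U_lam  = np.mean(np.log(1.0 / np.abs(z - lam)))      # E log 1/|z - lambda|
U_zeta = np.mean(np.log(1.0 / np.abs(z - zeta)))     # E log 1/|z - zeta|
s_lam  = np.mean(1.0 / (z - lam))                    # E 1/(z - lambda)

lhs = U_lam - (n - 1) / n * U_zeta
rhs = np.log(np.abs(s_lam)) / n
print(lhs, rhs, np.isclose(lhs, rhs))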

On the other hand, the hypotheses of Sendov’s conjecture (and the Gauss-Lucas theorem) place {\lambda,\zeta} inside the unit disk {\{ z:|z| \leq 1\}}. Applying Prokhorov’s theorem, and passing to a subsequence, one can then assume that the random variables {\lambda,\zeta} converge in distribution to some limiting random variables {\lambda^{(\infty)}, \zeta^{(\infty)}} (possibly defined on a different probability space than the original variables {\lambda,\zeta}), also living almost surely inside the unit disk. Standard potential theory then gives the convergence

\displaystyle  U_\lambda(z) \rightarrow U_{\lambda^{(\infty)}}(z); \quad U_\zeta(z) \rightarrow U_{\zeta^{(\infty)}}(z) \ \ \ \ \ (3)

and

\displaystyle  s_\lambda(z) \rightarrow s_{\lambda^{(\infty)}}(z); \quad s_\zeta(z) \rightarrow s_{\zeta^{(\infty)}}(z) \ \ \ \ \ (4)

at least in the local {L^1} sense. Among other things, we then conclude from the identity (2) and some elementary inequalities that

\displaystyle  U_{\lambda^{(\infty)}}(z) = U_{\zeta^{(\infty)}}(z)

for all {|z|>1}. This turns out to have an appealing interpretation in terms of Brownian motion: if one takes two Brownian motions in the complex plane, one originating from {\lambda^{(\infty)}} and one originating from {\zeta^{(\infty)}}, then the location where these Brownian motions first exit the unit disk {\{ z: |z| \leq 1 \}} will have the same distribution. (In our paper we actually replace Brownian motion with the closely related formalism of balayage.) This turns out to connect the random variables {\lambda^{(\infty)}}, {\zeta^{(\infty)}} quite closely to each other. In particular, with this observation and some additional arguments involving both the unique continuation property for harmonic functions and Grace’s theorem (discussed in this previous post), with the latter drawn from the prior work of Dégot, we can get very good control on these distributions:

Theorem 5
  • (i) If {a = o(1)}, then {\lambda^{(\infty)}, \zeta^{(\infty)}} almost surely lie in the semicircle {\{ e^{i\theta}: \pi/2 \leq \theta \leq 3\pi/2\}} and have the same distribution.
  • (ii) If {a = 1-o(1)}, then {\lambda^{(\infty)}} is uniformly distributed on the circle {\{ z: |z|=1\}}, and {\zeta^{(\infty)}} is almost surely zero.

In case (i) (and strengthening the hypothesis {a=o(1)} to {a=o(1/\log n)} to control some technical contributions of “outlier” zeroes of {f}), we can use this information about {\lambda^{(\infty)}} and (4) to ensure that the normalised logarithmic derivative {\frac{1}{n} \frac{f'}{f} = s_\lambda} has a non-negative winding number in a certain small (but not too small) circle around the origin, which by the argument principle is inconsistent with the hypothesis that {f} has a zero at {a = o(1)} and that {f'} has no zeroes near {a}. This is how we establish Theorem 3.

Case (ii) turns out to be more delicate. This is because there are a number of “near-counterexamples” to Sendov’s conjecture that are compatible with the hypotheses and conclusion of case (ii). The simplest such example is {f(z) = z^n - 1}, where the zeroes {\lambda} of {f} are uniformly distributed amongst the {n^{th}} roots of unity (including at {a=1}), and the zeroes of {f'} are all located at the origin. In my paper I also discuss a variant of this construction, in which {f'} has zeroes mostly near the origin, but also acquires a bounded number of zeroes at various locations {\lambda_1+o(1),\dots,\lambda_m+o(1)} inside the unit disk. Specifically, we take

\displaystyle  f(z) := \left(z + \frac{c_2}{n}\right)^{n-m} P(z) - \left(a + \frac{c_2}{n}\right)^{n-m} P(a)

where {a = 1 - \frac{c_1}{n}} for some constants {0 < c_1 < c_2} and

\displaystyle  P(z) := (z-\lambda_1) \dots (z-\lambda_m).

By a perturbative analysis of the zeroes of {f}, one would eventually be able to arrive at a true counterexample to Sendov’s conjecture if these locations {\lambda_1,\dots,\lambda_m} were in the open lune

\displaystyle  \{ \lambda: |\lambda| < 1 < |\lambda-1| \}

and if one had the inequality

\displaystyle  c_2 - c_1 - c_2 \cos \theta + \sum_{j=1}^m \log \left|\frac{1 - \lambda_j}{e^{i\theta} - \lambda_j}\right| < 0 \ \ \ \ \ (5)

for all {0 \leq \theta \leq 2\pi}. However, if one takes the mean of this inequality in {\theta} (noting that {\cos \theta} averages to zero, and that {\int_0^1 \log |e^{2\pi i t} - \lambda_j|\ dt = 0} whenever {|\lambda_j| \leq 1}, as follows for instance from Jensen’s formula), one arrives at the inequality

\displaystyle  c_2 - c_1 + \sum_{j=1}^m \log |1 - \lambda_j| < 0

which is incompatible with the hypotheses {c_2 > c_1} and {|\lambda_j-1| > 1}. In order to extend this argument to more general polynomials {f}, we require a stability analysis of the endpoint equation

\displaystyle  c_2 - c_1 - c_2 \cos \theta + \sum_{j=1}^m \log \left|\frac{1 - \lambda_j}{e^{i\theta} - \lambda_j}\right| = 0 \ \ \ \ \ (6)

where we now only assume the closed conditions {c_2 \geq c_1} and {|\lambda_j-1| \geq 1}. Averaging (6) in {\theta} as before, each of the resulting terms {c_2 - c_1} and {\log |1-\lambda_j|} is non-negative and they sum to zero, so they all vanish (thus {c_2 = c_1} and {|1-\lambda_j| = 1}); this places all the zeroes {\lambda_j} on the arc

\displaystyle  \{ \lambda: |\lambda| < 1 = |\lambda-1|\} \ \ \ \ \ (7)

and taking the second Fourier coefficient of (6) also yields the vanishing second moment

\displaystyle  \sum_{j=1}^m \lambda_j^2 = 0.
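
To see where this comes from (a routine computation, sketched here under the assumption {|\lambda_j| < 1}), one can use the expansion

\displaystyle  \log |e^{i\theta} - \lambda_j| = -\mathrm{Re} \sum_{k=1}^\infty \frac{\lambda_j^k e^{-ik\theta}}{k}.

Integrating the left-hand side of (6) against {e^{2i\theta}\ \frac{d\theta}{2\pi}} then gives {\frac{1}{4} \sum_{j=1}^m \lambda_j^2} (the constant and {\cos \theta} terms do not contribute), and this must vanish if (6) holds for all {\theta}.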

These two conditions are incompatible with each other (except in the degenerate case when all the {\lambda_j} vanish), because all the non-zero elements {\lambda} of the arc (7) have argument in {\pm [\pi/3,\pi/2]}, so in particular their square {\lambda^2} will have negative real part. It turns out that one can adapt this argument to the more general potential counterexamples to Sendov’s conjecture (in the form of Theorem 4). The starting point is to use (1), (4), and Theorem 5(ii) to obtain good control on {f''/f'}, which one then integrates and exponentiates to get good control on {f'}, and then on a second integration one gets enough information about {f} to pin down the location of its zeroes to high accuracy. The constraint that these zeroes lie inside the unit disk then gives an inequality resembling (5), and an adaptation of the above stability analysis is then enough to conclude. The arguments here are inspired by the previous arguments of Miller, which treated the case when {a} was extremely close to {1} via a similar perturbative analysis; the main novelty is to control the error terms not in terms of the magnitude of the largest zero {\zeta} of {f'} (which is difficult to manage when {n} gets large), but rather by the variance of those zeroes, which ends up being a more tractable expression to keep track of.

December 11, 2020

Terence TaoCourse announcement: 246B, complex analysis

Just a short announcement that next quarter I will be continuing the recently concluded 246A complex analysis class as 246B. Topics I plan to cover:

  • Schwarz-Christoffel transformations and the uniformisation theorem (using the remainder of the 246A notes);
  • Jensen’s formula and factorization theorems (particularly Weierstrass and Hadamard); the Gamma function;
  • Connections with the Fourier transform on the real line;
  • Elliptic functions and their relatives;
  • (if time permits) the Riemann zeta function and the prime number theorem.

Notes for the later material will appear on this blog in due course.

December 08, 2020

Mark Chu-CarrollDeceptive Statistics and Elections

I don’t mean to turn my blog into political nonsense central, but I just can’t pass on some of these insane arguments.

This morning, the state of Texas sued four other states to overturn the results of our presidential election. As part of their suit, they included an "expert analysis" that claims that the odds of the election results in the state of Georgia being legitimate are worse than one in a quadrillion. So naturally, I had to take a look.

Here’s the meat of the argument that they’re claiming "proves" that the election results were fraudulent.

I tested the hypothesis that the performance of the two Democrat candidates were statistically similar by comparing Clinton to Biden. I use a Z-statistic or score, which measures the number of standard deviations the observation is above the mean value of the comparison being made. I compare the total votes of each candidate, in two elections and test the hypothesis that other things being the same they would have an equal number of votes.
I estimate the variance by multiplying the mean times the probability of the candidate not getting a vote. The hypothesis is tested using a Z-score which is the difference between the two candidates’ mean values divided by the square root of the sum of their respective variances. I use the calculated Z-score to determine the p-value, which is the probability of finding a test result at least as extreme as the actual results observed. First, I determine the Z-score comparing the number of votes Clinton received in 2016 to the number of votes Biden received in 2020. The Z-score is 396.3. This value corresponds to a confidence that I can reject the hypothesis many times more than one in a quadrillion times that the two outcomes were similar.

This is, to put it mildly, a truly trashy argument. I’d be incredibly ashamed if a high school student taking statistics turned this in.

What’s going on here?

Well, to start, this is deliberately bad writing. It’s using a whole lot of repetitive words in confusing ways in order to make it sound complicated and scientific.

I can simplify the writing, and that will make it very clear what’s going on.

I tested the hypothesis that the performance of the two Democrat candidates were statistically similar by comparing Clinton to Biden. I started by assuming that the population of eligible voters, the rates at which they cast votes, and their voting preferences, were identical for the two elections. I further assumed that the counted votes were a valid random sampling of the total population of voters. Then I computed the probability that in two identical populations, a random sampling could produce results as different as the observed results of the two elections.

As you can see from the rewrite, the "analysis" assumes that the voting population is unchanged, and the preferences of the voters are unchanged. He assumes that the only thing that changed is the specific sampling of voters from the population of eligible voters – and in both elections, he assumes that the set of people who actually vote is a valid random sample of that population.

In other words, if you assume that:

  1. No one ever changes their mind and votes for different parties’ candidates in two sequential elections;
  2. The population and its preferences never change – people don’t move in and out of the state, and new people don’t register to vote;
  3. The specific set of people who vote in an election is completely random.

Then you can say that this election result is impossible and clearly indicates fraud.

The problem is, none of those assumptions are anywhere close to correct or reasonable. We know that people’s voting preferences change. We know that the voting population changes. We know that who turns out to vote changes. None of these things are fixed constants – and any analysis that assumes any of these things is nothing but garbage.
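
To make concrete just how empty the headline number is, here is a small Python sketch of the test as the affidavit describes it, applied to completely made-up totals for two elections in which nothing fishy happened, just higher turnout and a modest shift in preferences. (These numbers are hypothetical, purely for illustration; they are not the actual Georgia totals, and my reading of the affidavit’s method may not match its author’s spreadsheet exactly.)

    # Hypothetical totals: a candidate's votes and the total votes cast,
    # for two elections with higher turnout and a modest shift the second time.
    votes_1, total_1 = 1_900_000, 4_100_000
    votes_2, total_2 = 2_500_000, 5_000_000

    def z_score_as_described(v1, t1, v2, t2):
        """The affidavit's test, as described: treat each total as a binomial count,
        estimate each variance as mean * (1 - vote share), and compare the totals."""
        p1, p2 = v1 / t1, v2 / t2
        variance_1 = v1 * (1 - p1)
        variance_2 = v2 * (1 - p2)
        return (v2 - v1) / (variance_1 + variance_2) ** 0.5

    print(z_score_as_described(votes_1, total_1, votes_2, total_2))
    # -> roughly 398: "hundreds of standard deviations" for a perfectly ordinary change

With vote counts in the millions, essentially any real-world difference between two elections produces a Z-score in the hundreds under this test, so the "one in a quadrillion" framing tells you nothing about fraud.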

But I’m going to zoom in a bit on one of those: the one about the set of voters being a random sample.

When it comes to statistics, the selection of a sample is one of the most important, fundamental concerns. If your sample isn’t random, you can’t draw the conclusions that depend on it being random. And you can’t compare results for two samples as if they’re equivalent when they aren’t.

Elections aren’t random statistical samples of the population. They’re not even intended to be random statistical samples. They’re deliberately performed as counts of motivated individuals who choose to come out and cast their votes. In statistical terms, they’re a self-selected, motivated sample. Self-selected samples are neither random nor representative in a statistical sense. There’s nothing wrong with that: an election isn’t intended to be a random sample. But it does mean that when you do statistical analysis, you cannot treat the set of voters as a random sampling of the population of eligible voters; and you cannot make any assumptions about uniformity when you’re comparing the results of two different elections.

If you could – if the set of voters was a valid random statistical sample of an unchanging population of eligible voters, then there’d be no reason to even have elections on an ongoing basis. Just have one election, take its results as the eternal truth, and just assume that every election in the future would be exactly the same!

But that’s not how it works. And the people behind this lawsuit, and particularly the "expert" who wrote this so-called statistical analysis, know that. This analysis is pure garbage, put together to deceive. They’re hoping to fool someone into believing that it actually proves something that it can’t.

And that’s despicable.

December 02, 2020

Mark Chu-CarrollHerd Immunity

With COVID running rampant throughout the US, I’ve seen a bunch of discussions about herd immunity, and questions about what it means. There’s a simple mathematical concept behind it, so I decided to spend a bit of time explaining.

The basic concept is pretty simple. Let’s put together a simple model of an infectious disease. This will be an extremely simple model – we won’t consider things like variable infectivity, population age distributions, population density – we’re just building a simple model to illustrate the point.

To start, we need to model the infectivity of the disease. This is typically done using a number called R_0: the average number of susceptible people who will be infected by each person with the disease.

R_0 is the purest measure of infectivity – it’s the infectivity of the disease in ideal circumstances. In practice, we look for a value R, which is the actual infectivity. R includes the effects of social behaviors, population density, etc.

The state of an infectious disease is based on the expected number of new infections that will be produced by each infected individual. We compute that by using a number S, which is the proportion of the population that is susceptible to the disease.

  • If R S < 1, then the disease dies out without spreading throughout the population. More people can get sick, but each wave of infection will be smaller than the last.
  • If R S = 1, then the disease is said to be endemic. It continues as a steady state in the population. It never spreads dramatically, but it never dies out, either.
  • If R S > 1, then the disease is pandemic. Each wave of infection spreads the disease to a larger subsequent wave. The higher the value of R in a pandemic, the faster the disease will spread, and the more people will end up sick.

There are two keys to managing the spread of an infectious disease:

  1. Reduce the effective value of R. The value of R can be affected by various attributes of the population, including behavioral ones. In the case of COVID-19, an infected person wearing a mask will spread the disease to fewer others; and if other people are also wearing masks, then it will spread even less.
  2. Reduce the value of S. If there are fewer susceptible people in the population, then even with a high value of R, the disease can’t spread as quickly.

The latter is the key concept behind herd immunity. If you can get the value of S to be small enough, then you can get R * S to the sub-endemic level – you can prevent the disease from spreading. You’re effectively denying the disease access to enough susceptible people to be able to spread.

Let’s look at a somewhat concrete example. The R_0 for measles is somewhere around 15, which is insanely infectious. If 50% of the population is susceptible, and no one is doing anything to avoid the infection, then each person infected with measles will infect 7 or 8 other people – and they’ll each infect 7 or 8 others – and so on, which means you’ll have epidemic spread.

Now, let’s say that we get 95% of the population vaccinated, and they’re immune to measles. Now R * S = 15 * 0.05 = 0.75. The disease isn’t able to spread. If you had an initial outbreak of 5 infected, then they’ll infect around 4 people, who’ll infect around 3, who’ll infect around 2, and soon there are no more infections.
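
Here’s a minimal Python sketch of that wave-by-wave arithmetic (a toy calculation, not an epidemiological model, using the round numbers from this post):

    def waves(initial_infected, r0, susceptible_fraction, max_waves=6):
        """Toy model: each wave infects R0 * S times as many people as the last."""
        infected = float(initial_infected)
        history = [infected]
        for _ in range(max_waves):
            infected *= r0 * susceptible_fraction
            history.append(infected)
        return history

    # Measles-like R0 = 15 with 95% of the population immune (S = 0.05): R*S = 0.75.
    print([round(x, 1) for x in waves(5, 15, 0.05)])
    # -> [5.0, 3.8, 2.8, 2.1, 1.6, 1.2, 0.9]  (each wave smaller; the outbreak dies out)

    # Same R0 with 50% of the population susceptible: R*S = 7.5.
    print([round(x, 1) for x in waves(5, 15, 0.5, max_waves=4)])
    # -> [5.0, 37.5, 281.2, 2109.4, 15820.3]  (epidemic spread)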

In this case, we say that the population has herd immunity to the measles. There aren’t enough susceptible people in the population to sustain the spread of the disease – so if the disease is introduced to the population, it will rapidly die out. Even if there are individuals who are still susceptible, they probably won’t get infected, because there aren’t enough other susceptible people to carry it to them.

There are very few diseases that are as infectious as measles. But even with a disease that is that infectious, you can get to herd immunity relatively easily with vaccination.

Without vaccination, it’s still possible to develop herd immunity. It’s just extremely painful. If you’re dealing with a disease that can kill, getting to herd immunity means letting the disease spread until enough people have gotten sick and recovered that the disease can’t spread any more. What that means is letting a huge number of people get sick and suffer – and letting some portion of those people die.

Getting back to COVID-19: it’s got an R_0 that’s much lower. It’s somewhere between 1.4 and 2.5. Of those who get sick, even with good medical care, somewhere between 1 and 2% of the infected end up dying. Based on that R_0, the herd immunity threshold for COVID-19 (the fraction of the population that needs to be immune to make R*S < 1) is somewhere around 50% of the population. Without a vaccine, that means that we’d need to have 150 million people in the US get sick, and of those, around 2 million would die.

(UPDATE: Ok, so I blew it here. The papers that I found in a quick search appear to have a really bad estimate. The current CDC estimate of R_0 is around 5.7 – so the level of immunity needed for herd immunity is significantly higher – upward of 80%, and so would the number of deaths.)
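
In this simple model, the herd immunity threshold is just the immune fraction needed to push R*S below 1, which works out to 1 - 1/R_0. A quick sketch using the R_0 estimates mentioned above, with rough ballpark figures for the US population and the fatality rate:

    def herd_immunity_threshold(r0):
        """Immune fraction needed so that R0 * S < 1, i.e. S < 1/R0."""
        return 1 - 1 / r0

    for r0 in (1.4, 2.5, 5.7):
        print(f"R0 = {r0}: threshold ~ {herd_immunity_threshold(r0):.0%}")
    # R0 = 1.4: threshold ~ 29%
    # R0 = 2.5: threshold ~ 60%
    # R0 = 5.7: threshold ~ 82%

    # Rough "no vaccine" cost with the updated R0 and a ~1.5% fatality rate:
    us_population = 330_000_000
    infections = herd_immunity_threshold(5.7) * us_population
    print(f"{infections:,.0f} infections, {infections * 0.015:,.0f} deaths")
    # -> roughly 270 million infections and about 4 million deaths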

A strategy for dealing with an infectious disease that accepts the needless death of 2 million people is not exactly a good strategy.

November 30, 2020

John PreskillMay you go from weakness to weakness

I used to eat lunch at the foundations-of-quantum-theory table. 

I was a Masters student at the Perimeter Institute for Theoretical Physics, where I undertook a research project during the spring term. The project squatted on the border between quantum information theory and quantum foundations, where my two mentors worked. Quantum foundations concerns how quantum physics differs from classical physics; which alternatives to quantum physics could govern our world but don’t; and those questions, such as about Schrödinger’s cat, that fascinate us when we first encounter quantum theory, that many advisors warn probably won’t land us jobs if we study them, and that most physicists argue about only over a beer in the evening.

I don’t drink beer, so I had to talk foundations over sandwiches around noon.

One of us would dream up what appeared to be a perpetual-motion machine; then the rest of us would figure out why it couldn’t exist. Satisfied that the second law of thermodynamics still reigned, we’d decamp for coffee. (Perpetual-motion machines belong to the foundations of thermodynamics, rather than the foundations of quantum theory, but we didn’t discriminate.) I felt, at that lunch table, an emotion blessed to a student finding her footing in research, outside her country of origin: belonging.

The quantum-foundations lunch table came to mind last month, when I learned that Britain’s Institute of Physics had selected me to receive its International Quantum Technology Emerging Researcher Award. I was very grateful for the designation, but I was incredulous: Me? Technology? But I began grad school at the quantum-foundations lunch table. Foundations is to technology as the philosophy of economics is to dragging a plow across a wheat field, at least stereotypically.

Worse, I drag plows from wheat field to barley field to oat field. I’m an interdisciplinarian who never belongs in the room I’ve joined. Among quantum information theorists, I’m the thermodynamicist, or that theorist who works with experimentalists; among experimentalists, I’m the theorist; among condensed-matter physicists, I’m the quantum information theorist; among high-energy physicists, I’m the quantum information theorist or the atomic-molecular-and-optical (AMO) physicist; and, among quantum thermodynamicists, I do condensed matter, AMO, high energy, and biophysics. I usually know less than everyone else in the room about the topic under discussion. An interdisciplinarian can leverage other fields’ tools to answer a given field’s questions and can discover questions. But she may sound, to those in any one room, as though she were born yesterday. As Kermit the Frog said, it’s not easy being green.

Grateful as I am, I’d rather not dwell on why the Institute of Physics chose my file; anyone interested can read the citation or watch the thank-you speech. But the decision turned out to involve foundations and interdisciplinarity. So I’m dedicating this article to two sources of inspiration: an organization that’s blossomed by crossing fields and an individual who’s driven technology by studying fundamentals.

Britain’s Institute of Physics has a counterpart in the American Physical Society. The latter has divisions, each dedicated to some subfield of physics. If you belong to the society and share an interest in one of those subfields, you can join that division, attend its conferences, and receive its newsletters. I learned about the Division of Soft Matter from this article, which I wish I could quote almost in full. This division’s members study “a staggering variety of materials from the everyday to the exotic, including polymers such as plastics, rubbers, textiles, and biological materials like nucleic acids and proteins; colloids, a suspension of solid particles such as fogs, smokes, foams, gels, and emulsions; liquid crystals like those found in electronic displays; [ . . . ] and granular materials.” Members belong to physics, chemistry, biology, engineering, and geochemistry. 

Despite, or perhaps because of, its interdisciplinarity, the division has thrived. The group grew from a protodivision (a “topical group,” in the society’s terminology) to a division in five years—at “an unprecedented pace.” Intellectual diversity has complemented sociological diversity: The division “ranks among the top [American Physical Society] units in terms of female membership.” The division’s chair observes a close partnership between theory and experiment in what he calls “a vibrant young field.”

And some division members study oobleck. Wouldn’t you like to have an excuse to say “oobleck” every day?

The second source of inspiration lives, like the Institute of Physics, in Britain. David Deutsch belongs at the quantum-foundations table more than I. A theoretical physicist at Oxford, David cofounded the field of quantum computing. He explained why to me in a fusion of poetry and the pedestrian: He was “fixing the roof” of quantum theory. As a graduate student, David wanted to understand quantum foundations—what happens during a measurement—but concluded that quantum theory has too many holes. The roof was leaking through those holes, so he determined to fix them. He studied how information transformed during quantum processes, married quantum theory with computer science, and formalized what quantum computers could and couldn’t accomplish. Which—years down the road, fused with others’ contributions—galvanized experimentalists to harness ions and atoms, improve lasers and refrigerators, and build quantum computers and quantum cryptography networks. 

David is a theorist and arguably a philosopher. But he’d have swept the Institute of Physics’s playing field, could he have qualified as an “emerging researcher” this autumn (David began designing quantum algorithms during the 1980s).

I returned to the Perimeter Institute during the spring term of 2019. I ate lunch at the quantum-foundations table, and I felt that I still belonged. I feel so still. But I’ve eaten lunch at other tables by now, and I feel that I belong at them, too. I’m grateful if the habit has been useful.

Congratulations to Hannes Bernien, who won the institute’s International Quantum Technology Young Scientist Award, and to the “highly commended” candidates, whom you can find here!

November 27, 2020

Mark Chu-CarrollElection Fraud? Nope, just bad math

My old ScienceBlogs friend Mike Dunford has been tweeting his way through the latest lawsuit that’s attempting to overturn the results of our presidential election. The lawsuit is an amazingly shoddy piece of work. But one bit of it stuck out to me, because it falls into my area. Part of their argument tries to make the case that, based on "mathematical analysis", the reported vote counts couldn’t possibly make any sense.

The attached affidavit of Eric Quinell, Ph.D. ("Dr. Quinell Report) analyzez the extraordinary increase in turnout from 2016 to 2020 in a relatively small subset of townships and precincts outside of Detroit in Wayne County and Oakland county, and more importantly how nearly 100% or more of all "new" voters from 2016 to 2020 voted for Biden. See Exh. 102. Using publicly available information from Wayne County andOakland County, Dr. Quinell found that for the votes received up to the 2016 turnout levels, the 2020 vote Democrat vs Republican two-ways distributions (i.e. excluding third parties) tracked the 2016 Democrat vs. Republican distribution very closely…

This is very bad statistical analysis – it does something that is absolutely never correct and that is guaranteed to produce an odd-looking result, and then it pretends that this deliberately manufactured oddity means that something weird is going on.

Let’s just make up a scenario with some numbers to demonstrate. Let’s imagine a voting district in Cosine City. Cosine City has 1 million residents who are registered to vote.

In the 2016 election, let’s say that the election was dominated by two parties: the Radians, and the Degrees. The Radians won 52% of the vote, and the Degrees won 48%. The voter turnout was low – just 45%.

Now, 2020 comes, and it’s a rematch of the Radians and the Degrees. But this time, the turnout was 50% of registered voters. The Degrees won, with 51% of the vote.

So let’s break that down into numbers for the two elections:

  • In 2016:
    • A total of 450,000 voters actually cast ballots.
    • The Radians got 234,000 votes.
    • The Degrees got 216,000 votes.
  • In 2020:
    • A total of 500,000 voters actually cast ballots.
    • The Radians got 245,000 votes.
    • The Degrees got 255,000 votes.

Let’s do what Dr. Quinell did. Let’s look at the 2020 election numbers, and take out 450,000 votes which match the distribution from 2016. What we’re left with is:

  • 11,000 new votes for the Radians, and
  • 39,000 new votes for the Degrees.

There was a 3 percent shift in the vote, combined with an increase in voter turnout. Neither of those is unusual or radically surprising. But when you extract things in a statistically invalid way, you end up with a result where, in a voting district in which the vote for the two parties usually varies by no more than 4 points, the "new votes" in this election went nearly 4:1 for one party.

If we reduce the increase in voter turnout, that ratio becomes significantly worse. If the election turnout were 46%, then the numbers would be 460,000 total votes; 225,400 for the Radians and 234,600 for the Degrees. With Dr. Quinell’s analysis, that would give us -8,600 votes for the Radians, and +18,600 votes for the Degrees. Or, since negative votes don’t make sense, we can just stop at 225,400, and say that every single new vote beyond what the Radians won last time was taken by the Degrees. Clearly impossible, it must be fraud!
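
Here’s a small Python sketch of the Cosine City arithmetic, including the invalid "subtract the 2016 totals and call whatever’s left the new voters" step (all numbers are the made-up ones from this post):

    def quinell_style_split(votes_2016, votes_2020):
        """Subtract each party's 2016 total from its 2020 total and (invalidly)
        call the difference that party's 'new voters'."""
        return {party: votes_2020[party] - votes_2016[party] for party in votes_2016}

    votes_2016 = {"Radians": 234_000, "Degrees": 216_000}   # 52/48 split of 450,000

    # Scenario 1: turnout rises to 500,000 and the result shifts to 49/51.
    print(quinell_style_split(votes_2016, {"Radians": 245_000, "Degrees": 255_000}))
    # -> {'Radians': 11000, 'Degrees': 39000}   (a ~4:1 split, yet nothing odd happened)

    # Scenario 2: turnout only rises to 460,000, same 49/51 result.
    print(quinell_style_split(votes_2016, {"Radians": 225_400, "Degrees": 234_600}))
    # -> {'Radians': -8600, 'Degrees': 18600}   ("negative new voters": the method is
    #    broken, not the election)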

So what’s the problem here? What caused this reasonable result to suddenly look incredibly unlikely?

The votes are one big pool of numbers. You don’t know which data points came from which voters. You don’t know which voters are new versus old. What happened here is that the bozo doing the analysis baked in an invalid assumption. He assumed that all of the voters who voted in 2016 voted the same way in 2020.

"For the votes received up to the turnout level" isn’t something that’s actually measurable in the data. It’s an assertion of something without evidence. You can’t break out subgroups within a population, unless the subgroups were actually deliberately and carefully measured when the data was gathered. And in the case of an election, the data that he’s purportedly analyzing doesn’t actually contain the information needed to separate out that group.

You can’t do that. Or rather you can, but the results are, at best, meaningless.

November 26, 2020

Sean CarrollThanksgiving

This year we give thanks for one of the very few clues we have to the quantum nature of spacetime: black hole entropy. (We’ve previously given thanks for the Standard Model Lagrangian, Hubble’s Law, the Spin-Statistics Theorem, conservation of momentum, effective field theory, the error bar, gauge symmetry, Landauer’s Principle, the Fourier Transform, Riemannian Geometry, the speed of light, the Jarzynski equality, the moons of Jupiter, and space.)

Black holes are regions of spacetime where, according to the rules of Einstein’s theory of general relativity, the curvature of spacetime is so dramatic that light itself cannot escape. Physical objects (those that move at or more slowly than the speed of light) can pass through the “event horizon” that defines the boundary of the black hole, but they never escape back to the outside world. Black holes are therefore black — even light cannot escape — thus the name. At least that would be the story according to classical physics, of which general relativity is a part. Adding quantum ideas to the game changes things in important ways. But we have to be a bit vague — “adding quantum ideas to the game” rather than “considering the true quantum description of the system” — because physicists don’t yet have a fully satisfactory theory that includes both quantum mechanics and gravity.

The story goes that in the early 1970’s, James Bardeen, Brandon Carter, and Stephen Hawking pointed out an analogy between the behavior of black holes and the laws of good old thermodynamics. For example, the Second Law of Thermodynamics (“Entropy never decreases in closed systems”) was analogous to Hawking’s “area theorem”: in a collection of black holes, the total area of their event horizons never decreases over time. Jacob Bekenstein, who at the time was a graduate student working under John Wheeler at Princeton, proposed to take this analogy more seriously than the original authors had in mind. He suggested that the area of a black hole’s event horizon really is its entropy, or at least proportional to it.

This annoyed Hawking, who set out to prove Bekenstein wrong. After all, if black holes have entropy then they should also have a temperature, and objects with nonzero temperatures give off blackbody radiation, but we all know that black holes are black. But he ended up actually proving Bekenstein right; black holes do have entropy, and temperature, and they even give off radiation. We now refer to the entropy of a black hole as the “Bekenstein-Hawking entropy.” (It is just a useful coincidence that the two gentlemen’s initials, “BH,” can also stand for “black hole.”)

Consider a black hole whose event horizon has area A. Then its Bekenstein-Hawking entropy is

    \[S_\mathrm{BH} = \frac{c^3}{4G\hbar}A,\]

where c is the speed of light, G is Newton’s constant of gravitation, and \hbar is Planck’s constant of quantum mechanics. A simple formula, but already intriguing, as it seems to combine relativity (c), gravity (G), and quantum mechanics (\hbar) into a single expression. That’s a clue that whatever is going on here, it has something to do with quantum gravity. And indeed, understanding black hole entropy and its implications has been a major focus among theoretical physicists for over four decades now, including the holographic principle, black-hole complementarity, the AdS/CFT correspondence, and the many investigations of the information-loss puzzle.
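
To get a feel for the size of this quantity, here is a quick back-of-the-envelope Python estimate of the Bekenstein-Hawking entropy (in units of Boltzmann’s constant) for a black hole of one solar mass, using the Schwarzschild horizon area A = 4 pi r_s^2 with r_s = 2GM/c^2 (a rough numerical sketch, nothing more):

    import math

    c    = 2.998e8       # speed of light, m/s
    G    = 6.674e-11     # Newton's constant, m^3 kg^-1 s^-2
    hbar = 1.055e-34     # reduced Planck constant, J s
    M    = 1.989e30      # one solar mass, kg

    r_s = 2 * G * M / c**2             # Schwarzschild radius (~3 km)
    A   = 4 * math.pi * r_s**2         # horizon area
    S   = c**3 * A / (4 * G * hbar)    # Bekenstein-Hawking entropy, in units of k_B

    print(f"A ~ {A:.2e} m^2, S ~ {S:.2e} k_B")
    # -> A ~ 1.10e+08 m^2, S ~ 1.05e+77 k_B

For comparison, the ordinary thermodynamic entropy of the Sun is usually quoted as being of order 10^58 in the same units, so the black hole value is larger by almost twenty orders of magnitude.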

But there exists a prior puzzle: what is the black hole entropy, anyway? What physical quantity does it describe?

Entropy itself was invented as part of the development of thermodynamics in the mid-19th century, as a way to quantify the transformation of energy from a potentially useful form (like fuel, or a coiled spring) into useless heat, dissipated into the environment. It was what we might call a “phenomenological” notion, defined in terms of macroscopically observable quantities like heat and temperature, without any more fundamental basis in a microscopic theory. But more fundamental definitions came soon thereafter, once people like Maxwell and Boltzmann and Gibbs started to develop statistical mechanics, and showed that the laws of thermodynamics could be derived from more basic ideas of atoms and molecules.

Hawking’s derivation of black hole entropy was in the phenomenological vein. He showed that black holes give off radiation at a certain temperature, and then used the standard thermodynamic relations between entropy, energy, and temperature to derive his entropy formula. But this leaves us without any definite idea of what the entropy actually represents.

One of the reasons why entropy is thought of as a confusing concept is because there is more than one notion that goes under the same name. To dramatically over-simplify the situation, let’s consider three different ways of relating entropy to microscopic physics, named after three famous physicists:

  • Boltzmann entropy says that we take a system with many small parts, and divide all the possible states of that system into “macrostates,” so that two “microstates” are in the same macrostate if they are macroscopically indistinguishable to us. Then the entropy is just (the logarithm of) the number of microstates in whatever macrostate the system is in.
  • Gibbs entropy is a measure of our lack of knowledge. We imagine that we describe the system in terms of a probability distribution of what microscopic states it might be in. High entropy is when that distribution is very spread-out, and low entropy is when it is highly peaked around some particular state.
  • von Neumann entropy is a purely quantum-mechanical notion. Given some quantum system, the von Neumann entropy measures how much entanglement there is between that system and the rest of the world.

These seem like very different things, but there are formulas that relate them to each other in the appropriate circumstances. The common feature is that we imagine a system has a lot of microscopic “degrees of freedom” (jargon for “things that can happen”), which can be in one of a large number of states, but we are describing it in some kind of macroscopic coarse-grained way, rather than knowing what its exact state actually is. The Boltzmann and Gibbs entropies worry people because they seem to be subjective, requiring either some seemingly arbitrary carving of state space into macrostates, or an explicit reference to our personal state of knowledge. The von Neumann entropy is at least an objective fact about the system. You can relate it to the others by analogizing the wave function of a system to a classical microstate. Because of entanglement, a quantum subsystem generally cannot be described by a single wave function; the von Neumann entropy measures (roughly) how many different quantum states must be involved to account for its entanglement with the outside world.
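
As a concrete toy illustration of the von Neumann version (a numpy sketch, not tied to anything black-hole-specific): take two qubits in a maximally entangled Bell state. The global state is pure, so its von Neumann entropy is zero, but tracing out one qubit leaves the other in the maximally mixed state, with entropy log 2.

    import numpy as np

    def von_neumann_entropy(rho):
        """S = -Tr(rho log rho), computed from the eigenvalues of rho."""
        evals = np.linalg.eigvalsh(rho)
        evals = evals[evals > 1e-12]          # drop numerical zeroes
        return float(-np.sum(evals * np.log(evals)))

    # Bell state (|00> + |11>)/sqrt(2), written as a 4-component vector.
    psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
    rho = np.outer(psi, psi.conj())           # pure-state density matrix

    # Reduced state of the first qubit: partial trace over the second qubit.
    rho_A = rho.reshape(2, 2, 2, 2).trace(axis1=1, axis2=3)

    print(von_neumann_entropy(rho))    # ~0.0   : the global pure state has zero entropy
    print(von_neumann_entropy(rho_A))  # ~0.693 : log 2, the maximum for a single qubit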

So which, if any, of these is the black hole entropy? To be honest, we’re not sure. Most of us think the black hole entropy is a kind of von Neumann entropy, but the details aren’t settled.

One clue we have is that the black hole entropy is proportional to the area of the event horizon. For a while this was thought of as a big, surprising thing, since for something like a box of gas, the entropy is proportional to its total volume, not the area of its boundary. But people gradually caught on that there was never any reason to think of black holes like boxes of gas. In quantum field theory, regions of space have a nonzero von Neumann entropy even in empty space, because modes of quantum fields inside the region are entangled with those outside. The good news is that this entropy is (often, approximately) proportional to the area of the region, for the simple reason that field modes near one side of the boundary are highly entangled with modes just on the other side, and not very entangled with modes far away. So maybe the black hole entropy is just like the entanglement entropy of a region of empty space?

Would that it were so easy. Two things stand in the way. First, Bekenstein noticed another important feature of black holes: not only do they have entropy, but they have the most entropy that you can fit into a region of a fixed size (the Bekenstein bound). That’s very different from the entanglement entropy of a region of empty space in quantum field theory, where it is easy to imagine increasing the entropy by creating extra entanglement between degrees of freedom deep in the interior and those far away. So we’re back to being puzzled about why the black hole entropy is proportional to the area of the event horizon, if it’s the most entropy a region can have. That’s the kind of reasoning that leads to the holographic principle, which imagines that we can think of all the degrees of freedom inside the black hole as “really” living on the boundary, rather than being uniformly distributed inside. (There is a classical manifestation of this philosophy in the membrane paradigm for black hole astrophysics.)

The second obstacle to simply interpreting black hole entropy as entanglement entropy of quantum fields is the simple fact that it’s a finite number. While the quantum-field-theory entanglement entropy is proportional to the area of the boundary of a region, the constant of proportionality is infinity, because there are an infinite number of quantum field modes. So why isn’t the entropy of a black hole equal to infinity? Maybe we should think of the black hole entropy as measuring the amount of entanglement over and above that of the vacuum (called the Casini entropy). Maybe, but then if we remember Bekenstein’s argument that black holes have the most entropy we can attribute to a region, all that infinite amount of entropy that we are ignoring is literally inaccessible to us. It might as well not be there at all. It’s that kind of reasoning that leads some of us to bite the bullet and suggest that the number of quantum degrees of freedom in spacetime is actually a finite number, rather than the infinite number that would naively be implied by conventional non-gravitational quantum field theory.

So — mysteries remain! But it’s not as if we haven’t learned anything. The very fact that black holes have entropy of some kind implies that we can think of them as collections of microscopic degrees of freedom of some sort. (In string theory, in certain special circumstances, you can even identify what those degrees of freedom are.) That’s an enormous change from the way we would think about them in classical (non-quantum) general relativity. Black holes are supposed to be completely featureless (they “have no hair,” another idea of Bekenstein’s), with nothing going on inside them once they’ve formed and settled down. Quantum mechanics is telling us otherwise. We haven’t fully absorbed the implications, but this is surely a clue about the ultimate quantum nature of spacetime itself. Such clues are hard to come by, so for that we should be thankful.

November 21, 2020

Georg von HippelLATTICE 2021 will be virtual

Not having heard anything about LATTICE 2021, which is to be hosted by MIT, for a while, I checked out their (very small) conference website, which used to say that participants of past conferences would be notified of details regarding registration etc. in summer (sc. this past summer). Doing so, I learned that LATTICE 2021 will now be a virtual conference and will be held in the week of July 25-31, 2021.

Given the current developments regarding the Covid-19 pandemic, this seems a reasonable thing to do; even though a vaccine is almost certain to be available by then, not everyone will have received it, and there will likely still be travel restrictions in place. Still, holding a conference online means missing out on the many important informal discussions during coffee breaks and over shared meals, the networking opportunities for younger researchers, and the general "family reunion" atmosphere that lattice conferences can often have (after all, the community isn't that huge, and it is overall a fairly friendly community). I hope the organizers have good ideas for how to provide some of these in a virtual context.