Planet Musings

June 19, 2019

Tim Gowers: The fate of combinatorics at Strathclyde

I have just received an email from Sergey Kitaev, one of the three combinatorialists at Strathclyde. As in many universities, they belong not to the mathematics department but to the computer science department. Kitaev informs me that the administrators of that department, in their infinite wisdom, have decided that the future of the department is best served by axing discrete mathematics. I won’t write a long post about this, but instead refer you to a post by Peter Cameron that says everything I would want to say about the decision, and does so extremely cogently. I recommend that you read it if this kind of decision worries you.

Tommaso Dorigo: The Plot Of The Week - Detecting Dark Matter With Brownian Motion

I am reading a fun paper today, while traveling back home. I spent the past three days at CERN attending a workshop on machine learning, where I also presented the Anomaly Detection algorithm I have been working on over the past few weeks (and about which I blogged here and here). This evening, I needed a work assignment to make my travel time productive, so why not read some cool new research and blog about it?


Terence Tao: Abstracting induction on scales arguments

The following situation is very common in modern harmonic analysis: one has a large scale parameter {N} (sometimes written as {N=1/\delta} in the literature for some small scale parameter {\delta}, or as {N=R} for some large radius {R}), which ranges over some unbounded subset of {[1,+\infty)} (e.g. all sufficiently large real numbers {N}, or all powers of two), and one has some positive quantity {D(N)} depending on {N} that is known to be of polynomial size in the sense that

\displaystyle  C^{-1} N^{-C} \leq D(N) \leq C N^C \ \ \ \ \ (1)

for all {N} in the range and some constant {C>0}, and one wishes to obtain a subpolynomial upper bound for {D(N)}, by which we mean an upper bound of the form

\displaystyle  D(N) \leq C_\varepsilon N^\varepsilon \ \ \ \ \ (2)

for all {\varepsilon>0} and all {N} in the range, where {C_\varepsilon>0} can depend on {\varepsilon} but is independent of {N}. In many applications, this bound is nearly tight in the sense that one can easily establish a matching lower bound

\displaystyle  D(N) \geq C_\varepsilon N^{-\varepsilon}

in which case the property of having a subpolynomial upper bound is equivalent to that of being subpolynomial size in the sense that

\displaystyle  C_\varepsilon N^{-\varepsilon} \leq D(N) \leq C_\varepsilon N^\varepsilon \ \ \ \ \ (3)

for all {\varepsilon>0} and all {N} in the range. It would naturally be of interest to tighten these bounds further, for instance to show that {D(N)} is polylogarithmic or even bounded in size, but a subpolynomial bound is already sufficient for many applications.

Let us give some illustrative examples of this type of problem:

Example 1 (Kakeya conjecture) Here {N} ranges over all of {[1,+\infty)}. Let {d \geq 2} be a fixed dimension. For each {N \geq 1}, we pick a maximal {1/N}-separated set of directions {\Omega_N \subset S^{d-1}}. We let {D(N)} be the smallest constant for which one has the Kakeya inequality

\displaystyle  \| \sum_{\omega \in \Omega_N} 1_{T_\omega} \|_{L^{\frac{d}{d-1}}({\bf R}^d)} \leq D(N),

where {T_\omega} is a {1/N \times 1}-tube oriented in the direction {\omega}. The Kakeya maximal function conjecture is then equivalent to the assertion that {D(N)} has a subpolynomial upper bound (or equivalently, is of subpolynomial size). Currently this is only known in dimension {d=2}.

Example 2 (Restriction conjecture for the sphere) Here {N} ranges over all of {[1,+\infty)}. Let {d \geq 2} be a fixed dimension. We let {D(N)} be the smallest constant for which one has the restriction inequality

\displaystyle  \| \widehat{fd\sigma} \|_{L^{\frac{2d}{d-1}}(B(0,N))} \leq D(N) \| f \|_{L^\infty(S^{d-1})}

for all bounded measurable functions {f} on the unit sphere {S^{d-1}} equipped with surface measure {d\sigma}, where {B(0,N)} is the ball of radius {N} centred at the origin. The restriction conjecture of Stein for the sphere is then equivalent to the assertion that {D(N)} has a subpolynomial upper bound (or equivalently, is of subpolynomial size). Currently this is only known in dimension {d=2}.

Example 3 (Multilinear Kakeya inequality) Again {N} ranges over all of {[1,+\infty)}. Let {d \geq 2} be a fixed dimension, and let {S_1,\dots,S_d} be compact subsets of the sphere {S^{d-1}} which are transverse in the sense that there is a uniform lower bound {|\omega_1 \wedge \dots \wedge \omega_d| \geq c > 0} for the wedge product of directions {\omega_i \in S_i} for {i=1,\dots,d} (equivalently, there is no hyperplane through the origin that intersects all of the {S_i}). For each {N \geq 1}, we let {D(N)} be the smallest constant for which one has the multilinear Kakeya inequality

\displaystyle  \| \mathrm{geom} \sum_{T \in {\mathcal T}_i} 1_{T} \|_{L^{\frac{d}{d-1}}(B(0,N))} \leq D(N) \mathrm{geom} \# {\mathcal T}_i,

where for each {i=1,\dots,d}, {{\mathcal T}_i} is a collection of infinite tubes in {{\bf R}^d} of radius {1} oriented in a direction in {S_i}, which are separated in the sense that for any two tubes {T,T'} in {{\mathcal T}_i}, either the directions of {T,T'} differ by an angle of at least {1/N}, or {T,T'} are disjoint; and {\mathrm{geom} = \mathrm{geom}_{1 \leq i \leq d}} is our notation for the geometric mean

\displaystyle  \mathrm{geom} a_i := (a_1 \dots a_d)^{1/d}.

The multilinear Kakeya inequality of Bennett, Carbery, and myself establishes that {D(N)} is of subpolynomial size; a later argument of Guth improves this further by showing that {D(N)} is bounded (and in fact comparable to {1}).

Example 4 (Multilinear restriction theorem) Once again {N} ranges over all of {[1,+\infty)}. Let {d \geq 2} be a fixed dimension, and let {S_1,\dots,S_d} be compact subsets of the sphere {S^{d-1}} which are transverse as in the previous example. For each {N \geq 1}, we let {D(N)} be the smallest constant for which one has the multilinear restriction inequality

\displaystyle  \| \mathrm{geom} \widehat{f_i d\sigma} \|_{L^{\frac{2d}{d-1}}(B(0,N))} \leq D(N) \mathrm{geom} \| f_i \|_{L^2(S^{d-1})}

for all bounded measurable functions {f_i} on {S_i} for {i=1,\dots,d}. Then the multilinear restriction theorem of Bennett, Carbery, and myself establishes that {D(N)} is of subpolynomial size; it is known to be bounded for {d=2} (as can be easily verified from Plancherel’s theorem), but it remains open whether it is bounded for any {d>2}.

Example 5 (Decoupling for the paraboloid) {N} now ranges over the square numbers. Let {d \geq 2}, and subdivide the unit cube {[0,1]^{d-1}} into {N^{(d-1)/2}} cubes {Q} of sidelength {1/N^{1/2}}. For any {g \in L^1([0,1]^{d-1})}, define the extension operators

\displaystyle  E_{[0,1]^{d-1}} g( x', x_d ) := \int_{[0,1]^{d-1}} e^{2\pi i (x' \cdot \xi + x_d |\xi|^2)} g(\xi)\ d\xi

and

\displaystyle  E_Q g( x', x_d ) := \int_{Q} e^{2\pi i (x' \cdot \xi + x_d |\xi|^2)} g(\xi)\ d\xi

for {x' \in {\bf R}^{d-1}} and {x_d \in {\bf R}}. We also introduce the weight function

\displaystyle  w_{B(0,N)}(x) := (1 + \frac{|x|}{N})^{-100d}.

For any {p}, let {D_p(N)} be the smallest constant for which one has the decoupling inequality

\displaystyle  \| E_{[0,1]^{d-1}} g \|_{L^p(w_{B(0,N)})} \leq D_p(N) (\sum_Q \| E_Q g \|_{L^p(w_{B(0,N)})}^2)^{1/2}.

The decoupling theorem of Bourgain and Demeter asserts that {D_p(N)} is of subpolynomial size for all {p} in the optimal range {2 \leq p \leq \frac{2(d+1)}{d-1}}.

Example 6 (Decoupling for the moment curve) {N} now ranges over the natural numbers. Let {d \geq 2}, and subdivide {[0,1]} into {N} intervals {J} of length {1/N}. For any {g \in L^1([0,1])}, define the extension operators

\displaystyle  E_{[0,1]} g(x_1,\dots,x_d) = \int_{[0,1]} e^{2\pi i ( x_1 \xi + x_2 \xi^2 + \dots + x_d \xi^d)} g(\xi)\ d\xi

and more generally

\displaystyle  E_J g(x_1,\dots,x_d) = \int_{J} e^{2\pi i ( x_1 \xi + x_2 \xi^2 + \dots + x_d \xi^d)} g(\xi)\ d\xi

for {(x_1,\dots,x_d) \in {\bf R}^d}. For any {p}, let {D_p(N)} be the smallest constant for which one has the decoupling inequality

\displaystyle  \| E_{[0,1]} g \|_{L^p(w_{B(0,N^d)})} \leq D_p(N) (\sum_J \| E_J g \|_{L^p(w_{B(0,N^d)})}^2)^{1/2}.

It was shown by Bourgain, Demeter, and Guth that {D_p(N)} is of subpolynomial size for all {p} in the optimal range {2 \leq p \leq d(d+1)}, which among other things implies the Vinogradov main conjecture (as discussed in this previous post).

It is convenient to use asymptotic notation to express these estimates. We write {X \lesssim Y}, {X = O(Y)}, or {Y \gtrsim X} to denote the inequality {|X| \leq CY} for some constant {C} independent of the scale parameter {N}, and write {X \sim Y} for {X \lesssim Y \lesssim X}. We write {X = o(Y)} to denote a bound of the form {|X| \leq c(N) Y} where {c(N) \rightarrow 0} as {N \rightarrow \infty} along the given range of {N}. We then write {X \lessapprox Y} for {X \lesssim N^{o(1)} Y}, and {X \approx Y} for {X \lessapprox Y \lessapprox X}. Then the statement that {D(N)} is of polynomial size can be written as

\displaystyle  D(N) \sim N^{O(1)},

while the statement that {D(N)} has a subpolynomial upper bound can be written as

\displaystyle  D(N) \lessapprox 1

and similarly the statement that {D(N)} is of subpolynomial size is simply

\displaystyle  D(N) \approx 1.

Many modern approaches to bounding quantities like {D(N)} in harmonic analysis rely on some sort of induction on scales approach in which {D(N)} is bounded using quantities such as {D(N^\theta)} for some exponents {0 < \theta < 1}. For instance, suppose one is somehow able to establish the inequality

\displaystyle  D(N) \lessapprox D(\sqrt{N}) \ \ \ \ \ (4)

for all {N \geq 1}, and suppose that {D} is also known to be of polynomial size. Then this implies that {D} has a subpolynomial upper bound. Indeed, one can iterate this inequality to show that

\displaystyle  D(N) \lessapprox D(N^{1/2^k})

for any fixed {k}; using the polynomial size hypothesis one thus has

\displaystyle  D(N) \lessapprox N^{C/2^k}

for some constant {C} independent of {k}. As {k} can be arbitrarily large, we conclude that {D(N) \lesssim N^\varepsilon} for any {\varepsilon>0}, and hence {D} is of subpolynomial size. (This sort of iteration is used for instance in my paper with Bennett and Carbery to derive the multilinear restriction theorem from the multilinear Kakeya theorem.)
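
To see the numerology concretely, here is a toy Python computation of this iteration (an illustrative sketch only, not part of the argument above): a single parameter eps stands in for the {N^{o(1)}} losses implicit in each application of {\lessapprox}, and C is the exponent from the polynomial size hypothesis, applied at the final scale {N^{1/2^k}}.

# Toy exponent bookkeeping for iterating D(N) <~ N^eps * D(sqrt(N)) k times.
# "eps" models the N^{o(1)} loss per step; C models the polynomial-size bound
# D(M) <= M^C, applied at the final scale M = N^(1/2^k).  All numerical values
# are illustrative assumptions, not quantities from the post.
def exponent_after_iterations(C, eps, k):
    base = C / 2**k                                  # from D(N^(1/2^k)) <= N^(C/2^k)
    losses = eps * sum(2.0**(-j) for j in range(k))  # N^eps loss incurred at scale N^(1/2^j)
    return base + losses                             # bounded by C/2^k + 2*eps

for k in (1, 5, 10, 20):
    print(k, exponent_after_iterations(C=10.0, eps=1e-3, k=k))
# as k grows the bound tends to 2*eps, and eps > 0 was arbitrary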

Exercise 7 If {D} is of polynomial size, and obeys the inequality

\displaystyle  D(N) \lessapprox D(N^{1-\varepsilon}) + N^{O(\varepsilon)}

for any fixed {\varepsilon>0}, where the implied constant in the {O(\varepsilon)} notation is independent of {\varepsilon}, show that {D} has a subpolynomial upper bound. This type of inequality is used to equate various linear estimates in harmonic analysis with their multilinear counterparts; see for instance this paper of myself, Vargas, and Vega for an early example of this method.

In more recent years, more sophisticated induction on scales arguments have emerged in which one or more auxiliary quantities besides {D(N)} also come into play. Here is one example, this time being an abstraction of a short proof of the multilinear Kakeya inequality due to Guth. Let {D(N)} be the quantity in Example 3. We define {D(N,M)} similarly to {D(N)} for any {M \geq 1}, except that we now also require that the diameter of each set {S_i} is at most {1/M}. One can then observe the following estimates:

  • For any {N \geq M \geq 1}, one has

\displaystyle  D(N) \lessapprox M^{O(1)} D(N,M) \lessapprox M^{O(1)} D(N). \ \ \ \ \ (5)

  • For any {N_1, N_2 \geq 1} and {M \geq 1}, one has

\displaystyle  D(N_1 N_2, M) \lessapprox D(N_1, M) D(N_2, M). \ \ \ \ \ (6)

  • For any {N \geq 1}, one has

\displaystyle  D(N, N) \lessapprox 1. \ \ \ \ \ (7)

These inequalities now imply that {D} has a subpolynomial upper bound, as we now demonstrate. Let {k} be a large natural number (independent of {N}) to be chosen later. From many iterations of (6) we have

\displaystyle  D(N, N^{1/k}) \lessapprox D(N^{1/k},N^{1/k})^k

and hence by (7) (with {N} replaced by {N^{1/k}}) and (5)

\displaystyle  D(N) \lessapprox N^{O(1/k)}

where the implied constant in the {O(1/k)} exponent does not depend on {k}. As {k} can be arbitrarily large, the claim follows. We remark that a nearly identical scheme lets one deduce decoupling estimates for the three-dimensional cone from that of the two-dimensional paraboloid; see the final section of this paper of Bourgain and Demeter.

Now we give a slightly more sophisticated example, abstracted from the proof of {L^p} decoupling of the paraboloid by Bourgain and Demeter, as described in this study guide after specialising the dimension to {2} and the exponent {p} to the endpoint {p=6} (the argument is also more or less summarised in this previous post). (In the cited papers, the argument was phrased only for the non-endpoint case {p<6}, but it has been observed independently by many experts that the argument extends with only minor modifications to the endpoint {p=6}.) Here we have a quantity {D_p(N)} that we wish to show is of subpolynomial size. For any {0 < \varepsilon < 1} and {0 \leq u \leq 1}, one can define an auxiliary quantity {A_{p,u,\varepsilon}(N)}. The precise definitions of {D_p(N)} and {A_{p,u,\varepsilon}(N)} are given in the study guide (where they are called {\mathrm{Dec}_2(1/N,p)} and {A_p(u, B(0,N^2), u, g)} respectively, setting {\delta = 1/N} and {\nu = \delta^\varepsilon}) but will not be of importance to us for this discussion. Suffice to say that the following estimates are known:

  • For any {0 < \varepsilon < 1} and any sufficiently small {u > 0}, one has

\displaystyle  D_p(N) \lessapprox D_p(N^{1-\varepsilon}) + N^{O(\varepsilon)+O(u)} A_{p,u,\varepsilon}(N). \ \ \ \ \ (8)

  • One has the crude bound

\displaystyle  A_{p,u,\varepsilon}(N) \lessapprox N^{O(\varepsilon)+O(u)} D_p(N). \ \ \ \ \ (9)

  • For any sufficiently small {u > 0}, one has

\displaystyle  A_{p,u,\varepsilon}(N) \lessapprox N^{O(\varepsilon)} A_{p,2u,\varepsilon}(N)^{1/2} D_p(N^{1-u})^{1/2}. \ \ \ \ \ (10)

In all of these bounds, the implied constants in exponents such as {O(\varepsilon)} or {O(u)} are independent of {\varepsilon} and {u}, although the implied constants in the {\lessapprox} notation can depend on both {\varepsilon} and {u}. Here we gloss over an annoying technicality: quantities such as {N^{1-\varepsilon}}, {N^{1-u}}, or {N^u} might not be integers (and might not divide evenly into {N}), as is needed for the application to decoupling theorems; this can be resolved by restricting the scales involved to powers of two and restricting the values of {\varepsilon, u} to certain rational values, which introduces some complications to the arguments below that we shall simply ignore, as they do not significantly affect the numerology.

It turns out that these estimates imply that {D_p(N)} is of subpolynomial size. We give the argument as follows. As {D_p(N)} is known to be of polynomial size, there is some {\eta>0} for which one has the bound

\displaystyle  D_p(N) \lessapprox N^\eta \ \ \ \ \ (11)

for all {N}. We can pick {\eta} to be the minimal exponent for which this bound is attained: thus

\displaystyle  \eta = \limsup_{N \rightarrow \infty} \frac{\log D_p(N)}{\log N}. \ \ \ \ \ (12)

We will call this the upper exponent of {D_p(N)}. We need to show that {\eta \leq 0}. We assume for contradiction that {\eta > 0}. Let {\varepsilon>0} be a sufficiently small quantity depending on {\eta} to be chosen later. From (10) we then have

\displaystyle  A_{p,u,\varepsilon}(N) \lessapprox N^{O(\varepsilon)} A_{p,2u,\varepsilon}(N)^{1/2} N^{\eta (\frac{1}{2} - \frac{u}{2})}

for any sufficiently small {u}. A routine iteration then gives

\displaystyle  A_{p,u,\varepsilon}(N) \lessapprox N^{O(\varepsilon)} A_{p,2^k u,\varepsilon}(N)^{1/2^k} N^{\eta (1 - \frac{1}{2^k} - k\frac{u}{2})}

for any {k \geq 1} that is independent of {N}, if {u} is sufficiently small depending on {k}. A key point here is that the implied constant in the exponent {O(\varepsilon)} is uniform in {k} (the constant comes from summing a convergent geometric series). We now use the crude bound (9) followed by (11) and conclude that

\displaystyle  A_{p,u,\varepsilon}(N) \lessapprox N^{\eta (1 - k\frac{u}{2}) + O(\varepsilon) + O(u)}.

Applying (8) we then have

\displaystyle  D_p(N) \lessapprox N^{\eta(1-\varepsilon)} + N^{\eta (1 - k\frac{u}{2}) + O(\varepsilon) + O(u)}.

If we choose {k} sufficiently large depending on {\eta} (which was assumed to be positive), then the negative term {-\eta k \frac{u}{2}} will dominate the {O(u)} term. If we then pick {u} sufficiently small depending on {k}, then finally {\varepsilon} sufficiently small depending on all previous quantities, we will obtain {D_p(N) \lessapprox N^{\eta'}} for some {\eta'} strictly less than {\eta}, contradicting the definition of {\eta}. Thus {\eta} cannot be positive, and hence {D_p(N)} has a subpolynomial upper bound as required.
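
Here is a toy Python bookkeeping of this "first {k}, then {u}, then {\varepsilon}" selection (an illustrative sketch only; the single constant C below is an assumed stand-in for all the {O(\varepsilon)} and {O(u)} losses, and the numerical values are arbitrary).

import math

# We want  eta*(1 - k*u/2) + C*eps + C*u  to be strictly less than eta when eta > 0.
# The choices below follow the order of the argument above: k first, then u, then eps.
def final_exponent(eta, C, k, u, eps):
    return eta * (1 - k * u / 2) + C * eps + C * u

eta, C = 0.3, 10.0                  # assumed values, for illustration only
k = math.ceil(4 * C / eta) + 1      # k large enough that eta*k/2 > 2*C
u = 1.0 / k**2                      # then u small depending on k
eps = u / 2                         # then eps small depending on u (and k)
print(final_exponent(eta, C, k, u, eps), "<", eta)   # a strictly smaller exponent eta'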

Exercise 8 Show that one still obtains a subpolynomial upper bound if the estimate (10) is replaced with

\displaystyle  A_{p,u,\varepsilon}(N) \lessapprox N^{O(\varepsilon)} A_{p,2u,\varepsilon}(N)^{1-\theta} D_p(N)^{\theta}

for some constant {0 \leq \theta < 1/2}, so long as we also improve (9) to

\displaystyle  A_{p,u,\varepsilon}(N) \lessapprox N^{O(\varepsilon)} D_p(N^{1-u}).

(This variant of the argument lets one handle the non-endpoint cases {2 < p < 6} of the decoupling theorem for the paraboloid.)

To establish decoupling estimates for the moment curve, restricting to the endpoint case {p = d(d+1)} for sake of discussion, an even more sophisticated induction on scales argument was deployed by Bourgain, Demeter, and Guth. The proof is discussed in this previous blog post, but let us just describe an abstract version of the induction on scales argument. To bound the quantity {D_p(N) = D_{d(d+1)}(N)}, some auxiliary quantities {A_{t,q,s,\varepsilon}(N)} are introduced for various exponents {1 \leq t \leq \infty} and {0 \leq q,s \leq 1} and {\varepsilon>0}, with the following bounds:

It is now substantially less obvious that these estimates can be combined to demonstrate that {D(N)} is of subpolynomial size; nevertheless this can be done. A somewhat complicated arrangement of the argument (involving some rather unmotivated choices of expressions to induct over) appears in my previous blog post; I give an alternate proof later in this post.

These examples indicate a general strategy to establish that some quantity {D(N)} is of subpolynomial size, by

  • (i) Introducing some family of related auxiliary quantities, often parameterised by several further parameters;
  • (ii) establishing as many bounds between these quantities and the original quantity {D(N)} as possible; and then
  • (iii) appealing to some sort of “induction on scales” to conclude.

The first two steps (i), (ii) depend very much on the harmonic analysis nature of the quantities {D(N)} and the related auxiliary quantities, and the estimates in (ii) will typically be proven from various harmonic analysis inputs such as Hölder’s inequality, rescaling arguments, decoupling estimates, or Kakeya type estimates. The final step (iii) requires no knowledge of where these quantities come from in harmonic analysis, but the iterations involved can become extremely complicated.

In this post I would like to observe that one can clean up and make more systematic this final step (iii) by passing to upper exponents (12) to eliminate the role of the parameter {N} (and also “tropicalising” all the estimates), and then taking similar limit superiors to eliminate some other less important parameters, until one is left with a simple linear programming problem (which, among other things, could be amenable to computer-assisted proving techniques). This method is analogous to that of passing to a simpler asymptotic limit object in many other areas of mathematics (for instance using the Furstenberg correspondence principle to pass from a combinatorial problem to an ergodic theory problem, as discussed in this previous post). We use the limit superior exclusively in this post, but many of the arguments here would also apply with one of the other generalised limit functionals discussed in this previous post, such as ultrafilter limits.

For instance, if {\eta} is the upper exponent of a quantity {D(N)} of polynomial size obeying (4), then by comparing the upper exponents of both sides of (4) one arrives at the scalar inequality

\displaystyle  \eta \leq \frac{1}{2} \eta

from which it is immediate that {\eta \leq 0}, giving the required subpolynomial upper bound. Notice how the passage to upper exponents converts the {\lessapprox} estimate to a simpler inequality {\leq}.

Exercise 9 Repeat Exercise 7 using this method.

Similarly, given the quantities {D(N,M)} obeying the axioms (5), (6), (7), and assuming that {D(N)} is of polynomial size (which is easily verified for the application at hand), we see that for any real numbers {a, u \geq 0}, the quantity {D(N^a,N^u)} is also of polynomial size and hence has some upper exponent {\eta(a,u)}; meanwhile {D(N)} itself has some upper exponent {\eta}. By reparameterising we have the homogeneity

\displaystyle  \eta(\lambda a, \lambda u) = \lambda \eta(a,u)

for any {\lambda \geq 0}. Also, comparing the upper exponents of both sides of the axioms (5), (6), (7) we arrive at the inequalities

\displaystyle  \eta(1,u) = \eta + O(u)

\displaystyle  \eta(a_1+a_2,u) \leq \eta(a_1,u) + \eta(a_2,u)

\displaystyle  \eta(1,1) \leq 0.

For any natural number {k}, the third inequality combined with homogeneity gives {\eta(1/k,1/k) \leq 0}, which when combined with the second inequality gives {\eta(1,1/k) \leq k \eta(1/k,1/k) \leq 0}, which on combination with the first estimate gives {\eta \leq O(1/k)}. Sending {k} to infinity we obtain {\eta \leq 0} as required.

Now suppose that {D_p(N)}, {A_{p,u,\varepsilon}(N)} obey the axioms (8), (9), (10). For any fixed {u,\varepsilon}, the quantity {A_{p,u,\varepsilon}(N)} is of polynomial size (thanks to (9) and the polynomial size of {D_6}), and hence has some upper exponent {\eta(u,\varepsilon)}; similarly {D_p(N)} has some upper exponent {\eta}. (Actually, strictly speaking our axioms only give an upper bound on {A_{p,u,\varepsilon}} so we have to temporarily admit the possibility that {\eta(u,\varepsilon)=-\infty}, though this will soon be eliminated anyway.) Taking upper exponents of all the axioms we then conclude that

\displaystyle  \eta \leq \max( (1-\varepsilon) \eta, \eta(u,\varepsilon) + O(\varepsilon) + O(u) ) \ \ \ \ \ (20)

\displaystyle  \eta(u,\varepsilon) \leq \eta + O(\varepsilon) + O(u)

\displaystyle  \eta(u,\varepsilon) \leq \frac{1}{2} \eta(2u,\varepsilon) + \frac{1}{2} \eta (1-u) + O(\varepsilon)

for all {0 \leq u \leq 1} and {0 \leq \varepsilon \leq 1}.

Assume for contradiction that {\eta>0}; then {(1-\varepsilon) \eta < \eta}, and so the statement (20) simplifies to

\displaystyle  \eta \leq \eta(u,\varepsilon) + O(\varepsilon) + O(u).

At this point we can eliminate the role of {\varepsilon} and simplify the system by taking a second limit superior. If we write

\displaystyle  \eta(u) := \limsup_{\varepsilon \rightarrow 0} \eta(u,\varepsilon)

then on taking limit superiors of the previous inequalities we conclude that

\displaystyle  \eta(u) \leq \eta + O(u)

\displaystyle  \eta(u) \leq \frac{1}{2} \eta(2u) + \frac{1}{2} \eta (1-u) \ \ \ \ \ (21)

\displaystyle  \eta \leq \eta(u) + O(u)

for all {u}; in particular {\eta(u) = \eta + O(u)}. We take advantage of this by taking a further limit superior (or “upper derivative”) in the limit {u \rightarrow 0} to eliminate the role of {u} and simplify the system further. If we define

\displaystyle  \alpha := \limsup_{u \rightarrow 0^+} \frac{\eta(u)-\eta}{u},

so that {\alpha} is the best constant for which {\eta(u) \leq \eta + \alpha u + o(u)} as {u \rightarrow 0}, then {\alpha} is finite. Inserting this “Taylor expansion” into the right-hand side of (21), that side becomes at most {\frac{1}{2}(\eta + 2\alpha u + o(u)) + \frac{1}{2} \eta (1-u) = \eta + \alpha u - \frac{1}{2} \eta u + o(u)}; comparing with the definition of {\alpha}, we conclude that

\displaystyle  \alpha \leq \alpha - \frac{1}{2} \eta.

This leads to a contradiction when {\eta>0}, and hence {\eta \leq 0} as desired.

Exercise 10 Redo Exercise 8 using this method.

The same strategy now clarifies how to proceed with the more complicated system of quantities {A_{t,q,s,\varepsilon}(N)} obeying the axioms (13)–(19) with {D_p(N)} of polynomial size. Let {\eta} be the upper exponent of {D_p(N)}. From (14) we see that for fixed {t,q,s,\varepsilon}, each {A_{t,q,s,\varepsilon}(N)} is also of polynomial size (at least in upper bound) and so has some upper exponent {a( t,q,s,\varepsilon)} (which for now we can permit to be {-\infty}). Taking upper exponents of all the various axioms we can now eliminate {N} and arrive at the simpler axioms

\displaystyle  \eta \leq \max( (1-\varepsilon) \eta, a(t,q,s,\varepsilon) + O(\varepsilon) + O(q) + O(s) )

\displaystyle  a(t,q,s,\varepsilon) \leq \eta + O(\varepsilon) + O(q) + O(s)

\displaystyle  a(t_0,q,s,\varepsilon) \leq a(t_1,q,s,\varepsilon) + O(\varepsilon)

\displaystyle  a(t_\theta,q,s,\varepsilon) \leq (1-\theta) a(t_0,q,s,\varepsilon) + \theta a(t_1,q,s,\varepsilon) + O(\varepsilon)

\displaystyle  a(d(d+1),q,s,\varepsilon) \leq \eta(1-q) + O(\varepsilon)

for all {0 \leq q,s \leq 1}, {1 \leq t \leq \infty}, {1 \leq t_0 \leq t_1 \leq \infty} and {0 \leq \theta \leq 1}, with the lower dimensional decoupling inequality

\displaystyle  a(k(k+1),q,s,\varepsilon) \leq a(k(k+1),s/k,s,\varepsilon) + O(\varepsilon)

for {1 \leq k \leq d-1} and {q \leq s/k}, and the multilinear Kakeya inequality

\displaystyle  a(k(d+1),q,kq,\varepsilon) \leq a(k(d+1),q,(k+1)q,\varepsilon)

for {1 \leq k \leq d-1} and {0 \leq q \leq 1}.

As before, if we assume for sake of contradiction that {\eta>0} then the first inequality simplifies to

\displaystyle  \eta \leq a(t,q,s,\varepsilon) + O(\varepsilon) + O(q) + O(s).

We can then again eliminate the role of {\varepsilon} by taking a second limit superior as {\varepsilon \rightarrow 0}, introducing

\displaystyle  a(t,q,s) := \limsup_{\varepsilon \rightarrow 0} a(t,q,s,\varepsilon)

and thus getting the simplified axiom system

\displaystyle  a(t,q,s) \leq \eta + O(q) + O(s) \ \ \ \ \ (22)

\displaystyle  a(t_0,q,s) \leq a(t_1,q,s)

\displaystyle  a(t_\theta,q,s) \leq (1-\theta) a(t_0,q,s) + \theta a(t_1,q,s)

\displaystyle  a(d(d+1),q,s) \leq \eta(1-q)

\displaystyle  \eta \leq a(t,q,s) + O(q) + O(s) \ \ \ \ \ (23)

and also

\displaystyle  a(k(k+1),q,s) \leq a(k(k+1),s/k,s)

for {1 \leq k \leq d-1} and {q \leq s/k}, and

\displaystyle  a(k(d+1),q,kq) \leq a(k(d+1),q,(k+1)q)

for {1 \leq k \leq d-1} and {0 \leq q \leq 1}.

In view of the latter two estimates it is natural to restrict attention to the quantities {a(t,q,kq)} for {1 \leq k \leq d+1}. By the axioms (22), (23), these quantities are of the form {\eta + O(q)}. We can then eliminate the role of {q} by taking another limit superior

\displaystyle  \alpha_k(t) := \limsup_{q \rightarrow 0} \frac{a(t,q,kq)-\eta}{q}.

The axioms now simplify to

\displaystyle  \alpha_k(t) = O(1)

\displaystyle  \alpha_k(t_0) \leq \alpha_k(t_1) \ \ \ \ \ (24)

\displaystyle  \alpha_k(t_\theta) \leq (1-\theta) \alpha_k(t_0) + \theta \alpha_k(t_1) \ \ \ \ \ (25)

\displaystyle  \alpha_k(d(d+1)) \leq -\eta \ \ \ \ \ (26)

and

\displaystyle  \alpha_j(k(k+1)) \leq \frac{j}{k} \alpha_k(k(k+1)) \ \ \ \ \ (27)

for {1 \leq k \leq d-1} and {k \leq j \leq d}, and

\displaystyle  \alpha_k(k(d+1)) \leq \alpha_{k+1}(k(d+1)) \ \ \ \ \ (28)

for {1 \leq k \leq d-1}.

It turns out that the inequality (27) is strongest when {j=k+1}, thus

\displaystyle  \alpha_{k+1}(k(k+1)) \leq \frac{k+1}{k} \alpha_k(k(k+1)) \ \ \ \ \ (29)

for {1 \leq k \leq d-1}.

From the last two inequalities (28), (29) we see that a special role is likely to be played by the exponents

\displaystyle  \beta_k := \alpha_k(k(k-1))

for {2 \leq k \leq d} and

\displaystyle \gamma_k := \alpha_k(k(d+1))

for {1 \leq k \leq d}. From the convexity (25) and a brief calculation we have

\displaystyle  \alpha_{k+1}(k(d+1)) \leq \frac{1}{d-k+1} \alpha_{k+1}(k(k+1))

\displaystyle + \frac{d-k}{d-k+1} \alpha_{k+1}((k+1)(d+1)),

for {1 \leq k \leq d-1}, hence from (28) we have

\displaystyle  \gamma_k \leq \frac{1}{d-k+1} \beta_{k+1} + \frac{d-k}{d-k+1} \gamma_{k+1}. \ \ \ \ \ (30)

Similarly, from (25) and a brief calculation we have

\displaystyle  \alpha_k(k(k+1)) \leq \frac{(d-k)(k-1)}{(k+1)(d-k+2)} \alpha_k( k(k-1))

\displaystyle  + \frac{2(d+1)}{(k+1)(d-k+2)} \alpha_k(k(d+1))

for {2 \leq k \leq d-1}; the same bound holds for {k=1} if we drop the term with the {(k-1)} factor, thanks to (24). Thus from (29) we have

\displaystyle  \beta_{k+1} \leq \frac{(d-k)(k-1)}{k(d-k+2)} \beta_k + \frac{2(d+1)}{k(d-k+2)} \gamma_k, \ \ \ \ \ (31)

for {1 \leq k \leq d-1}, again with the understanding that we omit the first term on the right-hand side when {k=1}. Finally, (26) gives

\displaystyle  \gamma_d \leq -\eta.

Let us write out the system of inequalities we have obtained in full:

\displaystyle  \beta_2 \leq 2 \gamma_1 \ \ \ \ \ (32)

\displaystyle  \gamma_1 \leq \frac{1}{d} \beta_2 + \frac{d-1}{d} \gamma_2 \ \ \ \ \ (33)

\displaystyle  \beta_3 \leq \frac{d-2}{2d} \beta_2 + \frac{2(d+1)}{2d} \gamma_2 \ \ \ \ \ (34)

\displaystyle  \gamma_2 \leq \frac{1}{d-1} \beta_3 + \frac{d-2}{d-1} \gamma_3 \ \ \ \ \ (35)

\displaystyle  \beta_4 \leq \frac{2(d-3)}{3(d-1)} \beta_3 + \frac{2(d+1)}{3(d-1)} \gamma_3

\displaystyle  \gamma_3 \leq \frac{1}{d-2} \beta_4 + \frac{d-3}{d-2} \gamma_4

\displaystyle  ...

\displaystyle  \beta_d \leq \frac{d-2}{(d-1) 3} \beta_{d-1} + \frac{2(d+1)}{(d-1) 3} \gamma_{d-1}

\displaystyle  \gamma_{d-1} \leq \frac{1}{2} \beta_d + \frac{1}{2} \gamma_d \ \ \ \ \ (36)

\displaystyle  \gamma_d \leq -\eta. \ \ \ \ \ (37)

We can then eliminate the variables one by one. Inserting (33) into (32) we obtain

\displaystyle  \beta_2 \leq \frac{2}{d} \beta_2 + \frac{2(d-1)}{d} \gamma_2

which simplifies to

\displaystyle  \beta_2 \leq \frac{2(d-1)}{d-2} \gamma_2.

Inserting this into (34) gives

\displaystyle  \beta_3 \leq 2 \gamma_2

which when combined with (35) gives

\displaystyle  \beta_3 \leq \frac{2}{d-1} \beta_3 + \frac{2(d-2)}{d-1} \gamma_3

which simplifies to

\displaystyle  \beta_3 \leq \frac{2(d-2)}{d-3} \gamma_3.

Iterating this we get

\displaystyle  \beta_{k+1} \leq 2 \gamma_k

for all {1 \leq k \leq d-1} and

\displaystyle  \beta_k \leq \frac{2(d-k+1)}{d-k} \gamma_k

for all {2 \leq k \leq d-1}. In particular

\displaystyle  \beta_d \leq 2 \gamma_{d-1}

which on insertion into (36), (37) gives

\displaystyle  \beta_d \leq \beta_d - \eta

which is absurd if {\eta>0}. Thus {\eta \leq 0} and so {D_p(N)} must be of subpolynomial growth.
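
As suggested earlier in the post, this final elimination is really a small linear programming problem, and it can be checked by computer. The following Python sketch uses one possible encoding of (32)–(37) (the variable ordering and the box bound below are choices made purely for illustration): it maximises {\eta} subject to those inequalities, with the {O(1)} bound on the exponents modelled by a box constraint {[-B,B]}. The reported maximum should be {0}, up to solver tolerance, in every dimension, consistent with the conclusion {\eta \leq 0}.

import numpy as np
from scipy.optimize import linprog

def max_eta(d, B=1.0):
    # Variables: beta_2..beta_d, gamma_1..gamma_d, eta (2d variables in total).
    # Each inequality of (32)-(37) is encoded as a row r with r.x <= 0; the box
    # [-B, B] models the O(1) bound on the exponents (the value B = 1 is a choice).
    nb = d - 1
    n = nb + d + 1
    bi = lambda k: k - 2            # index of beta_k,  2 <= k <= d
    gi = lambda k: nb + (k - 1)     # index of gamma_k, 1 <= k <= d
    ei = n - 1                      # index of eta
    rows = []

    r = np.zeros(n); r[bi(2)] = 1; r[gi(1)] = -2; rows.append(r)           # (32)
    for k in range(2, d):                                                  # (31)
        r = np.zeros(n)
        r[bi(k + 1)] = 1
        r[bi(k)] = -(d - k) * (k - 1) / (k * (d - k + 2))
        r[gi(k)] = -2 * (d + 1) / (k * (d - k + 2))
        rows.append(r)
    for k in range(1, d):                                                  # (30)
        r = np.zeros(n)
        r[gi(k)] = 1
        r[bi(k + 1)] = -1 / (d - k + 1)
        r[gi(k + 1)] = -(d - k) / (d - k + 1)
        rows.append(r)
    r = np.zeros(n); r[gi(d)] = 1; r[ei] = 1; rows.append(r)               # (37)

    c = np.zeros(n); c[ei] = -1.0   # maximise eta by minimising -eta
    res = linprog(c, A_ub=np.vstack(rows), b_ub=np.zeros(len(rows)),
                  bounds=[(-B, B)] * n, method="highs")
    return -res.fun

for d in (2, 3, 4, 6, 10):
    print(d, max_eta(d))            # expect 0.0 (up to tolerance) for every d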

Remark 11 (This observation is essentially due to Heath-Brown.) If we let {x} denote the column vector with entries {\beta_2,\dots,\beta_d,\gamma_1,\dots,\gamma_{d-1}} (arranged in whatever order one pleases), then the above system of inequalities (32)–(36) (using (37) to handle the appearance of {\gamma_d} in (36)) reads

\displaystyle  x \leq Px + \eta v \ \ \ \ \ (38)

for some explicit square matrix {P} with non-negative coefficients, where the inequality denotes pointwise domination, and {v} is an explicit vector with non-positive coefficients that reflects the effect of (37). It is possible to show (using (24), (26)) that all the coefficients of {x} are negative (assuming the counterfactual situation {\eta>0} of course). Then we can iterate this to obtain

\displaystyle  x \leq P^k x + \eta \sum_{j=0}^{k-1} P^j v

for any natural number {k}. This would lead to an immediate contradiction if the Perron-Frobenius eigenvalue of {P} exceeds {1} because {P^k x} would now grow exponentially; this is typically the situation for “non-endpoint” applications such as proving decoupling inequalities away from the endpoint. In the endpoint situation discussed above, the Perron-Frobenius eigenvalue is {1}, with {v} having a non-trivial projection to this eigenspace, so the sum {\sum_{j=0}^{k-1} \eta P^j v} now grows at least linearly, which still gives the required contradiction for any {\eta>0}. So it is important to gather “enough” inequalities so that the relevant matrix {P} has a Perron-Frobenius eigenvalue greater than or equal to {1} (and in the latter case one needs non-trivial injection of an induction hypothesis into an eigenspace corresponding to an eigenvalue {1}). More specifically, if {\rho} is the spectral radius of {P} and {w^T} is a left Perron-Frobenius eigenvector, that is to say a non-negative vector, not identically zero, such that {w^T P = \rho w^T}, then by taking inner products of (38) with {w} we obtain

\displaystyle  w^T x \leq \rho w^T x + \eta w^T v.

If {\rho > 1} this leads to a contradiction since {w^T x} is negative and {w^T v} is non-positive. When {\rho = 1} one still gets a contradiction as long as {w^T v} is strictly negative.

Remark 12 (This calculation is essentially due to Guo and Zorin-Kranich.) Here is a concrete application of the Perron-Frobenius strategy outlined above to the system of inequalities (32)–(37). Consider the weighted sum

\displaystyle  W := \sum_{k=2}^d (k-1) \beta_k + \sum_{k=1}^{d-1} 2k \gamma_k;

I had secretly calculated the weights {k-1}, {2k} as coming from the left Perron-Frobenius eigenvector of the matrix {P} described in the previous remark, but for this calculation the precise provenance of the weights is not relevant. Applying the inequalities (31), (30) we see that {W} is bounded by

\displaystyle  \sum_{k=2}^d (k-1) (\frac{(d-k+1)(k-2)}{(k-1)(d-k+3)} \beta_{k-1} + \frac{2(d+1)}{(k-1)(d-k+3)} \gamma_{k-1})

\displaystyle  + \sum_{k=1}^{d-1} 2k(\frac{1}{d-k+1} \beta_{k+1} + \frac{d-k}{d-k+1} \gamma_{k+1})

(with the convention that the {\beta_1} term is absent); this simplifies after some calculation to the bound

\displaystyle  W \leq W + \frac{1}{2} \gamma_d

and this and (37) then leads to the required contradiction.
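
These weights can also be checked numerically. The Python sketch below uses one concrete encoding of (30), (31) and (36), under the index conventions of Remark 11 and with {\gamma_d} folded into the vector {v}; it verifies that the vector with entry {k-1} on the {\beta_k} coordinate and {2k} on the {\gamma_k} coordinate is a left eigenvector of {P} with eigenvalue {1}, which is then necessarily the Perron-Frobenius eigenvalue since that vector is strictly positive.

import numpy as np

def P_and_weights(d):
    # Coordinates: beta_2..beta_d followed by gamma_1..gamma_{d-1}, i.e. 2(d-1) in total.
    # The encoding of (30), (31), (36) below is one concrete choice consistent with
    # Remark 11; gamma_d does not appear as a coordinate (it sits in the vector v).
    n = 2 * (d - 1)
    bi = lambda k: k - 2                  # beta_k,  2 <= k <= d
    gi = lambda k: (d - 1) + (k - 1)      # gamma_k, 1 <= k <= d-1
    P = np.zeros((n, n))
    P[bi(2), gi(1)] = 2.0                                        # (32)
    for k in range(2, d):                                        # (31)
        P[bi(k + 1), bi(k)] = (d - k) * (k - 1) / (k * (d - k + 2))
        P[bi(k + 1), gi(k)] = 2 * (d + 1) / (k * (d - k + 2))
    for k in range(1, d - 1):                                    # (30), for k <= d-2
        P[gi(k), bi(k + 1)] = 1 / (d - k + 1)
        P[gi(k), gi(k + 1)] = (d - k) / (d - k + 1)
    P[gi(d - 1), bi(d)] = 0.5                                    # (36), gamma_d moved into v
    w = np.array([k - 1 for k in range(2, d + 1)] + [2 * k for k in range(1, d)], float)
    return P, w

for d in (2, 3, 5, 8):
    P, w = P_and_weights(d)
    rho = max(abs(np.linalg.eigvals(P)))
    print(d, round(rho, 6), np.allclose(w @ P, w))               # expect rho ~ 1 and True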

Exercise 13

  • (i) Extend the above analysis to also cover the non-endpoint case {d^2 < p < d(d+1)}. (One will need to establish the claim {\alpha_k(t) \leq -\eta} for {t \leq p}.)
  • (ii) Modify the argument to deal with the remaining cases {2 < p \leq d^2} by dropping some of the steps.

David Hogg: SDSS-V review, day 2

Today was day two of the SDSS-V Multi-object spectroscopy review. We heard about the spectrographs (APOGEE and BOSS), the full software stack, observatory staffing, and had an extremely good discussion of project management and systems engineering. On this latter point, we discussed the issue that scientists in academic collaborations tend to see the burdens of documenting requirements and interfaces as interfering with their work. Project management sees these things as helping get the work done, and on time and on budget. We discussed some of the ways we might get more of the project—and more of the scientific community—to see the systems-engineering point of view.

The panel spent much of the day working on our report and giving feedback to the team. I am so honored to be a (peripheral) part of this project. It is an incredible set of sub-projects and sub-systems being put together by a dream team of excellent people. And the excellence of the people cuts across all levels of seniority and all backgrounds. My day ended with conversations about how we can word our toughest recommendations so that they will constructively help the project.

One theme of the day is education: We are educators, working on a big project. Part of what we are doing is helping our people to learn, and helping the whole community to learn. And that learning is not just about astronomy. It is about hardware, engineering, documentation, management, and (gasp) project reviewing. That's an interesting lens through which to see all this stuff. I love my job!

Backreaction: No, a next larger particle collider will not tell us anything about the creation of the universe

LHC magnets. Image: CERN. A few days ago, Scientific American ran a piece by a CERN physicist and a philosopher about particle physicists’ plans to spend $20 billion on a next larger particle collider, the Future Circular Collider (FCC). To make their case, the authors have dug up a quote from 1977 and ignored the 40 years after this, which is a truly excellent illustration of all that’s wrong

June 18, 2019

Backreaction: Brace for the oncoming deluge of dark matter detectors that won’t detect anything

Imagine an unknown disease spreads, causing temporary blindness. Most patients recover after a few weeks, but some never regain eyesight. Scientists rush to identify the cause. They guess the pathogen’s shape and, based on this, develop test strips and antigens. If one guess doesn’t work, they’ll move on to the next. Doesn’t quite sound right? Of course it does not. Trying to identify

June 17, 2019

John Baez: Applied Category Theory Meeting at UCR

 

The American Mathematical Society is having their Fall Western meeting here at U. C. Riverside during the weekend of November 9th and 10th, 2019. Joe Moeller and I are organizing a session on Applied Category Theory! We already have some great speakers lined up:

• Tai-Danae Bradley
• Vin de Silva
• Brendan Fong
• Nina Otter
• Evan Patterson
• Blake Pollard
• Prakash Panangaden
• David Spivak
• Brad Theilman
• Dmitry Vagner
• Zhenghan Wang

Alas, we have no funds for travel and lodging. If you’re interested in giving a talk, please submit an abstract here:

General information about abstracts, American Mathematical Society.

More precisely, please read the information there and then click on the link on that page to submit an abstract. It should then magically fly through the aether to me! Abstracts are due September 3rd, but the sooner you submit one, the greater the chance that we’ll have space.

For the program of the whole conference, go here:

Fall Western Sectional Meeting, U. C. Riverside, Riverside, California, 9–10 November 2019.

I will also be running a special meeting on diversity and excellence in mathematics on Friday November 8th. There will be a banquet that evening, and at some point I’ll figure out how tickets for that will work.

We had a special session like this in 2017, and it’s fun to think about how things have evolved since then.

David Spivak had already written Category Theory for the Sciences, but more recently he’s written another book on applied category theory, Seven Sketches, with Brendan Fong. He already had a company, but now he’s helping run Conexus, which plans to award grants of up to $1.5 million to startups that use category theory (in exchange for equity). Proposals are due June 30th, by the way!

I guess Brendan Fong was already working with David Spivak at MIT in the fall of 2017, but since then they’ve written Seven Sketches and developed a graphical calculus for logic in regular categories. He’s also worked on a functorial approach to machine learning—and now he’s using category theory to unify learners and lenses.

Blake Pollard had just finished his Ph.D. work at U.C. Riverside back in 2018. He will now talk about his work with Spencer Breiner and Eswaran Subrahmanian at the National Institute of Standards and Technology, using category theory to help develop the “smart grid”—the decentralized power grid we need now. Above he’s talking to Brendan Fong at the Centre for Quantum Technologies, in Singapore. I think that’s where they first met.

Nina Otter was a grad student at Oxford in 2017, but now she’s at UCLA and the University of Leipzig. She worked with Ulrike Tillmann and Heather Harrington on stratifying multiparameter persistent homology, and is now working on a categorical formulation of positional and role analysis in social networks. Like Brendan, she’s on the executive board of the applied category theory journal Compositionality.

I first met Tai-Danae Bradley at ACT2018. Now she will talk about her work at Tunnel Technologies, a startup run by her advisor John Terilla. They model sequences—of letters from an alphabet, for instance—using quantum states and tensor networks.

Vin de Silva works on topological data analysis using persistent cohomology so he’ll probably talk about that. He’s studied the “interleaving distance” between persistence modules, using category theory to treat it and the Gromov-Hausdorff metric in the same setting. He came to the last meeting and it will be good to have him back.

Evan Patterson is a statistics grad student at Stanford. He’s worked on knowledge representation in bicategories of relations, and on teaching machines to understand data science code by the semantic enrichment of dataflow graphs. He too came to the last meeting.

Dmitry Vagner was also at the last meeting, where he spoke about his work with Spivak on open dynamical systems and the operad of wiring diagrams. Now he is implementing wiring diagrams and a type-safe linear algebra library in Idris. The idea is to avoid problems that people currently run into a lot in TensorFlow (“ugh I have a 3 x 1 x 2 tensor but I need a 3 x 2 tensor”).

Prakash Panangaden has long been a leader in applied category theory, focused on semantics and logic for probabilistic systems and languages, machine learning, and quantum information theory.

Brad Theilman is a grad student in computational neuroscience at U.C. San Diego. I first met him at ACT2018. He’s using algebraic topology to design new techniques for quantifying the spatiotemporal structure of neural activity in the auditory regions of the brain of the European starling. (I bet you didn’t see those last two words coming!)

Last but not least, Zhenghan Wang works on condensed matter physics and modular tensor categories at U.C. Santa Barbara. At Microsoft’s Station Q, he is using this research to help design topological quantum computers.

In short: a lot has been happening in applied category theory, so it will be good to get together and talk about it!

David Hogg: SDSS-V review, day 1

Today was day one of a review of the SDSS-V Multi-object spectroscopy systems. This is not all of SDSS-V but it is a majority part. It includes the Milky Way Mapper and Black-Hole Mapper projects, two spectrographs (APOGEE and BOSS), two observatories (Apache Point and Las Campanas), and a robotic fiber-positioner system. Plus boatloads of software and operations challenges. I agreed to chair the review, so my job is to lead the writing of a report after we hear two days of detailed presentations on project sub-systems.

One of the reasons I love work like this is that I learn so much. And I love engineering. And indeed a lot of the interesting (to me) discussion today was about engineering requirements, documentation, and project design. These are not things we are traditionally taught as part of astronomy, but they are really important to all of the data we get and use. One of the things we discussed is that our telescopes have fixed focal planes and our spectrographs have fixed capacities, so it is important that the science requirements both flow down from important scientific objectives, and flow down to an achievable, schedulable operation, within budget.

There is too much to say in one blog post! But one thing that came up is fundraising: Why would an institution join the SDSS-V project when they know that we are paragons of open science and that, therefore, we will release all of our data and code publicly as we proceed? My answer is influence: The SDSS family of projects has been very good at adapting to the scientific interests of its members and collaborators, and especially weighting those adaptations in proportion to the amount that people are willing to do work. And the project has spare fibers and spare target-of-opportunity capacity! So you get a lot by buying into this project.

Related to this: This project is going to solve a set of problems in how we do massively multiplexed heterogeneous spectroscopic follow-up in a set of mixed time-domain and static target categories. These problems have not been solved previously!

David Hogg: words on a plane

I spent time today on an airplane, writing in the papers I am working on with Jessica Birky (UCSD) and Megan Bedell (Flatiron). And I read documents in preparation for the review of the SDSS-V Project that I am leading over the next two days in a Denver airport hotel.

Backreaction: Book review: “Einstein’s Unfinished Revolution” by Lee Smolin

Einstein’s Unfinished Revolution: The Search for What Lies Beyond the Quantum, by Lee Smolin. Penguin Press (April 9, 2019). Popular science books cover a spectrum from exposition to speculation. Some writers, like Chad Orzel or Anil Ananthaswamy, stay safely on the side of established science. Others, like Philip Ball in his recent book, keep their opinions to the closing chapter. I would place

June 16, 2019

n-Category Café: Applied Category Theory Meeting at UCR

The American Mathematical Society is having their Fall Western meeting here at U. C. Riverside during the weekend of November 9th and 10th, 2019. Joe Moeller and I are organizing a session on Applied Category Theory!

We already have some great speakers lined up:

  • Tai-Danae Bradley
  • Vin de Silva
  • Brendan Fong
  • Nina Otter
  • Evan Patterson
  • Blake Pollard
  • Prakash Panangaden
  • David Spivak
  • Brad Theilman
  • Dmitry Vagner
  • Zhenghan Wang

Alas, we have no funds for travel and lodging. If you’re interested in giving a talk, please submit an abstract here:

General information about abstracts, American Mathematical Society.

More precisely, please read the information there and then click on the link on that page to submit an abstract. It should then magically fly through the aether to me! Abstracts are due September 3rd, but the sooner you submit one, the greater the chance that we’ll have space.

For the program of the whole conference, go here:

Fall Western Sectional Meeting, U. C. Riverside, Riverside, California, 9–10 November 2019.

I will also be running a special meeting on diversity and excellence in mathematics on Friday November 8th. There will be a banquet that evening, and at some point I’ll figure out how tickets for that will work.

We had a special session like this in 2017, and it’s fun to think about how things have evolved since then. Blake Pollard, a grad student then, will now talk about his work at NIST with Spencer Breiner and Eswaran Subrahmanian using category theory to design things like power grids. Tai-Danae Bradley will talk about her work at Tunnel Technologies, a startup run by her advisor John Terilla. They model sequences—of letters from an alphabet, for instance—using quantum states and tensor networks. David Spivak already had a company, but now he’s helping run Conexus, which plans to award grants of up to $1.5 million to startups that use category theory (in exchange for equity). Proposals are due June 30th, by the way!

Brendan Fong was already working with David in 2017, but since then they’ve written a book on applied category theory, and he’s now working with David on Conexus. Nina Otter was a grad student at Oxford in 2017, but now she has jobs at UCLA and the University of Leipzig — and like Brendan, she’s on the executive board of the applied category theory journal Compositionality.

And so on. A lot has been happening, so it will be good to get together and talk about it.

David Hogg: spiral structure

This morning on my weekly call with Eilers (MPIA) we discussed the new scope of a paper about spiral and bar structure in the Milky Way disk. Back at the Gaia Sprint, we thought we had a big result: We thought we would be able to infer the locations of the spiral-arm over-densities from the velocity field. But it turned out that our simple picture was wrong (and in retrospect, it is obvious that it was). But Eilers has made beautiful visualizations of disk simulations by Tobias Buck (AIP), which show very similar velocity structure and for which we know the truth about the density structure. These visualizations say that there are relationships between the velocity structure and the density structure, but that the relationship evolves. We tried to write a sensible scope for the paper in this new context. There is still good science to do, because the structure we see is novel and spans much of the disk.

Matt Strassler: A Ring of Controversy Around a Black Hole Photo

[Note Added: Thanks to some great comments I’ve received, I’m continuing to add clarifying remarks to this post.  You’ll find them in green.]

It’s been a couple of months since the `photo’ (a false-color image created to show the intensity of radio waves, not visible light) of the black hole at the center of the galaxy M87, taken by the Event Horizon Telescope (EHT) collaboration, was made public. Before it was shown, I wrote an introductory post explaining what the ‘photo’ is and isn’t. There I cautioned readers that I thought it might be difficult to interpret the image, and controversies about it might erupt.

So far, the claim that the image shows the vicinity of M87’s black hole (which I’ll call `M87bh’ for short) has not been challenged, and I’m not expecting it to be. But what and where exactly is the material that is emitting the radio waves and thus creating the glow in the image? And what exactly determines the size of the dark region at the center of the image? These have been problematic issues from the beginning, but discussion is starting to heat up. And it’s important: it has implications for the measurement of the black hole’s mass (which EHT claims is that of 6.5 billion Suns, with an uncertainty of about 15%), and for any attempt to estimate its rotation rate.

Over the last few weeks I’ve spent some time studying the mathematics of spinning black holes, talking to my Harvard colleagues who are world experts on the relevant math and physics, and learning from colleagues who produced the `photo’ and interpreted it. So I think I can now clearly explain what most journalists and scientist-writers (including me) got wrong at the time of the photo’s publication, and clarify what the photo does and doesn’t tell us.

One note before I begin: this post is long. But it starts with a summary of the situation that you can read quickly, and then comes the long part: a step-by-step non-technical explanation of an important aspect of the black hole ‘photo’ that, to my knowledge, has not yet been given anywhere else.

[I am heavily indebted to Harvard postdocs Alex Lupsasca and Shahar Hadar for assisting me as I studied the formulas and concepts relevant for fast-spinning black holes. Much of what I learned comes from early 1970s papers, especially those by my former colleague Professor Jim Bardeen (see this one written with Press and Teukolsky), and from papers written in the last couple of years, especially this one by my present and former Harvard colleagues.]

What Does the EHT Image Show?

Scientists understand the black hole itself — the geometric dimple in space and time — pretty well. If one knows the mass and the rotation rate of the black hole, and assumes Einstein’s equations for gravity are mostly correct (for which we have considerable evidence, for example from LIGO measurements and elsewhere), then the equations tell us what the black hole does to space and time and how its gravity works.

But for the `photo’, ​that’s not enough information. We don’t get to observe the black hole itself (it’s black, after all!) What the `photo’ shows is a blurry ring of radio waves, emitted from hot material (a plasma of mostly electrons and protons) somewhere around the black hole — material whose location, velocity, and temperature we do not know. That material and its emission of radio waves are influenced by powerful gravitational forces (whose details depend on the rotation rate of the M87bh, which we don’t know yet) and powerful magnetic fields (whose details we hardly know at all.) The black hole’s gravity then causes the paths on which the radio waves travel to bend, even more than a glass lens will bend the path of visible light, so that where things appear in the ‘photo’ is not where they are actually located.

The only insights we have into this extreme environment come from computer simulations and a few other `photos’ at lower magnification. The simulations are based on well-understood equations, but the equations have to be solved approximately, using methods that may or may not be justified. And the simulations don’t tell you where the matter is; they tell you where the material will go, but only after you make a guess as to where it is located at some initial point in time. (In the same sense: computers can predict the national weather tomorrow only when you tell them what the national weather was yesterday.) No one knows for sure how accurate or misleading these simulations might be; they’ve been tested against some indirect measurements, but no one can say for sure what flaws they might have.

However, there is one thing we can certainly say, and it has just been said publicly in a paper by Samuel Gralla, Daniel Holz and Robert Wald.

Two months ago, when the EHT `photo’ appeared, it was widely reported in the popular press and on blogs that the photo shows the image of a photon sphere at the edge of the shadow of the M87bh. (Instead of `shadow’, I suggested the term ‘quasi-silhouette‘, which I viewed as somewhat less misleading to a non-expert.)

Unfortunately, it seems these statements are not true; and this was well-known to (but poorly communicated by, in my opinion) the EHT folks.  This lack of clarity might perhaps annoy some scientists and science-loving non-experts; but does this issue also matter scientifically? Gralla et al., in their new preprint, suggest that it does (though they were careful to not yet make a precise claim.)

The Photon Sphere Doesn’t Exist

Indeed, if you happened to be reading my posts carefully when the `photo’ first appeared, you probably noticed that I was quite vague about the photon-sphere — I never defined precisely what it was. You would have been right to read this as a warning sign, for indeed I wasn’t getting clear explanations of it from anyone. Studying the equations and conversing with expert colleagues, I soon learned why: for a rotating black hole, the photon sphere doesn’t really exist.

But let’s first define what the photon sphere is for a non-rotating black hole! Like the Earth’s equator, the photon sphere is a location, not an object. This location is the surface of an imaginary ball, lying well outside the black hole’s horizon. On the photon sphere, photons (the particles that make up light, radio waves, and all other electromagnetic waves) travel on special circular or spherical orbits around the black hole.

By contrast, a rotating black hole has a larger, broader `photon-zone’ where photons can have special orbits. But you won’t ever see the whole photon zone in any image of a rotating black hole. Instead, a piece of the photon zone will appear as a `photon ring‘, a bright and very thin loop of radio waves. However, the photon ring is not the edge of anything spherical, is generally not perfectly circular, and generally is not even perfectly centered on the black hole.

… and the Photon Ring Isn’t What We See…

It seems likely that the M87bh is rotating quite rapidly, so it has a photon-zone rather than a photon-sphere, and images of it will have a photon ring. Ok, fine; but then, can we interpret EHT’s `photo’ simply as showing the photon ring, blurred by the imperfections in the `telescope’? Although some of the EHT folks have seemed to suggest the answer is “yes”, Gralla et al. suggest the answer is likely “no” (and many of their colleagues have been pointing out the same thing in private.) The circlet of radio waves that appears in the EHT `photo’ is probably not simply a blurred image of M87bh’s photon ring; it probably shows a combination of the photon ring with something brighter (as explained below). That’s where the controversy starts.

…so the Dark Patch May Not Be the Full Shadow…

The term `shadow’ is confusing (which is why I prefer `quasi-silhouette’ in describing it in public contexts, though that’s my own personal term) but no matter what you call it, in its ideal form it is supposed to be an absolutely dark area whose edge is the photon ring. But in reality the perfectly dark area need not appear so dark after all; it may be partly filled in by various effects. Furthermore, since the `photo’ may not show us the photon ring, it’s far from clear that the dark patch in the center is the full shadow anyway. The EHT folks are well aware of this, but at the time the photo came out, many science writers and scientist-writers (including me) were not.

…so EHT’s Measurement of the M87bh’s Mass is Being Questioned

It was wonderful that EHT could make a picture that could travel round the internet at the speed of light, and generate justifiable excitement and awe that human beings could indirectly observe such an amazing thing as a black hole with a mass of several billion Sun-like stars. Qualitatively, they achieved something fantastic in showing that yes, the object at the center of M87 really is as compact and dark as such a black hole would be expected to be! But the EHT telescope’s main quantitative achievement was a measurement of the mass of the M87bh, with a claimed precision of about 15%.

Naively, one could imagine that the mass is measured by looking at the diameter of the dark spot in the black hole ‘photo’, under the assumption that it is the black hole’s shadow. So here’s the issue: Could interpreting the dark region incorrectly perhaps lead to a significant mistake in the mass measurement, and/or an underestimate of how uncertain the mass measurement actually is?

I don’t know.  The EHT folks are certainly aware of these issues; their simulations show them explicitly.  The mass of the M87bh isn’t literally measured by putting a ruler on the ‘photo’ and measuring the size of the dark spot! The actual methods are much more sophisticated than that, and I don’t understand them well enough yet to explain, evaluate or criticize them. All I can say with confidence right now is that these are important questions that experts currently are debating, and consensus on the answer may not be achieved for quite a while.

———————————————————————-

The Appearance of a Black Hole With Nearby Matter

Ok, now I’m going to explain the most relevant points, step-by-step. Grab a cup of coffee or tea, find a comfy chair, and bear with me.

Because fast-rotating black holes are more complicated, I’m going to start illuminating the controversy by looking at a non-rotating black hole’s properties, which is also what Gralla et al. mainly do in their paper. It turns out the qualitative conclusion drawn from the non-rotating case largely applies in the rotating case too, at least in the case of the M87bh as seen from our perspective; that’s important because the M87bh may well be rotating at a very good clip.

A little terminology first: for a rotating black hole there’s a natural definition of the poles and the equator, just as there is for the Earth: there’s an axis of rotation, and the poles are where that axis intersects with the black hole horizon. The equator is the circle that lies halfway between the poles. For a non-rotating black hole, there’s no such axis and no such automatic definition, but it will be useful to define the north pole of the black hole to be the point on the horizon closest to us.

A Single Source of Electromagnetic Waves

Let’s imagine placing a bright light bulb on the same plane as the equator, outside the black hole horizon but rather close to it. (The bulb could emit radio waves or visible light or any other form of electromagnetic waves, at any frequency; for what I’m about to say, it doesn’t matter at all, so I’ll just call it `light’.) See Figure 1. Where will the light from the bulb go?

Some of it, heading inward, ends up in the black hole, while some of it heads outward toward distant observers. The gravity of the black hole will bend the path of the light. And here’s something remarkable: a small fraction of the light, aimed just so, can actually spiral around the black hole any number of times before heading out. As a result, you will see the bulb not once but multiple times!

There will be a direct image — light that comes directly to us — from near the bulb’s true location (displaced because gravity bends the light a bit, just as a glass lens will distort the appearance of what’s behind it.) The path of that light is the orange arrow in Figure 1. But then there will be an indirect image (the green arrow in Figure 1) from light that goes halfway around the black hole before heading in our direction; we will see that image of the bulb on the opposite side of the black hole. Let’s call that the `first indirect image.’ Then there will be a second indirect image from light that orbits the black hole once and comes out near the direct image, but further out; that’s the blue arrow in Figure 1. Then there will be a third indirect image from light that goes around one and a half times (not shown), and so on. In short, Figure 1 shows the paths of the direct, first indirect, and second indirect images of the bulb as they head toward our location at the top of the image.

BHTruthBulb.png

Figure 1: A light bulb (yellow) outside but near the non-rotating black hole’s horizon (in black) can be seen by someone at the top of the image not only through the light that goes directly upward (orange line) — a “direct image” — but also through light that makes partial or complete orbits of the black hole — “indirect images.” The first indirect and second indirect images are from light taking the green and blue paths. For light to make orbits of the black hole, it must travel near the grey-dashed circle that indicates the location of a “photon-sphere.” (A rotating black hole has no such sphere, but when seen from the north or south pole, the light observed takes similar paths to what is shown in this figure.) [The paths of the light rays were calculated carefully using Mathematica 11.3.]
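
Since the caption notes that these light paths were computed numerically, here is a minimal sketch of how such a ray-tracing calculation can be set up — my own illustrative Python, not the author’s Mathematica notebook; the emission radius and the three aiming values are made-up numbers chosen only to exhibit the three possible fates of a ray (escaping directly, winding near the photon sphere before escaping, and falling in):

```python
# Minimal sketch of tracing light rays around a non-rotating black hole,
# in geometric units G = c = 1 with mass M = 1: the horizon sits at r = 2
# and the photon sphere at r = 3. (Illustration only, not the author's code.)
import numpy as np
from scipy.integrate import solve_ivp

M = 1.0

def photon_orbit(phi, y):
    # Schwarzschild null-geodesic equation in the orbital plane,
    # written for u = 1/r:  d^2 u / d phi^2 = -u + 3 M u^2.
    u, du = y
    return [du, -u + 3.0 * M * u * u]

def trace_ray(r_emit, du0, phi_max=8 * np.pi):
    """Follow one ray from a 'bulb' at radius r_emit; du0 sets its initial
    aiming. Integration stops at the horizon or once the ray is far away."""
    def hit_horizon(phi, y):      # u has risen to 1/(2M): crossed the horizon
        return y[0] - 1.0 / (2.0 * M)
    def escaped(phi, y):          # r has grown past 50 M: effectively gone
        return y[0] - 1.0 / 50.0
    hit_horizon.terminal = True
    escaped.terminal = True
    sol = solve_ivp(photon_orbit, [0.0, phi_max], [1.0 / r_emit, du0],
                    events=[hit_horizon, escaped], max_step=0.01)
    fate = "fell into the hole" if sol.t_events[0].size else "escaped"
    return sol.t[-1], fate

# Three rays from a bulb at r = 3.5 M: one aimed outward (a direct image),
# one aimed inward but just missing capture (it winds near r = 3 M before
# escaping, like the indirect images), and one aimed too steeply inward.
for du0 in (-0.05, +0.0438, +0.30):
    angle, fate = trace_ray(3.5, du0)
    print(f"du0 = {du0:+.4f}: swept {angle / np.pi:.2f} pi in angle, then {fate}")
```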

What you can see in Figure 1 is that both the first and second indirect images are formed by light that spends part of its time close to a special radius around the black hole, shown as a dotted line. This imaginary surface, the edge of a ball, is an honest “photon-sphere” in the case of a non-rotating black hole.

In the case of a rotating black hole, something very similar happens when you’re looking at the black hole from its north (or south) pole; there’s a special circle then too. But that circle is not the edge of a photon-sphere! In general, photons can have special orbits in a wide region, which I called the “photon-zone” earlier, and only a small set of them are on this circle. You’ll see photons from other parts of the photon zone if you look at the black hole not from the poles but from some other angle.

[If you’d like to learn a bit more about the photon zone, and you have a little bit of knowledge of black holes already, you can profit from exploring this demo by Professor Leo Stein: https://duetosymmetry.com/tool/kerr-circular-photon-orbits/ ]

Back to the non-rotating case: What our camera will see, looking at what is emitted from the light bulb, is shown in Figure 2: an infinite number of increasingly squished `indirect’ images, half on one side of the black hole near the direct image, and the other half on the other side. What is not obvious, but true, is that only the first of the indirect images is large and bright; this is one of Gralla et al.‘s main points. We can, therefore, separate the images into the direct image, the first indirect image, and the remaining indirect images. The total amount of light coming from the direct image and the first indirect image can be large, but the total amount of light from the remaining indirect images is typically (according to Gralla et al.) less than 5% of the light from the first indirect image. And so, unless we have an extremely high-powered camera, we’ll never pick those other images up. Let’s therefore focus our attention on the direct image and the first indirect image.

BHObsvBulb3.png

Figure 2: What the drawing in Figure 1 actually looks like to the observer peering toward the black hole; all the indirect images lie at almost exactly the same distance from the black hole’s center.

WARNING (since this seems to be a common confusion):

IN ALL MY FIGURES IN THIS POST, AS IN THE BLACK HOLE `PHOTO’ ITSELF, THE COLORS OF THE IMAGES ARE CHOSEN ARBITRARILY (as explained in my first blog post on this subject.) THE `PHOTO’ WAS TAKEN AT A SINGLE, NON-VISIBLE FREQUENCY OF ELECTROMAGNETIC WAVES: EVEN IF WE COULD SEE THAT TYPE OF RADIO WAVE WITH OUR EYES, IT WOULD BE A SINGLE COLOR, AND THE ONLY THING THAT WOULD VARY ACROSS THE IMAGE IS BRIGHTNESS. IN THIS SENSE, A BLACK AND WHITE IMAGE MIGHT BE CLEARER CONCEPTUALLY, BUT IT IS HARDER FOR THE EYE TO PROCESS.

A Circular Source of Electromagnetic Waves

Proceeding step by step toward a more realistic situation, let’s replace our ordinary bulb by a circular bulb (Figure 3), again set somewhat close to the horizon, sitting in the plane that contains the equator. What would we see now?

BHTruthCirc2.png

Figure 3: If we replace the light bulb with a circle of light, the paths of the light are the same as in Figure 1, except now for each point along the circle. That means each direct and indirect image itself forms a circle, as shown in the next figure.

That’s shown in Figure 4: the direct image is a circle (looking somewhat larger than it really is); outside it sits the first indirect image of the ring; and then come all the other indirect images, looking quite dim and all piling up at one radius. We’re going to call all those piled-up images the “photon ring”.

BHObsvCirc3.png

Figure 4: The circular bulb’s direct image is the bright circle, but a somewhat dimmer first indirect image appears further out, and just beyond one finds all the other indirect images, forming a thin `photon ring’.

Importantly, if we consider circular bulbs of different diameter [yellow, red and blue in Figure 5], then although the direct images reflect the differences in the bulbs’ diameters (somewhat enlarged by lensing), the first indirect images all are about the same diameter, just a tad larger or smaller than the photon ring.  The remaining indirect images all sit together at the radius of the photon ring.

BH3Circ4.png

Figure 5: Three bulbs of different diameter (yellow, blue, red) create three distinct direct images, but their first indirect images are located much closer together, and very close to the photon ring where all their remaining indirect images pile up.

These statements are also essentially true for a rotating black hole seen from the north or south pole; a circular bulb generates a series of circular images, and the indirect images all pile more or less on top of each other, forming a photon ring. When viewed off the poles, the rotating black hole becomes a more complicated story, but as long as the viewing angle is small enough, the changes are relatively minor and the picture is qualitatively somewhat similar.

A Disk as a Source of Electromagnetic Waves

And what if you replaced the circular bulb with a disk-shaped bulb, a sort of glowing pancake with a circular hole at its center, as in Figure 6? That’s relevant because black holes are thought to have `accretion disks’ made of material orbiting the black hole, and eventually spiraling in. The accretion disk may well be the dominant source emitting radio waves at the M87bh. (I’m showing a very thin uniform disk for illustration, but a real accretion disk is not uniform, changes rapidly as clumps of material move within it and then spiral into the black hole, and may be quite thick — as thick as the black hole is wide, or even thicker.)

Well, we can think of the disk as many concentric circles of light placed together. The direct images of the disk (shown in Figure 6 left, on one side of the disk, as an orange wash) would form a disk in your camera, the dim red region in Figure 6 right; the hole at its center would appear larger than it really is due to the bending caused by the black hole’s gravity, but the shape would be similar. However, the indirect images would all pile up in almost the same place from your perspective, forming a bright and quite thin ring, the bright yellow circle in Figure 6 right. (The path of the disk’s first indirect image is shown in Figure 6 left, going halfway about the black hole as a green wash; notice how it narrows as it travels, which is why it appears as a narrow ring in the image at right.) This circle — the full set of indirect images of the whole disk — is the edge of the photon-sphere for a non-rotating black hole, and the circular photon ring for a rotating black hole viewed from its north or south pole.

BHDisk2.png

Figure 6: A glowing disk of material (note it does not touch the black hole) looks like a version of Figure 5 with many more circular bulbs. The direct image of the disk forms a disk (illustrated at left, for a piece of the disk, as an orange wash) while the first indirect image becomes highly compressed (illustrated, for a piece of the disk, as a green wash) and is seen as a narrow circle of bright light.  (It is expected that the disk is mostly transparent in radio waves, so the indirect image can pass through it.) That circle, along with the other indirect images, forms the photon ring. In this case, because the disk’s inner edge lies close to the black hole horizon, the photon ring sits within the disk’s direct image, but we’ll see a different example in Figure 9.

[Gralla et al. call the first indirect image the `lensed ring’ and the remaining indirect images, currently unobservable at EHT, the `photon ring’, while EHT refers to all the indirect images as the `photon ring’. Just letting you know in case you hear `lensed ring’ referred to in future.]

So the conclusion is that if we had a perfect camera, the direct image of a disk makes a disk, but the indirect images (mainly just the first one, as Gralla et al. emphasize) make a bright, thin ring that may be superposed upon the direct image of the disk, depending on the disk’s shape.

And this conclusion, with some important adjustments, applies also for a spinning black hole viewed from above its north or south pole — i.e., along its axis of rotation — or from near that axis; I’ll mention the adjustments in a moment.

But EHT is not a perfect camera. To make the black hole image, technology had to be pushed to its absolute limits. Someday we’ll see both the disk and the ring, but right now, they’re all blurred together. So which one is more important?

From a Blurry Image to Blurry Knowledge

What does a blurry camera do to this simple image? You might think that the disk is so dim and the ring so bright that the camera will mainly show you a blurry image of the bright photon ring. But that’s wrong. The ring isn’t bright enough. A simple calculation reveals that the photo will show mainly the disk, not the photon ring! This is shown in Figure 7, which you can compare with the Black Hole `photo’ (Figure 8). (Figure 7 is symmetric around the ring, but the photo is not, for multiple reasons — Doppler-like effect from rotation, viewpoint off the rotation axis, etc. — which I’ll have to defer til another post.)

More precisely, the ring and disk blur together, but the brightness of the image is dominated by the disk, not the ring.
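
To make the ‘simple calculation’ tangible, here is a toy version of my own (all radii, widths, and brightnesses below are invented, illustrative numbers — they are not EHT measurements or Gralla et al.’s model): a very thin, very bright ring plus a broad, much dimmer disk, blurred with a Gaussian of roughly the EHT’s resolution. The ring wins on peak brightness, but the disk wins on total light, and it is total light that survives the blurring.

```python
# Toy model: a thin bright "photon ring" plus a broad dim "disk", blurred
# by a Gaussian beam. All numbers are illustrative assumptions, not EHT data.
import numpy as np
from scipy.ndimage import gaussian_filter

x = np.linspace(-60.0, 60.0, 601)            # field of view in micro-arcseconds
X, Y = np.meshgrid(x, x)
R = np.hypot(X, Y)

ring = 10.0 * np.exp(-0.5 * ((R - 20.0) / 0.5) ** 2)     # thin, bright ring at 20 uas
disk = 1.0 * ((R > 12.0) & (R < 45.0)).astype(float)     # broad, much dimmer disk

fwhm = 20.0                                  # rough blur scale of the "camera", in uas
sigma_pix = (fwhm / 2.355) / (x[1] - x[0])   # convert FWHM to sigma, in pixels
blurred = gaussian_filter(ring + disk, sigma_pix)

print(f"total light in the ring : {ring.sum():10.0f}")
print(f"total light in the disk : {disk.sum():10.0f}")   # larger, despite being dimmer
print(f"peak of blurred image vs blurred ring alone: "
      f"{blurred.max() / gaussian_filter(ring, sigma_pix).max():.2f}x")
```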

BHBlurDisk_a1_2.png

Figure 7: At left is repeated the image in Figure 6, as seen in a perfect camera, while at right the same image is shown when observed using a camera with imperfect vision. The disk and ring blur together into a single thick ring, whose brightness is dominated by the disk. Note that the shadow — the region surrounded by the yellow photon ring — is not the same as the dark patch in the right-hand image; the dark patch is considerably smaller than the shadow.

Let’s say that again: the black hole `photo’ may mainly show the M87bh’s accretion disk, with the photon ring contributing only some of the light, and therefore the photon ring does not completely and unambiguously determine the radius of the observed dark patch in the `photo’. In general, the patch could be considerably smaller than what is usually termed the `shadow’ of the black hole.

M87BH_Vicinity_Photo_2a.png

Figure 8: (Left) We probably observe the M87bh at a small angle off its south pole. Its accretion disk has an unknown size and shape — it may be quite thick and non-uniform — and it may not even lie at the black hole’s equator. The disk and the black hole interact to create outward-going jets of material (observed already many years ago but not clearly visible in the EHT ‘photo’.) (Right) The EHT `photo’ of the M87bh (taken in radio waves and shown in false color!) Compare with Figure 7; the most important difference is that one side of the image is brighter than the other. This likely arises from (a) our view being slightly off from the south pole, combined with (b) rotation of the black hole and its disk, and (c) possibly other more subtle issues.

This is important. The photon ring’s diameter, and thus the width of the `shadow’ too, barely depend on the rotation rate of the black hole; they depend almost exclusively on the black hole’s mass. So if the ring in the photo were simply the photon ring of the M87bh, you’d have a very simple way to measure the black hole’s mass without knowing its rotation rate: you’d look at how large the dark patch is, or equivalently, the diameter of the blurry ring, and that would give you the answer to within 10%. But it’s nowhere near so simple if the blurry ring shows the accretion disk, because the accretion disk’s properties and appearance can vary much more than the photon ring; they can depend strongly on the black hole’s rotation rate, and also on magnetic fields and other details of the black hole’s vicinity.
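
Here is the back-of-the-envelope version of that ‘very simple way’, with inputs I am assuming for illustration (a mass of about 6.5 billion Suns and a distance of about 16.8 Mpc, the commonly quoted values); the formula is the standard one for a non-rotating black hole:

```python
# If the bright ring really were just the photon ring, its angular diameter
# would be 2*sqrt(27)*GM/c^2 divided by the distance, almost independent of
# spin. Assumed inputs: M ~ 6.5e9 solar masses, distance ~ 16.8 Mpc.
import math

G, c, M_sun, pc = 6.674e-11, 2.998e8, 1.989e30, 3.086e16   # SI units
M = 6.5e9 * M_sun
D = 16.8e6 * pc

theta = 2.0 * math.sqrt(27.0) * G * M / (c ** 2 * D)       # radians
microarcsec = theta * (180.0 / math.pi) * 3600.0 * 1e6
print(f"photon-ring angular diameter ~ {microarcsec:.0f} micro-arcseconds")
# ~ 40 micro-arcseconds, in the same ballpark as the ring in the EHT image.
```

If the bright ring is instead dominated by the disk, this shortcut no longer applies — which is exactly the concern raised above.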

The Important Role of Rotation

If we conclude that EHT is seeing a mix of the accretion disk with the photon ring, with the former dominating the brightness, then this makes EHT’s measurement of the M87bh’s mass more confusing and even potentially suspect. Hence: controversy. Is it possible that EHT underestimated their uncertainties, and that their measurement of the black hole mass has more ambiguities, and is not as precise, as they currently claim?

Here’s where the rotation rate is important. Despite what I showed (for pedagogical simplicity) in Figure 7, for a non-rotating black hole the accretion disk’s central gap is actually expected to lie outside the photon ring; this is shown at the top of Figure 9.  But  the faster the black hole rotates, the smaller this central gap is expected to be, to the point that for a fast-rotating black hole the gap will lie inside the photon ring, as shown at the bottom of Figure 9. (This tendency is not obvious; it requires understanding details of the black hole geometry.) And if that is true, the dark patch in the EHT image may not be the black hole’s full shadow (i.e. quasi-silhouette), which is the region inside the photon ring. It may be just the inner portion of it, with the outer portion obscured by emission from the accretion disk.

The effect of blurring in the two cases of slow (or zero) and fast rotation is illustrated in Figure 9, where the photon ring’s size is taken to be the same in each case but the disk’s inner edge is close in or far out. (The black holes, not illustrated since they aren’t visible anyway, differ in mass by about 10% in order to have the photon ring the same size.) This shows why the size of the dark patch can be quite different, depending on the disk’s shape, even when the photon ring’s size is the same.

BHBlurDisk_a0_a1_3.png

Figure 9: Comparing the appearance of slightly more realistically-shaped disks around slowly rotating or non-rotating black holes (top) to those around fast-rotating black holes (bottom) of the same mass, as seen from the north or south pole. (Left) the view in a perfect camera; (right) rough illustration of the effect of blurring in the current version of the EHT. The faster the black hole is spinning, the smaller the central gap in the accretion disk is likely to be. No matter what the extent of the accretion disk (dark red), the photon ring (yellow) remains at roughly the same location, changing only by 10% between a non-rotating black hole and a maximally rotating black hole of the same mass. But blurring in the camera combines the disk and photon ring into a thick ring whose brightness is dominated by the disk rather than the ring, and which can therefore be of different size even though the mass is the same. This implies that the radius of the blurry ring in the EHT `photo’, and the size of the dark region inside it, cannot by themselves tell us the black hole’s mass; at a minimum we must also know the rotation rate (which we do not.)

Gralla et al. subtly raise these questions but are careful not to overstate their case, perhaps because they have not yet completed their study of rotating black holes. But the question is now in the air.

I’m interested to hear what the EHT folks have to say about it, as I’m sure they have detailed arguments in favor of their procedures. In particular, EHT’s simulations show all of the effects mentioned above; none of this is news to them. (In fact, the reason I know my illustrations above are reasonable is partly because you can see similar pictures in the EHT papers.) As long as the EHT folks correctly accounted for all the issues, then they should have been able to properly measure the mass and estimate their uncertainties correctly. In fact, they don’t really use the photo itself; they use more subtle techniques applied to their telescope data directly. Thus it’s not enough to argue the photo itself is ambiguous; one has to argue that EHT’s more subtle analysis methods are flawed. No one has argued that yet, as far as I am aware.

But the one thing that’s clear right now is that science writers almost uniformly got it wrong [because the experts didn’t explain these points well] when they tried to describe the image two months ago. The `photo’ probably does not show “a photon ring surrounding a shadow.” That would be nice and simple and impressive-sounding, since it refers to fundamental properties of the black hole’s warping effects on space. But it’s far too glib, as Figures 7 and 9 show. We’re probably seeing an accretion disk supplemented by a photon ring, all blurred together, and the dark region may well be smaller than the black hole’s shadow.

(Rather than, or in addition to, the accretion disk, it is also possible that the dominant emission in the photo comes from the inner portion of one of the jets that emerges from the vicinity of the black hole; see Figure 8 above. This is another detail that makes the situation more difficult to interpret, but doesn’t change the main point I’m making.)

Someday in the not distant future, improved imaging should allow EHT to separately image the photon ring and the disk, so both can be observed easily, as in the left side of Figure 9. Then all these questions will be answered definitively.

Why the Gargantua Black Hole from Interstellar is Completely Different

Just as a quick aside, what would you see if an accretion disk were edge-on rather than face-on? Then, in a perfect camera, you’d see something like the famous picture of Gargantua, the black hole from the movie Interstellar — a direct image of the front edge of the disk, and a strongly lensed indirect image of the back side of the disk, appearing both above and below the black hole, as illustrated in Figure 10. And that leads to the Gargantua image from the movie, also shown in Figure 10. Notice the photon ring (which is, as I cautioned you earlier, off-center!)   [Note added: this figure has been modified; in the original version I referred to the top and bottom views of the disk’s far side as the “1st indirect image”, but as pointed out by Professor Jean-Pierre Luminet, that’s not correct terminology here.]

BHGarg4.png

Figure 10: The movie Interstellar features a visit to an imaginary black hole called Gargantua, and the simulated images in the movie (from 2014) are taken from near the equator, not the pole. As a result, the direct image of the disk cuts across the black hole, and indirect images of the back side of the disk are seen above and below the black hole. There is also a bright photon ring, slightly off center; this is well outside the surface of the black hole, which is not visible. A real image would not be symmetric left-to-right; it would be brighter on the side that is rotating toward the viewer.  At the bottom is shown a much more realistic visual image (albeit not so good quality) from 1994 by Jean-Alain Marck, in which this asymmetry can be seen clearly.

However, the movie image leaves out an important Doppler-like effect (which I’ll explain someday when I understand it 100%). This makes the part of the disk that is rotating toward us bright, and the part rotating away from us dim… and so a real image from this vantage point would be very asymmetric — bright on the left, dim on the right — unlike the movie image.  At the suggestion of Professor Jean-Pierre Luminet I have added, at the bottom of Figure 10, a very early simulation by Jean-Alain Marck that shows this effect.

I mention this because a number of expert science journalists incorrectly explained the M87 image by referring to Gargantua — but that image has essentially nothing to do with the recent black hole `photo’. M87’s accretion disk is certainly not edge-on. The movie’s Gargantua image is taken from the equator, not from near the pole.

Final Remarks: Where a Rotating Black Hole Differs from a Non-Rotating One

Before I quit for the week, I’ll just summarize a few big differences for fast-rotating black holes compared to non-rotating ones.

1) As I’ve just emphasized, what a rotating black hole looks like to a distant observer depends not only on where the matter around the black hole is located but also on how the black hole’s rotation axis is oriented relative to the observer. A pole observer, an equatorial observer, and a near-pole observer see quite different things. (As noted in Figure 8, we are apparently near-south-pole observers for M87’s black hole.)

Let’s assume that the accretion disk lies in the same plane as the black hole’s equator — there are some reasons to expect this. Even then, the story is complex.

2) As I mentioned above, instead of a photon-sphere, there is a ‘photon-zone’ — a region where specially aimed photons can travel round the black hole multiple times. For high-enough spin (greater than about 80% of maximum as I recall), an accretion disk’s inner edge can lie within the photon zone, or even closer to the black hole than the photon zone; and this can cause a filling-in of the ‘shadow’.

3) Depending on the viewing angle, the indirect images of the disk that form the photon ring may not be a circle, and may not be concentric with the direct image of the disk. Only when viewed from along the rotation axis (i.e., above the north or south pole) will the direct and indirect images of the disk all be circular and concentric. We’re not viewing the M87bh on its axis, and that further complicates interpretation of the blurry image.

4) When the viewing angle is not along the rotation axis the image will be asymmetric, brighter on one side than the other. (This is true of EHT’s `photo’.) However, I know of at least four potential causes of this asymmetry, any or all of which might play a role, and the degree of asymmetry depends on properties of the accretion disk and the rotation rate of the black hole, both of which are currently unknown. Claims about the asymmetry made by the EHT folks seem, at least to me, to be based on certain assumptions that I, at least, cannot currently check.

Each of these complexities is a challenge to explain, so I’ll give both you and me a substantial break while I figure out how best to convey what is known (at least to me) about these issues.

June 14, 2019

Matt von HippelWhen to Read Someone Else’s Thesis

There’s a cynical truism we use to reassure grad students. A thesis is a big, daunting project, but it shouldn’t be too stressful: in the end, nobody else is going to read it.

This is mostly true. In many fields your thesis is a mix of papers you’ve already published, stitched together into your overall story. Anyone who’s interested will have read the papers the thesis is based on; they don’t need to read the thesis too.

Like every good truism, though, there is an exception. Some rare times, you will actually want to read someone else’s thesis. This isn’t usually because the material is new: rather it’s because it’s well explained.

When we academics publish, we’re often in a hurry, and there isn’t time to write well. When we publish more slowly, often we have more collaborators, so the paper is a set of compromises written by committee. Either way, we rarely make a concept totally crystal-clear.

A thesis isn’t always crystal-clear either, but it can be. It’s written by just one person, and that person is learning. A grad student who just learned a topic can be in the best position to teach it: they know exactly what confused them when they started out. Thesis-writing is also a slower process, one that gives more time to hammer at a text until it’s right. Finally, a thesis is written for a committee, and that committee usually contains people from different fields. A thesis needs to be an accessible introduction, in a way that a published paper doesn’t.

There are topics that I never really understood until I looked up the thesis of the grad student who helped discover them. There are tricks that never made it into published papers, tricks I’ve learned because they were tucked into the thesis of someone who went on to do great things.

So if you’re finding a subject confusing, if you’ve read all the papers and none of them make any sense, look for the grad students. Sometimes the best explanation of a tricky topic isn’t in the published literature, it’s hidden away in someone’s thesis.

Terence TaoRuling out polynomial bijections over the rationals via Bombieri-Lang?

I recently came across this question on MathOverflow asking if there are any polynomials {P} of two variables with rational coefficients, such that the map {P: {\bf Q} \times {\bf Q} \rightarrow {\bf Q}} is a bijection. The answer to this question is almost surely “no”, but it is remarkable how hard this problem resists any attempt at rigorous proof. (MathOverflow users with enough privileges to see deleted answers will find that there are no fewer than seventeen deleted attempts at a proof in response to this question!)

On the other hand, the one surviving response to the question does point out this paper of Poonen which shows that assuming a powerful conjecture in Diophantine geometry known as the Bombieri-Lang conjecture (discussed in this previous post), it is at least possible to exhibit polynomials {P: {\bf Q} \times {\bf Q} \rightarrow {\bf Q}} which are injective.

I believe that it should be possible to also rule out the existence of bijective polynomials {P: {\bf Q} \times {\bf Q} \rightarrow {\bf Q}} if one assumes the Bombieri-Lang conjecture, and have sketched out a strategy to do so, but filling in the gaps requires a fair bit more algebraic geometry than I am capable of. So as a sort of experiment, I would like to see if a rigorous implication of this form (similarly to the rigorous implication of the Erdos-Ulam conjecture from the Bombieri-Lang conjecture in my previous post) can be crowdsourced, in the spirit of the polymath projects (though I feel that this particular problem should be significantly quicker to resolve than a typical such project).

Here is how I imagine a Bombieri-Lang-powered resolution of this question should proceed (modulo a large number of unjustified and somewhat vague steps that I believe to be true but have not established rigorously). Suppose for contradiction that we have a bijective polynomial {P: {\bf Q} \times {\bf Q} \rightarrow {\bf Q}}. Then for any polynomial {Q: {\bf Q} \rightarrow {\bf Q}} of one variable, the surface

\displaystyle  S_Q := \{ (x,y,z) \in \mathbb{A}^3: P(x,y) = Q(z) \}

has infinitely many rational points; indeed, every rational {z \in {\bf Q}} lifts to exactly one rational point in {S_Q}. I believe that for “typical” {Q} this surface {S_Q} should be irreducible. One can now split into two cases:

  • (a) The rational points in {S_Q} are Zariski dense in {S_Q}.
  • (b) The rational points in {S_Q} are not Zariski dense in {S_Q}.

Consider case (b) first. By definition, this case asserts that the rational points in {S_Q} are contained in a finite number of algebraic curves. By Faltings’ theorem (a special case of the Bombieri-Lang conjecture), any curve of genus two or higher only contains a finite number of rational points. So all but finitely many of the rational points in {S_Q} are contained in a finite union of genus zero and genus one curves. I think all genus zero curves are birational to a line, and all the genus one curves are birational to an elliptic curve (though I don’t have an immediate reference for this). These curves {C} all can have an infinity of rational points, but very few of them should have “enough” rational points {C \cap {\bf Q}^3} that their projection {\pi(C \cap {\bf Q}^3) := \{ z \in {\bf Q} : (x,y,z) \in C \hbox{ for some } x,y \in {\bf Q} \}} to the third coordinate is “large”. In particular, I believe

  • (i) If {C \subset {\mathbb A}^3} is birational to an elliptic curve, then the number of elements of {\pi(C \cap {\bf Q}^3)} of height at most {H} should grow at most polylogarithmically in {H} (i.e., be of order {O( \log^{O(1)} H )}).
  • (ii) If {C \subset {\mathbb A}^3} is birational to a line but not of the form {\{ (f(z), g(z), z) \}} for some rational {f,g}, then the number of elements of {\pi(C \cap {\bf Q}^3)} of height at most {H} should grow slower than {H^2} (in fact I think it can only grow like {O(H)}; a rough heuristic sketch is given just below this list).
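
[A rough heuristic supporting (ii), sketched here under the assumption that any parameterisation avoiding the excluded form must have {h} of degree {d \geq 2}: the height of {h(t)} is comparable to {H(t)^d}, and the number of rationals of height at most {X} is comparable to {X^2}, so one expects

\displaystyle  \# \{ z \in \pi(C \cap {\bf Q}^3): H(z) \leq H \} \ll \# \{ t \in {\bf Q}: H(t) \ll H^{1/d} \} \asymp H^{2/d} \leq H,

consistent with the {O(H)} prediction. This is only a heuristic, not a proof.]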

I do not have proofs of these results (though I think something similar to (i) can be found in Knapp’s book, and (ii) should basically follow by using a rational parameterisation {\{(f(t),g(t),h(t))\}} of {C} with {h} nonlinear). Assuming these assertions, this would mean that there is a curve of the form {\{ (f(z),g(z),z)\}} that captures a “positive fraction” of the rational points of {S_Q}, as measured by restricting the height of the third coordinate {z} to lie below a large threshold {H}, computing density, and sending {H} to infinity (taking a limit superior). I believe this forces an identity of the form

\displaystyle  P(f(z), g(z)) = Q(z) \ \ \ \ \ (1)

for all {z}. Such identities are certainly possible for some choices of {Q} (e.g. {Q(z) = P(F(z), G(z))} for arbitrary polynomials {F,G} of one variable) but I believe that the only way that such identities hold for a “positive fraction” of {Q} (as measured using height as before) is if there is in fact a rational identity of the form

\displaystyle  P( f_0(z), g_0(z) ) = z

for some rational functions {f_0,g_0} with rational coefficients (in which case we would have {f = f_0 \circ Q} and {g = g_0 \circ Q}). But such an identity would contradict the hypothesis that {P} is bijective, since one can take a rational point {(x,y)} outside of the curve {\{ (f_0(z), g_0(z)): z \in {\bf Q} \}}, and set {z := P(x,y)}, in which case we have {P(x,y) = P(f_0(z), g_0(z) )} violating the injective nature of {P}. Thus, modulo a lot of steps that have not been fully justified, we have ruled out the scenario in which case (b) holds for a “positive fraction” of {Q}.

This leaves the scenario in which case (a) holds for a “positive fraction” of {Q}. Assuming the Bombieri-Lang conjecture, this implies that for such {Q}, any resolution of singularities of {S_Q} fails to be of general type. I would imagine that this places some very strong constraints on {P,Q}, since I would expect the equation {P(x,y) = Q(z)} to describe a surface of general type for “generic” choices of {P,Q} (after resolving singularities). However, I do not have a good set of techniques for detecting whether a given surface is of general type or not. Presumably one should proceed by viewing the surface {\{ (x,y,z): P(x,y) = Q(z) \}} as a fibre product of the simpler surface {\{ (x,y,w): P(x,y) = w \}} and the curve {\{ (z,w): Q(z) = w \}} over the line {\{w \}}. In any event, I believe the way to handle (a) is to show that the failure of general type of {S_Q} implies some strong algebraic constraint between {P} and {Q} (something in the spirit of (1), perhaps), and then use this constraint to rule out the bijectivity of {P} by some further ad hoc method.

Matt StrasslerA Non-Expert’s Guide to a Black Hole’s Silhouette

[Note added April 16: some minor improvements have been made to this article as my understanding has increased, specifically concerning the photon-sphere, which is the main region from which the radio waves are seen in the recently released image. See later blog posts for the image and its interpretation.]

[Note added June 14: significant misconceptions concerning the photon-sphere and shadow, as relevant to the black hole ‘photo’, dominated reporting in April, and I myself was also subject to them.  I have explained the origin of and correction to these misconceptions, which affect the interpretation of the image, in my post “A Ring of Controversy”.]

About fifteen years ago, when I was a professor at the University of Washington, the particle physics theorists and the astronomer theorists occasionally would arrange to have lunch together, to facilitate an informal exchange of information about our adjacent fields. Among the many enjoyable discussions, one I was particularly excited about — as much as an amateur as a professional — was that in which I learned of the plan to make some sort of image of a black hole. I was told that this incredible feat would likely be achieved by 2020. The time, it seems, has arrived.

The goal of this post is to provide readers with what I hope will be a helpful guide through the foggy swamp that is likely to partly obscure this major scientific result. Over the last days I’ve been reading what both scientists and science journalists are writing in advance of the press conference Wednesday morning, and I’m finding many examples of jargon masquerading as English, terms poorly defined, and phrasing that seems likely to mislead. As I’m increasingly concerned that many non-experts will be unable to understand what is presented tomorrow, and what the pictures do and do not mean, I’m using this post to answer a few questions that many readers (and many of these writers) have perhaps not thought to ask.

A caution: I am not an expert on this subject. At the moment, I’m still learning about the more subtle issues. I’ll try to be clear about when I’m on the edge of my knowledge, and hopefully won’t make any blunders [but experts, please point them out if you see any!]

Which black holes are being imaged?

The initial plan behind the so-called “Event Horizon Telescope” (the name deserves some discussion; see below) has been to make images of two black holes at the centers of galaxies (the star-cities in which most stars are found.) These are not black holes formed by the collapse of individual stars, such as the ones whose collisions have been observed through their gravitational waves. Central galactic black holes are millions or even billions of times larger!  The ones being observed are

  1. the large and `close’ black hole at the center of the Milky Way (the giant star-city we call home), and
  2. the immense but much more distant black hole at the center of M87 (a spectacularly big star-megalopolis.)
MilkyWayAndM87.jpg

Left: the Milky Way as seen in the night sky; we see our galaxy from within, and so cannot see its spiral shape directly.  The brightest region is toward the center of the galaxy, and deep within it is the black hole of interest, as big as a large star but incredibly small in this image.  Right: the enormous elliptically-shaped galaxy M87, which sports an enormous black hole (but again, incredibly small in this image) at its center.  The blue stripe is a jet of material hurled at near-light-speed from the region very close to the black hole.

Why go after both of these black holes at the same time? Just as the Sun and Moon appear the same size in our sky because of an accident — the Sun’s greater distance is almost precisely compensated by its greater size — these two black holes appear similarly sized, albeit very tiny, from our vantage point.

Our galaxy’s central black hole has a mass of about four million Suns, and it is about twenty times wider than our Sun (in contrast to the black holes whose gravitational waves were observed, which are the size of a human city.) But from the center of our galaxy, it takes light tens of thousands of years to reach Earth, and at such a great distance, this big black hole appears as small as would a tiny grain of sand from a plane at cruising altitude. Try to see the sand grains on a beach from a plane window!

Meanwhile, although M87 lies about two thousand times further away, its central black hole has a mass and radius about two thousand times larger than our galaxy’s black hole. Consequently it appears roughly the same size on the sky.
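
As a quick sanity check of that coincidence (using round numbers that I am assuming here — roughly four million Suns at about 26,000 light-years for our black hole, and roughly 6.5 billion Suns at about 55 million light-years for M87’s):

```python
# Angular size scales as (mass / distance), since a black hole's radius is
# proportional to its mass. Round-number inputs assumed for illustration.
m_sgrA, d_sgrA = 4.0e6, 2.6e4        # mass in solar masses, distance in light-years
m_m87,  d_m87  = 6.5e9, 5.5e7

ratio = (m_m87 / d_m87) / (m_sgrA / d_sgrA)
print(f"M87 black hole / Milky Way black hole apparent size ~ {ratio:.2f}")
# ~ 0.8: the two appear within a few tens of percent of each other on the sky.
```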

We may get our first images of both black holes at Wednesday’s announcement, though it is possible that so far only one image is ready for public presentation.

How can one see something as black as a black hole?!?

First of all, aren’t black holes black? Doesn’t that mean they don’t emit (or reflect) any light? Yes, and yes. [With a caveat — Hawking radiation — but while that’s very important for extremely small black holes, it’s completely irrelevant for these ones.]

So how can we see something that’s black against a black night sky? Well, a black hole off by itself would indeed be almost invisible.  [Though detectable because its gravity bends the light of objects behind it, proving it was a black hole and not a clump of, say, dark matter would be tough.]

But the centers of galaxies have lots of `gas’ — not quite what we call `gas’ in school, and certainly not what we put in our cars. `Gas’ is the generalized term astronomers use for any collection of independently wandering atoms (or ions), atomic nuclei, and electrons; if it’s not just atoms it should really be called plasma, but let’s not worry about that here. Some of this `gas’ is inevitably orbiting the black hole, and some of it is falling in (and, because of complex interactions with magnetic fields, some is being blasted outward  before it falls in.) That gas, unlike the black hole, will inevitably glow — it will emit lots of light. What the astronomers are observing isn’t the black hole; they’re observing light from the gas!

And by light, I don’t only mean what we can see with human eyes. The gas emits electromagnetic waves of all frequencies, including not only the visible frequencies but also much higher frequencies, such as those we call X-rays, and much lower frequencies, such as those we call radio waves. To detect these invisible forms of light, astronomers build all sorts of scientific instruments, which we call `telescopes’ even though they don’t involve looking into a tube as with traditional telescopes.

Is this really a “photograph” of [the gas in the neighborhood of] a black hole?

Yes and (mostly) no.  What you’ll be shown is not a picture you could take with a cell-phone camera if you were in a nearby spaceship.  It’s not visible light that’s being observed.  But it is invisible light — radio waves — and since all light, visible and not, is made from particles called `photons’, technically you could still say it is a “photo”-graph.

As I said, the telescope being used in this effort doesn’t have a set of mirrors in a tube like your friendly neighbor’s amateur telescope. Instead, it uses radio receivers to detect electromagnetic waves that have frequencies above what your traditional radio or cell phone can detect [in the hundred gigahertz range, over a thousand times above what your FM radio is sensitive to.]  Though some might call them microwaves, let’s just call them radio waves; it’s just a matter of definition.

So the images you will see are based on the observation of electromagnetic waves at these radio frequencies, but they are turned into something visible for us humans using a computer. That means the color of the image is inserted by the computer user and will be arbitrary, so pay it limited attention. It’s not the color you would see if you were nearby.  Scientists will choose a combination of the color and brightness of the image so as to indicate the brightness of the radio waves observed.

If you were nearby and looked toward the black hole, you’d see something else. The gas would probably appear colorless — white-hot — and the shape and brightness, though similar, wouldn’t be identical.

If I had radios for eyes, is this what I would see?

Suppose you had radio receivers for eyes instead of the ones you were born with; is this image what you would see if you looked at the black hole from a nearby planet?

Well, to some extent — but still, not exactly.  There’s another very important way that what you will see is not a photograph. It is so difficult to make this measurement that the image you will see is highly processed — that is to say, it will have been cleaned up and adjusted using special mathematics and powerful computers. Various assumptions are key ingredients in this image-processing. Thus, you will not be seeing the `true’ appearance of the [surroundings of the] black hole. You will be seeing the astronomers’ best guess as to this appearance based on partial information and some very intelligent guesswork. Such guesswork may be right, but as we may learn only over time, some of it may not be.

This guesswork is necessary.  To make a nice clear image of something so tiny and faint using radio waves, you’d naively need a radio receiver as large as the Earth. A trick astronomers use when looking at distant objects is to build gigantic arrays of large radio receivers, which can be combined together to make a `telescope’ much larger and more powerful than any one receiver. The tricks for doing this efficiently are beyond what I can explain here, but involve the term `interferometry’. Examples of large radio telescope arrays include ALMA, the center part of which can be seen in this image, which was built high up on a plateau between two volcanoes in the Atacama desert.

But even ALMA isn’t up to the task. And we can’t make a version of ALMA that covers the Earth. So the next best thing is to use ALMA and all of its friends, which are scattered at different locations around the Earth — an incomplete array of single and multiple radio receivers, combined using all the tricks of interferometry. This is a bit like (and not like) using a telescope that is powerful but has large pieces of its lens missing. You can get an image, but it will be badly distorted.

To figure out what you’re seeing, you must use your knowledge of your imperfect lens, and work backwards to figure out what your distorted image really would have looked like if you had a perfect lens.

Even that’s not quite enough: to do this, you need to have a pretty good guess about what you were going to see. That is where you might go astray; if your assumptions are wrong, you might massage the image to look like what you expected instead of how it really ought to look.   [Is this a serious risk?   I’m not yet expert enough to know the details of how delicate the situation might be.]
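
Here’s a cartoon of the problem, in the form of a little numerical experiment of my own (it is not EHT’s pipeline; a real array samples Fourier components along tracks set by the telescope locations and the Earth’s rotation, not at random points): the array measures only a sparse set of Fourier components of the sky, and inverting those alone gives a badly corrupted image — which is why prior assumptions have to be folded in.

```python
# Cartoon of sparse interferometric imaging: measure only a small fraction of
# the Fourier components ("visibilities") of a toy sky and invert naively.
# (My own illustration; real arrays sample along tracks, not at random.)
import numpy as np

n = 128
yy, xx = np.mgrid[-n // 2:n // 2, -n // 2:n // 2]
r = np.hypot(xx, yy)
true_sky = ((r > 15) & (r < 22)).astype(float)        # a toy ring-shaped source

visibilities = np.fft.fft2(true_sky)                  # what a full Earth-sized dish would give

rng = np.random.default_rng(0)
mask = rng.random(visibilities.shape) < 0.08          # keep ~8% of the Fourier plane
dirty_image = np.fft.ifft2(visibilities * mask).real  # naive inversion of the sparse data

err = np.sqrt(np.mean((dirty_image - true_sky) ** 2)) / true_sky.max()
print(f"Fourier coverage: {mask.mean():.0%}   naive-reconstruction rms error: {err:.2f}")
```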

It is possible that in coming days and weeks there will be controversies about whether the image-processing techniques used and the assumptions underlying them have created artifacts in the images that really shouldn’t be there, or removed features that should have been there. This could lead to significant disagreements as to how to interpret the images. [Consider these concerns a statement of general prudence based on past experience; I don’t have enough specific expertise here to give a more detailed opinion.]

So in summary, this is not a photograph you’d see with your eyes, and it’s not a complete unadulterated image — it’s a highly-processed image of what a creature with radio eyes might see.  The color is arbitrary; some combination of color and brightness will express the intensity of the radio waves, nothing more.  Treat the images with proper caution and respect.

Will the image show what [the immediate neighborhood of] a black hole looks like?

Oh, my friends and colleagues! Could we please try to be a little more careful using the phrase `looks like‘? The term has two very different meanings, and in contexts like this one, it really, really matters.

Let me put a spoon into a glass of water: on the left, I’ve drawn a diagram of what it looks like, and on the right, I’ve put a photograph of what it looks like.

GlassNSpoonBoth.png

On the left, a sketchy drawing of a spoon in water, showing what it “looks like” in truth; on the right, what it “looks like” to your eyes, distorted by the bending of the light’s path between the spoon and my camera.

You notice the difference, no doubt. The spoon on the left looks like a spoon; the spoon on the right looks like something invented by a surrealist painter.  But it’s just the effect of water and glass on light.

What’s on the left is a drawing of where the spoon is located inside the glass; it shows not what you would see with your eyes, but rather a depiction of the `true location’ of the spoon. On the right is what you will see with your eyes and brain, which is not showing you where the objects are in truth but rather is showing you where the light from those objects is coming from. The truth-depiction is drawn as though the light from the object goes straight from the object into your eyes. But when the light from an object does not take a straight path from the objects to you — as in this case, where the light’s path bends due to interaction of the light with the water and the glass — then the image created in your eyes or camera can significantly differ from a depiction of the truth.

The same issue arises, of course, with any sort of lens or mirror that isn’t flat. A room seen through a curved lens, or through an old, misshapen window pane, can look pretty strange.  And gravity? Strong gravity near a black hole drastically modifies the path of any light traveling close by!

In the figure below, the left panel shows a depiction of what we think the region around a black hole typically looks like in truth. There is a spinning disk of gas, called the accretion disk, a sort of holding station for the gas. At any moment, a small fraction of the gas, at the inner edge of the disk, has crept too close to the black hole and is starting to spiral in. There are also usually jets of material flying out, roughly aligned with the axis of the black hole’s rapid rotation and its magnetic field. As I mentioned above, that material is being flung out of the black hole’s neighborhood (not out of the black hole itself, which would be impossible.)

accretiondisk_real_apparent.jpg

Left: a depiction `in truth’ of the neighborhood of a black hole, showing the disk of slowly in-spiraling gas and the jets of material funneled outward by the black hole’s magnetic field.  The black hole is not directly shown, but is significantly smaller than the inner edge of the disk.  The color is not meaningful.  Right: a simulation by Hotaka Shiokawa of how such a black hole may appear to the Event Horizon Telescope [if its disk is tipped up a bit more than in the image at left.]  The color is arbitrary; mainly the brightness matters.  The left side of the disk appears brighter than the right side due to a `Doppler effect’; on the left the gas is moving toward us, increasing the intensity of the radio waves, while on the right side it is moving away.  The dark area at the center is the black hole’s sort-of-silhouette; see below.

The image you will be shown, however, will perhaps look like the one on the right. That is an image of the radio waves as observed here at Earth, after the waves’ paths have been wildly bent — warped by the intense gravity near the black hole. Just as with any lens or mirror or anything similar, what you will see does not directly reveal what is truly there. Instead, you must infer what is there from what you see.

Just as you infer, when you see the broken twisted spoon in the glass, that probably the spoon is not broken or twisted, and the water and glass have refracted the light in familiar ways, so too we must make assumptions to understand what we’re really looking at in truth after we see the images tomorrow.

How serious are these assumptions?  Certainly, at their first attempt, astronomers will assume Einstein’s theory of gravity, which predicts how the light is bent around a black hole, is correct. But the details of what we infer from what we’re shown might depend upon whether Einstein’s formulas are precisely right. It also may depend on the accuracy of our understanding of and assumptions about accretion disks. Further complicating the procedure is that the rate and axis of the black hole’s rotation affects the details of the bending of the light, and since we’re not sure of the rotation yet for these black holes, that adds to the assumptions that must gradually be disentangled.

Because of these assumptions, we will not have an unambiguous understanding of the true nature of what appears in these first images.

Are we seeing the `shadow’ of a black hole?

A shadow: that’s what astronomers call it, but as far as I can tell, this word is jargon masquerading as English… the most pernicious type of jargon.

What’s a shadow, in English? Your shadow is a dark area on the ground created by you blocking the light from the Sun, which emits light and illuminates the area around you. How would I see your shadow? I’d look at the ground — not at you.

This is not what we are doing in this case. The gas is glowing, illuminating the region. The black hole is `blocking’ [caution! see below] some of the light. We’re looking straight toward the black hole, and seeing dark areas where illumination doesn’t reach us. This is more like looking at someone’s silhouette, not someone’s shadow!

SilhouetteShadow2.png

With the Sun providing a source of illumination, a person standing between you and the Sun would appear as a silhouette that blocks part of the Sun, and would also create a shadow on the ground. [I’ve drawn the shadow slightly askew to avoid visual confusion.]  In the black hole images, illumination is provided by the glowing gas, and we’ll see a sort-of-silhouette [but see below!!!] of the black hole.  There’s nothing analogous to the ground, or to the person’s shadow, in the black hole images.

That being said, it’s much more clever than a simple silhouette, because of all that pesky bending of the light that the black hole is doing.  In an ordinary silhouette, the light from the illuminator travels in straight lines, and an object blocks part of the light.   But a black hole does not block your view of what’s behind it;  the light from the gas behind it gets bent around it, and thus can be seen after all!

Still, after you calculate all the bending, you find out that there’s a dark area from which no light emerges, which I’ll informally call a quasi-silhouette.  Just outside this is a `photon-sphere’, which creates a bright ring; THIS STATEMENT, MADE IN ALL THE PRESS ARTICLES I COULD FIND, IS MISLEADING, AS ARE THE THREE PARAGRAPHS THAT FOLLOW THE IMAGE BELOW; THEY ARE TRUE FOR A PERFECT CAMERA, BUT NOT FOR A CAMERA THAT CAN BARELY SEE THE IMAGE. BLURRING EFFECTS MAKE THESE CORRECT STATEMENTS SOMEWHAT IRRELEVANT AND CHANGE THE CONCLUSIONS.  SEE THIS POST FROM JUNE 2019 THAT EXPLAINS THE SITUATION MORE CLEARLY. That resembles what happens with certain lenses, in contrast to the person’s silhouette shown above, where the light travels in straight lines.  Imagine that a human body could bend light in such a way; a whimsical depiction of what that might look like is shown below:

QuasiSilhouette.png

If a human body could bend light the way a black hole does, it would distort the Sun’s appearance.  The light we’d expect to be blocked would instead be warped around the edges.  The dark area, no longer a simple outline of the body, would take on a complicated shape.

Note also that the black hole’s quasi-silhouette probably won’t be entirely dark. If material from the accretion disk (or a jet pointing toward us) lies between us and the black hole, it can emit light in our direction, partly filling in the dark region.

Thus the quasi-silhouette we’ll see in the images is not the outline of the black hole’s edge, but an effect of the light bending, and is in fact considerably larger than the black hole.  In truth it may be as much as 50% larger in radius than the event horizon, and the silhouette as seen in the image may appear  more than 2.5 to 5 times larger (depending on how fast the black hole rotates) than the true event horizon — all due to the bent paths of the radio waves. Note Added: THIS ASSUMES THAT THE QUASI-SILHOUETTE IS NOT SUBSTANTIALLY FILLED IN, WHICH, IT TURNS OUT, MAY NOT BE TRUE.
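
For the record, here is where numbers like these come from (this gloss is mine, using standard formulas, not the author’s derivation): the photon-sphere lies at radius 3GM/c², its lensed image appears at about √27 GM/c² ≈ 5.2 GM/c², and the horizon radius is 2GM/c² for a non-rotating black hole, shrinking toward GM/c² for a maximally rotating one, so roughly

\displaystyle  \frac{3\,GM/c^2}{2\,GM/c^2} = 1.5, \qquad \frac{\sqrt{27}\,GM/c^2}{2\,GM/c^2} \approx 2.6, \qquad \frac{\sim 5\,GM/c^2}{1\,GM/c^2} \approx 5,

which matches the 50% figure and the 2.5-to-5 range quoted above.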

Interestingly, it turns out that the details of how the black hole is rotating don’t much affect the size of the quasi-silhouette. Note Added: THIS IS TRUE, BUT NOT ENTIRELY RELEVANT WHEN USING AN IMPERFECT CAMERA.  The black hole in the Milky Way is already understood well enough that astronomers know how big its quasi-silhouette ought to appear, even though we don’t know its rotation speed and axis.  The quasi-silhouette’s size in the image will therefore be an important test of Einstein’s formulas for gravity, even on day one.  If the size disagrees with expectations, expect a lot of hullabaloo.

What is the event horizon, and is it present in the image?

The event horizon of a black hole, in Einstein’s theory of gravity, is not an object. It’s the edge of a region of no-return, as I’ve explained here. Anything that goes past that point can’t re-emerge, and nothing that happens inside can ever send any light or messages back out.

Despite what some writers are saying, we’re not expecting to see the event horizon in the images. As is hopefully clear by now, what astronomers are observing is (a) the radio waves from the gas near the black hole, after the waves have taken strongly curved paths, and (b) a quasi-silhouette of the black hole from which radio waves don’t emerge. But as I explained above, this quasi-silhouette is considerably larger than the event horizon, both in truth and in appearance. The event horizon does not emit any light, and it sits well inside the quasi-silhouette, not at its edge.

Still, what we’ll be seeing is closer to the event horizon than anything we’ve ever seen before, which is really exciting!   And if the silhouette has an unexpected appearance, we just might get our first hint of a breakdown of Einstein’s understanding of event horizons.  Don’t bet on it, but you can hope for it.

Can we hope to see the singularity of a black hole?

No, for two reasons.

First, there probably is no singularity in the first place. It drives me nuts when people say there’s a singularity inside a black hole; that’s just wrong. The correct statement is that in Einstein’s theory of gravity, singularities (i.e. unavoidable infinities) arise in the math — in the formulas for black holes (or more precisely, in the solutions to those formulas that describe black holes.) But a singularity in the math does not mean that there’s a singularity in nature!  It usually means that the math isn’t quite right — not that anyone made a mistake, but that the formulas that we’re using aren’t appropriate, and need to be modified in some way, because of some natural phenomenon that we don’t yet understand.

A singularity in a formula implies a mystery in nature — not a singularity in nature.

In fact, historically, in every previous case where singularities have appeared in formulas, it simply meant that the formula (or solution) was not accurate in that context, or wasn’t being used correctly. We already know that Einstein’s theory of gravity can’t be complete (it doesn’t accommodate quantum physics, which is also a part of the real world) and it would be no surprise if its incompleteness is responsible for these infinities. The math singularity merely signifies that the physics very deep inside a black hole isn’t understood yet.

[The same issue afflicts the statement that the Big Bang began with a singularity; the solution to the equations has a singularity, yes, but that’s very far from saying that nature’s Big Bang actually began with one.]

Ok, so how about revising the question: is there any hope of seeing the region deep inside the black hole where Einstein’s equations have a singularity? No. Remember what’s being observed is the stuff outside the black hole, and the black hole’s quasi-silhouette. What happens inside a black hole stays inside a black hole. Anything else is inference.

[More precisely, what happens inside a huge black hole stays inside a black hole for eons.  For tiny black holes it comes out sooner, but even so it is hopelessly scrambled in the form of `Hawking radiation.’]

Any other questions?

Maybe there are some questions that are bothering readers that I haven’t addressed here?  I’m super-busy this afternoon and all day Wednesday with non-physics things, but maybe if you ask the question early enough I can address it here before the press conference (at 9 am Wednesday, New York City time).   Also, if you find my answers confusing, please comment; I can try to further clarify them for later readers.

It’s a historic moment, or at least the first stage in a historic process.  To me, as I hope to all of you, it’s all very exciting and astonishing how the surreal weirdness of Einstein’s understanding of gravity, and the creepy mysteries of black holes, have suddenly, in just a few quick years, become undeniably real!

Matt StrasslerThe Black Hole `Photo’: What Are We Looking At?

The short answer: I’m really not sure yet.  [This post is now largely superseded by the next one, in which some of the questions raised below have now been answered.]  EVEN THAT POST WAS WRONG ABOUT THE PHOTON-SPHERE AND SHADOW.  SEE THIS POST FROM JUNE 2019 FOR SOME ESSENTIAL CORRECTIONS THAT WERE LEFT OUT OF ALL REPORTING ON THIS SUBJECT.

Neither are some of my colleagues who know more about the black hole geometry than I do. And at this point we still haven’t figured out what the Event Horizon Telescope experts do and don’t know about this question… or whether they agree amongst themselves.

[Note added: last week, a number of people pointed me to a very nice video by Veritasium illustrating some of the features of black holes, accretion disks and the warping of their appearance by the gravity of the black hole.  However, Veritasium’s video illustrates a non-rotating black hole with a thin accretion disk that is edge-on from our perspective; and this is definitely NOT what we are seeing!]

As I emphasized in my pre-photo blog post (in which I described carefully what we were likely to be shown, and the subtleties involved), this is not a simple photograph of what’s `actually there.’ We all agree that what we’re looking at is light from some glowing material around the solar-system-sized black hole at the heart of the galaxy M87.  But that light has been wildly bent on its path toward Earth, and so — just like a room seen through an old, warped window, and a dirty one at that — it’s not simple to interpret what we’re actually seeing. Where, exactly, is the material `in truth’, such that its light appears where it does in the image? Interpretation of the image is potentially ambiguous, and certainly not obvious.

The naive guess as to what to expect — which astronomers developed over many years, based on many studies of many suspected black holes — is crudely illustrated in the figure at the end of this post.  Material around a black hole has two main components:

  • An accretion disk of `gas’ (really plasma, i.e. a very hot collection of electrons, protons, and other atomic nuclei) which may be thin and concentrated, or thick and puffy, or something more complicated.  The disk extends inward to within a few times the radius of the black hole’s event horizon, the point of no-return; but how close it can be depends on how fast the black hole rotates.
  • Two oppositely-directed jets of material, created somehow by material from the disk being concentrated and accelerated by magnetic fields tied up with the black hole and its accretion disk; the jets begin not far from the event horizon, but then extend outward all the way to the outer edges of the entire galaxy.

But even if this is true, it’s not at all obvious (at least to me) what these objects look like in an image such as we saw Wednesday. As far as I am currently aware, their appearance in the image depends on

  • Whether the disk is thick and puffy, or thin and concentrated;
  • How far the disk extends inward and outward around the black hole;
  • The process by which the jets are formed and where exactly they originate;
  • How fast the black hole is spinning;
  • The orientation of the axis around which the black hole is spinning;
  • The typical frequencies of the radio waves emitted by the disk and by the jets (compared to the frequency, about 230 Gigahertz, observed by the Event Horizon Telescope);

and perhaps other things. I can’t yet figure out what we do and don’t know about these things; and it doesn’t help that some of the statements made by the EHT scientists in public and in their six papers seem contradictory (and I can’t yet say whether that’s because of typos, misstatements by them, or [most likely] misinterpretations by me.)

So here’s the best I can do right now, for myself and for you. Below is a figure that is nothing but an illustration of my best attempt so far to make sense of what we are seeing. You can expect that some fraction of this figure is wrong. Increasingly I believe this figure is correct in cartoon form, though the picture on the left is too sketchy right now and needs improvement.  [NOTE ADDED: AS EXPLAINED IN THIS MORE RECENT POST, THE “PHOTON-SPHERE” DOES NOT EXIST FOR A ROTATING BLACK HOLE; THE “PHOTON-RING” OF LIGHT THAT SURROUNDS THE SHADOW DOES NOT DOMINATE WHAT IS ACTUALLY SEEN IN THE IMAGE; AND THE DARK PATCH IN THE IMAGE ISN’T NECESSARILY THE ENTIRE SHADOW.]  What I’ll be doing this week is fixing my own misconceptions and trying to become clear on what the experts do and don’t know. Experts are more than welcome to set me straight!

In short — this story is not over, at least not for me. As I gain a clearer understanding of what we do and don’t know, I’ll write more about it.

 

[Image: MyFirstGuessBHPhoto.png]

My personal confused and almost certainly inaccurate understanding [the main inaccuracy is that the disk and jets are fatter than shown, and connected to one another near the black hole; that’s important because the main illumination source may be the connection region; also jets aren’t oriented quite right] of how one might interpret the black hole image; all elements subject to revision as I learn more. Left: the standard guess concerning the immediate vicinity of M87’s black hole: an accretion disk oriented nearly face-on from Earth’s perspective, jets aimed nearly at and away from us, and a rotating black hole at the center.  The orientation of the jets may not be correct relative to the photo.  Upper right: The image after the radio waves’ paths are bent by gravity.  The quasi-silhouette of the black hole is larger than the `true’ event horizon, a lot of radio waves are concentrated at the ‘photon-sphere’ just outside (brighter at the bottom due to the black-hole spinning clockwise around an axis slightly askew to our line of sight); some additional radio waves from the accretion disk and jets further complicate the image. Most of the disk and jets are too dim to see.  Lower Right: This image is then blurred out by the Event Horizon Telescope’s limitations, partly compensated for by heavy-duty image processing.

 

Matt StrasslerThe Black Hole `Photo’: Seeing More Clearly

THIS POST CONTAINS ERRORS CONCERNING THE EXISTENCE AND VISIBILITY OF THE SO-CALLED PHOTON-SPHERE AND SHADOW; THESE ERRORS WERE COMMON TO ESSENTIALLY ALL REPORTING ON THE BLACK HOLE ‘PHOTO’.  IT HAS BEEN SUPERSEDED BY THIS POST, WHICH CORRECTS THESE ERRORS AND EXPLAINS THE SITUATION.

Ok, after yesterday’s post, in which I told you what I still didn’t understand about the Event Horizon Telescope (EHT) black hole image (see also the pre-photo blog post in which I explained pedagogically what the image was likely to show and why), today I can tell you that quite a few of the gaps in my understanding are filling in (thanks mainly to conversations with Harvard postdoc Alex Lupsasca and science journalist Davide Castelvecchi, and to direct answers from professor Heino Falcke, who leads the Event Horizon Telescope Science Council and co-wrote a founding paper in this subject).  And I can give you an update to yesterday’s very tentative figure.

First: a very important point, to which I will return in a future post, is that as I suspected, it’s not at all clear what the EHT image really shows.   More precisely, assuming Einstein’s theory of gravity is correct in this context:

  • The image itself clearly shows a black hole’s quasi-silhouette (called a `shadow’ in expert jargon) and its bright photon-sphere where photons [particles of light — of all electromagnetic waves, including radio waves] can be gathered and focused.
  • However, all the light (including the observed radio waves) coming from the photon-sphere was emitted from material well outside the photon-sphere; and the image itself does not tell you where that material is located.  (To quote Falcke: this is `a blessing and a curse’; insensitivity to the illumination source makes it easy to interpret the black hole’s role in the image but hard to learn much about the material near the black hole.) It’s a bit analogous to seeing a brightly shining metal ball while not being able to see what it’s being lit by… except that the photon-sphere isn’t an object.  It’s just a result of the play of the light [well, radio waves] directed by the bending effects of gravity.  More on that in a future post.
  • When you see a picture of an accretion disk and jets drawn to illustrate where the radio waves may come from, keep in mind that it involves additional assumptions — educated assumptions that combine many other measurements of M87’s black hole with simulations of matter, gravity and magnetic fields interacting near a black hole.  But we should be cautious: perhaps not all the assumptions are right.  The image shows no conflicts with those assumptions, but neither does it confirm them on its own.

Just to indicate the importance of these assumptions, let me highlight a remark made at the press conference that the black hole is rotating quickly, clockwise from our perspective.  But (as the EHT papers state) if one doesn’t make some of the above-mentioned assumptions, one cannot conclude from the image alone that the black hole is actually rotating.  The interplay of these assumptions is something I’m still trying to get straight.

Second, if you buy all the assumptions, then the picture I drew in yesterday’s post is mostly correct except (a) the jets are far too narrow, and shown overly disconnected from the disk, and (b) they are slightly mis-oriented relative to the orientation of the image.  Below is an improved version of this picture, probably still not the final one.  The new features: the jets (now pointing in the right directions relative to the photo) are fatter and not entirely disconnected from the accretion disk.  This is important because the dominant source of illumination of the photon-sphere might come from the region where the disk and jets meet.

[Image: My3rdGuessBHPhoto.png]

Updated version of yesterday’s figure: main changes are the increased width and more accurate orientation of the jets.  Working backwards: the EHT image (lower right) is interpreted, using mainly Einstein’s theory of gravity, as (upper right) a thin photon-sphere of focused light surrounding a dark patch created by the gravity of the black hole, with a little bit of additional illumination from somewhere.  The dark patch is 2.5 – 5 times larger than the event horizon of the black hole, depending on how fast the black hole is rotating; but the image itself does not tell you how the photon-sphere is illuminated or whether the black hole is rotating.  Using further assumptions, based on previous measurements of various types and computer simulations of material, gravity and magnetic fields, a picture of the black hole’s vicinity (upper left) can be inferred by the experts. It consists of a fat but tenuous accretion disk of material, almost face-on, some of which is funneled into jets, one heading almost toward us, the other in the opposite direction.  The material surrounds but is somewhat separated from a rotating black hole’s event horizon.  At this radio frequency, the jets and disk are too dim in radio waves to see in the image; only at (and perhaps close to) the photon-sphere, where some of the radio waves are collected and focused, are they bright enough to be easily discerned by the Event Horizon Telescope.

 

Cylindrical OnionThere’s No Such Thing as a Fish… At CERN

Guest post by James Gillies

Left to right: James Harkin, Anna Ptaszynski, Dan Schreiber and Andrew Hunter Murray, collectively the QI Elves, pose for selfies at CMS. Their guide, CMS physicist Dave Barney, is on the far left.

 

On Friday 31 May, some of the writing team behind the cult UK TV panel show, QI, visited CERN to find inspiration before a live show in Geneva that night.

David Hogginformation theory and noise

In my small amount of true research time today, I wrote an abstract for the information-theory (or is it data-analysis?) paper that Bedell and I are writing about extreme-precision radial-velocity spectroscopy. The question is: What is the best precision you can achieve, and what data-analysis methods saturate the bound? The answer depends, of course, on the kinds of noise you have in your data! Oh, and what counts as noise.

BackreactionPhysicists are out to unlock the muon’s secret

Fermilab g-2 experiment. [Image Glukicov/Wikipedia] Physicists count 25 elementary particles that, for all we presently know, cannot be divided any further. They collect these particles and their interactions in what is called the Standard Model of particle physics. But the matter around us is made of merely three particles: up and down quarks (which combine to protons and neutrons, which

June 13, 2019

n-Category Café What's a One-Object Sesquicategory?

A sesquicategory, or 1\frac{1}{2}-category, is like a 2-category, but without the interchange law relating vertical and horizontal composition of 2-morphisms:

(\alpha \cdot \beta)(\gamma \cdot \delta) = (\alpha \gamma) \cdot (\beta \delta)

Better, sesquicategories are categories enriched over (Cat, \square): the category of categories with its “white” tensor product. In the cartesian product of categories C and D, namely C \times D, we have the law

(f \times 1)(1 \times g) = (1 \times g)(f \times 1)

and we can define f \times g to be either of these. In the white tensor product C \square D we do not have this law, and f \times g makes no sense.

What’s a one-object sesquicategory?

A one-object sesquicategory is like a strict monoidal category, but without the law

(f \otimes 1)(1 \otimes g) = (1 \otimes g)(f \otimes 1)

I seem to have run into a bunch of interesting examples. Is there some name for these gadgets?

If not, I may take the “one-and-a-half” joke embedded in the word “sesquicategory”, and subtract one. That would make these things semi-monoidal categories.

(The name “white” tensor product is part of another string of jokes, involving the white, Gray, and black tensor products of 2-categories. The white tensor product is also called the “funny” tensor product.)

June 12, 2019

BackreactionGuest Post: A conversation with Lee Smolin about his new book "Einstein’s Unfinished Revolution"

[Tam Hunt sent me another lengthy interview, this time with Lee Smolin. Smolin is a faculty member at the Perimeter Institute for Theoretical Physics in Canada and adjunct professor at the University of Waterloo. He is one of the founders of loop quantum gravity. In the past decades, Smolin’s interests have drifted to the role of time in the laws of nature and the foundations of quantum

Dave BaconHello world!

Welcome to WordPress. This is your first post. Edit or delete it, then start writing!

June 11, 2019

Tommaso DorigoA Microscope - What A Wonderful Toy

I have always been fascinated by optical instruments that provide magnified views of Nature: microscopes, binoculars, telescopes. As a child I badly wanted to watch the Moon, planets, and stars, and see as much detail as I could on all possible targets; at the same time, I avidly used a toy microscope to watch the microworld. So it is not a surprise to find out I have grown up into a particle physicist - I worked hard to put myself in a vantage position from where I can study the smallest building blocks of matter with the most powerful microscope ever constructed, the Large Hadron Collider (LHC). 

read more

Georg von HippelLooking for guest bloggers to cover LATTICE 2019

My excellent reason for not attending LATTICE 2018 has become a lot bigger, much better at many things, and (if possible) even more beautiful — which means I won't be able to attend LATTICE 2019 either (I fully expect to attend LATTICE 2020, though). So once again I would greatly welcome guest bloggers willing to cover LATTICE 2019; if you are at all interested, please send me an email and we can arrange to grant you posting rights.

Georg von HippelBook Review: "Lattice QCD — Practical Essentials"

There is a new book about Lattice QCD, Lattice Quantum Chromodynamics: Practical Essentials by Francesco Knechtli, Michael Günther and Mike Peardon. At 140 pages, this is a pretty slim volume, so it is obvious that it does not aim to displace time-honoured introductory textbooks like Montvay and Münster, or the newer books by Gattringer and Lang or DeGrand and DeTar. Instead, as suggested by the subtitle "Practical Essentials", and as said explicitly by the authors in their preface, this book aims to prepare beginning graduate students for their practical work in generating gauge configurations and measuring and analysing correlators.

In line with this aim, the authors spend relatively little time on the physical or field-theoretic background; while some more advanced topics such as the Nielsen-Ninomiya theorem and the Symanzik effective theory are touched upon, the treatment of foundational topics is generally quite brief, and some topics, such as lattice perturbation theory or non-perturbative renormalization, are omitted altogether. The focus of the book is on Monte Carlo simulations, for which both the basic ideas and practically relevant algorithms — heatbath and overrelaxation for pure gauge fields, and hybrid Monte Carlo (HMC) for dynamical fermions — are described in some detail, including the RHMC algorithm and advanced techniques such as determinant factorizations, higher-order symplectic integrators, and multiple-timescale integration. The techniques from linear algebra required to deal with fermions are also covered in some detail, from the basic ideas of Krylov-space methods through concrete descriptions of the GMRES and CG algorithms, along with such important preconditioners as even-odd and domain decomposition, to the ideas of algebraic multigrid methods. Stochastic estimation of all-to-all propagators with dilution, the one-end trick and low-mode averaging are explained, as are techniques for building interpolating operators with specific quantum numbers, gauge link and quark field smearing, and the use of the variational method to extract hadronic mass spectra. Scale setting, the Wilson flow, and Lüscher's method for extracting scattering phase shifts are also discussed briefly, as are the basic statistical techniques for data analysis. Each chapter contains a list of references to the literature covering both original research articles and reviews and textbooks for further study.

Overall, I feel that the authors succeed very well at their stated aim of giving a quick introduction to the methods most relevant to current research in lattice QCD in order to let graduate students hit the ground running and get to perform research as quickly as possible. In fact, I am slightly worried that they may turn out to be too successful, since a graduate student having studied only this book could well start performing research, while having only a very limited understanding of the underlying field-theoretical ideas and problems (a problem that already exists in our field in any case). While this in no way detracts from the authors' achievement, and while I feel I can recommend this book to beginners, I nevertheless have to add that it should be complemented by a more field-theoretically oriented traditional textbook for completeness.

___
Note that I have deliberately not linked to the Amazon page for this book. Please support your local bookstore — nowadays, you can usually order online on their websites, and many bookstores are more than happy to ship books by post.

June 10, 2019

Matt StrasslerMinor Technical Difficulty with WordPress

Hi all — sorry to bother you with an issue you may not even have noticed, but about 18 hours ago a post of mine that was under construction was accidentally published, due to a WordPress bug.  Since it isn’t done yet, it isn’t readable (and has no figures yet) and may still contain errors and typos, so of course I tried to take it down immediately.  But it seems some of you are still getting the announcement of it or are able to read parts of it.  Anyway, I suggest you completely ignore it, because I’m not done working out the scientific details yet, nor have I had it checked by my more expert colleagues; the prose and perhaps even the title may change greatly before the post comes out later this week.  Just hang tight and stay tuned…

Doug NatelsonRound-up of various links

I'll be writing more soon, but in the meantime, some items of interest:

  • A cute online drawing utility for making diagrams and flowcharts is available free at https://www.draw.io/.
  • There is more activity afoot regarding the report of possible Au/Ag superconductivity.  For example, Jeremy Levy has a youtube video about this topic, and I think it's very good - I agree strongly with the concerns about heterogeneity and percolation. The IIS group also has another preprint on the arxiv, this one looking at I-V curves and hysteresis in these Au/Ag nanoparticle films.  Based on my prior experience with various "resistive switching" systems and nanoparticle films, hysteretic current-voltage characteristics don't surprise me when biases on the scale of volts and currents on the scale of mA are applied to aggregated nanoparticles.  
  • Another group finds weird effects in sputtered Au/Ag films, and these have similar properties as those discussed by Prof. Levy.  
  • Another group finds apparent resistive superconducting transitions in Au films ion-implanted with Ag, with a transition temperature of around 2 K.  These data look clean and consistent - it would be interesting to see Meissner effect measurements here.  
  • For reference, it's worth noting that low temperature superconductivity in Au alloys is not particularly rare (pdf here from 1984, for example, or this more recent preprint).   
  • On a completely different note, I really thought this paper on the physics of suction cups was very cute.
  • Following up, Science had another article this week about graduate programs dropping the GRE requirement.
  • This is a very fun video using ball bearings to teach about crystals - just like with drought balls, we see that aspects of crystallinity like emergent broken symmetries and grain boundaries are very generic.

June 07, 2019

Mark Chu-CarrollCategory Theory Lesson 3: From Arrows to Lambda

Quick personal aside: I haven’t been posting a lot on here lately. I keep wanting to get back to it; but each time I post anything, I’m met by a flurry of crap: general threats, lawsuit threats, attempts to steal various of my accounts, spam to my contacts on linkedin, subscriptions to ashley madison or gay porn sites, etc. It’s pretty demotivating. I shouldn’t let the jerks drive me away from my hobby of writing for this blog!

I started this series of posts by saying that Category Theory was an extremely abstract field of mathematics which was really useful in programming languages and in particular in programming language type systems. We’re finally at one of the first places where you can really see how that’s going to work.

If you program in Scala, you might have encountered curried functions. A curried function is something that’s in-between a one-parameter function and a two parameter function. For a trivial example, we could write a function that adds two integers in its usual form:

  def addInt(x: Int, y: Int): Int = x + y

That’s just a normal two parameter function. Its curried form is slightly different. It’s written:

  def curriedAddInt(x: Int)(y: Int): Int = x + y

The curried version isn’t actually a two parameter function. It’s a shorthand for:

  def realCurriedAddInt(x: Int): (Int => Int) = (y: Int) => x + y

That is: realCurriedAddInt is a function which takes an integer, x, and returns a function which takes one parameter, and adds x to that parameter.

Currying is the operation of taking a two-parameter function and turning it into a one-parameter function that returns another one-parameter function – that is, the general form of converting addInt to realCurriedAddInt. It might be easier to read its type: realCurriedAddInt: Int => (Int => Int): it’s a function that takes an int, and returns a new function from int to int.
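
Here’s a quick REPL-style sketch of how the definitions above behave (using the names as written in this post):

  val addTwo: Int => Int = realCurriedAddInt(2)  // partially applied: we get back a function
  addTwo(3)                 // 5
  curriedAddInt(2)(3)       // 5: supplying both parameter lists at once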

So what does that have to do with category theory?

One of the ways that category theory applies to programming languages is that types and type theory turn out to be natural categories. Almost any programming language type system is a category. For example, the figure below shows a simple view of a programming language with the types Int, Bool, and Unit. Unit is the terminal object, and so all of the primitive constants are defined with arrows from Unit.

For the most part, that seems pretty simple: a type T is an object in the programming language category; a function implemented in the language that takes a parameter of type A and returns a value of type B is an arrow from A to B. A multi-parameter function just uses the cartesian product: a function that takes (A, B) and returns a C is an arrow A \times B \rightarrow C.

But how could we write the type of a function like our curried adder? It’s a function from a value to a function. The types in our language are objects in the category. So where’s the object that represents functions from A to B?

As we often do, we’ll start by thinking about some basic concepts from set theory, and then generalize them into categories and arrows. In set theory, we can define the set of functions from A to B as B^A=\{f: A \rightarrow B\} – that is, written as an exponential whose base is the codomain B and whose exponent is the domain A.

  • There’s a product object B^A \times A.
  • There’s an arrow from B^A \times A \rightarrow B, which we’ll call eval.

In terms of the category of sets, what that means is:

  • You can create a pair of a function from A \rightarrow B and an element of A.
  • There is a function named eval which takes that pair, and returns an instance of B.

Like we saw with products, there are a lot of potential exponential objects C which have the necessary product with A, and an arrow from that product to B. But which one is the ideal exponential? Again, we’re trying to get to the object with the universal property – the terminal object in the category of potential exponentials. So we use the same pattern as before: for any potential exponential, there’s an arrow from it to the actual exponential, and the object that receives an arrow from every other potential exponential is the true exponential.

Let’s start putting that together. A potential exponential C for B^A is an object where the following product and arrow exist:

There’s an instance of that pattern for the real exponential:

We can create a category of these potential exponentials. In that category, there will be an arrow from every potential exponential to the real exponential. Each of the potential exponentials has the necessary property of an exponential – that product and eval arrow above – but they also have other properties.

In that category of potential exponentials of B^A, there’s an arrow from an object X to an object Y if the following conditions hold in the base category:

  • There is an arrow \text{curry}(x,y): X \rightarrow Y in the base category.
  • There is an arrow \text{curry}(x,y)\times id_A: X\times A \rightarrow Y\times A
  • \text{eval}_y \circ (\text{curry}(x,y)\times id_A) = \text{eval}_x

It’s easiest to understand that by looking at what it means in Set:

  • We’ve got sets X and Y, which we believe are potential exponents.
  • X has a function \text{eval}_x: X \times A \rightarrow B.
  • Y has a function \text{eval}_y: Y \times A \rightarrow B.
  • There’s a function \text{curry}: X \rightarrow Y which converts a value of X to a value of Y, and a corresponding function \text{curry}(\cdot)\times\text{id}_A: X\times A \rightarrow Y\times A, which given a pair (x, a) \in X\times A transforms it into the pair (\text{curry}(x), a) \in Y\times A, such that \text{eval}_x(x, a)=\text{eval}_y(\text{curry}(x), a). In other words, if we restrict the inputs to Y to be effectively the same as the inputs to X, then the two eval functions do the same thing. (Why do I say restrict? Because \text{eval}_y might have a larger domain than the range of X, but these rules won’t capture that.)

An arrow in the category of potential exponentials is a pair of arrows in the base category: one arrow C \rightarrow B^A, and one arrow C\times A \rightarrow B^A \times A. Since the two arrows are deeply related (they’re one arrow in the category of potential exponentials), we’ll call them \text{curry}(g) and \text{curry}(g)\times id_A. (Note that we’re not really taking the product of an arrow here: we haven’t talked about anything like taking products of arrows! All we’re doing is giving the arrow a name that helps us understand it. The name makes it clear that we’re not touching the right-hand component of the product.)

Since the exponential is the terminal object in this category, that pair of curry arrows must exist from every potential exponential to the true exponential. So the exponential object is the unique (up to isomorphism) object for which the following is true:

  • There’s an arrow \text{eval}: B^A \times A \rightarrow B. Since B^A is the type of functions from A to B, \text{eval} represents the application of one of those functions to a value of type A to produce a result of type B.
  • For each two-parameter function g:C\times A\rightarrow B, there is a unique function (arrow) \text{curry}(g) that makes the following diagram commute

Now, how does all this relate to what we understand as currying?

It shows us that in category theory we can have an object that effectively represents a function type, living in the same category as the objects that represent the types of values it operates on; and we can capture the notion of applying a value of that function type to a value of its parameter type using an arrow.
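
To make that concrete, here’s a small self-contained Scala sketch (an illustration in Set/Scala, rather than in an arbitrary category): the exponential object B^A is just the function type A => B, eval applies a function value to an argument, and curry(g) plays the role of the arrow guaranteed by the universal property.

  object ExponentialSketch {
    // eval : B^A x A -> B  -- apply a function value to an argument
    def eval[A, B](fa: (A => B, A)): B = fa._1(fa._2)

    // curry(g) : C -> B^A  -- for a "two-parameter" arrow g : C x A -> B
    def curry[C, A, B](g: ((C, A)) => B): C => (A => B) =
      c => a => g((c, a))

    def main(args: Array[String]): Unit = {
      val add: ((Int, Int)) => Int = { case (x, y) => x + y }
      val curried = curry(add)                 // Int => (Int => Int)
      println(eval((curried(2), 3)))           // 5
      // The commuting-diagram condition: eval after (curry(g) x id) agrees with g.
      println(eval((curry(add)(2), 3)) == add((2, 3)))  // true
    }
  }

The last line is the triangle from the universal property, checked at a single input; in Set, curry(g) is the unique arrow with this property.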

As I said before: not every category has a structure that can support exponentiation. The examples of this aren’t particularly easy to follow. The easiest one I’ve found is Top, the category of topological spaces. In Top, the exponential doesn’t exist for many objects. Objects in Top are topological spaces, and arrows are continuous functions between them. For any two objects in Top, you can create the necessary objects for the exponential. But for many topological spaces, the required arrows don’t exist. The functions that they correspond to exist in Set, but they’re not continuous – and so they aren’t arrows in Top. (The strict requirement is that for an exponential X^Y to exist, Y must be a locally compact Hausdorff space. What that means is well beyond the scope of this!)

Cartesian Closed Categories

If you have a category C, and for every pair of objects A and B in the category C, there exists an exponential object B^A \in C, then we’ll say that C has exponentiation. Similarly, if for every pair of objects A, B \in Ob(C), there exists a product object A\times B, we say that the category has products.

There’s a special kind of category, called a cartesian closed category, which is a category  where:

  1. Every pair of objects has both product and exponent objects; and
  2. Which has at least one terminal object. (Remember that terminals are something like singletons, and so they work as a way of capturing the notion of being a single element of an object; so this requirement basically says that the category has at least one value that “functions” can be applied to.)

That may seem like a very arbitrary set of rules: what’s so important about having all products, exponents, and a terminal object?

It means that we have a category which can model types, functions, and function application. Lambda calculus proves that that’s all you need to model computation. Cartesian closed categories are, basically, a formal model of a computing system! Any cartesian closed category is a model for a simply typed \lambda-calculus; and \lambda-calculus is something known as the internal language of a cartesian closed category.

What “internal language” means formally is complicated, but in simple terms: you can take any computation in lambda calculus, and perform the computation by chasing arrows in a category diagram of a cartesian closed category that includes the values of that calculus. Alternatively, every computation step that you perform evaluating a \lambda-calculus expression corresponds to an arrow in a CCC.

References

For this post, I’ve made heavy use of:

Matt von HippelAcademic Age

Growing up in the US there are a lot of age-based milestones. You can drive at 16, vote at 18, and drink at 21. Once you’re in academia though, your actual age becomes much less relevant. Instead, academics are judged based on academic age, the time since you got your PhD.

And no, we don’t get academic birthdays

Grants often have restrictions based on academic age. The European Research Council’s Starting Grant, for example, demands an academic age of 2-7. If you’re academically “older”, they expect more from you: you must instead apply for a Consolidator Grant, or an Advanced Grant.

More generally, when academics apply for jobs they are often weighed in terms of academic age. Compared to others, how long have you spent as a postdoc since your PhD? How many papers have you published since then, and how well cited were they? The longer you spend without finding a permanent position, the more likely employers are to wonder why, and the reasons they assume are rarely positive.

This creates some weird incentives. If you have a choice, it’s often better to graduate late than to graduate early. Employers don’t check how long you took to get your PhD, but they do pay attention to how many papers you published. If it’s an option, staying in school to finish one more project can actually be good for your career.

Biological age matters, but mostly for biological reasons: for example, if you plan to have children. Raising a family is harder if you have to move every few years, so those who find permanent positions by then have an easier time of it. That said, as academics have to take more temporary positions before settling down, fewer people have this advantage.

Beyond that, biological age only matters again at the end of your career, especially if you work somewhere with a mandatory retirement age. Even then, retirement for academics doesn’t mean the same thing as for normal people: retired professors often have emeritus status, meaning that while technically retired they keep a role at the university, maintaining an office and often still doing some teaching or research.

June 06, 2019

n-Category Café Nonstandard Models of Arithmetic

A nice quote:

There seems to be a murky abyss lurking at the bottom of mathematics. While in many ways we cannot hope to reach solid ground, mathematicians have built impressive ladders that let us explore the depths of this abyss and marvel at the limits and at the power of mathematical reasoning at the same time.

This is from Matthew Katz and Jan Reimann’s nice little book An Introduction to Ramsey Theory: Fast Functions, Infinity, and Metamathematics. I’ve been talking to my old friend Michael Weiss about nonstandard models of Peano arithmetic on his blog. We just got into a bit of Ramsey theory. But you might like the whole series of conversations.

  • Part 1: I say I’m trying to understand “recursively saturated” models of Peano arithmetic, and Michael dumps a lot of information on me. The posts get easier to read after this one!

  • Part 2: I explain my dream: to show that the concept of “standard model” of Peano arithmetic is more nebulous than many seem to think. We agree to go through Ali Enayat’s paper Standard models of arithmetic.

  • Part 3: We talk about the concept of “standard model”, and the ideas of some ultrafinitists.

  • Part 4: Michael mentions “the theory of true arithmetic”, and I ask what that means. We decide that a short dive into the philosophy of mathematics may be required.

  • Part 5: Michael explains his philosophies (plural!) of mathematics, and how they affect his attitude toward the natural numbers and the universe of sets.

  • Part 6: After explaining my distaste for the Punch-and-Judy approach to the philosophy of mathematics (of which Michael is not guilty), I point out a strange fact: our views on the infinite cast shadows on our study of the natural numbers. For example: large cardinal axioms help us name larger finite numbers.

  • Part 7: We discuss Enayat’s concept of “a T-standard model of PA”, where T is some set of axioms extending ZF. We conclude with a brief digression into Hermetic philosophy: “as above, so below”.

  • Part 8: We discuss the tight relation between PA and ZFC with the axiom of infinity replaced by its negation. We then chat about Ramsey theory as a warmup for the Paris–Harrington Theorem.

  • Part 9: Michael sketches the proof of the Paris–Harrington Theorem, which says that a certain rather simple theorem about combinatorics can be stated in PA, and proved in ZFC, but not proved in PA. The proof he sketches builds a nonstandard model in which this theorem does not hold!

John BaezNonstandard Models of Arithmetic

There seems to be a murky abyss lurking at the bottom of mathematics. While in many ways we cannot hope to reach solid ground, mathematicians have built impressive ladders that let us explore the depths of this abyss and marvel at the limits and at the power of mathematical reasoning at the same time.

This is a quote from Matthew Katz and Jan Reimann’s book An Introduction to Ramsey Theory: Fast Functions, Infinity, and Metamathematics. I’ve been talking to my old friend Michael Weiss about nonstandard models of Peano arithmetic on his blog. We just got into a bit of Ramsey theory. But you might like the whole series of conversations, which are precisely about this murky abyss.

Here it is so far:

Part 1: I say I’m trying to understand ‘recursively saturated’ models of Peano arithmetic, and Michael dumps a lot of information on me. The posts get easier to read after this one!

Part 2: I explain my dream: to show that the concept of ‘standard model’ of Peano arithmetic is more nebulous than many seem to think. We agree to go through Ali Enayat’s paper Standard models of arithmetic.

Part 3: We talk about the concept of ‘standard model’, and the ideas of some ultrafinitists: Alexander Yessenin-Volpin and Edward Nelson.

Part 4: Michael mentions “the theory of true arithmetic”, and I ask what that means. We decide that a short dive into the philosophy of mathematics may be required.

Part 5: Michael explains his philosophies of mathematics, and how they affect his attitude toward the natural numbers and the universe of sets.

Part 6: After explaining my distaste for the Punch-and-Judy approach to the philosophy of mathematics (of which Michael is thankfully not guilty), I point out a strange fact: our views on the infinite cast shadows on our study of the natural numbers. For example: large cardinal axioms help us name larger finite numbers.

Part 7: We discuss Enayat’s concept of “a T-standard model of PA”, where T is some axiom system for set theory. I describe my crazy thought: maybe your standard natural numbers are nonstandard for me. We conclude with a brief digression into Hermetic philosophy: “as above, so below”.

Part 8: We discuss the tight relation between PA and ZFC with the axiom of infinity replaced by its negation. We then chat about Ramsey theory as a warmup for the Paris–Harrington Theorem.

Part 9: Michael sketches the proof of the Paris–Harrington Theorem, which says that a certain rather simple theorem about combinatorics can be stated in PA, and proved in ZFC, but not proved in PA. The proof he sketches builds a nonstandard model in which this theorem does not hold!

June 05, 2019

Tommaso DorigoA Mesmerizing Double Shadow On Jupiter

Last night I was absolutely mesmerized by observing the transit of Ganymede and Io, two of Jupiter's largest four moons, on Jupiter's disk. Along with them, their respective ink-black shadows slowly crossed the illuminated disk of the gas giant. The show lasted a few hours, and by observing it through a telescope I could see a three-dimensional view of the bodies, and appreciate the dynamics of that miniature planetary system. 


In this post I wish to explain to you, dear reader, just why the whole thing is so fascinating and fantabulous to see, in the hope that, should you have a chance to observe it yourself, you grab the occasion without considering the lack of sleep it entails. I am sure you will thank me later.


read more

Clifford JohnsonNews from the Front, XVII: Super-Entropic Instability

I'm quite excited because of some new results I got recently, which appeared on the ArXiv today. I've found a new (and I think, possibly important) instability in quantum gravity.

Said more carefully, I've found a sibling to Hawking's celebrated instability that manifests itself as black hole evaporation. This new instability also results in evaporation, driven by Hawking radiation, and it can appear for black holes that might not seem unstable to evaporation in ordinary circumstances (i.e., there's no Hawking channel to decay), but turn out to be unstable upon closer examination, in a larger context. That context is the extended gravitational thermodynamics you've read me talking about here in several previous posts (see e.g. here and here). In that framework, the cosmological constant is dynamical and enters the thermodynamics as a pressure variable, p. It has a conjugate, V, which is a quantity that can be derived once you know the pressure and the mass of the black hole.

Well, Hawking evaporation is a catastrophic quantum phenomenon that follows from the fact that the radiation temperature of a Schwarzschild black hole (the simplest one you can think of) goes inversely with the mass. So the black hole radiates and loses energy, reducing its mass. But that means that it will radiate at even higher temperature, driving its mass down even more. So it will radiate even more, and so on. So it is an instability in the sense that the system drives itself even further away from where it started at every moment. Like a pencil falling over from balancing on a point.
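
To make “goes inversely with the mass” concrete: the textbook Hawking temperature of a Schwarzschild black hole is

T_H = \frac{\hbar c^3}{8 \pi G M k_B},

so as the hole loses mass its temperature rises, and the runaway follows.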

This is the original quantum instability for gravitational systems. It's, as you probably know, very important. (Although in our universe, the temperature of radiation is so tiny for astrophysical black holes (they have large mass) that the effect is washed out by the local temperature of the universe... But if the universe ever had microscopic black holes, they'd have radiated in this way...)

So very nice, so very 1970s. What have I found recently?

A nice way of expressing the above instability is to simply say [...] Click to continue reading this post

The post News from the Front, XVII: Super-Entropic Instability appeared first on Asymptotia.

John BaezQuantum Physics and Logic 2019

[Image: open_petri_4]

There’s another conference involving applied category theory at Chapman University!

• Quantum Physics and Logic 2019, June 9-14, 2019, Chapman University, Beckman Hall 404. Organized by Matthew Leifer, Lorenzo Catani, Justin Dressel, and Drew Moshier.

The QPL series started out being about quantum programming languages, but it later broadened its scope while keeping the same acronym. This conference series now covers quite a range of topics, including the category-theoretic study of physical systems. My students Kenny Courser, Jade Master and Joe Moeller will be speaking there, and I’ll talk about Kenny’s new work on structured cospans as a tool for studying open systems.

Program

The program is here.

Invited talks

• John Baez (UC Riverside), Structured cospans.

• Anna Pappa (University College London), Classical computing via quantum means.

• Joel Wallman (University of Waterloo), TBA.

Tutorials

• Ana Belen Sainz (Perimeter Institute), Bell nonlocality: correlations from principles.

• Quanlong Wang (University of Oxford) and KangFeng Ng (Radboud University), Completeness of the ZX calculus.

June 03, 2019

Scott AaronsonNP-complete Problems and Physics: A 2019 View

If I want to get back to blogging on a regular basis, given the negative amount of time that I now have for such things, I’ll need to get better at dispensing with pun-filled titles, jokey opening statements, etc. etc., and resigning myself to a less witty, more workmanlike blog.

So in that spirit: a few weeks ago I gave a talk at the Fields Institute in Toronto, at a symposium to celebrate Stephen Cook and the 50th anniversary (or actually more like 48th anniversary) of the discovery of NP-completeness. Thanks so much to the organizers for making this symposium happen.

You can watch the video of my talk here (or read the PowerPoint slides here). The talk, on whether NP-complete problems can be efficiently solved in the physical universe, covers much the same ground as my 2005 survey article on the same theme (not to mention dozens of earlier talks), but this is an updated version and I’m happier with it than I was with most past iterations.

As I explain at the beginning of the talk, I wasn’t going to fly to Toronto at all, due to severe teaching and family constraints—but my wife Dana uncharacteristically urged me to go (“don’t worry, I’ll watch the kids!”). Why? Because in her view, it was the risks that Steve Cook took 50 years ago, as an untenured assistant professor at Berkeley, that gave birth to the field of computational complexity that Dana and I both now work in.

Anyway, be sure to check out the other talks as well—they’re by an assortment of random nobodies like Richard Karp, Avi Wigderson, Leslie Valiant, Michael Sipser, Alexander Razborov, Cynthia Dwork, and Jack Edmonds. I found the talk by Edmonds particularly eye-opening: he explains how he thought about (the objects that we now call) P and NP∩coNP when he first defined them in the early 60s, and how it was similar to and different from the way we think about them today.

Another memorable moment came when Edmonds interrupted Sipser’s talk—about the history of P vs. NP—to deliver a booming diatribe about how what really matters is not mathematical proof, but just how quickly you can solve problems in the real world. Edmonds added that, from a practical standpoint, P≠NP is “true today but might become false in the future.” In response, Sipser asked “what does a mathematician like me care about the real world?,” to roars of approval from the audience. I might’ve picked a different tack—about how for every practical person I meet for whom it’s blindingly obvious that “in real life, P≠NP,” I meet another for whom it’s equally obvious that “in real life, P=NP” (for all the usual reasons: because SAT solvers work so well in practice, because physical systems so easily relax into their ground states, etc). No wonder it took 25+ years of smart people thinking about operations research and combinatorial optimization before the P vs. NP question was even explicitly posed.


Unrelated Announcement: The Texas Advanced Computing Center (TACC), a leading supercomputing facility in North Austin that’s part of the University of Texas, is seeking to hire a Research Scientist focused on quantum computing. Such a person would be a full participant in our Quantum Information Center at UT Austin, with plenty of opportunities for collaboration. Check out their posting!

John PreskillThermodynamics of quantum channels

You would hardly think that a quantum channel could have any sort of thermodynamic behavior. We were surprised, too.

How do the laws of thermodynamics apply in the quantum regime? Thanks to novel ideas introduced in the context of quantum information, scientists have been able to develop new ways to characterize the thermodynamic behavior of quantum states. If you’re a Quantum Frontiers regular, you have certainly read about these advances in Nicole’s captivating posts on the subject.

Asking the same question for quantum channels, however, turned out to be more challenging than expected. A quantum channel is a way of representing how an input state can change into an output state according to the laws of quantum mechanics. Let’s picture it as a box with an input state and an output state, like so:

[Image: channel-01]

A computing gate, the building block of quantum computers, is described by a quantum channel. Or, if Alice sends a photon to Bob over an optical fiber, then the whole process is represented by a quantum channel. Thus, by studying quantum channels directly we can derive statements that are valid regardless of the physical platform used to store and process the quantum information—ion traps, superconducting qubits, photonic qubits, NV centers, etc.
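
For a concrete, standard textbook example: the qubit depolarizing channel keeps the input state with probability 1-p and otherwise replaces it with the maximally mixed state,

\mathcal{E}_p(\rho) = (1-p)\,\rho + p\,\frac{\mathbb{I}}{2}.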

We asked the following question: If I’m given a quantum channel, can I transform it into another, different channel by using something like a miniature heat engine? If so, how much work do I need to spend in order to accomplish this task? The answer is tricky because of a few aspects in which quantum channels are more complicated than quantum states.

In this post, I’ll try to give some intuition behind our results, which were developed with the help of Mario Berta and Fernando Brandão, and which were recently published in Physical Review Letters.

First things first, let’s worry about how to study the thermodynamic behavior of miniature systems.

Thermodynamics of small stuff

One of the important ideas that quantum information brought to thermodynamics is the idea of a resource theory. In a resource theory, we declare that there are certain kinds of states that are available for free, and that there are a set of operations that can be carried out for free. In a resource theory of thermodynamics, when we say “for free,” we mean “without expending any thermodynamic work.”

Here, the free states are those in thermal equilibrium at a fixed given temperature, and the free operations are those quantum operations that preserve energy and that introduce no noise into the system (we call those unitary operations). Faced with a task such as transforming one quantum state into another, we may ask whether or not it is possible to do so using the freely available operations. If that is not possible, we may then ask how much thermodynamic work we need to invest, in the form of additional energy at the input, in order to make the transformation possible.

Interestingly, the amount of work needed to go from one state ρ to another state σ might be unrelated to the work required to go back from σ to ρ. Indeed, the freely allowed operations can’t always be reversed; the reverse process usually requires a different sequence of operations, incurring an overhead. There is a mathematical framework to understand these transformations and this reversibility gap, in which generalized entropy measures play a central role. To avoid going down that road, let’s instead consider the macroscopic case in which we have a large number n of independent particles that are all in the same state ρ, a state which we denote by \rho^{\otimes n}. Then something magical happens: This macroscopic state can be reversibly converted to and from another macroscopic state \sigma^{\otimes n}, where all particles are in some other state σ. That is, the work invested in the transformation from \rho^{\otimes n} to \sigma^{\otimes n} can be entirely recovered by performing the reverse transformation:

[Figure: reversible interconversion between the macroscopic states \rho^{\otimes n} and \sigma^{\otimes n}.]

If this rings a bell, that is because this is precisely the kind of thermodynamics that you will find in your favorite textbook. There is an optimal, reversible way of transforming any two thermodynamic states into each other, and the optimal work cost of the transformation is the difference of a corresponding quantity known as the thermodynamic potential. Here, the thermodynamic potential is a quantity known as the free energy F(\rho). Therefore, the optimal work cost per copy w of transforming \rho^{\otimes n} into \sigma^{\otimes n} is given by the difference in free energy w = F(\sigma) - F(\rho).

From quantum states to quantum channels

Can we repeat the same story for quantum channels? Suppose that we’re given a channel \mathcal{E}, which we picture as above as a box that transforms an input state into an output state. Using the freely available thermodynamic operations, can we “transform” \mathcal{E} into another channel \mathcal{F}? That is, can we wrap this box with some kind of procedure that uses free thermodynamic operations to pre-process the input and post-process the output, such that the overall new process corresponds (approximately) to the quantum channel \mathcal{F}? We might picture the situation like this:

[Figure: simulating the channel \mathcal{F} by wrapping the channel \mathcal{E} with free thermodynamic pre- and post-processing.]

Let us first simplify the question by supposing we don’t have a channel \mathcal{E} to start off with. How can we implement the channel \mathcal{F} from scratch, using only free thermodynamic operations and some invested work? That simple question led to pages and pages of calculations, lots of coffee, a few sleepless nights, and then more coffee. After finally overcoming several technical obstacles, we found that in the macroscopic limit of many copies of the channel, the corresponding amount of work per copy is given by the maximum difference of free energy F between the input and output of the channel. We decided to call this quantity the thermodynamic capacity of the channel:

[Definition of the thermodynamic capacity: roughly, T(\mathcal{E}) is the maximum over input states \rho of the free energy difference F(\mathcal{E}(\rho)) - F(\rho).]

Intuitively, an implementation of \mathcal{F}^{\otimes n} must be prepared to expend an amount of work corresponding to the worst possible transformation of an input state to its corresponding output state. It’s kind of obvious in retrospect. However, what is nontrivial is that one can find a single implementation that works for all input states.
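
To make the definition concrete, here is a minimal numerical sketch of my own (not code from the paper): it estimates the thermodynamic capacity of a toy qubit channel by maximizing the free-energy difference F(\mathcal{E}(\rho)) - F(\rho) over randomly sampled input states, using the standard free energy F(\rho) = Tr(H\rho) - k_B T S(\rho). The Hamiltonian, temperature, amplitude-damping channel and brute-force sampling are all illustrative choices, not anything prescribed by the paper.

import numpy as np

kB, T = 1.0, 1.0                        # work in units where k_B T = 1 (illustrative)
H = np.diag([0.0, 1.0])                 # toy qubit Hamiltonian (illustrative)

def entropy(rho):
    # von Neumann entropy S(rho), in nats
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return -np.sum(w * np.log(w))

def free_energy(rho):
    # F(rho) = Tr(H rho) - k_B T S(rho)
    return np.real(np.trace(H @ rho)) - kB * T * entropy(rho)

def channel(rho, gamma=0.5):
    # a toy channel: amplitude damping with decay probability gamma
    K0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1 - gamma)]])
    K1 = np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])
    return K0 @ rho @ K0.conj().T + K1 @ rho @ K1.conj().T

def random_state(rng):
    # random mixed qubit state: Bloch vector drawn uniformly from the unit ball
    v = rng.normal(size=3)
    v *= rng.uniform() ** (1 / 3) / np.linalg.norm(v)
    X = np.array([[0, 1], [1, 0]]); Y = np.array([[0, -1j], [1j, 0]]); Z = np.diag([1, -1])
    return 0.5 * (np.eye(2) + v[0] * X + v[1] * Y + v[2] * Z)

rng = np.random.default_rng(0)
# thermodynamic capacity: the largest input-to-output free-energy difference
capacity = max(free_energy(channel(rho)) - free_energy(rho)
               for rho in (random_state(rng) for _ in range(20000)))
print(f"estimated thermodynamic capacity of the toy channel: {capacity:.3f} k_B T")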

It turned out that this quantity had already been studied before. An earlier paper by Navascués and García-Pintos had shown that it was exactly this quantity that characterized the amount of work per copy that could be extracted by “consuming” many copies of a process \mathcal{E}^{\otimes n} provided as black boxes.

To our surprise, we realized that Navascués and García-Pintos’s result implied that the transformation of \mathcal{E}^{\otimes n} into \mathcal{F}^{\otimes n} is reversible. There is a simple procedure to convert \mathcal{E}^{\otimes n} into \mathcal{F}^{\otimes n} at a cost per copy that equals T(\mathcal{F}) - T(\mathcal{E}). The procedure consists in first extracting T(\mathcal{E}) work per copy of the first set of channels, and then preparing \mathcal{F}^{\otimes n} from scratch at a work cost of T(\mathcal{F}) per copy:

[Figure: converting \mathcal{E}^{\otimes n} into \mathcal{F}^{\otimes n} by first extracting work T(\mathcal{E}) per copy and then preparing \mathcal{F}^{\otimes n} at cost T(\mathcal{F}) per copy.]

Clearly, the reverse transformation yields back all the work invested in the forward transformation, making the transformation reversible. That’s because we could have started with \mathcal{F}’s and finished with \mathcal{E}’s instead of the opposite, and the associated work cost per copy would be T(\mathcal{E}) - T(\mathcal{F}). Thus the transformation is, indeed, reversible:

[Figure: reversible interconversion between \mathcal{E}^{\otimes n} and \mathcal{F}^{\otimes n} at work cost T(\mathcal{F}) - T(\mathcal{E}) per copy.]

In turn, this implies that in the many-copy regime, quantum channels have a macroscopic thermodynamic behavior. That is, there is a thermodynamic potential—the thermodynamic capacity—that quantifies the minimal work required to transform one macroscopic set of channels into another.

Prospects for the thermodynamic capacity

Resource theories that are reversible are pretty rare. Reversibility is a coveted property because a reversible resource theory is one in which we can easily understand exactly which transformations are possible. Other than the thermodynamic resource theory of states mentioned above, most instances of a resource theory—especially resource theories of channels—typically produce the kind of overheads in the conversion cost that spoil reversibility. So it’s rather exciting when you do find a new reversible resource theory of channels.

Quantum information theorists, especially those working on the theory of quantum communication, care a lot about characterizing the capacity of a channel. This is the maximal amount of information that can be transmitted through a channel. Even though in our case we’re talking about a different kind of capacity—one where we transmit thermodynamic energy and entropy, rather than quantum bits of messages—there are some close parallels between the two settings from which both fields of quantum communication and quantum thermodynamics can profit. Our result draws deep inspiration from the so-called quantum reverse Shannon theorem, an important result in quantum communication that tells us how two parties can communicate using one kind of a channel if they have access to another kind of a channel. On the other hand, the thermodynamic capacity at zero energy is a quantity that was already studied in quantum communication, but it was not clear what that quantity represented concretely. This quantity gained even more importance as it was identified as the entropy of a channel. Now, we see that this quantity has a thermodynamic interpretation. Also, the thermodynamic capacity has a simple definition, is relatively easy to compute, and is additive—all desirable properties that other measures of capacity of a quantum channel do not necessarily share.

We still have a few rough edges that I hope we can resolve sooner or later. In fact, there is an important caveat that I have avoided mentioning so far—our argument only holds for special kinds of channels, those that do the same thing regardless of when they are applied in time. (Those channels are called time-covariant.) A lot of channels that we’re used to studying have this property, but we think it should be possible to prove a version of our result for any general quantum channel. In fact, we do have another argument that works for all quantum channels, but it uses a slightly different thermodynamic framework which might not be physically well-grounded.

That’s all very nice, I can hear you think, but is this useful for any quantum computing applications? The truth is, we’re still pretty far from founding a new quantum start-up. The levels of heat dissipation in quantum logic elements are still orders of magnitude away from the fundamental limits that we study in the thermodynamic resource theory.

Rather, our result teaches us about the interplay of quantum channels and thermodynamic concepts. We have not only gained useful insight into the structure of quantum channels, but also developed new tools for analyzing them. These will be useful for studying more involved resource theories of channels. And in the future, when quantum technologies perhaps approach the thermodynamically reversible limit, it might be good to know how to implement a given quantum channel in such a way that good accuracy is guaranteed for any possible quantum input state, and without any inherent overhead due to the fact that we don’t know what the input state is.

Thermodynamics, a theory developed to study gases and steam engines, has turned out to be relevant in settings ranging from the most obvious to the most unexpected—chemical reactions, electromagnetism, solid state physics, black holes, you name it. Trust the laws of thermodynamics to surprise you again by applying to a setting you’d never have imagined them in, like quantum channels.

n-Category Café Why Category Theory Matters

No, I’m not going to tell you why category theory matters. To learn that, you must go here:

It’s interesting to see an outsider’s answer to this subject. He starts with a graph purporting to show the number of times — per year, I guess? — that the phrase “category theory” has been mentioned in books:

I’m curious about the plunge after 1990. I hadn’t noticed that.

I’m amused and flattered, but also a bit unnerved to read that

Generally speaking, there seems to be a cabal of radical category theorists, led by John Baez, who are reinterpreting anything interesting in category theoretic terms.

Of course it’s meant to be humorous. The unnerving part is the idea that I’m “leading” anybody — except perhaps my grad students. It seems to me rather that category theory has an inherent tendency to spread its reach, and I’m just one of a large group of people who have tuned in to this.

May 31, 2019

Terence TaoSearching for singularities in the Navier–Stokes equations

I was recently asked to contribute a short comment to Nature Reviews Physics, as part of a series of articles on fluid dynamics on the occasion of the 200th anniversary (this August) of the birthday of George Stokes.  My contribution is now online as “Searching for singularities in the Navier–Stokes equations“, where I discuss the global regularity problem for Navier-Stokes and my thoughts on how one could try to construct a solution that blows up in finite time via an approximately discretely self-similar “fluid computer”.  (The rest of the series does not currently seem to be available online, but I expect they will become so shortly.)

 

Matt von HippelExperimental Theoretical Physics

I was talking with some other physicists about my “Black Box Theory” thought experiment, where theorists have to compete with an impenetrable block of computer code. Even if the theorists come up with a “better” theory, that theory won’t predict anything that the code couldn’t already. If “predicting something new” is an essential part of science, then the theorists can no longer do science at all.

One of my colleagues made an interesting point: in the thought experiment, the theorists can’t predict new behaviors of reality. But they can predict new behaviors of the code.

Even when we have the right theory to describe the world, we can’t always calculate its consequences. Often we’re stuck in the same position as the theorists in the thought experiment, trying to understand the output of a theory that might as well be a black box. Increasingly, we are employing a kind of “experimental theoretical physics”. We try to predict the result of new calculations, just as experimentalists try to predict the result of new experiments.

This experimental approach seems to be a genuine cultural difference between physics and mathematics. There is such a thing as experimental mathematics, to be clear. And while mathematicians prefer proof, they’re not averse to working from a good conjecture. But when mathematicians calculate and conjecture, they still try to set a firm foundation. They’re precise about what they mean, and careful about what they imply.

“Experimental theoretical physics”, on the other hand, is much more like experimental physics itself. Physicists look for plausible patterns in the “data”, seeing if they make sense in some “physical” way. The conjectures aren’t always sharply posed, and the leaps of reasoning are often more reckless than those of experimental mathematicians. We try to use intuition gleaned from a history of experiments on, and calculations about, the physical world.

There’s a real danger here, because mathematical formulas don’t behave like nature does. When we look at nature, we expect it to behave statistically. If we look at a large number of examples, we get more and more confident that they represent the behavior of the whole. This is sometimes dangerous in nature, but it’s even more dangerous in mathematics, because it’s often not clear what a good “sample” even is. Proving something is true “most of the time” is vastly different from proving it is true all of the time, especially when you’re looking at an infinity of possible examples. We can’t meet our favorite “five sigma” level of statistical confidence, or even know if we’re close.

At the same time, experimental theoretical physics has real power. Experience may be a bad guide to mathematics, but it’s a better guide to the mathematics that specifically shows up in physics. And in practice, our recklessness can accomplish great things, uncovering behaviors mathematicians would never have found by themselves.

The key is to always keep in mind that the two fields are different. “Experimental theoretical physics” isn’t mathematics, and it isn’t pretending to be, any more than experimental physics is pretending to be theoretical physics. We’re gathering data and advancing tentative explanations, but we’re fully aware that they may not hold up when examined with full rigor. We want to inspire, to raise questions and get people to think about the principles that govern the messy physical theories we use to describe our world. Experimental physics, theoretical physics, and mathematics are all part of a shared ecosystem, and each has its role to play.

Terence TaoThe spherical Cayley-Menger determinant and the radius of the Earth

Given three points {A,B,C} in the plane, the distances {|AB|, |BC|, |AC|} between them have to be non-negative and obey the triangle inequalities

\displaystyle  |AB| \leq |BC| + |AC|, |BC| \leq |AC| + |AB|, |AC| \leq |AB| + |BC|

but are otherwise unconstrained. But if one has four points {A,B,C,D} in the plane, then there is an additional constraint connecting the six distances {|AB|, |AC|, |AD|, |BC|, |BD|, |CD|} between them, coming from the Cayley-Menger determinant:

Proposition 1 (Cayley-Menger determinant) If {A,B,C,D} are four points in the plane, then the Cayley-Menger determinant

\displaystyle  \mathrm{det} \begin{pmatrix} 0 & 1 & 1 & 1 & 1 \\ 1 & 0 & |AB|^2 & |AC|^2 & |AD|^2 \\ 1 & |AB|^2 & 0 & |BC|^2 & |BD|^2 \\ 1 & |AC|^2 & |BC|^2 & 0 & |CD|^2 \\ 1 & |AD|^2 & |BD|^2 & |CD|^2 & 0 \end{pmatrix} \ \ \ \ \ (1)

vanishes.

Proof: If we view {A,B,C,D} as vectors in {{\bf R}^2}, then we have the usual cosine rule {|AB|^2 = |A|^2 + |B|^2 - 2 A \cdot B}, and similarly for all the other distances. The {5 \times 5} matrix appearing in (1) can then be written as {M+M^T-2\tilde G}, where {M} is the matrix

\displaystyle  M := \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 1 & |A|^2 & |B|^2 & |C|^2 & |D|^2 \\ 1 & |A|^2 & |B|^2 & |C|^2 & |D|^2 \\ 1 & |A|^2 & |B|^2 & |C|^2 & |D|^2 \\ 1 & |A|^2 & |B|^2 & |C|^2 & |D|^2 \end{pmatrix}

and {\tilde G} is the (augmented) Gram matrix

\displaystyle  \tilde G := \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & A \cdot A & A \cdot B & A \cdot C & A \cdot D \\ 0 & B \cdot A & B \cdot B & B \cdot C & B \cdot D \\ 0 & C \cdot A & C \cdot B & C \cdot C & C \cdot D \\ 0 & D \cdot A & D \cdot B & D \cdot C & D \cdot D \end{pmatrix}.

The matrix {M} is a rank one matrix, and so {M^T} is also. The Gram matrix {\tilde G} factorises as {\tilde G = \tilde \Sigma \tilde \Sigma^T}, where {\tilde \Sigma} is the {5 \times 2} matrix with rows {0,A,B,C,D}, and thus has rank at most {2}. Therefore the matrix {M+M^T-2\tilde G} in (1) has rank at most {1+1+2=4}, and hence has determinant zero as claimed. \Box

For instance, if we know that {|AB|=|AC|=|DB|=|DC|=1} and {|BC|=\sqrt{2}}, then in order for {A,B,C,D} to be coplanar, the remaining distance {|AD|} has to obey the equation

\displaystyle  \mathrm{det} \begin{pmatrix} 0 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 & |AD|^2 \\ 1 & 1 & 0 & 2 & 1 \\ 1 & 1 & 2 & 0 & 1 \\ 1 & |AD|^2 & 1 & 1 & 0 \end{pmatrix} = 0.

After some calculation the left-hand side simplifies to {-4 |AD|^4 + 8 |AD|^2}, so the non-negative quantity {|AD|} is constrained to equal either {0} or {\sqrt{2}}. The former happens when {A,B,C} form a unit right-angled triangle with right angle at {A} and {D=A}; the latter happens when {A,B,D,C} form the vertices of a unit square traversed in that order. Any other value for {|AD|} is not compatible with the hypothesis of {A,B,C,D} lying on a plane; hence the Cayley-Menger determinant can be used as a test for planarity.
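
If you want to check this little computation yourself, here is a short symbolic sketch of my own (using sympy; the symbol x stands for |AD|):

import sympy as sp

x = sp.symbols('x', nonnegative=True)      # x stands for |AD|
M = sp.Matrix([
    [0, 1,     1, 1, 1    ],
    [1, 0,     1, 1, x**2 ],
    [1, 1,     0, 2, 1    ],
    [1, 1,     2, 0, 1    ],
    [1, x**2,  1, 1, 0    ],
])
det = sp.expand(M.det())
print(det)                                  # -4*x**4 + 8*x**2
print(sp.solve(sp.Eq(det, 0), x))           # [0, sqrt(2)]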

Now suppose that we have four points {A,B,C,D} on a sphere {S_R} of radius {R}, with six distances {|AB|_R, |AC|_R, |AD|_R, |BC|_R, |BD|_R, |CD|_R} now measured as lengths of arcs on the sphere. There is a spherical analogue of the Cayley-Menger determinant:

Proposition 2 (Spherical Cayley-Menger determinant) If {A,B,C,D} are four points on a sphere {S_R} of radius {R} in {{\bf R}^3}, then the spherical Cayley-Menger determinant

\displaystyle  \mathrm{det} \begin{pmatrix} 1 & \cos \frac{|AB|_R}{R} & \cos \frac{|AC|_R}{R} & \cos \frac{|AD|_R}{R} \\ \cos \frac{|AB|_R}{R} & 1 & \cos \frac{|BC|_R}{R} & \cos \frac{|BD|_R}{R} \\ \cos \frac{|AC|_R}{R} & \cos \frac{|BC|_R}{R} & 1 & \cos \frac{|CD|_R}{R} \\ \cos \frac{|AD|_R}{R} & \cos \frac{|BD|_R}{R} & \cos \frac{|CD|_R}{R} & 1 \end{pmatrix} \ \ \ \ \ (2)

vanishes.

Proof: We can assume that the sphere {S_R} is centred at the origin of {{\bf R}^3}, and view {A,B,C,D} as vectors in {{\bf R}^3} of magnitude {R}. The angle subtended by {AB} from the origin is {|AB|_R/R}, so by the cosine rule we have

\displaystyle  A \cdot B = R^{2} \cos \frac{|AB|_R}{R}.

Similarly for all the other inner products. Thus the matrix in (2) can be written as {R^{-2} G}, where {G} is the Gram matrix

\displaystyle  G := \begin{pmatrix} A \cdot A & A \cdot B & A \cdot C & A \cdot D \\ B \cdot A & B \cdot B & B \cdot C & B \cdot D \\ C \cdot A & C \cdot B & C \cdot C & C \cdot D \\ D \cdot A & D \cdot B & D \cdot C & D \cdot D \end{pmatrix}.

We can factor {G = \Sigma \Sigma^T} where {\Sigma} is the {4 \times 3} matrix with rows {A,B,C,D}. Thus {R^{-2} G} has rank at most {3} and thus the determinant vanishes as required. \Box

Just as the Cayley-Menger determinant can be used to test for coplanarity, the spherical Cayley-Menger determinant can be used to test for lying on a sphere of radius {R}. For instance, if we know that {A,B,C,D} lie on {S_R} and {|AB|_R, |AC|_R, |BC|_R, |BD|_R, |CD|_R} are all equal to {\pi R/2}, then the above proposition gives

\displaystyle  \mathrm{det} \begin{pmatrix} 1 & 0 & 0 & \cos \frac{|AD|_R}{R} \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \cos \frac{|AD|_R}{R} & 0 & 0 & 1 \end{pmatrix} = 0.

The left-hand side evaluates to {1 - \cos^2 \frac{|AD|_R}{R}}; as {|AD|_R} lies between {0} and {\pi R}, the only choices for this distance are then {0} and {\pi R}. The former happens for instance when {A} lies on the north pole {(R,0,0)}, {B = (0,R,0), C = (0,0,R)} are points on the equator with longitudes differing by 90 degrees, and {D=(R,0,0)} is also equal to the north pole; the latter occurs when {D=(-R,0,0)} is instead placed on the south pole.

The Cayley-Menger and spherical Cayley-Menger determinants look slightly different from each other, but one can transform the latter into something resembling the former by row and column operations. Indeed, the determinant (2) can be rewritten as

\displaystyle  \mathrm{det} \begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 - \cos \frac{|AB|_R}{R} & 1 - \cos \frac{|AC|_R}{R} & 1 - \cos \frac{|AD|_R}{R} \\ 1 & 1-\cos \frac{|AB|_R}{R} & 0 & 1-\cos \frac{|BC|_R}{R} & 1- \cos \frac{|BD|_R}{R} \\ 1 & 1-\cos \frac{|AC|_R}{R} & 1-\cos \frac{|BC|_R}{R} & 0 & 1-\cos \frac{|CD|_R}{R} \\ 1 & 1-\cos \frac{|AD|_R}{R} & 1-\cos \frac{|BD|_R}{R} & 1- \cos \frac{|CD|_R}{R} & 0 \end{pmatrix}

and by further row and column operations, this determinant vanishes if and only if the determinant

\displaystyle  \mathrm{det} \begin{pmatrix} \frac{1}{2R^2} & 1 & 1 & 1 & 1 \\ 1 & 0 & f_R(|AB|_R) & f_R(|AC|_R) & f_R(|AD|_R) \\ 1 & f_R(|AB|_R) & 0 & f_R(|BC|_R) & f_R(|BD|_R) \\ 1 & f_R(|AC|_R) & f_R(|BC|_R) & 0 & f_R(|CD|_R) \\ 1 & f_R(|AD|_R) & f_R(|BD|_R) & f_R(|CD|_R) & 0 \end{pmatrix} \ \ \ \ \ (3)

vanishes, where {f_R(x) := 2R^2 (1-\cos \frac{x}{R})}. In the limit {R \rightarrow \infty} (so that the curvature of the sphere {S_R} tends to zero), {|AB|_R} tends to {|AB|}, and by Taylor expansion {f_R(|AB|_R)} tends to {|AB|^2}; similarly for the other distances. Now we see that the planar Cayley-Menger determinant emerges as the limit of (3) as {R \rightarrow \infty}, as would be expected from the intuition that a plane is essentially a sphere of infinite radius.
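
This Taylor expansion is easy to verify symbolically; here is a short sympy check of my own, taking the limit directly and then expanding f_R in the small parameter 1/R:

import sympy as sp

x, R, eps = sp.symbols('x R epsilon', positive=True)
f_R = 2 * R**2 * (1 - sp.cos(x / R))
print(sp.limit(f_R, R, sp.oo))                     # x**2
print(sp.series(f_R.subs(R, 1 / eps), eps, 0, 4))  # x**2 - epsilon**2*x**4/12 + O(epsilon**4), up to term ordering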

In principle, one can now estimate the radius {R} of the Earth (assuming that it is either a sphere {S_R} or a flat plane {S_\infty}) if one is given the six distances {|AB|_R, |AC|_R, |AD|_R, |BC|_R, |BD|_R, |CD|_R} between four points {A,B,C,D} on the Earth. Of course, if one wishes to do so, one should have {A,B,C,D} rather far apart from each other, since otherwise it would be difficult, for instance, to distinguish the round Earth from a flat one. As an experiment, and just for fun, I wanted to see how accurate this would be with some real world data. I decided to take {A}, {B}, {C}, {D} to be the cities of London, Los Angeles, Tokyo, and Dubai respectively. As an initial test, I used distances from this online flight calculator, measured in kilometers:

\displaystyle  |AB|_R = 8790 \mathrm{km}

\displaystyle  |AC|_R = 9597 \mathrm{km}

\displaystyle  |AD|_R = 5488\mathrm{km}

\displaystyle  |BC|_R = 8849\mathrm{km}

\displaystyle  |BD|_R = 13435\mathrm{km}

\displaystyle  |CD|_R = 7957\mathrm{km}.

Given that the true radius of the Earth is about {R_0 := 6371 \mathrm{km}}, I chose the change of variables {R = R_0/k} (so that {k=1} corresponds to the round Earth model with the commonly accepted value for the Earth’s radius, and {k=0} corresponds to the flat Earth), and obtained the following plot for (3):

In particular, the determinant does indeed come very close to vanishing when {k=1}, which is unsurprising since, as explained on the web site, the online flight calculator uses a model in which the Earth is an ellipsoid of radii close to {6371} km. There is another radius that would also be compatible with this data at {k\approx 1.3} (corresponding to an Earth of radius about {4900} km), but presumably one could rule this out as a spurious coincidence by experimenting with other quadruples of cities than the ones I selected. On the other hand, these distances are highly incompatible with the flat Earth model {k=0}; one could also see this with a piece of paper and a ruler by trying to lay down four points {A,B,C,D} on the paper with (an appropriately rescaled) version of the above distances (e.g., with {|AB| = 8.790 \mathrm{cm}}, {|AC| = 9.597 \mathrm{cm}}, etc.).
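
For anyone who would like to reproduce this plot, here is a rough numerical sketch of my own (not Tao's code) that evaluates the determinant (3) for the six distances above as a function of k = R_0/R, treating k = 0 as the flat limit in which f_R(x) is replaced by x^2:

import numpy as np

R0 = 6371.0                      # commonly accepted Earth radius, in km
# distances |AB|, |AC|, |AD|, |BC|, |BD|, |CD| (London, LA, Tokyo, Dubai), in km
dist = {('A', 'B'): 8790, ('A', 'C'): 9597, ('A', 'D'): 5488,
        ('B', 'C'): 8849, ('B', 'D'): 13435, ('C', 'D'): 7957}

def f(x, R):
    # f_R(x) = 2 R^2 (1 - cos(x/R)), which tends to x^2 as R -> infinity
    return 2 * R**2 * (1 - np.cos(x / R))

def det3(k):
    # the determinant (3) with R = R0/k; k = 0 gives the flat (planar) limit
    pts = ['A', 'B', 'C', 'D']
    M = np.zeros((5, 5))
    M[0, 0] = k**2 / (2 * R0**2)            # the 1/(2R^2) entry
    M[0, 1:] = 1.0
    M[1:, 0] = 1.0
    for i, P in enumerate(pts):
        for j, Q in enumerate(pts):
            if i != j:
                x = dist.get((P, Q), dist.get((Q, P)))
                M[i + 1, j + 1] = x**2 if k == 0 else f(x, R0 / k)
    return np.linalg.det(M)

for k in np.linspace(0, 2, 21):
    print(f"k = {k:4.2f}   det = {det3(k):+.3e}")

If the distances above are right, the printed values should dip towards zero near k = 1, and again near k = 1.3, mirroring the plot described in the text.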

If instead one goes to the flight time calculator and uses flight travel times instead of distances, one now gets the following data (measured in hours):

\displaystyle  |AB|_R = 10\mathrm{h}\ 3\mathrm{m}

\displaystyle  |AC|_R = 11\mathrm{h}\ 49\mathrm{m}

\displaystyle  |AD|_R = 6\mathrm{h}\ 56\mathrm{m}

\displaystyle  |BC|_R = 10\mathrm{h}\ 56\mathrm{m}

\displaystyle  |BD|_R = 16\mathrm{h}\ 23\mathrm{m}

\displaystyle  |CD|_R = 9\mathrm{h}\ 52\mathrm{m}.

Assuming that planes travel at about {800} kilometers per hour, the true radius of the Earth should be about {R_1 := 8\mathrm{h}} of flight time. If one then uses the normalisation {R = R_1/k}, one obtains the following plot:

Not too surprisingly, this is basically a rescaled version of the previous plot, with vanishing near {k=1} and at {k=1.3}. (The website for the flight calculator does say it calculates short and long haul flight times slightly differently, which may be the cause of the slight discrepancies between this figure and the previous one.)

Of course, these two data sets are “cheating” since they come from a model which already presupposes what the radius of the Earth is. But one can input real world flight times between these four cities instead of the above idealised data. Here one runs into the issue that the flight time from {A} to {B} is not necessarily the same as that from {B} to {A} due to such factors as windspeed. For instance, I looked up the online flight time from Tokyo to Dubai to be 11 hours and 10 minutes, whereas the online flight time from Dubai to Tokyo was 9 hours and 50 minutes. The simplest thing to do here is take an arithmetic mean of the two times as a preliminary estimate for the flight time without windspeed factors, thus for instance the Tokyo-Dubai flight time would now be 10 hours and 30 minutes, and more generally

\displaystyle  |AB|_R = 10\mathrm{h}\ 47\mathrm{m}

\displaystyle  |AC|_R = 12\mathrm{h}\ 0\mathrm{m}

\displaystyle  |AD|_R = 7\mathrm{h}\ 17\mathrm{m}

\displaystyle  |BC|_R = 10\mathrm{h}\ 50\mathrm{m}

\displaystyle  |BD|_R = 15\mathrm{h}\ 55\mathrm{m}

\displaystyle  |CD|_R = 10\mathrm{h}\ 30\mathrm{m}.

This data is not too far off from the online calculator data, but it does distort the graph slightly (taking {R=8/k} as before):

Now one gets estimates for the radius of the Earth that are off by about a factor of {2} from the truth, although the {k=1} round Earth model still is twice as accurate as the flat Earth model {k=0}.

Given that windspeed should additively affect flight velocity rather than flight time, and the two are inversely proportional to each other, it is more natural to take a harmonic mean rather than an arithmetic mean. This gives the slightly different values

\displaystyle  |AB|_R = 10\mathrm{h}\ 51\mathrm{m}

\displaystyle  |AC|_R = 11\mathrm{h}\ 59\mathrm{m}

\displaystyle  |AD|_R = 7\mathrm{h}\ 16\mathrm{m}

\displaystyle  |BC|_R = 10\mathrm{h}\ 46\mathrm{m}

\displaystyle  |BD|_R = 15\mathrm{h}\ 54\mathrm{m}

\displaystyle  |CD|_R = 10\mathrm{h}\ 27\mathrm{m}

but one still gets essentially the same plot:

So the inaccuracies are presumably coming from some other source. (Note for instance that the true flight time from Tokyo to Dubai is about {6\%} greater than the calculator predicts, while the flight time from LA to Dubai is about {3\%} less; these sorts of errors seem to pile up in this calculation.) Nevertheless, it does seem that flight time data is (barely) enough to establish the roundness of the Earth and obtain a somewhat ballpark estimate for its radius. (I assume that the fit would be better if one could include some Southern Hemisphere cities such as Sydney or Santiago, but I was not able to find a good quadruple of widely spaced cities on both hemispheres for which there were direct flights between all six pairs.)

May 30, 2019

Tommaso DorigoBelle Puts Limit On B Decays To X Plus Photon

I know, the title of this article will not have you jump on your chair. Most probably, if you are reading these lines you are either terribly bored and in search of anything that can shake you from that state - but let me assure you that will not happen - or you are a freaking enthusiast of heavy flavour physics. In the latter case, you also probably do not need to read further. So why am I writing on anyway? Because I think physics is phun, and rare decays of heavy flavoured hadrons are interesting in their own right.

read more

May 28, 2019

Clifford JohnsonNews from the Front, XVI: Toward Quantum Heat Engines

(The following post is a bit more technical than usual. But non-experts may still find parts helpful.)

A couple of years ago I stumbled on an entire field that I had not encountered before: the study of Quantum Heat Engines. This sounds like an odd juxtaposition of terms since, as I say in the intro to my recent paper:

The thermodynamics of heat engines, refrigerators, and heat pumps is often thought to be firmly the domain of large classical systems, or put more carefully, systems that have a very large number of degrees of freedom such that thermal effects dominate over quantum effects. Nevertheless, there is a thriving field devoted to the study—both experimental and theoretical—of the thermodynamics of machines that use small quantum systems as the working substance.

It is a fascinating field, with a lot of activity going on that connects to fields like quantum information, device physics, open quantum systems, condensed matter, etc.

Anyway, I stumbled on it because, as you may know, I've been thinking (in my 21st-meets-18th century way) about heat engines a lot over the last five years since I showed how to make them from (quantum) black holes, when embedded in extended gravitational thermodynamics. I've written it all down in blog posts before, so go look if interested (here and here).

In particular, it was when working on a project I wrote about here that I stumbled on quantum heat engines, and got thinking about their power and efficiency. It was while working on that project that I had a very happy thought: Could I show that holographic heat engines (the kind I make using black holes) -at least a class of them- are actually, in some regime, quantum heat engines? That would be potentially super-useful and, of course, super-fun.

The blunt headline statement is that they are, obviously, because every stage [...] Click to continue reading this post

The post News from the Front, XVI: Toward Quantum Heat Engines appeared first on Asymptotia.

Doug NatelsonBrief items

A number of interesting items:


May 27, 2019

John PreskillQuantum information in quantum cognition

Some research topics, says conventional wisdom, a physics PhD student shouldn’t touch with an iron-tipped medieval lance: sinkholes in the foundations of quantum theory. Problems so hard, you’d have a snowball’s chance of achieving progress. Problems so obscure, you’d have a snowball’s chance of convincing anyone to care about progress. Whether quantum physics could influence cognition much.

Quantum physics influences cognition insofar as (i) quantum physics prevents atoms from imploding and (ii) implosion inhibits atoms from contributing to cognition. But most physicists believe that useful entanglement can’t survive in brains. Entanglement consists of correlations shareable by quantum systems and stronger than any achievable by classical systems. Useful entanglement dies quickly in hot, wet, random environments. 

Brains form such environments. Imagine injecting entangled molecules A and B into someone’s brain. Water, ions, and other particles would bombard the molecules. The higher the temperature, the heavier the bombardment. The bombardiers would entangle with the molecules via electric and magnetic fields. Each molecule can share only so much entanglement. The more A entangled with the environment, the less A could remain entangled with B. A would come to share a tiny amount of entanglement with each of many particles. Such tiny amounts couldn’t accomplish much. So quantum physics seems unlikely to affect cognition significantly.

[Image of medieval lances, captioned “Do not touch.”]

Yet my PhD advisor, John Preskill, encouraged me to consider whether the possibility interested me.

Try some completely different research, he said. Take a risk. If it doesn’t pan out, fine. People don’t expect much of grad students, anyway. Have you seen Matthew Fisher’s paper about quantum cognition? 

Matthew Fisher is a theoretical physicist at the University of California, Santa Barbara. He has plaudits out the wazoo, many for his work on superconductors. A few years ago, Matthew developed an interest in biochemistry. He knew that most physicists doubt whether quantum physics could affect cognition much. But suppose that it could, he thought. How could it? Matthew reverse-engineered a mechanism, in a paper published by Annals of Physics in 2015.

A PhD student shouldn’t touch such research with a ten-foot radio antenna, says conventional wisdom. But I trust John Preskill in a way in which I trust no one else on Earth.

I’ll look at the paper, I said.

Risk

Matthew proposed that quantum physics could influence cognition as follows. Experimentalists have performed quantum computation using one hot, wet, random system: that of nuclear magnetic resonance (NMR). NMR is the process that underlies magnetic resonance imaging (MRI), a technique used to image people’s brains. A common NMR system consists of high-temperature liquid molecules. The molecules consist of atoms whose nuclei have a quantum property called spin. The nuclear spins encode quantum information (QI).

Nuclear spins, Matthew reasoned, might store QI in our brains. He catalogued the threats that could damage the QI. Hydrogen ions, he concluded, would threaten the QI most. They could entangle with (decohere) the spins via dipole-dipole interactions.

How can a spin avoid the threats? First, by having a quantum number s = 1/2. Such a quantum number zeroes out the nuclei’s electric quadrupole moments. Electric-quadrupole interactions can’t decohere such spins. Which biologically prevalent atoms have s = 1/2 nuclear spins? Phosphorus and hydrogen. Hydrogen suffers from other vulnerabilities, so phosphorus nuclear spins store QI in Matthew’s story. The spins serve as qubits, or quantum bits.

How can a phosphorus spin avoid entangling with other spins via magnetic dipole-dipole interactions? Such interactions depend on the spins’ orientations relative to their positions. Suppose that the phosphorus occupies a small molecule that tumbles in biofluids. The nucleus’s position changes randomly. The interaction can average out over tumbles.

The molecule contains atoms other than phosphorus. Those atoms have nuclei whose spins can interact with the phosphorus spins, unless every threatening spin has a quantum number s = 0. Which biologically prevalent atoms have s = 0 nuclear spins? Oxygen and calcium. The phosphorus should therefore occupy a molecule with oxygen and calcium.

Matthew designed this molecule to block decoherence. Then, he found the molecule in the scientific literature. The structure, {\rm Ca}_9 ({\rm PO}_4)_6, is called a Posner cluster or a Posner molecule. I’ll call it a Posner, for short. Posners appear to exist in simulated biofluids, fluids created to mimic the fluids in us. Posners are believed to exist in us and might participate in bone formation. According to Matthew’s estimates, Posners might protect phosphorus nuclear spins for up to 1-10 days.

[Figure: Posner molecule (image courtesy of Swift et al.)]

How can Posners influence cognition? Matthew proposed the following story.

Adenosine triphosphate (ATP) is a molecule that fuels biochemical reactions. “Triphosphate” means “containing three phosphate ions.” Phosphate ({\rm PO}_4^{3-}) consists of one phosphorus atom and four oxygen atoms. Two of an ATP molecule’s phosphates can break off while remaining joined to each other.

The phosphate pair can drift until encountering an enzyme called pyrophosphatase. The enzyme can break the pair into independent phosphates. Matthew, with Leo Radzihovsky, conjectured that, as the pair breaks, the phosphorus nuclear spins are projected onto a singlet. This state, represented by \frac{1}{ \sqrt{2} } ( | \uparrow \downarrow \rangle - | \downarrow \uparrow \rangle ), is maximally entangled. 
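
As a concrete illustration of “maximally entangled” (my own toy check, not part of Matthew’s paper): the singlet’s reduced state on either spin is the maximally mixed state, so its entanglement entropy is a full bit.

import numpy as np

# the singlet (|up,down> - |down,up>)/sqrt(2) on two spin-1/2 particles
up, down = np.array([1.0, 0.0]), np.array([0.0, 1.0])
psi = (np.kron(up, down) - np.kron(down, up)) / np.sqrt(2)

# reduced state of the first spin: trace out the second spin
rho = np.outer(psi, psi).reshape(2, 2, 2, 2)
rho_A = np.einsum('ijkj->ik', rho)

p = np.linalg.eigvalsh(rho_A)
print(rho_A)                                     # 0.5 * identity
print(-sum(q * np.log2(q) for q in p if q > 0))  # 1.0 bit, the maximum for one qubit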

Imagine many entangled phosphates in a biofluid. Six phosphates can join nine calcium ions to form a Posner molecule. The Posner can share up to six singlets with other Posners. Clouds of entangled Posners can form.

One clump of Posners can enter one neuron while another clump enters another neuron. The protein VGLUT, or BNPI, sits in cell membranes and has the potential to ferry Posners in. The neurons will share entanglement. Imagine two Posners, P and Q, approaching each other in a neuron N. Quantum-chemistry calculations suggest that the Posners can bind together. Suppose that P shares entanglement with a Posner P’ in a neuron N’, while Q shares entanglement with a Posner Q’ in N’. The entanglement, with the binding of P to Q, can raise the probability that P’ binds to Q’.

Bound-together Posners will move slowly, having to push much water out of the way. Hydrogen and magnesium ions can latch onto the slow molecules easily. The Posners’ negatively charged phosphates will attract the {\rm H}^+ and {\rm Mg}^{2+} as the phosphates attract the Posner’s {\rm Ca}^{2+}. The hydrogen and magnesium can dislodge the calcium, breaking apart the Posners. Calcium will flood neurons N and N’. Calcium floods a neuron’s axon terminal (the end of the neuron) when an electrical signal reaches the axon. The flood induces the neuron to release neurotransmitters. Neurotransmitters are chemicals that travel to the next neuron, inducing it to fire. So entanglement between phosphorus nuclear spins in Posner molecules might stimulate coordinated neuron firing.


Does Matthew’s story play out in the body? We can’t know till running experiments and analyzing the results. Experiments have begun: Last year, the Heising-Simons Foundation granted Matthew and collaborators $1.2 million to test the proposal.

Suppose that Matthew conjectures correctly, John challenged me, or correctly enough. Posner molecules store QI. Quantum systems can process information in ways in which classical systems, like laptops, can’t. How adroitly can Posners process QI?

I threw away my iron-tipped medieval lance in year five of my PhD. I left Caltech for a five-month fellowship, bent on returning with a paper with which to answer John. I did, and Annals of Physics published the paper this month.


I had the fortune to interest Elizabeth Crosson in the project. Elizabeth, now an assistant professor at the University of New Mexico, was working as a postdoc in John’s group. Both of us are theorists who specialize in QI theory. But our backgrounds, skills, and specialties differ. We complemented each other while sharing a doggedness that kept us emailing, GChatting, and Google-hangout-ing at all hours.

Elizabeth and I translated Matthew’s biochemistry into the mathematical language of QI theory. We dissected Matthew’s narrative into a sequence of biochemical steps. We ascertained how each step would transform the QI encoded in the phosphorus nuclei. Each transformation, we represented with a piece of math and with a circuit-diagram element. (Circuit-diagram elements are pictures strung together to form circuits that run algorithms.) The set of transformations, we called Posner operations.

Imagine that you can perform Posner operations, by preparing molecules, trying to bind them together, etc. What QI-processing tasks can you perform? Elizabeth and I found applications to quantum communication, quantum error detection, and quantum computation. Our results rest on the assumption—possibly inaccurate—that Matthew conjectures correctly. Furthermore, we characterized what Posners could achieve if controlled. Randomness, rather than control, would direct Posners in biofluids. But what can happen in principle offers a starting point.

First, QI can be teleported from one Posner to another, while suffering noise.1 This noisy teleportation doubles as superdense coding: A trit is a random variable that assumes one of three possible values. A bit is a random variable that assumes one of two possible values. You can teleport a trit from one Posner to another effectively, while transmitting a bit directly, with help from entanglement. 


Second, Matthew argued that Posners’ structures protect QI. Scientists have developed quantum error-correcting and -detecting codes to protect QI. Can Posners implement such codes, in our model? Yes: Elizabeth and I (with help from erstwhile Caltech postdoc Fernando Pastawski) developed a quantum error-detection code accessible to Posners. One Posner encodes a logical qutrit, the quantum version of a trit. The code detects any error that slams any of the Posner’s six qubits.

Third, how complicated an entangled state can Posner operations prepare? A powerful one, we found: Suppose that you can measure this state locally, such that earlier measurements’ outcomes affect which measurements you perform later. You can perform any quantum computation. That is, Posner operations can prepare a state that fuels universal measurement-based quantum computation.

Finally, Elizabeth and I quantified effects of entanglement on the rate at which Posners bind together. Imagine preparing two Posners, P and P’, that share entanglement only with other particles. If the Posners approach each other with the right orientation, they have a 33.6% chance of binding, in our model. Now, suppose that every qubit in P is maximally entangled with a qubit in P’. The binding probability can rise to 100%.

[Figure: Elizabeth and I recast as a quantum circuit a biochemical process discussed in Matthew Fisher’s 2015 paper.]

I feared that other scientists would pooh-pooh our work as crazy. To my surprise, enthusiasm flooded in. Colleagues cheered the risk taken on a challenge in an emerging field that perks up our ears. Besides, Elizabeth’s and my work is far from crazy. We don’t assert that quantum physics affects cognition. We imagine that Matthew conjectures correctly, acknowledging that he might not, and explore his proposal’s implications. Being neither biochemists nor experimentalists, we restrict our claims to QI theory.

Maybe Posners can’t protect coherence for long enough. Would inaccuracy of Matthew’s conjecture beach our whale of research? No. Posners prompted us to propose ideas and questions within QI theory. For instance, our quantum circuits illustrate interactions (unitary gates, to experts) interspersed with measurements implemented by the binding of Posners. The circuits partially motivated a subfield that emerged last summer and is picking up speed: Consider interspersing random unitary gates with measurements. The unitaries tend to entangle qubits, whereas the measurements disentangle. Which influence wins? Does the system undergo a phase transition from “mostly entangled” to “mostly unentangled” at some measurement frequency? Researchers from Santa Barbara to Colorado; MIT; Oxford; Lancaster, UK; Berkeley; Stanford; and Princeton have taken up the challenge.

A physics PhD student, conventional wisdom says, shouldn’t touch quantum cognition with a Swiss guard’s halberd. I’m glad I reached out: I learned much, contributed to science, and had an adventure. Besides, if anyone disapproves of daring, I can blame John Preskill.


Annals of Physics published “Quantum information in the Posner model of quantum cognition” here. You can find the arXiv version here and can watch a talk about our paper here. 

1Experts: The noise arises because, if two Posners bind, they effectively undergo a measurement. This measurement transforms a subspace of the two-Posner Hilbert space as a coarse-grained Bell measurement. A Bell measurement yields one of four possible outcomes, or two bits. Discarding one of the bits amounts to coarse-graining the outcome. Quantum teleportation involves a Bell measurement. Coarse-graining the measurement introduces noise into the teleportation.

May 24, 2019

Matt von HippelWhy I Wasn’t Bothered by the “Science” in Avengers: Endgame

Avengers: Endgame has been out for a while, so I don’t have to worry about spoilers right? Right?

Right?

Anyway, time travel. The spoiler is time travel. They bring back everyone who was eliminated in the previous movie, using time travel.

They also attempt to justify the time travel, using Ant Man-flavored quantum mechanics. This works about as plausibly as you’d expect for a superhero whose shrinking powers not only let him talk to ants, but also go to a “place” called “The Quantum Realm”. Along the way, they manage to throw in splintered references to a half-dozen almost-relevant scientific concepts. It’s the kind of thing that makes some physicists squirm.

And I enjoyed it.

Movies tend to treat time travel in one of two ways. The most reckless, and most common, let their characters rewrite history as they go, like Marty McFly almost erasing himself from existence in Back to the Future. This never makes much sense, and the characters in Avengers: Endgame make fun of it, listing a series of movies that do time travel this way (inexplicably including Wrinkle In Time, which has no time travel at all).

In the other common model, time travel has to happen in self-consistent loops: you can’t change the past, but you can go back and be part of it. This is the model used, for example, in Harry Potter, where Potter is saved by a mysterious spell only to travel back in time and cast it himself. This at least makes logical sense; whether it’s physically possible is an open question.

Avengers: Endgame uses the model of self-consistent loops, but with a twist: if you don’t manage to make your loop self-consistent you instead spawn a parallel universe, doomed to suffer the consequences of your mistakes. This is a rarer setup, but not a unique one, though the only other example I can think of at the moment is Homestuck.

Is there any physics justification for the Avengers: Endgame model? Maybe not. But you can at least guess what they were thinking.

The key clue is a quote from Tony Stark, rattling off a stream of movie-grade scientific gibberish:

“Quantum fluctuation messes with the Planck scale, which then triggers the Deutsch Proposition. Can we agree on that?”

From this quote, one can guess not only what scientific results inspired the writers of Avengers: Endgame, but possibly also which Wikipedia entry. David Deutsch is a physicist, and an advocate for the many-worlds interpretation of quantum mechanics. In 1991 he wrote a paper discussing what happens to quantum mechanics in the environment of a wormhole. In it he pointed out that you can make a self-consistent time travel loop, not just in classical physics, but out of a quantum superposition. This offers a weird solution to the classic grandfather paradox of time travel: instead of causing a paradox, you can form a superposition. As Scott Aaronson explains here, “you’re born with probability 1/2, therefore you kill your grandfather with probability 1/2, therefore you’re born with probability 1/2, and so on—everything is consistent.” If you believe in the many-worlds interpretation of quantum mechanics, a time traveler in this picture is traveling between two different branches of the wave-function of the universe: you start out in the branch where you were born, kill your grandfather, and end up in the branch where you weren’t born. This isn’t exactly how Avengers: Endgame handles time travel, but it’s close enough that it seems like a likely explanation.

David Deutsch’s argument uses a wormhole, but how do the Avengers make a wormhole in the first place? There we have less information, just vague references to quantum fluctuations at the Planck scale, the scale at which quantum gravity becomes important. There are a few things they could have had in mind, but one of them might have been physicists Leonard Susskind and Juan Maldacena’s conjecture that quantum entanglement is related to wormholes, a conjecture known as ER=EPR.

Long-time readers of the blog might remember I got annoyed a while back, when Caltech promoted ER=EPR using a different Disney franchise. The key difference here is that Avengers: Endgame isn’t pretending to be educational. Unlike Caltech’s ER=EPR piece, or even the movie Interstellar, Avengers: Endgame isn’t really about physics. It’s a superhero story, one that pairs the occasional scientific term with a character goofily bouncing around from childhood to old age while another character exclaims “you’re supposed to send him through time, not time through him!” The audience isn’t there to learn science, so they won’t come away with any incorrect assumptions.

A movie like Avengers: Endgame doesn’t teach science, or even advertise it. It does celebrate it, though.

That’s why, despite the silly half-correct science, I enjoyed Avengers: Endgame. It’s also why I don’t think it’s inappropriate, as some people do, to classify movies like Star Wars as science fiction. Star Wars and Avengers aren’t really about exploring the consequences of science or technology, they aren’t science fiction in that sense. But they do build off science’s role in the wider culture. They take our world and look at the advances on the horizon, robots and space travel and quantum speculations, and they let their optimism inform their storytelling. That’s not going to be scientifically accurate, and it doesn’t need to be, any more than the comic Abstruse Goose really believes Witten is from Mars. It’s about noticing we live in a scientific world, and having fun with it.

May 23, 2019

Doug NatelsonPublons?

I review quite a few papers - not Millie Dresselhaus level, but a good number.  Lately, some of the electronic review systems (e.g., manuscriptcentral.com, which is a front end for "Scholar One", a product of Clarivate) have been asking me if I want to "receive publons" in exchange for my reviewing activity. 

What are publons?  Following the wikipedia link above is a bit informative, but doesn't agree much with my impressions (which, of course, might be wrong).   My sense is that the original idea here was to have some way of recording and quantifying how much effort scientists were putting into the peer review process.  Reviewing and editorial activity would give you credit in the form of publons, and that kind of information could be used when evaluating people for promotion or hiring.   (I'm picturing some situation where a certain number of publons entitles you to a set of steak knives (nsfw language warning).)

The original idea now seems to have been taken over by Clarivate, who are the people that run Web of Science (the modern version of the science citation index) and produce bibliographic software that continually wants to be upgraded.  Instead of just a way of doing accounting of reviewing activity, it looks like they're trying to turn publons into some sort of hybrid analytics/research social network platform, like researchgate.  It feels like Clarivate is trying to (big surprise here in the modern age of social media) have users allow a bunch of data collection, which Clarivate will then find a way to monetize.  They are also getting into the "unique researcher identifier" game, apparently in duplication of or competition with orcid.

Maybe it's a sign of my advancing years, but my cynicism about this is pretty high.  Anyone have further insights into this?


Jacques Distler Brotli

I finally got around to enabling Brotli compression on Golem. Reading the manual, I came across the BrotliAlterETag directive:

Description: How the outgoing ETag header should be modified during compression
Syntax: BrotliAlterETag AddSuffix|NoChange|Remove

with the description:

AddSuffix
Append the compression method onto the end of the ETag, causing compressed and uncompressed representations to have unique ETags. In another dynamic compression module, mod_deflate, this has been the default since 2.4.0. This setting prevents serving “HTTP Not Modified (304)” responses to conditional requests for compressed content.
NoChange
Don’t change the ETag on a compressed response. In another dynamic compression module, mod_deflate, this has been the default prior to 2.4.0. This setting does not satisfy the HTTP/1.1 property that all representations of the same resource have unique ETags.
Remove
Remove the ETag header from compressed responses. This prevents some conditional requests from being possible, but avoids the shortcomings of the preceding options.

Sure enough, it turns out that ETags+compression have been completely broken in Apache 2.4.x. Two methods for saving bandwidth, and delivering pages faster, cancel each other out and chew up more bandwidth than if one or the other were disabled.

To unpack this a little further, the first time your browser requests a page, Apache computes a hash of the page and sends that along as a header in the response

etag: "38f7-56d65f4a2fcc0"

When your browser requests the page again, it sends an

If-None-Match: "38f7-56d65f4a2fcc0"

header in the request. If that matches the hash of the page, Apache sends a “HTTP Not Modified (304)” response, telling your browser the page is unchanged from the last time it requested it.

If the page is compressed, using mod_deflate, then the header Apache sends is slightly different

etag: "38f7-56d65f4a2fcc0-gzip"

So, when your browser sends its request with an

If-None-Match: "38f7-56d65f4a2fcc0-gzip"

header, Apache compares “38f7-56d65f4a2fcc0-gzip” with the hash of the page, concludes that they don’t match, and sends the whole page again (thus wasting all the bandwidth you originally saved by sending the page compressed).

This is completely brain-dead. And, even though the problem has been around for years, the Apache folks don’t seem to have gotten around to fixing it. Instead, they just replicated the problem in mod_brotli (with a “-br” suffix replacing “-gzip”).

The solution is drop-dead simple. Add the line

RequestHeader edit "If-None-Match" '^"((.*)-(gzip|br))"$' '"$1", "$2"'

to your Apache configuration file. This gives Apache two ETags to compare with: the one with the suffix and the original unmodified one. The latter will match the hash of the file and Apache will return a “HTTP Not Modified (304)” as expected.
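
To see concretely what that edit does to an incoming conditional request, here is a tiny Python sketch of my own that applies the same regular expression to example If-None-Match values:

import re

pattern, repl = r'^"((.*)-(gzip|br))"$', r'"\1", "\2"'
print(re.sub(pattern, repl, '"38f7-56d65f4a2fcc0-gzip"'))
# "38f7-56d65f4a2fcc0-gzip", "38f7-56d65f4a2fcc0"
print(re.sub(pattern, repl, '"38f7-56d65f4a2fcc0"'))
# no suffix, so no match: the header is left unchanged

The suffixed ETag is kept and the bare ETag is appended, so whichever representation Apache hashes, one of the two values will match and the 304 response goes out as expected.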

Why Apache didn’t just implement this in their code is beyond me.

Jordan EllenbergAssembled audience

I gave a talk at Williams College last year and took a little while to visit one of my favorite museums, Mass MoCA. There’s a new installation there, by Taryn Simon, called Assembled Audience. You walk in through a curtained opening and you’re in a pitch-black space. It’s very quiet. And then, slowly, applause starts to build. Bigger and bigger. About a minute of swell until the invisible crowd out there in the dark is going absolutely fucking nuts.

And I have to be honest, whatever this may say about me: I felt an incredible warmth and safety and satisfaction, standing there, being clapped for and adored by a recording of a crowd. Reader, I stayed for a second cycle.

May 21, 2019

Tim GowersVoting tactically in the EU elections

This post is addressed at anyone who is voting in Great Britain in the forthcoming elections to the European Parliament and whose principal aim is to maximize the number of MEPs from Remain-supporting parties, where those are deemed to be the Liberal Democrats, the Greens, Change UK, Plaid Cymru and the Scottish National Party. If you have other priorities, then the general principles laid out here may be helpful, but the examples of how to apply them will not necessarily be appropriate to your particular concerns.

What is the voting system?

The system used is called the d’Hondt system. The country is divided into a number of regions, and from each region several MEPs will be elected. You get one vote, and it is for a party rather than a single candidate. Once the votes are in, there are a couple of ways of thinking about how they translate into results. One that I like is to imagine that the parties have the option of assigning their votes to their candidates as they wish, and once the assignments have been made, the n candidates with the most votes get seats, where n is the number of MEPs representing the given region.

For example, if there are three parties for four places, and their vote shares are 50%, 30% and 20%, then the first party will give 25% to two candidates and both will be elected. If the second party tries a similar trick, it will only get one candidate through because the 20% that goes to the third party is greater than the 15% going to the two candidates from the second party. So the result is two candidates for the first party, one for the second and one for the third.

If the vote shares had been 60%, 25% and 15%, then the first party could afford to split three ways and the result would be three seats for the first party and one for the second.

The way this is sometimes presented is as follows. Let’s go back to the first case. We take the three percentages, and for each one we write down the results of dividing it by 1, 2, 3, etc. That gives us the (approximate) numbers

50%, 25%, 17%, 13%, 10%, …

30%, 15%, 10%, 8%, 6%, …

20%, 10%, 7%, 5%, 3%, …

Looking at those numbers, we see that the biggest four are 50% and 25% from the top row, 30% from the second row, and 20% from the third row. So the first party gets two seats, the second party one and the third party one.
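If you prefer code to mental arithmetic, here is a minimal sketch of the allocation procedure in Python (my addition, not part of the original post; the function name and inputs are just illustrative):

# d'Hondt allocation: repeatedly give the next seat to the party with the
# largest current quotient, i.e. (vote share) / (seats already won + 1).
def dhondt(votes, seats):
    won = {party: 0 for party in votes}
    for _ in range(seats):
        best = max(votes, key=lambda p: votes[p] / (won[p] + 1))
        won[best] += 1
    return won

print(dhondt({"A": 50, "B": 30, "C": 20}, 4))  # {'A': 2, 'B': 1, 'C': 1}

This reproduces the example above: two seats for the first party and one each for the other two. (Exact ties are broken arbitrarily here.)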

How does this affect how I should vote?

The answer to this question depends in peculiar ways on the polling in your region. Let’s take my region, Eastern England, as an example. This region gets seven MEPs, and the latest polls show these kinds of percentages.

Brexit Party 40%
Liberal Democrats 17%
Labour 15%
Greens 10%
Conservatives 9%
Change UK 4%
UKIP 2%

If the percentages stay as they are, then the threshold for an MEP is 10%. The Brexit Party gets four MEPs, and the Lib Dems, Labour and the Greens one each. But because the Brexit Party, the Greens and the Conservatives are all close to the 10% threshold, small swings can make a difference to which one of the fourth Brexit Party candidate, the Green candidate and the Conservative candidate gets left out. On the other hand, it would take a much bigger swing — of 3% or so — to give the second Lib Dem candidate a chance of being elected. So if your main concern is to maximize the number of Remain-supporting MEPs, you should support the Greens.

Yes, but what if everybody were to do that?

In principle that is an annoying problem with the d’Hondt system. But don’t worry — it just isn’t going to happen. Systematic tactical voting is at best a marginal phenomenon, but fortunately in this region a marginal phenomenon may be all it takes to make sure that the Green candidate gets elected.

Aren’t you being defeatist? What about trying to get two Lib Dems and one Green through?

This might conceivably be possible, but it would be difficult, and a risky strategy, since going for that could lead to just one Remain-supporting MEP. One possibility would be for Remain-leaning Labour voters to say to themselves “Well, we’re basically guaranteed an MEP, and I’d much prefer a Remain MEP to either the Conservatives or the Brexit Party, so I’ll vote Green or Lib Dem instead.” If that started showing up in polls, then one would be able to do a better risk assessment. But for now it looks better to make sure that the Green candidate gets through.

I’m not from the Eastern region. Where can I find out how to vote in my region?

There is a website called remainvoter.com that has done the analysis. The reason I am writing this post is that I have seen online that a lot of people are highly sceptical about their conclusions, so I wanted to explain the theory behind them (as far as I can guess it) so that you don’t have to take what they say on trust and can do the calculations for yourself.

Just to check, I’ll look at another region and see whether I end up with a recommendation that agrees with that of remainvoter.com.

In the South West, there are six MEPs. A recent poll shows the following percentages.

Brexit Party 42%
Lib Dem 20%
Green Party 12%
Conservatives 9%
Labour 8%
Change UK 4%
UKIP 3%

Dividing the Brexit Party vote by 3 gives 14% and by 4 gives 10.5%, while dividing the Lib Dem vote by 2 gives 10%. So as things stand it is extremely close between a fourth Brexit Party MEP and a second Lib Dem MEP, with the Brexit Party candidate just ahead: that would give four Brexit Party MEPs, one Lib Dem MEP and one Green Party MEP.

This is too close for comfort, and the second Lib Dem candidate is in a more precarious position than the Green Party candidate, who is on 12% with the Conservative candidate back on 9%. So it would make sense for a bit of Green Party support to transfer to the Lib Dems in order to make sure that three Remain-supporting candidates, rather than two, are elected in the south west.

Interestingly, remainvoter.com recommend supporting the Greens on the grounds that one Lib Dem MEP is bound to be elected. I’m not sure I understand this, since it seems very unlikely that the Lib Dems and the Greens won’t get at least two seats between them, so they might as well aim for three. Perhaps someone can enlighten me on this point. It could be that remainvoter.com is looking at different polls from the ones I’m looking at.

I’m slightly perturbed by that so I’ll pick another region and try the same exercise. Perhaps London would be a good one. Here we have the following percentages (plus a couple of smaller ones that won’t affect anything).

Liberal Democrats 24% (12%, 8%)
Brexit Party 21% (10.5%, 7%)
Labour 19% (9.5%, 6.3%)
Green Party 14% (7%)
Conservatives 10%
Change UK 6%

London has eight MEPs. Here I find it convenient to use the algorithm of dividing by 1,2,3 etc., which explains the percentages I’ve added in brackets. Taking the eight largest numbers we see that the current threshold to get an MEP is at 9.5%, so the Lib Dems get two, the Brexit party two, Labour two and the Greens and Conservatives one each.

Here it doesn’t look obvious how to vote tactically. Clearly not Green, since the Greens are squarely in the middle of the range between the threshold and twice the threshold. Probably not Lib Dem either (unless things change quite a bit) since they’re unlikely to go up as far as 28.5%. But getting Change UK up to 9.5% also looks pretty hard to me. Perhaps the least difficult of these difficult options is for the Green Party to donate about 3% of the vote and the Lib Dems another 2% to Change UK, which would allow them to overtake Labour. But I don’t see it happening.

And now to check my answer, so to speak. And it does indeed agree with the remainvoter.com recommendation. This looks to me like a case where if tactical voting were to be widely adopted, then it might just work to get another MEP, but if it were that widely adopted, one might have to start worrying about not overshooting and accidentally losing one of the other Remain MEPs. But that’s not likely to happen, and in fact I’d predict that in London Change UK will not get an MEP because not enough people will follow remainvoter.com’s recommendation.

This all seems horribly complicated. What should I do?

If you don’t want to bother to think about it, then just go to remainvoter.com and follow their recommendation. If you do want to think, then follow these simple (for a typical reader of this blog anyway) instructions.

1. Google polls for your region. (For example, you can scroll down to near the bottom of this page to find one set of polls.)

2. Find out how many MEPs your region gets. Let that number be n.

3. For each percentage, divide it by 1, 2, 3 etc. until you reach a number that clearly won’t be in the top n.

4. See what percentage, out of all those numbers, is the nth largest.

5. Vote for a Remain party that is associated with a number that is close to the threshold if there is also a Brexit-supporting (or Brexit-fence-sitting) party with a number close to the threshold.

One can refine point 5 as follows, to cover the case when more than one Remain-supporting party has a number near the threshold. Suppose, for the sake of example, that the Brexit party is polling at 32%, the Lib Dems at 22%, the Greens at 11%, Labour at 18% and the Conservatives at 12%, and others 5%, in a region that gets five MEPs. Then carrying out step 3, we get

Brexit 32, 16, 10.6
Lib Dems 22, 11
Greens 11
Conservatives 12
Labour 18, 9

So as things stand the Brexit Party gets two MEPs, the Lib Dems one, Labour one and the Conservatives one. If you’re a Remain supporter who wants to vote tactically, then you’ll want either the Greens or the second Lib Dem candidate to overtake the Conservatives on 12% in order to defeat the Conservative candidate. To do that, you’ll need either to increase the Green vote from 11% to just over 12% or to increase the Lib Dem vote from 22% to just over 24%. The latter is probably harder, so you should probably support the Greens.
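As a quick check of that example with the little Python sketch from earlier (again my illustration, not part of the original post):

print(dhondt({"Brexit": 32, "LibDem": 22, "Green": 11, "Labour": 18, "Con": 12}, 5))
# {'Brexit': 2, 'LibDem': 1, 'Green': 0, 'Labour': 1, 'Con': 1}
print(dhondt({"Brexit": 32, "LibDem": 22, "Green": 12.5, "Labour": 18, "Con": 12}, 5))
# {'Brexit': 2, 'LibDem': 1, 'Green': 1, 'Labour': 1, 'Con': 0}

A small boost taking the Greens past 12% moves the last seat from the Conservatives to the Greens, exactly as described.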

A final word

I’m not writing this as an expert, so don’t assume that everything I’ve written is correct, especially given that I came to a different conclusion from remainvoter.com in the South West. If you think I’ve slipped up, then please let me know in the comments, and if I agree with you I’ll make changes. But bear in mind the premise with which I started. Of course there may well be reasons for not voting tactically, such as caring about issues other than Brexit. But this post is about what to do if Brexit is your overriding concern. And one obvious last point: PLEASE ACTUALLY BOTHER TO VOTE! Just the percentage of people voting for Remain-supporting parties will have an impact, even if Farage gets more MEPs.

Terence TaoA function field analogue of Riemann zeta statistics

This is another sequel to a recent post in which I showed the Riemann zeta function {\zeta} can be locally approximated by a polynomial, in the sense that for randomly chosen {t \in [T,2T]} one has an approximation

\displaystyle  \zeta(\frac{1}{2} + it - \frac{2\pi i z}{\log T}) \approx P_t( e^{2\pi i z/N} ) \ \ \ \ \ (1)

where {N} grows slowly with {T}, and {P_t} is a polynomial of degree {N}. It turns out that in the function field setting there is an exact version of this approximation which captures many of the known features of the Riemann zeta function, namely Dirichlet {L}-functions for a random character of given modulus over a function field. This model was (essentially) studied in a fairly recent paper by Andrade, Miller, Pratt, and Trinh; I am not sure if there is any further literature on this model beyond this paper (though the number field analogue of low-lying zeroes of Dirichlet {L}-functions is certainly well studied). In this model it is possible to set {N} fixed and let {T} go to infinity, thus providing a simple finite-dimensional model problem for problems involving the statistics of zeroes of the zeta function.

In this post I would like to record this analogue precisely. We will need a finite field {{\mathbb F}} of some order {q} and a natural number {N}, and set

\displaystyle  T := q^{N+1}.

We will primarily think of {q} as being large and {N} as being either fixed or growing very slowly with {q}, though it is possible to also consider other asymptotic regimes (such as holding {q} fixed and letting {N} go to infinity). Let {{\mathbb F}[X]} be the ring of polynomials of one variable {X} with coefficients in {{\mathbb F}}, and let {{\mathbb F}[X]'} be the multiplicative semigroup of monic polynomials in {{\mathbb F}[X]}; one should view {{\mathbb F}[X]} and {{\mathbb F}[X]'} as the function field analogue of the integers and natural numbers respectively. We use the valuation {|n| := q^{\mathrm{deg}(n)}} for polynomials {n \in {\mathbb F}[X]} (with {|0|=0}); this is the analogue of the usual absolute value on the integers. We select an irreducible polynomial {Q \in {\mathbb F}[X]} of size {|Q|=T} (i.e., {Q} has degree {N+1}). The multiplicative group {({\mathbb F}[X]/Q{\mathbb F}[X])^\times} can be shown to be cyclic of order {|Q|-1=T-1}. A Dirichlet character of modulus {Q} is a completely multiplicative function {\chi: {\mathbb F}[X] \rightarrow {\bf C}} of modulus {Q}, that is periodic of period {Q} and vanishes on those {n \in {\mathbb F}[X]} not coprime to {Q}. From Fourier analysis we see that there are exactly {\phi(Q) := |Q|-1} Dirichlet characters of modulus {Q}. A Dirichlet character is said to be odd if it is not identically one on the group {{\mathbb F}^\times} of non-zero constants; there are only {\frac{1}{q-1} \phi(Q)} non-odd characters (including the principal character), so in the limit {q \rightarrow \infty} most Dirichlet characters are odd. We will work primarily with odd characters in order to be able to ignore the effect of the place at infinity.

Let {\chi} be an odd Dirichlet character of modulus {Q}. The Dirichlet {L}-function {L(s, \chi)} is then defined (for {s \in {\bf C}} of sufficiently large real part, at least) as

\displaystyle  L(s,\chi) := \sum_{n \in {\mathbb F}[X]'} \frac{\chi(n)}{|n|^s}

\displaystyle  = \sum_{m=0}^\infty q^{-sm} \sum_{n \in {\mathbb F}[X]': |n| = q^m} \chi(n).

Note that for {m \geq N+1}, the set {n \in {\mathbb F}[X]': |n| = q^m} is invariant under shifts {h} whenever {|h| < T}; since this covers a full set of residue classes of {{\mathbb F}[X]/Q{\mathbb F}[X]}, and the odd character {\chi} has mean zero on this set of residue classes, we conclude that the sum {\sum_{n \in {\mathbb F}[X]': |n| = q^m} \chi(n)} vanishes for {m \geq N+1}. In particular, the {L}-function is entire, and for any real number {t} and complex number {z}, we can write the {L}-function as a polynomial

\displaystyle  L(\frac{1}{2} + it - \frac{2\pi i z}{\log T},\chi) = P(Z) = P_{t,\chi}(Z) := \sum_{m=0}^N c^1_m(t,\chi) Z^m

where {Z := e(z/N) = e^{2\pi i z/N}} and the coefficients {c^1_m = c^1_m(t,\chi)} are given by the formula

\displaystyle  c^1_m(t,\chi) := q^{-m/2-imt} \sum_{n \in {\mathbb F}[X]': |n| = q^m} \chi(n).

Note that {t} can easily be normalised to zero by the relation

\displaystyle  P_{t,\chi}(Z) = P_{0,\chi}( q^{-it} Z ). \ \ \ \ \ (2)

In particular, the dependence on {t} is periodic with period {\frac{2\pi}{\log q}} (so by abuse of notation one could also take {t} to be an element of {{\bf R}/\frac{2\pi}{\log q}{\bf Z}}).

Fourier inversion yields a functional equation for the polynomial {P}:

Proposition 1 (Functional equation) Let {\chi} be an odd Dirichlet character of modulus {Q}, and {t \in {\bf R}}. There exists a phase {e(\theta)} (depending on {t,\chi}) such that

\displaystyle  c^1_{N-m} = e(\theta) \overline{c^1_m}

for all {0 \leq m \leq N}, or equivalently that

\displaystyle  P(1/Z) = e(\theta) Z^{-N} \overline{P}(Z)

where {\overline{P}(Z) := \overline{P(\overline{Z})}}.

Proof: We can normalise {t=0}. Let {G} be the finite field {{\mathbb F}[X] / Q {\mathbb F}[X]}. We can write

\displaystyle  c^1_{N-m} = q^{-(N-m)/2} \sum_{n \in X^{N-m} + H_{N-m}} \chi(n)

where {H_j} denotes the subgroup of {G} consisting of (residue classes of) polynomials of degree less than {j}. Let {e_G: G \rightarrow S^1} be a non-trivial character of {G} whose kernel contains the space {H_N} (this is easily achieved by pulling back a non-trivial character from the quotient {G/H_N \equiv {\mathbb F}}). We can use the Fourier inversion formula to write

\displaystyle  c^1_{N-m} = q^{(m-N)/2} \sum_{\xi \in G} \hat \chi(\xi) \sum_{n \in X^{N-m} + H_{N-m}} e_G( n\xi )

where

\displaystyle  \hat \chi(\xi) := q^{-N-1} \sum_{n \in G} \chi(n) e_G(-n\xi).

From change of variables we see that {\hat \chi} is a scalar multiple of {\overline{\chi}}; from Plancherel we conclude that

\displaystyle  \hat \chi = e(\theta_0) q^{-(N+1)/2} \overline{\chi} \ \ \ \ \ (3)

for some phase {e(\theta_0)}. We conclude that

\displaystyle  c^1_{N-m} = e(\theta_0) q^{-(2N-m+1)/2} \sum_{\xi \in G} \overline{\chi}(\xi) e_G( X^{N-m} \xi) \sum_{n \in H_{N-m}} e_G( n\xi ). \ \ \ \ \ (4)

The inner sum {\sum_{n \in H_{N-m}} e_G( n\xi )} equals {q^{N-m}} if {\xi \in H_{m+1}}, and vanishes otherwise, thus

\displaystyle c^1_{N-m} = e(\theta_0) q^{-(m+1)/2} \sum_{\xi \in H_{m+1}} \overline{\chi}(\xi) e_G( X^{N-m} \xi).

For {\xi} in {H_m}, {e_G(X^{N-m} \xi)=1} and the contribution of the sum vanishes as {\chi} is odd. Thus we may restrict {\xi} to {H_{m+1} \backslash H_m}, so that

\displaystyle c^1_{N-m} = e(\theta_0) q^{-(m+1)/2} \sum_{h \in {\mathbb F}^\times} e_G( X^{N} h) \sum_{\xi \in h X^m + H_{m}} \overline{\chi}(\xi).

By the multiplicativity of {\chi}, this factorises as

\displaystyle c^1_{N-m} = e(\theta_0) q^{-(m+1)/2} (\sum_{h \in {\mathbb F}^\times} \overline{\chi}(h) e_G( X^{N} h)) (\sum_{\xi \in X^m + H_{m}} \overline{\chi}(\xi)).

From the one-dimensional version of (3) (and the fact that {\chi} is odd) we have

\displaystyle  \sum_{h \in {\mathbb F}^\times} \overline{\chi}(h) e_G( X^{N} h) = e(\theta_1) q^{1/2}

for some phase {e(\theta_1)}. The claim follows. \Box

As one corollary of the functional equation, {c^1_N} is a phase rotation of {\overline{c^1_0} = 1} and thus is non-zero, so {P} has degree exactly {N}. The functional equation is then equivalent to the {N} zeroes of {P} being symmetric across the unit circle. In fact we have the stronger

Theorem 2 (Riemann hypothesis for Dirichlet {L}-functions over function fields) Let {\chi} be an odd Dirichlet character of modulus {Q}, and {t \in {\bf R}}. Then all the zeroes of {P} lie on the unit circle.

We derive this result from the Riemann hypothesis for curves over finite fields below the fold.

In view of this theorem (and the fact that {c^1_0=1}), we may write

\displaystyle  P(Z) = \mathrm{det}(1 - ZU)

for some unitary {N \times N} matrix {U = U_{t,\chi}}. It is possible to interpret {U} as the action of the geometric Frobenius map on a certain cohomology group, but we will not do so here. The situation here is simpler than in the number field case because the factor {\exp(A)} arising from very small primes is now absent (in the function field setting there are no primes of size between {1} and {q}).

We now let {\chi} vary uniformly at random over all odd characters of modulus {Q}, and {t} uniformly over {{\bf R}/\frac{2\pi}{\log q}{\bf Z}}, independently of {\chi}; we also make the distribution of the random variable {U} conjugation invariant in {U(N)}. We use {{\mathbf E}_Q} to denote the expectation with respect to this randomness. One can then ask what the limiting distribution of {U} is in various regimes; we will focus in this post on the regime where {N} is fixed and {q} is being sent to infinity. In the spirit of the Sato-Tate conjecture, one should expect {U} to converge in distribution to the circular unitary ensemble (CUE), that is to say Haar probability measure on {U(N)}. This may well be provable from Deligne’s “Weil II” machinery (in the spirit of this monograph of Katz and Sarnak), though I do not know how feasible this is or whether it has already been done in the literature; here we shall avoid using this machinery and study what partial results towards this CUE hypothesis one can make without it.

If one lets {\lambda_1,\dots,\lambda_N} be the eigenvalues of {U} (ordered arbitrarily), then we now have

\displaystyle  \sum_{m=0}^N c^1_m Z^m = P(Z) = \prod_{j=1}^N (1 - \lambda_j Z)

and hence the {c^1_m} are essentially elementary symmetric polynomials of the eigenvalues:

\displaystyle  c^1_m = (-1)^m e_m( \lambda_1,\dots,\lambda_N). \ \ \ \ \ (5)

One can take log derivatives to conclude

\displaystyle  \frac{P'(Z)}{P(Z)} = -\sum_{j=1}^N \frac{\lambda_j}{1-\lambda_j Z}.

On the other hand, as in the number field case one has the Dirichlet series expansion

\displaystyle  Z \frac{P'(Z)}{P(Z)} = \sum_{n \in {\mathbb F}[X]'} \frac{\Lambda_q(n) \chi(n)}{|n|^s}

where {s = \frac{1}{2} + it - \frac{2\pi i z}{\log T}} has sufficiently large real part, {Z = e(z/N)}, and the von Mangoldt function {\Lambda_q(n)} is defined as {\log_q |p| = \mathrm{deg} p} when {n} is the power of an irreducible {p} and {0} otherwise. We conclude the “explicit formula”

\displaystyle  c^{\Lambda_q}_m = -\sum_{j=1}^N \lambda_j^m = -\mathrm{tr}(U^m) \ \ \ \ \ (6)

for {m \geq 1}, where

\displaystyle  c^{\Lambda_q}_m := q^{-m/2-imt} \sum_{n \in {\mathbb F}[X]': |n| = q^m} \Lambda_q(n) \chi(n).
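For instance (a quick sanity check, not in the original post), when {m=1} every monic polynomial of degree one is irreducible, so {\Lambda_q \equiv 1} on such polynomials and hence {c^{\Lambda_q}_1 = c^1_1}, which by (5) is {-e_1(\lambda_1,\dots,\lambda_N) = -\mathrm{tr}(U)}, consistent with (6).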

Similarly on inverting {P(Z)} we have

\displaystyle  P(Z)^{-1} = \prod_{j=1}^N (1 - \lambda_j Z)^{-1}.

Since we also have

\displaystyle  P(Z)^{-1} = \sum_{n \in {\mathbb F}[X]'} \frac{\mu(n) \chi(n)}{|n|^s}

for {s} sufficiently large real part, where the Möbius function {\mu(n)} is equal to {(-1)^k} when {n} is the product of {k} distinct irreducibles, and {0} otherwise, we conclude that the Möbius coefficients

\displaystyle  c^\mu_m := q^{-m/2-imt} \sum_{n \in {\mathbb F}[X]': |n| = q^m} \mu(n) \chi(n)

are just the complete homogeneous symmetric polynomials of the eigenvalues:

\displaystyle  c^\mu_m = h_m( \lambda_1,\dots,\lambda_N). \ \ \ \ \ (7)

One can then derive various algebraic relationships between the coefficients {c^1_m, c^{\Lambda_q}_m, c^\mu_m} from various identities involving symmetric polynomials, but we will not do so here.

What do we know about the distribution of {U}? By construction, it is conjugation-invariant; from (2) it is also invariant with respect to the rotations {U \rightarrow e^{i\theta} U} for any phase {\theta \in{\bf R}}. We also have the function field analogue of the Rudnick-Sarnak asymptotics:

Proposition 3 (Rudnick-Sarnak asymptotics) Let {a_1,\dots,a_k,b_1,\dots,b_k} be nonnegative integers. If

\displaystyle  \sum_{j=1}^k j a_j \leq N, \ \ \ \ \ (8)

then the moment

\displaystyle  {\bf E}_{Q} \prod_{j=1}^k (\mathrm{tr} U^j)^{a_j} (\overline{\mathrm{tr} U^j})^{b_j} \ \ \ \ \ (9)

is equal to {o(1)} in the limit {q \rightarrow \infty} (holding {N,a_1,\dots,a_k,b_1,\dots,b_k} fixed) unless {a_j=b_j} for all {j}, in which case it is equal to

\displaystyle  \prod_{j=1}^k j^{a_j} a_j! + o(1). \ \ \ \ \ (10)

Comparing this with Proposition 1 from this previous post, we thus see that all the low moments of {U} are consistent with the CUE hypothesis (and also with the ACUE hypothesis, again by the previous post). The case {\sum_{j=1}^k a_j + \sum_{j=1}^k b_j \leq 2} of this proposition was essentially established by Andrade, Miller, Pratt, and Trinh.

Proof: We may assume the homogeneity relationship

\displaystyle  \sum_{j=1}^k j a_j = \sum_{j=1}^k j b_j \ \ \ \ \ (11)

since otherwise the claim follows from the invariance under phase rotation {U \mapsto e^{i\theta} U}. By (6), the expression (9) is equal, up to a sign {(-1)^{l+l'}} (which is {+1} in the only case {a_j = b_j} that produces a non-trivial main term), to

\displaystyle  q^{-D} {\bf E}_Q \sum_{n_1,\dots,n_l,n'_1,\dots,n'_{l'} \in {\mathbb F}[X]': |n_i| = q^{s_i}, |n'_i| = q^{s'_i}} (\prod_{i=1}^l \Lambda_q(n_i) \chi(n_i)) \prod_{i=1}^{l'} \Lambda_q(n'_i) \overline{\chi(n'_i)}

where

\displaystyle  D := \sum_{j=1}^k j a_j = \sum_{j=1}^k j b_j

\displaystyle  l := \sum_{j=1}^k a_j

\displaystyle  l' := \sum_{j=1}^k b_j

and {s_1 \leq \dots \leq s_l} consists of {a_j} copies of {j} for each {j=1,\dots,k}, and similarly {s'_1 \leq \dots \leq s'_{l'}} consists of {b_j} copies of {j} for each {j=1,\dots,k}.

The polynomials {n_1 \dots n_l} and {n'_1 \dots n'_{l'}} are monic of degree {D}, which by hypothesis is less than the degree of {Q}, and thus they can only be scalar multiples of each other in {{\mathbb F}[X] / Q {\mathbb F}[X]} if they are identical (in {{\mathbb F}[X]}). As such, we see that the average

\displaystyle  {\bf E}_Q \chi(n_1) \dots \chi(n_l) \overline{\chi(n'_1)} \dots \overline{\chi(n'_{l'})}

vanishes unless {n_1 \dots n_l = n'_1 \dots n'_{l'}}, in which case this average is equal to {1}. Thus the expression (9) simplifies to

\displaystyle  q^{-D} \sum_{n_1,\dots,n_l,n'_1,\dots,n'_{l'}: |n_i| = q^{s_i}, |n'_i| = q^{s'_i}; n_1 \dots n_l = n'_1 \dots n'_l} (\prod_{i=1}^l \Lambda_q(n_i)) \prod_{i=1}^{l'} \Lambda_q(n'_i).

There are at most {q^D} choices for the product {n_1 \dots n_l}, and each one contributes {O_D(1)} to the above sum. All but {o(q^D)} of these choices are square-free, so by accepting an error of {o(1)}, we may restrict attention to square-free {n_1 \dots n_l}. This forces {n_1,\dots,n_l,n'_1,\dots,n'_{l'}} to all be irreducible (as opposed to powers of irreducibles); as {{\mathbb F}[X]} is a unique factorisation domain, this forces {l=l'} and {n_1,\dots,n_l} to be a permutation of {n'_1,\dots,n'_{l'}}. By the size restrictions, this then forces {a_j = b_j} for all {j} (if the above expression is to be anything other than {o(1)}), and each {n_1,\dots,n_l} is associated to {\prod_{j=1}^k a_j!} possible choices of {n'_1,\dots,n'_{l'}}. Writing {\Lambda_q(n'_i) = s'_i} and then reinstating the non-squarefree possibilities for {n_1 \dots n_l}, we can thus write the above expression as

\displaystyle  q^{-D} \prod_{j=1}^k j^{a_j} a_j! \sum_{n_1,\dots,n_l \in {\mathbb F}[X]': |n_i| = q^{s_i}} \prod_{i=1}^l \Lambda_q(n_i) + o(1).

Using the prime number theorem {\sum_{n \in {\mathbb F}[X]': |n| = q^s} \Lambda_q(n) = q^s}, we obtain the claim. \Box
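As a quick illustration of the prime number theorem identity used in the final step (my addition, not in the original post): for {s=2}, the {\frac{q^2-q}{2}} monic irreducible quadratics each contribute {\Lambda_q = 2}, the {q} squares of monic linear polynomials each contribute {\Lambda_q = 1}, and the remaining reducible quadratics contribute {0}, for a total of {q^2 - q + q = q^2}, as claimed.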

Comparing this with Proposition 1 from this previous post, we thus see that all the low moments of {U} are consistent with the CUE and ACUE hypotheses:

Corollary 4 (CUE statistics at low frequencies) Let {\lambda_1,\dots,\lambda_N} be the eigenvalues of {U}, permuted uniformly at random. Let {R(\lambda)} be a linear combination of monomials {\lambda_1^{a_1} \dots \lambda_N^{a_N}} where {a_1,\dots,a_N} are integers with either {\sum_{j=1}^N a_j \neq 0} or {\sum_{j=1}^N |a_j| \leq 2N}. Then

\displaystyle  {\bf E}_Q R(\lambda) = {\bf E}_{CUE} R(\lambda) + o(1).

The analogue of the GUE hypothesis in this setting would be the CUE hypothesis, which asserts that the threshold {2N} here can be replaced by an arbitrarily large quantity. As far as I know this is not known even for {2N+2} (though, as mentioned previously, in principle one may be able to resolve such cases using Deligne’s proof of the Riemann hypothesis for function fields). Among other things, this would allow one to distinguish CUE from ACUE, since as discussed in the previous post, these two distributions agree when tested against monomials up to threshold {2N}, though not to {2N+2}.

Proof: By permutation symmetry we can take {R} to be symmetric, and by linearity we may then take {R} to be the symmetrisation of a single monomial {\lambda_1^{a_1} \dots \lambda_N^{a_N}}. If {\sum_{j=1}^N a_j \neq 0} then both expectations vanish due to the phase rotation symmetry, so we may assume that {\sum_{j=1}^N a_j = 0} and {\sum_{j=1}^N |a_j| \leq 2N}. We can write this symmetric polynomial as a constant multiple of {\mathrm{tr}(U^{a_1}) \dots \mathrm{tr}(U^{a_N})} plus other monomials with a smaller value of {\sum_{j=1}^N |a_j|}. Since {\mathrm{tr}(U^{-a}) = \overline{\mathrm{tr}(U^a)}}, the claim now follows by induction from Proposition 3 and Proposition 1 from the previous post. \Box

Thus, for instance, for {k=1,2}, the {2k^{th}} moment

\displaystyle {\bf E}_Q |\det(1-U)|^{2k} = {\bf E}_Q |P(1)|^{2k} = {\bf E}_Q |L(\frac{1}{2} + it, \chi)|^{2k}

is equal to

\displaystyle  {\bf E}_{CUE} |\det(1-U)|^{2k} + o(1)

because all the monomials in {\prod_{j=1}^N (1-\lambda_j)^k (1-\lambda_j^{-1})^k} are of the required form when {k \leq 2}. The latter expectation can be computed exactly (for any natural number {k}) using a formula

\displaystyle  {\bf E}_{CUE} |\det(1-U)|^{2k} = \prod_{j=1}^N \frac{\Gamma(j) \Gamma(j+2k)}{\Gamma(j+k)^2}

of Baker-Forrester and Keating-Snaith, thus for instance

\displaystyle  {\bf E}_{CUE} |\det(1-U)|^2 = N+1

\displaystyle  {\bf E}_{CUE} |\det(1-U)|^4 = \frac{(N+1)(N+2)^2(N+3)}{12}

and more generally

\displaystyle  {\bf E}_{CUE}|\det(1-U)|^{2k} = \frac{g_k+o(1)}{(k^2)!} N^{k^2}

when {N \rightarrow \infty}, where {g_k} are the integers

\displaystyle  g_1 = 1, g_2 = 2, g_3 = 42, g_4 = 24024, \dots

and more generally

\displaystyle  g_k := \frac{(k^2)!}{\prod_{i=1}^{2k-1} i^{k-|k-i|}}

(OEIS A039622). Thus we have

\displaystyle {\bf E}_Q |\det(1-U)|^{2k} = \frac{g_k+o(1)}{(k^2)!} N^{k^2}

for {k=1,2} if {Q \rightarrow \infty} and {N} is sufficiently slowly growing depending on {Q}. The CUE hypothesis would imply that this formula also holds for higher {k}. (The situation here is cleaner than in the number field case, in which the GUE hypothesis only suggests the correct lower bound for the moments rather than an asymptotic, due to the absence of the wildly fluctuating additional factor {\exp(A)} that is present in the Riemann zeta function model.)
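As a purely numerical illustration of the first of the CUE formulas above ({\bf E}_{CUE}|\det(1-U)|^2 = N+1), one can sample Haar-random unitaries directly (my addition, not part of the original post; it assumes numpy and scipy are available, and the sample count is an arbitrary choice):

import numpy as np
from scipy.stats import unitary_group

# Monte Carlo check of E_CUE |det(1-U)|^2 = N+1 for Haar-random U in U(N).
N, samples = 8, 20000
Us = unitary_group.rvs(N, size=samples, random_state=0)
second_moment = np.mean([abs(np.linalg.det(np.eye(N) - U)) ** 2 for U in Us])
print(second_moment, "vs exact value", N + 1)

With these parameters the empirical mean should come out close to {N+1=9}.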

Now we can recover the analogue of Montgomery’s work on the pair correlation conjecture. Consider the statistic

\displaystyle  {\bf E}_Q \sum_{1 \leq i,j \leq N} R( \lambda_i / \lambda_j )

where

\displaystyle R(z) = \sum_m \hat R(m) z^m

is some finite linear combination of monomials {z^m} independent of {q}. We can expand the above sum as

\displaystyle  \sum_m \hat R(m) {\bf E}_Q \mathrm{tr}(U^m) \mathrm{tr}(U^{-m}).

Assuming the CUE hypothesis, then by Example 3 of the previous post, we would conclude that

\displaystyle  {\bf E}_Q \sum_{1 \leq i,j \leq N} R( \lambda_i / \lambda_j ) = N^2 \hat R(0) + \sum_m \min(|m|,N) \hat R(m) + o(1). \ \ \ \ \ (12)

This is the analogue of Montgomery’s pair correlation conjecture. Proposition 3 implies that this claim is true whenever {\hat R} is supported on {[-N,N]}. If instead we assume the ACUE hypothesis (or the weaker Alternative Hypothesis that the phase gaps are non-zero multiples of {1/2N}), one should instead have

\displaystyle  {\bf E}_Q \sum_{1 \leq i,j \leq N} R( \lambda_i / \lambda_j ) = \sum_{k \in {\bf Z}} N^2 \hat R(2Nk) + \sum_{1 \leq |m| \leq N} |m| \hat R(m+2Nk) + o(1)

for arbitrary {R}; this is the function field analogue of a recent result of Baluyot. In any event, since {\mathrm{tr}(U^m) \mathrm{tr}(U^{-m})} is non-negative, we unconditionally have the lower bound

\displaystyle  {\bf E}_Q \sum_{1 \leq i,j \leq N} R( \lambda_i / \lambda_j ) \geq N^2 \hat R(0) + \sum_{1 \leq |m| \leq N} |m| \hat R(m) + o(1). \ \ \ \ \ (13)

if {\hat R(m)} is non-negative for {|m| > N}.

By applying (12) for various choices of test functions {R} we can obtain various bounds on the behaviour of eigenvalues. For instance suppose we take the Fejér kernel

\displaystyle  R(z) = |1 + z + \dots + z^N|^2 = \sum_{m=-N}^N (N+1-|m|) z^m.

Then (12) applies unconditionally and we conclude that

\displaystyle  {\bf E}_Q \sum_{1 \leq i,j \leq N} R( \lambda_i / \lambda_j ) = N^2 (N+1) + \sum_{1 \leq |m| \leq N} (N+1-|m|) |m| + o(1).
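In more detail (a step not spelled out in the original post), {\hat R(0) = N+1} and

\displaystyle  \sum_{1 \leq |m| \leq N} (N+1-|m|) |m| = 2 \sum_{m=1}^N (N+1-m) m = \frac{N(N+1)(N+2)}{3}.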

The right-hand side evaluates to {\frac{2}{3} N(N+1)(2N+1)+o(1)}. On the other hand, {R(\lambda_i/\lambda_j)} is non-negative, and equal to {(N+1)^2} when {\lambda_i = \lambda_j}. Thus

\displaystyle  {\bf E}_Q \sum_{1 \leq i,j \leq N} 1_{\lambda_i = \lambda_j} \leq \frac{2}{3} \frac{N(2N+1)}{N+1} + o(1).

The sum {\sum_{1 \leq j \leq N} 1_{\lambda_i = \lambda_j}} is at least {1}, and is at least {2} if {\lambda_i} is not a simple eigenvalue. Thus

\displaystyle  {\bf E}_Q \sum_{1 \leq i \leq N} 1_{\lambda_i \hbox{ not simple}} \leq \frac{1}{3} \frac{N(N-1)}{N+1} + o(1),

and thus the expected number of simple eigenvalues is at least {\frac{2N}{3} \frac{N+2}{N+1} + o(1)}; in particular, at least two thirds of the eigenvalues are simple asymptotically on average. If we had (12) without any restriction on the support of {\hat R}, the same arguments allow one to show that the expected proportion of simple eigenvalues is {1-o(1)}.
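Spelling out the arithmetic in that last step (my addition, not in the original post): the expected number of simple eigenvalues is at least

\displaystyle  N - \frac{1}{3} \frac{N(N-1)}{N+1} + o(1) = \frac{2N}{3} \frac{N+2}{N+1} + o(1) \geq \frac{2N}{3} + o(1).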

Suppose that the phase gaps in {U} are all greater than {c/N} almost surely. Let {\hat R} be non-negative and {R(e^{i\theta})} be non-positive for {\theta} outside of the arc {[-c/N,c/N]}. Then from (13) one has

\displaystyle  R(1) N \geq N^2 \hat R(0) + \sum_{1 \leq |m| \leq N} |m| \hat R(m) + o(1),

so by taking contrapositives one can force the existence of a gap less than {c/N} asymptotically if one can find {R} with {\hat R} non-negative, {R} non-positive for {\theta} outside of the arc {[-c/N,c/N]}, and for which one has the inequality

\displaystyle  R(1) N < N^2 \hat R(0) + \sum_{1 \leq |m| \leq N} |m| \hat R(m).

By a suitable choice of {R} (based on a minorant of Selberg) one can ensure this for {c \approx 0.6072} for {N} large; see Section 5 of these notes of Goldston. This is not the smallest value of {c} currently obtainable in the literature for the number field case (which is currently {0.50412}, due to Goldston and Turnage-Butterbaugh, by a somewhat different method), but is still significantly less than the trivial value of {1}. On the other hand, due to the compatibility of the ACUE distribution with Proposition 3, it is not possible to lower {c} below {0.5} purely through the use of Proposition 3.

In some cases it is possible to go beyond Proposition 3. Consider the mollified moment

\displaystyle  {\bf E}_Q |M(U) P(1)|^2

where

\displaystyle  M(U) = \sum_{m=0}^d a_m h_m(\lambda_1,\dots,\lambda_N)

for some coefficients {a_0,\dots,a_d}. We can compute this moment in the CUE case:

Proposition 5 We have

\displaystyle  {\bf E}_{CUE} |M(U) P(1)|^2 = |a_0|^2 + N \sum_{m=1}^d |a_m - a_{m-1}|^2.

Proof: From (5) one has

\displaystyle  P(1) = \sum_{i=0}^N (-1)^i e_i(\lambda_1,\dots,\lambda_N)

hence

\displaystyle  M(U) P(1) = \sum_{i=0}^N \sum_{m=0}^d (-1)^i a_m e_i h_m

where we suppress the dependence on the eigenvalues {\lambda}. Now observe the Pieri formula

\displaystyle  e_i h_m = s_{m 1^i} + s_{(m+1) 1^{i-1}}

where {s_{m 1^i}} are the hook Schur polynomials

\displaystyle  s_{m 1^i} = \sum_{a_1 \leq \dots \leq a_m; a_1 < b_1 < \dots < b_i} \lambda_{a_1} \dots \lambda_{a_m} \lambda_{b_1} \dots \lambda_{b_i}

and we adopt the convention that {s_{m 1^i}} vanishes for {i = -1}, or when {m = 0} and {i > 0}. Then {s_{m1^i}} also vanishes for {i\geq N}. We conclude that

\displaystyle  M(U) P(1) = a_0 s_{0 1^0} + \sum_{0 \leq i \leq N-1} \sum_{m \geq 1} (-1)^i (a_m - a_{m-1}) s_{m 1^i}.

As the Schur polynomials are orthonormal on the unitary group, the claim follows. \Box

The CUE hypothesis would then imply the corresponding mollified moment conjecture

\displaystyle  {\bf E}_{Q} |M(U) P(1)|^2 = |a_0|^2 + N \sum_{m=1}^d |a_m - a_{m-1}|^2 + o(1). \ \ \ \ \ (14)

(See this paper of Conrey, and this paper of Radziwill, for some discussion of the analogous conjecture for the zeta function, which is essentially due to Farmer.)

From Proposition 3 one sees that this conjecture holds in the range {d \leq \frac{1}{2} N}. It is likely that the function field analogue of the calculations of Conrey (based ultimately on deep exponential sum estimates of Deshouillers and Iwaniec) can extend this range to {d < \theta N} for any {\theta < \frac{4}{7}}, if {N} is sufficiently large depending on {\theta}; these bounds thus go beyond what is available from Proposition 3. On the other hand, as discussed in Remark 7 of the previous post, ACUE would also predict (14) for {d} as large as {N-2}, so the available mollified moment estimates are not strong enough to rule out ACUE. It would be interesting to see if there is some other estimate in the function field setting that can be used to exclude the ACUE hypothesis (possibly one that exploits the fact that GRH is available in the function field case?).

— 1. Deriving RH for Dirichlet {L}-functions from RH for curves —

In this section we show how every Dirichlet {L}-function over a function field with squarefree modulus {m} is a factor of the zeta function of some curve over a finite field up to a finite number of local factors, thus giving RH for the former as a consequence of RH for the latter (which can in turn be established by elementary methods such as Stepanov’s method, as discussed in this previous post). The non-squarefree case is more complicated (and can be established using the machinery of Carlitz modules), but we will not need to develop that case here. Thanks to Felipe Voloch and Will Sawin for explaining some of the arguments in this section (from this MathOverflow post).

Let {n} be the order of the Dirichlet character in question. We first deal with the simplest case, in which the modulus {m} is irreducible, and {n} divides {q-1}; furthermore we assume that {(-1)^{\frac{q-1}{n}} = 1} in {{\mathbb F}}, that is to say at least one of {q} and {\frac{q-1}{n}} is even.

In this case, we will show that

\displaystyle  \prod_{\chi: \chi^n = 1} L(s,\chi) = \zeta_C(s)

up to a finite number of local factors, where {\chi} ranges over all Dirichlet characters of modulus {m} and order dividing {n}, and {C} is the curve {\{ (x,y): y^n = m(x) \}} over {{\mathbb F}}. Taking logarithmic derivatives, this amounts to requiring the identity

\displaystyle  \sum_{\chi: \chi^n = 1} \sum_{a \in {\mathbb F}[X]': |a| = q^D} \chi(a) \Lambda_q(a) = \sum_{(x,y) \in C({\mathbb F}_{q^D})} 1 \ \ \ \ \ (15)

for all {D \geq 1}.

The term {\Lambda_q(a)} is only non-zero when {a} is of the form {a = p^j} for some {j|D} and some irreducible {p \in {\mathbb F}[X]'} of degree {d = D/j}, in which case {\Lambda_q(a) = d}. Each such {p} has {d} distinct roots in {{\mathbb F}_{q^d} \subset {\mathbb F}_{q^D}}, which by the Frobenius action can be given as {\beta, \beta^q, \dots, \beta^{q^{d-1}}}. Each such root can be a choice for {x} on the right-hand side of (15), and gives {n} choices for {y} if {m(\beta)} is an {n^{th}} power in {{\mathbb F}_{q^{dj}}}, and {0} otherwise. Thus we reduce to showing that for all but finitely many {p}, we have the {n^{th}} power reciprocity law

\displaystyle  \sum_{\chi: \chi^n = 1} \chi(p^j) = n 1_{m(\beta) = y^n \hbox{ for some } y \in {\mathbb F}_{q^{dj}}}.

We can exclude the case {p=m}, so {p,m} are now coprime. Let {N} denote the degree of {m}. As {m} is assumed irreducible, the multiplicative group {({\mathbb F}[X]/m{\mathbb F}[X])^\times} is cyclic of order {q^N-1}. From this it is easy to see that {\sum_{\chi: \chi^n = 1} \chi(a)} is equal to {n} when {a^{\frac{q^N-1}{n}} = 1 \hbox{ mod } m}, and zero otherwise. Thus we need to show that

\displaystyle  p^{\frac{q^N-1}{n} j} = 1 \hbox{ mod } m \iff m(\beta) = y^n \hbox{ for some } y \in {\mathbb F}_{q^{dj}}.

Let {\alpha,\alpha^q,\dots,\alpha^{q^{N-1}}} denote the roots of {m} (in some extension of {{\mathbb F}}). The condition {p^{\frac{q^N-1}{n} j} = 1 \hbox{ mod } m} can be rewritten as

\displaystyle  p^{\frac{q^N-1}{n} j}(\alpha) = 1.

Factoring {\frac{q^N-1}{n} = \frac{q-1}{n} (1 + q + q^2 + \dots + q^{N-1})}, this becomes

\displaystyle  (p(\alpha) p(\alpha^q) \dots p(\alpha^{q^{N-1}}))^{\frac{q-1}{n}j} = 1.

Using resultants, this is just

\displaystyle  \mathrm{Res}(m,p)^{\frac{q-1}{n}j} = 1.

In a similar vein, as the multiplicative group of {{\mathbb F}_{q^{dj}}} is cyclic of order {q^{dj}-1}, one has {m(\beta) = y^n} for some {y \in {\mathbb F}_{q^{dj}}} if and only if

\displaystyle  m(\beta)^{\frac{q^{dj}-1}{n}} = 1,

which on factoring out {\frac{q-1}{n}} (and noting that {\beta^{q^d} = \beta}) becomes

\displaystyle  (m(\beta) m(\beta^q) \dots m(\beta^{q^{d-1}}))^{\frac{q-1}{n}j} = 1

or using resultants

\displaystyle  \mathrm{Res}(p,m)^{\frac{q-1}{n}j} = 1.

As {\mathrm{Res}(p,m) = \pm \mathrm{Res}(m,p)}, we obtain the claim in this case.

Next, we continue to assume that {n|q-1} and {(-1)^{\frac{q-1}{n}}=1}, but now allow {m} to be the product of distinct irreducibles {m_1,\dots,m_k}. The multiplicative group {({\mathbb F}[X]/m{\mathbb F}[X])^\times} now splits by the Chinese remainder theorem as the direct product of the cyclic groups {({\mathbb F}[X]/m_i{\mathbb F}[X])^\times}, {i=1,\dots,k}. It is then not difficult to repeat the above arguments, replacing {C} by the curve

\displaystyle  \{ (x,y_1,\dots,y_k): y_i^n = m_i(x) \hbox{ for } i=1,\dots,k\};

we leave the details to the reader.

Finally, we now remove the hypotheses that {n|q-1} and {(-1)^{\frac{q-1}{n}}=1}. As {m} is square-free, the Euler totient function {\phi(m)} is the product of quantities of the form {q^d-1} and is thus coprime to {q}; in particular, as {n} must divide this Euler totient function, {n} is also coprime to {q}. There must then exist some power {q^k} of {q} such that {n|q^k-1}; if {q} is odd, one can also ensure that {2n|q^k-1}, thus in either case we have {(-1)^{\frac{q^k-1}{n}}=1}. Thus to reduce to the previous case, we somehow need to change {q} to {q^k} (note that {m} will be squarefree in any field extension, since finite fields are perfect).

Let {{\mathbb F}_{q^k}} be a degree {k} field extension of {{\mathbb F}}, then {{\mathbb F}_{q^k}(X)} is a degree {k} extension of {{\mathbb F}_q(X)}. Let {\mathrm{Norm}:{\mathbb F}_{q^k}(X) \rightarrow {\mathbb F}_{q}(X)} be the norm map. Let {\chi_k: {\mathbb F}_{q^k}[X] \rightarrow {\bf C}} be the composition {\chi_k := \chi \circ \mathrm{Norm}} of the original Dirichlet character {\chi} with the norm map; this can then be checked to be a Dirichlet character on {{\mathbb F}_{q^k}[X]} with modulus {m}, of order dividing {n}. We claim that

\displaystyle  \prod_{\zeta^k = 1} L(\zeta s, \chi) = L( s, \chi_k )

where {\zeta} ranges over the complex {k^{th}} roots of unity, which allows us to establish RH for {L(s,\chi)} from that of {L(s,\chi_k)}. Taking logarithms, we see that it suffices to show that

\displaystyle  k \sum_{a \in {\mathbb F}[X]: |a| = q^{kj}} \log q \Lambda_q(a) \chi(a) = \sum_{b \in {\mathbb F}_{q^k}[X]: |b| = q^{kj}} \log q^k \Lambda_{q^k}(b) \chi_k(b)

or equivalently

\displaystyle  \sum_{a \in {\mathbb F}[X]: |a| = q^{kj}} \Lambda_q(a) \chi(a) = \sum_{b \in {\mathbb F}_{q^k}[X]: |b| = q^{kj}} \Lambda_{q^k}(b) \chi(\mathrm{Norm}(b))

for all {j \geq 0}. But each {a} that gives a non-zero contribution on the left-hand side is the power of some irreducible {p} in {{\mathbb F}[X]}, which then splits into (say) {s} distinct irreducibles {p_1,\dots,p_s} in {{\mathbb F}_{q^k}[X]}, each with degree {1/s} times that of {p}, and all of norm {p^{k/s}}. This gives {s} contributions to the right-hand side, each of which is {\frac{1}{s}} times the corresponding contribution to the left-hand side; conversely, every term on the right-hand side arises precisely once in this fashion. The claim follows.

John BaezThe Monoidal Grothendieck Construction

My grad student Joe Moeller is talking at the 4th Symposium on Compositional Structures this Thursday! He’ll talk about his work with Christina Vasilakopoulou, a postdoc here at U.C. Riverside. Together they created a monoidal version of a fundamental construction in category theory: the Grothendieck construction! Here is their paper:

• Joe Moeller and Christina Vasilakopoulou, Monoidal Grothendieck construction.

The monoidal Grothendieck construction plays an important role in our team’s work on network theory, in at least two ways. First, we use it to get a symmetric monoidal category, and then an operad, from any network model. Second, we use it to turn any decorated cospan category into a ‘structured cospan category’. I haven’t said anything about structured cospans yet, but they are an alternative approach to open systems, developed by my grad student Kenny Courser, that I’m very excited about. Stay tuned!

The Grothendieck construction turns a functor

F \colon \mathsf{X}^{\mathrm{op}} \to \mathsf{Cat}

into a category \int F equipped with a functor

p \colon \int F \to \mathsf{X}

The construction is quite simple but there’s a lot of ideas and terminology connected to it: for example a functor F \colon \mathsf{X}^{\mathrm{op}} \to \mathsf{Cat} is called an indexed category since it assigns a category to each object of \mathsf{X}, while the functor p \colon \int F \to \mathsf{X} is of a special sort called a fibration.

I think the easiest way to learn more about the Grothendieck construction and this new monoidal version may be Joe’s talk:

• Joe Moeller, Monoidal Grothendieck construction, SYCO4, Chapman University, 22 May 2019.

Abstract. We lift the standard equivalence between fibrations and indexed categories to an equivalence between monoidal fibrations and monoidal indexed categories, namely weak monoidal pseudofunctors to the 2-category of categories. In doing so, we investigate the relation between this global monoidal structure where the total category is monoidal and the fibration strictly preserves the structure, and a fibrewise one where the fibres are monoidal and the reindexing functors strongly preserve the structure, first hinted by Shulman. In particular, when the domain is cocartesian monoidal, lax monoidal structures on a functor to Cat bijectively correspond to lifts of the functor to MonCat. Finally, we give some indicative examples where this correspondence appears, spanning from the fundamental and family fibrations to network models and systems.

To dig deeper, try this talk Christina gave at the big annual category theory conference last year:

• Christina Vasilakopoulou, Monoidal Grothendieck construction, CT2018, University of Azores, 10 July 2018.

Then read Joe and Christina’s paper!

Here is the Grothendieck construction in a nutshell:


John BaezSymposium on Compositional Structures 4: Program


Here’s the program for this conference:

Symposium on Compositional Structures 4, 22–23 May, 2019, Chapman University, California. Organized by Alexander Kurz.

A lot of my students and collaborators are speaking here! The meeting will take place in Beckman Hall 101.

Wednesday May 22, 2019

• 10:30–11:30 — Registration.

• 11:30–12:30 — John Baez, “Props in Network Theory“.

• 12:30–1:00 — Jade Master, “Generalized Petri Nets”.

• 1:00–2:00 — Lunch.

• 2:00–2:30 — Christian Williams, “Enriched Lawvere Theories for Operational Semantics”.

• 2:30–3:00 — Kenny Courser, “Structured Cospans”.

• 3:00–3:30 — Daniel Cicala, “Rewriting Structured Cospans”.

• 3:30–4:00 — Break.

• 4:00–4:30 — Samuel Balco and Alexander Kurz, “Nominal String Diagrams”.

• 4:30–5:00 — Jeffrey Morton, “2-Group Actions and Double Categories”.

• 5:00–5:30 — Michael Shulman, “All (∞,1)-Toposes Have Strict Univalent Universes”.

• 5:30–6:30 — Reception.

Thursday May 23, 2019

• 9:30–10:30 — Nina Otter, “A Unified Framework for Equivalences in Social Networks”.

• 10:30–11:00 — Kohei Kishida, Soroush Rafiee Rad, Joshua Sack and Shengyang Zhong, “Categorical Equivalence between Orthocomplemented Quantales and Complete Orthomodular Lattices”.

• 11:00–11:30 — Break.

• 11:30–12:00 — Cole Comfort, “Circuit Relations for Real Stabilizers: Towards TOF+H”.

• 12:00–12:30 — Owen Biesel, “Duality for Algebras of the Connected Planar Wiring Diagrams Operad”.

• 12:30–1:00 — Joe Moeller and Christina Vasilakopoulou, “Monoidal Grothendieck Construction”.

• 1:00–2:00 — Lunch.

• 2:00–3:00 — Tobias Fritz, “Categorical Probability: Results and Challenges”.

• 3:00–3:30 — Harsh Beohar and Sebastian Küpper, “Bisimulation Maps in Presheaf Categories”.

• 3:30–4:00 — Break.

• 4:00–4:30 — Benjamin MacAdam, Jonathan Gallagher and Rory Lucyshyn-Wright, “Scalars in Tangent Categories”.

• 4:30–5:00 — Jonathan Gallagher, Benjamin MacAdam and Geoff Cruttwell, “Towards Formalizing and Extending Differential Programming via Tangent Categories”.

• 5:00–5:30 — David Sprunger and Shin-Ya Katsumata, “Differential Categories, Recurrent Neural Networks, and Machine Learning”.

May 20, 2019

Jacques Distler Instiki 0.30.0 and tex2svg 1.0

Instiki is my wiki-cum-collaboration platform. It has a built-in WYSIWYG vector-graphics drawing program, which is great for making figures. Unfortunately:

  • An extra step is required, in order to convert the resulting SVG into PDF for inclusion in the LaTeX paper. And what you end up with is a directory full of little PDF files (one for each figure), which need to be managed.
  • Many of my colleagues would rather use Tikz, which has become the de-facto standard for including figures in LaTeX.

Obviously, I needed to include Tikz support in Instiki. But, up until now, I didn’t really see a good way to do that, given that I wanted something that is

  1. Portable
  2. Secure

Both considerations pointed towards creating a separate, standalone piece of software to handle the conversion, which communicates with Instiki over a (local or remote) port. tex2svg 1.0.1 requires a working TeX installation and the pdf2svg commandline utility. The latter, in turn, requires the poppler-glib library, which is easily obtained from your favourite package manager. E.g., under Fink, on MacOS, you do a

fink install poppler8-glib

before installing pdf2svg.

But portability is not enough. If you’re going to expose Instiki over the internet, you also need to make it secure. TeX is a Turing-complete language with (limited) access to the file system. It is trivial to compose some simple LaTeX input which, when compiled, will

  • exfiltrate sensitive information from the machine or
  • DoS the machine by using up 100% of the CPU time or filling up 100% of the available disk space.

You should never, ever compile a TeX file from an untrusted source.

tex2svg rigorously filters its input, allowing only a known-safe subset of LaTeX commands through. And it limits the size of the input. So it should be safe to use, even on the internet.
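To give a flavour of what such filtering can look like, here is a toy allow-list filter in Python (purely my illustration of the general idea, not the actual tex2svg code; the command list and size cap are made-up example values):

import re

# Toy allow-list filter: reject oversized input, obviously dangerous primitives,
# and any control sequence that is not explicitly whitelisted.
ALLOWED = {"begin", "end", "usetikzlibrary", "draw", "node", "frac", "sqrt"}
MAX_LEN = 20000

def looks_safe(source):
    if len(source) > MAX_LEN:
        return False
    if re.search(r"\\(input|include|write|openout|read|csname|catcode)\b", source):
        return False
    return all(cmd in ALLOWED for cmd in re.findall(r"\\([A-Za-z@]+)", source))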

After starting up the tex2svg server, you just uncomment the last line of config/environments/production.rb and restart Instiki. Now you can write something like

\begin{tikzpicture}[decoration={markings,
mark=at position .5 with {\arrow{>}}}]
\usetikzlibrary{arrows,shapes,decorations.markings}
\begin{scope}[scale=2.0]
\node[Bl,scale=.75] (or1) at (8,3) {};
\node[scale=1] at (8.7,2.9) {$D3$ brane};
\node[draw,diamond,fill=yellow,scale=.3] (A1) at (7,0) {}; 
\draw[dashed] (A1) -- (7,-.7);
\node[draw,diamond,fill=yellow,scale=.3] (A2) at (7.5,0) {}; 
\draw[dashed] (A2) -- (7.5,-.7);
\node[draw,diamond,fill=yellow,scale=.3] (A3) at (8,0) {}; 
\draw[dashed] (A3) -- (8,-.7);
\node[draw,diamond,fill=yellow,scale=.3] (A4) at (8.5,0) {}; 
\draw[dashed] (A4) -- (8.5,-.7);
\node[draw,diamond,fill=yellow,scale=.3] (A5) at (9,0) {};
\draw[dashed] (A5) -- (9,-.7);
\node[draw,circle,fill=aqua,scale=.3] (B) at (9.5,0) {};
\draw[dashed] (B) -- (9.5,-.7);
\node[draw,regular polygon,regular polygon sides=5,fill=purple,scale=.3] (C1) at (10,0) {}; 
\draw[dashed] (C1) -- (10,-.7);
\node[draw,regular polygon,regular polygon sides=5,fill=purple,scale=.3] (C2) at (10.5,0) {};
\draw[dashed] (C2) -- (10.5,-.7);
\draw (6.8,-.7) -- (6.8,-.9) to (9.2,-.9) to (9.2,-.7);
\draw (9.8,-.7) -- (9.8,-.9) to (10.7,-.9) to (10.7,-.7);
\draw[->-=.75] (C2) to (10.2,.35);
\draw[->-=.75] (C1) to (10.2,.35);
\node[scale=.6] at (9.9,.35) {$(2,2)$};
\draw[->-=.7] (B) to (9.6,.7);
\draw (10.2,.35) to (9.6,.7);
\node[scale=.6] at (9.35,.9) {$(4,0)$};
\draw[->-=.5] (9.1,.8) to (A5);
\draw (9.6,.7) to (9.1,.8) to (A5);
\draw (9.1,.8) to [out=170,in=280] (8.3,1.45);
\draw[dashed] (8.3,1.45) to (8.1,2.5);
\draw[->-=.5] (8.1,2.5) to (or1);
\node[scale=.75] at (7.7,2.7) {$(3,0)$};
%\draw (11.4,2.4) to [out=180,in=90] (6.2,-.5) to [out=90,in=0] (or1) -- cycle;
\node[scale=.75] at (8,-1.1) {A-type};
\node[scale=.75] at (9.5,-1.1) {B-type};
\node[scale=.75] at (10.25,-1.1) {C-type};
\draw[dashed] (8.7,.6) to [out=180,in=90] (6.2,-.55) to [out=270,in=180] (8.7,-1.6) to [out=0,in=270] (11.2,-.55) to [out=90,in=0] (8.7,.6) -- cycle;
\node[scale=1] at (12,.6) {$E_6$ singularity};
\end{scope}
\end{tikzpicture}

in Instiki and have it produce

Instiki 0.30.0 incorporates these changes, is compatible with Ruby 2.6, and greatly accelerates the process of saving pages (over previous versions).

May 19, 2019

Doug NatelsonMagnets and energy machines - everything old is new again.

(Very) long-time readers of this blog will remember waaaay back in 2006-2007, when an Irish company called Steorn claimed that they had invented a machine, based on rotation and various permanent magnets, that allegedly produced more energy than it consumed.  I wrote about this here, here (complete with comments from Steorn's founder), and here.  Long story short:  The laws of thermodynamics were apparently safe, and Steorn is long gone.

This past Friday, the Wall Street Journal published this article (sorry about the pay wall - I couldn't find a non-subscriber link that worked), describing how Dennis Danzik, science and technology officer for Inductance Energy Corp, claims to have built a gizmo called the Earth Engine.  This gadget is a big rotary machine that supposedly spins two 900 kg flywheels at 125 rpm (for the slow version) and generates 240V at up to 100A, with no fuel, no emissions, and practically no noise.  They claim to have run one of these in January for 422 hours generating an average 4.4 kW.  If you want, you can watch a live-stream of a version made largely out of clear plastic, designed to show that there are no hidden tricks.

To the credit of Dan Neil, the Pulitzer-winning WSJ writer, he does state, repeatedly, in the article that physicists think this can't be right.  He includes a great quote from Don Lincoln:  "Perpetual motion machines are bunk, and magnets are the refuge of charlatans." 

Not content with just violating the law of conservation of energy, the claimed explanation relies on a weird claim that seemingly would imply a non-zero divergence of \(\mathbf{B}\) and therefore magnetic monopoles:  "The magnets IEC uses are also highly one-sided, or 'anisotropic,' which means that their field is stronger on one face than the other - say 85% North and 15% South".

I wouldn't rush out and invest in these folks just yet.

May 18, 2019

Jordan EllenbergWhen the coffee cup shattered on the kitchen floor

As an eternal 1990s indie-pop nerd I could not but be thrilled this week when I realized I was going to Bristol

on the National Express.

Bristol, besides having lots of great mathematicians to talk to, is much lovelier than I knew. There’s lots of terrain! It seems every time you turn a corner there’s another fine vista of pastel-painted row houses and the green English hills far away. There’s a famous bridge. I walked across it, then sat on a bench at the other side doing some math, in the hopes I’d think of something really good, because I’ve always wanted to scratch some math on a British bridge, William Rowan Hamilton-style. Didn’t happen. There was a bus strike in Bristol for civil rights because the bus companies didn’t allow black or Indian drivers; the bus lines gave in to the strikers and integrated on the same day Martin Luther King, Jr. was saying “I have a dream” in Washington, DC. There’s a chain of tea shops in Bristol called Boston Tea Party. I think it’s slightly weird to have a commercial operation named after an anti-colonial uprising against your own country, but my colleagues said no one there really thinks of it that way. The University of Bristol, by the way, is sort of the Duke of the UK, in that it was founded by a limitless bequest from the biggest tobacco family in the country, the Willses. Bristol also has this clock:

May 17, 2019

Matt von HippelTwo Loops, Five Particles

There’s a very long-term view of the amplitudes field that gets a lot of press. We’re supposed to be eliminating space and time, or rebuilding quantum field theory from scratch. We build castles in the clouds, seven-loop calculations and all-loop geometrical quantum jewels.

There’s a shorter-term problem, though, that gets much less press, despite arguably being a bigger part of the field right now. In amplitudes, we take theories and turn them into predictions, order by order and loop by loop. And when we want to compare those predictions to the real world, in most cases the best we can do is two loops and five particles.

Five particles here counts the particles coming in and going out: if two gluons collide and become three gluons, we count that as five particles, two in plus three out. Loops, meanwhile, measure the complexity of the calculation, the number of closed paths you can draw in a Feynman diagram. If you use more loops, you expect more precision: you’re approximating nature step by step.

As a field we’re pretty good at one-loop calculations, enough to do them for pretty much any number of particles. As we try for more loops though, things rapidly get harder. Already for two loops, in many cases, we start struggling. We can do better if we dial down the number of particles: there are three-particle and two-particle calculations that get up to three, four, or even five loops. For more particles though, we can’t do as much. Thus the current state of the art, the field’s short term goal: two loops, five particles.

When you hear people like me talk about crazier calculations, we’ve usually got a trick up our sleeve. Often we’re looking at a much simpler theory, one that doesn’t describe the real world. For example, I like working with a planar theory, with lots of supersymmetry. Remove even one of those simplifications, and suddenly our life becomes a lot harder. Instead of seven loops and six particles, we get genuinely excited about, well, two loops five particles.

Luckily, two loops five particles is also about as good as the experiments can measure. As the Large Hadron Collider gathers more data, it measures physics to higher and higher precision. Currently for five-particle processes, its precision is just starting to be comparable with two-loop calculations. The result has been a flurry of activity, applying everything from powerful numerical techniques to algebraic geometry to the problem, getting results that genuinely apply to the real world.

“Two loops, five particles” isn’t as cool of a slogan as “space-time is doomed”. It doesn’t get much, or any media attention. But, steadily and quietly, it’s become one of the hottest topics in the amplitudes field.

Doug NatelsonLight emission from metal nanostructures

There are many ways to generate light from an electrically driven metal nanostructure.  

The simplest situation is just what happens in an old-school incandescent light bulb, or the heating element in a toaster.  An applied voltage \(V\) drives a current \(I\) in a wire, and as we learn in freshman physics, power \(IV\) is dissipated in the metal - energy is transferred into the electrons (spreading them out up to higher energy levels within the metal than in the undriven situation, with energy transfer between the electrons due to electron-electron interactions) and the disorganized vibrational jiggling of the atoms (as the electrons also couple to lattice vibrations, the phonons).  The scattering electrons and jiggling ions emit light (even classically, that's what accelerating charges do).  If we look on time scales and distance scales long compared to the various e-e and e-lattice scattering processes, we can describe the vibrations and electron populations as having some local temperature.  Light is just electromagnetic waves.  Light in thermal equilibrium with a system (on average, no net energy transfer between the light and the system) is distributed in a particular way generically called a black body spectrum.  The short version:  currents heat metal structures, and hot structures glow.  My own group found an example of this with very short platinum wires.

In nanostructures, things can get more complicated.  Metal nanostructures can support collective electronic modes called plasmons.  Plasmons can "decay" in different ways, including emitting photons (just like an atom in an excited state can emit a photon and end up in the ground state, if appropriate rules are followed).  It was found more than 40 years ago that a metal/insulator/metal tunnel junction can emit light when driven electrically.  The idea is, a tunneling electron picks up energy \(eV\) when going from one side of the junction to the other.   Some fraction of tunneling electrons deposit that energy into plasmon modes, and some of those plasmon modes decay radiatively, spitting out light with energy \(\hbar \omega \le eV\).

This same thing can happen in scanning tunneling microscopy.  There is a "tip mode" plasmon where the STM tip is above the conducting sample, and this can be excited electrically.  That tip plasmon can decay optically and spit out photons, as discussed in some detail here back in 1990. 

The situation is tricky, though.  When it comes down to atomic-scale tunneling and all the details, there are deep connections between light emission and shot noise.  Light emission is often seen at energies larger than \(eV\), implying that there can be multi-electron processes at work.  In planar tunneling structures, light emission can happen at considerably higher energies, and it really looks there like there is radiation due to the nonequilibrium electronic distribution.  It's a fascinating area - lots of rich physics.