Planet Musings

May 28, 2022

John Baez: Tutorial on Categorical Semantics of Entropy

Here are two talks on the categorical semantics of entropy, given on Wednesday May 11th 2022 at CUNY. First there’s one by me, and then starting around 1:31:00 there’s one by Tai-Danae Bradley:

My talk is called “Shannon entropy from category theory”:

Shannon entropy is a powerful concept. But what properties single out Shannon entropy as special? Instead of focusing on the entropy of a probability measure on a finite set, it can help to focus on the “information loss”, or change in entropy, associated with a measure-preserving function. Shannon entropy then gives the only concept of information loss that is functorial, convex-linear and continuous. This is joint work with Tom Leinster and Tobias Fritz.
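For readers who like to experiment, here is a minimal Python sketch (my own illustration, not something from the talk) of the point of view in the abstract: a measure-preserving map between finite probability spaces can only lose information, and the loss H(p) − H(q) adds up under composition, which is the "functorial" part of the characterization.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy (in nats) of a finite probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                          # use the convention 0 log 0 = 0
    return float(-(p * np.log(p)).sum())

def pushforward(p, f, m):
    """Push the distribution p on {0,...,len(p)-1} forward along f to {0,...,m-1}."""
    q = np.zeros(m)
    for i, pi in enumerate(p):
        q[f(i)] += pi
    return q

p = np.array([0.1, 0.2, 0.3, 0.4])        # a distribution on a 4-element set
f = lambda i: i // 2                      # a map onto a 2-element set
g = lambda j: 0                           # a further map onto a 1-element set

q = pushforward(p, f, 2)
r = pushforward(q, g, 1)

loss_f  = shannon_entropy(p) - shannon_entropy(q)   # information lost by f
loss_g  = shannon_entropy(q) - shannon_entropy(r)   # information lost by g
loss_gf = shannon_entropy(p) - shannon_entropy(r)   # information lost by the composite
print(loss_f, loss_g, loss_gf)            # loss_gf equals loss_f + loss_g
```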

Tai-Danae Bradley’s is called “Operads and entropy”:

This talk will open with a basic introduction to operads and their representations, with the main example being the operad of probabilities. I’ll then give a light sketch of how this framework leads to a small, but interesting, connection between information theory, abstract algebra, and topology, namely a correspondence between Shannon entropy and derivations of the operad of probabilities.
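If it helps to keep one formula in mind while watching: the "Leibniz-like" behaviour of Shannon entropy that (as I understand the abstract) feeds into the derivation story is the classical chain rule for the operadic composite of a distribution p = (p_1,…,p_n) with distributions q^1,…,q^n (first sample i from p, then sample from q^i):

\displaystyle  H(p \circ (q^1,\dots,q^n)) = H(p) + \sum_{i=1}^n p_i \, H(q^i).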

My talk is mainly about this paper:

• John Baez, Tobias Fritz and Tom Leinster, A characterization of entropy in terms of information loss, 2011.

and hers is mainly about this:

• Tai-Danae Bradley, Entropy as a topological operad derivation, 2021.

Here are some related readings:

• Tom Leinster, An operadic introduction to entropy, 2011.

• John Baez and Tobias Fritz, A Bayesian characterization of relative entropy, 2014.

• Tom Leinster, A short characterization of relative entropy, 2017.

• Nicolas Gagné and Prakash Panangaden, A categorical characterization of relative entropy on standard Borel spaces, 2017.

• Tom Leinster, Entropy and Diversity: the Axiomatic Approach, 2020.

• Arthur Parzygnat, A functorial characterization of von Neumann entropy, 2020.

• Arthur Parzygnat, Towards a functorial description of quantum relative entropy, 2021.

May 27, 2022

Terence Tao: Notes on inverse theorem entropy

Let {G} be a finite set of order {N}; in applications {G} will typically be something like a finite abelian group, such as the cyclic group {{\bf Z}/N{\bf Z}}. Let us define a {1}-bounded function to be a function {f: G \rightarrow {\bf C}} such that {|f(n)| \leq 1} for all {n \in G}. There are many seminorms {\| \|} of interest that one places on functions {f: G \rightarrow {\bf C}} and that are bounded by {1} on {1}-bounded functions, such as the Gowers uniformity seminorms {\| \|_k} for {k \geq 1} (which are genuine norms for {k \geq 2}). All seminorms in this post will be implicitly assumed to obey this property.

In additive combinatorics, a significant role is played by inverse theorems, which abstractly take the following form for certain choices of seminorm {\| \|}, some parameters {\eta, \varepsilon>0}, and some class {{\mathcal F}} of {1}-bounded functions:

Theorem 1 (Inverse theorem template) If {f} is a {1}-bounded function with {\|f\| \geq \eta}, then there exists {F \in {\mathcal F}} such that {|\langle f, F \rangle| \geq \varepsilon}, where {\langle,\rangle} denotes the usual inner product

\displaystyle  \langle f, F \rangle := {\bf E}_{n \in G} f(n) \overline{F(n)}.

Informally, one should think of {\eta} as being somewhat small but fixed independently of {N}, {\varepsilon} as being somewhat smaller but depending only on {\eta} (and on the seminorm), and {{\mathcal F}} as representing the “structured functions” for these choices of parameters. There is some flexibility in exactly how to choose the class {{\mathcal F}} of structured functions, but intuitively an inverse theorem should become more powerful when this class is small. Accordingly, let us define the {(\eta,\varepsilon)}-entropy of the seminorm {\| \|} to be the least cardinality of {{\mathcal F}} for which such an inverse theorem holds. Seminorms with low entropy are ones for which inverse theorems can be expected to be a useful tool. This concept arose in some discussions I had with Ben Green many years ago, but never appeared in print, so I decided to record some observations we had on this concept here on this blog.

Lebesgue norms {\| f\|_{L^p} := ({\bf E}_{n \in G} |f(n)|^p)^{1/p}} for {1 < p < \infty} have exponentially large entropy (and so inverse theorems are not expected to be useful in this case):

Proposition 2 ({L^p} norm has exponentially large inverse entropy) Let {1 < p < \infty} and {0 < \eta < 1}. Then the {(\eta,\eta^p/4)}-entropy of {\| \|_{L^p}} is at most {(1+8/\eta^p)^N}. Conversely, for any {\varepsilon>0}, the {(\eta,\varepsilon)}-entropy of {\| \|_{L^p}} is at least {\exp( c \varepsilon^2 N)} for some absolute constant {c>0}.

Proof: If {f} is {1}-bounded with {\|f\|_{L^p} \geq \eta}, then we have

\displaystyle  |\langle f, |f|^{p-2} f \rangle| \geq \eta^p

and hence by the triangle inequality we have

\displaystyle  |\langle f, F \rangle| \geq \eta^p/2

where {F} is either the real or imaginary part of {|f|^{p-2} f}, which takes values in {[-1,1]}. If we let {\tilde F} be {F} rounded to the nearest multiple of {\eta^p/4}, then by the triangle inequality again we have

\displaystyle  |\langle f, \tilde F \rangle| \geq \eta^p/4.

There are only at most {1+8/\eta^p} possible values for each value {\tilde F(n)} of {\tilde F}, and hence at most {(1+8/\eta^p)^N} possible choices for {\tilde F}. This gives the first claim.
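Here is a quick numerical sanity check of this discretisation step (my own Python sketch with arbitrary test parameters, not part of the original argument): rounding the dual function to multiples of {\eta^p/4} changes each correlation by at most {\eta^p/8}, so the correlation survives, while each coordinate of the rounded function takes only about {1+8/\eta^p} values.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, eta = 1000, 3.0, 0.5
inner = lambda f, F: np.mean(f * np.conj(F))     # <f,F> = E_n f(n) conj(F(n))

# a random 1-bounded function; for these parameters its L^p norm comfortably exceeds eta
f = rng.uniform(-1, 1, N) + 1j * rng.uniform(-1, 1, N)
f /= np.maximum(np.abs(f), 1.0)                  # enforce |f(n)| <= 1
print("L^p norm:", np.mean(np.abs(f) ** p) ** (1 / p))

g = np.abs(f) ** (p - 2) * f                     # the dual function |f|^{p-2} f
F = max([g.real, g.imag], key=lambda G: abs(inner(f, G)))   # real or imaginary part, in [-1,1]

step = eta ** p / 4
F_tilde = step * np.round(F / step)              # round to the nearest multiple of eta^p/4

print("values per coordinate:", len(np.arange(-1, 1 + step, step)))   # about 1 + 8/eta^p
print(abs(inner(f, F)), abs(inner(f, F_tilde)), eta ** p / 4)
# the last correlation should still be at least eta^p/4
```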

Now suppose that there is an {(\eta,\varepsilon)}-inverse theorem for some {{\mathcal F}} of cardinality {M}. If we let {f} be a random sign function (so the {f(n)} are independent random variables taking values in {-1,+1} with equal probability), then there is a random {F \in {\mathcal F}} such that

\displaystyle  |\langle f, F \rangle| \geq \varepsilon

and hence by the pigeonhole principle there is a deterministic {F \in {\mathcal F}} such that

\displaystyle  {\bf P}( |\langle f, F \rangle| \geq \varepsilon ) \geq 1/M.

On the other hand, from the Hoeffding inequality one has

\displaystyle  {\bf P}( |\langle f, F \rangle| \geq \varepsilon ) \ll \exp( - c \varepsilon^2 N )

for some absolute constant {c}, hence

\displaystyle  M \geq \exp( c \varepsilon^2 N )

as claimed. \Box
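A small Monte Carlo illustration of the second half of the argument (again my own sketch, not from the proof): for any fixed {1}-bounded real {F}, the correlation of a random sign function with {F} concentrates at scale {N^{-1/2}}, so the probability of exceeding a fixed {\varepsilon} decays exponentially in {N}, as the Hoeffding bound predicts.

```python
import numpy as np

rng = np.random.default_rng(1)
eps, trials = 0.1, 5000

for N in (100, 400, 1600):
    F = rng.uniform(-1, 1, N)                        # a fixed 1-bounded real function
    f = rng.choice([-1.0, 1.0], size=(trials, N))    # independent random sign functions
    corr = np.abs(f @ F) / N                         # |<f, F>| for each trial
    empirical = np.mean(corr >= eps)
    hoeffding = 2 * np.exp(-N * eps ** 2 / 2)        # Hoeffding upper bound for this setup
    print(N, empirical, hoeffding)
# the exceedance probability decays exponentially in N, which is why any inverse
# family must contain roughly exp(c eps^2 N) functions to handle all sign patterns
```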

Most seminorms of interest in additive combinatorics, such as the Gowers uniformity norms, are bounded by some finite {L^p} norm thanks to Hölder’s inequality, so from the above proposition and the obvious monotonicity properties of entropy, we conclude that all Gowers norms on finite abelian groups {G} have at most exponential inverse theorem entropy. But we can do significantly better than this:

  • For the {U^1} seminorm {\|f\|_{U^1(G)} := |{\bf E}_{n \in G} f(n)|}, one can simply take {{\mathcal F} = \{1\}} to consist of the constant function {1}, and the {(\eta,\eta)}-entropy is clearly equal to {1} for any {0 < \eta < 1}.
  • For the {U^2} norm, the standard Fourier-analytic inverse theorem asserts that if {\|f\|_{U^2(G)} \geq \eta} then {|\langle f, e(\xi \cdot) \rangle| \geq \eta^2} for some Fourier character {\xi \in \hat G}. Thus the {(\eta,\eta^2)}-entropy is at most {N}. (A short numerical sketch of this statement appears just after this list.)
  • For the {U^k({\bf Z}/N{\bf Z})} norm on cyclic groups for {k > 2}, the inverse theorem proved by Green, Ziegler, and myself gives an {(\eta,\varepsilon)}-inverse theorem for some {\varepsilon \gg_{k,\eta} 1} and {{\mathcal F}} consisting of nilsequences {n \mapsto F(g(n) \Gamma)} for some filtered nilmanifold {G/\Gamma} of degree {k-1} in a finite collection of cardinality {O_{\eta,k}(1)}, some polynomial sequence {g: {\bf Z} \rightarrow G} (which, as subsequently observed by Candela and Sisask (see also Manners), one can choose to be {N}-periodic), and some Lipschitz function {F: G/\Gamma \rightarrow {\bf C}} of Lipschitz norm {O_{\eta,k}(1)}. By the Arzela-Ascoli theorem, the number of possible {F} (up to uniform errors of size at most {\varepsilon/2}, say) is {O_{\eta,k}(1)}. By standard arguments one can also ensure that the coefficients of the polynomial {g} are {O_{\eta,k}(1)}, and then by periodicity there are only {O(N^{O_{\eta,k}(1)})} such polynomials. As a consequence, the {(\eta,\varepsilon)}-entropy is of polynomial size {O_{\eta,k}( N^{O_{\eta,k}(1)} )} (a fact that seems to have first been implicitly observed in Lemma 6.2 of this paper of Frantzikinakis; thanks to Ben Green for this reference). One can obtain more precise dependence on {\eta,k} using the quantitative version of this inverse theorem due to Manners; back of the envelope calculations using Section 5 of that paper suggest to me that one can take {\varepsilon = \eta^{O_k(1)}} to be polynomial in {\eta} and the entropy to be of the order {O_k( N^{\exp(\exp(\eta^{-O_k(1)}))} )}, or alternatively one can reduce the entropy to {O_k( \exp(\exp(\eta^{-O_k(1)})) N^{\eta^{-O_k(1)}})} at the cost of degrading {\varepsilon} to {1/\exp\exp( O(\eta^{-O(1)}))}.
  • If one replaces the cyclic group {{\bf Z}/N{\bf Z}} by a vector space {{\bf F}_p^n} over some fixed finite field {{\bf F}_p} of prime order (so that {N=p^n}), then the inverse theorem of Ziegler and myself (available in both high and low characteristic) allows one to obtain an {(\eta,\varepsilon)}-inverse theorem for some {\varepsilon \gg_{k,\eta} 1} and {{\mathcal F}} the collection of non-classical degree {k-1} polynomial phases from {{\bf F}_p^n} to {S^1}, which one can normalize to equal {1} at the origin, and then by the classification of such polynomials one can calculate that the {(\eta,\varepsilon)} entropy is of quasipolynomial size {\exp( O_{p,k}(n^{k-1}) ) = \exp( O_{p,k}( \log^{k-1} N ) )} in {N}. By using the recent work of Gowers and Milicevic, one can make the dependence on {p,k} here more precise, but we will not perform these calculations here.
  • For the {U^3(G)} norm on an arbitrary finite abelian group, the recent inverse theorem of Jamneshan and myself gives (after some calculations) a bound of the polynomial form {O( q^{O(n^2)} N^{\exp(\eta^{-O(1)})})} on the {(\eta,\varepsilon)}-entropy for some {\varepsilon \gg \eta^{O(1)}}, which one can improve slightly to {O( q^{O(n^2)} N^{\eta^{-O(1)}})} if one degrades {\varepsilon} to {1/\exp(\eta^{-O(1)})}, where {q} is the maximal order of an element of {G}, and {n} is the rank (the number of elements needed to generate {G}). This bound is polynomial in {N} in the cyclic group case and quasipolynomial in general.
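As promised in the {U^2} item above, here is a short numerical sketch (mine, with a synthetic test function) of the Fourier-analytic inverse theorem on {{\bf Z}/N{\bf Z}}: the identity {\|f\|_{U^2}^4 = \sum_\xi |\hat f(\xi)|^4}, together with {\sum_\xi |\hat f(\xi)|^2 \leq 1}, forces some character to correlate with {f} at level at least {\|f\|_{U^2}^2}.

```python
import numpy as np

rng = np.random.default_rng(2)
N, xi0 = 512, 37

# test function: a linear phase plus bounded noise, clipped so it stays 1-bounded
n = np.arange(N)
f = np.exp(2j * np.pi * xi0 * n / N) + 0.7 * (rng.uniform(-1, 1, N) + 1j * rng.uniform(-1, 1, N))
f /= np.maximum(np.abs(f), 1.0)

fhat = np.fft.fft(f) / N                 # fhat(xi) = E_n f(n) e(-xi n / N)
U2 = (np.abs(fhat) ** 4).sum() ** 0.25   # Fourier identity for the U^2 norm
xi = int(np.argmax(np.abs(fhat)))        # the correlating character

print("U2 norm:", U2)
print("best character:", xi, "correlation:", np.abs(fhat[xi]), ">= U2^2 =", U2 ** 2)
```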

For general finite abelian groups {G}, we do not yet have an inverse theorem of comparable power to the ones mentioned above that give polynomial or quasipolynomial upper bounds on the entropy. However, there is a cheap argument that at least gives some subexponential bounds:

Proposition 3 (Cheap subexponential bound) Let {k \geq 2} and {0 < \eta < 1/2}, and suppose that {G} is a finite abelian group of order {N \geq \eta^{-C_k}} for some sufficiently large {C_k}. Then the {(\eta,c_k \eta^{O_k(1)})}-entropy of {\| \|_{U^k(G)}} is at most {O( \exp( \eta^{-O_k(1)} N^{1 - \frac{k+1}{2^k-1}} ))}.

Proof: (Sketch) We use a standard random sampling argument, of the type used for instance by Croot-Sisask or Briet-Gopi (thanks to Ben Green for this latter reference). We can assume that {N \geq \eta^{-C_k}} for some sufficiently large {C_k>0}, since otherwise the claim follows from Proposition 2.

Let {A} be a random subset of {G} with the events {n \in A} being iid with probability {0 < p < 1} to be chosen later, conditioned to the event {|A| \leq 2pN}. Let {f} be a {1}-bounded function. By a standard second moment calculation, we see that with probability at least {1/2}, we have

\displaystyle  \|f\|_{U^k(G)}^{2^k} = {\bf E}_{n, h_1,\dots,h_k \in G} f(n) \prod_{\omega \in \{0,1\}^k \backslash \{0\}} {\mathcal C}^{|\omega|} \frac{1}{p} 1_A f(n + \omega \cdot h)

\displaystyle + O((\frac{1}{N^{k+1} p^{2^k-1}})^{1/2}).

Thus, by the triangle inequality, if we choose {p := C \eta^{-2^{k+1}/(2^k-1)} / N^{\frac{k+1}{2^k-1}}} for some sufficiently large {C = C_k > 0}, then for any {1}-bounded {f} with {\|f\|_{U^k(G)} \geq \eta/2}, one has with probability at least {1/2} that

\displaystyle  |{\bf E}_{n, h_1,\dots,h_k \in G} f(n) \prod_{\omega \in \{0,1\}^k \backslash \{0\}} {\mathcal C}^{|\omega|} \frac{1}{p} 1_A f(n + \omega \cdot h)|

\displaystyle \geq \eta^{2^k}/2^{2^k+1}.

We can write the left-hand side as {|\langle f, F \rangle|} where {F} is the randomly sampled dual function

\displaystyle  F(n) := {\bf E}_{h_1,\dots,h_k \in G} \prod_{\omega \in \{0,1\}^k \backslash \{0\}} {\mathcal C}^{|\omega|+1} \frac{1}{p} 1_A f(n + \omega \cdot h).

Unfortunately, {F} is not {1}-bounded in general, but we have

\displaystyle  \|F\|_{L^2(G)}^2 \leq {\bf E}_{n, h_1,\dots,h_k ,h'_1,\dots,h'_k \in G}

\displaystyle  \prod_{\omega \in \{0,1\}^k \backslash \{0\}} \frac{1}{p} 1_A(n + \omega \cdot h) \frac{1}{p} 1_A(n + \omega \cdot h')

and the right-hand side can be shown to be {1+o(1)} on the average, so we can condition on the event that the right-hand side is {O(1)} without significant loss in failure probability.

If we then let {\tilde f_A} be {1_A f} rounded to the nearest Gaussian integer multiple of {\eta^{2^k}/2^{2^{10k}}} in the unit disk, one has from the triangle inequality that

\displaystyle  |\langle f, \tilde F \rangle| \geq \eta^{2^k}/2^{2^k+2}

where {\tilde F} is the discretised randomly sampled dual function

\displaystyle  \tilde F(n) := {\bf E}_{h_1,\dots,h_k \in G} \prod_{\omega \in \{0,1\}^k \backslash \{0\}} {\mathcal C}^{|\omega|+1} \frac{1}{p} \tilde f_A(n + \omega \cdot h).

For any given {A}, there are at most {2pN} places {n} where {\tilde f_A(n)} can be non-zero, and in those places there are {O_k( \eta^{-2^{k}})} possible values for {\tilde f_A(n)}. Thus, if we let {{\mathcal F}_A} be the collection of all possible {\tilde f_A} associated to a given {A}, the cardinality of this set is {O( \exp( \eta^{-O_k(1)} N^{1 - \frac{k+1}{2^k-1}} ) )}, and for any {f} with {\|f\|_{U^k(G)} \geq \eta/2}, we have

\displaystyle  \sup_{\tilde F \in {\mathcal F}_A} |\langle f, \tilde F \rangle| \geq \eta^{2^k}/2^{k+2}

with probability at least {1/2}.

Now we remove the failure probability by independent resampling. By rounding to the nearest Gaussian integer multiple of {c_k \eta^{2^k}} in the unit disk for a sufficiently small {c_k>0}, one can find a family {{\mathcal G}} of cardinality {O( \eta^{-O_k(N)})} consisting of {1}-bounded functions {\tilde f} of {U^k(G)} norm at least {\eta/2} such that for every {1}-bounded {f} with {\|f\|_{U^k(G)} \geq \eta} there exists {\tilde f \in {\mathcal G}} such that

\displaystyle  \|f-\tilde f\|_{L^\infty(G)} \leq \eta^{2^k}/2^{k+3}.

Now, let {A_1,\dots,A_M} be independent samples of {A} for some {M} to be chosen later. By the preceding discussion, we see that with probability at least {1 - 2^{-M}}, we have

\displaystyle  \sup_{\tilde F \in \bigcup_{j=1}^M {\mathcal F}_{A_j}} |\langle \tilde f, \tilde F \rangle| \geq \eta^{2^k}/2^{k+2}

for any given {\tilde f \in {\mathcal G}}, so by the union bound, if we choose {M = \lfloor C N \log \frac{1}{\eta} \rfloor} for a large enough {C = C_k}, we can find {A_1,\dots,A_M} such that

\displaystyle  \sup_{\tilde F \in \bigcup_{j=1}^M {\mathcal F}_{A_j}} |\langle \tilde f, \tilde F \rangle| \geq \eta^{2^k}/2^{k+2}

for all {\tilde f \in {\mathcal G}}, and hence by the triangle inequality

\displaystyle  \sup_{\tilde F \in \bigcup_{j=1}^M {\mathcal F}_{A_j}} |\langle f, \tilde F \rangle| \geq \eta^{2^k}/2^{k+3}.

Taking {{\mathcal F}} to be the union of the {{\mathcal F}_{A_j}} (applying some truncation and rescaling to these {L^2}-bounded functions to make them {L^\infty}-bounded, and then {1}-bounded), we obtain the claim. \Box
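To get a feel for the random sampling step in the simplest case {k=2}, here is a toy Python sketch (my own, with arbitrary parameters): replacing each factor {f(n+\omega \cdot h)} with {\omega \neq 0} by {\frac{1}{p} 1_A f(n+\omega \cdot h)} for a random set {A} of density {p} gives a good approximation to {\|f\|_{U^2}^4}, even though {A} only has about {pN} elements.

```python
import numpy as np

rng = np.random.default_rng(3)
N, p = 64, 0.3

f = np.exp(2j * np.pi * rng.integers(0, N) * np.arange(N) / N)   # a 1-bounded test function
A = rng.random(N) < p                                            # random set of density about p
g = (A / p) * f                                                  # p^{-1} 1_A f

def gowers_u2_fourth(f0, f1, f2, f3):
    """E_{n,h1,h2} f0(n) conj(f1(n+h1)) conj(f2(n+h2)) f3(n+h1+h2) on Z/NZ."""
    total = 0.0
    for h1 in range(N):
        for h2 in range(N):
            total += np.mean(f0 * np.conj(np.roll(f1, -h1)) * np.conj(np.roll(f2, -h2))
                             * np.roll(f3, -(h1 + h2)))
    return total / N ** 2

exact   = gowers_u2_fourth(f, f, f, f)          # ||f||_{U^2}^4
sampled = gowers_u2_fourth(f, g, g, g)          # the randomly sampled version
print(abs(exact), abs(sampled))                 # the two agree up to an error that shrinks as N grows
```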

One way to obtain lower bounds on the inverse theorem entropy is to produce a collection of almost orthogonal functions with large norm. More precisely:

Proposition 4 Let {\| \|} be a seminorm, let {0 < \varepsilon \leq \eta < 1}, and suppose that one has a collection {f_1,\dots,f_M} of {1}-bounded functions with {\|f_i\| \geq \eta} for all {i=1,\dots,M}, such that for each {i} one has {|\langle f_i, f_j \rangle| \leq \varepsilon^2/2} for all but at most {L} choices of {j \in \{1,\dots,M\}}. Then the {(\eta, \varepsilon)}-entropy of {\| \|} is at least {\varepsilon^2 M / 2L}.

Proof: Suppose we have an {(\eta,\varepsilon)}-inverse theorem with some family {{\mathcal F}}. Then for each {i=1,\dots,M} there is {F_i \in {\mathcal F}} such that {|\langle f_i, F_i \rangle| \geq \varepsilon}. By the pigeonhole principle, there is thus {F \in {\mathcal F}} such that {|\langle f_i, F \rangle| \geq \varepsilon} for all {i} in a subset {I} of {\{1,\dots,M\}} of cardinality at least {M/|{\mathcal F}|}:

\displaystyle  |I| \geq M / |{\mathcal F}|.

We can sum this to obtain

\displaystyle  |\sum_{i \in I} c_i \langle f_i, F \rangle| \geq |I| \varepsilon

for some complex numbers {c_i} of unit magnitude. By Cauchy-Schwarz, this implies

\displaystyle  \| \sum_{i \in I} c_i f_i \|_{L^2(G)}^2 \geq |I|^2 \varepsilon^2

and hence by the triangle inequality

\displaystyle  \sum_{i,j \in I} |\langle f_i, f_j \rangle| \geq |I|^2 \varepsilon^2.

On the other hand, by hypothesis we can bound the left-hand side by {|I| (L + \varepsilon^2 |I|/2)}. Rearranging, we conclude that

\displaystyle  |I| \leq 2 L / \varepsilon^2

and hence

\displaystyle  |{\mathcal F}| \geq \varepsilon^2 M / 2L

giving the claim. \Box

Thus for instance:

  • For the {U^2(G)} norm, one can take {f_1,\dots,f_M} to be the family of linear exponential phases {n \mapsto e(\xi \cdot n)} with {M = N} and {L=1}, and obtain a linear lower bound of {\varepsilon^2 N/2} for the {(\eta,\varepsilon)}-entropy, thus matching the upper bound of {N} up to constants when {\varepsilon} is fixed.
  • For the {U^k({\bf Z}/N{\bf Z})} norm, a similar calculation using polynomial phases of degree {k-1}, combined with the Weyl sum estimates, gives a lower bound of {\gg_{k,\varepsilon} N^{k-1}} for the {(\eta,\varepsilon)}-entropy for any fixed {\eta,\varepsilon}; by considering nilsequences as well, together with nilsequence equidistribution theory, one can replace the exponent {k-1} here by some quantity that goes to infinity as {\eta \rightarrow 0}, though I have not attempted to calculate the exact rate.
  • For the {U^k({\bf F}_p^n)} norm, another similar calculation using polynomial phases of degree {k-1} should give a lower bound of {\gg_{p,k,\eta,\varepsilon} \exp( c_{p,k,\eta,\varepsilon} n^{k-1} )} for the {(\eta,\varepsilon)}-entropy, though I have not fully performed the calculation.

We close with one final example. Suppose {G} is a product {G = A \times B} of two sets {A,B} of cardinality {\asymp \sqrt{N}}, and we consider the Gowers box norm

\displaystyle  \|f\|_{\Box^2(G)}^4 := {\bf E}_{a,a' \in A; b,b' \in B} f(a,b) \overline{f}(a,b') \overline{f}(a',b) f(a',b').

One possible choice of class {{\mathcal F}} here is the collection of indicators {1_{U \times V}} of “rectangles” {U \times V} with {U \subset A}, {V \subset B} (cf. this previous blog post on cut norms). By standard calculations, one can use this class to show that the {(\eta, \eta^4/10)}-entropy of {\| \|_{\Box^2(G)}} is {O( \exp( O(\sqrt{N}) ) )}, and a variant of the proof of the second part of Proposition 2 shows that this is the correct order of growth in {N}. In contrast, a modification of Proposition 3 only gives an upper bound of the form {O( \exp( O( N^{2/3} ) ) )} (the bottleneck is ensuring that the randomly sampled dual functions stay bounded in {L^2}), which shows that while this cheap bound is not optimal, it can still broadly give the correct “type” of bound (specifically, intermediate growth between polynomial and exponential).
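As a toy illustration of this last example (my sketch, with a planted rectangle as hypothetical test data): writing {f} as an {|A| \times |B|} matrix {M}, the quantity {\|f\|_{\Box^2(G)}^4} equals the squared Frobenius norm of {M M^*} divided by {|A|^2 |B|^2}, which makes the box norm and the rectangle correlation easy to compute numerically.

```python
import numpy as np

rng = np.random.default_rng(4)
nA, nB = 40, 40

R = ((rng.random(nA) < 0.5)[:, None] * (rng.random(nB) < 0.5)[None, :]).astype(float)  # rectangle U0 x V0
M = R + 0.3 * rng.uniform(-1, 1, (nA, nB))       # planted rectangle plus bounded noise
M = M / np.maximum(np.abs(M), 1.0)               # clip so that f stays 1-bounded

box4 = np.linalg.norm(M @ M.T, 'fro') ** 2 / (nA ** 2 * nB ** 2)   # ||f||_box^4
corr = abs(np.mean(M * R))                                         # |<f, 1_{U0 x V0}>|

print("box norm   :", box4 ** 0.25)
print("correlation:", corr, " (compare with eta^4/10 =", box4 / 10, ")")
```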

Doug Natelson: Science Communications Symposium

 I will be posting more about science very soon, but today I'm participating in a science communications symposium here in the Wiess School of Natural Sciences at Rice.  It's a lot of fun and it's great to hear from some amazing colleagues who do impressive work.   For example, Lesa Tran Lu and her work on the chemistry of cooking, Julian West and his compelling scientific story-telling, Scott Solomon and his writing about evolution, and Kirsten Siebach and her work on Mars rovers and geology.

(On a side note, I've now been blogging for almost 17 years - that makes me almost 119 blog-years old.)


UPDATE:  Here is a link to a video of the whole symposium.


Matt von Hippel: Trapped in the (S) Matrix

I’ve tried to convince you that you are a particle detector. You choose your experiment, what actions you take, and then observe the outcome. If you focus on that view of yourself, data out and data in, you start to wonder if the world outside really has any meaning. Maybe you’re just trapped in the Matrix.

From a physics perspective, you actually are trapped in a sort of a Matrix. We call it the S Matrix.

“S” stands for scattering. The S Matrix is a formula we use, a mathematical tool that tells us what happens when fundamental particles scatter: when they fly towards each other, colliding or bouncing off. For each action we could take, the S Matrix gives the probability of each outcome: for each pair of particles we collide, the chance we detect different particles at the end. You can imagine putting every possible action in a giant vector, and every possible observation in another giant vector. Arrange the probabilities for each action-observation pair in a big square grid, and that’s a matrix.

Actually, I lied a little bit. This is particle physics, and particle physics uses quantum mechanics. Because of that, the entries of the S Matrix aren’t probabilities: they’re complex numbers called probability amplitudes. You have to multiply them by their complex conjugate to get probability out.

Ok, that probably seemed like a lot of detail. Why am I telling you all this?

What happens when you multiply the whole S Matrix by its complex conjugate? (Using matrix multiplication, naturally.) You can still pick your action, but now you’re adding up every possible outcome. You’re asking “suppose I take an action. What’s the chance that anything happens at all?”

The answer to that question is 1. There is a 100% chance that something happens, no matter what you do. That’s just how probability works.

We call this property unitarity, the property of giving “unity”, or one. And while it may seem obvious, it isn’t always so easy. That’s because we don’t actually know the S Matrix formula most of the time. We have to approximate it, a partial formula that only works for some situations. And unitarity can tell us how much we can trust that formula.
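Here is a tiny numerical illustration of that statement (my own sketch, using numpy and a randomly generated toy S matrix rather than anything from a real theory): unitarity says exactly that, whatever initial state you pick, the outcome probabilities add up to one.

```python
import numpy as np

rng = np.random.default_rng(5)
dim = 4                                          # a toy 4-state system

# build a random unitary "S matrix" from the QR decomposition of a complex Gaussian matrix
Z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
S, _ = np.linalg.qr(Z)

probs = np.abs(S) ** 2                           # |amplitude|^2 = probability
print(probs.sum(axis=0))                         # each column sums to 1: something always happens
print(np.allclose(S.conj().T @ S, np.eye(dim)))  # the matrix form of unitarity, S^dagger S = 1
```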

Imagine doing an experiment trying to detect neutrinos, like the IceCube Neutrino Observatory. For you to detect the neutrinos, they must scatter off of electrons, kicking them off of their atoms or transforming them into another charged particle. You can then notice what happens as the energy of the neutrinos increases. If you do that, you’ll notice the probability also start to increase: it gets more and more likely that the neutrino can scatter an electron. You might propose a formula for this, one that grows with energy. [EDIT: Example changed after a commenter pointed out an issue with it.]

If you keep increasing the energy, though, you run into a problem. Those probabilities you predict are going to keep increasing. Eventually, you’ll predict a probability greater than one.

That tells you that your theory might have been fine before, but doesn’t work for every situation. There’s something you don’t know about, which will change your formula when the energy gets high. You’ve violated unitarity, and you need to fix your theory.

In this case, the fix is already known. Neutrinos and electrons interact due to another particle, called the W boson. If you include that particle, then you fix the problem: your probabilities stop going up and up, instead, they start slowing down, and stay below one.

For other theories, we don’t yet know the fix. Try to write down an S Matrix for colliding gravitational waves (or really, gravitons), and you meet the same kind of problem, a probability that just keeps growing. Currently, we don’t know how that problem should be solved: string theory is one answer, but may not be the only one.

So even if you’re trapped in an S Matrix, sending data out and data in, you can still use logic. You can still demand that probability makes sense, that your matrix never gives a chance greater than 100%. And you can learn something about physics when you do!

Terence Tao: 2022 ICM satellite coordination group

[Note: while I happen to be the chair of the ICM Structure Committee, I am authoring this blog post as an individual, and not as a representative of that committee or of the IMU, as they do not have jurisdiction over satellite conferences. -T.]

The International Mathematical Union (IMU) has just released some updates on the status of the 2022 International Congress of Mathematicians (ICM), which was discussed in this previous post:

  • The General Assembly will take place in person in Helsinki, Finland, on July 3-4.
  • The IMU award ceremony will be held in the same location on July 5.
  • The ICM will take place virtually (with free participation) during the hours 9:00-18:00 CEST of July 6-14, with talks either live or pre-recorded according to speaker preference.

Due to the limited time and resources available, the core ICM program will be kept to the bare essentials; the lectures will be streamed but without opportunity for questions or other audience feedback. However, the IMU encourages grassroots efforts to supplement the core program with additional satellite activities, both “traditional” and “non-traditional”. Examples of such satellite activities include:

A more updated list of these events can be found here.

I will also mention the second Azat Miftakhov Days, which are unaffiliated with the ICM but held concurrently with the beginning of the congress (and the prize ceremony).

Strictly speaking, satellite events are not officially part of the Congress, and not directly subject to IMU oversight; they also receive no funding or support from the IMU, other than sharing of basic logistical information, and recognition of the satellite conferences on the ICM web site. Thus this (very exceptional and sui generis) congress will differ in format from previous congresses, in that many of the features of the congress that traditionally were managed by the local organizing committee will be outsourced this one time to the broader mathematical community in a highly decentralized fashion.

In order to coordinate the various grassroots efforts to establish such satellite events, Alexei Borodin, Martin Hairer, and myself have set up a satellite coordination group to share information and advice on these events. (It should be noted that while Alexei, Martin and myself serve on either the structure committee or the program committee of the ICM, we are acting here as individuals rather than as official representatives of the IMU.) Anyone who is interested in organizing, hosting, or supporting such an event is welcome to ask to join the group (though I should say that most of the discussion concerns boring logistical issues). Readers are also welcome to discuss broader issues concerning satellites, or the congress as a whole, in the comments to this post. I will also use this space to announce details of satellite events as they become available (most are currently still only in the early stages of planning).

May 26, 2022

Tommaso Dorigo: Giorgio The Unstoppable

I was happy to meet Giorgio Bellettini at the Pisa Meeting on Advanced Detectors this week, and I thought I would write here a note about him. At 89 years of age Giorgio still has all his wits about him, and he is still as compelling and unstoppable as anybody who has met him will recall. It is a real pleasure to see that he still attends all sessions, always curious to hear the latest developments in detector design and performance.


May 25, 2022

Peter Rohde: Three Capes Track, Tasmania

Peter Rohde: South Coast Track, Tasmania

May 24, 2022

n-Category Café: Holomorphic Gerbes (Part 1)

I have some guesses about holomorphic gerbes. But I don’t know much about them; what I know is a small fraction of what’s in here:

I recently blogged about the classification of holomorphic line bundles. Since gerbes are a lot like line bundles, it’s easy to guess some analogous results for holomorphic gerbes. I did that, and then looked around to see what people have already done. And it looks like I’m on the right track, though I still have lots of questions.

By ‘line bundle’ I’ll always mean a complex line bundle. These have an elegant definition, but we can describe any smooth line bundle on a smooth manifold X by choosing an open cover U_i of X and smooth functions

g_{i j} \colon U_i \cap U_j \to \mathbb{C}^\ast

that are 1-cocycles, meaning

g_{i j} g_{j k} = g_{i k}

on all the intersections U_i \cap U_j \cap U_k. Here \mathbb{C}^\ast is the nonzero complex numbers, made into a Lie group under multiplication.

A ‘gerbe’ is the next thing up the categorical ladder. Again these have an elegant definition, but we can describe any smooth gerbe by choosing an open cover U_i of our manifold and smooth functions

h_{i j k} \colon U_i \cap U_j \cap U_k \to \mathbb{C}^\ast

that are 2-cocycles, meaning

h_{i j k} h_{i k \ell} = h_{i j \ell} h_{j k \ell}

These cocycles live in Čech cohomology. Čech cohomology lets us eliminate the dependence on the choice of open cover, and it also gives a way to define isomorphisms between line bundles, and ‘equivalences’ between gerbes. For example, two gerbes are equivalent if their cocycles differ by a coboundary.
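To make ‘differ by a coboundary’ concrete, here are the standard Čech formulas as I would write them (modulo the usual sign and inversion conventions): two line-bundle cocycles g, g′ define isomorphic bundles, and two gerbe cocycles h, h′ define equivalent gerbes, precisely when there exist functions \lambda_i \colon U_i \to \mathbb{C}^\ast, respectively \lambda_{i j} \colon U_i \cap U_j \to \mathbb{C}^\ast, with

g'_{i j} = \lambda_i \, g_{i j} \, \lambda_j^{-1}

h'_{i j k} = h_{i j k} \, \lambda_{i j} \, \lambda_{j k} \, \lambda_{i k}^{-1}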

Using some facts about Čech cohomology, we can show that line bundles on a topological space X are classified by elements of its second cohomology group H^2(X,\mathbb{Z}). Similarly, gerbes are classified by elements of H^3(X,\mathbb{Z}).

Clearly there’s a lot more to be said. But I won’t go into more detail here since I’m mainly concerned about what happens when X is a complex manifold. Then we can talk about holomorphic line bundles and holomorphic gerbes. To do this, we simply require that the functions g_{i j} and h_{i j k} are holomorphic. The notions of isomorphism and equivalence are similarly adjusted.

Holomorphic line bundles

A smooth line bundle can often be made into a holomorphic line bundle in different ways. So for a line bundle to be holomorphic is not just an extra property, it’s an extra structure.

But we can factor this structure into its property part — can we put a holomorphic structure on this line bundle? — and its ‘pure structure’ part — if we can, what’s the set of ways we can do it? And perhaps surprisingly, if the answer to the first question is yes, the answer to the second question is always the same!

This is surprising if you’re used to structures like groups. The question can we put a group structure on this set? has the answer yes whenever the set is nonempty. But knowing just that doesn’t tell us much about how many ways we can make the set into a group!

But consider orientations of connected manifolds. If the question can we put an orientation on this connected manifold? has the answer yes, we say that manifold is orientable. And if that’s the case, there are always just two orientations!

Something like this, but a lot more interesting, happens for putting holomorphic structures on smooth line bundles. There’s a short exact sequence

0 \to \mathrm{Jac}(X) \to \mathrm{Pic}(X) \to \mathrm{NS}(X) \to 0

where:

  • The Picard group \mathrm{Pic}(X) is the set of isomorphism classes of holomorphic line bundles on X. It’s an abelian group, since we can tensor holomorphic line bundles and get new ones.

  • The Néron–Severi group \mathrm{NS}(X) is the property part of \mathrm{Pic}(X). That is, it’s the group of isomorphism classes of smooth line bundles with the property that they can be given a holomorphic structure.

  • The Jacobian \mathrm{Jac}(X) is the pure structure part of \mathrm{Pic}(X). That is, it’s the group of holomorphic structures on the trivial smooth line bundle.

The exact sequence above says a couple of things. Two elements of \mathrm{Pic}(X) map to the same element of \mathrm{NS}(X) iff they come from holomorphic line bundles whose underlying smooth line bundles are isomorphic. And if a smooth line bundle has any holomorphic structures at all, it has \mathrm{Jac}(X) different ones. More precisely, its set of holomorphic structures is a torsor for \mathrm{Jac}(X).

A torsor is a ‘group that has forgotten its identity’. More precisely, it’s a nonempty set on which a given group acts freely and transitively. Whenever we have a short exact sequence of groups

0 \to N \stackrel{i}{\longrightarrow} G \stackrel{p}{\longrightarrow} H \to 0

and h \in H is in the image of p, its preimage p^{-1}(h) is a torsor for N. Amusingly, even when h is not in the image of p, N acts freely and transitively on p^{-1}(h). That’s because any group acts freely and transitively on the empty set! But the empty set is not considered to be a torsor.

I hope you see how this fits into the idea of taking a structure and factoring it into a property and ‘pure structure’. We’re wondering how to equip h \in H with the structure of being p(g) for some g \in G. But we can break this down into two questions: does h have the property that it’s in the image of p? and then if it does, what’s the set of things that maps to it? If the answer to the first question is yes, the answer to the second question is always “an N-torsor”. And N-torsors are all isomorphic!

Another cool thing about the exact sequence

0 \to \mathrm{Jac}(X) \to \mathrm{Pic}(X) \to \mathrm{NS}(X) \to 0

is that you can read it off just from knowing that \mathrm{Pic}(X) is a complex Lie group. Its connected component containing the identity is \mathrm{Jac}(X). And its set of connected components is \mathrm{NS}(X). For a tiny bit more about this, see

And another cool thing is that we can figure out the Jacobian \mathrm{Jac}(X) pretty easily just knowing the topology of X. It’s a torus, and this torus is the real cohomology H^1(X,\mathbb{R}) modulo the image of the integral cohomology H^1(X, \mathbb{Z}). For short:

\mathrm{Jac}(X) \cong \frac{H^1(X,\mathbb{R})}{H^1(X, \mathbb{Z})}

This description does not immediately reveal the fact that \mathrm{Jac}(X) is a complex torus; for that, see my blog article above.

The Néron–Severi group is also connected to cohomology: since H^2(X,\mathbb{Z}) classifies smooth line bundles on X, while \mathrm{NS}(X) classifies smooth line bundles with the property that they come from holomorphic line bundles, we have

\mathrm{NS}(X) \subseteq H^2(X,\mathbb{Z})

But this is not a full description of the Néron–Severi group! To go further it seems we need to focus on some special cases, like complex tori, where the Appell–Humbert theorem comes to our rescue. If someone knows more general tricks for computing the Néron–Severi group, please let me know.
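For what it’s worth, the organizing tool behind all three groups (as I understand the standard story) is the exponential sheaf sequence

0 \to \mathbb{Z} \to \mathcal{O}_X \stackrel{\exp(2 \pi i \cdot)}{\longrightarrow} \mathcal{O}_X^\ast \to 1

whose long exact sequence in cohomology reads in part

H^1(X,\mathbb{Z}) \to H^1(X,\mathcal{O}_X) \to H^1(X,\mathcal{O}_X^\ast) \to H^2(X,\mathbb{Z}) \to H^2(X,\mathcal{O}_X)

Since \mathrm{Pic}(X) = H^1(X,\mathcal{O}_X^\ast), exactness gives \mathrm{Jac}(X) \cong H^1(X,\mathcal{O}_X)/H^1(X,\mathbb{Z}) and identifies \mathrm{NS}(X) with the kernel of H^2(X,\mathbb{Z}) \to H^2(X,\mathcal{O}_X); so computing the Néron–Severi group amounts to computing that kernel, which is presumably why explicit answers tend to show up only in special cases like complex tori.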

Holomorphic gerbes

I suspect that everything I just said has a very nice analogue for gerbes. This paper helps a lot:

and I’ve also found a lot of information and references here:

But I haven’t yet absorbed all the information and tracked down all the references, and it’s a lot faster to make some guesses. So let me state these guesses, and maybe some of you can confirm or correct them.

First, some definitions! Two smooth gerbes are called ‘equivalent’ if the Čech 2-cocycles defining them are cohomologous, and ditto for holomorphic gerbes.

Definition. The gerby Picard group \mathbf{Pic}(X) is the group of equivalence classes of holomorphic gerbes on X.

It’s an abelian group since we can tensor holomorphic gerbes and get new ones. This corresponds to multiplying the Čech 2-cocycles defining them.

Definition. The gerby Néron–Severi group \mathbf{NS}(X) is the group of equivalence classes of smooth gerbes with the property that they can be given a holomorphic structure.

Definition. The gerby Jacobian \mathbf{Jac}(X) is the group of equivalence classes of holomorphic gerbes that are topologically trivial.

Conjecture 1. There is a short exact sequence

0 \to \mathbf{Jac}(X) \to \mathbf{Pic}(X) \to \mathbf{NS}(X) \to 0

coming from the natural inclusion of \mathbf{Jac}(X) in \mathbf{Pic}(X).

This seems easy. (I should emphasize that I’m not trying to stake out territory or say anything amazing, just state some precise guesses.)

Conjecture 2. The gerby Picard group \mathbf{Pic}(X) can be made into a complex Lie group in such a way that \mathbf{Jac}(X) is the identity component and \mathbf{NS}(X) is the group of connected components.

Ben-Bassat has shown this when X is a complex torus, but he goes much further and describes \mathbf{Jac}(X) and \mathbf{NS}(X) quite explicitly, mimicking the Appell–Humbert theorem, which does the same for line bundles. In general I’m hoping the higher Picard group has the structure of a Lie group, just like the Picard group does, and this should give it the desired topology.

Conjecture 3. The gerby Néron–Severi group \mathbf{NS}(X) is a subgroup of H^3(X,\mathbb{Z}).

This seems obvious.

Conjecture 4. The gerby Jacobian is given by

\mathbf{Jac}(X) \cong \frac{H^2(X,\mathbb{R})}{H^2(X, \mathbb{Z})}

Ben-Bassat almost comes out and says this in the case where X is a complex torus, in his Theorem 3. But it’s disguised by the fact that he uses a formula for the 2nd cohomology that works in this case.

Holomorphic n-gerbes

As the turtle once said: while I’m sticking my neck out, let me stick it out all the way. We can define n-gerbes for any n, for example using higher Čech cohomology groups or your favorite approach to higher categories. And the patterns I’m discussing want to be continued to higher n.

My basic intuition is that while a line bundle has a set that’s a copy of \mathbb{C} sitting over each point of our space X, the kind of gerbe I’m talking about has a category with one object and a copy of \mathbb{C} as its morphisms, while a 2-gerbe has a 2-category with one object, one morphism and a copy of \mathbb{C} as its 2-morphisms, and so on. But to actually work with these things I’ll use Čech cohomology.

At this point it’s good to introduce more systematic notation. I used \mathrm{Pic}(X) for the usual Picard group and \mathbf{Pic}(X) for its gerby analogue, but I can’t use bolder boldface for 2-gerbes and so on. Besides, the whole pattern “line bundle, gerbe, 2-gerbe…” isn’t starting at the base case, and line bundles are themselves a categorification of complex-valued functions, so the numbering is all screwed up. Let’s try this:

Definition. Given a complex manifold X, its nth Picard group \mathrm{Pic}_n(X) is the nth Čech cohomology group of its sheaf of nonzero holomorphic complex functions.

This is the usual Picard group when n = 1, and the gerby Picard group when n = 2. For higher n it’s supposed to be the group of equivalence classes of holomorphic (n-1)-gerbes.

Definition. The nth Néron–Severi group \mathrm{NS}_n(X) of a complex manifold X is the subgroup of H^{n+1}(X,\mathbb{Z}) that’s the image of the natural map

\mathrm{Pic}_n(X) \stackrel{p}{\longrightarrow} H^{n+1}(X,\mathbb{Z})

This natural map comes from a long exact sequence in Čech cohomology called the exponential sequence. The nth Néron–Severi group is the usual Néron–Severi group when n = 1, and the gerby one when n = 2. For higher n it’s supposed to be the group of equivalence classes of smooth (n-1)-gerbes that admit a holomorphic structure.

Definition. The nth Jacobian \mathrm{Jac}_n(X) is the kernel of the natural map

\mathrm{Pic}_n(X) \stackrel{p}{\longrightarrow} H^{n+1}(X,\mathbb{Z})

This is the usual Jacobian of X when n = 1, and the gerby one when n = 2. For higher n it’s supposed to be the group of equivalence classes of holomorphic structures on the trivial (n-1)-gerbe.

Just by definition, we get a short exact sequence

0 \to \mathrm{Jac}_n(X) \stackrel{i}{\longrightarrow} \mathrm{Pic}_n(X) \stackrel{p}{\longrightarrow} \mathrm{NS}_n(X) \to 0

So the higher analogue of Conjecture 1 is true by definition. But this one is not quite so automatic:

Conjecture 2′. The nth Picard group \mathrm{Pic}_n(X) of a complex manifold can be made into a complex Lie group in such a way that \mathrm{Jac}_n(X) is the identity component and \mathrm{NS}_n(X) is the group of connected components.

The nth Néron–Severi group \mathrm{NS}_n(X) is a subgroup of H^{n+1}(X,\mathbb{Z}) by definition, so the higher analogue of Conjecture 3 holds. But this one is more speculative:

Conjecture 4′. The nth Jacobian of a complex manifold X is given by

\mathrm{Jac}_n(X) \cong \frac{H^{n}(X,\mathbb{R})}{H^{n}(X, \mathbb{Z})}

I don’t know if the number n is right here. But for n = 1 this gives a result that matches the usual Jacobian of X, and for n = 2 this matches Ben-Bassat’s calculation of the gerby Jacobian in the special case where X is a complex torus.
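One way to test that number (hedged, since I may be getting conventions slightly wrong): the same exponential sheaf sequence that defines the map p gives, in degree n, an exact sequence

H^n(X,\mathbb{Z}) \to H^n(X,\mathcal{O}_X) \to H^n(X,\mathcal{O}_X^\ast) \stackrel{p}{\longrightarrow} H^{n+1}(X,\mathbb{Z}) \to H^{n+1}(X,\mathcal{O}_X)

so with \mathrm{Pic}_n(X) = H^n(X,\mathcal{O}_X^\ast) we get \mathrm{NS}_n(X) = \ker\big(H^{n+1}(X,\mathbb{Z}) \to H^{n+1}(X,\mathcal{O}_X)\big) and \mathrm{Jac}_n(X) \cong H^n(X,\mathcal{O}_X)/H^n(X,\mathbb{Z}). Comparing this last quotient with the H^n(X,\mathbb{R})/H^n(X,\mathbb{Z}) of Conjecture 4′ looks like exactly the place where the Deligne/Griffiths subtleties mentioned at the end come in.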

In general, this sort of quotient is called an intermediate Jacobian, and there’s a lot of material about intermediate Jacobians and n-gerbes on the nLab page on intermediate Jacobians. There’s also some relation, unclear to me so far, between this conjecture and Theorem 1.5.11 in Brylinski’s Loop Spaces, Characteristic Classes and Geometric Quantization. Brylinski credits his Theorem 1.5.11 to Deligne but cites this reference:

  • Michael Rapoport, Peter Schneider, and Norbert Schappacher, eds., Beilinson’s Conjectures on Special Values of L-functions, Academic Press, New York, 2014.

and especially the paper by Esnault and Viehweg in this volume. So I should read that.

If Conjecture 4′ is true, the nth Jacobian is obviously a torus. But in fact Deligne came up with a way to make intermediate Jacobians into complex tori, and Griffiths came up with a different way. One of these ways should be the ‘right’ way if the nth Picard group is a complex Lie group and the nth Jacobian is the identity component of that.

So, there’s a lot left for me to think about here!

Peter Rohde: Climbing on Frenchmans Cap

Tierry le Fronde (150m, trad), Tahune Face.

Sydney Route (400m, trad), South-East Face.


Peter Rohde: MoodSnap! Technology for mental health

In this article published in The Spectator, I discuss the importance of incorporating mental health into digital health ecosystems like Apple Health, Google Fit and Fitbit. Mental health is as important as physical health, and the two are intimately connected. It would be remiss for digital health ecosystems to overlook this important aspect of our wellbeing.

Featuring in the article is my new open-source project MoodSnap for realtime tracking of mental health.


Matt Strassler: Even in Einstein’s General Relativity, the Earth Orbits the Sun (& the Sun Does Not Orbit the Earth)

Back before we encountered Professor Richard Muller’s claim that “According to [Einstein’s] general theory of relativity, the Sun does orbit the Earth. And the Earth orbits the Sun,” I was creating a series of do-it-yourself astronomy posts. (A list of the links is here.) Along the way, we rediscovered for ourselves one of the key laws of the planets: Kepler’s third law, which relates the time T it takes for a planet to orbit the Sun to its distance R from the Sun. Because we’ll be referring to this law and its variants so often, let me call it the “T|R law”. [For elliptical orbits, the correct choice of R is half the longest distance across the ellipse.] From this law we figured out how much acceleration is created by the Sun’s gravity, and concluded that it varies as 1/R2.
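For readers following along at home, here is a quick numerical check of the Sun’s T|R law (my own sketch, using rounded textbook values for the planets, in astronomical units and years): T^2/R^3 is the same for every planet, and the implied acceleration 4*pi^2*R/T^2 falls off as 1/R^2.

```python
import math

# (name, R in AU, T in years) -- approximate textbook values
planets = [
    ("Mercury", 0.387, 0.241), ("Venus",   0.723, 0.615),
    ("Earth",   1.000, 1.000), ("Mars",    1.524, 1.881),
    ("Jupiter", 5.203, 11.86), ("Saturn",  9.537, 29.46),
    ("Uranus", 19.19,  84.02), ("Neptune", 30.07, 164.8),
]

for name, R, T in planets:
    kepler = T ** 2 / R ** 3                 # about 1 in these units, for every planet
    accel = 4 * math.pi ** 2 * R / T ** 2    # centripetal acceleration for a circular orbit
    print(f"{name:8s}  T^2/R^3 = {kepler:6.3f}   a * R^2 = {accel * R ** 2:6.2f}")
# a * R^2 comes out the same for all planets: the Sun's gravity produces acceleration ~ 1/R^2
```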

That wasn’t all. We also saw that objects that orbit the Earth — the Moon and the vast array of human-built satellites — satisfy their own T|R law, with the same general relationship. The only difference is that the acceleration created by the Earth’s gravity is less at the same distance than is the Sun’s. (We all secretly know that this is because the Earth has a smaller mass, though as avid do-it-yourselfers we admit we didn’t actually prove this yet.)

T|R laws are indeed found among any objects that (in the Newtonian sense) orbit a common planet. For example, this is true of the moons of Jupiter, as well as the rocks that make up Jupiter’s thin ring.

Along the way, we made a very important observation. We hadn’t (and still haven’t) succeeded in figuring out if the Earth goes round the Sun or the Sun goes round the Earth. But we did notice this:

This was all in a pre-Einsteinian context. But now Professor Muller comes along, and tells us Einstein’s conception of gravity implies that the Sun goes round the Earth just as much (or just as little) as the Earth goes round the Sun.

Muller versus Kepler

What is Muller saying? He’s certainly right that in Einstein’s approach to gravity and motion, Sun-centered (heliocentric) coordinates and Earth-centered (geocentric) coordinates are equally good. Equal, that is, in the sense that Einstein’s concepts of gravity and motion work equally well in either one. Because of this, in Einstein’s theory, we cannot say whether the Earth goes round the Sun or the Sun goes round the Earth. It can appear either way, or some other way, depending on what coordinates we choose and how we visualize them.

But then Muller goes a step too far. He says: “The Sun orbits the Earth. And the Earth orbits the Sun.” This is where, in my opinion, he makes an error. He has forgotten that gravitational orbits have special properties that general looping trajectories do not have. They have T|R laws.

[More precisely, in any context where a Newtonian would feel it reasonable to say that “X orbits Y due to gravity”, then X’s path satisfies Y’s T|R law to a reasonable approximation.]

In considering Muller’s claim, the fact that the Earth satisfies the Sun’s T|R law very well, but the Sun doesn’t even come close to satisfying the Earth’s, is intriguing. But still, isn’t it just a matter of choosing the appropriate coordinates, geocentric instead of heliocentric?

No it is not! Kepler’s third law, and indeed any T|R law, is coordinate-independent!

[I’ll try to demonstrate carefully that this is true in my next post. But in short, velocities and curvatures in the solar system are small, so neither T, which involves an astronomical year as measured by a local clock on either Sun or Earth, nor R, which is a relative distance that can be estimated by light-travel time, is ambiguous; and neither cares what coordinate system you are using when you measure them.]

This is the crucial observation. You see, it’s not as though the Sun’s path relative to the Earth fails to satisfy Earth’s T|R law in heliocentric coordinates, but succeeds in satisfying it in geocentric coordinates. It fails to satisfy Earth’s T|R law in all coordinate systems.

And why does this tell us that the Sun does not orbit the Earth? Any T|R law, including Kepler’s third law for the Sun and the similar laws for all compact objects including planets, moons and asteroids, provides a quintessential test for diagnosing whether a trajectory is a gravitational orbit. After all, the Sun’s T|R law applies to more than the eight planets. It applies to all the Sun-orbiting rocks and ice-balls and human-made satellites, as well as all imaginable objects that could potentially orbit the Sun gravitationally. [We’ll discuss some of the minor fine print to this strong statement next time.] In one simple equation, it provides a basic rule that all such orbits of the Sun must satisfy. A path which does not meet this rule, at least in some approximate way, cannot be said to be a gravitational orbit of the Sun.

And a path which does not satisfy the T|R law for the Earth, to a reasonable approximation, cannot be said to be a gravitational orbit of the Earth.

Thus, the question of whether the Earth orbits the Sun, or the Sun orbits the Earth, can be addressed using their respective T|R laws, independently of whether we use heliocentric or geocentric coordinates, or any other choice of coordinates.

What is Muller’s mistake?

Muller is correct that if two paths (the Sun’s and the Earth’s, in this case) intertwine, you cannot say which one goes round the other. That’s coordinate-dependent. It is also true that, in Einstein’s gravity, the Sun’s motion around the Earth in geocentric coordinates can be interpreted in a purely gravitational language, which is not true (naively) in Newton’s gravity, where we would normally invoke “fictitious” non-gravitational forces.

But these points, though correct, are irrelevant to the question of gravitational orbits.

A gravitating system, even in Newton’s language, consists of an elaborate structure of both actual and potential orbits, characterized by a T|R law. In Einstein’s language, the system isn’t described just by the trajectories of the massive objects within it. It has an extended four-dimensional space-time geometry, which we can’t and shouldn’t ignore. What Muller has done is focus his (and our) attention on the properties of two or three geodesics in isolation, while ignoring the rest of the geometry. If we look at the spacetime as a whole, and probe its properties in a coordinate-invariant way, it is easy to see that

  • The Sun’s T|R law applies approximately to the Earth’s trajectory, and applies approximately to classes of gravitational orbits nearby to the Earth’s trajectory.
  • The Earth’s T|R law does not apply (even approximately) for the Sun’s trajectory, nor does it apply to any nearby trajectories.

These are coordinate-invariant statements that care not a whit whether, in some set of coordinates, the Sun’s path goes around a stationary Earth, or for that matter whether both the Sun’s and Earth’s paths go around a stationary Moon, a stationary Venus, or some arbitrary point in space.

So in my opinion, Muller is wrong. The Sun does not orbit the Earth, or the Moon; the Earth does not orbit the international space station or any of the GPS satellites; and the planet Saturn does not orbit any of the tiny rocks that make up its rings. To say otherwise is to misunderstand and misapply the lessons of Einstein’s theory.

During the discussion last week a number of readers suggested other methods for arguing against Muller. Maybe next week we can look at the strengths and weaknesses of these other methods, and discuss them in more detail. But first I’ll write a post putting more meat on today’s bare-bones argument.

May 23, 2022

Tommaso Dorigo: A Tale Of Two Two-Sigma Signals

Two recent analyses by the CMS experiment stand out, in my opinion, for their suggestive results. They both produce evidence at the two-to-three sigmaish level of the signals they were after: this means that the probability of the observed data under the no-signal hypothesis is between a few percent and one in a thousand, so nothing really unmistakable. But the origins of the two observed effects are probably of opposite nature - one is a genuine signal that is slowly but surely sticking its head up as we improve our analysis techniques and collect more data, while the other is a fluctuation that we bumped into.


Matt Strassler: Fourth Step in the Triplet Model is up.

Advanced particle physics today:

Today we move deeper into the reader-requested explanation of the “triplet model” (a classic and simple variation on the Standard Model of particle physics, in which the W boson mass can be raised slightly relative to Standard Model predictions without affecting other current experiments). The math required is still pre-university level, though slowly creeping up as complex numbers start to appear.

The first, second and third webpages in this series provided a self-contained introduction that concluded with a full cartoon of the triplet model, showing how a small modification of the Higgs mechanism of the Standard Model can shift a “W” particle’s mass upward.

Next, we begin a new phase in which the cartoon is gradually replaced with the real thing. In the new fourth webpage, I start laying the groundwork for understanding how the Standard Model works — in particular how the Higgs boson gives mass to the W and Z bosons, and what SU(2) x U(1) is all about — following which it won’t be hard to explain the triplet model.

Please send your comments and suggestions!

May 21, 2022

Terence Tao: Partially specified mathematical objects, ambient parameters, and asymptotic notation

In orthodox first-order logic, variables and expressions are only allowed to take one value at a time; a variable {x}, for instance, is not allowed to equal {+3} and {-3} simultaneously. We will call such variables completely specified. If one really wants to deal with multiple values of objects simultaneously, one is encouraged to use the language of set theory and/or logical quantifiers to do so.

However, the ability to allow expressions to become only partially specified is undeniably convenient, and also rather intuitive. A classic example here is that of the quadratic formula:

\displaystyle  \hbox{If } x,a,b,c \in {\bf R} \hbox{ with } a \neq 0, \hbox{ then }

\displaystyle  ax^2+bx+c=0 \hbox{ if and only if } x = \frac{-b \pm \sqrt{b^2-4ac}}{2a}. \ \ \ \ \ (1)

Strictly speaking, the expression {x = \frac{-b \pm \sqrt{b^2-4ac}}{2a}} is not well-formed according to the grammar of first-order logic; one should instead use something like

\displaystyle x = \frac{-b - \sqrt{b^2-4ac}}{2a} \hbox{ or } x = \frac{-b + \sqrt{b^2-4ac}}{2a}

or

\displaystyle x \in \left\{ \frac{-b - \sqrt{b^2-4ac}}{2a}, \frac{-b + \sqrt{b^2-4ac}}{2a} \right\}

or

\displaystyle x = \frac{-b + \epsilon \sqrt{b^2-4ac}}{2a} \hbox{ for some } \epsilon \in \{-1,+1\}

in order to strictly adhere to this grammar. But none of these three reformulations are as compact or as conceptually clear as the original one. In a similar spirit, a mathematical English sentence such as

\displaystyle  \hbox{The sum of two odd numbers is an even number} \ \ \ \ \ (2)

is also not a first-order sentence; one would instead have to write something like

\displaystyle  \hbox{For all odd numbers } x, y, \hbox{ the number } x+y \hbox{ is even} \ \ \ \ \ (3)

or

\displaystyle  \hbox{For all odd numbers } x,y \hbox{ there exists an even number } z \ \ \ \ \ (4)

\displaystyle  \hbox{ such that } x+y=z

instead. These reformulations are not all that hard to decipher, but they do have the aesthetically displeasing effect of cluttering an argument with temporary variables such as {x,y,z} which are used once and then discarded.

Another example of partially specified notation is the innocuous {\ldots} notation. For instance, the assertion

\displaystyle \pi=3.14\ldots,

when written formally using first-order logic, would become something like

\displaystyle \pi = 3 + \frac{1}{10} + \frac{4}{10^2} + \sum_{n=3}^\infty \frac{a_n}{10^n} \hbox{ for some sequence } (a_n)_{n=3}^\infty

\displaystyle  \hbox{ with } a_n \in \{0,1,2,3,4,5,6,7,8,9\} \hbox{ for all } n,

which is not exactly an elegant reformulation. Similarly with statements such as

\displaystyle \tan x = x + \frac{x^3}{3} + \ldots \hbox{ for } |x| < \pi/2

or

\displaystyle \tan x = x + \frac{x^3}{3} + O(|x|^5) \hbox{ for } |x| < \pi/2.

Below the fold I’ll try to assign a formal meaning to partially specified expressions such as (1), for instance allowing one to condense (2), (3), (4) to just

\displaystyle  \hbox{odd} + \hbox{odd} = \hbox{even}.

When combined with another common (but often implicit) extension of first-order logic, namely the ability to reason using ambient parameters, we become able to formally introduce asymptotic notation such as the big-O notation {O()} or the little-o notation {o()}. We will explain how to do this at the end of this post.

— 1. Partially specified objects —

Let’s try to assign a formal meaning to partially specified mathematical expressions. We now allow expressions {X} to not necessarily be a single (completely specified) mathematical object, but more generally a partially specified instance of a class {\{X\}} of mathematical objects. For instance, {\pm 3} denotes a partially specified instance of the class {\{ \pm 3 \} = \{ 3, -3\}} of numbers consisting of {3} and {-3}; that is to say, a number which is either {-3} or {+3}. A single completely specified mathematical object, such as the number {3}, can now also be interpreted as the (unique) instance of a class {\{ 3 \}} consisting only of {3}. Here we are using set notation to describe classes, ignoring for now the well-known issue from Russell’s paradox that some extremely large classes are technically not sets, as this is not the main focus of our discussion here.

For reasons that will become clearer later, we will use the symbol {==} rather than {=} to denote the assertion that two partially specified objects range across exactly the same class. That is to say, we use {X==Y} as a synonym for {\{X\} = \{Y\}}. Thus, for instance, it is not the case that {3 == \pm 3}, because the class {\{\pm 3\}} has instances that {\{3\}} does not.

Any finite sequence {x_1,\dots,x_n} of objects can also be viewed as a partially specified instance of a class {\{x_1,\dots,x_n\}}, which I will denote {x_1|\dots|x_n} in analogy with regular expressions, thus we now also have a new name {\{x_1|\dots|x_n\}} for the set {\{x_1,\dots,x_n\}}. (One could in fact propose {x_1,\dots,x_n} as the notation for {x_1|\dots|x_n}, as is done implicitly in assertions such as “{P(n)} is true for {n=1,2,3}”, but this creates notational conflicts with other uses of the comma in mathematics, such as the notation {(x_1,\dots,x_n)} for an {n}-tuple, so I will use the regular expression symbol {|} here to avoid ambiguity.) For instance, {1|3|5} denotes a partially specified instance from the class {\{1|3|5\}=\{1,3,5\}}, that is to say a number which is either {1}, {3}, or {5}. Similarly, we have

\displaystyle  \pm 3 == 3|-3 == -3|3

and

\displaystyle  3 == 3|3.

One can mimic set builder notation and denote a partially specified instance of a class {{\mathcal C}} as {(x: x \in {\mathcal C})} (or one can replace {x} by any other variable name one pleases); similarly, one can use {(x \in A: P(x))} to denote a partially specified element of {A} that obeys the predicate {P(x)}. Thus for instance

\displaystyle  (n \in {\bf Z}: n \hbox{ odd})

would denote a partially specified odd number. By a slight abuse of notation, we can abbreviate {(x \in A: P(x))} as {(x: P(x))} or simply {P}, if the domain {A} of {x} is implicitly understood from context. For instance, under this convention,

\displaystyle  \hbox{odd} == (n \in {\bf Z}: n \hbox{ odd} )

refers to a partially specified odd integer, while

\displaystyle  \hbox{integer} == (n: n \in {\bf Z})

refers to a partially specified integer. Under these conventions, it now becomes theoretically possible that the class one is drawing from becomes empty, and instantiation becomes vacuous. For instance, with our conventions, {\hbox{odd perfect}} refers to a partially specified odd perfect number, which is conjectured to not exist. As it turns out, our notation can handle instances of empty classes without difficulty (basically thanks to the concept of a vacuous truth), but we will avoid dwelling on this edge case much here since this concept is not intuitive for beginners. (But if one does want to confront this possibility, one can use a symbol such as {\perp} to denote an instance of the empty class, i.e., an object that has no specifications whatsoever.)

The symbol {|} introduced above can now be extended to be a binary operation on partially specified objects, defined by the formula

\displaystyle  \{ x|y \} == \{x\} \cup \{y\}.

Thus for instance

\displaystyle  \hbox{odd} | \hbox{even} == \hbox{integer}

and

\displaystyle  x \pm y == (x+y) | (x-y).

One can then define other logical operations on partially specified objects if one wishes. For instance, we could define an “and” operator {\&} by defining

\displaystyle  \{ x \& y \} == \{x\} \cap \{y\}.

Thus for instance

\displaystyle  (5 \pm 3) \& (10 \pm 2) == 8,

\displaystyle  \hbox{odd} \& \hbox{perfect} == \hbox{odd perfect}

and

\displaystyle  \hbox{odd} \& \hbox{even} == \perp.

(Here we are deviating from the syntax of regular expressions, but I am not insisting that the entirety of mathematical notation conform to that syntax, and in any event regular expressions do not appear to have a direct analogue of this “and” operation.) We leave it to the reader to propose other logical operations on partially specified objects, though the “or” operator {|} and the “and” operator {\&} will suffice for our purposes.
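
To make these conventions concrete, here is a small toy model in Python (my own illustration, not part of the original discussion): a partially specified object is represented by the finite set of its possible instances, so that {|} becomes set union, {\&} becomes set intersection, and {==} becomes ordinary set equality. The particular finite ranges below are arbitrary stand-ins for the infinite classes in the text.

    # Toy model: a partially specified object = the set of its possible instances.
    odd = frozenset(range(1, 20, 2))       # finite stand-in for "odd" (1, 3, ..., 19)
    even = frozenset(range(0, 20, 2))      # finite stand-in for "even" (0, 2, ..., 18)
    integer = frozenset(range(0, 20))      # finite stand-in for "integer"

    # "|" is set union and "&" is set intersection; "==" is plain set equality.
    assert odd | even == integer                                      # odd | even == integer
    assert odd & even == frozenset()                                  # odd & even == the empty class
    assert frozenset({2, 8}) & frozenset({8, 12}) == frozenset({8})   # (5 ± 3) & (10 ± 2) == 8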

Any operation on completely specified mathematical objects {x} can be extended to partially specified mathematical objects {X} by applying that operation to arbitrary instances of the class {\{X\}}, with the convention that if a class appears multiple times in an expression, then we allow each instance of that class to take different values. For instance, if {x, y} are partially specified numbers, we can define {x+y} to be the class of all numbers formed by adding an instance of {x} to an instance of {y} (this is analogous to the operation of Minkowski addition for sets, or interval arithmetic in numerical analysis). For example,

\displaystyle 5 \pm 3 == 5 + (\pm 3) == 2| 8

and

\displaystyle \pm 5 \pm 3 == (\pm 5) + (\pm 3) == -8|-2| 2|8

(recall that there is no requirement that the signs here align). Note that

\displaystyle (\pm 3)^2 == 9

but that

\displaystyle (\pm 3) \times (\pm 3) == \pm 9.

So we now see the first sign that some care has to be taken with the law of substitution; we have

\displaystyle  x \times x = x^2

but we do not have

\displaystyle  (\pm 3) \times (\pm 3) == (\pm 3)^2.

However, the law of substitution works fine as long as the variable being substituted appears exactly once on both sides of the equation.
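
As a quick illustration of the convention that each occurrence of a class is instantiated independently, here is a hedged Python sketch (again my own toy model, representing a partially specified object by its set of instances): sums and products range over all pairs of instances, while a single squared occurrence forces the two factors to agree.

    from itertools import product

    def pm(x):
        """The partially specified object ±x, modelled as the set {x, -x}."""
        return frozenset({x, -x})

    def add(X, Y):
        # each occurrence instantiates independently: take all pairwise sums
        return frozenset(a + b for a, b in product(X, Y))

    def mul(X, Y):
        return frozenset(a * b for a, b in product(X, Y))

    def square(X):
        # a single occurrence of X: both factors are forced to be the same instance
        return frozenset(a * a for a in X)

    assert add(frozenset({5}), pm(3)) == frozenset({2, 8})   # 5 ± 3 == 2|8
    assert mul(pm(3), pm(3)) == pm(9)                        # (±3) × (±3) == ±9
    assert square(pm(3)) == frozenset({9})                   # (±3)^2 == 9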

One can have an unbounded number of partially specified instances of a class, for instance {\sum_{i=1}^n \pm 1} will be the class of all integers between {-n} and {+n} with the same parity as {n}.

Remark 1 When working with signs {\pm}, one sometimes wishes to keep all signs aligned, with {\mp} denoting the sign opposite to {\pm}, thus for instance with this notation one would have the geometric series formula

\displaystyle  (1 \pm \varepsilon)^{-1} = 1 \mp \varepsilon + \varepsilon^2 \mp \varepsilon^3 \dots

whenever {|\varepsilon| < 1}. However, this notation is difficult to place in the framework used in this blog post without causing additional confusion, and as such we will not discuss it further here. (The syntax of regular expressions does have some tools for encoding this sort of alignment, but in first-order logic we also have the perfectly serviceable tool of named variables and quantifiers (or plain old mathematical English) to do so.)

One can also extend binary relations, such as {<} or {=}, to partially specified objects, by requiring that every instance on the left-hand side of the relation relates to some instance on the right-hand side (thus binary relations become {\Pi_2} sentences). Again, if a class is instantiated multiple times, we allow different appearances to correspond to different instances. For instance, the statement {5 \pm 3 \leq 8} is true, because every instance of {5 \pm 3} is less than or equal to {8}:

\displaystyle  5 \pm 3 == (2|8) \leq 8.

But the statement {5 \pm 3 \leq 6} is false. Similarly, the statement {8 = 5 \pm 3} is true, because {8} is an instance of {5 \pm 3}:

\displaystyle  8 = (2|8) == 5 \pm 3.

The statement {5 \pm 3 < 6 \pm 3} is also true, because every instance of {5 \pm 3} is less than some instance of {6 \pm 3}:

\displaystyle  5 \pm 3 == (2|8) < (3|9) == 6 \pm 3.
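
In the same toy Python model (again just my own sketch, not part of the original text), the convention that every instance on the left must relate to some instance on the right can be coded directly:

    def rel(op, X, Y):
        # X op Y holds iff every instance of X relates to some instance of Y
        return all(any(op(a, b) for b in Y) for a in X)

    le = lambda a, b: a <= b
    lt = lambda a, b: a < b
    eq = lambda a, b: a == b

    five_pm_3 = frozenset({2, 8})   # instances of 5 ± 3
    six_pm_3 = frozenset({3, 9})    # instances of 6 ± 3

    assert rel(le, five_pm_3, frozenset({8}))       # 5 ± 3 <= 8 is true
    assert not rel(le, five_pm_3, frozenset({6}))   # 5 ± 3 <= 6 is false
    assert rel(eq, frozenset({8}), five_pm_3)       # 8 = 5 ± 3 is true
    assert rel(lt, five_pm_3, six_pm_3)             # 5 ± 3 < 6 ± 3 is true
    assert not rel(eq, five_pm_3, frozenset({8}))   # but 5 ± 3 = 8 is false (asymmetry)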

The relationship between a partially specified representative {x_1|\dots|x_n} and its class {\{x_1|\dots|x_n\}} can then be summarised as

\displaystyle  x_1|\dots|x_n \in \{x_1|\dots|x_n\} = \{x_1,\dots,x_n\}.

Note how this convention treats the left-hand side and right-hand side of a relation involving partially specified expressions asymmetrically. In particular, for partially specified expressions {x, y}, the relation {x=y} is no longer equivalent to {y=x}; the former states that every instance of {x} is also an instance of {y}, while the latter asserts the converse. For instance, {3 = \pm 3} is a true statement, but {\pm 3 = 3} is a false statement (much as “{3} is prime” is a true statement (or “{3 = \hbox{prime}}” in our notation), but “primes are {3}” (or {\hbox{prime} = 3} in our notation) is false). In particular, we see a distinction between equality {=} and equivalence {==}; indeed, {x == y} holds if and only if {x=y} and {y=x}. On the other hand, as can be easily checked, the following three basic laws of mathematics remain valid for partially specified expressions {x,y,z}:

  • (i) (Reflexivity) {x=x}.
  • (ii) (Transitivity) If {x = y} and {y = z}, then {x=z}. Similarly, if {x < y} and {y < z}, then {x < z}, etc..
  • (iii) (Substitution) If {x = y}, then {f(x) = f(y)} for any function {f}. Similarly, if {x \leq y}, then {f(x) \leq f(y)} for any monotone function {f}, etc..

These conventions for partially specified expressions align well with informal mathematical English. For instance, as discussed in the introduction, the assertion

\displaystyle  \hbox{The sum of two odd numbers is an even number}

can now be expressed as

\displaystyle \hbox{odd} + \hbox{odd} = \hbox{even}.

Similarly, the even Goldbach conjecture can now be stated as

\displaystyle  \hbox{even number} \& (\hbox{greater than } 2) = \hbox{prime} + \hbox{prime},

while the Archimedean property of the reals can be reformulated as the assertion that

\displaystyle  \hbox{real} \leq \hbox{natural number} \times \varepsilon

for any {\varepsilon > 0}. Note also how the equality symbol {=} for partially specified expressions corresponds well with the multiple meanings of the word “is” in English (consider for instance “two plus two is four”, “four is even”, and “the sum of two odd numbers is even”); the set-theoretic counterpart of this concept would be a sort of amalgam of the ordinary equality relation {=}, the inclusion relation {\in}, and the subset relation {\subset}.

There are, however, a number of caveats one has to keep in mind when dealing with formulas involving partially specified objects. The first, which has already been mentioned, is a lack of symmetry: {x=y} does not imply {y=x}; similarly, {x<y} does not imply {y>x}. The second is that negation behaves very strangely, so much so that one should basically avoid using partially specified notation for any sentence that will eventually get negated. For instance, observe that the statements {3 = \pm 3} and {3 \neq \pm 3} are both true, while {\pm 3 = 3} and {\pm 3 \neq 3} are both false. In fact, the negation of such statements as {x=y} or {x \leq y} involving partially specified objects usually cannot be expressed succinctly in partially specified notation, and one must resort to using several quantifiers instead. (In the language of the arithmetic hierarchy, the negation of a {\Pi_2} sentence is a {\Sigma_2} sentence, rather than another {\Pi_2} sentence.)

Another subtlety, already mentioned earlier, arises from our choice to allow different instantiations of the same class to refer to different instances, namely that the law of universal instantiation does not always work if the symbol being instantiated occurs more than once on the left-hand side. For instance, the identity

\displaystyle  x - x = 0

is of course true for all real numbers {x}, but if one naively substitutes in the partially specified expression {\pm 1} for {x} one obtains the claim

\displaystyle  \pm 1 - (\pm 1) = 0

which is a false statement under our conventions (because the two instances of the sign {\pm 1, \pm 1} do not have to match). However, there is no problem with repeated instantiations on the right-hand side, as long as there is at most a single instance on the left-hand side. For instance, starting with the identity

\displaystyle  x = 2x - x

we can validly instantiate the partially specified expression {\pm 1} for {x} to obtain

\displaystyle  \pm 1 = 2(\pm 1) - (\pm 1).
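
Here is a brief Python check of this caveat, in the same toy model of partially specified objects as sets of instances (my own sketch): substituting {\pm 1} into {x - x} yields the class {-2|0|2} rather than {0}, while substituting into {2x - x} does contain every instance of {\pm 1}.

    from itertools import product

    def pm(x):
        return frozenset({x, -x})

    def sub(X, Y):
        # each occurrence instantiates independently: take all pairwise differences
        return frozenset(a - b for a, b in product(X, Y))

    # Substituting ±1 into x - x: the two occurrences need not take the same sign.
    assert sub(pm(1), pm(1)) == frozenset({-2, 0, 2})

    # Substituting ±1 into 2x - x: repeated occurrences on the right are harmless,
    # since every instance of ±1 is an instance of 2(±1) - (±1).
    two_x = frozenset(2 * a for a in pm(1))
    assert pm(1) <= sub(two_x, pm(1))   # subset check: ±1 = 2(±1) - (±1)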

A common practice that helps avoid these sorts of issues is to keep the partially specified quantities on the right-hand side of one’s relations, or if one is working with a chain of relations such as {x \leq y \leq z \leq w}, to keep the partially specified quantities away from the left-most side (so that {y}, {z}, and {w} are allowed to be multi-valued, but not {x}). This doesn’t automatically prevent all issues (for instance, one may still be tempted to “cancel” an expression such as {\pm 1 - (\pm 1)} that might arise partway through a chain of relations), but it can reduce the chance of accidentally making an error.

One can of course translate any formula that involves partially specified objects into a more orthodox first-order logic sentence by inserting the relevant quantifiers in the appropriate places – but note that the variables used in quantifiers are always completely specified, rather than partially specified. For instance, if one expands “{x = \pm y}” (for some completely specified quantities {x,y}) as “there exists {\epsilon = \pm 1} such that {x = \epsilon y}“, the quantity {\epsilon} is completely specified; it is not the partially specified {\pm 1}. (If {x} or {y} were also partially specified, the first-order translation of the expression “{x = \pm y}” would be more complicated, as it would need more quantifiers.)

One can combine partially specified notation with set builder notation, for instance the set {\{ x \in {\bf R}: x = \pm 10 \pm 1 \}} is just the four-element set {\{ -11, -9, +9, +11\}}, since these are indeed the four real numbers {x} for which the formula {x = \pm 10 \pm 1} is true. I would however avoid combining particularly heavy uses of set-theoretic notation with partially specified notation, as it may cause confusion.

Our examples above of partially specified objects have been drawn from number systems, but one can use this notation for other classes of objects as well. For instance, within the class of functions {f: {\bf R} \rightarrow {\bf R}} from the reals to themselves, one can make assertions like

\displaystyle  \hbox{increasing} + \hbox{increasing} == \hbox{increasing}

where {\hbox{increasing}} is the class of monotone increasing functions; similarly we have

\displaystyle  - \hbox{increasing} == \hbox{decreasing}

\displaystyle  \hbox{increasing} | \hbox{decreasing} == \hbox{monotone}

\displaystyle  \hbox{positive increasing} \cdot \hbox{negative decreasing} = \hbox{negative decreasing}

\displaystyle  \hbox{bounded} \& \hbox{monotone} = \hbox{converges at infinity}

\displaystyle  \hbox{analytic} = \hbox{smooth} = \hbox{differentiable} = \hbox{continuous} = \hbox{measurable}

\displaystyle  \hbox{continuous} / \hbox{continuous nonvanishing} = \hbox{continuous}

\displaystyle  \hbox{square integrable} \cdot \hbox{square integrable} == \hbox{absolutely integrable}

\displaystyle  {\mathcal F} \hbox{square integrable} == \hbox{square integrable}

(with {{\mathcal F}} denoting the Fourier transform) and so forth. Or, in the class of topological spaces, we have for instance

\displaystyle  \hbox{compact} \& \hbox{discrete} == \hbox{finite},

\displaystyle  \hbox{continuous}(\hbox{compact}) = \hbox{compact},

and

\displaystyle  \pi_1( \hbox{simply connected} ) = \hbox{trivial group}

while conversely the classifying space construction gives (among other things)

\displaystyle  \hbox{group} == \pi_1( \hbox{topological space} ).

Restricting to metric spaces, we have the well known equivalences

\displaystyle  \hbox{compact} == \hbox{sequentially compact} == \hbox{complete} \& \hbox{totally bounded}.

Note that in the last few examples we are genuinely working with proper classes, rather than sets. As the above examples hopefully demonstrate, mathematical sentences involving partially specified objects can align very well with the syntax of informal mathematical English, as long as one takes care to distinguish the asymmetric equality relation {=} from the symmetric equivalence relation {==}.

As an example of how such notation might be integrated into an actual mathematical argument, we prove a simple and well known topological result in this notation:

Proposition 2 Let {f: X \rightarrow Y} be a continuous bijection from a compact space {X} to a Hausdorff space {Y}. Then {f} is a homeomorphism.

Proof: We have

\displaystyle  f( \hbox{open subset of }X) = f(X \backslash \hbox{closed subset of } X)

\displaystyle  = Y \backslash f(\hbox{closed subset of } X)

(since {f} is a bijection)

\displaystyle  = Y \backslash f(\hbox{compact subset of } X)

(since {X} is compact)

\displaystyle  = Y \backslash (\hbox{compact subset of } Y)

(since {f} is continuous)

\displaystyle  = Y \backslash (\hbox{closed subset of } Y)

(since {Y} is Hausdorff)

\displaystyle  = \hbox{open subset of } Y.

Thus {f} is open, hence {f^{-1}} is continuous. Since {f} was already continuous, {f} is a homeomorphism. \Box

— 2. Working with parameters —

In order to introduce asymptotic notation, we will need to combine the above conventions for partially specified objects with a separate, common adjustment to the grammar of mathematical logic, namely the ability to work with ambient parameters. This is a special case of the more general situation of interpreting logic over an elementary topos, but we will not develop the general theory of topoi here. As this adjustment is orthogonal to the adjustments in the preceding section, we shall for simplicity revert back temporarily to the traditional notational conventions for completely specified objects, and not refer to partially specified objects at all in this section.

In the formal language of first-order logic, variables such as {x,n,X} are understood to range in various domains of discourse (e.g., {x} could range in the real numbers, {n} could range in the natural numbers, and {X} in the class of sets). One can then construct various formulas, such as {P(x,n,X)}, which involve zero or more input variables (known as free variables), and have a truth value in {\{ \hbox{true}, \hbox{false} \}} for any given choice of free variables. For instance, {P(x,n,X)} might be true for some triples {x \in {\bf R}, n \in {\bf N}, X \in \mathrm{Set}}, and false for others. One can create formulas either by applying relations to various terms (e.g., applying the inequality relation {\leq} to the terms {x+y, z+w} gives the formula {x+y \leq z+w} with free variables {x,y,z,w}), or by combining existing formulas together with logical connectives (such as {\wedge, \vee, \neg, \implies}) or quantifiers ({\forall} and {\exists}). Formulas with no free variables (e.g. {\forall x \forall y: x+y=y+x}) are known as sentences; once one fixes the domains of discourse, sentences are either true or false. We will refer to this first-order logic as orthodox first-order logic, to distinguish it from the parameterised first-order logic we shall shortly introduce.

We now generalise this setup by working relative to some ambient set of parameters – some finite collection of variables that range in some specified sets (or classes) and may be subject to one or more constraints. For instance, one may be working with some natural number parameters {n,N \in {\bf N}} with the constraint {n \leq N}; we will keep this particular choice of parameters as a running example for the discussion below. Once one selects these parameters, all other variables under consideration are not just single elements of a given domain of discourse, but rather a family of such elements, parameterised by the given parameters; we will refer to these variables as parameterised variables to distinguish them from the orthodox variables of first-order logic. For instance, with the above parameters, when one refers to a real number {x}, one now refers not just to a single element of {{\bf R}}, but rather to a function {x: (n,N) \mapsto x(n,N)} that assigns a real number {x(n,N)} to each choice of parameters {(n,N)}; we will refer to such a function {x: (n,N) \mapsto x(n,N)} as a parameterised real, and often write {x = x(n,N)} to indicate the dependence on parameters. Each of the ambient parameters can of course be viewed as a parameterised variable, thus for instance {n} can (by abuse of notation) be viewed as the parameterised natural number that maps {(n,N)} to {n} for each choice {(n,N)} of parameters.

The specific ambient set of parameters, and the constraints on them, tends to vary as one progresses through various stages of a mathematical argument, with these changes being announced by various standard phrases in mathematical English. For instance, if at some point a proof contains a sentence such as “Let {N} be a natural number”, then one is implicitly adding {N} to the set of parameters; if one later states “Let {n} be a natural number such that {n \leq N}“, then one is implicitly also adding {n} to the set of parameters and imposing a new constraint {n \leq N}. If one divides into cases, e.g., “Suppose now that {n} is odd… now suppose instead that {n} is even”, then the constraint that {n} is odd is temporarily imposed, then replaced with the complementary constraint that {n} is even, then presumably the two cases are combined and the constraint is removed completely. A bit more subtly, parameters can disappear at the conclusion of a portion of an argument (e.g., at the end of a proof of a lemma or proposition in which the parameter was introduced), replaced instead by a summary statement (e.g., “To summarise, we have shown that whenever {n,N} are natural numbers with {n \leq N}, then …”) or by the statement of the lemma or proposition in whose proof the parameter was temporarily introduced. One can also remove a variable from the set of parameters by specialising it to a specific value.

Any term that is well-defined for individual elements of a domain, is also well-defined for parameterised elements of the domain by pointwise evaluation. For instance, if {x = x(n,N)} and {y = y(n,N)} are parameterised real numbers, one can form the sum {x+y = (x+y)(n,N)}, which is another parameterised real number, by the formula

\displaystyle  (x+y)(n,N) := x(n,N) + y(n,N).

Given a relation between terms involving parameterised variables, we will interpret the relation as being true (for the given choice of parameterised variables) if it holds for all available choices of parameters {(n,N)} (obeying all ambient constraints), and false otherwise (i.e., if it fails for at least one choice of parameters). For instance, the relation

\displaystyle  x \leq y

would be interpreted as true if one has {x(n,N) \leq y(n,N)} for all choices of parameters {(n,N)}, and false otherwise.

Remark 3 In the framework of nonstandard analysis, the interpretation of truth is slightly different; the above relation would be considered true if the set of parameters for which the relation holds lies in a given (non-principal) ultrafilter. The main reason for doing this is that it allows for a significantly more general transfer principle than the one available in this setup; however we will not discuss the nonstandard analysis framework further here. (Our setup here is closer in spirit to the “cheap” version of nonstandard analysis discussed in this previous post.)

With this convention an annoying subtlety emerges with regard to boolean connectives (conjunction {\wedge}, disjunction {\vee}, implication {\implies}, and negation {\neg}), in that one now has to distinguish between internal interpretation of the connectives (applying the connectives pointwise for each choice of parameters before quantifying over parameters), and external interpretation (applying the connectives after quantifying over parameters); there is not a general transfer principle from the former to the latter. For instance, the sentence

\displaystyle  n \hbox{ is odd}

is false in parameterised logic, since not every choice of parameter {n} is odd. On the other hand, the internal negation

\displaystyle  \neg( n \hbox{ is odd})

or equivalently

\displaystyle  n \hbox{ is even}

is also false in parameterised logic, since not every choice of parameter {n} is even. To put it another way, the internal disjunction

\displaystyle  (n \hbox{ is odd}) \vee (n \hbox{ is even})

is true in parameterised logic, but the individual statements {n \hbox{ is odd}} and {n \hbox{ is even}} are not (so the external disjunction of these statements is false). To maintain this distinction, I will reserve the boolean symbols ({\vee, \wedge, \implies, \neg}) for internal boolean connectives, and reserve the corresponding English connectives (“and”, “or”, “implies”, “not”) for external boolean connectives.
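
A small finite-range Python illustration of these conventions (my own sketch; the real parameter space is infinite, so the code only samples a finite box of parameters): a parameterised variable is a function of the parameters {(n,N)}, and a relation is declared true only if it holds at every sampled choice of parameters. In particular, neither “{n} is odd” nor “{n} is even” comes out true, even though their internal disjunction does.

    # Sample parameter space: pairs (n, N) of naturals with n <= N (finite box only).
    PARAMS = [(n, N) for N in range(1, 30) for n in range(1, N + 1)]

    def holds(statement):
        # a relation between parameterised quantities is true iff it holds
        # for every (sampled) choice of parameters
        return all(statement(n, N) for (n, N) in PARAMS)

    assert holds(lambda n, N: n <= N)                     # "n <= N" is true
    assert not holds(lambda n, N: n % 2 == 1)             # "n is odd" is false
    assert not holds(lambda n, N: n % 2 == 0)             # "n is even" is also false
    assert holds(lambda n, N: n % 2 == 1 or n % 2 == 0)   # the internal disjunction is true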

Because of this subtlety, orthodox dichotomies and trichotomies do not automatically transfer over to the parameterised setting. In the orthodox natural numbers, a natural number {n} is either odd or even; but a parameterised natural number {n} could be neither even all the time nor odd all the time. Similarly, given two parameterised real numbers {x,y}, it could be that none of the statements {x<y}, {x=y}, {x>y} are true all the time. However, one can recover these dichotomies or trichotomies by subdividing the parameter space into cases. For instance, in the latter example, one could divide the parameter space into three regions, one where {x<y} is always true, one where {x=y} is always true, and one where {x>y} is always true. If one can prove a single statement in all three subregions of parameter space, then of course this implies the statement in the original parameter space. So in practice one can still use dichotomies and trichotomies in parameterised logic, so long as one is willing to subdivide the parameter space into cases at various stages of the argument and recombine them later.

There is a similar distinction between internal quantification (quantifying over orthodox variables before quantifying over parameters), and external quantification (quantifying over parameterised variables after quantifying over parameters); we will again maintain this distinction by reserving the symbols {\forall, \exists} for internal quantification and the English phrases “for all” and “there exists” for external quantification. For instance, the assertion

\displaystyle  x+y = y+x \hbox{ for all } x, y \in {\bf R}

when interpreted in parameterised logic, means that for all parameterised reals {x: (n,N) \mapsto x(n,N)} and {y: (n,N) \mapsto y(n,N)}, the assertion {x(n,N)+ y(n,N) = y(n,N) + x(n,N)} holds for all {n,N}. In this case it is clear that this assertion is true and is in fact equivalent to the orthodox sentence {\forall x,y \in {\bf R}: x+y = y+x}. More generally, we do have a restricted transfer principle in that any true sentence in orthodox logic that involves only quantifiers and no boolean connectives, will transfer over to parameterised logic (at least if one is willing to use the axiom of choice freely, which we will do in this post). We illustrate this (somewhat arbitrarily) with the Lagrange four square theorem

\displaystyle  \forall m \in {\bf N} \exists a,b,c,d \in {\bf N}: m = a^2+b^2+c^2+d^2. \ \ \ \ \ (5)

This sentence, true in orthodox logic, implies the parameterised assertion that for every parameterised natural number {m: (n,N) \mapsto m(n,N)}, there exist parameterised natural numbers {a: (n,N) \mapsto a(n,N)}, {b: (n,N) \mapsto b(n,N)}, {c: (n,N) \mapsto c(n,N)}, {d: (n,N) \mapsto d(n,N)}, such that {m(n,N) = a(n,N)^2 + b(n,N)^2 + c(n,N)^2 + d(n,N)^2} for all choices of parameters {(n,N)}. To see this, we can Skolemise the four-square theorem (5) to obtain functions {\tilde a: m \mapsto \tilde a(m)}, {\tilde b: m \mapsto \tilde b(m)}, {\tilde c: m \mapsto \tilde c(m)}, {\tilde d: m \mapsto \tilde d(m)} such that

\displaystyle  m = \tilde a(m)^2 + \tilde b(m)^2 + \tilde c(m)^2 + \tilde d(m)^2

for all orthodox natural numbers {m}. Then to obtain the parameterised claim, one simply sets {a(n,N) := \tilde a(m(n,N))}, {b(n,N) := \tilde b(m(n,N))}, {c(n,N) := \tilde c(m(n,N))}, and {d(n,N) := \tilde d(m(n,N))}. Similarly for other sentences that avoid boolean connectives. (There are some further classes of sentences that use boolean connectives in a restricted fashion that can also be transferred, but we will not attempt to give a complete classification of such classes here; in general it is better to work out some examples of transfer by hand to see what can be safely transferred and which ones cannot.)
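
Here is a hedged Python sketch of this transfer-by-Skolemisation step (my own illustration; the brute-force search below is just a toy witness-finder for small inputs, and the parameterised natural number {m(n,N)} is a hypothetical example): once one has Skolem functions for the four-square theorem, composing them with {m(n,N)} produces the parameterised witnesses.

    import math

    def four_square(m):
        # brute-force Skolem witness (a, b, c, d) with m = a^2 + b^2 + c^2 + d^2;
        # adequate for small m only
        r = math.isqrt(m)
        for a in range(r + 1):
            for b in range(r + 1):
                for c in range(r + 1):
                    d2 = m - a * a - b * b - c * c
                    if d2 >= 0 and math.isqrt(d2) ** 2 == d2:
                        return a, b, c, math.isqrt(d2)
        raise AssertionError("unreachable, by Lagrange's four-square theorem")

    # A hypothetical parameterised natural number m(n, N), and the induced witnesses.
    m = lambda n, N: n * N + 7
    a = lambda n, N: four_square(m(n, N))[0]     # similarly for b, c, d

    for (n, N) in [(1, 2), (3, 10), (7, 7)]:
        witnesses = four_square(m(n, N))
        assert sum(t * t for t in witnesses) == m(n, N)
        assert witnesses[0] == a(n, N)   # the composed Skolem function agrees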

So far this setup is not significantly increasing the expressiveness of one’s language, because any statement constructed so far in parameterised logic can be quickly translated to an equivalent (and only slightly longer) statement in orthodox logic. However, one gains more expressive power when one allows one or more of the parameterised variables to have a specified type of dependence on the parameters, and in particular depending only on a subset of the parameters. For instance, one could introduce a real number {C} which is an absolute constant in the sense that it does not depend on either of the parameters {n,N}; these are a special type of parameterised real, in much the same way that constant functions are a special type of function. Or one could consider a parameterised real {a = a(N)} that depends on {N} but not on {n}, or a parameterised real {b = b(n)} that depends on {n} but not on {N}. (One could also place other types of constraints on parameterised quantities, such as continuous or measurable dependence on the parameters, but we will not consider these variants here.)

By quantifying over these restricted classes of parameterised functions, one can now efficiently write down a variety of statements in parameterised logic, of types that actually occur quite frequently in analysis. For instance, we can define a parameterised real {x = x(n,N)} to be bounded if there exists an absolute constant {C} such that {|x| \leq C}; one can of course write this assertion equivalently in orthodox logic as

\displaystyle  \exists C \forall n,N: ((n \leq N) \implies (|x(n,N)| \leq C)).

One can also define the stronger notion of {x} being {1}-bounded, by which we mean {|x| \leq 1}, or in orthodox logic

\displaystyle  \forall n,N: ((n \leq N) \implies (|x(n,N)| \leq 1)).

In the opposite direction, we can assert the weaker statement that {x = x(n,N)} is bounded in magnitude by a quantity {C = C(N)} that depends on {N} but not on {n}; in orthodox logic this becomes

\displaystyle  \forall N \exists C \forall n \leq N: |x(n,N)| \leq C.

As before, each of the example statements in parameterised logic can be easily translated into a statement in traditional logic. On the other hand, consider the assertion that a parameterised real {x = x(n,N)} is expressible as the sum {x = y + z} of a quantity {y = y(n)} depending only on {n} and a quantity {z = z(N)} depending on {N}. (For instance, the parameterised real {x = (N+n)(N-n) = N^2 - n^2} would be of this form, but the parameterised real {x = Nn} would not.) Now it becomes significantly harder to translate this statement into first-order logic! One can still do so fairly readily using second-order logic (in which one also is permitted to quantify over operators as well as variables), or by using the language of set theory (so that one can quantify over a set of functions of various forms). Indeed if one is parameterising over proper classes rather than sets, it is even possible to create sentences in parameterised logic that are non-firstorderisable; see this previous blog post for more discussion.
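
For what it is worth, the positive half of the decomposition example above can be checked mechanically (a finite-range Python sanity check of my own, not a proof and not part of the original text): {x = (N+n)(N-n) = N^2 - n^2} does split as a part depending only on {n} plus a part depending only on {N}.

    x = lambda n, N: (N + n) * (N - n)   # the parameterised real x(n, N)
    y = lambda n: -n * n                 # depends only on n
    z = lambda N: N * N                  # depends only on N

    assert all(x(n, N) == y(n) + z(N)
               for N in range(1, 50) for n in range(1, N + 1))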

Another subtle distinction that arises once one has parameters is the distinction between “internal” or “parameterised” sets (sets depending on the choice of parameters), and external sets (sets of parameterised objects). For instance, the interval {[n,N]} is an internal set – it assigns an orthodox set {\{ x \in {\bf R}: n \leq x \leq N \}} of reals to each choice of parameters {(n,N)}; elements of this set consist of all the parameterised reals {x = x(n,N)} such that {n \leq x(n,N) \leq N} for all {n,N}. On the other hand, the collection {{\mathcal O}(1)} of bounded reals – i.e., parameterised reals {x = x(n,N)} such that there is a constant {C} for which {|x(n,N)| \leq C} for all choices of parameters {(n,N)} – is not an internal set; it does not arise from taking an orthodox set of reals {X(n,N)} for each choice of parameters. (Indeed, if it did do so, since every constant real is bounded, each {X(n,N)} would contain all of {{\bf R}}, which would make {{\mathcal O}(1)} the set of all parameterised reals, rather than just the bounded reals.) To maintain this distinction, we will reserve set builder notation such as {\{ x \in X: P(x) \}} for internally defined sets, and use English words (such as “the collection of all bounded parameterised reals”) to denote external sets. In particular, we do not make sense of such expressions as {\{ x \in {\bf R}: x \hbox{ bounded} \}} (or {\{ x \in {\bf R}: x = O(1) \}}, once asymptotic notation is introduced). In general, I would recommend that one avoids combining asymptotic notation with heavy use of set theoretic notation, unless one knows exactly what one is doing.

— 3. Asymptotic notation —

We now simultaneously introduce the two extensions to orthodox first order logic discussed in previous sections. In other words,

  1. We permit the use of partially specified mathematical objects in one’s mathematical statements (and in particular, on either side of an equation or inequality).
  2. We allow all quantities to depend on one or more of the ambient parameters.

In particular, we allow for the use of partially specified mathematical quantities that also depend on one or more of the ambient parameters. This now allows us to formally introduce asymptotic notation. There are many variants of this notation, but here is one set of asymptotic conventions that I for one like to use:

Definition 4 (Asymptotic notation) Let {X} be a non-negative quantity (possibly depending on one or more of the ambient parameters).
  • We use {O(X)} to denote a partially specified quantity in the class of quantities {Y} (that can depend on one or more of the ambient parameters) that obeys the bound {|Y| \leq CX} for some absolute constant {C > 0}. More generally, given some ambient parameters {\lambda_1,\dots,\lambda_k}, we use {O_{\lambda_1,\dots,\lambda_k}(X)} to denote a partially specified quantity in the class of quantities {Y} that obeys the bound {|Y| \leq C_{\lambda_1,\dots,\lambda_k} X} for some constant {C_{\lambda_1,\dots,\lambda_k} > 0} that can depend on the {\lambda_1,\dots,\lambda_k} parameters, but not on the other ambient parameters.
  • We also use {X \ll Y} or {Y \gg X} as a synonym for {X = O(Y)}, and {X \asymp Y} as a synonym for {X \ll Y \ll X}. (In some fields of analysis, {X \lesssim Y}, {X \gtrsim Y}, and {X \sim Y} are used instead of {X \ll Y}, {X \gg Y}, and {X \asymp Y}.)
  • If {x} is a parameter and {x_*} is a limiting value of that parameter (i.e., the parameter space for {x} and {x_*} both lie in some topological space, with {x_*} an adherent point of that parameter space), we use {o_{x \rightarrow x_*}(X)} to denote a partially specified quantity in the class of quantities {Y} (that can depend on {x} as well as the other ambient parameters) that obeys a bound of the form {|Y| \leq c(x) X} for all {x} in some neighborhood of {x_*}, and for some quantity {c(x) > 0} depending only on {x} such that {c(x) \rightarrow 0} as {x \rightarrow x_*}. More generally, given some further ambient parameters {\lambda_1,\dots,\lambda_k}, we use {o_{x \rightarrow x_*; \lambda_1,\dots,\lambda_k}(X)} to denote a partially specified quantity in the class of quantities {Y} that obeys a bound of the form {|Y| \leq c_{\lambda_1,\dots,\lambda_k}(x) X} for all {x} in a neighbourhood of {x_*} (which can also depend on {\lambda_1,\dots,\lambda_k}) where {c_{\lambda_1,\dots,\lambda_k}(x) > 0} depends on {\lambda_1,\dots,\lambda_k,x} and goes to zero as {x \rightarrow x_*}. (In this more general form, the limit point {x_*} is now also permitted to depend on the parameters {\lambda_1,\dots,\lambda_k}).
Sometimes (by explicitly declaring one will do so) one suppresses the dependence on one or more of the additional parameters {\lambda_1,\dots,\lambda_k}, and/or the asymptotic limit {x \rightarrow x_*}, in order to reduce clutter.

(This is the “non-asymptotic” form of the {O()} notation, in which the bounds are assumed to hold for all values of parameters. There is also an “asymptotic” variant of this notation that is commonly used in some fields, in which the bounds in question are only assumed to hold in some neighbourhood of an asymptotic value {x_*}, but we will not focus on that variant here.)

Thus, for instance, if {n} is a free variable taking values in the natural numbers, and {f(n), g(n), h(n)} are quantities depending on {n}, then the statement {f(n) = g(n) + O(h(n))} denotes the assertion that {f(n) = g(n) + k(n)} for all natural numbers {n}, where {k(n)} is another quantity depending on {n} such that {|k(n)| \leq C h(n)} for all {n}, and some absolute constant {C} independent of {n}. Similarly, {f(n) \leq g(n) + O(h(n))} denotes the assertion that {f(n) \leq g(n) + k(n)} for all natural numbers {n}, where {k(n)} is as above.
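
As a concrete Python illustration of this translation (my own example, with hypothetical functions {f,g,h} and a finite range of {n} standing in for all natural numbers), one can unpack {f(n) = g(n) + O(h(n))} into an explicit witness {k(n) = f(n) - g(n)} together with a candidate absolute constant {C}:

    # Hypothetical example: f(n) = n^2 + 3n, g(n) = n^2, h(n) = n, so that
    # f(n) = g(n) + O(h(n)) holds with k(n) = 3n and absolute constant C = 3.
    f = lambda n: n * n + 3 * n
    g = lambda n: n * n
    h = lambda n: n
    C = 3

    k = lambda n: f(n) - g(n)
    # finite sanity check of |k(n)| <= C * h(n); an actual proof would cover all n
    assert all(abs(k(n)) <= C * h(n) for n in range(1, 10_000))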

For a slightly more sophisticated example, consider the statement

\displaystyle  O(n) + O(n^2) \leq O(n^2), \ \ \ \ \ (6)

where again {n} is a free variable taking values in the natural numbers. Using the conventions for multi-valued expressions, we can translate this expression into first-order logic as the assertion that whenever {f(n), g(n)} are quantities depending on {n} such that there exists a constant {C_1} such that {|f(n)| \leq C_1 n} for all natural numbers {n}, and there exists a constant {C_2} such that {|g(n)| \leq C_2 n^2} for all natural numbers {n}, then we have {f(n)+g(n) \leq h(n)} for all {n}, where {h} is a quantity depending on natural numbers {n} with the property that there exists a constant {C_3} such that {|h(n)| \leq C_3 n^2}. Note that the first-order translation of (6) is substantially longer than (6) itself; and once one gains familiarity with the big-O notation, (6) can be deciphered much more quickly than its formal first-order translation.

It can be instructive to rewrite some basic notions in analysis in this sort of notation, just to get a slightly different perspective. For instance, if {f: {\bf R} \rightarrow {\bf R}} is a function, then:

  • {f} is continuous iff one has {f(y) = f(x) + o_{y \rightarrow x; f,x}(1)} for all {x,y \in {\bf R}}.
  • {f} is uniformly continuous iff one has {f(y) = f(x) + o_{|y-x| \rightarrow 0; f}(1)} for all {x,y \in {\bf R}}.
  • A sequence {F = (f_n)_{n \in {\bf N}}} of functions is equicontinuous if one has {f_n(y) = f_n(x) + o_{y \rightarrow x; F,x}(1)} for all {x,y \in {\bf R}} and {n \in {\bf N}} (note that the implied constant depends on the family {F}, but not on the specific function {f_n} or on the index {n}).
  • A sequence {F = (f_n)_{n \in {\bf N}}} of functions is uniformly equicontinuous if one has {f_n(y) = f_n(x) + o_{|y-x| \rightarrow 0; F}(1)} for all {x,y \in {\bf R}} and {n \in {\bf N}}.
  • {f} is differentiable iff one has {f(y) = f(x) + (y-x) f'(x) + o_{y \rightarrow x; f,x}(|y-x|)} for all {x,y \in {\bf R}}.
  • Similarly for uniformly differentiable, equidifferentiable, etc..

Remark 5 One can define additional variants of asymptotic notation such as {\Omega(X)}, {\omega(X)}, or {\Theta(X)}; see this wikipedia page for some examples. See also the related notion of “sufficiently large” or “sufficiently small”. However, one can usually reformulate such notations in terms of the above-mentioned asymptotic notations with a little bit of rearranging. For instance, the assertion

\displaystyle  \hbox{For all sufficiently large } n, P(n) \hbox{ is true}

can be rephrased as an alternative:

\displaystyle  \hbox{If } n \hbox{ is a natural number, then } P(n) \hbox{, or } n = O(1).

When used correctly, asymptotic notation can suppress a lot of distracting quantifiers (“there exists a {C} such that for every {n} one has…”) or temporary notation which is introduced once and then discarded (“where {C} is a constant, not necessarily the same as the constant {C} from the preceding line…”). It is particularly well adapted to situations in which the order of magnitude of a quantity of interest is of more importance than its exact value, and can help capture precisely such intuitive notions as “large”, “small”, “lower order”, “main term”, “error term”, etc.. Furthermore, I find that analytic assertions phrased using asymptotic notation tend to align better with the natural sentence structure of mathematical English than their logical equivalents in other notational conventions (e.g. first-order logic).

On the other hand, the notation can be somewhat confusing to use at first, as expressions involving asymptotic notation do not always obey the familiar laws of mathematical deduction if applied blindly; but the failures can be explained by the changes to orthodox first order logic indicated above. For instance, if {n} is a positive integer (which we will assume to be at least say {100}, in order to make sense of quantities such as {\log\log n}), then

  • (i) (Asymmetry of equality) We have {O(n) = O(n^2)}, but it is not true that {O(n^2) = O(n)}. In the same spirit, {O(n) \leq O(n^2)} is a true statement, but {O(n^2) \geq O(n)} is a false statement. Similarly for the {o()} notation. This of course stems from the asymmetry of the equality relation {=} that arises once one introduces partially specified objects.
  • (ii) (Intransitivity of equality) We have {n+1 = O(n)}, and {n+2 = O(n)}, but {n+1 \neq n+2}. This is again stemming from the asymmetry of the equality relation.
  • (iii) (Incompatibility with functional notation) {O(n)} generally refers to a function of {n}, but {O(1)} usually does not refer to a function of {1} (for instance, it is true that {\frac{1}{n} = O(1)}). This is a slightly unfortunate consequence of the overloaded nature of the parentheses symbols in mathematics, but as long as one keeps in mind that {O} and {o} are not function symbols, one can avoid ambiguity.
  • (iv) (Incompatibility with mathematical induction) We have {O(n+1)=O(n)}, and more generally {O(n+k+1)=O(n+k)} for any {k \geq 1}, but one cannot blindly apply induction and conclude that {O(n+k)=O(n)} for all {n,k \geq 1} (with {k} viewed as an additional parameter). This is because to induct on an internal parameter such as {n}, one is only allowed to use internal predicates {P(n)}; the assertion {O(n+1)=O(n)}, which also quantifies externally over some implicit constants {C}, is not an internal predicate. However, external induction is still valid, permitting one to conclude that {O(n+k)=O(n)} for any fixed (external) {k \geq 1}, or equivalently that {O(n+k) = O_k(n)} if {k} is now viewed instead as a parameter.
  • (v) (Failure of the law of generalisation) Every specific (or “fixed”) positive integer, such as {57}, is of the form {O(1)}, but the positive integer {n} is not of the form {O(1)}. (Again, this can be fixed by allowing implied constants to depend on the parameter one is generalising over.) Like (iv), this arises from the need to distinguish between external (fixed) variables and internal parameters.
  • (vi) (Failure of the axiom schema of specification) Given a set {X} and a predicate {P(x)} involving elements {x} of {X}, the axiom of specification allows one to use set builder notation to form the set {\{ x \in X: P(x) \}}. However, this is no longer possible if {P} involves asymptotic notation. For instance, one cannot form the “set” {\{ x \in {\bf R}: x = O(1) \}} of bounded real numbers, which somehow manages to contain all fixed numbers such as {57}, but not any unbounded free parameters such as {n}. (But, if one uses the nonstandard analysis formalism, it becomes possible again to form such sets, with the important caveat that such sets are now external sets rather than internal sets. For instance, the external set {\{ x \in {}^* {\bf R}: x = O(1) \}} of bounded nonstandard reals becomes a proper subring of the ring of nonstandard reals.) This failure is again related to the distinction between internal and external predicates.
  • (vii) (Failure of trichotomy) For non-asymptotic real numbers {X,Y}, exactly one of the statements {X<Y}, {X=Y}, {X>Y} hold. As discussed in the previous section, this is not the case for asymptotic quantities: none of the three statements {O(|\sin(n)|) = O(|\cos(n)|)}, {O(|\sin(n)|) > O(|\cos(n)|)}, or {O(|\sin(n)|) < O(|\cos(n)|)} are true, while all three of the statements {O(n) < O(n)}, {O(n) = O(n)}, and {O(n) > O(n)} are true. (This trichotomy can however be restored by using the nonstandard analysis formalism, or (in some cases) by restricting {n} to an appropriate subsequence whenever necessary.)
  • (viii) (Unintuitive interaction with {\neq}) Asymptotic notation interacts in strange ways with the {\neq} symbol, to the extent that combining the two together is not recommended. For instance, the statement {O(n) \neq O(n)} is a true statement, because for any expression {a_n} of order {a_n = O(n)}, one can find another expression {b_n} of order {b_n = O(n)} such that {a_n \neq b_n} for all {n}. Instead of using statements such as {X \neq Y} in which one of {X,Y} contain asymptotic notation, I would instead recommend using the different statement “it is not the case that {X=Y}“, e.g. “it is not the case that {O(n^2)=O(n)}“. And even then, I would generally only use negation of asymptotic statements in order to demonstrate the incorrectness of some particular argument involving asymptotic notation, and not as part of any positive argument involving such notations. These issues are of course related to (vii).
  • (ix) (Failure of cancellation law) We have {O(n) + O(n) = O(n)}, but one cannot cancel one of the {O(n)} terms and conclude that {O(n) = 0}. Indeed, {O(n)-O(n)} is not equal to {0} in general. (For instance, {2n = O(n)} and {n=O(n)}, but {2n-n \neq 0}.) More generally, {O(X)-O(Y)} is not in general equal to {O(X-Y)} or even to {O(|X-Y|)} (although there is an important exception when one of {X,Y} dominates the other). Similarly for the {o()} notation. This stems from the care one has to take in the law of substitution when working with partially specified quantities that appear multiple times on the left-hand side.
  • (x) ({O()}, {o()} do not commute with signed multiplication) If {X,Y} are non-negative, then {X \cdot O(Y) = O(XY)} and {X \cdot o_{n \rightarrow \infty}(Y) = o_{n \rightarrow \infty}(XY)}. However, these laws do not work if {X} is signed; indeed, as currently defined {O(XY)} and {o_{n \rightarrow \infty}(XY)} do not even make sense. Thus for instance {-O(n)} cannot be written as {O(-n)}. (However, one does have {X \cdot O(Y) = O(|X| Y)} and {X \cdot o_{n \rightarrow \infty}(Y) = o_{n \rightarrow \infty}(|X| Y)} when {X} is signed.) This comes from the absolute values present in the {O()}-notation. For beginners, I would recommend not placing any signed quantities inside the {O()} and {o()} symbols if at all possible.
  • (xi) ({O()} need not distribute over summation) For each fixed {k}, {k = O(1)}, and {\sum_{k=1}^n 1 = n}, but it is not the case that {\sum_{k=1}^n k = O(n)}. This example seems to indicate that the assertion {\sum_{k=1}^n O(1) = O( \sum_{k=1}^n 1 )} is not true, but that is because we have conflated an external (fixed) quantity {k} with an internal parameter {k} (the latter being needed to define the summation {\sum_{k=1}^n}). The more precise statements (with {k} now consistently an internal parameter) are that {k = O_k(1)}, and that the assertion {\sum_{k=1}^n O_k(1) = O( \sum_{k=1}^n 1 )} is not true, but the assertion {\sum_{k=1}^n O(1) = O( \sum_{k=1}^n 1 )} is still true (why?).
  • (xii) ({o()} does not distribute over summation, I) Let {\varepsilon_k(n) := \exp(\exp(n)) 1_{k \geq \log\log n - 1}}, then for each fixed {k} one has {\varepsilon_k(n) = o_{n \rightarrow \infty}(1)}; however, {\sum_{k=1}^{\log\log n} \varepsilon_k(n) \geq \exp(\exp(n))}. Thus an expression of the form {\sum_{k=1}^{\log\log n} o_{n \rightarrow \infty}(1)} can in fact grow extremely fast in {n} (and in particular is not of the form {o_{n \rightarrow \infty}(\log\log n)} or even {O(\log\log n)}). Of course, one could replace {\log\log n} here by any other growing function of {n}. This is a similar issue to (xi); it shows that the assertion

    \displaystyle  \sum_{k=1}^N o_{n \rightarrow \infty;k}(1) = o_{n \rightarrow \infty}(\sum_{k=1}^N 1)

    can fail, but if one has uniformity in the {k} parameter then things are fine:

    \displaystyle  \sum_{k=1}^N o_{n \rightarrow \infty}(1) = o_{n \rightarrow \infty}(\sum_{k=1}^N 1).

  • (xiii) ({o()} does not distribute over summation, II) In the previous example, the {o_{n \rightarrow \infty}(1)} summands were not uniformly bounded. If one imposes uniform boundedness, then one now recovers the {O()} bound, but one can still lose the {o()} bound. For instance, let {\varepsilon_k(n) := 1_{k \geq \log\log n}}, then {\varepsilon_k(n)} is now uniformly bounded in magnitude by {1}, and for each fixed {k} one has {\varepsilon_k(n) = o_{n \rightarrow \infty}(1)}; however, {\sum_{k=1}^n \varepsilon_k(n) = n - O(\log\log n)}. Thus, viewing {k} now as a parameter, the expression {\sum_{k=1}^n o_{n \rightarrow \infty;k}(1)} is bounded by {O(n)}, but not by {o_{n \rightarrow \infty}(n)} (see the numerical sketch after this list). (However, one can write {\sum_{k=1}^n o_{n \rightarrow \infty}(a_k) = o_{n \rightarrow \infty}(\sum_{k=1}^n |a_k|)} since by our conventions the implied decay rates in the {o_{n \rightarrow \infty}(a_k)} summands are uniform in {k}.)
  • (xiv) ({o()} does not distribute over summation, III) If {a_k} are non-negative quantities, and one has a summation of the form {\sum_{k=1}^n (1+o_{n \rightarrow \infty}(1)) a_k} (noting here that the decay rate is not allowed to depend on {k}), then one can “factor out” the {1+o_{n \rightarrow \infty}(1)} term to write this summation as {(1 + o_{n \rightarrow \infty}(1)) (\sum_{k=1}^n a_k)}. However this is far from being true if the sum {\sum_{k=1}^n a_k} exhibits significant cancellation. This is most vivid in the case when the sum {\sum_{k=1}^n a_k} actually vanishes. For another example, the sum {\sum_{k=1}^{n} (1 + \frac{(-1)^k}{\log\log n}) (-1)^k} is equal to {\frac{n}{\log\log n} + O(1)}, despite the fact that {1 + \frac{(-1)^k}{\log\log n} = 1 + o_{n \rightarrow\infty}(1)} uniformly in {k}, and that {\sum_{k=1}^n (-1)^k = O(1)}. For oscillating {a_k}, the best one can say in general is that

    \displaystyle \sum_{k=1}^n (1+o_{n \rightarrow \infty}(1)) a_k = \sum_{k=1}^n a_k + o_{n \rightarrow \infty}( \sum_{k=1}^n |a_k| ).

    Similarly for the {O()} notation. I see this type of error often among beginner users of asymptotic notation. Again, the general remedy is to avoid putting any signed quantities inside the {O()} or {o()} notations.
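
As promised in item (xiii), here is a small numerical sketch in Python (my own illustration) of how termwise {o_{n \rightarrow \infty}(1)} bounds can fail to sum to {o_{n \rightarrow \infty}(n)}: the partial sums of {\varepsilon_k(n) = 1_{k \geq \log\log n}} stay comparable to {n}.

    import math

    def partial_sum(n):
        # sum_{k=1}^{n} eps_k(n), where eps_k(n) = 1 if k >= log log n and 0 otherwise
        threshold = math.log(math.log(n))
        return sum(1 for k in range(1, n + 1) if k >= threshold)

    for n in (10**2, 10**4, 10**6):
        s = partial_sum(n)
        print(n, s, s / n)   # the ratio stays close to 1, so the sum is O(n) but not o(n)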

Perhaps the quickest way to develop some basic safeguards is to be aware of certain “red flags” that indicate incorrect, or at least dubious, uses of asymptotic notation, as well as complementary “safety indicators” that give more reassurance that the usage of asymptotic notation is valid. From the above examples, we can construct a small table of such red flags and safety indicators for any expression or argument involving asymptotic notation:

  • Red flag: Signed quantities in RHS. Safety indicator: Absolute values in RHS.
  • Red flag: Casually using iteration/induction. Safety indicator: Explicitly allowing bounds to depend on the length of the iteration/induction.
  • Red flag: Casually summing an unbounded number of terms. Safety indicator: Keeping the number of terms bounded and/or ensuring uniform bounds on each term.
  • Red flag: Casually changing a “fixed” quantity to a “variable” or “bound” one. Safety indicator: Keeping track of which parameters the implied constants depend on.
  • Red flag: Casually establishing lower bounds or asymptotics. Safety indicator: Establishing upper bounds and/or being careful with signs and absolute values.
  • Red flag: Signed algebraic manipulations (e.g., the cancellation law). Safety indicator: Unsigned algebraic manipulations.
  • Red flag: {X \neq Y}. Safety indicator: Negation of {X=Y}; or, better still, avoiding negation altogether.
  • Red flag: Swapping LHS and RHS. Safety indicator: Not swapping LHS and RHS.
  • Red flag: Using trichotomy of order. Safety indicator: Not using trichotomy of order.
  • Red flag: Set-builder notation. Safety indicator: Not using set-builder notation (or, in nonstandard analysis, distinguishing internal sets from external sets).

When I say here that some mathematical step is performed “casually”, I mean that it is done without any of the additional care that is necessary when this step involves asymptotic notation (that is to say, the step is performed by blindly applying some mathematical law that may be valid for manipulation of non-asymptotic quantities, but can be dangerous when applied to asymptotic ones). It should also be noted that many of these red flags can be disregarded if the portion of the argument containing the red flag is free of asymptotic notation. For instance, one could have an argument that uses asymptotic notation in most places, except at one stage where mathematical induction is used, at which point the argument switches to more traditional notation (using explicit constants rather than implied ones, etc.). This is in fact the opposite of a red flag, as it shows that the author is aware of the potential dangers of combining induction and asymptotic notation. Similarly for the other red flags listed above; for instance, the use of set-builder notation that conspicuously avoids using the asymptotic notation that appears elsewhere in an argument is reassuring rather than suspicious.

If one finds oneself trying to use asymptotic notation in a way that raises one or more of these red flags, I would strongly recommend working out that step as carefully as possible, ideally by writing out both the hypotheses and conclusions of that step in non-asymptotic language (with all quantifiers present and in the correct order), and seeing if one can actually derive the conclusion from the hypothesis by traditional means (i.e., without explicit use of asymptotic notation). Conversely, if one is reading a paper that uses asymptotic notation in a manner that casually raises several red flags without any apparent attempt to counteract them, one should be particularly skeptical of these portions of the paper.

As a simple example of asymptotic notation in action, we give a proof that convergent sequences also converge in the Cesàro sense:

Proposition 6 If {\vec x = (x_n)_{n \in {\bf N}}} is a sequence of real numbers converging to a limit {L}, then the averages {\frac{1}{N} \sum_{n=1}^N x_n} also converge to {L} as {N \rightarrow \infty}.

Proof: Since {x_n} converges to {L}, we have

\displaystyle  x_n = L + o_{n \rightarrow \infty; \vec x}(1)

so in particular for any {M \geq 1} we have

\displaystyle  x_n = L + o_{M \rightarrow \infty; \vec x}(1)

whenever {n > M}. For {N \geq M}, we thus have

\displaystyle  \frac{1}{N} \sum_{n=1}^N x_n = \frac{1}{N} ( \sum_{n=1}^M x_n + \sum_{n=M+1}^N x_n)

\displaystyle  = \frac{1}{N}( O_{M,\vec x}(1) + \sum_{n=M+1}^N (L + o_{M \rightarrow \infty; \vec x}(1)))

\displaystyle  = \frac{1}{N}( O_{M,\vec x}(1) + (N-M) L + o_{M \rightarrow \infty; \vec x}(N))

\displaystyle  = O_{M,\vec x}(1/N) + L - O_{M,L}(1/N) + o_{M \rightarrow \infty; \vec x}(1)

\displaystyle  = L + o_{M \rightarrow \infty; \vec x}(1) + o_{N \rightarrow \infty; \vec x, M, L}(1)

whenever {N \geq M}. Setting {M} to grow sufficiently slowly to infinity as {N \rightarrow \infty} (for fixed {\vec x, L}), we may simplify this to

\displaystyle  \frac{1}{N} \sum_{n=1}^N x_n = L + o_{N \rightarrow \infty; \vec x, L}(1)

for all {N \geq 1}, and the claim follows. \Box
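
A quick numerical illustration of Proposition 6 in Python (my own sketch, with a hypothetical sequence): a sequence tending to {L = 2} has Cesàro averages that also tend to {2}, though more slowly.

    L = 2.0
    x = lambda n: L + (-1) ** n / n ** 0.5   # x_n -> L as n -> infinity

    for N in (10, 1_000, 100_000):
        avg = sum(x(n) for n in range(1, N + 1)) / N
        print(N, avg)   # the Cesaro averages approach 2.0 as N grows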

May 20, 2022

Matt von HippelAt New Ideas in Cosmology

The Niels Bohr Institute is hosting a conference this week on New Ideas in Cosmology. I’m no cosmologist, but it’s a pretty cool field, so as a local I’ve been sitting in on some of the talks. So far they’ve had a selection of really interesting speakers with quite a variety of interests, including a talk by Roger Penrose with his trademark hand-stippled drawings.

Including this old classic

One thing that has impressed me has been the “interdisciplinary” feel of the conference. By all rights this should be one “discipline”, cosmology. But in practice, each speaker came at the subject from a different direction. They all had a shared core of knowledge, common models of the universe they all compare to. But the knowledge they brought to the subject varied: some had deep knowledge of the mathematics of gravity, others worked with string theory, or particle physics, or numerical simulations. Each talk, aware of the varied audience, was a bit “colloquium-style”, introducing a framework before diving into the latest research. Each speaker knew enough to talk to the others, but not so much that they couldn’t learn from them. It’s been unexpectedly refreshing, a real interdisciplinary conference done right.

May 18, 2022

Matt StrasslerEarth Orbits the Sun, or Not? Why Coordinates Can’t Be Relevant to the Question.

We’ve been having some fun recently with Sun-centered and Earth-centered coordinate systems, as related to a provocative claim by certain serious scientists, most recently Berkeley professor Richard Muller. They claim that in general relativity (Einstein’s theory of gravity, the same fantastic mathematical invention which predicted black holes and gravitational waves and gravitational lensing) the statement that “The Sun Orbits the Earth” is just as true as the statement that “The Earth Orbits the Sun”… or that perhaps both statements are equally meaningless.

But, uh… sorry. All this fun with coordinates was beside the point. The truth, falsehood, or meaninglessness of “the Earth orbits the Sun” will not be answered with a choice of coordinates. Coordinates are labels. In this context, they are simply ways of labeling points in space and time. Changing how you label a system changes only how you describe that system; it does not change anything physically meaningful about that system. So rather than focusing on coordinates and how they can make things appear, we should spend some time thinking about which things do not depend on our choice of coordinates.

And so our question really needs to be this: does the statement “The Earth Orbits the Sun (and not the other way round)” have coordinate-independent meaning, and if so, is it true?

Because we are dealing with the coordinate-independence of a four-dimensional spacetime, which is not the easiest thing to think about, it’s best to build some intuition by looking at a two-dimensional spatial shape first. Let’s look at what’s coordinate-independent and coordinate-dependent about the surface of the Earth.

Is “the Earth is Not Flat” a meaningful statement?

If Muller is right, can’t we choose coordinates in which the Earth is flat? For example, how about these coordinates:

or these:

or these:

Look familiar? The third choice represents the coordinates beloved of flat Earthers, and indeed, in this view, the entire Earth, excepting the point at the south pole, is flat. In fact, all three maps are flat. But they represent a flat labeling of the Earth, not a flattening of the Earth itself.

More generally, there are an infinite number of possible maps that can be made of the Earth that will make it appear flat. (Many of them are gathered here — have fun!) Some of them will make Greenland appear larger than the continent of Africa; others will make it seem that the center of the world is the USA, or China, or Ethiopia, or the South Pole; still others will make it seem that the south tip of South America is farther from Australia than it is from parts of Asia. It doesn’t matter how things seem. Coordinates can make all kinds of things seem to be one way or another. But anything that a coordinate choice can change cannot be real. The only thing that matters is how things are. A coordinate system is just about how you describe those things that are.

To avoid confusion about what is real and what is not, you need to know how to measure things in such a way that the answers you obtain don’t depend on your coordinates. The surface area of Greenland and Africa, the shortest distance from South America to Australia or Asia, and the lack of any “central point” on our planet are all things that you can determine using whatever coordinates you choose, or in some cases without ever using coordinates at all; and if you do it correctly, you will always get the same answer no matter which coordinates you use. (For example, to measure the area of Greenland, make yourself a 1 meter by 1 meter square piece of cardboard at home, and then go to Greenland yourself and draw a grid on it using your cardboard, until you’ve covered the whole thing.) You have to account for the distortions the map introduces; that takes some math, but if you use the math correctly, it will undo the distortions in just such a way as to assure that all true facts remain true facts. Do not let yourself be confused by the mere appearance of the map.

[… which is to say… just because I’ve chosen to draw the solar system so that it appears as though the Sun orbits the Earth doesn’t automatically mean that the Sun does orbit the Earth.]

So, how can we determine if a space is or isn’t flat? Here’s one approach: Take a square-shaped walk: walk N paces, turn right 90 degrees, walk N paces, turn right 90 degrees, walk N paces, turn right 90 degrees, and walk N paces. On any surface, if N is small enough you will come back to your starting point. As N increases, does this remain true? If the space is flat, it will always be true, no matter how long your walks, no matter where you start and no matter which direction you go — as long as your walk doesn’t collide with the edge of the space. But if, as N increases, you find your end point is further and further from your starting point, that tells you the space is not flat.

Of course the Earth, or any real-world surface, isn’t expected to be exactly a sphere or exactly flat; it has wiggles on it, in the form of hills and valleys. So we have to allow for these statements to not be exactly correct. However, the largest wiggles on Earth are only about 10 miles deep or tall, while the planet is 24000 miles around, so we should have no trouble distinguishing a nearly-flat Earth from a nearly-spherical Earth.

Imagine you start at the north pole, walk 6225 miles (about 10000 km) in any direction, turn right 90 degrees, walk 6225 miles, turn right 90 degrees, walk 6225 miles, turn right 90 degrees, and walk 6225 miles again. If the Earth were flat, you’d have ended up back at the north pole after the fourth leg of your trip. But on the real Earth, the first leg of your trip brings you to the equator; the second is along the equator; the third brings you back to the north pole, and the fourth takes you back to the equator. End of discussion; the Earth is not flat.

The coordinates we put on the Earth’s surface play no role in this determination, because in all this, you never needed to know anything about coordinates applied across the whole planet. The distance you walked can be measured by the wear and tear on your shoes; the direction you walk in each segment is a straight line (one foot in front of the other), and you can make a 90 degree turn using a straight-edge that you carry with you.

Is “the Earth’s a Sphere” a meaningful statement?

Even if we accept the Earth isn’t flat, can’t we choose coordinates that make it look like a cucumber, a pear, a peanut, or maybe even a frisbee? Sure. We can label points however we like, and then draw them however we like, so that it appears to be very different from a sphere. In fact we could use standard latitude-longitude coordinates and project them onto a plane using weird lenses to make them look like any shape we want. But the Earth is still a sphere.

How do we see that? What’s true for the north pole is true for every point on Earth. Starting in any direction, if you walk in a triangle, not a square, whose sides are length 6225 miles (measured by the number of steps you take as you go in a straight line, and not requiring any coordinate system), and whose angles are 90 degrees, you will come back to your starting point.

Since all points and all directions have this feature, the Earth’s surface is (approximately) a “homogeneous isotropic space” (all points and directions are equivalent). The fact that a triangle with 90 degree angles can bring you back to your starting point means it is “positively curved”, and a two-dimensional positively curved homogeneous isotropic space must be a complete sphere.

I used a different approach to prove the Earth’s a sphere using the Tonga eruption’s pressure waves, way back when I started this series. On a sphere, any journey in a straight line, moving in any direction from any point on the surface, will come back to itself after having traveled the same distance (the sphere’s circumference), or equivalently (if the speed of the journey is constant) having taken the same amount of time. You can tell this without coordinates; you just need to observe that all the pressure waves from Tonga (and from Krakatoa’s eruption also) roughly intersected each other halfway around and all the way around the world.

Both of these coordinate-invariant statements involve studying large paths on the surface. A different approach is to study the properties of the surface using relatively short paths, the method of measuring “local curvature.” There are various ways to do this, but the easiest is to take a triangular walk — any one you like — such that on the third leg of the triangle you return to your initial point. At each of the three points on your walk where you changed direction, measure the angle. We all know that on a flat surface, the sum of the three angles will be 180 degrees. On a positively curved surface like a sphere or cucumber, it will be larger than 180 degrees. The amount of excess angle will grow as we take larger and larger triangles, and we can use this to determine how curved the Earth’s surface is… never using a coordinate system.
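If you want to see the local-curvature test in numbers, here is a small Python sketch (my own illustration; the unit sphere and the octant triangle with three 90-degree corners are chosen purely for convenience). It computes the angles of a geodesic triangle directly from the positions of its corners, with no map or grid drawn on the surface.

```python
import numpy as np

def angle_at(P, Q, R):
    """Angle of the geodesic (great-circle) triangle PQR at vertex P, on the unit sphere."""
    tq = Q - np.dot(P, Q) * P          # tangent vector at P pointing toward Q
    tr = R - np.dot(P, R) * P          # tangent vector at P pointing toward R
    tq /= np.linalg.norm(tq)
    tr /= np.linalg.norm(tr)
    return np.arccos(np.clip(np.dot(tq, tr), -1.0, 1.0))

# The "octant" triangle: the walk from the north pole described above.
A, B, C = np.eye(3)
angles = [angle_at(A, B, C), angle_at(B, C, A), angle_at(C, A, B)]
print(np.degrees(sum(angles)))              # 270 degrees: a 90-degree excess over flat space
print(sum(angles) - np.pi, 4 * np.pi / 8)   # the excess equals the triangle's area (Girard's theorem)
```

On a flat plane the sum would be exactly 180 degrees; the 90-degree excess here is the coordinate-free signature of positive curvature.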

Is “The Earth is Rotating” a meaningful statement?

Notice that all the coordinate systems we’ve talked about so far “rotate with the Earth”, which is to say, they make it appear that the Earth is not rotating. Does that mean it doesn’t rotate? or that rotation is meaningless?

Of course not. Foucault pendulums and gyroscopes do what they do, showing the Earth rotates relative to the slowly drifting stars, independent of whether we set longitude to be fixed upon the Earth’s continents, or whether we fix longitude on the stars and let the Earth rotate underneath them, or choose some other time-dependent coordinate system. In this sense, the Earth’s rotation is coordinate-independent.

Lessons?

Clearly, we need to be very cautious about drawing any conclusions from coordinates. Being cavalier about coordinates will lead to mistakes. The mere fact that I can redraw the solar system in geocentric coordinates has absolutely nothing to say about whether “Sun orbits the Earth” is false, or meaningless, etc.

A critical issue is to identify what is coordinate-independent and what is not; anything that is not truly coordinate-independent is suspect. Sometimes a particular coordinate-dependent viewpoint is useful, but you should always understand what alternative viewpoints would tell you, so that you don’t overinterpret. (Over the coming months we will see just how deeply this issue, in various sophisticated forms, permeates all of modern high-energy physics.)

Another lesson: imagine someone told you that even though spherical coordinates (latitude and longitude) are the simplest coordinates (because they make the Earth look simple, and also make the equations describing it simpler), they don’t reflect anything meaningful about the Earth — that with a different choice of coordinate system, the Earth could just as well be a pear or a cucumber or a log. Or flat. That all of these things are equally true.

This person would have drawn the wrong conclusion. Spherical coordinates are certainly not the “right” coordinates — coordinates are arbitrary — but the fact that they are so simple on the Earth reflects something real about the Earth. Spherical coordinates are simple not because they are right but because the underlying space is a sphere. Had the Earth been a knobby, blobby, spiky shape, then spherical coordinates would have been no simpler than any others.

So simple coordinates, despite their arbitrariness, can reflect something important and meaningful about the underlying physical system. And that raises a question. We all agree, including professor Muller, that heliocentric (Sun-centered) coordinates make the appearance and behavior of the solar system, and the equations describing its behavior, somewhat simpler. We also all agree that coordinates are arbitrary. But should we then conclude that the simplicity of Sun-centered coordinates for the solar system is a pure fluke? Might it not reflect something simple about the underlying space-time geometry — something which could perhaps tell us that yes, unequivocally, the Earth orbits the Sun (and the Sun does not orbit the Earth)?

John PreskillQuantum Encryption in a Box

Over the last few decades, transistor density has become so high that classical computers have run into problems with some of the quirks of quantum mechanics. Quantum computers, on the other hand, exploit these quirks to revolutionize the way computers work. They promise secure communications, simulation of complex molecules, ultrafast computations, and much more. The fear of being left behind as this new technology develops is now becoming pervasive around the world. As a result, there are large, near-term investments in developing quantum technologies, with parallel efforts aimed at attracting young people into the field of quantum information science and engineering in the long-term.

I was not surprised then that, after completing my master’s thesis in quantum optics at TU Berlin in Germany, I was invited to participate in a program called Quanten 1×1, hosted by the Junge Tueftler (Young Tinkerers) non-profit, to get young people excited about quantum technologies. As part of a small team, we decided to develop tabletop games to explain the concepts of superposition, entanglement, quantum gates, and quantum encryption. In the sections that follow, I will introduce the thought process that led to the design of one of the final products on quantum encryption. If you want to learn more about the other games, you can find the relevant links at the end of this post.

The price of admission into the quantum realm

How much quantum mechanics is too much? Is it enough for people to know about the health of Schrödinger’s cat, or should we use a squishy ball with a smiley face and an arrow on it to get people excited about qubits and the Bloch sphere? In other words, what is the best way to go beyond metaphors and start delving into the real stuff? After all, we are talking about cutting-edge quantum technology here, which requires years of study to understand. Even the quantum experts I met with during the project had a hard time explaining their work to lay people.

Since there is no standardized way to explain these topics outside a university, the goal of our project was to try different models to teach quantum phenomena and make the learning as entertaining as possible. Compared to methods where people passively absorb the information, our tabletop-games approach leverages people’s curiosity and leads to active learning through trial and error.

A wooden quantum key generator (BB84)

Everybody has secrets

Most of the (sensitive) information that is transmitted over the Internet is encrypted. This means that only those with the right “secret key” can unlock the digital box and read the private message within. Without the secret key used to decrypt, the message looks like gibberish – a series of random characters. To encrypt the billions of messages being exchanged every day (over 300 billion emails alone), the Internet relies heavily on public-key cryptography and so-called one-way functions. These mathematical functions allow one to generate a public key to be shared with everyone, from a private key kept to themselves. The public key plays the role of a digital padlock that only the private key can unlock. Anyone (human or computer) who wants to communicate with you privately can get a digital copy of your padlock (by copying it from a pinned tweet on your Twitter account, for example), put their private message inside a digital box provided by their favorite app or Internet communication protocol running behind the scenes, lock the digital box using your digital padlock (public-key), and then send it over to you (or, accidentally, to anyone else who may be trying to eavesdrop). Ingeniously, only the person with the private key (you) can open the box and read the message, even if everyone in the world has access to that digital box and padlock.

But there is a problem. Current one-way functions hide the private key within the public key in a way that powerful enough quantum computers can reveal. The implications of this are pretty staggering. Your information (bank account, email, bitcoin wallet, etc) as currently encrypted will be available to anyone with such a computer. This is a very serious issue of global importance. So serious indeed, that the President of the United States recently released a memo aimed at addressing this very issue. Fortunately, there are ways to fight quantum with quantum. That is, there are quantum encryption protocols that not even quantum computers can break. In fact, they are as secure as the laws of physics.

Quantum Keys

A popular way of illustrating how quantum encryption works is through single photon sources and polarization filters. In classroom settings, this often boils down to lasers and small polarizing filters a few meters apart. Although lasers are pretty cool, they emit streams of photons (particles of light), not single photons needed for quantum encryption. Moreover, measuring polarization of individual photons (another essential part of this process) is often very tricky, especially without the right equipment. In my opinion the concept of quantum mechanical measurement and the collapse of wave functions is not easily communicated in this way.

Inspired by wooden toys and puzzles my mom bought for me as a kid after visits to the dentist, I tried to look for a more physical way to visualize the experiment behind the famous BB84 quantum key distribution protocol. After a lot of back and forth between the drawing board and laser cutter, the first quantum key generator (QeyGen) was built. 

How does the box work?

Note: This short description leaves out some details. For a deeper dive, I recommend watching the tutorial video on our Youtube channel.

The quantum key generator (QeyGen) consists of an outer and an inner box. The outer box is used by the person generating the secret key, while the inner box is used by the person with whom they wish to share that key. The sender prepares a coin in one of two states (heads = 0, tails = 1) and inserts it either into slot 1 (horizontal basis), or slot 2 (vertical basis) of the outer box. The receiver then measures the state of the coin in one of the same two bases by sliding the inner box to the left (horizontal basis = 1) or right (vertical basis = 2). Crucially, if the bases to prepare and measure the coin match, then both sender and receiver get the same value for the coin. But if the basis used to prepare the coin doesn’t match the measurement basis, the value of the coin collapses into one of the two allowed states in the measurement basis with 50/50 chance. Because of this design, the box can be used to illustrate the BB84 protocol that allows two distant parties to create and share a secure encryption key.

Simulating the BB84 protocol

The following is a step by step tutorial on how to play out the BB84 protocol with the QeyGen. You can play it with two (Alice, Bob) or three (Alice, Bob, Eve) people. It is useful to know right from the start that this protocol is not used to send private messages, but is instead used to generate a shared private key that can then be used with various encryption methods, like the one-time pad, to send secret messages.

BB84 Protocol:

  1. Alice secretly “prepares” a coin by inserting it facing-towards (0) or facing-away (1) from her into one of the two slots (bases) on the outer box. She writes down the value (0 or 1) and basis (horizontal or vertical) of the coin she just inserted.
  2. (optional) Eve, the eavesdropper, tries to “measure” the coin by sliding the inner box left (horizontal basis) or right (vertical basis), before putting the coin back through the outer box without anyone noticing.
  3. Bob then secretly measures the coin in a basis of his choice and writes down the value (0 or 1) and basis (horizontal and vertical) as well.
  4. Steps 1 and 3 are then repeated several times. The more times Alice and Bob go through this process, the more secure their secret key will be.

Sharing the key while checking for eavesdroppers:

  1. Alice and Bob publicly discuss which bases they used at each “prepare” and “measure” step, and cross out the values of the coin corresponding to the bases that didn’t match (about half of them on average; here, it would be rounds 1,3,5,6,7, and 11).
  2. Then, they publicly announce the first few (or a random subset of the) values that survive the previous step (i.e. have matching bases; here, it is rounds 2 and 4). If the values match for each round, then it is safe to assume that there was no eavesdrop attack. The remaining values are kept secret and can be used as a secure key for further communication.
  3. If the values of Alice and Bob don’t match, Eve must have measured the coin (before Bob) in the wrong basis (hence, randomizing its value) and put it back in the wrong orientation from the one Alice had originally chosen. Having detected Eve’s presence, Alice and Bob switch to a different channel of communication and try again.

Note that the more rounds Alice and Bob choose for the eavesdropper detection, the higher the chance that the channel of communication is secure, since N rounds that all return the same value for the coin mean a 2^{-N} chance that Eve got lucky and guessed Alice’s inputs correctly. To put this in perspective, a 20-round check for Eve provides a 99.9999% guarantee of security. Of course, the more rounds used to check for Eve, the fewer secure bits are left for Alice and Bob to share at the end. On average, after a total of 2(N+M) rounds, with N rounds dedicated to Eve, we get an M-bit secret key.
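To make the statistics above concrete, here is a minimal Python sketch of the coin-and-box version of the protocol. The function names and the way Eve is modeled are my own simplifications for illustration, not part of the actual QeyGen kit.

```python
import random

def measure(value, prep_basis, meas_basis):
    """Coin-box rule: matching bases preserve the coin's value; mismatched bases randomize it."""
    return value if prep_basis == meas_basis else random.randint(0, 1)

def bb84_round(eve_present):
    a_bit, a_basis = random.randint(0, 1), random.choice("HV")   # Alice prepares the coin
    if eve_present:                        # Eve measures in a random basis and reinserts the coin
        e_basis = random.choice("HV")
        coin, prep_basis = measure(a_bit, a_basis, e_basis), e_basis
    else:
        coin, prep_basis = a_bit, a_basis
    b_basis = random.choice("HV")          # Bob measures in a random basis
    return a_bit, a_basis, measure(coin, prep_basis, b_basis), b_basis

def run(n_rounds, eve_present):
    rounds = [bb84_round(eve_present) for _ in range(n_rounds)]
    kept = [(a, b) for a, ab, b, bb in rounds if ab == bb]   # keep rounds with matching bases
    return len(kept), sum(a != b for a, b in kept)           # (kept rounds, disagreements)

print(run(1000, eve_present=False))   # no disagreements: the kept bits can serve as the key
print(run(1000, eve_present=True))    # roughly a quarter of kept rounds disagree, exposing Eve
```

Running it a few times shows why publicly comparing even a modest number of kept rounds is enough to catch an eavesdropper with high confidence.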

What do people learn?

When we play with the box, we usually encounter three main topics that we discuss with the participants.

  1. qm states and quantum particles: We talk about superposition of quantum particles and draw an analogy from the coin to polarized photons.
  2. qm measurement and basis: We ask about the state of the coin and discuss how we actually define a state and a basis for a coin. By using the box, we emphasize that the measurement itself (in which basis the coin is observed) can directly affect the state of the coin and collapse its “wavefunction”.
  3. BB84 protocol: After a little playtime of preparing and measuring the coin with the box, we introduce the steps to perform the BB84 protocol as described above. The penny-dropping moment (pun intended) often happens when the participants realize that a spy intervening between preparation and measurement can change the state of the coin, leading to contradictions in the subsequent eavesdrop test of the protocol and exposing the spy.

I hope that this small outline has provided a rough idea of how the box works and why we developed it. If you have access to a laser cutter, I highly recommend making a QeyGen for yourself (link to files below). For any further questions, feel free to contact me at t.schubert@fu-berlin.de.

Resources and acknowledgments

Project page Junge Tueftler: tueftelakademie.de/quantum1x1
Video series for the QeyGen: youtube.com/watch?v=YmdoAP1TJRo
Laser cut files: thingiverse.com/thing:5376516

The program was funded by the Federal Ministry of Education and Research (Germany) and was a collaboration between the Jungen Tueftlern and the Technical University of Berlin.

May 17, 2022

n-Category Café The Magnitude of Information

Guest post by Heiko Gimperlein, Magnus Goffeng and Nikoletta Louca

The magnitude of a metric space {(X,\mathrm{d})} does not require further introduction on this blog. Two of the hosts, Tom Leinster and Simon Willerton, conjectured that the magnitude function {\mathcal{M}_X(R) := \mathrm{Mag}(X,R \cdot \mathrm{d})} of a convex body {X \subset \mathbb{R}^n} with Euclidean distance {\mathrm{d}} captures classical geometric information about {X}:

\begin{aligned} \mathcal{M}_X(R) =& \frac{1}{n! \omega_n} \mathrm{vol}_n(X)\ R^n + \frac{1}{2(n-1)! \omega_{n-1}} \mathrm{vol}_{n-1}(\partial X)\ R^{n-1} + \cdots + 1 \\ =& \frac{1}{n! \omega_n} \sum_{j=0}^n c_j(X)\ R^{n-j} \end{aligned}

where {c_j(X) = \gamma_{j,n} V_j(X)} is proportional to the {j}-th intrinsic volume {V_j} of {X} and {\omega_n} is the volume of the unit ball in {\mathbb{R}^n}.

Even more basic geometric questions have remained open, including:

  • What geometric content is encoded in {\mathcal{M}_X}?
  • What can be said about the magnitude function of the unit disk {B_2 \subset \mathbb{R}^2}?

We discuss in this post how these questions led us to possible relations to information geometry. We would love to hear from you:

  • Is magnitude an interesting invariant for information geometry?
  • Is there a category theoretic motivation, like Lawvere’s view of a metric space as an enriched category?
  • Does the magnitude relate to notions studied in information geometry?
  • Do you have interesting questions about this invariant?

(a) Cylinder, (b) spherical shell, (c) spherical cap, (d) ball in hyperbolic plane (hyperboloid model), (e) toroidal armband.

Recent years have seen much progress in understanding the geometric content of the magnitude function for domains in odd-dimensional Euclidean space. In this setting Meckes and Barceló–Carbery showed how to compute magnitude using differential equations. Nevertheless, as Carbery often emphasized, hardly anything was known even for such simple geometries as the unit disk {B_2} in {\mathbb{R}^2}.

Our new works Semiclassical analysis of a nonlocal boundary value problem related to magnitude and The magnitude and spectral geometry show, in particular, that as {R\to \infty},

\mathcal{M}_{B_2}(R) = \frac{1}{2}R^2 + \frac{3}{2}R + \frac{9}{8}+O(R^{-1}),

and that {\mathcal{M}_{B_2}(R)} is not a polynomial.
The approach does not use differential equations, but methods for integral equations. Recall that the magnitude of a positive definite compact metric space {(X,\mathrm{d})} is defined as

\mathcal{M}_{(X,\mathrm{d})}(R):=\int_X u_R(x) \ \mathrm{d}x,

where {u_R} is the unique distribution supported in {X} that solves the integral equation

\int_X \mathrm{e}^{-R\mathrm{d}(x,y)}u_R(y) \ \mathrm{d}y=1.

We analyze this integral equation in geometric settings, for domains {X} in {\mathbb{R}^n}, in spheres, tori or, generally, a manifold with boundary with a distance function of any dimension, as illustrated in the figures above. Our results shed light on the geometric content of their magnitude. In fact, our results apply beyond classical geometry and metric spaces — not even a distance function is needed!
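For readers who would like to experiment numerically, here is a small Python sketch (my own illustration, not the authors’ code) using the finite-metric-space counterpart of the integral equation above: for points {x_1,\dots,x_m} one solves {Zw=1} with {Z_{ij}=\mathrm{e}^{-R\,\mathrm{d}(x_i,x_j)}} and sums the weights. The discretization of the unit interval and the comparison with its known magnitude function {1+R/2} are choices made for this example; for positive definite compact spaces such finite approximations are known to converge to the magnitude (a result of Meckes).

```python
import numpy as np

def magnitude(points, R):
    """Magnitude of a finite subset of the real line at scale R: solve Z w = 1, return sum(w)."""
    d = np.abs(points[:, None] - points[None, :])   # pairwise distances
    Z = np.exp(-R * d)
    w = np.linalg.solve(Z, np.ones(len(points)))
    return w.sum()

# Dense sample of the unit interval; its magnitude function is 1 + R/2 in closed form.
pts = np.linspace(0.0, 1.0, 400)
for R in (1.0, 5.0, 20.0):
    print(R, magnitude(pts, R), 1 + R / 2)   # finite approximation vs. exact value
```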

Our techniques suggest a life of magnitude beyond metric spaces in information geometry. There one considers a statistical manifold, i.e. a smooth manifold {X} with a divergence {D}, not with a distance function: See John Baez’s posts on information geometry. A first example of a divergence is the square of the subspace distance {|x-y|^2} on a submanifold in Euclidean space. A second example is the square of the geodesic distance function on a Riemannian manifold {(X,g)}, provided that it is smooth. (Note that on a circle the distance function and its square are non-smooth when {x} and {y} are conjugate points.) In general, a divergence {D=D(x,y)} is a smooth, non-negative function such that {D} is a Riemannian metric near {x=y} modulo lower order terms, in the sense that

D(x,x+v)=g_{D,x}(v,v)+O(|v|^3)

for a Riemannian metric {g_D} on {X}.

Divergences related to relative entropy have long been used in statistics to study families of probability measures. The relative entropy of two probability measures {\mu} and {\nu} on a space {\Omega} is defined as

D(\mu,\nu):=\int_\Omega \log\left(\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\right)\mathrm{d}\nu\in [0,\infty].

The notion of relative entropy and its cousins are discussed in the blog posts of Baez mentioned above and also in Leinster’s book Entropy and Diversity: The Axiomatic Approach. While the space of probability measures is too big, one can restrict to interesting submanifolds (with boundary).

Here is the definition of the magnitude function of a statistical manifold with boundary {(X,D)}, when {R \gg 0} is sufficiently large:

\mathcal{M}_{(X,D)}(R):=\int_X u_R(x) \ \mathrm{d}x,

where {u_R} is the unique distribution supported in {X} that solves the integral equation

\int_X \mathrm{e}^{-R\sqrt{D(x,y)}}u_R(y) \ \mathrm{d}y=1.

When {D} is the square of a distance function on {X}, we recover the magnitude of the metric space {(X,\sqrt{D})}.

We emphasize two key points to take home:

  1. The integral equation approach is equivalent to defining the magnitude of statistical manifolds using Meckes’s classical approach in Magnitude, diversity, capacities, and dimensions of metric spaces, relying on reproducing kernel Hilbert spaces.

  2. Since {D} is smooth, {\mathcal{M}_{(X,D)}} shares the properties stated in The magnitude and spectral geometry for the magnitude function summarized next.

Theorem

a. The magnitude is well-defined for {R\gg 0} sufficiently large; there the integral equation admits a unique distributional solution.

b. {\mathcal{M}_{(X,D)}} extends meromorphically to the complex plane.

c. The asymptotic behavior of {\mathcal{M}_{(X,D)}(R) = \frac{1}{n! \omega_n} \sum_{j=0}^\infty c_j(X) R^{n-j}+O(R^{-\infty})} is determined by the Taylor coefficients of {v\mapsto D(x,x+v)} and {n=\mathrm{dim}\, X}.

Further details and explicit computations of the first few terms {c_j(X)} can be found in The magnitude and spectral geometry: For a Riemannian manifold {c_0} is proportional to the volume of {X}, while {c_1} is proportional to the surface area of {\partial X}. {c_2} involves the integral of the scalar curvature of {X} and the integral of the mean curvature of {\partial X}. All these computations are relative to {D} and the Riemannian metric that {D} defines. For Euclidean domains {X \subset \mathbb{R}^n}, {c_3} is proportional to the Willmore energy of {\partial X} (proven with older technology in another paper: The Willmore energy and the magnitude of Euclidean domains). We note that

in all known computations of asymptotics for Euclidean domains {X \subset \mathbb{R}^n}, {c_j(X)} is proportional to {\int_{\partial X}H^{j-1}\,\mathrm{d}S} for {j>0}.

Here {H} denotes the mean curvature of {\partial X}. You can compute lower-order terms by an iterative scheme, for as long as you have the time. In fact, we have written a Python code which computes {c_j} for any {j}, which is available at arXiv:2201.11363.

We would love to hear from you should you have any thoughts on the following questions:

  • Is magnitude an interesting invariant for information geometry?

  • Is there a category theoretic motivation, like Lawvere’s view of a metric space as an enriched category?

  • Does the magnitude relate to notions studied in information geometry?

  • Do you have interesting questions about this invariant?

May 16, 2022

Matt StrasslerEarth Around the Sun, or Not? The Earth-Centered Coordinates You Should Worry About

We’re more than a week into a discussion of Professor Richard Muller’s claim that “According to the general theory of relativity, the Sun does orbit the Earth. And the Earth orbits the Sun. And they both orbit together around a place in between. And both the Sun and the Earth are orbiting the Moon.” Though many readers have made interesting and compelling attempts to prove the Earth orbits the Sun, none have yet been able to say why Muller is wrong.

A number of readers suggested, in one way or another, that we go far from the Sun and Earth and use the fact that out there, far from any complications, Newtonian physics should be good. From there, we can look back at the Sun and Earth, and see what’s going on in an unbiased way. Although Muller would say that you could still claim the Sun orbits the Earth by using “geocentric” coordinates centered on the Earth, these readers argued that such coordinates would not make sense in this distant, Newtonian region.

Are they correct about this?

Standard Geocentric Coordinates

Let’s make that last argument more precise. About a week ago, I offered you some geocentric coordinates; see below, and also the last two figures in that previous post. These are non-rotating Cartesian coordinates centered on the Earth. They can be defined in the usual heliocentric (Sun-centered) coordinate system, the one we normally take for granted, by centering a non-rotating grid on the Earth, shown in Figure 1. This figure shows a simplified solar system (the Sun at center, with Mercury, Venus, Earth, Mars and Jupiter in circular orbits), as well as the Earth-centered grid which follows the Earth around in its orbit.

Figure 1: The Sun and the first five planets (Earth is green-blue) in a simplified solar system, with all the planets moving in circles around the Sun, along with a grid showing my simple Cartesian Earth-centered coordinates used in a previous post.

When we now move to the coordinate system defined by the grid in figure 1, the Earth becomes stationary and the Sun starts moving around it, as shown below. The other planets do some strange loops-within-loops — epicycles, they are called.

Figure 2: In figure 1’s Cartesian Earth-centered coordinates, the Sun orbits the Earth, and the other planets have complex orbits, with Mars (red) and Jupiter (yellow) showing clear epicycles. Distant stars would also show loops in their motions.

The argument against such geocentric coordinates is that it’s not just nearby planets like Jupiter that undergo epicycles. So would all of the distant stars! Each will move in a little loop, once an Earth-year! Now indeed, that sounds bad; why would we accept a coordinate system in which extremely distant stars like Sirius or Vega or Betelgeuse would travel in loops that somehow know how long it takes for the Earth to go around the Sun?

Such complaints seem reasonable. This kind of geocentric coordinate system implicitly stretches the Earth’s influence across the entire cosmos, and that doesn’t seem to make any physical or causal sense.

That said, coordinates are just labels. They don’t have to make physical sense or preserve a notion of causality. Only physical phenomena have to do that. But still, it seems crazy to take coordinates seriously that have this property.

And the claim that readers implicitly made is that if you forbid these coordinates — if you use coordinates in which the distant stars are fixed, or at least traveling not in Earth-year-long loops — then you inevitably will prefer heliocentric coordinates.

General Geocentric Coordinates

But this claim, and any similar one, is wrong. No one said that we have to extend the coordinates out from the Earth in a rigid, Cartesian way. Einstein claimed that physics is unchanged no matter how crazy the coordinate system you might choose to describe it. So let’s take the following coordinate system, which is warped, remains the same as the heliocentric coordinates at very large distances, but is geocentric at and near the Earth.

Figure 3: The Sun and the first five planets (Earth is green-blue) in a simplified solar system, with a grid showing Earth-centered coordinates that are the same as the Sun-centered coordinates far away (in contrast to those of figure 1) but are warped near the Earth so as to put it at the center.

In this system of coordinates, here’s what the motion of the Sun and planets looks like.

Figure 4: In the Earth-centered coordinates of figure 3, the inner planets and Sun behave almost as in figure 2, but Jupiter has mostly lost its epicycles, and distant stars act just the same as in the standard Sun-centered coordinates.

The Sun goes round the Earth. Notice that Mars still moves with a significant epicycle, but the epicycles of Jupiter are almost gone. By the time you get to the distant stars, none of them are doing loops anymore. The stars, in this coordinate system, move completely independently of Earth’s motion. Yet the coordinate system has Earth as its center, with the Sun moving round it.

For those of you who suggested that it’s obvious (or near-obvious) that Earth orbits the Sun, these are the coordinates that Muller can ask you about. The only effect of these geocentric coordinates is near the Earth and Sun. No hint remains, by the time you get to the distant stars, that anything is different from heliocentric coordinates. And so, if you assumed implicitly or explicitly that because the distant stars are in nearly flat space, you could extend good heliocentric coordinates all the way down to the Sun and apply quasi-Newtonian reasoning, these curved geocentric coordinates raise challenging questions that you need to answer. Does your argument, whatever it was, truly survive the use of coordinate systems like this one? And why can’t Muller use them to show the Sun orbits the Earth?

Tommaso DorigoHow Inconsistent Really Are The W Mass Measurements

The recent precise measurement of the W boson mass produced by the non-dead CDF collaboration last month continues to be the focus of attention of the scientific community, for a good reason - if correct, the CDF measurement in and of itself would be conclusive proof that our trust in the Standard Model of particle physics when producing predictions of particle phenomenology needs a significant overhaul.


May 15, 2022

Doug NatelsonFlat bands: Why you might care, and one way to get them

When physicists talk about the electronic properties of solids, we often talk about "band theory".  I've written a bit about this before here.  In classical mechanics, a free particle of mass \(m\) and momentum \(\mathbf{p}\) has a kinetic energy given by \(p^2/2m\).  In a crystalline solid, we can define a parameter, the crystal momentum, \(\hbar \mathbf{k}\), that acts a lot like momentum (accounting for the ability to transfer momentum to and from the whole lattice).  The energy near the top or bottom of a band is often described by an effective mass \(m_{*}\), so that \(E(\mathbf{k}) = E_{0} + (\hbar^2 k^2/2m_{*})\).  The whole energy band spans some range of energies called the bandwidth, \(\Delta\). If a band is "flat", that means that its energy is independent of \(\mathbf{k}\) and \(\Delta = 0\).  In the language above, that would imply an infinite effective mass; in a semiclassical picture, that implies zero velocity - the electrons are "localized", stuck around particular spatial locations.  

Why is this an interesting situation?  Well, the typical band picture basically ignores electron-electron interactions - the assumption is that the interaction energy scale is small compared to \(\Delta\).  If there is a flat band, then interactions can become the dominant physics, leading potentially to all kinds of interesting physics, like magnetism, superconductivity, etc.  There has been enormous excitement in the last few years about this because twisting adjacent layers of atomically thin materials like graphene by the right amount can lead to flat bands and does go along with a ton of cool phenomena.  

How else can you get a flat band?  Quantum interference is one way.  When worrying about quantum interference in electron motion, you have to add the complex amplitudes for different electronic trajectories.  This is what gives you the interference pattern in the two-slit experiment.   When trajectories to a certain position interfere destructively, the electron can't end up there.  

It turns out that destructive interference can come about from lattice symmetry. Shown in the figure is a panel adapted from this paper, a snapshot of part of a 2D kagome lattice.  For the labeled hexagon of atoms there, you can think of that rather like the carbon atoms in benzene, and it turns out that there are states such that the electrons tend to be localized to that hexagon.  Within a Wannier framework, the amplitudes for an electron to hop from the + and - labeled sites to the nearest (red) site are equal in magnitude but opposite in sign.  So, hopping out of the hexagon does not happen, due to destructive interference of the two trajectories (one from the + site, and one from the - site).  
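To see this flat band fall out of the math, here is a short Python sketch (my conventions for units, signs, and lattice vectors; illustration only, not taken from the paper linked above) that diagonalizes the standard nearest-neighbor tight-binding Bloch Hamiltonian of the kagome lattice. One of the three bands comes out at exactly the same energy for every crystal momentum.

```python
import numpy as np

def kagome_bands(kx, ky, t=1.0):
    """Nearest-neighbor tight-binding band energies of the kagome lattice at (kx, ky)."""
    # Vectors connecting the three sites within a unit cell (lattice constant 1).
    a1 = np.array([1.0, 0.0]) / 2
    a2 = np.array([0.5, np.sqrt(3) / 2]) / 2
    a3 = a2 - a1
    k = np.array([kx, ky])
    c1, c2, c3 = (np.cos(np.dot(k, a)) for a in (a1, a2, a3))
    H = -2 * t * np.array([[0, c1, c2],
                           [c1, 0, c3],
                           [c2, c3, 0]])
    return np.linalg.eigvalsh(H)   # sorted band energies

# At every sampled crystal momentum, the top band sits at E = +2t: zero bandwidth.
rng = np.random.default_rng(0)
for kx, ky in rng.uniform(-np.pi, np.pi, size=(5, 2)):
    print(np.round(kagome_bands(kx, ky), 6))
```

The dispersionless eigenvalue is exactly the destructive-interference effect described above: the corresponding eigenvectors have amplitudes of alternating sign around each hexagon, so hopping out of the hexagon cancels.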

Of course, if the flat band is empty, or if the flat band is buried deep down among the completely occupied electronic states, that's not likely to have readily observable consequences.  The situation is much more interesting if the flat band is near the Fermi level, the border between filled and empty electronic states.  Happily, this does seem to happen - one example is Ni3In, as discussed here showing "strange metal" response; another example is the (semiconducting?) system Nb3Cl8, described here.  These flat bands are one reason why there is a lot of interest these days in "kagome metals".

Peter Rohde Sydney sunsets

May 14, 2022

n-Category Café Grothendieck Conference

There’s a conference on Grothendieck’s work coming up soon here in Southern California!

  • Grothendieck’s approach to mathematics, May 24-28, 2022, Chapman University, Orange, California. Organized by Peter Jipsen (Mathematics), Alexander Kurz (Computer Science), Andrew Mosher (Mathematics and Computer Science), Marco Panza (Mathematics and Philosophy), Ahmed Sebbar (Physics and Mathematics), Daniele Struppa (Mathematics).

To attend in person register here. To attend via Zoom go here. The talks will be recorded, and I hear they will be made available later on YouTube.

Here’s the program:

24th Tuesday

8:45 – 9:00 Welcome by Michael Ibba, dean of the Schmid College of Science and Technology

9:00 – 10:00 Marco Panza (Chapman, CNRS) Grothendieck’s promenade, or the eulogy of aloneness. An introduction to Grothendieck’s spirit by his own words

10:00 – 11:00 Fernando Zalamea (Univ. Nacional de Colombia): A Unitary vision of Grothendieck’s 40 main years (1951-1991): The Models TSK (Topos of Sheaves over Kripke Models)

11:00 – 11:15 Coffee Break

11:15 – 12:15 Kevin Buzzard (Imperial College London; by zoom): Grothendieck’s approach to equality

12:15 – 1:15 Jean Pierre Marquis (Univ. of Montréal): Grothendieck, Bourbaki and mathematical structuralism

1:15 – 2:30 Lunch

2:30 – 3:30 Colin McLarty (Case Western Reserve Univ.): Grothendieck did not believe in universes, he believed in topos and schemes

3:30 – 4:30 Elaine Landry (UC Davis): As if category theory were a foundation

4:30 – 5:30 Jean-Jacques Szczeciniarz (Univ Paris Cité): On some points of homological algebra

6:30 Welcome dinner at Chapman campus

25th Wednesday

9:00 – 10:00 John Baez (U.C. Riverside): Motivating motives

10:00 – 11:00 Simona Paoli (Univ. of Aberdeen; by zoom): From higher groupoids to higher categories

11:00 – 11:15 Coffee Break

11:15 – 12:15 Brice Halimi (Univ. Paris Cité): Context-dependence and descent theory

12:15 – 1:15 Goro Kato (Cal Poly): The Descent methods for phenomena of organization-emergence

1:15 – 2:30 Lunch

2:30 – 3:30 Jessica Carter (Aarhus Univ.): Grothendieck’s contribution to K-theory and some consequences for the ontology of mathematics

3:30 – 4:30 Frederic Jaeck (Univ. of Aix-Marseille): A philosophy in the shade of Grothendieck’s mathematics

4:30 – 5:30 Carmen Martinez (UNAM): Conjectures, counterexamples and A. Grothendieck

7:00 Gala Dinner at the Chapman President Residence

26th Thursday

9:00 – 10:00 Ahmed Sebbar (Chapman) Euler’s Products

10:00 – 11:00 Yves André (Sorbonne Univ., Paris; by zoom): Grothendieck and differential equations

11:00 – 11:15 Coffee Break

11:15 – 12:15 Daniele Struppa (Chapman): Superoscillatory sequences and infinite order differential operators

12:15 – 1:15 Mohamed Saidi (Univ. of Exeter): The anabelian geometry of Grothendieck

1:15 – 2:30 Lunch

2:30 – 3:30 Pino Rosolini (Univ. of Genova): Grothendieck fibrations, or when aesthetics drives mathematics

3:30 – 4:30 Simon Henry (Univ. of Ottawa; by zoom): Grothendieck’s homotopy hypothesis

4:30 – 5:30 Drew Moshier (Chapman) On “logical” duals of compact Hausdorff spaces

6:30 Conference Dinner in Old Orange

27th Friday

9:00 – 10:00 Andrés Villaveces (Univ. Nacional de Colombia): Galoisian model theory: the role(s) of Grothendieck (à son insu!)

10:00 – 11:00 Olivia Caramello (Univ. of Insubria; by Zoom): The “unifying notion” of topos

11:00 – 11:15 Coffee break

11:15 – 12:15 Mike Shulman (Univ. of San Diego): Lifting Grothendieck universes to Grothendieck toposes

12:15 – 1:15 José Gil-Ferez (Chapman Univ.) The isomorphism theorem of algebraic logic: a categorical perspective

1:15 – 2:30 Lunch

2:30 – 3:30 Oumar Wone (Chapman) : Vector bundles on Riemann surfaces according to Grothendieck and his followers

3:30 – 4:30 Claudio Bartocci (Univ. of Genova): The inception of the theory of moduli spaces: Grothendieck’s Quot scheme

4:30 – 5:30 Christian Houzel (IUFM de Paris): Riemann surfaces after Grothendieck [presented by J.J. Szczeciniarz]

28th Saturday

9:00 – 10:00 Silvio Ghilardi (Univ. degli Studi, Milano): Investigating definability in propositional logic via Grothendieck topologies and sheaves

10:00 – 11:00 Matteo Viale (Univ. of Turin; by zoom): The duality between Boolean valued models and topological presheaves

11:00 – 11:15 Coffee break

11:15 – 12:15 Benjamin Collas (RIMS, Kyoto Univ.): Galois-Teichmüller: arithmetic geometric principles

12:15 – 1:15 Closing: general discussion animated by Alex Kurz (Chapman)

n-Category Café Communicating Mathematics Conference

Communicating Mathematics is a 4-day workshop for mathematicians at all career stages who are interested in exploring how we share our research and interests with fellow mathematicians, students, and the public.

The workshop takes place August 8-11 at Cornell University and will run concurrently online over zoom.

Planned sessions and workshops include:

  • Mathematics for the common good (social and civil justice issues)

  • Engaging the public in mathematical discourse

  • Inclusivity and communication in the classroom

  • Communicating to policymakers

  • Community outreach: communicating mathematics to young people (e.g. math circles)

  • Advocating for your department (communicating to university administration)

  • Communicating with fellow mathematicians:

    • What makes an engaging research talk?

    • How should we be communicating our work to each other?

    • Succinctly describing your research to both a specialist and non-specialist.

We will also have structured breakout/lunchtime discussions on specific issues related to improving communication and dissemination.

Speakers and panelists include:

  • Erika Tatiana Camacho (NSF)

  • Moon Duchin (Tufts University / Metric Geometry and Gerrymandering Group)

  • Jordan Ellenberg (University of Wisconsin)

  • Rochelle Gutiérrez (UIUC)

  • Lily Khadjavi (Loyola Marymount University)

  • Michelle Manes (University of Hawaii)

  • John Meier (Provost, Lafayette college)

  • Karen Saxe (AMS Office of government relations)

  • Steve Strogatz (Cornell University)

  • Peter Trapa (Dean of College of Science, University of Utah)

  • Sam Vandervelde (founder of Proof School)

  • Amie Wilkinson (University of Chicago)

For more information, including registration, please visit our website:

https://sites.google.com/view/communicating-math/

or contact one of the conference organizers: Katie Mann, Jennifer Taback or myself.

Doug NatelsonGrad students mentoring grad students - best practices?

I'm working on a physics post about flat bands, but in the meantime I thought I would appeal to the greater community.  Our physics and astronomy graduate student association is spinning up a mentoring program, wherein senior grad students will mentor beginning grad students.  It would be interesting to get a sense of best practices in this.  Do any readers have recommendations for resources about this kind of mentoring, or examples of departments that do this particularly well?  I'm aware of the program at UCI and the one at WUSTL, for example.

May 13, 2022

Matt von HippelAt Mikefest

I’m at a conference this week of a very particular type: a birthday conference. When folks in my field turn 60, their students and friends organize a special conference for them, celebrating their research legacy. With COVID restrictions just loosening, my advisor Michael Douglas is getting a last-minute conference. And as one of the last couple students he graduated at Stony Brook, I naturally showed up.

The conference, Mikefest, is at the Institut des Hautes Études Scientifiques, just outside of Paris. Mike was a big supporter of the IHES, putting in a lot of fundraising work for them. Another big supporter, James Simons, was Mike’s employer for a little while after his time at Stony Brook. The conference center we’re meeting in is named for him.

You might have to zoom in to see that, though.

I wasn’t involved in organizing the conference, so it was interesting seeing differences between this and other birthday conferences. Other conferences focus on the birthday prof’s “family tree”: their advisor, their students, and some of their postdocs. We’ve had several talks from Mike’s postdocs, and one from his advisor, but only one from a student. Including him and me, three of Mike’s students are here: another two have had their work mentioned but aren’t speaking or attending.

Most of the speakers have collaborated with Mike, but only for a few papers each. All of them emphasized a broader debt though, for discussions and inspiration outside of direct collaboration. The message, again and again, is that Mike’s work has been broad enough to touch a wide range of people. He’s worked on branes and the landscape of different string theory universes, pure mathematics and computation, neuroscience and recently even machine learning. The talks generally begin with a few anecdotes about Mike, before pivoting into research talks on the speakers’ recent work. The recent-ness of the work is perhaps another difference from some birthday conferences: as one speaker said, this wasn’t just a celebration of Mike’s past, but a “welcome back” after his return from the finance world.

One thing I don’t know is how much this conference might have been limited by coming together on short notice. For other birthday conferences impacted by COVID (and I’m thinking of one in particular), it might be nice to have enough time to have most of the birthday prof’s friends and “academic family” there in person. As-is, though, Mike seems to be having fun regardless.

Happy Birthday Mike!

May 12, 2022

Matt StrasslerIn Our Galaxy’s Center, a Tiny Monster

It’s far from a perfect image. [Note added: if you need an introduction to what images like this actually represent (they aren’t photographs of black holes, which are, after all, black…), start with this.]

EHT’s blurry time-averaged image of the ring of material surrounding the black hole at the center of our galaxy

It’s blurred out in space by imperfections in the telescopic array that is the “Event Horizon Telescope” (EHT) and by dust between us and our galaxy’s center. It’s blurred out in time by the fact that the glowing material around the black hole changes appreciably by the hour, while the EHT’s effective exposure time is a day. There are bright spots in the image that may just be artifacts of exactly where the telescopes are located that are combined together to make up the EHT. The details of the reconstructed image depend on exactly what assumptions are made.

At best, it shows us just a thick ring of radio waves emitted over a day by an ever-changing thick disk of matter around a black hole.

But it’s our galaxy’s black hole. And it’s just the first image. There will be many more to come, sharper and more detailed. Movies will follow. A decade or two from now, what we have been shown today will look quaint.

We already knew the mass of this black hole from other measurements, so there was a prediction for the size of the ring to within twenty percent or so. The prediction was verified today, a basic test of Einstein’s gravity equations. Moreover, EHT’s results now provide some indications that the black hole spins (as expected). And (by pure luck) its spin axis points, very roughly, toward Earth (much like M87’s black hole, whose image was provided by EHT in 2019.)

We can explore these and other details in coming days, and there’s much more to learn in the coming years. But for now, let’s appreciate the picture for what it is. It is an achievement that history will always remember.

May 11, 2022

Terence TaoResources for displaced mathematicians

In this post I would like to collect a list of resources that are available to mathematicians displaced by conflict. Here are some general resources:

There are also resources specific to the current crisis:

Finally, there are a number of institutes and departments who are willing to extend visiting or adjunct positions to such displaced mathematicians:

If readers have other such resources to contribute (or to update the ones already listed), please do so in the comments and I will modify the above lists as appropriate.

As with the previous post, any purely political comment not focused on such resources will be considered off-topic and thus subject to deletion.

David Hoggaccretion and mathematical physics

In the CCPP brown-bag today, Andrei Gruzinov (NYU) went through the full mathematical-physics argument of Bondi (from the 1950s) that leads to the Bondi formula for accretion from a stationary, thermal gas onto a point mass. He also talked about a generalization of the Bondi argument that he developed this year (permitting the gas to be moving relative to the point mass) and also a bevy of reasons, both theoretical and observational, that the Bondi solution never actually applies ever in practice! Haha, but beautiful stuff.

David Hoggdiscretized vector calculus

On Friday, Will Farr (Flatiron) suggested to me that the work I have been doing (with Soledad Villar) on image-convolution operators with good geometric and group-theoretic properties might be related somehow to discretized differential geometry. It does! I tried to read some impenetrable papers but my main take-away is that I have to understand this field.

David HoggDr Tomer Yavetz

Today Tomer Yavetz (Columbia) defended his PhD, which was in part about the dynamics of stellar streams, and in part about macroscopically quantum-mechanical dark matter. The dissertation was great. The stellar-stream part was about stream morphologies induced by dynamical separatrices in phase space: If the stars on a stream are on orbits that span a separatrix, all heck breaks loose. The part of the thesis on this was very pedagogical and insightful about theoretical dynamics. The dark-matter part was about fast computation of steady-states using orbitals and the WKB approximation. Beautiful physics and math! But my favorite part of the thesis was the introduction, in which Yavetz discusses the point that dynamics—even though we can't see stellar orbits—does have directly observable consequences, like the aforementioned streams and their morphologies (and also Saturn's rings and the gaps in the asteroid belt and the velocity substructure in the Milky Way disk). After the defense we talked about re-framing dynamics around this idea of observability. Congratulations, and it has been a pleasure!

May 10, 2022

Scott Aaronson Donate to protect women’s rights: a call to my fellow creepy, gross, misogynist nerdbros

So, I’d been planning a fun post for today about the DALL-E image-generating AI model, and in particular, a brief new preprint about DALL-E’s capabilities by Ernest Davis, Gary Marcus, and myself. We wrote this preprint as a sort of “adversarial collaboration”: Ernie and Gary started out deeply skeptical of DALL-E, while I was impressed bordering on awestruck. I was pleasantly surprised that we nevertheless managed to produce a text that we all agreed on.

Not for the first time, though, world events have derailed my plans. The most important part of today’s post is this:

For the next week, I, Scott Aaronson, will personally match all reader donations to Fund Texas Choice—a group that helps women in Texas travel to out-of-state health clinics, for reasons that are neither your business nor mine—up to a total of $5,000.

To show my seriousness, I’ve already donated $1,000. Just let me know how much you’ve donated in the comments section!

The first reason for this donation drive is that, perhaps like many of you, I stayed up hours last night reading Alito’s leaked decision in a state of abject terror. I saw how the logic of the decision, consistent and impeccable on its own terms, is one by which the Supreme Court’s five theocrats could now proceed to unravel the whole of modernity. I saw how this court, unchecked by our broken democratic system, can now permanently enshrine the will of a radical minority, perhaps unless and until the United States is plunged into a second Civil War.

Anyway, that’s the first reason for the donation drive. The second reason is to thank Shtetl-Optimized‘s commenters for their … err, consistently generous and thought-provoking contributions. Let’s take, for example, this comment on last week’s admittedly rather silly post, from an anonymous individual who calls herself “Feminist Bitch,” and who was enraged that it took me a full day to process one of the great political cataclysms of our lifetimes and publicly react to it:

OF COURSE. Not a word about Roe v. Wade being overturned, but we get a pseudo-intellectual rationalist-tier rant about whatever’s bumping around Scott’s mind right now. Women’s most basic reproductive rights are being curtailed AS WE SPEAK and not a peep from Scott, eh? Even though in our state (Texas) there are already laws ON THE BOOKS that will criminalize abortion as soon as the alt-right fascists in our Supreme Court give the go-ahead. If you cared one lick about your female students and colleagues, Scott, you’d be posting about the Supreme Court and helping feminist causes, not posting your “memes.” But we all know Scott doesn’t give a shit about women. He’d rather stand up for creepy nerdbros and their right to harass women than women’s right to control their own fucking bodies. Typical Scott.

If you want, you can read all of Feminist Bitch’s further thoughts about my failings, with my every attempt to explain and justify myself met with further contempt. No doubt my well-meaning friends of both sexes would counsel me to ignore her. Alas, from my infamous ordeal of late 2014, I know that with her every word, Feminist Bitch speaks for thousands, and the knowledge eats at me day and night.

It’s often said that “the right looks for converts, while the left looks only for heretics.” Has Feminist Bitch ever stopped to think about how our civilization reached its current terrifying predicament—how Trump won in 2016, how the Supreme Court got packed with extremists who represent a mere 25% of the country, how Putin and Erdogan and Orban and Bolsonaro and all the rest consolidated their power? Does she think it happened because wokeists like herself reached out too much, made too many inroads among fellow citizens who share some but not all of their values? Would Feminist Bitch say that, if the Democrats want to capitalize on the coming tsunami of outrage about the death of Roe and the shameless lies that enabled it, if they want to sweep to victory in the midterms and enshrine abortion rights into federal law … then their best strategy would be to double down on their condemnations of gross, creepy, smelly, white male nerdbros who all the girls, like, totally hate?

(until, thank God, some of them don’t)

I continue to think that the majority of my readers, of all races and sexes and backgrounds, are reasonable and sane. I continue to think the majority of you recoil against hatred and dehumanization of anyone—whether that means women seeking abortions, gays, trans folks, or (gasp!) even white male techbros. In this sad twilight for the United States and for liberal democracy around the world, we the reasonable and sane, we the fans of the Enlightenment, we the Party of Psychological Complexity, have decades of work cut out for us. For now I’ll simply say: I don’t hear from you nearly enough in the comments.

May 09, 2022

John BaezShannon Entropy from Category Theory

I’m giving a talk at Categorical Semantics of Entropy on Wednesday May 11th, 2022. You can watch it live on Zoom if you register, or recorded later. Here’s the idea:

Shannon entropy is a powerful concept. But what properties single out Shannon entropy as special? Instead of focusing on the entropy of a probability measure on a finite set, it can help to focus on the “information loss”, or change in entropy, associated with a measure-preserving function. Shannon entropy then gives the only concept of information loss that is functorial, convex-linear and continuous. This is joint work with Tom Leinster and Tobias Fritz.

You can see the slides now, here. I talk a bit about all these papers:

• John Baez, Tobias Fritz and Tom Leinster, A characterization of entropy in terms of information loss, 2011.

• Tom Leinster, An operadic introduction to entropy, 2011.

• John Baez and Tobias Fritz, A Bayesian characterization of relative entropy, 2014.

• Tom Leinster, A short characterization of relative entropy, 2017.

• Nicolas Gagné and Prakash Panangaden, A categorical characterization of relative entropy on standard Borel spaces, 2017.

• Tom Leinster, Entropy and Diversity: the Axiomatic Approach, 2020.

• Arthur Parzygnat, A functorial characterization of von Neumann entropy, 2020.

• Arthur Parzygnat, Towards a functorial description of quantum relative entropy, 2021.

• Tai-Danae Bradley, Entropy as a topological operad derivation, 2021.

John PreskillHow Captain Okoli got his name

About two years ago, I dreamt up a character called Captain Okoli. He features in the imaginary steampunk novel from which I drew snippets to begin the chapters of my otherwise nonfiction book. Captain Okoli is innovative, daring, and kind; he helps the imaginary novel’s heroine, Audrey, on her globe-spanning quest. 

Captain Okoli inherited his name from Chiamaka Okoli, who was a classmate and roommate of mine while we pursued our master’s degrees at the Perimeter Institute for Theoretical Physics. Unfortunately, an illness took Chiamaka’s life shortly after she completed her PhD. Captain Okoli is my tribute to her memory, but my book lacked the space for an explanation of who Chiamaka was or how Captain Okoli got his name. The Perimeter Institute offered a platform in its publication Inside the Perimeter. You can find the article—a story about an innovative, daring, and kind woman—here.

May 07, 2022

David Hoggmaking a mock Gaia quasar sample

I had conversations today with both Hans-Walter Rix (MPIA) and Kate Storey-Fisher (NYU) about the upcoming ESA Gaia quasar sample. We are trying to make somewhat realistic mocks to test the size of the sample, the computational complexity of things we want to do, the expected signal-to-noise of various cosmological signals, and the expected amplitude and spatial structure of the Gaia selection function. We have strategies that involve making clean samples with a lognormal mock, and making realistic samples (but which have no clustering) using the Gaia EDR3 photometric sample (matched to NASA WISE).

David Hoggdiscovering quantum physics, automatically?

I have been working on making machine-learning methods dimensionless (in the sense of units). In this context, a question arises: Is it possible to learn that there is a missing dimensional input to a physics problem, using machine learning? Soledad Villar (JHU) and I ignored some of our required work today and wrote some code to explore this problem, using as a toy problem the Planck Law example we explained in this paper. We found that maybe you can discover a missing dimensional constant? We have lots more to do to decide what we really have.

May 06, 2022

Matt von HippelYou Are a Particle Detector

I mean that literally. True, you aren’t a 7,000 ton assembly of wires and silicon, like the ATLAS experiment inside the Large Hadron Collider. You aren’t managed by thousands of scientists and engineers, trying to sift through data from a billion pairs of protons smashing into each other every second. Nonetheless, you are a particle detector. Your senses detect particles.

Like you, and not like you

Your ears take vibrations in the air and magnify them, vibrating the fluid of your inner ear. Tiny hairs communicate that vibration to your nerves, which signal your brain. Particle detectors, too, magnify signals: photomultipliers take a single particle of light (called a photon) and set off a cascade, multiplying the signal one hundred million times so it can be registered by a computer.

Your nose and tongue are sensitive to specific chemicals, recognizing particular shapes and ignoring others. A particle detector must also be picky. A detector like ATLAS measures far more particle collisions than it could ever record. Instead, it learns to recognize particular “shapes”, collisions that might hold evidence of something interesting. Only those collisions are recorded, passed along to computer centers around the world.

Your sense of touch tells you something about the energy of a collision: specifically, the energy things have when they collide with you. Particle detectors do this with calorimeters, which generate signals based on a particle's energy. Different parts of your body are more sensitive than others: your mouth and hands are much more sensitive than your back and shoulders. Different parts of a particle detector have different calorimeters: an electromagnetic calorimeter for particles like electrons, and a less sensitive hadronic calorimeter that can catch particles like protons.

You are most like a particle detector, though, in your eyes. The cells of your eyes, rods and cones, detect light, and thus detect photons. Your eyes are more sensitive than you think: you are likely able to detect even a single photon. In an experiment, three people sat in darkness for forty minutes, then heard two sounds, one of which might come accompanied by a single photon of light flashed into their eye. The three didn't notice the photons every time (that's not possible for such a small sensation), but they did much better than a random guess.

(You can be even more literal than that. An older professor here told me stories of the early days of particle physics. To check that a machine was on, sometimes physicists would come close, and watch for flashes in the corner of their vision: a sign of electrons flying through their eyeballs!)

You are a particle detector, but you aren’t just a particle detector. A particle detector can’t move, its thousands of tons are fixed in place. That gives it blind spots: for example, the tube that the particles travel through is clear, with no detectors in it, so the particle can get through. Physicists have to account for this, correcting for the missing space in their calculations. In contrast, if you have a blind spot, you can act: move, and see the world from a new point of view. You observe not merely a series of particles, but the results of your actions: what happens when you turn one way or another, when you make one choice or another.

So while you are a particle detector, what’s more, you’re a particle experiment. You can learn a lot more than those big heaps of wires and silicon could on their own. You’re like the whole scientific effort: colliders and detectors, data centers and scientists around the world. May you learn as much in your life as the experiments do in theirs.

Tommaso DorigoTwo Points On The Future Of Detector Design

My attendance at the JENAS symposium in Madrid this week provided me with the opportunity to meet some of the senior colleagues who will influence the future development of technologies for fundamental research in the coming decade and more. Over coffee-break discussions, poster sessions, and the social dinner I exploited the situation by stressing a few points which I have come to consider absolutely crucial for our field.

Of course I am moved not only by caring for the progress of humanity but also by the fact that I would like the research plan I have put together in collaboration with a few colleagues to succeed... Ultimately, the two things are very well aligned though!


May 05, 2022

John BaezCategorical Semantics of Entropy

There will be a workshop on the categorical semantics of entropy at the CUNY Grad Center in Manhattan on Friday May 13th, organized by John Terilla. I was kindly invited to give an online tutorial beforehand on May 11, which I will give remotely to save carbon. Tai-Danae Bradley will also be giving a tutorial that day in person:

Tutorial: Categorical Semantics of Entropy, Wednesday 11 May 2022, 13:00–16:30 Eastern Time, Room 5209 at the CUNY Graduate Center and via Zoom. Organized by John Terilla. To attend, register here.

12:00-1:00 Eastern Daylight Time — Lunch in Room 5209.

1:00-2:30 — Shannon entropy from category theory, John Baez, University of California Riverside; Centre for Quantum Technologies (Singapore); Topos Institute.

Shannon entropy is a powerful concept. But what properties single out Shannon entropy as special? Instead of focusing on the entropy of a probability measure on a finite set, it can help to focus on the “information loss”, or change in entropy, associated with a measure-preserving function. Shannon entropy then gives the only concept of information loss that is functorial, convex-linear and continuous. This is joint work with Tom Leinster and Tobias Fritz.

2:30-3:00 — Coffee break.

3:00-4:30 — Operads and entropy, Tai-Danae Bradley, The Master’s University; Sandbox AQ.

This talk will open with a basic introduction to operads and their representations, with the main example being the operad of probabilities. I’ll then give a light sketch of how this framework leads to a small, but interesting, connection between information theory, abstract algebra, and topology, namely a correspondence between Shannon entropy and derivations of the operad of probabilities.

Symposium on Categorical Semantics of Entropy, Friday 13 May 2022, 9:30-3:15 Eastern Daylight Time, Room 5209 at the CUNY Graduate Center and via Zoom. Organized by John Terilla. To attend, register here.

9:30-10:00 Eastern Daylight Time — Coffee and pastries in Room 5209.

10:00-10:45 — Operadic composition of thermodynamic systems, Owen Lynch, Utrecht University.

The maximum entropy principle is a fascinating and productive lens with which to view both thermodynamics and statistical mechanics. In this talk, we present a categorification of the maximum entropy principle, using convex spaces and operads. Along the way, we will discuss a variety of examples of the maximum entropy principle and show how each application can be captured using our framework. This approach shines a new light on old constructions. For instance, we will show how we can derive the canonical ensemble by attaching a probabilistic system to a heat bath. Finally, our approach to this categorification has applications beyond the maximum entropy principle, and we will give a hint of how to adapt this categorification to the formalization of the composition of other systems.

11:00-11:45 — Polynomial functors and Shannon entropy, David Spivak, MIT and the Topos Institute.

The category Poly of polynomial functors in one variable is extremely rich, brimming with categorical gadgets (e.g. eight monoidal products, two closures, limits, colimits, etc.) and applications including dynamical systems, databases, open games, and cellular automata. In this talk I’ll show that objects in Poly can be understood as empirical distributions. In part using the standard derivative of polynomials, we obtain a functor to Set × Set^op which encodes an invariant of a distribution as a pair of sets. This invariant is well-behaved in the sense that it is a distributive monoidal functor: it acts on both distributions and maps between them, and it preserves both the sum and the tensor product of distributions. The Shannon entropy of the original distribution is then calculated directly from the invariant, i.e. only in terms of the cardinalities of these two sets. Given the many applications of polynomial functors and of Shannon entropy, having this link between them has potential to create useful synergies, e.g. to notions of entropic causality or entropic learning in dynamical systems.

12:00-1:30 — Lunch in Room 5209

1:30-2:15 — Higher entropy, Tom Mainiero, Rutgers New High Energy Theory Center.

Is the frowzy state of your desk no longer as thrilling as it once was? Are numerical measures of information no longer able to satisfy your needs? There is a cure! In this talk we’ll learn about: the secret topological lives of multipartite measures and quantum states; how a homological probe of this geometry reveals correlated random variables; the sly decategorified involvement of Shannon, Tsallis, Rényi, and von Neumann in this larger geometric conspiracy; and the story of how Gelfand, Neumark, and Segal’s construction of von Neumann algebra representations can help us uncover this informatic ruse. So come to this talk, spice up your entropic life, and bring new meaning to your relationship with disarray.

2:30-3:15 — On characterizing classical and quantum entropy, Arthur Parzygnat, Institut des Hautes Études Scientifiques.

In 2011, Baez, Fritz, and Leinster proved that the Shannon entropy can be characterized as a functor by a few simple postulates. In 2014, Baez and Fritz extended this theorem to provide a Bayesian characterization of the classical relative entropy, also known as the Kullback–Leibler divergence. In 2017, Gagné and Panangaden extended the latter result to include standard Borel spaces. In 2020, I generalized the first result on Shannon entropy so that it includes the von Neumann (quantum) entropy. In 2021, I provided partial results indicating that the Umegaki relative entropy may also have a Bayesian characterization. My results in the quantum setting are special applications of the recent theory of quantum Bayesian inference, which is a non-commutative extension of classical Bayesian statistics based on category theory. In this talk, I will give an overview of these developments and their possible applications in quantum information theory.

Wine and cheese reception to follow, Room 5209.

May 03, 2022

Tommaso DorigoA Joint Discussion Forum For Physicists Of Particles, Nuclei, Neutrinos, And The Cosmos

As I write these few lines, I am sitting in the nice auditorium at CSIC in Madrid, where I came for a congress that is a bit different to many others that take place around the world at all times. Truth be told, covid-19 took a big toll on the organization of these events, but slowly things are getting back to normality - the only visible sign of something different from 2019 in the auditorium is the fact that about 80 percent of the 180 scientists sitting around me wear a mask.


May 02, 2022

Doug NatelsonThe multiverse, everywhere, all at once

The multiverse (in a cartoonish version of the many-worlds interpretation of quantum mechanics sense - see here for a more in-depth writeup) is having a really good year.  There's all the Marvel properties (Spider-Man: No Way Home; Loki, with its Time Variance Authority; and this week's debut of Doctor Strange in the Multiverse of Madness), and the absolutely wonderful film Everything, Everywhere, All at Once, which I wholeheartedly recommend.

While it's fun to imagine alternate timelines, the actual many-worlds interpretation of quantum mechanics (MWI) is considerably more complicated than that, as outlined in the wiki link above.  The basic idea is that the apparent "collapse of the wavefunction" upon a measurement is a misleading way to think about quantum mechanics.  Prepare an electron so that its spin is aligned along the \(+x\) direction, and then measure \(s_{z}\).  The Copenhagen interpretation of quantum would say that prior to the measurement, the spin is in a superposition of \(s_{z} = +1/2\) and \(s_{z}=-1/2\), with equal amplitudes.  Once the measurement is completed, the system (discontinuously) ends up in a definite state of \(s_{z}\), either up or down.  If you started with an ensemble of identically prepared systems, you'd find up or down with 50/50 probability once you looked at the measurement results.    
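
For concreteness, here is a minimal numpy sketch of that Born-rule bookkeeping (the state labels, variable names, and normalization are mine, just making the 50/50 statement explicit):

import numpy as np

# s_z eigenstates in the standard basis
up = np.array([1, 0], dtype=complex)     # s_z = +1/2
down = np.array([0, 1], dtype=complex)   # s_z = -1/2

# Spin prepared along the +x direction: equal-amplitude superposition of up and down
plus_x = (up + down) / np.sqrt(2)

# Born-rule probabilities for an s_z measurement
p_up = abs(np.vdot(up, plus_x))**2
p_down = abs(np.vdot(down, plus_x))**2
print(p_up, p_down)   # both 0.5, up to floating-point rounding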

The MWI assumes that all time evolution of quantum systems is (in the non-relativistic limit) governed by the Schrödinger equation, period.  There is no sudden discontinuity in the time evolution of a quantum system due to measurement.  Rather, at times after the measurement, the spin up and spin down results both occur, and there are observers who (measured spin up, and \(s_{z}\) is now +1/2) and observers who (measured spin down, and \(s_{z}\) is now -1/2).  Voila, we no longer have to think about any discontinuous time evolution of a quantum state; of course, we have the small issues that (1) the universe becomes truly enormously huge, since it would have to encompass this idea that all these different branches/terms in the universal superposition "exist", and (2) there is apparently no way to tell experimentally whether that is actually the case, or whether it is just a way to think about things that makes some people feel more comfortable.  (Note, too, that exactly how the Born rule for probabilities arises and what it means in the MWI is not simple.) 

I'm not overly fond of the cartoony version of MWI.  As mentioned in point (2), there doesn't seem to be an experimental way to distinguish MWI from many other interpretations anyway, so maybe I shouldn't care.  I like Zurek's ideas quite a bit, but I freely admit that I have not had time to sit down and think deeply about this (I'm not alone in that).  That being said, lately I've been idly wondering if the objection to the "truly enormously huge" MWI multiverse is well-founded beyond an emotional level.  I mean, as a modern physicist, I already have come to accept (because of observational evidence) that the universe is huge, possibly infinite in spatial extent, appears to have erupted into an inflationary phase 13.8 billion years ago from an incredibly dense starting point, and contains incredibly rich structure that only represents 5% of the total mass of everything, etc.  I've also come to accept that quantum mechanics makes decidedly unintuitive predictions about reality that are borne out by experiment.  Maybe I should get over being squeamish about the MWI need for a zillion-dimensional Hilbert space multiverse.  As xkcd once said, the Drake Equation should include a factor for "amount of bullshit you're willing to buy from Frank Drake".  Why should MWI's overhead be a bridge too far?

It's certainly fun to speculate idly about roads not taken.  I recommend this thought-provoking short story by Larry Niven about this, which struck my physics imagination back when I was in high school.  Perhaps there's a branch of the multiverse where my readership is vast :-)



n-Category Café Shannon Entropy from Category Theory

I’m giving a talk at Categorical Semantics of Entropy on Wednesday May 11th, 2022. You can watch it live on Zoom if you register, or recorded later. Here’s the idea:

Shannon entropy is a powerful concept. But what properties single out Shannon entropy as special? Instead of focusing on the entropy of a probability measure on a finite set, it can help to focus on the “information loss”, or change in entropy, associated with a measure-preserving function. Shannon entropy then gives the only concept of information loss that is functorial, convex-linear and continuous. This is joint work with Tom Leinster and Tobias Fritz.

You can see the slides now, here.

It’s fun to talk about work that I did with Tobias Fritz and Tom Leinster here on the n-Café — I’ve never given a talk where I went into as much detail as I will now. In fact I will talk a bit about all these:

• John Baez, Tobias Fritz and Tom Leinster, A characterization of entropy in terms of information loss, 2011.

• Tom Leinster, An operadic introduction to entropy, 2011.

• John Baez and Tobias Fritz, A Bayesian characterization of relative entropy, 2014.

• Tom Leinster, A short characterization of relative entropy, 2017.

• Nicolas Gagné and Prakash Panangaden, A categorical characterization of relative entropy on standard Borel spaces, 2017.

• Tom Leinster, Entropy and Diversity: the Axiomatic Approach, 2020.

• Arthur Parzygnat, A functorial characterization of von Neumann entropy, 2020.

• Arthur Parzygnat, Towards a functorial description of quantum relative entropy, 2021.

• Tai-Danae Bradley, Entropy as a topological operad derivation, 2021.

April 29, 2022

Matt von HippelThings Which Are Fluids

For overambitious apes like us, adding integers is the easiest thing in the world. Take one berry, add another, and you have two. Each remains separate, you can lay them in a row and count them one by one, each distinct thing adding up to a group of distinct things.

Other things in math are less like berries. Add two real numbers, like pi and the square root of two, and you get another real number, bigger than the first two, something you can write in an infinite messy decimal. You know in principle you can separate it out again (subtract pi, get the square root of two), but you can’t just stare at it and see the parts. This is less like adding berries, and more like adding fluids. Pour some water in to some other water, and you certainly have more water. You don’t have “two waters”, though, and you can’t tell which part started as which.

More waters, please!

Some things in math look like berries, but are really like fluids. Take a polynomial, say 5 x^2 + 6 x + 8. It looks like three types of things, like three berries: five x^2, six x, and eight 1. Add another polynomial, and the illusion continues: add x^2 + 3 x + 2 and you get 6 x^2+9 x+10. You’ve just added more x^2, more x, more 1, like adding more strawberries, blueberries, and raspberries.

But those berries were a choice you made, and not the only one. You can rewrite that first polynomial, for example saying 5(x^2+2x+1) - 4 (x+1) + 7. That’s the same thing, you can check. But now it looks like five x^2+2x+1, negative four x+1, and seven 1. It’s different numbers of different things, blackberries or gooseberries or something. And you can do this in many ways, infinitely many in fact. The polynomial isn’t really a collection of berries, for all it looked like one. It’s much more like a fluid, a big sloshing mess you can pour into buckets of different sizes. (Technically, it’s a vector space. Your berries were a basis.)
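
If you want the bookkeeping done for you, here is a quick sympy check (my own snippet, not from the post) that the two readings really are the same polynomial:

import sympy as sp

x = sp.symbols('x')
p = 5*x**2 + 6*x + 8                      # the strawberries/blueberries/raspberries reading
q = 5*(x**2 + 2*x + 1) - 4*(x + 1) + 7    # the same fluid poured into different buckets

print(sp.expand(p - q) == 0)   # True: same polynomial, different basis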

Even smart, advanced students can get tripped up on this. You can be used to treating polynomials as a fluid, and forget that directions in space are a fluid, one you can rotate as you please. If you’re used to directions in space, you’ll get tripped up by something else. You’ll find that types of particles can be more fluid than berry, the question of which quark is which not as simple as how many strawberries and blueberries you have. The laws of physics themselves are much more like a fluid, which should make sense if you take a moment, because they are made of equations, and equations are like a fluid.

So my fellow overambitious apes, do be careful. Not many things are like berries in the end. A whole lot are like fluids.

Tim GowersAnnouncing an automatic theorem proving project

I am very happy to say that I have recently received a generous grant from the Astera Institute to set up a small group to work on automatic theorem proving, in the first instance for about three years after which we will take stock and see whether it is worth continuing. This will enable me to take on up to about three PhD students and two postdocs over the next couple of years. I am imagining that two of the PhD students will start next October and that at least one of the postdocs will start as soon as is convenient for them. Before any of these positions are advertised, I welcome any informal expressions of interest: in the first instance you should email me, and maybe I will set up Zoom meetings. (I have no idea what the level of demand is likely to be, so exactly how I respond to emails of this kind will depend on how many of them there are.)

I have privately let a few people know about this, and as a result I know of a handful of people who are already in Cambridge and are keen to participate. So I am expecting the core team working on the project to consist of 6-10 people. But I also plan to work in as open a way as possible, in the hope that people who want to can participate in the project remotely even if they are not part of the group that is based physically in Cambridge. Thus, part of the plan is to report regularly and publicly on what we are thinking about, what problems, both technical and more fundamental, are holding us up, and what progress we make. Also, my plan at this stage is that any software associated with the project will be open source, and that if people want to use ideas generated by the project to incorporate into their own theorem-proving programs, I will very much welcome that.

I have written a 54-page document that explains in considerable detail what the aims and approach of the project will be. I would urge anyone who thinks they might want to apply for one of the positions to read it first — not necessarily every single page, but enough to get a proper understanding of what the project is about. Here I will explain much more briefly what it will be trying to do, and what will set it apart from various other enterprises that are going on at the moment.

In brief, the approach taken will be what is often referred to as a GOFAI approach, where GOFAI stands for “good old-fashioned artificial intelligence”. Roughly speaking, the GOFAI approach to artificial intelligence is to try to understand as well as possible how humans achieve a particular task, and eventually reach a level of understanding that enables one to program a computer to do the same.

As the phrase “old-fashioned” suggests, GOFAI has fallen out of favour in recent years, and some of the reasons for that are good ones. One reason is that after initial optimism, progress with that approach stalled in many domains of AI. Another is that with the rise of machine learning it has become clear that for many tasks, especially pattern-recognition tasks, it is possible to program a computer to do them very well without having a good understanding of how humans do them. For example, we may find it very difficult to write down a set of rules that distinguishes between an array of pixels that represents a dog and an array of pixels that represents a cat, but we can still train a neural network to do the job.

However, while machine learning has made huge strides in many domains, it still has several areas of weakness that are very important when one is doing mathematics. Here are a few of them.

  1. In general, tasks that involve reasoning in an essential way.
  2. Learning to do one task and then using that ability to do another.
  3. Learning based on just a small number of examples.
  4. Common sense reasoning.
  5. Anything that involves genuine understanding (even if it may be hard to give a precise definition of what understanding is) as opposed to sophisticated mimicry.

Obviously, researchers in machine learning are working in all these areas, and there may well be progress over the next few years [in fact, there has been progress on some of these difficulties already of which I was unaware — see some of the comments below], but for the time being there are still significant limitations to what machine learning can do. (Two people who have written very interestingly on these limitations are Melanie Mitchell and François Chollet.)

That said, using machine learning techniques in automatic theorem proving is a very active area of research at the moment. (Two names you might like to look up if you want to find out about this area are Christian Szegedy and Josef Urban.) The project I am starting will not be a machine-learning project, but I think there is plenty of potential for combining machine learning with GOFAI ideas — for example, one might use GOFAI to reduce the options for what the computer will do next to a small set and use machine learning to choose the option out of that small set — so I do not rule out some kind of wider collaboration once the project has got going.

Another area that is thriving at the moment is formalization. Over the last few years, several major theorems and definitions have been fully formalized that would have previously seemed out of reach — examples include Gödel’s theorem, the four-colour theorem, Hales’s proof of the Kepler conjecture, Thompson’s odd-order theorem, and a lemma of Dustin Clausen and Peter Scholze with a proof that was too complicated for them to be able to feel fully confident that it was correct. That last formalization was carried out in Lean by a team led by Johan Commelin, which is part of the more general and exciting Lean group that grew out of Kevin Buzzard‘s decision a few years ago to incorporate Lean formalization into his teaching at Imperial College London.

As with machine learning, I mention formalization in order to contrast it with the project I am announcing here. It may seem slightly negative to focus on what it will not be doing, but I feel it is important, because I do not want to attract applications from people who have an incorrect picture of what they would be doing. Also as with machine learning, I would welcome and even expect collaboration with the Lean group. For us it would be potentially very interesting to make use of the Lean database of results, and it would also be nice (even if not essential) to have output that is formalized using a standard system. And we might be able to contribute to the Lean enterprise by creating code that performs steps automatically that are currently done by hand. A very interesting looking new institute, the Hoskinson Center for Formal Mathematics, has recently been set up with Jeremy Avigad at the helm, which will almost certainly make such collaborations easier.

But now let me turn to the kinds of things I hope this project will do.

Why is mathematics easy?

Ever since Turing, we have known that there is no algorithm that will take as input a mathematical statement and output a proof if the statement has a proof or the words “this statement does not have a proof” otherwise. (If such an algorithm existed, one could feed it statements of the form “algorithm A halts” and the halting problem would be solved.) If P\ne NP, then there is not even a practical algorithm for determining whether a statement has a proof of at most some given length — a brute-force algorithm exists, but takes far too long. Despite this, mathematicians regularly find long and complicated proofs of theorems. How is this possible?
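
(An aside on why the brute-force option is hopeless: a deliberately naive sketch, in which is_valid_proof stands in for a proof checker for some fixed formal system and is purely hypothetical, not any real library.)

from itertools import product

def brute_force_prove(statement, alphabet, max_length, is_valid_proof):
    """Try every string of symbols up to max_length as a candidate proof of statement."""
    for length in range(1, max_length + 1):
        for symbols in product(alphabet, repeat=length):
            candidate = "".join(symbols)
            if is_valid_proof(candidate, statement):
                return candidate
    return None  # no proof of at most max_length symbols exists

# Even with a modest 50-symbol alphabet, proofs of length 100 already mean
# 50**100 candidates, which is utterly infeasible; that is the point.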

The broad answer is that while the theoretical results just alluded to show that we cannot expect to determine the proof status of arbitrary mathematical statements, that is not what we try to do. Rather, we look at only a tiny fraction of well-formed statements, and the kinds of proofs we find tend to have a lot more structure than is guaranteed by the formal definition of a proof as a sequence of statements, each of which is either an initial assumption or follows in a simple way from earlier statements. (It is interesting to speculate about whether there are, out there, utterly bizarre and idea-free proofs that just happen to establish concise mathematical statements but that will never be discovered because searching for them would take too long.) A good way of thinking about this project is that it will be focused on the following question.

Question. What is it about the proofs that mathematicians actually find that makes them findable in a reasonable amount of time?

Clearly, a good answer to this question would be extremely useful for the purposes of writing automatic theorem proving programs. Equally, any advances in a GOFAI approach to writing automatic theorem proving programs have the potential to feed into an answer to the question. I don’t have strong views about the right balance between the theoretical and practical sides of the project, but I do feel strongly that both sides should be major components of it.

The practical side of the project will, at least to start with, be focused on devising algorithms that find proofs in a way that imitates as closely as possible how humans find them. One important aspect of this is that I will not be satisfied with programs that find proofs after carrying out large searches, even if those searches are small enough to be feasible. More precisely, searches will be strongly discouraged unless human mathematicians would also need to do them. A question that is closely related to the question above is the following, which all researchers in automatic theorem proving have to grapple with.

Question. Humans seem to be able to find proofs with a remarkably small amount of backtracking. How do they prune the search tree to such an extent?

Allowing a program to carry out searches of “silly” options that humans would never do is running away from this absolutely central question.

With Mohan Ganesalingam, Ed Ayers and Bhavik Mehta (but not simultaneously), I have over the years worked on writing theorem-proving programs with as little search as possible. This will provide a starting point for the project. One of the reasons I am excited to have the chance to set up a group is that I have felt for a long time that with more people working on the project, there is a chance of much more rapid progress — I think the progress will scale up more than linearly in the number of people, at least up to a certain size. And if others were involved round the world, I don’t think it is unreasonable to hope that within a few years there could be theorem-proving programs that were genuinely useful — not necessarily at a research level but at least at the level of a first-year undergraduate. (To be useful a program does not have to be able to solve every problem put in front of it: even a program that could solve only fairly easy problems but in a sufficiently human way that it could explain how it came up with its proofs could be a very valuable educational tool.)

A more distant dream is of course to get automatic theorem provers to the point where they can solve genuinely difficult problems. Something else that I would like to see coming out of this project is a serious study of how humans do this. From time to time I have looked at specific proofs that appear to require at a certain point an idea that comes out of nowhere, and after thinking very hard about them I have eventually managed to present a plausible account of how somebody might have had the idea, which I think of as a “justified proof”. I would love it if there were a large collection of such accounts, and I have it in mind as a possible idea to set up (with help) a repository for them, though I would need to think rather hard about how best to do it. One of the difficulties is that whereas there is widespread agreement about what constitutes a proof, there is not such a clear consensus about what constitutes a convincing explanation of where an idea comes from. Another theoretical problem that interests me a lot is the following.

Problem. Come up with a precise definition of a “proof justification”.

Though I do not have a satisfactory definition, very recently I have had some ideas that will I think help to narrow down the search substantially. I am writing these ideas down and hope to make them public soon.

Who am I looking for?

There is much more I could say about the project, but if this whets your appetite, then I refer you to the document where I have said much more about it. For the rest of this post I will say a little bit more about the kind of person I am looking for and how a typical week might be spent.

The most important quality I am looking for in an applicant for a PhD or postdoc associated with this project is a genuine enthusiasm for the GOFAI approach briefly outlined here and explained in more detail in the much longer document. If you read that document and think that that is the kind of work you would love to do and would be capable of doing, then that is a very good sign. Throughout the document I give indications of things that I don’t yet know how to do. If you find yourself immediately drawn into thinking about those problems, which range from small technical problems to big theoretical questions such as the ones mentioned above, then that is even better. And if you are not fully satisfied with a proof unless you can see why it was natural for somebody to think of it, then that is better still.

I would expect a significant proportion of people reading the document to have an instinctive reaction that the way I propose to attack the problems is not the best way, and that surely one should use some other technique — machine learning, large search, the Knuth-Bendix algorithm, a computer algebra package, etc. etc. — instead. If that is your reaction, then the project probably isn’t a good fit for you, as the GOFAI approach is what it is all about.

As far as qualifications are concerned, I think the ideal candidate is somebody with plenty of experience of solving mathematical problems (either challenging undergraduate-level problems for a PhD candidate or research-level problems for a postdoc candidate), and good programming ability. But if I had to choose one of the two, I would pick the former over the latter, provided that I could be convinced that a candidate had a deep understanding of what a well-designed algorithm would look like. (I myself am not a fluent programmer — I have some experience of Haskell and Python and I think a pretty good feel for how to specify an algorithm in a way that makes it implementable by somebody who is a quick coder, and in my collaborations so far have relied on my collaborators to do the coding.) Part of the reason for that is that I hope that if one of the outputs of the group is detailed algorithm designs, then there will be remote participants who would enjoy turning those designs into code.

How will the work be organized?

The core group is meant to be a genuine team rather than simply a group of a few individuals with a common interest in automatic theorem proving. To this end, I plan that the members of the group will meet regularly — I imagine something like twice a week for at least two hours and possibly more — and will keep in close contact, and very likely meet less formally outside those meetings. The purpose of the meetings will be to keep the project appropriately focused. That is not to say that all team members will work on the same narrow aspect of the problem at the same time. However, I think that with a project like this it will be beneficial (i) to share ideas frequently, (ii) to keep thinking strategically about how to get the maximum expected benefit for the effort put in , and (iii) to keep updating our public list of open questions (which will not be open questions in the usual mathematical sense, but questions more like “How should a computer do this?” or “Why is it so obvious to a human mathematician that this would be a silly thing to try?”).

In order to make it easy for people to participate remotely, I think probably we will want to set up a dedicated website where people can post thoughts, links to code, questions, and so on. Some thought will clearly need to go into how best to design such a site, and help may be needed to build it, which if necessary I could pay for. Another possibility would of course be to have Zoom meetings, but whether or not I would want to do that depends somewhat on who ends up participating and how they do so.

Since the early days of Polymath I have become much more conscious that merely stating that a project is open to anybody who wishes to join in does not automatically make it so. For example, whereas I myself am comfortable with publicly suggesting a mathematical idea that turns out on further reflection to be fruitless or even wrong, many people are, for very good reasons, not willing to do so, and those people belong disproportionately to groups that have been historically marginalized from mathematics — which of course is not a coincidence. Because of this, I have not yet decided on the details of how remote participation might work. Maybe part of it could be fully open in the way that Polymath was, but part of it could be more private and carefully moderated. Or perhaps separate groups could be set up that communicated regularly with the Cambridge group. There are many possibilities, but which ones would work best depends on who is interested. If you are interested in the project but would feel excluded by certain modes of participation, then please get in touch with me and we can think about what would work for you.

April 27, 2022

Scott Aaronson My first-ever attempt to create a meme!

Scott Aaronson An update on the campaign to defend serious math education in California

Update (April 27): Boaz Barak—Harvard CS professor, longtime friend-of-the-blog, and coauthor of my previous guest post on this topic—has just written an awesome FAQ, providing his personal answers to the most common questions about what I called our “campaign to defend serious math education.” It directly addresses several issues that have already come up in the comments. Check it out!


As you might remember, last December I hosted a guest post about the “California Mathematics Framework” (CMF), which was set to cause radical changes to precollege math in California—e.g., eliminating 8th-grade algebra and making it nearly impossible to take AP Calculus. I linked to an open letter setting out my and my colleagues’ concerns about the CMF. That letter went on to receive more than 1700 signatures from STEM experts in industry and academia from around the US, including recipients of the Nobel Prize, Fields Medal, and Turing Award, as well as a lot of support from college-level instructors in California. 

Following widespread pushback, a new version of the CMF appeared in mid-March. I and others are gratified that the new version significantly softens the opposition to acceleration in high school math and to calculus as a central part of mathematics.  Nonetheless, we’re still concerned that the new version promotes a narrative about data science that’s a recipe for cutting kids off from any chance at earning a 4-year college degree in STEM fields (including, ironically, in data science itself).

To that end, some of my Californian colleagues have issued a new statement today on behalf of academic staff at 4-year colleges in California, aimed at clearing away the fog on how mathematics is related to data science. I strongly encourage my readers on the academic staff at 4-year colleges in California to sign this commonsense statement, which has already been signed by over 250 people (including, notably, at least 50 from Stanford, home of two CMF authors).

As a public service announcement, I’d also like to bring to wider awareness Section 18533 of the California Education Code, for submitting written statements to the California State Board of Education (SBE) about errors, objections, and concerns in curricular frameworks such as the CMF.  

The SBE is scheduled to vote on the CMF in mid-July, and their remaining meeting before then is on May 18-19 according to this site, so it is really at the May meeting that concerns need to be aired.  Section 18533 requires submissions to be written (yes, snail mail) and postmarked at least 10 days before the SBE meeting. So to make your voice heard by the SBE, please send your written concern by certified mail (for tracking, but not requiring signature for delivery), no later than Friday May 6, to State Board of Education, c/o Executive Secretary of the State Board of Education, 1430 N Street, Room 5111, Sacramento, CA 95814, complemented by an email submission to sbe@cde.ca.gov and mathframework@cde.ca.gov.

April 25, 2022

John PreskillThese are a few of my favorite steampunk books

As a physicist, one grows used to answering audience questions at the end of a talk one presents. As a quantum physicist, one grows used to answering questions about futuristic technologies. As a quantum-steampunk physicist, one grows used to the question “Which are your favorite steampunk books?”

Literary Hub has now published my answer.

According to its website, “Literary Hub is an organizing principle in the service of literary culture, a single, trusted, daily source for all the news, ideas and richness of contemporary literary life. There is more great literary content online than ever before, but it is scattered, easily lost—with the help of its editorial partners, Lit Hub is a site readers can rely on for smart, engaged, entertaining writing about all things books.”

My article, “Five best books about the romance of Victorian science,” appeared there last week. You’ll find fiction, nonfiction as imaginative as fiction, and crossings of the border between the two. 

My contribution to literature about the romance of Victorian science—my (mostly) nonfiction book, Quantum Steampunk: The Physics Of Yesterday’s Tomorrow—was  published two weeks ago. Where’s a hot-air-balloon emoji when you need one?

April 24, 2022

Scott Aaronson On form versus meaning

There is a fundamental difference between form and meaning. Form is the physical structure of something, while meaning is the interpretation or concept that is attached to that form. For example, the form of a chair is its physical structure – four legs, a seat, and a back. The meaning of a chair is that it is something you can sit on.

This distinction is important when considering whether or not an AI system can be trained to learn semantic meaning. AI systems are capable of learning and understanding the form of data, but they are not able to attach meaning to that data. In other words, AI systems can learn to identify patterns, but they cannot understand the concepts behind those patterns.

For example, an AI system might be able to learn that a certain type of data is typically associated with the concept of “chair.” However, the AI system would not be able to understand what a chair is or why it is used. In this way, we can see that an AI system trained on form can never learn semantic meaning.

–GPT3, when I gave it the prompt “Write an essay proving that an AI system trained on form can never learn semantic meaning” 😃

ResonaancesApril Fools'21: Trouble with g-2

On April 7, the g-2 experiment at Fermilab was supposed to reveal their new measurement of the magnetic moment of the muon. *Was*, because the announcement may be delayed for the most bizarre reason. You may have heard that the data are blinded to avoid biasing the outcome. This is now standard practice, but the g-2 collaboration went further: they are unable to unblind the data by themselves, to make sure that there are no leaks or temptations. Instead, the unblinding procedure requires input from an external person, who is one of the Fermilab theorists. How does this work? The experiment measures the frequency of precession of antimuons circulating in a ring. From that and the known magnetic field the sought fundamental quantity - the magnetic moment of the muon, or g-2 in short - can be read off. However, the whole analysis chain is performed using a randomly chosen number instead of the true clock frequency. Only at the very end, once all statistical and systematic errors are determined, the true frequency is inserted and the final result is uncovered. For that last step they need to type the secret code into this machine looking like something from a 60s movie:

The code was picked by the Fermilab theorist, and he is the only person to know it.  There is the rub... this theorist now refuses to give away the code.  It is not clear why. One time he said he had forgotten the envelope with the code on a train, another time he said the dog had eaten it. For the last few days he has locked himself in his home and completely stopped taking any calls. 

The situation is critical. PhD students from the collaboration are working round the clock to crack the code. They are basically trying out all possible combinations, but the process is painstakingly slow and may take months, delaying the long-expected announcement. The collaboration even got permission from the Fermilab director to search the office of the said theorist. But they only found this piece of paper behind the bookshelf:

It may be that the paper holds a clue about the code. If you have any idea what the code may be, please email fermilab@fnal.gov or just write it in the comments below.


Update: a part of this post (but strangely enough not all) is an April Fools joke. The new g-2 results are going to be presented on April 7, 2021, as planned. The code is OPE, which stands for "operator product expansion", an important technique used in the theoretical calculation of hadronic corrections to muon g-2:



ResonaancesWhy is it when something happens it is ALWAYS you, muons?

April 7, 2021 was like a good TV episode: high-speed action, plot twists, and a cliffhanger ending. We now know that the strength of the little magnet inside the muon is described by the g-factor: 

g = 2.00233184122(82).

Any measurement of basic properties of matter is priceless, especially when it comes with this incredible precision.  But for a particle physicist the main source of excitement is that this result could herald the breakdown of the Standard Model. The point is that the g-factor or the magnetic moment of an elementary particle can be calculated theoretically to very good accuracy. Last year, the white paper of the Muon g−2 Theory Initiative came up with the consensus value for the Standard Model prediction

g = 2.00233183620(86),

which is significantly smaller than the experimental value.  The discrepancy is estimated at 4.2 sigma, assuming the theoretical error is Gaussian and combining the errors in quadrature. 
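
That 4.2 sigma is just the two values above pushed through the quadrature rule; a minimal check (my own snippet, using only the numbers quoted in this post):

import math

g_exp, sig_exp = 2.00233184122, 82e-11   # experimental value quoted above
g_th,  sig_th  = 2.00233183620, 86e-11   # Standard Model prediction quoted above

diff = g_exp - g_th
sigma = math.sqrt(sig_exp**2 + sig_th**2)           # combine errors in quadrature
print(f"{diff:.2e}  ->  {diff / sigma:.1f} sigma")  # ~5.0e-09  ->  4.2 sigma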

As usual, when we see an experiment and the Standard Model disagree, these 3 things come to mind first

  1.  Statistical fluctuation. 
  2.  Flawed theory prediction. 
  3.  Experimental screw-up.   

The odds for 1. are extremely low in this case.  3. is not impossible but unlikely as of April 7. Basically the same experiment was repeated twice, first in Brookhaven 20 years ago, and now in Fermilab, yielding very consistent results. One day it would be nice to get an independent confirmation using alternative experimental techniques, but we are not losing any sleep over it. It is fair to say, however,  that 2. is not yet written off by most of the community. The process leading to the Standard Model prediction is of enormous complexity. It combines technically challenging perturbative calculations (5-loop QED!), data-driven methods, and non-perturbative inputs from dispersion relations, phenomenological models, and lattice QCD. One especially difficult contribution to evaluate is due to loops of light hadrons (pions etc.) affecting photon propagation.  In the white paper,  this hadronic vacuum polarization is related by theoretical tricks to low-energy electron scattering and determined from experimental data. However, the currently most precise lattice evaluation of the same quantity gives a larger value that would take the Standard Model prediction closer to the experiment. The lattice paper first appeared a year ago but only now was published in Nature in a well-timed move that can be compared to an ex crashing a wedding party. The theory and experiment are now locked in a three-way duel, and we are waiting for the shootout to see which theoretical prediction survives. Until this controversy is resolved, there will be a cloud of doubt hanging over every interpretation of the muon g-2 anomaly.   

  But let us assume for a moment that the white paper value is correct. This would be huge, as it would mean that the Standard Model does not fully capture how muons interact with light. The correct interaction Lagrangian would have to be (pardon my Greek)
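
(Schematically, and with conventions and normalization that are my reconstruction rather than the original figure, the interaction would be of the form:)

\displaystyle  \mathcal{L} \supset e\, \bar{\mu}\, \gamma^{\nu} \mu\, A_{\nu} \;+\; \frac{c}{\Lambda}\, \bar{\mu}\, \sigma^{\nu\rho} \mu\, F_{\nu\rho}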

The first term is the renormalizable minimal coupling present in the Standard Model, which gives the Coulomb force and all the usual electromagnetic phenomena. The second term is called the magnetic dipole. It leads to a small shift of the muon g-factor, so as to explain the Brookhaven and Fermilab measurements.  This is a non-renormalizable interaction, and so it must be an effective description of virtual effects of some new particle from beyond the Standard Model. Theorists have invented countless models for this particle in order to address the old Brookhaven measurement, and the Fermilab update changes little in this enterprise. I will write about it another time.  For now, let us just crunch some numbers to highlight one general feature. Even though the scale suppressing the effective dipole operator is in the EeV range, there are indications that the culprit particle is much lighter than that. First, electroweak gauge invariance forces it to be less than ~100 TeV in a rather model-independent way.  Next, in many models contributions to muon g-2 come with the chiral suppression proportional to the muon mass. Moreover, they typically appear at one loop, so the operator will pick up a loop suppression factor unless the new particle is strongly coupled.  The same dipole operator as above can be more suggestively recast as  
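
(Again only a guess at the displayed formula, but pulling out the chiral factor of the muon mass and a one-loop factor as just described, something like:)

\displaystyle  \frac{e}{16\pi^{2}}\, \frac{m_{\mu}}{(300~\mathrm{GeV})^{2}}\, \bar{\mu}\, \sigma^{\nu\rho} \mu\, F_{\nu\rho}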

The scale 300 GeV appearing in the denominator indicates that the new particle should be around the corner!  Indeed, the discrepancy between the theory and experiment is larger than the contribution of the W and Z bosons to the muon g-2, so it seems logical to put the new particle near the electroweak scale. That's why the stakes of the April 7 Fermilab announcement are so enormous. If the gap between the Standard Model and experiment is real, the new particles and forces responsible for it should be within reach of the present or near-future colliders. This would open a new experimental era that is almost too beautiful to imagine. And for theorists, it would bring new pressing questions about who ordered it. 

April 23, 2022

Scott Aaronson Back

Thanks to everyone who asked whether I’m OK! Yeah, I’ve been living, loving, learning, teaching, worrying, procrastinating, just not blogging.


Last week, Takashi Yamakawa and Mark Zhandry posted a preprint to the arXiv, “Verifiable Quantum Advantage without Structure,” that represents some of the most exciting progress in quantum complexity theory in years. I wish I’d thought of it. tl;dr they show that relative to a random oracle (!), there’s an NP search problem that quantum computers can solve exponentially faster than classical ones. And yet this is 100% consistent with the Aaronson-Ambainis Conjecture!


A student brought my attention to Quantle, a variant of Wordle where you need to guess a true equation involving 1-qubit quantum states and unitary transformations. It’s really well-done! Possibly the best quantum game I’ve seen.


Last month, Microsoft announced on the web that it had achieved an experimental breakthrough in topological quantum computing: not quite the creation of a topological qubit, but some of the underlying physics required for that. This followed their needing to retract their previous claim of such a breakthrough, due to the criticisms of Sergey Frolov and others. One imagines that they would’ve taken far greater care this time around. Unfortunately, a research paper doesn’t seem to be available yet. Anyone with further details is welcome to chime in.


Woohoo! Maximum flow, maximum bipartite matching, matrix scaling, and isotonic regression on posets (among many others)—all algorithmic problems that I was familiar with way back in the 1990s—are now solvable in nearly-linear time, thanks to a breakthrough by Chen et al.! Many undergraduate algorithms courses will need to be updated.


For those interested, Steve Hsu recorded a podcast with me where I talk about quantum complexity theory.

April 19, 2022

ResonaancesHow large is the W mass anomaly

Everything is larger in the US: cars, homes, food portions, people. The CDF collaboration from the now defunct Tevatron collider argues that this phenomenon is rooted in fundamental physics: 

The plot shows the most precise measurements of the mass of the W boson - one of the fundamental particles of the Standard Model. The lone wolf is the new CDF result. It is clear that the W mass is larger around the CDF detector than in the canton of Geneva, and the effect is significant enough to be considered evidence.  More quantitatively, the CDF result is 

  • 3.0 sigma above the most precise LHC measurement by the ATLAS collaboration. 
  • 2.4 sigma above the more recent LHC measurement by the LHCb collaboration. 
  • 1.7 sigma above the combined measurements from the four collaborations of the LEP collider. 

All in all, the evidence that the W boson is heavier in the US than in Europe stands firm. (For the sake of the script I will not mention here that the CDF result is also 2.4 sigma larger than the other Tevatron measurement from the D0 collaboration, and 2.2 sigma larger than... the previous CDF measurement from 10 years before.) 

But jokes aside, what should we make of the current confusing situation?  The tension between CDF and the combination of the remaining mW measurements is a whopping 4.1 sigma.  What value of mW should we then use in the Standard Model fits and new physics analyses? Certainly not the CDF one, some 6.5 sigma away from the Standard Model prediction, because that value does not take into account the input from other experiments. At the same time we cannot just ignore CDF. In the end we do not know for sure who is right and who is wrong here. While most physicists tacitly assume that CDF has made a mistake, it is also conceivable that the other experiments have been suffering from confirmation bias. Finally, a naive combination of all the results is not a sensible option either.  Indeed, at face value the Gaussian combination leads to mW = 80.410(7) GeV. This value is however not very meaningful from the statistical perspective: it's impossible to state, with 68 percent confidence, that the true value of the W mass is between 80.403 and 80.417 GeV. That range doesn't even overlap with either of the two most precise measurements, from CDF and ATLAS!  (One should also be careful with Gaussian combinations because there can be subtle correlations between the different experimental results. Numerically, however, this should not be a big problem in the case at hand, as in the past the W mass results obtained via naive combinations were in fact very close to the more careful averages by the Particle Data Group.) Due to the disagreement between the experiments, our knowledge of the true value of mW is degraded, and the combination should somehow account for that.

The question of combining information from incompatible measurements is a delicate one, residing at the boundary between statistics, psychology, and art. Contradictory results are rare in collider physics, because of the small number of experiments and the high level of scrutiny. However, they are common in other branches of physics, just to mention the neutron lifetime or the electron g-2 as recent examples. To deal with such unpleasantness, the Particle Data Group developed a totally ad hoc but very useful procedure. The idea is to penalize everyone in a democratic way, assuming that all experimental errors have been underestimated. More quantitatively, one inflates the errors of all the involved results until the χ^2 per degree of freedom in the combination is equal to 1.  Applying this procedure to the W mass measurements, it is necessary to inflate the errors by a factor of S=2.1, which leads to mW = 80.410(15) GeV. 
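
As a concrete illustration, here is a minimal sketch of that recipe in Python. The input values are rough placeholders for the most precise mW results (not the exact set of inputs or correlations used in the careful combinations), so the output only approximately reproduces the numbers quoted above.

import numpy as np

# Approximate W mass measurements in GeV (CDF, ATLAS, LHCb, LEP, D0) -- placeholder values only
mw  = np.array([80.4335, 80.370, 80.354, 80.376, 80.375])
err = np.array([0.0094,  0.019,  0.032,  0.033,  0.023])

w     = 1.0 / err**2
mean  = np.sum(w * mw) / np.sum(w)               # naive Gaussian (inverse-variance) combination
sigma = 1.0 / np.sqrt(np.sum(w))                 # its nominal uncertainty

chi2 = np.sum(((mw - mean) / err)**2)
S    = max(1.0, np.sqrt(chi2 / (len(mw) - 1)))   # PDG scale factor: inflate errors until chi2/dof = 1

print(f"naive combination: mW = {mean:.4f} +- {sigma:.4f} GeV")
print(f"with S = {S:.2f}:     mW = {mean:.4f} +- {S*sigma:.4f} GeV")

With these placeholder inputs one gets S close to 2 and an inflated uncertainty of roughly 15 MeV, in the same ballpark as the numbers quoted above.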

The inflated result makes more intuitive sense, since the combined 1 sigma band overlaps with the most precise CDF measurement and lies close enough to the error bars of the other experiments. If you accept that combination, the tension with the Standard Model stands at 3 sigma. This value fairly well represents the current situation: it is large enough to warrant further interest, but not large enough to claim a discovery of new physics beyond the Standard Model. 

The confusion may stay with us for a long time. It will go away if CDF finds an error in their analysis, or if future ATLAS updates shift mW significantly upwards.  But the most likely scenario, in my opinion, is that the Europe/US divide will only grow in time.  The CDF result could be eliminated from the combination when other experiments reach significantly better precision. Unfortunately, this is unlikely to happen in the foreseeable future; new colliders and better theory calculations may be necessary to shrink the error bars well below 10 MeV. The conclusion is that particle physicists should shake hands with their nuclear colleagues and start getting used to the S-factors. 

April 13, 2022

Terence TaoWeb page for the Short Communication Satellite (SCS) of the 2022 virtual ICM now live

Just a brief update to the previous post. Gerhard Paseman and I have now set up a web site for the Short Communication Satellite (SCS) for the virtual International Congress of Mathematicians (ICM), which will be an experimental, independent online satellite event in which short communications on topics relevant to one or two of the sections of the ICM can be submitted, reviewed by peers, and (if appropriate for the SCS event) displayed in a virtual “poster room” during the Congress on July 6-14 (which, by the way, has recently released its schedule and list of speakers). Our plan is to open the registration for this event on April 5, and start taking submissions on April 20; we are also currently accepting any expressions of interest in helping out with the event, for instance by serving as a reviewer. For more information about the event, please see the overview page, the guidelines page, and the FAQ page of the web site. As viewers will see, the web site is still somewhat under construction, but will be updated as we move closer to the actual Congress.

The comments section of this post would be a suitable place to ask further questions about this event, or give any additional feedback.

UPDATE: for readers who have difficulty accessing the links above, here are backup copies of the overview page and guidelines page.

April 11, 2022

Jacques Distler Monterey and Samba

I reluctantly upgraded my laptop from Mojave to Monterey (macOS 12.3.1). Things have not gone smoothly. My biggest annoyance, currently, is with Time Machine.

I have things set up so that, at home, my laptop wirelessly backs up, alternately, to one of two Samba Servers. The way this works is that Time Machine creates a .sparsebundle on the Share. Inside the .sparsebundle is a file system on which the actual backups reside. This is entirely opaque to the Linux system on which the Samba Server is running; all it sees is a directory full of ordinary files (“bands”) which comprise the .sparsebundle.

On older versions of macOS, the file system inside the .sparsebundle was HFS+. Monterey supposedly still supports that, but creates new .sparsebundles where the internal file system is APFS.

After upgrading, I tried to do an incremental backup. This repeatedly failed, with a slew of errors that I don’t want to get into right now. Evidently, whatever the claims to the contrary, backing up to a Samba Server with the .sparsebundle formatted as HFS+ does not work on Monterey.

Reluctantly, I decided to sacrifice one of my two backups: removing it from the list of backups, deleting the .sparsebundle from the Server, and letting Time Machine create a new one, this time internally formatted as APFS. At first, the backup seemed to go OK. But, after a couple of days and ~300 GB written to the server, the backup failed, and Time Machine refused to restart it. When I investigated, the .sparsebundle would not even mount if I attempted to mount it manually. Disk Utility reported that the APFS file system was corrupted and could not be repaired.

So I tried again: removed the backup, deleted the .sparsebundle from the Server, and let Time Machine create a new one from scratch. ~250 GB and another couple of days later, the backup again failed, with the same symptoms.

Here’s the bit of the Time Machine log around the failure:


2022-04-10 16:48:58.645717-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:CopyProgress] Fatal failure to copy '/Volumes/com.apple.TimeMachine.localsnapshots/Backups.backupdb/Tin Can/2022-04-09-185804/Macintosh HD - Data/usr/local/lib/ruby/gems/2.3.0/doc/did_you_mean-1.0.2/rdoc/css/rdoc.css' to '/Volumes/Backups of Tin Can/2022-04-09-185823.inprogress/Macintosh HD - Data/usr/local/lib/ruby/gems/2.3.0/doc/did_you_mean-1.0.2/rdoc/css', error: -43, srcErr: NO
2022-04-10 16:49:03.152913-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:CopyProgress] Failed copy from volume "Macintosh HD - Data"
          113980 Total Items Added (l: 6.24 GB p: 6.68 GB)
               0 Total Items Propagated (shallow) (l: Zero KB p: Zero KB)
               0 Total Items Propagated (recursive) (l: Zero KB p: Zero KB)
          113980 Total Items in Backup (l: 6.24 GB p: 6.68 GB)
           95406 Files Copied (l: 6.2 GB p: 6.62 GB)
           15585 Directories Copied (l: Zero KB p: Zero KB)
             290 Symlinks Copied (l: 7 KB p: Zero KB)
            2699 Files Linked (l: 43.8 MB p: 52.1 MB)
2022-04-10 16:49:03.155424-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Copy stage failed with error: Error Domain=com.apple.backupd.ErrorDomain Code=11 "(null)" UserInfo={NSUnderlyingError=0x7fbebf3eccd0 {Error Domain=NSOSStatusErrorDomain Code=-43 "fnfErr: File not found"}, MessageParameters=(
    "/usr/local/lib/ruby/gems/2.3.0/doc/did_you_mean-1.0.2/rdoc/css/rdoc.css"
)}
2022-04-10 16:49:11.487666-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Found disk3s1 41504653-0000-11AA-AA11-00306543ECAC
2022-04-10 16:49:21.387970-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Unmounted '/Volumes/com.apple.TimeMachine.localsnapshots/Backups.backupdb/Tin Can/2022-04-09-185804/Personal'
2022-04-10 16:49:21.400097-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Unmounted local snapshot: com.apple.TimeMachine.2022-04-09-185804.local at path: /Volumes/com.apple.TimeMachine.localsnapshots/Backups.backupdb/Tin Can/2022-04-09-185804/Personal source: Personal
2022-04-10 16:49:21.950579-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Found disk3s1 41504653-0000-11AA-AA11-00306543ECAC
2022-04-10 16:49:21.987207-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Found disk3s1 41504653-0000-11AA-AA11-00306543ECAC
2022-04-10 16:49:22.353689-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Unmounted '/Volumes/com.apple.TimeMachine.localsnapshots/Backups.backupdb/Tin Can/2022-04-09-185804/Macintosh HD - Data'
2022-04-10 16:49:22.359520-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Found disk3s1 41504653-0000-11AA-AA11-00306543ECAC
2022-04-10 16:49:23.901539-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Unmounted local snapshot: com.apple.TimeMachine.2022-04-09-185804.local at path: /Volumes/com.apple.TimeMachine.localsnapshots/Backups.backupdb/Tin Can/2022-04-09-185804/Macintosh HD - Data source: Macintosh HD - Data
2022-04-10 16:49:27.292449-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Found disk3s1 41504653-0000-11AA-AA11-00306543ECAC
2022-04-10 16:49:27.292852-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Mountpoint '/Volumes/.timemachine/192.168.0.xxx/81CD6E80-8234-4079-B19A-3AC33F7E06EF/Distler Backup 0' is still valid
2022-04-10 16:49:27.685979-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Mountpoint '/Volumes/Backups of Tin Can' is still valid
2022-04-10 16:49:28.103532-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Mountpoint '/Volumes/Backups of Tin Can' is still valid
2022-04-10 16:49:28.512397-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Mountpoint '/Volumes/Personal' is still valid
2022-04-10 16:49:28.713895-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Mountpoint '/System/Volumes/Data' is still valid
2022-04-10 16:49:31.394479-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Backup failed (11: BACKUP_FAILED_COPY_STAGE)
2022-04-10 16:49:37.756029-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Unmounted '/Volumes/Backups of Tin Can'
2022-04-10 16:49:42.891886-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Failed to unmount '/Volumes/.timemachine/192.168.0.xxx/81CD6E80-8234-4079-B19A-3AC33F7E06EF/Distler Backup 0', Disk Management error: {
    Action = Unmount;
    Dissenter = 1;
    DissenterPID = 19902;
    DissenterPPID = 0;
    DissenterStatus = 49168;
    Target = "file:///Volumes/.timemachine/192.168.0.xxx/81CD6E80-8234-4079-B19A-3AC33F7E06EF/Distler%20Backup%200/";
}
2022-04-10 16:49:42.896012-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Failed to unmount '/Volumes/.timemachine/192.168.0.xxx/81CD6E80-8234-4079-B19A-3AC33F7E06EF/Distler Backup 0', error: Error Domain=com.apple.diskmanagement Code=0 "No error" UserInfo={NSDebugDescription=No error, NSLocalizedDescription=No Error.}
2022-04-10 16:49:42.935636-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:BackupScheduling] Not prioritizing backups with priority errors. lockState=0
...
2022-04-10 16:53:07.167148-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Starting automatic backup
2022-04-10 16:53:07.168919-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:Mounting] Network destination already mounted at: /Volumes/.timemachine/192.168.0.xxx/81CD6E80-8234-4079-B19A-3AC33F7E06EF/Distler Backup 0
2022-04-10 16:53:07.169298-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Initial network volume parameters for 'Distler Backup 0' {disablePrimaryReconnect: 0, disableSecondaryReconnect: 0, reconnectTimeOut: 30, QoS: 0x20, attributes: 0x1C}
2022-04-10 16:53:07.187213-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Configured network volume parameters for 'Distler Backup 0' {disablePrimaryReconnect: 0, disableSecondaryReconnect: 0, reconnectTimeOut: 30, QoS: 0x20, attributes: 0x1C}
2022-04-10 16:53:08.678084-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Skipping periodic backup verification: no previous backups to this destination.
2022-04-10 16:53:09.696741-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] 'Tin Can.sparsebundle' does not need resizing - current logical size is 2.06 TB (2,055,262,778,880 bytes), size limit is 2.06 TB (2,055,262,778,982 bytes)
2022-04-10 16:53:09.915293-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Mountpoint '/Volumes/.timemachine/192.168.0.xxx/81CD6E80-8234-4079-B19A-3AC33F7E06EF/Distler Backup 0' is still valid
2022-04-10 16:53:09.996881-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Checking for runtime corruption on '/Volumes/.timemachine/192.168.0.xxx/81CD6E80-8234-4079-B19A-3AC33F7E06EF/Distler Backup 0/Tin Can.sparsebundle'
2022-04-10 16:53:19.437911-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Successfully attached using DiskImages2 as 'disk2' from URL '/Volumes/.timemachine/192.168.0.xxx/81CD6E80-8234-4079-B19A-3AC33F7E06EF/Distler Backup 0/Tin Can.sparsebundle'
2022-04-10 16:53:19.440910-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Found disk3s1 41504653-0000-11AA-AA11-00306543ECAC
2022-04-10 16:53:19.643846-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Found disk3s1 41504653-0000-11AA-AA11-00306543ECAC
2022-04-10 16:53:20.583484-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Mountpoint '/Volumes/.timemachine/192.168.0.xxx/81CD6E80-8234-4079-B19A-3AC33F7E06EF/Distler Backup 0' is still valid
2022-04-10 16:53:20.586062-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Found disk3s1 41504653-0000-11AA-AA11-00306543ECAC
2022-04-10 16:53:20.586242-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Runtime corruption check passed for '/Volumes/.timemachine/192.168.0.xxx/81CD6E80-8234-4079-B19A-3AC33F7E06EF/Distler Backup 0/Tin Can.sparsebundle'
2022-04-10 16:53:20.587567-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Found disk3s1 41504653-0000-11AA-AA11-00306543ECAC
2022-04-10 16:53:20.589086-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Found disk3s1 41504653-0000-11AA-AA11-00306543ECAC
2022-04-10 16:53:20.590174-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Found disk3s1 41504653-0000-11AA-AA11-00306543ECAC
2022-04-10 16:53:20.591703-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Found disk3s1 41504653-0000-11AA-AA11-00306543ECAC
2022-04-10 16:53:20.592900-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Found disk3s1 41504653-0000-11AA-AA11-00306543ECAC
2022-04-10 16:53:20.593027-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:Mounting] Attempting to mount APFS volume from disk3s1
2022-04-10 16:53:50.382455-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Failed to mount 'disk3s1', dissenter {
    DAStatus = 49218;
}, status: (null)
2022-04-10 16:53:50.791300-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:Mounting] Mount dissented, retrying...
2022-04-10 16:53:53.893930-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Found disk3s1 41504653-0000-11AA-AA11-00306543ECAC
2022-04-10 16:53:53.897615-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Found disk3s1 41504653-0000-11AA-AA11-00306543ECAC
2022-04-10 16:53:53.916581-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Found disk3s1 41504653-0000-11AA-AA11-00306543ECAC
2022-04-10 16:53:53.916724-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:Mounting] Attempting to mount APFS volume from disk3s1
2022-04-10 16:54:09.948095-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Failed to mount 'disk3s1', dissenter {
    DAStatus = 49218;
}, status: (null)
2022-04-10 16:54:10.324846-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:Mounting] Mount dissented, retrying...
2022-04-10 16:54:13.425438-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Found disk3s1 41504653-0000-11AA-AA11-00306543ECAC
2022-04-10 16:54:13.427379-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Found disk3s1 41504653-0000-11AA-AA11-00306543ECAC
2022-04-10 16:54:13.429332-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Found disk3s1 41504653-0000-11AA-AA11-00306543ECAC
2022-04-10 16:54:13.429410-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:Mounting] Attempting to mount APFS volume from disk3s1
2022-04-10 16:54:20.216256-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Failed to mount 'disk3s1', dissenter {
    DAStatus = 49218;
}, status: (null)
2022-04-10 16:54:20.533201-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:Mounting] Mount dissented, retrying...
2022-04-10 16:54:23.633572-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Found disk3s1 41504653-0000-11AA-AA11-00306543ECAC
2022-04-10 16:54:25.816743-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Unmounted '/Volumes/.timemachine/192.168.0.xxx/81CD6E80-8234-4079-B19A-3AC33F7E06EF/Distler Backup 0'
2022-04-10 16:54:25.829714-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Waiting 60 seconds and trying again.
2022-04-10 16:55:31.572028-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:Mounting] Attempting to mount 'smb://distler-backup@192.168.0.xxx/Distler%20Backup%200'
2022-04-10 16:55:33.819999-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:Mounting] Mounted 'smb://distler-backup@192.168.0.xxx/Distler%20Backup%200' at '/Volumes/.timemachine/192.168.0.xxx/8D2542FA-CFAB-4C6B-9E66-9005383E0039/Distler Backup 0' (1.77 TB of 2.16 TB available)
2022-04-10 16:55:33.820214-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Initial network volume parameters for 'Distler Backup 0' {disablePrimaryReconnect: 0, disableSecondaryReconnect: 0, reconnectTimeOut: 60, QoS: 0x0, attributes: 0x1C}
2022-04-10 16:55:34.030606-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Configured network volume parameters for 'Distler Backup 0' {disablePrimaryReconnect: 0, disableSecondaryReconnect: 0, reconnectTimeOut: 30, QoS: 0x20, attributes: 0x1C}
2022-04-10 16:55:34.784320-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Skipping periodic backup verification: no previous backups to this destination.
2022-04-10 16:55:35.248129-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Mountpoint '/Volumes/.timemachine/192.168.0.xxx/8D2542FA-CFAB-4C6B-9E66-9005383E0039/Distler Backup 0' is still valid
2022-04-10 16:55:35.332770-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Checking for runtime corruption on '/Volumes/.timemachine/192.168.0.xxx/8D2542FA-CFAB-4C6B-9E66-9005383E0039/Distler Backup 0/Tin Can.sparsebundle'
2022-04-10 16:55:42.623176-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Successfully attached using DiskImages2 as 'disk2' from URL '/Volumes/.timemachine/192.168.0.xxx/8D2542FA-CFAB-4C6B-9E66-9005383E0039/Distler Backup 0/Tin Can.sparsebundle'
2022-04-10 16:55:42.625886-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Found disk3s1 41504653-0000-11AA-AA11-00306543ECAC
2022-04-10 16:55:42.627412-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Found disk3s1 41504653-0000-11AA-AA11-00306543ECAC
2022-04-10 16:55:43.234062-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Mountpoint '/Volumes/.timemachine/192.168.0.xxx/8D2542FA-CFAB-4C6B-9E66-9005383E0039/Distler Backup 0' is still valid
2022-04-10 16:55:43.236237-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Found disk3s1 41504653-0000-11AA-AA11-00306543ECAC
2022-04-10 16:55:43.236358-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Runtime corruption check passed for '/Volumes/.timemachine/192.168.0.xxx/8D2542FA-CFAB-4C6B-9E66-9005383E0039/Distler Backup 0/Tin Can.sparsebundle'
2022-04-10 16:55:43.237707-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Found disk3s1 41504653-0000-11AA-AA11-00306543ECAC
2022-04-10 16:55:43.238827-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Found disk3s1 41504653-0000-11AA-AA11-00306543ECAC
2022-04-10 16:55:43.239902-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Found disk3s1 41504653-0000-11AA-AA11-00306543ECAC
2022-04-10 16:55:43.241266-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Found disk3s1 41504653-0000-11AA-AA11-00306543ECAC
2022-04-10 16:55:43.242325-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:DiskImages] Found disk3s1 41504653-0000-11AA-AA11-00306543ECAC
2022-04-10 16:55:43.242443-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:Mounting] Attempting to mount APFS volume from disk3s1
2022-04-10 16:56:03.825110-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:General] Failed to mount 'disk3s1', dissenter {
    DAStatus = 49218;
}, status: (null)
2022-04-10 16:56:03.940935-0500  localhost backupd[71202]: (TimeMachine) [com.apple.TimeMachine:Mounting] Mount dissented, retrying...

Won’t even mount, eh?


% sudo fsck_apfs /dev/disk3s1
Password:
** Checking the container superblock.
   Checking the checkpoint with transaction ID 2703.
** Checking the space manager.
** Checking the space manager free queue trees.
** Checking the object map.
** Checking volume /dev/rdisk3s1.
** Checking the APFS volume superblock.
   The volume Backups of Tin Can was formatted by newfs_apfs (1934.101.3) and last modified by apfs_kext (1934.101.3).
** Checking the object map.
warning: (oid 0x432370d) om: btn: invalid o_cksum (0xce8d60bd7f753902)
   Object map is invalid.
** The volume /dev/rdisk3s1 was found to be corrupt and cannot be repaired.
** Verifying allocated space.
** The volume /dev/disk3s1 could not be verified completely.

Time Machine managed to corrupt the Object B-tree, on its first attempt. And fsck_apfs can’t repair it. The entire file system is hosed. So much for the “robustness” of APFS and so much for the quality of Apple’s backup software.

And it ain’t just me. There are lots of complaints on the Synology Forum from people having the same issue.

Next time, I’ll tell you about Fink.

April 09, 2022

Doug NatelsonBrief items

It's been a while since the APS meeting, with many things going on that have made catching up here a challenge.  Here are some recent items that I wanted to point out:

  • Igor Mazin had a very pointed letter to the editor in Nature last week, which is rather ironic since much of what he was excoriating is the scientific publishing culture promulgated by Nature.  His main point is that reaching for often-unjustified exotic explanations is rewarded by glossy journals - a kind of inverse Occam's Razor.   He also points out correctly that it's almost impossible for experimentalists to get a result published in a fancy journal without claiming some theoretical explanation.
  • We had a great physics colloquium here this week by Vincenzo Vitelli of the University of Chicago.  He spoke about a number of things, including "odd elasticity".  See, when relating stresses \(\sigma_{ij}\) to strains \(u_{kl}\), in ordinary elasticity there is a tensor that connects these things: \(\sigma_{ij} = K_{ijkl} u_{kl}\), and that tensor is symmetric:  \(K_{ijkl} = K_{klij}\).  Vitelli and collaborators consider what happens when there are antisymmetric contributions to that tensor.  This means that a cycle of stress/strain ending back at the original material configuration could add or remove energy from the system, depending on the direction of the cycle.  (Clearly this only makes sense in active matter, like driven or living systems.)  The results are pretty wild - see the videos about halfway down this page.  (A schematic version of the cycle argument is sketched just after this list.)
  • Here's something I didn't expect to see:  a new result out of the Tevatron at Fermilab, which is interesting since the Tevatron hasn't run since 2011.  Quanta has a nice write-up.  Basically, a new combined analysis of Fermilab data yields an updated estimate of the mass of the W boson, along with a claimed improved understanding of systematic errors and backgrounds.  The result is a statement that the W boson is heavier than expectations from the Standard Model by an amount that is estimated to be 7 standard deviations.  The exotic explanation (perhaps favored by the inverse Occam's Razor above) is that the Standard Model calculation is off because it's missing some added contributions from so-far-undiscovered particles.  The less exotic explanation is that the new analysis and small error estimates have some undiscovered flaw.  Time will tell - I gather that the LHC collaborations are working on their own measurements. 
  • This result is very impressive.  Princeton investigators have made qubits using spins of single electrons trapped in Si quantum dots, and they have achieved fidelity in 2-qubit operations greater than 99%.  If this is possible in (excellent) university-level fabrication, it does make you wonder whether great things may be possible in a scalable way with industrial-level process control.
  • This is a great interview with John Preskill.  In general the AIP oral history project is outstanding.
  • Well, this is certainly suggestive evidence that the universe really is a simulation.
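
To spell out the energy argument in the odd-elasticity item above (a schematic reconstruction on my part, not taken from the talk): split the elastic tensor into symmetric and antisymmetric parts, \(K = K^{e} + K^{o}\) with \(K^{e}_{ijkl} = K^{e}_{klij}\) and \(K^{o}_{ijkl} = -K^{o}_{klij}\). The work per unit volume done on the material over a closed strain cycle is

\displaystyle W \;=\; \oint \sigma_{ij}\, du_{ij} \;=\; \oint K^{e}_{ijkl} u_{kl}\, du_{ij} \;+\; \oint K^{o}_{ijkl} u_{kl}\, du_{ij}.

The first integral is the integral of the exact differential \(d\big(\tfrac{1}{2} K^{e}_{ijkl} u_{ij} u_{kl}\big)\) and therefore vanishes on any closed loop; only the antisymmetric ("odd") part can give \(W \neq 0\), with the sign set by the orientation of the cycle, which is exactly the statement that the material can absorb or release energy depending on the direction of the cycle.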

April 08, 2022

Jordan EllenbergMask-wearing as vegetarianism

We might find out that COVID-19 infection carries with it a parcel of unwanted downstream effects. Say, a modestly increased risk of heart attack, of stroke, of early dementia. And maybe that those risks go up with repeated infection. It’s in no way certain any of this is the case. I’m not even sure it’s likely! But the probability seems high enough that it’s worth thinking about what the consequences of that would be.

My instinct is that the practice of wearing masks in crowded indoor settings would end up looking like the practice of vegetarianism does now. In other words, it would be something which:

  • clearly has individual health benefits although the magnitude is arguable;
  • clearly has public-good benefits although the magnitude is arguable;
  • most people don’t do;
  • some people feel they ought to do but don’t, or don’t fully;
  • changes over time from seeming “weird” to being well within the range of normal things people do, though there remain aggrieved antis who can’t shut up about how irrational and self-righteous the practitioners are;
  • is politically impossible to imagine being imposed by government.

Would I be one of the people who kept up mask-wearing in crowded public places? I mean, I’ve been doing it so far, though certainly not with 100% adherence.

I do still eat meat, even though the environmental case for vegetarianism is clear-cut, and there’s a reasonably compelling argument that eating meat is bad for my own health. But giving up meat forever would be a lot harder on me than wearing a mask to the grocery store forever.

April 03, 2022

Jordan EllenbergNew York trip

Back from an east coast swing with the kids. We took the train up from my parents’ place in Philadelphia Friday morning, came back Saturday night; in that time we went to five museums (Natural History, Met, NYPL, MoMA, International Center of Photography) — ate belly lox at Zabar’s, pastrami at Katz’s, dumplings in Chinatown, Georgian food (Tbilisi not Atlanta) on the Upper West Side, and Junior’s cheesecake for breakfast — and saw three old friends. Oh and CJ took a college tour. My iPhone’s step counter registered 30,000 steps the first day (my all-time record!) and 20,000 the second day. We’re getting good at doing things fast!

I don’t doubt New York has been changed by the pandemic but the changes aren’t visible when you’re just walking around as a tourist on the street. Everything’s crowded and alive.

I was worried we’d have conflict about how much time to spend in art museums but both kids like the Canonical Moderns Of Painting right now so it worked out well. AB was very into Fernand Leger and was aggrieved they didn’t have any Leger postcards at the giftshop but I explained to her that it’s much cooler to be into the artists who aren’t the ones that get postcards at the giftshop. She thinks Jackson Pollock is a fraud and don’t even get her started on Barnett Newman.

My favorite old painting at the Met, the one I always go visit first, isn’t on view anymore. But my other favorite — a little on brand for me, I know — is in the gallery as always. Also saw a bunch of Max Beckmann I wasn’t familiar with, and at MoMA, this Alice Neel painting which looked kind of like Beckmann:

I took the kids to McNally Jackson and to the flagship North American MUJI (where I bought a new yak sweater, there is just no sweater like a MUJI yak sweater.) CJ went in the NBA store and complained they didn’t have enough Bucks gear. We went in a fancy stationery store where the very cute little desk clock we saw turned out to cost $172. We walked through the new Essex Market in the Lower East Side which is like Reading Terminal Market if everything were brand new. Michelle Shih took us to Economy Candy, which has been there forever and which I’d never heard of. I bought a Bar None, a candy bar I remember really liking in the 90s and which I haven’t seen in years. Turns out it was discontinued in 1997 but has been resuscitated by a company whose entire business is bringing back candy bars people fondly remember. There was a huge traffic snarl caused by someone blocking the box so my kids got to see an actual New York guy lean halfway out of his car and yell “YO!” (Then he yelled some other words.) We were so full from the pastrami that we couldn’t eat all the pickles. I brought them all the way home to Madison and just ate them. I ❤ NY.

March 31, 2022

John PreskillOne equation to rule them all?

In lieu of composing a blog post this month, I’m publishing an article in Quanta Magazine. The article provides an introduction to fluctuation relations, souped-up variations on the second law of thermodynamics, which helps us understand why time flows in only one direction. The earliest fluctuation relations described classical systems, such as single strands of DNA. Many quantum versions have been proved since. Their proliferation contrasts with the stereotype of physicists as obsessed with unification—with slimming down a cadre of equations into one über-equation. Will one quantum fluctuation relation emerge to rule them all? Maybe, and maybe not. Maybe the multiplicity of quantum fluctuation relations reflects the richness of quantum thermodynamics.

You can read more in Quanta Magazine here and yet more in chapter 9 of my book. For recent advances in fluctuation relations, as opposed to the broad introduction there, check out earlier Quantum Frontiers posts here, here, here, here, and here.

March 22, 2022

Georg von HippelScholarships for Ukrainian PhD students in Mainz

The Cluster of Excellence PRISMA+ at the University of Mainz announces scholarships for Ukrainian PhD students who have already started a PhD in the field of nuclear, hadron or particle physics and had to leave their country.

The prerequisite is a master's degree in physics. The fellowship is initially limited to a maximum of six months, but can be extended. For an application or further information, please contact prisma@uni-mainz.de.