# Planet Musings

## October 19, 2021

### John Baez — Conjectures on the Kuramoto–Sivashinsky Equation

I love this movie showing a solution of the Kuramoto–Sivashinsky equation, made by Thien An. If you haven’t seen her great math images on Twitter, check them out!

I hadn’t known about this equation, and it looked completely crazy to me at first. But it turns out to be important, because it’s one of the simplest partial differential equations that exhibits chaotic behavior.

As the image scrolls to the left, you’re seeing how a real-valued
function $u(t,x)$ of two real variables changes with the passage of time. The vertical direction is ‘space’, $x$, while the horizontal direction is time, $t$.

As time passes, bumps form and merge. I conjecture that they don’t split or disappear. This reflects the fact that the Kuramoto–Sivashinsky equation has a built-in arrow of time: it describes a world where the future is different than the past.

The behavior of these bumps makes the Kuramoto–Sivashinsky
equation an excellent playground for thinking about how differential equations can describe ‘things’ with some individuality, even though their solutions are just smooth functions. I’m going to make some conjectures about them. But I could really use some help from people who are good at numerical computation or creating mathematical images!

First let me review some known stuff.

For starters, note that these bumps seem to appear out of nowhere. That’s because this system is chaotic: small ripples get amplified. This is especially true of ripples with a certain wavelength: roughly $2 \sqrt{2} \pi$, as we’ll see later.

And yet while solutions of the Kuramoto–Sivashinsky equation are chaotic, they have a certain repetitive character. That is, they don’t do completely novel things; they seem to keep doing the same general sort of thing. The world this equation describes has an arrow of time, but it’s ultimately rather boring compared to ours.

The reason is that all smooth solutions of the Kuramoto–Sivashinsky equation quickly approach a certain finite-dimensional manifold of solutions, called an ‘inertial manifold’. The dynamics on the inertial manifold is chaotic. And sitting inside it is a set called an ‘attractor’, which all solutions approach. This attractor is probably a fractal. This attractor describes the complete repertoire of what you’ll see solutions do if you wait a long time.

Some mathematicians have put a lot of work into proving these things, but let’s see how much we can understand without doing anything too hard.

Written out with a bit less jargon, the Kuramoto–Sivashinsky equation says

$\displaystyle{ \frac{\partial u}{\partial t} = - \frac{\partial^2 u}{\partial x^2} - \frac{\partial^4 u}{\partial x^4} - \left( \frac{\partial u}{\partial x}\right)^2 }$

or in more compressed notation,

$u_t = -u_{xx} -u_{xxxx} - (u_x)^2$

To understand it, first remember the heat equation:

$u_t = u_{xx}$

This describes how heat spreads out. That is: if $u(t,x)$ is the temperature of an iron rod at position $x$ at time $t$, the heat equation describes how this temperature function flattens out as time passes and heat spreads.

But the Kuramoto–Sivashinsky equation more closely resembles the time-reversed heat equation

$u_t = -u_{xx}$

This equation describes how, running a movie of a hot iron rod backward, heat tends to bunch up rather than smear out! Small regions of different temperature, either hotter or colder than their surroundings, will tend to amplify.

This accounts for the chaotic behavior of the Kuramoto–Sivashinsky equation: small bumps emerge as if out of thin air and then grow larger. But what keeps these bumps from growing uncontrollably?

The next term in the equation helps. If we have

$u_t = -u_{xx} - u_{xxxx}$

then very sharp spikes in $u(t,x)$ tend to get damped out exponentially.

To see this, it helps to bring in a bit more muscle: Fourier series. We can easily solve the heat equation if our iron rod is the interval $[0,2\pi]$ and we demand that its temperature is the same at both ends:

$u(t,0) = u(t,2\pi)$

This lets us write the temperature function $u(t,x)$ in terms of the functions $e^{ikx}$ like this:

$\displaystyle{ u(t,x) = \sum_{k = -\infty}^\infty \hat{u}_k(t) e^{ikx} }$

for some functions $\hat{u}_k(t)$. Then the heat equation gives

$\displaystyle{ \frac{d}{d t} \hat{u}_k(t) = -k^2 \hat{u}_k(t) }$

and we can easily solve these equations and get

$\displaystyle{ \hat{u}_k(t) = e^{-k^2 t} \hat{u}_k(0) }$

and thus

$\displaystyle{ u(t,x) = \sum_{k = -\infty}^\infty \hat{u}_k(0) e^{-k^2 t} e^{ikx} }$

So, each function $\hat{u}_k(t)$ decays exponentially as time goes on, and the so-called ‘high-frequency modes’, $\hat{u}_k(t)$ with $|k|$ big, get damped really fast due to that $e^{-k^2 t}$ factor. This is why heat smears out as time goes on.
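This decay is easy to check numerically. Here’s a minimal sketch (assuming numpy): evolve the Fourier coefficients of some initial data by the exact factor $e^{-k^2 t}$ and watch a high-frequency ripple vanish while the lowest mode barely changes.

```python
import numpy as np

def heat_evolve(u0, t):
    """Evolve periodic samples u0 on [0, 2*pi] under the heat equation
    u_t = u_xx, using the exact Fourier-space solution
    u_hat_k(t) = exp(-k^2 t) * u_hat_k(0)."""
    u_hat = np.fft.fft(u0)
    k = np.fft.fftfreq(len(u0), d=1.0 / len(u0))  # integer wavenumbers
    return np.real(np.fft.ifft(np.exp(-k**2 * t) * u_hat))

N = 256
x = 2 * np.pi * np.arange(N) / N
u0 = np.sin(x) + 0.5 * np.sin(20 * x)  # low mode plus a high-frequency ripple
u1 = heat_evolve(u0, 0.1)

# After t = 0.1 the k = 20 ripple is damped by exp(-40), essentially zero,
# while the k = 1 mode is only damped by exp(-0.1): the ripple is gone.
```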

If we solve the time-reversed heat equation the same way we get

$\displaystyle{ u(t,x) = \sum_{k = -\infty}^\infty \hat{u}_k(0) e^{k^2 t} e^{ikx} }$

so now high-frequency modes get exponentially amplified. The time-reversed heat equation is very unstable: if you change the initial data a little bit by adding a small amount of some high-frequency function, it will make an enormous difference as time goes by.

What keeps things from going completely out of control? The next term in the equation helps:

$u_t = -u_{xx} - u_{xxxx}$

This is still linear so we can still solve it using Fourier series. Now we get

$\displaystyle{ u(t,x) = \sum_{k = -\infty}^\infty \hat{u}_k(0) e^{(k^2-k^4) t} e^{ikx} }$

Since $k^2 - k^4 \le 0$, none of the modes $\hat{u}_k(t)$ grows exponentially. In fact, all the modes decay exponentially except for three: $k = -1,0,1$. These will be constant in time. So, any solution will approach a constant as time goes on!

We can make the story more interesting if we don’t require our rod
to have length $2\pi$. Say it has length $L$. We can write functions on the interval $[0,L]$ as linear combinations of functions $e^{ikx}$ where now the frequencies $k$ aren’t integers: instead

$k = 2\pi n/L$

for integers $n$. The longer our rod, the lower these frequencies $k$ can be. The rest of the math works almost the same: we get

$\displaystyle{ u(t,x) = \sum_{n = -\infty}^\infty \hat{u}_k(0) e^{(k^2-k^4) t} e^{ikx} }$

but we have to remember $k = 2\pi n/L$. The modes with $k^2 - k^4 > 0$ will grow exponentially, while the rest will decay exponentially or stay constant. Note that $k^2 - k^4 > 0$ only for $0 <|k| < 1$. So, modes with these frequencies grow exponentially. Modes with $|k| > 1$ decay exponentially.

If $L < 2\pi$, all the nonzero frequencies $k$ are integer multiples of $2\pi/L$, so they have $|k| \ge 2\pi/L > 1$ and no modes grow exponentially—and indeed all solutions approach a constant! But as you look at longer and longer rods, you get more and more modes that grow exponentially. The number of these will be roughly proportional to $L$, though they will ‘pop into existence’ at certain specific values of $L$.
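We can count these growing modes directly. A quick sketch (assuming numpy): for each integer $n$, the mode $k = 2\pi n/L$ grows precisely when $0 < |k| < 1$, and the count comes out roughly proportional to $L$.

```python
import numpy as np

def growing_modes(L):
    """Number of Fourier modes k = 2*pi*n/L with k^2 - k^4 > 0,
    i.e. nonzero integers n with |n| < L/(2*pi)."""
    n = np.arange(1, max(2, int(L)))  # n < L/(2*pi) < L, so this range suffices
    k = 2 * np.pi * n / L
    return 2 * int(np.sum(k < 1))     # factor 2 counts the modes n and -n

# For L < 2*pi there are no growing modes, matching the fact that
# all solutions approach a constant; for long rods the count grows
# roughly linearly, about L/pi.
```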

Which exponentially growing modes grow the fastest? These are the
ones that make $k^2 - k^4$ as large as possible, so they happen near where

$\displaystyle{ \frac{d}{dk} (k^2 - k^4) = 0 }$

namely $k = 1/\sqrt{2}$. The wavelength of a mode is $2\pi/k$, so these fastest-growing modes have wavelength close to $2\sqrt{2} \pi$.
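A quick numerical check of this maximization (a sketch, assuming numpy):

```python
import numpy as np

# Each mode k grows at rate k^2 - k^4; find the k that maximizes this.
k = np.linspace(0.01, 1.5, 100_000)
k_star = k[np.argmax(k**2 - k**4)]        # should be close to 1/sqrt(2)
wavelength = 2 * np.pi / k_star           # close to 2*sqrt(2)*pi ~ 8.886
```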

In short, our equation has a certain length scale where the instability is most pronounced: temperature waves with about this wavelength grow fastest.

All this is very easy to work out in as much detail as we want, because our equation so far is linear. The full-fledged Kuramoto–Sivashinsky equation

$u_t = -u_{xx} - u_{xxxx} - (u_x)^2$

is a lot harder. And yet some features remain, which is why it
was worth spending time on the linear version.

For example, it turns out that the bumps we see in the movie above
have width roughly $2 \sqrt{2} \pi$. Bumps of roughly this width tend to get amplified. Why don’t they keep on growing forever? Apparently the nonlinear term $-(u_x)^2$ prevents it. But this is not obvious. Indeed, it’s conjectured that if you solve the Kuramoto–Sivashinsky equation starting with a bounded smooth function $u(0,x)$, the solution will remain bounded by a constant. But this has not been proved—or at least it was not proved as of 2000, when this very nice article was written:

• Encyclopedia of Mathematics, Kuramoto–Sivashinsky equation.

The most fascinating fact about the Kuramoto–Sivashinsky equation is that for any fixed length $L$, it has a finite-dimensional manifold $M$ of solutions such that every solution approaches one of these, exponentially fast! So, while this equation really describes an infinite-dimensional dynamical system, as $t \to \infty$ its solutions move closer and closer to the solutions of some finite-dimensional dynamical system. This finite-dimensional system contains all the information about the patterns we’re seeing in Thien An’s movie.

As I mentioned, the manifold $M$ is called an ‘inertial manifold’. This is a general concept in dynamical systems theory:

• Wikipedia, Inertial manifold.

To make these ideas precise we need to choose a notion of distance
between two solutions at a given time. A good choice uses the $L^2$ norm for periodic functions on $[0,L]$:

$\|f\| = \sqrt{\int_0^L |f(x)|^2 \, dx}.$

Functions on $[0,L]$ with finite $L^2$ norm form a Hilbert space called $L^2[0,L]$. If we start with any function $u(0,-)$ in this Hilbert space we get a solution $u(t,x)$ of the Kuramoto–Sivashinsky equation such that the function $u(t,-)$ is in this Hilbert space at all later times $t$. Furthermore, this function is smooth, even analytic, for all later times:

• P. Collet, J.-P. Eckmann, H. Epstein and J. Stubbe, Analyticity for the Kuramoto–Sivashinsky equation, Physica D 67 (1993), 321–326.

This smoothing property is well-known for the heat equation, but
it’s much less obvious here!

This work also shows that the Kuramoto–Sivashinsky equation defines a dynamical system on the Hilbert space $L^2[0,L]$. And based on earlier work by other mathematicians, Temam and Wang have heroically shown that this Hilbert space contains an inertial manifold of dimension bounded by some constant times $L^{1.64} (\ln L)^{0.2}.$

• Roger Temam and Xiao Ming Wang, Estimates on the lowest dimension of inertial manifolds for the Kuramoto-Sivashinsky equation in the general case, Differential and Integral Equations 7 (1994), 1095–1108.

I conjecture that in reality its dimension grows roughly linearly with $L$. Why? We’ve just seen this is true for the linearized version of the Kuramoto–Sivashinsky equation: all modes with frequency $|k| > 1$ get damped exponentially, but since there’s one mode for each integer $n$, and $k = 2\pi n/L$, these modes correspond to integers $n$ with $|n| \le L /2\pi$. So, there are $\lfloor L/\pi \rfloor$ of these modes. In short, for the linearized Kuramoto–Sivashinsky equation the inertial manifold has dimension about $L/\pi$.

This evidence is rather weak, since it completely ignores the nonlinearity of the Kuramoto–Sivashinsky equation. I would not be shocked if the dimension of the inertial manifold grew at some other rate than linearly with $L$.

Sitting inside the inertial manifold is an attractor, the smallest set that all solutions approach. This is probably a fractal, since that’s true of many chaotic systems. So besides trying to estimate the dimension of the inertial manifold, which is an integer, we should try to estimate the dimension of this attractor, which may not be an integer!

There have been some nice numerical experiments studying solutions of the Kuramoto–Sivashinsky equation for various values of $L$, seeing how they get more complicated as $L$ increases. For small $L$, every solution approaches a constant, just as in the linearized version. For larger $L$ we get periodic solutions, and as $L$ continues to increase we get period doubling and finally chaos—a typical behavior for dynamical systems. But that’s just the start. For details, read this:

• Demetrios T. Papageorgiou and Yiorgos S. Smyrlis, The route to chaos for the Kuramoto–Sivashinsky equation, Theoretical and Computational Fluid Dynamics 3 (1991), 15–42.

I’ll warn you that they use a slightly different formalism. Instead of changing the length $L$, they keep it equal to $2\pi$ and change the equation, like this:

$u_t = -u_{xx} - v u_{xxxx} - (u_x)^2$

for some number $v$ they call the ‘viscosity’. It’s just a different way of describing the same business, so if I had more energy I could figure out the relation between $L$ and $v$ and tell you at which length $L$ chaos first kicks in. But I won’t now.

Instead, I want to make some conjectures. I believe there is some fairly well-defined notion of a ‘bump’ for the Kuramoto–Sivashinsky equation: you can see the bumps form and merge here, and I believe there is a fairly simple way to actually count them at any moment in time:

Since we are studying solutions of the Kuramoto–Sivashinsky equation that are periodic in the space direction, we can think of space as a circle: the interval $[0,L]$ with its endpoints identified. Given a continuous solution $u$ of the equation, define a bump to be a maximal open interval in the circle such that $u(t,x) > c$ for all $x$ in this interval. This definition depends on a constant $c > 0$, but I think there is a range of values for which the following conjectures are true. One could guess this range with some numerical experiments, which I’m hoping one of you can do!
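On a sampled solution this count is easy to implement: with periodic boundary conditions, the number of maximal arcs where $u > c$ equals the number of upward crossings of the level $c$. A minimal sketch (assuming numpy; the function name and the test threshold are mine, for illustration):

```python
import numpy as np

def count_bumps(u, c):
    """Count maximal arcs of the circle on which u > c, for u sampled at
    equally spaced points with periodic boundary conditions. Each arc
    starts at an upward crossing of the level c."""
    above = u > c
    if above.all():
        return 1  # u > c on the whole circle: one bump covering everything
    # an upward crossing at i means u[i] > c but u[i-1] <= c (with wraparound)
    return int(np.sum(above & ~np.roll(above, 1)))

x = np.linspace(0, 2 * np.pi, 400, endpoint=False)
# sin(3x) rises above c = 0.5 on three separate arcs of the circle
n_bumps = count_bumps(np.sin(3 * x), 0.5)
```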

I believe that for nonnegative solutions in the inertial manifold, once a bump forms it never splits into two or more bumps or disappears: all it can do is merge with another bump.

I conjecture that if $L$ is large enough, almost every nonnegative solution on the inertial manifold has a finite number of bumps, and that while bumps can appear and merge, they can never split or disappear.

(Here ‘almost every’ is in the usual sense of measure theory. There are certainly solutions of the Kuramoto–Sivashinsky equation that don’t have bumps that appear and merge, like constant solutions. These solutions may lie on the inertial manifold, but I’m claiming they are rare.)

I also conjecture that the time-averaged number of bumps is asymptotically proportional to $L$ as $L \to \infty$ for almost every nonnegative solution on the inertial manifold. The constant of proportionality shouldn’t depend on the solution we pick, except for solutions in some set of measure zero. It will, however, depend on our constant $c$ that defines the minimum height of an official ‘bump’.

I also conjecture that there’s a well-defined time average of the rate at which new bumps form, which is also asymptotically proportional to $L$ and independent of which solution we pick, except for solutions in a set of measure zero.

I also conjecture that this rate equals the time-averaged rate at which bumps merge, while the time-averaged rate at which bumps disappear or split is zero.

These conjectures are rather bold, but of course there are various fallback positions if they fail.

How can we test these conjectures? It’s hard to explicitly describe solutions that are actually on the inertial manifold, but by definition, any solution keeps getting closer to the inertial manifold at an exponential rate. Thus, it should behave similarly to solutions that are on the inertial manifold, after we wait long enough. So, I’ll conjecture that the above properties hold not only for almost every solution on the inertial manifold, but for typical solutions that start near the inertial manifold… as long as we wait long enough when doing our time averages. This makes the conjectures testable with numerical experiments.
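As a starting point for such experiments, here is a minimal pseudo-spectral integrator for the equation (a sketch, assuming numpy; it uses a simple first-order exponential time-differencing step rather than the higher-order schemes usually used for serious Kuramoto–Sivashinsky runs, and all parameter choices below are illustrative):

```python
import numpy as np

def ks_solve(u0, L, dt=0.02, steps=5000):
    """Integrate u_t = -u_xx - u_xxxx - (u_x)^2 with period L, starting
    from the samples u0. The stiff linear part is solved exactly in
    Fourier space; the nonlinear term -(u_x)^2 is stepped explicitly
    (first-order exponential time differencing)."""
    N = len(u0)
    k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)  # wavenumbers k = 2*pi*n/L
    lin = k**2 - k**4                           # linear growth rate of each mode
    E = np.exp(lin * dt)
    small = np.abs(lin) < 1e-12
    coef = np.where(small, dt, (E - 1) / np.where(small, 1.0, lin))
    dealias = np.abs(k) < (2.0 / 3.0) * np.abs(k).max()  # standard 2/3 rule
    v = np.fft.fft(u0)
    for _ in range(steps):
        ux = np.real(np.fft.ifft(1j * k * v))   # u_x in physical space
        v = E * v + coef * (dealias * np.fft.fft(-ux**2))
    return np.real(np.fft.ifft(v))

# A rod of length L = 50, evolved to t = 100 so the solution has time to
# 'settle down' near the attractor. (Note the spatial mean of u drifts
# downward, since the equation makes the mean decrease at rate <(u_x)^2>;
# the bumps live in u minus its mean.)
N, L = 128, 50.0
x = L * np.arange(N) / N
u0 = np.cos(2 * np.pi * x / L) + 0.1 * np.cos(4 * np.pi * x / L)
u = ks_solve(u0, L)
```

Movies for various $L$ then amount to stacking snapshots of `u - u.mean()` as columns of an image, in black and white according to whether the value exceeds the threshold $c$.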

Here are some things I’d love to see:

• Movies like Thien An’s, but with various choices of $L.$ To create these, maybe start with

$u(0,x) = \sin \frac{x}{2} + \textrm{a small amount of noise}$

and run it for a longish time so it ‘settles down’—that is, gets near the inertial manifold.

• More importantly: movies like the above, but where the solution is drawn in black if $u(t,x) > c$ and white otherwise. For a good choice of $c$—and please tell me what choice that is!—we should see black stripes appear and merge but rarely if ever disappear or split.

• A time-averaged count of the average number of black stripes—that is, official ‘bumps’—for various choices of $L.$ I’m conjecturing that this is asymptotically proportional to $L$ for large $L.$

• Time-averaged counts of the rates at which black stripes are born, merge, split, and die—again for various choices of $L.$ I’m conjecturing that the first two are asymptotically proportional to $L$ for large $L$ and that they’re equal. I’m conjecturing that the last two are zero, or tiny.

If someone gets into this, maybe we could submit a short paper to Experimental Mathematics. I’ve been browsing papers on the Kuramoto–Sivashinsky equations, and I haven’t yet seen anything that gets into as much detail on what solutions look like as I’m trying to do here.

One last thing. I forgot to emphasize that the dynamical system on the Hilbert space $L^2[0,L]$ is not reversible: we can evolve a solution forwards in time and it will stay in this Hilbert space, but not backwards in time. This is very well-known for the heat equation; the point is that solutions get smoother as we run them forward, but when we run them backward they typically get more wild and eventually their $L^2$ norm blows up.

What makes this especially interesting is that the dynamical system on the inertial manifold probably is reversible. As long as this manifold is compact, it must be: any smooth vector field on a compact manifold $M$ generates a ‘flow’ that you can run forward or backward in time.

And yet, even if this flow is reversible, as I suspect it is, it doesn’t resemble its time-reversed version! It has an ‘arrow of time’ built in, since bumps are born and merge much more often than they split and die.

So, if my guesses are right, the inertial manifold for the Kuramoto–Sivashinsky equation describes a deterministic universe where time evolution is reversible—and yet the future doesn’t look like the past, because the dynamics carry the imprint of the irreversible dynamics of the Kuramoto–Sivashinsky equation on the larger Hilbert space of all solutions.

### Andrew Jaffe —

Done with 3+5 hours of [@imperialphysics](https://twitter.com/imperialphysics) tutorials in 1+1 days. (Fun, but exhausting.)

## October 18, 2021

### n-Category Café — Conjectures on the Kuramoto–Sivashinsky Equation

This beautiful movie by Thien An led me to some conjectures on this chaotic nonlinear partial differential equation. If you like numerical computation, maybe we could work on them a bit!


and run it for a longish time so it ‘settles down’ — that is, gets near the inertial manifold.

• More importantly: movies like the above, but where the solution is drawn in black if $u(t,x) > c$ and white otherwise. For a good choice of $c$ — and please tell me what choice that is! — we should see black stripes appear and merge but rarely if ever disappear or split.

• A time-averaged count of the average number of black stripes — that is, official ‘bumps’ — for various choices of $L.$ I’m conjecturing that this is asymptotically proportional to $L$ for large $L.$

• Time-averaged counts of the rates at which black stripes are born, merge, split, and die — again for various choices of $L.$ I’m conjecturing that the first two are asymptotically proportional to $L$ for large $L$ and that they’re equal. I’m conjecturing that the last two are zero, or tiny.
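For anyone tempted to try this, here is a crude pseudospectral sketch of the kind of simulation involved. I’m assuming the standard form $u_t = -u_{xx} - u_{xxxx} - u u_x$ on the circle $[0,L]$, an initial sine wave plus a little noise in the spirit of the suggestion above, and a simple integrating-factor Euler step; a serious run would use a stiff scheme such as ETDRK4.

```python
import numpy as np

def ks_run(L=100.0, N=256, dt=0.01, steps=20000, every=100, seed=0):
    """Crude pseudospectral solver for u_t = -u_xx - u_xxxx - u u_x
    on the circle [0, L). The linear terms are handled exactly in
    Fourier space; the nonlinear term by explicit Euler, so dt must
    be small."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, L, N, endpoint=False)
    k = 2.0 * np.pi * np.fft.fftfreq(N, d=L / N)     # wavenumbers
    u0 = np.sin(2.0 * np.pi * x / L) + 0.01 * rng.standard_normal(N)
    v = np.fft.fft(u0)
    E = np.exp(dt * (k**2 - k**4))                   # exact linear step
    dealias = np.abs(k) < (2.0 / 3.0) * np.abs(k).max()
    frames = []
    for n in range(steps):
        u = np.real(np.fft.ifft(v))
        nonlin = -0.5j * k * np.fft.fft(u * u)       # -u u_x = -(u^2/2)_x
        v = E * (v + dt * dealias * nonlin)
        if n % every == 0:
            frames.append(u)
    return x, np.array(frames)
```

Plotting `frames` as an image, with time on one axis and $x$ on the other and a black/white threshold at $u > c$, gives exactly the stripe pictures described in the bullets above.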

If someone gets into this, maybe we could submit a short paper to Experimental Mathematics. I’ve been browsing papers on the Kuramoto–Sivashinsky equation, and I haven’t yet seen anything that gets into as much detail on what solutions look like as I’m trying to do here.

One last thing. I forgot to emphasize that the dynamical system on the Hilbert space $L^2[0,L]$ is not reversible: we can evolve a solution forwards in time and it will stay in this Hilbert space, but not backwards in time. This is very well-known for the heat equation; the point is that solutions get smoother as we run them forward, but when we run them backward they typically get more wild and eventually their $L^2$ norm blows up.

What makes this especially interesting is that the dynamical system on the inertial manifold probably is reversible. As long as this manifold is compact, it must be: any smooth vector field on a compact manifold $M$ generates a ‘flow’ that you can run forward or backward in time.

And yet, even if this flow is reversible, as I suspect it is, it doesn’t resemble its time-reversed version! It has an ‘arrow of time’ built in, since bumps are born and merge much more often than they split and disappear.

So, if my guesses are right, the inertial manifold for the Kuramoto–Sivashinsky equation describes a deterministic universe where time evolution is reversible — and yet the future doesn’t look like the past, because the dynamics carry the imprint of the irreversible dynamics of the Kuramoto–Sivashinsky equation on the larger Hilbert space of all solutions.

## October 17, 2021

### Doug Natelson — Brief items - Sarachik, Feynman, NSF postdocs and more

Here are several items of interest:

• I was saddened to learn of the passing of Myriam Sarachik, a great experimental physicist and a generally impressive person.  I was thinking about writing a longer piece about her, but this New York Times profile from last year is better than anything I could do.  This obituary retells the story to some degree. (I know that it's pay-walled, but I can't find a link to a free version.)  In the early 1960s, after fighting appalling sexism to get a doctorate and a position at Bell Labs, she did foundational experimental work looking at the effect of dilute magnetic impurities on the conduction of nonmagnetic metals.  For each impurity, the magnetic atom has an unpaired electron in a localized orbital.  A conduction electron of opposite spin could form a singlet to fill that orbital, but the on-site Coulomb repulsion of the electron already there makes that energetically forbidden except as a virtual intermediate state for a scattering process.  The result is that scattering by magnetic impurities gets enhanced as $$T$$ falls, leading to an upturn in the resistivity $$\rho(T)$$ that is logarithmic in $$T$$ at low temperatures.  Eventually the localized electron is entangled with the conduction electrons to form a singlet, and the resistivity saturates.  This is known as the Kondo effect, based on the theoretical explanation of the problem, but Sarachik's name could credibly have been attached.  Her family met with a personal tragedy from which it took years to recover.  Later in her career, she did great work looking at localization and the metal-insulator transition in doped semiconductors.  She also worked on the quantum tunneling of magnetization in so-called single-molecule magnets, and was a key player in the study of the 2D metal-insulator transition in silicon MOSFETs.  I was fortunate enough to meet her when she came through Rice in about 2003, and she was very generous with her time meeting with me when I was a young assistant professor.
Sarachik also had a great service career, serving as APS President around that time.  Heck of a career!
• The audio recordings of the famous Feynman Lectures on Physics are now available for free to stream from Caltech.  You can also get to these from the individual lectures by a link on the side of each page.
• There is a new NSF postdoctoral fellowship program for math and physical sciences.  I would be happy to talk to anyone who might be interested in pursuing one of these who might want to work with me.  Please reach out via email.
• I've written before about the "tunneling time" problem - how long does quantum mechanical tunneling of a particle through a barrier take?  Here is an experimental verification of one of the most counterintuitive results in this field:  the farther "below" the barrier the particle is (in the sense of having a smaller fraction of the kinetic energy needed classically to overcome the potential barrier), the faster the tunneling.  A key experimental technique here is the use of a "Larmor clock", with the precession of the spin of a tunneling atom acting as the time-keeping mechanism.
• Did you know that it is possible, in Microsoft Word, to turn on some simple LaTeX-style symbolic coding?  The key is to enable "Math Autocorrect", and then typing \alpha will automatically be turned into $$\alpha$$.  (I know some act like doing scientific writing in Word is heretical, but not everyone in every discipline is facile with LaTeX/Overleaf.)

## October 16, 2021

### Tommaso Dorigo — Art And Artificial Intelligence: An Odd Couple?

This past Thursday I held a public lecture, together with my long-time friend Ivan Bianchi, on the topic of Art and Artificial Intelligence. The event was organized by the "Galileo Festival" in Padova, for the Week of Innovation.
Ivan is a professor of Contemporary Art at the University of Padova. We have known each other since we were two-year-olds, as our mothers were friends. We took very different career paths but we both ended up in academic and research jobs in Padova, and we have been able to take part together in several events where art and science are the focus. Giving a lecture together is twice as fun!

## October 15, 2021

### Matt von Hippel — Outreach Talk on Math’s Role in Physics

Tonight is “Culture Night” in Copenhagen, the night when the city throws open its doors and lets the public in. Museums and hospitals, government buildings and even the Freemasons, all have public events. The Niels Bohr Institute does too, of course: an evening of physics exhibits and demos, capped off with a public lecture by Denmark’s favorite bow-tie wearing weirder-than-usual string theorist, Holger Bech Nielsen. In between, there are a number of short talks by various folks at the institute, including yours truly.

In my talk, I’m going to try and motivate the audience to care about math. Math is dry of course, and difficult for some, but we physicists need it to do our jobs. If you want to be precise about a claim in physics, you need math simply to say what you want clearly enough.

Since you guys likely don’t overlap with my audience tonight, it should be safe to give a little preview. I’ll be using a few examples, but this one is the most complicated:

I’ll be telling a story I stole from chapter seven of the web serial Almost Nowhere. (That link is to the first chapter, by the way, in case you want to read the series without spoilers. It’s very strange, very unique, and at least in my view quite worth reading.) You follow a warrior carrying a spear around a globe along two different paths. The warrior tries to always point in the same direction, but finds that the two different paths result in different spears when they meet. The story illustrates that such a simple concept as “what direction you are pointing” isn’t actually so simple: if you want to think about directions in curved space (like the surface of the Earth, but also, like curved space-time in general relativity) then you need more sophisticated mathematics (a notion called parallel transport) to make sense of it.
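For the mathematically curious, the spear story is easy to simulate. Here is a small sketch (my own illustration, not from the talk) that parallel-transports a tangent vector along great-circle arcs on a sphere; the two routes from the pole to the same equatorial point deliver spears at right angles to each other.

```python
import numpy as np

def transport_along_arc(p, q, v):
    """Parallel-transport tangent vector v at p to q along the
    great-circle arc from p to q (p, q unit vectors, v tangent at p)."""
    axis = np.cross(p, q)
    s = np.linalg.norm(axis)
    if s < 1e-12:            # p and q coincide: nothing to do
        return v
    axis = axis / s
    ang = np.arctan2(s, np.dot(p, q))
    # Rodrigues rotation of v about axis by ang
    return (v * np.cos(ang) + np.cross(axis, v) * np.sin(ang)
            + axis * np.dot(axis, v) * (1.0 - np.cos(ang)))

# Two paths from the north pole to the same equatorial point:
north = np.array([0.0, 0.0, 1.0])
a = np.array([1.0, 0.0, 0.0])     # on the equator, longitude 0
b = np.array([0.0, 1.0, 0.0])     # on the equator, longitude 90 degrees
v = np.array([1.0, 0.0, 0.0])     # the "spear" at the pole

direct = transport_along_arc(north, b, v)            # pole -> b
via_a = transport_along_arc(a, b,
        transport_along_arc(north, a, v))            # pole -> a -> b
# direct and via_a are perpendicular: the holonomy of this spherical
# triangle (three right angles) rotates the spear by 90 degrees
```

The 90° disagreement between the two spears is exactly the spherical excess of the triangle traced out, which is the quantitative content of the story.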

It’s kind of an advanced concept for a public talk. But seeing it show up in Almost Nowhere inspired me to try to get it across. I’ll let you know how it goes!

By the way, if you are interested in learning the kinds of mathematics you need for theoretical physics, and you happen to be a Bachelor’s student planning to pursue a PhD, then consider the Perimeter Scholars International Master’s Program! It’s a one-year intensive at the Perimeter Institute in Waterloo, Ontario, in Canada. In a year it gives you a crash course in theoretical physics, giving you tools that will set you ahead of other beginning PhD students. I’ve witnessed it in action, and it’s really remarkable how much the students learn in a year, and what they go on to do with it. Their early registration deadline is on November 15, just a month away, so if you’re interested you may want to start thinking about it.

### n-Category Café — Dynamics of Reason Revisited

A couple of years ago, I mentioned a talk reporting my latest thoughts on a very long-term project to bring Michael Friedman’s Dynamics of Reason (2001) into relation with developments in higher category theory and its applications.

While that Vienna talk entered into some technicalities on cohomology, last week I had the opportunity of speaking at our departmental seminar in Kent, and so thought I’d sketch what might be of broader philosophical interest about the project.

You can find the slides here.

## October 12, 2021

### Scott Aaronson — Gaussian BosonSampling, higher-order correlations, and spoofing: An update

In my last post, I wrote (among other things) about an ongoing scientific debate between the group of Chaoyang Lu at USTC in China, which over the past year has been doing experiments that seek to demonstrate quantum supremacy via Gaussian BosonSampling; and the group of Sergio Boixo at Google, which had a recent paper on a polynomial-time classical algorithm to sample approximately from the same distributions.  I reported the facts as I understood them at the time.  Since then, though, a long call with the Google team gave me a new and different understanding, and I feel duty-bound to share that here.

A week ago, I considered it obvious that if, using a classical spoofer, you could beat the USTC experiment on a metric like total variation distance from the ideal distribution, then you would’ve completely destroyed USTC’s claim of quantum supremacy.  The reason I believed that, in turn, is a proposition that I hadn’t given a name but needs one, so let me call it Hypothesis H:

The only way a classical algorithm to spoof BosonSampling can possibly do well in total variation distance, is by correctly reproducing the high-order correlations (correlations among the occupation numbers of large numbers of modes) — because that’s where the complexity of BosonSampling lies (if it lies anywhere).

Hypothesis H had important downstream consequences.  Google’s algorithm, by the Google team’s own admission, does not reproduce the high-order correlations.  Furthermore, because of limitations on both samples and classical computation time, Google’s paper calculates the total variation distance from the ideal distribution only on the marginal distribution on roughly 14 out of 144 modes.  On that marginal distribution, Google’s algorithm does do better than the experiment in total variation distance.  Google presents a claimed extrapolation to the full 144 modes, but eyeballing the graphs, it was far from clear to me what would happen: like, maybe the spoofing algorithm would continue to win, but maybe the experiment would turn around and win; who knows?

Chaoyang, meanwhile, made a clear prediction that the experiment would turn around and win, because of

1. the experiment’s success in reproducing the high-order correlations,
2. the admitted failure of Google’s algorithm in reproducing the high-order correlations, and
3. the seeming impossibility of doing well on BosonSampling without reproducing the high-order correlations (Hypothesis H).

Given everything my experience told me about the central importance of high-order correlations for BosonSampling, I was inclined to agree with Chaoyang.

Now for the kicker: it seems that Hypothesis H is false.  A classical spoofer could beat a BosonSampling experiment on total variation distance from the ideal distribution, without even bothering to reproduce the high-order correlations correctly.

This is true because of a combination of two facts about the existing noisy BosonSampling experiments.  The first fact is that the contribution from the order-k correlations falls off like 1/exp(k).  The second fact is that, due to calibration errors and the like, the experiments already show significant deviations from the ideal distribution on the order-1 and order-2 correlations.

Put these facts together and what do you find?  Well, suppose your classical spoofing algorithm takes care to get the low-order contributions to the distribution exactly right.  Just for that reason alone, it could already win over a noisy BosonSampling experiment, as judged by benchmarks like total variation distance from the ideal distribution, or for that matter linear cross-entropy.  Yes, the experiment will beat the classical simulation on the higher-order correlations.  But because those higher-order correlations are exponentially attenuated anyway, they won’t be enough to make up the difference.  The experiment’s lack of perfection on the low-order correlations will swamp everything else.

Granted, I still don’t know for sure that this is what happens — that depends on whether I believe Sergio or Chaoyang about the extrapolation of the variation distance to the full 144 modes (my own eyeballs having failed to render a verdict!).  But I now see that it’s logically possible, maybe even plausible.

So, let’s imagine for the sake of argument that Google’s simulation wins on variation distance, even though the experiment wins on the high-order correlations.  In that case, what would be our verdict: would USTC have achieved quantum supremacy via BosonSampling, or not?

It’s clear what each side could say.

Google could say: by a metric that Scott Aaronson, the coinventor of BosonSampling, thought was perfectly adequate as late as last week — namely, total variation distance from the ideal distribution — we won.  We achieved lower variation distance than USTC’s experiment, and we did it using a fast classical algorithm.  End of discussion.  No moving the goalposts after the fact.

Google could even add: BosonSampling is a sampling task; it’s right there in the name!  The only purpose of any benchmark — whether Linear XEB or high-order correlation — is to give evidence about whether you are or aren’t sampling from a distribution close to the ideal one.  But that means that, if you accept that we are doing the latter better than the experiment, then there’s nothing more to argue about.

USTC could respond: even if Scott Aaronson is the coinventor of BosonSampling, he’s extremely far from an infallible oracle.  In the case at hand, his lack of appreciation for the sources of error in realistic experiments caused him to fixate inappropriately on variation distance as the success criterion.  If you want to see the quantum advantage in our system, you have to deliberately subtract off the low-order correlations and look at the high-order correlations.

USTC could add: from the very beginning, the whole point of quantum supremacy experiments was to demonstrate a clear speedup on some benchmark — we never particularly cared which one!  That horse is out of the barn as soon as we’re talking about quantum supremacy at all — something the Google group, which itself reported the first quantum supremacy experiment in Fall 2019, again for a completely artificial benchmark — knows as well as anyone else.  (The Google team even has experience with adjusting benchmarks: when, for example, Pan and Zhang pointed out that Linear XEB as originally specified is pretty easy to spoof for random 2D circuits, the most cogent rejoinder was: OK, fine then, add an extra check that the returned samples are sufficiently different from one another, which kills Pan and Zhang’s spoofing strategy.) In that case, then, why isn’t a benchmark tailored to the high-order correlations as good as variation distance or linear cross-entropy or any other benchmark?

Both positions are reasonable and have merit — though I confess to somewhat greater sympathy for the one that appeals to my doofosity rather than my supposed infallibility!

OK, but suppose, again for the sake of argument, that we accepted the second position, and we said that USTC gets to declare quantum supremacy as long as its experiment does better than any known classical simulation at reproducing the high-order correlations.  We’d still face the question: does the USTC experiment, in fact, do better on that metric?  It would be awkward if, having won the right to change the rules in its favor, USTC still lost even under the new rules.

Sergio tells me that USTC directly reported experimental data only for up to order-7 correlations, and at least individually, the order-7 correlations are easy to reproduce on a laptop (although sampling in a way that reproduces the order-7 correlations might still be hard—a point that Chaoyang confirms, and where further research would be great). OK, but USTC also reported that their experiment seems to reproduce up to order-19 correlations. And order-19 correlations, the Google team agrees, are hard to sample consistently with on a classical computer by any currently known algorithm.

So then, why don’t we have direct data for the order-19 correlations?  The trouble is simply that it would’ve taken USTC an astronomical amount of computation time.  So instead, they relied on a statistical extrapolation from the observed strength of the lower-order correlations — there we go again with the extrapolations!  Of course, if we’re going to let Google rest its case on an extrapolation, then maybe it’s only sporting to let USTC do the same.

You might wonder: why didn’t we have to worry about any of this stuff with the other path to quantum supremacy, the one via random circuit sampling with superconducting qubits?  The reason is that, with random circuit sampling, all the correlations except the highest-order ones are completely trivial — or, to say it another way, the reduced state of any small number of output qubits is exponentially close to the maximally mixed state.  This is a real difference between BosonSampling and random circuit sampling—and even 5-6 years ago, we knew that this represented an advantage for random circuit sampling, although I now have a deeper appreciation for just how great of an advantage it is.  For it means that, with random circuit sampling, it’s easier to place a “sword in the stone”: to say, for example, here is the Linear XEB score achieved by the trivial classical algorithm that outputs random bits, and lo, our experiment achieves a higher score, and lo, we challenge anyone to invent a fast classical spoofing method that achieves a similarly high score.

With BosonSampling, by contrast, we have various metrics with which to judge performance, but so far, for none of those metrics do we have a plausible hypothesis that says “here’s the best that any polynomial-time classical algorithm can possibly hope to do, and it’s completely plausible that even a noisy current or planned BosonSampling experiment can do better than that.”

In the end, then, I come back to the exact same three goals I would’ve recommended a week ago for the future of quantum supremacy experiments, but with all of them now even more acutely important than before:

1. Experimentally, to increase the fidelity of the devices (with BosonSampling, for example, to observe a larger contribution from the high-order correlations) — a much more urgent goal, from the standpoint of evading classical spoofing algorithms, than further increasing the dimensionality of the Hilbert space.
2. Theoretically, to design better ways to verify the results of sampling-based quantum supremacy experiments classically — ideally, even ways that could be applied via polynomial-time tests.
3. For Gaussian BosonSampling in particular, to get a better understanding of the plausible limits of classical spoofing algorithms, and exactly how good a noisy device needs to be before it exceeds those limits.

Thanks so much to Sergio Boixo and Ben Villalonga for the conversation, and to Chaoyang Lu and Jelmer Renema for comments on this post. Needless to say, any remaining errors are my own.

## October 11, 2021

### Doug Natelson — The Purcell effect - still mind-blowing.

The Purcell effect is named after E. M. Purcell, a Nobel-winning physicist who also was a tremendous communicator, author of one of the great undergraduate textbooks and a famous lecture about the physical world from the point of view of, e.g., a bacterium.  I've written about this before here, and in a comment I include the complete (otherwise paywalled) text of the remarkable original "paper" that describes the effect.

When we calculate things like the Planck black-body spectrum, we use the "density of states" for photons - for a volume $$V$$, we are able to count up how many electromagnetic modes are available with frequency between $$\nu$$ and $$\nu + \mathrm{d}\nu$$, keeping in mind that for each frequency, the electric field can be polarized in two orthogonal directions.  The result is $$(8\pi/c^3)\nu^2 \mathrm{d}\nu$$ states per unit volume of "free space".

In a cavity, though, the situation is different - instead, there is, roughly speaking, one electromagnetic mode per the bandwidth of the cavity per the volume of the cavity.  In other words, the effective density of states for photons in the cavity is different than that in free space.  That has enormous ramifications:  The rates of radiative processes, even those that we like to consider as fundamental, like the rate at which electronically excited atoms radiatively decay to lower states, can be altered in a cavity.  This is the basis for a lot of quantum optics work, as in cavity quantum electrodynamics.  Similarly, the presence of an altered (from free space) photon density of states also modifies the spectrum of thermal radiation from that cavity away from the Planck black-body spectrum.
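To put a rough number on how big this alteration can be, here is a back-of-the-envelope sketch of the standard on-resonance Purcell enhancement factor; the formula is the textbook one, but the example numbers are mine and purely illustrative.

```python
import math

def purcell_factor(wavelength_m, n_index, Q, V_m3):
    """Standard Purcell enhancement of the spontaneous emission rate
    for an emitter on resonance with a cavity of quality factor Q and
    mode volume V: F_P = (3 / 4 pi^2) (lambda / n)^3 (Q / V)."""
    return (3.0 / (4.0 * math.pi**2)) * (wavelength_m / n_index)**3 * Q / V_m3

# Illustrative (assumed) numbers: a 1 micron emission wavelength,
# refractive index 1.5, Q = 10^4, and a mode volume of one cubic
# wavelength in the medium
lam, n = 1.0e-6, 1.5
V = (lam / n)**3
F = purcell_factor(lam, n, 1.0e4, V)   # about 760: a large enhancement
```

Even a modest optical microcavity, by this estimate, can speed up spontaneous emission by orders of magnitude relative to free space.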

Consider an excited atom in the middle of such a cavity.  When it is going to emit a photon, how does it "know" that it's in a cavity rather than in free space, especially if the cavity is much larger than an atom?  The answer is, somehow through the electromagnetic couplings to the atoms that make up the cavity.  This is remarkable, at least to me.   (It's rather analogous to how we picture the Casimir effect, where you can think about the same physics either, e.g., as due to altering local vacuum fluctuations of the EM field in the space between conducting plates, or as due to fluctuating dipolar forces because of fluctuating polarizations on the plates.)

Any description of a cavity (or plasmonic structure) altering the local photon density of states is therefore really short-hand.  When we describe a radiative process that way, we are tacitly assuming that the emitter or absorber in there is being influenced by the surrounding material.  We are just fortunate that we can lump such complicated, relativistically retarded interactions into an effective photon density of states that differs from that in free space.

## October 09, 2021

### Tommaso Dorigo — A Mentor's Pentalogue

I believe oceans of ink were spent, ever since pens were a thing, writing on the mentor-student relationship, its do's and don'ts, and the consequences of deviations from proper practice. And rightly so, as the balancing act required for a proper, effective teaching action is entirely nontrivial. The fact that our didactical systems and academia are in constant evolution, that rules and course formats change over time, and that as humans we tend to forget what has been learnt in the past (on good practice, I mean), requires us to keep thinking about the topic and to keep the discussion alive.

## October 08, 2021

### John Baez — Stirling’s Formula in Words

I recently sketched a quick proof of Stirling’s asymptotic formula for the factorial. But what does this formula really mean?

Above is my favorite explanation. You can’t see the number $2\pi$ in the words here, but it’s hiding in the formula for a Gaussian probability distribution. But what about the factorial, and the rest?

My description in words was informal. I’m really talking about a
Poisson distribution. If raindrops land at an average rate $r,$ this says that after time $t$ the probability of $k$ having landed is

$\displaystyle{ \frac{(rt)^k e^{-rt}}{k!} }$

This is where the factorial comes from, and also the number $e.$

Since the average rate of rainfall is $r,$ at time $t$ the expected number of drops that have landed will be $rt.$ Since I said “wait until the expected number of drops that have landed is $n$”, we want $rt = n.$ Then the probability of $k$ having landed is

$\displaystyle{ \frac{n^k e^{-n}}{k!} }$

Next, what’s the formula for a Gaussian with mean $n$ and standard deviation $\sqrt{n}$? Written as a function of $k$ it’s

$\displaystyle{ \frac{e^{-(k-n)^2/2n}}{\sqrt{2 \pi n}} }$

If this matches the Poisson distribution above in the limit of large $n,$ the two functions must match at the point $k = n,$ at least asymptotically, so

$\displaystyle{ \frac{n^n e^{-n}}{n!} \sim \frac{1}{\sqrt{2 \pi n}} }$

And this becomes Stirling’s formula after a tiny bit of algebra!
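If you want to see the convergence concretely, here is a quick numerical check (mine, not part of the argument) that $n!$ and $\sqrt{2\pi n}\,(n/e)^n$ really do agree asymptotically, which is the same statement as $n^n e^{-n}/n! \sim 1/\sqrt{2\pi n}$:

```python
import math

def stirling_ratio(n):
    """Ratio of n! to Stirling's approximation sqrt(2 pi n) (n/e)^n,
    computed in log space to avoid overflow for large n."""
    log_stirling = 0.5 * math.log(2.0 * math.pi * n) + n * math.log(n) - n
    return math.exp(math.lgamma(n + 1) - log_stirling)

# The ratio tends to 1 as n grows; the leading correction is 1/(12 n)
for n in (10, 100, 1000):
    print(n, stirling_ratio(n))
```

Already at $n = 100$ the two sides agree to better than a tenth of a percent.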

I learned about this on Twitter: Ilya Razenshtyn showed how to prove Stirling’s formula starting from probability theory this way. But it’s much easier to use his ideas to check that my paragraph in words implies Stirling’s formula, as I’ve just done.

### Tommaso Dorigo — A Nobel To A Friend

I very much would like to write about the Nobel prize in physics here today, but I realize I cannot really pay a good service to the three winners, nor to my readers, on that topic. The reason is, quite bluntly, that I am not qualified to do that without harming my self-respect. Also, I never knew about the research of two of the winners.

As for the third, I do know Giorgio Parisi's research in qualitative terms, and I happen to know him personally; well, at least we are Facebook friends, as maybe 500 of his contacts can also claim - plus, he once invited me to a symposium at the Accademia dei Lincei, of which he is vice-president. And I did write about his scientific accomplishments in the past here, on two occasions.

### Matt von Hippel — Congratulations to Syukuro Manabe, Klaus Hasselmann, and Giorgio Parisi!

The 2021 Nobel Prize in Physics was announced this week, awarded to Syukuro Manabe and Klaus Hasselmann for climate modeling and Giorgio Parisi for understanding a variety of complex physical systems.

Before this year’s prize was announced, I remember a few “water cooler chats” about who might win. No guess came close, though. The Nobel committee seems to have settled into a strategy of prizes on a loosely linked “basket” of topics, with half the prize going to a prominent theorist and the other half going to two experimental, observational, or (in this case) computational physicists. It’s still unclear why they’re doing this, but regardless it makes it hard to predict what they’ll do next!

When I read the announcement, my first reaction was, “surely it’s not that Parisi?” Giorgio Parisi is known in my field for the Altarelli-Parisi equations (more properly known as the DGLAP equations, the longer acronym because, as is often the case in physics, the Soviets got there first). These equations are in some sense why the scattering amplitudes I study are ever useful at all. I calculate collisions of individual fundamental particles, like quarks and gluons, but a real particle collider like the LHC collides protons. Protons are messy, interacting combinations of quarks and gluons. When they collide you need not merely the equations describing colliding quarks and gluons, but those that describe their messy dynamics inside the proton, and in particular how those dynamics look different for experiments with different energies. The equation that describes that is the DGLAP equation.

As it turns out, Parisi is known for a lot more than the DGLAP equation. He is best known for his work on “spin glasses”, models of materials where quantum spins try to line up with each other, never quite settling down. He also worked on a variety of other complex systems, including flocks of birds!

I don’t know as much about Manabe and Hasselmann’s work. I’ve only seen a few talks on the details of climate modeling. I’ve seen plenty of talks on other types of computer modeling, though, from people who model stars, galaxies, or black holes. And from those, I can appreciate what Manabe and Hasselmann did. Based on those talks, I recognize the importance of those first one-dimensional models, a single column of air, especially back in the 60’s when computer power was limited. Even more, I recognize how impressive it is for someone to stay on the forefront of that kind of field, upgrading models for forty years to stay relevant into the 2000’s, as Manabe did. Those talks also taught me about the challenge of coupling different scales: how small effects in churning fluids can add up and affect the simulation, and how hard it is to model different scales at once. To use these effects to discover which models are reliable, as Hasselmann did, is a major accomplishment.

### Jordan Ellenberg — Beautiful World, Where Are You?

I was struck by the fact that this book was getting a huge amount of press, and I was clearly supposed to have heard of the author, Sally Rooney, but I had not. And when I asked people about this, I was told it was generational. Rooney is “a millennial author” and I am not a millennial reader. I took this as a challenge! Can I read millennially?

Here are some thoughts which I suppose contain plot spoilers if you are the sort of reader who wants to avoid those before reading the books. (I am.)

What I really like about the book is its strange and affecting choice to use a narrative voice which can go anywhere and see anything but cannot enter any of the main characters’ minds. Everything is done through dialogue and description of bodily motion. The narrator never speaks in the first person but somehow has a personality, is a kind of lonely spirit, which sometimes wanders away from the narrative action entirely and goes out into the night, while the characters keep talking inside the cozy, lighted house, where the narrator can no longer hear. (The only other recent book I can think of that does something like this is J. Robert Lennon’s Broken River, but the purpose there is pretty different; that one is really going for, and achieving, outright spookiness.)

This choice is the central stylistic fact of the novel and every moment gets colored by it, as in a novel written in the second person.

There is a lot of sex in this book, for instance, and the fact that we are locked out of the human experience of it, just watching bodies roll over each other, makes it uncomfortable to read — frankly, kind of porny. By design, since the characters themselves are not really able to experience each other as people, even though at moments they think they’re doing so.

The story is broken up by emails from one character to another — in a normal novel these could be simply changes of register, a comfortable way to vary the style and bring in information about the characters without cramming it artificially into dialogue or reminiscence. But here, because we’ve been locked out of the characters’ minds, the artificiality of the form comes to the fore. We don’t experience the emails as direct contact with the character’s beliefs, but as performances, which is of course what letters actually are. And so the little philosophical essays that might otherwise be read as authorial thesis statements by proxy are, here, more like — what, affectations? Things the characters wear, like clothes, from which observers can tell what kind of people they are asserting themselves to be.

About two-thirds of the way through, the narrative breaks the rule and goes into Eileen’s mind for a reminiscence of her early romantic feeling for Simon, the man we’re watching her present-day romance with. (Simon is also Jesus, sort of.) I’m not sure why Rooney does this. In fact the book, which sets itself up very satisfyingly, doesn’t seem to know what to do once it has established its mood of eerie distance. The last part of the novel — back to the distant narrator, at this point — contains a lot of long monologues which feel purposeless and lack the snap of the very, very good renderings of speech earlier on. To be honest I had the feeling Rooney was tired of moving the characters around on the board and knew that in novels people traditionally settle down together in the end, so that’s what happens. But this very assured and unconventional book doesn’t like having a conventional ending. On some level I think Rooney recognizes this, so she puts the ending in a pair of letters rather than try to narrate it.

This sounds like I didn’t like it, but I did like it. Rooney set herself a difficult task and didn’t, it seems to me, bring it off; but most books don’t even try anything hard.

Some other people writing on this book:

Tony Tulathimutte in the Nation, who makes some correct criticisms of some of the sentences in the book;

Anne Enright in the Guardian, who is very good on the strange power of the novel’s style, and who is completely won over by the ending that left me so unsatisfied.

Update: There is another read here, which is that I’m overthinking it and this is meant to be the sort of novel in which you feel about the characters the way you might about people you know, and just straightforwardly hope for certain outcomes for them, the way one does (I mean, I do) in The Age of Innocence or Elena Ferrante novels (I mean, the ones I’ve read.) If that’s the work the book is doing, it didn’t work for me (but I think it worked for others, like Anne Enright.)

## October 06, 2021

### John Baez — Stirling’s Formula

Stirling’s formula says

$\displaystyle{ n! \sim \sqrt{2 \pi n}\, \left(\frac{n}{e}\right)^n }$

where $\sim$ means that the ratio of the two quantities goes to $1$ as $n \to \infty.$
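It's easy to see numerically how good the approximation is — a quick sanity check of my own, not part of the argument:

```python
import math

# Compare n! with Stirling's approximation sqrt(2*pi*n) * (n/e)^n.
# The ratio tends to 1 as n grows (the relative error is roughly 1/(12n)).
for n in [1, 5, 10, 50, 100]:
    stirling = math.sqrt(2 * math.pi * n) * (n / math.e) ** n
    print(n, math.factorial(n) / stirling)
```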

Where does this formula come from? In particular, how does the number $2\pi$ get involved? Where is the circle here?

To understand these things, I think a nonrigorous argument that can be made rigorous is more useful than a rigorous proof with all the ε’s dotted and the δ’s crossed. It’s important, I think, to keep the argument short. So let me do that.

The punchline will be that the $2\pi$ comes from this formula:

$\displaystyle{ \int_{-\infty}^\infty e^{-x^2/2} \, dx = \sqrt{2 \pi} }$

And this, I hope you know, comes from squaring both sides and converting the left side into a double integral that you can do in polar coordinates, pulling out a factor of $2 \pi$ because the thing you’re integrating only depends on $r,$ not $\theta.$
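If you want to convince yourself of this numerically, here is a minimal midpoint-rule check — my own sketch, not part of the post. The tails beyond $|x| = 10$ are far too small to matter:

```python
import math

# Numerically integrate exp(-x^2/2) with the midpoint rule over [-10, 10];
# the result should be very close to sqrt(2*pi) ≈ 2.5066.
def gaussian_integral(a=-10.0, b=10.0, steps=200_000):
    h = (b - a) / steps
    return sum(math.exp(-((a + (i + 0.5) * h) ** 2) / 2) for i in range(steps)) * h

print(gaussian_integral())
print(math.sqrt(2 * math.pi))
```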

The other ingredient we need is this formula for the factorial:

$\displaystyle{ \int_0^\infty x^n e^{-x} \, dx = n! }$

This is easy to show using repeated integration by parts.
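You can also verify it numerically — a small sketch of my own, truncating the integral at a point where the integrand is negligible:

```python
import math

# Midpoint-rule check of  ∫_0^∞ x^n e^{-x} dx = n!  for small n.
# For the n used here the integrand is negligible beyond x = 60.
def gamma_integral(n, upper=60.0, steps=200_000):
    h = upper / steps
    return sum(((i + 0.5) * h) ** n * math.exp(-(i + 0.5) * h)
               for i in range(steps)) * h

for n in range(6):
    print(n, gamma_integral(n), math.factorial(n))
```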

Next, we do this:

$\begin{array}{ccl} n! &=& \displaystyle{ \int_0^\infty x^n e^{-x} \, dx } \\ \\ &=& \displaystyle{ \int_0^\infty e^{n \ln x -x} \, dx } \\ \\ &=& \displaystyle{ n\int_0^\infty e^{n \ln (ny) -n y} \, dy } \\ \\ \end{array}$

In the first step we’re writing $x^n$ as $e^{n \ln x}.$ In the second we’re changing variables: $x = n y.$

Next we use $\ln(ny) = \ln n + \ln y$ to bust things up:

$\displaystyle{ n! = n e^{n \ln n} \int_0^\infty e^{n \ln y -n y} \, dy }$

All the hard work will come in showing this:

$\displaystyle{ \int_0^\infty e^{n \ln y -n y} \, dy \sim \sqrt{\frac{2 \pi}{n}} \; e^{-n} }$

Given this, we get

$\displaystyle{ n! \sim n e^{n \ln n} \sqrt{\frac{2 \pi}{n}} \; e^{-n} }$

and simplifying we get Stirling’s formula:

$\displaystyle{ n! \sim \sqrt{2 \pi n} \, \left(\frac{n}{e}\right)^n}$

### Laplace’s method

So to prove Stirling’s formula, the big question is: how do we get

$\displaystyle{ \int_0^\infty e^{n \ln y -n y} \, dy \sim \sqrt{\frac{2 \pi}{n}} \; e^{-n} } \; ?$

Let’s write it like this:

$\displaystyle{ \int_0^\infty e^{-n (y - \ln y)} \, dy \sim \sqrt{\frac{2 \pi}{n}} \; e^{-n} }$

The trick is to note that as $n$ gets big, the integral will become dominated by the point where $y - \ln y$ is as small as possible. We can then approximate the integral by a Gaussian peaked at that point!

Notice that

$\displaystyle{ \frac{d}{dy} (y - \ln y) = 1 - y^{-1} }$

$\displaystyle{ \frac{d^2}{dy^2} (y - \ln y) = y^{-2} }$

so the function $y - \ln y$ has a critical point at $y = 1$ and its second derivative is $1$ there, so it’s a local minimum. Indeed this point is the unique minimum of our function on the whole interval $(0,\infty).$

Then we use this:

Laplace’s Method. Suppose $f \colon [a,b] \to \mathbb{R}$ has a unique minimum at some point $x_0 \in (a,b)$ and $f''(x_0) > 0.$ Then

$\displaystyle{ \int_a^b e^{-n f(x)} dx \sim \sqrt{\frac{2 \pi}{n f''(x_0)}} \; e^{-n f(x_0)} }$

as $n \to \infty.$

This says that asymptotically, the integral equals what we’d get if we replaced $f$ by the quadratic function whose value, first derivative and second derivative all match that of $f$ at the point $x_0.$ With this quadratic replacing $f,$ you can do the integral by hand—it’s the integral of a Gaussian—and you get the right hand side.
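Here is a numerical illustration — my own, not in the original — for the function at hand, $f(y) = y - \ln y$: the ratio of the integral to Laplace's approximation tends to $1$ as $n$ grows.

```python
import math

# Laplace-method check: for f(y) = y - ln(y), which has its minimum at
# y0 = 1 with f''(y0) = 1, the integral ∫_0^∞ exp(-n f(y)) dy should
# approach sqrt(2*pi/n) * exp(-n) as n grows.
def laplace_integral(n, a=1e-6, b=30.0, steps=300_000):
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        y = a + (i + 0.5) * h
        total += math.exp(-n * (y - math.log(y)))
    return total * h

for n in [5, 20, 80]:
    approx = math.sqrt(2 * math.pi / n) * math.exp(-n)
    print(n, laplace_integral(n) / approx)  # ratio tends to 1
```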

Applying this formula to the problem at hand we get

$\displaystyle{ \int_a^b e^{-n (y - \ln y)} dy \sim \sqrt{\frac{2 \pi}{n f''(y_0)}} \; e^{-n f(y_0)} }$

where $f(y) = y - \ln y,$ $y_0 = 1,$ $f(y_0) = 1$ and $f''(y_0) = 1.$ So we get

$\displaystyle{ \int_a^b e^{-n (y - \ln y)} dy \sim \sqrt{\frac{2 \pi}{n}} \; e^{-n} }$

and then letting $a = 0, b \to +\infty$ we get what we want.

So, from this viewpoint—and there are others—the key to Stirling’s formula is Laplace’s method of approximating an integral like

$\displaystyle{ \int_a^b e^{-n f(x)} dx }$

with a Gaussian integral. And in the end, the crucial calculation is where we do that Gaussian integral, using

$\displaystyle{ \int_{-\infty}^\infty e^{-x^2/2} \, dx = \sqrt{2 \pi} }$

You can see the whole proof of Laplace’s method here:

• Wikipedia, Laplace’s method.

Physicists who have done quantum field theory will know that when push comes to shove it’s largely about Gaussian integrals. The limit $n \to \infty$ we’re seeing here is like a ‘classical limit’ where $\hbar \to 0.$ So they will be familiar with this idea.

There should be some deeper moral here, about how $n!$ is related to a Gaussian process of some sort, but I don’t know it—even though I know how binomial coefficients approximate a Gaussian distribution. Do you know some deeper explanation, maybe in terms of probability theory and combinatorics, of why $n!$ winds up being asymptotically described by an integral of a Gaussian?

For a very nice account of some cruder versions of Stirling’s formula, try this blog article:

• Michael Weiss, Stirling’s formula: Ahlfors’ derivation, Diagonal Argument, 17 July 2019.

His ‘note’, which you can find there, will give you more intuition for why something like Stirling’s formula should be true. But I think the above argument explains the $\sqrt{2 \pi}$ better than Ahlfors’ argument.

But in math, there are always mysteries within mysteries. Gaussians show up in probability theory when we add up lots of independent and identically distributed random variables. Could that be going on here somehow?

Yes! See this:

• Aditya Ghosh, A probabilistic proof of Stirling’s formula, Blog on Mathematics and Statistics, September 7, 2020.

Folks at the n-Category Café noticed more mysteries. $n!/n^n$ is the probability that a randomly chosen function from an n-element set to itself is a permutation. Stirling’s formula is a cool estimate of this probability! Can we use this to prove Stirling’s formula? I don’t know!
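One can at least check numerically that the two sides agree — a quick sketch of my own comparing the exact probability $n!/n^n$ with Stirling's estimate $\sqrt{2\pi n}\, e^{-n}$:

```python
import math

# n!/n^n is the probability that a uniformly random function from an
# n-element set to itself is a permutation.  By Stirling's formula this
# is asymptotically sqrt(2*pi*n) * exp(-n).
for n in [4, 16, 64]:
    exact = math.factorial(n) / n ** n
    stirling = math.sqrt(2 * math.pi * n) * math.exp(-n)
    print(n, exact, stirling)
```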

So I don’t think we’ve gotten to the bottom of Stirling’s formula! Comments at the n-Category Café contain other guesses about what it might ‘really mean’. You can read them here. Also check out Section 3 here, which discusses many different articles on Stirling’s formula in the American Mathematical Monthly:

• Jonathan M. Borwein and Robert M. Corless, Gamma and factorial in the Monthly.

Judging by the number of articles in the Monthly on the subject, Stirling’s formula approximating n! for large n is by far the most popular aspect of the $\Gamma$ function. There are “some remarks”, “notes”, more “remarks”; there are “simple proofs”, “direct proofs”, “new proofs”, “corrections”, “short proofs”, “very short proofs”, “elementary” proofs, “probabilistic” proofs, “new derivations”, and (our favourite title) “The (n+1)th proof”.

That should be “(n+1)st”.

### Doug Natelson — Spin glasses and the Nobel

The Nobel Prize in physics this year was a bit of a surprise, at least to me.  As one friend described it, it's a bit of a Frankenprize, stitched together out of rather disparate components.  (Apologies for the slow post - work was very busy today.)  As always, it's interesting to read the more in-depth scientific background of the prize.  I was unfamiliar with the climate modeling of Manabe and Hasselmann, and this was a nice intro.

The other prize recipient was Giorgio Parisi, a statistical mechanician whose key cited contribution was in the theory of spin glasses, but was generalizable to many disordered systems with slow, many-timescale dynamics including things like polymers and neural networks.

The key actors in a spin glass are excess spins - local magnetic moments that you can picture as little magnetic dipoles. In a generic spin glass, there is both disorder (as shown in the upper panel of the cartoon, spins - in this case iron atoms doped into copper - are at random locations, and that leads to a broad distribution of spin-spin interactions in magnitude and sign) and frustration (interactions such that flipping spin A to lower its interaction energy with spin B ends up raising the interaction energy with spin C, so that there is no simple configuration of spins that gives a global minimum of the interaction energy).  One consequence of this is a very complicated energy landscape, as shown in the lower panel of the cartoon.  There can be a very large number of configurations that all have about the same total energy, and flipping between these configurations can require a lot of energy such that it is suppressed at low temperatures.  These magnetic systems then end up having slow, "glassy" dynamics with long, non-exponential relaxations, in the same way that structural glasses (e.g., SiO2 glass) can get hung up in geometric configurations that are not the global energetic minimum (crystalline quartz, in the SiO2 case).

The standard tools of statistical physics are difficult to apply to the glassy situation.  A key assumption of equilibrium thermodynamics is that, for a given total energy, a system is equally likely to be found in any microscopic configuration that has that total energy.  Being able to cycle through all those configurations is called ergodicity.  In a spin glass at low temperatures, the potential landscape means that the system can get easily hung up in a local energy minimum, becoming non-ergodic.

An approach that Parisi took to this problem involved "replicas", where one considers the whole system as an ensemble of replica systems, and a key measure of what's going on is the similarity of configurations between the replicas.  Parisi himself summarizes this in this pretty readable (for physicists) article.  One of Parisi's big contributions was showing that the Ising spin glass model of Sherrington and Kirkpatrick is exactly solvable.

I learned about spin glasses as a doctoral student, since the interacting two-level systems in structural glasses at millikelvin temperatures act a lot like a spin glass (TLS coupled to each other via a dipolar elastic interaction, and sometimes an electric dipolar interaction), complete with slow relaxations, differences between field-cooled and zero-field-cooled properties, etc.

Parisi has made contributions across many diverse areas of physics.  Connecting his work to that of the climate modelers is a bit of a stretch thematically - sure, they all worry about dynamics of complex systems, but that's a really broad umbrella.  Still, it's nice to see recognition for the incredibly challenging problem of strongly disordered systems.

## October 05, 2021

### Scott Aaronson — The Physics Nobel, Gaussian BosonSampling, and Dorian Abbot

1. Huge congratulations to the winners of this year’s Nobel Prize in Physics: Syukuro Manabe and Klaus Hasselmann for climate modelling, and separately, Giorgio Parisi for statistical physics. While I don’t know the others, I had the great honor to get to know Parisi three years ago, when he was chair of the committee that awarded me the Tomassoni-Chisesi Prize in Physics, and when I visited Parisi’s department at Sapienza University of Rome to give the prize lecture and collect the award. I remember Parisi’s kindness, a lot of good food, and a lot of discussion of the interplay between theoretical computer science and physics. Note that, while much of Parisi’s work is beyond my competence to comment on, in computer science he’s very well-known for applying statistical physics methods to the analysis of survey propagation—an algorithm that revolutionized the study of random 3SAT when it was introduced two decades ago.

2. Two weeks ago, a group at Google put out a paper with a new efficient classical algorithm to simulate the recent Gaussian BosonSampling experiments from USTC in China. They argued that this algorithm called into question USTC’s claim of BosonSampling-based quantum supremacy. Since then, I’ve been in contact with Sergio Boixo from Google, Chaoyang Lu from USTC, and Jelmer Renema, a Dutch BosonSampling expert and friend of the blog, to try to get to the bottom of this. Very briefly, the situation seems to be that Google’s new algorithm outperforms the USTC experiment on one particular metric: namely, total variation distance from the ideal marginal distribution, if (crucially) you look at only a subset of the optical modes, say 14 modes out of 144 total. Meanwhile, though, if you look at the kth-order correlations for large values of k, then the USTC experiment continues to win. With the experiment, the correlations fall off exponentially with k but still have a meaningful, detectable signal even for (say) k=19, whereas with Google’s spoofing algorithm, you choose the k that you want to spoof (say, 2 or 3), and then the correlations become nonsense for larger k.

Now, given that you were only ever supposed to see a quantum advantage from BosonSampling if you looked at the kth-order correlations for large values of k, and given that we already knew, from the work of Leonid Gurvits, that very small marginals in BosonSampling experiments would be easy to reproduce on a classical computer, my inclination is to say that USTC’s claim of BosonSampling-based quantum supremacy still stands. On the other hand, it’s true that, with BosonSampling especially, more so than with qubit-based random circuit sampling, we currently lack an adequate theoretical understanding of what the target should be. That is, which numerical metric should an experiment aim to maximize, and how well does it have to score on that metric before it’s plausibly outperforming any fast classical algorithm? One thing I feel confident about is that, whichever metric is chosen—Linear Cross-Entropy or whatever else—it needs to capture the kth-order correlations for large values of k. No metric that’s insensitive to those correlations is good enough.

3. Like many others, I was outraged and depressed that MIT uninvited Dorian Abbot (see also here), a geophysicist at the University of Chicago, who was slated to give the Carlson Lecture in the Department of Earth, Atmospheric, and Planetary Sciences about the atmospheres of extrasolar planets. The reason for the cancellation was that, totally unrelatedly to his scheduled lecture, Abbot had argued in Newsweek and elsewhere that Diversity, Equity, and Inclusion initiatives should aim for equality of opportunity rather than equality of outcomes, a Twitter-mob decided to go after him in retaliation, and they succeeded. It should go without saying that it’s perfectly reasonable to disagree with Abbot’s stance, to counterargue—if those very concepts haven’t gone the way of floppy disks. It should also go without saying that the MIT EAPS department chair is free to bow to social-media pressure, as he did, rather than standing on principle … just like I’m free to criticize him for it. To my mind, though, cancelling a scientific talk because of the speaker’s centrist (!) political views completely, 100% validates the right’s narrative about academia, that it’s become a fanatically intolerant echo chamber. To my fellow progressive academics, I beseech thee in the bowels of Bertrand Russell: why would you commit such an unforced error?

Yes, one can imagine views (e.g., open Nazism) so hateful that they might justify the cancellation of unrelated scientific lectures by people who hold those views, as many physicists after WWII refused to speak to Werner Heisenberg. But it seems obvious to me—as it would’ve been obvious to everyone else not long ago—that no matter where a reasonable person draws the line, Abbot’s views as he expressed them in Newsweek don’t come within a hundred miles of it. To be more explicit still: if Abbot’s views justify deplatforming him as a planetary scientist, then all my quantum computing and theoretical computer science lectures deserve to be cancelled too, for the many attempts I’ve made on this blog over the past 16 years to share my honest thoughts and life experiences, to write like a vulnerable human being rather than like a university press office. While I’m sure some sneerers gleefully embrace that implication, I ask everyone else to consider how deeply they believe in the idea of academic freedom at all—keeping in mind that such a commitment only ever gets tested when there’s a chance someone might denounce you for it.

Update: Princeton’s James Madison Program has volunteered to host Abbot’s Zoom talk in place of MIT. The talk is entitled “Climate and the Potential for Life on Other Planets.” Like probably hundreds of others who heard about this only because of the attempted cancellation, I plan to attend!

Unrelated Bonus Update: Here’s a neat YouTube video put together by the ACM about me as well as David Silver of AlphaGo and AlphaZero, on the occasion of our ACM Prizes in Computing.

### John Baez — The Cyclic Identity for Partial Derivatives

As an undergrad I learned a lot about partial derivatives in physics classes. But they told us rules as needed, without proving them. One rule completely freaked me out: the cyclic identity

$\displaystyle{ \left. \frac{\partial u}{\partial v} \right|_w \, \left. \frac{\partial v}{\partial w} \right|_u \, \left. \frac{\partial w}{\partial u} \right|_v \; = \; -1 }$

If derivatives are kinda like fractions, shouldn’t this product equal 1?

Let me show you why it’s $-1$.

First, consider an example:

This example shows that the identity is not crazy. But in fact it
holds the key to the general proof! Since $(u,v)$ is a coordinate system we can assume without loss of generality that $u = x, v = y$. At any point we can approximate $w$ to first order as $ax + by + c$ for some constants $a,b,c$. But for derivatives the constant $c$ doesn’t matter, so we can assume it’s zero.

Then just compute!

There’s also a proof using differential forms that you might like
better. You can see it here, along with an application to
thermodynamics:

But this still leaves us yearning for more intuition — and for me, at least, a more symmetrical, conceptual proof. Over on Twitter, someone named
Postdoc/cake provided some intuition using the same example from thermodynamics:

Using physics intuition to get the minus sign:

1. increasing temperature at const volume = more pressure (gas pushes out more)
2. increasing temperature at const pressure = increasing volume (ditto)
3. increasing pressure at const temperature = decreasing volume (you push in more)
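That physical intuition can be checked numerically. Here's a minimal sketch of my own, using an ideal gas in units where $PV = T$, and central finite differences for the three partial derivatives:

```python
# Check the cyclic identity (∂P/∂T)_V (∂T/∂V)_P (∂V/∂P)_T = -1 for an
# ideal gas in units where P*V = T.
def dPdT_V(T, V, h=1e-6):  # P = T/V at fixed V
    return ((T + h) / V - (T - h) / V) / (2 * h)

def dTdV_P(P, V, h=1e-6):  # T = P*V at fixed P
    return (P * (V + h) - P * (V - h)) / (2 * h)

def dVdP_T(P, T, h=1e-6):  # V = T/P at fixed T
    return (T / (P + h) - T / (P - h)) / (2 * h)

P, V = 2.0, 3.0
T = P * V
product = dPdT_V(T, V) * dTdV_P(P, V) * dVdP_T(P, T)
print(product)  # close to -1
```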

Jules Jacobs gave the symmetrical, conceptual proof that I was dreaming of:

This proof is more sophisticated than my simple argument, but it’s very pretty, and it generalizes to higher dimensions in ways that’d be hard to guess otherwise.

He uses some tricks that deserve explanation. As I’d hoped, the minus signs come from the anticommutativity of the wedge product of 1-forms, e.g.

$du \wedge dv = - dv \wedge du$

But what lets us divide quantities like this? Remember, $u, v, w$ are all functions on the plane, so $du, dv, dw$ are 1-forms on the plane. And since the space of 2-forms at a point in the plane is 1-dimensional, we can divide them! After all, given two vectors in a 1d vector space, the first is always some number times the second, as long as the second is nonzero. So we can define their ratio to be that number.

For Jacobs’ argument, we also need that these ratios obey rules like

$\displaystyle{ \frac{\alpha}{\beta} \cdot \frac{\gamma}{\delta} = \frac{\alpha}{\delta} \cdot \frac{\gamma}{\beta} }$

But this is easy to check: whenever $\alpha, \beta, \gamma, \delta$ are vectors in a 1-dimensional vector space, division obeys the above rule. To put it in fancy language, nonzero vectors in any 1-dimensional real vector space form a ‘torsor’ for the multiplicative group of nonzero real numbers:

• John Baez, Torsors made easy.

Finally, Jules used this sort of fact:

$\displaystyle{ \frac{du \wedge dw}{dv \wedge dw} = \left. \frac{\partial u}{\partial v} \right|_w }$

Actually he forgot to write down this particular equation at the top of his short proof—but he wrote down three others of the same form, and they all work the same way. Why are they true?

I claim this equation is true at some point of the plane whenever $u, v, w$ are smooth functions on the plane and $dv \wedge dw$ doesn’t vanish at that point. Let’s see why!

First of all, by the inverse function theorem, if $dv \wedge dw \ne 0$ at some point in the plane, the functions $v$ and $w$ serve as coordinates in some neighborhood of that point. In this case we can work out $du$ in terms of these coordinates, and we get

$\displaystyle{ du = \frac{\partial u}{\partial v} dv + \frac{\partial u}{\partial w} dw }$

or more precisely

$\displaystyle{ du = \left.\frac{\partial u}{\partial v}\right|_w dv + \left. \frac{\partial u}{\partial w} \right|_v dw }$

Thus

$\begin{array}{ccl} du \wedge dw &=& \displaystyle{ \left(\left.\frac{\partial u}{\partial v}\right|_w dv + \left. \frac{\partial u}{\partial w} \right|_v dw\right) \wedge dw } \\ \\ &=& \displaystyle{ \left.\frac{\partial u}{\partial v}\right|_w dv \wedge dw } \end{array}$

so

$\displaystyle{ \frac{du \wedge dw}{dv \wedge dw} = \left. \frac{\partial u}{\partial v} \right|_w }$

as desired!

### John Baez — Maxwell’s Relations (Part 2)

The Maxwell relations are some very general identities about the partial derivatives of functions of several variables. You don’t need to understand anything about thermodynamics to understand them, but they’re used a lot in that subject, so discussions of them tend to use notation from that subject.

Last time I went through a standard way to derive these relations for a function of two variables. Now I want to give a better derivation, which I found here:

• David J. Ritchie, A simple method for deriving Maxwell’s relations, American Journal of Physics 36 (1968), 760.

This paper is just one page long, and I can’t really improve on it, but I’ll work through the ideas in more detail. It again covers only the case of a function of two variables, and I won’t try to go beyond that case now—maybe later.

So, remember the setup. We have a smooth function on the plane

$U \colon \mathbb{R}^2 \to \mathbb{R}$

We call the coordinates on the plane $S$ and $V,$ and we give the partial derivatives of $U$ funny names:

$\displaystyle{ T = \left.\frac{\partial U}{\partial S} \right|_V \qquad P = - \left. \frac{\partial U}{\partial V} \right|_S }$

None of these funny names or the minus sign has any effect on the actual math involved; they’re just traditional in thermodynamics. So, mathematicians, please forgive me! If I ever generalize to the n-variable case, I’ll switch to more reasonable notation.

We instantly get this:

$dU = T dS - P dV$

and since $d^2 = 0$ we get

$0 = d^2 U = dT \wedge dS - dP \wedge dV$

so

$dT \wedge dS = dP \wedge dV$

Believe it or not, this simple relation contains all four of Maxwell’s relations within it!

To see this, note that both sides are smooth 2-forms on the plane. Now, the space of 2-forms at any one point of the plane is a 1-dimensional vector space. So, we can divide any 2-form at a point by any nonzero 2-form at that point and get a real number.

In particular, suppose $X$ and $Y$ are functions on the plane such that $dX \wedge dY \ne 0$ at some point. Then we can divide both sides of the above equation by $dX \wedge dY$ and get

$\displaystyle{ \frac{dT \wedge dS}{dX \wedge dY} \; = \; \frac{dP \wedge dV}{dX \wedge dY} }$

at this point. We can now get the four Maxwell relations simply by making different choices of $X$ and $Y.$ We’ll choose them to be either $T,S,V$ or $P.$ The argument will only work if $dX \wedge dY \ne 0,$ so I’ll always assume that. The argument works the same way each time so I’ll go faster after the first time.

### The first relation

Take $X = V$ and $Y = S$ and substitute them into the above equation. We get

$\displaystyle{ \frac{dT \wedge dS}{dV \wedge dS} \; = \; \frac{dP \wedge dV}{dV \wedge dS} }$

or

$\displaystyle{ \frac{dT \wedge dS}{dV \wedge dS} \; = \; - \frac{dP \wedge dV}{dS \wedge dV} }$

Next we use a little fact about differential forms and partial derivatives to simplify both sides:

$\displaystyle{ \frac{dT \wedge dS}{dV \wedge dS} \; = \; \left.\frac{\partial T}{\partial V} \right|_S }$

and similarly

$\displaystyle{ \frac{dP \wedge dV}{dS \wedge dV} \; = \; \left. \frac{\partial P}{\partial S}\right|_V }$

If you were scarred as a youth when plausible-looking manipulations with partial derivatives turned out to be unjustified, you might be worried about this—and rightly so! Later I’ll show how to justify the kind of ‘cancellation’ we’re doing here. But anyway, it instantly gives us the first Maxwell relation:

$\boxed{ \displaystyle{ \left. \frac{\partial T}{\partial V} \right|_S \; = \; - \left. \frac{\partial P}{\partial S} \right|_V } }$
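A quick finite-difference check of this relation — my own sketch, using the arbitrary smooth toy function $U(S,V) = S^2/V$, chosen for illustration only, not for any physical meaning:

```python
# Finite-difference check of the first Maxwell relation
# (∂T/∂V)_S = -(∂P/∂S)_V for the toy energy function U(S,V) = S^2/V.
def U(S, V):
    return S * S / V

def T(S, V, h=1e-6):   # T = (∂U/∂S)_V, central difference
    return (U(S + h, V) - U(S - h, V)) / (2 * h)

def P(S, V, h=1e-6):   # P = -(∂U/∂V)_S, central difference
    return -(U(S, V + h) - U(S, V - h)) / (2 * h)

S0, V0, h = 1.5, 2.0, 1e-5
dT_dV = (T(S0, V0 + h) - T(S0, V0 - h)) / (2 * h)
dP_dS = (P(S0 + h, V0) - P(S0 - h, V0)) / (2 * h)
print(dT_dV, -dP_dS)  # the two sides agree
```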

### The second relation

This time we take $X = T, Y = V.$ Substituting this into our general formula

$\displaystyle{ \frac{dT \wedge dS}{dX \wedge dY} \; = \; \frac{dP \wedge dV}{dX \wedge dY} }$

we get

$\displaystyle{ \frac{dT \wedge dS}{dT \wedge dV} \; = \; \frac{dP \wedge dV}{dT \wedge dV} }$

and doing the same sort of ‘cancellation’ as last time, this gives the second Maxwell relation:

$\boxed{ \displaystyle{ \left. \frac{\partial S}{\partial V} \right|_T \; = \; \left. \frac{\partial P}{\partial T} \right|_V } }$

### The third relation

This time we take $X = P, Y = S.$ Substituting this into our general formula

$\displaystyle{ \frac{dT \wedge dS}{dX \wedge dY} \; = \; \frac{dP \wedge dV}{dX \wedge dY} }$

we get

$\displaystyle{ \frac{dT \wedge dS}{dP \wedge dS} \; = \; \frac{dP \wedge dV}{dP \wedge dS} }$

which gives the third Maxwell relation:

$\boxed{ \displaystyle{ \left. \frac{\partial T}{\partial P} \right|_S \; = \; \left. \frac{\partial V}{\partial S} \right|_P }}$

### The fourth relation

This time we take $X = P, Y = T.$ Substituting this into our general formula

$\displaystyle{ \frac{dT \wedge dS}{dX \wedge dY} \; = \; \frac{dP \wedge dV}{dX \wedge dY} }$

we get

$\displaystyle{ \frac{dT \wedge dS}{dP \wedge dT} \; = \; \frac{dP \wedge dV}{dP \wedge dT} }$

or

$\displaystyle{ -\frac{dS \wedge dT}{dP \wedge dT} \; = \; \frac{dP \wedge dV}{dP \wedge dT} }$

giving the fourth Maxwell relation:

$\boxed{ \displaystyle{ \left. \frac{\partial V}{\partial T} \right|_P \; = \; - \left. \frac{\partial S}{\partial P} \right|_T }}$

You can check that other choices of $X$ and $Y$ don’t give additional relations of the same form.

### Determinants

So, we’ve seen that all four Maxwell relations follow quickly from the equation

$dT \wedge dS = dP \wedge dV$

if we can do ‘cancellations’ in expressions like this:

$\displaystyle{ \frac{dA \wedge dB}{dX \wedge dY} }$

when one of the functions $A,B \colon \mathbb{R}^2 \to \mathbb{R}$ equals one of the functions $X,Y \colon \mathbb{R}^2 \to \mathbb{R}.$ This works whenever $dX \wedge dY \ne 0.$ Let’s see why!

First of all, by the inverse function theorem, if $dX \wedge dY \ne 0$ at some point in the plane, the functions $X$ and $Y$ serve as coordinates in some neighborhood of that point. In this case we have

$\displaystyle{ \frac{dA \wedge dB}{dX \wedge dY} = \det \left( \begin{array}{cc} \displaystyle{ \frac{\partial A}{\partial X} } & \displaystyle{ \frac{\partial A}{\partial Y} } \\ \\ \displaystyle{ \frac{\partial B}{\partial X} } & \displaystyle{\frac{\partial B}{\partial Y} } \end{array} \right) }$

Yes: the ratio of 2-forms is just the Jacobian of the map sending $(X,Y)$ to $(A,B).$ This is clear if you know that 2-forms are ‘area elements’ and the Jacobian is a ratio of area elements. But you can also prove it by a quick calculation:

$\displaystyle{ dA = \frac{\partial A}{\partial X} dX + \frac{\partial A}{\partial Y} dY }$

$\displaystyle{ dB = \frac{\partial B}{\partial X} dX + \frac{\partial B}{\partial Y} dY }$

and thus

$\displaystyle{ dA \wedge dB = \left(\frac{\partial A}{\partial X} \frac{\partial B}{\partial Y} - \frac{\partial A}{\partial Y} \frac{\partial B}{\partial X} \right) dX \wedge dY}$

so the ratio $dA \wedge dB/dX \wedge dY$ is the desired determinant.

How does this help? Well, take

$\displaystyle{ \frac{dA \wedge dB}{dX \wedge dY} }$

and now suppose that either $A$ or $B$ equals either $X$ or $Y.$ For example, suppose $A = X.$ Then we can do a ‘cancellation’ like this:

$\begin{array}{ccl} \displaystyle{ \frac{dX \wedge dB}{dX \wedge dY} } &=& \det \left( \begin{array}{cc} \displaystyle{ \frac{\partial X}{\partial X} } & \displaystyle{ \frac{\partial X}{\partial Y} } \\ \\ \displaystyle{ \frac{\partial B}{\partial X} } & \displaystyle{\frac{\partial B}{\partial Y} } \end{array} \right) \\ \\ &=& \det \left( \begin{array}{cc} \displaystyle{ 1 } & \displaystyle{ 0 } \\ \\ \displaystyle{ \frac{\partial B}{\partial X} } & \displaystyle{\frac{\partial B}{\partial Y} } \end{array} \right) \\ \\ &=& \displaystyle{\frac{\partial B}{\partial Y} } \end{array}$

or to make it clear that the partial derivatives are being done in the $X,Y$ coordinate system:

$\displaystyle{ \frac{dX \wedge dB}{dX \wedge dY} = \left.\frac{\partial B}{\partial Y} \right|_X }$

This justifies all our calculations earlier.
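As a numerical sanity check of this cancellation rule (the snippet and the sample function $B$ are my own illustration, not part of the post), one can approximate the Jacobian determinant of $(X,Y) \mapsto (X,B)$ with finite differences and compare it with the plain partial derivative $\partial B/\partial Y$:

```python
import math

# A hypothetical smooth function B(X, Y); any choice works for the check.
def B(X, Y):
    return math.sin(X) * Y + Y**3

def A(X, Y):  # the case A = X from the text
    return X

h = 1e-5

def dd(f, X, Y, var):
    """Central finite-difference partial derivative of f at (X, Y)."""
    if var == 'X':
        return (f(X + h, Y) - f(X - h, Y)) / (2 * h)
    return (f(X, Y + h) - f(X, Y - h)) / (2 * h)

X0, Y0 = 0.7, 1.2

# Jacobian determinant of (X, Y) -> (A, B), i.e. the ratio dA∧dB / dX∧dY:
det = (dd(A, X0, Y0, 'X') * dd(B, X0, Y0, 'Y')
       - dd(A, X0, Y0, 'Y') * dd(B, X0, Y0, 'X'))

# With A = X this collapses to ∂B/∂Y at fixed X:
print(abs(det - dd(B, X0, Y0, 'Y')) < 1e-8)  # True
```

Here the first row of the Jacobian is $(1, 0)$ because $A = X$, which is exactly the ‘cancellation’ in the displayed computation.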

### Conclusions

So, we’ve seen that all four Maxwell relations are unified in a much simpler equation:

$dT \wedge dS = dP \wedge dV$

which follows instantly from

$dU = T dS - P dV$

This is a big step forward compared to the proof I gave last time, which, at least as I presented it, required cleverly guessing a bunch of auxiliary functions—even though these auxiliary functions turn out to be incredibly important in their own right.

So, we should not stop here: we should think hard about the physical and mathematical meaning of the equation

$dT \wedge dS = dP \wedge dV$

And Ritchie does this in his paper! But I will talk about that next time.

Part 1: a proof of Maxwell’s relations using commuting partial derivatives.

Part 2: a proof of Maxwell’s relations using 2-forms.

Part 3: the physical meaning of Maxwell’s relations, and their formulation in terms of symplectic geometry.

For how Maxwell’s relations are connected to Hamilton’s equations, see this post:

### Terence Tao — Nines of safety: a proposed unit of measurement of risk

In everyday usage, we rely heavily on percentages to quantify probabilities and proportions: we might say that a prediction is ${50\%}$ accurate or ${80\%}$ accurate, that there is a ${2\%}$ chance of dying from some disease, and so forth. However, for those without extensive mathematical training, it can sometimes be difficult to assess whether a given percentage amounts to a “good” or “bad” outcome, because this depends very much on the context of how the percentage is used. For instance:

• (i) In a two-party election, an outcome of say ${51\%}$ to ${49\%}$ might be considered close, but ${55\%}$ to ${45\%}$ would probably be viewed as a convincing mandate, and ${60\%}$ to ${40\%}$ would likely be viewed as a landslide.
• (ii) Similarly, if one were to poll an upcoming election, a poll of ${51\%}$ to ${49\%}$ would be too close to call, ${55\%}$ to ${45\%}$ would be an extremely favorable result for the candidate, and ${60\%}$ to ${40\%}$ would mean that it would be a major upset if the candidate lost the election.
• (iii) On the other hand, a medical operation that only had a ${51\%}$, ${55\%}$, or ${60\%}$ chance of success would be viewed as being incredibly risky, especially if failure meant death or permanent injury to the patient. Even an operation that was ${90\%}$ or ${95\%}$ likely to be non-fatal (i.e., a ${10\%}$ or ${5\%}$ chance of death) would not be conducted lightly.
• (iv) A weather prediction of, say, ${30\%}$ chance of rain during a vacation trip might be sufficient cause to pack an umbrella, even though it is more likely than not that rain would not occur. On the other hand, if the prediction was for an ${80\%}$ chance of rain, and it ended up that the skies remained clear, this does not seriously damage the accuracy of the prediction – indeed, such an outcome would be expected in one out of every five such predictions.
• (v) Even extremely tiny percentages of toxic chemicals in everyday products can be considered unacceptable. For instance, EPA rules require action to be taken when the percentage of lead in drinking water exceeds ${0.0000015\%}$ (15 parts per billion). At the opposite extreme, recycling contamination rates as high as ${10\%}$ are often considered acceptable.

Because of all the very different ways in which percentages could be used, I think it may make sense to propose an alternate system of units to measure one class of probabilities, namely the probabilities of avoiding some highly undesirable outcome, such as death, accident or illness. The units I propose are that of “nines”, which are already commonly used to measure availability of some service or purity of a material, but can equally be used to measure the safety (i.e., lack of risk) of some activity. Informally, nines measure how many consecutive appearances of the digit ${9}$ are in the probability of successfully avoiding the negative outcome, thus

• ${90\%}$ success = one nine of safety
• ${99\%}$ success = two nines of safety
• ${99.9\%}$ success = three nines of safety
and so forth. Using the mathematical device of logarithms, one can also assign a fractional number of nines of safety to a general probability:

Definition 1 (Nines of safety) An activity (affecting one or more persons, over some given period of time) that has a probability ${p}$ of the “safe” outcome and probability ${1-p}$ of the “unsafe” outcome will have ${k}$ nines of safety against the unsafe outcome, where ${k}$ is defined by the formula

$\displaystyle k = -\log_{10}(1-p) \ \ \ \ \ (1)$

(where ${\log_{10}}$ is the logarithm to base ten), or equivalently

$\displaystyle p = 1 - 10^{-k}. \ \ \ \ \ (2)$

Remark 2 Because of the various uncertainties in measuring probabilities, as well as the inaccuracies in some of the assumptions and approximations we will be making later, we will not attempt to measure the number of nines of safety beyond the first decimal point; thus we will round to the nearest tenth of a nine of safety throughout this post.
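Definition 1 translates directly into code. Here is a minimal sketch in Python (the function names are my own), rounding to a tenth of a nine as in Remark 2:

```python
import math

def nines(p):
    """Number of nines of safety for success probability p (Definition 1)."""
    if p >= 1:
        return math.inf
    return -math.log10(1 - p)

def success_rate(k):
    """Inverse conversion: success probability for k nines of safety."""
    return 1 - 10 ** (-k)

print(round(nines(0.999), 1))   # 3.0
print(round(nines(0.95), 1))    # 1.3
print(success_rate(2.0))        # 0.99
```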

Here is a conversion table between percentage rates of success (the safe outcome), failure (the unsafe outcome), and the number of nines of safety one has:

| Success rate ${p}$ | Failure rate ${1-p}$ | Number of nines ${k}$ |
|---|---|---|
| ${0\%}$ | ${100\%}$ | ${0.0}$ |
| ${50\%}$ | ${50\%}$ | ${0.3}$ |
| ${75\%}$ | ${25\%}$ | ${0.6}$ |
| ${80\%}$ | ${20\%}$ | ${0.7}$ |
| ${90\%}$ | ${10\%}$ | ${1.0}$ |
| ${95\%}$ | ${5\%}$ | ${1.3}$ |
| ${97.5\%}$ | ${2.5\%}$ | ${1.6}$ |
| ${98\%}$ | ${2\%}$ | ${1.7}$ |
| ${99\%}$ | ${1\%}$ | ${2.0}$ |
| ${99.5\%}$ | ${0.5\%}$ | ${2.3}$ |
| ${99.75\%}$ | ${0.25\%}$ | ${2.6}$ |
| ${99.8\%}$ | ${0.2\%}$ | ${2.7}$ |
| ${99.9\%}$ | ${0.1\%}$ | ${3.0}$ |
| ${99.95\%}$ | ${0.05\%}$ | ${3.3}$ |
| ${99.975\%}$ | ${0.025\%}$ | ${3.6}$ |
| ${99.98\%}$ | ${0.02\%}$ | ${3.7}$ |
| ${99.99\%}$ | ${0.01\%}$ | ${4.0}$ |
| ${100\%}$ | ${0\%}$ | infinite |

Thus, if one has no nines of safety whatsoever, one is guaranteed to fail; but each nine of safety one has reduces the failure rate by a factor of ${10}$. In an ideal world, one would have infinitely many nines of safety against any risk, but in practice there are no ${100\%}$ guarantees against failure, and so one can only expect a finite amount of nines of safety in any given situation. Realistically, one should thus aim to have as many nines of safety as one can reasonably expect to have, but not to demand an infinite amount.

Remark 3 The number of nines of safety against a certain risk is not absolute; it will depend not only on the risk itself, but (a) the number of people exposed to the risk, and (b) the length of time one is exposed to the risk. Exposing more people or increasing the duration of exposure will reduce the number of nines, and conversely exposing fewer people or reducing the duration will increase the number of nines; see Proposition 7 below for a rough rule of thumb in this regard.

Remark 4 Nines of safety are a logarithmic scale of measurement, rather than a linear scale. Other familiar examples of logarithmic scales of measurement include the Richter scale of earthquake magnitude, the pH scale of acidity, the decibel scale of sound level, octaves in music, and the magnitude scale for stars.

Remark 5 One way to think about nines of safety is via the Swiss cheese model that was created recently to describe pandemic risk management. In this model, each nine of safety can be thought of as a slice of Swiss cheese, with holes occupying ${10\%}$ of that slice. Having ${k}$ nines of safety is then analogous to standing behind ${k}$ such slices of Swiss cheese. In order for a risk to actually impact you, it must pass through each of these ${k}$ slices. A fractional nine of safety corresponds to a fractional slice of Swiss cheese that covers the amount of space given by the above table. For instance, ${0.6}$ nines of safety corresponds to a fractional slice that covers about ${75\%}$ of the given area (leaving ${25\%}$ uncovered).

Now to give some real-world examples of nines of safety. Using data for deaths in the US in 2019 (without attempting to account for factors such as age and gender), a random US citizen will have had the following amount of safety from dying from some selected causes in that year:

| Cause of death | Mortality rate per ${100,\! 000}$ (approx.) | Nines of safety |
|---|---|---|
| All causes | ${870}$ | ${2.0}$ |
| Heart disease | ${200}$ | ${2.7}$ |
| Cancer | ${180}$ | ${2.7}$ |
| Accidents | ${52}$ | ${3.3}$ |
| Drug overdose | ${22}$ | ${3.7}$ |
| Influenza/Pneumonia | ${15}$ | ${3.8}$ |
| Suicide | ${14}$ | ${3.8}$ |
| Gun violence | ${12}$ | ${3.9}$ |
| Car accident | ${11}$ | ${4.0}$ |
| Murder | ${5}$ | ${4.3}$ |
| Airplane crash | ${0.14}$ | ${5.9}$ |
| Lightning strike | ${0.006}$ | ${7.2}$ |

The safety of air travel is particularly remarkable: a given hour of flying in general aviation has a fatality rate of ${0.00001}$, or about ${5}$ nines of safety, while for the major carriers the fatality rate drops down to ${0.00000005}$, or about ${7.3}$ nines of safety.

Of course, in 2020, COVID-19 deaths became significant. In this year in the US, the mortality rate for COVID-19 (as the underlying or contributing cause of death) was ${91.5}$ per ${100,\! 000}$, corresponding to ${3.0}$ nines of safety, which was less safe than all other causes of death except for heart disease and cancer. At this time of writing, data for all of 2021 is of course not yet available, but it seems likely that the safety level would be even lower for this year.

Some further illustrations of the concept of nines of safety:

• Each round of Russian roulette has a success rate of ${5/6}$, providing only ${0.8}$ nines of safety. Of course, the safety will decrease with each additional round: one has only ${0.5}$ nines of safety after two rounds, ${0.4}$ nines after three rounds, and so forth. (See also Proposition 7 below.)
• The ancient Roman punishment of decimation, by definition, provided exactly one nine of safety to each soldier being punished.
• Rolling a ${1}$ on a ${20}$-sided die is a risk that carries about ${1.3}$ nines of safety.
• Rolling a double one (“snake eyes”) from two six-sided dice carries about ${1.6}$ nines of safety.
• One has about ${2.6}$ nines of safety against the risk of someone randomly guessing your birthday on the first attempt.
• A null hypothesis has ${1.3}$ nines of safety against producing a ${p = 0.05}$ statistically significant result, and ${2.0}$ nines against producing a ${p=0.01}$ statistically significant result. (However, one has to be careful when reversing the conditional; a ${p=0.01}$ statistically significant result does not necessarily have ${2.0}$ nines of safety against the null hypothesis. In Bayesian statistics, the precise relationship between the two risks is given by Bayes’ theorem.)
• If a poker opponent is dealt a five-card hand, one has ${5.8}$ nines of safety against that opponent being dealt a royal flush, ${4.8}$ against a straight flush or higher, ${3.6}$ against four-of-a-kind or higher, ${2.8}$ against a full house or higher, ${2.4}$ against a flush or higher, ${2.1}$ against a straight or higher, ${1.5}$ against three-of-a-kind or higher, ${1.1}$ against two pairs or higher, and just ${0.3}$ against one pair or higher. (This data was converted from this Wikipedia table.)
• A ${k}$-digit PIN number (or a ${k}$-digit combination lock) carries ${k}$ nines of safety against each attempt to randomly guess the PIN. A length ${k}$ password that allows for numbers, upper and lower case letters, and punctuation carries about ${2k}$ nines of safety against a single guess. (For the reduction in safety caused by multiple guesses, see Proposition 7 below.)

Here is another way to think about nines of safety:

Proposition 6 (Nines of safety extend expected onset of risk) Suppose a certain risky activity has ${k}$ nines of safety. If one repeatedly indulges in this activity until the risk occurs, then the expected number of trials before the risk occurs is ${10^k}$.

Proof: The probability that the risk is activated after exactly ${n}$ trials is ${(1-10^{-k})^{n-1} 10^{-k}}$, which is a geometric distribution with parameter ${10^{-k}}$ and hence mean ${10^k}$. The claim then follows from the standard properties of that distribution. $\Box$
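A quick Monte Carlo check of Proposition 6 (my own illustration, with an arbitrary random seed): simulate repeating a two-nine activity until the risk occurs, and compare the average number of trials with the predicted ${10^k}$:

```python
import random

random.seed(1)  # arbitrary seed for reproducibility

def trials_until_risk(k):
    """Repeat an activity with k nines of safety until the unsafe outcome occurs."""
    p_fail = 10 ** (-k)
    n = 1
    while random.random() >= p_fail:
        n += 1
    return n

k = 2  # two nines of safety: failure probability 1% per trial
mean = sum(trials_until_risk(k) for _ in range(20_000)) / 20_000
print(mean)  # close to the predicted 10**k = 100
```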

Thus, for instance, if one performs some risky activity daily, then the expected length of time before the risk occurs is given by the following table:

| Daily nines of safety | Expected onset of risk |
|---|---|
| ${0}$ | One day |
| ${0.8}$ | One week |
| ${1.5}$ | One month |
| ${2.6}$ | One year |
| ${2.9}$ | Two years |
| ${3.3}$ | Five years |
| ${3.6}$ | Ten years |
| ${3.9}$ | Twenty years |
| ${4.3}$ | Fifty years |
| ${4.6}$ | A century |

Or, if one wants to convert the yearly risks of dying from a specific cause into expected years before that cause of death would occur (assuming for sake of discussion that no other cause of death exists):

| Yearly nines of safety | Expected onset of risk |
|---|---|
| ${0}$ | One year |
| ${0.3}$ | Two years |
| ${0.7}$ | Five years |
| ${1}$ | Ten years |
| ${1.3}$ | Twenty years |
| ${1.7}$ | Fifty years |
| ${2.0}$ | A century |

These tables suggest a relationship between the amount of safety one would have in a short timeframe, such as a day, and a longer time frame, such as a year. Here is an approximate formalisation of that relationship:

Proposition 7 (Repeated exposure reduces nines of safety) If a risky activity with ${k}$ nines of safety is (independently) repeated ${m}$ times, then (assuming ${k}$ is large enough depending on ${m}$), the repeated activity will have approximately ${k - \log_{10} m}$ nines of safety. Conversely: if the repeated activity has ${k'}$ nines of safety, the individual activity will have approximately ${k' + \log_{10} m}$ nines of safety.

Proof: An activity with ${k}$ nines of safety will be safe with probability ${1-10^{-k}}$, hence safe with probability ${(1-10^{-k})^m}$ if repeated independently ${m}$ times. For ${k}$ large, we can approximate

$\displaystyle (1 - 10^{-k})^m \approx 1 - m 10^{-k} = 1 - 10^{-(k - \log_{10} m)}$

giving the former claim. The latter claim follows from inverting the former. $\Box$

Remark 8 The hypothesis of independence here is key. If there is a lot of correlation between the risks between different repetitions of the activity, then there can be much less reduction in safety caused by that repetition. As a simple example, suppose that ${90\%}$ of a workforce are trained to perform some task flawlessly no matter how many times they repeat the task, but the remaining ${10\%}$ are untrained and will always fail at that task. If one selects a random worker and asks them to perform the task, one has ${1.0}$ nines of safety against the task failing. If one took that same random worker and asked them to perform the task ${m}$ times, the above proposition might suggest that the number of nines of safety would drop to approximately ${1.0 - \log_{10} m}$; but in this case there is perfect correlation, and in fact the number of nines of safety remains steady at ${1.0}$ since it is the same ${10\%}$ of the workforce that would fail each time.

Because of this caveat, one should view the above proposition as only a crude first approximation that can be used as a simple rule of thumb, but should not be relied upon for more precise calculations.
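With that caveat in mind, one can still check numerically how good the rule of thumb in Proposition 7 is when the repetitions really are independent (a sketch of my own, not from the post):

```python
import math

def nines_after_repeats(k, m):
    """Exact nines of safety after m independent repetitions of a k-nine activity."""
    p_safe = (1 - 10 ** (-k)) ** m
    return -math.log10(1 - p_safe)

k, m = 4.0, 30
exact = nines_after_repeats(k, m)
approx = k - math.log10(m)  # rule of thumb from Proposition 7
print(round(exact, 1), round(approx, 1))  # 2.5 2.5
```

For ${k}$ large compared to ${\log_{10} m}$ the two values agree to well within the tenth-of-a-nine precision used in this post.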

One can repeat a risk either in time (extending the time of exposure to the risk, say from a day to a year), or in space (by exposing the risk to more people). The above proposition then gives an additive conversion law for nines of safety in either case. Here are some conversion tables for time:

| From/to | Daily | Weekly | Monthly | Yearly |
|---|---|---|---|---|
| Daily | 0 | -0.8 | -1.5 | -2.6 |
| Weekly | +0.8 | 0 | -0.6 | -1.7 |
| Monthly | +1.5 | +0.6 | 0 | -1.1 |
| Yearly | +2.6 | +1.7 | +1.1 | 0 |

| From/to | Yearly | Per 5 yr | Per decade | Per century |
|---|---|---|---|---|
| Yearly | 0 | -0.7 | -1.0 | -2.0 |
| Per 5 yr | +0.7 | 0 | -0.3 | -1.3 |
| Per decade | +1.0 | +0.3 | 0 | -1.0 |
| Per century | +2.0 | +1.3 | +1.0 | 0 |

For instance, as mentioned before, the yearly amount of safety against cancer is about ${2.7}$. Using the above table (and making the somewhat unrealistic hypothesis of independence), we then predict the daily amount of safety against cancer to be about ${2.7 + 2.6 = 5.3}$ nines, the weekly amount to be about ${2.7 + 1.7 = 4.4}$ nines, and the amount of safety over five years to drop to about ${2.7 - 0.7 = 2.0}$ nines.

Now we turn to conversions in space. If one knows the level of safety against a certain risk for an individual, and then one (independently) exposes a group of such individuals to that risk, then the reduction in nines of safety when considering the possibility that at least one group member experiences this risk is given by the following table:

| Group | Reduction in safety |
|---|---|
| You (${1}$ person) | ${0}$ |
| You and your partner (${2}$ people) | ${-0.3}$ |
| You and your parents (${3}$ people) | ${-0.5}$ |
| You, your partner, and three children (${5}$ people) | ${-0.7}$ |
| An extended family of ${10}$ people | ${-1.0}$ |
| A class of ${30}$ people | ${-1.5}$ |
| A workplace of ${100}$ people | ${-2.0}$ |
| A school of ${1,\! 000}$ people | ${-3.0}$ |
| A university of ${10,\! 000}$ people | ${-4.0}$ |
| A town of ${100,\! 000}$ people | ${-5.0}$ |
| A city of ${1}$ million people | ${-6.0}$ |
| A state of ${10}$ million people | ${-7.0}$ |
| A country of ${100}$ million people | ${-8.0}$ |
| A continent of ${1}$ billion people | ${-9.0}$ |
| The entire planet | ${-9.8}$ |

For instance, in a given year (and making the somewhat implausible assumption of independence), you might have ${2.7}$ nines of safety against cancer, but you and your partner collectively only have about ${2.7 - 0.3 = 2.4}$ nines of safety against this risk, your family of five might only have about ${2.7 - 0.7 = 2}$ nines of safety, and so forth. By the time one gets to a group of ${1,\! 000}$ people, it actually becomes very likely that at least one member of the group will die of cancer in that year. (Here the precise conversion table breaks down, because a negative number of nines such as ${2.7 - 3.0 = -0.3}$ is not possible, but one should interpret a prediction of a negative number of nines as an assertion that failure is very likely to happen. Also, in practice the reduction in safety is less than this rule predicts, due to correlations such as risk factors that are common to the group being considered that are incompatible with the assumption of independence.)

In the opposite direction, any reduction in exposure (either in time or space) to a risk will increase one’s safety level, as per the following table:

| Reduction in exposure | Additional nines of safety |
|---|---|
| ${\div 1}$ | ${0}$ |
| ${\div 2}$ | ${+0.3}$ |
| ${\div 3}$ | ${+0.5}$ |
| ${\div 5}$ | ${+0.7}$ |
| ${\div 10}$ | ${+1.0}$ |
| ${\div 100}$ | ${+2.0}$ |

For instance, a five-fold reduction in exposure will reclaim about ${0.7}$ additional nines of safety.

Here is a slightly different way to view nines of safety:

Proposition 9 Suppose that a group of ${m}$ people are independently exposed to a given risk. If there are at most

$\displaystyle \log_{10} \frac{1}{1-2^{-1/m}}$

nines of individual safety against that risk, then there is at least a ${50\%}$ chance that one member of the group is affected by the risk.

Proof: If individually there are ${k}$ nines of safety, then the probability that all the members of the group avoid the risk is ${(1-10^{-k})^m}$. Since the inequality

$\displaystyle (1-10^{-k})^m \leq \frac{1}{2}$

is equivalent to

$\displaystyle k \leq \log_{10} \frac{1}{1-2^{-1/m}},$

the claim follows. $\Box$

Thus, for a group to collectively avoid a risk with at least a ${50\%}$ chance, one needs the following level of individual safety:

| Group | Individual safety level required |
|---|---|
| You (${1}$ person) | ${0.3}$ |
| You and your partner (${2}$ people) | ${0.5}$ |
| You and your parents (${3}$ people) | ${0.7}$ |
| You, your partner, and three children (${5}$ people) | ${0.9}$ |
| An extended family of ${10}$ people | ${1.2}$ |
| A class of ${30}$ people | ${1.6}$ |
| A workplace of ${100}$ people | ${2.2}$ |
| A school of ${1,\! 000}$ people | ${3.2}$ |
| A university of ${10,\! 000}$ people | ${4.2}$ |
| A town of ${100,\! 000}$ people | ${5.2}$ |
| A city of ${1}$ million people | ${6.2}$ |
| A state of ${10}$ million people | ${7.2}$ |
| A country of ${100}$ million people | ${8.2}$ |
| A continent of ${1}$ billion people | ${9.2}$ |
| The entire planet | ${10.0}$ |

For large ${m}$, the level ${k}$ of nines of individual safety required to protect a group of size ${m}$ with probability at least ${50\%}$ is approximately ${\log_{10} \frac{m}{\ln 2} \approx (\log_{10} m) + 0.2}$.
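The threshold in Proposition 9 is easy to evaluate directly; the following sketch (the function name is my own) reproduces several entries of the table above:

```python
import math

def required_individual_nines(m):
    """Threshold from Proposition 9: with at most this many individual nines,
    a group of m people has at least a 50% chance that someone is affected."""
    return math.log10(1 / (1 - 2 ** (-1 / m)))

for m in (1, 2, 10, 100, 1_000_000):
    print(m, round(required_individual_nines(m), 1))
# 1 0.3
# 2 0.5
# 10 1.2
# 100 2.2
# 1000000 6.2
```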

Precautions that can work to prevent a certain risk from occurring will add additional nines of safety against that risk, even if the precaution is not ${100\%}$ effective. Here is the precise rule:

Proposition 10 (Precautions add nines of safety) Suppose an activity carries ${k}$ nines of safety against a certain risk, and a separate precaution can independently protect against that risk with ${l}$ nines of safety (that is to say, the probability that the protection is effective is ${1 - 10^{-l}}$). Then applying that precaution increases the number of nines in the activity from ${k}$ to ${k+l}$.

In particular, we can repurpose the table at the start of this post as a conversion chart for effectiveness of a precaution:

| Effectiveness | Failure rate | Additional nines provided |
|---|---|---|
| ${0\%}$ | ${100\%}$ | ${+0.0}$ |
| ${50\%}$ | ${50\%}$ | ${+0.3}$ |
| ${75\%}$ | ${25\%}$ | ${+0.6}$ |
| ${80\%}$ | ${20\%}$ | ${+0.7}$ |
| ${90\%}$ | ${10\%}$ | ${+1.0}$ |
| ${95\%}$ | ${5\%}$ | ${+1.3}$ |
| ${97.5\%}$ | ${2.5\%}$ | ${+1.6}$ |
| ${98\%}$ | ${2\%}$ | ${+1.7}$ |
| ${99\%}$ | ${1\%}$ | ${+2.0}$ |
| ${99.5\%}$ | ${0.5\%}$ | ${+2.3}$ |
| ${99.75\%}$ | ${0.25\%}$ | ${+2.6}$ |
| ${99.8\%}$ | ${0.2\%}$ | ${+2.7}$ |
| ${99.9\%}$ | ${0.1\%}$ | ${+3.0}$ |
| ${99.95\%}$ | ${0.05\%}$ | ${+3.3}$ |
| ${99.975\%}$ | ${0.025\%}$ | ${+3.6}$ |
| ${99.98\%}$ | ${0.02\%}$ | ${+3.7}$ |
| ${99.99\%}$ | ${0.01\%}$ | ${+4.0}$ |
| ${100\%}$ | ${0\%}$ | infinite |

Thus for instance a precaution that is ${80\%}$ effective will add ${0.7}$ nines of safety, a precaution that is ${99.8\%}$ effective will add ${2.7}$ nines of safety, and so forth. The mRNA COVID vaccines by Pfizer and Moderna have somewhere between ${88\% - 96\%}$ effectiveness against symptomatic COVID illness, providing about ${0.9-1.4}$ nines of safety against that risk, and over ${95\%}$ effectiveness against severe illness, thus adding at least ${1.3}$ nines of safety in this regard.

A slight variant of the above rule can be stated using the concept of relative risk:

Proposition 11 (Relative risk and nines of safety) Suppose an activity carries ${k}$ nines of safety against a certain risk, and an action multiplies the chance of failure by some relative risk ${R}$. Then the action removes ${\log_{10} R}$ nines of safety (if ${R > 1}$) or adds ${-\log_{10} R}$ nines of safety (if ${R<1}$) to the original activity.

Proof: The additional action adjusts the probability of failure from ${10^{-k}}$ to ${R \times 10^{-k} = 10^{-(k - \log_{10} R)}}$. The claim now follows from Definition 1. $\Box$
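In code, Proposition 11 amounts to a one-line conversion (this sketch is my own; the examples in the comments use figures quoted later in the post):

```python
import math

def nines_change(R):
    """Change in nines of safety from an action with relative risk R (Proposition 11).
    Negative values remove safety (R > 1), positive values add it (R < 1)."""
    return -math.log10(R)

print(round(nines_change(20), 1))   # -1.3  (a 20-fold increase in risk)
print(round(nines_change(0.5), 1))  # 0.3   (halving the risk)
```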

Here is a conversion chart between relative risk and change in nines of safety:

| Relative risk | Change in nines of safety |
|---|---|
| ${0.01}$ | ${+2.0}$ |
| ${0.02}$ | ${+1.7}$ |
| ${0.05}$ | ${+1.3}$ |
| ${0.1}$ | ${+1.0}$ |
| ${0.2}$ | ${+0.7}$ |
| ${0.5}$ | ${+0.3}$ |
| ${1}$ | ${0}$ |
| ${2}$ | ${-0.3}$ |
| ${5}$ | ${-0.7}$ |
| ${10}$ | ${-1.0}$ |
| ${20}$ | ${-1.3}$ |
| ${50}$ | ${-1.7}$ |
| ${100}$ | ${-2.0}$ |

Some examples:

• Smoking increases the fatality rate of lung cancer by a factor of about ${20}$, thus removing about ${1.3}$ nines of safety from this particular risk; it also increases the fatality rates of several other diseases, though to not quite as dramatic an extent.
• Seatbelts reduce the fatality rate in car accidents by a factor of about two, adding about ${0.3}$ nines of safety. Airbags achieve a reduction of about ${30-50\%}$, adding about ${0.2-0.3}$ additional nines of safety.
• As far as transmission of COVID is concerned, it seems that constant use of face masks reduces transmission by a factor of about five (thus adding about ${0.7}$ nines of safety), and similarly for constant adherence to social distancing; whereas for instance a ${30\%}$ compliance with mask usage reduced transmission by about ${10\%}$ (adding only ${0.05}$ or so nines of safety).

The effect of combining multiple (independent) precautions together is cumulative; one can achieve quite a high level of safety by stacking together several precautions that individually have relatively low levels of effectiveness. Again, see the “swiss cheese model” referred to in Remark 5. For instance, if face masks add ${0.7}$ nines of safety against contracting COVID, social distancing adds another ${0.7}$ nines, and the vaccine provide another ${1.0}$ nine of safety, implementing all three mitigation methods would (assuming independence) add a net of ${2.4}$ nines of safety against contracting COVID.

In summary, when debating the value of a given risk mitigation measure, the correct question to ask is not quite “Is it certain to work” or “Can it fail?”, but rather “How many extra nines of safety does it add?”.

As one final comparison between nines of safety and other standard risk measures, we give the following proposition regarding large deviations from the mean.

Proposition 12 Let ${X}$ be a normally distributed random variable of standard deviation ${\sigma}$, and let ${\lambda > 0}$. Then the “one-sided risk” of ${X}$ exceeding its mean ${{\bf E} X}$ by at least ${\lambda \sigma}$ (i.e., ${X \geq {\bf E} X + \lambda \sigma}$) carries

$\displaystyle -\log_{10} \frac{1 - \mathrm{erf}(\lambda/\sqrt{2})}{2}$

nines of safety, the “two-sided risk” of ${X}$ deviating (in either direction) from its mean by at least ${\lambda \sigma}$ (i.e., ${|X-{\bf E} X| \geq \lambda \sigma}$) carries

$\displaystyle -\log_{10} (1 - \mathrm{erf}(\lambda/\sqrt{2}))$

nines of safety, where ${\mathrm{erf}}$ is the error function.

Proof: This is a routine calculation using the cumulative distribution function of the normal distribution. $\Box$

Here is a short table illustrating this proposition:

| Number ${\lambda}$ of deviations from the mean | One-sided nines of safety | Two-sided nines of safety |
|---|---|---|
| ${0}$ | ${0.3}$ | ${0.0}$ |
| ${1}$ | ${0.8}$ | ${0.5}$ |
| ${2}$ | ${1.6}$ | ${1.3}$ |
| ${3}$ | ${2.9}$ | ${2.6}$ |
| ${4}$ | ${4.5}$ | ${4.2}$ |
| ${5}$ | ${6.5}$ | ${6.2}$ |
| ${6}$ | ${9.0}$ | ${8.7}$ |

Thus, for instance, the risk of a five sigma event (deviating by more than five standard deviations from the mean in either direction) should carry ${6.2}$ nines of safety assuming a normal distribution, and so one would ordinarily feel extremely safe against the possibility of such an event, unless one started doing hundreds of thousands of trials. (However, we caution that this conclusion relies heavily on the assumption that one has a normal distribution!)
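Proposition 12 can be checked directly with the error function in Python’s standard library (this snippet is my own; `math.erf` computes ${\mathrm{erf}}$):

```python
import math

def one_sided_nines(lam):
    """Nines of safety against exceeding the mean by lam standard deviations."""
    return -math.log10((1 - math.erf(lam / math.sqrt(2))) / 2)

def two_sided_nines(lam):
    """Nines of safety against deviating from the mean (either way) by lam sigma."""
    return -math.log10(1 - math.erf(lam / math.sqrt(2)))

# The five-sigma row of the table:
print(round(one_sided_nines(5), 1), round(two_sided_nines(5), 1))  # 6.5 6.2
```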

See also this older essay I wrote on anonymity on the internet, using bits as a measure of anonymity in much the same way that nines are used here as a measure of safety.

## October 04, 2021

### n-Category Café — Stirling's Formula

Stirling’s formula says

$\displaystyle{ n! \sim \sqrt{2 \pi n} \, \left(\frac{n}{e}\right)^n }$

where $\sim$ means that the ratio of the two quantities goes to $1$ as $n \to \infty.$

Where does this formula come from? In particular, how does the number $2\pi$ get involved? Where is the circle here?

To understand these things, I think a nonrigorous argument that can be made rigorous is more useful than a rigorous proof with all the ε’s dotted and the δ’s crossed. It’s important, I think, to keep the argument short. So let me do that.

The punchline will be that the $2\pi$ comes from this formula:

$\displaystyle{ \int_{-\infty}^\infty e^{-x^2/2} \, d x = \sqrt{2 \pi} }$

And this, I hope you know, comes from squaring both sides and converting the left side into a double integral that you can do in polar coordinates, pulling out a factor of $2 \pi$ because the thing you’re integrating only depends on $r,$ not $\theta.$
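As a quick numerical check of this Gaussian integral (my own sketch, not part of the post), a midpoint-rule sum reproduces $\sqrt{2\pi}$:

```python
import math

# Midpoint rule on [-10, 10]; the tails beyond ±10 contribute less than 1e-21.
N = 200_000
a, b = -10.0, 10.0
dx = (b - a) / N
total = dx * sum(math.exp(-(a + (i + 0.5) * dx) ** 2 / 2) for i in range(N))

print(round(total, 6))                   # 2.506628
print(round(math.sqrt(2 * math.pi), 6))  # 2.506628
```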

The other ingredient is the formula expressing the factorial as an integral:

$\displaystyle{ \int_0^\infty x^n e^{-x} \, d x = n! }$

This is easy to show using repeated integration by parts.

Next, we do this:

$\begin{array}{ccl} n! &=& \displaystyle{ \int_0^\infty x^n e^{-x} \, d x } \\ \\ &=& \displaystyle{ \int_0^\infty e^{n \ln x -x} \, d x } \\ \\ &=& \displaystyle{ n\int_0^\infty e^{n \ln (n y) -n y} \, d y } \\ \\ \end{array}$

In the first step we’re writing $x^n$ as $e^{n \ln x}.$ In the second we’re changing variables: $x = n y.$

Next we use $\ln(n y) = \ln n + \ln y$ to bust things up:

$\displaystyle{ n! = n e^{n \ln n} \int_0^\infty e^{n \ln y -n y} \, d y }$

All the hard work will come in showing this:

$\displaystyle{ \int_0^\infty e^{n \ln y -n y} \, d y \sim \sqrt{\frac{2 \pi}{n}} \; e^{-n} }$

Given this, we get

$\displaystyle{ n! \sim n e^{n \ln n}\; \sqrt{\frac{2 \pi}{n}} \; e^{-n} }$

and simplifying we get Stirling’s formula:

$\displaystyle{ n! \sim \sqrt{2 \pi n} \, \left(\frac{n}{e}\right)^n}$
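It is easy to check numerically how good this asymptotic is (a sketch of my own; the classical first correction to the ratio is roughly $1 + 1/(12n)$):

```python
import math

def stirling(n):
    """Stirling's approximation to n!."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

# The ratio n! / stirling(n) tends to 1 as n grows:
for n in (1, 5, 10, 50):
    print(n, round(math.factorial(n) / stirling(n), 4))
# 1 1.0844
# 5 1.0168
# 10 1.0084
# 50 1.0017
```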

### Laplace’s method

So to prove Stirling’s formula, the big question is: how do we get

$\displaystyle{ \int_0^\infty e^{n \ln y -n y} \, d y \sim \sqrt{\frac{2 \pi}{n}} \; e^{-n} } \; ?$

Let’s write it like this:

$\displaystyle{ \int_0^\infty e^{-n (y - \ln y)} \, d y \sim \sqrt{\frac{2 \pi}{n}} \; e^{-n} }$

The trick is to note that as $n$ gets big, the integral will become dominated by the point where $y - \ln y$ is as small as possible. We can then approximate the integral by a Gaussian peaked at that point!

Notice that

$\displaystyle{ \frac{d}{d y} (y - \ln y) = 1 - y^{-1} }$

$\displaystyle{ \frac{d^2}{d y^2} (y - \ln y) = y^{-2} }$

so the function $y - \ln y$ has a critical point at $y = 1$ and its second derivative is $1$ there, so it’s a local minimum. Indeed this point is the unique minimum of our function on the whole interval $(0,\infty).$

Then we use this:

Laplace’s Method. Suppose $f \colon [a,b] \to \mathbb{R}$ has a unique minimum at some point $x_0 \in (a,b)$ and $f''(x_0) > 0.$ Then

$\displaystyle{ \int_a^b e^{-n f(x)} d x \sim \sqrt{\frac{2 \pi}{n f''(x_0)}} \; e^{-n f(x_0)} }$

as $n \to \infty.$

This says that asymptotically, the integral equals what we’d get if we replaced $f$ by the quadratic function whose value, first derivative and second derivative all match that of $f$ at the point $x_0.$ With this quadratic replacing $f,$ you can do the integral by hand—it’s the integral of a Gaussian—and you get the right hand side.

Applying this formula to the problem at hand we get

$\displaystyle{ \int_a^b e^{-n (y - \ln y)} d y \sim \sqrt{\frac{2 \pi}{n f''(y_0)}} \; e^{-n f(y_0)} }$

where $f(y) = y - \ln y,$ $y_0 = 1,$ $f(y_0) = 1$ and $f''(y_0) = 1.$ So we get

$\displaystyle{ \int_a^b e^{-n (y - \ln y)} d y \sim \sqrt{\frac{2 \pi}{n}} \; e^{-n} }$

and then letting $a = 0, b \to +\infty$ we get what we want.
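One can also check Laplace’s method numerically for $f(y) = y - \ln y$ (my own sketch, not part of the post): integrate the left-hand side with a midpoint rule and compare it to the predicted $\sqrt{2\pi/n}\, e^{-n}$:

```python
import math

def integrand(y, n):
    return math.exp(-n * (y - math.log(y)))

def numeric_integral(n, N=200_000, a=1e-6, b=30.0):
    """Midpoint-rule approximation of the integral over (0, ∞);
    the integrand is negligible outside (a, b) for the n used here."""
    dy = (b - a) / N
    return dy * sum(integrand(a + (i + 0.5) * dy, n) for i in range(N))

def laplace_prediction(n):
    return math.sqrt(2 * math.pi / n) * math.exp(-n)

ratios = {n: numeric_integral(n) / laplace_prediction(n) for n in (5, 20, 80)}
print({n: round(r, 3) for n, r in ratios.items()})
# {5: 1.017, 20: 1.004, 80: 1.001}
```

The ratios approach $1$ as $n \to \infty$, just as the asymptotic formula predicts.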

So, from this viewpoint—and there are others—the key to Stirling’s formula is Laplace’s method of approximating an integral like

$\displaystyle{ \int_a^b e^{-n f(x)} d x }$

with a Gaussian integral. And in the end, the crucial calculation is where we do that Gaussian integral, using

$\displaystyle{ \int_{-\infty}^\infty e^{-x^2/2} \, d x = \sqrt{2 \pi} }$

You can see the whole proof of Laplace’s method here:

Physicists who have done quantum field theory will know that when push comes to shove it’s largely about Gaussian integrals. The limit $n \to \infty$ we’re seeing here is like a ‘classical limit’ where $\hbar \to 0.$ So they will be familiar with this idea.

There should be some deeper moral here, about how $n!$ is related to a Gaussian process of some sort, but I don’t know it—even though I know how binomial coefficients approximate a Gaussian distribution. Do you know some deeper explanation, maybe in terms of probability theory and combinatorics, of why $n!$ winds up being asymptotically described by an integral of a Gaussian?

For a very nice account of some cruder versions of Stirling’s formula, try this blog article:

His ‘note’, which you can find there, will give you more intuition for why something like Stirling’s formula should be true. But I think the above argument explains the $\sqrt{2 \pi}$ better than Ahlfors’ argument.

### n-Category CaféWeakly Globular Double Categories: a Model for Bicategories

guest post by Claire Ott and Emma Phillips as part of the Adjoint School for Applied Category Theory 2021.

As anyone who has worked with bicategories can tell you, checking coherence diagrams can hold up the completion of a proof for weeks. Paoli and Pronk have defined weakly globular double categories, a simplicial model of bicategories which is a sub-2-category of the 2-category of double categories. Today, we’ll introduce weakly globular double categories and briefly talk about the advantage of this model. We’ll also take a look at an application of this model: the SIRS model of infectious disease.

## What is a weakly globular double category?

We’d like to find a sub-2-category of double categories that is biequivalent to the 2-category of bicategories. There are several reasons to want this model of bicategories, which we’ll talk about below. To compare bicategories and double categories, we’re going to consider them as objects in $[\Delta^{op}, \mathsf{Cat}]$. This will enable us to develop a model of bicategories that lives between strict 2-categories and double categories. Two facts are going to help us develop this model:

Fact 1. $X \in [\Delta^{op}, \mathsf{Cat}]$ is the horizontal nerve of a double category iff the Segal maps $\eta_k: X_k \to X_1 \times_{X_0} \overset{k}{\cdots} \times_{X_0} X_1$ are isomorphisms.

Fact 2. $X \in [\Delta^{op}, \mathsf{Cat}]$ is the nerve of a strict 2-category iff $X_0$ is discrete and the Segal maps $\eta_k: X_k \to X_1 \times_{X_0} \overset{k}{\cdots} \times_{X_0} X_1$ are isomorphisms.

Since bicategories are weak 2-categories, it seems that we should be able to weaken the conditions of Fact 2 to get a model of bicategories in $[\Delta^{op}, \mathsf{Cat}]$. By Fact 1, if we want this model to land in double categories, we should keep the requirement that the Segal maps are isomorphisms. Hence we want a property that’s a little less strong than $X_0$ being discrete.

We start with a weak globularity condition: we require that there is an equivalence of categories $\gamma: X_0 \to X^d_0$, where $X^d_0$ is the discrete category of the path components of $X_0$. In the corresponding double category, the weak globularity condition amounts to $X_0$, the category of objects and vertical arrows, being a posetal groupoid; that is, every pair of objects has at most one arrow between them, which is an isomorphism.

When we define the fundamental bicategory of a weakly globular double category $X$, meaning the bicategory associated to $X$ (see 8.1 here), we will define the objects to be the connected components of $X_0$ and the arrows to be the horizontal morphisms of $X$. However, this creates a potential problem - how do we compose two morphisms whose codomain and domain are in the same component but don’t agree?

To solve this, we define an induced Segal maps condition: we require that for all $k \geq 2$, $\gamma: X_0 \to X^d_0$ induces an equivalence of categories $X_1 \times_{X_0} \overset{k}{\cdots} \times_{X_0} X_1 \simeq X_1 \times_{X^d_0} \overset{k}{\cdots} \times_{X^d_0} X_1$. This condition answers the composition problem because it allows us to fill in staircases, meaning paths of alternating horizontal and vertical arrows, with invertible 2-cells:

This lift of the horizontal maps will give us our composite in the fundamental bicategory.

Finally we have our definition of a weakly globular double category: $X \in [\Delta^{op}, \mathsf{Cat}]$ is a weakly globular double category if and only if

• the Segal maps are isomorphisms (i.e., $X$ is a double category);
• (weak globularity condition) there is an equivalence of categories $\gamma: X_0 \to X^d_0$, where $X^d_0$ is the discrete category of the path components of $X_0$;
• (induced Segal maps condition) for all $k \geq 2$, $\gamma$ induces an equivalence of categories $X_1 \times_{X_0} \overset{k}{\cdots} \times_{X_0} X_1 \simeq X_1 \times_{X^d_0} \overset{k}{\cdots} \times_{X^d_0} X_1$.

We then have the following theorem, which you can read more about here:

Theorem 1. There is a biequivalence of 2-categories between the 2-category of weakly globular double categories with (double) pseudofunctors and vertical transformations and the 2-category of bicategories with normal homomorphisms¹ and icons:

$\mathbf{Bic}: \mathsf{WGDbl_{ps,v}} \simeq \mathsf{Bicat_{icon}} : \mathbf{Dbl}.$

## Why study weakly globular double categories?

We’ll now try to convince you of why you should be interested in this model of bicategories. Between strict 2-categories, bicategories, and double categories, bicategories seem to show up most often in the wild. Even in pure mathematics, requiring that composition be strictly associative is often too strong of a condition. However, bicategories are also the most difficult to work with, since we have to check several coherence diagrams.

When we use weakly globular double categories, however, the coherence data is encoded in the combinatorial data of the simplicial structure. Hence when defining a bicategory, we can simply give a double category that adheres to the weak globularity and induced Segal maps conditions.

Even so, you may be thinking that while dealing with bicategory coherence diagrams can be unpleasant, they’re not so terrible that it’s worth learning a new model. The real power of this model, however, comes when we extend this notion to higher categories. Checking the coherence diagrams for tricategories is rather terrible, and there’s not even a clear notion of what the coherence diagrams should be for $n$-categories for $n\geq 4$. In her book, Paoli proves that weakly globular $n$-fold categories model weak $n$-categories. As in the case $n=2$, the weak globularity weakens the higher categorical structures of strict $n$-categories. For $0 \leq k \leq n$, instead of the $k$-cells forming a set as they do in a strict $n$-category, they form an $(n-k-1)$-fold category which is suitably equivalent to a discrete one.

## How do weakly globular double categories show up in applications?

To get a better understanding of what these weakly globular double categories can look like and what they might be used for, we want to introduce an example of marked paths of token placements in Petri nets. In these categories, morphisms only exist between objects with the same underlying Petri net, so we will focus on a category built on one specific net for this example: the SIRS model of infectious disease. We will call this weakly globular double category $\mathsf{Sirs^2}$. Though we have not explicitly constructed this double category using the functor $\mathbf{Dbl}$, we have drawn inspiration from the double category of marked paths in Paoli and Pronk, 8.2.

We would like to use token placements to model a group of three people, each of whom can be susceptible ($S$), infected ($I$), or resistant ($R$). A token placement is a linear combination of places, i.e. the placement $2S + I$ signifies two susceptible and one infected person. A firing can happen at a transition, which in our example are infection ($\tau_i$), recovery ($\tau_r$), and losing resistance ($\tau_l$). Each transition has a source and a target, which are again linear combinations of places, and the condition for a firing is that enough tokens are placed at the transition’s source. In our example, a firing could initially occur at $\tau_i$, leading to the placement $S+2I$, or at $\tau_r$, but not at $\tau_l$.

Firings can be combined if the state after one firing is compatible with the next firing. The combination has its own source and target, which need not coincide with the first firing’s source and the last firing’s target. For example, $\tau_l\tau_i\tau_r$ has source and target $I + R$. The order of the firings is also significant.

We want to construct $\mathsf{Sirs^2}$ to capture paths of token placements which can occur through consecutive firings. In our example, one such path could be $2S+I,S+2I,S+I+R,S+2R$, meaning we start with two susceptible people and one infected person, who infects one of the two susceptible people. Then one of the two infected people recovers, and next the other one recovers. One point to note is that we cannot distinguish tokens by means other than their placement, and so the scenario where the first infected person recovers first looks the same as the scenario where the second infected person recovers first.
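The firing rules above are easy to mechanize. Here is a minimal Python sketch — not the categorical construction $\mathbf{Dbl}$, just the Petri-net dynamics, with helper names `can_fire` and `fire` chosen here — that replays the path $2S+I, S+2I, S+I+R, S+2R$:

```python
from collections import Counter

# Token placements are multisets over the places S, I, R; each transition
# is a (source, target) pair of placements, named as in the post.
TRANSITIONS = {
    "tau_i": (Counter({"S": 1, "I": 1}), Counter({"I": 2})),  # infection
    "tau_r": (Counter({"I": 1}), Counter({"R": 1})),          # recovery
    "tau_l": (Counter({"R": 1}), Counter({"S": 1})),          # losing resistance
}

def can_fire(placement, name):
    # a firing is enabled when enough tokens sit at the transition's source
    src, _ = TRANSITIONS[name]
    return all(placement[p] >= k for p, k in src.items())

def fire(placement, name):
    src, tgt = TRANSITIONS[name]
    assert can_fire(placement, name), f"{name} not enabled"
    out = Counter(placement)
    out.subtract(src)
    out.update(tgt)
    return +out  # unary + drops zero counts

# replay the example path 2S+I, S+2I, S+I+R, S+2R
p = Counter({"S": 2, "I": 1})
for t in ["tau_i", "tau_r", "tau_r"]:
    p = fire(p, t)
print(p)  # the final placement S + 2R
```

One can likewise check that $\tau_l$ is not enabled at $2S+I$, matching the discussion above.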

The objects of $\mathsf{Sirs^2}$ are marked paths, marking one token placement of the path. For our example, the marked paths \begin{aligned} &\mathbf{2S+I},S+2I,S+I+R,S+2R\\ &2S+I,\mathbf{S+2I},S+I+R,S+2R\\ &\dots \end{aligned} are all unique objects in $\mathsf{Sirs^2}$. Each object can be interpreted as a path of events that led to and follow from a chosen state of susceptible, infected, and resistant people.

Therefore we might be interested in all objects with the same fixed state, i.e. token placement. The vertical morphisms in $\mathsf{Sirs^2}$ are unique and exist between any two objects with the same marked token placement. For example, if we want to know the possible history and future for the state $2S+I$ of two susceptible people and one infected person, we could look at all vertical morphisms to or from the object $\mathbf{2S+I},S+2I,S+I+R,S+2R$. Our potential vertical morphisms would include the following:

Horizontal morphisms only exist between pairs of objects with the same underlying path. There is exactly one horizontal morphism between each of these pairs, with the direction of the morphisms the same as the path’s direction. Horizontal morphisms are labeled by paths with two placements marked. For example $\mathbf{S+I},2I,\mathbf{I+R},S+I$ refers to the following morphism:

$(\mathbf{S+I},2I,I+R,S+I) \xrightarrow{(\mathbf{S+I},2I,\mathbf{I+R},S+I)} (S+I,2I,\mathbf{I+R},S+I)$

Note that there is no horizontal morphism in the opposite direction even though there is a path from $I+R$ to $S+I$.

Using vertical and horizontal morphisms we get 2-cells of the following form:

In order to make our horizontal composition strictly associative, we have a chosen way to compose a string of horizontal maps which is encoded in the $f$ maps in the center of the diagram. The 2-cell witnesses the existence of alternate paths between two fixed states. In $\mathsf{Sirs^2}$, all 2-cells are invertible.

We can see that $\mathsf{Sirs^2}$ is indeed a weakly globular double category. The weak globularity condition is immediate, since the only vertical morphisms are isomorphisms and every pair of objects has at most one vertical morphism between them. In order to see that the induced Segal maps condition holds, consider an example staircase:

We fill in the top row of the staircase by creating a path of events built from the sequences encoded in the horizontal morphisms.

1 These are pseudofunctors of bicategories which preserve identities strictly.

### Doug Natelson — Annual Nobel speculation thread

Once more, the annual tradition:  Who do people think will win the Nobel this year in physics or chemistry?  I have repeatedly and incorrectly suggested Aharonov and Berry.  There is a lot of speculation on social media about Aspect, Zeilinger, and Clauser for Bell's inequality tests.  Social media speculation has included quantum cascade lasers as well as photonic bandgap/metamaterials. Other suggestions I've seen online have included superconducting qubits (with various combinations of people) and twisted bilayer graphene, though both of those may be a bit early.

## October 01, 2021

### Matt von Hippel — Breaking Out of “Self-Promotion Voice”

What do TED talks and grant applications have in common?

Put a scientist on a stage, and what happens? Some of us panic and mumble. Others are as smooth as a movie star. Most, though, fall back on a well-practiced mode: “self-promotion voice”.

A scientist doing self-promotion voice is easy to recognize. We focus on ourselves, of course (that’s in the name!), talking about all the great things we’ve done. If we have to mention someone else, we make sure to link it in some way: “my colleague”, “my mentor”, “which inspired me to”. All vulnerability is “canned” in one way or another: “challenges we overcame”, light touches on the most sympathetic of issues. Usually, we aren’t negative towards our colleagues either: apart from the occasional very distant enemy, everyone is working with great scientific virtue. If we talk about our past, we tell the same kinds of stories, mentioning our youthful curiosity and deep buzzwordy motivations. Any jokes or references are carefully pruned, made accessible to the lowest-common-denominator. This results in a standard vocabulary: see a metaphor, a quote, or a turn of phrase, and you’re bound to see it in talks again and again and again. Things get even more repetitive when you take into account how often we lean on the voice: a given speech or piece will be assembled from elementary pieces, snippets of practiced self-promotion that we pour in like packing peanuts after a minimal edit, filling all available time and word count.

Packing peanuts may not be glamorous, but they get the job done. A scientist who can’t do “the voice” is going to find life a lot harder, their negativity or clumsiness turning away support when they need it most. Except for the greatest of geniuses, we all have to learn a bit of self-promotion to stay employed.

We don’t have to stop there, though. Self-promotion voice works, but it’s boring and stilted, and it all looks basically the same. If we can do something a bit more authentic then we stand out from the crowd.

I’ve been learning this more and more lately. My blog posts have always run the gamut: some are pure formula, but the ones I’m most proud of have a voice all their own. Over the years, I’ve been pushing my applications in that direction. Each grant and job application has a bit of the standard self-promotion voice pruned away, and a bit of another voice (my own voice?) sneaking in. This year, as I send out applications, I’ve been tweaking things. I almost hope the best jobs come late in the year, my applications will be better then!

### Tommaso Dorigo — Statistics Lectures, And On Ancillarity And The Neyman Construction

The Corfu Summer Institute is a well-established institution for higher education, run since the paleolithic by the inexhaustible George Zoupanos, the soul of the whole thing. Generations of physicists have been trained in doctoral schools and conferences there over the past few decades. Of course the beauty of the island, set in the Ionian sea and green from the top of its mountains to the blue sea below, has helped to keep the events there well-attended, and even during the pandemic the events have been run in person.

## September 29, 2021

### David Hogg — radial velocity with a gas cell

I had a great meeting today with Matt Daunt (NYU), Lily Zhao (Flatiron), and Megan Bedell (Flatiron), in which we described what we need to do to fit a data-driven model to extreme-precision radial-velocity spectra that are taken with a gas cell in the light path. The gas cell imprints lines from known atomic transitions and secures the wavelength calibration of the device. But how do we use this in a data-driven model? We started by talking about line-spread functions and wavelength solutions and ended up asking some deep questions, like: Do you really need to know the wavelength solution in order to measure a change in radial velocity? I am not sure you do!

### David Hogg — Dr Steven Mohammed

Today I had the great pleasure to serve on the PhD defense committee for Steven Mohammed (Columbia). Mohammed worked on the post-GALEX-mission GALEX data in the Milky Way plane. We took the data specially as part of the Caltech/Columbia takeover of the mission at the end of the NASA mission lifetime. Mohammed and my former student Dun Wang built a pipeline to reduce the data and produce catalogs. And Mohammed has done work on photometric metallicities and the abundances in the Milky Way disk using the data. It is a beautiful data set and a beautiful set of results.

### Georg von Hippel — Things nobody ever thought before the pandemic

• "I'm feeling homesick for my office desk."
• "I miss my daily commute."
• "It's good to be able to spend more time away from those I love most."
• "I'm so glad to be able to eat canteen food again."

### Doug Natelson — Science/tech consulting in creative arts

I've watched the first two episodes of the new adaptation of Foundation.  It surely looks gorgeous, though there are some script challenges (even apart from the challenge of trying to adapt an enormous book series that was always long on ideas and short on character development).   The issues I've spotted seem mostly to be ones of poor script editing for consistency.  (The emperor of the Galactic Empire says in the first episode that the imperial population is 8 trillion, and then in the second episode a character says that the core worlds alone have a population of 40 trillion.  The latter number is more reasonable, given the size of Asimov's empire.)

Watching this, I again think it would be great fun to do scientific/technical consulting for TV, movies, and even books. I'm on the list for the Science and Entertainment Exchange, though all I've ever done is give a tiny bit of feedback to a would-be author.  (My expertise probably looks too narrow, and not living in southern California seems to be a major filter.)

It feels like there are some similarities to the role of science in public policy.  In the creative productions, science can contribute (and these media can be a great way of getting scientific ideas out into the public), but in the end plot and what can practically be implemented will always drive the final product.  In policy, science and technical knowledge should definitely factor in when relevant, but fundamentally there are social and political factors that can overwhelm those influences in decision-making.  Now back to our regularly scheduled psychohistorical crisis....

## September 28, 2021

### Tommaso Dorigo — Come And Do Research In Particle Physics With Machine Learning In Padova!

I used some spare research funds to open a six-month internship to help my research group in Padova, and the call is open for applications at this site (the second in the list right now; the number is #23584). So here I wish to answer a few questions from potential applicants, namely:
1) Can I apply?
2) When is the call deadline?
3) What is the salary?
4) What is the purpose of the position? What can I expect to gain from it?

5) What will I be doing if I get selected?

### David Hogg — does the Milky Way bar have distinct chemistry?

The answer to the post title question is NO, it doesn't. Eilers (MIT), Rix (MPIA), and I discussed Eilers's project on this subject today. If you use the APOGEE data naively, you can conclude that the bar has different abundances. But your conclusion is wrong, because the abundances have surface-gravity-related systematic issues, and the Galactic center is only visible in the most luminous (lowest surface-gravity) stars. So we have corrected that and have a paper to write. Even after correcting for this systematic, there are still interesting systematics, and we don't know what they are, yet.

### David Hogg — specification of a possible cosmology project in terms of spherical harmonics

Today Kate Storey-Fisher (NYU) and I asked ourselves: How can we do machine learning on cosmological simulations? If we want to use permutation-invariant methods, we need to use something like graph structure, and you can't currently execute graph neural networks at the scale of millions or billions of particles. So we need permutation-invariant scalars and vectors produced from point clouds. We discussed a set of options in terms of taking spherical-harmonic transforms in shells, and then combining those into scalars and vectors. There's a lot of fun geometry involved! Not sure if we have a plan, yet.

## September 27, 2021

### n-Category CaféShulman's Practical Type Theory for Symmetric Monoidal Categories

guest post by Nuiok Dicaire and Paul Lessard

Many well-known type theories, Martin-Löf dependent type theory or linear type theory for example, were developed as syntactic treatments of particular forms of reasoning: constructive mathematical reasoning in the case of Martin-Löf type theory, and resource-dependent computation in the case of linear type theory. Only after the fact did these type theories turn out to provide convenient means to reason about locally Cartesian closed categories or $\star$-autonomous ones. In this post, a long overdue part of the Applied Category Theory Adjoint School (2020!?), we will discuss a then recent paper by Mike Shulman, A Practical Type Theory for Symmetric Monoidal Categories, in which the author reverses this approach. Shulman starts with symmetric monoidal categories as the intended semantics and then (reverse-)engineers a syntax in which it is practical to reason about such categories.

Which properties of a type theory (for symmetric monoidal categories) make it practical? Shulman, synthesizing various observations, settles on a few basic tenets to guide the invention of the syntax and its interpretation into symmetric monoidal categories. We reduce these further to the conditions that the type theory be: (1) intuitive, (2) concise, and (3) complete.

Intuitive. First, a practical type theory should permit us to leverage our intuition for reasoning about “sets with elements”. What this means in practice is that we require a “natural deduction style” type theory, in which contexts look and feel like choosing elements, and typing judgements look and feel like functions of sets. Moreover, we also require an internal-language/term-model relationship with symmetric monoidal categories which provide correspondences:

(1)\array{ \arrayopts{ \frame{none} \rowalign{bottom} \rowlines{none} } \mathbf{Category} \quad & & \quad \mathbf{Type} \;\mathbf{Theory} \\ \mathrm{objects} \quad & \leftrightarrow & \quad \mathrm{contexts} \\ \mathrm{morphisms} \quad & \leftrightarrow & \quad \mathrm{typing} \; \mathrm{judgments} \\ \mathrm{commuting}\;\mathrm{diagrams} \quad & \leftrightarrow & \quad \mathrm{equality} \; \mathrm{judgments} }

Concise. When stripped of the philosophical premises of what terms, types, judgements, etc. are, the rules of a type theory can be seen to generate its derivable judgments. It is therefore desirable that the translation between symmetric monoidal categories and the rules of their associated type theories proceed by way of the most concise combinatorial description of symmetric monoidal categories possible.

Complete. Lastly we ask that, given a presentation for a symmetric monoidal category, the type theory we get therefrom should be complete. By this, we mean that a proposition should hold in a particular symmetric monoidal category if and only if it is derivable as a judgment in the associated type theory.

## Symmetric Monoidal Categories and Presentations of $\mathsf{PROP}$s

While it is well-known that every symmetric monoidal category is equivalent to a symmetric strict monoidal category, it is probably less well-known that every symmetric strict monoidal category is equivalent to a $\mathsf{PROP}$.

Definition. A $\mathsf{PROP}$, $\mathfrak{P}=\left(\mathfrak{P},\mathbf{P}\right)$, consists of a set $\mathbf{P}$ of generating objects and a symmetric strict monoidal category $\mathfrak{P}$ whose underlying monoid of objects is the free monoid on the set $\mathbf{P}$.

This is not however too hard to see: the equivalence between $\mathsf{PROP}$s and symmetric monoidal categories simply replaces every equality of objects $A\otimes B = C$ with an isomorphism $A \otimes B \overset{\sim}{\longrightarrow} C$, thereby rendering the monoidal product to be free. We will develop what we mean by presentation over the course of a few examples. In doing so we hope to give the reader a better sense for $\mathsf{PROP}$s.

Example. Given a set $\mathbf{X}$, let $\Sigma_{\mathbf{X}}$ be the free permutative category on $\mathbf{X}$. This is a symmetric monoidal category whose monoid of objects is the monoid of lists drawn from the set $\mathbf{X}$ and whose morphisms are given by the expression

(2)$\Sigma_{\mathbf{X}}\left(\overrightarrow{X},\overrightarrow{Y}\right) = \left\{ \sigma \in S_{n} \Big| \overrightarrow{X_\sigma}=\overrightarrow{Y} \right\}$

(where by $\overrightarrow{X_\sigma}$ we mean the reorganization of the list $\overrightarrow{X}$ according to the permutation $\sigma$). For every set $\mathbf{X}$, $\Sigma_{\mathbf{X}}$ is a $\mathsf{PROP}$.
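Since hom-sets in $\Sigma_{\mathbf{X}}$ are just sets of permutations, expression (2) can be checked by brute force for small lists. A quick Python sketch — the indexing convention $\overrightarrow{X_\sigma}[i] = \overrightarrow{X}[\sigma(i)]$ and the name `hom` are choices made here:

```python
from itertools import permutations

def hom(X, Y):
    # morphisms X -> Y in the free permutative category: all permutations
    # sigma whose reordering of the list X equals Y
    if len(X) != len(Y):
        return []
    return [s for s in permutations(range(len(X)))
            if [X[i] for i in s] == list(Y)]

# the two A's can be swapped, so there are 2 morphisms here
print(len(hom(["A", "B", "A"], ["A", "A", "B"])))  # 2
# no morphisms when Y is not a reordering of X
print(len(hom(["A", "B"], ["B", "B"])))            # 0
```

In general the size of a nonempty hom-set is the product of factorials of the multiplicities of the repeated labels.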

Example. For a more complicated example, let $\mathbf{X}_0$ be a set and let

(3)$\mathbf{X}_1 = \left\{ (f_i,\overrightarrow{X}_i,\overrightarrow{Y}_i) \right\}_{i\in I}$

be a set of triples comprised of names $f_i$ and pairs of lists $\left(\overrightarrow{X}_i,\overrightarrow{Y}_i\right)$ valued in $\mathbf{X}_0$. Let $F\left(\mathbf{X}_1,\mathbf{X}_0\right)$ denote the free symmetric monoidal category generated by $\Sigma_{\mathbf{X}_0}$ together with additional arrows $f_{i}:\overrightarrow{X}_i \longrightarrow \overrightarrow{Y}_i$ for each $i\in I$. Then $F\left(\mathbf{X}_1,\mathbf{X}_0\right)$ is also a $\mathsf{PROP}$.

Importantly, this second example is very nearly general - every $\mathsf{PROP}$ admits a bijective-on-objects and full, but not in general faithful, functor from some $\mathsf{PROP}$ of the form $F\left(\mathbf{X}_1,\mathbf{X}_0\right)$.

Example. Let $\mathbf{X}_0$ and $\mathbf{X}_1$ be as in the previous example and let

(4)$\mathbf{R}=\Big\{ (s_j,t_j) \in \mathsf{Mor}\big(F\left(\mathbf{X}_1,\mathbf{X}_0\right)\big)^2 \Big| j \in J \Big\}$

be a set of pairs of morphisms in the $\mathsf{PROP}$ $F\left(\mathbf{X}_1,\mathbf{X}_0\right)$. Let $F\left(\mathbf{R},\mathbf{X}_1,\mathbf{X}_0\right)$ be the quotient of the symmetric monoidal category $F\left(\mathbf{X}_1,\mathbf{X}_0\right)$ by the congruence generated by $R \subset \mathsf{Mor}\big(F(\mathbf{X}_1,\mathbf{X}_0)\big) \times \mathsf{Mor}\big(F\left(\mathbf{X}_1,\mathbf{X}_0\right)\big)$.

This last example is fully general. Every $\mathsf{PROP}$, hence every symmetric monoidal category, is equivalent to a $\mathsf{PROP}$ of the form $F\left(\mathbf{R},\mathbf{X}_1,\mathbf{X}_0\right)$.

Remark. If (generalized) computads are familiar, you may notice that our three examples here are free $\mathsf{PROP}$s on $0$, $1$, and $2$-computads.

## From Presentations to Type Theories

It is from these presentations $\left(\mathbf{R},\mathbf{X}_1,\mathbf{X}_0\right)$ of $\mathsf{PROP}$s that we will build type theories $\mathsf{PTT}_{\left(\mathbf{R},\mathbf{X}_1,\mathbf{X}_0\right)}$ for the symmetric monoidal categories $F\left(\mathbf{R},\mathbf{X}_1,\mathbf{X}_0\right)$. Indeed, reading $\rightsquigarrow$ as “gives rise to”, we will see:

(5)\array{ \arrayopts{ \frame{none} \rowalign{bottom} \rowlines{none} } \mathbf{Generators} \quad & & \quad \mathbf{Judgements} \\ \mathbf{X}_0 \quad & \rightsquigarrow & \quad \mathrm{contexts} \\ \mathbf{X}_1 \quad & \rightsquigarrow & \quad \mathrm{typing} \; \mathrm{judgments} \\ \mathbf{R} \quad & \rightsquigarrow & \quad \mathrm{equality} \; \mathrm{judgment} }

### Contexts

The contexts (usually denoted $\Gamma,\;\Delta$, etc.) of the type theory $\mathsf{PTT}_{(\mathbf{R},\mathbf{X}_1,\mathbf{X}_0)}$ are lists

(6)$x_{1}:A_{1},\dots,x_{n}:A_{n}$

of typed variables, where the $A_i$ are elements of the set $\mathbf{X}_{0}$. It’s not hard to see that, up to the names of variables, contexts are in bijection with $\mathsf{List}\left(\mathbf{X}_0\right)$.

### Typing Judgments

As promised, typing judgments correspond to morphisms in $F\left(\mathbf{R},\mathbf{X}_1,\mathbf{X}_0\right)$. Here are some examples of morphisms and their corresponding judgements:

(7)\array{ \arrayopts{ \frame{none} \rowalign{bottom} \rowlines{none} \colalign{right left} } \mathbf{Morphisms} \quad & & \quad \mathbf{Typing} \;\mathbf{Judgements} \\ f:(A) \rightarrow (B) \quad & \leftrightarrow & \quad x:A\vdash f(x):B \\ h:(A)\rightarrow (B_{1},B_{2}) \quad & \leftrightarrow & \quad x:A\vdash\left(h_{(1)}(x),h_{(2)}(x)\right):\left(B_{1},B_{2}\right) \\ z:() \rightarrow () \quad & \leftrightarrow & \quad \vdash \left(|z^{a}\right):() }

The rules from which these judgments may be derived correspond, roughly speaking, to applying a tensor product of generating morphisms in $\mathbf{X}_1$ composed with a braiding - something along the lines of:

(8)\array {\arrayopts{ \frame{none} \rowalign{top} \rowlines{none solid} \colalign{center} } \Gamma \vdash \left( \overrightarrow{m}, \dots ,\overrightarrow{n} | \overrightarrow{z}\right) : \left( \overrightarrow{A}, \dots, \overrightarrow{B}\right) \\ \left(f:\overrightarrow{A} \rightarrow \overrightarrow{F}\right) \in \mathbf{X}_1 \cdots \; \left(g:\overrightarrow{B} \rightarrow \overrightarrow{G}\right) \in \mathbf{X}_1 \quad \left( \sigma : \left(\overrightarrow{F},\dots, \overrightarrow{G}\right) \rightarrow \Delta \right) \in \Sigma_{\mathbf{X}_0} \\ \Gamma \vdash \left( \sigma \left( \overrightarrow{f}\left(\overrightarrow{m}\right),\dots,\overrightarrow{g}\left( \overrightarrow{n}\right) \right) \right) : \Delta }

in which:

• $\Gamma$ is a context, and $\Delta$ is a (list of) type(s);
• $\overrightarrow{m}, \dots ,\overrightarrow{n}$ are terms of types $\overrightarrow{A}, \dots , \overrightarrow{B}$ respectively;
• the $f$ through $g$ are generating arrows in the set $\mathbf{X}_1$; and
• $\sigma$ is the avatar in $\Sigma_{\mathbf{X}_0}$ of some permutation.

Now, it is rather clear that we are hiding something - the gory details of the rules from which the typing judgments are derived (the so-called identity and generator rules in Shulman’s paper). However, the reader should rest assured that not only are the details not that onerous, but the more naive structural rules corresponding to tensoring, composition, and braiding are admissible. In practical terms, this means that one is simply free to work with these more intuitive rules.

### Equality Judgments

Equality judgments, for example something of the form $x:A\vdash h(x)=f\big(g(x)\big):C$, which in categorical terms corresponds to a commuting diagram

are similarly derived from rules coming from the set $\mathbf{R}$. We will be more explicit later on within the context of an example.

## How do Symmetric Monoidal Categories fit into all this?

Until this point it is not actually terribly clear what makes this type theory specifically adapted to symmetric monoidal categories as opposed to Cartesian monoidal categories. After all, we have written contexts as lists precisely as we would write contexts in a Cartesian type theory.

Although we may not always think about it, we write contexts in Cartesian type theories as lists of typed variables because of the universal property of the product - a universal property characterized in terms of projection maps. Indeed, in a Cartesian type theory, a list of typed terms can be recovered from the list of their projections. This is not the case for arbitrary symmetric monoidal products; in general there are no projection maps. And, even when there are maps with the correct domain and codomain, there is no guarantee that they have a similar computation rule (the type theoretic avatar of a universal property).

To illustrate this in a mathematical context, consider a pair of vector spaces $U_1$ and $U_2$. Any element $z\in U_1 \times U_2$ of the Cartesian product of $U_1$ and $U_2$ is uniquely specified by the pair of elements $\mathsf{pr}_{1}(z) \in U_1$ and $\mathsf{pr}_{2}(z)\in U_2$ - $z=\left(\mathsf{pr}_1(z),\mathsf{pr}_2(z)\right)$. However, elements of the tensor product $U_1 \otimes U_2$ need not be simple tensors of the form $x \otimes y$ and instead are generally linear combinations $\sum_{i=1}^{k}x_{1,i} \otimes x_{2,i}$ of simple tensors.
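This failure of decomposability is easy to check numerically. Here is a minimal NumPy sketch (purely illustrative, not part of the development above): identifying $U_1 \otimes U_2$ with $2\times 2$ matrices when $U_1 = U_2 = \mathbb{R}^2$, a simple tensor $x\otimes y$ becomes the outer product of $x$ and $y$, which has rank at most one, while the sum $\mathbf{e}_1 \otimes \mathbf{e}_1 + \mathbf{e}_2 \otimes \mathbf{e}_2$ has rank two and hence is not simple.

```python
import numpy as np

# Identify U1 ⊗ U2 (with U1 = U2 = R^2) with 2x2 matrices:
# the simple tensor x ⊗ y corresponds to np.outer(x, y),
# which always has rank at most 1.
x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
simple = np.outer(x, y)

# e1 ⊗ e1 + e2 ⊗ e2 is the identity matrix: rank 2, so it is
# a sum of simple tensors that is not itself a simple tensor.
e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])
not_simple = np.outer(e1, e1) + np.outer(e2, e2)

print(np.linalg.matrix_rank(simple))      # 1
print(np.linalg.matrix_rank(not_simple))  # 2
```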

Provided we are careful however - meaning we do not perform any “illegal moves” in a sense we will make clear - we can still use lists. This is what Shulman does in adapting Sweedler’s notation. Consider a general element

(9)$\sum_{i=1}^{k}x_{1,i}\otimes x_{2,i} \in U_1 \otimes U_2$

In Sweedler’s notation this is written as $\left(x_{(1)}, x_{(2)}\right)$ with the summation, indices, and tensor symbols all dropped. Provided we make sure that any expression in which $\left(x_{(1)}, x_{(2)}\right)$ appears is linear in both variables, this seeming recklessness introduces no errors.

To see how this plays out, suppose that we have a morphism $f$ with domain $A$ and codomain the tensor product $\left(B_1,\dots,B_n\right)$. In Shulman’s adaptation of Sweedler’s notation this corresponds to the judgment:

(10)$a:A \vdash \left( f_{\left(1\right)} \left( a \right), \dots,f_{\left(n\right)} \left( a \right) \right) : \left( B_{1},\dots,B_{n} \right)$

Importantly, our typing rules will never derive the judgment $a:A\vdash f_{(k)}:B_k$, and we will only be able to derive $a:A \vdash \mathsf{pr}_k \left(f_{\left(1\right)}\left(a\right),\dots,f_{\left(n\right)}\left(a\right)\right):B_k$ if we have $\mathsf{pr}_k:\left(B_1,\dots,B_n\right) \rightarrow B_k$ in $\mathbf{X}_1$. Moreover, unless $\mathbf{R}$ specifies that $\mathsf{pr}_k$ acts like a projection, $\mathsf{pr}_k$ is but a name.

Since this type theory allows us to pretend, in a sense, that types are “sets with elements”, the symmetric monoidal aspect of the type theory can be moralized as:

• terms of product types are not-necessarily-decomposable tuples;

whereas in a Cartesian “sets with elements” style type theory:

• terms of product types are decomposable tuples.

Remark. Taking a hint from the bicategorical playbook can give us a more geometric picture of the difference between an indecomposable tuple and a decomposable one. Since every symmetric monoidal category is also a bicategory with a single object, a typed term, usually a $1$-cell, also corresponds to $2$-cells between (directed) loops. In this vein we may think of an indecomposable tuple $\left(x_{(1)},x_{(2)},\dots,x_{(n)}\right):\overrightarrow{A}$ as an $n$-sphere in $\overrightarrow{A}$ whereas a decomposable one, say $(x,y,\dots,z):\overrightarrow{A}$, corresponds to a torus (of some dimension) in $\overrightarrow{A}$.

## Example: The Free Dual Pair

Having sketched the basics of Shulman’s $\mathsf{PTT}$ and how a presentation for a $\mathsf{PROP}$ specifies the rules of $\mathsf{PTT}$ for that $\mathsf{PROP}$, we may now attend to an important and illuminating example. We will develop the practical type theory of the free dual pair.

Recall that a dual pair in a symmetric monoidal category abstracts the relationship

(11)$\mathsf{Hom}(A\otimes V,B)\overset{\sim}{\longrightarrow} \mathsf{Hom}(A,V^{\star}\otimes B)$

between a vector space $V$ and its dual, $V^{\star}$.

Definition. A dual pair $(D,D^{\star},\mathsf{coev},\mathsf{ev})$ in a symmetric monoidal category $\big(\mathfrak{C},(- ,-),()\big)$ is comprised of:

• a pair of objects $D$ and $D^{\star}$ of $\mathfrak{C}$;
• a morphism $\mathsf{coev}:\mathbf{1}\longrightarrow D\otimes D^{\star}$; and
• a morphism $\mathsf{ev}:D^{\star}\otimes D\longrightarrow\mathbf{1}$

satisfying the triangle identities $\left(D\otimes\mathsf{ev}\right) \circ \left(\mathsf{coev}\otimes D\right) = \mathrm{id}_{D}$ and $\left(\mathsf{ev}\otimes D^{\star}\right)\circ\left(D^{\star}\otimes\mathsf{coev}\right) = \mathrm{id}_{D^{\star}}$.

This data suggests a presentation $(\mathbf{R},\mathbf{X}_1,\mathbf{X}_0)$ for a $\mathsf{PROP}$, in particular the free dual pair. We set:

• $\mathbf{X}_0 = \{D,D^\star\}$;
• $\mathbf{X}_1 = \{ \mathsf{coev}:() \longrightarrow (D,D^{\star}), \mathsf{ev}:(D^{\star},D)\longrightarrow () \}$; and
• $\mathbf{R} = \Big\{ \big( (D,\mathsf{ev}) \circ (\mathsf{coev},D) , \mathrm{id}_{D} \big), \ \big( (\mathsf{ev},D^{\star})\circ(D^{\star},\mathsf{coev}) , \mathrm{id}_{D^{\star}} \big) \Big\}$

### The type theory of the free dual pair

We will now develop the rules of the type theory for the free dual pair from the data of this presentation $\left(\mathbf{R},\mathbf{X}_1,\mathbf{X}_0\right)$. While we have not bothered with them until now, Shulman’s practical type theory does include rules for term formation. In the case of the practical type theory for the free dual pair, the rules for the term judgment are few. We present a streamlined version of them:

(12)\array {\arrayopts{ \frame{none} \rowalign{top} \rowlines{solid} \colalign{center} } (x:A) \in\Gamma \\ \Gamma\vdash x \; \mathsf{term} } \qquad \array {\arrayopts{ \frame{none} \rowalign{top} \rowlines{solid} \colalign{center} } 1\leq k \leq 2 \\ \mathsf{coev}_{(k)}\; \mathsf{term} } \qquad \array {\arrayopts{ \frame{none} \rowalign{top} \rowlines{solid} \colalign{center} } \Gamma\vdash m \;\mathsf{term} \quad \Gamma\vdash n \;\mathsf{term}\\ \Gamma\vdash\mathsf{ev}(m,n)\; \mathsf{term} }

The first rule is the usual variable rule which gives us the terms in which we may derive the typing judgments corresponding to identities. The second one gives us terms in which we may derive co-evaluation as a typing judgment. Finally, the last one gives us the terms in which we may derive evaluation as a typing judgment.

For the typing judgments, we have two (again slightly streamlined) rules:

• the so-called generator rule
(13)\array {\arrayopts{ \frame{none} \rowalign{top} \rowlines{none none solid} \colalign{center} } \Gamma \vdash \left( \overrightarrow{p},\dots,\overrightarrow{q},\overrightarrow{r}\left|\overrightarrow{z}\right. \right) : \left( \overrightarrow{C},\dots,\overrightarrow{D},\overrightarrow{E} \right)\\ h:\overrightarrow{C} \rightarrow () \in \mathbf{X}_1 \quad \dots \quad k:\overrightarrow{D} \rightarrow () \in \mathbf{X}_1\\ \sigma : \left(\overrightarrow{F},\dots,\overrightarrow{G},\overrightarrow{E}\right) \overset{\sim}{\longrightarrow} \bigtriangleup\in \Sigma_{\mathbf{X}_0} \quad \tau \in \mathsf{Shuffle} \left( h,\dots,k; \overrightarrow{z} \right)\\ \Gamma \vdash \left( \left. \sigma \left( \overrightarrow{f} \left( \overrightarrow{m} \right) , \dots,\overrightarrow{g} \left( \overrightarrow{n} \right) , \overrightarrow{r} \right) \right| \tau \left( h \left( \overrightarrow{p} \right) ,\dots, k \left( \overrightarrow{q} \right) , \overrightarrow{z} \right) \right) : \bigtriangleup }
corresponds to applying a tensor product followed by a braiding $\sigma$ and a shuffling of scalars. The tensor product is taken between generating scalar-valued functions $h,\dots,k$ (which can only be some number of copies of $\mathsf{ev}$ in our case) and the identity on some object, here $\overrightarrow{E}$. And,
• the so-called identity rule
(14)\array {\arrayopts{ \frame{none} \rowalign{top} \rowlines{none solid} \colalign{center} } f:() \rightarrow \overrightarrow{B} \in \mathbf{X}_1 \quad \dots \quad g:() \rightarrow \overrightarrow{C} \in \mathbf{X}_1 \\ \sigma:\left( \overrightarrow{A}, \overrightarrow{B}, \dots, \overrightarrow{C} \right) \overset{\sim}{\longrightarrow} \bigtriangleup \in \Sigma_{\mathbf{X}_0} \\ \overrightarrow{x}:\overrightarrow{A} \vdash \left( \left. \sigma \left( \overrightarrow{x},\overrightarrow{f},\dots,\overrightarrow{g} \right) \right| h,\dots,k \right):\bigtriangleup}
which corresponds to applying a braiding $\sigma$ to the tensoring of some number of generating constants, here $f,\dots,g$ (which can only be some number of copies of $\mathsf{coev}$), and the identity on some object $\overrightarrow{A}$.

Remark. Here we are ignoring the consideration of “activeness”, a technical device introduced to rigidify the type theory into one with unique derivations of judgments by providing a canonical form for 1-cells. We also ignored the syntactic sugar of labels, which act like formal variables for the unit type.

Lastly, for the equality judgment we have axioms

(15)\array {\arrayopts{ \frame{none} \rowalign{top} \rowlines{solid} \colalign{center} } \left( (D,\mathsf{ev}) \circ \left( \mathsf{coev},D \right) , \mathrm{id}_{D} \right) \in \mathbf{R} \\ m:D \vdash \left(\mathsf{coev}_{(1)} \left| \mathsf{ev}\left(\mathsf{coev}_{(2)},m\right) \right.\right) = m:D }

and

(16)\array {\arrayopts{ \frame{none} \rowalign{top} \rowlines{solid} \colalign{center} } \left( ( \mathsf{ev} , D^{\star}) \circ \left( D^{\star},\mathsf{coev} \right) , \mathrm{id}_{D^{\star}} \right) \in \mathbf{R} \\ n:D^{\star} \vdash \left( \mathsf{coev}_{(2)} \; \left| \; \mathsf{ev} \left( n,\mathsf{coev}_{(1)} \right) \right. \right) = n:D^{\star} }

together with enough rules so that $=$ acts as a congruence relative to all of the other rules.

Example. Consider the composition

(17)$(D)\xrightarrow{\left(\mathsf{coev},D\right)} \left(D,D^{\star},D\right) \xrightarrow{\left(D,\mathsf{ev}\right)} (D)$

which corresponds to the typing judgment

(18)$x:D\vdash \left(\mathsf{coev}_{(1)}|\mathsf{ev} \left(\mathsf{coev}_{(2)},x\right)\right):D$

This typing judgment admits the following derivation

(19)\array{ \arrayopts{ \frame{none} \rowalign{bottom} \rowlines{solid} } \array {\arrayopts{ \frame{none} \rowalign{top} \rowlines{solid} \colalign{center} } \mathsf{coev}:() \rightarrow (D,D^{\star}) \in \mathbf{X}_1 & (132): \left( D,D,D^{\star} \right) \overset{\sim}{\longrightarrow} \left( D,D^{\star},D \right) \\ \cellopts{\colspan{2}} x:D\vdash \left( \mathsf{coev}_{(1)},\mathsf{coev}_{(2)},x \right):\left( D,D^{\star},D \right) } & \mathsf{ev}: \left(D^{\star},D\right) \rightarrow () \in \mathbf{X}_1 \\ \cellopts{\colspan{2}} x:D\vdash \left( \left. \mathsf{coev}_{(1)} \right| \mathsf{ev} \left( \mathsf{coev}_{(2)},x \right) \right):D }

where:

• the first rule is an instance of the identity rule with premised generators $\mathsf{coev}$ and the braiding $(132)$; and
• the second rule is an instance of the generator rule, with the typing judgment $x:D\vdash \left(\mathsf{coev}_{(1)},\mathsf{coev}_{(2)},x\right):\left( D,D^{\star},D \right)$ (the consequent of the first rule) and the generator $\mathsf{ev}$ as its hypotheses.

### “Elements” of dual objects

We have now developed a type theory for the free dual pair which endows the dual objects $D$ and $D^{\star}$ with a universal notion of element (that of $D$-typed term and $D^{\star}$-typed term respectively). Since the notion of dual pair abstracts the example of a pair of dual vector spaces, which in particular have actual elements, it behooves us to ask:

“to what extent is a term of type $D$ like a vector (in $D$)?”

The answer is both practical and electrifying (though perhaps the authors of this post are too easily electrified).

It’s easy enough to believe that the evaluation map

(20)$\mathsf{ev}:\left(D^{\star},D\right)\longrightarrow ( )$

endows the terms of type $D$, or $D^{\star}$ for that matter, with the structure of scalar-valued functions on the other. The triangle identities ensure that terms of type $D$ or $D^{\star}$ are uniquely determined by the values $\mathsf{ev}$ assigns them. Indeed, consider that for a finite-dimensional vector space $V$ over a field $k$, a basis $\{ \mathbf{e}_{i}\}_{i=1}^{n}$ for $V$ and a dual basis $\{ \mathbf{e}_{i}^{\star}\}_{i=1}^{n}$ for $V^{\star}$ give us an elegant way to write $\mathsf{coev}$ and the first triangle identity. We write

(21)\begin{aligned} k & \overset{\mathsf{coev}}{\rightarrow} V\otimes V^{\star}\\ x & \mapsto x\sum_{i=1}^{n}\mathbf{e}_{i}\otimes\mathbf{e}_{i}^{\star} \end{aligned}

and see that the composite $\left(V\otimes\mathsf{ev}\right)\circ\left(\mathsf{coev}\otimes V\right)$ sends a vector $\mathbf{v}$ to $\sum_{i=1}^{n}\mathbf{e}_{i}\cdot\mathbf{e}_{i}^{\star}(\mathbf{v})=\mathbf{v}$, which is exactly the first triangle identity.

The observation for dual vector spaces

(22)$V \quad \text{and} \quad V^{\star}=\mathsf{Hom}_{\mathsf{Vect}_k}(V,k )$

that the triangle identities hold is just the observation that a vector is precisely determined by its values: every vector $\mathbf{v}$ is equal to the un-named vector

(23)$\sum_{i=1}^{n}\mathbf{e}_i^{\star}( \mathbf{v})\cdot \mathbf{e}_i$

which is defined by taking the values $\mathbf{e}^\star_{i}( \mathbf{v} )$ at the dual vectors $\mathbf{e}_i^{\star}$. As part of the definition of a dual pair in an arbitrary symmetric strict monoidal category then, the triangle identities impose this as a relationship between $\mathsf{ev}$ and $\mathsf{coev}$. But within type theory, this sort of relationship between an un-named function and its values is familiar: indeed, it is something very much like $\beta$-reduction.
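The statement that a vector is precisely determined by its values against a dual basis can also be checked mechanically. A small NumPy sketch (the particular basis below is an arbitrary illustrative choice):

```python
import numpy as np

# An arbitrary (non-orthonormal) basis e_1, e_2 for R^2, as columns.
E = np.column_stack([[1.0, 1.0], [0.0, 1.0]])
# The rows of E^{-1} are the dual basis functionals: e*_i(e_j) = delta_ij.
E_dual = np.linalg.inv(E)

v = np.array([3.0, -2.0])
values = E_dual @ v           # the numbers e*_i(v)
reconstructed = E @ values    # sum_i e*_i(v) * e_i
print(np.allclose(reconstructed, v))  # True
```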

To see this more clearly, let us make a pair of notational changes. In place of writing

(24)$(x,y) : \left( D^{\star},D \right) \vdash \mathsf{ev}(x,y):()$

we will denote $\mathsf{ev}$ infix by $\triangleleft$ and write

(25)$(x,y) : \left( D^{\star},D \right) \vdash x\triangleleft y:()$

Similarly, in place of writing

(26)$\vdash \left( \mathsf{coev}_{(1)},\mathsf{coev}_{(2)} \right):\left(D,D^{\star}\right)$

we will denote $\mathsf{coev}$ by the pair $\left(u,\lambda^{D}u\right)$ and write

(27)$\vdash\left(u,\lambda^{D}u\right):\left(D,D^{\star}\right)$

With this choice of notation then, the equality judgment which corresponds to the first triangle identity is

(28)$x:D\vdash\left( u \; \left| \; \lambda^{D}u\triangleleft x\right.\right)=x:D$

Then, since $=$ is a congruence with respect to substitution (recall that we promised that the rules for the equality judgment were precisely those required for generating a congruence from a set of primitive relations), if we have, for some term $m$, the term $\lambda^{D}u\triangleleft m$ appearing in the scalars of a list of terms, then we may replace all instances of $u$ in the rest of the term with $m$. While a mouthful, this is a sort of “$\beta$-reduction for duality” allowing us to treat $\mathsf{coev}$ as a sort of $\lambda$-abstraction, as we suggested by the change in notation. Conceptually interesting in its own right, this observation also yields a one line proof for a familiar theorem, namely the cyclicity of the trace.

Example. Consider two dual pairs

(29)\array {\arrayopts{ \frame{none} \rowalign{top} \rowlines{none} } \left(A,A^{*},\left(u,\lambda^{A}u\right),-\triangleleft -\right) &\qquad \qquad& \left(B,B^{*},\left(v,\lambda^{B}v\right),- \triangleleft -\right) }

in a symmetric strict monoidal category $\big(\mathfrak{C},(-, -),()\big)$ and a pair of maps $f:A\longrightarrow B$ and $g:B\longrightarrow A$. The trace $\mathsf{tr}\left(f\circ g\right)$ is the composition

(30)$()\overset{(v,\lambda^{B}v)}{\longrightarrow} \left(B,B^{\star}\right)\overset{f\circ g}{\longrightarrow} \left(B,B^{\star}\right)\overset{(12)}{\longrightarrow} \left(B^{\star},B\right)\overset{ - \triangleleft -}{\longrightarrow} ()$

where $(12)$ denotes the permutation of entries of the list. Likewise the trace $\mathsf{tr}\left(g\circ f\right)$ is the composition.

(31)$()\overset{(u,\lambda^{A}u)}{\longrightarrow} \left(A,A^{\star}\right)\overset{g\circ f}{\longrightarrow} \left(A,A^{\star}\right)\overset{(12)}{\longrightarrow} \left(A^{\star},A\right)\overset{ - \triangleleft -}{\longrightarrow} ()$

Using the notation introduced above it follows that $\mathsf{tr}\left(f\circ g\right)=\mathsf{tr}\left(g\circ f\right)$ as:

(32)\array {\arrayopts{ \frame{none} \rowalign{top} \rowlines{none} \colalign{right left left } } \mathsf{tr}(f\circ g) & \overset{\mathsf{def}}{=} & \left( \left| \lambda^{B}v\triangleleft f\big( g(v) \big) \right. \right) \\ & = & \left( \left| \lambda^{B}v\triangleleft f(u),\lambda^{A}u\triangleleft g(v) \right. \right) \\ & = & \left( \left| \lambda^{A}u\triangleleft g\big(f(u)\big) \right. \right) \\ & \overset{\mathsf{def}}{=} & \mathsf{tr}(g\circ f) }

where the two judged equalities in the middle are applications of “$\beta$-reduction for duality”.
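In the motivating case $\mathfrak{C} = \mathsf{Vect}_k$, the conclusion $\mathsf{tr}(f\circ g)=\mathsf{tr}(g\circ f)$ is the familiar cyclicity of the matrix trace, which a few lines of NumPy confirm (the matrix sizes here are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
# f : A -> B and g : B -> A as matrices, with A = R^3 and B = R^5.
f = rng.standard_normal((5, 3))
g = rng.standard_normal((3, 5))

# tr(f ∘ g) is a trace over B, tr(g ∘ f) a trace over A;
# cyclicity says the two scalars agree.
print(np.isclose(np.trace(f @ g), np.trace(g @ f)))  # True
```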

### Terence Tao — More analysis questions from a high school student

A few months ago I posted a question about analytic functions that I received from a bright high school student, which turned out to be studied and resolved by de Bruijn. Based on this positive resolution, I thought I might try my luck again and list three further questions that this student asked which do not seem to be trivially resolvable.

1. Does there exist a smooth function ${f: {\bf R} \rightarrow {\bf R}}$ which is nowhere analytic, but is such that the Taylor series ${\sum_{n=0}^\infty \frac{f^{(n)}(x_0)}{n!} (x-x_0)^n}$ converges for every ${x, x_0 \in {\bf R}}$? (Of course, this series would not then converge to ${f}$, but instead to some analytic function ${f_{x_0}(x)}$ for each ${x_0}$.) I have a vague feeling that perhaps the Baire category theorem should be able to resolve this question, but it seems to require a bit of effort. (Update: answered by Alexander Shaposhnikov in comments.)
2. Is there a function ${f: {\bf R} \rightarrow {\bf R}}$ which meets every polynomial ${P: {\bf R} \rightarrow {\bf R}}$ to infinite order in the following sense: for every polynomial ${P}$, there exists ${x_0}$ such that ${f^{(n)}(x_0) = P^{(n)}(x_0)}$ for all ${n=0,1,2,\dots}$? Such a function would be rather pathological, perhaps resembling a space-filling curve. (Update: solved for smooth ${f}$ by Aleksei Kulikov in comments. The situation currently remains unclear in the general case.)
3. Is there a power series ${\sum_{n=0}^\infty a_n x^n}$ that diverges everywhere (except at ${x=0}$), but which becomes pointwise convergent after dividing each of the monomials ${a_n x^n}$ into pieces ${a_n x^n = \sum_{j=1}^\infty a_{n,j} x^n}$ for some ${a_{n,j}}$ summing absolutely to ${a_n}$, and then rearranging, i.e., there is some rearrangement ${\sum_{m=1}^\infty a_{n_m, j_m} x^{n_m}}$ of ${\sum_{n=0}^\infty \sum_{j=1}^\infty a_{n,j} x^n}$ that is pointwise convergent for every ${x}$? (Update: solved by Jacob Manaker in comments.)

Feel free to post answers or other thoughts on these questions in the comments.

## September 26, 2021

### Scott Aaronson — “Is China Ahead in the Quantum Computing Race?”

Please enjoy an hourlong panel discussion of that question on YouTube, featuring yours truly, my former MIT colleague Will Oliver, and political scientist and China scholar Elsa Kania. If you’re worried that the title sounds too sensationalistic, I hope my fellow panelists and I will pleasantly surprise you with our relative sobriety! Thanks so much to QC Ware for arranging the panel (full disclosure: I’m QC Ware’s scientific adviser).

## September 24, 2021

### Scott Aaronson — Was Scientific American Sokal’d?

Here’s yesterday’s clickbait offering from Scientific American, the once-legendary home of Martin Gardner’s Mathematical Games column:

Why the Term ‘JEDI’ Is Problematic for Describing Programs That Promote Justice, Equity, Diversity and Inclusion

The sad thing is, I see few signs that this essay was meant as a Sokal-style parody, although in many ways it’s written as one. The essay actually develops a 100% cogent, reasoned argument: namely, that the ideology of the Star Wars films doesn’t easily fit with the new ideology of militant egalitarianism at the expense of all other human values, including irony, humor, joy, and the nurturing of unusual talents. The authors are merely oblivious to the conclusion that most people would draw from their argument: namely, so much the worse for the militant egalitarianism then!

I predict that this proposal—to send the acronym “JEDI” the way of “mankind,” “blacklist,” and, err, “quantum supremacy”—will meet with opposition even from the wokeists themselves, a huge fraction of whom (in my experience) have soft spots for the Star Wars franchise. Recall for example that in 2014, Laurie Penny used Star Wars metaphors in her interesting response to my comment-171, telling male nerds like me that we need to learn to accept that “[we’re] not the Rebel Alliance, [we’re] actually part of the Empire and have been all along.” Admittedly, I’ve never felt like part of an Empire, although I’ll confess to some labored breathing lately when ascending flights of stairs.

As for me, I spent much of my life opposed in principle to Star Wars—I hated how the most successful “science fiction” franchise of all time became that way precisely by ditching any pretense of science and fully embracing mystical woo—but sure, when the chips are down, I’m crazy and radical enough to take the side of Luke Skywalker, even if a team of woke theorists is earnestly, unironically explaining to me that lightsabers are phallocentric and that Vader ranks higher on the intersectional oppression axis because of his breathing problem.

Meantime, of course, the US continues to careen toward its worst Constitutional crisis since the Civil War, as Trump prepares to run again in 2024, and as this time around, the Republicans are systematically purging state governments of their last Brad Raffenspergers, of anyone who might stand in the way of them simply setting aside the vote totals and declaring Trump the winner regardless of the actual outcome. It’s good to know that my fellow progressives have their eyes on the ball—so that when that happens, at least universities will no longer be using offensive acronyms like “JEDI”!

### Matt von Hippel — Why Can’t I Pay Academics to Do Things for Me?

A couple weeks back someone linked to this blog with a problem. A non-academic, he had done some mathematical work but didn’t feel it was ready to publish. He reached out to a nearby math department and asked what they would charge to help him clean up the work. If the price was reasonable, he’d do it; if not, at least he’d know what it would cost.

Neither happened. He got no response, and got more and more frustrated.

For many of you, that result isn’t a big surprise. My academic readers are probably cringing at the thought of getting an email like that. But the guy’s instinct here isn’t too off-base. Certainly, in many industries that kind of email would get a response with an actual quote. Academia happens to be different, in a way that makes the general rule not really apply.

There’s a community called the Effective Altruists who evaluate charities. They have a saying, “Money is the Unit of Caring”. The point of the saying isn’t that people with more money care more, or anything like that. Rather, it’s a reminder that, whatever a charity wants to accomplish, more money makes it easier. A lawyer could work an hour in a soup kitchen, but if they donated the proceeds of an hour’s work the soup kitchen could hire four workers instead. Food banks would rather receive money than food, because the money lets them buy whatever they need in bulk. As the Simpsons meme says, “money can be exchanged for goods and services”.

If you pay a charity, or a business, it helps them achieve what they want to do. If you pay an academic, it gets a bit more complicated.

The problem is that for academics, time matters a lot more than our bank accounts. If we want to settle down with a stable job, we need to spend our time doing things that look good on job applications: writing papers, teaching students, and so on. The rest of the time gets spent resting so we have the energy to do all of that.

(What about tenured professors? They don’t have to fight for their own jobs…but by that point, they’ve gotten to know their students and other young people in their sub-field. They want them to get jobs too!)

Money can certainly help with those goals, but not personal money: grant money. With grant money we can hire students and postdocs to do some of that work for us, or pay our own salary so we’re easier for a university to hire. We can buy equipment for those who need that sort of thing, and get to do more interesting science. Rather than “Money is the Unit of Caring”, for academics, “Grant Money is the Unit of Caring”.

Personal money, in contrast, just matters for our rest time. And unless we have expensive tastes, we usually get paid enough for that.

(The exception is for extremely underpaid academics, like PhD students and adjuncts. For some of them money can make a big difference to their quality of life. I had quite a few friends during my PhD who had side gigs, like tutoring, to live a bit more comfortably.)

This is not to say that it’s impossible to pay academics to do side jobs. People do. But when it works, it’s usually due to one of these reasons:

1. It’s fun. Side work trades against rest time, but if it helps us rest up then it’s not really a tradeoff. Even if it’s a little more boring than what we’d rather do, if it’s not so bad the money can make up the difference.
2. It looks good on a CV. This covers most of the things academics are sometimes paid to do, like writing articles for magazines. If we can spin something as useful to our teaching or research, or as good for the greater health of the field (or just for our “personal brand”), then we can justify doing it.
3. It’s a door out of academia. I’ve seen the occasional academic take time off to work for a company. Usually that’s a matter of seeing what it’s like, and deciding whether it looks like a better life. It’s not really “about the money”, even in those cases.

So what if you need an academic’s help with something? You need to convince them it’s worth their time. Money could do it, but only if they’re living precariously, like some PhD students. Otherwise, you need to show that what you’re asking helps the academic do what they’re trying to do: that it is likely to move the field forward, or that it fulfills some responsibility tied to their personal brand. Without that, you’re not likely to hear back.

## September 18, 2021

### Terence Tao — The Hardy–Littlewood–Chowla conjecture in the presence of a Siegel zero

Joni Teräväinen and I have just uploaded to the arXiv our preprint “The Hardy–Littlewood–Chowla conjecture in the presence of a Siegel zero“. This paper is a development of the theme that certain conjectures in analytic number theory become easier if one makes the hypothesis that Siegel zeroes exist; this places one in a presumably “illusory” universe, since the widely believed Generalised Riemann Hypothesis (GRH) precludes the existence of such zeroes, yet this illusory universe seems remarkably self-consistent and notoriously impossible to eliminate from one’s analysis.

For the purposes of this paper, a Siegel zero is a zero ${\beta}$ of a Dirichlet ${L}$-function ${L(\cdot,\chi)}$ corresponding to a primitive quadratic character ${\chi}$ of some conductor ${q_\chi}$, which is close to ${1}$ in the sense that

$\displaystyle \beta = 1 - \frac{1}{\eta \log q_\chi}$

for some large ${\eta \gg 1}$, which we will call the quality of the Siegel zero. The significance of these zeroes is that they force the Möbius function ${\mu}$ and the Liouville function ${\lambda}$ to “pretend” to be like the exceptional character ${\chi}$ for primes of magnitude comparable to ${q_\chi}$. Indeed, if we define an exceptional prime to be a prime ${p^*}$ for which ${\chi(p^*) \neq -1}$, then very few primes near ${q_\chi}$ will be exceptional; in our paper we use some elementary arguments to establish the bounds

$\displaystyle \sum_{q_\chi^{1/2+\varepsilon} < p^* \leq x} \frac{1}{p^*} \ll_\varepsilon \frac{\log x}{\eta \log q_\chi} \ \ \ \ \ (1)$

for any ${x \geq q_\chi^{1/2+\varepsilon}}$ and ${\varepsilon>0}$, where the sum is over exceptional primes in the indicated range ${q_\chi^{1/2+\varepsilon} < p^* \leq x}$; this bound is non-trivial for ${x}$ as large as ${q_\chi^{\eta^{1-\varepsilon}}}$. (See Section 1 of this blog post for some variants of this argument, which were inspired by work of Heath-Brown.) There is also a companion bound (somewhat weaker) that covers a range of ${p^*}$ a little bit below ${q_\chi^{1/2}}$.

One of the early influential results in this area was the following result of Heath-Brown, which I previously blogged about here:

Theorem 1 (Hardy-Littlewood assuming Siegel zero) Let ${h}$ be a fixed natural number. Suppose one has a Siegel zero ${\beta}$ associated to some conductor ${q_\chi}$. Then we have

$\displaystyle \sum_{n \leq x} \Lambda(n) \Lambda(n+h) = ({\mathfrak S} + O( \frac{1}{\log\log \eta} )) x$

for all ${q_\chi^{250} \leq x \leq q_\chi^{300}}$, where ${\Lambda}$ is the von Mangoldt function and ${{\mathfrak S}}$ is the singular series

$\displaystyle {\mathfrak S} = \prod_{p|h} \frac{p}{p-1} \prod_{p \nmid h} (1 - \frac{1}{(p-1)^2})$

In particular, Heath-Brown showed that if there are infinitely many Siegel zeroes, then there are also infinitely many twin primes, with the correct asymptotic predicted by the Hardy-Littlewood prime tuple conjecture at infinitely many scales.
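The singular series ${\mathfrak S}$ is a concrete, computable constant. As an illustration (the cutoff below is an arbitrary truncation point, and the computation is mine rather than the paper's), here is a short Python sketch approximating ${\mathfrak S}$ for the twin prime case ${h=2}$, where it equals twice the twin prime constant, roughly $1.3203$:

```python
def primes_up_to(n):
    """Sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]

def singular_series(h, cutoff=10 ** 5):
    """Truncated Hardy-Littlewood singular series for the pattern (n, n+h):
    prod_{p | h} p/(p-1) * prod_{p does not divide h} (1 - 1/(p-1)^2)."""
    s = 1.0
    for p in primes_up_to(cutoff):
        if h % p == 0:
            s *= p / (p - 1)
        else:
            s *= 1 - 1 / (p - 1) ** 2
    return s

print(round(singular_series(2), 4))  # 1.3203
```

The factors $1 - 1/(p-1)^2$ converge to $1$ quadratically fast, so the truncation error at this cutoff is far below the displayed precision.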

Very recently, Chinis established an analogous result for the Chowla conjecture (building upon earlier work of Germán and Katai):

Theorem 2 (Chowla assuming Siegel zero) Let ${h_1,\dots,h_\ell}$ be distinct fixed natural numbers. Suppose one has a Siegel zero ${\beta}$ associated to some conductor ${q_\chi}$. Then one has

$\displaystyle \sum_{n \leq x} \lambda(n+h_1) \dots \lambda(n+h_\ell) \ll \frac{x}{(\log\log \eta)^{1/2} (\log \eta)^{1/12}}$

in the range ${q_\chi^{10} \leq x \leq q_\chi^{\log\log \eta / 3}}$, where ${\lambda}$ is the Liouville function.

In our paper we unify these results and also improve the quantitative estimates and range of ${x}$:

Theorem 3 (Hardy-Littlewood-Chowla assuming Siegel zero) Let ${h_1,\dots,h_k,h'_1,\dots,h'_\ell}$ be distinct fixed natural numbers with ${k \leq 2}$. Suppose one has a Siegel zero ${\beta}$ associated to some conductor ${q_\chi}$. Then one has

$\displaystyle \sum_{n \leq x} \Lambda(n+h_1) \dots \Lambda(n+h_k) \lambda(n+h'_1) \dots \lambda(n+h'_\ell)$

$\displaystyle = ({\mathfrak S} + O_\varepsilon( \frac{1}{\log^{1/10\max(1,k)} \eta} )) x$

for

$\displaystyle q_\chi^{10k+\frac{1}{2}+\varepsilon} \leq x \leq q_\chi^{\eta^{1/2}}$

for any fixed ${\varepsilon>0}$.

Our argument proceeds by a series of steps in which we replace ${\Lambda}$ and ${\lambda}$ by more complicated looking, but also more tractable, approximations, until the correlation is one that can be computed in a tedious but straightforward fashion by known techniques. More precisely, the steps are as follows:

• (i) Replace the Liouville function ${\lambda}$ with an approximant ${\lambda_{\mathrm{Siegel}}}$, which is a completely multiplicative function that agrees with ${\lambda}$ at small primes and agrees with ${\chi}$ at large primes.
• (ii) Replace the von Mangoldt function ${\Lambda}$ with an approximant ${\Lambda_{\mathrm{Siegel}}}$, which is the Dirichlet convolution ${\chi * \log}$ multiplied by a Selberg sieve weight ${\nu}$ to essentially restrict that convolution to almost primes.
• (iii) Replace ${\lambda_{\mathrm{Siegel}}}$ with a more complicated truncation ${\lambda_{\mathrm{Siegel}}^\sharp}$ which has the structure of a “Type I sum”, and which agrees with ${\lambda_{\mathrm{Siegel}}}$ on numbers that have a “typical” factorization.
• (iv) Replace the approximant ${\Lambda_{\mathrm{Siegel}}}$ with a more complicated approximant ${\Lambda_{\mathrm{Siegel}}^\sharp}$ which has the structure of a “Type I sum”.
• (v) Now that all terms in the correlation have been replaced with tractable Type I sums, use standard Euler product calculations and Fourier analysis, similar in spirit to the proof of the pseudorandomness of the Selberg sieve majorant for the primes in this paper of Ben Green and myself, to evaluate the correlation to high accuracy.

Steps (i), (ii) proceed mainly through estimates such as (1) and standard sieve theory bounds. Step (iii) is based primarily on estimates on the number of smooth numbers of a certain size.

The restriction ${k \leq 2}$ in our main theorem is needed only to execute step (iv) of this strategy. Roughly speaking, the Siegel approximant ${\Lambda_{\mathrm{Siegel}}}$ to ${\Lambda}$ is a twisted, sieved version of the divisor function ${\tau}$, and the types of correlation one is faced with at the start of step (iv) are a more complicated version of the divisor correlation sum

$\displaystyle \sum_{n \leq x} \tau(n+h_1) \dots \tau(n+h_k).$

For ${k=1}$ this sum can be easily controlled by the Dirichlet hyperbola method. For ${k=2}$ one needs the fact that ${\tau}$ has a level of distribution greater than ${1/2}$; in fact Kloosterman sum bounds give a level of distribution of ${2/3}$, a folklore fact that seems to have first been observed by Linnik and Selberg. We use a (notationally more complicated) version of this argument to treat the sums arising in (iv) for ${k \leq 2}$. Unfortunately for ${k > 2}$ there are no known techniques to unconditionally obtain asymptotics, even for the model sum

$\displaystyle \sum_{n \leq x} \tau(n) \tau(n+1) \tau(n+2),$

although we do at least have fairly convincing conjectures as to what the asymptotics should be. Because of this, it seems unlikely that one will be able to relax the ${k \leq 2}$ hypothesis in our main theorem at our current level of understanding of analytic number theory.
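As a toy numerical illustration (my own, not part of the paper's argument), the divisor correlation sums above can be evaluated by brute force for small ${x}$. The function names below are of my own choosing; the sketch sieves ${\tau}$ and computes ${\sum_{n \leq x} \tau(n+h_1) \dots \tau(n+h_k)}$ directly:

```python
def divisor_counts(limit):
    """Sieve tau(n), the number of divisors of n, for 1 <= n <= limit."""
    tau = [0] * (limit + 1)
    for d in range(1, limit + 1):
        # d divides every one of its multiples up to limit.
        for m in range(d, limit + 1, d):
            tau[m] += 1
    return tau

def divisor_correlation(x, shifts):
    """Compute sum_{n <= x} prod_i tau(n + h_i) for fixed shifts h_i."""
    tau = divisor_counts(x + max(shifts))
    total = 0
    for n in range(1, x + 1):
        prod = 1
        for h in shifts:
            prod *= tau[n + h]
        total += prod
    return total

# k = 2 case discussed above: sum of tau(n) * tau(n+1) for n <= x.
print(divisor_correlation(10, [0, 1]))
```

Of course, the point of the hyperbola method and the Kloosterman-sum bounds is to obtain asymptotics as ${x \to \infty}$, which brute force cannot reach; this sketch only makes the objects concrete.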

Step (v) is a tedious but straightforward sieve theoretic computation, similar in many ways to the correlation estimates of Goldston and Yildirim used in their work on small gaps between primes (as discussed for instance here), and then also used by Ben Green and myself to locate arithmetic progressions in primes.

## September 17, 2021

### Matt von Hippel — Four Gravitons and a…What Exactly Are You Now?

I cleaned up my “Who Am I?” page this week, and some of you might notice my title changed. I’m no longer a Postdoc. As of this month, I’m an Assistant Professor.

Before you start congratulating me too much, saying I’ve made it and so on…to be clear, I’m not that kind of Assistant Professor.

Universities in Europe and the US work a bit differently. The US has the tenure-track system: professors start out tenure-track, and have a fixed amount of time to prove themselves. If they do, they get tenure, and essentially permanent employment. If not, they leave.

Some European countries are starting to introduce a tenure track, sometimes just university by university or job-by-job. For the rest, professors are divided not into tenured and tenure-track, but into permanent and fixed-term. Permanent professors are permanent in the way a normal employee of a company would be: they can still be fired, but if not their contract continues indefinitely. Fixed-term professors, then, have contracts for just a fixed span of time. In some cases this can be quite short. In my case, it’s one year.

Some US readers might be thinking this sounds a bit like an Adjunct. In a very literal sense that’s right, in Danish my title is Adjunkt. But it’s not the type of Adjunct you’re thinking of. US universities employ Adjuncts primarily for teaching. They’re often paid per class, and re-hired each year (though with no guarantees, leading to a lot of stress). That’s not my situation. I’m paid a fixed salary, and my primary responsibility is research, not teaching. I also won’t be re-hired next year, unless I find a totally different source of funding. Practically speaking, my situation is a lot like an extra year of Postdoc.

There are some differences. I’m paid a little more than I was as a Postdoc, and I have a few more perks. I’m getting more pedagogy training in the spring; I don’t know whether I would have gotten that opportunity if I were still just a Postdoc. It’s an extra level of responsibility, and that does mean something.

But it does also mean I’m still looking for a job. Once again I find myself in application season: polishing my talks and crossing my fingers, not knowing exactly where I’ll end up.

### David Hogg — awakening old projects

Megan Bedell (Flatiron) and I discussed what it would take to resurrect and finish one of our unfinished papers today. Not much, we think! But it's noticeable that at the end of this pandemic I have at least six papers that are 75-percent done.

## September 15, 2021

### Scott Aaronson — My ACM TechTalk on quantum supremadvantage

This Erev Yom Kippur, I wish to repent for not putting enough quantum computing content on this blog. Of course, repentance is meaningless unless accompanied by genuine reform. That being the case, please enjoy the YouTube video of my ACM TechTalk from last week about quantum supremacy—although, as you’ll see if you watch the thing, I oscillate between quantum supremacy and other terms like “quantum advantage” and even “quantum supremadvantage.” This represents the first time ever that I got pushback about a talk before I’d delivered it for political reasons—the social-justice people, it turns out, are actually serious about wanting to ban the term “quantum supremacy”—but my desire to point out all the difficulties with their proposal competed with my desire not to let that issue overshadow my talk.

And there’s plenty to talk about! While regular Shtetl-Optimized readers will have already heard (or read) most of what I say, I make some new comments, including about the new paper from last week, the night before my talk (!), by the USTC group in China, where they report a quantum supremacy experiment based on random circuit sampling with a superconducting chip, this time with a record-setting 60 qubits and 24 layers of gates. On the other hand, I also stress how increasing the circuit fidelity has become a much more urgent issue than further increasing the number of qubits (or in the case of BosonSampling, the number of photons), if our goal is for these experiments to remain a couple steps ahead of classical spoofing algorithms.

Anyway, I hope you enjoy my lovingly handcrafted visuals. Over the course of this pandemic, I’ve become a full convert to writing out my talks with a stylus pen rather than PowerPointing them—not only is it faster for me, not only does it allow for continuous scrolling rather than arbitrary divisions into slides, but it enforces simplicity and concision in ways they should be enforced.

While there was only time for me to field a few questions at the end of the talk, I later supplied written answers to 52 questions (!!) that I hadn’t gotten to. If you have a question, please check to see if it’s already there, and otherwise ask away in the comments!

Thanks so much to Yan Timanovsky for inviting and organizing this talk, and to whurley for hosting it.

## September 14, 2021

### Jordan Ellenberg — Wildlife removal

Something — a rabbit, it turned out — died underneath my porch. We could smell it but we couldn’t see it. Some people wouldn’t mind handling this themselves, but none of those people are me, so I called AAAC Wildlife Removal. The guy got out of his truck, sniffed around the porch, looked me right in the eye and said, “You never get used to the smell of death.” That alone was worth the price.

## September 13, 2021

### John Preskill — My 100th anniversary with Quantum Frontiers

Queen Elizabeth II celebrated the 60th year of her reign in 2012. I was working as a research assistant at Lancaster University, in northern England. The university threw a tea party, which I attended with a friend. She wrangled me into donning a party hat decorated with the Union Jack. Sixtieth anniversaries, I learned that year, are associated with diamond.

I had trouble finding what 100th anniversaries are associated with—I presume because few queens and couples reach their centennials. But I dug up an answer (all hail the Internet): bone. This post marks my bone anniversary with Quantum Frontiers—my 100th article.

To everyone who’s journeyed with me since article number one, or joined me partway through, or is tolerating my writing for the first time now: Thank you. The opportunity to connect with so many people, from undergraduates to art teachers to quantum-information experts to librarians, has been a blessing. I’ve been surprised at, and grateful for, your sharing of what this blog means to you. You’ve reached out during campus visits, at American Physical Society conferences, in emails, and on Twitter. Thank you for enriching my writing life.

The journey began in mid-May 2013, when I signed my soul to Caltech’s PhD program. Fellow blogger John Preskill1 agreed to supervise me for five years. My first blog post said, “For five years, I will haunt this blog. (Spiros [the creator and gatekeeper of Quantum Frontiers] will haunt me if I don’t haunt it.) I’ll try to post one article per month.” I’ve posted one article per month since then.

Since then, much has transpired. Career-wise, I finished a Master’s degree; earned a PhD; completed a postdoctoral fellowship at the Harvard-Smithsonian Institute for Theoretical Atomic, Molecular, and Optical Physics; committed to doing physics at the National Institute of Standards and Technology; stumbled over several acronyms; and founded a research group. Outside of work, I married off my best friend, fell in love and got married myself, lost two grandmothers, and wrote a book. I found myself in more countries than I can count on one hand, in more US states than countries, and jet-lagged for more hours than I care to recall. Quantum Frontiers has served as a diary and a taskmaster, challenging me to find and crystallize meaning in my experiences.

Although professional and personal affairs have had cameos, learning and research have starred in these 100 articles. My research has evolved over the past eight years, not only as recorded on, but also partially thanks to, this blog. Physicists lionize imagination, but some imaginings have no place even in physics papers. This blog serves as a home for the poetry, the puns, the evocative typos, and the far-fetched connections that Physical Review wouldn’t publish. But nurturing whimsy that Physical Review wouldn’t publish fosters whimsy that Physical Review would. Blogging, I’ve found, promotes creativity that enhances research.

My research dwelled in Abstract-Theory Land in 2013—pure quantum-information-theoretic thermodynamics. Caltech bridged my research to the real physical world: condensed matter; atomic, molecular, and optical physics; and chemistry. The transformation continued during my postdoc, producing two experimental papers and initiating three more. I don’t think that the metamorphosis will progress, and I keep a foot in abstract theory. But if I awake one morning from troubled dreams, finding myself changed into an experimentalist or an engineer, you’ll be among the first to know.

I’ve come to know you a little over the past eight years. Many of you like listicles, according to WordPress statistics. You like former Quantum Frontiers blogger Shaun Maguire more than you like me; his most popular article has logged about 142,000 views, whereas mine has logged about 18,000. That’s ok; I’ve never been the popular kid, and I’m a Shaun Maguire fan, too. But, beyond Shaun and listicles, what draws you has surprised Spiros, John, and me. John anticipated that the article “Theoretical physics has not gone to the dogs” would stir up conversation (Do you think it’ll offend anyone? I asked. I hope so, he replied), but other articles have taken off on Twitter unexpectedly. Maybe we’ll understand you better another 100 articles down the line.

My first blog post contained a quote from Goethe’s Faust. The play opens with a poet reminiscing about his earlier years: “Nothing I had; and yet, enough for youth—/ delight in fiction, and the thirst for truth.” I still delight in fiction, as attested to by a 2020 post about the magical realist Gabriel García Marquez. I’d better thirst for truth no less, now that experimental collaborators are grounding me in reality. Partnering truth with fiction, so that each enhances the other, delights me most—and encapsulates what I aim for on Quantum Frontiers. As I wrote in May 2013, invoking the thirst for truth: Drink with me. I’ll drink a cup of tea to another 100 blog posts.

1Who hasn’t blogged much recently. How about it, John?

## September 07, 2021

### Peter Rohde — The vision for the global quantum internet

Originally posted on the Cambridge University Press blog Fifteeneightyfour. The book “The Quantum Internet: The Second Quantum Revolution” is available for purchase here along with a free online preview.

The true power of classical computing was never fully realised until the emergence of the internet, enabling the global integration of computing infrastructure. Indeed, many of our present-day devices would have very little utility without it. In the absence of the internet, consumers would not be able to rely on cloud infrastructure, information sharing and communication would not be possible, supercomputing would be largely inaccessible, and smartphones would be little more than bricks. The internet enables information to be a commodity whose market value drives technological advancement.

With emerging quantum technologies the quantum internet will be very different and far more powerful. Quantum computers operate according to entirely different principles in the way they process information, which in the future will enable many advanced and extremely economically valuable forms of computation to be implemented which cannot be realised on conventional computers. This raises the immediate question “what if we start networking them together?”

The classical internet is not capable of integrating remote quantum devices. This requires entirely new infrastructure that distributes quantum entanglement, a uniquely quantum information resource. The long-term vision for this infrastructure is the quantum internet, something that will likely develop in the coming decades. As with the emergence of the classical internet, it is to be expected that quantum computers will not realise their full potential until this infrastructure exists. But the motivation is even stronger than for classical devices. When classical computers are unified via distributed cloud-based architectures, the net computational power is effectively the sum of the parts. However, quantum computers exhibit fundamentally different scaling characteristics: a classical computer’s power is roughly proportional to the number of CPUs it contains, whereas a quantum computer’s power grows exponentially with the number of quantum bits, or qubits, it processes. Therefore, upon unifying quantum devices via a quantum internet, we are left with something far greater than the sum of the parts, acting as an immediate incentive for global cooperation.
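The scaling contrast can be made concrete with a toy calculation (my own illustration, not from the original post): networking classical nodes adds their processor counts, while networking quantum devices multiplies the dimensions of their joint state space. The function names here are hypothetical.

```python
def classical_power(cpus_per_node, nodes):
    # Classical cloud: aggregate compute scales additively
    # (roughly proportional to the total number of CPUs).
    return cpus_per_node * nodes

def quantum_state_dimension(qubits_per_node, nodes):
    # Networked quantum devices: the joint Hilbert-space dimension
    # is 2**(total qubits) -- it multiplies, not adds.
    return 2 ** (qubits_per_node * nodes)

for nodes in (1, 2, 4):
    print(nodes,
          classical_power(64, nodes),
          quantum_state_dimension(10, nodes))
```

Doubling the number of classical nodes doubles the compute, but doubling the number of networked 10-qubit devices squares the joint state-space dimension, which is the sense in which the whole exceeds the sum of its parts.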

This quantum enhancement in computing power translates to enhanced economic incentives and returns, making quantum entanglement a highly valuable future commodity. As with any fungible commodity, entanglement will have a market value that drives economic investment into the infrastructure required to distribute it. In an ideal world, a unified global marketplace would emerge, similar to what we see in other global markets. The strategic implications of quantum computing are immense — breaking important cryptographic codes, making new unbreakable ones, with major implications for research & development, important optimisation problems, drug design and simulation. However, these strategic implications may also fracture quantum networks along geo-strategic boundaries, leading to quantum alliances, diplomacy and politics.

Although it is too early to predict exactly how the quantum internet will evolve over the coming decades, it’s clear this technology will underpin the future quantum era in the same way that the classical internet underpins the present digital era. One thing is certain — the global impact of the quantum internet will be enormous.

## September 03, 2021

### Jordan Ellenberg — A farewell to Tab

Another part of my childhood gone: I learned today that Coca-Cola discontinued Tab at the end of last year. This is middle age, to feel a loose kind of sorrow at the demise of things you didn’t even like.

## September 02, 2021

### Jordan Ellenberg — Dredging as good government

A few summers ago we had really bad floods in Madison. There were a lot of reasons. The proximate reason was it rained a lot. But also: we keep the levels of the lakes artificially high with dams, in part because not doing so would make the lake levels fluctuate a lot, and that is a problem for people who have houses on the lake. It’s hard to have your dock reliably terminate at the shoreline if the shoreline keeps moving. Another problem is that the waterways joining the lakes in the chain are choked with sediment and vegetation — so even when we DO open the dams and let the water flow southward towards the Rock River and eventually the Mississippi, the water is pretty slow to drain and it eventually overtops Lake Mendota and washes into the streets of downtown.

(Which, by the way: I had lived in the Upper Midwest for ten years before I realized that Rockford, Illinois was a place where you could ford the Rock.)

Anyway, I was happy to see that the county is spending a few million dollars to dredge those connecting waterways so the lakes can drain more easily. This is not a headline-making move or an internet sensation; as far as I can tell, the number of times this effort has been mentioned on Twitter is in the single digits. And the effect won’t be dramatic — there’s no shiny new building or bridge or factory at the end of the expenditure. The effect is on what doesn’t happen, or at least is less likely to happen: another flood causing hundreds of millions of dollars of damage.

We pay pretty high property taxes in Madison, as things go, but what’s good about our local government is that I truly feel a lot of this kind of thing happens here. We fix things before they break. It’s something governments mostly don’t get credit for. But they should.

## August 28, 2021

### Jordan Ellenberg — One thing about The White Lotus

This was a good television show, made by Mike White, who wrote three episodes of Freaks and Geeks as well as running the excellent and little watched Laura Dern show Enlightened. This one, lots of people watched, and wrote thinkpieces about. Partly, I think, because the acting was really good, and viewers experienced the characters as actually existing humans more than one usually does while watching TV. Thus people were mad at them. And I think the writing about the show was probably a bit overconcerned with the question of who the show wanted you to be mad at, and whether these were the right people to be mad at.

This post won’t make any sense if you haven’t watched the show, and contains spoilers, so if you haven’t watched the show, I recommend you do so instead of reading my post! It’s good! (The show, not the post. The post is just OK.)

I just wanted to make an observation I didn’t see in the thinkpieces, which is the twinning of the characters of Rachel and Belinda. They are both committed to the idea that rich people are concentrations of resources, which with some skill can be extracted. They are both, in some sense, hacks. Rachel aims to be a writer; we are supposed to see her new husband (wealthy, emotionally needy, hyperattentive to potential disrespect) Shane as a jerk for not taking her writing seriously, but simultaneously recognize that she’s not herself serious in her writing goals. Belinda gives a massage to a hotel guest (wealthy, emotionally needy, hyperattentive to potential abandonment) played by Jennifer Coolidge, soothing her client with a routine that asks her to say “I am my own phallic mother and my own vaginal father” and throwing in a chant of the Gayatri Mantra. Nothing here suggests she has any special ability to heal; but Coolidge’s character imprints on her, promising her patronage, her own business bankrolled by Coolidge’s money.

This is the world they live in, this is their game — everything changes if you can get the roaming eye of wealth to land, out of all the places it could settle, on you.

But Belinda lets it get away. One of the neatest tricks of this series is the way Coolidge’s character at first appears to be played for slightly low-rent laughs, then for pity, only to finally reveal herself as the only person on the show who arrives at anything like real insight. She explains to Belinda that her impulse to fund Belinda’s House of Healing was just her impulsive way of trying to buy intimacy, creating another person bound to her by money — then she gives her a bunch of money anyway, but walks away. What follows is one of the show’s Big Scenes: Rachel asks Belinda for advice about her suddenly not-fully-enjoyable marriage, and Belinda just walks out, visibly weary and in pain. A lot of viewers have seen this as a triumphant moment, Belinda exerting real agency, refusing to perform emotional labor for yet another overprivileged guest, but I don’t think that’s exactly right. Rachel doesn’t pair with Coolidge here, she pairs with Belinda herself, and Belinda’s bitterness here is coming from the fact that Rachel has succeeded where she’s just failed.

## August 26, 2021

### Clifford Johnson — A Return

Well, I'm back.

It has been very quiet on the blog recently because I've been largely occupied with the business of moving. Where have I moved to? For the next academic year I'll be on the faculty at Princeton University (as a Presidential Visiting Scholar) in the Physics department. It's sort of funny because, as part of the business of moving forward in my research, I've been looking back a lot on earlier eras of my work recently (as you know from my last two years' exciting activity in non-perturbative matrix models), and rediscovering and re-appreciating (and then enhancing and building on) a lot of the things I was doing decades ago... So now it seems that I'm physically following myself back in time too.

Princeton was in a sense my true physical point of entry into the USA: My first postdoc was here (at the Institute for Advanced Study, nearby), and I really began [...] Click to continue reading this post

The post A Return appeared first on Asymptotia.

### Richard Easther — Birthday Boy

If there is an afterlife for physicists it will certainly include conferences pulling together luminaries from across the centuries: “Newton, you really must meet Einstein, he’s over there by the registration desk. I know you’ll have lots to talk about.”

My Good Place may well be your Bad Place, but any physicist’s post-mortem bucket list has to include Ernest Rutherford, born 150 years ago on Monday, August 30th. You know those little cartoon atoms that are a graphical shorthand for “science”? The key idea they express – the central nucleus wrapped by a cloud of electrons – is Rutherford’s. The proton? Discovered by Rutherford and his collaborators. Turning the atom of one element into another, fulfilling the dreams of alchemists? Also Rutherford.

New Zealanders talk about “tall poppy syndrome”, whereby we cut down successful individuals for daring to rise above the rest of the field. But we also stretch some poppies to their limits, as seen in our love of Olympic medal tables computed on a per capita basis. Consequently, growing up with a passion for science I was never certain whether Rutherford was fully famous or just a local lad made good.

I shouldn’t have doubted. Nobel Prizes are awarded every year but insights as big as “the world is made of atoms” and “atoms are built from smaller, simpler objects” may arrive only once a century, and Rutherford has a big share of both breakthroughs. He is truly the real deal.

Rutherford’s story starts on a farm near Nelson, followed by studies at Canterbury in Christchurch, and ends with him at Cambridge. But his full CV is the classic trajectory of the peripatetic scholar: he headed to Cambridge for his PhD in 1894, began a decade as a professor at McGill in Montreal in 1898, moved to Manchester in 1908, before finally returning to Cambridge in 1919.

In fact, one testimonial to Rutherford’s impact became apparent to me on a visit to McGill, where I was told to make my way to the Rutherford Building, which was the home of their Department of Physics. My PhD is from Canterbury in New Zealand where I did my research in their “Rutherford Building”. Manchester and Cambridge likewise each have their own Rutherford Building. Having just one building named after you is a rare achievement – Rutherford collected this honour at each place he stopped, a remarkable architectural Grand Slam.

If you guessed that as a student Rutherford was a colonial rough diamond let loose in late-Victorian Cambridge you would be right. He was subject to the predictable resentments and exclusions that came with this status, even if his sheer brilliance tipped the scales back in his favour. However, it must have been sweet to return to Cambridge as a full professor and then rise to the pinnacle of British science as president of the Royal Society in the 1920s. Rutherford appears to be a man who greatly enjoyed his life so I doubt he held a significant grudge. (Also, his Manchester salary reputedly made him the highest-paid academic in the United Kingdom and Cambridge had to stretch to match it.)

Rutherford is repeatedly described as “energetic”, “a force of nature” and a “boy” – very much in the positive sense of someone who brought both energy and wonder to everything he did. He was also loud. There is a famous photograph of him in a Cambridge lab filled with sensitive equipment. There is a sign above his head saying “Talk Softly Please” that was reputedly installed for the benefit of “the prof”. Another story claims that the one song he knew was “Onward Christian Soldiers” – and even then only the lyrics, not the tune – and that he once paraded around the room (in a different, earlier lab) performing it at the top of his lungs after a successful experiment. This also appears to have been the full extent of his engagement with organised religion.

Beyond his discoveries Rutherford shaped the scientific community, both here in New Zealand and in his adopted home. A succession of people did their best work in groups he led, with a dozen of his proteges winning Nobel Prizes of their own. Rutherford was a passionate believer in science as a transnational enterprise. He was President of the Academic Assistance Council which found jobs for Jewish scholars dismissed from German universities after the Nazi regime took power in 1933, and he advocated for the participation of women in science. [Full disclosure: Harvard astronomer, Cecilia Payne-Gaposchkin thought otherwise, as Rutherford failed to quell boorish behaviour by her male classmates in Cambridge lecture theatres where she was the only woman.]

Rutherford is widely reported to have been approachable and an enemy of needless hierarchy. One tale has him chairing a PhD exam that a more junior academic saw as a chance to impress the great man by mercilessly grilling the candidate. It became apparent that no progress was being made on a gnarly question and Rutherford halted proceedings, turning to his younger colleague and saying, “I don’t know either – why don’t you tell us both,” to the visceral relief of the student. Better yet, the inquisitor was then forced to admit he also didn’t know the answer. I can’t find a source for this story, but the tales told about a person are sometimes as revealing as the truth.

Rutherford retained close links with New Zealand. One of his many proteges was Ernest Marsden, who returned (after both war service and helping Rutherford discover the atomic nucleus while still an undergraduate) to New Zealand, where he played a huge role in New Zealand science for decades.

My own path crossed the wake of another of Rutherford’s students, Percy William Burbidge, who was appointed to head the Department of Physics at Auckland on Rutherford’s recommendation and spent decades in the job; I have a decade of my own in the role. A letter from the great man to his former pupil is framed in one of Auckland’s teaching labs; its focus is a battle with Chemistry for floorspace and student numbers. Some things never change.

If I were to enjoy a pint with Rutherford in some science Valhalla, I would be meeting a hero of mine. But Baron Rutherford of Nelson (whose coat of arms includes both a kiwi and a Māori warrior) would undoubtedly have questions for me about the state of play back home. I could truthfully tell him that we enjoy cordial relations with the chemists, but not all the news would be to his liking. In particular, my guess is that he would have sympathy with current concerns about support for young researchers – with his fiancee Mary he endured a years-long engagement before they could afford to marry, and once successful he was effectively a one-man employment agency for young scientists across the then-Empire.

New Zealanders often portray Rutherford as a bluff, practical fellow in a world of abstract intellectuals but this shortchanges him in many ways. Rutherford was the world’s leading experimentalist during the time that physics assimilated the deep ideas of relativity and quantum mechanics and untangled the microscopic structure of matter. He spoke to Einstein as a peer and was a mentor to Niels Bohr, one of the founders of quantum mechanics. He was certainly one of the deepest thinkers my field has ever seen. Where he led, others followed.

Rutherford saw little practical use in his work – he once dismissed nuclear power as “moonshine”; it was only just before he died that new discoveries led him to revise this belief. He was driven by a passion for understanding and discovery rather than “applications”, and saw open-ended inquiry rather than goal-driven research as key to unlocking the full benefits that science offers society at large. My guess is that he would see our current science system as unambitious and top-down, driven by expedience and the shifting short-term winds of government priority rather than harnessing the full curiosity of our scientific community. But he would be optimistic that things could improve.

Beyond being a trove of information and ideas, science is a human community, and Rutherford contributed more than almost anyone to both aspects of our shared enterprise. One hundred and fifty years on it seems that there is much to celebrate in this great New Zealander – and much still to learn from him.

Happy Birthday.

IMAGE: Monument to Rutherford at Brightwater, near Nelson.

## August 24, 2021

### John Preskill — How a liberal-arts education has enhanced my physics research

I attended a liberal-arts college, and I reveled in the curriculum’s breadth. My coursework included art history, psychology, biology, economics, computer science, German literature, archaeology, and chemistry. My major sat halfway between the physics major and the create-your-own major; the requirements consisted mostly of physics but included math, philosophy, and history. By the end of college, I’d determined to dive into physics. So I undertook a physics research assistantship, enlisted in a Master’s program and then a PhD program, and became a theoretical physicist. I’m now building a physics research group that spans a government institute and the University of Maryland. One might think that I became a physicist despite my art history and archaeology.

My liberal-arts education did mortify me a little as I pursued my Master’s degree. Most of my peers had focused on physics, mathematics, and computer science while I’d been reading Aristotle. They seemed to breeze through coursework that I clawed my way through. I still sigh wistfully over math courses, such as complex analysis, that I’ve never taken. Meanwhile, a debate about the liberal arts has been raging across the nation. Debt is weighing down recent graduates, and high-school students are loading up on STEMM courses. Colleges are cutting liberal-arts departments, and educational organizations are broadcasting the value of liberal-arts educations.

I’m not an expert in public policy or school systems; I’m a physicist. As a physicist, I’m grateful for my liberal-arts education. It’s enhanced my physics research in at least five ways.

(1) I learned to seek out, and take advantage of, context. Early in my first German-literature course, I’d just completed my first reading assignment. My professor told my class to take out our books and open them to the beginning. A few rustles later, we held our books open to page one of the main text.

No, no, said my professor. Open your books to the beginning. Did anyone even look at the title page?

We hadn’t, we admitted. We’d missed a wealth of information, as the book contained a reproduction of an old title page. Publishers, fonts, and advertisement styles have varied across the centuries and the globe. They, together with printing and reprinting dates, tell stories about the book’s origin, popularity, role in society, and purposes. Furthermore, a frontispiece is worth a thousand words, all related before the main text begins. When my class turned to the main text, much later in the lecture, we saw it in a new light. Context deepens and broadens our understanding.

When I read a physics paper, I start at the beginning—the true beginning. I note the publication date, the authors, their institutions and countries, and the journal. X’s lab performed the experiment reported on? X was the world’s expert in Y back then but nursed a bias against Z, a bias later proved to be unjustified. So I should aim to learn from the paper about Y but should take statements about Z with a grain of salt. Seeking and processing context improves my use of physics papers, thanks to a German-literature course.

(2) I learned argumentation. Doing physics involves building, analyzing, criticizing, and repairing arguments. I argue that mathematical X models physical system Y accurately, that an experiment I’ve proposed is feasible with today’s technology, and that observation Z supports a conjecture of mine. Physicists also prove mathematical statements deductively. I received proof-writing lessons in a math course, halfway through college. One of the most competent teachers I’ve ever encountered taught the course. But I learned about classes of arguments and about properties of arguments in a philosophy course, Informal Logic.

There, I learned to distinguish deduction from inference and an argument’s validity and soundness from an argument’s strength and cogency. I learned strategies for proving arguments and learned fallacies to criticize. I came to respect the difference between “any” and “every,” which I see interchanged in many physics papers. This philosophical framework helps me formulate, process, dissect, criticize, and correct physics arguments.

For instance, I often parse long, dense, technical proofs of mathematical statements. First, I identify whether the proof strategy is reductio ad absurdum, proof by counterexample, or another strategy. Upon identifying the overarching structure, I can fill my understanding with details. Additionally, I check proofs by students, and I respond to criticisms of my papers by journal referees. I could say, upon reading an argument, “Something feels a bit off, and it’s sort of like the thing that felt a bit off in that paper I read last Tuesday.” But I’d rather group the argument I’m given together with arguments I know how to tackle. I’d rather be able to say, “They’re straw-manning my argument” or “That argument begs the question.” Doing so, I put my finger on the problem and take a step toward solving it.

(3) I learned to analyze materials to bits, then extract meaning from the analysis. English and German courses trained me to wring from literature every drop of meaning that I could discover. I used to write one to three pages about a few-line quotation. The analysis would proceed from diction and punctuation to literary devices; allusions; characters’ relationships with each other, themselves, and nature; and the quotation’s role in the monograph. Everything from minutia to grand themes required scrutiny, according to the dissection technique I trained in. Every pincer probe lifted another skein of skin or drew aside another tendon, offering deeper insights into the literary work. I learned to find the skeins to lift, lift them in the right direction, pinpoint the insights revealed, and integrate the insights into a coherent takeaway.

This training has helped me assess and interpret mathematics. Physicists pick a physical system to study, model the system with equations, and solve the equations. The next two steps are intertwined: evaluating whether one solved the equations correctly and translating the solution into the physical system’s behavior. These two steps necessitate a dissection of everything from minutia to grand themes: Why should this exponent be 4/5, rather than any other number? Should I have expected this energy to depend on that length in this way? Is the physical material aging quickly or resisting change? These questions’ answers inform more-important questions: Who cares? Do my observations shed light worth anyone’s time, or did I waste a week solving equations no one should care about?

To answer all these questions, I draw on my literary training: I dissect content, pinpoint insights, and extract meaning. Having performed this analysis in literature courses facilitates an arguably deeper analysis than my physics training did: In literature courses, I had to organize my thoughts and articulate them in essays. This process revealed holes in my argumentation, as well as connections that I’d overlooked. In contrast, a couple of lines in my physics homework earned full marks. The critical analysis of literature has deepened my assessment of solutions’ correctness, physical interpretation of mathematics, and extraction of meaning from solutions.

(4) I learned what makes a physicist a physicist. In college, I had a friend who was studying applied mathematics and economics. Over dinner, he described a problem he’d encountered in his studies. I replied, almost without thinking, “From a physics perspective, I’d approach the problem like this.” I described my view, which my friend said he wouldn’t have thought of. I hadn’t thought of myself, and of the tools I was obtaining in the physics department, the way I did after our conversation.

Physics involves a unique toolkit,1 set of goals, and philosophy. Physicists identify problems, model them, solve them, and analyze the results in certain ways. Students see examples of these techniques in lectures and practice these techniques for homework. But, as a student, I rarely heard articulations of the general principles that underlay the examples scattered across my courses like a handful of marbles across a kitchen floor. Example principles include: if you don’t understand an abstract idea, construct a simple example. Once you’ve finished a calculation, check whether your answer makes sense in the most extreme scenarios possible. After solving an equation, interpret the solution in terms of physical systems—of how particles and waves move and interact.

I was learning these techniques, in college, without realizing that I was learning them. I became conscious of the techniques by comparing the approach natural to me with the approach taken in another discipline. Becoming conscious of my toolkit enabled me to wield it more effectively; one can best fry eggs when aware that one owns a spatula. The other disciplines at my liberal-arts college served as a foil for physics. Seeing other disciplines, I saw what makes physics physics—and improved my ability to apply my physics toolkit.

(5) I learned to draw connections between diverse ideas. Senior year of high school, my courses extended from physics to English literature. One might expect such a curriculum to feel higgledy-piggledy, but I found threads that ran through all my courses. For instance, I practiced public speaking in Reasoning, Research, and Rhetoric. Because I studied rhetoric, my philosophy teacher turned to me for input when introducing the triumvirate “thesis, antithesis, synthesis.”2 The philosophy curriculum included the feminist essay “If Men Could Menstruate,” which complemented the feminist book Wide Sargasso Sea in my English-literature course. In English literature, I learned that Baldassare Castiglione codified how Renaissance noblemen should behave, in The Book of the Courtier. The author’s name was the answer to the first question on my AP Modern European History exam. My history course covered Isaac Newton and Gottfried Wilhelm Leibniz, who invented calculus during the 17th century. I leveraged their discoveries in my calculus course, which I applied in my physics course. My physics teacher hoped that his students would solve the world’s energy problems—perhaps averting the global thermonuclear war that graced every debate in my rhetoric course (“If you don’t accept my team’s policy, then X will happen, leading to Y, leading to Z, which will cause a global thermonuclear war”).

Threads linked everything across my liberal-arts education; every discipline featured an idea that paralleled an idea in another discipline. Finding those parallels grew into a game for me, a game that challenged my creativity. Cultivating that creativity paid off when I began doing physics research. Much of my research has resulted from finding, in one field, a concept that resembles a concept in another field. I smash the ideas together to gain insight into each discipline from the other discipline’s perspective. For example, during my PhD studies, I found a thread connecting the physics of DNA strands to the physics of black holes. That thread initiated a research program of mine that’s yielded nine papers, garnered 19 collaborators, and spawned two experiments. Studying diverse subjects trained me to draw creative connections, which underlie much physics research.

I haven’t detailed all the benefits that a liberal-arts education can accrue to a physics career. For instance, the liberal arts enhance one’s communication skills, key to collaborating on research and to conveying one’s research. Without conveying one’s research adroitly, one likely won’t impact a field much. Also, a liberal-arts education can help one connect with researchers from across the globe on a personal level.3 Personal connections enhance science, which scientists—humans—undertake.

As I began building my research group, I sought advice from an MIT professor who’d attended MIT as an undergraduate. He advised me to seek students who have unusual backgrounds, including liberal-arts educations. Don’t get me wrong; I respect and cherish the colleagues and friends of mine who attended MIT, Caltech, and other tech schools as undergraduates. Still, I wouldn’t trade my German literature and economics. The liberal arts have enriched my physics research no less than they’ve enriched the rest of my life.

1A toolkit that overlaps partially with other disciplines’ toolkits, as explained in (3).

2I didn’t help much. When asked to guess the last concept in the triumvirate, I tried “debate.”

3I once met a Ukrainian physicist who referred to Ilya Muromets in a conversation. Ilya Muromets is a bogatyr, a knight featured in Slavic epics set in the Middle Ages. I happened to have taken a Slavic-folklore course the previous year. So I responded with a reference to Muromets’s pals, Dobrynya Nikitich and Alyosha Popovich. The physicist and I hit it off, and he taught me much about condensed matter over the following months.

## August 20, 2021

### Terence Tao — Undecidable translational tilings with only two tiles, or one nonabelian tile

Rachel Greenfeld and I have just uploaded to the arXiv our preprint “Undecidable translational tilings with only two tiles, or one nonabelian tile“. This paper studies the following question: given a finitely generated group ${G}$, a (periodic) subset ${E}$ of ${G}$, and finite sets ${F_1,\dots,F_J}$ in ${G}$, is it possible to tile ${E}$ by translations ${a_j+F_j}$ of the tiles ${F_1,\dots,F_J}$? That is to say, is there a solution ${\mathrm{X}_1 = A_1, \dots, \mathrm{X}_J = A_J}$ to the (translational) tiling equation

$\displaystyle (\mathrm{X}_1 \oplus F_1) \uplus \dots \uplus (\mathrm{X}_J \oplus F_J) = E \ \ \ \ \ (1)$

for some subsets ${A_1,\dots,A_J}$ of ${G}$, where ${A \oplus F}$ denotes the set of sums ${\{a+f: a \in A, f \in F \}}$ if the sums ${a+f}$ are all disjoint (and is undefined otherwise), and ${\uplus}$ denotes disjoint union. (One can also write the tiling equation in the language of convolutions as ${1_{\mathrm{X}_1} * 1_{F_1} + \dots + 1_{\mathrm{X}_J} * 1_{F_J} = 1_E}$.)

A bit more specifically, the paper studies the decidability of the above question. There are two slightly different types of decidability one could consider here:

• Logical decidability. For a given ${G, E, J, F_1,\dots,F_J}$, one can ask whether the solvability of the tiling equation (1) is provable or disprovable in ZFC (where we encode all the data ${G, E, F_1,\dots,F_J}$ by appropriate constructions in ZFC). If this is the case we say that the tiling equation (1) (or more precisely, the solvability of this equation) is logically decidable, otherwise it is logically undecidable.
• Algorithmic decidability. For data ${G,E,J, F_1,\dots,F_J}$ in some specified class (and encoded somehow as binary strings), one can ask whether the solvability of the tiling equation (1) can be correctly determined for all choices of data in this class by the output of some Turing machine that takes the data as input (encoded as a binary string) and halts in finite time, returning either YES if the equation can be solved or NO otherwise. If this is the case, we say the tiling problem of solving (1) for data in the given class is algorithmically decidable, otherwise it is algorithmically undecidable.

Note that the notion of logical decidability is “pointwise” in the sense that it pertains to a single choice of data ${G,E,J,F_1,\dots,F_J}$, whereas the notion of algorithmic decidability pertains instead to classes of data, and is only interesting when this class is infinite. Indeed, any tiling problem with a finite class of data is trivially decidable because one could simply code a Turing machine that is basically a lookup table that returns the correct answer for each choice of data in the class. (This is akin to how a student with a good memory could pass any exam if the questions are drawn from a finite list, merely by memorising an answer key for that list of questions.)
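The lookup-table observation can be made concrete in a few lines of Python; the instance names and answers below are, of course, invented for illustration:

```python
# A "Turing machine" deciding a tiling problem over a FINITE class of data
# is just a lookup table: precompute (by whatever means) the answer for
# each instance, then answer every query by table lookup.
ANSWER_KEY = {
    "instance_A": True,   # this tiling equation is solvable
    "instance_B": False,  # this one is not
    "instance_C": True,
}

def decide(instance: str) -> bool:
    """Halts in finite time and returns the correct answer,
    provided `instance` belongs to the finite class."""
    return ANSWER_KEY[instance]
```

This is exactly the memorised answer key of the exam analogy: correctness is guaranteed only because the class of questions is finite.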

The two notions are related as follows: if a tiling problem (1) is algorithmically undecidable for some class of data, then the tiling equation must be logically undecidable for at least one choice of data for this class. For if this is not the case, one could algorithmically decide the tiling problem by searching for proofs or disproofs that the equation (1) is solvable for a given choice of data; the logical decidability of all such solvability questions will ensure that this algorithm always terminates in finite time.

One can use the Gödel completeness theorem to interpret logical decidability in terms of universes (also known as structures or models) of ZFC. In addition to the “standard” universe ${{\mathfrak U}}$ of sets that we believe satisfies the axioms of ZFC, there are also other “nonstandard” universes ${{\mathfrak U}^*}$ that also obey the axioms of ZFC. If the solvability of a tiling equation (1) is logically undecidable, this means that such a tiling exists in some universes of ZFC, but not in others.

(To continue the exam analogy, we thus see that a yes-no exam question is logically undecidable if the answer to the question is yes in some parallel universes, but not in others. A course syllabus is algorithmically undecidable if there is no way to prepare for the final exam for the course in a way that guarantees a perfect score (in the standard universe).)

Questions of decidability are also related to the notion of aperiodicity. For a given ${G, E, J, F_1,\dots,F_J}$, a tiling equation (1) is said to be aperiodic if the equation (1) is solvable (in the standard universe ${{\mathfrak U}}$ of ZFC), but none of the solutions (in that universe) are completely periodic (i.e., there are no solutions ${\mathrm{X}_1 = A_1,\dots, \mathrm{X}_J = A_J}$ where all of the ${A_1,\dots,A_J}$ are periodic). Perhaps the most well-known example of an aperiodic tiling (in the context of ${{\bf R}^2}$, and using rotations as well as translations) comes from the Penrose tilings, but there are many others besides.

It was (essentially) observed by Hao Wang in the 1960s that if a tiling equation is logically undecidable, then it must necessarily be aperiodic. Indeed, if a tiling equation fails to be aperiodic, then (in the standard universe) either there is a periodic tiling, or there are no tilings whatsoever. In the former case, the periodic tiling can be used to give a finite proof that the tiling equation is solvable; in the latter case, the compactness theorem implies that there is some finite fragment of ${E}$ that is not compatible with being tiled by ${F_1,\dots,F_J}$, and this provides a finite proof that the tiling equation is unsolvable. Thus in either case the tiling equation is logically decidable.

This observation of Wang clarifies somewhat how logically undecidable tiling equations behave in the various universes of ZFC. In the standard universe, tilings exist, but none of them will be periodic. In nonstandard universes, tilings may or may not exist, and the tilings that do exist may be periodic (albeit with a nonstandard period); but there must be at least one universe in which no tiling exists at all.

In one dimension when ${G={\bf Z}}$ (or more generally ${G = {\bf Z} \times G_0}$ with ${G_0}$ a finite group), a simple pigeonholing argument shows that no tiling equations are aperiodic, and hence all tiling equations are decidable. However the situation changes in two dimensions. In 1966, Berger (a student of Wang) famously showed that there exist tiling equations (1) in the discrete plane ${E = G = {\bf Z}^2}$ that are aperiodic, or even logically undecidable; in fact he showed that the tiling problem in this case (with arbitrary choices of data ${J, F_1,\dots,F_J}$) was algorithmically undecidable. (Strictly speaking, Berger established this for a variant of the tiling problem known as the domino problem, but later work of Golomb showed that the domino problem can easily be encoded within the tiling problem.) This was accomplished by encoding the halting problem for Turing machines into the domino problem; the former is well known to be algorithmically undecidable (and thus to have logically undecidable instances), and so the latter is as well. However, the number of tiles ${J}$ required for Berger’s construction was quite large: his construction of an aperiodic tiling required ${J = 20426}$ tiles, and his construction of a logically undecidable tiling required an even larger (and not explicitly specified) collection of tiles. Subsequent work by many authors reduced the number of tiles required; in the ${E=G={\bf Z}^2}$ setting, the current record for the smallest number of tiles in an aperiodic tiling is ${J=8}$ (due to Ammann, Grünbaum, and Shephard) and for a logically undecidable tiling is ${J=11}$ (due to Ollinger). On the other hand, it is conjectured (see Grünbaum-Shephard and Lagarias-Wang) that one cannot lower ${J}$ all the way to ${1}$:

Conjecture 1 (Periodic tiling conjecture) If ${E}$ is a periodic subset of a finitely generated abelian group ${G}$, and ${F}$ is a finite subset of ${G}$, then the tiling equation ${\mathrm{X} \oplus F = E}$ is not aperiodic.

This conjecture is known to be true in two dimensions (by work of Bhattacharya when ${G=E={\bf Z}^2}$, and more recently by us when ${E \subset G = {\bf Z}^2}$), but remains open in higher dimensions. By the preceding discussion, the conjecture implies that every tiling equation with a single tile is logically decidable, and the problem of whether a given periodic set can be tiled by a single tile is algorithmically decidable.

In this paper we show on the other hand that aperiodic and undecidable tilings exist when ${J=2}$, at least if one is permitted to enlarge the group ${G}$ a bit:

Theorem 2 (Logically undecidable tilings)
• (i) There exists a group ${G}$ of the form ${G = {\bf Z}^2 \times G_0}$ for some finite abelian ${G_0}$, a subset ${E_0}$ of ${G_0}$, and finite sets ${F_1, F_2 \subset G}$ such that the tiling equation ${(\mathbf{X}_1 \oplus F_1) \uplus (\mathbf{X}_2 \oplus F_2) = {\bf Z}^2 \times E_0}$ is logically undecidable (and hence also aperiodic).
• (ii) There exists a dimension ${d}$, a periodic subset ${E}$ of ${{\bf Z}^d}$, and finite sets ${F_1, F_2 \subset {\bf Z}^d}$ such that the tiling equation ${(\mathbf{X}_1 \oplus F_1) \uplus (\mathbf{X}_2 \oplus F_2) = E}$ is logically undecidable (and hence also aperiodic).
• (iii) There exists a non-abelian finite group ${G_0}$ (with the group law still written additively), a subset ${E_0}$ of ${G_0}$, and a finite set ${F \subset {\bf Z}^2 \times G_0}$ such that the nonabelian tiling equation ${\mathbf{X} \oplus F = {\bf Z}^2 \times E_0}$ is logically undecidable (and hence also aperiodic).

We also have algorithmic versions of this theorem. For instance, the algorithmic version of (i) is that the problem of determining solvability of the tiling equation ${(\mathbf{X}_1 \oplus F_1) \uplus (\mathbf{X}_2 \oplus F_2) = {\bf Z}^2 \times E_0}$ for a given choice of finite abelian group ${G_0}$, subset ${E_0}$ of ${G_0}$, and finite sets ${F_1, F_2 \subset {\bf Z}^2 \times G_0}$ is algorithmically undecidable. Similarly for (ii), (iii).

This result (together with a negative result discussed below) suggests to us that there is a significant qualitative difference between the ${J=1}$ theory of tiling by a single (abelian) tile and the ${J \geq 2}$ theory of tiling with multiple tiles (or one non-abelian tile). (The positive results on the periodic tiling conjecture certainly rely heavily on the fact that there is only one tile; in particular, there is a “dilation lemma” available only in this setting that is of key importance in the two-dimensional theory.) It would be nice to eliminate the group ${G_0}$ from (i) (or to set ${d=2}$ in (ii)), but I think this would require a fairly significant modification of our methods.

Like many other undecidability results, the proof of Theorem 2 proceeds by a sequence of reductions, in which the undecidability of one problem is shown to follow from the undecidability of another, more “expressive” problem that can be encoded inside the original problem, until one reaches a problem that is so expressive that it encodes a problem already known to be undecidable. Indeed, all three undecidability results are ultimately obtained from Berger’s undecidability result on the domino problem.

The first step in increasing expressiveness is to observe that the undecidability of a single tiling equation follows from the undecidability of a system of tiling equations. More precisely, suppose we have non-empty finite subsets ${F_j^{(m)}}$ of a finitely generated group ${G}$ for ${j=1,\dots,J}$ and ${m=1,\dots,M}$, as well as periodic sets ${E^{(m)}}$ of ${G}$ for ${m=1,\dots,M}$, such that it is logically undecidable whether the system of tiling equations

$\displaystyle (\mathrm{X}_1 \oplus F_1^{(m)}) \uplus \dots \uplus (\mathrm{X}_J \oplus F_J^{(m)}) = E^{(m)} \ \ \ \ \ (2)$

for ${m=1,\dots,M}$ has a solution ${\mathrm{X}_1 = A_1,\dots, \mathrm{X}_J = A_J}$ in ${G}$. Then, for any ${N>M}$, we can “stack” these equations into a single tiling equation in the larger group ${G \times {\bf Z}/N{\bf Z}}$, namely the equation

$\displaystyle (\mathrm{X}_1 \oplus F_1) \uplus \dots \uplus (\mathrm{X}_J \oplus F_J) = E \ \ \ \ \ (3)$

where

$\displaystyle F_j := \biguplus_{m=1}^M F_j^{(m)} \times \{m\}$

and

$\displaystyle E := \biguplus_{m=1}^M E^{(m)} \times \{m\}.$

It is a routine exercise to check that the system of equations (2) admits a solution in ${G}$ if and only if the single equation (3) admits a solution in ${G \times {\bf Z}/N{\bf Z}}$. Thus, to prove the undecidability of a single equation of the form (3) it suffices to establish undecidability of a system of the form (2); note how the freedom to select the auxiliary group ${G_0}$ is important here.
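The stacking construction itself is mechanical: each tile ${F_j^{(m)}}$ and set ${E^{(m)}}$ is placed in its own fibre ${\{m\}}$ of the auxiliary factor. A minimal Python sketch, with invented toy data standing in for the group elements:

```python
# Stack a system of tiling equations into a single equation: each tile
# F_j^{(m)} and set E^{(m)} is tagged with its equation index m, so the
# stacked tiles live in (stand-in group) x (index fibre).
def stack(tile_systems, sets):
    """tile_systems[j][m] = F_j^{(m)}; sets[m] = E^{(m)} (finite stand-ins)."""
    F = [set((f, m) for m, Fm in enumerate(system) for f in Fm)
         for system in tile_systems]
    E = set((e, m) for m, Em in enumerate(sets) for e in Em)
    return F, E

# Two equations (M = 2) with J = 1 tile, over a stand-in group {0, 1, 2}:
F1 = [{0, 1}, {0}]          # F_1^{(1)}, F_1^{(2)}
E = [{0, 1, 2}, {0, 2}]     # E^{(1)}, E^{(2)}
stacked_F, stacked_E = stack([F1], E)
assert stacked_F[0] == {(0, 0), (1, 0), (0, 1)}
assert stacked_E == {(0, 0), (1, 0), (2, 0), (0, 1), (2, 1)}
```

Because the fibres are disjoint, a tiling of the stacked set decomposes fibre by fibre into solutions of the individual equations, which is the routine check mentioned above.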

We view systems of the form (2) as belonging to a kind of “language” in which each equation in the system is a “sentence” in the language imposing additional constraints on a tiling. One can now pick and choose various sentences in this language to try to encode various interesting problems. For instance, one can encode the concept of a function ${f: {\bf Z}^2 \rightarrow G_0}$ taking values in a finite group ${G_0}$ as a single tiling equation

$\displaystyle \mathrm{X} \oplus (\{0\} \times G_0) = {\bf Z}^2 \times G_0 \ \ \ \ \ (4)$

since the solutions to this equation are precisely the graphs

$\displaystyle \mathrm{X} = \{ (n, f(n)): n \in {\bf Z}^2 \}$

of a function ${f: {\bf Z}^2 \rightarrow G_0}$. By adding more tiling equations to this equation to form a larger system, we can start imposing additional constraints on this function ${f}$. For instance, if ${x+H}$ is a coset of some subgroup ${H}$ of ${G_0}$, we can impose the additional equation

$\displaystyle \mathrm{X} \oplus (\{0\} \times H) = {\bf Z}^2 \times (x+H) \ \ \ \ \ (5)$

to impose the additional constraint that ${f(n) \in x+H}$ for all ${n \in {\bf Z}^2}$, if we desire. If ${G_0}$ happens to contain two distinct elements ${1, -1}$, and ${h \in {\bf Z}^2}$, then the additional equation

$\displaystyle \mathrm{X} \oplus (\{0,h\} \times \{0\}) = {\bf Z}^2 \times \{-1,1\} \ \ \ \ \ (6)$

imposes the additional constraints that ${f(n) \in \{-1,1\}}$ for all ${n \in {\bf Z}^2}$, and additionally that

$\displaystyle f(n+h) = -f(n)$

for all ${n \in {\bf Z}^2}$.
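The claim that solutions of (4) are exactly graphs of functions can be confirmed by brute force on a toy model. In the sketch below (an assumption: a two-point grid stands in for ${{\bf Z}^2}$, and ${G_0 = {\bf Z}/3}$), every subset is tested against the tiling equation:

```python
from itertools import product

# Brute-force check, on a toy model, that the solutions of the tiling
# equation X (+) ({0} x G0) = grid x G0 are exactly the graphs
# {(n, f(n))} of functions f: grid -> G0.
G0 = [0, 1, 2]                   # Z/3, with addition mod 3
grid = [(0, 0), (1, 0)]          # tiny stand-in for Z^2
target = set(product(grid, G0))  # the set to be tiled: grid x G0

def translates(X):
    """Compute X (+) ({0} x G0); return None if the sums are not disjoint."""
    out = set()
    for (n, g) in X:
        for h in G0:
            p = (n, (g + h) % 3)
            if p in out:
                return None      # overlap: (+) is undefined
            out.add(p)
    return out

points = sorted(target)
solutions = []
for bits in product([0, 1], repeat=len(points)):
    X = {p for p, b in zip(points, bits) if b}
    if translates(X) == target:
        solutions.append(X)

graphs = [{(n, f[i]) for i, n in enumerate(grid)}
          for f in product(G0, repeat=len(grid))]
assert len(solutions) == len(graphs) == 9
assert all(s in graphs for s in solutions)
```

Any set with two points in the same fibre makes the translates overlap, and any set missing a fibre fails to cover it, so the ${3^2}$ graphs are the only solutions, matching the assertion above.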

This begins to resemble the equations that come up in the domino problem. Here one has a finite set of Wang tiles – unit squares ${T}$ where each of the four sides is colored with a color ${c_N(T), c_S(T), c_E(T), c_W(T)}$ (corresponding to the four cardinal directions North, South, East, and West) from some finite set ${{\mathcal C}}$ of colors. The domino problem is then to tile the plane with copies of these tiles in such a way that adjacent sides match. In terms of equations, one is seeking to find functions ${c_N, c_S, c_E, c_W: {\bf Z}^2 \rightarrow {\mathcal C}}$ obeying the pointwise constraint

$\displaystyle (c_N(n), c_S(n), c_E(n), c_W(n)) \in {\mathcal W} \ \ \ \ \ (7)$

for all ${n \in {\bf Z}^2}$ where ${{\mathcal W}}$ is the set of colors associated to the set of Wang tiles being used, and the matching constraints

$\displaystyle c_S(n+(0,1)) = c_N(n); \quad c_W(n+(1,0)) = c_E(n) \ \ \ \ \ (8)$

for all ${n \in {\bf Z}^2}$. As it turns out, the pointwise constraint (7) can be encoded by tiling equations that are fancier versions of (4), (5), (6) that involve only one unknown tiling set ${{\mathrm X}}$, but in order to encode the matching constraints (8) we were forced to introduce a second tile (or to work with nonabelian tiling equations). This appears to be an inherent feature of the method, since we found a partial rigidity result for tilings by a single tile in one dimension that obstructs this encoding strategy from working when only one tile is available. The result is as follows:

Proposition 3 (Swapping property) Consider the solutions to a tiling equation

$\displaystyle \mathrm{X} \oplus F = E \ \ \ \ \ (9)$

in a one-dimensional group ${G = {\bf Z} \times G_0}$ (with ${G_0}$ a finite abelian group, ${F}$ finite, and ${E}$ periodic). Suppose there are two solutions ${\mathrm{X} = A_0, \mathrm{X} = A_1}$ to this equation that agree on the left in the sense that

$\displaystyle A_0 \cap (\{0, -1, -2, \dots\} \times G_0) = A_1 \cap (\{0, -1, -2, \dots\} \times G_0).$

For any function ${\omega: {\bf Z} \rightarrow \{0,1\}}$, define the “swap” ${A_\omega}$ of ${A_0}$ and ${A_1}$ to be the set

$\displaystyle A_\omega := \{ (n, g): n \in {\bf Z}, (n,g) \in A_{\omega(n)} \}$

Then ${A_\omega}$ also solves the equation (9).

One can think of ${A_0}$ and ${A_1}$ as “genes” with “nucleotides” ${\{ g \in G_0: (n,g) \in A_0\}}$, ${\{ g \in G_0: (n,g) \in A_1\}}$ at each position ${n \in {\bf Z}}$, and ${A_\omega}$ is a new gene formed by choosing one of the nucleotides from the “parent” genes ${A_0}$, ${A_1}$ at each position. The above proposition then says that the solutions to the equation (9) must be closed under “genetic transfer” among any pair of genes that agree on the left. This seems to present an obstruction to trying to encode equations such as

$\displaystyle c(n+1) = c'(n)$

for two functions ${c, c': {\bf Z} \rightarrow \{-1,1\}}$ (say), which is a toy version of the matching constraint (8), since the class of solutions to this equation turns out not to obey this swapping property. On the other hand, it is easy to encode such equations using two tiles instead of one, and an elaboration of this construction is used to prove our main theorem.
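The swap construction of Proposition 3 is itself simple to write down. In this sketch, the sets are small illustrative stand-ins for subsets of ${{\bf Z} \times G_0}$, not actual tiling solutions:

```python
# The "swap" A_omega of two sets A0, A1 in Z x G0: at each position n,
# take the fibre of parent A_{omega(n)}. (Data below is illustrative only.)
def swap(A0, A1, omega):
    """A_omega = {(n, g) : (n, g) in A_{omega(n)}} for omega: Z -> {0, 1}."""
    parents = (A0, A1)
    return {(n, g) for (n, g) in A0 | A1 if (n, g) in parents[omega(n)]}

A0 = {(0, "x"), (1, "y"), (2, "x")}
A1 = {(0, "x"), (1, "z"), (2, "y")}   # agrees with A0 at n = 0 ("on the left")
omega = lambda n: n % 2               # choose a parent at each position
assert swap(A0, A1, omega) == {(0, "x"), (1, "z"), (2, "x")}
```

The proposition asserts that when ${A_0, A_1}$ are tiling solutions agreeing on the left, every such ${A_\omega}$ is again a solution; the toy constraint ${c(n+1) = c'(n)}$ fails this closure property, which is the obstruction described above.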

## August 09, 2021

### Peter Rohde — Apple’s child protection software is not about child safety — it’s a government backdoor

Apple recently announced that the next version of their iOS operating system will include new child protection features. Amongst these is a new image scanning technology that will enable Apple to notify the National Center for Missing & Exploited Children (NCMEC) if a user synchronises images containing child abuse material with iCloud. The technical summary for how this technology works is available here.

The architecture is designed to not rely on direct comparison of images, since this would require Apple to maintain a database of offending images themselves, which clearly is something to be avoided. Instead, it relies on comparing hashes of images, or rather of image ‘signatures’ which characterise images independent of simple changes. These signatures are generated by a machine learning algorithm named NeuralHash that identifies the essential features that characterise images as opposed to raw image data which can easily be reformatted, transformed or obscured with modifications.

For our purposes, all we need to know about hash functions is that they are one-way functions that take an arbitrary piece of data as input and generate a short hash (also known as a digest or checksum) as their output, usually no longer than 256 bits. The one-way nature of these functions means that if you are provided with a hash, it is not computationally feasible to invert it to determine what the actual input was. Hash functions are among our most secure cryptographic primitives and are believed to be robust even against future quantum computers. This provides a very useful means of securely comparing whether two pieces of data are the same without revealing what the data was. For this application, it means that only the party that generated the list of hashes from the original images can connect the dots should a flagged hash be provided to them; any intermediary dealing only with the hashes remains oblivious to what they correspond to.
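The compare-by-digest idea can be illustrated with a standard cryptographic hash. SHA-256 is used here for concreteness; Apple’s NeuralHash is a perceptual hash rather than a cryptographic one, but the matching logic downstream of the hash is the same:

```python
import hashlib

# Comparing data via hashes: a checker holding only a list of digests can
# test for membership without ever seeing, or being able to recover, the
# original items on the list.
def digest(data: bytes) -> str:
    """A 256-bit one-way fingerprint of the input, as 64 hex characters."""
    return hashlib.sha256(data).hexdigest()

# The checker receives only the hash list, never the underlying items.
flagged = {digest(b"known item A"), digest(b"known item B")}

def is_flagged(item: bytes) -> bool:
    # One-way property: nothing here can invert a digest back to its input.
    return digest(item) in flagged

assert is_flagged(b"known item A")
assert not is_flagged(b"something else")
```

This also shows the flip side noted below: the checker has no way to tell what the digests in `flagged` actually correspond to.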

This cryptographic property of hash functions allows Apple to identify offending material without possessing it or uploading it to your phone. However, it simultaneously implies that if the hash list provided to them contains hashes of things unrelated to child protection, it is not possible for them to know, and they retain full plausible deniability should the list be manipulated for other purposes.

Apple is clearly not going to get into the business of validating the integrity of the hash list they are provided with. They’ll very intentionally remain as legally and ethically separated from that process as possible. All they do is provide the NCMEC with appropriate provisions if a sufficient number of flagged hashes are attributed to someone’s account. After that, it’s entirely up to the NCMEC, who by necessity have to operate behind a veil of extreme secrecy, immune to public oversight. The only other examples of such organisations that spring to mind are defence and intelligence agencies, but that would be very bad marketing compared to the more emotionally appealing notion of saving children from sexual predators.

We know that the scheme as advertised is not going to catch pedophiles, by virtue of the fact that Apple has openly stated that filtering only applies to iCloud uploads, and I’m sure that even pedophiles are intelligent enough to recognise that this means you can just turn iCloud off. I’m similarly confident that Tim Cook is intelligent enough to have spotted this for himself, and that the next move is understood: in a future version of iOS there’ll be one of those privacy-policy updates that no one reads but everyone has to agree to. This is exactly what just happened with WhatsApp, the difference being that in WhatsApp’s case it only pertained to releasing metadata, whereas here we’re talking about actual message content.

Far more suspicious is the fact that the filtering is taking place client-side rather than in the cloud. One of the primary motivations driving our technological shift towards cloud computing is that it’s far more efficient for computation to be centralised, which is especially the case when it comes to devices with limited resources like phones. It’s hard to justify the design decision to perform computational filtering of cloud data on the client-side — a complete inversion of usual cloud computing principles — unless being restricted to cloud data isn’t the intention at all.

The fact that Apple is using the same tried-and-tested “save the kids” marketing campaign, one so overused that it acts as an immediate red flag, is telling in itself. To provide some context on just how overused it is (along with “terrorism” — presumably coming in iOS 16), a Google search of the number of hits associated with different encryption-related search terms gives some indication of the relative attention given to the different contexts in which concerns about encryption are raised:

• “encryption child protection”: 13,900,000
• “encryption terrorism”: 10,500,000
• “encryption human centipede”: 9,970,000
• “encryption murder”: 9,190,000
• “encryption chainsaw massacre”: 8,330,000
• “encryption arms trafficking”: 3,410,000
• “encryption getting stabbed by a meth dealer in Logan”: 793,000
• “encryption drug trafficking”: 718,000
• “encryption human trafficking”: 633,000

The priorities here are obviously completely wrong. Poor old murder, who makes an appearance about 20,000 times every year in the United States, is feeling quite undervalued here. And who in their right mind is more concerned about the prospect of getting blown up by Osama bin Laden than being turned into a human centipede?

Many, but not all, of these things are more common and present a greater threat than the subset of pedophiles who lack the intelligence to turn iCloud off, which requires nothing more than flipping a single switch in the settings.

iOS 15 isn’t going to be the last of Apple’s operating system updates. But let’s assume the security model described in the technical white paper remains fixed. How could it be manipulated to provide full back-door access to message content beyond child abuse material?

The obvious next step is for filtering to extend beyond images to include text strings. This has very nefarious political implications given the heavy use of hashtags, code-words and names within political organisations. For example, the Turkish government recently announced that it is investigating the #HelpTurkey hashtag, suspected of being associated with a foreign influence operation. Hashing a hashtag is completely compatible with Apple’s security model. Foreign influence aside, this could be used to identify members of political organisations, something the Indian government has been proactively pursuing.

Alternatively, the machine learning algorithm that generates image signatures could simply be updated to recognise other things. This is an especially opaque part of the computational pipeline. Machine learning classifiers operate very differently from conventional algorithms, whose operation can be understood by inspecting the source code. A machine learning implementation will typically operate according to a fixed algorithm — for example simulating a neural network — whose behaviour is determined by its training parameters: essentially a long list of numbers representing complex multivariate correlations, which collectively allow complex patterns or features to be recognised. This is very similar to how the human brain processes information, where it is the weighting of which neurons communicate with which that determines what we have learned. But just as knowing the structure of a brain does not tell us what it has learned, it is not feasible to reverse-engineer machine learning parameters and determine what the model has learned to recognise. This choice of algorithmic implementation therefore makes it extremely difficult to know what the system is capable of, even with full knowledge of all its parameters, thereby limiting scrutiny.

But with some simple updates to the client-side software the problem extends far further than this. Although it isn’t possible for NCMEC to have a pre-calculated list of hashes for every possible image or every possible piece of text, a simple technical workaround is to communicate hashes on a byte-by-byte basis. If, instead of hashing a full image, we directly hash each individual byte of the image (or other) data, then only 256 pre-calculated hashes need to be stored to enable full reconstruction of an arbitrary data stream via a dictionary attack. On the client side, this requires only a few lines of changes to the operating system code, which have probably already been written but are currently commented out. This very trivial client-side change, easily pushed automatically as part of a future iOS update, requires no change to the advertised security model — which would still only be comparing and passing hashes around — yet would provide NCMEC with full backdoor access to any data stream. The model is already fully future-proofed to enable this; only minor changes to the iOS code are needed to take full advantage of it.
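A short sketch makes the point concrete (Python, purely illustrative): hashing whole files is one-way, but hashing one byte at a time is trivially invertible with a 256-entry lookup table.

```python
import hashlib

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# A server needs only 256 precomputed hashes, one per possible byte value.
rainbow = {h(bytes([b])): b for b in range(256)}

def leak(data: bytes) -> list[str]:
    """What a 'hashes only' pipeline would transmit under per-byte hashing."""
    return [h(bytes([b])) for b in data]

def reconstruct(hashes: list[str]) -> bytes:
    """Dictionary attack: invert each per-byte hash by table lookup."""
    return bytes(rainbow[x] for x in hashes)

secret = b"meet at noon"
assert reconstruct(leak(secret)) == secret  # full recovery from the hashes alone
```

The transmitted stream still consists of nothing but hashes, so it is indistinguishable at the protocol level from the advertised scheme, yet it leaks the entire message.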

It’s hard to know what NCMEC might consider doing with full backdoor privileges, but this capability is certainly something of enormous interest to many branches of government, whose eyes must be watering. A cursory glance at their organisational structure and history raises some immediate issues.

According to the NCMEC Wikipedia page the current President and CEO, John F. Clark, was appointed Director of the United States Marshals Service by George W. Bush, and in 2010 joined defence contractor Lockheed Martin as their director of security operations for information systems and global solutions, and spent seven years working in the Special Operations Group.

Its Board Chair, Karen Tandy, served as the administrator of the United States Drug Enforcement Administration (DEA), also nominated by George W. Bush, and has served since 2016 as vice-chair of the Homeland Security Advisory Council.

And on the issue of emotionally manipulating public opinion, its co-founder, John Walsh, has attracted some controversy:

Some critics accuse Walsh of creating predator panic by using his publicity. Walsh was heard by Congress on February 2, 1983, where he gave an unsourced claim of 50,000 abducted and 1.5 million missing children annually. He testified that the U.S. is “littered with mutilated, decapitated, raped, strangled children,” when in fact, a 1999 Department of Justice study found only 115 incidences of stereotypical kidnappings perpetrated by strangers, about 50 of which resulted in death or the child not being found.

Apple’s child protection initiative needs to be called out for what it is: the integration of a device-level backdoor into its operating system, sending information to a government-backed private agency closely linked to the security establishment, intentionally designed so that Apple has plausible deniability as to what information is being targeted, by necessity completely shielded from public oversight, and all marketed using the usual vernacular that governments deploy for emotional blackmail whenever they can’t get what they want: the implication that you support pedophiles if you disagree.

It’s self-evident that the agenda isn’t to catch pedophiles if they simultaneously tell them how to avoid it. The only way to prevent this software from being misused is to prevent it from being implemented.

## July 31, 2021

### Terence Tao — Varieties of general type with many vanishing plurigenera, and optimal sine and sawtooth inequalities

Louis Esser, Burt Totaro, Chengxi Wang, and I have just uploaded to the arXiv our preprint “Varieties of general type with many vanishing plurigenera, and optimal sine and sawtooth inequalities“. This is an interdisciplinary paper that arose because in order to optimize a certain algebraic geometry construction it became necessary to solve a purely analytic question which, while simple, did not seem to have been previously studied in the literature. We were able to solve the analytic question exactly and thus fully optimize the algebraic geometry construction, though the analytic question may have some independent interest.

Let us first discuss the algebraic geometry application. Given a smooth complex ${n}$-dimensional projective variety ${X}$ there is a standard line bundle ${K_X}$ attached to it, known as the canonical line bundle; ${n}$-forms on the variety become sections of this bundle. The bundle may not actually admit global sections; that is to say, the dimension ${h^0(X, K_X)}$ of global sections may vanish. But as one raises the canonical line bundle ${K_X}$ to higher and higher powers to form further line bundles ${mK_X}$, the number of global sections tends to increase; in particular, the dimension ${h^0(X, mK_X)}$ of global sections (known as the ${m^{th}}$ plurigenus) always obeys an asymptotic of the form

$\displaystyle h^0(X, mK_X) = \mathrm{vol}(X) \frac{m^n}{n!} + O( m^{n-1} )$

as ${m \rightarrow \infty}$ for some non-negative number ${\mathrm{vol}(X)}$, called the volume of the variety ${X}$; this is an invariant that reveals some information about the birational geometry of ${X}$. For instance, if the canonical line bundle is ample (or more generally, nef), this volume is equal to the intersection number ${K_X^n}$ (roughly speaking, the number of common zeroes of ${n}$ generic sections of the canonical line bundle); this is a special case of the asymptotic Riemann-Roch theorem. In particular, the volume ${\mathrm{vol}(X)}$ is a natural number in this case. However, the volume can also be fractional. One can then ask: how small can the volume ${\mathrm{vol}(X)}$ get without vanishing entirely? (By definition, varieties with non-vanishing volume are known as varieties of general type.)

It follows from a deep result obtained independently by Hacon–McKernan, Takayama and Tsuji that there is a uniform lower bound for the volume ${\mathrm{vol}(X)}$ of all ${n}$-dimensional projective varieties of general type. However, the precise lower bound is not known, and the current paper is a contribution towards probing this bound by constructing varieties of particularly small volume in the high-dimensional limit ${n \rightarrow \infty}$. Prior to this paper, the best such constructions of ${n}$-dimensional varieties basically had exponentially small volume, with a construction of volume at most ${e^{-(1+o(1))n \log n}}$ given by Ballico–Pignatelli–Tasin, and an improved construction with a volume bound of ${e^{-\frac{1}{3} n \log^2 n}}$ given by Totaro and Wang. In this paper, we obtain a variant construction with the somewhat smaller volume bound of ${e^{-(1-o(1)) n^{3/2} \log^{1/2} n}}$; the method also gives comparable bounds for some other related algebraic geometry statistics, such as the largest ${m}$ for which the pluricanonical map associated to the linear system ${|mK_X|}$ is not a birational embedding into projective space.

The space ${X}$ is constructed by taking a general hypersurface of a certain degree ${d}$ in a weighted projective space ${P(a_0,\dots,a_{n+1})}$ and resolving the singularities. These varieties are relatively tractable to work with, as one can use standard algebraic geometry tools (such as the Reid–Tai inequality) to provide sufficient conditions to guarantee that the hypersurface has only canonical singularities and that the canonical bundle is a reflexive sheaf, which allows one to calculate the volume exactly in terms of the degree ${d}$ and weights ${a_0,\dots,a_{n+1}}$. The problem then reduces to optimizing the resulting volume given the constraints needed for the above-mentioned sufficient conditions to hold. After working with a particular choice of weights (which consist of products of mostly consecutive primes, with each product occurring with suitable multiplicities ${c_0,\dots,c_{b-1}}$), the problem eventually boils down to trying to minimize the total multiplicity ${\sum_{j=0}^{b-1} c_j}$, subject to certain congruence conditions and other bounds on the ${c_j}$. Using crude bounds on the ${c_j}$ eventually leads to a construction with volume at most ${e^{-0.8 n^{3/2} \log^{1/2} n}}$, but by taking advantage of the ability to “dilate” the congruence conditions and optimizing over all dilations, we are able to improve the ${0.8}$ constant to ${1-o(1)}$.

Now it is time to turn to the analytic side of the paper by describing the optimization problem that we solve. We consider the sawtooth function ${g: {\bf R} \rightarrow (-1/2,1/2]}$, with ${g(x)}$ defined as the unique real number in ${(-1/2,1/2]}$ that is equal to ${x}$ mod ${1}$. We consider a (Borel) probability measure ${\mu}$ on the real line, and then compute the average value of this sawtooth function

$\displaystyle \mathop{\bf E}_\mu g(x) := \int_{\bf R} g(x)\ d\mu(x)$

as well as various dilates

$\displaystyle \mathop{\bf E}_\mu g(kx) := \int_{\bf R} g(kx)\ d\mu(x)$

of this expectation. Since ${g}$ is bounded above by ${1/2}$, we certainly have the trivial bound

$\displaystyle \min_{1 \leq k \leq m} \mathop{\bf E}_\mu g(kx) \leq \frac{1}{2}.$

However, this bound is not very sharp. For instance, the only way in which ${\mathop{\bf E}_\mu g(x)}$ could attain the value of ${1/2}$ is if the probability measure ${\mu}$ was supported on half-integers, but in that case ${\mathop{\bf E}_\mu g(2x)}$ would vanish. For the algebraic geometry application discussed above one is then led to the following question: for a given choice of ${m}$, what is the best upper bound ${c^{\mathrm{saw}}_m}$ on the quantity ${\min_{1 \leq k \leq m} \mathop{\bf E}_\mu g(kx)}$ that holds for all probability measures ${\mu}$?

If one considers the deterministic case in which ${\mu}$ is a Dirac mass supported at some real number ${x_0}$, then the Dirichlet approximation theorem tells us that there is ${1 \leq k \leq m}$ such that ${x_0}$ is within ${\frac{1}{m+1}}$ of an integer, so we have

$\displaystyle \min_{1 \leq k \leq m} \mathop{\bf E}_\mu g(kx) \leq \frac{1}{m+1}$

in this case, and this bound is sharp for deterministic measures ${\mu}$. Thus we have

$\displaystyle \frac{1}{m+1} \leq c^{\mathrm{saw}}_m \leq \frac{1}{2}.$
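The deterministic (Dirac-mass) side of this bound is easy to confirm numerically. The following Python sketch (purely illustrative) implements ${g}$ directly from its definition and checks the Dirichlet bound at random points:

```python
import math
import random

def g(x: float) -> float:
    """Sawtooth: the unique representative of x mod 1 lying in (-1/2, 1/2]."""
    return x - math.ceil(x - 0.5)

# Sanity checks on the definition.
assert g(0.5) == 0.5 and g(0.75) == -0.25 and g(3.0) == 0.0

# For a Dirac mass at x0, Dirichlet approximation guarantees some k <= m
# with k*x0 within 1/(m+1) of an integer, hence min_k g(k*x0) <= 1/(m+1).
random.seed(0)
for m in range(1, 30):
    for _ in range(200):
        x0 = random.uniform(0.0, 1.0)
        assert min(g(k * x0) for k in range(1, m + 1)) <= 1 / (m + 1) + 1e-12
```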

However, both of these bounds turn out to be far from the truth, and the optimal value of ${c^{\mathrm{saw}}_m}$ is comparable to ${\frac{\log 2}{\log m}}$. In fact we were able to compute this quantity precisely:

Theorem 1 (Optimal bound for sawtooth inequality) Let ${m \geq 1}$.
• (i) If ${m = 2^r}$ for some natural number ${r}$, then ${c^{\mathrm{saw}}_m = \frac{1}{r+2}}$.
• (ii) If ${2^r < m \leq 2^{r+1}}$ for some natural number ${r}$, then ${c^{\mathrm{saw}}_m = \frac{2^r}{2^r(r+1) + m}}$.
In particular, we have ${c^{\mathrm{saw}}_m = \frac{\log 2 + o(1)}{\log m}}$ as ${m \rightarrow \infty}$.

We establish this bound through duality. Indeed, suppose we could find non-negative coefficients ${a_1,\dots,a_m}$ such that one had the pointwise bound

$\displaystyle \sum_{k=1}^m a_k g(kx) \leq 1 \ \ \ \ \ (1)$

for all real numbers ${x}$. Integrating this against an arbitrary probability measure ${\mu}$, we would conclude

$\displaystyle (\sum_{k=1}^m a_k) \min_{1 \leq k \leq m} \mathop{\bf E}_\mu g(kx) \leq \sum_{k=1}^m a_k \mathop{\bf E}_\mu g(kx) \leq 1$

and hence

$\displaystyle c^{\mathrm{saw}}_m \leq \frac{1}{\sum_{k=1}^m a_k}.$

Conversely, one can find lower bounds on ${c^{\mathrm{saw}}_m}$ by selecting suitable candidate measures ${\mu}$ and computing the means ${\mathop{\bf E}_\mu g(kx)}$. The theory of linear programming duality tells us that this method must give us the optimal bound, but one has to locate the optimal measure ${\mu}$ and optimal weights ${a_1,\dots,a_m}$. This we were able to do by first doing some extensive numerics to discover these weights and measures for small values of ${m}$, then doing some educated guesswork to extrapolate these examples to the general case, and finally verifying the required inequalities. In case (i) the situation is particularly simple, as one can take ${\mu}$ to be the discrete measure that assigns a probability ${\frac{1}{r+2}}$ to the numbers ${\frac{1}{2}, \frac{1}{4}, \dots, \frac{1}{2^r}}$ and the remaining probability of ${\frac{2}{r+2}}$ to ${\frac{1}{2^{r+1}}}$, while the optimal weighted inequality (1) turns out to be

$\displaystyle 2g(x) + \sum_{j=1}^r g(2^j x) \leq 1$

which is easily proven by a telescoping argument. However, the general case turned out to be significantly trickier to work out, and the verification of the optimal inequality required a delicate case analysis (reflecting the fact that equality is attained in this inequality in a large number of places).
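Case (i) can be checked numerically. The Python sketch below (purely illustrative) samples the pointwise inequality ${2g(x) + \sum_{j=1}^r g(2^j x) \leq 1}$ at random points and confirms that the discrete measure described above attains the value ${\frac{1}{r+2}}$:

```python
import math
import random

def g(x: float) -> float:
    """Sawtooth: the unique representative of x mod 1 lying in (-1/2, 1/2]."""
    return x - math.ceil(x - 0.5)

random.seed(1)
for r in range(1, 6):
    # Pointwise inequality: 2 g(x) + sum_{j=1}^r g(2^j x) <= 1 for every x.
    for _ in range(2000):
        x = random.uniform(-2.0, 2.0)
        assert 2 * g(x) + sum(g(2**j * x) for j in range(1, r + 1)) <= 1 + 1e-9

    # The measure putting mass 1/(r+2) at 1/2, ..., 1/2^r and mass 2/(r+2)
    # at 1/2^(r+1) attains the optimal value c^saw_m = 1/(r+2) for m = 2^r.
    atoms = [(0.5**j, 1 / (r + 2)) for j in range(1, r + 1)]
    atoms.append((0.5**(r + 1), 2 / (r + 2)))
    means = [sum(p * g(k * x) for x, p in atoms) for k in range(1, 2**r + 1)]
    assert abs(min(means) - 1 / (r + 2)) < 1e-9
```

For this measure the means ${\mathop{\bf E}_\mu g(kx)}$ in fact come out equal for every ${1 \leq k \leq 2^r}$, which is why the minimum hits ${\frac{1}{r+2}}$ exactly.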

After solving the sawtooth problem, we became interested in the analogous question for the sine function, that is to say what is the best bound ${c^{\sin}_m}$ for the inequality

$\displaystyle \min_{1 \leq k \leq m} \mathop{\bf E}_\mu \sin(kx) \leq c^{\sin}_m.$

The left-hand side is the smallest imaginary part of the first ${m}$ Fourier coefficients of ${\mu}$. To our knowledge this quantity has not previously been studied in the Fourier analysis literature. By adopting a similar approach to the one used for the sawtooth problem, we were able to compute this quantity exactly as well:

Theorem 2 For any ${m \geq 1}$, one has

$\displaystyle c^{\sin}_m = \frac{m+1}{2 \sum_{1 \leq j \leq m: j \hbox{ odd}} \cot \frac{\pi j}{2m+2}}.$

In particular,

$\displaystyle c^{\sin}_m = \frac{\frac{\pi}{2} + o(1)}{\log m}.$

Interestingly, a closely related cotangent sum recently appeared in this MathOverflow post. Verifying the lower bound on ${c^{\sin}_m}$ boils down to choosing the right test measure ${\mu}$; it turns out that one should pick the probability measure supported on the points ${\frac{\pi j}{2m+2}}$ with ${1 \leq j \leq m}$ odd, with probability proportional to ${\cot \frac{\pi j}{2m+2}}$, and the lower bound verification eventually follows from the classical identity

$\displaystyle \frac{m+1}{2} = \sum_{1 \leq j \leq m; j \hbox{ odd}} \cot \frac{\pi j}{2m+2} \sin \frac{\pi jk}{m+1}$

for ${1 \leq k \leq m}$, first posed by Eisenstein in 1844 and proved by Stern in 1861. The upper bound arises from establishing the trigonometric inequality

$\displaystyle \frac{2}{(m+1)^2} \sum_{1 \leq k \leq m; k \hbox{ odd}} \cot \frac{\pi k}{2m+2} ( (m+1-k) \sin kx + k \sin(m+1-k)x ) \leq 1$

for all real numbers ${x}$, which to our knowledge is new; the left-hand side has a Fourier-analytic interpretation as the convolution of the Fejér kernel with a certain discretized square wave function, and this interpretation is used heavily in our proof of the inequality.
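Both the closed form of Theorem 2 and the Eisenstein–Stern identity lend themselves to a quick numerical sanity check (Python, purely illustrative):

```python
import math

def c_sin(m: int) -> float:
    """Theorem 2 closed form: (m+1) / (2 * sum over odd j of cot(pi j/(2m+2)))."""
    s = sum(1 / math.tan(math.pi * j / (2 * m + 2)) for j in range(1, m + 1, 2))
    return (m + 1) / (2 * s)

# Eisenstein--Stern identity: for all 1 <= k <= m,
#   (m+1)/2 = sum over odd j of cot(pi j/(2m+2)) * sin(pi j k/(m+1)).
for m in range(1, 40):
    for k in range(1, m + 1):
        total = sum(
            math.sin(math.pi * j * k / (m + 1)) / math.tan(math.pi * j / (2 * m + 2))
            for j in range(1, m + 1, 2)
        )
        assert abs(total - (m + 1) / 2) < 1e-8

# c^sin_m decays like (pi/2)/log m; for example c_sin(1000) is roughly 0.2.
```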

## July 30, 2021

### Georg von Hippel — Impressions from LATTICE 2021, Part II

As it turns out, Gather is not particularly stable. There are frequent glitches with the audio and video functionality, where people can't hear or can't be heard. The platform also doesn't run particularly well on Firefox (apparently only Google Chrome is officially supported) – for a web-based platform that seems almost inexcusable, since the web is all about free and open standards. One possible reason for the glitches may be the huge number of tracking scripts, cookies, beacons and other privacy-invasive technology that Gather tries to afflict its users with.

That being said, it is hard to imagine a much better principle for a virtual conference than the one Gather tries to implement, although I'm not entirely sure that automatically connecting to audio/video when simply walking past a person is necessarily the best approach.

Moving past the technical surface to the actual content of the conference, one major difference between Lattice 2021 and the typical lattice conference is the emphasis on special sessions, many of which (like the career panels) specifically target a younger audience. This makes a lot of sense given that the tiny conference fee and non-existent travel costs make this conference particularly affordable for early-stage graduate students who would normally not have the means to attend a conference. It is very praiseworthy that the LOC made a refreshing lemonade out of some Covid-infested lemons and created a conference that takes advantage of the otherwise rather unfortunate situation we all find ourselves in!

## July 26, 2021

### Georg von Hippel — Impressions from LATTICE 2021, Part I

Since this year's lattice conference is fully online, and anyone who can access my blog can also attend the conference, with the slides available on the Indico page, and the Zoom sessions recorded, I don't think that my blogging the conference in the usual manner would be as useful as in an ordinary year. This is especially true given that there are over 800 participants this year (which beats Lattice 2013 in Mainz for the record attendance at a lattice conference), so I think it is a fair assumption that everyone who cares about the finer points of lattice field theory is at the conference already and doesn't really need a detailed description of the contents of the plenary talks as filtered through my own preoccupations, preconceptions, and prejudices.

I will thus just provide some brief impressions from the conference on this blog.

The Zoom sessions with the talks worked very well so far, but the Gather environment seems to be a bit buggy still (sometimes people walking past dragged my avatar along, and the video/audio connections were slow to start and sometimes a bit laggy; also, private spaces seemed not to work as intended, and sometimes it was impossible to walk through a door into another room). Also, having to spread one's attention between up to three platforms (Zoom, Gather, Slack) makes the cognitive load a bit heavy at least for me. That being said, the organizers deserve great applause for having done their utmost to provide as "real" a conference experience as possible under the present circumstances.

### John Preskill — Quantum steampunk is heading to bookstores!

I’m publishing a book! Quantum Steampunk: The Physics of Yesterday’s Tomorrow is hitting bookstores next spring, and you can preorder it now.

As Quantum Frontiers regulars know, steampunk is a genre of literature, art and film. Steampunkers fuse 19th-century settings (such as Victorian England, the Wild West, and Meiji Japan) with futuristic technologies (such as dirigibles, time machines, and automata). So does my field of research, a combination of thermodynamics, quantum physics, and information processing.

Thermodynamics, the study of energy, developed during the Industrial Revolution. The field grew from practical concerns (How efficiently can engines pump water out of mines?) but wound up addressing fundamental questions (Why does time flow in only one direction?). Thermodynamics needs re-envisioning for 21st-century science, which spotlights quantum systems—electrons, protons, and other basic particles. Early thermodynamicists couldn’t even agree that atoms existed, let alone dream that quantum systems could process information in ways impossible for nonquantum systems. Over the past few decades, we’ve learned that quantum technologies can outperform their everyday counterparts in solving certain computational problems, in securing information, and in transmitting information. The study of quantum systems’ information-processing power forms a mathematical and conceptual toolkit, quantum information science. My colleagues and I leverage this toolkit to reconceptualize thermodynamics. As we combine a 19th-century framework (thermodynamics) with advanced technology (quantum information), I call our field quantum steampunk.

Glimpses of quantum steampunk have surfaced on this blog throughout the past eight years. The book is another animal, a 15-chapter closeup of the field. The book sets the stage with introductions to information processing, quantum physics, and thermodynamics. Then, we watch these three perspectives meld into one coherent whole. We tour the landscape of quantum thermodynamics—the different viewpoints and discoveries championed by different communities. These viewpoints, we find, offer a new lens onto the rest of science, including chemistry, black holes, and materials physics. Finally, we peer through a brass telescope to where quantum steampunk is headed next. Throughout the book, the science interleaves with anecdotes, history, and the story of one woman’s (my) journey into physics—and with snippets from a quantum-steampunk novel that I’ve dreamed up.

On this blog, different parts of my posts are intended for different audiences. Each post contains something for everyone, but not everyone will understand all of each post. In contrast, the book targets the general educated layperson. One of my editors majored in English, and another majored in biology, so the physics should come across clearly to everyone (and if it doesn’t, blame my editors). But the book will appeal to physicists, too. Reviewer Jay Lawrence, a professor emeritus of Dartmouth College’s physics department, wrote, “Presenting this vision [of quantum thermodynamics] in a manner accessible to laypeople discovering new interests, Quantum Steampunk will also appeal to specialists and aspiring specialists.” This book is for you.

Strange to say, I began writing Quantum Steampunk under a year ago. I was surprised to receive an email from Tiffany Gasbarrini, a senior acquisitions editor at Johns Hopkins University Press, in April 2020. Tiffany had read the article I’d written about quantum steampunk for Scientific American. She wanted to expand the press’s offerings for the general public. Would I be interested in writing a book proposal? she asked.

Not having expected such an invitation, I poked around. The press’s roster included books that caught my eye, by thinkers I wanted to meet. From Wikipedia, I learned that Johns Hopkins University Press is “the oldest continuously running university press in the United States.” Senior colleagues of mine gave the thumbs-up. So I let my imagination run.

I developed a table of contents while ruminating on long walks, which I’d begun taking at the start of the pandemic. In late July, I submitted my book proposal. As the summer ended, I began writing the manuscript.

Writing the first draft—73,000 words—took about five months. The process didn’t disrupt life much. I’m used to writing regularly; I’ve written one blog post per month here since 2013, and I wrote two novels during and after college. I simply picked up my pace. At first, I wrote only on weekends. Starting in December 2020, I wrote 1,000 words per day. The process wasn’t easy, but it felt like a morning workout—healthy and productive. That productivity fed into my science, which fed back into the book. One of my current research projects grew from the book’s epilogue. A future project, I expect, will evolve from Chapter 5.

As soon as I finished draft one—last January—Tiffany and I hunted for an illustrator. We were fortunate to find Todd Cahill, a steampunk artist. He transformed the poor sketches that I’d made into works of art.

Early this spring, I edited the manuscript. That edit was to a stroll as the next edit was to the Boston Marathon. Editor Michael Zierler coached me through the marathon. He identified concepts that needed clarification, discrepancies between explanations, and analogies that had run away with me—as well as the visions and turns of phrase that delighted him, to balance the criticism. As Michael and I toiled, 17 of my colleagues were kind enough to provide feedback. They read sections about their areas of expertise, pointed out subtleties, and confirmed facts.

Soon after Michael and I crossed the finish line, copyeditor Susan Matheson took up the baton. She hunted for typos, standardized references, and more. Come June, I was editing again—approving and commenting on her draft. Simultaneously, Tiffany designed the cover, shown above, with more artists. The marketing team reached out, and I began planning this blog post. Scratch that—I’ve been dreaming about this blog post for almost a year. But I forced myself not to spill the beans here till I told the research group I’ve been building. I told them about the book two Thursdays ago, and I hope that book critics respond as they did.

Every time I’ve finished a draft, my husband and I have celebrated by ordering takeout sandwiches from our favorite restaurant. Three sandwich meals are down, and we have one to go.

Having dreamed about this blog post for a year, I’m thrilled to bits to share my book with you. It’s available for preordering, and I encourage you to support your local bookstore by purchasing through bookshop.org. The book is available also through Barnes & Noble, Amazon, Waterstones, and the other usual suspects. For press inquiries, or to request a review copy, contact Kathryn Marguy at kmarguy@jhu.edu.

Over the coming year, I’ll continue sharing about my journey into publishing—the blurbs we’ll garner for the book jacket, the first copies hot off the press, the reviews and interviews. I hope that you’ll don your duster coat and goggles (every steampunker wears goggles), hop into your steam-powered gyrocopter, and join me.

## July 23, 2021

### Robert Helling — Email is broken --- the spammers won

I am an old man. I am convinced email is a superior medium for person-to-person information exchange. It is text based, so you don't need special hardware to use it; it can still transport various media formats; and it is interoperable: you are not tied to one company offering a service, since thanks to a long list of RFCs starting with number 822 everybody can run their own. Via GPG or S/MIME you can even add somewhat decent encryption and authentication (at least for the body of the message), even though that is not optimal since historically it came later.

But the real advantage is on the client side: You have threading, you have the option to have folders to store your emails, you can search through them, you can set up filters so routine stuff does not clog up your inbox. And when things go really pear shaped (as they tend to do every few years), you can still recover your data from a text file on your hard drive.

This is all opposed to the continuous Now of messengers where already yesterday's message has scrolled up never to be seen again. But the young folks tend to prefer those and I cannot see any reason those would be preferable (except maybe that you get notifications on your phone). But up to now, it was still an option to say to our students "if you want to get all relevant information, check your email, it will be there".

But today, I think, this is finally over. The spammers won.

Over the last months of the pandemic, I had already had to realise that mailing lists are getting harder and harder to use. As far as I can tell, nobody offers decent mailing lists on the net (at least not without significant cost and with reasonable data protection ambitions), probably because those would immediately be abused by spammers. So you have to run your own. For the families of our kid's classes at school and for daycare, to spread information that could no longer be posted on physical notice boards, I tried to set up lists with mailman, like we used to do, on one of my hosted servers. Oh, what a pain. There are so many hoops you have to jump through before the messages actually end up in people's inboxes. For some major email providers (I am looking at you, hosteurope) there is little chance of delivery unless, for example, the sender is in the recipient's address book. And yes, I have set up my server correctly, with reverse DNS etc. working.

But now, it's application season again. We have received over 250 applications for the TMP master program, and now I have to inform applicants about their results. The boundary conditions are that I want to send an individual mail to everybody that contains their name, and I want to digitally sign it so applicants know it is authentic and not sent by some prankster impersonating me (that had happened in the past, seriously). And I don't want to send those manually.

So I have a perl script (did I mention I am old), that reads the applicants' data from a database, composes the message with the appropriate template, applies a GPG signature and sends it off to the next SMTP server.

In the past, I had already learned that inside the university network, you have to sleep 10 seconds after each message as every computer that sends emails at a higher rate is considered taken over by some spammers and automatically disconnected from the network for 30 minutes.
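For illustration, here's a minimal Python sketch of such a mail merge (the Perl original isn't shown; the template text, field names, and the SMTP details here are hypothetical, and the GPG clearsigning step is elided to keep the sketch short):

```python
import smtplib
import time
from email.message import EmailMessage
from string import Template

# Hypothetical template; the real script fills it from a database.
TEMPLATE = Template("Dear $name,\n\nYour TMP application result: $result\n")

def compose(name, result, sender, recipient):
    """Fill the template and build one personalised message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Your TMP master program application"
    # In the real script, the body would be GPG-clearsigned here
    # before being set as the message content.
    msg.set_content(TEMPLATE.substitute(name=name, result=result))
    return msg

def send_all(applicants, smtp_host, sender):
    """Send one mail per applicant, throttled to stay under the rate limit."""
    with smtplib.SMTP(smtp_host) as smtp:
        for a in applicants:
            smtp.send_message(
                compose(a["name"], a["result"], sender, a["email"]))
            time.sleep(10)  # one mail per 10 s, per the campus policy
```

The 10-second sleep is the crude but effective way to avoid being flagged as a compromised machine.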

This year, I am sending from the home office and as it turns out, some of my messages never make it to the recipients. There is of course no error message to me (thanks spammers) but the message seems to be silently dropped. It's not in the inbox and also cannot be found in the spam folder. Just gone.

I will send paper letters as well (those are required for legal reasons anyway) with a similar script and thanks to TeX being text based on the input side. But is this really the answer for 2021? How am I supposed to inform people I have not (yet) met from around the world in a reliable way?

I am lost.

## July 01, 2021

### Peter Rohde — A case for universal basic income

Within modern libertarian-leaning circles, there has been increasingly vocal support for introducing a Universal Basic Income (UBI) — an unconditional state-financed income guarantee for all citizens. This is often met with surprise from those more social-democratic leaning, who, interestingly, are increasingly supportive of the idea too, albeit approaching it from the social rather than the economic angle. I believe the arguments from both sides are valid, and regardless of which way you vote or whether you lean left or right, we should support the introduction of a UBI as a replacement for our existing welfare systems.

#### Government efficiency

Perhaps the biggest gripe with current social welfare spending is how inefficient governments are at handling it, which is hardly surprising, given that governments simply aren’t very strongly incentivised to be efficient. A UBI is the most simple and efficient social spending program that can be envisioned and owing to its inherent universality and uniformity it disempowers politicians from manipulating it for political purposes.

We can effectively eliminate all forms of existing welfare (including unemployment benefits, aged and disability pensions, maternity and paternity leave, carers’ allowances, and student income support) and the associated bureaucratic infrastructure, replacing them with a script that makes automated periodic bank transfers to everyone. The entire existing infrastructure can be completely dismantled and put to rest, along with all its operating costs and overheads, and the associated labour put to better use than paper-pushing.

This isn’t only more efficient for the government, but also for recipients, who are no longer burdened with battling against bureaucracy and can instead turn their attention to more productive things.

#### Benefits for the unemployed

A common and completely correct criticism of conventional unemployment benefits is that they create ‘poverty traps’, induced by the extremely high effective marginal tax rate (EMTR) imposed upon the unemployed when they gain employment: the new income is discounted by the loss of benefits. The effective pay increase is the difference between the wage and the withdrawn benefits (not very much for a low-income worker), yet the corresponding increase in workload is from zero to full-time. This is equivalent to paying a very high rate of income tax, typically far higher than the highest marginal rates in even highly progressive systems. It creates an enormous disincentive to seek employment, since people aren’t likely to switch from unemployment to full-time work if it only earns them an extra cup of coffee. With an unconditional UBI in place, any new income earned is not discounted by benefit reductions, and the person’s marginal income remains their actual income.
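To make the arithmetic concrete, here is a toy calculation with made-up figures: a 15,000/yr benefit fully withdrawn upon taking a 22,000/yr job taxed at a flat 20%:

```python
benefit = 15_000      # annual unemployment benefit, withdrawn on employment
gross_wage = 22_000   # annual full-time gross wage
tax_rate = 0.20       # flat income tax, for simplicity

net_wage = gross_wage * (1 - tax_rate)   # 17,600 take-home
gain = net_wage - benefit                # only 2,600 for going full-time

# EMTR: the fraction of gross pay lost to tax plus withdrawn benefits.
emtr = 1 - gain / gross_wage
print(f"EMTR without UBI: {emtr:.0%}")   # roughly 88%

# With an unconditional UBI of the same size, nothing is withdrawn:
# the marginal rate on the new wage is just the income tax.
emtr_ubi = tax_rate
print(f"EMTR with UBI: {emtr_ubi:.0%}")  # 20%
```

An 88% effective marginal rate for moving from zero to full-time work is the poverty trap in numbers.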

Eliminating this poverty trap barrier would foreseeably result in far greater incentive amongst the unemployed to seek and hold onto employment. Additionally, by not being stuck in a situation where people have the “I have to take any job I can get otherwise I’ll be homeless” mindset, people are more likely to find themselves in jobs that suit them and provide a positive future outlook.

Additionally, eliminating the minimum award wage prevents low-skilled workers from being priced out of the labour market altogether. Inevitably, some workers will be unable to readily find employment in which their skillset yields productivity above that imposed threshold, thereby condemning them to unemployment.

#### Benefits for workers

Amongst low-skilled workers in particular, there is significant bargaining asymmetry against employers — to the employee, everything might depend on keeping their job, whereas to the employer the cost of replacing them is relatively low. A UBI to a significant extent offsets the former issue, placing workers into a stronger workplace bargaining position without the need for regulatory mechanisms of highly questionable efficacy to attempt to enforce it. This increases the labour market competitiveness of workers.

The same argument extends to self-employment. A low-skilled worker might self-employ by undertaking odd jobs, or contract- or app-based work, which don’t provide long-term job security but ought to be encouraged.

#### Labour market mobility

Throughout much of the developed world, savings rates are very low, providing a significant hindrance to labour market mobility — the ability for people to switch between jobs. Since it can be very challenging to line up a new job timed exactly right to match the exit from an existing one, transitioning between jobs is effectively disincentivised by this savings barrier. This creates an employment trap that inhibits people from placing themselves into the jobs to which they are best suited and in which they are most likely to be successful and productive.

Increased mobility in the labour market is enormously beneficial not only to workers but also to employers, and reducing the inefficiencies that undermine mobility has huge economic benefits. This doesn’t only apply to upward mobility, whereby people shift into jobs with higher income, but especially to sideways mobility, where people change jobs to better suit their conditions or gain new skills. A UBI improves overall labour market mobility by ensuring that employees aren’t economically trapped in jobs that aren’t right, or where they could find something better, which isn’t a good outcome for any of us, either individually or collectively.

#### How to introduce a UBI

Given that introducing a UBI represents a major structural reform, it is important to be mindful of minimising disruption during implementation. If poorly executed it could shock the system and cause destabilisation. A suggestion for how to smoothly go about implementing this is as follows.

Take all existing employment contracts and automatically overwrite them to subtract the introduced UBI amount from the specified income, with no other contractual changes — the legislation effectively negotiates down all salaries by an amount equivalent to the newly introduced UBI, which everyone now receives. The UBI amount is effectively contractually refunded back to the employer via reduced liability, with no perceived change in income by employees.

We could similarly eliminate mandatory superannuation — a hugely inefficient and inequitable system — instead converting whatever fraction is necessary into income taxation to finance the component of the UBI stream that replaces the pension and allowing workers to retain the rest.

There is one obvious pitfall in the above scheme: at the bottom end of the labour productivity spectrum, legislatively renegotiated incomes become extremely small, or indeed vanish in the limit where the UBI equals the previous minimum award wage. Clearly it isn’t viable for those employees to accept that. But that isn’t to say that their labour is actually worth nothing. The purpose of the legislative override is only to balance the various money flows, but it simultaneously skews incentives.

Two ways to navigate this issue are:

1. Complement the legislation with the right for employees (but not employers) to exit their contracts and reapply with a newly negotiated salary. Employees can realistically negotiate their salaries upward at this point, given that their labour still has value; however, they would now be doing so against a bigger pool of competitors, given that the minimum wage has been abolished. The new market value is expected to be less than before (due to increased labour market competition).
2. Introduce the UBI gradually over time, discounting the pre-existing benefits stream by an equal amount until over time it evaporates entirely and can be abolished. This has the advantage of mitigating the sudden labour market shocks that a spontaneous introduction would induce, and allows the market to find new pricing for labour on its own, without top-down intervention.

Given that employers have now been granted an employment cost reduction equivalent to the UBI paid to their employees, the cost of the UBI can be recovered via new tax liabilities on employers through company tax. For this to remain budget neutral, the new tax liabilities would in total be greater than before, since they must now also cover those who are unemployed but receive the UBI: a cost proportional to the unemployment rate, offset by whatever the previous, far less efficient unemployment benefits schemes cost. All things combined, in countries with low unemployment and/or highly inefficient public sectors, this is unlikely to be a major burden.

To maintain parity on the income tax side, existing income tax rates should be renormalised such that income and UBI combined are taxed equivalently to previous incomes without the UBI. This implies that workers’ marginal tax rates are pushed upwards to remain tax revenue neutral, since workers are effectively promoted into higher tax brackets.
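A hedged single-bracket sketch of the renormalisation (figures invented; real tax schedules are progressive): a worker previously on 60,000 taxed at 30% keeps the same total income and pays the same total tax, but the rate applied to the reduced contract wage must rise:

```python
ubi = 20_000
old_salary = 60_000
old_rate = 0.30
old_tax = old_rate * old_salary   # 18,000 raised before the reform

# The contract wage drops by the UBI; total income is unchanged
# because the worker receives the UBI on top.
new_wage = old_salary - ubi       # 40,000 taxable wage

# Revenue neutrality: the same 18,000 must come from the smaller
# taxed wage, so the marginal rate is pushed upward.
new_rate = old_tax / new_wage
print(f"renormalised rate: {new_rate:.0%}")   # 45% instead of 30%
```

The jump from 30% to 45% is exactly the "promotion to a higher tax bracket" described above.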

Now everything is roughly financially equivalent from the perspective of both employees and employers, the primary difference being that the unemployed have now been absorbed into this new system, with the previous one being dismantled.

#### Benefits for employers

From the employer’s perspective, especially for small businesses employing low-skilled workers, this is a massive boon, since the cost of employment is reduced to the differential between what it previously was and what the new UBI provides. However, they’re receiving the same labour in exchange, so the ratio between the original labour cost (the worker’s full income) and the reduced labour cost (indirectly subsidised via the UBI) provides a multiplier on the productivity of their employees. This ratio is largest for low-income workers, thereby stimulating demand for their employment. Businesses are now better off to the tune of a UBI per employee, which can be recovered via company tax adjustments to finance the UBI.
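A sketch with invented figures (assuming a 20,000 UBI and contracts renegotiated down by that amount) shows the multiplier, original cost divided by post-UBI contract cost, is largest at the low end of the income spectrum:

```python
def cost_multiplier(full_income, ubi=20_000):
    """Original labour cost divided by the post-UBI contract cost."""
    return full_income / (full_income - ubi)

# The same labour now costs the employer less, and the effect is
# strongest for low-income workers.
for income in (30_000, 60_000, 150_000):
    print(f"{income}: x{cost_multiplier(income):.2f}")
```

With these numbers, a 30,000 worker costs the employer only 10,000 directly (a 3x multiplier), while a 150,000 worker's multiplier is barely above 1.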

#### Social benefits

It’s easy to see the attractiveness of this from a social-democratic perspective. But not just in the trivial sense that it directly undermines poverty. More generally, it places every member of society in a position of greater independence. People in abusive relationships are better able to leave. People have a greater genuine choice in making life decisions, both personally and professionally, and are less easily coerced. People who need lifestyle or career flexibility for health reasons are in a far stronger position. And perhaps most importantly, the impact on general mental health within society is likely to be significant, by removing enormous insecurities, particularly during periods of labour market volatility.

#### A more powerful monetary tool

During the course of the current COVID-induced economic crisis, it has become clear that central banks have exhausted their conventional monetary tools for stimulating consumption, and have entered a cycle of concocting ever more elaborate and extreme new ones out of desperation. This has arisen because their tools are largely limited to manipulating the money supply via interaction with financial markets. But that isn’t where consumption takes place; it’s where investments are made and assets held. Consumption is driven down here, not up there.

With well-considered legislation in place, a UBI provides a monetary tool for stimulating consumer spending via the injection of additional funds — in addition to the regular UBI flow under the umbrella of fiscal policy, central banks have the option of supplementing it with monetary stimulus. This provides the most direct possible mechanism for stimulating consumer price inflation (CPI) as opposed to conventional monetary stimulus via open market operations (OMOs) which largely affect asset price inflation. Unlike manipulating income tax rates, this is not conditional on people being employed (and less likely to spend than those who are not), and the latency between pulling monetary levers and their impact on CPI is minimised.

This is not to advocate for more monetary intervention by governments, who have consistently demonstrated themselves not especially competent in doing so, but rather provides tools that are more effective in achieving what they’re ostensibly intending to do. It doesn’t take a genius to see that when interest rates are near zero, pumping more cash off the press into financial markets isn’t going to be hugely effective at stimulating consumption. This represents a trickle-up rather than trickle-down approach to economic stimulus, the latter being oft criticised as being ineffective, which it is.

#### Economic & social robustness

During COVID, various governments scrambled to introduce mechanisms to protect businesses from the catastrophic impact of having neither workers nor clients. One approach, which seems to have worked reasonably well here in Australia (the so-called JobKeeper scheme) was for the government to temporarily pay the incomes of companies’ employees, thereby allowing businesses to go into hibernation and be insulated from payroll liabilities, ensuring that existing employees’ employment is retained. A UBI would have made this kind of response far less chaotic.

A UBI provides significant robustness against future pandemic scenarios and other temporary but catastrophic labour market shocks. There is no need to rush new legislation into place to absorb such shocks; the infrastructure is already there. Companies, especially small businesses with limited financial buffers, needn’t wait for governments to decide if and when they will consider programs like JobKeeper, which has seemingly been very effective at shielding businesses from the consequences of their employees being prevented from working.

#### Small government

While a UBI is a ‘big government’ initiative and therefore might be unpalatable to many on the political right, it needs to be seen on a comparative basis relative to what we currently have: a social welfare system that ostensibly aims to achieve the things a UBI would, but does so with massive bureaucratic overhead and inefficiencies that in many ways create more problems than they solve.

Dismantling a hugely complicated set of systems focussing on different aspects of social welfare and reducing them to a single streamlined one that requires very little overhead to maintain is very much aligned with small government philosophy and massively reduces the number of levers the government has to play with and therefore play politics with.

#### The redundancy of human labour

It seems likely a UBI will become inevitable at some point in time based on the expectation that as automation and AI continue to advance, increasing numbers of workers will simply not be able to compete with the productivity of machines. There’s only so much retraining we can do before trying to do something that machines can’t do better becomes futile. And this isn’t an expectation that applies only to low-skilled or manual labour, but inevitably will over time impact the competitiveness of even the most highly skilled jobs. With this kind of inevitable future outlook, the conventional libertarian philosophy of “leave the market to itself” will simply not be capable of addressing society’s needs.

We are in an era where automation dominates productivity, a trend that will inevitably continue escalating exponentially. It simply isn’t plausible that in the long term we will be able to compete against the machines we invented, and state intervention to subsidise income will become necessary to ever-increasing extents. A UBI is the most practical and efficient way to achieve this, with the most positive outcomes both economically and socially. It would be complete intellectual capitulation for libertarians and conservatives not to allow free-market philosophy to evolve and accommodate the hard reality that our labour is becoming worthless, and a UBI is the best way to address this in a manner entirely congruent with the remainder of free-market philosophy. As we outsource our productivity to machines, we mustn’t outsource all the returns too.

The fact that a UBI implies higher marginal tax rates, even if net tax revenue remains the same, inevitably raises the perfectly legitimate criticism that this skews incentives and may undermine competitiveness. There is truth in that. However, while higher marginal rates act as a disincentive, they are counteracted by newly created incentives: more competitive workers, more streamlined government, a more efficient and less regulated labour market with greatly increased mobility, and, most importantly, all those who were previously priced or regulated out of the labour market altogether no longer face those roadblocks and are able to compete.

Businesses, which now pay higher tax rates to effectively reimburse the UBI, gain the new incentive that labour costs have been reduced, providing a multiplier on the return on investment in labour, and face fewer regulatory barriers when negotiating employment contracts, since we can now safely dismantle many existing employment regulations.

#### Vote Script for President

Ladies & Gentlemen — Allow me to introduce our new Minister for Social Services.

## June 27, 2021

### John Preskill — Cutting the quantum mustard

I had a relative to whom my parents referred, when I was little, as “that great-aunt of yours who walked into a glass door at your cousin’s birthday party.” I was a small child in a large family that mostly lived far away; little else distinguished this great-aunt from other relatives, in my experience. She’d intended to walk from my grandmother’s family room to the back patio. A glass door stood in the way, but she didn’t see it. So my great-aunt whammed into the glass; spent part of the party on the couch, nursing a nosebleed; and earned the epithet via which I identified her for years.

After growing up, I came to know this great-aunt as a kind, gentle woman who adored her family and was adored in return. After growing into a physicist, I came to appreciate her as one of my earliest instructors in necessary and sufficient conditions.

My great-aunt’s intended path satisfied one condition necessary for her to reach the patio: Nothing visible obstructed the path. But the path failed to satisfy a sufficient condition: The invisible obstruction—the glass door—had been neither slid nor swung open. Sufficient conditions, my great-aunt taught me, mustn’t be overlooked.

Her lesson underlies a paper I published this month, with coauthors from the Cambridge other than mine—Cambridge, England: David Arvidsson-Shukur and Jacob Chevalier Drori. The paper concerns, rather than pools and patios, quasiprobabilities, which I’ve blogged about many times [1,2,3,4,5,6,7]. Quasiprobabilities are quantum generalizations of probabilities. Probabilities describe everyday, classical phenomena, from Monopoly to March Madness to the weather in Massachusetts (and especially the weather in Massachusetts). Probabilities are real numbers (not dependent on the square-root of -1); they’re at least zero; and they compose in certain ways (the probability of sun or hail equals the probability of sun plus the probability of hail). Also, the probabilities that form a distribution, or a complete set, sum to one (if there’s a 70% chance of rain, there’s a 30% chance of no rain).

In contrast, quasiprobabilities can be negative and nonreal. We call such values nonclassical, as they’re unavailable to the probabilities that describe classical phenomena. Quasiprobabilities represent quantum states: Imagine some clump of particles in a quantum state described by some quasiprobability distribution. We can imagine measuring the clump however we please. We can calculate the possible outcomes’ probabilities from the quasiprobability distribution.

My favorite quasiprobability is an obscure fellow unbeknownst even to most quantum physicists: the Kirkwood-Dirac distribution. John Kirkwood defined it in 1933, and Paul Dirac defined it independently in 1945. Then, quantum physicists forgot about it for decades. But the quasiprobability has undergone a renaissance over the past few years: Experimentalists have measured it to infer particles’ quantum states in a new way. Also, colleagues and I have generalized the quasiprobability and discovered applications of the generalization across quantum physics, from quantum chaos to metrology (the study of how we can best measure things) to quantum thermodynamics to the foundations of quantum theory.
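As a hedged illustration (my own toy example, not from the paper), here is the Kirkwood-Dirac distribution in one common convention, q_ij = ⟨b_j|a_i⟩⟨a_i|ψ⟩⟨ψ|b_j⟩, for a single qubit with the Pauli-Z and Pauli-X eigenbases and an arbitrarily chosen pure state:

```python
import cmath

s = 2 ** -0.5
a_basis = [(1, 0), (0, 1)]   # Z eigenbasis |0>, |1>
b_basis = [(s, s), (s, -s)]  # X eigenbasis |+>, |->

def inner(u, v):
    """<u|v> for two-component complex vectors."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

# A pure state with coherences in both bases (angles chosen arbitrarily).
psi = (cmath.cos(0.3), cmath.exp(0.7j) * cmath.sin(0.3))

# Kirkwood-Dirac quasiprobability q_ij = <b_j|a_i><a_i|psi><psi|b_j>.
q = [[inner(b, a) * inner(a, psi) * inner(psi, b)
      for b in b_basis] for a in a_basis]

total = sum(sum(row) for row in q)
print(total)  # sums to 1, just like a probability distribution
print(q)      # but individual entries are nonreal
```

Because Z and X fail to commute, the entries can stray outside the real interval [0, 1]; with commuting observables, every entry would reduce to an ordinary probability.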

In some applications, nonclassical quasiprobabilities enable a system to achieve a quantum advantage—to usefully behave in a manner impossible for classical systems. Examples include metrology: Imagine wanting to measure a parameter that characterizes some piece of equipment. You’ll perform many trials of an experiment. In each trial, you’ll prepare a system (for instance, a photon) in some quantum state, send it through the equipment, and measure one or more observables of the system. Say that you follow the protocol described in this blog post. A Kirkwood-Dirac quasiprobability distribution describes the experiment.¹ From each trial, you’ll obtain information about the unknown parameter. How much information can you obtain, on average over trials? Potentially more information if some quasiprobabilities are negative than if none are. The quasiprobabilities can be negative only if the state and observables fail to commute with each other. So noncommutation—a hallmark of quantum physics—underlies exceptional metrological results, as shown by Kirkwood-Dirac quasiprobabilities.

Exceptional results are useful, and we might aim to design experiments that achieve them. We can by designing experiments described by nonclassical Kirkwood-Dirac quasiprobabilities. When can the quasiprobabilities become nonclassical? Whenever the relevant quantum state and observables fail to commute, the quantum community used to believe. This belief turns out to mirror the expectation that one could access my grandmother’s back patio from the living room whenever no visible barriers obstructed the path. As a lack of visible barriers was necessary for patio access, noncommutation is necessary for Kirkwood-Dirac nonclassicality. But noncommutation doesn’t suffice, according to my paper with David and Jacob. We identified a sufficient condition, sliding back the metaphorical glass door on Kirkwood-Dirac nonclassicality. The condition depends on simple properties of the system, state, and observables. (Experts: Examples include the Hilbert space’s dimensionality.) We also quantified and upper-bounded the amount of nonclassicality that a Kirkwood-Dirac quasiprobability can contain.

From an engineering perspective, our results can inform the design of experiments intended to achieve certain quantum advantages. From a foundational perspective, the results help illuminate the sources of certain quantum advantages. To achieve certain advantages, noncommutation doesn’t cut the mustard—but we now know a condition that does.

For another take on our paper, check out this news article in Physics Today.

¹Really, a generalized Kirkwood-Dirac quasiprobability. But that phrase contains a horrendous number of syllables, so I’ll elide the “generalized.”