Planet Musings

August 03, 2021

David Hoggscalings for different methods of inference

Excellent long check-in meeting with Micah Oeur (Merced) and Juan Guerra (Yale) about their summer-school projects with me and Adrian Price-Whelan (Flatiron). The projects are all performing inferences on toy data sets, where we have fake observations of a very simple dynamical system and we try to infer the parameters of that dynamical system. We are using virial theorem, Jeans modeling, Schwarzschild modeling, full forward modeling of the kinematics, and orbital torus imaging. We have results for many of these methods already (go team!) and more to come. Today we discussed the problem of measuring scalings of the inferences (the uncertainties, say) as a function of the number of observations and the quality of the data. Do they all scale the same? We also want to check sensitivity to selection effects, and wrongnesses of the assumptions.

David Hoggexoplanet atmospheres projects

I had a nice conversation today with Laura Kreidberg (MPIA) and Jason Dittmann (MPIA) about projects in ultra-precise spectroscopy for exoplanet-atmosphere science. I was pushing for projects in which we get closer to the metal (moving spectroscopy to two dimensions), and in which we use theoretical ideas to improve extraction of very weak spectral signals. Kreidberg was pushing for projects in which we use planet kinematics to separate very weak planet signals from star signals. On the latter, I recommended starting with co-ads of residuals.

David Hoggunits symmetry and dimensionless scalar quantities

I got in some quality time with Bernhard Schölkopf (MPI-IS) this weekend, in which we discussed many things related to contemporary machine learning and its overlap with physics. In particular, we discussed the point that the nonlinearities in machine-learning methods (like deep learning) have to be interpretable as nonlinear functions of dimensionless scalars, if the methods are going to be like laws of physics. That is a strong constraint on what can be put into nonlinear functions. We understood part of that in our recent paper, but not all of it. In particular, thinking about units or dimensions in machine learning might be extremely valuable.

Scott AaronsonOn blankfaces

For years, I’ve had a private term I’ve used with my family. To give a few examples of its use:

No, I never applied for that grant. I spent two hours struggling to log in to a web portal designed by the world’s top blankfaces until I finally gave up in despair.

No, I paid for that whole lecture trip out of pocket; I never got the reimbursement they promised. Their blankface administrator just kept sending me back the form, demanding more and more convoluted bank details, until I finally got the hint and dropped it.

No, my daughter Lily isn’t allowed in the swimming pool there. She easily passed their swim test last year, but this year the blankface lifeguard made up a new rule on the spot that she needs to retake the test, so Lily took it again and passed even more easily, but then the lifeguard said she didn’t like the stroke Lily used, so she failed her and didn’t let her retake it. I complained to their blankface athletic director, who launched an ‘investigation.’ The outcome of the ‘investigation’ was that, regardless of the ground truth about how well Lily can swim, their blankface lifeguard said she’s not allowed in the pool, so being blankfaces themselves, they’re going to stand with the lifeguard.

Yeah, the kids spend the entire day indoors, breathing each other’s stale, unventilated air, then they finally go outside and they aren’t allowed on the playground equipment, because of the covid risk from them touching it. Even though we’ve known for more than a year that covid is an airborne disease. Everyone I’ve talked there agrees that I have a point, but they say their hands are tied. I haven’t yet located the blankface who actually made this decision and stands by it.

What exactly is a blankface? He or she is often a mid-level bureaucrat, but not every bureaucrat is a blankface, and not every blankface is a bureaucrat. A blankface is anyone who enjoys wielding the power entrusted in them to make others miserable by acting like a cog in a broken machine, rather than like a human being with courage, judgment, and responsibility for their actions. A blankface meets every appeal to facts, logic, and plain compassion with the same repetition of rules and regulations and the same blank stare—a blank stare that, more often than not, conceals a contemptuous smile.

The longer I live, the more I see blankfacedness as one of the fundamental evils of the human condition. Yes, it contains large elements of stupidity, incuriosity, malevolence, and bureaucratic indifference, but it’s not reducible to any of those. After enough experience, the first two questions you ask about any organization are:

  1. Who are the blankfaces here?
  2. Who are the people I can talk with to get around the blankfaces?

As far as I can tell, blankfacedness cuts straight across conventional political ideology, gender, and race. (Age, too, except that I’ve never once encountered a blankfaced child.) Brilliance and creativity do seem to offer some protection against blankfacedness—possibly because the smarter you are, the harder it is to justify idiotic rules to yourself—but even there, the protection is far from complete.

Twenty years ago, all the conformists in my age cohort were obsessed with the Harry Potter books and movies—holding parties where they wore wizard costumes, etc. I decided that the Harry Potter phenomenon was a sort of collective insanity: from what I could tell, the stories seemed like startlingly puerile and unoriginal mass-marketed wish-fulfillment fantasies.

Today, those same conformists in my age cohort are more likely to condemn the Harry Potter series as Problematically white, male, and cisnormative, and J. K. Rowling herself as a monstrous bigot whose acquaintances’ acquaintances should be shunned. Naturally, then, there was nothing for me to do but finally read the series! My 8-year-old daughter Lily and I have been partner-reading it for half a year; we’re just finishing book 5. (After we’ve finished the series, we might start on Harry Potter and the Methods of Rationality … which, I confess, I’ve also never read.)

From book 5, I learned something extremely interesting. The most despicable villain in the Harry Potter universe is not Lord Voldemort, who’s mostly just a faraway cipher and abstract embodiment of pure evil, no more hateable than an earthquake. Rather, it’s Dolores Jane Umbridge, the toadlike Ministry of Magic bureaucrat who takes over Hogwarts school, forces out Dumbledore as headmaster, and terrorizes the students with increasingly draconian “Educational Decrees.” Umbridge’s decrees are mostly aimed at punishing Harry Potter and his friends, who’ve embarrassed the Ministry by telling everyone the truth that Voldemort has returned and by readying themselves to fight him, thereby defying the Ministry’s head-in-the-sand policy.

Anyway, I’ll say this for Harry Potter: Rowling’s portrayal of Umbridge is so spot-on and merciless that, for anyone who knows the series, I could simply define a blankface to be anyone sufficiently Umbridge-like.

This week I also finished reading The Premonition, the thrilling account of the runup to covid by Michael Lewis (who also wrote The Big Short, Moneyball, etc). Lewis tells the stories of a few individuals scattered across US health and government bureaucracies who figured out over the past 20 years that the US was breathtakingly unprepared for a pandemic, and who struggled against official indifference, mostly unsuccessfully, to try to fix that. As covid hit the US in early 2020, these same individuals frantically tried to pull the fire alarms, even as the Trump White House, the CDC, and state bureaucrats all did everything in their power to block and sideline them. We all know the results.

It’s no surprise that, in Lewis’s telling, Trump and his goons come in for world-historic blame: however terrible you thought they were, they were worse. It seems that John Bolton, in particular, gleefully took an ax to everything the two previous administrations had done to try to prepare the federal government for pandemics—after Tom Bossert, the one guy in Trump’s inner circle who’d actually taken pandemic preparation seriously, was forced out for contradicting Trump about Russia and Ukraine.

But the left isn’t spared either. The most compelling character in The Premonition is Charity Dean, who escaped from the Christian fundamentalist sect in which she was raised to put herself through medical school and become a crusading public-health officer for Santa Barbara County. Lewis relates with relish how, again and again, Dean startled the bureaucrats around her by taking matters into her own hands in her war against pathogens—e.g., slicing into a cadaver herself to take samples when the people whose job it was wouldn’t do it.

In 2019, Dean moved to Sacramento to become California’s next chief public health officer, but then Governor Gavin Newsom blocked her expected promotion, instead recruiting someone from the outside named Sonia Angell, who had no infectious disease experience but to whom Dean would have to report. Lewis reports the following as the reason:

“It was an optics problem,” says a senior official in the Department of Health and Human Services. “Charity was too young, too blond, too Barbie. They wanted a person of color.” Sonia Angell identified as Latina.

After it became obvious that the White House and the CDC were both asleep at the wheel, the competent experts’ Plan B was to get California to set a national standard, one that would shame all the other states into acting, by telling the truth about covid and by aggressively testing, tracing, and isolating. And here comes the tragedy: Charity Dean spent from mid-January till mid-March trying to do exactly that, and Sonia Angell blocked her. Angell—who comes across as a real-life Dolores Umbridge—banned Dean from using the word “pandemic,” screamed at her for her insubordination, and systematically shut her out of meetings. Angell’s stated view was that, until and unless the CDC said that there was a pandemic, there was no pandemic—regardless of what hospitals across California might be reporting to the contrary.

As it happens, California was the first state to move aggressively against covid, on March 19—basically because as the bodies started piling up, Dean and her allies finally managed to maneuver around Angell and get the ear of Governor Newsom directly. Had the response started earlier, the US might have had an outcome more in line with most industrialized countries. Half of the 630,000 dead Americans might now be alive.

Sonia Angell fully deserves to have her name immortalized by history as one of the blankest of blankfaces. But of course, Angell was far from alone. Robert Redfield, Trump’s CDC director, was a blankface extraordinaire. Nancy Messonnier, who lied to stay in Trump’s good graces, was a blankface too. The entire CDC and FDA seem to have teemed with blankfaces. As for Anthony Fauci, he became a national hero, maybe even deservedly so, merely by not being 100% a blankface, when basically every other “expert” in the US with visible power was. Fauci cleared a depressingly low bar, one that the people profiled by Lewis cleared at Simone-Biles-like heights.

In March 2020, the fundamental question I had was: where are the supercompetent rule-breaking American heroes from the disaster movies? What’s taking them so long? The Premonition satisfyingly answers that question. It turns out that the heroes did exist, scattered across the American health bureaucracy. They were screaming at the top of their lungs. But they were outvoted by the critical mass of blankfaces that’s become one of my country’s defining features.

Some people will object that the term “blankface” is dehumanizing. The reason I disagree is that a blankface is someone who freely chose to dehumanize themselves: to abdicate their human responsibility to see what’s right in front of them, to act like malfunctioning pieces of electronics even though they, like all of us, were born with the capacity for empathy and reason.

With many other human evils and failings, I have a strong inclination toward mercy, because I understand how someone could’ve succumbed to the temptation—indeed, I worry that I myself might’ve succumbed to it “but for the grace of God.” But here’s the thing about blankfaces: in all my thousands of dealings with them, not once was I ever given cause to wonder whether I might have done the same in their shoes. It’s like, of course I wouldn’t have! Even if I were forced (by my own higher-ups, an intransigent computer system, or whatever else) to foist some bureaucratic horribleness on an innocent victim, I’d be sheepish and apologetic about it. I’d acknowledge the farcical absurdity of what I was making the other person do, or declaring that they couldn’t do. Likewise, even if I were useless in a crisis, at least I’d get out of the way of the people trying to solve it. How could I live with myself otherwise?

The fundamental mystery of the blankfaces, then, is how they can be so alien and yet so common.

August 02, 2021

Doug NatelsonMetallic water!

What does it take to have a material behave as a metal, from the physicist's perspective?  I've written about this before (wow, I've been blogging for a long time).  Fundamentally, there have to be "gapless" charge-carrying excitations, so that the application of even a tiny electric field allows those charge carriers to transition into states with (barely) higher kinetic energies and momenta.  

Top: a droplet of NaK 
alloy.  Bottom: That 
droplet coated with 
adsorbed water that 
has become a metal. 
From here.
In conventional band insulators, the electronic states are filled right up to the brim in an energy band.  Apply an electric field, and an electron has no states available into which it can go without somehow grabbing enough energy to make it all the way to the bottom of the next (conduction) band.  Since that band gap can be large (5.5 eV for diamond, 8.5 eV for NaCl), no current flows, and you have an insulator.

This is, broadly speaking, the situation in liquid water. (Even though it's a liquid, the basic concept of bands of energy levels is still helpful, though of course there are no Bloch waves as in crystalline solids.)  According to calculations and experiments, the band gap in ordinary water is about 7 eV.  You can dissolve ions in water and have those carry a current - that's the whole deal with electrolytes - but ordinarily water is not a conductor based on electrons.  It is possible to inject some electrons into water, and these end up "hydrated" or "solvated" thanks to interactions with the surrounding polar water molecules and the hydronium and hydroxyl ions floating around, but historically this does not result in a metal.  To achieve metallicity, you'd have to inject or borrow so many electrons that they could get up into that next band.

This paper from late last week seems to have done just that.  A few molecular layers of water adsorbed on the outside of a droplet of liquid sodium-potassium metal apparently ends up taking in enough electrons (\( \sim 5 \times 10^{21}\) per cc) to become metallic, as detected through optical measurements of its conductivity (including a plasmon resonance).   It's rather transient, since chemistry continues and the whole thing oxidizes, but the result is quite neat!

David Hoggband-limited image models

I had my quasi-weekly call with Dustin Lang (Perimeter) today, in which we discussed how to represent the differences between images, or represent models of images. I am obsessing about how to model distortions of spectrograph calibration images (like arcs and flats), and Lang and I endlessly discuss difference imaging to discover time-variable phenomena and moving objects. One thing Lang has observed is that sometimes a parametric model of a reference image makes a better reference image for image differencing than the reference image itself. Presumably this is because the model de-noises the data? But if so, could we just use a band-limited non-parametric model say? We discussed many things related to all that.

David HoggGaia BP-RP spectral fitting

Hans-Walter Rix and I discussed with Rene Andrae (MPIA) and Morgan Fouesneau (MPIA) how they are fitting the (not yet public) BP-RP spectra from the ESA Gaia mission. The rules are that they can only use certain kinds of models and they can only use Gaia data, nothing external to Gaia. It's a real challenge! We discussed methods and ideas for verifying and testing results.

July 31, 2021

Terence TaoVarieties of general type with many vanishing plurigenera, and optimal sine and sawtooth inequalities

Louis Esser, Burt Totaro, Chengxi Wang, and myself have just uploaded to the arXiv our preprint “Varieties of general type with many vanishing plurigenera, and optimal sine and sawtooth inequalities“. This is an interdisciplinary paper that arose because in order to optimize a certain algebraic geometry construction it became necessary to solve a purely analytic question which, while simple, did not seem to have been previously studied in the literature. We were able to solve the analytic question exactly and thus fully optimize the algebraic geometry construction, though the analytic question may have some independent interest.

Let us first discuss the algebraic geometry application. Given a smooth complex {n}-dimensional projective variety {X} there is a standard line bundle {K_X} attached to it, known as the canonical line bundle; {n}-forms on the variety become sections of this bundle. The bundle may not actually admit global sections; that is to say, the dimension {h^0(X, K_X)} of global sections may vanish. But as one raises the canonical line bundle {K_X} to higher and higher powers to form further line bundles {mK_X}, the number of global sections tends to increase; in particular, the dimension {h^0(X, mK_X)} of global sections (known as the {m^{th}} plurigenus) always obeys an asymptotic of the form

\displaystyle  h^0(X, mK_X) = \mathrm{vol}(X) \frac{m^n}{n!} + O( m^{n-1} )

as {m \rightarrow \infty} for some non-negative number {\mathrm{vol}(X)}, which is called the volume of the variety {X}, which is an invariant that reveals some information about the birational geometry of {X}. For instance, if the canonical line bundle is ample (or more generally, nef), this volume is equal to the intersection number {K_X^n} (roughly speaking, the number of common zeroes of {n} generic sections of the canonical line bundle); this is a special case of the asymptotic Riemann-Roch theorem. In particular, the volume {\mathrm{vol}(X)} is a natural number in this case. However, it is possible for the volume to also be fractional in nature. One can then ask: how small can the volume get {\mathrm{vol}(X)} without vanishing entirely? (By definition, varieties with non-vanishing volume are known as varieties of general type.)

It follows from a deep result obtained independently by Hacon–McKernan, Takayama and Tsuji that there is a uniform lower bound for the volume {\mathrm{vol}(X)} of all {n}-dimensional projective varieties of general type. However, the precise lower bound is not known, and the current paper is a contribution towards probing this bound by constructing varieties of particularly small volume in the high-dimensional limit {n \rightarrow \infty}. Prior to this paper, the best such constructions of {n}-dimensional varieties basically had exponentially small volume, with a construction of volume at most {e^{-(1+o(1))n \log n}} given by Ballico–Pignatelli–Tasin, and an improved construction with a volume bound of {e^{-\frac{1}{3} n \log^2 n}} given by Totaro and Wang. In this paper, we obtain a variant construction with the somewhat smaller volume bound of {e^{-(1-o(1)) n^{3/2} \log^{1/2} n}}; the method also gives comparable bounds for some other related algebraic geometry statistics, such as the largest {m} for which the pluricanonical map associated to the linear system {|mK_X|} is not a birational embedding into projective space.

The space {X} is constructed by taking a general hypersurface of a certain degree {d} in a weighted projective space {P(a_0,\dots,a_{n+1})} and resolving the singularities. These varieties are relatively tractable to work with, as one can use standard algebraic geometry tools (such as the ReidTai inequality) to provide sufficient conditions to guarantee that the hypersurface has only canonical singularities and that the canonical bundle is a reflexive sheaf, which allows one to calculate the volume exactly in terms of the degree {d} and weights {a_0,\dots,a_{n+1}}. The problem then reduces to optimizing the resulting volume given the constraints needed for the above-mentioned sufficient conditions to hold. After working with a particular choice of weights (which consist of products of mostly consecutive primes, with each product occuring with suitable multiplicities {c_0,\dots,c_{b-1}}), the problem eventually boils down to trying to minimize the total multiplicity {\sum_{j=0}^{b-1} c_j}, subject to certain congruence conditions and other bounds on the {c_j}. Using crude bounds on the {c_j} eventually leads to a construction with volume at most {e^{-0.8 n^{3/2} \log^{1/2} n}}, but by taking advantage of the ability to “dilate” the congruence conditions and optimizing over all dilations, we are able to improve the {0.8} constant to {1-o(1)}.

Now it is time to turn to the analytic side of the paper by describing the optimization problem that we solve. We consider the sawtooth function {g: {\bf R} \rightarrow (-1/2,1/2]}, with {g(x)} defined as the unique real number in {(-1/2,1/2]} that is equal to {x} mod {1}. We consider a (Borel) probability measure {\mu} on the real line, and then compute the average value of this sawtooth function

\displaystyle  \mathop{\bf E}_\mu g(x) := \int_{\bf R} g(x)\ d\mu(x)

as well as various dilates

\displaystyle  \mathop{\bf E}_\mu g(kx) := \int_{\bf R} g(kx)\ d\mu(x)

of this expectation. Since {g} is bounded above by {1/2}, we certainly have the trivial bound

\displaystyle  \min_{1 \leq k \leq m} \mathop{\bf E}_\mu g(kx) \leq \frac{1}{2}.

However, this bound is not very sharp. For instance, the only way in which {\mathop{\bf E}_\mu g(x)} could attain the value of {1/2} is if the probability measure {\mu} was supported on half-integers, but in that case {\mathop{\bf E}_\mu g(2x)} would vanish. For the algebraic geometry application discussed above one is then led to the following question: for a given choice of {m}, what is the best upper bound {c^{\mathrm{saw}}_m} on the quantity {\min_{1 \leq k \leq m} \mathop{\bf E}_\mu g(kx)} that holds for all probability measures {\mu}?

If one considers the deterministic case in which {\mu} is a Dirac mass supported at some real number {x_0}, then the Dirichlet approximation theorem tells us that there is {1 \leq k \leq m} such that {x_0} is within {\frac{1}{m+1}} of an integer, so we have

\displaystyle  \min_{1 \leq k \leq m} \mathop{\bf E}_\mu g(kx) \leq \frac{1}{m+1}

in this case, and this bound is sharp for deterministic measures {\mu}. Thus we have

\displaystyle  \frac{1}{m+1} \leq c^{\mathrm{saw}}_m \leq \frac{1}{2}.

However, both of these bounds turn out to be far from the truth, and the optimal value of {c^{\mathrm{saw}}_m} is comparable to {\frac{\log 2}{\log m}}. In fact we were able to compute this quantity precisely:

Theorem 1 (Optimal bound for sawtooth inequality) Let {m \geq 1}.
  • (i) If {m = 2^r} for some natural number {r}, then {c^{\mathrm{saw}}_m = \frac{1}{r+2}}.
  • (ii) If {2^r < m \leq 2^{r+1}} for some natural number {r}, then {c^{\mathrm{saw}}_m = \frac{2^r}{2^r(r+1) + m}}.
In particular, we have {c^{\mathrm{saw}}_m = \frac{\log 2 + o(1)}{\log m}} as {m \rightarrow \infty}.

We establish this bound through duality. Indeed, suppose we could find non-negative coefficients {a_1,\dots,a_m} such that one had the pointwise bound

\displaystyle  \sum_{k=1}^m a_k g(kx) \leq 1 \ \ \ \ \ (1)

for all real numbers {x}. Integrating this against an arbitrary probability measure {\mu}, we would conclude

\displaystyle  (\sum_{k=1}^m a_k) \min_{1 \leq k \leq m} \mathop{\bf E}_\mu g(kx) \leq \sum_{k=1}^m a_k \mathop{\bf E}_\mu g(kx) \leq 1

and hence

\displaystyle  c^{\mathrm{saw}}_m \leq \frac{1}{\sum_{k=1}^m a_k}.

Conversely, one can find lower bounds on {c^{\mathrm{saw}}_m} by selecting suitable candidate measures {\mu} and computing the means {\mathop{\bf E}_\mu g(kx)}. The theory of linear programming duality tells us that this method must give us the optimal bound, but one has to locate the optimal measure {\mu} and optimal weights {a_1,\dots,a_m}. This we were able to do by first doing some extensive numerics to discover these weights and measures for small values of {m}, and then doing some educated guesswork to extrapolate these examples to the general case, and then to verify the required inequalities. In case (i) the situation is particularly simple, as one can take {\mu} to be the discrete measure that assigns a probability {\frac{1}{r+2}} to the numbers {\frac{1}{2}, \frac{1}{4}, \dots, \frac{1}{2^r}} and the remaining probability of {\frac{2}{r+2}} to {\frac{1}{2^{r+1}}}, while the optimal weighted inequality (1) turns out to be

\displaystyle  2g(x) + \sum_{j=1}^r g(2^j x) \leq 1

which is easily proven by telescoping series. However the general case turned out to be significantly tricker to work out, and the verification of the optimal inequality required a delicate case analysis (reflecting the fact that equality was attained in this inequality in a large number of places).

After solving the sawtooth problem, we became interested in the analogous question for the sine function, that is to say what is the best bound {c^{\sin}_m} for the inequality

\displaystyle  \min_{1 \leq k \leq m} \mathop{\bf E}_\mu \sin(kx) \leq c^{\sin}_m.

The left-hand side is the smallest imaginary part of the first {m} Fourier coefficients of {\mu}. To our knowledge this quantity has not previously been studied in the Fourier analysis literature. By adopting a similar approach as for the sawtooth problem, we were able to compute this quantity exactly also:

Theorem 2 For any {m \geq 1}, one has

\displaystyle  c^{\sin}_m = \frac{m+1}{2 \sum_{1 \leq j \leq m: j \hbox{ odd}} \cot \frac{\pi j}{2m+2}}.

In particular,

\displaystyle  c^{\sin}_m = \frac{\frac{\pi}{2} + o(1)}{\log m}.

Interestingly, a closely related cotangent sum recently appeared in this MathOverflow post. Verifying the lower bound on {c^{\sin}_m} boils down to choosing the right test measure {\mu}; it turns out that one should pick the probability measure supported the {\frac{\pi j}{2m+2}} with {1 \leq j \leq m} odd, with probability proportional to {\cot \frac{\pi j}{2m+2}}, and the lower bound verification eventually follows from a classical identity

\displaystyle  \frac{m+1}{2} = \sum_{1 \leq j \leq m; j \hbox{ odd}} \cot \frac{\pi j}{2m+2} \sin \frac{\pi jk}{m+1}

for {1 \leq k \leq m}, first posed by Eisenstein in 1844 and proved by Stern in 1861. The upper bound arises from establishing the trigonometric inequality

\displaystyle  \frac{2}{(m+1)^2} \sum_{1 \leq k \leq m; k \hbox{ odd}}

\displaystyle \cot \frac{\pi k}{2m+2} ( (m+1-k) \sin kx + k \sin(m+1-k)x ) \leq 1

for all real numbers {x}, which to our knowledge is new; the left-hand side has a Fourier-analytic intepretation as convolving the Fejér kernel with a certain discretized square wave function, and this interpretation is used heavily in our proof of the inequality.

Terence TaoThe Gowers uniformity norm of order 1+

In the modern theory of higher order Fourier analysis, a key role are played by the Gowers uniformity norms {\| \|_{U^k}} for {k=1,2,\dots}. For finitely supported functions {f: {\bf Z} \rightarrow {\bf C}}, one can define the (non-normalised) Gowers norm {\|f\|_{\tilde U^k({\bf Z})}} by the formula

\displaystyle  \|f\|_{\tilde U^k({\bf Z})}^{2^k} := \sum_{n,h_1,\dots,h_k \in {\bf Z}} \prod_{\omega_1,\dots,\omega_k \in \{0,1\}} {\mathcal C}^{\omega_1+\dots+\omega_k} f(x+\omega_1 h_1 + \dots + \omega_k h_k)

where {{\mathcal C}} denotes complex conjugation, and then on any discrete interval {[N] = \{1,\dots,N\}} and any function {f: [N] \rightarrow {\bf C}} we can then define the (normalised) Gowers norm

\displaystyle  \|f\|_{U^k([N])} := \| f 1_{[N]} \|_{\tilde U^k({\bf Z})} / \|1_{[N]} \|_{\tilde U^k({\bf Z})}

where {f 1_{[N]}: {\bf Z} \rightarrow {\bf C}} is the extension of {f} by zero to all of {{\bf Z}}. Thus for instance

\displaystyle  \|f\|_{U^1([N])} = |\mathop{\bf E}_{n \in [N]} f(n)|

(which technically makes {\| \|_{U^1([N])}} a seminorm rather than a norm), and one can calculate

\displaystyle  \|f\|_{U^2([N])} \asymp (N \int_0^1 |\mathop{\bf E}_{n \in [N]} f(n) e(-\alpha n)|^4\ d\alpha)^{1/4} \ \ \ \ \ (1)

where {e(\theta) := e^{2\pi i \alpha}}, and we use the averaging notation {\mathop{\bf E}_{n \in A} f(n) = \frac{1}{|A|} \sum_{n \in A} f(n)}.

The significance of the Gowers norms is that they control other multilinear forms that show up in additive combinatorics. Given any polynomials {P_1,\dots,P_m: {\bf Z}^d \rightarrow {\bf Z}} and functions {f_1,\dots,f_m: [N] \rightarrow {\bf C}}, we define the multilinear form

\displaystyle  \Lambda^{P_1,\dots,P_m}(f_1,\dots,f_m) := \sum_{n \in {\bf Z}^d} \prod_{j=1}^m f_j 1_{[N]}(P_j(n)) / \sum_{n \in {\bf Z}^d} \prod_{j=1}^m 1_{[N]}(P_j(n))

(assuming that the denominator is finite and non-zero). Thus for instance

\displaystyle  \Lambda^{\mathrm{n}}(f) = \mathop{\bf E}_{n \in [N]} f(n)

\displaystyle  \Lambda^{\mathrm{n}, \mathrm{n}+\mathrm{r}}(f,g) = (\mathop{\bf E}_{n \in [N]} f(n)) (\mathop{\bf E}_{n \in [N]} g(n))

\displaystyle  \Lambda^{\mathrm{n}, \mathrm{n}+\mathrm{r}, \mathrm{n}+2\mathrm{r}}(f,g,h) \asymp \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [-N,N]} f(n) g(n+r) h(n+2r)

\displaystyle  \Lambda^{\mathrm{n}, \mathrm{n}+\mathrm{r}, \mathrm{n}+\mathrm{r}^2}(f,g,h) \asymp \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [-N^{1/2},N^{1/2}]} f(n) g(n+r) h(n+r^2)

where we view {\mathrm{n}, \mathrm{r}} as formal (indeterminate) variables, and {f,g,h: [N] \rightarrow {\bf C}} are understood to be extended by zero to all of {{\bf Z}}. These forms are used to count patterns in various sets; for instance, the quantity {\Lambda^{\mathrm{n}, \mathrm{n}+\mathrm{r}, \mathrm{n}+2\mathrm{r}}(1_A,1_A,1_A)} is closely related to the number of length three arithmetic progressions contained in {A}. Let us informally say that a form {\Lambda^{P_1,\dots,P_m}(f_1,\dots,f_m)} is controlled by the {U^k[N]} norm if the form is small whenever {f_1,\dots,f_m: [N] \rightarrow {\bf C}} are {1}-bounded functions with at least one of the {f_j} small in {U^k[N]} norm. This definition was made more precise by Gowers and Wolf, who then defined the true complexity of a form {\Lambda^{P_1,\dots,P_m}} to be the least {s} such that {\Lambda^{P_1,\dots,P_m}} is controlled by the {U^{s+1}[N]} norm. For instance,
  • {\Lambda^{\mathrm{n}}} and {\Lambda^{\mathrm{n}, \mathrm{n} + \mathrm{r}}} have true complexity {0};
  • {\Lambda^{\mathrm{n}, \mathrm{n} + \mathrm{r}, \mathrm{n} + \mathrm{2r}}} has true complexity {1};
  • {\Lambda^{\mathrm{n}, \mathrm{n} + \mathrm{r}, \mathrm{n} + \mathrm{2r}, \mathrm{n} + \mathrm{3r}}} has true complexity {2};
  • The form {\Lambda^{\mathrm{n}, \mathrm{n}+2}} (which among other things could be used to count twin primes) has infinite true complexity (which is quite unfortunate for applications).
Roughly speaking, patterns of complexity {1} or less are amenable to being studied by classical Fourier analytic tools (the Hardy-Littlewood circle method); patterns of higher complexity can be handled (in principle, at least) by the methods of higher order Fourier analysis; and patterns of infinite complexity are out of range of both methods and are generally quite difficult to study. See these recent slides of myself (or this video of the lecture) for some further discussion.

Gowers and Wolf formulated a conjecture on what this complexity should be, at least for linear polynomials {P_1,\dots,P_m}; Ben Green and I thought we had resolved this conjecture back in 2010, though it turned out there was a subtle gap in our arguments and we were only able to resolve the conjecture in a partial range of cases. However, the full conjecture was recently resolved by Daniel Altman.

The {U^1} (semi-)norm is so weak that it barely controls any averages at all. For instance the average

\displaystyle  \Lambda^{2\mathrm{n}}(f) = \mathop{\bf E}_{n \in [N], \hbox{ even}} f(n)

is not controlled by the {U^1[N]} semi-norm: it is perfectly possible for a {1}-bounded function {f: [N] \rightarrow {\bf C}} to even have vanishing {U^1([N])} norm but have large value of {\Lambda^{2\mathrm{n}}(f)} (consider for instance the parity function {f(n) := (-1)^n}).

Because of this, I propose inserting an additional norm in the Gowers uniformity norm hierarchy between the {U^1} and {U^2} norms, which I will call the {U^{1^+}} (or “profinite {U^1}“) norm:

\displaystyle  \| f\|_{U^{1^+}[N]} := \frac{1}{N} \sup_P |\sum_{n \in P} f(n)| = \sup_P | \mathop{\bf E}_{n \in [N]} f 1_P(n)|

where {P} ranges over all arithmetic progressions in {[N]}. This can easily be seen to be a norm on functions {f: [N] \rightarrow {\bf C}} that controls the {U^1[N]} norm. It is also basically controlled by the {U^2[N]} norm for {1}-bounded functions {f}; indeed, if {P} is an arithmetic progression in {[N]} of some spacing {q \geq 1}, then we can write {P} as the intersection of an interval {I} with a residue class modulo {q}, and from Fourier expansion we have

\displaystyle  \mathop{\bf E}_{n \in [N]} f 1_P(n) \ll \sup_\alpha |\mathop{\bf E}_{n \in [N]} f 1_I(n) e(\alpha n)|.

If we let {\psi} be a standard bump function supported on {[-1,1]} with total mass and {\delta>0} is a parameter then

\displaystyle  \mathop{\bf E}_{n \in [N]} f 1_I(n) e(\alpha n)

\displaystyle \ll |\mathop{\bf E}_{n \in [-N,2N]; h, k \in [-N,N]} \frac{1}{\delta} \psi(\frac{h}{\delta N})

\displaystyle  1_I(n+h+k) f(n+h+k) e(\alpha(n+h+k))|

\displaystyle  \ll |\mathop{\bf E}_{n \in [-N,2N]; h, k \in [-N,N]} \frac{1}{\delta} \psi(\frac{h}{\delta N}) 1_I(n+k) f(n+h+k) e(\alpha(n+h+k))|

\displaystyle + \delta

(extending {f} by zero outside of {[N]}), as can be seen by using the triangle inequality and the estimate

\displaystyle  \mathop{\bf E}_{h \in [-N,N]} \frac{1}{\delta} \psi(\frac{h}{\delta N}) 1_I(n+h+k) - \mathop{\bf E}_{h \in [-N,N]} \frac{1}{\delta} \psi(\frac{h}{\delta N}) 1_I(n+k)

\displaystyle \ll (1 + \mathrm{dist}(n+k, I) / \delta N)^{-2}.

After some Fourier expansion of {\delta \psi(\frac{h}{\delta N})} we now have

\displaystyle  \mathop{\bf E}_{n \in [N]} f 1_P(n) \ll \frac{1}{\delta} \sup_{\alpha,\beta} |\mathop{\bf E}_{n \in [N]; h, k \in [-N,N]} e(\beta h + \alpha (n+h+k))

\displaystyle 1_P(n+k) f(n+h+k)| + \delta.

Writing {\alpha h + \alpha(n+h+k)} as a linear combination of {n, n+h, n+k} and using the Gowers–Cauchy–Schwarz inequality, we conclude

\displaystyle  \mathop{\bf E}_{n \in [N]} f 1_P(n) \ll \frac{1}{\delta} \|f\|_{U^2([N])} + \delta

hence on optimising in {\delta} we have

\displaystyle  \| f\|_{U^{1^+}[N]} \ll \|f\|_{U^2[N]}^{1/2}.

Forms which are controlled by the {U^{1^+}} norm (but not {U^1}) would then have their true complexity adjusted to {0^+} with this insertion.

The {U^{1^+}} norm recently appeared implicitly in work of Peluse and Prendiville, who showed that the form {\Lambda^{\mathrm{n}, \mathrm{n}+\mathrm{r}, \mathrm{n}+\mathrm{r}^2}(f,g,h)} had true complexity {0^+} in this notation (with polynomially strong bounds). [Actually, strictly speaking this control was only shown for the third function {h}; for the first two functions {f,g} one needs to localize the {U^{1^+}} norm to intervals of length {\sim \sqrt{N}}. But I will ignore this technical point to keep the exposition simple.] The weaker claim that {\Lambda^{\mathrm{n}, \mathrm{n}+\mathrm{r}^2}(f,g)} has true complexity {0^+} is substantially easier to prove (one can apply the circle method together with Gauss sum estimates).

The well known inverse theorem for the {U^2} norm tells us that if a {1}-bounded function {f} has {U^2[N]} norm at least {\eta} for some {0 < \eta < 1}, then there is a Fourier phase {n \mapsto e(\alpha n)} such that

\displaystyle  |\mathop{\bf E}_{n \in [N]} f(n) e(-\alpha n)| \gg \eta^2;

this follows easily from (1) and Plancherel’s theorem. Conversely, from the Gowers–Cauchy–Schwarz inequality one has

\displaystyle  |\mathop{\bf E}_{n \in [N]} f(n) e(-\alpha n)| \ll \|f\|_{U^2[N]}.

For {U^1[N]} one has a trivial inverse theorem; by definition, the {U^1[N]} norm of {f} is at least {\eta} if and only if

\displaystyle  |\mathop{\bf E}_{n \in [N]} f(n)| \geq \eta.

Thus the frequency {\alpha} appearing in the {U^2} inverse theorem can be taken to be zero when working instead with the {U^1} norm.

For {U^{1^+}} one has the intermediate situation in which the frequency {\alpha} is not taken to be zero, but is instead major arc. Indeed, suppose that {f} is {1}-bounded with {\|f\|_{U^{1^+}[N]} \geq \eta}, thus

\displaystyle  |\mathop{\bf E}_{n \in [N]} 1_P(n) f(n)| \geq \eta

for some progression {P}. This forces the spacing {q} of this progression to be {\ll 1/\eta}. We write the above inequality as

\displaystyle  |\mathop{\bf E}_{n \in [N]} 1_{n=b\ (q)} 1_I(n) f(n)| \geq \eta

for some residue class {b\ (q)} and some interval {I}. By Fourier expansion and the triangle inequality we then have

\displaystyle  |\mathop{\bf E}_{n \in [N]} e(-an/q) 1_I(n) f(n)| \geq \eta

for some integer {a}. Convolving {1_I} by {\psi_\delta: n \mapsto \frac{1}{N\delta} \psi(\frac{n}{N\delta})} for {\delta} a small multiple of {\eta} and {\psi} a Schwartz function of unit mass with Fourier transform supported on {[-1,1]}, we have

\displaystyle  |\mathop{\bf E}_{n \in [N]} e(-an/q) (1_I * \psi_\delta)(n) f(n)| \gg \eta.

The Fourier transform {\xi \mapsto \sum_n 1_I * \psi_\delta(n) e(- \xi n)} of {1_I * \psi_\delta} is bounded by {O(N)} and supported on {[-\frac{1}{\delta N},\frac{1}{\delta N}]}, thus by Fourier expansion and the triangle inequality we have

\displaystyle  |\mathop{\bf E}_{n \in [N]} e(-an/q) e(-\xi n) f(n)| \gg \eta^2

for some {\xi \in [-\frac{1}{\delta N},\frac{1}{\delta N}]}, so in particular {\xi = O(\frac{1}{\eta N})}. Thus we have

\displaystyle  |\mathop{\bf E}_{n \in [N]} f(n) e(-\alpha n)| \gg \eta^2 \ \ \ \ \ (2)

for some {\alpha} of the major arc form {\alpha = \frac{a}{q} + O(1/\eta)} with {1 \leq q \leq 1/\eta}. Conversely, for {\alpha} of this form, some routine summation by parts gives the bound

\displaystyle  |\mathop{\bf E}_{n \in [N]} f(n) e(-\alpha n)| \ll \frac{q}{\eta} \|f\|_{U^{1^+}[N]} \ll \frac{1}{\eta^2} \|f\|_{U^{1^+}[N]}

so if (2) holds for a {1}-bounded {f} then one must have {\|f\|_{U^{1^+}[N]} \gg \eta^4}.

Here is a diagram showing some of the control relationships between various Gowers norms, multilinear forms, and duals of classes {{\mathcal F}} of functions (where each class of functions {{\mathcal F}} induces a dual norm {\| f \|_{{\mathcal F}^*} := \sup_{\phi \in {\mathcal F}} \mathop{\bf E}_{n \in[N]} f(n) \overline{\phi(n)}}:

Here I have included the three classes of functions that one can choose from for the {U^3} inverse theorem, namely degree two nilsequences, bracket quadratic phases, and local quadratic phases, as well as the more narrow class of globally quadratic phases.

The Gowers norms have counterparts for measure-preserving systems {(X,T,\mu)}, known as Host-Kra seminorms. The {U^1(X)} norm can be defined for {f \in L^\infty(X)} as

\displaystyle  \|f\|_{U^1(X)} := \lim_{N \rightarrow \infty} \int_X |\mathop{\bf E}_{n \in [N]} T^n f|\ d\mu

and the {U^2} norm can be defined as

\displaystyle  \|f\|_{U^2(X)}^4 := \lim_{N \rightarrow \infty} \mathop{\bf E}_{n \in [N]} \| T^n f \overline{f} \|_{U^1(X)}^2.

The {U^1(X)} seminorm is orthogonal to the invariant factor {Z^0(X)} (generated by the (almost everywhere) invariant measurable subsets of {X}) in the sense that a function {f \in L^\infty(X)} has vanishing {U^1(X)} seminorm if and only if it is orthogonal to all {Z^0(X)}-measurable (bounded) functions. Similarly, the {U^2(X)} norm is orthogonal to the Kronecker factor {Z^1(X)}, generated by the eigenfunctions of {X} (that is to say, those {f} obeying an identity {Tf = \lambda f} for some {T}-invariant {\lambda}); for ergodic systems, it is the largest factor isomorphic to rotation on a compact abelian group. In analogy to the Gowers {U^{1^+}[N]} norm, one can then define the Host-Kra {U^{1^+}(X)} seminorm by

\displaystyle  \|f\|_{U^{1^+}(X)} := \sup_{q \geq 1} \frac{1}{q} \lim_{N \rightarrow \infty} \int_X |\mathop{\bf E}_{n \in [N]} T^{qn} f|\ d\mu;

it is orthogonal to the profinite factor {Z^{0^+}(X)}, generated by the periodic sets of {X} (or equivalently, by those eigenfunctions whose eigenvalue is a root of unity); for ergodic systems, it is the largest factor isomorphic to rotation on a profinite abelian group.

July 30, 2021

John BaezInformation Geometry (Part 17)

I’m getting back into information geometry, which is the geometry of the space of probability distributions, studied using tools from information theory. I’ve written a bunch about it already, which you can see here:

Information geometry.

Now I’m fascinated by something new: how symplectic geometry and contact geometry show up in information geometry. But before I say anything about this, let me say a bit about how they show up in thermodynamics. This is more widely discussed, and it’s a good starting point.

Symplectic geometry was born as the geometry of phase space in classical mechanics: that is, the space of possible positions and momenta of a classical system. The simplest example of a symplectic manifold is the vector space \mathbb{R}^{2n}, with n position coordinates q_i and n momentum coordinates p_i.

It turns out that symplectic manifolds are always even-dimensional, because we can always cover them with coordinate charts that look like \mathbb{R}^{2n}. When we change coordinates, it turns out that the splitting of coordinates into positions and momenta is somewhat arbitrary. For example, the position of a rock on a spring now may determine its momentum a while later, and vice versa. What’s not arbitrary? It’s the so-called ‘symplectic structure’:

\omega = dp_1 \wedge dq_1 + \cdots + dp_n \wedge dq_n

While far from obvious at first, we know by now that the symplectic structure is exactly what needs to be preserved under valid changes of coordinates in classical mechanics! In fact, we can develop the whole formalism of classical mechanics starting from a manifold with a symplectic structure.

Symplectic geometry also shows up in thermodynamics. In thermodynamics we can start with a system in equilibrium whose state is described by some variables q_1, \dots, q_n. Its entropy will be a function of these variables, say

S = f(q_1, \dots, q_n)

We can then take the partial derivatives of entropy and call them something:

\displaystyle{ p_i = \frac{\partial f}{\partial q_i} }

These new variables p_i are said to be ‘conjugate’ to the q_i, and they turn out to be very interesting. For example, if q_i is energy then p_i is ‘coolness’: the reciprocal of temperature. The coolness of a system is its change in entropy per change in energy.

Often the variables q_i are ‘extensive’: that is, you can measure them only by looking at your whole system and totaling up some quantity. Examples are energy and volume. Then the new variables p_i are ‘intensive’: that is, you can measure them at any one location in your system. Examples are coolness and pressure.

Now for a twist: sometimes we do not know the function f ahead of time. Then we cannot define the p_i as above. We’re forced into a different approach where we treat them as independent quantities, at least until someone tells us what f is.

In this approach, we start with a space \mathbb{R}^{2n} having n coordinates called q_i and n coordinates called p_i. This is a symplectic manifold, with the symplectic struture \omega described earlier!

But what about the entropy? We don’t yet know what it is as a function of the q_i, but we may still want to talk about it. So, we build a space \mathbb{R}^{2n+1} having one extra coordinate S in addition to the q_i and p_i. This new coordinate stands for entropy. And this new space has an important 1-form on it:

\alpha = -dS + p_1 dq_i + \cdots + p_n dq_n

This is called the ‘contact 1-form’.

This makes \mathbb{R}^{2n+1} into an example of a ‘contact manifold’. Contact geometry is the odd-dimensional partner of symplectic geometry. Just as symplectic manifolds are always even-dimensional, contact manifolds are always odd-dimensional.

What is the point of the contact 1-form? Well, suppose someone tells us the function f relating entropy to the coordinates q_i. Now we know that we want

S = f

and also

\displaystyle{ p_i = \frac{\partial f}{\partial q_i} }

So, we can impose these equations, which pick out a subset of \mathbb{R}^{2n+1}. You can check that this subset, say \Sigma, is an n-dimensional submanifold. But even better, the contact 1-form vanishes when restricted to this submanifold:

\left.\alpha\right|_\Sigma = 0

Let’s see why! Suppose x \in \Sigma and suppose v \in T_x \Sigma is a vector tangent to \Sigma at this point x. It suffices to show

\alpha(v) = 0

Using the definition of \alpha this equation says

\displaystyle{ -dS(v) + \sum_i p_i dq_i(v) = 0 }

But on the surface \Sigma we have

S = f, \qquad  \displaystyle{ p_i = \frac{\partial f}{\partial q_i} }

So, the equation we’re trying to show can be written as

\displaystyle{ -df(v) + \sum_i \frac{\partial f}{\partial q_i} dq_i(v) = 0 }

But this follows from

\displaystyle{ df = \sum_i \frac{\partial f}{\partial q_i} dq_i }

which holds because f is a function only of the coordinates q_i.

So, any formula for entropy S = f(q_1, \dots, q_n) picks out a so-called ‘Legendrian submanifold’ of \mathbb{R}^{2n+1}: that is, an n-dimensional submanifold such that the contact 1-form vanishes when restricted to this submanifold. And the idea is that this submanifold tells you everything you need to know about a thermodynamic system.

Indeed, V. I. Arnol’d says this was implicitly known to the great founder of statistical mechanics, Josiah Willard Gibbs. Arnol’d calls \mathbb{R}^5 with coordinates energy, entropy, temperature, pressure and volume the ‘Gibbs manifold’, and he proclaims:

Gibbs’ thesis: substances are Legendrian submanifolds of the Gibbs manifold.

This is from here:

• V. I. Arnol’d, Contact geometry: the geometrical method of Gibbs’ thermodynamics, Proceedings of the Gibbs Symposium (New Haven, CT, 1989), AMS, Providence, Rhode Island, 1990.

A bit more detail

Now I want to say everything again, with a bit of extra detail, assuming more familiarity with manifolds. Above I was using \mathbb{R}^n with coordinates q_1, \dots, q_n to describe the ‘extensive’ variables of a thermodynamic system. But let’s be a bit more general and use any smooth n-dimensional manifold Q. Even if Q is a vector space, this viewpoint is nice because it’s manifestly coordinate-independent!

So: starting from Q we build the cotangent bundle T^\ast Q. A point in cotangent describes both extensive variables, namely q \in Q, and ‘intensive’ variables, namely a cotangent vector p \in T^\ast_q Q.

The manifold T^\ast Q has a 1-form \theta on it called the tautological 1-form. We can describe it as follows. Given a tangent vector v \in T_{(q,p)} T^\ast Q we have to say what \theta(v) is. Using the projection

\pi \colon T^\ast Q \to Q

we can project v down to a tangent vector d\pi(v) at the point q. But the 1-form p eats tangent vectors at q and spits out numbers! So, we set

\theta(v) = p(d\pi(v))

This is sort of mind-boggling at first, but it’s worth pondering until it makes sense. It helps to work out what \theta looks like in local coordinates. Starting with any local coordinates q_i on an open set of Q, we get local coordinates q_i, p_i on the cotangent bundle of this open set in the usual way. On this open set you then get

\theta = p_1 dq_1 + \cdots + p_n dq_n

This is a standard calculation, which is really worth doing!

It follows that we can define a symplectic structure \omega by

\omega = d \theta

and get this formula in local coordinates:

\omega = dp_1 \wedge dq_1 + \cdots + dp_n \wedge dq_n

Now, suppose we choose a smooth function

f \colon Q \to \mathbb{R}

which describes the entropy. We get a 1-form df, which we can think of as a map

df \colon Q \to T^\ast Q

assigning to each choice q of extensive variables the pair (q,p) of extensive and intensive variables where

p = df_q

The image of the map df is a ‘Lagrangian submanifold‘ of T^\ast Q: that is, an n-dimensional submanifold \Lambda such that

\left.\omega\right|_{\Lambda} = 0

Lagrangian submanifolds are to symplectic geometry as Legendrian submanifolds are to contact geometry! What we’re seeing here is that if Gibbs had preferred symplectic geometry, he could have described substances as Lagrangian submanifolds rather than Legendrian submanifolds. But this approach would only keep track of the derivatives of entropy, df, not the actual value of the entropy function f.

If we prefer to keep track of the actual value of f using contact geometry, we can do that. For this we add an extra dimension to T^\ast Q and form the manifold T^\ast Q \times \mathbb{R}. The extra dimension represents entropy, so we’ll use S as our name for the coordinate on \mathbb{R}.

We can make T^\ast Q \times \mathbb{R} into a contact manifold with contact 1-form

\alpha = -d S + \theta

In local coordinates we get

\alpha = -dS + p_1 dq_i + \cdots + p_n dq_n

just as we had earlier. And just as before, if we choose a smooth function f \colon Q \to \mathbb{R} describing entropy, the subset

\Sigma = \{(q,p,S) \in T^\ast Q \times \mathbb{R} : \; S = f(q), p = df_q \}

is a Legendrian submanifold of T^\ast Q \times \mathbb{R}.

Okay, this concludes my lightning review of symplectic and contact geometry in thermodynamics! Next time I’ll talk about something a bit less well understood: how they show up in statistical mechanics.

Scott AaronsonSteven Weinberg (1933-2021): a personal view

Steven Weinberg sitting in front of a chalkboard covered in equations

Steven Weinberg was, perhaps, the last truly towering figure of 20th-century physics. In 1967, he wrote a 3-page paper saying in effect that as far as he could see, two of the four fundamental forces of the universe—namely, electromagnetism and the weak nuclear force—had actually been the same force until a tiny fraction of a second after the Big Bang, when a broken symmetry caused them to decouple. Strangely, he had developed the math underlying this idea for the strong nuclear force, and it didn’t work there, but it did seem to work for the weak force and electromagnetism. Steve noted that, if true, this would require the existence of two force-carrying particles that hadn’t yet been seen — the W and Z bosons — and would also require the existence of the famous Higgs boson.

By 1979, enough of this picture had been confirmed by experiment that Steve shared the Nobel Prize in Physics with Sheldon Glashow—Steve’s former high-school classmate—as well as with Abdus Salam, both of whom had separately developed pieces of the same puzzle. As arguably the central architect of what we now call the Standard Model of elementary particles, Steve was in the ultra-rarefied class where, had he not won the Nobel Prize, it would’ve been a stain on the prize rather than on him.

Steve once recounted in my hearing that Richard Feynman initially heaped scorn on the electroweak proposal. Late one night, however, Steve was woken up by a phone call. It was Feynman. “I believe your theory now,” Feynman announced. “Why?” Steve asked. Feynman, being Feynman, gave some idiosyncratic reason that he’d worked out for himself.

It used to happen more often that someone would put forward a bold new proposal about the most fundamental laws of nature … and then the experimentalists would actually go out and confirm it. Besides with the Standard Model, though, there’s approximately one other time that that’s happened in the living memory of most of today’s physicists. Namely, when astronomers discovered in 1998 that the expansion of the universe was accelerating, apparently due to a dark energy that behaved like Einstein’s long-ago-rejected cosmological constant. Very few had expected such a result. There was one prominent exception, though: Steve Weinberg had written in 1987 that he saw no reason why the cosmological constant shouldn’t take a nonzero value that was still tiny enough to be consistent with galaxy formation and so forth.

In his long and illustrious career, one of the least important things Steve did, six years ago, was to play a major role in recruiting me and my wife Dana to UT Austin. The first time I met Steve, his first question to me was “have we met before? you look familiar.” It turns out that he’d met my dad, Steve Aaronson, way back in the 1970s, when my dad (then a young science writer) had interviewed Weinberg for a magazine article. I was astonished that Weinberg would remember such a thing across decades.

Steve was then gracious enough to take me, Dana, and both of my parents out to dinner in Austin as part of my and Dana’s recruiting trip.

We talked, among other things, about Telluride House at Cornell, where Steve had lived as an undergrad in the early 1950s and where I’d lived as an undergrad almost half a century later. Steve said that, while he loved the intellectual atmosphere at Telluride, he tried to have as little to do as possible with the “self-government” aspect, since he found the political squabbles that convulsed many of the humanities majors there to be a waste of time. I burst out laughing, because … well, imagine you got to have dinner with James Clerk Maxwell, and he opened up about some ridiculously specific pet peeve from his college years, and it was your ridiculously specific pet peeve from your college years.

(Steve claimed to us, not entirely convincingly, that he was a mediocre student at Cornell, more interesting in “necking” with his fellow student and future wife Louise than in studying physics.)

After Dana and I came to Austin, Steve was kind enough to invite me to the high-energy theoretical physics lunches, where I chatted with him and the other members of his group every week (or better yet, simply listened). I’d usually walk to the faculty club ten minutes early. Steve, having arrived by car, would be sitting alone in an armchair, reading a newspaper, while he waited for the other physicists to arrive by foot. No matter how scorching the Texas sun, Steve would always be wearing a suit (usually a tan one) and a necktie, his walking-cane by his side. I, typically in ratty shorts and t-shirt, would sit in the armchair next to him, and we’d talk—about the latest developments in quantum computing and information (Steve, a perpetual student, would pepper me with questions), or his recent work on nonlinear modifications of quantum mechanics, or his memories of Cambridge, MA, or climate change or the anti-Israel protests in Austin or whatever else. These conversations, brief and inconsequential as they probably were to him, were highlights of my week.

There was, of course, something a little melancholy about getting to know such a great man only in the twilight of his life. To be clear, Steve Weinberg in his mid-to-late 80s was far more cogent, articulate, and quick to understand what was said to him than just about anyone you’d ever met in their prime. But then, after a short conversation, he’d have to leave for a nap. Steve was as clear-eyed and direct about his age and impending mortality as he was about everything else. “Scott!” he once greeted me. “I just saw the announcement for your physics colloquium about quantum supremacy. I hope I’m still alive next month to attend it.”

(As it happens, the colloquium in question was on November 9, 2016, the day we learned that Trump would become president. I offered to postpone the talk, since no one could concentrate on physics on such a day. While several of the physicists agreed that that was the right call, Steve convinced me to go ahead with the following message: “I sympathize, but I do want to hear you … There is some virtue in just plowing on.”)

I sometimes felt, as well, like I was speaking with Steve across a cultural chasm even greater than the half-century that separated us in age. Steve enjoyed nothing more than to discourse at length, in his booming New-York-accented baritone, about opera, or ballet, or obscure corners of 18th-century history. It would be easy to feel like a total philistine by comparison … and I did. Steve also told me that he never reads blogs or other social media, since he’s unable believe any written work is “real” unless it’s published, ideally on paper. I could only envy such an attitude.

If you did try to judge by the social media that he never read, you might conclude that Steve would be remembered by the wider world less for any of his epochal contributions to physics than for a single viral quote of his:

With or without religion, good people can behave well and bad people can do evil; but for good people to do evil — that takes religion.

I can testify that Steve fully lived his atheism. Four years ago, I invited him (along with many other UT colleagues) to the brit milah of my newborn son Daniel. Steve said he’d be happy to come over our house another time (and I’m happy to say that he did a year later), but not to witness any body parts being cut.

Despite his hostility to Judaism—along with every other religion—Steve was a vociferous supporter of the state of Israel, almost to the point of making me look like Edward Said or Noam Chomsky. For Steve, Zionism was not in spite of his liberal, universalist Enlightenment ideals but because of them.

Anyway, there’s no need even to wonder whether Steve had any sort of deathbed conversion. He’d laugh at the thought.

In 2016, Steve published To Explain the World, a history of human progress in physics and astronomy from the ancient Greeks to Newton (when, Steve says, the scientific ethos reached the form that it still basically has today). It’s unlike any other history-of-science book that I’ve read. Of course I’d read other books about Aristarchus and Ptolemy and so forth, but I’d never read a modern writer treating them not as historical subjects, but as professional colleagues merely separated in time. Again and again, Steve would redo ancient calculations, finding errors that had escaped historical notice; he’d remark on how Eratosthenes or Kepler could’ve done better with the data available to them; he’d grade the ancients by how much of modern physics and cosmology they’d correctly anticipated.

To Explain the World was savaged in reviews by professional science historians. Apparently, Steve had committed the unforgivable sin of “Whig history”: that is, judging past natural philosophers by the standards of today. Steve clung to the naïve, debunked, scientistic notions that there’s such a thing as “actual right answers” about how the universe works; that we today are, at any rate, much closer to those right answers than the ancients were; and that we can judge the ancients by how close they got to the right answers that we now know.

As I read the sneering reviews, I kept thinking: so suppose Archimedes, Copernicus, and all the rest were brought back from the dead. Who would they rather talk to: historians seeking to explore every facet of their misconceptions, like anthropologists with a paleolithic tribe; or Steve Weinberg, who’d want to bring them up to speed as quickly as possible so they could continue the joint quest?

When it comes to the foundations of quantum mechanics, Steve took the view that no existing interpretation is satisfactory, although the Many-Worlds Interpretation is perhaps the least bad of the bunch. Steve felt that our reaction to this state of affairs should be to test quantum mechanics more precisely—for example, by looking for tiny nonlinearities in the Schrödinger equation, or other signs that QM itself is only a limit of some more all-encompassing theory. This is, to put it mildly, not a widely-held view among high-energy physicists—but it provided a fascinating glimpse into how Steve’s mind works.

Here was, empirically, the most successful theoretical physicist alive, and again and again, his response to conceptual confusion was not to ruminate more about basic principles but to ask for more data or do a more detailed calculation. He never, ever let go of a short tether to the actual testable consequences of whatever was being talked about, or future experiments that might change the situation.

(Steve worked on string theory in the early 1980s, and he remained engaged with it for the rest of his life, for example by recruiting the string theorists Jacques Distler and Willy Fischler to UT Austin. But he later soured on the prospects for getting testable consequences out of string theory within a reasonable timeframe. And he once complained to me that the papers he’d read about “It from Qubit,” AdS/CFT, and the black hole information problem had had “too many words and not enough equations.”)

Steve was, famously, about as hardcore a reductionist as has ever existed on earth. He was a reductionist not just in the usual sense that he believed there are fundamental laws of physics, from which, together with the initial conditions, everything that happens in our universe can be calculated in principle (if not in practice), at least probabilistically. He was a reductionist in the stronger sense that he thought the quest to discover the fundamental laws of the universe had a special pride of place among all human endeavors—a place not shared by the many sciences devoted to the study of complex emergent behavior, interesting and important though they might be.

This came through clearly in Steve’s critical review of Stephen Wolfram’s A New Kind of Science, where Steve (Weinberg, that is) articulated his views of why “free-floating” theories of complex behavior can’t take the place of a reductionistic description of our actual universe. (Of course, I was also highly critical of A New Kind of Science in my review, but for somewhat different reasons than Steve was.) Steve’s reductionism was also clearly expressed in his testimony to Congress in support of continued funding for the Superconducting Supercollider. (Famously, Phil Anderson testified against the SSC, arguing that the money would better be spent on condensed-matter physics and other sciences of emergent behavior. The result: Congress did cancel the SSC, and it redirected precisely zero of the money to other sciences. But at least Steve lived to see the LHC dramatically confirm the existence of the Higgs boson, as the SSC would have.)

I, of course, have devoted my career to theoretical computer science, which you might broadly call a “science of emergent behavior”: it tries to figure out the ultimate possibilities and limits of computation, taking the underlying laws of physics as given. Quantum computing, in particular, takes as its input a physical theory that was already known by 1926, and studies what can be done with it. So you might expect me to disagree passionately with Weinberg on reductionism versus holism.

In reality, I have a hard time pinpointing any substantive difference. Mostly I see a difference in opportunities: Steve saw a golden chance to contribute something to the millennia-old quest to discover the fundamental laws of nature, at the tail end of the heroic era of particle physics that culminated in what we now call the Standard Model. He was brilliant enough to seize that chance. I didn’t see a similar chance: possibly because it no longer existed; almost certainly because, even if it did, I wouldn’t have had the right mind for it. I found a different chance, to work at the intersection of physics and computer science that was finally kicking into high gear at the end of the 20th century. Interestingly, while I came to that intersection from the CS side, quite a few who were originally trained as high-energy physicists ended up there as well—including a star PhD student of Steve Weinberg’s named John Preskill.

Despite his reductionism, Steve was as curious and enthusiastic about quantum computation as he was about a hundred other topics beyond particle physics—he even ended his quantum mechanics textbook with a chapter about Shor’s factoring algorithm. Having said that, a central reason for his enthusiasm about QC was that he clearly saw how demanding a test it would be of quantum mechanics itself—and as I mentioned earlier, Steve was open to the possibility that quantum mechanics might not be exactly true.

It would be an understatement to call Steve “left-of-center.” He believed in higher taxes on rich people like himself to service a robust social safety net. When Trump won, Steve remarked to me that most of the disgusting and outrageous things Trump would do could be reversed in a generation or so—but not the aggressive climate change denial; that actually could matter on the scale of centuries. Steve made the news in Austin for openly defying the Texas law forcing public universities to allow concealed carry on campus: he said that, regardless of what the law said, firearms would not be welcome in his classroom. (Louise, Steve’s wife for 67 years and a professor at UT Austin’s law school, also wrote perhaps the definitive scholarly takedown of the shameful Bush vs. Gore Supreme Court decision, which installed George W. Bush as president.)

All the same, during the “science wars” of the 1990s, Steve was scathing about the academic left’s postmodernist streak and deeply sympathetic to what Alan Sokal had done with his Social Text hoax. Steve also once told me that, when he (like other UT faculty) was required to write a statement about what he would do to advance Diversity, Equity, and Inclusion, he submitted just a single sentence: “I will seek the best candidates, without regard to race or sex.” I remarked that he might be one of the only academics who could get away with that.

I confess that, for the past five years, knowing Steve was a greater source of psychological strength for me than, from a rational standpoint, it probably should have been. Regular readers will know that I’ve spent months of my life agonizing over various nasty things people have said me about on Twitter and Reddit—that I’m a sexist white male douchebag, a clueless techbro STEMlord, a neoliberal Zionist shill, and I forget what else.

But I lately have had a secret emotional weapon that helped somewhat: namely, the certainty that Steven Weinberg had more intellectual power in a single toenail clipping than these Twitter-attackers had collectively experienced over the course of their lives. It’s like, have you heard the joke where two rabbis are arguing some point of Talmud, and then God speaks from a booming thundercloud to declare that the first rabbi is right, and then the second rabbi says “OK fine, now it’s 2 against 1?” For the W and Z bosons and Higgs boson that you predicted to turn up at the particle accelerator is not exactly God declaring from a thundercloud that the way your mind works is aligned with the way the world actually is—Steve, of course, would wince at the suggestion—but it’s about the closest thing available in this universe. My secret emotional weapon was that I knew the man who’d experienced this, arguably more than any of the 7.6 billion other living humans, and not only did that man not sneer at me, but by some freakish coincidence, he seemed to have reached roughly the same views as I had on >95% of controversial questions where we both had strong opinions.

My final conversations with Steve Weinberg were about a laptop. When covid started in March 2020, Steve and Louise, being in their late 80s, naturally didn’t want to take chances, and rigorously sheltered at home. But an issue emerged: Steve couldn’t install Zoom on his Bronze Age computer, and so couldn’t participate in the virtual meetings of his own group, nor could he do Zoom calls with his daughter and granddaughter. While as a theoretical computer scientist, I don’t normally volunteer myself as tech support staff, I decided that an exception was more than warranted in this case. The quickest solution was to configure one of my own old laptops with everything Steve needed and bring it over to his house.

Later, Steve emailed me to say that, while the laptop had worked great and been a lifesaver, he’d finally bought his own laptop, so I should come by to pick mine up. I delayed and delayed with that, but finally decided I should do it before leaving Austin at the beginning of this summer. So I emailed Steve to tell him I’d be coming. He replied to me asking Louise to leave the laptop on the porch — but the email was addressed only to me, not her.

At that moment, I knew something had changed: only a year before, incredibly, I’d been more senile and out-of-it as a 39-year-old than Steve had been as an 87-year-old. What I didn’t know at the time was that Steve had sent that email from the hospital when he was close to death. It was the last I heard from him.

(Once I learned what was going on, I did send a get-well note, which I hope Steve saw, saying that I hoped he appreciated that I wasn’t praying for him.)

Besides the quote about good people, bad people, and religion, the other quote of Steve’s that he never managed to live down came from the last pages of The First Three Minutes, his classic 1970s popularization of big-bang cosmology:

The more the universe seems comprehensible, the more it also seems pointless.

In the 1993 epilogue, Steve tempered this with some more hopeful words, nearly as famous:

The effort to understand the universe is one of the very few things which lifts human life a little above the level of farce and gives it some of the grace of tragedy.

It’s not my purpose here to resolve the question of whether life or the universe have a point. What I can say is that, even in his last years, Steve never for a nanosecond acted as if life was pointless. He already had all the material comforts and academic renown anyone could possibly want. He could have spent all day in his swimming pool, or listening to operas. Instead, he continued publishing textbooks—a quantum mechanics textbook in 2012, an astrophysics textbook in 2019, and a “Foundations of Modern Physics” textbook in 2021 (!). As recently as this year, he continued writing papers—and not just “great man reminiscing” papers, but hardcore technical papers. He continued writing with nearly unmatched lucidity for a general audience, in the New York Review of Books and elsewhere. And I can attest that he continued peppering visiting speakers with questions about stellar evolution or whatever else they were experts on—because, more likely than not, he had redone some calculation himself and gotten a subtly different result from what was in the textbooks.

If God exists, I can’t believe He or She would find nothing more interesting to do with Steve than to torture him for his unbelief. More likely, I think, God is right now talking to Steve the same way Steve talked to Aristarchus in To Explain the World: “yes, you were close about the origin of neutrino masses, but here’s the part you were missing…” While, of course, Steve is redoing God’s calculation to be sure.

Feel free to use the comments as a place to share your own memories.

More Steven Weinberg memorial links (I’ll continue adding to this over the next few days):

Miscellaneous Steven Weinberg links

Matt von HippelLessons From Neutrinos, Part II

Last week I talked about the history of neutrinos. Neutrinos come in three types, or “flavors”. Electron neutrinos are the easiest: they’re produced alongside electrons and positrons in the different types of beta decay. Electrons have more massive cousins, called muon and tau particles. As it turns out, each of these cousins has a corresponding flavor of neutrino: muon neutrinos, and tau neutrinos.

For quite some time, physicists thought that all of these neutrinos had zero mass.

(If the idea of a particle with zero mass confuses you, think about photons. A particle with zero mass travels, like a photon, at the speed of light. This doesn’t make them immune to gravity: just as no light can escape a black hole, neither can any other massless particle. It turns out that once you take into account Einstein’s general theory of relativity, gravity cares about energy, not just mass.)

Eventually, physicists started to realize they were wrong, and neutrinos had a small non-zero mass after all. Their reason why might seem a bit strange, though. Physicists didn’t weigh the neutrinos, or measure their speed. Instead, they observed that different flavors of neutrinos transform into each other. We say that they oscillate: electron neutrinos oscillate into muon or tau neutrinos, which oscillate into the other flavors, and so on. Over time, a beam of electron neutrinos will become a beam of mostly tau and muon neutrinos, before becoming a beam of electron neutrinos again.

That might not sound like it has much to do with mass. To understand why it does, you’ll need to learn this post’s lesson:

Lesson 2: Mass is just How Particles Move

Oscillating particles seem like a weird sort of evidence for mass. What would be a more normal kind of evidence?

Those of you who’ve taken physics classes might remember the equation F=ma. Apply a known force to something, see how much it accelerates, and you can calculate its mass. If you’ve had a bit more physics, you’ll know that this isn’t quite the right equation to use for particles close to the speed of light, but that there are other equations we can use in a similar way. In particular, using relativity, we have E^2=p^2 c^2 + m^2 c^4. (At rest, p=0, and we have the famous E=mc^2). This lets us do the same kind of thing: give something a kick and see how it moves.

So let’s say we do that: we give a particle a kick, and measure it later. I’ll visualize this with a tool physicists use called a Feynman diagram. The line represents a particle traveling from one side to the other, from “kick” to “measurement”:

Because we only measure the particle at the end, we might miss if something happens in between. For example, it might interact with another particle or field, like this:

If we don’t know about this other field, then when we try to measure the particle’s mass we will include interactions like this. As it turns out, this is how the Higgs boson works: the Higgs field interacts with particles like electrons and quarks, changing how they move, so that they appear to have mass.

Quantum particles can do other things too. You might have heard people talk about one particle turning into a pair of temporary “virtual particles”. When people say that, they usually have a diagram in mind like this:

In particle physics, we need to take into account every diagram of this kind, every possible thing that could happen in between “kick” and measurement. The final result isn’t one path or another, but a sum of all the different things that could have happened in between. So when we measure the mass of a particle, we’re including every diagram that’s allowed: everything that starts with our “kick” and ends with our measurement.

Now what if our particle can transform, from one flavor to another?

Now we have a new type of thing that can happen in between “kick” and measurement. And if it can happen once, it can happen more than once:

Remember that, when we measure mass, we’re measuring a sum of all the things that can happen in between. That means our particle could oscillate back and forth between different flavors many many times, and we need to take every possibility into account. Because of that, it doesn’t actually make sense to ask what the mass is for one flavor, for just electron neutrinos or just muon neutrinos. Instead, mass is for the thing that actually moves: an average (actually, a quantum superposition) over all the different flavors, oscillating back and forth any number of times.

When a process like beta decay produces an electron neutrino, the thing that actually moves is a mix (again, a superposition) of particles with these different masses. Because each of these masses respond to their initial “kick” in different ways, you see different proportions of them over time. Try to measure different flavors at the end, and you’ll find different ones depending on when and where you measure. That’s the oscillation effect, and that’s why it means that neutrinos have mass.

It’s a bit more complicated to work out the math behind this, but not unreasonably so: it’s simpler than a lot of other physics calculations. Working through the math, we find that by measuring how long it takes neutrinos to oscillate we can calculate the differences between (squares of) neutrino masses. What we can’t calculate are the masses themselves. We know they’re small: neutrinos travel at almost the speed of light, and our cosmological models of the universe have surprisingly little room for massive neutrinos: too much mass, and our universe would look very different than it does today. But we don’t know much more than that. We don’t even know the order of the masses: you might assume electron neutrinos are on average lighter than muon neutrinos, which are lighter than tau neutrinos…but it could easily be the other way around! We also don’t know whether neutrinos get their mass from the Higgs like other particles do, or if they work in a completely different way.

Unlike other mysteries of physics, we’ll likely have the answer to some of these questions soon. People are already picking through the data from current experiments, seeing if they hint towards one order of masses or the other, or to one or the other way for neutrinos to get their mass. More experiments will start taking data this year, and others are expected to start later this decade. At some point, the textbooks may well have more “normal” mass numbers for each of the neutrinos. But until then, they serve as a nice illustration of what mass actually means in particle physics.

n-Category Café Diversity and the Mysteries of Counting

Back in 2005 or so, John Baez was giving talks about groupoid cardinality, Euler characteristic, and strange objects with two and a half elements.

two-element group acting on five-element set

I saw a version of this talk at Streetfest in Sydney, called The Mysteries of Counting. It had a big impact on me.

This post makes one simple point: that by thinking along the lines John advocated, we can arrive at the exponential of Shannon entropy — otherwise known as diversity of order 11.

Background: counting with fractional elements

Consider a group GG acting on a set XX, both finite. How many “pieces” of XX are there, modulo GG?

The first, simple, answer is |X/G||X/G|, the number of orbits. For example, in the action of C 2C_2 on the five-element set shown in the diagram above, this gives an answer of 33.

But there’s a second answer: |X|/|G||X|/|G|, the cardinality of XX divided by the order of GG. If the action is free then this is equal to |X/G||X/G|, but in general it’s not. For instance, in the example shown, this gives an answer of 2122\tfrac{1}{2}.

The short intuitive justification for this alternative answer is that when counting, we often want to quotient out by the order of symmetry groups. For example, there are situations in which a point with an S nS_n-action deserves to count as only 1/n!1/n! elements. John’s slides say much more on this theme.

This second answer also fits into an abstract framework.

Every action of a group on a set gives rise to a so-called action groupoid X//GX/\!/G. Its objects are the elements of XX, and a map xyx \to y is an element gGg \in G such that gx=yg x = y. Equivalently, it’s the category of elements of the functor GSetG \to Set corresponding to the group action.

There is a notion of the cardinality |A||A| \in \mathbb{Q} of a finite groupoid AA. And when you take the cardinality of this action groupoid, you get exactly |X|/|G||X|/|G|. That is:

|X//G|=|X|/|G|. |X/\!/G| = |X|/|G|.

In summary, the first answer to “how many pieces are there?” is |X/G||X/G|, and the second is |X//G||X/\!/G|.

Now let’s do the same thing with monoids in place of groups. Let MM be a monoid acting on a set XX, and assume they’re both finite whenever we need to. How many “pieces” of XX are there, modulo MM?

Again, the obvious answer is |X/M||X/M|. Here X/MX/M is again the set of orbits. (You have to be a bit more careful defining “orbit” for monoids than for groups: the equivalence relation on XX is generated, not defined, by xyx \sim y if there exists mMm \in M such that mx=ym x = y.)

And again, there’s a second, alternative answer: |X|/|M||X|/|M|.

The abstract framework for this second answer is now about categories rather than groupoids. Every action of a monoid on a set gives rise to an action category X//MX/\!/M. It has the same explicit and categorical descriptions as in the group case.

There is a notion of the Euler characteristic or magnitude |A||A| \in \mathbb{Q} of a finite category AA. It could equally well be called “cardinality”, and generalizes groupoid cardinality. And when you take the Euler characteristic of this action category, you get |X|/|M||X|/|M|. That is:

|X//M|=|X|/|M|. |X/\!/M| = |X|/|M|.

So just as for groups, the first and second answers to “how many pieces are there?” are |X/M||X/M| and |X//M||X/\!/M|, respectively.

How diversity emerges

Take a surjective map of sets π:AI\pi: A \to I. We’re going to think about two structures that arise from π\pi:

  • The monoid End I(A)End_I(A) of endomorphisms of AA over II (that is, making the evident triangle commute). The monoid structure is given by composition.

  • The set End(A)End(A) of all endomorphisms of AA. We could give this a monoid structure too, but we’re only ever going to treat it as a set.

The monoid End I(A)End_I(A) acts on the set End(A)End(A) on the left, by composition: an endomorphism of AA followed by an endomorphism of AA over II is an endomorphism over AA. (The fact that one of them is over II does nothing.)

So, how many pieces of End(A)End(A) are there, modulo End I(A)End_I(A)?

The first answer is “count the number of orbits”. How many orbits are there?

It’s not too hard to show that there’s a canonical isomorphism

End(A)End I(A)I A, \frac{End(A)}{End_I(A)} \cong I^A,

where the right-hand side is the set of functions AIA \to I. It’s a simple little proof of a type that doesn’t work very well in a blog post. You start with the fact that composition with π\pi gives a map End(A)I AEnd(A) \to I^A and take it from there.

So assuming now that AA and II are finite, the number of orbits is given by

|End(A)End I(A)|=|I| |A|. \biggl| \frac{End(A)}{End_I(A)} \biggr| = |I|^{|A|}.

That’s the first answer, but while we’re here, let’s observe that this equation can be rearranged to give a funny formula for the cardinality of II:

|I|=|End(A)End I(A)| 1/|A|. |I| = \biggl| \frac{End(A)}{End_I(A)} \biggr|^{1/|A|}.

The second, alternative, answer is that the number of pieces is

|End(A)||End I(A)|. \frac{|End(A)|}{|End_I(A)|}.

What is this, explicitly? Write the fibre π 1(i)\pi^{-1}(i) as A iA_i. Then

|End(A)|=|A| |A|,|End I(A)|= iI|A i| |A i|, |End(A)| = |A|^{|A|}, \quad |End_I(A)| = \prod_{i \in I} |A_i|^{|A_i|},

and since A= iIA iA = \coprod_{i \in I} A_i, this gives

|End(A)||End I(A)|= iI(|A||A i|) |A i| \frac{|End(A)|}{|End_I(A)|} = \prod_{i \in I} \biggl( \frac{|A|}{|A_i|} \biggr)^{|A_i|}

as our second answer.

Now here’s the punchline. The first answer gave us a formula for the cardinality of II:

|I|=|End(A)End I(A)| 1/|A|. |I| = \biggl| \frac{End(A)}{End_I(A)} \biggr|^{1/|A|}.

What’s the analogous formula for the second answer? That is, in the equation

?=(|End(A)||End I(A)|) 1/|A|, ? = \biggl(\frac{|End(A)|}{|End_I(A)|}\biggr)^{1/|A|},

what goes on the left-hand side?

Well, it’s

iI(|A||A i|) |A i|/|A|. \prod_{i \in I} \biggl( \frac{|A|}{|A_i|} \biggr)^{|A_i|/|A|}.

We can simplify this if we write p i=|A i|/|A|p_i = |A_i|/|A|. These numbers p ip_i are nonnegative and sum to 11, so they amount to a probability distribution

p=(p i) iI=(|A i|/|A|) iI. \mathbf{p} = (p_i)_{i \in I} = (|A_i|/|A|)_{i \in I}.

The formula for “?” then simplifies to

iIp i p i. \prod_{i \in I} p_i^{-p_i}.

And this is nothing but the exponential of Shannon entropy, also called the diversity or Hill number D 1(p)D_1(\mathbf{p}) of order 11.

In conclusion, if we take that first-answer formula for the cardinality of II

|I|=|End(A)End I(A)| 1/|A| |I| = \biggl| \frac{End(A)}{End_I(A)} \biggr|^{1/|A|}

— then its second-answer analogue is a formula for the diversity of p\mathbf{p}

D 1(p)=(|End(A)||End I(A)|) 1/|A|. D_1(\mathbf{p}) = \biggl(\frac{|End(A)|}{|End_I(A)|}\biggr)^{1/|A|}.

And this is yet another glimpse of an analogy that André Joyal first explained to me one fateful day in Barcelona in 2008: exponential of entropy is a cousin of cardinality.

Doug NatelsonWorkshop highlights: Spins, 1D topo materials from carbon, and more

 While virtual meetings can be draining (no breaks to go hiking; no grabbing a beer and catching up, especially when attendees are spread out across a 7 timezones), this workshop was a great way for me to catch up on some science that I'd been missing.  I can't write up everything (mea culpa), but here are a few experimental highlights:

  • Richard Berndt's group has again shown that shot noise integrated with STM is powerful, and they have used tunneling noise measurements to probe where and how spin-polarized transport happens through single radical-containing molecules on gold surfaces.
  • Katharina Franke's group has looked at what happens when you have a localized spin on the surface of a superconductor.  Exchange coupling can rip apart Cooper pairs and bind a quasiparticle in what are called Yu-Shiba-Rusinov states.  With STM, it is possible to map these and related phenomena spatially, and the states can also be tuned via tip height, leading to very pretty data.
  • Amazing polymers from here.
    Pavel Jelinek gave a talk with some really eye-popping images as well as cool science.  I had not realized before that in 1D conjugated systems (think polyacetylene) it is possible to see a topological transition as a function of length, between a conjugated state (with valence-band-like orbitals filled, and conduction-band-like orbitals empty) and another conjugated state that has an unpaired electron localized at each end (equivalent to surface states) with effectively band inversion (empty valence-band-like states above filled conduction-band-like states) in the middle.  You can actually make polymers (shown here) that show these properties and image the end states via STM.  
  • Latha Venkataraman spoke about intellectually related work.  Ordinarily, even with a conjugated oligomer, conductance falls exponentially with increasing molecular length.   However, under the right circumstances, you can get the equivalent topological transition, creating resonant states localized at the molecular ends, and over some range of lengths, you can get electronic conduction increasing with increasing molecular length.  As the molecule gets longer the resonances become better defined and stronger, though at even larger lengths the two end states decouple from each other and conductance falls again.
  • Jascha Repp did a really nice job laying out their technique that is AFM with single-charge-tunneling to give STM-like information for molecules on insulating substrates.  Voltage pulses are applied in sync with the oscillating tip moving into close proximity with the molecule, such that single charges can be added or removed each cycle.  This is detected through shifts in the mechanical resonance of the AFM cantilever due to the electrostatic interactions between the tip and the molecule.  This enables time-resolved measurements as well, to look at things like excited state lifetimes in individual molecules.
The meeting is wrapping up today, and the discussions have been a lot of fun.  Hopefully we will get together in person soon!

Georg von HippelImpressions from LATTICE 2021, Part II

 As it turns out, Gather is not particularly stable. There are frequent glitches with the audio and video functionality, where people can't hear or can't be heard. The platform also doesn't run particularly well on Firefox (apparently only Google Chrome is officially supported) – for a web-based platform that seems almost inexcusable, since the web is all about free and open standards. One possible reason for the glitches may be the huge number of tracking scripts, cookies, beacons and other privacy-invasive technology that Gather tries to afflict its users with.

That being said, it is hard to imagine a much better principle for a virtual conference than the one Gather tries to implement, although I'm not entirely sure that automatically connecting to audio/video when simply walking past a person is necessarily the best approach.

Moving past the technical surface to the actual content of the conference, one major difference between Lattice 2021 and the typical lattice conference is the emphasis on special sessions, many of which (like the career panels) specifically target a younger audience. This makes a lot of sense given that the tiny conference fee and non-existent travel costs make this conference particularly affordable for early-stage graduate students who would normally not have the means to attend a conference. It is very praiseworthy that the LOC made a refreshing lemonade out of some Covid-infested lemons and created a conference that takes advantage of the otherwise rather unfortunate situation we all find ourselves in!

Tommaso DorigoMultiple Scattering: When Atoms Kick Particles Around

When subnuclear particles traverse matter they give rise to a multitude of physical phenomena. The richness of the different processes is a crucial asset for the construction of sensitive particle detectors, and it is interesting in its own right. Indeed, it has been a very vigorously pursued field of research of its own ever since the end of the nineteenth century, with the discovery of X rays
(produced when electrons released their kinetic energy as they reached the cathode of an accelerating tube), and then after Rutherford's team bombarded gold foils with alpha particles (helium nuclei) emitted by a radioactive substance.

read more

July 29, 2021

John BaezStructured vs Decorated Cospans (Part 2)

Decorated cospans are a framework for studying open systems invented by Brendan Fong. Since I’m now visiting the institute he and David Spivak set up—the Topos Institute—it was a great time to give a talk explaining the history of decorated cospans, their problems, and how those problems have been solved:

Structured vs Decorated Cospans

Abstract. One goal of applied category theory is to understand open systems: that is, systems that can interact with the external world. We compare two approaches to describing open systems as morphisms: structured and decorated cospans. Each approach provides a symmetric monoidal double category. Structured cospans are easier, decorated cospans are more general, but under certain conditions the two approaches are equivalent. We take this opportunity to explain some tricky issues that have only recently been resolved.

It’s probably best to get the slides here and look at them while watching this video:

If you prefer a more elementary talk explaining what structured and decorated cospans are good for, try these slides.

For videos and slides of two related talks go here:

Structured cospans and double categories.

Structured cospans and Petri nets.

For more, read these:

• Brendan Fong, Decorated cospans.

• Evan Patterson and Micah Halter, Compositional epidemiological modeling using structured cospans.

• John Baez and Kenny Courser, Structured cospans.

• John Baez, Kenny Courser and Christina Vasilakopoluou, Structured versus decorated cospans.

• Kenny Courser, Open Systems: a Double Categorical Perspective.

• Michael Shulman, Framed bicategories and monoidal fibrations.

• Joe Moeller and Christina Vasilakopolou, Monoidal Grothendieck construction.

To read more about the network theory project, go here:

Network Theory.

July 28, 2021

Scott AaronsonStriking new Beeping Busy Beaver champion

For the past few days, I was bummed about the sooner-than-expected loss of Steven Weinberg. Even after putting up my post, I spent hours just watching old interviews with Steve on YouTube and reading his old essays for gems of insight that I’d missed. (Someday, I’ll tackle Steve’s celebrated quantum field theory and general relativity textbooks … but that day is not today.)

Looking for something to cheer me up, I was delighted when Shtetl-Optimized reader Nick Drozd reported a significant new discovery in BusyBeaverology—one that, I’m proud to say, was directly inspired by my Busy Beaver survey article from last summer (see here for blog post).

Recall that BB(n), the nth Busy Beaver number (technically, “Busy Beaver shift number”), is defined as the maximum number of steps that an n-state Turing machine, with 1 tape and 2 symbols, can make on an initially all-0 tape before it invokes a Halt transition. Famously, BB(n) is not only uncomputable, it grows faster than any computable function of n—indeed, computing anything that grows as quickly as Busy Beaver is equivalent to solving the halting problem.

As of 2021, here is the extent of human knowledge about concrete values of this function:

  • BB(1) = 1 (trivial)
  • BB(2) = 6 (Lin 1963)
  • BB(3) = 21 (Lin 1963)
  • BB(4) = 107 (Brady 1983)
  • BB(5) ≥ 47,176,870 (Marxen and Buntrock 1990)
  • BB(6) > 7.4 × 1036,534 (Kropitz 2010)
  • BB(7) > 102×10^10^10^18,705,352 (“Wythagoras” 2014)

As you can see, the function is reasonably under control for n≤4, then “achieves liftoff” at n=5.

In my survey, inspired by a suggestion of Harvey Friedman, I defined a variant called Beeping Busy Beaver, or BBB. Define a beeping Turing machine to be a TM that has a single designated state where it emits a “beep.” The beeping number of such a machine M, denoted b(M), is the largest t such that M beeps on step t, or ∞ if there’s no finite maximum. Then BBB(n) is the largest finite value of b(M), among all n-state machines M.

I noted that the BBB function grows uncomputably even given an oracle for the ordinary BB function. In fact, computing anything that grows as quickly as BBB is equivalent to solving any problem in the second level of the arithmetical hierarchy (where the computable functions are in the zeroth level, and the halting problem is in the first level). Which means that pinning down the first few values of BBB should be even more breathtakingly fun than doing the same for BB!

In my survey, I noted the following four concrete results:

  • BBB(1) = 1 = BB(1)
  • BBB(2) = 6 = BB(2)
  • BBB(3) ≥ 55 > 21 = BB(3)
  • BBB(4) ≥ 2,819 > 107 = BB(4)

The first three of these, I managed to get on my own, with the help of a little program I wrote. The fourth one was communicated to me by Nick Drozd even before I finished my survey.

So as of last summer, we knew that BBB coincides with the ordinary Busy Beaver function for n=1 and n=2, then breaks away starting at n=3. We didn’t know how quickly BBB “achieves liftoff.”

But Nick continued plugging away at the problem all year, and he now claims to have resolved the question. More concretely, he claims the following two results:

  • BBB(3) = 55 (via exhaustive enumeration of cases)
  • BBB(4) ≥ 32,779,478 (via a newly-discovered machine)

For more, see Nick’s announcement on the Foundations of Mathematics email list, or his own blog post.

Nick actually writes in terms of yet another Busy Beaver variant, which he calls BLB, or “Blanking Beaver.” He defines BLB(n) to be the maximum finite number of steps that an n-state Turing machine can take before it first “wipes its tape clean”—that is, sets all the tape squares to 0, as they were at the very beginning of the computation, but as they were not at intermediate times. Nick has discovered a 4-state machine that takes 32,779,477 steps to blank out its tape, thereby proving that

  • BLB(4) ≥ 32,779,477.

Nick’s construction, when investigated, turns out to be based on a “Collatz-like” iterative process—exactly like the BB(5) champion and most of the other strong Busy Beaver contenders currently known. A simple modification of his construction yields the lower bound on BBB.

Note that the Blanking Beaver function does not have the same sort of super-uncomputable growth that Beeping Busy Beaver has: it merely grows “normally” uncomputably fast, like the original BB function did. Yet we see that BLB, just like BBB, already “achieves liftoff” by n=4, rather than n=5. So the real lesson here is that 4-state Turing machines can already do fantastically complicated things on blank tapes. It’s just that the usual definitions of the BB function artificially prevent us from seeing that; they hide the uncomputable insanity until we get to 5 states.

n-Category Café Topos Theory and Measurability

There was an interesting talk that took place at the Topos Institute recently – Topos theory and measurability – by Asgar Jamneshan, bringing category theory to bear on measure theory.

Jamneshan has been working with Terry Tao on this:

  • Asgar Jamneshan, Terence Tao, Foundational aspects of uncountable measure theory: Gelfand duality, Riesz representation, canonical models, and canonical disintegration (arXiv:2010.00681)

The topos aspect is not emphasized in this paper, but it seems to have grown out of a post by Tao – Real analysis relative to a finite measure space – which did.

Jamneshan explains in the talk that he is using an internal language for some Boolean toposes, but can rewrite this via a “conditional set theory” (Slide 12). In another of his papers,

  • Asgar Jamneshan, An uncountable Furstenberg-Zimmer structure theory (arXiv:2103.17167),

he writes:

we apply tools from topos theory as suggested … by Tao. We will not use topos theory directly nor define a sheaf anywhere in this paper. Instead, we use the closely related conditional analysis… A main advantage of conditional analysis is that it can be setup without much costs and understood with basic knowledge in measure theory and functional analysis…

For the readers familiar with topos theory and interested in the connection between ergodic theory and topos theory, we include several extensive remarks relating the conditional analysis of this paper to the internal discourse in Boolean Grothendieck sheaf topoi.

In the talk itself, after answering some questions, Jamneshan poses a question of his own for topos theorists (at 1:02:20). Perhaps a Café-goer has something to say.

Some other work on Boolean toposes and measure theory is mentioned on the nLab here.

n-Category Café Borel Determinacy Does Not Require Replacement

Ask around for an example of ordinary mathematics that uses the axiom scheme of replacement in an essential way, and someone will probably say “the Borel determinacy theorem”. It’s probably the most common answer to this question.

As an informal statement, it’s not exactly wrong: there’s a precise mathematical result behind it. But I’ll argue that it’s misleading. It would be at least as accurate, arguably more so, to say that Borel determinacy does not require replacement.

For the purposes of this post, it doesn’t really matter what the Borel determinacy theorem says. I’ll give a lightning explanation, but you can skip even that.

Thanks to David Roberts for putting me on to this. You can read David’s recent MathOverflow question on this point too.

To read this post, all you only need to know about the statement of the Borel determinacy theorem is that it involves nothing set-theoretically exotic. Here it is:

Theorem (Borel determinacy)   Let AA be a set and XA X \subseteq A^{\mathbb{N}}. If something then something.

Neither “something” is remotely sneaky. The highest infinity mentioned is \mathbb{N}. Moreover, the set AA is often taken to be \mathbb{N} too.

If you want to know more about the theorem, there are plenty of explanations out there by people more qualified than me, including a very nice series of blog posts by Tim Gowers. But perhaps it will create an undeserved aura of mystery if I don’t actually state the theorem, so I will. I’ll indent it for easy ignoring.

Ignorable digression: what Borel determinacy says   Given a set AA and a subset XA X \subseteq A^\mathbb{N}, we imagine a two-player game as follows. Player I chooses an element a 0a_0 of AA, then player II chooses an element a 1a_1 of AA, then player I chooses an element a 2a_2 of AA, and so on forever, alternating. If the resulting sequence (a 0,a 1,a 2,)(a_0, a_1, a_2, \ldots) is in XX then player I wins. If not, II wins.

A strategy for player I is a function that takes as its input an even-length finite sequence of elements of AA (to be thought of as the moves played so far) and produces as its output a single element of AA (to be thought of as the recommended next move for player I). It is a winning strategy if following it guarantees a win for player I, regardless of what II plays. Winning strategies for player II are defined similarly.

Does either of the players have a winning strategy? Clearly they can’t both have one, and clearly the answer depends on AA and XX. If one of the players has a winning strategy, the game is said to be determined.

A negative result: using the axiom of choice, one can construct for any set A>1A \gt 1 a subset XA X \subseteq A^{\mathbb{N}} such that the game is not determined.

Positive results tend to be stated in terms of the product topology on A A^\mathbb{N}, where AA itself is seen as discrete. For example, it’s fairly easy to show that if XX is open or closed in A A^\mathbb{N} then the game is determined.

A subset of a topological space is Borel if it belongs to the σ\sigma-algebra generated by the open sets, that is, the smallest class of subsets containing the open sets and closed under complements and countable unions.

The Borel determinacy theorem, proved by Donald A. (Tony) Martin in 1975, states that if XA X \subseteq A^\mathbb{N} is Borel then the game is determined.

What is it that people say about Borel determinacy and replacement? Here are some verbatim quotations from people who surely know more about Borel determinacy than I do.

From the Wikipedia entry on the axiom scheme of replacement:

Harvey Friedman showed that replacement is required to show that Borel sets are determined

(my bold, here and throughout).

From a nice math.stackexchange answer by set theorist Andrés Caicedo:

it [the proof of the Borel determinacy theorem] uses replacement in an unavoidable manner.

From the opening post of that splendid series by Tim Gowers:

In order to iterate the power set operation many times, you have to use the axiom of replacement many times.

Gowers begins by citing two helpful sources, one by Shahzed Ahmed, who writes (p.2):

in 1971 Friedman showed it is not possible to prove Borel Determinacy without the replacement axiom, at least for an initial segment of the universe

…and one by Ross Bryant (p.3):

a proof of Borel Determinacy […] would become the first known theorem realizing the full potency of the Axiom of Replacement. […] Refinements of the proof in [Mar85] revealed a purely inductive proof and the full use of Replacement in the notion of a covering.

(Of all these quotes, this is the only one I’d call factually incorrect.)

From a paper by philosopher Michael Potter (p.185):

There are results about the fine structure of sets of real numbers and how they mesh together which require replacement for their proof. The clearest known example is a result of Martin to the effect that every Borel game is determined, which has been shown by Harvey Friedman to need replacement.

From Martin’s 1975 paper proving the Borel determinacy theorem:

Borel determinacy is probably then the first theorem whose statement does not blatantly involve the axiom of replacement but whose proof is known to require the axiom of replacement.

From philosopher of mathematics Penelope Maddy (Believing the axioms I, p.490):

Recently, however, Martin used Replacement to show that all Borel sets are determined […] Earlier work of Friedman establishes that this use of Replacement is essential.

And from a recent paper by logician Juan Aguilera (p.2):

there are many examples of theorems that cannot be proved without the use of the axiom of replacement. An early example of this was that of Borel determinacy. […] Even before Martin’s Borel determinacy theorem had been proved, it was known that any proof would need essential use of the axiom of replacement.

So there you have it. Experts agree that to prove the Borel determinacy theorem, the axiom scheme of replacement is required, needed, unavoidable, essential. You have to use it. Without replacement, it is not possible to prove Borel determinacy. It cannot be proved otherwise.

I want to argue that this emphasis is misplaced and misleading. It is especially misplaced from the point of view of categorical set theory, or more specifically Lawvere’s Elementary theory of the category of sets (ETCS).

The mathematical point is that to prove Borel determinacy, what you need to be able to do is construct the transfinitely iterated power set 𝒫 W(A)\mathcal{P}^W(A) for all countable well-ordered sets WW (or ordinals if you prefer). That is, you need 𝒫(A)\mathcal{P}(A), then 𝒫(𝒫(A))\mathcal{P}(\mathcal{P}(A)), and generally 𝒫 n(A)\mathcal{P}^n(A) for all natural numbers nn, then their supremum 𝒫 ω(A)\mathcal{P}^\omega(A), then its power set 𝒫 ω+1(A)\mathcal{P}^{\omega + 1}(A), and so on through all countable well-ordered exponents.

When I say you “need to be able to do” this, I mean it in a strong sense: Friedman showed in the early 1970s that Borel determinacy can’t be proved otherwise (even, I think, in the case where |A|=2|A| = 2). And when Martin later proved Borel determinacy, his proof used this construction and no more.

If we remove replacement from ZFC then transfinitely iterating the power set construction is impossible, so Borel determinacy cannot be proved. Indeed, Friedman found a model of ZFC-without-replacement in which the Borel determinacy theorem is false. This is the precise result that lies behind the quotes above.

So what’s the problem? It’s that replacement is vastly more than necessary.

Let me tell you a story.

One morning I went for a long walk. Far from home, I started to feel a bit hungry and wanted to buy a snack. But I realized I hadn’t brought any money . What could I do?

Out of nowhere appeared a chef. Somehow, magically, he seemed to know what I needed. “Come into my restaurant!” he cried, beckoning me in. Inside, he showed me a big table groaning with food, enough to feed a wedding party. “This is all for you!”

Table filled with food

“But I only want a snack,” I replied.

“It’s all or nothing. Either you eat everything on this table, or you go hungry,” he said, locking the door.

I weighed up the pros and cons.

I made my decision.

I sat down to eat.

Darkness had fallen before I emerged from the restaurant, belly swollen, waddling, beyond queasy. My urge for a snack had been satisfied, but what terrible price had I paid?

In the months that followed, a rumour began to circulate. People said I “required” enough food for fifty. In awed whispers, they told of how my legendary hunger “could not be satisfied” otherwise, that this vast spread was “essential” for me to live. I “needed” it, they said. Some claimed I had to have the “full table”. Without that gargantuan smorgasbord, they said, it was “not possible” for me to fulfil my urge for a snack.

I ask you, is that fair?

All I actually needed was a snack, and all the Borel determinacy theorem actually needs is the axiom “all beths exist”. This means that the transfinitely iterated power set W=𝒫 W()\beth_W = \mathcal{P}^W(\mathbb{N}) exists for all well-ordered sets WW. As I explained in a recent post, it is equivalent to any of the following conditions.

  • Recursive, order-theoretic version: for a well-ordered set WW, we say that W\beth_W exists if there is some infinite set BB such that B2 VB \geq 2^{\beth_V} for every well-ordered set VV smaller than WW. In that case, W\beth_W is defined to be the smallest such set BB. “All beths exist” means that W\beth_W exists for all well-ordered sets WW.

  • Non-recursive, order-theoretic version: “all beths exist” means that for every well-ordered set WW, there exist a set XX and a function p:XWp: X \to W with the following property: for wWw \in W, the fibre p 1(w)p^{-1}(w) is the smallest infinite set 2 p 1(v)\geq 2^{p^{-1}(v)} for all v<wv \lt w.

  • Non-recursive, non-order-theoretic version: “all beths exist” means that for every set II, there exist a set XX and a function p:XIp: X \to I such that for all distinct i,jIi, j \in I, either 2 p 1(i)p 1(j)2^{p^{-1}(i)} \leq p^{-1}(j) or vice versa.

If all beths exist then for any set AA and well-ordered set WW, we can form the transfinitely iterated power set 𝒫 W(A)\mathcal{P}^W(A), which is enough to prove the Borel determinacy theorem.

In fact, for the Borel determinacy theorem, we only need this when WW is countable. Moreover, one often assumes that the set AA on which the game is played is countable too. In that case, we only need countable beths to exist (where “countable” refers to the WW of W\beth_W, not W\beth_W itself).

The axiom that all beths exist is vastly weaker than the axiom scheme of replacement. (And the axiom that all countable beths exist, weaker still.) To give the merest hint of the distance between them, it is consistent with the existence of all beths that there are no beth fixed points, whereas replacement implies the existence of unboundedly many beth fixed points. See the diagram here.

So the take-home message is this:

Borel determinacy does not require replacement. It only requires the existence of all beths.

I now want to say something loud and clear:

Everyone knows this.

Well, not literally everyone, otherwise I wouldn’t be bothering to write this post. (It comes as a surprise to some.) By “everyone” I mean everyone who has worked through a proof of the Borel determinacy theorem, including everyone I’ve quoted and presumably most working set theorists. I’m taking no one for a fool!

First, they know Borel determinacy can’t possibly need the full power of replacement for the simple reason that replacement is an axiom scheme, a bundle of infinitely many axioms, one for each first-order formula of the appropriate kind. Any proof of a non-exotic theorem (I mean, one not quantified over formulas) can use only finitely many of these axioms. But more specifically, for this particular proof, “everyone” knows which instances are needed.

Indeed, many of the people I quoted earlier made this point too, often in the same sources I quoted from. To be fair to all concerned, I’ll show them doing so. Here’s Andrés Caicedo:

the [set needed] is something like 𝒫 α()\mathcal{P}^\alpha(\mathbb{N}) if the original Borel set was at level α\alpha.

Juan Aguilera (p.2):

Friedman’s work showed that any proof of Borel determinacy would require the use of arbitrarily large countable iterations of the power set operator, and Martin’s work showed that this suffices. A convenient slogan is that Borel determinacy captures the strength of countably iterated powersets.

Tony Martin (p.1):

Later (in §2.3) we will use results from §1.4 in analyzing level by level how much of the Power Set and Replacement Axioms is needed for our proof of the determinacy of Borel games.

Set theorist Asaf Karagila:

Even the famous Borel determinacy result (which is the usual appeal of Replacement outside of set theory […]) only requires ω 1\beth_{\omega_1} which is far, far below the least [beth] fixed point.

And Penelope Maddy (p.489, shortly before she gets on to Borel determinacy specifically):

Around 1922, both Fraenkel and Skolem noticed that Zermelo’s axioms did not imply the existence of

{N,𝒫(N),𝒫(𝒫(N)),} \{ N, \mathcal{P}(N), \mathcal{P}(\mathcal{P}(N)), \ldots \}

or the cardinal number ω\aleph_\omega. These were so much in the spirit of informal set theory that Skolem proposed an Axiom of Replacement to provide for them.

Did you notice something about that last quotation? Maddy goes straight from the existence of ω\aleph_\omega and ω\beth_\omega to the full axiom scheme of replacement. To be fair, she’s recounting history rather than doing mathematics. But it’s like saying “Skolem needed a snack, so he ate enough for an entire wedding party”.

In summary, it seems to be common for people (especially set theorists) to state explicitly that replacement is required or necessary or essential to prove Borel determinacy — that it cannot be proved without replacement — while fully aware that only a minuscule fragment of replacement is, in fact, needed.

I’ve said that I think this is misleading, but it also strikes me as curious. Those who do this, many of them experts, are eliding the difference between a light snack and a health-threateningly huge food mountain. Why would they do that?

My theory is that it’s down to the success of ZFC. Now that the axioms are fixed and published in hundreds of textbooks, it’s natural to think of each axiom in take-it-or-leave-it terms. We include replacement or we don’t. It’s the same choice the chef gave me: the full table or nothing.

Membership-based set theorists do study fragments of replacement, including in this context. This conversation began when David Roberts pointed out Corollary 2.3.8 and Remark (i) after it in Martin’s book draft, which concern exactly this point. But there are orders of magnitude fewer people who read technical side-notes like Martin’s than who read and repeat the simple but misleading message “Borel determinacy requires replacement”.

For those of us whose favourite set theory is ETCS, replacement is just one of many possible axioms with which one could supplement the core system. There are many others. “All beths exist” is one of them — a far weaker one — which is perfectly natural and perfect for the job at hand. Reflexively reaching for replacement just isn’t such a temptation in categorical set theory.

All of that leaves a question:

Is there any prominent result in mathematics, outside logic and set theory, that can be proved using replacement but cannot be proved using “all beths exist”?

I don’t know. People ask questions similar to this on MathOverflow from time to time (although I don’t think anyone’s asked exactly this one). I haven’t been through the answers systematically. But I will note that in Wikipedia’s list of applications of replacement, Borel determinacy is the only entry that could be said to lie outside set theory. (And the remainder are meaningless in an isomorphism-invariant approach to set theory, largely involving the distinction between ordinals and well-ordered sets.) So if there’s an answer to my question, someone should tell the world.

July 27, 2021

John BaezThermodynamics and Economic Equilibrium

I’m having another round of studying thermodynamics, and I’m running into more interesting leads than I can keep up with. Like this paper:

• Eric Smith and Duncan K. Foley, Classical thermodynamics and economic general equilibrium theory, Journal of Economic Dynamics and Control 32 (2008) 7–65.

I’ve always been curious about the connection between economics and thermodynamics, but I know too little about economics to make this easy to explore. There are people who work on subjects called thermoeconomics and econophysics, but classical economists consider them ‘heterodox’. While I don’t trust classical economists to be right about things, I should probably learn more classical economics before I jump into the fray.

Still, the introduction of this paper is intriguing:

The relation between economic and physical (particularly thermodynamic) concepts of equilibrium has been a topic of recurrent interest throughout the development of neoclassical economic theory. As systems for defining equilibria, proving their existence, and computing their properties, neoclassical economics (Mas-Collel et al., 1995; Varian, 1992) and classical thermodynamics (Fermi, 1956) undeniably have numerous formal and methodological similarities. Both fields seek to describe system phenomena in terms of solutions to constrained optimization problems. Both rely on dual representations of interacting subsystems: the state of each subsystem is represented by pairs of variables, one variable from each pair characterizing the subsystem’s content, and the other characterizing the way it interacts with other subsystems. In physics the content variables are quantities like asubsystem’s total energy or the volume in space it occupies; in economics they area mounts of various commodities held by agents. In physics the interaction variables are quantities like temperature and pressure that can be measured on the system boundaries; in economics they are prices that can be measured by an agent’s willingness to trade one commodity for another.

In thermodynamics these pairs are called conjugate variables. The ‘content variables’ are usually called extensive and the ‘interaction variables’ are usually called intensive. A vector space with conjugate pairs of variables as coordinates is a symplectic vector space, and I’ve written about how these show up in the category-theoretic approach to open systems:

• John Baez, A compositional framework for passive linear networks, Azimuth, 28 April 2015.

Continuing on:

The significance attached to these similarities has changed considerably, however, in the time from the first mathematical formulation of utility (Walras, 1909) to the full axiomatization of general equilibrium theory (Debreu, 1987). Léon Walras appears (Mirowski, 1989) to have conceptualized economic equilibrium as a balance of the gradients of utilities, more for the sake of similarity to the concept of force balance in mechanics, than to account for any observations about the outcomes of trade. Fisher (1892) (a student of J. Willard Gibbs) attempted to update Walrasian metaphors from mechanics to thermodynamics, but retained Walras’s program of seeking an explicit parallelism between physics and economics.

This Fisher is not the geneticist and statistician Ronald Fisher who came up with Fisher’s fundamental theorem. It’s the author of this thesis:

• Irving Fisher, Mathematical Investigations in the Theory of Value and Prices, Ph.D. thesis, Yale University, 1892.

Continuing on with Smith and Foley’s paper:

As mathematical economics has become more sophisticated (Debreu, 1987) the naive parallelism of Walras and Fisher has progressively been abandoned, and with it the sense that it matters whether neoclassical economics resembles any branch of physics. The cardinalization of utility that Walras thought of as a counterpart to energy has been discarded, apparently removing the possibility of comparing utility with any empirically measurable quantity. A long history of logically inconsistent (or simply unproductive) analogy making (see Section 7.2) has further caused the topic of parallels to fall out of favor. Samuelson (1960) summarizes well the current view among many economists, at the end of one of the few methodologically sound analyses of the parallel roles of dual representation in economics and physics:

The formal mathematical analogy between classical thermodynamics and mathematic economic systems has now been explored. This does not warrant the commonly met attempt to find more exact analogies of physical magnitudes—such as entropy or energy—in the economic realm. Why should there be laws like the first or second laws of thermodynamics holding in the economic realm? Why should ‘utility’ be literally identified with entropy, energy, or anything else? Why should a failure to make such a successful identification lead anyone to overlook or deny the mathematical isomorphism that does exist between minimum systems that arise in different disciplines?

The view that neoclassical economics is now mathematically mature, and that it is mere coincidence and no longer relevant whether it overlaps with any body of physical theory, is reflected in the complete omission of the topic of parallels from contemporary graduate texts (Mas-Collel et al., 1995). We argue here that, despite its long history of discussion, there are important insights still to be gleaned from considering the relation of neoclassical economics to classical thermodynamics. The new results concerning this relation we present here have significant implications, both for the interpretation of economic theory and for econometrics. The most important point of this paper (more important than the establishment of formal parallels between thermodynamics and utility economics) is that economics, because it does not recognize an equation of state or define prices intrinsically in terms of equilibrium, lacks the close relation between measurement and theory physical thermodynamics enjoys.

Luckily, the paper seems to be serious about explaining economics to those who know thermodynamics (and maybe vice versa). So, I will now read the rest of the paper—or at least skim it.

One interesting simple point seems to be this: there’s an analogy between entropy maximization and utility maximization, but it’s limited by the following difference.

In classical thermodynamics the total entropy of a closed system made of subsystems is the sum of the entropies of the parts. While the second law forbids the system from moving to a state to a state of lower total entropy, the entropies of some parts can decrease.

By contrast, in classical economics the total utility of a collection of agents is an unimportant quantity: what matters is the utility of each individual agent. The reason is that we assume the agents will voluntarily move from one state to another only if the utility of each agent separately increases. Furthermore, if we believe we can reparametrize the utility of each agent without changing anything, it makes no sense to add utilities.

(On the other hand, some utilitarian ethicists seem to believe it makes sense to add utilities and try to maximize the total. I imagine that libertarians would consider this ‘totalitarian’ approach morally unacceptable. I’m even less eager to enter discussions of the foundations of ethics than of economics, but it’s interesting how the question of whether a quantity can or ‘should’ be totaled up and then maximized plays a role in this debate.)

Doug Natelson2021 Telluride workshop on Quantum Transport in Nanoscale Systems

This week I'm (virtually) attending this workshop, which unfortunately is zoom-based because of the ongoing pandemic and travel restrictions.  As I've mentioned in previous years, it's rather like a Gordon Conference, in that it's supposed to be a smaller meeting with a decent amount of pre-publication work.   I'll write up some highlights later, but for now I wanted to feature this image.  At left is the molecular structure of [5]triangulene, which can be assembled by synthetic chemistry methods and surface catalysis (in this case on Au(111) surface).  At right is an AFM image taken using a tip functionalized by a carbon monoxide molecule.  In case you ever doubted, those cartoons from chemistry class are sometimes accurate!

n-Category Café Entropy and Diversity Is Out!

My new book, Entropy and Diversity: The Axiomatic Approach, is in the shops!

If you live in a place where browsing the shelves of an academic bookshop is possible, maybe you’ll find it there. If not, you can order the paperback or hardback from CUP. And you can always find it on the arXiv.

Paperback and hardback with flowers and foliage

I posted here when the book went up on the arXiv. It actually appeared in the shops a couple of months ago, but at the time all the bookshops here were closed by law and my feelings of celebration were dampened.

But today someone asked a question on MathOverflow that prompted me to write some stuff about the book and feel good about it again, so I’m going to share a version of that answer here. It was long for MathOverflow, but it’s shortish for a blog post.

There are lots of aspects of the book that I’m not going to write about here. My previous post summarizes the contents, and the book itself is quite discursive. Today, I’m just going to stick to that MathOverflow question (by Aidan Rocke) and what I answered.

Briefly put, the question asked:

Might there be a natural geometric interpretation of the exponential of entropy?

I answered more or less as follows.

Paperback and hardback with flowers and foliage

The direct answer to your literal question is that I don’t know of a compelling geometric interpretation of the exponential of entropy. But the spirit of your question is more open, so I’ll explain (1) a non-geometric interpretation of the exponential of entropy, and (2) a geometric interpretation of the exponential of maximum entropy.

Diversity as the exponential of entropy

The exponential of entropy (Shannon entropy, or more generally Rényi entropy) has long been used by ecologists to quantify biological diversity. One takes a community with nn species and writes p=(p 1,,p n)\mathbf{p} = (p_1, \ldots, p_n) for their relative abundances, so that p i=1\sum p_i = 1. Then D q(p)D_q(\mathbf{p}), the exponential of the Rényi entropy of p\mathbf{p} of order q[0,]q \in [0, \infty], is a measure of the diversity of the community, or the “effective number of species” in the community.

Ecologists call D qD_q the Hill number of order qq, after the ecologist Mark Hill, who introduced them in 1973 (acknowledging the prior work of Rényi). There is a precise mathematical sense in which the Hill numbers are the only well-behaved measures of diversity, at least if one is modelling an ecological community in this crude way. That’s Theorem 7.4.3 of my book. I won’t talk about that here.

Explicitly, for q[0,]q \in [0, \infty],

D q(p)=( i:p i0p i q) 1/(1q) D_q(\mathbf{p}) = \biggl( \sum_{i:\,p_i \neq 0} p_i^q \biggr)^{1/(1 - q)}

(q1,q \neq 1, \infty). The two exceptional cases are defined by taking limits in qq, which gives

D 1(p)= i:p i0p i p i D_1(\mathbf{p}) = \prod_{i:\, p_i \neq 0} p_i^{-p_i}

(the exponential of Shannon entropy) and

D (p)=1/max i:p i0p i. D_\infty(\mathbf{p}) = 1/\max_{i:\, p_i \neq 0} p_i.

Rather than picking one qq to work with, it’s best to consider all of them. So, given an ecological community and its abundance distribution p\mathbf{p}, we graph D q(p)D_q(\mathbf{p}) against qq.

This is called the diversity profile of the community, and is quite informative. Different values of the parameter qq tell you different things about the community. Specifically, low values of qq pay close attention to rare species, and high values of qq ignore them.

For example, here’s the diversity profile for the global community of great apes:

ape diversity profile

(from Figure 4.3 of my book). What does it tell us? At least two things:

  • The value at q=0q = 0 is 88, because there are 88 species of great ape present on Earth. D 0D_0 measures only presence or absence, so that a nearly extinct species contributes as much as a common one.

  • The graph drops very quickly to 11 — or rather, imperceptibly more than 11. This is because 99.9% of ape individuals are of a single species (humans, of course: we “outcompeted” the rest, to put it diplomatically). It’s only the very smallest values of qq that are affected by extremely rare species. Non-small qqs barely notice such rare species, so from their point of view, there is essentially only 11 species. That’s why D q(p)1D_q(\mathbf{p}) \approx 1 for most qq.

Maximum diversity as a geometric invariant

A major drawback of the Hill numbers is that they pay no attention to how similar or dissimilar the species may be. “Diversity” should depend on the degree of variation between the species, not just their abundances. Christina Cobbold and I found a natural generalization of the Hill numbers that factors this in — similarity-sensitive diversity measures.

I won’t give the definition (see that last link or Chapter 6 of the book), but mathematically, this is basically a definition of the entropy or diversity of a probability distribution on a metric space. (As before, entropy is the log of diversity.) When all the distances are \infty, it reduces to the Rényi entropies/Hill numbers.

And there’s some serious geometric content here.

Let’s think about maximum diversity. Given a list of species of known similarities to one another — or mathematically, given a metric space — one can ask what the maximum possible value of the diversity is, maximizing over all possible species distributions p\mathbf{p}. In other words, what’s the value of

sup pD q(p), \sup_{\mathbf{p}} D_q(\mathbf{p}),

where D qD_q now denotes the similarity-sensitive (or metric-sensitive) diversity? Diversity is not usually maximized by the uniform distribution (e.g. see Example 6.3.1 in the book), so the question is not trivial.

In principle, the answer depends on qq. But magically, it doesn’t! Mark Meckes and I proved this. So the maximum diversity

D max(X):=sup pD q(p) D_{\text{max}}(X) := \sup_{\mathbf{p}} D_q(\mathbf{p})

is a well-defined real invariant of finite metric spaces XX, independent of the choice of q[0,]q \in [0, \infty].

All this can be extended to compact metric spaces, as Emily Roff and I showed. So every compact metric space has a maximum diversity, which is a nonnegative real number.

What on earth is this invariant? There’s a lot we don’t yet know, but we do know that maximum diversity is closely related to some classical geometric invariants.

For instance, when X nX \subseteq \mathbb{R}^n is compact,

Vol(X)=n!ω nlim tD max(tX)t n, \text{Vol}(X) = n! \omega_n \lim_{t \to \infty} \frac{D_{\text{max}}(tX)}{t^n},

where ω n\omega_n is the volume of the unit nn-ball and tXtX is XX scaled by a factor of tt. This is Proposition 9.7 of my paper with Roff and follows from work of Juan Antonio Barceló and Tony Carbery. In short: maximum diversity determines volume.

Another example: Mark Meckes showed that the Minkowski dimension of a compact space X nX \subseteq \mathbb{R}^n is given by

dim Mink(X)=lim tD max(tX)logt \dim_{\text{Mink}}(X) = \lim_{t \to \infty} \frac{D_{\text{max}}(tX)}{\log t}

(Theorem 7.1 here). So, maximum diversity determines Minkowski dimension too.

There’s much more to say about the geometric aspects of maximum diversity. Maximum diversity is closely related to another recent invariant of metric spaces, magnitude. Mark and I wrote a survey paper on the more geometric and analytic aspects of magnitude, and you can find more on all this in Chapter 6 of my book.


Although diversity is closely related to entropy, the diversity viewpoint really opens up new mathematical questions that you don’t see from a purely information-theoretic standpoint. The mathematics of diversity is a rich, fertile and underexplored area, beckoning us to come and investigate.

Scholar studying Entropy and Diversity

July 26, 2021

Georg von HippelImpressions from LATTICE 2021, Part I

 Since this year's lattice conference is fully online, and anyone who can access my blog can also attend the conference, with the slides available on the Indico page, and the Zoom sessions recorded, I don't think that my blogging the conference in the usual manner would be as useful as in an ordinary year. This is especially true given that there are over 800 participants this year (which beats Lattice 2013 in Mainz for the record attendance at a lattice conference), so I think it is a fair assumption that everyone who cares about the finer points of lattice field theory is at the conference already and doesn't really need a detailed description of the contents of the plenary talks as filtered through my own preoccupations, preconceptions, and prejudices.

I will thus just provide some brief impressions from the conference on this blog.

The Zoom sessions with the talks worked very well so far, but the Gather environment seems to be a bit buggy still (sometimes people walking past dragged my avatar along, and the video/audio connections were slow to start and sometimes a bit laggy; also, private spaces seemed not to work as intended, and sometimes it was impossible to walk through a door into another room). Also, having to spread one's attention between up to three platforms (Zoom, Gather, Slack) makes the cognitive load a bit heavy at least for me. That being said, the organizers deserve great applause for having done their utmost to provide as "real" a conference experience as possible under the present circumstances.

Jordan EllenbergGPT-3 me, by the book

Not gonna lie, it has always been a dream of mine, a marker of being a Real Writer, to do a New York Times Book Review “By the Book” interview, and this summer I did it. Really fun — I got to praise a lot of writers I truly admire (including Janet Malcolm, who I sort of hoped would read it, and who died the day before the interview came out.) and hopefully helped generate some publicity for Shape.

I also recently got beta access to Open AI’s “GPT-3” text-generation engine. So I thought it would be fun to give it part of my interview and see what happened if I asked it to continue.

Here are some results. What’s bold is text I supplied to GPT-3 (along with the whole text of the interview previous to that line.) This uses the davinci engine, with temperature set to 0.7.

What kind of reader were you as a child? Which childhood books and authors stick with you most?

I read “The Little House on the Prairie” series as it came out, and I think I really liked them. I reread “The Hobbit” and “The Lord of the Rings” many times. I loved “The Hobbit” so much I would reread the whole thing and just skip the long chapters. The first adult book I read was “The Heart Is a Lonely Hunter” by Carson McCullers, when I was 10 or 11, because I loved the title so much. It was a little hard to get into because it was so slow, but I was so happy to be reading a book that was all about people being sad, because I was so sad!

What’s the best book you’ve ever received as a gift?

“The Go-Between,” by L. P. Hartley. I know it’s a cliché, but it’s so perfect.

If you could require the president to read one book, what would it be?

Note that GPT-3 has enough text to recognize this as a “By The Book” interview, because the two questions it generates are indeed on the standard list of questions that the NYTBR asks authors. “Temperature” is a measure of how constrained GPT-3 feels to pick the word it seems most likely at each stage. One cool thing about the interface is it gives you the probability distribution it drew from for each word. THe word “Heart” above, for instance, is one of the lowest-probability words in the above response; GPT-3 rated it the 11th most likely next word, with a probability of 0.8%. Top three were “Cat” (4.0%), “Aut” (1.7%), and “Fountain” (1.7%). Wait, “Aut”? Yeah — GPT-3 uses “tokens,” not words, and sometimes these are parts of words, or even just initial letters. I imagine “Aut” could have been the beginning of “Autobiography” or “Author.” Presumably running the algorithm at a lower temperature would have made it much more likely I’d have read something starting with “The Cat.” Having settled on “Heart,” the following word “is” came in with an 87% probability, far above the next most likely candidate, “of.”

In fact, it’s kind of cool how the OpenAI interface displays these probabilities, so I’m gonna screenshot and include:

Red means unlikely, green means likely — so you can see very clearly the red “Heart” and the green “Is.” Note that the standard questions from “By the Book” are almost all green once you get past the first word; you’re on a track the algorithm’s very confident about. “The Go-Between,” by L.P. Hartley is a real book, by the way, though not one I’d ever heard of, and not one I think my earlier responses could fairly be said to make you think I’d like. The answer itself, though, while it’s not my style, is very much “By the Book” style.

Here’s what you get at temperature 0:

What kind of reader were you as a child? Which childhood books and authors stick with you most?

I was a voracious reader, but I don’t remember any particular books. I remember the feeling of being in a room with a book. I remember the feeling of being in a room with a book and a window. I remember the feeling of being in a room with a book and a window and a tree outside the window. I remember the feeling of being in a room with a book and a window and a tree outside the window and a friend in the room. I remember the feeling of being in a room with a book and a window and a tree outside the window and a friend in the room and a dog in the room. I remember the feeling of being in a room with a book and a window and a tree outside the window and a friend in the room and a dog in the room and a cat in the room. I remember the feeling of being in a room with a book and a window and a tree outside the window and a friend in the room and a dog in the room and a cat in the room and a hamster in the room. I remember the feeling of being in a room with a book and a window and a tree outside the window and a friend in the room and a dog in the room and a cat in the room and a hamster in the room and a goldfish in the room. I remember the feeling of being in a room with a book and a window and a tree outside the window and a friend in the room and a dog in the room and a cat in the room and a hamster in the room and a goldfish in the room and a bird in the room. I remember the feeling of being in a room with a book and a window and a tree outside the window and a friend in the room and a dog in

This is pretty typical. If you pick the most likely “next word” every time, you get stuck repeating yourself, getting more and more certain that staying in the loop is the right thing to do because after all it’s what you’ve been doing! The only even moderately low-probability words here occur when a new animal is introduced to the room.

What about temperature 1, giving GPT-3 maximal license to draw less-likely next tokens? (Maybe temp 1 means literally draw from the imputed distribution? I’m not sure.)

What kind of reader were you as a child? Which childhood books and authors stick with you most?

Neil Postman recommended “Wind in the Willows,” which my mother read to me and I read at my mother’s knee, many many times, more times than I can count. I come back to it every year or so, and it’s still a great book. I also was nuts about “Blondie”; the single joke in the joke book never loses its appeal for me. I loved “Black Beauty” and several other Victorian kids’ classics with similar emotional power; I haven’t read them again yet because I never have any time. I can’t forget how much I loved it the first time I read it.

Anything you’d recommend to someone new to your books?

“Lovecraft Country,” without a doubt. Also my essay in the Yale series, “Mao II and Then Some.” I also think my Song of Ice and Fire “house style” recaps (here, here, here and piecemeal here); they’re the most comfortable thing I’ve written. Even my husband likes them, and he doesn’t care about the show or the books. And the recent article on swaddling.

Edited to make explicit that in-kind donations by the author also serve as p.r. I should have included them, which I wish I had done.

Well, this is the most fun at any rate. I love the idea that Neil Postman was recommending books to childhood me, and to my mom, I guess. This one actually feels to me the most like my own prose style, even as the things it says don’t quite make sense? Maybe there’s a tradeoff between style and sense. At any rate, there is no actual essay called “Mao II and Then Some,” but that’s 100% a title I would use, and DeLillo really is a writer I admire and didn’t mention in the piece. “Anything you’d recommend to someone new to your books?” is not one of NYTBR’s standard questions for this feature, but doesn’t it sound like one? Oh, here’s the color chart so you can see how much redder and more daring this output is.

Impressive work — and let me emphasize that I just did those three runs and that’s what I showed you, no cherry-picking of the best output. Not something that makes me feel easily impersonable, of course. But I didn’t give it that much of my writing….!

John PreskillQuantum steampunk is heading to bookstores!

I’m publishing a book! Quantum Steampunk: The Physics of Yesterday’s Tomorrow is hitting bookstores next spring, and you can preorder it now.

As Quantum Frontiers regulars know, steampunk is a genre of literature, art and film. Steampunkers fuse 19th-century settings (such as Victorian England, the Wild West, and Meiji Japan) with futuristic technologies (such as dirigibles, time machines, and automata). So does my field of research, a combination of thermodynamics, quantum physics, and information processing. 

Thermodynamics, the study of energy, developed during the Industrial Revolution. The field grew from practical concerns (How efficiently can engines pump water out of mines?) but wound up addressing fundamental questions (Why does time flow in only one direction?). Thermodynamics needs re-envisioning for 21st-century science, which spotlights quantum systems—electrons, protons, and other basic particles. Early thermodynamicists couldn’t even agree that atoms existed, let alone dream that quantum systems could process information in ways impossible for nonquantum systems. Over the past few decades, we’ve learned that quantum technologies can outperform their everyday counterparts in solving certain computational problems, in securing information, and in transmitting information. The study of quantum systems’ information-processing power forms a mathematical and conceptual toolkit, quantum information science. My colleagues and I leverage this toolkit to reconceptualize thermodynamics. As we combine a 19th-century framework (thermodynamics) with advanced technology (quantum information), I call our field quantum steampunk.

Glimpses of quantum steampunk have surfaced on this blog throughout the past eight years. The book is another animal, a 15-chapter closeup of the field. The book sets the stage with introductions to information processing, quantum physics, and thermodynamics. Then, we watch these three perspectives meld into one coherent whole. We tour the landscape of quantum thermodynamics—the different viewpoints and discoveries championed by different communities. These viewpoints, we find, offer a new lens onto the rest of science, including chemistry, black holes, and materials physics. Finally, we peer through a brass telescope to where quantum steampunk is headed next. Throughout the book, the science interleaves with anecdotes, history, and the story of one woman’s (my) journey into physics—and with snippets from a quantum-steampunk novel that I’ve dreamed up.

On this blog, different parts of my posts are intended for different audiences. Each post contains something for everyone, but not everyone will understand all of each post. In contrast, the book targets the general educated layperson. One of my editors majored in English, and another majored in biology, so the physics should come across clearly to everyone (and if it doesn’t, blame my editors). But the book will appeal to physicists, too. Reviewer Jay Lawrence, a professor emeritus of Dartmouth College’s physics department, wrote, “Presenting this vision [of quantum thermodynamics] in a manner accessible to laypeople discovering new interests, Quantum Steampunk will also appeal to specialists and aspiring specialists.” This book is for you.

Painting, by Robert Van Vranken, that plays a significant role in the book.

Strange to say, I began writing Quantum Steampunk under a year ago. I was surprised to receive an email from Tiffany Gasbarrini, a senior acquisitions editor at Johns Hopkins University Press, in April 2020. Tiffany had read the article I’d written about quantum steampunk for Scientific American. She wanted to expand the press’s offerings for the general public. Would I be interested in writing a book proposal? she asked.

Not having expected such an invitation, I poked around. The press’s roster included books that caught my eye, by thinkers I wanted to meet. From Wikipedia, I learned that Johns Hopkins University Press is “the oldest continuously running university press in the United States.” Senior colleagues of mine gave the thumbs-up. So I let my imagination run.

I developed a table of contents while ruminating on long walks, which I’d begun taking at the start of the pandemic. In late July, I submitted my book proposal. As the summer ended, I began writing the manuscript.

Writing the first draft—73,000 words—took about five months. The process didn’t disrupt life much. I’m used to writing regularly; I’ve written one blog post per month here since 2013, and I wrote two novels during and after college. I simply picked up my pace. At first, I wrote only on weekends. Starting in December 2020, I wrote 1,000 words per day. The process wasn’t easy, but it felt like a morning workout—healthy and productive. That productivity fed into my science, which fed back into the book. One of my current research projects grew from the book’s epilogue. A future project, I expect, will evolve from Chapter 5.

As soon as I finished draft one—last January—Tiffany and I hunted for an illustrator. We were fortunate to find Todd Cahill, a steampunk artist. He transformed the poor sketches that I’d made into works of art.

Steampunk illustration of a qubit, the basic unit of quantum information, by Todd Cahill

Early this spring, I edited the manuscript. That edit was to a stroll as the next edit was to the Boston Marathon. Editor Michael Zierler coached me through the marathon. He identified concepts that needed clarification, discrepancies between explanations, and analogies that had run away with me—as well as the visions and turns of phrase that delighted him, to balance the criticism. As Michael and I toiled, 17 of my colleagues were kind enough to provide feedback. They read sections about their areas of expertise, pointed out subtleties, and confirmed facts.

Soon after Michael and I crossed the finished line, copyeditor Susan Matheson took up the baton. She hunted for typos, standardized references, and more. Come June, I was editing again—approving and commenting on her draft. Simultaneously, Tiffany designed the cover, shown above, with more artists. The marketing team reached out, and I began planning this blog post. Scratch that—I’ve been dreaming about this blog post for almost a year. But I forced myself not to spill the beans here till I told the research group I’ve been building. I shared about the book with them two Thursdays ago, and I hope that book critics respond as they did.

Every time I’ve finished a draft, my husband and I have celebrated by ordering takeout sandwiches from our favorite restaurant. Three sandwich meals are down, and we have one to go.

Having dreamed about this blog post for a year, I’m thrilled to bits to share my book with you. It’s available for preordering, and I encourage you to support your local bookstore by purchasing through The book is available also through Barnes & Noble, Amazon, Waterstones, and the other usual suspects. For press inquiries, or to request a review copy, contact Kathryn Marguy at

Over the coming year, I’ll continue sharing about my journey into publishing—the blurbs we’ll garner for the book jacket, the first copies hot off the press, the reviews and interviews. I hope that you’ll don your duster coat and goggles (every steampunker wears goggles), hop into your steam-powered gyrocopter, and join me.

July 23, 2021

Matt von HippelLessons From Neutrinos, Part I

Some of the particles of the Standard Model are more familiar than others. Electrons and photons, of course, everyone has heard of, and most, though not all, have heard of quarks. Many of the rest, like the W and Z boson, only appear briefly in high-energy colliders. But one Standard Model particle is much less exotic, and nevertheless leads to all manner of confusion. That particle is the neutrino.

Neutrinos are very light, much lighter than even an electron. (Until relatively recently, we thought they were completely massless!) They have no electric charge and they don’t respond to the strong nuclear force, so aside from gravity (negligible since they’re so light), the only force that affects them is the weak nuclear force. This force is, well, weak. It means neutrinos can be produced via the relatively ordinary process of radioactive beta decay, but it also means they almost never interact with anything else. Vast numbers of neutrinos pass through you every moment, with no noticeable effect. We need enormous tanks of liquid or chunks of ice to have a chance of catching neutrinos in action.

Because neutrinos are both ordinary and unfamiliar, they tend to confuse people. I’d like to take advantage of this confusion to teach some physics. Neutrinos turn out to be a handy theme to convey a couple blog posts worth of lessons about why physics works the way it does.

I’ll start on the historical side. There’s a lesson that physicists themselves learned in the early days:

Lesson 1: Don’t Throw out a Well-Justified Conservation Law

In the early 20th century, physicists were just beginning to understand radioactivity. They could tell there were a few different types: gamma decay released photons in the form of gamma rays, alpha decay shot out heavy, positively charged particles, and beta decay made “beta particles”, or electrons. For each of these, physicists could track each particle and measure its energy and momentum. Everything made sense for gamma and alpha decay…but not for beta decay. Somehow, they could add up the energy of each of the particles they could track, and find less at the end than they did at the beginning. It was as if energy was not conserved.

These were the heady early days of quantum mechanics, so people were confused enough that many thought this was the end of the story. Maybe energy just isn’t conserved? Wolfgang Pauli, though, thought differently. He proposed that there had to be another particle, one that no-one could detect, that made energy balance out. It had to be neutral, so he called it the neutron…until two years later when James Chadwick discovered the particle we call the neutron. This was much too heavy to be Pauli’s neutron, so Edoardo Amaldi joked that Pauli’s particle was a “neutrino” instead. The name stuck, and Pauli kept insisting his neutrino would turn up somewhere. It wasn’t until 1956 that neutrinos were finally detected, so for quite a while people made fun of Pauli for his quixotic quest.

Including a Faust parody with Gretchen as the neutrino

In retrospect, people should probably have known better. Conservation of energy isn’t one of those rules that come out of nowhere. It’s deeply connected to time, and to the idea that one can perform the same experiment at any time in history and find the same result. While rules like that sometimes do turn out wrong, our first expectation should be that they won’t. Nowadays, we’re confident enough in energy conservation that we plan to use it to detect other particles: it was the main way the Large Hadron Collider planned to try to detect dark matter.

As we came to our more modern understanding, physicists started writing up the Standard Model. Neutrinos were thought of as massless, like photons, traveling at the speed of light. Now, we know that neutrinos have mass…but we don’t know how much mass they have. How do we know they have mass then? To understand that, you’ll need to understand what mass actually means in physics. We’ll talk about that next week!

Scott AaronsonSlowly emerging from blog-hibervacation

Alright everyone:

  1. Victor Galitski has an impassioned rant against out-of-control quantum computing hype, which I enjoyed and enthusiastically recommend, although I wished Galitski had engaged a bit more with the strongest arguments for optimism (e.g., the recent sampling-based supremacy experiments, the extrapolations that show gate fidelities crossing the fault-tolerance threshold within the next decade). Even if I’ve been saying similar things on this blog for 15 years, I clearly haven’t been doing so in a style that works for everyone. Quantum information needs as many people as possible who will tell the truth as best they see it, unencumbered by any competing interests, and has nothing legitimate to fear from that. The modern intersection of quantum theory and computer science has raised profound scientific questions that will be with us for decades to come. It’s a lily that need not be gilded with hype.
  2. Last month Limaye, Srinivasan, and Tavenas posted an exciting preprint to ECCC, which apparently proves the first (slightly) superpolynomial lower bound on the size of constant-depth arithmetic circuits, over fields of characteristic 0. Assuming it’s correct, this is another small victory in the generations-long war against the P vs. NP problem.
  3. I’m grateful to the Texas Democratic legislators who fled the state to prevent the legislature, a couple miles from my house, having a quorum to enact new voting restrictions, and who thereby drew national attention to the enormity of what’s at stake. It should go without saying that, if a minority gets to rule indefinitely by forcing through laws to suppress the votes of a majority that would otherwise unseat it, thereby giving itself the power to force through more such laws, etc., then we no longer live in a democracy but in a banana republic. And there’s no symmetry to the situation: no matter how terrified you (or I) might feel about wokeists and their denunciation campaigns, the Democrats have no comparable effort to suppress Republican votes. Alas, I don’t know of any solutions beyond the obvious one, of trying to deal the conspiracy-addled grievance party crushing defeats in 2022 and 2024.
  4. Added: Here’s the video of my recent Astral Codex Ten ask-me-anything session.

Robert HellingEmail is broken --- the spammers won

 I am an old man. I am convinced email a superior medium for person to person information exchange. It is text based, so you don't need special hardware to use it, it can still transport various media formats and it is inter-operational, you are not tied to one company offering a service but thanks to a long list of RFCs starting with number 822 everybody can run their own service.  Via GPG or S/MIME you can even add somewhat decent encryption and authentication (at least for the body of the message) even though that is not optimal since historically it came later.

But the real advantage is on the client side: You have threading, you have the option to have folders to store your emails, you can search through them, you can set up filters so routine stuff does not clog up your inbox. And when things go really pear shaped (as they tend to do every few years), you can still recover your data from a text file on your hard drive.

This is all opposed to the continuous Now of messengers where already yesterday's message has scrolled up never to be seen again. But the young folks tend to prefer those and I cannot see any reason those would be preferable (except maybe that you get notifications on your phone). But up to now, it was still an option to say to our students "if you want to get all relevant information, check your email, it will be there".

But today, I think, this is finally over. The spammers won.

Over the last months of the pandemic, I already had to realise, mailing lists are harder and harder to use. As far as I can tell, nobody offers decent mailing lists on the net (at least without too much cost and with reasonable data protection ambitions), probably because those would immediately be used by spammers. So you have to run your own. For the families of our kid's classes at school and for daycare, to spread information that could no longer be posted on physical notice boards, I tried to set up lists on mailman like we used to do it on one of my hosted servers. Oh, what a pain. There are so many hoops you have to jump through so the messages actually end up in people's inboxes. For some major email providers (I am looking at you hosteurope) there is little chance unless the message's sender is in the recipient's address book for example. And yes, I have set up my server correctly, with reverse DNS etc working.

But now, it's application season again. We have received over 250 applications for the TMP master program and now I have to inform applicants about their results. The boundary conditions is that I want to send an individual mail to everybody  that contains their name and I want to digitally sign it so applicants know it is authentic and not sent by some prankster impersonating me (that had happened in the past, seriously). And I don't want to send those manually.

So I have a perl script (did I mention I am old), that reads the applicants' data from a database, composes the message with the appropriate template, applies a GPG signature and sends it off to the next SMTP server.

In the past, I had already learned that inside the university network, you have to sleep 10 seconds after each message as every computer that sends emails at a higher rate is considered taken over by some spammers and automatically disconnected from the network for 30 minutes.

This year, I am sending from the home office and as it turns out, some of my messages never make it to the recipients. There is of course no error message to me (thanks spammers) but the message seems to be silently dropped. It's not in the inbox and also cannot be found in the spam folder. Just gone.

I will send paper letters as well (those are required for legal reasons anyway) with a similar script and thanks to TeX being text based on the input side. But is this really the answer for 2021? How am I supposed to inform people I have not (yet) met from around the world in a reliable way? 

I am lost.

July 22, 2021

n-Category Café Large Sets 13

Previously: Part 12.5

This is the last post in the series, and it’s a short summary of everything we’ve done.

Most posts in this series introduced some notion of large set. Here’s a diagram showing the corresponding existence conditions, with the weakest conditions at the bottom and the strongest at the top:

Diagram of large set conditions

Most of these existence conditions come in two flavours. Take beth fixed points, for instance. We could be interested in the literal existence condition “there is at least one beth fixed point”, or the stronger condition “there are unboundedly many beth fixed points” (meaning that whatever set you choose, there’s a beth fixed point bigger than it).

I’ve been a bit sloppy in using phrases like “large set conditions”. These conditions are as much about shape as size. The point is that if a set XX is measurable, or inaccessible, or whatever, and YY is another set larger than XX, then YY need not be measurable, inaccessible, etc. It usually isn’t. But in ordinary language, we’d expect a thing larger than a large thing to be large.

It’s like this: call a building centennial if the number of storeys is a (nonzero) multiple of 100. Even the smallest centennial building has 100 storeys, so all centennial buildings are what most people would call large. In that sense, being centennial is a largeness condition. But a 101-storey building is not centennial, despite being larger than a centennial building, so the adjective “centennial” doesn’t behave like the ordinary English adjective “large”.

Puzzle   There’s one largeness condition in this series that does behave like this: any set larger than one that’s large in this sense is also large. Well, uncountability and being infinite are both conditions with this property, but it’s something less trivial I’m thinking of. What is it?

In the very first post, I wrote this:

I’ll be using a style of set theory that so far I’ve been calling categorical, and which could also be called isomorphism-invariant or structural, but which I really want to call neutral set theory.

What I mean is this. In cooking, a recipe will sometimes tell you to use a neutral cooking oil. This means an oil whose flavour is barely noticeable, unlike, say, sesame oil or coconut oil or extra virgin olive oil. A neutral cooking oil fades into the background, allowing the other flavours to come through.

The kind of set theory I’ll use here, and the language I’ll use to discuss it, is “neutral” in the sense that it’s the language of the large majority of mathematical publications today, especially in more algebraic areas. It’s the language of structures and substructures and quotients and isomorphisms, the lingua franca of algebra. To most mathematicians, it should just fade into the background, allowing the essential points about sets themselves to come through.

And later on in that post, I continued:

While everyone agrees that the elements of a set XX are in canonical one-to-one correspondence with the functions 1X1 \to X, some people don’t like defining an element of XX to literally be a function 1X1 \to X. That’s OK. True to the spirit of neutral set theory, I’ll never rely on this definition of element. All we’ll need is that uncontroversial one-to-one correspondence.

I’m not sure I did a great job of explaining what I meant by “neutral”, partly because I was still fine-tuning the idea, but also because it’s one of those things that’s easier to demonstrate than describe.

The proof of the pudding is in the eating. These last dozen posts are a demonstration of what I mean by “neutral”. Throughout, sets have been treated in the same way as a modern mathematician treats any other algebraic or geometric object. In particular, everything was isomorphism-invariant, and we didn’t ask funny questions about sets (like “does ξxX\xi \in x \in X imply ξX\xi \in X?”) that we wouldn’t ask in group theory if the sets were equipped with group structures, or in differential geometry if the sets had the structure of manifolds, etc.

As promised, I’ve never needed to insist that the elements of a set XX literally are the functions 1X1 \to X, only that they’re in canonical one-to-one correspondence. Similarly, it’s never been important whether an II-indexed family of sets literally is a function into II, as long as the two notions are equivalent in the appropriate sense.

Aside   Having said that, I don’t want to concede too much of a point. In modern mathematics, it’s absolutely standard that once you’ve established that two kinds of thing are in canonical one-to-one correspondence, you treat them interchangeably. And sometimes you then rearrange the definitions so that there’s just one kind of thing, making the correspondence invisible. The definition of an element as a map from 11 is a case of this.

Personally, when I try to follow arguments that an element should not be defined as a map from 11, I have a hard time finding anything mathematically solid to hold on to. We all agree that the two things are in canonical one-to-one correspondence, so what’s the issue? For comparison, imagine someone arguing that although sequences in a set XX correspond canonically to functions X\mathbb{N} \to X, they shouldn’t be defined as functions X\mathbb{N} \to X. What would you say in response?

I’m mentioning this not in order to restart arguments that many of us have had many times before (though I know I’m running that risk). My point is that although I’ve scrupulously avoided saying that an element is a function out of 11, or that an II-indexed family is a function into II, to do so would actually be pretty neutral in the sense of being mainstream mathematical practice: when two things are equivalent, we treat them as interchangeable.

Let me finish where I started. I began the first post like this:

This is the first of a series of posts on how large cardinals look in categorical set theory.

My primary interest is not actually in large cardinals themselves. What I’m really interested in is exploring the hypothesis that everything in traditional, membership-based set theory that’s relevant to the rest of mathematics can be done smoothly in categorical set theory. I’m not sure this hypothesis is correct (and I suppose no one could ever be sure), which is why I use the words “hypothesis” and “explore”. But I know of no counterexample.

Twelve posts later, I still know of no counterexample. As far as I know, categorical set theory can do everything important that membership-based set theory can do — and not only do it, but do it in a way that’s smooth, natural, and well-motivated.

We could go on with this experiment forever. Personally, I’m very conscious of my own ignorance of set theory. For example, I know almost nothing about two important themes related to large cardinals: critical points of elementary embeddings, and Shelah’s PCF theory. So if the experiment’s going to continue, someone else might have to take the reins.

Jordan Ellenberg“I do it all better”

“I was also at the exhibition of prominent French artists Bonnard Picasso Matisse etc. and noted much to my satisfaction that I do it all better.”

Max Beckmann, letter to his wife Quappi Beckmann, 28 Jul 1925

Self-portrait, from around the same time (this is hung at the Harvard Art Museum and is well worth a trip if you’re on campus)

Jordan EllenbergThe Milwaukee Bucks are the champions of the National Basketball Association

First of all, let’s be clear — I am a bandwagon fan. Despite having lived in Wisconsin since 2005, I only got into the Bucks three years ago. I feel a real envy for the long-time fans I know here, who, if they are old enough, have been waiting for a Bucks title since the year I was born. They feel this in a way I can’t.

That said: what a run! Bucks getting revenge on the Heat, who knocked us out early last year … then beating the Nets, the kind of super-team Giannis could have jumped to but didn’t… coming through and winning those key games against the Hawks without Giannis (including a superhero game from my favorite Buck, Brook Lopez) — and finally, Giannis coming back and being so huge in the finals, scoring 50 in the closeout game; and best of all, Giannis got there by hitting 17 out of 19 free throws — it was like, he said “I have one vulnerability and I think tonight is the night to turn it off.” You couldn’t write a better story.

Giannis, the philosopher:

And the text, if that link dies:

When I think about like, “Yeah I did this.” You know, “I’m so great. I had 30, I had 25-10-10,” or whatever the case might be. Because you’re going to think about that … Usually the next day you’re going to suck. Simple as that. Like, the next few days you’re going to be terrible. And I figured out a mindset to have that, when you focus on the past, that’s your ego: “I did this in the past. I won that in the past.”

And when I focus on the future, it’s my pride. “Yeah, the next game, Game 5, I’ll do this and this and this. I’m going to dominate.” That’s your pride talking. Like, it doesn’t happen. You’re right here. I try and focus in the moment. In the present. And that’s humility. That’s being humble. That’s not setting no expectations. That’s going out there and enjoying the game. Competing at a high level. I’ve had people throughout my life who have helped me with that. But that’s a skill that I’ve tried to, like, how do you say? Perfect it. Yeah, master it. It’s been working so far, so I’m not going to stop.

Parade tomorrow in Milwaukee.

July 20, 2021

Doug NatelsonQuantum computing + hype

 Last Friday, Victor Galitski published a thought-provoking editorial on linkedin, entitled "Quantum Computing Hype is Bad for Science".  I encourage people to read it.

As a person who has spent years working in the nano world (including on topics like "molecular electronics"), I'm intimately familiar with the problem of hype.  Not every advance is a "breakthrough" or "revolutionary" or "transformative" or "disruptive", and that is fine - scientists and engineers do themselves a disservice when overpromising or unjustifiably inflating claims of significance.  Incentives often point in an unfortunate direction in the world of glossy scientific publications, and the situation is even murkier when money is involved (whether to some higher order, as in trying to excite funding agencies, or to zeroth order, as in raising money for startup companies).   Nano-related research advances overwhelmingly do not lead toward single-crystal diamond nanofab or nanobots swimming through our capillaries.  Not every genomics advance will lead to a global cure for cancer or Alzheimers.  And not every quantum widget will usher in some quantum information age that will transform the world.  It's not healthy for anyone in the long term for unsupported, inflated claims to be the norm in any of these disciplines.

I am more of an optimist than Galitski.  

I agree that we are a good number of years away from practical general-purpose quantum computers that can handle problems large enough to be really interesting (e.g. breaking 4096-bit RSA encryption).  However, I think there is a ton of fascinating and productive research to be done along the way, including in areas farther removed from quantum computing, like quantum-enhanced sensing.  Major federal investments in the relevant science and engineering research will lead to real benefits in the long run, in terms of whatever technically demanding physics/electronics/optics/materials work force needs we will have.  There is very cool science to be done.  If handled correctly, increased investment will not come at the expense of non-quantum-computing science.  It is also likely not a zero-sum game in terms of human capital - there really might be more people, total, drawn into these fields if prospects for employment look more exciting and broader than they have in the past.  

Where I think Galitski is right on is the concern about what he calls "quantum Ponzi schemes".  Some people poured billions of dollars into anything with the word "blockchain" attached to it, even without knowing what blockchain means, or how it might be implemented by some particular product.  There is a real danger that investors will be unable to tell reality from science fiction and/or outright lying when it comes to quantum technologies.  Good grief, look how much money went into Theranos when lots of knowledgable people knew that single-drop-of-blood assays have all kinds of challenges and that the company's claims seemed unrealistic. 

I also think that it is totally reasonable to be concerned about the sustainability of this - anytime there is super-rapid growth in funding for an area, it's important to think about what comes later.  The space race is a good example.  There were very cool knock-on benefits overall from the post-Sputnik space race, but there was also a decades-long hangover in the actual aerospace industry when the spending fell back to earth.  

Like I said, I'm baseline pretty optimistic about all this, but it's important to listen to cautionary voices - it's the way to stay grounded and think more broadly about context.  

Doug NatelsonAPS Division of Condensed Matter Physics Invited Symposium nominations

Hopefully the 2022 APS March Meeting in Chicago will be something closer to "normal", though (i) with covid variants it's good to be cautious about predictions, and (ii) I wouldn't be surprised if there is some hybrid content.  Anyway, I encourage submissions.  Having been a DCMP member-at-large and seen the process, it's to all of our benefit if there is a large pool of interesting contributions.


The Division of Condensed Matter Physics (DCMP) program committee requests your proposals for Invited Symposium sessions for the APS March Meeting 2022. DCMP hosts approximately 30 Invited symposia during the week of the March Meeting highlighting cutting-edge research in the broad field of condensed matter physics. These symposia consist of 5 invited talks centered on a research topic proposed by the nominator(s). Please submit only Symposium nominations. DCMP does not select individual speakers for invited talks.

Please use the APS nominations website for submission of your symposium nomination.

Submit your nomination

Nominations should be submitted as early as possible, and no later than August 13. Support your nomination with a justification, a list of five confirmed invited speakers with tentative titles, and a proposed session chair. Thank you for spending the time to help organize a strong DCMP participation at next year’s March Meeting.

Jim Sauls, Secretary/Treasurer for DCMP

July 18, 2021

Tommaso DorigoThe Meta-Vaccine

With the delta variant of Covid-19 surging in many countries - e.g., over 100,000 new cases per day foreseen in the UK in the next few days, and many other countries following suit - we may feel depressed at the thought that this pandemic is going to stay with us for a lot longer than some originally foresaw.
In truth, if you could sort out your sources well, you would have predicted this a long time ago: epidemiologists had in fact foreseen that there would continue to be waves of contagions, although at some point mitigated by the vaccination campaigns. However, so much misinformation and falsehood on the topic has been since dumped on all media, and in particular on the internet, that it is easy to pick up wrong information.

read more

July 16, 2021

Matt von HippelThe Winding Path of a Physics Conversation

In my line of work, I spend a lot of time explaining physics. I write posts here of course, and give the occasional public lecture. I also explain physics when I supervise Master’s students, and in a broader sense whenever I chat with my collaborators or write papers. I’ll explain physics even more when I start teaching. But of all the ways to explain physics, there’s one that has always been my favorite: the one-on-one conversation.

Talking science one-on-one is validating in a uniquely satisfying way. You get instant feedback, questions when you’re unclear and comprehension when you’re close. There’s a kind of puzzle to it, discovering what you need to fill in the gaps in one particular person’s understanding. As a kid, I’d chase this feeling with imaginary conversations: I’d plot out a chat with Democritus or Newton, trying to explain physics or evolution or democracy. It was a game, seeing how I could ground our modern understanding in concepts someone from history already knew.

Way better than Parcheesi

I’ll never get a chance in real life to explain physics to a Democritus or a Newton, to bridge a gap quite that large. But, as I’ve discovered over the years, everyone has bits and pieces they don’t yet understand. Even focused on the most popular topics, like black holes or elementary particles, everyone has gaps in what they’ve managed to pick up. I do too! So any conversation can be its own kind of adventure, discovering what that one person knows, what they don’t, and how to connect the two.

Of course, there’s fun in writing and public speaking too (not to mention, of course, research). Still, I sometimes wonder if there’s a career out there in just the part I like best: just one conversation after another, delving deep into one person’s understanding, making real progress, then moving on to the next. It wouldn’t be efficient by any means, but it sure sounds fun.

John BaezThe Ideal Monatomic Gas

Today at the Topos Institute, Sophie Libkind, Owen Lynch and I spent some time talking about thermodynamics, Carnot engines and the like. As a result, I want to work out for myself some basic facts about the ideal gas. This stuff is all well-known, but I’m having trouble finding exactly what I want—and no more, thank you—collected in one place.

Just for background, the Carnot cycle looks roughly like this:

This is actually a very inaccurate picture, but it gets the point across. We have a container of gas, and we make it execute a cyclic motion, so its pressure P and volume V trace out a loop in the plane. As you can see, this loop consists of four curves:

• In the first, from a to b, we put a container of gas in contact with a hot medium. Then we make it undergo isothermal expansion: that is, expansion at a constant temperature.

• In the second, from b to c, we insulate the container and let the gas undergo adiabatic reversible expansion: that is, expansion while no heat enters or leaves. The temperature drops, but merely because the container expands, not because heat leaves. It reaches a lower temperature. Then we remove the insulation.

• In the third, from c to d, we put the container in contact with a cold medium that matches its temperature. Then we make it undergo isothermal contraction: that is, contraction at a constant temperature.

• In the fourth, from d to a, we insulate the container and let the gas undergo adiabatic reversible contraction: that is, contraction while no heat enters or leaves. The temperature increases until it matches that of the hot medium. Then we remove the insulation.

The Carnot cycle is important because it provides the most efficient possible heat engine. But I don’t want to get into that. I just want to figure out formulas for everything that’s going on here—including formulas for the four curves in this picture!

To get specific formulas, I’ll consider an ideal monatomic gas, meaning a gas made of individual atoms, like helium. Some features of an ideal gas, like the formula for energy as a function of temperature, depend on whether it’s monatomic.

As a quirky added bonus, I’d like to highlight how certain properties of the ideal monatomic gas depend on the dimension of space. There’s a certain chunk of the theory that doesn’t depend on the dimension of space, as long as you interpret ‘volume’ to mean the n-dimensional analogue of volume. But the number 3 shows up in the formula for the energy of the ideal monatomic gas. And this is because space is 3-dimensional! So just for fun, I’ll do the whole analysis in n dimensions.

There are four basic formulas we need to know.

First, we have the ideal gas law:

PV = NkT


P is the pressure.
V is the n-dimensional volume.
N is the number of molecules in a container of gas.
k is a constant called Boltzmann’s constant.
T is the temperature.

Second, we have a formula for the energy, or more precisely the internal energy, of a monatomic ideal gas:

U = \frac{n}{2} NkT


U is the internal energy.
n is the dimension of space.

The factor of n/2 shows up thanks to the equipartition theorem: classically, a harmonic oscillator at temperature T has expected energy equal to kT times its number of degrees of freedom. Very roughly, the point is that in n dimensions there are n different directions in which an atom can move around.

Third, we have a relation between internal energy, work and heat:

dU = \delta W + \delta Q


dU is the differential of internal energy.
\delta W is the infinitesimal work done to the gas.
\delta Q is the infinitesimal heat transferred to the gas.

The intuition is simple: to increase the energy of some gas you can do work to it or transfer heat to it. But the math may seem a bit murky, so let me explain.

I emphasize ‘to’ because it affects the sign: for example, the work done by the gas is minus the work done to the gas. Work done to the gas increases its internal energy, while work done by it reduces its internal energy. Similarly for heat.

But what is this ‘infinitesimal’ stuff, and these weird \delta symbols?

In a minute I’m going to express everything in terms of P and V. So, T, N and U will be functions on the plane with coordinates P and V. dU will be a 1-form on this plane: it’s the differential of the function U.

But \delta W and \delta Q are not differentials of functions W and Q. There are no functions on the plane called W and Q. You can not take a box of gas and measure its work, or heat! There are just 1-forms called \delta W and \delta Q describing the change in work or heat. These are not exact 1-forms: that is, they’re not differentials of functions.

Fourth and finally:

\delta W = - P dV

This should be intuitive. The work done by the gas on the outside world by changing its volume a little equals the pressure times the change in volume. So, the work done to the gas is minus the pressure times the change in volume.

One nice feature of the 1-form \delta W = -P d V is this: as we integrate it around a simple closed curve going counterclockwise, we get the area enclosed by that curve. So, the area of this region:

is the work done by our container of gas during the Carnot cycle. (There are a lot of minus signs to worry about here, but don’t worry, I’ve got them under control. Our curve is going clockwise, so the work done to our container of gas is negative, and it’s minus the area in the region.)

Okay, now that we have our four basic equations, we can play with them and derive consequences. Let’s suppose the number N of atoms in our container of gas is fixed—a constant. Then we think of everything as a function of two variables: P and V.

First, since PV = NkT we have

\displaystyle{ T = \frac{PV}{Nk} }

So temperature is proportional to pressure times volume.

Second, since PV = NkT and U = \frac{n}{2}NkT we have

U = \frac{n}{2} P V

So, like the temperature, the internal energy of the gas is proportional to pressure times volume—but it depends on the dimension of space!

From this we get

dU = \frac{n}{2} d(PV) = \frac{n}{2}( V dP + P dV)

From this and our formulas dU = \delta W + \delta Q, \delta W = -PdV we get

\begin{array}{ccl}  \delta Q &=& dU - \delta W \\  \\  &=& \frac{n}{2}( V dP + P dV) + P dV \\ \\  &=& \frac{n}{2} V dP + \frac{n+2}{2} P dV   \end{array}

That’s basically it!

But now we know how to figure out everything about the Carnot cycle. I won’t do it all here, but I’ll work out formulas for the curves in this cycle:

The isothermal curves are easy, since we’ve seen temperature is proportional to pressure times volume:

\displaystyle{ T = \frac{PV}{Nk} }

So, an isothermal curve is any curve with

P \propto V^{-1}

The adiabatic reversible curves, or ‘adiabats’ for short, are a lot more interesting. A curve C in the P  V plane is an adiabat if when the container of gas changes pressure and volume while moving along this curve, no heat gets transferred to or from the gas. That is:

\delta Q \Big|_C = 0

where the funny symbol means I’m restricting a 1-form to the curve and getting a 1-form on that curve (which happens to be zero).

Let’s figure out what an adiabat looks like! By our formula for Q we have

(\frac{n}{2} V dP + \frac{n+2}{2} P dV) \Big|_C = 0


\frac{n}{2} V dP \Big|_C = -\frac{n+2}{2} P dV \Big|_C


\frac{dP}{P} \Big|_C = - \frac{n+2}{n} \frac{dV}{V}\Big|_C

Now, we can integrate both sides along a portion of the curve C and get

\ln P = - \frac{n+2}{n} \ln V + \mathrm{constant}


P \propto V^{-(n+2)/n}

So in 3-dimensional space, as you let a gas expand adiabatically—say by putting it in an insulated cylinder so heat can’t get in or out—its pressure drops as its volume increases. But for a monatomic gas it drops in this peculiar specific way: the pressure goes like the volume to the -5/3 power.

In any dimension, the pressure of the monatomic gas drops more steeply when the container expands adiabatically than when it expands at constant temperature. Why? Because V^{-(n+2)/n} drops more rapidly than V^{-1} since

\frac{n+2}{n} > 1

But as n \to \infty,

\frac{n+2}{n} \to 1

so the adiabats become closer and and closer to the isothermal curves in high dimensions. This is not important for understanding the conceptually significant features of the Carnot cycle! But it’s curious, and I’d like to improve my understanding by thinking about it until it seems obvious. It doesn’t yet.

July 14, 2021

John BaezFisher’s Fundamental Theorem (Part 4)

I wrote a paper that summarizes my work connecting natural selection to information theory:

• John Baez, The fundamental theorem of natural selection.

Check it out! If you have any questions or see any mistakes, please let me know.

Just for fun, here’s the abstract and introduction.

Abstract. Suppose we have n different types of self-replicating entity, with the population P_i of the ith type changing at a rate equal to P_i times the fitness f_i of that type. Suppose the fitness f_i is any continuous function of all the populations P_1, \dots, P_n. Let p_i be the fraction of replicators that are of the ith type. Then p = (p_1, \dots, p_n) is a time-dependent probability distribution, and we prove that its speed as measured by the Fisher information metric equals the variance in fitness. In rough terms, this says that the speed at which information is updated through natural selection equals the variance in fitness. This result can be seen as a modified version of Fisher’s fundamental theorem of natural selection. We compare it to Fisher’s original result as interpreted by Price, Ewens and Edwards.


In 1930, Fisher stated his “fundamental theorem of natural selection” as follows:

The rate of increase in fitness of any organism at any time is equal to its genetic variance in fitness at that time

Some tried to make this statement precise as follows:

The time derivative of the mean fitness of a population equals the variance of its fitness.

But this is only true under very restrictive conditions, so a controversy was ignited.

An interesting resolution was proposed by Price, and later amplified by Ewens and Edwards. We can formalize their idea as follows. Suppose we have n types of self-replicating entity, and idealize the population of the ith type as a real-valued function P_i(t). Suppose

\displaystyle{ \frac{d}{dt} P_i(t) = f_i(P_1(t), \dots, P_n(t)) \, P_i(t) }

where the fitness f_i is a differentiable function of the populations of every type of replicator. The mean fitness at time t is

\displaystyle{ \overline{f}(t) = \sum_{i=1}^n p_i(t) \, f_i(P_1(t), \dots, P_n(t)) }

where p_i(t) is the fraction of replicators of the ith type:

\displaystyle{ p_i(t) = \frac{P_i(t)}{\phantom{\Big|} \sum_{j = 1}^n P_j(t) } }

By the product rule, the rate of change of the mean fitness is the sum of two terms:

\displaystyle{ \frac{d}{dt} \overline{f}(t) = \sum_{i=1}^n \dot{p}_i(t) \, f_i(P_1(t), \dots, P_n(t)) \; + \; }

\displaystyle{ \sum_{i=1}^n p_i(t) \,\frac{d}{dt} f_i(P_1(t), \dots, P_n(t)) }

The first of these two terms equals the variance of the fitness at time t. We give the easy proof in Theorem 1. Unfortunately, the conceptual significance of this first term is much less clear than that of the total rate of change of mean fitness. Ewens concluded that “the theorem does not provide the substantial biological statement that Fisher claimed”.

But there is another way out, based on an idea Fisher himself introduced in 1922: Fisher information. Fisher information gives rise to a Riemannian metric on the space of probability distributions on a finite set, called the ‘Fisher information metric’—or in the context of evolutionary game theory, the ‘Shahshahani metric’. Using this metric we can define the speed at which a time-dependent probability distribution changes with time. We call this its ‘Fisher speed’. Under just the assumptions already stated, we prove in Theorem 2 that the Fisher speed of the probability distribution

p(t) = (p_1(t), \dots, p_n(t))

is the variance of the fitness at time t.

As explained by Harper, natural selection can be thought of as a learning process, and studied using ideas from information geometry—that is, the geometry of the space of probability distributions. As p(t) changes with time, the rate at which information is updated is closely connected to its Fisher speed. Thus, our revised version of the fundamental theorem of natural selection can be loosely stated as follows:

As a population changes with time, the rate at which information is updated equals the variance of fitness.

The precise statement, with all the hypotheses, is in Theorem 2. But one lesson is this: variance in fitness may not cause ‘progress’ in the sense of increased mean fitness, but it does cause change!

For more details in a user-friendly blog format, read the whole series:

Part 1: the obscurity of Fisher’s original paper.

Part 2: a precise statement of Fisher’s fundamental theorem of natural selection, and conditions under which it holds.

Part 3: a modified version of the fundamental theorem of natural selection, which holds much more generally.

Part 4: my paper on the fundamental theorem of natural selection.

Tommaso DorigoRanBox Goes Semi-Supervised

Following my strong belief that science dissemination, and open borders science, is too important to pursue as a goal to constrain it by fears of being stripped of good ideas and scooped by fast competitors, I am offering here some ideas on a reserch plan I am going to follow in the coming months.

The benefits of sharing thoughts early on is evident: you may, by reading about them below, be struck with a good idea which may further improve my plan, and decide to share it with me; you might become a collaborator - which would add to the personpower devoted to the research. You might point out problems, issues to address, or mention that some or all of the research has already been done by somebody else, and published - which would save me a lot of time!

read more

July 12, 2021

Jordan EllenbergStandard Form 171

Also in this binder is my mom’s application to work for the Federal Government, which involved filling out “Standard Form 171.” In 1971, SF_171 required you to report whether you were “Mrs.” or “Miss,” your height and weight, and whether you were now, or within the last ten years had been, a member of the Communist Party, USA, or any subdivision of same.

July 09, 2021

Jordan EllenbergThe Bottom Line – A CPA Story

A typescript of about 30 pages written by my grandfather, Jacob Smith, in the summer of 1976, work towards a memoir. Just going to record some facts and details here.

He writes that he was “born more dead than alive,” the last of seven children, six years younger than the rest.

Professor Jellinghouse, a women’s specialist and sainted genius, directed the activities of several neighbors who acted as a committee of midwives. At one point, this committee decided my mother wasn’t going to make it. They prevailed on the famous professor to walk out on the society ladies in his Park Avenue office and hustle down to 452 Cherry Street, on the Lower East Side of New York. He breathed confidence and authority and no stork would dare cross him. He waved away an offered fee, left some free medicine, gave final instructions, and drove back to his Park Avenue patients. Ersht Gott — First God — then Professor Jellinghouse, is the way ma put it when she constantly retold the story.

This must have been C.F. Jellinghaus (obit here), whose office was at 440 Park Ave. I wonder what it was that brought him to do this housecall in the Lower East Side? My grandfather’s first home seems no longer to exist; Cherry Street is still there, just a couple of blocks long off FDR drive, but it doesn’t look like there are any buildings with an address there at all.

My stereotype is that New York Jews lived among other Jews, but my grandfather’s family moved to a part of the South Bronx near the 105th Field Artillery (now the Franklin Avenue Armory) that was mostly Catholic.

There was only one other Jewish boy in the neighborhood, but he was very quiet, dignified, and studious, and he never played with us. I never quite understood why this bookworm commanded so much respect — they [the kids in the neighborhood] even cleaned up their language in front him. Me they treated terrible, just as bad as if I was another old Catholic.

He went to James Monroe High School, which has a newer building, in preference to the run-down neighborhood school, Morris High, which is still there (now the Morris Academy for Collaborative Studies.) According to my grandfather, the condition of the building in the 1920s was recorded in a popular neighborhood song:

There’s an odor in the air

You can smell it everywhere

Morris High School!

Morris HIgh School!

After high school he worked as a bookkeeper, and went to accounting school at night at Pace. His graduation speaker was Harry Allan Overstreet, chair of philosophy at City College. (My mother says he would have much preferred to go to City College, and had the grades for it, but because he was working he could only go to a college he could attend at night.) Anyway, here’s what Overstreet told the graduates:

…this is 1935 and the Great Depression is still with us. I hope you all have good jobs to go to when you leave here today. If not, I surely don’t know where you can find any. With all your special skills and qualifications, the world just doesn’t want you.


My grandfather records the reaction:

Pace officialdom sat pale and frozen-faced as we quietly filed out. In our next communication as alumni, we were told to pay no attention to “Self-centered, egotistical introverts,” and that we would have no trouble finding good jobs.

Matt von HippelThe Big Bang: What We Know and How We Know It

When most people think of the Big Bang, they imagine a single moment: a whole universe emerging from nothing. That’s not really how it worked, though. The Big Bang refers not to one event, but to a whole scientific theory. Using Einstein’s equations and some simplifying assumptions, we physicists can lay out a timeline for the universe’s earliest history. Different parts of this timeline have different evidence: some are meticulously tested, others we even expect to be wrong! It’s worth talking through this timeline and discussing what we know about each piece, and how we know it.

We can see surprisingly far back in time. As we look out into the universe, we see each star as it was when the light we see left it: longer ago the further the star is from us. Looking back, we see changes in the types of stars and galaxies: stars formed without the metals that later stars produced, galaxies made of those early stars. We see the universe become denser and hotter, until eventually we reach the last thing we can see: the cosmic microwave background, a faint light that fills our view in every direction. This light represents a change in the universe, the emergence of the first atoms. Before this, there were ions: free nuclei and electrons, forming a hot plasma. That plasma constantly emitted and absorbed light. As the universe cooled, the ions merged into atoms, and light was free to travel. Because of this, we cannot see back beyond this point. Our model gives detailed predictions for this curtain of light: its temperature, and even the ways it varies in intensity from place to place, which in turn let us hone our model further.

In principle, we could “see” a bit further. Light isn’t the only thing that travels freely through the universe. Neutrinos are almost massless, and pass through almost everything. Like the cosmic microwave background, the universe should have a cosmic neutrino background. This would come from much earlier, from an era when the universe was so dense that neutrinos regularly interacted with other matter. We haven’t detected this neutrino background yet, but future experiments might. Gravitational waves meanwhile, can also pass through almost any obstacle. There should be gravitational wave backgrounds as well, from a variety of eras in the early universe. Once again these haven’t been detected yet, but more powerful gravitational wave telescopes may yet see them.

We have indirect evidence a bit further back than we can see things directly. In the heat of the early universe the first protons and neutrons were merged via nuclear fusion, becoming the first atomic nuclei: isotopes of hydrogen, helium, and lithium. Our model lets us predict the proportions of these, how much helium and lithium per hydrogen atom. We can then compare this to the oldest stars we see, and see that the proportions are right. In this way, we know something about the universe from before we can “see” it.

We get surprised when we look at the universe on large scales, and compare widely separated regions. We find those regions are surprisingly similar, more than we would expect from randomness and the physics we know. Physicists have proposed different explanations for this. The most popular, cosmic inflation, suggests that the universe expanded very rapidly, accelerating so that a small region of similar matter was blown up much larger than the ordinary Big Bang model would have, projecting those similarities across the sky. While many think this proposal fits the data best, we still aren’t sure it’s the right one: there are alternate proposals, and it’s even controversial whether we should be surprised by the large-scale similarity in the first place.

We understand, in principle, how matter can come from “nothing”. This is sometimes presented as the most mysterious part of the Big Bang, the idea that matter could spontaneously emerge from an “empty” universe. But to a physicist, this isn’t very mysterious. Matter isn’t actually conserved, mass is just energy you haven’t met yet. Deep down, the universe is just a bunch of rippling quantum fields, with different ones more or less active at different times. Space-time itself is just another field, the gravitational field. When people say that in the Big Bang matter emerged from nothing, all they mean is that energy moved from the gravitational field to fields like the electron and quark, giving rise to particles. As we wind the model back, we can pretty well understand how this could happen.

If we extrapolate, winding Einstein’s equations back all the way, we reach a singularity: the whole universe, according to those equations, would have emerged from a single point, a time when everything was zero distance from everything else. This assumes, though, that Einstein’s equations keep working all the way back that far. That’s probably wrong, though. Einstein’s equations don’t include the effect of quantum mechanics, which should be much more important when the universe is at its hottest and densest. We don’t have a complete theory of quantum gravity yet (at least, not one that can model this), so we can’t be certain how to correct these equations. But in general, quantum theories tend to “fuzz out” singularities, spreading out a single point over a wider area. So it’s likely that the universe didn’t actually come from just a single point, and our various incomplete theories of quantum gravity tend to back this up.

So, starting from what we can see, we extrapolate back to what we can’t. We’re quite confident in some parts of the Big Bang theory: the emergence of the first galaxies, the first stars, the first atoms, and the first elements. Back far enough and things get more mysterious, we have proposals but no definite answers. And if you try to wind back up to the beginning, you find we still don’t have the right kind of theory to answer the question. That’s a task for the future.

July 06, 2021

Terence TaoQuantitative bounds for Gowers uniformity of the Mobius and von Mangoldt functions

Joni Teräväinen and myself have just uploaded to the arXiv our preprint “Quantitative bounds for Gowers uniformity of the Möbius and von Mangoldt functions“. This paper makes quantitative the Gowers uniformity estimates on the Möbius function {\mu} and the von Mangoldt function {\Lambda}.

To discuss the results we first discuss the situation of the Möbius function, which is technically simpler in some (though not all) ways. We assume familiarity with Gowers norms and standard notations around these norms, such as the averaging notation {\mathop{\bf E}_{n \in [N]}} and the exponential notation {e(\theta) = e^{2\pi i \theta}}. The prime number theorem in qualitative form asserts that

\displaystyle  \mathop{\bf E}_{n \in [N]} \mu(n) = o(1)

as {N \rightarrow \infty}. With Vinogradov-Korobov error term, the prime number theorem is strengthened to

\displaystyle  \mathop{\bf E}_{n \in [N]} \mu(n) \ll \exp( - c \log^{3/5} N (\log \log N)^{-1/5} );

we refer to such decay bounds (With {\exp(-c\log^c N)} type factors) as pseudopolynomial decay. Equivalently, we obtain pseudopolynomial decay of Gowers {U^1} seminorm of {\mu}:

\displaystyle  \| \mu \|_{U^1([N])} \ll \exp( - c \log^{3/5} N (\log \log N)^{-1/5} ).

As is well known, the Riemann hypothesis would be equivalent to an upgrade of this estimate to polynomial decay of the form

\displaystyle  \| \mu \|_{U^1([N])} \ll_\varepsilon N^{-1/2+\varepsilon}

for any {\varepsilon>0}.

Once one restricts to arithmetic progressions, the situation gets worse: the Siegel-Walfisz theorem gives the bound

\displaystyle  \| \mu 1_{a \hbox{ mod } q}\|_{U^1([N])} \ll_A \log^{-A} N \ \ \ \ \ (1)

for any residue class {a \hbox{ mod } q} and any {A>0}, but with the catch that the implied constant is ineffective in {A}. This ineffectivity cannot be removed without further progress on the notorious Siegel zero problem.

In 1937, Davenport was able to show the discorrelation estimate

\displaystyle  \mathop{\bf E}_{n \in [N]} \mu(n) e(-\alpha n) \ll_A \log^{-A} N

for any {A>0} uniformly in {\alpha \in {\bf R}}, which leads (by standard Fourier arguments) to the Fourier uniformity estimate

\displaystyle  \| \mu \|_{U^2([N])} \ll_A \log^{-A} N.

Again, the implied constant is ineffective. If one insists on effective constants, the best bound currently available is

\displaystyle  \| \mu \|_{U^2([N])} \ll \log^{-c} N \ \ \ \ \ (2)

for some small effective constant {c>0}.

For the situation with the {U^3} norm the previously known results were much weaker. Ben Green and I showed that

\displaystyle  \mathop{\bf E}_{n \in [N]} \mu(n) \overline{F}(g(n) \Gamma) \ll_{A,F,G/\Gamma} \log^{-A} N \ \ \ \ \ (3)

uniformly for any {A>0}, any degree two (filtered) nilmanifold {G/\Gamma}, any polynomial sequence {g: {\bf Z} \rightarrow G}, and any Lipschitz function {F}; again, the implied constants are ineffective. On the other hand, in a separate paper of Ben Green and myself, we established the following inverse theorem: if for instance we knew that

\displaystyle  \| \mu \|_{U^3([N])} \geq \delta

for some {0 < \delta < 1/2}, then there exists a degree two nilmanifold {G/\Gamma} of dimension {O( \delta^{-O(1)} )}, complexity {O( \delta^{-O(1)} )}, a polynomial sequence {g: {\bf Z} \rightarrow G}, and Lipschitz function {F} of Lipschitz constant {O(\delta^{-O(1)})} such that

\displaystyle  \mathop{\bf E}_{n \in [N]} \mu(n) \overline{F}(g(n) \Gamma) \gg \exp(-\delta^{-O(1)}).

Putting the two assertions together and comparing all the dependencies on parameters, one can establish the qualitative decay bound

\displaystyle  \| \mu \|_{U^3([N])} = o(1).

However the decay rate {o(1)} produced by this argument is completely ineffective: obtaining a bound on when this {o(1)} quantity dips below a given threshold {\delta} depends on the implied constant in (3) for some {G/\Gamma} whose dimension depends on {\delta}, and the dependence on {\delta} obtained in this fashion is ineffective in the face of a Siegel zero.

For higher norms {U^k, k \geq 3}, the situation is even worse, because the quantitative inverse theory for these norms is poorer, and indeed it was only with the recent work of Manners that any such bound is available at all (at least for {k>4}). Basically, Manners establishes if

\displaystyle  \| \mu \|_{U^k([N])} \geq \delta

then there exists a degree {k-1} nilmanifold {G/\Gamma} of dimension {O( \delta^{-O(1)} )}, complexity {O( \exp\exp(\delta^{-O(1)}) )}, a polynomial sequence {g: {\bf Z} \rightarrow G}, and Lipschitz function {F} of Lipschitz constant {O(\exp\exp(\delta^{-O(1)}))} such that

\displaystyle  \mathop{\bf E}_{n \in [N]} \mu(n) \overline{F}(g(n) \Gamma) \gg \exp\exp(-\delta^{-O(1)}).

(We allow all implied constants to depend on {k}.) Meanwhile, the bound (3) was extended to arbitrary nilmanifolds by Ben and myself. Again, the two results when concatenated give the qualitative decay

\displaystyle  \| \mu \|_{U^k([N])} = o(1)

but the decay rate is completely ineffective.

Our first result gives an effective decay bound:

Theorem 1 For any {k \geq 2}, we have {\| \mu \|_{U^k([N])} \ll (\log\log N)^{-c_k}} for some {c_k>0}. The implied constants are effective.

This is off by a logarithm from the best effective bound (2) in the {k=2} case. In the {k=3} case there is some hope to remove this logarithm based on the improved quantitative inverse theory currently available in this case, but there is a technical obstruction to doing so which we will discuss later in this post. For {k>3} the above bound is the best one could hope to achieve purely using the quantitative inverse theory of Manners.

We have analogues of all the above results for the von Mangoldt function {\Lambda}. Here a complication arises that {\Lambda} does not have mean close to zero, and one has to subtract off some suitable approximant {\Lambda^\sharp} to {\Lambda} before one would expect good Gowers norms bounds. For the prime number theorem one can just use the approximant {1}, giving

\displaystyle  \| \Lambda - 1 \|_{U^1([N])} \ll \exp( - c \log^{3/5} N (\log \log N)^{-1/5} )

but even for the prime number theorem in arithmetic progressions one needs a more accurate approximant. In our paper it is convenient to use the “Cramér approximant”

\displaystyle  \Lambda_{\hbox{Cram\'er}}(n) := \frac{W}{\phi(W)} 1_{(n,W)=1}


\displaystyle  W := \prod_{p<Q} p

and {Q} is the quasipolynomial quantity

\displaystyle  Q = \exp(\log^{1/10} N). \ \ \ \ \ (4)

Then one can show from the Siegel-Walfisz theorem and standard bilinear sum methods that

\displaystyle  \mathop{\bf E}_{n \in [N]} (\Lambda - \Lambda_{\hbox{Cram\'er}}(n)) e(-\alpha n) \ll_A \log^{-A} N


\displaystyle  \| \Lambda - \Lambda_{\hbox{Cram\'er}}\|_{U^2([N])} \ll_A \log^{-A} N

for all {A>0} and {\alpha \in {\bf R}} (with an ineffective dependence on {A}), again regaining effectivity if {A} is replaced by a sufficiently small constant {c>0}. All the previously stated discorrelation and Gowers uniformity results for {\mu} then have analogues for {\Lambda}, and our main result is similarly analogous:

Theorem 2 For any {k \geq 2}, we have {\| \Lambda - \Lambda_{\hbox{Cram\'er}} \|_{U^k([N])} \ll (\log\log N)^{-c_k}} for some {c_k>0}. The implied constants are effective.

By standard methods, this result also gives quantitative asymptotics for counting solutions to various systems of linear equations in primes, with error terms that gain a factor of {O((\log\log N)^{-c})} with respect to the main term.

We now discuss the methods of proof, focusing first on the case of the Möbius function. Suppose first that there is no “Siegel zero”, by which we mean a quadratic character {\chi} of some conductor {q \leq Q} with a zero {L(\beta,\chi)} with {1 - \beta \leq \frac{c}{\log Q}} for some small absolute constant {c>0}. In this case the Siegel-Walfisz bound (1) improves to a quasipolynomial bound

\displaystyle  \| \mu 1_{a \hbox{ mod } q}\|_{U^1([N])} \ll \exp(-\log^c N). \ \ \ \ \ (5)

To establish Theorem 1 in this case, it suffices by Manners’ inverse theorem to establish the polylogarithmic bound

\displaystyle  \mathop{\bf E}_{n \in [N]} \mu(n) \overline{F}(g(n) \Gamma) \ll \exp(-\log^c N) \ \ \ \ \ (6)

for all degree {k-1} nilmanifolds {G/\Gamma} of dimension {O((\log\log N)^c)} and complexity {O( \exp(\log^c N))}, all polynomial sequences {g}, and all Lipschitz functions {F} of norm {O( \exp(\log^c N))}. If the nilmanifold {G/\Gamma} had bounded dimension, then one could repeat the arguments of Ben and myself more or less verbatim to establish this claim from (5), which relied on the quantitative equidistribution theory on nilmanifolds developed in a separate paper of Ben and myself. Unfortunately, in the latter paper the dependence of the quantitative bounds on the dimension {d} was not explicitly given. In an appendix to the current paper, we go through that paper to account for this dependence, showing that all exponents depend at most doubly exponentially in the dimension {d}, which is barely sufficient to handle the dimension of {O((\log\log N)^c)} that arises here.

Now suppose we have a Siegel zero {L(\beta,\chi)}. In this case the bound (5) will not hold in general, and hence also (6) will not hold either. Here, the usual way out (while still maintaining effective estimates) is to approximate {\mu} not by {0}, but rather by a more complicated approximant {\mu_{\hbox{Siegel}}} that takes the Siegel zero into account, and in particular is such that one has the (effective) pseudopolynomial bound

\displaystyle  \| (\mu - \mu_{\hbox{Siegel}}) 1_{a \hbox{ mod } q}\|_{U^1([N])} \ll \exp(-\log^c N) \ \ \ \ \ (7)

for all residue classes {a \hbox{ mod } q}. The Siegel approximant to {\mu} is actually a little bit complicated, and to our knowledge the first appearance of this sort of approximant only appears as late as this 2010 paper of Germán and Katai. Our version of this approximant is defined as the multiplicative function such that

\displaystyle \mu_{\hbox{Siegel}}(p^j) = \mu(p^j)

when {p < Q}, and

\displaystyle  \mu_{\hbox{Siegel}}(n) = \alpha n^{\beta-1} \chi(n)

when {n} is coprime to all primes {p<Q}, and {\alpha} is a normalising constant given by the formula

\displaystyle  \alpha := \frac{1}{L'(\beta,\chi)} \prod_{p<Q} (1-\frac{1}{p})^{-1} (1 - \frac{\chi(p)}{p^\beta})^{-1}

(this constant ends up being of size {O(1)} and plays only a minor role in the analysis). This is a rather complicated formula, but it seems to be virtually the only choice of approximant that allows for bounds such as (7) to hold. (This is the one aspect of the problem where the von Mangoldt theory is simpler than the Möbius theory, as in the former one only needs to work with very rough numbers for which one does not need to make any special accommodations for the behavior at small primes when introducing the Siegel correction term.) With this starting point it is then possible to repeat the analysis of my previous papers with Ben and obtain the pseudopolynomial discorrelation bound

\displaystyle  \mathop{\bf E}_{n \in [N]} (\mu - \mu_{\hbox{Siegel}})(n) \overline{F}(g(n) \Gamma) \ll \exp(-\log^c N)

for {F(g(n)\Gamma)} as before, which when combined with Manners’ inverse theorem gives the doubly logarithmic bound

\displaystyle \| \mu - \mu_{\hbox{Siegel}} \|_{U^k([N])} \ll (\log\log N)^{-c_k}.

Meanwhile, a direct sieve-theoretic computation ends up giving the singly logarithmic bound

\displaystyle \| \mu_{\hbox{Siegel}} \|_{U^k([N])} \ll \log^{-c_k} N

(indeed, there is a good chance that one could improve the bounds even further, though it is not helpful for this current argument to do so). Theorem 1 then follows from the triangle inequality for the Gowers norm. It is interesting that the Siegel approximant {\mu_{\hbox{Siegel}}} seems to play a rather essential component in the proof, even if it is absent in the final statement. We note that this approximant seems to be a useful tool to explore the “illusory world” of the Siegel zero further; see for instance the recent paper of Chinis for some work in this direction.

For the analogous problem with the von Mangoldt function (assuming a Siegel zero for sake of discussion), the approximant {\Lambda_{\hbox{Siegel}}} is simpler; we ended up using

\displaystyle \Lambda_{\hbox{Siegel}}(n) = \Lambda_{\hbox{Cram\'er}}(n) (1 - n^{\beta-1} \chi(n))

which allows one to state the standard prime number theorem in arithmetic progressions with classical error term and Siegel zero term compactly as

\displaystyle  \| (\Lambda - \Lambda_{\hbox{Siegel}}) 1_{a \hbox{ mod } q}\|_{U^1([N])} \ll \exp(-\log^c N).

Routine modifications of previous arguments also give

\displaystyle  \mathop{\bf E}_{n \in [N]} (\Lambda - \Lambda_{\hbox{Siegel}})(n) \overline{F}(g(n) \Gamma) \ll \exp(-\log^c N) \ \ \ \ \ (8)


\displaystyle \| \Lambda_{\hbox{Siegel}} \|_{U^k([N])} \ll \log^{-c_k} N.

The one tricky new step is getting from the discorrelation estimate (8) to the Gowers uniformity estimate

\displaystyle \| \Lambda - \Lambda_{\hbox{Siegel}} \|_{U^k([N])} \ll (\log\log N)^{-c_k}.

One cannot directly apply Manners’ inverse theorem here because {\Lambda} and {\Lambda_{\hbox{Siegel}}} are unbounded. There is a standard tool for getting around this issue, now known as the dense model theorem, which is the standard engine powering the transference principle from theorems about bounded functions to theorems about certain types of unbounded functions. However the quantitative versions of the dense model theorem in the literature are expensive and would basically weaken the doubly logarithmic gain here to a triply logarithmic one. Instead, we bypass the dense model theorem and directly transfer the inverse theorem for bounded functions to an inverse theorem for unbounded functions by using the densification approach to transference introduced by Conlon, Fox, and Zhao. This technique turns out to be quantitatively quite efficient (the dependencies of the main parameters in the transference are polynomial in nature), and also has the technical advantage of avoiding the somewhat tricky “correlation condition” present in early transference results which are also not beneficial for quantitative bounds.

In principle, the above results can be improved for {k=3} due to the stronger quantitative inverse theorems in the {U^3} setting. However, there is a bottleneck that prevents us from achieving this, namely that the equidistribution theory of two-step nilmanifolds has exponents which are exponential in the dimension rather than polynomial in the dimension, and as a consequence we were unable to improve upon the doubly logarithmic results. Specifically, if one is given a sequence of bracket quadratics such as {\lfloor \alpha_1 n \rfloor \beta_1 n, \dots, \lfloor \alpha_d n \rfloor \beta_d n} that fails to be {\delta}-equidistributed, one would need to establish a nontrivial linear relationship modulo 1 between the {\alpha_1,\beta_1,\dots,\alpha_d,\beta_d} (up to errors of {O(1/N)}), where the coefficients are of size {O(\delta^{-d^{O(1)}})}; current methods only give coefficient bounds of the form {O(\delta^{-\exp(d^{O(1)})})}. An old result of Schmidt demonstrates proof of concept that these sorts of polynomial dependencies on exponents is possible in principle, but actually implementing Schmidt’s methods here seems to be a quite non-trivial task. There is also another possible route to removing a logarithm, which is to strengthen the inverse {U^3} theorem to make the dimension of the nilmanifold logarithmic in the uniformity parameter {\delta} rather than polynomial. Again, the Freiman-Bilu theorem (see for instance this paper of Ben and myself) demonstrates proof of concept that such an improvement in dimension is possible, but some work would be needed to implement it.

July 04, 2021

Scott AaronsonOpen thread on new quantum supremacy claims

Happy 4th to those in the US!

The group of Chaoyang Lu and Jianwei Pan, based at USTC in China, has been on a serious quantum supremacy tear lately. Recall that last December, USTC announced the achievement of quantum supremacy via Gaussian BosonSampling, with 50-70 detected photons—the second claim of sampling-based quantum supremacy, after Google’s in Fall 2019. However, skeptics then poked holes in the USTC claim, showing how they could spoof the results with a classical computer, basically by reproducing the k-photon correlations for relatively small values of k. Debate over the details continues, but the Chinese group seeks to render the debate largely moot with a new and better Gaussian BosonSampling experiment, with 144 modes and up to 113 detected photons. They say they were able to measure k-photon correlations for k up to about 19, which if true would constitute a serious obstacle to the classical simulation strategies that people discussed for the previous experiment.

In the meantime, though, an overlapping group of authors had put out another paper the day before (!) reporting a sampling-based quantum supremacy experiment using superconducting qubits—extremely similar to what Google did (the same circuit depth and everything), except now with 56 qubits rather than 53.

I confess that I haven’t yet studied either paper in detail—among other reasons, because I’m on vacation with my family at the beach, and because I’m trying to spend what work-time I have on my own projects. But anyone who has read them, please use the comments of this post to discuss! Hopefully I’ll learn something.

To confine myself to some general comments: since Google’s announcement in Fall 2019, I’ve consistently said that sampling-based quantum supremacy is not yet a done deal. I’ve said that quantum supremacy seems important enough to want independent replications, and demonstrations in other hardware platforms like ion traps and photonics, and better gate fidelity, and better classical hardness, and better verification protocols. Most of all, I’ve said that we needed a genuine dialogue between the “quantum supremacists” and the classical skeptics: the former doing experiments and releasing all their data, the latter trying to design efficient classical simulations for those experiments, and so on in an iterative process. Just like in applied cryptography, we’d only have real confidence in a quantum supremacy claim once it had survived at least a few years of attacks by skeptics. So I’m delighted that this is precisely what’s now happening. USTC’s papers are two new volleys in this back-and-forth; we all eagerly await the next volley, whichever side it comes from.

While I’ve been trying for years to move away from the expectation that I blog about each and every QC announcement that someone messages me about, maybe I’ll also say a word about the recent announcement by IBM of a quantum advantage in space complexity (see here for popular article and here for arXiv preprint). There appears to be a nice theoretical result here, about the ability to evaluate any symmetric Boolean function with a single qubit in a branching-program-like model. I’d love to understand that result better. But to answer the question I received, this is another case where, once you know the protocol, you know both that the experiment can be done and exactly what its result will be (namely, the thing predicted by QM). So I think the interest is almost entirely in the protocol itself.

July 02, 2021

Matt von HippelDigging for Buried Insight

The scientific method, as we usually learn it, starts with a hypothesis. The scientist begins with a guess, and asks a question with a clear answer: true, or false? That guess lets them design an experiment, observe the consequences, and improve our knowledge of the world.

But where did the scientist get the hypothesis in the first place? Often, through some form of exploratory research.

Exploratory research is research done, not to answer a precise question, but to find interesting questions to ask. Each field has their own approach to exploration. A psychologist might start with interviews, asking broad questions to find narrower questions for a future survey. An ecologist might film an animal, looking for changes in its behavior. A chemist might measure many properties of a new material, seeing if any stand out. Each approach is like digging for treasure, not sure of exactly what you will find.

Mathematicians and theoretical physicists don’t do experiments, but we still need hypotheses. We need an idea of what we plan to prove, or what kind of theory we want to build: like other scientists, we want to ask a question with a clear, true/false answer. And to find those questions, we still do exploratory research.

What does exploratory research look like, in the theoretical world? Often, it begins with examples and calculations. We can start with a known method, or a guess at a new one, a recipe for doing some specific kind of calculation. Recipe in hand, we proceed to do the same kind of calculation for a few different examples, covering different sorts of situation. Along the way, we notice patterns: maybe the same steps happen over and over, or the result always has some feature.

We can then ask, do those same steps always happen? Does the result really always have that feature? We have our guess, our hypothesis, and our attempt to prove it is much like an experiment. If we find a proof, our hypothesis was true. On the other hand, we might not be able to find a proof. Instead, exploring, we might find a counterexample – one where the steps don’t occur, the feature doesn’t show up. That’s one way to learn that our hypothesis was false.

This kind of exploration is essential to discovery. As scientists, we all have to eventually ask clear yes/no questions, to submit our beliefs to clear tests. But we can’t start with those questions. We have to dig around first, to observe the world without a clear plan, to get to a point where we have a good question to ask.

Tommaso DorigoTwo Good Opportunities For In-Person Discussions On Machine Learning For Physics

After over one year of forced confinement, due to the still ongoing Covid-19 pandemic, academics around the world seem to have settled down on the idea that after all, we can still do our job via videoconferencing. We had to adapt to the situation as everybody else, of course, and in a general sense we are a privileged minority - other human occupations which are only possible in person suffered way more.

read more

July 01, 2021

Peter Rohde A case for universal basic income

Within modern libertarian-leaning circles, there has been increasingly vocal support for introducing a Universal Basic Income (UBI) — an unconditional state-financed income guarantee for all citizens. This is often met with surprise from those more social-democratic leaning, who, interestingly, are increasingly supportive of the idea too, albeit approaching it from the social rather than the economic angle. I believe the arguments from both sides are valid, and regardless of which way you vote or whether you lean left or right, we should support the introduction of a UBI as a replacement for our existing welfare systems.

Government efficiency

Perhaps the biggest gripe with current social welfare spending is how inefficient governments are at handling it, which is hardly surprising, given that governments simply aren’t very strongly incentivised to be efficient. A UBI is the most simple and efficient social spending program that can be envisioned and owing to its inherent universality and uniformity it disempowers politicians from manipulating it for political purposes.

We can effectively eliminate all forms of existing welfare (including unemployment benefits, aged and disability pensions, maternity and paternity leave, carers allowances, and student income support) and their associated bureaucratic infrastructure, replacing it with a script that makes automated periodic bank transfers to everyone. The entire existing infrastructure can be completely dismantled and put to rest, along with all its operating costs and overheads, and the associated labour put to better use than paper-pushing.

This isn’t only more efficient for the government, but also for recipients, who are no longer burdened with battling against bureaucracy and can instead turn their attention to more productive things.

Benefits for the unemployed

A common and completely correct criticism of conventional unemployment benefits is their creation of ‘poverty traps’, induced by the extremely high effective marginal tax rate (EMTR) imposed upon the unemployed when they gain employment, whose income upon gaining employment is discounted by their loss of benefits. Their effective pay increase is the difference between these (not very much for a low-income worker), yet their respective increase in workload is from zero to full-time. This is equivalent to paying a very high rate of income tax, typically far higher than the highest marginal tax rates in even highly progressive systems. This creates an enormous disincentive to seek employment since people aren’t likely to switch from unemployment to full-time work if it only earns them an extra cup of coffee. With an unconditional UBI in place, any new income earned is not discounted by benefit reductions, and the person’s marginal income remains their actual income.

Eliminating this poverty trap barrier would foreseeably result in far greater incentive amongst the unemployed to seek and hold onto employment. Additionally, by not being stuck in a situation where people have the “I have to take any job I can get otherwise I’ll be homeless” mindset, people are more likely to find themselves in jobs that suit them and provide a positive future outlook.

With social security conditional upon being unemployed and a minimum award wage in place, those with labour productivity below the minimum award threshold are forced onto social security. Marginal income is discounted by the social security baseline, which is a very low differential for low-skilled workers, creating a ‘poverty trap’ that disincentivises employment.
By removing the minimum wage and replacing social security with a UBI the poverty trap is eliminated.

Additionally, in eliminating the minimum award wage we prevent low-skilled workers from being priced out of the labour market altogether given the inevitability that some workers will be unable to readily find employment in which their skillset empowers them with productivity above this imposed threshold, thereby condemning them to unemployment.

Benefits for workers

Amongst low-skilled workers in particular, there is significant bargaining asymmetry against employers — to the employee, everything might depend on keeping their job, whereas to the employer the cost of replacing them is relatively low. A UBI to a significant extent offsets the former issue, placing workers into a stronger workplace bargaining position without the need for regulatory mechanisms of highly questionable efficacy to attempt to enforce it. This increases the labour market competitiveness of workers.

The same argument extends to self-employment. A low-skilled worker might self-employ by undertaking odd jobs, or contract- or app-based work, which don’t provide long-term job security but ought to be encouraged.

Labour market mobility

Throughout much of the developed world, savings rates are very low, providing a significant hindrance to labour market mobility — the ability for people to switch between jobs. Since it can be very challenging to line up a new job timed exactly right to match the exit from an existing one, transitioning between jobs is effectively disincentivized by this savings barrier. This creates an employment trap that inhibits people from placing themselves into jobs to which they are best suited and most likely to be successful and productive at.

Increased mobility in the labour market is enormously beneficial not only to workers, but also to employers, and reducing inefficiencies that undermine mobility have huge economic benefits. This doesn’t only apply to upward mobility, whereby people shift into jobs with higher income, but especially to sideways mobility, where people change jobs to better suits their conditions or gain new skills. A UBI improves overall labour market mobility by ensuring that employees aren’t economically trapped into jobs that aren’t right, or where they could find something better, which isn’t a good outcome for any of us, either individually or collectively.

How to introduce a UBI

Given that introducing a UBI represents a major structural reform, it is important to be mindful of minimising disruption during implementation. If poorly executed it could shock the system and cause destabilisation. A suggestion for how to smoothly go about implementing this is as follows.

Take all existing employment contracts and automatically overwrite them to subtract the introduced UBI amount from the specified income, with no other contractual changes — the legislation effectively negotiates down all salaries by an amount equivalent to the newly introduced UBI, which everyone now receives. The UBI amount is effectively contractually refunded back to the employer via reduced liability, with no perceived change in income by employees.

We could similarly eliminate mandatory superannuation — a hugely inefficient and inequitable system — instead converting whatever fraction is necessary into income taxation to finance the component of the UBI stream that replaces the pension and allowing workers to retain the rest.

Effect of legislative renegotiation of employment contracts to discount incomes by the newly-introduced UBI. For illustration here I let the UBI be equal to the minimum award wage, at which point those previously earning the minimum wage now earn nothing. This necessitates the right for employees to choose to cancel current contracts and renegotiate them if they choose.

There is one obvious pitfall in the above figure, being that at the bottom end of the labour productivity spectrum legislatively renegotiated incomes become extremely small or indeed vanish in the limit where the UBI equals the previous minimum award wage. Clearly it isn’t viable for those employees to accept that. But that isn’t to say that their labour is actually worth nothing. The purpose of the legislative override is only to balance the various money flows, but it simultaneously skews incentives.

Two ways to navigate this issue are:

  1. Complement the legislation with the right for employees (but not employers) to exit their contracts and choose to either reapply with newly negotiated salary. Employees can realistically negotiate their salaries upward at this point given that their labour still has value, however they would now be doing so in competition against a bigger pool of competitors given that the minimum wage has now been abolished. The new market value is expected to be less than before (due to increased labour market competition).
  2. Introduce the UBI gradually over time, discounting the pre-existing benefits stream by an equal amount until over time it evaporates entirely and can be abolished. This has the advantage of mitigating sudden labour market shocks that a spontaneous introduction would induce and allow the market to find new pricing for labour on its own without top-down intervention.

Given that employers have all now been granted an employment cost reduction equivalent to the UBI paid to their employees, the cost of the UBI can now be compensated for by tax liabilities to the employer via company tax. For this to remain budget neutral the new tax liabilities would in total be greater than before since they now also must cover those who are unemployed but now received the UBI, which corresponds to the unemployment rate, offset by whatever the costs of the previous, far less efficient unemployment benefits schemes were. All things combined, in countries with low unemployment and/or highly inefficient public sectors, this is unlikely to be a major burden.

Money flow between employers, employees and government.
Upon transitioning to the UBI the flows change, with no perceived change in incomes. If the UBI takes the same value as conventional benefits, which it needn’t necessarily, there is no perceived change in benefits or net taxation either (tax rates would however have to increase).
If we opt to introduce the UBI gradually to minimise labour market shock, the money flow would instead look like this. We discount the benefits flow by the UBI, and the UBI is gradually ramped up until the difference is zero at which point the previous benefits scheme is abolished. Although this approach is a longterm one, it is likely more politically viable to pursue given the aversion politicians have to making major structural reforms that cause short-term shocks (after all, the next election is always in sight). Politicians aside, major shocks to the labour market are generally unwelcome anyway, and slowly swapping in a UBI using this approach (which I’m going to name political annealing) allows us to smoothly interpolate from one to the other, thereby allowing new market prices for different forms of labour to be properly established.

To maintain parity from the income tax side, existing income tax rates should be renormalised such that income and UBI combined are taxed equivalently to previous incomes without the UBI. This implies workers’ marginal tax rates to be pushed upwards to remain tax revenue neutral since they are effectively promoted to higher tax brackets.

Now everything is roughly financially equivalent from the perspective of both employees and employers, the primary difference being that the unemployed have now been absorbed into this new system, with the previous one being dismantled.

Benefits for employers

From the employer’s perspective, especially small businesses employing low-skilled workers, this is a massive boon, since the cost of employment is reduced to the differential between what it previously was and what the new UBI provides. However, they’re receiving the same labour in exchange, so the ratio between the reduced labour cost (indirectly subsidised via the UBI) and the original labour cost (the worker’s full income) provides a multiplier on the productivity of their employees. This ratio is largest for low-income workers, thereby stimulating demand for their employment. Businesses are now better off to the effect of a UBI per employee, which can be recovered via company tax adjustments to finance the UBI.

Social benefits

It’s easy to see the attractiveness of this from a social-democratic perspective. But not just in the trivial sense that it directly undermines poverty. More generally, it places every member of society in a position of greater independence. People in abusive relationships are better able to leave. People have a greater genuine choice in making life decisions, both personally and professionally, and are less easily coerced. People who need lifestyle or career flexibility for health reasons are in a far stronger position. And perhaps most importantly, the impact on general mental health within society is likely to be significant, by removing enormous insecurities, particularly during periods of labour market volatility.

A more powerful monetary tool

During the course of the current COVID-induced economic crisis, it has become clear that central banks have exhausted their fiscal tools in stimulating consumption, and have entered a cycle of concocting ever more elaborate and extreme new ones out of desperation. This has arisen because their tools are largely limited to manipulating the money supply via interaction with financial markets. But that isn’t where consumption takes place, it’s where investments are made and assets held. Consumption is driven down here, not up there.

With well-considered legislation in place, a UBI provides a monetary tool for stimulating consumer spending via the injection of additional funds — in addition to the regular UBI flow under the umbrella of fiscal policy, central banks have the option of supplementing it with monetary stimulus. This provides the most direct possible mechanism for stimulating consumer price inflation (CPI) as opposed to conventional monetary stimulus via open market operations (OMOs) which largely affect asset price inflation. Unlike manipulating income tax rates, this is not conditional on people being employed (and less likely to spend than those who are not), and the latency between pulling monetary levers and their impact on CPI is minimised.

This is not to advocate for more monetary intervention by governments, who have consistently demonstrated themselves not especially competent in doing so, but rather provides tools that are more effective in achieving what they’re ostensibly intending to do. It doesn’t take a genius to see that when interest rates are near zero, pumping more cash off the press into financial markets isn’t going to be hugely effective at stimulating consumption. This represents a trickle-up rather than trickle-down approach to economic stimulus, the latter being oft criticised as being ineffective, which it is.

Economic & social robustness

During COVID, various governments scrambled to introduce mechanisms to protect businesses from the catastrophic impact of having neither workers nor clients. One approach, which seems to have worked reasonably well here in Australia (the so-called JobKeeper scheme) was for the government to temporarily pay the incomes of companies’ employees, thereby allowing businesses to go into hibernation and be insulated from payroll liabilities, ensuring that existing employees’ employment is retained. A UBI would have made this kind of response far less chaotic.

A UBI provides significant robustness against future pandemic scenarios and other temporary but catastrophic labour market shocks. There is no need to rush to put new legislation into place to absorb such shocks, the infrastructure is already in place. Companies, especially small businesses with limited financial buffers, needn’t wait for governments to decide if and when they will consider programs like JobKeeper, which has seemingly been very effective at protecting businesses from their employees being prevented from working.

Small government

While a UBI is a ‘big government’ initiative and therefore might be unpalatable to many on the political right, it needs to be seen on a comparative basis relative to what we currently have, a social welfare system that ostensibly aims to achieve the things a UBI will but does so with massive bureaucratic overhead and inefficiencies that in many ways create more problems than they solve.

Dismantling a hugely complicated set of systems focussing on different aspects of social welfare and reducing them to a single streamlined one that requires very little overhead to maintain is very much aligned with small government philosophy and massively reduces the number of levers the government has to play with and therefore play politics with.

The redundancy of human labour

It seems likely a UBI will become inevitable at some point in time based on the expectation that as automation and AI continue to advance, increasing numbers of workers will simply not be able to compete with the productivity of machines. There’s only so much retraining we can do before trying to do something that machines can’t do better becomes futile. And this isn’t an expectation that applies only to low-skilled or manual labour, but inevitably will over time impact the competitiveness of even the most highly skilled jobs. With this kind of inevitable future outlook, the conventional libertarian philosophy of “leave the market to itself” will simply not be capable of addressing society’s needs.

We are in an era where automation dominates productivity, a trend that will inevitably continue escalating exponentially. It simply isn’t plausible that in the long term we will be able to compete against the machines we invented, and the need for state intervention to subsidise income will be necessary to ever-increasing extents. A UBI is the most practical and efficient way to achieve this, with the most positive outcomes both economically and socially. It would be complete intellectual capitulation for libertarians and conservatives not to allow free-market philosophy to evolve and accommodate for the hard reality that our labour is becoming worthless and that a UBI is the best way to accommodate for this in a manner entirely congruent with the remainder of free-market philosophy. As we outsource our productivity to machines, we mustn’t outsource all the return also.

The incentive tradeoff

The fact that a UBI implies higher marginal tax rates, even if net tax revenue remains the same inevitably raises the perfectly legitimate criticism that this skews incentives and may undermine competitiveness. There is truth in that, however while higher marginal rates act as a disincentive these are counteracted by newly created incentives, specifically more competitive workers, more streamlined government, a more efficient and less regulated labour market with greatly increased mobility, and most importantly all those who were previously priced or regulated out of the labour market altogether no longer face those roadblocks and are able to compete.

From the perspective of businesses, who now pay higher tax rates to effectively reimburse the UBI, they have the new incentive that labour costs have been reduced, providing a multiplier on return on investment into labour, and face fewer regulatory barriers when negotiating employment contracts since we can now safely dismantle many existing employment regulations.

Vote Script for President

Ladies & Gentlemen — Allow me to introduce our new Minister for Social Services.

June 27, 2021

John PreskillCutting the quantum mustard

I had a relative to whom my parents referred, when I was little, as “that great-aunt of yours who walked into a glass door at your cousin’s birthday party.” I was a small child in a large family that mostly lived far away; little else distinguished this great-aunt from other relatives, in my experience. She’d intended to walk from my grandmother’s family room to the back patio. A glass door stood in the way, but she didn’t see it. So my great-aunt whammed into the glass; spent part of the party on the couch, nursing a nosebleed; and earned the epithet via which I identified her for years.

After growing up, I came to know this great-aunt as a kind, gentle woman who adored her family and was adored in return. After growing into a physicist, I came to appreciate her as one of my earliest instructors in necessary and sufficient conditions.

My great-aunt’s intended path satisfied one condition necessary for her to reach the patio: Nothing visible obstructed the path. But the path failed to satisfy a sufficient condition: The invisible obstruction—the glass door—had been neither slid nor swung open. Sufficient conditions, my great-aunt taught me, mustn’t be overlooked.

Her lesson underlies a paper I published this month, with coauthors from the Cambridge other than mine—Cambridge, England: David Arvidsson-Shukur and Jacob Chevalier Drori. The paper concerns, rather than pools and patios, quasiprobabilities, which I’ve blogged about many times [1,2,3,4,5,6,7]. Quasiprobabilities are quantum generalizations of probabilities. Probabilities describe everyday, classical phenomena, from Monopoly to March Madness to the weather in Massachusetts (and especially the weather in Massachusetts). Probabilities are real numbers (not dependent on the square-root of -1); they’re at least zero; and they compose in certain ways (the probability of sun or hail equals the probability of sun plus the probability of hail). Also, the probabilities that form a distribution, or a complete set, sum to one (if there’s a 70% chance of rain, there’s a 30% chance of no rain). 

In contrast, quasiprobabilities can be negative and nonreal. We call such values nonclassical, as they’re unavailable to the probabilities that describe classical phenomena. Quasiprobabilities represent quantum states: Imagine some clump of particles in a quantum state described by some quasiprobability distribution. We can imagine measuring the clump however we please. We can calculate the possible outcomes’ probabilities from the quasiprobability distribution.

Not from my grandmother’s house, although I wouldn’t mind if it were.

My favorite quasiprobability is an obscure fellow unbeknownst even to most quantum physicists: the Kirkwood-Dirac distribution. John Kirkwood defined it in 1933, and Paul Dirac defined it independently in 1945. Then, quantum physicists forgot about it for decades. But the quasiprobability has undergone a renaissance over the past few years: Experimentalists have measured it to infer particles’ quantum states in a new way. Also, colleagues and I have generalized the quasiprobability and discovered applications of the generalization across quantum physics, from quantum chaos to metrology (the study of how we can best measure things) to quantum thermodynamics to the foundations of quantum theory.

In some applications, nonclassical quasiprobabilities enable a system to achieve a quantum advantage—to usefully behave in a manner impossible for classical systems. Examples include metrology: Imagine wanting to measure a parameter that characterizes some piece of equipment. You’ll perform many trials of an experiment. In each trial, you’ll prepare a system (for instance, a photon) in some quantum state, send it through the equipment, and measure one or more observables of the system. Say that you follow the protocol described in this blog post. A Kirkwood-Dirac quasiprobability distribution describes the experiment.1 From each trial, you’ll obtain information about the unknown parameter. How much information can you obtain, on average over trials? Potentially more information if some quasiprobabilities are negative than if none are. The quasiprobabilities can be negative only if the state and observables fail to commute with each other. So noncommutation—a hallmark of quantum physics—underlies exceptional metrological results, as shown by Kirkwood-Dirac quasiprobabilities.

Exceptional results are useful, and we might aim to design experiments that achieve them. We can by designing experiments described by nonclassical Kirkwood-Dirac quasiprobabilities. When can the quasiprobabilities become nonclassical? Whenever the relevant quantum state and observables fail to commute, the quantum community used to believe. This belief turns out to mirror the expectation that one could access my grandmother’s back patio from the living room whenever no visible barriers obstructed the path. As a lack of visible barriers was necessary for patio access, noncommutation is necessary for Kirkwood-Dirac nonclassicality. But noncommutation doesn’t suffice, according to my paper with David and Jacob. We identified a sufficient condition, sliding back the metaphorical glass door on Kirkwood-Dirac nonclassicality. The condition depends on simple properties of the system, state, and observables. (Experts: Examples include the Hilbert space’s dimensionality.) We also quantified and upper-bounded the amount of nonclassicality that a Kirkwood-Dirac quasiprobability can contain.

From an engineering perspective, our results can inform the design of experiments intended to achieve certain quantum advantages. From a foundational perspective, the results help illuminate the sources of certain quantum advantages. To achieve certain advantages, noncommutation doesn’t cut the mustard—but we now know a condition that does.

For another take on our paper, check out this news article in Physics Today.  

1Really, a generalized Kirkwood-Dirac quasiprobability. But that phrase contains a horrendous number of syllables, so I’ll elide the “generalized.”

June 25, 2021

Tommaso DorigoFitting Lines Through Points With Simple Math

One of the reasons why I love my job as a researcher in experimental physics is that every day brings along a new problem to solve, and through decades of practice I have become quick at solving them, so I typically enjoy doing it. And it does not matter whether the problem at hand is an entirely new, challenging one or a textbook thing that has been solved a million times before. It is your problem, and it deserves your attention and dedication.

read more

Robert HellingOn Choice

 This is a follow-up to a Twitter discussion with John Baez that did not fit into the 260 character limit. And before starting, I should warn you that I have never studied set theory in any seriousness and everything I am about to write here is only based on hearsay and is probably wrong.

I am currently teaching "Mathematical Statistical Physics" once more, large part of which is to explain the operator algebraic approach to quantum statistical physics, KMS states and all that. Part of this is that I cover states as continuous linear functionals on the observables (positive and normalised) and in the example of B(H), the bounded linear operators on a Hilbert space H, I mention that trace class operators $$\rho\in {\cal S}^1({\cal H})$$ give rise to those via $$\omega(a) = {\hbox tr}(\rho a).$$

And "pretty much all" states arise in this way meaning that the bounded operators are the (topological) dual to trance class operators but, for infinite dimensional H, not the other way around as the bi-dual is larger. There are bounded linear functionals on B(H) that don't come from trace class operators. But (without too much effort), I have never seen one of those extra states being "presented". I have very low standards here for "presented" meaning that I suspect you need to invoke the axiom of choice to produce them (and this is also what John said) and for class this would be sufficient. We invoke choice in other places as well, like (via Hahn-Banach) every C*-algebra having faithful representations (or realising it as a closed Subalgebra of some huge B(H)).

So much for background. I wanted to tell you about my attitude towards choice. When I was a student, I never had any doubt about it. Sure, every vector space has a basis, there are sets that are not Lebesque measurable. A little abstract, but who cares. It was a blog post by Terrence Tao that made me reconsider that (turns out, I cannot find that post anymore, bummer). It goes like this: On one of these islands, there is this prison where it is announced to the prisoners that the following morning, they are all placed in a row and everybody is given a hat that is either black or white. No prisoner can see his own hat but all of those in front of him. They can guess their color (and all other prisoners hear the guess). Those who guess right get free while those who guess wrong get executed. How many can go free?

Think about it.

The answer is: All but one half. The last one says white if he sees an even number of white hats in front of him. Then all the others can deduce their color from this parity plus the answers of the other prisoners behind them. So all but the last prisoner get out and the last has a 50% chance.

But that was too easy. Now, this is a really big prison and there are countably infinitely many prisoners. How many go free? When they are placed in a row, the row extends to infinity only in one direction and they are looking in this infinite direction. The last prisoner sees all the other prisoners.

Think about it.

In this case, the answer is: Almost all, all but finitely many. Out of all infinite sequences of black/white form equivalence classes where two sequences are equivalent if they differ at at most finitely many places, you could say they are asymptotically equal. Out of these equivalence classes, the axiom of choice allows us to pick one representative of each. The prisoners memorise these representatives (there are only aleph1 many, they have big brains). The next morning in the court of the prison, all prisoners can see almost all other prisoners, so they know which equivalence class of sequences was chosen by the prison's director. Now, every prisoner announces the color of the memorised representative at his position and by construction, only finitely many are wrong.

This argument as raised some doubts in me if I really want choice to be true. I came to terms with it and would describe my position as agnostic. I mainly try to avoid it and better not rely too much on constructions that invoke it. And for my physics applications this is usually fine.

But it can also be useful in everyday's life: My favourite example of it is in the context of distributions. Those are, as you know, continuous linear functionals on test functions. The topology on the test functions, however, is a bit inconvenient, as you have to check all (infinitely many) partial derivates to converge. So you might try to do the opposite thing: Let's study a linear functional on test functions that is not continuous. Turns out, those are harder to get hold of than you might think: You might think this is like linear operators where continuity is equivalent to boundedness. But this case is different: You need to invoke choice to find one. But this is good, since this implies that every concrete linear functional that you can construct (write down explicitly) is automatically continuous, you don't have to prove anything!

This type of argument is a little dangerous: You really need more than "the usual way to get one is to invoke choice". You really need that it is equivalent to choice. And choice says that for every collection of sets, the Cartesian product is non-empty. It is the "every" that is important. The collection that consists of copies of the set {apple} trivially has an element in the Cartesian product, it is (apple, apple, apple, ...). And this element is concrete, I just constructed it for you.

This caveat is a bit reminiscent of a false argument that you can read far too often: You show that some class of problems is NP-complete (recent incarnations: deciding if an isolated string theory vacuum as a small cosmological constant, deciding if a spin-chain model is gapped, determining the phase structure of some spin chain model, ...) and then arguing that these problems are "hard to solve". But this does not imply that a concrete problem in this class is difficult. It is only that solving all problems in this class of problems is difficult. Every single instance of practical relevance could be easy (for example because you had additional information that trivialises the problem). It could well be that you are only interested in spin chain Hamiltonians of some specific form and that you can find a proof that all of them are gapped (or not gapped for that matter). It only means that your original class of problems was too big, it contained too many problems that don't have relevance in your case. This could for example well be for the string theory vacua: In the paper I have in mind, that was modelled (of course actually fixing all moduli and computing the value of the potential in all vacua cannot be done with today's technology) by saying there are N moduli fields and each can have at least two values with different values of its potential (positive or negative) and we assume you simply have to add all those values. Is there one choice of the values of all moduli fields such that the sum of the energies is epsilon-close to 0? This turns out to be equivalent to the knapsack-problem which is known to be NP-complete. But for this you need to allow for all possible values of the potential for the individual moduli. If, for example, you knew the values for all moduli are the same, that particular incarnation of the problem is trivial. So just knowing that the concrete problem you are interested in is a member of a class of problems that is NP-complete does not make that concrete problem hard by itself.

What is you attitude towards choice? When is the argument "Here is a concrete, constructed example of a thing x. I am interested in some property P of it. To show there are y that don't have property P, I need to invoke choice. Does this prove x has property P?" to be believed?

June 13, 2021

Terence TaoSingmaster’s conjecture in the interior of Pascal’s triangle

Kaisa Matomäki, Maksym Radziwill, Xuancheng Shao, Joni Teräväinen, and myself have just uploaded to the arXiv our preprint “Singmaster’s conjecture in the interior of Pascal’s triangle“. This paper leverages the theory of exponential sums over primes to make progress on a well known conjecture of Singmaster which asserts that any natural number larger than {1} appears at most a bounded number of times in Pascal’s triangle. That is to say, for any integer {t \geq 2}, there are at most {O(1)} solutions to the equation

\displaystyle  \binom{n}{m} = t \ \ \ \ \ (1)

with {1 \leq m < n}. Currently, the largest number of solutions that is known to be attainable is eight, with {t} equal to

\displaystyle  3003 = \binom{3003}{1} = \binom{78}{2} = \binom{15}{5} = \binom{14}{6} = \binom{14}{8} = \binom{15}{10}

\displaystyle = \binom{78}{76} = \binom{3003}{3002}.

Because of the symmetry {\binom{n}{m} = \binom{n}{n-m}} of Pascal’s triangle it is natural to restrict attention to the left half {1 \leq m \leq n/2} of the triangle.

Our main result settles this conjecture in the “interior” region of the triangle:

Theorem 1 (Singmaster’s conjecture in the interior of the triangle) If {0 < \varepsilon < 1} and {t} is sufficiently large depending on {\varepsilon}, there are at most two solutions to (1) in the region

\displaystyle  \exp( \log^{2/3+\varepsilon} n ) \leq m \leq n/2 \ \ \ \ \ (2)

and hence at most four in the region

\displaystyle  \exp( \log^{2/3+\varepsilon} n ) \leq m \leq n - \exp( \log^{2/3+\varepsilon} n ).

Also, there is at most one solution in the region

\displaystyle  \exp( \log^{2/3+\varepsilon} n ) \leq m \leq n/\exp(\log^{1-\varepsilon} n ).

To verify Singmaster’s conjecture in full, it thus suffices in view of this result to verify the conjecture in the boundary region

\displaystyle  2 \leq m < \exp(\log^{2/3+\varepsilon} n) \ \ \ \ \ (3)

(or equivalently {n - \exp(\log^{2/3+\varepsilon} n) < m \leq n}); we have deleted the {m=1} case as it of course automatically supplies exactly one solution to (1). It is in fact possible that for {t} sufficiently large there are no further collisions {\binom{n}{m} = \binom{n'}{m'}=t} for {(n,m), (n',m')} in the region (3), in which case there would never be more than eight solutions to (1) for sufficiently large {t}. This is latter claim known for bounded values of {m,m'} by Beukers, Shorey, and Tildeman, with the main tool used being Siegel’s theorem on integral points.

The upper bound of two here for the number of solutions in the region (2) is best possible, due to the infinite family of solutions to the equation

\displaystyle  \binom{n+1}{m+1} = \binom{n}{m+2} \ \ \ \ \ (4)

coming from {n = F_{2j+2} F_{2j+3}-1}, {m = F_{2j} F_{2j+3}-1} and {F_j} is the {j^{th}} Fibonacci number.

The appearance of the quantity {\exp( \log^{2/3+\varepsilon} n )} in Theorem 1 may be familiar to readers that are acquainted with Vinogradov’s bounds on exponential sums, which ends up being the main new ingredient in our arguments. In principle this threshold could be lowered if we had stronger bounds on exponential sums.

To try to control solutions to (1) we use a combination of “Archimedean” and “non-Archimedean” approaches. In the “Archimedean” approach (following earlier work of Kane on this problem) we view {n,m} primarily as real numbers rather than integers, and express (1) in terms of the Gamma function as

\displaystyle  \frac{\Gamma(n+1)}{\Gamma(m+1) \Gamma(n-m+1)} = t.

One can use this equation to solve for {n} in terms of {m,t} as

\displaystyle  n = f_t(m)

for a certain real analytic function {f_t} whose asymptotics are easily computable (for instance one has the asymptotic {f_t(m) \asymp m t^{1/m}}). One can then view the problem as one of trying to control the number of lattice points on the graph {\{ (m,f_t(m)): m \in {\bf R} \}}. Here we can take advantage of the fact that in the regime {m \leq f_t(m)/2} (which corresponds to working in the left half {m \leq n/2} of Pascal’s triangle), the function {f_t} can be shown to be convex, but not too convex, in the sense that one has both upper and lower bounds on the second derivative of {f_t} (in fact one can show that {f''_t(m) \asymp f_t(m) (\log t/m^2)^2}). This can be used to preclude the possibility of having a cluster of three or more nearby lattice points on the graph {\{ (m,f_t(m)): m \in {\bf R} \}}, basically because the area subtended by the triangle connecting three of these points would lie between {0} and {1/2}, contradicting Pick’s theorem. Developing these ideas, we were able to show

Proposition 2 Let {\varepsilon>0}, and suppose {t} is sufficiently large depending on {\varepsilon}. If {(m,n)} is a solution to (1) in the left half {m \leq n/2} of Pascal’s triangle, then there is at most one other solution {(m',n')} to this equation in the left half with

\displaystyle  |m-m'| + |n-n'| \ll \exp( (\log\log t)^{1-\varepsilon} ).

Again, the example of (4) shows that a cluster of two solutions is certainly possible; the convexity argument only kicks in once one has a cluster of three or more solutions.

To finish the proof of Theorem 1, one has to show that any two solutions {(m,n), (m',n')} to (1) in the region of interest must be close enough for the above proposition to apply. Here we switch to the “non-Archimedean” approach, in which we look at the {p}-adic valuations {\nu_p( \binom{n}{m} )} of the binomial coefficients, defined as the number of times a prime {p} divides {\binom{n}{m}}. From the fundamental theorem of arithmetic, a collision

\displaystyle  \binom{n}{m} = \binom{n'}{m'}

between binomial coefficients occurs if and only if one has agreement of valuations

\displaystyle  \nu_p( \binom{n}{m} ) = \nu_p( \binom{n'}{m'} ). \ \ \ \ \ (5)

From the Legendre formula

\displaystyle  \nu_p(n!) = \sum_{j=1}^\infty \lfloor \frac{n}{p^j} \rfloor

we can rewrite this latter identity (5) as

\displaystyle  \sum_{j=1}^\infty \{ \frac{m}{p^j} \} + \{ \frac{n-m}{p^j} \} - \{ \frac{n}{p^j} \} = \sum_{j=1}^\infty \{ \frac{m'}{p^j} \} + \{ \frac{n'-m'}{p^j} \} - \{ \frac{n'}{p^j} \}, \ \ \ \ \ (6)

where {\{x\} := x - \lfloor x\rfloor} denotes the fractional part of {x}. (These sums are not truly infinite, because the summands vanish once {p^j} is larger than {\max(n,n')}.)

A key idea in our approach is to view this condition (6) statistically, for instance by viewing {p} as a prime drawn randomly from an interval such as {[P, P + P \log^{-100} P]} for some suitably chosen scale parameter {P}, so that the two sides of (6) now become random variables. It then becomes advantageous to compare correlations between these two random variables and some additional test random variable. For instance, if {n} and {n'} are far apart from each other, then one would expect the left-hand side of (6) to have a higher correlation with the fractional part {\{ \frac{n}{p}\}}, since this term shows up in the summation on the left-hand side but not the right. Similarly if {m} and {m'} are far apart from each other (although there are some annoying cases one has to treat separately when there is some “unexpected commensurability”, for instance if {n'-m'} is a rational multiple of {m} where the rational has bounded numerator and denominator). In order to execute this strategy, it turns out (after some standard Fourier expansion) that one needs to get good control on exponential sums such as

\displaystyle  \sum_{P \leq p \leq P + P\log^{-100} P} e( \frac{N}{p} + \frac{M}{p^j} )

for various choices of parameters {P, N, M, j}, where {e(\theta) := e^{2\pi i \theta}}. Fortunately, the methods of Vinogradov (which more generally can handle sums such as {\sum_{n \in I} e(f(n))} and {\sum_{p \in I} e(f(p))} for various analytic functions {f}) can give useful bounds on such sums as long as {N} and {M} are not too large compared to {P}; more specifically, Vinogradov’s estimates are non-trivial in the regime {N,M \ll \exp( \log^{3/2-\varepsilon} P )}, and this ultimately leads to a distance bound

\displaystyle  m' - m \ll_\varepsilon \exp( \log^{2/3 +\varepsilon}(n+n') )

between any colliding pair {(n,m), (n',m')} in the left half of Pascal’s triangle, as well as the variant bound

\displaystyle  n' - n \ll_\varepsilon \exp( \log^{2/3 +\varepsilon}(n+n') )

under the additional assumption

\displaystyle  m', m \geq \exp( \log^{2/3 +\varepsilon}(n+n') ).

Comparing these bounds with Proposition 2 and using some basic estimates about the function {f_t}, we can conclude Theorem 1.

A modification of the arguments also gives similar results for the equation

\displaystyle  (n)_m = t \ \ \ \ \ (7)

where {(n)_m := n (n-1) \dots (n-m+1)} is the falling factorial:

Theorem 3 If {0 < \varepsilon < 1} and {t} is sufficiently large depending on {\varepsilon}, there are at most two solutions to (7) in the region

\displaystyle  \exp( \log^{2/3+\varepsilon} n ) \leq m < n. \ \ \ \ \ (8)

Again the upper bound of two is best possible, thanks to identities such as

\displaystyle  (a^2-a)_{a^2-2a} = (a^2-a-1)_{a^2-2a+1}.

June 07, 2021

Terence TaoGoursat and Furstenberg-Weiss type lemmas

I’m collecting in this blog post a number of simple group-theoretic lemmas, all of the following flavour: if {H} is a subgroup of some product {G_1 \times \dots \times G_k} of groups, then one of three things has to happen:

  • ({H} too small) {H} is contained in some proper subgroup {G'_1 \times \dots \times G'_k} of {G_1 \times \dots \times G_k}, or the elements of {H} are constrained to some sort of equation that the full group {G_1 \times \dots \times G_k} does not satisfy.
  • ({H} too large) {H} contains some non-trivial normal subgroup {N_1 \times \dots \times N_k} of {G_1 \times \dots \times G_k}, and as such actually arises by pullback from some subgroup of the quotient group {G_1/N_1 \times \dots \times G_k/N_k}.
  • (Structure) There is some useful structural relationship between {H} and the groups {G_1,\dots,G_k}.
These sorts of lemmas show up often in ergodic theory, when the equidistribution of some orbit is governed by some unspecified subgroup {H} of a product group {G_1 \times \dots \times G_k}, and one needs to know further information about this subgroup in order to take the analysis further. In some cases only two of the above three options are relevant. In the cases where {H} is too “small” or too “large” one can reduce the groups {G_1,\dots,G_k} to something smaller (either a subgroup or a quotient) and in applications one can often proceed in this case by some induction on the “size” of the groups {G_1,\dots,G_k} (for instance, if these groups are Lie groups, one can often perform an induction on dimension), so it is often the structured case which is the most interesting case to deal with.

It is perhaps easiest to explain the flavour of these lemmas with some simple examples, starting with the {k=1} case where we are just considering subgroups {H} of a single group {G}.

Lemma 1 Let {H} be a subgroup of a group {G}. Then exactly one of the following hold:
  • (i) ({H} too small) There exists a non-trivial group homomorphism {\eta: G \rightarrow K} into a group {K = (K,\cdot)} such that {\eta(h)=1} for all {h \in H}.
  • (ii) ({H} normally generates {G}) {G} is generated as a group by the conjugates {gHg^{-1}} of {H}.

Proof: Let {G'} be the group normally generated by {H}, that is to say the group generated by the conjugates {gHg^{-1}} of {H}. This is a normal subgroup of {G} containing {H} (indeed it is the smallest such normal subgroup). If {G'} is all of {G} we are in option (ii); otherwise we can take {K} to be the quotient group {K := G/G'} and {\eta} to be the quotient map. Finally, if (i) holds, then all of the conjugates {gHg^{-1}} of {H} lie in the kernel of {\eta}, and so (ii) cannot hold. \Box

Here is a “dual” to the above lemma:

Lemma 2 Let {H} be a subgroup of a group {G}. Then exactly one of the following hold:
  • (i) ({H} too large) {H} is the pullback {H = \pi^{-1}(H')} of some subgroup {H'} of {G/N} for some non-trivial normal subgroup {N} of {G}, where {\pi: G \rightarrow G/N} is the quotient map.
  • (ii) ({H} is core-free) {H} does not contain any non-trivial conjugacy class {\{ ghg^{-1}: g \in G \}}.

Proof: Let {N} be the normal core of {H}, that is to say the intersection of all the conjugates {gHg^{-1}} of {H}. This is the largest normal subgroup of {G} that is contained in {H}. If {N} is non-trivial, we can quotient it out and end up with option (i). If instead {N} is trivial, then there is no non-trivial element {h} that lies in the core, hence no non-trivial conjugacy class lies in {H} and we are in option (ii). Finally, if (i) holds, then every conjugacy class of an element of {N} is contained in {N} and hence in {H}, so (ii) cannot hold. \Box

For subgroups of nilpotent groups, we have a nice dichotomy that detects properness of a subgroup through abelian representations:

Lemma 3 Let {H} be a subgroup of a nilpotent group {G}. Then exactly one of the following hold:
  • (i) ({H} too small) There exists non-trivial group homomorphism {\eta: G \rightarrow K} into an abelian group {K = (K,+)} such that {\eta(h)=0} for all {h \in H}.
  • (ii) {H=G}.

Informally: if {h} is a variable ranging in a subgroup {H} of a nilpotent group {G}, then either {h} is unconstrained (in the sense that it really ranges in all of {G}), or it obeys some abelian constraint {\eta(h)=0}.

Proof: By definition of nilpotency, the lower central series

\displaystyle  G_2 := [G,G], G_3 := [G,G_2], \dots

eventually becomes trivial.

Since {G_2} is a normal subgroup of {G}, {HG_2} is also a subgroup of {G}. Suppose first that {HG_2} is a proper subgroup of {G}, then the quotient map {\eta \colon G \rightarrow G/HG_2} is a non-trivial homomorphism to an abelian group {G/HG_2} that annihilates {H}, and we are in option (i). Thus we may assume that {HG_2 = G}, and thus

\displaystyle  G_2 = [G,G] = [G, HG_2].

Note that modulo the normal group {G_3}, {G_2} commutes with {G}, hence

\displaystyle  [G, HG_2] \subset [G,H] G_3 \subset H G_3

and thus

\displaystyle  G = H G_2 \subset H H G_3 = H G_3.

We conclude that {HG_3 = G}. One can continue this argument by induction to show that {H G_i = G} for every {i}; taking {i} large enough we end up in option (ii). Finally, it is clear that (i) and (ii) cannot both hold. \Box

Remark 4 When the group {G} is locally compact and {H} is closed, one can take the homomorphism {\eta} in Lemma 3 to be continuous, and by using Pontryagin duality one can also take the target group {K} to be the unit circle {{\bf R}/{\bf Z}}. Thus {\eta} is now a character of {G}. Similar considerations hold for some of the later lemmas in this post. Discrete versions of this above lemma, in which the group {H} is replaced by some orbit of a polynomial map on a nilmanifold, were obtained by Leibman and are important in the equidistribution theory of nilmanifolds; see this paper of Ben Green and myself for further discussion.

Here is an analogue of Lemma 3 for special linear groups, due to Serre (IV-23):

Lemma 5 Let {p \geq 5} be a prime, and let {H} be a closed subgroup of {SL_2({\bf Z}_p)}, where {{\bf Z}_p} is the ring of {p}-adic integers. Then exactly one of the following hold:
  • (i) ({H} too small) There exists a proper subgroup {H'} of {SL_2({\mathbf F}_p)} such that {h \hbox{ mod } p \in H'} for all {h \in H}.
  • (ii) {H=SL_2({\bf Z}_p)}.

Proof: It is a standard fact that the reduction of {SL_2({\bf Z}_p)} mod {p} is {SL_2({\mathbf F}_p)}, hence (i) and (ii) cannot both hold.

Suppose that (i) fails, then for every {g \in SL_2({\bf Z}_p)} there exists {h \in H} such that {h = g \hbox{ mod } p}, which we write as

\displaystyle  h = g + O(p).

We now claim inductively that for any {j \geq 0} and {g \in SL_2({\bf Z}_p)}, there exists {h \in SL_2({\bf Z}_p)} with {h = g + O(p^{j+1})}; taking limits as {j \rightarrow \infty} using the closed nature of {H} will then place us in option (ii).

The case {j=0} is already handled, so now suppose {j=1}. If {g \in SL_2({\bf Z}_p)}, we see from the {j=0} case that we can write {g = hg'} where {h \in H} and {g' = 1+O(p)}. Thus to establish the {j=1} claim it suffices to do so under the additional hypothesis that {g = 1+O(p)}.

First suppose that {g = 1 + pX + O(p^2)} for some {X \in M_2({\bf Z}_p)} with {X^2=0 \hbox{ mod } p}. By the {j=0} case, we can find {h \in H} of the form {h = 1 + X + pY + O(p^2)} for some {Y \in M_2({\bf Z}_p)}. Raising to the {p^{th}} power and using {X^2=0} and {p \geq 5 > 3}, we note that

\displaystyle h^p = 1 + \binom{p}{1} X + \binom{p}{1} pY + \binom{p}{2} X pY + \binom{p}{2} pY X

\displaystyle + \binom{p}{3} X pY X + O(p^2)

\displaystyle  = 1 + pX + O(p^2),

giving the claim in this case.

Any {2 \times 2} matrix of trace zero with coefficients in {{\mathbf F}_p} is a linear combination of {\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}}, {\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}}, {\begin{pmatrix} 1 & 1 \\ -1 & -1 \end{pmatrix}} and is thus a sum of matrices that square to zero. Hence, if {g \in SL_2({\bf Z}_p)} is of the form {g = 1 + O(p)}, then {g = 1 + pY + O(p^2)} for some matrix {Y} of trace zero, and thus one can write {g} (up to {O(p^2)} errors) as the finite product of matrices of the form {1 + pY + O(p^2)} with {Y^2=0}. By the previous arguments, such a matrix {1+pY + O(p^2)} lies in {H} up to {O(p^2)} errors, and hence {g} does also. This completes the proof of the {j=1} case.

Now suppose {j \geq 2} and the claim has already been proven for {j-1}. Arguing as before, it suffices to close the induction under the additional hypothesis that {g = 1 + O(p^j)}, thus we may write {g = 1 + p^j X + O(p^{j+1})}. By induction hypothesis, we may find {h \in H} with {h = 1 + p^{j-1} X + O(p^j)}. But then {h^p = 1 + p^j X + O(p^{j+1})}, and we are done. \Box

We note a generalisation of Lemma 3 that involves two groups {G_1,G_2} rather than just one:

Lemma 6 Let {H} be a subgroup of a product {G_1 \times G_2} of two nilpotent groups {G_1, G_2}. Then exactly one of the following hold:
  • (i) ({H} too small) There exists group homomorphisms {\eta_1: G'_1 \rightarrow K}, {\eta_2: G_2 \rightarrow K} into an abelian group {K = (K,+)}, with {\eta_2} non-trivial, such that {\eta_1(h_1) + \eta_2(h_2)=0} for all {(h_1,h_2) \in H}, where {G'_1 := \{ h_1: (h_1,h_2) \in H \hbox{ for some } h_2 \in G_2 \}} is the projection of {H} to {G_1}.
  • (ii) {H = G'_1 \times G_2} for some subgroup {G'_1} of {G_2}.

Proof: Consider the group {\{ h_2 \in G_2: (1,h_2) \in H \}}. This is a subgroup of {G_2}. If it is all of {G_2}, then {H} must be a Cartesian product {H = G'_1 \times G_2} and option (ii) holds. So suppose that this group is a proper subgroup of {G_2}. Applying Lemma 3, we obtain a non-trivial group homomorphism {\eta_2: G_2 \rightarrow K} into an abelian group {K = (K,+)} such that {\eta(h_2)=0} whenever {(1,h_2) \in H}. For any {h_1} in the projection {G'_1} of {H} to {G_1}, there is thus a unique quantity {\eta_1(h_1) \in H} such that {\eta_1(h_1) + \eta_2(h_2) = 0} whenever {(h_1,h_2) \in H}. One easily checks that {\eta_1} is a homomorphism, so we are in option (i).

Finally, it is clear that (i) and (ii) cannot both hold, since (i) places a non-trivial constraint on the second component {h_2} of an element {(h_1,h_2) \in H} of {H} for any fixed choice of {h_1}. \Box

We also note a similar variant of Lemma 5, which is Lemme 10 of this paper of Serre:

Lemma 7 Let {p \geq 5} be a prime, and let {H} be a closed subgroup of {SL_2({\bf Z}_p) \times SL_2({\bf Z}_p)}. Then exactly one of the following hold:
  • (i) ({H} too small) There exists a proper subgroup {H'} of {SL_2({\mathbf F}_p) \times SL_2({\mathbf F}_p)} such that {h \hbox{ mod } p \in H'} for all {h \in H}.
  • (ii) {H=SL_2({\bf Z}_p) \times SL_2({\bf Z}_p)}.

Proof: As in the proof of Lemma 5, (i) and (ii) cannot both hold. Suppose that (i) does not hold, then for any {g \in SL_2({\bf Z}_p)} there exists {h_1 \in H} such that {h_1 = (g+O(p), 1 + O(p))}. Similarly, there exists {h_0 \in H} with {h_0 = (1+O(p), 1+O(p))}. Taking commutators of {h_1} and {h_0}, we can find {h_2 \in H} with {h_2 = (g+O(p), 1+O(p^2))}. Continuing to take commutators with {h_0} and extracting a limit (using compactness and the closed nature of {H}), we can find {h_\infty \in H} with {h_\infty = (g+O(p),1)}. Thus, the closed subgroup {\{ g \in SL_2({\bf Z}_p): (g,1) \in H \}} of {SL_2({\bf Z}_p)} does not obey conclusion (i) of Lemma 5, and must therefore obey conclusion (ii); that is to say, {H} contains {SL_2({\bf Z}_p) \times \{1\}}. Similarly {H} contains {\{1\} \times SL_2({\bf Z}_p)}; multiplying, we end up in conclusion (ii). \Box

The most famous result of this type is of course the Goursat lemma, which we phrase here in a somewhat idiosyncratic manner to conform to the pattern of the other lemmas in this post:

Lemma 8 (Goursat lemma) Let {H} be a subgroup of a product {G_1 \times G_2} of two groups {G_1, G_2}. Then one of the following hold:
  • (i) ({H} too small) {H} is contained in {G'_1 \times G'_2} for some subgroups {G'_1}, {G'_2} of {G_1, G_2} respectively, with either {G'_1 \subsetneq G_1} or {G'_2 \subsetneq G_2} (or both).
  • (ii) ({H} too large) There exist normal subgroups {N_1, N_2} of {G_1, G_2} respectively, not both trivial, such that {H = \pi^{-1}(H')} arises from a subgroup {H'} of {G_1/N_1 \times G_2/N_2}, where {\pi: G_1 \times G_2 \rightarrow G_1/N_1 \times G_2/N_2} is the quotient map.
  • (iii) (Isomorphism) There is a group isomorphism {\phi: G_1 \rightarrow G_2} such that {H = \{ (g_1, \phi(g_1)): g_1 \in G_1\}} is the graph of {\phi}. In particular, {G_1} and {G_2} are isomorphic.

Here we almost have a trichotomy, because option (iii) is incompatible with both option (i) and option (ii). However, it is possible for options (i) and (ii) to simultaneously hold.

Proof: If either of the projections {\pi_1: H \rightarrow G_1}, {\pi_2: H \rightarrow G_2} from {H} to the factor groups {G_1,G_2} (thus {\pi_1(h_1,h_2)=h_1} and {\pi_2(h_1,h_2)=h_2} fail to be surjective, then we are in option (i). Thus we may assume that these maps are surjective.

Next, if either of the maps {\pi_1: H \rightarrow G_1}, {\pi_2: H \rightarrow G_2} fail to be injective, then at least one of the kernels {N_1 \times \{1\} := \mathrm{ker} \pi_2}, {\{1\} \times N_2 := \mathrm{ker} \pi_1} is non-trivial. We can then descend down to the quotient {G_1/N_1 \times G_2/N_2} and end up in option (ii).

The only remaining case is when the group homomorphisms {\pi_1, \pi_2} are both bijections, hence are group isomorphisms. If we set {\phi := \pi_2 \circ \pi_1^{-1}} we end up in case (iii). \Box

We can combine the Goursat lemma with Lemma 3 to obtain a variant:

Corollary 9 (Nilpotent Goursat lemma) Let {H} be a subgroup of a product {G_1 \times G_2} of two nilpotent groups {G_1, G_2}. Then one of the following hold:
  • (i) ({H} too small) There exists {i=1,2} and a non-trivial group homomorphism {\eta_i: G_i \rightarrow K} such that {\eta_i(h_i)=0} for all {(h_1,h_2) \in H}.
  • (ii) ({H} too large) There exist normal subgroups {N_1, N_2} of {G_1, G_2} respectively, not both trivial, such that {H = \pi^{-1}(H')} arises from a subgroup {H'} of {G_1/N_1 \times G_2/N_2}.
  • (iii) (Isomorphism) There is a group isomorphism {\phi: G_1 \rightarrow G_2} such that {H = \{ (g_1, \phi(g_1)): g_1 \in G_1\}} is the graph of {\phi}. In particular, {G_1} and {G_2} are isomorphic.

Proof: If Lemma 8(i) holds, then by applying Lemma 3 we arrive at our current option (i). The other options are unchanged from Lemma 8, giving the claim. \Box

Now we present a lemma involving three groups {G_1,G_2,G_3} that is known in ergodic theory contexts as the “Furstenberg-Weiss argument”, as an argument of this type arose in this paper of Furstenberg and Weiss, though perhaps it also implicitly appears in other contexts also. It has the remarkable feature of being able to enforce the abelian nature of one of the groups once the other options of the lemma are excluded.

Lemma 10 (Furstenberg-Weiss lemma) Let {H} be a subgroup of a product {G_1 \times G_2 \times G_3} of three groups {G_1, G_2, G_3}. Then one of the following hold:
  • (i) ({H} too small) There is some proper subgroup {G'_3} of {G_3} and some {i=1,2} such that {h_3 \in G'_3} whenever {(h_1,h_2,h_3) \in H} and {h_i = 1}.
  • (ii) ({H} too large) There exists a non-trivial normal subgroup {N_3} of {G_3} with {G_3/N_3} abelian, such that {H = \pi^{-1}(H')} arises from a subgroup {H'} of {G_1 \times G_2 \times G_3/N_3}, where {\pi: G_1 \times G_2 \times G_3 \rightarrow G_1 \times G_2 \times G_3/N_3} is the quotient map.
  • (iii) {G_3} is abelian.

Proof: If the group {\{ h_3 \in G_3: (1,h_2,h_3) \in H \}} is a proper subgroup of {G_3}, then we are in option (i) (with {i=1}), so we may assume that

\displaystyle \{ h_3 \in G_3: (1,h_2,h_3) \in H \} = G.

Similarly we may assume that

\displaystyle \{ h_3 \in G_3: (h_1,1,h_3) \in H \} = G.

Now let {g_3,g'_3} be any two elements of {G}. By the above assumptions, we can find {h_1 \in G_1, h_2 \in G_2} such that

\displaystyle (1, h_2, g_3) \in H


\displaystyle (h_1,1, g'_3) \in H.

Taking commutators to eliminate the {h_1,h_2} terms, we conclude that

\displaystyle  (1, 1, [g_3,g'_3]) \in H.

Thus the group {\{ h_3 \in G_3: (1,1,h_3) \in H \}} contains every commutator {[g_3,g'_3]}, and thus contains the entire group {[G_3,G_3]} generated by these commutators. If {G_3} fails to be abelian, then {[G_3,G_3]} is a non-trivial normal subgroup of {G_3}, and {H} now arises from {G_1 \times G_2 \times G_3/[G_3,G_3]} in the obvious fashion, placing one in option (ii). Hence the only remaining case is when {G_3} is abelian, giving us option (iii). \Box

As before, we can combine this with previous lemmas to obtain a variant in the nilpotent case:

Lemma 11 (Nilpotent Furstenberg-Weiss lemma) Let {H} be a subgroup of a product {G_1 \times G_2 \times G_3} of three nilpotent groups {G_1, G_2, G_3}. Then one of the following hold:
  • (i) ({H} too small) There exists {i=1,2} and group homomorphisms {\eta_i: G'_i \rightarrow K}, {\eta_3: G_3 \rightarrow K} for some abelian group {K = (K,+)}, with {\eta_3} non-trivial, such that {\eta_i(h_i) + \eta_3(h_3) = 0} whenever {(h_1,h_2,h_3) \in H}, where {G'_i} is the projection of {H} to {G_i}.
  • (ii) ({H} too large) There exists a non-trivial normal subgroup {N_3} of {G_3}, such that {H = \pi^{-1}(H')} arises from a subgroup {H'} of {G_1 \times G_2 \times G_3/N_3}.
  • (iii) {G_3} is abelian.

Informally, this lemma asserts that if {(h_1,h_2,h_3)} is a variable ranging in some subgroup {G_1 \times G_2 \times G_3}, then either (i) there is a non-trivial abelian equation that constrains {h_3} in terms of either {h_1} or {h_2}; (ii) {h_3} is not fully determined by {h_1} and {h_2}; or (iii) {G_3} is abelian.

Proof: Applying Lemma 10, we are already done if conclusions (ii) or (iii) of that lemma hold, so suppose instead that conclusion (i) holds for say {i=1}. Then the group {\{ (h_1,h_3) \in G_1 \times G_3: (h_1,h_2,h_3) \in H \hbox{ for some } h_2 \in G_2 \}} is not of the form {G'_2 \times G_3}, since it only contains those {(1,h_3)} with {h_3 \in G'_3}. Applying Lemma 6, we obtain group homomorphisms {\eta_1: G'_1 \rightarrow K}, {\eta_3: G_3 \rightarrow K} into an abelian group {K= (K,+)}, with {\eta_3} non-trivial, such that {\eta_1(h_1) + \eta_3(h_3) = 0} whenever {(h_1,h_2,h_3) \in H}, placing us in option (i). \Box

The Furstenberg-Weiss argument is often used (though not precisely in this form) to establish that certain key structure groups arising in ergodic theory are abelian; see for instance Proposition 6.3(1) of this paper of Host and Kra for an example.

One can get more structural control on {H} in the Furstenberg-Weiss lemma in option (iii) if one also broadens options (i) and (ii):

Lemma 12 (Variant of Furstenberg-Weiss lemma) Let {H} be a subgroup of a product {G_1 \times G_2 \times G_3} of three groups {G_1, G_2, G_3}. Then one of the following hold:
  • (i) ({H} too small) There is some proper subgroup {G'_{ij}} of {G_i \times G_j} for some {1 \leq i < j \leq 3} such that {(h_i,h_j) \in G'_{ij}} whenever {(h_1,h_2,h_3) \in H}. (In other words, the projection of {H} to {G_i \times G_j} is not surjective.)
  • (ii) ({H} too large) There exists a normal {N_1, N_2, N_3} of {G_1, G_2, G_3} respectively, not all trivial, such that {H = \pi^{-1}(H')} arises from a subgroup {H'} of {G_1/N_1 \times G_2/N_2 \times G_3/N_3}, where {\pi: G_1 \times G_2 \times G_3 \rightarrow G_1/N_1 \times G_2/N_2 \times G_3/N_3} is the quotient map.
  • (iii) {G_1,G_2,G_3} are abelian and isomorphic. Furthermore, there exist isomorphisms {\phi_1: G_1 \rightarrow K}, {\phi_2: G_2 \rightarrow K}, {\phi_3: G_3 \rightarrow K} to an abelian group {K = (K,+)} such that

    \displaystyle  H = \{ (g_1,g_2,g_3) \in G_1 \times G_2 \times G_3: \phi(g_1) + \phi(g_2) + \phi(g_3) = 0 \}.

The ability to encode an abelian additive relation in terms of group-theoretic properties is vaguely reminiscent of the group configuration theorem.

Proof: We apply Lemma 10. Option (i) of that lemma implies option (i) of the current lemma, and similarly for option (ii), so we may assume without loss of generality that {G_3} is abelian. By permuting we may also assume that {G_1,G_2} are abelian, and will use additive notation for these groups.

We may assume that the projections of {H} to {G_1 \times G_2} and {G_3} are surjective, else we are in option (i). The group {\{ g_3 \in G_3: (1,1,g_3) \in H\}} is then a normal subgroup of {G_3}; we may assume it is trivial, otherwise we can quotient it out and be in option (ii). Thus {H} can be expressed as a graph {\{ (h_1,h_2,\phi(h_1,h_2)): h_1 \in G_1, h_2 \in G_2\}} for some map {\phi: G_1 \times G_2 \rightarrow G_3}. As {H} is a group, {\phi} must be a homomorphism, and we can write it as {\phi(h_1+h_2) = -\phi_1(h_1) - \phi_2(h_2)} for some homomorphisms {\phi_1: G_1 \rightarrow G_3}, {\phi_2: G_2 \rightarrow G_3}. Thus elements {(h_1,h_2,h_3)} of {H} obey the constraint {\phi_1(h_1) + \phi_2(h_2) + h_3 = 0}.

If {\phi_1} or {\phi_2} fails to be injective, then we can quotient out by their kernels and end up in option (ii). If {\phi_1} fails to be surjective, then the projection of {H} to {G_2 \times G_3} also fails to be surjective (since for {(h_1,h_2,h_3) \in H}, {\phi_2(h_2) + h_3} is now constrained to lie in the range of {\phi_1}) and we are in option (i). Similarly if {\phi_2} fails to be surjective. Thus we may assume that the homomorphisms {\phi_1,\phi_2} are bijective and thus group isomorphisms. Setting {\phi_3} to the identity, we arrive at option (iii). \Box

Combining this lemma with Lemma 3, we obtain a nilpotent version:

Corollary 13 (Variant of nilpotent Furstenberg-Weiss lemma) Let {H} be a subgroup of a product {G_1 \times G_2 \times G_3} of three groups {G_1, G_2, G_3}. Then one of the following hold:
  • (i) ({H} too small) There are homomorphisms {\eta_i: G_i \rightarrow K}, {\eta_j: G_j \rightarrow K} to some abelian group {K =(K,+)} for some {1 \leq i < j \leq 3}, with {\eta_i, \eta_j} not both trivial, such that {\eta_i(h_i) + \eta_j(h_j) = 0} whenever {(h_1,h_2,h_3) \in H}.
  • (ii) ({H} too large) There exists a normal {N_1, N_2, N_3} of {G_1, G_2, G_3} respectively, not all trivial, such that {H = \pi^{-1}(H')} arises from a subgroup {H'} of {G_1/N_1 \times G_2/N_2 \times G_3/N_3}, where {\pi: G_1 \times G_2 \times G_3 \rightarrow G_1/N_1 \times G_2/N_2 \times G_3/N_3} is the quotient map.
  • (iii) {G_1,G_2,G_3} are abelian and isomorphic. Furthermore, there exist isomorphisms {\phi_1: G_1 \rightarrow K}, {\phi_2: G_2 \rightarrow K}, {\phi_3: G_3 \rightarrow K} to an abelian group {K = (K,+)} such that

    \displaystyle  H = \{ (g_1,g_2,g_3) \in G_1 \times G_2 \times G_3: \phi(g_1) + \phi(g_2) + \phi(g_3) = 0 \}.

Here is another variant of the Furstenberg-Weiss lemma, attributed to Serre by Ribet (see Lemma 3.3):

Lemma 14 (Serre’s lemma) Let {H} be a subgroup of a finite product {G_1 \times \dots \times G_k} of groups {G_1,\dots,G_k} with {k \geq 2}. Then one of the following hold:
  • (i) ({H} too small) There is some proper subgroup {G'_{ij}} of {G_i \times G_j} for some {1 \leq i < j \leq k} such that {(h_i,h_j) \in G'_{ij}} whenever {(h_1,\dots,h_k) \in H}.
  • (ii) ({H} too large) One has {H = G_1 \times \dots \times G_k}.
  • (iii) One of the {G_i} has a non-trivial abelian quotient {G_i/N_i}.

Proof: The claim is trivial for {k=2} (and we don’t need (iii) in this case), so suppose that {k \geq 3}. We can assume that each {G_i} is a perfect group, {G_i = [G_i,G_i]}, otherwise we can quotient out by the commutator and arrive in option (iii). Similarly, we may assume that all the projections of {H} to {G_i \times G_j}, {1 \leq i < j \leq k} are surjective, otherwise we are in option (i).

We now claim that for any {1 \leq j < k} and any {g_k \in G_k}, one can find {(h_1,\dots,h_k) \in H} with {h_i=1} for {1 \leq i \leq j} and {h_k = g_k}. For {j=1} this follows from the surjectivity of the projection of {H} to {G_1 \times G_k}. Now suppose inductively that {1 < j < k} and the claim has already been proven for {j-1}. Since {G_k} is perfect, it suffices to establish this claim for {g_k} of the form {g_k = [g'_k, g''_k]} for some {g'_k, g''_k \in G_k}. By induction hypothesis, we can find {(h'_1,\dots,h'_k) \in H} with {h'_i = 1} for {1 \leq i < j} and {h'_k = g'_k}. By surjectivity of the projection of {H} to {G_j \times G_k}, one can find {(h''_1,\dots,h''_k) \in H} with {h''_j = 1} and {h''_k=g''_k}. Taking commutators of these two elements, we obtain the claim.

Setting {j = k-1}, we conclude that {H} contains {1 \times \dots \times 1 \times G_k}. Similarly for permutations. Multiplying these together we see that {H} contains all of {G_1 \times \dots \times G_k}, and we are in option (ii). \Box

June 05, 2021

Peter Rohde ASPI policy brief launch

Our recent panel discussion with the Australian Strategic Policy Institute (ASPI) International Cyber Policy Centre for the launch of our policy brief “An Australian strategy for the quantum revolution”, with Simon Devitt, Tara Roberson and Gavin Brennen, hosted by Danielle Cave.

June 04, 2021

Clifford JohnsonEnergy Pulse!

For your weekend listening: Recently, I chatted with Maiken Scott, the host of the podcast The Pulse (WHYY) about many everyday aspects of energy. It was a delightful conversation and episode. Listen and share with friends! Link is here. Enjoy! -cvj

The post Energy Pulse! appeared first on Asymptotia.

June 03, 2021

John PreskillPeeking into the world of quantum intelligence

Intelligent beings have the ability to receive, process, store information, and based on the processed information, predict what would happen in the future and act accordingly.

An illustration of receiving, processing, and storing information. Based on the processed information, one can make prediction about the future.
[Credit: Claudia Cheng]

We, as intelligent beings, receive, process, and store classical information. The information comes from vision, hearing, smell, and tactile sensing. The data is encoded as analog classical information through the electrical pulses sending through our nerve fibers. Our brain processes this information classically through neural circuits (at least that is our current understanding, but one should check out this blogpost). We then store this processed classical information in our hippocampus that allows us to retrieve it later to combine it with future information that we obtain. Finally, we use the stored classical information to make predictions about the future (imagine/predict the future outcomes if we perform certain action) and choose the action that would most likely be in our favor.

Such abilities have enabled us to make remarkable accomplishments: soaring in the sky by constructing accurate models of how air flows around objects, or building weak forms of intelligent beings capable of performing basic conversations and play different board games. Instead of receiving/processing/storing classical information, one could imagine some form of quantum intelligence that deals with quantum information instead of classical information. These quantum beings can receive quantum information through quantum sensors built up from tiny photons and atoms. They would then process this quantum information with quantum mechanical evolutions (such as quantum computers), and store the processed qubits in a quantum memory (protected with a surface code or toric code).

A caricature of human intelligence dating long before 1950, artificial intelligence that began in the 50’s, and the emergence of quantum intelligence.
[Credit: Claudia Cheng]

It is natural to wonder what a world of quantum intelligence would be like. While we have never encountered such a strange creature in the real world (yet), the mathematics of quantum mechanics, machine learning, and information theory allow us to peek into what such a fantastic world would be like. The physical world we live in is intrinsically quantum. So one may imagine that a quantum being is capable of making more powerful predictions than a classical being. Maybe he/she/they could better predict events that happened further away, such as tell us how a distant black hole was engulfing another? Or perhaps he/she/they could improve our lives, for example by presenting us with an entirely new approach for capturing energy from sunlight?

One may be skeptical about finding quantum intelligent beings in nature (and rightfully so). But it may not be so absurd to synthesize a weak form of quantum (artificial) intelligence in an experimental lab, or enhance our classical human intelligence with quantum devices to approximate a quantum-mechanical being. Many famous companies, like Google, IBM, Microsoft, and Amazon, as well as many academic labs and startups have been building better quantum machines/computers day by day. By combining the concepts of machine learning on classical computers with these quantum machines, the future of us interacting with some form of quantum (artificial) intelligence may not be so distant.

Before the day comes, could we peek into the world of quantum intelligence? And could one better understand how much more powerful they could be over classical intelligence?

A cartoon depiction of me (Left), Richard Kueng (Middle), and John Preskill (Right).
[Credit: Claudia Cheng]

In a recent publication [1], my advisor John Preskill, my good friend Richard Kueng, and I made some progress toward these questions. We consider a quantum mechanical world where classical beings could obtain classical information by measuring the world (performing POVM measurement). In contrast, quantum beings could retrieve quantum information through quantum sensors and store the data in a quantum memory. We study how much better quantum over classical beings could learn from the physical world to accurately predict the outcomes of unseen events (with the focus on the number of interactions with the physical world instead of computation time). We cast these problems in a rigorous mathematical framework and utilize high-dimensional probability and quantum information theory to understand their respective prediction power. Rigorously, one refers to a classical/quantum being as a classical/quantum model, algorithm, protocol, or procedure. This is because the actions of these classical/quantum beings are the center of the mathematical analysis.

Formally, we consider the task of learning an unknown physical evolution described by a CPTP map \mathcal{E} that takes in n-qubit state and maps to m-qubit state. The classical model can select an arbitrary classical input to the CPTP map and measure the output state of the CPTP map with some POVM measurement. The quantum model can access the CPTP map coherently and obtain quantum data from each access, which is equivalent to composing multiple CPTP maps with quantum computations to learn about the CPTP map. The task is to predict a property of the output state \mathcal{E}(\lvert x \rangle\!\langle x \rvert), given by \mathrm{Tr}(O \mathcal{E}(\lvert x \rangle\!\langle x \rvert)), for a new classical input x \in \{0, 1\}^n. And the goal is to achieve the task while accessing \mathcal{E} as few times as possible (i.e., fewer interactions or experiments in the physical world). We denote the number of interactions needed by classical and quantum models as N_{\mathrm{C}}, N_{\mathrm{Q}}.

In general, quantum models could learn from fewer interactions with the physical world (or experiments in the physical world) than classical models. This is because coherent quantum information can facilitate better information synthesis with information obtained from previous experiments. Nevertheless, in [1], we show that there is a fundamental limit to how much more efficient quantum models can be. In order to achieve a prediction error

\mathbb{E}_{x \sim \mathcal{D}} |h(x) -  \mathrm{Tr}(O \mathcal{E}(\lvert x \rangle\!\langle x \rvert))| \leq \mathcal{O}(\epsilon),

where h(x) is the hypothesis learned from the classical/quantum model and \mathcal{D} is an arbitrary distribution over the input space \{0, 1\}^n, we found that the speed-up N_{\mathrm{C}} / N_{\mathrm{Q}} is upper bounded by m / \epsilon, where m > 0 is the number of qubits each experiment provides (the output number of qubits in the CPTP map \mathcal{E}), and \epsilon > 0 is the desired prediction error (smaller \epsilon means we want to predict more accurately).

In contrast, when we want to accurately predict all unseen events, we prove that quantum models could use exponentially fewer experiments than classical models. We give a construction for predicting properties of quantum systems showing that quantum models could substantially outperform classical models. These rigorous results show that quantum intelligence shines when we seek stronger prediction performance.

We have only scratched the surface of what is possible with quantum intelligence. As the future unfolds, I am hopeful that we will discover more that can be done only by quantum intelligence, through mathematical analysis, rigorous numerical studies, and physical experiments.

Further information:

  • A classical model that can be used to accurately predict properties of quantum systems is the classical shadow formalism [2] that we proposed a year ago. In many tasks, this model can be shown to be one of the strongest rivals that quantum models have to surpass.
  • Even if a quantum model only receives and stores classical data, the ability to process the data using a quantum-mechanical evolution can still be advantageous [3]. However, obtaining large advantage will be harder in this case as the computational power in data can slightly boost classical machines/intelligence [3].
  • Another nice paper by Dorit Aharonov, Jordan Cotler, and Xiao-Liang Qi [4] also proved advantages of quantum models over classical one in some classification tasks.


[1] Huang, Hsin-Yuan, Richard Kueng, and John Preskill. “Information-Theoretic Bounds on Quantum Advantage in Machine Learning.” Physical Review Letters 126: 190505 (2021).

[2] Huang, Hsin-Yuan, Richard Kueng, and John Preskill. “Predicting many properties of a quantum system from very few measurements.” Nature Physics 16: 1050-1057 (2020).

[3] Huang, Hsin-Yuan, et al. “Power of data in quantum machine learning.” Nature communications 12.1 (2021): 1-9.

[4] Aharonov, Dorit, Jordan Cotler, and Xiao-Liang Qi. “Quantum Algorithmic Measurement.” arXiv preprint arXiv:2101.04634 (2021).

May 31, 2021

John PreskillLearning about learning

The autumn of my sophomore year of college was mildly hellish. I took the equivalent of three semester-long computer-science and physics courses, atop other classwork; co-led a public-speaking self-help group; and coordinated a celebrity visit to campus. I lived at my desk and in office hours, always declining my flatmates’ invitations to watch The West Wing

Hard as I studied, my classmates enjoyed greater facility with the computer-science curriculum. They saw immediately how long an algorithm would run, while I hesitated and then computed the run time step by step. I felt behind. So I protested when my professor said, “You’re good at this.” 

I now see that we were focusing on different facets of learning. I rued my lack of intuition. My classmates had gained intuition by exploring computer science in high school, then slow-cooking their experiences on a mental back burner. Their long-term exposure to the material provided familiarity—the ability to recognize a new problem as belonging to a class they’d seen examples of. I was cooking course material in a mental microwave set on “high,” as a semester’s worth of material was crammed into ten weeks at my college.

My professor wasn’t measuring my intuition. He only saw that I knew how to compute an algorithm’s run time. I’d learned the material required of me—more than I realized, being distracted by what I hadn’t learned that difficult autumn.

We can learn a staggering amount when pushed far from our comfort zones—and not only we humans can. So can simple collections of particles.

Examples include a classical spin glass. A spin glass is a collection of particles that shares some properties with a magnet. Both a magnet and a spin glass consist of tiny mini-magnets called spins. Although I’ve blogged about quantum spins before, I’ll focus on classical spins here. We can imagine a classical spin as a little arrow that points upward or downward.  A bunch of spins can form a material. If the spins tend to point in the same direction, the material may be a magnet of the sort that’s sticking the faded photo of Fluffy to your fridge.

The spins may interact with each other, similarly to how electrons interact with each other. Not entirely similarly, though—electrons push each other away. In contrast, a spin may coax its neighbors into aligning or anti-aligning with it. Suppose that the interactions are random: Any given spin may force one neighbor into alignment, gently ask another neighbor to align, entreat a third neighbor to anti-align, and having nothing to say to neighbors four and five.

The spin glass can interact with the external world in two ways. First, we can stick the spins in a magnetic field, as by placing magnets above and below the glass. If aligned with the field, a spin has negative energy; and, if antialigned, positive energy. We can sculpt the field so that it varies across the spin glass. For instance, spin 1 can experience a strong upward-pointing field, while spin 2 experiences a weak downward-pointing field.

Second, say that the spins occupy a fixed-temperature environment, as I occupy a 74-degree-Fahrenheit living room. The spins can exchange heat with the environment. If releasing heat to the environment, a spin flips from having positive energy to having negative—from antialigning with the field to aligning.

Let’s perform an experiment on the spins. First, we design a magnetic field using random numbers. Whether the field points upward or downward at any given spin is random, as is the strength of the field experienced by each spin. We sculpt three of these random fields and call the trio a drive.

Let’s randomly select a field from the drive and apply it to the spin glass for a while; again, randomly select a field from the drive and apply it; and continue many times. The energy absorbed by the spins from the fields spikes, then declines.

Now, let’s create another drive of three random fields. We’ll randomly pick a field from this drive and apply it; again, randomly pick a field from this drive and apply it; and so on. Again, the energy absorbed by the spins spikes, then tails off.

Here comes the punchline. Let’s return to applying the initial fields. The energy absorbed by the glass will spike—but not as high as before. The glass responds differently to a familiar drive than to a new drive. The spin glass recognizes the original drive—has learned the first fields’ “fingerprint.” This learning happens when the fields push the glass far from equilibrium,1 as I learned when pushed during my mildly hellish autumn.

So spin glasses learn drives that push them far from equilibrium. So do many other simple, classical, many-particle systems: polymers, viscous liquids, crumpled sheets of Mylar, and more. Researchers have predicted such learning and observed it experimentally. 

Scientists have detected many-particle learning by measuring thermodynamic observables. Examples include the energy absorbed by the spin glass—what thermodynamicists call work. But thermodynamics developed during the 1800s, to describe equilibrium systems, not to study learning. 

One study of learning—the study of machine learning—has boomed over the past two decades. As described by the MIT Technology Review, “[m]achine-learning algorithms use statistics to find patterns in massive amounts of data.” Users don’t tell the algorithms how to find those patterns.

It seems natural and fitting to use machine learning to learn about the learning by many-particle systems. That’s what I did with collaborators from the group of Jeremy England, a GlaxoSmithKline physicist who studies complex behaviors of many particle systems. Weishun Zhong, Jacob Gold, Sarah Marzen, Jeremy, and I published our paper last month. 

Using machine learning, we detected and measured many-particle learning more reliably and precisely than thermodynamic measures seem able to. Our technique works on multiple facets of learning, analogous to the intuition and the computational ability I encountered in my computer-science course. We illustrated our technique on a spin glass, but one can apply our approach to other systems, too. I’m exploring such applications with collaborators at the University of Maryland.

The project pushed me far from my equilibrium: I’d never worked with machine learning or many-body learning. But it’s amazing, what we can learn when pushed far from equilibrium. I first encountered this insight sophomore fall of college—and now, we can quantify it better than ever.

1Equilibrium is a quiet, restful state in which the glass’s large-scale properties change little. No net flow of anything—such as heat or particles—enter or leave the system.

May 26, 2021

Peter Rohde Supermoon

May 13, 2021

Peter Rohde An Australian strategy for the quantum revolution

We’ve just released our policy brief “An Australian strategy for the quantum revolution with ASPI (the Australian Strategic Policy Institute) and their International Cyber Policy Centre, jointly with Gavin Brennen, Simon Devitt and Tara Roberson.

You can download the full PDF here.

May 11, 2021

Peter Rohde Introducing QuNet: A quantum internet simulator using cost-vector analysis

We’re pleased to announce our quantum network simulator QuNet, written by my PhD student Hudson Leone.

The Julia source code and documentation are available on GitHub here.

The development of QuNet is based on the theoretical work performed in conjunction with Hudson Leone, Nathaniel Miller, Deepesh Singh, Nathan Langford and myself, presented in our recent arXiv paper “QuNet: Cost vector analysis & multi-path entanglement routing in quantum networks” here.

Here’s the condensed Twitter-thread version:

May 08, 2021

Clifford JohnsonMatrices and Gravity

So I have a confession to make. I started working on random matrix models (the large $latex N$, double-scaled variety) in 1990 or 1991, so about 30 years ago, give or take. I've written many papers on the topic, some of which people have even read. A subset of those have even been cited from time to time. So I'm supposed to be some kind of expert. I've written extensively about them here (search for matrix models and see what comes up), including posts on how exciting they are for understanding aspects of quantum gravity and black holes. So you'd think that I'd actually done the obvious thing right? Actually taken a bunch of random matrices and played with them directly. I don't mean the fancy path integral formulation we all learn, where you take N large, find saddle points, solve for the Wigner semi-circle law that the Dyson gas of eigenvalues forms, and so forth. I don't mean the Feynman expansion of that same path integral, and identify (following 't Hooft) their topology with a tessellation of random 2D surfaces. I don't mean the decomposition into orthogonal polynomials, the rewriting of the whole problem at large $latex N$ as a theory of quantum mechanics, and so forth. No, those things I know well. I just mean do what it says on the packet: close your eyes, grab a matrix out of the bag at random, compute its eigenvalues. Then do it again. Repeat a few thousand times and see that all those things in the data that we compute those fancy ways really are true. I realized the other day that in 30 years I'd never actually done that, and (motivated by the desire to make a simple visual illustration of a point) I decided to do it, and it opened up some wonderful vistas.

Let me tell you a little more. [...] Click to continue reading this post

The post Matrices and Gravity appeared first on Asymptotia.